# Observability — normalized target spec

Status: **Draft**. The single design the sister projects converge on. Derived from the three
code-verified current-state docs (`../current-state/`). Goal is *path to shared code*
(`../shared-contract/ZB.MOM.WW.Telemetry.md`), so each normalized section maps to a shared
library seam.

## 0. Scope

**Normalized here:** one OpenTelemetry bootstrap across all three signals (metrics + traces + logs) via a single `AddZbTelemetry` extension; the shared `Resource` attribute set (`service.name` / `service.namespace` / `service.version` / `site.id` / `node.role` / `host.name`) that makes every node distinguishable in a collector; standard instrumentation everyone enables (ASP.NET Core, HttpClient, gRPC client, runtime, process meters); exporter conventions (Prometheus scrape endpoint default, OTLP opt-in); a shared Serilog bootstrap with identity enrichers (`SiteId`, `NodeRole` from `ZbTelemetryOptions`; `NodeHostname` auto from `Environment.MachineName`) matching the OTel Resource dimensions (metrics and logs therefore carry identical dimensions); a `TraceContextEnricher` that stamps `trace_id`/`span_id` from `Activity.Current` onto every Serilog event, enabling log↔trace correlation; an `ILogRedactor` redaction seam.

**Explicitly NOT normalized** (domain-specific — keep per project): each app's actual instruments — `otopcua.*` meters and spans, `mxgateway.*` counters/histograms/gauges — they are registered *through* the shared bootstrap but their names and semantics remain bespoke (see [`METRIC-CONVENTIONS.md`](METRIC-CONVENTIONS.md) §4); the redaction *policy* (which field names, which command types) — only the `ILogRedactor` seam is shared, each project supplies its own implementation; the MxGateway net48 x86 worker's `IWorkerLogger` (stderr key=value format, out-of-process, out of scope).

## 1. OpenTelemetry pipeline — `AddZbTelemetry`

A single `IHostApplicationBuilder` extension is the front door for all three OTel signals.
It wires the shared `Resource`, registers standard instrumentation, and configures the selected exporter:

```csharp
builder.AddZbTelemetry(o =>
{
    o.ServiceName      = "mxgateway";           // populates Resource service.name
    o.ServiceNamespace = "ZB.MOM.WW";           // constant across the fleet (default)
    o.ServiceVersion   = "1.0.0";               // recommended: populate from AssemblyInformationalVersion
    o.SiteId           = cfg.SiteId;            // Resource site.id + Serilog SiteId property
    o.NodeRole         = cfg.NodeRole;          // Resource node.role + Serilog NodeRole property
    o.Meters           = ["MxGateway.Server"];  // app's own Meter name(s)
    o.ActivitySources  = ["MxGateway.Server"];  // app's own ActivitySource name(s)
    o.Exporter         = ZbExporter.Prometheus; // default; ZbExporter.Otlp opt-in
    // o.OtlpEndpoint  = "http://collector:4317"; // required when Exporter = Otlp
});

app.MapZbMetrics(); // mounts Prometheus /metrics scrape endpoint
```

This is the headline fix: nobody in the fleet sets a `Resource` or `service.name` today, making every node indistinguishable in a collector. Every project must call `AddZbTelemetry` to be observable.

> **`IServiceCollection` overload:** `AddZbTelemetry` also has an `IServiceCollection`-based
> overload for host configurations where `IHostApplicationBuilder` is not available (detailed in
> the shared-contract). The `IHostApplicationBuilder` overload is the primary path for all three
> apps on .NET 10.

## 2. Shared Resource

The OTel `Resource` attached to all three signals is built from `ZbTelemetryOptions`:

| OTel attribute | Options property | Notes |
|---|---|---|
| `service.name` | `ServiceName` | Required. Lower-case short identifier (`otopcua`, `mxgateway`, `scadabridge`) |
| `service.namespace` | `ServiceNamespace` | Default `"ZB.MOM.WW"` — constant across the fleet |
| `service.version` | `ServiceVersion` | Optional; recommend populating from `AssemblyInformationalVersion` |
| `site.id` | `SiteId` | Optional; identifies the physical/logical site |
| `node.role` | `NodeRole` | Optional; e.g. `"central"`, `"site"`, `"hub"` |
| `host.name` | _(auto)_ | Always populated from `Environment.MachineName` |

The same `SiteId` and `NodeRole` values are passed to the Serilog enrichers (§5) so a metric, a span, and a log line from the same node carry identical dimensions and join up in any OTel-compatible backend.

## 3. Standard instrumentation

`AddZbTelemetry` enables the following instrumentation for all projects. Any project that already enables a subset gets it consolidated; no project may skip this baseline:

| Instrumentation | Package | Signal |
|---|---|---|
| ASP.NET Core | `OpenTelemetry.Instrumentation.AspNetCore` | Traces + Metrics |
| HttpClient | `OpenTelemetry.Instrumentation.Http` | Traces + Metrics |
| gRPC client | `OpenTelemetry.Instrumentation.GrpcNetClient` | Traces |
| .NET runtime | `OpenTelemetry.Instrumentation.Runtime` | Metrics |
| Process | `OpenTelemetry.Instrumentation.Process` | Metrics |

App-specific `Meter` names and `ActivitySource` names are registered via `o.Meters` and `o.ActivitySources`. This is how MxGateway's hand-rolled `GatewayMetrics` finally gets an export path instead of dying in an in-memory `GetSnapshot()`.

## 4. Exporter conventions

`ZbTelemetryOptions.Exporter` selects the export path:

| Value | Behaviour |
|---|---|
| `ZbExporter.Prometheus` | Mounts a Prometheus `/metrics` scrape endpoint via `app.MapZbMetrics()`. Default for all three apps — consistent with OtOpcUa's existing `/metrics`. |
| `ZbExporter.Otlp` | Exports to an OTLP endpoint specified by `o.OtlpEndpoint` (gRPC, `http://collector:4317`). Opt-in path to a real OTel Collector; coexists with Prometheus. |

Both exporters carry the shared `Resource`. OTLP is the path to a real backend (Tempo, Prometheus-remote-write, Loki); Prometheus covers the "scrape from the node" case that all three apps currently use or aspire to.

## 5. Serilog logging stack

`AddZbSerilog` is a companion extension in the `.Serilog` package.
It registers the Serilog application logger in DI and applies the fixed enrichers listed in the table below.

> **No process-global state.** `AddZbSerilog` does NOT set or freeze the static
> `Log.Logger`. It passes `preserveStaticLogger: true` to `AddSerilog` so the global
> logger is left entirely untouched. This keeps the library safe for multi-host processes
> (integration tests, Aspire, test suites) — calling `AddZbSerilog` twice in the same
> process must never throw "The logger is already frozen".
>
> The optional Stage-1 **bootstrap logger** (`CreateBootstrapLogger`) — a minimal
> console-only `Log.Logger` for capturing startup exceptions before `IConfiguration` is
> available — is the **application's responsibility**, not the library's. Apps that need
> it should add this to `Program.cs` before calling `AddZbSerilog`:
> ```csharp
> Log.Logger = new LoggerConfiguration().WriteTo.Console().CreateBootstrapLogger();
> ```

**Application logger (DI):** reads sinks and overrides from `IConfiguration` (`ReadFrom.Configuration`) and applies these fixed enrichers:

| Enricher | Property name | Source |
|---|---|---|
| `ZbLogEnricherNames.SiteId` | `"SiteId"` | `ZbTelemetryOptions.SiteId` |
| `ZbLogEnricherNames.NodeRole` | `"NodeRole"` | `ZbTelemetryOptions.NodeRole` |
| `ZbLogEnricherNames.NodeHostname` | `"NodeHostname"` | `Environment.MachineName` |
| `TraceContextEnricher` | `trace_id`, `span_id` | `Activity.Current` |
| `RedactionEnricher` | _(project-defined fields)_ | `ILogRedactor` implementation |

`SiteId` and `NodeRole` are bound from the same `ZbTelemetryOptions` object as the OTel `Resource`; `NodeHostname` is populated automatically from `Environment.MachineName` (not a caller-supplied option). All three identity properties appear on logs and metrics/traces alike, so signals from the same node carry identical dimensions.

When no `Activity.Current` is present (e.g. background services, startup), `TraceContextEnricher` emits nothing — it does not inject empty or zero values.
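
The "emit nothing when there is no activity" rule can be illustrated with a small enricher. This is a minimal sketch under stated assumptions, not the shared library's implementation: it assumes Serilog's `ILogEventEnricher` interface and W3C-format activities, and the class name `TraceContextEnricherSketch` is hypothetical.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch; the real TraceContextEnricher lives in the shared library.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;

        // No ambient Activity (background service, startup), or a non-W3C id format
        // whose TraceId would be all zeros: add no properties at all.
        if (activity is null || activity.IdFormat != ActivityIdFormat.W3C)
        {
            return;
        }

        // Property names match the enricher table: trace_id and span_id.
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("trace_id", new ScalarValue(activity.TraceId.ToHexString())));
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("span_id", new ScalarValue(activity.SpanId.ToHexString())));
    }
}
```

Returning early, rather than writing empty strings, keeps log queries such as "events with a `trace_id`" meaningful: the property is either a real identifier or absent.
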

`MinimumLevel` is set explicitly in code (default `Information`) and can be overridden via `IConfiguration` (`Serilog:MinimumLevel`). Sinks are fully config-driven: `ReadFrom.Configuration` reads `Serilog:WriteTo` from `appsettings.json` / environment.

> **Per-project config paths:** `AddZbSerilog` reads `Serilog:MinimumLevel` from `IConfiguration`.
> Callers that bind MinimumLevel from a different key (e.g. ScadaBridge's
> `ScadaBridge:Logging:MinimumLevel`) apply that override themselves before or after calling
> `AddZbSerilog`. The config key for MinimumLevel remains per-project; `AddZbSerilog` is not
> parameterized on it.

OTel log export is wired in the same call: logs flow through the OTel pipeline with the same `Resource` attached, making all three signals (metrics / traces / logs) available in a single backend.

## 6. Redaction seam — `ILogRedactor`

`ILogRedactor` is a single-method interface that receives the mutable log-event property dictionary and scrubs any fields that must not leave the process:

```csharp
public interface ILogRedactor
{
    void Redact(IDictionary properties);
}
```

`RedactionEnricher` applies a registered `ILogRedactor` on every log event. The seam is shared; the **policy** is per-project (which field names, which command types, which classification levels). MxGateway's existing `GatewayLogRedactor` is the reference implementation; it migrates to this seam during adoption.

If no `ILogRedactor` is registered, `RedactionEnricher` is a no-op. This preserves the operational property MxGateway already has (secrets never leave the process in log events) while making the plumbing reusable.

## 7. Per-project migration

| Project | Current state | Primary gaps | What normalizes |
|---|---|---|---|
| **OtOpcUa** | Full OTel SDK (`WithMetrics` + `WithTracing`); Prometheus `/metrics`; Serilog bootstrap; 7 instruments + 2 spans. | No `Resource` / `service.name` anywhere; no trace↔log correlation; no `SiteId`/`NodeRole` enrichers. | Call `AddZbTelemetry` (adds Resource; consolidates standard instrumentation); call `AddZbSerilog` (adds `TraceContextEnricher` + identity enrichers); remove bespoke Serilog bootstrap (if a pre-Build bootstrap logger is wanted, add `Log.Logger = new LoggerConfiguration()...CreateBootstrapLogger();` in `Program.cs`). |
| **MxGateway** | Hand-rolled `GatewayMetrics` (13 counters / 3 histograms `ms` / 4 gauges); in-memory snapshot only — no export; MEL logging with `GatewayLogScope` correlation + `GatewayLogRedactor`; no OTel SDK. | No OTel SDK; no export; `ms` histograms diverge from OTel semconv (`s`); MEL → Serilog migration; no Resource. | Call `AddZbTelemetry` (wires OTel SDK around existing `GatewayMetrics` — finally exports); call `AddZbSerilog` (replaces MEL; re-expresses `GatewayLogScope` as `LogContext.PushProperty`; moves `GatewayLogRedactor` behind `ILogRedactor`). Duration unit convergence (`ms`→`s`) tracked in GAPS. **This is the one adoption done now.** |
| **ScadaBridge** | `OpenTelemetry.Api` ref only (dangling — CVE-patch origin, zero usage); Serilog bootstrap (`LoggerConfigurationFactory`) with `SiteId`/`NodeRole`/`NodeHostname` enrichers. | No OTel SDK; no metrics; no tracing; no export; no trace↔log correlation. ScadaBridge's enricher property names are already the target names — migration is additive. | Call `AddZbTelemetry` (adds OTel SDK + metrics + traces + export); call `AddZbSerilog` (consolidates `LoggerConfigurationFactory`; adds `TraceContextEnricher`); if a pre-Build bootstrap logger is wanted, set `Log.Logger = new LoggerConfiguration()...CreateBootstrapLogger();` in `Program.cs`. |

> The MxGateway logging migration (`MEL → Serilog`, re-expressing `GatewayLogRedactor`
> behind `ILogRedactor`) is the **only sister-repo touch in scope for this release**. OtOpcUa
> and ScadaBridge adoption is deferred to the follow-on tracked in
> [`../GAPS.md`](../GAPS.md).

## 8. Acceptance (what "converged" means)

A project is converged when: (a) it calls `builder.AddZbTelemetry(o => ...)` with all required Resource attributes populated; (b) it calls `app.MapZbMetrics()` (or configures OTLP); (c) it calls `builder.AddZbSerilog(...)` and the `TraceContextEnricher` stamps `trace_id`/`span_id` on every log event emitted under an active `Activity`; (d) its `ILogRedactor` implementation (if applicable) is registered and applied by `RedactionEnricher`; (e) every node in the fleet is distinguishable by `service.name` + `site.id` + `node.role` in a collector or log aggregator.
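
Criterion (e) lends itself to a unit-level spot check. The following is a minimal sketch, assuming the Resource is built with the OpenTelemetry SDK's `ResourceBuilder`; how `AddZbTelemetry` actually builds it is not specified here, and the names `ResourceIdentitySketch`, `BuildResource`, and `MissingIdentityKeys` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenTelemetry.Resources;

public static class ResourceIdentitySketch
{
    // The three attributes criterion (e) relies on to tell nodes apart.
    private static readonly string[] IdentityKeys = { "service.name", "site.id", "node.role" };

    // Hypothetical stand-in for the Resource the shared bootstrap might build.
    public static Resource BuildResource(
        string serviceName, string? serviceVersion, string? siteId, string? nodeRole)
    {
        var attributes = new List<KeyValuePair<string, object>>
        {
            new("host.name", Environment.MachineName), // always populated
        };

        // Optional attributes are omitted entirely rather than written as empty strings.
        if (!string.IsNullOrWhiteSpace(siteId)) attributes.Add(new("site.id", siteId));
        if (!string.IsNullOrWhiteSpace(nodeRole)) attributes.Add(new("node.role", nodeRole));

        return ResourceBuilder.CreateEmpty()
            .AddService(serviceName, serviceNamespace: "ZB.MOM.WW", serviceVersion: serviceVersion)
            .AddAttributes(attributes)
            .Build();
    }

    // Returns the identity attribute keys that are absent from the Resource.
    public static IReadOnlyList<string> MissingIdentityKeys(Resource resource)
    {
        var present = resource.Attributes.Select(a => a.Key).ToHashSet(StringComparer.Ordinal);
        return IdentityKeys.Where(k => !present.Contains(k)).ToList();
    }
}
```

A test in each adopting project could assert that `MissingIdentityKeys` returns an empty list for its production configuration, which would catch a node deployed without `SiteId` or `NodeRole` before it reaches a collector.
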