# Observability (metrics / traces / logs)

Third normalized component under the operability cluster. **Goal: path to shared code.** Converge the three sister projects onto a common OpenTelemetry Resource, a shared Serilog bootstrap with unified enrichers, and a trace↔log correlation bridge, proposed as the `ZB.MOM.WW.Telemetry` library set (2 packages), while each project keeps its own application instruments and sink configuration.

- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- Metric naming reference: [`spec/METRIC-CONVENTIONS.md`](spec/METRIC-CONVENTIONS.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Telemetry.md`](shared-contract/ZB.MOM.WW.Telemetry.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)

## Why observability is a strong normalization candidate

All three projects instrument something, but in three completely different ways and at three very different levels of completeness. The divergences are structural:

- **OtOpcUa** has the full OpenTelemetry SDK (metrics + tracing), Prometheus export, and a bespoke Serilog enricher for driver-lifecycle correlation, but no Resource (`service.name` is never set) and no trace↔log bridge.
- **MxAccessGateway** has 20 hand-rolled instruments (counters, histograms, gauges) recording real production data, none of which ever leaves the process. No OTel SDK, no exporter, no tracing. Logging uses Microsoft.Extensions.Logging rather than Serilog, with a bespoke correlation-scope and redaction pipeline.
- **ScadaBridge** has zero application instruments. Its `OpenTelemetry.Api` reference is a CVE patch, not instrumentation. It does have the cleanest structured logging enricher set (`SiteId`/`NodeRole`/`NodeHostname`), but those properties exist only in Serilog, not in the OTel Resource, so logs and metrics cannot join in a backend.

Nobody sets a Resource. Nobody does trace↔log correlation. MxGateway's metrics are invisible.
ScadaBridge has no metrics at all. The common fix is a single `AddZbTelemetry(options)` call that: creates a shared Resource from a `service.name`/`site.id`/`node.role` options object; registers the project's own Meter/ActivitySource names with the OTel SDK; and exposes Prometheus `/metrics`. A companion `AddZbSerilog(options)` wires Serilog with the same options as enricher properties and adds `TraceContextEnricher` so logs carry `trace_id`/`span_id`.

The unifying hinge: the same identity triple (`service.name`/`site.id`/`node.role`) populates both the OTel Resource and the Serilog enrichers, so a metric, a span, and a log line from the same node carry identical dimensions and join up in a backend.

**Adopted across all three apps on 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, behaviour-preserving). Note: MxAccessGateway's MEL→Serilog migration was *not* actually done at library-build time despite an earlier claim; it landed in this adoption pass, along with the metrics export. See [`GAPS.md` → Adoption status — 2026-06-01](GAPS.md) for the per-repo result, the accepted scope decisions (ScadaBridge keeps `LoggerConfigurationFactory`; MxGateway keeps its log-scope code), and the deferred follow-ons.

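The identity hinge described above can be illustrated with a minimal sketch. It is not the library's implementation: `NodeIdentity` and `IdentityProjection` are hypothetical names standing in for the identity portion of `ZbTelemetryOptions`, whose real shape is not shown in this document, and the sample values are invented. The sketch uses only the .NET base class library and shows one object being projected into both the Resource attribute keys and the Serilog enricher property names listed in this README.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the identity fields of ZbTelemetryOptions.
public sealed record NodeIdentity(
    string ServiceName, string SiteId, string NodeRole, string NodeHostname);

public static class IdentityProjection
{
    // Keys follow the Resource attribute names listed in this README.
    public static IReadOnlyDictionary<string, object> ToResourceAttributes(NodeIdentity id) =>
        new Dictionary<string, object>
        {
            ["service.name"] = id.ServiceName,
            ["site.id"] = id.SiteId,
            ["node.role"] = id.NodeRole,
            ["host.name"] = id.NodeHostname,
        };

    // Keys follow the Serilog enricher property names listed in this README.
    public static IReadOnlyDictionary<string, object> ToLogProperties(NodeIdentity id) =>
        new Dictionary<string, object>
        {
            ["SiteId"] = id.SiteId,
            ["NodeRole"] = id.NodeRole,
            ["NodeHostname"] = id.NodeHostname,
        };

    public static void Main()
    {
        // Sample values are illustrative only.
        var identity = new NodeIdentity(
            "example-service", "site-01", "Central", Environment.MachineName);

        foreach (var kv in ToResourceAttributes(identity))
            Console.WriteLine($"resource {kv.Key} = {kv.Value}");

        foreach (var kv in ToLogProperties(identity))
            Console.WriteLine($"log      {kv.Key} = {kv.Value}");
    }
}
```

Because both projections read from the same instance, the site and role values on a metric and on a log line cannot drift apart, which is the property that lets a backend join them.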
## Status by project

| Project | OTel SDK today | Metrics today | Tracing today | Logging today | Enrichers today | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | ✅ full SDK via `AddZbTelemetry` | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog via `AddZbSerilog` (sinks in `appsettings`) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope, kept) + shared | ✅ **Adopted 2026-06-01** |
| **MxAccessGateway** | ✅ `AddZbTelemetry` exports `GatewayMetrics` | ✅ 20 instruments (`mxgateway.*`) now exported; new `/metrics` | ⛔ none | ✅ **Serilog (migrated from MEL in this pass)** | `SiteId`/`NodeRole`/`NodeHostname` via `AddZbSerilog`; `GatewayLogScope` kept; `ILogRedactor` seam | ✅ **Adopted 2026-06-01** |
| **ScadaBridge** | ✅ `AddZbTelemetry` (both roots) | ✅ Resource + std instrumentation; `/metrics` (Central) | ⛔ none | Serilog via `LoggerConfigurationFactory` (kept) + shared `TraceContextEnricher` | `SiteId`/`NodeRole`/`NodeHostname` (process-level) + trace context | ✅ **Adopted 2026-06-01** (logging via factory, not `AddZbSerilog`; see GAPS) |

See each project's [`current-state//CURRENT-STATE.md`](current-state/) for the code-verified detail and its adoption plan.

## Normalized vs. left per-project

**Normalized (the shared target):**

- `AddZbTelemetry(ZbTelemetryOptions)`: front door for the OTel SDK. Populates the shared Resource (`service.name`, `service.namespace`, `service.version`, `site.id`, `node.role`, `host.name`). Registers the caller-supplied Meter and ActivitySource name(s). Wires standard instrumentation (ASP.NET Core, HttpClient, runtime, process). Prometheus default; OTLP opt-in.
- `app.MapZbMetrics()`: maps the Prometheus `/metrics` endpoint (shared path + shared exporter).
- `AddZbSerilog(ZbTelemetryOptions)`: shared Serilog two-stage bootstrap generalizing ScadaBridge's `LoggerConfigurationFactory`.
  Wires `SiteId`/`NodeRole`/`NodeHostname` enrichers from the same options object as the OTel Resource. Wires `TraceContextEnricher` (`trace_id`/`span_id` from `Activity.Current`). Preserves `ReadFrom.Configuration` for sinks and the explicit `MinimumLevel.Is` override.
- `ILogRedactor` seam: generalized from MxGateway's `GatewayLogRedactor`. The seam is shared; the redaction policy (which fields/commands) stays per-project.
- Metric naming convention: `..`; Meter name = project namespace (`ZB.MOM.WW.`); duration unit = `s` (OTel semconv).

**Left per-project (not forced together):**

- Application `Meter`, `ActivitySource`, and all instrument definitions: `otopcua.*`, `mxgateway.*`, `scadabridge.*` instruments are owned by each repo.
- Serilog sink configuration (`appsettings.json` Console/File templates, rolling intervals).
- Per-operation/per-session correlation enrichers (`LogContextEnricher` in OtOpcUa; `LogContext.PushProperty` scope in MxGateway after migration).
- Redaction policies (`MxGatewayLogRedactor` implements `ILogRedactor` with gateway-specific command/field rules).
- Config section paths for `SiteId`/`NodeRole`/`NodeHostname`: each project binds these from its own config hierarchy and passes the resolved values to `AddZbTelemetry`/`AddZbSerilog`.

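The trace↔log bridge in the normalized list reads `trace_id`/`span_id` from `Activity.Current`. The sketch below shows only that lookup, using the base class library; it is not the shared `TraceContextEnricher` itself, which presumably wraps the same lookup in a Serilog enricher. `TraceContextSketch`, `CurrentTraceProperties`, and the `Example.Source` name are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TraceContextSketch
{
    // Returns trace_id/span_id for the ambient Activity, or an empty map
    // when no Activity is current (for example, outside any span).
    public static IReadOnlyDictionary<string, string> CurrentTraceProperties()
    {
        var activity = Activity.Current;
        if (activity is null)
            return new Dictionary<string, string>();

        return new Dictionary<string, string>
        {
            ["trace_id"] = activity.TraceId.ToHexString(),
            ["span_id"] = activity.SpanId.ToHexString(),
        };
    }

    public static void Main()
    {
        // Without a listener (normally supplied by the OTel SDK),
        // ActivitySource.StartActivity returns null and nothing is recorded.
        using var listener = new ActivityListener
        {
            ShouldListenTo = _ => true,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);

        using var source = new ActivitySource("Example.Source"); // hypothetical name

        // No span is active yet, so there is nothing to attach to a log event.
        Console.WriteLine($"outside span: {CurrentTraceProperties().Count} properties");

        using (source.StartActivity("demo-operation"))
        {
            // Inside the span, every log event could carry these two properties.
            foreach (var kv in CurrentTraceProperties())
                Console.WriteLine($"{kv.Key}={kv.Value}");
        }
    }
}
```

Returning an empty map when no `Activity` is current keeps log events emitted outside a span free of placeholder ids, which is one reasonable choice; the shared enricher may behave differently.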
## Package structure

`ZB.MOM.WW.Telemetry` ships as two dependency-split packages:

| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Telemetry` | `AddZbTelemetry`, `ZbTelemetryOptions`, Resource builder, standard instrumentation, Prometheus/OTLP exporters, `app.MapZbMetrics()` | All three |
| `ZB.MOM.WW.Telemetry.Serilog` | `AddZbSerilog`, shared enrichers (`SiteId`/`NodeRole`/`NodeHostname`/`TraceContextEnricher`), `ILogRedactor` seam | All three (Serilog users); MxGateway on migration |

Both packages share `ZbTelemetryOptions` as the single options object that drives Resource attributes, Serilog enrichers, Meter/ActivitySource names, and exporter selection. This is the unifying hinge that makes a metric, a span, and a log line from the same node carry identical dimensions.

## Component status

**Status: Built @ 0.1.0 and published to the Gitea NuGet feed. Adopted across all three apps on 2026-06-01** (OtOpcUa, MxAccessGateway, ScadaBridge; branch `feat/adopt-zb-telemetry` per repo). The MxAccessGateway MEL→Serilog migration and metrics export both landed in this pass (they were not actually done beforehand despite an earlier claim). Per-repo result + deferred follow-ons: [`GAPS.md` → Adoption status — 2026-06-01](GAPS.md).

The shared library lives at [`~/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`](../../ZB.MOM.WW.Telemetry/) (.NET 10; 2 packages: `ZB.MOM.WW.Telemetry` and `ZB.MOM.WW.Telemetry.Serilog`; 19 tests; `dotnet pack` → 2 nupkgs @ 0.1.0).

Build/test/pack from `ZB.MOM.WW.Telemetry/`:

```bash
dotnet test ZB.MOM.WW.Telemetry.slnx
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
```