# Adopt `ZB.MOM.WW.Telemetry` across the three sister apps — design **Date:** 2026-06-01 **Status:** Approved (design); implementation plan to follow via writing-plans. **Scope:** Integrate the built-but-unadopted `ZB.MOM.WW.Telemetry` (+ `.Serilog`) shared library into all three sister apps — **OtOpcUa**, **MxAccessGateway**, **ScadaBridge** — wiring the shared OpenTelemetry Resource, standard instrumentation, Prometheus `/metrics`, and the shared Serilog bootstrap with identity enrichers and trace↔log correlation. This is the second full cross-fleet adoption of one of the six shared `ZB.MOM.WW.*` libraries (after `ZB.MOM.WW.Health`). It follows the adoption backlog in [`components/observability/GAPS.md`](../../components/observability/GAPS.md), re-verified against current code on 2026-06-01. > **Correction recorded during design:** the library CLAUDE.md and > [`components/observability/README.md`](../../components/observability/README.md) claim > *"MxAccessGateway logging adopted (MEL → Serilog migration done on its own branch)."* This is > **false on `main`** — MxGateway is still MEL-only (no Serilog packages, `GatewayLogScope` / > `GatewayLogRedactor` still bespoke), and its `MxGateway.Server` meter is **not exported at all** > (no `AddOpenTelemetry`, no `/metrics`). That branch never landed. This design therefore includes > the full MxGateway MEL→Serilog migration, and the bookkeeping task corrects the false claim. --- ## 1. Goal & scope Wire the two shared packages into all three apps: - **`ZB.MOM.WW.Telemetry`** — `AddZbTelemetry(options)`: shared OTel Resource (the identity triple `service.name` / `site.id` / `node.role` + `service.namespace` / `service.version` / `host.name`), caller-supplied Meters/ActivitySources, standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), Prometheus always-on exporter (OTLP opt-in), and `app.MapZbMetrics()` to mount `/metrics`. - **`ZB.MOM.WW.Telemetry.Serilog`** — `AddZbSerilog(options)`: two-stage Serilog bootstrap, `ReadFrom.Configuration` sinks, `SiteId`/`NodeRole`/`NodeHostname` enrichers, `TraceContextEnricher` (writes `trace_id`/`span_id` from `Activity.Current`), and the `ILogRedactor` seam via `RedactionEnricher`. Uses `preserveStaticLogger: true` so it is test-safe. **The headline gap (§1 of GAPS):** *no* app sets a single OTel Resource attribute today, so every metric and span from every node is indistinguishable in a backend — no service identity, no site/role topology, no version label. `AddZbTelemetry` closes this for all three at once. This is the single highest-value observability gap across the fleet. **Behaviour-preserving bar** (same as the Health adoption): same log messages at the same levels, same metric series with the same names and units, same `/metrics` path. New series produced by standard instrumentation are *additive*. All genuinely breaking items are **deferred** (see §6). --- ## 2. Distribution - **Feed:** Gitea NuGet registry `dohertj2-gitea` (`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`). Credentials live **creds-only at the user level** (`~/.nuget/NuGet/NuGet.Config` ``), matched by source name — **never committed to any repo**. Already configured during the Health round; no change needed here. - **Source-mapping — the two-pattern gotcha (carried from Health):** under `packageSourceMapping`, the glob `ZB.MOM.WW.Telemetry.*` matches `ZB.MOM.WW.Telemetry.Serilog` but **not** the bare core id `ZB.MOM.WW.Telemetry`. Each repo therefore needs **both**: ```xml ``` - **Per-repo wiring:** | Repo | CPM? | Change | |---|---|---| | OtOpcUa | yes (`Directory.Packages.props`) | add 2 `` @ `0.1.0`; extend existing `NuGet.config` mapping with both Telemetry patterns; add 2 versionless `` to the Host csproj | | ScadaBridge | yes | add 2 `` @ `0.1.0`; extend existing `nuget.config` mapping; add 2 versionless `` to the Host csproj | | MxAccessGateway | **no CPM** | add 2 direct versioned `` to the Server csproj; extend its `nuget.config` mapping (the file created during the Health round) | - **Task 0 (gating, like Health):** the library docs claim these two packages are already on the feed. **Verify first; pack + push the two `.nupkg`s if missing** — the Health round proved this claim cannot be trusted. - **Serilog version floor (Gap V1):** OtOpcUa pins `Serilog.AspNetCore` 9.0.0, ScadaBridge 10.0.0. Confirm the `.Serilog` package's Serilog dependency floor is satisfied by both (bump if not), and pick MxGateway's fresh `Serilog.AspNetCore` version to align. --- ## 3. Per-app adoption surface ### OtOpcUa (`master`) — moderate Already has Serilog (inline `UseSerilog`), full OTel, and Prometheus `/metrics`. - **Metrics/traces:** replace the hand-rolled `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs` (`AddOpenTelemetry().WithMetrics(...AddPrometheusExporter()).WithTracing(...)` + `MapPrometheusScrapingEndpoint("/metrics")`) with ```csharp builder.AddZbTelemetry(o => { o.ServiceName = "otopcua"; o.ServiceVersion = /* AssemblyInformationalVersion */; o.Meters = ["ZB.MOM.WW.OtOpcUa"]; o.ActivitySources = ["ZB.MOM.WW.OtOpcUa"]; // Exporter defaults to Prometheus }); // ... app.MapZbMetrics(); ``` **Same meter/source names and same `/metrics` path** → behaviour-preserving; *gains* the Resource identity + standard instrumentation. (OtOpcUa records spans but has no trace exporter today; Prometheus is metrics-only, so traces remain a no-op exporter-wise — unchanged. OTLP trace wiring is deferred, §6.) - **Logging:** replace the inline `builder.Host.UseSerilog((ctx, lc) => lc.ReadFrom.Configuration(...).WriteTo.Console().WriteTo.File(...))` with `builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; })`, moving the Console/File sinks into `appsettings` `Serilog:WriteTo` so `ReadFrom.Configuration` reproduces them. Keep the existing driver-scope `LogContextEnricher` alongside the shared enrichers. - **Identity:** `ServiceName="otopcua"`; `SiteId`/`NodeRole` omitted (none in config). ### ScadaBridge (`main`) — moderate, two composition roots Serilog already (via `LoggerConfigurationFactory`); **no OTel at all**; `SiteId` + `NodeRole` already read from config (`ScadaBridge:Node:*`, `NodeOptions`). - **Metrics:** add `builder.AddZbTelemetry(o => { o.ServiceName="scadabridge"; o.SiteId=siteId; o.NodeRole=nodeRole; })` + `app.MapZbMetrics()` in **both** composition roots — the Central block and the Site block of `Program.cs` (the same two-root pattern the Health adoption used). `Meters=[]` for now (app instruments are deferred, §6). Purely additive — no metrics exist today to break. - **Logging:** replace `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` + `builder.Host.UseSerilog()` with `builder.AddZbSerilog(o => { o.ServiceName="scadabridge"; o.SiteId=siteId; o.NodeRole=nodeRole; })` — its enrichers reproduce the factory's `SiteId`/`NodeRole`/`NodeHostname`. Keep a minimal `CreateBootstrapLogger()` line for early-startup capture per the library's documented pattern, then delete `LoggerConfigurationFactory`. Verify the existing sinks are config-driven (`Serilog` section in `appsettings`) so the swap is byte-equivalent; mirror any code-side sinks into config. ### MxAccessGateway (`main`) — heaviest (the MEL→Serilog migration) MEL-only; custom `MxGateway.Server` meter **not exported**; no `/metrics`. The x86 net48 worker is a separate process and **out of scope** — telemetry is for the Server. - **Logging (MEL → Serilog):** - Add Serilog packages (`Serilog.AspNetCore` + sinks) to the Server csproj (direct versioned ref). - Replace the temporary `LoggerFactory.Create(...)` MEL bootstrap in `GatewayApplication.cs` (and `builder.Logging` config) with `builder.AddZbSerilog(o => { o.ServiceName="mxgateway"; })` + a `CreateBootstrapLogger()` line. - `GatewayLogScope` → `Serilog.Context.LogContext.PushProperty(...)`. - `GatewayLogRedactor` → implement the `ILogRedactor` seam, register in DI (picked up by `RedactionEnricher`). - Request-logging middleware → `UseSerilogRequestLogging()` (or keep the middleware but emit via a Serilog `ILogger`). Sinks to `appsettings`. - **Metrics:** `builder.AddZbTelemetry(o => { o.ServiceName="mxgateway"; o.Meters=["MxGateway.Server"]; })` + `app.MapZbMetrics()` → the 20 existing instruments (13 counters, 3 histograms, 4 gauges) finally export. **Keep the `MxGateway.Server` meter name and the `ms` histogram units** (rename and unit conversion are deferred, §6). `GetSnapshot()` in-memory read path stays untouched. --- ## 4. Shared seam ``` ZbTelemetryOptions (ServiceName / SiteId / NodeRole / Meters / ActivitySources / Exporter) │ ┌─────────────────┴──────────────────┐ AddZbTelemetry (core) AddZbSerilog (.Serilog) • ZbResource (identity triple) • ReadFrom.Configuration sinks • app Meters + ActivitySources • SiteId / NodeRole / NodeHostname enrichers • standard instrumentation • TraceContextEnricher (trace_id / span_id) • Prometheus always + OTLP opt-in • ILogRedactor seam (RedactionEnricher) │ │ app.MapZbMetrics() → /metrics preserveStaticLogger: true (test-safe) ``` Both packages share the single `ZbTelemetryOptions`. The Serilog OTLP log sink derives its Resource attributes from `ZbResource.BuildAttributes` (single source of truth), so logs can never drift from metrics and traces in a backend. --- ## 5. Sequencing & execution Subagent-driven, classification-driven review chain. **Task 0 gates everything** (verify/publish the feed). Then three **independent** per-repo phases — each its own git repo, branch **`feat/adopt-zb-telemetry`**, commit per task, **never skip hooks, never force-push**: 1. **Task 0 (gating):** verify the two Telemetry `.nupkg`s are on the Gitea feed; pack + push if missing (creds-only user config, already set). 2. **OtOpcUa:** source-mapping + package refs → `AddZbTelemetry` swap → `AddZbSerilog` swap → tests. 3. **ScadaBridge:** source-mapping + package refs → `AddZbTelemetry` (both roots) → `AddZbSerilog` (replace `LoggerConfigurationFactory`) → tests. 4. **MxAccessGateway:** source-mapping + package refs → **MEL→Serilog** (sub-tasked, `high-risk`) → `AddZbTelemetry` metrics export → tests. 5. **scadaproj bookkeeping:** add an "Adoption status — DONE" section to `components/observability/GAPS.md` (per-repo table + deferred items), **and correct the false "MxGateway logging already adopted" claim** in CLAUDE.md, the library CLAUDE.md, and `components/observability/README.md`. The MxGateway MEL→Serilog migration is the one `high-risk` change (logging behaviour on the most operational app) and gets the full spec→code serial review chain. The other per-app swaps are `standard`. --- ## 6. Deferred (out of scope this round; recorded in GAPS) | # | Item | Why deferred | |---|---|---| | #6 | MxGateway histogram `ms` → `s` | Breaking dashboard/alert change — needs ops coordination | | #7 | MxGateway meter rename `MxGateway.Server` → `ZB.MOM.WW.MxGateway` | Breaking Prometheus label change — needs ops coordination | | #9 | ScadaBridge app instruments (`ScadaBridgeTelemetry` + `scadabridge.*`) | Application-specific work, not shared-library adoption | | #10 | OtOpcUa OTLP exporter alongside Prometheus | Opt-in; no consumer for OTLP yet | | #11 | OtOpcUa trace-export no-op (spans recorded, no exporter) | Resolved by #10 / OTLP; or document | None of these block the behaviour-preserving initial adoption. --- ## 7. Testing All tests run **offline** — Prometheus is in-process, no OTLP collector required, and the library's own test suites are network-free. - **OtOpcUa:** assert `/metrics` is still served, the `ZB.MOM.WW.OtOpcUa` meter is present, the Resource carries `service.name`, and the shared Serilog enrichers are wired. - **ScadaBridge:** assert `/metrics` is served in **both** roles, the logger carries `SiteId`/`NodeRole` enrichers, and startup is clean after `LoggerConfigurationFactory` removal. - **MxAccessGateway** (the careful one): assert log messages are still emitted at the same levels, redaction still applies, request logging still fires, `/metrics` is now served, and the `GetSnapshot()` path is unchanged — using the existing fake-worker test harness (no MXAccess needed). --- ## 8. Acceptance bar - Each app builds and its test suite is green. - `/metrics` serves the same existing series (plus additive standard-instrumentation series); meter names and units unchanged. - Logs carry the same messages at the same levels, plus the shared identity enrichers and `trace_id`/`span_id` correlation. - No secrets committed to any repo (the Gitea token stays creds-only at the user level). - `components/observability/GAPS.md` updated; the false "MxGateway logging adopted" claim corrected.