From dee55aadc64176cceadf0333901ce6ed44fdc146 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Mon, 1 Jun 2026 15:58:10 -0400 Subject: [PATCH] docs(observability): record ZB.MOM.WW.Telemetry adoption across 3 apps; correct false MxGateway logging-status claim MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All 3 apps adopted on branch feat/adopt-zb-telemetry (behaviour-preserving). Records the per-repo result + accepted scope deviations (ScadaBridge keeps LoggerConfigurationFactory + TraceContextEnricher instead of AddZbSerilog; MxGateway keeps GatewayLogScope, exposes redaction via ILogRedactor seam) and deferred follow-ons (#6 ms->s, #7 meter rename, #9 app instruments, OTLP, and the new ScadaBridge Site-node HTTP/1.1 metrics-listener item). Corrects the prior false 'MxGateway logging adopted on its own branch' claim — that migration actually landed in this pass. --- CLAUDE.md | 11 ++++++-- ZB.MOM.WW.Telemetry/CLAUDE.md | 23 +++++++++------- components/observability/GAPS.md | 44 ++++++++++++++++++++++++++++++ components/observability/README.md | 21 +++++++++----- 4 files changed, 79 insertions(+), 20 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 3060579..047f95f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -183,9 +183,14 @@ enrichers, and redaction policies. The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) (.NET 10; 2 packages — `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog`; 19 tests; -`dotnet pack` → 2 nupkgs @ 0.1.0). **MxAccessGateway logging adopted** (MEL → Serilog migration done on -its own branch) — the one in-pass adoption. Broader OtOpcUa and ScadaBridge telemetry adoption is -follow-on, tracked in [`components/observability/GAPS.md`](components/observability/GAPS.md). +`dotnet pack` → 2 nupkgs @ 0.1.0). **Adopted across all three apps on 2026-06-01** (branch +`feat/adopt-zb-telemetry` per repo, behaviour-preserving): `AddZbTelemetry` (Resource + standard +instrumentation + Prometheus `/metrics`) everywhere; OtOpcUa + MxGateway on `AddZbSerilog` (MxGateway's +MEL→Serilog migration + metrics export both landed in this pass — they were *not* actually done +beforehand despite an earlier claim); ScadaBridge keeps its `LoggerConfigurationFactory` (min-level +governance) and only adds the shared `TraceContextEnricher`. Deferred: MxGateway `ms`→`s` + Meter +rename, ScadaBridge app instruments + Site-node HTTP/1.1 metrics listener, OTLP wiring. Per-repo +result tracked in [`components/observability/GAPS.md`](components/observability/GAPS.md). Build/test from `ZB.MOM.WW.Telemetry/`: `dotnet test`. Consumer matrix: all three apps consume both packages after adoption (OtOpcUa, MxGateway Server, ScadaBridge Host + any instrumented project). diff --git a/ZB.MOM.WW.Telemetry/CLAUDE.md b/ZB.MOM.WW.Telemetry/CLAUDE.md index d54940d..606377b 100644 --- a/ZB.MOM.WW.Telemetry/CLAUDE.md +++ b/ZB.MOM.WW.Telemetry/CLAUDE.md @@ -4,7 +4,7 @@ Observability libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGat The library normalizes the three-project observability surface: a shared OpenTelemetry Resource driven by a single identity triple (`service.name` / `site.id` / `node.role`), standard instrumentation wiring, Prometheus and OTLP export, and a Serilog bootstrap with enrichers and `TraceContextEnricher` for trace↔log correlation. -**Built at 0.1.0. MxAccessGateway logging adopted (MEL → Serilog migration done on its own branch). OtOpcUa and ScadaBridge telemetry adoption is follow-on.** Adoption tracked in `~/Desktop/scadaproj/components/observability/GAPS.md`. +**Built at 0.1.0, published to the Gitea NuGet feed, and adopted across all three apps on 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, behaviour-preserving). MxAccessGateway's MEL→Serilog migration + metrics export both landed in this pass — they were *not* actually done beforehand despite the earlier claim. ScadaBridge keeps its `LoggerConfigurationFactory` (min-level governance) and only adds the shared `TraceContextEnricher`; it does not call `AddZbSerilog`. Per-repo result + deferred follow-ons tracked in `~/Desktop/scadaproj/components/observability/GAPS.md`. --- @@ -21,12 +21,13 @@ The library normalizes the three-project observability surface: a shared OpenTel | Consumer | `ZB.MOM.WW.Telemetry` (core) | `ZB.MOM.WW.Telemetry.Serilog` | |---|:---:|:---:| -| **OtOpcUa** | yes (after adoption) | yes (after adoption) | -| **MxAccessGateway** | yes (after adoption) | yes (MEL → Serilog adopted now) | -| **ScadaBridge** | yes (after adoption) | yes (after adoption) | +| **OtOpcUa** | ✅ adopted | ✅ adopted (`AddZbSerilog`) | +| **MxAccessGateway** | ✅ adopted (`GatewayMetrics` exported) | ✅ adopted (MEL→Serilog migrated in this pass) | +| **ScadaBridge** | ✅ adopted (both roots) | ⚠️ referenced for `TraceContextEnricher` only — keeps `LoggerConfigurationFactory`, does **not** call `AddZbSerilog` | -MxAccessGateway's logging adoption is the one in-pass migration. Full metrics/tracing wiring -for all three apps is follow-on. +All three adopted on 2026-06-01 (branch `feat/adopt-zb-telemetry` per repo). ScadaBridge's logging +deviates: it keeps its own `LoggerConfigurationFactory` (min-level governance contract) and only +adds the shared `TraceContextEnricher`. See `components/observability/GAPS.md` for the full result. --- @@ -60,11 +61,13 @@ All test assemblies run offline: ## Status -Built at **0.1.0** and published to the Gitea NuGet feed. MxAccessGateway logging (MEL → Serilog) -adopted on its own branch. **OtOpcUa and ScadaBridge telemetry adoption not yet started** — -tracked in the component backlog: +Built at **0.1.0**, published to the Gitea NuGet feed, and **adopted across all three apps on +2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, behaviour-preserving). MxAccessGateway's +MEL→Serilog migration and metrics export both landed in this pass (not beforehand, despite the +earlier claim). Deferred follow-ons (MxGateway `ms`→`s` + Meter rename, ScadaBridge app instruments ++ Site-node HTTP/1.1 metrics listener, OTLP wiring) are tracked in the component backlog: -- `~/Desktop/scadaproj/components/observability/GAPS.md` — adoption order, effort, and risk +- `~/Desktop/scadaproj/components/observability/GAPS.md` — adoption status + deferred follow-ons Design documentation: diff --git a/components/observability/GAPS.md b/components/observability/GAPS.md index b679c74..4d47256 100644 --- a/components/observability/GAPS.md +++ b/components/observability/GAPS.md @@ -181,3 +181,47 @@ app is opt-in and tracked here, not forced. unit migration (Gap U1) and the Meter rename (Gap N1) are deferred from the initial MxGateway adoption (Task #9). They are breaking dashboard/alert changes requiring ops coordination and are tracked as separate backlog items #6 and #7 in the adoption backlog above. + +## Adoption status — 2026-06-01 (DONE) + +`ZB.MOM.WW.Telemetry` + `ZB.MOM.WW.Telemetry.Serilog` (`0.1.0`) were adopted across **all three** +sister apps in one pass, behaviour-preserving. Each adoption landed on a per-repo branch +`feat/adopt-zb-telemetry` (one commit per task). Plan + design: +[`docs/plans/2026-06-01-telemetry-library-adoption.md`](../../docs/plans/2026-06-01-telemetry-library-adoption.md). + +> **Correction:** the prior claim that *"MxAccessGateway logging was adopted (MEL → Serilog) on its +> own branch"* was **false on `main`** — MxGateway was still MEL-only, and its `MxGateway.Server` +> meter was never exported. The full MEL→Serilog migration **and** the metrics export both landed +> in this 2026-06-01 pass. + +| Repo | `AddZbTelemetry` (Resource + std instrumentation + Prometheus) | `/metrics` | Logging | Meter (unchanged) | +|---|---|---|---|---| +| **OtOpcUa** | ✅ replaced hand-rolled `ObservabilityExtensions` | ✅ `/metrics` (path unchanged) | ✅ `AddZbSerilog` (sinks moved to `appsettings`; `LogContextEnricher` kept) | `ZB.MOM.WW.OtOpcUa` | +| **ScadaBridge** | ✅ added in `BindSharedOptions` (both Central + Site roots) | ✅ Central; mapped on Site too (see follow-on) | ⚠️ **kept `LoggerConfigurationFactory`** + added shared `TraceContextEnricher` — did **not** adopt `AddZbSerilog` | (none yet; #9) | +| **MxAccessGateway** | ✅ exports existing `GatewayMetrics` | ✅ new `/metrics` | ✅ MEL→`AddZbSerilog`; `GatewayLogRedactor` exposed via `ILogRedactor` seam (`GatewayLogRedactorSeam`); `GatewayLogScope`/middleware kept as-is | `MxGateway.Server` (name + `ms` units unchanged) | + +### Accepted scope decisions (deviations from the original backlog) + +- **ScadaBridge keeps `LoggerConfigurationFactory` (backlog #5 revised).** The factory implements a + documented governance contract (REQ-HOST-8 / Host-011/014/020/022): `ScadaBridge:Logging:MinimumLevel` + is the floor and **overrides** `Serilog:MinimumLevel`, with operator warnings. `AddZbSerilog` + hard-codes `MinimumLevel.Is(Information)` before `ReadFrom.Configuration`, which would invert that + precedence and silently drop the knob. So ScadaBridge keeps the factory and only **adds the shared + `TraceContextEnricher`** to it — gaining trace↔log correlation without regressing the contract. Full + `AddZbSerilog` adoption for ScadaBridge would first require teaching the shared bootstrap to accept a + caller-supplied minimum-level governance hook. +- **MxGateway keeps `GatewayLogScope` + request-logging middleware as-is.** The Serilog MEL provider + captures MEL `BeginScope` dictionaries as structured properties, so the scope/correlation code keeps + producing the same properties under Serilog. Only the provider swap + the `ILogRedactor` adapter were + needed. + +### Deferred (still open follow-ons) + +- **#6** MxGateway histogram `ms`→`s`, **#7** Meter rename `MxGateway.Server`→`ZB.MOM.WW.MxGateway` + (both break dashboards — ops-coordinated). +- **#9** ScadaBridge application instruments (`ScadaBridgeTelemetry` + `scadabridge.*`). +- **#10/#11** OTLP exporter wiring; OtOpcUa trace export is still a no-op (Prometheus is metrics-only). +- **NEW — ScadaBridge Site-node `/metrics` scrape:** the Site role's Kestrel is HTTP/2-only (gRPC), + so the mapped `/metrics` is not HTTP/1.1-scrapable on that listener. The in-process metrics + Resource + still apply; Central serves `/metrics` normally. A follow-on should add a dedicated HTTP/1.1 (or + `Http1AndHttp2`) listener/port for site-node scraping. diff --git a/components/observability/README.md b/components/observability/README.md index 8ad929a..ed73b1e 100644 --- a/components/observability/README.md +++ b/components/observability/README.md @@ -40,16 +40,20 @@ Serilog with the same options as enricher properties and adds `TraceContextEnric `node.role`) populates both the OTel Resource and the Serilog enrichers, so a metric, a span, and a log line from the same node carry identical dimensions and join up in a backend. -One adoption happens **in this task**: MxAccessGateway migrates off MEL onto `AddZbSerilog`. All -other app wiring is follow-on, consistent with how Auth and UI-Theme are structured. +**Adopted across all three apps on 2026-06-01** (branch `feat/adopt-zb-telemetry` per repo, +behaviour-preserving). Note: MxAccessGateway's MEL→Serilog migration was *not* actually done at +library-build time despite an earlier claim — it landed in this adoption pass, along with the +metrics export. See [`GAPS.md` → Adoption status — 2026-06-01](GAPS.md) for the per-repo result, +the accepted scope decisions (ScadaBridge keeps `LoggerConfigurationFactory`; MxGateway keeps its +log-scope code), and the deferred follow-ons. ## Status by project | Project | OTel SDK today | Metrics today | Tracing today | Logging today | Enrichers today | Adoption status | |---|---|---|---|---|---|---| -| **OtOpcUa** | ✅ full SDK (`WithMetrics`+`WithTracing`) | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog (Console+File) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope) | Not started (follow-on) | -| **MxAccessGateway** | ⛔ none (hand-rolled `Meter`) | 🟡 20 instruments (`mxgateway.*`); **never exported** | ⛔ none | **Serilog (migrated from MEL — adopted)** | `SiteId`/`NodeRole`/`NodeHostname` (via `AddZbSerilog`); session/worker enrichers via `LogContext.PushProperty` | **Logging adopted; OTel metrics/traces follow-on** | -| **ScadaBridge** | ⛔ (`OpenTelemetry.Api` CVE-patch only) | ⛔ zero instruments | ⛔ none | Serilog (Console+File) | `SiteId`/`NodeRole`/`NodeHostname` (process-level; strongest set) | Not started (follow-on) | +| **OtOpcUa** | ✅ full SDK via `AddZbTelemetry` | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog via `AddZbSerilog` (sinks in `appsettings`) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope, kept) + shared | ✅ **Adopted 2026-06-01** | +| **MxAccessGateway** | ✅ `AddZbTelemetry` exports `GatewayMetrics` | ✅ 20 instruments (`mxgateway.*`) now exported; new `/metrics` | ⛔ none | ✅ **Serilog (migrated from MEL in this pass)** | `SiteId`/`NodeRole`/`NodeHostname` via `AddZbSerilog`; `GatewayLogScope` kept; `ILogRedactor` seam | ✅ **Adopted 2026-06-01** | +| **ScadaBridge** | ✅ `AddZbTelemetry` (both roots) | ✅ Resource + std instrumentation; `/metrics` (Central) | ⛔ none | Serilog via `LoggerConfigurationFactory` (kept) + shared `TraceContextEnricher` | `SiteId`/`NodeRole`/`NodeHostname` (process-level) + trace context | ✅ **Adopted 2026-06-01** (logging via factory, not `AddZbSerilog` — see GAPS) | See each project's [`current-state//CURRENT-STATE.md`](current-state/) for the code-verified detail and its adoption plan. @@ -100,8 +104,11 @@ hinge that makes a metric, a span, and a log line from the same node carry ident ## Component status -**Status: Built @ 0.1.0. MxAccessGateway MEL → Serilog logging adopted (on its own branch). -OtOpcUa and ScadaBridge telemetry adoption is follow-on, tracked in [`GAPS.md`](GAPS.md).** +**Status: Built @ 0.1.0 and published to the Gitea NuGet feed. Adopted across all three apps on +2026-06-01** (OtOpcUa, MxAccessGateway, ScadaBridge — branch `feat/adopt-zb-telemetry` per repo). +The MxAccessGateway MEL→Serilog migration and metrics export both landed in this pass (they were +not actually done beforehand despite an earlier claim). Per-repo result + deferred follow-ons: +[`GAPS.md` → Adoption status — 2026-06-01](GAPS.md). The shared library lives at [`~/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`](../../ZB.MOM.WW.Telemetry/) (.NET 10; 2 packages —