scadaproj/docs/plans/2026-06-01-zb-mom-ww-telemetry-shared-library.md
Joseph Doherty c77df2a2cd docs: implementation plans for ZB.MOM.WW.Health + ZB.MOM.WW.Telemetry
Two TDD plans (one per library, per house precedent) derived from the approved
design, with co-located .tasks.json execution-persistence:

- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals +
  Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the
executing-plans workflow.
2026-06-01 06:15:22 -04:00


ZB.MOM.WW.Telemetry Shared Library Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Author the components/observability/ normalization docs and build the ZB.MOM.WW.Telemetry shared library (2 NuGet packages) that gives the fleet one OpenTelemetry bootstrap across all three signals (metrics + traces + logs) with a shared Resource and a shared Serilog logging stack, then migrate MxAccessGateway's logging from Microsoft.Extensions.Logging onto that shared stack — the one sister-repo adoption that proves the contract.

Architecture: A new standalone nested repo (~/Desktop/scadaproj/ZB.MOM.WW.Telemetry), .NET 10, two library projects — ZB.MOM.WW.Telemetry (OTel metrics+traces bootstrap, shared Resource, standard instrumentation, Prometheus/OTLP exporters) and ZB.MOM.WW.Telemetry.Serilog (shared Serilog bootstrap, SiteId/NodeRole/NodeHostname enrichers, a new TraceContextEnricher, OTel log export, ILogRedactor seam). The unifying hinge: one ZbTelemetryOptions identity triple (ServiceName/SiteId/NodeRole) feeds both the OTel Resource and the Serilog enrichers. Reference implementations: OTel bootstrap from OtOpcUa ObservabilityExtensions, Serilog bootstrap + enrichers from ScadaBridge LoggerConfigurationFactory, redaction from MxGateway GatewayLogRedactor. Health/Telemetry wiring into OtOpcUa & ScadaBridge stays a future GAPS.md item; the ONLY app touched here is MxGateway's logging.

Tech Stack: .NET 10, C#; xUnit + coverlet; OpenTelemetry SDK 1.15.3 (OpenTelemetry.Extensions.Hosting), OpenTelemetry.Exporter.Prometheus.AspNetCore 1.15.3-beta.1, OpenTelemetry.Exporter.OpenTelemetryProtocol 1.15.3, OpenTelemetry.Instrumentation.{AspNetCore,Http,GrpcNetClient,Runtime,Process} (~1.12–1.15); Serilog 4.3.1, Serilog.AspNetCore (see version note), Serilog.Settings.Configuration, Serilog.Sinks.{Console,File,OpenTelemetry}; central package management; .slnx; Version 0.1.0 lockstep.

Version note (a real convergence item): OtOpcUa pins Serilog.AspNetCore 9.0.0, ScadaBridge 10.0.0. Pin 9.0.0 in this library (works on net10, lowest common); record the 9↔10 split in GAPS.md as a convergence task. Consumers' central package management governs the final version at adoption.

Source references (read-only, to port/generalize from):

  • OTel bootstrap + Meter/ActivitySource: OtOpcUa ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs + ~/Desktop/OtOpcUa/src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs
  • Serilog bootstrap + enrichers: ScadaBridge ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/{LoggerConfigurationFactory,LoggingOptions}.cs + appsettings.json:3-23
  • Hand-rolled metrics to re-home onto OTel export: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs
  • Logging to migrate: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/{GatewayRequestLoggingMiddlewareExtensions,GatewayLogScope,GatewayLoggerExtensions,GatewayLogRedactor}.cs + GatewayApplication.cs:34,61
  • Design: ~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md

Conventions for every task: TDD — failing test first, minimal impl, green, commit. File-scoped namespaces, sealed by default. Never log secrets. Commit after each green task. The Files: block is the files_to_edit contract.


Phase 0 — Normalization docs (spec drives the API)

Task 1: components/observability spec + METRIC-CONVENTIONS + shared-contract

Classification: small Estimated implement time: ~5 min Parallelizable with: Task 2

Files:

  • Create: components/observability/spec/SPEC.md
  • Create: components/observability/spec/METRIC-CONVENTIONS.md
  • Create: components/observability/shared-contract/ZB.MOM.WW.Telemetry.md

Steps:

  1. spec/SPEC.md — Section 0 Scope: normalized = OTel bootstrap (3 signals), the shared Resource attribute set, standard instrumentation, exporter conventions (Prometheus default / OTLP opt-in), Serilog bootstrap + enrichers + trace↔log correlation + redaction seam. NOT normalized = each app's actual instruments (otopcua.*, mxgateway.*), redaction policy (which fields), the net48 worker's IWorkerLogger.
  2. spec/METRIC-CONVENTIONS.md (mirrors auth CANONICAL-ROLES.md / theme DESIGN-TOKENS.md): Meter name = app namespace; instrument name = <app>.<subsystem>.<event>; duration unit = seconds (OTel semconv — flag MxGateway's ms histograms); the Resource attribute list (service.name, service.namespace=ZB.MOM.WW, service.version, site.id, node.role, host.name); the standard instrumentation everyone enables.
  3. shared-contract/ZB.MOM.WW.Telemetry.md — paper API of both packages: ZbTelemetryOptions, AddZbTelemetry, MapZbMetrics, ZbExporter enum; AddZbSerilog, ZbLogEnricherNames, TraceContextEnricher, ILogRedactor.
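
The naming and unit rules in step 2 can be illustrated with the BCL metrics API alone. The sketch below is a minimal example under assumptions: the app name `ExampleApp`, the meter name `ZB.MOM.WW.ExampleApp`, and the instrument name `exampleapp.request.duration` are hypothetical and do not come from any of the three apps.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical app used only to illustrate the conventions.
public static class ExampleAppMetrics
{
    // Convention: Meter name = app namespace.
    public static readonly Meter Meter = new("ZB.MOM.WW.ExampleApp");

    // Convention: instrument name = <app>.<subsystem>.<event>, duration unit = seconds.
    public static readonly Histogram<double> RequestDuration =
        Meter.CreateHistogram<double>("exampleapp.request.duration", unit: "s");

    // Callers hand over a TimeSpan; the conversion to seconds happens in one place,
    // so a millisecond value cannot leak into the histogram by accident.
    public static void RecordRequest(TimeSpan elapsed) =>
        RequestDuration.Record(elapsed.TotalSeconds);
}
```

Funnelling every duration through a `TimeSpan`-taking helper is one way to keep the seconds rule from depending on each call site remembering the unit.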

Acceptance: Three files exist; SPEC.md has explicit Section 0; METRIC-CONVENTIONS.md states the seconds rule and the Resource set. No tests (docs).


Task 2: components/observability current-state ×3 + GAPS + README

Classification: small Estimated implement time: ~5 min Parallelizable with: Task 1

Files:

  • Create: components/observability/current-state/otopcua/CURRENT-STATE.md
  • Create: components/observability/current-state/mxaccessgw/CURRENT-STATE.md
  • Create: components/observability/current-state/scadabridge/CURRENT-STATE.md
  • Create: components/observability/GAPS.md
  • Create: components/observability/README.md

Steps:

  1. Transcribe the design doc's "Telemetry" + "Logging" current-state into the three docs at full file:line depth (re-verify against live repos). OtOpcUa = full OTel + Prometheus + Serilog (no Resource, no trace↔log correlation). MxGateway = hand-rolled GatewayMetrics (no export) + MEL logging — its Adoption plan = the migration in Phase 4. ScadaBridge = OpenTelemetry.Api dangling CVE-patch ref + Serilog (cleanest enrichers).
  2. GAPS.md — top entries: no Resource/service.name anywhere (P1); MxGateway metrics never export (P1); MxGateway MEL→Serilog (P1, done here); ms→s duration unit convergence (MxGateway's ms histograms); no trace↔log correlation anywhere; Serilog.AspNetCore 9↔10 split; ScadaBridge has zero instrumentation.
  3. README.md — overview + per-project status table.

Acceptance: Five files exist; current-states cite real file:line; GAPS.md lists the migration + convergence items. No tests (docs).


Phase 1 — Scaffold

Task 3: Create repo, solution, and project shells

Classification: small Estimated implement time: ~5 min Parallelizable with: none (gates impl tasks)

Files:

  • Create: ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.slnx
  • Create: ZB.MOM.WW.Telemetry/Directory.Build.props
  • Create: ZB.MOM.WW.Telemetry/Directory.Packages.props
  • Create: ZB.MOM.WW.Telemetry/.gitignore
  • Create: ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.csproj
  • Create: ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZB.MOM.WW.Telemetry.Serilog.csproj
  • Create: ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Tests/…csproj
  • Create: ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/…csproj

Steps:

  1. cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Telemetry && cd ZB.MOM.WW.Telemetry && git init && dotnet new gitignore
  2. dotnet new sln -n ZB.MOM.WW.Telemetry --format slnx (fallback .sln).
  3. dotnet new classlib -f net10.0 ×2 libs; dotnet new xunit -f net10.0 ×2 tests. Delete default classes.
  4. Refs: .Serilog → core ZB.MOM.WW.Telemetry; each test → its lib; core lib <FrameworkReference Include="Microsoft.AspNetCore.App"/> (for MapZbMetrics / instrumentation).
  5. Copy Directory.Build.props from ZB.MOM.WW.Auth.
  6. Directory.Packages.props — pin the OTel + Serilog versions from the Tech Stack/Version-note above + test pkgs (Microsoft.NET.Test.Sdk 17.14.1, xunit 2.9.3, xunit.runner.visualstudio 3.1.4, coverlet.collector 6.0.4) + Serilog.Sinks.InMemory or Serilog.Sinks.TestCorrelator for tests.
  7. dotnet sln add all 4; dotnet build.
  8. Commit: chore: scaffold ZB.MOM.WW.Telemetry solution and projects

Acceptance: dotnet build green; 4 projects.


Phase 2 — ZB.MOM.WW.Telemetry (metrics + traces)

Task 4: ZbTelemetryOptions + shared Resource builder

Classification: standard Estimated implement time: ~4 min Parallelizable with: none (Tasks 5-6 build on it)

Files:

  • Create: src/ZB.MOM.WW.Telemetry/ZbTelemetryOptions.cs
  • Create: src/ZB.MOM.WW.Telemetry/ZbResource.cs
  • Test: tests/ZB.MOM.WW.Telemetry.Tests/ZbResourceTests.cs

Step 1 — failing test: ZbResource.Build(options) returns a ResourceBuilder whose attributes include service.name (= ServiceName), service.namespace (= ServiceNamespace, default "ZB.MOM.WW"), service.version, site.id (= SiteId), node.role (= NodeRole), host.name. Assert all six present with expected values (build the Resource, inspect Attributes).

Step 2 — FAIL. Step 3 — implement ZbTelemetryOptions (ServiceName, ServiceNamespace=ZB.MOM.WW, ServiceVersion, SiteId, NodeRole, string[] Meters, string[] ActivitySources, ZbExporter Exporter=Prometheus, OTLP endpoint) + ZbResource.Build. Step 4 — PASS. Step 5 — commit: feat(telemetry): options + shared OTel Resource
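
A minimal sketch of what Step 3 might produce, assuming the OpenTelemetry SDK's `ResourceBuilder` API. The property list follows the step text; the fallback values (`"0.0.0"`, `"unknown"`), the `OtlpEndpoint` property name and its `Uri` type, and sourcing `host.name` from `Environment.MachineName` are assumptions of this sketch, not decisions recorded in the plan.

```csharp
using System;
using System.Collections.Generic;
using OpenTelemetry.Resources;

public enum ZbExporter { Prometheus, Otlp }

public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string? ServiceVersion { get; set; }
    public string? SiteId { get; set; }
    public string? NodeRole { get; set; }
    public string[] Meters { get; set; } = [];
    public string[] ActivitySources { get; set; } = [];
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
    public Uri? OtlpEndpoint { get; set; } // assumed name and type
}

public static class ZbResource
{
    public static ResourceBuilder Build(ZbTelemetryOptions o) =>
        ResourceBuilder.CreateEmpty()
            // Emits service.name, service.namespace and service.version.
            .AddService(
                serviceName: o.ServiceName,
                serviceNamespace: o.ServiceNamespace,
                serviceVersion: o.ServiceVersion ?? "0.0.0", // assumed fallback
                autoGenerateServiceInstanceId: false)
            // The remaining three attributes of the shared set.
            .AddAttributes(new KeyValuePair<string, object>[]
            {
                new("site.id", o.SiteId ?? "unknown"),   // assumed fallback
                new("node.role", o.NodeRole ?? "unknown"), // assumed fallback
                new("host.name", Environment.MachineName),
            });
}
```

Returning the `ResourceBuilder` rather than a built `Resource` matches the Step 1 test wording and lets callers pass it to whichever signal builder needs it.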


Task 5: AddZbTelemetry (metrics + traces wiring)

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none

Files:

  • Create: src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs
  • Test: tests/ZB.MOM.WW.Telemetry.Tests/AddZbTelemetryTests.cs

Step 1 — failing test: build a host with AddZbTelemetry(o => { o.ServiceName="t"; o.Meters=["Test.Meter"]; }) using an in-memory metrics exporter (MetricReader/InMemoryExporter test harness); emit a counter on Test.Meter; assert the metric is collected and the export carries the Resource service.name="t". Port the builder shape from OtOpcUa ObservabilityExtensions.cs:18-25 (AddOpenTelemetry().WithMetrics(...).WithTracing(...)), generalized to register o.Meters/o.ActivitySources by name + standard instrumentation (AddAspNetCoreInstrumentation, AddHttpClientInstrumentation, AddGrpcClientInstrumentation, AddRuntimeInstrumentation, AddProcessInstrumentation) + exporter switch (Prometheus default, OTLP when o.Exporter==Otlp).

Step 2 — FAIL. Step 3 — implement. Classification high-risk → executor runs spec+code review serially (this is the fleet's telemetry front door). Step 4 — PASS. Step 5 — commit: feat(telemetry): AddZbTelemetry metrics+traces bootstrap
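
The builder shape described in Step 1 could look roughly like the sketch below. It is not the OtOpcUa code being ported. The options class is a trimmed copy so the block stands alone, and the choice to add no trace exporter in Prometheus mode (Prometheus carries metrics only) is an assumption. It assumes the OpenTelemetry.Extensions.Hosting, exporter, and instrumentation packages listed in the Tech Stack.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public enum ZbExporter { Prometheus, Otlp }

// Trimmed copy of the options, limited to the fields this sketch reads.
public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string[] Meters { get; set; } = [];
    public string[] ActivitySources { get; set; } = [];
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
}

public static class ZbTelemetryExtensions
{
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services, Action<ZbTelemetryOptions> configure)
    {
        var o = new ZbTelemetryOptions();
        configure(o);

        services.AddOpenTelemetry()
            // One Resource shared by every signal registered below.
            .ConfigureResource(r => r.AddService(o.ServiceName, serviceNamespace: "ZB.MOM.WW"))
            .WithMetrics(m =>
            {
                m.AddMeter(o.Meters)
                 .AddAspNetCoreInstrumentation()
                 .AddHttpClientInstrumentation()
                 .AddRuntimeInstrumentation()
                 .AddProcessInstrumentation();
                if (o.Exporter == ZbExporter.Otlp) m.AddOtlpExporter();
                else m.AddPrometheusExporter();
            })
            .WithTracing(t =>
            {
                t.AddSource(o.ActivitySources)
                 .AddAspNetCoreInstrumentation()
                 .AddHttpClientInstrumentation()
                 .AddGrpcClientInstrumentation();
                // Assumption: traces are exported only in OTLP mode.
                if (o.Exporter == ZbExporter.Otlp) t.AddOtlpExporter();
            });

        return services;
    }
}
```

A consumer would call it as `builder.Services.AddZbTelemetry(o => { o.ServiceName = "t"; o.Meters = ["Test.Meter"]; });`, which is the call shape the Step 1 test uses.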


Task 6: MapZbMetrics Prometheus endpoint

Classification: small Estimated implement time: ~3 min Parallelizable with: Task 7, Task 8 (different package)

Files:

  • Create: src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs
  • Test: tests/ZB.MOM.WW.Telemetry.Tests/MapZbMetricsTests.cs

Step 1 — failing test: WebApplicationFactory app with AddZbTelemetry(Prometheus) + app.MapZbMetrics(); assert that GET /metrics returns 200 with a text/plain; version=0.0.4 Prometheus exposition body. Port /metrics mapping from OtOpcUa ObservabilityExtensions.cs:36-38.

Step 2 — FAIL. Step 3 — implement MapZbMetrics delegating to MapPrometheusScrapingEndpoint. Step 4 — PASS. Step 5 — commit: feat(telemetry): MapZbMetrics Prometheus scrape endpoint
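
The delegation in Step 3 is small enough to sketch in full. `MapPrometheusScrapingEndpoint` comes from OpenTelemetry.Exporter.Prometheus.AspNetCore; the optional `path` parameter and its `/metrics` default are assumptions of this sketch.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Routing;

public static class ZbMetricsEndpointExtensions
{
    // Thin wrapper so consumers depend on the fleet convention, not on the exporter package.
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints, string path = "/metrics") =>
        endpoints.MapPrometheusScrapingEndpoint(path);
}
```

Returning the `IEndpointConventionBuilder` keeps the door open for consumers to chain their own policy, for example `app.MapZbMetrics().RequireAuthorization()`.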


Phase 3 — ZB.MOM.WW.Telemetry.Serilog (logs signal)

Task 7: Identity enrichers + AddZbSerilog bootstrap

Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 6

Files:

  • Create: src/ZB.MOM.WW.Telemetry.Serilog/ZbLogEnricherNames.cs
  • Create: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs
  • Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/EnricherTests.cs

Step 1 — failing test: using Serilog.Sinks.InMemory, configure via AddZbSerilog(options) with SiteId="s1", NodeRole="Central" and log one event; assert the event carries properties SiteId=s1, NodeRole=Central, NodeHostname=<machine>. Bind these from the same ZbTelemetryOptions (reference the core package) so the dimensions match the Resource. Port two-stage bootstrap + MinimumLevel.Is override + ReadFrom.Configuration from ScadaBridge LoggerConfigurationFactory.cs:62-88.

Step 2 — FAIL. Step 3 — implement AddZbSerilog(this IHostApplicationBuilder, Action<ZbTelemetryOptions>) (or LoggerConfiguration factory mirroring ScadaBridge) with Enrich.WithProperty for the triple. Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): AddZbSerilog bootstrap + identity enrichers
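
The identity-enricher half of Step 3 can be sketched against Serilog's core API alone. The extension name `WithZbIdentity` is hypothetical (the plan only names `AddZbSerilog`), and the two-stage bootstrap and `ReadFrom.Configuration` parts ported from ScadaBridge are deliberately left out.

```csharp
using System;
using Serilog;

public static class ZbSerilogIdentitySketch
{
    // Hypothetical helper: stamps the identity triple's log-side properties on every event.
    public static LoggerConfiguration WithZbIdentity(
        this LoggerConfiguration config, string siteId, string nodeRole) =>
        config
            .Enrich.FromLogContext()
            .Enrich.WithProperty("SiteId", siteId)
            .Enrich.WithProperty("NodeRole", nodeRole)
            .Enrich.WithProperty("NodeHostname", Environment.MachineName);
}
```

In the real library the two string arguments would presumably be read from the same `ZbTelemetryOptions` instance that feeds the OTel Resource, which is what keeps the log and metric dimensions identical.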


Task 8: TraceContextEnricher (trace↔log correlation)

Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 6

Files:

  • Create: src/ZB.MOM.WW.Telemetry.Serilog/TraceContextEnricher.cs
  • Modify: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs (register enricher)
  • Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/TraceContextEnricherTests.cs

Step 1 — failing test: with an active Activity (start an ActivitySource span), a logged event carries trace_id and span_id equal to Activity.Current.TraceId/SpanId; with no active Activity, neither property is added (clean omission). This is new shared glue — no existing app has it.

Step 2 — FAIL. Step 3 — implement ILogEventEnricher reading Activity.Current; add to AddZbSerilog. Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): TraceContextEnricher for trace<->log correlation
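
A minimal sketch of the enricher described in Step 3, assuming Serilog's `ILogEventEnricher` and the BCL `Activity` API. Constructing the properties directly instead of going through `propertyFactory` is a choice of this sketch, made because both values are plain strings.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;
        if (activity is null)
        {
            return; // no active span: add nothing (clean omission)
        }

        // W3C hex forms, matching what OTel exporters emit for the same span.
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("trace_id", new ScalarValue(activity.TraceId.ToHexString())));
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("span_id", new ScalarValue(activity.SpanId.ToHexString())));
    }
}
```

`AddPropertyIfAbsent` means a property the caller already pushed under the same name wins, which avoids silently overwriting an explicit correlation value.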


Task 9: ILogRedactor seam + OTel log export

Classification: standard Estimated implement time: ~5 min Parallelizable with: none

Files:

  • Create: src/ZB.MOM.WW.Telemetry.Serilog/ILogRedactor.cs
  • Create: src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs
  • Modify: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs (optional WriteTo.OpenTelemetry with shared Resource)
  • Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs

Step 1 — failing test: register a fake ILogRedactor that masks a property named apiKey; log an event with apiKey="mxgw_secret"; assert the sink sees it masked. The seam is shared; policy is the consumer's (generalize MxGateway GatewayLogRedactor.cs). Also assert (separate test) that when o.Exporter routes logs to OTLP, the log records carry the same Resource as metrics/traces.

Step 2 — FAIL. Step 3 — implement ILogRedactor { void Redact(IDictionary<string,object?> properties); } + a RedactionEnricher that applies the registered redactor; wire optional WriteTo.OpenTelemetry(resource: ZbResource…). Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): ILogRedactor seam + OTel log export
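
One way the seam in Step 3 might fit together is sketched below. The `ILogRedactor` signature is taken from the step text. `PropertyNameRedactor` is a hypothetical stand-in for a consumer policy, and limiting the enricher to scalar properties (structured values and key removals are ignored) is a simplification of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Serilog.Core;
using Serilog.Events;

public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

// Hypothetical consumer policy: mask any property whose name is listed.
public sealed class PropertyNameRedactor : ILogRedactor
{
    private readonly HashSet<string> _names;

    public PropertyNameRedactor(params string[] names) =>
        _names = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

    public void Redact(IDictionary<string, object?> properties)
    {
        foreach (var key in properties.Keys.ToList())
        {
            if (_names.Contains(key)) properties[key] = "***";
        }
    }
}

public sealed class RedactionEnricher : ILogEventEnricher
{
    private readonly ILogRedactor _redactor;

    public RedactionEnricher(ILogRedactor redactor) => _redactor = redactor;

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Expose scalar properties to the redactor as plain values.
        var view = new Dictionary<string, object?>();
        foreach (var property in logEvent.Properties)
        {
            if (property.Value is ScalarValue scalar) view[property.Key] = scalar.Value;
        }

        var original = new Dictionary<string, object?>(view);
        _redactor.Redact(view);

        // Write back only what the redactor changed or added.
        foreach (var entry in view)
        {
            if (!original.TryGetValue(entry.Key, out var before) || !Equals(before, entry.Value))
            {
                logEvent.AddOrUpdateProperty(
                    new LogEventProperty(entry.Key, new ScalarValue(entry.Value)));
            }
        }
    }
}
```

Because the enricher rewrites the event's properties before any sink sees them, the masked value is also what appears in the rendered message.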


Phase 4 — MxGateway MEL → Serilog migration (the one sister-repo touch)

Touches ~/Desktop/MxAccessGateway. Prereq: Phase 3 complete, with ZB.MOM.WW.Telemetry.Serilog consumable either as a packed .nupkg from a local NuGet source or via a ProjectReference to the library project. The net48 x86 worker is OUT of scope — leave WorkerConsoleLogger/IWorkerLogger untouched.

Task 10: Swap gateway bootstrap to AddZbSerilog

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none

Files:

  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs (replace default MEL logging with AddZbSerilog)
  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj (add package ref)
  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/appsettings.json (Serilog section: Console+File sinks, MinimumLevel)
  • Test: existing ~/Desktop/MxAccessGateway/src/MxGateway.Tests/ (fake worker — no MXAccess needed)

Step 1 — failing/red state: add a focused test (or reuse an existing logging test) asserting the host builds with Serilog as the provider and a log event carries SiteId/NodeRole. Step 2 — run, expect FAIL (still MEL). Step 3 — implement: reference ZB.MOM.WW.Telemetry.Serilog; call AddZbSerilog mapping o.ServiceName="mxgateway", SiteId/NodeRole from config; add the Serilog config section. Remove the default logging assumptions. Step 4 — run, expect PASS; then dotnet build src/MxGateway.sln + dotnet test src/MxGateway.Tests green. Step 5 — commit (in MxGateway repo): refactor(logging): adopt ZB.MOM.WW.Telemetry.Serilog bootstrap


Task 11: Re-express correlation scope + redactor on the shared seam

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none

Files:

  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayRequestLoggingMiddlewareExtensions.cs (BeginScope → LogContext.PushProperty)
  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogScope.cs (emit via LogContext)
  • Create: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorAdapter.cs (implements ILogRedactor, delegates to existing GatewayLogRedactor policy)
  • Test: existing MxGateway.Tests correlation/redaction tests

Step 1 — failing test: assert (a) a request still emits the correlation properties (SessionId/CorrelationId/etc.) now via Serilog LogContext, and (b) a mxgw_-prefixed secret is still redacted through the registered ILogRedactor. Step 2 — FAIL (still MEL BeginScope/old redactor path). Step 3 — implement: convert the scope middleware to push Serilog LogContext properties (keep header parsing from GatewayRequestLoggingMiddlewareExtensions.cs:22-41); register GatewayLogRedactorAdapter : ILogRedactor wrapping the existing GatewayLogRedactor field/command policy. Step 4 — PASS; full dotnet test src/MxGateway.Tests green (record counts); verify no secret leakage. Step 5 — commit (MxGateway repo): refactor(logging): correlation scope + redaction on shared ILogRedactor seam
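
The BeginScope to LogContext conversion in Step 3 could take roughly the shape below. This is a sketch, not the gateway's middleware: the header name `X-Correlation-Id`, the fallback to `HttpContext.TraceIdentifier`, and the extension name are all hypothetical, and the real header parsing stays whatever GatewayRequestLoggingMiddlewareExtensions.cs already does.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

public static class CorrelationLogContextSketch
{
    public const string HeaderName = "X-Correlation-Id"; // hypothetical header

    public static IApplicationBuilder UseCorrelationLogContext(this IApplicationBuilder app) =>
        app.Use(async (HttpContext context, RequestDelegate next) =>
        {
            var correlationId = context.Request.Headers[HeaderName].ToString();
            if (string.IsNullOrEmpty(correlationId))
            {
                correlationId = context.TraceIdentifier; // assumed fallback
            }

            // Replaces ILogger.BeginScope: the property rides on every event
            // logged downstream until the using block is disposed.
            using (LogContext.PushProperty("CorrelationId", correlationId))
            {
                await next(context);
            }
        });
}
```

Pushed properties only reach the output when the logger is configured with `Enrich.FromLogContext()`, so the shared bootstrap needs to enable that enricher for this middleware to have any effect.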


Phase 5 — Package & register

Task 12: Pack, README, register in indexes

Classification: small Estimated implement time: ~5 min Parallelizable with: none (final)

Files:

  • Create: ZB.MOM.WW.Telemetry/README.md
  • Modify: both lib .csproj (PackageId/Description/metadata)
  • Modify: components/README.md (registry row)
  • Modify: CLAUDE.md (Component-normalization table row)
  • Modify: upcoming.md (check off Observability)

Steps:

  1. NuGet metadata on both lib .csprojs.
  2. dotnet test (both test projects green) — record counts.
  3. dotnet pack -c Release -o ./artifacts → confirm 2 *.0.1.0.nupkg.
  4. README.md — packages, the identity-triple hinge, exporter options (Prometheus default / OTLP opt-in), consumer matrix, "built; MxGateway logging adopted; broader adoption deferred" note.
  5. Register: components/README.md row (status Draft), CLAUDE.md row, tick Observability in upcoming.md.
  6. Commit: in the lib repo, docs: README + pack metadata; in scadaproj, git add components/observability CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(telemetry): ZB.MOM.WW.Telemetry library + observability normalization component + MxGateway logging adoption"

Acceptance: 2 nupkgs @ 0.1.0; all library tests green + MxGateway tests green (counts recorded); indexes updated; design-doc build-order steps 2-6 (telemetry side) complete.


Summary of parallelism

  • Phase 0 docs: Task 1 ∥ Task 2.
  • Phase 1 scaffold: Task 3 (barrier).
  • Phase 2 core: Task 4 → Task 5 (sequential); Task 6 ∥ the Serilog tasks.
  • Phase 3 serilog: Task 7 ∥ Task 8 (Task 8 modifies the same extensions file as Task 7 — sequence if conflict), then Task 9.
  • Phase 4 migration: Task 10 → Task 11 (serial, same repo; needs Phase 3).
  • Phase 5: Task 12 (barrier).