Two TDD plans (one per library, per house precedent) derived from the approved design, with co-located .tasks.json execution-persistence:
- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals + Serilog) + the MxGateway MEL->Serilog migration (12 tasks)
Each task carries classification / est-time / parallelizable metadata for the executing-plans workflow.
ZB.MOM.WW.Telemetry Shared Library Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
Goal: Author the components/observability/ normalization docs and build the ZB.MOM.WW.Telemetry shared library (2 NuGet packages) that gives the fleet one OpenTelemetry bootstrap across all three signals (metrics + traces + logs) with a shared Resource and a shared Serilog logging stack, then migrate MxAccessGateway's logging from Microsoft.Extensions.Logging onto that shared stack — the one sister-repo adoption that proves the contract.
Architecture: A new standalone nested repo (~/Desktop/scadaproj/ZB.MOM.WW.Telemetry), .NET 10, two library projects — ZB.MOM.WW.Telemetry (OTel metrics+traces bootstrap, shared Resource, standard instrumentation, Prometheus/OTLP exporters) and ZB.MOM.WW.Telemetry.Serilog (shared Serilog bootstrap, SiteId/NodeRole/NodeHostname enrichers, a new TraceContextEnricher, OTel log export, ILogRedactor seam). The unifying hinge: one ZbTelemetryOptions identity triple (ServiceName/SiteId/NodeRole) feeds both the OTel Resource and the Serilog enrichers. Reference implementations: OTel bootstrap from OtOpcUa ObservabilityExtensions, Serilog bootstrap + enrichers from ScadaBridge LoggerConfigurationFactory, redaction from MxGateway GatewayLogRedactor. Health/Telemetry wiring into OtOpcUa & ScadaBridge stays a future GAPS.md item; the ONLY app touched here is MxGateway's logging.
Tech Stack: .NET 10, C#; xUnit + coverlet; OpenTelemetry SDK 1.15.3 (OpenTelemetry.Extensions.Hosting), OpenTelemetry.Exporter.Prometheus.AspNetCore 1.15.3-beta.1, OpenTelemetry.Exporter.OpenTelemetryProtocol 1.15.3, OpenTelemetry.Instrumentation.{AspNetCore,Http,GrpcNetClient,Runtime,Process} (~1.12–1.15); Serilog 4.3.1, Serilog.AspNetCore (see version note), Serilog.Settings.Configuration, Serilog.Sinks.{Console,File,OpenTelemetry}; central package management; .slnx; Version 0.1.0 lockstep.
Version note (a real convergence item): OtOpcUa pins Serilog.AspNetCore 9.0.0, ScadaBridge 10.0.0. Pin 9.0.0 in this library (works on net10, lowest common); record the 9↔10 split in GAPS.md as a convergence task. Consumers' central package management governs the final version at adoption.
Source references (read-only, to port/generalize from):
- OTel bootstrap + Meter/ActivitySource: OtOpcUa ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs + ~/Desktop/OtOpcUa/src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs
- Serilog bootstrap + enrichers: ScadaBridge ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/{LoggerConfigurationFactory,LoggingOptions}.cs + appsettings.json:3-23
- Hand-rolled metrics to re-home onto OTel export: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs
- Logging to migrate: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/{GatewayRequestLoggingMiddlewareExtensions,GatewayLogScope,GatewayLoggerExtensions,GatewayLogRedactor}.cs + GatewayApplication.cs:34,61
- Design: ~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md
Conventions for every task: TDD — failing test first, minimal impl, green, commit. File-scoped namespaces, sealed by default. Never log secrets. Commit after each green task. The Files: block is the files_to_edit contract.
Phase 0 — Normalization docs (spec drives the API)
Task 1: components/observability spec + METRIC-CONVENTIONS + shared-contract
Classification: small
Estimated implement time: ~5 min
Parallelizable with: Task 2
Files:
- Create: components/observability/spec/SPEC.md
- Create: components/observability/spec/METRIC-CONVENTIONS.md
- Create: components/observability/shared-contract/ZB.MOM.WW.Telemetry.md
Steps:
- spec/SPEC.md, Section 0 Scope: normalized = OTel bootstrap (3 signals), the shared Resource attribute set, standard instrumentation, exporter conventions (Prometheus default / OTLP opt-in), Serilog bootstrap + enrichers + trace↔log correlation + redaction seam. NOT normalized = each app's actual instruments (otopcua.*, mxgateway.*), redaction policy (which fields), the net48 worker's IWorkerLogger.
- spec/METRIC-CONVENTIONS.md (mirrors auth CANONICAL-ROLES.md / theme DESIGN-TOKENS.md): Meter name = app namespace; instrument name = <app>.<subsystem>.<event>; duration unit = seconds (OTel semconv; flag MxGateway's ms histograms); the Resource attribute list (service.name, service.namespace=ZB.MOM.WW, service.version, site.id, node.role, host.name); the standard instrumentation everyone enables.
- shared-contract/ZB.MOM.WW.Telemetry.md, paper API of both packages: ZbTelemetryOptions, AddZbTelemetry, MapZbMetrics, ZbExporter enum; AddZbSerilog, ZbLogEnricherNames, TraceContextEnricher, ILogRedactor.
Acceptance: Three files exist; SPEC.md has explicit Section 0; METRIC-CONVENTIONS.md states the seconds rule and the Resource set. No tests (docs).
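The naming and unit rules in METRIC-CONVENTIONS.md can be illustrated with the BCL metrics API alone. This is a minimal sketch, not the spec text: the meter name, the instrument name mxgateway.session.open.duration, and the class name are hypothetical placeholders chosen only to show the <app>.<subsystem>.<event> shape and the seconds rule.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical names throughout; only the naming shape and the unit follow the conventions.
public static class MetricConventionSketch
{
    // Meter name = app namespace.
    public const string MeterName = "ZB.MOM.WW.MxGateway";

    // Instrument name = <app>.<subsystem>.<event>.
    public const string DurationInstrument = "mxgateway.session.open.duration";

    // Records one duration and returns what a listener observed (name, unit, value).
    public static List<(string Name, string? Unit, double Value)> RecordOnce(TimeSpan elapsed)
    {
        var seen = new List<(string Name, string? Unit, double Value)>();

        using var meter = new Meter(MeterName);
        var histogram = meter.CreateHistogram<double>(DurationInstrument, unit: "s");

        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            // Subscribe only to instruments from this sketch's meter.
            if (ReferenceEquals(instrument.Meter, meter)) l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
            seen.Add((instrument.Name, instrument.Unit, value)));
        listener.Start();

        // Duration unit = seconds, not milliseconds.
        histogram.Record(elapsed.TotalSeconds);
        return seen;
    }
}
```

Recording TotalSeconds at the call site is the whole convergence step for an instrument that currently records milliseconds; the instrument's unit string then has to change to match.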
Task 2: components/observability current-state ×3 + GAPS + README
Classification: small
Estimated implement time: ~5 min
Parallelizable with: Task 1
Files:
- Create: components/observability/current-state/otopcua/CURRENT-STATE.md
- Create: components/observability/current-state/mxaccessgw/CURRENT-STATE.md
- Create: components/observability/current-state/scadabridge/CURRENT-STATE.md
- Create: components/observability/GAPS.md
- Create: components/observability/README.md
Steps:
- Transcribe the design doc's "Telemetry" + "Logging" current-state into the three docs at full file:line depth (re-verify against live repos). OtOpcUa = full OTel + Prometheus + Serilog (no Resource, no trace↔log correlation). MxGateway = hand-rolled GatewayMetrics (no export) + MEL logging; its Adoption plan = the migration in Phase 4. ScadaBridge = OpenTelemetry.Api dangling CVE-patch ref + Serilog (cleanest enrichers).
- GAPS.md, top entries: no Resource/service.name anywhere (P1); MxGateway metrics never export (P1); MxGateway MEL→Serilog (P1, done here); ms→s unit convergence; no trace↔log correlation anywhere; Serilog.AspNetCore 9↔10 split; ScadaBridge has zero instrumentation.
- README.md: overview + per-project status table.
Acceptance: Five files exist; current-states cite real file:line; GAPS.md lists the migration + convergence items. No tests (docs).
Phase 1 — Scaffold
Task 3: Create repo, solution, and project shells
Classification: small
Estimated implement time: ~5 min
Parallelizable with: none (gates impl tasks)
Files:
- Create: ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.slnx
- Create: ZB.MOM.WW.Telemetry/Directory.Build.props
- Create: ZB.MOM.WW.Telemetry/Directory.Packages.props
- Create: ZB.MOM.WW.Telemetry/.gitignore
- Create: ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.csproj
- Create: ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZB.MOM.WW.Telemetry.Serilog.csproj
- Create: ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Tests/…csproj
- Create: ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/…csproj
Steps:
- cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Telemetry && cd ZB.MOM.WW.Telemetry && git init && dotnet new gitignore
- dotnet new sln -n ZB.MOM.WW.Telemetry --format slnx (fallback .sln).
- dotnet new classlib -f net10.0 ×2 libs; dotnet new xunit -f net10.0 ×2 tests. Delete default classes.
- Refs: .Serilog → core ZB.MOM.WW.Telemetry; each test → its lib; core lib <FrameworkReference Include="Microsoft.AspNetCore.App"/> (for MapZbMetrics / instrumentation).
- Copy Directory.Build.props from ZB.MOM.WW.Auth.
- Directory.Packages.props: pin the OTel + Serilog versions from the Tech Stack/Version-note above + test pkgs (Microsoft.NET.Test.Sdk 17.14.1, xunit 2.9.3, xunit.runner.visualstudio 3.1.4, coverlet.collector 6.0.4) + Serilog.Sinks.InMemory or Serilog.Sinks.TestCorrelator for tests.
- dotnet sln add all 4; dotnet build.
- Commit: chore: scaffold ZB.MOM.WW.Telemetry solution and projects
Acceptance: dotnet build green; 4 projects.
Phase 2 — ZB.MOM.WW.Telemetry (metrics + traces)
Task 4: ZbTelemetryOptions + shared Resource builder
Classification: standard
Estimated implement time: ~4 min
Parallelizable with: none (Tasks 5-6 build on it)
Files:
- Create: src/ZB.MOM.WW.Telemetry/ZbTelemetryOptions.cs
- Create: src/ZB.MOM.WW.Telemetry/ZbResource.cs
- Test: tests/ZB.MOM.WW.Telemetry.Tests/ZbResourceTests.cs
Step 1 — failing test: ZbResource.Build(options) returns a ResourceBuilder whose attributes include service.name (= ServiceName), service.namespace (= ServiceNamespace, default "ZB.MOM.WW"), service.version, site.id (= SiteId), node.role (= NodeRole), host.name. Assert all six present with expected values (build the Resource, inspect Attributes).
Step 2 — FAIL. Step 3 — implement ZbTelemetryOptions (ServiceName, ServiceNamespace=ZB.MOM.WW, ServiceVersion, SiteId, NodeRole, string[] Meters, string[] ActivitySources, ZbExporter Exporter=Prometheus, OTLP endpoint) + ZbResource.Build. Step 4 — PASS. Step 5 — commit: feat(telemetry): options + shared OTel Resource
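A minimal sketch of what Step 3 might look like, assuming the OpenTelemetry SDK's ResourceBuilder. The property names follow the list above; the OtlpEndpoint property name, the ServiceVersion default, and the use of Environment.MachineName for host.name are assumptions, not decisions from the design.

```csharp
using System;
using System.Collections.Generic;
using OpenTelemetry.Resources;

public enum ZbExporter { Prometheus, Otlp }

public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string ServiceVersion { get; set; } = "0.0.0";      // assumed default
    public string SiteId { get; set; } = "";
    public string NodeRole { get; set; } = "";
    public string[] Meters { get; set; } = Array.Empty<string>();
    public string[] ActivitySources { get; set; } = Array.Empty<string>();
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
    public Uri? OtlpEndpoint { get; set; }                      // assumed name for "OTLP endpoint"
}

public static class ZbResource
{
    // AddService contributes service.name / service.namespace / service.version
    // (plus an auto-generated service.instance.id); the rest are plain attributes.
    public static ResourceBuilder Build(ZbTelemetryOptions options) =>
        ResourceBuilder.CreateEmpty()
            .AddService(
                serviceName: options.ServiceName,
                serviceNamespace: options.ServiceNamespace,
                serviceVersion: options.ServiceVersion)
            .AddAttributes(new KeyValuePair<string, object>[]
            {
                new("site.id", options.SiteId),
                new("node.role", options.NodeRole),
                new("host.name", Environment.MachineName),
            });
}
```

The Step 1 test can then call ZbResource.Build(options).Build().Attributes and assert on the six keys.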
Task 5: AddZbTelemetry (metrics + traces wiring)
Classification: high-risk
Estimated implement time: ~5 min
Parallelizable with: none
Files:
- Create: src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs
- Test: tests/ZB.MOM.WW.Telemetry.Tests/AddZbTelemetryTests.cs
Step 1 — failing test: build a host with AddZbTelemetry(o => { o.ServiceName="t"; o.Meters=["Test.Meter"]; }) using an in-memory metrics exporter (MetricReader/InMemoryExporter test harness); emit a counter on Test.Meter; assert the metric is collected and the export carries the Resource service.name="t". Port the builder shape from OtOpcUa ObservabilityExtensions.cs:18-25 (AddOpenTelemetry().WithMetrics(...).WithTracing(...)), generalized to register o.Meters/o.ActivitySources by name + standard instrumentation (AddAspNetCoreInstrumentation, AddHttpClientInstrumentation, AddGrpcClientInstrumentation, AddRuntimeInstrumentation, AddProcessInstrumentation) + exporter switch (Prometheus default, OTLP when o.Exporter==Otlp).
Step 2 — FAIL. Step 3 — implement. Classification high-risk → executor runs spec+code review serially (this is the fleet's telemetry front door). Step 4 — PASS. Step 5 — commit: feat(telemetry): AddZbTelemetry metrics+traces bootstrap
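The builder shape described in Step 1 could be sketched as follows. This is not the OtOpcUa code being ported: the options type is a trimmed stand-in so the block stands alone, and hanging the extension off IServiceCollection is an assumption (the plan does not fix the receiver type). It assumes the OpenTelemetry packages from the Tech Stack are referenced.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public enum ZbExporter { Prometheus, Otlp }

// Trimmed stand-in for the real options type.
public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string[] Meters { get; set; } = Array.Empty<string>();
    public string[] ActivitySources { get; set; } = Array.Empty<string>();
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
}

public static class ZbTelemetryExtensions
{
    // Assumption: receiver is IServiceCollection; the real API may target the host builder.
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services, Action<ZbTelemetryOptions> configure)
    {
        var o = new ZbTelemetryOptions();
        configure(o);

        services.AddOpenTelemetry()
            // One Resource shared by every signal registered below.
            .ConfigureResource(r => r.AddService(o.ServiceName, o.ServiceNamespace))
            .WithMetrics(m =>
            {
                m.AddMeter(o.Meters)
                 .AddAspNetCoreInstrumentation()
                 .AddHttpClientInstrumentation()
                 .AddRuntimeInstrumentation()
                 .AddProcessInstrumentation();

                // Exporter switch: Prometheus by default, OTLP on opt-in.
                if (o.Exporter == ZbExporter.Otlp) m.AddOtlpExporter();
                else m.AddPrometheusExporter();
            })
            .WithTracing(t =>
            {
                t.AddSource(o.ActivitySources)
                 .AddAspNetCoreInstrumentation()
                 .AddHttpClientInstrumentation()
                 .AddGrpcClientInstrumentation();

                // Prometheus carries metrics only, so traces export only under OTLP.
                if (o.Exporter == ZbExporter.Otlp) t.AddOtlpExporter();
            });

        return services;
    }
}
```

Registering meters and activity sources by name keeps the library ignorant of each app's instruments, which matches the NOT-normalized boundary in the spec.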
Task 6: MapZbMetrics Prometheus endpoint
Classification: small
Estimated implement time: ~3 min
Parallelizable with: Task 7, Task 8 (different package)
Files:
- Create: src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs
- Test: tests/ZB.MOM.WW.Telemetry.Tests/MapZbMetricsTests.cs
Step 1 — failing test: WebApplicationFactory app with AddZbTelemetry(Prometheus) + app.MapZbMetrics() → GET /metrics returns 200 with text/plain; version=0.0.4 Prometheus exposition. Port /metrics mapping from OtOpcUa ObservabilityExtensions.cs:36-38.
Step 2 — FAIL. Step 3 — implement MapZbMetrics delegating to MapPrometheusScrapingEndpoint. Step 4 — PASS. Step 5 — commit: feat(telemetry): MapZbMetrics Prometheus scrape endpoint
Phase 3 — ZB.MOM.WW.Telemetry.Serilog (logs signal)
Task 7: Identity enrichers + AddZbSerilog bootstrap
Classification: standard
Estimated implement time: ~5 min
Parallelizable with: Task 6
Files:
- Create: src/ZB.MOM.WW.Telemetry.Serilog/ZbLogEnricherNames.cs
- Create: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs
- Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/EnricherTests.cs
Step 1 — failing test: using Serilog.Sinks.InMemory, configure via AddZbSerilog(options) with SiteId="s1", NodeRole="Central" and log one event; assert the event carries properties SiteId=s1, NodeRole=Central, NodeHostname=<machine>. Bind these from the same ZbTelemetryOptions (reference the core package) so the dimensions match the Resource. Port two-stage bootstrap + MinimumLevel.Is override + ReadFrom.Configuration from ScadaBridge LoggerConfigurationFactory.cs:62-88.
Step 2 — FAIL. Step 3 — implement AddZbSerilog(this IHostApplicationBuilder, Action<ZbTelemetryOptions>) (or LoggerConfiguration factory mirroring ScadaBridge) with Enrich.WithProperty for the triple. Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): AddZbSerilog bootstrap + identity enrichers
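The identity-enricher half of Step 3 might look like the sketch below, written in the LoggerConfiguration-factory style rather than the host-builder overload. WithZbIdentity and CollectingSink are hypothetical helpers, and the options type is a trimmed stand-in; the two-stage bootstrap and ReadFrom.Configuration parts are left out.

```csharp
using System;
using System.Collections.Generic;
using Serilog;
using Serilog.Core;
using Serilog.Events;

// Trimmed stand-in for the shared options type from the core package.
public sealed class ZbTelemetryOptions
{
    public string ServiceName { get; set; } = "";
    public string SiteId { get; set; } = "";
    public string NodeRole { get; set; } = "";
}

public static class ZbLogEnricherNames
{
    public const string SiteId = "SiteId";
    public const string NodeRole = "NodeRole";
    public const string NodeHostname = "NodeHostname";
}

public static class ZbSerilogSketch
{
    // Hypothetical helper: stamps the identity properties on every event,
    // sourced from the same options that feed the OTel Resource.
    public static LoggerConfiguration WithZbIdentity(
        this LoggerConfiguration config, ZbTelemetryOptions o) =>
        config
            .Enrich.FromLogContext()
            .Enrich.WithProperty(ZbLogEnricherNames.SiteId, o.SiteId)
            .Enrich.WithProperty(ZbLogEnricherNames.NodeRole, o.NodeRole)
            .Enrich.WithProperty(ZbLogEnricherNames.NodeHostname, Environment.MachineName);
}

// Hypothetical test sink standing in for Serilog.Sinks.InMemory.
public sealed class CollectingSink : ILogEventSink
{
    public List<LogEvent> Events { get; } = new();
    public void Emit(LogEvent logEvent) => Events.Add(logEvent);
}
```

A test can build a logger with WithZbIdentity(options).WriteTo.Sink(sink), log one event, and read the three properties back from sink.Events[0].Properties.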
Task 8: TraceContextEnricher (trace↔log correlation)
Classification: standard
Estimated implement time: ~4 min
Parallelizable with: Task 6
Files:
- Create: src/ZB.MOM.WW.Telemetry.Serilog/TraceContextEnricher.cs
- Modify: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs (register enricher)
- Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/TraceContextEnricherTests.cs
Step 1 — failing test: with an active Activity (start an ActivitySource span), a logged event carries trace_id and span_id equal to Activity.Current.TraceId/SpanId; with no active Activity, neither property is added (clean omission). This is new shared glue — no existing app has it.
Step 2 — FAIL. Step 3 — implement ILogEventEnricher reading Activity.Current; add to AddZbSerilog. Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): TraceContextEnricher for trace<->log correlation
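One way the enricher could be written, as a sketch only: it reads Activity.Current and adds the two properties as hex strings, and adds nothing when no Activity is active. Rendering the ids with ToHexString is an assumption about the desired format.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;
        if (activity is null) return; // clean omission: no active span, no properties

        // Hex rendering is assumed so the values match what trace backends display.
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("trace_id", new ScalarValue(activity.TraceId.ToHexString())));
        logEvent.AddPropertyIfAbsent(
            new LogEventProperty("span_id", new ScalarValue(activity.SpanId.ToHexString())));
    }
}
```

Registration would be a single Enrich.With(new TraceContextEnricher()) inside the shared bootstrap.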
Task 9: ILogRedactor seam + OTel log export
Classification: standard
Estimated implement time: ~5 min
Parallelizable with: none
Files:
- Create: src/ZB.MOM.WW.Telemetry.Serilog/ILogRedactor.cs
- Create: src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs
- Modify: src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogExtensions.cs (optional WriteTo.OpenTelemetry with shared Resource)
- Test: tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs
Step 1 — failing test: register a fake ILogRedactor that masks a property named apiKey; log an event with apiKey="mxgw_secret"; assert the sink sees it masked. The seam is shared; policy is the consumer's (generalize MxGateway GatewayLogRedactor.cs). Also assert (separate test) that when o.Exporter routes logs to OTLP, the log records carry the same Resource as metrics/traces.
Step 2 — FAIL. Step 3 — implement ILogRedactor { void Redact(IDictionary<string,object?> properties); } + a RedactionEnricher that applies the registered redactor; wire optional WriteTo.OpenTelemetry(resource: ZbResource…). Step 4 — PASS. Step 5 — commit: feat(telemetry.serilog): ILogRedactor seam + OTel log export
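The seam and its enricher could be sketched like this. The interface signature is the one given in Step 3; everything else is an assumption: the sketch exposes only scalar properties to the redactor, and ApiKeyRedactor is a hypothetical consumer policy standing in for the fake in the test. The OTel log export wiring is not shown.

```csharp
using System.Collections.Generic;
using Serilog.Core;
using Serilog.Events;

public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

public sealed class RedactionEnricher : ILogEventEnricher
{
    private readonly ILogRedactor _redactor;

    public RedactionEnricher(ILogRedactor redactor) => _redactor = redactor;

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Sketch limitation: only scalar properties are handed to the redactor.
        var scalars = new Dictionary<string, object?>();
        foreach (var (name, value) in logEvent.Properties)
            if (value is ScalarValue scalar) scalars[name] = scalar.Value;

        var before = new Dictionary<string, object?>(scalars);
        _redactor.Redact(scalars);

        // Write back anything the redactor changed or added.
        foreach (var (name, value) in scalars)
            if (!before.TryGetValue(name, out var old) || !Equals(old, value))
                logEvent.AddOrUpdateProperty(new LogEventProperty(name, new ScalarValue(value)));

        // Drop anything the redactor removed outright.
        foreach (var name in before.Keys)
            if (!scalars.ContainsKey(name)) logEvent.RemovePropertyIfPresent(name);
    }
}

// Hypothetical consumer policy: the library owns the seam, the app owns the rules.
public sealed class ApiKeyRedactor : ILogRedactor
{
    public void Redact(IDictionary<string, object?> properties)
    {
        if (properties.ContainsKey("apiKey")) properties["apiKey"] = "***";
    }
}
```

Because the redactor sees a plain dictionary, a consumer's policy never needs a Serilog reference of its own.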
Phase 4 — MxGateway MEL → Serilog migration (the one sister-repo touch)
Touches ~/Desktop/MxAccessGateway. Prereq: Phase 3 complete (ZB.MOM.WW.Telemetry.Serilog packed, or referenced via local project/nupkg source). Add the package via a local NuGet source or ProjectReference to the packed lib. The net48 x86 worker is OUT of scope; leave WorkerConsoleLogger/IWorkerLogger untouched.
Task 10: Swap gateway bootstrap to AddZbSerilog
Classification: high-risk
Estimated implement time: ~5 min
Parallelizable with: none
Files:
- Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs (replace default MEL logging with AddZbSerilog)
- Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj (add package ref)
- Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/appsettings.json (Serilog section: Console+File sinks, MinimumLevel)
- Test: existing ~/Desktop/MxAccessGateway/src/MxGateway.Tests/ (fake worker; no MXAccess needed)
Step 1 — failing/red state: add a focused test (or reuse an existing logging test) asserting the host builds with Serilog as the provider and a log event carries SiteId/NodeRole. Step 2 — run, expect FAIL (still MEL).
Step 3 — implement: reference ZB.MOM.WW.Telemetry.Serilog; call AddZbSerilog mapping o.ServiceName="mxgateway", SiteId/NodeRole from config; add the Serilog config section. Remove the default logging assumptions.
Step 4 — run, expect PASS; then dotnet build src/MxGateway.sln + dotnet test src/MxGateway.Tests green.
Step 5 — commit (in MxGateway repo): refactor(logging): adopt ZB.MOM.WW.Telemetry.Serilog bootstrap
Task 11: Re-express correlation scope + redactor on the shared seam
Classification: high-risk
Estimated implement time: ~5 min
Parallelizable with: none
Files:
- Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayRequestLoggingMiddlewareExtensions.cs (BeginScope → LogContext.PushProperty)
- Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogScope.cs (emit via LogContext)
- Create: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorAdapter.cs (implements ILogRedactor, delegates to existing GatewayLogRedactor policy)
- Test: existing MxGateway.Tests correlation/redaction tests
Step 1 — failing test: assert (a) a request still emits the correlation properties (SessionId/CorrelationId/etc.) now via Serilog LogContext, and (b) a mxgw_-prefixed secret is still redacted through the registered ILogRedactor. Step 2 — FAIL (still MEL BeginScope/old redactor path).
Step 3 — implement: convert the scope middleware to push Serilog LogContext properties (keep header parsing from GatewayRequestLoggingMiddlewareExtensions.cs:22-41); register GatewayLogRedactorAdapter : ILogRedactor wrapping the existing GatewayLogRedactor field/command policy.
Step 4 — PASS; full dotnet test src/MxGateway.Tests green (record counts); verify no secret leakage.
Step 5 — commit (MxGateway repo): refactor(logging): correlation scope + redaction on shared ILogRedactor seam
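The BeginScope → LogContext conversion in Step 3 might take the shape below. This is a sketch, not the gateway's middleware: the header names X-Correlation-Id and X-Session-Id and the extension name are hypothetical, and the real header parsing stays whatever GatewayRequestLoggingMiddlewareExtensions.cs already does. Pushed properties only reach events when the logger is configured with Enrich.FromLogContext().

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Serilog.Context;

public static class GatewayLogContextSketch
{
    // Hypothetical extension name; header names below are placeholders too.
    public static IApplicationBuilder UseGatewayLogContext(this IApplicationBuilder app) =>
        app.Use(async (HttpContext context, RequestDelegate next) =>
        {
            var correlationId = context.Request.Headers["X-Correlation-Id"].ToString();
            if (string.IsNullOrEmpty(correlationId)) correlationId = context.TraceIdentifier;

            var sessionId = context.Request.Headers["X-Session-Id"].ToString();

            // Each PushProperty is popped on dispose, so the properties cover
            // exactly the downstream pipeline for this request.
            using (LogContext.PushProperty("CorrelationId", correlationId))
            using (LogContext.PushProperty("SessionId", sessionId))
            {
                await next(context);
            }
        });
}
```

LogContext is scoped to the async flow, which gives the same per-request lifetime the MEL BeginScope call had.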
Phase 5 — Package & register
Task 12: Pack, README, register in indexes
Classification: small
Estimated implement time: ~5 min
Parallelizable with: none (final)
Files:
- Create: ZB.MOM.WW.Telemetry/README.md
- Modify: both lib .csproj (PackageId/Description/metadata)
- Modify: components/README.md (registry row)
- Modify: CLAUDE.md (Component-normalization table row)
- Modify: upcoming.md (check off Observability)
Steps:
- NuGet metadata on both lib .csprojs.
- dotnet test (both test projects green); record counts.
- dotnet pack -c Release -o ./artifacts → confirm 2 *.0.1.0.nupkg.
- README.md: packages, the identity-triple hinge, exporter options (Prometheus default / OTLP opt-in), consumer matrix, "built; MxGateway logging adopted; broader adoption deferred" note.
- Register: components/README.md row (status Draft), CLAUDE.md row, tick Observability in upcoming.md.
- Commit: lib repo docs: README + pack metadata; scadaproj git add components/observability CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(telemetry): ZB.MOM.WW.Telemetry library + observability normalization component + MxGateway logging adoption"
Acceptance: 2 nupkgs @ 0.1.0; all library tests green + MxGateway tests green (counts recorded); indexes updated; design-doc build-order steps 2-6 (telemetry side) complete.
Summary of parallelism
- Phase 0 docs: Task 1 ∥ Task 2.
- Phase 1 scaffold: Task 3 (barrier).
- Phase 2 core: Task 4 → Task 5 (sequential); Task 6 ∥ the Serilog tasks.
- Phase 3 serilog: Task 7 ∥ Task 8 (Task 8 modifies the same extensions file as Task 7 — sequence if conflict), then Task 9.
- Phase 4 migration: Task 10 → Task 11 (serial, same repo; needs Phase 3).
- Phase 5: Task 12 (barrier).