diff --git a/docs/plans/2026-06-01-telemetry-library-adoption.md b/docs/plans/2026-06-01-telemetry-library-adoption.md
new file mode 100644
index 0000000..cce9d63
--- /dev/null
+++ b/docs/plans/2026-06-01-telemetry-library-adoption.md
@@ -0,0 +1,848 @@
+# ZB.MOM.WW.Telemetry Adoption Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
+
+**Goal:** Adopt the shared `ZB.MOM.WW.Telemetry` + `ZB.MOM.WW.Telemetry.Serilog` packages across OtOpcUa, MxAccessGateway, and ScadaBridge — giving all three the OTel Resource identity triple, standard instrumentation, Prometheus `/metrics`, and shared Serilog correlation — behaviour-preserving, with breaking items deferred.
+
+**Architecture:** Gitea-registry distribution (`dohertj2-gitea`, creds-only at user level). Each app references the shared packages and swaps its bespoke wiring for `AddZbTelemetry` / `AddZbSerilog`, keeping existing meter names, units, log messages, and the `/metrics` path. Each sister repo is its own git repo; work happens on branch `feat/adopt-zb-telemetry`, one commit per task, **never skip hooks, never force-push.**
+
+**Tech Stack:** .NET 10, OpenTelemetry SDK, Prometheus exporter, Serilog, NuGet Central Package Management (OtOpcUa + ScadaBridge; MxGateway has none).
+
+**Source design:** [`2026-06-01-telemetry-library-adoption-design.md`](2026-06-01-telemetry-library-adoption-design.md)
+
+---
+
+## Two refinements discovered during planning (deviations from the design doc)
+
+Both serve the approved **behaviour-preserving** acceptance bar:
+
+1. **ScadaBridge logging — KEEP `LoggerConfigurationFactory`.** The design doc said "delete the
+   factory and swap to `AddZbSerilog`." Code review showed the factory implements a documented
+   governance contract (REQ-HOST-8 / Host-011/014/020/022): `ScadaBridge:Logging:MinimumLevel` is
+   the floor and **overrides** `Serilog:MinimumLevel`, with operator warnings when both are set or
+   a level is mistyped. `AddZbSerilog` hard-codes `MinimumLevel.Is(Information)` *before*
+   `ReadFrom.Configuration`, which inverts that precedence and silently drops the
+   `ScadaBridge:Logging:MinimumLevel` knob (and breaks its tests). **Plan: keep the factory, add the
+   shared `TraceContextEnricher` to it** (gaining trace↔log correlation) and do NOT adopt
+   `AddZbSerilog` for ScadaBridge. ScadaBridge still fully adopts the metrics/Resource half.
+
+2. **MxGateway logging — keep `GatewayLogScope` + request-logging middleware as-is.** The Serilog
+   MEL provider captures MEL `BeginScope` dictionaries as structured properties, so the existing
+   middleware keeps producing the same scope properties once Serilog is the provider. The only
+   logging code changes are: register Serilog as the provider (`AddZbSerilog`), migrate the
+   `appsettings` `Logging` section to a `Serilog` section, and wrap the static `GatewayLogRedactor`
+   behind the `ILogRedactor` seam. No rewrite of working scope code.
+
+---
+
+## Execution order & parallelism
+
+- **Task 0 gates everything** (packages must be on the feed before any repo can restore).
+- After Task 0, the **three repo phases are independent** (separate working directories) and may run
+  concurrently: OtOpcUa (Tasks 1–3), ScadaBridge (Tasks 4–6), MxGateway (Tasks 7–11).
+- **Within a repo, tasks are sequential** (same working tree / same branch — do not dispatch two
+  implementers against one repo concurrently).
+- **Task 12** (scadaproj bookkeeping) runs last, after all three phases land.
+
+Branch setup (first task in each repo creates it): `git checkout -b feat/adopt-zb-telemetry` from the
+repo's default branch (`master` for OtOpcUa, `main` for the others).
+
+---
+
+## Task 0: Publish/verify Telemetry packages on the Gitea feed
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none (gates all)
+
+**Files:**
+- Work in: `/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`
+- No repo files edited (publish only). Credentials already at `~/.nuget/NuGet/NuGet.Config`.
+
+**Context:** The library CLAUDE.md claims these are "published to the Gitea NuGet feed." The Health
+round proved that claim unreliable. Verify; pack + push only if missing. Mirrors Health Task 0.
+
+**Step 1: Check whether `ZB.MOM.WW.Telemetry` 0.1.0 is already on the feed**
+
+```bash
+cd /Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry
+# Use the user-level creds (source name dohertj2-gitea) already configured.
+dotnet nuget list source # confirm dohertj2-gitea is NOT registered globally (creds are user-level only)
+# Read the token once into the shell (never into a file); later steps reuse $TOKEN.
+TOKEN=$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')
+curl -s -u "dohertj2:$TOKEN" \
+  "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/zb.mom.ww.telemetry/index.json" -o /tmp/tele.json -w "%{http_code}\n"
+```
+Expected: `200` if the id is already published (a 200 only proves the id exists, so confirm `0.1.0`
+appears in `/tmp/tele.json`, then SKIP to Step 4), `404` if missing (continue).
+
+**Step 2: Pack the two packages (only if missing)**
+
+```bash
+dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
+ls ./artifacts/*.nupkg
+```
+Expected: `ZB.MOM.WW.Telemetry.0.1.0.nupkg` and `ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg`.
+
+**Step 3: Push both to Gitea (only if missing)**
+
+```bash
+TOKEN=$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')
+for pkg in ./artifacts/ZB.MOM.WW.Telemetry.0.1.0.nupkg ./artifacts/ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg; do
+  dotnet nuget push "$pkg" --source "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" --api-key "$TOKEN"
+done
+```
+Expected: `Your package was pushed.` for each (or `409 Conflict` if a version already exists — acceptable).
+
+**Step 4: Verify both ids resolve**
+
+```bash
+# Re-read the token: this step also runs when Steps 2-3 were skipped.
+TOKEN=$(grep -A2 dohertj2-gitea ~/.nuget/NuGet/NuGet.Config | grep ClearTextPassword | sed -E 's/.*value="([^"]+)".*/\1/')
+for id in zb.mom.ww.telemetry zb.mom.ww.telemetry.serilog; do
+  curl -s -u "dohertj2:$TOKEN" "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/registration/$id/index.json" -w "$id -> %{http_code}\n" -o /dev/null
+done
+```
+Expected: `-> 200` for both.
+
+**Step 5: No commit** (publish-only task). Record completion.
+
+> **SECURITY:** the Gitea token must NEVER be written into any repo file or commit. It lives only in
+> `~/.nuget/NuGet/NuGet.Config`. The `curl`/`push` commands read it from there at runtime.
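+
+The `grep -A2 | grep | sed` pipeline above only works while `ClearTextPassword` sits within two lines
+of the first `dohertj2-gitea` match. Parsing the XML is sturdier. Below is an optional minimal C# sketch
+of that approach. It assumes the standard `<packageSourceCredentials>` layout, with one child element
+named after each source key. It runs against an illustrative sample string (with a fake token), not the
+real `~/.nuget/NuGet/NuGet.Config`. `ReadClearTextPassword` is a hypothetical helper name.
+
+```csharp
+using System;
+using System.Linq;
+using System.Xml.Linq;
+
+// Returns the ClearTextPassword stored for one source, or null when it is absent.
+// Assumes NuGet's <packageSourceCredentials><sourceKey><add key=... value=... /> layout.
+static string? ReadClearTextPassword(string nugetConfigXml, string sourceKey)
+{
+    var doc = XDocument.Parse(nugetConfigXml);
+    return doc.Root?
+        .Element("packageSourceCredentials")?
+        .Element(sourceKey)?                       // element name == source key
+        .Elements("add")
+        .FirstOrDefault(e => (string?)e.Attribute("key") == "ClearTextPassword")?
+        .Attribute("value")?.Value;
+}
+
+// Illustrative sample only; the token value is fake.
+const string sample = """
+<configuration>
+  <packageSourceCredentials>
+    <dohertj2-gitea>
+      <add key="Username" value="dohertj2" />
+      <add key="ClearTextPassword" value="example-token" />
+    </dohertj2-gitea>
+  </packageSourceCredentials>
+</configuration>
+""";
+
+// Report presence only; never print the token itself (see the SECURITY note).
+Console.WriteLine(ReadClearTextPassword(sample, "dohertj2-gitea") is null ? "missing" : "found");
+```
+
+For the real file, pass `File.ReadAllText` of the user-level config. The SECURITY rule still applies:
+keep the value in memory and never log or persist it.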
+
+---
+
+## Task 1: OtOpcUa — distribution wiring (source mapping + package refs)
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** Task 4, Task 7 (other repos)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/NuGet.config`
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/Directory.Packages.props`
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj`
+
+**Step 1: Branch**
+```bash
+cd /Users/dohertj2/Desktop/OtOpcUa && git checkout master && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
+```
+
+**Step 2: Add Telemetry patterns to `NuGet.config`** — under the `dohertj2-gitea` `<packageSource>` entry in `<packageSourceMapping>`, add BOTH patterns (the `.*` glob does NOT match the bare core id):
+```xml
+<package pattern="ZB.MOM.WW.Telemetry" />
+<package pattern="ZB.MOM.WW.Telemetry.*" />
+```
+
+**Step 3: Add versions to `Directory.Packages.props`** (next to the Health `<PackageVersion>` lines):
+```xml
+<PackageVersion Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
+<PackageVersion Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
+```
+
+**Step 4: Add versionless refs to the Host csproj** (next to the `ZB.MOM.WW.Health` refs):
+```xml
+<PackageReference Include="ZB.MOM.WW.Telemetry" />
+<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" />
+```
+
+**Step 5: Restore + build to confirm the Gitea feed resolves and the Serilog version floor is satisfied**
+```bash
+dotnet restore ZB.MOM.WW.OtOpcUa.slnx
+dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
+```
+Expected: restore pulls both packages from `dohertj2-gitea`; build succeeds. If restore fails on a
+`Serilog.AspNetCore` floor (typically reported as an NU1605 package downgrade; OtOpcUa pins 9.0.0), bump `Serilog.AspNetCore` (and the related
+`Serilog.*` 9.x lines) in `Directory.Packages.props` to the floor the package requires, then rebuild.
+
+**Step 6: Commit**
+```bash
+git add NuGet.config Directory.Packages.props src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj
+git commit -m "build(otopcua): reference ZB.MOM.WW.Telemetry packages from Gitea feed"
+```
+
+---
+
+## Task 2: OtOpcUa — swap OTel wiring to AddZbTelemetry
+
+**Classification:** standard
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none (within OtOpcUa)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs` (rewrite body; keep both method names + signatures)
+- Test (oracle, do not edit): `/Users/dohertj2/Desktop/OtOpcUa/tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Observability/OtOpcUaTelemetryHookTests.cs`
+
+**Context:** Today `AddOtOpcUaObservability()` (called at `Program.cs:138`) hand-wires
+`AddOpenTelemetry().WithMetrics(...AddMeter("ZB.MOM.WW.OtOpcUa")...AddPrometheusExporter()).WithTracing(...AddSource("ZB.MOM.WW.OtOpcUa"))`,
+and `MapOtOpcUaMetrics()` (called at `Program.cs:160`) maps `/metrics`. Keep both call sites
+unchanged; rewrite the extension bodies to delegate to the shared library. **Same meter/source
+names + same `/metrics` path** ⇒ behaviour-preserving; gains the Resource identity triple +
+standard instrumentation.
+
+**Step 1: Rewrite `ObservabilityExtensions.cs`** preserving the two public method signatures:
+```csharp
+using System;
+using Microsoft.AspNetCore.Routing;
+using Microsoft.Extensions.DependencyInjection;
+using ZB.MOM.WW.OtOpcUa.Commons.Observability; // OtOpcUaTelemetry
+using ZB.MOM.WW.Telemetry;
+
+namespace ZB.MOM.WW.OtOpcUa.Host.Observability;
+
+/// <summary>
+/// OtOpcUa observability wiring, delegated to the shared ZB.MOM.WW.Telemetry library.
+/// Keeps the existing meter/ActivitySource names ("ZB.MOM.WW.OtOpcUa") and the "/metrics"
+/// scrape path, and adds the shared OTel Resource + standard instrumentation.
+/// </summary>
+public static class ObservabilityExtensions
+{
+    public static IServiceCollection AddOtOpcUaObservability(this IServiceCollection services)
+    {
+        ArgumentNullException.ThrowIfNull(services);
+        return services.AddZbTelemetry(o =>
+        {
+            o.ServiceName = "otopcua";
+            o.Meters = [OtOpcUaTelemetry.MeterName];                   // "ZB.MOM.WW.OtOpcUa"
+            o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; // "ZB.MOM.WW.OtOpcUa"
+            // Exporter defaults to Prometheus — preserves the existing /metrics posture.
+        });
+    }
+
+    // Keep the SAME signature the Program.cs:160 call site uses (app.MapOtOpcUaMetrics()).
+    // MapZbMetrics() maps MapPrometheusScrapingEndpoint() whose default path is "/metrics".
+    public static IEndpointRouteBuilder MapOtOpcUaMetrics(this IEndpointRouteBuilder endpoints)
+    {
+        ArgumentNullException.ThrowIfNull(endpoints);
+        endpoints.MapZbMetrics();
+        return endpoints;
+    }
+}
+```
+> If the existing `MapOtOpcUaMetrics` extends `WebApplication`/`IApplicationBuilder` rather than
+> `IEndpointRouteBuilder`, keep THAT receiver type and call `app.MapZbMetrics();` — match the
+> current signature so `Program.cs:160` compiles unchanged.
+
+**Step 2: Build**
+```bash
+cd /Users/dohertj2/Desktop/OtOpcUa && dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
+```
+Expected: PASS. (The now-redundant direct `OpenTelemetry.Extensions.Hosting` /
+`OpenTelemetry.Exporter.Prometheus.AspNetCore` refs may stay — they resolve the same assemblies the
+shared package brings; leaving them is lower-risk than pruning. If their pinned versions are older than
+what the shared package needs, restore reports a downgrade (NU1605); raise the pins rather than removing the refs.)
+
+**Step 3: Run the telemetry hook tests (the behaviour oracle)**
+```bash
+dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~OtOpcUaTelemetryHookTests"
+```
+Expected: PASS — the meter `ZB.MOM.WW.OtOpcUa` and ActivitySource still emit (the shared
+`AddZbTelemetry` registered them via `o.Meters`/`o.ActivitySources`).
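+
+The hook tests stay green only while the names passed to `o.Meters`/`o.ActivitySources` match the
+emitting code exactly. The underlying .NET rule can be seen with the BCL alone. An `ActivitySource`
+creates activities only when some `ActivityListener` subscribes to its name; otherwise `StartActivity`
+returns null. In the sketch below, a hand-written listener stands in for the name-based subscription
+that the OTel tracer provider sets up. It does not use `ZB.MOM.WW.Telemetry`, and
+`EmitsWhenListeningTo` is a hypothetical helper.
+
+```csharp
+using System;
+using System.Diagnostics;
+
+// True when StartActivity yields an Activity for a listener subscribed to `listenedName`.
+static bool EmitsWhenListeningTo(string sourceName, string listenedName)
+{
+    using var source = new ActivitySource(sourceName);
+    using var listener = new ActivityListener
+    {
+        // Subscribe by exact source name, as a tracer provider does for its registered sources.
+        ShouldListenTo = s => s.Name == listenedName,
+        Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
+    };
+    ActivitySource.AddActivityListener(listener);
+
+    using var activity = source.StartActivity("probe");
+    return activity is not null;
+}
+
+Console.WriteLine(EmitsWhenListeningTo("ZB.MOM.WW.OtOpcUa", "ZB.MOM.WW.OtOpcUa"));      // True
+Console.WriteLine(EmitsWhenListeningTo("ZB.MOM.WW.OtOpcUa", "ZB.MOM.WW.OtOpcUa.Typo")); // False
+```
+
+Passing the names through shared constants (`OtOpcUaTelemetry.MeterName` / `ActivitySourceName`), as
+Step 1 does, keeps registration and emission from drifting apart.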
+
+**Step 4: Commit**
+```bash
+git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs
+git commit -m "feat(otopcua): wire OTel via AddZbTelemetry (shared Resource + std instrumentation)"
+```
+
+---
+
+## Task 3: OtOpcUa — swap Serilog to AddZbSerilog + move sinks to config
+
+**Classification:** standard
+**Estimated implement time:** ~5 min
+**Parallelizable with:** none (within OtOpcUa)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs:49-52` (the inline `UseSerilog` block)
+- Modify: `/Users/dohertj2/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json` (currently `{}`)
+- Test (oracle): `/Users/dohertj2/Desktop/OtOpcUa/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Observability/LogContextEnricherTests.cs`
+
+**Context:** Today `Program.cs:49-52` configures Serilog in code with `ReadFrom.Configuration` +
+`WriteTo.Console()` + `WriteTo.File("logs/otopcua-.log", rollingInterval: RollingInterval.Day)`. `AddZbSerilog` uses
+`ReadFrom.Configuration` only, so the Console/File sinks must move into config to be reproduced. The
+role-specific `appsettings.*.json` already carry `Serilog:MinimumLevel` overrides — those keep
+working through `ReadFrom.Configuration`.
+
+**Step 1: Add the sinks to `appsettings.json`** (replace the empty `{}`):
+```json
+{
+  "Serilog": {
+    "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
+    "WriteTo": [
+      { "Name": "Console" },
+      { "Name": "File", "Args": { "path": "logs/otopcua-.log", "rollingInterval": "Day" } }
+    ]
+  }
+}
+```
+> Do NOT add `"Enrich": ["FromLogContext"]` unless it is already enabled today — adding it would
+> newly surface driver-scope properties and change output. Preserve the current enrich set.
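+
+For the Step 4 smoke check, it helps to know which file the config above writes. With daily rolling,
+Serilog.Sinks.File inserts the date (`yyyyMMdd`) between the file-name stem and the extension. A
+size-limited roll would also append a `_NNN` sequence, but that is not configured here. The minimal C#
+sketch below reimplements that naming rule by hand instead of calling Serilog. `DailyRolledPath` is a
+hypothetical helper.
+
+```csharp
+using System;
+using System.Globalization;
+using System.IO;
+
+// Hand-written mirror of the daily file naming (does not call Serilog), e.g.
+// "logs/otopcua-.log" on 2026-06-01 -> "logs/otopcua-20260601.log" (Unix separators).
+static string DailyRolledPath(string configuredPath, DateTime day)
+{
+    string directory = Path.GetDirectoryName(configuredPath) ?? "";
+    string stem = Path.GetFileNameWithoutExtension(configuredPath);   // "otopcua-"
+    string extension = Path.GetExtension(configuredPath);             // ".log"
+    string stamp = day.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
+    return Path.Combine(directory, stem + stamp + extension);
+}
+
+Console.WriteLine(DailyRolledPath("logs/otopcua-.log", DateTime.Today));
+```
+
+The path is relative, so the file appears under the host process's working directory, the same as before the swap.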
+
+**Step 2: Replace the inline `UseSerilog` block in `Program.cs`.** Remove lines 49-52:
+```csharp
+builder.Host.UseSerilog((ctx, lc) => lc
+    .ReadFrom.Configuration(ctx.Configuration)
+    .WriteTo.Console()
+    .WriteTo.File("logs/otopcua-.log", rollingInterval: RollingInterval.Day));
+```
+and replace with:
+```csharp
+builder.AddZbSerilog(o => o.ServiceName = "otopcua");
+```
+Add `using ZB.MOM.WW.Telemetry.Serilog;` to the `using` block. Keep `app.UseSerilogRequestLogging();`
+(line 141) unchanged. `RollingInterval` is declared in the `Serilog` namespace, so there is no separate
+import to remove; keep `using Serilog;` while anything else (for example `UseSerilogRequestLogging`) still needs it.
+
+**Step 3: Build + run the LogContextEnricher tests**
+```bash
+cd /Users/dohertj2/Desktop/OtOpcUa
+dotnet build ZB.MOM.WW.OtOpcUa.slnx -c Debug
+dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~LogContextEnricherTests"
+```
+Expected: build PASS; tests PASS (the static `LogContextEnricher.Push` helper is unaffected — it is
+not registered in DI and AddZbSerilog does not change its disposable contract).
+
+**Step 4: Sanity-check that logs still emit** (no automated log-output harness here):
+```bash
+# Quick smoke: build runs; optionally run the host briefly in a role that doesn't need infra
+# and confirm console log lines appear. If no safe role exists, rely on the build + the request-
+# logging path remaining wired (UseSerilogRequestLogging at Program.cs:141).
+```
+
+**Step 5: Commit**
+```bash
+git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs src/Server/ZB.MOM.WW.OtOpcUa.Host/appsettings.json
+git commit -m "feat(otopcua): adopt AddZbSerilog (shared enrichers + trace correlation); sinks to config"
+```
+
+---
+
+## Task 4: ScadaBridge — distribution wiring (source mapping + package refs)
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** Task 1, Task 7 (other repos)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/nuget.config`
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/Directory.Packages.props`
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`
+
+**Step 1: Branch**
+```bash
+cd /Users/dohertj2/Desktop/ScadaBridge && git checkout main && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
+```
+
+**Step 2: Add Telemetry patterns to `nuget.config`** under the `dohertj2-gitea` `<packageSource>` entry in `<packageSourceMapping>`:
+```xml
+<package pattern="ZB.MOM.WW.Telemetry" />
+<package pattern="ZB.MOM.WW.Telemetry.*" />
+```
+
+**Step 3: Add versions to `Directory.Packages.props`** (next to the Health lines):
+```xml
+<PackageVersion Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
+<PackageVersion Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
+```
+
+**Step 4: Add versionless refs to the Host csproj** (next to the Health refs):
+```xml
+<PackageReference Include="ZB.MOM.WW.Telemetry" />
+<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" />
+```
+> `ZB.MOM.WW.Telemetry.Serilog` is referenced here only for the public `TraceContextEnricher` type
+> used in Task 6 — ScadaBridge does NOT call `AddZbSerilog`.
+
+**Step 5: Restore + build** (watch for OTel version conflicts with the pinned `OpenTelemetry.Api 1.15.3`)
+```bash
+dotnet restore ZB.MOM.WW.ScadaBridge.slnx
+dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
+```
+Expected: PASS. If a transitive OTel version conflicts with the CVE-override `OpenTelemetry.Api`,
+align the override version upward to what the shared package requires.
+
+**Step 6: Commit**
+```bash
+git add nuget.config Directory.Packages.props src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj
+git commit -m "build(scadabridge): reference ZB.MOM.WW.Telemetry packages from Gitea feed"
+```
+
+---
+
+## Task 5: ScadaBridge — AddZbTelemetry in both composition roots + MapZbMetrics
+
+**Classification:** standard
+**Estimated implement time:** ~5 min
+**Parallelizable with:** none (within ScadaBridge)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs` (`BindSharedOptions`, ~lines 100-117 — add the registration; called by BOTH roots)
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs` (Central endpoint section ~206-259; Site endpoint section ~307-320 — add `app.MapZbMetrics()` in each)
+- Test: `/Users/dohertj2/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/` (add a `/metrics`-served assertion; HealthCheckTests pattern with `WebApplicationFactory`)
+
+**Context:** ScadaBridge has NO OTel today (only the `OpenTelemetry.Api` CVE override). `SiteId`,
+`NodeRole`, `NodeHostname` are available from config (`ScadaBridge:Node:*`). `BindSharedOptions` is
+called by both the Central and Site roots, so registering telemetry there covers both without
+duplication. This is purely additive (no metrics exist to break).
+
+**Step 1: Register telemetry in `BindSharedOptions`.** Inside `SiteServiceRegistration.BindSharedOptions(IServiceCollection services, IConfiguration config)`, after the existing `services.Configure<...>` calls, add:
+```csharp
+// Shared OTel: Resource identity (service.name / site.id / node.role) + standard instrumentation
+// + Prometheus exporter. Mounted at /metrics by app.MapZbMetrics() in each composition root.
+services.AddZbTelemetry(o =>
+{
+    o.ServiceName = "scadabridge";
+    o.SiteId = config["ScadaBridge:Node:SiteId"] ?? "central";
+    o.NodeRole = config["ScadaBridge:Node:Role"];
+    // o.Meters left empty — application instruments are a deferred follow-on (GAPS #9).
+});
+```
+Add `using ZB.MOM.WW.Telemetry;`. (Use the SAME default `?? "central"` for SiteId that
+`Program.cs:45` uses, so the Resource attribute matches the log enricher value.)
+
+**Step 2: Map `/metrics` in BOTH roots.** In `Program.cs`:
+- Central block — after `app.UseRouting()` and alongside the other `Map*` calls (e.g. just after `app.MapZbHealth();`), add:
+  ```csharp
+  app.MapZbMetrics();
+  ```
+- Site block — in its endpoint section (where `app.MapGrpcService<...>()` is mapped, ~307-320), add:
+  ```csharp
+  app.MapZbMetrics();
+  ```
+
+Add `using ZB.MOM.WW.Telemetry;` to `Program.cs` if not already present. `MapZbMetrics()` needs
+endpoint routing. The Central block already calls `UseRouting()`. In the Site block, the existing
+`MapGrpcService` calls show that endpoint routing is already active (a minimal-hosting `WebApplication`
+adds the routing middleware implicitly when endpoints are mapped). Add an explicit `app.UseRouting()`
+only if the Site root builds its pipeline some other way.
+
+**Step 3: Add a `/metrics` integration test** in the Host.Tests project (mirror `HealthCheckTests`):
+```csharp
+// Needs System.Net (HttpStatusCode) and Xunit in scope.
+[Fact]
+public async Task Metrics_Endpoint_IsMapped()
+{
+    using var factory = /* existing WebApplicationFactory setup for Central role */;
+    using var client = factory.CreateClient();
+    var response = await client.GetAsync("/metrics");
+    Assert.Equal(HttpStatusCode.OK, response.StatusCode);
+    var body = await response.Content.ReadAsStringAsync();
+    Assert.Contains("# ", body); // Prometheus exposition format (HELP/TYPE comments)
+}
+```
+> Reuse the exact `WebApplicationFactory` + in-memory config bootstrapping that
+> `HealthCheckTests.cs` already uses for the Central role (it sets the env to "Central" and removes
+> the Akka hosted service). Do not invent a new harness.
+
+**Step 4: Build + test**
+```bash
+cd /Users/dohertj2/Desktop/ScadaBridge
+dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
+dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter "FullyQualifiedName~HealthCheckTests|FullyQualifiedName~Metrics_Endpoint_IsMapped|FullyQualifiedName~CompositionRoot"
+```
+Expected: PASS (existing composition-root + health tests stay green; new metrics test passes).
+
+**Step 5: Commit**
+```bash
+git add src/ZB.MOM.WW.ScadaBridge.Host/SiteServiceRegistration.cs src/ZB.MOM.WW.ScadaBridge.Host/Program.cs tests/ZB.MOM.WW.ScadaBridge.Host.Tests/
+git commit -m "feat(scadabridge): wire AddZbTelemetry + /metrics in both composition roots"
+```
+
+---
+
+## Task 6: ScadaBridge — add shared TraceContextEnricher to LoggerConfigurationFactory
+
+**Classification:** small
+**Estimated implement time:** ~3 min
+**Parallelizable with:** none (within ScadaBridge)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs` (the `Build` return expression)
+- Test (oracle): `/Users/dohertj2/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/SerilogTests.cs` (+ any `LoggerConfigurationFactory` tests)
+
+**Context (deviation from design doc — see top of plan):** KEEP `LoggerConfigurationFactory` intact
+(it owns the Host-011/014/020/022 minimum-level governance). Only add the shared
+`TraceContextEnricher` so logs emitted inside a span carry `trace_id`/`span_id` and can be joined to
+traces. This gains the cross-cutting correlation win without regressing ScadaBridge's logging
+contract.
+
+**Step 1: Add the enricher to the `Build` return.** In `LoggerConfigurationFactory.Build(...)`, the
+final expression currently ends:
+```csharp
+    return new LoggerConfiguration()
+        .ReadFrom.Configuration(configuration)
+        .MinimumLevel.Is(minimumLevel)
+        .Enrich.WithProperty("SiteId", siteId)
+        .Enrich.WithProperty("NodeHostname", nodeHostname)
+        .Enrich.WithProperty("NodeRole", nodeRole);
+```
+Add the shared enricher as the last `.Enrich`:
+```csharp
+        .Enrich.WithProperty("NodeRole", nodeRole)
+        .Enrich.With(new ZB.MOM.WW.Telemetry.Serilog.TraceContextEnricher());
+```
+(Or add `using ZB.MOM.WW.Telemetry.Serilog;` and use `.Enrich.With(new TraceContextEnricher())`.)
+
+**Step 2: Build + run the Serilog tests**
+```bash
+cd /Users/dohertj2/Desktop/ScadaBridge
+dotnet build ZB.MOM.WW.ScadaBridge.slnx -c Debug
+dotnet test ZB.MOM.WW.ScadaBridge.slnx --filter "FullyQualifiedName~SerilogTests|FullyQualifiedName~LoggerConfiguration"
+```
+Expected: PASS. The three node-identity enrichers and the min-level governance are untouched;
+`trace_id`/`span_id` only appear when an `Activity.Current` exists (none in these tests → no change
+to asserted properties).
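+
+The tests see no change because a trace-context enricher depends on `Activity.Current`, which is null
+unless code is running inside a started span. Below is a BCL-only C# sketch of that ambient lookup.
+The real `TraceContextEnricher` internals and the property names it emits belong to the shared library,
+and `CurrentTraceIds` is a hypothetical helper. A hand-written listener plays the role of the
+OTel tracer provider.
+
+```csharp
+using System;
+using System.Diagnostics;
+
+// Hypothetical helper: the ambient ids a trace-context enricher could copy onto a log event.
+static (string? TraceId, string? SpanId) CurrentTraceIds()
+{
+    var activity = Activity.Current;
+    return activity is null
+        ? (null, null)
+        : (activity.TraceId.ToHexString(), activity.SpanId.ToHexString());
+}
+
+// Outside any span: nothing to correlate, so no trace/span properties would be added.
+Console.WriteLine(CurrentTraceIds().TraceId ?? "<no activity>");
+
+using var source = new ActivitySource("Demo.TraceCorrelation");
+using var listener = new ActivityListener
+{
+    ShouldListenTo = s => s.Name == "Demo.TraceCorrelation",
+    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
+};
+ActivitySource.AddActivityListener(listener);
+
+// Inside a started span: Activity.Current is set, so both ids are available.
+using (source.StartActivity("handle-request"))
+{
+    var (traceId, spanId) = CurrentTraceIds();
+    Console.WriteLine($"{traceId} / {spanId}");
+}
+```
+
+The first line prints `<no activity>`. The second prints a 32-hex-character trace id and a
+16-hex-character span id (W3C format), which differ on every run.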
+
+**Step 3: Commit**
+```bash
+git add src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs
+git commit -m "feat(scadabridge): add shared TraceContextEnricher to log pipeline (trace correlation)"
+```
+
+---
+
+## Task 7: MxAccessGateway — distribution wiring (source mapping + package refs)
+
+**Classification:** small
+**Estimated implement time:** ~4 min
+**Parallelizable with:** Task 1, Task 4 (other repos)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/nuget.config`
+- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (NO CPM — direct versioned refs)
+
+**Step 1: Branch**
+```bash
+cd /Users/dohertj2/Desktop/MxAccessGateway && git checkout main && git pull --ff-only && git checkout -b feat/adopt-zb-telemetry
+```
+
+**Step 2: Add Telemetry patterns to `nuget.config`** under the `dohertj2-gitea` `<packageSource>` entry in `<packageSourceMapping>`:
+```xml
+<package pattern="ZB.MOM.WW.Telemetry" />
+<package pattern="ZB.MOM.WW.Telemetry.*" />
+```
+
+**Step 3: Add direct versioned refs to the Server csproj** (in the main `<ItemGroup>` of `<PackageReference>`s). MxGateway has no Serilog/OTel today, so it needs the shared packages AND the concrete sink assemblies referenced by the `appsettings` `Using` block:
+```xml
+<PackageReference Include="ZB.MOM.WW.Telemetry" Version="0.1.0" />
+<PackageReference Include="ZB.MOM.WW.Telemetry.Serilog" Version="0.1.0" />
+<PackageReference Include="Serilog.AspNetCore" Version="10.0.0" />
+<PackageReference Include="Serilog.Sinks.Console" Version="6.1.1" />
+<PackageReference Include="Serilog.Sinks.File" Version="7.0.0" />
+```
+> Versions align with ScadaBridge's pins (Serilog.AspNetCore 10.0.0, Console 6.1.1, File 7.0.0). If
+> the `.Serilog` package requires a different `Serilog.AspNetCore` floor, match it.
+
+**Step 4: Restore + build** (`dotnet build` restores implicitly)
+```bash
+dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
+```
+Expected: PASS (packages resolve from Gitea + nuget.org).
+
+**Step 5: Commit**
+```bash
+git add nuget.config src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
+git commit -m "build(mxgateway): reference ZB.MOM.WW.Telemetry + Serilog packages"
+```
+
+---
+
+## Task 8: MxAccessGateway — migrate appsettings Logging → Serilog section
+
+**Classification:** small
+**Estimated implement time:** ~3 min
+**Parallelizable with:** none (within MxGateway)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/appsettings.json`
+
+**Context:** Current `Logging` (MEL) section: `Default: Information`, `Microsoft.AspNetCore: Warning`.
+`AddZbSerilog` reads sinks/levels via `ReadFrom.Configuration` from a `Serilog` section. Translate
+the levels and add a Console sink (keeps today's console output) plus a rolling File sink (new for
+MxGateway; same daily layout as OtOpcUa's `logs/otopcua-.log`) so logging output survives the provider swap.
+
+**Step 1: Replace the `Logging` block with a `Serilog` block.** Remove:
+```json
+  "Logging": {
+    "LogLevel": { "Default": "Information", "Microsoft.AspNetCore": "Warning" }
+  },
+```
+Add:
+```json
+  "Serilog": {
+    "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
+    "MinimumLevel": {
+      "Default": "Information",
+      "Override": { "Microsoft.AspNetCore": "Warning" }
+    },
+    "WriteTo": [
+      { "Name": "Console" },
+      { "Name": "File", "Args": { "path": "logs/mxgateway-.log", "rollingInterval": "Day" } }
+    ]
+  },
+```
+> Keep the rest of `appsettings.json` (gateway config) unchanged. Note: `AddZbSerilog` applies its
+> own `MinimumLevel.Is(Information)` before `ReadFrom.Configuration`, so the `Serilog:MinimumLevel`
+> above wins: `Default` stays at Information and `Microsoft.AspNetCore` is overridden to Warning,
+> matching today's MEL levels.
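+
+Step 1 needs only `Information` and `Warning`, which have the same names in both systems. MEL and
+Serilog differ at the ends of the scale, though. MEL `Trace` is Serilog `Verbose`, MEL `Critical` is
+Serilog `Fatal`, and MEL `None` has no Serilog level at all. Another appsettings file (for example an
+environment-specific one) may also carry a `Logging` section. The minimal C# sketch below shows how such
+a section could be translated. `ToSerilogLevel` is a hypothetical helper, and the dictionary is today's
+section as quoted in the context above.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Map a MEL "Logging:LogLevel" value to a Serilog "MinimumLevel" value.
+// Returns null for "None" (and anything unrecognised): there is no Serilog equivalent.
+static string? ToSerilogLevel(string melLevel) => melLevel switch
+{
+    "Trace" => "Verbose",
+    "Debug" => "Debug",
+    "Information" => "Information",
+    "Warning" => "Warning",
+    "Error" => "Error",
+    "Critical" => "Fatal",
+    _ => null,
+};
+
+// Today's MxGateway MEL section, as listed in the Task 8 context.
+var melSection = new Dictionary<string, string>
+{
+    ["Default"] = "Information",
+    ["Microsoft.AspNetCore"] = "Warning",
+};
+
+foreach (var (category, level) in melSection)
+{
+    // "Default" maps to MinimumLevel:Default; every other category becomes an Override entry.
+    string key = category == "Default"
+        ? "Serilog:MinimumLevel:Default"
+        : $"Serilog:MinimumLevel:Override:{category}";
+    Console.WriteLine($"{key} = {ToSerilogLevel(level) ?? "<needs review>"}");
+}
+```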
+
+**Step 2: Commit** (config-only; build happens in Task 9 once the provider is wired)
+```bash
+git add src/ZB.MOM.WW.MxGateway.Server/appsettings.json
+git commit -m "config(mxgateway): translate MEL Logging section to Serilog"
+```
+
+---
+
+## Task 9: MxAccessGateway — wire AddZbSerilog (MEL → Serilog provider swap)
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** none (within MxGateway)
+
+**Files:**
+- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (`CreateBuilder`, after `ConfigureSelfSignedTls(builder)` ~line 63)
+- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs` (add a provider-swap assertion)
+
+**Context (high-risk — logging on the most operational app):** Register Serilog as the host's
+logging provider so all existing MEL `ILogger`/`ILoggerFactory` calls (including
+`UseGatewayRequestLoggingScope`'s middleware) route through Serilog. The Serilog MEL provider
+captures MEL `BeginScope` dictionaries as structured properties, so `GatewayLogScope` and the
+request-logging middleware keep working unchanged. The temporary `LoggerFactory.Create(...AddConsole())`
+at lines 96-100 (used only by the TLS cert provider) may remain as-is; its few lines bypass Serilog
+and keep the plain MEL console format.
+
+**Step 1: Add the failing test** in `GatewayApplicationTests.cs` — assert the logger factory is now Serilog-backed:
+```csharp
+// Needs Microsoft.Extensions.DependencyInjection (GetRequiredService) and Microsoft.Extensions.Logging in scope.
+[Fact]
+public void Build_UsesSerilogLoggerProvider()
+{
+    using var app = GatewayApplication.Build([]);
+    var factory = app.Services.GetRequiredService<ILoggerFactory>();
+    // Serilog.Extensions.Hosting's AddSerilog (default writeToProviders: false) registers a
+    // SerilogLoggerFactory as the ILoggerFactory, replacing the default MEL factory.
+    Assert.Equal("SerilogLoggerFactory", factory.GetType().Name);
+}
+```
+
+**Step 2: Run it — expect FAIL** (`dotnet test ... --filter Build_UsesSerilogLoggerProvider`) → today the factory is the default MEL `LoggerFactory`.
+
+**Step 3: Wire `AddZbSerilog`.** In `GatewayApplication.CreateBuilder`, immediately after
+`ConfigureSelfSignedTls(builder);`, add:
+```csharp
+builder.AddZbSerilog(o => o.ServiceName = "mxgateway");
+```
+Add `using ZB.MOM.WW.Telemetry.Serilog;`. (`AddZbSerilog` calls `services.AddSerilog(..., preserveStaticLogger: true)`,
+which registers `SerilogLoggerFactory` — replacing the MEL factory, so default providers do not
+double-log.)
+
+**Step 4: Run the test — expect PASS**, then run the broader logging-adjacent suites:
+```bash
+cd /Users/dohertj2/Desktop/MxAccessGateway
+dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
+dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayApplicationTests"
+dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~FakeWorker"
+```
+Expected: PASS — `Build_MapsCanonicalHealthEndpoints`, `Build_RegistersGatewayMetrics`, the
+config-validation cases, and the fake-worker smoke all stay green; the new provider-swap test passes.
+
+**Step 5: Verify no double console logging** — if `SerilogLoggerFactory` is confirmed in Step 4, the
+default providers are bypassed and no extra step is needed. If you observe duplicated console lines
+in any manual run, add `builder.Logging.ClearProviders();` immediately before `AddZbSerilog`.
+
+**Step 6: Commit**
+```bash
+git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
+git commit -m "feat(mxgateway): adopt AddZbSerilog — MEL→Serilog provider swap (behaviour-preserving)"
+```
+
+---
+
+## Task 10: MxAccessGateway — wrap GatewayLogRedactor behind the ILogRedactor seam
+
+**Classification:** standard
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none (within MxGateway)
+
+**Files:**
+- Create: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorSeam.cs`
+- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (register the seam in DI in `CreateBuilder`)
+- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorSeamTests.cs`
+
+**Context:** The shared `RedactionEnricher` applies any DI-registered `ILogRedactor` to every log
+event before it reaches a sink. MxGateway's redaction lives in the static `GatewayLogRedactor`
+(API-key Bearer tokens, client identity). Provide a thin `ILogRedactor` that redacts the relevant
+log-event properties (`ClientIdentity`, `authorization`/`Authorization`) via the existing static helper. Keep
+`GatewayLogRedactor` for its current callers (`GatewayLogScope`, `DashboardRedactor`).
+
+**Step 1: Write the failing test** (`GatewayLogRedactorSeamTests.cs`):
+```csharp
+using System.Collections.Generic;
+using ZB.MOM.WW.MxGateway.Server.Diagnostics;
+using Xunit;
+
+public class GatewayLogRedactorSeamTests
+{
+    [Fact]
+    public void Redact_MasksApiKeyInClientIdentity()
+    {
+        var redactor = new GatewayLogRedactorSeam();
+        var props = new Dictionary<string, object?>
+        {
+            ["ClientIdentity"] = "Bearer mxgw_operator01_super-secret"
+        };
+        redactor.Redact(props);
+        Assert.Equal("Bearer mxgw_operator01_[redacted]", props["ClientIdentity"]);
+    }
+}
+```
+
+**Step 2: Run it — expect FAIL** (compile error: `GatewayLogRedactorSeam` doesn't exist yet).
+
+**Step 3: Implement `GatewayLogRedactorSeam.cs`:**
+```csharp
+using System;
+using System.Collections.Generic;
+using ZB.MOM.WW.Telemetry.Serilog;
+
+namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;
+
+/// <summary>
+/// Adapts the static <see cref="GatewayLogRedactor"/> to the shared <see cref="ILogRedactor"/> seam
+/// so the telemetry RedactionEnricher masks API-key/credential material on every log event.
+/// </summary>
+public sealed class GatewayLogRedactorSeam : ILogRedactor
+{
+    private static readonly string[] IdentityKeys = ["ClientIdentity", "authorization", "Authorization"];
+
+    // The parameter type must match the shared ILogRedactor contract
+    // (the Step 1 test passes a Dictionary<string, object?>).
+    public void Redact(IDictionary<string, object?> properties)
+    {
+        ArgumentNullException.ThrowIfNull(properties);
+        foreach (var key in IdentityKeys)
+        {
+            if (properties.TryGetValue(key, out var value) && value is string s)
+            {
+                properties[key] = GatewayLogRedactor.RedactClientIdentity(s);
+            }
+        }
+    }
+}
+```
+
+**Step 4: Register in DI.** In `GatewayApplication.CreateBuilder`, alongside the other singletons, add:
+```csharp
+builder.Services.AddSingleton<ILogRedactor, GatewayLogRedactorSeam>();
+```
+
+**Step 5: Run the test + build**
+```bash
+cd /Users/dohertj2/Desktop/MxAccessGateway
+dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayLogRedactorSeamTests"
+dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug
+```
+Expected: PASS.
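+
+The Step 1 test fixes one masking rule: keep the `Bearer mxgw_<client>_` prefix and replace the
+secret tail with `[redacted]`. Below is a hypothetical regex-based stand-in, `MaskApiKey`, that
+satisfies exactly that expectation. It is only a way to reason about what the test demands. The real
+`GatewayLogRedactor.RedactClientIdentity` remains the source of truth and may apply different or
+broader rules.
+
+```csharp
+using System;
+using System.Text.RegularExpressions;
+
+// HYPOTHETICAL stand-in for GatewayLogRedactor.RedactClientIdentity: keeps the
+// "mxgw_<client>_" prefix of an API key and replaces the secret tail with "[redacted]".
+static string MaskApiKey(string value) =>
+    Regex.Replace(value, @"(mxgw_[A-Za-z0-9]+_)\S+", "$1[redacted]");
+
+Console.WriteLine(MaskApiKey("Bearer mxgw_operator01_super-secret")); // Bearer mxgw_operator01_[redacted]
+Console.WriteLine(MaskApiKey("anonymous"));                           // anonymous
+```
+
+A rule that leaves non-key values untouched is what lets the seam apply the helper to every listed
+key without checking the value's shape first. Confirm the real helper behaves that way before relying on it.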
+ +**Step 6: Commit** +```bash +git add src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactorSeam.cs src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorSeamTests.cs +git commit -m "feat(mxgateway): expose GatewayLogRedactor via shared ILogRedactor seam" +``` + +--- + +## Task 11: MxAccessGateway — wire AddZbTelemetry (export GatewayMetrics) + MapZbMetrics + +**Classification:** standard +**Estimated implement time:** ~4 min +**Parallelizable with:** none (within MxGateway) + +**Files:** +- Modify: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs` (`CreateBuilder` after `AddSingleton()` ~line 72; `MapGatewayEndpoints` after `MapZbHealth()` ~line 177) +- Test: `/Users/dohertj2/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs` (add `/metrics`-served assertion) + existing `GatewayMetricsTests` as oracle + +**Context:** The `MxGateway.Server` meter (13 counters, 3 ms-histograms, 4 gauges) exists but is +never exported (no OTel SDK, no `/metrics`). `AddZbTelemetry` with `Meters = ["MxGateway.Server"]` +registers the meter with the OTel MeterProvider + Prometheus exporter; `MapZbMetrics()` mounts +`/metrics`. **Keep the `MxGateway.Server` name and the `ms` histogram units** (rename #7 + unit #6 +are deferred). `GetSnapshot()` is untouched. + +**Step 1: Add `AddZbTelemetry` in `CreateBuilder`**, immediately after `builder.Services.AddSingleton();`: +```csharp +builder.AddZbTelemetry(o => +{ + o.ServiceName = "mxgateway"; + o.Meters = [GatewayMetrics.MeterName]; // "MxGateway.Server" — unchanged (rename deferred) +}); +``` +Add `using ZB.MOM.WW.Telemetry;`. 
+ +**Step 2: Map `/metrics` in `MapGatewayEndpoints`**, after `endpoints.MapZbHealth();`: +```csharp +endpoints.MapZbMetrics(); +``` + +**Step 3: Add the served-endpoint test** in `GatewayApplicationTests.cs`: +```csharp +// Requires using System.Net (HttpStatusCode), System.Net.Http (HttpClient) and System.Linq (Urls.First()). +[Fact] +public async Task Build_MapsMetricsEndpoint() +{ + using var app = GatewayApplication.Build([]); + await app.StartAsync(); + try + { + using var client = new HttpClient { BaseAddress = new Uri(app.Urls.First()) }; + var response = await client.GetAsync("/metrics"); + Assert.Equal(HttpStatusCode.OK, response.StatusCode); + } + finally { await app.StopAsync(); } +} +``` +> If the existing test class already has a started-host helper (the config-validation tests call +> `StartAsync`), reuse it rather than starting a fresh host. Tests bind ephemeral ports (`:0`). + +**Step 4: Build + test** +```bash +cd /Users/dohertj2/Desktop/MxAccessGateway +dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj -c Debug +dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~GatewayApplicationTests|FullyQualifiedName~GatewayMetricsTests" +``` +Expected: PASS — the `MeterListener`-based `GatewayMetricsTests` (Tests-027 isolation) stay green +because the meter name/instruments are unchanged; the new `/metrics` test passes.
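The `GatewayMetricsTests` oracle stays valid because a `MeterListener` subscribes by meter name, independently of whether an OTel MeterProvider also listens. Below is a minimal sketch of that style of check using only `System.Diagnostics.Metrics` from the BCL. The meter name `MxGateway.Server` and the `ms` unit come from this task; the instrument name `mxgateway.request.duration` and the `MeterCaptureSketch` helper are hypothetical, since the real `GatewayMetrics` instrument names are not listed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper: captures double-valued measurements from one meter, matched by name only.
public sealed class MeterCaptureSketch : IDisposable
{
    private readonly MeterListener _listener = new();

    public List<(string Instrument, string? Unit, double Value)> Measurements { get; } = new();

    public MeterCaptureSketch(string meterName)
    {
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            // Subscribe only to the meter under test; a renamed meter would simply record nothing here.
            if (instrument.Meter.Name == meterName)
            {
                listener.EnableMeasurementEvents(instrument);
            }
        };
        _listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
            Measurements.Add((instrument.Name, instrument.Unit, value)));
        // Start() also replays InstrumentPublished for instruments that already exist.
        _listener.Start();
    }

    public void Dispose() => _listener.Dispose();
}
```

This is why the meter rename (#7) and the ms-to-s unit change (#6) are deferred: either one would change what a name- and unit-keyed assertion like this sees, even though the OTel export itself would keep working.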
+ +**Step 5: Commit** +```bash +git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs +git commit -m "feat(mxgateway): export GatewayMetrics via AddZbTelemetry + /metrics (name/units unchanged)" +``` + +--- + +## Task 12: scadaproj — bookkeeping (GAPS + correct the false "MxGateway logging adopted" claim) + +**Classification:** trivial +**Estimated implement time:** ~4 min +**Parallelizable with:** none (runs after all repo phases) + +**Files:** +- Modify: `/Users/dohertj2/Desktop/scadaproj/components/observability/GAPS.md` (add "Adoption status — 2026-06-01 (DONE)" section) +- Modify: `/Users/dohertj2/Desktop/scadaproj/components/observability/README.md` (correct the "MxGateway logging adopted" claim) +- Modify: `/Users/dohertj2/Desktop/scadaproj/ZB.MOM.WW.Telemetry/CLAUDE.md` (same correction) +- Modify: `/Users/dohertj2/Desktop/scadaproj/CLAUDE.md` (observability row + "MxAccessGateway logging adopted" note) + +**Step 1: Add an adoption-status section to `GAPS.md`** with a per-repo table (what each app now +does), the **accepted scope note** (ScadaBridge keeps `LoggerConfigurationFactory` + adds +`TraceContextEnricher` rather than adopting `AddZbSerilog`; MxGateway keeps `GatewayLogScope`), and a +**Deferred** subsection listing #6 (histogram ms→s), #7 (meter rename), #9 (ScadaBridge app +instruments), #10/#11 (OTLP) as still-open. + +**Step 2: Correct the false claim** everywhere it appears — the prior text said MxGateway's MEL→Serilog +migration was "done on its own branch." Replace with: "MxGateway MEL→Serilog migration + metrics +export landed on `main` via the 2026-06-01 telemetry adoption (branch `feat/adopt-zb-telemetry`)." 
+ +**Step 3: Commit** +```bash +cd /Users/dohertj2/Desktop/scadaproj +git add components/observability/GAPS.md components/observability/README.md ZB.MOM.WW.Telemetry/CLAUDE.md CLAUDE.md +git commit -m "docs(observability): record ZB.MOM.WW.Telemetry adoption across 3 apps; correct MxGateway logging-status claim" +``` + +--- + +## Acceptance checklist (whole plan) + +- [ ] Both Telemetry packages resolve from the Gitea feed (Task 0 verified `200`). +- [ ] OtOpcUa: builds; `OtOpcUaTelemetryHookTests` + `LogContextEnricherTests` green; `/metrics` still served; meter `ZB.MOM.WW.OtOpcUa` unchanged. +- [ ] ScadaBridge: builds; composition-root + health + new metrics tests green; `/metrics` served in both roles; `LoggerConfigurationFactory` governance intact. +- [ ] MxGateway: builds; `GatewayApplicationTests` + `GatewayMetricsTests` + fake-worker smoke green; logger is Serilog-backed; redaction applied via seam; `/metrics` served; `MxGateway.Server` name + `ms` units unchanged. +- [ ] No secrets committed to any repo (token stays in `~/.nuget/NuGet/NuGet.Config`). +- [ ] `components/observability/GAPS.md` updated; the false "MxGateway logging adopted" claim corrected. +- [ ] All three feature branches committed (one commit per task), no hooks skipped, no force-push. 
diff --git a/docs/plans/2026-06-01-telemetry-library-adoption.md.tasks.json b/docs/plans/2026-06-01-telemetry-library-adoption.md.tasks.json new file mode 100644 index 0000000..80fa80d --- /dev/null +++ b/docs/plans/2026-06-01-telemetry-library-adoption.md.tasks.json @@ -0,0 +1,20 @@ +{ + "planPath": "docs/plans/2026-06-01-telemetry-library-adoption.md", + "tasks": [ + {"id": 0, "taskId": 23, "subject": "Task 0: Publish/verify Telemetry packages on Gitea", "status": "pending", "classification": "small"}, + {"id": 1, "taskId": 24, "subject": "Task 1: OtOpcUa — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]}, + {"id": 2, "taskId": 25, "subject": "Task 2: OtOpcUa — swap OTel to AddZbTelemetry", "status": "pending", "classification": "standard", "blockedBy": [1]}, + {"id": 3, "taskId": 26, "subject": "Task 3: OtOpcUa — swap Serilog to AddZbSerilog", "status": "pending", "classification": "standard", "blockedBy": [2]}, + {"id": 4, "taskId": 27, "subject": "Task 4: ScadaBridge — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]}, + {"id": 5, "taskId": 28, "subject": "Task 5: ScadaBridge — AddZbTelemetry both roots + MapZbMetrics", "status": "pending", "classification": "standard", "blockedBy": [4]}, + {"id": 6, "taskId": 29, "subject": "Task 6: ScadaBridge — TraceContextEnricher in LoggerConfigurationFactory", "status": "pending", "classification": "small", "blockedBy": [5]}, + {"id": 7, "taskId": 30, "subject": "Task 7: MxAccessGateway — distribution wiring", "status": "pending", "classification": "small", "blockedBy": [0]}, + {"id": 8, "taskId": 31, "subject": "Task 8: MxAccessGateway — appsettings Logging → Serilog", "status": "pending", "classification": "small", "blockedBy": [7]}, + {"id": 9, "taskId": 32, "subject": "Task 9: MxAccessGateway — AddZbSerilog (MEL→Serilog provider swap)", "status": "pending", "classification": "high-risk", "blockedBy": [8]}, + {"id": 10, "taskId": 33, 
"subject": "Task 10: MxAccessGateway — ILogRedactor seam", "status": "pending", "classification": "standard", "blockedBy": [9]}, + {"id": 11, "taskId": 34, "subject": "Task 11: MxAccessGateway — AddZbTelemetry metrics export + MapZbMetrics", "status": "pending", "classification": "standard", "blockedBy": [10]}, + {"id": 12, "taskId": 35, "subject": "Task 12: scadaproj — bookkeeping + correct false claim", "status": "pending", "classification": "trivial", "blockedBy": [3, 6, 11]} + ], + "notes": "Task 0 gates all. After Task 0 the three repo phases (OtOpcUa 1-3, ScadaBridge 4-6, MxGateway 7-11) are independent and may run concurrently across their separate working directories; within a repo tasks are sequential. Task 12 last.", + "lastUpdated": "2026-06-01" +}