From 6c2a43a23827a3c0f797026dbbda82ce18307eed Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 1 Jun 2026 16:32:57 -0400
Subject: [PATCH] docs: plan for ZB.MOM.WW.Telemetry follow-ons (A additive/hygiene, B metric normalization, C ScadaBridge instruments, D OTLP opt-in)

---
 docs/plans/2026-06-01-telemetry-followons.md | 117 +++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 docs/plans/2026-06-01-telemetry-followons.md

diff --git a/docs/plans/2026-06-01-telemetry-followons.md b/docs/plans/2026-06-01-telemetry-followons.md
new file mode 100644
index 0000000..92bf7c4
--- /dev/null
+++ b/docs/plans/2026-06-01-telemetry-followons.md
@@ -0,0 +1,117 @@
+# ZB.MOM.WW.Telemetry — Follow-ons Implementation Plan
+
+> Continuation of [`2026-06-01-telemetry-library-adoption.md`](2026-06-01-telemetry-library-adoption.md).
+> Executes the deferred follow-ons recorded in `components/observability/GAPS.md`, all four groups
+> selected by the user.
+
+**Goal:** Close the recorded telemetry follow-ons across the three apps — additive/hygiene fixes,
+MxGateway metric normalization, ScadaBridge first application instruments, and OTLP opt-in.
+
+**Branches:** new `feat/telemetry-followons` per repo (off the now-updated default). Commit per task,
+never skip hooks, never force-push. The three repo phases are independent (parallel); within a repo,
+sequential.
+
+**Behaviour bar:** additive/opt-in by default (Prometheus stays the default exporter; new instruments
+are new series; the MxGateway `ms`→`s` + rename are the *one* intentional metric-shape change, safe
+because those series were never Prometheus-exported before the adoption).
+
+---
+
+## OtOpcUa (branch `feat/telemetry-followons` off `master`)
+
+### Task O-A2: align Serilog to the 10.x line
+**Classification:** small · **Files:** `Directory.Packages.props`
+Bump `Serilog.AspNetCore`, `Serilog.Extensions.Hosting`, `Serilog.Settings.Configuration` from
+`9.0.0` → `10.0.0` (ScadaBridge already runs `10.0.0` with `Serilog 4.x`, so 10.x is 4.x-compatible —
+no Serilog 5 needed). Keep `Serilog 4.3.0` (or bump to `4.3.1` to match ScadaBridge). Restore + build
+`ZB.MOM.WW.OtOpcUa.slnx`; run `--filter LogContextEnricherTests`. Commit.
+
+### Task O-D: OTLP exporter opt-in (config-driven)
+**Classification:** standard · **Parallelizable with:** O-A2 (disjoint files)
+**Files:** `src/Server/.../Observability/ObservabilityExtensions.cs`, `src/Server/.../Program.cs:138`
+Refactor `AddOtOpcUaObservability` to accept `IConfiguration` and read
+`OtOpcUa:Telemetry:Exporter` (`Prometheus`|`Otlp`, default Prometheus) + `OtOpcUa:Telemetry:OtlpEndpoint`;
+set `o.Exporter`/`o.OtlpEndpoint` accordingly. Update the call site to
+`builder.Services.AddOtOpcUaObservability(builder.Configuration)`. Default (no config) stays Prometheus.
+This also makes OtOpcUa's recorded spans exportable when OTLP is configured (resolves the trace no-op).
+Build; run `OtOpcUaTelemetryHookTests`. Commit.
+
+---
+
+## MxAccessGateway (branch `feat/telemetry-followons` off `main`)
+
+### Task M-A3: gitignore stray doc artifacts
+**Classification:** trivial · **Files:** `.gitignore`
+Append a `# Documentation review artifacts` block ignoring `*-docs-issues.md`, `*-docs-fixed.md`,
+`*-docs-final.md` (the 5 untracked `*-docs-*.md` files are CommentChecker "Documentation Analysis
+Report" output). Commit. (Do NOT delete the files — just ignore.)
+
+### Task M-B: metric normalization (`ms`→`s` + meter rename)
+**Classification:** standard · **Files:** `src/.../Metrics/GatewayMetrics.cs`, test if needed
+- Rename `MeterName` const `"MxGateway.Server"` → `"ZB.MOM.WW.MxGateway"`. (AddZbTelemetry uses the
+  const, so it follows automatically; no test asserts the literal; `GatewayMetricsTests` filter by
+  meter *instance*, not name.)
+- Change the 3 histograms' unit `"ms"`→`"s"` (CreateHistogram lines) and their 4 record sites
+  `.TotalMilliseconds` → `.TotalSeconds`. The snapshot/dashboard do NOT read these histograms, so no
+  read-path impact. Check `GatewayMetricsTests` for any histogram-value assertion in ms and update.
+Build the Server project; run `--filter "GatewayMetricsTests|GatewayApplicationTests"`. Commit.
+
+### Task M-D: OTLP exporter opt-in
+**Classification:** small · **Files:** `src/.../GatewayApplication.cs` (the `AddZbTelemetry` lambda)
+In the `AddZbTelemetry` lambda, read `MxGateway:Telemetry:Exporter` + `MxGateway:Telemetry:OtlpEndpoint`
+from `builder.Configuration` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build.
+Commit. (Sequential after M-B — both touch GatewayApplication.cs / metrics area.)
+
+---
+
+## ScadaBridge (branch `feat/telemetry-followons` off `main`)
+
+### Task S-A1: site-node HTTP/1.1 `/metrics` listener
+**Classification:** standard · **Files:** `src/.../NodeOptions.cs`, `src/.../Program.cs` (Site Kestrel)
+Add `MetricsPort` (default `8082`) to `NodeOptions`. In the Site block's `ConfigureKestrel`, add a
+second `ListenAnyIP(metricsPort, lo => lo.Protocols = Http1AndHttp2)` alongside the existing HTTP/2-only
+gRPC-port listener, so the already-mapped `/metrics` becomes scrapable over HTTP/1.1 on site nodes.
+Read the port from `ScadaBridge:Node:MetricsPort` (default 8082). Build; existing Host.Tests stay green.
+Commit.
+
+### Task S-D: OTLP exporter opt-in
+**Classification:** small · **Files:** `src/.../SiteServiceRegistration.cs` (the `AddZbTelemetry` lambda)
+In `BindSharedOptions`, read `ScadaBridge:Telemetry:Exporter` + `ScadaBridge:Telemetry:OtlpEndpoint`
+from `config` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build. Commit.
+(Sequential after S-C0 — both edit the `AddZbTelemetry` call.)
+
+### Task S-C0: `ScadaBridgeTelemetry` meter + registration
+**Classification:** standard · **Files:** Create `src/ZB.MOM.WW.ScadaBridge.Commons/Observability/ScadaBridgeTelemetry.cs`; edit `SiteServiceRegistration.cs` (`AddZbTelemetry` Meters)
+Create a `ScadaBridgeTelemetry` static class: `Meter "ZB.MOM.WW.ScadaBridge"` + the four instruments
+(`scadabridge.deployments.applied` counter; `scadabridge.store_and_forward.queue.depth` observable
+gauge; `scadabridge.inbound_api.requests` counter; `scadabridge.site.connection.up` up/down gauge) with
+thin static emit helpers. Register `o.Meters = ["ZB.MOM.WW.ScadaBridge"]` in the `AddZbTelemetry` call.
+Build. Commit. (Precedes C1–C4.)
+
+### Tasks S-C1…S-C4: wire the four emit points
+**Classification:** standard each · depend on S-C0
+- **S-C1 `deployments.applied`** — increment on the DeploymentManager/DeploymentService success path.
+- **S-C2 `store_and_forward.queue.depth`** — observable-gauge callback reading the StoreAndForward depth
+  (SQLite `COUNT`/existing depth accessor).
+- **S-C3 `inbound_api.requests`** — increment (tag = method) in the InboundAPI endpoint filter/middleware.
+- **S-C4 `site.connection.up`** — +1 on site-stream open, −1 on close in the Communication/SiteStream
+  gRPC server.
+Each implementer finds the cleanest emit point and **STOPs + reports** if no clean point exists rather
+than forcing a fragile edit. Add a focused test where practical. Build; commit per instrument.
+
+---
+
+## scadaproj bookkeeping
+
+### Task Z: update GAPS.md
+**Classification:** trivial · **Files:** `components/observability/GAPS.md`
+Move the handled follow-ons (#6/#7 done; A1 site-listener done; #9 first instruments done; #10/#11 OTLP
+opt-in done) from "Deferred" to a "Follow-ons — DONE 2026-06-01" subsection; note what each app now does.
+Commit + (on user request) push all branches/merges.
+
+---
+
+## Sequencing
+
+After each repo branch is cut: OtOpcUa {O-A2 ∥ O-D}; MxGateway {M-A3 → M-B → M-D}; ScadaBridge
+{S-A1 ∥ (S-C0 → {S-C1, S-C2, S-C3, S-C4} → S-D)}. Repos run in parallel. Z + merge/push last.