docs: plan for ZB.MOM.WW.Telemetry follow-ons (A additive/hygiene, B metric normalization, C ScadaBridge instruments, D OTLP opt-in)
@@ -0,0 +1,117 @@
# ZB.MOM.WW.Telemetry — Follow-ons Implementation Plan

> Continuation of [`2026-06-01-telemetry-library-adoption.md`](2026-06-01-telemetry-library-adoption.md).
> Executes the deferred follow-ons recorded in `components/observability/GAPS.md`, all four groups
> selected by the user.

**Goal:** Close the recorded telemetry follow-ons across the three apps: additive/hygiene fixes,
MxGateway metric normalization, ScadaBridge first application instruments, and OTLP opt-in.

**Branches:** new `feat/telemetry-followons` per repo (off the now-updated default). Commit per task,
never skip hooks, never force-push. The three repo phases are independent and can run in parallel;
within a repo, tasks follow the order given under Sequencing.

**Behaviour bar:** additive/opt-in by default (Prometheus stays the default exporter; new instruments
are new series; the MxGateway `ms`→`s` unit change plus meter rename is the *one* intentional
metric-shape change, safe because those series were never Prometheus-exported before the adoption).

---

## OtOpcUa (branch `feat/telemetry-followons` off `master`)

### Task O-A2: align Serilog to the 10.x line
**Classification:** small · **Files:** `Directory.Packages.props`
Bump `Serilog.AspNetCore`, `Serilog.Extensions.Hosting`, `Serilog.Settings.Configuration` from
`9.0.0` → `10.0.0` (ScadaBridge already runs `10.0.0` with `Serilog 4.x`, so 10.x is 4.x-compatible;
no Serilog 5 needed). Keep `Serilog 4.3.0` (or bump to `4.3.1` to match ScadaBridge). Restore + build
`ZB.MOM.WW.OtOpcUa.slnx`; run `--filter LogContextEnricherTests`. Commit.

### Task O-D: OTLP exporter opt-in (config-driven)
**Classification:** standard · **Parallelizable with:** O-A2 (disjoint files)
**Files:** `src/Server/.../Observability/ObservabilityExtensions.cs`, `src/Server/.../Program.cs:138`
Refactor `AddOtOpcUaObservability` to accept `IConfiguration` and read
`OtOpcUa:Telemetry:Exporter` (`Prometheus`|`Otlp`, default Prometheus) + `OtOpcUa:Telemetry:OtlpEndpoint`;
set `o.Exporter`/`o.OtlpEndpoint` accordingly. Update the call site to
`builder.Services.AddOtOpcUaObservability(builder.Configuration)`. Default (no config) stays Prometheus.
This also makes OtOpcUa's recorded spans exportable when OTLP is configured (resolves the trace no-op).
Build; run `OtOpcUaTelemetryHookTests`. Commit.
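
The exporter selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TelemetryExporter`, `ExporterSettings`, and `ExporterSettingsReader` are hypothetical stand-ins (the real option types belong to ZB.MOM.WW.Telemetry and may be shaped differently), and the `lookup` delegate stands in for the `IConfiguration` string indexer so the snippet needs nothing beyond the base class library. Falling back to Prometheus on an unrecognised value is also an assumption; the plan only specifies the no-config default.

```csharp
using System;

// Hypothetical stand-ins: the real enum and options type live in ZB.MOM.WW.Telemetry
// and may be named or shaped differently.
public enum TelemetryExporter { Prometheus, Otlp }

public sealed record ExporterSettings(TelemetryExporter Exporter, string? OtlpEndpoint);

public static class ExporterSettingsReader
{
    // `lookup` stands in for IConfiguration's indexer, e.g. key => configuration[key].
    // `prefix` is the app section name, e.g. "OtOpcUa".
    public static ExporterSettings Read(Func<string, string?> lookup, string prefix)
    {
        string? raw = lookup($"{prefix}:Telemetry:Exporter");

        // Missing, empty, or unrecognised values fall back to Prometheus so an
        // unconfigured deployment keeps its current behaviour (sketch assumption
        // for the unrecognised case).
        TelemetryExporter exporter =
            Enum.TryParse(raw, ignoreCase: true, out TelemetryExporter parsed)
            && Enum.IsDefined(parsed)
                ? parsed
                : TelemetryExporter.Prometheus;

        string? endpoint = lookup($"{prefix}:Telemetry:OtlpEndpoint");
        return new ExporterSettings(
            exporter,
            string.IsNullOrWhiteSpace(endpoint) ? null : endpoint);
    }
}
```

Inside the refactored extension method, the result would then be copied onto `o.Exporter`/`o.OtlpEndpoint`. Rejecting an unrecognised exporter value at startup instead of silently defaulting is a reasonable alternative if misconfiguration should fail loudly.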

---

## MxAccessGateway (branch `feat/telemetry-followons` off `main`)

### Task M-A3: gitignore stray doc artifacts
**Classification:** trivial · **Files:** `.gitignore`
Append a `# Documentation review artifacts` block ignoring `*-docs-issues.md`, `*-docs-fixed.md`,
`*-docs-final.md` (the 5 untracked `*-docs-*.md` files are CommentChecker "Documentation Analysis
Report" output). Commit. (Do NOT delete the files; just ignore them.)

### Task M-B: metric normalization (`ms`→`s` + meter rename)
**Classification:** standard · **Files:** `src/.../Metrics/GatewayMetrics.cs`, test if needed
- Rename `MeterName` const `"MxGateway.Server"` → `"ZB.MOM.WW.MxGateway"`. (AddZbTelemetry uses the
  const, so it follows automatically; no test asserts the literal; `GatewayMetricsTests` filter by
  meter *instance*, not name.)
- Change the 3 histograms' unit `"ms"`→`"s"` (CreateHistogram lines) and their 4 record sites
  `.TotalMilliseconds` → `.TotalSeconds`. The snapshot/dashboard do NOT read these histograms, so no
  read-path impact. Check `GatewayMetricsTests` for any histogram-value assertion in ms and update.

Build the Server project; run `--filter "GatewayMetricsTests|GatewayApplicationTests"`. Commit.
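
The unit change amounts to editing the histogram declaration and each record site together. A minimal sketch using `System.Diagnostics.Metrics` follows; the class name and the instrument name `mxgateway.operation.duration` are hypothetical placeholders, since the plan does not list the three real histogram names.

```csharp
using System;
using System.Diagnostics.Metrics;

public static class GatewayLatencySketch
{
    // Meter name as given in the plan.
    public const string MeterName = "ZB.MOM.WW.MxGateway";

    private static readonly Meter GatewayMeter = new(MeterName);

    private static readonly Histogram<double> OperationDuration =
        GatewayMeter.CreateHistogram<double>(
            "mxgateway.operation.duration",   // hypothetical instrument name
            unit: "s",                         // previously "ms"
            description: "Duration of a gateway operation.");

    // Record site: the value must be in seconds to match the declared unit.
    public static void RecordOperation(TimeSpan elapsed) =>
        OperationDuration.Record(elapsed.TotalSeconds); // previously elapsed.TotalMilliseconds
}
```

Changing the unit string without the record sites (or the reverse) would leave values off by a factor of 1000 while still compiling, which is why the task pairs the two edits and asks for a check of any millisecond-based assertions in `GatewayMetricsTests`.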

### Task M-D: OTLP exporter opt-in
**Classification:** small · **Files:** `src/.../GatewayApplication.cs` (the `AddZbTelemetry` lambda)
In the `AddZbTelemetry` lambda, read `MxGateway:Telemetry:Exporter` + `MxGateway:Telemetry:OtlpEndpoint`
from `builder.Configuration` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build.
Commit. (Sequential after M-B: both touch `GatewayApplication.cs` / the metrics area.)

---

## ScadaBridge (branch `feat/telemetry-followons` off `main`)

### Task S-A1: site-node HTTP/1.1 `/metrics` listener
**Classification:** standard · **Files:** `src/.../NodeOptions.cs`, `src/.../Program.cs` (Site Kestrel)
Add `MetricsPort` (default `8082`) to `NodeOptions`. In the Site block's `ConfigureKestrel`, add a
second `ListenAnyIP(metricsPort, lo => lo.Protocols = HttpProtocols.Http1AndHttp2)` alongside the
existing HTTP/2-only gRPC-port listener, so the already-mapped `/metrics` becomes scrapable over
HTTP/1.1 on site nodes.
Read the port from `ScadaBridge:Node:MetricsPort` (default 8082). Build; existing Host.Tests stay green.
Commit.
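
As a rough sketch of the listener layout described above, assuming the Site block uses the minimal-hosting `WebApplication` builder: the gRPC port value is a hypothetical placeholder, and reading `MetricsPort` straight from configuration (rather than through the bound `NodeOptions`) is a simplification for illustration.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Server.Kestrel.Core;
using Microsoft.Extensions.Configuration;

var builder = WebApplication.CreateBuilder(args);

// Key and default come from the plan.
int metricsPort = builder.Configuration.GetValue("ScadaBridge:Node:MetricsPort", 8082);
int grpcPort = 8083; // hypothetical; the real value comes from the existing node options

builder.WebHost.ConfigureKestrel(kestrel =>
{
    // Existing listener: HTTP/2 only, used for gRPC.
    kestrel.ListenAnyIP(grpcPort, lo => lo.Protocols = HttpProtocols.Http2);

    // New listener: also accepts HTTP/1.1, so a Prometheus scraper can reach /metrics.
    kestrel.ListenAnyIP(metricsPort, lo => lo.Protocols = HttpProtocols.Http1AndHttp2);
});

var app = builder.Build();
// ... existing endpoint mapping, including the already-mapped /metrics, goes here ...
app.Run();
```

Both listeners feed the same request pipeline, so every mapped endpoint is reachable on either port unless something restricts it. If `/metrics` should answer only on the metrics port, that would need an explicit constraint, which the plan does not call for.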

### Task S-D: OTLP exporter opt-in
**Classification:** small · **Files:** `src/.../SiteServiceRegistration.cs` (the `AddZbTelemetry` lambda)
In `BindSharedOptions`, read `ScadaBridge:Telemetry:Exporter` + `ScadaBridge:Telemetry:OtlpEndpoint`
from `config` (in scope) and set `o.Exporter`/`o.OtlpEndpoint`. Default Prometheus. Build. Commit.
(Sequential after S-C0: both edit the `AddZbTelemetry` call.)

### Task S-C0: `ScadaBridgeTelemetry` meter + registration
**Classification:** standard · **Files:** Create `src/ZB.MOM.WW.ScadaBridge.Commons/Observability/ScadaBridgeTelemetry.cs`; edit `SiteServiceRegistration.cs` (`AddZbTelemetry` Meters)
Create a `ScadaBridgeTelemetry` static class: `Meter "ZB.MOM.WW.ScadaBridge"` + the four instruments
(`scadabridge.deployments.applied` counter; `scadabridge.store_and_forward.queue.depth` observable
gauge; `scadabridge.inbound_api.requests` counter; `scadabridge.site.connection.up` up/down gauge) with
thin static emit helpers. Register `o.Meters = ["ZB.MOM.WW.ScadaBridge"]` in the `AddZbTelemetry` call.
Build. Commit. (Precedes C1–C4.)
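
One possible shape for the class is sketched below, using only `System.Diagnostics.Metrics`. The meter and instrument names come from the plan; the helper method names, the `method` tag key, and the settable queue-depth provider are illustrative assumptions. The "up/down gauge" is modelled here as an `UpDownCounter<long>`, which is also an assumption about the intended instrument kind.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

public static class ScadaBridgeTelemetry
{
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    private static readonly Meter ScadaBridgeMeter = new(MeterName);

    // Replaced at startup by the store-and-forward component (hypothetical hook).
    private static Func<long> _queueDepthProvider = static () => 0L;

    private static readonly Counter<long> DeploymentsApplied =
        ScadaBridgeMeter.CreateCounter<long>("scadabridge.deployments.applied");

    private static readonly Counter<long> InboundApiRequests =
        ScadaBridgeMeter.CreateCounter<long>("scadabridge.inbound_api.requests");

    // Assumption: the plan's "up/down gauge" maps to an UpDownCounter.
    private static readonly UpDownCounter<long> SiteConnectionUp =
        ScadaBridgeMeter.CreateUpDownCounter<long>("scadabridge.site.connection.up");

    // The callback runs whenever a listener or exporter collects observable instruments.
    private static readonly ObservableGauge<long> QueueDepth =
        ScadaBridgeMeter.CreateObservableGauge<long>(
            "scadabridge.store_and_forward.queue.depth",
            () => Volatile.Read(ref _queueDepthProvider)());

    public static void DeploymentApplied() => DeploymentsApplied.Add(1);

    public static void InboundApiRequest(string method) =>
        InboundApiRequests.Add(1, new KeyValuePair<string, object?>("method", method));

    public static void SiteConnectionOpened() => SiteConnectionUp.Add(1);

    public static void SiteConnectionClosed() => SiteConnectionUp.Add(-1);

    public static void SetQueueDepthProvider(Func<long> provider) =>
        Volatile.Write(
            ref _queueDepthProvider,
            provider ?? throw new ArgumentNullException(nameof(provider)));
}
```

Keeping the instruments private and exposing only thin helpers means call sites cannot misspell an instrument name or tag key. The provider indirection is there because an observable gauge needs its callback at creation time, before the store-and-forward component exists.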

### Tasks S-C1…S-C4: wire the four emit points
**Classification:** standard each · depend on S-C0
- **S-C1 `deployments.applied`**: increment on the DeploymentManager/DeploymentService success path.
- **S-C2 `store_and_forward.queue.depth`**: observable-gauge callback reading the StoreAndForward depth
  (SQLite `COUNT`/existing depth accessor).
- **S-C3 `inbound_api.requests`**: increment (tag = method) in the InboundAPI endpoint filter/middleware.
- **S-C4 `site.connection.up`**: +1 on site-stream open, −1 on close in the Communication/SiteStream
  gRPC server.

Each implementer finds the cleanest emit point and **stops and reports** if no clean point exists rather
than forcing a fragile edit. Add a focused test where practical. Build; commit per instrument.
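
For the "focused test where practical" part, one option is a small `MeterListener`-based collector that a test can point at a meter by name. The sketch below is an assumption about how such a helper might look, not an existing type in any of the repos; `LongMeasurementCollector` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Collects long-valued measurements from one meter so a test can assert on them.
public sealed class LongMeasurementCollector : IDisposable
{
    private readonly MeterListener _listener = new();
    private readonly object _gate = new();
    private readonly List<(string Instrument, long Value)> _seen = new();

    public LongMeasurementCollector(string meterName)
    {
        // Subscribe only to instruments that belong to the requested meter.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == meterName)
                listener.EnableMeasurementEvents(instrument);
        };
        _listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            lock (_gate) _seen.Add((instrument.Name, value));
        });
        _listener.Start();
    }

    // Triggers the callbacks of observable instruments, e.g. a queue-depth gauge.
    public void Poll() => _listener.RecordObservableInstruments();

    // Sum of all recorded values for one instrument; suits counters and up/down counters.
    public long Sum(string instrumentName)
    {
        lock (_gate)
        {
            long total = 0;
            foreach (var (name, value) in _seen)
                if (name == instrumentName) total += value;
            return total;
        }
    }

    public void Dispose() => _listener.Dispose();
}
```

A test for S-C4, for example, could create a collector for `"ZB.MOM.WW.ScadaBridge"`, open and then close a site stream, and assert that `Sum("scadabridge.site.connection.up")` is back to zero. Because the listener filters by meter name, tests running in parallel against the same static meter can still see each other's measurements, so assertions on deltas are safer than assertions on absolute totals.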

---

## scadaproj bookkeeping

### Task Z: update GAPS.md
**Classification:** trivial · **Files:** `components/observability/GAPS.md`
Move the handled follow-ons (#6/#7 done; A1 site-listener done; #9 first instruments done; #10/#11 OTLP
opt-in done) from "Deferred" to a "Follow-ons — DONE 2026-06-01" subsection; note what each app now does.
Commit + (on user request) push all branches/merges.

---

## Sequencing

After each repo branch is cut: OtOpcUa {O-A2 ∥ O-D}; MxGateway {M-A3 → M-B → M-D}; ScadaBridge
{S-A1 ∥ (S-C0 → {S-C1, S-C2, S-C3, S-C4} → S-D)}. Repos run in parallel. Z + merge/push last.