
ZB.MOM.WW.Telemetry — Follow-ons Implementation Plan

Continuation of 2026-06-01-telemetry-library-adoption.md. Executes the deferred follow-ons recorded in components/observability/GAPS.md, all four groups selected by the user.

Goal: Close the recorded telemetry follow-ons across the three apps — additive/hygiene fixes, MxGateway metric normalization, ScadaBridge first application instruments, and OTLP opt-in.

Branches: new feat/telemetry-followons per repo (off the now-updated default). Commit per task, never skip hooks, never force-push. The three repo phases are independent (parallel); within a repo, sequential.

Behaviour bar: additive/opt-in by default (Prometheus stays the default exporter; new instruments are new series; the MxGateway ms → s unit change and meter rename are the one intentional metric-shape change, safe because those series were never Prometheus-exported before the adoption).


OtOpcUa (branch feat/telemetry-followons off master)

Task O-A2: align Serilog to the 10.x line

Classification: small · Files: Directory.Packages.props

Bump Serilog.AspNetCore, Serilog.Extensions.Hosting, Serilog.Settings.Configuration from 9.0.0 → 10.0.0 (ScadaBridge already runs 10.0.0 with Serilog 4.x, so 10.x is 4.x-compatible and no Serilog 5 is needed). Keep Serilog 4.3.0 (or bump to 4.3.1 to match ScadaBridge). Restore + build ZB.MOM.WW.OtOpcUa.slnx; run tests with --filter LogContextEnricherTests. Commit.

Task O-D: OTLP exporter opt-in (config-driven)

Classification: standard · Parallelizable with: O-A2 (disjoint files) · Files: src/Server/.../Observability/ObservabilityExtensions.cs, src/Server/.../Program.cs:138

Refactor AddOtOpcUaObservability to accept IConfiguration and read OtOpcUa:Telemetry:Exporter (Prometheus|Otlp, default Prometheus) + OtOpcUa:Telemetry:OtlpEndpoint; set o.Exporter/o.OtlpEndpoint accordingly. Update the call site to builder.Services.AddOtOpcUaObservability(builder.Configuration). Default (no config) stays Prometheus. This also makes OtOpcUa's recorded spans exportable when OTLP is configured (resolves the trace no-op). Build; run OtOpcUaTelemetryHookTests. Commit.
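The selection rule in Task O-D (Prometheus unless configuration explicitly asks for OTLP) can be sketched in isolation. This is a minimal illustration, not the actual AddOtOpcUaObservability code: TelemetryExporter, TelemetrySettings and TelemetrySettingsResolver are hypothetical names, and the two arguments stand in for the raw strings read from IConfiguration, for example configuration["OtOpcUa:Telemetry:Exporter"]. The plan does not say how an unrecognized exporter value is handled; falling back to Prometheus is an assumption.

```csharp
using System;

// Hypothetical names for illustration; the real option types belong to the telemetry library.
public enum TelemetryExporter { Prometheus, Otlp }

public sealed record TelemetrySettings(TelemetryExporter Exporter, Uri? OtlpEndpoint);

public static class TelemetrySettingsResolver
{
    // exporterValue / endpointValue are the raw configuration strings (null when the key is absent).
    public static TelemetrySettings Resolve(string? exporterValue, string? endpointValue)
    {
        // Missing or unrecognized values keep the default exporter
        // (assumption: fall back instead of throwing).
        var exporter =
            Enum.TryParse<TelemetryExporter>(exporterValue, ignoreCase: true, out var parsed)
            && Enum.IsDefined(parsed)
                ? parsed
                : TelemetryExporter.Prometheus;

        // The endpoint is only meaningful for OTLP; it is ignored otherwise.
        Uri? endpoint = null;
        if (exporter == TelemetryExporter.Otlp
            && Uri.TryCreate(endpointValue, UriKind.Absolute, out var uri))
        {
            endpoint = uri;
        }

        return new TelemetrySettings(exporter, endpoint);
    }
}
```

Keeping the parsing separate from the registration call makes the "no config stays Prometheus" default easy to cover with a unit test; the resolved values would then be copied onto o.Exporter/o.OtlpEndpoint.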


MxAccessGateway (branch feat/telemetry-followons off main)

Task M-A3: gitignore stray doc artifacts

Classification: trivial · Files: .gitignore

Append a # Documentation review artifacts block ignoring *-docs-issues.md, *-docs-fixed.md, *-docs-final.md (the 5 untracked *-docs-*.md files are CommentChecker "Documentation Analysis Report" output). Commit. (Do NOT delete the files; just ignore them.)

Task M-B: metric normalization (ms → s + meter rename)

Classification: standard · Files: src/.../Metrics/GatewayMetrics.cs, test if needed

  • Rename MeterName const "MxGateway.Server" → "ZB.MOM.WW.MxGateway". (AddZbTelemetry uses the const, so it follows automatically; no test asserts the literal; GatewayMetricsTests filter by meter instance, not name.)
  • Change the 3 histograms' unit "ms" → "s" (CreateHistogram lines) and their 4 record sites .TotalMilliseconds → .TotalSeconds. The snapshot/dashboard do NOT read these histograms, so no read-path impact.

Check GatewayMetricsTests for any histogram-value assertion in ms and update. Build the Server project; run tests with --filter "GatewayMetricsTests|GatewayApplicationTests". Commit.
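The unit change in Task M-B touches two places per histogram: the unit string at creation and the value passed at each record site. A minimal sketch using System.Diagnostics.Metrics, assuming one histogram; the class name GatewayMetricsSketch and the instrument name mxgateway.request.duration are hypothetical and not taken from GatewayMetrics.cs.

```csharp
using System;
using System.Diagnostics.Metrics;

public static class GatewayMetricsSketch
{
    // Meter name from the plan (previously "MxGateway.Server").
    public const string MeterName = "ZB.MOM.WW.MxGateway";

    private static readonly Meter Meter = new(MeterName);

    private static readonly Histogram<double> RequestDuration =
        Meter.CreateHistogram<double>(
            "mxgateway.request.duration",            // hypothetical instrument name
            unit: "s",                                // was "ms"
            description: "Duration of a gateway request.");

    // Record site: the value must change together with the unit,
    // otherwise the series is off by a factor of 1000.
    public static void RecordRequest(TimeSpan elapsed) =>
        RequestDuration.Record(elapsed.TotalSeconds); // was elapsed.TotalMilliseconds
}
```

Any test that asserts a recorded histogram value needs the same factor-of-1000 adjustment, which is why the plan calls for checking GatewayMetricsTests.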

Task M-D: OTLP exporter opt-in

Classification: small · Files: src/.../GatewayApplication.cs (the AddZbTelemetry lambda)

In the AddZbTelemetry lambda, read MxGateway:Telemetry:Exporter + MxGateway:Telemetry:OtlpEndpoint from builder.Configuration (in scope) and set o.Exporter/o.OtlpEndpoint. Default Prometheus. Build. Commit. (Sequential after M-B: both touch GatewayApplication.cs / metrics area.)


ScadaBridge (branch feat/telemetry-followons off main)

Task S-A1: site-node HTTP/1.1 /metrics listener

Classification: standard · Files: src/.../NodeOptions.cs, src/.../Program.cs (Site Kestrel)

Add MetricsPort (default 8082) to NodeOptions. In the Site block's ConfigureKestrel, add a second ListenAnyIP(metricsPort, lo => lo.Protocols = Http1AndHttp2) alongside the existing HTTP/2-only gRPC-port listener, so the already-mapped /metrics becomes scrapable over HTTP/1.1 on site nodes. Read the port from ScadaBridge:Node:MetricsPort (default 8082). Build; existing Host.Tests stay green. Commit.

Task S-D: OTLP exporter opt-in

Classification: small · Files: src/.../SiteServiceRegistration.cs (the AddZbTelemetry lambda)

In BindSharedOptions, read ScadaBridge:Telemetry:Exporter + ScadaBridge:Telemetry:OtlpEndpoint from config (in scope) and set o.Exporter/o.OtlpEndpoint. Default Prometheus. Build. Commit. (Sequential after S-C0: both edit the AddZbTelemetry call.)

Task S-C0: ScadaBridgeTelemetry meter + registration

Classification: standard · Files: Create src/ZB.MOM.WW.ScadaBridge.Commons/Observability/ScadaBridgeTelemetry.cs; edit SiteServiceRegistration.cs (AddZbTelemetry Meters)

Create a ScadaBridgeTelemetry static class: Meter "ZB.MOM.WW.ScadaBridge" + the four instruments (scadabridge.deployments.applied counter; scadabridge.store_and_forward.queue.depth observable gauge; scadabridge.inbound_api.requests counter; scadabridge.site.connection.up up/down gauge) with thin static emit helpers. Register o.Meters = ["ZB.MOM.WW.ScadaBridge"] in the AddZbTelemetry call. Build. Commit. (Precedes C1–C4.)
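One possible shape for the static class described in Task S-C0, as a sketch under stated assumptions rather than the final file. The meter and instrument names come from the plan; the helper method names, the SetQueueDepthProvider hook, the long value type, and the mapping of the "up/down gauge" to UpDownCounter (available from .NET 7) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class ScadaBridgeTelemetry
{
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    private static readonly Meter Meter = new(MeterName);

    // Hypothetical hook: S-C2 would replace this with the real depth accessor.
    private static Func<long> _queueDepthProvider = static () => 0;

    private static readonly Counter<long> DeploymentsApplied =
        Meter.CreateCounter<long>("scadabridge.deployments.applied");

    private static readonly Counter<long> InboundApiRequests =
        Meter.CreateCounter<long>("scadabridge.inbound_api.requests");

    // Assumption: the plan's "up/down gauge" is modelled as an UpDownCounter.
    private static readonly UpDownCounter<long> SiteConnectionUp =
        Meter.CreateUpDownCounter<long>("scadabridge.site.connection.up");

    // Observable gauge: the callback runs only when a collector reads the meter.
    private static readonly ObservableGauge<long> QueueDepth =
        Meter.CreateObservableGauge<long>(
            "scadabridge.store_and_forward.queue.depth",
            () => _queueDepthProvider());

    // Thin emit helpers (names are illustrative).
    public static void DeploymentApplied() => DeploymentsApplied.Add(1);

    public static void InboundApiRequest(string method) =>
        InboundApiRequests.Add(1, new KeyValuePair<string, object?>("method", method));

    public static void SiteConnectionOpened() => SiteConnectionUp.Add(1);

    public static void SiteConnectionClosed() => SiteConnectionUp.Add(-1);

    public static void SetQueueDepthProvider(Func<long> provider) =>
        _queueDepthProvider = provider ?? throw new ArgumentNullException(nameof(provider));
}
```

Routing every emit through a helper keeps instrument names in one file, so the emit-point tasks only add one-line calls at their call sites.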

Tasks S-C1…S-C4: wire the four emit points

Classification: standard each · depend on S-C0

  • S-C1 deployments.applied: increment on the DeploymentManager/DeploymentService success path.
  • S-C2 store_and_forward.queue.depth: observable-gauge callback reading the StoreAndForward depth (SQLite COUNT/existing depth accessor).
  • S-C3 inbound_api.requests: increment (tag = method) in the InboundAPI endpoint filter/middleware.
  • S-C4 site.connection.up: +1 on site-stream open, −1 on close in the Communication/SiteStream gRPC server.

Each implementer finds the cleanest emit point and STOPs + reports if no clean point exists rather than forcing a fragile edit. Add a focused test where practical. Build; commit per instrument.
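For S-C4 the main risk is an unbalanced count: a stream that faults or is cancelled must still emit its −1. One way to guarantee that is a disposable scope wrapped around the stream handler. The sketch below is self-contained and hypothetical: SiteConnectionTracker and Track are illustrative names, and the real emit point in the Communication/SiteStream server is not shown in the plan.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Threading;

public sealed class SiteConnectionTracker
{
    private readonly UpDownCounter<long> _connectionsUp;

    public SiteConnectionTracker(Meter meter) =>
        _connectionsUp = meter.CreateUpDownCounter<long>("scadabridge.site.connection.up");

    // Call when a site stream opens; dispose the returned scope when it closes.
    public IDisposable Track()
    {
        _connectionsUp.Add(1);
        return new Scope(_connectionsUp);
    }

    private sealed class Scope : IDisposable
    {
        private readonly UpDownCounter<long> _counter;
        private int _disposed;

        public Scope(UpDownCounter<long> counter) => _counter = counter;

        public void Dispose()
        {
            // Guard against double-dispose so the -1 is emitted exactly once.
            if (Interlocked.Exchange(ref _disposed, 1) == 0)
            {
                _counter.Add(-1);
            }
        }
    }
}
```

In a streaming handler this would be used as `using var scope = tracker.Track();` at the top of the method, so the decrement runs on normal completion, cancellation, and exceptions alike.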

scadaproj bookkeeping

Task Z: update GAPS.md

Classification: trivial · Files: components/observability/GAPS.md

Move the handled follow-ons (#6/#7 done; A1 site-listener done; #9 first instruments done; #10/#11 OTLP opt-in done) from "Deferred" to a "Follow-ons — DONE 2026-06-01" subsection; note what each app now does. Commit + (on user request) push all branches/merges.


Sequencing

After each repo branch is cut: OtOpcUa {O-A2 ∥ O-D}; MxGateway {M-A3 → M-B → M-D}; ScadaBridge {S-A1 ∥ (S-C0 → {S-C1, S-C2, S-C3, S-C4} → S-D)}. Repos run in parallel. Z + merge/push last.