diff --git a/components/observability/GAPS.md b/components/observability/GAPS.md
index 4d47256..f127d3b 100644
--- a/components/observability/GAPS.md
+++ b/components/observability/GAPS.md
@@ -215,13 +215,24 @@ sister apps in one pass, behaviour-preserving. Each adoption landed on a per-rep
 producing the same properties under Serilog. Only the provider swap + the `ILogRedactor` adapter
 were needed.
 
-### Deferred (still open follow-ons)
+## Follow-ons — DONE 2026-06-01
 
-- **#6** MxGateway histogram `ms`→`s`, **#7** Meter rename `MxGateway.Server`→`ZB.MOM.WW.MxGateway`
-  (both break dashboards — ops-coordinated).
-- **#9** ScadaBridge application instruments (`ScadaBridgeTelemetry` + `scadabridge.*`).
-- **#10/#11** OTLP exporter wiring; OtOpcUa trace export is still a no-op (Prometheus is metrics-only).
-- **NEW — ScadaBridge Site-node `/metrics` scrape:** the Site role's Kestrel is HTTP/2-only (gRPC),
-  so the mapped `/metrics` is not HTTP/1.1-scrapable on that listener. The in-process metrics + Resource
-  still apply; Central serves `/metrics` normally. A follow-on should add a dedicated HTTP/1.1 (or
-  `Http1AndHttp2`) listener/port for site-node scraping.
+All the deferred follow-ons were then executed (branch `feat/telemetry-followons` per repo,
+behaviour-preserving except the intentional, no-consumer-yet metric-shape change in #6/#7). Plan:
+[`docs/plans/2026-06-01-telemetry-followons.md`](../../docs/plans/2026-06-01-telemetry-followons.md).
+
+| Item | Status | What landed |
+|---|---|---|
+| **#6** MxGateway histogram `ms`→`s` | ✅ | 3 histograms record `.TotalSeconds`, unit `"s"`. Safe — never Prometheus-exported before, so no dashboards broke. |
+| **#7** Meter rename → `ZB.MOM.WW.MxGateway` | ✅ | `GatewayMetrics.MeterName` renamed; `docs/Metrics.md` synced. |
+| **#9** ScadaBridge app instruments | ✅ | `ScadaBridgeTelemetry` meter (`ZB.MOM.WW.ScadaBridge`) + first 4: `deployments.applied` (counter), `store_and_forward.queue.depth` (sync-safe cached gauge), `inbound_api.requests` (counter, bounded `method` tag), `site.connection.up` (balanced open/close gauge). |
+| **#10/#11** OTLP opt-in | ✅ | All 3 apps read `:Telemetry:Exporter` (`Prometheus`\|`Otlp`) + `:OtlpEndpoint`, default Prometheus. Setting OTLP also exports OtOpcUa's spans (resolves the trace no-op) — once a collector endpoint is configured. |
+| **Site-node `/metrics` scrape** | ✅ | ScadaBridge `NodeOptions.MetricsPort` (default **8084**, avoids the site `RemotingPort=8082` collision) + a second `Http1AndHttp2` Kestrel listener on the Site role; `StartupValidator` enforces MetricsPort ≠ Remoting/Grpc. |
+| Serilog version drift | ✅ | OtOpcUa `Serilog.AspNetCore`/`.Extensions.Hosting`/`.Settings.Configuration` aligned to `10.0.0` (family-consistent). |
+
+**Still open (not code — operational/future):**
+
+- **OTLP is opt-in but unexercised** until an OTel collector endpoint is deployed and the
+  `:Telemetry:Exporter=Otlp` + `:OtlpEndpoint` config is set. The wiring is in place; only a
+  collector is missing.
+- **Further ScadaBridge instruments** beyond the first 4 are additive future work (not blocking).