docs(observability): record telemetry follow-ons DONE (metric normalization, ScadaBridge instruments, OTLP opt-in, site metrics listener, Serilog alignment)
@@ -215,13 +215,24 @@ sister apps in one pass, behaviour-preserving. Each adoption landed on a per-rep
producing the same properties under Serilog. Only the provider swap + the `ILogRedactor` adapter were
needed.

### Deferred (still open follow-ons)
## Follow-ons — DONE 2026-06-01

- **#6** MxGateway histogram `ms`→`s`, **#7** Meter rename `MxGateway.Server`→`ZB.MOM.WW.MxGateway`
(both break dashboards; ops-coordinated).
- **#9** ScadaBridge application instruments (`ScadaBridgeTelemetry` + `scadabridge.*`).
- **#10/#11** OTLP exporter wiring; OtOpcUa trace export is still a no-op (Prometheus is metrics-only).
- **NEW: ScadaBridge Site-node `/metrics` scrape.** The Site role's Kestrel listener is HTTP/2-only (gRPC),
so the mapped `/metrics` endpoint cannot be scraped over HTTP/1.1 on that listener. The in-process metrics + Resource
still apply; Central serves `/metrics` normally. A follow-on should add a dedicated HTTP/1.1 (or
`Http1AndHttp2`) listener/port for site-node scraping.

All the deferred follow-ons were then executed (branch `feat/telemetry-followons` per repo,
behaviour-preserving except the intentional, no-consumer-yet metric-shape change in #6/#7). Plan:
[`docs/plans/2026-06-01-telemetry-followons.md`](../../docs/plans/2026-06-01-telemetry-followons.md).

| Item | Status | What landed |
|---|---|---|
| **#6** MxGateway histogram `ms`→`s` | ✅ | 3 histograms record `.TotalSeconds`, unit `"s"`. Safe: never Prometheus-exported before, so no dashboards broke. |
| **#7** Meter rename → `ZB.MOM.WW.MxGateway` | ✅ | `GatewayMetrics.MeterName` renamed; `docs/Metrics.md` synced. |
| **#9** ScadaBridge app instruments | ✅ | `ScadaBridgeTelemetry` meter (`ZB.MOM.WW.ScadaBridge`) + first 4: `deployments.applied` (counter), `store_and_forward.queue.depth` (sync-safe cached gauge), `inbound_api.requests` (counter, bounded `method` tag), `site.connection.up` (balanced open/close gauge). |
| **#10/#11** OTLP opt-in | ✅ | All 3 apps read `<App>:Telemetry:Exporter` (`Prometheus`\|`Otlp`) + `:OtlpEndpoint`, default Prometheus. Selecting OTLP also exports OtOpcUa's spans (resolving the trace no-op) once a collector endpoint is configured. |
| **Site-node `/metrics` scrape** | ✅ | ScadaBridge `NodeOptions.MetricsPort` (default **8084**, chosen to avoid colliding with the site's `RemotingPort=8082`) + a second `Http1AndHttp2` Kestrel listener on the Site role; `StartupValidator` enforces MetricsPort ≠ the Remoting/Grpc ports. |
| Serilog version drift | ✅ | OtOpcUa `Serilog.AspNetCore`/`.Extensions.Hosting`/`.Settings.Configuration` aligned to `10.0.0` (family-consistent). |

**Still open (not code; operational/future work):**

- **OTLP is opt-in but unexercised** until an OTel collector endpoint is deployed and the
`<App>:Telemetry:Exporter=Otlp` + `:OtlpEndpoint` config is set. The wiring is in place; only a
collector is missing.
- **Further ScadaBridge instruments** beyond the first 4 are additive future work (not blocking).
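The two gauge styles named in the #9 row of the table above can be illustrated with a minimal, language-neutral Python sketch. It assumes that "balanced open/close" means every connection open is paired with exactly one close (an up/down count), and that "sync-safe cached" means the metrics callback returns a value the writer last cached instead of querying the store-and-forward queue. `BalancedGauge` and `CachedGauge` are hypothetical names, not the ScadaBridge .NET types.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class BalancedGauge:
    """Hypothetical stand-in for a balanced open/close gauge such as site.connection.up."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._open = 0

    @contextmanager
    def track(self) -> Iterator[None]:
        # +1 on open; the finally block guarantees the matching -1 on close,
        # even if the connection body raises, so the count cannot drift upward.
        with self._lock:
            self._open += 1
        try:
            yield
        finally:
            with self._lock:
                self._open -= 1

    def observe(self) -> int:
        with self._lock:
            return self._open


class CachedGauge:
    """Hypothetical stand-in for a sync-safe cached gauge such as store_and_forward.queue.depth."""

    def __init__(self, initial: int = 0) -> None:
        self._lock = threading.Lock()
        self._value = initial

    def set(self, value: int) -> None:
        # Called by the writer (e.g. after each enqueue/dequeue) with the current depth.
        with self._lock:
            self._value = value

    def observe(self) -> int:
        # Called synchronously by the metrics collector; returns the cached value
        # without touching the backing queue or waiting on any I/O.
        with self._lock:
            return self._value
```

Wrapping each site connection in `with gauge.track():` keeps the count balanced by construction, and because `CachedGauge.set(...)` runs on the queue's own code path, a scrape never blocks on the queue itself.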
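The OTLP opt-in described in the #10/#11 row and in the still-open list reduces to a small resolution rule: read `<App>:Telemetry:Exporter`, default to Prometheus, and use `:OtlpEndpoint` only when OTLP is selected. The Python sketch below models that rule over a flat dictionary of colon-delimited keys (mirroring .NET configuration key paths). `resolve_exporter`, `ExporterChoice`, the case-insensitive match, and the errors on unknown values or a missing endpoint are all assumptions of the sketch, not the apps' actual code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Accepted values for "<App>:Telemetry:Exporter". Case-insensitive matching is an
# assumption of this sketch.
_EXPORTERS = {"prometheus": "Prometheus", "otlp": "Otlp"}


@dataclass(frozen=True)
class ExporterChoice:
    exporter: str                        # "Prometheus" or "Otlp"
    otlp_endpoint: Optional[str] = None  # only set when exporter == "Otlp"


def resolve_exporter(config: Mapping[str, str], app: str) -> ExporterChoice:
    """Hypothetical resolver over flat, colon-delimited configuration keys."""
    raw = config.get(f"{app}:Telemetry:Exporter", "Prometheus")  # documented default
    exporter = _EXPORTERS.get(raw.strip().lower())
    if exporter is None:
        # Assumption: an unknown value fails fast instead of silently falling back.
        raise ValueError(f"{app}:Telemetry:Exporter must be Prometheus or Otlp, got {raw!r}")
    if exporter == "Prometheus":
        return ExporterChoice("Prometheus")
    endpoint = config.get(f"{app}:Telemetry:OtlpEndpoint", "").strip()
    if not endpoint:
        # Assumption: Otlp without an endpoint is rejected rather than guessed.
        raise ValueError(f"{app}:Telemetry:OtlpEndpoint is required when Exporter=Otlp")
    return ExporterChoice("Otlp", endpoint)
```

For example, with a hypothetical `ScadaBridge` section and collector host, `resolve_exporter({"ScadaBridge:Telemetry:Exporter": "Otlp", "ScadaBridge:Telemetry:OtlpEndpoint": "http://otel-collector:4317"}, "ScadaBridge")` returns `ExporterChoice(exporter='Otlp', otlp_endpoint='http://otel-collector:4317')`, while an empty config resolves to Prometheus.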
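The `StartupValidator` rule in the Site-node row of the table above can be sketched as a pure function over the node's ports. This is a hypothetical Python model, not the ScadaBridge validator: `SiteNodePorts` and `validate_ports` are invented names, and only the 8082/8084 defaults come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteNodePorts:
    # Hypothetical mirror of the ScadaBridge NodeOptions ports. The 8082/8084
    # defaults come from the table; the gRPC port has no stated default, so it is required here.
    grpc_port: int
    remoting_port: int = 8082
    metrics_port: int = 8084


def validate_ports(ports: SiteNodePorts) -> List[str]:
    """Return every collision instead of stopping at the first, so one startup reports them all."""
    errors: List[str] = []
    if ports.metrics_port == ports.remoting_port:
        errors.append(f"MetricsPort {ports.metrics_port} collides with RemotingPort")
    if ports.metrics_port == ports.grpc_port:
        errors.append(f"MetricsPort {ports.metrics_port} collides with the gRPC port")
    return errors
```

In this model the defaults pass for any gRPC port other than 8084, while a `metrics_port` copied from the remoting setting (8082) produces a single RemotingPort error. Collecting all errors rather than raising on the first is a design choice of the sketch.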