scadaproj/components/observability/current-state/scadabridge/CURRENT-STATE.md
Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected — OtOpcUa now shows  not added for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00
# Observability — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET, Docker; solution
`ZB.MOM.WW.ScadaBridge.slnx`. The telemetry posture is split across a dangling OTel package ref
(metrics/traces) and a substantive Serilog setup (logs). All paths relative to repo root.
Verified 2026-06-01.
ScadaBridge has structurally the cleanest logging enricher set in the family (`SiteId` / `NodeRole` /
`NodeHostname` are already first-class Serilog enricher properties) but is the weakest on
metrics/tracing, with zero instrumentation. The `OpenTelemetry.Api` package reference is a CVE-patch
artefact, not instrumentation.
## 1. Metrics and traces (absent)
### `OpenTelemetry.Api` — CVE-patch ref, not instrumentation
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`:
- `:31` `<PackageReference Include="OpenTelemetry.Api" />`: a **direct version override** added
to satisfy GHSA-g94r-2vxg-569j / GHSA-8785-wc3w-h8q6 (OpenTelemetry 1.9.0 CVEs introduced via
`Akka.Hosting`'s pinned transitive dependency).
There is **no `AddOpenTelemetry()` call** in the solution. No `Meter` is created. No
`ActivitySource` is declared. No exporter is configured. The package reference solely overrides the
transitive version — it has no runtime effect on observability.
### Instrument coverage
Zero application instruments. There is no custom `Meter`, no counter, no histogram, no gauge, and
no span in the ScadaBridge codebase. This is the largest gap in the family.
## 2. Logging (Serilog — strongest enricher set)
### Two-stage bootstrap
`src/ZB.MOM.WW.ScadaBridge.Host/Program.cs`:
- `:27–54`: two-stage Serilog bootstrap. An initial logger is created for startup messages before
the host is built; the full logger replaces it during `UseSerilog`.
### `LoggerConfigurationFactory.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs`:
Full factory method signature: `Build(IConfiguration config, string nodeRole, string siteId, string nodeHostname)`.
- `:62`: reads `ScadaBridge:Logging:MinimumLevel` from configuration.
- `:84`: `ReadFrom.Configuration(config)` pulls sink configuration from `appsettings.json`.
- `:85`: explicit `MinimumLevel.Is(...)` override from the typed option.
- `:86–88`: three structural enrichers:
  - `.Enrich.WithProperty("SiteId", siteId)`: site identifier (e.g. `"site-a"`).
  - `.Enrich.WithProperty("NodeHostname", nodeHostname)`: node hostname.
  - `.Enrich.WithProperty("NodeRole", nodeRole)`: Akka cluster role (e.g. `"central"`, `"site"`).
These three properties are the cleanest and most complete set in the family. ScadaBridge's property
names (`SiteId` / `NodeRole` / `NodeHostname`) are also the ones the shared `AddZbTelemetry`
options object maps onto `site.id` / `node.role` / `host.name` OTel Resource attributes — no
renaming needed on adoption.
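
The factory described above can be pictured roughly as follows. This is a minimal sketch reconstructed from the line references, not the contents of `LoggerConfigurationFactory.cs`. The class name `LoggerConfigurationFactorySketch` is hypothetical, and reading the level straight from the configuration key (rather than through `LoggingOptions`) is a simplifying assumption. It assumes the `Serilog` and `Serilog.Settings.Configuration` packages.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Serilog;
using Serilog.Events;

// Hypothetical sketch of the factory shape described above; the real file may differ.
public static class LoggerConfigurationFactorySketch
{
    public static LoggerConfiguration Build(
        IConfiguration config, string nodeRole, string siteId, string nodeHostname)
    {
        // Assumption: the level is read directly from the config key and falls
        // back to Information when the key is missing or cannot be parsed.
        var level = Enum.TryParse<LogEventLevel>(
            config["ScadaBridge:Logging:MinimumLevel"], ignoreCase: true, out var parsed)
            ? parsed
            : LogEventLevel.Information;

        return new LoggerConfiguration()
            .ReadFrom.Configuration(config)  // sinks come from appsettings.json
            .MinimumLevel.Is(level)          // applied after ReadFrom so it is set last
            .Enrich.WithProperty("SiteId", siteId)
            .Enrich.WithProperty("NodeHostname", nodeHostname)
            .Enrich.WithProperty("NodeRole", nodeRole);
    }
}
```

Because the three properties are attached with `Enrich.WithProperty` at configuration time, every event written by the resulting logger carries them without any call-site involvement.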
### Sink configuration
`appsettings.json:323` — Serilog sinks configured via `ReadFrom.Configuration`:
- Console sink with output template that includes `[{NodeRole}/{NodeHostname}]`.
- File sink (path in config; rolling interval).
### `LoggingOptions.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggingOptions.cs`:
- `MinimumLevel` — config-bound minimum level; default `Information`.
### Missing elements
- **No custom enrichers** beyond the three structural properties. `LogContextEnricher` (OtOpcUa's
driver-correlation enricher) has no equivalent; MxGateway's per-session correlation scope has no
equivalent. Per-request/per-operation correlation is not present.
- **No `trace_id` / `span_id` enricher.** As with the other two projects, log lines do not carry
trace context. Because ScadaBridge has zero `ActivitySource` instrumentation, this is consistent —
but it means no trace↔log correlation path exists even hypothetically.
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | ⛔ none | ⛔ none | ⛔ none |
| Traces | ⛔ none | ⛔ none | ⛔ none |
| Logs | Serilog | Console + file (`appsettings.json`) | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (no ActivitySource; no enricher) |
## 4. Notable design choices
- **`SiteId` / `NodeRole` / `NodeHostname` as first-class enrichers** — unlike OtOpcUa's driver-
scoped `LogContextEnricher`, ScadaBridge's structural enrichers are attached at logger creation and
appear on every log line from the process. This is the target pattern for the shared bootstrap.
- **`nodeRole` + `siteId` passed into the factory**: ScadaBridge's `LoggerConfigurationFactory.Build`
takes these (and `nodeHostname`) as method arguments rather than reading them from a registered
options object. The shared `AddZbSerilog` approach binds `SiteId` and `NodeRole` from the same
`ZbTelemetryOptions` used for the OTel Resource, unifying the source; `NodeHostname` is taken
automatically from `Environment.MachineName`.
- **Config-driven `MinimumLevel`** — `ScadaBridge:Logging:MinimumLevel` is a typed config path;
`ReadFrom.Configuration` for sinks. The shared bootstrap's `AddZbSerilog` must support the same
pattern.
- **No custom enrichers** — ScadaBridge's logging is intentionally minimal on operation-scoped
context. Correlation in the distributed model is provided by structured log fields from Akka
actor context, not a log enricher pipeline.
- **CVE-patch ref discipline** — the `OpenTelemetry.Api` pin is a responsible CVE response but
leaves the telemetry story incomplete. On adoption, the CVE pin is superseded by the full OTel SDK
pulled in by `AddZbTelemetry`; the explicit `<PackageReference>` override can be removed.
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**Replace CVE-patch ref with full OTel SDK via `AddZbTelemetry`:**
- Remove the lone `OpenTelemetry.Api` override from
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:31`.
- Add `builder.AddZbTelemetry(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.Meters = ["ZB.MOM.WW.ScadaBridge"]; })`.
The full OTel SDK supersedes the transitive version override; the CVE is resolved transitively
via the SDK's current dependency.
**Add first application instruments:**
- Define a `ScadaBridgeTelemetry` class (mirror `OtOpcUaTelemetry`) with a `Meter` named
`"ZB.MOM.WW.ScadaBridge"` and an initial set of instruments covering the most observable
operations: site connection lifecycle, alarm received, data-change received, actor supervision
events. Naming convention: `scadabridge.<subsystem>.<event>`.
- Register the meter name in `AddZbTelemetry` options. Expose `/metrics` via `app.MapZbMetrics()`.
ScadaBridge goes from zero instrumentation to a baseline exportable set.
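
As an illustration of the instrument class proposed above, here is a minimal sketch using only `System.Diagnostics.Metrics` from the base class library. The class name and meter name come from the plan. Every instrument name is an illustrative assumption that follows the `scadabridge.<subsystem>.<event>` convention, not a verified or agreed name.

```csharp
using System.Diagnostics.Metrics;

// Hypothetical sketch: the instrument names below are assumptions chosen to
// illustrate the naming convention; none exist in the ScadaBridge codebase today.
public static class ScadaBridgeTelemetry
{
    // Must match the name listed in the AddZbTelemetry options (o.Meters).
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    private static readonly Meter Meter = new(MeterName);

    // Site connection lifecycle.
    public static readonly Counter<long> SiteConnected =
        Meter.CreateCounter<long>("scadabridge.site.connected");

    // Alarm messages received.
    public static readonly Counter<long> AlarmReceived =
        Meter.CreateCounter<long>("scadabridge.alarm.received");

    // Data-change notifications received.
    public static readonly Counter<long> DataChangeReceived =
        Meter.CreateCounter<long>("scadabridge.datachange.received");

    // Actor supervision events (for example, a supervised restart).
    public static readonly Counter<long> ActorRestarted =
        Meter.CreateCounter<long>("scadabridge.actor.restarted");
}
```

A call site would then record an event with `ScadaBridgeTelemetry.AlarmReceived.Add(1);`. Until something subscribes to the `Meter` (an OTel SDK or a `MeterListener`), these calls are effectively no-ops, so the class can land before the exporter is wired.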
**Adopt `AddZbSerilog`:**
- Replace the `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` call in
`Program.cs:27–54` with `builder.AddZbSerilog(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; })`.
The three enrichers (`SiteId`, `NodeRole`, `NodeHostname`) are now provided by the shared
`AddZbSerilog` path (`SiteId`/`NodeRole` from options; `NodeHostname` auto from
`Environment.MachineName`); `LoggerConfigurationFactory` can be deleted.
- `ReadFrom.Configuration` for sinks and `MinimumLevel.Is` override from config are preserved
inside `AddZbSerilog` — behavior is unchanged.
- The `TraceContextEnricher` is wired automatically by `AddZbSerilog`; `trace_id` / `span_id`
appear on log lines emitted while a span is active, which requires an `ActivitySource` (or the
standard instrumentation that `AddZbTelemetry` adds) in addition to the `Meter` instruments above.
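
To make the trace-to-log correlation concrete, the sketch below shows one way a Serilog enricher can copy the ambient `Activity` identifiers onto each log event. It is an assumption about the general technique, not the shared library's `TraceContextEnricher`. The class name `TraceContextEnricherSketch` is hypothetical, and it assumes the `Serilog` package.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch; the shared TraceContextEnricher may be implemented differently.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Activity.Current is null when no span is active on this execution context.
        var activity = Activity.Current;
        if (activity is null)
        {
            return; // leave the event untouched outside of spans
        }

        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

Registered with `.Enrich.With(new TraceContextEnricherSketch())`, this adds nothing to events logged outside a span, which matches ScadaBridge's situation today: with no `ActivitySource` in the codebase there is no ambient `Activity` to read.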
**Keep bespoke:**
- `LoggingOptions.cs` — the `MinimumLevel` typed option and its config path
(`ScadaBridge:Logging:MinimumLevel`) remain; `AddZbSerilog` must accept the minimum-level
override from configuration. The config path stays ScadaBridge's own.
- Console output template including `[{NodeRole}/{NodeHostname}]` — driven by `appsettings.json`;
no change.
- Akka actor-context log fields — per-operation context emitted by Akka infrastructure; not an
enricher concern.
- `ZB.MOM.WW.ScadaBridge.Host.csproj` package set otherwise — no other changes to the project file.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. Adding instruments and adopting `AddZbSerilog`/`AddZbTelemetry` lands in the
ScadaBridge repo as a separate commit once the nupkg is available.