Joseph Doherty 7d243890ed docs(observability): spec + METRIC-CONVENTIONS + ZB.MOM.WW.Telemetry shared contract
2026-06-01 07:19:38 -04:00

# Observability — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET, Docker; solution
`ZB.MOM.WW.ScadaBridge.slnx`. The telemetry posture is split across a dangling OTel package ref
(metrics/traces) and a substantive Serilog setup (logs). All paths relative to repo root.
Verified 2026-06-01.
Structurally, ScadaBridge has the cleanest logging enricher set in the family: `SiteId` / `NodeRole` /
`NodeHostname` are already first-class Serilog enricher properties. It is also the weakest on
metrics/tracing, with zero instrumentation. The `OpenTelemetry.Api` package reference is a CVE-patch
artefact, not instrumentation.
## 1. Metrics and traces (absent)
### `OpenTelemetry.Api` — CVE-patch ref, not instrumentation
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`:
- `:31` — `<PackageReference Include="OpenTelemetry.Api" />` — a **direct version override** added
to satisfy GHSA-g94r-2vxg-569j / GHSA-8785-wc3w-h8q6 (OpenTelemetry 1.9.0 CVEs introduced via
`Akka.Hosting`'s pinned transitive dependency).
There is **no `AddOpenTelemetry()` call** in the solution. No `Meter` is created. No
`ActivitySource` is declared. No exporter is configured. The package reference solely overrides the
transitive version — it has no runtime effect on observability.
### Instrument coverage
Zero application instruments. There is no custom `Meter`, no counter, no histogram, no gauge, and
no span in the ScadaBridge codebase. This is the largest gap in the family.
## 2. Logging (Serilog — strongest enricher set)
### Two-stage bootstrap
`src/ZB.MOM.WW.ScadaBridge.Host/Program.cs`:
- `:27-54` — two-stage Serilog bootstrap: an initial logger is created for startup messages before
the host is built; the full logger replaces it during `UseSerilog`.
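The two-stage pattern described above can be sketched as follows. This is a minimal illustration of the general Serilog bootstrap idiom, not the contents of ScadaBridge's `Program.cs`; it assumes the `Serilog.Extensions.Hosting`, `Serilog.Settings.Configuration`, and `Serilog.Sinks.Console` packages, and the console-only bootstrap sink is an assumption.

```csharp
using System;
using Microsoft.Extensions.Hosting;
using Serilog;

// Stage 1: a minimal bootstrap logger, so failures that happen before the
// host is built are still captured somewhere.
Log.Logger = new LoggerConfiguration()
    .WriteTo.Console()
    .CreateBootstrapLogger();

try
{
    var host = Host.CreateDefaultBuilder(args)
        // Stage 2: once configuration and services exist, UseSerilog swaps
        // the bootstrap logger for the fully configured one.
        .UseSerilog((context, services, cfg) => cfg
            .ReadFrom.Configuration(context.Configuration)
            .ReadFrom.Services(services))
        .Build();

    host.Run();
}
catch (Exception ex)
{
    // Still routed through whichever logger is current at this point.
    Log.Fatal(ex, "Host terminated unexpectedly");
}
finally
{
    Log.CloseAndFlush();
}
```

The point of the first stage is only to avoid losing startup errors; everything about sinks and enrichment belongs to the second stage.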
### `LoggerConfigurationFactory.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs`:
Full factory method signature: `Build(IConfiguration config, string nodeRole, string siteId, string nodeHostname)`.
- `:62` — reads `ScadaBridge:Logging:MinimumLevel` from configuration.
- `:84` — `ReadFrom.Configuration(config)` pulls sink configuration from `appsettings.json`.
- `:85` — explicit `MinimumLevel.Is(...)` override from the typed option.
- `:86-88` — three structural enrichers:
  - `.Enrich.WithProperty("SiteId", siteId)` — site identifier (e.g. `"site-a"`).
  - `.Enrich.WithProperty("NodeHostname", nodeHostname)` — node hostname.
  - `.Enrich.WithProperty("NodeRole", nodeRole)` — Akka cluster role (e.g. `"central"`, `"site"`).
These three properties are the cleanest and most complete set in the family. ScadaBridge's property
names (`SiteId` / `NodeRole` / `NodeHostname`) are also the ones the shared `AddZbTelemetry`
options object maps onto `site.id` / `node.role` / `host.name` OTel Resource attributes — no
renaming needed on adoption.
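For orientation, a factory with the shape described above might look roughly like this. It is a reconstruction from the line notes, not a copy of `LoggerConfigurationFactory.cs`: the class name, the enum parsing, and the `Information` fallback are assumptions, and it needs the `Serilog` and `Serilog.Settings.Configuration` packages.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Serilog;
using Serilog.Events;

// Hypothetical sketch; the real factory may differ in detail.
public static class LoggerConfigurationFactorySketch
{
    public static LoggerConfiguration Build(
        IConfiguration config, string nodeRole, string siteId, string nodeHostname)
    {
        // Read the typed minimum level; fall back to Information when the
        // key is missing or does not parse.
        var level = Enum.TryParse<LogEventLevel>(
            config["ScadaBridge:Logging:MinimumLevel"], ignoreCase: true, out var parsed)
            ? parsed
            : LogEventLevel.Information;

        return new LoggerConfiguration()
            .ReadFrom.Configuration(config)   // sinks come from appsettings.json
            .MinimumLevel.Is(level)           // applied last, so it overrides the config section
            .Enrich.WithProperty("SiteId", siteId)
            .Enrich.WithProperty("NodeHostname", nodeHostname)
            .Enrich.WithProperty("NodeRole", nodeRole);
    }
}
```

Because the three properties are attached to the `LoggerConfiguration` itself, every event written by the resulting logger carries them without any per-call-site effort.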
### Sink configuration
`appsettings.json:3-23` — Serilog sinks configured via `ReadFrom.Configuration`:
- Console sink with output template that includes `[{NodeRole}/{NodeHostname}]`.
- File sink (path in config; rolling interval).
### `LoggingOptions.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggingOptions.cs`:
- `MinimumLevel` — config-bound minimum level; default `Information`.
### Missing elements
- **No custom enrichers** beyond the three structural properties. `LogContextEnricher` (OtOpcUa's
driver-correlation enricher) has no equivalent; MxGateway's per-session correlation scope has no
equivalent. Per-request/per-operation correlation is not present.
- **No `trace_id` / `span_id` enricher.** As with the other two projects, log lines do not carry
trace context. Because ScadaBridge has zero `ActivitySource` instrumentation, this is consistent —
but it means no trace↔log correlation path exists even hypothetically.
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | ⛔ none | ⛔ none | ⛔ none |
| Traces | ⛔ none | ⛔ none | ⛔ none |
| Logs | Serilog | Console + file (`appsettings.json`) | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (no ActivitySource; no enricher) |
## 4. Notable design choices
- **`SiteId` / `NodeRole` / `NodeHostname` as first-class enrichers** — unlike OtOpcUa's driver-
scoped `LogContextEnricher`, ScadaBridge's structural enrichers are attached at logger creation and
appear on every log line from the process. This is the target pattern for the shared bootstrap.
- **`nodeRole` + `siteId` + `nodeHostname` passed into the factory** — ScadaBridge's `LoggerConfigurationFactory.Build`
takes these as method arguments rather than reading them from a registered options object.
The shared `AddZbSerilog` approach binds them from the same `ZbTelemetryOptions` used for the OTel
Resource, unifying the source.
- **Config-driven `MinimumLevel`** — `ScadaBridge:Logging:MinimumLevel` is a typed config path;
`ReadFrom.Configuration` for sinks. The shared bootstrap's `AddZbSerilog` must support the same
pattern.
- **No custom enrichers** — ScadaBridge's logging is intentionally minimal on operation-scoped
context. Correlation in the distributed model is provided by structured log fields from Akka
actor context, not a log enricher pipeline.
- **CVE-patch ref discipline** — the `OpenTelemetry.Api` pin is a responsible CVE response but
leaves the telemetry story incomplete. On adoption, the CVE pin is superseded by the full OTel SDK
pulled in by `AddZbTelemetry`; the explicit `<PackageReference>` override can be removed.
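To make the "unified source" idea from the second bullet concrete, the sketch below derives both the Serilog property set and the OTel Resource attribute set from one identity object. `NodeIdentity` and `IdentityMapping` are hypothetical names standing in for the identity fields of `ZbTelemetryOptions`; only the property and attribute names come from this document.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the identity fields of ZbTelemetryOptions.
public sealed record NodeIdentity(
    string ServiceName, string SiteId, string NodeRole, string NodeHostname);

public static class IdentityMapping
{
    // Serilog view: the property names ScadaBridge already uses.
    public static IReadOnlyDictionary<string, object> ToLogProperties(NodeIdentity id) =>
        new Dictionary<string, object>
        {
            ["SiteId"] = id.SiteId,
            ["NodeRole"] = id.NodeRole,
            ["NodeHostname"] = id.NodeHostname,
        };

    // OTel view: the Resource attribute names the same values map onto.
    public static IReadOnlyDictionary<string, object> ToResourceAttributes(NodeIdentity id) =>
        new Dictionary<string, object>
        {
            ["service.name"] = id.ServiceName,
            ["site.id"] = id.SiteId,
            ["node.role"] = id.NodeRole,
            ["host.name"] = id.NodeHostname,
        };
}
```

With a single source object, a log line and a metric from the same node cannot disagree about which site or role they came from.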
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**Replace CVE-patch ref with full OTel SDK via `AddZbTelemetry`:**
- Remove the lone `OpenTelemetry.Api` override from
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:31`.
- Add `builder.AddZbTelemetry(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.Meters = ["ZB.MOM.WW.ScadaBridge"]; })`.
The full OTel SDK supersedes the transitive version override; the CVE is resolved transitively
via the SDK's current dependency.
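`AddZbTelemetry` is a paper API at this point, so its internals are not defined here. As a rough guide to what such a wrapper would need to do, the sketch below uses only the public OpenTelemetry .NET hosting API (`OpenTelemetry.Extensions.Hosting` package). The method name `AddTelemetrySketch` and its parameters are hypothetical, and exporter configuration is deliberately left out.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

// Hypothetical helper; not the ZB.MOM.WW.Telemetry implementation.
public static class TelemetryWiringSketch
{
    public static IServiceCollection AddTelemetrySketch(
        this IServiceCollection services,
        string serviceName, string siteId, string nodeRole, string meterName)
    {
        services.AddOpenTelemetry()
            // Resource: service.name plus the node identity attributes.
            .ConfigureResource(r => r
                .AddService(serviceName)
                .AddAttributes(new KeyValuePair<string, object>[]
                {
                    new("site.id", siteId),
                    new("node.role", nodeRole),
                }))
            // Metrics: subscribe the SDK to the application's Meter by name.
            // An exporter would be added here as well; omitted in this sketch.
            .WithMetrics(m => m.AddMeter(meterName));

        return services;
    }
}
```

A call such as `services.AddTelemetrySketch("scadabridge", siteId, nodeRole, "ZB.MOM.WW.ScadaBridge")` would register the pipeline; until a `Meter` with that name actually exists in the process, nothing is collected.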
**Add first application instruments:**
- Define a `ScadaBridgeTelemetry` class (mirror `OtOpcUaTelemetry`) with a `Meter` named
`"ZB.MOM.WW.ScadaBridge"` and an initial set of instruments covering the most observable
operations: site connection lifecycle, alarm received, data-change received, actor supervision
events. Naming convention: `scadabridge.<subsystem>.<event>`.
- Register the meter name in `AddZbTelemetry` options. Expose `/metrics` via `app.MapZbMetrics()`.
ScadaBridge goes from zero instrumentation to a baseline exportable set.
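A first cut of such a class could look like the sketch below, which uses only `System.Diagnostics.Metrics` from the base class library. The instrument names are illustrative guesses that follow the `scadabridge.<subsystem>.<event>` pattern, and seconds as the duration unit is an assumption; none of these instruments exist in the codebase today.

```csharp
using System.Diagnostics.Metrics;

// Hypothetical sketch of an initial instrument set.
public static class ScadaBridgeTelemetrySketch
{
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    private static readonly Meter AppMeter = new(MeterName);

    // Count of alarms received from sites.
    public static readonly Counter<long> AlarmsReceived =
        AppMeter.CreateCounter<long>("scadabridge.alarm.received", unit: "{alarm}");

    // Count of data-change notifications received.
    public static readonly Counter<long> DataChangesReceived =
        AppMeter.CreateCounter<long>("scadabridge.datachange.received", unit: "{change}");

    // Count of actor restarts triggered by supervision.
    public static readonly Counter<long> ActorRestarts =
        AppMeter.CreateCounter<long>("scadabridge.actor.restarted", unit: "{restart}");

    // Time taken to establish a site connection, in seconds (assumed unit).
    public static readonly Histogram<double> SiteConnectDuration =
        AppMeter.CreateHistogram<double>("scadabridge.site.connect.duration", unit: "s");
}
```

Call sites then record measurements directly, for example `ScadaBridgeTelemetrySketch.AlarmsReceived.Add(1)`. Recording is effectively free when no listener or SDK is subscribed to the meter.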
**Adopt `AddZbSerilog`:**
- Replace the `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` call in
`Program.cs:27-54` with `builder.AddZbSerilog(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.NodeHostname = cfg.NodeHostname; })`.
The three enrichers (`SiteId`, `NodeRole`, `NodeHostname`) are now provided by the shared
`AddZbSerilog` path; `LoggerConfigurationFactory` can be deleted.
- `ReadFrom.Configuration` for sinks and `MinimumLevel.Is` override from config are preserved
inside `AddZbSerilog` — behavior is unchanged.
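- The `TraceContextEnricher` is wired automatically by `AddZbSerilog`; once the process produces spans
(metric instruments alone do not start an `Activity`, so this requires an `ActivitySource` or
span-producing instrumentation), `trace_id` / `span_id` will appear on log lines emitted during spans.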
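The shared `TraceContextEnricher` is not specified in this document, but an enricher of this kind is typically a thin wrapper over `Activity.Current`. The following is a sketch under that assumption, using the real Serilog `ILogEventEnricher` interface; the class name and the `TryGetTraceContext` helper are hypothetical.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch; the shared TraceContextEnricher may differ.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    // Reads the ambient Activity, if any, and returns its W3C ids as hex strings.
    public static bool TryGetTraceContext(out string traceId, out string spanId)
    {
        var activity = Activity.Current;
        if (activity is null)
        {
            traceId = spanId = string.Empty;
            return false;
        }

        traceId = activity.TraceId.ToHexString();
        spanId = activity.SpanId.ToHexString();
        return true;
    }

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Outside a span there is nothing to attach; the event is left as-is.
        if (!TryGetTraceContext(out var traceId, out var spanId)) return;

        logEvent.AddPropertyIfAbsent(propertyFactory.CreateProperty("trace_id", traceId));
        logEvent.AddPropertyIfAbsent(propertyFactory.CreateProperty("span_id", spanId));
    }
}
```

It would be registered with `.Enrich.With<TraceContextEnricherSketch>()`. Log lines written while no `Activity` is current simply carry no trace fields, which is the behaviour ScadaBridge would see until something in the process starts spans.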
**Keep bespoke:**
- `LoggingOptions.cs` — the `MinimumLevel` typed option and its config path
(`ScadaBridge:Logging:MinimumLevel`) remain; `AddZbSerilog` must accept the minimum-level
override from configuration. The config path stays ScadaBridge's own.
- Console output template including `[{NodeRole}/{NodeHostname}]` — driven by `appsettings.json`;
no change.
- Akka actor-context log fields — per-operation context emitted by Akka infrastructure; not an
enricher concern.
- `ZB.MOM.WW.ScadaBridge.Host.csproj` package set otherwise — no other changes to the project file.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. Adding instruments and adopting `AddZbSerilog`/`AddZbTelemetry` lands in the
ScadaBridge repo as a separate commit once the nupkg is available.