scadaproj/components/observability/current-state/scadabridge/CURRENT-STATE.md
Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected — OtOpcUa now shows "not added" for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00


Observability — current state: ScadaBridge

Repo: ~/Desktop/ScadaBridge. Stack: .NET 10, Akka.NET, Docker; solution ZB.MOM.WW.ScadaBridge.slnx. The telemetry posture is split across a dangling OTel package ref (metrics/traces) and a substantive Serilog setup (logs). All paths relative to repo root. Verified 2026-06-01.

ScadaBridge has structurally the cleanest logging enricher set in the family (SiteId / NodeRole / NodeHostname are already first-class Serilog enricher properties) but is the weakest on metrics and tracing, with zero instrumentation. The OpenTelemetry.Api package reference is a CVE-patch artefact, not instrumentation.

1. Metrics and traces (absent)

OpenTelemetry.Api — CVE-patch ref, not instrumentation

src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:

  • :31 — <PackageReference Include="OpenTelemetry.Api" /> — a direct version override added to satisfy GHSA-g94r-2vxg-569j / GHSA-8785-wc3w-h8q6 (OpenTelemetry 1.9.0 CVEs introduced via Akka.Hosting's pinned transitive dependency).

There is no AddOpenTelemetry() call in the solution. No Meter is created. No ActivitySource is declared. No exporter is configured. The package reference solely overrides the transitive version — it has no runtime effect on observability.

Instrument coverage

Zero application instruments. There is no custom Meter, no counter, no histogram, no gauge, and no span in the ScadaBridge codebase. This is the largest gap in the family.

2. Logging (Serilog — strongest enricher set)

Two-stage bootstrap

src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:

  • :27–54 — two-stage Serilog bootstrap: an initial logger is created for startup messages before the host is built; the full logger replaces it during UseSerilog.
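The two-stage pattern described above can be sketched as follows. This is a minimal illustration, not the contents of Program.cs: the class name TwoStageBootstrap and the use of the generic Host builder are assumptions, and it presumes the Serilog.Extensions.Hosting, Serilog.Settings.Configuration, and Serilog.Sinks.Console packages are referenced.

```csharp
using Microsoft.Extensions.Hosting;
using Serilog;

// Hypothetical helper; the real bootstrap lives in Program.cs.
public static class TwoStageBootstrap
{
    public static IHost BuildHost(string[] args)
    {
        // Stage 1: a minimal bootstrap logger so that failures while the
        // host is being built are still written somewhere.
        Log.Logger = new LoggerConfiguration()
            .WriteTo.Console()
            .CreateBootstrapLogger();

        Log.Information("Starting host");

        // Stage 2: UseSerilog replaces the bootstrap logger with the fully
        // configured one once configuration is available.
        return Host.CreateDefaultBuilder(args)
            .UseSerilog((context, services, loggerConfig) => loggerConfig
                .ReadFrom.Configuration(context.Configuration))
            .Build();
    }
}
```

A caller would typically wrap BuildHost and host.Run() in try/catch/finally, logging fatal exceptions through Log.Fatal and calling Log.CloseAndFlush() on exit so buffered events are not lost.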

LoggerConfigurationFactory.cs

src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs:

Full factory method signature: Build(IConfiguration config, string nodeRole, string siteId, string nodeHostname).

  • :62 — reads ScadaBridge:Logging:MinimumLevel from configuration.
  • :84 — ReadFrom.Configuration(config) pulls sink configuration from appsettings.json.
  • :85 — explicit MinimumLevel.Is(...) override from the typed option.
  • :86–88 — three structural enrichers:
    • .Enrich.WithProperty("SiteId", siteId) — site identifier (e.g. "site-a").
    • .Enrich.WithProperty("NodeHostname", nodeHostname) — node hostname.
    • .Enrich.WithProperty("NodeRole", nodeRole) — Akka cluster role (e.g. "central", "site").
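Putting the referenced lines together, the factory might look roughly like the sketch below. It is reconstructed from the description only: the class name LoggerConfigurationFactorySketch is hypothetical, the string-based level parsing stands in for the typed option, and the Information fallback is taken from the LoggingOptions default noted in this document.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Serilog;
using Serilog.Events;

// Hypothetical reconstruction; not the actual LoggerConfigurationFactory.cs.
public static class LoggerConfigurationFactorySketch
{
    public static LoggerConfiguration Build(
        IConfiguration config, string nodeRole, string siteId, string nodeHostname)
    {
        // Read the project-specific minimum level; fall back to Information
        // when the key is missing or cannot be parsed.
        var raw = config["ScadaBridge:Logging:MinimumLevel"];
        var level = Enum.TryParse<LogEventLevel>(raw, ignoreCase: true, out var parsed)
            ? parsed
            : LogEventLevel.Information;

        return new LoggerConfiguration()
            .ReadFrom.Configuration(config)   // sinks from appsettings.json
            .MinimumLevel.Is(level)           // explicit override of the level
            .Enrich.WithProperty("SiteId", siteId)
            .Enrich.WithProperty("NodeHostname", nodeHostname)
            .Enrich.WithProperty("NodeRole", nodeRole);
    }
}
```

Because the enrichers are attached to the LoggerConfiguration itself, every event written through the resulting logger carries all three properties without any per-call scope.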

These three properties are the cleanest and most complete set in the family. ScadaBridge's property names (SiteId / NodeRole / NodeHostname) are also the ones the shared AddZbTelemetry options object maps onto site.id / node.role / host.name OTel Resource attributes — no renaming needed on adoption.
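The one-to-one mapping from enricher property names to Resource attribute keys can be made concrete with a small sketch. NodeIdentity and ResourceAttributeMapping are hypothetical names used only for illustration; the shared library's actual options type and mapping code are not shown in this document.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the identity fields of the shared options object.
public sealed record NodeIdentity(string SiteId, string NodeRole, string NodeHostname);

public static class ResourceAttributeMapping
{
    // Maps the Serilog property values onto the OTel Resource attribute keys
    // named in the text: site.id, node.role, host.name.
    public static IReadOnlyDictionary<string, object> ToResourceAttributes(NodeIdentity id) =>
        new Dictionary<string, object>
        {
            ["site.id"] = id.SiteId,
            ["node.role"] = id.NodeRole,
            ["host.name"] = id.NodeHostname,
        };
}
```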

Sink configuration

appsettings.json:3–23 — Serilog sinks configured via ReadFrom.Configuration:

  • Console sink with output template that includes [{NodeRole}/{NodeHostname}].
  • File sink (path in config; rolling interval).

LoggingOptions.cs

src/ZB.MOM.WW.ScadaBridge.Host/LoggingOptions.cs:

  • MinimumLevel — config-bound minimum level; default Information.

Missing elements

  • No custom enrichers beyond the three structural properties. LogContextEnricher (OtOpcUa's driver-correlation enricher) has no equivalent; MxGateway's per-session correlation scope has no equivalent. Per-request/per-operation correlation is not present.
  • No trace_id / span_id enricher. As with the other two projects, log lines do not carry trace context. Because ScadaBridge has zero ActivitySource instrumentation, this is consistent — but it means no trace↔log correlation path exists even hypothetically.
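For reference, a trace-context enricher of the kind that is missing here is small. The sketch below is an assumption about its general shape, built on the standard Serilog ILogEventEnricher interface and System.Diagnostics.Activity; the name ActivityTraceEnricher is hypothetical and this is not the shared library's implementation.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical enricher: copies the ambient Activity's ids onto each event.
public sealed class ActivityTraceEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;
        if (activity is null)
        {
            return; // no active span, so there is nothing to correlate
        }

        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

It would be registered with .Enrich.With(new ActivityTraceEnricher()). Without any ActivitySource in the codebase, Activity.Current is normally null and the enricher adds nothing, which is why instrumentation has to come first.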

3. Signal summary

| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | none | none | none |
| Traces | none | none | none |
| Logs | Serilog | Console + file (appsettings.json) | none (no service.name property) |
| Trace↔log correlation | absent (no ActivitySource; no enricher) | | |

4. Notable design choices

  • SiteId / NodeRole / NodeHostname as first-class enrichers — unlike OtOpcUa's driver-scoped LogContextEnricher, ScadaBridge's structural enrichers are attached at logger creation and appear on every log line from the process. This is the target pattern for the shared bootstrap.
  • nodeRole + siteId passed into the factory — ScadaBridge's LoggerConfigurationFactory.Build takes these as method arguments rather than reading them from a registered options object. The shared AddZbSerilog approach binds them from the same ZbTelemetryOptions used for the OTel Resource, unifying the source.
  • Config-driven MinimumLevel — ScadaBridge:Logging:MinimumLevel is a typed config path; ReadFrom.Configuration for sinks. The shared bootstrap's AddZbSerilog must support the same pattern.
  • No custom enrichers — ScadaBridge's logging is intentionally minimal on operation-scoped context. Correlation in the distributed model is provided by structured log fields from Akka actor context, not a log enricher pipeline.
  • CVE-patch ref discipline — the OpenTelemetry.Api pin is a responsible CVE response but leaves the telemetry story incomplete. On adoption, the CVE pin is superseded by the full OTel SDK pulled in by AddZbTelemetry; the explicit <PackageReference> override can be removed.

Adoption plan → ZB.MOM.WW.Telemetry

Replace CVE-patch ref with full OTel SDK via AddZbTelemetry:

  • Remove the lone OpenTelemetry.Api override from src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:31.
  • Add builder.AddZbTelemetry(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.Meters = ["ZB.MOM.WW.ScadaBridge"]; }). The full OTel SDK supersedes the transitive version override; the CVE is resolved transitively via the SDK's current dependency.

Add first application instruments:

  • Define a ScadaBridgeTelemetry class (mirror OtOpcUaTelemetry) with a Meter named "ZB.MOM.WW.ScadaBridge" and an initial set of instruments covering the most observable operations: site connection lifecycle, alarm received, data-change received, actor supervision events. Naming convention: scadabridge.<subsystem>.<event>.
  • Register the meter name in AddZbTelemetry options. Expose /metrics via app.MapZbMetrics(). ScadaBridge goes from zero instrumentation to a baseline exportable set.
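A first cut of the ScadaBridgeTelemetry class could look like the sketch below. Only the Meter name and the scadabridge.<subsystem>.<event> convention come from this plan; the individual instrument names are hypothetical placeholders for the four operation areas listed, and the real set would need to be agreed against METRIC-CONVENTIONS. It uses only System.Diagnostics.Metrics from the base class library.

```csharp
using System.Diagnostics.Metrics;

public static class ScadaBridgeTelemetry
{
    // Must match the name registered in the Meters option.
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    private static readonly Meter AppMeter = new(MeterName);

    // Hypothetical instrument names following scadabridge.<subsystem>.<event>.
    public static readonly Counter<long> SiteConnected =
        AppMeter.CreateCounter<long>("scadabridge.site.connected",
            description: "Site connections established");

    public static readonly Counter<long> SiteDisconnected =
        AppMeter.CreateCounter<long>("scadabridge.site.disconnected",
            description: "Site connections lost or closed");

    public static readonly Counter<long> AlarmReceived =
        AppMeter.CreateCounter<long>("scadabridge.alarm.received",
            description: "Alarms received from sites");

    public static readonly Counter<long> DataChangeReceived =
        AppMeter.CreateCounter<long>("scadabridge.datachange.received",
            description: "Data-change notifications received");

    public static readonly Counter<long> ActorRestarted =
        AppMeter.CreateCounter<long>("scadabridge.actor.restarted",
            description: "Actor restarts triggered by supervision");
}
```

Call sites then record events with a single line such as ScadaBridgeTelemetry.AlarmReceived.Add(1). Until a listener or exporter subscribes to the Meter, these calls are cheap no-ops, so instruments can be added ahead of the exporter wiring.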

Adopt AddZbSerilog:

  • Replace the LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname) call in Program.cs:27–54 with builder.AddZbSerilog(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; }). The three enrichers (SiteId, NodeRole, NodeHostname) are now provided by the shared AddZbSerilog path (SiteId/NodeRole from options; NodeHostname auto from Environment.MachineName); LoggerConfigurationFactory can be deleted.
  • ReadFrom.Configuration for sinks and MinimumLevel.Is override from config are preserved inside AddZbSerilog — behavior is unchanged.
  • The TraceContextEnricher is wired automatically by AddZbSerilog; once application instruments are added (above), trace_id / span_id will appear on log lines emitted during spans.

Keep bespoke:

  • LoggingOptions.cs — the MinimumLevel typed option and its config path (ScadaBridge:Logging:MinimumLevel) remain; AddZbSerilog must accept the minimum-level override from configuration. The config path stays ScadaBridge's own.
  • Console output template including [{NodeRole}/{NodeHostname}] — driven by appsettings.json; no change.
  • Akka actor-context log fields — per-operation context emitted by Akka infrastructure; not an enricher concern.
  • ZB.MOM.WW.ScadaBridge.Host.csproj package set otherwise — no other changes to the project file.

Adoption is a follow-on task (tracked in GAPS.md), not part of the ZB.MOM.WW.Telemetry library build. Adding instruments and adopting AddZbSerilog/AddZbTelemetry lands in the ScadaBridge repo as a separate commit once the nupkg is available.