# Observability — current state: ScadaBridge

Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET, Docker; solution
`ZB.MOM.WW.ScadaBridge.slnx`. The telemetry posture is split across a dangling OTel package ref
(metrics/traces) and a substantive Serilog setup (logs). All paths relative to repo root.
Verified 2026-06-01.

Structurally the cleanest logging enricher set in the family — `SiteId` / `NodeRole` /
`NodeHostname` are already first-class Serilog enricher properties — but the weakest on
metrics/tracing: zero instrumentation. The `OpenTelemetry.Api` package reference is a CVE-patch
artefact, not instrumentation.

## 1. Metrics and traces (absent)

### `OpenTelemetry.Api` — CVE-patch ref, not instrumentation

`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`:
- `:31` — `<PackageReference Include="OpenTelemetry.Api" />` — a **direct version override** added
  to satisfy GHSA-g94r-2vxg-569j / GHSA-8785-wc3w-h8q6 (OpenTelemetry 1.9.0 CVEs introduced via
  `Akka.Hosting`'s pinned transitive dependency).

There is **no `AddOpenTelemetry()` call** in the solution. No `Meter` is created. No
`ActivitySource` is declared. No exporter is configured. The package reference solely overrides the
transitive version — it has no runtime effect on observability.

### Instrument coverage

Zero application instruments. There is no custom `Meter`, no counter, no histogram, no gauge, and
no span in the ScadaBridge codebase. This is the largest gap in the family.

## 2. Logging (Serilog — strongest enricher set)

### Two-stage bootstrap

`src/ZB.MOM.WW.ScadaBridge.Host/Program.cs`:
- `:27–54` — two-stage Serilog bootstrap: an initial logger is created for startup messages before
  the host is built; the full logger replaces it during `UseSerilog`.

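
The two stages described above can be sketched as follows. This is the generic shape of the pattern, not a copy of `Program.cs:27–54`: it assumes a web host and the `Serilog.AspNetCore` package (which supplies `CreateBootstrapLogger`, `UseSerilog`, the console sink, and `ReadFrom.Configuration`). The real file may wire the second stage through `LoggerConfigurationFactory` instead.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Serilog;

// Stage 1: a minimal logger, so failures during host construction are still logged.
Log.Logger = new LoggerConfiguration()
    .WriteTo.Console()
    .CreateBootstrapLogger();

try
{
    var builder = WebApplication.CreateBuilder(args);

    // Stage 2: once configuration is available, the full logger replaces the
    // bootstrap logger. Sinks come from the "Serilog" configuration section.
    builder.Host.UseSerilog((context, services, loggerConfiguration) =>
        loggerConfiguration.ReadFrom.Configuration(context.Configuration));

    var app = builder.Build();
    app.Run();
}
catch (Exception ex)
{
    // Reached only if host construction or startup throws.
    Log.Fatal(ex, "Host terminated unexpectedly");
}
finally
{
    Log.CloseAndFlush();
}
```

The point of the split is that anything logged before `builder.Build()` completes still reaches the console, even if configuration binding itself is what fails.
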
### `LoggerConfigurationFactory.cs`

`src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs`:

Full factory method signature: `Build(IConfiguration config, string nodeRole, string siteId, string nodeHostname)`.

- `:62` — reads `ScadaBridge:Logging:MinimumLevel` from configuration.
- `:84` — `ReadFrom.Configuration(config)` pulls sink configuration from `appsettings.json`.
- `:85` — explicit `MinimumLevel.Is(...)` override from the typed option.
- `:86–88` — three structural enrichers:
  - `.Enrich.WithProperty("SiteId", siteId)` — site identifier (e.g. `"site-a"`).
  - `.Enrich.WithProperty("NodeHostname", nodeHostname)` — node hostname.
  - `.Enrich.WithProperty("NodeRole", nodeRole)` — Akka cluster role (e.g. `"central"`, `"site"`).

These three properties are the cleanest and most complete set in the family. ScadaBridge's property
names (`SiteId` / `NodeRole` / `NodeHostname`) are also the ones the shared `AddZbTelemetry`
options object maps onto `site.id` / `node.role` / `host.name` OTel Resource attributes — no
renaming needed on adoption.

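
Taken together, the line references above suggest a factory of roughly the following shape. This is a reconstruction from the description, not the file's contents. The class name `LoggerConfigurationFactorySketch` is hypothetical, and reading the key straight from `IConfiguration` with a fallback to `Information` is an assumption (the real code goes through the typed `LoggingOptions`). It needs the `Serilog` and `Serilog.Settings.Configuration` packages.

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Serilog;
using Serilog.Events;

// Hypothetical stand-in for LoggerConfigurationFactory; illustrative only.
public static class LoggerConfigurationFactorySketch
{
    public static LoggerConfiguration Build(
        IConfiguration config, string nodeRole, string siteId, string nodeHostname)
    {
        // Assumption: parse the level from the raw key and fall back to
        // Information when the key is missing or not a valid level name.
        var raw = config["ScadaBridge:Logging:MinimumLevel"];
        var level = Enum.TryParse<LogEventLevel>(raw, ignoreCase: true, out var parsed)
            ? parsed
            : LogEventLevel.Information;

        return new LoggerConfiguration()
            .ReadFrom.Configuration(config)   // sinks from appsettings.json
            .MinimumLevel.Is(level)           // explicit override, applied after ReadFrom
            .Enrich.WithProperty("SiteId", siteId)
            .Enrich.WithProperty("NodeHostname", nodeHostname)
            .Enrich.WithProperty("NodeRole", nodeRole);
    }
}
```

Because the enrichers are attached to the `LoggerConfiguration` itself, every event written through the resulting logger carries all three properties without any call-site involvement.
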
### Sink configuration

`appsettings.json:3–23` — Serilog sinks configured via `ReadFrom.Configuration`:
- Console sink with output template that includes `[{NodeRole}/{NodeHostname}]`.
- File sink (path in config; rolling interval).

### `LoggingOptions.cs`

`src/ZB.MOM.WW.ScadaBridge.Host/LoggingOptions.cs`:
- `MinimumLevel` — config-bound minimum level; default `Information`.

### Missing elements

- **No custom enrichers** beyond the three structural properties. `LogContextEnricher` (OtOpcUa's
  driver-correlation enricher) has no equivalent; MxGateway's per-session correlation scope has no
  equivalent. Per-request/per-operation correlation is not present.
- **No `trace_id` / `span_id` enricher.** As with the other two projects, log lines do not carry
  trace context. Because ScadaBridge has zero `ActivitySource` instrumentation, this is consistent —
  but it means no trace↔log correlation path exists even hypothetically.

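
The last point follows from how `System.Diagnostics` behaves, which a small console sketch can make concrete. It uses only the .NET base class library. The source name is a placeholder (ScadaBridge declares no `ActivitySource` today), and the `ActivityListener` stands in for what an OpenTelemetry SDK registration would normally provide.

```csharp
using System;
using System.Diagnostics;

// Placeholder name; no such source exists in ScadaBridge at the time of writing.
var source = new ActivitySource("ZB.MOM.WW.ScadaBridge");

// With no listener registered, StartActivity creates nothing, so there is
// no ambient trace context for a log enricher to read.
var none = source.StartActivity("before-listener");
Console.WriteLine($"activity created without listener: {none is not null}");
Console.WriteLine($"ambient context without listener: {Activity.Current is not null}");

// Register a listener that samples everything from this source.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "ZB.MOM.WW.ScadaBridge",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

using (var activity = source.StartActivity("handle-alarm"))
{
    // Inside a span, Activity.Current exposes the ids a trace_id / span_id
    // enricher would copy onto each log event.
    Console.WriteLine($"trace_id={Activity.Current?.TraceId}");
    Console.WriteLine($"span_id={Activity.Current?.SpanId}");
}
```

Both preconditions are missing in ScadaBridge today: nothing starts an `Activity`, and nothing listens for one.
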
## 3. Signal summary

| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | ⛔ none | ⛔ none | ⛔ none |
| Traces | ⛔ none | ⛔ none | ⛔ none |
| Logs | Serilog | Console + file (`appsettings.json`) | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (no ActivitySource; no enricher) |

## 4. Notable design choices

- **`SiteId` / `NodeRole` / `NodeHostname` as first-class enrichers** — unlike OtOpcUa's
  driver-scoped `LogContextEnricher`, ScadaBridge's structural enrichers are attached at logger
  creation and appear on every log line from the process. This is the target pattern for the
  shared bootstrap.
- **`nodeRole` + `siteId` passed into the factory** — ScadaBridge's `LoggerConfigurationFactory.Build`
  takes these as method arguments rather than reading them from a registered options object.
  The shared `AddZbSerilog` approach binds them from the same `ZbTelemetryOptions` used for the OTel
  Resource, unifying the source.
- **Config-driven `MinimumLevel`** — `ScadaBridge:Logging:MinimumLevel` is a typed config path, and
  sinks are loaded through `ReadFrom.Configuration`. The shared bootstrap's `AddZbSerilog` must
  support the same pattern.
- **No custom enrichers** — ScadaBridge's logging is intentionally minimal on operation-scoped
  context. Correlation in the distributed model is provided by structured log fields from Akka
  actor context, not a log enricher pipeline.
- **CVE-patch ref discipline** — the `OpenTelemetry.Api` pin is a responsible CVE response but
  leaves the telemetry story incomplete. On adoption, the CVE pin is superseded by the full OTel SDK
  pulled in by `AddZbTelemetry`; the explicit `<PackageReference>` override can be removed.

---

## Adoption plan → `ZB.MOM.WW.Telemetry`

**Replace CVE-patch ref with full OTel SDK via `AddZbTelemetry`:**

- Remove the lone `OpenTelemetry.Api` override from
  `src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:31`.
- Add `builder.AddZbTelemetry(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.Meters = ["ZB.MOM.WW.ScadaBridge"]; })`.
  The full OTel SDK supersedes the transitive version override; the CVE is resolved transitively
  via the SDK's current dependency.

**Add first application instruments:**

- Define a `ScadaBridgeTelemetry` class (mirror `OtOpcUaTelemetry`) with a `Meter` named
  `"ZB.MOM.WW.ScadaBridge"` and an initial set of instruments covering the most observable
  operations: site connection lifecycle, alarm received, data-change received, actor supervision
  events. Naming convention: `scadabridge.<subsystem>.<event>`.
- Register the meter name in `AddZbTelemetry` options. Expose `/metrics` via `app.MapZbMetrics()`.
  ScadaBridge goes from zero instrumentation to a baseline exportable set.

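
A minimal sketch of such a class, using only `System.Diagnostics.Metrics` from the base class library, might look like this. The meter name and the naming convention come from the plan above. The individual instrument names and the choice of counters are assumptions made for illustration, and `OtOpcUaTelemetry` (the class this is meant to mirror) is not reproduced here.

```csharp
using System.Diagnostics.Metrics;

public static class ScadaBridgeTelemetry
{
    // Must match the name registered through o.Meters in AddZbTelemetry.
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    // Declared first: static initializers run in textual order, and the
    // counters below depend on this instance.
    public static readonly Meter Meter = new(MeterName);

    // Hypothetical instrument names following scadabridge.<subsystem>.<event>.
    public static readonly Counter<long> SiteConnected =
        Meter.CreateCounter<long>("scadabridge.site.connected");

    public static readonly Counter<long> SiteDisconnected =
        Meter.CreateCounter<long>("scadabridge.site.disconnected");

    public static readonly Counter<long> AlarmReceived =
        Meter.CreateCounter<long>("scadabridge.alarm.received");

    public static readonly Counter<long> DataChangeReceived =
        Meter.CreateCounter<long>("scadabridge.datachange.received");

    public static readonly Counter<long> ActorRestarted =
        Meter.CreateCounter<long>("scadabridge.actor.restarted");
}
```

A call site then records an event with, for example, `ScadaBridgeTelemetry.AlarmReceived.Add(1);`. Nothing is exported until a listener (in practice the OTel SDK configured by `AddZbTelemetry`) subscribes to the meter by name, which is why the meter name has to be registered in the options.
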
**Adopt `AddZbSerilog`:**

- Replace the `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` call in
  `Program.cs:27–54` with `builder.AddZbSerilog(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; })`.
  The three enrichers (`SiteId`, `NodeRole`, `NodeHostname`) are now provided by the shared
  `AddZbSerilog` path (`SiteId`/`NodeRole` from options; `NodeHostname` auto from
  `Environment.MachineName`); `LoggerConfigurationFactory` can be deleted.
- `ReadFrom.Configuration` for sinks and `MinimumLevel.Is` override from config are preserved
  inside `AddZbSerilog` — behavior is unchanged.
- The `TraceContextEnricher` is wired automatically by `AddZbSerilog`. Metric instruments alone do
  not create spans, so `trace_id` / `span_id` will appear on log lines only once the application
  also emits spans from an `ActivitySource`, and only on lines written while a span is active.

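
For orientation, an enricher of this kind can be written in a few lines against Serilog's public `ILogEventEnricher` interface. The class below is a hypothetical stand-in: the shared library's actual `TraceContextEnricher` is not shown in this document and may differ in property names or formatting. It requires the `Serilog` package.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch; not the shared library's implementation.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Outside a span there is no ambient Activity, so nothing is added.
        var activity = Activity.Current;
        if (activity is null)
        {
            return;
        }

        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

In a hand-rolled setup this would be attached with `.Enrich.With<TraceContextEnricherSketch>()` on the `LoggerConfiguration`. Under the adoption plan that wiring happens inside `AddZbSerilog`, so no call-site change is expected.
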
**Keep bespoke:**

- `LoggingOptions.cs` — the `MinimumLevel` typed option and its config path
  (`ScadaBridge:Logging:MinimumLevel`) remain; `AddZbSerilog` must accept the minimum-level
  override from configuration. The config path stays ScadaBridge's own.
- Console output template including `[{NodeRole}/{NodeHostname}]` — driven by `appsettings.json`;
  no change.
- Akka actor-context log fields — per-operation context emitted by Akka infrastructure; not an
  enricher concern.
- `ZB.MOM.WW.ScadaBridge.Host.csproj` package set otherwise — no other changes to the project file.

**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. Adding instruments and adopting `AddZbSerilog`/`AddZbTelemetry` lands in the
ScadaBridge repo as a separate commit once the nupkg is available.