docs(observability): spec + METRIC-CONVENTIONS + ZB.MOM.WW.Telemetry shared contract

Author the three normalization docs for the observability component:
- components/observability/spec/SPEC.md — Section 0 scope (normalized vs. per-project),
  AddZbTelemetry pipeline, shared Resource attribute set, standard instrumentation baseline,
  exporter conventions, Serilog two-stage bootstrap with identity enrichers and
  TraceContextEnricher, ILogRedactor redaction seam, per-project migration table, and
  acceptance criteria.
- components/observability/spec/METRIC-CONVENTIONS.md — meter naming convention (app
  namespace; MxGateway.Server flagged as convergence target), instrument naming pattern
  (<app>.<subsystem>.<event>), mandatory duration unit = seconds (MxGateway ms histograms
  flagged), Resource attribute set table, standard instrumentation baseline, and per-app
  instrument tables (OtOpcUa 7 instruments + 2 spans; MxGateway 13 counters / 3 histograms
  / 4 gauges; ScadaBridge TBD).
- components/observability/shared-contract/ZB.MOM.WW.Telemetry.md — paper API for the two
  packages: ZbTelemetryOptions, ZbExporter enum, AddZbTelemetry (IHostApplicationBuilder +
  IServiceCollection overloads), ZbResource.Build, MapZbMetrics; AddZbSerilog,
  ZbLogEnricherNames constants, TraceContextEnricher, ILogRedactor, RedactionEnricher.
  Consumer matrix and open contract questions included.
Joseph Doherty
2026-06-01 07:19:38 -04:00
parent 76295695ee
commit 7d243890ed
6 changed files with 1149 additions and 0 deletions
@@ -0,0 +1,191 @@
# Observability — current state: MxAccessGateway
Repo: `~/Desktop/MxAccessGateway`. Stack: .NET 10 gateway (x64) + .NET 4.8 worker (**x86**);
solution `src/MxGateway.sln`. Telemetry code is concentrated in
`src/ZB.MOM.WW.MxGateway.Server/Metrics/` (instruments) and
`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/` (logging correlation + redaction).
All paths relative to repo root. Verified 2026-06-01.
The most unusual observability posture in the family: **13 counters, 3 histograms, and 4 observable
gauges** all fully hand-rolled using `System.Diagnostics.Metrics` directly — but **never exported**
(no OpenTelemetry SDK, no Prometheus exporter, no OTLP). All metric data dies in an in-memory
`GetSnapshot()`. Logging is `Microsoft.Extensions.Logging` exclusively (no Serilog), with a bespoke
correlation scope and a log-redaction pipeline. The net48 x86 worker is out of process and out of
scope — its `IWorkerLogger` (stderr key=value) is not addressed here.
## 1. Metrics (hand-rolled, unexported)
### `GatewayMetrics.cs`
`src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs`:
Meter name: `"MxGateway.Server"` (does not follow the project namespace `ZB.MOM.WW.MxGateway`).
All instruments are instance members of `GatewayMetrics`. The class is registered as a **singleton**
at `GatewayApplication.cs:62`. There is **no `OpenTelemetry.Extensions.Hosting`**,
**no `AddOpenTelemetry()` call**, and **no exporter** — the `Meter` is created with
`new Meter("MxGateway.Server")` and `GetSnapshot()` is the only read path.
**Counters (13):**
| Instrument name | Tracks |
|---|---|
| `mxgateway.sessions.opened` | New session requests |
| `mxgateway.sessions.closed` | Sessions torn down |
| `mxgateway.commands.started` | MXAccess command dispatched |
| `mxgateway.commands.succeeded` | Command completed OK |
| `mxgateway.commands.failed` | Command error |
| `mxgateway.events.received` | MXAccess events from worker |
| `mxgateway.queues.overflows` | Queue overflow (backpressure) |
| `mxgateway.faults` | Unhandled gateway faults |
| `mxgateway.workers.killed` | Worker process forcibly terminated |
| `mxgateway.workers.exited` | Worker process exited cleanly |
| `mxgateway.heartbeats.failed` | Worker heartbeat timeouts |
| `mxgateway.grpc.streams.disconnected` | gRPC event stream disconnects |
| `mxgateway.retries.attempted` | Retry attempts (any subsystem) |
**Histograms (3) — unit `ms` (diverges from OTel semconv `s`):**
| Instrument name | Tracks |
|---|---|
| `mxgateway.workers.startup.duration` | Time from worker spawn to ready |
| `mxgateway.commands.duration` | End-to-end MXAccess command latency |
| `mxgateway.events.stream_send.duration` | gRPC event stream send latency |
**Observable gauges (4):**
| Instrument name | Tracks |
|---|---|
| `mxgateway.sessions.open` | Currently open sessions (live count) |
| `mxgateway.workers.running` | Currently running worker processes |
| `mxgateway.events.worker_queue.depth` | Per-worker event queue depth |
| `mxgateway.events.grpc_stream_queue.depth` | Per-stream gRPC send queue depth |
All 20 instruments share the `mxgateway.*` prefix and `<category>.<event>` naming — consistent
with the family convention. Duration histograms record in **milliseconds** (`ms`); OTel semantic
conventions require seconds (`s`). This is the only project with `ms` histograms.
### Singleton wiring
`src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`:
- `:62`: `services.AddSingleton<GatewayMetrics>()` registers the metrics singleton.
There is no `AddOpenTelemetry()` call anywhere in the gateway. The `GatewayMetrics` `Meter` is
created independently of any OTel SDK — it participates in `MeterListener` / `GetSnapshot()` only.
Without the OTel SDK, this data is **invisible to Prometheus, OTLP, or any backend**.
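To make the in-process-only visibility concrete, here is a minimal sketch, assuming a single counter stands in for `GatewayMetrics` (the real class and its `GetSnapshot()` are not reproduced here). It illustrates that a `Meter` created without the OTel SDK can be read only by an in-process `MeterListener`:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical stand-in for GatewayMetrics: one counter on the real meter name.
using var meter = new Meter("MxGateway.Server");
Counter<long> sessionsOpened = meter.CreateCounter<long>("mxgateway.sessions.opened");

// Running totals keyed by instrument name (a crude in-memory snapshot).
var totals = new Dictionary<string, long>();

using var listener = new MeterListener();

// Subscribe only to instruments published by the gateway meter.
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "MxGateway.Server")
    {
        l.EnableMeasurementEvents(instrument);
    }
};

// Invoked synchronously on every Add() for long-valued instruments.
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    totals[instrument.Name] = totals.GetValueOrDefault(instrument.Name) + value;
});

listener.Start();

sessionsOpened.Add(1);
sessionsOpened.Add(2);

Console.WriteLine(totals["mxgateway.sessions.opened"]);
```

Nothing in this sketch leaves the process, which is the situation described above: the measurements exist, but no exporter ever sees them.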
### No tracing
No `ActivitySource` is defined. No spans are created. Tracing is entirely absent.
## 2. Logging (Microsoft.Extensions.Logging)
All logging in the gateway server uses `Microsoft.Extensions.Logging` (MEL) exclusively. There is
no Serilog dependency. Sink configuration lives in `appsettings.json` (Console, with structured
logging via the default host builder).
### Correlation scope
`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogScope.cs`:
Defines the per-request/per-session correlation property bag.
`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayRequestLoggingMiddlewareExtensions.cs`:
- `:22-41`: `UseGatewayRequestLogging()` middleware reads the following HTTP headers from each
incoming request: `x-session-id`, `x-worker-process-id`, `x-correlation-id`, `x-command-method`,
`authorization` (for redaction, not logging).
- Registered at `GatewayApplication.cs:34`.
`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLoggerExtensions.cs`:
- `:11-18`: `BeginGatewayScope(ILogger, GatewayLogScope)` calls `logger.BeginScope(scope)`,
MEL's `ILogger.BeginScope` mechanism, which pushes properties as a scoped dictionary.
The correlation tuple (`SessionId` / `WorkerProcessId` / `CorrelationId` / `CommandMethod`) is
injected into log lines produced within the scope. No `trace_id` / `span_id` enrichment — there
is no ActivitySource, so this is consistent but leaves no path to trace correlation.
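The scope mechanism described above can be sketched as follows. This is an illustration only: the record shape, the property types, and the `...Sketch` names are assumptions, not the contents of `GatewayLogScope.cs` or `GatewayLoggerExtensions.cs`.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical shape; the real GatewayLogScope may carry different types.
public sealed record GatewayLogScopeSketch(
    string SessionId,
    string WorkerProcessId,
    string CorrelationId,
    string CommandMethod);

public static class GatewayLoggerExtensionsSketch
{
    // Pushes the correlation tuple as a key/value scope. Providers that capture
    // scopes attach these pairs to every event logged until the scope is disposed.
    public static IDisposable? BeginGatewayScope(
        this ILogger logger, GatewayLogScopeSketch scope) =>
        logger.BeginScope(new Dictionary<string, object>
        {
            ["SessionId"] = scope.SessionId,
            ["WorkerProcessId"] = scope.WorkerProcessId,
            ["CorrelationId"] = scope.CorrelationId,
            ["CommandMethod"] = scope.CommandMethod,
        });
}
```

A caller would wrap the work in `using (logger.BeginGatewayScope(scope)) { ... }`. Whether the four properties actually show up in output depends on the provider: the MEL console logger, for example, only renders scopes when `IncludeScopes` is enabled.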
### Log redaction — `GatewayLogRedactor.cs`
`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactor.cs`:
- Masks sensitive data in log lines for three categories:
  - **`AuthenticateUser`** commands: the password argument is replaced.
  - **`WriteSecured`** commands: the value argument is replaced.
  - **`mxgw_` bearer tokens**: the token body is masked, keeping only the key-id prefix.
- Redaction is applied before the log event is emitted — no sensitive data reaches the sink.
This is the only project in the family with an explicit log-redaction pipeline. OtOpcUa and
ScadaBridge have no equivalent.
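As an illustration of the token-masking rule, here is a minimal sketch. The token layout `mxgw_<keyId>_<secret>` and the `***` replacement are assumptions made for the example; the document does not specify the real format or mask, and `TokenRedactionSketch` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenRedactionSketch
{
    // ASSUMPTION: tokens look like "mxgw_<keyId>_<secret>". Group 1 captures the
    // "mxgw_<keyId>" prefix that stays visible; the secret part is replaced.
    private static readonly Regex Token =
        new(@"\b(mxgw_[A-Za-z0-9]+)_[A-Za-z0-9\-]+", RegexOptions.Compiled);

    // Returns the message with every matching token body masked.
    public static string Redact(string message) =>
        Token.Replace(message, "${1}_***");
}
```

Running the redactor on the message before it reaches the logger, rather than inside a sink, matches the placement described above: the sensitive value never enters the log event.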
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | `System.Diagnostics.Metrics` (`Meter` direct) | ⛔ none (`GetSnapshot()` only) | ⛔ none |
| Traces | — | ⛔ none | ⛔ none |
| Logs | MEL (`Microsoft.Extensions.Logging`) | Console via `appsettings.json` | ⛔ none |
| Trace↔log correlation | — | — | ⛔ absent (no ActivitySource exists) |
## 4. Notable design choices
- **`GatewayMetrics` singleton** — all counter/gauge increments are lock-free atomic operations on
the underlying `Meter` instruments; the singleton is intentional.
- **`ms` histogram unit** — `workers.startup.duration`, `commands.duration`, and
`events.stream_send.duration` all record in milliseconds. This is non-standard (OTel semconv
requires `s`) and means raw values differ from OtOpcUa's `s` histograms by a factor of 1000.
- **MEL correlation via `BeginScope`** — MEL scopes are supported by structured logging providers
(e.g. Serilog.Extensions.Hosting, Seq, Application Insights) but are provider-dependent. The
scope properties may not appear in all sink configurations, unlike Serilog's `LogContext` which
is sink-agnostic.
- **Redaction placement** — `GatewayLogRedactor` sits between the caller and the log emission point,
not inside a sink. This is the correct placement; the shared `ILogRedactor` seam preserves this.
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**This is the one in-pass adoption.** The MxGateway MEL → Serilog migration is executed as part of
the `ZB.MOM.WW.Telemetry` library build, not deferred as a follow-on. The changes below land in
the MxAccessGateway repo as part of Task #9 (blocked by Task #8 — library build).
**Migrate logging MEL → `AddZbSerilog`:**
- Replace `WebApplicationBuilder` default logging with `builder.AddZbSerilog(o => { o.ServiceName = "mxgateway"; o.SiteId = ...; o.NodeRole = ...; })`.
Gains structured `SiteId` / `NodeRole` / `NodeHostname` enrichers on every log event, plus
`TraceContextEnricher` (currently moot — no spans — but ready for when tracing is added).
- Re-express the `GatewayLogScope` / `BeginGatewayScope` / `UseGatewayRequestLogging` correlation
mechanism as a Serilog `LogContext.PushProperty` scope. The middleware at
`GatewayRequestLoggingMiddlewareExtensions.cs:22-41` is refactored to push the same four
properties (`SessionId`, `WorkerProcessId`, `CorrelationId`, `CommandMethod`) via Serilog's
`LogContext` rather than MEL `BeginScope`. Behavior is identical; portability improves.
- Move `GatewayLogRedactor` behind the shared `ILogRedactor` seam. The redaction policy (which
commands/tokens to scrub and how) stays per-project in a `MxGatewayLogRedactor : ILogRedactor`
implementation; the seam is shared.
- Console + file sinks configured via `ReadFrom.Configuration` in `appsettings.json` — consistent
with OtOpcUa and ScadaBridge's Serilog approach.
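The `LogContext` re-expression of the correlation scope might look like the sketch below. It is an assumption-laden illustration, not the planned implementation: the method name is hypothetical, and it reads the four correlation headers listed for `UseGatewayRequestLogging()` directly rather than going through `GatewayLogScope`.

```csharp
using Microsoft.AspNetCore.Builder;
using Serilog.Context;

public static class GatewayRequestLoggingSketch
{
    // Hypothetical middleware: pushes the four correlation properties onto
    // Serilog's LogContext for the duration of the request. The logger must be
    // configured with .Enrich.FromLogContext() for the properties to appear.
    public static IApplicationBuilder UseGatewayRequestLoggingSketch(
        this IApplicationBuilder app) =>
        app.Use(async (context, next) =>
        {
            var headers = context.Request.Headers;

            using (LogContext.PushProperty("SessionId", headers["x-session-id"].ToString()))
            using (LogContext.PushProperty("WorkerProcessId", headers["x-worker-process-id"].ToString()))
            using (LogContext.PushProperty("CorrelationId", headers["x-correlation-id"].ToString()))
            using (LogContext.PushProperty("CommandMethod", headers["x-command-method"].ToString()))
            {
                // Properties are popped in reverse order when the request completes.
                await next(context);
            }
        });
}
```

Because `LogContext` is ambient and flows across `await`, every event logged further down the pipeline carries the same four properties regardless of which sink receives it.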
**Wire metrics export via `AddZbTelemetry`:**
- Add `builder.AddZbTelemetry(o => { o.ServiceName = "mxgateway"; o.SiteId = ...; o.NodeRole = ...; o.Meters = ["MxGateway.Server"]; })`.
This registers the OTel SDK and connects `GatewayMetrics`'s existing `Meter` to the Prometheus
exporter. The 13 counters, 3 histograms, and 4 gauges **begin exporting** for the first time.
`GatewayMetrics.cs` itself is unchanged — only the SDK layer is added around it.
- Add `app.MapZbMetrics()` to expose `/metrics`.
**Convert histogram unit `ms` → `s`:**
- Convert the three histograms to seconds: re-create the instruments with unit `s` and multiply
recorded values by `0.001` at the call site. Both steps are needed; scaling the values while the
instrument still declares `ms` would mislabel the data. This is a breaking change to existing dashboards/alerts
but required for OTel semconv compliance. Tagged as a convergence item in `GAPS.md`.
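A minimal sketch of the unit conversion, assuming the call site measures elapsed time with `Stopwatch` (the real timing code in `GatewayMetrics.cs` is not shown in this document):

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var meter = new Meter("MxGateway.Server");

// Same instrument name as today, but declared with unit "s" instead of "ms".
Histogram<double> commandDuration =
    meter.CreateHistogram<double>("mxgateway.commands.duration", unit: "s");

long start = Stopwatch.GetTimestamp();
// ... dispatch the MXAccess command here ...
TimeSpan elapsed = Stopwatch.GetElapsedTime(start);

// Record seconds directly; equivalent to elapsed.TotalMilliseconds * 0.001.
commandDuration.Record(elapsed.TotalSeconds);
```

Any histogram bucket boundaries that were chosen with millisecond values in mind would likely need revisiting at the same time, since values shrink by a factor of 1000.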
**Keep bespoke:**
- `GatewayMetrics.cs` — all 20 instruments (`mxgateway.*` counters, histograms, gauges) stay
per-project. `AddZbTelemetry` registers the Meter name; it does not own or replace the instruments.
- Meter name `"MxGateway.Server"` — a follow-on rename to `"ZB.MOM.WW.MxGateway"` is tracked in
`GAPS.md` but is not required for the initial adoption (it is a Prometheus label change that
breaks existing dashboards).
- `GatewayApplication.cs:62` singleton registration — unchanged; `GatewayMetrics` remains a
singleton; `AddZbTelemetry` simply hooks the OTel SDK to it.
- The net48 x86 worker's `IWorkerLogger` (stderr key=value) — out of process and out of scope.
No changes.
@@ -0,0 +1,158 @@
# Observability — current state: OtOpcUa
Repo: `~/Desktop/OtOpcUa`. Stack: .NET 10, Akka.NET, OPC UA; solution `ZB.MOM.WW.OtOpcUa.slnx`.
Telemetry code lives in two places: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/` (host-side
bootstrap) and `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/` (instruments + enricher).
All paths relative to repo root. Verified 2026-06-01.
The most complete observability implementation in the family: OpenTelemetry SDK with both metrics and
tracing signals, Prometheus export, Serilog structured logging with a per-session correlation enricher,
and a dedicated instrument vocabulary. The one significant gap: **no OTel Resource / `service.name`**,
so its signals carry no service identity and cannot be told apart from those of other fleet members in a backend.
## 1. Metrics (OpenTelemetry SDK)
### Bootstrap — `ObservabilityExtensions.cs`
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs`:
- `:18`: `AddOtOpcUaObservability(IServiceCollection)` is the service-registration entry point.
- `:20`: `AddOpenTelemetry()` wires the OTel SDK.
- `:21-23`: `.WithMetrics(b => b.AddMeter(OtOpcUaTelemetry.MeterName).AddPrometheusExporter())`
registers the application meter and attaches the Prometheus scrape exporter.
- `:24-25`: `.WithTracing(b => b.AddSource(OtOpcUaTelemetry.ActivitySourceName))`
registers the application activity source for trace data.
- **No `ResourceBuilder` call anywhere**: `service.name`, `service.namespace`, `service.version`,
`site.id`, and `node.role` are not set, so the OTel SDK falls back to its default Resource (a placeholder `service.name` beginning with `unknown_service`).
- `:36`: `MapOtOpcUaMetrics(IEndpointRouteBuilder)` maps the Prometheus endpoint.
- `:38` — endpoint path is `/metrics`.
`Program.cs`:
- `:138`: `builder.Services.AddOtOpcUaObservability()`
- `:160`: `app.MapOtOpcUaMetrics()`
Package refs in csproj: `OpenTelemetry.Extensions.Hosting`, `OpenTelemetry.Exporter.Prometheus.AspNetCore`.
**No `OpenTelemetry.Exporter.OpenTelemetryProtocol`** — OTLP is not available; Prometheus is the
only export path.
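For reference, the missing Resource could be supplied with the OTel SDK's own `ConfigureResource` hook. The sketch below is an assumption about what such wiring might look like, not the content of `ObservabilityExtensions.cs` or of the shared `AddZbTelemetry`; the method name and the `siteId` / `nodeRole` parameters are hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public static class ResourceSketch
{
    // Hypothetical registration method; where siteId and nodeRole come from
    // is not specified in this document.
    public static IServiceCollection AddObservabilityWithResource(
        this IServiceCollection services, string siteId, string nodeRole)
    {
        services.AddOpenTelemetry()
            // Attach service identity and fleet attributes to every signal.
            .ConfigureResource(r => r
                .AddService(serviceName: "otopcua")
                .AddAttributes(new KeyValuePair<string, object>[]
                {
                    new("site.id", siteId),
                    new("node.role", nodeRole),
                }))
            .WithMetrics(b => b
                .AddMeter("ZB.MOM.WW.OtOpcUa")
                .AddPrometheusExporter())
            .WithTracing(b => b
                .AddSource("ZB.MOM.WW.OtOpcUa"));

        return services;
    }
}
```

`AddService` also accepts optional `serviceNamespace` and `serviceVersion` arguments, which would cover the remaining two attributes listed above.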
### Instruments — `OtOpcUaTelemetry.cs`
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`:
- `:19`: `MeterName = "ZB.MOM.WW.OtOpcUa"` (the `Meter` the SDK will collect).
- `:20`: `ActivitySourceName = "ZB.MOM.WW.OtOpcUa"` (the `ActivitySource` for spans).
Instruments defined (all `static readonly` on `OtOpcUaTelemetry`):
| Instrument | Kind | Unit | Subsystem |
|---|---|---|---|
| `otopcua.deploy.applied` | `Counter<long>` | — | deploy |
| `otopcua.deploy.apply.duration` | `Histogram<double>` | `s` | deploy |
| `otopcua.driver.lifecycle` | `Counter<long>` | — | driver |
| `otopcua.virtualtag.eval` | `Counter<long>` | — | virtual-tag |
| `otopcua.scriptedalarm.transition` | `Counter<long>` | — | scripted-alarm |
| `otopcua.opcua.sink.write` | `Counter<long>` | — | opc-ua sink |
| `otopcua.redundancy.service_level_change` | `Counter<long>` | — | redundancy |
Two activity spans: `otopcua.deploy.apply`, `otopcua.opcua.address_space_rebuild`.
Naming convention: `otopcua.<subsystem>.<event>`. Duration histogram correctly uses unit `s`
(OTel semantic conventions). **No standard instrumentation** (ASP.NET Core, HttpClient, runtime,
gRPC client meters) is wired — only the bespoke application instruments.
## 2. Logging (Serilog)
### Bootstrap
`Program.cs`:
- `:49-52` runs a two-stage Serilog bootstrap: initial logger for startup, then full
`UseSerilog(ReadFrom.Configuration)`. Sinks: Console + rolling file `logs/otopcua-.log`.
- `:141`: `UseSerilogRequestLogging()` on the `WebApplication`.
### Correlation enricher — `LogContextEnricher.cs`
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/LogContextEnricher.cs`:
- `:18-36`: `Push(driverInstanceId, driverType, capability, correlationId)` calls
`LogContext.PushProperty` for four properties:
  - `DriverInstanceId`: Galaxy driver instance GUID.
  - `DriverType`: driver type discriminator.
  - `CapabilityName`: OPC UA capability being exercised.
  - `CorrelationId`: caller-supplied correlation token.
This enricher is driver-lifecycle-scoped, not request-scoped: it pushes the properties when a
driver operation begins and is disposed to pop them on completion.
**No `trace_id` / `span_id` enricher.** Although OtOpcUa creates `ActivitySource` spans, the
active `Activity.Current` trace context is never pushed onto Serilog's `LogContext`. A log line
emitted during a span cannot be correlated to the span in a backend.
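The missing piece is small. A minimal sketch of a Serilog enricher that copies the ambient trace context onto each log event follows; it is a hypothetical illustration, not the shared library's `TraceContextEnricher`, and the property names `trace_id` / `span_id` simply mirror the ones used in this document.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch; not the shared library's TraceContextEnricher.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        Activity? activity = Activity.Current;
        if (activity is null)
        {
            return; // no active span: leave the event untouched
        }

        // Hex-encoded W3C identifiers, matching what a tracing backend displays.
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

It would be registered on the logger configuration with `.Enrich.With<TraceContextEnricherSketch>()`. Events logged outside any span are left without the two properties rather than given empty values.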
**No structural enrichers for `service.name` / `site.id` / `node.role`**: these dimensions are
absent from every log line. ScadaBridge carries `SiteId` / `NodeRole` / `NodeHostname` on every line; OtOpcUa carries none of them.
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | OTel SDK (`Meter` + `WithMetrics`) | Prometheus `/metrics` | ⛔ none |
| Traces | OTel SDK (`ActivitySource` + `WithTracing`) | ⛔ none (no exporter configured) | ⛔ none |
| Logs | Serilog | Console + rolling file | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (`trace_id`/`span_id` not pushed) |
Note: `WithTracing` registers the `ActivitySource` for collection, but no exporter (OTLP or
otherwise) is attached to the tracing pipeline. Spans are created and recorded by the SDK but never
shipped anywhere — effectively a no-op in production.
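One way to close this gap is to attach an exporter to the tracing pipeline. The sketch below assumes the `OpenTelemetry.Exporter.OpenTelemetryProtocol` package is added (it is not referenced today) and uses a hypothetical method name; it is not how `AddZbTelemetry` is specified to do it.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Trace;

public static class TracingExportSketch
{
    // Hypothetical registration method showing a tracing pipeline with an exporter.
    public static IServiceCollection AddTracingWithExport(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithTracing(b => b
                .AddSource("ZB.MOM.WW.OtOpcUa")
                // Requires the OpenTelemetry.Exporter.OpenTelemetryProtocol package.
                // With no arguments the endpoint comes from the exporter defaults
                // or the OTEL_EXPORTER_OTLP_ENDPOINT environment variable.
                .AddOtlpExporter());

        return services;
    }
}
```

With an exporter attached, the two existing spans (`otopcua.deploy.apply`, `otopcua.opcua.address_space_rebuild`) would be shipped without any change to the instrumentation code.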
## 4. Notable design choices
- **Instrument naming** follows `<app>.<subsystem>.<event>` cleanly and consistently; this is the
pattern the shared spec codifies as the fleet convention.
- **Duration unit** correctly uses `s` on `otopcua.deploy.apply.duration` — no conversion needed on
adoption; this contrasts with MxAccessGateway's `ms` histograms.
- **LogContextEnricher is bespoke but valuable** — the `DriverInstanceId`/`DriverType`/`CapabilityName`
correlation is OtOpcUa-specific domain context; it should survive adoption behind the shared
enricher layer.
- **No OTLP path**: with no OTLP exporter, OtOpcUa cannot push metrics or traces to a collector
(Prometheus is scrape-pull only). This limits operational flexibility.
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**Replace with shared bootstrap:**
- `AddOtOpcUaObservability()` → `builder.AddZbTelemetry(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; o.Meters = [OtOpcUaTelemetry.MeterName]; o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; })`.
This adds the missing `Resource` (gains `service.name` / `service.namespace` / `service.version` /
`site.id` / `node.role` / `host.name` on every metric and span). Prometheus `/metrics` stays the
default exporter; OTLP becomes opt-in via options.
- Add standard instrumentation through `AddZbTelemetry` options: ASP.NET Core meters, HttpClient,
runtime + process meters — none wired today.
- Fix the tracing no-op: wire an OTLP exporter (or at minimum note that tracing is recorded but not
exported); `AddZbTelemetry` provides OTLP as the opt-in path.
- `MapOtOpcUaMetrics` → `app.MapZbMetrics()` (same `/metrics` path; shared convention).
**Replace with shared Serilog bootstrap:**
- Serilog bootstrap in `Program.cs:49-52` → `builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; })`.
This adds structural `SiteId` / `NodeRole` / `NodeHostname` properties to every log line
(currently absent) and wires the `TraceContextEnricher` so `trace_id`/`span_id` appear on log
lines emitted during active spans.
- Console + file sinks continue via `ReadFrom.Configuration` in `appsettings.json` — no sink changes
needed.
- `UseSerilogRequestLogging()` stays.
**Keep bespoke:**
- `OtOpcUaTelemetry.cs` — the application `Meter`, `ActivitySource`, and all instrument definitions
(`otopcua.*` counters, histograms, spans). These are domain instruments; `AddZbTelemetry` registers
them by name but does not own them.
- `LogContextEnricher.cs` — driver-lifecycle correlation properties (`DriverInstanceId`,
`DriverType`, `CapabilityName`, `CorrelationId`) are OtOpcUa-specific. The enricher continues to
push via `LogContext.PushProperty` alongside the shared enrichers.
- `ObservabilityExtensions.cs` is simplified rather than removed: it becomes a thin wrapper that
calls `AddZbTelemetry` with OtOpcUa-specific options. The per-project entry point remains; only
the implementation body is delegated to the shared library.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. The library build delivers the shared bootstrap and enrichers; adoption lands in the
OtOpcUa repo as a separate commit once the nupkg is available.
@@ -0,0 +1,151 @@
# Observability — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET, Docker; solution
`ZB.MOM.WW.ScadaBridge.slnx`. The telemetry posture is split across a dangling OTel package ref
(metrics/traces) and a substantive Serilog setup (logs). All paths relative to repo root.
Verified 2026-06-01.
Structurally the cleanest logging enricher set in the family — `SiteId` / `NodeRole` /
`NodeHostname` are already first-class Serilog enricher properties — but the weakest on
metrics/tracing: zero instrumentation. The `OpenTelemetry.Api` package reference is a CVE-patch
artefact, not instrumentation.
## 1. Metrics and traces (absent)
### `OpenTelemetry.Api` — CVE-patch ref, not instrumentation
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj`:
- `:31``<PackageReference Include="OpenTelemetry.Api" />` — a **direct version override** added
to satisfy GHSA-g94r-2vxg-569j / GHSA-8785-wc3w-h8q6 (OpenTelemetry 1.9.0 CVEs introduced via
`Akka.Hosting`'s pinned transitive dependency).
There is **no `AddOpenTelemetry()` call** in the solution. No `Meter` is created. No
`ActivitySource` is declared. No exporter is configured. The package reference solely overrides the
transitive version — it has no runtime effect on observability.
### Instrument coverage
Zero application instruments. There is no custom `Meter`, no counter, no histogram, no gauge, and
no span in the ScadaBridge codebase. This is the largest gap in the family.
## 2. Logging (Serilog — strongest enricher set)
### Two-stage bootstrap
`src/ZB.MOM.WW.ScadaBridge.Host/Program.cs`:
- `:27-54` runs a two-stage Serilog bootstrap: an initial logger is created for startup messages before
the host is built; the full logger replaces it during `UseSerilog`.
### `LoggerConfigurationFactory.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggerConfigurationFactory.cs`:
Full factory method signature: `Build(IConfiguration config, string nodeRole, string siteId, string nodeHostname)`.
- `:62` — reads `ScadaBridge:Logging:MinimumLevel` from configuration.
- `:84`: `ReadFrom.Configuration(config)` pulls sink configuration from `appsettings.json`.
- `:85`: explicit `MinimumLevel.Is(...)` override from the typed option.
- `:86-88` adds three structural enrichers:
  - `.Enrich.WithProperty("SiteId", siteId)`: site identifier (e.g. `"site-a"`).
  - `.Enrich.WithProperty("NodeHostname", nodeHostname)`: node hostname.
  - `.Enrich.WithProperty("NodeRole", nodeRole)`: Akka cluster role (e.g. `"central"`, `"site"`).
These three properties are the cleanest and most complete set in the family. ScadaBridge's property
names (`SiteId` / `NodeRole` / `NodeHostname`) are also the ones the shared `AddZbTelemetry`
options object maps onto `site.id` / `node.role` / `host.name` OTel Resource attributes — no
renaming needed on adoption.
### Sink configuration
`appsettings.json:3-23` configures the Serilog sinks read by `ReadFrom.Configuration`:
- Console sink with output template that includes `[{NodeRole}/{NodeHostname}]`.
- File sink (path in config; rolling interval).
### `LoggingOptions.cs`
`src/ZB.MOM.WW.ScadaBridge.Host/LoggingOptions.cs`:
- `MinimumLevel` — config-bound minimum level; default `Information`.
### Missing elements
- **No custom enrichers** beyond the three structural properties. `LogContextEnricher` (OtOpcUa's
driver-correlation enricher) has no equivalent; MxGateway's per-session correlation scope has no
equivalent. Per-request/per-operation correlation is not present.
- **No `trace_id` / `span_id` enricher.** As with the other two projects, log lines do not carry
trace context. Because ScadaBridge has zero `ActivitySource` instrumentation, this is consistent —
but it means no trace↔log correlation path exists even hypothetically.
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | ⛔ none | ⛔ none | ⛔ none |
| Traces | ⛔ none | ⛔ none | ⛔ none |
| Logs | Serilog | Console + file (`appsettings.json`) | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (no ActivitySource; no enricher) |
## 4. Notable design choices
- **`SiteId` / `NodeRole` / `NodeHostname` as first-class enrichers** — unlike OtOpcUa's driver-
scoped `LogContextEnricher`, ScadaBridge's structural enrichers are attached at logger creation and
appear on every log line from the process. This is the target pattern for the shared bootstrap.
- **`nodeRole` + `siteId` passed into the factory**: ScadaBridge's `LoggerConfigurationFactory.Build`
takes these as method arguments rather than reading them from a registered options object.
The shared `AddZbSerilog` approach binds them from the same `ZbTelemetryOptions` used for the OTel
Resource, unifying the source.
- **Config-driven `MinimumLevel`**: `ScadaBridge:Logging:MinimumLevel` is a typed config path, and
sinks come from `ReadFrom.Configuration`. The shared bootstrap's `AddZbSerilog` must support the same
pattern.
- **No custom enrichers** — ScadaBridge's logging is intentionally minimal on operation-scoped
context. Correlation in the distributed model is provided by structured log fields from Akka
actor context, not a log enricher pipeline.
- **CVE-patch ref discipline** — the `OpenTelemetry.Api` pin is a responsible CVE response but
leaves the telemetry story incomplete. On adoption, the CVE pin is superseded by the full OTel SDK
pulled in by `AddZbTelemetry`; the explicit `<PackageReference>` override can be removed.
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**Replace CVE-patch ref with full OTel SDK via `AddZbTelemetry`:**
- Remove the lone `OpenTelemetry.Api` override from
`src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:31`.
- Add `builder.AddZbTelemetry(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.Meters = ["ZB.MOM.WW.ScadaBridge"]; })`.
The full OTel SDK supersedes the transitive version override; the CVE is resolved transitively
via the SDK's current dependency.
**Add first application instruments:**
- Define a `ScadaBridgeTelemetry` class (mirror `OtOpcUaTelemetry`) with a `Meter` named
`"ZB.MOM.WW.ScadaBridge"` and an initial set of instruments covering the most observable
operations: site connection lifecycle, alarm received, data-change received, actor supervision
events. Naming convention: `scadabridge.<subsystem>.<event>`.
- Register the meter name in `AddZbTelemetry` options. Expose `/metrics` via `app.MapZbMetrics()`.
ScadaBridge goes from zero instrumentation to a baseline exportable set.
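A first cut of such a class might look like the sketch below. Only the meter name and the `scadabridge.<subsystem>.<event>` pattern come from this document; the class name suffix, the three instruments, and their exact names are illustrative assumptions, since the initial instrument set has not been decided.

```csharp
using System.Diagnostics.Metrics;

// Hypothetical sketch; instrument names below are illustrative, not decided.
public static class ScadaBridgeTelemetrySketch
{
    public const string MeterName = "ZB.MOM.WW.ScadaBridge";

    // Declared first so it is initialized before the instruments that use it.
    private static readonly Meter Meter = new(MeterName);

    // Counts alarms received from sites.
    public static readonly Counter<long> AlarmReceived =
        Meter.CreateCounter<long>("scadabridge.alarm.received");

    // Counts data-change notifications received from sites.
    public static readonly Counter<long> DataChangeReceived =
        Meter.CreateCounter<long>("scadabridge.datachange.received");

    // Time taken to establish a site connection, in seconds per the fleet convention.
    public static readonly Histogram<double> SiteConnectDuration =
        Meter.CreateHistogram<double>("scadabridge.site.connect.duration", unit: "s");
}
```

Call sites would then record with, for example, `ScadaBridgeTelemetrySketch.AlarmReceived.Add(1)`, and the meter becomes exportable once `MeterName` is listed in the telemetry options.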
**Adopt `AddZbSerilog`:**
- Replace the `LoggerConfigurationFactory.Build(config, nodeRole, siteId, nodeHostname)` call in
`Program.cs:27-54` with `builder.AddZbSerilog(o => { o.ServiceName = "scadabridge"; o.SiteId = cfg.SiteId; o.NodeRole = cfg.NodeRole; o.NodeHostname = cfg.NodeHostname; })`.
The three enrichers (`SiteId`, `NodeRole`, `NodeHostname`) are now provided by the shared
`AddZbSerilog` path; `LoggerConfigurationFactory` can be deleted.
- `ReadFrom.Configuration` for sinks and `MinimumLevel.Is` override from config are preserved
inside `AddZbSerilog` — behavior is unchanged.
- The `TraceContextEnricher` is wired automatically by `AddZbSerilog`; once spans exist
(ScadaBridge declares no `ActivitySource` today), `trace_id` / `span_id` will appear on log lines emitted during them.
**Keep bespoke:**
- `LoggingOptions.cs` — the `MinimumLevel` typed option and its config path
(`ScadaBridge:Logging:MinimumLevel`) remain; `AddZbSerilog` must accept the minimum-level
override from configuration. The config path stays ScadaBridge's own.
- Console output template including `[{NodeRole}/{NodeHostname}]` — driven by `appsettings.json`;
no change.
- Akka actor-context log fields — per-operation context emitted by Akka infrastructure; not an
enricher concern.
- `ZB.MOM.WW.ScadaBridge.Host.csproj` package set otherwise — no other changes to the project file.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. Adding instruments and adopting `AddZbSerilog`/`AddZbTelemetry` lands in the
ScadaBridge repo as a separate commit once the nupkg is available.