C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).
C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).
C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.
m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.
I4: §5 standard instrumentation table corrected — OtOpcUa now shows ⛔ not added for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.
I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).
I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.
I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.
m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
Observability (metrics / traces / logs)
Third normalized component under the operability cluster. Goal: path to shared code — converge
the three sister projects onto a common OpenTelemetry Resource, a shared Serilog bootstrap with
unified enrichers, and a trace↔log correlation bridge, proposed as the ZB.MOM.WW.Telemetry
library set (2 packages), while each project keeps its own application instruments and sink
configuration.
- The one target: spec/SPEC.md
- Metric naming reference: spec/METRIC-CONVENTIONS.md
- The proposed shared library: shared-contract/ZB.MOM.WW.Telemetry.md
- Divergences + backlog: GAPS.md
- Current state, per project: current-state/
Why observability is a strong normalization candidate
All three projects instrument something — but in three completely different ways and at three very different levels of completeness. The divergences are structural:
- OtOpcUa has the full OpenTelemetry SDK (metrics + tracing), Prometheus export, and a bespoke Serilog enricher for driver-lifecycle correlation, but no Resource (service.name is never set) and no trace↔log bridge.
- MxAccessGateway has 20 hand-rolled instruments (counters, histograms, gauges) recording real production data that never leaves the process. No OTel SDK, no exporter, no tracing. Logging uses Microsoft.Extensions.Logging rather than Serilog, with a bespoke correlation-scope and redaction pipeline.
- ScadaBridge has zero application instruments. Its OpenTelemetry.Api reference is a CVE patch, not instrumentation. It does have the cleanest structured logging enricher set (SiteId/NodeRole/NodeHostname), but those properties exist only in Serilog, not in the OTel Resource, so logs and metrics cannot join in a backend.
Nobody sets a Resource. Nobody does trace↔log correlation. MxGateway's metrics are invisible. ScadaBridge has no metrics at all.
The common fix is a single AddZbTelemetry(options) call that: creates a shared Resource from a
service.name/site.id/node.role options object; registers the project's own Meter/ActivitySource
names with the OTel SDK; and exposes Prometheus /metrics. A companion AddZbSerilog(options) wires
Serilog with the same options as enricher properties and adds TraceContextEnricher so logs carry
trace_id/span_id. The unifying hinge: the same identity triple (service.name/site.id/
node.role) populates both the OTel Resource and the Serilog enrichers, so a metric, a span, and
a log line from the same node carry identical dimensions and join up in a backend.
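
The trace↔log half of that bridge can be sketched with BCL types alone. This is a minimal illustration, not the library's implementation: the class and method names are hypothetical, and a real TraceContextEnricher would implement Serilog's enricher interface and attach these values to each log event rather than return a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: computes the properties a trace-context enricher
// would attach to a log event.
public static class TraceContextSketch
{
    // Returns trace_id/span_id taken from Activity.Current, or an empty
    // dictionary when no Activity is in flight on the current async context.
    public static IReadOnlyDictionary<string, string> FromCurrentActivity()
    {
        var props = new Dictionary<string, string>();
        var activity = Activity.Current;
        if (activity is null)
        {
            return props;
        }

        // W3C format: 32 hex characters for the trace, 16 for the span.
        props["trace_id"] = activity.TraceId.ToHexString();
        props["span_id"] = activity.SpanId.ToHexString();
        return props;
    }
}
```

Reading Activity.Current at log time means a log line written inside a span picks up that span's identifiers without the caller passing them around.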
One adoption happens in this task: MxAccessGateway migrates off MEL (Microsoft.Extensions.Logging)
onto AddZbSerilog. All other app wiring is follow-on, consistent with how Auth and UI-Theme are structured.
Status by project
| Project | OTel SDK today | Metrics today | Tracing today | Logging today | Enrichers today | Adoption status |
|---|---|---|---|---|---|---|
| OtOpcUa | ✅ full SDK (WithMetrics+WithTracing) | ✅ 7 instruments (otopcua.*); Prometheus /metrics | 🟡 2 spans defined; no exporter | Serilog (Console+File) | DriverInstanceId/DriverType/CapabilityName/CorrelationId (driver-scope) | Not started (follow-on) |
| MxAccessGateway | ⛔ none (hand-rolled Meter) | 🟡 20 instruments (mxgateway.*); never exported | ⛔ none | MEL → migrating to Serilog in this task | SessionId/WorkerProcessId/CorrelationId/CommandMethod (MEL scope) | In progress (Task #9) |
| ScadaBridge | ⛔ (OpenTelemetry.Api CVE-patch only) | ⛔ zero instruments | ⛔ none | Serilog (Console+File) | SiteId/NodeRole/NodeHostname (process-level; strongest set) | Not started (follow-on) |
See each project's current-state/<project>/CURRENT-STATE.md for the
code-verified detail and its adoption plan.
Normalized vs. left per-project
Normalized (the shared target):
- AddZbTelemetry(ZbTelemetryOptions): front door for the OTel SDK. Populates the shared Resource (service.name, service.namespace, service.version, site.id, node.role, host.name). Registers the caller-supplied Meter and ActivitySource name(s). Wires standard instrumentation (ASP.NET Core, HttpClient, runtime, process). Prometheus default; OTLP opt-in.
- app.MapZbMetrics(): maps the Prometheus /metrics endpoint (shared path + shared exporter).
- AddZbSerilog(ZbTelemetryOptions): shared Serilog two-stage bootstrap generalizing ScadaBridge's LoggerConfigurationFactory. Wires SiteId/NodeRole/NodeHostname enrichers from the same options object as the OTel Resource. Wires TraceContextEnricher (trace_id/span_id from Activity.Current). Preserves ReadFrom.Configuration for sinks and explicit MinimumLevel.Is override.
- ILogRedactor seam: generalized from MxGateway's GatewayLogRedactor. The seam is shared; the redaction policy (which fields/commands) stays per-project.
- Metric naming convention: <meter>.<subsystem>.<event>; Meter name = project namespace (ZB.MOM.WW.<ProjectName>); duration unit = s (OTel semconv).
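
As an illustration of the naming convention, the sketch below declares one histogram the way the target convention describes it: Meter named after the project namespace, instrument named mxgateway.commands.duration, unit s. It shows the target state only. The class name NamingSketch and the Timed helper are hypothetical, not taken from GatewayMetrics.cs.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical holder for one instrument that follows the target convention.
public static class NamingSketch
{
    // Meter name = project namespace.
    public static readonly Meter Meter = new("ZB.MOM.WW.MxGateway");

    // <meter>.<subsystem>.<event>, with the duration unit in seconds.
    public static readonly Histogram<double> CommandDuration =
        Meter.CreateHistogram<double>(
            "mxgateway.commands.duration",
            unit: "s",
            description: "Wall-clock duration of a gateway command.");

    // Runs the command and records its elapsed time in seconds,
    // whether it returns or throws.
    public static T Timed<T>(Func<T> command)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return command();
        }
        finally
        {
            CommandDuration.Record(stopwatch.Elapsed.TotalSeconds);
        }
    }
}
```

Recording TotalSeconds as a double keeps sub-millisecond resolution while matching the OTel semantic-convention unit.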
Left per-project (not forced together):
- Application Meter, ActivitySource, and all instrument definitions: otopcua.*, mxgateway.*, scadabridge.* instruments are owned by each repo.
- Serilog sink configuration (appsettings.json Console/File templates, rolling intervals).
- Per-operation/per-session correlation enrichers (LogContextEnricher in OtOpcUa; LogContext.PushProperty scope in MxGateway after migration).
- Redaction policies (MxGatewayLogRedactor implements ILogRedactor with gateway-specific command/field rules).
- Config section paths for SiteId/NodeRole: each project binds these from its own config hierarchy and passes the resolved values to AddZbTelemetry/AddZbSerilog. NodeHostname is not caller-supplied; it is taken automatically from Environment.MachineName.
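
The split between a shared redaction seam and a per-project policy might look like the following. The interface shape (a single Redact method taking a field name and value) is an assumption for illustration only, since the source does not show the real ILogRedactor signature. FieldListRedactor is a hypothetical stand-in for a project-owned policy.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the shared seam; the real signature may differ.
public interface ILogRedactor
{
    string Redact(string fieldName, string value);
}

// Hypothetical per-project policy: masks a project-chosen set of field names.
public sealed class FieldListRedactor : ILogRedactor
{
    private readonly HashSet<string> _sensitiveFields;

    public FieldListRedactor(IEnumerable<string> sensitiveFields)
    {
        // Field names are matched case-insensitively.
        _sensitiveFields = new HashSet<string>(sensitiveFields, StringComparer.OrdinalIgnoreCase);
    }

    // Returns a fixed mask for sensitive fields and the value unchanged otherwise.
    public string Redact(string fieldName, string value) =>
        _sensitiveFields.Contains(fieldName) ? "***" : value;
}
```

The shared package would depend only on the interface, so each project can change which fields or commands it masks without touching the shared code.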
Package structure
ZB.MOM.WW.Telemetry ships as two dependency-split packages:
| Package | Contents | Consumers |
|---|---|---|
| ZB.MOM.WW.Telemetry | AddZbTelemetry, ZbTelemetryOptions, Resource builder, standard instrumentation, Prometheus/OTLP exporters, app.MapZbMetrics() | All three |
| ZB.MOM.WW.Telemetry.Serilog | AddZbSerilog, shared enrichers (SiteId/NodeRole/NodeHostname/TraceContextEnricher), ILogRedactor seam | All three (Serilog users); MxGateway on migration |
Both packages share ZbTelemetryOptions as the single options object that drives Resource
attributes, Serilog enrichers, Meter/ActivitySource names, and exporter selection — the unifying
hinge that makes a metric, a span, and a log line from the same node carry identical dimensions.
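
A rough sketch of that hinge, assuming a reduced options type: one object is projected into both OTel Resource attributes and Serilog property names, so the two sides cannot drift apart. IdentityOptions and IdentitySketch are hypothetical names; the real ZbTelemetryOptions carries more fields (Meter/ActivitySource names, exporter selection) that are omitted here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of ZbTelemetryOptions: only the identity fields.
public sealed record IdentityOptions(string ServiceName, string SiteId, string NodeRole);

public static class IdentitySketch
{
    // Attribute keys as they would appear on the OTel Resource.
    public static IReadOnlyDictionary<string, object> ToResourceAttributes(IdentityOptions options) =>
        new Dictionary<string, object>
        {
            ["service.name"] = options.ServiceName,
            ["site.id"] = options.SiteId,
            ["node.role"] = options.NodeRole,
            // The hostname is read from the machine, not supplied by the caller.
            ["host.name"] = Environment.MachineName,
        };

    // Property names as they would appear on Serilog log events.
    public static IReadOnlyDictionary<string, object> ToLogProperties(IdentityOptions options) =>
        new Dictionary<string, object>
        {
            ["SiteId"] = options.SiteId,
            ["NodeRole"] = options.NodeRole,
            ["NodeHostname"] = Environment.MachineName,
        };
}
```

Because both projections read the same instance, a backend query on site.id for metrics and on SiteId for logs resolves to the same value for a given node.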
Component status
Status: Draft. Spec and shared-contract written; current-state docs verified; GAPS backlog
populated. Library implementation in progress (ZB.MOM.WW.Telemetry — Task #8). MxAccessGateway
MEL → Serilog migration in progress (Task #9, blocked by library build). Adoption by OtOpcUa and
ScadaBridge is follow-on, tracked in GAPS.md.