Observability — current state: OtOpcUa
Repo: ~/Desktop/OtOpcUa. Stack: .NET 10, Akka.NET, OPC UA; solution ZB.MOM.WW.OtOpcUa.slnx.
Telemetry code lives in two places: src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ (host-side
bootstrap) and src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/ (instruments + enricher).
All paths relative to repo root. Verified 2026-06-01.
The most complete observability implementation in the family: OpenTelemetry SDK with both metrics and
tracing signals, Prometheus export, Serilog structured logging with a per-session correlation enricher,
and a dedicated instrument vocabulary. The one significant gap: no OTel Resource / service.name is
configured, so a backend cannot attribute these signals to OtOpcUa or tell them apart from those of other fleet members.
1. Metrics (OpenTelemetry SDK)
Bootstrap — ObservabilityExtensions.cs
src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs:
- :18 - AddOtOpcUaObservability(IServiceCollection) is the service-registration entry point.
- :20 - AddOpenTelemetry() wires the OTel SDK.
- :21–23 - .WithMetrics(b => b.AddMeter(OtOpcUaTelemetry.MeterName).AddPrometheusExporter()): registers the application meter and attaches the Prometheus scrape exporter.
- :24–25 - .WithTracing(b => b.AddSource(OtOpcUaTelemetry.ActivitySourceName)): registers the application activity source for trace data.
- No ResourceBuilder call anywhere: service.name, service.namespace, service.version, site.id, and node.role are not set. The OTel SDK falls back to its SDK-default Resource, which carries only a placeholder unknown_service name.
- :36 - MapOtOpcUaMetrics(IEndpointRouteBuilder) maps the Prometheus endpoint.
- :38 - endpoint path is /metrics.
Program.cs:
- :138 - builder.Services.AddOtOpcUaObservability()
- :160 - app.MapOtOpcUaMetrics()
Package refs in csproj: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.Prometheus.AspNetCore.
No OpenTelemetry.Exporter.OpenTelemetryProtocol — OTLP is not available; Prometheus is the
only export path.
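To make the Resource gap concrete, the following is a minimal sketch of what the same registration could look like with a Resource attached. It is not the repo's code: the method name AddOtOpcUaObservabilityWithResource and its parameters are hypothetical, the attribute values are placeholders supplied by the caller, and it assumes the two OpenTelemetry packages listed above are referenced.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public static class ObservabilityResourceSketch
{
    // Hypothetical variant of the existing registration; only ConfigureResource is new.
    public static IServiceCollection AddOtOpcUaObservabilityWithResource(
        this IServiceCollection services,
        string serviceNamespace, // placeholder: the source does not state a value
        string serviceVersion,
        string siteId,
        string nodeRole)
    {
        services.AddOpenTelemetry()
            // Attach identity attributes to every metric and span from this process.
            .ConfigureResource(r => r
                .AddService(
                    serviceName: "otopcua",
                    serviceNamespace: serviceNamespace,
                    serviceVersion: serviceVersion)
                .AddAttributes(new KeyValuePair<string, object>[]
                {
                    new("site.id", siteId),
                    new("node.role", nodeRole),
                }))
            // Same meter and activity source names as the current bootstrap.
            .WithMetrics(b => b.AddMeter("ZB.MOM.WW.OtOpcUa").AddPrometheusExporter())
            .WithTracing(b => b.AddSource("ZB.MOM.WW.OtOpcUa"));

        return services;
    }
}
```

The only difference from the wiring described above is the ConfigureResource call; the meter, activity source, and Prometheus exporter are unchanged.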
Instruments — OtOpcUaTelemetry.cs
src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs:
- :19 - MeterName = "ZB.MOM.WW.OtOpcUa" (the Meter the SDK will collect).
- :20 - ActivitySourceName = "ZB.MOM.WW.OtOpcUa" (the ActivitySource for spans).
Instruments defined (all static readonly on OtOpcUaTelemetry):
| Instrument | Kind | Unit | Subsystem |
|---|---|---|---|
| otopcua.deploy.applied | Counter<long> | - | deploy |
| otopcua.deploy.apply.duration | Histogram<double> | s | deploy |
| otopcua.driver.lifecycle | Counter<long> | - | driver |
| otopcua.virtualtag.eval | Counter<long> | - | virtual-tag |
| otopcua.scriptedalarm.transition | Counter<long> | - | scripted-alarm |
| otopcua.opcua.sink.write | Counter<long> | - | opc-ua sink |
| otopcua.redundancy.service_level_change | Counter<long> | - | redundancy |
Two activity spans: otopcua.deploy.apply, otopcua.opcua.address_space_rebuild.
Naming convention: otopcua.<subsystem>.<event>. Duration histogram correctly uses unit s
(OTel semantic conventions). No standard instrumentation (ASP.NET Core, HttpClient, runtime,
gRPC client meters) is wired — only the bespoke application instruments.
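As an illustration of the instrument pattern described above, here is a minimal sketch using only System.Diagnostics.Metrics. The instrument names and the unit come from the table; the class name, field names, and the RecordDeploy helper are hypothetical and do not reproduce OtOpcUaTelemetry.cs.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class OtOpcUaTelemetrySketch
{
    public const string MeterName = "ZB.MOM.WW.OtOpcUa";

    // One Meter per application; the OTel SDK subscribes to it by name via AddMeter.
    private static readonly Meter AppMeter = new(MeterName);

    // Counter without a unit, named <app>.<subsystem>.<event>.
    public static readonly Counter<long> DeployApplied =
        AppMeter.CreateCounter<long>("otopcua.deploy.applied");

    // Duration histogram recorded in seconds (unit "s").
    public static readonly Histogram<double> DeployApplyDuration =
        AppMeter.CreateHistogram<double>("otopcua.deploy.apply.duration", unit: "s");

    // Hypothetical helper: runs the deploy action, then records one count and its duration.
    public static void RecordDeploy(Action apply)
    {
        var stopwatch = Stopwatch.StartNew();
        apply();
        DeployApplied.Add(1);
        // TotalSeconds keeps the recorded value consistent with the declared unit.
        DeployApplyDuration.Record(stopwatch.Elapsed.TotalSeconds);
    }
}
```

Recording Elapsed.TotalSeconds rather than milliseconds is what keeps the histogram aligned with the "s" unit it declares.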
2. Logging (Serilog)
Bootstrap
Program.cs:
- :49–52 - two-stage Serilog bootstrap: initial logger for startup, then full UseSerilog(ReadFrom.Configuration). Sinks: Console + rolling file logs/otopcua-.log.
- :141 - UseSerilogRequestLogging() on the WebApplication.
Correlation enricher — LogContextEnricher.cs
src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/LogContextEnricher.cs:
- :18–36 - Push(driverInstanceId, driverType, capability, correlationId) calls LogContext.PushProperty for four properties:
  - DriverInstanceId: Galaxy driver instance GUID.
  - DriverType: driver type discriminator.
  - CapabilityName: OPC UA capability being exercised.
  - CorrelationId: caller-supplied correlation token.

This enricher is driver-lifecycle-scoped, not request-scoped: it pushes the properties when a driver operation begins and returns a disposable that pops them on completion.
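The push-and-pop behaviour described above can be sketched as follows. This is an illustration, not the contents of LogContextEnricher.cs: the class name, the string parameter types, and the Scope helper are assumptions; only the four property names and the use of LogContext.PushProperty come from the description.

```csharp
using System;
using Serilog.Context;

public static class LogContextEnricherSketch
{
    // Pushes the four correlation properties and returns a handle that pops them.
    public static IDisposable Push(
        string driverInstanceId, string driverType, string capability, string correlationId)
    {
        var pushed = new[]
        {
            LogContext.PushProperty("DriverInstanceId", driverInstanceId),
            LogContext.PushProperty("DriverType", driverType),
            LogContext.PushProperty("CapabilityName", capability),
            LogContext.PushProperty("CorrelationId", correlationId),
        };
        return new Scope(pushed);
    }

    // Hypothetical helper that disposes the pushed properties in reverse order.
    private sealed class Scope : IDisposable
    {
        private readonly IDisposable[] _pushed;

        public Scope(IDisposable[] pushed) => _pushed = pushed;

        public void Dispose()
        {
            for (var i = _pushed.Length - 1; i >= 0; i--)
            {
                _pushed[i].Dispose();
            }
        }
    }
}
```

A caller would wrap a driver operation in `using (LogContextEnricherSketch.Push(...)) { ... }`. The properties only reach log events if the logger is configured with Enrich.FromLogContext().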
No trace_id / span_id enricher. Although OtOpcUa creates ActivitySource spans, the
active Activity.Current trace context is never pushed onto Serilog's LogContext. A log line
emitted during a span cannot be correlated to the span in a backend.
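One way to close this gap is a Serilog enricher that copies the ambient trace context onto each log event. The sketch below is an assumption about how such an enricher could look; it is not the shared library's TraceContextEnricher, and the class name TraceContextEnricherSketch is hypothetical. The property names trace_id and span_id follow the wording used in this document.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        // Activity.Current is the span active on this async flow, or null outside a span.
        var activity = Activity.Current;
        if (activity is null)
        {
            return;
        }

        // Hex string forms of the W3C trace and span identifiers.
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

It would be registered with `.Enrich.With<TraceContextEnricherSketch>()` on the LoggerConfiguration; log lines emitted outside any span are left untouched.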
No structural enrichers for service.name / site.id / node.role — these dimensions are
absent from every log line. ScadaBridge has these; OtOpcUa does not.
3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | OTel SDK (Meter + WithMetrics) | Prometheus /metrics | ⛔ none |
| Traces | OTel SDK (ActivitySource + WithTracing) | ⛔ none (no exporter configured) | ⛔ none |
| Logs | Serilog | Console + rolling file | ⛔ none (no service.name property) |
| Trace↔log correlation | — | — | ⛔ absent (trace_id/span_id not pushed) |
Note: WithTracing registers the ActivitySource for collection, but no exporter (OTLP or
otherwise) is attached to the tracing pipeline. Spans are created and recorded by the SDK but never
shipped anywhere — effectively a no-op in production.
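For comparison, a tracing pipeline that actually ships spans needs an exporter attached. The sketch below assumes the OpenTelemetry.Exporter.OpenTelemetryProtocol package is added, which the repo does not reference today; the method name AddOtOpcUaTracingWithOtlp is hypothetical.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Trace;

public static class TracingExportSketch
{
    // Hypothetical registration: same activity source, plus an OTLP exporter.
    public static IServiceCollection AddOtOpcUaTracingWithOtlp(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithTracing(b => b
                .AddSource("ZB.MOM.WW.OtOpcUa")
                // Without an exporter here, sampled spans are recorded and then dropped.
                .AddOtlpExporter());

        return services;
    }
}
```

With no arguments, AddOtlpExporter uses the exporter's default endpoint settings, which can be overridden through the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable.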
4. Notable design choices
- Instrument naming follows <app>.<subsystem>.<event> (here otopcua.<subsystem>.<event>) cleanly and consistently; this is the pattern the shared spec codifies as the fleet convention.
- Duration unit correctly uses s on otopcua.deploy.apply.duration, so no conversion is needed on adoption; this contrasts with MxAccessGateway's ms histograms.
- LogContextEnricher is bespoke but valuable: the DriverInstanceId/DriverType/CapabilityName correlation is OtOpcUa-specific domain context, and it should survive adoption behind the shared enricher layer.
- No OTLP path: with no OTLP exporter, OtOpcUa cannot push metrics or traces to a collector (Prometheus is scrape-pull only). This limits operational flexibility.
Adoption plan → ZB.MOM.WW.Telemetry
Replace with shared bootstrap:
- AddOtOpcUaObservability() → builder.AddZbTelemetry(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; o.Meters = [OtOpcUaTelemetry.MeterName]; o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; }). This adds the missing Resource (gains service.name/service.namespace/service.version/site.id/node.role/host.name on every metric and span). Prometheus /metrics stays the default exporter; OTLP becomes opt-in via options.
- Add standard instrumentation through AddZbTelemetry options: ASP.NET Core meters, HttpClient, runtime + process meters; none are wired today.
- Fix the tracing no-op: wire an OTLP exporter (or at minimum note that tracing is recorded but not exported); AddZbTelemetry provides OTLP as the opt-in path.
- MapOtOpcUaMetrics → app.MapZbMetrics() (same /metrics path; shared convention).
Replace with shared Serilog bootstrap:
- Serilog bootstrap in Program.cs:49–52 → builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; }). This adds structural SiteId/NodeRole/NodeHostname properties to every log line (currently absent) and wires the TraceContextEnricher so trace_id/span_id appear on log lines emitted during active spans.
- Console + file sinks continue via ReadFrom.Configuration in appsettings.json; no sink changes needed.
- UseSerilogRequestLogging() stays.
Keep bespoke:
- OtOpcUaTelemetry.cs: the application Meter, ActivitySource, and all instrument definitions (otopcua.* counters, histograms, spans). These are domain instruments; AddZbTelemetry registers them by name but does not own them.
- LogContextEnricher.cs: driver-lifecycle correlation properties (DriverInstanceId, DriverType, CapabilityName, CorrelationId) are OtOpcUa-specific. The enricher continues to push via LogContext.PushProperty alongside the shared enrichers.
- ObservabilityExtensions.cs itself can be simplified or removed: it becomes a thin wrapper that calls AddZbTelemetry with OtOpcUa-specific options. The per-project entry point remains; only the implementation body is delegated to the shared library.
Adoption is a follow-on task (tracked in GAPS.md), not part of the ZB.MOM.WW.Telemetry
library build. The library build delivers the shared bootstrap and enrichers; adoption lands in the
OtOpcUa repo as a separate commit once the nupkg is available.