# Observability — current state: OtOpcUa

Repo: `~/Desktop/OtOpcUa`. Stack: .NET 10, Akka.NET, OPC UA; solution `ZB.MOM.WW.OtOpcUa.slnx`.
Telemetry code lives in two places: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/` (host-side bootstrap) and `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/` (instruments + enricher).
All paths relative to repo root. Verified 2026-06-01.

The most complete observability implementation in the family: OpenTelemetry SDK with both metrics and tracing signals, Prometheus export, Serilog structured logging with a per-session correlation enricher, and a dedicated instrument vocabulary. The one significant gap: **no OTel Resource / `service.name`**, so all signals are indistinguishable from one another and from other fleet members in a backend.

## 1. Metrics (OpenTelemetry SDK)

### Bootstrap — `ObservabilityExtensions.cs`

`src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs`:

- `:18` — `AddOtOpcUaObservability(IServiceCollection)` is the service-registration entry point.
- `:20` — `AddOpenTelemetry()` wires the OTel SDK.
- `:21–23` — `.WithMetrics(b => b.AddMeter(OtOpcUaTelemetry.MeterName).AddPrometheusExporter())`: registers the application meter and attaches the Prometheus scrape exporter.
- `:24–25` — `.WithTracing(b => b.AddSource(OtOpcUaTelemetry.ActivitySourceName))`: registers the application activity source for trace data.
- **No `ResourceBuilder` call anywhere** — `service.name`, `service.namespace`, `service.version`, `site.id`, and `node.role` are not set. The OTel SDK defaults to an empty/SDK-default Resource.
- `:36` — `MapOtOpcUaMetrics(IEndpointRouteBuilder)` maps the Prometheus endpoint.
- `:38` — endpoint path is `/metrics`.

`Program.cs`:

- `:138` — `builder.Services.AddOtOpcUaObservability()`
- `:160` — `app.MapOtOpcUaMetrics()`

Package refs in csproj: `OpenTelemetry.Extensions.Hosting`, `OpenTelemetry.Exporter.Prometheus.AspNetCore`.
**No `OpenTelemetry.Exporter.OpenTelemetryProtocol`** — OTLP is not available; Prometheus is the only export path.

### Instruments — `OtOpcUaTelemetry.cs`

`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`:

- `:19` — `MeterName = "ZB.MOM.WW.OtOpcUa"` (the `Meter` the SDK will collect).
- `:20` — `ActivitySourceName = "ZB.MOM.WW.OtOpcUa"` (the `ActivitySource` for spans).

Instruments defined (all `static readonly` on `OtOpcUaTelemetry`):

| Instrument | Kind | Unit | Subsystem |
|---|---|---|---|
| `otopcua.deploy.applied` | `Counter` | — | deploy |
| `otopcua.deploy.apply.duration` | `Histogram` | `s` | deploy |
| `otopcua.driver.lifecycle` | `Counter` | — | driver |
| `otopcua.virtualtag.eval` | `Counter` | — | virtual-tag |
| `otopcua.scriptedalarm.transition` | `Counter` | — | scripted-alarm |
| `otopcua.opcua.sink.write` | `Counter` | — | opc-ua sink |
| `otopcua.redundancy.service_level_change` | `Counter` | — | redundancy |

Two activity spans: `otopcua.deploy.apply`, `otopcua.opcua.address_space_rebuild`.

Naming convention: `otopcua.<subsystem>.<name>`. Duration histogram correctly uses unit `s` (OTel semantic conventions).

**No standard instrumentation** (ASP.NET Core, HttpClient, runtime, gRPC client meters) is wired — only the bespoke application instruments.

## 2. Logging (Serilog)

### Bootstrap

`Program.cs`:

- `:49–52` — two-stage Serilog bootstrap: initial logger for startup, then full `UseSerilog(ReadFrom.Configuration)`. Sinks: Console + rolling file `logs/otopcua-.log`.
- `:141` — `UseSerilogRequestLogging()` on the `WebApplication`.

### Correlation enricher — `LogContextEnricher.cs`

`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/LogContextEnricher.cs`:

- `:18–36` — `Push(driverInstanceId, driverType, capability, correlationId)` calls `LogContext.PushProperty` for four properties:
  - `DriverInstanceId` — Galaxy driver instance GUID.
  - `DriverType` — driver type discriminator.
  - `CapabilityName` — OPC UA capability being exercised.
  - `CorrelationId` — caller-supplied correlation token.

This enricher is driver-lifecycle-scoped, not request-scoped — it pushes the properties when a driver operation begins and is disposed on completion to pop them.

**No `trace_id` / `span_id` enricher.** Although OtOpcUa creates `ActivitySource` spans, the active `Activity.Current` trace context is never pushed onto Serilog's `LogContext`. A log line emitted during a span cannot be correlated to the span in a backend.

**No structural enrichers for `service.name` / `site.id` / `node.role`** — these dimensions are absent from every log line. ScadaBridge has these; OtOpcUa does not.

## 3. Signal summary

| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | OTel SDK (`Meter` + `WithMetrics`) | Prometheus `/metrics` | ⛔ none |
| Traces | OTel SDK (`ActivitySource` + `WithTracing`) | ⛔ none (no exporter configured) | ⛔ none |
| Logs | Serilog | Console + rolling file | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (`trace_id`/`span_id` not pushed) |

Note: `WithTracing` registers the `ActivitySource` for collection, but no exporter (OTLP or otherwise) is attached to the tracing pipeline. Spans are created and recorded by the SDK but never shipped anywhere — effectively a no-op in production.

## 4. Notable design choices

- **Instrument naming** follows `<service>.<subsystem>.<name>` cleanly and consistently — this is the pattern the shared spec codifies as the fleet convention.
- **Duration unit** correctly uses `s` on `otopcua.deploy.apply.duration` — no conversion needed on adoption; this contrasts with MxAccessGateway's `ms` histograms.
- **LogContextEnricher is bespoke but valuable** — the `DriverInstanceId`/`DriverType`/`CapabilityName` correlation is OtOpcUa-specific domain context; it should survive adoption behind the shared enricher layer.
- **No OTLP path** — with no OTLP exporter, OtOpcUa cannot send metrics or traces to a collector (Prometheus is scrape-pull only).
  This limits operational flexibility.

---

## Adoption plan → `ZB.MOM.WW.Telemetry`

**Replace with shared bootstrap:**

- `AddOtOpcUaObservability()` → `builder.AddZbTelemetry(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; o.Meters = [OtOpcUaTelemetry.MeterName]; o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; })`. This adds the missing `Resource` (gains `service.name` / `service.namespace` / `service.version` / `site.id` / `node.role` / `host.name` on every metric and span). Prometheus `/metrics` stays the default exporter; OTLP becomes opt-in via options.
- Add standard instrumentation through `AddZbTelemetry` options: ASP.NET Core meters, HttpClient, runtime + process meters — none wired today.
- Fix the tracing no-op: wire an OTLP exporter (or at minimum note that tracing is recorded but not exported); `AddZbTelemetry` provides OTLP as the opt-in path.
- `MapOtOpcUaMetrics` → `app.MapZbMetrics()` (same `/metrics` path; shared convention).

**Replace with shared Serilog bootstrap:**

- Serilog bootstrap in `Program.cs:49–52` → `builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; })`. This adds structural `SiteId` / `NodeRole` / `NodeHostname` properties to every log line (currently absent) and wires the `TraceContextEnricher` so `trace_id`/`span_id` appear on log lines emitted during active spans.
- Console + file sinks continue via `ReadFrom.Configuration` in `appsettings.json` — no sink changes needed.
- `UseSerilogRequestLogging()` stays.

**Keep bespoke:**

- `OtOpcUaTelemetry.cs` — the application `Meter`, `ActivitySource`, and all instrument definitions (`otopcua.*` counters, histograms, spans). These are domain instruments; `AddZbTelemetry` registers them by name but does not own them.
- `LogContextEnricher.cs` — driver-lifecycle correlation properties (`DriverInstanceId`, `DriverType`, `CapabilityName`, `CorrelationId`) are OtOpcUa-specific.
  The enricher continues to push via `LogContext.PushProperty` alongside the shared enrichers.
- `ObservabilityExtensions.cs` itself can be simplified or removed — it becomes a thin wrapper that calls `AddZbTelemetry` with OtOpcUa-specific options. The per-project entry point remains; only the implementation body is delegated to the shared library.

**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry` library build. The library build delivers the shared bootstrap and enrichers; adoption lands in the OtOpcUa repo as a separate commit once the nupkg is available.
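
For illustration only, the trace-to-log correlation that the Serilog adoption step is meant to deliver can be sketched with BCL types alone. This is not the `TraceContextEnricher` from `ZB.MOM.WW.Telemetry`, whose implementation is not shown in this document; `TraceContextSnapshot` and `Demo` are hypothetical names, and the sketch assumes the default W3C activity ID format. It shows where `trace_id` and `span_id` would come from (`Activity.Current`) and why nothing is available outside a span.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of OtOpcUa or ZB.MOM.WW.Telemetry.
public static class TraceContextSnapshot
{
    // Returns the properties a trace-context enricher could attach to a log event,
    // or an empty dictionary when no span is active on the current execution context.
    public static IReadOnlyDictionary<string, string> Capture()
    {
        var properties = new Dictionary<string, string>();
        var activity = Activity.Current;
        if (activity is null)
        {
            return properties;
        }

        // ToHexString() yields the lowercase hex form of the W3C ids.
        properties["trace_id"] = activity.TraceId.ToHexString();
        properties["span_id"] = activity.SpanId.ToHexString();
        return properties;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Same source name the document lists for OtOpcUaTelemetry.ActivitySourceName.
        using var source = new ActivitySource("ZB.MOM.WW.OtOpcUa");

        // Without a listener, StartActivity returns null. In the host, the OTel SDK's
        // WithTracing registration plays this role; here a bare listener stands in for it.
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "ZB.MOM.WW.OtOpcUa",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);

        Console.WriteLine($"outside span: {TraceContextSnapshot.Capture().Count} properties");

        using (source.StartActivity("otopcua.deploy.apply"))
        {
            foreach (var (key, value) in TraceContextSnapshot.Capture())
            {
                Console.WriteLine($"{key}={value}");
            }
        }
    }
}
```

Run as a console program, the first line reports zero properties, and the lines printed inside the `using` block carry the ids of the `otopcua.deploy.apply` span. A real enricher would attach the same two values to each log event instead of printing them.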