scadaproj/components/observability/current-state/otopcua/CURRENT-STATE.md
Joseph Doherty 7d243890ed docs(observability): spec + METRIC-CONVENTIONS + ZB.MOM.WW.Telemetry shared contract
Author the three normalization docs for the observability component:
- components/observability/spec/SPEC.md — Section 0 scope (normalized vs. per-project),
  AddZbTelemetry pipeline, shared Resource attribute set, standard instrumentation baseline,
  exporter conventions, Serilog two-stage bootstrap with identity enrichers and
  TraceContextEnricher, ILogRedactor redaction seam, per-project migration table, and
  acceptance criteria.
- components/observability/spec/METRIC-CONVENTIONS.md — meter naming convention (app
  namespace; MxGateway.Server flagged as convergence target), instrument naming pattern
  (<app>.<subsystem>.<event>), mandatory duration unit = seconds (MxGateway ms histograms
  flagged), Resource attribute set table, standard instrumentation baseline, and per-app
  instrument tables (OtOpcUa 7 instruments + 2 spans; MxGateway 13 counters / 3 histograms
  / 4 gauges; ScadaBridge TBD).
- components/observability/shared-contract/ZB.MOM.WW.Telemetry.md — paper API for the two
  packages: ZbTelemetryOptions, ZbExporter enum, AddZbTelemetry (IHostApplicationBuilder +
  IServiceCollection overloads), ZbResource.Build, MapZbMetrics; AddZbSerilog,
  ZbLogEnricherNames constants, TraceContextEnricher, ILogRedactor, RedactionEnricher.
  Consumer matrix and open contract questions included.
2026-06-01 07:19:38 -04:00

# Observability — current state: OtOpcUa
Repo: `~/Desktop/OtOpcUa`. Stack: .NET 10, Akka.NET, OPC UA; solution `ZB.MOM.WW.OtOpcUa.slnx`.
Telemetry code lives in two places: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/` (host-side
bootstrap) and `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/` (instruments + enricher).
All paths relative to repo root. Verified 2026-06-01.
The most complete observability implementation in the family: OpenTelemetry SDK with both metrics and
tracing signals, Prometheus export, Serilog structured logging with a per-session correlation enricher,
and a dedicated instrument vocabulary. The one significant gap: **no OTel Resource / `service.name`**,
so in a backend its signals cannot be attributed to OtOpcUa or told apart from those of other fleet members.
## 1. Metrics (OpenTelemetry SDK)
### Bootstrap — `ObservabilityExtensions.cs`
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Observability/ObservabilityExtensions.cs`:
- `:18` — `AddOtOpcUaObservability(IServiceCollection)` is the service-registration entry point.
- `:20` — `AddOpenTelemetry()` wires the OTel SDK.
- `:21–23` — `.WithMetrics(b => b.AddMeter(OtOpcUaTelemetry.MeterName).AddPrometheusExporter())`:
registers the application meter and attaches the Prometheus scrape exporter.
- `:24–25` — `.WithTracing(b => b.AddSource(OtOpcUaTelemetry.ActivitySourceName))`:
registers the application activity source for trace data.
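- **No `ResourceBuilder` call anywhere** — `service.name`, `service.namespace`, `service.version`,
`site.id`, and `node.role` are not set. The OTel SDK falls back to its default Resource, which carries
only the `telemetry.sdk.*` attributes and a placeholder `service.name` of the form `unknown_service:<process>`.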
- `:36` — `MapOtOpcUaMetrics(IEndpointRouteBuilder)` maps the Prometheus endpoint.
- `:38` — endpoint path is `/metrics`.
`Program.cs`:
- `:138` — `builder.Services.AddOtOpcUaObservability()`
- `:160` — `app.MapOtOpcUaMetrics()`
Package refs in csproj: `OpenTelemetry.Extensions.Hosting`, `OpenTelemetry.Exporter.Prometheus.AspNetCore`.
**No `OpenTelemetry.Exporter.OpenTelemetryProtocol`** — OTLP is not available; Prometheus is the
only export path.
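For orientation, the registration described above could be sketched roughly as follows. This is a reconstruction from the notes in this section, not the repo's code: the type and method names (`ObservabilitySketch`, `AddObservabilitySketch`, `MapMetricsSketch`) and the `siteId` / `nodeRole` parameters are hypothetical, and the `ConfigureResource` call is shown only to illustrate where the missing Resource would attach. It assumes the two packages listed above are referenced.

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Hypothetical sketch; names here are illustrative, not the repo's.
public static class ObservabilitySketch
{
    // Same values as documented for OtOpcUaTelemetry.MeterName / ActivitySourceName.
    private const string MeterName = "ZB.MOM.WW.OtOpcUa";
    private const string ActivitySourceName = "ZB.MOM.WW.OtOpcUa";

    public static IServiceCollection AddObservabilitySketch(
        this IServiceCollection services, string siteId, string nodeRole)
    {
        services.AddOpenTelemetry()
            // Assumption: this call does not exist today; it is the gap noted above.
            .ConfigureResource(r => r
                .AddService(serviceName: "otopcua")
                .AddAttributes(new Dictionary<string, object>
                {
                    ["site.id"] = siteId,
                    ["node.role"] = nodeRole,
                }))
            // Application meter, exposed through the Prometheus scrape exporter.
            .WithMetrics(b => b.AddMeter(MeterName).AddPrometheusExporter())
            // Application activity source; no exporter is attached to tracing.
            .WithTracing(b => b.AddSource(ActivitySourceName));
        return services;
    }

    public static IEndpointRouteBuilder MapMetricsSketch(this IEndpointRouteBuilder endpoints)
    {
        // Serves the scrape endpoint at the documented path.
        endpoints.MapPrometheusScrapingEndpoint("/metrics");
        return endpoints;
    }
}
```

Without the `ConfigureResource` call the rest of the sketch matches the behavior documented above: one meter, one activity source, Prometheus as the only exporter.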
### Instruments — `OtOpcUaTelemetry.cs`
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`:
- `:19` — `MeterName = "ZB.MOM.WW.OtOpcUa"` (the `Meter` the SDK will collect).
- `:20` — `ActivitySourceName = "ZB.MOM.WW.OtOpcUa"` (the `ActivitySource` for spans).
Instruments defined (all `static readonly` on `OtOpcUaTelemetry`):
| Instrument | Kind | Unit | Subsystem |
|---|---|---|---|
| `otopcua.deploy.applied` | `Counter<long>` | — | deploy |
| `otopcua.deploy.apply.duration` | `Histogram<double>` | `s` | deploy |
| `otopcua.driver.lifecycle` | `Counter<long>` | — | driver |
| `otopcua.virtualtag.eval` | `Counter<long>` | — | virtual-tag |
| `otopcua.scriptedalarm.transition` | `Counter<long>` | — | scripted-alarm |
| `otopcua.opcua.sink.write` | `Counter<long>` | — | opc-ua sink |
| `otopcua.redundancy.service_level_change` | `Counter<long>` | — | redundancy |
Two activity spans: `otopcua.deploy.apply`, `otopcua.opcua.address_space_rebuild`.
Naming convention: `otopcua.<subsystem>.<event>`. Duration histogram correctly uses unit `s`
(OTel semantic conventions). **No standard instrumentation** (ASP.NET Core, HttpClient, runtime,
gRPC client meters) is wired — only the bespoke application instruments.
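As an illustration of the naming and unit conventions above, a minimal sketch of two of the listed instruments and one span might look like this. The instrument and span names come from the table; the class name `TelemetrySketch`, the `ApplyDeployment` helper, and the choice to count only successful applies are assumptions made for the example, not the contents of `OtOpcUaTelemetry.cs`.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical sketch; not the repo's OtOpcUaTelemetry class.
public static class TelemetrySketch
{
    public const string MeterName = "ZB.MOM.WW.OtOpcUa";
    public const string ActivitySourceName = "ZB.MOM.WW.OtOpcUa";

    // Declared first so the instruments below can be created from them.
    private static readonly Meter Meter = new(MeterName);
    private static readonly ActivitySource Source = new(ActivitySourceName);

    public static readonly Counter<long> DeployApplied =
        Meter.CreateCounter<long>("otopcua.deploy.applied");

    // Unit "s": values must be recorded in seconds, not milliseconds.
    public static readonly Histogram<double> DeployApplyDuration =
        Meter.CreateHistogram<double>("otopcua.deploy.apply.duration", unit: "s");

    // Assumed helper: wraps an apply operation in a span and records its duration.
    public static void ApplyDeployment(Action apply)
    {
        // StartActivity returns null when nothing is listening to the source.
        using Activity? activity = Source.StartActivity("otopcua.deploy.apply");
        long start = Stopwatch.GetTimestamp();
        try
        {
            apply();
            DeployApplied.Add(1);
        }
        finally
        {
            DeployApplyDuration.Record(Stopwatch.GetElapsedTime(start).TotalSeconds);
        }
    }
}
```

Recording `TotalSeconds` directly is what keeps the histogram aligned with the `s` unit; an instrument declared with `s` but fed milliseconds would be off by a factor of 1000 in every backend.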
## 2. Logging (Serilog)
### Bootstrap
`Program.cs`:
- `:49–52` — two-stage Serilog bootstrap: initial logger for startup, then full
`UseSerilog(ReadFrom.Configuration)`. Sinks: Console + rolling file `logs/otopcua-.log`.
- `:141` — `UseSerilogRequestLogging()` on the `WebApplication`.
### Correlation enricher — `LogContextEnricher.cs`
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/LogContextEnricher.cs`:
- `:18–36` — `Push(driverInstanceId, driverType, capability, correlationId)` calls
`LogContext.PushProperty` for four properties:
- `DriverInstanceId` — Galaxy driver instance GUID.
- `DriverType` — driver type discriminator.
- `CapabilityName` — OPC UA capability being exercised.
- `CorrelationId` — caller-supplied correlation token.
This enricher is driver-lifecycle-scoped, not request-scoped: it pushes the properties when a driver
operation begins, and the push is disposable so that the properties are popped on completion.
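One plausible shape for such a scoped push is sketched below. It is not the repo's `LogContextEnricher`; the class name `DriverLogScopeSketch`, the `CompositeScope` helper, and the string parameter types are assumptions. The only library call used is Serilog's `LogContext.PushProperty`, which returns an `IDisposable` that removes the property when disposed.

```csharp
using System;
using Serilog.Context;

// Hypothetical sketch of a driver-lifecycle log scope.
public static class DriverLogScopeSketch
{
    public static IDisposable Push(
        string driverInstanceId, string driverType, string capability, string correlationId)
    {
        // Each PushProperty call returns a handle that pops that one property.
        var handles = new IDisposable[]
        {
            LogContext.PushProperty("DriverInstanceId", driverInstanceId),
            LogContext.PushProperty("DriverType", driverType),
            LogContext.PushProperty("CapabilityName", capability),
            LogContext.PushProperty("CorrelationId", correlationId),
        };
        return new CompositeScope(handles);
    }

    // Disposes the handles in reverse order so the pops mirror the pushes.
    private sealed class CompositeScope : IDisposable
    {
        private readonly IDisposable[] _handles;

        public CompositeScope(IDisposable[] handles) => _handles = handles;

        public void Dispose()
        {
            for (int i = _handles.Length - 1; i >= 0; i--)
            {
                _handles[i].Dispose();
            }
        }
    }
}
```

A caller would wrap the driver operation in `using (DriverLogScopeSketch.Push(...)) { ... }`. The properties only reach log events if the logger is configured with `Enrich.FromLogContext()`.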
**No `trace_id` / `span_id` enricher.** Although OtOpcUa creates `ActivitySource` spans, the
active `Activity.Current` trace context is never pushed onto Serilog's `LogContext`. A log line
emitted during a span cannot be correlated to the span in a backend.
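To make the gap concrete, a trace-context enricher can be as small as the sketch below. This is an illustrative stand-in, not the shared `TraceContextEnricher`, whose implementation is not shown in this document; the class name `TraceContextEnricherSketch` is hypothetical, and the property names `trace_id` / `span_id` follow the wording above.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch: copies the ambient Activity's ids onto each log event.
public sealed class TraceContextEnricherSketch : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        Activity? activity = Activity.Current;
        if (activity is null)
        {
            return; // No active span: leave the event untouched.
        }

        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```

It would be registered with `Enrich.With<TraceContextEnricherSketch>()` on the logger configuration. Because it reads `Activity.Current` at the moment the event is enriched, it only adds ids to log lines emitted while a span is active on the current execution context.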
**No structural enrichers for `service.name` / `site.id` / `node.role`** — these dimensions are
absent from every log line. ScadaBridge has these; OtOpcUa does not.
## 3. Signal summary
| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | OTel SDK (`Meter` + `WithMetrics`) | Prometheus `/metrics` | ⛔ none |
| Traces | OTel SDK (`ActivitySource` + `WithTracing`) | ⛔ none (no exporter configured) | ⛔ none |
| Logs | Serilog | Console + rolling file | ⛔ none (no `service.name` property) |
| Trace↔log correlation | — | — | ⛔ absent (`trace_id`/`span_id` not pushed) |
Note: `WithTracing` registers the `ActivitySource` for collection, but no exporter (OTLP or
otherwise) is attached to the tracing pipeline. Spans are created and recorded by the SDK but never
shipped anywhere — effectively a no-op in production.
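A minimal sketch of closing that gap, assuming the `OpenTelemetry.Exporter.OpenTelemetryProtocol` package were added (it is not referenced today): the method name `AddTracingWithOtlpSketch` and the `collectorEndpoint` parameter are hypothetical.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Trace;

// Hypothetical sketch; requires the OTLP exporter package, which the repo lacks today.
public static class TracingExportSketch
{
    public static IServiceCollection AddTracingWithOtlpSketch(
        this IServiceCollection services, Uri collectorEndpoint)
    {
        services.AddOpenTelemetry()
            .WithTracing(b => b
                // Same source name as OtOpcUaTelemetry.ActivitySourceName.
                .AddSource("ZB.MOM.WW.OtOpcUa")
                // With an exporter attached, recorded spans are actually shipped.
                .AddOtlpExporter(o => o.Endpoint = collectorEndpoint));
        return services;
    }
}
```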
## 4. Notable design choices
- **Instrument naming** follows `<app>.<subsystem>.<event>` cleanly and consistently (the prefix is
`otopcua`, not the meter name `ZB.MOM.WW.OtOpcUa`); this is the pattern the shared spec codifies as
the fleet convention.
- **Duration unit** correctly uses `s` on `otopcua.deploy.apply.duration` — no conversion needed on
adoption; this contrasts with MxAccessGateway's `ms` histograms.
- **LogContextEnricher is bespoke but valuable** — the `DriverInstanceId`/`DriverType`/`CapabilityName`
correlation is OtOpcUa-specific domain context; it should survive adoption behind the shared
enricher layer.
- **No OTLP path** — with no OTLP exporter, OtOpcUa cannot push metrics or traces to a collector.
Metrics leave the process only when a scraper pulls `/metrics`, and traces have no export path at all.
---
## Adoption plan → `ZB.MOM.WW.Telemetry`
**Replace with shared bootstrap:**
- `AddOtOpcUaObservability()` → `builder.AddZbTelemetry(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; o.Meters = [OtOpcUaTelemetry.MeterName]; o.ActivitySources = [OtOpcUaTelemetry.ActivitySourceName]; })`.
This adds the missing `Resource` (gains `service.name` / `service.namespace` / `service.version` /
`site.id` / `node.role` / `host.name` on every metric and span). Prometheus `/metrics` stays the
default exporter; OTLP becomes opt-in via options.
- Add standard instrumentation through `AddZbTelemetry` options: ASP.NET Core meters, HttpClient,
runtime + process meters — none wired today.
- Fix the tracing no-op: wire an OTLP exporter (or at minimum note that tracing is recorded but not
exported); `AddZbTelemetry` provides OTLP as the opt-in path.
- `MapOtOpcUaMetrics` → `app.MapZbMetrics()` (same `/metrics` path; shared convention).
**Replace with shared Serilog bootstrap:**
- Serilog bootstrap in `Program.cs:49–52` → `builder.AddZbSerilog(o => { o.ServiceName = "otopcua"; o.SiteId = ...; o.NodeRole = ...; })`.
This adds structural `SiteId` / `NodeRole` / `NodeHostname` properties to every log line
(currently absent) and wires the `TraceContextEnricher` so `trace_id`/`span_id` appear on log
lines emitted during active spans.
- Console + file sinks continue via `ReadFrom.Configuration` in `appsettings.json` — no sink changes
needed.
- `UseSerilogRequestLogging()` stays.
**Keep bespoke:**
- `OtOpcUaTelemetry.cs` — the application `Meter`, `ActivitySource`, and all instrument definitions
(`otopcua.*` counters, histograms, spans). These are domain instruments; `AddZbTelemetry` registers
them by name but does not own them.
- `LogContextEnricher.cs` — driver-lifecycle correlation properties (`DriverInstanceId`,
`DriverType`, `CapabilityName`, `CorrelationId`) are OtOpcUa-specific. The enricher continues to
push via `LogContext.PushProperty` alongside the shared enrichers.
- `ObservabilityExtensions.cs` itself can be simplified or removed — it becomes a thin wrapper that
calls `AddZbTelemetry` with OtOpcUa-specific options. The per-project entry point remains; only
the implementation body is delegated to the shared library.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Telemetry`
library build. The library build delivers the shared bootstrap and enrichers; adoption lands in the
OtOpcUa repo as a separate commit once the nupkg is available.