# Observability (metrics / traces / logs)

Third normalized component under the operability cluster. **Goal: path to shared code.** Converge
the three sister projects onto a common OpenTelemetry Resource, a shared Serilog bootstrap with
unified enrichers, and a trace↔log correlation bridge, proposed as the `ZB.MOM.WW.Telemetry`
library set (2 packages), while each project keeps its own application instruments and sink
configuration.

- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- Metric naming reference: [`spec/METRIC-CONVENTIONS.md`](spec/METRIC-CONVENTIONS.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Telemetry.md`](shared-contract/ZB.MOM.WW.Telemetry.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)

## Why observability is a strong normalization candidate

All three projects instrument something, but in three completely different ways and at three very
different levels of completeness. The divergences are structural:

- **OtOpcUa** has the full OpenTelemetry SDK (metrics + tracing), Prometheus export, and a bespoke
  Serilog enricher for driver-lifecycle correlation, but no Resource (`service.name` is never set)
  and no trace↔log bridge.
- **MxAccessGateway** has 20 hand-rolled instruments (counters, histograms, gauges) recording real
  production data that never leaves the process. No OTel SDK, no exporter, no tracing. Before the
  migration in this task, logging used Microsoft.Extensions.Logging (MEL) rather than Serilog,
  with a bespoke correlation-scope and redaction pipeline.
- **ScadaBridge** has zero application instruments. Its `OpenTelemetry.Api` reference is a CVE
  patch, not instrumentation. It does have the cleanest structured logging enricher set
  (`SiteId`/`NodeRole`/`NodeHostname`), but those properties exist only in Serilog, not in the
  OTel Resource, so logs and metrics cannot join in a backend.

Nobody sets a Resource. Nobody does trace↔log correlation. MxGateway's metrics are invisible.
ScadaBridge has no metrics at all.

The common fix is a single `AddZbTelemetry(options)` call that: creates a shared Resource from a
`service.name`/`site.id`/`node.role` options object; registers the project's own Meter/ActivitySource
names with the OTel SDK; and exposes Prometheus `/metrics`. A companion `AddZbSerilog(options)` wires
Serilog with the same options as enricher properties and adds `TraceContextEnricher` so logs carry
`trace_id`/`span_id`. The unifying hinge: the same identity triple (`service.name`/`site.id`/
`node.role`) populates both the OTel Resource and the Serilog enrichers, so a metric, a span, and
a log line from the same node carry identical dimensions and join up in a backend.

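The hinge described above can be illustrated without the library itself. The following is a minimal sketch using only the .NET base class library, not the actual `ZbTelemetryOptions` implementation. The sample values (`mxgateway`, `site-01`, `primary`) and the `ServiceName` log property name are assumptions made for illustration; the resource attribute keys and the `SiteId`/`NodeRole` property names are the ones named in this README.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical identity values; a real host would bind these from its own configuration.
string serviceName = "mxgateway";
string siteId = "site-01";
string nodeRole = "primary";

// Attributes destined for the OTel Resource (keys as named in the text above).
var resourceAttributes = new Dictionary<string, object>
{
    ["service.name"] = serviceName,
    ["site.id"] = siteId,
    ["node.role"] = nodeRole,
};

// Properties destined for log events. "SiteId" and "NodeRole" appear in this README;
// "ServiceName" is an assumed property name.
var logProperties = new Dictionary<string, object>
{
    ["ServiceName"] = serviceName,
    ["SiteId"] = siteId,
    ["NodeRole"] = nodeRole,
};

// Both projections read the same variables, so the values cannot drift apart.
bool aligned =
    object.Equals(resourceAttributes["service.name"], logProperties["ServiceName"]) &&
    object.Equals(resourceAttributes["site.id"], logProperties["SiteId"]) &&
    object.Equals(resourceAttributes["node.role"], logProperties["NodeRole"]);

Console.WriteLine($"identity aligned across signals: {aligned}");
```

The point of the sketch is the single source of truth: because one set of values feeds both projections, a backend can join metrics, spans, and logs on those dimensions.
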
One adoption happens **in this task**: MxAccessGateway migrates off MEL onto `AddZbSerilog`. All
other app wiring is follow-on, consistent with how Auth and UI-Theme are structured.

## Status by project

| Project | OTel SDK today | Metrics today | Tracing today | Logging today | Enrichers today | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | ✅ full SDK (`WithMetrics`+`WithTracing`) | ✅ 7 instruments (`otopcua.*`); Prometheus `/metrics` | 🟡 2 spans defined; no exporter | Serilog (Console+File) | `DriverInstanceId`/`DriverType`/`CapabilityName`/`CorrelationId` (driver-scope) | Not started (follow-on) |
| **MxAccessGateway** | ⛔ none (hand-rolled `Meter`) | 🟡 20 instruments (`mxgateway.*`); **never exported** | ⛔ none | **Serilog (migrated from MEL; adopted)** | `SiteId`/`NodeRole`/`NodeHostname` (via `AddZbSerilog`); session/worker enrichers via `LogContext.PushProperty` | **Logging adopted; OTel metrics/traces follow-on** |
| **ScadaBridge** | ⛔ (`OpenTelemetry.Api` CVE-patch only) | ⛔ zero instruments | ⛔ none | Serilog (Console+File) | `SiteId`/`NodeRole`/`NodeHostname` (process-level; strongest set) | Not started (follow-on) |

See each project's [`current-state/<project>/CURRENT-STATE.md`](current-state/) for the
code-verified detail and its adoption plan.

## Normalized vs. left per-project

**Normalized (the shared target):**

- `AddZbTelemetry(ZbTelemetryOptions)`: front door for the OTel SDK. Populates the shared
  Resource (`service.name`, `service.namespace`, `service.version`, `site.id`, `node.role`,
  `host.name`). Registers the caller-supplied Meter and ActivitySource name(s). Wires standard
  instrumentation (ASP.NET Core, HttpClient, runtime, process). Prometheus default; OTLP opt-in.
- `app.MapZbMetrics()`: maps the Prometheus `/metrics` endpoint (shared path + shared exporter).
- `AddZbSerilog(ZbTelemetryOptions)`: shared Serilog two-stage bootstrap generalizing
  ScadaBridge's `LoggerConfigurationFactory`. Wires `SiteId`/`NodeRole`/`NodeHostname` enrichers
  from the same options object as the OTel Resource. Wires `TraceContextEnricher`
  (`trace_id`/`span_id` from `Activity.Current`). Preserves `ReadFrom.Configuration` for sinks
  and explicit `MinimumLevel.Is` override.
- `ILogRedactor` seam: generalized from MxGateway's `GatewayLogRedactor`. The seam is shared;
  the redaction policy (which fields/commands) stays per-project.
- Metric naming convention: `<meter>.<subsystem>.<event>`; Meter name = project namespace
  (`ZB.MOM.WW.<ProjectName>`); duration unit = `s` (OTel semconv).

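The `trace_id`/`span_id` values come from `Activity.Current`, which can be read with the base class library alone. The sketch below is not the library's `TraceContextEnricher`; it only shows the lookup such an enricher would perform. The `CurrentTraceContext` helper, the source name `ZB.MOM.WW.Example`, and the operation name `demo-operation` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Make the ID format explicit so TraceId/SpanId are W3C hex strings.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;

// Without a listener, StartActivity returns null; sample everything for the demo.
using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
};
ActivitySource.AddActivityListener(listener);

using var source = new ActivitySource("ZB.MOM.WW.Example"); // hypothetical source name

// Returns the fields an enricher would attach; empty when no span is active.
static IReadOnlyDictionary<string, string> CurrentTraceContext()
{
    var activity = Activity.Current;
    if (activity is null)
    {
        return new Dictionary<string, string>();
    }

    return new Dictionary<string, string>
    {
        ["trace_id"] = activity.TraceId.ToString(),
        ["span_id"] = activity.SpanId.ToString(),
    };
}

// Outside any span there is nothing to attach.
var outside = CurrentTraceContext();

// Inside a span, the same call yields the IDs a log line should carry.
IReadOnlyDictionary<string, string> inside;
using (var activity = source.StartActivity("demo-operation"))
{
    inside = CurrentTraceContext();
}

Console.WriteLine($"outside span: {outside.Count} fields");
Console.WriteLine($"inside span: trace_id={inside["trace_id"]} span_id={inside["span_id"]}");
```

Returning an empty set when no span is active keeps log lines emitted outside a trace free of placeholder IDs.
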
**Left per-project (not forced together):**

- Application `Meter`, `ActivitySource`, and all instrument definitions: the `otopcua.*`,
  `mxgateway.*`, and `scadabridge.*` instruments are owned by each repo.
- Serilog sink configuration (`appsettings.json` Console/File templates, rolling intervals).
- Per-operation/per-session correlation enrichers (`LogContextEnricher` in OtOpcUa;
  `LogContext.PushProperty` scope in MxGateway after migration).
- Redaction policies (`MxGatewayLogRedactor` implements `ILogRedactor` with gateway-specific
  command/field rules).
- Config section paths for `SiteId`/`NodeRole`/`NodeHostname`: each project binds these from
  its own config hierarchy and passes the resolved values to `AddZbTelemetry`/`AddZbSerilog`.

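A project-owned `Meter` that follows the naming convention might look like the sketch below. It uses only `System.Diagnostics.Metrics`. The Meter name `ZB.MOM.WW.MxGateway` and the instrument names `mxgateway.command.duration` and `mxgateway.command.failed` are assumptions chosen to fit the `<meter>.<subsystem>.<event>` pattern; they are not taken from the MxGateway codebase. A `MeterListener` stands in for the OTel SDK so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter name = project namespace (assumed here to be ZB.MOM.WW.MxGateway).
using var meter = new Meter("ZB.MOM.WW.MxGateway");

// Hypothetical instruments named <meter>.<subsystem>.<event>; durations use seconds.
var commandDuration = meter.CreateHistogram<double>(
    name: "mxgateway.command.duration",
    unit: "s",
    description: "Time taken to execute a gateway command.");
var commandsFailed = meter.CreateCounter<long>("mxgateway.command.failed");

// Collect measurements in memory; in a real host the OTel SDK plays this role.
var seen = new List<(string Name, double Value, string? Unit)>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments from this project's Meter.
    if (instrument.Meter.Name == "ZB.MOM.WW.MxGateway")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<double>(
    (instrument, value, tags, state) => seen.Add((instrument.Name, value, instrument.Unit)));
listener.SetMeasurementEventCallback<long>(
    (instrument, value, tags, state) => seen.Add((instrument.Name, value, instrument.Unit)));
listener.Start();

commandDuration.Record(0.042); // 42 ms, recorded in seconds
commandsFailed.Add(1);

foreach (var m in seen)
{
    Console.WriteLine($"{m.Name} unit={m.Unit ?? "(none)"}");
}
```

Subscribing by Meter name mirrors how the shared library is described as registering caller-supplied Meter names, while the instrument definitions themselves stay in the owning repo.
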
## Package structure

`ZB.MOM.WW.Telemetry` ships as two dependency-split packages:

| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Telemetry` | `AddZbTelemetry`, `ZbTelemetryOptions`, Resource builder, standard instrumentation, Prometheus/OTLP exporters, `app.MapZbMetrics()` | All three |
| `ZB.MOM.WW.Telemetry.Serilog` | `AddZbSerilog`, shared enrichers (`SiteId`/`NodeRole`/`NodeHostname`/`TraceContextEnricher`), `ILogRedactor` seam | All three (Serilog users); MxGateway on migration |

Both packages share `ZbTelemetryOptions` as the single options object that drives Resource
attributes, Serilog enrichers, Meter/ActivitySource names, and exporter selection. This is the
unifying hinge that makes a metric, a span, and a log line from the same node carry identical
dimensions.

## Component status

**Status: Built @ 0.1.0. MxAccessGateway MEL → Serilog logging adopted (on its own branch).
OtOpcUa and ScadaBridge telemetry adoption is follow-on, tracked in [`GAPS.md`](GAPS.md).**

The shared library lives at
[`~/Desktop/scadaproj/ZB.MOM.WW.Telemetry/`](../../ZB.MOM.WW.Telemetry/) (.NET 10; 2 packages:
`ZB.MOM.WW.Telemetry` and `ZB.MOM.WW.Telemetry.Serilog`; 19 tests; `dotnet pack` → 2 nupkgs @ 0.1.0).
Build/test/pack from `ZB.MOM.WW.Telemetry/`:

```bash
# Run the full test suite for both packages
dotnet test ZB.MOM.WW.Telemetry.slnx
# Pack both packages in Release configuration into ./artifacts
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
```