scadaproj/ZB.MOM.WW.Telemetry/README.md
Joseph Doherty 544a6ddb77 Fix all baseline code-review findings across the six shared libraries
Resolves the 35 findings from the 2026-06-01 baseline (commit 26ba1c7),
test-first for every behavioral change. +51 tests (331 -> 382 passing, 0 failed).

- Telemetry-001 (HIGH): RedactionEnricher now honours property removal, so a
  redactor that drops a key actually scrubs the secret from the event.
- Auth: LDAP validator ValidateOnStart; API-key verify no longer fails on a
  best-effort MarkUsed write or a corrupt scopes column (fail-closed); LDAP cert
  validation hook; KeyPrefix persistence aligned; README algorithm corrected.
- Health: Akka checks return Degraded (not throw) when the cluster isn't up yet;
  GrpcDependencyHealthCheck catch-all; null 'description' rendered; composite
  endpoint builder; XML docs shipped.
- Audit: CompositeAuditWriter no longer re-throws OperationCanceledException;
  TruncatingAuditRedactor over-redact scrubs Target + safe negative max; options
  record; XML docs shipped.
- Configuration: TryAddEnumerable idempotent registration; consistent port
  quoting; strict invariant port parsing; XML docs + README packaged.
- Theme: mobile toggle is now CSS-only (no Bootstrap JS); token/CSS hygiene;
  XML docs on the public parameter surface.

Shared-contract/spec docs updated where the code was the source of truth
(observability service.instance.id, MapZbMetrics, redactor reach). All changes
additive/back-compatible at v0.1.0. code-reviews bookkeeping follows separately.
2026-06-01 11:22:14 -04:00


# ZB.MOM.WW.Telemetry
Observability libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). These are **libraries, not a service** — each package is linked directly into the consuming application at build time. There is no central telemetry process; all instrumentation runs in-process alongside the application.
The library normalizes the three-project observability surface: a shared OpenTelemetry Resource identity, standard instrumentation wiring, Prometheus and OTLP export, and a Serilog bootstrap with enrichers and trace↔log correlation — so metrics, traces, and log lines from the same node carry identical dimensions and can join up in any backend.
---
## Packages
| Package | Description | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Telemetry` | `AddZbTelemetry` extension, `ZbTelemetryOptions`, shared OTel Resource builder (`ZbResource`), standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), Prometheus always-on exporter + OTLP opt-in overlay, `app.MapZbMetrics()` endpoint extension. | `Microsoft.AspNetCore.App` (framework ref), `OpenTelemetry.*` stack |
| `ZB.MOM.WW.Telemetry.Serilog` | `AddZbSerilog` extension, shared enrichers (`SiteId`/`NodeRole`/`NodeHostname`), `TraceContextEnricher` (writes `trace_id`/`span_id` from `Activity.Current` into every log event), `ILogRedactor` seam (per-project sensitive-field redaction), `RedactionEnricher`. | `ZB.MOM.WW.Telemetry`, `Serilog.*` stack |
---
## The unifying hinge
The single `ZbTelemetryOptions` object drives both packages. Its identity triple
(`ServiceName` → OTel Resource `service.name`, `SiteId` → `site.id`, `NodeRole` → `node.role`)
is applied once and flows automatically to **both** the OpenTelemetry Resource (so every metric
and span carries it) **and** the Serilog enrichers (so every log event carries it). A metric,
a span, and a log line emitted by the same node share identical `service.name`, `site.id`, and
`node.role` dimensions, enabling cross-signal correlation in any backend (Grafana, Jaeger, Seq,
Loki, etc.) without per-project bookkeeping.
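
As a rough illustration of the mapping above, the following sketch derives the three shared dimensions from a single identity object. It uses only the base class library; `NodeIdentity`, `IdentityDimensions`, and the `site-01` value are hypothetical stand-ins, not types or values from this library, and the property names the real enrichers emit are not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the identity portion of ZbTelemetryOptions.
var identity = new NodeIdentity("mxgateway", "site-01", "standalone");

// Both signal paths derive their dimensions from the same identity,
// so metric/span attributes and log properties cannot drift apart.
IReadOnlyDictionary<string, string> resourceAttributes = IdentityDimensions.From(identity);
IReadOnlyDictionary<string, string> logProperties = IdentityDimensions.From(identity);

foreach (var (key, value) in resourceAttributes)
{
    Console.WriteLine($"{key}={value} (log property matches: {logProperties[key] == value})");
}

// Assumed shape: three strings, mirroring ServiceName / SiteId / NodeRole.
public sealed record NodeIdentity(string ServiceName, string SiteId, string NodeRole);

public static class IdentityDimensions
{
    // Maps the identity triple to the dimension keys named in this README.
    public static IReadOnlyDictionary<string, string> From(NodeIdentity identity) =>
        new Dictionary<string, string>
        {
            ["service.name"] = identity.ServiceName,
            ["site.id"] = identity.SiteId,
            ["node.role"] = identity.NodeRole,
        };
}
```

The point of the design is that there is exactly one source for the three values; a consumer never sets them separately for metrics, traces, and logs.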
---
## Consumer matrix
| Consumer | `ZB.MOM.WW.Telemetry` (core) | `ZB.MOM.WW.Telemetry.Serilog` |
|---|:---:|:---:|
| **OtOpcUa** | yes | yes |
| **MxAccessGateway** | yes | yes (logging adopted — MEL → Serilog migration done) |
| **ScadaBridge** | yes | yes |
The matrix shows the target state: all three apps consume both packages once adopted. So far
only MxAccessGateway's MEL→Serilog migration is complete (on its own branch); OtOpcUa and
ScadaBridge adoption is follow-on work (tracked in `components/observability/GAPS.md`).
---
## OTel signals
`AddZbTelemetry` wires all three OpenTelemetry signals in a single call:
| Signal | What is wired |
|---|---|
| **Metrics** | App Meters (via `options.Meters[]`) + standard: ASP.NET Core, HttpClient, .NET runtime, process. Exported via Prometheus (always on) with OTLP as an additive overlay. |
| **Traces** | App ActivitySources (via `options.ActivitySources[]`) + standard: ASP.NET Core, HttpClient, gRPC client. Exported via OTLP when `Exporter = ZbExporter.Otlp`. |
| **Logs** | Wired by `AddZbSerilog` (companion call). Serilog is used as the log sink; logs are bridged to OpenTelemetry via `Serilog.Sinks.OpenTelemetry` when configured. |
Trace↔log correlation is automatic: `TraceContextEnricher` reads `Activity.Current` for each
log event and attaches `trace_id` and `span_id`, so log events produced inside a traced request
carry the same span identity as the trace backend.
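
The correlation step described above can be sketched with the `System.Diagnostics` types alone. This is a minimal illustration, not the library's `TraceContextEnricher`: `TraceContextSketch` and the `Demo.Source` name are hypothetical, and the real enricher writes into a Serilog log event rather than a dictionary.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// A listener must be registered, otherwise StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
};
ActivitySource.AddActivityListener(listener);

using var source = new ActivitySource("Demo.Source"); // hypothetical source name
using (var activity = source.StartActivity("handle-request"))
{
    // Inside the span: Activity.Current is the activity started above.
    var props = TraceContextSketch.Read(Activity.Current);
    Console.WriteLine($"trace_id={props["trace_id"]} span_id={props["span_id"]}");
}

// Outside any activity the helper returns an empty set of properties.
Console.WriteLine($"properties outside a span: {TraceContextSketch.Read(Activity.Current).Count}");

public static class TraceContextSketch
{
    // Returns the correlation properties for the given activity, or none if it is null.
    public static IReadOnlyDictionary<string, string> Read(Activity? activity)
    {
        var props = new Dictionary<string, string>();
        if (activity is null) return props;

        props["trace_id"] = activity.TraceId.ToHexString();
        props["span_id"] = activity.SpanId.ToHexString();
        return props;
    }
}
```

Log events written outside a traced request simply carry no `trace_id`/`span_id` in this sketch; how the real enricher handles that case is covered by its own documentation.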
**Redaction reach.** A registered `ILogRedactor` may **remove** or **replace** any top-level
property, and `RedactionEnricher` honours both (a removed key is dropped from the event). The seam
sees the unwrapped value of scalar properties only — a destructured `{@Object}` property is exposed
as its raw Serilog `StructureValue` wrapper, so a redactor can replace/remove the whole structured
property but **cannot** mask a field nested inside it. To protect a sensitive field of a logged
object, log it as its own scalar property (do not destructure it) or remove the whole property by
key. See the `ILogRedactor` XML doc for the full contract.
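
To make the remove/replace distinction concrete, here is a small sketch over a plain property dictionary. It does not use the real `ILogRedactor` signature (see the XML doc for that); `SketchRedactor`, `OpaqueStructure`, and all property names and values are hypothetical. `OpaqueStructure` stands in for a destructured property that the redactor treats as all-or-nothing.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var properties = new Dictionary<string, object?>
{
    ["User"] = "alice",
    ["Password"] = "hunter2",                          // scalar: removed entirely
    ["ApiKey"] = "abcd1234efgh",                       // scalar: replaced with a masked value
    ["Request"] = new OpaqueStructure("LoginRequest"), // structured: removed as a whole
};

new SketchRedactor().Redact(properties);

foreach (var (key, value) in properties)
{
    Console.WriteLine($"{key} = {value}");
}

// Stand-in for a destructured property whose inner fields are not masked individually.
public sealed record OpaqueStructure(string TypeTag);

public sealed class SketchRedactor
{
    private static readonly string[] RemovedKeys = { "Password", "Request" };

    // Mutates the top-level property bag: drops some keys, overwrites others.
    public void Redact(IDictionary<string, object?> properties)
    {
        foreach (var key in RemovedKeys)
        {
            properties.Remove(key);
        }

        if (properties.TryGetValue("ApiKey", out var raw) && raw is string apiKey)
        {
            properties["ApiKey"] = Mask(apiKey);
        }
    }

    // Keeps at most the first four characters so the value stays recognisable but unusable.
    private static string Mask(string value) =>
        value.Length <= 4 ? "****" : value[..4] + "****";
}
```

Removing `Request` by key mirrors the guidance above: when a sensitive field lives inside a structured property, drop the whole property or log the field as its own scalar.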
---
## Exporter options
Prometheus is **always wired** for metrics regardless of the `Exporter` setting. OTLP is an
additive overlay — set `Exporter = ZbExporter.Otlp` and `OtlpEndpoint` to push to a collector
in addition to the scrape endpoint.
```csharp
// Prometheus only (default; scrape /metrics)
builder.AddZbTelemetry(o =>
{
    o.ServiceName = "mxgateway";
    o.SiteId = config["Site:Id"];
    o.NodeRole = "standalone";
    o.Meters = ["ZB.MOM.WW.MxGateway"];
});

// Alternative: OTLP overlay (metrics + traces pushed to collector; /metrics still active)
builder.AddZbTelemetry(o =>
{
    o.ServiceName = "mxgateway";
    o.SiteId = config["Site:Id"];
    o.NodeRole = "standalone";
    o.Meters = ["ZB.MOM.WW.MxGateway"];
    o.Exporter = ZbExporter.Otlp;
    o.OtlpEndpoint = "http://collector:4317";
});

// Mount the Prometheus scrape endpoint (call after app.UseRouting())
app.MapZbMetrics(); // → /metrics
```
```csharp
// Serilog bootstrap (same options object drives enrichers)
builder.AddZbSerilog(o =>
{
    o.ServiceName = "mxgateway";
    o.SiteId = config["Site:Id"];
    o.NodeRole = "standalone";
});
```
---
## Building and testing
```bash
# from ZB.MOM.WW.Telemetry/
dotnet build ZB.MOM.WW.Telemetry.slnx
dotnet test ZB.MOM.WW.Telemetry.slnx
```
All test assemblies run with no external dependencies (no running OTel collector, no Serilog
backend):
| Assembly | Tests |
|---|---|
| `ZB.MOM.WW.Telemetry.Tests` | 12 |
| `ZB.MOM.WW.Telemetry.Serilog.Tests` | 17 |
| **Total** | **29** |
---
## Packing
```bash
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
```
Produces two `.nupkg` files in `artifacts/`:
```
ZB.MOM.WW.Telemetry.0.1.0.nupkg
ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg
```
`GeneratePackageOnBuild` is off — pack explicitly as above. Both packages are versioned
lockstep from `Directory.Build.props`.
---
## Status
**Built at 0.1.0. MxAccessGateway logging adopted (MEL → Serilog migration, on its own branch).
Broader OtOpcUa and ScadaBridge telemetry adoption deferred.** Adoption is tracked in the
component backlog:
- `~/Desktop/scadaproj/components/observability/GAPS.md`
Design documentation lives alongside that backlog:
- `~/Desktop/scadaproj/components/observability/spec/SPEC.md` — normalized observability target
- `~/Desktop/scadaproj/components/observability/spec/METRIC-CONVENTIONS.md` — metric naming reference
- `~/Desktop/scadaproj/components/observability/shared-contract/ZB.MOM.WW.Telemetry.md` — proposed API
- `~/Desktop/scadaproj/components/observability/current-state/` — per-project current state (code-verified)