docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions

C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms in ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected: OtOpcUa now shows "not added" for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan: Meters = ["MxGateway.Server"] annotated as temporary, with a
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
Joseph Doherty
2026-06-01 07:32:58 -04:00
parent 645388b1f1
commit 215a646e35
6 changed files with 94 additions and 58 deletions
+18 -5
@@ -13,8 +13,9 @@ logs) via a single `AddZbTelemetry` extension; the shared `Resource` attribute s
`host.name`) that makes every node distinguishable in a collector; standard instrumentation
everyone enables (ASP.NET Core, HttpClient, gRPC client, runtime, process meters); exporter
conventions (Prometheus scrape endpoint default, OTLP opt-in); a shared Serilog bootstrap
-with identity enrichers (`SiteId`, `NodeRole`, `NodeHostname`) bound from the same options
-object as the OTel Resource (metrics and logs therefore carry identical dimensions); a
+with identity enrichers (`SiteId`, `NodeRole` from `ZbTelemetryOptions`; `NodeHostname` auto
+from `Environment.MachineName`) matching the OTel Resource dimensions (metrics and logs
+therefore carry identical dimensions); a
`TraceContextEnricher` that stamps `trace_id`/`span_id` from `Activity.Current` onto every
Serilog event, enabling log↔trace correlation; an `ILogRedactor` redaction seam.
@@ -53,6 +54,11 @@ This is the headline fix: nobody in the fleet sets a `Resource` or `service.name
making every node indistinguishable in a collector. Every project must call `AddZbTelemetry`
to be observable.
+> **`IServiceCollection` overload:** `AddZbTelemetry` also has an `IServiceCollection`-based
+> overload for host configurations where `IHostApplicationBuilder` is not available (detailed in
+> the shared-contract). The `IHostApplicationBuilder` overload is the primary path for all three
+> apps on .NET 10.
## 2. Shared Resource
The OTel `Resource` attached to all three signals is built from `ZbTelemetryOptions`:
@@ -119,15 +125,22 @@ project's bespoke logging bootstrap with a shared two-stage pattern:
| `TraceContextEnricher` | `trace_id`, `span_id` | `Activity.Current` |
| `RedactionEnricher` | _(project-defined fields)_ | `ILogRedactor` implementation |
-The three identity properties (`SiteId`, `NodeRole`, `NodeHostname`) are bound from the
-same `ZbTelemetryOptions` object as the OTel `Resource`, so logs and metrics/traces carry
-identical dimensions. When no `Activity.Current` is present (e.g. background services,
+`SiteId` and `NodeRole` are bound from the same `ZbTelemetryOptions` object as the OTel
+`Resource`; `NodeHostname` is populated automatically from `Environment.MachineName` (not a
+caller-supplied option). All three identity properties appear on logs and metrics/traces alike,
+so signals from the same node carry identical dimensions. When no `Activity.Current` is present (e.g. background services,
startup), `TraceContextEnricher` emits nothing — it does not inject empty or zero values.
`MinimumLevel` is set explicitly in code (default `Information`) and can be overridden via
`IConfiguration` (`Serilog:MinimumLevel`). Sinks are fully config-driven:
`ReadFrom.Configuration` reads `Serilog:WriteTo` from `appsettings.json` / environment.
+> **Per-project config paths:** `AddZbSerilog` reads `Serilog:MinimumLevel` from `IConfiguration`.
+> Callers that bind MinimumLevel from a different key (e.g. ScadaBridge's
+> `ScadaBridge:Logging:MinimumLevel`) apply that override themselves before or after calling
+> `AddZbSerilog`. The config key for MinimumLevel remains per-project; `AddZbSerilog` is not
+> parameterized on it.
OTel log export is wired in the same call: logs flow through the OTel pipeline with the
same `Resource` attached, making all three signals (metrics / traces / logs) available in a
single backend.