docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).
C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).
C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms in ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.
m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.
I4: §5 standard instrumentation table corrected — OtOpcUa now shows ⛔ not added for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.
I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).
I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.
I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.
m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
@@ -13,8 +13,9 @@ logs) via a single `AddZbTelemetry` extension; the shared `Resource` attribute s
 `host.name`) that makes every node distinguishable in a collector; standard instrumentation
 everyone enables (ASP.NET Core, HttpClient, gRPC client, runtime, process meters); exporter
 conventions (Prometheus scrape endpoint default, OTLP opt-in); a shared Serilog bootstrap
-with identity enrichers (`SiteId`, `NodeRole`, `NodeHostname`) bound from the same options
-object as the OTel Resource (metrics and logs therefore carry identical dimensions); a
+with identity enrichers (`SiteId`, `NodeRole` from `ZbTelemetryOptions`; `NodeHostname` auto
+from `Environment.MachineName`) matching the OTel Resource dimensions (metrics and logs
+therefore carry identical dimensions); a
 `TraceContextEnricher` that stamps `trace_id`/`span_id` from `Activity.Current` onto every
 Serilog event, enabling log↔trace correlation; an `ILogRedactor` redaction seam.
 
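As a rough illustration of the identity split this hunk documents, the C# sketch below builds the three identity properties: `SiteId` and `NodeRole` come from an options object, while `NodeHostname` is read from `Environment.MachineName` and is never supplied by the caller. `ZbTelemetryOptionsSketch` and `IdentityProperties` are hypothetical names invented for this example, not types from the shared library, and the sample values are placeholders.

```csharp
using System;
using System.Collections.Generic;

// Demo: the caller sets only SiteId and NodeRole (placeholder values).
var options = new ZbTelemetryOptionsSketch { SiteId = "site-a", NodeRole = "primary" };
foreach (var (key, value) in IdentityProperties.Build(options))
    Console.WriteLine($"{key}={value}");

// Hypothetical stand-in for ZbTelemetryOptions; only the two caller-supplied
// identity fields are modeled. Note that it has no NodeHostname property.
sealed class ZbTelemetryOptionsSketch
{
    public required string SiteId { get; init; }
    public required string NodeRole { get; init; }
}

// Hypothetical helper, not the shared library's code.
static class IdentityProperties
{
    // Returns the three identity dimensions that logs and the OTel Resource share.
    public static IReadOnlyDictionary<string, string> Build(ZbTelemetryOptionsSketch o) =>
        new Dictionary<string, string>
        {
            ["SiteId"] = o.SiteId,
            ["NodeRole"] = o.NodeRole,
            // Auto-populated: taken from the machine, not from options.
            ["NodeHostname"] = Environment.MachineName,
        };
}
```

Keeping `NodeHostname` off the options type is one way to make a caller-supplied hostname impossible to express, which matches the removal of `o.NodeHostname` from the ScadaBridge adoption snippet.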
@@ -53,6 +54,11 @@ This is the headline fix: nobody in the fleet sets a `Resource` or `service.name
 making every node indistinguishable in a collector. Every project must call `AddZbTelemetry`
 to be observable.
 
+> **`IServiceCollection` overload:** `AddZbTelemetry` also has an `IServiceCollection`-based
+> overload for host configurations where `IHostApplicationBuilder` is not available (detailed in
+> the shared-contract). The `IHostApplicationBuilder` overload is the primary path for all three
+> apps on .NET 10.
+
 ## 2. Shared Resource
 
 The OTel `Resource` attached to all three signals is built from `ZbTelemetryOptions`:
|
||||
@@ -119,15 +125,22 @@ project's bespoke logging bootstrap with a shared two-stage pattern:
 | `TraceContextEnricher` | `trace_id`, `span_id` | `Activity.Current` |
 | `RedactionEnricher` | _(project-defined fields)_ | `ILogRedactor` implementation |
 
-The three identity properties (`SiteId`, `NodeRole`, `NodeHostname`) are bound from the
-same `ZbTelemetryOptions` object as the OTel `Resource`, so logs and metrics/traces carry
-identical dimensions. When no `Activity.Current` is present (e.g. background services,
+`SiteId` and `NodeRole` are bound from the same `ZbTelemetryOptions` object as the OTel
+`Resource`; `NodeHostname` is populated automatically from `Environment.MachineName` (not a
+caller-supplied option). All three identity properties appear on logs and metrics/traces alike,
+so signals from the same node carry identical dimensions. When no `Activity.Current` is present (e.g. background services,
 startup), `TraceContextEnricher` emits nothing — it does not inject empty or zero values.
 
 `MinimumLevel` is set explicitly in code (default `Information`) and can be overridden via
 `IConfiguration` (`Serilog:MinimumLevel`). Sinks are fully config-driven:
 `ReadFrom.Configuration` reads `Serilog:WriteTo` from `appsettings.json` / environment.
 
+> **Per-project config paths:** `AddZbSerilog` reads `Serilog:MinimumLevel` from `IConfiguration`.
+> Callers that bind MinimumLevel from a different key (e.g. ScadaBridge's
+> `ScadaBridge:Logging:MinimumLevel`) apply that override themselves before or after calling
+> `AddZbSerilog`. The config key for MinimumLevel remains per-project; `AddZbSerilog` is not
+> parameterized on it.
+
 OTel log export is wired in the same call: logs flow through the OTel pipeline with the
 same `Resource` attached, making all three signals (metrics / traces / logs) available in a
 single backend.
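The "emits nothing" rule for `TraceContextEnricher` in the hunk above can be sketched without any Serilog dependency. The C# below is a minimal illustration, not the shared library's implementation: `TraceContext.Capture` is a hypothetical helper that returns the `trace_id`/`span_id` pair for the ambient `Activity`, and an empty map when `Activity.Current` is null. A real enricher would presumably copy these entries onto the log event instead of returning them.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Outside any Activity (e.g. startup): no properties at all are produced.
Console.WriteLine($"without activity: {TraceContext.Capture().Count} properties");

// Inside an Activity: both identifiers are present. W3C format is set
// explicitly so the trace id is populated regardless of process defaults.
using (new Activity("demo-operation").SetIdFormat(ActivityIdFormat.W3C).Start())
{
    foreach (var (key, value) in TraceContext.Capture())
        Console.WriteLine($"{key}={value}");
}

// Hypothetical helper mirroring the documented enricher behavior.
static class TraceContext
{
    public static IReadOnlyDictionary<string, string> Capture()
    {
        var activity = Activity.Current;

        // No ambient Activity: return an empty map instead of injecting
        // empty strings or all-zero ids.
        if (activity is null)
            return new Dictionary<string, string>();

        return new Dictionary<string, string>
        {
            ["trace_id"] = activity.TraceId.ToHexString(),
            ["span_id"] = activity.SpanId.ToHexString(),
        };
    }
}
```

Omitting the properties entirely, instead of writing zero-valued ids, means a query for log events that have a `trace_id` selects only events that genuinely belong to a trace.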