Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected: OtOpcUa now shows "not added" for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00


# Proposed shared library: `ZB.MOM.WW.Telemetry`
A contract on paper — the public surface to extract so the three projects stop implementing
observability separately. Realizes [`../spec/SPEC.md`](../spec/SPEC.md) and
[`../spec/METRIC-CONVENTIONS.md`](../spec/METRIC-CONVENTIONS.md). **Not yet created.**
Reference implementations already exist: OtOpcUa `ObservabilityExtensions.cs` (OTel + Serilog),
ScadaBridge `LoggerConfigurationFactory.cs` (Serilog enrichers), MxGateway
`GatewayMetrics.cs` + `GatewayLogRedactor.cs`.
## Packages (.NET 10)
```
ZB.MOM.WW.Telemetry # OTel bootstrap: Resource, metrics, traces, exporters
ZB.MOM.WW.Telemetry.Serilog # Serilog bootstrap: enrichers, TraceContextEnricher, ILogRedactor
```
Both packages are .NET 10 — all three logging-bearing processes are .NET 10 (OtOpcUa server,
mxaccessgw gateway, ScadaBridge central). The x86 net48 mxaccessgw worker uses a bespoke
`IWorkerLogger` (stderr key=value); net48 multi-targeting is **not** required. Both packages are
published to the Gitea NuGet feed under SemVer and versioned in lockstep to start.
## Packaging & distribution
**Two NuGet packages, one DLL each**, on the Gitea NuGet feed. Libraries linked into each
app — there is no central telemetry service. Both packages are consumed by all three apps
after adoption:
| Package (→ DLL) | Transitive deps | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|---|
| `…Telemetry` | OpenTelemetry SDK, `OpenTelemetry.Exporter.Prometheus.AspNetCore`, `OpenTelemetry.Exporter.OpenTelemetryProtocol`, standard instrumentation packages | ✅ | ✅ | ✅ |
| `…Telemetry.Serilog` | Serilog, `Serilog.Extensions.Hosting`, `Serilog.AspNetCore` (version note below) | ✅ | ✅ | ✅ |
> **`Serilog.AspNetCore` version split (open convergence note):** OtOpcUa and ScadaBridge
> target .NET 10 and may use `Serilog.AspNetCore` 9.x; MxGateway's adoption starts from
> `Serilog.AspNetCore` 9.x as well. If a project remains on .NET 8 ASP.NET Core for any
> reason, the compatible version is `Serilog.AspNetCore` 8.x. Coordinate the version floor
> when the first app takes a dependency and pin it in `Directory.Packages.props`.
---
## `ZB.MOM.WW.Telemetry`
```csharp
using System;
using Microsoft.AspNetCore.Builder;             // IEndpointConventionBuilder
using Microsoft.AspNetCore.Routing;             // IEndpointRouteBuilder
using Microsoft.Extensions.DependencyInjection; // IServiceCollection
using Microsoft.Extensions.Hosting;             // IHostApplicationBuilder
using OpenTelemetry.Resources;                  // ResourceBuilder

namespace ZB.MOM.WW.Telemetry;

/// Selects how instrumentation data is exported. Flags enum: combine the values
/// (ZbExporter.Prometheus | ZbExporter.Otlp) to run both exporters side by side.
[Flags]
public enum ZbExporter
{
    /// Prometheus scrape endpoint (default). Call app.MapZbMetrics() to mount /metrics.
    Prometheus = 1,

    /// OTLP gRPC export. Set OtlpEndpoint (e.g. "http://collector:4317").
    /// Combine with Prometheus when both export paths are desired.
    Otlp = 2,
}

/// Options for AddZbTelemetry. The identity properties (ServiceName through NodeRole) feed the
/// shared OTel Resource; SiteId and NodeRole also feed the Serilog enrichers (via AddZbSerilog
/// in the .Serilog package). Meters, ActivitySources, Exporter and OtlpEndpoint configure the
/// pipeline itself.
public sealed class ZbTelemetryOptions
{
    /// Required. Short lower-case app identifier, e.g. "otopcua", "mxgateway", "scadabridge".
    /// Populates OTel Resource service.name.
    public string ServiceName { get; set; } = "";

    /// Fleet-wide namespace. Default "ZB.MOM.WW". Do not override per-app.
    /// Populates OTel Resource service.namespace.
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";

    /// Optional. Populate from AssemblyInformationalVersion.
    /// Populates OTel Resource service.version.
    public string? ServiceVersion { get; set; }

    /// Optional. Physical or logical site identifier.
    /// Populates OTel Resource site.id and Serilog property SiteId.
    public string? SiteId { get; set; }

    /// Optional. Node function: "central", "site", "hub", "standalone".
    /// Populates OTel Resource node.role and Serilog property NodeRole.
    public string? NodeRole { get; set; }

    /// App-specific Meter names to register with the OTel MeterProvider.
    /// Always register the app's primary Meter here. Standard instrumentation meters are
    /// added automatically (ASP.NET Core, HttpClient, runtime, process).
    public string[] Meters { get; set; } = [];

    /// App-specific ActivitySource names to register with the OTel TracerProvider.
    public string[] ActivitySources { get; set; } = [];

    /// Export path. Default Prometheus; use Otlp for a real collector.
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;

    /// Required when Exporter includes ZbExporter.Otlp.
    /// OTLP gRPC endpoint, e.g. "http://collector:4317".
    public string? OtlpEndpoint { get; set; }
}

/// Extension methods that configure the OTel bootstrap, on an IHostApplicationBuilder
/// (primary path) or an IServiceCollection (fallback).
public static class ZbTelemetryExtensions
{
    /// Configures the OpenTelemetry MeterProvider and TracerProvider with the shared Resource,
    /// standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process),
    /// the app's own Meters and ActivitySources, and the selected exporter.
    /// Does NOT configure Serilog; call AddZbSerilog() in the .Serilog package for that.
    public static IHostApplicationBuilder AddZbTelemetry(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions> configure);

    /// IServiceCollection overload for hosts where IHostApplicationBuilder is not available.
    /// Takes a pre-built ZbTelemetryOptions rather than a configure delegate, because the
    /// Resource attributes must be populated before DI composition. Prefer the
    /// IHostApplicationBuilder overload wherever it is available.
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services,
        ZbTelemetryOptions options);
}

/// Builds the shared OTel ResourceBuilder from ZbTelemetryOptions.
/// Used internally by AddZbTelemetry. Exposed for tests and custom pipelines.
public static class ZbResource
{
    /// Returns a ResourceBuilder pre-populated with service.name, service.namespace,
    /// service.version, site.id, node.role, and host.name (always Environment.MachineName).
    /// Attributes with null values are omitted from the Resource.
    public static ResourceBuilder Build(ZbTelemetryOptions options);
}

/// Endpoint extension for mounting the Prometheus /metrics scrape endpoint.
public static class ZbMetricsEndpointExtensions
{
    /// Mounts the Prometheus /metrics endpoint.
    /// Only valid when ZbTelemetryOptions.Exporter includes ZbExporter.Prometheus
    /// (alone or combined with Otlp). Call after app.UseRouting().
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints);
}
```
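The `ZbResource.Build` rules above (fixed namespace default, `host.name` always taken from `Environment.MachineName`, null optional attributes omitted) can be checked without the OpenTelemetry packages. What follows is a minimal sketch under stated assumptions, not the library's implementation: `BuildResourceAttributes` is a hypothetical helper that returns a plain dictionary instead of a `ResourceBuilder`, and rejecting an empty `ServiceName` eagerly is an assumption drawn from the "Required" comment. Omitting nulls rather than writing empty strings means a single-site OtOpcUa node simply has no `site.id`, which is the behavior open question 4 asks to confirm.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the proposed API): computes the attribute set that
// ZbResource.Build is described as adding, using a dictionary instead of an OTel ResourceBuilder.
static IReadOnlyDictionary<string, string> BuildResourceAttributes(
    string serviceName,
    string? serviceVersion,
    string? siteId,
    string? nodeRole,
    string serviceNamespace = "ZB.MOM.WW")
{
    // Assumption: an empty ServiceName is rejected eagerly; the contract only says "Required".
    if (string.IsNullOrWhiteSpace(serviceName))
        throw new ArgumentException("ServiceName is required.", nameof(serviceName));

    var attributes = new Dictionary<string, string>
    {
        ["service.name"] = serviceName,
        ["service.namespace"] = serviceNamespace,
        // host.name always comes from the machine, never from the caller.
        ["host.name"] = Environment.MachineName,
    };

    // Null optional values are left out of the set entirely rather than stored as "".
    void AddIfPresent(string key, string? value)
    {
        if (value is not null)
            attributes[key] = value;
    }

    AddIfPresent("service.version", serviceVersion);
    AddIfPresent("site.id", siteId);
    AddIfPresent("node.role", nodeRole);
    return attributes;
}

// Example values only: a standalone OtOpcUa node with no SiteId.
var otopcua = BuildResourceAttributes("otopcua", "1.0.0", siteId: null, nodeRole: "standalone");
foreach (var (key, value) in otopcua)
    Console.WriteLine($"{key} = {value}");
```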
---
## `ZB.MOM.WW.Telemetry.Serilog`
```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Hosting; // IHostApplicationBuilder
using Serilog.Core;                 // ILogEventEnricher, ILogEventPropertyFactory
using Serilog.Events;               // LogEvent

namespace ZB.MOM.WW.Telemetry.Serilog;

/// Extension point for configuring the Serilog two-stage bootstrap on an IHostApplicationBuilder.
public static class ZbSerilogExtensions
{
    /// Two-stage Serilog bootstrap:
    /// Stage 1: minimal console-only bootstrap logger (for startup errors before IConfiguration
    /// is available).
    /// Stage 2: application logger wired from IConfiguration (ReadFrom.Configuration reads the
    /// Serilog:WriteTo sinks and Serilog:MinimumLevel) with fixed enrichers: SiteId + NodeRole
    /// from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto, not a
    /// caller-supplied option); TraceContextEnricher; and RedactionEnricher (a no-op unless an
    /// ILogRedactor is registered).
    ///
    /// MinimumLevel: AddZbSerilog reads "Serilog:MinimumLevel" from IConfiguration. Callers that
    /// bind MinimumLevel from a different config key (e.g. ScadaBridge's
    /// "ScadaBridge:Logging:MinimumLevel") apply that override themselves before or after
    /// calling AddZbSerilog; this remains per-project and AddZbSerilog does not read that key.
    ///
    /// OTel log export is wired automatically: logs flow through the OTel pipeline with the same
    /// Resource as the metrics and traces (all three signals correlated in a backend).
    ///
    /// The configure delegate receives the same ZbTelemetryOptions used by AddZbTelemetry.
    /// Typically share a single options-population lambda across both calls.
    public static IHostApplicationBuilder AddZbSerilog(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions> configure);
}

/// Canonical Serilog property name constants for the identity enrichers.
/// Use these constants, not literal strings, when querying properties in sinks or tests.
public static class ZbLogEnricherNames
{
    /// Serilog property: physical or logical site identifier. Matches OTel Resource site.id.
    public const string SiteId = "SiteId";

    /// Serilog property: node function (central, site, hub, standalone). Matches OTel node.role.
    public const string NodeRole = "NodeRole";

    /// Serilog property: machine name (Environment.MachineName). Matches OTel host.name.
    public const string NodeHostname = "NodeHostname";
}

/// Stamps trace_id and span_id from Activity.Current onto every Serilog log event.
/// When Activity.Current is null (no active span: background services, startup, non-traced
/// paths) the enricher emits nothing; it does NOT inject empty strings or zero values.
/// This lets a log line be clicked through to its originating trace in a backend.
public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}

/// Seam for project-specific log-event redaction.
/// The shared library applies this via RedactionEnricher; each project provides its own
/// implementation that knows which fields (by property name) or which command payloads
/// must not leave the process in log events.
/// If no ILogRedactor is registered in DI, RedactionEnricher is a no-op.
public interface ILogRedactor
{
    /// Inspect and mutate properties in place. Remove or replace any sensitive values.
    /// Called on every log event before it reaches any sink.
    void Redact(IDictionary<string, object?> properties);
}

/// Applies a registered ILogRedactor to every Serilog log event.
/// Registered automatically by AddZbSerilog. The enricher resolves ILogRedactor from DI
/// on first use; if none is registered it is permanently inert (no DI call per event).
public sealed class RedactionEnricher : ILogEventEnricher
{
    public RedactionEnricher(IServiceProvider serviceProvider);
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}
```
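The enricher rules above can be modelled on the BCL alone. In this hypothetical sketch a plain dictionary stands in for Serilog's `LogEvent` properties, an `Action<IDictionary<string, object?>>` stands in for `ILogRedactor.Redact`, a resolver delegate stands in for the `IServiceProvider` lookup, and the `Password` rule is an invented example. It shows two behaviors from the contract: trace context is added only when `Activity.Current` is non-null, and the redactor is resolved once on first use and the step stays inert if nothing is registered. `Lazy<T>` is one way to get that; its default `ExecutionAndPublication` mode also makes the one-time resolution thread-safe.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical model: a dictionary replaces Serilog's LogEvent properties, and
// Action<IDictionary<string, object?>> replaces ILogRedactor.Redact.

// Same rule as TraceContextEnricher: stamp trace_id/span_id from Activity.Current,
// and add nothing at all when there is no ambient span.
static void EnrichTraceContext(IDictionary<string, object?> properties)
{
    var current = Activity.Current;
    if (current is null)
        return;
    properties["trace_id"] = current.TraceId.ToHexString();
    properties["span_id"] = current.SpanId.ToHexString();
}

// Same rule as RedactionEnricher: resolve the redactor once, on first use, and cache it.
// A null result leaves the step permanently inert, with no re-resolution per event.
static Action<IDictionary<string, object?>> CreateRedactionStep(
    Func<Action<IDictionary<string, object?>>?> resolveRedactor)
{
    var redactor = new Lazy<Action<IDictionary<string, object?>>?>(resolveRedactor);
    return properties => redactor.Value?.Invoke(properties);
}

// W3C ids are already the default on .NET 5+; set explicitly for clarity.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;

var resolveCalls = 0;
var redact = CreateRedactionStep(() =>
{
    resolveCalls++;
    // Hypothetical project redactor that masks a property named "Password".
    return props =>
    {
        if (props.ContainsKey("Password"))
            props["Password"] = "***";
    };
});

var logEvent = new Dictionary<string, object?> { ["SiteId"] = "site-a", ["Password"] = "hunter2" };
using (var span = new Activity("demo-operation").Start())
{
    EnrichTraceContext(logEvent); // a span is active, so trace_id/span_id are added
}
redact(logEvent);
redact(logEvent); // reuses the cached redactor; the resolver is not called again
foreach (var (key, value) in logEvent)
    Console.WriteLine($"{key} = {value}");
```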
---
## Consumer matrix
| Consumer | Packages | Notes |
|---|---|---|
| **MxGateway** | Both | MEL → Serilog migration: `GatewayLogScope`/`BeginScope` → `LogContext.PushProperty`; `GatewayLogRedactor` → `ILogRedactor` impl; `GatewayMetrics` stays, wired through `o.Meters`. **Done in this release.** |
| **OtOpcUa** | Both | Consolidate existing Serilog bootstrap; add `TraceContextEnricher` + `SiteId`/`NodeRole` enrichers; add Resource to existing OTel pipeline. Deferred to GAPS backlog. |
| **ScadaBridge** | Both | Add full OTel SDK (metrics + traces + export); consolidate `LoggerConfigurationFactory`; add `TraceContextEnricher`. Deferred to GAPS backlog. |

The net48 x86 mxaccessgw worker is excluded from both packages. Its `IWorkerLogger`
(stderr key=value format) is an out-of-process concern and remains bespoke.

---
## Open contract questions
1. **`IServiceCollection` overload completeness:** the `IHostApplicationBuilder`-based
overload is the primary path (available in all three apps on .NET 10). The
`IServiceCollection` overload is a fallback for unusual host configurations. Validate
that both overloads wire OTel log export identically (same Resource, same enrichers).
2. **OTel log export channel:** `AddZbSerilog` uses `Serilog.Sinks.OpenTelemetry` to push
logs into the OTel pipeline (sharing the Resource). Confirm the sink version is
compatible with the OpenTelemetry SDK version pinned in `ZB.MOM.WW.Telemetry`
(`Directory.Packages.props`).
3. **`RedactionEnricher` DI timing:** `RedactionEnricher` resolves `ILogRedactor` from
`IServiceProvider` on first use (lazy, to avoid a circular-DI problem during Serilog's
two-stage bootstrap). Validate that the service provider is fully built by the time the
first post-startup log event fires. If MxGateway's `GatewayLogRedactor` has dependencies
that are not available at stage-1 bootstrap time, the lazy-resolve pattern protects it.
4. **`SiteId` / `NodeRole` null handling:** `AddZbTelemetry` and `AddZbSerilog` silently
omit null `SiteId`/`NodeRole` from the Resource and enricher set. Confirm this is the
correct behavior for OtOpcUa, which may run in a single-site configuration where neither
field is meaningful, versus ScadaBridge, where `SiteId` is essential for multi-cluster
fleet visibility.

See [`../GAPS.md`](../GAPS.md) for the adoption order and effort/risk.