scadaproj/components/observability/shared-contract/ZB.MOM.WW.Telemetry.md
Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected: OtOpcUa now shows "not added" for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00

Proposed shared library: ZB.MOM.WW.Telemetry

This is a contract on paper: the public surface to extract so the three projects stop implementing observability separately. It realizes ../spec/SPEC.md and ../spec/METRIC-CONVENTIONS.md. The library has not been created yet, but reference implementations already exist: OtOpcUa ObservabilityExtensions.cs (OTel + Serilog), ScadaBridge LoggerConfigurationFactory.cs (Serilog enrichers), MxGateway GatewayMetrics.cs + GatewayLogRedactor.cs.

Packages (.NET 10)

ZB.MOM.WW.Telemetry          # OTel bootstrap: Resource, metrics, traces, exporters
ZB.MOM.WW.Telemetry.Serilog  # Serilog bootstrap: enrichers, TraceContextEnricher, ILogRedactor

Both packages target .NET 10 because all three logging-bearing processes run on .NET 10 (OtOpcUa server, mxaccessgw gateway, ScadaBridge central). The x86 net48 mxaccessgw worker uses a bespoke IWorkerLogger (stderr key=value), so net48 multi-targeting is not required. Both packages are published to the Gitea NuGet feed under SemVer and are versioned in lockstep to start.

Packaging & distribution

Two NuGet packages, one DLL each, on the Gitea NuGet feed. Libraries linked into each app — there is no central telemetry service. Both packages are consumed by all three apps after adoption:

| Package (→ DLL) | Transitive deps | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|---|
| …Telemetry | OpenTelemetry SDK, OpenTelemetry.Exporter.Prometheus.AspNetCore, OpenTelemetry.Exporter.OpenTelemetryProtocol, standard instrumentation packages | yes | yes | yes |
| …Telemetry.Serilog | Serilog, Serilog.Extensions.Hosting, Serilog.AspNetCore (version note below) | yes | yes | yes |

Serilog.AspNetCore version split (open convergence note): OtOpcUa and ScadaBridge target .NET 10 and may use Serilog.AspNetCore 9.x; MxGateway's adoption starts from Serilog.AspNetCore 9.x as well. If a project remains on .NET 8 ASP.NET Core for any reason, the compatible version is Serilog.AspNetCore 8.x. Coordinate the version floor when the first app takes a dependency and pin it in Directory.Packages.props.


ZB.MOM.WW.Telemetry

namespace ZB.MOM.WW.Telemetry;

/// Selects how instrumentation data is exported.
public enum ZbExporter
{
    /// Prometheus scrape endpoint (default). Call app.MapZbMetrics() to mount /metrics.
    Prometheus,

    /// OTLP gRPC export. Set OtlpEndpoint (e.g. "http://collector:4317").
    /// Coexists with Prometheus when both endpoints are desired.
    Otlp,
}

/// Options for AddZbTelemetry. All properties feed the shared OTel Resource and
/// Serilog enrichers (via AddZbSerilog in the .Serilog package).
public sealed class ZbTelemetryOptions
{
    /// Required. Short lower-case app identifier — e.g. "otopcua", "mxgateway", "scadabridge".
    /// Populates OTel Resource service.name.
    public string ServiceName { get; set; } = "";

    /// Fleet-wide namespace. Default "ZB.MOM.WW". Do not override per-app.
    /// Populates OTel Resource service.namespace.
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";

    /// Optional. Populate from AssemblyInformationalVersion.
    /// Populates OTel Resource service.version.
    public string? ServiceVersion { get; set; }

    /// Optional. Physical or logical site identifier.
    /// Populates OTel Resource site.id and Serilog property SiteId.
    public string? SiteId { get; set; }

    /// Optional. Node function: "central", "site", "hub", "standalone".
    /// Populates OTel Resource node.role and Serilog property NodeRole.
    public string? NodeRole { get; set; }

    /// App-specific Meter names to register with the OTel MeterProvider.
    /// Always register the app's primary Meter here. Standard instrumentation meters are
    /// added automatically (ASP.NET Core, HttpClient, runtime, process).
    public string[] Meters { get; set; } = [];

    /// App-specific ActivitySource names to register with the OTel TracerProvider.
    public string[] ActivitySources { get; set; } = [];

    /// Export path. Default Prometheus; use Otlp for a real collector.
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;

    /// Required when Exporter = ZbExporter.Otlp.
    /// OTLP gRPC endpoint, e.g. "http://collector:4317".
    public string? OtlpEndpoint { get; set; }
}

/// Extension point for configuring the OTel bootstrap on an IHostApplicationBuilder.
public static class ZbTelemetryExtensions
{
    /// Configures the OpenTelemetry MeterProvider and TracerProvider with the shared Resource,
    /// standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process),
    /// the app's own Meters and ActivitySources, and the selected exporter.
    /// Does NOT configure Serilog — call AddZbSerilog() in the .Serilog package for that.
    public static IHostApplicationBuilder AddZbTelemetry(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions>   configure);

    /// IServiceCollection overload for contexts where IHostApplicationBuilder is not available.
    /// Takes a pre-built ZbTelemetryOptions rather than a configure delegate, because the
    /// Resource attributes must be populated before DI composition.
    /// The IHostApplicationBuilder overload remains the primary path.
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection    services,
        ZbTelemetryOptions         options);
}

/// Builds the shared OTel ResourceBuilder from ZbTelemetryOptions.
/// Used internally by AddZbTelemetry. Exposed for tests and custom pipelines.
public static class ZbResource
{
    /// Returns a ResourceBuilder pre-populated with service.name, service.namespace,
    /// service.version, site.id, node.role, and host.name (always Environment.MachineName).
    /// Attributes with null values are omitted from the Resource.
    public static ResourceBuilder Build(ZbTelemetryOptions options);
}

/// Endpoint extension for mounting the Prometheus /metrics scrape endpoint.
public static class ZbMetricsEndpointExtensions
{
    /// Mounts the Prometheus /metrics endpoint.
    /// Only valid when ZbTelemetryOptions.Exporter = ZbExporter.Prometheus (or both).
    /// Call after app.UseRouting().
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints);
}
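
The null-omission rule documented on ZbResource.Build can be illustrated without the OpenTelemetry packages. The sketch below is not the library's implementation: ResourceOptionsSketch and ResourceAttributeSketch are hypothetical stand-ins, trimmed to the Resource-related properties of ZbTelemetryOptions, and the helper returns a plain dictionary instead of a ResourceBuilder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ZbTelemetryOptions, reduced to the Resource-related properties.
public sealed class ResourceOptionsSketch
{
    public string ServiceName { get; set; } = "";
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
    public string? ServiceVersion { get; set; }
    public string? SiteId { get; set; }
    public string? NodeRole { get; set; }
}

// Hypothetical helper mirroring the documented behavior of ZbResource.Build.
public static class ResourceAttributeSketch
{
    public static IReadOnlyDictionary<string, object> Collect(ResourceOptionsSketch options)
    {
        // Always-present attributes: host.name is taken from the machine, never from options.
        var attributes = new Dictionary<string, object>
        {
            ["service.name"] = options.ServiceName,
            ["service.namespace"] = options.ServiceNamespace,
            ["host.name"] = Environment.MachineName,
        };

        // Optional attributes are omitted entirely when null (no empty strings).
        AddIfPresent(attributes, "service.version", options.ServiceVersion);
        AddIfPresent(attributes, "site.id", options.SiteId);
        AddIfPresent(attributes, "node.role", options.NodeRole);
        return attributes;
    }

    private static void AddIfPresent(Dictionary<string, object> target, string key, string? value)
    {
        if (value is not null)
        {
            target[key] = value;
        }
    }
}
```

Calling `ResourceAttributeSketch.Collect(new ResourceOptionsSketch { ServiceName = "otopcua" })` yields service.name, service.namespace, and host.name only. In a real pipeline a dictionary like this could presumably be passed to the OpenTelemetry SDK's `ResourceBuilder.AddAttributes`, but that wiring is an assumption rather than part of the contract above.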

ZB.MOM.WW.Telemetry.Serilog

namespace ZB.MOM.WW.Telemetry.Serilog;

/// Extension point for configuring the Serilog two-stage bootstrap on an IHostApplicationBuilder.
public static class ZbSerilogExtensions
{
    /// Two-stage Serilog bootstrap:
    ///   Stage 1: minimal console-only bootstrap logger (for startup errors before IConfiguration
    ///             is available).
    ///   Stage 2: application logger wired from IConfiguration (ReadFrom.Configuration reads the
    ///             "Serilog:WriteTo" sinks and "Serilog:MinimumLevel") with fixed enrichers:
    ///             SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from
    ///             Environment.MachineName (auto; not a caller-supplied option); TraceContextEnricher;
    ///             and RedactionEnricher (applied only when ILogRedactor is registered).
    ///
    ///   MinimumLevel: AddZbSerilog reads "Serilog:MinimumLevel" from IConfiguration. Callers that
    ///             bind MinimumLevel from a different config key (e.g. ScadaBridge's
    ///             "ScadaBridge:Logging:MinimumLevel") apply that override themselves before or after
    ///             calling AddZbSerilog — this remains per-project and AddZbSerilog does not read it.
    ///
    /// OTel log export is wired automatically: logs flow through the OTel pipeline with the same
    /// Resource as the metrics and traces (all three signals correlated in a backend).
    ///
    /// The configure delegate receives the same ZbTelemetryOptions used by AddZbTelemetry.
    /// Typically share a single options-population lambda across both calls.
    public static IHostApplicationBuilder AddZbSerilog(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions>   configure);
}

/// Canonical Serilog property name constants for the identity enrichers.
/// Use these constants — not literal strings — when querying properties in sinks or tests.
public static class ZbLogEnricherNames
{
    /// Serilog property: physical or logical site identifier. Matches OTel Resource site.id.
    public const string SiteId       = "SiteId";

    /// Serilog property: node function (central, site, hub, standalone). Matches OTel node.role.
    public const string NodeRole     = "NodeRole";

    /// Serilog property: machine name (Environment.MachineName). Matches OTel host.name.
    public const string NodeHostname = "NodeHostname";
}

/// Stamps trace_id and span_id from Activity.Current onto every Serilog log event.
/// When Activity.Current is null (no active span — background services, startup, non-traced paths)
/// the enricher emits nothing; it does NOT inject empty strings or zero values.
/// This enables a log line to be clicked through to its originating trace in a backend.
public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}

/// Seam for project-specific log-event redaction.
/// The shared library applies this via RedactionEnricher; each project provides its own
/// implementation that knows which fields (by property name) or which command payloads
/// must not leave the process in log events.
/// If no ILogRedactor is registered in DI, RedactionEnricher is a no-op.
public interface ILogRedactor
{
    /// Inspect and mutate properties in-place. Remove or replace any sensitive values.
    /// Called on every log event before it reaches any sink.
    void Redact(IDictionary<string, object?> properties);
}

/// Applies a registered ILogRedactor to every Serilog log event.
/// Registered automatically by AddZbSerilog. The enricher resolves ILogRedactor from DI
/// on first use; if none is registered it is permanently inert (no DI call per event).
public sealed class RedactionEnricher : ILogEventEnricher
{
    public RedactionEnricher(IServiceProvider serviceProvider);
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}
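
The "emit nothing when there is no active span" rule for TraceContextEnricher can be sketched with only System.Diagnostics. This is an illustration under assumptions, not the proposed implementation: TraceContextSketch and PropertiesFor are hypothetical names, and the helper returns key/value pairs instead of touching a Serilog LogEvent.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: computes the properties a trace-context enricher would stamp.
public static class TraceContextSketch
{
    public static IReadOnlyList<KeyValuePair<string, string>> PropertiesFor(Activity? activity)
    {
        // No active span: return nothing rather than empty strings or all-zero ids.
        if (activity is null)
        {
            return Array.Empty<KeyValuePair<string, string>>();
        }

        return new[]
        {
            new KeyValuePair<string, string>("trace_id", activity.TraceId.ToHexString()),
            new KeyValuePair<string, string>("span_id", activity.SpanId.ToHexString()),
        };
    }
}
```

An enricher built on this would call `TraceContextSketch.PropertiesFor(Activity.Current)` inside Enrich and add each pair to the event, for example via Serilog's `propertyFactory.CreateProperty` and `logEvent.AddPropertyIfAbsent`. Returning an empty list for a null activity keeps background-service and startup log lines free of placeholder ids.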

Consumer matrix

| Consumer | Packages | Notes |
|---|---|---|
| MxGateway | Both | MEL → Serilog migration: GatewayLogScope/BeginScope → LogContext.PushProperty; GatewayLogRedactor → ILogRedactor impl; GatewayMetrics stays, wired through o.Meters. Done in this release. |
| OtOpcUa | Both | Consolidate existing Serilog bootstrap; add TraceContextEnricher + SiteId/NodeRole enrichers; add Resource to existing OTel pipeline. Deferred to GAPS backlog. |
| ScadaBridge | Both | Add full OTel SDK (metrics + traces + export); consolidate LoggerConfigurationFactory; add TraceContextEnricher. Deferred to GAPS backlog. |

The net48 x86 mxaccessgw worker is excluded from both packages. Its IWorkerLogger (stderr key=value format) is an out-of-process concern and remains bespoke.


Open contract questions

  1. IServiceCollection overload completeness: the IHostApplicationBuilder-based overload is the primary path (available in all three apps on .NET 10). The IServiceCollection overload is a fallback for unusual host configurations. Validate that both overloads wire OTel log export identically (same Resource, same enrichers).

  2. OTel log export channel: AddZbSerilog uses Serilog.Sinks.OpenTelemetry to push logs into the OTel pipeline (sharing the Resource). Confirm the sink version is compatible with the OpenTelemetry SDK version pinned in ZB.MOM.WW.Telemetry (Directory.Packages.props).

  3. RedactionEnricher DI timing: RedactionEnricher resolves ILogRedactor from IServiceProvider on first use (lazy, to avoid a circular-DI problem during Serilog's two-stage bootstrap). Validate that the service provider is fully built by the time the first post-startup log event fires. If MxGateway's GatewayLogRedactor has dependencies that are not available at stage-1 bootstrap time, the lazy-resolve pattern protects it.

  4. SiteId / NodeRole null handling: AddZbTelemetry and AddZbSerilog silently omit null SiteId/NodeRole from the Resource and enricher set. Confirm this is the correct behavior for OtOpcUa, which may run in a single-site configuration where neither field is meaningful, versus ScadaBridge, where SiteId is essential for multi-cluster fleet visibility.
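
The lazy-resolve behavior raised in question 3 can be sketched with BCL types only. Everything here is a hypothetical illustration: the ILogRedactor interface is repeated from the contract so the block compiles on its own, while NameBasedRedactorSketch, LazyRedactionSketch, and CountingProviderSketch are invented names, and the sketch operates on a property dictionary instead of a Serilog LogEvent.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Copy of the ILogRedactor contract, repeated so this sketch is self-contained.
public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

// Hypothetical redactor: masks the values of a fixed set of property names.
public sealed class NameBasedRedactorSketch : ILogRedactor
{
    private readonly string[] _sensitiveNames;

    public NameBasedRedactorSketch(params string[] sensitiveNames) => _sensitiveNames = sensitiveNames;

    public void Redact(IDictionary<string, object?> properties)
    {
        foreach (var name in _sensitiveNames)
        {
            if (properties.ContainsKey(name))
            {
                properties[name] = "***";
            }
        }
    }
}

// Hypothetical core of a lazily resolving redaction enricher.
public sealed class LazyRedactionSketch
{
    private readonly Lazy<ILogRedactor?> _redactor;

    public LazyRedactionSketch(IServiceProvider services)
    {
        // The provider is not touched here; the lookup runs once, on the first Apply call.
        // Lazy<T> caches a null result too, so a missing registration costs one lookup in total.
        _redactor = new Lazy<ILogRedactor?>(
            () => services.GetService(typeof(ILogRedactor)) as ILogRedactor,
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public void Apply(IDictionary<string, object?> properties) => _redactor.Value?.Redact(properties);
}

// Minimal IServiceProvider stand-in that counts lookups, used only to show the caching.
public sealed class CountingProviderSketch : IServiceProvider
{
    private readonly ILogRedactor? _redactor;

    public CountingProviderSketch(ILogRedactor? redactor) => _redactor = redactor;

    public int Lookups { get; private set; }

    public object? GetService(Type serviceType)
    {
        Lookups++;
        return serviceType == typeof(ILogRedactor) ? _redactor : null;
    }
}
```

Constructing `LazyRedactionSketch` performs no lookup, so it is safe to create during the stage-1 bootstrap. The first `Apply` call resolves the redactor and later calls reuse the cached result, including the cached absence of one. What the sketch cannot show is the open part of the question: whether the first real log event always arrives after the service provider is fully built.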

See ../GAPS.md for the adoption order and effort/risk.