Joseph Doherty 544a6ddb77 Fix all baseline code-review findings across the six shared libraries
Resolves the 35 findings from the 2026-06-01 baseline (commit 26ba1c7),
test-first for every behavioral change. +51 tests (331 -> 382 passing, 0 failed).

- Telemetry-001 (HIGH): RedactionEnricher now honours property removal, so a
  redactor that drops a key actually scrubs the secret from the event.
- Auth: LDAP validator ValidateOnStart; API-key verify no longer fails on a
  best-effort MarkUsed write or a corrupt scopes column (fail-closed); LDAP cert
  validation hook; KeyPrefix persistence aligned; README algorithm corrected.
- Health: Akka checks return Degraded (not throw) when the cluster isn't up yet;
  GrpcDependencyHealthCheck catch-all; null 'description' rendered; composite
  endpoint builder; XML docs shipped.
- Audit: CompositeAuditWriter no longer re-throws OperationCanceledException;
  TruncatingAuditRedactor over-redact scrubs Target + safe negative max; options
  record; XML docs shipped.
- Configuration: TryAddEnumerable idempotent registration; consistent port
  quoting; strict invariant port parsing; XML docs + README packaged.
- Theme: mobile toggle is now CSS-only (no Bootstrap JS); token/CSS hygiene;
  XML docs on the public parameter surface.

Shared-contract/spec docs updated where the code was the source of truth
(observability service.instance.id, MapZbMetrics, redactor reach). All changes
additive/back-compatible at v0.1.0. code-reviews bookkeeping follows separately.
2026-06-01 11:22:14 -04:00


Proposed shared library: ZB.MOM.WW.Telemetry

A contract on paper — the public surface to extract so the three projects stop implementing observability separately. Realizes ../spec/SPEC.md and ../spec/METRIC-CONVENTIONS.md. Not yet created. Reference implementations already exist: OtOpcUa ObservabilityExtensions.cs (OTel + Serilog), ScadaBridge LoggerConfigurationFactory.cs (Serilog enrichers), MxGateway GatewayMetrics.cs + GatewayLogRedactor.cs.

Packages (.NET 10)

ZB.MOM.WW.Telemetry          # OTel bootstrap: Resource, metrics, traces, exporters
ZB.MOM.WW.Telemetry.Serilog  # Serilog bootstrap: enrichers, TraceContextEnricher, ILogRedactor

Both packages are .NET 10 — all three logging-bearing processes are .NET 10 (OtOpcUa server, mxaccessgw gateway, ScadaBridge central). The x86 net48 mxaccessgw worker uses a bespoke IWorkerLogger (stderr key=value); net48 multi-targeting is not required. Published to the Gitea NuGet feed; SemVer; lockstep to start.

Packaging & distribution

Two NuGet packages, one DLL each, on the Gitea NuGet feed. Libraries linked into each app — there is no central telemetry service. Both packages are consumed by all three apps after adoption:

| Package (→ DLL) | Transitive deps | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|---|
| …Telemetry | OpenTelemetry SDK, OpenTelemetry.Exporter.Prometheus.AspNetCore, OpenTelemetry.Exporter.OpenTelemetryProtocol, standard instrumentation packages | yes | yes | yes |
| …Telemetry.Serilog | Serilog, Serilog.Extensions.Hosting, Serilog.AspNetCore (version note below) | yes | yes | yes |

Serilog.AspNetCore version split (open convergence note): OtOpcUa and ScadaBridge target .NET 10 and may use Serilog.AspNetCore 9.x; MxGateway's adoption starts from Serilog.AspNetCore 9.x as well. If a project remains on .NET 8 ASP.NET Core for any reason, the compatible version is Serilog.AspNetCore 8.x. Coordinate the version floor when the first app takes a dependency and pin it in Directory.Packages.props.


ZB.MOM.WW.Telemetry

namespace ZB.MOM.WW.Telemetry;

/// Selects how instrumentation data is exported.
public enum ZbExporter
{
    /// Prometheus scrape endpoint (default). Call app.MapZbMetrics() to mount /metrics.
    Prometheus,

    /// OTLP gRPC export. Set OtlpEndpoint (e.g. "http://collector:4317").
    /// Additive: the Prometheus exporter stays wired, so /metrics still serves scrape data.
    Otlp,
}

/// Options for AddZbTelemetry. All properties feed the shared OTel Resource and
/// Serilog enrichers (via AddZbSerilog in the .Serilog package).
public sealed class ZbTelemetryOptions
{
    /// Required. Short lower-case app identifier — e.g. "otopcua", "mxgateway", "scadabridge".
    /// Populates OTel Resource service.name.
    public string ServiceName { get; set; } = "";

    /// Fleet-wide namespace. Default "ZB.MOM.WW". Do not override per-app.
    /// Populates OTel Resource service.namespace.
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";

    /// Optional. Populate from AssemblyInformationalVersion.
    /// Populates OTel Resource service.version.
    public string? ServiceVersion { get; set; }

    /// Optional. Physical or logical site identifier.
    /// Populates OTel Resource site.id and Serilog property SiteId.
    public string? SiteId { get; set; }

    /// Optional. Node function: "central", "site", "hub", "standalone".
    /// Populates OTel Resource node.role and Serilog property NodeRole.
    public string? NodeRole { get; set; }

    /// App-specific Meter names to register with the OTel MeterProvider.
    /// Always register the app's primary Meter here. Standard instrumentation meters are
    /// added automatically (ASP.NET Core, HttpClient, runtime, process).
    public string[] Meters { get; set; } = [];

    /// App-specific ActivitySource names to register with the OTel TracerProvider.
    public string[] ActivitySources { get; set; } = [];

    /// Export path. Default Prometheus; set Otlp to add export to an OTLP collector alongside it.
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;

    /// Required when Exporter = ZbExporter.Otlp.
    /// OTLP gRPC endpoint, e.g. "http://collector:4317".
    public string? OtlpEndpoint { get; set; }
}

/// Extension point for configuring the OTel bootstrap on an IHostApplicationBuilder.
public static class ZbTelemetryExtensions
{
    /// Configures the OpenTelemetry MeterProvider and TracerProvider with the shared Resource,
    /// standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process),
    /// the app's own Meters and ActivitySources, and the selected exporter.
    /// Does NOT configure Serilog — call AddZbSerilog() in the .Serilog package for that.
    public static IHostApplicationBuilder AddZbTelemetry(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions>   configure);

    /// IServiceCollection overload for contexts where IHostApplicationBuilder is not available.
    /// Requires the caller to supply a pre-built ZbTelemetryOptions (Resource attributes must
    /// be populated before DI composition, so the options-object overload is preferred).
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection    services,
        ZbTelemetryOptions         options);

    /// IServiceCollection convenience overload that accepts a configure delegate.
    /// Equivalent to calling BuildOptions(configure) then AddZbTelemetry(services, options).
    /// Use when only an IServiceCollection is available but the lambda form is preferred.
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection    services,
        Action<ZbTelemetryOptions> configure);
}

/// Builds the shared OTel ResourceBuilder from ZbTelemetryOptions.
/// Used internally by AddZbTelemetry. Exposed for tests and custom pipelines.
public static class ZbResource
{
    /// Deterministic service instance identifier, formatted as "MachineName:ProcessId" and
    /// stable for the lifetime of the process. Populates OTel Resource service.instance.id on
    /// every signal (metrics, traces, logs). The OTel SDK's random-GUID default is disabled in
    /// favour of this value so all signals from one process share one instance id, enabling
    /// cross-signal correlation. The process id usually changes on restart, so do not rely on
    /// the value across restarts. Always present (not optional).
    public static string InstanceId { get; }

    /// Returns a ResourceBuilder pre-populated with service.name, service.namespace,
    /// service.version, service.instance.id (always — see InstanceId), site.id, node.role, and
    /// host.name (always Environment.MachineName). Attributes with null/empty values are omitted.
    public static ResourceBuilder Build(ZbTelemetryOptions options);

    /// The single source of truth for the shared Resource attribute set. Both the OTel SDK
    /// metrics/traces pipeline and the Serilog OTLP log sink derive their attributes from this one
    /// map, so logs cannot drift from metrics/traces. service.name/namespace/instance.id/host.name
    /// are always present; service.version/site.id/node.role are included only when the option is
    /// non-null/non-empty.
    public static IReadOnlyDictionary<string, object> BuildAttributes(ZbTelemetryOptions options);
}

/// Endpoint extension for mounting the Prometheus /metrics scrape endpoint.
public static class ZbMetricsEndpointExtensions
{
    /// Mounts the Prometheus /metrics endpoint. Valid under ANY ZbTelemetryOptions.Exporter value:
    /// AddZbTelemetry always wires the Prometheus exporter, and OTLP (ZbExporter.Otlp) is only an
    /// additive overlay — so /metrics serves scrape data even when Exporter = ZbExporter.Otlp.
    /// Call after app.UseRouting().
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints);
}
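
The omission rule documented on BuildAttributes can be sketched without the library. This is a minimal stand-in under stated assumptions, not the library's implementation: the local function BuildAttributes and its parameter list are hypothetical and only mirror the relevant ZbTelemetryOptions properties, and "plant-a" is a made-up site id.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ZbResource.BuildAttributes; parameters mirror ZbTelemetryOptions.
static Dictionary<string, object> BuildAttributes(
    string serviceName,
    string serviceNamespace = "ZB.MOM.WW",
    string? serviceVersion = null,
    string? siteId = null,
    string? nodeRole = null)
{
    // Always-present attributes, as listed in the contract.
    var attributes = new Dictionary<string, object>
    {
        ["service.name"] = serviceName,
        ["service.namespace"] = serviceNamespace,
        ["service.instance.id"] = $"{Environment.MachineName}:{Environment.ProcessId}",
        ["host.name"] = Environment.MachineName,
    };

    // Optional attributes are added only when non-null and non-empty.
    AddIfPresent(attributes, "service.version", serviceVersion);
    AddIfPresent(attributes, "site.id", siteId);
    AddIfPresent(attributes, "node.role", nodeRole);
    return attributes;
}

static void AddIfPresent(Dictionary<string, object> target, string key, string? value)
{
    if (!string.IsNullOrEmpty(value))
    {
        target[key] = value;
    }
}

// NodeRole and ServiceVersion are left unset, so neither key appears in the output.
var sample = BuildAttributes("mxgateway", siteId: "plant-a");
foreach (var (key, value) in sample)
{
    Console.WriteLine($"{key} = {value}");
}
```

Deriving both the OTel Resource and the Serilog log attributes from one map like this is what keeps the three signals from drifting apart; the sketch only illustrates the always-present versus optional split.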

ZB.MOM.WW.Telemetry.Serilog

namespace ZB.MOM.WW.Telemetry.Serilog;

/// Extension point for registering the Serilog application logger in DI on an IHostApplicationBuilder.
public static class ZbSerilogExtensions
{
    /// Registers the Serilog application logger in DI. Wires configuration-driven sinks
    /// (ReadFrom.Configuration reads Serilog:WriteTo sinks + Serilog:MinimumLevel) with
    /// fixed enrichers: SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from
    /// Environment.MachineName (auto — not a caller-supplied option); TraceContextEnricher;
    /// and RedactionEnricher (applied only when ILogRedactor is registered).
    ///
    /// MinimumLevel: AddZbSerilog reads "Serilog:MinimumLevel" from IConfiguration. Callers that
    /// bind MinimumLevel from a different config key (e.g. ScadaBridge's
    /// "ScadaBridge:Logging:MinimumLevel") apply that override themselves before or after
    /// calling AddZbSerilog; this remains per-project and AddZbSerilog does not read it.
    ///
    /// IMPORTANT — no process-global state: AddZbSerilog does NOT set Log.Logger. It passes
    /// preserveStaticLogger: true to AddSerilog so the static logger is left untouched.
    /// This makes AddZbSerilog safe to call multiple times in one process (integration tests,
    /// multi-host apps) without hitting "The logger is already frozen".
    ///
    /// Apps that need a pre-Build() bootstrap logger (for startup exceptions before IConfiguration
    /// is available) should set Log.Logger themselves in Program.cs:
    ///   Log.Logger = new LoggerConfiguration().WriteTo.Console().CreateBootstrapLogger();
    /// That is an application-level decision — not done by this library.
    ///
    /// OTel log export is wired automatically: logs flow through the OTel pipeline with the same
    /// Resource as the metrics and traces (all three signals correlated in a backend).
    ///
    /// The configure delegate receives the same ZbTelemetryOptions used by AddZbTelemetry.
    /// Typically share a single options-population lambda across both calls.
    public static IHostApplicationBuilder AddZbSerilog(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions>   configure);
}

/// Canonical Serilog property name constants for the identity enrichers.
/// Use these constants — not literal strings — when querying properties in sinks or tests.
public static class ZbLogEnricherNames
{
    /// Serilog property: physical or logical site identifier. Matches OTel Resource site.id.
    public const string SiteId       = "SiteId";

    /// Serilog property: node function (central, site, hub, standalone). Matches OTel node.role.
    public const string NodeRole     = "NodeRole";

    /// Serilog property: machine name (Environment.MachineName). Matches OTel host.name.
    public const string NodeHostname = "NodeHostname";
}

/// Stamps trace_id and span_id from Activity.Current onto every Serilog log event.
/// When Activity.Current is null (no active span — background services, startup, non-traced paths)
/// the enricher emits nothing; it does NOT inject empty strings or zero values.
/// This enables a log line to be clicked through to its originating trace in a backend.
public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}

/// Seam for project-specific log-event redaction.
/// The shared library applies this via RedactionEnricher; each project provides its own
/// implementation that knows which fields (by property name) or which command payloads
/// must not leave the process in log events.
/// If no ILogRedactor is registered in DI, RedactionEnricher is a no-op.
public interface ILogRedactor
{
    /// Inspect and mutate properties in-place. Remove or replace any sensitive values.
    /// Called on every log event before it reaches any sink.
    void Redact(IDictionary<string, object?> properties);
}

/// Applies a registered ILogRedactor to every Serilog log event.
/// Registered automatically by AddZbSerilog. The enricher resolves ILogRedactor from DI
/// on first use; if none is registered it is permanently inert (no DI call per event).
public sealed class RedactionEnricher : ILogEventEnricher
{
    public RedactionEnricher(IServiceProvider serviceProvider);
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}
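
The "emit nothing when there is no active span" rule in the TraceContextEnricher contract above can be illustrated with System.Diagnostics alone. This is a sketch under assumptions, not the enricher itself: it does not touch Serilog, and the function name TraceContextProperties and the activity name "demo-operation" are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Returns the properties a trace-context enricher would stamp on a log event.
// With no active span the result is empty: no empty strings, no zero ids.
static IReadOnlyDictionary<string, string> TraceContextProperties()
{
    var current = Activity.Current;
    if (current is null)
    {
        return new Dictionary<string, string>();
    }

    return new Dictionary<string, string>
    {
        ["trace_id"] = current.TraceId.ToHexString(),
        ["span_id"] = current.SpanId.ToHexString(),
    };
}

// Outside any span (startup, background work): nothing to add.
Console.WriteLine($"properties without a span: {TraceContextProperties().Count}");

// Inside a started Activity, Activity.Current is set, so both ids are available.
using (var span = new Activity("demo-operation").Start())
{
    var properties = TraceContextProperties();
    Console.WriteLine($"trace_id={properties["trace_id"]} span_id={properties["span_id"]}");
}
```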

Consumer matrix

| Consumer | Packages | Notes |
|---|---|---|
| MxGateway | Both | MEL → Serilog migration: GatewayLogScope/BeginScope → LogContext.PushProperty; GatewayLogRedactor → ILogRedactor impl; GatewayMetrics stays, wired through o.Meters. Done in this release. |
| OtOpcUa | Both | Consolidate existing Serilog bootstrap; add TraceContextEnricher + SiteId/NodeRole enrichers; add Resource to existing OTel pipeline. Deferred to GAPS backlog. |
| ScadaBridge | Both | Add full OTel SDK (metrics + traces + export); consolidate LoggerConfigurationFactory; add TraceContextEnricher. Deferred to GAPS backlog. |

The net48 x86 mxaccessgw worker is excluded from both packages. Its IWorkerLogger (stderr key=value format) is an out-of-process concern and remains bespoke.


Open contract questions

  1. IServiceCollection overload completeness: the IHostApplicationBuilder-based overload is the primary path (available in all three apps on .NET 10). The IServiceCollection overload is a fallback for unusual host configurations. Validate that both overloads wire OTel log export identically (same Resource, same enrichers).

  2. OTel log export channel: AddZbSerilog uses Serilog.Sinks.OpenTelemetry to push logs into the OTel pipeline (sharing the Resource). Confirm the sink version is compatible with the OpenTelemetry SDK version pinned in ZB.MOM.WW.Telemetry (Directory.Packages.props).

  3. RedactionEnricher DI timing: RedactionEnricher resolves ILogRedactor from IServiceProvider on first use (lazy, to avoid a circular-DI problem at host-build time). Validate that the service provider is fully built by the time the first post-startup log event fires. If MxGateway's GatewayLogRedactor has dependencies that are not yet available when the DI container is being composed, the lazy-resolve pattern protects it. (Note: the library no longer sets a Stage-1 bootstrap logger, so there is no Stage-1 vs. Stage-2 logger lifetime to reason about — only the single DI-registered application logger.)

  4. SiteId / NodeRole null handling: AddZbTelemetry and AddZbSerilog silently omit null SiteId/NodeRole from the Resource and enricher set. Confirm this is the correct behavior for OtOpcUa, which may run in a single-site configuration where neither field is meaningful, versus ScadaBridge, where SiteId is essential for multi-cluster fleet visibility.
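
For question 3, the lazy-resolve pattern can be sketched as follows. This is a minimal illustration under assumptions, not the RedactionEnricher implementation: the redactor is modelled as a delegate with the same shape as ILogRedactor.Redact, the helper CreateLazyRedaction is hypothetical, and the BCL ServiceContainer stands in for the host's service provider. The property names and values are made up.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.ComponentModel.Design;

// Assumption: the redactor is a delegate shaped like ILogRedactor.Redact,
// so this sketch needs no type declarations of its own.
static Action<IDictionary<string, object?>> CreateLazyRedaction(IServiceProvider provider)
{
    // Lazy<T> runs the factory once, on first access, and caches the result,
    // including a null result when nothing is registered.
    var redactor = new Lazy<Action<IDictionary<string, object?>>?>(() =>
        provider.GetService(typeof(Action<IDictionary<string, object?>>))
            as Action<IDictionary<string, object?>>);

    // No-op when no redactor was registered.
    return properties => redactor.Value?.Invoke(properties);
}

// ServiceContainer is a BCL IServiceProvider, used here in place of the host's container.
var services = new ServiceContainer();
var redact = CreateLazyRedaction(services); // nothing is resolved yet

// Registered after the wrapper was created but before the first event: still picked up.
Action<IDictionary<string, object?>> scrubSecrets = properties =>
{
    properties.Remove("Password");        // drop the key entirely
    if (properties.ContainsKey("ApiKey"))
    {
        properties["ApiKey"] = "***";     // or mask the value in place
    }
};
services.AddService(typeof(Action<IDictionary<string, object?>>), scrubSecrets);

var eventProperties = new Dictionary<string, object?>
{
    ["User"] = "alice",
    ["Password"] = "hunter2",
    ["ApiKey"] = "abc123",
};
redact(eventProperties);
Console.WriteLine(string.Join(", ", eventProperties.Keys));
```

The sketch shows why deferring resolution tolerates late registration, and why a missing redactor costs one lookup in total rather than one per event. It does not answer the open question itself: whether the real provider is fully built when the first post-startup event fires still has to be validated against the host.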

See ../GAPS.md for the adoption order and effort/risk.