544a6ddb77
Resolves the 35 findings from the 2026-06-01 baseline (commit 26ba1c7),
test-first for every behavioral change. +51 tests (331 -> 382 passing, 0 failed).
- Telemetry-001 (HIGH): RedactionEnricher now honours property removal, so a
redactor that drops a key actually scrubs the secret from the event.
- Auth: LDAP validator ValidateOnStart; API-key verify no longer fails on a
best-effort MarkUsed write or a corrupt scopes column (fail-closed); LDAP cert
validation hook; KeyPrefix persistence aligned; README algorithm corrected.
- Health: Akka checks return Degraded (not throw) when the cluster isn't up yet;
GrpcDependencyHealthCheck catch-all; null 'description' rendered; composite
endpoint builder; XML docs shipped.
- Audit: CompositeAuditWriter no longer re-throws OperationCanceledException;
TruncatingAuditRedactor over-redact scrubs Target + safe negative max; options
record; XML docs shipped.
- Configuration: TryAddEnumerable idempotent registration; consistent port
quoting; strict invariant port parsing; XML docs + README packaged.
- Theme: mobile toggle is now CSS-only (no Bootstrap JS); token/CSS hygiene;
XML docs on the public parameter surface.
Shared-contract/spec docs updated where the code was the source of truth
(observability service.instance.id, MapZbMetrics, redactor reach). All changes
additive/back-compatible at v0.1.0. code-reviews bookkeeping follows separately.
287 lines
15 KiB
Markdown
# Proposed shared library: `ZB.MOM.WW.Telemetry`

A contract on paper — the public surface to extract so the three projects stop implementing
observability separately. Realizes [`../spec/SPEC.md`](../spec/SPEC.md) and
[`../spec/METRIC-CONVENTIONS.md`](../spec/METRIC-CONVENTIONS.md). **Not yet created.**
Reference implementations already exist: OtOpcUa `ObservabilityExtensions.cs` (OTel + Serilog),
ScadaBridge `LoggerConfigurationFactory.cs` (Serilog enrichers), MxGateway
`GatewayMetrics.cs` + `GatewayLogRedactor.cs`.

## Packages (.NET 10)

```
ZB.MOM.WW.Telemetry          # OTel bootstrap: Resource, metrics, traces, exporters
ZB.MOM.WW.Telemetry.Serilog  # Serilog bootstrap: enrichers, TraceContextEnricher, ILogRedactor
```

Both packages are .NET 10 — all three logging-bearing processes are .NET 10 (OtOpcUa server,
mxaccessgw gateway, ScadaBridge central). The x86 net48 mxaccessgw worker uses a bespoke
`IWorkerLogger` (stderr key=value); net48 multi-targeting is **not** required. Published to
the Gitea NuGet feed; SemVer; lockstep to start.

## Packaging & distribution

**Two NuGet packages, one DLL each**, on the Gitea NuGet feed. Libraries linked into each
app — there is no central telemetry service. Both packages are consumed by all three apps
after adoption:

| Package (→ DLL) | Transitive deps | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|---|
| `…Telemetry` | OpenTelemetry SDK, `OpenTelemetry.Exporter.Prometheus.AspNetCore`, `OpenTelemetry.Exporter.OpenTelemetryProtocol`, standard instrumentation packages | ✅ | ✅ | ✅ |
| `…Telemetry.Serilog` | Serilog, `Serilog.Extensions.Hosting`, `Serilog.AspNetCore` (version note below) | ✅ | ✅ | ✅ |

> **`Serilog.AspNetCore` version split (open convergence note):** OtOpcUa and ScadaBridge
> target .NET 10 and may use `Serilog.AspNetCore` 9.x; MxGateway's adoption starts from
> `Serilog.AspNetCore` 9.x as well. If a project remains on .NET 8 ASP.NET Core for any
> reason, the compatible version is `Serilog.AspNetCore` 8.x. Coordinate the version floor
> when the first app takes a dependency and pin it in `Directory.Packages.props`.

---

## `ZB.MOM.WW.Telemetry`

```csharp
namespace ZB.MOM.WW.Telemetry;

/// Selects how instrumentation data is exported.
public enum ZbExporter
{
    /// Prometheus scrape endpoint (default). Call app.MapZbMetrics() to mount /metrics.
    Prometheus,

    /// OTLP gRPC export. Set OtlpEndpoint (e.g. "http://collector:4317").
    /// Coexists with Prometheus when both endpoints are desired.
    Otlp,
}

/// Options for AddZbTelemetry. All properties feed the shared OTel Resource and
/// Serilog enrichers (via AddZbSerilog in the .Serilog package).
public sealed class ZbTelemetryOptions
{
    /// Required. Short lower-case app identifier — e.g. "otopcua", "mxgateway", "scadabridge".
    /// Populates OTel Resource service.name.
    public string ServiceName { get; set; } = "";

    /// Fleet-wide namespace. Default "ZB.MOM.WW". Do not override per-app.
    /// Populates OTel Resource service.namespace.
    public string ServiceNamespace { get; set; } = "ZB.MOM.WW";

    /// Optional. Populate from AssemblyInformationalVersion.
    /// Populates OTel Resource service.version.
    public string? ServiceVersion { get; set; }

    /// Optional. Physical or logical site identifier.
    /// Populates OTel Resource site.id and Serilog property SiteId.
    public string? SiteId { get; set; }

    /// Optional. Node function: "central", "site", "hub", "standalone".
    /// Populates OTel Resource node.role and Serilog property NodeRole.
    public string? NodeRole { get; set; }

    /// App-specific Meter names to register with the OTel MeterProvider.
    /// Always register the app's primary Meter here. Standard instrumentation meters are
    /// added automatically (ASP.NET Core, HttpClient, runtime, process).
    public string[] Meters { get; set; } = [];

    /// App-specific ActivitySource names to register with the OTel TracerProvider.
    public string[] ActivitySources { get; set; } = [];

    /// Export path. Default Prometheus; use Otlp for a real collector.
    public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;

    /// Required when Exporter = ZbExporter.Otlp.
    /// OTLP gRPC endpoint, e.g. "http://collector:4317".
    public string? OtlpEndpoint { get; set; }
}

/// Extension point for configuring the OTel bootstrap on an IHostApplicationBuilder.
public static class ZbTelemetryExtensions
{
    /// Configures the OpenTelemetry MeterProvider and TracerProvider with the shared Resource,
    /// standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process),
    /// the app's own Meters and ActivitySources, and the selected exporter.
    /// Does NOT configure Serilog — call AddZbSerilog() in the .Serilog package for that.
    public static IHostApplicationBuilder AddZbTelemetry(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions> configure);

    /// IServiceCollection overload for contexts where IHostApplicationBuilder is not available.
    /// Requires the caller to supply a pre-built ZbTelemetryOptions (Resource attributes must
    /// be populated before DI composition, so the options-object overload is preferred).
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services,
        ZbTelemetryOptions options);

    /// IServiceCollection convenience overload that accepts a configure delegate.
    /// Equivalent to calling BuildOptions(configure) then AddZbTelemetry(services, options).
    /// Use when only an IServiceCollection is available but the lambda form is preferred.
    public static IServiceCollection AddZbTelemetry(
        this IServiceCollection services,
        Action<ZbTelemetryOptions> configure);
}

/// Builds the shared OTel ResourceBuilder from ZbTelemetryOptions.
/// Used internally by AddZbTelemetry. Exposed for tests and custom pipelines.
public static class ZbResource
{
    /// Deterministic, process-stable service instance identifier, formatted as
    /// "MachineName:ProcessId". Populates OTel Resource service.instance.id on every signal
    /// (metrics, traces, logs). The OTel SDK's random-GUID default is disabled in favour of this
    /// value so all signals from one process share one process-stable instance id, enabling
    /// cross-signal correlation. Because the id embeds the ProcessId, it changes when the
    /// process restarts. Always present (not optional).
    public static string InstanceId { get; }

    /// Returns a ResourceBuilder pre-populated with service.name, service.namespace,
    /// service.version, service.instance.id (always — see InstanceId), site.id, node.role, and
    /// host.name (always Environment.MachineName). Attributes with null/empty values are omitted.
    public static ResourceBuilder Build(ZbTelemetryOptions options);

    /// The single source of truth for the shared Resource attribute set. Both the OTel SDK
    /// metrics/traces pipeline and the Serilog OTLP log sink derive their attributes from this one
    /// map, so logs cannot drift from metrics/traces. service.name/namespace/instance.id/host.name
    /// are always present; service.version/site.id/node.role are included only when the option is
    /// non-null/non-empty.
    public static IReadOnlyDictionary<string, object> BuildAttributes(ZbTelemetryOptions options);
}

/// Endpoint extension for mounting the Prometheus /metrics scrape endpoint.
public static class ZbMetricsEndpointExtensions
{
    /// Mounts the Prometheus /metrics endpoint. Valid under ANY ZbTelemetryOptions.Exporter value:
    /// AddZbTelemetry always wires the Prometheus exporter, and OTLP (ZbExporter.Otlp) is only an
    /// additive overlay — so /metrics serves scrape data even when Exporter = ZbExporter.Otlp.
    /// Call after app.UseRouting().
    public static IEndpointConventionBuilder MapZbMetrics(
        this IEndpointRouteBuilder endpoints);
}
```

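
The "always present" versus "omitted when null/empty" rule described for `BuildAttributes` can be illustrated with a small, dependency-free sketch. `SketchOptions` and `SketchResource` are hypothetical stand-ins rather than the proposed types; only the attribute keys and the `MachineName:ProcessId` format are taken from the contract above. Handing the resulting map to the OTel `ResourceBuilder` is left out, and a real implementation may well differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ZbTelemetryOptions: only the fields the Resource needs.
public sealed record SketchOptions(
    string ServiceName,
    string ServiceNamespace = "ZB.MOM.WW",
    string? ServiceVersion = null,
    string? SiteId = null,
    string? NodeRole = null);

public static class SketchResource
{
    // "MachineName:ProcessId", computed once so every signal sees the same value.
    public static string InstanceId { get; } =
        $"{Environment.MachineName}:{Environment.ProcessId}";

    public static IReadOnlyDictionary<string, object> BuildAttributes(SketchOptions options)
    {
        // These four keys are always emitted.
        var attributes = new Dictionary<string, object>
        {
            ["service.name"] = options.ServiceName,
            ["service.namespace"] = options.ServiceNamespace,
            ["service.instance.id"] = InstanceId,
            ["host.name"] = Environment.MachineName,
        };

        // Optional keys are skipped entirely when the option is null or empty.
        AddIfPresent(attributes, "service.version", options.ServiceVersion);
        AddIfPresent(attributes, "site.id", options.SiteId);
        AddIfPresent(attributes, "node.role", options.NodeRole);
        return attributes;
    }

    private static void AddIfPresent(
        Dictionary<string, object> target, string key, string? value)
    {
        if (!string.IsNullOrEmpty(value))
        {
            target[key] = value;
        }
    }
}
```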
---

## `ZB.MOM.WW.Telemetry.Serilog`

```csharp
namespace ZB.MOM.WW.Telemetry.Serilog;

/// Extension point for registering the Serilog application logger in DI on an IHostApplicationBuilder.
public static class ZbSerilogExtensions
{
    /// Registers the Serilog application logger in DI. Wires configuration-driven sinks
    /// (ReadFrom.Configuration reads Serilog:WriteTo sinks + Serilog:MinimumLevel) with
    /// fixed enrichers: SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from
    /// Environment.MachineName (auto — not a caller-supplied option); TraceContextEnricher;
    /// and RedactionEnricher (applied only when ILogRedactor is registered).
    ///
    /// MinimumLevel: AddZbSerilog reads "Serilog:MinimumLevel" from IConfiguration. Callers that
    /// bind MinimumLevel from a different config key (e.g. ScadaBridge's
    /// "ScadaBridge:Logging:MinimumLevel") apply that override themselves before or after
    /// calling AddZbSerilog — this remains per-project and AddZbSerilog does not read it.
    ///
    /// IMPORTANT — no process-global state: AddZbSerilog does NOT set Log.Logger. It passes
    /// preserveStaticLogger: true to AddSerilog so the static logger is left untouched.
    /// This makes AddZbSerilog safe to call multiple times in one process (integration tests,
    /// multi-host apps) without hitting "The logger is already frozen".
    ///
    /// Apps that need a pre-Build() bootstrap logger (for startup exceptions before IConfiguration
    /// is available) should set Log.Logger themselves in Program.cs:
    ///   Log.Logger = new LoggerConfiguration().WriteTo.Console().CreateBootstrapLogger();
    /// That is an application-level decision — not done by this library.
    ///
    /// OTel log export is wired automatically: logs flow through the OTel pipeline with the same
    /// Resource as the metrics and traces (all three signals correlated in a backend).
    ///
    /// The configure delegate receives the same ZbTelemetryOptions used by AddZbTelemetry.
    /// Typically share a single options-population lambda across both calls.
    public static IHostApplicationBuilder AddZbSerilog(
        this IHostApplicationBuilder builder,
        Action<ZbTelemetryOptions> configure);
}

/// Canonical Serilog property name constants for the identity enrichers.
/// Use these constants — not literal strings — when querying properties in sinks or tests.
public static class ZbLogEnricherNames
{
    /// Serilog property: physical or logical site identifier. Matches OTel Resource site.id.
    public const string SiteId = "SiteId";

    /// Serilog property: node function (central, site, hub, standalone). Matches OTel node.role.
    public const string NodeRole = "NodeRole";

    /// Serilog property: machine name (Environment.MachineName). Matches OTel host.name.
    public const string NodeHostname = "NodeHostname";
}

/// Stamps trace_id and span_id from Activity.Current onto every Serilog log event.
/// When Activity.Current is null (no active span — background services, startup, non-traced paths)
/// the enricher emits nothing; it does NOT inject empty strings or zero values.
/// This enables a log line to be clicked through to its originating trace in a backend.
public sealed class TraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}

/// Seam for project-specific log-event redaction.
/// The shared library applies this via RedactionEnricher; each project provides its own
/// implementation that knows which fields (by property name) or which command payloads
/// must not leave the process in log events.
/// If no ILogRedactor is registered in DI, RedactionEnricher is a no-op.
public interface ILogRedactor
{
    /// Inspect and mutate properties in-place. Remove or replace any sensitive values.
    /// Called on every log event before it reaches any sink.
    void Redact(IDictionary<string, object?> properties);
}

/// Applies a registered ILogRedactor to every Serilog log event.
/// Registered automatically by AddZbSerilog. The enricher resolves ILogRedactor from DI
/// on first use; if none is registered it is permanently inert (no DI call per event).
public sealed class RedactionEnricher : ILogEventEnricher
{
    public RedactionEnricher(IServiceProvider serviceProvider);
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory);
}
```

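
As an illustration of the `TraceContextEnricher` contract, the following is a minimal sketch against the existing Serilog enricher API (it assumes a reference to the `Serilog` package). The class name `SketchTraceContextEnricher` is hypothetical; the property names `trace_id` and `span_id` and the "emit nothing without an active span" rule are taken from the contract comment. It is not the reference implementation from any of the three projects.

```csharp
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;

// Hypothetical sketch of the enricher described in the contract above.
public sealed class SketchTraceContextEnricher : ILogEventEnricher
{
    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var activity = Activity.Current;
        if (activity is null)
        {
            // No active span: add no properties at all (no empty strings, no zeros).
            return;
        }

        // AddPropertyIfAbsent leaves any value the caller already attached untouched.
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("trace_id", activity.TraceId.ToString()));
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("span_id", activity.SpanId.ToString()));
    }
}
```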
---

## Consumer matrix

| Consumer | Packages | Notes |
|---|---|---|
| **MxGateway** | Both | MEL → Serilog migration: `GatewayLogScope`/`BeginScope` → `LogContext.PushProperty`; `GatewayLogRedactor` → `ILogRedactor` impl; `GatewayMetrics` stays, wired through `o.Meters`. **Done in this release.** |
| **OtOpcUa** | Both | Consolidate existing Serilog bootstrap; add `TraceContextEnricher` + `SiteId`/`NodeRole` enrichers; add Resource to existing OTel pipeline. Deferred to GAPS backlog. |
| **ScadaBridge** | Both | Add full OTel SDK (metrics + traces + export); consolidate `LoggerConfigurationFactory`; add `TraceContextEnricher`. Deferred to GAPS backlog. |

The net48 x86 mxaccessgw worker is excluded from both packages. Its `IWorkerLogger`
(stderr key=value format) is an out-of-process concern and remains bespoke.

---

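
The `BeginScope` → `LogContext.PushProperty` step in the MxGateway row of the consumer matrix might look roughly like the sketch below, which uses only the existing Serilog API. `CollectingSink`, `ScopeMigrationSketch`, and the `SessionId` property are hypothetical names chosen for illustration. The sketch also assumes the logger is configured with `Enrich.FromLogContext()`; this document does not state whether `AddZbSerilog` enables it.

```csharp
using System.Collections.Generic;
using Serilog;
using Serilog.Context;
using Serilog.Core;
using Serilog.Events;

// Collects events in memory so the sketch needs no sink package.
public sealed class CollectingSink : ILogEventSink
{
    public List<LogEvent> Events { get; } = new();

    public void Emit(LogEvent logEvent) => Events.Add(logEvent);
}

public static class ScopeMigrationSketch
{
    public static CollectingSink Run()
    {
        var sink = new CollectingSink();
        using var logger = new LoggerConfiguration()
            .Enrich.FromLogContext() // without this, pushed properties never reach events
            .WriteTo.Sink(sink)
            .CreateLogger();

        // MEL form being replaced (for comparison only):
        //   using (melLogger.BeginScope(new Dictionary<string, object> { ["SessionId"] = id }))
        using (LogContext.PushProperty("SessionId", "s-42"))
        {
            logger.Information("Command accepted"); // carries SessionId
        }

        logger.Information("Outside the scope");    // SessionId has been popped
        return sink;
    }
}
```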
## Open contract questions

1. **`IServiceCollection` overload completeness:** the `IHostApplicationBuilder`-based
   overload is the primary path (available in all three apps on .NET 10). The
   `IServiceCollection` overload is a fallback for unusual host configurations. Validate
   that both overloads wire OTel log export identically (same Resource, same enrichers).

2. **OTel log export channel:** `AddZbSerilog` uses `Serilog.Sinks.OpenTelemetry` to push
   logs into the OTel pipeline (sharing the Resource). Confirm the sink version is
   compatible with the OpenTelemetry SDK version pinned in `ZB.MOM.WW.Telemetry`
   (`Directory.Packages.props`).

3. **`RedactionEnricher` DI timing:** `RedactionEnricher` resolves `ILogRedactor` from
   `IServiceProvider` on first use (lazy, to avoid a circular-DI problem at host-build time).
   Validate that the service provider is fully built by the time the first post-startup log
   event fires. If MxGateway's `GatewayLogRedactor` has dependencies that are not yet
   available when the DI container is being composed, the lazy-resolve pattern protects it.
   (Note: the library no longer sets a Stage-1 bootstrap logger, so there is no Stage-1 vs.
   Stage-2 logger lifetime to reason about — only the single DI-registered application logger.)

4. **`SiteId` / `NodeRole` null handling:** `AddZbTelemetry` and `AddZbSerilog` silently
   omit null `SiteId`/`NodeRole` from the Resource and enricher set. Confirm this is the
   correct behavior for OtOpcUa, which may run in a single-site configuration where neither
   field is meaningful, versus ScadaBridge, where `SiteId` is essential for multi-cluster
   fleet visibility.

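
To make question 3 concrete, here is a dependency-free sketch of the lazy-resolve pattern together with a sample redactor. The `ILogRedactor` interface is redeclared locally so the block stands alone; `SketchRedactor`, `LazyRedactorHolder`, and the property names `Password`, `ApiKey`, and `ConnectionString` are hypothetical. How the real `RedactionEnricher` maps a Serilog `LogEvent` onto the dictionary is not specified in this document and is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local copy of the contract interface so the sketch is self-contained.
public interface ILogRedactor
{
    void Redact(IDictionary<string, object?> properties);
}

// Hypothetical project-side redactor: drops secrets outright and masks one field.
public sealed class SketchRedactor : ILogRedactor
{
    private static readonly HashSet<string> Dropped =
        new(StringComparer.OrdinalIgnoreCase) { "Password", "ApiKey" };

    public void Redact(IDictionary<string, object?> properties)
    {
        // Snapshot the keys: the dictionary is mutated inside the loop.
        foreach (var key in properties.Keys.ToList())
        {
            if (Dropped.Contains(key))
            {
                properties.Remove(key);
            }
            else if (string.Equals(key, "ConnectionString", StringComparison.OrdinalIgnoreCase))
            {
                properties[key] = "***";
            }
        }
    }
}

// The provider is not queried until the first Apply call, and the result
// (including "nothing registered") is cached, so later events skip the lookup.
public sealed class LazyRedactorHolder
{
    private readonly Lazy<ILogRedactor?> _redactor;

    public LazyRedactorHolder(IServiceProvider serviceProvider) =>
        _redactor = new Lazy<ILogRedactor?>(
            () => (ILogRedactor?)serviceProvider.GetService(typeof(ILogRedactor)));

    public void Apply(IDictionary<string, object?> properties) =>
        _redactor.Value?.Redact(properties);
}
```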
See [`../GAPS.md`](../GAPS.md) for the adoption order and effort/risk.