# Observability (current state): MxAccessGateway

Repo: `~/Desktop/MxAccessGateway`. Stack: .NET 10 gateway (x64) + .NET 4.8 worker (**x86**); solution `src/MxGateway.sln`. Telemetry code is concentrated in `src/ZB.MOM.WW.MxGateway.Server/Metrics/` (instruments) and `src/ZB.MOM.WW.MxGateway.Server/Diagnostics/` (logging correlation + redaction). All paths are relative to the repo root. Verified 2026-06-01.

The most unusual observability posture in the family: **13 counters, 3 histograms, and 4 observable gauges**, all hand-rolled directly on `System.Diagnostics.Metrics`, but **never exported** (no OpenTelemetry SDK, no Prometheus exporter, no OTLP). All metric data dies in an in-memory `GetSnapshot()`. Logging is `Microsoft.Extensions.Logging` exclusively (no Serilog), with a bespoke correlation scope and a log-redaction pipeline. The net48 x86 worker is out of process and out of scope; its `IWorkerLogger` (stderr key=value) is not addressed here.

## 1. Metrics (hand-rolled, unexported)

### `GatewayMetrics.cs`

`src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs`:

Meter name: `"MxGateway.Server"` (does not follow the project namespace `ZB.MOM.WW.MxGateway`). All instruments are instance members of `GatewayMetrics`. The class is registered as a **singleton** at `GatewayApplication.cs:62`. There is **no `OpenTelemetry.Extensions.Hosting`**, **no `AddOpenTelemetry()` call**, and **no exporter**: the `Meter` is created with `new Meter("MxGateway.Server")` and `GetSnapshot()` is the only read path.
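
To make the "hand-rolled but unexported" posture concrete, here is a minimal sketch of a `Meter` that nothing listens to, followed by an in-process `MeterListener` reading it. It is not the actual `GatewayMetrics` code: the single counter and the `observed` accumulator are illustrative assumptions, and only the meter and instrument names are taken from this document.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical stand-in for GatewayMetrics: one counter on the same Meter name.
using var meter = new Meter("MxGateway.Server");
Counter<long> sessionsOpened = meter.CreateCounter<long>("mxgateway.sessions.opened");

// No listener or exporter is attached yet, so this measurement goes nowhere.
sessionsOpened.Add(1);

// A MeterListener is the in-process way to observe measurements.
long observed = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments that belong to the gateway meter.
    if (instrument.Meter.Name == "MxGateway.Server")
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>(
    (instrument, value, tags, state) => observed += value);
listener.Start();

// Only measurements recorded after Start() reach the callback.
sessionsOpened.Add(1);
sessionsOpened.Add(1);

Console.WriteLine($"observed after listener start: {observed}"); // prints 2
```

The first `Add(1)` is lost because no subscriber existed when it was recorded, which is the situation every gateway instrument is in today with respect to external backends.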

**Counters (13):**

| Instrument name | Tracks |
|---|---|
| `mxgateway.sessions.opened` | New session requests |
| `mxgateway.sessions.closed` | Sessions torn down |
| `mxgateway.commands.started` | MXAccess command dispatched |
| `mxgateway.commands.succeeded` | Command completed OK |
| `mxgateway.commands.failed` | Command error |
| `mxgateway.events.received` | MXAccess events from worker |
| `mxgateway.queues.overflows` | Queue overflow (backpressure) |
| `mxgateway.faults` | Unhandled gateway faults |
| `mxgateway.workers.killed` | Worker process forcibly terminated |
| `mxgateway.workers.exited` | Worker process exited cleanly |
| `mxgateway.heartbeats.failed` | Worker heartbeat timeouts |
| `mxgateway.grpc.streams.disconnected` | gRPC event stream disconnects |
| `mxgateway.retries.attempted` | Retry attempts (any subsystem) |

**Histograms (3), unit `ms` (diverges from OTel semconv `s`):**

| Instrument name | Tracks |
|---|---|
| `mxgateway.workers.startup.duration` | Time from worker spawn to ready |
| `mxgateway.commands.duration` | End-to-end MXAccess command latency |
| `mxgateway.events.stream_send.duration` | gRPC event stream send latency |

**Observable gauges (4):**

| Instrument name | Tracks |
|---|---|
| `mxgateway.sessions.open` | Currently open sessions (live count) |
| `mxgateway.workers.running` | Currently running worker processes |
| `mxgateway.events.worker_queue.depth` | Per-worker event queue depth |
| `mxgateway.events.grpc_stream_queue.depth` | Per-stream gRPC send queue depth |

All 20 instruments share the `mxgateway.*` prefix and dot-separated naming, consistent with the family convention. Duration histograms record in **milliseconds** (`ms`); OTel semantic conventions require seconds (`s`). This is the only project with `ms` histograms.

### Singleton wiring

`src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`:

- `:62`: `services.AddSingleton<GatewayMetrics>()` registers the metrics singleton.

There is no `AddOpenTelemetry()` call anywhere in the gateway.
The `GatewayMetrics` `Meter` is created independently of any OTel SDK; it participates in `MeterListener` / `GetSnapshot()` only. Without the OTel SDK, this data is **invisible to Prometheus, OTLP, or any backend**.

### No tracing

No `ActivitySource` is defined. No spans are created. Tracing is entirely absent.

## 2. Logging (Microsoft.Extensions.Logging)

All logging in the gateway server uses `Microsoft.Extensions.Logging` (MEL) exclusively. There is no Serilog dependency. Sink configuration lives in `appsettings.json` (Console, with structured logging via the default host builder).

### Correlation scope

`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogScope.cs`:

Defines the per-request/per-session correlation property bag.

`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayRequestLoggingMiddlewareExtensions.cs`:

- `:22–41`: `UseGatewayRequestLogging()` middleware reads the following HTTP headers from each incoming request: `x-session-id`, `x-worker-process-id`, `x-correlation-id`, `x-command-method`, `authorization` (for redaction, not logging).
- Registered at `GatewayApplication.cs:34`.

`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLoggerExtensions.cs`:

- `:11–18`: `BeginGatewayScope(ILogger, GatewayLogScope)` calls `logger.BeginScope(scope)`, MEL's `ILogger.BeginScope` mechanism, which pushes properties as a scoped dictionary.

The correlation tuple (`SessionId` / `WorkerProcessId` / `CorrelationId` / `CommandMethod`) is injected into log lines produced within the scope. There is no `trace_id` / `span_id` enrichment: no `ActivitySource` exists, so this is consistent, but it leaves no path to trace correlation.

### Log redaction: `GatewayLogRedactor.cs`

`src/ZB.MOM.WW.MxGateway.Server/Diagnostics/GatewayLogRedactor.cs`:

- Masks the following sensitive data in log lines:
  - **`AuthenticateUser`** commands: the password argument is replaced.
  - **`WriteSecured`** commands: the value argument is replaced.
  - **`mxgw_` bearer tokens**: the token body is masked, keeping only the key-id prefix.
- Redaction is applied before the log event is emitted, so no sensitive data reaches the sink.

This is the only project in the family with an explicit log-redaction pipeline. OtOpcUa and ScadaBridge have no equivalent.

## 3. Signal summary

| Signal | Provider | Export | Resource / service.name |
|---|---|---|---|
| Metrics | `System.Diagnostics.Metrics` (`Meter` direct) | ⛔ none (`GetSnapshot()` only) | ⛔ none |
| Traces | n/a | ⛔ none | ⛔ none |
| Logs | MEL (`Microsoft.Extensions.Logging`) | Console via `appsettings.json` | ⛔ none |
| Trace↔log correlation | n/a | n/a | ⛔ absent (no ActivitySource exists) |

## 4. Notable design choices

- **`GatewayMetrics` singleton**: all counter/gauge increments are lock-free atomic operations on the underlying `Meter` instruments; the singleton is intentional.
- **`ms` histogram unit**: `workers.startup.duration`, `commands.duration`, and `events.stream_send.duration` all record in milliseconds. This is non-standard (OTel semconv requires `s`) and means raw values differ from OtOpcUa's `s` histograms by a factor of 1000.
- **MEL correlation via `BeginScope`**: MEL scopes are supported by structured logging providers (e.g. Serilog.Extensions.Hosting, Seq, Application Insights) but are provider-dependent. The scope properties may not appear in all sink configurations, unlike Serilog's `LogContext`, which is sink-agnostic.
- **Redaction placement**: `GatewayLogRedactor` sits between the caller and the log emission point, not inside a sink. This is the correct placement; the shared `ILogRedactor` seam preserves this.

---

## Adoption plan → `ZB.MOM.WW.Telemetry`

**This is the one in-pass adoption.** The MxGateway MEL → Serilog migration is executed as part of the `ZB.MOM.WW.Telemetry` library build, not deferred as a follow-on. The changes below land in the MxAccessGateway repo as part of Task #9 (blocked by Task #8, the library build).
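
As background for the migration steps that follow, the MEL scope mechanism described in section 2 can be sketched as below. This is an illustration under assumptions, not the gateway's code: the dictionary stands in for `GatewayLogScope` (whose shape is not shown in this document), the property values are made up, and the JSON console formatter is chosen only because it renders scope key/value pairs when `IncludeScopes` is enabled. It assumes the `Microsoft.Extensions.Logging.Console` package (or the ASP.NET Core shared framework) is referenced.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Scope properties are rendered only because this provider opts in via IncludeScopes;
// a provider that ignores scopes would silently drop the correlation properties.
using var factory = LoggerFactory.Create(builder =>
    builder.AddJsonConsole(options => options.IncludeScopes = true));
ILogger logger = factory.CreateLogger("GatewayDemo");

// Hypothetical scope payload carrying the four correlation properties.
var scope = new Dictionary<string, object>
{
    ["SessionId"] = "session-123",
    ["WorkerProcessId"] = 4242,
    ["CorrelationId"] = "corr-abc",
    ["CommandMethod"] = "Read",
};

using (logger.BeginScope(scope))
{
    // Emitted inside the scope: the provider can attach the four properties.
    logger.LogInformation("Command dispatched");
}

// Emitted outside the scope: no correlation properties are attached.
logger.LogInformation("Outside the scope");
```

Setting `IncludeScopes` to `false` in the same sketch leaves both log lines without the correlation properties, which is the provider dependence noted in section 4.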

**Migrate logging MEL → `AddZbSerilog`:**

- Replace `WebApplicationBuilder` default logging with `builder.AddZbSerilog(o => { o.ServiceName = "mxgateway"; o.SiteId = ...; o.NodeRole = ...; })`. This adds structured `SiteId` / `NodeRole` / `NodeHostname` enrichers on every log event, plus `TraceContextEnricher` (currently moot because there are no spans, but ready for when tracing is added).
- Re-express the `GatewayLogScope` / `BeginGatewayScope` / `UseGatewayRequestLogging` correlation mechanism as a Serilog `LogContext.PushProperty` scope. The middleware at `GatewayRequestLoggingMiddlewareExtensions.cs:22–41` is refactored to push the same four properties (`SessionId`, `WorkerProcessId`, `CorrelationId`, `CommandMethod`) via Serilog's `LogContext` rather than MEL `BeginScope`. Behavior is identical; portability improves.
- Move `GatewayLogRedactor` behind the shared `ILogRedactor` seam. The redaction policy (which commands/tokens to scrub and how) stays per-project in a `MxGatewayLogRedactor : ILogRedactor` implementation; the seam is shared.
- Console + file sinks are configured via `ReadFrom.Configuration` in `appsettings.json`, consistent with OtOpcUa and ScadaBridge's Serilog approach.

**Wire metrics export via `AddZbTelemetry`:**

- Add `builder.AddZbTelemetry(o => { o.ServiceName = "mxgateway"; o.SiteId = ...; o.NodeRole = ...; o.Meters = ["MxGateway.Server"]; /* temporary: update to "ZB.MOM.WW.MxGateway" when the Meter-rename gap (Gap N1) is closed */ })`. This registers the OTel SDK and connects `GatewayMetrics`'s existing `Meter` to the Prometheus exporter. The 13 counters, 3 histograms, and 4 gauges **begin exporting** for the first time. `GatewayMetrics.cs` itself is unchanged; only the SDK layer is added around it.
- Add `app.MapZbMetrics()` to expose `/metrics`.

**Convert histogram unit `ms` → `s`:**

- Convert the three histograms to seconds: re-create the instruments with unit `s` and multiply recorded values by `0.001` at the call site. The unit is fixed when an instrument is created, so both changes are needed.
  This is a breaking change to existing dashboards/alerts but is required for OTel semconv compliance. Tagged as a convergence item in `GAPS.md`.

**Keep bespoke:**

- `GatewayMetrics.cs`: all 20 instruments (`mxgateway.*` counters, histograms, gauges) stay per-project. `AddZbTelemetry` registers the Meter name; it does not own or replace the instruments.
- Meter name `"MxGateway.Server"`: a follow-on rename to `"ZB.MOM.WW.MxGateway"` is tracked in `GAPS.md` but is not required for the initial adoption (it is a Prometheus label change that breaks existing dashboards).
- `GatewayApplication.cs:62` singleton registration: unchanged. `GatewayMetrics` remains a singleton; `AddZbTelemetry` simply hooks the OTel SDK to it.
- The net48 x86 worker's `IWorkerLogger` (stderr key=value): out of process and out of scope. No changes.
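
The `ms` → `s` histogram conversion from the adoption plan can be sketched as follows. This is a minimal illustration under assumptions, not the planned change to `GatewayMetrics.cs`: the `MsToSeconds` helper, the `recorded` list, and the sample value of 250 ms are hypothetical, and the listener exists only to show what an exporter would receive. Division by `1000.0` is used in place of multiplying by `0.001`; the two agree up to floating-point rounding.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var meter = new Meter("MxGateway.Server");

// Re-created instrument: same name as today, but declared with unit "s".
Histogram<double> commandDuration = meter.CreateHistogram<double>(
    "mxgateway.commands.duration",
    unit: "s",
    description: "End-to-end MXAccess command latency");

// Listener used only to capture what a subscriber would see.
var recorded = new List<double>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (ReferenceEquals(instrument.Meter, meter))
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<double>(
    (instrument, value, tags, state) => recorded.Add(value));
listener.Start();

// Option A: a call site that already holds milliseconds converts before recording.
double elapsedMs = 250.0;
commandDuration.Record(MsToSeconds(elapsedMs));

// Option B: a call site that holds a Stopwatch records seconds directly.
var stopwatch = Stopwatch.StartNew();
stopwatch.Stop();
commandDuration.Record(stopwatch.Elapsed.TotalSeconds);

Console.WriteLine($"first recorded value in seconds: {recorded[0]}");

// Hypothetical helper: converts a millisecond duration to seconds.
static double MsToSeconds(double ms) => ms / 1000.0;
```

Recording from `TimeSpan.TotalSeconds` avoids a scattered conversion factor, so it may be the less error-prone option wherever the call site already has a `Stopwatch` or `TimeSpan`.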