Commit Graph

9 Commits

Author SHA1 Message Date
Joseph Doherty 2f124fa02c docs(observability): record telemetry follow-ons DONE (metric normalization, ScadaBridge instruments, OTLP opt-in, site metrics listener, Serilog alignment) 2026-06-01 17:16:46 -04:00
Joseph Doherty dee55aadc6 docs(observability): record ZB.MOM.WW.Telemetry adoption across 3 apps; correct false MxGateway logging-status claim
All 3 apps adopted on branch feat/adopt-zb-telemetry (behaviour-preserving).
Records the per-repo result + accepted scope deviations (ScadaBridge keeps
LoggerConfigurationFactory + TraceContextEnricher instead of AddZbSerilog;
MxGateway keeps GatewayLogScope, exposes redaction via ILogRedactor seam) and
deferred follow-ons (#6 ms->s, #7 meter rename, #9 app instruments, OTLP, and
the new ScadaBridge Site-node HTTP/1.1 metrics-listener item). Corrects the
prior false 'MxGateway logging adopted on its own branch' claim — that migration
actually landed in this pass.
2026-06-01 15:58:10 -04:00
Joseph Doherty 544a6ddb77 Fix all baseline code-review findings across the six shared libraries
Resolves the 35 findings from the 2026-06-01 baseline (commit 26ba1c7),
test-first for every behavioral change. +51 tests (331 -> 382 passing, 0 failed).

- Telemetry-001 (HIGH): RedactionEnricher now honours property removal, so a
  redactor that drops a key actually scrubs the secret from the event.
- Auth: LDAP validator ValidateOnStart; API-key verify no longer fails on a
  best-effort MarkUsed write or a corrupt scopes column (fail-closed); LDAP cert
  validation hook; KeyPrefix persistence aligned; README algorithm corrected.
- Health: Akka checks return Degraded (not throw) when the cluster isn't up yet;
  GrpcDependencyHealthCheck catch-all; null 'description' rendered; composite
  endpoint builder; XML docs shipped.
- Audit: CompositeAuditWriter no longer re-throws OperationCanceledException;
  TruncatingAuditRedactor over-redact scrubs Target + safe negative max; options
  record; XML docs shipped.
- Configuration: TryAddEnumerable idempotent registration; consistent port
  quoting; strict invariant port parsing; XML docs + README packaged.
- Theme: mobile toggle is now CSS-only (no Bootstrap JS); token/CSS hygiene;
  XML docs on the public parameter surface.

Shared-contract/spec docs updated where the code was the source of truth
(observability service.instance.id, MapZbMetrics, redactor reach). All changes
additive/back-compatible at v0.1.0. code-reviews bookkeeping follows separately.
2026-06-01 11:22:14 -04:00
Joseph Doherty 8311912f40 feat(telemetry): pack ZB.MOM.WW.Telemetry 0.1.0 + README/CLAUDE + register observability component in indexes
- NuGet metadata: expanded Description and PackageTags on both library csproj files
  (opentelemetry;observability;metrics;tracing;prometheus;otlp;... / serilog;logging;...)
- Full dotnet test: 7 (Telemetry) + 12 (Serilog) = 19 tests, all green
- dotnet pack: ZB.MOM.WW.Telemetry.0.1.0.nupkg + ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg
  (artifacts/ gitignored, not committed)
- ZB.MOM.WW.Telemetry/README.md: overview, 2 packages, unifying hinge prose,
  exporter options, OTel signals + trace-log correlation, test/pack commands, status
- ZB.MOM.WW.Telemetry/CLAUDE.md: package responsibilities, consumer matrix,
  build/test/pack commands, status + pointers to components/observability/
- components/README.md: Observability row added to component registry table
- CLAUDE.md: Telemetry row added to component-normalization table; intro count
  updated to four shared libs; observability prose paragraph added (MxGateway
  logging adoption noted)
- upcoming.md: Observability item ticked done, pointing at components/observability/
  and ZB.MOM.WW.Telemetry; MxGateway MEL->Serilog adoption noted
- components/observability/README.md: status updated to Built @ 0.1.0, library
  build/pack commands added, MxGateway adoption row updated
2026-06-01 08:20:05 -04:00
Joseph Doherty f569d537d1 fix(telemetry.serilog): don't set process-global Log.Logger in AddZbSerilog (multi-host safe)
Remove the Stage-1 bootstrap-logger line (Log.Logger = new LoggerConfiguration()
.WriteTo.Console().CreateBootstrapLogger()) from AddZbSerilog. A shared library must
not mutate process-global state: when multiple hosts are built in one process (integration
tests, Aspire multi-host, parallel test runs) the second call throws "The logger is
already frozen".

AddSerilog is now called with preserveStaticLogger: true so Serilog.Extensions.Hosting
leaves the static Log.Logger entirely untouched. The DI-registered application logger is
the only artifact AddZbSerilog produces.

Apps that want a pre-Build() bootstrap logger should set Log.Logger themselves in
Program.cs before calling AddZbSerilog — that decision belongs to the application.

Three new regression tests in MultiHostTests verify: two hosts build in the same process
without throwing; Log.Logger is not mutated; each host gets its own independent DI ILogger.

Docs (SPEC.md §5 and shared-contract ZB.MOM.WW.Telemetry.md) updated: the "two-stage
bootstrap" framing is replaced with the correct description — library registers only the
DI application logger; optional Stage-1 bootstrap is the app's responsibility.
2026-06-01 08:13:35 -04:00
Joseph Doherty 37fb84f477 feat(telemetry): core review fixes (Prometheus+OTLP coexistence, ServiceName validation, null guards) + contract overload note
- Fix #1: Prometheus exporter always wired for metrics; OTLP is additive overlay
  when Exporter == ZbExporter.Otlp so /metrics + MapZbMetrics work in all modes.
- Fix #2: BuildOptions throws ArgumentException when ServiceName is null/whitespace.
- Fix #3: AddZbTelemetry(IHostApplicationBuilder) guard: ThrowIfNull(configure)
  added alongside existing ThrowIfNull(builder).
- Fix #6: Contract doc adds IServiceCollection convenience overload signature.
- Tests: +3 new tests (OtlpExporter still serves /metrics, empty ServiceName throws,
  whitespace ServiceName throws). Total: 7 passed (was 4).
2026-06-01 07:43:47 -04:00
Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected — OtOpcUa now shows  not added for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00
Joseph Doherty fba3d09eed docs(observability): current-state x3 + GAPS + README
Complete the observability normalization component docs:

- components/observability/current-state/otopcua/CURRENT-STATE.md — full
  OTel SDK (metrics + tracing) + Prometheus; 7 otopcua.* instruments + 2
  spans; Serilog with driver-scope LogContextEnricher; no Resource/service.name
  anywhere; tracing pipeline wired but no exporter; adoption plan: AddZbTelemetry
  gains shared Resource + trace↔log correlation; LogContextEnricher kept bespoke.

- components/observability/current-state/mxaccessgw/CURRENT-STATE.md — 20
  hand-rolled instruments (13 counters, 3 histograms ms-unit, 4 gauges) in
  GatewayMetrics.cs; no OTel SDK → metrics never export; MEL logging with
  GatewayLogScope correlation and GatewayLogRedactor; adoption plan: in-pass
  MEL → AddZbSerilog migration (LogContext correlation, ILogRedactor seam) +
  AddZbTelemetry wires OTel SDK so GatewayMetrics finally exports.

- components/observability/current-state/scadabridge/CURRENT-STATE.md —
  OpenTelemetry.Api is a CVE-patch override only (zero instrumentation); Serilog
  with SiteId/NodeRole/NodeHostname enrichers (strongest set in family); adoption
  plan: replace CVE ref with AddZbTelemetry; adopt AddZbSerilog (LoggerConfigurationFactory
  deleted); add first scadabridge.* instruments.

- components/observability/GAPS.md — divergence table across §1 Resource (P1,
  nobody), §2 metrics export (P1, MxGateway invisible), §3 MxGateway MEL→Serilog
  (P1, in-pass done), §4 trace↔log correlation, §5 ms→s unit, §6 Meter naming,
  §7 standard instrumentation, §8 Serilog version, §9 ScadaBridge zero
  instrumentation; 11-item prioritized backlog.

- components/observability/README.md — overview, per-project status table
  (OTel today / metrics / tracing / logging / enrichers / adoption status),
  normalized vs. left-per-project boundary, 2-package structure, component status.
2026-06-01 07:23:08 -04:00
Joseph Doherty 7d243890ed docs(observability): spec + METRIC-CONVENTIONS + ZB.MOM.WW.Telemetry shared contract
Author the three normalization docs for the observability component:
- components/observability/spec/SPEC.md — Section 0 scope (normalized vs. per-project),
  AddZbTelemetry pipeline, shared Resource attribute set, standard instrumentation baseline,
  exporter conventions, Serilog two-stage bootstrap with identity enrichers and
  TraceContextEnricher, ILogRedactor redaction seam, per-project migration table, and
  acceptance criteria.
- components/observability/spec/METRIC-CONVENTIONS.md — meter naming convention (app
  namespace; MxGateway.Server flagged as convergence target), instrument naming pattern
  (<app>.<subsystem>.<event>), mandatory duration unit = seconds (MxGateway ms histograms
  flagged), Resource attribute set table, standard instrumentation baseline, and per-app
  instrument tables (OtOpcUa 7 instruments + 2 spans; MxGateway 13 counters / 3 histograms
  / 4 gauges; ScadaBridge TBD).
- components/observability/shared-contract/ZB.MOM.WW.Telemetry.md — paper API for the two
  packages: ZbTelemetryOptions, ZbExporter enum, AddZbTelemetry (IHostApplicationBuilder +
  IServiceCollection overloads), ZbResource.Build, MapZbMetrics; AddZbSerilog,
  ZbLogEnricherNames constants, TraceContextEnricher, ILogRedactor, RedactionEnricher.
  Consumer matrix and open contract questions included.
2026-06-01 07:19:38 -04:00