Compare commits

49 Commits

Author SHA1 Message Date
Joseph Doherty 18e4b70572 docs(plans): design ZB.MOM.WW.Configuration shared startup-options-validation library
Approved brainstorming design for the Config + validation normalization pass
(Tier-2 candidate in upcoming.md). Scope: startup options validation only,
single package ZB.MOM.WW.Configuration, Approach A (lightweight base + rule
primitives + DI/startup helpers). Full pass = components/configuration/ docs +
built library.
2026-06-01 09:10:35 -04:00
Joseph Doherty a09cc02d46 Merge feat/zb-mom-ww-audit: Audit normalization component + ZB.MOM.WW.Audit (0.1.0)
# Conflicts:
#	CLAUDE.md
#	components/README.md
2026-06-01 09:09:44 -04:00
Joseph Doherty 88c557dee8 fix(telemetry): identical resource across all 3 signals (symmetric OTLP trigger + deterministic service.instance.id)
Fix 1 — symmetric OTLP trigger: ZbSerilogConfig.ApplyOpenTelemetryExport now activates only
when options.Exporter == ZbExporter.Otlp, matching the core OTel metrics/traces path. The
previous fallback that also triggered on a bare OtlpEndpoint is removed; OtlpEndpoint is the
address to use when Otlp is selected, not an independent enable.

Fix 2 — deterministic service.instance.id: ZbResource.InstanceId (MachineName:ProcessId) is
a new public property that produces a stable, process-unique id without a random GUID.
ZbResource.Configure passes autoGenerateServiceInstanceId:false + serviceInstanceId:InstanceId
so metrics and traces never get a random auto-generated id. ZbSerilogConfig.BuildResourceAttributes
adds service.instance.id from ZbResource.InstanceId so the Serilog OTLP log sink carries the
exact same value — all three signals now share an identical resource for cross-signal joins.

Tests: +2 in ZbResourceTests (InstanceId determinism, no-GUID check), +2 in RedactionTests
(service.instance.id parity assertion in BuildResourceAttributes, symmetric OTLP trigger tests).
Total: 9 + 14 = 23 tests, all green.
2026-06-01 08:26:09 -04:00
Joseph Doherty 8311912f40 feat(telemetry): pack ZB.MOM.WW.Telemetry 0.1.0 + README/CLAUDE + register observability component in indexes
- NuGet metadata: expanded Description and PackageTags on both library csproj files
  (opentelemetry;observability;metrics;tracing;prometheus;otlp;... / serilog;logging;...)
- Full dotnet test: 7 (Telemetry) + 12 (Serilog) = 19 tests, all green
- dotnet pack: ZB.MOM.WW.Telemetry.0.1.0.nupkg + ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg
  (artifacts/ gitignored, not committed)
- ZB.MOM.WW.Telemetry/README.md: overview, 2 packages, unifying hinge prose,
  exporter options, OTel signals + trace-log correlation, test/pack commands, status
- ZB.MOM.WW.Telemetry/CLAUDE.md: package responsibilities, consumer matrix,
  build/test/pack commands, status + pointers to components/observability/
- components/README.md: Observability row added to component registry table
- CLAUDE.md: Telemetry row added to component-normalization table; intro count
  updated to four shared libs; observability prose paragraph added (MxGateway
  logging adoption noted)
- upcoming.md: Observability item ticked done, pointing at components/observability/
  and ZB.MOM.WW.Telemetry; MxGateway MEL->Serilog adoption noted
- components/observability/README.md: status updated to Built @ 0.1.0, library
  build/pack commands added, MxGateway adoption row updated
2026-06-01 08:20:05 -04:00
Joseph Doherty f569d537d1 fix(telemetry.serilog): don't set process-global Log.Logger in AddZbSerilog (multi-host safe)
Remove the Stage-1 bootstrap-logger line (Log.Logger = new LoggerConfiguration()
.WriteTo.Console().CreateBootstrapLogger()) from AddZbSerilog. A shared library must
not mutate process-global state: when multiple hosts are built in one process (integration
tests, Aspire multi-host, parallel test runs) the second call throws "The logger is
already frozen".

AddSerilog is now called with preserveStaticLogger: true so Serilog.Extensions.Hosting
leaves the static Log.Logger entirely untouched. The DI-registered application logger is
the only artifact AddZbSerilog produces.

Apps that want a pre-Build() bootstrap logger should set Log.Logger themselves in
Program.cs before calling AddZbSerilog — that decision belongs to the application.

Three new regression tests in MultiHostTests verify: two hosts build in the same process
without throwing; Log.Logger is not mutated; each host gets its own independent DI ILogger.

Docs (SPEC.md §5 and shared-contract ZB.MOM.WW.Telemetry.md) updated: the "two-stage
bootstrap" framing is replaced with the correct description — library registers only the
DI application logger; optional Stage-1 bootstrap is the app's responsibility.
2026-06-01 08:13:35 -04:00
Joseph Doherty f1240c0bd4 refactor(telemetry.serilog): review fixes (thread-safe redactor, bootstrap logger, minlevel ordering, test coverage) 2026-06-01 07:48:57 -04:00
Joseph Doherty 37fb84f477 feat(telemetry): core review fixes (Prometheus+OTLP coexistence, ServiceName validation, null guards) + contract overload note
- Fix #1: Prometheus exporter always wired for metrics; OTLP is additive overlay
  when Exporter == ZbExporter.Otlp so /metrics + MapZbMetrics work in all modes.
- Fix #2: BuildOptions throws ArgumentException when ServiceName is null/whitespace.
- Fix #3: AddZbTelemetry(IHostApplicationBuilder) guard: ThrowIfNull(configure)
  added alongside existing ThrowIfNull(builder).
- Fix #6: Contract doc adds IServiceCollection convenience overload signature.
- Tests: +3 new tests (OtlpExporter still serves /metrics, empty ServiceName throws,
  whitespace ServiceName throws). Total: 7 passed (was 4).
2026-06-01 07:43:47 -04:00
Joseph Doherty c284e4d68d docs(audit): register component in indexes + GAPS cross-check 2026-06-01 07:41:45 -04:00
Joseph Doherty 2b856074d5 feat(telemetry.serilog): ILogRedactor seam + OTel log export 2026-06-01 07:40:58 -04:00
Joseph Doherty 70f91a855a feat(telemetry.serilog): TraceContextEnricher for trace<->log correlation 2026-06-01 07:38:54 -04:00
Joseph Doherty 1344f249d0 feat(telemetry.serilog): AddZbSerilog bootstrap + identity enrichers 2026-06-01 07:38:07 -04:00
Joseph Doherty 7f05107c1d feat(audit): AddZbAudit DI extension with safe defaults
TryAdd registers NullAuditRedactor + NoOpAuditWriter so consumer
registrations win; symmetric override tests for both writer and redactor.
2026-06-01 07:34:48 -04:00
Joseph Doherty 3e4d4369bf feat(telemetry): MapZbMetrics Prometheus scrape endpoint 2026-06-01 07:34:26 -04:00
Joseph Doherty 4126e1df54 feat(telemetry): AddZbTelemetry metrics+traces bootstrap 2026-06-01 07:33:51 -04:00
Joseph Doherty 215a646e35 docs(observability): fix metric-convention instrument names + NodeHostname-auto + resolve settled questions
C1: NodeHostname is AUTO throughout. Shared-contract AddZbSerilog doc comment now reads
"SiteId + NodeRole from ZbTelemetryOptions; NodeHostname from Environment.MachineName (auto)".
SPEC.md §0 and §5 prose updated to match. ScadaBridge adoption snippet no longer sets
o.NodeHostname (removed; NodeHostname is auto, not caller-supplied).

C2: METRIC-CONVENTIONS §6.1 OtOpcUa instrument table replaced with code-verified set:
counters otopcua.deploy.applied / driver.lifecycle / virtualtag.eval / scriptedalarm.transition /
opcua.sink.write / redundancy.service_level_change; histogram otopcua.deploy.apply.duration (s);
ActivitySource ZB.MOM.WW.OtOpcUa with spans otopcua.deploy.apply + otopcua.opcua.address_space_rebuild.
Removed invented names (deploy.failed, tag.subscriptions, tag.reads, tag.writes, session.active,
connection.gateway).

C3: METRIC-CONVENTIONS §6.2 MxGateway instrument table replaced with code-verified names from
GatewayMetrics.cs: 13 counters (sessions.opened/closed, commands.started/succeeded/failed,
events.received, queues.overflows, faults, workers.killed/exited, heartbeats.failed,
grpc.streams.disconnected, retries.attempted); 3 histograms ms (workers.startup.duration,
commands.duration, events.stream_send.duration); 4 gauges (sessions.open, workers.running,
events.worker_queue.depth, events.grpc_stream_queue.depth). Removed invented names.

m3: §2 example table replaced mxgateway.session.active + mxgateway.worker.call.duration
(invented) with mxgateway.sessions.open + mxgateway.commands.duration (real). Also fixed
the §2 rule-2 body text example which referenced mxgateway.worker.call.duration.

I4: §5 standard instrumentation table corrected: OtOpcUa now shows "not added" for all
five baseline instrumentations, matching current-state/otopcua. All three projects lack
standard instrumentation today; AddZbTelemetry adds it on adoption.

I1+m1: GAPS.md "Decisions still open" — removed the two settled questions (Prometheus-default
and ms→s/meter-rename bundling). Moved them to a new "Decisions settled" section with explicit
resolution notes. One genuinely open question remains (SiteId/NodeRole config binding path).

I2: SPEC.md §5 AddZbSerilog: added note that AddZbSerilog reads Serilog:MinimumLevel from
IConfiguration; callers with a different config key (e.g. ScadaBridge:Logging:MinimumLevel)
apply that override themselves — stays per-project. Shared-contract doc comment updated to match.

I3: MxAccessGateway adoption plan Meters = ["MxGateway.Server"] annotated as temporary with
note to update to ZB.MOM.WW.MxGateway when Gap N1 (Meter-rename) is closed.

m2: SPEC.md §1 now notes AddZbTelemetry also has an IServiceCollection overload for non-standard
hosts, with the IHostApplicationBuilder overload as the primary path.
2026-06-01 07:32:58 -04:00
Joseph Doherty 453ec7358d feat(audit): redactor + writer helpers (Null/Truncating/NoOp/Composite/Redacting)
Code-review fixes: CompositeAuditWriter re-throws OperationCanceledException
(honors cancellation) + evt null-guard; RedactingAuditWriter evt null-guard;
added marker-longer-than-max and cancellation-propagation regression tests.
2026-06-01 07:31:28 -04:00
Joseph Doherty 645388b1f1 feat(telemetry): options + shared OTel Resource 2026-06-01 07:30:54 -04:00
Joseph Doherty a1c3d5ec81 chore: scaffold ZB.MOM.WW.Telemetry solution and projects
Two library projects (ZB.MOM.WW.Telemetry core + Serilog) and two xUnit
test projects; central PM via Directory.Packages.props; dotnet build green.
2026-06-01 07:27:30 -04:00
Joseph Doherty 3934e528f2 feat(audit): AuditEvent record + AuditOutcome + writer/redactor seams
Includes equality-as-normalized-instant remarks on OccurredAtUtc and a
same-instant/different-offset equality regression test (code-review follow-up).
2026-06-01 07:25:31 -04:00
Joseph Doherty fba3d09eed docs(observability): current-state x3 + GAPS + README
Complete the observability normalization component docs:

- components/observability/current-state/otopcua/CURRENT-STATE.md — full
  OTel SDK (metrics + tracing) + Prometheus; 7 otopcua.* instruments + 2
  spans; Serilog with driver-scope LogContextEnricher; no Resource/service.name
  anywhere; tracing pipeline wired but no exporter; adoption plan: AddZbTelemetry
  gains shared Resource + trace↔log correlation; LogContextEnricher kept bespoke.

- components/observability/current-state/mxaccessgw/CURRENT-STATE.md — 20
  hand-rolled instruments (13 counters, 3 histograms ms-unit, 4 gauges) in
  GatewayMetrics.cs; no OTel SDK → metrics never export; MEL logging with
  GatewayLogScope correlation and GatewayLogRedactor; adoption plan: in-pass
  MEL → AddZbSerilog migration (LogContext correlation, ILogRedactor seam) +
  AddZbTelemetry wires OTel SDK so GatewayMetrics finally exports.

- components/observability/current-state/scadabridge/CURRENT-STATE.md —
  OpenTelemetry.Api is a CVE-patch override only (zero instrumentation); Serilog
  with SiteId/NodeRole/NodeHostname enrichers (strongest set in family); adoption
  plan: replace CVE ref with AddZbTelemetry; adopt AddZbSerilog (LoggerConfigurationFactory
  deleted); add first scadabridge.* instruments.

- components/observability/GAPS.md — divergence table across §1 Resource (P1,
  nobody), §2 metrics export (P1, MxGateway invisible), §3 MxGateway MEL→Serilog
  (P1, in-pass done), §4 trace↔log correlation, §5 ms→s unit, §6 Meter naming,
  §7 standard instrumentation, §8 Serilog version, §9 ScadaBridge zero
  instrumentation; 11-item prioritized backlog.

- components/observability/README.md — overview, per-project status table
  (OTel today / metrics / tracing / logging / enrichers / adoption status),
  normalized vs. left-per-project boundary, 2-package structure, component status.
2026-06-01 07:23:08 -04:00
Joseph Doherty 7d243890ed docs(observability): spec + METRIC-CONVENTIONS + ZB.MOM.WW.Telemetry shared contract
Author the three normalization docs for the observability component:
- components/observability/spec/SPEC.md — Section 0 scope (normalized vs. per-project),
  AddZbTelemetry pipeline, shared Resource attribute set, standard instrumentation baseline,
  exporter conventions, Serilog two-stage bootstrap with identity enrichers and
  TraceContextEnricher, ILogRedactor redaction seam, per-project migration table, and
  acceptance criteria.
- components/observability/spec/METRIC-CONVENTIONS.md — meter naming convention (app
  namespace; MxGateway.Server flagged as convergence target), instrument naming pattern
  (<app>.<subsystem>.<event>), mandatory duration unit = seconds (MxGateway ms histograms
  flagged), Resource attribute set table, standard instrumentation baseline, and per-app
  instrument tables (OtOpcUa 7 instruments + 2 spans; MxGateway 13 counters / 3 histograms
  / 4 gauges; ScadaBridge TBD).
- components/observability/shared-contract/ZB.MOM.WW.Telemetry.md — paper API for the two
  packages: ZbTelemetryOptions, ZbExporter enum, AddZbTelemetry (IHostApplicationBuilder +
  IServiceCollection overloads), ZbResource.Build, MapZbMetrics; AddZbSerilog,
  ZbLogEnricherNames constants, TraceContextEnricher, ILogRedactor, RedactionEnricher.
  Consumer matrix and open contract questions included.
2026-06-01 07:19:38 -04:00
Joseph Doherty 54654a49af chore(audit): scaffold ZB.MOM.WW.Audit solution 2026-06-01 07:19:36 -04:00
Joseph Doherty 76295695ee docs(health): align shared-contract to shipped API + per-lib CLAUDE.md + cleanup
- Contract: DatabaseHealthCheck<TContext> ctor now shows IServiceProvider (resolves
  IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe)
- Contract: RequireActiveNode gains retryAfterSeconds = 5 default parameter
- Packages: remove dangling AspNetCore.HealthChecks.UI.Client PackageVersion (no
  csproj referenced it)
- Tests: fix CS8625 in RoleLessCases — use object?[] so null role rows compile
  warning-free under Nullable=enable
- Add ZB.MOM.WW.Health/CLAUDE.md (packages, responsibilities, consumer matrix,
  build/test/pack commands, status + pointer to components/health/)
2026-06-01 07:17:18 -04:00
Joseph Doherty 6588e15f57 docs(audit): fix canonical record field count (10 not 8) + drop BCL-only overstatement (review fixes) 2026-06-01 07:16:18 -04:00
Joseph Doherty 0c087d150d feat(health): pack ZB.MOM.WW.Health 0.1.0 + README + register health component in indexes
- Added PackageTags to all 3 library csproj files (health-checks;aspnetcore/akka/efcore;scada;wonderware;zb-mom-ww)
- Full solution dotnet test: 58 tests green (32 Akka + 20 core + 6 EFCore)
- dotnet pack -c Release produces ZB.MOM.WW.Health.0.1.0.nupkg, ZB.MOM.WW.Health.Akka.0.1.0.nupkg, ZB.MOM.WW.Health.EntityFrameworkCore.0.1.0.nupkg; artifacts/ not committed
- ZB.MOM.WW.Health/README.md: overview, packages table, consumer matrix, versioning, build/test/pack instructions, status note
- components/README.md: Health row added to component registry
- CLAUDE.md: Health row in Component-normalization table + Health paragraph; intro updated from "two pieces" to "three pieces"
- upcoming.md: Health checks item checked off with pointer to components/health/ and ZB.MOM.WW.Health/
- components/health/README.md: status updated from "Draft / scaffolded / follow-on" to "Built @ 0.1.0"
2026-06-01 07:09:14 -04:00
Joseph Doherty 69c1be943e docs(audit): README + GAPS adoption backlog 2026-06-01 07:08:31 -04:00
Joseph Doherty ef234d3574 docs(audit): shared-contract ZB.MOM.WW.Audit 2026-06-01 07:08:31 -04:00
Joseph Doherty 8f0b70d12f docs(audit): spec + event-model 2026-06-01 07:04:54 -04:00
Joseph Doherty 1c2b23cbbb refactor(health.akka): review polish (internal decision helper, role guard, factory results, test coverage) + fix SPEC §4 gate description 2026-06-01 07:04:29 -04:00
Joseph Doherty edbc79204f refactor(health.ef): review polish (timer release, timeout test, provider disposal, drop unused dep)
- Eagerly call CancelAfter(InfiniteTimeSpan) after a successful probe so the pending OS
  timer is released on the happy path rather than held for the full timeout window.
- Add ProbeTimeout_Unhealthy test: 50 ms timeout with an infinite-blocking probe delegate
  asserts Unhealthy, covering the timeout code path.
- Fix ProbeQueryThrows_Unhealthy to use Task.FromException rather than a synchronous throw,
  accurately modelling a faulted async delegate.
- Wrap all BuildServiceProvider() results in await using so ServiceProvider is disposed
  after each test (no DI provider leak).
- Remove unused Microsoft.EntityFrameworkCore.InMemory package reference; tests use
  SQLite only (InMemory CanConnect semantics differ and the package was not exercised).
- Add <remarks> to DatabaseHealthCheck<TContext> noting the scoped-resolution path is
  safe for AddDbContextPool (scope dispose returns context to pool, not destroys it).
2026-06-01 07:03:16 -04:00
Joseph Doherty a7a8f1e493 docs(audit): correct file:line refs + split MxGateway CLI/dashboard action vocab (review fixes) 2026-06-01 07:01:46 -04:00
Joseph Doherty aa2251b93d feat(health): core review fixes (async writer, gRPC cancellation, validation, configurable retry-after) 2026-06-01 07:00:21 -04:00
Joseph Doherty cf277eb7df feat(health.akka): active/leader check with role filter + IActiveNodeGate impl 2026-06-01 06:55:46 -04:00
Joseph Doherty 9c8c1431af docs(audit): current-state ScadaBridge 2026-06-01 06:55:07 -04:00
Joseph Doherty 02cc687556 docs(audit): current-state MxAccessGateway 2026-06-01 06:55:07 -04:00
Joseph Doherty e498bb7c5a docs(audit): current-state OtOpcUa 2026-06-01 06:55:07 -04:00
Joseph Doherty 2dbedce0ac feat(health.ef): generic DatabaseHealthCheck<TContext> 2026-06-01 06:48:20 -04:00
Joseph Doherty 25dd328280 feat(health.akka): cluster health check with configurable status policy 2026-06-01 06:47:29 -04:00
Joseph Doherty 1ab2f32e8e feat(health): gRPC dependency health check 2026-06-01 06:44:05 -04:00
Joseph Doherty 5b82d68ea9 feat(health): IActiveNodeGate seam + RequireActiveNode filter 2026-06-01 06:43:11 -04:00
Joseph Doherty d1b837e718 feat(health): canonical JSON health response writer 2026-06-01 06:42:24 -04:00
Joseph Doherty 5fb579c2f0 docs: implementation plan for ZB.MOM.WW.Audit shared library 2026-06-01 06:39:05 -04:00
Joseph Doherty 18be42d0e2 feat(health): scaffold ZB.MOM.WW.Health solution + Task 4 (tags + three-tier MapZbHealth)
Consolidates the library into the scadaproj repo (matching the ZB.MOM.WW.Auth
convention — tracked in-parent, not a nested repo). 3 dependency-split packages
(core / .Akka / .EntityFrameworkCore) + 3 test projects, .slnx, central PM.
Task 4: ZbHealthTags + MapZbHealth (/health/ready,/active,/healthz). 8/8 tests.
2026-06-01 06:35:39 -04:00
Joseph Doherty 07d5907258 docs(health): resolve spec/contract/gaps consistency (review fixes)
Applies canonical resolutions for eight settled decisions:
- GAPS: remove three stale "Decisions still open" bullets (#1 IActiveNodeGate placement, #2 GrpcChannel type, #3 OtOpcUaCompat named constant)
- Shared contract: AkkaClusterHealthCheck, ActiveNodeHealthCheck constructors take IServiceProvider (lazy ActorSystem, Degraded-when-not-ready)
- Shared contract: AkkaActiveNodeGate takes IServiceProvider; reads SelfMember+leader directly, null-guarded; does not proxy ActiveNodeHealthCheck
- Shared contract: DatabaseHealthCheckOptions.Probe renamed to ProbeQuery; consumer matrix updated
- Shared contract: settled AddZbHealthChecks open question removed (spec §5 is per-project AddHealthChecks)
- SPEC §2.2: OtOpcUaCompat Leaving/Exiting cell updated from — to Degraded + footnote; §2.3 startup-safety note added
- README: status line corrected from "built and tested" to "scaffolded … implementation is follow-on (task #7)"; IActiveNodeGate "left per-project" bullet removed
- OtOpcUa current-state: AddZbHealthChecks → AddHealthChecks().AddCheck<...>(); IClusterRoleInfo note reframed as accepted trade-off
- ScadaBridge current-state: IActiveNodeGate bullet rewritten — interface moves to ZB.MOM.WW.Health on adoption, InboundApiEndpointFilter references shared interface
2026-06-01 06:33:42 -04:00
Joseph Doherty 16540b3001 docs: design for audit normalization component + ZB.MOM.WW.Audit 2026-06-01 06:32:39 -04:00
Joseph Doherty 3d25ee5090 docs(health): current-state x3 + GAPS + README
Code-verified current-state docs for OtOpcUa (three-tier full), ScadaBridge
(two-tier, no /healthz), and MxAccessGateway (bare liveness only / no probes).
GAPS backlog with P1 for MxGateway and convergence items for Akka status policy,
DB probe technique, and response writer. README with per-project status table.
2026-06-01 06:23:53 -04:00
Joseph Doherty 1dc35a8c43 docs(health): spec + ZB.MOM.WW.Health shared contract
Authors components/health/spec/SPEC.md (normalized three-tier endpoint
convention, probe catalog, response-writer contract, migration notes) and
components/health/shared-contract/ZB.MOM.WW.Health.md (paper API for the
3-package library: core, Akka, EntityFrameworkCore).
2026-06-01 06:20:19 -04:00
Joseph Doherty c77df2a2cd docs: implementation plans for ZB.MOM.WW.Health + ZB.MOM.WW.Telemetry
Two TDD plans (one per library, per house precedent) derived from the approved
design, with co-located .tasks.json execution-persistence:

- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals +
  Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the
executing-plans workflow.
2026-06-01 06:15:22 -04:00
Joseph Doherty 29b309c6c1 docs: design for health + observability normalization components
Adds the approved brainstorm design for the next two component-normalization
entries (Health #1, Observability #2 from upcoming.md):

- components/health/ -> ZB.MOM.WW.Health (3 dependency-split packages)
- components/observability/ -> ZB.MOM.WW.Telemetry (2 packages, 3 OTel signals
  + shared Serilog bootstrap)

Scope: normalization docs + build both libraries (.NET 10, tested, packed);
one sister-repo touch (MxGateway MEL->Serilog migration); no other app adoption.
Unifying hinge: one identity triple (service.name/site.id/node.role) feeds both
the OTel Resource and the Serilog enrichers.
2026-06-01 06:08:51 -04:00
121 changed files with 11898 additions and 8 deletions
+59 -3
@@ -6,9 +6,11 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
`scadaproj` is primarily an umbrella/index workspace that aggregates a family of
related SCADA / OT / Wonderware / OPC UA "sister projects" that live as **sibling
-directories under `~/Desktop/`**. It now also **hosts two pieces of source itself**
-the shared [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) library and the shared
-[`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) UI kit — both the realized output of their
+directories under `~/Desktop/`**. It now also **hosts four pieces of source itself**
+the shared [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) library, the shared
+[`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) UI kit, the shared
+[`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/) health-check library, and the shared
+[`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) observability library — all the realized output of their
respective component normalizations (see [Component normalization](#component-normalization)).
The point of this file is to give a high-level scan of each sister project — its purpose,
location, stack, and primary commands — so a fresh Claude Code session can orient across
@@ -119,6 +121,9 @@ each project's **code-verified current state**, and the **gaps** between. See
|---|---|---|---|---|
| Auth (login / identity / authz) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Auth` lib | [`components/auth/`](components/auth/) | [`ZB.MOM.WW.Auth/`](ZB.MOM.WW.Auth/) |
| UI Theme (layout / tokens / components) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Theme` RCL | [`components/ui-theme/`](components/ui-theme/) | [`ZB.MOM.WW.Theme/`](ZB.MOM.WW.Theme/) |
| Health (readiness / liveness / active-node) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Health` lib | [`components/health/`](components/health/) | [`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/) |
| Observability (metrics / traces / logs) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Telemetry` lib + `.Serilog` | [`components/observability/`](components/observability/) | [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/) |
| Audit (event model + writer seam) | Built (lib `0.1.0`) | Shared `ZB.MOM.WW.Audit` lib | [`components/audit/`](components/audit/) | [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/) |
The auth component is fully populated: a normalized [`spec`](components/auth/spec/SPEC.md), a
proposed [`shared-contract`](components/auth/shared-contract/ZB.MOM.WW.Auth.md), three
@@ -149,6 +154,57 @@ The implementation plan is at
Build/test from `ZB.MOM.WW.Theme/`: `dotnet test`. Consumer matrix: all three apps consume
the single `ZB.MOM.WW.Theme` package (OtOpcUa AdminUI, MxGateway Server, ScadaBridge Host + CentralUI).
The health component is fully populated: a normalized [`spec`](components/health/spec/SPEC.md), a
[`shared-contract`](components/health/shared-contract/ZB.MOM.WW.Health.md), three
[`current-state`](components/health/current-state/) docs, and an adoption [`GAPS`](components/health/GAPS.md)
backlog. Shared = three-tier endpoint convention (ready/active/healthz) + canonical JSON writer +
`IActiveNodeGate` seam + `GrpcDependencyHealthCheck` + `AkkaClusterHealthCheck` + `ActiveNodeHealthCheck`
+ `DatabaseHealthCheck<TContext>`; left per-project = which probes each app registers,
orchestrator wiring, and ScadaBridge's distributed health-monitoring pipeline.
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Health/`](ZB.MOM.WW.Health/)
(.NET 10; 3 packages — `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`;
58 tests; `dotnet pack` → 3 nupkgs @ 0.1.0).
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/health/GAPS.md`](components/health/GAPS.md).
Build/test from `ZB.MOM.WW.Health/`: `dotnet test`. Consumer matrix: MxAccessGateway → core only;
OtOpcUa & ScadaBridge → all three packages.
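
The three-tier convention described above can be pictured as tag-based filtering over one set of check results. The following is a minimal, language-neutral sketch in Python, not the library's C# code. The endpoint paths are the ones named in the commit log; the tag names, the `evaluate` helper, and the response shape are assumptions made for illustration and are not the canonical JSON writer's contract.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that max() picks the worst status.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


@dataclass(frozen=True)
class CheckResult:
    name: str
    status: Status
    tags: frozenset


# Hypothetical tier-to-tag mapping; None means "run no checks" (bare liveness).
TIER_TAGS = {
    "/health/ready": "ready",
    "/health/active": "active",
    "/healthz": None,
}


def evaluate(path, results):
    """Build the response body for one tier from already-run check results."""
    tag = TIER_TAGS[path]
    selected = [r for r in results if tag is not None and tag in r.tags]
    # The overall status is the worst status among the selected checks.
    overall = max((r.status for r in selected), default=Status.HEALTHY)
    return {
        "status": overall.name.capitalize(),
        "checks": {r.name: r.status.name.capitalize() for r in selected},
    }
```

Under these assumptions a failing database probe tagged only `ready` takes the node out of readiness without affecting the liveness tier, which is the usual reason for keeping the tiers separate.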
The observability component is fully populated: a normalized [`spec`](components/observability/spec/SPEC.md),
a [`metric-conventions`](components/observability/spec/METRIC-CONVENTIONS.md) reference, a
[`shared-contract`](components/observability/shared-contract/ZB.MOM.WW.Telemetry.md), three
[`current-state`](components/observability/current-state/) docs, and an adoption [`GAPS`](components/observability/GAPS.md)
backlog. Shared = OTel Resource (service.name/site.id/node.role identity triple) + standard instrumentation
(ASP.NET Core, HttpClient, gRPC client, runtime, process) + Prometheus always-on exporter + OTLP opt-in
+ Serilog two-stage bootstrap + SiteId/NodeRole/NodeHostname enrichers + TraceContextEnricher (trace_id/span_id)
+ ILogRedactor seam; left per-project = application Meters/ActivitySources, sink config, per-operation
enrichers, and redaction policies.
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Telemetry/`](ZB.MOM.WW.Telemetry/)
(.NET 10; 2 packages — `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog`; 19 tests;
`dotnet pack` → 2 nupkgs @ 0.1.0). **MxAccessGateway logging adopted** (MEL → Serilog migration done on
its own branch) — the one in-pass adoption. Broader OtOpcUa and ScadaBridge telemetry adoption is
follow-on, tracked in [`components/observability/GAPS.md`](components/observability/GAPS.md).
Build/test from `ZB.MOM.WW.Telemetry/`: `dotnet test`. Consumer matrix: all three apps consume both
packages after adoption (OtOpcUa, MxGateway Server, ScadaBridge Host + any instrumented project).
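
The "one identity feeds both pipelines" idea can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the library's implementation: the attribute and enricher names (`service.name`, `site.id`, `node.role`, `service.instance.id`, `SiteId`, `NodeRole`, `NodeHostname`) and the `MachineName:ProcessId` instance-id format come from the text and commit log above, while the `Identity` type and the function names are hypothetical.

```python
import os
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    service_name: str
    site_id: str
    node_role: str


def instance_id(host=None, pid=None):
    """Deterministic per-process id in host:pid form (no random GUID)."""
    host = host if host is not None else socket.gethostname()
    pid = pid if pid is not None else os.getpid()
    return f"{host}:{pid}"


def resource_attributes(identity, instance):
    """Attributes shared by the metrics and traces resource."""
    if not identity.service_name or not identity.service_name.strip():
        raise ValueError("service_name must not be empty or whitespace")
    return {
        "service.name": identity.service_name,
        "site.id": identity.site_id,
        "node.role": identity.node_role,
        "service.instance.id": instance,
    }


def log_properties(identity, instance, host):
    """Properties attached to every log event, derived from the same identity."""
    return {
        "SiteId": identity.site_id,
        "NodeRole": identity.node_role,
        "NodeHostname": host,
        "service.instance.id": instance,
    }
```

Because both dictionaries are built from the same `Identity` and the same instance id, a log line and a span from one process carry matching values, which is what makes cross-signal joins possible.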
The audit component is fully populated: a normalized [`spec`](components/audit/spec/SPEC.md), an
[`event-model`](components/audit/spec/EVENT-MODEL.md) reference, a
[`shared-contract`](components/audit/shared-contract/ZB.MOM.WW.Audit.md), three
[`current-state`](components/audit/current-state/) docs, and an adoption [`GAPS`](components/audit/GAPS.md)
backlog. Common ground = canonical `AuditEvent` record + `AuditOutcome` enum + `IAuditWriter` /
`IAuditRedactor` seams + helpers (`NullAuditRedactor`, `TruncatingAuditRedactor`, `NoOpAuditWriter`,
`CompositeAuditWriter`, `RedactingAuditWriter`) + `AddZbAudit` DI registration; left per-project =
transport/storage and domain vocabulary. Closes the loop on Auth — audit's `Actor` field = the Auth
principal. `IAuditRedactor` is aligned with Telemetry's `ILogRedactor` seam convention.
The shared library is **built and lives in this repo** at [`ZB.MOM.WW.Audit/`](ZB.MOM.WW.Audit/)
(.NET 10; 1 package — `ZB.MOM.WW.Audit`; only non-BCL dependency `Microsoft.Extensions.DependencyInjection.Abstractions`;
19 tests; `dotnet pack` → 1 nupkg @ 0.1.0). Repo: `https://gitea.dohertylan.com/dohertj2/zb-mom-ww-audit`.
**Not yet adopted** by the three apps — that's the follow-on tracked in [`components/audit/GAPS.md`](components/audit/GAPS.md).
Build/test from `ZB.MOM.WW.Audit/`: `dotnet test`. Consumer matrix: all three apps consume the single
`ZB.MOM.WW.Audit` package (OtOpcUa, MxAccessGateway, ScadaBridge each map their own audit record/seam
onto the canonical type at the emit boundary).
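
The writer and redactor helpers listed above compose as decorators around a sink. The sketch below shows that shape in Python with asyncio; it is an illustration, not the C# library. The three-field `AuditEvent` is a simplified stand-in for the canonical record, and swallowing non-cancellation failures in the composite writer is an assumption inferred from the commit note that it re-throws cancellation.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AuditEvent:
    # Simplified stand-in; the canonical record has more fields.
    actor: str
    action: str
    detail: str


class TruncatingRedactor:
    def __init__(self, max_len, marker="..."):
        self.max_len = max_len
        self.marker = marker

    def redact(self, evt):
        if len(evt.detail) <= self.max_len:
            return evt
        keep = max(self.max_len - len(self.marker), 0)
        # The final slice covers a marker that is longer than max_len.
        return replace(evt, detail=(evt.detail[:keep] + self.marker)[: self.max_len])


class ListWriter:
    """In-memory sink used here in place of real storage."""

    def __init__(self):
        self.events = []

    async def write(self, evt):
        self.events.append(evt)


class RedactingWriter:
    def __init__(self, inner, redactor):
        self.inner = inner
        self.redactor = redactor

    async def write(self, evt):
        if evt is None:
            raise ValueError("evt must not be None")
        await self.inner.write(self.redactor.redact(evt))


class CompositeWriter:
    def __init__(self, writers):
        self.writers = list(writers)

    async def write(self, evt):
        if evt is None:
            raise ValueError("evt must not be None")
        for writer in self.writers:
            try:
                await writer.write(evt)
            except asyncio.CancelledError:
                raise  # honor cancellation
            except Exception:
                continue  # assumption: one failing sink must not block the rest
```

A usage example: `CompositeWriter([RedactingWriter(sink, TruncatingRedactor(8))])` redacts before the event reaches `sink`, so the storage layer never sees the untruncated detail.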
## Per-project primary commands
Run these from inside each project directory (not from `scadaproj`).
+482
@@ -0,0 +1,482 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
##
## Get latest from `dotnet new gitignore`
# dotenv files
.env
# User-specific files
*.rsuser
*.suo
*.user
*.userosscache
*.sln.docstates
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
# Mono auto generated files
mono_crash.*
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
[Ww][Ii][Nn]32/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
[Ll]ogs/
# Visual Studio 2015/2017 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
# Visual Studio 2017 auto generated files
Generated\ Files/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
# NUnit
*.VisualState.xml
TestResult.xml
nunit-*.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
# Benchmark Results
BenchmarkDotNet.Artifacts/
# .NET
project.lock.json
project.fragment.lock.json
artifacts/
# Tye
.tye/
# ASP.NET Scaffolding
ScaffoldingReadMe.txt
# StyleCop
StyleCopReport.xml
# Files built by Visual Studio
*_i.c
*_p.c
*_h.h
*.ilk
*.meta
*.obj
*.iobj
*.pch
*.pdb
*.ipdb
*.pgc
*.pgd
*.rsp
# but not Directory.Build.rsp, as it configures directory-level build defaults
!Directory.Build.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*_wpftmp.csproj
*.log
*.tlog
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb
# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap
# Visual Studio Trace Files
*.e2e
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# AxoCover is a Code Coverage Tool
.axoCover/*
!.axoCover/settings.json
# Coverlet is a free, cross platform Code Coverage Tool
coverage*.json
coverage*.xml
coverage*.info
# Visual Studio code coverage results
*.coverage
*.coveragexml
# NCrunch
_NCrunch_*
.*crunch*.local.xml
nCrunchTemp_*
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# Note: Comment the next line if you want to checkin your web deploy settings,
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
# Microsoft Azure Web App publish settings. Comment the next line if you want to
# checkin your Azure Web App publish settings, but sensitive information contained
# in these scripts will be unencrypted
PublishScripts/
# NuGet Packages
*.nupkg
# NuGet Symbol Packages
*.snupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
#!**/[Pp]ackages/repositories.config
# NuGet v3's project.json files produces more ignorable files
*.nuget.props
*.nuget.targets
# Microsoft Azure Build Output
csx/
*.build.csdef
# Microsoft Azure Emulator
ecf/
rcf/
# Windows Store app package directories and files
AppPackages/
BundleArtifacts/
Package.StoreAssociation.xml
_pkginfo.txt
*.appx
*.appxbundle
*.appxupload
# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!?*.[Cc]ache/
# Others
ClientBin/
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.jfm
*.pfx
*.publishsettings
orleans.codegen.cs
# Including strong name files can present a security risk
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
#*.snk
# Since there are multiple workflows, uncomment next line to ignore bower_components
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
#bower_components/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
ServiceFabricBackup/
*.rptproj.bak
# SQL Server files
*.mdf
*.ldf
*.ndf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
*.rptproj.rsuser
*- [Bb]ackup.rdl
*- [Bb]ackup ([0-9]).rdl
*- [Bb]ackup ([0-9][0-9]).rdl
# Microsoft Fakes
FakesAssemblies/
# GhostDoc plugin setting file
*.GhostDoc.xml
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
*.vbw
# Visual Studio 6 auto-generated project file (contains which files were open etc.)
*.vbp
# Visual Studio 6 workspace and project file (working project files containing files to include in project)
*.dsw
*.dsp
# Visual Studio 6 technical files
*.ncb
*.aps
# Visual Studio LightSwitch build output
**/*.HTMLClient/GeneratedArtifacts
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
_Pvt_Extensions
# Paket dependency manager
.paket/paket.exe
paket-files/
# FAKE - F# Make
.fake/
# CodeRush personal settings
.cr/personal
# Python Tools for Visual Studio (PTVS)
__pycache__/
*.pyc
# Cake - Uncomment if you are using it
# tools/**
# !tools/packages.config
# Tabs Studio
*.tss
# Telerik's JustMock configuration file
*.jmconfig
# BizTalk build output
*.btp.cs
*.btm.cs
*.odx.cs
*.xsd.cs
# OpenCover UI analysis results
OpenCover/
# Azure Stream Analytics local run output
ASALocalRun/
# MSBuild Binary and Structured Log
*.binlog
# NVidia Nsight GPU debugger configuration file
*.nvuser
# MFractors (Xamarin productivity tool) working folder
.mfractor/
# Local History for Visual Studio
.localhistory/
# Visual Studio History (VSHistory) files
.vshistory/
# BeatPulse healthcheck temp database
healthchecksdb
# Backup folder for Package Reference Convert tool in Visual Studio 2017
MigrationBackup/
# Ionide (cross platform F# VS Code tools) working folder
.ionide/
# Fody - auto-generated XML schema
FodyWeavers.xsd
# VS Code files for those working on multiple tools
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# Local History for Visual Studio Code
.history/
# Windows Installer files from build outputs
*.cab
*.msi
*.msix
*.msm
*.msp
# JetBrains Rider
*.sln.iml
.idea/
##
## Visual studio for Mac
##
# globs
Makefile.in
*.userprefs
*.usertasks
config.make
config.status
aclocal.m4
install-sh
autom4te.cache/
*.tar.gz
tarballs/
test-results/
# content below from: https://github.com/github/gitignore/blob/main/Global/macOS.gitignore
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
# content below from: https://github.com/github/gitignore/blob/main/Global/Windows.gitignore
# Windows thumbnail cache files
Thumbs.db
ehthumbs.db
ehthumbs_vista.db
# Dump file
*.stackdump
# Folder config file
[Dd]esktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp
# Windows shortcuts
*.lnk
# Vim temporary swap files
*.swp
@@ -0,0 +1,10 @@
<Project>
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<Version>0.1.0</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
</Project>
@@ -0,0 +1,15 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<!-- Extensions -->
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="10.0.7" />
<!-- Test -->
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
</ItemGroup>
</Project>
@@ -0,0 +1,59 @@
# ZB.MOM.WW.Audit
Canonical audit event model, best-effort writer seam, and redactor seam for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). This is a **library, not a service** — it is linked directly into the consuming application at build time. Transport and storage remain per-project; only the shared record + seams live here.
---
## Packages
| Package | Description | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Audit` | Canonical `AuditEvent` record, `AuditOutcome` enum, `IAuditWriter` + `IAuditRedactor` seams, shipped helpers (`NullAuditRedactor`, `TruncatingAuditRedactor`, `NoOpAuditWriter`, `CompositeAuditWriter`, `RedactingAuditWriter`), and `AddZbAudit` DI extension. | `Microsoft.Extensions.DependencyInjection.Abstractions` |
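
The shipped redactors cover the identity and truncation cases only; deciding which fields are sensitive stays with each project. As a rough sketch of what a project-side redactor might look like under the `IAuditRedactor` contract (return a copy, do not throw, over-redact on internal failure): `SecretFieldRedactor`, its key list, and the `"REDACTED"` placeholder are hypothetical names for illustration, not part of the package, and only top-level JSON properties are scrubbed.

```csharp
using System;
using System.Text.Json.Nodes;
using ZB.MOM.WW.Audit;

var redactor = new SecretFieldRedactor("password", "apiKey");

var raw = new AuditEvent
{
    EventId = Guid.NewGuid(),
    OccurredAtUtc = DateTimeOffset.UtcNow,
    Actor = "alice",
    Action = "Login",
    Outcome = AuditOutcome.Success,
    DetailsJson = "{\"user\":\"alice\",\"password\":\"hunter2\"}",
};

// Apply returns a filtered copy; the original record is left untouched.
var safe = redactor.Apply(raw);
Console.WriteLine(safe.DetailsJson);

// Hypothetical project-specific redactor (not shipped by ZB.MOM.WW.Audit).
sealed class SecretFieldRedactor : IAuditRedactor
{
    private readonly string[] _secretKeys;

    public SecretFieldRedactor(params string[] secretKeys) => _secretKeys = secretKeys;

    public AuditEvent Apply(AuditEvent rawEvent)
    {
        if (rawEvent.DetailsJson is null) return rawEvent;
        try
        {
            // Only JSON objects are inspected; anything else is dropped rather than trusted.
            if (JsonNode.Parse(rawEvent.DetailsJson) is not JsonObject obj)
                return rawEvent with { DetailsJson = null };

            // Scrub top-level properties only (nested objects are not walked in this sketch).
            foreach (var key in _secretKeys)
            {
                if (obj.ContainsKey(key)) obj[key] = "REDACTED";
            }
            return rawEvent with { DetailsJson = obj.ToJsonString() };
        }
        catch
        {
            // Contract: never throw. Over-redact when the payload cannot be parsed.
            return rawEvent with { DetailsJson = null };
        }
    }
}
```

A redactor like this could sit in front of `TruncatingAuditRedactor` inside a small chaining redactor, or be passed straight to `RedactingAuditWriter`.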
---
## Consumer Matrix
| Consumer | ZB.MOM.WW.Audit |
|---|:---:|
| **OtOpcUa** | yes (adoption deferred) |
| **MxAccessGateway** | yes (adoption deferred) |
| **ScadaBridge** | yes (adoption deferred — "align, don't replace") |
Adoption is tracked in `components/audit/GAPS.md` in the outer `scadaproj` workspace. Each app brings its own transport (Akka broadcast / SQLite append / SQL ingest) and domain vocabulary (channels / kinds / event-types) — those stay per-project. The shared library provides the canonical record and the two seams that decouple "what to audit" from "how to store it".
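
A minimal sketch of how a consumer might wire its own sink through the two seams, assuming the app references `Microsoft.Extensions.DependencyInjection` for `BuildServiceProvider`. `InMemoryAuditWriter` is a hypothetical stand-in for a project's real transport (Akka broadcast, SQLite append, SQL ingest); everything else is a type shipped by this package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Audit;

var sink = new InMemoryAuditWriter();
var services = new ServiceCollection();

// Register the app's own pipeline first. AddZbAudit uses TryAdd, so these
// registrations win over the NullAuditRedactor/NoOpAuditWriter defaults.
services.AddSingleton<IAuditRedactor>(new TruncatingAuditRedactor());
services.AddSingleton<IAuditWriter>(sp => new RedactingAuditWriter(
    sp.GetRequiredService<IAuditRedactor>(),
    new CompositeAuditWriter(new IAuditWriter[] { sink })));
services.AddZbAudit();

using var provider = services.BuildServiceProvider();
var writer = provider.GetRequiredService<IAuditWriter>();

await writer.WriteAsync(new AuditEvent
{
    EventId = Guid.NewGuid(),
    OccurredAtUtc = DateTimeOffset.Now, // normalized to UTC by the init setter
    Actor = "alice",
    Action = "ConfigPublished",
    Outcome = AuditOutcome.Success,
    Target = "site-01/config",
});

Console.WriteLine($"captured {sink.Events.Count} event(s)");

// Hypothetical per-project sink; a real one would persist or forward the event.
sealed class InMemoryAuditWriter : IAuditWriter
{
    public List<AuditEvent> Events { get; } = new();

    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        Events.Add(evt);
        return Task.CompletedTask;
    }
}
```

Redaction wraps the fan-out here, so every inner writer receives the same filtered copy; a project that needs different policies per sink could instead wrap each sink in its own `RedactingAuditWriter`.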
---
## Auth alignment
`AuditEvent.Actor` is a string today. At adoption time it SHOULD be set to the `ZB.MOM.WW.Auth` principal identifier — this is the "audit closes the loop on Auth" hinge described in the spec. No compile-time dependency on `ZB.MOM.WW.Auth` is introduced here; the alignment is by convention.
---
## Versioning
The package version is set once in `Directory.Build.props` and is currently **0.1.0**. Bumping `<Version>` there bumps the package.
---
## Building and packing
```bash
# From ZB.MOM.WW.Audit/
dotnet build ZB.MOM.WW.Audit.slnx
dotnet test ZB.MOM.WW.Audit.slnx
# Produce the NuGet package into ./artifacts/
./build/pack.sh
```
---
## Design documentation
Full design docs live in the `components/audit` folder of the SCADA project workspace:
- `~/Desktop/scadaproj-audit/components/audit/spec/SPEC.md` — overall audit specification
- `~/Desktop/scadaproj-audit/components/audit/spec/EVENT-MODEL.md` — field-by-field event model + per-project mapping table
- `~/Desktop/scadaproj-audit/components/audit/shared-contract/ZB.MOM.WW.Audit.md` — public API contract (on paper)
- `~/Desktop/scadaproj-audit/components/audit/GAPS.md` — adoption backlog
@@ -0,0 +1,8 @@
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.Audit/ZB.MOM.WW.Audit.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.Audit.Tests/ZB.MOM.WW.Audit.Tests.csproj" />
</Folder>
</Solution>
@@ -0,0 +1,4 @@
#!/usr/bin/env bash
# pack.sh — produce the ZB.MOM.WW.Audit NuGet package into ./artifacts.
set -euo pipefail
dotnet pack -c Release -o ./artifacts
@@ -0,0 +1,50 @@
namespace ZB.MOM.WW.Audit;
/// <summary>
/// Canonical, transport-agnostic audit record — who did what, when, with what outcome.
/// Required core + optional common fields + a <see cref="DetailsJson"/> extension bag. Each
/// sister app maps its own record onto this; domain vocabularies (channels/kinds/event-types)
/// map into <see cref="Action"/>/<see cref="Category"/>/<see cref="DetailsJson"/> and are not
/// modelled here. See scadaproj/components/audit/spec/EVENT-MODEL.md.
/// </summary>
public sealed record AuditEvent
{
/// <summary>Idempotency key uniquely identifying this audit event.</summary>
public required Guid EventId { get; init; }
/// <summary>When the audited action occurred. Normalized to UTC on assignment.</summary>
/// <remarks>Participates in record value-equality as a normalized instant: two events whose
/// <c>OccurredAtUtc</c> denote the same instant at different offsets (e.g. <c>12:00+05:00</c> and
/// <c>07:00Z</c>) compare equal and share a hash code. Relevant to consumers that dedup/key on
/// <see cref="AuditEvent"/> value-equality.</remarks>
public required DateTimeOffset OccurredAtUtc
{
get => _occurredAtUtc;
init => _occurredAtUtc = value.ToUniversalTime();
}
private readonly DateTimeOffset _occurredAtUtc;
/// <summary>Who performed the action (identity string; the ZB.MOM.WW.Auth principal at adoption).</summary>
public required string Actor { get; init; }
/// <summary>What was done — a verb/event-type string.</summary>
public required string Action { get; init; }
/// <summary>Normalized outcome.</summary>
public required AuditOutcome Outcome { get; init; }
/// <summary>Optional subsystem/grouping for the action.</summary>
public string? Category { get; init; }
/// <summary>Optional target of the action (resource/method/connection).</summary>
public string? Target { get; init; }
/// <summary>Optional node that emitted the event.</summary>
public string? SourceNode { get; init; }
/// <summary>Optional correlation id joining this row to its originating request/workflow.</summary>
public Guid? CorrelationId { get; init; }
/// <summary>Optional JSON extension carrying project-specific fields.</summary>
public string? DetailsJson { get; init; }
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Normalized outcome of an audited action.</summary>
public enum AuditOutcome
{
/// <summary>The action completed successfully.</summary>
Success,
/// <summary>The action failed due to an error.</summary>
Failure,
/// <summary>The action was rejected by authentication/authorization.</summary>
Denied,
}
@@ -0,0 +1,21 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
namespace ZB.MOM.WW.Audit;
/// <summary>DI helpers for ZB.MOM.WW.Audit.</summary>
public static class AuditServiceCollectionExtensions
{
/// <summary>
/// Registers safe defaults — <see cref="NullAuditRedactor"/> and <see cref="NoOpAuditWriter"/> —
/// using TryAdd so a consumer that has already registered a real writer/redactor wins. Consumers
/// compose <see cref="RedactingAuditWriter"/>/<see cref="CompositeAuditWriter"/> around their own sink.
/// </summary>
public static IServiceCollection AddZbAudit(this IServiceCollection services)
{
ArgumentNullException.ThrowIfNull(services);
services.TryAddSingleton<IAuditRedactor>(NullAuditRedactor.Instance);
services.TryAddSingleton<IAuditWriter>(NoOpAuditWriter.Instance);
return services;
}
}
@@ -0,0 +1,28 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Fans an event out to several writers. Best-effort: a failing writer does not stop the others.</summary>
/// <remarks>A failing writer's exception is swallowed so the fan-out drains and the caller is never
/// aborted — but <see cref="OperationCanceledException"/> is re-thrown so cancellation is honored.</remarks>
public sealed class CompositeAuditWriter : IAuditWriter
{
private readonly IReadOnlyList<IAuditWriter> _inner;
/// <summary>Creates a composite over the given writers.</summary>
public CompositeAuditWriter(IEnumerable<IAuditWriter> inner)
{
ArgumentNullException.ThrowIfNull(inner);
_inner = inner.ToArray();
}
/// <inheritdoc />
public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
{
ArgumentNullException.ThrowIfNull(evt);
foreach (var writer in _inner)
{
try { await writer.WriteAsync(evt, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { throw; } // honor cancellation; do not swallow
catch { /* best-effort seam: a failing writer must not stop the others or the caller */ }
}
}
}
@@ -0,0 +1,13 @@
namespace ZB.MOM.WW.Audit;
/// <summary>
/// Filters an <see cref="AuditEvent"/> between construction and persistence — truncates oversized
/// fields and scrubs sensitive content. Pure function: returns a filtered COPY and MUST NOT throw
/// (over-redact on internal failure). Shaped to mirror Telemetry's <c>ILogRedactor</c> so a future
/// ZB.MOM.WW.Hosting aggregator can wire both consistently; intentionally has no dependency on it.
/// </summary>
public interface IAuditRedactor
{
/// <summary>Apply the configured truncation/redaction policy and return a filtered copy.</summary>
AuditEvent Apply(AuditEvent rawEvent);
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit;
/// <summary>
/// Best-effort sink for <see cref="AuditEvent"/>s. Implementations MUST swallow/log internal
/// failures rather than propagating them — a failed audit write must never abort the
/// user-facing action that produced it.
/// </summary>
public interface IAuditWriter
{
/// <summary>Persist an audit event. Best-effort; must not throw to the caller.</summary>
Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Writer that discards events. Default when audit is disabled, and useful in tests.</summary>
public sealed class NoOpAuditWriter : IAuditWriter
{
/// <summary>Shared singleton instance.</summary>
public static readonly NoOpAuditWriter Instance = new();
private NoOpAuditWriter() { }
/// <inheritdoc />
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => Task.CompletedTask;
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Identity redactor — returns the event unchanged. The default when no policy is configured.</summary>
public sealed class NullAuditRedactor : IAuditRedactor
{
/// <summary>Shared singleton instance.</summary>
public static readonly NullAuditRedactor Instance = new();
private NullAuditRedactor() { }
/// <inheritdoc />
public AuditEvent Apply(AuditEvent rawEvent) => rawEvent;
}
@@ -0,0 +1,24 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Decorator: applies an <see cref="IAuditRedactor"/>, then delegates to an inner <see cref="IAuditWriter"/>.</summary>
public sealed class RedactingAuditWriter : IAuditWriter
{
private readonly IAuditRedactor _redactor;
private readonly IAuditWriter _inner;
/// <summary>Creates the decorator around <paramref name="inner"/> using <paramref name="redactor"/>.</summary>
public RedactingAuditWriter(IAuditRedactor redactor, IAuditWriter inner)
{
ArgumentNullException.ThrowIfNull(redactor);
ArgumentNullException.ThrowIfNull(inner);
_redactor = redactor;
_inner = inner;
}
/// <inheritdoc />
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
{
ArgumentNullException.ThrowIfNull(evt);
return _inner.WriteAsync(_redactor.Apply(evt), ct);
}
}
@@ -0,0 +1,41 @@
namespace ZB.MOM.WW.Audit;
/// <summary>
/// Redactor that caps oversized <see cref="AuditEvent.DetailsJson"/> and <see cref="AuditEvent.Target"/>.
/// Never throws — over-redacts (drops DetailsJson) on internal failure. The secret-field policy
/// (which fields are sensitive) stays per-project; compose this with a project redactor as needed.
/// </summary>
public sealed class TruncatingAuditRedactor : IAuditRedactor
{
private readonly TruncatingAuditRedactorOptions _options;
/// <summary>Creates the redactor with the given options (defaults when null).</summary>
public TruncatingAuditRedactor(TruncatingAuditRedactorOptions? options = null)
=> _options = options ?? new TruncatingAuditRedactorOptions();
/// <inheritdoc />
public AuditEvent Apply(AuditEvent rawEvent)
{
try
{
return rawEvent with
{
Target = Truncate(rawEvent.Target, _options.MaxTargetLength),
DetailsJson = Truncate(rawEvent.DetailsJson, _options.MaxDetailsJsonLength),
};
}
catch
{
// Hard contract: never throw. Over-redact on internal failure.
return rawEvent with { DetailsJson = null };
}
}
private string? Truncate(string? value, int max)
{
if (value is null || value.Length <= max) return value;
var marker = _options.TruncationMarker;
if (marker.Length >= max) return marker[..max];
return string.Concat(value.AsSpan(0, max - marker.Length), marker);
}
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit;
/// <summary>Caps for <see cref="TruncatingAuditRedactor"/>.</summary>
public sealed class TruncatingAuditRedactorOptions
{
/// <summary>Max length of <see cref="AuditEvent.DetailsJson"/> before truncation. Default 4096.</summary>
public int MaxDetailsJsonLength { get; set; } = 4096;
/// <summary>Max length of <see cref="AuditEvent.Target"/> before truncation. Default 512.</summary>
public int MaxTargetLength { get; set; } = 512;
/// <summary>Marker appended to a truncated value. Default "…[truncated]".</summary>
public string TruncationMarker { get; set; } = "…[truncated]";
}
@@ -0,0 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Audit</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Canonical audit event model + best-effort writer and redactor seams for the ZB.MOM.WW SCADA family.</Description>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-audit</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-audit</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" />
</ItemGroup>
</Project>
@@ -0,0 +1,67 @@
namespace ZB.MOM.WW.Audit.Tests;
public class AuditEventTests
{
private static AuditEvent Minimal() => new()
{
EventId = Guid.NewGuid(),
OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "alice",
Action = "ConfigPublished",
Outcome = AuditOutcome.Success,
};
[Fact]
public void Required_core_fields_round_trip()
{
var id = Guid.NewGuid();
var evt = Minimal() with { EventId = id, Actor = "svc", Action = "ApiCall", Outcome = AuditOutcome.Denied };
Assert.Equal(id, evt.EventId);
Assert.Equal("svc", evt.Actor);
Assert.Equal("ApiCall", evt.Action);
Assert.Equal(AuditOutcome.Denied, evt.Outcome);
}
[Fact]
public void OccurredAtUtc_is_normalized_to_utc()
{
var local = new DateTimeOffset(2026, 6, 1, 12, 0, 0, TimeSpan.FromHours(5));
var evt = Minimal() with { OccurredAtUtc = local };
Assert.Equal(TimeSpan.Zero, evt.OccurredAtUtc.Offset);
Assert.Equal(local.UtcDateTime, evt.OccurredAtUtc.UtcDateTime);
}
[Fact]
public void Optional_fields_default_to_null()
{
var evt = Minimal();
Assert.Null(evt.Category);
Assert.Null(evt.Target);
Assert.Null(evt.SourceNode);
Assert.Null(evt.CorrelationId);
Assert.Null(evt.DetailsJson);
}
[Fact]
public void Records_with_same_values_are_equal()
{
var id = Guid.NewGuid();
var when = DateTimeOffset.UtcNow;
AuditEvent Make() => new() { EventId = id, OccurredAtUtc = when, Actor = "a", Action = "x", Outcome = AuditOutcome.Success };
Assert.Equal(Make(), Make());
}
[Fact]
public void Same_instant_at_different_offset_compares_equal()
{
// Pins instant-based equality for the UTC-normalizing init-setter. DateTimeOffset itself compares and
// hashes by UTC instant, so the same instant at different offsets must yield equal events and hash codes.
var id = Guid.NewGuid();
var utc = new DateTimeOffset(2026, 6, 1, 7, 0, 0, TimeSpan.Zero);
var plus5 = new DateTimeOffset(2026, 6, 1, 12, 0, 0, TimeSpan.FromHours(5)); // same instant as utc
AuditEvent With(DateTimeOffset when) =>
new() { EventId = id, OccurredAtUtc = when, Actor = "a", Action = "x", Outcome = AuditOutcome.Success };
Assert.Equal(With(utc), With(plus5));
Assert.Equal(With(utc).GetHashCode(), With(plus5).GetHashCode());
}
}
@@ -0,0 +1,32 @@
using Microsoft.Extensions.DependencyInjection;
namespace ZB.MOM.WW.Audit.Tests;
public class AuditServiceCollectionExtensionsTests
{
[Fact]
public void Registers_null_redactor_and_noop_writer_by_default()
{
var sp = new ServiceCollection().AddZbAudit().BuildServiceProvider();
Assert.IsType<NullAuditRedactor>(sp.GetRequiredService<IAuditRedactor>());
Assert.IsType<NoOpAuditWriter>(sp.GetRequiredService<IAuditWriter>());
}
[Fact]
public void Does_not_override_a_preregistered_writer()
{
var services = new ServiceCollection();
services.AddSingleton<IAuditWriter>(new CompositeAuditWriter(System.Array.Empty<IAuditWriter>()));
var sp = services.AddZbAudit().BuildServiceProvider();
Assert.IsType<CompositeAuditWriter>(sp.GetRequiredService<IAuditWriter>());
}
[Fact]
public void Does_not_override_a_preregistered_redactor()
{
var services = new ServiceCollection();
services.AddSingleton<IAuditRedactor>(new TruncatingAuditRedactor());
var sp = services.AddZbAudit().BuildServiceProvider();
Assert.IsType<TruncatingAuditRedactor>(sp.GetRequiredService<IAuditRedactor>());
}
}
@@ -0,0 +1,48 @@
namespace ZB.MOM.WW.Audit.Tests;
public class CompositeAuditWriterTests
{
private sealed class RecordingWriter : IAuditWriter
{
public int Count;
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) { Count++; return Task.CompletedTask; }
}
private sealed class ThrowingWriter : IAuditWriter
{
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => throw new InvalidOperationException("boom");
}
private sealed class CancellingWriter : IAuditWriter
{
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => throw new OperationCanceledException();
}
private static AuditEvent Evt() => new() { EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "a", Action = "x", Outcome = AuditOutcome.Success };
[Fact]
public async Task Fans_out_to_all_writers()
{
var a = new RecordingWriter(); var b = new RecordingWriter();
await new CompositeAuditWriter(new IAuditWriter[] { a, b }).WriteAsync(Evt());
Assert.Equal(1, a.Count);
Assert.Equal(1, b.Count);
}
[Fact]
public async Task One_failing_writer_does_not_stop_the_others()
{
var after = new RecordingWriter();
var sut = new CompositeAuditWriter(new IAuditWriter[] { new ThrowingWriter(), after });
await sut.WriteAsync(Evt()); // must not throw
Assert.Equal(1, after.Count);
}
[Fact]
public async Task Cancellation_is_propagated_not_swallowed()
{
// OperationCanceledException is re-thrown (unlike ordinary writer failures, which are swallowed).
var after = new RecordingWriter();
var sut = new CompositeAuditWriter(new IAuditWriter[] { new CancellingWriter(), after });
await Assert.ThrowsAsync<OperationCanceledException>(() => sut.WriteAsync(Evt()));
}
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit.Tests;
public class NoOpAuditWriterTests
{
[Fact]
public async Task WriteAsync_completes_without_error()
{
var evt = new AuditEvent { EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "a", Action = "x", Outcome = AuditOutcome.Success };
await NoOpAuditWriter.Instance.WriteAsync(evt);
}
}
@@ -0,0 +1,12 @@
namespace ZB.MOM.WW.Audit.Tests;
public class NullAuditRedactorTests
{
[Fact]
public void Apply_returns_input_unchanged()
{
var evt = new AuditEvent { EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "a", Action = "x", Outcome = AuditOutcome.Success, DetailsJson = "{\"k\":1}" };
Assert.Same(evt, NullAuditRedactor.Instance.Apply(evt));
}
}
@@ -0,0 +1,26 @@
namespace ZB.MOM.WW.Audit.Tests;
public class RedactingAuditWriterTests
{
private sealed class CapturingWriter : IAuditWriter
{
public AuditEvent? Last;
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) { Last = evt; return Task.CompletedTask; }
}
private sealed class StampRedactor : IAuditRedactor
{
public AuditEvent Apply(AuditEvent rawEvent) => rawEvent with { DetailsJson = "redacted" };
}
private static AuditEvent Evt() => new() { EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "a", Action = "x", Outcome = AuditOutcome.Success, DetailsJson = "secret" };
[Fact]
public async Task Inner_writer_receives_the_redacted_event()
{
var inner = new CapturingWriter();
var sut = new RedactingAuditWriter(new StampRedactor(), inner);
await sut.WriteAsync(Evt());
Assert.Equal("redacted", inner.Last!.DetailsJson);
}
}
@@ -0,0 +1,56 @@
namespace ZB.MOM.WW.Audit.Tests;
public class TruncatingAuditRedactorTests
{
private static AuditEvent Evt(string? details, string? target = null) => new()
{
EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow,
Actor = "a", Action = "x", Outcome = AuditOutcome.Success,
DetailsJson = details, Target = target,
};
[Fact]
public void Short_values_pass_through_unchanged()
{
var r = new TruncatingAuditRedactor(new() { MaxDetailsJsonLength = 100 });
var evt = Evt("small");
Assert.Equal("small", r.Apply(evt).DetailsJson);
}
[Fact]
public void Oversized_details_are_truncated_with_marker()
{
var opts = new TruncatingAuditRedactorOptions { MaxDetailsJsonLength = 10, TruncationMarker = "~" };
var r = new TruncatingAuditRedactor(opts);
var result = r.Apply(Evt(new string('x', 50)));
Assert.Equal(10, result.DetailsJson!.Length);
Assert.EndsWith("~", result.DetailsJson);
}
[Fact]
public void Oversized_target_is_truncated()
{
var r = new TruncatingAuditRedactor(new() { MaxTargetLength = 5, TruncationMarker = "" });
var result = r.Apply(Evt(null, target: "abcdefghij"));
Assert.Equal(5, result.Target!.Length);
}
[Fact]
public void Null_fields_are_left_null()
{
var r = new TruncatingAuditRedactor();
var result = r.Apply(Evt(null));
Assert.Null(result.DetailsJson);
Assert.Null(result.Target);
}
[Fact]
public void Marker_longer_than_max_clips_the_marker_itself()
{
// Misconfiguration: marker longer than the cap. Must not throw; clips to the first max chars.
var opts = new TruncatingAuditRedactorOptions { MaxDetailsJsonLength = 3, TruncationMarker = "…[truncated]" };
var r = new TruncatingAuditRedactor(opts);
var result = r.Apply(Evt(new string('x', 20)));
Assert.Equal(3, result.DetailsJson!.Length);
}
}
@@ -0,0 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Audit\ZB.MOM.WW.Audit.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,482 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
##
## Get latest from `dotnet new gitignore`
# dotenv files
.env
# User-specific files
*.rsuser
*.suo
*.user
*.userosscache
*.sln.docstates
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
# Mono auto generated files
mono_crash.*
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
[Ww][Ii][Nn]32/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
[Ll]ogs/
# Visual Studio 2015/2017 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
# Visual Studio 2017 auto generated files
Generated\ Files/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
# NUnit
*.VisualState.xml
TestResult.xml
nunit-*.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
# Benchmark Results
BenchmarkDotNet.Artifacts/
# .NET
project.lock.json
project.fragment.lock.json
artifacts/
# Tye
.tye/
# ASP.NET Scaffolding
ScaffoldingReadMe.txt
# StyleCop
StyleCopReport.xml
# Files built by Visual Studio
*_i.c
*_p.c
*_h.h
*.ilk
*.meta
*.obj
*.iobj
*.pch
*.pdb
*.ipdb
*.pgc
*.pgd
*.rsp
# but not Directory.Build.rsp, as it configures directory-level build defaults
!Directory.Build.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*_wpftmp.csproj
*.log
*.tlog
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb
# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap
# Visual Studio Trace Files
*.e2e
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# AxoCover is a Code Coverage Tool
.axoCover/*
!.axoCover/settings.json
# Coverlet is a free, cross platform Code Coverage Tool
coverage*.json
coverage*.xml
coverage*.info
# Visual Studio code coverage results
*.coverage
*.coveragexml
# NCrunch
_NCrunch_*
.*crunch*.local.xml
nCrunchTemp_*
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# Note: Comment the next line if you want to checkin your web deploy settings,
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
# Microsoft Azure Web App publish settings. Comment the next line if you want to
# checkin your Azure Web App publish settings, but sensitive information contained
# in these scripts will be unencrypted
PublishScripts/
# NuGet Packages
*.nupkg
# NuGet Symbol Packages
*.snupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
#!**/[Pp]ackages/repositories.config
# NuGet v3's project.json files produces more ignorable files
*.nuget.props
*.nuget.targets
# Microsoft Azure Build Output
csx/
*.build.csdef
# Microsoft Azure Emulator
ecf/
rcf/
# Windows Store app package directories and files
AppPackages/
BundleArtifacts/
Package.StoreAssociation.xml
_pkginfo.txt
*.appx
*.appxbundle
*.appxupload
# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!?*.[Cc]ache/
# Others
ClientBin/
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.jfm
*.pfx
*.publishsettings
orleans.codegen.cs
# Including strong name files can present a security risk
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
#*.snk
# Since there are multiple workflows, uncomment next line to ignore bower_components
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
#bower_components/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
ServiceFabricBackup/
*.rptproj.bak
# SQL Server files
*.mdf
*.ldf
*.ndf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
*.rptproj.rsuser
*- [Bb]ackup.rdl
*- [Bb]ackup ([0-9]).rdl
*- [Bb]ackup ([0-9][0-9]).rdl
# Microsoft Fakes
FakesAssemblies/
# GhostDoc plugin setting file
*.GhostDoc.xml
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
*.vbw
# Visual Studio 6 auto-generated project file (contains which files were open etc.)
*.vbp
# Visual Studio 6 workspace and project file (working project files containing files to include in project)
*.dsw
*.dsp
# Visual Studio 6 technical files
*.ncb
*.aps
# Visual Studio LightSwitch build output
**/*.HTMLClient/GeneratedArtifacts
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
_Pvt_Extensions
# Paket dependency manager
.paket/paket.exe
paket-files/
# FAKE - F# Make
.fake/
# CodeRush personal settings
.cr/personal
# Python Tools for Visual Studio (PTVS)
__pycache__/
*.pyc
# Cake - Uncomment if you are using it
# tools/**
# !tools/packages.config
# Tabs Studio
*.tss
# Telerik's JustMock configuration file
*.jmconfig
# BizTalk build output
*.btp.cs
*.btm.cs
*.odx.cs
*.xsd.cs
# OpenCover UI analysis results
OpenCover/
# Azure Stream Analytics local run output
ASALocalRun/
# MSBuild Binary and Structured Log
*.binlog
# NVidia Nsight GPU debugger configuration file
*.nvuser
# MFractors (Xamarin productivity tool) working folder
.mfractor/
# Local History for Visual Studio
.localhistory/
# Visual Studio History (VSHistory) files
.vshistory/
# BeatPulse healthcheck temp database
healthchecksdb
# Backup folder for Package Reference Convert tool in Visual Studio 2017
MigrationBackup/
# Ionide (cross platform F# VS Code tools) working folder
.ionide/
# Fody - auto-generated XML schema
FodyWeavers.xsd
# VS Code files for those working on multiple tools
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# Local History for Visual Studio Code
.history/
# Windows Installer files from build outputs
*.cab
*.msi
*.msix
*.msm
*.msp
# JetBrains Rider
*.sln.iml
.idea/
##
## Visual studio for Mac
##
# globs
Makefile.in
*.userprefs
*.usertasks
config.make
config.status
aclocal.m4
install-sh
autom4te.cache/
*.tar.gz
tarballs/
test-results/
# content below from: https://github.com/github/gitignore/blob/main/Global/macOS.gitignore
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
# content below from: https://github.com/github/gitignore/blob/main/Global/Windows.gitignore
# Windows thumbnail cache files
Thumbs.db
ehthumbs.db
ehthumbs_vista.db
# Dump file
*.stackdump
# Folder config file
[Dd]esktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp
# Windows shortcuts
*.lnk
# Vim temporary swap files
*.swp
@@ -0,0 +1,72 @@
# ZB.MOM.WW.Health
Health-check libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). These are **libraries, not a service** — each package is linked directly into the consuming application at build time. There is no central health process or network hop; probes run in-process alongside the application.
The library normalizes the three-tier health endpoint convention (`/health/ready`, `/health/active`, `/healthz`) and provides reusable probe implementations so the three sister projects share a common surface without duplicating probe logic.
**Built at 0.1.0. NOT yet adopted by the three apps.** Adoption is tracked in `~/Desktop/scadaproj/components/health/GAPS.md`.
---
## Packages
| Package | Responsibilities | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tier convention, `MapZbHealth` extension, canonical JSON writer (`ZbHealthWriter`), `IActiveNodeGate` seam, `GrpcDependencyHealthCheck` reachability probe, tier-tag constants (`ZbHealthTags`). No Akka or EF dependency. | `Microsoft.AspNetCore.App` (framework ref), `Grpc.Net.Client` |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` with a configurable `AkkaClusterStatusPolicy` (presets: `Default` / `OtOpcUaCompat`), `ActiveNodeHealthCheck` with an optional role filter, and `AkkaActiveNodeGate` that backs `IActiveNodeGate` from cluster member state. | `ZB.MOM.WW.Health`, `Akka.Cluster` |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck<TContext>` — resolves `IDbContextFactory<TContext>` when registered, else a scoped `TContext`; pool-safe. Default probe: `CanConnectAsync`. Optional `ProbeQuery` delegate for query-based validation. | `ZB.MOM.WW.Health`, `Microsoft.EntityFrameworkCore` |
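
A minimal registration sketch for the EF Core probe, using only the standard `HealthCheckRegistration` API. Several things here are assumptions rather than part of the library: `AppDbContext`, `Deployment`, and the SQLite connection string are hypothetical (the latter needs `Microsoft.EntityFrameworkCore.Sqlite`), and `ZbHealthTags.Ready` is assumed to be a string constant in the `ZB.MOM.WW.Health` namespace.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health;
using ZB.MOM.WW.Health.EntityFrameworkCore;

var services = new ServiceCollection();

// With a registered factory, each probe gets a fresh context that is disposed afterwards.
services.AddDbContextFactory<AppDbContext>(o => o.UseSqlite("Data Source=app.db"));

services.AddHealthChecks().Add(new HealthCheckRegistration(
    "database",
    sp => new DatabaseHealthCheck<AppDbContext>(sp, new DatabaseHealthCheckOptions<AppDbContext>
    {
        // Stricter than CanConnectAsync: the probe fails if the table cannot be queried.
        ProbeQuery = (db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct),
        Timeout = TimeSpan.FromSeconds(5),
    }),
    failureStatus: null, // null falls back to HealthStatus.Unhealthy
    tags: new[] { ZbHealthTags.Ready })); // assumes ZbHealthTags.Ready is a string constant

// Hypothetical entity and context, for illustration only.
public sealed class Deployment
{
    public int Id { get; set; }
}

public sealed class AppDbContext(DbContextOptions<AppDbContext> options) : DbContext(options)
{
    public DbSet<Deployment> Deployments => Set<Deployment>();
}
```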
---
## Consumer matrix
| Consumer | `ZB.MOM.WW.Health` (core) | `ZB.MOM.WW.Health.Akka` | `ZB.MOM.WW.Health.EntityFrameworkCore` |
|---|:---:|:---:|:---:|
| **OtOpcUa** | yes | yes | yes |
| **MxAccessGateway** | yes | — | — |
| **ScadaBridge** | yes | yes | yes |
MxAccessGateway consumes the core package only — it has no Akka cluster and no EF DbContext. OtOpcUa and ScadaBridge consume all three packages.
---
## Build, test, and pack commands
```bash
# From ZB.MOM.WW.Health/
# Build
dotnet build ZB.MOM.WW.Health.slnx
dotnet build ZB.MOM.WW.Health.slnx -c Release
# Test (no external dependencies — no running Akka cluster, no database)
dotnet test ZB.MOM.WW.Health.slnx
# Pack (three .nupkg files land in artifacts/)
dotnet pack ZB.MOM.WW.Health.slnx -c Release -o ./artifacts
```
All three test assemblies run offline:
| Assembly | Tests |
|---|---|
| `ZB.MOM.WW.Health.Tests` | 20 |
| `ZB.MOM.WW.Health.Akka.Tests` | 32 |
| `ZB.MOM.WW.Health.EntityFrameworkCore.Tests` | 6 |
| **Total** | **58** |
`GeneratePackageOnBuild` is off — pack explicitly with the command above.
---
## Status
Built at **0.1.0** and published to the Gitea NuGet feed. **Not yet adopted** by the three apps — adoption is tracked in the component backlog:
- `~/Desktop/scadaproj/components/health/GAPS.md` — adoption order, effort, and risk
Design documentation:
- `~/Desktop/scadaproj/components/health/spec/SPEC.md` — normalized three-tier target
- `~/Desktop/scadaproj/components/health/shared-contract/ZB.MOM.WW.Health.md` — proposed API (aligned to shipped code)
- `~/Desktop/scadaproj/components/health/current-state/` — per-project current state (code-verified)
@@ -0,0 +1,12 @@
<Project>
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<Version>0.1.0</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
</Project>
@@ -0,0 +1,31 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<!-- Akka -->
<PackageVersion Include="Akka.Cluster" Version="1.5.62" />
<PackageVersion Include="Akka.TestKit.Xunit2" Version="1.5.62" />
<!-- Health Checks / ASP.NET Core -->
<PackageVersion Include="Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions" Version="10.0.7" />
<PackageVersion Include="Microsoft.AspNetCore.Mvc.Testing" Version="10.0.7" />
<!-- gRPC -->
<PackageVersion Include="Grpc.Net.Client" Version="2.71.0" />
<!-- Entity Framework Core -->
<PackageVersion Include="Microsoft.EntityFrameworkCore" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.Sqlite" Version="10.0.7" />
<PackageVersion Include="Microsoft.EntityFrameworkCore.InMemory" Version="10.0.7" />
<!-- Test -->
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
</ItemGroup>
</Project>
@@ -0,0 +1,84 @@
# ZB.MOM.WW.Health
Health-check libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). These are **libraries, not a service** — each package is linked directly into the consuming application at build time. There is no central health process or network hop; probes run in-process alongside the application.
The library normalizes the three-tier health endpoint convention (`/health/ready`, `/health/active`, `/healthz`) and provides reusable probe implementations so the three sister projects share a common surface without duplicating probe logic.
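
To illustrate what a three-tier convention of this kind amounts to, here is a rough sketch built only on the stock ASP.NET Core `MapHealthChecks` API. It is not the `MapZbHealth` implementation. The tag strings `"ready"` and `"active"`, and the treatment of `/healthz` as a liveness endpoint that runs no checks, are assumptions made for the example. It expects a project using the Web SDK with implicit usings enabled.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Two placeholder checks, one per tier tag (tag strings are assumed).
builder.Services.AddHealthChecks()
    .AddCheck("dependencies", () => HealthCheckResult.Healthy(), tags: new[] { "ready" })
    .AddCheck("leader", () => HealthCheckResult.Healthy(), tags: new[] { "active" });

var app = builder.Build();

// Readiness: only checks tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready"),
});

// Active node: only checks tagged "active".
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active"),
});

// Liveness (assumed semantics): run no checks, answer as long as the process serves HTTP.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = _ => false,
});

app.Run();
```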
---
## Packages
| Package | Description | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tier convention, `MapZbHealth` extension, canonical JSON writer (`ZbHealthWriter`), `IActiveNodeGate` seam, `GrpcDependencyHealthCheck` reachability probe, and tier-tag constants (`ZbHealthTags`). No Akka or EF dependency. | `Microsoft.AspNetCore.App` (framework ref), `Grpc.Net.Client` |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` with a configurable `AkkaClusterStatusPolicy` (presets: `Default` three-way / `OtOpcUaCompat` two-way), `ActiveNodeHealthCheck` with an optional role filter, and `AkkaActiveNodeGate` that backs `IActiveNodeGate` from the cluster member state. | `ZB.MOM.WW.Health`, `Akka.Cluster` |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck<TContext>` with `CanConnectAsync` by default and an optional `ProbeQuery` delegate for custom connectivity validation. | `ZB.MOM.WW.Health`, `Microsoft.EntityFrameworkCore` |
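
The difference between the two `AkkaClusterStatusPolicy` presets, and how a project-specific policy could be plugged in, can be sketched as follows. `strictPolicy` and the check names are hypothetical, the registration uses the standard `HealthCheckRegistration` API rather than any library helper, and `ZbHealthTags.Ready` / `ZbHealthTags.Active` are assumed to be string constants in the `ZB.MOM.WW.Health` namespace.

```csharp
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health;
using ZB.MOM.WW.Health.Akka;

// Default is three-way; OtOpcUaCompat is two-way (never Unhealthy).
Console.WriteLine(AkkaClusterStatusPolicy.Default.Evaluate(MemberStatus.Down));        // Unhealthy
Console.WriteLine(AkkaClusterStatusPolicy.OtOpcUaCompat.Evaluate(MemberStatus.Down));  // Degraded

// Hypothetical custom policy: like Default, but a Joining node is only Degraded.
var strictPolicy = new AkkaClusterStatusPolicy(status => status switch
{
    MemberStatus.Up => HealthStatus.Healthy,
    MemberStatus.Joining or MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
    _ => HealthStatus.Unhealthy,
});

var services = new ServiceCollection();

services.AddHealthChecks()
    // Cluster membership check, driven by the custom policy.
    .Add(new HealthCheckRegistration(
        "akka-cluster",
        sp => new AkkaClusterHealthCheck(sp, strictPolicy),
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active }))
    // Role-less active-node check: Healthy only on the Up cluster leader.
    .Add(new HealthCheckRegistration(
        "active-node",
        sp => new ActiveNodeHealthCheck(sp),
        failureStatus: null,
        tags: new[] { ZbHealthTags.Active }));

// The gate reads cluster state directly; IServiceProvider is injected by the container.
services.AddSingleton<IActiveNodeGate, AkkaActiveNodeGate>();
```

Both checks resolve the `ActorSystem` lazily, so registering them before Akka has started yields Degraded results rather than exceptions.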
---
## Consumer Matrix
| Consumer | `ZB.MOM.WW.Health` (core) | `ZB.MOM.WW.Health.Akka` | `ZB.MOM.WW.Health.EntityFrameworkCore` |
|---|:---:|:---:|:---:|
| **OtOpcUa** | yes (+ `GrpcDependencyHealthCheck` for the MxAccessGateway channel) | yes | yes |
| **MxAccessGateway** | yes (+ `GrpcDependencyHealthCheck` for the x86 worker IPC) | — | — |
| **ScadaBridge** | yes | yes | yes |
MxAccessGateway consumes the core package only — it has no Akka cluster and no EF DbContext. OtOpcUa and ScadaBridge consume all three packages.
---
## Versioning
All three packages are versioned **lockstep** from `Directory.Build.props`. The current release is **0.1.0**. A single version bump in `Directory.Build.props` bumps all three packages simultaneously — consumers should reference the same version for all ZB.MOM.WW.Health packages.
---
## Building and testing
```bash
# from ZB.MOM.WW.Health/
dotnet build ZB.MOM.WW.Health.slnx
dotnet test ZB.MOM.WW.Health.slnx
```
All three test assemblies run with `dotnet test` and require no external dependencies (no running Akka cluster, no database):
| Assembly | Tests |
|---|---|
| `ZB.MOM.WW.Health.Tests` | 20 |
| `ZB.MOM.WW.Health.Akka.Tests` | 32 |
| `ZB.MOM.WW.Health.EntityFrameworkCore.Tests` | 6 |
| **Total** | **58** |
---
## Packing
```bash
dotnet pack ZB.MOM.WW.Health.slnx -c Release -o ./artifacts
```
Produces three `.nupkg` files in `artifacts/`:
```
ZB.MOM.WW.Health.0.1.0.nupkg
ZB.MOM.WW.Health.Akka.0.1.0.nupkg
ZB.MOM.WW.Health.EntityFrameworkCore.0.1.0.nupkg
```
`GeneratePackageOnBuild` is off — pack explicitly as above.
---
## Status
**Built at 0.1.0. NOT yet adopted by the three apps.** Adoption is tracked in the component backlog:
- `~/Desktop/scadaproj/components/health/GAPS.md`
Design documentation lives alongside that backlog:
- `~/Desktop/scadaproj/components/health/spec/SPEC.md` — normalized three-tier target
- `~/Desktop/scadaproj/components/health/shared-contract/ZB.MOM.WW.Health.md` — proposed API
- `~/Desktop/scadaproj/components/health/current-state/` — per-project current state (code-verified)
@@ -0,0 +1,12 @@
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.Health.Akka/ZB.MOM.WW.Health.Akka.csproj" />
<Project Path="src/ZB.MOM.WW.Health.EntityFrameworkCore/ZB.MOM.WW.Health.EntityFrameworkCore.csproj" />
<Project Path="src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.Health.Akka.Tests/ZB.MOM.WW.Health.Akka.Tests.csproj" />
<Project Path="tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests.csproj" />
<Project Path="tests/ZB.MOM.WW.Health.Tests/ZB.MOM.WW.Health.Tests.csproj" />
</Folder>
</Solution>
@@ -0,0 +1,153 @@
using System.Runtime.CompilerServices;
using Akka.Actor;
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
[assembly: InternalsVisibleTo("ZB.MOM.WW.Health.Akka.Tests")]
namespace ZB.MOM.WW.Health.Akka;
/// <summary>
/// Pure decision function for the active / leader probe, factored out of
/// <see cref="ActiveNodeHealthCheck"/> so the role-less and role-filtered matrices are exhaustively
/// table-testable without forming a real cluster.
/// </summary>
internal static class ActiveNodeDecision
{
/// <summary>
/// Maps the resolved cluster facts to a <see cref="HealthStatus"/>.
/// </summary>
/// <param name="selfUp">
/// Whether the local node's member status is <c>Up</c>. Only consulted in role-less mode; the
/// role-filtered branch does not read it.
/// </param>
/// <param name="isLeader">
/// Whether the local node is the leader: the cluster leader in role-less mode, or the
/// role-singleton leader in role-filtered mode.
/// </param>
/// <param name="hasRole">
/// Whether the local node carries <paramref name="requiredRole"/>. Ignored when
/// <paramref name="requiredRole"/> is <c>null</c>.
/// </param>
/// <param name="requiredRole">
/// The role to scope the check to, or <c>null</c> for the role-less (whole-cluster-leader) mode.
/// </param>
/// <returns>
/// Role-less: Healthy iff the node is Up and the cluster leader, otherwise Unhealthy.
/// Role-filtered: Healthy when the node lacks the role (probe irrelevant) or carries the role and
/// is the role-singleton leader; Degraded when it carries the role but is not the leader.
/// </returns>
public static HealthStatus Evaluate(bool selfUp, bool isLeader, bool hasRole, string? requiredRole)
{
if (requiredRole is null)
return selfUp && isLeader ? HealthStatus.Healthy : HealthStatus.Unhealthy;
if (!hasRole)
return HealthStatus.Healthy;
return isLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
}
}
/// <summary>
/// Health check that reports whether this node is the designated active / leader node.
/// An optional role scopes the check to nodes carrying that role. Register to the
/// <see cref="ZbHealthTags.Active"/> tag.
/// </summary>
/// <remarks>
/// The <see cref="ActorSystem"/> is resolved lazily from the service provider. If it is not yet
/// available — e.g. during startup before Akka is initialised — the check returns
/// <see cref="HealthStatus.Degraded"/> rather than throwing, so it is startup-safe.
/// </remarks>
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
private readonly IServiceProvider _serviceProvider;
private readonly string? _role;
/// <summary>
/// Role-less constructor: Healthy when the node is <c>Up</c> and the cluster leader
/// (ScadaBridge ActiveNode pattern); Unhealthy otherwise. Degraded when the ActorSystem /
/// cluster is not yet ready.
/// </summary>
/// <param name="serviceProvider">
/// The application service provider. The <see cref="ActorSystem"/> is resolved lazily so the
/// check is startup-safe: if no <see cref="ActorSystem"/> is registered yet the result is Degraded.
/// </param>
public ActiveNodeHealthCheck(IServiceProvider serviceProvider)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
_role = null;
}
/// <summary>
/// Role-filtered constructor: Healthy when the node lacks <paramref name="role"/> or carries it
/// and is the role-singleton leader; Degraded when it carries the role but is not the leader
/// (OtOpcUa AdminRoleLeader pattern). Degraded when the ActorSystem / cluster is not yet ready.
/// </summary>
/// <param name="serviceProvider">
/// The application service provider. The <see cref="ActorSystem"/> is resolved lazily so the
/// check is startup-safe: if no <see cref="ActorSystem"/> is registered yet the result is Degraded.
/// </param>
/// <param name="role">The Akka cluster role to scope the check to.</param>
public ActiveNodeHealthCheck(IServiceProvider serviceProvider, string role)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
ArgumentException.ThrowIfNullOrWhiteSpace(role);
_role = role;
}
/// <inheritdoc />
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
var system = _serviceProvider.GetService<ActorSystem>();
if (system is null)
return Task.FromResult(HealthCheckResult.Degraded("ActorSystem not yet available."));
var cluster = Cluster.Get(system);
var self = cluster.SelfMember;
var selfUp = self.Status == MemberStatus.Up;
bool hasRole;
bool isLeader;
if (_role is null)
{
hasRole = false;
var leader = cluster.State.Leader;
isLeader = leader is not null && leader == self.Address;
}
else
{
hasRole = self.HasRole(_role);
var roleLeader = cluster.State.RoleLeader(_role);
isLeader = roleLeader is not null && roleLeader == self.Address;
}
var health = ActiveNodeDecision.Evaluate(selfUp, isLeader, hasRole, _role);
var description = DescribeResult(health, self.Status, selfUp, isLeader);
var result = health switch
{
HealthStatus.Healthy => HealthCheckResult.Healthy(description),
HealthStatus.Degraded => HealthCheckResult.Degraded(description),
_ => HealthCheckResult.Unhealthy(description),
};
return Task.FromResult(result);
}
private string DescribeResult(HealthStatus health, MemberStatus status, bool selfUp, bool isLeader)
{
if (_role is null)
{
if (health == HealthStatus.Healthy)
return "Active node (cluster leader).";
return selfUp && !isLeader
? "Standby: node is Up but not the cluster leader."
: $"Standby: node is not Up (status: {status}).";
}
return health switch
{
HealthStatus.Healthy => $"Active for role '{_role}' (or not a role member).",
_ => $"Role '{_role}' member but not leader.",
};
}
}
@@ -0,0 +1,50 @@
using Akka.Actor;
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
namespace ZB.MOM.WW.Health.Akka;
/// <summary>
/// <see cref="IActiveNodeGate"/> implementation that computes <see cref="IsActiveNode"/> directly
/// from the Akka cluster state (self member <c>Up</c> and the local node is the cluster leader).
/// Register as a singleton.
/// </summary>
/// <remarks>
/// The <see cref="ActorSystem"/> is resolved lazily from the service provider; if it is not yet
/// available — e.g. during startup before Akka is initialised — <see cref="IsActiveNode"/> returns
/// <c>false</c> (the safe default during startup). This gate reads the cluster state directly and
/// does not resolve <see cref="ActiveNodeHealthCheck"/> from DI.
/// </remarks>
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
private readonly IServiceProvider _serviceProvider;
/// <summary>Initializes a new <see cref="AkkaActiveNodeGate"/>.</summary>
/// <param name="serviceProvider">
/// The application service provider. The <see cref="ActorSystem"/> is resolved lazily; if it is
/// not yet available <see cref="IsActiveNode"/> returns <c>false</c>.
/// </param>
public AkkaActiveNodeGate(IServiceProvider serviceProvider)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
}
/// <inheritdoc />
public bool IsActiveNode
{
get
{
var system = _serviceProvider.GetService<ActorSystem>();
if (system is null)
return false;
var cluster = Cluster.Get(system);
var self = cluster.SelfMember;
if (self.Status != MemberStatus.Up)
return false;
var leader = cluster.State.Leader;
return leader is not null && leader == self.Address;
}
}
}
@@ -0,0 +1,56 @@
using Akka.Actor;
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health.Akka;
/// <summary>
/// Health check that maps the local node's Akka cluster membership status to a
/// <see cref="HealthStatus"/> through a configurable <see cref="AkkaClusterStatusPolicy"/>.
/// Register it under the <see cref="ZbHealthTags.Ready"/> tag; tagging it with both <c>ready</c> and <c>active</c> is recommended.
/// </summary>
/// <remarks>
/// The <see cref="ActorSystem"/> is resolved lazily from the service provider. If it is not yet
/// available — e.g. during startup before Akka is initialised — the check returns
/// <see cref="HealthStatus.Degraded"/> rather than throwing, so it is safe to register before Akka
/// is fully up.
/// </remarks>
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
private readonly IServiceProvider _serviceProvider;
private readonly AkkaClusterStatusPolicy _policy;
/// <summary>Initializes a new <see cref="AkkaClusterHealthCheck"/>.</summary>
/// <param name="serviceProvider">
/// The application service provider. The <see cref="ActorSystem"/> is resolved lazily so the
/// check is startup-safe: if no <see cref="ActorSystem"/> is registered yet the result is Degraded.
/// </param>
/// <param name="policy">The status-to-health mapping policy to apply.</param>
public AkkaClusterHealthCheck(IServiceProvider serviceProvider, AkkaClusterStatusPolicy policy)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
_policy = policy ?? throw new ArgumentNullException(nameof(policy));
}
/// <inheritdoc />
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
var system = _serviceProvider.GetService<ActorSystem>();
if (system is null)
return Task.FromResult(HealthCheckResult.Degraded("ActorSystem not yet available."));
var status = Cluster.Get(system).SelfMember.Status;
var health = _policy.Evaluate(status);
var description = $"Akka cluster member status: {status}";
var result = health switch
{
HealthStatus.Healthy => HealthCheckResult.Healthy(description),
HealthStatus.Degraded => HealthCheckResult.Degraded(description),
_ => HealthCheckResult.Unhealthy(description),
};
return Task.FromResult(result);
}
}
@@ -0,0 +1,56 @@
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health.Akka;
/// <summary>
/// Pure mapping from an Akka <see cref="MemberStatus"/> to a <see cref="HealthStatus"/>.
/// </summary>
/// <remarks>
/// <para>
/// Wraps a <see cref="Func{MemberStatus, HealthStatus}"/> so the decision logic is a deterministic,
/// table-testable function — <see cref="AkkaClusterHealthCheck"/> only supplies the live cluster
/// status. Two named presets reconcile the divergence between the existing ScadaBridge and OtOpcUa
/// implementations; construct a custom instance for project-specific overrides.
/// </para>
/// </remarks>
public sealed class AkkaClusterStatusPolicy
{
private readonly Func<MemberStatus, HealthStatus> _evaluate;
/// <summary>Initializes a new <see cref="AkkaClusterStatusPolicy"/>.</summary>
/// <param name="evaluate">The pure status-to-health mapping function.</param>
public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate)
{
_evaluate = evaluate ?? throw new ArgumentNullException(nameof(evaluate));
}
/// <summary>Applies the policy to the given member status.</summary>
/// <param name="status">The local node's Akka cluster member status.</param>
/// <returns>The mapped <see cref="HealthStatus"/>.</returns>
public HealthStatus Evaluate(MemberStatus status) => _evaluate(status);
/// <summary>
/// ScadaBridge origin: <c>Up</c>/<c>Joining</c> → Healthy, <c>Leaving</c>/<c>Exiting</c> →
/// Degraded, everything else → Unhealthy. The convergence target for all projects.
/// </summary>
public static AkkaClusterStatusPolicy Default { get; } = new(static status => status switch
{
MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
_ => HealthStatus.Unhealthy,
});
/// <summary>
/// OtOpcUa origin: self-<c>Up</c>-among-reachable-members → Healthy, any non-<c>Up</c> state
/// (including <c>Leaving</c>/<c>Exiting</c>/<c>Down</c>) → Degraded. Provided for backward
/// compatibility during OtOpcUa's migration.
/// </summary>
/// <remarks>
/// The original OtOpcUa check scanned the reachable member set for self with
/// <c>Status == Up</c>; any other state caused the scan to miss self and collapse to Degraded.
/// This preset reproduces that behavior: only <see cref="MemberStatus.Up"/> is Healthy.
/// </remarks>
public static AkkaClusterStatusPolicy OtOpcUaCompat { get; } = new(static status =>
status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded);
}
@@ -0,0 +1,21 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Health.Akka</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Akka.Cluster health-check extensions for the ZB.MOM.WW SCADA family.</Description>
<PackageTags>health-checks;akka;akka-cluster;scada;wonderware;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.Health\ZB.MOM.WW.Health.csproj" />
</ItemGroup>
<ItemGroup>
<PackageReference Include="Akka.Cluster" />
</ItemGroup>
</Project>
@@ -0,0 +1,111 @@
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health.EntityFrameworkCore;
/// <summary>
/// Health check that verifies database reachability through an EF Core <typeparamref name="TContext"/>.
/// </summary>
/// <remarks>
/// <para>
/// The default probe calls
/// <see cref="Microsoft.EntityFrameworkCore.Infrastructure.DatabaseFacade.CanConnectAsync(CancellationToken)"/>
/// (the ScadaBridge pattern): <see cref="HealthStatus.Healthy"/> when it returns <c>true</c>,
/// <see cref="HealthStatus.Unhealthy"/> when it returns <c>false</c> or throws. Supplying
/// <see cref="DatabaseHealthCheckOptions{TContext}.ProbeQuery"/> swaps in a stricter query-based probe
/// (the OtOpcUa "query <c>Deployments</c>" pattern): the result is <see cref="HealthStatus.Healthy"/>
/// unless the delegate throws, in which case it is <see cref="HealthStatus.Unhealthy"/>. No exception
/// escapes <see cref="CheckHealthAsync"/>.
/// </para>
/// <para>
/// The context is resolved from the application <see cref="IServiceProvider"/>: an
/// <see cref="IDbContextFactory{TContext}"/> is used when one is registered (each probe gets a fresh,
/// disposed context); otherwise a scoped <typeparamref name="TContext"/> is resolved from a new DI
/// scope. Recommended registration tag: <c>ZbHealthTags.Ready</c> (applied by the registrant).
/// </para>
/// <para>
/// The scoped-resolution path is safe for <c>AddDbContextPool</c>: disposing the
/// <see cref="IServiceScope"/> returns the pooled context to the pool rather than destroying it,
/// so no pooled instance is prematurely discarded.
/// </para>
/// </remarks>
/// <typeparam name="TContext">The EF Core <see cref="DbContext"/> to probe.</typeparam>
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
where TContext : DbContext
{
private readonly IServiceProvider _serviceProvider;
private readonly DatabaseHealthCheckOptions<TContext> _options;
/// <summary>Initializes a new <see cref="DatabaseHealthCheck{TContext}"/>.</summary>
/// <param name="serviceProvider">
/// Application service provider used to resolve <typeparamref name="TContext"/> — preferring a
/// registered <see cref="IDbContextFactory{TContext}"/>, otherwise a scoped instance.
/// </param>
/// <param name="options">
/// Probe override and timeout. When <c>null</c>, the default <c>CanConnectAsync</c> probe with a
/// 10 s timeout is used.
/// </param>
public DatabaseHealthCheck(
IServiceProvider serviceProvider,
DatabaseHealthCheckOptions<TContext>? options = null)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
_options = options ?? new DatabaseHealthCheckOptions<TContext>();
}
/// <inheritdoc />
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
timeoutCts.CancelAfter(_options.Timeout);
try
{
var result = await ProbeAsync(timeoutCts.Token).ConfigureAwait(false);
// Eagerly release the pending timer on the happy path so the OS timer
// resource is not held for the full timeout duration.
timeoutCts.CancelAfter(Timeout.InfiniteTimeSpan);
return result;
}
catch (OperationCanceledException ex)
when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"Database probe timed out after {_options.Timeout}.", ex);
}
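// Any other failure, including cancellation requested by the caller, is reported as
// Unhealthy so that no exception escapes the health check.
catch (Exception ex)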
{
return HealthCheckResult.Unhealthy("Database connection failed.", ex);
}
}
private async Task<HealthCheckResult> ProbeAsync(CancellationToken cancellationToken)
{
var factory = _serviceProvider.GetService<IDbContextFactory<TContext>>();
if (factory is not null)
{
await using var db = await factory.CreateDbContextAsync(cancellationToken).ConfigureAwait(false);
return await RunProbeAsync(db, cancellationToken).ConfigureAwait(false);
}
await using var scope = _serviceProvider.CreateAsyncScope();
var scoped = scope.ServiceProvider.GetRequiredService<TContext>();
return await RunProbeAsync(scoped, cancellationToken).ConfigureAwait(false);
}
private async Task<HealthCheckResult> RunProbeAsync(TContext db, CancellationToken cancellationToken)
{
if (_options.ProbeQuery is { } probeQuery)
{
await probeQuery(db, cancellationToken).ConfigureAwait(false);
return HealthCheckResult.Healthy("Database query probe succeeded.");
}
var canConnect = await db.Database.CanConnectAsync(cancellationToken).ConfigureAwait(false);
return canConnect
? HealthCheckResult.Healthy("Database connection is available.")
: HealthCheckResult.Unhealthy("Database connection failed.");
}
}
@@ -0,0 +1,28 @@
using Microsoft.EntityFrameworkCore;
namespace ZB.MOM.WW.Health.EntityFrameworkCore;
/// <summary>
/// Options for <see cref="DatabaseHealthCheck{TContext}"/>.
/// </summary>
/// <typeparam name="TContext">The EF Core <see cref="DbContext"/> the probe runs against.</typeparam>
public sealed class DatabaseHealthCheckOptions<TContext>
where TContext : DbContext
{
/// <summary>
/// Optional query-based probe that overrides the default
/// <see cref="Microsoft.EntityFrameworkCore.Infrastructure.DatabaseFacade.CanConnectAsync(CancellationToken)"/>
/// reachability check with stricter, query-level validation (the OtOpcUa "query <c>Deployments</c>"
/// pattern). Throw to signal failure; return normally to signal success.
/// </summary>
/// <remarks>
/// Example: <c>(db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct)</c>.
/// When <c>null</c>, the default <c>CanConnectAsync</c> probe is used.
/// </remarks>
public Func<TContext, CancellationToken, Task>? ProbeQuery { get; set; }
/// <summary>
/// Maximum time the probe may run before it is treated as a failure. Defaults to 10 seconds.
/// </summary>
public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
@@ -0,0 +1,21 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Health.EntityFrameworkCore</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Entity Framework Core health-check extensions for the ZB.MOM.WW SCADA family.</Description>
<PackageTags>health-checks;entity-framework-core;efcore;scada;wonderware;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.Health\ZB.MOM.WW.Health.csproj" />
</ItemGroup>
<ItemGroup>
<PackageReference Include="Microsoft.EntityFrameworkCore" />
</ItemGroup>
</Project>
@@ -0,0 +1,80 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Endpoint filter that gates a route to the active node. Resolves <see cref="IActiveNodeGate"/>
/// from request services; when it is registered and reports a standby
/// (<see cref="IActiveNodeGate.IsActiveNode"/> is <c>false</c>) the request is short-circuited with
/// HTTP 503 and a <c>Retry-After</c> header. When no gate is registered (non-clustered host / tests)
/// the request is served, preserving prior behaviour.
/// </summary>
public sealed class ActiveNodeGateEndpointFilter : IEndpointFilter
{
/// <summary>Default <c>Retry-After</c> value (seconds) advertised on a standby 503 response.</summary>
private const int DefaultRetryAfterSeconds = 5;
private readonly int _retryAfterSeconds;
/// <summary>Initializes a new <see cref="ActiveNodeGateEndpointFilter"/> using the default 5 s retry-after.</summary>
public ActiveNodeGateEndpointFilter()
: this(DefaultRetryAfterSeconds)
{
}
/// <summary>Initializes a new <see cref="ActiveNodeGateEndpointFilter"/>.</summary>
/// <param name="retryAfterSeconds">The <c>Retry-After</c> value (seconds) advertised on a standby 503 response.</param>
public ActiveNodeGateEndpointFilter(int retryAfterSeconds) => _retryAfterSeconds = retryAfterSeconds;
/// <summary>
/// Returns 503 (with <c>Retry-After</c>) when the resolved <see cref="IActiveNodeGate"/> reports
/// a standby node; otherwise delegates to the next filter or endpoint handler.
/// </summary>
/// <param name="context">The endpoint filter invocation context.</param>
/// <param name="next">The next filter or endpoint handler in the pipeline.</param>
public async ValueTask<object?> InvokeAsync(
EndpointFilterInvocationContext context,
EndpointFilterDelegate next)
{
ArgumentNullException.ThrowIfNull(context);
ArgumentNullException.ThrowIfNull(next);
var httpContext = context.HttpContext;
var gate = httpContext.RequestServices.GetService<IActiveNodeGate>();
if (gate is { IsActiveNode: false })
{
httpContext.Response.Headers.RetryAfter =
_retryAfterSeconds.ToString(System.Globalization.CultureInfo.InvariantCulture);
return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
}
return await next(context);
}
}
/// <summary>
/// Route convention that gates endpoint(s) to the active node, returning 503 on standby nodes.
/// </summary>
public static class ActiveNodeGateExtensions
{
/// <summary>
/// Applies <see cref="ActiveNodeGateEndpointFilter"/> to the decorated endpoint(s): the route is
/// served only when the DI-resolved <see cref="IActiveNodeGate"/> reports the node active, and
/// returns 503 with a <c>Retry-After</c> header when the node is a standby.
/// </summary>
/// <param name="builder">The endpoint convention builder to decorate.</param>
/// <param name="retryAfterSeconds">
/// The <c>Retry-After</c> value (seconds) advertised on a standby 503 response. Defaults to 5.
/// </param>
/// <returns>The same <paramref name="builder"/> for chaining.</returns>
public static IEndpointConventionBuilder RequireActiveNode(
this IEndpointConventionBuilder builder,
int retryAfterSeconds = 5)
{
ArgumentNullException.ThrowIfNull(builder);
return builder.AddEndpointFilter(new ActiveNodeGateEndpointFilter(retryAfterSeconds));
}
}
@@ -0,0 +1,88 @@
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Health check that verifies a downstream gRPC dependency is reachable over its
/// <see cref="GrpcChannel"/>.
/// </summary>
/// <remarks>
/// <para>
/// The probe is injectable via <see cref="GrpcDependencyOptions.Probe"/>; the default drives the
/// channel to a connected state with <see cref="GrpcChannel.ConnectAsync"/>. The result is
/// <see cref="HealthStatus.Healthy"/> when the probe returns <c>true</c>, and
/// <see cref="HealthStatus.Unhealthy"/> when it returns <c>false</c>, throws an
/// <see cref="RpcException"/>, or does not complete within
/// <see cref="GrpcDependencyOptions.Timeout"/>. Cancellation requested by the caller is
/// rethrown as an <see cref="OperationCanceledException"/> rather than reported as a result.
/// </para>
/// <para>
/// Recommended registration tags: <see cref="ZbHealthTags.Ready"/> and
/// <see cref="ZbHealthTags.Active"/> — a missing downstream gRPC dependency makes the node both
/// not-ready and not-able-to-act. The registrant applies the tags.
/// </para>
/// </remarks>
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
private readonly GrpcChannel _channel;
private readonly GrpcDependencyOptions _options;
/// <summary>Initializes a new <see cref="GrpcDependencyHealthCheck"/>.</summary>
/// <param name="channel">The gRPC channel to the downstream dependency.</param>
/// <param name="options">
/// Probe, dependency name, and timeout. When <c>null</c>, defaults are used (the default probe is
/// <see cref="GrpcChannel.ConnectAsync"/> with a 5 s timeout).
/// </param>
public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null)
{
_channel = channel ?? throw new ArgumentNullException(nameof(channel));
_options = options ?? new GrpcDependencyOptions();
}
/// <inheritdoc />
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
var name = _options.DependencyName ?? "gRPC dependency";
var probe = _options.Probe ?? DefaultProbeAsync;
using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
timeoutCts.CancelAfter(_options.Timeout);
try
{
var reachable = await probe(_channel, timeoutCts.Token).ConfigureAwait(false);
return reachable
? HealthCheckResult.Healthy($"{name} is reachable.")
: HealthCheckResult.Unhealthy($"{name} is unreachable.");
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
throw;
}
catch (RpcException ex) when (ex.StatusCode == StatusCode.Cancelled && cancellationToken.IsCancellationRequested)
{
throw new OperationCanceledException(cancellationToken);
}
catch (RpcException ex)
{
return HealthCheckResult.Unhealthy($"{name} probe failed: {ex.Status.StatusCode}.", ex);
}
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
return HealthCheckResult.Unhealthy($"{name} probe timed out after {_options.Timeout}.", ex);
}
}
/// <summary>
/// Default probe: connects the channel and reports reachability. Returns <c>true</c> once the
/// channel reaches a connected state; surfaces failures as a thrown exception (handled by the caller).
/// </summary>
private static async Task<bool> DefaultProbeAsync(GrpcChannel channel, CancellationToken cancellationToken)
{
await channel.ConnectAsync(cancellationToken).ConfigureAwait(false);
return true;
}
}
@@ -0,0 +1,41 @@
using Grpc.Net.Client;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Options for <see cref="GrpcDependencyHealthCheck"/>.
/// </summary>
public sealed class GrpcDependencyOptions
{
/// <summary>
/// The reachability probe. Returns <c>true</c> when the dependency is reachable, <c>false</c>
/// otherwise. When <c>null</c> the default probe is used: <see cref="GrpcChannel.ConnectAsync"/>,
/// which drives the channel to the <see cref="Grpc.Core.ConnectivityState.Ready"/> state (or
/// throws / cancels on failure). Override to perform a richer probe, e.g. a
/// <c>grpc.health.v1.Health/Check</c> RPC returning <c>SERVING</c>.
/// </summary>
public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }
/// <summary>
/// Human-readable name of the dependency, surfaced in the <c>HealthCheckResult</c> description.
/// </summary>
public string? DependencyName { get; set; }
private TimeSpan _timeout = TimeSpan.FromSeconds(5);
/// <summary>Maximum time the probe may take before it is treated as unreachable. Default 5 s.</summary>
/// <exception cref="ArgumentOutOfRangeException">Thrown when set to a value &lt;= <see cref="TimeSpan.Zero"/>.</exception>
public TimeSpan Timeout
{
get => _timeout;
set
{
if (value <= TimeSpan.Zero)
{
throw new ArgumentOutOfRangeException(nameof(value), value, "Timeout must be greater than zero.");
}
_timeout = value;
}
}
}
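The `Probe` summary above mentions overriding the default with a `grpc.health.v1.Health/Check` RPC. One way that might look is sketched below, assuming the consumer also references the `Grpc.HealthCheck` package (which supplies the `Grpc.Health.V1` client types). The address and the dependency name `historian` are placeholders, not values from this repository.

```csharp
using System;
using Grpc.Health.V1;
using Grpc.Net.Client;
using ZB.MOM.WW.Health;

// Placeholder address; substitute the real downstream endpoint.
var channel = GrpcChannel.ForAddress("https://localhost:5001");

var options = new GrpcDependencyOptions
{
    DependencyName = "historian",          // placeholder name, shown in the result description
    Timeout = TimeSpan.FromSeconds(2),     // must be greater than zero, per the setter above
    Probe = async (ch, ct) =>
    {
        // Ask the server's standard health service instead of only connecting the channel.
        var client = new Grpc.Health.V1.Health.HealthClient(ch);
        var reply = await client.CheckAsync(new HealthCheckRequest(), cancellationToken: ct);
        return reply.Status == HealthCheckResponse.Types.ServingStatus.Serving;
    },
};

var check = new GrpcDependencyHealthCheck(channel, options);
Console.WriteLine($"Configured probe for {options.DependencyName} with timeout {options.Timeout}.");
```

An empty `HealthCheckRequest` asks about the server as a whole; setting its `Service` property would narrow the question to one service, if the server registers per-service status.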
@@ -0,0 +1,20 @@
namespace ZB.MOM.WW.Health;
/// <summary>
/// Single-property seam: is this node the active / leader node?
/// </summary>
/// <remarks>
/// Attach to endpoints or route groups via
/// <see cref="ActiveNodeGateExtensions.RequireActiveNode"/>. A standby node must not serve the
/// gated routes, so the filter returns HTTP 503 when <see cref="IsActiveNode"/> is <c>false</c>.
/// The implementation is supplied by the consumer — the <c>ZB.MOM.WW.Health.Akka</c> package ships
/// <c>AkkaActiveNodeGate</c> for clustered nodes; non-Akka hosts provide their own.
/// </remarks>
public interface IActiveNodeGate
{
/// <summary>
/// <c>true</c> when this node is the active node and may serve gated routes;
/// <c>false</c> on a standby node.
/// </summary>
bool IsActiveNode { get; }
}
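As a usage sketch for the gate and the `RequireActiveNode` convention defined earlier in this diff: the minimal-API host below registers a fixed gate and protects one route. `FixedGate` and the `/api/commands` route are hypothetical stand-ins for illustration; a clustered host would register its own `IActiveNodeGate` implementation instead.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical gate that always reports "standby", to make the 503 path visible.
builder.Services.AddSingleton<IActiveNodeGate>(new FixedGate(isActive: false));

var app = builder.Build();

// With the gate above, this route answers 503 and advertises Retry-After: 10.
app.MapGet("/api/commands", () => "served by the active node")
   .RequireActiveNode(retryAfterSeconds: 10);

// No gate applied: always served, regardless of node role.
app.MapGet("/api/status", () => "served by any node");

app.Run();

// Hypothetical stand-in implementation, not shipped by the package.
internal sealed class FixedGate(bool isActive) : IActiveNodeGate
{
    public bool IsActiveNode { get; } = isActive;
}
```

Because the filter resolves the gate with `GetService`, removing the `AddSingleton` line leaves the gated route served normally, which matches the non-clustered behaviour described in the filter's summary.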
@@ -0,0 +1,30 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Health</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Core ASP.NET Core health-check extensions for the ZB.MOM.WW SCADA family.</Description>
<PackageTags>health-checks;aspnetcore;scada;wonderware;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-health</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<!--
Microsoft.AspNetCore.App is a shared framework, not a NuGet package. It brings in the
ASP.NET Core health-checks middleware, IHealthCheck, HealthCheckService, and the full
Microsoft.Extensions.* surface. Referencing the shared framework is the supported path
for net10.0 libraries that target ASP.NET Core.
-->
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>
<ItemGroup>
<!-- Abstractions for IHealthCheck / HealthCheckResult (also transitively provided by the
framework ref above, but declared explicitly so the dependency is visible to consumers). -->
<PackageReference Include="Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions" />
<PackageReference Include="Grpc.Net.Client" />
</ItemGroup>
</Project>
@@ -0,0 +1,85 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Maps the canonical ZB.MOM.WW three-tier health endpoints in one call.
/// </summary>
public static class ZbHealthEndpointExtensions
{
/// <summary>
/// Maps the three health tiers:
/// <list type="bullet">
/// <item><description><c>/health/ready</c> — runs only checks tagged <see cref="ZbHealthTags.Ready"/>.</description></item>
/// <item><description><c>/health/active</c> — runs only checks tagged <see cref="ZbHealthTags.Active"/>.</description></item>
/// <item><description><c>/healthz</c> — bare process liveness; runs no checks (always 200 while the process is up).</description></item>
/// </list>
/// All three are anonymous. Status mapping is the ASP.NET Core default:
/// Healthy/Degraded → 200, Unhealthy → 503.
/// </summary>
/// <remarks>
/// Does NOT call <c>services.AddHealthChecks()</c> — the caller registers probes and their tags.
/// The readiness and active tiers use the canonical JSON writer
/// (<see cref="ZbHealthWriter.WriteJsonAsync"/>) unless overridden via
/// <see cref="ZbHealthEndpointOptions.ResponseWriter"/>. The liveness tier runs no checks and
/// emits a minimal <c>200 OK</c> body.
/// </remarks>
/// <returns>
/// The <see cref="IEndpointConventionBuilder"/> for the readiness (<c>/health/ready</c>) endpoint.
/// A single tier is returned (rather than a composite) to keep the API simple; conventions
/// applied to the result affect only the readiness endpoint.
/// </returns>
public static IEndpointConventionBuilder MapZbHealth(
this IEndpointRouteBuilder endpoints,
ZbHealthEndpointOptions? options = null)
{
ArgumentNullException.ThrowIfNull(endpoints);
options ??= new ZbHealthEndpointOptions();
var responseWriter = options.ResponseWriter ?? ZbHealthWriter.WriteJsonAsync;
var ready = endpoints.MapHealthChecks(options.ReadyPath, new HealthCheckOptions
{
Predicate = static c => c.Tags.Contains(ZbHealthTags.Ready),
ResponseWriter = responseWriter,
}).AllowAnonymous();
endpoints.MapHealthChecks(options.ActivePath, new HealthCheckOptions
{
Predicate = static c => c.Tags.Contains(ZbHealthTags.Active),
ResponseWriter = responseWriter,
}).AllowAnonymous();
// Liveness: run no checks. The endpoint returns 200 as long as the process can respond.
// No JSON writer — the empty report would carry no useful data, so the framework default
// (a minimal plain-text body) is sufficient.
endpoints.MapHealthChecks(options.LivePath, new HealthCheckOptions
{
Predicate = static _ => false,
}).AllowAnonymous();
return ready;
}
/// <summary>
/// Maps the three health tiers, configuring options inline. See the other
/// <see cref="MapZbHealth(IEndpointRouteBuilder, ZbHealthEndpointOptions?)"/> overload for tier semantics.
/// </summary>
/// <param name="endpoints">The endpoint route builder to map onto.</param>
/// <param name="configure">Callback that mutates a fresh <see cref="ZbHealthEndpointOptions"/>.</param>
/// <returns>The <see cref="IEndpointConventionBuilder"/> for the readiness endpoint.</returns>
public static IEndpointConventionBuilder MapZbHealth(
this IEndpointRouteBuilder endpoints,
Action<ZbHealthEndpointOptions> configure)
{
ArgumentNullException.ThrowIfNull(endpoints);
ArgumentNullException.ThrowIfNull(configure);
var options = new ZbHealthEndpointOptions();
configure(options);
return endpoints.MapZbHealth(options);
}
}
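Since `MapZbHealth` deliberately does not call `AddHealthChecks()`, the caller wires registration and mapping together. A minimal sketch of that wiring is shown below; the check names, their fixed results, and the `/livez` override are illustrative assumptions only.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health;

var builder = WebApplication.CreateBuilder(args);

// The caller registers probes and decides which tier each one belongs to.
builder.Services.AddHealthChecks()
    .AddCheck("config-loaded",
        () => HealthCheckResult.Healthy("Configuration loaded."),
        tags: new[] { ZbHealthTags.Ready })
    .AddCheck("is-leader",
        () => HealthCheckResult.Degraded("Standby node."),
        tags: new[] { ZbHealthTags.Active });

var app = builder.Build();

// Maps /health/ready and /health/active with their defaults, and moves liveness to /livez.
app.MapZbHealth(o => o.LivePath = "/livez");

app.Run();
```

With the default status mapping, the `is-leader` check's Degraded result still yields HTTP 200 on `/health/active`; only an Unhealthy result produces 503.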
@@ -0,0 +1,27 @@
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Options for <see cref="ZbHealthEndpointExtensions.MapZbHealth"/>. Lets callers override the
/// three tier route paths and the JSON response writer. The defaults match the ZB.MOM.WW health contract.
/// </summary>
public sealed class ZbHealthEndpointOptions
{
/// <summary>Path for the readiness tier (runs only checks tagged <see cref="ZbHealthTags.Ready"/>).</summary>
public string ReadyPath { get; set; } = "/health/ready";
/// <summary>Path for the active-node tier (runs only checks tagged <see cref="ZbHealthTags.Active"/>).</summary>
public string ActivePath { get; set; } = "/health/active";
/// <summary>Path for the bare liveness tier (runs no checks; 200 while the process is up).</summary>
public string LivePath { get; set; } = "/healthz";
/// <summary>
/// Response writer for the readiness and active tiers. Defaults to
/// <see cref="ZbHealthWriter.WriteJsonAsync"/> (canonical JSON). The liveness tier runs no checks
/// and emits a minimal body, so this writer is not applied to it.
/// </summary>
public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}
@@ -0,0 +1,19 @@
namespace ZB.MOM.WW.Health;
/// <summary>
/// Canonical health-check tag constants for the ZB.MOM.WW three-tier health pattern.
/// Use these when registering checks, e.g.
/// <c>AddCheck("db", check, tags: [ZbHealthTags.Ready])</c>, so that
/// <see cref="ZbHealthEndpointExtensions.MapZbHealth"/> routes each check to the right tier.
/// </summary>
public static class ZbHealthTags
{
/// <summary>Readiness checks (dependencies needed before the node can serve traffic).</summary>
public const string Ready = "ready";
/// <summary>Active-node checks (leader / active-singleton gating).</summary>
public const string Active = "active";
/// <summary>Liveness — process is up. Reserved tag; the liveness endpoint runs no checks.</summary>
public const string Live = "live";
}
@@ -0,0 +1,77 @@
using System.Text.Json;
using System.Text.Json.Serialization;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;
namespace ZB.MOM.WW.Health;
/// <summary>
/// Canonical JSON response writer for the ZB.MOM.WW health endpoints.
/// </summary>
/// <remarks>
/// Self-contained — it has no runtime dependency on <c>AspNetCore.HealthChecks.UI.Client</c>;
/// the JSON shape is modelled after that library's <c>UIResponseWriter</c> output but written here
/// with <see cref="System.Text.Json"/>. The body shape is:
/// <code>
/// {
/// "status": "Healthy|Degraded|Unhealthy",
/// "totalDurationMs": 12.34,
/// "entries": {
/// "&lt;name&gt;": { "status": "...", "description": "...", "durationMs": 1.23 }
/// }
/// }
/// </code>
/// The HTTP status code is left to the ASP.NET Core health-checks middleware (Healthy/Degraded → 200,
/// Unhealthy → 503); this writer only renders the body and sets
/// <c>Content-Type: application/json; charset=utf-8</c>.
/// </remarks>
public static class ZbHealthWriter
{
private static readonly JsonSerializerOptions SerializerOptions = new()
{
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
};
/// <summary>
/// Writes <paramref name="report"/> to the response as canonical ZB.MOM.WW health JSON.
/// </summary>
/// <param name="context">The current HTTP context. Its <see cref="HttpResponse"/> is written to.</param>
/// <param name="report">The aggregated health report for the tier that ran.</param>
public static async Task WriteJsonAsync(HttpContext context, HealthReport report)
{
ArgumentNullException.ThrowIfNull(context);
ArgumentNullException.ThrowIfNull(report);
context.Response.ContentType = "application/json; charset=utf-8";
var payload = new HealthReportDto
{
Status = report.Status.ToString(),
TotalDurationMs = report.TotalDuration.TotalMilliseconds,
Entries = report.Entries.ToDictionary(
static e => e.Key,
static e => new HealthEntryDto
{
Status = e.Value.Status.ToString(),
Description = e.Value.Description,
DurationMs = e.Value.Duration.TotalMilliseconds,
}),
};
await JsonSerializer.SerializeAsync(context.Response.Body, payload, SerializerOptions, context.RequestAborted).ConfigureAwait(false);
}
private sealed class HealthReportDto
{
public string Status { get; init; } = string.Empty;
public double TotalDurationMs { get; init; }
public Dictionary<string, HealthEntryDto> Entries { get; init; } = new();
}
private sealed class HealthEntryDto
{
public string Status { get; init; } = string.Empty;
public string? Description { get; init; }
public double DurationMs { get; init; }
}
}
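To see what the serializer options above do to the body, the following self-contained sketch serializes stand-in records with the same two settings (camelCase names, nulls omitted). `HealthJsonSketch`, `Report`, and `Entry` are hypothetical types for illustration; the library's own DTOs are private.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public static class HealthJsonSketch
{
    // Stand-in shapes mirroring the documented body; not the library's types.
    public sealed record Entry(string Status, string? Description, double DurationMs);
    public sealed record Report(string Status, double TotalDurationMs, Dictionary<string, Entry> Entries);

    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    };

    public static string Render(Report report) => JsonSerializer.Serialize(report, Options);

    public static void Main()
    {
        var report = new Report("Healthy", 12.5, new Dictionary<string, Entry>
        {
            // A null description is dropped from the output entirely.
            ["db"] = new Entry("Healthy", null, 1.5),
        });
        Console.WriteLine(Render(report));
    }
}
```

Note that the naming policy applies to property names only; dictionary keys such as the check name are written as registered.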
@@ -0,0 +1,93 @@
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health.Akka;
namespace ZB.MOM.WW.Health.Akka.Tests;
/// <summary>
/// Table-driven tests for the pure <see cref="ActiveNodeDecision.Evaluate"/> helper covering both
/// the role-less (ScadaBridge ActiveNode) and role-filtered (OtOpcUa AdminRoleLeader) matrices,
/// plus the startup-safety null-guards on <see cref="ActiveNodeHealthCheck"/> and
/// <see cref="AkkaActiveNodeGate"/> when no <see cref="ActorSystem"/> is registered.
/// </summary>
public sealed class ActiveNodeDecisionTests
{
// Role-less: requiredRole == null. hasRole is irrelevant. Healthy iff (selfUp && isLeader), else Unhealthy.
public static IEnumerable<object?[]> RoleLessCases() => new[]
{
new object?[] { true, true, false, (string?)null, HealthStatus.Healthy },
new object?[] { true, false, false, (string?)null, HealthStatus.Unhealthy },
new object?[] { false, true, false, (string?)null, HealthStatus.Unhealthy },
new object?[] { false, false, false, (string?)null, HealthStatus.Unhealthy },
};
[Theory]
[MemberData(nameof(RoleLessCases))]
public void Evaluate_RoleLess(bool selfUp, bool isLeader, bool hasRole, string? requiredRole, HealthStatus expected)
{
Assert.Equal(expected, ActiveNodeDecision.Evaluate(selfUp, isLeader, hasRole, requiredRole));
}
// Role-filtered: requiredRole != null.
// lacks role -> Healthy (probe irrelevant for this node)
// has role & is leader -> Healthy (selfUp is ignored — role-filtered mode only cares about leadership)
// has role & not leader -> Degraded
public static IEnumerable<object[]> RoleFilteredCases() => new[]
{
// node lacks the role -> Healthy regardless of selfUp / isLeader
new object[] { true, true, false, "admin", HealthStatus.Healthy },
new object[] { true, false, false, "admin", HealthStatus.Healthy },
new object[] { false, false, false, "admin", HealthStatus.Healthy },
// node carries the role and is leader -> Healthy (selfUp=true)
new object[] { true, true, true, "admin", HealthStatus.Healthy },
// node carries the role and is leader -> Healthy (selfUp=false: role-filtered mode ignores selfUp)
new object[] { false, true, true, "admin", HealthStatus.Healthy },
// node carries the role but is not leader -> Degraded
new object[] { true, false, true, "admin", HealthStatus.Degraded },
new object[] { false, false, true, "admin", HealthStatus.Degraded },
};
[Theory]
[MemberData(nameof(RoleFilteredCases))]
public void Evaluate_RoleFiltered(bool selfUp, bool isLeader, bool hasRole, string? requiredRole, HealthStatus expected)
{
Assert.Equal(expected, ActiveNodeDecision.Evaluate(selfUp, isLeader, hasRole, requiredRole));
}
[Fact]
public async Task HealthCheck_RoleLess_NoActorSystem_ReturnsDegraded()
{
var provider = new ServiceCollection().BuildServiceProvider();
var check = new ActiveNodeHealthCheck(provider);
var result = await check.CheckHealthAsync(NewContext(check));
Assert.Equal(HealthStatus.Degraded, result.Status);
}
[Fact]
public async Task HealthCheck_RoleFiltered_NoActorSystem_ReturnsDegraded()
{
var provider = new ServiceCollection().BuildServiceProvider();
var check = new ActiveNodeHealthCheck(provider, "admin");
var result = await check.CheckHealthAsync(NewContext(check));
Assert.Equal(HealthStatus.Degraded, result.Status);
}
[Fact]
public void Gate_NoActorSystem_IsActiveNodeFalse()
{
var provider = new ServiceCollection().BuildServiceProvider();
var gate = new AkkaActiveNodeGate(provider);
Assert.False(gate.IsActiveNode);
}
private static HealthCheckContext NewContext(IHealthCheck check) => new()
{
Registration = new HealthCheckRegistration("active-node", check, HealthStatus.Unhealthy, tags: null),
};
}
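The two matrices exercised by the tests above can be summarised as one small pure function. The sketch below is reconstructed only from the test comments and rows, so it is an assumption about behaviour rather than the library's actual `ActiveNodeDecision` source; `NodeStatus` is a stand-in for `HealthStatus` so the example needs no package references.

```csharp
#nullable enable
using System;

// Stand-in for Microsoft.Extensions.Diagnostics.HealthChecks.HealthStatus.
public enum NodeStatus { Unhealthy, Degraded, Healthy }

public static class ActiveNodeDecisionSketch
{
    // Hypothetical reimplementation that satisfies every row in the test tables.
    public static NodeStatus Evaluate(bool selfUp, bool isLeader, bool hasRole, string? requiredRole)
    {
        if (requiredRole is null)
        {
            // Role-less mode: healthy only when this member is Up and is the leader.
            return selfUp && isLeader ? NodeStatus.Healthy : NodeStatus.Unhealthy;
        }

        // Role-filtered mode: a node without the role is not a candidate, so it passes.
        if (!hasRole)
        {
            return NodeStatus.Healthy;
        }

        // Node carries the role: leadership decides, selfUp is ignored.
        return isLeader ? NodeStatus.Healthy : NodeStatus.Degraded;
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate(selfUp: true, isLeader: false, hasRole: false, requiredRole: null));
        Console.WriteLine(Evaluate(selfUp: true, isLeader: false, hasRole: true, requiredRole: "admin"));
    }
}
```

The asymmetry is deliberate in the tests: a role-less standby is Unhealthy (503 under the default status mapping), while a role-filtered non-leader is only Degraded (200).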
@@ -0,0 +1,77 @@
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health.Akka;
namespace ZB.MOM.WW.Health.Akka.Tests;
/// <summary>
/// Table-driven tests for the pure status-mapping function inside <see cref="AkkaClusterStatusPolicy"/>.
/// The two presets (<see cref="AkkaClusterStatusPolicy.Default"/> and
/// <see cref="AkkaClusterStatusPolicy.OtOpcUaCompat"/>) are the convergence targets for ScadaBridge
/// and OtOpcUa respectively; every <see cref="MemberStatus"/> is exercised so a drift in either
/// preset fails loudly. Also covers the startup-safety null-guard on <see cref="AkkaClusterHealthCheck"/>.
/// </summary>
public sealed class AkkaClusterStatusPolicyTests
{
public static IEnumerable<object[]> DefaultCases() => new[]
{
new object[] { MemberStatus.Up, HealthStatus.Healthy },
new object[] { MemberStatus.Joining, HealthStatus.Healthy },
new object[] { MemberStatus.Leaving, HealthStatus.Degraded },
new object[] { MemberStatus.Exiting, HealthStatus.Degraded },
new object[] { MemberStatus.WeaklyUp, HealthStatus.Unhealthy },
new object[] { MemberStatus.Down, HealthStatus.Unhealthy },
new object[] { MemberStatus.Removed, HealthStatus.Unhealthy },
new object[] { (MemberStatus)99, HealthStatus.Unhealthy }, // unknown / future status
};
[Theory]
[MemberData(nameof(DefaultCases))]
public void Default_MapsEveryStatus(MemberStatus status, HealthStatus expected)
{
Assert.Equal(expected, AkkaClusterStatusPolicy.Default.Evaluate(status));
}
public static IEnumerable<object[]> OtOpcUaCompatCases() => new[]
{
new object[] { MemberStatus.Up, HealthStatus.Healthy },
new object[] { MemberStatus.Joining, HealthStatus.Degraded },
new object[] { MemberStatus.Leaving, HealthStatus.Degraded },
new object[] { MemberStatus.Exiting, HealthStatus.Degraded },
new object[] { MemberStatus.WeaklyUp, HealthStatus.Degraded },
new object[] { MemberStatus.Down, HealthStatus.Degraded },
new object[] { MemberStatus.Removed, HealthStatus.Degraded },
new object[] { (MemberStatus)99, HealthStatus.Degraded }, // unknown / future status
};
[Theory]
[MemberData(nameof(OtOpcUaCompatCases))]
public void OtOpcUaCompat_OnlyUpIsHealthy(MemberStatus status, HealthStatus expected)
{
Assert.Equal(expected, AkkaClusterStatusPolicy.OtOpcUaCompat.Evaluate(status));
}
[Fact]
public void CustomPolicy_UsesSuppliedFunc()
{
var policy = new AkkaClusterStatusPolicy(_ => HealthStatus.Unhealthy);
Assert.Equal(HealthStatus.Unhealthy, policy.Evaluate(MemberStatus.Up));
}
[Fact]
public async Task HealthCheck_NoActorSystem_ReturnsDegraded()
{
var provider = new ServiceCollection().BuildServiceProvider();
var check = new AkkaClusterHealthCheck(provider, AkkaClusterStatusPolicy.Default);
var result = await check.CheckHealthAsync(NewContext(check));
Assert.Equal(HealthStatus.Degraded, result.Status);
}
private static HealthCheckContext NewContext(IHealthCheck check) => new()
{
Registration = new HealthCheckRegistration("akka-cluster", check, HealthStatus.Unhealthy, tags: null),
};
}
@@ -0,0 +1,22 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Health.Akka\ZB.MOM.WW.Health.Akka.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,155 @@
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health.EntityFrameworkCore;
namespace ZB.MOM.WW.Health.EntityFrameworkCore.Tests;
/// <summary>
/// Verifies <see cref="DatabaseHealthCheck{TContext}"/> against a real SQLite database (in-memory,
/// connection kept open) so the <c>CanConnectAsync</c> semantics exercise an actual provider:
/// reachable → Healthy, unopenable connection → Unhealthy (no throw escapes), a custom
/// <see cref="DatabaseHealthCheckOptions{TContext}.ProbeQuery"/> that queries → Healthy, a
/// throwing <c>ProbeQuery</c> → Unhealthy, and a timed-out probe → Unhealthy. Both the
/// <see cref="IDbContextFactory{TContext}"/> and the scoped-<c>TContext</c> resolution paths
/// are covered.
/// </summary>
public sealed class DatabaseHealthCheckTests
{
/// <summary>A minimal context with one entity, used purely to drive provider behaviour.</summary>
private sealed class WidgetContext : DbContext
{
public WidgetContext(DbContextOptions<WidgetContext> options) : base(options) { }
public DbSet<Widget> Widgets => Set<Widget>();
}
private sealed class Widget
{
public int Id { get; set; }
}
private static HealthCheckContext NewContext() => new()
{
Registration = new HealthCheckRegistration(
"database",
sp => throw new InvalidOperationException("not used"),
HealthStatus.Unhealthy,
tags: null),
};
/// <summary>
/// Opens the supplied SQLite connection, builds a provider whose <see cref="WidgetContext"/> is
/// backed by it, and creates the schema. When <paramref name="useFactory"/> is true the
/// context is registered via <c>AddDbContextFactory</c>; otherwise via <c>AddDbContext</c> (scoped).
/// </summary>
private static ServiceProvider BuildProvider(SqliteConnection connection, bool useFactory)
{
connection.Open();
var services = new ServiceCollection();
if (useFactory)
{
services.AddDbContextFactory<WidgetContext>(o => o.UseSqlite(connection));
}
else
{
services.AddDbContext<WidgetContext>(o => o.UseSqlite(connection));
}
var provider = services.BuildServiceProvider();
using var scope = provider.CreateScope();
scope.ServiceProvider.GetRequiredService<WidgetContext>().Database.EnsureCreated();
return provider;
}
[Theory]
[InlineData(true)]
[InlineData(false)]
public async Task ReachableContext_Healthy(bool useFactory)
{
using var connection = new SqliteConnection("DataSource=:memory:");
await using var provider = BuildProvider(connection, useFactory);
var check = new DatabaseHealthCheck<WidgetContext>(provider);
var result = await check.CheckHealthAsync(NewContext(), CancellationToken.None);
Assert.Equal(HealthStatus.Healthy, result.Status);
}
[Fact]
public async Task UnopenableConnection_Unhealthy_NoThrow()
{
// Point the context at a file path that cannot be opened (parent directory does not exist).
var bogusPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"), "missing", "db.sqlite");
var services = new ServiceCollection();
services.AddDbContext<WidgetContext>(o => o.UseSqlite($"DataSource={bogusPath};Mode=ReadWrite"));
await using var provider = services.BuildServiceProvider();
var check = new DatabaseHealthCheck<WidgetContext>(provider);
var result = await check.CheckHealthAsync(NewContext(), CancellationToken.None);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task CustomProbeQuery_RunsQuery_Healthy()
{
using var connection = new SqliteConnection("DataSource=:memory:");
await using var provider = BuildProvider(connection, useFactory: true);
var options = new DatabaseHealthCheckOptions<WidgetContext>
{
ProbeQuery = (ctx, ct) => ctx.Widgets.AsNoTracking().AnyAsync(ct),
};
var check = new DatabaseHealthCheck<WidgetContext>(provider, options);
var result = await check.CheckHealthAsync(NewContext(), CancellationToken.None);
Assert.Equal(HealthStatus.Healthy, result.Status);
}
[Fact]
public async Task ProbeQueryThrows_Unhealthy()
{
using var connection = new SqliteConnection("DataSource=:memory:");
await using var provider = BuildProvider(connection, useFactory: false);
var options = new DatabaseHealthCheckOptions<WidgetContext>
{
// Use a faulted task rather than a synchronous throw to accurately model
// async probe delegates that encounter an error.
ProbeQuery = (_, _) => Task.FromException(new InvalidOperationException("boom")),
};
var check = new DatabaseHealthCheck<WidgetContext>(provider, options);
var result = await check.CheckHealthAsync(NewContext(), CancellationToken.None);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task ProbeTimeout_Unhealthy()
{
using var connection = new SqliteConnection("DataSource=:memory:");
await using var provider = BuildProvider(connection, useFactory: true);
// Use a very short timeout and a probe that blocks indefinitely (until cancelled).
var options = new DatabaseHealthCheckOptions<WidgetContext>
{
Timeout = TimeSpan.FromMilliseconds(50),
ProbeQuery = async (_, ct) => await Task.Delay(Timeout.Infinite, ct),
};
var check = new DatabaseHealthCheck<WidgetContext>(provider, options);
var result = await check.CheckHealthAsync(NewContext(), CancellationToken.None);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
}
@@ -0,0 +1,23 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Health.EntityFrameworkCore\ZB.MOM.WW.Health.EntityFrameworkCore.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,70 @@
using System.Net;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;
namespace ZB.MOM.WW.Health.Tests;
/// <summary>
/// Verifies <see cref="ActiveNodeGateExtensions.RequireActiveNode"/>: a decorated endpoint serves
/// normally (200) when the resolved <see cref="IActiveNodeGate"/> reports the node active, and
/// returns 503 with a <c>Retry-After</c> header when the node is a standby.
/// </summary>
public sealed class ActiveNodeGateTests
{
private sealed class FakeActiveNodeGate : IActiveNodeGate
{
public bool IsActiveNode { get; set; }
}
private static async Task<HttpResponseMessage> CallAsync(bool isActive)
{
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseTestServer();
builder.Services.AddSingleton<IActiveNodeGate>(new FakeActiveNodeGate { IsActiveNode = isActive });
await using var app = builder.Build();
app.MapGet("/x", () => "ok").RequireActiveNode();
await app.StartAsync();
var client = app.GetTestClient();
return await client.GetAsync("/x");
}
[Fact]
public async Task ActiveNode_Returns200()
{
var response = await CallAsync(isActive: true);
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal("ok", await response.Content.ReadAsStringAsync());
}
[Fact]
public async Task StandbyNode_Returns503_WithRetryAfterHeader()
{
var response = await CallAsync(isActive: false);
Assert.Equal(HttpStatusCode.ServiceUnavailable, response.StatusCode);
Assert.True(
response.Headers.Contains("Retry-After"),
"Standby response must carry a Retry-After header.");
}
[Fact]
public async Task NoGateRegistered_AllowsRequest()
{
// When no IActiveNodeGate is registered (non-clustered host / tests), the endpoint is served.
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseTestServer();
await using var app = builder.Build();
app.MapGet("/x", () => "ok").RequireActiveNode();
await app.StartAsync();
var response = await app.GetTestClient().GetAsync("/x");
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
}
}
@@ -0,0 +1,107 @@
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.Health;
namespace ZB.MOM.WW.Health.Tests;
/// <summary>
/// Verifies <see cref="GrpcDependencyHealthCheck"/> via an injected probe (no live gRPC server):
/// probe-true → Healthy, probe-false → Unhealthy, and an <see cref="RpcException"/> from the probe
/// → Unhealthy. The channel is constructed but never dialled because the probe is stubbed.
/// </summary>
public sealed class GrpcDependencyHealthCheckTests
{
private static readonly GrpcChannel Channel = GrpcChannel.ForAddress("http://localhost");
private static async Task<HealthCheckResult> RunAsync(
GrpcDependencyOptions options, CancellationToken cancellationToken = default)
{
var check = new GrpcDependencyHealthCheck(Channel, options);
var context = new HealthCheckContext
{
Registration = new HealthCheckRegistration("grpc-dep", check, HealthStatus.Unhealthy, tags: null),
};
return await check.CheckHealthAsync(context, cancellationToken);
}
[Fact]
public async Task ProbeReturnsTrue_Healthy()
{
var result = await RunAsync(new GrpcDependencyOptions
{
Probe = static (_, _) => Task.FromResult(true),
});
Assert.Equal(HealthStatus.Healthy, result.Status);
}
[Fact]
public async Task ProbeReturnsFalse_Unhealthy()
{
var result = await RunAsync(new GrpcDependencyOptions
{
Probe = static (_, _) => Task.FromResult(false),
});
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task ProbeThrowsRpcException_Unhealthy()
{
var result = await RunAsync(new GrpcDependencyOptions
{
Probe = static (_, _) => throw new RpcException(new Status(StatusCode.Unavailable, "down")),
});
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task DependencyName_AppearsInDescription()
{
var result = await RunAsync(new GrpcDependencyOptions
{
DependencyName = "mxaccessgw worker",
Probe = static (_, _) => Task.FromResult(false),
});
Assert.Equal(HealthStatus.Unhealthy, result.Status);
Assert.Contains("mxaccessgw worker", result.Description);
}
[Fact]
public async Task ProbeExceedsTimeout_Unhealthy()
{
var result = await RunAsync(new GrpcDependencyOptions
{
Timeout = TimeSpan.FromMilliseconds(50),
Probe = static async (_, ct) =>
{
await Task.Delay(Timeout.Infinite, ct);
return true;
},
});
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task ExternalCancellation_Throws()
{
using var cts = new CancellationTokenSource();
await cts.CancelAsync();
await Assert.ThrowsAnyAsync<OperationCanceledException>(() => RunAsync(
new GrpcDependencyOptions
{
Probe = static async (_, ct) =>
{
await Task.Delay(Timeout.Infinite, ct);
return true;
},
},
cts.Token));
}
}
@@ -0,0 +1,92 @@
using System.Net;
using System.Net.Http.Json;
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;
namespace ZB.MOM.WW.Health.Tests;
/// <summary>
/// Verifies the canonical JSON response writer (<see cref="ZbHealthWriter.WriteJsonAsync"/>):
/// the JSON body shape, the <c>application/json</c> content type, and that the framework's
/// status-to-HTTP mapping (Healthy/Degraded → 200, Unhealthy → 503) is preserved when the
/// writer is wired onto the ready/active tiers by <see cref="ZbHealthEndpointExtensions.MapZbHealth"/>.
/// </summary>
public sealed class ResponseWriterTests
{
private sealed class StubHealthCheck : IHealthCheck
{
private readonly HealthCheckResult _result;
public StubHealthCheck(HealthStatus status, string? description = null) =>
_result = new HealthCheckResult(status, description);
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default) => Task.FromResult(_result);
}
private static async Task<HttpResponseMessage> GetReadyAsync(
HealthStatus status, string description = "db reachable")
{
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseTestServer();
builder.Services.AddHealthChecks()
.AddCheck("db", new StubHealthCheck(status, description), tags: new[] { ZbHealthTags.Ready });
await using var app = builder.Build();
app.MapZbHealth();
await app.StartAsync();
var client = app.GetTestClient();
return await client.GetAsync("/health/ready");
}
[Fact]
public async Task ReadyEndpoint_Healthy_WritesJsonBody_With200()
{
var response = await GetReadyAsync(HealthStatus.Healthy);
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal("application/json", response.Content.Headers.ContentType?.MediaType);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
var root = doc.RootElement;
Assert.Equal("Healthy", root.GetProperty("status").GetString());
Assert.Equal(JsonValueKind.Number, root.GetProperty("totalDurationMs").ValueKind);
var entries = root.GetProperty("entries");
var db = entries.GetProperty("db");
Assert.Equal("Healthy", db.GetProperty("status").GetString());
Assert.Equal("db reachable", db.GetProperty("description").GetString());
}
[Fact]
public async Task ReadyEndpoint_Degraded_Returns200_WithDegradedStatus()
{
var response = await GetReadyAsync(HealthStatus.Degraded);
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal("application/json", response.Content.Headers.ContentType?.MediaType);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Assert.Equal("Degraded", doc.RootElement.GetProperty("status").GetString());
}
[Fact]
public async Task ReadyEndpoint_Unhealthy_Returns503_WithUnhealthyStatus()
{
var response = await GetReadyAsync(HealthStatus.Unhealthy);
Assert.Equal(HttpStatusCode.ServiceUnavailable, response.StatusCode);
Assert.Equal("application/json", response.Content.Headers.ContentType?.MediaType);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Assert.Equal("Unhealthy", doc.RootElement.GetProperty("status").GetString());
}
}
@@ -0,0 +1,165 @@
using System.Net;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;
namespace ZB.MOM.WW.Health.Tests;
/// <summary>
/// Verifies the three-tier <see cref="ZbHealthEndpointExtensions.MapZbHealth"/> convention:
/// each endpoint runs only the checks tagged for its tier, /healthz runs nothing, and the
/// standard ASP.NET HealthChecks status-to-HTTP mapping (Healthy/Degraded → 200, Unhealthy → 503)
/// holds per tier.
/// </summary>
public sealed class TierMappingTests
{
/// <summary>
/// An <see cref="IHealthCheck"/> test double that records each invocation and returns a
/// configurable result, so tests can assert which checks actually ran per tier.
/// </summary>
private sealed class RecordingHealthCheck : IHealthCheck
{
private readonly HealthStatus _status;
private int _invocations;
public RecordingHealthCheck(HealthStatus status) => _status = status;
public int Invocations => Volatile.Read(ref _invocations);
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
Interlocked.Increment(ref _invocations);
return Task.FromResult(new HealthCheckResult(_status));
}
}
private static async Task<(HttpResponseMessage Response, RecordingHealthCheck Ready, RecordingHealthCheck Active)>
RunAsync(string path, HealthStatus readyStatus = HealthStatus.Healthy, HealthStatus activeStatus = HealthStatus.Healthy)
{
var ready = new RecordingHealthCheck(readyStatus);
var active = new RecordingHealthCheck(activeStatus);
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseTestServer();
builder.Services.AddHealthChecks()
.AddCheck("ready-check", ready, tags: new[] { ZbHealthTags.Ready })
.AddCheck("active-check", active, tags: new[] { ZbHealthTags.Active });
await using var app = builder.Build();
app.MapZbHealth();
await app.StartAsync();
var client = app.GetTestClient();
var response = await client.GetAsync(path);
return (response, ready, active);
}
[Fact]
public async Task ReadyEndpoint_RunsOnlyReadyCheck()
{
var (response, ready, active) = await RunAsync("/health/ready");
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal(1, ready.Invocations);
Assert.Equal(0, active.Invocations);
}
[Fact]
public async Task ActiveEndpoint_RunsOnlyActiveCheck()
{
var (response, ready, active) = await RunAsync("/health/active");
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal(0, ready.Invocations);
Assert.Equal(1, active.Invocations);
}
[Fact]
public async Task LivenessEndpoint_RunsNoChecks_AndReturns200()
{
var (response, ready, active) = await RunAsync("/healthz");
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal(0, ready.Invocations);
Assert.Equal(0, active.Invocations);
}
[Fact]
public async Task ReadyEndpoint_Healthy_Returns200()
{
var (response, _, _) = await RunAsync("/health/ready", readyStatus: HealthStatus.Healthy);
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
}
[Fact]
public async Task ReadyEndpoint_Unhealthy_Returns503()
{
var (response, _, _) = await RunAsync("/health/ready", readyStatus: HealthStatus.Unhealthy);
Assert.Equal(HttpStatusCode.ServiceUnavailable, response.StatusCode);
}
[Fact]
public async Task ActiveEndpoint_Unhealthy_Returns503()
{
var (response, _, _) = await RunAsync("/health/active", activeStatus: HealthStatus.Unhealthy);
Assert.Equal(HttpStatusCode.ServiceUnavailable, response.StatusCode);
}
[Fact]
public async Task LivenessEndpoint_UnaffectedByUnhealthyChecks()
{
// Even though every registered check is Unhealthy, /healthz runs none of them
// (predicate _ => false) and stays 200 as long as the process is up.
var (response, ready, active) = await RunAsync(
"/healthz", readyStatus: HealthStatus.Unhealthy, activeStatus: HealthStatus.Unhealthy);
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal(0, ready.Invocations);
Assert.Equal(0, active.Invocations);
}
[Fact]
public async Task Options_OverrideRoutePaths()
{
var ready = new RecordingHealthCheck(HealthStatus.Healthy);
var active = new RecordingHealthCheck(HealthStatus.Healthy);
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseTestServer();
builder.Services.AddHealthChecks()
.AddCheck("ready-check", ready, tags: new[] { ZbHealthTags.Ready })
.AddCheck("active-check", active, tags: new[] { ZbHealthTags.Active });
await using var app = builder.Build();
app.MapZbHealth(new ZbHealthEndpointOptions
{
ReadyPath = "/custom/ready",
ActivePath = "/custom/active",
LivePath = "/custom/live",
});
await app.StartAsync();
var client = app.GetTestClient();
var readyResponse = await client.GetAsync("/custom/ready");
Assert.Equal(HttpStatusCode.OK, readyResponse.StatusCode);
Assert.Equal(1, ready.Invocations);
Assert.Equal(0, active.Invocations);
var liveResponse = await client.GetAsync("/custom/live");
Assert.Equal(HttpStatusCode.OK, liveResponse.StatusCode);
// The default paths must no longer be mapped when overridden.
var defaultReady = await client.GetAsync("/health/ready");
Assert.Equal(HttpStatusCode.NotFound, defaultReady.StatusCode);
}
}
@@ -0,0 +1,28 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Microsoft.AspNetCore.Mvc.Testing" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<!-- WebApplicationFactory requires the full ASP.NET Core shared framework -->
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Health\ZB.MOM.WW.Health.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,482 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
##
## Get latest from `dotnet new gitignore`
# dotenv files
.env
# User-specific files
*.rsuser
*.suo
*.user
*.userosscache
*.sln.docstates
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
# Mono auto generated files
mono_crash.*
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
[Ww][Ii][Nn]32/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
[Ll]ogs/
# Visual Studio 2015/2017 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
# Visual Studio 2017 auto generated files
Generated\ Files/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
# NUnit
*.VisualState.xml
TestResult.xml
nunit-*.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
# Benchmark Results
BenchmarkDotNet.Artifacts/
# .NET
project.lock.json
project.fragment.lock.json
artifacts/
# Tye
.tye/
# ASP.NET Scaffolding
ScaffoldingReadMe.txt
# StyleCop
StyleCopReport.xml
# Files built by Visual Studio
*_i.c
*_p.c
*_h.h
*.ilk
*.meta
*.obj
*.iobj
*.pch
*.pdb
*.ipdb
*.pgc
*.pgd
*.rsp
# but not Directory.Build.rsp, as it configures directory-level build defaults
!Directory.Build.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*_wpftmp.csproj
*.log
*.tlog
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb
# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap
# Visual Studio Trace Files
*.e2e
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# AxoCover is a Code Coverage Tool
.axoCover/*
!.axoCover/settings.json
# Coverlet is a free, cross platform Code Coverage Tool
coverage*.json
coverage*.xml
coverage*.info
# Visual Studio code coverage results
*.coverage
*.coveragexml
# NCrunch
_NCrunch_*
.*crunch*.local.xml
nCrunchTemp_*
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# Note: Comment the next line if you want to checkin your web deploy settings,
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
# Microsoft Azure Web App publish settings. Comment the next line if you want to
# checkin your Azure Web App publish settings, but sensitive information contained
# in these scripts will be unencrypted
PublishScripts/
# NuGet Packages
*.nupkg
# NuGet Symbol Packages
*.snupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
#!**/[Pp]ackages/repositories.config
# NuGet v3's project.json files produces more ignorable files
*.nuget.props
*.nuget.targets
# Microsoft Azure Build Output
csx/
*.build.csdef
# Microsoft Azure Emulator
ecf/
rcf/
# Windows Store app package directories and files
AppPackages/
BundleArtifacts/
Package.StoreAssociation.xml
_pkginfo.txt
*.appx
*.appxbundle
*.appxupload
# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!?*.[Cc]ache/
# Others
ClientBin/
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.jfm
*.pfx
*.publishsettings
orleans.codegen.cs
# Including strong name files can present a security risk
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
#*.snk
# Since there are multiple workflows, uncomment next line to ignore bower_components
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
#bower_components/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
ServiceFabricBackup/
*.rptproj.bak
# SQL Server files
*.mdf
*.ldf
*.ndf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
*.rptproj.rsuser
*- [Bb]ackup.rdl
*- [Bb]ackup ([0-9]).rdl
*- [Bb]ackup ([0-9][0-9]).rdl
# Microsoft Fakes
FakesAssemblies/
# GhostDoc plugin setting file
*.GhostDoc.xml
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
*.vbw
# Visual Studio 6 auto-generated project file (contains which files were open etc.)
*.vbp
# Visual Studio 6 workspace and project file (working project files containing files to include in project)
*.dsw
*.dsp
# Visual Studio 6 technical files
*.ncb
*.aps
# Visual Studio LightSwitch build output
**/*.HTMLClient/GeneratedArtifacts
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
_Pvt_Extensions
# Paket dependency manager
.paket/paket.exe
paket-files/
# FAKE - F# Make
.fake/
# CodeRush personal settings
.cr/personal
# Python Tools for Visual Studio (PTVS)
__pycache__/
*.pyc
# Cake - Uncomment if you are using it
# tools/**
# !tools/packages.config
# Tabs Studio
*.tss
# Telerik's JustMock configuration file
*.jmconfig
# BizTalk build output
*.btp.cs
*.btm.cs
*.odx.cs
*.xsd.cs
# OpenCover UI analysis results
OpenCover/
# Azure Stream Analytics local run output
ASALocalRun/
# MSBuild Binary and Structured Log
*.binlog
# NVidia Nsight GPU debugger configuration file
*.nvuser
# MFractors (Xamarin productivity tool) working folder
.mfractor/
# Local History for Visual Studio
.localhistory/
# Visual Studio History (VSHistory) files
.vshistory/
# BeatPulse healthcheck temp database
healthchecksdb
# Backup folder for Package Reference Convert tool in Visual Studio 2017
MigrationBackup/
# Ionide (cross platform F# VS Code tools) working folder
.ionide/
# Fody - auto-generated XML schema
FodyWeavers.xsd
# VS Code files for those working on multiple tools
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# Local History for Visual Studio Code
.history/
# Windows Installer files from build outputs
*.cab
*.msi
*.msix
*.msm
*.msp
# JetBrains Rider
*.sln.iml
.idea/
##
## Visual studio for Mac
##
# globs
Makefile.in
*.userprefs
*.usertasks
config.make
config.status
aclocal.m4
install-sh
autom4te.cache/
*.tar.gz
tarballs/
test-results/
# content below from: https://github.com/github/gitignore/blob/main/Global/macOS.gitignore
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
# content below from: https://github.com/github/gitignore/blob/main/Global/Windows.gitignore
# Windows thumbnail cache files
Thumbs.db
ehthumbs.db
ehthumbs_vista.db
# Dump file
*.stackdump
# Folder config file
[Dd]esktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp
# Windows shortcuts
*.lnk
# Vim temporary swap files
*.swp
@@ -0,0 +1,74 @@
# ZB.MOM.WW.Telemetry
Observability libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). These are **libraries, not a service** — each package is linked directly into the consuming application at build time. There is no central telemetry process; instrumentation runs in-process alongside the application.
The library normalizes the three-project observability surface: a shared OpenTelemetry Resource driven by a single identity triple (`service.name` / `site.id` / `node.role`), standard instrumentation wiring, Prometheus and OTLP export, and a Serilog bootstrap with enrichers and `TraceContextEnricher` for trace↔log correlation.
**Built at 0.1.0. MxAccessGateway logging adopted (MEL → Serilog migration done on its own branch). OtOpcUa and ScadaBridge telemetry adoption is follow-on.** Adoption tracked in `~/Desktop/scadaproj/components/observability/GAPS.md`.
---
## Packages
| Package | Responsibilities | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Telemetry` | `AddZbTelemetry` — wires OTel SDK (metrics + tracing), populates shared Resource (`service.name`, `service.namespace`, `service.version`, `site.id`, `node.role`, `host.name`), registers caller-supplied Meters/ActivitySources, adds standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), Prometheus always-on exporter, OTLP additive overlay. `app.MapZbMetrics()` — mounts `/metrics`. `ZbTelemetryOptions` — the single options object shared by both packages. | `Microsoft.AspNetCore.App` (framework ref), `OpenTelemetry.*` stack |
| `ZB.MOM.WW.Telemetry.Serilog` | `AddZbSerilog` — shared two-stage Serilog bootstrap: `ReadFrom.Configuration`-driven sinks, `MinimumLevel.Is(Information)` default (config-overridable), `SiteId`/`NodeRole`/`NodeHostname` enrichers from `ZbTelemetryOptions`, `TraceContextEnricher` (writes `trace_id`/`span_id` from `Activity.Current`), `ILogRedactor` seam (per-project sensitive-field redaction via `RedactionEnricher`). Does NOT freeze `Log.Logger` — safe for multi-host/test scenarios. | `ZB.MOM.WW.Telemetry`, `Serilog.*` stack |
---
## Consumer matrix
| Consumer | `ZB.MOM.WW.Telemetry` (core) | `ZB.MOM.WW.Telemetry.Serilog` |
|---|:---:|:---:|
| **OtOpcUa** | yes (after adoption) | yes (after adoption) |
| **MxAccessGateway** | yes (after adoption) | yes (MEL → Serilog adopted now) |
| **ScadaBridge** | yes (after adoption) | yes (after adoption) |
MxAccessGateway's logging adoption is the one in-pass migration. Full metrics/tracing wiring
for all three apps is follow-on.
---
## Build, test, and pack commands
```bash
# From ZB.MOM.WW.Telemetry/
# Build
dotnet build ZB.MOM.WW.Telemetry.slnx
dotnet build ZB.MOM.WW.Telemetry.slnx -c Release
# Test (no external dependencies — no running OTel collector, no Serilog backend required)
dotnet test ZB.MOM.WW.Telemetry.slnx
# Pack (two .nupkg files land in artifacts/)
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
```
All test assemblies run offline:
| Assembly | Tests |
|---|---|
| `ZB.MOM.WW.Telemetry.Tests` | 7 |
| `ZB.MOM.WW.Telemetry.Serilog.Tests` | 12 |
| **Total** | **19** |
`GeneratePackageOnBuild` is off — pack explicitly with the command above.
---
## Status
Built at **0.1.0** and published to the Gitea NuGet feed. MxAccessGateway logging (MEL → Serilog)
adopted on its own branch. **OtOpcUa and ScadaBridge telemetry adoption not yet started**;
it is tracked in the component backlog:
- `~/Desktop/scadaproj/components/observability/GAPS.md` — adoption order, effort, and risk
Design documentation:
- `~/Desktop/scadaproj/components/observability/spec/SPEC.md` — normalized observability target
- `~/Desktop/scadaproj/components/observability/spec/METRIC-CONVENTIONS.md` — metric naming reference
- `~/Desktop/scadaproj/components/observability/shared-contract/ZB.MOM.WW.Telemetry.md` — proposed shared-library API
- `~/Desktop/scadaproj/components/observability/current-state/` — per-project current state (code-verified)
@@ -0,0 +1,12 @@
<Project>
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<Version>0.1.0</Version>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
</Project>
@@ -0,0 +1,38 @@
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
</PropertyGroup>
<ItemGroup>
<!-- OpenTelemetry core + exporters -->
<PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.15.3" />
<PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.15.3-beta.1" />
<PackageVersion Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.15.3" />
<!-- OpenTelemetry instrumentation libraries -->
<PackageVersion Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.15.2" />
<PackageVersion Include="OpenTelemetry.Instrumentation.Http" Version="1.15.1" />
<PackageVersion Include="OpenTelemetry.Instrumentation.GrpcNetClient" Version="1.15.1-beta.1" />
<PackageVersion Include="OpenTelemetry.Instrumentation.Runtime" Version="1.15.1" />
<PackageVersion Include="OpenTelemetry.Instrumentation.Process" Version="1.15.1-beta.1" />
<!-- Serilog -->
<PackageVersion Include="Serilog" Version="4.3.1" />
<PackageVersion Include="Serilog.AspNetCore" Version="9.0.0" />
<PackageVersion Include="Serilog.Extensions.Hosting" Version="9.0.0" />
<PackageVersion Include="Serilog.Settings.Configuration" Version="9.0.0" />
<PackageVersion Include="Serilog.Sinks.Console" Version="6.0.0" />
<PackageVersion Include="Serilog.Sinks.File" Version="7.0.0" />
<PackageVersion Include="Serilog.Sinks.OpenTelemetry" Version="4.2.0" />
<PackageVersion Include="Serilog.Sinks.InMemory" Version="2.0.0" />
<!-- Test -->
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
<PackageVersion Include="xunit" Version="2.9.3" />
<PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
<PackageVersion Include="coverlet.collector" Version="6.0.4" />
<PackageVersion Include="Microsoft.AspNetCore.Mvc.Testing" Version="10.0.7" />
</ItemGroup>
</Project>
@@ -0,0 +1,153 @@
# ZB.MOM.WW.Telemetry
Observability libraries for the **ZB.MOM.WW SCADA family** (OtOpcUa, MxAccessGateway, ScadaBridge). These are **libraries, not a service** — each package is linked directly into the consuming application at build time. There is no central telemetry process; all instrumentation runs in-process alongside the application.
The library normalizes the three-project observability surface: a shared OpenTelemetry Resource identity, standard instrumentation wiring, Prometheus and OTLP export, and a Serilog bootstrap with enrichers and trace↔log correlation — so metrics, traces, and log lines from the same node carry identical dimensions and can join up in any backend.
---
## Packages
| Package | Description | Key Dependencies |
|---|---|---|
| `ZB.MOM.WW.Telemetry` | `AddZbTelemetry` extension, `ZbTelemetryOptions`, shared OTel Resource builder (`ZbResource`), standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), Prometheus always-on exporter + OTLP opt-in overlay, `app.MapZbMetrics()` endpoint extension. | `Microsoft.AspNetCore.App` (framework ref), `OpenTelemetry.*` stack |
| `ZB.MOM.WW.Telemetry.Serilog` | `AddZbSerilog` extension, shared enrichers (`SiteId`/`NodeRole`/`NodeHostname`), `TraceContextEnricher` (writes `trace_id`/`span_id` from `Activity.Current` into every log event), `ILogRedactor` seam (per-project sensitive-field redaction), `RedactionEnricher`. | `ZB.MOM.WW.Telemetry`, `Serilog.*` stack |
---
## The unifying hinge
The single `ZbTelemetryOptions` object drives both packages. Its identity triple
(`ServiceName` → OTel Resource `service.name`, `SiteId` → `site.id`, `NodeRole` → `node.role`)
is applied once and flows automatically to **both** the OpenTelemetry Resource (so every metric
and span carries it) **and** the Serilog enrichers (so every log event carries it). A metric,
a span, and a log line emitted by the same node share identical `service.name`, `site.id`, and
`node.role` dimensions, enabling cross-signal correlation in any backend (Grafana, Jaeger, Seq,
Loki, etc.) without per-project bookkeeping.
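
As a rough illustration of the "applied once, read by both sides" idea, the sketch below projects one identity object into the three shared dimensions. `NodeIdentity` and `IdentityProjection` are hypothetical stand-ins invented for this example; they are not the library's types, and the real `ZbTelemetryOptions` wiring may differ.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the identity part of ZbTelemetryOptions.
public sealed record NodeIdentity(string ServiceName, string SiteId, string NodeRole);

public static class IdentityProjection
{
    // Builds the shared dimensions once. In this sketch, both the resource
    // builder and the log enrichers would read from this single map, so the
    // values cannot drift apart between signals.
    public static IReadOnlyDictionary<string, string> ToAttributes(NodeIdentity id) =>
        new Dictionary<string, string>
        {
            ["service.name"] = id.ServiceName,
            ["site.id"] = id.SiteId,
            ["node.role"] = id.NodeRole,
        };
}
```

Calling `IdentityProjection.ToAttributes(new NodeIdentity("mxgateway", "site-a", "standalone"))` yields one map that a metrics/traces resource and a log enricher could both consume; the site value here is made up for illustration.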
---
## Consumer matrix
| Consumer | `ZB.MOM.WW.Telemetry` (core) | `ZB.MOM.WW.Telemetry.Serilog` |
|---|:---:|:---:|
| **OtOpcUa** | yes | yes |
| **MxAccessGateway** | yes | yes (logging adopted — MEL → Serilog migration done) |
| **ScadaBridge** | yes | yes |
All three apps consume both packages after adoption. MxAccessGateway's MEL→Serilog migration
is the one in-pass adoption completed on its own branch; OtOpcUa and ScadaBridge adoption is
follow-on (tracked in `components/observability/GAPS.md`).
---
## OTel signals
`AddZbTelemetry` wires all three OpenTelemetry signals in a single call:
| Signal | What is wired |
|---|---|
| **Metrics** | App Meters (via `options.Meters[]`) + standard: ASP.NET Core, HttpClient, .NET runtime, process. Exported via Prometheus (always on) with OTLP as an additive overlay. |
| **Traces** | App ActivitySources (via `options.ActivitySources[]`) + standard: ASP.NET Core, HttpClient, gRPC client. Exported via OTLP when `Exporter = ZbExporter.Otlp`. |
| **Logs** | Wired by `AddZbSerilog` (companion call). Serilog is the logging pipeline; log events are exported over OTLP via `Serilog.Sinks.OpenTelemetry` only when `Exporter = ZbExporter.Otlp`. |
Trace↔log correlation is automatic: `TraceContextEnricher` reads `Activity.Current` for each
log event and attaches `trace_id` and `span_id`, so log events produced inside a traced request
carry the same span identity as the trace backend.
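
The values the enricher reads can be inspected with the base class library alone. The sketch below is not part of the package: the source name `Demo.Source` and the operation name are hypothetical, and the listener exists only so that `StartActivity` returns a real activity outside an OpenTelemetry pipeline.

```csharp
using System;
using System.Diagnostics;

using var source = new ActivitySource("Demo.Source"); // hypothetical source name

// Without a listener (or an OTel TracerProvider) StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllData,
};
ActivitySource.AddActivityListener(listener);

// Outside any span there is nothing to correlate, so the enricher adds nothing.
Console.WriteLine($"active span before: {Activity.Current is not null}");

using (var activity = source.StartActivity("handle-request"))
{
    // These are the two strings TraceContextEnricher writes as trace_id / span_id.
    var current = Activity.Current!;
    Console.WriteLine($"trace_id={current.TraceId} span_id={current.SpanId}");
}
```

A log event written inside the `using` block would carry the same two identifiers that the trace backend records for that span.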
---
## Exporter options
Prometheus is **always wired** for metrics regardless of the `Exporter` setting. OTLP is an
additive overlay — set `Exporter = ZbExporter.Otlp` and `OtlpEndpoint` to push to a collector
in addition to the scrape endpoint.
```csharp
// Prometheus only (default — scrape /metrics)
builder.AddZbTelemetry(o =>
{
o.ServiceName = "mxgateway";
o.SiteId = config["Site:Id"];
o.NodeRole = "standalone";
o.Meters = ["ZB.MOM.WW.MxGateway"];
});
// OTLP overlay (metrics + traces pushed to collector; /metrics still active)
builder.AddZbTelemetry(o =>
{
o.ServiceName = "mxgateway";
o.SiteId = config["Site:Id"];
o.NodeRole = "standalone";
o.Meters = ["ZB.MOM.WW.MxGateway"];
o.Exporter = ZbExporter.Otlp;
o.OtlpEndpoint = "http://collector:4317";
});
// Mount the Prometheus scrape endpoint (call after app.UseRouting())
app.MapZbMetrics(); // → /metrics
```
```csharp
// Serilog bootstrap (same options object drives enrichers)
builder.AddZbSerilog(o =>
{
o.ServiceName = "mxgateway";
o.SiteId = config["Site:Id"];
o.NodeRole = "standalone";
});
```
---
## Building and testing
```bash
# from ZB.MOM.WW.Telemetry/
dotnet build ZB.MOM.WW.Telemetry.slnx
dotnet test ZB.MOM.WW.Telemetry.slnx
```
All test assemblies run with no external dependencies (no running OTel collector, no Serilog
backend):
| Assembly | Tests |
|---|---|
| `ZB.MOM.WW.Telemetry.Tests` | 9 |
| `ZB.MOM.WW.Telemetry.Serilog.Tests` | 14 |
| **Total** | **23** |
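
A dependency-free test in this style might look like the following sketch. It assumes xUnit as the test framework; `CollectingSink` and the test names are hypothetical and are not taken from the repository's test assemblies.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using Serilog;
using Serilog.Core;
using Serilog.Events;
using Xunit;
using ZB.MOM.WW.Telemetry.Serilog;

public sealed class TraceContextEnricherSketchTests
{
    // Hypothetical in-process sink: no collector or log backend required.
    private sealed class CollectingSink : ILogEventSink
    {
        public List<LogEvent> Events { get; } = new();
        public void Emit(LogEvent logEvent) => Events.Add(logEvent);
    }

    [Fact]
    public void Active_activity_stamps_trace_id()
    {
        var sink = new CollectingSink();
        using var logger = new LoggerConfiguration()
            .Enrich.With(new TraceContextEnricher())
            .WriteTo.Sink(sink)
            .CreateLogger();

        // Starting an Activity makes it Activity.Current for this async context.
        using var activity = new Activity("sketch-operation").Start();
        logger.Information("inside a span");

        var logEvent = Assert.Single(sink.Events);
        var traceId = Assert.IsType<ScalarValue>(
            logEvent.Properties[TraceContextEnricher.TraceIdPropertyName]);
        Assert.Equal(activity.TraceId.ToString(), traceId.Value);
    }

    [Fact]
    public void No_activity_adds_no_trace_properties()
    {
        var sink = new CollectingSink();
        using var logger = new LoggerConfiguration()
            .Enrich.With(new TraceContextEnricher())
            .WriteTo.Sink(sink)
            .CreateLogger();

        Activity.Current = null;
        logger.Information("outside any span");

        var logEvent = Assert.Single(sink.Events);
        Assert.False(logEvent.Properties.ContainsKey(
            TraceContextEnricher.TraceIdPropertyName));
    }
}
```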
---
## Packing
```bash
dotnet pack ZB.MOM.WW.Telemetry.slnx -c Release -o ./artifacts
```
Produces two `.nupkg` files in `artifacts/`:
```
ZB.MOM.WW.Telemetry.0.1.0.nupkg
ZB.MOM.WW.Telemetry.Serilog.0.1.0.nupkg
```
`GeneratePackageOnBuild` is off — pack explicitly as above. Both packages are versioned
lockstep from `Directory.Build.props`.
---
## Status
**Built at 0.1.0. MxAccessGateway logging adopted (MEL → Serilog migration, on its own branch).
Broader OtOpcUa and ScadaBridge telemetry adoption deferred.** Adoption is tracked in the
component backlog:
- `~/Desktop/scadaproj/components/observability/GAPS.md`
Design documentation lives alongside that backlog:
- `~/Desktop/scadaproj/components/observability/spec/SPEC.md` — normalized observability target
- `~/Desktop/scadaproj/components/observability/spec/METRIC-CONVENTIONS.md` — metric naming reference
- `~/Desktop/scadaproj/components/observability/shared-contract/ZB.MOM.WW.Telemetry.md` — proposed API
- `~/Desktop/scadaproj/components/observability/current-state/` — per-project current state (code-verified)
@@ -0,0 +1,10 @@
<Solution>
<Folder Name="/src/">
<Project Path="src/ZB.MOM.WW.Telemetry.Serilog/ZB.MOM.WW.Telemetry.Serilog.csproj" />
<Project Path="src/ZB.MOM.WW.Telemetry/ZB.MOM.WW.Telemetry.csproj" />
</Folder>
<Folder Name="/tests/">
<Project Path="tests/ZB.MOM.WW.Telemetry.Serilog.Tests/ZB.MOM.WW.Telemetry.Serilog.Tests.csproj" />
<Project Path="tests/ZB.MOM.WW.Telemetry.Tests/ZB.MOM.WW.Telemetry.Tests.csproj" />
</Folder>
</Solution>
@@ -0,0 +1,17 @@
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Seam for project-specific log-event redaction. The shared library applies this via
/// <see cref="RedactionEnricher"/>; each project provides its own implementation that knows which
/// fields (by property name) or which command payloads must not leave the process in log events.
/// If no <see cref="ILogRedactor"/> is registered in DI, <see cref="RedactionEnricher"/> is a no-op.
/// </summary>
public interface ILogRedactor
{
/// <summary>
/// Inspects and mutates the supplied log-event <paramref name="properties"/> in place — remove
/// or replace any sensitive values. Called on every log event before it reaches any sink.
/// </summary>
/// <param name="properties">The mutable property dictionary for the current log event.</param>
void Redact(IDictionary<string, object?> properties);
}
@@ -0,0 +1,82 @@
using Microsoft.Extensions.DependencyInjection;
using Serilog.Core;
using Serilog.Events;
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Applies a registered <see cref="ILogRedactor"/> to every Serilog log event. Registered
/// automatically by <see cref="ZbSerilogExtensions.AddZbSerilog"/>. The enricher resolves
/// <see cref="ILogRedactor"/> from DI on first use (lazy, to avoid a circular-DI problem during
/// Serilog's bootstrap); if none is registered it is permanently inert — no DI call per event.
/// Resolution is thread-safe: <see cref="LazyThreadSafetyMode.ExecutionAndPublication"/> ensures
/// exactly one DI lookup regardless of how many logging threads race to the first event.
/// </summary>
public sealed class RedactionEnricher : ILogEventEnricher
{
private readonly Lazy<ILogRedactor?> _redactor;
/// <summary>
/// Creates the enricher bound to a service provider from which the project-supplied
/// <see cref="ILogRedactor"/> is resolved lazily on first use (thread-safe).
/// </summary>
/// <param name="serviceProvider">Provider used to resolve a registered <see cref="ILogRedactor"/>.</param>
public RedactionEnricher(IServiceProvider serviceProvider)
{
ArgumentNullException.ThrowIfNull(serviceProvider);
_redactor = new Lazy<ILogRedactor?>(
() => serviceProvider.GetService<ILogRedactor>(),
LazyThreadSafetyMode.ExecutionAndPublication);
}
/// <summary>
/// Hands a snapshot of the log event's properties (scalar values unwrapped) to the registered
/// <see cref="ILogRedactor"/> and writes back any values the redactor changed or added.
/// No-op when no redactor is registered.
/// </summary>
/// <param name="logEvent">The log event to redact.</param>
/// <param name="propertyFactory">Factory used to materialize replacement properties.</param>
public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
{
ArgumentNullException.ThrowIfNull(logEvent);
ArgumentNullException.ThrowIfNull(propertyFactory);
var redactor = ResolveRedactor();
if (redactor is null)
{
return;
}
var snapshot = new Dictionary<string, object?>(logEvent.Properties.Count);
foreach (var property in logEvent.Properties)
{
snapshot[property.Key] = property.Value is ScalarValue scalar
? scalar.Value
: property.Value;
}
redactor.Redact(snapshot);
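foreach (var entry in snapshot)
{
if (HasChanged(logEvent, entry.Key, entry.Value))
{
logEvent.AddOrUpdateProperty(
propertyFactory.CreateProperty(entry.Key, entry.Value));
}
}
// Propagate removals too: ILogRedactor may drop a key outright, and the original
// property must then not reach any sink.
foreach (var name in new List<string>(logEvent.Properties.Keys))
{
if (!snapshot.ContainsKey(name))
{
logEvent.RemovePropertyIfPresent(name);
}
}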
}
private ILogRedactor? ResolveRedactor() => _redactor.Value;
private static bool HasChanged(LogEvent logEvent, string key, object? newValue)
{
if (!logEvent.Properties.TryGetValue(key, out var existing))
{
// Redactor added a brand-new property.
return true;
}
var existingValue = existing is ScalarValue scalar ? scalar.Value : existing;
return !Equals(existingValue, newValue);
}
}
@@ -0,0 +1,43 @@
using System.Diagnostics;
using Serilog.Core;
using Serilog.Events;
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Stamps <c>trace_id</c> and <c>span_id</c> from <see cref="Activity.Current"/> onto every Serilog
/// log event, enabling a log line to be correlated back to its originating trace in a backend.
/// When <see cref="Activity.Current"/> is null (no active span — background services, startup,
/// non-traced paths) the enricher emits nothing; it does NOT inject empty strings or zero values.
/// </summary>
public sealed class TraceContextEnricher : ILogEventEnricher
{
/// <summary>Serilog property name carrying the W3C trace id.</summary>
public const string TraceIdPropertyName = "trace_id";
/// <summary>Serilog property name carrying the W3C span id.</summary>
public const string SpanIdPropertyName = "span_id";
/// <summary>
/// Adds <c>trace_id</c>/<c>span_id</c> properties from <see cref="Activity.Current"/> when an
/// activity is active; otherwise leaves the event untouched.
/// </summary>
/// <param name="logEvent">The log event to enrich.</param>
/// <param name="propertyFactory">Factory used to create the trace-context properties.</param>
public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
{
ArgumentNullException.ThrowIfNull(logEvent);
ArgumentNullException.ThrowIfNull(propertyFactory);
var activity = Activity.Current;
if (activity is null)
{
return;
}
logEvent.AddPropertyIfAbsent(
propertyFactory.CreateProperty(TraceIdPropertyName, activity.TraceId.ToString()));
logEvent.AddPropertyIfAbsent(
propertyFactory.CreateProperty(SpanIdPropertyName, activity.SpanId.ToString()));
}
}
@@ -0,0 +1,33 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Telemetry.Serilog</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Serilog structured logging extensions for the ZB.MOM.WW SCADA family. Provides a shared Serilog bootstrap (AddZbSerilog) that registers the DI logger without touching the static Log.Logger, enrichers for SiteId/NodeRole/NodeHostname, a TraceContextEnricher for trace_id/span_id correlation with OpenTelemetry spans, and an ILogRedactor seam for per-project sensitive-field redaction.</Description>
<PackageTags>opentelemetry;observability;serilog;logging;tracing;enrichers;scada;wonderware;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-telemetry</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-telemetry</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Serilog" />
<PackageReference Include="Serilog.AspNetCore" />
<PackageReference Include="Serilog.Extensions.Hosting" />
<PackageReference Include="Serilog.Settings.Configuration" />
<PackageReference Include="Serilog.Sinks.Console" />
<PackageReference Include="Serilog.Sinks.File" />
<PackageReference Include="Serilog.Sinks.OpenTelemetry" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.Telemetry\ZB.MOM.WW.Telemetry.csproj" />
</ItemGroup>
<ItemGroup>
<AssemblyAttribute Include="System.Runtime.CompilerServices.InternalsVisibleTo">
<_Parameter1>ZB.MOM.WW.Telemetry.Serilog.Tests</_Parameter1>
</AssemblyAttribute>
</ItemGroup>
</Project>
@@ -0,0 +1,27 @@
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Canonical Serilog property name constants for the identity enrichers stamped by
/// <see cref="ZbSerilogExtensions.AddZbSerilog"/>. Use these constants — not literal strings —
/// when querying properties in sinks or tests. Each property mirrors a shared OTel Resource
/// attribute so logs and metrics/traces from the same node carry identical dimensions.
/// </summary>
public static class ZbLogEnricherNames
{
/// <summary>
/// Serilog property: physical or logical site identifier. Matches OTel Resource <c>site.id</c>.
/// </summary>
public const string SiteId = "SiteId";
/// <summary>
/// Serilog property: node function (<c>central</c>, <c>site</c>, <c>hub</c>, <c>standalone</c>).
/// Matches OTel Resource <c>node.role</c>.
/// </summary>
public const string NodeRole = "NodeRole";
/// <summary>
/// Serilog property: machine name (<see cref="System.Environment.MachineName"/>).
/// Matches OTel Resource <c>host.name</c>. Populated automatically — not a caller-supplied option.
/// </summary>
public const string NodeHostname = "NodeHostname";
}
@@ -0,0 +1,152 @@
using System;
using System.Collections.Generic;
using Serilog;
using Serilog.Configuration;
using Serilog.Sinks.OpenTelemetry;
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Reusable seam that applies the shared ZB.MOM.WW logging configuration (identity enrichers,
/// trace-context correlation, redaction, and OTel log export) to a
/// <see cref="LoggerConfiguration"/>. Shared by <see cref="ZbSerilogExtensions.AddZbSerilog"/>
/// and unit tests so both exercise an identical enricher/sink set.
/// Internal to keep the public NuGet surface minimal; exposed to the test assembly via
/// <c>[assembly: InternalsVisibleTo("ZB.MOM.WW.Telemetry.Serilog.Tests")]</c>.
/// </summary>
internal static class ZbSerilogConfig
{
/// <summary>
/// Applies the shared identity enrichers — <see cref="ZbLogEnricherNames.SiteId"/> and
/// <see cref="ZbLogEnricherNames.NodeRole"/> from <paramref name="options"/>, and
/// <see cref="ZbLogEnricherNames.NodeHostname"/> from
/// <see cref="System.Environment.MachineName"/> (auto, never a caller-supplied option) — to
/// <paramref name="loggerConfiguration"/>. <c>SiteId</c>/<c>NodeRole</c> are stamped only when
/// the option is non-null/non-empty, mirroring the shared OTel Resource omission rules.
/// </summary>
/// <param name="loggerConfiguration">The Serilog configuration to enrich.</param>
/// <param name="options">The telemetry options describing the service identity.</param>
/// <returns>The same <paramref name="loggerConfiguration"/> for chaining.</returns>
public static LoggerConfiguration Apply(
LoggerConfiguration loggerConfiguration,
ZbTelemetryOptions options) =>
Apply(loggerConfiguration, options, serviceProvider: null);
/// <summary>
/// Overload of <see cref="Apply(LoggerConfiguration, ZbTelemetryOptions)"/> that additionally
/// wires the service-provider-dependent stages — the redaction enricher (which lazily resolves
/// a registered <c>ILogRedactor</c>). When <paramref name="serviceProvider"/> is null, only the
/// provider-independent enrichers are applied.
/// </summary>
/// <param name="loggerConfiguration">The Serilog configuration to enrich.</param>
/// <param name="options">The telemetry options describing the service identity.</param>
/// <param name="serviceProvider">
/// Provider used to lazily resolve project-supplied seams (e.g. <c>ILogRedactor</c>);
/// may be null in tests or pipelines without DI.
/// </param>
/// <returns>The same <paramref name="loggerConfiguration"/> for chaining.</returns>
public static LoggerConfiguration Apply(
LoggerConfiguration loggerConfiguration,
ZbTelemetryOptions options,
IServiceProvider? serviceProvider)
{
ArgumentNullException.ThrowIfNull(loggerConfiguration);
ArgumentNullException.ThrowIfNull(options);
LoggerEnrichmentConfiguration enrich = loggerConfiguration.Enrich;
if (!string.IsNullOrEmpty(options.SiteId))
{
enrich.WithProperty(ZbLogEnricherNames.SiteId, options.SiteId);
}
if (!string.IsNullOrEmpty(options.NodeRole))
{
enrich.WithProperty(ZbLogEnricherNames.NodeRole, options.NodeRole);
}
enrich.WithProperty(ZbLogEnricherNames.NodeHostname, Environment.MachineName);
enrich.With(new TraceContextEnricher());
if (serviceProvider is not null)
{
enrich.With(new RedactionEnricher(serviceProvider));
}
ApplyOpenTelemetryExport(loggerConfiguration, options);
return loggerConfiguration;
}
/// <summary>
/// Adds a <c>WriteTo.OpenTelemetry</c> log sink when an OTLP exporter is explicitly
/// selected (<see cref="ZbTelemetryOptions.Exporter"/> = <see cref="ZbExporter.Otlp"/>).
/// <see cref="ZbTelemetryOptions.OtlpEndpoint"/> is the address used when OTLP is selected
/// — it is NOT an independent enable. This matches the core OTel path behaviour so that
/// an endpoint-only config (without <c>Exporter=Otlp</c>) exports nothing to OTLP on any
/// signal. The sink carries the same Resource attributes as <c>ZbResource</c>
/// (<c>service.name</c>/<c>service.namespace</c>/<c>service.version</c>/
/// <c>service.instance.id</c>/<c>site.id</c>/<c>node.role</c>/<c>host.name</c>) so logs
/// correlate with metrics and traces in the backend.
/// </summary>
private static void ApplyOpenTelemetryExport(
LoggerConfiguration loggerConfiguration,
ZbTelemetryOptions options)
{
if (options.Exporter != ZbExporter.Otlp)
{
return;
}
var resourceAttributes = BuildResourceAttributes(options);
loggerConfiguration.WriteTo.OpenTelemetry(sink =>
{
if (!string.IsNullOrEmpty(options.OtlpEndpoint))
{
sink.Endpoint = options.OtlpEndpoint;
}
sink.Protocol = OtlpProtocol.Grpc;
sink.ResourceAttributes = resourceAttributes;
});
}
/// <summary>
/// Builds the OTLP Resource-attribute map mirroring <c>ZbResource</c>. Null/empty optional
/// attributes are omitted, matching the shared Resource's omission rules. The
/// <c>service.instance.id</c> is sourced from <see cref="ZbResource.InstanceId"/> — the
/// same deterministic <c>MachineName:ProcessId</c> value used by the OTel SDK path — so
/// all three signals carry an identical instance identifier. Internal so it can be asserted
/// by the test assembly without being part of the public NuGet API.
/// </summary>
internal static IDictionary<string, object> BuildResourceAttributes(ZbTelemetryOptions options)
{
var attributes = new Dictionary<string, object>
{
["service.name"] = options.ServiceName,
["service.namespace"] = options.ServiceNamespace,
["service.instance.id"] = ZbResource.InstanceId,
["host.name"] = Environment.MachineName,
};
if (!string.IsNullOrEmpty(options.ServiceVersion))
{
attributes["service.version"] = options.ServiceVersion;
}
if (!string.IsNullOrEmpty(options.SiteId))
{
attributes["site.id"] = options.SiteId;
}
if (!string.IsNullOrEmpty(options.NodeRole))
{
attributes["node.role"] = options.NodeRole;
}
return attributes;
}
}
@@ -0,0 +1,84 @@
using Microsoft.Extensions.Hosting;
using Serilog;
using Serilog.Events;
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.Telemetry.Serilog;
/// <summary>
/// Extension point for configuring the shared Serilog application logger on an
/// <see cref="IHostApplicationBuilder"/>. Wires config-driven sinks
/// (<c>ReadFrom.Configuration</c>), an explicit minimum level (<c>Serilog:MinimumLevel</c>,
/// default <see cref="LogEventLevel.Information"/>), and the shared enricher/redaction/OTel-export
/// set via <see cref="ZbSerilogConfig"/>. Does NOT configure OTel metrics/traces — call
/// <c>AddZbTelemetry</c> in the core package for that.
///
/// <para>
/// This method intentionally does <strong>not</strong> set the process-global
/// <see cref="Log.Logger"/> (via <c>CreateBootstrapLogger</c> or otherwise). Mutating
/// process-global state in a shared library causes "logger is already frozen" exceptions
/// when multiple hosts are built in the same process (integration tests, multi-host apps).
/// </para>
/// <para>
/// Apps that need a pre-<c>Build()</c> bootstrap logger to capture early startup exceptions
/// should set <see cref="Log.Logger"/> themselves in <c>Program.cs</c> before calling
/// <c>AddZbSerilog</c>:
/// <code>
/// Log.Logger = new LoggerConfiguration().WriteTo.Console().CreateBootstrapLogger();
/// // ... then build the host ...
/// builder.AddZbSerilog(o => { ... });
/// </code>
/// This keeps global-state mutation firmly in the application, not the library.
/// </para>
/// </summary>
public static class ZbSerilogExtensions
{
/// <summary>
/// Registers the Serilog application logger in DI. Wires configuration-driven sinks
/// (<c>ReadFrom.Configuration</c>), a code default of <see cref="LogEventLevel.Information"/>
/// that config can override via <c>Serilog:MinimumLevel</c> or namespace overrides, plus
/// the identity enrichers (<c>SiteId</c>/<c>NodeRole</c> from <paramref name="configure"/>,
/// <c>NodeHostname</c> = <see cref="System.Environment.MachineName"/>).
///
/// <para>
/// This method does <strong>not</strong> set the process-global <see cref="Log.Logger"/>.
/// <c>preserveStaticLogger: true</c> is passed to <c>AddSerilog</c> so the static logger
/// is left entirely untouched — safe to call multiple times in the same process (integration
/// tests, multi-host scenarios) without hitting "logger is already frozen".
/// </para>
/// <para>
/// If early-startup bootstrap logging is required (before <c>Build()</c>), set
/// <c>Log.Logger = new LoggerConfiguration().WriteTo.Console().CreateBootstrapLogger();</c>
/// in <c>Program.cs</c> before calling this method. That decision belongs to the
/// application, not the shared library.
/// </para>
/// </summary>
/// <param name="builder">The host application builder.</param>
/// <param name="configure">Populates the <see cref="ZbTelemetryOptions"/>.</param>
public static IHostApplicationBuilder AddZbSerilog(
this IHostApplicationBuilder builder,
Action<ZbTelemetryOptions> configure)
{
ArgumentNullException.ThrowIfNull(builder);
ArgumentNullException.ThrowIfNull(configure);
var options = new ZbTelemetryOptions();
configure(options);
// Register the application logger in DI only. preserveStaticLogger: true ensures
// AddSerilog does NOT freeze or replace Log.Logger — critical for multi-host
// processes (integration tests etc.) where AddZbSerilog may be called more than once.
builder.Services.AddSerilog(
(serviceProvider, loggerConfiguration) =>
{
loggerConfiguration
.MinimumLevel.Is(LogEventLevel.Information)
.ReadFrom.Configuration(builder.Configuration);
ZbSerilogConfig.Apply(loggerConfiguration, options, serviceProvider);
},
preserveStaticLogger: true);
return builder;
}
}
@@ -0,0 +1,34 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>true</IsPackable>
<PackageId>ZB.MOM.WW.Telemetry</PackageId>
<Authors>ZB.MOM.WW</Authors>
<Description>Core OpenTelemetry extensions for the ZB.MOM.WW SCADA family. Wires the OTel SDK (metrics + tracing), populates a shared Resource (service.name/site.id/node.role), registers standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), and maps a Prometheus /metrics endpoint. OTLP exporter opt-in overlay included.</Description>
<PackageTags>opentelemetry;observability;metrics;tracing;prometheus;otlp;aspnetcore;scada;wonderware;zb-mom-ww</PackageTags>
<PackageProjectUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-telemetry</PackageProjectUrl>
<RepositoryUrl>https://gitea.dohertylan.com/dohertj2/zb-mom-ww-telemetry</RepositoryUrl>
</PropertyGroup>
<ItemGroup>
<!--
Microsoft.AspNetCore.App is a shared framework, not a NuGet package. It brings in the
ASP.NET Core middleware surface (MapZbMetrics, instrumentation, routing, etc.).
Referencing the shared framework is the supported path for net10.0 libraries that
target ASP.NET Core.
-->
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>
<ItemGroup>
<PackageReference Include="OpenTelemetry.Extensions.Hosting" />
<PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" />
<PackageReference Include="OpenTelemetry.Instrumentation.GrpcNetClient" />
<PackageReference Include="OpenTelemetry.Instrumentation.Runtime" />
<PackageReference Include="OpenTelemetry.Instrumentation.Process" />
</ItemGroup>
</Project>
@@ -0,0 +1,22 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Routing;
namespace ZB.MOM.WW.Telemetry;
/// <summary>
/// Endpoint extension for mounting the Prometheus <c>/metrics</c> scrape endpoint.
/// </summary>
public static class ZbMetricsEndpointExtensions
{
/// <summary>
/// Mounts the Prometheus <c>/metrics</c> endpoint. Valid for every
/// <see cref="ZbTelemetryOptions.Exporter"/> setting, because <c>AddZbTelemetry</c> always
/// wires the Prometheus exporter and treats OTLP as an additive overlay.
/// Call after <c>app.UseRouting()</c>.
/// </summary>
/// <param name="endpoints">The endpoint route builder.</param>
public static IEndpointConventionBuilder MapZbMetrics(this IEndpointRouteBuilder endpoints)
{
ArgumentNullException.ThrowIfNull(endpoints);
return endpoints.MapPrometheusScrapingEndpoint();
}
}
@@ -0,0 +1,65 @@
using System.Collections.Generic;
using OpenTelemetry.Resources;
namespace ZB.MOM.WW.Telemetry;
/// <summary>
/// Builds the shared OpenTelemetry ResourceBuilder from <see cref="ZbTelemetryOptions"/>.
/// Used internally by <c>AddZbTelemetry</c> so metrics, traces, and logs carry an identical
/// Resource. Exposed for tests and custom pipelines.
/// </summary>
public static class ZbResource
{
/// <summary>
/// Deterministic service instance identifier, stable for the lifetime of the process.
/// Formatted as <c>MachineName:ProcessId</c> so that every signal (metrics, traces, logs)
/// from the same process computes the exact same <c>service.instance.id</c>, enabling
/// cross-signal correlation without relying on a randomly generated GUID.
/// </summary>
public static string InstanceId =>
$"{System.Environment.MachineName}:{System.Environment.ProcessId}";
/// <summary>
/// Returns a <see cref="ResourceBuilder"/> pre-populated with <c>service.name</c>,
/// <c>service.namespace</c>, <c>service.version</c>, <c>service.instance.id</c>,
/// <c>site.id</c>, <c>node.role</c>, and <c>host.name</c> (always
/// <see cref="System.Environment.MachineName"/>). Attributes with null values are omitted
/// from the Resource.
/// </summary>
/// <param name="options">The telemetry options describing the service identity.</param>
public static ResourceBuilder Build(ZbTelemetryOptions options) =>
Configure(ResourceBuilder.CreateDefault(), options);
/// <summary>
/// Applies the shared ZB.MOM.WW Resource attributes to an existing <see cref="ResourceBuilder"/>.
/// Internal seam so the <c>AddZbTelemetry</c> pipeline produces a Resource identical to
/// <see cref="Build"/>.
/// </summary>
internal static ResourceBuilder Configure(ResourceBuilder builder, ZbTelemetryOptions options)
{
builder.AddService(
serviceName: options.ServiceName,
serviceNamespace: options.ServiceNamespace,
serviceVersion: options.ServiceVersion,
autoGenerateServiceInstanceId: false,
serviceInstanceId: InstanceId);
var attributes = new List<KeyValuePair<string, object>>
{
new("host.name", System.Environment.MachineName),
};
if (!string.IsNullOrEmpty(options.SiteId))
{
attributes.Add(new("site.id", options.SiteId));
}
if (!string.IsNullOrEmpty(options.NodeRole))
{
attributes.Add(new("node.role", options.NodeRole));
}
builder.AddAttributes(attributes);
return builder;
}
}
@@ -0,0 +1,136 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;
namespace ZB.MOM.WW.Telemetry;
/// <summary>
/// Extension point for configuring the OpenTelemetry metrics + traces bootstrap on an
/// <see cref="IHostApplicationBuilder"/> (or directly on an <see cref="IServiceCollection"/>).
/// Wires the shared Resource, standard instrumentation, the app's own Meters and
/// ActivitySources, and the selected exporter. Does NOT configure Serilog.
/// </summary>
public static class ZbTelemetryExtensions
{
/// <summary>
/// Configures the OpenTelemetry MeterProvider and TracerProvider with the shared Resource,
/// standard instrumentation (ASP.NET Core, HttpClient, gRPC client, runtime, process), the
/// app's own Meters and ActivitySources, and the selected exporter.
/// </summary>
/// <param name="builder">The host application builder.</param>
/// <param name="configure">Populates the <see cref="ZbTelemetryOptions"/>.</param>
public static IHostApplicationBuilder AddZbTelemetry(
this IHostApplicationBuilder builder,
Action<ZbTelemetryOptions> configure)
{
ArgumentNullException.ThrowIfNull(builder);
ArgumentNullException.ThrowIfNull(configure);
builder.Services.AddZbTelemetry(BuildOptions(configure));
return builder;
}
/// <summary>
/// <see cref="IServiceCollection"/> overload for contexts where
/// <see cref="IHostApplicationBuilder"/> is not available. Requires the caller to supply a
/// pre-built <see cref="ZbTelemetryOptions"/>.
/// </summary>
/// <param name="services">The service collection.</param>
/// <param name="options">The fully-populated telemetry options.</param>
public static IServiceCollection AddZbTelemetry(
this IServiceCollection services,
ZbTelemetryOptions options)
{
ArgumentNullException.ThrowIfNull(services);
ArgumentNullException.ThrowIfNull(options);
services.AddOpenTelemetry()
.ConfigureResource(rb => ZbResource.Configure(rb, options))
.WithMetrics(metrics =>
{
foreach (var meter in options.Meters)
{
metrics.AddMeter(meter);
}
metrics
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddProcessInstrumentation();
ApplyMetricsExporter(metrics, options);
})
.WithTracing(tracing =>
{
foreach (var source in options.ActivitySources)
{
tracing.AddSource(source);
}
tracing
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddGrpcClientInstrumentation();
ApplyTracingExporter(tracing, options);
});
return services;
}
/// <summary>
/// IServiceCollection overload that accepts a configure delegate (convenience for callers
/// that only have an <see cref="IServiceCollection"/> but prefer the lambda form).
/// </summary>
/// <param name="services">The service collection.</param>
/// <param name="configure">Populates the <see cref="ZbTelemetryOptions"/>.</param>
public static IServiceCollection AddZbTelemetry(
this IServiceCollection services,
Action<ZbTelemetryOptions> configure) =>
services.AddZbTelemetry(BuildOptions(configure));
private static ZbTelemetryOptions BuildOptions(Action<ZbTelemetryOptions> configure)
{
ArgumentNullException.ThrowIfNull(configure);
var options = new ZbTelemetryOptions();
configure(options);
if (string.IsNullOrWhiteSpace(options.ServiceName))
{
throw new ArgumentException(
"ZbTelemetryOptions.ServiceName is required (e.g. \"otopcua\").",
nameof(configure));
}
return options;
}
private static void ApplyMetricsExporter(MeterProviderBuilder metrics, ZbTelemetryOptions options)
{
// Prometheus is always wired so that /metrics and MapZbMetrics() work regardless of
// the exporter setting. OTLP is an additive overlay when explicitly requested.
metrics.AddPrometheusExporter();
if (options.Exporter == ZbExporter.Otlp)
{
metrics.AddOtlpExporter(o => ConfigureOtlp(o, options));
}
}
private static void ApplyTracingExporter(TracerProviderBuilder tracing, ZbTelemetryOptions options)
{
// Prometheus is metrics-only; traces have no Prometheus path. Only OTLP exports traces.
if (options.Exporter == ZbExporter.Otlp)
{
tracing.AddOtlpExporter(o => ConfigureOtlp(o, options));
}
}
private static void ConfigureOtlp(
OpenTelemetry.Exporter.OtlpExporterOptions otlp,
ZbTelemetryOptions options)
{
if (!string.IsNullOrEmpty(options.OtlpEndpoint))
{
otlp.Endpoint = new Uri(options.OtlpEndpoint);
}
}
}
@@ -0,0 +1,76 @@
namespace ZB.MOM.WW.Telemetry;
/// <summary>
/// Selects how instrumentation data is exported.
/// </summary>
public enum ZbExporter
{
/// <summary>
/// Prometheus scrape endpoint (default). Call <c>app.MapZbMetrics()</c> to mount <c>/metrics</c>.
/// </summary>
Prometheus,
/// <summary>
/// OTLP gRPC export, added on top of the always-on Prometheus metrics exporter.
/// Set <see cref="ZbTelemetryOptions.OtlpEndpoint"/> (e.g. <c>"http://collector:4317"</c>).
/// </summary>
Otlp,
}
/// <summary>
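/// Options for <c>AddZbTelemetry</c> and <c>AddZbSerilog</c>. The identity properties feed the
/// shared OpenTelemetry Resource; the remaining properties select the app's Meters and
/// ActivitySources and the export path.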
/// </summary>
public sealed class ZbTelemetryOptions
{
/// <summary>
/// Required. Short lower-case app identifier — e.g. <c>"otopcua"</c>, <c>"mxgateway"</c>,
/// <c>"scadabridge"</c>. Populates OTel Resource <c>service.name</c>.
/// </summary>
public string ServiceName { get; set; } = "";
/// <summary>
/// Fleet-wide namespace. Default <c>"ZB.MOM.WW"</c>. Do not override per-app.
/// Populates OTel Resource <c>service.namespace</c>.
/// </summary>
public string ServiceNamespace { get; set; } = "ZB.MOM.WW";
/// <summary>
/// Optional. Populate from <c>AssemblyInformationalVersion</c>.
/// Populates OTel Resource <c>service.version</c>.
/// </summary>
public string? ServiceVersion { get; set; }
/// <summary>
/// Optional. Physical or logical site identifier.
/// Populates OTel Resource <c>site.id</c>.
/// </summary>
public string? SiteId { get; set; }
/// <summary>
/// Optional. Node function: <c>"central"</c>, <c>"site"</c>, <c>"hub"</c>, <c>"standalone"</c>.
/// Populates OTel Resource <c>node.role</c>.
/// </summary>
public string? NodeRole { get; set; }
/// <summary>
/// App-specific Meter names to register with the OTel MeterProvider. Standard instrumentation
/// meters are added automatically (ASP.NET Core, HttpClient, runtime, process).
/// </summary>
public string[] Meters { get; set; } = [];
/// <summary>
/// App-specific ActivitySource names to register with the OTel TracerProvider.
/// </summary>
public string[] ActivitySources { get; set; } = [];
/// <summary>
/// Export path. Default Prometheus; use <see cref="ZbExporter.Otlp"/> for a real collector.
/// </summary>
public ZbExporter Exporter { get; set; } = ZbExporter.Prometheus;
/// <summary>
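/// Expected when <see cref="Exporter"/> = <see cref="ZbExporter.Otlp"/>; not an independent enable.
/// OTLP gRPC endpoint, e.g. <c>"http://collector:4317"</c>. If left null or empty, the OTLP
/// exporter keeps its own default endpoint.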
/// </summary>
public string? OtlpEndpoint { get; set; }
}
@@ -0,0 +1,75 @@
using Serilog;
using Serilog.Events;
using Serilog.Sinks.InMemory;
using ZB.MOM.WW.Telemetry;
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.Telemetry.Serilog.Tests;
public sealed class EnricherTests
{
private static string ScalarValue(LogEvent logEvent, string propertyName)
{
Assert.True(
logEvent.Properties.TryGetValue(propertyName, out var value),
$"expected property '{propertyName}' to be present");
var scalar = Assert.IsType<ScalarValue>(value);
return scalar.Value?.ToString() ?? "";
}
[Fact]
public void Identity_enrichers_stamp_SiteId_NodeRole_and_NodeHostname()
{
var sink = new InMemorySink();
var options = new ZbTelemetryOptions
{
ServiceName = "otopcua",
SiteId = "s1",
NodeRole = "Central",
};
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options);
using var logger = loggerConfig
.WriteTo.Sink(sink)
.CreateLogger();
logger.Information("hello");
var logEvent = Assert.Single(sink.LogEvents);
Assert.Equal("s1", ScalarValue(logEvent, ZbLogEnricherNames.SiteId));
Assert.Equal("Central", ScalarValue(logEvent, ZbLogEnricherNames.NodeRole));
Assert.Equal(
Environment.MachineName,
ScalarValue(logEvent, ZbLogEnricherNames.NodeHostname));
}
[Fact]
public void Null_SiteId_and_NodeRole_are_suppressed_but_NodeHostname_is_always_present()
{
var sink = new InMemorySink();
var options = new ZbTelemetryOptions
{
ServiceName = "otopcua",
SiteId = null,
NodeRole = null,
};
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options);
using var logger = loggerConfig
.WriteTo.Sink(sink)
.CreateLogger();
logger.Information("hello");
var logEvent = Assert.Single(sink.LogEvents);
Assert.False(logEvent.Properties.ContainsKey(ZbLogEnricherNames.SiteId),
"SiteId should be absent when null");
Assert.False(logEvent.Properties.ContainsKey(ZbLogEnricherNames.NodeRole),
"NodeRole should be absent when null");
Assert.Equal(
Environment.MachineName,
ScalarValue(logEvent, ZbLogEnricherNames.NodeHostname));
}
}
@@ -0,0 +1,102 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Serilog;
using ZB.MOM.WW.Telemetry;
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.Telemetry.Serilog.Tests;
/// <summary>
/// Regression tests for the process-global-state hazard: AddZbSerilog must not set or
/// freeze Log.Logger. When multiple hosts are built in the same process (integration
/// tests, multi-host apps) AddZbSerilog must be callable repeatedly without throwing
/// "The logger is already frozen".
/// </summary>
public sealed class MultiHostTests
{
[Fact]
public void AddZbSerilog_called_twice_in_same_process_does_not_throw()
{
// Arrange + Act: build two completely independent hosts in the same test process.
// Prior to the fix, the second call to AddZbSerilog would crash with
// "The logger is already frozen" because Stage-1 set the process-global Log.Logger.
var exception = Record.Exception(() =>
{
var builder1 = Host.CreateApplicationBuilder();
builder1.AddZbSerilog(o =>
{
o.ServiceName = "host-one";
o.SiteId = "s1";
o.NodeRole = "central";
});
using var host1 = builder1.Build();
var builder2 = Host.CreateApplicationBuilder();
builder2.AddZbSerilog(o =>
{
o.ServiceName = "host-two";
o.SiteId = "s2";
o.NodeRole = "site";
});
using var host2 = builder2.Build();
});
Assert.Null(exception);
}
[Fact]
public void AddZbSerilog_does_not_mutate_global_Log_Logger()
{
// Capture whatever the static logger is before calling AddZbSerilog.
var loggerBefore = Log.Logger;
var builder = Host.CreateApplicationBuilder();
builder.AddZbSerilog(o =>
{
o.ServiceName = "no-global-state";
});
using var host = builder.Build();
// AddZbSerilog must leave Log.Logger exactly as it was found.
// (ReferenceEquals is the right check — it must be the *same* instance, not
// just an equivalent one, so we know the library never touched the static field.)
Assert.True(
ReferenceEquals(loggerBefore, Log.Logger),
"AddZbSerilog must not replace or freeze the global Log.Logger");
}
[Fact]
public void AddZbSerilog_each_host_resolves_its_own_DI_ILogger()
{
// Both hosts must resolve a working Serilog ILogger from DI independently —
// neither host's logger is the process-global Log.Logger.
var builder1 = Host.CreateApplicationBuilder();
builder1.AddZbSerilog(o => { o.ServiceName = "host-a"; });
using var host1 = builder1.Build();
var builder2 = Host.CreateApplicationBuilder();
builder2.AddZbSerilog(o => { o.ServiceName = "host-b"; });
using var host2 = builder2.Build();
var logger1 = host1.Services.GetRequiredService<ILogger>();
var logger2 = host2.Services.GetRequiredService<ILogger>();
// Both are non-null and independently functional.
Assert.NotNull(logger1);
Assert.NotNull(logger2);
// They are distinct instances (each host has its own application logger).
Assert.False(
ReferenceEquals(logger1, logger2),
"each host must have its own DI-registered ILogger instance");
// Neither matches the global Log.Logger — the library must not have promoted
// a DI logger to process-global state.
Assert.False(
ReferenceEquals(logger1, Log.Logger),
"host-a's DI logger must not be the global Log.Logger");
Assert.False(
ReferenceEquals(logger2, Log.Logger),
"host-b's DI logger must not be the global Log.Logger");
}
}
@@ -0,0 +1,204 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Serilog;
using Serilog.Core;
using Serilog.Events;
using Serilog.Sinks.InMemory;
using ZB.MOM.WW.Telemetry;
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.Telemetry.Serilog.Tests;
public sealed class RedactionTests
{
private const string Masked = "***";
private sealed class FakeRedactor : ILogRedactor
{
public void Redact(IDictionary<string, object?> properties)
{
if (properties.ContainsKey("apiKey"))
{
properties["apiKey"] = Masked;
}
}
}
private static string? ScalarOrNull(LogEvent logEvent, string propertyName) =>
logEvent.Properties.TryGetValue(propertyName, out var value) && value is ScalarValue scalar
? scalar.Value?.ToString()
: null;
[Fact]
public void Registered_redactor_masks_sensitive_property()
{
var serviceProvider = new ServiceCollection()
.AddSingleton<ILogRedactor>(new FakeRedactor())
.BuildServiceProvider();
var sink = new InMemorySink();
var options = new ZbTelemetryOptions { ServiceName = "mxgateway" };
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options, serviceProvider);
using Logger logger = loggerConfig.WriteTo.Sink(sink).CreateLogger();
logger.Information("authenticating {apiKey}", "mxgw_secret");
var logEvent = Assert.Single(sink.LogEvents);
Assert.Equal(Masked, ScalarOrNull(logEvent, "apiKey"));
}
[Fact]
public void No_redactor_registered_is_a_no_op()
{
var serviceProvider = new ServiceCollection().BuildServiceProvider();
var sink = new InMemorySink();
var options = new ZbTelemetryOptions { ServiceName = "mxgateway" };
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options, serviceProvider);
using Logger logger = loggerConfig.WriteTo.Sink(sink).CreateLogger();
logger.Information("authenticating {apiKey}", "mxgw_secret");
var logEvent = Assert.Single(sink.LogEvents);
Assert.Equal("mxgw_secret", ScalarOrNull(logEvent, "apiKey"));
}
[Fact]
public void AddZbSerilog_with_otlp_options_builds_without_error()
{
var builder = Host.CreateApplicationBuilder();
builder.AddZbSerilog(o =>
{
o.ServiceName = "mxgateway";
o.SiteId = "s1";
o.NodeRole = "central";
o.Exporter = ZbExporter.Otlp;
o.OtlpEndpoint = "http://localhost:4317";
});
using var host = builder.Build();
// Serilog.ILogger is registered by AddSerilog — not Microsoft.Extensions.Logging.ILogger.
var logger = host.Services.GetRequiredService<ILogger>();
logger.Information("otlp wiring smoke test");
}
[Fact]
public void BuildResourceAttributes_contains_required_keys_and_optional_keys_when_set()
{
var options = new ZbTelemetryOptions
{
ServiceName = "mxgateway",
ServiceNamespace = "ZB.MOM.WW",
SiteId = "site-a",
NodeRole = "central",
};
var attributes = ZbSerilogConfig.BuildResourceAttributes(options);
// Required keys always present.
Assert.True(attributes.ContainsKey("service.name"), "service.name must be present");
Assert.True(attributes.ContainsKey("service.namespace"), "service.namespace must be present");
Assert.True(attributes.ContainsKey("host.name"), "host.name must be present");
// service.instance.id must be present and match ZbResource.InstanceId (parity with OTel SDK path).
Assert.True(attributes.ContainsKey("service.instance.id"), "service.instance.id must be present");
Assert.Equal(ZbResource.InstanceId, attributes["service.instance.id"]);
// Optional keys present when options supply them.
Assert.True(attributes.ContainsKey("site.id"), "site.id must be present when SiteId is set");
Assert.True(attributes.ContainsKey("node.role"), "node.role must be present when NodeRole is set");
Assert.Equal("mxgateway", attributes["service.name"]);
Assert.Equal("ZB.MOM.WW", attributes["service.namespace"]);
Assert.Equal(Environment.MachineName, attributes["host.name"]);
Assert.Equal("site-a", attributes["site.id"]);
Assert.Equal("central", attributes["node.role"]);
}
[Fact]
public void BuildResourceAttributes_omits_optional_keys_when_not_set()
{
var options = new ZbTelemetryOptions
{
ServiceName = "mxgateway",
SiteId = null,
NodeRole = null,
};
var attributes = ZbSerilogConfig.BuildResourceAttributes(options);
Assert.False(attributes.ContainsKey("site.id"), "site.id must be absent when SiteId is null");
Assert.False(attributes.ContainsKey("node.role"), "node.role must be absent when NodeRole is null");
// service.instance.id is always present regardless of optional fields.
Assert.True(attributes.ContainsKey("service.instance.id"), "service.instance.id must always be present");
Assert.Equal(ZbResource.InstanceId, attributes["service.instance.id"]);
}
/// <summary>
/// Fix 1 — Symmetric OTLP trigger: the Serilog path must only activate the OTel log sink
/// when <c>Exporter == ZbExporter.Otlp</c>, NOT merely when <c>OtlpEndpoint</c> is set.
/// This matches the core OTel metrics/traces path that ignores a bare endpoint without
/// <c>Exporter=Otlp</c>.
/// </summary>
[Fact]
public void ApplyOpenTelemetryExport_does_not_activate_when_only_endpoint_is_set()
{
// Arrange: set OtlpEndpoint but leave Exporter at the default (not Otlp).
var options = new ZbTelemetryOptions
{
ServiceName = "mxgateway",
OtlpEndpoint = "http://localhost:4317",
// Exporter is intentionally left at its default (ZbExporter.Prometheus).
};
// Act: apply the shared Serilog config. With the fix in place, a bare OtlpEndpoint
// must not register the OpenTelemetry sink. Sink registration is not directly
// observable on LoggerConfiguration, so this is a smoke check only: we assert the
// exporter is not Otlp, and that building and using the logger does not throw.
Assert.NotEqual(ZbExporter.Otlp, options.Exporter);
// Building the logger with only OtlpEndpoint set (no Exporter=Otlp) must not throw
// and must not attempt any OTLP connection — the sink should simply be absent.
var exception = Record.Exception(() =>
{
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options);
using var logger = loggerConfig.CreateLogger();
logger.Information("no otlp sink expected");
});
Assert.Null(exception);
}
[Fact]
public void ApplyOpenTelemetryExport_activates_when_Exporter_is_Otlp()
{
// Arrange: Exporter explicitly set to Otlp (no endpoint — exporter registered but won't connect).
var options = new ZbTelemetryOptions
{
ServiceName = "mxgateway",
Exporter = ZbExporter.Otlp,
// OtlpEndpoint intentionally left null — we test the trigger, not the connection.
};
// Act + Assert: must not throw (the sink is registered but won't connect in tests).
var exception = Record.Exception(() =>
{
var loggerConfig = new LoggerConfiguration();
ZbSerilogConfig.Apply(loggerConfig, options);
using var logger = loggerConfig.CreateLogger();
logger.Information("otlp sink registered");
});
Assert.Null(exception);
}
}
@@ -0,0 +1,71 @@
using System.Diagnostics;
using Serilog;
using Serilog.Core;
using Serilog.Events;
using Serilog.Sinks.InMemory;
using ZB.MOM.WW.Telemetry;
using ZB.MOM.WW.Telemetry.Serilog;
namespace ZB.MOM.WW.Telemetry.Serilog.Tests;
public sealed class TraceContextEnricherTests
{
private const string SourceName = "ZB.MOM.WW.Telemetry.Serilog.Tests.TraceContext";
private static Logger BuildLogger(InMemorySink sink) =>
new LoggerConfiguration()
.Enrich.With(new TraceContextEnricher())
.WriteTo.Sink(sink)
.CreateLogger();
private static string? ScalarOrNull(LogEvent logEvent, string propertyName) =>
logEvent.Properties.TryGetValue(propertyName, out var value) && value is ScalarValue scalar
? scalar.Value?.ToString()
: null;
[Fact]
public void Active_activity_stamps_trace_id_and_span_id()
{
using var listener = new ActivityListener
{
ShouldListenTo = source => source.Name == SourceName,
Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);
using var activitySource = new ActivitySource(SourceName);
var sink = new InMemorySink();
using var logger = BuildLogger(sink);
using var activity = activitySource.StartActivity("unit-test");
Assert.NotNull(activity);
Assert.NotNull(Activity.Current);
// Capture IDs before the log call so assertions are not sensitive to activity
// lifecycle — Activity.Current may differ after the log call returns.
var expectedTraceId = activity.TraceId.ToString();
var expectedSpanId = activity.SpanId.ToString();
logger.Information("traced");
var logEvent = Assert.Single(sink.LogEvents);
Assert.Equal(expectedTraceId, ScalarOrNull(logEvent, "trace_id"));
Assert.Equal(expectedSpanId, ScalarOrNull(logEvent, "span_id"));
}
[Fact]
public void No_active_activity_omits_trace_id_and_span_id()
{
Assert.Null(Activity.Current);
var sink = new InMemorySink();
using var logger = BuildLogger(sink);
logger.Information("untraced");
var logEvent = Assert.Single(sink.LogEvents);
Assert.False(logEvent.Properties.ContainsKey("trace_id"));
Assert.False(logEvent.Properties.ContainsKey("span_id"));
}
}
@@ -0,0 +1,23 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Serilog.Sinks.InMemory" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Telemetry.Serilog\ZB.MOM.WW.Telemetry.Serilog.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,143 @@
using System.Diagnostics.Metrics;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.Telemetry.Tests;
public sealed class AddZbTelemetryTests
{
// Fix #2: empty ServiceName must throw ArgumentException --------------------------
[Fact]
public void AddZbTelemetry_Throws_WhenServiceNameIsEmpty()
{
var builder = WebApplication.CreateBuilder();
var ex = Assert.Throws<ArgumentException>(() =>
builder.AddZbTelemetry(o =>
{
o.ServiceName = ""; // explicitly empty
}));
Assert.Equal("configure", ex.ParamName);
}
[Fact]
public void AddZbTelemetry_Throws_WhenServiceNameIsWhitespace()
{
var builder = WebApplication.CreateBuilder();
var ex = Assert.Throws<ArgumentException>(() =>
builder.AddZbTelemetry(o =>
{
o.ServiceName = " ";
}));
Assert.Equal("configure", ex.ParamName);
}
// Fix #1: Prometheus coexists with OTLP — /metrics must still serve under Otlp exporter
[Fact]
public async Task AddZbTelemetry_OtlpExporter_StillServesPrometheusEndpoint()
{
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseUrls("http://127.0.0.1:0");
builder.AddZbTelemetry(o =>
{
o.ServiceName = "telemetry-test";
o.Exporter = ZbExporter.Otlp;
// OtlpEndpoint intentionally left null: ConfigureOtlp then leaves the exporter's default
// endpoint in place, and with no collector listening any export attempt fails without
// throwing. We are only verifying that Prometheus remains present.
o.Meters = ["Test.OtlpCoexist.Meter"];
});
await using var app = builder.Build();
app.MapZbMetrics();
await app.StartAsync();
var address = app.Urls.First();
using var client = new HttpClient { BaseAddress = new Uri(address) };
var response = await client.GetAsync("/metrics");
Assert.Equal(System.Net.HttpStatusCode.OK, response.StatusCode);
Assert.Equal("text/plain", response.Content.Headers.ContentType?.MediaType);
await app.StopAsync();
}
// Existing test ---------------------------------------------------------------
[Fact]
public void AddZbTelemetry_ExportsAppMeter_WithSharedResource()
{
// 1.15.x note: AddInMemoryExporter moved out of the core OpenTelemetry assembly into a
// separate OpenTelemetry.Exporter.InMemory package (not referenced here). We attach a
// BaseExporter<Metric> directly instead — it both collects metric names and exposes the
// MeterProvider Resource via ParentProvider.GetResource().
var capture = new CapturingMetricExporter();
var builder = WebApplication.CreateBuilder();
builder.AddZbTelemetry(o =>
{
o.ServiceName = "t";
o.SiteId = "site-test";
o.NodeRole = "central";
o.Meters = ["Test.Meter"];
});
// Compose a capturing reader onto the pipeline AddZbTelemetry already registered.
builder.Services.ConfigureOpenTelemetryMeterProvider(b =>
b.AddReader(new PeriodicExportingMetricReader(capture)
{
TemporalityPreference = MetricReaderTemporalityPreference.Cumulative,
}));
// Create the meter + instrument BEFORE the provider is built so the MeterProvider's
// listener subscribes to it during construction.
using var meter = new Meter("Test.Meter");
var counter = meter.CreateCounter<long>("test.events.count");
using var app = builder.Build();
var meterProvider = app.Services.GetRequiredService<MeterProvider>();
counter.Add(1);
meterProvider.ForceFlush();
// The app's meter was registered and its instrument was collected through the pipeline.
Assert.Contains("test.events.count", capture.MetricNames);
// The exported metric carries the shared Resource (identical to ZbResource.Build).
Assert.NotNull(capture.CapturedResource);
var attrs = capture.CapturedResource!.Attributes.ToDictionary(a => a.Key, a => a.Value);
Assert.Equal("t", attrs["service.name"]);
Assert.Equal("ZB.MOM.WW", attrs["service.namespace"]);
Assert.Equal("site-test", attrs["site.id"]);
Assert.Equal("central", attrs["node.role"]);
Assert.Equal(Environment.MachineName, attrs["host.name"]);
}
/// <summary>
/// Collects exported metric names and captures the MeterProvider Resource on first export so
/// the test can assert the pipeline wired both the app meter and the shared Resource.
/// </summary>
private sealed class CapturingMetricExporter : BaseExporter<Metric>
{
public List<string> MetricNames { get; } = [];
public Resource? CapturedResource { get; private set; }
public override ExportResult Export(in Batch<Metric> batch)
{
CapturedResource ??= ParentProvider?.GetResource();
foreach (var metric in batch)
{
MetricNames.Add(metric.Name);
}
return ExportResult.Success;
}
}
}
@@ -0,0 +1,38 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.Telemetry.Tests;
public sealed class MapZbMetricsTests
{
[Fact]
public async Task MapZbMetrics_ServesPrometheusEndpoint()
{
var builder = WebApplication.CreateBuilder();
builder.WebHost.UseUrls("http://127.0.0.1:0");
builder.AddZbTelemetry(o =>
{
o.ServiceName = "t";
o.Exporter = ZbExporter.Prometheus;
o.Meters = ["Test.Meter"];
});
await using var app = builder.Build();
app.MapZbMetrics();
await app.StartAsync();
var address = app.Urls.First();
using var client = new HttpClient { BaseAddress = new Uri(address) };
var response = await client.GetAsync("/metrics");
Assert.Equal(System.Net.HttpStatusCode.OK, response.StatusCode);
Assert.NotNull(response.Content.Headers.ContentType);
Assert.Equal("text/plain", response.Content.Headers.ContentType!.MediaType);
await app.StopAsync();
}
}
@@ -0,0 +1,28 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<IsPackable>false</IsPackable>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
<PackageReference Include="xunit" />
<PackageReference Include="xunit.runner.visualstudio" />
<PackageReference Include="Microsoft.AspNetCore.Mvc.Testing" />
</ItemGroup>
<ItemGroup>
<Using Include="Xunit" />
</ItemGroup>
<ItemGroup>
<!-- WebApplicationFactory requires the full ASP.NET Core shared framework -->
<FrameworkReference Include="Microsoft.AspNetCore.App" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\ZB.MOM.WW.Telemetry\ZB.MOM.WW.Telemetry.csproj" />
</ItemGroup>
</Project>
@@ -0,0 +1,73 @@
using OpenTelemetry.Resources;
using ZB.MOM.WW.Telemetry;
namespace ZB.MOM.WW.Telemetry.Tests;
public sealed class ZbResourceTests
{
[Fact]
public void Build_PopulatesAllResourceAttributes()
{
var options = new ZbTelemetryOptions
{
ServiceName = "otopcua",
ServiceNamespace = "ZB.MOM.WW",
ServiceVersion = "1.2.3",
SiteId = "site-7",
NodeRole = "central",
};
var resource = ZbResource.Build(options).Build();
var attributes = resource.Attributes.ToDictionary(a => a.Key, a => a.Value);
Assert.Equal("otopcua", attributes["service.name"]);
Assert.Equal("ZB.MOM.WW", attributes["service.namespace"]);
Assert.Equal("1.2.3", attributes["service.version"]);
Assert.Equal("site-7", attributes["site.id"]);
Assert.Equal("central", attributes["node.role"]);
Assert.Equal(Environment.MachineName, attributes["host.name"]);
// service.instance.id must be the deterministic MachineName:ProcessId — NOT a random GUID.
Assert.Equal(ZbResource.InstanceId, attributes["service.instance.id"]);
}
[Fact]
public void Build_OmitsOptionalAttributes_WhenNull()
{
var options = new ZbTelemetryOptions
{
ServiceName = "mxgateway",
// ServiceVersion / SiteId / NodeRole left null
};
var resource = ZbResource.Build(options).Build();
var keys = resource.Attributes.Select(a => a.Key).ToHashSet();
Assert.Contains("service.name", keys);
Assert.Contains("service.namespace", keys);
Assert.Contains("host.name", keys);
// service.instance.id is always present (deterministic, not optional).
Assert.Contains("service.instance.id", keys);
Assert.DoesNotContain("service.version", keys);
Assert.DoesNotContain("site.id", keys);
Assert.DoesNotContain("node.role", keys);
}
[Fact]
public void InstanceId_is_deterministic_MachineName_colon_ProcessId()
{
// InstanceId must be stable within the process and follow the MachineName:ProcessId format.
var expected = $"{Environment.MachineName}:{Environment.ProcessId}";
Assert.Equal(expected, ZbResource.InstanceId);
// Calling it twice returns the same value (no random component).
Assert.Equal(ZbResource.InstanceId, ZbResource.InstanceId);
}
[Fact]
public void InstanceId_does_not_contain_a_random_guid()
{
// The old OTel SDK default was a random GUID; the deterministic id must NOT be a GUID.
Assert.False(
Guid.TryParse(ZbResource.InstanceId, out _),
$"service.instance.id must not be a GUID; got '{ZbResource.InstanceId}'");
}
}
@@ -19,6 +19,9 @@ specs and analyses that *drive* changes made in the individual repos.
|---|---|---|---|---|
| Auth (login / identity / authz) | Draft | OtOpcUa, MxAccessGateway, ScadaBridge | Path to shared code (`ZB.MOM.WW.Auth`) | [`auth/`](auth/) |
| UI Theme (layout / tokens / components) | Draft | OtOpcUa, MxAccessGateway, ScadaBridge | Path to shared code (`ZB.MOM.WW.Theme`) | [`ui-theme/`](ui-theme/) |
| Health (readiness / liveness / active-node) | Draft | OtOpcUa, MxAccessGateway, ScadaBridge | Shared `ZB.MOM.WW.Health` lib (3 packages) | [`health/`](health/) |
| Observability (metrics / traces / logs) | Draft | OtOpcUa, MxAccessGateway, ScadaBridge | Shared `ZB.MOM.WW.Telemetry` lib (2 packages) | [`observability/`](observability/) |
| Audit (event model + writer seam) | Draft | OtOpcUa, MxAccessGateway, ScadaBridge | Path to shared code (`ZB.MOM.WW.Audit`) | [`audit/`](audit/) |
> Add a row when you start normalizing a new component. Status: `Draft` → `Reviewed` → `Adopting` → `Converged`.
@@ -0,0 +1,114 @@
# Audit — gaps & adoption backlog
Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach the shared `ZB.MOM.WW.Audit` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
> **Adoption is deferred this round.** The library is being designed (shared contract in
> [`shared-contract/ZB.MOM.WW.Audit.md`](shared-contract/ZB.MOM.WW.Audit.md)) but is not yet
> wired into any app — exactly where `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` sit today.
> The items below are the follow-on work; each lands as a separate PR per project.
## Divergence vs spec
### §1 Canonical record (`AuditEvent`)
| Canonical field | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| `EventId` (Guid, required) | ✅ — idempotency key; buffer key + filtered-unique DB index | ⛔ — no event key; only an `AUTOINCREMENT` rowid (`AuditId`) | ✅ — direct |
| `OccurredAtUtc` (DateTimeOffset, required) | 🟡 — `DateTime` UTC; widen at mapping boundary | 🟡 — `DateTimeOffset` but store-assigned (not caller-supplied); direct after widening | 🟡 — `DateTime` UTC-forced; widen at mapping boundary |
| `Actor` (string, required) | ✅ — direct (`AuditEvent.Actor` → `ConfigAuditLog.Principal`) | 🟡 — `KeyId` nullable; keyless events (`init-db`/`list-keys`) need a `"system"`/`"cli"` fallback | 🟡 — nullable on system-originated rows; fallback needed |
| `Action` (string, required) | 🟡 — `Action` field exists, but persisted as `"{Category}:{Action}"` composite in `EventType`; canonical keeps them separate | ✅ — `EventType` literal direct | 🟡 — derived as `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
| `Outcome` (AuditOutcome, required) | ⛔ **NEW** — derived from `EventType` vocabulary; not stored today | ⛔ **NEW** — derived: `constraint-denied` → `Denied`, else `Success` | ⛔ **NEW** — derived from `Status` (+`InboundAuthFailure` Kind → `Denied`) |
| `Category` (string?) | ✅ — `AuditEvent.Category` (e.g. `"Config"`) | ⛔ — no field; constant `"ApiKey"` at mapping | ✅ — `Channel` |
| `Target` (string?) | ⛔ — no dedicated field; closest is `DetailsJson` | ⛔ — embedded in `Details` text (`commandKind`/`target`) | ✅ — direct |
| `SourceNode` (string?) | ✅ — `SourceNode` (logical cluster node / host name, NOT an OPC UA NodeId) | 🟡 — `RemoteAddress`; dashboard path only (null on CLI/constraint paths) | ✅ — direct |
| `CorrelationId` (Guid?) | ✅ — direct (`CorrelationId.Value`) | ⛔ — not captured today; left null | ✅ — direct |
| `DetailsJson` (string?) | ✅ — direct (JSON CHECK constraint enforced) | 🟡 — `Details` is a plain string, not JSON; wrap or store as-is | 🟡 — ~15 rich/plumbing fields serialize here at the cross-project reporting boundary |
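
The field list in the table above could be sketched as a C# record along the following lines. This is a minimal illustration, not the definition from `spec/SPEC.md`: the use of `required`/`init` members, the `AuditOutcome` member names (taken from the values mentioned in §4), and the `AuditTime.WidenUtc` helper are all assumptions made for the example. The helper shows one way to do the "widen at mapping boundary" step for projects that hold a UTC `DateTime`.

```csharp
using System;

// Assumed member names, based on the Success / Denied / Failure values in §4.
public enum AuditOutcome { Success, Denied, Failure }

// Hypothetical shape of the canonical record: five required fields, five optional.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

// Hypothetical helper for the "widen at mapping boundary" rows.
public static class AuditTime
{
    // Relabels the value as UTC (no clock conversion) and wraps it with a zero offset.
    // Only correct when the incoming DateTime already represents a UTC instant.
    public static DateTimeOffset WidenUtc(DateTime utc) =>
        new DateTimeOffset(DateTime.SpecifyKind(utc, DateTimeKind.Utc));
}
```

Making the five mandatory fields `required` means a mapping that forgets `Outcome` or `EventId` fails at compile time rather than producing a half-filled row.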
### §2 `IAuditWriter` seam
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no `IAuditWriter`; `AuditWriterActor` is the sink, consumed directly via Akka messaging | ⛔ — `IApiKeyAuditStore` (narrow, two-method) is the seam; no general `IAuditWriter` | ✅ — `IAuditWriter` with `WriteAsync(AuditEvent, CancellationToken)` signature; "failures must NEVER abort the user-facing action" contract; best-effort |
| Best-effort / never throws | 🟡 — the actor drops a failed flush (best-effort), but the seam is not a typed interface a caller can inject independently | ⛔ — no contract; `AppendAsync` may propagate | ✅ |
| Record type at the seam | 🟡 — OtOpcUa's own `AuditEvent` (8 fields, with Commons value-types `NodeId`/`CorrelationId`) | ⛔ — `ApiKeyAuditEntry` (4 fields) | 🟡 — ScadaBridge's ~25-field `AuditEvent` (rich record; adoption = keep own record, adopt canonical interface name + `AuditOutcome`) |
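
To make the "best-effort / never throws" row concrete, here is a rough sketch of the seam plus a decorator that enforces the contract around any inner writer. Only the `WriteAsync(AuditEvent, CancellationToken)` parameter list comes from the table; the `Task` return type, the trimmed `AuditEvent` placeholder, and the `BestEffortAuditWriter` class with its `onFailure` callback are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Trimmed placeholder so the sketch compiles on its own; not the canonical record.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    // Return type assumed; the table only gives the parameter list.
    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken);
}

// Hypothetical decorator: a failing audit write must never abort the user-facing action.
public sealed class BestEffortAuditWriter : IAuditWriter
{
    private readonly IAuditWriter _inner;
    private readonly Action<Exception>? _onFailure;

    public BestEffortAuditWriter(IAuditWriter inner, Action<Exception>? onFailure = null)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _onFailure = onFailure;
    }

    public async Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken)
    {
        try
        {
            await _inner.WriteAsync(auditEvent, cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Swallow everything, including cancellation, so callers never see a failure.
            // The optional callback lets the host count or log the dropped write.
            try { _onFailure?.Invoke(ex); } catch { /* the callback must not throw either */ }
        }
    }
}
```

A wrapper like this would let a store whose append call may propagate exceptions satisfy the contract without changing the store itself.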
### §3 `IAuditRedactor` seam
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no redactor; no payload filtering today | ⛔ — no redactor; safety by construction (entry type cannot carry a secret) | ✅ — `IAuditPayloadFilter` (`AuditEvent Apply(AuditEvent)`, pure/never-throws/over-redacts); **only the name differs** from canonical `IAuditRedactor` |
| Over-redacts on failure | ⛔ — n/a | ⛔ — n/a | ✅ — `SafeDefaultAuditPayloadFilter` is the reference |
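
The "over-redacts on failure" behaviour can be illustrated with a small wrapper. The `Apply(AuditEvent)` shape is borrowed from ScadaBridge's `IAuditPayloadFilter` as described in the table; whether the canonical `IAuditRedactor` uses the same method name is an assumption, as are the trimmed `AuditEvent` placeholder, the `OverRedactingRedactor` class, and the replacement payload.

```csharp
using System;

// Trimmed placeholder so the sketch compiles on its own; not the canonical record.
public sealed record AuditEvent(string Actor, string Action, string? DetailsJson);

public interface IAuditRedactor
{
    // Method name assumed to mirror IAuditPayloadFilter.Apply.
    AuditEvent Apply(AuditEvent auditEvent);
}

// Hypothetical wrapper: if the real redaction logic throws, drop the whole payload
// rather than risk persisting an unredacted secret.
public sealed class OverRedactingRedactor : IAuditRedactor
{
    public const string RedactedDetails = "{\"redacted\":true}";

    private readonly Func<AuditEvent, AuditEvent> _inner;

    public OverRedactingRedactor(Func<AuditEvent, AuditEvent> inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public AuditEvent Apply(AuditEvent auditEvent)
    {
        try
        {
            return _inner(auditEvent);
        }
        catch (Exception)
        {
            // Never throws: the identifying fields survive, the payload does not.
            return auditEvent with { DetailsJson = RedactedDetails };
        }
    }
}
```

The trade-off is deliberate: a redaction bug costs some diagnostic detail, but never leaks the payload it failed to inspect.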
### §4 `AuditOutcome` — the new normalized field
`Outcome` is a **genuinely new field** across all three projects. No app stores it today;
each encodes it implicitly. All three must derive and emit it at adoption:
**Gap O1 (OtOpcUa):** derive from `EventType` vocabulary — `OpcUaAccessDenied` /
`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`. No `Failure`
value exists in OtOpcUa's vocabulary today (failed flushes are dropped, not emitted), so
OtOpcUa will produce only `Success` / `Denied` until/unless failure events are added.
**Gap O2 (MxGateway):** derive — `constraint-denied` → `Denied`; all others → `Success`.
No `Failure` events are emitted today.
**Gap O3 (ScadaBridge):** derive from `AuditStatus`: `Delivered` → `Success`;
`Failed` / `Parked` / `Discarded` → `Failure`; `Kind = InboundAuthFailure` → `Denied`.
In-flight states (`Submitted` / `Forwarded` / `Attempted`) collapse to the last-known
terminal state when projecting; `Skipped` is excluded from the canonical projection.
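
The three derivations above could be sketched as pure functions. Everything here is illustrative: the real projects presumably use their own enums rather than strings, `OutcomeDerivation` and its method names are hypothetical, the OtOpcUa input is assumed to be the bare event name rather than the persisted `{Category}:{Action}` composite, and checking `Kind` before `Status` for ScadaBridge is an assumed precedence.

```csharp
// Assumed member names, based on the Success / Denied / Failure values used above.
public enum AuditOutcome { Success, Denied, Failure }

public static class OutcomeDerivation
{
    // Gap O1: the two access-denial event types map to Denied; everything else
    // (the config-write verbs) maps to Success. No Failure is produced today.
    public static AuditOutcome FromOtOpcUa(string eventType) =>
        eventType is "OpcUaAccessDenied" or "CrossClusterNamespaceAttempt"
            ? AuditOutcome.Denied
            : AuditOutcome.Success;

    // Gap O2: only constraint-denied is a denial.
    public static AuditOutcome FromMxGateway(string eventType) =>
        eventType == "constraint-denied" ? AuditOutcome.Denied : AuditOutcome.Success;

    // Gap O3: returns null for in-flight states (the caller would fall back to the
    // last-known terminal state) and for Skipped (excluded from the projection).
    public static AuditOutcome? FromScadaBridge(string status, string kind)
    {
        if (kind == "InboundAuthFailure")
        {
            return AuditOutcome.Denied;
        }

        return status switch
        {
            "Delivered" => AuditOutcome.Success,
            "Failed" or "Parked" or "Discarded" => AuditOutcome.Failure,
            _ => (AuditOutcome?)null,
        };
    }
}
```

Keeping the derivation in one pure function per project makes it straightforward to unit-test the vocabulary mapping separately from the emit sites.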
### §5 `Actor` → Auth principal
At adoption, every emit site should supply the `ZB.MOM.WW.Auth` principal as `Actor`
(string). The library carries no Auth dependency, since `Actor` is a plain `string`, but the
semantic goal is that the audited actor is the identity Auth established.
**Gap P1 (all 3):** at adoption, update emit sites to populate `Actor` from the Auth
principal (LDAP user / API-key name). Auth adoption (#8 in `components/auth/GAPS.md`) is a
prerequisite for the full story; until then, use the existing actor string.
### §6 OtOpcUa two-producer problem
OtOpcUa has **two writers to `ConfigAuditLog`**: the structured Akka `AuditEvent` path AND
older SQL stored procedures that `INSERT` directly (bare `EventType`, NULL `EventId` /
`CorrelationId`, populated `ClusterId` / `GenerationId`). Normalization targets the
structured path only; the SP path stays per-project.
**Gap Q1 (OtOpcUa):** decide at adoption whether to route SP events through the actor
or leave them non-idempotent. Also: the `ClusterId`-filter / actor-never-sets-`ClusterId`
mismatch (Admin UI `ClusterAudit.razor` filters by `ClusterId`, but the actor path sets
`NodeId` not `ClusterId`, so structured rows are invisible to the cluster view). Fix when
normalizing the query surface.
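
The `ClusterId` mismatch in Gap Q1 can be reproduced in miniature. The sketch below is an in-memory illustration only: `AuditRow` is a hypothetical three-column stand-in for `ConfigAuditLog`, and the filter mirrors the cluster-scoped query described above rather than the actual Razor code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<AuditRow>
{
    // Stored-procedure path: bare EventType, ClusterId populated.
    new("Published", ClusterId: "cluster-1", NodeId: null),
    // Actor path: "Category:Action" EventType, NodeId populated, ClusterId left NULL.
    new("Config:Edit", ClusterId: null, NodeId: "node-a"),
};

// A cluster-scoped view that filters on ClusterId never sees the actor-path row.
var visible = rows.Where(r => r.ClusterId == "cluster-1").ToList();
Console.WriteLine(visible.Count);        // 1
Console.WriteLine(visible[0].EventType); // Published

// Hypothetical stand-in for the relevant ConfigAuditLog columns.
public sealed record AuditRow(string EventType, string? ClusterId, string? NodeId);
```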
## Adoption backlog (ordered)
| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | **OtOpcUa:** make `AuditWriterActor` implement `IAuditWriter`; replace `Commons/Messages/Audit/AuditEvent.cs` with canonical record; add `Outcome` derivation at every emit site (Gap O1) | OtOpcUa | Med | M | Med | Actor internals (batching / dedup / flush triggers) stay bespoke; only the seam type and record change. Commons value-types `NodeId`/`CorrelationId` bridged at construction. |
| 2 | **MxGateway:** map `IApiKeyAuditStore` / `ApiKeyAuditEntry` / `ApiKeyAuditRecord` → `IAuditWriter` / `AuditEvent`; generate `EventId` per write; add `"system"`/`"cli"` Actor fallback; constant `Category = "ApiKey"`; `constraint-denied` → `Outcome.Denied` (Gaps O2, record gaps) | MxGateway | Low | S | Med | ⚠ **COORDINATE** — a parallel session is editing this repo for the MEL→Serilog migration (Health/Telemetry normalization). Do NOT start until the Serilog session has landed (or is explicitly fenced off); the two efforts share `Security/Authentication/` DI wiring. |
| 3 | **ScadaBridge:** rename `IAuditPayloadFilter` → `IAuditRedactor` (or alias during transition); adopt canonical `AuditOutcome` enum (Gap O3); confirm writer contract matches (already byte-for-byte) | ScadaBridge | Low | S | High | **"Align, don't replace."** Blast radius is HIGH — `IAuditPayloadFilter` is used across the entire pipeline (site, central, wiring). Rename + alias only; no transport/storage/record change. `DefaultAuditPayloadFilter` / `SafeDefaultAuditPayloadFilter` implementations unchanged. |
| 4 | **All:** populate `Actor` from `ZB.MOM.WW.Auth` principal at emit sites (Gap P1) | All 3 | Low | S | Low | **Prerequisite:** Auth adoption per `components/auth/GAPS.md` #8. Until Auth is adopted, leave the existing actor string as-is. |
| 5 | **OtOpcUa:** reconcile two-producer problem — decide SP path routing + fix `ClusterId`-filter / actor mismatch in `ClusterAudit.razor` (Gap Q1) | OtOpcUa | Low | S | Low | Normalization does not unify the SP path; this is a reconcile item to decide and document. The mismatch means structured `AuditEvent` rows are currently invisible to the cluster-scoped view. |
| 6 | **MxGateway:** add `CorrelationId` capture at constraint denial + dashboard paths; structured `Target` from `Details` text (currently embedded as a plain string in `ConstraintEnforcer`) | MxGateway | Low | S | Low | Nice-to-have parity; not required for adoption. `CorrelationId` and `Target` canonical fields left null until this is done. |
**Sequencing:** #3 (ScadaBridge rename) is the simplest change and self-contained, but its blast radius is rated High in the table above; do it first (or
last, depending on blast-radius appetite). #1 (OtOpcUa) is medium effort but independent; it
can start once the shared library is built. #2 (MxGateway) is the smallest code change but
has the highest **coordination dependency** — gate it on the Serilog migration landing first.
#4 (Actor→Auth) is blocked on Auth adoption and is the last to close. #5 and #6 are cleanup
items with no bearing on shared-library adoption.
Each adoption lands as an opt-in version bump per project behind the seam; the shared library
is consumed but the bespoke transport/storage/UI for each project is not touched.
## Decisions still open
- ScadaBridge `IAuditPayloadFilter` → `IAuditRedactor`: outright rename vs. transitional alias
(both are valid; alias reduces blast radius in the short term).
- MxGateway `Details` plain string → `DetailsJson`: store as-is or wrap in a JSON object at
the mapping boundary.
- `AuditOutcome` column in OtOpcUa storage: add a new `Outcome` column to `ConfigAuditLog`
or fold into `DetailsJson` / derive at read time (schema change vs. runtime cost).
- OtOpcUa SP path: route through the actor path (unified producer) or leave as a bespoke
secondary writer with its own column conventions (separate reconcile effort).
@@ -0,0 +1,72 @@
# Audit (who-did-what)
Status: **Draft**. Normalized component — path to shared code. Goal: converge the three
sister projects onto a canonical `AuditEvent` record + `AuditOutcome` enum + two thin seams
(`IAuditWriter`, `IAuditRedactor`), proposed as the `ZB.MOM.WW.Audit` library, while each
project keeps its own transport, storage, domain vocabulary, and redaction policy.
- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- Canonical event model + field reference: [`spec/EVENT-MODEL.md`](spec/EVENT-MODEL.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Audit.md`](shared-contract/ZB.MOM.WW.Audit.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)
## Why audit is a strong normalization candidate
All three projects record a structured who-did-what trail with an actor identity, an action
verb, and a timestamp. Two (OtOpcUa + ScadaBridge) already have a named `AuditEvent` record
with an `EventId` idempotency key, `Actor`, and `CorrelationId`. ScadaBridge already ships
**both** canonical seams under slightly different names (`IAuditWriter` is byte-for-byte the
spec; `IAuditPayloadFilter` is the canonical `IAuditRedactor`). OtOpcUa's record is almost
field-for-field aligned. MxGateway has a narrow API-key-lifecycle log that maps cleanly.
The one new field across all three is `AuditOutcome` — no project stores it explicitly today;
each encodes it implicitly and derives it at adoption. This is the bulk of the per-project
work. Transport, storage, domain vocabulary, and redaction policy are **not** unified — each
project keeps its own bespoke implementation behind the seam.
**Audit closes the loop on Auth.** Every audit row's `Actor` is exactly the identity that the
`ZB.MOM.WW.Auth` component normalizes (LDAP/GLAuth principal, API-key name). The library keeps
`Actor` as a plain `string` (no Auth dependency), but at adoption each emit site supplies the
Auth principal.
**`IAuditRedactor` naming is aligned with Telemetry's `ILogRedactor`** — same shape and naming
discipline so a future `ZB.MOM.WW.Hosting` aggregator wires both redactors with one mental
model — but there is no cross-package dependency between the two libraries.
## Status by project
| Project | Audit today | Seams present | `AuditOutcome` | Adoption status |
|---|---|---|---|---|
| **OtOpcUa** | Akka cluster-broadcast `AuditEvent` → cluster-singleton `AuditWriterActor` (batch 500/5 s, two-layer dedup) over EF `ConfigAuditLog` (SQL Server). Also a legacy SQL stored-procedure write path (bare `EventType`, NULL `EventId`). Admin UI page `ClusterAudit.razor`. | No named `IAuditWriter` seam; no redactor seam. | Not stored — encoded in `EventType` strings (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`). | Not started |
| **MxAccessGateway** | Single SQLite-backed `IApiKeyAuditStore` / `ApiKeyAuditEntry` — key lifecycle (CLI + dashboard) + constraint denials only. No authn events persisted; no production read consumer. | Narrow custom seam (`IApiKeyAuditStore`); no general `IAuditWriter`; redaction is by-construction (secret never enters the record type). | Not stored — derived: `constraint-denied` → `Denied`; all others → `Success`. | Not started |
| **ScadaBridge** | Full pipeline: site SQLite hot-path (`SqliteAuditWriter` + ring-buffer fallback) → Akka `ClusterClient` forwarder → central MS SQL (ingest / reconcile / purge / partition maintenance). Rich ~25-field `AuditEvent` record. CLI `export`/`verify-chain`; Blazor audit UI. | ✅ `IAuditWriter` (matches canonical contract word-for-word); ✅ `IAuditPayloadFilter` (= canonical `IAuditRedactor`, identical signature, pure/never-throws/over-redacts). | Not stored explicitly — derived from `Status` (`Delivered` → `Success`; `Failed`/`Parked`/`Discarded` → `Failure`; `Kind = InboundAuthFailure` → `Denied`). | Not started (align, don't replace) |
See each project's `current-state/<project>/CURRENT-STATE.md` for code-verified detail and
adoption plan:
- [`current-state/otopcua/CURRENT-STATE.md`](current-state/otopcua/CURRENT-STATE.md)
- [`current-state/mxaccessgw/CURRENT-STATE.md`](current-state/mxaccessgw/CURRENT-STATE.md)
- [`current-state/scadabridge/CURRENT-STATE.md`](current-state/scadabridge/CURRENT-STATE.md)
## Normalized vs. left per-project
**Normalized (the shared `ZB.MOM.WW.Audit` library):** the canonical `AuditEvent` record
(5 required fields + 4 optional common + `DetailsJson` extension bag); the `AuditOutcome`
enum (`Success | Failure | Denied`); the `IAuditWriter` seam (best-effort, never throws to
caller); the `IAuditRedactor` seam (pure, never throws, over-redacts on failure); shipped
helpers (`NoOpAuditWriter`, `CompositeAuditWriter`, `RedactingAuditWriter`,
`NullAuditRedactor`, `TruncatingAuditRedactor`). Library has no Akka / EF / SQLite / Serilog
dependency; its only non-BCL dependency is `Microsoft.Extensions.DependencyInjection.Abstractions`.
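
A minimal sketch of the normalized surface, assuming the field list given in this document. The positional record shape, the `WriteAsync` method name and signature, and the `DropDetailsRedactor` / `ConsoleAuditWriter` classes are illustrative assumptions; only `AuditEvent Apply(AuditEvent)` and the field and enum names are taken from the text.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

IAuditRedactor redactor = new DropDetailsRedactor();
IAuditWriter writer = new ConsoleAuditWriter();

var evt = new AuditEvent(Guid.NewGuid(), DateTimeOffset.UtcNow, "alice", "create-key",
    AuditOutcome.Success, Category: "ApiKey", DetailsJson: "{\"note\":\"example\"}");

// Redact first, then hand the event to the best-effort writer.
await writer.WriteAsync(redactor.Apply(evt));

public enum AuditOutcome { Success, Failure, Denied }

// Five required fields, four optional common fields, and the DetailsJson extension bag.
public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action, AuditOutcome Outcome,
    string? Category = null, string? Target = null, string? SourceNode = null,
    Guid? CorrelationId = null, string? DetailsJson = null);

public interface IAuditWriter
{
    // Assumed signature: best-effort, must never throw to the caller.
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

public interface IAuditRedactor
{
    // Pure and never throws; when in doubt it removes more rather than less.
    AuditEvent Apply(AuditEvent evt);
}

// Illustrative redactor: the bluntest safe policy is to drop the payload entirely.
public sealed class DropDetailsRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent evt) => evt with { DetailsJson = null };
}

// Illustrative writer: swallows its own failures so auditing cannot break the caller.
public sealed class ConsoleAuditWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try { Console.WriteLine($"{evt.OccurredAtUtc:O} {evt.Actor} {evt.Action} {evt.Outcome}"); }
        catch (Exception) { /* best-effort: an audit failure is not the caller's problem */ }
        return Task.CompletedTask;
    }
}
```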
**Left per-project (each project keeps these behind the seam):** transport and storage (Akka
singleton + EF/SQL Server; SQLite; site-SQLite + central MS SQL + forwarding/reconcile
pipeline); domain vocabulary (`EventType` strings / API-key event-type literals / `Channel` +
`Kind` + `Status` enums); query, CLI, and UI surfaces (`ClusterAudit.razor`; `ListRecentAsync`;
`export` / `verify-chain`; Blazor audit pages); redaction *policy* (which fields/payloads are
sensitive — only the `IAuditRedactor` *seam* is shared).
> **Adoption is deferred this round.** The `ZB.MOM.WW.Audit` library is being designed and
> the shared contract defined, but none of the three apps wire it in yet — exactly where
> `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` sit today. The per-project adoption backlog is in
> [`GAPS.md`](GAPS.md).
@@ -0,0 +1,118 @@
# Audit — current state: MxAccessGateway (`mxaccessgw`)
Repo: `~/Desktop/MxAccessGateway` (Gitea `mxaccessgw`). Stack: .NET 10 gateway (x64) + x86/net48 worker.
Audit lives entirely in the **gateway** (.NET 10); the worker records nothing.
All paths relative to repo root; audit code under `src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/`. Verified 2026-06-01.
This is the **narrowest** of the three implementations: a single SQLite-backed append-only log scoped
to **API-key lifecycle and constraint denials**. There is no general-purpose audit abstraction, no
separate redaction seam, and no CorrelationId. Read-back exists but has no production consumer today.
## How it works today
The audit log is one seam, `IApiKeyAuditStore`
(`src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/IApiKeyAuditStore.cs:6`), with exactly two
operations: `AppendAsync(ApiKeyAuditEntry, ...)` (`IApiKeyAuditStore.cs:14`) and
`ListRecentAsync(int count, ...)` (`IApiKeyAuditStore.cs:22`). Single implementation,
`SqliteApiKeyAuditStore` (`SqliteApiKeyAuditStore.cs:5`), registered as a singleton in
`AuthStoreServiceCollectionExtensions.cs:23` alongside the rest of the auth stores.
- **Append-side shape:** callers pass `ApiKeyAuditEntry(string? KeyId, string EventType, string? RemoteAddress, string? Details)`
(`ApiKeyAuditEntry.cs:3`). The store sets the timestamp itself — `AppendAsync` writes
`created_utc = DateTimeOffset.UtcNow.ToString("O")` (`SqliteApiKeyAuditStore.cs:20`), so the caller
cannot supply the time and there is **no idempotency/event key** (the only identity is the DB
`AUTOINCREMENT` rowid).
- **Read-side shape:** `ListRecentAsync` returns `ApiKeyAuditRecord(long AuditId, string? KeyId, string EventType, string? RemoteAddress, DateTimeOffset CreatedUtc, string? Details)`
(`ApiKeyAuditRecord.cs:3`), ordered `audit_id DESC LIMIT $count` (`SqliteApiKeyAuditStore.cs:38-42`),
returning `[]` for `count <= 0` (`SqliteApiKeyAuditStore.cs:29-32`).
- **Storage:** SQLite, the same gateway-owned auth DB (`AuthSqliteConnectionFactory`, WAL; default
`C:\ProgramData\MxGateway\gateway-auth.db`). Table `api_key_audit` is created by
`SqliteAuthStoreMigrator.cs:95-102`: `audit_id INTEGER PRIMARY KEY AUTOINCREMENT, key_id TEXT NULL,
event_type TEXT NOT NULL, remote_address TEXT NULL, created_utc TEXT NOT NULL, details TEXT NULL`,
plus index `ix_api_key_audit_key_id_created_utc` (`SqliteAuthStoreMigrator.cs:107-108`). Table name
constant `SqliteAuthSchema.ApiKeyAuditTable = "api_key_audit"` (`SqliteAuthSchema.cs:11`). The log is
append-only: there is no update/delete/prune path.
- **Producers (three, all in the gateway):**
- **Admin CLI** `ApiKeyAdminCliRunner` — its private `AppendAuditAsync` (`ApiKeyAdminCliRunner.cs:153`)
always passes `RemoteAddress: null` (`ApiKeyAdminCliRunner.cs:163`). Event types:
`"init-db"` (`:48`), `"create-key"` (`:74`), `"list-keys"` (`:83`),
`"revoke-key"` with details `revoked`/`not-found-or-already-revoked` (`:102`),
`"rotate-key"` with details `rotated`/`not-found` (`:121`).
- **Dashboard** `DashboardApiKeyManagementService` — its `AppendAuditAsync` (`:197`) captures
`RemoteAddress: httpContextAccessor.HttpContext?.Connection.RemoteIpAddress?.ToString()` (`:207`).
Event types: `"dashboard-create-key"` (`:62`), `"dashboard-revoke-key"` (`:103`, details
`revoked`/`not-found-or-already-revoked`), `"dashboard-rotate-key"` (`:145`, details `rotated`/`not-found`),
`"dashboard-delete-key"` (`:187`, details `deleted`/`not-found-or-active`).
- **Constraint denials** `ConstraintEnforcer.RecordDenialAsync` (`ConstraintEnforcer.cs:117`) writes
`EventType: "constraint-denied"`, `RemoteAddress: null`, and `Details:
$"{commandKind}: {target}: {failure.ConstraintName}: {failure.Message}"` (`ConstraintEnforcer.cs:124-129`).
This is the only "denial" event in the log.
- **No authn events.** The verifier (`ApiKeyVerifier`) and the gRPC authorization interceptor
(`GatewayGrpcAuthorizationInterceptor`) do **not** write to the audit store — authentication
success/failure and `Unauthenticated`/`PermissionDenied` outcomes are surfaced as gRPC statuses and
(per policy) discriminated for logging, but are not persisted as audit rows. So in practice the log
records **key lifecycle (CLI + dashboard) + constraint denials**, not per-request authn outcomes.
- **No separate redaction seam — scrubbing is structural, in the store/entry shape.** There is no
redactor, scrubber, sanitizer, or masking helper. Safety comes from *what the entry type can carry*:
`ApiKeyAuditEntry` has no field for a secret, and every caller passes only a `KeyId` (the public
key identifier, never the secret), an event-type literal, and short hand-built `Details` strings —
the secret/pepper never enters the audit path. This aligns with the repo policy that "API keys,
passwords, `WriteSecured` payloads, and `AuthenticateUser` credentials must never reach logs"
(`CLAUDE.md:79`). Net: redaction is by construction, not a pluggable seam.
- **Read-back has no production consumer.** `ListRecentAsync` is called only by tests
(`SqliteAuthStoreTests`, `ApiKeyAdminCliRunnerTests`). The dashboard `ApiKeysPage.razor` mentions the
audit log only in a delete-confirmation string (`ApiKeysPage.razor:321`) — it does **not** render it.
There is no UI or RPC that surfaces audit history today.
## Mapping to the canonical record
Target: `ZB.MOM.WW.Audit`'s `AuditEvent { Guid EventId; DateTimeOffset OccurredAtUtc; string Actor;
string Action; AuditOutcome Outcome; string? Category; string? Target; string? SourceNode;
Guid? CorrelationId; string? DetailsJson; }` with `AuditOutcome ∈ { Success, Failure, Denied }`.
| `AuditEvent` field | Source today | Mapping note |
|---|---|---|
| `EventId` (Guid, required) | — none — | **Must be generated** at write time. `ApiKeyAuditRecord` has only the autoincrement `AuditId` (`ApiKeyAuditRecord.cs:4`); no idempotency key exists. |
| `OccurredAtUtc` (required) | `CreatedUtc` (`ApiKeyAuditRecord.cs:8`), set as `DateTimeOffset.UtcNow` in the store (`SqliteApiKeyAuditStore.cs:20`) | Direct. Note: time is store-assigned today, not caller-supplied. |
| `Actor` (required) | `KeyId` (`ApiKeyAuditRecord.cs:5`) | Nullable today (`init-db`/`list-keys` pass `null`); the canonical `Actor` is required, so a fallback (e.g. `"system"`/`"cli"`) is needed for keyless events. |
| `Action` (required) | `EventType` (`ApiKeyAuditRecord.cs:6`) | Direct. CLI vocab: `init-db`, `create-key`, `list-keys`, `revoke-key`, `rotate-key`; dashboard vocab: `dashboard-create-key`, `dashboard-revoke-key`, `dashboard-rotate-key`, `dashboard-delete-key`; plus `constraint-denied`. |
| `Outcome` (required) | derived | `constraint-denied` → `Denied`; everything else → `Success` (no `Failure` events are emitted today). |
| `Category` | — none — | Constant `"ApiKey"`. |
| `Target` | — none as a field — | No structured target. (`ConstraintEnforcer` does embed `commandKind`/`target` inside `Details` text, but there is no dedicated column.) |
| `SourceNode` | `RemoteAddress` (`ApiKeyAuditRecord.cs:7`) | Direct; populated only on the dashboard path (`DashboardApiKeyManagementService.cs:207`), `null` on CLI/constraint paths. |
| `CorrelationId` | — none — | Not captured today. |
| `DetailsJson` | `Details` (`ApiKeyAuditRecord.cs:9`) | Today this is a **plain string**, not JSON; either store as-is in `DetailsJson` or wrap as a small JSON object. |
---
## Adoption plan → `ZB.MOM.WW.Audit`
**Effort: LOW.** The seam is tiny (one interface, two methods, one record pair) and the data already
maps cleanly onto `AuditEvent`. Concretely:
1. **Adapter, not rewrite.** Map `IApiKeyAuditStore` → the shared `IAuditWriter`, and
`ApiKeyAuditEntry`/`ApiKeyAuditRecord` → `AuditEvent`, using the table above: generate a new
`EventId` Guid per write; `KeyId → Actor` (with a `"system"` fallback for null); `EventType → Action`;
`CreatedUtc → OccurredAtUtc`; `RemoteAddress → SourceNode`; `constraint-denied → Outcome.Denied`,
else `Success`; constant `Category = "ApiKey"`; `Details → DetailsJson`. The three producers
(`ApiKeyAdminCliRunner`, `DashboardApiKeyManagementService`, `ConstraintEnforcer`) keep their call
sites — only the injected type changes.
2. **Redaction stays by-construction.** No separate redactor needs porting; just preserve the rule that
callers never put secrets in `DetailsJson` (mirrors `CLAUDE.md:79`). The shared writer can keep its
own redaction policy as a defence-in-depth layer.
3. **Read-back is free to drop or defer.** `ListRecentAsync` has no production consumer, so the adapter
need not implement a shared query API on day one — only the test/CLI read paths exercise it.
4. **No new dimensions required.** `CorrelationId` and a structured `Target` are absent today and are
*not* in scope to add as part of adoption (descriptive parity only); the canonical record simply
leaves them `null`.
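
The mapping in step 1 might look like the sketch below. `ApiKeyAuditEntry` reuses the parameter list quoted earlier in this file; `ApiKeyAuditMapper`, the positional `AuditEvent` stand-in, and the caller-supplied timestamp are hypothetical. Wrapping `Details` in a `{"text": ...}` object is one of the two options still listed as open, chosen here only so that `DetailsJson` holds valid JSON.

```csharp
#nullable enable
using System;
using System.Text.Json;

var denied = ApiKeyAuditMapper.ToAuditEvent(
    new ApiKeyAuditEntry(KeyId: null, EventType: "constraint-denied", RemoteAddress: null,
        Details: "Write: Tag1: MaxRate: limit exceeded"),
    occurredAtUtc: DateTimeOffset.UtcNow);

Console.WriteLine($"{denied.Actor} {denied.Action} {denied.Outcome}"); // system constraint-denied Denied

public enum AuditOutcome { Success, Failure, Denied }

// Compact stand-in for the canonical record.
public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action, AuditOutcome Outcome,
    string? Category = null, string? Target = null, string? SourceNode = null,
    Guid? CorrelationId = null, string? DetailsJson = null);

// Same parameter list as the existing append-side entry type.
public sealed record ApiKeyAuditEntry(string? KeyId, string EventType, string? RemoteAddress, string? Details);

public static class ApiKeyAuditMapper
{
    public static AuditEvent ToAuditEvent(ApiKeyAuditEntry entry, DateTimeOffset occurredAtUtc) => new(
        EventId: Guid.NewGuid(),                 // no idempotency key exists today, so mint one per write
        OccurredAtUtc: occurredAtUtc,
        Actor: entry.KeyId ?? "system",          // fallback for keyless events such as init-db
        Action: entry.EventType,
        Outcome: entry.EventType == "constraint-denied" ? AuditOutcome.Denied : AuditOutcome.Success,
        Category: "ApiKey",
        SourceNode: entry.RemoteAddress,
        // Target and CorrelationId stay null, matching step 4.
        DetailsJson: entry.Details is null ? null : JsonSerializer.Serialize(new { text = entry.Details }));
}
```

Because the three producers only construct an `ApiKeyAuditEntry`, an adapter of this kind can sit behind the injected store without touching their call sites.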
**Coordination risk — sequence against the health/observability work.** A parallel session is actively
editing **this same repo** (`mxaccessgw`) for the MEL → Serilog logging migration
(`ZB.MOM.WW.Health` + `ZB.MOM.WW.Telemetry` normalization). Because audit adoption here also touches the
gateway's `Security/Authentication/` wiring (DI registration in `AuthStoreServiceCollectionExtensions.cs`,
and the three producer call sites), the two efforts can collide on the same files and on logging-pipeline
DI. **Do not start MxGateway audit adoption until the Serilog migration in this repo has landed (or is
explicitly fenced off)**, and confirm with the orchestrator that the logging session is not mid-flight in
`Security/` before opening a PR. The audit and logging seams are conceptually independent (audit = durable
SQLite record of who-did-what; logging = operational telemetry), but they share the gateway's startup/DI
surface, so they must be merged in a defined order rather than in parallel.
@@ -0,0 +1,140 @@
# Audit — current state: OtOpcUa
Repo: `~/Desktop/OtOpcUa` (Gitea `lmxopcua`). Stack: .NET 10, Akka.NET cluster, EF Core + SQL Server.
All paths below are relative to the repo root. Verified against source on 2026-06-01.
OtOpcUa already has a structured, idempotent audit pipeline: a cluster-broadcast `AuditEvent`
message, a cluster-singleton writer actor that batches and bulk-inserts, and an append-only
`ConfigAuditLog` EF entity with two-layer dedup. There is **also** a second, older write path —
SQL stored procedures that `INSERT dbo.ConfigAuditLog` directly — so the table has two
producers with slightly different column conventions (see §1).
## 1. How it works today
**Record shape** — `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs:9-17`:
a sealed record `AuditEvent(Guid EventId, string Category, string Action, string Actor,
DateTime OccurredAtUtc, string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId)`.
`NodeId` and `CorrelationId` are Commons value-types — `NodeId` wraps a string (the *logical
cluster node / host name*, explicitly **not** an OPC UA NodeId per its XML doc,
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/NodeId.cs:3-8`); `CorrelationId` wraps a `Guid`
(`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/CorrelationId.cs:3`).
**Transport** — `AuditEvent` is an Akka message meant to be sent to the `AuditWriterActor`
**cluster singleton** (`AuditEvent.cs:6` describes it as "cluster-broadcast … consumed by the
`AuditWriterActor` singleton"). The singleton is registered through Akka.Hosting at
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ServiceCollectionExtensions.cs:68-75`
(`WithSingleton<AuditWriterActorKey>(AuditWriterSingletonName, …)`). Any cluster member can
emit an `AuditEvent`; the singleton is the one sink that persists it.
**Storage** — EF entity `ConfigAuditLog`
(`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs:7-44`): append-only
("Grants revoked for UPDATE/DELETE on all principals", `ConfigAuditLog.cs:4-5`). Columns:
`AuditId` (identity PK), `Timestamp` (default `SYSUTCDATETIME()`), `Principal`, `EventType`,
`ClusterId?`, `NodeId?`, `GenerationId?`, `DetailsJson?`, `EventId?` (Guid), `CorrelationId?`
(Guid). Mapping/constraints in `OtOpcUaConfigDbContext.cs:429-463`: `DetailsJson` must be valid
JSON (`CK_ConfigAuditLog_DetailsJson_IsJson`, line 435-436); `Principal`/`EventType`/`ClusterId`/`NodeId`
length-capped (lines 441-444); supporting indexes `IX_ConfigAuditLog_Cluster_Time` (line 449-451)
and `IX_ConfigAuditLog_Generation` (line 452-454).
**Writer / batching** — `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs`:
a `ReceiveActor` with `FlushBatchSize = 500` (line 25) and `FlushInterval = 5s` (line 26).
It buffers events in a `Dictionary<Guid, AuditEvent>` keyed by `EventId` (line 30), flushing
when the buffer hits 500 (line 60), when the 5s periodic timer fires (`PreStart`, line 50-53),
or on `PreRestart`/`PostStop` (lines 96-107) so a supervisor swap or coordinated shutdown does
not lose the buffer. `FlushBuffer` (lines 63-93) snapshots and clears the buffer, then for each
event constructs a `ConfigAuditLog` row (lines 75-84): `Timestamp = OccurredAtUtc`,
`Principal = Actor`, `EventType = $"{Category}:{Action}"`, `NodeId = SourceNode.Value`,
`DetailsJson`, `EventId`, `CorrelationId = CorrelationId.Value`. A failed flush is logged and the
batch is **dropped** (`catch` at lines 89-92) — best-effort, no retry/dead-letter.
**Dedup / idempotency (two layers)** — described at `AuditWriterActor.cs:17-21`:
1. *In-buffer* — duplicate `EventId`s within a batch collapse via the dictionary (last-write-wins;
`HandleEvent`, lines 55-61).
2. *Database* — a **filtered unique index** `UX_ConfigAuditLog_EventId` (`OtOpcUaConfigDbContext.cs:459-462`,
`IsUnique()` + `HasFilter("[EventId] IS NOT NULL")`) gives cross-restart safety: a retry of an
already-flushed batch hits the constraint, the duplicate insert is dropped, and the rest of the
batch survives. `EventId`/`CorrelationId` are nullable so legacy/backfill rows (NULL) don't
collide — confirmed in the entity XML (`ConfigAuditLog.cs:33-43`) and migration
`Migrations/20260526105027_AddConfigAuditLogEventIdColumns.cs:26-31`.
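
The in-buffer layer can be illustrated without Akka. The sketch below is not the actor's code: `DedupBuffer` and `BufferedEvent` are hypothetical, the flush callback stands in for the bulk insert, and the periodic timer and `PreRestart`/`PostStop` triggers are omitted. It shows only the dictionary-keyed last-write-wins collapse, the count-threshold flush, and the dropped batch on failure.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var flushed = new List<IReadOnlyList<BufferedEvent>>();
var buffer = new DedupBuffer(flushBatchSize: 3, flush: batch => flushed.Add(batch));

var id = Guid.NewGuid();
buffer.Add(new BufferedEvent(id, "first"));
buffer.Add(new BufferedEvent(id, "second"));            // same EventId: replaces "first"
buffer.Add(new BufferedEvent(Guid.NewGuid(), "other"));
Console.WriteLine(flushed.Count);                       // 0 (only two distinct ids buffered)

buffer.Add(new BufferedEvent(Guid.NewGuid(), "third")); // third distinct id hits the threshold
Console.WriteLine(flushed.Count);                       // 1
Console.WriteLine(flushed[0].Count);                    // 3

public sealed record BufferedEvent(Guid EventId, string Action);

public sealed class DedupBuffer
{
    private readonly Dictionary<Guid, BufferedEvent> _buffer = new();
    private readonly int _flushBatchSize;
    private readonly Action<IReadOnlyList<BufferedEvent>> _flush;

    public DedupBuffer(int flushBatchSize, Action<IReadOnlyList<BufferedEvent>> flush)
    {
        _flushBatchSize = flushBatchSize;
        _flush = flush;
    }

    public void Add(BufferedEvent evt)
    {
        _buffer[evt.EventId] = evt;                     // duplicate EventIds collapse, last write wins
        if (_buffer.Count >= _flushBatchSize) Flush();
    }

    public void Flush()
    {
        if (_buffer.Count == 0) return;
        var snapshot = _buffer.Values.ToList();         // snapshot, then clear before persisting
        _buffer.Clear();
        try { _flush(snapshot); }
        catch (Exception ex)
        {
            // Best-effort: a failed flush is logged and the batch is dropped, with no retry.
            Console.Error.WriteLine($"audit flush failed, batch dropped: {ex.Message}");
        }
    }
}
```

The database layer (the filtered unique index on `EventId`) is what covers duplicates that arrive in different batches or across restarts; the dictionary only protects a single batch.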
**Scope** — two producers, two conventions:
- **Akka `AuditEvent` path** (the structured one): config writes + authorization checks. The
EventType vocabulary lives in the entity XML doc (`ConfigAuditLog.cs:18`): `DraftCreated |
DraftEdited | Published | RolledBack | NodeApplied | CredentialAdded | CredentialDisabled |
ClusterCreated | NodeAdded | ExternalIdReleased | CrossClusterNamespaceAttempt |
OpcUaAccessDenied | …`. Note the access-denied / cross-cluster entries are authz-check events,
not config writes.
- **SQL stored-procedure path** (older, still present): several SPs `INSERT dbo.ConfigAuditLog`
directly — e.g. `Published`/`RolledBack`/`NodeApplied`/`ExternalIdReleased`/`CrossClusterNamespaceAttempt`
in `Migrations/20260417215224_StoredProcedures.cs:151,217,351,407,504`. These use `SUSER_SNAME()`
as `Principal`, set `ClusterId`/`GenerationId`, write a **bare** `EventType` (no `Category:Action`
split), and leave `EventId`/`CorrelationId` NULL.
**Query / UI** — the only read surface is the Admin UI page
`src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/ClusterAudit.razor`
(`@page "/clusters/{ClusterId}/audit"`, `[Authorize]`, lines 1-2). It reads the latest
`PageSize = 200` rows (line 69) **filtered by `ClusterId`**, newest-first (`OnInitializedAsync`,
lines 74-82), and renders Timestamp / Principal / Event(Type) / Node / Correlation(first 8 hex) /
Details columns (lines 38-58). Tested in
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AuditWriterActorTests.cs`: count-threshold
flush (lines 26-41), in-buffer dedup of duplicate EventIds (lines 45-62), `PostStop` flush
(lines 66-81), and the column mapping incl. `EventType == "Config:Edit"` and `NodeId == "node-a"`
(lines 85-104).
> Load-bearing gotcha: the actor path **never sets `ClusterId`** (lines 75-84), but the UI filters
> on `ClusterId` (`ClusterAudit.razor:78`). So today the cluster-scoped view surfaces the
> stored-procedure rows; structured `AuditEvent` rows written by the actor (which carry the host in
> `NodeId`, not `ClusterId`) won't appear under a cluster. Worth flagging during normalization.
## 2. Mapping to the canonical `AuditEvent`
Target = `ZB.MOM.WW.Audit.AuditEvent` (built in parallel). OtOpcUa's existing `AuditEvent` is
already almost field-for-field aligned; the only synthesized field is `Outcome`.
| Canonical field | OtOpcUa source | Mapping |
|---|---|---|
| `Guid EventId` | `AuditEvent.EventId` | Direct. Already the idempotency key (buffer key + `UX_ConfigAuditLog_EventId`). |
| `DateTimeOffset OccurredAtUtc` | `AuditEvent.OccurredAtUtc` (`DateTime`) | Direct; widen `DateTime`(UTC) → `DateTimeOffset`. |
| `string Actor` | `AuditEvent.Actor` | Direct (→ `ConfigAuditLog.Principal`). At Auth adoption this becomes the `ZB.MOM.WW.Auth` principal. |
| `string Action` | `AuditEvent.Action` (+ `Category`) | Direct. Today persisted as `"{Category}:{Action}"` in `EventType`; canonical keeps `Action` and `Category` separate. |
| `AuditOutcome Outcome` | *(none)* | **Derived** from the EventType vocabulary, not stored today. `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; the config-write verbs → `Success`. No explicit `Failure` value exists yet (a failed flush is dropped, not recorded as an event). |
| `string? Category` | `AuditEvent.Category` | Direct (e.g. `"Config"`). |
| `string? Target` | *(none)* | No dedicated field today; the closest is `SourceNode` → `NodeId` (the acting host) or details. Leave null or carry the affected object in `DetailsJson`. |
| `string? SourceNode` | `AuditEvent.SourceNode` (`NodeId.Value`) | Direct — the logical cluster node / host name (NOT an OPC UA NodeId). Currently lands in `ConfigAuditLog.NodeId`. |
| `Guid? CorrelationId` | `AuditEvent.CorrelationId` (`CorrelationId.Value`) | Direct. |
| `string? DetailsJson` | `AuditEvent.DetailsJson` | Direct; carries everything else (incl. `ClusterId`/`GenerationId`, which today are separate columns on the SP path). |
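
The table above amounts to a thin bridge at construction time. The following sketch assumes stand-in types throughout: `OtOpcUaAuditEvent` copies the parameter list quoted in §1, `NodeId` and `CorrelationId` are modelled as record structs (their real declarations are not shown in this document), and `AuditBridge` is hypothetical. `Outcome` is passed in by the emit site, since it is derived from the vocabulary rather than carried on the existing record.

```csharp
#nullable enable
using System;

var legacy = new OtOpcUaAuditEvent(
    Guid.NewGuid(), "Config", "Edit", "alice",
    new DateTime(2026, 6, 1, 12, 0, 0, DateTimeKind.Utc),
    null, new NodeId("node-a"), new CorrelationId(Guid.NewGuid()));

var canonical = AuditBridge.ToCanonical(legacy, AuditOutcome.Success);
Console.WriteLine($"{canonical.Category}:{canonical.Action} {canonical.SourceNode} {canonical.OccurredAtUtc.Offset}");
// prints "Config:Edit node-a 00:00:00"

public enum AuditOutcome { Success, Failure, Denied }

// Stand-ins for the Commons value-types (a string wrapper and a Guid wrapper).
public readonly record struct NodeId(string Value);
public readonly record struct CorrelationId(Guid Value);

public sealed record OtOpcUaAuditEvent(
    Guid EventId, string Category, string Action, string Actor, DateTime OccurredAtUtc,
    string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId);

public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action, AuditOutcome Outcome,
    string? Category = null, string? Target = null, string? SourceNode = null,
    Guid? CorrelationId = null, string? DetailsJson = null);

public static class AuditBridge
{
    public static AuditEvent ToCanonical(OtOpcUaAuditEvent e, AuditOutcome outcome) => new(
        EventId: e.EventId,
        // Assumes the value is already UTC; SpecifyKind only relabels it so the offset is zero.
        OccurredAtUtc: new DateTimeOffset(DateTime.SpecifyKind(e.OccurredAtUtc, DateTimeKind.Utc)),
        Actor: e.Actor,
        Action: e.Action,
        Outcome: outcome,
        Category: e.Category,
        SourceNode: e.SourceNode.Value,          // unwrap the value-types at the seam
        CorrelationId: e.CorrelationId.Value,
        DetailsJson: e.DetailsJson);
}
```

Without the `SpecifyKind` call, a `DateTime` whose `Kind` is `Unspecified` would be treated as local time by the `DateTimeOffset(DateTime)` constructor and could pick up a non-zero offset.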
## 3. Adoption plan → `ZB.MOM.WW.Audit`
**Effort: medium.** OtOpcUa is the *donor* design for the canonical record, so most of the work is
re-pointing types and bridging two persistence conventions, not redesigning the pipeline.
**Replace with the shared library:**
- `Commons/Messages/Audit/AuditEvent.cs` → the canonical `ZB.MOM.WW.Audit.AuditEvent`. Add the new
`Outcome` field (derive it at every emit site from the EventType vocabulary, e.g.
`OpcUaAccessDenied → Denied`); keep `Category`/`Action`/`SourceNode`/`CorrelationId` as-is. Decide
whether `SourceNode`/`CorrelationId` carry the Commons value-types or the canonical primitives at
the seam (likely a thin adapter at construction).
- `AuditWriterActor` → implement the library's `IAuditWriter` (keep the actor as OtOpcUa's
Akka-cluster-singleton transport/batching adapter behind that seam; the 500/5s batching,
PreRestart/PostStop flush, and two-layer dedup stay bespoke per §"left per-project").
**Keep bespoke (thin adapter only):**
- Transport — the cluster-broadcast → singleton `AuditWriterActor`, batching, and flush triggers.
- Storage — the `ConfigAuditLog` EF entity, indexes, and `UX_ConfigAuditLog_EventId` idempotency
index. Map the canonical record onto the existing columns; add an `Outcome` column (or fold it into
`EventType`/`DetailsJson` if a schema change is undesirable). `ClusterId`/`GenerationId` remain
OtOpcUa-specific columns fed via `DetailsJson` or kept as side columns.
- Domain vocabulary — the EventType strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …)
and the `Category:Action` composition convention.
- Query/UI — `ClusterAudit.razor` and its `ClusterId` filter.
**Reconcile, not extract:**
- The **two producers** (Akka `AuditEvent` path vs. SQL stored-procedure `INSERT`s using
`SUSER_SNAME()`). The SP path bypasses the canonical record entirely and writes a different
column convention (bare `EventType`, NULL `EventId`/`CorrelationId`, populated
`ClusterId`/`GenerationId`). Adopting the library does not by itself unify these; either route the
SP events through the actor or accept that SP rows stay non-idempotent and absent from the
`EventId` dedup guarantee. Flag for the normalization spec.
- The **`ClusterId`-filter / actor-never-sets-`ClusterId`** mismatch noted in §1 — fix when the
query surface is normalized so structured `AuditEvent` rows are discoverable by cluster.
@@ -0,0 +1,162 @@
# Audit — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET; solution `ZB.MOM.WW.ScadaBridge.slnx`.
Audit code centers on the dedicated `ZB.MOM.WW.ScadaBridge.AuditLog` project, with the shared
record + seams living in `ZB.MOM.WW.ScadaBridge.Commons`. All paths relative to repo root.
Verified 2026-06-01.
**By far the largest audit implementation in the family** — a full who-did-what pipeline
across a site SQLite hot-path and a central MS SQL store, with forwarding, reconciliation,
purge, partition maintenance, redaction, CLI export, hash-chain verify (v1 stub), and a Blazor
UI. **Key finding: ScadaBridge is already at the target.** It already has an `IAuditWriter`
best-effort seam (near-identical to the canonical contract) and an `IAuditPayloadFilter`
redaction seam (= the library's `IAuditRedactor`, just renamed). Adoption is *align, don't
replace* — mostly naming alignment; the enormous transport/storage/CLI/UI stays bespoke.
## 1. How it works today
### The record — `AuditEvent` (~25 fields)
`src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditEvent.cs:22` — a `sealed record`,
append-only, "single source of truth for AuditLog (#23) rows." Far richer than the canonical
10-field event. Notable fields:
- Identity / correlation: `EventId` (idempotency key, `:25`), `CorrelationId` (per-op
lifecycle, `:68`), `ExecutionId` (per-run, `:75`), `ParentExecutionId` (spawner link, `:82`).
- Classification: `Channel` (`:62`), `Kind` (`:65`), `Status` (`:109`) — the domain enums (below).
- Provenance: `SourceSiteId` (`:85`), `SourceNode` (`:94`, stamped from `INodeIdentityProvider`),
`SourceInstanceId` (`:97`), `SourceScript` (`:100`), `Actor` (`:103`), `Target` (`:106`).
- Outcome detail: `HttpStatus` (`:112`), `DurationMs` (`:115`), `ErrorMessage` (`:118`),
`ErrorDetail` (`:121`).
- Payload: `RequestSummary` / `ResponseSummary` (truncated+redacted, `:124`/`:127`),
`PayloadTruncated` (`:130`), `Extra` (free-form JSON, `:133`).
- Lifecycle plumbing: `IngestedAtUtc` (null on site, stamped at central ingest, `:52`),
`ForwardState` (site-only, null on central, `:136`).
**UTC-forcing init-setters.** `OccurredAtUtc` (`:39`) and `IngestedAtUtc` (`:52`) keep a backing
field and call `DateTime.SpecifyKind(value, DateTimeKind.Utc)` on assignment, so a value built
from a literal or rehydrated from a SQL Server `datetime2` column (which strips `Kind` on the
wire) cannot leak downstream as `Unspecified`/local. The record uses `DateTime` (not
`DateTimeOffset`) deliberately, to match the partitioned `datetime2` column shape (`:9-21`).
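The UTC-forcing setter described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `SiteAuditRow` is a hypothetical stand-in that carries only the timestamp, and the real record has many more fields.

```csharp
using System;

// Hypothetical stand-in for the real record; only the timestamp is sketched.
public sealed record SiteAuditRow
{
    private readonly DateTime _occurredAtUtc;

    public required DateTime OccurredAtUtc
    {
        get => _occurredAtUtc;
        // SpecifyKind relabels the value as UTC without shifting the clock reading,
        // so a Kind-less value rehydrated from datetime2 comes back out as Utc.
        init => _occurredAtUtc = DateTime.SpecifyKind(value, DateTimeKind.Utc);
    }
}
```

Because `SpecifyKind` only relabels, a value that really was local time would be treated as UTC rather than converted; the sketch assumes callers only ever hand in UTC readings.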
### Domain vocabulary — four enums
`src/ZB.MOM.WW.ScadaBridge.Commons/Types/Enums/`:
- `AuditChannel.cs:7` — trust boundary crossed: `ApiOutbound`, `DbOutbound`, `Notification`,
`ApiInbound`.
- `AuditKind.cs:8` — specific event within a channel: `ApiCall`, `ApiCallCached`, `DbWrite`,
`DbWriteCached`, `NotifySend`, `NotifyDeliver`, `InboundRequest`, `InboundAuthFailure`,
`CachedSubmit`, `CachedResolve`. Cached variants emit multiple rows per operation.
- `AuditStatus.cs:8` — lifecycle status of the row: `Submitted`, `Forwarded`, `Attempted`,
`Delivered`, `Failed`, `Parked`, `Discarded`, `Skipped`.
- `AuditForwardState.cs:9` — site-local forwarding state (central rows leave null): `Pending`,
`Forwarded`, `Reconciled`. The site retention purge MUST NOT drop a `Pending` row.
### The writer seam — `IAuditWriter` (best-effort, never aborts the action)
`src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/IAuditWriter.cs:10` — boundary-side
abstraction: `Task WriteAsync(AuditEvent evt, CancellationToken ct = default)` (`:18`). The
contract is explicit and matches the canonical seam almost word-for-word: **"Failures must NEVER
abort the user-facing action"** (`:8`), best-effort, "implementations must swallow/log internal
failures rather than propagating them to the calling boundary code" (`:13-14`).
### The redaction seam — `IAuditPayloadFilter` (pure, never throws)
`src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs:22`: `AuditEvent Apply(AuditEvent rawEvent)` (`:30`). Filters an event between construction and persistence:
truncates oversized payloads, redacts headers/body/SQL params, sets `PayloadTruncated`.
**Pure function** returning a filtered COPY via `with` expressions, and **MUST NOT throw**:
on internal failure it over-redacts and increments the `AuditRedactionFailure` health metric
(`:11-20`, `:26-28`). This is exactly the canonical `IAuditRedactor` under a different name.
Two implementations: `DefaultAuditPayloadFilter.cs:56` (full truncation + header/body/SQL
redaction with live options) and `SafeDefaultAuditPayloadFilter.cs:19` (always-safe fallback —
header-only redaction, over-redacts on parse failure, `:42-59`).
### Transport / storage / pipeline — stays per-project
The `ZB.MOM.WW.ScadaBridge.AuditLog` project is split into `Site/`, `Central/`, `Payload/`, and
`Configuration/`. This is the bespoke half and is **not** a candidate for extraction; cited here
only to show the scale around the common core:
- **Site hot-path:** `Site/SqliteAuditWriter.cs:32` (`IAuditWriter` over an owned `SqliteConnection`
fed by a bounded `Channel<T>` drained on a background task, so script-thread callers never block
on disk I/O; first-write-wins on duplicate `EventId`). `Site/FallbackAuditWriter.cs:28` composes
the SQLite writer with a drop-oldest `RingBufferFallback` so a primary failure never bubbles out.
`Site/Telemetry/` forwards rows to central over Akka `ClusterClient`.
- **Central ingest/store:** `Central/CentralAuditWriter.cs:40` (`ICentralAuditWriter`, direct MS SQL
write for central-originated events, per-call EF scope, idempotent `InsertIfNotExistsAsync`,
swallows every exception per "alog.md §13"). `Central/AuditLogIngestActor.cs:46` batches site
telemetry; `Central/SiteAuditReconciliationActor.cs:68` periodically pulls to catch dropped
forwards; `Central/AuditLogPurgeActor.cs:58` enforces retention; `Central/AuditLogPartitionMaintenanceService.cs:55`
manages the partitioned table.
- **CLI:** `CLI/Commands/AuditCommands.cs:12` builds `export` (`:137`, formats `csv`/`jsonl`/`parquet`)
and `verify-chain` (`:226`). Hash-chain verify is currently a **v1 no-op stub**:
`CLI/Commands/AuditVerifyChainHelpers.cs:6-10` ("v1 is a no-op").
- **UI:** Blazor pages under `CentralUI/Components/Pages/Audit/` (e.g. `AuditLogPage.razor:1`,
gated by `[Authorize(Policy = AuthorizationPolicies.OperationalAudit)]`) plus drill-down
components in `CentralUI/Components/Audit/`.
- **Wiring:** `AuditLog/ServiceCollectionExtensions.cs:59` `AddAuditLog(...)`, `:316`
`AddAuditLogCentralMaintenance(...)`.
## 2. Mapping to the canonical record
Target (`ZB.MOM.WW.Audit`, being built): `record AuditEvent { Guid EventId; DateTimeOffset
OccurredAtUtc; string Actor; string Action; AuditOutcome Outcome; string? Category; string?
Target; string? SourceNode; Guid? CorrelationId; string? DetailsJson; }`. ScadaBridge's record is
a strict superset — the canonical fields map directly; the rich extras collapse into `DetailsJson`.
| Canonical field | ScadaBridge source | Notes |
|---|---|---|
| `EventId` (Guid) | `AuditEvent.EventId` | Direct; same idempotency-key role. |
| `OccurredAtUtc` (DateTimeOffset) | `AuditEvent.OccurredAtUtc` (`DateTime`, UTC-forced) | Type bridge `DateTime`(Utc)↔`DateTimeOffset`; semantics identical. |
| `Actor` (string) | `AuditEvent.Actor` (nullable) | Direct; ScadaBridge allows null (system-originated rows). |
| `Action` (string) | `AuditEvent.Kind` (+`Channel`) | Derive a stable action string, e.g. `{Channel}.{Kind}` (`ApiOutbound.ApiCall`). |
| `Outcome` (Success/Failure/Denied) | `AuditEvent.Status` | `Delivered`→Success; `Failed`/`Parked`/`Discarded`→Failure; `InboundAuthFailure`(Kind)→Denied; in-flight `Submitted`/`Forwarded`/`Attempted` collapse to the last-known terminal state when projecting. |
| `Category` (string?) | `AuditEvent.Channel` | The coarse bucket; pairs with `Action` above. |
| `Target` (string?) | `AuditEvent.Target` | Direct. |
| `SourceNode` (string?) | `AuditEvent.SourceNode` | Direct (`node-a`/`central-b`/…). |
| `CorrelationId` (Guid?) | `AuditEvent.CorrelationId` | Direct (per-op lifecycle id). |
| `DetailsJson` (string?) | `ExecutionId`, `ParentExecutionId`, `SourceSiteId`, `SourceInstanceId`, `SourceScript`, `HttpStatus`, `DurationMs`, `ErrorMessage`, `ErrorDetail`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra`, `IngestedAtUtc`, `ForwardState` | The ~15 rich/plumbing fields serialize into the canonical `DetailsJson` extension. |
The canonical record is a lossy *projection* of ScadaBridge's — fine for cross-project
reporting, but ScadaBridge keeps its full record as the storage shape (the partitioned SQL
schema, forwarding state, and reconciliation all depend on the extra columns).
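The mapping table can be read as a small projection function. The sketch below is illustrative only: `RichAuditEvent` is a trimmed, hypothetical stand-in for the ~25-field record, `CanonicalAuditEvent` stands in for the library record, and the `"system"` actor fallback and the choice to return `null` for rows with no terminal outcome are assumptions rather than anything the code is confirmed to do.

```csharp
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Failure, Denied }
public enum AuditChannel { ApiOutbound, DbOutbound, Notification, ApiInbound }
public enum AuditKind { ApiCall, DbWrite, NotifySend, InboundRequest, InboundAuthFailure }
public enum AuditStatus { Submitted, Forwarded, Attempted, Delivered, Failed, Parked, Discarded, Skipped }

// Trimmed, hypothetical stand-in for the rich ScadaBridge record.
public sealed record RichAuditEvent(
    Guid EventId, DateTime OccurredAtUtc, string? Actor,
    AuditChannel Channel, AuditKind Kind, AuditStatus Status,
    string? Target, string? SourceNode, Guid? CorrelationId,
    int? HttpStatus, int? DurationMs, string? ErrorMessage);

// Stand-in for the canonical 10-field record.
public sealed record CanonicalAuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    AuditOutcome Outcome, string? Category, string? Target, string? SourceNode,
    Guid? CorrelationId, string? DetailsJson);

public static class AuditProjection
{
    // Returns null for in-flight or Skipped rows, which have no terminal outcome.
    public static AuditOutcome? DeriveOutcome(AuditKind kind, AuditStatus status)
    {
        if (kind == AuditKind.InboundAuthFailure) return AuditOutcome.Denied;
        return status switch
        {
            AuditStatus.Delivered => AuditOutcome.Success,
            AuditStatus.Failed or AuditStatus.Parked or AuditStatus.Discarded => AuditOutcome.Failure,
            _ => (AuditOutcome?)null,
        };
    }

    public static CanonicalAuditEvent? Project(RichAuditEvent e)
    {
        var outcome = DeriveOutcome(e.Kind, e.Status);
        if (outcome is null) return null; // caller decides how to collapse in-flight rows

        // Only three of the surplus fields are shown; the rest would be added here.
        var details = JsonSerializer.Serialize(new { e.HttpStatus, e.DurationMs, e.ErrorMessage });

        return new CanonicalAuditEvent(
            e.EventId,
            new DateTimeOffset(DateTime.SpecifyKind(e.OccurredAtUtc, DateTimeKind.Utc)),
            e.Actor ?? "system",            // assumed fallback for system-originated rows
            $"{e.Channel}.{e.Kind}",
            outcome.Value,
            e.Channel.ToString(),
            e.Target, e.SourceNode, e.CorrelationId, details);
    }
}
```

Keeping the projection as a pure static function means it can sit at the reporting boundary without touching the storage record.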
## 3. Adoption plan → `ZB.MOM.WW.Audit`
**Posture: align, don't replace.** ScadaBridge is the reference implementation the shared
library is being extracted *from*; it already has both seams. Adoption is mostly renaming and
contract-confirmation, with a deliberately small touched surface and a large blast radius if
done carelessly. **Priority: LOW. Blast radius: HIGH.**
**Align (small, naming-level):**
- **Rename the redaction seam to match the contract.** `IAuditPayloadFilter` → adopt
`ZB.MOM.WW.Audit.IAuditRedactor` (`AuditEvent Apply(AuditEvent)` — identical signature and
pure/never-throws contract). Either alias `IAuditPayloadFilter : IAuditRedactor` during
transition or rename outright; `DefaultAuditPayloadFilter` / `SafeDefaultAuditPayloadFilter`
implement it unchanged. See [`../../shared-contract/`](../../shared-contract/).
- **Confirm the writer contract matches.** `IAuditWriter.WriteAsync(AuditEvent, CancellationToken
= default)` already matches the canonical signature shape, and the "never abort the
user-facing action" wording matches. The only delta is the **record type**: the library's
`IAuditWriter` is typed on the *canonical* 10-field `AuditEvent`, while ScadaBridge's is typed on
its ~25-field record. Resolve by either (a) keeping ScadaBridge's writer on its own rich record
and adopting only the library's *interface name + outcome enum*, or (b) having the shared seam be
generic over the event type. **Recommended: (a)** — adopt the canonical `AuditOutcome` enum and
the interface naming, but keep the bespoke `AuditEvent` as ScadaBridge's storage record, since the
whole transport/partition/forwarding layer is built on its extra columns. (Best-practice fit: this
is the minimal-coupling option — share the contract, not the schema.)
**Keep bespoke (the large, untouched majority):**
- The entire `Site/` (SQLite hot-path + ring-buffer fallback + telemetry forwarder) and `Central/`
(ingest / reconcile / purge / partition maintenance) pipeline.
- The `AuditEvent` rich record itself, the four domain enums (`AuditChannel`/`AuditKind`/
`AuditStatus`/`AuditForwardState`), CLI `export`/`verify-chain`, and the Blazor audit UI.
- The redaction *policy* (`DefaultAuditPayloadFilter` options, per-target overrides) — only the
interface name is shared, not the implementation.
**Net:** ScadaBridge converges by renaming one interface and adopting the canonical `AuditOutcome`
enum + the `Kind`/`Channel`→`Action`/`Category` and surplus-fields→`DetailsJson` projection for any
cross-project reporting. No transport, storage, CLI, or UI is replaced. Sequencing and the
cross-project gap list live in [`../../GAPS.md`](../../GAPS.md); the canonical target is
[`../../spec/SPEC.md`](../../spec/SPEC.md).
@@ -0,0 +1,153 @@
# Proposed shared library: `ZB.MOM.WW.Audit`
A contract on paper — the public surface to extract so the three projects stop
re-implementing audit-event capture with incompatible shapes. Realizes
[`../spec/SPEC.md`](../spec/SPEC.md).
**Not yet created.** Reference implementations already exist: ScadaBridge's
`IAuditWriter`/`IAuditPayloadFilter` (already at target shape), mxaccessgw
structured-log audit trail, OtOpcUa admin-UI audit log.
## Package (.NET 10)
```
ZB.MOM.WW.Audit # the single package: event record, seams, helpers, DI wiring
```
Single package, single DLL. Only non-BCL dependency:
`Microsoft.Extensions.DependencyInjection.Abstractions` (for `AddZbAudit`).
Published to the Gitea NuGet feed; SemVer.
| Package (→ DLL) | Transitive deps | OtOpcUa | mxaccessgw | ScadaBridge |
|---|---|---|---|---|
| `ZB.MOM.WW.Audit` | `Microsoft.Extensions.DependencyInjection.Abstractions` | ✅ | ✅ | ✅ |
All three audit-emitting processes are .NET 10; the x86/net48 mxaccessgw worker does
no audit emission, so net48 multi-targeting is **not** required.
## `AuditEvent` record and `AuditOutcome` enum
```csharp
public sealed record AuditEvent {
public required Guid EventId { get; init; }
public required DateTimeOffset OccurredAtUtc { get; init; } // normalized to UTC on assignment
public required string Actor { get; init; }
public required string Action { get; init; }
public required AuditOutcome Outcome { get; init; }
public string? Category { get; init; }
public string? Target { get; init; }
public string? SourceNode { get; init; }
public Guid? CorrelationId { get; init; }
public string? DetailsJson { get; init; }
}
public enum AuditOutcome { Success, Failure, Denied }
```
`OccurredAtUtc` is the only field with a normalization contract: any value assigned
is coerced to UTC (via `ToUniversalTime()`). All other fields are caller-supplied and
carried through without transformation by the library internals.
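An auto-property cannot coerce its value, so the normalization contract implies a backing field. One way this could look is sketched below; `AuditEventSketch` is a hypothetical name and carries only the timestamp.

```csharp
using System;

// Hypothetical, timestamp-only sketch of the normalization contract.
public sealed record AuditEventSketch
{
    private readonly DateTimeOffset _occurredAtUtc;

    public required DateTimeOffset OccurredAtUtc
    {
        get => _occurredAtUtc;
        // ToUniversalTime keeps the same instant and rewrites the offset to +00:00.
        init => _occurredAtUtc = value.ToUniversalTime();
    }
}
```

Unlike relabelling a `DateTime`, this conversion preserves the instant: a value supplied as 09:00 at offset -04:00 is stored as 13:00 at +00:00.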
## Seams
### `IAuditWriter`
```csharp
public interface IAuditWriter
{
Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}
```
**Hard contract:**
- Best-effort delivery. The implementation **MUST swallow all internal failures** and
**MUST NOT throw** to the caller. A write that fails silently is preferable to
a write that crashes the calling thread or kills a request pipeline.
- The `CancellationToken` is honoured for cooperative cancellation, but stopping early on
cancellation is not a contract violation, and neither is finishing anyway: the implementation
may choose to complete a partially written event.
### `IAuditRedactor`
```csharp
public interface IAuditRedactor
{
AuditEvent Apply(AuditEvent rawEvent);
}
```
**Hard contract:**
- Pure function (no I/O, no side effects).
- **MUST NOT throw.** On any internal failure the implementation must over-redact
(e.g. replace the affected field with a sentinel such as `"[redacted]"`) rather
than propagate the exception. Lossier output is always preferable to a thrown
exception reaching the caller.
## Shipped helpers (concrete)
### Redactors
| Type | Behaviour |
|---|---|
| `NullAuditRedactor` | Identity — returns the event unchanged. Registered as the default by `AddZbAudit`. |
| `TruncatingAuditRedactor` | Caps `DetailsJson` and `Target` to a configurable maximum length and appends a marker (e.g. `"…"`) when truncated. Never throws. Configured via `TruncatingAuditRedactorOptions`. |
| `TruncatingAuditRedactorOptions` | Options record for `TruncatingAuditRedactor`: `MaxDetailsJsonLength`, `MaxTargetLength`, `TruncationMarker`. |
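
A rough sketch of how `TruncatingAuditRedactor` might satisfy the never-throws rule follows. The default lengths are invented for illustration, the `AuditEvent` here is a trimmed stand-in, and the sketch assumes a non-null input event.

```csharp
using System;

// Trimmed stand-in for the canonical record.
public sealed record AuditEvent(string Actor, string Action, string? Target, string? DetailsJson);

public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}

public sealed record TruncatingAuditRedactorOptions
{
    public int MaxDetailsJsonLength { get; init; } = 4096; // assumed default
    public int MaxTargetLength { get; init; } = 256;       // assumed default
    public string TruncationMarker { get; init; } = "…";
}

public sealed class TruncatingAuditRedactor : IAuditRedactor
{
    private readonly TruncatingAuditRedactorOptions _options;

    public TruncatingAuditRedactor(TruncatingAuditRedactorOptions options) => _options = options;

    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            // Return a filtered copy; the input is never mutated.
            return rawEvent with
            {
                Target = Cap(rawEvent.Target, _options.MaxTargetLength),
                DetailsJson = Cap(rawEvent.DetailsJson, _options.MaxDetailsJsonLength),
            };
        }
        catch (Exception)
        {
            // Over-redact instead of propagating (e.g. a negative configured length).
            return rawEvent with { Target = "[redacted]", DetailsJson = null };
        }
    }

    private string? Cap(string? value, int max) =>
        value is null || value.Length <= max ? value : value[..max] + _options.TruncationMarker;
}
```

Note that a truncated `DetailsJson` plus marker is no longer valid JSON, so a store that enforces JSON validity would need to wrap or drop the value instead; the sketch does not address that.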
### Writers
| Type | Behaviour |
|---|---|
| `NoOpAuditWriter` | Discards every event. Registered as the default by `AddZbAudit`; consumer replaces with a real writer. |
| `CompositeAuditWriter` | Fan-out: forwards each event to an ordered list of inner `IAuditWriter` instances. A failing inner writer is swallowed (per the `IAuditWriter` contract) — it does **not** abort the remaining writers in the list. |
| `RedactingAuditWriter` | Decorator: calls `IAuditRedactor.Apply` on the event, then delegates the redacted event to an inner `IAuditWriter`. Separates the redaction concern from any concrete writer. |
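
The fan-out behaviour of `CompositeAuditWriter` could be implemented along these lines. This is a sketch under assumptions: the `AuditEvent` is trimmed, and `ThrowingWriter` and `ListWriter` are demo-only types that are not part of the proposed package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in for the canonical record.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

public sealed class CompositeAuditWriter : IAuditWriter
{
    private readonly IReadOnlyList<IAuditWriter> _inner;

    public CompositeAuditWriter(IEnumerable<IAuditWriter> inner) =>
        _inner = new List<IAuditWriter>(inner);

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (var writer in _inner)
        {
            try
            {
                await writer.WriteAsync(evt, ct).ConfigureAwait(false);
            }
            catch (Exception)
            {
                // Swallow and continue: one failing writer must not stop the rest.
            }
        }
    }
}

// Demo-only: always fails.
public sealed class ThrowingWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) =>
        throw new InvalidOperationException("demo failure");
}

// Demo-only: collects events in memory.
public sealed class ListWriter : IAuditWriter
{
    public List<AuditEvent> Events { get; } = new();

    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        Events.Add(evt);
        return Task.CompletedTask;
    }
}
```

Placing `ThrowingWriter` ahead of `ListWriter` in the list still delivers the event to `ListWriter`, which is the property the table row describes.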
## DI wiring
```csharp
public static IServiceCollection AddZbAudit(this IServiceCollection services);
```
Registers defaults via `TryAdd` so any prior consumer registration wins:
- `IAuditRedactor` → `NullAuditRedactor` (singleton)
- `IAuditWriter` → `NoOpAuditWriter` (singleton)
A consumer that registers its own `IAuditWriter` (e.g. a Serilog-backed writer or a
`CompositeAuditWriter`) before or after calling `AddZbAudit` will see its registration
respected. `AddZbAudit` does **not** clear or override existing registrations.
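A possible shape for `AddZbAudit`, assuming it relies on the `TryAddSingleton` extension from `Microsoft.Extensions.DependencyInjection.Extensions`. The class name `AuditServiceCollectionExtensions` is hypothetical, and the record and helper types are trimmed stand-ins so the fragment is self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Trimmed stand-ins for the package's types.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}

public sealed class NoOpAuditWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => Task.CompletedTask;
}

public sealed class NullAuditRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent rawEvent) => rawEvent;
}

public static class AuditServiceCollectionExtensions
{
    public static IServiceCollection AddZbAudit(this IServiceCollection services)
    {
        // TryAdd* is a no-op when the service type is already registered,
        // so an earlier consumer registration is left in place.
        services.TryAddSingleton<IAuditRedactor, NullAuditRedactor>();
        services.TryAddSingleton<IAuditWriter, NoOpAuditWriter>();
        return services;
    }
}
```

With the default Microsoft container, a consumer registration added after the call also wins, because resolving a single service returns the last registration for that service type.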
## Relationship to Telemetry (`ILogRedactor`)
`IAuditRedactor` mirrors Telemetry.Serilog's `ILogRedactor` in shape and naming — same
single-method contract, same "pure, must not throw, over-redact on failure" semantics —
so that a future `ZB.MOM.WW.Hosting` aggregator package can wire both behind a single
configuration surface without an impedance mismatch.
`ZB.MOM.WW.Audit` has **no dependency** on `ZB.MOM.WW.Telemetry` or any Serilog package.
The alignment is intentional design convergence; the independence is a hard boundary.
## What stays in each consumer
OtOpcUa: admin-UI audit sink (Blazor event handler → `IAuditWriter`), `Category`
constants specific to OPC UA operations.
mxaccessgw: gRPC interceptor that captures actor/action from call metadata; constraint-aware
`Category` tagging; `DetailsJson` serialization of gateway-specific payloads.
ScadaBridge: site-scoped `SourceNode` population; `ManagementActor` enforcement callbacks;
`IAuditPayloadFilter` → `IAuditRedactor` migration (shape is already equivalent, so adoption
is a near-zero-effort rename).
## Open contract questions
1. **Batching**: a `WriteBatchAsync(IEnumerable<AuditEvent>, CancellationToken)` overload on
`IAuditWriter` may be warranted once a database-backed writer is in use. Defer until
the first consumer demonstrates the need; batching can be added without breaking the
existing single-event surface.
2. **Structured `DetailsJson`**: confirm whether callers should supply raw JSON strings or
whether a typed `TDetails` generic overload (serialized internally) is cleaner. The
current `string?` keeps the library dependency-free but shifts serialization to the caller.
3. **`CompositeAuditWriter` error policy**: decide whether per-writer failure should be
observable (e.g. an optional `ILogger<CompositeAuditWriter>`) or always silently dropped.
Logging the failure is diagnostic-friendly but adds a logging dependency.
See [`../GAPS.md`](../GAPS.md) for the adoption order and effort/risk.
@@ -0,0 +1,94 @@
# Canonical event model (standardized)
Status: **Standardized**. The org-wide audit record + outcome enum every sister project maps onto.
This is the reference companion to [`SPEC.md`](SPEC.md) (mirroring auth's `CANONICAL-ROLES.md` /
theme's `DESIGN-TOKENS.md`): the field-by-field canonical record, the `AuditOutcome` definition with
which app states map onto each value, and the full per-project mapping table. The shared library
defines exactly this record; each project **projects its native record onto it** at the seam.
## The canonical record
```csharp
namespace ZB.MOM.WW.Audit;
public sealed record AuditEvent
{
// REQUIRED core — who / what / when / outcome
public required Guid EventId { get; init; } // idempotency key
public required DateTimeOffset OccurredAtUtc { get; init; } // normalized to UTC
public required string Actor { get; init; } // who — = ZB.MOM.WW.Auth principal at adoption
public required string Action { get; init; } // what — verb / event-type string
public required AuditOutcome Outcome { get; init; } // Success | Failure | Denied
// OPTIONAL common
public string? Category { get; init; } // subsystem / grouping bucket
public string? Target { get; init; } // on-what (resource / method / connection)
public string? SourceNode { get; init; } // emitting logical node / host
public Guid? CorrelationId { get; init; } // join to originating request / workflow
// EXTENSION — everything project-specific, as JSON
public string? DetailsJson { get; init; }
}
public enum AuditOutcome { Success, Failure, Denied }
```
### Field-by-field
| Field | Req? | Type | Meaning | Notes |
|---|:-:|---|---|---|
| `EventId` | yes | `Guid` | Idempotency key | Backs at-least-once transports: OtOpcUa's filtered-unique `EventId` index, ScadaBridge's first-write-wins. MxGateway has none today → **generate at write time**. |
| `OccurredAtUtc` | yes | `DateTimeOffset` | When it happened, UTC | MxGateway already uses `DateTimeOffset`. OtOpcUa / ScadaBridge store UTC-forced `DateTime` and widen at the mapping boundary. |
| `Actor` | yes | `string` | Who acted | SHOULD be the `ZB.MOM.WW.Auth` principal ([`SPEC.md`](SPEC.md) §4). Kept a `string` (no Auth dependency). Keyless events use a `"system"` / `"cli"` fallback rather than empty. |
| `Action` | yes | `string` | What was done (verb / event-type) | Carries each app's domain verb: OtOpcUa `EventType`, MxGateway `EventType`, ScadaBridge `{Channel}.{Kind}`. |
| `Outcome` | yes | `AuditOutcome` | Success / Failure / Denied | **New normalized field — no app stores it today; each derives it** (see below). |
| `Category` | no | `string?` | Coarse subsystem / grouping | OtOpcUa `Category` (`"Config"`); MxGateway constant `"ApiKey"`; ScadaBridge `Channel`. |
| `Target` | no | `string?` | The object acted on | ScadaBridge `Target` (direct). OtOpcUa / MxGateway have no dedicated field → null or fold into `DetailsJson`. |
| `SourceNode` | no | `string?` | Emitting logical node / host | OtOpcUa `SourceNode` (a logical node name, **not** an OPC UA NodeId); ScadaBridge `SourceNode`; MxGateway `RemoteAddress`. |
| `CorrelationId` | no | `Guid?` | Join to originating request / workflow | OtOpcUa / ScadaBridge direct; MxGateway has none today (left null). |
| `DetailsJson` | no | `string?` | Extension bag — all project-specific data | Must be valid JSON where stored (OtOpcUa enforces this with a CHECK constraint). Absorbs each app's surplus columns. |
## `AuditOutcome` — definition and app-state mapping
Three values, deliberately minimal — enough to normalize denials and failures without importing any
app's full taxonomy. `Outcome` is **derived** at each emit site (no app persists it today; OtOpcUa
encodes it implicitly in `EventType`, MxGateway in the event-type literal, ScadaBridge in `Status`):
| `AuditOutcome` | Meaning | OtOpcUa (`EventType`) | MxGateway (event type) | ScadaBridge (`AuditStatus` / `AuditKind`) |
|---|---|---|---|---|
| **`Success`** | The action completed | config-write verbs — `DraftCreated`, `DraftEdited`, `Published`, `RolledBack`, `NodeApplied`, `CredentialAdded`, `ClusterCreated`, `NodeAdded`, `ExternalIdReleased`, … | key-lifecycle — `init-db`, `create-key`, `list-keys`, `revoke-key`, `rotate-key` + all `dashboard-*` | `Status = Delivered` |
| **`Failure`** | The action was attempted and failed | *(none today — a failed actor flush is dropped, not recorded as an event)* | *(none emitted today)* | `Status ∈ { Failed, Parked, Discarded }` |
| **`Denied`** | The action was rejected by authorization / policy | `OpcUaAccessDenied`, `CrossClusterNamespaceAttempt` | `constraint-denied` | `Kind = InboundAuthFailure` |
Notes:
- **OtOpcUa has no `Failure` source.** Its vocabulary only distinguishes success-verbs from
access-denials; an internal write failure is dropped (best-effort), not emitted as an event. So
OtOpcUa produces only `Success` / `Denied` until/unless it adds failure events.
- **MxGateway emits only `Success` / `Denied`** today (no failure events; authentication
success/failure is surfaced as gRPC status, not persisted — see its current-state doc).
- **ScadaBridge in-flight states** (`Submitted` / `Forwarded` / `Attempted`) are not terminal; when
projecting to a single `Outcome` they collapse to the last-known terminal state. `Skipped` is not a
user-facing outcome and is excluded from the canonical projection.
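For the two apps whose outcome lives in an event-type string, the derivation in the table reduces to a lookup. The sketch below assumes ordinal, case-sensitive matching; for OtOpcUa it also assumes that every event type outside the denial list is a success verb. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public enum AuditOutcome { Success, Failure, Denied }

public static class OutcomeDerivation
{
    // The two denial event types named in the table above.
    private static readonly HashSet<string> OtOpcUaDenied = new(StringComparer.Ordinal)
    {
        "OpcUaAccessDenied",
        "CrossClusterNamespaceAttempt",
    };

    // Assumption: anything that is not a known denial is a success verb.
    public static AuditOutcome FromOtOpcUaEventType(string eventType) =>
        OtOpcUaDenied.Contains(eventType) ? AuditOutcome.Denied : AuditOutcome.Success;

    // "constraint-denied" maps to Denied, everything else to Success.
    public static AuditOutcome FromMxGatewayEventType(string eventType) =>
        string.Equals(eventType, "constraint-denied", StringComparison.Ordinal)
            ? AuditOutcome.Denied
            : AuditOutcome.Success;
}
```

Neither function can return `Failure`, which matches the notes: both apps emit only `Success` / `Denied` today.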
## Per-project mapping table (canonical ← native record)
Consolidated from the three current-state docs. "Direct" = field exists with the same role; the
right-hand notes flag the type bridges and synthesized fields.
| Canonical field | OtOpcUa `AuditEvent` (8 fields) | MxGateway `ApiKeyAuditRecord` (6 fields) | ScadaBridge `AuditEvent` (~25 fields) |
|---|---|---|---|
| `EventId` | `EventId` — direct (idempotency key) | **generate** new `Guid` (only `AuditId` rowid exists) | `EventId` — direct |
| `OccurredAtUtc` | `OccurredAtUtc` (`DateTime` UTC) → widen | `CreatedUtc` (store-assigned `DateTimeOffset`) — direct | `OccurredAtUtc` (`DateTime` UTC-forced) → widen |
| `Actor` | `Actor` — direct | `KeyId` (nullable → `"system"`/`"cli"` fallback) | `Actor` (nullable on system rows) |
| `Action` | `Action` (persisted as `"{Category}:{Action}"`) | `EventType` — direct | `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
| `Outcome` | **derive** from `EventType` | **derive**: `constraint-denied` → `Denied`, else `Success` | **derive** from `Status` (+`InboundAuthFailure` → `Denied`) |
| `Category` | `Category` (`"Config"`) | constant `"ApiKey"` | `Channel` |
| `Target` | — none — (null or via `DetailsJson`) | — none — (`commandKind`/`target` embedded in `Details` text) | `Target` — direct |
| `SourceNode` | `SourceNode` (logical node, `NodeId.Value`) | `RemoteAddress` (dashboard path only) | `SourceNode` — direct |
| `CorrelationId` | `CorrelationId` (`CorrelationId.Value`) — direct | — none — | `CorrelationId` — direct |
| `DetailsJson` | `DetailsJson` — direct (also `ClusterId`/`GenerationId` on the SP path) | `Details` (plain string → store as-is or wrap) | the ~15 rich/plumbing fields (`ExecutionId`, `SourceSiteId`, `HttpStatus`, `DurationMs`, `ErrorMessage`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra`, `ForwardState`, …) serialize here |
The canonical record is a **lossy projection**: it is sufficient for cross-project reporting, but each
project keeps its native record as the storage shape — ScadaBridge especially, whose partitioned SQL
schema, forwarding state, and reconciliation depend on the extra columns ([`SPEC.md`](SPEC.md) §5).
@@ -0,0 +1,146 @@
# Audit — normalized target spec
Status: **Draft**. The single design the sister projects converge on. Derived from the three
code-verified current-state docs (`../current-state/`) and the locked design
(`../../../docs/plans/2026-06-01-audit-component-design.md`). Goal is *path to shared code*
(`../shared-contract/ZB.MOM.WW.Audit.md`), so each normalized section maps to a shared library seam.
## 0. Normalized vs left-per-project
**Normalized here** (the shared `ZB.MOM.WW.Audit` library):
- **The canonical `AuditEvent` record** — required core (`EventId`, `OccurredAtUtc`, `Actor`,
`Action`, `Outcome`) + optional common (`Category`, `Target`, `SourceNode`, `CorrelationId`) +
the `DetailsJson` extension bag. The full field-by-field reference is [`EVENT-MODEL.md`](EVENT-MODEL.md).
- **`AuditOutcome`** — the 3-value `Success | Failure | Denied` enum (§3). This is a *new*
normalized field every app derives; see [`EVENT-MODEL.md`](EVENT-MODEL.md) for the per-app derivation.
- **The two seams** — `IAuditWriter` (best-effort, never throws to caller, §1) and `IAuditRedactor`
(pure, never throws, over-redacts on failure, §2).
**Explicitly NOT normalized** (domain-specific / divergent — keep per project):
- **Transport & storage** — OtOpcUa's Akka cluster-broadcast → singleton `AuditWriterActor` (batch
500 / 5 s, two-layer dedup) over `ConfigAuditLog`; MxGateway's SQLite `IApiKeyAuditStore` append +
list-recent; ScadaBridge's site-SQLite hot-path → central MS SQL ingest / reconcile / purge /
partition-maintenance / hash-chain pipeline. The shared core carries no Akka / EF / SQLite /
Serilog dependency; its only non-BCL dependency is `Microsoft.Extensions.DependencyInjection.Abstractions`
(for `AddZbAudit`).
- **Domain vocabulary** — ScadaBridge's `Channel` / `Kind` / `Status` / `ForwardState` enums and
OtOpcUa's `EventType` strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …). These map
*into* `Action` / `Category` / `Outcome` / `DetailsJson`; they do not leak into the shared type.
- **Query / CLI / UI / export** surfaces (OtOpcUa `ClusterAudit.razor`; ScadaBridge `export` /
`verify-chain` CLI + Blazor audit pages; MxGateway's unused `ListRecentAsync`).
- **Each app's redaction *policy*** — *which* fields/commands/payloads are sensitive. Only the
`IAuditRedactor` *seam* is shared; the `Default` / `Safe` filter behaviour stays per-project.
> **Scope of the producer path.** OtOpcUa has **two producers** writing the same `ConfigAuditLog`
> table — the structured Akka `AuditEvent` path *and* older SQL stored procedures that `INSERT`
> directly (`SUSER_SNAME()`, bare `EventType`, NULL `EventId`). Normalization targets the
> **structured producer path** (the one that builds an `AuditEvent`), not every SQL insert; the SP
> path stays per-project and is a reconcile item, not an extraction item (`../GAPS.md`).
## 1. The writer contract — `IAuditWriter` (best-effort)
```csharp
public interface IAuditWriter
{
Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}
```
Audit is a side-channel, never on the critical path. The hard rule:
- **`WriteAsync` MUST NOT throw to the caller.** An implementation swallows/logs its own internal
failures; a failed write **must never abort the user-facing action** it is recording. (ScadaBridge's
seam already states this almost word-for-word: "Failures must NEVER abort the user-facing action.")
- Idempotency is carried by `EventId`, so retries and at-least-once transports are safe (OtOpcUa's
filtered-unique `EventId` index and ScadaBridge's first-write-wins are both honoured by this key).
- Delivery is at-most-once *as a contract* — a writer MAY drop on failure (OtOpcUa drops a failed
batch; ScadaBridge's ring-buffer fallback drops oldest). Durability is a per-project transport
decision, not part of this seam.
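The interplay of best-effort writes and `EventId` idempotency can be illustrated with an in-memory writer. This is a toy stand-in for a store with a unique `EventId` key, not any project's transport; `InMemoryAuditWriter` and the trimmed `AuditEvent` are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in for the canonical record.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

public sealed class InMemoryAuditWriter : IAuditWriter
{
    private readonly ConcurrentDictionary<Guid, AuditEvent> _rows = new();

    public int Count => _rows.Count;

    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try
        {
            // First write wins: TryAdd leaves an existing row for the same
            // EventId untouched, so a retried delivery is a silent no-op.
            _rows.TryAdd(evt.EventId, evt);
        }
        catch (Exception)
        {
            // Best-effort: drop the event rather than surface the failure.
        }
        return Task.CompletedTask;
    }

    public bool TryGet(Guid eventId, out AuditEvent? evt) => _rows.TryGetValue(eventId, out evt);
}
```

Writing the same `EventId` twice leaves one row holding the first payload, which is what makes at-least-once delivery upstream safe.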
Shipped helpers (the only concrete writers): `NoOpAuditWriter` (discards — tests / disabled audit),
`CompositeAuditWriter` (fans out to N writers; **one writer throwing does not stop the others**), and
`RedactingAuditWriter` (decorator: applies the redactor, then delegates to an inner writer).
## 2. The redactor contract — `IAuditRedactor` (never throws)
```csharp
public interface IAuditRedactor
{
AuditEvent Apply(AuditEvent rawEvent);
}
```
A pure projection from a raw event to a safe one, applied between event construction and the writer
chain. The hard rule:
- **`Apply` MUST NOT throw.** On any internal failure it **over-redacts** (returns a strictly safer
event) rather than propagating — a redactor that throws would either crash the audit path or leak
the unredacted event. (ScadaBridge's `SafeDefaultAuditPayloadFilter` is the reference: header-only
redaction, over-redacts on parse failure.)
- It is a **pure function** returning a filtered *copy* (via `with`); it does not mutate the input or
perform I/O.
The seam is **aligned-but-independent** with Telemetry's `ILogRedactor` — same shape and naming
discipline so a future `ZB.MOM.WW.Hosting` aggregator wires both with one mental model — but there is
**no cross-package dependency**. Shipped helpers: `NullAuditRedactor` (identity — the default when no
policy is configured) and `TruncatingAuditRedactor` (caps `DetailsJson` / `Target` to a configured
max + sets a truncation marker; never throws). The *secret-field policy* (which fields/commands are
sensitive) stays per-project via composition.
## 3. `AuditOutcome` — the new normalized field
`Outcome` is in the **required core**, but **no app stores it today** — each encodes outcome
implicitly and must **derive** it at adoption (this is the one genuinely new field):
- **OtOpcUa** — derived from the `EventType` vocabulary (`OpcUaAccessDenied` /
`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`).
- **MxGateway** — `constraint-denied` → `Denied`; key-lifecycle events → `Success`.
- **ScadaBridge** — `AuditStatus` → `Outcome` (`Delivered` → `Success`; `Failed` / `Parked` /
`Discarded` → `Failure`; `InboundAuthFailure` kind → `Denied`).
The three values normalize denials and failures across the family without importing any app's full
taxonomy. The enum definition and the complete state-by-state mapping live in [`EVENT-MODEL.md`](EVENT-MODEL.md).
## 4. The hinge — audit closes the loop on Auth
Every audit row's `Actor` is the *who*, which is exactly the identity the **Auth** component already
normalizes (LDAP/GLAuth principal, API-key name). Auth is the read side ("who is this and what may
they do"); audit is the write side ("who did what"). The spec ties them by stating:
- **`Actor` SHOULD be the `ZB.MOM.WW.Auth` principal** at adoption time.
- But `Actor` is **kept as a plain `string`** in the contract, so the library carries **no dependency
on `ZB.MOM.WW.Auth`**. (MxGateway's keyless events — `init-db` / `list-keys` — supply a `"system"` /
`"cli"` fallback rather than leaving the required field empty.)
This mirrors Auth's own decision to keep audit *read* inside `OBSERVE` and audit *export* inside
`ADMINISTER` rather than minting a separate auditor role: the two components share a vocabulary, not a
dependency.
## 5. ScadaBridge is already at the target
ScadaBridge already ships **both** seams: an `IAuditWriter` whose best-effort contract matches
word-for-word, and an `IAuditPayloadFilter` that *is* the canonical `IAuditRedactor` under a different
name (identical `AuditEvent Apply(AuditEvent)` signature, pure / never-throws / over-redacts). The
library essentially **lifts ScadaBridge's seams**.
The one real (non-naming) decision is the **writer's record type**: the canonical `IAuditWriter` is
typed on the 10-field `AuditEvent`; ScadaBridge's writer is typed on its ~25-field record.
> **Resolution (recommended):** share the **interface *name* + the `AuditOutcome` enum**, not the
> record schema. ScadaBridge keeps its rich ~25-field record as its **storage shape** (its whole
> transport / partition / forwarding / reconciliation layer is built on the extra columns), and maps
> to the canonical 10-field record **only at cross-app reporting boundaries**. This is the
> minimal-coupling option — share the contract, not the schema — and avoids making the shared seam
> generic over the event type. ScadaBridge therefore converges by **renaming one interface** and
> adopting `AuditOutcome`, with no transport / storage / CLI / UI change.
## 6. Acceptance (what "converged" means)
A project is converged when: (a) its structured audit-producer path constructs the canonical
`AuditEvent` (with `Outcome` derived per §3) and persists via an implementation of `IAuditWriter`;
(b) any redaction runs through an `IAuditRedactor`; (c) `Actor` carries the `ZB.MOM.WW.Auth` principal
where one exists (string fallback otherwise); with its transport, storage, domain vocabulary, query
surfaces, and redaction *policy* unchanged. Per-project deltas and the adoption backlog are in
[`../GAPS.md`](../GAPS.md); the proposed library API is [`../shared-contract/ZB.MOM.WW.Audit.md`](../shared-contract/ZB.MOM.WW.Audit.md).
@@ -0,0 +1,133 @@
# Health — gaps & adoption backlog
Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach the shared `ZB.MOM.WW.Health` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
## Divergence vs spec
### §1 Endpoint tiers
| Spec tier | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| `/health/ready` (tag `ready`) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| `/health/active` (tag `active`) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| `/healthz` (bare process liveness) | ✅ present | ⛔ absent | ⛔ absent |
| `/health/live` (non-standard) | — | ⛔ present (hardcoded `"Healthy"`, bypasses health-check pipeline) | — |
**Gap T1 (P1):** MxAccessGateway has no standard health tiers. The existing `/health/live`
`MapGet` lambda must be replaced by `app.MapZbHealth()` + real probes.
**Gap T2:** ScadaBridge lacks `/healthz`. `MapZbHealth()` adds it automatically.
**Gap T3:** MxAccessGateway's `/health/live` uses a raw `MapGet` that bypasses the ASP.NET Core
health-check middleware — it does not participate in `IHealthCheckPublisher`, `HealthReport`, or
UI integration. Must be removed.
### §2 Probe coverage
| Probe | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Database connectivity | ✅ `DatabaseHealthCheck` (query probe) | ⛔ none | ✅ `DatabaseHealthCheck` (`CanConnectAsync`) |
| Akka cluster membership | ✅ `AkkaClusterHealthCheck` (2-way) | n/a (no Akka) | ✅ `AkkaClusterHealthCheck` (3-way) |
| Active / leader node | ✅ `AdminRoleLeaderHealthCheck` (role-filtered) | n/a | ✅ `ActiveNodeHealthCheck` (role-less) |
| Downstream gRPC dependency | ⛔ none | ⛔ none | ⛔ none |
**Gap P1 (P1):** MxAccessGateway has zero probes — `AddHealthChecks()` at
`GatewayApplication.cs:61` is dead code. Minimum viable: a `GrpcDependencyHealthCheck`
targeting the x86 worker IPC channel.
**Gap P2:** No project probes its downstream gRPC dependency. OtOpcUa should probe the
MxAccessGateway channel; MxAccessGateway should probe the worker IPC.
**Gap P3:** Dead `AddHealthChecks()` in MxAccessGateway (`GatewayApplication.cs:61`) should be
removed or replaced — it currently implies health checks are configured when they are not.
### §3 Akka status-policy divergence
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe implementation | Scans `State.Members` for self by address | Reads `SelfMember.Status` directly |
| Joining status | Degraded (not in Members as Up) | Healthy |
| Leaving/Exiting status | Degraded | Degraded |
| Other (Removed, Down…) | Degraded | Unhealthy |
| ActorSystem null guard | — (none; `ActorSystem` injected directly) | ✅ Degraded if null |
The two implementations diverge in how they classify `Joining` (ScadaBridge calls it Healthy;
OtOpcUa would see it as Degraded because `SelfMember` with status `Joining` would not appear as
`Up` in the member scan). They also diverge in the Removed/Down classification (ScadaBridge
Unhealthy, OtOpcUa Degraded).
The shared `ZB.MOM.WW.Health.Akka.AkkaClusterHealthCheck` ships two presets to preserve both
behaviors rather than forcing one onto the other:
- **Default** — ScadaBridge's three-way policy (`Up`/`Joining`=Healthy, `Leaving`/`Exiting`=Degraded,
else Unhealthy)
- **OtOpcUaCompat** — OtOpcUa's self-Up-among-members scan (found Up=Healthy, not found=Degraded)
**Gap A1:** OtOpcUa adopts the `OtOpcUaCompat` preset; ScadaBridge adopts the `Default` preset.
Both preserve existing behavior without forcing convergence on a single policy.
**Gap A2:** OtOpcUa's `AkkaClusterHealthCheck` injects `ActorSystem` directly (no null guard).
The shared implementation injects via `AkkaHostedService` for startup safety.
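
To make the difference between the two presets concrete, here is a minimal sketch of the classification logic only. `MemberStatus` and `HealthStatus` are local stand-ins (so the snippet compiles without Akka.NET or ASP.NET Core) and list only the states discussed above; `AkkaStatusPolicies` and its method names are hypothetical and are not the shared library's API.

```csharp
using System;

// Local stand-ins for Akka.NET's MemberStatus and ASP.NET Core's HealthStatus.
public enum MemberStatus { Joining, Up, Leaving, Exiting, Down, Removed }
public enum HealthStatus { Unhealthy, Degraded, Healthy }

public static class AkkaStatusPolicies
{
    // "Default" preset (ScadaBridge): three-way classification of the
    // node's own member status.
    public static HealthStatus Default(MemberStatus self) => self switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };

    // "OtOpcUaCompat" preset: Healthy only when the node is found as Up in
    // the member list; anything else (including "not found", passed as null)
    // is Degraded. There is no Unhealthy path.
    public static HealthStatus OtOpcUaCompat(MemberStatus? selfInMembers) =>
        selfInMembers == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
}
```

With these definitions a `Joining` node is Healthy under `Default` but Degraded under `OtOpcUaCompat`, and a `Removed` node is Unhealthy under `Default` but Degraded under `OtOpcUaCompat`, which are the two divergences the table records.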
### §4 Database probe technique
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe method | `db.Deployments.AsNoTracking().Take(1).ToListAsync()` (query) | `_dbContext.Database.CanConnectAsync()` (connection only) |
| Injection style | `IDbContextFactory<T>` (pooled, safe for concurrent probes) | `DbContext` directly (scoped, requires care in background use) |
| Schema verification | ✅ implies schema is applied | ⛔ connection only |
**Gap D1:** `ZB.MOM.WW.Health.EntityFrameworkCore.DatabaseHealthCheck<TContext>` uses
`CanConnectAsync` as the default (ScadaBridge behavior). An optional `ProbeQuery` delegate covers
OtOpcUa's stricter approach. Both apps retain their existing probe semantics; neither is forced
to change unless desired.
**Gap D2:** ScadaBridge injects `DbContext` directly; the shared probe should use
`IDbContextFactory<TContext>` for safe reuse from a background-service health-check context.
ScadaBridge's DI registration will need updating on adoption.
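
The "connection check by default, optional stricter query" shape described in Gap D1 can be sketched without EF Core as below. This is only an illustration of the control flow: `DatabaseProbeSketch` is a hypothetical type, the `canConnect` delegate stands in for `Database.CanConnectAsync`, the `probeQuery` delegate stands in for a real query such as OtOpcUa's `Deployments` read, and `HealthStatus` is a local stand-in for the ASP.NET Core enum.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in for ASP.NET Core's HealthStatus.
public enum HealthStatus { Unhealthy, Degraded, Healthy }

// Hypothetical sketch of the probe's control flow, not the library type.
public sealed class DatabaseProbeSketch
{
    private readonly Func<CancellationToken, Task<bool>> _canConnect;
    private readonly Func<CancellationToken, Task>? _probeQuery;

    public DatabaseProbeSketch(
        Func<CancellationToken, Task<bool>> canConnect,
        Func<CancellationToken, Task>? probeQuery = null)
    {
        _canConnect = canConnect ?? throw new ArgumentNullException(nameof(canConnect));
        _probeQuery = probeQuery;
    }

    public async Task<HealthStatus> CheckAsync(CancellationToken ct = default)
    {
        try
        {
            if (_probeQuery is not null)
            {
                // Stricter path: the query must complete without throwing.
                await _probeQuery(ct).ConfigureAwait(false);
                return HealthStatus.Healthy;
            }

            // Default path: connection-only check.
            return await _canConnect(ct).ConfigureAwait(false)
                ? HealthStatus.Healthy
                : HealthStatus.Unhealthy;
        }
        catch (Exception)
        {
            // Any failure (including cancellation) is reported as Unhealthy
            // in this sketch; there is no Degraded path.
            return HealthStatus.Unhealthy;
        }
    }
}
```

Under this shape ScadaBridge would pass only the connection delegate, while OtOpcUa would also pass a query delegate, so neither app changes its probe semantics.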
### §5 Active-node / leader check
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe type | `AdminRoleLeaderHealthCheck` (role-filtered: `"admin"`) | `ActiveNodeHealthCheck` (role-less; Up + leader) |
| Non-role-bearing node | Healthy immediately | n/a (all central nodes have no role filter) |
| Leader status | Healthy | Healthy |
| Non-leader (standby) | Degraded | Unhealthy |
| `IActiveNodeGate` backing | Not present | `ActiveNodeGate` (separate type, duplicated logic) |
**Gap L1:** `ZB.MOM.WW.Health.Akka.ActiveNodeHealthCheck` with an optional `RoleFilter`
parameter unifies both behaviors. OtOpcUa passes `RoleFilter = "admin"` (role-filtered);
ScadaBridge uses no role filter.
**Gap L2:** ScadaBridge's `ActiveNodeGate` duplicates `ActiveNodeHealthCheck` logic. The shared
`IActiveNodeGate` seam + a backing singleton eliminates the duplication.
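
One way to read the table above as a single decision function is sketched here. `ActiveNodePolicy.Evaluate` is a hypothetical helper, `HealthStatus` is a local stand-in for the ASP.NET Core enum, and `isLeader` is assumed to mean "cluster leader" in the role-less case and "leader for the filtered role" otherwise. Choosing Unhealthy versus Degraded for a standby based on whether a role filter is set is an assumption taken from the two existing behaviors, not a confirmed detail of the shared check.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for ASP.NET Core's HealthStatus.
public enum HealthStatus { Unhealthy, Degraded, Healthy }

public static class ActiveNodePolicy
{
    // roleFilter == null : role-less check (ScadaBridge behavior).
    // roleFilter != null : role-filtered check (OtOpcUa behavior).
    public static HealthStatus Evaluate(
        ICollection<string> selfRoles, bool isLeader, string? roleFilter = null)
    {
        if (roleFilter is null)
        {
            // Role-less: a standby node is Unhealthy.
            return isLeader ? HealthStatus.Healthy : HealthStatus.Unhealthy;
        }

        if (!selfRoles.Contains(roleFilter))
        {
            // A node that does not carry the role is never gated by this probe.
            return HealthStatus.Healthy;
        }

        // Role-filtered: a standby node is Degraded, not Unhealthy.
        return isLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```

For example, `ActiveNodePolicy.Evaluate(new[] { "admin" }, isLeader: false, roleFilter: "admin")` yields `Degraded`, while the same standby node evaluated without a role filter yields `Unhealthy`.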
### §6 Response writer
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Writer | Default (plain-text/JSON) | Bespoke `GatewayHealthReply` JSON | `UIResponseWriter.WriteHealthCheckUIResponse` |
**Gap W1:** the shared `ZB.MOM.WW.Health` package ships a canonical JSON response writer
(lifting `HealthChecks.UI.Client` style to the default). All three projects adopt it on
`MapZbHealth()` call — no per-project writer wiring needed.
### §7 Endpoint authentication
Both OtOpcUa and ScadaBridge expose health endpoints without authentication (`AllowAnonymous` or
open by default). MxAccessGateway's `/health/live` has no authentication requirement. The spec
canonizes this: health tiers are `AllowAnonymous`; `MapZbHealth()` applies `AllowAnonymous` by
default.
No gap — consistent across all three. `MapZbHealth()` should document and enforce this default.
## Adoption backlog (ordered)
| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | MxAccessGateway: remove dead `/health/live` + `AddHealthChecks()`, add `GrpcDependencyHealthCheck` (worker IPC) + `MapZbHealth()` | MxGateway | P1 | S | Low | Gap T1, T3, P1, P3 — no probes/tiers today; highest delta |
| 2 | OtOpcUa: replace 3 bespoke checks with shared probes (`AkkaClusterHealthCheck` OtOpcUaCompat + `ActiveNodeHealthCheck` role-filtered + `DatabaseHealthCheck<T>` ProbeQuery) | OtOpcUa | P2 | S | Low | Gap A1, D1, L1 |
| 3 | ScadaBridge: replace 3 bespoke checks with shared probes (Default policy + role-less Active + `CanConnectAsync`) + add `/healthz` + unify `ActiveNodeGate` with `IActiveNodeGate` seam | ScadaBridge | P2 | S | Low | Gap T2, A1, D2, L1, L2 |
| 4 | OtOpcUa + MxAccessGateway: add `GrpcDependencyHealthCheck` for downstream gRPC channel | OtOpcUa, MxGateway | P2 | S | Low | Gap P2 — closes the silent-gateway-down scenario |
| 5 | All: adopt canonical response writer (switch from per-project writers to `MapZbHealth` default) | all 3 | P3 | XS | Low | Gap W1 — mechanical; bundled with #1–3 |
| 6 | DB injection style: switch ScadaBridge from injected `DbContext` to `IDbContextFactory<T>` | ScadaBridge | P3 | XS | Low | Gap D2 — background-service safety |
**Note: adoption items #1–6 are all follow-on tasks.** They are tracked here as the backlog for
after `ZB.MOM.WW.Health` @ 0.1.0 is published. The library build itself (nupkgs, tests) is a
separate task. This is consistent with how `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` are structured:
the library is built first; adoption by the three apps is the next step.
@@ -0,0 +1,88 @@
# Health (readiness / liveness / active-node)
Second normalized component under the operability cluster. **Goal: path to shared code** — converge
the three sister projects onto a common three-tier health endpoint convention and a set of shared
probe implementations, proposed as the `ZB.MOM.WW.Health` library set (3 packages), while each
project keeps its own probe registration and orchestrator wiring.
- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Health.md`](shared-contract/ZB.MOM.WW.Health.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)
## Why health is a strong normalization candidate
Both OtOpcUa and ScadaBridge trace their health-check structure to the same "ScadaLink three-tier
pattern" (`HealthEndpoints.cs:13` says so explicitly) but have already diverged in probe logic,
status semantics, response writer, and endpoint registration style. MxAccessGateway has no shared
ancestry here — it has a single hardcoded `/health/live` endpoint with no real probes at all.
The common core (three tiers, database probe, Akka cluster probe, active-node probe) is
re-implemented twice and absent once. Shared probe implementations with configurable policies
close the gap without forcing identical behavior onto projects with legitimately different cluster
semantics.
## Status by project
| Project | Endpoints today | Probes today | Response writer | `/healthz` | `IActiveNodeGate` | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | `/health/ready`, `/health/active`, `/healthz` | Database (query), AkkaCluster (2-way), AdminRoleLeader (role-filtered) | Default (plain-text/JSON) | ✅ present | — | Not started |
| **MxAccessGateway** | `/health/live` only (raw `MapGet`; hardcoded `"Healthy"`) | **None** (`AddHealthChecks()` called but unused) | Bespoke `GatewayHealthReply` JSON | ⛔ absent | — | Not started |
| **ScadaBridge** | `/health/ready`, `/health/active` | Database (`CanConnectAsync`), AkkaCluster (3-way), ActiveNode (role-less) | `HealthChecks.UI.Client` JSON | ⛔ absent | `ActiveNodeGate` (backs Inbound API 503 gate) | Not started |
See each project's [`current-state/<project>/CURRENT-STATE.md`](current-state/) for the
code-verified detail and its adoption plan.
## Normalized vs. left per-project
**Normalized (the shared target):**
- Three-tier endpoint convention: `/health/ready` (tag `ready`), `/health/active` (tag `active`),
`/healthz` (bare liveness). Mapped by `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
- Canonical JSON response writer (lifted from `HealthChecks.UI.Client` style; no per-project
writer wiring needed).
- `IActiveNodeGate` seam — generalized from ScadaBridge's `ActiveNodeGate`; wired into `MapZbHealth`
for automatic active-tier response.
- `GrpcDependencyHealthCheck` — reachability probe for a downstream gRPC dependency (covers
OtOpcUa → MxAccessGateway channel and MxAccessGateway → worker IPC).
- `AkkaClusterHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with a configurable status policy.
Default = ScadaBridge's three-way policy; `OtOpcUaCompat` preset preserves OtOpcUa's two-way
self-Up-among-members scan.
- `ActiveNodeHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with an optional role filter. Role-less =
ScadaBridge's behavior (Up + cluster leader); role-filtered = OtOpcUa's `AdminRoleLeader`
behavior.
- `DatabaseHealthCheck<TContext>` (in `ZB.MOM.WW.Health.EntityFrameworkCore`) with default
`CanConnectAsync` and an optional `ProbeQuery` delegate.
- `AllowAnonymous` on all three tiers by default (consistent across all three projects today).
**Left per-project (not forced together):**
- Which probes each app registers, their names, and which tags they carry.
- Orchestrator / Traefik wiring (sidecars, route rules, upstreams).
- ScadaBridge's `HealthMonitoring/` distributed aggregation pipeline (`SiteHealthCollector`,
`CentralHealthAggregator`, `HealthReportSender`, etc.) — domain-specific, no shared-library
equivalent.
- MxAccessGateway's `GatewayHealthReply` metadata (`DefaultBackend`, `WorkerProtocolVersion`) —
keep as a bespoke `/info` endpoint.
- The x86 worker process — out of process and out of scope; the gateway-side
`GrpcDependencyHealthCheck` observes it indirectly.
## Package structure
`ZB.MOM.WW.Health` ships as three dependency-split packages:
| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tiers, `MapZbHealth`, canonical writer, `IActiveNodeGate`, `GrpcDependencyHealthCheck` | All three |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` + status presets, `ActiveNodeHealthCheck` + role filter | OtOpcUa, ScadaBridge |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck<TContext>` + optional probe delegate | OtOpcUa, ScadaBridge |
MxAccessGateway consumes the core package only (no Akka, no EF). OtOpcUa and ScadaBridge consume
all three.
## Component status
**Status: Draft — library built at 0.1.0.** Spec and shared-contract written; current-state docs
verified; GAPS backlog populated. Library implemented and packed at
[`../../ZB.MOM.WW.Health/`](../../ZB.MOM.WW.Health/) (3 packages, 58 tests;
`ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`).
Adoption by the three apps is the next follow-on tracked in [`GAPS.md`](GAPS.md).
@@ -0,0 +1,133 @@
# Health — current state: MxAccessGateway
Repo: `~/Desktop/MxAccessGateway`. Stack: .NET 10 gateway (x64) + .NET 4.8 worker (x86), gRPC;
solution `src/MxGateway.sln`.
Health code lives in `src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`. All paths relative
to repo root.
Verified 2026-06-01.
**Summary: bare liveness only.** MxAccessGateway has a single `/health/live` endpoint that returns
a hardcoded `GatewayHealthReply` JSON object. `AddHealthChecks()` is called at startup but is
entirely unused — no `IHealthCheck` implementations are registered, `MapHealthChecks` is never
called, and there is no readiness or active-node tier. The net48 x86 worker process has no HTTP
server and therefore no health endpoint of any kind.
## 1. Endpoint wiring
`src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs`:
- `:61` — `builder.Services.AddHealthChecks()` is called in the DI registration block. **This call
is dead**: no `.AddCheck<T>()` call follows it, no `MapHealthChecks` is ever called. The
framework registers the health-check infrastructure but nothing is wired through it.
- `:139–145` — `MapGatewayEndpoints` maps a raw `endpoints.MapGet("/health/live", ...)` (not
`MapHealthChecks`). The handler is an inline lambda that returns `Results.Ok(new GatewayHealthReply(...))`
with a hardcoded `Status: "Healthy"`:
```csharp
endpoints.MapGet(
        "/health/live",
        () => Results.Ok(new GatewayHealthReply(
            Status: "Healthy",
            DefaultBackend: GatewayContractInfo.DefaultBackendName,
            WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
    .WithName("LiveHealth");
```
This endpoint always returns HTTP 200 `{"Status":"Healthy",...}` as long as the process is alive.
It carries no authentication requirement (no `[Authorize]` or `.RequireAuthorization()`).
## 2. Response shape
`GatewayHealthReply` is a record with three fields:
- `Status` — always `"Healthy"` (hardcoded string, not the ASP.NET Core `HealthStatus` enum)
- `DefaultBackend` — value of `GatewayContractInfo.DefaultBackendName` (the configured backend
name, useful for confirming which gateway instance a probe hit)
- `WorkerProtocolVersion` — value of `GatewayContractInfo.WorkerProtocolVersion` (the gRPC
protocol version the gateway expects from the worker, useful for version-skew detection)
The response is not `HealthChecks.UI.Client` JSON and is not the standard ASP.NET Core health
response shape. It is a bespoke JSON record.
## 3. Probes
None. There is no `IHealthCheck` registered. The `/health/live` response does not reflect:
- Whether the SQLite auth-store is reachable
- Whether any active MXAccess session is functional
- Whether the x86 worker named-pipe IPC is connected or the worker process is alive
- Whether the gRPC service is actually accepting calls
The endpoint is purely a process liveness indicator.
## 4. Tier coverage
| Tier | Endpoint | Status |
|---|---|---|
| Process liveness | `/health/live` (raw `MapGet`) | ✅ present (but non-standard) |
| Readiness | `/health/ready` | ⛔ absent |
| Active node | `/health/active` | ⛔ absent (not Akka-based; not applicable as-is) |
| `healthz` convention | `/healthz` | ⛔ absent |
MxAccessGateway is not an Akka.NET application — it has no cluster, no leader election, and no
active-node concept. The "active" tier in the shared spec translates here to "is the worker process
connected and the gRPC service ready to accept calls?" rather than cluster leadership.
## 5. x86 worker
`ZB.MOM.WW.MxGateway.Worker` is a .NET 4.8 console application communicating with the gateway
over Windows named-pipe IPC. It has no HTTP server, no health endpoint, and no exposure to any
probe mechanism. Its liveness must be inferred indirectly — either via the gateway process
monitoring it (not currently implemented) or via the `GrpcDependencyHealthCheck` the gateway
could use to probe the IPC channel.
## 6. Notable gaps
- `AddHealthChecks()` at `:61` is dead code. No `IHealthCheck` is ever registered via this call.
- `/health/live` uses `MapGet` (a raw minimal-API handler) rather than `MapHealthChecks`. It
bypasses the ASP.NET Core health-check middleware entirely, which means it does not participate
in the standard health-check pipeline (no `IHealthCheckPublisher`, no `HealthReport`, no UI
integration).
- The hardcoded `"Healthy"` status means the endpoint cannot reflect real probe results even if
probes were added later — the handler must be replaced, not just supplemented.
- No readiness gating: orchestrators (Kubernetes, Traefik) that rely on `/health/ready` returning
503 until the process is actually ready get a 404 from MxAccessGateway today, because that path is
not mapped; if pointed at `/health/live` instead, they get an unconditional 200.
---
## Adoption plan → `ZB.MOM.WW.Health`
**Replace `/health/live` + wire the shared tiers:**
The `AddHealthChecks()` call at `GatewayApplication.cs:61` is already present — it just needs
probes registered against it. The raw `MapGet("/health/live", ...)` handler at `:139–145` must be
removed and replaced with `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
Steps:
1. **Remove** the inline `MapGet("/health/live", ...)` lambda (`:139–145`). The `GatewayHealthReply`
record and `DefaultBackend`/`WorkerProtocolVersion` metadata can be surfaced differently (e.g., a
`/info` endpoint or as custom data on the health response).
2. **Register a `GrpcDependencyHealthCheck`** (from `ZB.MOM.WW.Health`) that probes the
named-pipe IPC channel to the x86 worker. Tag `["ready"]`. This replaces the hardcoded
liveness-only response with a real probe that reflects whether the worker is reachable.
3. **Optionally add a `GrpcDependencyHealthCheck`** for any downstream gRPC dependency (e.g., the
Galaxy Repository connection) if the gateway is expected to be healthy only when its upstreams are
reachable. Tag `["ready"]`.
4. **Call `app.MapZbHealth()`** — this maps `/health/ready` (tag `ready`), `/health/active` (tag
`active`; initially empty — no active-node concept in MxGateway), and `/healthz` (bare liveness).
The `/healthz` endpoint takes over the semantic role that `/health/live` serves today.
5. **Do not add `ZB.MOM.WW.Health.Akka`** — MxAccessGateway has no Akka dependency. The consumer
matrix in the design specifies MxGateway uses the core package only.
**Keep bespoke:**
- The `WorkerProtocolVersion` / `DefaultBackend` metadata from `GatewayHealthReply` is
MxAccessGateway-specific; keep it as a separate `/info` endpoint or embed it as `Data` on a
custom probe rather than normalizing it into the shared contract.
- The x86 worker itself (net48 console, named-pipe IPC, no HTTP) remains outside the shared health
scheme. The `GrpcDependencyHealthCheck` observes the worker indirectly from the gateway side.
- Per-gateway auth and TLS concerns on who may call health endpoints remain per-project.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Health`
library build. MxGateway is the **highest-priority adopter** (P1 gap — no probes/tiers today)
and should be the first app wired up once the nupkg is available.
@@ -0,0 +1,154 @@
# Health — current state: OtOpcUa
Repo: `~/Desktop/OtOpcUa`. Stack: .NET 10, Akka.NET, OPC UA; solution `ZB.MOM.WW.OtOpcUa.slnx`.
Health code lives in `src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/`. All paths relative to repo root.
Verified 2026-06-01.
Full three-tier pattern: `/health/ready`, `/health/active`, and `/healthz`. Three probes covering
the database, the Akka cluster, and the admin-role leader. All endpoints are `AllowAnonymous` to
permit Traefik and load-balancer probing without credentials.
## 1. Endpoint wiring
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs`:
- `:13` — XML comment explicitly names this as "ScadaLink's three-tier pattern: `ready` = boot ok;
`active` = fully serving traffic; `healthz` = bare process liveness."
- `:17` — `AddOtOpcUaHealth(IServiceCollection)` calls `services.AddHealthChecks()` and registers
all three probes (lines 20–22):
  - `DatabaseHealthCheck` name `"configdb"`, tags `["ready","active"]`
  - `AkkaClusterHealthCheck` name `"akka"`, tags `["ready","active"]`
  - `AdminRoleLeaderHealthCheck` name `"admin-leader"`, tags `["active"]` only
- `:28` — `MapOtOpcUaHealth(IEndpointRouteBuilder)` maps three endpoints (lines 33–44):
  - `/health/ready` — predicate `c => c.Tags.Contains("ready")`, `.AllowAnonymous()` (lines 33–36)
  - `/health/active` — predicate `c => c.Tags.Contains("active")`, `.AllowAnonymous()` (lines 37–40)
  - `/healthz` — predicate `_ => false` (no probes run; bare process liveness only), `.AllowAnonymous()` (lines 41–44)
`Program.cs`:
- `:137` — `builder.Services.AddOtOpcUaHealth()`
- `:159` — `app.MapOtOpcUaHealth()`
Response writer: default ASP.NET Core plain-text/JSON (no `HealthChecks.UI.Client`).
## 2. Probes
### DatabaseHealthCheck
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/DatabaseHealthCheck.cs`:
- `:9` — injects `IDbContextFactory<OtOpcUaConfigDbContext>`
- `:25–37` — opens a pooled context via `CreateDbContextAsync`, runs
`db.Deployments.AsNoTracking().Take(1).ToListAsync()`. If the query succeeds →
`HealthCheckResult.Healthy("ConfigDb reachable")` (`:31`). If it throws →
`HealthCheckResult.Unhealthy("ConfigDb unreachable", ex)` (`:35`). No `Degraded` path.
The probe exercises a real query (not just `CanConnectAsync`) — it confirms the `Deployments` table
is readable, which implies the schema migration has run. This is **stricter** than ScadaBridge's
`CanConnectAsync` but more opaque about the failure reason.
Tags on registration: `["ready","active"]` — the database must be reachable for both readiness and
active-node determination.
### AkkaClusterHealthCheck
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AkkaClusterHealthCheck.cs`:
- `:9` — injects `ActorSystem` directly
- `:27–33` — calls `Cluster.Get(_system)`, scans `cluster.State.Members` for the member whose
`Address == cluster.SelfAddress` and `Status == MemberStatus.Up`:
  - Found Up → `HealthCheckResult.Healthy($"Self Up; {cluster.State.Members.Count} member(s)")` (`:32`)
  - Not found → `HealthCheckResult.Degraded("Self not yet Up in cluster")` (`:33`)
No `Unhealthy` path — joining/leaving/removed nodes are all reported as `Degraded`. This differs from
ScadaBridge's more granular three-way policy (see GAPS).
Tags on registration: `["ready","active"]`.
### AdminRoleLeaderHealthCheck
`src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AdminRoleLeaderHealthCheck.cs`:
- `:14` — injects `IClusterRoleInfo`
- `:27–38` — three-path logic:
  - Node does not carry the `"admin"` role → `Healthy("Node does not carry admin role")` (`:30`) —
    non-admin nodes are immediately healthy, so this probe never gates a non-admin node.
  - Admin role + node is the role leader → `Healthy($"Admin leader ({...})")` (`:36`)
  - Admin role + not the leader → `Degraded($"Admin member but not leader (leader=...)")` (`:37`)
Tags on registration: `["active"]` only — does not participate in `/health/ready`. The intent is
Traefik routing: the active node (admin-role leader) gets sticky admin-UI traffic; standby nodes
are reachable for data-plane OPC UA but report `Degraded` on `/health/active` so the load balancer
does not route control-plane traffic to them.
Note: no `Unhealthy` path for the role-filter case. If the ActorSystem is not running, `IClusterRoleInfo`
presumably returns safe defaults (no role); this is not separately health-checked.
## 3. Tag / tier summary
| Probe | `/health/ready` | `/health/active` | `/healthz` |
|---|---|---|---|
| `DatabaseHealthCheck` | ✅ | ✅ | — |
| `AkkaClusterHealthCheck` | ✅ | ✅ | — |
| `AdminRoleLeaderHealthCheck` | — | ✅ | — |
| (no probes) | — | — | ✅ (bare liveness) |
`/healthz` runs zero probes — it is a pure process liveness sentinel (process reachable = healthy;
a crashed process = no response). Kubernetes liveness probes, Traefik TCP checks, and uptime
monitors use this tier.
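
The table above follows mechanically from the tag predicates. The sketch below reproduces that selection with plain collections: the probe names and tags are the ones listed in §1, while `ProbeRegistration` and `TierSelection` are hypothetical types used only for illustration (the real wiring goes through the ASP.NET Core health-check registrations).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a health-check registration (name + tags).
public sealed record ProbeRegistration(string Name, string[] Tags);

public static class TierSelection
{
    // Registrations as listed in §1: name and tags per probe.
    public static readonly IReadOnlyList<ProbeRegistration> Probes = new[]
    {
        new ProbeRegistration("configdb", new[] { "ready", "active" }),
        new ProbeRegistration("akka", new[] { "ready", "active" }),
        new ProbeRegistration("admin-leader", new[] { "active" }),
    };

    // Returns the names of the probes a tier would run for a given predicate.
    public static string[] ForTier(Func<ProbeRegistration, bool> predicate) =>
        Probes.Where(predicate).Select(p => p.Name).ToArray();

    public static string[] Ready() => ForTier(p => p.Tags.Contains("ready"));
    public static string[] Active() => ForTier(p => p.Tags.Contains("active"));

    // Bare liveness: the predicate rejects every probe, so none run.
    public static string[] Healthz() => ForTier(_ => false);
}
```

`Ready()` selects `configdb` and `akka`, `Active()` selects all three probes, and `Healthz()` selects none, matching the three endpoint columns.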
## 4. Downstream dependency coverage
No probe for the downstream MxAccessGateway gRPC channel. If the gateway is unreachable, OtOpcUa
reports healthy here (the GalaxyDriver will surface errors in OPC UA diagnostics, but `/health/ready`
and `/health/active` will not reflect it). This is a gap that the shared `GrpcDependencyHealthCheck`
probe in `ZB.MOM.WW.Health` would close.
## 5. Notable design choices
- **AllowAnonymous on all tiers** — see `HealthEndpoints.cs:30–32` comment: "Without it the
`AddOtOpcUaAuth` fallback policy 401s every probe and Traefik marks every backend unhealthy."
- **Query probe, not `CanConnectAsync`** — the `Deployments` query validates that the schema has
been applied. ScadaBridge uses `CanConnectAsync`; neither is wrong but they diverge.
- **`Degraded` semantics** — the Akka check uses `Degraded` (not `Unhealthy`) for a joining/pre-Up
node. ASP.NET Core maps `Degraded` to HTTP 200 by default; Traefik sees 200 and considers the
node ready. If `Unhealthy` (HTTP 503) is required to gate traffic, the `Degraded` path is
insufficient.
- **`IClusterRoleInfo` abstraction** — the admin-leader check depends on `IClusterRoleInfo`, an OtOpcUa
interface, not the raw `Akka.Cluster.Cluster` API. This is a testability-friendly layer absent in
ScadaBridge's direct Akka usage.
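
The `Degraded` point above comes down to the status-to-HTTP mapping. The sketch below models ASP.NET Core's default mapping (Healthy and Degraded both 200, Unhealthy 503) next to a stricter alternative. `HealthStatusCodeMaps` and `PassesProbe` are hypothetical names, `HealthStatus` is a local stand-in, and treating "any 2xx passes" as the load balancer's rule is a simplifying assumption. In ASP.NET Core itself the mapping is configured through `HealthCheckOptions.ResultStatusCodes`.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for ASP.NET Core's HealthStatus.
public enum HealthStatus { Unhealthy, Degraded, Healthy }

public static class HealthStatusCodeMaps
{
    // Mirrors ASP.NET Core's default status-code mapping.
    public static readonly IReadOnlyDictionary<HealthStatus, int> AspNetCoreDefault =
        new Dictionary<HealthStatus, int>
        {
            [HealthStatus.Healthy] = 200,
            [HealthStatus.Degraded] = 200,
            [HealthStatus.Unhealthy] = 503,
        };

    // Hypothetical stricter mapping in which Degraded also gates traffic.
    public static readonly IReadOnlyDictionary<HealthStatus, int> DegradedGatesTraffic =
        new Dictionary<HealthStatus, int>
        {
            [HealthStatus.Healthy] = 200,
            [HealthStatus.Degraded] = 503,
            [HealthStatus.Unhealthy] = 503,
        };

    // Simplified model of an HTTP health probe: any 2xx response passes.
    public static bool PassesProbe(HealthStatus status, IReadOnlyDictionary<HealthStatus, int> map)
    {
        int code = map[status];
        return code >= 200 && code < 300;
    }
}
```

Under the default mapping a `Degraded` standby still passes the probe; only a mapping like the second one, or reporting `Unhealthy` instead, would take it out of rotation.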
---
## Adoption plan → `ZB.MOM.WW.Health`
**Replace with shared probes:**
- `AkkaClusterHealthCheck` → `ZB.MOM.WW.Health.Akka.AkkaClusterHealthCheck` using the
**`OtOpcUaCompat` preset** (self-Up-among-members scan → Healthy/Degraded). The preset keeps
OtOpcUa's existing two-way policy without forcing ScadaBridge's three-way policy onto it.
- `AdminRoleLeaderHealthCheck` → `ZB.MOM.WW.Health.Akka.ActiveNodeHealthCheck` with
`RoleFilter = "admin"`. The role-filter parameter produces identical behavior: non-admin nodes
immediately healthy, admin leader healthy, admin non-leader degraded.
- `DatabaseHealthCheck` → `ZB.MOM.WW.Health.EntityFrameworkCore.DatabaseHealthCheck<OtOpcUaConfigDbContext>`
with a `ProbeQuery` delegate of `db => db.Deployments.AsNoTracking().Take(1).ToListAsync()`.
The delegate preserves the stricter query probe rather than falling back to `CanConnectAsync`.
- Add `GrpcDependencyHealthCheck` targeting the MxAccessGateway channel (closes the downstream
dependency gap noted in §4). Tag `["ready","active"]`.
- Replace `AddOtOpcUaHealth` / `MapOtOpcUaHealth` with
`services.AddHealthChecks().AddCheck<...>()` (one call per probe, per spec §5) +
`app.MapZbHealth()`. The `/healthz` bare-liveness tier is part of `MapZbHealth` by default —
no separate wiring needed.
**Keep bespoke:**
- `IClusterRoleInfo` and its Akka implementation — on adoption this testability seam is given up
for the health-check path. The shared `ActiveNodeHealthCheck` reads cluster role state from the
ActorSystem directly (resolving it lazily via `IServiceProvider`); it does not accept
`IClusterRoleInfo` as an injection point. This is an accepted trade-off: the shared implementation
is simpler and consistent across projects, while `IClusterRoleInfo` remains available elsewhere
in the OtOpcUa codebase where it is used outside health checks.
- The `AllowAnonymous` policy — the fallback auth policy that makes it necessary is an OtOpcUa
concern. Per GAPS §7, `MapZbHealth()` applies `AllowAnonymous` to all three tiers by default, so no
extra per-endpoint call should be needed on adoption.
- Which probes are registered and their tag assignments — the shared library supplies the check
implementations; the wiring (which names, which tags, which options) remains per-project.
**Adoption is a follow-on task** (tracked in `GAPS.md`), not part of the `ZB.MOM.WW.Health`
library build. The library build delivers the shared implementations; adoption lands in the
OtOpcUa repo as a separate commit once the nupkg is available.
