Design — Audit normalization component + ZB.MOM.WW.Audit shared library

Date: 2026-06-01

Status: Approved design (brainstorm output). Implementation plan follows separately via the writing-plans workflow.

This design adds Audit as the next entry in the component-normalization program, following the exact arc already used for Auth (ZB.MOM.WW.Auth), UI-Theme (ZB.MOM.WW.Theme), and the in-flight Health/Telemetry pair: normalize the concern under components/, then build a thin, tested, packed shared library in this repo. It is the #3-ranked candidate in upcoming.md (Audit — "ties back to Auth").

Scope decisions (locked during brainstorm)

  1. Audit only — logging is out of scope. The parallel health/observability session already owns structured logging: its ZB.MOM.WW.Telemetry.Serilog package holds the shared Serilog bootstrap, SiteId/NodeRole/Host enrichers, trace_id/span_id correlation, an ILogRedactor seam, OTel log export, and the MxGateway MEL→Serilog migration. That is the upcoming.md Tier-2 "Logging" candidate. This session does not create a second Serilog owner.
  2. Deliverable depth = docs + a thin built library. Matches the house arc (Auth/Theme/Health/Telemetry were all docs first, then a tested + packed lib) and components/README.md's "extract only what is genuinely common." No sister-repo adoption this round; adoption is deferred to GAPS.md, exactly where Auth/Theme/Health sit today.
  3. Canonical record shape = required core + optional common + JSON extension bag. No project's domain enums leak into the shared type; Actor stays a plain string (no hard dependency on ZB.MOM.WW.Auth).
  4. Redaction seam = aligned-but-independent. Audit defines its own IAuditRedactor (over AuditEvent), shaped + named to mirror Telemetry's ILogRedactor so a future ZB.MOM.WW.Hosting aggregator wires both with one mental model — but no cross-package dependency; audit stays thin.
  5. Packaging = single package ZB.MOM.WW.Audit. The shared core (record + seams + tiny helpers) has zero heavy dependencies — Akka/EF/SQLite/Serilog are per-project transport, which stays per-project. Auth split into 4 / Health into 3 only because they had heavy, independently-optional impls; audit has none. A future heavy shared sink (EF/Akka) would become an opt-in satellite then — YAGNI now.

The unifying hinge — audit closes the loop on Auth

Every audit row's Actor is the who — which is precisely the identity the ZB.MOM.WW.Auth component already normalizes (LDAP/GLAuth principal, API-key name). Audit is the write-side counterpart of Auth's read-side identity: Auth answers "who is this and what may they do," audit records "who did what." The spec ties them by stating Actor SHOULD be the ZB.MOM.WW.Auth principal at adoption time (kept as a string in the contract so the library carries no dependency on Auth).

Repo layout

```text
scadaproj/
├─ components/
│   └─ audit/                          NEW normalization component (docs)
│       ├─ README.md                   overview + per-project status table (links into the docs below)
│       ├─ spec/SPEC.md                the ONE normalized target (Section 0: normalized vs left-per-project)
│       ├─ spec/EVENT-MODEL.md         canonical record + Outcome + per-project mapping reference
│       │                               (mirrors auth CANONICAL-ROLES.md / theme DESIGN-TOKENS.md)
│       ├─ shared-contract/ZB.MOM.WW.Audit.md   proposed public API on paper
│       ├─ current-state/{otopcua,mxaccessgw,scadabridge}/CURRENT-STATE.md   code-verified, file:line
│       └─ GAPS.md                     per-project deltas + adoption/extraction backlog
├─ ZB.MOM.WW.Audit/                    NEW built library (nested git repo, .NET 10) → 1 nupkg @ 0.1.0
└─ docs/plans/
    ├─ 2026-06-01-audit-component-design.md            (this design)
    └─ 2026-06-01-zb-mom-ww-audit-shared-library.md    (impl plan, from writing-plans)
```

Index updates (same discipline as prior components): add the audit/ row to components/README.md, the Component-normalization table in CLAUDE.md, and check off Audit (#3) in upcoming.md.

Code-verified current state (2026-06-01 scan)

| | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| Record | AuditEvent (8 fields) | ApiKeyAuditRecord (6 fields) | AuditEvent (~25 fields) |
| Writer seam | Akka tell → singleton | IApiKeyAuditStore.AppendAsync | IAuditWriter.WriteAsync (best-effort) |
| Redaction seam | none | scrubs in store | IAuditPayloadFilter.Apply (truncate + redact, never throws) |
| Transport | Akka cluster broadcast → AuditWriterActor (batch 500 / 5s, 2-layer dedup) | SQLite append + list-recent | Site SQLite hot-path → Central MS SQL ingest/reconcile/purge/partition-maintenance + hash-chain verify |
| Storage | ConfigAuditLog EF entity (filtered-unique EventId index) | SQLite table | partitioned SQL Server datetime2, EF + migrations |
| Domain vocab | EventType strings (DraftCreated/Published/OpcUaAccessDenied/…) | API-key event types | Channel/Kind/Status/ForwardState enums |
| Scope | config writes + authz checks | API-key auth events/denials only | full who-did-what across site + central (CLI + UI + export) |

Key finding: ScadaBridge is already at the target — it has IAuditWriter (best-effort, "failures must NEVER abort the user-facing action") + IAuditPayloadFilter (pure, never-throws, over-redacts on failure) with contracts near-identical to what we extract. The library essentially lifts ScadaBridge's seams, renaming the filter to the ILogRedactor-aligned IAuditRedactor. OtOpcUa's fire-and-forget Akka tell is morally the same best-effort writer; MxGateway's IApiKeyAuditStore is a specialized, narrower writer. The genuinely common core is the who/what/when/outcome/target/correlation + details record plus those two seams; transport and storage diverge wildly and stay per-project.

Key refs:

  • OtOpcUa src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs; src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs; src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs.
  • MxGateway src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/{IApiKeyAuditStore,ApiKeyAuditRecord,ApiKeyAuditEntry,SqliteApiKeyAuditStore}.cs.
  • ScadaBridge src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditEvent.cs; src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/IAuditWriter.cs; src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs (+ the whole ZB.MOM.WW.ScadaBridge.AuditLog/ Site+Central pipeline).

Library design — ZB.MOM.WW.Audit (1 package, BCL-only)

Canonical record + outcome

```csharp
using System;

namespace ZB.MOM.WW.Audit;

public sealed record AuditEvent
{
    // REQUIRED core: who / what / when / outcome
    public required Guid           EventId       { get; init; }  // idempotency key
    public required DateTimeOffset OccurredAtUtc { get; init; }  // UTC (see note)
    public required string         Actor         { get; init; }  // who; SHOULD be the ZB.MOM.WW.Auth principal at adoption
    public required string         Action        { get; init; }  // what: verb/event-type string
    public required AuditOutcome   Outcome       { get; init; }  // Success | Failure | Denied

    // OPTIONAL common
    public string? Category      { get; init; }  // subsystem/grouping
    public string? Target        { get; init; }  // on what (resource/method/connection)
    public string? SourceNode    { get; init; }  // emitting node
    public Guid?   CorrelationId { get; init; }  // join to originating request/workflow

    // EXTENSION: everything project-specific, as JSON
    public string? DetailsJson   { get; init; }
}

public enum AuditOutcome { Success, Failure, Denied }
```

Timestamp choice: DateTimeOffset, because it carries its offset and therefore pins the instant unambiguously (MxGateway already uses it); the OccurredAtUtc name additionally fixes that offset at zero. ScadaBridge/OtOpcUa store UTC-forced DateTime and convert at their mapping boundary. (Swappable to UTC DateTime if the team prefers to match the storage majority; flagged as the one open detail.)
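A minimal sketch of that boundary conversion for the DateTime-storing projects. AuditTime and its two methods are hypothetical names, not part of the proposed API. The sketch assumes the stored values really are UTC, and it guards against the common EF Core case where a datetime2 value comes back with Kind = Unspecified, which new DateTimeOffset(DateTime) would otherwise interpret as local time:

```csharp
using System;

// Hypothetical helper (not part of the proposed ZB.MOM.WW.Audit API): converts between a
// storage-side UTC DateTime and the canonical DateTimeOffset at a project's mapping boundary.
public static class AuditTime
{
    // Assumes the stored value holds UTC. new DateTimeOffset(dt) treats Kind = Unspecified
    // as *local* time, so the kind is forced to Utc before conversion.
    public static DateTimeOffset FromStoredUtc(DateTime stored)
    {
        DateTime utc = stored.Kind == DateTimeKind.Local
            ? stored.ToUniversalTime()                          // genuinely local: convert
            : DateTime.SpecifyKind(stored, DateTimeKind.Utc);   // Unspecified/Utc: relabel only
        return new DateTimeOffset(utc);                         // Offset == TimeSpan.Zero
    }

    // Canonical -> storage: normalize any offset to a UTC DateTime (Kind = Utc).
    public static DateTime ToStoredUtc(DateTimeOffset occurredAt) => occurredAt.UtcDateTime;
}
```

Relabelling with SpecifyKind rather than converting is deliberate: the ticks already represent UTC, so only the Kind metadata is wrong.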

Why Outcome is in the required core: denials/failures are genuinely common — OtOpcUa OpcUaAccessDenied, MxGateway API-key denials, ScadaBridge InboundAuthFailure + AuditStatus. A 3-value Success | Failure | Denied enum normalizes them without importing any app's full taxonomy.
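To illustrate how the 3-value enum absorbs a project's vocabulary, here is a sketch of OtOpcUa's "derive from action" rule. The suffix convention (a Denied suffix maps to Denied, a Failed suffix to Failure, anything else to Success) is an assumption for illustration; only OpcUaAccessDenied, DraftCreated and Published come from the scan, and OtOpcUaOutcome is a hypothetical name:

```csharp
using System;

public enum AuditOutcome { Success, Failure, Denied }

// Hypothetical mapping (not the proposed API): derive the canonical outcome from an
// OtOpcUa-style EventType string. The suffix rule is an assumed convention; an adopting
// project would encode its real vocabulary here instead.
public static class OtOpcUaOutcome
{
    public static AuditOutcome FromEventType(string eventType)
    {
        if (eventType.EndsWith("Denied", StringComparison.Ordinal)) return AuditOutcome.Denied;  // e.g. OpcUaAccessDenied
        if (eventType.EndsWith("Failed", StringComparison.Ordinal)) return AuditOutcome.Failure;
        return AuditOutcome.Success;                                                             // e.g. DraftCreated, Published
    }
}
```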

Two seams

```csharp
using System.Threading;
using System.Threading.Tasks;

// Lifts ScadaBridge's IAuditWriter: best-effort, MUST swallow internal failures, NEVER throw to caller.
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

// Mirrors Telemetry's ILogRedactor shape (aligned-but-independent). Pure function; MUST NOT throw
// (over-redact on internal failure). Generalizes ScadaBridge's IAuditPayloadFilter.
public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}
```
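A minimal sketch of how the TruncatingAuditRedactor helper could satisfy the IAuditRedactor contract (pure, never throws, over-redacts on internal failure). The marker strings, the constructor shape and the choice to blank both free-text fields on failure are assumptions; AuditEvent, AuditOutcome and IAuditRedactor are repeated from the design so the block compiles on its own:

```csharp
using System;

public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Sketch only: "[truncated]" / "[redacted]" are hypothetical marker choices.
public sealed class TruncatingAuditRedactor : IAuditRedactor
{
    public const string Marker = "[truncated]";
    public const string Redacted = "[redacted]";
    private readonly int _maxLength;

    // Validation happens at configuration time, never inside Apply.
    public TruncatingAuditRedactor(int maxLength) =>
        _maxLength = maxLength > 0 ? maxLength : throw new ArgumentOutOfRangeException(nameof(maxLength));

    // rawEvent is assumed non-null, per the nullable annotation on the seam.
    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            // `with` copies the record; only the capped fields change, the input is not mutated.
            return rawEvent with { DetailsJson = Cap(rawEvent.DetailsJson), Target = Cap(rawEvent.Target) };
        }
        catch
        {
            // Over-redact instead of throwing or leaking: blank both free-text fields.
            return rawEvent with { DetailsJson = Redacted, Target = Redacted };
        }
    }

    private string? Cap(string? value)
    {
        if (value is null || value.Length <= _maxLength) return value;
        int cut = _maxLength;
        if (char.IsHighSurrogate(value[cut - 1])) cut--;   // avoid splitting a surrogate pair
        return value.Substring(0, cut) + Marker;
    }
}
```

A plain cut leaves an over-long DetailsJson as non-parseable JSON, so an adopting project may prefer to wrap the truncated prefix in a small JSON envelope instead; either way the record's core fields pass through untouched.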

Thin shipped helpers (the only concrete types)

  • NullAuditRedactor — identity; the default when no policy is configured.
  • TruncatingAuditRedactor — caps DetailsJson/Target to a configured max + sets a truncation marker; never throws (over-redacts on failure). Generalizes ScadaBridge's truncation half. The secret-field policy (which fields/commands are sensitive) stays per-project via composition.
  • NoOpAuditWriter — discards (tests / disabled audit).
  • CompositeAuditWriter — fans out to N writers; one writer throwing does not stop the others (holds the best-effort contract).
  • RedactingAuditWriter — decorator: Apply the redactor, then delegate to an inner IAuditWriter. Generalizes ScadaBridge's "filter between event construction and the writer chain."
  • services.AddZbAudit(...) — DI extension wiring redactor + decorator; Null/NoOp by default.
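A minimal sketch of NullAuditRedactor and the three writer helpers. Sequential fan-out and a "drop rather than leak" policy when a redactor breaks its own never-throw contract are illustrative choices, not settled API; the record and both seams are repeated so the block compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct = default); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Identity: the default when no redaction policy is configured.
public sealed class NullAuditRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent rawEvent) => rawEvent;
}

// Discards every event (tests / audit disabled).
public sealed class NoOpAuditWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => Task.CompletedTask;
}

// Fans out to every inner writer in order; a throwing writer is isolated so the rest still run.
public sealed class CompositeAuditWriter : IAuditWriter
{
    private readonly IReadOnlyList<IAuditWriter> _writers;
    public CompositeAuditWriter(IEnumerable<IAuditWriter> writers) => _writers = new List<IAuditWriter>(writers);

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (var writer in _writers)
        {
            // The call sits inside try, so synchronous throws are caught as well as faulted tasks.
            try { await writer.WriteAsync(evt, ct).ConfigureAwait(false); }
            catch { /* best-effort: swallow and continue with the next writer */ }
        }
    }
}

// Decorator: redact first, then delegate, so inner writers only ever see the redacted event.
public sealed class RedactingAuditWriter : IAuditWriter
{
    private readonly IAuditRedactor _redactor;
    private readonly IAuditWriter _inner;
    public RedactingAuditWriter(IAuditRedactor redactor, IAuditWriter inner) => (_redactor, _inner) = (redactor, inner);

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        // If the redactor itself throws, the event is dropped rather than written raw.
        try { await _inner.WriteAsync(_redactor.Apply(evt), ct).ConfigureAwait(false); }
        catch { /* never throw to the caller */ }
    }
}
```

Composed as new RedactingAuditWriter(redactor, new CompositeAuditWriter(sinks)), a single redaction pass runs before fan-out, so no sink ever receives the raw event.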

How each repo maps onto the canonical record

| Canonical | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| EventId / OccurredAtUtc | EventId / OccurredAtUtc | new Guid / CreatedUtc | EventId / OccurredAtUtc |
| Actor / Action | Actor / Action | KeyId / EventType | Actor / Kind(+Channel)→str |
| Outcome | derive from action | denial → Denied | Status → Outcome |
| Category / Target / SourceNode | Category / (none) / SourceNode | "ApiKey" / (none) / RemoteAddress | Channel / Target / SourceNode |
| CorrelationId / DetailsJson | CorrelationId / DetailsJson | (none) / Details | CorrelationId / Request+Response+Error+Extra → JSON |
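As a worked example of the MxGateway column, here is a mapping sketch onto the canonical record. ApiKeyAuditRecordStandIn is a hypothetical stand-in whose field names follow the table (the real ApiKeyAuditRecord is not reproduced), and two rules are assumptions: an EventType containing "Denied" marks a denial, and Details is free text that must be wrapped to stay valid JSON:

```csharp
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

// Hypothetical stand-in for MxGateway's ApiKeyAuditRecord (field names taken from the mapping table only).
public sealed record ApiKeyAuditRecordStandIn(
    string KeyId, string EventType, DateTimeOffset CreatedUtc, string? RemoteAddress, string? Details);

public static class MxGatewayAuditMapping
{
    public static AuditEvent ToCanonical(ApiKeyAuditRecordStandIn rec) => new()
    {
        EventId       = Guid.NewGuid(),                      // source has no id: minted here
        OccurredAtUtc = rec.CreatedUtc.ToUniversalTime(),    // normalize to offset zero
        Actor         = rec.KeyId,                           // API-key name as the "who"
        Action        = rec.EventType,
        // Assumed rule: an event type naming a denial maps to Denied, anything else to Success.
        Outcome       = rec.EventType.Contains("Denied", StringComparison.Ordinal) ? AuditOutcome.Denied : AuditOutcome.Success,
        Category      = "ApiKey",
        SourceNode    = rec.RemoteAddress,
        // Assumes Details is free text; wrapping keeps DetailsJson valid JSON.
        DetailsJson   = rec.Details is null ? null : JsonSerializer.Serialize(new { details = rec.Details }),
    };
}
```

Because the source row carries no id, the sketch mints EventId at mapping time; mapping the same row twice would yield two distinct events, so this conversion belongs at the single point where the row is written, not in a replay path.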

Stays per-project (explicitly NOT in the library)

  • Transport/storage: OtOpcUa Akka broadcast + AuditWriterActor + ConfigAuditLog; ScadaBridge Site-SQLite hot-path + Central MS-SQL ingest/reconcile/purge/partition-maintenance + hash-chain; MxGateway SQLite IApiKeyAuditStore.
  • Domain vocab: Channel/Kind/Status/ForwardState enums, OtOpcUa EventType strings — these map into Action/Category/DetailsJson.
  • Query / CLI / UI / export surfaces.
  • Each app's redaction policy (which fields are secret) — only the seam is shared.

Normalization component docs

The component follows components/README.md's six-part layout (matching auth + ui-theme). spec/SPEC.md opens with a Section 0 that states explicitly what is normalized vs. left per-project. spec/EVENT-MODEL.md is the reference doc (canonical record + Outcome + the mapping table), mirroring auth's CANONICAL-ROLES.md / theme's DESIGN-TOKENS.md. The three current-state/<project>/CURRENT-STATE.md files are written at full code-verified depth, each ending in an Adoption plan. GAPS.md turns the deltas into a prioritized backlog. The component registers at status Draft (Draft → Reviewed → Adopting → Converged).

Testing & verification

Every type ships with tests (cf. auth's 172 and theme's 32; a thin lib should land at ~40–60; run dotnet test from the library root):

  • Record/enum — required fields enforced; value-equality; OccurredAtUtc round-trips as UTC; DetailsJson passthrough; AuditOutcome values.
  • NullAuditRedactor — identity (input returned unchanged).
  • TruncatingAuditRedactor — caps DetailsJson/Target + sets truncation marker; never throws on malformed input (over-redacts on internal failure — the seam's hard contract).
  • NoOpAuditWriter — discards, completes.
  • CompositeAuditWriter — fans out to all inner writers; one writer throwing does not stop the rest.
  • RedactingAuditWriter — passes the redacted (not raw) event to the inner writer; never throws to caller.
  • AddZbAudit: resolves IAuditWriter/IAuditRedactor with NoOp/Null defaults and wires the redactor + decorator composition.

Verification gates (evidence, not assertions): dotnet test green + dotnet pack → 1 nupkg @ 0.1.0.

GAPS.md / adoption backlog (deferred — adoption lives here)

  • Per-project divergences vs SPEC.md; each current-state ends in an Adoption plan.
  • ScadaBridge: already at the target; adoption is "align, don't replace" (rename IAuditPayloadFilter → IAuditRedactor; its IAuditWriter already matches). Large surface, but mostly naming/contract alignment with no behaviour change. Risk: high blast radius, so low priority.
  • MxGateway: map IApiKeyAuditStore/ApiKeyAuditRecord → IAuditWriter/AuditEvent. Low effort, but coordinate, since the parallel session is already editing this repo (MEL→Serilog).
  • OtOpcUa: AuditEvent → canonical record; AuditWriterActor : IAuditWriter; ConfigAuditLog mapping. Medium effort.
  • Cross-cutting items: Outcome normalization across all three; Actor = ZB.MOM.WW.Auth principal (closes the loop on Auth); IAuditRedactor naming aligned with Telemetry's ILogRedactor.

Build order

1. components/audit/ docs   (spec first — drives the API):
     current-state ×3 → SPEC + EVENT-MODEL → shared-contract → GAPS
2. ZB.MOM.WW.Audit library  (record + enum + 2 seams + helpers + AddZbAudit), tests, pack @ 0.1.0
3. Index/registry updates   (components/README, CLAUDE.md, upcoming.md #3) + GAPS cross-check
   (no adoption step — deferred to GAPS)

Implementation tasks (native task IDs)

  • #7 Author components/audit/ normalization docs (current-state ×3 + SPEC + EVENT-MODEL + shared-contract + GAPS)
  • #8 Build ZB.MOM.WW.Audit library (1 package: record + enum + 2 seams + helpers + AddZbAudit; tests; dotnet pack @ 0.1.0) — blocked by #7 (spec drives the API)
  • #9 Index/registry updates (components/README.md, CLAUDE.md, upcoming.md #3) + GAPS cross-check — blocked by #8

Adoption into the three apps is intentionally not a task here — it is the GAPS.md follow-on, identical to where Auth/Theme/Health adoption sits today.