docs: design for audit normalization component + ZB.MOM.WW.Audit

Joseph Doherty
2026-06-01 06:32:39 -04:00
parent 3d25ee5090
commit 16540b3001
@@ -0,0 +1,246 @@
# Design — Audit normalization component + `ZB.MOM.WW.Audit` shared library
Date: 2026-06-01
Status: **Approved design** (brainstorm output). Implementation plan follows separately
via the writing-plans workflow.
This design adds **Audit** as the next entry in the [component-normalization](../../components/README.md)
program, following the exact arc already used for **Auth** (`ZB.MOM.WW.Auth`), **UI-Theme**
(`ZB.MOM.WW.Theme`), and the in-flight **Health/Telemetry** pair: normalize the concern under
`components/`, then build a thin, tested, packed shared library in this repo. It is the #3-ranked
candidate in [`upcoming.md`](../../upcoming.md) (Audit — "ties back to Auth").
## Scope decisions (locked during brainstorm)
1. **Audit only — logging is out of scope.** The parallel health/observability session already owns
structured logging: its `ZB.MOM.WW.Telemetry.Serilog` package holds the shared Serilog bootstrap,
`SiteId`/`NodeRole`/`Host` enrichers, `trace_id`/`span_id` correlation, an `ILogRedactor` seam, OTel
log export, **and** the MxGateway MEL→Serilog migration. That is the `upcoming.md` Tier-2 "Logging"
candidate. This session does **not** create a second Serilog owner.
2. **Deliverable depth = docs + a thin built library.** Matches the house arc (Auth/Theme/Health/
Telemetry were all docs *then* a tested + packed lib) and `components/README.md`'s "extract only what
is genuinely common." **No sister-repo adoption this round** — adoption is deferred to `GAPS.md`,
exactly where Auth/Theme/Health sit today.
3. **Canonical record shape = required core + optional common + JSON extension bag.** No project's
domain enums leak into the shared type; `Actor` stays a plain string (no hard dependency on
`ZB.MOM.WW.Auth`).
4. **Redaction seam = aligned-but-independent.** Audit defines its own `IAuditRedactor` (over
`AuditEvent`), shaped + named to mirror Telemetry's `ILogRedactor` so a future `ZB.MOM.WW.Hosting`
aggregator wires both with one mental model — but **no cross-package dependency**; audit stays thin.
5. **Packaging = single package `ZB.MOM.WW.Audit`.** The shared core (record + seams + tiny helpers)
has **zero heavy dependencies** — Akka/EF/SQLite/Serilog are per-project *transport*, which stays
per-project. Auth split into 4 / Health into 3 only because they had heavy, independently-optional
impls; audit has none. A future heavy shared sink (EF/Akka) would become an opt-in satellite then —
YAGNI now.
## The unifying hinge — audit closes the loop on Auth
Every audit row's `Actor` is the *who* — which is precisely the identity the `ZB.MOM.WW.Auth` component
already normalizes (LDAP/GLAuth principal, API-key name). Audit is the write-side counterpart of Auth's
read-side identity: Auth answers "who is this and what may they do," audit records "who did what." The
spec ties them by stating `Actor` SHOULD be the `ZB.MOM.WW.Auth` principal at adoption time (kept as a
string in the contract so the library carries no dependency on Auth).
## Repo layout
```
scadaproj/
├─ components/
│ └─ audit/ NEW normalization component (docs)
│ ├─ README.md overview + per-project status table (links into the docs below)
│ ├─ spec/SPEC.md the ONE normalized target (Section 0: normalized vs left-per-project)
│ ├─ spec/EVENT-MODEL.md canonical record + Outcome + per-project mapping reference
│ │ (mirrors auth CANONICAL-ROLES.md / theme DESIGN-TOKENS.md)
│ ├─ shared-contract/ZB.MOM.WW.Audit.md proposed public API on paper
│ ├─ current-state/{otopcua,mxaccessgw,scadabridge}/CURRENT-STATE.md code-verified, file:line
│ └─ GAPS.md per-project deltas + adoption/extraction backlog
├─ ZB.MOM.WW.Audit/ NEW built library (nested git repo, .NET 10) → 1 nupkg @ 0.1.0
└─ docs/plans/
├─ 2026-06-01-audit-component-design.md (this design)
└─ 2026-06-01-zb-mom-ww-audit-shared-library.md (impl plan — from writing-plans)
```
Index updates (same discipline as prior components): add the `audit/` row to `components/README.md`,
the Component-normalization table in [`CLAUDE.md`](../../CLAUDE.md), and check off **Audit (#3)** in
[`upcoming.md`](../../upcoming.md).
## Code-verified current state (2026-06-01 scan)
| Aspect | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| Record | `AuditEvent` (8 fields) | `ApiKeyAuditRecord` (6 fields) | `AuditEvent` (~25 fields) |
| Writer seam | Akka `tell` → singleton | `IApiKeyAuditStore.AppendAsync` | **`IAuditWriter.WriteAsync`** (best-effort) |
| Redaction seam | none | scrubs in store | **`IAuditPayloadFilter.Apply`** (truncate + redact, never throws) |
| Transport | Akka cluster broadcast → `AuditWriterActor` (batch 500 / 5s, 2-layer dedup) | SQLite append + list-recent | Site SQLite hot-path → Central MS SQL ingest/reconcile/purge/partition-maintenance + hash-chain verify |
| Storage | `ConfigAuditLog` EF entity (filtered-unique `EventId` index) | SQLite table | partitioned SQL Server `datetime2`, EF + migrations |
| Domain vocab | `EventType` strings (DraftCreated/Published/OpcUaAccessDenied/…) | API-key event types | `Channel`/`Kind`/`Status`/`ForwardState` enums |
| Scope | config writes + authz checks | API-key auth events/denials only | full who-did-what across site + central (CLI + UI + export) |
**Key finding:** ScadaBridge is **already at the target** — it has `IAuditWriter` (best-effort, "failures
must NEVER abort the user-facing action") + `IAuditPayloadFilter` (pure, never-throws, over-redacts on
failure) with contracts near-identical to what we extract. The library essentially *lifts ScadaBridge's
seams*, renaming the filter to the `ILogRedactor`-aligned `IAuditRedactor`. OtOpcUa's fire-and-forget
Akka `tell` is morally the same best-effort writer; MxGateway's `IApiKeyAuditStore` is a specialized,
narrower writer. The genuinely common core is the *who/what/when/outcome/target/correlation + details*
record plus those two seams; **transport and storage diverge wildly and stay per-project.**
Key refs:
- OtOpcUa `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs`;
`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs`;
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs`.
- MxGateway `src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/{IApiKeyAuditStore,ApiKeyAuditRecord,ApiKeyAuditEntry,SqliteApiKeyAuditStore}.cs`.
- ScadaBridge `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditEvent.cs`;
`src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/IAuditWriter.cs`;
`src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs` (+ the whole
`ZB.MOM.WW.ScadaBridge.AuditLog/` Site+Central pipeline).
## Library design — `ZB.MOM.WW.Audit` (1 package, BCL-only)
### Canonical record + outcome
```csharp
namespace ZB.MOM.WW.Audit;
public sealed record AuditEvent
{
// REQUIRED core — who / what / when / outcome
public required Guid EventId { get; init; } // idempotency key
public required DateTimeOffset OccurredAtUtc { get; init; } // UTC (see note)
public required string Actor { get; init; } // who — = ZB.MOM.WW.Auth principal at adoption
public required string Action { get; init; } // what — verb/event-type string
public required AuditOutcome Outcome { get; init; } // Success | Failure | Denied
// OPTIONAL common
public string? Category { get; init; } // subsystem/grouping
public string? Target { get; init; } // on-what (resource/method/connection)
public string? SourceNode { get; init; } // emitting node
public Guid? CorrelationId { get; init; } // join to originating request/workflow
// EXTENSION — everything project-specific, as JSON
public string? DetailsJson { get; init; }
}
public enum AuditOutcome { Success, Failure, Denied }
```
**Timestamp choice:** `DateTimeOffset` — unambiguous UTC (MxGateway already uses it). ScadaBridge/OtOpcUa
store UTC-forced `DateTime` and convert at their mapping boundary. (Swappable to UTC `DateTime` if the
team prefers to match the storage majority; flagged as the one open detail.)
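
The mapping-boundary conversion mentioned above could look roughly like the sketch below. `AuditTimestamps` and both method names are hypothetical and not part of the proposed API; the sketch assumes the stored `DateTime` already holds a UTC wall-clock value, whatever its `Kind` flag says.

```csharp
using System;

// Hypothetical helper, NOT part of the proposed ZB.MOM.WW.Audit API.
public static class AuditTimestamps
{
    // Assumption: the stored value is already UTC ("UTC-forced"), even if Kind is Unspecified.
    public static DateTimeOffset ToAuditTimestamp(DateTime storedUtc)
    {
        // SpecifyKind relabels the value without shifting it, so the
        // resulting DateTimeOffset carries a zero offset.
        var utc = DateTime.SpecifyKind(storedUtc, DateTimeKind.Utc);
        return new DateTimeOffset(utc);
    }

    // Reverse direction for projects that keep DateTime columns.
    public static DateTime ToStoredUtc(DateTimeOffset occurredAtUtc) => occurredAtUtc.UtcDateTime;
}
```

Relabelling with `SpecifyKind` matters because constructing a `DateTimeOffset` from an `Unspecified` or `Local` `DateTime` applies the machine's local offset instead of zero.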
**Why `Outcome` is in the required core:** denials/failures are genuinely common — OtOpcUa
`OpcUaAccessDenied`, MxGateway API-key denials, ScadaBridge `InboundAuthFailure` + `AuditStatus`. A
3-value `Success | Failure | Denied` enum normalizes them without importing any app's full taxonomy.
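
As an illustration of the per-project side of this normalization, a project could derive the outcome from its own action strings. The suffix rules and the `OutcomeMapping` name below are purely hypothetical; each project would own its real mapping.

```csharp
using System;

public enum AuditOutcome { Success, Failure, Denied }

// Hypothetical per-project mapper; the suffix rules are illustrative only.
public static class OutcomeMapping
{
    public static AuditOutcome FromAction(string action)
    {
        ArgumentNullException.ThrowIfNull(action);

        // e.g. "OpcUaAccessDenied" ends in "Denied".
        if (action.EndsWith("Denied", StringComparison.Ordinal))
            return AuditOutcome.Denied;

        if (action.EndsWith("Failed", StringComparison.Ordinal) ||
            action.EndsWith("Failure", StringComparison.Ordinal))
            return AuditOutcome.Failure;

        // Everything else is treated as a successful action.
        return AuditOutcome.Success;
    }
}
```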
### Two seams
```csharp
// Lifts ScadaBridge's IAuditWriter: best-effort, MUST swallow internal failures, NEVER throw to caller.
public interface IAuditWriter
{
Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}
// Mirrors Telemetry's ILogRedactor shape (aligned-but-independent). Pure function; MUST NOT throw
// (over-redact on internal failure). Generalizes ScadaBridge's IAuditPayloadFilter.
public interface IAuditRedactor
{
AuditEvent Apply(AuditEvent rawEvent);
}
```
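
To make the writer contract concrete, here is a minimal sketch of an adapter that enforces "best-effort, never throw" around any per-project sink. `BestEffortAuditWriter` and its `onError` hook are hypothetical names and are not among the helpers this design ships; the record is a trimmed copy so the sketch compiles on its own.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public enum AuditOutcome { Success, Failure, Denied }

// Trimmed copy of the canonical record so the sketch compiles on its own.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

// Hypothetical adapter: wraps a per-project sink and enforces the best-effort contract.
public sealed class BestEffortAuditWriter : IAuditWriter
{
    private readonly Func<AuditEvent, CancellationToken, Task> _sink;
    private readonly Action<Exception>? _onError;

    public BestEffortAuditWriter(
        Func<AuditEvent, CancellationToken, Task> sink,
        Action<Exception>? onError = null)
    {
        _sink = sink ?? throw new ArgumentNullException(nameof(sink));
        _onError = onError;
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try
        {
            await _sink(evt, ct).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Swallow everything, cancellation included: an audit failure
            // must never abort the user-facing action.
            try { _onError?.Invoke(ex); } catch { /* the hook must not throw either */ }
        }
    }
}
```

The error hook is there so a project can count or log dropped events without turning the failure back into an exception for the caller.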
### Thin shipped helpers (the only concrete types)
- `NullAuditRedactor` — identity; the default when no policy is configured.
- `TruncatingAuditRedactor` — caps `DetailsJson`/`Target` to a configured max + sets a truncation
marker; never throws (over-redacts on failure). Generalizes ScadaBridge's truncation half. The
secret-field *policy* (which fields/commands are sensitive) stays per-project via composition.
- `NoOpAuditWriter` — discards (tests / disabled audit).
- `CompositeAuditWriter` — fans out to N writers; one writer throwing does not stop the others
(holds the best-effort contract).
- `RedactingAuditWriter` — decorator: `Apply` the redactor, then delegate to an inner `IAuditWriter`.
Generalizes ScadaBridge's "filter between event construction and the writer chain."
- `services.AddZbAudit(...)` — DI extension wiring redactor + decorator; `Null`/`NoOp` by default.
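
One possible shape for the two composition helpers above is sketched here. It is a guess at the behaviour described, not the shipped implementation: sequential fan-out is an assumption, and `RecordingAuditWriter`, `ThrowingAuditWriter`, and `DropDetailsRedactor` are hypothetical test doubles.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public enum AuditOutcome { Success, Failure, Denied }

// Trimmed copy of the canonical record so the sketch compiles on its own.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct = default); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Fans out sequentially (an assumption; parallel fan-out would also fit the contract).
public sealed class CompositeAuditWriter : IAuditWriter
{
    private readonly IAuditWriter[] _writers;
    public CompositeAuditWriter(IEnumerable<IAuditWriter> writers) => _writers = writers.ToArray();

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (var writer in _writers)
        {
            // Isolate each writer so one failure cannot starve the rest.
            try { await writer.WriteAsync(evt, ct).ConfigureAwait(false); }
            catch (Exception) { /* best-effort: swallow and continue */ }
        }
    }
}

public sealed class RedactingAuditWriter : IAuditWriter
{
    private readonly IAuditRedactor _redactor;
    private readonly IAuditWriter _inner;

    public RedactingAuditWriter(IAuditRedactor redactor, IAuditWriter inner)
    {
        _redactor = redactor;
        _inner = inner;
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try
        {
            // Only the redacted copy ever reaches the inner writer.
            await _inner.WriteAsync(_redactor.Apply(evt), ct).ConfigureAwait(false);
        }
        catch (Exception) { /* drop the event rather than forward the raw one */ }
    }
}

// Hypothetical test doubles, used only to exercise the sketch.
public sealed class RecordingAuditWriter : IAuditWriter
{
    public List<AuditEvent> Seen { get; } = new();
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        Seen.Add(evt);
        return Task.CompletedTask;
    }
}

public sealed class ThrowingAuditWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) =>
        throw new InvalidOperationException("sink down");
}

public sealed class DropDetailsRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent rawEvent) => rawEvent with { DetailsJson = null };
}
```

Wrapping a `CompositeAuditWriter` in a `RedactingAuditWriter` redacts once and then fans out, which keeps every sink on the same redacted view of the event.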
### How each repo maps onto the canonical record
| Canonical | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| `EventId` / `OccurredAtUtc` | `EventId` / `OccurredAtUtc` | newly generated `Guid` / `CreatedUtc` | `EventId` / `OccurredAtUtc` |
| `Actor` / `Action` | `Actor` / `Action` | `KeyId` / `EventType` | `Actor` / `Kind`(+`Channel`)→str |
| `Outcome` | derive from action | denial → `Denied` | `Status` → `Outcome` |
| `Category` / `Target` / `SourceNode` | `Category` / — / `SourceNode` | `"ApiKey"` / — / `RemoteAddress` | `Channel` / `Target` / `SourceNode` |
| `CorrelationId` / `DetailsJson` | `CorrelationId` / `DetailsJson` | — / `Details` | `CorrelationId` / Request+Response+Error+Extra → JSON |
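
The MxGateway column of the table can be read as the following mapping sketch. `ApiKeyAuditRecordStandIn` is a hypothetical stand-in: its property names come from the table, but the property types and the `IsDenial` flag are assumptions, and the real `ApiKeyAuditRecord` may differ.

```csharp
#nullable enable
using System;

public enum AuditOutcome { Success, Failure, Denied }

// Copy of the canonical record so the sketch compiles on its own.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

// Hypothetical stand-in for MxGateway's record; types and IsDenial are assumptions.
public sealed record ApiKeyAuditRecordStandIn(
    DateTimeOffset CreatedUtc,
    string KeyId,
    string EventType,
    string? RemoteAddress,
    string? Details,
    bool IsDenial);

public static class MxGatewayAuditMapping
{
    public static AuditEvent ToAuditEvent(ApiKeyAuditRecordStandIn source) => new()
    {
        EventId = Guid.NewGuid(),                       // no source id, so generate one
        OccurredAtUtc = source.CreatedUtc.ToUniversalTime(),
        Actor = source.KeyId,
        Action = source.EventType,
        Outcome = source.IsDenial ? AuditOutcome.Denied : AuditOutcome.Success,
        Category = "ApiKey",
        SourceNode = source.RemoteAddress,
        DetailsJson = source.Details,                   // assumes Details already holds JSON
        // Target and CorrelationId have no source field and stay null.
    };
}
```

Because the `EventId` is generated at mapping time, mapping the same source row twice yields two different ids; a project that needs idempotent re-mapping would have to derive or persist the id instead.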
### Stays per-project (explicitly NOT in the library)
- **Transport/storage:** OtOpcUa Akka broadcast + `AuditWriterActor` + `ConfigAuditLog`; ScadaBridge
Site-SQLite hot-path + Central MS-SQL ingest/reconcile/purge/partition-maintenance + hash-chain;
MxGateway SQLite `IApiKeyAuditStore`.
- **Domain vocab:** `Channel`/`Kind`/`Status`/`ForwardState` enums, OtOpcUa `EventType` strings — these
map into `Action`/`Category`/`DetailsJson`.
- **Query / CLI / UI / export** surfaces.
- Each app's redaction **policy** (which fields are secret) — only the *seam* is shared.
## Normalization component docs
Follows `components/README.md`'s six-part layout (matching auth + ui-theme). `spec/SPEC.md` opens with a
Section 0 stating normalized vs. left-per-project explicitly. `spec/EVENT-MODEL.md` is the reference doc
(canonical record + `Outcome` + the mapping table), mirroring auth's `CANONICAL-ROLES.md` / theme's
`DESIGN-TOKENS.md`. Three `current-state/<project>/CURRENT-STATE.md` at full code-verified depth, each
ending in an Adoption plan. `GAPS.md` turns deltas into a prioritized backlog. Registers at status
**Draft** (`Draft → Reviewed → Adopting → Converged`).
## Testing & verification
Every type ships tests (mirrors auth's 172 / theme's 32; a thin lib lands ~40–60; `dotnet test` from the
library root):
- **Record/enum** — required fields enforced; value-equality; `OccurredAtUtc` round-trips as UTC;
`DetailsJson` passthrough; `AuditOutcome` values.
- **`NullAuditRedactor`** — identity (input returned unchanged).
- **`TruncatingAuditRedactor`** — caps `DetailsJson`/`Target` + sets truncation marker; **never throws**
on malformed input (over-redacts on internal failure — the seam's hard contract).
- **`NoOpAuditWriter`** — discards, completes.
- **`CompositeAuditWriter`** — fans out to all inner writers; one writer throwing does not stop the rest.
- **`RedactingAuditWriter`** — passes the *redacted* (not raw) event to the inner writer; never throws to
caller.
- **`AddZbAudit`** — resolves `IAuditWriter`/`IAuditRedactor` with `NoOp`/`Null` defaults; decorator
composition wires.
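
For orientation, the `TruncatingAuditRedactor` behaviour listed above (cap, mark, never throw) could be satisfied by something like the sketch below. The marker text, the single `maxLength` constructor parameter, and the choice to null both fields on failure are all assumptions; the design only specifies "a truncation marker" and "over-redacts".

```csharp
#nullable enable
using System;

public enum AuditOutcome { Success, Failure, Denied }

// Trimmed copy of the canonical record so the sketch compiles on its own.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Target { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

public sealed class TruncatingAuditRedactor : IAuditRedactor
{
    // Assumed marker format; the design does not fix one.
    public const string Marker = "...[truncated]";
    private readonly int _maxLength;

    public TruncatingAuditRedactor(int maxLength)
    {
        // Configuration errors may throw at construction; Apply itself must not.
        if (maxLength < 0) throw new ArgumentOutOfRangeException(nameof(maxLength));
        _maxLength = maxLength;
    }

    // A null event is outside this sketch (the parameter is non-nullable).
    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            return rawEvent with
            {
                Target = Cap(rawEvent.Target),
                DetailsJson = Cap(rawEvent.DetailsJson),
            };
        }
        catch (Exception)
        {
            // Over-redact: drop both free-text fields rather than leak or throw.
            return rawEvent with { Target = null, DetailsJson = null };
        }
    }

    // Note: a capped DetailsJson is no longer guaranteed to be valid JSON.
    private string? Cap(string? value) =>
        value is null || value.Length <= _maxLength
            ? value
            : value.Substring(0, _maxLength) + Marker;
}
```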
**Verification gates (evidence, not assertions):** `dotnet test` green + `dotnet pack` → **1 nupkg @
0.1.0**.
## GAPS.md / adoption backlog (deferred — adoption lives here)
- Per-project divergences vs `SPEC.md`; each `current-state` ends in an Adoption plan.
- **ScadaBridge** — already at the target; adoption is "align, don't replace" (rename
`IAuditPayloadFilter` → `IAuditRedactor`; its `IAuditWriter` already matches). Large surface, but mostly
naming/contract alignment, no behaviour change. Risk: high blast radius, so low priority.
- **MxGateway** — map `IApiKeyAuditStore`/`ApiKeyAuditRecord` → `IAuditWriter`/`AuditEvent`. Low effort,
but **coordinate** — the parallel session is already editing this repo (MEL→Serilog).
- **OtOpcUa** — `AuditEvent` → canonical record; `AuditWriterActor : IAuditWriter`; `ConfigAuditLog`
mapping. Medium effort.
- Cross-cutting items: `Outcome` normalization across all three; `Actor` = `ZB.MOM.WW.Auth` principal
(closes the loop on Auth); `IAuditRedactor` naming aligned with Telemetry's `ILogRedactor`.
## Build order
```
1. components/audit/ docs (spec first — drives the API):
current-state ×3 → SPEC + EVENT-MODEL → shared-contract → GAPS
2. ZB.MOM.WW.Audit library (record + enum + 2 seams + helpers + AddZbAudit), tests, pack @ 0.1.0
3. Index/registry updates (components/README, CLAUDE.md, upcoming.md #3) + GAPS cross-check
(no adoption step — deferred to GAPS)
```
## Implementation tasks (native task IDs)
- #7 Author `components/audit/` normalization docs (current-state ×3 + SPEC + EVENT-MODEL +
shared-contract + GAPS)
- #8 Build `ZB.MOM.WW.Audit` library (1 package: record + enum + 2 seams + helpers + `AddZbAudit`;
tests; `dotnet pack` @ 0.1.0) — blocked by #7 (spec drives the API)
- #9 Index/registry updates (`components/README.md`, `CLAUDE.md`, `upcoming.md` #3) + GAPS cross-check —
blocked by #8
Adoption into the three apps is intentionally **not** a task here — it is the `GAPS.md` follow-on,
identical to where Auth/Theme/Health adoption sits today.