# Design — Audit normalization component + `ZB.MOM.WW.Audit` shared library
Date: 2026-06-01

Status: **Approved design** (brainstorm output). Implementation plan follows separately
via the writing-plans workflow.

This design adds **Audit** as the next entry in the [component-normalization](../../components/README.md)
program, following the exact arc already used for **Auth** (`ZB.MOM.WW.Auth`), **UI-Theme**
(`ZB.MOM.WW.Theme`), and the in-flight **Health/Telemetry** pair: normalize the concern under
`components/`, then build a thin, tested, packed shared library in this repo. It is the #3-ranked
candidate in [`upcoming.md`](../../upcoming.md) (Audit — "ties back to Auth").
## Scope decisions (locked during brainstorm)
1. **Audit only — logging is out of scope.** The parallel health/observability session already owns
structured logging: its `ZB.MOM.WW.Telemetry.Serilog` package holds the shared Serilog bootstrap,
`SiteId`/`NodeRole`/`Host` enrichers, `trace_id`/`span_id` correlation, an `ILogRedactor` seam, OTel
log export, **and** the MxGateway MEL→Serilog migration. That is the `upcoming.md` Tier-2 "Logging"
candidate. This session does **not** create a second Serilog owner.
2. **Deliverable depth = docs + a thin built library.** Matches the house arc (Auth/Theme/Health/
Telemetry were all docs *then* a tested + packed lib) and `components/README.md`'s "extract only what
is genuinely common." **No sister-repo adoption this round** — adoption is deferred to `GAPS.md`,
exactly where Auth/Theme/Health sit today.
3. **Canonical record shape = required core + optional common + JSON extension bag.** No project's
domain enums leak into the shared type; `Actor` stays a plain string (no hard dependency on
`ZB.MOM.WW.Auth`).
4. **Redaction seam = aligned-but-independent.** Audit defines its own `IAuditRedactor` (over
`AuditEvent`), shaped + named to mirror Telemetry's `ILogRedactor` so a future `ZB.MOM.WW.Hosting`
aggregator wires both with one mental model — but **no cross-package dependency**; audit stays thin.
5. **Packaging = single package `ZB.MOM.WW.Audit`.** The shared core (record + seams + tiny helpers)
has **zero heavy dependencies** — Akka/EF/SQLite/Serilog are per-project *transport*, which stays
per-project. Auth split into 4 / Health into 3 only because they had heavy, independently-optional
impls; audit has none. A future heavy shared sink (EF/Akka) would become an opt-in satellite then —
YAGNI now.
## The unifying hinge — audit closes the loop on Auth
Every audit row's `Actor` is the *who* — which is precisely the identity the `ZB.MOM.WW.Auth` component
already normalizes (LDAP/GLAuth principal, API-key name). Audit is the write-side counterpart of Auth's
read-side identity: Auth answers "who is this and what may they do," audit records "who did what." The
spec ties them by stating `Actor` SHOULD be the `ZB.MOM.WW.Auth` principal at adoption time (kept as a
string in the contract so the library carries no dependency on Auth).
## Repo layout
```
scadaproj/
├─ components/
│  └─ audit/                                 NEW normalization component (docs)
│     ├─ README.md                           overview + per-project status table (links into the docs below)
│     ├─ spec/SPEC.md                        the ONE normalized target (Section 0: normalized vs left-per-project)
│     ├─ spec/EVENT-MODEL.md                 canonical record + Outcome + per-project mapping reference
│     │                                      (mirrors auth CANONICAL-ROLES.md / theme DESIGN-TOKENS.md)
│     ├─ shared-contract/ZB.MOM.WW.Audit.md  proposed public API on paper
│     ├─ current-state/{otopcua,mxaccessgw,scadabridge}/CURRENT-STATE.md   code-verified, file:line
│     └─ GAPS.md                             per-project deltas + adoption/extraction backlog
├─ ZB.MOM.WW.Audit/                          NEW built library (nested git repo, .NET 10) → 1 nupkg @ 0.1.0
└─ docs/plans/
   ├─ 2026-06-01-audit-component-design.md           (this design)
   └─ 2026-06-01-zb-mom-ww-audit-shared-library.md   (impl plan, from writing-plans)
```
Index updates (same discipline as prior components): add the `audit/` row to `components/README.md`,
the Component-normalization table in [`CLAUDE.md`](../../CLAUDE.md), and check off **Audit (#3)** in
[`upcoming.md`](../../upcoming.md).
## Code-verified current state (2026-06-01 scan)
| | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| Record | `AuditEvent` (8 fields) | `ApiKeyAuditRecord` (6 fields) | `AuditEvent` (~25 fields) |
| Writer seam | Akka `tell` → singleton | `IApiKeyAuditStore.AppendAsync` | **`IAuditWriter.WriteAsync`** (best-effort) |
| Redaction seam | none | scrubs in store | **`IAuditPayloadFilter.Apply`** (truncate + redact, never throws) |
| Transport | Akka cluster broadcast → `AuditWriterActor` (batch 500 / 5s, 2-layer dedup) | SQLite append + list-recent | Site SQLite hot-path → Central MS SQL ingest/reconcile/purge/partition-maintenance + hash-chain verify |
| Storage | `ConfigAuditLog` EF entity (filtered-unique `EventId` index) | SQLite table | partitioned SQL Server `datetime2`, EF + migrations |
| Domain vocab | `EventType` strings (DraftCreated/Published/OpcUaAccessDenied/…) | API-key event types | `Channel`/`Kind`/`Status`/`ForwardState` enums |
| Scope | config writes + authz checks | API-key auth events/denials only | full who-did-what across site + central (CLI + UI + export) |

**Key finding:** ScadaBridge is **already at the target** — it has `IAuditWriter` (best-effort, "failures
must NEVER abort the user-facing action") + `IAuditPayloadFilter` (pure, never-throws, over-redacts on
failure) with contracts near-identical to what we extract. The library essentially *lifts ScadaBridge's
seams*, renaming the filter to the `ILogRedactor`-aligned `IAuditRedactor`. OtOpcUa's fire-and-forget
Akka `tell` is morally the same best-effort writer; MxGateway's `IApiKeyAuditStore` is a specialized,
narrower writer. The genuinely common core is the *who/what/when/outcome/target/correlation + details*
record plus those two seams; **transport and storage diverge wildly and stay per-project.**

Key refs:
- OtOpcUa `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs`;
`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs`;
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs`.
- MxGateway `src/ZB.MOM.WW.MxGateway.Server/Security/Authentication/{IApiKeyAuditStore,ApiKeyAuditRecord,ApiKeyAuditEntry,SqliteApiKeyAuditStore}.cs`.
- ScadaBridge `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditEvent.cs`;
`src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/IAuditWriter.cs`;
`src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs` (+ the whole
`ZB.MOM.WW.ScadaBridge.AuditLog/` Site+Central pipeline).
## Library design — `ZB.MOM.WW.Audit` (1 package, BCL-only)
### Canonical record + outcome
```csharp
namespace ZB.MOM.WW.Audit;

public sealed record AuditEvent
{
    // REQUIRED core: who / what / when / outcome
    public required Guid EventId { get; init; }                  // idempotency key
    public required DateTimeOffset OccurredAtUtc { get; init; }  // UTC (see note)
    public required string Actor { get; init; }                  // who (the ZB.MOM.WW.Auth principal at adoption)
    public required string Action { get; init; }                 // what: verb/event-type string
    public required AuditOutcome Outcome { get; init; }          // Success | Failure | Denied

    // OPTIONAL common
    public string? Category { get; init; }                       // subsystem/grouping
    public string? Target { get; init; }                         // on-what (resource/method/connection)
    public string? SourceNode { get; init; }                     // emitting node
    public Guid? CorrelationId { get; init; }                    // join to originating request/workflow

    // EXTENSION: everything project-specific, as JSON
    public string? DetailsJson { get; init; }
}

public enum AuditOutcome { Success, Failure, Denied }
```
**Timestamp choice:** `DateTimeOffset`, because it carries its offset and therefore pins an unambiguous
instant; the canonical value is normalized to UTC (offset zero). MxGateway already uses it.
ScadaBridge/OtOpcUa store UTC-forced `DateTime` and convert at their mapping boundary. (Swappable to UTC
`DateTime` if the team prefers to match the storage majority; flagged as the one open detail.)
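
The conversion at the mapping boundary could look roughly like the sketch below. `AuditTimestamps` and both method names are hypothetical (they are not part of the proposed API), and the sketch assumes the stored `DateTime` already holds a UTC clock value regardless of its `Kind`.

```csharp
using System;

// Hypothetical helper, not part of the proposed ZB.MOM.WW.Audit API.
public static class AuditTimestamps
{
    // Stored UTC-forced DateTime -> canonical DateTimeOffset.
    public static DateTimeOffset ToUtcOffset(DateTime storedUtc)
    {
        // Assumption: the value is already UTC, whatever its Kind says.
        // SpecifyKind relabels the value without shifting the clock time.
        DateTime utc = DateTime.SpecifyKind(storedUtc, DateTimeKind.Utc);

        // For Kind=Utc the resulting offset is TimeSpan.Zero.
        return new DateTimeOffset(utc);
    }

    // Canonical DateTimeOffset -> DateTime for a UTC storage column.
    public static DateTime ToStoredUtc(DateTimeOffset occurredAtUtc)
        => occurredAtUtc.UtcDateTime; // normalized to UTC, Kind=Utc
}
```

Relabelling with `SpecifyKind` rather than calling `ToUniversalTime()` avoids a silent local-time shift when a storage layer hands back `Kind=Unspecified`.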
**Why `Outcome` is in the required core:** denials/failures are genuinely common — OtOpcUa
`OpcUaAccessDenied`, MxGateway API-key denials, ScadaBridge `InboundAuthFailure` + `AuditStatus`. A
3-value `Success | Failure | Denied` enum normalizes them without importing any app's full taxonomy.
### Two seams
```csharp
// Lifts ScadaBridge's IAuditWriter: best-effort, MUST swallow internal failures, NEVER throw to caller.
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

// Mirrors Telemetry's ILogRedactor shape (aligned-but-independent). Pure function; MUST NOT throw
// (over-redact on internal failure). Generalizes ScadaBridge's IAuditPayloadFilter.
public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}
```
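
To make the two contracts concrete, here is a minimal sketch of what an implementer might write. `DelegateAuditWriter` and `PolicyAuditRedactor` are hypothetical names chosen for illustration, and the `{"redacted":true}` fallback payload is an assumption; the design only requires that failures are swallowed and that the redactor over-redacts rather than throws.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Contract types repeated from the design so the sketch compiles on its own.
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct = default); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Hypothetical writer: persists through a caller-supplied delegate and swallows
// every failure, so the user-facing action is never aborted by auditing.
public sealed class DelegateAuditWriter : IAuditWriter
{
    private readonly Func<AuditEvent, CancellationToken, Task> _persist;

    public DelegateAuditWriter(Func<AuditEvent, CancellationToken, Task> persist)
        => _persist = persist;

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try
        {
            await _persist(evt, ct).ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Best-effort: a real writer would log or count the dropped event here.
        }
    }
}

// Hypothetical redactor: runs a project-specific policy and, if the policy itself
// fails, over-redacts by dropping the free-form fields instead of throwing.
public sealed class PolicyAuditRedactor : IAuditRedactor
{
    private readonly Func<AuditEvent, AuditEvent> _policy;

    public PolicyAuditRedactor(Func<AuditEvent, AuditEvent> policy) => _policy = policy;

    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            return _policy(rawEvent);
        }
        catch (Exception)
        {
            // Assumed fallback payload; the design does not fix its shape.
            return rawEvent with { Target = null, DetailsJson = "{\"redacted\":true}" };
        }
    }
}
```

The required core (`Actor`, `Action`, `Outcome`, and so on) survives the fallback, so a failed redaction still leaves a usable who-did-what row.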
### Thin shipped helpers (the only concrete types)
- `NullAuditRedactor` — identity; the default when no policy is configured.
- `TruncatingAuditRedactor` — caps `DetailsJson`/`Target` to a configured max + sets a truncation
marker; never throws (over-redacts on failure). Generalizes ScadaBridge's truncation half. The
secret-field *policy* (which fields/commands are sensitive) stays per-project via composition.
- `NoOpAuditWriter` — discards (tests / disabled audit).
- `CompositeAuditWriter` — fans out to N writers; one writer throwing does not stop the others
(holds the best-effort contract).
- `RedactingAuditWriter` — decorator: `Apply` the redactor, then delegate to an inner `IAuditWriter`.
Generalizes ScadaBridge's "filter between event construction and the writer chain."
- `services.AddZbAudit(...)` — DI extension wiring redactor + decorator; `Null`/`NoOp` by default.
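
The fan-out and decorator helpers above might be sketched as follows. This is an illustration under assumptions, not the shipped implementation: the sequential fan-out order, the choice to drop an event when redaction fails, and the `CallbackAuditWriter` / `DropDetailsRedactor` stand-ins (which exist only to exercise the sketch) are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Contract types repeated from the design (optional fields other than
// DetailsJson omitted for brevity) so the sketch compiles on its own.
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct = default); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Fans out to every inner writer; a throwing writer does not stop the rest.
public sealed class CompositeAuditWriter : IAuditWriter
{
    private readonly IReadOnlyList<IAuditWriter> _writers;

    public CompositeAuditWriter(IEnumerable<IAuditWriter> writers) => _writers = writers.ToArray();

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (IAuditWriter writer in _writers) // assumption: sequential fan-out
        {
            try { await writer.WriteAsync(evt, ct).ConfigureAwait(false); }
            catch (Exception) { /* one failing sink must not starve the others */ }
        }
    }
}

// Decorator: redact first, then hand only the redacted event to the inner writer.
public sealed class RedactingAuditWriter : IAuditWriter
{
    private readonly IAuditRedactor _redactor;
    private readonly IAuditWriter _inner;

    public RedactingAuditWriter(IAuditRedactor redactor, IAuditWriter inner)
    {
        _redactor = redactor;
        _inner = inner;
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try
        {
            AuditEvent redacted = _redactor.Apply(evt);
            await _inner.WriteAsync(redacted, ct).ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Assumption: on failure the event is dropped; the raw event is never forwarded.
        }
    }
}

// Hypothetical stand-ins used only to exercise the two helpers.
public sealed class CallbackAuditWriter : IAuditWriter
{
    private readonly Action<AuditEvent> _onWrite;
    public CallbackAuditWriter(Action<AuditEvent> onWrite) => _onWrite = onWrite;
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        _onWrite(evt);
        return Task.CompletedTask;
    }
}

public sealed class DropDetailsRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent rawEvent) => rawEvent with { DetailsJson = null };
}
```

Composition is then a matter of nesting, for example `new RedactingAuditWriter(redactor, new CompositeAuditWriter(sinks))`, so every sink sees the same redacted event.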
### How each repo maps onto the canonical record
| Canonical | OtOpcUa | MxGateway | ScadaBridge |
|---|---|---|---|
| `EventId` / `OccurredAtUtc` | `EventId` / `OccurredAtUtc` | new Guid / `CreatedUtc` | `EventId` / `OccurredAtUtc` |
| `Actor` / `Action` | `Actor` / `Action` | `KeyId` / `EventType` | `Actor` / `Kind`(+`Channel`)→str |
| `Outcome` | derive from action | denial → `Denied` | `Status` → `Outcome` |
| `Category` / `Target` / `SourceNode` | `Category` / — / `SourceNode` | `"ApiKey"` / — / `RemoteAddress` | `Channel` / `Target` / `SourceNode` |
| `CorrelationId` / `DetailsJson` | `CorrelationId` / `DetailsJson` | — / `Details` | `CorrelationId` / Request+Response+Error+Extra → JSON |
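
As an illustration of the MxGateway column, a mapping function might look like the sketch below. `ApiKeyAuditRecordStandIn` is a hypothetical stand-in that carries only the five field names the table mentions (the real `ApiKeyAuditRecord` has six fields and its types were not reproduced here), and both the "event type contains `Denied`" test and the JSON wrapping of `Details` are assumptions.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Contract types repeated from the design so the sketch compiles on its own.
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

// Hypothetical stand-in for MxGateway's ApiKeyAuditRecord; field types are assumed.
public sealed record ApiKeyAuditRecordStandIn(
    string KeyId,
    string EventType,
    DateTimeOffset CreatedUtc,
    string? RemoteAddress,
    string? Details);

public static class MxGatewayAuditMapping
{
    public static AuditEvent ToAuditEvent(ApiKeyAuditRecordStandIn record) => new()
    {
        // The source row has no event id, so a fresh one is minted. Mapping the
        // same row twice therefore yields two different idempotency keys.
        EventId = Guid.NewGuid(),
        OccurredAtUtc = record.CreatedUtc.ToUniversalTime(),
        Actor = record.KeyId,
        Action = record.EventType,
        // Assumption: denials are recognizable from the event-type string.
        Outcome = record.EventType.Contains("Denied", StringComparison.OrdinalIgnoreCase)
            ? AuditOutcome.Denied
            : AuditOutcome.Success,
        Category = "ApiKey",
        SourceNode = record.RemoteAddress,
        // Assumption: Details is free text, so it is wrapped to keep DetailsJson valid JSON.
        DetailsJson = record.Details is null
            ? null
            : JsonSerializer.Serialize(new { details = record.Details }),
    };
}
```

`Target` and `CorrelationId` stay unset, matching the dashes in the MxGateway column.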
### Stays per-project (explicitly NOT in the library)
- **Transport/storage:** OtOpcUa Akka broadcast + `AuditWriterActor` + `ConfigAuditLog`; ScadaBridge
Site-SQLite hot-path + Central MS-SQL ingest/reconcile/purge/partition-maintenance + hash-chain;
MxGateway SQLite `IApiKeyAuditStore`.
- **Domain vocab:** `Channel`/`Kind`/`Status`/`ForwardState` enums, OtOpcUa `EventType` strings — these
map into `Action`/`Category`/`DetailsJson`.
- **Query / CLI / UI / export** surfaces.
- Each app's redaction **policy** (which fields are secret) — only the *seam* is shared.
## Normalization component docs
Follows `components/README.md`'s six-part layout (matching auth + ui-theme). `spec/SPEC.md` opens with a
Section 0 stating normalized vs. left-per-project explicitly. `spec/EVENT-MODEL.md` is the reference doc
(canonical record + `Outcome` + the mapping table), mirroring auth's `CANONICAL-ROLES.md` / theme's
`DESIGN-TOKENS.md`. Three `current-state/<project>/CURRENT-STATE.md` at full code-verified depth, each
ending in an Adoption plan. `GAPS.md` turns deltas into a prioritized backlog. Registers at status
**Draft** (`Draft → Reviewed → Adopting → Converged`).
## Testing & verification
Every type ships tests (mirrors auth's 172 / theme's 32; a thin lib lands ~40–60; `dotnet test` from the
library root):
- **Record/enum** — required fields enforced; value-equality; `OccurredAtUtc` round-trips as UTC;
`DetailsJson` passthrough; `AuditOutcome` values.
- **`NullAuditRedactor`** — identity (input returned unchanged).
- **`TruncatingAuditRedactor`** — caps `DetailsJson`/`Target` + sets truncation marker; **never throws**
on malformed input (over-redacts on internal failure — the seam's hard contract).
- **`NoOpAuditWriter`** — discards, completes.
- **`CompositeAuditWriter`** — fans out to all inner writers; one writer throwing does not stop the rest.
- **`RedactingAuditWriter`** — passes the *redacted* (not raw) event to the inner writer; never throws to
caller.
- **`AddZbAudit`** — resolves `IAuditWriter`/`IAuditRedactor` with `NoOp`/`Null` defaults; decorator
composition wires.
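
As one example of the checks listed above, the `TruncatingAuditRedactor` case could be sketched like this. The redactor body, the `...[truncated]` suffix marker, and the dependency-free `Require` helper are assumptions made for illustration; the design does not fix the marker format or name the test framework, and a real test would use whatever framework the library adopts.

```csharp
#nullable enable
using System;

// Contract types repeated from the design (unused optional fields omitted)
// so the sketch compiles on its own.
public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Target { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Assumed implementation: caps by character count and appends a suffix marker.
public sealed class TruncatingAuditRedactor : IAuditRedactor
{
    private const string Marker = "...[truncated]"; // assumed marker format
    private readonly int _maxLength;

    public TruncatingAuditRedactor(int maxLength) => _maxLength = Math.Max(0, maxLength);

    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            return rawEvent with { Target = Cap(rawEvent.Target), DetailsJson = Cap(rawEvent.DetailsJson) };
        }
        catch (Exception)
        {
            // Over-redact rather than leak or throw.
            return rawEvent with { Target = null, DetailsJson = null };
        }
    }

    private string? Cap(string? value)
        => value is null || value.Length <= _maxLength
            ? value
            : value.Substring(0, _maxLength) + Marker;
}

public static class TruncatingAuditRedactorChecks
{
    public static void Run()
    {
        var redactor = new TruncatingAuditRedactor(maxLength: 8);
        var evt = new AuditEvent
        {
            EventId = Guid.NewGuid(),
            OccurredAtUtc = DateTimeOffset.UtcNow,
            Actor = "alice",
            Action = "Published",
            Outcome = AuditOutcome.Success,
            Target = "short",
            DetailsJson = "{\"payload\":\"0123456789\"}",
        };

        AuditEvent result = redactor.Apply(evt);

        Require(result.Target == "short", "values under the cap pass through unchanged");
        Require(result.DetailsJson == "{\"payloa...[truncated]", "long values are capped and marked");
        Require(result.Actor == evt.Actor, "required core fields are untouched");
    }

    private static void Require(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```

Note that capping by character count leaves `DetailsJson` as invalid JSON; whether the marker should instead be a JSON wrapper is a detail for the spec to settle.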
**Verification gates (evidence, not assertions):** `dotnet test` green + `dotnet pack` → **1 nupkg @
0.1.0**.
## GAPS.md / adoption backlog (deferred — adoption lives here)
- Per-project divergences vs `SPEC.md`; each `current-state` ends in an Adoption plan.
- **ScadaBridge** — already at the target; adoption is "align, don't replace" (rename
`IAuditPayloadFilter` → `IAuditRedactor`; its `IAuditWriter` already matches). Large surface, but mostly
naming/contract alignment, no behaviour change. Risk: high blast radius, so low priority.
- **MxGateway** — map `IApiKeyAuditStore`/`ApiKeyAuditRecord` → `IAuditWriter`/`AuditEvent`. Low effort,
but **coordinate** — the parallel session is already editing this repo (MEL→Serilog).
- **OtOpcUa** — `AuditEvent` → canonical record; `AuditWriterActor : IAuditWriter`; `ConfigAuditLog`
mapping. Medium effort.
- Cross-cutting items: `Outcome` normalization across all three; `Actor` = `ZB.MOM.WW.Auth` principal
(closes the loop on Auth); `IAuditRedactor` naming aligned with Telemetry's `ILogRedactor`.
## Build order
```
1. components/audit/ docs (spec first — drives the API):
current-state ×3 → SPEC + EVENT-MODEL → shared-contract → GAPS
2. ZB.MOM.WW.Audit library (record + enum + 2 seams + helpers + AddZbAudit), tests, pack @ 0.1.0
3. Index/registry updates (components/README, CLAUDE.md, upcoming.md #3) + GAPS cross-check
(no adoption step — deferred to GAPS)
```
## Implementation tasks (native task IDs)
- #7 Author `components/audit/` normalization docs (current-state ×3 + SPEC + EVENT-MODEL +
shared-contract + GAPS)
- #8 Build `ZB.MOM.WW.Audit` library (1 package: record + enum + 2 seams + helpers + `AddZbAudit`;
tests; `dotnet pack` @ 0.1.0) — blocked by #7 (spec drives the API)
- #9 Index/registry updates (`components/README.md`, `CLAUDE.md`, `upcoming.md` #3) + GAPS cross-check —
blocked by #8
Adoption into the three apps is intentionally **not** a task here — it is the `GAPS.md` follow-on,
identical to where Auth/Theme/Health adoption sits today.