docs(audit): spec + event-model

Joseph Doherty
2026-06-01 07:04:54 -04:00
parent a7a8f1e493
commit 8f0b70d12f
2 changed files with 239 additions and 0 deletions
@@ -0,0 +1,94 @@
# Canonical event model (standardized)
Status: **Standardized**. The org-wide audit record + outcome enum every sister project maps onto.
This is the reference companion to [`SPEC.md`](SPEC.md) (mirroring auth's `CANONICAL-ROLES.md` /
theme's `DESIGN-TOKENS.md`): the field-by-field canonical record, the `AuditOutcome` definition with
which app states map onto each value, and the full per-project mapping table. The shared library
defines exactly this record; each project **projects its native record onto it** at the seam.
## The canonical record
```csharp
namespace ZB.MOM.WW.Audit;

public sealed record AuditEvent
{
    // REQUIRED core — who / what / when / outcome
    public required Guid EventId { get; init; }                 // idempotency key
    public required DateTimeOffset OccurredAtUtc { get; init; } // normalized to UTC
    public required string Actor { get; init; }                 // who — = ZB.MOM.WW.Auth principal at adoption
    public required string Action { get; init; }                // what — verb / event-type string
    public required AuditOutcome Outcome { get; init; }         // Success | Failure | Denied

    // OPTIONAL common
    public string? Category { get; init; }                      // subsystem / grouping bucket
    public string? Target { get; init; }                        // on-what (resource / method / connection)
    public string? SourceNode { get; init; }                    // emitting logical node / host
    public Guid? CorrelationId { get; init; }                   // join to originating request / workflow

    // EXTENSION — everything project-specific, as JSON
    public string? DetailsJson { get; init; }
}

public enum AuditOutcome { Success, Failure, Denied }
```
### Field-by-field
| Field | Req? | Type | Meaning | Notes |
|---|:-:|---|---|---|
| `EventId` | yes | `Guid` | Idempotency key | Backs at-least-once transports: OtOpcUa's filtered-unique `EventId` index, ScadaBridge's first-write-wins. MxGateway has none today → **generate at write time**. |
| `OccurredAtUtc` | yes | `DateTimeOffset` | When it happened, UTC | MxGateway already uses `DateTimeOffset`. OtOpcUa / ScadaBridge store UTC-forced `DateTime` and widen at the mapping boundary. |
| `Actor` | yes | `string` | Who acted | SHOULD be the `ZB.MOM.WW.Auth` principal ([`SPEC.md`](SPEC.md) §4). Kept as a plain `string` (no Auth dependency). Keyless events use a `"system"` / `"cli"` fallback rather than an empty value. |
| `Action` | yes | `string` | What was done (verb / event-type) | Carries each app's domain verb: OtOpcUa `EventType`, MxGateway `EventType`, ScadaBridge `{Channel}.{Kind}`. |
| `Outcome` | yes | `AuditOutcome` | Success / Failure / Denied | **New normalized field — no app stores it today; each derives it** (see below). |
| `Category` | no | `string?` | Coarse subsystem / grouping | OtOpcUa `Category` (`"Config"`); MxGateway constant `"ApiKey"`; ScadaBridge `Channel`. |
| `Target` | no | `string?` | The object acted on | ScadaBridge `Target` (direct). OtOpcUa / MxGateway have no dedicated field → null or fold into `DetailsJson`. |
| `SourceNode` | no | `string?` | Emitting logical node / host | OtOpcUa `SourceNode` (a logical node name, **not** an OPC UA NodeId); ScadaBridge `SourceNode`; MxGateway `RemoteAddress`. |
| `CorrelationId` | no | `Guid?` | Join to originating request / workflow | OtOpcUa / ScadaBridge direct; MxGateway has none today (left null). |
| `DetailsJson` | no | `string?` | Extension bag — all project-specific data | Must be valid JSON where stored (OtOpcUa enforces this with a CHECK constraint). Absorbs each app's surplus columns. |
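The `DateTime` to `DateTimeOffset` widening noted for `OccurredAtUtc` could look roughly like the sketch below. It is an illustration only, not the library's code: `AuditTime.WidenUtc` is a hypothetical helper name, and the sketch assumes the stored value is already UTC wall-clock time whatever its `Kind` flag says.

```csharp
using System;

public static class AuditTime
{
    // Hypothetical helper: treats the stored value as UTC wall-clock time,
    // regardless of its Kind flag, and widens it to a zero-offset DateTimeOffset.
    public static DateTimeOffset WidenUtc(DateTime storedUtc)
    {
        DateTime utc = DateTime.SpecifyKind(storedUtc, DateTimeKind.Utc);
        return new DateTimeOffset(utc); // Kind == Utc yields Offset == TimeSpan.Zero
    }
}
```

Going through `DateTime.SpecifyKind` matters because `new DateTimeOffset(DateTime)` applies the machine's local offset to a value whose `Kind` is `Unspecified`, which would silently shift the timestamp on a host that is not running in UTC.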
## `AuditOutcome` — definition and app-state mapping
Three values, deliberately minimal — enough to normalize denials and failures without importing any
app's full taxonomy. `Outcome` is **derived** at each emit site (no app persists it today; OtOpcUa
encodes it implicitly in `EventType`, MxGateway in the event-type literal, ScadaBridge in `Status`):
| `AuditOutcome` | Meaning | OtOpcUa (`EventType`) | MxGateway (event type) | ScadaBridge (`AuditStatus` / `AuditKind`) |
|---|---|---|---|---|
| **`Success`** | The action completed | config-write verbs — `DraftCreated`, `DraftEdited`, `Published`, `RolledBack`, `NodeApplied`, `CredentialAdded`, `ClusterCreated`, `NodeAdded`, `ExternalIdReleased`, … | key-lifecycle — `init-db`, `create-key`, `list-keys`, `revoke-key`, `rotate-key` + all `dashboard-*` | `Status = Delivered` |
| **`Failure`** | The action was attempted and failed | *(none today — a failed actor flush is dropped, not recorded as an event)* | *(none emitted today)* | `Status ∈ { Failed, Parked, Discarded }` |
| **`Denied`** | The action was rejected by authorization / policy | `OpcUaAccessDenied`, `CrossClusterNamespaceAttempt` | `constraint-denied` | `Kind = InboundAuthFailure` |
Notes:
- **OtOpcUa has no `Failure` source.** Its vocabulary only distinguishes success-verbs from
access-denials; an internal write failure is dropped (best-effort), not emitted as an event. So
OtOpcUa produces only `Success` / `Denied` until/unless it adds failure events.
- **MxGateway emits only `Success` / `Denied`** today (no failure events; authentication
success/failure is surfaced as gRPC status, not persisted — see its current-state doc).
- **ScadaBridge in-flight states** (`Submitted` / `Forwarded` / `Attempted`) are not terminal; when
projecting to a single `Outcome` they collapse to the last-known terminal state. `Skipped` is not a
user-facing outcome and is excluded from the canonical projection.
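For the two projects that encode outcome in an event-type string, the derivation in the table above might be sketched as follows. The class and method names are hypothetical, and treating every non-denial event type as `Success` is an assumption that holds only while OtOpcUa and MxGateway emit no failure events.

```csharp
using System;

public enum AuditOutcome { Success, Failure, Denied }

public static class OutcomeDerivation
{
    // OtOpcUa: the two denial event types named in the table map to Denied;
    // anything else is treated as a success verb (assumption, see lead-in).
    public static AuditOutcome FromOtOpcUaEventType(string eventType) =>
        eventType switch
        {
            "OpcUaAccessDenied" or "CrossClusterNamespaceAttempt" => AuditOutcome.Denied,
            _ => AuditOutcome.Success,
        };

    // MxGateway: constraint-denied is the only denial; everything else is Success.
    public static AuditOutcome FromMxGatewayEventType(string eventType) =>
        string.Equals(eventType, "constraint-denied", StringComparison.Ordinal)
            ? AuditOutcome.Denied
            : AuditOutcome.Success;
}
```

If either project later adds failure events, the default arm is the place that would need a new case, so a real implementation may prefer an explicit allow-list over a catch-all.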
## Per-project mapping table (canonical ← native record)
Consolidated from the three current-state docs. "Direct" = field exists with the same role; the
right-hand notes flag the type bridges and synthesized fields.
| Canonical field | OtOpcUa `AuditEvent` (8 fields) | MxGateway `ApiKeyAuditRecord` (6 fields) | ScadaBridge `AuditEvent` (~25 fields) |
|---|---|---|---|
| `EventId` | `EventId` — direct (idempotency key) | **generate** new `Guid` (only `AuditId` rowid exists) | `EventId` — direct |
| `OccurredAtUtc` | `OccurredAtUtc` (`DateTime` UTC) → widen | `CreatedUtc` (store-assigned `DateTimeOffset`) — direct | `OccurredAtUtc` (`DateTime` UTC-forced) → widen |
| `Actor` | `Actor` — direct | `KeyId` (nullable → `"system"`/`"cli"` fallback) | `Actor` (nullable on system rows) |
| `Action` | `Action` (persisted as `"{Category}:{Action}"`) | `EventType` — direct | `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
| `Outcome` | **derive** from `EventType` | **derive**: `constraint-denied` → `Denied`, else `Success` | **derive** from `Status` (+ `InboundAuthFailure` → `Denied`) |
| `Category` | `Category` (`"Config"`) | constant `"ApiKey"` | `Channel` |
| `Target` | — none — (null or via `DetailsJson`) | — none — (`commandKind`/`target` embedded in `Details` text) | `Target` — direct |
| `SourceNode` | `SourceNode` (logical node, `NodeId.Value`) | `RemoteAddress` (dashboard path only) | `SourceNode` — direct |
| `CorrelationId` | `CorrelationId` (`CorrelationId.Value`) — direct | — none — | `CorrelationId` — direct |
| `DetailsJson` | `DetailsJson` — direct (also `ClusterId`/`GenerationId` on the SP path) | `Details` (plain string → store as-is or wrap) | the ~15 rich/plumbing fields (`ExecutionId`, `SourceSiteId`, `HttpStatus`, `DurationMs`, `ErrorMessage`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra`, `ForwardState`, …) serialize here |
The canonical record is a **lossy projection**: it is sufficient for cross-project reporting, but each
project keeps its native record as the storage shape — ScadaBridge especially, whose partitioned SQL
schema, forwarding state, and reconciliation depend on the extra columns ([`SPEC.md`](SPEC.md) §5).
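As one worked column of the mapping table, the MxGateway projection might be sketched like this. The `ApiKeyAuditRecord` shape is an assumption inferred from the table (the real type may differ), `MxGatewayAuditMapping` is a hypothetical name, and wrapping `Details` in a `{"text": ...}` object is just one way to take the "wrap" option so the result is valid JSON.

```csharp
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Failure, Denied }

// Copy of the canonical record so the sketch compiles on its own.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

// ASSUMED native shape, inferred from the mapping table above.
public sealed record ApiKeyAuditRecord(
    long AuditId, DateTimeOffset CreatedUtc, string? KeyId,
    string EventType, string? RemoteAddress, string? Details);

public static class MxGatewayAuditMapping
{
    // fallbackActor is "system" or "cli" depending on the emit site.
    public static AuditEvent ToCanonical(ApiKeyAuditRecord native, string fallbackActor = "system") =>
        new AuditEvent
        {
            EventId = Guid.NewGuid(), // no native idempotency key, so one is generated
            OccurredAtUtc = native.CreatedUtc.ToUniversalTime(),
            Actor = string.IsNullOrEmpty(native.KeyId) ? fallbackActor : native.KeyId,
            Action = native.EventType,
            Outcome = native.EventType == "constraint-denied"
                ? AuditOutcome.Denied
                : AuditOutcome.Success,
            Category = "ApiKey",
            Target = null, // no dedicated native field
            SourceNode = native.RemoteAddress,
            CorrelationId = null, // none today
            DetailsJson = native.Details is null
                ? null
                : JsonSerializer.Serialize(new { text = native.Details }),
        };
}
```

Because `EventId` is generated inside the mapper, projecting the same native row twice yields two different ids. That is consistent with "generate at write time" but means the projection should run once per event, at the write site, rather than on every read.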
@@ -0,0 +1,145 @@
# Audit — normalized target spec
Status: **Draft**. The single design the sister projects converge on. Derived from the three
code-verified current-state docs (`../current-state/`) and the locked design
(`../../../docs/plans/2026-06-01-audit-component-design.md`). Goal is *path to shared code*
(`../shared-contract/ZB.MOM.WW.Audit.md`), so each normalized section maps to a shared library seam.
## 0. Normalized vs left-per-project
**Normalized here** (the shared `ZB.MOM.WW.Audit` library):
- **The canonical `AuditEvent` record** — required core (`EventId`, `OccurredAtUtc`, `Actor`,
`Action`, `Outcome`) + optional common (`Category`, `Target`, `SourceNode`, `CorrelationId`) +
the `DetailsJson` extension bag. The full field-by-field reference is [`EVENT-MODEL.md`](EVENT-MODEL.md).
- **`AuditOutcome`** — the 3-value `Success | Failure | Denied` enum (§3). This is a *new*
normalized field every app derives; see [`EVENT-MODEL.md`](EVENT-MODEL.md) for the per-app derivation.
- **The two seams** — `IAuditWriter` (best-effort, never throws to caller, §1) and `IAuditRedactor`
(pure, never throws, over-redacts on failure, §2).
**Explicitly NOT normalized** (domain-specific / divergent — keep per project):
- **Transport & storage** — OtOpcUa's Akka cluster-broadcast → singleton `AuditWriterActor` (batch
500 / 5 s, two-layer dedup) over `ConfigAuditLog`; MxGateway's SQLite `IApiKeyAuditStore` append +
list-recent; ScadaBridge's site-SQLite hot-path → central MS SQL ingest / reconcile / purge /
partition-maintenance / hash-chain pipeline. The shared core is **BCL-only** and carries no Akka /
EF / SQLite / Serilog dependency.
- **Domain vocabulary** — ScadaBridge's `Channel` / `Kind` / `Status` / `ForwardState` enums and
OtOpcUa's `EventType` strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …). These map
*into* `Action` / `Category` / `Outcome` / `DetailsJson`; they do not leak into the shared type.
- **Query / CLI / UI / export** surfaces (OtOpcUa `ClusterAudit.razor`; ScadaBridge `export` /
`verify-chain` CLI + Blazor audit pages; MxGateway's unused `ListRecentAsync`).
- **Each app's redaction *policy*** — *which* fields/commands/payloads are sensitive. Only the
`IAuditRedactor` *seam* is shared; the `Default` / `Safe` filter behaviour stays per-project.
> **Scope of the producer path.** OtOpcUa has **two producers** writing the same `ConfigAuditLog`
> table — the structured Akka `AuditEvent` path *and* older SQL stored procedures that `INSERT`
> directly (`SUSER_SNAME()`, bare `EventType`, NULL `EventId`). Normalization targets the
> **structured producer path** (the one that builds an `AuditEvent`), not every SQL insert; the SP
> path stays per-project and is a reconcile item, not an extraction item (`../GAPS.md`).
## 1. The writer contract — `IAuditWriter` (best-effort)
```csharp
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}
```
Audit is a side-channel, never on the critical path. The hard rule:
- **`WriteAsync` MUST NOT throw to the caller.** An implementation swallows/logs its own internal
failures; a failed write **must never abort the user-facing action** it is recording. (ScadaBridge's
seam already states this almost word-for-word: "Failures must NEVER abort the user-facing action.")
- Idempotency is carried by `EventId`, so retries and at-least-once transports are safe (OtOpcUa's
filtered-unique `EventId` index and ScadaBridge's first-write-wins are both honoured by this key).
- Delivery is at-most-once *as a contract* — a writer MAY drop on failure (OtOpcUa drops a failed
batch; ScadaBridge's ring-buffer fallback drops oldest). Durability is a per-project transport
decision, not part of this seam.
Shipped helpers (the only concrete writers): `NoOpAuditWriter` (discards — tests / disabled audit),
`CompositeAuditWriter` (fans out to N writers; **one writer throwing does not stop the others**), and
`RedactingAuditWriter` (decorator: applies the redactor, then delegates to an inner writer).
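The fan-out rule for `CompositeAuditWriter` (one writer throwing does not stop the others, and nothing reaches the caller) can be illustrated with the sketch below. It is not the shipped helper: `CompositeAuditWriterSketch`, the `onError` hook, and `DelegateAuditWriter` are hypothetical names, and `AuditEvent` is reduced to a stand-in so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the canonical record; only the shape needed by this sketch.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

public sealed class CompositeAuditWriterSketch : IAuditWriter
{
    private readonly IReadOnlyList<IAuditWriter> _writers;
    private readonly Action<Exception>? _onError; // hypothetical logging hook

    public CompositeAuditWriterSketch(IReadOnlyList<IAuditWriter> writers,
                                      Action<Exception>? onError = null)
    {
        _writers = writers;
        _onError = onError;
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (var writer in _writers)
        {
            try
            {
                // The call sits inside the try so synchronous throws are caught too.
                await writer.WriteAsync(evt, ct).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Swallow (cancellation included) and move on to the next writer.
                try { _onError?.Invoke(ex); } catch { /* the hook must not throw either */ }
            }
        }
    }
}

// Hypothetical adapter that turns a delegate into a writer, handy for tests.
public sealed class DelegateAuditWriter : IAuditWriter
{
    private readonly Func<AuditEvent, Task> _write;
    public DelegateAuditWriter(Func<AuditEvent, Task> write) => _write = write;
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) => _write(evt);
}
```

The sketch writes sequentially for simplicity. A parallel fan-out would also satisfy the contract as long as each writer's failure is isolated in the same way.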
## 2. The redactor contract — `IAuditRedactor` (never throws)
```csharp
public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}
```
A pure projection from a raw event to a safe one, applied between event construction and the writer
chain. The hard rule:
- **`Apply` MUST NOT throw.** On any internal failure it **over-redacts** (returns a strictly safer
event) rather than propagating — a redactor that throws would either crash the audit path or leak
the unredacted event. (ScadaBridge's `SafeDefaultAuditPayloadFilter` is the reference: header-only
redaction, over-redacts on parse failure.)
- It is a **pure function** returning a filtered *copy* (via `with`); it does not mutate the input or
perform I/O.
The seam is **aligned-but-independent** with Telemetry's `ILogRedactor` — same shape and naming
discipline so a future `ZB.MOM.WW.Hosting` aggregator wires both with one mental model — but there is
**no cross-package dependency**. Shipped helpers: `NullAuditRedactor` (identity — the default when no
policy is configured) and `TruncatingAuditRedactor` (caps `DetailsJson` / `Target` to a configured
max + sets a truncation marker; never throws). The *secret-field policy* (which fields/commands are
sensitive) stays per-project via composition.
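A truncating redactor that follows the rules above (pure, copy via `with`, over-redact on failure) might be sketched as below. `TruncatingRedactorSketch` is a hypothetical name, `AuditEvent` is a reduced stand-in, and the `{"truncated": true, "head": ...}` wrapper is an assumed marker shape chosen so the capped `DetailsJson` stays valid JSON. The sketch also assumes a non-null input, as the non-nullable signature implies.

```csharp
using System;
using System.Text.Json;

// Stand-in with only the fields this sketch touches; the canonical record has more.
public sealed record AuditEvent
{
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public string? Target { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent rawEvent);
}

public sealed class TruncatingRedactorSketch : IAuditRedactor
{
    private readonly int _maxChars;

    public TruncatingRedactorSketch(int maxChars) => _maxChars = Math.Max(0, maxChars);

    public AuditEvent Apply(AuditEvent rawEvent)
    {
        try
        {
            // 'with' returns a modified copy; the input record is never mutated.
            return rawEvent with
            {
                Target = CapText(rawEvent.Target),
                DetailsJson = CapJson(rawEvent.DetailsJson),
            };
        }
        catch
        {
            // Over-redact: drop both fields rather than propagate or leak.
            return rawEvent with { Target = null, DetailsJson = null };
        }
    }

    private string? CapText(string? value) =>
        value is null || value.Length <= _maxChars ? value : value.Substring(0, _maxChars);

    // Cutting JSON mid-document would leave invalid JSON, so the kept prefix is
    // wrapped in a small object that doubles as the truncation marker.
    private string? CapJson(string? json) =>
        json is null || json.Length <= _maxChars
            ? json
            : JsonSerializer.Serialize(new { truncated = true, head = json.Substring(0, _maxChars) });
}
```

One consequence of the wrapper is that a truncated `DetailsJson` ends up slightly longer than the configured cap. A store with a hard column limit would need to budget for that overhead.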
## 3. `AuditOutcome` — the new normalized field
`Outcome` is in the **required core**, but **no app stores it today** — each encodes outcome
implicitly and must **derive** it at adoption (this is the one genuinely new field):
- **OtOpcUa** — derived from the `EventType` vocabulary (`OpcUaAccessDenied` /
`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`).
- **MxGateway** — `constraint-denied` → `Denied`; key-lifecycle events → `Success`.
- **ScadaBridge** — `AuditStatus` → `Outcome` (`Delivered` → `Success`; `Failed` / `Parked` /
`Discarded` → `Failure`; `InboundAuthFailure` kind → `Denied`).
The three values normalize denials and failures across the family without importing any app's full
taxonomy. The enum definition and the complete state-by-state mapping live in [`EVENT-MODEL.md`](EVENT-MODEL.md).
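The ScadaBridge derivation, which combines a status and a kind, might be sketched as follows. The enum members are assumptions pieced together from the names mentioned in these docs (the real `AuditStatus` / `AuditKind` enums may carry more values), `ScadaBridgeOutcome` is a hypothetical name, and letting `InboundAuthFailure` take precedence over `Status` is also an assumption. Returning `null` for in-flight and `Skipped` rows is one possible way to signal "no terminal outcome to project yet".

```csharp
using System;

public enum AuditOutcome { Success, Failure, Denied }

// ASSUMED members, taken from the state names mentioned in the docs.
public enum AuditStatus { Submitted, Forwarded, Attempted, Delivered, Failed, Parked, Discarded, Skipped }
public enum AuditKind { ApiCall, InboundAuthFailure }

public static class ScadaBridgeOutcome
{
    public static AuditOutcome? Derive(AuditKind kind, AuditStatus status)
    {
        // Authorization rejections are Denied regardless of delivery status (assumption).
        if (kind == AuditKind.InboundAuthFailure) return AuditOutcome.Denied;

        return status switch
        {
            AuditStatus.Delivered => AuditOutcome.Success,
            AuditStatus.Failed or AuditStatus.Parked or AuditStatus.Discarded => AuditOutcome.Failure,
            _ => (AuditOutcome?)null, // in-flight or Skipped: nothing terminal to report
        };
    }
}
```

A caller building a canonical `AuditEvent` would skip rows for which `Derive` returns `null`, since `Outcome` is a required field.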
## 4. The hinge — audit closes the loop on Auth
Every audit row's `Actor` is the *who*, which is exactly the identity the **Auth** component already
normalizes (LDAP/GLAuth principal, API-key name). Auth is the read side ("who is this and what may
they do"); audit is the write side ("who did what"). The spec ties them by stating:
- **`Actor` SHOULD be the `ZB.MOM.WW.Auth` principal** at adoption time.
- But `Actor` is **kept as a plain `string`** in the contract, so the library carries **no dependency
on `ZB.MOM.WW.Auth`**. (MxGateway's keyless events — `init-db` / `list-keys` — supply a `"system"` /
`"cli"` fallback rather than leaving the required field empty.)
This mirrors Auth's own decision to keep audit *read* inside `OBSERVE` and audit *export* inside
`ADMINISTER` rather than minting a separate auditor role: the two components share a vocabulary, not a
dependency.
## 5. ScadaBridge is already at the target
ScadaBridge already ships **both** seams: an `IAuditWriter` whose best-effort contract matches
word-for-word, and an `IAuditPayloadFilter` that *is* the canonical `IAuditRedactor` under a different
name (identical `AuditEvent Apply(AuditEvent)` signature, pure / never-throws / over-redacts). The
library essentially **lifts ScadaBridge's seams**.
The one real (non-naming) decision is the **writer's record type**: the canonical `IAuditWriter` is
typed on the 10-field `AuditEvent`; ScadaBridge's writer is typed on its ~25-field record.
> **Resolution (recommended):** share the **interface *name* + the `AuditOutcome` enum**, not the
> record schema. ScadaBridge keeps its rich ~25-field record as its **storage shape** (its whole
> transport / partition / forwarding / reconciliation layer is built on the extra columns), and maps
> to the canonical 10-field record **only at cross-app reporting boundaries**. This is the
> minimal-coupling option — share the contract, not the schema — and avoids making the shared seam
> generic over the event type. ScadaBridge therefore converges by **renaming one interface** and
> adopting `AuditOutcome`, with no transport / storage / CLI / UI change.
## 6. Acceptance (what "converged" means)
A project is converged when: (a) its structured audit-producer path constructs the canonical
`AuditEvent` (with `Outcome` derived per §3) and persists via an implementation of `IAuditWriter`;
(b) any redaction runs through an `IAuditRedactor`; (c) `Actor` carries the `ZB.MOM.WW.Auth` principal
where one exists (string fallback otherwise); with its transport, storage, domain vocabulary, query
surfaces, and redaction *policy* unchanged. Per-project deltas and the adoption backlog are in
[`../GAPS.md`](../GAPS.md); the proposed library API is [`../shared-contract/ZB.MOM.WW.Audit.md`](../shared-contract/ZB.MOM.WW.Audit.md).