diff --git a/components/audit/spec/EVENT-MODEL.md b/components/audit/spec/EVENT-MODEL.md
new file mode 100644
index 0000000..9cc6181
--- /dev/null
+++ b/components/audit/spec/EVENT-MODEL.md
@@ -0,0 +1,94 @@
+# Canonical event model (standardized)
+
+Status: **Standardized**. The org-wide audit record + outcome enum every sister project maps onto.
+This is the reference companion to [`SPEC.md`](SPEC.md) (mirroring auth's `CANONICAL-ROLES.md` /
+theme's `DESIGN-TOKENS.md`): the field-by-field canonical record, the `AuditOutcome` definition with
+which app states map onto each value, and the full per-project mapping table. The shared library
+defines exactly this record; each project **projects its native record onto it** at the seam.
+
+## The canonical record
+
+```csharp
+namespace ZB.MOM.WW.Audit;
+
+public sealed record AuditEvent
+{
+    // REQUIRED core — who / what / when / outcome
+    public required Guid EventId { get; init; }                  // idempotency key
+    public required DateTimeOffset OccurredAtUtc { get; init; }  // normalized to UTC
+    public required string Actor { get; init; }                  // who — = ZB.MOM.WW.Auth principal at adoption
+    public required string Action { get; init; }                 // what — verb / event-type string
+    public required AuditOutcome Outcome { get; init; }          // Success | Failure | Denied
+
+    // OPTIONAL common
+    public string? Category { get; init; }                       // subsystem / grouping bucket
+    public string? Target { get; init; }                         // on-what (resource / method / connection)
+    public string? SourceNode { get; init; }                     // emitting logical node / host
+    public Guid? CorrelationId { get; init; }                    // join to originating request / workflow
+
+    // EXTENSION — everything project-specific, as JSON
+    public string? DetailsJson { get; init; }
+}
+
+public enum AuditOutcome { Success, Failure, Denied }
+```
+
+### Field-by-field
+
+| Field | Req? | Type | Meaning | Notes |
+|---|:-:|---|---|---|
+| `EventId` | yes | `Guid` | Idempotency key | Backs at-least-once transports: OtOpcUa's filtered-unique `EventId` index, ScadaBridge's first-write-wins. MxGateway has none today → **generate at write time**. |
+| `OccurredAtUtc` | yes | `DateTimeOffset` | When it happened, UTC | MxGateway already uses `DateTimeOffset`. OtOpcUa / ScadaBridge store UTC-forced `DateTime` and widen at the mapping boundary. |
+| `Actor` | yes | `string` | Who acted | SHOULD be the `ZB.MOM.WW.Auth` principal ([`SPEC.md`](SPEC.md) §4). Kept a `string` (no Auth dependency). Keyless events use a `"system"` / `"cli"` fallback rather than empty. |
+| `Action` | yes | `string` | What was done (verb / event-type) | Carries each app's domain verb: OtOpcUa `EventType`, MxGateway `EventType`, ScadaBridge `{Channel}.{Kind}`. |
+| `Outcome` | yes | `AuditOutcome` | Success / Failure / Denied | **New normalized field — no app stores it today; each derives it** (see below). |
+| `Category` | no | `string?` | Coarse subsystem / grouping | OtOpcUa `Category` (`"Config"`); MxGateway constant `"ApiKey"`; ScadaBridge `Channel`. |
+| `Target` | no | `string?` | The object acted on | ScadaBridge `Target` (direct). OtOpcUa / MxGateway have no dedicated field → null or fold into `DetailsJson`. |
+| `SourceNode` | no | `string?` | Emitting logical node / host | OtOpcUa `SourceNode` (a logical node name, **not** an OPC UA NodeId); ScadaBridge `SourceNode`; MxGateway `RemoteAddress`. |
+| `CorrelationId` | no | `Guid?` | Join to originating request / workflow | OtOpcUa / ScadaBridge direct; MxGateway has none today (left null). |
+| `DetailsJson` | no | `string?` | Extension bag — all project-specific data | Must be valid JSON where stored (OtOpcUa enforces this with a CHECK constraint). Absorbs each app's surplus columns. |
+
+## `AuditOutcome` — definition and app-state mapping
+
+Three values, deliberately minimal — enough to normalize denials and failures without importing any
+app's full taxonomy. `Outcome` is **derived** at each emit site (no app persists it today; OtOpcUa
+encodes it implicitly in `EventType`, MxGateway in the event-type literal, ScadaBridge in `Status`):
+
+| `AuditOutcome` | Meaning | OtOpcUa (`EventType`) | MxGateway (event type) | ScadaBridge (`AuditStatus` / `AuditKind`) |
+|---|---|---|---|---|
+| **`Success`** | The action completed | config-write verbs — `DraftCreated`, `DraftEdited`, `Published`, `RolledBack`, `NodeApplied`, `CredentialAdded`, `ClusterCreated`, `NodeAdded`, `ExternalIdReleased`, … | key-lifecycle — `init-db`, `create-key`, `list-keys`, `revoke-key`, `rotate-key` + all `dashboard-*` | `Status = Delivered` |
+| **`Failure`** | The action was attempted and failed | *(none today — a failed actor flush is dropped, not recorded as an event)* | *(none emitted today)* | `Status ∈ { Failed, Parked, Discarded }` |
+| **`Denied`** | The action was rejected by authorization / policy | `OpcUaAccessDenied`, `CrossClusterNamespaceAttempt` | `constraint-denied` | `Kind = InboundAuthFailure` |
+
+Notes:
+
+- **OtOpcUa has no `Failure` source.** Its vocabulary only distinguishes success-verbs from
+  access-denials; an internal write failure is dropped (best-effort), not emitted as an event. So
+  OtOpcUa produces only `Success` / `Denied` until/unless it adds failure events.
+- **MxGateway emits only `Success` / `Denied`** today (no failure events; authentication
+  success/failure is surfaced as gRPC status, not persisted — see its current-state doc).
+- **ScadaBridge in-flight states** (`Submitted` / `Forwarded` / `Attempted`) are not terminal; when
+  projecting to a single `Outcome` they collapse to the last-known terminal state. `Skipped` is not a
+  user-facing outcome and is excluded from the canonical projection.
+
+## Per-project mapping table (canonical ← native record)
+
+Consolidated from the three current-state docs. "Direct" = field exists with the same role; the
+right-hand notes flag the type bridges and synthesized fields.
+
+| Canonical field | OtOpcUa `AuditEvent` (8 fields) | MxGateway `ApiKeyAuditRecord` (6 fields) | ScadaBridge `AuditEvent` (~25 fields) |
+|---|---|---|---|
+| `EventId` | `EventId` — direct (idempotency key) | **generate** new `Guid` (only `AuditId` rowid exists) | `EventId` — direct |
+| `OccurredAtUtc` | `OccurredAtUtc` (`DateTime` UTC) → widen | `CreatedUtc` (store-assigned `DateTimeOffset`) — direct | `OccurredAtUtc` (`DateTime` UTC-forced) → widen |
+| `Actor` | `Actor` — direct | `KeyId` (nullable → `"system"`/`"cli"` fallback) | `Actor` (nullable on system rows) |
+| `Action` | `Action` (persisted as `"{Category}:{Action}"`) | `EventType` — direct | `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
+| `Outcome` | **derive** from `EventType` | **derive**: `constraint-denied`→`Denied`, else `Success` | **derive** from `Status` (+`InboundAuthFailure`→`Denied`) |
+| `Category` | `Category` (`"Config"`) | constant `"ApiKey"` | `Channel` |
+| `Target` | — none — (null or via `DetailsJson`) | — none — (`commandKind`/`target` embedded in `Details` text) | `Target` — direct |
+| `SourceNode` | `SourceNode` (logical node, `NodeId.Value`) | `RemoteAddress` (dashboard path only) | `SourceNode` — direct |
+| `CorrelationId` | `CorrelationId` (`CorrelationId.Value`) — direct | — none — | `CorrelationId` — direct |
+| `DetailsJson` | `DetailsJson` — direct (also `ClusterId`/`GenerationId` on the SP path) | `Details` (plain string → store as-is or wrap) | the ~15 rich/plumbing fields (`ExecutionId`, `SourceSiteId`, `HttpStatus`, `DurationMs`, `ErrorMessage`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra`, `ForwardState`, …) serialize here |
+
+The canonical record is a **lossy projection**: it is sufficient for cross-project reporting, but each
+project keeps its native record as the storage shape — ScadaBridge especially, whose partitioned SQL
+schema, forwarding state, and reconciliation depend on the extra columns ([`SPEC.md`](SPEC.md) §5).
diff --git a/components/audit/spec/SPEC.md b/components/audit/spec/SPEC.md
new file mode 100644
index 0000000..1988c20
--- /dev/null
+++ b/components/audit/spec/SPEC.md
@@ -0,0 +1,145 @@
+# Audit — normalized target spec
+
+Status: **Draft**. The single design the sister projects converge on. Derived from the three
+code-verified current-state docs (`../current-state/`) and the locked design
+(`../../../docs/plans/2026-06-01-audit-component-design.md`). Goal is *path to shared code*
+(`../shared-contract/ZB.MOM.WW.Audit.md`), so each normalized section maps to a shared library seam.
+
+## 0. Normalized vs left-per-project
+
+**Normalized here** (the shared `ZB.MOM.WW.Audit` library):
+
+- **The canonical `AuditEvent` record** — required core (`EventId`, `OccurredAtUtc`, `Actor`,
+  `Action`, `Outcome`) + optional common (`Category`, `Target`, `SourceNode`, `CorrelationId`) +
+  the `DetailsJson` extension bag. The full field-by-field reference is [`EVENT-MODEL.md`](EVENT-MODEL.md).
+- **`AuditOutcome`** — the 3-value `Success | Failure | Denied` enum (§3). This is a *new*
+  normalized field every app derives; see [`EVENT-MODEL.md`](EVENT-MODEL.md) for the per-app derivation.
+- **The two seams** — `IAuditWriter` (best-effort, never throws to caller, §1) and `IAuditRedactor`
+  (pure, never throws, over-redacts on failure, §2).
+
+**Explicitly NOT normalized** (domain-specific / divergent — keep per project):
+
+- **Transport & storage** — OtOpcUa's Akka cluster-broadcast → singleton `AuditWriterActor` (batch
+  500 / 5 s, two-layer dedup) over `ConfigAuditLog`; MxGateway's SQLite `IApiKeyAuditStore` append +
+  list-recent; ScadaBridge's site-SQLite hot-path → central MS SQL ingest / reconcile / purge /
+  partition-maintenance / hash-chain pipeline. The shared core is **BCL-only** and carries no Akka /
+  EF / SQLite / Serilog dependency.
+- **Domain vocabulary** — ScadaBridge's `Channel` / `Kind` / `Status` / `ForwardState` enums and
+  OtOpcUa's `EventType` strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …). These map
+  *into* `Action` / `Category` / `Outcome` / `DetailsJson`; they do not leak into the shared type.
+- **Query / CLI / UI / export** surfaces (OtOpcUa `ClusterAudit.razor`; ScadaBridge `export` /
+  `verify-chain` CLI + Blazor audit pages; MxGateway's unused `ListRecentAsync`).
+- **Each app's redaction *policy*** — *which* fields/commands/payloads are sensitive. Only the
+  `IAuditRedactor` *seam* is shared; the `Default` / `Safe` filter behaviour stays per-project.
+
+> **Scope of the producer path.** OtOpcUa has **two producers** writing the same `ConfigAuditLog`
+> table — the structured Akka `AuditEvent` path *and* older SQL stored procedures that `INSERT`
+> directly (`SUSER_SNAME()`, bare `EventType`, NULL `EventId`). Normalization targets the
+> **structured producer path** (the one that builds an `AuditEvent`), not every SQL insert; the SP
+> path stays per-project and is a reconcile item, not an extraction item (`../GAPS.md`).
+
+## 1. The writer contract — `IAuditWriter` (best-effort)
+
+```csharp
+public interface IAuditWriter
+{
+    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
+}
+```
+
+Audit is a side-channel, never on the critical path. The hard rule:
+
+- **`WriteAsync` MUST NOT throw to the caller.** An implementation swallows/logs its own internal
+  failures; a failed write **must never abort the user-facing action** it is recording. (ScadaBridge's
+  seam already states this almost word-for-word: "Failures must NEVER abort the user-facing action.")
+- Idempotency is carried by `EventId`, so retries and at-least-once transports are safe (OtOpcUa's
+  filtered-unique `EventId` index and ScadaBridge's first-write-wins are both honoured by this key).
+- Delivery is at-most-once *as a contract* — a writer MAY drop on failure (OtOpcUa drops a failed
+  batch; ScadaBridge's ring-buffer fallback drops oldest). Durability is a per-project transport
+  decision, not part of this seam.
+
+Shipped helpers (the only concrete writers): `NoOpAuditWriter` (discards — tests / disabled audit),
+`CompositeAuditWriter` (fans out to N writers; **one writer throwing does not stop the others**), and
+`RedactingAuditWriter` (decorator: applies the redactor, then delegates to an inner writer).
+
+## 2. The redactor contract — `IAuditRedactor` (never throws)
+
+```csharp
+public interface IAuditRedactor
+{
+    AuditEvent Apply(AuditEvent rawEvent);
+}
+```
+
+A pure projection from a raw event to a safe one, applied between event construction and the writer
+chain. The hard rule:
+
+- **`Apply` MUST NOT throw.** On any internal failure it **over-redacts** (returns a strictly safer
+  event) rather than propagating — a redactor that throws would either crash the audit path or leak
+  the unredacted event. (ScadaBridge's `SafeDefaultAuditPayloadFilter` is the reference: header-only
+  redaction, over-redacts on parse failure.)
+- It is a **pure function** returning a filtered *copy* (via `with`); it does not mutate the input or
+  perform I/O.
+
+The seam is **aligned-but-independent** with Telemetry's `ILogRedactor` — same shape and naming
+discipline so a future `ZB.MOM.WW.Hosting` aggregator wires both with one mental model — but there is
+**no cross-package dependency**. Shipped helpers: `NullAuditRedactor` (identity — the default when no
+policy is configured) and `TruncatingAuditRedactor` (caps `DetailsJson` / `Target` to a configured
+max + sets a truncation marker; never throws). The *secret-field policy* (which fields/commands are
+sensitive) stays per-project via composition.
+
+## 3. `AuditOutcome` — the new normalized field
+
+`Outcome` is in the **required core**, but **no app stores it today** — each encodes outcome
+implicitly and must **derive** it at adoption (this is the one genuinely new field):
+
+- **OtOpcUa** — derived from the `EventType` vocabulary (`OpcUaAccessDenied` /
+  `CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`).
+- **MxGateway** — `constraint-denied` → `Denied`; key-lifecycle events → `Success`.
+- **ScadaBridge** — `AuditStatus` → `Outcome` (`Delivered` → `Success`; `Failed` / `Parked` /
+  `Discarded` → `Failure`; `InboundAuthFailure` kind → `Denied`).
+
+The three values normalize denials and failures across the family without importing any app's full
+taxonomy. The enum definition and the complete state-by-state mapping live in [`EVENT-MODEL.md`](EVENT-MODEL.md).
+
+## 4. The hinge — audit closes the loop on Auth
+
+Every audit row's `Actor` is the *who*, which is exactly the identity the **Auth** component already
+normalizes (LDAP/GLAuth principal, API-key name). Auth is the read side ("who is this and what may
+they do"); audit is the write side ("who did what"). The spec ties them by stating:
+
+- **`Actor` SHOULD be the `ZB.MOM.WW.Auth` principal** at adoption time.
+- But `Actor` is **kept as a plain `string`** in the contract, so the library carries **no dependency
+  on `ZB.MOM.WW.Auth`**. (MxGateway's keyless events — `init-db` / `list-keys` — supply a `"system"` /
+  `"cli"` fallback rather than leaving the required field empty.)
+
+This mirrors Auth's own decision to keep audit *read* inside `OBSERVE` and audit *export* inside
+`ADMINISTER` rather than minting a separate auditor role: the two components share a vocabulary, not a
+dependency.
+
+## 5. ScadaBridge is already at the target
+
+ScadaBridge already ships **both** seams: an `IAuditWriter` whose best-effort contract matches
+word-for-word, and an `IAuditPayloadFilter` that *is* the canonical `IAuditRedactor` under a different
+name (identical `AuditEvent Apply(AuditEvent)` signature, pure / never-throws / over-redacts). The
+library essentially **lifts ScadaBridge's seams**.
+
+The one real (non-naming) decision is the **writer's record type**: the canonical `IAuditWriter` is
+typed on the 10-field `AuditEvent`; ScadaBridge's writer is typed on its ~25-field record.
+
+> **Resolution (recommended):** share the **interface *name* + the `AuditOutcome` enum**, not the
+> record schema. ScadaBridge keeps its rich ~25-field record as its **storage shape** (its whole
+> transport / partition / forwarding / reconciliation layer is built on the extra columns), and maps
+> to the canonical 10-field record **only at cross-app reporting boundaries**. This is the
+> minimal-coupling option — share the contract, not the schema — and avoids making the shared seam
+> generic over the event type. ScadaBridge therefore converges by **renaming one interface** and
+> adopting `AuditOutcome`, with no transport / storage / CLI / UI change.
+
+## 6. Acceptance (what "converged" means)
+
+A project is converged when: (a) its structured audit-producer path constructs the canonical
+`AuditEvent` (with `Outcome` derived per §3) and persists via an implementation of `IAuditWriter`;
+(b) any redaction runs through an `IAuditRedactor`; (c) `Actor` carries the `ZB.MOM.WW.Auth` principal
+where one exists (string fallback otherwise); with its transport, storage, domain vocabulary, query
+surfaces, and redaction *policy* unchanged. Per-project deltas and the adoption backlog are in
+[`../GAPS.md`](../GAPS.md); the proposed library API is [`../shared-contract/ZB.MOM.WW.Audit.md`](../shared-contract/ZB.MOM.WW.Audit.md).
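
Reviewer note: the two hard rules in SPEC.md §1 and §2 (a writer never throws to its caller, a redactor over-redacts instead of throwing) can be illustrated with a small sketch. It is not the library's implementation. `SketchCompositeWriter`, `SketchRedactingWriter`, `SketchSecretRedactor`, `ThrowingWriter`, `ListWriter`, and the `apiKey` / `keyName` JSON fields are all hypothetical names invented for this example. It assumes C# 11 / .NET 7 or later (for `required`), and declares the types in the global namespace only so that top-level statements can be used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

// Demo: a failing writer sits next to a healthy one; the caller never sees the failure.
var sink = new ListWriter();
IAuditWriter writer = new SketchRedactingWriter(
    new SketchSecretRedactor(),
    new SketchCompositeWriter(new ThrowingWriter(), sink));

await writer.WriteAsync(NewEvent("{\"apiKey\":\"s3cret\",\"keyName\":\"ops\"}"));
await writer.WriteAsync(NewEvent("not json"));

Console.WriteLine(sink.Events.Count);                   // 2
Console.WriteLine(sink.Events[0].DetailsJson);          // JSON without the apiKey property
Console.WriteLine(sink.Events[1].DetailsJson is null);  // True (over-redacted)

static AuditEvent NewEvent(string? details) => new()
{
    EventId = Guid.NewGuid(),
    OccurredAtUtc = DateTimeOffset.UtcNow,
    Actor = "system",
    Action = "create-key",
    Outcome = AuditOutcome.Success,
    DetailsJson = details,
};

public enum AuditOutcome { Success, Failure, Denied }

// Same shape as the canonical record in EVENT-MODEL.md.
public sealed record AuditEvent
{
    public required Guid EventId { get; init; }
    public required DateTimeOffset OccurredAtUtc { get; init; }
    public required string Actor { get; init; }
    public required string Action { get; init; }
    public required AuditOutcome Outcome { get; init; }
    public string? Category { get; init; }
    public string? Target { get; init; }
    public string? SourceNode { get; init; }
    public Guid? CorrelationId { get; init; }
    public string? DetailsJson { get; init; }
}

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct = default); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent rawEvent); }

// Hypothetical fan-out writer: each inner failure is swallowed so the others still run.
public sealed class SketchCompositeWriter : IAuditWriter
{
    private readonly IAuditWriter[] _writers;
    public SketchCompositeWriter(params IAuditWriter[] writers) => _writers = writers;

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        foreach (var w in _writers)
        {
            try { await w.WriteAsync(evt, ct).ConfigureAwait(false); }
            catch (Exception) { /* best-effort: a real writer would log here */ }
        }
    }
}

// Hypothetical decorator: redact first, then delegate; never throws to the caller.
public sealed class SketchRedactingWriter : IAuditWriter
{
    private readonly IAuditRedactor _redactor;
    private readonly IAuditWriter _inner;
    public SketchRedactingWriter(IAuditRedactor redactor, IAuditWriter inner)
        => (_redactor, _inner) = (redactor, inner);

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        try { await _inner.WriteAsync(_redactor.Apply(evt), ct).ConfigureAwait(false); }
        catch (Exception) { /* drop the event rather than throw or leak */ }
    }
}

// Hypothetical policy: strip "apiKey"; on any parse problem drop DetailsJson entirely.
public sealed class SketchSecretRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent rawEvent)
    {
        if (rawEvent.DetailsJson is null) return rawEvent;
        try
        {
            if (JsonNode.Parse(rawEvent.DetailsJson) is not JsonObject obj)
                return rawEvent with { DetailsJson = null };   // not an object: over-redact
            obj.Remove("apiKey");
            return rawEvent with { DetailsJson = obj.ToJsonString() };
        }
        catch (Exception)
        {
            return rawEvent with { DetailsJson = null };       // parse failure: over-redact
        }
    }
}

public sealed class ThrowingWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) =>
        throw new InvalidOperationException("store unavailable");
}

public sealed class ListWriter : IAuditWriter
{
    public List<AuditEvent> Events { get; } = new();
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        Events.Add(evt);
        return Task.CompletedTask;
    }
}
```

Both events reach `ListWriter` even though `ThrowingWriter` fails on every call, and the malformed payload is stored with `DetailsJson` cleared rather than passed through unredacted. Whether a dropped write should also be logged or counted is a per-project transport decision, consistent with §1.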