docs(audit): current-state ScadaBridge

Joseph Doherty
2026-06-01 06:55:07 -04:00
parent 02cc687556
commit 9c8c1431af
@@ -0,0 +1,162 @@
# Audit — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET; solution `ZB.MOM.WW.ScadaBridge.slnx`.
Audit code centers on the dedicated `ZB.MOM.WW.ScadaBridge.AuditLog` project, with the shared
record + seams living in `ZB.MOM.WW.ScadaBridge.Commons`. All paths relative to repo root.
Verified 2026-06-01.
**By far the largest audit implementation in the family** — a full who-did-what pipeline
across a site SQLite hot-path and a central MS SQL store, with forwarding, reconciliation,
purge, partition maintenance, redaction, CLI export, hash-chain verify (v1 stub), and a Blazor
UI. **Key finding: ScadaBridge is already at the target.** It already has an `IAuditWriter`
best-effort seam (near-identical to the canonical contract) and an `IAuditPayloadFilter`
redaction seam (= the library's `IAuditRedactor`, just renamed). Adoption is *align, don't
replace* — mostly naming alignment; the enormous transport/storage/CLI/UI stays bespoke.
## 1. How it works today
### The record — `AuditEvent` (~25 fields)
`src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Audit/AuditEvent.cs:22` — a `sealed record`,
append-only, "single source of truth for AuditLog (#23) rows." Far richer than the canonical
10-field event (listed in §2). Notable fields:
- Identity / correlation: `EventId` (idempotency key, `:25`), `CorrelationId` (per-op
lifecycle, `:68`), `ExecutionId` (per-run, `:75`), `ParentExecutionId` (spawner link, `:82`).
- Classification: `Channel` (`:62`), `Kind` (`:65`), `Status` (`:109`) — the domain enums (below).
- Provenance: `SourceSiteId` (`:85`), `SourceNode` (`:94`, stamped from `INodeIdentityProvider`),
`SourceInstanceId` (`:97`), `SourceScript` (`:100`), `Actor` (`:103`), `Target` (`:106`).
- Outcome detail: `HttpStatus` (`:112`), `DurationMs` (`:115`), `ErrorMessage` (`:118`),
`ErrorDetail` (`:121`).
- Payload: `RequestSummary` / `ResponseSummary` (truncated+redacted, `:124`/`:127`),
`PayloadTruncated` (`:130`), `Extra` (free-form JSON, `:133`).
- Lifecycle plumbing: `IngestedAtUtc` (null on site, stamped at central ingest, `:52`),
`ForwardState` (site-only, null on central, `:136`).
**UTC-forcing init-setters.** `OccurredAtUtc` (`:39`) and `IngestedAtUtc` (`:52`) keep a backing
field and call `DateTime.SpecifyKind(value, DateTimeKind.Utc)` on assignment, so a value built
from a literal or rehydrated from a SQL Server `datetime2` column (which strips `Kind` on the
wire) cannot leak downstream as `Unspecified`/local. The record uses `DateTime` (not
`DateTimeOffset`) deliberately, to match the partitioned `datetime2` column shape (`:9-21`).
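
The UTC-forcing setter pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual source: `AuditEventSketch` is a hypothetical, trimmed-down stand-in for the real `AuditEvent`, and only the two timestamp properties are modeled.

```csharp
#nullable enable
using System;

// Hypothetical, trimmed-down stand-in for the real AuditEvent record.
public sealed record AuditEventSketch
{
    private readonly DateTime _occurredAtUtc;
    private readonly DateTime? _ingestedAtUtc;

    public Guid EventId { get; init; }

    // Every assignment is re-stamped as UTC. SpecifyKind only changes the
    // Kind flag; it does not shift the clock value.
    public DateTime OccurredAtUtc
    {
        get => _occurredAtUtc;
        init => _occurredAtUtc = DateTime.SpecifyKind(value, DateTimeKind.Utc);
    }

    // Null on the site side; re-stamped as UTC when a value is supplied.
    public DateTime? IngestedAtUtc
    {
        get => _ingestedAtUtc;
        init => _ingestedAtUtc = value.HasValue
            ? DateTime.SpecifyKind(value.Value, DateTimeKind.Utc)
            : (DateTime?)null;
    }
}
```

Because `DateTime.SpecifyKind` relabels rather than converts, this pattern assumes callers already supply UTC clock values; a genuinely local timestamp would be relabeled as UTC without adjustment.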
### Domain vocabulary — four enums
`src/ZB.MOM.WW.ScadaBridge.Commons/Types/Enums/`:
- `AuditChannel.cs:7` — trust boundary crossed: `ApiOutbound`, `DbOutbound`, `Notification`,
`ApiInbound`.
- `AuditKind.cs:8` — specific event within a channel: `ApiCall`, `ApiCallCached`, `DbWrite`,
`DbWriteCached`, `NotifySend`, `NotifyDeliver`, `InboundRequest`, `InboundAuthFailure`,
`CachedSubmit`, `CachedResolve`. Cached variants emit multiple rows per operation.
- `AuditStatus.cs:8` — lifecycle status of the row: `Submitted`, `Forwarded`, `Attempted`,
`Delivered`, `Failed`, `Parked`, `Discarded`, `Skipped`.
- `AuditForwardState.cs:9` — site-local forwarding state (central rows leave null): `Pending`,
`Forwarded`, `Reconciled`. The site retention purge MUST NOT drop a `Pending` row.
### The writer seam — `IAuditWriter` (best-effort, never aborts the action)
`src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/IAuditWriter.cs:10` — boundary-side
abstraction: `Task WriteAsync(AuditEvent evt, CancellationToken ct = default)` (`:18`). The
contract is explicit and matches the canonical seam almost word-for-word: **"Failures must NEVER
abort the user-facing action"** (`:8`), best-effort, "implementations must swallow/log internal
failures rather than propagating them to the calling boundary code" (`:13-14`).
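
One way to picture the best-effort contract is a decorator that wraps any writer and swallows its failures. The sketch below is illustrative only: `BestEffortAuditWriter`, `ThrowingWriter`, the `onFailure` callback, and the two-field `AuditEventSketch` are hypothetical names, and `IAuditWriter` is redeclared locally so the block stands alone. The real implementations may enforce the contract differently.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal event; the real record has ~25 fields.
public sealed record AuditEventSketch(Guid EventId, string? Actor);

// Local redeclaration of the seam's shape for this sketch.
public interface IAuditWriter
{
    Task WriteAsync(AuditEventSketch evt, CancellationToken ct = default);
}

// Hypothetical decorator enforcing "never abort the user-facing action".
public sealed class BestEffortAuditWriter : IAuditWriter
{
    private readonly IAuditWriter _inner;
    private readonly Action<Exception> _onFailure;

    public BestEffortAuditWriter(IAuditWriter inner, Action<Exception> onFailure)
    {
        _inner = inner;
        _onFailure = onFailure;
    }

    public async Task WriteAsync(AuditEventSketch evt, CancellationToken ct = default)
    {
        try
        {
            await _inner.WriteAsync(evt, ct).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Swallow and report. The reporting hook is guarded too, so a
            // broken logger cannot leak an exception to the caller.
            try { _onFailure(ex); } catch { }
        }
    }
}

// Hypothetical failing writer, used only to exercise the decorator.
public sealed class ThrowingWriter : IAuditWriter
{
    public Task WriteAsync(AuditEventSketch evt, CancellationToken ct = default)
        => Task.FromException(new InvalidOperationException("disk full"));
}
```

Wrapping `ThrowingWriter` in `BestEffortAuditWriter` yields a writer whose `WriteAsync` completes normally while the failure is routed to the callback.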
### The redaction seam — `IAuditPayloadFilter` (pure, never throws)
`src/ZB.MOM.WW.ScadaBridge.AuditLog/Payload/IAuditPayloadFilter.cs:22`: `AuditEvent Apply(AuditEvent rawEvent)`
(`:30`). Filters an event between construction and persistence:
truncates oversized payloads, redacts headers/body/SQL params, sets `PayloadTruncated`.
**Pure function** returning a filtered COPY via `with` expressions, and **MUST NOT throw**:
on internal failure it over-redacts and increments the `AuditRedactionFailure` health metric
(`:11-20`, `:26-28`). This is exactly the canonical `IAuditRedactor` under a different name.
Two implementations: `DefaultAuditPayloadFilter.cs:56` (full truncation + header/body/SQL
redaction with live options) and `SafeDefaultAuditPayloadFilter.cs:19` (always-safe fallback —
header-only redaction, over-redacts on parse failure, `:42-59`).
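
The "pure copy, never throws, over-redact on failure" behavior can be illustrated with a small filter. Everything here is an assumption made for illustration: `PayloadFilterSketch`, its constructor parameters, the `[REDACTED]` placeholder, and the two-property `AuditEventSketch` are hypothetical, and the interface is redeclared locally. The `onRedactionFailure` callback stands in for the `AuditRedactionFailure` metric.

```csharp
#nullable enable
using System;

// Hypothetical two-property stand-in for the real AuditEvent record.
public sealed record AuditEventSketch
{
    public string? RequestSummary { get; init; }
    public bool PayloadTruncated { get; init; }
}

// Local redeclaration of the seam's shape for this sketch.
public interface IAuditPayloadFilter
{
    AuditEventSketch Apply(AuditEventSketch rawEvent);
}

public sealed class PayloadFilterSketch : IAuditPayloadFilter
{
    private const string Redacted = "[REDACTED]";
    private readonly int _maxChars;
    private readonly Func<string, string> _redactBody;
    private readonly Action _onRedactionFailure;

    public PayloadFilterSketch(int maxChars, Func<string, string> redactBody, Action onRedactionFailure)
    {
        _maxChars = maxChars;
        _redactBody = redactBody;
        _onRedactionFailure = onRedactionFailure;
    }

    public AuditEventSketch Apply(AuditEventSketch rawEvent)
    {
        try
        {
            if (rawEvent.RequestSummary is null) return rawEvent;

            var redacted = _redactBody(rawEvent.RequestSummary);
            var truncated = redacted.Length > _maxChars;

            // `with` builds a copy; the input event is never mutated.
            return rawEvent with
            {
                RequestSummary = truncated ? redacted.Substring(0, _maxChars) : redacted,
                PayloadTruncated = rawEvent.PayloadTruncated || truncated,
            };
        }
        catch (Exception)
        {
            // Over-redact instead of throwing, and count the failure.
            try { _onRedactionFailure(); } catch { }
            return rawEvent with { RequestSummary = Redacted };
        }
    }
}
```

The failure path drops the whole summary rather than risk persisting an unredacted payload, which mirrors the over-redaction rule described above.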
### Transport / storage / pipeline — stays per-project
The `ZB.MOM.WW.ScadaBridge.AuditLog` project is split into `Site/`, `Central/`, `Payload/`, and
`Configuration/`. This is the bespoke half and is **not** a candidate for extraction; cited here
only to show the scale around the common core:
- **Site hot-path:** `Site/SqliteAuditWriter.cs:32` (`IAuditWriter` over an owned `SqliteConnection`
fed by a bounded `Channel<T>` drained on a background task, so script-thread callers never block
on disk I/O; first-write-wins on duplicate `EventId`). `Site/FallbackAuditWriter.cs:28` composes
the SQLite writer with a drop-oldest `RingBufferFallback` so a primary failure never bubbles out.
`Site/Telemetry/` forwards rows to central over Akka `ClusterClient`.
- **Central ingest/store:** `Central/CentralAuditWriter.cs:40` (`ICentralAuditWriter`, direct MS SQL
write for central-originated events, per-call EF scope, idempotent `InsertIfNotExistsAsync`,
swallows every exception per "alog.md §13"). `Central/AuditLogIngestActor.cs:46` batches site
telemetry; `Central/SiteAuditReconciliationActor.cs:68` periodically pulls to catch dropped
forwards; `Central/AuditLogPurgeActor.cs:58` enforces retention; `Central/AuditLogPartitionMaintenanceService.cs:55`
manages the partitioned table.
- **CLI:** `CLI/Commands/AuditCommands.cs:21` builds `export` (`:137`, formats `csv`/`jsonl`/`parquet`)
and `verify-chain` (`:226`). Hash-chain verify is currently a **v1 no-op stub**:
`CLI/Commands/AuditVerifyChainHelpers.cs:6-10` ("v1 is a no-op").
- **UI:** Blazor pages under `CentralUI/Components/Pages/Audit/` (e.g. `AuditLogPage.razor:1`,
gated by `[Authorize(Policy = AuthorizationPolicies.OperationalAudit)]`) plus drill-down
components in `CentralUI/Components/Audit/`.
- **Wiring:** `AuditLog/ServiceCollectionExtensions.cs:59` `AddAuditLog(...)`, `:316`
`AddAuditLogCentralMaintenance(...)`.
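
The site hot-path idea (a bounded `Channel<T>` drained on a background task, with first-write-wins on duplicate `EventId`) might look roughly like the sketch below. It is not the real `SqliteAuditWriter`: `ChannelBackedWriterSketch`, `QueuedAuditEvent`, and the `persist` callback are hypothetical, and an in-memory `HashSet<Guid>` stands in for whatever de-duplication the SQLite layer actually performs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical queued item; the real writer queues full AuditEvent records.
public sealed record QueuedAuditEvent(Guid EventId, string Payload);

public sealed class ChannelBackedWriterSketch : IAsyncDisposable
{
    private readonly Channel<QueuedAuditEvent> _queue;
    private readonly HashSet<Guid> _seen = new HashSet<Guid>();
    private readonly Action<QueuedAuditEvent> _persist;
    private readonly Task _drain;

    public ChannelBackedWriterSketch(int capacity, Action<QueuedAuditEvent> persist)
    {
        _persist = persist;
        _queue = Channel.CreateBounded<QueuedAuditEvent>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait,
        });
        _drain = Task.Run(() => DrainAsync());
    }

    // Non-blocking for the caller: returns false when the queue is full,
    // at which point a fallback (for example a ring buffer) could take over.
    public bool TryEnqueue(QueuedAuditEvent evt) => _queue.Writer.TryWrite(evt);

    private async Task DrainAsync()
    {
        await foreach (var evt in _queue.Reader.ReadAllAsync())
        {
            // Duplicate EventId: the first write wins, later ones are skipped.
            if (!_seen.Add(evt.EventId)) continue;

            try { _persist(evt); }
            catch (Exception) { /* a real writer would log or hand off to a fallback */ }
        }
    }

    // Stops accepting writes and waits for the queue to drain.
    public async ValueTask DisposeAsync()
    {
        _queue.Writer.Complete();
        await _drain.ConfigureAwait(false);
    }
}
```

All disk work happens on the drain task, so the producing thread only pays for an in-memory `TryWrite`.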
## 2. Mapping to the canonical record
Target (`ZB.MOM.WW.Audit`, being built): `record AuditEvent { Guid EventId; DateTimeOffset
OccurredAtUtc; string Actor; string Action; AuditOutcome Outcome; string? Category; string?
Target; string? SourceNode; Guid? CorrelationId; string? DetailsJson; }`. ScadaBridge's record is
an informational superset: most canonical fields map directly, `Action`, `Outcome`, and `Category` are derived, and the rich extras collapse into `DetailsJson`.
| Canonical field | ScadaBridge source | Notes |
|---|---|---|
| `EventId` (Guid) | `AuditEvent.EventId` | Direct; same idempotency-key role. |
| `OccurredAtUtc` (DateTimeOffset) | `AuditEvent.OccurredAtUtc` (`DateTime`, UTC-forced) | Type bridge `DateTime`(Utc)↔`DateTimeOffset`; semantics identical. |
| `Actor` (string) | `AuditEvent.Actor` (nullable) | Direct; ScadaBridge allows null (system-originated rows). |
| `Action` (string) | `AuditEvent.Kind` (+`Channel`) | Derive a stable action string, e.g. `{Channel}.{Kind}` (`ApiOutbound.ApiCall`). |
| `Outcome` (Success/Failure/Denied) | `AuditEvent.Status` | `Delivered`→Success; `Failed`/`Parked`/`Discarded`→Failure; `InboundAuthFailure`(Kind)→Denied; in-flight `Submitted`/`Forwarded`/`Attempted` collapse to the last-known terminal state when projecting. |
| `Category` (string?) | `AuditEvent.Channel` | The coarse bucket; pairs with `Action` above. |
| `Target` (string?) | `AuditEvent.Target` | Direct. |
| `SourceNode` (string?) | `AuditEvent.SourceNode` | Direct (`node-a`/`central-b`/…). |
| `CorrelationId` (Guid?) | `AuditEvent.CorrelationId` | Direct (per-op lifecycle id). |
| `DetailsJson` (string?) | `ExecutionId`, `ParentExecutionId`, `SourceSiteId`, `SourceInstanceId`, `SourceScript`, `HttpStatus`, `DurationMs`, `ErrorMessage`, `ErrorDetail`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra`, `IngestedAtUtc`, `ForwardState` | The ~15 rich/plumbing fields serialize into the canonical `DetailsJson` extension. |
The canonical record is a lossy *projection* of ScadaBridge's — fine for cross-project
reporting, but ScadaBridge keeps its full record as the storage shape (the partitioned SQL
schema, forwarding state, and reconciliation all depend on the extra columns).
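
The mapping table can be read as a projection function. The sketch below is one possible reading, under stated assumptions: `RichEventSketch` models only a handful of the ~25 fields, `CanonicalProjection` and both record names are hypothetical, the `"system"` placeholder for a null `Actor` is an invented convention, and in-flight or `Skipped` rows return `null` so the caller can resolve the last-known terminal state itself.

```csharp
#nullable enable
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Failure, Denied }
public enum AuditChannel { ApiOutbound, DbOutbound, Notification, ApiInbound }
public enum AuditKind
{
    ApiCall, ApiCallCached, DbWrite, DbWriteCached, NotifySend, NotifyDeliver,
    InboundRequest, InboundAuthFailure, CachedSubmit, CachedResolve
}
public enum AuditStatus { Submitted, Forwarded, Attempted, Delivered, Failed, Parked, Discarded, Skipped }

// Hypothetical subset of the rich ScadaBridge record.
public sealed record RichEventSketch(
    Guid EventId, DateTime OccurredAtUtc, string? Actor,
    AuditChannel Channel, AuditKind Kind, AuditStatus Status,
    string? Target, string? SourceNode, Guid? CorrelationId,
    int? HttpStatus, long? DurationMs);

// Mirrors the canonical field list quoted in this section.
public sealed record CanonicalEventSketch(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    AuditOutcome Outcome, string? Category, string? Target, string? SourceNode,
    Guid? CorrelationId, string? DetailsJson);

public static class CanonicalProjection
{
    // Returns null for rows with no terminal outcome in the mapping table.
    public static AuditOutcome? ToOutcome(AuditKind kind, AuditStatus status)
    {
        if (kind == AuditKind.InboundAuthFailure) return AuditOutcome.Denied;
        return status switch
        {
            AuditStatus.Delivered => (AuditOutcome?)AuditOutcome.Success,
            AuditStatus.Failed or AuditStatus.Parked or AuditStatus.Discarded => AuditOutcome.Failure,
            _ => null, // Submitted / Forwarded / Attempted / Skipped
        };
    }

    public static CanonicalEventSketch? Project(RichEventSketch rich)
    {
        var outcome = ToOutcome(rich.Kind, rich.Status);
        if (outcome is null) return null;

        // Only two of the rich extras are serialized here, for brevity.
        var details = JsonSerializer.Serialize(new { rich.HttpStatus, rich.DurationMs });

        return new CanonicalEventSketch(
            rich.EventId,
            new DateTimeOffset(DateTime.SpecifyKind(rich.OccurredAtUtc, DateTimeKind.Utc)),
            rich.Actor ?? "system", // assumption: placeholder for system-originated rows
            $"{rich.Channel}.{rich.Kind}",
            outcome.Value,
            rich.Channel.ToString(),
            rich.Target, rich.SourceNode, rich.CorrelationId, details);
    }
}
```

Keeping the projection in a static helper leaves the storage record untouched, which fits the point above that the canonical shape is a reporting view rather than the storage schema.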
## 3. Adoption plan → `ZB.MOM.WW.Audit`
**Posture: align, don't replace.** ScadaBridge is the reference implementation the shared
library is being extracted *from*; it already has both seams. Adoption is mostly renaming and
contract-confirmation, with a deliberately small touched surface and a large blast radius if
done carelessly. **Priority: LOW. Blast radius: HIGH.**
**Align (small, naming-level):**
- **Rename the redaction seam to match the contract.** `IAuditPayloadFilter` → adopt
`ZB.MOM.WW.Audit.IAuditRedactor` (`AuditEvent Apply(AuditEvent)` — identical signature and
pure/never-throws contract). Either alias `IAuditPayloadFilter : IAuditRedactor` during
transition or rename outright; `DefaultAuditPayloadFilter` / `SafeDefaultAuditPayloadFilter`
implement it unchanged. See [`../../shared-contract/`](../../shared-contract/).
- **Confirm the writer contract matches.** `IAuditWriter.WriteAsync(AuditEvent, CancellationToken
= default)` already has the canonical method shape, and the "never abort the
user-facing action" wording matches. The only delta is the **record type**: the library's
`IAuditWriter` is typed on the *canonical* 10-field `AuditEvent`, while ScadaBridge's is typed on
its ~25-field record. Resolve by either (a) keeping ScadaBridge's writer on its own rich record
and adopting only the library's *interface name + outcome enum*, or (b) having the shared seam be
generic over the event type. **Recommended: (a)** — adopt the canonical `AuditOutcome` enum and
the interface naming, but keep the bespoke `AuditEvent` as ScadaBridge's storage record, since the
whole transport/partition/forwarding layer is built on its extra columns. (Best-practice fit: this
is the minimal-coupling option — share the contract, not the schema.)
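
The "alias during transition" option for the redaction seam could be sketched as below. Both interfaces are declared locally as stand-ins, so this glosses over which event type the shared library's `IAuditRedactor` is actually typed on. `HeaderOnlyFilterSketch` and the one-property `AuditEventSketch` are hypothetical.

```csharp
#nullable enable
using System;

// Hypothetical one-property stand-in for the event record.
public sealed record AuditEventSketch
{
    public string? RequestSummary { get; init; }
}

// Local stand-in for the shared-library seam.
public interface IAuditRedactor
{
    AuditEventSketch Apply(AuditEventSketch rawEvent);
}

// Transition alias: existing registrations and constructor parameters that
// name the old interface keep compiling, and new code depends on IAuditRedactor.
[Obsolete("Use IAuditRedactor; this alias exists only for the transition.")]
public interface IAuditPayloadFilter : IAuditRedactor { }

#pragma warning disable CS0618 // the implementation names the alias on purpose
public sealed class HeaderOnlyFilterSketch : IAuditPayloadFilter
#pragma warning restore CS0618
{
    // Crude over-redaction, only to give the sketch a body.
    public AuditEventSketch Apply(AuditEventSketch rawEvent) =>
        rawEvent with { RequestSummary = rawEvent.RequestSummary is null ? null : "[REDACTED]" };
}
```

With the alias in place, an implementation can be consumed through either interface, so call sites can migrate one at a time before the old name is deleted.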
**Keep bespoke (the large, untouched majority):**
- The entire `Site/` (SQLite hot-path + ring-buffer fallback + telemetry forwarder) and `Central/`
(ingest / reconcile / purge / partition maintenance) pipeline.
- The `AuditEvent` rich record itself, the four domain enums (`AuditChannel`/`AuditKind`/
`AuditStatus`/`AuditForwardState`), CLI `export`/`verify-chain`, and the Blazor audit UI.
- The redaction *policy* (`DefaultAuditPayloadFilter` options, per-target overrides) — only the
interface name is shared, not the implementation.
**Net:** ScadaBridge converges by renaming one interface and adopting the canonical `AuditOutcome`
enum plus the `Kind`/`Channel`→`Action`/`Category` and rich-extras→`DetailsJson` projection for any
cross-project reporting. No transport, storage, CLI, or UI is replaced. Sequencing and the
cross-project gap list live in [`../../GAPS.md`](../../GAPS.md); the canonical target is
[`../../spec/SPEC.md`](../../spec/SPEC.md).