# Audit — gaps & adoption backlog
Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach the shared `ZB.MOM.WW.Audit` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
> **✅ ADOPTED 2026-06-02 (local-only) — DEEP.** The backlog (#1–#6) was implemented across all three apps on each repo's
> **`feat/adopt-zb-audit`** branch (stacked on `feat/adopt-zb-auth`) — committed + spec/code-reviewed, then **merged to
> each repo's local default (main/master) and PUSHED to origin (gitea) on 2026-06-03** (in sync). The user chose **DEEP adopt**:
> the canonical 10-field `AuditEvent` is the record EVERYWHERE
> (domain fields ride in `DetailsJson`), so the §1 "keep own record" framing below was superseded. OtOpcUa: canonical
> record + `AuditWriterActor : IAuditWriter` + `Outcome` col/migration + `ClusterAudit` fix. MxGateway: canonical SQLite
> `audit_event` store + `IAuditWriter` + `IApiKeyAuditStore`→canonical adapter. **ScadaBridge: a full audit-subsystem
> re-architecture** (codec + site `audit_event`/`audit_forward_state` sidecar + central partitioned-table collapse to
> 10 canonical + persisted computed cols, MSSQL-verified). §5 (Actor→Auth principal) wired via per-app
> `IAuditActorAccessor` (Phase 3). The Task 2.0 gate found this doc's pre-adoption framing was partly stale (MxGateway's
> store had moved into the lib; OtOpcUa's structured path was dormant; ScadaBridge's filter was typed to its own record).
> Detail: `docs/plans/2026-06-02-auth-audit-normalization-phase2-deep.md` + `…-scadabridge-audit-rearch.md`. The
> ⛔/🟡 cells below describe the PRE-adoption divergence (kept for history).
## Divergence vs spec
### §1 Canonical record (`AuditEvent`)
| Canonical field | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| `EventId` (Guid, required) | ✅ — idempotency key; buffer key + filtered-unique DB index | ⛔ — no event key; only an `AUTOINCREMENT` rowid (`AuditId`) | ✅ — direct |
| `OccurredAtUtc` (DateTimeOffset, required) | 🟡 — `DateTime` UTC; widen at mapping boundary | 🟡 — `DateTimeOffset` but store-assigned (not caller-supplied); direct after widening | 🟡 — `DateTime` UTC-forced; widen at mapping boundary |
| `Actor` (string, required) | ✅ — direct (`AuditEvent.Actor` → `ConfigAuditLog.Principal`) | 🟡 — `KeyId` nullable; keyless events (`init-db`/`list-keys`) need a `"system"`/`"cli"` fallback | 🟡 — nullable on system-originated rows; fallback needed |
| `Action` (string, required) | 🟡 — `Action` field exists, but persisted as `"{Category}:{Action}"` composite in `EventType`; canonical keeps them separate | ✅ — `EventType` literal direct | 🟡 — derived as `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
| `Outcome` (AuditOutcome, required) | ⛔ **NEW** — derived from `EventType` vocabulary; not stored today | ⛔ **NEW** — derived: `constraint-denied` → `Denied`, else `Success` | ⛔ **NEW** — derived from `Status` (+`InboundAuthFailure` Kind→`Denied`) |
| `Category` (string?) | ✅ — `AuditEvent.Category` (e.g. `"Config"`) | ⛔ — no field; constant `"ApiKey"` at mapping | ✅ — `Channel` |
| `Target` (string?) | ⛔ — no dedicated field; closest is `DetailsJson` | ⛔ — embedded in `Details` text (`commandKind`/`target`) | ✅ — direct |
| `SourceNode` (string?) | ✅ — `SourceNode` (logical cluster node / host name, NOT an OPC UA NodeId) | 🟡 — `RemoteAddress`; dashboard path only (null on CLI/constraint paths) | ✅ — direct |
| `CorrelationId` (Guid?) | ✅ — direct (`CorrelationId.Value`) | ⛔ — not captured today; left null | ✅ — direct |
| `DetailsJson` (string?) | ✅ — direct (JSON CHECK constraint enforced) | 🟡 — `Details` is a plain string, not JSON; wrap or store as-is | 🟡 — ~15 rich/plumbing fields serialize here at the cross-project reporting boundary |
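
The MxAccessGateway column above can be read as a mapping function. The following is a minimal sketch only, assuming C# 9 or later: the `AuditEvent` and `ApiKeyAuditEntry` shapes are reconstructed from the table rather than copied from the library, `MxGatewayAuditMapper` is a hypothetical name, and wrapping `Details` in a `{"text": ...}` object is just one of the two options left open for that field.

```csharp
#nullable enable
using System;
using System.Text.Json;

public enum AuditOutcome { Success, Denied, Failure }

// Assumed canonical shape, reconstructed from the §1 table (not the library source).
public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action,
    AuditOutcome Outcome, string? Category, string? Target, string? SourceNode,
    Guid? CorrelationId, string? DetailsJson);

// Assumed 4-field entry shape; the field names are inferred from the table cells.
public sealed record ApiKeyAuditEntry(
    string? KeyId, string EventType, string? RemoteAddress, string? Details);

// Hypothetical mapper illustrating the MxAccessGateway column.
public static class MxGatewayAuditMapper
{
    public static AuditEvent ToCanonical(
        ApiKeyAuditEntry entry, DateTimeOffset occurredAtUtc, string fallbackActor = "system") =>
        new AuditEvent(
            EventId: Guid.NewGuid(),                 // no event key today, so generate one per write
            OccurredAtUtc: occurredAtUtc,            // supplied by the caller or the store
            Actor: string.IsNullOrEmpty(entry.KeyId) ? fallbackActor : entry.KeyId,
            Action: entry.EventType,                 // EventType literal maps directly
            Outcome: entry.EventType == "constraint-denied"
                ? AuditOutcome.Denied
                : AuditOutcome.Success,
            Category: "ApiKey",                      // constant at the mapping boundary
            Target: null,                            // still embedded in Details text
            SourceNode: entry.RemoteAddress,         // null on CLI/constraint paths
            CorrelationId: null,                     // not captured today
            DetailsJson: entry.Details is null
                ? null
                : JsonSerializer.Serialize(new { text = entry.Details }));
}
```

A keyless CLI event would pass `fallbackActor: "cli"`; everything else falls out of the entry itself.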
### §2 `IAuditWriter` seam
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no `IAuditWriter`; `AuditWriterActor` is the sink, consumed directly via Akka messaging | ⛔ — `IApiKeyAuditStore` (narrow, two-method) is the seam; no general `IAuditWriter` | ✅ — `IAuditWriter` with `WriteAsync(AuditEvent, CancellationToken)` signature; "failures must NEVER abort the user-facing action" contract; best-effort |
| Best-effort / never throws | 🟡 — the actor drops a failed flush (best-effort), but the seam is not a typed interface a caller can inject independently | ⛔ — no contract; `AppendAsync` may propagate | ✅ |
| Record type at the seam | 🟡 — OtOpcUa's own `AuditEvent` (8 fields, with Commons value-types `NodeId`/`CorrelationId`) | ⛔ — `ApiKeyAuditEntry` (4 fields) | 🟡 — ScadaBridge's ~25-field `AuditEvent` (rich record; adoption = keep own record, adopt canonical interface name + `AuditOutcome`) |
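
The "best-effort / never throws" row can be made concrete with a small decorator. This is a sketch under assumptions, not the library's implementation: the text above only gives the `WriteAsync(AuditEvent, CancellationToken)` signature, so the `Task` return type, the trimmed `AuditEvent` stand-in, and the `BestEffortAuditWriter` / `FailingAuditWriter` names are all hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in for the canonical record; only enough fields to compile.
public sealed record AuditEvent(Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action);

// Assumed seam shape: the Task return type is a guess, the parameters come from the table.
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken);
}

// Hypothetical decorator: turns any writer into a best-effort one.
public sealed class BestEffortAuditWriter : IAuditWriter
{
    private readonly IAuditWriter _inner;
    private readonly Action<Exception> _onFailure;

    public BestEffortAuditWriter(IAuditWriter inner, Action<Exception> onFailure)
    {
        _inner = inner;
        _onFailure = onFailure;
    }

    public async Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken)
    {
        try
        {
            await _inner.WriteAsync(auditEvent, cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // An audit failure (including cancellation) must never abort the
            // user-facing action, so it is reported to the callback and swallowed.
            _onFailure(ex);
        }
    }
}

// Demonstration-only writer that always fails.
public sealed class FailingAuditWriter : IAuditWriter
{
    public Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken) =>
        Task.FromException(new InvalidOperationException("audit store unavailable"));
}
```

Wrapping at the seam keeps each project's bespoke sink unchanged while giving callers a single injectable type whose failures stay contained.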
### §3 `IAuditRedactor` seam
| | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no redactor; no payload filtering today | ⛔ — no redactor; safety by construction (entry type cannot carry a secret) | ✅ — `IAuditPayloadFilter` (`AuditEvent Apply(AuditEvent)`, pure/never-throws/over-redacts); **only the name differs** from canonical `IAuditRedactor` |
| Over-redacts on failure | ⛔ — n/a | ⛔ — n/a | ✅ — `SafeDefaultAuditPayloadFilter` is the reference |
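
"Over-redacts on failure" means that a redactor which cannot finish its work withholds the payload rather than passing it through. A minimal sketch of that behaviour follows, assuming the canonical `IAuditRedactor` keeps the `AuditEvent Apply(AuditEvent)` shape quoted for `IAuditPayloadFilter`. The trimmed record, the `OverRedactingRedactor` name, and the placeholder value are hypothetical and are not taken from `SafeDefaultAuditPayloadFilter`.

```csharp
#nullable enable
using System;

// Trimmed stand-in for the canonical record; only the fields needed here.
public sealed record AuditEvent(Guid EventId, string Actor, string Action, string? DetailsJson);

// Assumed canonical seam, mirroring the Apply signature quoted in the table.
public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent auditEvent);
}

// Hypothetical wrapper: never throws, and drops the payload if redaction fails.
public sealed class OverRedactingRedactor : IAuditRedactor
{
    // Placeholder written in place of a payload that could not be safely redacted.
    public const string RedactedPlaceholder = "{\"redacted\":true}";

    private readonly Func<AuditEvent, AuditEvent> _redact;

    public OverRedactingRedactor(Func<AuditEvent, AuditEvent> redact) => _redact = redact;

    public AuditEvent Apply(AuditEvent auditEvent)
    {
        try
        {
            return _redact(auditEvent);
        }
        catch (Exception)
        {
            // Redaction itself failed: keep the envelope, discard the details.
            return auditEvent with { DetailsJson = RedactedPlaceholder };
        }
    }
}
```

The envelope fields survive so the event still records who did what; only the free-form details are sacrificed.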
### §4 `AuditOutcome` — the new normalized field
`Outcome` is a **genuinely new field** across all three projects. No app stores it today;
each encodes it implicitly. All three must derive and emit it at adoption:

**Gap O1 (OtOpcUa):** derive from `EventType` vocabulary: `OpcUaAccessDenied` /
`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`. No `Failure`
value exists in OtOpcUa's vocabulary today (failed flushes are dropped, not emitted), so
OtOpcUa will produce only `Success` / `Denied` until/unless failure events are added.

**Gap O2 (MxGateway):** derive: `constraint-denied` → `Denied`; all others → `Success`.
No `Failure` events are emitted today.

**Gap O3 (ScadaBridge):** derive from `AuditStatus`: `Delivered` → `Success`;
`Failed` / `Parked` / `Discarded` → `Failure`; `Kind = InboundAuthFailure` → `Denied`.
In-flight states (`Submitted` / `Forwarded` / `Attempted`) collapse to the last-known
terminal state when projecting; `Skipped` is excluded from the canonical projection.
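
The three derivation rules above can be sketched as pure functions. This is illustrative only and rests on several assumptions: the enum members are inferred from the values named in the text, inputs are taken as plain strings (for OtOpcUa, the bare action name rather than the persisted `{Category}:{Action}` composite), the `InboundAuthFailure` check is assumed to take precedence over `Status`, and `null` is used to signal "no terminal outcome yet or excluded" so the caller can keep the last-known value. `OutcomeDerivation` is a hypothetical name.

```csharp
using System;

// Assumed members, inferred from the values named in the text.
public enum AuditOutcome { Success, Denied, Failure }

// Hypothetical helper collecting the per-project derivation rules.
public static class OutcomeDerivation
{
    // Gap O1: only Success / Denied exist in OtOpcUa's vocabulary today.
    public static AuditOutcome FromOtOpcUa(string eventType) =>
        eventType is "OpcUaAccessDenied" or "CrossClusterNamespaceAttempt"
            ? AuditOutcome.Denied
            : AuditOutcome.Success;

    // Gap O2: constraint-denied is the only non-success event.
    public static AuditOutcome FromMxGateway(string eventType) =>
        eventType == "constraint-denied" ? AuditOutcome.Denied : AuditOutcome.Success;

    // Gap O3: returns null for in-flight states and for Skipped, leaving the
    // caller to keep the last-known terminal outcome or drop the row.
    public static AuditOutcome? FromScadaBridge(string status, string kind)
    {
        if (kind == "InboundAuthFailure")
        {
            return AuditOutcome.Denied; // assumed to win over Status
        }

        return status switch
        {
            "Delivered" => (AuditOutcome?)AuditOutcome.Success,
            "Failed" or "Parked" or "Discarded" => AuditOutcome.Failure,
            _ => null, // Submitted / Forwarded / Attempted / Skipped
        };
    }
}
```

Keeping the rules as pure functions makes it easy to pin the mapping with table-driven tests before any emit site is touched.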
### §5 `Actor` → Auth principal
At adoption, every emit site should supply the `ZB.MOM.WW.Auth` principal as `Actor`
(string). The library carries no Auth dependency — `Actor` is a plain `string` — but the
handshake with Auth is the semantic goal (closes the loop).

**Gap P1 (all 3):** at adoption, update emit sites to populate `Actor` from the Auth
principal (LDAP user / API-key name). Auth adoption (#8 in `components/auth/GAPS.md`) is a
prerequisite for the full story; until then, use the existing actor string.
### §6 OtOpcUa two-producer problem
OtOpcUa has **two writers to `ConfigAuditLog`**: the structured Akka `AuditEvent` path AND
older SQL stored procedures that `INSERT` directly (bare `EventType`, NULL `EventId` /
`CorrelationId`, populated `ClusterId` / `GenerationId`). Normalization targets the
structured path only; the SP path stays per-project.

**Gap Q1 (OtOpcUa):** decide at adoption whether to route SP events through the actor
or leave them non-idempotent. Also resolve the `ClusterId` filter mismatch: Admin UI
`ClusterAudit.razor` filters by `ClusterId`, but the actor path sets `NodeId`, not
`ClusterId`, so structured rows are invisible to the cluster view. Fix this when
normalizing the query surface.
## Adoption backlog (ordered)
| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | **OtOpcUa:** rename `AuditWriterActor` → implements `IAuditWriter`; replace `Commons/Messages/Audit/AuditEvent.cs` with canonical record; add `Outcome` derivation at every emit site (Gap O1) | OtOpcUa | Med | M | Med | Actor internals (batching / dedup / flush triggers) stay bespoke; only the seam type and record change. Commons value-types `NodeId`/`CorrelationId` bridged at construction. |
| 2 | **MxGateway:** map `IApiKeyAuditStore` / `ApiKeyAuditEntry` / `ApiKeyAuditRecord` → `IAuditWriter` / `AuditEvent`; generate `EventId` per write; add `"system"`/`"cli"` Actor fallback; constant `Category = "ApiKey"`; `constraint-denied` → `Outcome.Denied` (Gaps O2, record gaps) | MxGateway | Low | S | Med | ⚠ **COORDINATE** — a parallel session is editing this repo for the MEL→Serilog migration (Health/Telemetry normalization). Do NOT start until the Serilog session has landed (or is explicitly fenced off); the two efforts share `Security/Authentication/` DI wiring. |
| 3 | **ScadaBridge:** rename `IAuditPayloadFilter` → `IAuditRedactor` (or alias during transition); adopt canonical `AuditOutcome` enum (Gap O3); confirm writer contract matches (already byte-for-byte) | ScadaBridge | Low | S | High | **"Align, don't replace."** Blast radius is HIGH — `IAuditPayloadFilter` is used across the entire pipeline (site, central, wiring). Rename + alias only; no transport/storage/record change. `DefaultAuditPayloadFilter` / `SafeDefaultAuditPayloadFilter` implementations unchanged. |
| 4 | **All:** populate `Actor` from `ZB.MOM.WW.Auth` principal at emit sites (Gap P1) | All 3 | Low | S | Low | **Prerequisite:** Auth adoption per `components/auth/GAPS.md` #8. Until Auth is adopted, leave the existing actor string as-is. |
| 5 | **OtOpcUa:** reconcile two-producer problem — decide SP path routing + fix `ClusterId`-filter / actor mismatch in `ClusterAudit.razor` (Gap Q1) | OtOpcUa | Low | S | Low | Normalization does not unify the SP path; this is a reconcile item to decide and document. The mismatch means structured `AuditEvent` rows are currently invisible to the cluster-scoped view. |
| 6 | **MxGateway:** add `CorrelationId` capture at constraint denial + dashboard paths; structured `Target` from `Details` text (currently embedded as a plain string in `ConstraintEnforcer`) | MxGateway | Low | S | Low | Nice-to-have parity; not required for adoption. `CorrelationId` and `Target` canonical fields left null until this is done. |

**Sequencing:** #3 (ScadaBridge rename) is small and self-contained, but it carries the
widest blast radius of the set; do it first or last, depending on blast-radius appetite.
#1 (OtOpcUa) is medium effort but independent; it can start once the shared library is built.
#2 (MxGateway) is the smallest code change but has the highest **coordination dependency**:
gate it on the Serilog migration landing first. #4 (Actor→Auth) is blocked on Auth adoption
and is the last to close. #5 and #6 are cleanup items with no bearing on shared-library adoption.

Each adoption lands as an opt-in version bump per project behind the seam; the shared library
is consumed but the bespoke transport/storage/UI for each project is not touched.
## Decisions still open
- ScadaBridge `IAuditPayloadFilter` → `IAuditRedactor`: outright rename vs. transitional alias
(both are valid; alias reduces blast radius in the short term).
- MxGateway `Details` plain string → `DetailsJson`: store as-is or wrap in a JSON object at
the mapping boundary.
- `AuditOutcome` column in OtOpcUa storage: add a new `Outcome` column to `ConfigAuditLog`
or fold into `DetailsJson` / derive at read time (schema change vs. runtime cost).
- OtOpcUa SP path: route through the actor path (unified producer) or leave as a bespoke
secondary writer with its own column conventions (separate reconcile effort).