docs(audit): README + GAPS adoption backlog
@@ -0,0 +1,114 @@
# Audit — gaps & adoption backlog

Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to
reach the shared `ZB.MOM.WW.Audit` library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.

> **Adoption is deferred this round.** The library is being designed (shared contract in
> [`shared-contract/ZB.MOM.WW.Audit.md`](shared-contract/ZB.MOM.WW.Audit.md)) but is not yet
> wired into any app — exactly where `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` sit today.
> The items below are the follow-on work; each lands as a separate PR per project.

## Divergence vs spec

### §1 Canonical record (`AuditEvent`)

| Canonical field | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| `EventId` (Guid, required) | ✅ — idempotency key; buffer key + filtered-unique DB index | ⛔ — no event key; only an `AUTOINCREMENT` rowid (`AuditId`) | ✅ — direct |
| `OccurredAtUtc` (DateTimeOffset, required) | 🟡 — `DateTime` UTC; widen at mapping boundary | 🟡 — `DateTimeOffset` but store-assigned (not caller-supplied); direct after widening | 🟡 — `DateTime` UTC-forced; widen at mapping boundary |
| `Actor` (string, required) | ✅ — direct (`AuditEvent.Actor` → `ConfigAuditLog.Principal`) | 🟡 — `KeyId` nullable; keyless events (`init-db`/`list-keys`) need a `"system"`/`"cli"` fallback | 🟡 — nullable on system-originated rows; fallback needed |
| `Action` (string, required) | 🟡 — `Action` field exists, but persisted as `"{Category}:{Action}"` composite in `EventType`; canonical keeps them separate | ✅ — `EventType` literal direct | 🟡 — derived as `{Channel}.{Kind}` (e.g. `ApiOutbound.ApiCall`) |
| `Outcome` (AuditOutcome, required) | ⛔ **NEW** — derived from `EventType` vocabulary; not stored today | ⛔ **NEW** — derived: `constraint-denied`→`Denied`, else `Success` | ⛔ **NEW** — derived from `Status` (+`InboundAuthFailure` Kind→`Denied`) |
| `Category` (string?) | ✅ — `AuditEvent.Category` (e.g. `"Config"`) | ⛔ — no field; constant `"ApiKey"` at mapping | ✅ — `Channel` |
| `Target` (string?) | ⛔ — no dedicated field; closest is `DetailsJson` | ⛔ — embedded in `Details` text (`commandKind`/`target`) | ✅ — direct |
| `SourceNode` (string?) | ✅ — `SourceNode` (logical cluster node / host name, NOT an OPC UA NodeId) | 🟡 — `RemoteAddress`; dashboard path only (null on CLI/constraint paths) | ✅ — direct |
| `CorrelationId` (Guid?) | ✅ — direct (`CorrelationId.Value`) | ⛔ — not captured today; left null | ✅ — direct |
| `DetailsJson` (string?) | ✅ — direct (JSON CHECK constraint enforced) | 🟡 — `Details` is a plain string, not JSON; wrap or store as-is | 🟡 — ~15 rich/plumbing fields serialize here at the cross-project reporting boundary |

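
As a rough picture of the target that the §1 table measures against, the canonical record could look like the C# sketch below. The field names and types come from the table's first column. The positional-record shape, the enum member order, and the `AuditTime.Widen` helper (which illustrates the "widen at mapping boundary" note for `DateTime` sources) are assumptions for illustration, not the contents of `spec/SPEC.md`.

```csharp
#nullable enable
using System;

// Assumed shape: the three values are named in this document, the order is not.
public enum AuditOutcome { Success, Failure, Denied }

// Five required fields first, then four optional common fields, then DetailsJson.
public sealed record AuditEvent(
    Guid EventId,
    DateTimeOffset OccurredAtUtc,
    string Actor,
    string Action,
    AuditOutcome Outcome,
    string? Category = null,
    string? Target = null,
    string? SourceNode = null,
    Guid? CorrelationId = null,
    string? DetailsJson = null);

public static class AuditTime
{
    // Hypothetical helper: widens a DateTime that is already known to hold a
    // UTC value (as in OtOpcUa / ScadaBridge) into a zero-offset DateTimeOffset.
    public static DateTimeOffset Widen(DateTime utc) =>
        new DateTimeOffset(DateTime.SpecifyKind(utc, DateTimeKind.Utc));
}
```

A mapping layer would call `AuditTime.Widen(row.Timestamp)` when building the canonical record; `SpecifyKind` only relabels the value, so this is safe only where the source is genuinely UTC.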
### §2 `IAuditWriter` seam

| Aspect | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no `IAuditWriter`; `AuditWriterActor` is the sink, consumed directly via Akka messaging | ⛔ — `IApiKeyAuditStore` (narrow, two-method) is the seam; no general `IAuditWriter` | ✅ — `IAuditWriter` with `WriteAsync(AuditEvent, CancellationToken)` signature; "failures must NEVER abort the user-facing action" contract; best-effort |
| Best-effort / never throws | 🟡 — the actor drops a failed flush (best-effort), but the seam is not a typed interface a caller can inject independently | ⛔ — no contract; `AppendAsync` may propagate | ✅ |
| Record type at the seam | 🟡 — OtOpcUa's own `AuditEvent` (8 fields, with Commons value-types `NodeId`/`CorrelationId`) | ⛔ — `ApiKeyAuditEntry` (4 fields) | 🟡 — ScadaBridge's ~25-field `AuditEvent` (rich record; adoption = keep own record, adopt canonical interface name + `AuditOutcome`) |

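
The "best-effort / never throws" row can be made concrete with a small decorator. The sketch below assumes `WriteAsync` returns `Task` (the document gives only the parameter list) and uses a trimmed stand-in for `AuditEvent`; `BestEffortAuditWriter` and its `onError` hook are hypothetical names, not part of any of the three projects.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in; the canonical record has more fields.
public sealed record AuditEvent(Guid EventId, string Actor, string Action);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct);
}

// Wraps any sink (for example a store whose append call may propagate) so that
// a failed audit write never aborts the user-facing action.
public sealed class BestEffortAuditWriter : IAuditWriter
{
    private readonly Func<AuditEvent, CancellationToken, Task> _sink;
    private readonly Action<Exception>? _onError;

    public BestEffortAuditWriter(
        Func<AuditEvent, CancellationToken, Task> sink,
        Action<Exception>? onError = null)
    {
        _sink = sink;
        _onError = onError;
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct)
    {
        try
        {
            await _sink(evt, ct).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Report if a hook was supplied, but never rethrow to the caller.
            try { _onError?.Invoke(ex); } catch { /* the hook must not break the contract */ }
        }
    }
}
```

Under this shape, a project without a contract today could satisfy the row by registering its existing store behind the decorator rather than changing the store itself.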
### §3 `IAuditRedactor` seam

| Aspect | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Named seam | ⛔ — no redactor; no payload filtering today | ⛔ — no redactor; safety by construction (entry type cannot carry a secret) | ✅ — `IAuditPayloadFilter` (`AuditEvent Apply(AuditEvent)`, pure/never-throws/over-redacts); **only the name differs** from canonical `IAuditRedactor` |
| Over-redacts on failure | ⛔ — n/a | ⛔ — n/a | ✅ — `SafeDefaultAuditPayloadFilter` is the reference |

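
"Over-redacts on failure" means that when the redaction policy itself throws, the event is still emitted but with its payload replaced rather than leaked. A minimal sketch follows, assuming the canonical `IAuditRedactor` keeps the `AuditEvent Apply(AuditEvent)` signature quoted for `IAuditPayloadFilter`. `OverRedactingGuard`, `ThrowingRedactor`, and the marker string are hypothetical, and treating `DetailsJson` as the only field to blank is a policy assumption.

```csharp
#nullable enable
using System;

// Trimmed stand-in; the canonical record has more fields.
public sealed record AuditEvent(Guid EventId, string Actor, string Action, string? DetailsJson = null);

public interface IAuditRedactor
{
    AuditEvent Apply(AuditEvent evt);
}

// Guards an inner redactor: if the policy throws, fall back to dropping the
// whole payload instead of passing the unredacted event through.
public sealed class OverRedactingGuard : IAuditRedactor
{
    public const string RedactedMarker = "{\"redacted\":true}";
    private readonly IAuditRedactor _inner;

    public OverRedactingGuard(IAuditRedactor inner) => _inner = inner;

    public AuditEvent Apply(AuditEvent evt)
    {
        try
        {
            return _inner.Apply(evt);
        }
        catch (Exception)
        {
            return evt with { DetailsJson = RedactedMarker };
        }
    }
}

// Demo-only redactor that stands in for a buggy policy.
public sealed class ThrowingRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent evt) => throw new FormatException("bad payload");
}
```

The identity fields (`EventId`, `Actor`, `Action`) survive the fallback, so the who-did-what trail stays intact even when the payload is lost.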
### §4 `AuditOutcome` — the new normalized field

`Outcome` is a **genuinely new field** across all three projects. No app stores it today;
each encodes it implicitly. All three must derive and emit it at adoption:

→ **Gap O1 (OtOpcUa):** derive from `EventType` vocabulary — `OpcUaAccessDenied` /
`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`. No `Failure`
value exists in OtOpcUa's vocabulary today (failed flushes are dropped, not emitted), so
OtOpcUa will produce only `Success` / `Denied` until/unless failure events are added.

→ **Gap O2 (MxGateway):** derive — `constraint-denied` → `Denied`; all others → `Success`.
No `Failure` events are emitted today.

→ **Gap O3 (ScadaBridge):** derive from `AuditStatus` — `Delivered` → `Success`;
`Failed` / `Parked` / `Discarded` → `Failure`; `Kind = InboundAuthFailure` → `Denied`.
In-flight states (`Submitted` / `Forwarded` / `Attempted`) collapse to the last-known
terminal state when projecting; `Skipped` is excluded from the canonical projection.

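
The three derivations in Gaps O1 to O3 are small pure functions. The sketch below is one way to write them, under stated assumptions: the `AuditStatus` stand-in lists only the members named in Gap O3; the OtOpcUa helper accepts both bare and `"{Category}:{Action}"` values; the `Kind` check is passed in as a `bool` and takes precedence over `Status` (the document does not state the precedence); and `null` means "no canonical outcome yet" for in-flight and `Skipped` rows.

```csharp
#nullable enable
using System;

public enum AuditOutcome { Success, Failure, Denied }

// Stand-in for ScadaBridge's status enum; members are the ones named in Gap O3.
public enum AuditStatus { Submitted, Forwarded, Attempted, Delivered, Failed, Parked, Discarded, Skipped }

public static class OutcomeDerivation
{
    // Gap O1: denial event types map to Denied, everything else to Success.
    public static AuditOutcome FromOtOpcUa(string eventType)
    {
        // Keep only the action part if the value is a "{Category}:{Action}" composite.
        string action = eventType.Substring(eventType.LastIndexOf(':') + 1);
        return action is "OpcUaAccessDenied" or "CrossClusterNamespaceAttempt"
            ? AuditOutcome.Denied
            : AuditOutcome.Success;
    }

    // Gap O2: a single denial literal, everything else is Success.
    public static AuditOutcome FromMxGateway(string eventType) =>
        eventType == "constraint-denied" ? AuditOutcome.Denied : AuditOutcome.Success;

    // Gap O3: terminal statuses map directly; in-flight and Skipped rows yield
    // null so the caller can project from the last-known terminal state instead.
    public static AuditOutcome? FromScadaBridge(AuditStatus status, bool isInboundAuthFailure)
    {
        if (isInboundAuthFailure) return AuditOutcome.Denied;
        return status switch
        {
            AuditStatus.Delivered => AuditOutcome.Success,
            AuditStatus.Failed or AuditStatus.Parked or AuditStatus.Discarded => AuditOutcome.Failure,
            _ => (AuditOutcome?)null,
        };
    }
}
```

Keeping these as static functions at the mapping boundary leaves each project's stored vocabulary untouched, which matches the "derive and emit" wording above.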
### §5 `Actor` → Auth principal

At adoption, every emit site should supply the `ZB.MOM.WW.Auth` principal as `Actor`
(string). The library carries no Auth dependency — `Actor` is a plain `string` — but the
handshake with Auth is the semantic goal (closes the loop).

→ **Gap P1 (all 3):** at adoption, update emit sites to populate `Actor` from the Auth
principal (LDAP user / API-key name). Auth adoption (#8 in `components/auth/GAPS.md`) is a
prerequisite for the full story; until then, use the existing actor string.

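
Gap P1, combined with the `"system"`/`"cli"` fallback noted in §1, amounts to a three-step preference order at each emit site. A hedged sketch, where `ActorResolver` is a hypothetical helper and the caller decides which fallback literal applies:

```csharp
#nullable enable
public static class ActorResolver
{
    // Prefer the Auth principal once Auth is adopted, then the project's
    // existing actor string, then a fixed fallback for keyless/system events.
    public static string Resolve(string? authPrincipal, string? existingActor, string fallback)
    {
        if (!string.IsNullOrWhiteSpace(authPrincipal)) return authPrincipal!;
        if (!string.IsNullOrWhiteSpace(existingActor)) return existingActor!;
        return fallback;
    }
}
```

Because `Actor` is required on the canonical record, resolving it in one place avoids each emit site inventing its own placeholder.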
### §6 OtOpcUa two-producer problem

OtOpcUa has **two writers to `ConfigAuditLog`**: the structured Akka `AuditEvent` path AND
older SQL stored procedures that `INSERT` directly (bare `EventType`, NULL `EventId` /
`CorrelationId`, populated `ClusterId` / `GenerationId`). Normalization targets the
structured path only; the SP path stays per-project.

→ **Gap Q1 (OtOpcUa):** decide at adoption whether to route SP events through the actor
or leave them non-idempotent. Also: the `ClusterId`-filter / actor-never-sets-`ClusterId`
mismatch (Admin UI `ClusterAudit.razor` filters by `ClusterId`, but the actor path sets
`NodeId` not `ClusterId`, so structured rows are invisible to the cluster view). Fix when
normalizing the query surface.

## Adoption backlog (ordered)

| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | **OtOpcUa:** put `AuditWriterActor` behind the `IAuditWriter` seam; replace `Commons/Messages/Audit/AuditEvent.cs` with canonical record; add `Outcome` derivation at every emit site (Gap O1) | OtOpcUa | Med | M | Med | Actor internals (batching / dedup / flush triggers) stay bespoke; only the seam type and record change. Commons value-types `NodeId`/`CorrelationId` bridged at construction. |
| 2 | **MxGateway:** map `IApiKeyAuditStore` / `ApiKeyAuditEntry` / `ApiKeyAuditRecord` → `IAuditWriter` / `AuditEvent`; generate `EventId` per write; add `"system"`/`"cli"` Actor fallback; constant `Category = "ApiKey"`; `constraint-denied`→`Outcome.Denied` (Gap O2 plus the §1 record gaps) | MxGateway | Low | S | Med | ⚠ **COORDINATE** — a parallel session is editing this repo for the MEL→Serilog migration (Health/Telemetry normalization). Do NOT start until the Serilog session has landed (or is explicitly fenced off); the two efforts share `Security/Authentication/` DI wiring. |
| 3 | **ScadaBridge:** rename `IAuditPayloadFilter` → `IAuditRedactor` (or alias during transition); adopt canonical `AuditOutcome` enum (Gap O3); confirm writer contract matches (already byte-for-byte) | ScadaBridge | Low | S | High | **"Align, don't replace."** Blast radius is HIGH — `IAuditPayloadFilter` is used across the entire pipeline (site, central, wiring). Rename + alias only; no transport/storage/record change. `DefaultAuditPayloadFilter` / `SafeDefaultAuditPayloadFilter` implementations unchanged. |
| 4 | **All:** populate `Actor` from `ZB.MOM.WW.Auth` principal at emit sites (Gap P1) | All 3 | Low | S | Low | **Prerequisite:** Auth adoption per `components/auth/GAPS.md` #8. Until Auth is adopted, leave the existing actor string as-is. |
| 5 | **OtOpcUa:** reconcile two-producer problem — decide SP path routing + fix `ClusterId`-filter / actor mismatch in `ClusterAudit.razor` (Gap Q1) | OtOpcUa | Low | S | Low | Normalization does not unify the SP path; this is a reconcile item to decide and document. The mismatch means structured `AuditEvent` rows are currently invisible to the cluster-scoped view. |
| 6 | **MxGateway:** add `CorrelationId` capture at constraint denial + dashboard paths; structured `Target` from `Details` text (currently embedded as a plain string in `ConstraintEnforcer`) | MxGateway | Low | S | Low | Nice-to-have parity; not required for adoption. `CorrelationId` and `Target` canonical fields left null until this is done. |

**Sequencing:** #3 (ScadaBridge rename) is the smallest behavioral change and is
self-contained, but its blast radius is high (see the Risk column); do it first or last,
depending on blast-radius appetite. #1 (OtOpcUa) is medium effort but independent; it can
start once the shared library is built. #2 (MxGateway) is a small code change but has the
highest **coordination dependency**: gate it on the Serilog migration landing first.
#4 (Actor→Auth) is blocked on Auth adoption and is the last to close. #5 and #6 are cleanup
items with no bearing on shared-library adoption.

Each adoption lands as an opt-in version bump per project behind the seam; the shared library
is consumed but the bespoke transport/storage/UI for each project is not touched.

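
Backlog item 2 is essentially a mapping function, so a sketch may help size it. Everything about `ApiKeyAuditEntry` below is a hypothetical stand-in: the document says only that it has four fields, so the member names are guesses based on the §1 table (`KeyId`, `EventType`, `RemoteAddress`, `Details`). `ApiKeyAuditMapper` and the `keylessActor` parameter are likewise illustrative.

```csharp
#nullable enable
using System;

public enum AuditOutcome { Success, Failure, Denied }

public sealed record AuditEvent(
    Guid EventId, DateTimeOffset OccurredAtUtc, string Actor, string Action, AuditOutcome Outcome,
    string? Category = null, string? Target = null, string? SourceNode = null,
    Guid? CorrelationId = null, string? DetailsJson = null);

// Hypothetical stand-in for MxGateway's entry type; real member names may differ.
public sealed record ApiKeyAuditEntry(string? KeyId, string EventType, string? RemoteAddress, string? Details);

public static class ApiKeyAuditMapper
{
    public static AuditEvent ToCanonical(
        ApiKeyAuditEntry entry, DateTimeOffset now, string keylessActor = "system") =>
        new AuditEvent(
            EventId: Guid.NewGuid(),                        // generated per write
            OccurredAtUtc: now.ToUniversalTime(),           // caller-supplied, not store-assigned
            Actor: string.IsNullOrWhiteSpace(entry.KeyId) ? keylessActor : entry.KeyId!,
            Action: entry.EventType,
            Outcome: entry.EventType == "constraint-denied"
                ? AuditOutcome.Denied
                : AuditOutcome.Success,
            Category: "ApiKey",                             // constant at mapping
            SourceNode: entry.RemoteAddress,                // null on CLI/constraint paths
            DetailsJson: entry.Details);                    // as-is; wrapping is an open decision
}
```

`Target` and `CorrelationId` stay null here, which is consistent with backlog item 6 being optional for adoption.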
## Decisions still open

- ScadaBridge `IAuditPayloadFilter` → `IAuditRedactor`: outright rename vs. transitional alias
  (both are valid; alias reduces blast radius in the short term).
- MxGateway `Details` plain string → `DetailsJson`: store as-is or wrap in a JSON object at
  the mapping boundary.
- `AuditOutcome` column in OtOpcUa storage: add a new `Outcome` column to `ConfigAuditLog`
  or fold into `DetailsJson` / derive at read time (schema change vs. runtime cost).
- OtOpcUa SP path: route through the actor path (unified producer) or leave as a bespoke
  secondary writer with its own column conventions (separate reconcile effort).

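
For the MxGateway `Details` decision, the "wrap" option is a one-liner with `System.Text.Json`. The sketch below shows only that option; the `DetailsMapping` helper and the `"text"` property name are hypothetical choices, not something the spec prescribes.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

public static class DetailsMapping
{
    // Wraps a plain-text detail string in a single-property JSON object so the
    // canonical DetailsJson field always holds valid JSON. Null/empty stays null.
    public static string? ToDetailsJson(string? details)
    {
        if (string.IsNullOrEmpty(details)) return null;
        return JsonSerializer.Serialize(new Dictionary<string, string> { ["text"] = details });
    }
}
```

The trade-off is the one the bullet implies: wrapping costs a little at the mapping boundary but lets downstream consumers parse every `DetailsJson` value uniformly, whereas storing as-is keeps the mapping trivial and pushes the "is this JSON?" question to readers.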
@@ -0,0 +1,72 @@
# Audit (who-did-what)

Status: **Draft**. Normalized component — path to shared code. Goal: converge the three
sister projects onto a canonical `AuditEvent` record + `AuditOutcome` enum + two thin seams
(`IAuditWriter`, `IAuditRedactor`), proposed as the `ZB.MOM.WW.Audit` library, while each
project keeps its own transport, storage, domain vocabulary, and redaction policy.

- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- Canonical event model + field reference: [`spec/EVENT-MODEL.md`](spec/EVENT-MODEL.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Audit.md`](shared-contract/ZB.MOM.WW.Audit.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)

## Why audit is a strong normalization candidate

All three projects record a structured who-did-what trail with an actor identity, an action
verb, and a timestamp. Two (OtOpcUa + ScadaBridge) already have a named `AuditEvent` record
with an `EventId` idempotency key, `Actor`, and `CorrelationId`. ScadaBridge already ships
**both** canonical seams, one of them under a different name (`IAuditWriter` is byte-for-byte
the spec; `IAuditPayloadFilter` is the canonical `IAuditRedactor`). OtOpcUa's record is almost
field-for-field aligned. MxGateway has a narrow API-key-lifecycle log that maps cleanly.

The one new field across all three is `Outcome` (type `AuditOutcome`). No project stores it
explicitly today; each encodes it implicitly and will derive it at adoption. This is the bulk
of the per-project work. Transport, storage, domain vocabulary, and redaction policy are
**not** unified — each project keeps its own bespoke implementation behind the seam.

**Audit closes the loop on Auth.** Every audit row's `Actor` is exactly the identity that the
`ZB.MOM.WW.Auth` component normalizes (LDAP/GLAuth principal, API-key name). The library keeps
`Actor` as a plain `string` (no Auth dependency), but at adoption each emit site supplies the
Auth principal.

**`IAuditRedactor` naming is aligned with Telemetry's `ILogRedactor`** — same shape and naming
discipline so a future `ZB.MOM.WW.Hosting` aggregator wires both redactors with one mental
model — but there is no cross-package dependency between the two libraries.

## Status by project

| Project | Audit today | Seams present | `AuditOutcome` | Adoption status |
|---|---|---|---|---|
| **OtOpcUa** | Akka cluster-broadcast `AuditEvent` → cluster-singleton `AuditWriterActor` (batch 500/5 s, two-layer dedup) over EF `ConfigAuditLog` (SQL Server). Also a legacy SQL stored-procedure write path (bare `EventType`, NULL `EventId`). Admin UI page `ClusterAudit.razor`. | No named `IAuditWriter` seam; no redactor seam. | Not stored — encoded in `EventType` strings (`OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; config-write verbs → `Success`). | Not started |
| **MxAccessGateway** | Single SQLite-backed `IApiKeyAuditStore` / `ApiKeyAuditEntry` — key lifecycle (CLI + dashboard) + constraint denials only. No authn events persisted; no production read consumer. | Narrow custom seam (`IApiKeyAuditStore`); no general `IAuditWriter`; redaction is by-construction (secret never enters the record type). | Not stored — derived: `constraint-denied` → `Denied`; all others → `Success`. | Not started |
| **ScadaBridge** | Full pipeline: site SQLite hot-path (`SqliteAuditWriter` + ring-buffer fallback) → Akka `ClusterClient` forwarder → central MS SQL (ingest / reconcile / purge / partition maintenance). Rich ~25-field `AuditEvent` record. CLI `export`/`verify-chain`; Blazor audit UI. | ✅ `IAuditWriter` (matches canonical contract word-for-word); ✅ `IAuditPayloadFilter` (= canonical `IAuditRedactor`, identical signature, pure/never-throws/over-redacts). | Not stored explicitly — derived from `Status` (`Delivered`→`Success`; `Failed`/`Parked`/`Discarded`→`Failure`; `Kind = InboundAuthFailure`→`Denied`). | Not started (align, don't replace) |

See each project's `current-state/<project>/CURRENT-STATE.md` for code-verified detail and
adoption plan:

- [`current-state/otopcua/CURRENT-STATE.md`](current-state/otopcua/CURRENT-STATE.md)
- [`current-state/mxaccessgw/CURRENT-STATE.md`](current-state/mxaccessgw/CURRENT-STATE.md)
- [`current-state/scadabridge/CURRENT-STATE.md`](current-state/scadabridge/CURRENT-STATE.md)

## Normalized vs. left per-project

**Normalized (the shared `ZB.MOM.WW.Audit` library):** the canonical `AuditEvent` record
(5 required fields + 4 optional common + `DetailsJson` extension bag); the `AuditOutcome`
enum (`Success | Failure | Denied`); the `IAuditWriter` seam (best-effort, never throws to
caller); the `IAuditRedactor` seam (pure, never throws, over-redacts on failure); shipped
helpers (`NoOpAuditWriter`, `CompositeAuditWriter`, `RedactingAuditWriter`,
`NullAuditRedactor`, `TruncatingAuditRedactor`). Library is BCL-only — no Akka / EF / SQLite
/ Serilog dependency.

**Left per-project (each project keeps these behind the seam):** transport and storage (Akka
singleton + EF/SQL Server; SQLite; site-SQLite + central MS SQL + forwarding/reconcile
pipeline); domain vocabulary (`EventType` strings / API-key event-type literals / `Channel` +
`Kind` + `Status` enums); query, CLI, and UI surfaces (`ClusterAudit.razor`; `ListRecentAsync`;
`export` / `verify-chain`; Blazor audit pages); redaction *policy* (which fields/payloads are
sensitive — only the `IAuditRedactor` *seam* is shared).

> **Adoption is deferred this round.** The `ZB.MOM.WW.Audit` library is being designed and
> the shared contract defined, but none of the three apps wire it in yet — exactly where
> `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` sit today. The per-project adoption backlog is in
> [`GAPS.md`](GAPS.md).

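
The shipped helpers named in the "Normalized" list can be pictured as thin decorators over the two seams. The sketch below uses the helper names from that list, but their behavior is inferred from the names alone, and the seam signatures assume `Task WriteAsync(AuditEvent, CancellationToken)` and `AuditEvent Apply(AuditEvent)`. `ListAuditWriter` and `DropDetailsRedactor` are demo-only types that exist purely to exercise the decorators.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in; the canonical record has more fields.
public sealed record AuditEvent(Guid EventId, string Actor, string Action, string? DetailsJson = null);

public interface IAuditWriter { Task WriteAsync(AuditEvent evt, CancellationToken ct); }
public interface IAuditRedactor { AuditEvent Apply(AuditEvent evt); }

// Assumed behavior: redact first, then hand the result to the inner writer.
public sealed class RedactingAuditWriter : IAuditWriter
{
    private readonly IAuditRedactor _redactor;
    private readonly IAuditWriter _inner;

    public RedactingAuditWriter(IAuditRedactor redactor, IAuditWriter inner)
    {
        _redactor = redactor;
        _inner = inner;
    }

    public Task WriteAsync(AuditEvent evt, CancellationToken ct) =>
        _inner.WriteAsync(_redactor.Apply(evt), ct);
}

// Assumed behavior: fan out to every writer; one failing sink does not stop the rest.
public sealed class CompositeAuditWriter : IAuditWriter
{
    private readonly IReadOnlyList<IAuditWriter> _writers;

    public CompositeAuditWriter(params IAuditWriter[] writers) => _writers = writers;

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct)
    {
        foreach (var writer in _writers)
        {
            try { await writer.WriteAsync(evt, ct).ConfigureAwait(false); }
            catch (Exception) { /* best-effort: isolate the failing sink */ }
        }
    }
}

// Demo-only sink that records what it receives.
public sealed class ListAuditWriter : IAuditWriter
{
    public List<AuditEvent> Written { get; } = new();

    public Task WriteAsync(AuditEvent evt, CancellationToken ct)
    {
        Written.Add(evt);
        return Task.CompletedTask;
    }
}

// Demo-only policy that removes the whole payload.
public sealed class DropDetailsRedactor : IAuditRedactor
{
    public AuditEvent Apply(AuditEvent evt) => evt with { DetailsJson = null };
}
```

Composed as `new CompositeAuditWriter(new RedactingAuditWriter(redactor, storeWriter), otherWriter)`, each project would keep its own store behind `storeWriter` while sharing only the seam types, which is the split this section describes.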