docs(audit): current-state OtOpcUa

# Audit — current state: OtOpcUa

Repo: `~/Desktop/OtOpcUa` (Gitea `lmxopcua`). Stack: .NET 10, Akka.NET cluster, EF Core + SQL Server.
All paths below are relative to the repo root. Verified against source on 2026-06-01.

OtOpcUa already has a structured, idempotent audit pipeline: a cluster-broadcast `AuditEvent`
message, a cluster-singleton writer actor that batches and bulk-inserts, and an append-only
`ConfigAuditLog` EF entity with two-layer dedup. There is **also** a second, older write path —
SQL stored procedures that `INSERT dbo.ConfigAuditLog` directly — so the table has two
producers with slightly different column conventions (see §1).

## 1. How it works today

**Record shape** — `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs:9-17`:
a sealed record `AuditEvent(Guid EventId, string Category, string Action, string Actor,
DateTime OccurredAtUtc, string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId)`.
`NodeId` and `CorrelationId` are Commons value-types — `NodeId` wraps a string (the *logical
cluster node / host name*, explicitly **not** an OPC UA NodeId per its XML doc,
`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/NodeId.cs:3-8`); `CorrelationId` wraps a `Guid`
(`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/CorrelationId.cs:3`).
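
As a rough illustration of the shape described above, the record and its two wrapper types could look like the following. This is a sketch reconstructed from the quoted signature, not a copy of the source: modelling `NodeId` and `CorrelationId` as `readonly record struct`s is an assumption, and every sample value (`"alice"`, `"node-a"`, the JSON payload) is made up.

```csharp
using System;

// Build one event the way an emit site might (all values are illustrative).
var evt = new AuditEvent(
    EventId: Guid.NewGuid(),
    Category: "Config",
    Action: "Edit",
    Actor: "alice",
    OccurredAtUtc: DateTime.UtcNow,
    DetailsJson: "{\"field\":\"value\"}",
    SourceNode: new NodeId("node-a"),
    CorrelationId: new CorrelationId(Guid.NewGuid()));

Console.WriteLine($"{evt.Category}:{evt.Action} from {evt.SourceNode.Value}");
// prints "Config:Edit from node-a"

// Assumed shapes: the audit only states that NodeId wraps a string and
// CorrelationId wraps a Guid; record structs are a guess at "value-types".
public readonly record struct NodeId(string Value);
public readonly record struct CorrelationId(Guid Value);

// Field order and types follow the signature quoted above.
public sealed record AuditEvent(
    Guid EventId, string Category, string Action, string Actor,
    DateTime OccurredAtUtc, string? DetailsJson,
    NodeId SourceNode, CorrelationId CorrelationId);
```

Keeping `NodeId` as its own type, rather than a bare string, is what lets the XML doc state in one place that it is a host name and not an OPC UA NodeId.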

**Transport** — `AuditEvent` is an Akka message meant to be sent to the `AuditWriterActor`
**cluster singleton** (`AuditEvent.cs:6` describes it as "cluster-broadcast … consumed by the
`AuditWriterActor` singleton"). The singleton is registered through Akka.Hosting at
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ServiceCollectionExtensions.cs:68-75`
(`WithSingleton<AuditWriterActorKey>(AuditWriterSingletonName, …)`). Any cluster member can
emit an `AuditEvent`; the singleton is the one sink that persists it.

**Storage** — EF entity `ConfigAuditLog`
(`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs:7-44`): append-only
("Grants revoked for UPDATE/DELETE on all principals", `ConfigAuditLog.cs:4-5`). Columns:
`AuditId` (identity PK), `Timestamp` (default `SYSUTCDATETIME()`), `Principal`, `EventType`,
`ClusterId?`, `NodeId?`, `GenerationId?`, `DetailsJson?`, `EventId?` (Guid), `CorrelationId?`
(Guid). Mapping/constraints in `OtOpcUaConfigDbContext.cs:429-463`: `DetailsJson` must be valid
JSON (`CK_ConfigAuditLog_DetailsJson_IsJson`, lines 435-436); `Principal`/`EventType`/`ClusterId`/`NodeId`
are length-capped (lines 441-444); supporting indexes are `IX_ConfigAuditLog_Cluster_Time` (lines 449-451)
and `IX_ConfigAuditLog_Generation` (lines 452-454).

**Writer / batching** — `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs`:
a `ReceiveActor` with `FlushBatchSize = 500` (line 25) and `FlushInterval = 5s` (line 26).
It buffers events in a `Dictionary<Guid, AuditEvent>` keyed by `EventId` (line 30), flushing
when the buffer hits 500 (line 60), when the 5s periodic timer fires (`PreStart`, lines 50-53),
or on `PreRestart`/`PostStop` (lines 96-107) so a supervisor swap or coordinated shutdown does
not lose the buffer. `FlushBuffer` (lines 63-93) snapshots and clears the buffer, then for each
event constructs a `ConfigAuditLog` row (lines 75-84): `Timestamp = OccurredAtUtc`,
`Principal = Actor`, `EventType = $"{Category}:{Action}"`, `NodeId = SourceNode.Value`,
`DetailsJson`, `EventId`, `CorrelationId = CorrelationId.Value`. A failed flush is logged and the
batch is **dropped** (`catch` at lines 89-92) — best-effort, no retry/dead-letter.
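
The buffering and flush behaviour described above can be sketched without Akka. The snippet below is a simplified model, not the actor itself: `AuditBuffer`, `AuditRow`, and the `sink` delegate are hypothetical names, `SourceNode` and `CorrelationId` are flattened to primitives, and the batch size is 3 instead of 500 so the demo stays short. The timer and lifecycle hooks are represented by an explicit `Flush()` call.

```csharp
using System;
using System.Collections.Generic;

// Drive the buffer the way the actor's message handler would.
var rows = new List<AuditRow>();
var buffer = new AuditBuffer(flushBatchSize: 3, sink: rows.AddRange);

var dup = Guid.NewGuid();
buffer.Add(new AuditEvent(dup, "Config", "Edit", "alice", DateTime.UtcNow, null, "node-a", Guid.NewGuid()));
buffer.Add(new AuditEvent(dup, "Config", "Edit", "alice", DateTime.UtcNow, null, "node-a", Guid.NewGuid())); // same EventId: overwrites
buffer.Add(new AuditEvent(Guid.NewGuid(), "Config", "Publish", "bob", DateTime.UtcNow, null, "node-b", Guid.NewGuid()));
buffer.Flush();                // stands in for the 5s timer or PostStop
Console.WriteLine(rows.Count); // prints 2

// Simplified event: SourceNode and CorrelationId are plain primitives here.
public sealed record AuditEvent(Guid EventId, string Category, string Action, string Actor,
    DateTime OccurredAtUtc, string? DetailsJson, string SourceNode, Guid CorrelationId);

// Stand-in for the ConfigAuditLog columns the actor populates.
public sealed record AuditRow(DateTime Timestamp, string Principal, string EventType,
    string NodeId, string? DetailsJson, Guid EventId, Guid CorrelationId);

public sealed class AuditBuffer
{
    private readonly Dictionary<Guid, AuditEvent> _buffer = new();
    private readonly int _flushBatchSize;
    private readonly Action<IEnumerable<AuditRow>> _sink;

    public AuditBuffer(int flushBatchSize, Action<IEnumerable<AuditRow>> sink)
    {
        _flushBatchSize = flushBatchSize;
        _sink = sink;
    }

    public void Add(AuditEvent e)
    {
        _buffer[e.EventId] = e; // last write wins for a repeated EventId
        if (_buffer.Count >= _flushBatchSize) Flush();
    }

    public void Flush()
    {
        if (_buffer.Count == 0) return;
        var snapshot = new List<AuditEvent>(_buffer.Values);
        _buffer.Clear(); // cleared before the write, so a failed write drops the batch
        try
        {
            _sink(snapshot.ConvertAll(e => ToRow(e)));
        }
        catch (Exception ex)
        {
            // Best-effort: log and move on, no retry.
            Console.Error.WriteLine($"audit flush failed, {snapshot.Count} events dropped: {ex.Message}");
        }
    }

    private static AuditRow ToRow(AuditEvent e) => new(
        Timestamp: e.OccurredAtUtc,
        Principal: e.Actor,
        EventType: $"{e.Category}:{e.Action}",
        NodeId: e.SourceNode,
        DetailsJson: e.DetailsJson,
        EventId: e.EventId,
        CorrelationId: e.CorrelationId);
}
```

Clearing the buffer before the write is what makes the drop-on-failure behaviour fall out naturally: once the snapshot is taken, nothing puts those events back.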

**Dedup / idempotency (two layers)** — described at `AuditWriterActor.cs:17-21`:

1. *In-buffer* — duplicate `EventId`s within a batch collapse via the dictionary (last-write-wins;
   `HandleEvent`, lines 55-61).
2. *Database* — a **filtered unique index** `UX_ConfigAuditLog_EventId` (`OtOpcUaConfigDbContext.cs:459-462`,
   `IsUnique()` + `HasFilter("[EventId] IS NOT NULL")`) gives cross-restart safety: a retry of an
   already-flushed batch hits the constraint, the duplicate insert is dropped, and the rest of the
   batch survives. `EventId`/`CorrelationId` are nullable so legacy/backfill rows (NULL) don't
   collide — confirmed in the entity XML (`ConfigAuditLog.cs:33-43`) and migration
   `Migrations/20260526105027_AddConfigAuditLogEventIdColumns.cs:27-38`.
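
For readers unfamiliar with filtered unique indexes in EF Core, the database layer could be configured roughly as below. This is a sketch, not the contents of `OtOpcUaConfigDbContext.cs`: the context name `AuditSketchDbContext` is hypothetical, the entity is trimmed to the columns relevant to dedup, and `long` for `AuditId` is an assumed CLR type. It needs the `Microsoft.EntityFrameworkCore.SqlServer` package to compile.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Trimmed-down stand-in for the real entity.
public class ConfigAuditLog
{
    public long AuditId { get; set; }        // identity PK (CLR type assumed)
    public Guid? EventId { get; set; }       // NULL for legacy / stored-procedure rows
    public Guid? CorrelationId { get; set; }
}

public class AuditSketchDbContext : DbContext
{
    public AuditSketchDbContext(DbContextOptions<AuditSketchDbContext> options)
        : base(options) { }

    public DbSet<ConfigAuditLog> ConfigAuditLogs => Set<ConfigAuditLog>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ConfigAuditLog>(e =>
        {
            e.HasKey(x => x.AuditId);

            // Unique only where EventId is present, so any number of
            // NULL-EventId rows can coexist with the deduplicated ones.
            e.HasIndex(x => x.EventId)
                .IsUnique()
                .HasFilter("[EventId] IS NOT NULL")
                .HasDatabaseName("UX_ConfigAuditLog_EventId");
        });
    }
}
```

The filter matters on SQL Server in particular, because an unfiltered unique index there treats NULLs as equal and would admit only one row with a NULL `EventId`.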

**Scope** — two producers, two conventions:

- **Akka `AuditEvent` path** (the structured one): config writes + authorization checks. The
  EventType vocabulary lives in the entity XML doc (`ConfigAuditLog.cs:18`): `DraftCreated |
  DraftEdited | Published | RolledBack | NodeApplied | CredentialAdded | CredentialDisabled |
  ClusterCreated | NodeAdded | ExternalIdReleased | CrossClusterNamespaceAttempt |
  OpcUaAccessDenied | …`. Note the access-denied / cross-cluster entries are authz-check events,
  not config writes.
- **SQL stored-procedure path** (older, still present): several SPs `INSERT dbo.ConfigAuditLog`
  directly — e.g. `Published`/`RolledBack`/`NodeApplied`/`ExternalIdReleased`/`CrossClusterNamespaceAttempt`
  in `Migrations/20260417215224_StoredProcedures.cs:151,217,351,407,504`. These use `SUSER_SNAME()`
  as `Principal`, set `ClusterId`/`GenerationId`, write a **bare** `EventType` (no `Category:Action`
  split), and leave `EventId`/`CorrelationId` NULL.

**Query / UI** — the only read surface is the Admin UI page
`src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/ClusterAudit.razor`
(`@page "/clusters/{ClusterId}/audit"`, `[Authorize]`, lines 1-2). It reads the latest
`PageSize = 200` rows (line 69) **filtered by `ClusterId`**, newest-first (`OnInitializedAsync`,
lines 74-82), and renders Timestamp / Principal / Event(Type) / Node / Correlation(first 8 hex) /
Details columns (lines 38-58). Tested in
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AuditWriterActorTests.cs`: count-threshold
flush (lines 26-41), in-buffer dedup of duplicate EventIds (lines 45-62), `PostStop` flush
(lines 66-81), and the column mapping incl. `EventType == "Config:Edit"` and `NodeId == "node-a"`
(lines 85-104).

> Load-bearing gotcha: the actor path **never sets `ClusterId`** (`AuditWriterActor.cs:75-84`), but the UI filters
> on `ClusterId` (`ClusterAudit.razor:78`). So today the cluster-scoped view surfaces only the
> stored-procedure rows; structured `AuditEvent` rows written by the actor (which carry the host in
> `NodeId`, not `ClusterId`) won't appear under a cluster. Worth flagging during normalization.
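
The effect of that mismatch is easy to reproduce in memory. The following is a minimal sketch with a hypothetical `AuditRow` record and made-up values (`"sa"`, `"cluster-1"`); it mimics only the `ClusterId` filter of the page query and leaves out ordering and paging.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new List<AuditRow>
{
    // Stored-procedure path: ClusterId set, bare EventType, no EventId.
    new("sa", "Published", ClusterId: "cluster-1", NodeId: null, EventId: null),
    // Actor path: composed EventType, host name in NodeId, ClusterId left null.
    new("alice", "Config:Edit", ClusterId: null, NodeId: "node-a", EventId: Guid.NewGuid()),
};

// Same idea as the page query: keep only rows for the requested cluster.
var visible = rows.Where(r => r.ClusterId == "cluster-1").ToList();

foreach (var r in visible)
    Console.WriteLine($"{r.Principal} {r.EventType}");
// prints "sa Published" only; the actor-written row is filtered out

public sealed record AuditRow(string Principal, string EventType,
    string? ClusterId, string? NodeId, Guid? EventId);
```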

## 2. Mapping to the canonical `AuditEvent`

Target = `ZB.MOM.WW.Audit.AuditEvent` (built in parallel). OtOpcUa's existing `AuditEvent` is
already almost field-for-field aligned; the only synthesized field is `Outcome`.

| Canonical field | OtOpcUa source | Mapping |
|---|---|---|
| `Guid EventId` | `AuditEvent.EventId` | Direct. Already the idempotency key (buffer key + `UX_ConfigAuditLog_EventId`). |
| `DateTimeOffset OccurredAtUtc` | `AuditEvent.OccurredAtUtc` (`DateTime`) | Direct; widen `DateTime`(UTC) → `DateTimeOffset`. |
| `string Actor` | `AuditEvent.Actor` | Direct (→ `ConfigAuditLog.Principal`). At Auth adoption this becomes the `ZB.MOM.WW.Auth` principal. |
| `string Action` | `AuditEvent.Action` (+ `Category`) | Direct. Today persisted as `"{Category}:{Action}"` in `EventType`; canonical keeps `Action` and `Category` separate. |
| `AuditOutcome Outcome` | *(none)* | **Derived** from the EventType vocabulary, not stored today. `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; the config-write verbs → `Success`. No explicit `Failure` value exists yet (a failed flush is dropped, not recorded as an event). |
| `string? Category` | `AuditEvent.Category` | Direct (e.g. `"Config"`). |
| `string? Target` | *(none)* | No dedicated field today; the closest is `SourceNode`→`NodeId` (the acting host) or details. Leave null or carry the affected object in `DetailsJson`. |
| `string? SourceNode` | `AuditEvent.SourceNode` (`NodeId.Value`) | Direct — the logical cluster node / host name (NOT an OPC UA NodeId). Currently lands in `ConfigAuditLog.NodeId`. |
| `Guid? CorrelationId` | `AuditEvent.CorrelationId` (`CorrelationId.Value`) | Direct. |
| `string? DetailsJson` | `AuditEvent.DetailsJson` | Direct; carries everything else (incl. `ClusterId`/`GenerationId`, which today are separate columns on the SP path). |
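
The table above can be read as a single mapping function. The sketch below is one possible reading under stated assumptions: `LegacyAuditEvent`, `CanonicalAuditEvent`, and `AuditMapping` are hypothetical names, the canonical record's layout is inferred from the table rows rather than taken from `ZB.MOM.WW.Audit`, the `AuditOutcome` enum lists only the two values the table uses, and the vocabulary term is assumed to be carried in `Action`. The `"Authz"` category is invented for the demo.

```csharp
using System;
using System.Collections.Generic;

var legacy = new LegacyAuditEvent(Guid.NewGuid(), "Authz", "OpcUaAccessDenied", "alice",
    new DateTime(2026, 6, 1, 12, 0, 0, DateTimeKind.Utc), null, "node-a", Guid.NewGuid());

var canonical = AuditMapping.ToCanonical(legacy);
Console.WriteLine($"{canonical.Outcome} at offset {canonical.OccurredAtUtc.Offset}");
// prints "Denied at offset 00:00:00"

// Stand-in for the canonical enum; only the values named in the table.
public enum AuditOutcome { Success, Denied }

// OtOpcUa's current record, with the value-type wrappers flattened.
public sealed record LegacyAuditEvent(Guid EventId, string Category, string Action, string Actor,
    DateTime OccurredAtUtc, string? DetailsJson, string SourceNode, Guid CorrelationId);

// Assumed layout of the canonical record, one parameter per table row.
public sealed record CanonicalAuditEvent(Guid EventId, DateTimeOffset OccurredAtUtc, string Actor,
    string Action, AuditOutcome Outcome, string? Category, string? Target, string? SourceNode,
    Guid? CorrelationId, string? DetailsJson);

public static class AuditMapping
{
    // The two authz-check events; everything else is treated as a config write.
    private static readonly HashSet<string> DeniedActions = new(StringComparer.Ordinal)
    {
        "OpcUaAccessDenied",
        "CrossClusterNamespaceAttempt",
    };

    // Assumption: unknown vocabulary terms default to Success.
    public static AuditOutcome DeriveOutcome(string action) =>
        DeniedActions.Contains(action) ? AuditOutcome.Denied : AuditOutcome.Success;

    public static CanonicalAuditEvent ToCanonical(LegacyAuditEvent e) => new(
        EventId: e.EventId,
        // Pin the kind to UTC first; an Unspecified DateTime would be read as local time.
        OccurredAtUtc: new DateTimeOffset(DateTime.SpecifyKind(e.OccurredAtUtc, DateTimeKind.Utc)),
        Actor: e.Actor,
        Action: e.Action,
        Outcome: DeriveOutcome(e.Action),
        Category: e.Category,
        Target: null, // no dedicated source field today
        SourceNode: e.SourceNode,
        CorrelationId: e.CorrelationId,
        DetailsJson: e.DetailsJson);
}
```

Defaulting unknown terms to `Success` is the weakest part of this sketch; a normalization spec may prefer to reject unrecognized vocabulary instead.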

## 3. Adoption plan → `ZB.MOM.WW.Audit`

**Effort: medium.** OtOpcUa is the *donor* design for the canonical record, so most of the work is
re-pointing types and bridging two persistence conventions, not redesigning the pipeline.

**Replace with the shared library:**

- `Commons/Messages/Audit/AuditEvent.cs` → the canonical `ZB.MOM.WW.Audit.AuditEvent`. Add the new
  `Outcome` field (derive it at every emit site from the EventType vocabulary, e.g.
  `OpcUaAccessDenied → Denied`); keep `Category`/`Action`/`SourceNode`/`CorrelationId` as-is. Decide
  whether `SourceNode`/`CorrelationId` carry the Commons value-types or the canonical primitives at
  the seam (likely a thin adapter at construction).
- `AuditWriterActor` → implement the library's `IAuditWriter` (keep the actor as OtOpcUa's
  Akka-cluster-singleton transport/batching adapter behind that seam; the 500/5s batching,
  PreRestart/PostStop flush, and two-layer dedup stay bespoke per §"left per-project").
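
To make the second bullet concrete, here is one way the seam could look. Everything about `IAuditWriter` below is an assumption: the audit names the interface but not its members, so the `WriteAsync` signature is invented for illustration, as are `ActorAuditWriter` and the trimmed `AuditEvent` record. A plain delegate stands in for sending a message to the singleton, which keeps the sketch free of an Akka dependency.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// The delegate stands in for a Tell to the AuditWriterActor singleton.
IAuditWriter writer = new ActorAuditWriter(msg => Console.WriteLine($"forwarded {msg}"));
await writer.WriteAsync(new AuditEvent(Guid.NewGuid(), "Config", "Edit"));

// Trimmed record, only enough fields for the demo.
public sealed record AuditEvent(Guid EventId, string Category, string Action);

// HYPOTHETICAL seam: the real interface lives in ZB.MOM.WW.Audit and may differ.
public interface IAuditWriter
{
    Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default);
}

// Thin adapter: callers see IAuditWriter, the actor keeps batching and dedup.
public sealed class ActorAuditWriter : IAuditWriter
{
    private readonly Action<object> _tell;

    public ActorAuditWriter(Action<object> tell) => _tell = tell;

    public Task WriteAsync(AuditEvent auditEvent, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        _tell(auditEvent); // fire-and-forget; persistence happens later in the actor
        return Task.CompletedTask;
    }
}
```

Because the adapter completes as soon as the message is handed off, a completed `WriteAsync` would mean "accepted for batching", not "persisted", which matches the best-effort semantics described in §1.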

**Keep bespoke (thin adapter only):**

- Transport — the cluster-broadcast → singleton `AuditWriterActor`, batching, and flush triggers.
- Storage — the `ConfigAuditLog` EF entity, indexes, and `UX_ConfigAuditLog_EventId` idempotency
  index. Map the canonical record onto the existing columns; add an `Outcome` column (or fold it into
  `EventType`/`DetailsJson` if a schema change is undesirable). `ClusterId`/`GenerationId` remain
  OtOpcUa-specific columns fed via `DetailsJson` or kept as side columns.
- Domain vocabulary — the EventType strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …)
  and the `Category:Action` composition convention.
- Query/UI — `ClusterAudit.razor` and its `ClusterId` filter.

**Reconcile, not extract:**

- The **two producers** (Akka `AuditEvent` path vs. SQL stored-procedure `INSERT`s using
  `SUSER_SNAME()`). The SP path bypasses the canonical record entirely and writes a different
  column convention (bare `EventType`, NULL `EventId`/`CorrelationId`, populated
  `ClusterId`/`GenerationId`). Adopting the library does not by itself unify these; either route the
  SP events through the actor or accept that SP rows stay non-idempotent and outside the
  `EventId` dedup guarantee. Flag for the normalization spec.
- The **`ClusterId`-filter / actor-never-sets-`ClusterId`** mismatch noted in §1 — fix when the
  query surface is normalized so structured `AuditEvent` rows are discoverable by cluster.