From e498bb7c5ab10d8e2e4caf85dba559b7a96b3505 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 1 Jun 2026 06:55:07 -0400
Subject: [PATCH] docs(audit): current-state OtOpcUa

---
 .../current-state/otopcua/CURRENT-STATE.md | 140 ++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 components/audit/current-state/otopcua/CURRENT-STATE.md

diff --git a/components/audit/current-state/otopcua/CURRENT-STATE.md b/components/audit/current-state/otopcua/CURRENT-STATE.md
new file mode 100644
index 0000000..3c18263
--- /dev/null
+++ b/components/audit/current-state/otopcua/CURRENT-STATE.md
@@ -0,0 +1,140 @@

# Audit — current state: OtOpcUa

Repo: `~/Desktop/OtOpcUa` (Gitea `lmxopcua`). Stack: .NET 10, Akka.NET cluster, EF Core + SQL Server.
Full paths are relative to the repo root; bare file names and `Migrations/` paths are shortened.
Verified against source on 2026-06-01.

OtOpcUa already has a structured, idempotent audit pipeline. It consists of a cluster-broadcast `AuditEvent`
message, a cluster-singleton writer actor that batches and bulk-inserts, and an append-only
`ConfigAuditLog` EF entity with two-layer dedup. There is **also** a second, older write path —
SQL stored procedures that `INSERT dbo.ConfigAuditLog` directly — so the table has two
producers with different column conventions (see §1, **Scope**).

## 1. How it works today

**Record shape** — `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs:9-17`:
a sealed record `AuditEvent(Guid EventId, string Category, string Action, string Actor,
DateTime OccurredAtUtc, string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId)`.
`NodeId` and `CorrelationId` are Commons value types. `NodeId` wraps a string: the *logical
cluster node / host name*, explicitly **not** an OPC UA NodeId per its XML doc
(`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/NodeId.cs:3-8`). `CorrelationId` wraps a `Guid`
(`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/CorrelationId.cs:3`).
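
The sketch below is a rough, language-neutral illustration of the record shape above. It uses Python frozen dataclasses as stand-ins for the C# sealed record and the two Commons value types. Field names are transliterated to snake_case, and the sample values (`"alice"`, `"node-a"`) are made up. It shows the shape and its value-equality semantics only; it is not OtOpcUa code.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class NodeId:
    """Logical cluster node / host name (NOT an OPC UA NodeId)."""
    value: str


@dataclass(frozen=True)
class CorrelationId:
    """Wraps the GUID shared by related audit events."""
    value: uuid.UUID


@dataclass(frozen=True)
class AuditEvent:
    """Immutable audit message; fields mirror the C# sealed record in order."""
    event_id: uuid.UUID
    category: str
    action: str
    actor: str
    occurred_at_utc: datetime
    details_json: Optional[str]
    source_node: NodeId
    correlation_id: CorrelationId


# Example instance (values are illustrative only).
evt = AuditEvent(
    event_id=uuid.uuid4(),
    category="Config",
    action="Edit",
    actor="alice",
    occurred_at_utc=datetime.now(timezone.utc),
    details_json=None,
    source_node=NodeId("node-a"),
    correlation_id=CorrelationId(uuid.uuid4()),
)

# Records are immutable: reassigning a field raises instead of mutating.
try:
    evt.actor = "bob"
except FrozenInstanceError:
    print("AuditEvent is immutable")
```

Two `NodeId("node-a")` instances compare equal, which matches C# record or struct value equality (assuming the Commons types use one of those). The separate wrapper type keeps a host name visibly distinct from an OPC UA node identifier in signatures.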

**Transport** — `AuditEvent` is an Akka message meant to be sent to the `AuditWriterActor`
**cluster singleton** (`AuditEvent.cs:6` describes it as "cluster-broadcast … consumed by the
`AuditWriterActor` singleton"). The singleton is registered through Akka.Hosting at
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ServiceCollectionExtensions.cs:68-75`
(`WithSingleton(AuditWriterSingletonName, …)`). Any cluster member can emit an `AuditEvent`;
the singleton is the one sink that persists it.

**Storage** — EF entity `ConfigAuditLog`
(`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs:7-44`). The entity is append-only
("Grants revoked for UPDATE/DELETE on all principals", `ConfigAuditLog.cs:4-5`). Columns
(`?` = nullable): `AuditId` (identity PK), `Timestamp` (default `SYSUTCDATETIME()`), `Principal`,
`EventType`, `ClusterId?`, `NodeId?`, `GenerationId?`, `DetailsJson?`, `EventId?` (Guid),
`CorrelationId?` (Guid). Mapping and constraints are in `OtOpcUaConfigDbContext.cs:429-463`:
- `DetailsJson` must be valid JSON (`CK_ConfigAuditLog_DetailsJson_IsJson`, lines 435-436).
- `Principal`/`EventType`/`ClusterId`/`NodeId` are length-capped (lines 441-444).
- The supporting indexes are `IX_ConfigAuditLog_Cluster_Time` (lines 449-451) and
  `IX_ConfigAuditLog_Generation` (lines 452-454).

**Writer / batching** — `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs`:
a `ReceiveActor` with `FlushBatchSize = 500` (line 25) and `FlushInterval = 5s` (line 26).
It buffers events in a `Dictionary` keyed by `EventId` (line 30). It flushes when the buffer
reaches 500 (line 60), when the 5 s periodic timer fires (started in `PreStart`, lines 50-53),
or on `PreRestart`/`PostStop` (lines 96-107), so a supervisor restart or a coordinated shutdown
does not lose the buffer.
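
The buffering and flush triggers above can be sketched as a plain Python class under a few simplifying assumptions:
- A dict keyed by event id stands in for the `Dictionary`.
- A `sink` callable stands in for the EF Core insert.
- `on_tick`/`on_stop` stand in for the timer message and the `PreRestart`/`PostStop` hooks.

The names `AuditBuffer`, `sink`, `on_tick`, `on_stop` and `pending` are hypothetical. The flush body also anticipates the snapshot, clear and drop-on-failure behaviour of `FlushBuffer` described next.

```python
import logging
from typing import Callable, Dict, Hashable, List

log = logging.getLogger("audit")

FLUSH_BATCH_SIZE = 500  # mirrors FlushBatchSize in AuditWriterActor


class AuditBuffer:
    """Buffer keyed by event id; flushed on size, on a timer tick, or on stop."""

    def __init__(self, sink: Callable[[List[object]], None],
                 batch_size: int = FLUSH_BATCH_SIZE) -> None:
        self._sink = sink              # stand-in for the EF Core insert
        self._batch_size = batch_size
        self._buffer: Dict[Hashable, object] = {}

    def handle_event(self, event_id: Hashable, event: object) -> None:
        # A repeated event id overwrites the buffered entry: last write wins.
        self._buffer[event_id] = event
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def on_tick(self) -> None:
        # In the actor, this is the periodic 5 s timer scheduled in PreStart.
        self.flush()

    def on_stop(self) -> None:
        # PreRestart/PostStop: drain so a restart or shutdown does not lose events.
        self.flush()

    def pending(self) -> int:
        return len(self._buffer)

    def flush(self) -> None:
        if not self._buffer:
            return
        batch = list(self._buffer.values())  # snapshot ...
        self._buffer.clear()                  # ... then clear before writing
        try:
            self._sink(batch)
        except Exception:  # best-effort: log and drop, no retry or dead-letter
            log.exception("audit flush failed; dropping %d events", len(batch))
```

Because the buffer is cleared before the write, a failed flush is lossy: the events are already gone when the exception is caught. An actor processes one message at a time, so it needs no lock around the dictionary. This plain class is likewise not thread-safe.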

`FlushBuffer` (lines 63-93) snapshots and clears the buffer. It then constructs a `ConfigAuditLog`
row for each event (lines 75-84): `Timestamp = OccurredAtUtc`, `Principal = Actor`,
`EventType = $"{Category}:{Action}"`, `NodeId = SourceNode.Value`, `DetailsJson`, `EventId`,
`CorrelationId = CorrelationId.Value`. A failed flush is logged and the batch is **dropped**
(`catch` at lines 89-92). Persistence is therefore best-effort, with no retry or dead-letter.

**Dedup / idempotency (two layers)** — described at `AuditWriterActor.cs:17-21`:
1. *In-buffer* — duplicate `EventId`s within a batch collapse via the dictionary (last-write-wins;
   `HandleEvent`, lines 55-61).
2. *Database* — a **filtered unique index** `UX_ConfigAuditLog_EventId` (`OtOpcUaConfigDbContext.cs:459-462`,
   `IsUnique()` + `HasFilter("[EventId] IS NOT NULL")`) gives cross-restart safety. A retry of an
   already-flushed batch hits the constraint, the duplicate insert is dropped, and the rest of the
   batch survives.
   
   Two caveats apply:
   - The actor never retries a failed flush, so "retry" here means the same `EventId` arriving
     again, for example re-emitted after a restart.
   - Whether the non-duplicate rows in that batch survive depends on how `FlushBuffer` handles the
     violation, since a failed flush drops the whole batch. This is worth confirming.
   
   `EventId`/`CorrelationId` are nullable. The `IS NOT NULL` filter keeps legacy/backfill rows
   (NULL) out of the index so they don't collide; an unfiltered SQL Server unique index admits
   only one NULL. This is confirmed in the entity XML docs (`ConfigAuditLog.cs:33-43`) and in
   migration `Migrations/20260526105027_AddConfigAuditLogEventIdColumns.cs:27-38`.

**Scope** — two producers, two conventions:
- **Akka `AuditEvent` path** (the structured one): config writes and authorization checks. The
  entity's XML doc (`ConfigAuditLog.cs:18`) lists the EventType vocabulary: `DraftCreated |
  DraftEdited | Published | RolledBack | NodeApplied | CredentialAdded | CredentialDisabled |
  ClusterCreated | NodeAdded | ExternalIdReleased | CrossClusterNamespaceAttempt |
  OpcUaAccessDenied | …`. The access-denied and cross-cluster entries are authz-check events,
  not config writes. Several of these names are also written, bare, by the stored procedures
  below. The actor instead persists `Category:Action` strings (e.g. `Config:Edit` in the tests).
- **SQL stored-procedure path** (older, still present): several SPs `INSERT dbo.ConfigAuditLog`
  directly, for example
  `Published`/`RolledBack`/`NodeApplied`/`ExternalIdReleased`/`CrossClusterNamespaceAttempt`
  in `Migrations/20260417215224_StoredProcedures.cs:151,217,351,407,504`. These SPs differ from the
  actor path in four ways:
  - They use `SUSER_SNAME()` (the SQL login of the connection, not necessarily the end user) as
    `Principal`.
  - They set `ClusterId`/`GenerationId`.
  - They write a **bare** `EventType` (no `Category:Action` split).
  - They leave `EventId`/`CorrelationId` NULL.

**Query / UI** — the only read surface is the Admin UI page
`src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/ClusterAudit.razor`
(`@page "/clusters/{ClusterId}/audit"`, `[Authorize]`, lines 1-2). It reads the latest
`PageSize = 200` rows (line 69) **filtered by `ClusterId`**, newest-first (`OnInitializedAsync`,
lines 74-82). It renders the columns Timestamp / Principal / Event (`EventType`) / Node /
Correlation (first 8 hex chars) / Details (lines 38-58).

The writer side (not the UI) is covered by
`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AuditWriterActorTests.cs`:
- count-threshold flush (lines 26-41);
- in-buffer dedup of duplicate EventIds (lines 45-62);
- `PostStop` flush (lines 66-81);
- the column mapping, including `EventType == "Config:Edit"` and `NodeId == "node-a"` (lines 85-104).

> Load-bearing gotcha: the actor path **never sets `ClusterId`** (`AuditWriterActor.cs:75-84`),
> but the UI filters on `ClusterId` (`ClusterAudit.razor:78`). So today the cluster-scoped view
> surfaces only the stored-procedure rows. Structured `AuditEvent` rows written by the actor carry
> the host in `NodeId`, not `ClusterId`, so they won't appear under any cluster. Worth flagging
> during normalization.

## 2. Mapping to the canonical `AuditEvent`

Target = `ZB.MOM.WW.Audit.AuditEvent` (built in parallel). OtOpcUa's existing `AuditEvent` is
already almost field-for-field aligned. `Outcome` is the only field that must be synthesized, and
`Target` has no source today.

| Canonical field | OtOpcUa source | Mapping |
|---|---|---|
| `Guid EventId` | `AuditEvent.EventId` | Direct. Already the idempotency key (buffer key + `UX_ConfigAuditLog_EventId`). |
| `DateTimeOffset OccurredAtUtc` | `AuditEvent.OccurredAtUtc` (`DateTime`) | Direct; widen `DateTime` (UTC) → `DateTimeOffset`. Check for `Kind == DateTimeKind.Utc` first, because the `DateTimeOffset(DateTime)` conversion applies the local offset to `Local`/`Unspecified` values. |
| `string Actor` | `AuditEvent.Actor` | Direct (→ `ConfigAuditLog.Principal`). At Auth adoption this becomes the `ZB.MOM.WW.Auth` principal. |
| `string Action` | `AuditEvent.Action` (+ `Category`) | Direct. Today persisted as `"{Category}:{Action}"` in `EventType`; canonical keeps `Action` and `Category` separate. |
| `AuditOutcome Outcome` | *(none)* | **Derived** from the EventType vocabulary; not stored today. `OpcUaAccessDenied`/`CrossClusterNamespaceAttempt` → `Denied`; the config-write verbs → `Success`. Nothing in today's vocabulary maps to `Failure` (a failed flush is dropped, not recorded as an event). |
| `string? Category` | `AuditEvent.Category` | Direct (e.g. `"Config"`). |
| `string? Target` | *(none)* | No dedicated field today. `SourceNode` (→ `NodeId`) is the acting host, not the affected object, so it is not a substitute. Leave `Target` null or carry the affected object in `DetailsJson`. |
| `string? SourceNode` | `AuditEvent.SourceNode` (`NodeId.Value`) | Direct: the logical cluster node / host name (NOT an OPC UA NodeId). Currently lands in `ConfigAuditLog.NodeId`. |
| `Guid? CorrelationId` | `AuditEvent.CorrelationId` (`CorrelationId.Value`) | Direct. |
| `string? DetailsJson` | `AuditEvent.DetailsJson` | Direct; carries everything else (incl. `ClusterId`/`GenerationId`, which today are separate columns on the SP path). |

## 3. Adoption plan → `ZB.MOM.WW.Audit`

**Effort: medium.** OtOpcUa is the *donor* design for the canonical record. Most of the work is
therefore re-pointing types and bridging two persistence conventions, not redesigning the pipeline.

**Replace with the shared library:**
- `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs` → the canonical
  `ZB.MOM.WW.Audit.AuditEvent`. Add the new `Outcome` field (derive it at every emit site from the
  EventType vocabulary, e.g.
  `OpcUaAccessDenied → Denied`), and keep `Category`/`Action`/`SourceNode`/`CorrelationId` as-is.
  Decide whether `SourceNode`/`CorrelationId` carry the Commons value types or the canonical
  primitives at the seam (likely a thin adapter at construction).
- `AuditWriterActor` → implement the library's `IAuditWriter`. Keep the actor as OtOpcUa's Akka
  cluster-singleton transport/batching adapter behind that seam. The 500/5 s batching,
  `PreRestart`/`PostStop` flush, and two-layer dedup stay bespoke per §"left per-project".

**Keep bespoke (thin adapter only):**
- Transport — the cluster-broadcast → singleton `AuditWriterActor`, batching, and flush triggers.
- Storage — the `ConfigAuditLog` EF entity, its indexes, and the `UX_ConfigAuditLog_EventId`
  idempotency index:
  - Map the canonical record onto the existing columns.
  - Add an `Outcome` column, or fold `Outcome` into `EventType`/`DetailsJson` if a schema change is
    undesirable.
  - `ClusterId`/`GenerationId` remain OtOpcUa-specific columns, either populated from
    `DetailsJson` or set directly as side columns.
- Domain vocabulary — the EventType strings (`DraftCreated`, `Published`, `OpcUaAccessDenied`, …)
  and the `Category:Action` composition convention.
- Query/UI — `ClusterAudit.razor` and its `ClusterId` filter.

**Reconcile, not extract:**
- The **two producers**: the Akka `AuditEvent` path vs. the SQL stored-procedure `INSERT`s using
  `SUSER_SNAME()`. The SP path bypasses the canonical record entirely and writes a different
  column convention: bare `EventType`, NULL `EventId`/`CorrelationId`, and populated
  `ClusterId`/`GenerationId`. Adopting the library does not by itself unify these. Either route
  the SP events through the actor, or accept that SP rows stay non-idempotent and outside the
  `EventId` dedup guarantee. Flag for the normalization spec.
- The **`ClusterId`-filter / actor-never-sets-`ClusterId`** mismatch noted in §1. Fix it when the
  query surface is normalized, so that structured `AuditEvent` rows are discoverable by cluster.
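
The adapter described under **Keep bespoke** can be sketched minimally, in Python for brevity. The sketch does three things:
- It maps a canonical event onto the existing `ConfigAuditLog` columns.
- It derives `Outcome` the way §2 proposes.
- It lifts `ClusterId`/`GenerationId` out of `DetailsJson`, so that cluster-filtered queries can find actor-written rows.

Several parts are assumptions:
- `CanonicalAuditEvent` only mirrors the §2 table; the real library type is still being built.
- The JSON keys `"ClusterId"`/`"GenerationId"` are hypothetical.
- The `Outcome` column does not exist yet.

```python
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Per the section 2 proposal: authz denials map to Denied, config-write verbs to Success.
DENIED_ACTIONS = {"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"}


@dataclass(frozen=True)
class CanonicalAuditEvent:
    """Hypothetical mirror of ZB.MOM.WW.Audit.AuditEvent (fields from the section 2 table)."""
    event_id: uuid.UUID
    occurred_at_utc: datetime
    actor: str
    action: str
    outcome: str
    category: Optional[str] = None
    target: Optional[str] = None
    source_node: Optional[str] = None
    correlation_id: Optional[uuid.UUID] = None
    details_json: Optional[str] = None


def derive_outcome(action: str) -> str:
    return "Denied" if action in DENIED_ACTIONS else "Success"


def to_config_audit_row(evt: CanonicalAuditEvent) -> Dict[str, Any]:
    """Map a canonical event onto ConfigAuditLog columns (AuditId is DB-assigned)."""
    if evt.occurred_at_utc.tzinfo is None:
        # Refuse naive timestamps rather than guess a local offset.
        raise ValueError("occurred_at_utc must be timezone-aware")
    details = json.loads(evt.details_json) if evt.details_json else {}
    if not isinstance(details, dict):
        details = {}  # only a JSON object can carry ClusterId/GenerationId
    return {
        "Timestamp": evt.occurred_at_utc.astimezone(timezone.utc),
        "Principal": evt.actor,
        # Keep today's Category:Action composition for EventType.
        "EventType": f"{evt.category}:{evt.action}" if evt.category else evt.action,
        "Outcome": evt.outcome,  # assumes a new Outcome column
        "ClusterId": details.get("ClusterId"),  # lets the cluster-scoped UI see the row
        "GenerationId": details.get("GenerationId"),
        "NodeId": evt.source_node,
        "DetailsJson": evt.details_json,
        "EventId": evt.event_id,
        "CorrelationId": evt.correlation_id,
    }
```

Keeping the composed `Category:Action` string in `EventType` leaves the UI and existing queries untouched. Whether `ClusterId` should travel inside `DetailsJson` or as a separate adapter input is the open choice listed under **Keep bespoke**.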