Audit — current state: OtOpcUa

Repo: ~/Desktop/OtOpcUa (Gitea lmxopcua). Stack: .NET 10, Akka.NET cluster, EF Core + SQL Server. All paths below are relative to the repo root. Verified against source on 2026-06-01.

OtOpcUa already has a structured, idempotent audit pipeline: a cluster-broadcast AuditEvent message, a cluster-singleton writer actor that batches and bulk-inserts, and an append-only ConfigAuditLog EF entity with two-layer dedup. There is also a second, older write path — SQL stored procedures that INSERT dbo.ConfigAuditLog directly — so the table has two producers with slightly different column conventions (see §1).

1. How it works today

Record shape: src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Audit/AuditEvent.cs:9-17 defines a sealed record AuditEvent(Guid EventId, string Category, string Action, string Actor, DateTime OccurredAtUtc, string? DetailsJson, NodeId SourceNode, CorrelationId CorrelationId). NodeId and CorrelationId are Commons value-types: NodeId wraps a string (the logical cluster node / host name, explicitly not an OPC UA NodeId per its XML doc, src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/NodeId.cs:3-8); CorrelationId wraps a Guid (src/Core/ZB.MOM.WW.OtOpcUa.Commons/Types/CorrelationId.cs:3).

Transport: AuditEvent is an Akka message meant to be sent to the AuditWriterActor cluster singleton (AuditEvent.cs:6 describes it as "cluster-broadcast … consumed by the AuditWriterActor singleton"). The singleton is registered through Akka.Hosting at src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/ServiceCollectionExtensions.cs:68-75 (WithSingleton<AuditWriterActorKey>(AuditWriterSingletonName, …)). Any cluster member can emit an AuditEvent; the singleton is the one sink that persists it.

Storage: EF entity ConfigAuditLog (src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ConfigAuditLog.cs:7-44), append-only ("Grants revoked for UPDATE/DELETE on all principals", ConfigAuditLog.cs:4-5). Columns: AuditId (identity PK), Timestamp (default SYSUTCDATETIME()), Principal, EventType, ClusterId?, NodeId?, GenerationId?, DetailsJson?, EventId? (Guid), CorrelationId? (Guid). Mapping/constraints in OtOpcUaConfigDbContext.cs:429-463: DetailsJson must be valid JSON (CK_ConfigAuditLog_DetailsJson_IsJson, lines 435-436); Principal/EventType/ClusterId/NodeId are length-capped (lines 441-444); supporting indexes are IX_ConfigAuditLog_Cluster_Time (lines 449-451) and IX_ConfigAuditLog_Generation (lines 452-454).

Writer / batching: src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs is a ReceiveActor with FlushBatchSize = 500 (line 25) and FlushInterval = 5s (line 26). It buffers events in a Dictionary<Guid, AuditEvent> keyed by EventId (line 30), flushing when the buffer hits 500 (line 60), when the 5s periodic timer fires (PreStart, lines 50-53), or on PreRestart/PostStop (lines 96-107) so a supervisor swap or coordinated shutdown does not lose the buffer. FlushBuffer (lines 63-93) snapshots and clears the buffer, then for each event constructs a ConfigAuditLog row (lines 75-84): Timestamp = OccurredAtUtc, Principal = Actor, EventType = $"{Category}:{Action}", NodeId = SourceNode.Value, DetailsJson, EventId, CorrelationId = CorrelationId.Value. A failed flush is logged and the batch is dropped (catch at lines 89-92): delivery is best-effort, with no retry or dead-letter.
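
The buffering rules above can be modeled in a few lines. This is an illustrative, language-neutral sketch in Python, not the C# actor: AuditBuffer, its sink callback, pending_count, and the trimmed AuditEvent shape are hypothetical names. The 5s timer and the PreRestart/PostStop hooks are not modeled; a host would call flush() from them.

```python
from dataclasses import dataclass
from typing import Callable
from uuid import UUID, uuid4

FLUSH_BATCH_SIZE = 500  # count threshold quoted in the text


@dataclass(frozen=True)
class AuditEvent:
    """Trimmed, hypothetical stand-in for the real record."""
    event_id: UUID
    category: str
    action: str
    actor: str


class AuditBuffer:
    """Hypothetical model of the writer's buffer (assumption, not the real actor)."""

    def __init__(self, sink: Callable[[list], None], batch_size: int = FLUSH_BATCH_SIZE):
        self._sink = sink
        self._batch_size = batch_size
        self._pending: dict = {}  # event_id -> AuditEvent

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def handle(self, event: AuditEvent) -> None:
        # Keyed by event_id, so a duplicate id overwrites the earlier entry.
        self._pending[event.event_id] = event
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        # Snapshot and clear before writing: a failed write drops the batch.
        batch = list(self._pending.values())
        self._pending.clear()
        try:
            self._sink(batch)
        except Exception:
            pass  # best-effort; the real actor logs the failure


if __name__ == "__main__":
    written = []
    buf = AuditBuffer(written.append, batch_size=2)
    first = AuditEvent(uuid4(), "Config", "Edit", "alice")
    buf.handle(first)
    buf.handle(first)  # duplicate EventId, still one buffered entry
    buf.handle(AuditEvent(uuid4(), "Config", "Edit", "bob"))  # reaches threshold
    print(len(written), len(written[0]))
```

The sketch makes the trade-off visible: because the buffer is cleared before the sink runs, a sink failure can never cause a double write, at the cost of losing that batch.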

Dedup / idempotency (two layers) — described at AuditWriterActor.cs:17-21:

  1. In-buffer — duplicate EventIds within a batch collapse via the dictionary (last-write-wins; HandleEvent, lines 55-61).
  2. Database — a filtered unique index UX_ConfigAuditLog_EventId (OtOpcUaConfigDbContext.cs:459-462, IsUnique() + HasFilter("[EventId] IS NOT NULL")) gives cross-restart safety: a retry of an already-flushed batch hits the constraint, the duplicate insert is dropped, and the rest of the batch survives. EventId/CorrelationId are nullable so legacy/backfill rows (NULL) don't collide — confirmed in the entity XML (ConfigAuditLog.cs:33-43) and migration Migrations/20260526105027_AddConfigAuditLogEventIdColumns.cs:26-31.
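
The database layer of the dedup can be tried out with a filtered unique index in SQLite. This is a sketch under assumptions: SQLite stands in for SQL Server, the table is cut down to three columns, and the row-by-row insert with a caught constraint error is only one way to get "duplicate dropped, rest of the batch survives"; the source does not show how the real writer handles the violation. open_db, insert_batch, and row_count are hypothetical helpers.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ConfigAuditLog ("
        " AuditId INTEGER PRIMARY KEY AUTOINCREMENT,"
        " EventType TEXT NOT NULL,"
        " EventId TEXT NULL)"
    )
    # Filtered (partial) unique index: NULL EventIds are exempt from uniqueness.
    conn.execute(
        "CREATE UNIQUE INDEX UX_ConfigAuditLog_EventId"
        " ON ConfigAuditLog (EventId) WHERE EventId IS NOT NULL"
    )
    return conn


def insert_batch(conn: sqlite3.Connection, rows) -> int:
    """Insert (event_type, event_id) rows one by one; return how many were skipped."""
    skipped = 0
    for event_type, event_id in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO ConfigAuditLog (EventType, EventId) VALUES (?, ?)",
                    (event_type, event_id),
                )
        except sqlite3.IntegrityError:
            skipped += 1  # duplicate EventId rejected by the unique index
    return skipped


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM ConfigAuditLog").fetchone()[0]


if __name__ == "__main__":
    conn = open_db()
    batch = [("Config:Edit", str(uuid.uuid4())), ("Published", None)]
    insert_batch(conn, batch)
    # Retrying the same batch: the row with an EventId is rejected,
    # while the NULL-EventId row is inserted a second time.
    print(insert_batch(conn, batch), row_count(conn))
```

Note what the retry shows: rows with a NULL EventId are never deduplicated, which is exactly why legacy and stored-procedure rows sit outside the idempotency guarantee.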

Scope — two producers, two conventions:

  • Akka AuditEvent path (the structured one): config writes + authorization checks. The EventType vocabulary lives in the entity XML doc (ConfigAuditLog.cs:18): DraftCreated | DraftEdited | Published | RolledBack | NodeApplied | CredentialAdded | CredentialDisabled | ClusterCreated | NodeAdded | ExternalIdReleased | CrossClusterNamespaceAttempt | OpcUaAccessDenied | …. Note the access-denied / cross-cluster entries are authz-check events, not config writes.
  • SQL stored-procedure path (older, still present): several SPs INSERT dbo.ConfigAuditLog directly — e.g. Published/RolledBack/NodeApplied/ExternalIdReleased/CrossClusterNamespaceAttempt in Migrations/20260417215224_StoredProcedures.cs:151,217,351,407,504. These use SUSER_SNAME() as Principal, set ClusterId/GenerationId, write a bare EventType (no Category:Action split), and leave EventId/CorrelationId NULL.
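
Because the two producers write EventType differently, any reader of the table has to tolerate both shapes. A minimal sketch of such a reader, assuming the only difference is the presence of the "Category:Action" separator (split_event_type is a hypothetical helper, not something in the repo):

```python
from typing import Optional, Tuple


def split_event_type(event_type: str) -> Tuple[Optional[str], str]:
    """Return (category, action); category is None for bare SP-style values."""
    category, sep, action = event_type.partition(":")
    if not sep:
        # Stored-procedure convention: bare EventType, no category.
        return None, event_type
    # Actor convention: "{Category}:{Action}".
    return category, action


if __name__ == "__main__":
    print(split_event_type("Config:Edit"))
    print(split_event_type("Published"))
```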

Query / UI: the only read surface is the Admin UI page src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Clusters/ClusterAudit.razor (@page "/clusters/{ClusterId}/audit", [Authorize], lines 1-2). It reads the latest PageSize = 200 rows (line 69) filtered by ClusterId, newest-first (OnInitializedAsync, lines 74-82), and renders Timestamp / Principal / Event(Type) / Node / Correlation(first 8 hex) / Details columns (lines 38-58). The writer actor (not the page) is covered by tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/AuditWriterActorTests.cs: count-threshold flush (lines 26-41), in-buffer dedup of duplicate EventIds (lines 45-62), PostStop flush (lines 66-81), and the column mapping incl. EventType == "Config:Edit" and NodeId == "node-a" (lines 85-104).

Load-bearing gotcha: the actor path never sets ClusterId (lines 75-84), but the UI filters on ClusterId (ClusterAudit.razor:78). So today the cluster-scoped view surfaces the stored-procedure rows; structured AuditEvent rows written by the actor (which carry the host in NodeId, not ClusterId) won't appear under a cluster. Worth flagging during normalization.
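
The effect of the mismatch is easy to reproduce with two sample rows. The following is a sketch with made-up data: cluster_view mimics the page's ClusterId filter, newest-first ordering, and 200-row page size, while sample_rows, the "cluster-1" value, and the timestamps are hypothetical.

```python
from datetime import datetime, timezone


def sample_rows():
    return [
        {   # SP-style row: ClusterId populated, bare EventType, no EventId.
            "Timestamp": datetime(2026, 6, 1, 10, 0, tzinfo=timezone.utc),
            "EventType": "Published",
            "ClusterId": "cluster-1",
            "NodeId": None,
        },
        {   # Actor-style row: ClusterId never set, host carried in NodeId.
            "Timestamp": datetime(2026, 6, 1, 11, 0, tzinfo=timezone.utc),
            "EventType": "Config:Edit",
            "ClusterId": None,
            "NodeId": "node-a",
        },
    ]


def cluster_view(rows, cluster_id, page_size=200):
    """Filter by ClusterId, newest first, capped at page_size."""
    matching = [r for r in rows if r["ClusterId"] == cluster_id]
    matching.sort(key=lambda r: r["Timestamp"], reverse=True)
    return matching[:page_size]


if __name__ == "__main__":
    for row in cluster_view(sample_rows(), "cluster-1"):
        print(row["EventType"])  # only the SP-style row is listed
```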

2. Mapping to the canonical AuditEvent

Target = ZB.MOM.WW.Audit.AuditEvent (built in parallel). OtOpcUa's existing AuditEvent is already almost field-for-field aligned; the only synthesized field is Outcome.

| Canonical field | OtOpcUa source | Mapping |
|---|---|---|
| Guid EventId | AuditEvent.EventId | Direct. Already the idempotency key (buffer key + UX_ConfigAuditLog_EventId). |
| DateTimeOffset OccurredAtUtc | AuditEvent.OccurredAtUtc (DateTime) | Direct; widen DateTime(UTC) → DateTimeOffset. |
| string Actor | AuditEvent.Actor | Direct (→ ConfigAuditLog.Principal). At Auth adoption this becomes the ZB.MOM.WW.Auth principal. |
| string Action | AuditEvent.Action (+ Category) | Direct. Today persisted as "{Category}:{Action}" in EventType; canonical keeps Action and Category separate. |
| AuditOutcome Outcome | (none) | Derived from the EventType vocabulary, not stored today. OpcUaAccessDenied/CrossClusterNamespaceAttempt → Denied; the config-write verbs → Success. No explicit Failure value exists yet (a failed flush is dropped, not recorded as an event). |
| string? Category | AuditEvent.Category | Direct (e.g. "Config"). |
| string? Target | (none) | No dedicated field today; the closest is SourceNode / NodeId (the acting host) or details. Leave null or carry the affected object in DetailsJson. |
| string? SourceNode | AuditEvent.SourceNode (NodeId.Value) | Direct: the logical cluster node / host name (NOT an OPC UA NodeId). Currently lands in ConfigAuditLog.NodeId. |
| Guid? CorrelationId | AuditEvent.CorrelationId (CorrelationId.Value) | Direct. |
| string? DetailsJson | AuditEvent.DetailsJson | Direct; carries everything else (incl. ClusterId/GenerationId, which today are separate columns on the SP path). |
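
The mapping can be read as one small conversion function. The sketch below is an illustration in Python, not the canonical library (which is being built in parallel): the class names, snake_case fields, and to_canonical are hypothetical, and it assumes the denied vocabulary terms arrive in the Action field, which the source does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class AuditOutcome(Enum):
    SUCCESS = "Success"
    DENIED = "Denied"


# Assumption: these vocabulary terms show up as the event's Action.
DENIED_ACTIONS = {"OpcUaAccessDenied", "CrossClusterNamespaceAttempt"}


@dataclass(frozen=True)
class OtOpcUaAuditEvent:
    event_id: UUID
    category: str
    action: str
    actor: str
    occurred_at_utc: datetime      # may be naive, like a .NET DateTime
    details_json: Optional[str]
    source_node: str               # NodeId.Value
    correlation_id: UUID           # CorrelationId.Value


@dataclass(frozen=True)
class CanonicalAuditEvent:
    event_id: UUID
    occurred_at_utc: datetime      # timezone-aware
    actor: str
    action: str
    outcome: AuditOutcome
    category: Optional[str]
    target: Optional[str]
    source_node: Optional[str]
    correlation_id: Optional[UUID]
    details_json: Optional[str]


def to_canonical(e: OtOpcUaAuditEvent) -> CanonicalAuditEvent:
    occurred = e.occurred_at_utc
    if occurred.tzinfo is None:
        # Widen a naive UTC timestamp to an offset-aware one.
        occurred = occurred.replace(tzinfo=timezone.utc)
    outcome = AuditOutcome.DENIED if e.action in DENIED_ACTIONS else AuditOutcome.SUCCESS
    return CanonicalAuditEvent(
        event_id=e.event_id,
        occurred_at_utc=occurred,
        actor=e.actor,
        action=e.action,
        outcome=outcome,
        category=e.category,
        target=None,               # no dedicated source field today
        source_node=e.source_node,
        correlation_id=e.correlation_id,
        details_json=e.details_json,
    )
```

Outcome is the only field computed rather than copied, which matches the note that it is the single synthesized field.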

3. Adoption plan → ZB.MOM.WW.Audit

Effort: medium. OtOpcUa is the donor design for the canonical record, so most of the work is re-pointing types and bridging two persistence conventions, not redesigning the pipeline.

Replace with the shared library:

  • Commons/Messages/Audit/AuditEvent.cs → the canonical ZB.MOM.WW.Audit.AuditEvent. Add the new Outcome field (derive it at every emit site from the EventType vocabulary, e.g. OpcUaAccessDenied → Denied); keep Category/Action/SourceNode/CorrelationId as-is. Decide whether SourceNode/CorrelationId carry the Commons value-types or the canonical primitives at the seam (likely a thin adapter at construction).
  • AuditWriterActor → implement the library's IAuditWriter (keep the actor as OtOpcUa's Akka-cluster-singleton transport/batching adapter behind that seam; the 500/5s batching, PreRestart/PostStop flush, and two-layer dedup stay bespoke per §"left per-project").

Keep bespoke (thin adapter only):

  • Transport — the cluster-broadcast → singleton AuditWriterActor, batching, and flush triggers.
  • Storage — the ConfigAuditLog EF entity, indexes, and UX_ConfigAuditLog_EventId idempotency index. Map the canonical record onto the existing columns; add an Outcome column (or fold it into EventType/DetailsJson if a schema change is undesirable). ClusterId/GenerationId remain OtOpcUa-specific columns fed via DetailsJson or kept as side columns.
  • Domain vocabulary — the EventType strings (DraftCreated, Published, OpcUaAccessDenied, …) and the Category:Action composition convention.
  • Query/UI — ClusterAudit.razor and its ClusterId filter.

Reconcile, not extract:

  • The two producers (Akka AuditEvent path vs. SQL stored-procedure INSERTs using SUSER_SNAME()). The SP path bypasses the canonical record entirely and writes a different column convention (bare EventType, NULL EventId/CorrelationId, populated ClusterId/GenerationId). Adopting the library does not by itself unify these; either route the SP events through the actor or accept that SP rows stay non-idempotent and absent from the EventId dedup guarantee. Flag for the normalization spec.
  • The ClusterId-filter / actor-never-sets-ClusterId mismatch noted in §1 — fix when the query surface is normalized so structured AuditEvent rows are discoverable by cluster.