Cert-action audit logging + OpcUaProbeResult dead-letter tidy — Design
Date: 2026-06-19
Status: Approved (brainstorming)
Branch: feat/cert-audit-and-probe-tidy (off master 40e8a23e)
Two independent, self-contained backlog items, batched into one branch.
- Item 1: Cert-action audit logging (stillpending.md §A.7 deferred follow-up). The meaty one; standard complexity.
- Item 2: OpcUaProbeResult dead-letter tidy (Phase-4 final-integration-review follow-up (i)). Trivial actor hygiene.
Hard guardrails (both): no EF migration, no Commons/proto/interface change, no bUnit.
Stage by explicit path; never stage the never-stage files (sql_login.txt,
Host/pki/, docker-dev/docker-compose.yml, pending.md, current.md,
stillpending.md). No --no-verify, no force-push. Finish = merge to master + push.
Background: H2 was reclassified, then this was chosen
The session began on the "within-timestamp tie-cluster paging (#400)" follow-up — found
already shipped (2e6c6d3a/0f929ae6, 2026-06-17); reconciled the stale pending.md
banners. Then surveyed the backlog: the only genuinely-OPEN §A item, H2 (OPC UA
HistoryUpdate service), was reclassified infra-gated/DEFERRED (no historian backend
can durably insert/replace/delete — IHistorianDataSource is read-only and the Wonderware
sidecar has no update RPC; same boundary as H5b). The two remaining actionable, in-repo,
non-infra items are the two below.
Item 1 — Cert-action audit logging
Problem
When a FleetAdmin Trusts / Untrusts / Deletes a peer certificate
(Certificates.razor → CertificateStoreManager), nothing is written to the audit DB.
Today only certificate validation is logged (to Serilog). These are security-relevant
mutations and should leave a durable audit trail.
Key discovery that shapes the design
OtOpcUa has no live structured AuditEvent emit sites yet. The Security/AuditActor
helper is explicitly forward-looking ("tested and ready so future emit sites pick up the
correct Actor automatically"), and IAuditWriter is not registered as an injectable
DI service anywhere. But the persistence path exists end-to-end:
- AuditWriterActor (an AdminRole cluster singleton, ControlPlane/Audit/) implements IAuditWriter (from the external ZB.MOM.WW.Audit package), batches AuditEvents, and bulk-inserts into the existing ConfigAuditLog table (filtered-unique-index dedup on EventId).
- AdminUI already reaches an AdminRole singleton from the ActorRegistry: AdminOperationsClient does registry.Get<AdminOperationsActorKey>(). The same pattern works for AuditWriterActorKey.
So this is the first structured emit site — exactly what the infra was built for. The
ConfigAuditLog table and AuditWriterActor already exist ⇒ no EF migration, no Commons
change.
Components
- ActorAuditWriter : IAuditWriter (new, AdminUI/Audit/ActorAuditWriter.cs). Holds the AuditWriterActorKey singleton-proxy IActorRef (resolved from ActorRegistry, mirroring AdminOperationsClient); WriteAsync(evt, ct) does _proxy.Tell(evt) and returns Task.CompletedTask (best-effort, non-blocking; the actor owns batching/dedup/flush). Registered via AddSingleton<IAuditWriter, ActorAuditWriter>() in the AdminUI DI extension.
- CertAuditEvents.Build(action, store, thumbprint, actor, success, error) → AuditEvent (new, AdminUI/Audit/CertAuditEvents.cs). Pure static factory:
  - Category = "Certificate", Action ∈ {Trust, Untrust, Delete}
  - SourceNode = thumbprint
  - DetailsJson = compact JSON { "store": …, "thumbprint": …, "error": …? }
  - Outcome = Success / Failure (from success)
  - Actor = actor, fresh EventId, OccurredAtUtc = DateTimeOffset.UtcNow
  Pure ⇒ fully unit-testable; the implementer confirms the exact AuditEvent required / optional fields + the AuditOutcome enum member names against the ZB.MOM.WW.Audit package.
- CertificateStoreManager (modify). Audit at the choke point (the thing that performs the mutation; it already has a unit-test class, so it is far more testable than the razor, which has no bUnit). Each public method gains the actor string and emits the event from the CertActionResult it already computes:
  - Trust(string thumbprint, string actor)
  - Untrust(string thumbprint, string actor)
  - Delete(string store, string thumbprint, string actor)
  Production ctor gains IAuditWriter _audit; the internal test ctor defaults it to a null/fake writer so existing tests compile. Each method calls _audit.WriteAsync(CertAuditEvents.Build(...)) (non-blocking), including the not-found / invalid-thumbprint failure paths (audited with Outcome = Failure).
- Certificates.razor (modify). Pass the authenticated principal name into the manager calls: authState.User.Identity?.Name ?? "system" (already fetched at the FleetAdmin re-check, ~line 185). Sourcing the actor from AuthenticationState (not the scoped IAuditActorAccessor / HttpContext) avoids the Blazor-Server HttpContext-null pitfall over the SignalR circuit.
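The CertAuditEvents factory listed above could look roughly like the following sketch. The AuditEvent record, the AuditOutcome enum and the CertAction enum are stand-ins invented for illustration: the real types come from the ZB.MOM.WW.Audit package, their exact members still have to be confirmed, and the design does not say what type the action parameter has.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Stand-ins (assumptions): the real AuditEvent / AuditOutcome live in
// ZB.MOM.WW.Audit and may have different members.
public enum AuditOutcome { Success, Failure }

public sealed record AuditEvent(
    Guid EventId,
    DateTimeOffset OccurredAtUtc,
    string Category,
    string Action,
    string SourceNode,
    string Actor,
    AuditOutcome Outcome,
    string DetailsJson);

// Hypothetical enum; the design only lists the three action names.
public enum CertAction { Trust, Untrust, Delete }

public static class CertAuditEvents
{
    public static AuditEvent Build(
        CertAction action, string store, string thumbprint,
        string actor, bool success, string? error = null)
    {
        // Compact details payload; "error" is only emitted when there is one.
        var details = new Dictionary<string, string>
        {
            ["store"] = store,
            ["thumbprint"] = thumbprint,
        };
        if (!string.IsNullOrEmpty(error))
        {
            details["error"] = error;
        }

        return new AuditEvent(
            EventId: Guid.NewGuid(),
            OccurredAtUtc: DateTimeOffset.UtcNow,
            Category: "Certificate",
            Action: action.ToString(),
            SourceNode: thumbprint,
            Actor: actor,
            Outcome: success ? AuditOutcome.Success : AuditOutcome.Failure,
            DetailsJson: JsonSerializer.Serialize(details));
    }
}
```

Because the factory touches no I/O and no actor, a test can call Build directly and assert on the returned record.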
Data flow
FleetAdmin clicks Trust
→ Certificates.razor RequestAction (FleetAdmin re-checked)
→ CertificateStoreManager.Trust(thumbprint, actor)
→ filesystem move (rejected → trusted) [unchanged]
→ CertAuditEvents.Build(Trust, store, thumbprint, actor, success, error)
→ IAuditWriter.WriteAsync(evt) (= ActorAuditWriter → proxy.Tell)
→ AuditWriterActor batches → bulk insert → ConfigAuditLog
Both success and failure are audited (Outcome reflects result.Success).
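The flow above, with the audit emitted at the choke point on both the success and the failure path, can be sketched as follows. This is a simplified model, not the real CertificateStoreManager: an in-memory set stands in for the filesystem stores, and AuditEvent, AuditOutcome, IAuditWriter and CertActionResult are reduced stand-ins whose real shapes are assumptions until checked against the code and the ZB.MOM.WW.Audit package. RecordingAuditWriter is the kind of fake a unit test might inject.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-ins (assumptions) for the ZB.MOM.WW.Audit types; real members may differ.
public enum AuditOutcome { Success, Failure }
public sealed record AuditEvent(string Action, string SourceNode, string Actor, AuditOutcome Outcome);

public interface IAuditWriter
{
    Task WriteAsync(AuditEvent evt, CancellationToken ct = default);
}

// Recording fake: keeps every event so a test can assert on count and content.
public sealed class RecordingAuditWriter : IAuditWriter
{
    public List<AuditEvent> Events { get; } = new();

    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        Events.Add(evt);
        return Task.CompletedTask;
    }
}

// Hypothetical result shape; the real CertActionResult may differ.
public sealed record CertActionResult(bool Success, string? Error);

// Simplified manager: in-memory sets replace the rejected/trusted stores.
public sealed class SketchCertificateStoreManager
{
    private readonly HashSet<string> _rejected;
    private readonly HashSet<string> _trusted = new();
    private readonly IAuditWriter _audit;

    public SketchCertificateStoreManager(IEnumerable<string> rejected, IAuditWriter audit)
    {
        _rejected = new HashSet<string>(rejected);
        _audit = audit;
    }

    public CertActionResult Trust(string thumbprint, string actor)
    {
        // Perform the mutation first, then audit whatever the outcome was.
        var result = _rejected.Remove(thumbprint)
            ? new CertActionResult(true, null)
            : new CertActionResult(false, "not found");
        if (result.Success)
        {
            _trusted.Add(thumbprint);
        }

        var outcome = result.Success ? AuditOutcome.Success : AuditOutcome.Failure;
        // Not awaited: the write is best-effort and must not block the action.
        _ = _audit.WriteAsync(new AuditEvent("Trust", thumbprint, actor, outcome));
        return result;
    }
}
```

Calling Trust once with a known thumbprint and once with an unknown one leaves two events in the recording writer, one per outcome.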
Testing (xUnit + Shouldly; no bUnit)
- CertAuditEventsTests: factory output for trust/untrust/delete × success/failure (Category, Action, SourceNode, Outcome, DetailsJson shape).
- Extend CertificateStoreManagerTests: inject a recording fake IAuditWriter; assert exactly one event per action with the correct Action + Outcome, including the not-found failure path.
- ActorAuditWriterTests: a TestKit probe registered under AuditWriterActorKey receives the Tell'd AuditEvent.
- Razor wiring proven by dotnet build + a live docker-dev /run (login disabled on the local rig; agent drives it: deploy, perform a Trust/Untrust on the Certificates page, confirm a new ConfigAuditLog row).
Classification: standard (additive DI + first structured emit path; multi-file).
Item 2 — OpcUaProbeResult dead-letter tidy
Problem
DriverHostActor, HistorianAdapterActor and ScriptedAlarmHostActor subscribe to the
redundancy-state topic (for state changes) but have no handler for the co-published
PeerOpcUaProbeActor.OpcUaProbeResult, so each INFO-dead-letters that message (benign,
capped at 10 then 5-min-suppressed, but noisy).
Change
Add Receive<PeerOpcUaProbeActor.OpcUaProbeResult>(_ => { }) to each of the three actors —
the exact intentional-drop pattern already used by PeerProbeSupervisor.cs:69 and
OpcUaPublishActor (which actually consumes it). ~3 lines + a one-line doc comment each.
The implementer confirms each actor's topic subscription before adding the drop.
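A minimal Akka.NET sketch of the drop handler, assuming the three actors derive from ReceiveActor. The message records and the actor below are stand-ins for illustration: in the real code the probe result is the nested PeerOpcUaProbeActor.OpcUaProbeResult, and the state-change message name and the GetState query are hypothetical.

```csharp
using Akka.Actor;

// Stand-in messages (assumptions); real types and payloads differ.
public sealed record RedundancyStateChanged(string State);
public sealed record OpcUaProbeResult(bool Reachable);
public sealed record GetState; // hypothetical query, only here to make state observable

public sealed class ProbeTolerantActor : ReceiveActor
{
    private string _lastState = "unknown";

    public ProbeTolerantActor()
    {
        // The message this actor actually cares about on the shared topic.
        Receive<RedundancyStateChanged>(msg => { _lastState = msg.State; });

        // Intentional drop: the probe result is co-published on the same topic.
        // Handling it with an empty body stops it from being reported as unhandled.
        Receive<OpcUaProbeResult>(_ => { });

        Receive<GetState>(_ => { Sender.Tell(_lastState); });
    }
}
```

The empty handler changes no behaviour for the state-change path; it only gives the co-published message somewhere to land.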
Testing
A dead-letter probe test (EventStream / AllDeadLetters) asserting that an
OpcUaProbeResult published to the redundancy topic produces no dead-letter for each
actor — mirrors the existing Phase-B dead-letter-probe test idiom.
Classification: trivial → small (mechanical drop handler + one guard test).
Execution
Subagent-driven (this session). Item 1 and Item 2 touch disjoint files ⇒ their implementers can be dispatched concurrently. Item 1 gets spec + code review (standard); Item 2 gets a code review (small). Final integration review before finish. Finish = merge to master + push.