# Cert-action audit logging + `OpcUaProbeResult` dead-letter tidy — Design **Date:** 2026-06-19 **Status:** Approved (brainstorming) **Branch:** `feat/cert-audit-and-probe-tidy` (off master `40e8a23e`) Two independent, self-contained backlog items, batched into one branch. - **Item 1 — Cert-action audit logging** (`stillpending.md` §A.7 deferred follow-up): the meaty one. Standard complexity. - **Item 2 — `OpcUaProbeResult` dead-letter tidy** (Phase-4 final-integration-review follow-up (i)): trivial actor hygiene. Hard guardrails (both): **no EF migration, no Commons/proto/interface change, no bUnit.** Stage by explicit path; never stage the never-stage files (`sql_login.txt`, `Host/pki/`, `docker-dev/docker-compose.yml`, `pending.md`, `current.md`, `stillpending.md`). No `--no-verify`, no force-push. Finish = merge to master + push. --- ## Background: H2 was reclassified, then this was chosen The session began on the "within-timestamp tie-cluster paging (#400)" follow-up — found **already shipped** (`2e6c6d3a`/`0f929ae6`, 2026-06-17); reconciled the stale `pending.md` banners. Then surveyed the backlog: the only genuinely-OPEN §A item, **H2 (OPC UA `HistoryUpdate` service)**, was reclassified **infra-gated/DEFERRED** (no historian backend can durably insert/replace/delete — `IHistorianDataSource` is read-only and the Wonderware sidecar has no update RPC; same boundary as H5b). The two remaining actionable, in-repo, non-infra items are the two below. --- ## Item 1 — Cert-action audit logging ### Problem When a FleetAdmin **Trusts / Untrusts / Deletes** a peer certificate (`Certificates.razor` → `CertificateStoreManager`), nothing is written to the audit DB. Today only certificate *validation* is logged (to Serilog). These are security-relevant mutations and should leave a durable audit trail. ### Key discovery that shapes the design OtOpcUa has **no live structured `AuditEvent` emit sites yet**. The `Security/AuditActor` helper is explicitly forward-looking ("tested and ready so future emit sites pick up the correct Actor automatically"), and `IAuditWriter` is **not** registered as an injectable DI service anywhere. But the persistence path exists end-to-end: - `AuditWriterActor` (an **AdminRole cluster singleton**, `ControlPlane/Audit/`) implements `IAuditWriter` (from the external `ZB.MOM.WW.Audit` package), batches `AuditEvent`s, and bulk-inserts into the existing **`ConfigAuditLog`** table (filtered-unique-index dedup on `EventId`). - AdminUI already reaches an AdminRole singleton from the `ActorRegistry`: `AdminOperationsClient` does `registry.Get()`. Same pattern works for `AuditWriterActorKey`. So this is the **first** structured emit site — exactly what the infra was built for. The `ConfigAuditLog` table and `AuditWriterActor` already exist ⇒ **no EF migration, no Commons change.** ### Components 1. **`ActorAuditWriter : IAuditWriter`** — *new*, `AdminUI/Audit/ActorAuditWriter.cs`. Holds the `AuditWriterActorKey` singleton-proxy `IActorRef` (resolved from `ActorRegistry`, mirroring `AdminOperationsClient`); `WriteAsync(evt, ct)` does `_proxy.Tell(evt)` and returns `Task.CompletedTask` (best-effort, non-blocking — the actor owns batching/dedup/flush). Registered `AddSingleton()` in the AdminUI DI extension. 2. **`CertAuditEvents.Build(action, store, thumbprint, actor, success, error)` → `AuditEvent`** — *new*, `AdminUI/Audit/CertAuditEvents.cs`. Pure static factory: - `Category = "Certificate"`, `Action ∈ {Trust, Untrust, Delete}` - `SourceNode = thumbprint` - `DetailsJson` = compact JSON `{ "store": …, "thumbprint": …, "error": …? }` - `Outcome` = Success / Failure (from `success`) - `Actor = actor`, fresh `EventId`, `OccurredAtUtc = DateTimeOffset.UtcNow` Pure ⇒ fully unit-testable; the implementer confirms the exact `AuditEvent` required / optional fields + the `AuditOutcome` enum member names against the `ZB.MOM.WW.Audit` package. 3. **`CertificateStoreManager`** — *modify*. Audit at the **choke point** (the thing that performs the mutation; already has a unit-test class — far more testable than the razor, which has no bUnit). Each public method gains the actor string and emits the event from the `CertActionResult` it already computes: - `Trust(string thumbprint, string actor)` - `Untrust(string thumbprint, string actor)` - `Delete(string store, string thumbprint, string actor)` Production ctor gains `IAuditWriter _audit`; the internal test ctor defaults it to a null/fake writer so existing tests compile. Each method calls `_audit.WriteAsync(CertAuditEvents.Build(...))` (non-blocking) — including the not-found / invalid-thumbprint **failure** paths (audited with `Outcome = Failure`). 4. **`Certificates.razor`** — *modify*. Pass the authenticated principal name into the manager calls: `authState.User.Identity?.Name ?? "system"` (already fetched at the FleetAdmin re-check, ~line 185). Sourcing the actor from `AuthenticationState` (not the scoped `IAuditActorAccessor`/HttpContext) avoids the Blazor-Server HttpContext-null pitfall over the SignalR circuit. ### Data flow ``` FleetAdmin clicks Trust → Certificates.razor RequestAction (FleetAdmin re-checked) → CertManager.Trust(thumbprint, actor) → filesystem move (rejected → trusted) [unchanged] → CertAuditEvents.Build(Trust, store, thumbprint, actor, success, error) → IAuditWriter.WriteAsync(evt) (= ActorAuditWriter → proxy.Tell) → AuditWriterActor batches → bulk insert → ConfigAuditLog ``` Both success and failure are audited (Outcome reflects `result.Success`). ### Testing (xUnit + Shouldly; no bUnit) - `CertAuditEventsTests` — factory output for trust/untrust/delete × success/failure: Category, Action, SourceNode, Outcome, DetailsJson shape. - Extend `CertificateStoreManagerTests` — inject a recording fake `IAuditWriter`; assert exactly one event per action with the correct Action + Outcome, including the not-found failure path. - `ActorAuditWriterTests` — a `TestKit` probe registered under `AuditWriterActorKey` receives the Tell'd `AuditEvent`. - Razor wiring proven by `dotnet build` + a live docker-dev `/run` (login disabled on the local rig — agent drives it; deploy, perform a Trust/Untrust on the Certificates page, confirm a new `ConfigAuditLog` row). ### Classification: **standard** (additive DI + first structured emit path; multi-file). --- ## Item 2 — `OpcUaProbeResult` dead-letter tidy ### Problem `DriverHostActor`, `HistorianAdapterActor`, `ScriptedAlarmHostActor` subscribe the redundancy-state topic (for state changes) but have **no** handler for the co-published `PeerOpcUaProbeActor.OpcUaProbeResult`, so each INFO-dead-letters that message (benign, capped at 10 then 5-min-suppressed, but noisy). ### Change Add `Receive(_ => { })` to each of the three actors — the exact intentional-drop pattern already used by `PeerProbeSupervisor.cs:69` and `OpcUaPublishActor` (which actually consumes it). ~3 lines + a one-line doc comment each. The implementer confirms each actor's topic subscription before adding the drop. ### Testing A dead-letter probe test (`EventStream` / `AllDeadLetters`) asserting that an `OpcUaProbeResult` published to the redundancy topic produces **no** dead-letter for each actor — mirrors the existing Phase-B dead-letter-probe test idiom. ### Classification: **trivial → small** (mechanical drop handler + one guard test). --- ## Execution Subagent-driven (this session). Item 1 and Item 2 touch **disjoint** files ⇒ their implementers can be dispatched concurrently. Item 1 gets spec + code review (standard); Item 2 gets a code review (small). Final integration review before finish. Finish = merge to master + push.