# Cert-action audit logging + `OpcUaProbeResult` dead-letter tidy — Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task. **Goal:** Record certificate Trust/Untrust/Delete actions to the `ConfigAuditLog` DB table via the existing `IAuditWriter`/`AuditWriterActor` path (the first live structured audit emit site), and silence the benign `OpcUaProbeResult` dead-letters in three Runtime actors. **Architecture:** Item 1 adds an `ActorAuditWriter : IAuditWriter` AdminUI adapter (Tells the `AuditWriterActorKey` cluster-singleton proxy, mirroring `AdminOperationsClient`), a pure `CertAuditEvents` factory, and audits at the `CertificateStoreManager` choke point. Item 2 adds an intentional-drop `Receive<…OpcUaProbeResult>(_ => {})` to three actors (the `PeerProbeSupervisor.cs:69` idiom). No EF migration, no Commons/proto/interface change, no bUnit. **Tech Stack:** .NET 10, Akka.NET (Akka.Hosting `ActorRegistry`), Blazor Server (AdminUI), EF Core (`ConfigAuditLog` — table already exists), `ZB.MOM.WW.Audit` package (`IAuditWriter`/`AuditEvent`), xUnit + Shouldly, Akka.TestKit. **Design:** `docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy-design.md` **Branch:** `feat/cert-audit-and-probe-tidy` (off master `40e8a23e`; design committed `e8c75cc3`) **Standing guardrails:** stage by explicit path (never `git add .`); never stage `sql_login.txt`, `Host/pki/`, `docker-dev/docker-compose.yml`, `pending.md`, `current.md`, `stillpending.md`. No `--no-verify`, no force-push. `dangerouslyDisableSandbox: true` for build/test/rig. Finish = merge to master + push. **Dependency graph:** `{T1 ∥ T3} → T2 → T4`. T1 (AdminUI cert-audit core) and T3 (Runtime actor drops) touch disjoint projects ⇒ dispatch their implementers concurrently. T2 depends on T1 (uses `CertAuditEvents` + `IAuditWriter`). T4 is the final gate. --- ### Task 0 (do this once, before T1/T2): Confirm the `AuditEvent` / `IAuditWriter` / `AuditOutcome` shape Not a code task — a **read** the T1/T2 implementer MUST do first so the factory matches the real package type (it lives in the external `ZB.MOM.WW.Audit` assembly, not this repo). - Go-to-definition / decompile `ZB.MOM.WW.Audit.AuditEvent`, `IAuditWriter`, and the `Outcome` enum type. Confirm: exact property names + whether it's a record with a primary ctor or settable inits; the `Outcome` enum member names (the design assumes `Success` / `Failure` — verify); the `CorrelationId` property type (Guid? / a CorrelationId struct / nullable). - Ground truth for how the existing code populates it: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs:97-107` reads `EventId, OccurredAtUtc, Actor, Category, Action, SourceNode, DetailsJson, CorrelationId, Outcome`. Mirror those names exactly. - If any assumed field/name differs, adjust the T1 factory code below accordingly (the shape is the contract; the literal property names must match the package). --- ### Task 1: Cert-audit core — `ActorAuditWriter` + `CertAuditEvents` factory + DI + unit tests **Classification:** standard **Estimated implement time:** ~5 min **Parallelizable with:** Task 3 **Files:** - Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ActorAuditWriter.cs` - Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/CertAuditEvents.cs` - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs` (register `IAuditWriter` near line 55) - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/CertAuditEventsTests.cs` - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/ActorAuditWriterTests.cs` **Step 1 — Write `CertAuditEventsTests` (failing).** Cover trust/untrust/delete × success/failure: - `Build("Trust", "trusted", thumb, "alice", success: true, error: null)` ⇒ `Category=="Certificate"`, `Action=="Trust"`, `SourceNode==thumb`, `Actor=="alice"`, `Outcome` is the success member, `DetailsJson` contains the store + thumbprint, `EventId != Guid.Empty`. - A failure case (`success: false, error: "certificate not found in rejected"`) ⇒ `Outcome` is the failure member and `DetailsJson` includes the error text. Use Shouldly. (Assert `DetailsJson` by `ShouldContain` substrings, not exact JSON, to stay robust.) **Step 2 — Write `CertAuditEvents`:** ```csharp using System.Text.Json; using ZB.MOM.WW.Audit; // confirm namespace in Task 0 namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit; /// /// Pure factory for the canonical emitted on a certificate-store /// mutation (Trust / Untrust / Delete). Separated from CertificateStoreManager so the /// event shape is unit-testable without a filesystem or an actor system. /// public static class CertAuditEvents { /// The audit category for all certificate-store actions. public const string Category = "Certificate"; /// Builds the audit event for one certificate action and outcome. /// The action verb: Trust / Untrust / Delete. /// The PKI sub-store acted on (e.g. trusted / rejected). /// The certificate thumbprint. /// The authenticated principal performing the action. /// Whether the action succeeded. /// The failure message when is false; otherwise null. /// A canonical audit event ready for . public static AuditEvent Build( string action, string store, string thumbprint, string actor, bool success, string? error) { var details = JsonSerializer.Serialize(new { store, thumbprint, error = success ? null : error, }); return new AuditEvent { EventId = Guid.NewGuid(), OccurredAtUtc = DateTimeOffset.UtcNow, Actor = actor, Category = Category, Action = action, SourceNode = thumbprint, DetailsJson = details, // CorrelationId / Outcome property types per Task 0. Outcome member names assumed // Success / Failure — confirm and adjust. Outcome = success ? AuditOutcome.Success : AuditOutcome.Failure, }; } } ``` (Adjust property/enum names to the real package per Task 0. If `AuditEvent` uses a positional record ctor instead of init-setters, switch to the ctor form.) **Step 3 — Write `ActorAuditWriterTests` (failing).** Use Akka.TestKit: register a `TestProbe` in an `ActorRegistry` under `AuditWriterActorKey`, construct `ActorAuditWriter(registry)`, call `WriteAsync(evt)`, assert the probe `ExpectMsg()` equal to `evt`. (Mirror how other AdminUI TestKit tests build an `ActorRegistry` + register a key — search the AdminUI.Tests project.) **Step 4 — Write `ActorAuditWriter`** (copy the `AdminOperationsClient` resolve idiom): ```csharp using Akka.Actor; using Akka.Hosting; using ZB.MOM.WW.Audit; using ZB.MOM.WW.OtOpcUa.ControlPlane; // AuditWriterActorKey namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit; /// /// backed by the AuditWriterActor cluster singleton registered /// in AddOtOpcUaControlPlane. Resolves the singleton proxy from /// at construction (mirroring AdminOperationsClient); is a /// best-effort, non-blocking Tell — the actor owns batching, dedup, and flush. /// public sealed class ActorAuditWriter : IAuditWriter { private readonly IActorRef _proxy; /// Initializes a new instance resolving the audit-writer singleton proxy. /// The actor registry to resolve the audit-writer singleton proxy. public ActorAuditWriter(ActorRegistry registry) => _proxy = registry.Get(); /// public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) { _proxy.Tell(evt); return Task.CompletedTask; } } ``` (Confirm the exact `IAuditWriter.WriteAsync` signature in Task 0 and match it.) **Step 5 — Register in DI.** In `EndpointRouteBuilderExtensions.cs`, next to `services.AddSingleton();` (line ~55): ```csharp services.AddSingleton(); ``` (Confirm `ActorRegistry` is resolvable in this method's host — it is, Akka.Hosting registers it as a singleton; `AdminOperationsClient` proves the pattern works in AdminUI.) **Step 6 — Run tests + build:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~Audit"` → green. **Step 7 — Commit** (stage by path): `git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/` then `git commit -m "feat(adminui): IAuditWriter adapter + cert-action audit-event factory"`. --- ### Task 2: Wire audit into `CertificateStoreManager` + `Certificates.razor` + extend manager tests **Classification:** standard **Estimated implement time:** ~5 min **Parallelizable with:** none (depends on Task 1) **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs` - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor:194-205` - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs` **Step 1 — Extend `CertificateStoreManagerTests` (failing).** Add a recording fake `IAuditWriter`: ```csharp private sealed class RecordingAuditWriter : ZB.MOM.WW.Audit.IAuditWriter { public List Events { get; } = new(); public Task WriteAsync(ZB.MOM.WW.Audit.AuditEvent evt, CancellationToken ct = default) { Events.Add(evt); return Task.CompletedTask; } } ``` Change the test ctor to build `_audit = new RecordingAuditWriter();` and `_sut = new CertificateStoreManager(_root, _audit);`. Add tests: - `Trust_success_writes_one_success_audit_event` — seed a rejected cert, `Trust(thumb, "alice")`, assert `_audit.Events` has one event with `Action=="Trust"`, success outcome, `SourceNode==thumb`. - `Delete_not_found_writes_one_failure_audit_event` — `Delete("trusted", unknownThumb, "bob")`, assert one event, failure outcome. - Confirm the existing (no-audit-arg) assertions still pass via the defaulted test ctor. **Step 2 — Modify `CertificateStoreManager`:** - Add field `private readonly IAuditWriter _audit;`. - Production ctor: `public CertificateStoreManager(IConfiguration config, IAuditWriter audit)` — set `_audit = audit;` (and keep the pki-root read). - Internal test ctor: `internal CertificateStoreManager(string pkiRoot, IAuditWriter? audit = null)` → `_audit = audit ?? NullAuditWriter`. (Use a tiny private no-op `IAuditWriter` if the package has no Null impl — confirm in Task 0.) - Add the actor param to the three public methods and emit after computing the result: ```csharp public CertActionResult Trust(string thumbprint, string actor) { var r = Move("rejected", "trusted", thumbprint); Audit("Trust", "rejected→trusted", thumbprint, actor, r); return r; } // Untrust(thumbprint, actor) → Move("trusted","rejected",…) + Audit("Untrust", "trusted→rejected", …) // Delete(store, thumbprint, actor) → existing body + Audit("Delete", store, …, r) at each return private void Audit(string action, string store, string thumbprint, string actor, CertActionResult r) => _audit.WriteAsync(CertAuditEvents.Build(action, store, thumbprint, actor, r.Success, r.Error)); ``` Ensure **every** return path of `Delete` (invalid store, invalid thumbprint, not-found, success, IO failure) is audited — simplest is to compute the `CertActionResult` into a local, `Audit(...)`, then return it. Keep `_audit.WriteAsync(...)` fire-and-forget (returns a completed task). **Step 3 — Modify `Certificates.razor` `ConfirmAction` (line 194):** capture the actor and thread it: ```csharp var actor = authState.User.Identity?.Name ?? "system"; var result = p.Verb switch { "trust" => CertManager.Trust(p.Thumbprint, actor), "untrust" => CertManager.Untrust(p.Thumbprint, actor), "delete" => p.Kind switch { StoreKind.Trusted => CertManager.Delete("trusted", p.Thumbprint, actor), StoreKind.Rejected => CertManager.Delete("rejected", p.Thumbprint, actor), _ => CertActionResult.Fail($"cannot delete from {p.Kind}"), }, _ => CertActionResult.Fail("unknown action"), }; ``` (`authState` is already in scope from the FleetAdmin re-check at line 185.) **Step 4 — Test + build:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~CertificateStoreManager"` → green; `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → 0 errors (proves the razor compiles). **Step 5 — Commit:** `git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs` then `git commit -m "feat(adminui): audit cert Trust/Untrust/Delete to ConfigAuditLog"`. --- ### Task 3: `OpcUaProbeResult` dead-letter tidy — three actor drop handlers + guard test **Classification:** small **Estimated implement time:** ~4 min **Parallelizable with:** Task 1 **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (handler region ~line 482+) - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs` (ctor handlers ~line 86) - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` (ctor handlers ~line 172) - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/…` (new guard test; place beside the existing dead-letter-probe test — find it via `rg "AllDeadLetters" tests`) **Step 1 — Write the guard test (failing without the fix).** For at least `DriverHostActor` (the most-trafficked), subscribe a probe to `AllDeadLetters` on the EventStream, spin up the actor with its redundancy-topic subscription, publish a `PeerOpcUaProbeActor.OpcUaProbeResult` to the `RedundancyStateTopic`, and assert **no** `DeadLetter` wrapping an `OpcUaProbeResult`. (Mirror the Phase-B `AllDeadLetters`-probe idiom already in Runtime.Tests.) A combined test covering all three is fine if the harness makes that easy; otherwise the `DriverHostActor` case is the required guard and the other two are covered by inspection + build. **Step 2 — Add the drop to each actor.** Each already subscribes the redundancy-state topic in `PreStart` and imports `Akka.Cluster.Tools.PublishSubscribe`. Add, next to the existing `RedundancyStateChanged` / `SubscribeAck` handlers: ```csharp // The redundancy-state topic also carries OpcUaProbeResult (published by OpcUaPublishActor's // peer-probes). We don't consume it here — drop it so it doesn't dead-letter. Matches // PeerProbeSupervisor / OpcUaPublishActor. (intentional no-op) Receive(_ => { }); ``` - `HistorianAdapterActor`: add in the ctor near `Receive(_ => { });` (~line 87). - `ScriptedAlarmHostActor`: add in the ctor near the other `Receive<…>` registrations (~line 172). - `DriverHostActor`: add in the steady-state behaviour where `Receive<…>` handlers are registered (the `Receive` handler — confirm it exists in each behaviour that is active while subscribed; add the drop wherever `RedundancyStateChanged` is handled so the probe result is dropped in the same behaviour(s)). **Note:** if `DriverHostActor` uses `Become`/multiple behaviours, add the drop to every behaviour that handles `RedundancyStateChanged`, else the probe still dead-letters in the others. **Step 3 — Test + build:** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ProbeResult"` → green; `dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime` → 0 errors. **Step 4 — Commit:** `git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` then `git commit -m "fix(runtime): drop OpcUaProbeResult in redundancy-topic subscribers (no dead-letter)"`. --- ### Task 4: Reconcile + full build/test + live `/run` + finish **Classification:** standard (verification + integration review + finish) **Estimated implement time:** ~5 min + live run **Parallelizable with:** none (depends on T1, T2, T3) **Step 1 — Full build + solution test:** `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 errors; `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests` → green. **Step 2 — Final integration review** (code-reviewer subagent over `git diff master..HEAD`): confirm no Commons/proto/EF change, audit emitted on all cert paths (incl. failures), the `AuditEvent` shape matches the package, the drop handler is in every behaviour that subscribes the topic, and tests genuinely fail without the fix. **Step 3 — Live `/run` (agent-driven; docker-dev login is disabled).** Rebuild both central nodes to the branch (`docker compose -f docker-dev/docker-compose.yml up -d --build central-1 central-2`); open the AdminUI Certificates page (`http://localhost:9200`), perform a Trust or Untrust (or seed a rejected cert first), and confirm a new `ConfigAuditLog` row exists (query the shared SQL `10.100.0.35,14330` ConfigDb: principal = the dev admin user, EventType = `Certificate:Trust`, Outcome = success). Confirm server logs show **no** `OpcUaProbeResult` dead-letter on either node. Capture the evidence. **Step 4 — Finish.** `git checkout master && git merge --ff-only feat/cert-audit-and-probe-tidy` (fast-forward; the branch is linear off master), then `git push`. Delete the feature branch. Update `pending.md`/`stillpending.md` (on-disk, never-staged) to record both items shipped.