diff --git a/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md b/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md new file mode 100644 index 00000000..4e4b8bee --- /dev/null +++ b/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md @@ -0,0 +1,336 @@ +# Cert-action audit logging + `OpcUaProbeResult` dead-letter tidy — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task. + +**Goal:** Record certificate Trust/Untrust/Delete actions to the `ConfigAuditLog` DB table +via the existing `IAuditWriter`/`AuditWriterActor` path (the first live structured audit +emit site), and silence the benign `OpcUaProbeResult` dead-letters in three Runtime actors. + +**Architecture:** Item 1 adds an `ActorAuditWriter : IAuditWriter` AdminUI adapter (Tells the +`AuditWriterActorKey` cluster-singleton proxy, mirroring `AdminOperationsClient`), a pure +`CertAuditEvents` factory, and audits at the `CertificateStoreManager` choke point. Item 2 +adds an intentional-drop `Receive<…OpcUaProbeResult>(_ => {})` to three actors (the +`PeerProbeSupervisor.cs:69` idiom). No EF migration, no Commons/proto/interface change, no bUnit. + +**Tech Stack:** .NET 10, Akka.NET (Akka.Hosting `ActorRegistry`), Blazor Server (AdminUI), +EF Core (`ConfigAuditLog` — table already exists), `ZB.MOM.WW.Audit` package +(`IAuditWriter`/`AuditEvent`), xUnit + Shouldly, Akka.TestKit. + +**Design:** `docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy-design.md` +**Branch:** `feat/cert-audit-and-probe-tidy` (off master `40e8a23e`; design committed `e8c75cc3`) + +**Standing guardrails:** stage by explicit path (never `git add .`); never stage +`sql_login.txt`, `Host/pki/`, `docker-dev/docker-compose.yml`, `pending.md`, `current.md`, +`stillpending.md`. No `--no-verify`, no force-push. `dangerouslyDisableSandbox: true` for +build/test/rig. Finish = merge to master + push. + +**Dependency graph:** `{T1 ∥ T3} → T2 → T4`. T1 (AdminUI cert-audit core) and T3 (Runtime +actor drops) touch disjoint projects ⇒ dispatch their implementers concurrently. T2 depends +on T1 (uses `CertAuditEvents` + `IAuditWriter`). T4 is the final gate. + +--- + +### Task 0 (do this once, before T1/T2): Confirm the `AuditEvent` / `IAuditWriter` / `AuditOutcome` shape + +Not a code task — a **read** the T1/T2 implementer MUST do first so the factory matches the +real package type (it lives in the external `ZB.MOM.WW.Audit` assembly, not this repo). + +- Go-to-definition / decompile `ZB.MOM.WW.Audit.AuditEvent`, `IAuditWriter`, + and the `Outcome` enum type. Confirm: exact property names + whether it's a record with a + primary ctor or settable inits; the `Outcome` enum member names (the design assumes + `Success` / `Failure` — verify); the `CorrelationId` property type (Guid? / a CorrelationId + struct / nullable). +- Ground truth for how the existing code populates it: + `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs:97-107` reads + `EventId, OccurredAtUtc, Actor, Category, Action, SourceNode, DetailsJson, CorrelationId, + Outcome`. Mirror those names exactly. +- If any assumed field/name differs, adjust the T1 factory code below accordingly (the + shape is the contract; the literal property names must match the package). + +--- + +### Task 1: Cert-audit core — `ActorAuditWriter` + `CertAuditEvents` factory + DI + unit tests + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 3 + +**Files:** +- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ActorAuditWriter.cs` +- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/CertAuditEvents.cs` +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs` (register `IAuditWriter` near line 55) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/CertAuditEventsTests.cs` +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/ActorAuditWriterTests.cs` + +**Step 1 — Write `CertAuditEventsTests` (failing).** Cover trust/untrust/delete × success/failure: +- `Build("Trust", "trusted", thumb, "alice", success: true, error: null)` ⇒ `Category=="Certificate"`, + `Action=="Trust"`, `SourceNode==thumb`, `Actor=="alice"`, `Outcome` is the success member, + `DetailsJson` contains the store + thumbprint, `EventId != Guid.Empty`. +- A failure case (`success: false, error: "certificate not found in rejected"`) ⇒ `Outcome` is the + failure member and `DetailsJson` includes the error text. +Use Shouldly. (Assert `DetailsJson` by `ShouldContain` substrings, not exact JSON, to stay robust.) + +**Step 2 — Write `CertAuditEvents`:** +```csharp +using System.Text.Json; +using ZB.MOM.WW.Audit; // confirm namespace in Task 0 + +namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit; + +/// +/// Pure factory for the canonical emitted on a certificate-store +/// mutation (Trust / Untrust / Delete). Separated from CertificateStoreManager so the +/// event shape is unit-testable without a filesystem or an actor system. +/// +public static class CertAuditEvents +{ + /// The audit category for all certificate-store actions. + public const string Category = "Certificate"; + + /// Builds the audit event for one certificate action and outcome. + /// The action verb: Trust / Untrust / Delete. + /// The PKI sub-store acted on (e.g. trusted / rejected). + /// The certificate thumbprint. + /// The authenticated principal performing the action. + /// Whether the action succeeded. + /// The failure message when is false; otherwise null. + /// A canonical audit event ready for . + public static AuditEvent Build( + string action, string store, string thumbprint, string actor, bool success, string? error) + { + var details = JsonSerializer.Serialize(new + { + store, + thumbprint, + error = success ? null : error, + }); + + return new AuditEvent + { + EventId = Guid.NewGuid(), + OccurredAtUtc = DateTimeOffset.UtcNow, + Actor = actor, + Category = Category, + Action = action, + SourceNode = thumbprint, + DetailsJson = details, + // CorrelationId / Outcome property types per Task 0. Outcome member names assumed + // Success / Failure — confirm and adjust. + Outcome = success ? AuditOutcome.Success : AuditOutcome.Failure, + }; + } +} +``` +(Adjust property/enum names to the real package per Task 0. If `AuditEvent` uses a positional +record ctor instead of init-setters, switch to the ctor form.) + +**Step 3 — Write `ActorAuditWriterTests` (failing).** Use Akka.TestKit: register a `TestProbe` +in an `ActorRegistry` under `AuditWriterActorKey`, construct `ActorAuditWriter(registry)`, call +`WriteAsync(evt)`, assert the probe `ExpectMsg()` equal to `evt`. (Mirror how other +AdminUI TestKit tests build an `ActorRegistry` + register a key — search the AdminUI.Tests project.) + +**Step 4 — Write `ActorAuditWriter`** (copy the `AdminOperationsClient` resolve idiom): +```csharp +using Akka.Actor; +using Akka.Hosting; +using ZB.MOM.WW.Audit; +using ZB.MOM.WW.OtOpcUa.ControlPlane; // AuditWriterActorKey + +namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit; + +/// +/// backed by the AuditWriterActor cluster singleton registered +/// in AddOtOpcUaControlPlane. Resolves the singleton proxy from +/// at construction (mirroring AdminOperationsClient); is a +/// best-effort, non-blocking Tell — the actor owns batching, dedup, and flush. +/// +public sealed class ActorAuditWriter : IAuditWriter +{ + private readonly IActorRef _proxy; + + /// Initializes a new instance resolving the audit-writer singleton proxy. + /// The actor registry to resolve the audit-writer singleton proxy. + public ActorAuditWriter(ActorRegistry registry) => _proxy = registry.Get(); + + /// + public Task WriteAsync(AuditEvent evt, CancellationToken ct = default) + { + _proxy.Tell(evt); + return Task.CompletedTask; + } +} +``` +(Confirm the exact `IAuditWriter.WriteAsync` signature in Task 0 and match it.) + +**Step 5 — Register in DI.** In `EndpointRouteBuilderExtensions.cs`, next to +`services.AddSingleton();` (line ~55): +```csharp +services.AddSingleton(); +``` +(Confirm `ActorRegistry` is resolvable in this method's host — it is, Akka.Hosting registers it +as a singleton; `AdminOperationsClient` proves the pattern works in AdminUI.) + +**Step 6 — Run tests + build:** +`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~Audit"` → green. + +**Step 7 — Commit** (stage by path): +`git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/` +then `git commit -m "feat(adminui): IAuditWriter adapter + cert-action audit-event factory"`. + +--- + +### Task 2: Wire audit into `CertificateStoreManager` + `Certificates.razor` + extend manager tests + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** none (depends on Task 1) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs` +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor:194-205` +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs` + +**Step 1 — Extend `CertificateStoreManagerTests` (failing).** Add a recording fake `IAuditWriter`: +```csharp +private sealed class RecordingAuditWriter : ZB.MOM.WW.Audit.IAuditWriter +{ + public List Events { get; } = new(); + public Task WriteAsync(ZB.MOM.WW.Audit.AuditEvent evt, CancellationToken ct = default) + { Events.Add(evt); return Task.CompletedTask; } +} +``` +Change the test ctor to build `_audit = new RecordingAuditWriter();` and +`_sut = new CertificateStoreManager(_root, _audit);`. Add tests: +- `Trust_success_writes_one_success_audit_event` — seed a rejected cert, `Trust(thumb, "alice")`, + assert `_audit.Events` has one event with `Action=="Trust"`, success outcome, `SourceNode==thumb`. +- `Delete_not_found_writes_one_failure_audit_event` — `Delete("trusted", unknownThumb, "bob")`, + assert one event, failure outcome. +- Confirm the existing (no-audit-arg) assertions still pass via the defaulted test ctor. + +**Step 2 — Modify `CertificateStoreManager`:** +- Add field `private readonly IAuditWriter _audit;`. +- Production ctor: `public CertificateStoreManager(IConfiguration config, IAuditWriter audit)` — + set `_audit = audit;` (and keep the pki-root read). +- Internal test ctor: `internal CertificateStoreManager(string pkiRoot, IAuditWriter? audit = null)` + → `_audit = audit ?? NullAuditWriter`. (Use a tiny private no-op `IAuditWriter` if the package + has no Null impl — confirm in Task 0.) +- Add the actor param to the three public methods and emit after computing the result: +```csharp +public CertActionResult Trust(string thumbprint, string actor) +{ + var r = Move("rejected", "trusted", thumbprint); + Audit("Trust", "rejected→trusted", thumbprint, actor, r); + return r; +} +// Untrust(thumbprint, actor) → Move("trusted","rejected",…) + Audit("Untrust", "trusted→rejected", …) +// Delete(store, thumbprint, actor) → existing body + Audit("Delete", store, …, r) at each return + +private void Audit(string action, string store, string thumbprint, string actor, CertActionResult r) + => _audit.WriteAsync(CertAuditEvents.Build(action, store, thumbprint, actor, r.Success, r.Error)); +``` +Ensure **every** return path of `Delete` (invalid store, invalid thumbprint, not-found, success, +IO failure) is audited — simplest is to compute the `CertActionResult` into a local, `Audit(...)`, +then return it. Keep `_audit.WriteAsync(...)` fire-and-forget (returns a completed task). + +**Step 3 — Modify `Certificates.razor` `ConfirmAction` (line 194):** capture the actor and thread it: +```csharp +var actor = authState.User.Identity?.Name ?? "system"; +var result = p.Verb switch +{ + "trust" => CertManager.Trust(p.Thumbprint, actor), + "untrust" => CertManager.Untrust(p.Thumbprint, actor), + "delete" => p.Kind switch + { + StoreKind.Trusted => CertManager.Delete("trusted", p.Thumbprint, actor), + StoreKind.Rejected => CertManager.Delete("rejected", p.Thumbprint, actor), + _ => CertActionResult.Fail($"cannot delete from {p.Kind}"), + }, + _ => CertActionResult.Fail("unknown action"), +}; +``` +(`authState` is already in scope from the FleetAdmin re-check at line 185.) + +**Step 4 — Test + build:** +`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~CertificateStoreManager"` → green; +`dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → 0 errors (proves the razor compiles). + +**Step 5 — Commit:** +`git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs` +then `git commit -m "feat(adminui): audit cert Trust/Untrust/Delete to ConfigAuditLog"`. + +--- + +### Task 3: `OpcUaProbeResult` dead-letter tidy — three actor drop handlers + guard test + +**Classification:** small +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 1 + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (handler region ~line 482+) +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs` (ctor handlers ~line 86) +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` (ctor handlers ~line 172) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/…` (new guard test; place beside the existing dead-letter-probe test — find it via `rg "AllDeadLetters" tests`) + +**Step 1 — Write the guard test (failing without the fix).** For at least `DriverHostActor` +(the most-trafficked), subscribe a probe to `AllDeadLetters` on the EventStream, spin up the +actor with its redundancy-topic subscription, publish a `PeerOpcUaProbeActor.OpcUaProbeResult` +to the `RedundancyStateTopic`, and assert **no** `DeadLetter` wrapping an `OpcUaProbeResult`. +(Mirror the Phase-B `AllDeadLetters`-probe idiom already in Runtime.Tests.) A combined test +covering all three is fine if the harness makes that easy; otherwise the `DriverHostActor` +case is the required guard and the other two are covered by inspection + build. + +**Step 2 — Add the drop to each actor.** Each already subscribes the redundancy-state topic in +`PreStart` and imports `Akka.Cluster.Tools.PublishSubscribe`. Add, next to the existing +`RedundancyStateChanged` / `SubscribeAck` handlers: +```csharp +// The redundancy-state topic also carries OpcUaProbeResult (published by OpcUaPublishActor's +// peer-probes). We don't consume it here — drop it so it doesn't dead-letter. Matches +// PeerProbeSupervisor / OpcUaPublishActor. (intentional no-op) +Receive(_ => { }); +``` +- `HistorianAdapterActor`: add in the ctor near `Receive(_ => { });` (~line 87). +- `ScriptedAlarmHostActor`: add in the ctor near the other `Receive<…>` registrations (~line 172). +- `DriverHostActor`: add in the steady-state behaviour where `Receive<…>` handlers are registered + (the `Receive` handler — confirm it exists in each behaviour that is + active while subscribed; add the drop wherever `RedundancyStateChanged` is handled so the probe + result is dropped in the same behaviour(s)). + **Note:** if `DriverHostActor` uses `Become`/multiple behaviours, add the drop to every behaviour + that handles `RedundancyStateChanged`, else the probe still dead-letters in the others. + +**Step 3 — Test + build:** +`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ProbeResult"` → green; +`dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime` → 0 errors. + +**Step 4 — Commit:** +`git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` +then `git commit -m "fix(runtime): drop OpcUaProbeResult in redundancy-topic subscribers (no dead-letter)"`. + +--- + +### Task 4: Reconcile + full build/test + live `/run` + finish + +**Classification:** standard (verification + integration review + finish) +**Estimated implement time:** ~5 min + live run +**Parallelizable with:** none (depends on T1, T2, T3) + +**Step 1 — Full build + solution test:** +`dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 errors; +`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests` → green. + +**Step 2 — Final integration review** (code-reviewer subagent over `git diff master..HEAD`): +confirm no Commons/proto/EF change, audit emitted on all cert paths (incl. failures), the +`AuditEvent` shape matches the package, the drop handler is in every behaviour that subscribes +the topic, and tests genuinely fail without the fix. + +**Step 3 — Live `/run` (agent-driven; docker-dev login is disabled).** Rebuild both central +nodes to the branch (`docker compose -f docker-dev/docker-compose.yml up -d --build central-1 +central-2`); open the AdminUI Certificates page (`http://localhost:9200`), perform a Trust or +Untrust (or seed a rejected cert first), and confirm a new `ConfigAuditLog` row exists +(query the shared SQL `10.100.0.35,14330` ConfigDb: principal = the dev admin user, +EventType = `Certificate:Trust`, Outcome = success). Confirm server logs show **no** +`OpcUaProbeResult` dead-letter on either node. Capture the evidence. + +**Step 4 — Finish.** `git checkout master && git merge --ff-only feat/cert-audit-and-probe-tidy` +(fast-forward; the branch is linear off master), then `git push`. Delete the feature branch. +Update `pending.md`/`stillpending.md` (on-disk, never-staged) to record both items shipped. diff --git a/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md.tasks.json b/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md.tasks.json new file mode 100644 index 00000000..24fcaa37 --- /dev/null +++ b/docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md.tasks.json @@ -0,0 +1,11 @@ +{ + "planPath": "docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy.md", + "executionState": "IN_PROGRESS", + "tasks": [ + {"id": 1, "subject": "Task 1: Cert-audit core — ActorAuditWriter + CertAuditEvents factory + DI + unit tests", "classification": "standard", "status": "pending", "parallelizableWith": [3]}, + {"id": 2, "subject": "Task 2: Wire audit into CertificateStoreManager + Certificates.razor + extend manager tests", "classification": "standard", "status": "pending", "blockedBy": [1]}, + {"id": 3, "subject": "Task 3: OpcUaProbeResult dead-letter tidy — three actor drop handlers + guard test", "classification": "small", "status": "pending", "parallelizableWith": [1]}, + {"id": 4, "subject": "Task 4: Reconcile + full build/test + live /run + finish (merge + push)", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]} + ], + "lastUpdated": "2026-06-19" +}