337 lines
18 KiB
Markdown
337 lines
18 KiB
Markdown
# Cert-action audit logging + `OpcUaProbeResult` dead-letter tidy — Implementation Plan
|
||
|
||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.
|
||
|
||
**Goal:** Record certificate Trust/Untrust/Delete actions to the `ConfigAuditLog` DB table
|
||
via the existing `IAuditWriter`/`AuditWriterActor` path (the first live structured audit
|
||
emit site), and silence the benign `OpcUaProbeResult` dead-letters in three Runtime actors.
|
||
|
||
**Architecture:** Item 1 adds an `ActorAuditWriter : IAuditWriter` AdminUI adapter (Tells the
|
||
`AuditWriterActorKey` cluster-singleton proxy, mirroring `AdminOperationsClient`), a pure
|
||
`CertAuditEvents` factory, and audits at the `CertificateStoreManager` choke point. Item 2
|
||
adds an intentional-drop `Receive<…OpcUaProbeResult>(_ => {})` to three actors (the
|
||
`PeerProbeSupervisor.cs:69` idiom). No EF migration, no Commons/proto/interface change, no bUnit.
|
||
|
||
**Tech Stack:** .NET 10, Akka.NET (Akka.Hosting `ActorRegistry`), Blazor Server (AdminUI),
|
||
EF Core (`ConfigAuditLog` — table already exists), `ZB.MOM.WW.Audit` package
|
||
(`IAuditWriter`/`AuditEvent`), xUnit + Shouldly, Akka.TestKit.
|
||
|
||
**Design:** `docs/plans/2026-06-19-cert-audit-and-probe-deadletter-tidy-design.md`
|
||
**Branch:** `feat/cert-audit-and-probe-tidy` (off master `40e8a23e`; design committed `e8c75cc3`)
|
||
|
||
**Standing guardrails:** stage by explicit path (never `git add .`); never stage
|
||
`sql_login.txt`, `Host/pki/`, `docker-dev/docker-compose.yml`, `pending.md`, `current.md`,
|
||
`stillpending.md`. No `--no-verify`, no force-push. `dangerouslyDisableSandbox: true` for
|
||
build/test/rig. Finish = merge to master + push.
|
||
|
||
**Dependency graph:** `{T1 ∥ T3} → T2 → T4`. T1 (AdminUI cert-audit core) and T3 (Runtime
|
||
actor drops) touch disjoint projects ⇒ dispatch their implementers concurrently. T2 depends
|
||
on T1 (uses `CertAuditEvents` + `IAuditWriter`). T4 is the final gate.
|
||
|
||
---
|
||
|
||
### Task 0 (do this once, before T1/T2): Confirm the `AuditEvent` / `IAuditWriter` / `AuditOutcome` shape
|
||
|
||
Not a code task — a **read** the T1/T2 implementer MUST do first so the factory matches the
|
||
real package type (it lives in the external `ZB.MOM.WW.Audit` assembly, not this repo).
|
||
|
||
- Go-to-definition / decompile `ZB.MOM.WW.Audit.AuditEvent`, `IAuditWriter`,
|
||
and the `Outcome` enum type. Confirm: exact property names + whether it's a record with a
|
||
primary ctor or settable inits; the `Outcome` enum member names (the design assumes
|
||
`Success` / `Failure` — verify); the `CorrelationId` property type (Guid? / a CorrelationId
|
||
struct / nullable).
|
||
- Ground truth for how the existing code populates it:
|
||
`src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/Audit/AuditWriterActor.cs:97-107` reads
|
||
`EventId, OccurredAtUtc, Actor, Category, Action, SourceNode, DetailsJson, CorrelationId,
|
||
Outcome`. Mirror those names exactly.
|
||
- If any assumed field/name differs, adjust the T1 factory code below accordingly (the
|
||
shape is the contract; the literal property names must match the package).
|
||
|
||
---
|
||
|
||
### Task 1: Cert-audit core — `ActorAuditWriter` + `CertAuditEvents` factory + DI + unit tests
|
||
|
||
**Classification:** standard
|
||
**Estimated implement time:** ~5 min
|
||
**Parallelizable with:** Task 3
|
||
|
||
**Files:**
|
||
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ActorAuditWriter.cs`
|
||
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/CertAuditEvents.cs`
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs` (register `IAuditWriter` near line 55)
|
||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/CertAuditEventsTests.cs`
|
||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/ActorAuditWriterTests.cs`
|
||
|
||
**Step 1 — Write `CertAuditEventsTests` (failing).** Cover trust/untrust/delete × success/failure:
|
||
- `Build("Trust", "trusted", thumb, "alice", success: true, error: null)` ⇒ `Category=="Certificate"`,
|
||
`Action=="Trust"`, `SourceNode==thumb`, `Actor=="alice"`, `Outcome` is the success member,
|
||
`DetailsJson` contains the store + thumbprint, `EventId != Guid.Empty`.
|
||
- A failure case (`success: false, error: "certificate not found in rejected"`) ⇒ `Outcome` is the
|
||
failure member and `DetailsJson` includes the error text.
|
||
Use Shouldly. (Assert `DetailsJson` by `ShouldContain` substrings, not exact JSON, to stay robust.)
|
||
|
||
**Step 2 — Write `CertAuditEvents`:**
|
||
```csharp
|
||
using System.Text.Json;
|
||
using ZB.MOM.WW.Audit; // confirm namespace in Task 0
|
||
|
||
namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit;
|
||
|
||
/// <summary>
|
||
/// Pure factory for the canonical <see cref="AuditEvent"/> emitted on a certificate-store
|
||
/// mutation (Trust / Untrust / Delete). Separated from <c>CertificateStoreManager</c> so the
|
||
/// event shape is unit-testable without a filesystem or an actor system.
|
||
/// </summary>
|
||
public static class CertAuditEvents
|
||
{
|
||
/// <summary>The audit category for all certificate-store actions.</summary>
|
||
public const string Category = "Certificate";
|
||
|
||
/// <summary>Builds the audit event for one certificate action and outcome.</summary>
|
||
/// <param name="action">The action verb: <c>Trust</c> / <c>Untrust</c> / <c>Delete</c>.</param>
|
||
/// <param name="store">The PKI sub-store acted on (e.g. <c>trusted</c> / <c>rejected</c>).</param>
|
||
/// <param name="thumbprint">The certificate thumbprint.</param>
|
||
/// <param name="actor">The authenticated principal performing the action.</param>
|
||
/// <param name="success">Whether the action succeeded.</param>
|
||
/// <param name="error">The failure message when <paramref name="success"/> is false; otherwise null.</param>
|
||
/// <returns>A canonical audit event ready for <see cref="IAuditWriter.WriteAsync"/>.</returns>
|
||
public static AuditEvent Build(
|
||
string action, string store, string thumbprint, string actor, bool success, string? error)
|
||
{
|
||
var details = JsonSerializer.Serialize(new
|
||
{
|
||
store,
|
||
thumbprint,
|
||
error = success ? null : error,
|
||
});
|
||
|
||
return new AuditEvent
|
||
{
|
||
EventId = Guid.NewGuid(),
|
||
OccurredAtUtc = DateTimeOffset.UtcNow,
|
||
Actor = actor,
|
||
Category = Category,
|
||
Action = action,
|
||
SourceNode = thumbprint,
|
||
DetailsJson = details,
|
||
// CorrelationId / Outcome property types per Task 0. Outcome member names assumed
|
||
// Success / Failure — confirm and adjust.
|
||
Outcome = success ? AuditOutcome.Success : AuditOutcome.Failure,
|
||
};
|
||
}
|
||
}
|
||
```
|
||
(Adjust property/enum names to the real package per Task 0. If `AuditEvent` uses a positional
|
||
record ctor instead of init-setters, switch to the ctor form.)
|
||
|
||
**Step 3 — Write `ActorAuditWriterTests` (failing).** Use Akka.TestKit: register a `TestProbe`
|
||
in an `ActorRegistry` under `AuditWriterActorKey`, construct `ActorAuditWriter(registry)`, call
|
||
`WriteAsync(evt)`, assert the probe `ExpectMsg<AuditEvent>()` equal to `evt`. (Mirror how other
|
||
AdminUI TestKit tests build an `ActorRegistry` + register a key — search the AdminUI.Tests project.)
|
||
|
||
**Step 4 — Write `ActorAuditWriter`** (copy the `AdminOperationsClient` resolve idiom):
|
||
```csharp
|
||
using Akka.Actor;
|
||
using Akka.Hosting;
|
||
using ZB.MOM.WW.Audit;
|
||
using ZB.MOM.WW.OtOpcUa.ControlPlane; // AuditWriterActorKey
|
||
|
||
namespace ZB.MOM.WW.OtOpcUa.AdminUI.Audit;
|
||
|
||
/// <summary>
|
||
/// <see cref="IAuditWriter"/> backed by the <c>AuditWriterActor</c> cluster singleton registered
|
||
/// in <c>AddOtOpcUaControlPlane</c>. Resolves the singleton proxy from <see cref="ActorRegistry"/>
|
||
/// at construction (mirroring <c>AdminOperationsClient</c>); <see cref="WriteAsync"/> is a
|
||
/// best-effort, non-blocking <c>Tell</c> — the actor owns batching, dedup, and flush.
|
||
/// </summary>
|
||
public sealed class ActorAuditWriter : IAuditWriter
|
||
{
|
||
private readonly IActorRef _proxy;
|
||
|
||
/// <summary>Initializes a new instance resolving the audit-writer singleton proxy.</summary>
|
||
/// <param name="registry">The actor registry to resolve the audit-writer singleton proxy.</param>
|
||
public ActorAuditWriter(ActorRegistry registry) => _proxy = registry.Get<AuditWriterActorKey>();
|
||
|
||
/// <inheritdoc />
|
||
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
|
||
{
|
||
_proxy.Tell(evt);
|
||
return Task.CompletedTask;
|
||
}
|
||
}
|
||
```
|
||
(Confirm the exact `IAuditWriter.WriteAsync` signature in Task 0 and match it.)
|
||
|
||
**Step 5 — Register in DI.** In `EndpointRouteBuilderExtensions.cs`, next to
|
||
`services.AddSingleton<Certificates.CertificateStoreManager>();` (line ~55):
|
||
```csharp
|
||
services.AddSingleton<ZB.MOM.WW.Audit.IAuditWriter, Audit.ActorAuditWriter>();
|
||
```
|
||
(Confirm `ActorRegistry` is resolvable in this method's host — it is, Akka.Hosting registers it
|
||
as a singleton; `AdminOperationsClient` proves the pattern works in AdminUI.)
|
||
|
||
**Step 6 — Run tests + build:**
|
||
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~Audit"` → green.
|
||
|
||
**Step 7 — Commit** (stage by path):
|
||
`git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Audit/ src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/EndpointRouteBuilderExtensions.cs tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Audit/`
|
||
then `git commit -m "feat(adminui): IAuditWriter adapter + cert-action audit-event factory"`.
|
||
|
||
---
|
||
|
||
### Task 2: Wire audit into `CertificateStoreManager` + `Certificates.razor` + extend manager tests
|
||
|
||
**Classification:** standard
|
||
**Estimated implement time:** ~5 min
|
||
**Parallelizable with:** none (depends on Task 1)
|
||
|
||
**Files:**
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs`
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor:194-205`
|
||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs`
|
||
|
||
**Step 1 — Extend `CertificateStoreManagerTests` (failing).** Add a recording fake `IAuditWriter`:
|
||
```csharp
|
||
private sealed class RecordingAuditWriter : ZB.MOM.WW.Audit.IAuditWriter
|
||
{
|
||
public List<ZB.MOM.WW.Audit.AuditEvent> Events { get; } = new();
|
||
public Task WriteAsync(ZB.MOM.WW.Audit.AuditEvent evt, CancellationToken ct = default)
|
||
{ Events.Add(evt); return Task.CompletedTask; }
|
||
}
|
||
```
|
||
Change the test ctor to build `_audit = new RecordingAuditWriter();` and
|
||
`_sut = new CertificateStoreManager(_root, _audit);`. Add tests:
|
||
- `Trust_success_writes_one_success_audit_event` — seed a rejected cert, `Trust(thumb, "alice")`,
|
||
assert `_audit.Events` has one event with `Action=="Trust"`, success outcome, `SourceNode==thumb`.
|
||
- `Delete_not_found_writes_one_failure_audit_event` — `Delete("trusted", unknownThumb, "bob")`,
|
||
assert one event, failure outcome.
|
||
- Confirm the existing (no-audit-arg) assertions still pass via the defaulted test ctor.
|
||
|
||
**Step 2 — Modify `CertificateStoreManager`:**
|
||
- Add field `private readonly IAuditWriter _audit;`.
|
||
- Production ctor: `public CertificateStoreManager(IConfiguration config, IAuditWriter audit)` —
|
||
set `_audit = audit;` (and keep the pki-root read).
|
||
- Internal test ctor: `internal CertificateStoreManager(string pkiRoot, IAuditWriter? audit = null)`
|
||
→ `_audit = audit ?? NullAuditWriter`. (Use a tiny private no-op `IAuditWriter` if the package
|
||
has no Null impl — confirm in Task 0.)
|
||
- Add the actor param to the three public methods and emit after computing the result:
|
||
```csharp
|
||
public CertActionResult Trust(string thumbprint, string actor)
|
||
{
|
||
var r = Move("rejected", "trusted", thumbprint);
|
||
Audit("Trust", "rejected→trusted", thumbprint, actor, r);
|
||
return r;
|
||
}
|
||
// Untrust(thumbprint, actor) → Move("trusted","rejected",…) + Audit("Untrust", "trusted→rejected", …)
|
||
// Delete(store, thumbprint, actor) → existing body + Audit("Delete", store, …, r) at each return
|
||
|
||
private void Audit(string action, string store, string thumbprint, string actor, CertActionResult r)
|
||
=> _audit.WriteAsync(CertAuditEvents.Build(action, store, thumbprint, actor, r.Success, r.Error));
|
||
```
|
||
Ensure **every** return path of `Delete` (invalid store, invalid thumbprint, not-found, success,
|
||
IO failure) is audited — simplest is to compute the `CertActionResult` into a local, `Audit(...)`,
|
||
then return it. Keep `_audit.WriteAsync(...)` fire-and-forget (returns a completed task).
|
||
|
||
**Step 3 — Modify `Certificates.razor` `ConfirmAction` (line 194):** capture the actor and thread it:
|
||
```csharp
|
||
var actor = authState.User.Identity?.Name ?? "system";
|
||
var result = p.Verb switch
|
||
{
|
||
"trust" => CertManager.Trust(p.Thumbprint, actor),
|
||
"untrust" => CertManager.Untrust(p.Thumbprint, actor),
|
||
"delete" => p.Kind switch
|
||
{
|
||
StoreKind.Trusted => CertManager.Delete("trusted", p.Thumbprint, actor),
|
||
StoreKind.Rejected => CertManager.Delete("rejected", p.Thumbprint, actor),
|
||
_ => CertActionResult.Fail($"cannot delete from {p.Kind}"),
|
||
},
|
||
_ => CertActionResult.Fail("unknown action"),
|
||
};
|
||
```
|
||
(`authState` is already in scope from the FleetAdmin re-check at line 185.)
|
||
|
||
**Step 4 — Test + build:**
|
||
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests --filter "FullyQualifiedName~CertificateStoreManager"` → green;
|
||
`dotnet build src/Server/ZB.MOM.WW.OtOpcUa.AdminUI` → 0 errors (proves the razor compiles).
|
||
|
||
**Step 5 — Commit:**
|
||
`git add src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests/Certificates/CertificateStoreManagerTests.cs`
|
||
then `git commit -m "feat(adminui): audit cert Trust/Untrust/Delete to ConfigAuditLog"`.
|
||
|
||
---
|
||
|
||
### Task 3: `OpcUaProbeResult` dead-letter tidy — three actor drop handlers + guard test
|
||
|
||
**Classification:** small
|
||
**Estimated implement time:** ~4 min
|
||
**Parallelizable with:** Task 1
|
||
|
||
**Files:**
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (handler region ~line 482+)
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs` (ctor handlers ~line 86)
|
||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` (ctor handlers ~line 172)
|
||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/…` (new guard test; place beside the existing dead-letter-probe test — find it via `rg "AllDeadLetters" tests`)
|
||
|
||
**Step 1 — Write the guard test (failing without the fix).** For at least `DriverHostActor`
|
||
(the most-trafficked), subscribe a probe to `AllDeadLetters` on the EventStream, spin up the
|
||
actor with its redundancy-topic subscription, publish a `PeerOpcUaProbeActor.OpcUaProbeResult`
|
||
to the `RedundancyStateTopic`, and assert **no** `DeadLetter` wrapping an `OpcUaProbeResult`.
|
||
(Mirror the Phase-B `AllDeadLetters`-probe idiom already in Runtime.Tests.) A combined test
|
||
covering all three is fine if the harness makes that easy; otherwise the `DriverHostActor`
|
||
case is the required guard and the other two are covered by inspection + build.
|
||
|
||
**Step 2 — Add the drop to each actor.** Each already subscribes the redundancy-state topic in
|
||
`PreStart` and imports `Akka.Cluster.Tools.PublishSubscribe`. Add, next to the existing
|
||
`RedundancyStateChanged` / `SubscribeAck` handlers:
|
||
```csharp
|
||
// The redundancy-state topic also carries OpcUaProbeResult (published by OpcUaPublishActor's
|
||
// peer-probes). We don't consume it here — drop it so it doesn't dead-letter. Matches
|
||
// PeerProbeSupervisor / OpcUaPublishActor. (intentional no-op)
|
||
Receive<ZB.MOM.WW.OtOpcUa.Runtime.Health.PeerOpcUaProbeActor.OpcUaProbeResult>(_ => { });
|
||
```
|
||
- `HistorianAdapterActor`: add in the ctor near `Receive<SubscribeAck>(_ => { });` (~line 87).
|
||
- `ScriptedAlarmHostActor`: add in the ctor near the other `Receive<…>` registrations (~line 172).
|
||
- `DriverHostActor`: add in the steady-state behaviour where `Receive<…>` handlers are registered
|
||
(the `Receive<RedundancyStateChanged>` handler — confirm it exists in each behaviour that is
|
||
active while subscribed; add the drop wherever `RedundancyStateChanged` is handled so the probe
|
||
result is dropped in the same behaviour(s)).
|
||
**Note:** if `DriverHostActor` uses `Become`/multiple behaviours, add the drop to every behaviour
|
||
that handles `RedundancyStateChanged`, else the probe still dead-letters in the others.
|
||
|
||
**Step 3 — Test + build:**
|
||
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ProbeResult"` → green;
|
||
`dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime` → 0 errors.
|
||
|
||
**Step 4 — Commit:**
|
||
`git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/`
|
||
then `git commit -m "fix(runtime): drop OpcUaProbeResult in redundancy-topic subscribers (no dead-letter)"`.
|
||
|
||
---
|
||
|
||
### Task 4: Reconcile + full build/test + live `/run` + finish
|
||
|
||
**Classification:** standard (verification + integration review + finish)
|
||
**Estimated implement time:** ~5 min + live run
|
||
**Parallelizable with:** none (depends on T1, T2, T3)
|
||
|
||
**Step 1 — Full build + solution test:**
|
||
`dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 errors;
|
||
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.AdminUI.Tests tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests` → green.
|
||
|
||
**Step 2 — Final integration review** (code-reviewer subagent over `git diff master..HEAD`):
|
||
confirm no Commons/proto/EF change, audit emitted on all cert paths (incl. failures), the
|
||
`AuditEvent` shape matches the package, the drop handler is in every behaviour that subscribes
|
||
the topic, and tests genuinely fail without the fix.
|
||
|
||
**Step 3 — Live `/run` (agent-driven; docker-dev login is disabled).** Rebuild both central
|
||
nodes to the branch (`docker compose -f docker-dev/docker-compose.yml up -d --build central-1
|
||
central-2`); open the AdminUI Certificates page (`http://localhost:9200`), perform a Trust or
|
||
Untrust (or seed a rejected cert first), and confirm a new `ConfigAuditLog` row exists
|
||
(query the shared SQL `10.100.0.35,14330` ConfigDb: principal = the dev admin user,
|
||
EventType = `Certificate:Trust`, Outcome = success). Confirm server logs show **no**
|
||
`OpcUaProbeResult` dead-letter on either node. Capture the evidence.
|
||
|
||
**Step 4 — Finish.** `git checkout master && git merge --ff-only feat/cert-audit-and-probe-tidy`
|
||
(fast-forward; the branch is linear off master), then `git push`. Delete the feature branch.
|
||
Update `pending.md`/`stillpending.md` (on-disk, never-staged) to record both items shipped.
|