# Write-Outcome Self-Correction — Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task. **Goal:** When an authorized inbound OPC UA operator write to a writable equipment-tag node fails at the device, **revert the node to its real pre-write value** on the subscription — killing the "optimistic-write phantom" (`pending.md` #5). Approved design: `docs/plans/2026-06-14-write-outcome-self-correction-design.md` (committed master `8a79cb0e`). **Architecture:** Complete the originally-designed (B3) write gateway. Add `IOpcUaNodeWriteGateway { Task WriteAsync(...) }` (Commons), an Akka adapter `ActorNodeWriteGateway` (Runtime) that Asks the existing `DriverHostActor.RouteNodeWrite`, and swap the node manager's fire-and-forget `Action` router for the gateway. The node manager captures the prior value at `OnWriteValue` time and, on a failed async outcome, **compare-and-reverts** the node (only if it still holds the optimistic value, so a fresh poll isn't clobbered). The dispatch stays non-blocking under the node-manager `Lock`; the correction runs in the off-Lock continuation. **Tech Stack:** C# / .NET 10, OPC UA Foundation stack (`CustomNodeManager2`), Akka.NET, xUnit + Shouldly + Akka.TestKit. `TreatWarningsAsErrors` (production projects). **Branch:** `feat/write-outcome-self-correction` off `master` (`8a79cb0e`). **Hard rules (every task):** stage by path (never `git add .`); never stage `sql_login.txt` / `src/Server/.../pki/` / `pending.md` / `current.md` / `docker-dev/docker-compose.yml`; never echo secrets; no force-push / no `--no-verify`; **NO Configuration entity / EF migration change**; NO bUnit (this is server-side only). Done = build clean + `dotnet test` green + the live `/run` gate (T5) passes. --- ## Task DAG ``` T0 (branch) └─ T1 Commons: IOpcUaNodeWriteGateway + NodeWriteOutcome + Null default [standard] ├─ T2 Runtime: ActorNodeWriteGateway (Asks RouteNodeWrite) + TestKit [standard] ∥ T3 └─ T3 OpcUaServer: node-manager seam swap + capture-prior + revert [high-risk] ∥ T2 └─ T4 Host: wire ActorNodeWriteGateway; SetNodeWriteGateway/clear [standard] (needs T2+T3) └─ T5 Live /run: failing write reverts, success stays [verification] ``` T2 and T3 are both `blockedBy` T1 and touch **different assemblies** (Runtime vs OpcUaServer) — `Parallelizable` with each other. T4 needs both. T5 needs T4. --- ### Task T0: Create feature branch **Classification:** trivial **Estimated implement time:** ~1 min **Parallelizable with:** none (gates all) **Files:** none (git only) ```bash git checkout -b feat/write-outcome-self-correction master git rev-parse --abbrev-ref HEAD # expect: feat/write-outcome-self-correction ``` Do NOT touch the working-tree `docker-dev/docker-compose.yml` / `pending.md` (leave modified+unstaged). --- ### Task T1: Commons — `IOpcUaNodeWriteGateway` + `NodeWriteOutcome` + Null default **Classification:** standard **Estimated implement time:** ~3 min **Parallelizable with:** none (T2 + T3 depend on it) **Files:** - Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/IOpcUaNodeWriteGateway.cs` - Test: the Commons test project (the one that tests `IOpcUaAddressSpaceSink` / `EquipmentNodeIds` — locate it under `tests/Core/`, e.g. `*.Commons.Tests`; confirm exact name) **Context:** mirrors `IOpcUaAddressSpaceSink.cs` (same folder) — a Commons-level (Akka-free) contract with a `Null…` default. The write path will use this instead of the `Action` router so the device outcome can flow back to the node manager. **Step 1: Write the interface + outcome + null default** ```csharp namespace ZB.MOM.WW.OtOpcUa.Commons.OpcUa; /// /// Reverse-path gateway for an inbound OPC UA operator write to a writable equipment-tag node. /// The node manager calls fire-and-forget from its OnWriteValue handler /// (which runs under the node-manager Lock, so the call MUST return promptly — it kicks off an /// asynchronous route and the Task completes later) and uses the resolved /// to self-correct the node on failure. /// public interface IOpcUaNodeWriteGateway { /// Route a write of to the driver backing node /// ; the Task resolves with the device-write outcome. Task WriteAsync(string nodeId, object? value, CancellationToken ct); } /// Outcome of routing an inbound node write to the backing driver. /// True when the driver accepted the write. /// Failure detail when is false; null on success. public readonly record struct NodeWriteOutcome(bool Success, string? Reason); /// No-op gateway: every write resolves to "writes unavailable" (matches the legacy /// no-router-wired BadNotWritable). Used as the default before the host wires the real one. public sealed class NullOpcUaNodeWriteGateway : IOpcUaNodeWriteGateway { /// Singleton instance. public static readonly NullOpcUaNodeWriteGateway Instance = new(); private NullOpcUaNodeWriteGateway() { } /// public Task WriteAsync(string nodeId, object? value, CancellationToken ct) => Task.FromResult(new NodeWriteOutcome(false, "writes unavailable")); } ``` **Step 2: Unit test the Null default** ```csharp [Fact] public async Task NullGateway_returns_writes_unavailable() { var outcome = await NullOpcUaNodeWriteGateway.Instance.WriteAsync("ns=2;s=x", 1, default); outcome.Success.ShouldBeFalse(); outcome.Reason.ShouldBe("writes unavailable"); } ``` **Step 3: Build + test + commit** ```bash dotnet build src/Core/ZB.MOM.WW.OtOpcUa.Commons/ZB.MOM.WW.OtOpcUa.Commons.csproj dotnet test --filter "FullyQualifiedName~NullGateway" git add src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/IOpcUaNodeWriteGateway.cs git commit -m "feat(commons): IOpcUaNodeWriteGateway + NodeWriteOutcome for write-outcome routing" ``` --- ### Task T2: Runtime — `ActorNodeWriteGateway` (Asks `RouteNodeWrite`) **Classification:** standard **Estimated implement time:** ~5 min **Parallelizable with:** T3 **Files:** - Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/ActorNodeWriteGateway.cs` - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/ActorNodeWriteGatewayTests.cs` (NEW) **Context:** this is the current host router lambda (`OtOpcUaServerHostedService.cs:145-161`) repackaged as a named `IOpcUaNodeWriteGateway` that *returns* the outcome. It lives in Runtime (Akka-aware, co-located with `DriverHostActor` + `RouteNodeWrite`/`NodeWriteResult`), so it has an Akka.TestKit home and keeps the Host thin. It resolves the `DriverHostActor` **lazily per write** via the actor registry (StartAsync ordering — same reason the current lambda does), does the bounded 10 s Ask, logs failures (the node manager has no logger), and maps `NodeWriteResult → NodeWriteOutcome`. **Step 1: Write the failing TestKit tests** Mirror the harness in `DriverHostActorWriteRoutingTests.cs` (a `TestProbe` standing in for the `DriverHostActor`, registered under `DriverHostActorKey` in a test `IActorRegistry` — reuse whatever registry fake the suite uses, or construct the gateway with a direct `IActorRef` overload to keep the test simple). Cases: - `WriteAsync_maps_successful_NodeWriteResult_to_success` — probe replies `NodeWriteResult(true, null)` → outcome `(true, null)`. - `WriteAsync_maps_failed_NodeWriteResult_to_failure` — probe replies `NodeWriteResult(false, "not primary")` → outcome `(false, "not primary")`. - `WriteAsync_maps_timeout_to_failure` — probe never replies → outcome `(false, )` (use a short Ask timeout in the test, or an injected timeout). - `WriteAsync_with_no_DriverHostActor_returns_failure` — registry can't resolve → `(false, "writes unavailable"|…)`. ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ActorNodeWriteGateway" # expect FAIL (type missing) ``` **Step 2: Implement** ```csharp public sealed class ActorNodeWriteGateway : IOpcUaNodeWriteGateway { private readonly Func _resolveDriverHost; // lazy per-write resolve (StartAsync ordering) private readonly TimeSpan _askTimeout; private readonly ILogger _logger; public ActorNodeWriteGateway(Func resolveDriverHost, ILogger logger, TimeSpan? askTimeout = null) { _resolveDriverHost = …; _logger = …; _askTimeout = askTimeout ?? TimeSpan.FromSeconds(10); } public async Task WriteAsync(string nodeId, object? value, CancellationToken ct) { var driverHost = _resolveDriverHost(); if (driverHost is null) { _logger.LogWarning("Inbound write to {NodeId} dropped: no DriverHostActor registered", nodeId); return new NodeWriteOutcome(false, "writes unavailable"); } try { var result = await driverHost.Ask( new DriverHostActor.RouteNodeWrite(nodeId, value), _askTimeout, ct).ConfigureAwait(false); if (!result.Success) _logger.LogWarning("Operator write to {NodeId} rejected: {Reason}", nodeId, result.Reason); return new NodeWriteOutcome(result.Success, result.Reason); } catch (Exception ex) // AskTimeoutException / actor faults { _logger.LogWarning(ex, "Operator write to {NodeId} failed or timed out", nodeId); return new NodeWriteOutcome(false, "write timeout"); } } } ``` (Provide whatever constructor shape best fits the test + the host's registry resolve — a `Func` keeps it unit-testable with a direct probe while supporting the host's lazy `_actorRegistry.TryGet`.) **Step 3: test + build + commit** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~ActorNodeWriteGateway" dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ZB.MOM.WW.OtOpcUa.Runtime.csproj git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/ActorNodeWriteGateway.cs tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/ActorNodeWriteGatewayTests.cs git commit -m "feat(runtime): ActorNodeWriteGateway — Asks RouteNodeWrite, returns NodeWriteOutcome" ``` --- ### Task T3: OpcUaServer — node-manager seam swap + capture-prior + compare-and-revert **Classification:** high-risk **Estimated implement time:** ~5 min **Parallelizable with:** T2 **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` (`NodeWriteRouter` 100-104; `OnEquipmentTagWrite` 604-617; `EvaluateEquipmentWrite` 637-657; the `WriteValue` node-update pattern 122-141 to mirror) - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs` (`SetNodeWriteRouter` 52-57) - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/` (the existing `EvaluateEquipmentWrite` test class) **Context:** the node manager has **no logger** (`new OtOpcUaNodeManager(server, configuration)`), so the revert is silent (the gateway logs the failure). The continuation runs off-Lock (after the original Write returns) and re-acquires `Lock` to revert — the same shape `WriteValue` (122-141) already uses for the poll path. **Step 1 (TDD): pure-decision unit tests first** Add tests to the existing `OnEquipmentTagWrite`/`EvaluateEquipmentWrite` test class for the two pure statics this task introduces: ```csharp // gate (role + availability) — replaces the old EvaluateEquipmentWrite(identity, route) shape [Fact] public void Gate_denies_when_identity_null() => OtOpcUaNodeManager.EvaluateEquipmentWriteGate(null, gatewayWired: true)!.StatusCode.ShouldBe((StatusCode)StatusCodes.BadUserAccessDenied); [Fact] public void Gate_denies_when_role_missing() => /* identity without WriteOperate → BadUserAccessDenied */ ; [Fact] public void Gate_unavailable_when_no_gateway() => OtOpcUaNodeManager.EvaluateEquipmentWriteGate(WriteOperateIdentity(), gatewayWired: false)!.StatusCode.ShouldBe((StatusCode)StatusCodes.BadNotWritable); [Fact] public void Gate_passes_returns_null() => OtOpcUaNodeManager.EvaluateEquipmentWriteGate(WriteOperateIdentity(), gatewayWired: true).ShouldBeNull(); // compare-and-revert decision [Theory] [InlineData(true, "v", "v", false)] // success → never revert [InlineData(false, "v", "v", true)] // fail + node still holds optimistic value → revert [InlineData(false, "w", "v", false)] // fail + node moved on (poll) → don't clobber public void ShouldRevert_decides(bool success, string current, string optimistic, bool expected) => OtOpcUaNodeManager.ShouldRevert(new NodeWriteOutcome(success, null), current, optimistic).ShouldBe(expected); ``` Run → expect FAIL (methods don't exist yet). **Step 2: Implement the seam swap + the two pure statics + the continuation** In `OtOpcUaNodeManager.cs`: - Replace the `Action NodeWriteRouter` property + its `volatile` backing field with an `IOpcUaNodeWriteGateway` (default `NullOpcUaNodeWriteGateway.Instance`), volatile-backed, e.g.: ```csharp private volatile IOpcUaNodeWriteGateway _nodeWriteGateway = NullOpcUaNodeWriteGateway.Instance; public IOpcUaNodeWriteGateway NodeWriteGateway { get => _nodeWriteGateway; set => _nodeWriteGateway = value ?? NullOpcUaNodeWriteGateway.Instance; } ``` - Replace `OnEquipmentTagWrite` + `EvaluateEquipmentWrite` with: ```csharp private ServiceResult OnEquipmentTagWrite( ISystemContext context, NodeState node, NumericRange indexRange, QualifiedName dataEncoding, ref object value, ref StatusCode statusCode, ref DateTime timestamp) { var identity = (context as ISessionOperationContext)?.UserIdentity as RoleCarryingUserIdentity; var gateway = _nodeWriteGateway; var gate = EvaluateEquipmentWriteGate(identity, gateway is not NullOpcUaNodeWriteGateway); if (gate is not null) return gate; // Capture optimistic value + the REAL prior value/status BEFORE the SDK applies the write. var optimisticValue = value; var nodeKey = node.NodeId.Identifier?.ToString() ?? string.Empty; object? priorValue = null; StatusCode priorStatus = StatusCodes.Good; if (node is BaseDataVariableState v) { priorValue = v.Value; priorStatus = v.StatusCode; } // Fire-and-forget (MUST NOT block under Lock). On a failed outcome, compare-and-revert. _ = gateway.WriteAsync(nodeKey, optimisticValue, CancellationToken.None) .ContinueWith(t => { var outcome = t.IsCompletedSuccessfully ? t.Result : new NodeWriteOutcome(false, "write dispatch faulted"); RevertOptimisticWriteIfNeeded(nodeKey, outcome, optimisticValue, priorValue, priorStatus); }, CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.Default); return ServiceResult.Good; } internal static ServiceResult? EvaluateEquipmentWriteGate(RoleCarryingUserIdentity? identity, bool gatewayWired) { if (identity is null || !identity.Roles.Contains(OpcUaDataPlaneRoles.WriteOperate, StringComparer.OrdinalIgnoreCase)) return new ServiceResult(StatusCodes.BadUserAccessDenied); if (!gatewayWired) return new ServiceResult(StatusCodes.BadNotWritable, "writes unavailable"); return null; } internal static bool ShouldRevert(NodeWriteOutcome outcome, object? currentNodeValue, object? optimisticValue) => !outcome.Success && Equals(currentNodeValue, optimisticValue); private void RevertOptimisticWriteIfNeeded( string nodeId, NodeWriteOutcome outcome, object? optimisticValue, object? priorValue, StatusCode priorStatus) { lock (Lock) { if (!_variables.TryGetValue(nodeId, out var variable)) return; // rebuilt/removed → no-op if (!ShouldRevert(outcome, variable.Value, optimisticValue)) return; variable.Value = priorValue; variable.StatusCode = priorStatus; variable.Timestamp = DateTime.UtcNow; variable.ClearChangeMasks(SystemContext, includeChildren: false); } } ``` (Confirm `_variables` is the dictionary `WriteValue` uses at 130 — reuse it. Confirm `node` is/casts to `BaseDataVariableState`. Use the node manager's existing `using`s.) - In `OtOpcUaSdkServer.cs`, rename `SetNodeWriteRouter(Action?)` → `SetNodeWriteGateway(IOpcUaNodeWriteGateway?)` setting `_otOpcUaNodeManager.NodeWriteGateway`. Update the XML doc. **Step 3: update the old EvaluateEquipmentWrite tests** to the new gate shape (they currently assert the `(identity, route)` form). Keep the role-gate behaviour identical; only the signature changed. **Step 4: run + build + commit** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests --filter "FullyQualifiedName~EquipmentWrite|FullyQualifiedName~ShouldRevert" dotnet build src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/ZB.MOM.WW.OtOpcUa.OpcUaServer.csproj git add src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/ git commit -m "feat(opcua): write-outcome self-correction — capture prior + compare-and-revert on failure" ``` --- ### Task T4: Host — wire `ActorNodeWriteGateway` **Classification:** standard **Estimated implement time:** ~3 min **Parallelizable with:** none (needs T2 + T3) **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs` (`StartAsync` router-wiring 133-161; `StopAsync` 178-187) **Context:** replace the inline `SetNodeWriteRouter((nodeId,value)=>{ Ask… })` lambda (145-161) with construction of the `ActorNodeWriteGateway` (lazy registry resolve + a host logger) and `SetNodeWriteGateway(gateway)`. Clear it on `StopAsync` (185 `SetNodeWriteRouter(null)` → `SetNodeWriteGateway(null)`). **Step 1: Rewire** ```csharp // replaces lines 145-161 var writeGateway = new ActorNodeWriteGateway( resolveDriverHost: () => _actorRegistry.TryGet(out var h) ? h : null, logger: _loggerFactory.CreateLogger()); _server.SetNodeWriteGateway(writeGateway); ``` And in `StopAsync`: `_server?.SetNodeWriteGateway(null);` Keep the existing explanatory comment block (133-144) — it still describes the fire-and-forget + lazy-resolve rationale; trim the "logs failures from a continuation" clause to reflect the gateway now owning that. **Step 2: build the Host + a focused integration/host test if one exists** ```bash dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj ``` (No unit test here — it's DI wiring; covered by T2's gateway tests + the T5 live gate. If `Host.IntegrationTests` has a boot-smoke test, ensure it still passes.) **Step 3: commit** ```bash git add src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs git commit -m "feat(host): wire ActorNodeWriteGateway for write-outcome self-correction" ``` --- ### Task T5: Live `/run` verification **Classification:** verification **Estimated implement time:** ~10 min (interactive) **Parallelizable with:** none (needs T4) **Goal:** prove on the LOCAL docker-dev rig that a **failed** device write **reverts** the node, and a **successful** write stays. **Setup:** docker-dev is LOCAL (OrbStack). Rebuild central on the branch + redeploy: ```bash docker compose -f docker-dev/docker-compose.yml up -d --build migrator central-1 central-2 curl -s -X POST http://localhost:9200/api/deployments -H "X-Api-Key: docker-dev-deploy-key" ``` If a driver is stuck faulted, restart central-1/central-2 so it respawns from the artifact (the reconfigure-while-faulted gotcha). **Pick a deterministically-FAILING write** (the plan's executor chooses whichever the rig makes easy): - **Modbus illegal write:** point a writable equipment tag at a Modbus address the `:5020` sim rejects (out-of-range / illegal-data-address), so the driver write returns Bad. Sims on `10.100.0.35` (`/opt/otopcua-modbus`). - **Galaxy ViewOnly:** surface a Galaxy `ViewOnly` attribute as a ReadWrite equipment tag — the gateway commit is refused. (`TestMachine_002` attrs from the Galaxy discovery probe.) **Verify with Client.CLI** (bind as the `opc-writeop`/WriteOperate LDAP user — see `docs/Client.CLI.md`; shared GLAuth `10.100.0.35:3893`, users password `password`): 1. `subscribe` the target node; record the current (real) value. 2. `write` a new value through the authorized session → the write returns Good (optimistic) and the node briefly shows the new value, **then reverts** to the prior value on the subscription when the device write fails. ✅ phantom gone. 3. Repeat on a **successful** writable tag (e.g. the proven Modbus HR[200] scratch) → the value **stays** (no spurious revert). ✅ no regression. 4. Confirm the gateway logged the rejection (`docker compose -f docker-dev/docker-compose.yml logs central-1 | grep -i "rejected\|write"`). **Then** finish via **superpowers-extended-cc:finishing-a-development-branch** (intent: merge-to-master + push). After merge, update `pending.md` to mark #5 resolved. --- ## Final verification (after T1–T4, before T5) ```bash dotnet build ZB.MOM.WW.OtOpcUa.slnx # 0 errors dotnet test tests/Core/ dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests ``` ## Out of scope (per the approved design / mechanism choice) - Bad-quality blip on failure; OPC UA `AuditWriteUpdateEvent`; synchronous structural fail-fast. Independently addable later. - `DriverHostActor.RouteNodeWrite` / `NodeWriteResult` and the driver write path are UNCHANGED — this only changes how the outcome is consumed server-side.