# Harden Milestone 1b — Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task. **Goal:** Close out the review nits left from the just-shipped Milestone 1b (equipment-tag live values + inbound operator-write pipeline) — fast-fail degraded-state writes, two micro-optimisations, a FOCAS equipment-tag parse-cache gap, a benign Galaxy dead-letter, a Galaxy reconnect stale-handle gap, two test-hardening additions, and one deployment-facing docs note. **Architecture:** Pure hardening — no new feature surface, **no EF/Configuration schema change, no migration**. Changes are additive fast-fail handlers on existing actor behaviours, a thread-safe cache swap, a writer cache-invalidation seam on the Galaxy reconnect path, test additions, and a docs subsection. Source of these items: `pending.md` follow-ups #3, #4, #6 (from the Milestone 1b integration review), all enumerated below. **Tech Stack:** C# / .NET 10, Akka.NET (DriverHostActor / DriverInstanceActor), xUnit + Shouldly + Akka.TestKit. `TreatWarningsAsErrors` is on solution-wide. **Branch:** `chore/harden-milestone-1b` off `master` (HEAD `02e37a6d`). **Hard rules (every task):** stage by path (never `git add .`); never stage `sql_login.txt` / `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/` / `pending.md` / `current.md` / `docker-dev/docker-compose.yml`; never echo secrets; no force-push / no `--no-verify`; **NO Configuration entity / EF migration change**. Done = build clean + `dotnet test` green for every touched project. --- ## Task DAG ``` T0 (branch) ├─ T1 DriverHostActor: Stale fast-fail + micro-opts + raw-blob routing test [standard] ├─ T2 DriverInstanceActor: Connecting/Reconnecting/Stubbed write fast-fail │ + benign SubscriptionEstablished handler [standard] ├─ T3 FOCAS: cache equipment-tag parsed addresses (ConcurrentDictionary) [small] ├─ T4 Parity test: future-enum trap InlineData(2,false) [trivial] ├─ T5 docs/security.md: data-plane GroupToRole requirement note [trivial] └─ T6 Galaxy writer: invalidate handle caches on reconnect [small] ``` T1–T6 all `blockedBy` T0 and have **disjoint file sets** — they are mutually `Parallelizable`. Dispatch them concurrently after T0. --- ### Task T0: Create feature branch **Classification:** trivial **Estimated implement time:** ~1 min **Parallelizable with:** none (gates all) **Files:** none (git only) **Step 1: Branch off master** ```bash git -C /Users/dohertj2/Desktop/OtOpcUa checkout -b chore/harden-milestone-1b master git -C /Users/dohertj2/Desktop/OtOpcUa rev-parse --abbrev-ref HEAD # expect: chore/harden-milestone-1b ``` Do NOT touch the working-tree `docker-dev/docker-compose.yml` (leave it modified+unstaged throughout). --- ### Task T1: DriverHostActor — Stale-state write fast-fail + micro-opts + raw-blob routing test **Classification:** standard **Estimated implement time:** ~5 min **Parallelizable with:** T2, T3, T4, T5, T6 Addresses pending.md #4(a-host), #4(b), #4(d). **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (Stale behaviour ~537-550; `HandleRouteNodeWrite` ContinueWith line 519; forward-map build line 754-756) - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorWriteRoutingTests.cs` **Context:** `RouteNodeWrite`/`NodeWriteResult` are defined at `DriverHostActor.cs:130,136`. The running behaviour wires `HandleRouteNodeWrite`, but the **`Stale()`** behaviour (DB unreachable; lines 537-550) has no `RouteNodeWrite` handler → an inbound operator write while Stale dead-letters and the node-manager's 10 s Ask times out with a generic log (the client already got the optimistic `Good`). Add a fast-fail. Two micro-opts ride along in the same file. **Step 1: Add the failing Akka.TestKit test for Stale fast-fail** In `DriverHostActorWriteRoutingTests.cs`, add a test that drives the host into `Stale` (construct it with a DB factory that throws, or send whatever the existing tests use to reach Stale — mirror an existing Stale-path test in the file/suite) and asserts an immediate negative `NodeWriteResult`: ```csharp [Fact] public void RouteNodeWrite_while_stale_fast_fails_instead_of_timing_out() { // Arrange: a host that cannot reach the config DB enters Stale on first DispatchDeployment. var host = /* spawn DriverHostActor with a throwing IDbContextFactory — reuse the suite's Stale helper */; DriveIntoStale(host); // Act host.Tell(new DriverHostActor.RouteNodeWrite("ns=2;s=EQ-x/Tag", 42), TestActor); var reply = ExpectMsg(TimeSpan.FromSeconds(2)); // Assert: fast negative, not a ~10s timeout reply.Success.ShouldBeFalse(); reply.Reason.ShouldContain("stale"); } ``` If the suite has no existing Stale helper, reach Stale the same way the existing "ignoring DispatchDeployment while Stale" coverage does; if none exists, construct the actor with a `IDbContextFactory` stub whose `CreateDbContext` throws and send the message that triggers the DB read. Keep the helper local to the test file. **Step 2: Run it — expect FAIL (times out / dead-letters, no reply)** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests \ --filter "FullyQualifiedName~RouteNodeWrite_while_stale_fast_fails" ``` **Step 3: Add the Stale fast-fail handler** In `DriverHostActor.cs` `Stale()` (after the existing `Receive` line 547), add: ```csharp // An inbound operator write can't be serviced while the config DB is unreachable — fast-fail so the // node-manager's bounded Ask gets an immediate clear status instead of dead-lettering into a timeout. Receive(_ => Sender.Tell(new NodeWriteResult(false, "driver host stale (config DB unreachable)"))); ``` **Step 4: Micro-opt — drop `ExecuteSynchronously` on the router continuation** In `HandleRouteNodeWrite` (line 514-520), the `.ContinueWith(...)` passes `TaskContinuationOptions.ExecuteSynchronously`. Running the continuation synchronously on the Ask's completing thread (a pool/transport thread) is needless coupling for a tiny projection. Change it to `TaskContinuationOptions.None`: ```csharp .ContinueWith( t => t.IsCompletedSuccessfully ? new NodeWriteResult(t.Result.Success, t.Result.Reason) : new NodeWriteResult(false, "write timeout"), CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.Default) .PipeTo(replyTo); ``` **Step 5: Micro-opt — `List.Contains` → `HashSet` in the forward-map build** The forward map `_nodeIdByDriverRef` (declared as `Dictionary<.., List>`) does `if (!list.Contains(nodeId)) list.Add(nodeId);` at line 756 — O(n) per node. Convert the value type to `HashSet`: - Change the field declaration to `Dictionary<(string,string), HashSet>` (keep `StringComparer.Ordinal` on the inner set). - Line 754-756 becomes: ```csharp if (!_nodeIdByDriverRef.TryGetValue(key, out var set)) _nodeIdByDriverRef[key] = set = new HashSet(StringComparer.Ordinal); set.Add(nodeId); ``` - **Grep every consumer of `_nodeIdByDriverRef`** (`grep -n _nodeIdByDriverRef src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs`) and confirm they only `foreach`/`Count`/membership-check the value (all of which `HashSet` supports). If any consumer indexes by position (`list[i]`) leave it a `List` and skip this micro-opt — note that in the commit message. (Expected: ForwardToMux only enumerates.) **Step 6: Raw protocol-blob routing test (#4d)** The suite seeds equipment tags with `TagConfig = JsonSerializer.Serialize(new { FullName = t.FullName })` (a Galaxy-style `{FullName}` blob). The router keys purely on **NodeId**, so the TagConfig blob shape is irrelevant to routing — prove that with a raw protocol blob. Add a tag to the seeded composition (or a new focused test) whose `TagConfig` is a raw Modbus-style blob and assert the write still routes to the owning driver by NodeId: ```csharp [Fact] public void RouteNodeWrite_routes_by_nodeId_regardless_of_raw_tagconfig_blob_shape() { // Seed an equipment tag whose TagConfig is a RAW protocol blob (region/address), NOT {FullName}. var rawBlob = JsonSerializer.Serialize(new { region = "HoldingRegister", address = 200, dataType = "UInt16" }); var host = SeedHostWithEquipmentTag(driverId: "MAIN-modbus-eq", tagConfig: rawBlob, /* nodeId derived as in the suite */); host.Tell(new DriverHostActor.RouteNodeWrite(expectedNodeId, (ushort)7), TestActor); // The child receives WriteAttribute carrying the wire-ref FullName (the blob is the wire-ref here); // assert via the suite's existing child probe that the write was forwarded (Success path). var reply = ExpectMsg(); reply.Success.ShouldBeTrue(); } ``` Reuse the suite's existing child-probe / seed helpers (the file already exercises `RouteNodeWrite` happy-path — clone that setup and only swap the `TagConfig` to the raw blob). The point is belt-and-suspenders: routing is blob-shape-independent. **Step 7: Run the full routing suite — expect PASS** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverHostActorWriteRouting" ``` **Step 8: Build + commit (stage by path)** ```bash git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs \ tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorWriteRoutingTests.cs git commit -m "fix(runtime): fast-fail RouteNodeWrite while Stale + micro-opts + raw-blob routing test" ``` --- ### Task T2: DriverInstanceActor — write fast-fail in degraded states + benign SubscriptionEstablished handler **Classification:** standard **Estimated implement time:** ~5 min **Parallelizable with:** T1, T3, T4, T5, T6 Addresses pending.md #4(a-instance) and #6(a). **Files:** - Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (Stubbed ~205-209; `Connecting()` 211-231; `Reconnecting()` 265-281; `Connected()` 233-263) - Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorWriteAndSubscribeTests.cs` (NEW file — avoids touching T1's test file) **Context:** - `WriteAttribute`/`WriteAttributeResult` are at lines 41-42. Only `Connected()` handles `WriteAttribute` (line 252). When the actor is `Connecting`/`Reconnecting`/`Stubbed`, an inbound `WriteAttribute` (Asked by `DriverHostActor.HandleRouteNodeWrite` with an 8 s timeout) dead-letters → the Ask times out as a generic "write timeout". Fast-fail in those states with a clear reason. - `ResubscribeDesired()` (line 437-443) does `Self.Tell(new Subscribe(...))`; `HandleSubscribeAsync` replies `SubscriptionEstablished` to `replyTo = Sender = Self` (line 379) → no handler → the benign dead-letter `Message [SubscriptionEstablished] from drv-… to drv-… was unhandled` seen live on Galaxy. This is the **generic** actor, so a no-op handler quiets it for every driver. **Step 1: Failing tests (new file)** ```csharp public sealed class DriverInstanceActorWriteAndSubscribeTests : TestKit { [Fact] public void WriteAttribute_while_connecting_fast_fails() { var actor = SpawnInConnectingState(); // see helper note below actor.Tell(new DriverInstanceActor.WriteAttribute("Tag.Attr", 1), TestActor); var r = ExpectMsg(TimeSpan.FromSeconds(2)); r.Success.ShouldBeFalse(); r.Reason.ShouldNotBeNull(); } [Fact] public void SubscriptionEstablished_is_swallowed_without_deadletter() { var actor = SpawnConnected(); // subscribe with a desired set so ResubscribeDesired self-Tells Subscribe and the reply lands on Self // Subscribe to DeadLetters and assert none of type SubscriptionEstablished arrive. var deadLetters = CreateTestProbe(); Sys.EventStream.Subscribe(deadLetters, typeof(DeadLetter)); // drive a (re)subscribe ... then: deadLetters.ExpectNoMsg(TimeSpan.FromMilliseconds(500)); } } ``` Helper note: reuse the spawn/connect helpers the existing DriverInstanceActor tests use (look for them in the Runtime.Tests `Drivers/` folder; there is established setup for stub drivers and the Connected/Reconnecting transitions). To reach `Connecting`, send `InitializeRequested` with a driver-config that the stub keeps pending, or `Become`-drive via the suite's existing harness. If reaching exact internal states is awkward, the **minimum viable** assertions are: (a) a `WriteAttribute` to a Stubbed actor returns a negative `WriteAttributeResult` quickly, and (b) no `SubscriptionEstablished` dead-letter is produced during a connect+resubscribe cycle. **Step 2: Run — expect FAIL** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverInstanceActorWriteAndSubscribe" ``` **Step 3: Add write fast-fail to the three degraded behaviours** Add this line to `Stubbed` (after line 208), `Connecting()` (after line 230), and `Reconnecting()` (after line 279): ```csharp Receive(_ => Sender.Tell(new WriteAttributeResult(false, "driver not connected"))); ``` (Stubbed may use a reason like `"driver stubbed"` — implementer's call; the host maps either into a `NodeWriteResult(false, …)`.) **Step 4: Add the benign SubscriptionEstablished handler** In `Connected()` (alongside the other `Receive`s, e.g. after line 262) add a swallow-with-trace so the self-reply from `ResubscribeDesired` doesn't dead-letter: ```csharp // ResubscribeDesired self-Tells Subscribe; HandleSubscribeAsync replies SubscriptionEstablished to the // sender, which on the self-resubscribe path is Self. Swallow it (trace only) so it doesn't dead-letter. Receive(msg => _log.Debug("DriverInstance {Id}: subscription established ({Count} refs, {Diag})", _driverInstanceId, msg.ReferenceCount, msg.DiagnosticId)); ``` Add the same one-liner to `Reconnecting()` and `Connecting()` to cover the race where the async subscribe reply lands after a state flip (cheap, same file). Do NOT alter where/how `SubscriptionEstablished` is *sent* — a host that genuinely Asks `Subscribe` still receives its reply (the handler only runs when the message arrives at this actor as the asker). **Step 5: Run — expect PASS, then build the project** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverInstanceActorWriteAndSubscribe" dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ZB.MOM.WW.OtOpcUa.Runtime.csproj ``` **Step 6: Commit (stage by path)** ```bash git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs \ tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorWriteAndSubscribeTests.cs git commit -m "fix(runtime): fast-fail writes in degraded driver states + swallow self SubscriptionEstablished" ``` --- ### Task T3: FOCAS — cache equipment-tag parsed addresses **Classification:** small **Estimated implement time:** ~4 min **Parallelizable with:** T1, T2, T4, T5, T6 Addresses pending.md #4(c). **Files:** - Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriver.cs` (cache field 37-38; read path 275-280; write path 337-342; init seed 120) - Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/` (add to an existing FocasDriver read/write test class) **Context:** `_parsedAddressesByTagName` (a plain `Dictionary`, line 37) is seeded only from `_options.Tags` at init (line 120). Equipment tags arrive via `_resolver` (line 66) as transient defs whose `Name` is the raw ref — never seeded — so both `ReadAsync` (275-280) and `WriteAsync` (337-342) hit the `TryGetValue` miss and **re-parse `FocasAddress.TryParse(def.Address)` every call without storing**. Fix: make the cache a `ConcurrentDictionary` (reads run on the poll loop, writes on the host Ask thread — concurrent access) and `GetOrAdd` so a transient def's address is parsed once. **Step 1: Failing test — second resolve must reuse the cache** Add an `internal` test seam to `FocasDriver` so the test can observe caching without a parser spy: ```csharp // In FocasDriver (internal, for tests via InternalsVisibleTo — confirm the attribute exists for the .Tests asm) internal bool IsParsedAddressCached(string reference) => _parsedAddressesByTagName.ContainsKey(reference); ``` Test (equipment-tag ref, not an authored tag): ```csharp [Fact] public async Task Equipment_tag_address_is_parsed_once_and_cached() { var driver = /* construct FocasDriver with a fake client + one device, no authored Tags */; await driver.InitializeAsync(configJson, default); var equipRef = "{\"address\":\"R100\",\"dataType\":\"Int16\",\"deviceHostAddress\":\"focas://10.0.0.5\",\"writable\":true}"; // matches FocasEquipmentTagParser keys driver.IsParsedAddressCached(equipRef).ShouldBeFalse(); // not seeded at init await driver.ReadAsync(new[] { equipRef }, default); // first resolve parses + caches driver.IsParsedAddressCached(equipRef).ShouldBeTrue(); // cached for subsequent hot calls } ``` Use the exact equipment-ref JSON shape that `FocasEquipmentTagParser.TryParse` accepts (open that parser to copy its camelCase keys + required fields). If `InternalsVisibleTo` for the `.Tests` assembly isn't already present on the FOCAS driver project, add it (mirror how the other drivers expose internals to their test projects). **Step 2: Run — expect FAIL (second assert false: never cached)** ```bash dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests --filter "FullyQualifiedName~Equipment_tag_address_is_parsed_once" ``` **Step 3: Implement the cache** - Change the field (line 37-38) to: ```csharp private readonly ConcurrentDictionary _parsedAddressesByTagName = new(StringComparer.OrdinalIgnoreCase); ``` (add `using System.Collections.Concurrent;` if absent). - Add a private helper: ```csharp private FocasAddress ResolveParsedAddress(FocasTagDefinition def) => _parsedAddressesByTagName.GetOrAdd(def.Name, _ => FocasAddress.TryParse(def.Address) ?? throw new InvalidOperationException( $"FOCAS tag '{def.Name}' has malformed Address '{def.Address}'.")); ``` (The throwing factory propagates and does **not** cache a bad parse — preserving the existing fail behaviour. `GetOrAdd` may invoke the factory more than once under race but only stores one — fine, parsing is pure.) - Replace the read-path block (275-280) and the write-path block (337-342) with `var parsed = ResolveParsedAddress(def);`. - Line 120 (`_parsedAddressesByTagName[tag.Name] = parsed;`) compiles unchanged on `ConcurrentDictionary`; keep it (init pre-validates + seeds authored tags). **Step 4: Run — expect PASS, build the project** ```bash dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests --filter "FullyQualifiedName~Equipment_tag_address_is_parsed_once" dotnet build src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.csproj ``` **Step 5: Commit (stage by path)** ```bash git add src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/ git commit -m "perf(focas): cache equipment-tag parsed addresses (no per-call reparse)" ``` --- ### Task T4: Parity test — future-enum trap **Classification:** trivial **Estimated implement time:** ~2 min **Parallelizable with:** T1, T2, T3, T5, T6 Addresses pending.md #4(e). **Files:** - Test only: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DeploymentArtifactAliasParityTests.cs` (numeric-AccessLevel theory ~73-104) **Context:** `TagAccessLevel` has `Read = 0`, `ReadWrite = 1` (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/TagAccessLevel.cs`). The parity test maps numeric AccessLevel → `Writable` with `[InlineData(1, true)]`/`[InlineData(0, false)]`. Add a future-enum guard so an as-yet-undefined value maps fail-safe to non-writable (only `ReadWrite==1` should yield a writable node). **Step 1: Add the InlineData row** In the numeric-AccessLevel `[Theory]` (the one with `[InlineData(1, true)]` / `[InlineData(0, false)]`), add: ```csharp [InlineData(2, false)] // future/unknown AccessLevel must map fail-safe to read-only (only ReadWrite⇒writable) ``` **Step 2: Run — expect PASS (confirms the derivation is `== ReadWrite`, not `!= Read`)** ```bash dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DeploymentArtifactAliasParity" ``` If it FAILS, the production derivation is using `!= Read` (so `2` wrongly maps to writable) — that is a real fail-safety bug; STOP and surface it (the fix would be a one-line `== TagAccessLevel.ReadWrite` in both Phase7Composer and DeploymentArtifact, but only with reviewer sign-off since it touches the byte-parity contract). **Step 3: Commit (stage by path)** ```bash git add tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DeploymentArtifactAliasParityTests.cs git commit -m "test(runtime): future-enum trap for AccessLevel→Writable parity" ``` --- ### Task T5: docs/security.md — data-plane GroupToRole requirement **Classification:** trivial **Estimated implement time:** ~3 min **Parallelizable with:** T1, T2, T3, T4, T6 Addresses pending.md #3. **Files:** - Modify: `docs/security.md` (Data-Plane Authorization section starts line 167; `GroupToRole` is described at line 163 in the LDAP section; troubleshooting at 313-315) **Context:** OPC UA session roles are unioned from two sources: the DB `LdapGroupRoleMapping` (whose `Role` column is the **`AdminRole`** enum — Administrator/Designer/Viewer only) AND the appsettings `Security:Ldap:GroupToRole` (free-form `string→string`). The data-plane gates (`WriteOperate`, `WriteTune`, `WriteConfigure`, `AlarmAck`/`AlarmAcknowledge`, `ReadOnly`) read literal **role strings** that the AdminRole-typed DB mapping **cannot produce** — so a deployment MUST map its data-plane LDAP groups → data-plane role strings via `GroupToRole`, or write-through and OPC UA alarm-ack are silently inert (every write → `BadUserAccessDenied`, default-deny). Line 163 currently frames `GroupToRole` as mapping only to *Admin* roles — that's the misleading bit to correct. **Step 1: Add a "Role grant source (data-plane)" subsection** Under **## Data-Plane Authorization** (after the "Full model" subsection ~line 241, or right after the "Permission flags" mapping ~216), add a subsection. Suggested content (match the doc's voice; keep it tight): ```markdown ### Role grant source (data-plane) Data-plane roles come from `Security:Ldap:GroupToRole` (appsettings), **not** from the Config-DB `LdapGroupRoleMapping` table. That table's `Role` column is the `AdminRole` enum (`Administrator`/`Designer`/`Viewer`) and supplies **control-plane** roles only — it cannot emit the data-plane role strings the OPC UA gates read (`ReadOnly`, `WriteOperate`, `WriteTune`, `WriteConfigure`, `AlarmAcknowledge`). A deployment therefore **must** map its data-plane LDAP groups to those role strings via `GroupToRole`, e.g.: ```json "GroupToRole": { "ot-operators": "WriteOperate", "ot-tuners": "WriteTune", "ot-engineers": "WriteConfigure", "ot-alarm-ack": "AlarmAcknowledge", "ot-readonly": "ReadOnly" } ``` If this mapping is absent the data-plane evaluator is strictly default-deny: inbound operator writes and OPC UA Part-9 alarm acknowledgement all return `BadUserAccessDenied` even for users who authenticate successfully. (The same requirement gates the pre-existing scripted-alarm ack path.) ``` Also tighten line 163 (the `GroupToRole` description in the LDAP section) so it notes the values are control-plane **and** data-plane role strings, cross-referencing the new subsection — a one-clause edit, not a rewrite. **Step 2: Verify the doc renders / no broken anchors, then commit** ```bash git add docs/security.md git commit -m "docs(security): document the GroupToRole data-plane role requirement" ``` (No test; verification = re-read the section for accuracy against the permission-flag names at lines 186-216.) --- ### Task T6: Galaxy writer — invalidate handle caches on reconnect **Classification:** small **Estimated implement time:** ~5 min **Parallelizable with:** T1, T2, T3, T4, T5 Addresses pending.md #6(b). **Files:** - Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/IGalaxyDataWriter.cs` (interface — add method) - Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs` (caches 28-32; implement clear) - Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/TracedGalaxyDataWriter.cs` (decorator — delegate) - Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (`ReopenAsync` 291-296; `_dataWriter` set at 255-257) - Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/` (GatewayGalaxyDataWriter test class) **Context:** `GatewayGalaxyDataWriter` caches `_itemHandles` (ref→gw item handle, line 28) and `_supervisedHandles` (handle→advised, line 32). `_dataWriter` is built **once** in `BuildProductionRuntimeAsync` (255-257) and reused across reconnects. On reconnect, `ReopenAsync` (291-296) calls `_ownedMxSession.RecreateAsync` — which disposes + rebuilds the session, so all prior gw item handles are **dead**. The writer's caches still hold the stale handles, so a write right after a reconnect AddItem-skips and uses a dead handle. Add a cache-invalidation seam called on reopen. **Step 1: Failing test — invalidation forces a fresh AddItem on the next write** In the existing `GatewayGalaxyDataWriter` test class (it already constructs the writer over a fake/owned session — reuse that harness), add: ```csharp [Fact] public async Task InvalidateHandleCaches_forces_AddItem_again_on_next_write() { var (writer, fakeSession) = BuildWriterWithFakeSession(); // reuse existing harness await writer.WriteAsync(new[] { new WriteRequest("Tag.Attr", 1) }, _ => SecurityClassification.Operate, default); fakeSession.AddItemCallCount.ShouldBe(1); writer.InvalidateHandleCaches(); await writer.WriteAsync(new[] { new WriteRequest("Tag.Attr", 2) }, _ => SecurityClassification.Operate, default); fakeSession.AddItemCallCount.ShouldBe(2); // re-AddItem'd because the cache was cleared } ``` If the existing fake session doesn't count `AddItem`, either extend it with a counter or assert via an internal seam on the writer (`internal int CachedItemHandleCount => _itemHandles.Count;`): assert `>0` after a write, `0` after `InvalidateHandleCaches()`. Pick whichever the existing harness makes cleanest. **Step 2: Run — expect FAIL (no such method)** ```bash dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter "FullyQualifiedName~InvalidateHandleCaches" ``` **Step 3: Add the method to the interface + impl + decorator** - `IGalaxyDataWriter`: add ```csharp /// Drop cached gw item handles + supervisory-advise state. Call after a session /// reconnect — the old handles are dead, so the next write must re-AddItem + re-advise. void InvalidateHandleCaches(); ``` - `GatewayGalaxyDataWriter`: ```csharp public void InvalidateHandleCaches() { _itemHandles.Clear(); _supervisedHandles.Clear(); } ``` - `TracedGalaxyDataWriter` (decorator): delegate — `public void InvalidateHandleCaches() => _inner.InvalidateHandleCaches();` (match the field name the decorator uses for its inner writer). **Step 4: Call it on reopen** In `GalaxyDriver.ReopenAsync` (291-296), after `RecreateAsync` succeeds: ```csharp private async Task ReopenAsync(CancellationToken cancellationToken) { if (_ownedMxSession is null) return; var clientOptions = BuildClientOptions(_options.Gateway); await _ownedMxSession.RecreateAsync(clientOptions, cancellationToken).ConfigureAwait(false); // The recreated session invalidates every prior gw item handle; drop the writer's handle/advise // caches so the next write re-AddItems + re-AdviseSupervisory against the fresh session. _dataWriter?.InvalidateHandleCaches(); } ``` (`_dataWriter` is the `TracedGalaxyDataWriter`-wrapped field set at 255-257; the call flows through to the inner `GatewayGalaxyDataWriter`.) **Step 5: Run — expect PASS, run the full Galaxy suite, build** ```bash dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter "FullyQualifiedName~InvalidateHandleCaches" dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests ``` **Step 6: Commit (stage by path)** ```bash git add src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ git commit -m "fix(galaxy): invalidate writer handle caches on session reconnect" ``` --- ## Final verification (after T1–T6) ```bash dotnet build ZB.MOM.WW.OtOpcUa.slnx # 0 errors (TreatWarningsAsErrors) dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests ``` These are all unit-level changes (no live docker-dev `/run` gate needed — none touch Razor/JS or the wire protocol behaviourally; the fast-fail paths and cache changes are exercised by Akka.TestKit + unit tests). Then finish via **superpowers-extended-cc:finishing-a-development-branch** (intent: merge-to-master + push). After merge, update `pending.md` to mark #3, #4, #6 resolved. ## Out of scope (left in pending.md) - #4 sub-item: FOCAS — covered (T3). The "surface real device-write status" item (#5) is a **separate** chosen-later effort, not part of this cluster. - #7 driver-reconfigure-while-faulted — its own high-risk actor-state-machine design/plan. - #1/#2 Phase B/C; #8 rig cleanups — unchanged.