docs(plans): harden-milestone-1b plan + task record
v2-ci / build (push) Failing after 48s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
v2-ci / build (push) Failing after 48s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
This commit is contained in:
@@ -0,0 +1,557 @@
|
||||
# Harden Milestone 1b — Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Close out the review nits left from the just-shipped Milestone 1b (equipment-tag live values + inbound operator-write pipeline) — fast-fail degraded-state writes, two micro-optimisations, a FOCAS equipment-tag parse-cache gap, a benign Galaxy dead-letter, a Galaxy reconnect stale-handle gap, two test-hardening additions, and one deployment-facing docs note.
|
||||
|
||||
**Architecture:** Pure hardening — no new feature surface, **no EF/Configuration schema change, no migration**. Changes are additive fast-fail handlers on existing actor behaviours, a thread-safe cache swap, a writer cache-invalidation seam on the Galaxy reconnect path, test additions, and a docs subsection. Source of these items: `pending.md` follow-ups #3, #4, #6 (from the Milestone 1b integration review), all enumerated below.
|
||||
|
||||
**Tech Stack:** C# / .NET 10, Akka.NET (DriverHostActor / DriverInstanceActor), xUnit + Shouldly + Akka.TestKit. `TreatWarningsAsErrors` is on solution-wide.
|
||||
|
||||
**Branch:** `chore/harden-milestone-1b` off `master` (HEAD `02e37a6d`).
|
||||
|
||||
**Hard rules (every task):** stage by path (never `git add .`); never stage `sql_login.txt` / `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/` / `pending.md` / `current.md` / `docker-dev/docker-compose.yml`; never echo secrets; no force-push / no `--no-verify`; **NO Configuration entity / EF migration change**. Done = build clean + `dotnet test` green for every touched project.
|
||||
|
||||
---
|
||||
|
||||
## Task DAG
|
||||
|
||||
```
|
||||
T0 (branch)
|
||||
├─ T1 DriverHostActor: Stale fast-fail + micro-opts + raw-blob routing test [standard]
|
||||
├─ T2 DriverInstanceActor: Connecting/Reconnecting/Stubbed write fast-fail
|
||||
│ + benign SubscriptionEstablished handler [standard]
|
||||
├─ T3 FOCAS: cache equipment-tag parsed addresses (ConcurrentDictionary) [small]
|
||||
├─ T4 Parity test: future-enum trap InlineData(2,false) [trivial]
|
||||
├─ T5 docs/security.md: data-plane GroupToRole requirement note [trivial]
|
||||
└─ T6 Galaxy writer: invalidate handle caches on reconnect [small]
|
||||
```
|
||||
|
||||
T1–T6 all `blockedBy` T0 and have **disjoint file sets** — they are mutually `Parallelizable`. Dispatch them concurrently after T0.
|
||||
|
||||
---
|
||||
|
||||
### Task T0: Create feature branch
|
||||
|
||||
**Classification:** trivial
|
||||
**Estimated implement time:** ~1 min
|
||||
**Parallelizable with:** none (gates all)
|
||||
|
||||
**Files:** none (git only)
|
||||
|
||||
**Step 1: Branch off master**
|
||||
|
||||
```bash
|
||||
git -C /Users/dohertj2/Desktop/OtOpcUa checkout -b chore/harden-milestone-1b master
|
||||
git -C /Users/dohertj2/Desktop/OtOpcUa rev-parse --abbrev-ref HEAD # expect: chore/harden-milestone-1b
|
||||
```
|
||||
|
||||
Do NOT touch the working-tree `docker-dev/docker-compose.yml` (leave it modified+unstaged throughout).
|
||||
|
||||
---
|
||||
|
||||
### Task T1: DriverHostActor — Stale-state write fast-fail + micro-opts + raw-blob routing test
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** T2, T3, T4, T5, T6
|
||||
|
||||
Addresses pending.md #4(a-host), #4(b), #4(d).
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (Stale behaviour ~537-550; `HandleRouteNodeWrite` ContinueWith line 519; forward-map build line 754-756)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorWriteRoutingTests.cs`
|
||||
|
||||
**Context:** `RouteNodeWrite`/`NodeWriteResult` are defined at `DriverHostActor.cs:130,136`. The running behaviour wires `HandleRouteNodeWrite`, but the **`Stale()`** behaviour (DB unreachable; lines 537-550) has no `RouteNodeWrite` handler → an inbound operator write while Stale dead-letters and the node-manager's 10 s Ask times out with a generic log (the client already got the optimistic `Good`). Add a fast-fail. Two micro-opts ride along in the same file.
|
||||
|
||||
**Step 1: Add the failing Akka.TestKit test for Stale fast-fail**
|
||||
|
||||
In `DriverHostActorWriteRoutingTests.cs`, add a test that drives the host into `Stale` (construct it with a DB factory that throws, or send whatever the existing tests use to reach Stale — mirror an existing Stale-path test in the file/suite) and asserts an immediate negative `NodeWriteResult`:
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public void RouteNodeWrite_while_stale_fast_fails_instead_of_timing_out()
|
||||
{
|
||||
// Arrange: a host that cannot reach the config DB enters Stale on first DispatchDeployment.
|
||||
var host = /* spawn DriverHostActor with a throwing IDbContextFactory — reuse the suite's Stale helper */;
|
||||
DriveIntoStale(host);
|
||||
|
||||
// Act
|
||||
host.Tell(new DriverHostActor.RouteNodeWrite("ns=2;s=EQ-x/Tag", 42), TestActor);
|
||||
var reply = ExpectMsg<DriverHostActor.NodeWriteResult>(TimeSpan.FromSeconds(2));
|
||||
|
||||
// Assert: fast negative, not a ~10s timeout
|
||||
reply.Success.ShouldBeFalse();
|
||||
reply.Reason.ShouldContain("stale");
|
||||
}
|
||||
```
|
||||
|
||||
If the suite has no existing Stale helper, reach Stale the same way the existing "ignoring DispatchDeployment while Stale" coverage does; if none exists, construct the actor with a `IDbContextFactory` stub whose `CreateDbContext` throws and send the message that triggers the DB read. Keep the helper local to the test file.
|
||||
|
||||
**Step 2: Run it — expect FAIL (times out / dead-letters, no reply)**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests \
|
||||
--filter "FullyQualifiedName~RouteNodeWrite_while_stale_fast_fails"
|
||||
```
|
||||
|
||||
**Step 3: Add the Stale fast-fail handler**
|
||||
|
||||
In `DriverHostActor.cs` `Stale()` (after the existing `Receive<RedundancyStateChanged>` line 547), add:
|
||||
|
||||
```csharp
|
||||
// An inbound operator write can't be serviced while the config DB is unreachable — fast-fail so the
|
||||
// node-manager's bounded Ask gets an immediate clear status instead of dead-lettering into a timeout.
|
||||
Receive<RouteNodeWrite>(_ =>
|
||||
Sender.Tell(new NodeWriteResult(false, "driver host stale (config DB unreachable)")));
|
||||
```
|
||||
|
||||
**Step 4: Micro-opt — drop `ExecuteSynchronously` on the router continuation**
|
||||
|
||||
In `HandleRouteNodeWrite` (line 514-520), the `.ContinueWith(...)` passes `TaskContinuationOptions.ExecuteSynchronously`. Running the continuation synchronously on the Ask's completing thread (a pool/transport thread) is needless coupling for a tiny projection. Change it to `TaskContinuationOptions.None`:
|
||||
|
||||
```csharp
|
||||
.ContinueWith(
|
||||
t => t.IsCompletedSuccessfully
|
||||
? new NodeWriteResult(t.Result.Success, t.Result.Reason)
|
||||
: new NodeWriteResult(false, "write timeout"),
|
||||
CancellationToken.None,
|
||||
TaskContinuationOptions.None,
|
||||
TaskScheduler.Default)
|
||||
.PipeTo(replyTo);
|
||||
```
|
||||
|
||||
**Step 5: Micro-opt — `List.Contains` → `HashSet` in the forward-map build**
|
||||
|
||||
The forward map `_nodeIdByDriverRef` (declared as `Dictionary<.., List<string>>`) does `if (!list.Contains(nodeId)) list.Add(nodeId);` at line 756 — O(n) per node. Convert the value type to `HashSet<string>`:
|
||||
- Change the field declaration to `Dictionary<(string,string), HashSet<string>>` (keep `StringComparer.Ordinal` on the inner set).
|
||||
- Line 754-756 becomes:
|
||||
```csharp
|
||||
if (!_nodeIdByDriverRef.TryGetValue(key, out var set))
|
||||
_nodeIdByDriverRef[key] = set = new HashSet<string>(StringComparer.Ordinal);
|
||||
set.Add(nodeId);
|
||||
```
|
||||
- **Grep every consumer of `_nodeIdByDriverRef`** (`grep -n _nodeIdByDriverRef src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs`) and confirm they only `foreach`/`Count`/membership-check the value (all of which `HashSet` supports). If any consumer indexes by position (`list[i]`) leave it a `List` and skip this micro-opt — note that in the commit message. (Expected: ForwardToMux only enumerates.)
|
||||
|
||||
**Step 6: Raw protocol-blob routing test (#4d)**
|
||||
|
||||
The suite seeds equipment tags with `TagConfig = JsonSerializer.Serialize(new { FullName = t.FullName })` (a Galaxy-style `{FullName}` blob). The router keys purely on **NodeId**, so the TagConfig blob shape is irrelevant to routing — prove that with a raw protocol blob. Add a tag to the seeded composition (or a new focused test) whose `TagConfig` is a raw Modbus-style blob and assert the write still routes to the owning driver by NodeId:
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public void RouteNodeWrite_routes_by_nodeId_regardless_of_raw_tagconfig_blob_shape()
|
||||
{
|
||||
// Seed an equipment tag whose TagConfig is a RAW protocol blob (region/address), NOT {FullName}.
|
||||
var rawBlob = JsonSerializer.Serialize(new { region = "HoldingRegister", address = 200, dataType = "UInt16" });
|
||||
var host = SeedHostWithEquipmentTag(driverId: "MAIN-modbus-eq", tagConfig: rawBlob, /* nodeId derived as in the suite */);
|
||||
|
||||
host.Tell(new DriverHostActor.RouteNodeWrite(expectedNodeId, (ushort)7), TestActor);
|
||||
|
||||
// The child receives WriteAttribute carrying the wire-ref FullName (the blob is the wire-ref here);
|
||||
// assert via the suite's existing child probe that the write was forwarded (Success path).
|
||||
var reply = ExpectMsg<DriverHostActor.NodeWriteResult>();
|
||||
reply.Success.ShouldBeTrue();
|
||||
}
|
||||
```
|
||||
|
||||
Reuse the suite's existing child-probe / seed helpers (the file already exercises `RouteNodeWrite` happy-path — clone that setup and only swap the `TagConfig` to the raw blob). The point is belt-and-suspenders: routing is blob-shape-independent.
|
||||
|
||||
**Step 7: Run the full routing suite — expect PASS**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverHostActorWriteRouting"
|
||||
```
|
||||
|
||||
**Step 8: Build + commit (stage by path)**
|
||||
|
||||
```bash
|
||||
git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs \
|
||||
tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorWriteRoutingTests.cs
|
||||
git commit -m "fix(runtime): fast-fail RouteNodeWrite while Stale + micro-opts + raw-blob routing test"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task T2: DriverInstanceActor — write fast-fail in degraded states + benign SubscriptionEstablished handler
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** T1, T3, T4, T5, T6
|
||||
|
||||
Addresses pending.md #4(a-instance) and #6(a).
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (Stubbed ~205-209; `Connecting()` 211-231; `Reconnecting()` 265-281; `Connected()` 233-263)
|
||||
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorWriteAndSubscribeTests.cs` (NEW file — avoids touching T1's test file)
|
||||
|
||||
**Context:**
|
||||
- `WriteAttribute`/`WriteAttributeResult` are at lines 41-42. Only `Connected()` handles `WriteAttribute` (line 252). When the actor is `Connecting`/`Reconnecting`/`Stubbed`, an inbound `WriteAttribute` (Asked by `DriverHostActor.HandleRouteNodeWrite` with an 8 s timeout) dead-letters → the Ask times out as a generic "write timeout". Fast-fail in those states with a clear reason.
|
||||
- `ResubscribeDesired()` (line 437-443) does `Self.Tell(new Subscribe(...))`; `HandleSubscribeAsync` replies `SubscriptionEstablished` to `replyTo = Sender = Self` (line 379) → no handler → the benign dead-letter `Message [SubscriptionEstablished] from drv-… to drv-… was unhandled` seen live on Galaxy. This is the **generic** actor, so a no-op handler quiets it for every driver.
|
||||
|
||||
**Step 1: Failing tests (new file)**
|
||||
|
||||
```csharp
|
||||
public sealed class DriverInstanceActorWriteAndSubscribeTests : TestKit
|
||||
{
|
||||
[Fact]
|
||||
public void WriteAttribute_while_connecting_fast_fails()
|
||||
{
|
||||
var actor = SpawnInConnectingState(); // see helper note below
|
||||
actor.Tell(new DriverInstanceActor.WriteAttribute("Tag.Attr", 1), TestActor);
|
||||
var r = ExpectMsg<DriverInstanceActor.WriteAttributeResult>(TimeSpan.FromSeconds(2));
|
||||
r.Success.ShouldBeFalse();
|
||||
r.Reason.ShouldNotBeNull();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void SubscriptionEstablished_is_swallowed_without_deadletter()
|
||||
{
|
||||
var actor = SpawnConnected();
|
||||
// subscribe with a desired set so ResubscribeDesired self-Tells Subscribe and the reply lands on Self
|
||||
// Subscribe to DeadLetters and assert none of type SubscriptionEstablished arrive.
|
||||
var deadLetters = CreateTestProbe();
|
||||
Sys.EventStream.Subscribe(deadLetters, typeof(DeadLetter));
|
||||
// drive a (re)subscribe ... then:
|
||||
deadLetters.ExpectNoMsg(TimeSpan.FromMilliseconds(500));
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Helper note: reuse the spawn/connect helpers the existing DriverInstanceActor tests use (look for them in the Runtime.Tests `Drivers/` folder; there is established setup for stub drivers and the Connected/Reconnecting transitions). To reach `Connecting`, send `InitializeRequested` with a driver-config that the stub keeps pending, or `Become`-drive via the suite's existing harness. If reaching exact internal states is awkward, the **minimum viable** assertions are: (a) a `WriteAttribute` to a Stubbed actor returns a negative `WriteAttributeResult` quickly, and (b) no `SubscriptionEstablished` dead-letter is produced during a connect+resubscribe cycle.
|
||||
|
||||
**Step 2: Run — expect FAIL**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverInstanceActorWriteAndSubscribe"
|
||||
```
|
||||
|
||||
**Step 3: Add write fast-fail to the three degraded behaviours**
|
||||
|
||||
Add this line to `Stubbed` (after line 208), `Connecting()` (after line 230), and `Reconnecting()` (after line 279):
|
||||
|
||||
```csharp
|
||||
Receive<WriteAttribute>(_ =>
|
||||
Sender.Tell(new WriteAttributeResult(false, "driver not connected")));
|
||||
```
|
||||
|
||||
(Stubbed may use a reason like `"driver stubbed"` — implementer's call; the host maps either into a `NodeWriteResult(false, …)`.)
|
||||
|
||||
**Step 4: Add the benign SubscriptionEstablished handler**
|
||||
|
||||
In `Connected()` (alongside the other `Receive`s, e.g. after line 262) add a swallow-with-trace so the self-reply from `ResubscribeDesired` doesn't dead-letter:
|
||||
|
||||
```csharp
|
||||
// ResubscribeDesired self-Tells Subscribe; HandleSubscribeAsync replies SubscriptionEstablished to the
|
||||
// sender, which on the self-resubscribe path is Self. Swallow it (trace only) so it doesn't dead-letter.
|
||||
Receive<SubscriptionEstablished>(msg =>
|
||||
_log.Debug("DriverInstance {Id}: subscription established ({Count} refs, {Diag})",
|
||||
_driverInstanceId, msg.ReferenceCount, msg.DiagnosticId));
|
||||
```
|
||||
|
||||
Add the same one-liner to `Reconnecting()` and `Connecting()` to cover the race where the async subscribe reply lands after a state flip (cheap, same file). Do NOT alter where/how `SubscriptionEstablished` is *sent* — a host that genuinely Asks `Subscribe` still receives its reply (the handler only runs when the message arrives at this actor as the asker).
|
||||
|
||||
**Step 5: Run — expect PASS, then build the project**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DriverInstanceActorWriteAndSubscribe"
|
||||
dotnet build src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ZB.MOM.WW.OtOpcUa.Runtime.csproj
|
||||
```
|
||||
|
||||
**Step 6: Commit (stage by path)**
|
||||
|
||||
```bash
|
||||
git add src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs \
|
||||
tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorWriteAndSubscribeTests.cs
|
||||
git commit -m "fix(runtime): fast-fail writes in degraded driver states + swallow self SubscriptionEstablished"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task T3: FOCAS — cache equipment-tag parsed addresses
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** T1, T2, T4, T5, T6
|
||||
|
||||
Addresses pending.md #4(c).
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriver.cs` (cache field 37-38; read path 275-280; write path 337-342; init seed 120)
|
||||
- Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/` (add to an existing FocasDriver read/write test class)
|
||||
|
||||
**Context:** `_parsedAddressesByTagName` (a plain `Dictionary`, line 37) is seeded only from `_options.Tags` at init (line 120). Equipment tags arrive via `_resolver` (line 66) as transient defs whose `Name` is the raw ref — never seeded — so both `ReadAsync` (275-280) and `WriteAsync` (337-342) hit the `TryGetValue` miss and **re-parse `FocasAddress.TryParse(def.Address)` every call without storing**. Fix: make the cache a `ConcurrentDictionary` (reads run on the poll loop, writes on the host Ask thread — concurrent access) and `GetOrAdd` so a transient def's address is parsed once.
|
||||
|
||||
**Step 1: Failing test — second resolve must reuse the cache**
|
||||
|
||||
Add an `internal` test seam to `FocasDriver` so the test can observe caching without a parser spy:
|
||||
|
||||
```csharp
|
||||
// In FocasDriver (internal, for tests via InternalsVisibleTo — confirm the attribute exists for the .Tests asm)
|
||||
internal bool IsParsedAddressCached(string reference) => _parsedAddressesByTagName.ContainsKey(reference);
|
||||
```
|
||||
|
||||
Test (equipment-tag ref, not an authored tag):
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public async Task Equipment_tag_address_is_parsed_once_and_cached()
|
||||
{
|
||||
var driver = /* construct FocasDriver with a fake client + one device, no authored Tags */;
|
||||
await driver.InitializeAsync(configJson, default);
|
||||
var equipRef = "{\"address\":\"R100\",\"dataType\":\"Int16\",\"deviceHostAddress\":\"focas://10.0.0.5\",\"writable\":true}"; // matches FocasEquipmentTagParser keys
|
||||
|
||||
driver.IsParsedAddressCached(equipRef).ShouldBeFalse(); // not seeded at init
|
||||
await driver.ReadAsync(new[] { equipRef }, default); // first resolve parses + caches
|
||||
driver.IsParsedAddressCached(equipRef).ShouldBeTrue(); // cached for subsequent hot calls
|
||||
}
|
||||
```
|
||||
|
||||
Use the exact equipment-ref JSON shape that `FocasEquipmentTagParser.TryParse` accepts (open that parser to copy its camelCase keys + required fields). If `InternalsVisibleTo` for the `.Tests` assembly isn't already present on the FOCAS driver project, add it (mirror how the other drivers expose internals to their test projects).
|
||||
|
||||
**Step 2: Run — expect FAIL (second assert false: never cached)**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests --filter "FullyQualifiedName~Equipment_tag_address_is_parsed_once"
|
||||
```
|
||||
|
||||
**Step 3: Implement the cache**
|
||||
|
||||
- Change the field (line 37-38) to:
|
||||
```csharp
|
||||
private readonly ConcurrentDictionary<string, FocasAddress> _parsedAddressesByTagName =
|
||||
new(StringComparer.OrdinalIgnoreCase);
|
||||
```
|
||||
(add `using System.Collections.Concurrent;` if absent).
|
||||
- Add a private helper:
|
||||
```csharp
|
||||
private FocasAddress ResolveParsedAddress(FocasTagDefinition def) =>
|
||||
_parsedAddressesByTagName.GetOrAdd(def.Name, _ =>
|
||||
FocasAddress.TryParse(def.Address)
|
||||
?? throw new InvalidOperationException(
|
||||
$"FOCAS tag '{def.Name}' has malformed Address '{def.Address}'."));
|
||||
```
|
||||
(The throwing factory propagates and does **not** cache a bad parse — preserving the existing fail behaviour. `GetOrAdd` may invoke the factory more than once under race but only stores one — fine, parsing is pure.)
|
||||
- Replace the read-path block (275-280) and the write-path block (337-342) with `var parsed = ResolveParsedAddress(def);`.
|
||||
- Line 120 (`_parsedAddressesByTagName[tag.Name] = parsed;`) compiles unchanged on `ConcurrentDictionary`; keep it (init pre-validates + seeds authored tags).
|
||||
|
||||
**Step 4: Run — expect PASS, build the project**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests --filter "FullyQualifiedName~Equipment_tag_address_is_parsed_once"
|
||||
dotnet build src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.csproj
|
||||
```
|
||||
|
||||
**Step 5: Commit (stage by path)**
|
||||
|
||||
```bash
|
||||
git add src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/
|
||||
git commit -m "perf(focas): cache equipment-tag parsed addresses (no per-call reparse)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task T4: Parity test — future-enum trap
|
||||
|
||||
**Classification:** trivial
|
||||
**Estimated implement time:** ~2 min
|
||||
**Parallelizable with:** T1, T2, T3, T5, T6
|
||||
|
||||
Addresses pending.md #4(e).
|
||||
|
||||
**Files:**
|
||||
- Test only: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DeploymentArtifactAliasParityTests.cs` (numeric-AccessLevel theory ~73-104)
|
||||
|
||||
**Context:** `TagAccessLevel` has `Read = 0`, `ReadWrite = 1` (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/TagAccessLevel.cs`). The parity test maps numeric AccessLevel → `Writable` with `[InlineData(1, true)]`/`[InlineData(0, false)]`. Add a future-enum guard so an as-yet-undefined value maps fail-safe to non-writable (only `ReadWrite==1` should yield a writable node).
|
||||
|
||||
**Step 1: Add the InlineData row**
|
||||
|
||||
In the numeric-AccessLevel `[Theory]` (the one with `[InlineData(1, true)]` / `[InlineData(0, false)]`), add:
|
||||
|
||||
```csharp
|
||||
[InlineData(2, false)] // future/unknown AccessLevel must map fail-safe to read-only (only ReadWrite⇒writable)
|
||||
```
|
||||
|
||||
**Step 2: Run — expect PASS (confirms the derivation is `== ReadWrite`, not `!= Read`)**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests --filter "FullyQualifiedName~DeploymentArtifactAliasParity"
|
||||
```
|
||||
|
||||
If it FAILS, the production derivation is using `!= Read` (so `2` wrongly maps to writable) — that is a real fail-safety bug; STOP and surface it (the fix would be a one-line `== TagAccessLevel.ReadWrite` in both Phase7Composer and DeploymentArtifact, but only with reviewer sign-off since it touches the byte-parity contract).
|
||||
|
||||
**Step 3: Commit (stage by path)**
|
||||
|
||||
```bash
|
||||
git add tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DeploymentArtifactAliasParityTests.cs
|
||||
git commit -m "test(runtime): future-enum trap for AccessLevel→Writable parity"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task T5: docs/security.md — data-plane GroupToRole requirement
|
||||
|
||||
**Classification:** trivial
|
||||
**Estimated implement time:** ~3 min
|
||||
**Parallelizable with:** T1, T2, T3, T4, T6
|
||||
|
||||
Addresses pending.md #3.
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/security.md` (Data-Plane Authorization section starts line 167; `GroupToRole` is described at line 163 in the LDAP section; troubleshooting at 313-315)
|
||||
|
||||
**Context:** OPC UA session roles are unioned from two sources: the DB `LdapGroupRoleMapping` (whose `Role` column is the **`AdminRole`** enum — Administrator/Designer/Viewer only) AND the appsettings `Security:Ldap:GroupToRole` (free-form `string→string`). The data-plane gates (`WriteOperate`, `WriteTune`, `WriteConfigure`, `AlarmAck`/`AlarmAcknowledge`, `ReadOnly`) read literal **role strings** that the AdminRole-typed DB mapping **cannot produce** — so a deployment MUST map its data-plane LDAP groups → data-plane role strings via `GroupToRole`, or write-through and OPC UA alarm-ack are silently inert (every write → `BadUserAccessDenied`, default-deny). Line 163 currently frames `GroupToRole` as mapping only to *Admin* roles — that's the misleading bit to correct.
|
||||
|
||||
**Step 1: Add a "Role grant source (data-plane)" subsection**
|
||||
|
||||
Under **## Data-Plane Authorization** (after the "Full model" subsection ~line 241, or right after the "Permission flags" mapping ~216), add a subsection. Suggested content (match the doc's voice; keep it tight):
|
||||
|
||||
```markdown
|
||||
### Role grant source (data-plane)
|
||||
|
||||
Data-plane roles come from `Security:Ldap:GroupToRole` (appsettings), **not** from the Config-DB
|
||||
`LdapGroupRoleMapping` table. That table's `Role` column is the `AdminRole` enum
|
||||
(`Administrator`/`Designer`/`Viewer`) and supplies **control-plane** roles only — it cannot emit the
|
||||
data-plane role strings the OPC UA gates read (`ReadOnly`, `WriteOperate`, `WriteTune`,
|
||||
`WriteConfigure`, `AlarmAcknowledge`). A deployment therefore **must** map its data-plane LDAP groups
|
||||
to those role strings via `GroupToRole`, e.g.:
|
||||
|
||||
```json
|
||||
"GroupToRole": {
|
||||
"ot-operators": "WriteOperate",
|
||||
"ot-tuners": "WriteTune",
|
||||
"ot-engineers": "WriteConfigure",
|
||||
"ot-alarm-ack": "AlarmAcknowledge",
|
||||
"ot-readonly": "ReadOnly"
|
||||
}
|
||||
```
|
||||
|
||||
If this mapping is absent the data-plane evaluator is strictly default-deny: inbound operator
|
||||
writes and OPC UA Part-9 alarm acknowledgement all return `BadUserAccessDenied` even for users who
|
||||
authenticate successfully. (The same requirement gates the pre-existing scripted-alarm ack path.)
|
||||
```
|
||||
|
||||
Also tighten line 163 (the `GroupToRole` description in the LDAP section) so it notes the values are control-plane **and** data-plane role strings, cross-referencing the new subsection — a one-clause edit, not a rewrite.
|
||||
|
||||
**Step 2: Verify the doc renders / no broken anchors, then commit**
|
||||
|
||||
```bash
|
||||
git add docs/security.md
|
||||
git commit -m "docs(security): document the GroupToRole data-plane role requirement"
|
||||
```
|
||||
|
||||
(No test; verification = re-read the section for accuracy against the permission-flag names at lines 186-216.)
|
||||
|
||||
---
|
||||
|
||||
### Task T6: Galaxy writer — invalidate handle caches on reconnect
|
||||
|
||||
**Classification:** small
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** T1, T2, T3, T4, T5
|
||||
|
||||
Addresses pending.md #6(b).
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/IGalaxyDataWriter.cs` (interface — add method)
|
||||
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs` (caches 28-32; implement clear)
|
||||
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/TracedGalaxyDataWriter.cs` (decorator — delegate)
|
||||
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (`ReopenAsync` 291-296; `_dataWriter` set at 255-257)
|
||||
- Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/` (GatewayGalaxyDataWriter test class)
|
||||
|
||||
**Context:** `GatewayGalaxyDataWriter` caches `_itemHandles` (ref→gw item handle, line 28) and `_supervisedHandles` (handle→advised, line 32). `_dataWriter` is built **once** in `BuildProductionRuntimeAsync` (255-257) and reused across reconnects. On reconnect, `ReopenAsync` (291-296) calls `_ownedMxSession.RecreateAsync` — which disposes + rebuilds the session, so all prior gw item handles are **dead**. The writer's caches still hold the stale handles, so a write right after a reconnect AddItem-skips and uses a dead handle. Add a cache-invalidation seam called on reopen.
|
||||
|
||||
**Step 1: Failing test — invalidation forces a fresh AddItem on the next write**
|
||||
|
||||
In the existing `GatewayGalaxyDataWriter` test class (it already constructs the writer over a fake/owned session — reuse that harness), add:
|
||||
|
||||
```csharp
|
||||
[Fact]
|
||||
public async Task InvalidateHandleCaches_forces_AddItem_again_on_next_write()
|
||||
{
|
||||
var (writer, fakeSession) = BuildWriterWithFakeSession(); // reuse existing harness
|
||||
await writer.WriteAsync(new[] { new WriteRequest("Tag.Attr", 1) }, _ => SecurityClassification.Operate, default);
|
||||
fakeSession.AddItemCallCount.ShouldBe(1);
|
||||
|
||||
writer.InvalidateHandleCaches();
|
||||
|
||||
await writer.WriteAsync(new[] { new WriteRequest("Tag.Attr", 2) }, _ => SecurityClassification.Operate, default);
|
||||
fakeSession.AddItemCallCount.ShouldBe(2); // re-AddItem'd because the cache was cleared
|
||||
}
|
||||
```
|
||||
|
||||
If the existing fake session doesn't count `AddItem`, either extend it with a counter or assert via an internal seam on the writer (`internal int CachedItemHandleCount => _itemHandles.Count;`): assert `>0` after a write, `0` after `InvalidateHandleCaches()`. Pick whichever the existing harness makes cleanest.
|
||||
|
||||
**Step 2: Run — expect FAIL (no such method)**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter "FullyQualifiedName~InvalidateHandleCaches"
|
||||
```
|
||||
|
||||
**Step 3: Add the method to the interface + impl + decorator**
|
||||
|
||||
- `IGalaxyDataWriter`: add
|
||||
```csharp
|
||||
/// <summary>Drop cached gw item handles + supervisory-advise state. Call after a session
|
||||
/// reconnect — the old handles are dead, so the next write must re-AddItem + re-advise.</summary>
|
||||
void InvalidateHandleCaches();
|
||||
```
|
||||
- `GatewayGalaxyDataWriter`:
|
||||
```csharp
|
||||
public void InvalidateHandleCaches()
|
||||
{
|
||||
_itemHandles.Clear();
|
||||
_supervisedHandles.Clear();
|
||||
}
|
||||
```
|
||||
- `TracedGalaxyDataWriter` (decorator): delegate — `public void InvalidateHandleCaches() => _inner.InvalidateHandleCaches();` (match the field name the decorator uses for its inner writer).
|
||||
|
||||
**Step 4: Call it on reopen**
|
||||
|
||||
In `GalaxyDriver.ReopenAsync` (291-296), after `RecreateAsync` succeeds:
|
||||
|
||||
```csharp
|
||||
private async Task ReopenAsync(CancellationToken cancellationToken)
|
||||
{
|
||||
if (_ownedMxSession is null) return;
|
||||
var clientOptions = BuildClientOptions(_options.Gateway);
|
||||
await _ownedMxSession.RecreateAsync(clientOptions, cancellationToken).ConfigureAwait(false);
|
||||
// The recreated session invalidates every prior gw item handle; drop the writer's handle/advise
|
||||
// caches so the next write re-AddItems + re-AdviseSupervisory against the fresh session.
|
||||
_dataWriter?.InvalidateHandleCaches();
|
||||
}
|
||||
```
|
||||
|
||||
(`_dataWriter` is the `TracedGalaxyDataWriter`-wrapped field set at 255-257; the call flows through to the inner `GatewayGalaxyDataWriter`.)
|
||||
|
||||
**Step 5: Run — expect PASS, run the full Galaxy suite, build**
|
||||
|
||||
```bash
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter "FullyQualifiedName~InvalidateHandleCaches"
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests
|
||||
```
|
||||
|
||||
**Step 6: Commit (stage by path)**
|
||||
|
||||
```bash
|
||||
git add src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/
|
||||
git commit -m "fix(galaxy): invalidate writer handle caches on session reconnect"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Final verification (after T1–T6)
|
||||
|
||||
```bash
|
||||
dotnet build ZB.MOM.WW.OtOpcUa.slnx # 0 errors (TreatWarningsAsErrors)
|
||||
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests
|
||||
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests
|
||||
```
|
||||
|
||||
These are all unit-level changes (no live docker-dev `/run` gate needed — none touch Razor/JS or the wire protocol behaviourally; the fast-fail paths and cache changes are exercised by Akka.TestKit + unit tests). Then finish via **superpowers-extended-cc:finishing-a-development-branch** (intent: merge-to-master + push). After merge, update `pending.md` to mark #3, #4, #6 resolved.
|
||||
|
||||
## Out of scope (left in pending.md)
|
||||
|
||||
- #4 sub-item: FOCAS — covered (T3). The "surface real device-write status" item (#5) is a **separate** chosen-later effort, not part of this cluster.
|
||||
- #7 driver-reconfigure-while-faulted — its own high-risk actor-state-machine design/plan.
|
||||
- #1/#2 Phase B/C; #8 rig cleanups — unchanged.
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-06-14-harden-milestone-1b-plan.md",
|
||||
"tasks": [
|
||||
{"id": "354", "subject": "T0: Create feature branch chore/harden-milestone-1b", "classification": "trivial", "status": "completed"},
|
||||
{"id": "355", "subject": "T1: DriverHostActor Stale write fast-fail + micro-opts + raw-blob routing test", "classification": "standard", "status": "completed", "blockedBy": ["354"], "commits": ["4cda275b", "99eea0b4"]},
|
||||
{"id": "356", "subject": "T2: DriverInstanceActor degraded-state write fast-fail + benign SubscriptionEstablished handler", "classification": "standard", "status": "completed", "blockedBy": ["354"], "commits": ["42b4a923", "7e405e94"]},
|
||||
{"id": "357", "subject": "T3: FOCAS cache equipment-tag parsed addresses", "classification": "small", "status": "completed", "blockedBy": ["354"], "commits": ["46f559f5"]},
|
||||
{"id": "358", "subject": "T4: Parity test future-enum trap InlineData(2,false)", "classification": "trivial", "status": "completed", "blockedBy": ["354"], "commits": ["97d82dda"]},
|
||||
{"id": "359", "subject": "T5: docs/security.md data-plane GroupToRole requirement note", "classification": "trivial", "status": "completed", "blockedBy": ["354"], "commits": ["190ef34e"]},
|
||||
{"id": "360", "subject": "T6: Galaxy writer invalidate handle caches on reconnect", "classification": "small", "status": "completed", "blockedBy": ["354"], "commits": ["f77488ee"]}
|
||||
],
|
||||
"lastUpdated": "2026-06-14"
|
||||
}
|
||||
Reference in New Issue
Block a user