docs(plan): Galaxy writer handle-sharing implementation plan + tasks

Joseph Doherty
2026-06-18 04:13:51 -04:00
parent c85c4e5cd0
commit 490c6b7498
2 changed files with 241 additions and 0 deletions
@@ -0,0 +1,224 @@
# Galaxy writer ⇄ subscription-registry item-handle sharing — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.
**Goal:** Let `GatewayGalaxyDataWriter` borrow a live MXAccess item handle from the
`SubscriptionRegistry` so the first write to an already-subscribed tag skips a redundant `AddItem`
round-trip, without introducing any stale-handle regression.
**Architecture:** Delegate seam (symmetric with the writer's existing `securityResolver`). The
registry exposes a liveness-guarded `int? TryResolveItemHandle(fullRef)`; the writer takes an
optional `Func<string,int?>? subscribedHandleSource` and consults it (never caching borrowed
handles); `GalaxyDriver` wires the two together. Proven by unit tests for the resolution logic and
a live-gateway smoke test that asserts a subscribed-tag write commits with **zero** `AddItem` calls.
**Tech Stack:** C# / .NET 10, xUnit + Shouldly, MXAccess gateway (`ZB.MOM.WW.MxGateway.*`).
**Design:** `docs/plans/2026-06-18-galaxy-writer-handle-sharing-design.md` (committed `c85c4e5c`).
**Branch:** `feat/galaxy-writer-handle-sharing` (off master `70e1bde9`).
**Hard rules:** stage by explicit path (never `git add .`); never stage `sql_login.txt`,
`src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`, `pending.md`, `stillpending.md`,
`docker-dev/docker-compose.yml`; never echo/commit the gateway API key or any secret; no
force-push; no `--no-verify`; **NO EF migration / Commons / proto change; NO bUnit.** Use
`dangerouslyDisableSandbox: true` for all build/test/rig commands. Finish = merge to master + push.
**Dependency graph:** `{T1 ∥ T2} → T3 → T4`. T1 (`SubscriptionRegistry.cs`) and T2
(`GatewayGalaxyDataWriter.cs`) touch disjoint files — dispatch their implementers concurrently.
---
### Task 1: SubscriptionRegistry forward `fullRef → handle` lookup
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 2
**Files:**
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/SubscriptionRegistry.cs`
- Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/SubscriptionRegistryHandleResolveTests.cs` (create)
**What:**
1. Add a forward index field next to the existing dictionaries:
```csharp
private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
new(StringComparer.OrdinalIgnoreCase);
```
2. Maintain it incrementally (only for `binding.ItemHandle > 0`):
- In `Register` and in the **add** loop of `Rebind`: `_itemHandleByFullRef[binding.FullReference] = binding.ItemHandle;`
- In `Remove` and in the **drop** loop of `Rebind`: best-effort drop. When the handle's
reverse-map set becomes empty (the existing `remaining.IsEmpty` branch), also remove the
forward entry, but **only if** it still maps to this binding's handle, so a concurrent re-add
for the same ref isn't clobbered. Prefer the atomic
`_itemHandleByFullRef.TryRemove(new KeyValuePair<string, int>(binding.FullReference, binding.ItemHandle))`
overload over a separate `TryGetValue` + equality check followed by `TryRemove(key, out _)`,
which leaves a small check-then-remove window.
3. Add the public lookup with the liveness guard (the guard is what makes the best-effort removal
safe — a lingering entry can never resolve to a dead handle):
```csharp
/// <summary>
/// Resolve the live MXAccess item handle a current subscription holds for <paramref name="fullRef"/>,
/// or null when no live subscription covers it. The writer borrows this handle to skip a
/// redundant AddItem. Guarded by the authoritative live-handle set (<c>_subscribersByItemHandle</c>)
/// so a stale forward-map entry can never hand out a dead handle.
/// </summary>
public int? TryResolveItemHandle(string fullRef)
{
if (fullRef is null) return null;
if (_itemHandleByFullRef.TryGetValue(fullRef, out var handle)
&& _subscribersByItemHandle.ContainsKey(handle))
return handle;
return null;
}
```
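The guarded drop described in step 2 can be done in one atomic call. A minimal sketch, assuming .NET 5 or later: `ConcurrentDictionary<TKey,TValue>.TryRemove(KeyValuePair<TKey,TValue>)` removes an entry only when both key and value still match. The `ForwardIndexSketch` class and its `Add` / `DropIfStillMapped` / `Lookup` members are hypothetical stand-ins for illustration, not the real `SubscriptionRegistry` members, and the reverse map is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the forward index only; the real registry also keeps the reverse map.
public sealed class ForwardIndexSketch
{
    private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
        new(StringComparer.OrdinalIgnoreCase);

    // Mirrors the Register / Rebind-add rule: only positive handles are indexed.
    public void Add(string fullRef, int itemHandle)
    {
        if (itemHandle > 0) _itemHandleByFullRef[fullRef] = itemHandle;
    }

    // Removes the entry only if it still maps to staleHandle (atomic compare-and-remove),
    // so a concurrent re-add with a newer handle for the same ref is left in place.
    public bool DropIfStillMapped(string fullRef, int staleHandle) =>
        _itemHandleByFullRef.TryRemove(new KeyValuePair<string, int>(fullRef, staleHandle));

    // Plain forward lookup (no liveness guard in this sketch).
    public int? Lookup(string fullRef) =>
        _itemHandleByFullRef.TryGetValue(fullRef, out var handle) ? handle : (int?)null;
}
```

For example, after `Add("Tag.A", 7)` and a re-add `Add("Tag.A", 9)`, a late `DropIfStillMapped("Tag.A", 7)` returns `false` and `Lookup("tag.a")` still yields `9`.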
**Tests** (xUnit + Shouldly; the registry needs no gateway — construct it and `Register` fake `TagBinding`s):
- `Register` then `TryResolveItemHandle("Tag.A")` → the registered handle; case-insensitive (`"tag.a"` resolves).
- A fullRef never registered → `null`.
- `Remove` the subscription → `TryResolveItemHandle` → `null`.
- `Rebind` with a new handle for the same ref → resolves the **new** handle (not the old).
- Bindings with `ItemHandle <= 0` are not resolvable (`null`).
- Liveness guard: a ref whose handle is no longer in the reverse map resolves `null` (reachable
via `Remove` of the only subscriber).
**Steps:** write the failing tests → run `dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter FullyQualifiedName~SubscriptionRegistryHandleResolve` (FAIL) → implement → re-run (PASS) → `git add` the two paths + commit `feat(galaxy): SubscriptionRegistry.TryResolveItemHandle forward lookup`.
---
### Task 2: Writer borrow seam + `AddItemCallCount`
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1
**Files:**
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs`
- Test: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyDataWriterTests.cs` (extend)
**What:**
1. Add an optional **last** ctor param (keeps every existing call site compiling — current calls
are `new GatewayGalaxyDataWriter(session, writeUserId, logger?)`):
```csharp
public GatewayGalaxyDataWriter(
GalaxyMxSession session, int writeUserId, ILogger? logger = null,
Func<string, int?>? subscribedHandleSource = null)
```
Store it in a `private readonly Func<string, int?>? _subscribedHandleSource;` field.
2. Add an `AddItemCallCount` seam (proves "AddItem was skipped" live):
```csharp
private int _addItemCallCount;
internal int AddItemCallCount => Volatile.Read(ref _addItemCallCount);
```
3. Extract the cached-or-borrowed decision into a synchronous internal seam (unit-testable
**without a live session** — the SDK session is sealed/unfakeable):
```csharp
/// <summary>
/// Resolve an item handle WITHOUT touching the gateway: a prior writer-AddItem'd handle
/// (_itemHandles), else a live subscription handle borrowed from the registry. Returns null
/// when neither is available (caller must AddItem). A borrowed handle is intentionally NOT
/// cached in _itemHandles — the registry owns its lifecycle (incl. reconnect Rebind), so the
/// next write re-borrows the fresh handle and no stale-cache window is introduced.
/// </summary>
internal int? TryResolveCachedOrBorrowed(string fullRef)
{
if (_itemHandles.TryGetValue(fullRef, out var existing)) return existing;
if (_subscribedHandleSource?.Invoke(fullRef) is int borrowed && borrowed > 0) return borrowed;
return null;
}
```
4. Rewire `EnsureItemHandleAsync` to use the seam, call `AddItem` only on a null result, and count each call:
```csharp
private async Task<int> EnsureItemHandleAsync(
MxGatewaySession session, int serverHandle, string fullRef, CancellationToken ct)
{
if (TryResolveCachedOrBorrowed(fullRef) is int resolved) return resolved;
var handle = await session.AddItemAsync(serverHandle, fullRef, ct).ConfigureAwait(false);
Interlocked.Increment(ref _addItemCallCount);
_itemHandles[fullRef] = handle;
return handle;
}
```
(Keep the class-summary remark honest: note the cache may now be augmented by borrowed
subscription handles that are consulted but not stored.)
**Tests** (extend `GatewayGalaxyDataWriterTests`, same no-live-session pattern):
- `subscribedHandleSource` returns a handle for a ref → `TryResolveCachedOrBorrowed` returns it,
and `CachedItemHandleCount` stays `0` (borrow is not cached) and `AddItemCallCount` stays `0`.
- A `_itemHandles` hit (via `SeedHandleCachesForTest`) wins over the source.
- `null` source → `TryResolveCachedOrBorrowed` returns `null` (today's behavior; would AddItem).
- Source returns `0`/negative (failed subscribe) → treated as no-borrow (`null`).
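The four cases above can be exercised against a small stand-in that mirrors only the resolution order. This is a sketch under assumptions: `ResolveOrderSketch`, `SeedCached`, and `CachedCount` are hypothetical names, the dictionary's default comparer is a guess, and the real writer keeps this logic in `TryResolveCachedOrBorrowed` alongside its session-bound state.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in: same lookup order as the writer seam, with no gateway session.
public sealed class ResolveOrderSketch
{
    private readonly ConcurrentDictionary<string, int> _itemHandles = new();
    private readonly Func<string, int?>? _subscribedHandleSource;

    public ResolveOrderSketch(Func<string, int?>? subscribedHandleSource = null) =>
        _subscribedHandleSource = subscribedHandleSource;

    public int CachedCount => _itemHandles.Count;

    // Test-only seeding, standing in for a handle obtained from an earlier AddItem.
    public void SeedCached(string fullRef, int handle) => _itemHandles[fullRef] = handle;

    public int? TryResolveCachedOrBorrowed(string fullRef)
    {
        // 1. A handle the writer itself added wins.
        if (_itemHandles.TryGetValue(fullRef, out var existing)) return existing;
        // 2. Otherwise borrow from the source; non-positive values count as "no handle".
        //    The borrowed value is returned but deliberately not stored.
        if (_subscribedHandleSource?.Invoke(fullRef) is int borrowed && borrowed > 0) return borrowed;
        // 3. Nothing available: the caller would have to AddItem.
        return null;
    }
}
```

With a source of `_ => 42`, `TryResolveCachedOrBorrowed("Tag.A")` returns `42` while `CachedCount` stays `0`; with `_ => 0` or no source at all it returns `null`.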
**Steps:** failing tests → run filter `FullyQualifiedName~GatewayGalaxyDataWriter` (FAIL) → implement → re-run (PASS) → `git add` the two paths + commit `feat(galaxy): writer borrows live subscription item handles (skip redundant AddItem)`.
---
### Task 3: Wire the registry into the production writer in `GalaxyDriver`
**Classification:** small
**Estimated implement time:** ~2 min
**Parallelizable with:** none (blocked by Task 1 + Task 2)
**Files:**
- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (the writer construction at about lines `255-256`)
**What:** pass the registry resolver as the new 4th writer arg:
```csharp
_dataWriter = new TracedGalaxyDataWriter(
new GatewayGalaxyDataWriter(
_ownedMxSession, _options.MxAccess.WriteUserId, _logger,
_subscriptions.TryResolveItemHandle),
...);
```
`_subscriptions` (line 72) is already a live field at this point. No other change. (The reconnect
path already invalidates the writer cache at `GalaxyDriver.cs:298`; borrowed handles aren't cached,
and the registry is `Rebind`'d in `ReplayAsync`, so the writer naturally re-borrows fresh handles
post-reconnect — nothing to add here.)
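To illustrate why nothing extra is needed on reconnect: a method-group delegate is evaluated against the registry on every call, so a consumer that never stores the result sees the rebound handle on its next lookup. A minimal sketch with a hypothetical `FakeRegistry` (its `Set` method stands in for `Register` / `Rebind`, and the liveness guard is omitted); the handle values are made up.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical registry stand-in; not the real SubscriptionRegistry.
public sealed class FakeRegistry
{
    private readonly ConcurrentDictionary<string, int> _handles =
        new(StringComparer.OrdinalIgnoreCase);

    public void Set(string fullRef, int handle) => _handles[fullRef] = handle;

    public int? TryResolveItemHandle(string fullRef) =>
        _handles.TryGetValue(fullRef, out var h) ? h : (int?)null;
}

public static class RebindDemo
{
    // Returns the handle seen before and after a simulated reconnect rebind.
    public static (int? Before, int? After) Run()
    {
        var registry = new FakeRegistry();
        // Same delegate shape as the writer's optional 4th ctor argument.
        Func<string, int?> source = registry.TryResolveItemHandle;

        registry.Set("Tag.A", 11);
        var before = source("Tag.A");

        registry.Set("Tag.A", 27); // simulated: the gateway issued a new handle after reconnect
        var after = source("Tag.A");

        return (before, after);
    }
}
```

`RebindDemo.Run()` yields `(11, 27)`: the second lookup picks up the new handle because nothing on the consumer side held on to the first one.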
**Steps:** edit → `dotnet build` the Galaxy driver project (0 errors) → run the full
`Driver.Galaxy.Tests` suite (all green, confirms T1+T2+T3 integrate) → `git add` the path + commit
`feat(galaxy): wire SubscriptionRegistry handle resolver into the production writer`.
---
### Task 4: Live-gw borrow smoke + verification + finish
**Classification:** small
**Estimated implement time:** ~5 min (+ live run)
**Parallelizable with:** none (blocked by Task 3)
**Files:**
- Modify: `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyLiveReopenAndWriteTests.cs`
- Modify: `docs/Galaxy.Performance.md` **or** the relevant Galaxy doc — a short note that the writer
borrows subscription handles (only if a natural home exists; otherwise skip docs — no overclaim).
**What — add one skip-gated `[Fact]`** reusing the harness's `RequireLiveGatewayOrSkip` /
`BuildClientOptions` and the dedicated writable `WriteRef = "TestMachine_002.TestFloat"`:
1. Open a session + `GatewayGalaxySubscriber`; `SubscribeBulkAsync([WriteRef], …)` and capture the
**real** item handle the gateway returned for `WriteRef` (from the `SubscribeResult` — check how
`GalaxyDriver` maps subscribe results into `TagBinding`s to extract the handle).
2. Build a `SubscriptionRegistry`, `Register` a binding `(WriteRef, thatHandle)`.
3. Construct the writer **with** `registry.TryResolveItemHandle` as the source; write `WriteRef`.
Assert: status Good (`0u`) **and `writer.AddItemCallCount == 0`** (the borrow skipped AddItem).
4. **Control** (same safe tag): a writer with the **same session but an empty registry** (or `null`
source) writes `WriteRef` → Good **and `AddItemCallCount == 1`** (no borrow ⇒ AddItem happened).
Both write only `TestMachine_002.TestFloat`.
**Verify:**
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — 0 errors.
- `dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests` — all green (live test skips
without env vars).
- **Live run (the gate)** — source the key without echoing and run only the new + existing live smokes:
```bash
KEY=$(docker exec otopcua-dev-central-1-1 printenv GALAXY_MXGW_API_KEY)
MXGW_ENDPOINT=http://10.100.0.48:5120 GALAXY_MXGW_API_KEY="$KEY" \
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests \
--filter FullyQualifiedName~GatewayGalaxyLiveReopenAndWrite
```
Expect: the existing 2 + the new borrow smoke pass (borrow write Good with `AddItemCallCount==0`,
control with `==1`). If the borrowed handle does NOT commit, the premise is false — **stop, do not
merge**, report.
**Finish:** commit the test (+ optional doc) by explicit path; then
superpowers-extended-cc:finishing-a-development-branch → merge to master + push; delete the branch;
update `project_stillpending_backlog.md` + `MEMORY.md` (mark §2.4 shipped).
**Steps:** add the live `[Fact]` → build → unit suite green → live run PASS → commit → merge + push.
@@ -0,0 +1,17 @@
{
"planPath": "docs/plans/2026-06-18-galaxy-writer-handle-sharing.md",
"designPath": "docs/plans/2026-06-18-galaxy-writer-handle-sharing-design.md",
"designCommit": "c85c4e5c",
"baseMaster": "70e1bde9",
"branch": "feat/galaxy-writer-handle-sharing",
"scope": "Galaxy writer borrows live MXAccess item handles from the SubscriptionRegistry so the first write to an already-subscribed tag skips a redundant AddItem round-trip. T1: SubscriptionRegistry.TryResolveItemHandle forward fullRef->handle lookup with a liveness guard. T2: writer optional Func<string,int?> subscribedHandleSource + TryResolveCachedOrBorrowed seam (borrowed handles NOT cached) + AddItemCallCount seam. T3: GalaxyDriver wires _subscriptions.TryResolveItemHandle into the production writer ctor. T4: skip-gated live-gw smoke proving a subscribed-tag write commits with AddItemCallCount==0 (control: ==1). NO Commons/proto/EF/migration change; NO bUnit; AdminUI untouched.",
"dependencyGraph": "{T1 ∥ T2} -> T3 -> T4",
"tasks": [
{"id": 1, "subject": "SubscriptionRegistry.TryResolveItemHandle forward lookup + tests", "classification": "small", "status": "pending", "parallelizableWith": [2], "files": ["src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/SubscriptionRegistry.cs", "tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/SubscriptionRegistryHandleResolveTests.cs"]},
{"id": 2, "subject": "Writer borrow seam + AddItemCallCount + tests", "classification": "small", "status": "pending", "parallelizableWith": [1], "files": ["src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs", "tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyDataWriterTests.cs"]},
{"id": 3, "subject": "Wire registry resolver into the production writer in GalaxyDriver", "classification": "small", "status": "pending", "blockedBy": [1, 2], "files": ["src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs"]},
{"id": 4, "subject": "Live-gw borrow smoke + build/test + live run + finish", "classification": "small", "status": "pending", "blockedBy": [3], "files": ["tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyLiveReopenAndWriteTests.cs"]}
],
"executionState": "PENDING — subagent-driven execution, this session.",
"lastUpdated": "2026-06-18"
}