lmxopcua/docs/plans/2026-06-18-galaxy-writer-handle-sharing.md

Galaxy writer ⇄ subscription-registry item-handle sharing — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.

Goal: Let GatewayGalaxyDataWriter borrow a live MXAccess item handle from the SubscriptionRegistry so the first write to an already-subscribed tag skips a redundant AddItem round-trip, without introducing any stale-handle regression.

Architecture: Delegate seam (symmetric with the writer's existing securityResolver). The registry exposes a liveness-guarded int? TryResolveItemHandle(fullRef); the writer takes an optional Func<string,int?>? subscribedHandleSource and consults it (never caching borrowed handles); GalaxyDriver wires the two together. Proven by unit tests for the resolution logic and a live-gw smoke that asserts a subscribed-tag write commits with zero AddItem calls.

Tech Stack: C# / .NET 10, xUnit + Shouldly, MXAccess gateway (ZB.MOM.WW.MxGateway.*).

Design: docs/plans/2026-06-18-galaxy-writer-handle-sharing-design.md (committed c85c4e5c). Branch: feat/galaxy-writer-handle-sharing (off master 70e1bde9).

Hard rules: stage by explicit path (never git add .); never stage sql_login.txt, src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/, pending.md, stillpending.md, docker-dev/docker-compose.yml; never echo/commit the gateway API key or any secret; no force-push; no --no-verify; NO EF migration / Commons / proto change; NO bUnit. Use dangerouslyDisableSandbox: true for all build/test/rig commands. Finish = merge to master + push.

Dependency graph: {T1 ∥ T2} → T3 → T4. T1 (SubscriptionRegistry.cs) and T2 (GatewayGalaxyDataWriter.cs) touch disjoint files — dispatch their implementers concurrently.


Task 1: SubscriptionRegistry forward fullRef → handle lookup

Classification: small; Estimated implement time: ~4 min; Parallelizable with: Task 2

Files:

  • Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/SubscriptionRegistry.cs
  • Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/SubscriptionRegistryHandleResolveTests.cs (create)

What:

  1. Add a forward index field next to the existing dictionaries:
    private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
        new(StringComparer.OrdinalIgnoreCase);
    
  2. Maintain it incrementally (only for binding.ItemHandle > 0):
    • In Register and in the add loop of Rebind: _itemHandleByFullRef[binding.FullReference] = binding.ItemHandle;
    • In Remove and in the drop loop of Rebind: best-effort drop. When the handle's reverse-map set becomes empty (the existing remaining.IsEmpty branch), also remove the forward entry, but only if it still maps to this binding's handle, so a concurrent re-add of the same ref under a new handle isn't clobbered. Use the atomic compare-and-remove overload _itemHandleByFullRef.TryRemove(new KeyValuePair<string, int>(binding.FullReference, binding.ItemHandle)) rather than a separate TryGetValue + equality check, which leaves a window between the check and the remove.
  3. Add the public lookup with the liveness guard (the guard is what makes the best-effort removal safe — a lingering entry can never resolve to a dead handle):
    /// <summary>
    ///     Resolve the live MXAccess item handle a current subscription holds for <paramref name="fullRef"/>,
    ///     or null when no live subscription covers it. The writer borrows this handle to skip a
    ///     redundant AddItem. Guarded by the authoritative live-handle set (<c>_subscribersByItemHandle</c>)
    ///     so a stale forward-map entry can never hand out a dead handle.
    /// </summary>
    public int? TryResolveItemHandle(string fullRef)
    {
        if (fullRef is null) return null;
        if (_itemHandleByFullRef.TryGetValue(fullRef, out var handle)
            && _subscribersByItemHandle.ContainsKey(handle))
            return handle;
        return null;
    }
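
A minimal, self-contained sketch of the forward index and liveness guard from steps 1 to 3 follows. It is not the real SubscriptionRegistry: TagBinding is reduced to the two fields this plan uses, and the subscription-id type (long), the reverse-map shape (handle to a set of subscriber ids) and the writer lock are assumptions made to keep the sketch small. It shows why the removal must be conditional even without concurrency: if the same ref is re-registered under a new handle and the old binding is removed afterwards, an unconditional TryRemove would drop the live entry.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the driver's TagBinding: only the fields this plan touches.
public sealed record TagBinding(string FullReference, int ItemHandle);

// Sketch only: models the plan's forward index + liveness guard, not the real registry.
public sealed class HandleIndexSketch
{
    // Assumption: writers are serialized by a lock; reads (TryResolveItemHandle) stay lock-free.
    private readonly object _gate = new object();

    // Assumed reverse-map shape: live item handle -> subscriber ids holding it.
    private readonly ConcurrentDictionary<int, HashSet<long>> _subscribersByItemHandle = new();

    // Forward index: fullRef -> item handle, case-insensitive like Galaxy references.
    private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
        new(StringComparer.OrdinalIgnoreCase);

    public void Register(long subscriptionId, TagBinding binding)
    {
        if (binding.ItemHandle <= 0) return; // failed subscribe: never indexed, never resolvable
        lock (_gate)
        {
            _subscribersByItemHandle.GetOrAdd(binding.ItemHandle, _ => new HashSet<long>())
                .Add(subscriptionId);
            _itemHandleByFullRef[binding.FullReference] = binding.ItemHandle;
        }
    }

    public void Remove(long subscriptionId, TagBinding binding)
    {
        lock (_gate)
        {
            if (!_subscribersByItemHandle.TryGetValue(binding.ItemHandle, out var remaining)) return;
            remaining.Remove(subscriptionId);
            if (remaining.Count > 0) return; // handle is still live for another subscriber
            _subscribersByItemHandle.TryRemove(binding.ItemHandle, out _);
            // Compare-and-remove: drops the forward entry only if it still maps to THIS handle.
            _itemHandleByFullRef.TryRemove(
                new KeyValuePair<string, int>(binding.FullReference, binding.ItemHandle));
        }
    }

    public int? TryResolveItemHandle(string? fullRef)
    {
        if (fullRef is null) return null;
        // Liveness guard: a lingering forward entry can never resolve to a dead handle.
        if (_itemHandleByFullRef.TryGetValue(fullRef, out var handle)
            && _subscribersByItemHandle.ContainsKey(handle))
            return handle;
        return null;
    }
}
```

The same scenarios map one-to-one onto the Task 1 test bullets below.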
    

Tests (xUnit + Shouldly; the registry needs no gateway — construct it and Register fake TagBindings):

  • Register then TryResolveItemHandle("Tag.A") → the registered handle; case-insensitive ("tag.a" resolves).
  • A fullRef never registered → null.
  • Remove the subscription → TryResolveItemHandle → null.
  • Rebind with a new handle for the same ref → resolves the new handle (not the old).
  • Bindings with ItemHandle <= 0 are not resolvable (null).
  • Liveness guard: a ref whose handle is no longer in the reverse map resolves null (reachable via Remove of the only subscriber).

Steps: write the failing tests → run dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests --filter FullyQualifiedName~SubscriptionRegistryHandleResolve (FAIL) → implement → re-run (PASS) → git add the two paths + commit feat(galaxy): SubscriptionRegistry.TryResolveItemHandle forward lookup.


Task 2: Writer borrow seam + AddItemCallCount

Classification: small; Estimated implement time: ~4 min; Parallelizable with: Task 1

Files:

  • Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs
  • Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyDataWriterTests.cs (extend)

What:

  1. Add an optional last ctor param (keeps every existing call site compiling — current calls are new GatewayGalaxyDataWriter(session, writeUserId, logger?)):
    public GatewayGalaxyDataWriter(
        GalaxyMxSession session, int writeUserId, ILogger? logger = null,
        Func<string, int?>? subscribedHandleSource = null)
    
    Store it in a private readonly Func<string, int?>? _subscribedHandleSource; field.
  2. Add an AddItemCallCount seam (proves "AddItem was skipped" live):
    private int _addItemCallCount;
    internal int AddItemCallCount => Volatile.Read(ref _addItemCallCount);
    
  3. Extract the cached-or-borrowed decision into a synchronous internal seam (unit-testable without a live session — the SDK session is sealed/unfakeable):
    /// <summary>
    ///     Resolve an item handle WITHOUT touching the gateway: a prior writer-AddItem'd handle
    ///     (_itemHandles), else a live subscription handle borrowed from the registry. Returns null
    ///     when neither is available (caller must AddItem). A borrowed handle is intentionally NOT
    ///     cached in _itemHandles — the registry owns its lifecycle (incl. reconnect Rebind), so the
    ///     next write re-borrows the fresh handle and no stale-cache window is introduced.
    /// </summary>
    internal int? TryResolveCachedOrBorrowed(string fullRef)
    {
        if (_itemHandles.TryGetValue(fullRef, out var existing)) return existing;
        if (_subscribedHandleSource?.Invoke(fullRef) is int borrowed && borrowed > 0) return borrowed;
        return null;
    }
    
  4. Rewire EnsureItemHandleAsync to use the seam, AddItem only on a null result, and count it:
    private async Task<int> EnsureItemHandleAsync(
        MxGatewaySession session, int serverHandle, string fullRef, CancellationToken ct)
    {
        if (TryResolveCachedOrBorrowed(fullRef) is int resolved) return resolved;
        var handle = await session.AddItemAsync(serverHandle, fullRef, ct).ConfigureAwait(false);
        Interlocked.Increment(ref _addItemCallCount);
        _itemHandles[fullRef] = handle;
        return handle;
    }
    
    (Keep the class-summary remark honest: it should now say that handle resolution also consults borrowed subscription handles, which are never stored in the cache.)
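
The seam in steps 2 to 4 can be exercised without a gateway by hiding AddItem behind a small interface. The sketch below assumes a hypothetical IItemAdder in place of the sealed MxGatewaySession and a hypothetical class name (WriterHandleSketch); the resolution order and the counting mirror the plan's code. It demonstrates the property the Task 2 tests pin down: a borrowed handle short-circuits AddItem and is never written to _itemHandles, while an AddItem'd handle is cached and from then on wins over the source.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over the gateway AddItem round-trip (the real SDK session is sealed).
public interface IItemAdder
{
    Task<int> AddItemAsync(int serverHandle, string fullRef, CancellationToken ct);
}

// Sketch of the writer's handle-resolution seam; not the real GatewayGalaxyDataWriter.
public sealed class WriterHandleSketch
{
    private readonly ConcurrentDictionary<string, int> _itemHandles = new();
    private readonly Func<string, int?>? _subscribedHandleSource;
    private int _addItemCallCount;

    public WriterHandleSketch(Func<string, int?>? subscribedHandleSource = null) =>
        _subscribedHandleSource = subscribedHandleSource;

    public int AddItemCallCount => Volatile.Read(ref _addItemCallCount);
    public int CachedItemHandleCount => _itemHandles.Count;

    // Cache first, then a positive borrowed handle; null means "caller must AddItem".
    public int? TryResolveCachedOrBorrowed(string fullRef)
    {
        if (_itemHandles.TryGetValue(fullRef, out var existing)) return existing;
        if (_subscribedHandleSource?.Invoke(fullRef) is int borrowed && borrowed > 0) return borrowed;
        return null;
    }

    public async Task<int> EnsureItemHandleAsync(
        IItemAdder session, int serverHandle, string fullRef, CancellationToken ct)
    {
        if (TryResolveCachedOrBorrowed(fullRef) is int resolved) return resolved; // no round-trip
        var handle = await session.AddItemAsync(serverHandle, fullRef, ct).ConfigureAwait(false);
        Interlocked.Increment(ref _addItemCallCount);
        _itemHandles[fullRef] = handle; // only writer-owned handles are cached; borrows never are
        return handle;
    }
}
```

Because the borrow is re-read on every call, the writer never holds a registry handle past the registry's own lifecycle; the only cost is one dictionary lookup per write.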

Tests (extend GatewayGalaxyDataWriterTests, same no-live-session pattern):

  • subscribedHandleSource returns a handle for a ref → TryResolveCachedOrBorrowed returns it, and CachedItemHandleCount stays 0 (borrow is not cached) and AddItemCallCount stays 0.
  • A cache hit in _itemHandles (seeded via SeedHandleCachesForTest) wins over the source.
  • null source → TryResolveCachedOrBorrowed returns null (today's behavior; would AddItem).
  • Source returns 0/negative (failed subscribe) → treated as no-borrow (null).

Steps: failing tests → run filter FullyQualifiedName~GatewayGalaxyDataWriter (FAIL) → implement → re-run (PASS) → git add the two paths + commit feat(galaxy): writer borrows live subscription item handles (skip redundant AddItem).


Task 3: Wire the registry into the production writer in GalaxyDriver

Classification: small; Estimated implement time: ~2 min; Parallelizable with: none (blocked by Task 1 + Task 2)

Files:

  • Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs (the writer construction at ~255-256)

What: pass the registry resolver as the new 4th writer arg:

_dataWriter = new TracedGalaxyDataWriter(
    new GatewayGalaxyDataWriter(
        _ownedMxSession, _options.MxAccess.WriteUserId, _logger,
        _subscriptions.TryResolveItemHandle),
    ...);

_subscriptions (line 72) is already a live field at this point. No other change. (The reconnect path already invalidates the writer cache at GalaxyDriver.cs:298; borrowed handles aren't cached, and ReplayAsync rebinds the registry via Rebind, so the writer naturally re-borrows fresh handles after a reconnect. Nothing to add here.)
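
The wiring relies on a C# method-group conversion: _subscriptions.TryResolveItemHandle becomes a Func<string, int?> bound to the registry instance, so each invocation reads the registry's current state rather than a snapshot taken at construction. The tiny sketch below uses hypothetical MiniRegistry/MiniWriter types, with Bind standing in for Register/Rebind, to illustrate why no reconnect hook is needed for borrowed handles: after a rebind, the very next borrow returns the fresh handle.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical minimal registry: Bind stands in for Register / reconnect Rebind.
public sealed class MiniRegistry
{
    private readonly ConcurrentDictionary<string, int> _live = new(StringComparer.OrdinalIgnoreCase);

    public void Bind(string fullRef, int handle) => _live[fullRef] = handle;

    public int? TryResolveItemHandle(string fullRef) =>
        _live.TryGetValue(fullRef, out var handle) ? handle : (int?)null;
}

// Hypothetical minimal writer: holds the delegate and never caches what it borrows.
public sealed class MiniWriter
{
    private readonly Func<string, int?>? _subscribedHandleSource;

    public MiniWriter(Func<string, int?>? subscribedHandleSource = null) =>
        _subscribedHandleSource = subscribedHandleSource;

    public int? Borrow(string fullRef) => _subscribedHandleSource?.Invoke(fullRef);
}
```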

Steps: edit → dotnet build the Galaxy driver project (0 errors) → run the full Driver.Galaxy.Tests suite (all green, confirms T1+T2+T3 integrate) → git add the path + commit feat(galaxy): wire SubscriptionRegistry handle resolver into the production writer.


Task 4: Live-gw borrow smoke + verification + finish

Classification: small; Estimated implement time: ~5 min (+ live run); Parallelizable with: none (blocked by Task 3)

Files:

  • Modify: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime/GatewayGalaxyLiveReopenAndWriteTests.cs
  • Modify: docs/Galaxy.Performance.md or the relevant Galaxy doc — a short note that the writer borrows subscription handles (only if a natural home exists; otherwise skip docs — no overclaim).

What — add one skip-gated [Fact] reusing the harness's RequireLiveGatewayOrSkip / BuildClientOptions and the dedicated writable WriteRef = "TestMachine_002.TestFloat":

  1. Open a session + GatewayGalaxySubscriber; SubscribeBulkAsync([WriteRef], …) and capture the real item handle the gateway returned for WriteRef (from the SubscribeResult — check how GalaxyDriver maps subscribe results into TagBindings to extract the handle).
  2. Build a SubscriptionRegistry, Register a binding (WriteRef, thatHandle).
  3. Construct the writer with registry.TryResolveItemHandle as the source; write WriteRef. Assert: status Good (0u) and writer.AddItemCallCount == 0 (the borrow skipped AddItem).
  4. Control (same safe tag): a writer with the same session but an empty registry (or null source) writes WriteRef → Good and AddItemCallCount == 1 (no borrow ⇒ AddItem happened). Both write only TestMachine_002.TestFloat.

Verify:

  • dotnet build ZB.MOM.WW.OtOpcUa.slnx — 0 errors.
  • dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests — all green (live test skips without env vars).
  • Live run (the gate) — source the key without echoing and run only the new + existing live smokes:
    KEY=$(docker exec otopcua-dev-central-1-1 printenv GALAXY_MXGW_API_KEY)
    MXGW_ENDPOINT=http://10.100.0.48:5120 GALAXY_MXGW_API_KEY="$KEY" \
      dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests \
      --filter FullyQualifiedName~GatewayGalaxyLiveReopenAndWrite
    
    Expect: the existing 2 + the new borrow smoke pass (borrow write Good with AddItemCallCount==0, control with ==1). If the borrowed handle does NOT commit, the premise is false — stop, do not merge, report.

Finish: commit the test (+ optional doc) by explicit path; then superpowers-extended-cc:finishing-a-development-branch → merge to master + push; delete the branch; update project_stillpending_backlog.md + MEMORY.md (mark §2.4 shipped).

Steps: add the live [Fact] → build → unit suite green → live run PASS → commit → merge + push.