Files
lmxopcua/docs/plans/2026-06-18-galaxy-writer-handle-sharing-design.md
T

7.7 KiB
Raw Blame History

Galaxy writer ⇄ subscription-registry item-handle sharing — Design

Brainstormed 2026-06-18. Backlog item: stillpending.md §2.4 — "Galaxy — writer item-handle cache not shared with the subscription registry" (Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs). Off master 70e1bde9.

Problem (verified in code)

The Galaxy data writer's item-handle cache is already shipped and reconnect-wired: GatewayGalaxyDataWriter holds _itemHandles (fullRef → MXAccess hItem) plus _supervisedHandles, and GalaxyDriver.ReopenAsync invalidates them after a session recreate (GalaxyDriver.cs:298, commits f05b5d79 / f77488ee). So "add a writer cache" is not the open work.

What is open is exactly what the backlog says: the writer cache is isolated from the SubscriptionRegistry. The registry already records TagBinding(FullReference, ItemHandle) for every live subscription, and — the load-bearing fact — both subscribe-handles (from SubscribeBulk) and write-handles (from the writer's AddItem) are issued against the same MXAccess server registration (GalaxyMxSession.ServerHandle). Within one registration an hItem is shared across the AddItem / Advise / Write call paths, so a handle the gateway returned for a subscription is reusable by a Write.

Today the writer never consults the registry, so the first write to an already-subscribed tag pays a redundant AddItem round-trip (subsequent writes hit the writer's own _itemHandles). This is an _Optimization._ — a first-write-only latency saving, plus unification of handle provenance (one fewer source of truth for "which hItem is this tag").

Load-bearing premise (proven live, not assumed)

A subscription's item handle is usable for a no-login supervisory Write that commits.

If this were false, borrowing the subscribed handle would regress writes for subscribed tags. The MXAccess Toolkit semantics say it holds (one hItem per item under a registration, shared across Advise/Write), but this phase does not assume it — the live-gw test is the merge gate, not decoration.

Approach

Three options were considered:

  1. Delegate seam (chosen). The writer ctor gains an optional Func<string, int?>? subscribedHandleSource = null; GalaxyDriver wires it to the registry's new TryResolveItemHandle. This is symmetric with the writer's existing securityResolver delegate, preserves the writer's "independently testable" property (null ⇒ today's exact behavior), and is the smallest surface.
  2. Direct SubscriptionRegistry reference in the writer. Same assembly, registry is test-constructible — but couples the writer to a concrete collaborator with no upside over (1).
  3. Unified shared cache object both sides read/write. Matches the literal "wire bindings into _itemHandles" phrasing but is the most invasive (changes the writer's ownership model). YAGNI.

Approach 1. AdminUI untouched. No Commons/proto/EF/migration change. No bUnit.

Resolution rule (self-healing — no stale-handle regression)

GatewayGalaxyDataWriter.EnsureItemHandleAsync(fullRef) becomes:

  1. _itemHandles hit (the writer AddItem'd this tag itself before) → use it.
  2. else subscribedHandleSource?.Invoke(fullRef) returns a live handle → use it, but do NOT store it in _itemHandles. The registry owns the borrowed handle's lifecycle (including reconnect Rebind), so after a reconnect the writer always re-borrows the fresh handle on the next write — there is no stale-cache window introduced by the borrow.
  3. else AddItem + store in _itemHandles (today's path), incrementing an AddItemCallCount test seam.

AdviseSupervisory is unchanged: the writer still supervisory-advises the (possibly borrowed) hItem once per handle via _supervisedHandles. Borrowing only skips the AddItem round-trip — which is the entire point of the optimization. The borrowed item already carries the subscriber's data-change advise; supervisory advise is an additional mode on the same item.

The decision in steps 12 is extracted into a synchronous internal seam TryResolveCachedOrBorrowed(fullRef) -> int? so it is unit-testable without a live session (the SDK MxGatewaySession is sealed with an internal ctor and cannot be faked — see Testing). EnsureItemHandleAsync calls the seam first and only AddItems on a null result.

Registry change

SubscriptionRegistry gains a forward lookup:

  • A ConcurrentDictionary<string, int> _itemHandleByFullRef (StringComparer.OrdinalIgnoreCase, matching the writer's cache), maintained incrementally in Register / Rebind (add the fullRef → handle for each binding.ItemHandle > 0) and best-effort dropped in Remove / Rebind.
  • public int? TryResolveItemHandle(string fullRef) that returns the mapped handle only if _subscribersByItemHandle still contains it — a liveness guard. This means even a lingering forward-map entry can never hand out a dead handle, because _subscribersByItemHandle is the already-authoritative live-handle set. The guard de-risks the removal bookkeeping: the forward map is an index, not the source of truth.

The event-dispatch hot path (ResolveSubscribers) is untouched — the forward map is consulted only on writes (rare relative to events) and mutated only on subscribe/unsubscribe/rebind (which already do O(bindings) work).

GalaxyDriver passes _subscriptions.TryResolveItemHandle as the writer's subscribedHandleSource when it constructs the production writer.

Error handling

Unchanged. A borrowed handle that fails a write surfaces its own Bad status via the existing TranslateReply path; the writer does not cache borrowed handles, so the next write simply re-resolves (re-borrow if still live, else AddItem). TryResolveItemHandle returns null cleanly for unknown / dead handles.

Testing

Unit (xUnit + Shouldly, Driver.Galaxy.Tests) — no live session required:

  • GatewayGalaxyDataWriter: seed subscribedHandleSourceTryResolveCachedOrBorrowed returns the borrowed handle and leaves CachedItemHandleCount == 0; a _itemHandles hit wins over the source; a null source ⇒ no borrow (returns null). (Mirrors the existing SeedHandleCachesForTest / count-seam pattern, since the gw session cannot be faked.)
  • SubscriptionRegistry: TryResolveItemHandle returns the handle after Register; returns null after Remove; returns the fresh handle after Rebind; the liveness guard returns null when the handle is absent from _subscribersByItemHandle.

Live gw (the merge gate) — extend GatewayGalaxyLiveReopenAndWriteTests (skip-gated; runs against MXGW_ENDPOINT=http://10.100.0.48:5120 with the key sourced via the durable docker exec … printenv GALAXY_MXGW_API_KEY recipe, never echoed). Add an internal AddItemCallCount seam on the writer, then:

  • Subscribe a real tag (registry now holds its hItem) → write it through the registry-wired writer → assert the write commits (Good status + value persists) and AddItemCallCount == 0.
  • Control: write a non-subscribed tag → assert AddItemCallCount == 1.

This proves both halves: the borrowed handle is write-usable (premise holds) and the redundant AddItem is actually skipped.

Deferred / out of scope

  • Reverse direction (subscriber borrowing the writer's AddItem handles) — subscriber handles come from SubscribeBulk; no need.
  • Unifying the two caches into one shared object (Approach 3).
  • Any AdminUI / Commons / proto / EF change.

Done =

Build clean + dotnet test (Driver.Galaxy) green + the live-gw test proves a subscribed-tag write commits with zero AddItem → merge to master + push.