Galaxy writer ⇄ subscription-registry item-handle sharing — Design
Brainstormed 2026-06-18. Backlog item:
stillpending.md §2.4: "Galaxy — writer item-handle cache not shared with the subscription registry" (Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs). Off master 70e1bde9.
Problem (verified in code)
The Galaxy data writer's item-handle cache is already shipped and reconnect-wired:
GatewayGalaxyDataWriter holds _itemHandles (fullRef → MXAccess hItem) plus
_supervisedHandles, and GalaxyDriver.ReopenAsync invalidates them after a session
recreate (GalaxyDriver.cs:298, commits f05b5d79 / f77488ee). So "add a writer cache"
is not the open work.
What is open is exactly what the backlog says: the writer cache is isolated from the
SubscriptionRegistry. The registry already records TagBinding(FullReference, ItemHandle)
for every live subscription, and — the load-bearing fact — both subscribe-handles (from
SubscribeBulk) and write-handles (from the writer's AddItem) are issued against the same
MXAccess server registration (GalaxyMxSession.ServerHandle). Within one registration an
hItem is shared across the AddItem / Advise / Write call paths, so a handle the gateway returned
for a subscription is reusable by a Write.
Today the writer never consults the registry, so the first write to an already-subscribed tag
pays a redundant AddItem round-trip (subsequent writes hit the writer's own _itemHandles).
This is an optimization: a first-write-only latency saving, plus unification of handle
provenance (one fewer source of truth for "which hItem is this tag").
Load-bearing premise (proven live, not assumed)
A subscription's item handle is usable for a no-login supervisory Write that commits.
If this were false, borrowing the subscribed handle would regress writes for subscribed tags. The MXAccess Toolkit semantics say it holds (one hItem per item under a registration, shared across Advise/Write), but this phase does not assume it — the live-gw test is the merge gate, not decoration.
Approach
Three options were considered:
1. Delegate seam (chosen). The writer ctor gains an optional Func<string, int?>? subscribedHandleSource = null; GalaxyDriver wires it to the registry's new TryResolveItemHandle. This is symmetric with the writer's existing securityResolver delegate, preserves the writer's "independently testable" property (null ⇒ today's exact behavior), and is the smallest surface.
2. Direct SubscriptionRegistry reference in the writer. Same assembly, registry is test-constructible, but it couples the writer to a concrete collaborator with no upside over (1).
3. Unified shared cache object both sides read/write. Matches the literal "wire bindings into _itemHandles" phrasing but is the most invasive (changes the writer's ownership model). YAGNI.
→ Approach 1. AdminUI untouched. No Commons/proto/EF/migration change. No bUnit.
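The delegate seam chosen above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the shipped code: WriterSketch, RegistrySketch, WiringSketch, Add and TryBorrow are hypothetical stand-ins, and only the optional Func<string, int?>? ctor parameter and the TryResolveItemHandle name come from this design.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the registry: only the lookup the writer needs.
public sealed class RegistrySketch
{
    private readonly Dictionary<string, int> _handles =
        new(StringComparer.OrdinalIgnoreCase);

    // Hypothetical helper used only to seed the sketch.
    public void Add(string fullRef, int itemHandle) => _handles[fullRef] = itemHandle;

    public int? TryResolveItemHandle(string fullRef) =>
        _handles.TryGetValue(fullRef, out var h) ? h : (int?)null;
}

// Hypothetical stand-in for the writer: shows only the optional ctor seam.
public sealed class WriterSketch
{
    private readonly Func<string, int?>? _subscribedHandleSource;

    // A null source means no borrowing, i.e. today's behavior.
    public WriterSketch(Func<string, int?>? subscribedHandleSource = null) =>
        _subscribedHandleSource = subscribedHandleSource;

    public int? TryBorrow(string fullRef) => _subscribedHandleSource?.Invoke(fullRef);
}

public static class WiringSketch
{
    // The driver side: a method-group conversion supplies the delegate,
    // so the writer never references the registry type itself.
    public static WriterSketch WireWriter(RegistrySketch registry) =>
        new WriterSketch(registry.TryResolveItemHandle);
}
```

Because the writer sees only a Func<string, int?>, a unit test can pass a lambda or nothing at all, which is the "independently testable" property the first option preserves.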
Resolution rule (self-healing — no stale-handle regression)
GatewayGalaxyDataWriter.EnsureItemHandleAsync(fullRef) becomes:
1. _itemHandles hit (the writer AddItem'd this tag itself before) → use it.
2. else subscribedHandleSource?.Invoke(fullRef) returns a live handle → use it, but do NOT store it in _itemHandles. The registry owns the borrowed handle's lifecycle (including reconnect Rebind), so after a reconnect the writer always re-borrows the fresh handle on the next write; the borrow introduces no stale-cache window.
3. else AddItem + store in _itemHandles (today's path), incrementing an AddItemCallCount test seam.
AdviseSupervisory is unchanged: the writer still supervisory-advises the (possibly borrowed)
hItem once per handle via _supervisedHandles. Borrowing only skips the AddItem round-trip —
which is the entire point of the optimization. The borrowed item already carries the
subscriber's data-change advise; supervisory advise is an additional mode on the same item.
The decision in steps 1–2 is extracted into a synchronous internal seam
TryResolveCachedOrBorrowed(fullRef) -> int? so it is unit-testable without a live session
(the SDK MxGatewaySession is sealed with an internal ctor and cannot be faked — see Testing).
EnsureItemHandleAsync calls the seam first and only AddItems on a null result.
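The resolution rule can be sketched as below. This is a hedged illustration, not the real writer: HandleResolverSketch is a hypothetical class, and the addItemAsync delegate is an assumed stand-in for the gateway AddItem call, which cannot be faked in the real code. The member names TryResolveCachedOrBorrowed, EnsureItemHandleAsync, AddItemCallCount and CachedItemHandleCount follow this design.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class HandleResolverSketch
{
    // Handles the writer obtained itself through AddItem.
    private readonly ConcurrentDictionary<string, int> _itemHandles =
        new(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, int?>? _subscribedHandleSource;
    // Assumption: stands in for the gateway AddItem round-trip.
    private readonly Func<string, Task<int>> _addItemAsync;
    private int _addItemCallCount;

    public HandleResolverSketch(
        Func<string, Task<int>> addItemAsync,
        Func<string, int?>? subscribedHandleSource = null)
    {
        _addItemAsync = addItemAsync;
        _subscribedHandleSource = subscribedHandleSource;
    }

    public int AddItemCallCount => Volatile.Read(ref _addItemCallCount);
    public int CachedItemHandleCount => _itemHandles.Count;

    // Steps 1-2: synchronous, so it can be tested without a session.
    public int? TryResolveCachedOrBorrowed(string fullRef)
    {
        if (_itemHandles.TryGetValue(fullRef, out var own))
            return own;                                   // step 1: own cache wins
        return _subscribedHandleSource?.Invoke(fullRef);  // step 2: borrow, never stored
    }

    public async Task<int> EnsureItemHandleAsync(string fullRef)
    {
        if (TryResolveCachedOrBorrowed(fullRef) is int resolved)
            return resolved;

        // Step 3: today's path, AddItem and cache the writer-owned handle.
        var handle = await _addItemAsync(fullRef).ConfigureAwait(false);
        Interlocked.Increment(ref _addItemCallCount);
        _itemHandles[fullRef] = handle;
        return handle;
    }
}
```

Since the borrowed handle is never written to _itemHandles, a source that starts returning a different handle after a rebind is picked up on the very next call, with no invalidation step in the writer.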
Registry change
SubscriptionRegistry gains a forward lookup:
- A ConcurrentDictionary<string, int> _itemHandleByFullRef (StringComparer.OrdinalIgnoreCase, matching the writer's cache), maintained incrementally in Register/Rebind (add the fullRef → handle for each binding.ItemHandle > 0) and best-effort dropped in Remove/Rebind.
- public int? TryResolveItemHandle(string fullRef) that returns the mapped handle only if _subscribersByItemHandle still contains it: a liveness guard. This means even a lingering forward-map entry can never hand out a dead handle, because _subscribersByItemHandle is the already-authoritative live-handle set. The guard de-risks the removal bookkeeping: the forward map is an index, not the source of truth.
The event-dispatch hot path (ResolveSubscribers) is untouched — the forward map is consulted
only on writes (rare relative to events) and mutated only on subscribe/unsubscribe/rebind (which
already do O(bindings) work).
GalaxyDriver passes _subscriptions.TryResolveItemHandle as the writer's
subscribedHandleSource when it constructs the production writer.
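A minimal sketch of the forward lookup and its liveness guard, under stated assumptions: ForwardLookupSketch is hypothetical, the Register/Remove/Rebind signatures are simplified to single bindings, the live set is modeled as a handle-keyed dictionary rather than the real subscriber map, and the dropForwardEntry flag exists only to simulate a lingering index entry.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

public sealed class ForwardLookupSketch
{
    // Index only: fullRef -> item handle, case-insensitive like the writer's cache.
    private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
        new(StringComparer.OrdinalIgnoreCase);
    // Assumption: simplified stand-in for the authoritative live-handle set.
    private readonly ConcurrentDictionary<int, byte> _subscribersByItemHandle = new();

    public void Register(string fullRef, int itemHandle)
    {
        if (itemHandle <= 0) return;          // only positive handles are indexed
        _subscribersByItemHandle[itemHandle] = 0;
        _itemHandleByFullRef[fullRef] = itemHandle;
    }

    // dropForwardEntry: false simulates best-effort bookkeeping that missed an entry.
    public void Remove(string fullRef, int itemHandle, bool dropForwardEntry = true)
    {
        _subscribersByItemHandle.TryRemove(itemHandle, out _);
        if (dropForwardEntry) _itemHandleByFullRef.TryRemove(fullRef, out _);
    }

    public void Rebind(string fullRef, int oldHandle, int newHandle)
    {
        Remove(fullRef, oldHandle);
        Register(fullRef, newHandle);
    }

    // Liveness guard: the index answer counts only if the handle is still live.
    public int? TryResolveItemHandle(string fullRef) =>
        _itemHandleByFullRef.TryGetValue(fullRef, out var handle)
        && _subscribersByItemHandle.ContainsKey(handle)
            ? handle
            : (int?)null;
}
```

Calling Remove with dropForwardEntry: false leaves a stale index entry behind, yet TryResolveItemHandle still returns null, which is the property that lets the forward map be maintained on a best-effort basis.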
Error handling
Unchanged. A borrowed handle that fails a write surfaces its own Bad status via the existing
TranslateReply path; the writer does not cache borrowed handles, so the next write simply
re-resolves (re-borrow if still live, else AddItem). TryResolveItemHandle returns null
cleanly for unknown / dead handles.
Testing
Unit (xUnit + Shouldly, Driver.Galaxy.Tests) — no live session required:
- GatewayGalaxyDataWriter: seed subscribedHandleSource → TryResolveCachedOrBorrowed returns the borrowed handle and leaves CachedItemHandleCount == 0; an _itemHandles hit wins over the source; a null source ⇒ no borrow (returns null). (Mirrors the existing SeedHandleCachesForTest / count-seam pattern, since the gw session cannot be faked.)
- SubscriptionRegistry: TryResolveItemHandle returns the handle after Register; returns null after Remove; returns the fresh handle after Rebind; the liveness guard returns null when the handle is absent from _subscribersByItemHandle.
Live gw (the merge gate) — extend GatewayGalaxyLiveReopenAndWriteTests (skip-gated; runs
against MXGW_ENDPOINT=http://10.100.0.48:5120 with the key sourced via the durable
docker exec … printenv GALAXY_MXGW_API_KEY recipe, never echoed). Add an internal
AddItemCallCount seam on the writer, then:
- Subscribe a real tag (registry now holds its hItem) → write it through the registry-wired writer → assert the write commits (Good status + value persists) and AddItemCallCount == 0.
- Control: write a non-subscribed tag → assert AddItemCallCount == 1.
This proves both halves: the borrowed handle is write-usable (premise holds) and the redundant
AddItem is actually skipped.
Deferred / out of scope
- Reverse direction (subscriber borrowing the writer's AddItem handles): subscriber handles come from SubscribeBulk; no need.
- Unifying the two caches into one shared object (Approach 3).
- Any AdminUI / Commons / proto / EF change.
Done =
Build clean + dotnet test (Driver.Galaxy) green + the live-gw test proves a subscribed-tag
write commits with zero AddItem → merge to master + push.