# Galaxy writer ⇄ subscription-registry item-handle sharing — Design

> Brainstormed 2026-06-18. Backlog item: `stillpending.md` §2.4, "Galaxy — writer
> item-handle cache not shared with the subscription registry"
> (`Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs`). Off master `70e1bde9`.

## Problem (verified in code)

The Galaxy data writer's item-handle cache is **already shipped and reconnect-wired**:
`GatewayGalaxyDataWriter` holds `_itemHandles` (fullRef → MXAccess hItem) plus
`_supervisedHandles`, and `GalaxyDriver.ReopenAsync` invalidates them after a session
recreate (`GalaxyDriver.cs:298`, commits `f05b5d79` / `f77488ee`). So "add a writer cache"
is **not** the open work.

What is open is exactly what the backlog says: the writer cache is **isolated from the
`SubscriptionRegistry`**. The registry already records `TagBinding(FullReference, ItemHandle)`
for every live subscription. The load-bearing fact is that both subscribe-handles (from
`SubscribeBulk`) and write-handles (from the writer's `AddItem`) are issued against the **same
MXAccess server registration** (`GalaxyMxSession.ServerHandle`). Within one registration an
hItem is shared across the AddItem / Advise / Write call paths, so a handle the gateway returned
for a subscription is reusable by a `Write`.

Today the writer never consults the registry, so the **first write to an already-subscribed tag
pays a redundant `AddItem` round-trip** (subsequent writes hit the writer's own `_itemHandles`).
This is an _optimization_: a first-write-only latency saving, plus unification of handle
provenance (one fewer source of truth for "which hItem is this tag").

## Load-bearing premise (proven live, not assumed)

> *A subscription's item handle is usable for a no-login supervisory `Write` that commits.*

If this were false, borrowing the subscribed handle would **regress writes** for subscribed
tags. The MXAccess Toolkit semantics say it holds (one hItem per item under a registration,
shared across Advise/Write), but this phase does **not** assume it: the live-gw test is the
**merge gate**, not decoration.

## Approach

Three options were considered:

1. **Delegate seam (chosen).** The writer ctor gains an optional
   `Func<string, int?>? subscribedHandleSource = null`; `GalaxyDriver` wires it to the
   registry's new `TryResolveItemHandle`. This is symmetric with the writer's existing
   `securityResolver` delegate, preserves the writer's "independently testable" property
   (`null` ⇒ today's exact behavior), and is the smallest surface.
2. **Direct `SubscriptionRegistry` reference** in the writer. Same assembly, registry is
   test-constructible, but it couples the writer to a concrete collaborator with no upside over (1).
3. **Unified shared cache object** both sides read/write. Matches the literal "wire bindings
   into `_itemHandles`" phrasing but is the most invasive (changes the writer's ownership
   model). YAGNI.

→ **Approach 1.** AdminUI untouched. No Commons/proto/EF/migration change. No bUnit.

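The chosen delegate seam can be sketched as follows. This is a minimal illustration, not the shipped code: `RegistryStub` and `WriterCtorSketch` are hypothetical stand-ins, and only the `Func<string, int?>? subscribedHandleSource = null` parameter and the `TryResolveItemHandle` name come from this design.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the registry: only the lookup signature matters here.
public sealed class RegistryStub
{
    private readonly Dictionary<string, int> _handles =
        new(StringComparer.OrdinalIgnoreCase);

    // Test helper (assumption): pretend a subscription produced this handle.
    public void Seed(string fullRef, int handle) => _handles[fullRef] = handle;

    // Same shape as the proposed registry method: null means "no handle known".
    public int? TryResolveItemHandle(string fullRef) =>
        _handles.TryGetValue(fullRef, out var handle) ? handle : (int?)null;
}

// Hypothetical stand-in for the writer, showing only the optional delegate.
public sealed class WriterCtorSketch
{
    private readonly Func<string, int?>? _subscribedHandleSource;

    // Defaulting to null keeps the writer constructible without any registry.
    public WriterCtorSketch(Func<string, int?>? subscribedHandleSource = null)
    {
        _subscribedHandleSource = subscribedHandleSource;
    }

    // A null source never borrows, which is today's behavior.
    public int? TryBorrow(string fullRef) => _subscribedHandleSource?.Invoke(fullRef);
}
```

In this sketch the driver-side wiring is a method-group conversion, `new WriterCtorSketch(registry.TryResolveItemHandle)`, so the writer never sees the registry type itself.
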
## Resolution rule (self-healing: no stale-handle regression)

`GatewayGalaxyDataWriter.EnsureItemHandleAsync(fullRef)` becomes:

1. `_itemHandles` hit (the writer `AddItem`'d this tag itself before) → use it.
2. else `subscribedHandleSource?.Invoke(fullRef)` returns a **live** handle → use it, but
   **do NOT store it in `_itemHandles`**. The registry owns the borrowed handle's lifecycle
   (including reconnect `Rebind`), so after a reconnect the writer always re-borrows the *fresh*
   handle on the next write; the borrow introduces no stale-cache window.
3. else `AddItem` + store in `_itemHandles` (today's path), incrementing an `AddItemCallCount`
   test seam.

`AdviseSupervisory` is unchanged: the writer still supervisory-advises the (possibly borrowed)
hItem once per handle via `_supervisedHandles`. Borrowing only skips the `AddItem` round-trip,
which is the entire point of the optimization. The borrowed item already carries the
subscriber's data-change advise; supervisory advise is an additional mode on the same item.

The decision in steps 1–2 is extracted into a synchronous internal seam
`TryResolveCachedOrBorrowed(fullRef) -> int?` so it is unit-testable **without a live session**
(the SDK `MxGatewaySession` is sealed with an internal ctor and cannot be faked; see Testing).
`EnsureItemHandleAsync` calls the seam first and only `AddItem`s on a null result.

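The three-step rule and the synchronous seam can be sketched like this. It is an illustration under stated assumptions: `HandleResolverSketch` is a hypothetical class, and the `addItem` delegate stands in for the real gateway `AddItem` call, whose actual signature is not shown in this document.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class HandleResolverSketch
{
    // The writer's own cache: only handles the writer AddItem'd itself.
    private readonly ConcurrentDictionary<string, int> _itemHandles =
        new(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, int?>? _subscribedHandleSource;
    private readonly Func<string, Task<int>> _addItem; // assumption: gateway AddItem stand-in
    private int _addItemCallCount;

    public HandleResolverSketch(
        Func<string, Task<int>> addItem,
        Func<string, int?>? subscribedHandleSource = null)
    {
        _addItem = addItem;
        _subscribedHandleSource = subscribedHandleSource;
    }

    public int AddItemCallCount => _addItemCallCount;
    public int CachedItemHandleCount => _itemHandles.Count;

    // Steps 1-2: synchronous, so it can be tested without a session.
    public int? TryResolveCachedOrBorrowed(string fullRef)
    {
        // Step 1: a handle the writer added itself wins.
        if (_itemHandles.TryGetValue(fullRef, out var own)) return own;
        // Step 2: borrow from the registry, deliberately without caching it.
        return _subscribedHandleSource?.Invoke(fullRef);
    }

    public async Task<int> EnsureItemHandleAsync(string fullRef)
    {
        if (TryResolveCachedOrBorrowed(fullRef) is int handle) return handle;

        // Step 3: today's path, counted by the test seam.
        Interlocked.Increment(ref _addItemCallCount);
        var added = await _addItem(fullRef);
        _itemHandles[fullRef] = added;
        return added;
    }
}
```

Because a borrowed handle is never written to `_itemHandles`, a source that starts returning a different handle after a rebind is picked up on the very next call, while a tag the writer added itself keeps resolving from its own cache.
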
## Registry change

`SubscriptionRegistry` gains a forward lookup:

- A `ConcurrentDictionary<string, int> _itemHandleByFullRef` (`StringComparer.OrdinalIgnoreCase`,
  matching the writer's cache), maintained incrementally in `Register` / `Rebind` (add the
  `fullRef → handle` for each `binding.ItemHandle > 0`) and best-effort dropped in `Remove` /
  `Rebind`.
- `public int? TryResolveItemHandle(string fullRef)` that returns the mapped handle **only if
  `_subscribersByItemHandle` still contains it**, as a liveness guard. This means even a lingering
  forward-map entry can never hand out a dead handle, because `_subscribersByItemHandle` is the
  already-authoritative live-handle set. The guard de-risks the removal bookkeeping: the forward
  map is an index, not the source of truth.

The event-dispatch hot path (`ResolveSubscribers`) is **untouched**: the forward map is consulted
only on writes (rare relative to events) and mutated only on subscribe/unsubscribe/rebind (which
already do O(bindings) work).

`GalaxyDriver` passes `_subscriptions.TryResolveItemHandle` as the writer's
`subscribedHandleSource` when it constructs the production writer.

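A minimal sketch of the forward map and its liveness guard follows. The `Register` / `Remove` / `Rebind` signatures, the `TagBindingSketch` record, and the `DropLiveHandleForTest` hook are assumptions made for illustration; the real registry maps handles to subscribers, which is simplified here to a set of live handles.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical mirror of TagBinding(FullReference, ItemHandle).
public sealed record TagBindingSketch(string FullReference, int ItemHandle);

public sealed class ForwardMapRegistrySketch
{
    // Authoritative live-handle set (simplified: the value is unused).
    private readonly ConcurrentDictionary<int, byte> _subscribersByItemHandle = new();
    // Forward index: an optimization, never the source of truth.
    private readonly ConcurrentDictionary<string, int> _itemHandleByFullRef =
        new(StringComparer.OrdinalIgnoreCase);

    public void Register(IEnumerable<TagBindingSketch> bindings)
    {
        foreach (var b in bindings)
        {
            if (b.ItemHandle <= 0) continue; // failed subscriptions carry no usable handle
            _subscribersByItemHandle[b.ItemHandle] = 0;
            _itemHandleByFullRef[b.FullReference] = b.ItemHandle;
        }
    }

    public void Remove(IEnumerable<TagBindingSketch> bindings)
    {
        foreach (var b in bindings)
        {
            _subscribersByItemHandle.TryRemove(b.ItemHandle, out _);
            _itemHandleByFullRef.TryRemove(b.FullReference, out _); // best effort
        }
    }

    // Reconnect: drop the stale bindings, then index the fresh ones.
    public void Rebind(IEnumerable<TagBindingSketch> stale, IEnumerable<TagBindingSketch> fresh)
    {
        Remove(stale);
        Register(fresh);
    }

    // Liveness guard: an index entry only counts if its handle is still live.
    public int? TryResolveItemHandle(string fullRef) =>
        _itemHandleByFullRef.TryGetValue(fullRef, out var handle)
        && _subscribersByItemHandle.ContainsKey(handle)
            ? handle
            : (int?)null;

    // Test hook (assumption): simulate a lingering forward-map entry.
    public void DropLiveHandleForTest(int handle) =>
        _subscribersByItemHandle.TryRemove(handle, out _);
}
```

The guard is what lets the removal bookkeeping stay best-effort: if an index entry outlives its handle, the lookup returns `null` and the writer falls through to `AddItem`.
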
## Error handling

Unchanged. A borrowed handle that fails a write surfaces its own Bad status via the existing
`TranslateReply` path; the writer does not cache borrowed handles, so the next write simply
re-resolves (re-borrow if still live, else `AddItem`). `TryResolveItemHandle` returns `null`
cleanly for unknown / dead handles.

## Testing

**Unit (xUnit + Shouldly, `Driver.Galaxy.Tests`), no live session required:**

- `GatewayGalaxyDataWriter`: seed `subscribedHandleSource` → `TryResolveCachedOrBorrowed`
  returns the borrowed handle and leaves `CachedItemHandleCount == 0`; a `_itemHandles` hit wins
  over the source; a `null` source ⇒ no borrow (returns null). (Mirrors the existing
  `SeedHandleCachesForTest` / count-seam pattern, since the gw session cannot be faked.)
- `SubscriptionRegistry`: `TryResolveItemHandle` returns the handle after `Register`; returns
  `null` after `Remove`; returns the **fresh** handle after `Rebind`; the liveness guard returns
  `null` when the handle is absent from `_subscribersByItemHandle`.

**Live gw (the merge gate):** extend `GatewayGalaxyLiveReopenAndWriteTests` (skip-gated; runs
against `MXGW_ENDPOINT=http://10.100.0.48:5120` with the key sourced via the durable
`docker exec … printenv GALAXY_MXGW_API_KEY` recipe, never echoed). Add an internal
`AddItemCallCount` seam on the writer, then:

- Subscribe a real tag (registry now holds its hItem) → write it through the registry-wired
  writer → assert the write **commits** (Good status + value persists) **and `AddItemCallCount == 0`**.
- Control: write a **non-subscribed** tag → assert `AddItemCallCount == 1`.

This proves both halves: the borrowed handle is write-usable (premise holds) **and** the redundant
`AddItem` is actually skipped.

## Deferred / out of scope

- Reverse direction (subscriber borrowing the writer's `AddItem` handles): subscriber handles
  come from `SubscribeBulk`; no need.
- Unifying the two caches into one shared object (Approach 3).
- Any AdminUI / Commons / proto / EF change.

## Done =

Build clean + `dotnet test` (Driver.Galaxy) green + the live-gw test proves a subscribed-tag
write commits with zero `AddItem` → merge to master + push.