Files
lmxopcua/docs/plans/2026-06-18-galaxy-writer-handle-sharing-design.md
T

139 lines
7.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Galaxy writer ⇄ subscription-registry item-handle sharing — Design
> Brainstormed 2026-06-18. Backlog item: `stillpending.md` §2.4 — "Galaxy — writer
> item-handle cache not shared with the subscription registry"
> (`Driver.Galaxy/Runtime/GatewayGalaxyDataWriter.cs`). Off master `70e1bde9`.
## Problem (verified in code)
The Galaxy data writer's item-handle cache is **already shipped and reconnect-wired**:
`GatewayGalaxyDataWriter` holds `_itemHandles` (fullRef → MXAccess hItem) plus
`_supervisedHandles`, and `GalaxyDriver.ReopenAsync` invalidates them after a session
recreate (`GalaxyDriver.cs:298`, commits `f05b5d79` / `f77488ee`). So "add a writer cache"
is **not** the open work.
What is open is exactly what the backlog says: the writer cache is **isolated from the
`SubscriptionRegistry`**. The registry already records `TagBinding(FullReference, ItemHandle)`
for every live subscription, and — the load-bearing fact — both subscribe-handles (from
`SubscribeBulk`) and write-handles (from the writer's `AddItem`) are issued against the **same
MXAccess server registration** (`GalaxyMxSession.ServerHandle`). Within one registration an
hItem is shared across the AddItem / Advise / Write call paths, so a handle the gateway returned
for a subscription is reusable by a `Write`.
Today the writer never consults the registry, so the **first write to an already-subscribed tag
pays a redundant `AddItem` round-trip** (subsequent writes hit the writer's own `_itemHandles`).
This is an `_Optimization._` — a first-write-only latency saving, plus unification of handle
provenance (one fewer source of truth for "which hItem is this tag").
## Load-bearing premise (proven live, not assumed)
> *A subscription's item handle is usable for a no-login supervisory `Write` that commits.*
If this were false, borrowing the subscribed handle would **regress writes** for subscribed
tags. The MXAccess Toolkit semantics say it holds (one hItem per item under a registration,
shared across Advise/Write), but this phase does **not** assume it — the live-gw test is the
**merge gate**, not decoration.
## Approach
Three options were considered:
1. **Delegate seam (chosen).** The writer ctor gains an optional
`Func<string, int?>? subscribedHandleSource = null`; `GalaxyDriver` wires it to the
registry's new `TryResolveItemHandle`. This is symmetric with the writer's existing
`securityResolver` delegate, preserves the writer's "independently testable" property
(`null` ⇒ today's exact behavior), and is the smallest surface.
2. **Direct `SubscriptionRegistry` reference** in the writer. Same assembly, registry is
test-constructible — but couples the writer to a concrete collaborator with no upside over (1).
3. **Unified shared cache object** both sides read/write. Matches the literal "wire bindings
into `_itemHandles`" phrasing but is the most invasive (changes the writer's ownership
model). YAGNI.
**Approach 1.** AdminUI untouched. No Commons/proto/EF/migration change. No bUnit.
## Resolution rule (self-healing — no stale-handle regression)
`GatewayGalaxyDataWriter.EnsureItemHandleAsync(fullRef)` becomes:
1. `_itemHandles` hit (the writer `AddItem`'d this tag itself before) → use it.
2. else `subscribedHandleSource?.Invoke(fullRef)` returns a **live** handle → use it, but
**do NOT store it in `_itemHandles`**. The registry owns the borrowed handle's lifecycle
(including reconnect `Rebind`), so after a reconnect the writer always re-borrows the *fresh*
handle on the next write — there is no stale-cache window introduced by the borrow.
3. else `AddItem` + store in `_itemHandles` (today's path), incrementing an `AddItemCallCount`
test seam.
`AdviseSupervisory` is unchanged: the writer still supervisory-advises the (possibly borrowed)
hItem once per handle via `_supervisedHandles`. Borrowing only skips the `AddItem` round-trip —
which is the entire point of the optimization. The borrowed item already carries the
subscriber's data-change advise; supervisory advise is an additional mode on the same item.
The decision in steps 12 is extracted into a synchronous internal seam
`TryResolveCachedOrBorrowed(fullRef) -> int?` so it is unit-testable **without a live session**
(the SDK `MxGatewaySession` is sealed with an internal ctor and cannot be faked — see Testing).
`EnsureItemHandleAsync` calls the seam first and only `AddItem`s on a null result.
## Registry change
`SubscriptionRegistry` gains a forward lookup:
- A `ConcurrentDictionary<string, int> _itemHandleByFullRef` (`StringComparer.OrdinalIgnoreCase`,
matching the writer's cache), maintained incrementally in `Register` / `Rebind` (add the
`fullRef → handle` for each `binding.ItemHandle > 0`) and best-effort dropped in `Remove` /
`Rebind`.
- `public int? TryResolveItemHandle(string fullRef)` that returns the mapped handle **only if
`_subscribersByItemHandle` still contains it** — a liveness guard. This means even a lingering
forward-map entry can never hand out a dead handle, because `_subscribersByItemHandle` is the
already-authoritative live-handle set. The guard de-risks the removal bookkeeping: the forward
map is an index, not the source of truth.
The event-dispatch hot path (`ResolveSubscribers`) is **untouched** — the forward map is consulted
only on writes (rare relative to events) and mutated only on subscribe/unsubscribe/rebind (which
already do O(bindings) work).
`GalaxyDriver` passes `_subscriptions.TryResolveItemHandle` as the writer's
`subscribedHandleSource` when it constructs the production writer.
## Error handling
Unchanged. A borrowed handle that fails a write surfaces its own Bad status via the existing
`TranslateReply` path; the writer does not cache borrowed handles, so the next write simply
re-resolves (re-borrow if still live, else `AddItem`). `TryResolveItemHandle` returns `null`
cleanly for unknown / dead handles.
## Testing
**Unit (xUnit + Shouldly, `Driver.Galaxy.Tests`) — no live session required:**
- `GatewayGalaxyDataWriter`: seed `subscribedHandleSource``TryResolveCachedOrBorrowed`
returns the borrowed handle and leaves `CachedItemHandleCount == 0`; a `_itemHandles` hit wins
over the source; a `null` source ⇒ no borrow (returns null). (Mirrors the existing
`SeedHandleCachesForTest` / count-seam pattern, since the gw session cannot be faked.)
- `SubscriptionRegistry`: `TryResolveItemHandle` returns the handle after `Register`; returns
`null` after `Remove`; returns the **fresh** handle after `Rebind`; the liveness guard returns
`null` when the handle is absent from `_subscribersByItemHandle`.
**Live gw (the merge gate)** — extend `GatewayGalaxyLiveReopenAndWriteTests` (skip-gated; runs
against `MXGW_ENDPOINT=http://10.100.0.48:5120` with the key sourced via the durable
`docker exec … printenv GALAXY_MXGW_API_KEY` recipe, never echoed). Add an internal
`AddItemCallCount` seam on the writer, then:
- Subscribe a real tag (registry now holds its hItem) → write it through the registry-wired
writer → assert the write **commits** (Good status + value persists) **and `AddItemCallCount == 0`**.
- Control: write a **non-subscribed** tag → assert `AddItemCallCount == 1`.
This proves both halves: the borrowed handle is write-usable (premise holds) **and** the redundant
`AddItem` is actually skipped.
## Deferred / out of scope
- Reverse direction (subscriber borrowing the writer's `AddItem` handles) — subscriber handles
come from `SubscribeBulk`; no need.
- Unifying the two caches into one shared object (Approach 3).
- Any AdminUI / Commons / proto / EF change.
## Done =
Build clean + `dotnet test` (Driver.Galaxy) green + the live-gw test proves a subscribed-tag
write commits with zero `AddItem` → merge to master + push.