5e375f6d3d
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.

ReadBulk has no MXAccess analogue. The worker satisfies it by:

- Returning the last cached OnDataChange payload (was_cached=true)
  when the requested tag is already in the session's item registry
  AND advised. The existing subscription is NOT touched, since the
  caller did not create it.
- Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
  UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
  and leaving the session exactly as it was. The wait pumps Windows
  messages on the STA so the inbound MXAccess event can dispatch
  while the executor still holds the thread.

The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.

Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.

All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.

Tests added:

- MxAccessValueCacheTests (6 cases): Set/TryGet, Remove resets the
  version, TryWaitForUpdate signals on Set, pump step fires each poll.
- MxAccessBaseEventSinkTests: OnDataChange populates the cache,
  ValueCache property exposes the bound instance.
- MxAccessCommandExecutorTests: four bulk-write variants (per-entry
  success/failure, value+timestamp forwarding, secured user ids),
  ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
  was_successful=false), invalid-payload reply.
- GatewayGrpcScopeResolverTests: five new MxCommandKind cases.
- SessionManagerTests: WriteBulk and ReadBulk forwarding through
  FakeWorkerHarness; ReadBulk forwards timeout_ms.
- Per-client (.NET, Go, Rust, Python, Java): WriteBulk builds the
  right command and returns per-entry results, ReadBulk forwards the
  timeout and unpacks the was_cached flag.

Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.

Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
370 lines
12 KiB
Markdown
# Design Decisions

This document records current v1 choices for the MXAccess gateway design. These
decisions can change, but implementation should follow them until a later design
update says otherwise.

## Source References

Use these local analysis sources when answering MXAccess-specific design or
implementation questions:

```text
C:\Users\dohertj2\Desktop\mxaccess
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Public-API.md
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Reverse-Engineering.md
```

Use these local notes for Galaxy Repository SQL metadata:

```text
C:\Users\dohertj2\Desktop\lmxopcua\gr
```

## MXAccess COM Target

Decision: target the installed MXAccess COM interop surface directly from the
x86 worker.

Concrete COM details from the MXAccess analysis:

- Interop assembly:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
- Assembly identity:
  `ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae`
- COM class:
  `ArchestrA.MxAccess.LMXProxyServerClass`
- CLSID:
  `{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- ProgID:
  `LMXProxy.LMXProxyServer.1`
- Version-independent ProgID:
  `LMXProxy.LMXProxyServer`
- Registered server:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll`
- Registry view:
  `HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- Threading model:
  `Apartment`

Rationale: `LMXProxyServer` is a 32-bit in-process COM server, so a .NET 10 x64
gateway cannot instantiate it directly. The x86 sidecar worker is the reliable
parity path.

Implementation guidance:

- Worker should reference `ArchestrA.MXAccess.dll`.
- Worker should instantiate `new LMXProxyServerClass()` on the dedicated STA.
- Worker should expose the resolved class, ProgID, CLSID, interop assembly
  version, and `LmxProxy.dll` path through `GetWorkerInfo` / `WorkerReady`.
- Keep the ProgID/path configurable for diagnostics, but the default should be
  the installed MXAccess class above.

## Session Reconnect

Decision: no reconnectable sessions for v1.

One `OpenSession` creates one gateway session and one worker process. The
session ends on `CloseSession`, client disconnect policy, lease expiry, worker
fault, or gateway shutdown.

Rationale: reconnectable sessions require event replay, orphan ownership,
security checks, and more complicated worker lifetime rules. They are not needed
for the first parity slice.

## Event Subscribers

Decision: one active `StreamEvents` subscriber per session for v1.

A second subscriber should be rejected with a clear session error. Multi-client
fan-out may be added later with explicit backpressure semantics.

Rationale: one subscriber preserves simple event ordering and failure behavior
while parity is being proven.

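The single-subscriber rule can be illustrated with a small sketch. Python is used here only because it is one of the client languages; the gateway itself is .NET. The `Session` and `SessionError` names and the `attach_subscriber` / `detach_subscriber` methods are hypothetical and do not come from the gateway code.

```python
class SessionError(Exception):
    """Hypothetical error type for session-level rejections."""


class Session:
    """Tracks the single active StreamEvents subscriber for one session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._subscriber = None  # at most one active subscriber in v1

    def attach_subscriber(self, subscriber_id: str) -> None:
        # Reject a second subscriber instead of fanning out events.
        if self._subscriber is not None:
            raise SessionError(
                f"session {self.session_id} already has an active "
                "StreamEvents subscriber"
            )
        self._subscriber = subscriber_id

    def detach_subscriber(self, subscriber_id: str) -> None:
        # Only the current subscriber can release the slot.
        if self._subscriber == subscriber_id:
            self._subscriber = None
```

Once the first subscriber detaches, a new one can attach, which keeps event ordering tied to exactly one consumer at a time.
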
## Authentication

Decision: API key authentication for the public gateway.

API keys are stored in a gateway-owned SQLite database. Store hashed API key
secrets only; never store raw key material.

Recommended client format:

```text
authorization: Bearer mxgw_<key-id>_<secret>
```

Recommended SQLite tables:

```sql
CREATE TABLE api_keys (
    key_id TEXT PRIMARY KEY,
    key_prefix TEXT NOT NULL,
    secret_hash BLOB NOT NULL,
    display_name TEXT NOT NULL,
    scopes TEXT NOT NULL,
    created_utc TEXT NOT NULL,
    last_used_utc TEXT NULL,
    revoked_utc TEXT NULL
);

CREATE TABLE api_key_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    key_id TEXT NULL,
    event_type TEXT NOT NULL,
    remote_address TEXT NULL,
    created_utc TEXT NOT NULL,
    details TEXT NULL
);
```

Recommended scopes:

- `session:open`
- `session:close`
- `invoke:read`
- `invoke:write`
- `invoke:secure`
- `events:read`
- `metadata:read`
- `admin`

Hashing recommendation:

- Use HMAC-SHA256 with a gateway-local secret/pepper stored outside SQLite, or
  use Argon2id if a suitable dependency is already accepted.
- Compare hashes using constant-time comparison.
- Log only the key id or prefix, not the raw key.

Storage recommendation:

- Default SQLite path should be under `ProgramData` or another configured
  gateway data directory.
- Apply restrictive filesystem ACLs for the gateway service identity and
  administrators.
- Require TLS when the gateway is reachable off-machine.

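As a rough illustration of the HMAC-SHA256 option, the following Python sketch parses the recommended header format, hashes the secret with a pepper, and verifies it with a constant-time comparison. The function names are hypothetical. The sketch also assumes that the key id contains no underscore, which the format above does not state, so the first underscore after `mxgw_` is treated as the separator.

```python
import hashlib
import hmac


def parse_api_key(header_value: str) -> tuple[str, str]:
    """Split 'Bearer mxgw_<key-id>_<secret>' into (key_id, secret).

    Assumption: the key id contains no underscore, so the first underscore
    after the 'mxgw_' prefix separates it from the secret.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.startswith("mxgw_"):
        raise ValueError("not an mxgw bearer token")
    key_id, sep, secret = token[len("mxgw_"):].partition("_")
    if not key_id or not sep or not secret:
        raise ValueError("malformed mxgw token")
    return key_id, secret


def hash_secret(secret: str, pepper: bytes) -> bytes:
    # HMAC-SHA256 keyed with the gateway-local pepper (kept outside SQLite).
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify_secret(secret: str, stored_hash: bytes, pepper: bytes) -> bool:
    # compare_digest runs in constant time with respect to the contents.
    return hmac.compare_digest(hash_secret(secret, pepper), stored_hash)
```

Only `hash_secret` output would be written to `secret_hash`; the raw secret is never stored, and log lines would carry the parsed key id only.
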
## Authorization

Decision: start with scope checks by command category.

Suggested mapping:

- `OpenSession`: `session:open`
- `CloseSession`: `session:close`
- `Register`, `Unregister`, `AddItem`, `AddItem2`, `RemoveItem`, `Advise`,
  `UnAdvise`, `AdviseSupervisory`, `AddBufferedItem`,
  `SetBufferedUpdateInterval`, `Suspend`, `Activate`: `invoke:read`
- `Write`, `Write2`: `invoke:write`
- `WriteSecured`, `WriteSecured2`, `AuthenticateUser`,
  `ArchestrAUserToId`: `invoke:secure`
- `StreamEvents`: `events:read`
- Galaxy SQL metadata endpoints if added: `metadata:read`
- worker shutdown diagnostics and key management: `admin`

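The mapping above amounts to a lookup table from command name to required scope. A minimal Python sketch follows; the `REQUIRED_SCOPE` and `is_authorized` names are hypothetical, and denying unknown commands is an assumption rather than documented behavior. The sketch does not treat `admin` as a superset of the other scopes, since the mapping does not say so.

```python
# Command groups copied from the suggested mapping; names are illustrative.
_COMMANDS_BY_SCOPE = {
    "session:open": ["OpenSession"],
    "session:close": ["CloseSession"],
    "invoke:read": [
        "Register", "Unregister", "AddItem", "AddItem2", "RemoveItem",
        "Advise", "UnAdvise", "AdviseSupervisory", "AddBufferedItem",
        "SetBufferedUpdateInterval", "Suspend", "Activate",
    ],
    "invoke:write": ["Write", "Write2"],
    "invoke:secure": [
        "WriteSecured", "WriteSecured2", "AuthenticateUser",
        "ArchestrAUserToId",
    ],
    "events:read": ["StreamEvents"],
}

# Invert to command -> scope for constant-time lookup per request.
REQUIRED_SCOPE = {
    command: scope
    for scope, commands in _COMMANDS_BY_SCOPE.items()
    for command in commands
}


def is_authorized(command: str, granted_scopes: set[str]) -> bool:
    """Return True when the key's scopes include the command's scope."""
    required = REQUIRED_SCOPE.get(command)
    if required is None:
        return False  # assumption: unknown commands are denied
    return required in granted_scopes
```
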
## Worker Process Identity

Decision: run workers as the gateway service identity for v1.

Rationale: this avoids early COM/DCOM permission failures and keeps the first
implementation focused on MXAccess parity. The worker launcher should keep an
extension point for a restricted service account later.

## Event Backpressure

Decision: fail-fast bounded queues for v1 and parity testing.

If worker or gateway event queues fill, fault the session. Do not silently drop
or coalesce events in parity mode.

Rationale: event drops would hide parity defects. Production coalescing by item
handle can be added later as an explicit opt-in mode once event rates are
measured.

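A fail-fast bounded queue can be sketched in a few lines of Python. The `EventQueue` and `SessionFaulted` names are hypothetical; the point is only that an overflow faults the session rather than dropping or coalescing the event.

```python
import queue


class SessionFaulted(Exception):
    """Hypothetical signal that the owning session must be torn down."""


class EventQueue:
    """Bounded event queue that faults instead of dropping events."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self.faulted = False

    def publish(self, event: object) -> None:
        if self.faulted:
            raise SessionFaulted("session already faulted")
        try:
            # put_nowait never blocks; a full queue raises queue.Full.
            self._queue.put_nowait(event)
        except queue.Full:
            self.faulted = True
            raise SessionFaulted("event queue overflow") from None

    def depth(self) -> int:
        # Approximate depth, suitable for a queue-depth metric.
        return self._queue.qsize()
```

Events already queued before the overflow stay in the queue, so nothing is silently discarded while the fault is reported.
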
## Event-Rate Target

Decision: do not set a production event-rate target before measurement.

For v1, expose queue depth, event rate, stream send latency, and overflow
metrics. Keep bounded queues and fail-fast behavior. Use observed load from live
systems to set a later coalescing or scaling target.

## Command Batching

Decision: no public command batching for v1.

Use one command per request so replies, HRESULTs, status arrays, event ordering,
and failure behavior are easy to compare against direct MXAccess.

Batch tag registration can be added later if measured setup latency requires it.

## Bulk Command Family

Decision: the gateway exposes a fixed set of *bulk* command kinds
(`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`,
`SubscribeBulk`, `UnsubscribeBulk`, `WriteBulk`, `Write2Bulk`,
`WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) that carry a list of
entries in one round-trip and return one result per entry. Each command kind
runs the corresponding single-item MXAccess COM call sequentially on the
worker STA; per-entry failures populate `was_successful = false` with the
underlying HRESULT and never throw. There is no transactional or fail-fast
semantic: bulk here means "one round-trip, per-entry results", not
"atomic".

Rationale: MXAccess COM itself has no native bulk API for any of these
operations. Surfacing the per-entry result list keeps parity transparent:
the caller sees the same per-item HRESULT they would see calling MXAccess
N times directly, while the bulk shape collapses the gateway/IPC overhead
to one round-trip per batch and lets the worker keep the STA hot.

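The "one round-trip, per-entry results" shape can be sketched as a sequential loop that converts each failure into a result record. This Python sketch is illustrative only: `ComError`, `run_bulk_write`, and the `index` / `hresult` fields on `BulkWriteResult` are assumptions, and the real worker runs the loop in .NET on the STA.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class ComError(Exception):
    """Hypothetical stand-in for a failed MXAccess COM call."""

    def __init__(self, hresult: int) -> None:
        super().__init__(f"HRESULT 0x{hresult:08X}")
        self.hresult = hresult


@dataclass
class BulkWriteResult:
    index: int            # position of the entry in the request
    was_successful: bool
    hresult: int          # 0 on success, the failing HRESULT otherwise


def run_bulk_write(
    entries: Iterable[object],
    write_one: Callable[[object], None],
) -> list[BulkWriteResult]:
    """Run write_one per entry in order; never raise for per-entry failures."""
    results = []
    for index, entry in enumerate(entries):
        try:
            write_one(entry)
            results.append(BulkWriteResult(index, True, 0))
        except ComError as exc:
            # A failed entry does not stop the loop: no fail-fast, no rollback.
            results.append(BulkWriteResult(index, False, exc.hresult))
    return results
```

The result list always has one element per request entry, in request order, so a caller can match results to entries by position.
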
`ReadBulk` is the only bulk command without a 1:1 MXAccess analogue. Two
choices were considered:

1. **Cache-then-snapshot** (chosen): when a requested tag is already in the
   session's item registry AND advised, the worker returns the last cached
   `OnDataChange` value without touching the subscription
   (`was_cached = true`). Otherwise it takes the full `AddItem + Advise +
   wait-for-first-OnDataChange + UnAdvise + RemoveItem` lifecycle itself
   (`was_cached = false`) and leaves the session exactly as it was before
   the call. The cache lives on a per-session `MxAccessValueCache`,
   populated by `MxAccessBaseEventSink` on every `OnDataChange` after the
   event clears the outbound queue.

2. **Always-snapshot**: take the AddItem-through-RemoveItem lifecycle for
   every requested tag. Cleaner conceptually but pays the full lifecycle
   cost on every call and would interfere with existing subscriptions if
   MXAccess reuses item handles.

The chosen behavior matches what callers actually want from "current
value": a free read of an already-streaming tag, and a one-shot snapshot
otherwise. It never disturbs subscriptions the caller did not create.
The decision intentionally does NOT synthesize an `OnDataChange` event
from the snapshot path: the snapshot value reaches the caller through
`ReadBulk`'s reply payload only, not through the event stream. This
preserves the "Don't synthesize events" rule that scopes the rest of the
worker.

`ReadBulk`'s wait loop pumps Windows messages on the worker STA
(`StaRuntime.PumpPendingMessages`) on every poll iteration so the inbound
MXAccess COM event can dispatch while the bulk executor still holds the
thread. Without the pump, the `OnDataChange` event would never be delivered.

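The cache-then-snapshot flow can be sketched against a fake MXAccess object. Everything in this Python sketch is hypothetical: `FakeMxAccess`, `read_bulk`, the dictionary cache, and using the tag name as the item handle. The `pump` method stands in for the STA message pump. Returning an unsuccessful result for a tag that is advised but has no cached value yet is an assumption; the text above does not cover that case.

```python
import time
from dataclasses import dataclass


@dataclass
class BulkReadResult:
    tag: str
    was_successful: bool
    was_cached: bool
    value: object = None


class FakeMxAccess:
    """Stand-in for the COM object; records calls, delivers values on pump."""

    def __init__(self, values: dict) -> None:
        self.values = values
        self.calls = []
        self.pending = []  # advised handles awaiting their first event

    def add_item(self, tag):
        self.calls.append(("AddItem", tag))
        return tag  # the tag doubles as the handle in this sketch

    def advise(self, handle):
        self.calls.append(("Advise", handle))
        self.pending.append(handle)

    def unadvise(self, handle):
        self.calls.append(("UnAdvise", handle))

    def remove_item(self, handle):
        self.calls.append(("RemoveItem", handle))

    def pump(self, cache: dict) -> None:
        # Message-pump stand-in: dispatch queued OnDataChange into the cache.
        for handle in self.pending:
            if handle in self.values:
                cache[handle] = self.values[handle]
        self.pending.clear()


def read_bulk(tags, advised, cache, mx, timeout_s=0.05):
    results = []
    for tag in tags:
        if tag in advised:
            # Existing subscription: serve from cache, never touch it.
            ok = tag in cache
            results.append(BulkReadResult(tag, ok, ok, cache.get(tag)))
            continue
        handle = mx.add_item(tag)
        mx.advise(handle)
        try:
            deadline = time.monotonic() + timeout_s
            while handle not in cache and time.monotonic() < deadline:
                mx.pump(cache)  # let the inbound event dispatch
            ok, value = handle in cache, cache.get(handle)
        finally:
            mx.unadvise(handle)
            mx.remove_item(handle)
            cache.pop(handle, None)  # evict so a reused handle is not stale
        results.append(BulkReadResult(tag, ok, False, value))
    return results
```

A timeout on the snapshot path surfaces as `was_successful = false` for that entry only, and the cleanup in `finally` leaves the session as it was.
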
## Graceful Worker Shutdown

Decision: best-effort cleanup before COM release.

During graceful shutdown, the worker should attempt:

1. `UnAdvise` for advised items.
2. `RemoveItem` for active item handles.
3. `Unregister` for active server handles.
4. Event detach.
5. COM release.

Failures during cleanup should be logged and preserved diagnostically, but the
gateway may still kill the worker after shutdown timeout.

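Best-effort cleanup means each step runs even if an earlier one failed. A minimal Python sketch of that pattern follows; `graceful_shutdown`, the logger name, and the `(name, callable)` step shape are hypothetical.

```python
import logging

logger = logging.getLogger("worker.shutdown")


def graceful_shutdown(steps):
    """Run each (name, action) step in order; log failures and keep going.

    Returns the list of (name, exception) pairs so failures can be
    preserved for diagnostics after shutdown completes.
    """
    failures = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # best effort: no step may abort cleanup
            logger.warning("cleanup step %s failed: %s", name, exc)
            failures.append((name, exc))
    return failures
```

The caller would pass the five steps in the order listed above, so COM release still runs when, for example, `RemoveItem` fails on a stale handle.
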
## OperationComplete

Decision: model and forward `OperationComplete` only when native MXAccess fires
it. Do not synthesize `OperationComplete` from writes, command replies, ASB
completion queues, or other status frames.

Rationale: the event signature is known, but the MXAccess analysis has not yet
captured the runtime condition that triggers the public event. Synthesizing it
would risk breaking parity.

## Buffered Data Change

Decision: include `OnBufferedDataChange` in the protocol and worker event
model, but treat multi-sample payload conversion as capture-validated work.

The event signature and native path are known. A live buffered sample batch has
not yet been observed. Until then, preserve raw value, quality, timestamp, data
type, and status metadata whenever conversion is incomplete.

## Completion-Only Status Mapping

Decision: preserve completion-only operation-status bytes as raw diagnostic
metadata unless native MXAccess raises a public event or the MXAccess analysis
proves an exact `MXSTATUS_PROXY[]` mapping.

Do not guess status category/source/detail values for frames that MXAccess does
not expose through its public COM events.

## API Key Administration

Decision: v1 API key management is a local administrative CLI/tool, not a
public admin API.

The tool should support:

- initialize auth database,
- create key,
- list keys without showing secrets,
- revoke key,
- rotate key,
- print the raw secret exactly once at creation.

Public gRPC key-management endpoints can be added later only behind `admin`
scope and TLS.

## SQLite Migrations

Decision: use simple startup migrations with a `schema_version` table.

Recommended table:

```sql
CREATE TABLE schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL,
    applied_utc TEXT NOT NULL
);
```

Migrations should be idempotent, run inside transactions, and fail gateway
startup if the database is newer than the running binary understands.

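One way to realize these rules is a startup routine that reads the stored version, refuses to run against a newer database, and applies each pending migration in its own transaction. The Python sketch below uses the standard `sqlite3` module; `current_version`, `migrate`, and the list-of-DDL-strings migration format are hypothetical. It assumes the connection was opened with `isolation_level=None` so that the explicit `BEGIN` / `COMMIT` statements control the transaction.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA_VERSION_DDL = """
CREATE TABLE IF NOT EXISTS schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL,
    applied_utc TEXT NOT NULL
)
"""


def current_version(conn: sqlite3.Connection) -> int:
    """Return the stored schema version, or 0 for a fresh database."""
    conn.execute(SCHEMA_VERSION_DDL)
    row = conn.execute(
        "SELECT version FROM schema_version WHERE id = 1"
    ).fetchone()
    return row[0] if row else 0


def migrate(conn: sqlite3.Connection, migrations: list[str]) -> int:
    """Apply pending migrations; migrations[i] upgrades to version i + 1."""
    version = current_version(conn)
    if version > len(migrations):
        # Fail startup rather than run against an unknown newer schema.
        raise RuntimeError("database schema is newer than this binary")
    for target in range(version + 1, len(migrations) + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(migrations[target - 1])
            conn.execute(
                "INSERT OR REPLACE INTO schema_version "
                "(id, version, applied_utc) VALUES (1, ?, ?)",
                (target, datetime.now(timezone.utc).isoformat()),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # schema and version stay consistent
            raise
    return len(migrations)
```

Running `migrate` twice with the same list is a no-op the second time, because the stored version already equals the number of migrations.
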
## Web Dashboard

Decision: host a basic gateway dashboard with Blazor Server and Bootstrap
CSS/JS.

The dashboard should show gateway health, active sessions, worker instances,
basic metrics, queue depths, and recent faults. It should update in real time
through Blazor Server component updates.

Allowed UI stack:

- Blazor Server,
- Bootstrap CSS,
- Bootstrap JavaScript,
- small local CSS.

Do not use MudBlazor or other Blazor UI component libraries for v1.

Dashboard access should require API-key-backed dashboard authentication with
`admin` scope when enabled. For local development, anonymous localhost access
is enabled by default through `Dashboard:AllowAnonymousLocalhost`; the bypass is
limited to loopback requests.

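The loopback restriction on the anonymous bypass can be expressed as a small predicate. This Python sketch is illustrative only; `allow_anonymous` is a hypothetical name, and the real check would live in the .NET dashboard host.

```python
import ipaddress


def allow_anonymous(remote_address: str, allow_anonymous_localhost: bool) -> bool:
    """Permit anonymous dashboard access only for loopback callers.

    allow_anonymous_localhost mirrors the Dashboard:AllowAnonymousLocalhost
    setting; when it is off, every caller must authenticate.
    """
    if not allow_anonymous_localhost:
        return False
    try:
        return ipaddress.ip_address(remote_address).is_loopback
    except ValueError:
        # Unparseable addresses never qualify for the bypass.
        return False
```
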
## Later Revisit Items

These are explicit post-v1 revisit items, not open blockers:

- reconnectable sessions,
- multiple event subscribers per session,
- restricted worker service account,
- production coalescing by item handle,
- command batching for high-volume tag setup.

## Related Documentation

- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
- [MXAccess Worker Instance Detailed Design](./MxAccessWorkerInstanceDesign.md)
- [Authentication](./Authentication.md)
- [Authorization](./Authorization.md)
- [Galaxy Repository](./GalaxyRepository.md)