mxaccessgw/docs/design-decisions.md
2026-04-26 15:19:17 -04:00

# Design Decisions
This document records current v1 choices for the MXAccess gateway design. These
decisions can change, but implementation should follow them until a later design
update says otherwise.
## Source References
Use these local analysis sources when answering MXAccess-specific design or
implementation questions:
```text
C:\Users\dohertj2\Desktop\mxaccess
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Public-API.md
C:\Users\dohertj2\Desktop\mxaccess\docs\MXAccess-Reverse-Engineering.md
```
Use these local notes for Galaxy Repository SQL metadata:
```text
C:\Users\dohertj2\Desktop\lmxopcua\gr
```
## MXAccess COM Target
Decision: target the installed MXAccess COM interop surface directly from the
x86 worker.
Concrete COM details from the MXAccess analysis:
- Interop assembly:
`C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
- Assembly identity:
`ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae`
- COM class:
`ArchestrA.MxAccess.LMXProxyServerClass`
- CLSID:
`{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- ProgID:
`LMXProxy.LMXProxyServer.1`
- Version-independent ProgID:
`LMXProxy.LMXProxyServer`
- Registered server:
`C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll`
- Registry view:
`HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- Threading model:
`Apartment`
Rationale: `LMXProxyServer` is a 32-bit in-process COM server, so a 64-bit
process such as the .NET 10 x64 gateway cannot load it directly. The x86 sidecar
worker is the reliable path to parity.
Implementation guidance:
- Worker should reference `ArchestrA.MXAccess.dll`.
- Worker should instantiate `new LMXProxyServerClass()` on the dedicated STA.
- Worker should expose the resolved class, ProgID, CLSID, interop assembly
version, and `LmxProxy.dll` path through `GetWorkerInfo` / `WorkerReady`.
- Keep the ProgID/path configurable for diagnostics, but the default should be
the installed MXAccess class above.
## Session Reconnect
Decision: no reconnectable sessions for v1.
One `OpenSession` creates one gateway session and one worker process. The
session ends on `CloseSession`, a client disconnect (as governed by the
disconnect policy), lease expiry, a worker fault, or gateway shutdown.
Rationale: reconnectable sessions require event replay, orphan ownership,
security checks, and more complicated worker lifetime rules. They are not needed
for the first parity slice.
## Event Subscribers
Decision: one active `StreamEvents` subscriber per session for v1.
A second subscriber should be rejected with a clear session error. Multi-client
fan-out may be added later with explicit backpressure semantics.
Rationale: one subscriber preserves simple event ordering and failure behavior
while parity is being proven.
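
The single-subscriber rule can be sketched as a guarded flag on the session. This is an illustrative Python sketch only, not the gateway's implementation; `Session`, `SessionError`, `attach_subscriber`, and `detach_subscriber` are hypothetical names.

```python
import threading


class SessionError(Exception):
    """Hypothetical error type returned to a rejected subscriber."""


class Session:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._lock = threading.Lock()
        self._subscriber_active = False

    def attach_subscriber(self) -> None:
        """Claim the single StreamEvents slot or raise a clear session error."""
        with self._lock:
            if self._subscriber_active:
                raise SessionError(
                    f"session {self.session_id} already has an active "
                    "StreamEvents subscriber"
                )
            self._subscriber_active = True

    def detach_subscriber(self) -> None:
        """Release the slot when the stream ends."""
        with self._lock:
            self._subscriber_active = False
```

The lock keeps two concurrent `StreamEvents` calls from both claiming the slot.
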
## Authentication
Decision: API key authentication for the public gateway.
API keys are stored in a gateway-owned SQLite database. Store hashed API key
secrets only; never store raw key material.
Recommended client format:
```text
authorization: Bearer mxgw_<key-id>_<secret>
```
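
A minimal Python sketch of parsing this header follows. It assumes that `<key-id>` never contains an underscore (the format above does not say so) and that the scheme name is matched case-insensitively; `parse_authorization` and `ParsedApiKey` are hypothetical names.

```python
from typing import NamedTuple, Optional

KEY_PREFIX = "mxgw"  # literal prefix from the recommended client format


class ParsedApiKey(NamedTuple):
    key_id: str
    secret: str


def parse_authorization(header_value: str) -> Optional[ParsedApiKey]:
    """Return the key id and secret, or None if the header is malformed."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    # Assumption: the key id has no underscore, so split at most twice and
    # leave any further underscores inside the secret.
    parts = token.strip().split("_", 2)
    if len(parts) != 3 or parts[0] != KEY_PREFIX:
        return None
    _, key_id, secret = parts
    if not key_id or not secret:
        return None
    return ParsedApiKey(key_id, secret)
```

Only `key_id` should ever reach logs; the secret is passed straight to hash verification.
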
Recommended SQLite tables:
```sql
CREATE TABLE api_keys (
key_id TEXT PRIMARY KEY,
key_prefix TEXT NOT NULL,
secret_hash BLOB NOT NULL,
display_name TEXT NOT NULL,
scopes TEXT NOT NULL,
created_utc TEXT NOT NULL,
last_used_utc TEXT NULL,
revoked_utc TEXT NULL
);
CREATE TABLE api_key_audit (
audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
key_id TEXT NULL,
event_type TEXT NOT NULL,
remote_address TEXT NULL,
created_utc TEXT NOT NULL,
details TEXT NULL
);
```
Recommended scopes:
- `session:open`
- `session:close`
- `invoke:read`
- `invoke:write`
- `invoke:secure`
- `events:read`
- `metadata:read`
- `admin`
Hashing recommendation:
- Use HMAC-SHA256 with a gateway-local secret/pepper stored outside SQLite, or
use Argon2id if a suitable dependency is already accepted.
- Compare hashes using constant-time comparison.
- Log only the key id or prefix, not the raw key.
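
The HMAC-SHA256 option can be sketched with the Python standard library as below. This is an illustration under assumptions, not the gateway's code: `hash_secret` and `verify_secret` are hypothetical names, and the pepper is assumed to be loaded from somewhere outside SQLite by the caller.

```python
import hashlib
import hmac


def hash_secret(secret: str, pepper: bytes) -> bytes:
    """Compute the value stored in api_keys.secret_hash (32 bytes)."""
    return hmac.new(pepper, secret.encode("utf-8"), hashlib.sha256).digest()


def verify_secret(presented: str, stored_hash: bytes, pepper: bytes) -> bool:
    """Check a presented secret against the stored hash in constant time."""
    return hmac.compare_digest(hash_secret(presented, pepper), stored_hash)
```

`hmac.compare_digest` avoids the early exit of an ordinary `==` comparison, which is what makes the check constant-time with respect to the hash contents.
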
Storage recommendation:
- Default SQLite path should be under `ProgramData` or another configured
gateway data directory.
- Apply restrictive filesystem ACLs for the gateway service identity and
administrators.
- Require TLS when the gateway is reachable off-machine.
## Authorization
Decision: start with scope checks by command category.
Suggested mapping:
- `OpenSession`: `session:open`
- `CloseSession`: `session:close`
- `Register`, `Unregister`, `AddItem`, `AddItem2`, `RemoveItem`, `Advise`,
`UnAdvise`, `AdviseSupervisory`, `AddBufferedItem`,
`SetBufferedUpdateInterval`, `Suspend`, `Activate`: `invoke:read`
- `Write`, `Write2`: `invoke:write`
- `WriteSecured`, `WriteSecured2`, `AuthenticateUser`,
`ArchestrAUserToId`: `invoke:secure`
- `StreamEvents`: `events:read`
- Galaxy SQL metadata endpoints if added: `metadata:read`
- worker shutdown diagnostics and key management: `admin`
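
The command-to-scope mapping above could be held in a lookup table, as in this Python sketch. It assumes granted scopes arrive as a collection of strings, that unknown commands are denied, and that `admin` does not implicitly grant other scopes; none of these are stated in this document. The metadata and admin entries are omitted because their command names are not yet defined.

```python
from typing import Iterable

# Scope -> commands, copied from the suggested mapping.
_SCOPE_GROUPS = {
    "session:open": ["OpenSession"],
    "session:close": ["CloseSession"],
    "invoke:read": [
        "Register", "Unregister", "AddItem", "AddItem2", "RemoveItem",
        "Advise", "UnAdvise", "AdviseSupervisory", "AddBufferedItem",
        "SetBufferedUpdateInterval", "Suspend", "Activate",
    ],
    "invoke:write": ["Write", "Write2"],
    "invoke:secure": [
        "WriteSecured", "WriteSecured2", "AuthenticateUser",
        "ArchestrAUserToId",
    ],
    "events:read": ["StreamEvents"],
}

# Inverted view: command -> required scope.
REQUIRED_SCOPE = {
    command: scope
    for scope, commands in _SCOPE_GROUPS.items()
    for command in commands
}


def is_authorized(command: str, granted_scopes: Iterable[str]) -> bool:
    """Allow a command only if its required scope was granted."""
    required = REQUIRED_SCOPE.get(command)
    # Assumption: commands missing from the table are denied by default.
    return required is not None and required in set(granted_scopes)
```
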
## Worker Process Identity
Decision: run workers as the gateway service identity for v1.
Rationale: this avoids early COM/DCOM permission failures and keeps the first
implementation focused on MXAccess parity. The worker launcher should keep an
extension point for a restricted service account later.
## Event Backpressure
Decision: fail-fast bounded queues for v1 and parity testing.
If worker or gateway event queues fill, fault the session. Do not silently drop
or coalesce events in parity mode.
Rationale: event drops would hide parity defects. Production coalescing by item
handle can be added later as an explicit opt-in mode once event rates are
measured.
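
A minimal Python sketch of the fail-fast behavior follows, assuming a single in-memory bounded queue per session; `EventQueue` and `SessionFaulted` are hypothetical names, and the real worker and gateway queues may be structured differently.

```python
import queue


class SessionFaulted(Exception):
    """Hypothetical error raised once the session has been faulted."""


class EventQueue:
    def __init__(self, capacity: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self.faulted = False

    def publish(self, event: object) -> None:
        """Enqueue an event, faulting the session instead of dropping it."""
        if self.faulted:
            raise SessionFaulted("session already faulted")
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            # Parity mode: never drop or coalesce, fault the session.
            self.faulted = True
            raise SessionFaulted("event queue overflow") from None

    def take(self) -> object:
        """Dequeue the oldest event; raises queue.Empty if none is waiting."""
        return self._queue.get_nowait()
```

Events already queued remain in order, so a diagnostic dump after the fault still reflects what MXAccess delivered.
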
## Event-Rate Target
Decision: do not set a production event-rate target before measurement.
For v1, expose queue depth, event rate, stream send latency, and overflow
metrics. Keep bounded queues and fail-fast behavior. Use observed load from live
systems to set a later coalescing or scaling target.
## Command Batching
Decision: no public command batching for v1.
Use one command per request so replies, HRESULTs, status arrays, event ordering,
and failure behavior are easy to compare against direct MXAccess.
Batch tag registration can be added later if measured setup latency requires it.
## Graceful Worker Shutdown
Decision: best-effort cleanup before COM release.
During graceful shutdown, the worker should attempt:
1. `UnAdvise` for advised items.
2. `RemoveItem` for active item handles.
3. `Unregister` for active server handles.
4. Event detach.
5. COM release.
Failures during cleanup should be logged and kept for diagnostics, but the
gateway may still kill the worker once the shutdown timeout elapses.
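
The best-effort ordering can be sketched as a loop that runs every step, records failures, and never stops early. This Python sketch is illustrative only: `run_cleanup` is a hypothetical helper, each step is assumed to be a zero-argument callable supplied by the worker, and the gateway-side kill timeout is outside its scope.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("worker.shutdown")  # hypothetical logger name

CleanupStep = Tuple[str, Callable[[], None]]


def run_cleanup(steps: List[CleanupStep]) -> List[Tuple[str, Exception]]:
    """Run each step in order; log and collect failures without aborting."""
    failures: List[Tuple[str, Exception]] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # best effort: keep going to COM release
            log.warning("cleanup step %s failed: %r", name, exc)
            failures.append((name, exc))
    return failures
```

A caller would pass the five steps in the order listed above (`UnAdvise`, `RemoveItem`, `Unregister`, event detach, COM release) and attach the returned failures to the worker's final diagnostic report.
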
## OperationComplete
Decision: model and forward `OperationComplete` only when native MXAccess fires
it. Do not synthesize `OperationComplete` from writes, command replies, ASB
completion queues, or other status frames.
Rationale: the event signature is known, but the MXAccess analysis has not yet
captured the runtime condition that triggers the public event. Synthesizing it
would risk breaking parity.
## Buffered Data Change
Decision: include `OnBufferedDataChange` in the protocol and worker event
model, but treat multi-sample payload conversion as work that must be validated
against a live capture.
The event signature and native path are known, but a live buffered sample batch
has not yet been observed. Until one is captured, preserve the raw value,
quality, timestamp, data type, and status metadata whenever conversion is
incomplete.
## Completion-Only Status Mapping
Decision: preserve completion-only operation-status bytes as raw diagnostic
metadata unless native MXAccess raises a public event or the MXAccess analysis
proves an exact `MXSTATUS_PROXY[]` mapping.
Do not guess status category/source/detail values for frames that MXAccess does
not expose through its public COM events.
## API Key Administration
Decision: v1 API key management is a local administrative CLI/tool, not a
public admin API.
The tool should support:
- initialize auth database,
- create key,
- list keys without showing secrets,
- revoke key,
- rotate key,
- print the raw secret exactly once at creation.
Public gRPC key-management endpoints can be added later only behind `admin`
scope and TLS.
## SQLite Migrations
Decision: use simple startup migrations with a `schema_version` table.
Recommended table:
```sql
CREATE TABLE schema_version (
id INTEGER PRIMARY KEY CHECK (id = 1),
version INTEGER NOT NULL,
applied_utc TEXT NOT NULL
);
```
Migrations should be idempotent, run inside transactions, and fail gateway
startup if the database is newer than the running binary understands.
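
A startup migration runner along these lines can be sketched with Python's `sqlite3` module. The entries in `MIGRATIONS` are placeholders, not the gateway's real schema steps, and `migrate` is a hypothetical name. The sketch manages transactions explicitly so that each schema change and its version bump commit together.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA_VERSION_DDL = """
CREATE TABLE IF NOT EXISTS schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL,
    applied_utc TEXT NOT NULL
)
"""

# Placeholder migrations: list position + 1 is the schema version.
MIGRATIONS = [
    "CREATE TABLE IF NOT EXISTS example_a (id INTEGER PRIMARY KEY)",
    "CREATE TABLE IF NOT EXISTS example_b (id INTEGER PRIMARY KEY)",
]


def current_version(conn: sqlite3.Connection) -> int:
    row = conn.execute(
        "SELECT version FROM schema_version WHERE id = 1"
    ).fetchone()
    return row[0] if row else 0


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations; refuse to start on a newer database."""
    conn.isolation_level = None  # autocommit; transactions are explicit below
    conn.execute(SCHEMA_VERSION_DDL)
    version = current_version(conn)
    if version > len(MIGRATIONS):
        raise RuntimeError(
            f"database schema {version} is newer than supported "
            f"{len(MIGRATIONS)}"
        )
    for target in range(version + 1, len(MIGRATIONS) + 1):
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute(MIGRATIONS[target - 1])
            conn.execute(
                "INSERT OR REPLACE INTO schema_version "
                "(id, version, applied_utc) VALUES (1, ?, ?)",
                (target, datetime.now(timezone.utc).isoformat()),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return current_version(conn)
```

Running `migrate` a second time is a no-op because the stored version already equals the number of known migrations.
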
## Web Dashboard
Decision: host a basic gateway dashboard with Blazor Server and Bootstrap
CSS/JS.
The dashboard should show gateway health, active sessions, worker instances,
basic metrics, queue depths, and recent faults. It should update in real time
through Blazor Server component updates.
Allowed UI stack:
- Blazor Server,
- Bootstrap CSS,
- Bootstrap JavaScript,
- small local CSS.
Do not use MudBlazor or other Blazor UI component libraries for v1.
Dashboard access should require API-key-backed dashboard authentication with
`admin` scope when enabled. For local development, anonymous localhost access
may exist only behind an explicit configuration option that defaults to false.
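
The access rule can be sketched as a single decision function. This Python sketch assumes the caller has already resolved the API key's scopes and the remote address; `dashboard_access_allowed` and `allow_anonymous_localhost` are hypothetical names for the check and for the configuration option that defaults to false.

```python
import ipaddress
from typing import Iterable


def dashboard_access_allowed(
    remote_address: str,
    granted_scopes: Iterable[str],
    allow_anonymous_localhost: bool = False,
) -> bool:
    """Admit admin-scoped keys, or loopback clients if explicitly enabled."""
    if "admin" in set(granted_scopes):
        return True
    if not allow_anonymous_localhost:
        return False
    try:
        return ipaddress.ip_address(remote_address).is_loopback
    except ValueError:
        # Unparseable addresses are never treated as localhost.
        return False
```
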
## Later Revisit Items
These are explicit post-v1 revisit items, not open blockers:
- reconnectable sessions,
- multiple event subscribers per session,
- restricted worker service account,
- production coalescing by item handle,
- command batching for high-volume tag setup.