d692232191
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

* IDashboardEventBroadcaster: abstraction so EventStreamService doesn't
  take a direct dependency on SignalR.
* DashboardEventBroadcaster: singleton implementation that hands the
  SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
  logged at debug and dropped so the source gRPC stream is never
  blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:

* implements IAsyncDisposable
* opens a second HubConnection via DashboardHubConnectionFactory targeting
  /hubs/events
* calls SubscribeSession(SessionId) on Start
* renders the most recent 50 events in a small table (worker seq, family,
  server/item handle, alarm reference when the event is OnAlarmTransition)
* shows a live/offline conn-pill driven by HubConnection.Closed /
  Reconnected events

The dashboard mirror is intentionally passive: events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

* CLAUDE.md: Authentication section now explains LDAP bind +
  GroupToRole + HubToken bearer flow.
* gateway.md: Dashboard section: root-mounted routes, snapshot/alarms/
  events SignalR hubs, LDAP cookie + bearer scheme.
* docs/GatewayConfiguration.md: drop PathBase / RequireAdminScope rows,
  add GroupToRole row, append "Authorization policies" and "SignalR hubs"
  subsections describing the three policies and the /hubs/* endpoints.
* docs/GatewayDashboardDesign.md: hosting model (root mount, new
  endpoint layout), Realtime Updates rewritten as a hub table
  (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
  and routing), Authentication And Authorization rewritten around LDAP +
  role mapping + the hub bearer flow, Configuration block updated.
* docs/GatewayProcessDesign.md: security-section dashboard paragraph
  and the example config block both refreshed to LDAP/role auth.
* docs/ImplementationPlanGateway.md: dashboard-auth deliverable list
  updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
  API-key login flow).
* docs/GatewayTesting.md: DashboardLdapLiveTests blurb describes the
  GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
  RequiredGroup default; success-path assertion explains the role-claim
  check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass, including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1060 lines
32 KiB
Markdown
# MXAccess Gateway Design

## Goal

Provide full MXAccess parity to modern clients without forcing those clients to
load MXAccess COM, run as x86, or own an STA message pump.

The gateway must preserve MXAccess behavior first:

- public MXAccess command semantics,
- native MXAccess event families,
- STA/message-pump delivery behavior,
- installed-provider quirks,
- HRESULT/status/value marshaling,
- per-client isolation.

`MxAsbClient` and the managed NMX client remain useful future acceleration
paths, but they should not define the parity contract. The installed MXAccess
COM component is the compatibility baseline.

## Architecture

Use a .NET 10 C# gateway for external clients and per-session .NET Framework
4.8 x86 C# worker processes for MXAccess.

```text
client
  -> gRPC over TCP
  -> .NET 10 x64 gateway
       -> session manager
       -> per-session .NET Framework 4.8 x86 worker process
            -> dedicated STA thread
            -> MXAccess COM instance
            -> Windows/COM message pump
            -> command queue
            -> event sink
```

The worker does not host gRPC. The gateway talks to workers through a small
local IPC protocol. Named pipes with protobuf-framed messages are the default
transport.

Detailed follow-up docs:

- `docs/GatewayProcessDesign.md` covers the .NET 10 gateway process,
  session manager, worker supervision, gRPC API, event streaming, fault model,
  security, observability, and test strategy.
- `docs/WorkerFrameProtocol.md` covers the gateway-side named-pipe frame
  reader/writer and `WorkerEnvelope` validation rules.
- `docs/WorkerProcessLauncher.md` covers worker executable validation, process
  launch arguments, nonce handling, and startup cleanup behavior.
- `docs/MxAccessWorkerInstanceDesign.md` covers each .NET Framework 4.8 x86
  MXAccess worker instance, including STA ownership, message pumping, COM
  lifetime, command dispatch, event sinks, conversion, and shutdown.
- `docs/DesignDecisions.md` records current v1 choices, including API-key
  authentication in gateway-owned SQLite and the concrete installed MXAccess
  COM class details from `C:\Users\dohertj2\Desktop\mxaccess`.
- `docs/GatewayDashboardDesign.md` covers the Blazor Server and Bootstrap
  dashboard for live gateway/session/worker status.
- `docs/ClientLibrariesDesign.md` covers shared design requirements for
  official gRPC client libraries, test CLIs, and tests for .NET C#, Go, Rust,
  Python, and Java.
- `docs/ImplementationPlanIndex.md` links the detailed implementation plans
  and recommended Gitea milestones/issues.
- `docs/GalaxyRepository.md` covers the read-only Galaxy Repository browse
  RPCs that let clients enumerate the deployed object hierarchy and dynamic
  attributes before subscribing via the MXAccess gateway service.

Implementation style guides:

- `StyleGuide.md` covers project documentation.
- `docs/style-guides/CSharpStyleGuide.md` covers gateway, worker, .NET client,
  and C# tests.
- `docs/style-guides/ProtobufStyleGuide.md` covers public gRPC and worker IPC
  contracts.
- `docs/style-guides/GoStyleGuide.md` covers the Go client.
- `docs/style-guides/RustStyleGuide.md` covers the Rust client.
- `docs/style-guides/PythonStyleGuide.md` covers the Python client.
- `docs/style-guides/JavaStyleGuide.md` covers the Java client.

## Process Split

### Gateway Process

Runtime:

- .NET 10
- C#
- x64 preferred
- ASP.NET Core gRPC server

Responsibilities:

- expose the public TCP/gRPC API,
- authenticate/authorize remote clients if needed,
- create one worker per client session,
- route commands to the owning worker,
- stream worker events to the owning client,
- enforce session leases, heartbeats, timeouts, and quotas,
- kill/restart workers when they hang or crash,
- collect metrics and structured logs,
- optionally route selected future operations to ASB or managed NMX only after
  parity tests prove equivalent behavior.

The gateway must never instantiate or call MXAccess directly.

The gateway observability foundation lives in `ZB.MOM.WW.MxGateway.Server.Diagnostics`
and `ZB.MOM.WW.MxGateway.Server.Metrics`. Structured logging scopes carry session,
worker, correlation, command, and client identity fields with redaction applied
before values enter log state. `GatewayMetrics` exposes counters, gauges, and
histograms through .NET `Meter` and a snapshot API that dashboard services can
project without binding to a metrics exporter.
`DashboardSnapshotService` projects sessions, workers, metrics, faults, and
effective configuration into immutable DTOs for read-only dashboard rendering.
The Blazor Server dashboard mounts at the host root and renders those snapshots
at `/`, `/sessions`, `/workers`, `/events`, `/galaxy`, `/alarms`, `/apikeys`,
and `/settings`. Pages connect to `/hubs/snapshot` (a SignalR hub published by
`DashboardSnapshotPublisher`) and refresh on every push instead of polling.
`/hubs/alarms` broadcasts `AlarmFeedMessage` values from the central alarm
monitor; `/hubs/events` mirrors per-session `MxEvent` traffic from
`EventStreamService` to clients subscribed to `session:{id}`. The dashboard
uses local Bootstrap CSS and JavaScript plus a small local stylesheet; it does
not use a Blazor UI component library.

`/browse` walks the `IGalaxyHierarchyCache` tree and reads subscribed tag
values live through `IDashboardLiveDataService`, which owns one shared,
lazily-opened gateway session for the whole dashboard. `/alarms` reads the
central alarm monitor's in-process cache directly. See
`docs/GatewayDashboardDesign.md`.

The gateway runs an always-on central alarm monitor (`GatewayAlarmMonitor`):
one gateway-owned worker session subscribes the configured AVEVA alarm
provider, caches the active-alarm set (reconciled periodically against the
worker's snapshot), and fans it out to every client through the session-less
`StreamAlarms` RPC. The stream opens with the current active-alarm snapshot,
then streams live transitions. `AcknowledgeAlarm` is session-less and routes
through the monitor. Clients never open a worker session to see alarms, and
alarm monitoring is independent of client lifecycle; the monitor re-opens its
session if the worker faults. Gated by `MxGateway:Alarms:Enabled`; see
`docs/DesignDecisions.md` for why this reverses the v1 single-subscriber rule
for the alarm subsystem.

Dashboard authentication is LDAP-backed (distinct from the API-key model on
the gRPC API). `/login` accepts username and password in a form body, binds
against `MxGateway:Ldap`, maps the user's LDAP groups to `Admin` or `Viewer`
via `MxGateway:Dashboard:GroupToRole`, and issues an HTTP-only secure
`__Host-MxGatewayDashboard` cookie. `/logout` clears it. Login and logout
posts validate antiforgery tokens. SignalR hub connections accept either the
cookie or a 30-minute data-protected bearer minted at `/hubs/token`.
`MxGateway:Dashboard:AllowAnonymousLocalhost` permits loopback to bypass the
cookie requirement; remote requests always require an authenticated principal
with at least the Viewer role. Setting `MxGateway:Dashboard:Enabled` to
`false` leaves the dashboard and hub routes unmapped.

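The group-to-role mapping and the loopback bypass described above can be sketched roughly as follows. This is an illustrative Python model, not the gateway's C# code: `resolve_role`, `may_view`, and `ROLE_RANK` are hypothetical names, and the rules that the highest-ranked role wins and that group names match case-sensitively are assumptions the source does not state.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical ranking; the source names only the Admin and Viewer roles.
ROLE_RANK = {"Viewer": 1, "Admin": 2}


def resolve_role(groups: Iterable[str],
                 group_to_role: Mapping[str, str]) -> Optional[str]:
    """Return the highest-ranked role granted by any of the user's groups.

    Assumption: when several groups map to roles, the highest role wins.
    """
    roles = [group_to_role[g] for g in groups if g in group_to_role]
    known = [r for r in roles if r in ROLE_RANK]
    return max(known, key=ROLE_RANK.__getitem__, default=None)


def may_view(role: Optional[str], is_loopback: bool,
             allow_anonymous_localhost: bool) -> bool:
    """Loopback may bypass the cookie; remote callers need at least Viewer."""
    if is_loopback and allow_anonymous_localhost:
        return True
    return role is not None and ROLE_RANK.get(role, 0) >= ROLE_RANK["Viewer"]
```

With a mapping such as `{"GwAdmin": "Admin"}`, a user in `GwAdmin` resolves to `Admin`, while a user in no mapped group resolves to no role and is rejected unless the loopback bypass applies.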
### Worker Process

Runtime:

- .NET Framework 4.8
- C#
- x86 build by default

Responsibilities:

- own one MXAccess COM instance,
- create and preserve one dedicated STA thread,
- pump Windows/COM messages on that STA thread,
- execute every MXAccess method call on that STA thread,
- subscribe to MXAccess COM events,
- convert command results and events into internal protobuf DTOs,
- send events back to the gateway over the worker pipe,
- shut down cleanly on request,
- terminate quickly when the gateway kills the process.

The worker should be disposable. If MXAccess leaks state, faults, or wedges the
STA, the gateway can kill the process without corrupting other clients.

## Why Not gRPC In The Worker

.NET Framework 4.8 does not have the same first-class gRPC stack as .NET 10.
For the worker, a custom local protocol is simpler and more predictable:

- named pipes are Windows-native,
- no HTTP/2 requirement,
- fewer dependencies in the x86 process,
- easier process lifetime control,
- easier framed binary protocol,
- sufficient throughput for command and event traffic.

The public API can still be modern gRPC because the gateway runs on .NET 10.

## Worker IPC

Default transport: one bidirectional named pipe per worker.

Pipe name:

```text
mxaccess-gateway-{gatewayProcessId}-{sessionId}
```

Message framing:

```text
uint32 little-endian payload_length
payload_length bytes protobuf WorkerEnvelope
uint32 little-endian payload_length
payload_length bytes protobuf WorkerEnvelope
...
```

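The length-prefixed framing above can be illustrated with a short Python sketch. It treats the payload as opaque bytes (the real payload is a serialized `WorkerEnvelope`) and works on any binary stream. The function names and the `MAX_PAYLOAD` cap are assumptions for illustration; the source does not state a maximum frame size.

```python
import io
import struct

# Hypothetical upper bound on a frame; the source does not specify one.
MAX_PAYLOAD = 16 * 1024 * 1024


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: uint32 little-endian length, then the payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_exact(stream, count: int) -> bytes:
    """Read exactly `count` bytes or raise EOFError on a short read."""
    buf = bytearray()
    while len(buf) < count:
        chunk = stream.read(count - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(stream):
    """Return the next payload, or None on a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        header += read_exact(stream, 4 - len(header))
    (length,) = struct.unpack("<I", header)
    if length > MAX_PAYLOAD:
        raise ValueError(f"frame of {length} bytes exceeds limit")
    return read_exact(stream, length)


if __name__ == "__main__":
    pipe = io.BytesIO()
    write_frame(pipe, b"envelope-1")
    write_frame(pipe, b"envelope-2")
    pipe.seek(0)
    print(read_frame(pipe), read_frame(pipe), read_frame(pipe))
```

Rejecting oversized lengths before allocating a buffer keeps a corrupt or hostile length prefix from exhausting memory; whether the actual protocol enforces such a limit is not stated here.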
The gateway creates the pipe server, starts the worker with the pipe name as an
argument, then waits for the worker to connect and send `WorkerReady`.

Pipe security:

- local machine only,
- ACL restricted to the gateway identity and the launched worker identity,
- no anonymous access,
- optionally add a per-session random handshake nonce passed by command line or
  inherited environment.

### Worker Envelope

Every IPC message uses a common envelope:

```protobuf
message WorkerEnvelope {
  uint32 protocol_version = 1;
  string session_id = 2;
  uint64 sequence = 3;
  uint64 correlation_id = 4;
  oneof body {
    WorkerHello worker_hello = 10;
    GatewayHello gateway_hello = 11;
    WorkerReady worker_ready = 12;
    WorkerCommand command = 20;
    WorkerCommandReply command_reply = 21;
    WorkerEvent event = 22;
    WorkerHeartbeat heartbeat = 23;
    WorkerCancel cancel = 24;
    WorkerShutdown shutdown = 25;
    WorkerFault fault = 26;
  }
}
```

Rules:

- `sequence` is monotonic per sender.
- `correlation_id` links commands to replies.
- Events use their own correlation id or zero.
- Replies must preserve MXAccess HRESULT/status information even when the
  command is also represented as a protocol-level failure.
- Protocol version mismatch fails session creation.

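A minimal Python sketch of how one side of the pipe might apply these envelope rules. The `Envelope` dataclass mirrors the first four `WorkerEnvelope` fields and uses a plain string in place of the `oneof body`; `CommandCorrelator` is a hypothetical helper, and treating "monotonic" as strictly increasing is an assumption.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Envelope:
    protocol_version: int
    session_id: str
    sequence: int
    correlation_id: int
    body: str  # stand-in for the oneof body, e.g. "command_reply" or "event"


class CommandCorrelator:
    """Hypothetical gateway-side helper: stamps outgoing commands and
    validates inbound envelopes against the rules listed above."""

    def __init__(self, session_id: str, protocol_version: int = 1) -> None:
        self.session_id = session_id
        self.protocol_version = protocol_version
        self._sequence = itertools.count(1)
        self._correlation = itertools.count(1)
        self._pending: Dict[int, str] = {}
        self._last_peer_sequence = 0

    def next_command(self, command: str) -> Envelope:
        correlation_id = next(self._correlation)
        self._pending[correlation_id] = command
        return Envelope(self.protocol_version, self.session_id,
                        next(self._sequence), correlation_id, "command")

    def accept(self, envelope: Envelope) -> Optional[str]:
        """Return the command a reply answers, or None for other bodies."""
        if envelope.protocol_version != self.protocol_version:
            raise ValueError("protocol version mismatch")
        if envelope.session_id != self.session_id:
            raise ValueError("envelope belongs to another session")
        # Assumption: monotonic means strictly increasing per sender.
        if envelope.sequence <= self._last_peer_sequence:
            raise ValueError("sequence is not monotonic")
        self._last_peer_sequence = envelope.sequence
        if envelope.body == "command_reply":
            return self._pending.pop(envelope.correlation_id, None)
        return None
```

Keeping separate counters for `sequence` and `correlation_id` reflects their different jobs: the first orders everything a sender emits, the second pairs a reply with its command.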
## Public gRPC API

The external API should be session-oriented. A bidirectional stream is the best
long-term shape because it naturally carries commands, replies, events,
heartbeats, and cancellation.

```protobuf
service MxAccessGateway {
  rpc OpenSession(OpenSessionRequest) returns (OpenSessionReply);
  rpc CloseSession(CloseSessionRequest) returns (CloseSessionReply);
  rpc Invoke(MxCommandRequest) returns (MxCommandReply);
  rpc StreamEvents(StreamEventsRequest) returns (stream MxEvent);
  rpc Session(stream ClientMessage) returns (stream ServerMessage);
}
```

Recommended rollout:

1. Implement unary `OpenSession`, `CloseSession`, and `Invoke`.
2. Implement server-streaming `StreamEvents`.
3. Add bidirectional `Session` after the command/event model is stable.

The unary plus event-stream shape is easier to debug initially. The
bidirectional stream can later reduce per-command overhead and improve
backpressure.

## Public MXAccess Command Surface

The gateway contract should mirror MXAccess concepts without leaking COM types.
Keep handles and statuses explicit.

Core commands:

- `Register`
- `Unregister`
- `AddItem`
- `AddItem2`
- `RemoveItem`
- `Advise`
- `UnAdvise`
- `AdviseSupervisory`
- `AddBufferedItem`
- `SetBufferedUpdateInterval`
- `Suspend`
- `Activate`
- `Write`
- `Write2`
- `WriteSecured`
- `WriteSecured2`
- `AuthenticateUser`
- `ArchestrAUserToId`

Bulk variants (single gRPC round-trip carries the full list, the worker
runs the per-item MXAccess calls sequentially on its STA, and the reply
returns one result per requested entry; per-entry failures populate
`was_successful = false` + `error_message` and never throw):

- `AddItemBulk`: `repeated string tag_addresses` → `BulkSubscribeReply`.
- `AdviseItemBulk`: `repeated int32 item_handles` → `BulkSubscribeReply`.
- `RemoveItemBulk`: `repeated int32 item_handles` → `BulkSubscribeReply`.
- `UnAdviseItemBulk`: `repeated int32 item_handles` → `BulkSubscribeReply`.
- `SubscribeBulk`: `repeated string tag_addresses` (AddItem + Advise per
  entry, with cleanup on Advise failure) → `BulkSubscribeReply`.
- `UnsubscribeBulk`: `repeated int32 item_handles` (UnAdvise + RemoveItem
  per entry, with independent error tracking) → `BulkSubscribeReply`.
- `WriteBulk`: `repeated WriteBulkEntry` (each `{item_handle, value, user_id}`)
  → `BulkWriteReply` (`repeated BulkWriteResult`). Required scope: `invoke:write`.
- `Write2Bulk`: `repeated Write2BulkEntry` (each adds `timestamp_value`) →
  `BulkWriteReply`. Required scope: `invoke:write`.
- `WriteSecuredBulk`: `repeated WriteSecuredBulkEntry` (each
  `{item_handle, current_user_id, verifier_user_id, value}`) → `BulkWriteReply`.
  Required scope: `invoke:secure`. Same redaction rules as single-item
  `WriteSecured`: per-entry `value` must never reach logs unless an explicit
  redacted value-logging path is enabled.
- `WriteSecured2Bulk`: `repeated WriteSecured2BulkEntry` (each adds
  `timestamp_value`) → `BulkWriteReply`. Required scope: `invoke:secure`.
- `ReadBulk`: `repeated string tag_addresses` + `uint32 timeout_ms` →
  `BulkReadReply` (`repeated BulkReadResult`). MXAccess COM has no
  synchronous `Read`; the worker satisfies this command by returning the
  last cached `OnDataChange` payload when the requested tag is already
  advised (`was_cached = true`, no subscription side-effects), or by
  taking a full `AddItem` + `Advise` + wait-for-first-OnDataChange +
  `UnAdvise` + `RemoveItem` snapshot lifecycle when no live subscription
  exists (`was_cached = false`). Per-tag timeouts surface as
  `was_successful = false` rather than throwing. The cache lives on the
  worker's `MxAccessValueCache`, populated by `MxAccessBaseEventSink` on
  every `OnDataChange` after the event clears the outbound queue.
  Required scope: `invoke:read`. `timeout_ms == 0` uses the worker's
  default (1000 ms).

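The `ReadBulk` behavior described above (cached value when available, otherwise a snapshot lifecycle, with per-entry failures reported instead of thrown) might look like the following Python sketch. It is a model only: `read_bulk` and the `snapshot` callable are hypothetical, `snapshot` stands in for the `AddItem` + `Advise` + wait + `UnAdvise` + `RemoveItem` sequence, and keying the cache by tag address is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

DEFAULT_TIMEOUT_MS = 1000  # worker default named in the text above


@dataclass
class BulkReadResult:
    tag_address: str
    was_successful: bool
    was_cached: bool = False
    value: Optional[object] = None
    error_message: str = ""


def read_bulk(tag_addresses: List[str],
              cache: Dict[str, object],
              snapshot: Callable[[str, int], object],
              timeout_ms: int = 0) -> List[BulkReadResult]:
    """Return one result per requested tag; never raise for a single entry."""
    effective_timeout = timeout_ms or DEFAULT_TIMEOUT_MS
    results: List[BulkReadResult] = []
    for tag in tag_addresses:
        if tag in cache:
            # Already advised: serve the last OnDataChange payload as-is.
            results.append(BulkReadResult(tag, True, True, cache[tag]))
            continue
        try:
            # Hypothetical stand-in for the one-shot snapshot lifecycle.
            value = snapshot(tag, effective_timeout)
            results.append(BulkReadResult(tag, True, False, value))
        except Exception as exc:
            # Per-entry failure (including timeouts) becomes a result row.
            results.append(BulkReadResult(tag, False, False, None, str(exc)))
    return results
```

Returning a row for every requested tag, in request order, lets a caller match results to inputs by position even when some entries fail.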
Optional diagnostics:

- `Ping`
- `GetSessionState`
- `GetWorkerInfo`
- `DrainEvents`
- `ShutdownWorker`

Do not compress MXAccess semantics into generic verbs too early. A command enum
with method-specific payloads is easier to test for parity.

## Event Surface

The gateway must represent every public MXAccess event family:

- `OnDataChange`
- `OnWriteComplete`
- `OperationComplete`
- `OnBufferedDataChange`

The event DTO should include:

- event family,
- session id,
- server handle,
- item handle,
- value when present,
- quality when present,
- timestamp when present,
- `MXSTATUS_PROXY[]` equivalent,
- raw HRESULT/status fields when available,
- event ordering sequence,
- worker timestamp,
- gateway receive timestamp.

Keep event order stable per worker. The gateway should not reorder events from
the same MXAccess instance.

## Value Model

Use a protobuf value union that can represent COM `VARIANT` values and arrays.

```protobuf
message MxValue {
  oneof kind {
    bool bool_value = 1;
    int32 int32_value = 2;
    int64 int64_value = 3;
    float float_value = 4;
    double double_value = 5;
    string string_value = 6;
    Timestamp timestamp_value = 7;
    MxArray array_value = 8;
    bytes raw_variant = 100;
  }
}
```

Array support should include at least:

- bool array,
- int32 array,
- float array,
- double array,
- string array,
- timestamp array,
- raw fallback.

For full parity, unknown or awkward COM values should be preserved as raw
metadata rather than dropped. If a value cannot be losslessly converted, the
worker should return both the best typed projection and enough diagnostic
metadata to reproduce the case.

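As a rough illustration of the "typed projection with raw fallback" idea, the Python sketch below maps native values onto the `MxValue` field names shown above. It is not the worker's conversion code: `to_mx_value` is a hypothetical helper, the result is a plain dict rather than a protobuf message, and the `repr` bytes used for `raw_variant` are only a placeholder for whatever serialized `VARIANT` data the real worker would carry.

```python
import datetime as dt
from typing import Any, Dict

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_mx_value(value: Any) -> Dict[str, Any]:
    """Project a value onto one MxValue field, falling back to raw_variant."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_value": value}
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return {"int32_value": value}
        if INT64_MIN <= value <= INT64_MAX:
            return {"int64_value": value}
    elif isinstance(value, float):
        # Python has no separate 32-bit float, so float_value is unused here.
        return {"double_value": value}
    elif isinstance(value, str):
        return {"string_value": value}
    elif isinstance(value, dt.datetime):
        return {"timestamp_value": value.isoformat()}
    elif isinstance(value, (list, tuple)):
        return {"array_value": [to_mx_value(item) for item in value]}
    # Unknown or out-of-range values are preserved, not dropped.
    # Placeholder encoding; a real worker would keep the VARIANT bytes.
    return {"raw_variant": repr(value).encode("utf-8")}
```

The fallback branch is the important part: a value that fits no typed field still produces output that a parity test can inspect.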
## Status Model

Represent `MXSTATUS_PROXY` explicitly:

```protobuf
message MxStatusProxy {
  int32 success = 1;
  uint32 category = 2;
  uint32 detail = 3;
  uint32 source = 4;
  uint32 raw_hresult = 5;
  string text = 6;
}
```

The exact field names should be adjusted to match the actual interop struct,
but the design principle is important: do not collapse status arrays into a
single success flag.

For command replies, return:

- protocol status,
- COM HRESULT if available,
- MXAccess return value if the method has one,
- method-specific out parameters,
- status array if the method emits one.

## STA Worker Thread Model

Each worker owns:

- one process,
- one MXAccess session,
- one dedicated STA thread,
- one MXAccess COM object,
- one inbound command queue,
- one outbound event queue.

All MXAccess operations run on the STA:

```text
pipe reader thread
  -> parse WorkerCommand
  -> enqueue StaCommand
  -> await task completion
  -> write WorkerCommandReply

STA thread
  -> CoInitializeEx(APARTMENTTHREADED)
  -> create MXAccess COM object
  -> wire events
  -> run message pump
  -> execute queued commands between message dispatches

MXAccess event handler on STA
  -> convert event args to WorkerEvent
  -> enqueue outbound event

pipe writer thread
  -> dequeue replies/events
  -> write framed protobuf messages
```

Do not block the STA on pipe writes or gRPC calls. The STA should enqueue
results/events and return to pumping messages.

### Message Pump

The STA loop must pump Windows messages and service command work. A typical
shape:

```text
while not shutdown:
    while command queue has work:
        execute one command on STA

    MsgWaitForMultipleObjectsEx(
        command_event,
        timeout,
        QS_ALLINPUT,
        MWMO_INPUTAVAILABLE)

    while PeekMessage:
        TranslateMessage
        DispatchMessage
```

This is the critical piece for MXAccess event delivery. A plain blocking queue
on an STA thread is not enough if it prevents COM/window messages from being
pumped.

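The loop shape above can be modeled in portable Python to show why one wait has to cover both commands and messages. This is a simulation, not Win32 code: `StaLoop` is a hypothetical class, a `threading.Event` stands in for the wake condition that `MsgWaitForMultipleObjectsEx` provides, and a second queue stands in for the thread's Windows message queue.

```python
import queue
import threading
from typing import Callable


class StaLoop:
    """Portable model of the STA loop; real workers use the Win32 calls."""

    def __init__(self) -> None:
        self.commands: "queue.Queue[Callable[[], None]]" = queue.Queue()
        self.messages: "queue.Queue[Callable[[], None]]" = queue.Queue()
        # Stands in for command_event plus the QS_ALLINPUT wake condition.
        self.wake = threading.Event()
        self.shutdown = False

    def post_command(self, work: Callable[[], None]) -> None:
        self.commands.put(work)
        self.wake.set()

    def post_message(self, work: Callable[[], None]) -> None:
        self.messages.put(work)
        self.wake.set()

    @staticmethod
    def _drain(pending: "queue.Queue[Callable[[], None]]") -> None:
        while True:
            try:
                work = pending.get_nowait()
            except queue.Empty:
                return
            work()

    def run(self, timeout_s: float = 0.1) -> None:
        while not self.shutdown:
            self._drain(self.commands)   # execute queued commands on the STA
            self.wake.wait(timeout_s)    # models MsgWaitForMultipleObjectsEx
            self.wake.clear()
            self._drain(self.messages)   # models PeekMessage/DispatchMessage
```

Because the wait wakes for either kind of work, a burst of commands cannot starve message dispatch, and an idle command queue does not stop event delivery. A loop that blocked only on the command queue would leave posted messages undelivered until the next command arrived.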
### COM Lifetime

Worker startup:

1. set apartment state to STA,
2. initialize COM on the STA,
3. instantiate `LMXProxyServerClass` or the installed MXAccess interop class,
4. attach event handlers,
5. send `WorkerReady`.

Worker shutdown:

1. reject new commands,
2. optionally send `UnAdvise`/`RemoveItem`/`Unregister` for active handles,
3. detach event handlers,
4. release COM object until reference count reaches zero,
5. uninitialize COM,
6. exit process.

If graceful shutdown exceeds timeout, the gateway kills the worker.

## Session Model

One external client session maps to one worker process by default.

Session state in the gateway:

- session id,
- client identity,
- worker process id,
- pipe name,
- pipe connection,
- open time,
- last heartbeat,
- active stream subscribers,
- command timeout policy,
- event queue metrics.

Session state in the worker:

- MXAccess COM object,
- registered server handles,
- item handles,
- item definitions/context,
- advise state,
- buffered state,
- authenticated user ids if needed,
- event sequence number.

The gateway should treat worker state as authoritative for MXAccess handles.
It can keep a shadow state for diagnostics and cleanup, but should not invent
handles.

## Command Execution

Every command should follow the same lifecycle:

```text
client sends gRPC command
gateway validates session and payload
gateway assigns correlation id
gateway writes WorkerCommand to pipe
worker pipe reader enqueues command to STA
STA executes MXAccess method
worker captures return value/out params/status/HRESULT
worker sends WorkerCommandReply
gateway completes gRPC response
```

Timeouts:

- gateway command timeout bounds client waiting,
- worker command timeout marks the command as stuck,
- if the STA does not recover after a configurable grace period, kill the
  worker and fail the session.

Cancellation:

- canceling the gRPC call should stop waiting in the gateway,
- it cannot safely abort an in-flight COM call on the STA,
- the worker should finish the COM call and discard or log the late reply if
  the correlation was canceled,
- hard cancellation means killing the worker process.

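The timeout and cancellation rules above imply that the gateway stops waiting while the worker keeps running the COM call. A hedged Python sketch of the gateway-side bookkeeping follows; `PendingCommands` is a hypothetical helper, and it models this document's side of the rule with a late reply logged at the gateway, which is one possible place to do it.

```python
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Dict, List


class PendingCommands:
    """Hypothetical table of in-flight commands keyed by correlation id."""

    def __init__(self) -> None:
        self._pending: Dict[int, Future] = {}
        self.late_replies: List[int] = []

    def register(self, correlation_id: int) -> Future:
        future: Future = Future()
        self._pending[correlation_id] = future
        return future

    def wait(self, correlation_id: int, future: Future, timeout_s: float):
        """Bound the caller's wait; the in-flight COM call is not aborted."""
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Forget the correlation so a late reply is recognized as late.
            self._pending.pop(correlation_id, None)
            raise

    def on_reply(self, correlation_id: int, reply: object) -> bool:
        """Complete the waiting call, or record the reply as late."""
        future = self._pending.pop(correlation_id, None)
        if future is None:
            self.late_replies.append(correlation_id)
            return False
        future.set_result(reply)
        return True
```

A gateway built this way would still need the grace-period watchdog from the list above: forgetting a correlation frees the client, but only killing the worker frees a wedged STA.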
## Event Delivery And Backpressure

Events flow from worker to gateway, then gateway to client streams.

Worker policy:

- bounded outbound event channel,
- never block MXAccess event handler on pipe writes,
- fail the worker session when the outbound channel is full.

For full parity testing, default should be fail-fast rather than silent drop.
For production high-rate telemetry, add explicit coalescing modes.

Gateway policy:

- one event sequencer per session,
- preserve per-session event order,
- allow one active client event subscriber per session,
- reject a second subscriber with a clear session error,
- use a bounded `EventStreamService` queue between worker events and gRPC
  writes,
- fault the session when the bounded stream queue overflows,
- detach the subscriber when the stream is canceled.

The gateway forwards only events reported by the worker. It does not synthesize
`OperationComplete` from write completion, command replies, or status frames.

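The fail-fast policy and the single-subscriber rule can be sketched with a bounded queue. The Python below is illustrative only: `SessionEventStream`, `SessionFaulted`, and `SubscriberRejected` are hypothetical names and do not describe how `EventStreamService` is actually written.

```python
import queue


class SessionFaulted(Exception):
    """Raised when the bounded event queue overflows."""


class SubscriberRejected(Exception):
    """Raised when a second subscriber attaches to the same session."""


class SessionEventStream:
    def __init__(self, capacity: int) -> None:
        self._events: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
        self._has_subscriber = False
        self.faulted = False

    def attach(self) -> None:
        if self._has_subscriber:
            raise SubscriberRejected("session already has an event subscriber")
        self._has_subscriber = True

    def detach(self) -> None:
        # Called when the client stream is canceled.
        self._has_subscriber = False

    def publish(self, event: object) -> None:
        """Enqueue without blocking; overflow faults the session."""
        if self.faulted:
            raise SessionFaulted("session is already faulted")
        try:
            self._events.put_nowait(event)
        except queue.Full:
            self.faulted = True
            raise SessionFaulted("event queue overflow") from None

    def next_event(self) -> object:
        # FIFO order preserves the per-session event order.
        return self._events.get_nowait()
```

Using `put_nowait` is what keeps the producer side from blocking: the caller learns about overflow immediately and the session is marked faulted, rather than events being dropped silently.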
## Isolation And Fault Handling

Failure cases:

- worker fails startup,
- worker pipe disconnects,
- worker heartbeat expires,
- worker process exits,
- STA command times out,
- MXAccess COM throws,
- MXAccess event handler throws,
- client disconnects,
- gateway shuts down.

Policy:

- worker startup failure fails `OpenSession`,
- worker crash emits terminal session fault to client,
- command exceptions return structured command fault with HRESULT if known,
- stale sessions are closed by lease timeout,
- stuck workers are killed by process id,
- gateway restart does not reattach old workers; `OrphanWorkerCleanupHostedService`
  runs `OrphanWorkerTerminator` once on startup, before the server accepts
  sessions, to kill leftover `ZB.MOM.WW.MxGateway.Worker.exe` processes (matched by the
  configured worker executable path, or by image name when the x64 gateway cannot
  introspect the x86 worker's module) left behind by a previous unclean run.

Because each client owns one worker, a crash or leak affects only that session.

## Security

External gateway:

- use TLS for remote gRPC if crossing machine boundaries,
- authenticate v1 gRPC clients with `authorization: Bearer
  mxgw_<key-id>_<secret>` API-key metadata,
- reject missing or invalid API keys with gRPC `Unauthenticated`,
- reject valid keys that lack the required session, invoke, event, metadata, or
  admin scope with gRPC `PermissionDenied`,
- authorize access to commands that can write, authenticate users, expose
  metadata, stream events, or alter runtime state.

Internal worker IPC:

- local named pipes only,
- restrictive pipe ACL,
- per-session nonce handshake,
- worker validates gateway hello before creating MXAccess,
- gateway validates worker executable path and version,
- no secrets in command line when avoidable.

Credential-sensitive commands such as `AuthenticateUser` and `WriteSecured`
must not log passwords or raw credential values.

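The API-key rules for the external gateway can be sketched as a small decision function. This Python model makes several assumptions the source does not confirm: that a key id contains no underscore (so the first underscore after `mxgw_` separates id from secret), that the store keeps a SHA-256 hex digest of the secret, and the names `parse_api_key` and `authorize`. The returned strings mirror the gRPC status names in the list above.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set, Tuple


def parse_api_key(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split 'Bearer mxgw_<key-id>_<secret>' into (key_id, secret)."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token.startswith("mxgw_"):
        return None
    # Assumption: key ids never contain an underscore.
    key_id, separator, secret = token[len("mxgw_"):].partition("_")
    if not separator or not key_id or not secret:
        return None
    return key_id, secret


def authorize(header: Optional[str],
              keys: Dict[str, Tuple[str, Set[str]]],
              required_scope: str) -> str:
    """Return 'OK', 'UNAUTHENTICATED', or 'PERMISSION_DENIED'.

    `keys` maps key id to (sha256 hex digest of the secret, granted scopes);
    this storage shape is hypothetical.
    """
    parsed = parse_api_key(header)
    if parsed is None:
        return "UNAUTHENTICATED"
    key_id, secret = parsed
    record = keys.get(key_id)
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking digest prefixes via timing.
    if record is None or not hmac.compare_digest(digest, record[0]):
        return "UNAUTHENTICATED"
    return "OK" if required_scope in record[1] else "PERMISSION_DENIED"
```

The ordering matters: identity is settled first, and scope is checked only for a key that has already been authenticated, which matches the split between `Unauthenticated` and `PermissionDenied`.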
## Observability

Gateway metrics:

- sessions open,
- workers running,
- worker start latency,
- command latency by method,
- command failures by method/status,
- event rate by session/event type,
- event queue depth,
- worker memory/CPU,
- worker restarts/kills,
- gRPC stream disconnects.

Worker logs:

- startup/shutdown,
- MXAccess COM creation result,
- command start/end with correlation id,
- HRESULT/status summary,
- event family and sequence number,
- queue overflow,
- STA watchdog warnings.

Do not log full values by default. Make value logging opt-in and redacted where
credentials or secured writes are involved.

## Performance Strategy

First priority is parity. Performance comes from process isolation, batching,
and avoiding unnecessary cross-process round trips.

Baseline choices:

- long-lived worker per session,
- persistent pipe,
- protobuf binary framing,
- no gRPC inside worker,
- no COM calls outside STA,
- event streaming rather than event polling.

Optimizations after parity:

- batch commands where MXAccess semantics allow,
- batch events from worker to gateway while preserving order,
- optional data-change coalescing by item handle,
- memory-mapped payload slabs for very large arrays,
- shared schema for typed values to avoid raw COM marshaling at the gateway,
- gateway-side route to `MxAsbClient` for proven high-volume read/write
  workloads only when caller opts into non-MXAccess-backed behavior or parity
  tests prove equivalence.

## Project Layout

Suggested additions:

```text
src/ZB.MOM.WW.MxGateway.Contracts/
  Protos/
    mxaccess_gateway.proto
    mxaccess_worker.proto
  Generated/

src/ZB.MOM.WW.MxGateway.Server/
  Program.cs
  Sessions/
  Workers/
  Grpc/
  Metrics/

src/ZB.MOM.WW.MxGateway.Worker/
  Program.cs
  Ipc/
  Sta/
  MxAccess/
  Conversion/

src/ZB.MOM.WW.MxGateway.Tests/
  contract tests
  gateway session tests
  fake worker tests

src/ZB.MOM.WW.MxGateway.Worker.Tests/
  value/status conversion tests
  STA queue tests

src/ZB.MOM.WW.MxGateway.IntegrationTests/
  optional live MXAccess tests
```

Build outputs:

- gateway: .NET 10 x64,
- worker: .NET Framework 4.8 x86.

The contracts project can multi-target if needed, or the `.proto` files can be
shared as source inputs to both gateway and worker builds.

## Worker Implementation Plan
|
|
|
|
### Phase 1: Minimal Worker Harness
|
|
|
|
- Create .NET Framework 4.8 x86 worker executable.
|
|
- Parse pipe name/session id/nonce args.
|
|
- Connect to gateway named pipe.
|
|
- Exchange hello/ready messages.
|
|
- Start STA thread.
|
|
- Create MXAccess COM object on STA.
|
|
- Pump messages.
|
|
- Shut down cleanly.
|
|
|
|
Exit criteria:
|
|
|
|
- gateway can spawn worker,
|
|
- worker reports ready,
|
|
- worker exits on shutdown command,
|
|
- STA remains responsive.
|
|
|
|
### Phase 2: Command Queue
|
|
|
|
- Add command DTOs for `Register`, `Unregister`, `AddItem`, `RemoveItem`.
|
|
- Implement STA command dispatch.
|
|
- Return method result, HRESULT, and structured fault.
|
|
- Add command timeout handling in gateway.
|
|
|
|
Exit criteria:
|
|
|
|
- client can open a session and perform basic handle lifecycle through gRPC.
|
|
|
|
### Phase 3: Event Stream

- Wire MXAccess events in the worker.
- Convert `OnDataChange`, `OnWriteComplete`, `OperationComplete`, and
  `OnBufferedDataChange` to protobuf events.
- Add event sequence numbers.
- Add gateway `StreamEvents`.

Exit criteria:

- advised item changes reach a .NET 10 client without the client owning an STA.

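A minimal sketch of per-session event sequence numbers follows, assuming the worker stamps each event with a monotonically increasing counter starting at 1 and the consumer checks for gaps. Both class names are hypothetical, and the starting value is an assumption.

```csharp
using System.Threading;

// Worker side: hands out 1, 2, 3, ... and is safe to call from any thread.
public sealed class EventSequencer
{
    private long _last; // 0 means no event has been stamped yet

    public long Next()
    {
        return Interlocked.Increment(ref _last);
    }
}

// Consumer side: reports how many events were skipped before the observed one.
// Not thread-safe; intended for a single reader loop.
public sealed class SequenceGapDetector
{
    private long _expected = 1;

    public long Observe(long sequence)
    {
        if (sequence < _expected)
            return 0; // duplicate or replayed event; nothing new was missed

        long missed = sequence - _expected;
        _expected = sequence + 1;
        return missed;
    }
}
```

A non-zero result from `Observe` gives the gateway or a test a concrete signal that events were lost between worker and client.
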
### Phase 4: Full Command Surface

Add remaining MXAccess methods:

- `Advise`
- `UnAdvise`
- `AdviseSupervisory`
- `AddItem2`
- `AddBufferedItem`
- `SetBufferedUpdateInterval`
- `Suspend`
- `Activate`
- `Write`
- `Write2`
- `WriteSecured`
- `WriteSecured2`
- `AuthenticateUser`
- `ArchestrAUserToId`

Exit criteria:

- gRPC command surface covers the installed MXAccess public method set.

### Phase 5: Parity Harness

- Reuse existing MXAccess trace harness scenarios.
- Run each scenario against direct MXAccess and against the gateway.
- Compare:
  - return values,
  - HRESULTs/exceptions,
  - event sequence,
  - value projection,
  - quality/status arrays,
  - invalid handle behavior,
  - cleanup behavior.

Exit criteria:

- documented parity matrix for all public methods and event families.

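The comparison step could start from something as small as the sketch below, which diffs two recorded call traces on method name, return value, and HRESULT. `ObservedCall` and `ParityComparer` are hypothetical names; the existing trace harness may record scenarios in a different shape, and events, values, and quality arrays would need their own comparers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one call observed during a scenario run.
public sealed class ObservedCall
{
    public string Method { get; set; }
    public int ReturnValue { get; set; }
    public int HResult { get; set; }
}

public static class ParityComparer
{
    // Returns one readable line per mismatch; an empty list means the traces agree.
    public static List<string> Diff(IReadOnlyList<ObservedCall> direct, IReadOnlyList<ObservedCall> gateway)
    {
        var diffs = new List<string>();
        if (direct.Count != gateway.Count)
            diffs.Add($"call count: direct={direct.Count} gateway={gateway.Count}");

        int shared = Math.Min(direct.Count, gateway.Count);
        for (int i = 0; i < shared; i++)
        {
            ObservedCall d = direct[i];
            ObservedCall g = gateway[i];
            if (d.Method != g.Method)
                diffs.Add($"[{i}] method: direct={d.Method} gateway={g.Method}");
            if (d.ReturnValue != g.ReturnValue)
                diffs.Add($"[{i}] {d.Method} return: direct={d.ReturnValue} gateway={g.ReturnValue}");
            if (d.HResult != g.HResult)
                diffs.Add($"[{i}] {d.Method} HRESULT: direct=0x{d.HResult:X8} gateway=0x{g.HResult:X8}");
        }
        return diffs;
    }
}
```

The per-mismatch lines can feed directly into the parity matrix named in the exit criteria.
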
### Phase 6: Hardening

- Worker watchdog.
- Heartbeats.
- Process kill/restart.
- Bounded queues.
- Backpressure policy.
- TLS/auth on public gateway.
- Metrics.
- Structured logging.
- Installer/service packaging.

Exit criteria:

- gateway can run as a Windows service and recover from worker crashes.

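For the bounded-queue and backpressure items, this document records the v1 decision as fail-fast. A minimal gateway-side sketch using `System.Threading.Channels` might look like the following; the `FailFastEventQueue` name, the overflow message, and the choice to fault the whole stream on the first overflow are assumptions, not the actual `EventStreamService` behavior.

```csharp
using System;
using System.Threading.Channels;

// Bounded queue that refuses to block the producer: on overflow it marks the
// stream as faulted and completes the channel with an error.
public sealed class FailFastEventQueue<T>
{
    private readonly Channel<T> _channel;

    public FailFastEventQueue(int capacity)
    {
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            // In Wait mode, TryWrite returns false when the queue is full
            // instead of silently dropping an item.
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = true,
        });
    }

    // Set once an overflow has occurred. Assumes a single producer thread.
    public bool Faulted { get; private set; }

    public ChannelReader<T> Reader
    {
        get { return _channel.Reader; }
    }

    // Returns false when the event could not be queued; further writes also fail.
    public bool TryPublish(T item)
    {
        if (_channel.Writer.TryWrite(item))
            return true;

        Faulted = true;
        _channel.Writer.TryComplete(
            new InvalidOperationException("Event queue overflow; subscriber is too slow."));
        return false;
    }
}
```

Completing the channel with an exception means the reader drains what was already queued and then observes the failure, so a slow subscriber is disconnected rather than served a stream with silent gaps.
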
## Gateway Implementation Plan

### Session Manager

Core operations:

- allocate session id,
- choose worker executable,
- create pipe name and nonce,
- start worker process,
- accept pipe connection,
- verify worker hello,
- track worker state,
- close or kill worker.

The gateway implementation keeps sessions in an in-memory `SessionRegistry`
keyed by session id. `SessionManager` owns the state machine, creates
per-session pipe names and nonces, starts the worker through the worker-client
factory, gates commands to `Ready` sessions, exposes lease-close hooks, and
cleans up workers during gateway shutdown.

State machine:

```text
Creating
  -> StartingWorker
  -> WaitingForPipe
  -> InitializingWorker
  -> Ready
  -> Closing
  -> Closed
  -> Faulted
```

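The state list above can be enforced with a small transition table. The sketch below assumes a strictly linear happy path and that any non-terminal state may move to `Faulted`; the document names the states but does not spell out the allowed transitions, so treat the table as illustrative. `SessionTransitions` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public enum SessionState
{
    Creating,
    StartingWorker,
    WaitingForPipe,
    InitializingWorker,
    Ready,
    Closing,
    Closed,
    Faulted,
}

public static class SessionTransitions
{
    // Assumed happy-path transitions; Faulted is handled separately below.
    private static readonly Dictionary<SessionState, SessionState[]> Allowed =
        new Dictionary<SessionState, SessionState[]>
        {
            { SessionState.Creating, new[] { SessionState.StartingWorker } },
            { SessionState.StartingWorker, new[] { SessionState.WaitingForPipe } },
            { SessionState.WaitingForPipe, new[] { SessionState.InitializingWorker } },
            { SessionState.InitializingWorker, new[] { SessionState.Ready } },
            { SessionState.Ready, new[] { SessionState.Closing } },
            { SessionState.Closing, new[] { SessionState.Closed } },
            { SessionState.Closed, new SessionState[0] },
            { SessionState.Faulted, new SessionState[0] },
        };

    public static bool CanMove(SessionState from, SessionState to)
    {
        // Assumption: every state except the two terminal ones can fault.
        if (to == SessionState.Faulted)
            return from != SessionState.Closed && from != SessionState.Faulted;

        return Array.IndexOf(Allowed[from], to) >= 0;
    }

    // Commands are gated to Ready sessions, as described above.
    public static bool AcceptsCommands(SessionState state)
    {
        return state == SessionState.Ready;
    }
}
```

Keeping the table in one place makes illegal transitions a testable condition instead of something scattered across session code paths.
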
### Worker Client

The gateway-side worker client owns:

- pipe stream,
- read loop,
- write loop,
- pending command dictionary,
- event channel,
- heartbeat monitor,
- process handle.

It should expose:

```csharp
Task<WorkerCommandReply> InvokeAsync(WorkerCommand command, CancellationToken ct);
IAsyncEnumerable<WorkerEvent> ReadEventsAsync(CancellationToken ct);
Task ShutdownAsync(TimeSpan timeout);
void Kill();
```

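The pending command dictionary is the piece that lets `InvokeAsync` await a reply that arrives later on the read loop. A minimal sketch, assuming each command carries a numeric correlation id (the id scheme and the `PendingCommands` name are assumptions):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class PendingCommands<TReply>
{
    private readonly ConcurrentDictionary<long, TaskCompletionSource<TReply>> _pending =
        new ConcurrentDictionary<long, TaskCompletionSource<TReply>>();
    private long _nextId;

    // Registers a command before it is written to the pipe and returns its
    // correlation id; the caller awaits the task handed back through 'reply'.
    public long Register(CancellationToken ct, out Task<TReply> reply)
    {
        long id = Interlocked.Increment(ref _nextId);
        var tcs = new TaskCompletionSource<TReply>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending[id] = tcs;

        // On cancellation, forget the command so a late reply is ignored.
        CancellationTokenRegistration registration = ct.Register(() =>
        {
            TaskCompletionSource<TReply> removed;
            if (_pending.TryRemove(id, out removed))
                removed.TrySetCanceled();
        });
        // Release the registration once the command finishes either way.
        tcs.Task.ContinueWith(_ => registration.Dispose(), TaskScheduler.Default);

        reply = tcs.Task;
        return id;
    }

    // Called by the read loop when a reply frame arrives.
    public bool Complete(long id, TReply reply)
    {
        TaskCompletionSource<TReply> tcs;
        return _pending.TryRemove(id, out tcs) && tcs.TrySetResult(reply);
    }

    // Called when the pipe breaks or the worker dies: fail every waiter.
    public void FailAll(Exception error)
    {
        foreach (long id in _pending.Keys)
        {
            TaskCompletionSource<TReply> tcs;
            if (_pending.TryRemove(id, out tcs))
                tcs.TrySetException(error);
        }
    }
}
```

`RunContinuationsAsynchronously` keeps caller continuations off the read loop thread, so a slow gRPC handler cannot stall reply processing for other commands.
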
### gRPC Layer

The gRPC layer should be thin:

- validate request,
- find session,
- call session worker client,
- map worker reply to public reply,
- stream events from session event channel.

Avoid embedding MXAccess-specific business logic in gRPC handlers. Keep the
translation code testable.

The gateway maps `MxAccessGateway` to `MxAccessGatewayService`. The service
implements `OpenSession`, `CloseSession`, `Invoke`, and `StreamEvents` by
validating public requests, delegating session work to `ISessionManager`, and
using explicit mapper code for public-to-worker commands and worker replies.
`StreamEvents` delegates subscriber ownership, ordering, and backpressure to
`EventStreamService`. Missing sessions and transport failures return gRPC
status errors; worker command replies preserve MXAccess HRESULT and status
details in the public reply.

## C# Worker Versus C++ Worker

Start with a C# .NET Framework 4.8 x86 worker.

Reasons:

- fastest implementation path,
- easiest COM interop/event sink work,
- straightforward named-pipe/protobuf implementation,
- easier logging and diagnostics,
- easier parity iteration.

C++/CLI or native C++ remains an escape hatch if C# COM interop proves
insufficient. The pipe protocol should be language-neutral so a future C++
worker can replace the C# worker without changing gateway or clients.

Use C++ only if evidence shows:

- C# event sinks cannot reliably pump MXAccess events,
- COM `VARIANT`/`SAFEARRAY` conversion loses required data,
- throughput is bottlenecked by .NET COM marshaling,
- MXAccess requires ATL-style connection point behavior not reproducible from
  C#.

## Compatibility Baseline

The proxy should preserve direct MXAccess behavior, including surprising cases.

Known important parity areas from existing captures:

- `WriteSecured` may fail before a value-bearing NMX body is emitted.
- `WriteSecured2` can succeed in observed native paths.
- `OperationComplete` is distinct from write completion.
- `OnBufferedDataChange` has a distinct public event shape.
- Invalid handles and cross-server handles have specific exception/status
  behavior.
- STA message pumping is required for event delivery.

The gateway should not "fix" these behaviors unless the client explicitly opts
into a non-parity mode.

## Future Backend Routing

After the MXAccess-backed proxy is stable, the gateway can optionally support
other backends behind the same public contract:

- `MxAsbClient` for high-volume basic read/write where poll-based subscription
  semantics are acceptable or proven equivalent for a workload,
- managed NMX for native callback experiments and eventual MXAccess-free
  replacement work,
- direct MXAccess worker as the default parity backend.

Routing must be explicit and observable:

- event/reply includes backend name,
- tests assert backend choice,
- no silent fallback that changes semantics.

Initial production mode should be:

```text
backend = mxaccess-worker
```

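The "no silent fallback" rule could be enforced by a resolver that either returns the requested backend or fails loudly. In the sketch below, only the name `mxaccess-worker` comes from this document; `BackendRouter`, the other backend identifier used in the usage line, and the idea of an enabled-backend set are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class BackendRouter
{
    // Default parity backend named in the initial production mode.
    public const string DefaultBackend = "mxaccess-worker";

    // Returns the backend to use. An empty request selects the documented
    // default; a request for a backend that is not enabled is an error and is
    // never rerouted to a different backend.
    public static string Resolve(string requested, ISet<string> enabled)
    {
        string name = string.IsNullOrEmpty(requested) ? DefaultBackend : requested;
        if (!enabled.Contains(name))
            throw new NotSupportedException("Backend '" + name + "' is not enabled on this gateway.");
        return name;
    }
}
```

A caller would copy the returned name into each reply and event, for example `reply.Backend = BackendRouter.Resolve(request.Backend, enabled)` (field names hypothetical), which is what makes the choice observable and assertable in tests.
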
## Open Questions

Current v1 decisions are recorded in `docs/DesignDecisions.md`.

Resolved for v1:

- MXAccess COM target is `ArchestrA.MxAccess.LMXProxyServerClass` /
  `LMXProxy.LMXProxyServer.1` from the installed 32-bit `LmxProxy.dll`.
- One `OpenSession` maps to one worker process; no reconnectable sessions.
- One active event subscriber per session.
- API key authentication with hashed keys in gateway-owned SQLite.
- Basic Blazor Server dashboard with Bootstrap CSS/JS and real-time updates.
- Workers run as the gateway service identity.
- Event backpressure is fail-fast with bounded queues.
- No public command batching.
- `OperationComplete` is forwarded only when native MXAccess raises it.
- `OnBufferedDataChange` is modeled now; multi-sample payload conversion remains
  capture-validated work.

Post-v1 revisit items:

- production event-rate target and optional coalescing,
- reconnectable sessions,
- multi-subscriber event fan-out,
- restricted worker process identity,
- command batching for high-volume setup.

## Recommended Next Step

Build the smallest end-to-end slice:

1. .NET 10 gateway starts.
2. Client calls `OpenSession`.
3. Gateway launches .NET Framework 4.8 x86 worker.
4. Worker creates STA and MXAccess COM object.
5. Client calls `Register`.
6. Client calls `AddItem`.
7. Client calls `Advise`.
8. Worker forwards one `OnDataChange` event to the gateway.
9. Gateway streams the event to the client.
10. Client calls `CloseSession`.
11. Gateway shuts down the worker.

That slice proves the architecture's hardest requirements: process isolation,
STA ownership, message pumping, command routing, and event streaming.