1aafd6bde4
Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
876 lines
29 KiB
Markdown
# MXAccess Worker Instance Detailed Design

## Purpose

An MXAccess worker instance is the compatibility boundary around one installed
MXAccess COM object. It runs as a disposable .NET Framework 4.8 x86 process,
owns one dedicated STA thread, pumps Windows/COM messages, executes MXAccess
commands on that STA, and forwards MXAccess events back to the gateway.

The worker's job is not to make MXAccess nicer. Its job is to preserve direct
MXAccess behavior while making that behavior available to modern clients through
the gateway.

## Runtime

- Target runtime: .NET Framework 4.8.
- Language: C#.
- Platform target: x86 by default.
- Process lifetime: one worker per gateway session.
- Public network listeners: none.
- Gateway IPC: one named pipe with protobuf-framed messages.
- COM apartment: one dedicated STA thread.

Style guides:

- [C# Style Guide](./style-guides/CSharpStyleGuide.md)
- [Protobuf Style Guide](./style-guides/ProtobufStyleGuide.md)

## Build And Test

Build the SDK-style worker project with the .NET SDK MSBuild entry point. The
project targets .NET Framework 4.8, but the SDK resolver comes from the .NET SDK
installation:

```powershell
dotnet msbuild src\MxGateway.Worker\MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
```

`docs/ToolchainLinks.md` records the Visual Studio MSBuild executable for
classic .NET Framework and COM interop builds. SDK-style projects need a
restore before the first build, so pass `/restore` here as well:

```powershell
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\MxGateway.Worker\MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
```

Run the worker tests with the same platform target:

```powershell
dotnet test src\MxGateway.Worker.Tests\MxGateway.Worker.Tests.csproj -p:Platform=x86
```

The only MXAccess interop reference belongs in `MxGateway.Worker`. Gateway and
test projects may reference the worker project for metadata and scaffold tests,
but they must not reference `ArchestrA.MXAccess.dll` directly.

## Responsibilities

The worker owns:

- connection to the gateway pipe,
- protocol hello and readiness reporting,
- STA thread creation and teardown,
- COM initialization on the STA,
- MXAccess COM object creation,
- MXAccess event sink wiring,
- command dispatch on the STA,
- MXAccess handle and advise state tracking,
- value/status/HRESULT capture,
- conversion to worker protobuf DTOs,
- event sequencing,
- heartbeat reporting,
- graceful shutdown.

The worker does not own:

- public gRPC API,
- client authentication,
- cross-session routing,
- worker process supervision,
- remote TLS,
- policy decisions for other sessions.

## Process Bootstrap

Expected command-line arguments:

```text
--session-id <sessionId>
--pipe-name <pipeName>
--protocol-version <version>
```

Expected protected environment values:

```text
MXGATEWAY_WORKER_NONCE=<random nonce>
MXGATEWAY_WORKER_LOG_CONTEXT=<optional context>
```

Startup sequence:

1. Parse command-line arguments.
2. Configure minimal logging.
3. Validate required values are present.
4. Connect to the gateway named pipe.
5. Exchange `WorkerHello` and `GatewayHello`.
6. Validate protocol version, session id, and nonce.
7. Start the STA runtime.
8. Create the MXAccess COM object on the STA.
9. Attach MXAccess event handlers on the STA.
10. Send `WorkerReady`.
11. Start pipe read, pipe write, heartbeat, and shutdown coordination loops.

If validation fails before MXAccess creation, exit quickly with a non-zero exit
code. If MXAccess creation fails, send `WorkerFault` when possible and exit.

The bootstrap layer returns structured exit codes before it creates pipes,
starts the STA, or touches MXAccess:

| Exit code | Name | Meaning |
|-----------|------|---------|
| `0` | `Success` | Required bootstrap options are valid. |
| `1` | `UnexpectedFailure` | A non-bootstrap exception reaches the process boundary. |
| `2` | `InvalidArguments` | Required arguments are missing or unknown arguments are present. |
| `3` | `InvalidProtocolVersion` | `--protocol-version` is not numeric or does not match the supported worker protocol. |
| `4` | `MissingNonce` | `MXGATEWAY_WORKER_NONCE` is absent or empty. |

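The validation order behind the exit-code table can be sketched in Python. This is a language-neutral illustration, not the worker's C# bootstrap. `validate_bootstrap`, `main`, `WorkerExitCode`, and `SUPPORTED_PROTOCOL_VERSION = 1` are hypothetical names and values, and arguments are assumed to arrive as `--name value` pairs.

```python
from enum import IntEnum
from typing import Dict, Mapping, Sequence

# Hypothetical value; the real supported version comes from the worker contracts.
SUPPORTED_PROTOCOL_VERSION = 1
REQUIRED_ARGS = ("--session-id", "--pipe-name", "--protocol-version")


class WorkerExitCode(IntEnum):
    SUCCESS = 0
    UNEXPECTED_FAILURE = 1
    INVALID_ARGUMENTS = 2
    INVALID_PROTOCOL_VERSION = 3
    MISSING_NONCE = 4


def validate_bootstrap(argv: Sequence[str], env: Mapping[str, str]) -> WorkerExitCode:
    """Check bootstrap inputs before any pipe, STA, or MXAccess work starts."""
    args = list(argv)
    # Assumed shape: every option is a "--name value" pair.
    if len(args) % 2 != 0:
        return WorkerExitCode.INVALID_ARGUMENTS
    options: Dict[str, str] = {}
    for name, value in zip(args[0::2], args[1::2]):
        if name not in REQUIRED_ARGS or not value:
            return WorkerExitCode.INVALID_ARGUMENTS  # unknown or empty option
        options[name] = value
    if any(name not in options for name in REQUIRED_ARGS):
        return WorkerExitCode.INVALID_ARGUMENTS  # a required option is missing
    try:
        version = int(options["--protocol-version"])
    except ValueError:
        return WorkerExitCode.INVALID_PROTOCOL_VERSION  # not numeric
    if version != SUPPORTED_PROTOCOL_VERSION:
        return WorkerExitCode.INVALID_PROTOCOL_VERSION  # numeric but unsupported
    if not env.get("MXGATEWAY_WORKER_NONCE"):
        return WorkerExitCode.MISSING_NONCE  # absent or empty
    return WorkerExitCode.SUCCESS


def main(argv: Sequence[str], env: Mapping[str, str]) -> int:
    try:
        return int(validate_bootstrap(argv, env))
    except Exception:
        # Any non-bootstrap exception that reaches the process boundary.
        return int(WorkerExitCode.UNEXPECTED_FAILURE)
```

Argument errors are checked before the nonce, so a process started with both problems reports `InvalidArguments`. A real entry point would call something like `main(sys.argv[1:], os.environ)`.
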
Bootstrap logs use `WorkerConsoleLogger` key/value output. `WorkerLogRedactor`
redacts fields whose names indicate nonce, secret, password, token,
credential, or API key values before the message is written.

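Here is a minimal Python sketch of name-based redaction. It assumes the redactor matches field names case-insensitively against the keyword families listed above and replaces each matching value with a fixed placeholder. `redact_fields`, `format_log`, the `***` placeholder, and the `message key=value` line shape are assumptions, not the actual `WorkerLogRedactor` or `WorkerConsoleLogger` API.

```python
import re
from typing import Dict

# Assumed keyword families, taken from the list of sensitive field names above.
_SENSITIVE_NAME = re.compile(r"nonce|secret|password|token|credential|api[_-]?key", re.IGNORECASE)
REDACTED = "***"  # hypothetical placeholder


def redact_fields(fields: Dict[str, object]) -> Dict[str, object]:
    """Replace the value of every field whose name looks sensitive."""
    return {
        name: REDACTED if _SENSITIVE_NAME.search(name) else value
        for name, value in fields.items()
    }


def format_log(message: str, fields: Dict[str, object]) -> str:
    """Render 'message key=value ...' after redaction (assumed line shape)."""
    pairs = " ".join(f"{name}={value}" for name, value in redact_fields(fields).items())
    return f"{message} {pairs}".rstrip()
```

Matching on names rather than values means a harmless field such as `tokenCount` is also redacted. Over-redaction is the safer failure mode for credential-bearing logs.
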
## Internal Components

```text
MxGateway.Worker
  Program
  Bootstrap
    WorkerOptions
    WorkerHost
  Ipc
    PipeClient
    FrameReader
    FrameWriter
    WorkerProtocol
  Sta
    StaRuntime
    StaCommandQueue
    MessagePump
    StaWatchdog
  MxAccess
    MxAccessSession
    MxAccessCommandDispatcher
    MxAccessEventSink
    MxAccessHandleRegistry
  Conversion
    VariantConverter
    SafeArrayConverter
    StatusProxyConverter
    HResultMapper
```

## Threading Model

```text
main thread
  -> parse args
  -> configure host
  -> coordinate shutdown

pipe reader thread/task
  -> read WorkerEnvelope frames
  -> validate protocol
  -> enqueue commands or control messages

pipe writer thread/task
  -> serialize WorkerEnvelope frames
  -> write replies, events, heartbeats, faults

STA thread
  -> CoInitializeEx(COINIT_APARTMENTTHREADED)
  -> create MXAccess COM object
  -> attach event handlers
  -> pump Windows/COM messages
  -> execute queued commands
  -> detach events and release COM on shutdown

watchdog/heartbeat task
  -> observe STA responsiveness
  -> send heartbeat or fault
```

No MXAccess method may execute outside the STA thread. Do not use `Task.Run`
around COM calls. Do not let event handlers perform pipe writes.

## STA Runtime

The STA runtime is the most important part of the worker.

Startup:

1. Create a dedicated `Thread`.
2. Set apartment state to `ApartmentState.STA`.
3. Start the thread.
4. Inside the thread, initialize COM.
5. Create the MXAccess COM object.
6. Attach event handlers.
7. Signal ready to the worker host.
8. Enter the message pump.

Shutdown:

1. Mark the command queue as completing.
2. Drain or reject pending commands according to shutdown mode.
3. Optionally issue MXAccess cleanup calls for active handles.
4. Detach event handlers.
5. Release COM references.
6. Uninitialize COM.
7. Exit the thread.

## Message Pump

The STA must pump Windows messages while also processing queued commands. A
blocking queue that prevents message pumping is not acceptable.

Required loop shape:

```text
while not shutdown:
    while command queue has work:
        execute one command on STA

    MsgWaitForMultipleObjectsEx(
        command_event,
        timeout,
        QS_ALLINPUT,
        MWMO_INPUTAVAILABLE)

    while PeekMessage(PM_REMOVE):
        TranslateMessage
        DispatchMessage
```

The command queue should signal a Win32 event or equivalent wait handle so the
STA can wake without busy-waiting.

The loop should update a heartbeat timestamp after:

- successfully pumping messages,
- starting a command,
- finishing a command,
- processing an MXAccess event.

`StaRuntime` implements this runtime boundary in the worker. It starts one
background thread named `MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
initializes COM through `StaComApartmentInitializer`, and runs
`StaMessagePump`. Commands are scheduled through `InvokeAsync`; the command
queue signals an `AutoResetEvent` so `MsgWaitForMultipleObjectsEx` can wake the
STA without busy-waiting. `LastActivityUtc` records pump, command, startup, and
shutdown activity so the heartbeat and watchdog can report whether the STA is
still responsive. Shutdown marks the runtime as closing, wakes the pump, rejects
new commands, cancels queued work, uninitializes COM on the STA, and waits for
the thread to exit.

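The scheduling contract of `StaRuntime` can be sketched in Python: one dedicated thread, FIFO work items, a wake signal instead of busy-waiting, an activity timestamp, and a shutdown that rejects new work and cancels queued work. This is an analogue, not the worker's implementation, and it contains no COM. `threading.Event` stands in for the `AutoResetEvent`, the `pump` callback is a hypothetical placeholder for the `PeekMessage`/`DispatchMessage` step, and `StaLikeRuntime`/`invoke_async` are illustrative names.

```python
import threading
import time
from collections import deque
from concurrent.futures import Future
from typing import Any, Callable, Deque, Optional, Tuple


class StaLikeRuntime:
    """One dedicated thread that runs queued work items in FIFO order."""

    def __init__(self, pump: Callable[[], None] = lambda: None, wait_timeout: float = 0.05) -> None:
        self._queue: Deque[Tuple[Callable[[], Any], Future]] = deque()
        self._lock = threading.Lock()
        self._wake = threading.Event()  # stands in for the AutoResetEvent
        self._closing = False
        self._pump = pump  # placeholder for PeekMessage/DispatchMessage
        self._wait_timeout = wait_timeout
        self.last_activity = time.monotonic()
        self.thread_ident: Optional[int] = None
        self._thread = threading.Thread(target=self._run, name="MxGateway.Worker.STA", daemon=True)
        self._thread.start()

    def invoke_async(self, work: Callable[[], Any]) -> Future:
        """Queue work for the dedicated thread; rejected once shutdown starts."""
        future: Future = Future()
        with self._lock:
            if self._closing:
                raise RuntimeError("runtime is shutting down")
            self._queue.append((work, future))
        self._wake.set()
        return future

    def _mark_activity(self) -> None:
        self.last_activity = time.monotonic()

    def _run(self) -> None:
        self.thread_ident = threading.get_ident()
        while True:
            with self._lock:
                item = self._queue.popleft() if self._queue else None
                closing = self._closing
            if item is not None:
                work, future = item
                if future.set_running_or_notify_cancel():
                    self._mark_activity()  # command started
                    try:
                        future.set_result(work())
                    except Exception as exc:
                        future.set_exception(exc)
                    self._mark_activity()  # command finished
                continue
            if closing:
                return
            # Sleep until new work is signalled or the timeout elapses, then pump.
            self._wake.wait(self._wait_timeout)
            self._wake.clear()
            self._pump()
            self._mark_activity()

    def shutdown(self, timeout: float = 5.0) -> None:
        """Reject new work, cancel queued work, let the running item finish."""
        with self._lock:
            self._closing = True
            pending = list(self._queue)
            self._queue.clear()
        for _, future in pending:
            future.cancel()
        self._wake.set()
        self._thread.join(timeout)
```

The loop checks the queue before it waits again. A signal that arrives between `wait` and `clear` is therefore never lost: the next iteration still finds the queued item.
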
## COM Creation

The MXAccess analysis source at `C:\Users\dohertj2\Desktop\mxaccess` identifies
the installed COM target:

- interop assembly:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
- assembly identity:
  `ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae`
- COM class:
  `ArchestrA.MxAccess.LMXProxyServerClass`
- CLSID:
  `{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- ProgID:
  `LMXProxy.LMXProxyServer.1`
- version-independent ProgID:
  `LMXProxy.LMXProxyServer`
- registered server:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll`
- registry view:
  `HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- threading model:
  `Apartment`

The worker should reference the interop assembly and instantiate
`LMXProxyServerClass` on the dedicated STA thread. Keep the ProgID and assembly
path configurable for diagnostics, but this COM class is the v1 default.

`MxAccessStaSession` owns the initial COM creation path. It starts `StaRuntime`,
creates `LMXProxyServerClass` through `MxAccessComObjectFactory` on the STA,
attaches `MxAccessBaseEventSink`, and returns `WorkerReady` only after those
steps succeed. `MxAccessSession` keeps the raw COM object private, records the
STA managed thread id that created it, detaches the base event sink during
disposal, and releases the COM reference on the STA. After creation,
`MxAccessStaSession` owns a `StaCommandDispatcher` backed by
`MxAccessCommandExecutor`; `DispatchAsync` queues contract commands back to the
same STA instead of exposing the COM object to callers.

Creation rules:

- Create COM object only on the STA.
- Attach event handlers only on the STA.
- Keep the COM reference private to the STA runtime.
- Never marshal the raw COM object to pipe reader/writer threads.
- Capture COM creation HRESULT or exception details.

If COM creation fails, the worker should send a structured fault with:

- fault category,
- exception type,
- HRESULT when available,
- COM class or ProgID attempted,
- worker process id,
- session id.

`WorkerPipeSession` maps startup exceptions from this path to
`WorkerFaultCategory.MxaccessCreationFailed`, includes the captured HRESULT
when the exception exposes one, and does not send `WorkerReady` after a failed
COM creation attempt.

After `WorkerReady`, `WorkerPipeSession` continues reading gateway frames for
the lifetime of the process. `WorkerCommand` frames are dispatched to
`MxAccessStaSession`, replies are written as `WorkerCommandReply`, and queued
worker events are drained after command replies. `WorkerShutdown` starts the
graceful shutdown path and returns `WorkerShutdownAck` only after the STA
cleanup path completes.

## Event Sink

The worker must subscribe to every public MXAccess event family:

- `OnDataChange`
- `OnWriteComplete`
- `OperationComplete`
- `OnBufferedDataChange`

Forward these event families only when the native MXAccess COM object raises
them. Do not synthesize `OperationComplete` from write completion or command
status. `OnBufferedDataChange` must be represented in the protocol now, but
multi-sample payload conversion should remain capture-validated; preserve raw
metadata whenever conversion is incomplete.

Event handling rules:

- Event handlers are expected to run on the STA.
- Assign a monotonic worker event sequence.
- Convert event args to `WorkerEvent`.
- Include value, quality, timestamp, handles, status arrays, and raw status
  details when available.
- Preserve raw event payload metadata for unsupported buffered or
  completion-only shapes.
- Enqueue to the outbound event queue.
- Return quickly to preserve message pumping.

`MxAccessBaseEventSink` implements the COM connection-point handlers and keeps
the handlers limited to event argument conversion plus enqueue. It uses
`MxAccessEventMapper` to create `MxEvent` DTOs for `OnDataChange`,
`OnWriteComplete`, `OperationComplete`, and `OnBufferedDataChange`. The mapper
converts scalar and array values through `VariantConverter`, converts
`MXSTATUS_PROXY[]` through `MxStatusProxyConverter`, and maps installed
`MxDataType` values to the public protobuf enum while preserving the raw data
type on buffered events. `OperationComplete` is only emitted from the native
`OperationComplete` handler; write completion does not synthesize it.

`MxAccessEventQueue` is the bounded outbound event queue for one worker
session. It assigns the monotonic `WorkerSequence` and `WorkerTimestamp` when an
event is accepted, preserving the order in which MXAccess handlers enqueue
events. The default capacity is `10000`. When the queue reaches capacity it
records a `WorkerFaultCategory.QueueOverflow` fault and rejects further events.
The event handler catches conversion and enqueue failures, records the first
fault on the queue, and returns to the STA message pump instead of writing to
the pipe.

If event conversion throws, catch it inside the event handler, record a
structured `WorkerFault`, and keep the worker alive only if the fault policy
allows it.

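The `MxAccessEventQueue` rules can be sketched in Python: bounded capacity, sequence and timestamp assigned on acceptance, the first fault recorded, overflow rejecting further events, and handlers that never raise. Sequences are assumed to start at 1, fault categories are plain strings, and `BoundedEventQueue`, `handle_event`, and the `EventConversionFailed` category are hypothetical names. Only overflow stops acceptance here; whether a conversion fault should also stop it is left to the fault policy described above.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, List, Optional


@dataclass
class QueuedEvent:
    worker_sequence: int
    worker_timestamp: datetime
    payload: Any


class BoundedEventQueue:
    """Bounded, sequenced, fail-fast outbound event queue (illustrative)."""

    def __init__(self, capacity: int = 10000) -> None:
        self._capacity = capacity
        self._items: List[QueuedEvent] = []
        self._next_sequence = 1  # assumed starting value
        self._overflowed = False
        self._lock = threading.Lock()
        self.fault: Optional[str] = None  # first recorded fault wins

    def _record_fault_locked(self, category: str) -> None:
        if self.fault is None:
            self.fault = category

    def record_fault(self, category: str) -> None:
        with self._lock:
            self._record_fault_locked(category)

    def try_enqueue(self, payload: Any) -> bool:
        with self._lock:
            if self._overflowed or len(self._items) >= self._capacity:
                self._overflowed = True
                self._record_fault_locked("QueueOverflow")
                return False
            # Sequence and timestamp are assigned only when the event is accepted.
            event = QueuedEvent(self._next_sequence, datetime.now(timezone.utc), payload)
            self._next_sequence += 1
            self._items.append(event)
            return True

    def drain(self) -> List[QueuedEvent]:
        with self._lock:
            items, self._items = self._items, []
            return items


def handle_event(queue: BoundedEventQueue, convert: Callable[[Any], Any], raw_args: Any) -> None:
    """Event-handler body: convert and enqueue, never write to the pipe, never raise."""
    try:
        queue.try_enqueue(convert(raw_args))
    except Exception as exc:
        queue.record_fault(f"EventConversionFailed:{type(exc).__name__}")
```

An event whose conversion fails never receives a sequence number, so accepted events stay gap-free and in handler order.
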
## Command Queue

The pipe reader converts `WorkerCommand` messages into `StaCommand` entries.

Each entry should include:

- correlation id,
- method name,
- method-specific request payload,
- enqueue timestamp,
- cancellation marker,
- reply completion path.

The STA command dispatcher:

1. Dequeues one command.
2. Checks whether shutdown has started.
3. Calls the matching MXAccess method.
4. Captures return values, out parameters, status arrays, and HRESULT.
5. Converts results to `WorkerCommandReply`.
6. Enqueues the reply to the pipe writer.

The STA should execute one command at a time. MXAccess command ordering must be
preserved for one worker.

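The entry fields and dispatcher steps above can be sketched in Python as a single-threaded loop. `StaCommand`, `CommandReply`, `StaCommandLoop`, and the status strings are hypothetical. `WorkerUnavailable` mirrors the `ProtocolStatusCode` this document uses for commands rejected during shutdown. Replying `Canceled`, rather than dropping the command, is one of the two policies the cancellation rules allow. HRESULT capture is omitted to keep the control flow visible.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List


@dataclass
class StaCommand:
    correlation_id: str
    method: str
    request: Any
    enqueued_at: float = field(default_factory=time.monotonic)
    canceled: bool = False  # set by the pipe reader when WorkerCancel arrives


@dataclass
class CommandReply:
    correlation_id: str
    method: str
    status: str
    return_value: Any = None


class StaCommandLoop:
    """Runs one command at a time, in arrival order, on the calling (STA) thread."""

    def __init__(self, handlers: Dict[str, Callable[[Any], Any]]) -> None:
        self._handlers = handlers  # one explicit handler per MXAccess method
        self._queue: Deque[StaCommand] = deque()
        self.shutdown_started = False
        self.outbound: List[CommandReply] = []  # stands in for the pipe writer queue

    def enqueue(self, command: StaCommand) -> None:
        self._queue.append(command)

    def run_pending(self) -> None:
        while self._queue:
            self._dispatch(self._queue.popleft())

    def _dispatch(self, cmd: StaCommand) -> None:
        def reply(status: str, value: Any = None) -> None:
            self.outbound.append(CommandReply(cmd.correlation_id, cmd.method, status, value))

        if self.shutdown_started:
            reply("WorkerUnavailable")
        elif cmd.canceled:
            reply("Canceled")
        elif cmd.method not in self._handlers:
            reply("UnknownMethod")
        else:
            reply("Ok", self._handlers[cmd.method](cmd.request))
```
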
## Command Dispatch Surface

Phase 1 commands:

- `Register`
- `Unregister`
- `AddItem`
- `RemoveItem`

Phase 2 event commands:

- `Advise`
- `UnAdvise`
- `AdviseSupervisory`

Full surface:

- `AddItem2`
- `AddBufferedItem`
- `SetBufferedUpdateInterval`
- `Suspend`
- `Activate`
- `Write`
- `Write2`
- `WriteSecured`
- `WriteSecured2`
- `AuthenticateUser`
- `ArchestrAUserToId`

Diagnostics:

- `Ping`
- `GetSessionState`
- `GetWorkerInfo`
- `DrainEvents`
- `ShutdownWorker`

Implement method-specific dispatch instead of a generic string method invoker.
Parity tests need stable command-specific request and reply shapes.

`MxAccessCommandExecutor` implements the first command pair:

- `Register` calls `LMXProxyServerClass.Register` with the requested client
  name and preserves the returned server handle in both `ReturnValue` and
  `RegisterReply.ServerHandle`.
- `Unregister` calls `LMXProxyServerClass.Unregister` with the requested server
  handle. The reply has no method-specific payload because the public MXAccess
  method returns `void`.

Both commands set `Hresult` to `0` only after the COM call returns normally.
COM exceptions flow through `StaCommandDispatcher`, which captures the thrown
HRESULT and converts the reply to `ProtocolStatusCode.MxaccessFailure`.
`MxAccessStaSession.GetRegisteredServerHandlesAsync` returns an STA-read
snapshot of tracked server handles for diagnostics and future cleanup logic.

`MxAccessCommandExecutor` also implements the item lifecycle commands:

- `AddItem` calls `LMXProxyServerClass.AddItem` with the requested server
  handle and item definition. It preserves the returned item handle in both
  `ReturnValue` and `AddItemReply.ItemHandle`.
- `AddItem2` calls `LMXProxyServerClass.AddItem2` with the requested server
  handle, item definition, and context string. The context string is passed to
  MXAccess exactly as received.
- `RemoveItem` calls `LMXProxyServerClass.RemoveItem` with the requested server
  handle and item handle. The reply has no method-specific payload because the
  public MXAccess method returns `void`.

The worker records item handles only after `AddItem` or `AddItem2` returns
normally, and removes item handles only after `RemoveItem` returns normally.
The registry does not prevalidate server or item handles, so invalid and
cross-server handle behavior remains owned by MXAccess. COM exceptions continue
through `StaCommandDispatcher`, which preserves the HRESULT and leaves
diagnostic registry state unchanged for failed cleanup calls.

`MxAccessCommandExecutor` implements advice lifecycle commands on the same STA
path:

- `Advise` calls `LMXProxyServerClass.Advise` with the requested server handle
  and item handle.
- `AdviseSupervisory` calls `LMXProxyServerClass.AdviseSupervisory` with the
  requested server handle and item handle. This remains a distinct command from
  plain `Advise` even though observed scalar captures share the same lower-level
  subscription body.
- `UnAdvise` calls `LMXProxyServerClass.UnAdvise` with the requested server
  handle and item handle.

The worker records plain and supervisory advice separately only after the COM
call returns normally. Successful `UnAdvise` removes all tracked advice for the
server and item pair because the public MXAccess cleanup method has no plain
versus supervisory selector. Successful `RemoveItem` and `Unregister` also clear
related advice state from the worker registry. Failed advice and cleanup calls
leave registry state unchanged so diagnostics continue to reflect the last
successful MXAccess-owned state transition.

## Handle Registry

The worker should track MXAccess state for diagnostics and cleanup, while still
treating MXAccess as the authority.

Suggested tracked state:

- registered server handles,
- item handles,
- item names and context,
- server handle for each item,
- advise state,
- buffered item state,
- authenticated user ids if needed,
- last command touching each handle.

Rules:

- Do not invent handles.
- Do not rewrite handles returned by MXAccess.
- Record server handles only after `Register` succeeds.
- Remove server handles only after `Unregister` succeeds.
- Record item handles only after `AddItem` or `AddItem2` succeeds.
- Remove item handles only after `RemoveItem` succeeds.
- Record advice state only after `Advise` or `AdviseSupervisory` succeeds.
- Remove advice state only after `UnAdvise`, `RemoveItem`, or `Unregister`
  succeeds.
- Preserve invalid-handle behavior from MXAccess.
- Preserve cross-server handle behavior from MXAccess.
- Use registry state for cleanup and diagnostics, not semantic correction.

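The registry rules can be sketched in Python. `HandleRegistry` and `call_and_record` are hypothetical names, and handles are modeled as integers. Items are keyed by `(server, item)` so that equal item-handle values under different servers stay distinct. Advice is tracked per `(server, item, kind)` so plain and supervisory advice are recorded separately. Nothing is validated: the registry only mirrors transitions that MXAccess already accepted.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[int, int]  # (server handle, item handle)


@dataclass
class HandleRegistry:
    """Diagnostic mirror of MXAccess state; MXAccess stays the authority."""

    servers: Set[int] = field(default_factory=set)
    items: Set[Pair] = field(default_factory=set)
    advised: Set[Tuple[int, int, str]] = field(default_factory=set)  # (server, item, kind)

    # Each on_* method must run only after the matching COM call returned normally.
    def on_register(self, server: int) -> None:
        self.servers.add(server)

    def on_unregister(self, server: int) -> None:
        self.servers.discard(server)
        self.advised = {a for a in self.advised if a[0] != server}

    def on_add_item(self, server: int, item: int) -> None:
        self.items.add((server, item))  # no prevalidation of the server handle

    def on_remove_item(self, server: int, item: int) -> None:
        self.items.discard((server, item))
        self._clear_advice(server, item)

    def on_advise(self, server: int, item: int, supervisory: bool = False) -> None:
        self.advised.add((server, item, "supervisory" if supervisory else "plain"))

    def on_unadvise(self, server: int, item: int) -> None:
        # UnAdvise has no plain/supervisory selector, so it clears both kinds.
        self._clear_advice(server, item)

    def _clear_advice(self, server: int, item: int) -> None:
        self.advised = {a for a in self.advised if (a[0], a[1]) != (server, item)}


def call_and_record(com_call: Callable[[], T], record: Callable[[T], None]) -> T:
    """Record state only if the COM call returns; an exception leaves the registry untouched."""
    result = com_call()
    record(result)
    return result
```

`on_unregister` does not drop item handles, because the rules allow only `RemoveItem` to remove them.
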
## Value Conversion

`VariantConverter` should convert COM values into the protobuf `MxValue` union.

Supported scalar projections:

- bool,
- int32,
- int64,
- float,
- double,
- string,
- timestamp,
- raw fallback.

Supported arrays:

- bool array,
- int32 array,
- float array,
- double array,
- string array,
- timestamp array,
- raw fallback.

Rules:

- Preserve null and empty values distinctly when MXAccess exposes a distinction.
- Preserve array rank and dimensions when available.
- Preserve original variant type metadata.
- If conversion is lossy, include the best typed value plus raw diagnostic
  metadata.
- Do not throw away values just because they are awkward.

Credential-bearing values must not be logged.

## Status And HRESULT Capture

`MXSTATUS_PROXY` arrays must be represented explicitly. Do not collapse status
arrays into a single success flag.

For every command reply, capture:

- protocol success/failure,
- method name,
- correlation id,
- COM HRESULT if available,
- thrown exception HRESULT if available,
- MXAccess return value if any,
- method-specific out parameters,
- status array,
- diagnostic message safe for logs.

If a COM call throws, map the exception into a command reply instead of
crashing the worker, unless the exception indicates process corruption or the
configured policy says to fail the session.

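Here is a Python sketch of exception-to-reply mapping. `ComError` is a hypothetical stand-in for a COM exception carrying an HRESULT, and `execute_command`, `CommandReply`, and the status strings are illustrative. The sketch normalizes the HRESULT to the signed 32-bit form .NET reports (`0x80004005`, `E_FAIL`, becomes `-2147467259`). It builds the log-safe diagnostic from the exception type and HRESULT only, because exception messages can echo request data such as credentials. The process-corruption and fail-session policy checks are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ComError(Exception):
    """Hypothetical stand-in for a COM exception that exposes an HRESULT."""

    def __init__(self, hresult: int, message: str) -> None:
        super().__init__(message)
        self.hresult = hresult


def to_signed_hresult(value: int) -> int:
    """Normalise to the signed 32-bit form, e.g. 0x80004005 -> -2147467259."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


@dataclass
class CommandReply:
    correlation_id: str
    method: str
    protocol_status: str
    hresult: Optional[int] = None
    return_value: Any = None
    diagnostic: str = ""


def execute_command(correlation_id: str, method: str, com_call: Callable[[], Any]) -> CommandReply:
    try:
        value = com_call()
    except Exception as exc:  # process-corruption / fail-session policy hook omitted
        raw = getattr(exc, "hresult", None)
        hresult = None if raw is None else to_signed_hresult(raw)
        # Log-safe diagnostic: exception type and HRESULT only, never the message text.
        diagnostic = type(exc).__name__
        if hresult is not None:
            diagnostic += f" HRESULT=0x{hresult & 0xFFFFFFFF:08X}"
        return CommandReply(correlation_id, method, "MxaccessFailure", hresult, None, diagnostic)
    # HRESULT 0 is reported only after the call returned normally.
    return CommandReply(correlation_id, method, "Ok", 0, value)
```
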
## Cancellation

Worker cancellation is cooperative at the queue boundary.

Rules:

- If a `WorkerCancel` arrives before a command starts, mark the command
  canceled and reply or drop according to protocol policy.
- If a command is already executing on the STA, do not attempt to abort the COM
  call.
- When the COM call returns after gateway cancellation, send the reply only if
  the gateway still wants late replies; otherwise log and discard.
- Hard cancellation is process kill by the gateway.

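On the worker side, the cancellation rules reduce to bookkeeping. In this Python sketch, `CancellationTracker`, `CancelOutcome`, and the `send_late_replies` flag are hypothetical. The flag stands in for whatever protocol setting tells the worker whether the gateway still wants late replies. The tracker never interrupts a running call: cancelling an executing command only changes whether its eventual reply is written.

```python
import threading
from enum import Enum
from typing import Optional, Set


class CancelOutcome(Enum):
    CANCELED_BEFORE_START = "canceled_before_start"
    ALREADY_EXECUTING = "already_executing"
    UNKNOWN = "unknown"


class CancellationTracker:
    """Worker-side cancellation bookkeeping; a running COM call is never aborted."""

    def __init__(self, send_late_replies: bool) -> None:
        self._send_late_replies = send_late_replies  # hypothetical gateway setting
        self._lock = threading.Lock()  # pipe reader and STA both touch this state
        self._queued: Set[str] = set()
        self._executing: Optional[str] = None
        self._cancel_requested: Set[str] = set()

    def on_enqueued(self, cid: str) -> None:
        with self._lock:
            self._queued.add(cid)

    def on_cancel(self, cid: str) -> CancelOutcome:
        """Called by the pipe reader when WorkerCancel arrives."""
        with self._lock:
            if cid in self._queued:
                self._cancel_requested.add(cid)
                return CancelOutcome.CANCELED_BEFORE_START
            if cid == self._executing:
                self._cancel_requested.add(cid)
                return CancelOutcome.ALREADY_EXECUTING
            return CancelOutcome.UNKNOWN

    def on_started(self, cid: str) -> bool:
        """Called on the STA before the COM call; False means skip the call."""
        with self._lock:
            self._queued.discard(cid)
            if cid in self._cancel_requested:
                self._cancel_requested.discard(cid)
                return False
            self._executing = cid
            return True

    def on_finished(self, cid: str) -> bool:
        """Called on the STA after the COM call; True means write the reply."""
        with self._lock:
            self._executing = None
            late = cid in self._cancel_requested
            self._cancel_requested.discard(cid)
            return self._send_late_replies or not late
```
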
## Outbound Queues

The worker should use bounded outbound queues for replies, events, heartbeats,
and faults.

Priority order when writing:

1. faults,
2. command replies,
3. shutdown acknowledgements,
4. heartbeats,
5. events.

Event overflow policy defaults to fail-fast for parity testing. If the event
queue fills:

1. Capture overflow metrics.
2. Send `WorkerFault` if possible.
3. Stop accepting new commands.
4. Let the gateway close or kill the worker.

Production coalescing may be added later, but it must be explicit and tested.
Do not drop or coalesce events in v1.

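One way to honour the write priority and the fail-fast overflow steps is a single heap keyed by `(priority, arrival order)`, sketched here in Python. `OutboundQueues` and `OutboundKind` are hypothetical names. Only the event lane is bounded in this sketch, and the overflow fault payload is a placeholder string. The arrival counter keeps ordering FIFO within one priority, so events still leave in worker-sequence order.

```python
import heapq
import itertools
from enum import IntEnum
from typing import Any, List, Optional, Tuple


class OutboundKind(IntEnum):
    # Lower value = written first.
    FAULT = 0
    COMMAND_REPLY = 1
    SHUTDOWN_ACK = 2
    HEARTBEAT = 3
    EVENT = 4


class OutboundQueues:
    def __init__(self, event_capacity: int) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = itertools.count()  # FIFO tie-breaker within one priority
        self._event_capacity = event_capacity
        self._event_count = 0
        self.overflowed = False
        self.overflow_count = 0  # step 1: overflow metrics

    def enqueue(self, kind: OutboundKind, frame: Any) -> bool:
        if kind is OutboundKind.EVENT:
            if self.overflowed or self._event_count >= self._event_capacity:
                self.overflow_count += 1
                if not self.overflowed:
                    self.overflowed = True  # step 3: accepting_commands() turns False
                    self.enqueue(OutboundKind.FAULT, "QueueOverflow")  # step 2
                return False  # rejected; the fault tells the gateway to end the session
            self._event_count += 1
        heapq.heappush(self._heap, (int(kind), next(self._arrival), frame))
        return True

    def accepting_commands(self) -> bool:
        return not self.overflowed

    def next_frame(self) -> Optional[Any]:
        """Pop the highest-priority frame for the pipe writer, or None if idle."""
        if not self._heap:
            return None
        kind, _, frame = heapq.heappop(self._heap)
        if kind == OutboundKind.EVENT:
            self._event_count -= 1
        return frame
```

A strict priority writer can starve events under a sustained flood of higher-priority frames. Under the fail-fast policy, that starvation surfaces as an event overflow fault rather than as silent loss.
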
## Heartbeat And Watchdog

`WorkerPipeSession` starts the heartbeat loop after the gateway validates
`WorkerHello` and receives `WorkerReady`. Heartbeats continue until
`WorkerShutdown`, cancellation, or a pipe/protocol failure stops the session.
The loop uses `WorkerPipeSessionOptions.HeartbeatInterval`; the default matches
the gateway worker heartbeat interval.

The worker heartbeat proves that:

- pipe writer is alive,
- worker host is alive,
- STA has recently pumped or completed work.

Heartbeat payload includes:

- worker process id,
- session id,
- current state,
- last STA activity timestamp,
- pending command count,
- outbound event queue depth,
- event sequence,
- current command correlation id if any.

`MxAccessStaSession.CaptureHeartbeat()` reads `StaRuntime.LastActivityUtc` and
`StaCommandDispatcher` queue state without touching the raw MXAccess COM object
outside the STA. Event queue depth and event sequence are reported as zero until
the event queue implementation owns those counters.

The STA watchdog currently emits a `WorkerFault` with
`WorkerFaultCategory.StaHung` when `LastStaActivityUtc` is older than
`WorkerPipeSessionOptions.HeartbeatGrace` **and no command is in flight**.
`StaRuntime.ProcessQueuedCommands` calls `MarkActivity()` only immediately
before and after each work item, so a synchronously long-running STA command
(for example a `ReadBulk` waiting `timeout_ms` for the first `OnDataChange`)
legitimately freezes `LastStaActivityUtc` for the duration of the wait while
the worker is healthy. The watchdog is therefore suppressed while the
heartbeat snapshot's `CurrentCommandCorrelationId` is non-empty: the worker is
busy executing a command, not hung, and the heartbeat already surfaces the
in-flight correlation id so the gateway can apply its own per-command timeout
if it considers the command too slow. The fault still fires on a truly hung
STA (no command in flight and no activity for longer than `HeartbeatGrace`),
which is the only case the watchdog can usefully distinguish from a slow
command. Command duration and high event queue depth remain observable through
heartbeat fields until dedicated thresholds own those warnings. The worker
reports stale STA activity, but the gateway owns the final kill decision
through its existing heartbeat and worker lifecycle policy.

The in-flight-command suppression itself is bounded by
|
||
`WorkerPipeSessionOptions.HeartbeatStuckCeiling` (default 75 seconds = 5 ×
|
||
`HeartbeatGrace`). The motivating case for the suppression is a legitimately
|
||
slow synchronous command — but a genuinely stuck COM call (for example
|
||
against a dead MXAccess provider whose cross-apartment marshaler is
|
||
permanently blocked, or a write completion that never fires) leaves
|
||
`CurrentCommandCorrelationId` non-empty indefinitely. Without an upper bound
|
||
the worker-side `StaHung` watchdog would be permanently defeated for that
|
||
session and only the gateway's per-command timeout would catch the hang —
|
||
losing the worker-originated diagnostic (`StaHung` fault category, the
|
||
stale-by interval) from the gateway audit trail. Once `LastStaActivityUtc`
|
||
has been stale for longer than `HeartbeatStuckCeiling`, the watchdog fires
|
||
`StaHung` regardless of whether a command is in flight, on the assumption
|
||
that no legitimate STA command should run that long without periodically
|
||
refreshing activity. Deployments that legitimately run very long bulk
|
||
operations should raise the ceiling rather than disable it.
|
||
|
||
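The two-tier watchdog rule above reduces to a small pure decision function. This is a minimal Python sketch, not the worker's C# implementation: the name `is_sta_hung` is hypothetical, "longer than" is read as strictly greater, and the 15-second `HeartbeatGrace` default is inferred only from the stated 75-second ceiling being 5 × `HeartbeatGrace`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed defaults: the ceiling is stated as 75 s = 5 x HeartbeatGrace,
# which implies a 15 s grace period.
DEFAULT_GRACE = timedelta(seconds=15)
DEFAULT_STUCK_CEILING = 5 * DEFAULT_GRACE


def is_sta_hung(
    now: datetime,
    last_sta_activity_utc: datetime,
    current_command_correlation_id: Optional[str],
    grace: timedelta = DEFAULT_GRACE,
    stuck_ceiling: timedelta = DEFAULT_STUCK_CEILING,
) -> bool:
    """Return True when the watchdog should report StaHung."""
    stale_by = now - last_sta_activity_utc
    if stale_by > stuck_ceiling:
        # Hard ceiling: fire even though a command claims to be in flight.
        return True
    # Below the ceiling, a non-empty correlation id suppresses the fault.
    command_in_flight = bool(current_command_correlation_id)
    return stale_by > grace and not command_in_flight
```

Checking the ceiling first makes the bound unconditional: once `stale_by` passes it, no value of `CurrentCommandCorrelationId` can keep the fault from firing.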

## Shutdown

Graceful shutdown sequence:

1. Pipe reader receives `WorkerShutdown`.
2. Worker host marks shutdown requested.
3. Reject new commands.
4. Let the current STA command finish if it completes within the timeout.
5. Optionally run MXAccess cleanup:
   - `UnAdvise`,
   - `RemoveItem`,
   - `Unregister`.
6. Detach event handlers.
7. Release the COM object until its reference count reaches zero, when
   possible.
8. Stop the pipe reader and writer.
9. Exit the process with a success code.

If shutdown wedges, the gateway kills the process. The worker should be written
so that a process kill does not corrupt other sessions.

`MxAccessStaSession.ShutdownGracefullyAsync` implements the current cleanup
path. It first calls `StaCommandDispatcher.RequestShutdown()` so new commands
are rejected and queued commands that have not started receive
`ProtocolStatusCode.WorkerUnavailable`. The command already executing on the
STA is allowed to finish until the shutdown grace period expires.

After command dispatch is closed, cleanup runs on the STA in MXAccess handle
order:

1. one `UnAdvise` call per advised server/item pair,
2. `RemoveItem` for active item handles,
3. `Unregister` for active server handles,
4. event sink detach,
5. COM release.

Each cleanup call is best effort. A failed cleanup operation is recorded as an
`MxAccessShutdownFailure`, logged by `WorkerPipeSession`, and does not prevent
later cleanup calls from running. A shutdown with cleanup failures still returns
`WorkerShutdownAck` with `ProtocolStatusCode.Ok` because the worker reached the
controlled release path. If the grace period expires before cleanup can run or
finish, the worker reports `WorkerFaultCategory.ShutdownTimeout` when possible
and relies on the gateway to kill the process.
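The dispatcher hand-off and the best-effort cleanup described above can be sketched together. This is a minimal, single-threaded Python sketch under stated assumptions: `Dispatcher`, `Command`, `ShutdownFailure`, and `run_cleanup` are hypothetical stand-ins for `StaCommandDispatcher` and `MxAccessShutdownFailure`, status codes are plain strings, and neither the command already running on the STA nor the shutdown grace-period timer is modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Sequence, Tuple

WORKER_UNAVAILABLE = "WorkerUnavailable"  # stands in for ProtocolStatusCode.WorkerUnavailable
OK = "Ok"                                 # stands in for ProtocolStatusCode.Ok


@dataclass
class Command:
    correlation_id: str
    run: Callable[[], None]
    status: Optional[str] = None


@dataclass
class Dispatcher:
    """Hypothetical single-threaded stand-in for StaCommandDispatcher."""
    pending: Deque[Command] = field(default_factory=deque)
    shutdown_requested: bool = False

    def submit(self, command: Command) -> bool:
        # New commands are rejected once shutdown has been requested.
        if self.shutdown_requested:
            return False
        self.pending.append(command)
        return True

    def request_shutdown(self) -> List[Command]:
        # Queued commands that never started are failed with WorkerUnavailable.
        self.shutdown_requested = True
        rejected = list(self.pending)
        self.pending.clear()
        for command in rejected:
            command.status = WORKER_UNAVAILABLE
        return rejected


@dataclass(frozen=True)
class ShutdownFailure:
    """Hypothetical stand-in for MxAccessShutdownFailure."""
    step: str
    error_type: str


def run_cleanup(
    steps: Sequence[Tuple[str, Callable[[], None]]],
) -> Tuple[str, List[ShutdownFailure]]:
    """Run every cleanup step in order; one failure never skips later steps."""
    failures: List[ShutdownFailure] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # best effort: record the failure and continue
            failures.append(ShutdownFailure(name, type(exc).__name__))
    # Reaching the controlled release path still acknowledges with Ok.
    return OK, failures
```

Because `run_cleanup` catches per step rather than around the whole loop, a failed `RemoveItem` still lets `Unregister` run, which is the property the best-effort rule depends on.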

## Fault Handling

Worker fault categories:

- `InvalidArguments`
- `GatewayAuthenticationFailed`
- `ProtocolMismatch`
- `ProtocolViolation`
- `PipeDisconnected`
- `MxAccessCreationFailed`
- `MxAccessCommandFailed`
- `MxAccessEventConversionFailed`
- `StaHung`
- `QueueOverflow`
- `ShutdownTimeout`

Fault payload should include:

- category,
- session id,
- correlation id when command-specific,
- command method when command-specific,
- HRESULT when available,
- exception type when available,
- safe diagnostic message.

Do not include raw credentials or full secured-write values.
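A fault payload carrying the fields listed above might look like the following Python sketch. The enum values mirror the category names listed above; the snake_case field names, the `to_payload` helper, and the unsigned hex rendering of the HRESULT are illustrative assumptions, not the worker's wire format.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class WorkerFaultCategory(Enum):
    INVALID_ARGUMENTS = "InvalidArguments"
    GATEWAY_AUTHENTICATION_FAILED = "GatewayAuthenticationFailed"
    PROTOCOL_MISMATCH = "ProtocolMismatch"
    PROTOCOL_VIOLATION = "ProtocolViolation"
    PIPE_DISCONNECTED = "PipeDisconnected"
    MX_ACCESS_CREATION_FAILED = "MxAccessCreationFailed"
    MX_ACCESS_COMMAND_FAILED = "MxAccessCommandFailed"
    MX_ACCESS_EVENT_CONVERSION_FAILED = "MxAccessEventConversionFailed"
    STA_HUNG = "StaHung"
    QUEUE_OVERFLOW = "QueueOverflow"
    SHUTDOWN_TIMEOUT = "ShutdownTimeout"


@dataclass(frozen=True)
class WorkerFault:
    category: WorkerFaultCategory
    session_id: str
    message: str                          # must already be safe: no credentials or secured-write values
    correlation_id: Optional[str] = None  # command-specific faults only
    command_method: Optional[str] = None  # command-specific faults only
    hresult: Optional[int] = None         # signed 32-bit value as COM reports it
    exception_type: Optional[str] = None

    def to_payload(self) -> Dict[str, Any]:
        payload: Dict[str, Any] = asdict(self)
        payload["category"] = self.category.value
        if self.hresult is not None:
            # Render as the conventional unsigned 0xXXXXXXXX form.
            payload["hresult"] = f"0x{self.hresult & 0xFFFFFFFF:08X}"
        # Omit fields that do not apply to this fault.
        return {k: v for k, v in payload.items() if v is not None}
```

The payload type cannot tell a safe message from an unsafe one, so the redaction rule has to be enforced wherever `message` is built.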

## Security

The worker should trust only the launching gateway after validating:

- expected session id,
- expected protocol version,
- nonce,
- pipe identity where available.

It should not expose any network listener. It should not accept commands from
arbitrary local processes.

Credential-bearing commands must keep credential data out of:

- command line,
- logs,
- metrics labels,
- exception messages,
- crash dumps when avoidable.
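The trust checks listed at the top of this section can be sketched as one validation function. This is a minimal Python sketch under assumptions: the `GatewayHello` shape, the integer protocol version, and the mapping of failures onto the `GatewayAuthenticationFailed` and `ProtocolMismatch` categories are hypothetical, and pipe identity is omitted because checking it is platform-specific.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayHello:
    """Hypothetical shape of the gateway's hello frame."""
    session_id: str
    protocol_version: int
    nonce: bytes


def validate_hello(
    hello: GatewayHello,
    expected_session_id: str,
    expected_protocol_version: int,
    expected_nonce: bytes,
) -> Optional[str]:
    """Return None if the gateway is trusted, otherwise a fault category name."""
    if hello.session_id != expected_session_id:
        return "GatewayAuthenticationFailed"
    if hello.protocol_version != expected_protocol_version:
        return "ProtocolMismatch"
    # Constant-time comparison so response timing does not reveal
    # how many leading bytes of the secret nonce matched.
    if not hmac.compare_digest(hello.nonce, expected_nonce):
        return "GatewayAuthenticationFailed"
    return None
```

Only the nonce is secret, so it is the only field compared with `hmac.compare_digest`; the session id and version are plain equality checks.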

## Observability

Worker logs should include:

- startup arguments except secrets,
- protocol version,
- gateway handshake result,
- MXAccess COM creation result,
- command start/end with correlation id,
- HRESULT/status summary,
- event family and sequence,
- queue overflow,
- STA watchdog warnings,
- shutdown path.

Metrics can be emitted through the gateway or exposed as worker heartbeat
fields. The worker does not need its own public metrics endpoint.
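Command start/end logging with a correlation id, combined with the credential rules from the Security section, could be wrapped in a single scope. This is a minimal Python sketch using the standard `logging` module; `command_scope`, `redact`, and the `SECRET_ARG_NAMES` set are hypothetical, and the real worker's log format and secret list may differ.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Mapping

# Hypothetical argument names treated as secrets.
SECRET_ARG_NAMES = frozenset({"password", "credential", "secured_value"})


def redact(args: Mapping[str, object]) -> Dict[str, object]:
    """Replace secret argument values before they can reach a log line."""
    return {k: ("***" if k.lower() in SECRET_ARG_NAMES else v) for k, v in args.items()}


@contextmanager
def command_scope(
    logger: logging.Logger, correlation_id: str, method: str, args: Mapping[str, object]
) -> Iterator[Dict[str, str]]:
    """Log command start/end with the correlation id and a status summary."""
    result = {"status": "Ok"}
    logger.info("command start correlation_id=%s method=%s args=%s",
                correlation_id, method, redact(args))
    started = time.monotonic()
    try:
        yield result  # the caller may overwrite result["status"], e.g. with an HRESULT
    except Exception as exc:
        result["status"] = type(exc).__name__  # exception type only, never its message
        raise
    finally:
        elapsed_ms = (time.monotonic() - started) * 1000.0
        logger.info("command end correlation_id=%s method=%s status=%s elapsed_ms=%.1f",
                    correlation_id, method, result["status"], elapsed_ms)
```

Logging only the exception type on failure keeps a credential that leaked into an exception message out of the log, matching the rule that exception messages must not carry credential data.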

## Testing Strategy

Worker tests that do not require installed MXAccess:

- frame reader/writer,
- protocol validation,
- command queue ordering,
- STA command scheduling with a fake COM object,
- message-pump wake behavior where practical,
- value conversion,
- status conversion,
- event conversion from fake event args,
- shutdown state transitions,
- queue overflow behavior.

Live MXAccess tests:

- COM creation on STA,
- `Register` and `Unregister`,
- `AddItem` and `RemoveItem`,
- `Advise` and one `OnDataChange`,
- write completion behavior,
- secured write behavior,
- buffered data-change behavior,
- invalid handle behavior,
- no synthesized `OperationComplete` when native MXAccess does not raise it,
- raw metadata preservation for buffered payloads that cannot yet be fully
  converted.

Live tests should be opt-in and clearly marked because they depend on installed
MXAccess COM and provider state. The worker test suite enables these tests only
when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`. `AddItem` uses
`TestChildObject.TestInt` by default and accepts an override through
`MXGATEWAY_LIVE_MXACCESS_ITEM`; `AddItem2` uses the captured parity fixture
shape `AddItem2("TestInt", "TestChildObject")`.

`WorkerLiveMxAccessSmokeTests` in `src/MxGateway.IntegrationTests/` uses the
same opt-in variable for the gateway-to-worker live smoke. It launches the x86
worker through `WorkerProcessLauncher`, opens a gateway session, runs
`Register`, `AddItem`, and `Advise`, waits for one `OnDataChange`, and closes
the session. The smoke accepts `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` for a
non-default worker executable path and
`MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS` for the bounded event wait.
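Reading the opt-in variables in one place keeps every live test on the same gating rule. This is a minimal Python sketch, not the .NET suites' code: `live_settings` and `LiveTestSettings` are hypothetical names, the flag is assumed to count only when it is exactly `1`, and the timeout is left unset when absent because the suites' default wait is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_LIVE_ITEM = "TestChildObject.TestInt"


@dataclass(frozen=True)
class LiveTestSettings:
    enabled: bool
    item: str
    worker_exe: Optional[str]
    event_timeout_seconds: Optional[float]


def live_settings(env: Mapping[str, str] = os.environ) -> LiveTestSettings:
    """Collect the live-MXAccess opt-in and override variables."""
    timeout = env.get("MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS")
    return LiveTestSettings(
        # Assumption: only the exact value "1" opts in.
        enabled=env.get("MXGATEWAY_RUN_LIVE_MXACCESS_TESTS") == "1",
        item=env.get("MXGATEWAY_LIVE_MXACCESS_ITEM") or DEFAULT_LIVE_ITEM,
        worker_exe=env.get("MXGATEWAY_LIVE_MXACCESS_WORKER_EXE") or None,
        # None means "use the suite's own default wait".
        event_timeout_seconds=float(timeout) if timeout else None,
    )
```

A pytest-based harness could then decorate live tests with `pytest.mark.skipif(not live_settings().enabled, reason="live MXAccess tests are opt-in")`, so they are reported as skipped rather than failed on machines without MXAccess.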

## Initial Implementation Slice

The first worker slice should implement:

1. Argument parsing and pipe connection.
2. Protocol hello and nonce validation.
3. STA thread startup.
4. COM initialization and MXAccess object creation.
5. Message pump with command wake event.
6. `WorkerReady`.
7. Shutdown command.
8. `Register`, `AddItem`, and `Advise`.
9. Event sink for one `OnDataChange`.
10. Basic value/status conversion.
11. Event model coverage for `OperationComplete` and `OnBufferedDataChange`
    without synthesized events.
12. Fault reporting.

This slice proves the worker can preserve the core MXAccess requirements:
single-process isolation, STA ownership, message pumping, command execution,
and event delivery.
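Steps 3 and 5 of the slice (a dedicated STA thread that sleeps until a command wake event fires) pair with the "STA command scheduling with a fake COM object" test listed under the testing strategy. In this Python sketch a dedicated thread blocking on `queue.Queue.get()` stands in for the message pump waiting on the wake event, and `FakeComObject` and `StaScheduler` are hypothetical; a real STA would also pump Windows messages between commands, which this sketch does not do.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple


class FakeComObject:
    """Records each call and the id of the thread that made it."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int]] = []

    def invoke(self, method: str) -> None:
        self.calls.append((method, threading.get_ident()))


class StaScheduler:
    """One dedicated thread drains a FIFO queue, mimicking STA ownership."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            work: Optional[Callable[[], None]] = self._queue.get()  # blocks like a wait on the wake event
            if work is None:  # sentinel: stop the loop
                return
            work()

    def post(self, work: Callable[[], None]) -> None:
        self._queue.put(work)

    def stop(self) -> None:
        self._queue.put(None)
        self._thread.join()

    @property
    def thread_id(self) -> Optional[int]:
        return self._thread.ident
```

A test can post `Register`, `AddItem`, and `Advise` from the test thread, stop the scheduler, and then assert both FIFO order and that every recorded call ran on the scheduler's thread rather than the caller's.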

## Related Documentation

- [Worker Bootstrap](./WorkerBootstrap.md)
- [Worker STA](./WorkerSta.md)
- [Worker Conversion](./WorkerConversion.md)
- [Worker Frame Protocol](./WorkerFrameProtocol.md)
- [Worker Process Launcher](./WorkerProcessLauncher.md)
- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
- [Design Decisions](./DesignDecisions.md)