# MXAccess Worker Instance Detailed Design

## Purpose

An MXAccess worker instance is the compatibility boundary around one installed
MXAccess COM object. It runs as a disposable .NET Framework 4.8 x86 process,
owns one dedicated STA thread, pumps Windows/COM messages, executes MXAccess
commands on that STA, and forwards MXAccess events back to the gateway.

The worker's job is not to make MXAccess nicer. Its job is to preserve direct
MXAccess behavior while making that behavior available to modern clients through
the gateway.

## Runtime

- Target runtime: .NET Framework 4.8.
- Language: C#.
- Platform target: x86 by default.
- Process lifetime: one worker per gateway session.
- Public network listeners: none.
- Gateway IPC: one named pipe with protobuf-framed messages.
- COM apartment: one dedicated STA thread.

Style guides:

- [C# Style Guide](./style-guides/CSharpStyleGuide.md)
- [Protobuf Style Guide](./style-guides/ProtobufStyleGuide.md)

## Build And Test

Build the SDK-style worker project with the .NET SDK MSBuild entry point. The
project targets .NET Framework 4.8, but the SDK resolver comes from the .NET SDK
installation:

```powershell
dotnet msbuild src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
```

`docs/ToolchainLinks.md` records the Visual Studio MSBuild executable for
classic .NET Framework and COM interop builds:

```powershell
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
```

Run the worker tests with the same platform target:

```powershell
dotnet test src\ZB.MOM.WW.MxGateway.Worker.Tests\ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86
```

The only MXAccess interop reference belongs in `ZB.MOM.WW.MxGateway.Worker`. Gateway and
test projects may reference the worker project for metadata and scaffold tests,
but they must not reference `ArchestrA.MXAccess.dll` directly.

## Responsibilities

The worker owns:

- connection to the gateway pipe,
- protocol hello and readiness reporting,
- STA thread creation and teardown,
- COM initialization on the STA,
- MXAccess COM object creation,
- MXAccess event sink wiring,
- command dispatch on the STA,
- MXAccess handle and advise state tracking,
- value/status/HRESULT capture,
- conversion to worker protobuf DTOs,
- event sequencing,
- heartbeat reporting,
- graceful shutdown.

The worker does not own:

- public gRPC API,
- client authentication,
- cross-session routing,
- worker process supervision,
- remote TLS,
- policy decisions for other sessions.

## Process Bootstrap

Expected command-line arguments:

```text
--session-id <sessionId>
--pipe-name <pipeName>
--protocol-version <version>
```

Expected protected environment values:

```text
MXGATEWAY_WORKER_NONCE=<random nonce>
MXGATEWAY_WORKER_LOG_CONTEXT=<optional context>
```

Startup sequence:

1. Parse command-line arguments.
2. Configure minimal logging.
3. Validate required values are present.
4. Connect to the gateway named pipe.
5. Exchange `WorkerHello` and `GatewayHello`.
6. Validate protocol version, session id, and nonce.
7. Start the STA runtime.
8. Create the MXAccess COM object on the STA.
9. Attach MXAccess event handlers on the STA.
10. Send `WorkerReady`.
11. Start pipe read, pipe write, heartbeat, and shutdown coordination loops.

If validation fails before MXAccess creation, exit quickly with a non-zero exit
code. If MXAccess creation fails, send `WorkerFault` when possible and exit.

The bootstrap layer returns structured exit codes before it creates pipes,
starts the STA, or touches MXAccess:

| Exit code | Name | Meaning |
|-----------|------|---------|
| `0` | `Success` | Required bootstrap options are valid. |
| `1` | `UnexpectedFailure` | A non-bootstrap exception reaches the process boundary. |
| `2` | `InvalidArguments` | Required arguments are missing or unknown arguments are present. |
| `3` | `InvalidProtocolVersion` | `--protocol-version` is not numeric or does not match the supported worker protocol. |
| `4` | `MissingNonce` | `MXGATEWAY_WORKER_NONCE` is absent or empty. |

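The exit-code table can be read as a small validation function. The following is a minimal sketch, not the worker's actual bootstrap code: `BootstrapSketch`, `Validate`, the check order, and the supported protocol version `1` are all assumptions made for illustration. Only the exit-code names and values come from the table.

```csharp
using System;
using System.Collections.Generic;

// Exit codes copied from the table above.
public enum WorkerExitCode
{
    Success = 0,
    UnexpectedFailure = 1,
    InvalidArguments = 2,
    InvalidProtocolVersion = 3,
    MissingNonce = 4
}

public static class BootstrapSketch
{
    // Assumption: the real supported version is defined elsewhere in the worker.
    public const int SupportedProtocolVersion = 1;

    // 'nonce' is a parameter so the sketch stays testable; the worker would
    // read it from the MXGATEWAY_WORKER_NONCE environment variable.
    public static WorkerExitCode Validate(string[] args, string nonce)
    {
        var required = new[] { "--session-id", "--pipe-name", "--protocol-version" };
        var values = new Dictionary<string, string>(StringComparer.Ordinal);

        // Arguments arrive as "--name value" pairs; unknown names are rejected.
        if (args.Length % 2 != 0) return WorkerExitCode.InvalidArguments;
        for (int i = 0; i < args.Length; i += 2)
        {
            if (Array.IndexOf(required, args[i]) < 0) return WorkerExitCode.InvalidArguments;
            values[args[i]] = args[i + 1];
        }

        foreach (string name in required)
        {
            string value;
            if (!values.TryGetValue(name, out value) || string.IsNullOrWhiteSpace(value))
                return WorkerExitCode.InvalidArguments;
        }

        int version;
        if (!int.TryParse(values["--protocol-version"], out version)
            || version != SupportedProtocolVersion)
            return WorkerExitCode.InvalidProtocolVersion;

        if (string.IsNullOrEmpty(nonce)) return WorkerExitCode.MissingNonce;

        return WorkerExitCode.Success;
    }
}
```

Because the function touches neither the pipe nor the STA, a caller can turn its result directly into the process exit code before any MXAccess resource exists.
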
Bootstrap logs use `WorkerConsoleLogger` key/value output. `WorkerLogRedactor`
redacts fields whose names indicate nonce, secret, password, token,
credential, or API key values before the message is written.

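Name-based redaction of key/value log fields might look like the sketch below. It is an illustration only: `LogRedactionSketch`, the `[redacted]` placeholder, the `key=value` output shape, and the rule of ignoring `_` and `-` when matching names are assumptions, not the behavior of `WorkerLogRedactor` or `WorkerConsoleLogger`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LogRedactionSketch
{
    // Name fragments taken from the paragraph above. Matching is
    // case-insensitive and ignores '_' and '-' (an assumption of this sketch).
    private static readonly string[] SensitiveFragments =
        { "nonce", "secret", "password", "token", "credential", "apikey" };

    public static bool IsSensitive(string fieldName)
    {
        string normalized = fieldName.Replace("_", "").Replace("-", "").ToLowerInvariant();
        foreach (string fragment in SensitiveFragments)
        {
            if (normalized.Contains(fragment)) return true;
        }
        return false;
    }

    // Formats fields as space-separated key=value pairs, redacting by field name.
    public static string Format(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var builder = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (builder.Length > 0) builder.Append(' ');
            builder.Append(field.Key).Append('=');
            builder.Append(IsSensitive(field.Key) ? "[redacted]" : field.Value);
        }
        return builder.ToString();
    }
}
```

Redacting by field name rather than by value keeps the check cheap and means a secret never has to be inspected to be hidden.
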
## Internal Components

```text
ZB.MOM.WW.MxGateway.Worker
  Program
  Bootstrap
    WorkerOptions
    WorkerHost
  Ipc
    PipeClient
    FrameReader
    FrameWriter
    WorkerProtocol
  Sta
    StaRuntime
    StaCommandQueue
    MessagePump
    StaWatchdog
  MxAccess
    MxAccessSession
    MxAccessCommandDispatcher
    MxAccessEventSink
    MxAccessHandleRegistry
  Conversion
    VariantConverter
    SafeArrayConverter
    StatusProxyConverter
    HResultMapper
```

## Threading Model

```text
main thread
  -> parse args
  -> configure host
  -> coordinate shutdown

pipe reader thread/task
  -> read WorkerEnvelope frames
  -> validate protocol
  -> enqueue commands or control messages

pipe writer thread/task
  -> serialize WorkerEnvelope frames
  -> write replies, events, heartbeats, faults

STA thread
  -> CoInitializeEx(COINIT_APARTMENTTHREADED)
  -> create MXAccess COM object
  -> attach event handlers
  -> pump Windows/COM messages
  -> execute queued commands
  -> detach events and release COM on shutdown

watchdog/heartbeat task
  -> observe STA responsiveness
  -> send heartbeat or fault
```

No MXAccess method may execute outside the STA thread. Do not use `Task.Run`
around COM calls. Do not let event handlers perform pipe writes.

## STA Runtime

The STA runtime is the most important part of the worker: every MXAccess call
runs on it, and MXAccess event handlers are expected to run on it.

Startup:

1. Create a dedicated `Thread`.
2. Set apartment state to `ApartmentState.STA`.
3. Start the thread.
4. Inside the thread, initialize COM.
5. Create the MXAccess COM object.
6. Attach event handlers.
7. Signal ready to the worker host.
8. Enter the message pump.

Shutdown:

1. Mark the command queue as completing.
2. Drain or reject pending commands according to shutdown mode.
3. Optionally issue MXAccess cleanup calls for active handles.
4. Detach event handlers.
5. Release COM references.
6. Uninitialize COM.
7. Exit the thread.

## Message Pump

The STA must pump Windows messages while also processing queued commands. A
blocking queue that prevents message pumping is not acceptable.

Required loop shape:

```text
while not shutdown:
    while command queue has work:
        execute one command on STA

    MsgWaitForMultipleObjectsEx(
        command_event,
        timeout,
        QS_ALLINPUT,
        MWMO_INPUTAVAILABLE)

    while PeekMessage:
        TranslateMessage
        DispatchMessage
```

The command queue should signal a Win32 event or equivalent wait handle so the
STA can wake without busy-waiting.

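The loop shape above maps onto Win32 roughly as in the following sketch. It is not the worker's `StaMessagePump`: `StaPumpSketch`, `Run`, and its parameters are hypothetical names, and error handling for a failed wait is omitted. The `user32.dll` functions and constant values are the documented Win32 ones. The code is Windows-only and must run on the STA thread.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public static class StaPumpSketch
{
    public const uint QsAllInput = 0x04FF;          // QS_ALLINPUT
    public const uint MwmoInputAvailable = 0x0004;  // MWMO_INPUTAVAILABLE
    public const uint PmRemove = 0x0001;            // PM_REMOVE

    [StructLayout(LayoutKind.Sequential)]
    private struct Msg
    {
        public IntPtr Hwnd;
        public uint Message;
        public IntPtr WParam;
        public IntPtr LParam;
        public uint Time;
        public int PointX;
        public int PointY;
    }

    [DllImport("user32.dll", SetLastError = true)]
    private static extern uint MsgWaitForMultipleObjectsEx(
        uint count, IntPtr[] handles, uint milliseconds, uint wakeMask, uint flags);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool PeekMessage(
        out Msg msg, IntPtr hwnd, uint filterMin, uint filterMax, uint remove);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool TranslateMessage(ref Msg msg);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr DispatchMessage(ref Msg msg);

    // 'runQueuedCommands' drains the command queue on this thread;
    // 'commandEvent' is signalled whenever a command is enqueued and must
    // stay alive for as long as the loop runs.
    public static void Run(AutoResetEvent commandEvent, Action runQueuedCommands,
                           Func<bool> shutdownRequested, uint timeoutMs)
    {
        IntPtr[] handles = { commandEvent.SafeWaitHandle.DangerousGetHandle() };
        while (!shutdownRequested())
        {
            runQueuedCommands();

            // Returns when the command event is signalled, when input is
            // available in the message queue, or when the timeout elapses.
            MsgWaitForMultipleObjectsEx(1, handles, timeoutMs, QsAllInput, MwmoInputAvailable);

            Msg msg;
            while (PeekMessage(out msg, IntPtr.Zero, 0, 0, PmRemove))
            {
                TranslateMessage(ref msg);
                DispatchMessage(ref msg);
            }
        }
    }
}
```

`MWMO_INPUTAVAILABLE` matters here because it lets the wait return for messages that were already in the queue before the call, so a message that arrives while a command is executing is not left waiting for the timeout.
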
The loop should update a heartbeat timestamp after:

- successfully pumping messages,
- starting a command,
- finishing a command,
- processing an MXAccess event.

`StaRuntime` implements this runtime boundary in the worker. It starts one
background thread named `MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
initializes COM through `StaComApartmentInitializer`, and runs
`StaMessagePump`. Commands are scheduled through `InvokeAsync`; the command
queue signals an `AutoResetEvent` so `MsgWaitForMultipleObjectsEx` can wake the
STA without busy-waiting. `LastActivityUtc` records pump, command, startup, and
shutdown activity so the heartbeat and watchdog can report whether the STA
is still responsive. Shutdown marks the runtime as closing, wakes the pump,
rejects new commands, cancels queued work, uninitializes COM on the STA, and
waits for the thread to exit.

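The scheduling side of that description can be sketched without any COM dependency. `StaCommandQueueSketch` below is a hypothetical stand-in, not the real `StaRuntime`: it shows an `InvokeAsync` that enqueues work and signals an `AutoResetEvent`, and a drain method that the STA thread would call from its pump loop. Cancelling already-queued work on shutdown is left out to keep the example short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class StaCommandQueueSketch
{
    private readonly ConcurrentQueue<Action> _work = new ConcurrentQueue<Action>();
    private readonly AutoResetEvent _workAvailable = new AutoResetEvent(false);
    private long _lastActivityTicks = DateTime.UtcNow.Ticks;
    private volatile bool _closing;

    // Handed to the message-pump wait so an enqueue wakes the STA.
    public AutoResetEvent WorkAvailable { get { return _workAvailable; } }

    public DateTime LastActivityUtc
    {
        get { return new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc); }
    }

    // Callable from any thread. The command itself runs later on the STA.
    public Task<T> InvokeAsync<T>(Func<T> command)
    {
        // Continuations run asynchronously so awaiting callers never execute
        // their own code on the STA thread.
        var completion = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        if (_closing)
        {
            completion.SetException(new InvalidOperationException("STA runtime is closing."));
            return completion.Task;
        }
        _work.Enqueue(() =>
        {
            try { completion.SetResult(command()); }
            catch (Exception ex) { completion.SetException(ex); }
        });
        _workAvailable.Set();
        return completion.Task;
    }

    // Called only on the STA thread, once per pump iteration.
    public int ProcessQueuedCommands()
    {
        int executed = 0;
        Action item;
        while (_work.TryDequeue(out item))
        {
            MarkActivity();   // before the command
            item();
            MarkActivity();   // after the command
            executed++;
        }
        return executed;
    }

    public void Close()
    {
        _closing = true;
        _workAvailable.Set();   // wake the pump so it can observe shutdown
    }

    private void MarkActivity()
    {
        Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);
    }
}
```

Storing the timestamp as ticks behind `Interlocked` avoids torn reads of a 64-bit value in an x86 process when the heartbeat task reads it from another thread.
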
## COM Creation

The MXAccess analysis source at `C:\Users\dohertj2\Desktop\mxaccess` identifies
the installed COM target:

- interop assembly:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
- assembly identity:
  `ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae`
- COM class:
  `ArchestrA.MxAccess.LMXProxyServerClass`
- CLSID:
  `{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- ProgID:
  `LMXProxy.LMXProxyServer.1`
- version-independent ProgID:
  `LMXProxy.LMXProxyServer`
- registered server:
  `C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll`
- registry view:
  `HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}`
- threading model:
  `Apartment`

The worker should reference the interop assembly and instantiate
`LMXProxyServerClass` on the dedicated STA thread. Keep the ProgID and assembly
path configurable for diagnostics, but this COM class is the v1 default.

`MxAccessStaSession` owns the initial COM creation path. It starts `StaRuntime`,
creates `LMXProxyServerClass` through `MxAccessComObjectFactory` on the STA,
attaches `MxAccessBaseEventSink`, and returns `WorkerReady` only after those
steps succeed. `MxAccessSession` keeps the raw COM object private, records the
STA managed thread id that created it, detaches the base event sink during
disposal, and releases the COM reference on the STA. After creation,
`MxAccessStaSession` owns a `StaCommandDispatcher` backed by
`MxAccessCommandExecutor`; `DispatchAsync` queues contract commands back to the
same STA instead of exposing the COM object to callers.

Creation rules:

- Create COM object only on the STA.
- Attach event handlers only on the STA.
- Keep the COM reference private to the STA runtime.
- Never marshal the raw COM object to pipe reader/writer threads.
- Capture COM creation HRESULT or exception details.

If COM creation fails, the worker should send a structured fault with:

- fault category,
- exception type,
- HRESULT when available,
- COM class or ProgID attempted,
- worker process id,
- session id.

`WorkerPipeSession` maps startup exceptions from this path to
`WorkerFaultCategory.MxaccessCreationFailed`, includes the captured HRESULT
when the exception exposes one, and does not send `WorkerReady` after a failed
COM creation attempt.

After `WorkerReady`, `WorkerPipeSession` continues reading gateway frames for
the lifetime of the process. `WorkerCommand` frames are dispatched to
`MxAccessStaSession`, replies are written as `WorkerCommandReply`, and queued
worker events are drained after command replies. `WorkerShutdown` starts the
graceful shutdown path and returns `WorkerShutdownAck` only after the STA
cleanup path completes.

## Event Sink

The worker must subscribe to every public MXAccess event family:

- `OnDataChange`
- `OnWriteComplete`
- `OperationComplete`
- `OnBufferedDataChange`

Forward these event families only when the native MXAccess COM object raises
them. Do not synthesize `OperationComplete` from write completion or command
status. `OnBufferedDataChange` must be represented in the protocol now, but
multi-sample payload conversion should remain capture-validated; preserve raw
metadata whenever conversion is incomplete.

Event handling rules:

- Event handlers are expected to run on the STA.
- Assign a monotonic worker event sequence.
- Convert event args to `WorkerEvent`.
- Include value, quality, timestamp, handles, status arrays, and raw status
  details when available.
- Preserve raw event payload metadata for unsupported buffered or
  completion-only shapes.
- Enqueue to the outbound event queue.
- Return quickly to preserve message pumping.

`MxAccessBaseEventSink` implements the COM connection-point handlers and keeps
the handlers limited to event argument conversion plus enqueue. It uses
`MxAccessEventMapper` to create `MxEvent` DTOs for `OnDataChange`,
`OnWriteComplete`, `OperationComplete`, and `OnBufferedDataChange`. The mapper
converts scalar and array values through `VariantConverter`, converts
`MXSTATUS_PROXY[]` through `MxStatusProxyConverter`, and maps installed
`MxDataType` values to the public protobuf enum while preserving the raw data
type on buffered events. `OperationComplete` is only emitted from the native
`OperationComplete` handler; write completion does not synthesize it.

`MxAccessEventQueue` is the bounded outbound event queue for one worker
session. It assigns the monotonic `WorkerSequence` and `WorkerTimestamp` when an
event is accepted, preserving the order in which MXAccess handlers enqueue
events. The default capacity is `10000`. When the queue reaches capacity it
records a `WorkerFaultCategory.QueueOverflow` fault and rejects further events.
The event handler catches conversion and enqueue failures, records the first
fault on the queue, and returns to the STA message pump instead of writing to
the pipe.

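A bounded queue with these properties could be sketched as follows. `EventQueueSketch` and `QueuedEventSketch` are hypothetical types, not `MxAccessEventQueue`. The sketch assumes sequences start at 1, that the fault is stored as a plain string, and that once an overflow is recorded the queue stays closed to new events even after it is drained.

```csharp
using System;
using System.Collections.Generic;

public sealed class QueuedEventSketch
{
    public long WorkerSequence;
    public DateTime WorkerTimestamp;
    public string Family;   // for example "OnDataChange"
}

public sealed class EventQueueSketch
{
    private readonly object _gate = new object();
    private readonly Queue<QueuedEventSketch> _events = new Queue<QueuedEventSketch>();
    private readonly int _capacity;
    private long _lastSequence;

    public EventQueueSketch(int capacity = 10000) { _capacity = capacity; }

    // First recorded fault; later failures do not replace it.
    public string Fault { get; private set; }

    // Called by event handlers on the STA. Never blocks, never writes to the pipe.
    public bool TryEnqueue(string family)
    {
        lock (_gate)
        {
            if (Fault != null) return false;
            if (_events.Count >= _capacity)
            {
                Fault = "QueueOverflow";
                return false;
            }
            _events.Enqueue(new QueuedEventSketch
            {
                WorkerSequence = ++_lastSequence,   // assigned only on acceptance
                WorkerTimestamp = DateTime.UtcNow,
                Family = family
            });
            return true;
        }
    }

    // Called from the pipe writer side to drain accepted events in order.
    public bool TryDequeue(out QueuedEventSketch item)
    {
        lock (_gate)
        {
            if (_events.Count == 0) { item = null; return false; }
            item = _events.Dequeue();
            return true;
        }
    }
}
```

Assigning the sequence inside the same lock that enqueues the event is what keeps sequence order and queue order identical.
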
If event conversion throws, catch it inside the event handler, record a
structured `WorkerFault`, and keep the worker alive only if the fault policy
allows it.

## Command Queue

The pipe reader converts `WorkerCommand` messages into `StaCommand` entries.

Each entry should include:

- correlation id,
- method name,
- method-specific request payload,
- enqueue timestamp,
- cancellation marker,
- reply completion path.

The STA command dispatcher:

1. Dequeues one command.
2. Checks whether shutdown has started.
3. Calls the matching MXAccess method.
4. Captures return values, out parameters, status arrays, and HRESULT.
5. Converts results to `WorkerCommandReply`.
6. Enqueues the reply to the pipe writer.

The STA should execute one command at a time. MXAccess command ordering must be
preserved for one worker.

## Command Dispatch Surface

Phase 1 commands:

- `Register`
- `Unregister`
- `AddItem`
- `RemoveItem`

Phase 2 event commands:

- `Advise`
- `UnAdvise`
- `AdviseSupervisory`

Full surface:

- `AddItem2`
- `AddBufferedItem`
- `SetBufferedUpdateInterval`
- `Suspend`
- `Activate`
- `Write`
- `Write2`
- `WriteSecured`
- `WriteSecured2`
- `AuthenticateUser`
- `ArchestrAUserToId`

Diagnostics:

- `Ping`
- `GetSessionState`
- `GetWorkerInfo`
- `DrainEvents`
- `ShutdownWorker`

Implement method-specific dispatch instead of a generic string method invoker.
Parity tests need stable command-specific request and reply shapes.

`MxAccessCommandExecutor` implements the first command pair:

- `Register` calls `LMXProxyServerClass.Register` with the requested client
  name and preserves the returned server handle in both `ReturnValue` and
  `RegisterReply.ServerHandle`.
- `Unregister` calls `LMXProxyServerClass.Unregister` with the requested server
  handle. The reply has no method-specific payload because the public MXAccess
  method returns `void`.

Both commands set `Hresult` to `0` only after the COM call returns normally.
COM exceptions flow through `StaCommandDispatcher`, which captures the thrown
HRESULT and converts the reply to `ProtocolStatusCode.MxaccessFailure`.
`MxAccessStaSession.GetRegisteredServerHandlesAsync` returns an STA-read
snapshot of tracked server handles for diagnostics and future cleanup logic.

`MxAccessCommandExecutor` also implements the item lifecycle commands:

- `AddItem` calls `LMXProxyServerClass.AddItem` with the requested server
  handle and item definition. It preserves the returned item handle in both
  `ReturnValue` and `AddItemReply.ItemHandle`.
- `AddItem2` calls `LMXProxyServerClass.AddItem2` with the requested server
  handle, item definition, and context string. The context string is passed to
  MXAccess exactly as received.
- `RemoveItem` calls `LMXProxyServerClass.RemoveItem` with the requested server
  handle and item handle. The reply has no method-specific payload because the
  public MXAccess method returns `void`.

The worker records item handles only after `AddItem` or `AddItem2` returns
normally, and removes item handles only after `RemoveItem` returns normally.
The registry does not prevalidate server or item handles, so invalid and
cross-server handle behavior remains owned by MXAccess. COM exceptions continue
through `StaCommandDispatcher`, which preserves the HRESULT and leaves
diagnostic registry state unchanged for failed cleanup calls.

`MxAccessCommandExecutor` implements advice lifecycle commands on the same STA
path:

- `Advise` calls `LMXProxyServerClass.Advise` with the requested server handle
  and item handle.
- `AdviseSupervisory` calls `LMXProxyServerClass.AdviseSupervisory` with the
  requested server handle and item handle. This remains a distinct command from
  plain `Advise` even though observed scalar captures share the same lower-level
  subscription body.
- `UnAdvise` calls `LMXProxyServerClass.UnAdvise` with the requested server
  handle and item handle.

The worker records plain and supervisory advice separately only after the COM
call returns normally. Successful `UnAdvise` removes all tracked advice for the
server and item pair because the public MXAccess cleanup method has no plain
versus supervisory selector. Successful `RemoveItem` and `Unregister` also clear
related advice state from the worker registry. Failed advice and cleanup calls
leave registry state unchanged so diagnostics continue to reflect the last
successful MXAccess-owned state transition.

## Handle Registry

The worker should track MXAccess state for diagnostics and cleanup, while still
treating MXAccess as the authority.

Suggested tracked state:

- registered server handles,
- item handles,
- item names and context,
- server handle for each item,
- advise state,
- buffered item state,
- authenticated user ids if needed,
- last command touching each handle.

Rules:

- Do not invent handles.
- Do not rewrite handles returned by MXAccess.
- Record server handles only after `Register` succeeds.
- Remove server handles only after `Unregister` succeeds.
- Record item handles only after `AddItem` or `AddItem2` succeeds.
- Remove item handles only after `RemoveItem` succeeds.
- Record advice state only after `Advise` or `AdviseSupervisory` succeeds.
- Remove advice state only after `UnAdvise`, `RemoveItem`, or `Unregister`
  succeeds.
- Preserve invalid-handle behavior from MXAccess.
- Preserve cross-server handle behavior from MXAccess.
- Use registry state for cleanup and diagnostics, not semantic correction.

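The "record only after success" rules can be illustrated with a registry that wraps each MXAccess call in a delegate. This is a hedged sketch: `HandleRegistrySketch` is a hypothetical type covering only server and item handles, and the delegates stand in for the real COM calls. It is not the worker's `MxAccessHandleRegistry`.

```csharp
using System;
using System.Collections.Generic;

public sealed class HandleRegistrySketch
{
    private readonly HashSet<int> _servers = new HashSet<int>();
    // Item keys combine server and item handle so the same item handle value
    // under two servers is tracked separately.
    private readonly HashSet<long> _items = new HashSet<long>();

    public int ServerCount { get { return _servers.Count; } }
    public int ItemCount { get { return _items.Count; } }

    private static long Key(int serverHandle, int itemHandle)
    {
        return ((long)serverHandle << 32) | (uint)itemHandle;
    }

    // 'register' stands in for the COM Register call.
    public int Register(Func<int> register)
    {
        int serverHandle = register();   // if this throws, nothing is recorded
        _servers.Add(serverHandle);      // handle stored exactly as returned
        return serverHandle;
    }

    // No prevalidation of serverHandle: MXAccess decides whether it is valid.
    public int AddItem(int serverHandle, Func<int> addItem)
    {
        int itemHandle = addItem();
        _items.Add(Key(serverHandle, itemHandle));
        return itemHandle;
    }

    // A failed RemoveItem leaves the tracked state unchanged.
    public void RemoveItem(int serverHandle, int itemHandle, Action removeItem)
    {
        removeItem();
        _items.Remove(Key(serverHandle, itemHandle));
    }
}
```

Because state changes always follow the delegate call, an exception from MXAccess propagates to the caller with the registry still describing the last successful transition.
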
## Value Conversion

`VariantConverter` should convert COM values into the protobuf `MxValue` union.

Supported scalar projections:

- bool,
- int32,
- int64,
- float,
- double,
- string,
- timestamp,
- raw fallback.

Supported arrays:

- bool array,
- int32 array,
- float array,
- double array,
- string array,
- timestamp array,
- raw fallback.

Rules:

- Preserve null and empty values distinctly when MXAccess exposes a distinction.
- Preserve array rank and dimensions when available.
- Preserve original variant type metadata.
- If conversion is lossy, include the best typed value plus raw diagnostic
  metadata.
- Do not throw away values just because they are awkward.

Credential-bearing values must not be logged.

## Status And HRESULT Capture

`MXSTATUS_PROXY` arrays must be represented explicitly. Do not collapse status
arrays into a single success flag.

For every command reply, capture:

- protocol success/failure,
- method name,
- correlation id,
- COM HRESULT if available,
- thrown exception HRESULT if available,
- MXAccess return value if any,
- method-specific out parameters,
- status array,
- diagnostic message safe for logs.

If a COM call throws, map the exception into a command reply instead of
crashing the worker, unless the exception indicates process corruption or the
configured policy says to fail the session.

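Mapping a thrown COM exception into a reply might look like the sketch below. `CommandReplySketch` and `ReplyCaptureSketch` are hypothetical types, the reply is reduced to a few of the listed fields, and the policy check for process-corrupting exceptions is omitted. `Exception.HResult` and `COMException` are the standard .NET members.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class CommandReplySketch
{
    public string CorrelationId;
    public string Method;
    public bool Succeeded;
    public int Hresult;
    public int? ReturnValue;
    public string ExceptionType;
    public string Diagnostic;
}

public static class ReplyCaptureSketch
{
    // 'comCall' stands in for one MXAccess method call made on the STA.
    public static CommandReplySketch Execute(
        string correlationId, string method, Func<int?> comCall)
    {
        var reply = new CommandReplySketch { CorrelationId = correlationId, Method = method };
        try
        {
            reply.ReturnValue = comCall();
            reply.Hresult = 0;        // set only after the call returns normally
            reply.Succeeded = true;
        }
        catch (Exception ex)
        {
            // COMException carries the failing HRESULT in HResult; other
            // exception types expose their own HResult the same way.
            reply.Hresult = ex.HResult;
            reply.ExceptionType = ex.GetType().FullName;
            // The exception message is deliberately left out of the
            // diagnostic so it cannot leak credential-bearing text into logs.
            reply.Diagnostic = method + " failed with HRESULT 0x" + ex.HResult.ToString("X8");
            reply.Succeeded = false;
        }
        return reply;
    }
}
```

A failing call therefore still produces a reply with the same correlation id and method name as a successful one, which keeps the gateway's request and reply matching unchanged.
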
## Cancellation

Worker cancellation is cooperative at the queue boundary.

Rules:

- If a `WorkerCancel` arrives before a command starts, mark the command
  canceled and reply or drop according to protocol policy.
- If a command is already executing on the STA, do not attempt to abort the COM
  call.
- When the COM call returns after gateway cancellation, send the reply only if
  the gateway still wants late replies; otherwise log and discard.
- Hard cancellation is process kill by the gateway.

## Outbound Queues

The worker should use bounded outbound queues for replies, events, heartbeats,
and faults.

Priority order when writing:

1. faults,
2. command replies,
3. shutdown acknowledgements,
4. heartbeats,
5. events.

Event overflow policy defaults to fail-fast for parity testing. If the event
queue fills:

1. Capture overflow metrics.
2. Send `WorkerFault` if possible.
3. Stop accepting new commands.
4. Let the gateway close or kill the worker.

Production coalescing may be added later, but it must be explicit and tested.
Do not drop or coalesce events in v1.

## Heartbeat And Watchdog

`WorkerPipeSession` starts the heartbeat loop after the gateway validates
`WorkerHello` and receives `WorkerReady`. Heartbeats continue until
`WorkerShutdown`, cancellation, or a pipe/protocol failure stops the session.
The loop uses `WorkerPipeSessionOptions.HeartbeatInterval`; the default matches
the gateway worker heartbeat interval.

The worker heartbeat proves that:

- pipe writer is alive,
- worker host is alive,
- STA has recently pumped or completed work.

Heartbeat payload includes:

- worker process id,
- session id,
- current state,
- last STA activity timestamp,
- pending command count,
- outbound event queue depth,
- event sequence,
- current command correlation id if any.

`MxAccessStaSession.CaptureHeartbeat()` reads `StaRuntime.LastActivityUtc` and
`StaCommandDispatcher` queue state without touching the raw MXAccess COM object
outside the STA. Event queue depth and event sequence are reported as zero until
the event queue implementation owns those counters.

The STA watchdog currently emits a `WorkerFault` with
`WorkerFaultCategory.StaHung` when `LastStaActivityUtc` is older than
`WorkerPipeSessionOptions.HeartbeatGrace` **and no command is in flight**.
`StaRuntime.ProcessQueuedCommands` calls `MarkActivity()` only immediately
before and after each work item, so a synchronously long-running STA command
(for example a `ReadBulk` waiting `timeout_ms` for the first `OnDataChange`)
legitimately freezes `LastStaActivityUtc` for the duration of the wait while
the worker is healthy. The watchdog is therefore suppressed while the
heartbeat snapshot's `CurrentCommandCorrelationId` is non-empty: the worker is
busy executing a command, not hung, and the heartbeat already surfaces the
in-flight correlation id so the gateway can apply its own per-command timeout
if it considers the command too slow. The fault still fires on a truly hung
STA (no command in flight and no activity for longer than `HeartbeatGrace`),
which is the only case the watchdog can usefully distinguish from a slow
command. Command duration and high event queue depth remain observable through
heartbeat fields until dedicated thresholds own those warnings. The worker
reports stale STA activity, but the gateway owns the final kill decision
through its existing heartbeat and worker lifecycle policy.

The in-flight-command suppression itself is bounded by
`WorkerPipeSessionOptions.HeartbeatStuckCeiling` (default 75 seconds = 5 ×
`HeartbeatGrace`). The suppression exists for legitimately slow synchronous
commands, but a genuinely stuck COM call (for example against a dead MXAccess
provider whose cross-apartment marshaler is permanently blocked, or a write
completion that never fires) leaves `CurrentCommandCorrelationId` non-empty
indefinitely. Without an upper bound, the worker-side `StaHung` watchdog would
be permanently defeated for that session and only the gateway's per-command
timeout would catch the hang. That would lose the worker-originated diagnostic
(`StaHung` fault category, the stale-by interval) from the gateway audit trail.
Once `LastStaActivityUtc` has been stale for longer than
`HeartbeatStuckCeiling`, the watchdog fires `StaHung` regardless of whether a
command is in flight, on the assumption that no legitimate STA command should
run that long without periodically refreshing activity. Deployments that
legitimately run very long bulk operations should raise the ceiling rather than
disable it.

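The watchdog rule reduces to a pure decision over the heartbeat snapshot. The sketch below uses a hypothetical `StaWatchdogSketch.ShouldReportStaHung` helper and is not the worker's watchdog code. The 15-second grace used in the usage example is inferred from "75 seconds = 5 × `HeartbeatGrace`".

```csharp
using System;

public static class StaWatchdogSketch
{
    // Returns true when a StaHung fault should be raised.
    public static bool ShouldReportStaHung(
        DateTime nowUtc,
        DateTime lastStaActivityUtc,
        string currentCommandCorrelationId,
        TimeSpan heartbeatGrace,
        TimeSpan heartbeatStuckCeiling)
    {
        TimeSpan staleFor = nowUtc - lastStaActivityUtc;

        // Recent activity: healthy regardless of command state.
        if (staleFor <= heartbeatGrace) return false;

        // Stale past the ceiling: report even if a command is in flight.
        if (staleFor > heartbeatStuckCeiling) return true;

        // Between grace and ceiling: suppress while a command is executing.
        return string.IsNullOrEmpty(currentCommandCorrelationId);
    }
}
```

For example, with a 15-second grace and a 75-second ceiling, 20 seconds of staleness reports `StaHung` only when no command is in flight, while 80 seconds of staleness reports it in either case.
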
## Shutdown

Graceful shutdown sequence:

1. Pipe reader receives `WorkerShutdown`.
2. Worker host marks shutdown requested.
3. Reject new commands.
4. Let current STA command finish if within timeout.
5. Optionally run MXAccess cleanup:
   - `UnAdvise`,
   - `RemoveItem`,
   - `Unregister`.
6. Detach event handlers.
7. Release COM object until reference count reaches zero when possible.
8. Stop pipe reader and writer.
9. Exit process with success code.

If shutdown wedges, the gateway kills the process. The worker should be written
so process kill does not corrupt other sessions.

`MxAccessStaSession.ShutdownGracefullyAsync` implements the current cleanup
path. It first calls `StaCommandDispatcher.RequestShutdown()` so new commands
are rejected and queued commands that have not started receive
`ProtocolStatusCode.WorkerUnavailable`. The command already executing on the
STA is allowed to finish until the shutdown grace period expires.

After command dispatch is closed, cleanup runs on the STA in MXAccess handle
order:

1. one `UnAdvise` call per advised server/item pair,
2. `RemoveItem` for active item handles,
3. `Unregister` for active server handles,
4. event sink detach,
5. COM release.

Each cleanup call is best effort. A failed cleanup operation is recorded as an
`MxAccessShutdownFailure`, logged by `WorkerPipeSession`, and does not prevent
later cleanup calls from running. A shutdown with cleanup failures still returns
`WorkerShutdownAck` with `ProtocolStatusCode.Ok` because the worker reached the
controlled release path. If the grace period expires before cleanup can run or
finish, the worker reports `WorkerFaultCategory.ShutdownTimeout` when possible
and relies on the gateway to kill the process.

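The best-effort cleanup order can be sketched as a loop that runs each step, records any failure, and continues. `ShutdownCleanupSketch` and `CleanupFailureSketch` are hypothetical names; the real path records `MxAccessShutdownFailure` entries, whose exact shape is not shown in this document. The grace-period timeout is not modeled here.

```csharp
using System;
using System.Collections.Generic;

public sealed class CleanupFailureSketch
{
    public string Step;
    public int Hresult;
}

public static class ShutdownCleanupSketch
{
    // Runs every step in the given order on the calling (STA) thread. A
    // failing step is recorded and the remaining steps still run.
    public static List<CleanupFailureSketch> Run(
        IEnumerable<KeyValuePair<string, Action>> steps)
    {
        var failures = new List<CleanupFailureSketch>();
        foreach (KeyValuePair<string, Action> step in steps)
        {
            try
            {
                step.Value();
            }
            catch (Exception ex)
            {
                failures.Add(new CleanupFailureSketch
                {
                    Step = step.Key,
                    Hresult = ex.HResult
                });
            }
        }
        return failures;
    }
}
```

A caller would build the step list in the order given above (`UnAdvise` per advised pair, `RemoveItem`, `Unregister`, event sink detach, COM release) and log the returned failures before acknowledging shutdown.
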
## Fault Handling

Worker fault categories:

- `InvalidArguments`
- `GatewayAuthenticationFailed`
- `ProtocolMismatch`
- `ProtocolViolation`
- `PipeDisconnected`
- `MxAccessCreationFailed`
- `MxAccessCommandFailed`
- `MxAccessEventConversionFailed`
- `StaHung`
- `QueueOverflow`
- `ShutdownTimeout`

Fault payload should include:

- category,
- session id,
- correlation id when command-specific,
- command method when command-specific,
- HRESULT when available,
- exception type when available,
- safe diagnostic message.

Do not include raw credentials or full secured-write values.

## Security

The worker should trust only the launching gateway after validating:

- expected session id,
- expected protocol version,
- nonce,
- pipe identity where available.

It should not expose any network listener. It should not accept commands from
arbitrary local processes.

Credential-bearing commands must keep credential data out of:

- command line,
- logs,
- metrics labels,
- exception messages,
- crash dumps when avoidable.

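
The first three handshake checks above could look roughly like the following.
This is only a sketch under assumptions: the document does not specify the
types of the session id, protocol version, or nonce, so `string`, `int`, and
`byte[]` are guesses, and `GatewayHandshake` and `HandshakeResult` are
hypothetical names. Pipe identity validation is left out because it depends on
platform APIs not described here.

```csharp
using System;

// Hypothetical result type for the sketch.
public enum HandshakeResult
{
    Ok,
    SessionMismatch,
    ProtocolVersionMismatch,
    NonceMismatch
}

public static class GatewayHandshake
{
    // Compares what the gateway presented against what the worker was
    // launched with. Any mismatch rejects the connection.
    public static HandshakeResult Validate(
        string expectedSessionId, int expectedProtocolVersion, byte[] expectedNonce,
        string sessionId, int protocolVersion, byte[] nonce)
    {
        if (!string.Equals(expectedSessionId, sessionId, StringComparison.Ordinal))
        {
            return HandshakeResult.SessionMismatch;
        }

        if (expectedProtocolVersion != protocolVersion)
        {
            return HandshakeResult.ProtocolVersionMismatch;
        }

        return NonceEquals(expectedNonce, nonce)
            ? HandshakeResult.Ok
            : HandshakeResult.NonceMismatch;
    }

    // Examines every byte even after a difference is found, so the time
    // taken does not reveal where the first mismatch occurred.
    private static bool NonceEquals(byte[] expected, byte[] actual)
    {
        if (expected == null || actual == null || expected.Length != actual.Length)
        {
            return false;
        }

        int diff = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            diff |= expected[i] ^ actual[i];
        }
        return diff == 0;
    }
}
```
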
## Observability

Worker logs should include:

- startup arguments except secrets,
- protocol version,
- gateway handshake result,
- MXAccess COM creation result,
- command start/end with correlation id,
- HRESULT/status summary,
- event family and sequence,
- queue overflow,
- STA watchdog warnings,
- shutdown path.

Metrics can be emitted through the gateway or exposed as worker heartbeat
fields. The worker does not need its own public metrics endpoint.

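
Logging "startup arguments except secrets" could be approached as below. The
sketch assumes a `--name=value` argument style and a caller-supplied set of
secret argument names; neither the argument format nor any specific argument
name is stated in this document, and `StartupArgumentLog` is a hypothetical
helper.

```csharp
using System;
using System.Collections.Generic;

public static class StartupArgumentLog
{
    // Returns a single loggable string. For any argument whose name is in
    // secretNames, the value is replaced with "***"; all other arguments
    // are passed through unchanged.
    public static string Format(IEnumerable<string> args, ISet<string> secretNames)
    {
        var parts = new List<string>();
        foreach (var arg in args)
        {
            int separator = arg.IndexOf('=');
            if (separator > 0 && secretNames.Contains(arg.Substring(0, separator)))
            {
                parts.Add(arg.Substring(0, separator) + "=***");
            }
            else
            {
                parts.Add(arg);
            }
        }
        return string.Join(" ", parts);
    }
}
```
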
## Testing Strategy

Worker tests that do not require installed MXAccess:

- frame reader/writer,
- protocol validation,
- command queue ordering,
- STA command scheduling with a fake COM object,
- message-pump wake behavior where practical,
- value conversion,
- status conversion,
- event conversion from fake event args,
- shutdown state transitions,
- queue overflow behavior.

Live MXAccess tests:

- COM creation on STA,
- `Register` and `Unregister`,
- `AddItem` and `RemoveItem`,
- `Advise` and one `OnDataChange`,
- write completion behavior,
- secured write behavior,
- buffered data-change behavior,
- invalid handle behavior,
- no synthesized `OperationComplete` when native MXAccess does not raise it,
- raw metadata preservation for buffered payloads that cannot yet be fully
  converted.

Live tests should be opt-in and clearly marked because they depend on installed
MXAccess COM and provider state.
The worker test suite uses `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` for these
tests. `AddItem` uses `TestChildObject.TestInt` by default and accepts an
override through `MXGATEWAY_LIVE_MXACCESS_ITEM`; `AddItem2` uses the captured
parity fixture shape `AddItem2("TestInt", "TestChildObject")`.

`WorkerLiveMxAccessSmokeTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/` uses the
same opt-in variable for the gateway-to-worker live smoke test. It launches the x86
worker through `WorkerProcessLauncher`, opens a gateway session, runs
`Register`, `AddItem`, and `Advise`, waits for one `OnDataChange`, and closes
the session. The smoke test accepts `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` for a
non-default worker executable path and
`MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS` for the bounded event wait.

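
The opt-in variables described above could be read through a small helper like
this one. It is a sketch, not the test suite's actual mechanism:
`LiveMxAccessTestSettings` is a hypothetical name, the environment lookup is
injected so the helper can be exercised without touching the process
environment, and the timeout fallback is a caller-supplied parameter because
this document does not state a default.

```csharp
using System;
using System.Globalization;

public static class LiveMxAccessTestSettings
{
    // Live tests run only when the opt-in variable is exactly "1".
    public static bool IsEnabled(Func<string, string> getEnv)
    {
        return getEnv("MXGATEWAY_RUN_LIVE_MXACCESS_TESTS") == "1";
    }

    // Item used by AddItem: the documented default unless overridden.
    public static string ItemName(Func<string, string> getEnv)
    {
        var value = getEnv("MXGATEWAY_LIVE_MXACCESS_ITEM");
        return string.IsNullOrWhiteSpace(value) ? "TestChildObject.TestInt" : value;
    }

    // Bounded event wait; falls back when the variable is unset or invalid.
    public static TimeSpan EventTimeout(Func<string, string> getEnv, TimeSpan fallback)
    {
        var raw = getEnv("MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS");
        int seconds;
        bool parsed = int.TryParse(
            raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out seconds);
        return parsed && seconds > 0 ? TimeSpan.FromSeconds(seconds) : fallback;
    }
}
```

In a test project the lookup would typically be
`Environment.GetEnvironmentVariable`, for example
`LiveMxAccessTestSettings.IsEnabled(Environment.GetEnvironmentVariable)`.
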
## Initial Implementation Slice

The first worker slice should implement:

1. Argument parsing and pipe connection.
2. Protocol hello and nonce validation.
3. STA thread startup.
4. COM initialization and MXAccess object creation.
5. Message pump with command wake event.
6. `WorkerReady`.
7. Shutdown command.
8. `Register`, `AddItem`, and `Advise`.
9. Event sink for one `OnDataChange`.
10. Basic value/status conversion.
11. Event model coverage for `OperationComplete` and `OnBufferedDataChange`
    without synthesized events.
12. Fault reporting.

This slice proves the worker can preserve the core MXAccess requirements:
single-process isolation, STA ownership, message pumping, command execution,
and event delivery.

## Related Documentation

- [Worker Bootstrap](./WorkerBootstrap.md)
- [Worker STA](./WorkerSta.md)
- [Worker Conversion](./WorkerConversion.md)
- [Worker Frame Protocol](./WorkerFrameProtocol.md)
- [Worker Process Launcher](./WorkerProcessLauncher.md)
- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
- [Design Decisions](./DesignDecisions.md)