MXAccess Worker Instance Detailed Design
Purpose
An MXAccess worker instance is the compatibility boundary around one installed MXAccess COM object. It runs as a disposable .NET Framework 4.8 x86 process, owns one dedicated STA thread, pumps Windows/COM messages, executes MXAccess commands on that STA, and forwards MXAccess events back to the gateway.
The worker's job is not to make MXAccess nicer. Its job is to preserve direct MXAccess behavior while making that behavior available to modern clients through the gateway.
Runtime
- Target runtime: .NET Framework 4.8.
- Language: C#.
- Platform target: x86 by default.
- Process lifetime: one worker per gateway session.
- Public network listeners: none.
- Gateway IPC: one named pipe with protobuf-framed messages.
- COM apartment: one dedicated STA thread.
Build And Test
Build the SDK-style worker project with the .NET SDK MSBuild entry point. The project targets .NET Framework 4.8, but the SDK resolver comes from the .NET SDK installation:
dotnet msbuild src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
docs/ToolchainLinks.md records the Visual Studio MSBuild executable for
classic .NET Framework and COM interop builds:
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
Run the worker tests with the same platform target:
dotnet test src\ZB.MOM.WW.MxGateway.Worker.Tests\ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86
The only MXAccess interop reference belongs in ZB.MOM.WW.MxGateway.Worker. Gateway and
test projects may reference the worker project for metadata and scaffold tests,
but they must not reference ArchestrA.MXAccess.dll directly.
Responsibilities
The worker owns:
- connection to the gateway pipe,
- protocol hello and readiness reporting,
- STA thread creation and teardown,
- COM initialization on the STA,
- MXAccess COM object creation,
- MXAccess event sink wiring,
- command dispatch on the STA,
- MXAccess handle and advise state tracking,
- value/status/HRESULT capture,
- conversion to worker protobuf DTOs,
- event sequencing,
- heartbeat reporting,
- graceful shutdown.
The worker does not own:
- public gRPC API,
- client authentication,
- cross-session routing,
- worker process supervision,
- remote TLS,
- policy decisions for other sessions.
Process Bootstrap
Expected command-line arguments:
--session-id <sessionId>
--pipe-name <pipeName>
--protocol-version <version>
Expected protected environment values:
MXGATEWAY_WORKER_NONCE=<random nonce>
The nonce travels through the environment rather than the command line so it never appears in process-listing tools that expose argument vectors.
Startup sequence:
- Parse command-line arguments.
- Configure minimal logging.
- Validate required values are present.
- Connect to the gateway named pipe.
- Exchange WorkerHello and GatewayHello.
- Validate protocol version, session id, and nonce.
- Start the STA runtime.
- Create the MXAccess COM object on the STA.
- Attach MXAccess event handlers on the STA.
- Send WorkerReady.
- Start pipe read, pipe write, heartbeat, and shutdown coordination loops.
If validation fails before MXAccess creation, exit quickly with a non-zero exit
code. If MXAccess creation fails, send WorkerFault when possible and exit.
WorkerApplication.Run returns one of the structured WorkerExitCode values.
Codes 2–4 are produced by the bootstrap parse phase before any pipe, STA, or
MXAccess work happens; codes 5–6 and a clean 0 only become reachable once
the parse succeeds and the worker runs its pipe session:
| Exit code | Name | Meaning |
|---|---|---|
| 0 | Success | The pipe session ran to a clean close. |
| 1 | UnexpectedFailure | A non-bootstrap exception reaches the process boundary. |
| 2 | InvalidArguments | Required arguments are missing or unknown arguments are present. |
| 3 | InvalidProtocolVersion | --protocol-version is not numeric or does not match the supported worker protocol. |
| 4 | MissingNonce | MXGATEWAY_WORKER_NONCE is absent or empty. |
| 5 | PipeConnectionFailed | The pipe connection raised an IOException or TimeoutException. |
| 6 | ProtocolViolation | A WorkerFrameProtocolException escaped the pipe session. |
WorkerBootstrapResult.Succeeded is a separate parse-phase gate: it reports
whether argument parsing produced usable WorkerOptions. A false result
carries one of codes 2–4 and the worker exits before running a session, so a
successful parse is distinct from the 0 exit code, which only follows a clean
pipe-session close.
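The parse-phase gate can be sketched as follows. This is an illustration, not the WorkerOptionsParser source. The WorkerExitCode values come from the exit-code table. ParseResult, BootstrapSketch, the flag/value pair parsing, and the SupportedProtocolVersion value of 1 are assumptions made for the example. Codes 2, 3, and 4 are returned before any pipe is opened, and a successful parse carries no exit code at all.

```csharp
using System;
using System.Collections.Generic;

// Values from the WorkerExitCode table.
public enum WorkerExitCode
{
    Success = 0,
    UnexpectedFailure = 1,
    InvalidArguments = 2,
    InvalidProtocolVersion = 3,
    MissingNonce = 4,
    PipeConnectionFailed = 5,
    ProtocolViolation = 6,
}

// Hypothetical stand-in for WorkerBootstrapResult.
public sealed class ParseResult
{
    public bool Succeeded { get; private set; }
    // Null on success: a good parse is not exit code 0, which only follows a clean pipe-session close.
    public WorkerExitCode? ExitCode { get; private set; }
    public IDictionary<string, string> Options { get; private set; }

    public static ParseResult Fail(WorkerExitCode code) { return new ParseResult { ExitCode = code }; }
    public static ParseResult Ok(IDictionary<string, string> options) { return new ParseResult { Succeeded = true, Options = options }; }
}

public static class BootstrapSketch
{
    public const int SupportedProtocolVersion = 1; // assumed value for the sketch
    private static readonly string[] Required = { "--session-id", "--pipe-name", "--protocol-version" };

    public static ParseResult Parse(string[] args, Func<string, string> getEnv)
    {
        var options = new Dictionary<string, string>(StringComparer.Ordinal);
        for (int i = 0; i < args.Length; i += 2)
        {
            // Unknown flags and flags without a value are InvalidArguments (2).
            if (Array.IndexOf(Required, args[i]) < 0 || i + 1 >= args.Length)
                return ParseResult.Fail(WorkerExitCode.InvalidArguments);
            options[args[i]] = args[i + 1];
        }
        foreach (string key in Required)
        {
            if (!options.ContainsKey(key) || options[key].Length == 0)
                return ParseResult.Fail(WorkerExitCode.InvalidArguments);
        }

        int version;
        if (!int.TryParse(options["--protocol-version"], out version) || version != SupportedProtocolVersion)
            return ParseResult.Fail(WorkerExitCode.InvalidProtocolVersion); // 3

        // The nonce is read from the environment only, never from argv.
        string nonce = getEnv("MXGATEWAY_WORKER_NONCE");
        if (string.IsNullOrEmpty(nonce))
            return ParseResult.Fail(WorkerExitCode.MissingNonce); // 4

        options["nonce"] = nonce;
        return ParseResult.Ok(options);
    }
}
```

In a real entry point, getEnv would be Environment.GetEnvironmentVariable. Passing it as a delegate keeps the sketch testable without touching the process environment.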
Bootstrap logs use WorkerConsoleLogger key/value output. WorkerLogRedactor
redacts fields whose names indicate nonce, secret, password, token,
credential, or API key values before the message is written.
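Below is a minimal sketch of name-based redaction for key/value log output. The matching rule is an assumption: lowercase the field name, drop non-alphanumerics, then look for a marker substring, so that api_key and ApiKey both match. The *** placeholder is also an assumption. RedactingLogSketch is a hypothetical name, not the WorkerLogRedactor implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RedactingLogSketch
{
    // Assumed markers, compared after normalizing the field name.
    private static readonly string[] SensitiveMarkers = { "nonce", "secret", "password", "token", "credential", "apikey" };

    public static bool IsSensitive(string fieldName)
    {
        // "api_key", "Api-Key", and "APIKEY" all normalize to "apikey".
        string normalized = new string(fieldName.Where(char.IsLetterOrDigit).ToArray()).ToLowerInvariant();
        return SensitiveMarkers.Any(marker => normalized.Contains(marker));
    }

    public static string Format(string message, IEnumerable<KeyValuePair<string, string>> fields)
    {
        var line = new StringBuilder(message);
        foreach (KeyValuePair<string, string> field in fields)
        {
            // The value is replaced before it ever reaches the output buffer.
            string value = IsSensitive(field.Key) ? "***" : field.Value;
            line.Append(' ').Append(field.Key).Append('=').Append(value);
        }
        return line.ToString();
    }
}
```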
Internal Components
```
ZB.MOM.WW.MxGateway.Worker
  Program (calls WorkerApplication.Run)
  WorkerApplication (parse, bootstrap, run pipe session, map exit code)
  Bootstrap
    WorkerOptionsParser (parse args + env into WorkerOptions)
    WorkerOptions
    WorkerBootstrapResult (parse outcome + WorkerExitCode)
    WorkerExitCode
    WorkerConsoleLogger / WorkerLogRedactor
  Ipc
    WorkerPipeClient (named-pipe connect + retry, owns the session)
    WorkerPipeSession (handshake, read/write/drain/heartbeat loops)
    WorkerFrameReader / WorkerFrameWriter
    WorkerEnvelopeValidator
    WorkerContractInfo (protocol version + descriptor names)
  Sta
    StaRuntime (the dedicated STA thread + message pump loop)
    StaCommandDispatcher
    StaMessagePump
  MxAccess
    MxAccessStaSession (IWorkerRuntimeSession over the STA)
    MxAccessSession (handle registry + COM-call orchestration)
    MxAccessCommandExecutor (IStaCommandExecutor; runs commands on the STA)
    MxAccessBaseEventSink (tag-data connection-point events)
    MxAccessHandleRegistry
    (alarm subsystem; see Alarm Subsystem below)
  Conversion
    VariantConverter (MxValue <-> COM VARIANT, both directions)
    MxStatusProxyConverter
    HResultConverter / HResultConversion
```
Threading Model
```
main thread
  -> parse args
  -> configure host
  -> coordinate shutdown

pipe reader thread/task
  -> read WorkerEnvelope frames
  -> validate protocol
  -> enqueue commands or control messages

pipe writer thread/task
  -> serialize WorkerEnvelope frames
  -> write replies, events, heartbeats, faults

STA thread
  -> CoInitializeEx(COINIT_APARTMENTTHREADED)
  -> create MXAccess COM object
  -> attach event handlers
  -> pump Windows/COM messages
  -> execute queued commands
  -> detach events and release COM on shutdown

watchdog/heartbeat task
  -> observe STA responsiveness
  -> send heartbeat or fault
```
No MXAccess method may execute outside the STA thread. Do not use Task.Run
around COM calls. Do not let event handlers perform pipe writes.
STA Runtime
The STA runtime is the most important part of the worker.
Startup:
- Create a dedicated Thread.
- Set apartment state to ApartmentState.STA.
- Start the thread.
- Inside the thread, initialize COM.
- Create the MXAccess COM object.
- Attach event handlers.
- Signal ready to the worker host.
- Enter the message pump.
Shutdown:
- Mark the command queue as completing.
- Drain or reject pending commands according to shutdown mode.
- Optionally issue MXAccess cleanup calls for active handles.
- Detach event handlers.
- Release COM references.
- Uninitialize COM.
- Exit the thread.
Message Pump
The STA must pump Windows messages while also processing queued commands. A blocking queue that prevents message pumping is not acceptable.
Required loop shape:
```
while not shutdown:
    while command queue has work:
        execute one command on STA
    MsgWaitForMultipleObjectsEx(
        command_event,
        timeout,
        QS_ALLINPUT,
        MWMO_INPUTAVAILABLE)
    while PeekMessage(PM_REMOVE):
        TranslateMessage
        DispatchMessage
```
The command queue should signal a Win32 event or equivalent wait handle so the STA can wake without busy-waiting.
The loop should update a heartbeat timestamp after:
- successfully pumping messages,
- starting a command,
- finishing a command,
- processing an MXAccess event.
StaRuntime implements this runtime boundary in the worker. It starts one
background thread named MxGateway.Worker.STA, sets it to ApartmentState.STA,
initializes COM through StaComApartmentInitializer, and runs
StaMessagePump. Commands are scheduled through InvokeAsync; the command
queue signals an AutoResetEvent so MsgWaitForMultipleObjectsEx can wake the
STA without busy-waiting. LastActivityUtc records pump, command, startup, and
shutdown activity so the heartbeat and STA watchdog can report whether the STA
is still responsive. Shutdown marks the runtime as closing, wakes the pump,
rejects new commands, cancels queued work, uninitializes COM on the STA, and
waits for the thread to exit.
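The scheduling half of this design can be sketched portably. StaLoopSketch is a hypothetical, simplified stand-in for StaRuntime. It keeps the dedicated named thread, InvokeAsync, the AutoResetEvent wake signal, activity stamps around each work item, and rejection plus cancellation on shutdown. It waits with WaitOne instead of MsgWaitForMultipleObjectsEx and omits SetApartmentState and COM initialization. It therefore does not pump Windows messages and must not host MXAccess as written.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Portable sketch of the StaRuntime scheduling shape. Not COM-capable: no STA, no CoInitializeEx, no message pump.
public sealed class StaLoopSketch : IDisposable
{
    private sealed class WorkItem
    {
        public Action Execute;
        public Action Cancel;
    }

    private readonly ConcurrentQueue<WorkItem> _queue = new ConcurrentQueue<WorkItem>();
    private readonly AutoResetEvent _wake = new AutoResetEvent(false);
    private readonly object _gate = new object();
    private readonly Thread _thread;
    private volatile bool _closing;
    private long _lastActivityTicks = DateTime.UtcNow.Ticks;

    public StaLoopSketch()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "MxGateway.Worker.STA" };
        // The real runtime also sets ApartmentState.STA before Start (Windows only).
        _thread.Start();
    }

    public int ThreadId { get { return _thread.ManagedThreadId; } }

    public DateTime LastActivityUtc
    {
        get { return new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc); }
    }

    public Task<T> InvokeAsync<T>(Func<T> work)
    {
        // RunContinuationsAsynchronously keeps awaiting callers from resuming on the loop thread.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        lock (_gate)
        {
            if (_closing)
            {
                tcs.SetException(new InvalidOperationException("Runtime is shutting down."));
                return tcs.Task;
            }
            _queue.Enqueue(new WorkItem
            {
                Execute = () =>
                {
                    try { tcs.SetResult(work()); }
                    catch (Exception ex) { tcs.SetException(ex); }
                },
                Cancel = () => tcs.TrySetCanceled(),
            });
            _wake.Set(); // wake the loop without busy-waiting
        }
        return tcs.Task;
    }

    private void MarkActivity()
    {
        Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);
    }

    private void Run()
    {
        while (!_closing)
        {
            WorkItem item;
            while (!_closing && _queue.TryDequeue(out item))
            {
                MarkActivity(); // immediately before the work item
                item.Execute();
                MarkActivity(); // immediately after the work item
            }
            // StaRuntime waits in MsgWaitForMultipleObjectsEx and dispatches messages here instead.
            _wake.WaitOne(TimeSpan.FromMilliseconds(100));
            MarkActivity();
        }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            if (_closing) return;
            _closing = true; // reject new work from now on
            _wake.Set();
        }
        _thread.Join();
        WorkItem leftover;
        while (_queue.TryDequeue(out leftover)) leftover.Cancel(); // cancel work that never started
        _wake.Dispose();
    }
}
```

Taking the lock for both the closing check and the enqueue means no work item can slip into the queue after Dispose has drained it.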
COM Creation
The MXAccess analysis source at C:\Users\dohertj2\Desktop\mxaccess identifies
the installed COM target:
- interop assembly: C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll
- assembly identity: ArchestrA.MxAccess, Version=3.2.0.0, PublicKeyToken=23106a86e706d0ae
- COM class: ArchestrA.MxAccess.LMXProxyServerClass
- CLSID: {C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}
- ProgID: LMXProxy.LMXProxyServer.1
- version-independent ProgID: LMXProxy.LMXProxyServer
- registered server: C:\Program Files (x86)\ArchestrA\Framework\Bin\LmxProxy.dll
- registry view: HKCR\Wow6432Node\CLSID\{C30B52F5-2CB5-4760-AF0A-3A344A7EB5DC}
- threading model: Apartment
The worker should reference the interop assembly and instantiate
LMXProxyServerClass on the dedicated STA thread. Keep the ProgID and assembly
path configurable for diagnostics, but this COM class is the v1 default.
MxAccessStaSession owns the initial COM creation path. It starts StaRuntime,
creates LMXProxyServerClass through MxAccessComObjectFactory on the STA,
attaches MxAccessBaseEventSink, and returns WorkerReady only after those
steps succeed. MxAccessSession keeps the raw COM object private, records the
STA managed thread id that created it, detaches the base event sink during
disposal, and releases the COM reference on the STA. After creation,
MxAccessStaSession owns a StaCommandDispatcher backed by
MxAccessCommandExecutor; DispatchAsync queues contract commands back to the
same STA instead of exposing the COM object to callers.
Creation rules:
- Create COM object only on the STA.
- Attach event handlers only on the STA.
- Keep the COM reference private to the STA runtime.
- Never marshal the raw COM object to pipe reader/writer threads.
- Capture COM creation HRESULT or exception details.
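The creation rules amount to thread affinity for the COM reference. A hypothetical guard type can record the creating thread's managed id, as MxAccessSession does, and refuse any call made from another thread. StaBoundSketch is not part of the worker. With this guard, an accidental use from a pipe reader or writer fails loudly instead of silently crossing apartments.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: the raw object is only reachable through Use(), which checks the calling thread.
public sealed class StaBoundSketch<T> where T : class
{
    private readonly T _inner;
    private readonly int _ownerThreadId;

    public StaBoundSketch(T inner)
    {
        _inner = inner;
        _ownerThreadId = Thread.CurrentThread.ManagedThreadId; // the creating (STA) thread
    }

    public int OwnerThreadId { get { return _ownerThreadId; } }

    public TResult Use<TResult>(Func<T, TResult> call)
    {
        if (Thread.CurrentThread.ManagedThreadId != _ownerThreadId)
            throw new InvalidOperationException("COM object used outside its owning STA thread.");
        return call(_inner);
    }
}
```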
If COM creation fails, the worker should send a structured fault with:
- fault category,
- exception type,
- HRESULT when available,
- COM class or ProgID attempted,
- worker process id,
- session id.
WorkerPipeSession maps startup exceptions from this path to
WorkerFaultCategory.MxaccessCreationFailed, includes the captured HRESULT
when the exception exposes one, and does not send WorkerReady after a failed
COM creation attempt.
After WorkerReady, WorkerPipeSession continues reading gateway frames for
the lifetime of the process. WorkerCommand frames are dispatched to
MxAccessStaSession, replies are written as WorkerCommandReply, and queued
worker events are drained after command replies. WorkerShutdown starts the
graceful shutdown path and returns WorkerShutdownAck only after the STA
cleanup path completes.
Event Sink
The worker subscribes to every public MXAccess event family through
MxAccessBaseEventSink:
- OnDataChange
- OnWriteComplete
- OperationComplete
- OnBufferedDataChange
Alarm transitions arrive on a separate path. They do not originate from the
LMXProxyServerClass connection points, so MxAccessAlarmEventSink (driven by
the alarm subsystem below) feeds them onto the same MxAccessEventQueue rather
than MxAccessBaseEventSink.
Forward these event families only when the native MXAccess COM object raises
them. Do not synthesize OperationComplete from write completion or command
status. OnBufferedDataChange must be represented in the protocol now, but
multi-sample payload conversion should remain capture-validated; preserve raw
metadata whenever conversion is incomplete.
Event handling rules:
- Event handlers are expected to run on the STA.
- Assign a monotonic worker event sequence.
- Convert event args to WorkerEvent.
- Include value, quality, timestamp, handles, status arrays, and raw status details when available.
- Preserve raw event payload metadata for unsupported buffered or completion-only shapes.
- Enqueue to the outbound event queue.
- Return quickly to preserve message pumping.
MxAccessBaseEventSink implements the COM connection-point handlers and keeps
the handlers limited to event argument conversion plus enqueue. It uses
MxAccessEventMapper to create MxEvent DTOs for OnDataChange,
OnWriteComplete, OperationComplete, and OnBufferedDataChange. The mapper
converts scalar and array values through VariantConverter, converts
MXSTATUS_PROXY[] through MxStatusProxyConverter, and maps installed
MxDataType values to the public protobuf enum while preserving the raw data
type on buffered events. OperationComplete is only emitted from the native
OperationComplete handler; write completion does not synthesize it.
MxAccessEventQueue is the bounded outbound event queue for one worker
session. It assigns the monotonic WorkerSequence and WorkerTimestamp when an
event is accepted, preserving the order in which MXAccess handlers enqueue
events. The default capacity is 10000. When the queue reaches capacity, Enqueue
records a WorkerFaultCategory.QueueOverflow fault and then throws
MxAccessEventQueueOverflowException so the caller cannot silently drop the
event. The event handler catches conversion and enqueue failures (including this
overflow exception), records the first fault on the queue, and returns to the
STA message pump instead of writing to the pipe.
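Below is a simplified sketch of the queue and handler contract described above. EventQueueSketch, SequencedEvent, EventQueueOverflowException, and EventHandlerSketch are hypothetical names. String payloads stand in for MxEvent DTOs, and fault categories are plain strings. The sketch shows three behaviors: sequence assignment at acceptance time, the overflow fault followed by an exception, and a handler that records the first fault and returns instead of touching the pipe.

```csharp
using System;
using System.Collections.Generic;

public sealed class SequencedEvent
{
    public SequencedEvent(long sequence, DateTime timestampUtc, string payload)
    {
        Sequence = sequence; TimestampUtc = timestampUtc; Payload = payload;
    }
    public long Sequence { get; private set; }
    public DateTime TimestampUtc { get; private set; }
    public string Payload { get; private set; }
}

public sealed class EventQueueOverflowException : Exception
{
    public EventQueueOverflowException(int capacity) : base("Event queue full at capacity " + capacity + ".") { }
}

public sealed class EventQueueSketch
{
    private readonly object _gate = new object();
    private readonly Queue<SequencedEvent> _items = new Queue<SequencedEvent>();
    private readonly int _capacity;
    private long _sequence;
    private string _firstFault;

    public EventQueueSketch(int capacity = 10000) { _capacity = capacity; }

    public string FirstFault { get { lock (_gate) { return _firstFault; } } }

    public void RecordFault(string category)
    {
        lock (_gate) { if (_firstFault == null) _firstFault = category; } // first fault wins
    }

    public long Enqueue(string payload)
    {
        lock (_gate)
        {
            if (_items.Count >= _capacity)
            {
                RecordFault("QueueOverflow");
                throw new EventQueueOverflowException(_capacity); // never drop silently
            }
            long sequence = ++_sequence; // monotonic, assigned when the event is accepted
            _items.Enqueue(new SequencedEvent(sequence, DateTime.UtcNow, payload));
            return sequence;
        }
    }

    public bool TryDequeue(out SequencedEvent item)
    {
        lock (_gate)
        {
            item = _items.Count > 0 ? _items.Dequeue() : null;
            return item != null;
        }
    }
}

public static class EventHandlerSketch
{
    // Shape of a connection-point handler: convert, enqueue, return. No pipe I/O, no exception escapes to COM.
    public static void Handle(EventQueueSketch queue, Func<string> convert)
    {
        try { queue.Enqueue(convert()); }
        catch (EventQueueOverflowException) { /* Enqueue already recorded QueueOverflow */ }
        catch (Exception) { queue.RecordFault("MxAccessEventConversionFailed"); }
    }
}
```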
If event conversion throws, catch it inside the event handler, record a
structured WorkerFault, and keep the worker alive only if the fault policy
allows it.
Alarm Subsystem
Alarms come from a different COM surface than tag data, so the worker carries a
separate pipeline rather than folding alarms into MxAccessBaseEventSink. The
MXAccess LMXProxyServerClass does not expose alarm subscription, so the worker
hosts AVEVA's standalone alarm-consumer COM object instead.
- WnWrapAlarmConsumer is the production IMxAccessAlarmConsumer, backed by WNWRAPCONSUMERLib.wwAlarmConsumerClass. It returns the active alarm set as a BSTR XML string through GetXmlCurrentAlarms2, which avoids the FILETIME→DateTime marshaling that crashed the earlier managed alarm client. The CLSID is registered ThreadingModel=Apartment, so the consumer is created and driven entirely on the worker's STA. It owns no internal timer.
- MxAccessStaSession drives the STA alarm poll loop: RunAlarmPollLoopAsync awaits a fixed 500 ms interval and then calls IAlarmCommandHandler.PollOnce on the STA via the runtime, so every GetXmlCurrentAlarms2 call stays on the apartment that owns the consumer. A poll failure is recorded as a WorkerFault on the event queue rather than terminating the worker.
- AlarmCommandHandler owns one AlarmDispatcher per session and is the entry point for the alarm IPC commands (SubscribeAlarms, AcknowledgeAlarm by GUID or name, QueryActiveAlarms, Unsubscribe). It rejects a second subscribe before an unsubscribe, mirroring the consumer's non-idempotent Subscribe.
- AlarmDispatcher wires the consumer's AlarmTransitionEmitted stream onto MxAccessAlarmEventSink.EnqueueTransition. It maps state transitions through AlarmRecordTransitionMapper, composes the canonical `\\<machine>\Galaxy!<area>` full reference, and projects active-alarm snapshots to ActiveAlarmSnapshot protos for the QueryActiveAlarms refresh stream.
- MxAccessAlarmEventSink enqueues each decoded transition onto the shared MxAccessEventQueue as a proto alarm-transition event, stamping the session id, so alarms ride the same outbound IPC path as tag-data events.
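The poll loop's contract can be sketched with delegates. AlarmPollLoopSketch is hypothetical. The invokeOnSta delegate stands in for scheduling through the STA runtime, pollOnce for IAlarmCommandHandler.PollOnce, and recordFault for queuing a WorkerFault. The documented interval is 500 ms; the sketch takes it as a parameter so it can be exercised quickly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AlarmPollLoopSketch
{
    public static async Task RunAsync(
        TimeSpan interval,
        Func<Action, Task> invokeOnSta,
        Action pollOnce,
        Action<Exception> recordFault,
        CancellationToken token)
    {
        while (true)
        {
            try
            {
                // Fixed interval owned by the session; the consumer itself has no timer.
                await Task.Delay(interval, token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return; // session shutdown
            }

            try
            {
                // The poll is scheduled onto the STA that owns the consumer.
                await invokeOnSta(pollOnce).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                recordFault(ex); // a failed poll becomes a fault; the loop keeps running
            }
        }
    }
}
```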
Command Queue
The pipe reader converts WorkerCommand messages into StaCommand entries.
Each entry should include:
- correlation id,
- method name,
- method-specific request payload,
- enqueue timestamp,
- cancellation marker,
- reply completion path.
The STA command dispatcher:
- Dequeues one command.
- Checks whether shutdown has started.
- Calls the matching MXAccess method.
- Captures return values, out parameters, status arrays, and HRESULT.
- Converts results to WorkerCommandReply.
- Enqueues the reply to the pipe writer.
The STA should execute one command at a time. MXAccess command ordering must be preserved for one worker.
Command Dispatch Surface
Phase 1 commands:
- Register
- Unregister
- AddItem
- RemoveItem

Phase 2 event commands:
- Advise
- UnAdvise
- AdviseSupervisory

Full surface:
- AddItem2
- AddBufferedItem
- SetBufferedUpdateInterval
- Suspend
- Activate
- Write
- Write2
- WriteSecured
- WriteSecured2
- AuthenticateUser
- ArchestrAUserToId

Diagnostics:
- Ping
- GetSessionState
- GetWorkerInfo
- DrainEvents
- ShutdownWorker
Implement method-specific dispatch instead of a generic string method invoker. Parity tests need stable command-specific request and reply shapes.
MxAccessCommandExecutor implements the first command pair:
- Register calls LMXProxyServerClass.Register with the requested client name and preserves the returned server handle in both ReturnValue and RegisterReply.ServerHandle.
- Unregister calls LMXProxyServerClass.Unregister with the requested server handle. The reply has no method-specific payload because the public MXAccess method returns void.
Both commands set Hresult to 0 only after the COM call returns normally.
COM exceptions flow through StaCommandDispatcher, which captures the thrown
HRESULT and converts the reply to ProtocolStatusCode.MxaccessFailure.
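Below is a minimal sketch of how a dispatcher can turn a thrown COM exception into a reply instead of a crash. CommandReplySketch, ReplyStatusSketch, and DispatchSketch are hypothetical; only the Ok and MxaccessFailure status names come from this document. COMException.ErrorCode carries the HRESULT. Non-COM exceptions are deliberately left to propagate, because how they map to a reply is a separate policy decision (see Status And HRESULT Capture).

```csharp
using System;
using System.Runtime.InteropServices;

public enum ReplyStatusSketch { Ok, MxaccessFailure }

public sealed class CommandReplySketch
{
    public string CorrelationId;
    public string Method;
    public ReplyStatusSketch Status;
    public int Hresult;
    public object ReturnValue;
    public string Diagnostic; // log-safe: exception type only, never the message or arguments
}

public static class DispatchSketch
{
    // comCall runs on the STA in the real worker; here it is any delegate.
    public static CommandReplySketch Execute(string correlationId, string method, Func<object> comCall)
    {
        var reply = new CommandReplySketch { CorrelationId = correlationId, Method = method };
        try
        {
            reply.ReturnValue = comCall();
            reply.Status = ReplyStatusSketch.Ok;
            reply.Hresult = 0; // set only after the call returned normally
        }
        catch (COMException ex)
        {
            reply.Status = ReplyStatusSketch.MxaccessFailure;
            reply.Hresult = ex.ErrorCode; // the HRESULT carried by the COM exception
            reply.Diagnostic = ex.GetType().Name;
        }
        return reply;
    }
}
```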
MxAccessStaSession.GetRegisteredServerHandlesAsync returns an STA-read
snapshot of tracked server handles for diagnostics and future cleanup logic.
MxAccessCommandExecutor also implements the item lifecycle commands:
- AddItem calls LMXProxyServerClass.AddItem with the requested server handle and item definition. It preserves the returned item handle in both ReturnValue and AddItemReply.ItemHandle.
- AddItem2 calls LMXProxyServerClass.AddItem2 with the requested server handle, item definition, and context string. The context string is passed to MXAccess exactly as received.
- RemoveItem calls LMXProxyServerClass.RemoveItem with the requested server handle and item handle. The reply has no method-specific payload because the public MXAccess method returns void.
The worker records item handles only after AddItem or AddItem2 returns
normally, and removes item handles only after RemoveItem returns normally.
The registry does not prevalidate server or item handles, so invalid and
cross-server handle behavior remains owned by MXAccess. COM exceptions continue
through StaCommandDispatcher, which preserves the HRESULT and leaves
diagnostic registry state unchanged for failed cleanup calls.
MxAccessCommandExecutor implements advice lifecycle commands on the same STA
path:
- Advise calls LMXProxyServerClass.Advise with the requested server handle and item handle.
- AdviseSupervisory calls LMXProxyServerClass.AdviseSupervisory with the requested server handle and item handle. This remains a distinct command from plain Advise even though observed scalar captures share the same lower-level subscription body.
- UnAdvise calls LMXProxyServerClass.UnAdvise with the requested server handle and item handle.
The worker records plain and supervisory advice separately only after the COM
call returns normally. Successful UnAdvise removes all tracked advice for the
server and item pair because the public MXAccess cleanup method has no plain
versus supervisory selector. Successful RemoveItem and Unregister also clear
related advice state from the worker registry. Failed advice and cleanup calls
leave registry state unchanged so diagnostics continue to reflect the last
successful MXAccess-owned state transition.
Handle Registry
The worker should track MXAccess state for diagnostics and cleanup, while still treating MXAccess as the authority.
Suggested tracked state:
- registered server handles,
- item handles,
- item names and context,
- server handle for each item,
- advise state,
- buffered item state,
- authenticated user ids if needed,
- last command touching each handle.
Rules:
- Do not invent handles.
- Do not rewrite handles returned by MXAccess.
- Record server handles only after Register succeeds.
- Remove server handles only after Unregister succeeds.
- Record item handles only after AddItem or AddItem2 succeeds.
- Remove item handles only after RemoveItem succeeds.
- Record advice state only after Advise or AdviseSupervisory succeeds.
- Remove advice state only after UnAdvise, RemoveItem, or Unregister succeeds.
- Preserve invalid-handle behavior from MXAccess.
- Preserve cross-server handle behavior from MXAccess.
- Use registry state for cleanup and diagnostics, not semantic correction.
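These rules can be captured in a small diagnostic registry. HandleRegistrySketch is a hypothetical sketch, not MxAccessHandleRegistry. It follows the rule list literally, so Unregister clears server and advice state but leaves item handles for RemoveItem. UnAdvise clears both plain and supervisory advice because MXAccess offers no selector. Keying items by (server, item) pair keeps identical handle values on different servers from colliding.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical diagnostic registry. Handles are only ever recorded as returned by MXAccess; none are invented.
public sealed class HandleRegistrySketch
{
    private readonly HashSet<int> _servers = new HashSet<int>();
    private readonly HashSet<(int Server, int Item)> _items = new HashSet<(int Server, int Item)>();
    private readonly HashSet<(int Server, int Item)> _plainAdvice = new HashSet<(int Server, int Item)>();
    private readonly HashSet<(int Server, int Item)> _supervisoryAdvice = new HashSet<(int Server, int Item)>();

    public int ServerCount { get { return _servers.Count; } }
    public int ItemCount { get { return _items.Count; } }

    // Each On* method is called only after the matching COM call returned normally.
    public void OnRegistered(int server) { _servers.Add(server); }
    public void OnItemAdded(int server, int item) { _items.Add((server, item)); }

    public void OnAdvised(int server, int item, bool supervisory)
    {
        (supervisory ? _supervisoryAdvice : _plainAdvice).Add((server, item));
    }

    public void OnUnAdvised(int server, int item)
    {
        // UnAdvise has no plain/supervisory selector, so both kinds are cleared.
        _plainAdvice.Remove((server, item));
        _supervisoryAdvice.Remove((server, item));
    }

    public void OnItemRemoved(int server, int item)
    {
        OnUnAdvised(server, item);
        _items.Remove((server, item));
    }

    public void OnUnregistered(int server)
    {
        // Clears the server and its advice; item handles are removed only by RemoveItem.
        _servers.Remove(server);
        _plainAdvice.RemoveWhere(pair => pair.Server == server);
        _supervisoryAdvice.RemoveWhere(pair => pair.Server == server);
    }

    public bool IsAdvised(int server, int item)
    {
        return _plainAdvice.Contains((server, item)) || _supervisoryAdvice.Contains((server, item));
    }

    // One entry per advised pair, as needed for one UnAdvise call per pair at shutdown.
    public List<(int Server, int Item)> AdvisedPairs()
    {
        return _plainAdvice.Union(_supervisoryAdvice).ToList();
    }
}
```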
Value Conversion
VariantConverter should convert COM values into the protobuf MxValue union.
Supported scalar projections:
- bool,
- int32,
- int64,
- float,
- double,
- string,
- timestamp,
- raw fallback.
Supported arrays:
- bool array,
- int32 array,
- float array,
- double array,
- string array,
- timestamp array,
- raw fallback.
Rules:
- Preserve null and empty values distinctly when MXAccess exposes a distinction.
- Preserve array rank and dimensions when available.
- Preserve original variant type metadata.
- If conversion is lossy, include the best typed value plus raw diagnostic metadata.
- Do not throw away values just because they are awkward.
Credential-bearing values must not be logged.
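Below is a partial sketch of the scalar and array projection rules. MxKindSketch and MxValueSketch are hypothetical stand-ins for the protobuf MxValue union. Only int32, double, and string arrays get typed projections here; bool, float, and timestamp arrays would follow the same pattern. The sketch relies on standard .NET COM interop behavior, which surfaces VT_EMPTY as null and VT_NULL as DBNull.Value, to keep null and empty distinct. Widening byte and short to int32 is an assumption.

```csharp
using System;

public enum MxKindSketch { Empty, Null, Bool, Int32, Int64, Float, Double, String, Timestamp, Int32Array, DoubleArray, StringArray, Raw }

public sealed class MxValueSketch
{
    public MxKindSketch Kind;
    public object Value;      // best typed projection (the original object for Raw)
    public string SourceType; // original runtime type, kept as diagnostic metadata
    public int Rank;          // array rank; 0 for scalars
}

public static class VariantSketch
{
    public static MxValueSketch ToMxValue(object variant)
    {
        // COM interop surfaces VT_EMPTY as null and VT_NULL as DBNull.Value; keep them distinct.
        if (variant == null) return Make(MxKindSketch.Empty, null, "VT_EMPTY", 0);
        if (variant is DBNull) return Make(MxKindSketch.Null, null, "VT_NULL", 0);

        string source = variant.GetType().FullName;
        Array array = variant as Array;
        if (array != null)
        {
            if (variant is int[]) return Make(MxKindSketch.Int32Array, variant, source, 1);
            if (variant is double[]) return Make(MxKindSketch.DoubleArray, variant, source, 1);
            if (variant is string[]) return Make(MxKindSketch.StringArray, variant, source, 1);
            return Make(MxKindSketch.Raw, variant, source, array.Rank); // other or multi-dimensional: keep rank
        }

        if (variant is bool) return Make(MxKindSketch.Bool, variant, source, 0);
        if (variant is byte || variant is short || variant is int)
            return Make(MxKindSketch.Int32, Convert.ToInt32(variant), source, 0); // widened; SourceType keeps the original
        if (variant is long) return Make(MxKindSketch.Int64, variant, source, 0);
        if (variant is float) return Make(MxKindSketch.Float, variant, source, 0);
        if (variant is double) return Make(MxKindSketch.Double, variant, source, 0);
        if (variant is string) return Make(MxKindSketch.String, variant, source, 0);
        if (variant is DateTime) return Make(MxKindSketch.Timestamp, variant, source, 0);
        return Make(MxKindSketch.Raw, variant, source, 0); // awkward values are kept, not dropped
    }

    private static MxValueSketch Make(MxKindSketch kind, object value, string source, int rank)
    {
        return new MxValueSketch { Kind = kind, Value = value, SourceType = source, Rank = rank };
    }
}
```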
Status And HRESULT Capture
MXSTATUS_PROXY arrays must be represented explicitly. Do not collapse status
arrays into a single success flag.
For every command reply, capture:
- protocol success/failure,
- method name,
- correlation id,
- COM HRESULT if available,
- thrown exception HRESULT if available,
- MXAccess return value if any,
- method-specific out parameters,
- status array,
- diagnostic message safe for logs.
If a COM call throws, map the exception into a command reply instead of crashing the worker, unless the exception indicates process corruption or the configured policy says to fail the session.
Cancellation
Worker cancellation is cooperative at the queue boundary.
Rules:
- If a WorkerCancel arrives before a command starts, mark the command canceled and reply or drop according to protocol policy.
- If a command is already executing on the STA, do not attempt to abort the COM call.
- When the COM call returns after gateway cancellation, send the reply only if the gateway still wants late replies; otherwise log and discard.
- Hard cancellation is process kill by the gateway.
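The cancellation rules reduce to a small per-command state machine. QueuedCommandSketch is hypothetical. The gatewayWantsLateReplies flag stands in for whatever protocol policy decides whether late replies are still wanted. Compare-and-swap on the state lets the pipe reader and the STA race safely: for a queued command, exactly one of TryCancelBeforeStart and TryStart wins. A cancel that loses the race never interrupts the COM call.

```csharp
using System.Threading;

public sealed class QueuedCommandSketch
{
    private const int Queued = 0, Executing = 1, Canceled = 2, Completed = 3;
    private readonly bool _gatewayWantsLateReplies;
    private int _state = Queued;
    private int _canceledWhileExecuting;

    public QueuedCommandSketch(bool gatewayWantsLateReplies)
    {
        _gatewayWantsLateReplies = gatewayWantsLateReplies;
    }

    public bool IsCanceled { get { return Volatile.Read(ref _state) == Canceled; } }

    // Pipe reader, on WorkerCancel. True means the command never reaches the STA.
    public bool TryCancelBeforeStart()
    {
        if (Interlocked.CompareExchange(ref _state, Canceled, Queued) == Queued) return true;
        // Already running (or done): never abort the COM call, only remember the cancellation.
        Volatile.Write(ref _canceledWhileExecuting, 1);
        return false;
    }

    // STA dispatcher, when it dequeues the command. False means skip it.
    public bool TryStart()
    {
        return Interlocked.CompareExchange(ref _state, Executing, Queued) == Queued;
    }

    // STA dispatcher, after the COM call returns. False means log and discard the reply.
    public bool CompleteAndShouldReply()
    {
        Volatile.Write(ref _state, Completed);
        return Volatile.Read(ref _canceledWhileExecuting) == 0 || _gatewayWantsLateReplies;
    }
}
```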
Outbound Queues
The worker should use bounded outbound queues for replies, events, heartbeats, and faults.
Priority order when writing:
- faults,
- command replies,
- shutdown acknowledgements,
- heartbeats,
- events.
Event overflow policy defaults to fail-fast for parity testing. If the event queue fills:
- Capture overflow metrics.
- Send WorkerFault if possible.
- Stop accepting new commands.
- Let the gateway close or kill the worker.
Production coalescing may be added later, but it must be explicit and tested. Do not drop or coalesce events in v1.
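Below is a sketch of bounded outbound queues drained in the priority order above. OutboundQueuesSketch and OutboundKind are hypothetical, and frames are plain strings. A full queue returns false rather than dropping the frame, so the caller can raise a fault. With strict priority, a burst of replies delays events but never discards them.

```csharp
using System.Collections.Generic;

// Hypothetical frame kinds, declared in write-priority order (lower value is written first).
public enum OutboundKind { Fault = 0, CommandReply = 1, ShutdownAck = 2, Heartbeat = 3, Event = 4 }

public sealed class OutboundQueuesSketch
{
    private readonly object _gate = new object();
    private readonly Queue<string>[] _queues = new Queue<string>[5];
    private readonly int _capacityPerKind;

    public OutboundQueuesSketch(int capacityPerKind)
    {
        _capacityPerKind = capacityPerKind;
        for (int i = 0; i < _queues.Length; i++) _queues[i] = new Queue<string>();
    }

    // Bounded: a full queue returns false so the caller can raise a fault instead of dropping silently.
    public bool TryEnqueue(OutboundKind kind, string frame)
    {
        lock (_gate)
        {
            Queue<string> queue = _queues[(int)kind];
            if (queue.Count >= _capacityPerKind) return false;
            queue.Enqueue(frame);
            return true;
        }
    }

    // The pipe writer always takes the next frame from the highest-priority non-empty queue.
    public bool TryDequeueNext(out string frame)
    {
        lock (_gate)
        {
            foreach (Queue<string> queue in _queues)
            {
                if (queue.Count > 0)
                {
                    frame = queue.Dequeue();
                    return true;
                }
            }
            frame = null;
            return false;
        }
    }
}
```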
Heartbeat And Watchdog
WorkerPipeSession starts the heartbeat loop after the gateway validates
WorkerHello and receives WorkerReady. Heartbeats continue until
WorkerShutdown, cancellation, or a pipe/protocol failure stops the session.
The loop uses WorkerPipeSessionOptions.HeartbeatInterval; the default matches
the gateway worker heartbeat interval.
The worker heartbeat proves that:
- pipe writer is alive,
- worker host is alive,
- STA has recently pumped or completed work.
Heartbeat payload includes:
- worker process id,
- session id,
- current state,
- last STA activity timestamp,
- pending command count,
- outbound event queue depth,
- event sequence,
- current command correlation id if any.
MxAccessStaSession.CaptureHeartbeat() reads StaRuntime.LastActivityUtc and
StaCommandDispatcher queue state without touching the raw MXAccess COM object
outside the STA. Event queue depth and event sequence are reported as zero until
the event queue implementation owns those counters.
The STA watchdog currently emits a WorkerFault with
WorkerFaultCategory.StaHung when LastStaActivityUtc is older than
WorkerPipeSessionOptions.HeartbeatGrace and no command is in flight.
StaRuntime.ProcessQueuedCommands calls MarkActivity() only immediately
before and after each work item, so a synchronously long-running STA command
(for example a ReadBulk waiting timeout_ms for the first OnDataChange)
legitimately freezes LastStaActivityUtc for the duration of the wait while
the worker is healthy. The watchdog is therefore suppressed while the
heartbeat snapshot's CurrentCommandCorrelationId is non-empty: the worker is
busy executing a command, not hung, and the heartbeat already surfaces the
in-flight correlation id so the gateway can apply its own per-command timeout
if it considers the command too slow. The fault still fires on a truly hung
STA — no command in flight and no activity for longer than HeartbeatGrace —
which is the only case the watchdog can usefully distinguish from a slow
command. Command duration and high event queue depth remain observable through
heartbeat fields until dedicated thresholds own those warnings. The worker
reports stale STA activity, but the gateway owns the final kill decision
through its existing heartbeat and worker lifecycle policy.
The in-flight-command suppression itself is bounded by
WorkerPipeSessionOptions.HeartbeatStuckCeiling (default 75 seconds = 5 ×
HeartbeatGrace). The motivating case for the suppression is a legitimately
slow synchronous command — but a genuinely stuck COM call (for example
against a dead MXAccess provider whose cross-apartment marshaler is
permanently blocked, or a write completion that never fires) leaves
CurrentCommandCorrelationId non-empty indefinitely. Without an upper bound
the worker-side StaHung watchdog would be permanently defeated for that
session and only the gateway's per-command timeout would catch the hang —
losing the worker-originated diagnostic (StaHung fault category, the
stale-by interval) from the gateway audit trail. Once LastStaActivityUtc
has been stale for longer than HeartbeatStuckCeiling, the watchdog fires
StaHung regardless of whether a command is in flight, on the assumption
that no legitimate STA command should run that long without periodically
refreshing activity. Deployments that legitimately run very long bulk
operations should raise the ceiling rather than disable it.
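The watchdog decision can be written as a pure function. StaWatchdogSketch is a hypothetical name. Its inputs correspond to the heartbeat fields LastStaActivityUtc and CurrentCommandCorrelationId. The default ceiling of 75 seconds equals 5 × HeartbeatGrace, which implies a HeartbeatGrace of 15 seconds; the usage check uses that value.

```csharp
using System;

public static class StaWatchdogSketch
{
    // True when a StaHung fault should be emitted for this heartbeat snapshot.
    public static bool ShouldFireStaHung(
        DateTime nowUtc,
        DateTime lastStaActivityUtc,
        string currentCommandCorrelationId,
        TimeSpan heartbeatGrace,
        TimeSpan heartbeatStuckCeiling)
    {
        TimeSpan stale = nowUtc - lastStaActivityUtc;

        // Past the ceiling, even an in-flight command is treated as stuck.
        if (stale > heartbeatStuckCeiling) return true;

        // Within the ceiling, an in-flight command means "busy", not "hung".
        bool commandInFlight = !string.IsNullOrEmpty(currentCommandCorrelationId);
        return !commandInFlight && stale > heartbeatGrace;
    }
}
```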
Shutdown
Graceful shutdown sequence:
- Pipe reader receives WorkerShutdown.
- Worker host marks shutdown requested.
- Reject new commands.
- Let current STA command finish if within timeout.
- Optionally run MXAccess cleanup: UnAdvise, RemoveItem, Unregister.
- Detach event handlers.
- Release COM object until reference count reaches zero when possible.
- Stop pipe reader and writer.
- Exit process with success code.
If shutdown wedges, the gateway kills the process. The worker should be written so process kill does not corrupt other sessions.
MxAccessStaSession.ShutdownGracefullyAsync implements the current cleanup
path. It first calls StaCommandDispatcher.RequestShutdown() so new commands
are rejected and queued commands that have not started receive
ProtocolStatusCode.WorkerUnavailable. The command already executing on the
STA is allowed to finish until the shutdown grace period expires.
After command dispatch is closed, cleanup runs on the STA in MXAccess handle order:
- one UnAdvise call per advised server/item pair,
- RemoveItem for active item handles,
- Unregister for active server handles,
- event sink detach,
- COM release.
Each cleanup call is best effort. A failed cleanup operation is recorded as an
MxAccessShutdownFailure, logged by WorkerPipeSession, and does not prevent
later cleanup calls from running. A shutdown with cleanup failures still returns
WorkerShutdownAck with ProtocolStatusCode.Ok because the worker reached the
controlled release path. If the grace period expires before cleanup can run or
finish, the worker reports WorkerFaultCategory.ShutdownTimeout when possible
and relies on the gateway to kill the process.
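The best-effort rule can be sketched as an ordered list of named steps. ShutdownCleanupSketch and CleanupFailureSketch are hypothetical. Each step stands in for one STA cleanup call: UnAdvise, RemoveItem, Unregister, event sink detach, or COM release. A failure is recorded, the next step still runs, and the caller still acknowledges with Ok.

```csharp
using System;
using System.Collections.Generic;

public sealed class CleanupFailureSketch
{
    public string Step;
    public string ExceptionType;
}

public static class ShutdownCleanupSketch
{
    // Runs every step in order on the caller's thread (the STA in the real worker).
    public static List<CleanupFailureSketch> RunBestEffort(IEnumerable<KeyValuePair<string, Action>> steps)
    {
        var failures = new List<CleanupFailureSketch>();
        foreach (KeyValuePair<string, Action> step in steps)
        {
            try
            {
                step.Value();
            }
            catch (Exception ex)
            {
                // Record and continue: one failed cleanup call must not skip the later ones.
                failures.Add(new CleanupFailureSketch { Step = step.Key, ExceptionType = ex.GetType().Name });
            }
        }
        return failures;
    }
}
```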
Fault Handling
Worker fault categories:
- InvalidArguments
- GatewayAuthenticationFailed
- ProtocolMismatch
- ProtocolViolation
- PipeDisconnected
- MxAccessCreationFailed
- MxAccessCommandFailed
- MxAccessEventConversionFailed
- StaHung
- QueueOverflow
- ShutdownTimeout
Fault payload should include:
- category,
- session id,
- correlation id when command-specific,
- command method when command-specific,
- HRESULT when available,
- exception type when available,
- safe diagnostic message.
Do not include raw credentials or full secured-write values.
Security
The worker should trust only the launching gateway after validating:
- expected session id,
- expected protocol version,
- nonce,
- pipe identity where available.
It should not expose any network listener. It should not accept commands from arbitrary local processes.
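Below is a sketch of the hello checks. HelloValidationSketch is hypothetical. Mapping a version mismatch to ProtocolMismatch and a session or nonce mismatch to GatewayAuthenticationFailed is an assumption drawn from the fault category list. The nonce comparison is written by hand so it does not stop at the first differing byte, because CryptographicOperations.FixedTimeEquals is not available on .NET Framework 4.8.

```csharp
using System;
using System.Text;

public static class HelloValidationSketch
{
    // Does not stop at the first differing byte, so timing does not reveal a matching prefix.
    public static bool FixedTimeEquals(string expected, string actual)
    {
        byte[] a = Encoding.UTF8.GetBytes(expected ?? string.Empty);
        byte[] b = Encoding.UTF8.GetBytes(actual ?? string.Empty);
        int diff = a.Length ^ b.Length;
        int length = Math.Max(a.Length, b.Length);
        for (int i = 0; i < length; i++)
        {
            byte x = i < a.Length ? a[i] : (byte)0;
            byte y = i < b.Length ? b[i] : (byte)0;
            diff |= x ^ y;
        }
        return diff == 0;
    }

    // Expected values come from WorkerOptions; hello values from GatewayHello.
    // Returns null when accepted, otherwise an (assumed) fault category name.
    public static string Validate(
        string expectedSessionId, int expectedVersion, string expectedNonce,
        string helloSessionId, int helloVersion, string helloNonce)
    {
        if (helloVersion != expectedVersion) return "ProtocolMismatch";
        if (!string.Equals(expectedSessionId, helloSessionId, StringComparison.Ordinal)) return "GatewayAuthenticationFailed";
        if (!FixedTimeEquals(expectedNonce, helloNonce)) return "GatewayAuthenticationFailed";
        return null;
    }
}
```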
Credential-bearing commands must keep credential data out of:
- command line,
- logs,
- metrics labels,
- exception messages,
- crash dumps when avoidable.
Observability
Worker logs should include:
- startup arguments except secrets,
- protocol version,
- gateway handshake result,
- MXAccess COM creation result,
- command start/end with correlation id,
- HRESULT/status summary,
- event family and sequence,
- queue overflow,
- STA watchdog warnings,
- shutdown path.
Metrics can be emitted through the gateway or exposed as worker heartbeat fields. The worker does not need its own public metrics endpoint.
Testing Strategy
Worker tests that do not require installed MXAccess:
- frame reader/writer,
- protocol validation,
- command queue ordering,
- STA command scheduling with a fake COM object,
- message-pump wake behavior where practical,
- value conversion,
- status conversion,
- event conversion from fake event args,
- shutdown state transitions,
- queue overflow behavior.
Live MXAccess tests:
- COM creation on STA,
- Register and Unregister,
- AddItem and RemoveItem,
- Advise and one OnDataChange,
- write completion behavior,
- secured write behavior,
- buffered data-change behavior,
- invalid handle behavior,
- no synthesized OperationComplete when native MXAccess does not raise it,
- raw metadata preservation for buffered payloads that cannot yet be fully converted.
Live tests should be opt-in and clearly marked because they depend on installed
MXAccess COM and provider state.
The worker test suite uses MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 for these
tests. AddItem uses TestChildObject.TestInt by default and accepts an
override through MXGATEWAY_LIVE_MXACCESS_ITEM; AddItem2 uses the captured
parity fixture shape AddItem2("TestInt", "TestChildObject").
WorkerLiveMxAccessSmokeTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/ uses the
same opt-in variable for the gateway-to-worker live smoke. It launches the x86
worker through WorkerProcessLauncher, opens a gateway session, runs
Register, AddItem, and Advise, waits for one OnDataChange, and closes
the session. The smoke accepts MXGATEWAY_LIVE_MXACCESS_WORKER_EXE for a
non-default worker executable path and
MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS for the bounded event wait.
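The opt-in and override variables can be read in one place. LiveMxAccessSettingsSketch is hypothetical. Treating only the exact value 1 as enabled, and only a positive integer as a timeout, are assumptions. When the timeout variable is absent, the sketch returns null and leaves the default to the smoke test.

```csharp
using System;

public sealed class LiveMxAccessSettingsSketch
{
    public bool Enabled { get; private set; }
    public string ItemReference { get; private set; }
    public TimeSpan? EventTimeout { get; private set; } // null: let the test use its own default

    public static LiveMxAccessSettingsSketch FromEnvironment(Func<string, string> getEnv)
    {
        string item = getEnv("MXGATEWAY_LIVE_MXACCESS_ITEM");
        int seconds;
        bool hasTimeout = int.TryParse(getEnv("MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS"), out seconds) && seconds > 0;

        return new LiveMxAccessSettingsSketch
        {
            Enabled = getEnv("MXGATEWAY_RUN_LIVE_MXACCESS_TESTS") == "1",
            ItemReference = string.IsNullOrWhiteSpace(item) ? "TestChildObject.TestInt" : item,
            EventTimeout = hasTimeout ? TimeSpan.FromSeconds(seconds) : (TimeSpan?)null,
        };
    }
}
```

A live test would pass Environment.GetEnvironmentVariable and return early, or be marked skipped by its test framework, when Enabled is false.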
Initial Implementation Slice
The first worker slice should implement:
- Argument parsing and pipe connection.
- Protocol hello and nonce validation.
- STA thread startup.
- COM initialization and MXAccess object creation.
- Message pump with command wake event.
- WorkerReady.
- Shutdown command.
- Register, AddItem, and Advise.
- Event sink for one OnDataChange.
- Basic value/status conversion.
- Event model coverage for OperationComplete and OnBufferedDataChange without synthesized events.
- Fault reporting.
This slice proves the worker can preserve the core MXAccess requirements: single-process isolation, STA ownership, message pumping, command execution, and event delivery.