Files
mxaccessgw/docs/WorkerSta.md
T
Joseph Doherty ddad573b75 Merge origin/main with local pending work and update AGENTS.md references
- Resolve 14 conflicts from popping local stash on top of origin's
  eed1e88 + 8d3352f doc-comment additions (11 mechanical, plus
  version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs)
- Fix 4 test files that used AGENTS.md as the repo-root sentinel
  (now use CLAUDE.md, since AGENTS.md was removed in 4731ab5)
- Redirect 10 doc citations from AGENTS.md to the matching gateway.md
  sections (Value Model, Status Model, Security, STA Worker Thread
  Model, gRPC Layer rule, cancellation rule)

Verified: solution build clean, x86 worker build clean, 266/266
gateway tests passing, 121/121 worker tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:13:33 -04:00

9.8 KiB

Worker STA Runtime

The worker STA runtime owns the dedicated single-threaded apartment thread that hosts the MXAccess COM object, runs a Windows message pump for COM event delivery, and serializes all gateway commands onto that thread.

Why an STA Is Required

The installed MXAccess interop assembly declares an Apartment threading model (see gateway.md under "STA Worker Thread Model"). COM objects with that model must be created and called on a thread initialized as a single-threaded apartment, and any callbacks the object raises (event sink calls) are delivered through the thread's Windows message queue. A plain blocking queue is not sufficient: the STA loop must pump Windows messages so that the COM marshaler can deliver event invocations on the same thread that holds the object. Because of that constraint, every MXAccess operation in the worker is funneled through the types in src/MxGateway.Worker/Sta/.

Types

Type Role
StaRuntime Owns the STA Thread, the command queue, and the lifecycle gates.
IStaComApartmentInitializer / StaComApartmentInitializer Calls CoInitializeEx / CoUninitialize so the thread enters and leaves the apartment-threaded apartment.
StaMessagePump Wraps MsgWaitForMultipleObjectsEx, PeekMessage, TranslateMessage, and DispatchMessage so the STA loop can wait on work and drain the Windows message queue.
IStaWorkItem / StaWorkItem<T> Internal queue entries that capture a delegate, a CancellationToken, and a TaskCompletionSource<T> for the caller.
StaCommand Carries an MxCommand together with SessionId, CorrelationId, EnqueueTimestamp, and a CancellationToken.
IStaCommandExecutor The boundary between the dispatcher and the MXAccess interop layer; returns MxCommandReply.
StaCommandDispatcher Bounded asynchronous queue in front of StaRuntime that converts StaCommand into MxCommandReply and applies status normalization.

STA Thread Initialization

StaRuntime's constructor configures a background Thread named MxGateway.Worker.STA and forces it into ApartmentState.STA before the thread starts. Start() releases the thread and then blocks on startedEvent so callers observe a fully-initialized apartment (or a captured startupException) before the first InvokeAsync call:

staThread = new Thread(ThreadMain)
{
    IsBackground = true,
    Name = "MxGateway.Worker.STA"
};
staThread.SetApartmentState(ApartmentState.STA);

StaComApartmentInitializer.Initialize calls CoInitializeEx with COINIT_APARTMENTTHREADED (0x2) and treats both S_OK and S_FALSE as success because S_FALSE indicates the apartment was already initialized on this thread. Any other HRESULT throws COMException so ThreadMain records the failure and signals startedEvent, which causes Start() to surface the exception.

The STA Loop

ThreadMain runs the canonical "wait, pump, dispatch" loop. It enters COM, drains queued work, blocks until either a command arrives or a Windows message is posted, then pumps the message queue and records activity for the heartbeat:

StaThreadId = Thread.CurrentThread.ManagedThreadId;
comApartmentInitializer.Initialize();
comInitialized = true;
MarkActivity();
startedEvent.Set();

while (!IsShutdownRequested())
{
    ProcessQueuedCommands();
    messagePump.WaitForWorkOrMessages(commandWakeEvent, idlePumpInterval);
    messagePump.PumpPendingMessages();
    MarkActivity();
}

commandWakeEvent is an AutoResetEvent set by InvokeAsync whenever a new work item is enqueued. The idlePumpInterval defaults to 50 ms so the pump still services Windows messages even when no commands are queued; this matters because COM event sink calls arrive as posted messages and would otherwise sit in the queue until the next command. LastActivityUtc is updated through Volatile.Write on every iteration so the worker's heartbeat path can read it without holding gate.

The message pump

StaMessagePump.WaitForWorkOrMessages calls MsgWaitForMultipleObjectsEx with QS_ALLINPUT and MWMO_INPUTAVAILABLE, so the wait wakes for either a signal on commandWakeEvent or any new message in the thread's Windows queue. PumpPendingMessages then drains the queue with PM_REMOVE:

public int PumpPendingMessages()
{
    int pumpedMessages = 0;

    while (PeekMessage(out NativeMessage message, IntPtr.Zero, 0, 0, PmRemove))
    {
        TranslateMessage(ref message);
        DispatchMessage(ref message);
        pumpedMessages++;
    }

    return pumpedMessages;
}

DispatchMessage is what causes the COM marshaler to deliver event sink calls onto the STA thread; without this loop, MXAccess events never surface to managed code.

Work Items and the Command Queue

StaRuntime exposes two InvokeAsync overloads. Both wrap the delegate in an internal StaWorkItem<T>, enqueue it on a ConcurrentQueue<IStaWorkItem>, and signal commandWakeEvent:

StaWorkItem<T> workItem = new(command, cancellationToken);

lock (gate)
{
    if (shutdownRequested)
    {
        return Task.FromException<T>(
            new InvalidOperationException("The worker STA runtime is shutting down."));
    }

    commandQueue.Enqueue(workItem);
}

commandWakeEvent.Set();
return workItem.Task;

StaWorkItem<T> uses an Interlocked.CompareExchange on started so that exactly one of three outcomes happens: the STA thread executes the delegate, an external CancellationToken cancellation fires first, or CancelBeforeExecution runs during shutdown. Execute runs on the STA thread, sets Completion from command(), and propagates exceptions through TrySetException so the awaiting caller observes them.

ProcessQueuedCommands is the only consumer of the queue and runs on the STA thread, so each work item is guaranteed to execute on the apartment that owns the COM object.

Command Dispatch

StaCommandDispatcher sits between the worker's IPC layer and StaRuntime. It converts an StaCommand into an MxCommandReply and enforces a bounded queue: when commandQueue.Count reaches maxPendingCommands (default DefaultMaxPendingCommands = 128) the dispatcher returns a synthetic WorkerUnavailable reply rather than queueing further work. This back-pressure keeps the STA from accumulating an unbounded backlog while it is busy with a long-running call.

A single drain task pulls from commandQueue and submits each command to the STA via staRuntime.InvokeAsync:

SetCurrentCommand(command.CorrelationId);
try
{
    MxCommandReply reply = await staRuntime
        .InvokeAsync(() => commandExecutor.Execute(command))
        .ConfigureAwait(false);

    queuedCommand.Complete(NormalizeReply(command, reply));
}
catch (Exception exception)
{
    queuedCommand.Complete(CreateExceptionReply(command, exception));
}
finally
{
    SetCurrentCommand(string.Empty);
}

SetCurrentCommand records the in-flight CorrelationId so PopulateHeartbeat can publish both PendingCommandCount and CurrentCommandCorrelationId on the worker heartbeat. Exceptions are converted through HResultConverter so the IPC reply still carries a structured ProtocolStatus, an HRESULT, and a diagnostic message instead of an unhandled fault. NormalizeReply backfills SessionId, CorrelationId, Kind, and a default ProtocolStatusCode.Ok so executors can return minimal replies without restating the envelope.

CancelQueuedCommand walks the queue and completes a single matching entry with ProtocolStatusCode.Canceled. It cannot abort a command already running on the STA: per gateway.md, "Canceling a gRPC call should stop waiting in the gateway, but it cannot safely abort an in-flight COM call on the STA. Hard cancellation means killing the worker process."

Why the STA Loop Cannot Block on I/O

gateway.md states explicitly: "Do not block the STA on pipe writes, gRPC calls, or slow consumers. Event handlers should convert event args, enqueue outbound events, and return to pumping messages." The STA thread is the only thread that can service COM event callbacks, so any work that blocks it stalls every event the MXAccess object would otherwise deliver. The runtime keeps to that rule by giving the STA only two responsibilities inside ThreadMain: executing already-queued work items and pumping messages. Outbound event delivery and pipe writes happen on threads that observe the queues populated from the STA, never on the STA itself.

Shutdown Sequence

StaRuntime.Shutdown(TimeSpan timeout) performs an ordered shutdown:

  1. Sets shutdownRequested under gate so InvokeAsync rejects new work with InvalidOperationException.
  2. Signals commandWakeEvent to break the STA out of WaitForWorkOrMessages.
  3. Waits up to timeout on stoppedEvent, which the STA sets after it leaves ThreadMain.
  4. Once the thread has stopped, drains the queue through CancelQueuedCommands, which calls CancelBeforeExecution on every remaining work item so awaiting callers observe OperationCanceledException instead of hanging.

ThreadMain's finally block guarantees that comApartmentInitializer.Uninitialize runs (when COM was successfully initialized) before stoppedEvent.Set, so the apartment is always torn down on the same thread that initialized it. Dispose calls Shutdown with a five-second budget and only disposes the wait handles when shutdown actually completed, which prevents a still-running STA thread from touching disposed handles.

StaCommandDispatcher.RequestShutdown mirrors the same intent at the dispatcher layer: it sets shutdownRequested, drains its own queue, and completes every queued command with ProtocolStatusCode.WorkerUnavailable so callers receive a structured rejection during teardown.