# Worker STA Runtime
The worker STA runtime owns the dedicated single-threaded apartment thread that hosts the MXAccess COM object, runs a Windows message pump for COM event delivery, and serializes all gateway commands onto that thread.
## Why an STA Is Required
The installed MXAccess interop assembly declares an `Apartment` threading model (see `AGENTS.md` under "Worker Rules"). COM objects with that model must be created and called on a thread initialized as a single-threaded apartment, and any callbacks the object raises (event sink calls) are delivered through the thread's Windows message queue. A plain blocking queue is not sufficient: the STA loop must pump Windows messages so that the COM marshaler can deliver event invocations on the same thread that holds the object. Because of that constraint, every MXAccess operation in the worker is funneled through the types in `src/MxGateway.Worker/Sta/`.
## Types
| Type | Role |
|------|------|
| `StaRuntime` | Owns the STA `Thread`, the command queue, and the lifecycle gates. |
| `IStaComApartmentInitializer` / `StaComApartmentInitializer` | Calls `CoInitializeEx` / `CoUninitialize` so the thread enters and leaves the single-threaded apartment. |
| `StaMessagePump` | Wraps `MsgWaitForMultipleObjectsEx`, `PeekMessage`, `TranslateMessage`, and `DispatchMessage` so the STA loop can wait on work and drain the Windows message queue. |
| `IStaWorkItem` / `StaWorkItem<T>` | Internal queue entries that capture a delegate, a `CancellationToken`, and a `TaskCompletionSource<T>` for the caller. |
| `StaCommand` | Carries an `MxCommand` together with `SessionId`, `CorrelationId`, `EnqueueTimestamp`, and a `CancellationToken`. |
| `IStaCommandExecutor` | The boundary between the dispatcher and the MXAccess interop layer; returns `MxCommandReply`. |
| `StaCommandDispatcher` | Bounded asynchronous queue in front of `StaRuntime` that converts `StaCommand` into `MxCommandReply` and applies status normalization. |
## STA Thread Initialization
`StaRuntime`'s constructor configures a background `Thread` named `MxGateway.Worker.STA` and forces it into `ApartmentState.STA` before the thread starts. `Start()` releases the thread and then blocks on `startedEvent` so callers observe a fully-initialized apartment (or a captured `startupException`) before the first `InvokeAsync` call:
```csharp
staThread = new Thread(ThreadMain)
{
    IsBackground = true,
    Name = "MxGateway.Worker.STA"
};
staThread.SetApartmentState(ApartmentState.STA);
```
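The start handshake described above (release the thread, block on `startedEvent`, surface any captured `startupException`) can be sketched in isolation. This is a minimal illustration, not the `StaRuntime` source: the class name `StartHandshakeSketch`, the injected `initialize` delegate, and the use of `ExceptionDispatchInfo` to rethrow are all assumptions. The apartment state call is left as a comment so the sketch also runs on non-Windows hosts.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical sketch of the start handshake; not the actual StaRuntime source.
public sealed class StartHandshakeSketch
{
    private readonly ManualResetEventSlim startedEvent = new(false);
    private readonly Action initialize;
    private readonly Thread thread;
    private Exception? startupException;

    public StartHandshakeSketch(Action initialize)
    {
        this.initialize = initialize;
        thread = new Thread(ThreadMain) { IsBackground = true };
        // On Windows the apartment state would be forced here, before Start():
        // thread.SetApartmentState(ApartmentState.STA);
    }

    // Blocks until the thread reports success or a captured failure.
    public void Start()
    {
        thread.Start();
        startedEvent.Wait();
        if (startupException is not null)
        {
            // Rethrow on the caller's thread while preserving the stack trace.
            ExceptionDispatchInfo.Capture(startupException).Throw();
        }
    }

    private void ThreadMain()
    {
        try
        {
            initialize(); // stands in for comApartmentInitializer.Initialize()
        }
        catch (Exception exception)
        {
            startupException = exception;
        }
        finally
        {
            startedEvent.Set(); // always release the caller, even on failure
        }
    }
}
```

Signaling the event in `finally` is the important part of the sketch: a caller blocked in `Start()` is released whether initialization succeeded or failed.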
`StaComApartmentInitializer.Initialize` calls `CoInitializeEx` with `COINIT_APARTMENTTHREADED` (`0x2`) and treats both `S_OK` and `S_FALSE` as success, because `S_FALSE` indicates that COM was already initialized on this thread. For any other HRESULT it throws a `COMException`; `ThreadMain` records the failure and signals `startedEvent`, and `Start()` then surfaces the exception to the caller.
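The success check can be sketched as a pure function over the returned HRESULT. The names `HResultCheckSketch` and `EnsureApartmentInitialized` are hypothetical, and the real initializer may be structured differently; only the `S_OK` (`0`) and `S_FALSE` (`1`) values are standard Win32 constants.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; the real StaComApartmentInitializer may differ.
public static class HResultCheckSketch
{
    private const int S_OK = 0;     // COM was initialized by this call
    private const int S_FALSE = 1;  // COM was already initialized on this thread

    // Returns normally for S_OK or S_FALSE; throws COMException otherwise.
    public static void EnsureApartmentInitialized(int hresult)
    {
        if (hresult == S_OK || hresult == S_FALSE)
        {
            return;
        }

        throw new COMException(
            $"CoInitializeEx failed with HRESULT 0x{hresult:X8}.", hresult);
    }
}
```

Passing the HRESULT to the `COMException` constructor keeps the original code available to whoever catches the exception.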
## The STA Loop
`ThreadMain` runs the canonical "wait, pump, dispatch" loop. It enters COM, drains queued work, blocks until either a command arrives or a Windows message is posted, then pumps the message queue and records activity for the heartbeat:
```csharp
StaThreadId = Thread.CurrentThread.ManagedThreadId;
comApartmentInitializer.Initialize();
comInitialized = true;
MarkActivity();
startedEvent.Set();
while (!IsShutdownRequested())
{
    ProcessQueuedCommands();
    messagePump.WaitForWorkOrMessages(commandWakeEvent, idlePumpInterval);
    messagePump.PumpPendingMessages();
    MarkActivity();
}
```
`commandWakeEvent` is an `AutoResetEvent` set by `InvokeAsync` whenever a new work item is enqueued. The `idlePumpInterval` defaults to 50 ms so the pump still services Windows messages even when no commands are queued; this matters because COM event sink calls arrive as posted messages and would otherwise sit in the queue until the next command. `LastActivityUtc` is updated through `Volatile.Write` on every iteration so the worker's heartbeat path can read it without holding `gate`.
### The message pump
`StaMessagePump.WaitForWorkOrMessages` calls `MsgWaitForMultipleObjectsEx` with `QS_ALLINPUT` and `MWMO_INPUTAVAILABLE`, so the wait wakes for either a signal on `commandWakeEvent` or any new message in the thread's Windows queue. `PumpPendingMessages` then drains the queue with `PM_REMOVE`:
```csharp
public int PumpPendingMessages()
{
    int pumpedMessages = 0;
    while (PeekMessage(out NativeMessage message, IntPtr.Zero, 0, 0, PmRemove))
    {
        TranslateMessage(ref message);
        DispatchMessage(ref message);
        pumpedMessages++;
    }
    return pumpedMessages;
}
```
`DispatchMessage` is what causes the COM marshaler to deliver event sink calls onto the STA thread; without this loop, MXAccess events never surface to managed code.
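For orientation, the wait half of the pump might look roughly like the following. This is a Windows-only sketch under stated assumptions, not the `StaMessagePump` source: the class name `StaWaitSketch`, the method shape, and the handle marshaling are hypothetical. The `MsgWaitForMultipleObjectsEx` signature and the constant values follow the Win32 headers.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical sketch (Windows only); not the actual StaMessagePump source.
public static class StaWaitSketch
{
    public const uint QsAllInput = 0x04FF;          // QS_ALLINPUT
    public const uint MwmoInputAvailable = 0x0004;  // MWMO_INPUTAVAILABLE
    public const uint WaitObject0 = 0x00000000;     // WAIT_OBJECT_0
    public const uint WaitTimeout = 0x00000102;     // WAIT_TIMEOUT

    [DllImport("user32.dll", SetLastError = true)]
    private static extern uint MsgWaitForMultipleObjectsEx(
        uint nCount, IntPtr[] pHandles, uint dwMilliseconds,
        uint dwWakeMask, uint dwFlags);

    // Returns WaitObject0 when wakeEvent is signaled, WaitObject0 + 1 when
    // input is available in the message queue, or WaitTimeout when the
    // interval elapses first.
    public static uint WaitForWorkOrMessages(WaitHandle wakeEvent, TimeSpan interval)
    {
        // A sketch-level shortcut: production code would keep the SafeHandle
        // referenced for the duration of the native call.
        IntPtr[] handles = { wakeEvent.SafeWaitHandle.DangerousGetHandle() };
        return MsgWaitForMultipleObjectsEx(
            (uint)handles.Length,
            handles,
            (uint)interval.TotalMilliseconds,
            QsAllInput,
            MwmoInputAvailable);
    }
}
```

With a single handle, a return value of `WaitObject0 + 1` means "messages are pending", which is the cue to call `PumpPendingMessages`.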
## Work Items and the Command Queue
`StaRuntime` exposes two `InvokeAsync` overloads. Both wrap the delegate in an internal `StaWorkItem<T>`, enqueue it on a `ConcurrentQueue<IStaWorkItem>`, and signal `commandWakeEvent`:
```csharp
StaWorkItem<T> workItem = new(command, cancellationToken);
lock (gate)
{
    if (shutdownRequested)
    {
        return Task.FromException<T>(
            new InvalidOperationException("The worker STA runtime is shutting down."));
    }
    commandQueue.Enqueue(workItem);
}
commandWakeEvent.Set();
return workItem.Task;
```
`StaWorkItem<T>` uses an `Interlocked.CompareExchange` on `started` so that exactly one of three outcomes happens: the STA thread executes the delegate, an external `CancellationToken` cancellation fires first, or `CancelBeforeExecution` runs during shutdown. `Execute` runs on the STA thread, sets `Completion` from `command()`, and propagates exceptions through `TrySetException` so the awaiting caller observes them.
`ProcessQueuedCommands` is the only consumer of the queue and runs on the STA thread, so each work item is guaranteed to execute on the apartment that owns the COM object.
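The "exactly one outcome" guarantee can be illustrated with a reduced work item. This is a sketch under assumptions, not the real `StaWorkItem<T>`: the class name `WorkItemSketch<T>` and its members are hypothetical, and cancellation-token registration is omitted. It shows only how a single `Interlocked.CompareExchange` on `started` decides between execution and cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for StaWorkItem<T>; names and layout are assumptions.
public sealed class WorkItemSketch<T>
{
    private readonly Func<T> command;
    private readonly TaskCompletionSource<T> completion =
        new(TaskCreationOptions.RunContinuationsAsynchronously);
    private int started; // 0 = unclaimed, 1 = claimed by exactly one path

    public WorkItemSketch(Func<T> command) => this.command = command;

    public Task<T> Completion => completion.Task;

    // Called on the STA thread. Runs the delegate only if nobody else
    // has already claimed the item.
    public void Execute()
    {
        if (Interlocked.CompareExchange(ref started, 1, 0) != 0)
        {
            return;
        }

        try
        {
            completion.TrySetResult(command());
        }
        catch (Exception exception)
        {
            completion.TrySetException(exception);
        }
    }

    // Called on cancellation or shutdown. Cancels only if execution
    // has not begun.
    public void CancelBeforeExecution()
    {
        if (Interlocked.CompareExchange(ref started, 1, 0) == 0)
        {
            completion.TrySetCanceled();
        }
    }
}
```

Whichever path flips `started` from 0 to 1 owns the item; the other path sees 1 and does nothing, so the task completes exactly once.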
## Command Dispatch
`StaCommandDispatcher` sits between the worker's IPC layer and `StaRuntime`. It converts an `StaCommand` into an `MxCommandReply` and enforces a bounded queue: when `commandQueue.Count` reaches `maxPendingCommands` (default `DefaultMaxPendingCommands = 128`) the dispatcher returns a synthetic `WorkerUnavailable` reply rather than queueing further work. This back-pressure keeps the STA from accumulating an unbounded backlog while it is busy with a long-running call.
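The admission check behind that back-pressure can be sketched as follows. The class `BoundedAdmissionSketch` and its `TryEnqueue` method are hypothetical, and the queue holds plain correlation ids rather than real `StaCommand` instances; only the limit of 128 comes from the text above.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the admission check; not the actual StaCommandDispatcher.
public sealed class BoundedAdmissionSketch
{
    public const int DefaultMaxPendingCommands = 128;

    private readonly object gate = new();
    private readonly Queue<string> commandQueue = new();
    private readonly int maxPendingCommands;

    public BoundedAdmissionSketch(int maxPendingCommands = DefaultMaxPendingCommands)
    {
        this.maxPendingCommands = maxPendingCommands;
    }

    public int PendingCommandCount
    {
        get { lock (gate) { return commandQueue.Count; } }
    }

    // Returns false when the queue is already at capacity; the caller
    // would then build a synthetic WorkerUnavailable reply.
    public bool TryEnqueue(string correlationId)
    {
        lock (gate)
        {
            if (commandQueue.Count >= maxPendingCommands)
            {
                return false;
            }

            commandQueue.Enqueue(correlationId);
            return true;
        }
    }
}
```

Checking the count and enqueueing under the same lock keeps concurrent callers from pushing the queue past its limit.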
A single drain task pulls from `commandQueue` and submits each command to the STA via `staRuntime.InvokeAsync`:
```csharp
SetCurrentCommand(command.CorrelationId);
try
{
    MxCommandReply reply = await staRuntime
        .InvokeAsync(() => commandExecutor.Execute(command))
        .ConfigureAwait(false);
    queuedCommand.Complete(NormalizeReply(command, reply));
}
catch (Exception exception)
{
    queuedCommand.Complete(CreateExceptionReply(command, exception));
}
finally
{
    SetCurrentCommand(string.Empty);
}
```
`SetCurrentCommand` records the in-flight `CorrelationId` so `PopulateHeartbeat` can publish both `PendingCommandCount` and `CurrentCommandCorrelationId` on the worker heartbeat. Exceptions are converted through `HResultConverter` so the IPC reply still carries a structured `ProtocolStatus`, an HRESULT, and a diagnostic message instead of an unhandled fault. `NormalizeReply` backfills `SessionId`, `CorrelationId`, `Kind`, and a default `ProtocolStatusCode.Ok` so executors can return minimal replies without restating the envelope.
`CancelQueuedCommand` walks the queue and completes a single matching entry with `ProtocolStatusCode.Canceled`. It cannot abort a command already running on the STA: per `AGENTS.md`, "Canceling a gRPC call should stop waiting in the gateway, but it cannot safely abort an in-flight COM call on the STA. Hard cancellation means killing the worker process."
## Why the STA Loop Cannot Block on I/O
`AGENTS.md` states explicitly: "Do not block the STA on pipe writes, gRPC calls, or slow consumers. Event handlers should convert event args, enqueue outbound events, and return to pumping messages." The STA thread is the only thread that can service COM event callbacks, so any work that blocks it stalls every event the MXAccess object would otherwise deliver. The runtime keeps to that rule by giving the STA only two responsibilities inside `ThreadMain`: executing already-queued work items and pumping messages. Outbound event delivery and pipe writes happen on threads that observe the queues populated from the STA, never on the STA itself.
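The "convert, enqueue, return" shape of an event handler can be sketched with a channel between the STA and a consumer thread. Everything named here is hypothetical: `OutboundEvent`, `EventForwarderSketch`, and the `OnDataChange` signature are illustrative and do not reflect the real MXAccess event signatures or the gateway's actual queue type. The channel is unbounded for brevity; a real implementation would need its own policy for slow consumers.

```csharp
using System.Threading.Channels;

// Hypothetical types: OutboundEvent and the handler signature are illustrative only.
public sealed record OutboundEvent(int ItemHandle, string Value);

public sealed class EventForwarderSketch
{
    private readonly Channel<OutboundEvent> outbound =
        Channel.CreateUnbounded<OutboundEvent>(
            new UnboundedChannelOptions { SingleReader = true });

    // Read side, consumed by a non-STA thread that performs the slow pipe write.
    public ChannelReader<OutboundEvent> Reader => outbound.Reader;

    // Runs on the STA thread: convert, enqueue, return. No I/O, no waiting.
    public void OnDataChange(int itemHandle, object value)
    {
        var converted = new OutboundEvent(itemHandle, value?.ToString() ?? string.Empty);
        outbound.Writer.TryWrite(converted); // does not block on an unbounded channel
    }
}
```

The handler does no I/O itself, so the STA returns to pumping messages as soon as the event is queued.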
## Shutdown Sequence
`StaRuntime.Shutdown(TimeSpan timeout)` performs an ordered shutdown:
1. Sets `shutdownRequested` under `gate` so `InvokeAsync` rejects new work with `InvalidOperationException`.
2. Signals `commandWakeEvent` to break the STA out of `WaitForWorkOrMessages`.
3. Waits up to `timeout` on `stoppedEvent`, which the STA sets after it leaves `ThreadMain`.
4. Once the thread has stopped, drains the queue through `CancelQueuedCommands`, which calls `CancelBeforeExecution` on every remaining work item so awaiting callers observe `OperationCanceledException` instead of hanging.
`ThreadMain`'s `finally` block guarantees that `comApartmentInitializer.Uninitialize` runs (when COM was successfully initialized) before `stoppedEvent.Set`, so the apartment is always torn down on the same thread that initialized it. `Dispose` calls `Shutdown` with a five-second budget and only disposes the wait handles when shutdown actually completed, which prevents a still-running STA thread from touching disposed handles.
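The four shutdown steps can be sketched end to end with a plain worker thread. This is an illustration under assumptions, not the `StaRuntime` source: `ShutdownSketch`, its `bool` return value, the fixed 50 ms wait, and the `int` result are hypothetical, and COM initialization and message pumping are omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the ordered shutdown; not the actual StaRuntime source.
public sealed class ShutdownSketch
{
    private readonly object gate = new();
    private readonly ConcurrentQueue<TaskCompletionSource<int>> queue = new();
    private readonly AutoResetEvent wakeEvent = new(false);
    private readonly ManualResetEventSlim stoppedEvent = new(false);
    private bool shutdownRequested;

    public ShutdownSketch()
    {
        new Thread(ThreadMain) { IsBackground = true }.Start();
    }

    public Task<int> InvokeAsync()
    {
        var item = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        lock (gate)
        {
            if (shutdownRequested)
            {
                return Task.FromException<int>(
                    new InvalidOperationException("Shutting down."));
            }
            queue.Enqueue(item);
        }
        wakeEvent.Set();
        return item.Task;
    }

    // Returns true when the worker thread stopped within the timeout.
    public bool Shutdown(TimeSpan timeout)
    {
        lock (gate) { shutdownRequested = true; }   // 1. reject new work
        wakeEvent.Set();                            // 2. break the wait
        if (!stoppedEvent.Wait(timeout))            // 3. wait for the thread
        {
            return false;
        }
        while (queue.TryDequeue(out var item))      // 4. cancel leftovers
        {
            item.TrySetCanceled();
        }
        return true;
    }

    private void ThreadMain()
    {
        try
        {
            while (!IsShutdownRequested())
            {
                while (!IsShutdownRequested() && queue.TryDequeue(out var item))
                {
                    item.TrySetResult(1); // stands in for executing the work item
                }
                wakeEvent.WaitOne(TimeSpan.FromMilliseconds(50));
            }
        }
        finally
        {
            // The real runtime uninitializes COM here, before signaling.
            stoppedEvent.Set();
        }
    }

    private bool IsShutdownRequested()
    {
        lock (gate) { return shutdownRequested; }
    }
}
```

Cancelling leftovers only after the thread has stopped means the queue has a single consumer at any time, so no item can be both executed and cancelled by two threads racing over it.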
`StaCommandDispatcher.RequestShutdown` mirrors the same intent at the dispatcher layer: it sets `shutdownRequested`, drains its own queue, and completes every queued command with `ProtocolStatusCode.WorkerUnavailable` so callers receive a structured rejection during teardown.
## Related Documentation
- [MXAccess worker instance design](./mxaccess-worker-instance-design.md)
- [Worker conversion](./WorkerConversion.md)
- [Worker bootstrap](./WorkerBootstrap.md)