Implement graceful worker shutdown

This commit is contained in:
Joseph Doherty
2026-04-26 19:36:22 -04:00
parent 95e71cd819
commit d890eff862
15 changed files with 694 additions and 11 deletions
+30
View File
@@ -321,6 +321,13 @@ If COM creation fails, the worker should send a structured fault with:
when the exception exposes one, and does not send `WorkerReady` after a failed
COM creation attempt.
After `WorkerReady`, `WorkerPipeSession` continues reading gateway frames for
the lifetime of the process. `WorkerCommand` frames are dispatched to
`MxAccessStaSession`, replies are written as `WorkerCommandReply`, and queued
worker events are drained after command replies. `WorkerShutdown` starts the
graceful shutdown path and returns `WorkerShutdownAck` only after the STA
cleanup path completes.
## Event Sink
The worker must subscribe to every public MXAccess event family:
@@ -663,6 +670,29 @@ Graceful shutdown sequence:
If shutdown wedges, the gateway kills the process. The worker should be written
so process kill does not corrupt other sessions.
`MxAccessStaSession.ShutdownGracefullyAsync` implements the current cleanup
path. It first calls `StaCommandDispatcher.RequestShutdown()` so new commands
are rejected and queued commands that have not started receive
`ProtocolStatusCode.WorkerUnavailable`. The command already executing on the
STA is allowed to finish until the shutdown grace period expires.
After command dispatch is closed, cleanup runs on the STA in MXAccess handle
order:
1. one `UnAdvise` call per advised server/item pair,
2. `RemoveItem` for active item handles,
3. `Unregister` for active server handles,
4. event sink detach,
5. COM release.
Each cleanup call is best effort. A failed cleanup operation is recorded as an
`MxAccessShutdownFailure`, logged by `WorkerPipeSession`, and does not prevent
later cleanup calls from running. A shutdown with cleanup failures still returns
`WorkerShutdownAck` with `ProtocolStatusCode.Ok` because the worker reached the
controlled release path. If the grace period expires before cleanup can run or
finish, the worker reports `WorkerFaultCategory.ShutdownTimeout` when possible
and relies on the gateway to kill the process.
## Fault Handling
Worker fault categories: