mxaccessgw/code-reviews/Worker/findings.md
Joseph Doherty 1764eff1cf Resolve Worker-009..015 code-review findings
Worker-009: WorkerFrameWriter serialized twice and WorkerFrameReader
allocated a payload byte[] per frame. The writer now serializes once into a
single prefix+payload buffer; the reader rents the payload buffer from
ArrayPool and honors the logical frame length.

Worker-010: VariantConverter projected a uint+Time value as a full FILETIME,
producing a near-1601 timestamp. The FILETIME projection is now gated on
`value is long`; uint falls through to the integer projection.

Worker-011: replaced the opaque retryAttempts formula in WorkerPipeClient
with MaxRetryAttempts = int.MaxValue, leaving the connect deadline as the
sole bound.

Worker-012: rewrote stale "future PR / polls on a Timer" comments in
AlarmDispatcher, AlarmCommandHandler, MxAccessAlarmEventSink and
MxAccessEventMapper to match the shipped, post-Worker-001 behavior.

Worker-013 (re-triaged): already resolved — StaMessagePumpTests and
MxAccessStaSessionTests cover the pump and poll loop directly.

Worker-014: moved IAlarmCommandHandler into its own file so
AlarmCommandHandler.cs declares one public type.

Worker-015: clarified the MxAccessBaseEventSink.EnqueueEvent overflow-catch
comment explaining the deliberate double RecordFault no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:42:17 -04:00


Code Review — Worker

| Field | Value |
| --- | --- |
| Module | src/MxGateway.Worker |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | 6c64030 |
| Status | Reviewed |
| Open findings | 0 |

Checklist coverage

| # | Category | Result |
| --- | --- | --- |
| 1 | Correctness & logic bugs | Issues found: heartbeat loop sleeps before first beat (Worker-002), ProcessCommandAsync state race drops replies (Worker-003), watchdog/heartbeat state inconsistency (Worker-004), double-dispose path (Worker-006), plus Worker-010/011/015. |
| 2 | mxaccessgw conventions | Issue found: Worker-007 (reflection-based COM invocation bypasses the typed interface contract). |
| 3 | Concurrency & thread safety | Issues found: Worker-001 (WnWrapAlarmConsumer timer fires COM off the STA), Worker-008 (consumer factory STA-affinity not enforced). |
| 4 | Error handling & resilience | Issue found: Worker-005 (OnPoll silently swallows all poll failures). |
| 5 | Security | No secret logging (redaction applied); inbound frame validation reasonable. No issues found. |
| 6 | Performance & resource management | Issue found: Worker-009 (per-frame byte[] allocations on the hot event path). COM release is correct. |
| 7 | Design-document adherence | Code matches WorkerSta.md/WorkerFrameProtocol.md; stale alarm-path docs (Worker-012). |
| 8 | Code organization & conventions | Issue found: Worker-014 (AlarmCommandHandler.cs declares two public types in one file). |
| 9 | Testing coverage | Issue found: Worker-013 (StaMessagePump has no direct tests; poll-loop lifecycle untested). |
| 10 | Documentation & comments | Issue found: Worker-012 (stale "future PR / A.3" comments now describe shipped code). |

Findings

Worker-001

| Field | Value |
| --- | --- |
| Severity | High |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207 |
| Status | Resolved |

Description: When constructed with pollIntervalMilliseconds > 0, Subscribe starts a System.Threading.Timer whose OnPoll callback runs PollOnce() (which calls wwAlarmConsumerClass.GetXmlCurrentAlarms2) on a thread-pool thread. The wnwrap CLSID is registered ThreadingModel=Apartment; calling its methods off the owning STA violates the hard rule that all COM calls happen on the dedicated STA thread, and can deadlock on cross-apartment marshaling when the STA is not pumping. The production path (default constructor, interval 0) is safe, but the public 3-arg constructor leaves this footgun callable, and both the unit tests and the live-smoke tests use it.

Recommendation: Remove the internal Timer entirely (production already drives PollOnce from the STA), or document and gate it so it can only be used from an STA thread. At minimum, make the timer-driven mode unreachable from any production wiring.

Resolution: 2026-05-18 — Removed the off-STA timer infrastructure from WnWrapAlarmConsumer: the Timer? pollTimer and pollIntervalMs fields, the DefaultPollIntervalMilliseconds constant, the OnPoll callback, the timer-arming branch in Subscribe, and the timer disposal block in Dispose. The pollIntervalMilliseconds parameter is gone from both public constructors (the test-seam ctor is now 2-arg: wwAlarmConsumerClass + maxAlarmsPerFetch), so the off-STA footgun is structurally unreachable. PollOnce() remains the public STA-driven entry point. The stale "poll … on a timer below" comment was corrected. Verified by the regression tests WnWrapAlarmConsumer_has_no_internal_timer_field and WnWrapAlarmConsumer_exposes_no_poll_interval_constructor_parameter; the AlarmsLiveSmokeTests call site was updated to the 2-arg constructor.
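
The following minimal Python sketch makes the thread-identity problem concrete. It is a language-neutral illustration, not the worker's C# code. The hypothetical `OwnerThread` class plays the dedicated STA thread and runs submitted callables one at a time. `threading.Timer` stands in for System.Threading.Timer, and `poll_once` is a placeholder that only reports which thread called it.

```python
import queue
import threading


class OwnerThread:
    """Hypothetical stand-in for the dedicated STA thread: every callable
    passed to invoke() runs on this single thread, in submission order."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    @property
    def ident(self):
        return self._thread.ident

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:  # sentinel: stop the loop
                return
            fn, done, result = item
            try:
                result["value"] = fn()
            except BaseException as exc:  # hand failures back to the caller
                result["error"] = exc
            finally:
                done.set()

    def invoke(self, fn):
        """Marshal fn onto the owner thread and block until it completes."""
        done, result = threading.Event(), {}
        self._work.put((fn, done, result))
        done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def stop(self):
        self._work.put(None)
        self._thread.join()


def poll_once():
    # Placeholder for the COM call; it only reports the calling thread.
    return threading.get_ident()


owner = OwnerThread()

# Removed shape: a timer callback runs on a thread the timer owns.
timer_calls = []
timer = threading.Timer(0.01, lambda: timer_calls.append(poll_once()))
timer.start()
timer.join()

# Shipped shape: the owner thread drives poll_once itself.
owner_call = owner.invoke(poll_once)
owner.stop()

print("timer ran on owner thread:", timer_calls[0] == owner.ident)  # False
print("invoke ran on owner thread:", owner_call == owner.ident)     # True
```

In the worker, `invoke` corresponds to staRuntime.InvokeAsync, which RunAlarmPollLoopAsync already uses to marshal PollOnce onto the STA.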

Worker-002

| Field | Value |
| --- | --- |
| Severity | High |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:545-549 |
| Status | Resolved |

Description: RunHeartbeatLoopAsync calls await Task.Delay(_sessionOptions.HeartbeatInterval, ...) before sending the first heartbeat. The gateway therefore receives no heartbeat for the first full interval (default 5s) after the worker reaches Ready. If the gateway's liveness watchdog expects a heartbeat sooner, a healthy worker can be misclassified as hung at startup.

Recommendation: Send an initial heartbeat immediately on entering the loop, or move the Task.Delay to the end of the loop body.

Resolution: 2026-05-18 — Restructured RunHeartbeatLoopAsync so the Task.Delay(HeartbeatInterval) is applied between beats only, not before the first. A firstBeat guard skips the delay on the initial iteration, so the gateway sees a heartbeat as soon as the worker is Ready; cancellation behavior is preserved (the loop still observes the token and the delay still throws on cancellation). Verified by the regression test RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop. Three pre-existing tests (WorkerPipeClientTests.RunAsync_ConnectsToPipeAndCompletesHandshake, WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears, WorkerPipeSessionTests.RunAsync_WhenCommandThrowsAfterShutdown_DropsLateFaultAndWritesShutdownAck) assumed strict frame ordering and were updated to skip the now-interleaved first heartbeat while still asserting the same shutdown-ack behavior.
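
The restructured loop can be sketched in Python asyncio. This is an illustration under assumptions, not the worker's code: `run_heartbeat_loop`, `send_heartbeat`, and the `stop` event are hypothetical names. In the worker, the delay is Task.Delay(HeartbeatInterval) and it observes a CancellationToken.

```python
import asyncio


async def run_heartbeat_loop(send_heartbeat, interval, stop):
    """Beat immediately on entry, then once per interval until stop is set."""
    first_beat = True
    while not stop.is_set():
        if not first_beat:
            # Delay only *between* beats. Canceling the task still
            # interrupts this sleep, so shutdown behavior is unchanged.
            await asyncio.sleep(interval)
            if stop.is_set():
                break
        first_beat = False
        await send_heartbeat()


async def main(interval=0.5):
    loop = asyncio.get_running_loop()
    start = loop.time()
    stop = asyncio.Event()
    beats = []  # seconds since the loop was entered, one entry per beat

    async def send_heartbeat():
        beats.append(loop.time() - start)
        if len(beats) == 2:
            stop.set()

    await run_heartbeat_loop(send_heartbeat, interval, stop)
    return beats


beats = asyncio.run(main())
print(beats)  # first beat almost immediately, second about one interval later
```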

Worker-003

| Field | Value |
| --- | --- |
| Severity | High |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403, :416-419 |
| Status | Resolved |

Description: ProcessCommandAsync checks _state after DispatchAsync completes and silently returns without writing a WorkerCommandReply (or fault) when _state is not Ready/ExecutingCommand. _state is a plain field mutated from multiple tasks (heartbeat loop, event-drain loop, shutdown). A command that completes successfully while _state has transitioned will have its reply dropped with no diagnostic, and the gateway's correlation-id wait then hangs until its own timeout. The _state read is also not synchronized.

Recommendation: Always attempt to write the reply/fault for an in-flight command, or explicitly reject in-flight commands with a Canceled/WorkerUnavailable reply during state transitions. Make _state access thread-safe (volatile or locked).

Resolution: 2026-05-18 — Both silent-drop return sites in ProcessCommandAsync (the post-DispatchAsync success path and the exception path) now call a new LogCommandResultDropped helper before returning. The helper logs an Information event named WorkerCommandResultDropped via the session's IWorkerLogger, carrying the command's correlation_id plus command_method and worker_state, so a stuck gateway correlation-id wait is now traceable. The _state field was made volatile (WorkerState is an int-backed protobuf enum, so volatile is valid) so cross-thread reads observe the latest value without tearing; this is a low-risk, non-behavioral change and did not destabilize any test. Verified by the regression test RunAsync_WhenReplyIsDroppedAfterShutdown_LogsDiagnostic.
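
The following minimal Python sketch shows the reply-or-log decision, assuming a simplified state model. Only Ready, ExecutingCommand, and the WorkerCommandResultDropped event with its correlation_id, command_method, and worker_state fields come from the finding. `finish_command`, `SHUTTING_DOWN`, and the command names are hypothetical. Python has no `volatile`, so the thread-visibility half of the fix is not modeled here.

```python
from enum import Enum


class WorkerState(Enum):
    READY = "Ready"
    EXECUTING_COMMAND = "ExecutingCommand"
    SHUTTING_DOWN = "ShuttingDown"  # hypothetical non-accepting state


def finish_command(state, correlation_id, method, reply, write_reply, log):
    """Write the reply, or leave a traceable record of why it was dropped."""
    if state in (WorkerState.READY, WorkerState.EXECUTING_COMMAND):
        write_reply(reply)
        return True
    # Previously this path returned silently and the gateway waited on the
    # correlation id until its own timeout.
    log(
        "WorkerCommandResultDropped",
        correlation_id=correlation_id,
        command_method=method,
        worker_state=state.value,
    )
    return False


written, logged = [], []


def log(name, **fields):
    logged.append((name, fields))


finish_command(WorkerState.READY, "c-1", "Read", "ok-1", written.append, log)
finish_command(WorkerState.SHUTTING_DOWN, "c-2", "Write", "ok-2", written.append, log)
print(written, logged[0][0], logged[0][1]["correlation_id"])
# ['ok-1'] WorkerCommandResultDropped c-2
```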

Worker-004

| Field | Value |
| --- | --- |
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588 |
| Status | Resolved |

Description: After ReportWatchdogFaultIfNeededAsync sends an StaHung fault, the heartbeat loop continues sending normal heartbeats with State derived from _state, which the watchdog path never sets to Faulted. The heartbeat then keeps reporting a non-faulted state that contradicts the fault just sent.

Recommendation: Set _state = WorkerState.Faulted (thread-safely) when the watchdog fault fires so heartbeat state and fault stay consistent.

Resolution: 2026-05-18 — ReportWatchdogFaultIfNeededAsync now sets _state = WorkerState.Faulted immediately after _watchdogFaultSent = true and before the StaHung fault is written, so the next heartbeat reports Faulted instead of contradicting the fault. _state is already volatile (Worker-003), so the cross-thread write from the heartbeat loop is observed correctly by the heartbeat's own CreateHeartbeat read; no further locking is required. Verified by the regression test WorkerPipeSessionTests.RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState, which uses a stale-activity snapshot with an empty current-command correlation id so the heartbeat State is derived from _state rather than forced to ExecutingCommand.
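
The fix is an ordering rule: mark the session Faulted before the StaHung fault is written, so any heartbeat built afterwards agrees with it. The following simplified Python sketch uses hypothetical names (`Session`, `report_watchdog_fault_if_needed`, `create_heartbeat`, `sta_is_stale`). The rule that a non-empty current-command correlation id forces ExecutingCommand comes from the regression-test description above.

```python
class Session:
    def __init__(self, write_frame):
        self.state = "Ready"
        self.watchdog_fault_sent = False
        self.current_correlation_id = ""
        self._write_frame = write_frame

    def report_watchdog_fault_if_needed(self, sta_is_stale):
        if not sta_is_stale or self.watchdog_fault_sent:
            return
        self.watchdog_fault_sent = True
        self.state = "Faulted"  # set *before* the fault goes out
        self._write_frame(("fault", "StaHung"))

    def create_heartbeat(self):
        # An in-flight command forces ExecutingCommand; otherwise report state.
        state = "ExecutingCommand" if self.current_correlation_id else self.state
        return ("heartbeat", state)


frames = []
session = Session(frames.append)
session.report_watchdog_fault_if_needed(sta_is_stale=True)
session.report_watchdog_fault_if_needed(sta_is_stale=True)  # already reported
frames.append(session.create_heartbeat())
print(frames)  # [('fault', 'StaHung'), ('heartbeat', 'Faulted')]
```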

Worker-005

| Field | Value |
| --- | --- |
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258 (production alarm poll loop) |
| Status | Resolved |

Description: OnPoll catches every exception from PollOnce() and discards it (_ = ex;). The production poll path (MxAccessStaSession.RunAlarmPollLoopAsync → AlarmCommandHandler.PollOnce → AlarmDispatcher.PollOnce → consumer.PollOnce()) has no fault recording either. A permanently failing alarm provider (e.g. GetXmlCurrentAlarms2 returning E_FAIL, or malformed XML throwing in XmlDocument.LoadXml) is therefore completely silent: no fault on the event queue, no log.

Recommendation: Route poll failures to MxAccessEventQueue.RecordFault (or a logger) so a broken alarm subscription becomes observable. Update the now-stale comment.

Re-triage: The cited location WnWrapAlarmConsumer.cs:297-313 and the OnPoll callback no longer exist as of this branch — Worker-001 removed the off-STA Timer and its OnPoll callback entirely. The substantive concern still held, however: the production poll path in MxAccessStaSession.RunAlarmPollLoopAsync caught only OperationCanceledException, ObjectDisposedException, and InvalidOperationException. A genuine poll failure (COMException from GetXmlCurrentAlarms2, a malformed-XML XmlException) escaped uncaught, faulted the never-awaited Task.Run poll task, and was silently lost — exactly the silent-failure the finding describes. The finding was re-pointed at the live location and fixed there rather than at the removed OnPoll.

Resolution: 2026-05-18 — RunAlarmPollLoopAsync gained a trailing catch (Exception exception) arm after the three graceful-stop catches. A real alarm-poll failure is now converted to a WorkerFault (category MxaccessEventConversionFailed, carrying the exception type and, for a COMException, its HResult) by the new CreateAlarmPollFault helper and recorded on the session's MxAccessEventQueue via RecordFault. The worker's event-drain loop drains that fault and forwards it to the gateway, so a broken alarm subscription is now observable on the IPC fault path instead of vanishing. The poll loop still stops after the failure (the subscription is dead). No new proto enum value was added — MxaccessEventConversionFailed is the closest existing alarm-path category, avoiding a contracts regeneration across all clients. Verified by the regression test MxAccessStaSessionTests.RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue.
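
The catch ordering can be sketched in Python under a few assumptions. `GracefulStop` is a hypothetical stand-in for the three graceful-stop exceptions (OperationCanceledException, ObjectDisposedException, InvalidOperationException). `FakeComError.hresult` stands in for COMException.HResult, and a plain dict stands in for the WorkerFault message.

```python
class GracefulStop(Exception):
    """Stands in for the three graceful-stop exceptions in the C# loop."""


def run_alarm_poll_loop(poll_once, record_fault, keep_running):
    try:
        while keep_running():
            poll_once()
    except GracefulStop:
        return  # normal shutdown: nothing to report
    except Exception as exc:
        # Trailing catch-all: a real poll failure becomes an observable fault
        # instead of vanishing with an unobserved background task.
        record_fault({
            "category": "MxaccessEventConversionFailed",
            "exception_type": type(exc).__name__,
            "hresult": getattr(exc, "hresult", None),
        })
        # The loop stops after the failure: the subscription is dead.


class FakeComError(Exception):
    def __init__(self, hresult):
        super().__init__(f"HRESULT 0x{hresult:08X}")
        self.hresult = hresult


faults = []


def failing_poll():
    raise FakeComError(0x80004005)  # E_FAIL


run_alarm_poll_loop(failing_poll, faults.append, keep_running=lambda: True)
print(faults)
```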

Worker-006

| Field | Value |
| --- | --- |
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124, src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491 |
| Status | Resolved |

Description: RunAsync's finally calls _runtimeSession?.Dispose() unless _shutdownTimedOut is set. On the normal path ShutdownGracefullyAsync already disposed the STA runtime, so re-entering Dispose() is a harmless no-op only because ShutdownGracefullyAsync reached its end and set disposed = true. If ShutdownGracefullyAsync throws TimeoutException after partial teardown with _shutdownTimedOut set, the session is never disposed at all (the finally skips it), leaking the STA thread and COM object and leaving cleanup to rely solely on process exit.

Recommendation: Make the dispose decision explicit and confirm process exit always follows a timed-out shutdown; otherwise dispose defensively. At minimum document why disposal is deliberately skipped on timeout.

Resolution: 2026-05-18 — RunAsync's finally now always calls _runtimeSession?.Dispose(); the if (!_shutdownTimedOut) guard and the _shutdownTimedOut field (which had become write-only) were removed. MxAccessStaSession.Dispose is idempotent (if (disposed) return) and bounded — each STA join is capped with Wait(TimeSpan.FromSeconds(2)) — so re-entering it on the normal path (where ShutdownGracefullyAsync already disposed the runtime) is a harmless no-op, while on the timed-out path it is now the only thing that reclaims the STA thread and releases the MXAccess COM object. The previous behaviour leaked both on a shutdown timeout and relied solely on process exit. A code comment in the finally block documents the reasoning. Verified by the regression test WorkerPipeSessionTests.RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession, which forces a TimeoutException from ShutdownGracefullyAsync and asserts the runtime session is disposed before RunAsync rethrows.
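
Unconditional disposal in `finally` is safe only because Dispose is idempotent and bounded. The following Python sketch shows that contract using hypothetical names (`StaSession`, `run`, `shutdown_gracefully`, `times_out`). Here `join(timeout=2)` plays the role of the two-second `Wait(TimeSpan.FromSeconds(2))` cap, and incrementing `com_released` stands in for releasing the MXAccess COM object.

```python
import threading


class StaSession:
    def __init__(self):
        self.disposed = False
        self.com_released = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._stop.wait, daemon=True)
        self._thread.start()

    def dispose(self):
        if self.disposed:  # idempotent: a second call is a no-op
            return
        self.disposed = True
        self._stop.set()
        self._thread.join(timeout=2)  # bounded join, never hangs shutdown
        self.com_released += 1        # stands in for releasing the COM object

    def shutdown_gracefully(self, times_out):
        if times_out:
            raise TimeoutError("STA did not drain in time")
        self.dispose()


def run(session, times_out):
    try:
        session.shutdown_gracefully(times_out)
    finally:
        # Unconditional: a no-op after a clean shutdown, the only cleanup
        # after a timed-out one.
        session.dispose()


clean = StaSession()
run(clean, times_out=False)

timed_out = StaSession()
try:
    run(timed_out, times_out=True)
except TimeoutError:
    pass

print(clean.com_released, timed_out.com_released)  # 1 1
```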

Worker-007

| Field | Value |
| --- | --- |
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150 |
| Status | Resolved |

Description: Invoke uses late-bound Type.InvokeMember reflection as a fallback when the COM object does not cast to ILMXProxyServer*. In production the object is always LMXProxyServerClass, so the reflection path exists only for test doubles — it is dead/untested code on the production path and obscures the interface contract. params object[] arguments also boxes value-type handles on every call.

Recommendation: Drop the reflection fallback and require the COM object to implement the interface (tests can supply a typed fake), or clearly mark the fallback as test-only.

Re-triage: The finding's claim that the reflection path is "dead/untested code" is partly inaccurate — it was in fact the path exercised by the entire MxAccessCommandExecutorTests suite, whose FakeMxAccessComObject did not implement any typed interface. So the reflection fallback was test-only but not untested. The convention concern (bypassing the typed interface contract, boxing value-type handles) is valid, so the fix follows the recommendation's first option.

Resolution: 2026-05-18 — The late-bound Type.InvokeMember reflection fallback and its params object[]-boxing Invoke helper were removed from MxAccessComServer. Each adapter method now takes one of two typed paths: an is IMxAccessServer fast path (test fakes implement IMxAccessServer directly) and the production path that casts to the typed ILMXProxyServer / ILMXProxyServer3 / ILMXProxyServer4 COM interfaces via new AsProxyServer* helpers. A COM object implementing neither now fails fast with a clear InvalidOperationException naming the missing interface, instead of an opaque late-bound call. The test seam was migrated accordingly: MxAccessCommandExecutorTests.FakeMxAccessComObject now declares : IMxAccessServer (its method signatures already matched the interface exactly, so no behavioural change). Verified by the new MxAccessComServerTests (typed-server routing, untyped-object rejection, original-exception propagation — no more TargetInvocationException wrapping) plus the unchanged, still-passing MxAccessCommandExecutorTests suite which now exercises the typed IMxAccessServer path.
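
The dispatch can be sketched in Python, with abstract base classes standing in for the COM interfaces. Everything here is a hypothetical simplification. `IMxAccessServer` and `ILMXProxyServer` carry a single placeholder `register` method, and only one typed interface is modeled (the worker also casts to ILMXProxyServer3 and ILMXProxyServer4). `TypeError` stands in for InvalidOperationException.

```python
from abc import ABC, abstractmethod


class IMxAccessServer(ABC):
    """Placeholder for the adapter interface that test fakes implement."""

    @abstractmethod
    def register(self, client_name): ...


class ILMXProxyServer(ABC):
    """Placeholder for the typed COM interface of the production object."""

    @abstractmethod
    def register(self, client_name): ...


def as_proxy_server(com_object):
    if not isinstance(com_object, ILMXProxyServer):
        raise TypeError(
            f"{type(com_object).__name__} does not implement ILMXProxyServer")
    return com_object


def register(com_object, client_name):
    if isinstance(com_object, IMxAccessServer):  # fast path for test fakes
        return com_object.register(client_name)
    # Production path: cast to the typed interface or fail fast with a
    # message naming the missing interface.
    return as_proxy_server(com_object).register(client_name)


class FakeServer(IMxAccessServer):
    def register(self, client_name):
        return 42


print(register(FakeServer(), "gateway"))  # 42
```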

Worker-008

| Field | Value |
| --- | --- |
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249, :429-447 |
| Status | Resolved |

Description: RunAlarmPollLoopAsync correctly marshals handler.PollOnce() onto the STA via staRuntime.InvokeAsync, and the cancel/await/dispose ordering in ShutdownGracefullyAsync is sound. However, nothing enforces that the consumerFactory and all IMxAccessAlarmConsumer calls run on the STA thread; a future caller could break STA affinity silently.

Recommendation: Add an assertion or documented invariant that the consumer factory and all IMxAccessAlarmConsumer calls run on the STA thread, mirroring the existing MxAccessSession.CreationThreadId pattern.

Resolution: 2026-05-18 — MxAccessStaSession now records the STA thread id (alarmConsumerThreadId) at the point the alarm-command-handler factory is invoked — which already runs inside staRuntime.InvokeAsync during StartAsync, mirroring the MxAccessSession.CreationThreadId capture. RunAlarmPollLoopAsync's marshalled poll lambda now calls EnsureOnAlarmConsumerThread() before handler.PollOnce(), asserting the poll runs on the recorded STA thread. The check is delegated to a new internal static guard AssertOnAlarmConsumerThread(int? expected, int actual) that throws a descriptive InvalidOperationException on an affinity violation and is a no-op when the consumer thread is unrecorded (no alarm handler configured). Making the guard static and internal keeps it directly unit-testable. The STA-affinity invariant is documented in the guard's XML doc. Verified by the regression tests MxAccessStaSessionTests.AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws and AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow.
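
The guard's contract is small enough to show directly. The following Python sketch mirrors the described AssertOnAlarmConsumerThread(int? expected, int actual) behavior. `None` stands in for an unrecorded thread id, `threading.get_ident()` for the managed thread id, and `RuntimeError` for InvalidOperationException. The function name is a hypothetical transliteration.

```python
import threading


def assert_on_alarm_consumer_thread(expected, actual):
    """Raise if an alarm-consumer call runs off the recorded STA thread.

    expected is None when no alarm handler was configured, in which case
    there is nothing to enforce.
    """
    if expected is not None and expected != actual:
        raise RuntimeError(
            f"Alarm consumer used on thread {actual}; "
            f"it is bound to STA thread {expected}.")


# Record the owning thread once, as the factory call would.
owner_id = threading.get_ident()
assert_on_alarm_consumer_thread(owner_id, threading.get_ident())  # same thread: ok
assert_on_alarm_consumer_thread(None, 12345)                      # unrecorded: ok

errors = []


def off_thread_poll():
    try:
        assert_on_alarm_consumer_thread(owner_id, threading.get_ident())
    except RuntimeError as exc:
        errors.append(exc)


worker = threading.Thread(target=off_thread_poll)
worker.start()
worker.join()
print(len(errors))  # 1
```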

Worker-009

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Performance & resource management |
| Location | src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49, src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58 |
| Status | Resolved |

Description: Every frame read allocates a fresh 4-byte length buffer and a payload byte[]; every write allocates ToByteArray() plus a 4-byte prefix. On the hot event-drain path (batches of up to 128 WorkerEvent frames every 25 ms) this produces steady gen-0 garbage. WorkerFrameWriter also effectively serializes twice (CalculateSize() then ToByteArray()).

Recommendation: Reuse a pooled buffer / ArrayPool<byte> for the length prefix and payload, and write directly into a pooled buffer using CodedOutputStream. Low priority unless event throughput is high.

Resolution: 2026-05-18 — WorkerFrameWriter.WriteAsync now serializes the envelope exactly once into a single frame buffer that carries the 4-byte length prefix followed by the payload, via envelope.WriteTo(new Span<byte>(frame, sizeof(uint), payloadLength)). This eliminates the redundant second serialization pass (ToByteArray() re-runs CalculateSize() internally), the separate length-prefix array, and the separate prefix WriteAsync/extra FlushAsync round. WorkerFrameReader.ReadAsync now rents its payload buffer from ArrayPool<byte>.Shared and returns it in a finally once WorkerEnvelope.Parser.ParseFrom(payload, 0, length) has copied what it needs; ReadExactlyOrThrowAsync gained an explicit count parameter so it honours the logical frame length rather than the (possibly larger) rented buffer length. The 4-byte length-prefix buffer is left as a small per-call allocation; pooling a 4-byte array is not worthwhile. Verified by the new regression test WorkerFrameProtocolTests.ReadAsync_WithVaryingFrameSizes_ParsesEachFrameExactly, which reads a large frame followed by a small frame through one reader to prove the pooled buffer is sliced to each frame's own length and never leaks stale trailing bytes; the existing round-trip, malformed-payload, and concurrent-write tests continue to pass.
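
The framing change can be sketched in Python. The sketch rests on assumptions this review does not confirm. The 4-byte prefix is treated as an unsigned little-endian length; check WorkerFrameProtocol.md for the real byte order. `payload` is taken to be already-serialized envelope bytes. A reusable `bytearray` that may be larger than the current frame stands in for the array rented from ArrayPool<byte>.Shared. The names `write_frame` and `FrameReader` are hypothetical.

```python
import io
import struct

PREFIX = struct.Struct("<I")  # assumption: 4-byte unsigned little-endian length


def write_frame(stream, payload):
    """Build prefix + payload in one buffer and issue a single write."""
    frame = bytearray(PREFIX.size + len(payload))
    PREFIX.pack_into(frame, 0, len(payload))
    frame[PREFIX.size:] = payload
    stream.write(frame)


class FrameReader:
    def __init__(self, stream):
        self._stream = stream
        self._buffer = bytearray()  # stands in for a rented pool buffer

    def _read_exactly(self, view, count):
        """Fill view[:count]; count, not len(view), is the logical length."""
        filled = 0
        while filled < count:
            n = self._stream.readinto(view[filled:count])
            if not n:
                raise EOFError("stream ended mid-frame")
            filled += n

    def read_frame(self, parse):
        prefix = bytearray(PREFIX.size)  # tiny per-call allocation
        self._read_exactly(memoryview(prefix), PREFIX.size)
        (length,) = PREFIX.unpack(prefix)
        if len(self._buffer) < length:
            self._buffer = bytearray(length)  # reused for later, smaller frames
        view = memoryview(self._buffer)
        self._read_exactly(view, length)
        # Parse only the logical frame, never stale bytes past `length`.
        return parse(bytes(view[:length]))


stream = io.BytesIO()
write_frame(stream, b"A" * 100)  # large frame first
write_frame(stream, b"xyz")      # then a small one
stream.seek(0)

reader = FrameReader(stream)
first = reader.read_frame(bytes)
second = reader.read_frame(bytes)
print(len(first), second)  # 100 b'xyz'
```

The second read reuses the 100-byte buffer but returns only three bytes. That is the same property the WorkerFrameProtocolTests regression test checks.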

Worker-010

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226 |
| Status | Resolved |

Description: ConvertInt64Scalar is reached for TypeCode.UInt32 and TypeCode.Int64. For a uint with expectedDataType == MxDataType.Time, the value is treated as a Windows FILETIME via DateTime.FromFileTimeUtc(longValue). A 32-bit value can hold only the low half of a FILETIME, so this silently produces a timestamp at most about seven minutes after 1601-01-01 00:00 UTC (the FILETIME epoch) rather than a raw/diagnostic value. Unlikely in practice, but a silent misconversion.

Recommendation: Only apply the MxDataType.Time FILETIME projection for 64-bit source types; for uint fall through to integer or raw.

Resolution: 2026-05-18 — ConvertInt64Scalar's MxDataType.Time FILETIME projection is now gated on value is long. A genuine 64-bit long still projects to a Timestamp via DateTime.FromFileTimeUtc; a 32-bit uint — which can only hold the low half of a FILETIME — now falls through to the integer projection (DataType = Integer, Int64Value) instead of silently producing a bogus near-1601 timestamp. Verified by the regression test VariantConverterTests.Convert_WithUInt32AndExpectedTime_DoesNotProjectFileTime; the existing Convert_WithFileTimeAndExpectedTime_ProjectsTimestamp (a long FILETIME) continues to pass, confirming the 64-bit path is unchanged.
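
The gate can be sketched in Python. `source_type` is a hypothetical stand-in for the C# `value is long` test, since Python has a single int type. The tuple results stand in for the protobuf DataType/value pair.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def convert_int64_scalar(value, source_type, expected_time):
    """Project a 64-bit FILETIME to a timestamp; everything else stays integer."""
    if expected_time and source_type == "Int64":
        # FILETIME counts 100 ns ticks since 1601-01-01 UTC; Python datetimes
        # resolve microseconds, so sub-microsecond ticks are truncated here.
        return ("Timestamp", FILETIME_EPOCH + timedelta(microseconds=value // 10))
    # A UInt32 can hold only the low half of a FILETIME: keep it as an integer.
    return ("Integer", value)


print(convert_int64_scalar(133_000_000_000_000_000, "Int64", True))
print(convert_int64_scalar(4_294_967_295, "UInt32", True))  # ('Integer', 4294967295)

# What the old, ungated projection would have produced for the largest UInt32:
print(FILETIME_EPOCH + timedelta(microseconds=4_294_967_295 // 10))
# 1601-01-01 00:07:09.496729+00:00
```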

Worker-011

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171 |
| Status | Resolved |

Description: retryAttempts is computed as (connectTimeout / min(connectTimeout, attemptTimeout)) - 1. With defaults (30000 / 2000) this yields 14 retries, but each retry also incurs Polly exponential backoff. The overall connectDeadline (CancelAfter(connectTimeout)) is the real bound, so the computed attempt count can be larger or smaller than the time budget allows, and the formula is opaque.

Recommendation: Drive retries purely off the connectDeadline token (Polly stops when cancelled) and drop the fragile attempt-count arithmetic, or add a comment explaining the intent.

Resolution: 2026-05-18 — The opaque retryAttempts arithmetic in ConnectWithRetryAsync was removed. MaxRetryAttempts is now int.MaxValue, so the retry loop is bounded solely by the connectDeadline linked token (CancelAfter(_connectTimeoutMilliseconds)): Polly stops retrying the moment that token is cancelled, making the overall connect timeout the single source of truth and correctly accounting for the exponential backoff between attempts (which the old formula ignored). A comment documents the intent. No new test was added — the change does not alter observable behavior (the deadline was always the real bound; the old formula always permitted more attempts than fit the budget), and the existing WorkerPipeClientTests.RunAsync_RetriesUntilPipeServerAppears (server appears mid-retry) and RunAsync_WhenPipeNeverAppears_ThrowsTimeoutException (deadline ends the loop) already cover both retry-until-success and deadline-bounded termination.
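
The following Python sketch shows deadline-only retry without Polly. `connect_with_retry`, `try_connect`, and the backoff constants are hypothetical. ConnectionError stands in for a pipe that is not yet listening, and the loop's own deadline check plays the role of the cancelled connectDeadline token.

```python
import time


def connect_with_retry(try_connect, connect_timeout, base_delay=0.01, max_delay=0.2):
    """Retry until try_connect succeeds or the overall deadline passes.

    There is no attempt count: how many attempts fit is whatever the
    deadline allows once the exponential backoff is accounted for.
    """
    deadline = time.monotonic() + connect_timeout
    delay = base_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return try_connect(attempt)
        except ConnectionError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"gave up after {attempt} attempts") from None
            time.sleep(min(delay, remaining))  # never sleep past the deadline
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


def server_appears_on_third_try(attempt):
    if attempt < 3:
        raise ConnectionError("pipe not there yet")
    return f"connected on attempt {attempt}"


print(connect_with_retry(server_appears_on_third_try, connect_timeout=1.0))
# connected on attempt 3
```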

Worker-012

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs:44-55, src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:38-43, src/MxGateway.Worker/MxAccess/MxAccessEventMapper.cs:106-112 |
| Status | Resolved |

Description: Multiple comments describe the alarm path as not-yet-wired future work ("PR A.2 — COM-side subscription scaffold … the worker advertises no alarm subscription", "the worker bootstrap will gain a thin 'run-on-STA' wrapper as part of A.3"). As of commit 6c64030 the alarm command handler, STA poll loop, and SubscribeAlarms/AcknowledgeAlarm/QueryActiveAlarms are all wired. These comments are stale and misleading.

Recommendation: Update the XML docs/comments to describe the shipped behavior; remove the "future PR" framing.

Re-triage: The WnWrapAlarmConsumer.cs:38-43 citation is inaccurate — those lines were rewritten by Worker-001 and already describe the shipped no-internal-timer threading model correctly; nothing stale there. Conversely, two stale comments the finding did not cite were found on the same alarm path and fixed under the same root cause: AlarmDispatcher.cs's <remarks> still framed the dispatcher as "the in-process slice of A.3" with a "companion follow-up PR" adding the (now-shipped) SubscribeAlarmsCommand/AcknowledgeAlarmCommand/QueryActiveAlarmsCommand, and stated the consumer "polls on a System.Threading.Timer thread today" — a claim made false by Worker-001's removal of that timer; and AlarmCommandHandler.cs's <remarks> likewise asserted "the wnwrap consumer's polling timer fires on a thread-pool thread". The discovery document docs/AlarmClientDiscovery.md (referenced by the source comments) was deliberately left untouched: it is a historical research log of the investigation that chose the shipped design, not API/contract/lifecycle prose, and the source comments cite only its still-accurate "Option A — captured" payload schema.

Resolution: 2026-05-18 — Rewrote the stale alarm-path comments to describe shipped behavior with no "future PR / A.2 / A.3" framing. MxAccessAlarmEventSink: the class <remarks> and the Attach comment now explain that AlarmDispatcher owns the consumer→sink→queue wire-up and that Attach carries only the session id (no COM-event subscription is needed because the polled wnwrap consumer raises transition events itself). MxAccessEventMapper.CreateOnAlarmTransition's XML summary now states the worker drives it from MxAccessAlarmEventSink.EnqueueTransition once AlarmDispatcher decodes a wnwrap transition. AlarmDispatcher and AlarmCommandHandler <remarks> were corrected to describe the shipped command surface and the no-internal-timer / STA-driven polling model (the System.Threading.Timer claims were factually wrong post-Worker-001). Pure documentation change — no behavior altered, no test needed; the build stays green.

Worker-013

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.Worker/Sta/StaMessagePump.cs |
| Status | Resolved |

Description: StaMessagePump — the heart of COM event delivery (MsgWaitForMultipleObjectsEx + PeekMessage/DispatchMessage) — has no direct unit tests. StaRuntimeTests exercises it indirectly for command wake-up but never verifies that a posted Windows message actually wakes the wait and is dispatched, nor that PumpPendingMessages returns a correct count. The alarm poll-loop lifecycle in MxAccessStaSession (start/cancel/await on shutdown) also has no test. These are the most failure-sensitive paths in the module.

Recommendation: Add tests that post a message to the STA thread and assert it is pumped, and tests covering alarm poll-loop start/stop and shutdown ordering.

Re-triage: This finding is stale as of the reviewed branch — the coverage it asks for already exists. src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs contains direct StaMessagePump tests covering null-argument validation, waking on a signalled event, returning on timeout, the zero-timeout conversion branch, PumpPendingMessages returning the correct count for messages posted to the STA thread (PumpPendingMessages_MessagesPostedToStaThread_ReturnsCountProcessed, PumpPendingMessages_NoMessagesPosted_ReturnsZero), and WaitForWorkOrMessages waking on a posted Windows message (WaitForWorkOrMessages_WindowsMessagePosted_ReturnsForInputAvailable) — exactly the "post a message and assert it is pumped" test the recommendation asks for. The alarm poll-loop lifecycle is covered by MxAccessStaSessionTests.StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta (start → poll runs on the STA) and Dispose_StopsAlarmPollLoop (Dispose joins the poll task; no further polls). The finding was raised against a stale view of the test project; no source or test change is required. Re-triaged as already resolved rather than fixed.

Resolution: 2026-05-18 — No code change. Re-triaged: the requested direct StaMessagePump tests (including posted-message dispatch and pump count) and the alarm poll-loop start/stop lifecycle tests already exist in StaMessagePumpTests.cs and MxAccessStaSessionTests.cs. See the re-triage note above for the specific test names.

Worker-014

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:33, :202 |
| Status | Resolved |

Description: The file declares two public types — the AlarmCommandHandler class and the IAlarmCommandHandler interface. The C# style guide and the rest of the module follow one-public-type-per-file (e.g. interfaces in their own I*.cs files like IMxAccessAlarmConsumer.cs).

Recommendation: Move IAlarmCommandHandler to its own IAlarmCommandHandler.cs for consistency.

Resolution: 2026-05-18 — The IAlarmCommandHandler interface (with its XML docs) was moved verbatim out of AlarmCommandHandler.cs into a new src/MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs, with its own using directives (System, System.Collections.Generic, MxGateway.Contracts.Proto). AlarmCommandHandler.cs now declares one public type, matching the module's one-public-type-per-file convention (cf. IMxAccessAlarmConsumer.cs). Pure file-organization change — no API surface, behavior, or namespace changed; no test needed. The worker build is clean with zero warnings (no unused usings left behind in AlarmCommandHandler.cs).

Worker-015

| Field | Value |
| --- | --- |
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:115-145 |
| Status | Resolved |

Description: On overflow, Enqueue records the overflow fault and throws MxAccessEventQueueOverflowException; MxAccessBaseEventSink.EnqueueEvent catches it and calls RecordFault again. RecordFault is a no-op when a fault already exists, so the second call is harmless — but the intent is muddled, and there is no test asserting the dropped-event behavior. This is acceptable per the fail-fast design but undocumented at the call site.

Recommendation: Add a brief comment in EnqueueEvent clarifying that an overflow exception is expected and already self-records its fault, so the catch is intentionally a near no-op.

Resolution: 2026-05-18 — Added a comment in MxAccessBaseEventSink.EnqueueEvent's catch block (per the finding's recommendation) explaining that two distinct fail-fast failures land there: a conversion failure from createEvent() (recorded here as an MxaccessEventConversionFailed fault) and an MxAccessEventQueueOverflowException from Enqueue at capacity, which — per the fail-fast backpressure design in docs/DesignDecisions.md — drops the event and has already self-recorded a QueueOverflow fault inside Enqueue. Because MxAccessEventQueue.RecordFault keeps only the first fault, the catch's RecordFault call is then a deliberate near no-op rather than a second, conflicting fault. Pure comment change as recommended — no behavior altered. docs/DesignDecisions.md already documents the fail-fast event backpressure rule, so no doc change was required.
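
The interplay between the self-recording overflow and the catch can be sketched in Python. The class and function names are hypothetical simplifications of MxAccessEventQueue and MxAccessBaseEventSink.EnqueueEvent. Only the first-fault-wins rule and the QueueOverflow and MxaccessEventConversionFailed categories come from the finding.

```python
class QueueOverflowError(Exception):
    """Stands in for MxAccessEventQueueOverflowException."""


class EventQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []
        self.fault = None

    def record_fault(self, category):
        if self.fault is None:  # first fault wins; later calls are no-ops
            self.fault = category

    def enqueue(self, event):
        if len(self.events) >= self.capacity:
            self.record_fault("QueueOverflow")  # self-records before raising
            raise QueueOverflowError()
        self.events.append(event)


def enqueue_event(queue, create_event):
    try:
        queue.enqueue(create_event())
    except Exception:
        # Two fail-fast failures land here. A conversion failure from
        # create_event() is recorded now; an overflow already recorded
        # QueueOverflow, so this call is a deliberate no-op in that case.
        queue.record_fault("MxaccessEventConversionFailed")


full = EventQueue(capacity=1)
enqueue_event(full, lambda: "e1")
enqueue_event(full, lambda: "e2")  # dropped at capacity
print(full.events, full.fault)     # ['e1'] QueueOverflow
```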