Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state. ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing the StaHung fault. Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001; the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync, which caught only graceful-stop exceptions. A failing PollOnce now records a WorkerFault on the event queue instead of vanishing on a non-awaited task. Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed out, leaking the STA thread and COM object. It now always disposes (MxAccessStaSession.Dispose is idempotent and bounded). Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember reflection fallback with an IMxAccessServer fast path plus typed ILMXProxyServer* casts; a non-conforming object now fails fast. Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession records the alarm consumer's STA thread id and asserts every PollOnce runs on it via a unit-testable guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -7,7 +7,7 @@
|
||||
| Review date | 2026-05-18 |
|
||||
| Commit reviewed | `6c64030` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 12 |
|
||||
| Open findings | 7 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -78,13 +78,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** After `ReportWatchdogFaultIfNeededAsync` sends an `StaHung` fault, the heartbeat loop continues sending normal heartbeats with `State` derived from `_state`, which the watchdog path never sets to `Faulted`. The heartbeat then keeps reporting a non-faulted state that contradicts the fault just sent.
|
||||
|
||||
**Recommendation:** Set `_state = WorkerState.Faulted` (thread-safely) when the watchdog fault fires so heartbeat state and fault stay consistent.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** 2026-05-18 — `ReportWatchdogFaultIfNeededAsync` now sets `_state = WorkerState.Faulted` immediately after `_watchdogFaultSent = true` and before the `StaHung` fault is written, so the next heartbeat reports `Faulted` instead of contradicting the fault. `_state` is already `volatile` (Worker-003), so the cross-thread write from the heartbeat loop is observed correctly by the heartbeat's own `CreateHeartbeat` read; no further locking is required. Verified by the regression test `WorkerPipeSessionTests.RunAsync_AfterWatchdogFault_HeartbeatReportsFaultedState`, which uses a stale-activity snapshot with an empty current-command correlation id so the heartbeat `State` is derived from `_state` rather than forced to `ExecutingCommand`.
|
||||
|
||||
### Worker-005
|
||||
|
||||
@@ -92,14 +92,16 @@
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:297-313` |
|
||||
| Status | Open |
|
||||
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `OnPoll` catches every exception from `PollOnce()` and discards it (`_ = ex;`). The production poll path (`MxAccessStaSession.RunAlarmPollLoopAsync` → `AlarmCommandHandler.PollOnce` → `AlarmDispatcher.PollOnce` → `consumer.PollOnce()`) has no fault recording either. A permanently failing alarm provider (e.g. `GetXmlCurrentAlarms2` returning `E_FAIL`, malformed XML throwing in `XmlDocument.LoadXml`) is therefore completely silent — no fault on the event queue, no log.
|
||||
|
||||
**Recommendation:** Route poll failures to `MxAccessEventQueue.RecordFault` (or a logger) so a broken alarm subscription becomes observable. Update the now-stale comment.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Re-triage:** The cited location `WnWrapAlarmConsumer.cs:297-313` and the `OnPoll` callback no longer exist as of this branch — Worker-001 removed the off-STA `Timer` and its `OnPoll` callback entirely. The substantive concern still held, however: the **production** poll path in `MxAccessStaSession.RunAlarmPollLoopAsync` caught only `OperationCanceledException`, `ObjectDisposedException`, and `InvalidOperationException`. A genuine poll failure (`COMException` from `GetXmlCurrentAlarms2`, a malformed-XML `XmlException`) escaped uncaught, faulted the never-awaited `Task.Run` poll task, and was silently lost — exactly the silent-failure the finding describes. The finding was re-pointed at the live location and fixed there rather than at the removed `OnPoll`.
|
||||
|
||||
**Resolution:** 2026-05-18 — `RunAlarmPollLoopAsync` gained a trailing `catch (Exception exception)` arm after the three graceful-stop catches. A real alarm-poll failure is now converted to a `WorkerFault` (category `MxaccessEventConversionFailed`, carrying the exception type and, for a `COMException`, its `HResult`) by the new `CreateAlarmPollFault` helper and recorded on the session's `MxAccessEventQueue` via `RecordFault`. The worker's event-drain loop drains that fault and forwards it to the gateway, so a broken alarm subscription is now observable on the IPC fault path instead of vanishing. The poll loop still stops after the failure (the subscription is dead). No new proto enum value was added — `MxaccessEventConversionFailed` is the closest existing alarm-path category, avoiding a contracts regeneration across all clients. Verified by the regression test `MxAccessStaSessionTests.RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue`.
|
||||
|
||||
### Worker-006
|
||||
|
||||
@@ -108,13 +110,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `RunAsync`'s `finally` calls `_runtimeSession?.Dispose()` unless `_shutdownTimedOut`. On the normal path `ShutdownGracefullyAsync` already disposed the STA runtime, so re-entering `Dispose()` is a harmless no-op only because `ShutdownGracefullyAsync` reached its end and set `disposed = true`. If `ShutdownGracefullyAsync` throws `TimeoutException` after partial teardown with `_shutdownTimedOut` set, the session is never disposed at all — the `finally` skips it — leaking the STA thread and COM object, leaving cleanup to rely solely on process exit.
|
||||
|
||||
**Recommendation:** Make the dispose decision explicit and confirm process exit always follows a timed-out shutdown; otherwise dispose defensively. At minimum document why disposal is deliberately skipped on timeout.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** 2026-05-18 — `RunAsync`'s `finally` now always calls `_runtimeSession?.Dispose()`; the `if (!_shutdownTimedOut)` guard and the `_shutdownTimedOut` field (which had become write-only) were removed. `MxAccessStaSession.Dispose` is idempotent (`if (disposed) return`) and bounded — each STA join is capped with `Wait(TimeSpan.FromSeconds(2))` — so re-entering it on the normal path (where `ShutdownGracefullyAsync` already disposed the runtime) is a harmless no-op, while on the timed-out path it is now the only thing that reclaims the STA thread and releases the MXAccess COM object. The previous behaviour leaked both on a shutdown timeout and relied solely on process exit. A code comment in the `finally` block documents the reasoning. Verified by the regression test `WorkerPipeSessionTests.RunAsync_WhenShutdownTimesOut_StillDisposesRuntimeSession`, which forces a `TimeoutException` from `ShutdownGracefullyAsync` and asserts the runtime session is disposed before `RunAsync` rethrows.
|
||||
|
||||
### Worker-007
|
||||
|
||||
@@ -123,13 +125,15 @@
|
||||
| Severity | Medium |
|
||||
| Category | mxaccessgw conventions |
|
||||
| Location | `src/MxGateway.Worker/MxAccess/MxAccessComServer.cs:130-150` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `Invoke` uses late-bound `Type.InvokeMember` reflection as a fallback when the COM object does not cast to `ILMXProxyServer*`. In production the object is always `LMXProxyServerClass`, so the reflection path exists only for test doubles — it is dead/untested code on the production path and obscures the interface contract. `params object[] arguments` also boxes value-type handles on every call.
|
||||
|
||||
**Recommendation:** Drop the reflection fallback and require the COM object to implement the interface (tests can supply a typed fake), or clearly mark the fallback as test-only.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Re-triage:** The finding's claim that the reflection path is "dead/untested code" is partly inaccurate — it was in fact the path exercised by the entire `MxAccessCommandExecutorTests` suite, whose `FakeMxAccessComObject` did not implement any typed interface. So the reflection fallback was test-only but *not* untested. The convention concern (bypassing the typed interface contract, boxing value-type handles) is valid, so the fix follows the recommendation's first option.
|
||||
|
||||
**Resolution:** 2026-05-18 — The late-bound `Type.InvokeMember` reflection fallback and its `params object[]`-boxing `Invoke` helper were removed from `MxAccessComServer`. Each adapter method now takes one of two typed paths: an `is IMxAccessServer` fast path (test fakes implement `IMxAccessServer` directly) and the production path that casts to the typed `ILMXProxyServer` / `ILMXProxyServer3` / `ILMXProxyServer4` COM interfaces via new `AsProxyServer*` helpers. A COM object implementing neither now fails fast with a clear `InvalidOperationException` naming the missing interface, instead of an opaque late-bound call. The test seam was migrated accordingly: `MxAccessCommandExecutorTests.FakeMxAccessComObject` now declares `: IMxAccessServer` (its method signatures already matched the interface exactly, so no behavioural change). Verified by the new `MxAccessComServerTests` (typed-server routing, untyped-object rejection, original-exception propagation — no more `TargetInvocationException` wrapping) plus the unchanged, still-passing `MxAccessCommandExecutorTests` suite which now exercises the typed `IMxAccessServer` path.
|
||||
|
||||
### Worker-008
|
||||
|
||||
@@ -138,13 +142,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-249`, `:429-447` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `RunAlarmPollLoopAsync` correctly marshals `handler.PollOnce()` onto the STA via `staRuntime.InvokeAsync`, and the cancel/await/dispose ordering in `ShutdownGracefullyAsync` is sound. However, nothing enforces that the `consumerFactory` and all `IMxAccessAlarmConsumer` calls run on the STA thread; a future caller could break STA affinity silently.
|
||||
|
||||
**Recommendation:** Add an assertion or documented invariant that the consumer factory and all `IMxAccessAlarmConsumer` calls run on the STA thread, mirroring the existing `MxAccessSession.CreationThreadId` pattern.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** 2026-05-18 — `MxAccessStaSession` now records the STA thread id (`alarmConsumerThreadId`) at the point the alarm-command-handler factory is invoked — which already runs inside `staRuntime.InvokeAsync` during `StartAsync`, mirroring the `MxAccessSession.CreationThreadId` capture. `RunAlarmPollLoopAsync`'s marshalled poll lambda now calls `EnsureOnAlarmConsumerThread()` before `handler.PollOnce()`, asserting the poll runs on the recorded STA thread. The check is delegated to a new `internal static` guard `AssertOnAlarmConsumerThread(int? expected, int actual)` that throws a descriptive `InvalidOperationException` on an affinity violation and is a no-op when the consumer thread is unrecorded (no alarm handler configured). Making the guard `static` and `internal` keeps it directly unit-testable. The STA-affinity invariant is documented in the guard's XML doc. Verified by the regression tests `MxAccessStaSessionTests.AssertOnAlarmConsumerThread_WhenOffOwningThread_Throws` and `AssertOnAlarmConsumerThread_OnOwningThreadOrUnset_DoesNotThrow`.
|
||||
|
||||
### Worker-009
|
||||
|
||||
|
||||
Reference in New Issue
Block a user