Files
mxaccessgw/code-reviews/Worker.Tests/findings.md
T
Joseph Doherty 1b4dcf32d5 Resolve Worker.Tests-001 and Worker.Tests-002 code-review findings
Worker.Tests-001: StaMessagePump had no direct unit test. Added
Sta/StaMessagePumpTests.cs — 8 STA-thread facts covering WaitForWorkOrMessages
(wake-event signalled before/during the wait, timeout expiry, zero-timeout
fast path, the QS_ALLINPUT posted-message wake path) and PumpPendingMessages
drain counting.

Worker.Tests-002: no test drove a COM event through the integrated
sink -> mapper -> queue path. Added MxAccess/MxAccessBaseEventSinkTests.cs —
5 facts driving OnDataChange, OnWriteComplete, OperationComplete and
OnBufferedDataChange through a real MxAccessBaseEventSink + mapper + queue and
asserting the converted WorkerEvent lands in MxAccessEventQueue. The four COM
event handlers were widened private -> internal and InternalsVisibleTo for
MxGateway.Worker.Tests was added, mirroring MxAccessAlarmEventSink's existing
test seam; no worker behavior changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:07:48 -04:00

253 lines
16 KiB
Markdown

# Code Review — Worker.Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 13 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-010 (weak substring assertion), Worker.Tests-011 (test name overstates what it proves). |
| 2 | mxaccessgw conventions | Tests respect STA-affinity and the WorkerEnvelope frame protocol; naming-convention drift only (Worker.Tests-009). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-003/004/013 (wall-clock and fixed-delay timing assertions). |
| 4 | Error handling & resilience | COMException/HResult, pipe-never-appears, malformed frames, shutdown-during-command, watchdog all covered; queue branch gap (Worker.Tests-015). |
| 5 | Security | No real secrets; redaction explicitly tested. No issues found. |
| 6 | Performance & resource management | Issues found: Worker.Tests-005 (`MemoryStream` not disposed), Worker.Tests-006 (`MxAccessStaSession` leak on assertion failure). |
| 7 | Design-document adherence | Tests match `docs/Worker*.md`; `docs/WorkerFrameProtocol.md` is stale (Worker.Tests-007). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-009 (two naming conventions), Worker.Tests-014 (duplicated test doubles). |
| 9 | Testing coverage | Issues found: Worker.Tests-001 (`StaMessagePump` untested), Worker.Tests-002 (COM-event delivery untested), Worker.Tests-012 (frame-validation gaps). |
| 10 | Documentation & comments | Issues found: Worker.Tests-008 (misplaced redaction test), Worker.Tests-011 (misleading test name). |
## Findings
### Worker.Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Status | Resolved |
**Description:** `StaMessagePump` — whose entire reason for existing is pumping Windows messages so MXAccess COM event sink calls deliver onto the STA — has no direct unit test. `WaitForWorkOrMessages` (timeout conversion, the `MsgWaitForMultipleObjectsEx` failure path) and `PumpPendingMessages` (drain count) are exercised only indirectly via `StaRuntime`, which never asserts the pump returns/throws correctly. The `MsgWaitFailed` error branch and `ToTimeoutMilliseconds` edge cases (`InfiniteTimeSpan`, `<= Zero`, `>= uint.MaxValue`) are completely uncovered.
**Recommendation:** Add `StaMessagePumpTests` that post a Windows message to the STA thread and assert `PumpPendingMessages` returns the expected count; cover `WaitForWorkOrMessages` waking on a signaled event vs timeout; cover `ToTimeoutMilliseconds` boundaries through an internals-visible seam.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs` (8 `[Fact]` tests, run on dedicated STA threads). Covers `WaitForWorkOrMessages` null-argument validation, returning immediately when the wake event is pre-signalled, waking when the event is signalled mid-wait, returning on timeout when never signalled, the `TimeSpan.Zero` (`<= Zero`) conversion branch, and waking on a `WM_NULL` Windows message posted to the STA thread (the `QS_ALLINPUT` path). `PumpPendingMessages` is covered for both an empty queue (returns 0) and three posted messages (returns 3). Boundary noted in the file: the `MsgWaitFailed` branch is not exercised because forcing `MsgWaitForMultipleObjectsEx` to fail needs a deliberately invalid native handle, which is unsafe to construct in-process; `ToTimeoutMilliseconds` is `private static` and is covered indirectly through wait-latency assertions rather than reflection.
### Worker.Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Status | Resolved |
**Description:** No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the `MxAccessEventQueue`. `MxAccessEventMapperTests` exercises the mapper directly with hand-built fakes, and `AlarmDispatcherTests` covers the alarm sink, but the non-alarm COM-event path (`MxAccessBaseEventSink`/`MxAccessComServer` event handlers → `MxAccessEventMapper` → queue, triggered by an actual sink callback) is never end-to-end tested. Given the worker's core purpose is to convert COM events to protobuf, this is a significant gap.
**Recommendation:** Add a test that invokes the base event sink's data-change handler (via an internal seam or a fake COM event source) and asserts a converted `WorkerEvent` with correct family/sequence appears in the queue.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/MxAccess/MxAccessBaseEventSinkTests.cs` (5 `[Fact]` tests). The four `MxAccessBaseEventSink` COM event handlers (`OnDataChange`, `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`) — the exact delegate targets the MXAccess COM runtime invokes — were widened from `private` to `internal` (with XML-doc notes that this is a unit-test seam), and `[assembly: InternalsVisibleTo("MxGateway.Worker.Tests")]` was added to `MxGateway.Worker.csproj`. The tests construct a real `MxAccessBaseEventSink` over a real `MxAccessEventMapper` and `MxAccessEventQueue`, invoke each handler with COM-style arguments, and assert a correctly-converted protobuf `WorkerEvent` (family, body case, server/item handle, value, quality, source timestamp, monotonic `WorkerSequence`) lands in the queue. Boundary noted in the file: the COM `+=` wire-up in `Attach`/`Detach` casts to the sealed `LMXProxyServerClass` RCW and cannot run without a live MXAccess COM object, so it is not exercised; invoking the handlers directly reproduces an STA-thread COM callback and exercises the genuine conversion + enqueue path.
### Worker.Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Status | Open |
**Description:** `InvokeAsync_WakesIdlePumpForQueuedCommand` asserts `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake event (vs the 50 ms idle pump) caused the dispatch.
**Recommendation:** Remove the wall-clock assertion (the awaited result already proves the command ran), or raise the budget substantially with a comment that it is a coarse smoke check.
**Resolution:** _(open)_
### Worker.Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Status | Open |
**Description:** `StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` use poll-until loops, and `Dispose_StopsAlarmPollLoop` additionally does `await Task.Delay(1000)` then asserts `PollCount` is unchanged. The 1s "no further polls" window is a timing race: a poll scheduled just before disposal could increment the counter afterward, and a slow agent could simply not run a poll in the window even without correct stop logic.
**Recommendation:** Make the poll loop deterministically observable — expose a "poll loop stopped" signal or have `Dispose` join the poll task — then assert on that rather than on elapsed-time silence.
**Resolution:** _(open)_
### Worker.Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Status | Open |
**Description:** `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suite (which carefully `using`s `CancellationTokenSource`, `StaRuntime`, `PipePair`). `WorkerFrameWriter`/`WorkerFrameReader` are also constructed without disposal.
**Recommendation:** Wrap `MemoryStream` (and reader/writer if they are `IDisposable`) in `using` declarations for consistency.
**Resolution:** _(open)_
### Worker.Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Status | Open |
**Description:** `Dispose_StopsAlarmPollLoop` constructs `MxAccessStaSession session` without `using` (unlike every sibling test) and relies on an explicit `session.Dispose()`. If an assertion between `StartAsync` and `Dispose()` throws, the session — its STA thread and poll loop — leaks for the rest of the run. The `StaRuntime` is `using`d so the thread is eventually reclaimed, but the alarm poll loop and handler are not.
**Recommendation:** Use `using MxAccessStaSession session = ...` and drop the manual `Dispose()`, or wrap the body in try/finally.
**Resolution:** _(open)_
### Worker.Tests-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/WorkerFrameProtocol.md:38-49` |
| Status | Open |
**Description:** `docs/WorkerFrameProtocol.md` instructs running `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests` and states the frame protocol "is part of `MxGateway.Server`". The frame protocol actually lives in `MxGateway.Worker.Ipc` and is tested by `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The doc's verification command points at the wrong project and build, so anyone following it after changing the worker frame protocol will not run the relevant tests.
**Recommendation:** Update `docs/WorkerFrameProtocol.md` to reference `src/MxGateway.Worker.Tests` and the x86 worker build (`-p:Platform=x86`).
**Resolution:** _(open)_
### Worker.Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` |
| Status | Open |
**Description:** `Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging` lives in `VariantConverterTests` but asserts on `WorkerLogRedactor.RedactValue`, which has nothing to do with `VariantConverter`. It is also a near-duplicate of coverage in `WorkerLogRedactorTests`. Placing redaction coverage inside the variant-converter class is misleading.
**Recommendation:** Move this test into `Bootstrap/WorkerLogRedactorTests.cs` (which already exists and tests `RedactFields`).
**Resolution:** _(open)_
### Worker.Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` |
| Status | Open |
**Description:** The alarm-related test files use `snake_case` method names while the rest of the project uses the `Method_State_Result` PascalCase convention. `docs/style-guides/CSharpStyleGuide.md` and the surrounding code establish PascalCase as the project convention; the alarm files diverge.
**Recommendation:** Rename alarm-test methods to the `Method_Scenario_Expectation` PascalCase form for one consistent convention.
**Resolution:** _(open)_
### Worker.Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` |
| Status | Open |
**Description:** `StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest` asserts `Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase)`. The XML doc claims it verifies the diagnostic says "alarm consumer not configured", but the assertion only checks the substring "alarm" — which would also match an unrelated message like "invalid alarm GUID". The assertion is weaker than the documented intent.
**Recommendation:** Assert the full diagnostic phrase so the test fails if the diagnostic regresses to a misleading message.
**Resolution:** _(open)_
### Worker.Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` |
| Status | Open |
**Description:** `DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply` is named and documented as if it proves cancellation arrived after execution began. The test does `Started.Wait(...)` then `cancellation.Cancel()`, which proves execution started, but because the executor is already running on the STA the cancellation is inherently a no-op — the test cannot distinguish "cancel was observed and ignored" from "cancel was never checked". The name overstates what is proven.
**Recommendation:** Either tighten the test (assert the dispatcher's cancel path was reached and declined) or rename/comment it to "cancellation cannot abort an in-flight STA command", matching `gateway.md`'s stated behavior.
**Resolution:** _(open)_
### Worker.Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Status | Open |
**Description:** `docs/WorkerFrameProtocol.md` states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." `WorkerFrameProtocolTests` covers malformed-length, wrong protocol version, wrong session, and malformed payload, but has no test for the zero-length-payload rejection or the oversized-frame rejection — both explicit security-relevant input-validation paths.
**Recommendation:** Add tests feeding a frame with `payload_length == 0` and one with `payload_length` above the configured maximum, asserting the corresponding `WorkerFrameProtocolErrorCode`.
**Resolution:** _(open)_
### Worker.Tests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` |
| Status | Open |
**Description:** `ThrowIfCompletedAsync` does an unconditional `await Task.Delay(TimeSpan.FromMilliseconds(100))` then checks `task.IsCompleted`. This adds a fixed 100 ms to the test and only catches a `RunAsync` that fails within that arbitrary window; a session that faults after 100 ms slips past undetected.
**Recommendation:** Replace with a deterministic race: `await Task.WhenAny(runTask, <first-expected-frame-read>)` and assert the run task did not win.
**Resolution:** _(open)_
### Worker.Tests-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` |
| Status | Open |
**Description:** `FakeRuntimeSession`, `NoopComApartmentInitializer`, `NoopEventSink`/`NullEventSink`, and the `CreateFrame`/`WriteUInt32LittleEndian` helpers are re-implemented independently in multiple test files. The two `FakeRuntimeSession` implementations have already diverged (one supports `BlockDispatch`/event enqueue, one does not), and `NoopComApartmentInitializer` is defined four times.
**Recommendation:** Extract shared test doubles (`NoopComApartmentInitializer`, frame helpers, a single configurable `FakeRuntimeSession`) into a `TestSupport` folder/namespace consumed by all test classes.
**Resolution:** _(open)_
### Worker.Tests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` |
| Status | Open |
**Description:** `MxAccessEventQueueTests` covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover `Drain` with `maxEvents: 0` (drain-all) — a branch `FakeRuntimeSession.DrainEvents` even special-cases — nor draining an empty queue, nor enqueue after a manual `RecordFault`. These are minor branches but the overflow/fault interaction is the worker's backpressure contract.
**Recommendation:** Add a `Drain(0)` drain-all test and an empty-queue drain test.
**Resolution:** _(open)_