mxaccessgw/code-reviews/Worker.Tests/findings.md
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00


# Code Review — Worker.Tests
| Field | Value |
|---|---|
| Module | `src/MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `1cd51bb` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
### 2026-05-18 review (commit `6c64030`)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-010 (weak substring assertion), Worker.Tests-011 (test name overstates what it proves). |
| 2 | mxaccessgw conventions | Tests respect STA-affinity and the WorkerEnvelope frame protocol; naming-convention drift only (Worker.Tests-009). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-003/004/013 (wall-clock and fixed-delay timing assertions). |
| 4 | Error handling & resilience | COMException/HResult, pipe-never-appears, malformed frames, shutdown-during-command, watchdog all covered; queue branch gap (Worker.Tests-015). |
| 5 | Security | No real secrets; redaction explicitly tested. No issues found. |
| 6 | Performance & resource management | Issues found: Worker.Tests-005 (`MemoryStream` not disposed), Worker.Tests-006 (`MxAccessStaSession` leak on assertion failure). |
| 7 | Design-document adherence | Tests match `docs/Worker*.md`; `docs/WorkerFrameProtocol.md` is stale (Worker.Tests-007). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-009 (two naming conventions), Worker.Tests-014 (duplicated test doubles). |
| 9 | Testing coverage | Issues found: Worker.Tests-001 (`StaMessagePump` untested), Worker.Tests-002 (COM-event delivery untested), Worker.Tests-012 (frame-validation gaps). |
| 10 | Documentation & comments | Issues found: Worker.Tests-008 (misplaced redaction test), Worker.Tests-011 (misleading test name). |
### 2026-05-20 re-review (commit `1cd51bb`)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-018 (silent-skip masquerades as passing tests), Worker.Tests-024 (`Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer` swallows the real exception type). |
| 2 | mxaccessgw conventions | Issues found: Worker.Tests-019 (`AlarmsLiveSmokeTests` uses `snake_case` outside the alarm-method scope Worker.Tests-009 corrected); pre-existing `LiveMxAccessFactAttribute` is not consumed by `MxAccessLiveComCreationTests` (Worker.Tests-018). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-020 (`MxAccessValueCacheTests.TryWaitForUpdate_ReturnsFalseAfterDeadline_WhenNoSetOccurs` asserts wall-clock floor and pump-call lower bound). |
| 4 | Error handling & resilience | Issues found: Worker.Tests-021 (`WorkerFrameProtocolErrorCode.EndOfStream` and the writer-side `MessageTooLarge`/`InvalidEnvelope` branches are uncovered). |
| 5 | Security | Redaction coverage is sound; no new issues. |
| 6 | Performance & resource management | No new issues — `MemoryStream`/session-disposal hygiene fixes from the prior pass hold; `WorkerFrameReader` `ArrayPool` rent/return path is now regression-tested. |
| 7 | Design-document adherence | No new issues. |
| 8 | Code organization & conventions | Issues found: Worker.Tests-016 (the now-shared `MxAccessSession` reflection construction in `AlarmCommandExecutorTests` duplicates the testable surface the consolidated TestSupport folder was meant to host). |
| 9 | Testing coverage | Issues found: Worker.Tests-017 (`WorkerCancel` envelope-dispatch path untested), Worker.Tests-022 (`WnWrapAlarmConsumer.PollOnce` transition-delta computation untested at the snapshot-to-transitions level). |
| 10 | Documentation & comments | Issues found: Worker.Tests-023 (`AlarmClientWmProbeTests` and `WnWrapConsumerProbeTests` are unit-test classes carrying 1000+ lines of probe-only code; their `[Fact(Skip=...)]` status is documented but the probe scaffolding is mixed into the same test assembly as regression tests). |
## Findings
### Worker.Tests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) |
| Status | Resolved |
**Description:** `StaMessagePump` — whose entire reason for existing is pumping Windows messages so MXAccess COM event sink calls deliver onto the STA — has no direct unit test. `WaitForWorkOrMessages` (timeout conversion, the `MsgWaitForMultipleObjectsEx` failure path) and `PumpPendingMessages` (drain count) are exercised only indirectly via `StaRuntime`, which never asserts the pump returns/throws correctly. The `MsgWaitFailed` error branch and `ToTimeoutMilliseconds` edge cases (`InfiniteTimeSpan`, `<= Zero`, `>= uint.MaxValue`) are completely uncovered.
**Recommendation:** Add `StaMessagePumpTests` that post a Windows message to the STA thread and assert `PumpPendingMessages` returns the expected count; cover `WaitForWorkOrMessages` waking on a signaled event vs timeout; cover `ToTimeoutMilliseconds` boundaries through an internals-visible seam.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs` (8 `[Fact]` tests, run on dedicated STA threads). Covers `WaitForWorkOrMessages` null-argument validation, returning immediately when the wake event is pre-signalled, waking when the event is signalled mid-wait, returning on timeout when never signalled, the `TimeSpan.Zero` (`<= Zero`) conversion branch, and waking on a `WM_NULL` Windows message posted to the STA thread (the `QS_ALLINPUT` path). `PumpPendingMessages` is covered for both an empty queue (returns 0) and three posted messages (returns 3). Boundary noted in the file: the `MsgWaitFailed` branch is not exercised because forcing `MsgWaitForMultipleObjectsEx` to fail needs a deliberately invalid native handle, which is unsafe to construct in-process; `ToTimeoutMilliseconds` is `private static` and is covered indirectly through wait-latency assertions rather than reflection.
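
To illustrate the three `ToTimeoutMilliseconds` edge cases named in the finding (`InfiniteTimeSpan`, `<= Zero`, `>= uint.MaxValue`), the sketch below shows one conversion that is consistent with them. It is a hypothetical stand-in, not the worker's code: the real method is `private static` and its body is not reproduced in this review. The clamp to `uint.MaxValue - 1` is an assumption, chosen so that a very large finite timeout cannot collide with the Win32 `INFINITE` value (`0xFFFFFFFF`).

```csharp
using System;
using System.Threading;

// Win32 INFINITE, as accepted by wait APIs such as MsgWaitForMultipleObjectsEx.
const uint Infinite = 0xFFFFFFFF;

// Hypothetical stand-in for the private StaMessagePump.ToTimeoutMilliseconds.
uint ToTimeoutMilliseconds(TimeSpan timeout)
{
    // Must be checked first: InfiniteTimeSpan is -1 ms and would otherwise hit the <= Zero branch.
    if (timeout == Timeout.InfiniteTimeSpan) return Infinite;
    if (timeout <= TimeSpan.Zero) return 0; // poll without blocking

    double milliseconds = timeout.TotalMilliseconds;
    // Assumed clamp: stay one below INFINITE so a huge finite wait never means "forever".
    if (milliseconds >= uint.MaxValue) return uint.MaxValue - 1;
    return (uint)milliseconds;
}

Console.WriteLine(ToTimeoutMilliseconds(Timeout.InfiniteTimeSpan));        // 4294967295
Console.WriteLine(ToTimeoutMilliseconds(TimeSpan.Zero));                   // 0
Console.WriteLine(ToTimeoutMilliseconds(TimeSpan.FromSeconds(-5)));        // 0
Console.WriteLine(ToTimeoutMilliseconds(TimeSpan.FromMilliseconds(250)));  // 250
Console.WriteLine(ToTimeoutMilliseconds(TimeSpan.MaxValue));               // 4294967294
```

A function of this shape, if exposed through an internals-visible seam, could be asserted on directly rather than inferred from wait latency.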
### Worker.Tests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` |
| Status | Resolved |
**Description:** No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the `MxAccessEventQueue`. `MxAccessEventMapperTests` exercises the mapper directly with hand-built fakes, and `AlarmDispatcherTests` covers the alarm sink, but the non-alarm COM-event path (`MxAccessBaseEventSink`/`MxAccessComServer` event handlers → `MxAccessEventMapper` → queue, triggered by an actual sink callback) is never end-to-end tested. Given the worker's core purpose is to convert COM events to protobuf, this is a significant gap.
**Recommendation:** Add a test that invokes the base event sink's data-change handler (via an internal seam or a fake COM event source) and asserts a converted `WorkerEvent` with correct family/sequence appears in the queue.
**Resolution:** 2026-05-18 — Added `src/MxGateway.Worker.Tests/MxAccess/MxAccessBaseEventSinkTests.cs` (5 `[Fact]` tests). The four `MxAccessBaseEventSink` COM event handlers (`OnDataChange`, `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`) — the exact delegate targets the MXAccess COM runtime invokes — were widened from `private` to `internal` (with XML-doc notes that this is a unit-test seam), and `[assembly: InternalsVisibleTo("MxGateway.Worker.Tests")]` was added to `MxGateway.Worker.csproj`. The tests construct a real `MxAccessBaseEventSink` over a real `MxAccessEventMapper` and `MxAccessEventQueue`, invoke each handler with COM-style arguments, and assert a correctly-converted protobuf `WorkerEvent` (family, body case, server/item handle, value, quality, source timestamp, monotonic `WorkerSequence`) lands in the queue. Boundary noted in the file: the COM `+=` wire-up in `Attach`/`Detach` casts to the sealed `LMXProxyServerClass` RCW and cannot run without a live MXAccess COM object, so it is not exercised; invoking the handlers directly reproduces an STA-thread COM callback and exercises the genuine conversion + enqueue path.
### Worker.Tests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48` |
| Status | Resolved |
**Description:** `InvokeAsync_WakesIdlePumpForQueuedCommand` asserts `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake event (vs the 50 ms idle pump) caused the dispatch.
**Recommendation:** Remove the wall-clock assertion (the awaited result already proves the command ran), or raise the budget substantially with a comment that it is a coarse smoke check.
**Resolution:** 2026-05-18 — Removed the `Stopwatch` and the `stopwatch.Elapsed < TimeSpan.FromSeconds(2)` wall-clock assertion from `InvokeAsync_WakesIdlePumpForQueuedCommand`. The test already constructs the `StaRuntime` with a 30-second idle pump period, so the awaited `InvokeAsync` completing at all proves the command wake event — not the idle pump tick — drove the dispatch; no timing budget is needed. The XML-doc comment now states this explicitly. The now-unused `using System.Diagnostics;` was removed (`TreatWarningsAsErrors`).
### Worker.Tests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329` |
| Status | Resolved |
**Description:** `StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` use poll-until loops, and `Dispose_StopsAlarmPollLoop` additionally does `await Task.Delay(1000)` then asserts `PollCount` is unchanged. The 1s "no further polls" window is a timing race: a poll scheduled just before disposal could increment the counter afterward, and a slow agent could simply not run a poll in the window even without correct stop logic.
**Recommendation:** Make the poll loop deterministically observable — expose a "poll loop stopped" signal or have `Dispose` join the poll task — then assert on that rather than on elapsed-time silence.
**Resolution:** 2026-05-18 — `MxAccessStaSession.Dispose` now joins the alarm poll task (`pollTaskToJoin.Wait(TimeSpan.FromSeconds(5))`) after cancelling the poll CTS, instead of setting `alarmPollTask = null` and discarding it. Once `Dispose` returns, the poll loop has provably exited and no `PollOnce` call can still be in flight. `Dispose_StopsAlarmPollLoop` was rewritten to drop the `await Task.Delay(1000)` "no further polls" window: it now captures `PollCount` immediately after `Dispose()` returns and re-asserts equality after a bare `await Task.Yield()` — a deterministic frozen-count check rather than an elapsed-time race. The success-direction poll-until loop in `PollOnceCalledViaSta` was left as-is: waiting for an event to *occur* is sound; only waiting for an event to *not* occur is the race, and that pattern is now eliminated. Note: `ShutdownGracefullyAsync` already joined the poll task, so this change makes `Dispose` consistent with the graceful path.
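
The cancel-then-join pattern described in this resolution can be sketched in isolation. The names below (`pollCts`, `pollTask`, `pollCount`) are illustrative stand-ins, not the members of `MxAccessStaSession`; the sketch only shows why a count captured after the join cannot change afterwards.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int pollCount = 0;
using CancellationTokenSource pollCts = new();

// Hypothetical poll loop standing in for the alarm poll task.
Task pollTask = Task.Run(async () =>
{
    try
    {
        while (!pollCts.Token.IsCancellationRequested)
        {
            Interlocked.Increment(ref pollCount);
            await Task.Delay(10, pollCts.Token);
        }
    }
    catch (OperationCanceledException)
    {
        // Expected when the loop is cancelled during the delay.
    }
});

// Waiting for an event to occur is sound, so a poll-until wait is fine here.
SpinWait.SpinUntil(() => Volatile.Read(ref pollCount) > 0, TimeSpan.FromSeconds(5));

// "Dispose": cancel the loop, then join it with a bounded wait.
pollCts.Cancel();
bool joined = pollTask.Wait(TimeSpan.FromSeconds(5));

// Once the join succeeds the loop has exited, so the count is frozen.
int afterDispose = Volatile.Read(ref pollCount);
await Task.Yield();
int afterYield = Volatile.Read(ref pollCount);

Console.WriteLine(joined && afterDispose == afterYield);
```

The equality check relies on the join rather than on a quiet period, so it holds however slowly the agent schedules the loop.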
### Worker.Tests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` |
| Status | Resolved |
**Description:** `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suite (which carefully `using`s `CancellationTokenSource`, `StaRuntime`, `PipePair`). `WorkerFrameWriter`/`WorkerFrameReader` are also constructed without disposal.
**Recommendation:** Wrap `MemoryStream` (and reader/writer if they are `IDisposable`) in `using` declarations for consistency.
**Resolution:** 2026-05-18 — All six `MemoryStream` test-body declarations in `WorkerFrameProtocolTests.cs` and the five `inbound`/`outbound` `MemoryStream` declarations in the `WorkerPipeSessionTests.cs` handshake tests were converted to `using` declarations, matching how the rest of the suite handles `CancellationTokenSource`/`StaRuntime`/`PipePair`. Re-triage of the parenthetical: `WorkerFrameWriter` and `WorkerFrameReader` are **not** `IDisposable` (`sealed class` with no `IDisposable` and no `Dispose` member — verified in `src/MxGateway.Worker/Ipc/`), so the finding's "reader/writer if they are `IDisposable`" suggestion does not apply and no change was made there. The shared `MemoryStream` instances inside the `WorkerPipeSessionTests` harness/helper classes (`ReadWrittenFrames` parameter, the `PipePair`/harness fields) are out of the cited line scope and were left untouched.
### Worker.Tests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` |
| Status | Resolved |
**Description:** `Dispose_StopsAlarmPollLoop` constructs `MxAccessStaSession session` without `using` (unlike every sibling test) and relies on an explicit `session.Dispose()`. If an assertion between `StartAsync` and `Dispose()` throws, the session — its STA thread and poll loop — leaks for the rest of the run. The `StaRuntime` is `using`d so the thread is eventually reclaimed, but the alarm poll loop and handler are not.
**Recommendation:** Use `using MxAccessStaSession session = ...` and drop the manual `Dispose()`, or wrap the body in try/finally.
**Resolution:** 2026-05-18 — `Dispose_StopsAlarmPollLoop` now declares its `MxAccessStaSession` with a `using` declaration. The manual `session.Dispose()` is kept because the test's purpose is to observe poll behaviour across disposal — but `MxAccessStaSession.Dispose` is idempotent (guarded by the `disposed` field), so the explicit mid-test call and the `using`-scope call do not conflict. An assertion thrown anywhere in the body now still tears the session (STA poll loop + alarm handler) down. The cited line numbers in the finding were imprecise — they straddle `PollOnceCalledViaSta` and `Dispose_StopsAlarmPollLoop` — but the described root cause (one `MxAccessStaSession` constructed without `using`) was singular and is the one in `Dispose_StopsAlarmPollLoop`; the sibling tests `PollOnceCalledViaSta` and `RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue` already used `using` and needed no change.
### Worker.Tests-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/WorkerFrameProtocol.md:38-49` |
| Status | Resolved |
**Description:** `docs/WorkerFrameProtocol.md` instructs running `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests` and states the frame protocol "is part of `MxGateway.Server`". The frame protocol actually lives in `MxGateway.Worker.Ipc` and is tested by `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The doc's verification command points at the wrong project and build, so anyone following it after changing the worker frame protocol will not run the relevant tests.
**Recommendation:** Update `docs/WorkerFrameProtocol.md` to reference `src/MxGateway.Worker.Tests` and the x86 worker build (`-p:Platform=x86`).
**Resolution:** 2026-05-18 — Rewrote the `## Verification` section of `docs/WorkerFrameProtocol.md`. The test command now targets `src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests`; the build command now targets `src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86`. The prose now states the frame protocol lives in `MxGateway.Worker.Ipc` (naming `WorkerFrameReader`/`WorkerFrameWriter`/`WorkerFrameProtocolOptions` and the `WorkerFrameProtocolTests.cs` test file) and notes the worker is an x86 process. Verified against the source: the frame-protocol types are confirmed under `src/MxGateway.Worker/Ipc/` and the tests under `src/MxGateway.Worker.Tests/Ipc/`, so the original doc was wrong on both project and component. Fenced code blocks were also relabelled `powershell` (the build/test commands are run from PowerShell on this Windows dev box).
### Worker.Tests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182` |
| Status | Resolved |
**Description:** `Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging` lives in `VariantConverterTests` but asserts on `WorkerLogRedactor.RedactValue`, which has nothing to do with `VariantConverter`. It is also a near-duplicate of coverage in `WorkerLogRedactorTests`. Placing redaction coverage inside the variant-converter class is misleading.
**Recommendation:** Move this test into `Bootstrap/WorkerLogRedactorTests.cs` (which already exists and tests `RedactFields`).
**Resolution:** 2026-05-18 — The misplaced redaction test was removed from `VariantConverterTests.cs` and re-added to `Bootstrap/WorkerLogRedactorTests.cs` as `RedactValue_WithCredentialBearingFieldNames_ReturnsRedactedValue` — alongside the existing `RedactFields` coverage, where redaction tests belong. Confirmed root cause: the old test asserted only on `WorkerLogRedactor.RedactValue` and never touched `VariantConverter`. The now-orphaned `using MxGateway.Worker.Bootstrap;` was removed from `VariantConverterTests.cs` (`TreatWarningsAsErrors`). The new home is `RedactValue` per-field coverage; `WorkerLogRedactorTests.RedactFields_...` already covers the dictionary path, so the two are complementary rather than duplicates.
### Worker.Tests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs`, `AlarmDispatcherTests.cs`, `AlarmCommandExecutorTests.cs`, `AlarmRecordTransitionMapperTests.cs`, `WnWrapAlarmConsumerXmlTests.cs` |
| Status | Resolved |
**Description:** The alarm-related test files use `snake_case` method names while the rest of the project uses the `Method_Scenario_Expectation` PascalCase convention. `docs/style-guides/CSharpStyleGuide.md` and the surrounding code establish PascalCase as the project convention; the alarm files diverge.
**Recommendation:** Rename alarm-test methods to the `Method_Scenario_Expectation` PascalCase form for one consistent convention.
**Resolution:** 2026-05-18 — Renamed every `[Fact]`/`[Theory]` method in the five alarm test files from `snake_case` to the project's `Method_Scenario_Expectation` PascalCase form (46 test methods total: 10 in `AlarmCommandHandlerTests`, 8 in `AlarmDispatcherTests`, 12 in `AlarmCommandExecutorTests`, 8 in `AlarmRecordTransitionMapperTests`, 9 in `WnWrapAlarmConsumerXmlTests` minus the existing PascalCase probe methods). Only test methods were renamed. No `snake_case` helper methods are present; the method names that *look* like helpers (`Subscribe`, `PollOnce`, `Dispose` on the fake doubles) are interface implementations of `IAlarmCommandHandler`/`IAlarmTransitionConsumer`/`IDisposable` and were correctly left unchanged. The suite stays green; xUnit discovers tests by attribute, not name, so the renames are behaviour-neutral.
### Worker.Tests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258` |
| Status | Resolved |
**Description:** `StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest` asserts `Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase)`. The XML doc claims it verifies the diagnostic says "alarm consumer not configured", but the assertion only checks the substring "alarm" — which would also match an unrelated message like "invalid alarm GUID". The assertion is weaker than the documented intent.
**Recommendation:** Assert the full diagnostic phrase so the test fails if the diagnostic regresses to a misleading message.
**Resolution:** 2026-05-18 — The weak `Assert.Contains("alarm", ...)` was replaced with an exact `Assert.Equal` against the diagnostic the executor actually emits. Re-triage: the test's XML doc claimed the phrase was "alarm consumer not configured", but `MxAccessCommandExecutor.ExecuteSubscribeAlarms` (verified in `src/MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:310-315`) produces "SubscribeAlarms requires an alarm command handler; the worker was constructed without one." — the doc was wrong, so both the assertion and the XML doc were corrected to the real phrase. The test now fails if the diagnostic regresses to any other message.
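
The difference between the two assertion styles can be shown with the strings quoted in this finding. The snippet is a plain-C# sketch (no xUnit dependency) that evaluates the conditions the old and new assertions would check against the unrelated message given as an example above.

```csharp
using System;

// The diagnostic the executor actually emits, as quoted in the resolution.
const string Expected =
    "SubscribeAlarms requires an alarm command handler; the worker was constructed without one.";

// The unrelated message the finding uses as a counter-example.
string unrelated = "invalid alarm GUID";

// Condition checked by the old substring assertion.
bool substringPasses = unrelated.IndexOf("alarm", StringComparison.OrdinalIgnoreCase) >= 0;

// Condition checked by an exact-match assertion.
bool exactPasses = string.Equals(Expected, unrelated, StringComparison.Ordinal);

Console.WriteLine(substringPasses); // True: the weak check accepts the wrong message
Console.WriteLine(exactPasses);     // False: the exact check rejects it
```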
### Worker.Tests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112` |
| Status | Resolved |
**Description:** `DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply` is named and documented as if it proves cancellation arrived after execution began. The test does `Started.Wait(...)` then `cancellation.Cancel()`, which proves execution started, but because the executor is already running on the STA the cancellation is inherently a no-op — the test cannot distinguish "cancel was observed and ignored" from "cancel was never checked". The name overstates what is proven.
**Recommendation:** Either tighten the test (assert the dispatcher's cancel path was reached and declined) or rename/comment it to "cancellation cannot abort an in-flight STA command", matching `gateway.md`'s stated behavior.
**Resolution:** 2026-05-18 — Took the rename/re-document option. The test is renamed `DispatchAsync_WhenCanceledWhileExecuting_DoesNotAbortInFlightCommand` and its XML doc rewritten to state exactly what it proves — an in-flight STA command is *not* aborted by cancellation — and to state explicitly that the test cannot and does not distinguish "cancel observed and ignored" from "cancel never checked". The doc now cites `gateway.md`'s wording ("cannot safely abort an in-flight COM call on the STA"). The test body is unchanged: it already asserts the command runs to completion and returns its normal `Ok` reply, which is the genuine behaviour. No runtime behaviour changed.
### Worker.Tests-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Status | Resolved |
**Description:** `docs/WorkerFrameProtocol.md` states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." `WorkerFrameProtocolTests` covers malformed-length, wrong protocol version, wrong session, and malformed payload, but has no test for the zero-length-payload rejection or the oversized-frame rejection — both explicit security-relevant input-validation paths.
**Recommendation:** Add tests feeding a frame with `payload_length == 0` and one with `payload_length` above the configured maximum, asserting the corresponding `WorkerFrameProtocolErrorCode`.
**Resolution:** 2026-05-18 — Re-triage of the zero-length half: the finding's "no test for the zero-length-payload rejection" is partly inaccurate. The pre-existing `ReadAsync_WithMalformedLength_ThrowsMalformedLength` fed a four-zero-byte stream, which is exactly a frame declaring `payload_length == 0`, so the zero-length path *was* already covered, just under a misleading name (the length prefix itself is well-formed; only the declared length is zero). That test was renamed `ReadAsync_WithZeroLengthPayload_ThrowsMalformedLength` with an XML doc explaining the four-zero-byte construction, rather than adding a duplicate. The oversized half was a genuine gap: a new `ReadAsync_WithPayloadAboveConfiguredMaximum_ThrowsMessageTooLarge` constructs `WorkerFrameProtocolOptions` with a 64-byte maximum, feeds a length prefix of 65, and asserts `WorkerFrameProtocolErrorCode.MessageTooLarge`. Verified against `WorkerFrameReader.ReadAsync`: both checks fire before the payload buffer is rented. The small configured maximum keeps the test from allocating a multi-megabyte buffer.
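
The two length checks can be modelled without the reader itself. The sketch below assumes a 4-byte little-endian length prefix (suggested by the four-zero-byte frame and the `WriteUInt32LittleEndian` test helper) and returns the error-code names as strings. `ClassifyDeclaredLength` and `Prefix` are hypothetical helpers, not part of `WorkerFrameReader`.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical model of the pre-allocation checks: classify the declared
// payload length before any payload buffer would be rented.
string ClassifyDeclaredLength(ReadOnlySpan<byte> lengthPrefix, uint maxPayloadBytes)
{
    if (lengthPrefix.Length != 4)
        throw new ArgumentException("Length prefix must be exactly 4 bytes.", nameof(lengthPrefix));

    uint declared = BinaryPrimitives.ReadUInt32LittleEndian(lengthPrefix);
    if (declared == 0) return "MalformedLength";                 // zero-length payload
    if (declared > maxPayloadBytes) return "MessageTooLarge";    // above configured maximum
    return "Ok";
}

// Builds a 4-byte little-endian prefix declaring the given payload length.
byte[] Prefix(uint declaredLength)
{
    byte[] bytes = new byte[4];
    BinaryPrimitives.WriteUInt32LittleEndian(bytes, declaredLength);
    return bytes;
}

Console.WriteLine(ClassifyDeclaredLength(new byte[4], 64)); // MalformedLength
Console.WriteLine(ClassifyDeclaredLength(Prefix(65), 64));  // MessageTooLarge
Console.WriteLine(ClassifyDeclaredLength(Prefix(64), 64));  // Ok
```

Using a 64-byte maximum, as the new test does, exercises the oversized branch with a 4-byte input and no large allocation.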
### Worker.Tests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546` |
| Status | Resolved |
**Description:** `ThrowIfCompletedAsync` does an unconditional `await Task.Delay(TimeSpan.FromMilliseconds(100))` then checks `task.IsCompleted`. This adds a fixed 100 ms to the test and only catches a `RunAsync` that fails within that arbitrary window; a session that faults after 100 ms slips past undetected.
**Recommendation:** Replace with a deterministic race: `await Task.WhenAny(runTask, <first-expected-frame-read>)` and assert the run task did not win.
**Resolution:** 2026-05-18 — `ThrowIfCompletedAsync` was deleted (it had a single call site, in `RunAsync_SendsHeartbeatPayloadFromRuntimeSnapshot`). That test now races `runTask` against the first-heartbeat `ReadUntilAsync` with `Task.WhenAny`; if `runTask` wins it is awaited to surface the underlying fault and the test fails via `Assert.Fail`. The fixed 100 ms delay is gone — the check is now deterministic: a `RunAsync` faulting at *any* time before the first heartbeat is caught, and a healthy run completes as soon as the heartbeat arrives instead of always paying 100 ms.
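
The race described above follows a general `Task.WhenAny` pattern. In the sketch below, `runTask` and `firstHeartbeat` are stand-ins built from a `TaskCompletionSource` and a short delay; they are not the real `RunAsync` or `ReadUntilAsync` calls.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for a healthy session loop: it stays running for the whole test.
TaskCompletionSource<bool> run = new(TaskCreationOptions.RunContinuationsAsynchronously);
Task runTask = run.Task;

// Stand-in for reading the first heartbeat frame.
Task<string> firstHeartbeat = Task.Run(async () =>
{
    await Task.Delay(20);
    return "heartbeat";
});

// Whichever task finishes first wins; no fixed delay is paid.
Task winner = await Task.WhenAny(runTask, firstHeartbeat);
if (winner == runTask)
{
    await runTask; // rethrows the underlying fault, if any
    throw new InvalidOperationException("Run loop completed before the first heartbeat.");
}

string frame = await firstHeartbeat;
Console.WriteLine(frame); // heartbeat
```

If the run loop faults at any point before the heartbeat arrives, `runTask` wins the race and awaiting it surfaces the original exception.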
### Worker.Tests-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194`, `WorkerPipeSessionTests.cs:622`, `Sta/StaCommandDispatcherTests.cs:348`, `MxAccess/MxAccessStaSessionTests.cs:334`, `MxAccess/MxAccessCommandExecutorTests.cs:1124` |
| Status | Resolved |
**Description:** `FakeRuntimeSession`, `NoopComApartmentInitializer`, `NoopEventSink`/`NullEventSink`, and the `CreateFrame`/`WriteUInt32LittleEndian` helpers are re-implemented independently in multiple test files. The two `FakeRuntimeSession` implementations have already diverged (one supports `BlockDispatch`/event enqueue, one does not), and `NoopComApartmentInitializer` is defined four times.
**Recommendation:** Extract shared test doubles (`NoopComApartmentInitializer`, frame helpers, a single configurable `FakeRuntimeSession`) into a `TestSupport` folder/namespace consumed by all test classes.
**Resolution:** 2026-05-18 — Added a `src/MxGateway.Worker.Tests/TestSupport/` folder (namespace `MxGateway.Worker.Tests.TestSupport`) with four shared doubles: `NoopComApartmentInitializer`, `NoopEventSink`, `WorkerFrameTestHelpers` (`CreateFrame`/`WriteUInt32LittleEndian`), and a single configurable `FakeRuntimeSession`. The consolidated `FakeRuntimeSession` is the richer of the two divergent copies (it supports `BlockDispatch`, event enqueue, shutdown-timeout, and throw-after-release); the minimal `WorkerPipeClientTests` caller simply leaves the options unset. The per-file copies were deleted from `WorkerPipeClientTests`, `WorkerPipeSessionTests`, `StaCommandDispatcherTests`, `MxAccessStaSessionTests`, `MxAccessCommandExecutorTests`, and `WorkerFrameProtocolTests`, and the orphaned `NullEventSink` in `AlarmCommandExecutorTests` was replaced with the shared `NoopEventSink`. Re-triage: the finding says `NoopComApartmentInitializer` "is defined four times" — it was defined **three** times (`StaCommandDispatcherTests`, `MxAccessStaSessionTests`, `MxAccessCommandExecutorTests`); the fourth alarm-area `IStaComApartmentInitializer` implementation is `StaRuntimeTests.RecordingComApartmentInitializer`, which is a *recording* double (asserts init/uninit ordering), not a no-op, so it was deliberately left in place rather than folded into the shared no-op. Unused `using` directives left behind by the removals were stripped (`TreatWarningsAsErrors`).
### Worker.Tests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs` |
| Status | Resolved |
**Description:** `MxAccessEventQueueTests` covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover `Drain` with `maxEvents: 0` (drain-all) — a branch `FakeRuntimeSession.DrainEvents` even special-cases — nor draining an empty queue, nor enqueue after a manual `RecordFault`. These are minor branches but the overflow/fault interaction is the worker's backpressure contract.
**Recommendation:** Add a `Drain(0)` drain-all test and an empty-queue drain test.
**Resolution:** 2026-05-18 — Added three tests to `MxAccessEventQueueTests`. `Drain_WithZeroMaxEvents_DrainsAllEvents` covers the `maxEvents == 0` drain-all branch in `MxAccessEventQueue.Drain` (verified at `src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:174`) — three events enqueued, `Drain(0)` returns all three in order and empties the queue. `Drain_WhenQueueIsEmpty_ReturnsEmptyList` covers the `drainCount == 0` early-return branch for both `Drain(0)` and `Drain(5)` on an empty queue. `Enqueue_AfterRecordFault_ThrowsInvalidOperationException` covers the backpressure contract gap the finding flagged — after a manual `RecordFault`, `Enqueue` throws `InvalidOperationException` ("outbound event queue is faulted") and the event is not queued.
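The three branches pinned above can be illustrated with a minimal sketch. `SketchEventQueue` is a hypothetical stand-in, not the real `MxAccessEventQueue`: it assumes string events, that `maxEvents == 0` means drain-all, and that a recorded fault rejects later enqueues, as the resolution describes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MxAccessEventQueue; the real type's members may differ.
public sealed class SketchEventQueue
{
    private readonly Queue<string> _events = new Queue<string>();
    private string _fault;

    public void RecordFault(string reason)
    {
        // First fault wins: a later fault does not overwrite the original reason.
        if (_fault == null)
        {
            _fault = reason;
        }
    }

    public void Enqueue(string evt)
    {
        if (_fault != null)
        {
            throw new InvalidOperationException("outbound event queue is faulted");
        }

        _events.Enqueue(evt);
    }

    public IReadOnlyList<string> Drain(int maxEvents)
    {
        if (maxEvents < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxEvents));
        }

        // maxEvents == 0 is treated as "drain everything currently queued".
        int drainCount = maxEvents == 0 ? _events.Count : Math.Min(maxEvents, _events.Count);
        if (drainCount == 0)
        {
            // Early return covers the empty-queue case for any maxEvents.
            return Array.Empty<string>();
        }

        var drained = new List<string>(drainCount);
        for (int i = 0; i < drainCount; i++)
        {
            drained.Add(_events.Dequeue());
        }

        return drained;
    }
}
```

Under these assumptions, enqueueing three events and calling `Drain(0)` returns all three in order, a second `Drain(0)` or `Drain(5)` returns an empty list, and `Enqueue` after `RecordFault` throws.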
### Worker.Tests-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandExecutorTests.cs:317-393` |
| Status | Resolved |
**Description:** `AlarmCommandExecutorTests` reaches into `MxAccessSession` via reflection (`typeof(MxAccessSession).GetConstructor(BindingFlags.NonPublic | BindingFlags.Instance, ..., new[] { typeof(object), typeof(IMxAccessServer), typeof(IMxAccessEventSink), typeof(MxAccessHandleRegistry), typeof(MxAccessValueCache), typeof(int) }, ...)`) and provides an inline `NullMxAccessServer` no-op implementing every `IMxAccessServer` method. The XML doc admits the reflection-based path is fragile (`"MxAccessSession private ctor signature changed; update the test seam."`). Any test that exercises an executor in isolation needs the same `NullMxAccessServer` shape; the consolidated `TestSupport` namespace introduced in Worker.Tests-014 is the natural home for it, but the no-op server lives in a single test file's private nested class instead. A future change to the private ctor signature breaks this one test in a way that requires re-reading the reflection call to diagnose, and a second test that wants the same no-op surface will have to duplicate the reflection call.
**Recommendation:** Either (a) add a non-reflective seam — a constructor or static factory marked `internal`-with-`InternalsVisibleTo` that takes `IMxAccessServer` + the existing dependencies, removing the reflection — or (b) move the `NullMxAccessServer` no-op and the reflection helper into `TestSupport/NoopMxAccessSession.cs` so any future test can share it and a ctor change is fixed in one place.
**Resolution:** 2026-05-20 — Took option (a) plus option (b). Added a non-reflective `internal static MxAccessSession.CreateForTesting(IMxAccessServer, IMxAccessEventSink, MxAccessHandleRegistry?, MxAccessValueCache?, int?)` factory in `src/MxGateway.Worker/MxAccess/MxAccessSession.cs` (lines 61-88), gated through the pre-existing `<InternalsVisibleTo Include="MxGateway.Worker.Tests" />` in `src/MxGateway.Worker/MxGateway.Worker.csproj`. `AlarmCommandExecutorTests.NewExecutor` now calls `MxAccessSession.CreateForTesting(new NoopMxAccessServer(), new NoopEventSink())` — no `GetConstructor`/`Invoke`/`BindingFlags` anywhere in the file. The previously per-file `NullMxAccessServer` no-op was extracted to the shared `src/MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs` (matching the `TestSupport` consolidation introduced in Worker.Tests-014); the XML doc on the new file explicitly cites Worker.Tests-016 for the rationale. A future change to the `MxAccessSession` private ctor signature now updates `CreateForTesting` in one place; the test file does not need to be edited.
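A minimal sketch of the non-reflective seam, using hypothetical `Session`, `IServer`, and `NoopServer` types rather than the real `MxAccessSession` surface. It assumes the production constructor stays private and that the test assembly reaches the `internal` factory through `InternalsVisibleTo`; the default capacity of 16 is invented for illustration.

```csharp
using System;

// Hypothetical stand-ins; the real IMxAccessServer has many more members.
public interface IServer { }

public sealed class NoopServer : IServer { }

public sealed class Session
{
    // The private constructor is free to change shape; only the factory below must follow it.
    private Session(object comObject, IServer server, int capacity)
    {
        ComObject = comObject;
        Server = server;
        Capacity = capacity;
    }

    public object ComObject { get; }

    public IServer Server { get; }

    public int Capacity { get; }

    // Test-only seam: visible to the test assembly via InternalsVisibleTo in real code.
    internal static Session CreateForTesting(IServer server, int? capacity = null)
    {
        if (server == null)
        {
            throw new ArgumentNullException(nameof(server));
        }

        // Placeholder COM object and an assumed default capacity.
        return new Session(new object(), server, capacity ?? 16);
    }
}
```

A test then writes `Session.CreateForTesting(new NoopServer())`, and a constructor change surfaces as a compile error in the factory rather than as a runtime `GetConstructor` returning null.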
### Worker.Tests-017
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs` |
| Status | Resolved |
**Description:** `WorkerPipeSession.DispatchGatewayEnvelopeAsync` (`src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:365-385`) has three documented branches: `WorkerCommand`, `WorkerShutdown`, and `WorkerCancel`. `WorkerPipeSessionTests` exercises the first two but never sends a `WorkerCancel` envelope, so the `_runtimeSession?.CancelCommand(envelope.CorrelationId)` path and the contract that the session forwards a cancel without faulting the pipe are uncovered. The `default:` arm (`UnexpectedEnvelopeBody` exception) is also uncovered — a gateway sending the wrong body case (e.g. another `GatewayHello` after the handshake) should produce a `ProtocolViolation` fault but no test asserts this.
**Recommendation:** Add two tests: one that writes a `WorkerCancel` envelope with a known correlation id and asserts `FakeRuntimeSession.CancelCommand` was called with that id (extend the shared `FakeRuntimeSession` to record cancel-correlation-ids); one that writes a post-handshake `GatewayHello` envelope and asserts the session writes a `WorkerFault` with category `ProtocolViolation` and exits the message loop.
**Resolution:** 2026-05-20 — Added two `[Fact]`s to `WorkerPipeSessionTests` and the supporting state to the shared `FakeRuntimeSession`. (1) `RunAsync_WhenGatewaySendsWorkerCancel_ForwardsCorrelationIdToRuntimeSession` writes a `WorkerCancel` envelope with correlation id `"cancel-correlation-1"` after the handshake, then drives a normal shutdown via `SendShutdownAndWaitAsync` — observing the shutdown ack proves the message loop kept running (no fault, no exit) and `Assert.Contains("cancel-correlation-1", runtime.CancelledCorrelationIds)` proves the cancel reached `IWorkerRuntimeSession.CancelCommand`. The shared `FakeRuntimeSession` was extended with a `CancelledCorrelationIds` snapshot list and an optional `CancelCommandReturnValue` (defaulting to `false`, preserving the prior behaviour). (2) `RunAsync_WhenGatewaySendsUnexpectedEnvelopeBodyAfterHandshake_ThrowsAndExitsMessageLoop` writes a second `GatewayHello` envelope post-handshake — valid envelope, invalid body case for the message-loop state — and asserts `Assert.ThrowsAsync<WorkerFrameProtocolException>(async () => await runTask)` with `ErrorCode == WorkerFrameProtocolErrorCode.UnexpectedEnvelopeBody`. Re-triage: the original recommendation said "the session writes a `WorkerFault` with category `ProtocolViolation`", but the source at `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:380-384` shows the `default:` arm throws `WorkerFrameProtocolException`; `RunMessageLoopAsync` has no fault-writing catch (only `CompleteStartupHandshakeAsync` writes faults during the handshake). The test XML doc records this — the contract pinned is the exception type/error-code and the message-loop exit, not a fault frame.
### Worker.Tests-018
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessLiveComCreationTests.cs:18-31, 35-73, 75-145, 148-220, 222-342` |
| Status | Resolved |
**Description:** Every `[Fact]` in `MxAccessLiveComCreationTests` gates on `RunLiveMxAccessTests()` and `return`s silently when the opt-in env var is not set. xUnit reports a `Fact` that returns normally as **passed**, so a CI run without `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` shows five green "live MXAccess" tests that did not run a single line of MXAccess code. `docs/GatewayTesting.md` and the `IntegrationTests` project already provide the correct pattern — `LiveMxAccessFactAttribute` (in `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs`) emits xUnit's native `Skipped` status when the env var is absent — but `MxAccessLiveComCreationTests` does not consume it, so the gate is invisible in test output. The first test (`StartAsync_WhenOptedIn_CreatesInstalledMxAccessComObjectOnSta`) additionally inlines the env-var check (`string.Equals(Environment.GetEnvironmentVariable(...), "1", StringComparison.Ordinal)`) instead of using the local `RunLiveMxAccessTests()` helper, so the convention is inconsistent even within the same file.
**Recommendation:** Move `LiveMxAccessFactAttribute` into a shared location both projects can reference (e.g. `MxGateway.Contracts.TestSupport` or a new `MxGateway.TestSupport` shared project), and decorate the five `MxAccessLiveComCreationTests` methods with `[LiveMxAccessFact]` instead of `[Fact]`. Drop the inline env-var checks. Skipped runs will then report `Skipped` rather than `Passed`, and CI will distinguish "live MXAccess unavailable" from "live MXAccess opted in, succeeded".
**Resolution:** 2026-05-20 — Added a self-contained `LiveMxAccessFactAttribute` at `src/MxGateway.Worker.Tests/TestSupport/LiveMxAccessFactAttribute.cs` (namespace `MxGateway.Worker.Tests.TestSupport`) that mirrors the `MxGateway.IntegrationTests` attribute: when `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` is not `1`, the attribute sets `Skip` so xUnit emits a native `Skipped` result rather than a misleading `Passed`. All five `MxAccessLiveComCreationTests` methods now use `[LiveMxAccessFact]`; the inline env-var check at the top of `StartAsync_WhenOptedIn_CreatesInstalledMxAccessComObjectOnSta` and the per-method `if (!RunLiveMxAccessTests()) return;` silent-returns were deleted. The worker tests target net48/x86 and the integration tests target net10.0, so introducing a cross-project shared assembly was not practical; the Worker.Tests attribute is a near-duplicate of the IntegrationTests attribute and the XML doc on the new file calls this out so the next reviewer understands why two copies exist. xUnit output now reports the five live tests as `[SKIP]` when the env var is absent — `dotnet test ...` shows `Skipped: 9, Total: 274`, with the five `MxAccessLiveComCreationTests` correctly counted as skipped rather than passed.
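The skip-by-attribute pattern described above can be sketched as follows. This is not the file added by the resolution: the class name `LiveMxAccessFactSketchAttribute` and the skip message are assumptions, and the sketch relies only on xUnit's `FactAttribute.Skip` property and the `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS` variable named in the finding.

```csharp
using System;
using Xunit;

// Hypothetical sketch of an opt-in fact attribute; the real attribute may differ in detail.
public sealed class LiveMxAccessFactSketchAttribute : FactAttribute
{
    private const string OptInVariable = "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS";

    public LiveMxAccessFactSketchAttribute()
    {
        string value = Environment.GetEnvironmentVariable(OptInVariable);

        // Setting Skip makes the runner report Skipped instead of running the body.
        if (!string.Equals(value, "1", StringComparison.Ordinal))
        {
            Skip = "Set " + OptInVariable + "=1 to run live MXAccess tests.";
        }
    }
}
```

A test method decorated with `[LiveMxAccessFactSketch]` in place of `[Fact]` needs no env-var check in its body, because the runner never enters the method when `Skip` is set.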
### Worker.Tests-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Worker.Tests/AlarmsLiveSmokeTests.cs:45`, `src/MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs:143`, `src/MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs:55` |
| Status | Resolved |
**Description:** Worker.Tests-009 renamed every `snake_case` alarm-test method to the project's `Method_Scenario_Expectation` convention, but the rename missed the dev-rig probe and live-smoke `[Fact]`s in the `MxGateway.Worker.Tests` root (not under `MxAccess/`): `AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip`, `AlarmClientWmProbeTests.Probe_AlarmClient_for_alarm_messages` (and its helpers), and `WnWrapConsumerProbeTests.ProbeWnWrapConsumer`. These are `[Fact(Skip=...)]` so they never execute in normal CI, but they still drift from `docs/style-guides/CSharpStyleGuide.md` and contradict the resolution claim in Worker.Tests-009 that "every `[Fact]`/`[Theory]` method in the five alarm test files" was renamed.
**Recommendation:** Rename `Alarms_full_pipeline_round_trip` → `Alarms_FullPipelineRoundTrip_RaisesAndAcknowledges` (or similar `Method_Scenario_Expectation` form) and apply the same convention to the two probe methods. xUnit discovers by attribute, not name, so renames are behaviour-neutral.
**Resolution:** 2026-05-20 — Renamed the three `snake_case` probe/smoke `[Fact]` methods to the project's `Method_Scenario_Expectation` PascalCase convention: `Alarms_full_pipeline_round_trip` → `Alarms_FullPipelineRoundTrip_RaisesAndAcknowledges` (in `Probes/AlarmsLiveSmokeTests.cs`), `ProbeAlarmClientWmMessages` → `ProbeAlarmClient_OnDevRig_LogsAlarmWindowMessages` (in `Probes/AlarmClientWmProbeTests.cs`), and `ProbeWnWrapConsumer` → `ProbeWnWrapConsumer_OnDevRig_LogsXmlAlarmStream` (in `Probes/WnWrapConsumerProbeTests.cs`). The three files have moved to `Probes/` as part of Worker.Tests-023; the location columns above predate that move. xUnit discovers tests by attribute, so the renames are behaviour-neutral and the `Skip` strings still apply unchanged.
### Worker.Tests-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/MxAccess/MxAccessValueCacheTests.cs:88-108` |
| Status | Resolved |
**Description:** `TryWaitForUpdate_ReturnsFalseAfterDeadline_WhenNoSetOccurs` asserts both a lower wall-clock bound (`stopwatch.ElapsedMilliseconds >= 60`, deadline was 80ms) and `pumpCalls > 1`. The 60ms floor is the same class of timing race Worker.Tests-003/004/013 corrected elsewhere: on a loaded CI agent a `Task.Run` scheduling delay can push the wait's start past the deadline so the loop runs zero or one iteration, the wait returns slightly *before* the 60ms floor, and the test fails through no fault of the production code. The `pumpCalls > 1` check additionally races against the same scheduler: if the agent stalls the wait thread, `pumpStep` might fire only once before the deadline. The test's purpose (verifying the timeout is honoured and pump-step is invoked) is sound, but the assertions are wall-clock floors rather than deterministic checks.
**Recommendation:** Drop the elapsed-time floor and the `pumpCalls > 1` assertion; verify only that `result` is false, `value` is default, and `pumpCalls >= 1` (the pump must fire at least once, but not "more than once"). The fact that `TryWaitForUpdate` returned false after the deadline is the contract the test exists to pin; the timing strictness is incidental.
**Resolution:** 2026-05-20 — Eliminated the wall-clock dependency entirely (the equivalent of a manual time source for the `DateTime.UtcNow`-based deadline). The test now passes `DateTime.UtcNow.AddMilliseconds(-1)` — a deadline already in the past — so `TryWaitForUpdate`'s loop pumps once, immediately observes the elapsed deadline, and returns false with zero `Thread.Sleep`. The `Stopwatch`/`stopwatch.ElapsedMilliseconds >= 60` floor and the `pumpCalls > 1` strict-inequality assertions are gone. With an already-expired deadline the contract is deterministic: exactly one pump call (the loop must pump before checking the deadline so MXAccess messages can dispatch on the calling thread even when the deadline has just expired), `result == false`, `value` is default. Matches the pattern Worker.Tests-003/004/013 used — drop wall-clock floor checks in favour of a deterministic signal.
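The deterministic shape of the fix can be sketched with a simplified wait loop. `WaitSketch.TryWaitForUpdate` is hypothetical and omits the real method's value output; it assumes only the ordering the resolution states, namely that the loop pumps before it checks the deadline.

```csharp
using System;
using System.Threading;

// Hypothetical simplification of a pump-and-wait loop; not the real MxAccessValueCache API.
public static class WaitSketch
{
    public static bool TryWaitForUpdate(Func<bool> hasUpdate, Action pumpStep, DateTime deadlineUtc)
    {
        while (true)
        {
            // Pump first so pending messages dispatch even when the deadline has just expired.
            pumpStep();

            if (hasUpdate())
            {
                return true;
            }

            if (DateTime.UtcNow >= deadlineUtc)
            {
                return false;
            }

            Thread.Sleep(1);
        }
    }
}
```

Calling it with `DateTime.UtcNow.AddMilliseconds(-1)` and an update probe that always returns false yields `false` after exactly one pump call and no sleep, so a test can assert `pumpCalls == 1` without reading a clock.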
### Worker.Tests-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs` |
| Status | Resolved |
**Description:** `WorkerFrameProtocolTests` covers `MalformedLength`, `MessageTooLarge` (read-side, added in Worker.Tests-012), `ProtocolVersionMismatch`, `SessionMismatch`, and `InvalidEnvelope` on `WorkerFrameReader`. Three documented protocol-error branches remain uncovered: (1) `WorkerFrameProtocolErrorCode.EndOfStream` from `WorkerFrameReader.ReadExactlyOrThrowAsync` (`src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:106`) when the stream closes mid-frame — important because the gateway closing its end of the pipe during a partial read is the most common production transport failure; (2) `WorkerFrameWriter` rejecting an envelope whose `CalculateSize()` returns 0 with `WorkerFrameProtocolErrorCode.InvalidEnvelope` (`WorkerFrameWriter.cs:46`); (3) `WorkerFrameWriter` rejecting an envelope larger than `MaxMessageBytes` with `WorkerFrameProtocolErrorCode.MessageTooLarge` (`WorkerFrameWriter.cs:53`). The writer-side checks defend against a session that constructs a too-large envelope before sending it down the pipe — completely separate from the reader-side bounds the existing tests pin.
**Recommendation:** Add three tests: (a) `ReadAsync_WhenStreamEndsMidFrame_ThrowsEndOfStream` — feed a 4-byte length prefix declaring 100 bytes followed by only 50 bytes, assert `EndOfStream`; (b) `WriteAsync_WithEnvelopeAboveConfiguredMaximum_ThrowsMessageTooLarge` — construct `WorkerFrameProtocolOptions` with a small `MaxMessageBytes` and an envelope whose serialised size exceeds it, assert `MessageTooLarge`; (c) since `WorkerEnvelope.CalculateSize()` never returns 0 for a valid envelope (the protocol version field alone serializes), the `InvalidEnvelope` writer branch is genuinely unreachable in normal operation — either document this as defensive code that is intentionally untestable, or drop the check.
**Resolution:** 2026-05-20 — Added three `[Fact]`s to `WorkerFrameProtocolTests.cs` for the three uncovered protocol-error branches. (a) `ReadAsync_WhenStreamEndsMidFrame_ThrowsEndOfStream` builds a 4-byte length prefix declaring 100 bytes followed by only 50 bytes, drives `WorkerFrameReader.ReadAsync` against it, and asserts `WorkerFrameProtocolErrorCode.EndOfStream` — pins the gateway-closes-mid-read transport failure. (b) `WriteAsync_WithEnvelopeAboveConfiguredMaximum_ThrowsMessageTooLarge` constructs `WorkerFrameProtocolOptions` with `MaxMessageBytes=64`, builds a `GatewayHello` envelope whose `GatewayVersion` is padded to 1024 bytes, asserts `WorkerFrameProtocolErrorCode.MessageTooLarge` and that the stream stayed empty (zero bytes written). (c) `WriteAsync_WithEmptyEnvelope_ThrowsInvalidEnvelopeFromValidator` exercises the body-less path — `WorkerEnvelopeValidator.Validate` runs first and rejects an envelope whose `BodyCase` is `None` with `InvalidEnvelope`, so the `CalculateSize()==0` branch is intercepted before it fires; the XML doc explicitly documents that the defensive zero-length branch is unreachable through public API but is left in place as a one-comparison safety net against future serialisation regressions. Net change: three new tests, all green; the reader-side `EndOfStream` plus writer-side `MessageTooLarge`/`InvalidEnvelope` rejections are now regression-protected.
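The mid-frame `EndOfStream` case in (a) can be illustrated with a small length-prefixed reader. `FrameSketch` is a hypothetical, synchronous stand-in for `WorkerFrameReader`: it assumes a 4-byte little-endian length prefix (suggested by the `WriteUInt32LittleEndian` helper) and uses the BCL `EndOfStreamException` and `InvalidDataException` in place of `WorkerFrameProtocolException`.

```csharp
using System;
using System.IO;

// Hypothetical sketch of length-prefixed framing; not the real WorkerFrameReader.
public static class FrameSketch
{
    public static byte[] ReadFrame(Stream stream, int maxMessageBytes)
    {
        byte[] prefix = ReadExactly(stream, 4);

        // Decode the little-endian uint32 payload length.
        uint length = (uint)prefix[0]
            | ((uint)prefix[1] << 8)
            | ((uint)prefix[2] << 16)
            | ((uint)prefix[3] << 24);

        if (length == 0 || length > (uint)maxMessageBytes)
        {
            throw new InvalidDataException("Frame length " + length + " is outside the allowed range.");
        }

        return ReadExactly(stream, (int)length);
    }

    private static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
            {
                // The peer closed the stream before the declared byte count arrived.
                throw new EndOfStreamException("Stream ended after " + offset + " of " + count + " bytes.");
            }

            offset += read;
        }

        return buffer;
    }
}
```

Feeding it a `MemoryStream` holding a prefix that declares 100 bytes followed by only 50 payload bytes reproduces the scenario: the second `ReadExactly` call sees `Read` return 0 and throws.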
### Worker.Tests-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Worker.Tests/MxAccess/WnWrapAlarmConsumerXmlTests.cs` |
| Status | Resolved |
**Description:** `WnWrapAlarmConsumerXmlTests` covers `ParseSnapshotXml` and `TryParseHexGuid` directly — the pure-helper layer — and pins the no-internal-timer Worker-001 invariant via reflection. The `PollOnce` transition-delta logic (`src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:289-337`) is what actually turns "snapshot N to snapshot N+1" into `MxAlarmTransitionEvent` instances, and is the only place the consumer makes state-management decisions: skip-when-state-unchanged, fire-with-previous-state-Unspecified for first sighting, and (implicitly) drop entries that vanished from the new snapshot. None of these branches are exercised — the live-smoke `AlarmsLiveSmokeTests` covers the end-to-end pipeline but is `[Fact(Skip=...)]` against the dev rig, so there is no in-CI coverage of "snapshot delta computation produces the right transitions" at all. A regression that, for example, emits a transition every poll regardless of state-change would slip through.
**Recommendation:** Refactor `PollOnce`'s snapshot-diff loop into a pure `internal static IReadOnlyList<MxAlarmTransitionEvent> ComputeTransitions(Dictionary<Guid,MxAlarmSnapshotRecord> previous, Dictionary<Guid,MxAlarmSnapshotRecord> next)` and add direct unit tests: (a) new entry produces `PreviousState=Unspecified`; (b) state-unchanged produces no transition; (c) state-changed produces a transition with the prior state; (d) entry vanished from `next` produces no transition (an alarm cleared from the active set; the snapshot just no longer mentions it). `MxAccessStaSession` already drives the COM-side polling, so the diff is genuinely independent of any COM dependency.
**Resolution:** 2026-05-20 — Extracted the snapshot-diff loop from `WnWrapAlarmConsumer.PollOnce` into a pure `internal static IReadOnlyList<MxAlarmTransitionEvent> ComputeTransitions(Dictionary<Guid,MxAlarmSnapshotRecord> previous, Dictionary<Guid,MxAlarmSnapshotRecord> next)` in `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs`. `PollOnce` now calls `ComputeTransitions` under the same `syncRoot` lock; the diff rules are unchanged. Added five `[Fact]`s in `WnWrapAlarmConsumerXmlTests.cs` exercising all four branches plus a multi-alarm fan-out case: `ComputeTransitions_WhenAlarmIsNewInNextSnapshot_EmitsTransitionWithUnspecifiedPreviousState`, `ComputeTransitions_WhenAlarmStateUnchanged_EmitsNoTransition`, `ComputeTransitions_WhenAlarmStateChanged_EmitsTransitionWithPriorState`, `ComputeTransitions_WhenAlarmDroppedFromActiveSet_EmitsNoTransition`, and `ComputeTransitions_WithMixedDelta_EmitsOnlyNewAndChangedTransitions`. Each test drives the function with `Dictionary<Guid,MxAlarmSnapshotRecord>` snapshots built from a `NewRecord` helper — no COM, no STA. A regression that emits a transition every poll regardless of state, swaps the previous/next ordering, or treats a dropped alarm as a transition now fails in-CI.
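The four diff rules can be sketched as a pure function. The types here are hypothetical simplifications: `AlarmState` and `Transition` stand in for `MxAlarmSnapshotRecord` and `MxAlarmTransitionEvent`, and each snapshot is reduced to a map from alarm id to state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified types; the real records carry more fields.
public enum AlarmState { Unspecified, Active, Acknowledged }

public sealed class Transition
{
    public Transition(Guid alarmId, AlarmState previous, AlarmState current)
    {
        AlarmId = alarmId;
        Previous = previous;
        Current = current;
    }

    public Guid AlarmId { get; }

    public AlarmState Previous { get; }

    public AlarmState Current { get; }
}

public static class SnapshotDiffSketch
{
    public static IReadOnlyList<Transition> ComputeTransitions(
        IReadOnlyDictionary<Guid, AlarmState> previous,
        IReadOnlyDictionary<Guid, AlarmState> next)
    {
        var transitions = new List<Transition>();

        // Only entries in the new snapshot can produce transitions,
        // so alarms that vanished from it are dropped silently.
        foreach (KeyValuePair<Guid, AlarmState> entry in next)
        {
            AlarmState prior;
            bool known = previous.TryGetValue(entry.Key, out prior);

            if (known && prior == entry.Value)
            {
                continue; // State unchanged: no transition.
            }

            // First sighting reports Unspecified as the previous state.
            transitions.Add(new Transition(entry.Key, known ? prior : AlarmState.Unspecified, entry.Value));
        }

        return transitions;
    }
}
```

Because the function takes and returns plain collections, each rule can be checked by building two dictionaries and inspecting the returned list, with no COM object or STA thread involved.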
### Worker.Tests-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs` (779 lines), `src/MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs` (287 lines), `src/MxGateway.Worker.Tests/AlarmsLiveSmokeTests.cs` (270 lines) |
| Status | Resolved |
**Description:** Three large dev-rig "probe" files are mixed into the worker unit-test project but are not unit tests in the usual sense: each is a `[Fact(Skip="Runtime probe — flip Skip=null on the dev rig (AVEVA installed)...")]` driver that runs for hundreds of seconds, opens real Galaxy subscriptions, posts Windows messages on STA threads, captures alarm payloads to `ITestOutputHelper`, and exists to document AVEVA COM behaviour rather than gate it. `AlarmClientWmProbeTests` alone is 779 lines, larger than every genuine unit-test file in the project. Together the three files add more than 1,300 lines of probe scaffolding to the build, which anyone inspecting the project to work out what `Worker.Tests` is for has to wade through. The Skip-attribute strings document why they exist, but a colocated `docs/AlarmProbes.md` (or moving the probes to a separate `MxGateway.Worker.Probes` non-test assembly) would make the distinction explicit and stop the probe files from inflating `Worker.Tests`' build/test surface.
**Recommendation:** Either (a) carve the three probe files out into `src/MxGateway.Worker.Probes/` (a separate project the dev-rig user opts into; the assembly references stay the same), or (b) move them into a `Probes/` subfolder inside `MxGateway.Worker.Tests` and add a one-paragraph header in `docs/GatewayTesting.md` describing the probe surface. Option (a) is cleaner because the live-smoke `AlarmsLiveSmokeTests` already references `WnWrapAlarmConsumer` directly and would naturally cohabit with the other AVEVA-COM probes.
**Resolution:** 2026-05-20 — Took option (b): moved `AlarmClientWmProbeTests.cs`, `WnWrapConsumerProbeTests.cs`, and `AlarmsLiveSmokeTests.cs` from `src/MxGateway.Worker.Tests/` into a new `src/MxGateway.Worker.Tests/Probes/` subfolder. The files keep their existing namespace (`MxGateway.Worker.Tests`) and their `[Fact(Skip=...)]` gating; the SDK-style project picks them up under the new path without a `.csproj` change. Option (b) was chosen over (a) because the probes still rely on the same test-project package references (`xunit`, `Microsoft.NET.Test.Sdk`, `Xunit.Abstractions`) plus the `Interop.WNWRAPCONSUMERLib`/`ArchestrA.MxAccess`/`aaAlarmManagedClient`/`IAlarmMgrDataProvider` references already declared in `MxGateway.Worker.Tests.csproj`; a separate `MxGateway.Worker.Probes` project would have to duplicate every one of these. The probes remain runnable on the dev rig by flipping `Skip=null` exactly as before. The `Worker.Tests` root listing now contains only genuine unit-test/regression files; probe scaffolding is visibly partitioned by directory.
### Worker.Tests-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs:42-54` |
| Status | Resolved |
**Description:** `Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer` asserts that an exception during `IMxAccessAlarmConsumer.Subscribe` triggers consumer disposal. The fake throws `new InvalidOperationException("simulated wnwrap subscribe failure")` and the test asserts `Assert.Throws<InvalidOperationException>(() => handler.Subscribe(...))`. But `AlarmCommandHandler.Subscribe` (`src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:65-93`) wraps the underlying call and re-throws, so an `InvalidOperationException` from any code path inside `Subscribe` (e.g. its own "already subscribed" guard at line 73) would also satisfy the type assertion. The disposal behaviour itself is pinned: if `AlarmCommandHandler` regressed to throw before reaching the consumer, `consumer.Disposed` would be false and the test's additional `consumer.Disposed` assertion would fail. The genuine weakness is that the test does not pin the exception message ("simulated wnwrap subscribe failure"), so an unexpected `InvalidOperationException` from a different branch with a misleading message would pass without anyone noticing the handler swallowed the real failure cause.
**Recommendation:** Strengthen to `InvalidOperationException exception = Assert.Throws<InvalidOperationException>(...); Assert.Contains("simulated wnwrap subscribe failure", exception.Message)` — pin both the type and the originating message so a regression that throws a *different* `InvalidOperationException` from inside `AlarmCommandHandler` fails the test.
**Resolution:** 2026-05-20 — `Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer` now captures the thrown exception and asserts `Assert.Contains("simulated wnwrap subscribe failure", exception.Message)` against the fake's exact thrown message. A regression that throws a *different* `InvalidOperationException` from inside `AlarmCommandHandler` (for example its own "already subscribed" guard at line 73 of `AlarmCommandHandler.cs`) now fails the message-contains assertion — the original test's type-only `Assert.Throws<InvalidOperationException>` would have passed silently while hiding the swallowed failure cause. The disposal assertion (`consumer.Disposed == true`) is unchanged; the test now pins both the disposal contract and the origin of the propagated exception. XML doc on the test method documents the regression scenario.