mxaccessgw/code-reviews/Worker.Tests/findings.md
Joseph Doherty d2d2e5f68f code-review 2026-05-24: re-review at d692232 across all 11 modules
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).

Module status

| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server           | 8 | 43 | +6 |
| Contracts        | 2 | 17 | +2 |
| Tests            | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java      | 5 | 31 | +5 |
| Client.Rust      | 1 | 21 | +1 |
| Worker           | 0 | 25 |  0 (rename-only diff, clean) |
| Worker.Tests     | 0 | 30 |  0 (rename-only diff, clean) |
| Client.Dotnet    | 0 | 17 |  0 (rename + alarm-fix diff, clean) |
| Client.Python    | 0 | 21 |  0 (rename + alarm-fix diff, clean) |
| Client.Go        | 0 | 21 |  0 (rename + alarm-fix diff, clean) |

Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.

New findings

  * Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
    any session id from any Viewer; no per-session ACL guards the
    EventsHub group fan-out.
  * Server-039 (Low / Error handling) — HubTokenService.Validate
    accepts a payload with null Name/NameIdentifier.
  * Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
    full-vs-RDN lookup precedence.
  * Server-041 (Low / Design adherence) — EventStreamService calls
    IDashboardEventBroadcaster.Publish without a try/catch — fragile
    seam relying on the never-throw contract.
  * Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
    retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
  * Server-043 (Low / Documentation) — HubTokenService singleton
    sharing across login + hub-token validation undocumented.

  * Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
    reserved-for-future-use ambiguity.
  * Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
    omits the alarm_filter_prefix filter description.

  * Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
    fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
  * Tests-026 (Medium / Testing coverage) — no test proves
    EventStreamService actually calls IDashboardEventBroadcaster.Publish.

  * IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
    silent fallback to Directory.GetCurrentDirectory().
  * IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
    success-path asserts ldap_group but not the Role claim.
  * IntegrationTests-024 (Low / Conventions) — inline
    NullDashboardEventBroadcaster fake duplicates Tests-side copies.

  * Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
    Gradle task names still use the old short project names.
  * Client.Java-028 (Medium / Design adherence) — JavaClientDesign
    build-layout shows the old `com/dohertylan/mxgateway/` package paths.
  * Client.Java-029 (Low / Documentation) — README installDist path
    cites the wrong directory.
  * Client.Java-030 (Low / Testing coverage) — no Java test exercises
    the regenerated QueryActiveAlarmsRequest RPC.
  * Client.Java-031 (Low / Conventions) — README prose uses old short
    project names instead of canonical prefixed ones.

  * Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
    "Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
    that does not exist; actual layout is the flat top-level crate.

Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.

Process notes

- The `code-reviews/` tree was not in this working copy's git
  history (the local extract pre-dates the divergent branch that
  carried the reviews). Restored from `dd7ca16` via
  `git checkout dd7ca16 -- code-reviews/` before the re-review.
- Some "Resolved" entries in the restored findings.md reference
  fixes that landed on the divergent branch (the same one that
  carried the reviews) and are not present on the current main
  lineage. The re-review treats those statuses as historical;
  the new pass only files findings against HEAD's actual state.
- `python code-reviews/regen-readme.py --check` is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:34:30 -04:00


Code Review — Worker.Tests

| Field | Value |
|---|---|
| Module | src/ZB.MOM.WW.MxGateway.Worker.Tests |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | d692232 |
| Status | Reviewed |
| Open findings | 0 |

Checklist coverage

2026-05-18 review (commit 6c64030)

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-010 (weak substring assertion), Worker.Tests-011 (test name overstates what it proves). |
| 2 | mxaccessgw conventions | Tests respect STA-affinity and the WorkerEnvelope frame protocol; naming-convention drift only (Worker.Tests-009). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-003/004/013 (wall-clock and fixed-delay timing assertions). |
| 4 | Error handling & resilience | COMException/HResult, pipe-never-appears, malformed frames, shutdown-during-command, watchdog all covered; queue branch gap (Worker.Tests-015). |
| 5 | Security | No real secrets; redaction explicitly tested. No issues found. |
| 6 | Performance & resource management | Issues found: Worker.Tests-005 (MemoryStream not disposed), Worker.Tests-006 (MxAccessStaSession leak on assertion failure). |
| 7 | Design-document adherence | Tests match docs/Worker*.md; docs/WorkerFrameProtocol.md is stale (Worker.Tests-007). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-009 (two naming conventions), Worker.Tests-014 (duplicated test doubles). |
| 9 | Testing coverage | Issues found: Worker.Tests-001 (StaMessagePump untested), Worker.Tests-002 (COM-event delivery untested), Worker.Tests-012 (frame-validation gaps). |
| 10 | Documentation & comments | Issues found: Worker.Tests-008 (misplaced redaction test), Worker.Tests-011 (misleading test name). |

2026-05-20 re-review (commit 1cd51bb)

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Worker.Tests-018 (silent-skip masquerades as passing tests), Worker.Tests-024 (Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer swallows the real exception type). |
| 2 | mxaccessgw conventions | Issues found: Worker.Tests-019 (AlarmsLiveSmokeTests uses snake_case outside the alarm-method scope Worker.Tests-009 corrected); pre-existing LiveMxAccessFactAttribute is not consumed by MxAccessLiveComCreationTests (Worker.Tests-018). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-020 (MxAccessValueCacheTests.TryWaitForUpdate_ReturnsFalseAfterDeadline_WhenNoSetOccurs asserts wall-clock floor and pump-call lower bound). |
| 4 | Error handling & resilience | Issues found: Worker.Tests-021 (WorkerFrameProtocolErrorCode.EndOfStream and the writer-side MessageTooLarge/InvalidEnvelope branches are uncovered). |
| 5 | Security | Redaction coverage is sound; no new issues. |
| 6 | Performance & resource management | No new issues: MemoryStream/session-disposal hygiene fixes from the prior pass hold; WorkerFrameReader ArrayPool rent/return path is now regression-tested. |
| 7 | Design-document adherence | No new issues. |
| 8 | Code organization & conventions | Issues found: Worker.Tests-016 (the now-shared MxAccessSession reflection construction in AlarmCommandExecutorTests duplicates the testable surface the consolidated TestSupport folder was meant to host). |
| 9 | Testing coverage | Issues found: Worker.Tests-017 (WorkerCancel envelope-dispatch path untested), Worker.Tests-022 (WnWrapAlarmConsumer.PollOnce transition-delta computation untested at the snapshot-to-transitions level). |
| 10 | Documentation & comments | Issues found: Worker.Tests-023 (AlarmClientWmProbeTests and WnWrapConsumerProbeTests are unit-test classes carrying 1000+ lines of probe-only code; their [Fact(Skip=...)] status is documented but the probe scaffolding is mixed into the same test assembly as regression tests). |

2026-05-20 re-review (commit a020350)

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No new issues: Worker.Tests-018/024 fixes hold; the new WriteAsync_WithEmptyEnvelope_ThrowsInvalidEnvelopeFromValidator correctly documents that the writer-side defensive zero-length branch is intercepted by WorkerEnvelopeValidator.Validate. |
| 2 | mxaccessgw conventions | Issues found: Worker.Tests-025 (LiveMxAccessFactAttribute duplicated in Worker.Tests and IntegrationTests with no shared constant; divergent-by-drift risk). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-027 (FakeRuntimeSession.CancelCommandReturnValue mutated without the same gate lock that protects cancelledCorrelationIds/snapshot/events). |
| 4 | Error handling & resilience | No new issues: Worker.Tests-021 closed all three uncovered protocol branches. |
| 5 | Security | No new issues. |
| 6 | Performance & resource management | No new issues. |
| 7 | Design-document adherence | Issues found: Worker.Tests-028 (the Worker.Tests-023 resolution promised a docs/GatewayTesting.md paragraph describing the probe surface; the doc was never updated, so the partition is invisible outside the source tree). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-026 (MxAccessSession.CreateForTesting has no runtime guard preventing accidental production use; only the internal modifier plus InternalsVisibleTo separates it from the live Create path); Worker.Tests-029 (Probes moved to Probes/ folder but kept the unit-test MxGateway.Worker.Tests namespace, so a namespace-based filter cannot distinguish probes from regression tests). |
| 9 | Testing coverage | No new issues: the five LiveMxAccessFact-gated tests in MxAccessLiveComCreationTests and the ComputeTransitions unit tests close the previously identified gaps. |
| 10 | Documentation & comments | Issues found: Worker.Tests-030 (CreateCancelEnvelope uses Sequence = 4 while the immediately-following CreateShutdownEnvelope uses Sequence = 3; the cancel test writes them in 4-then-3 order, which works because the worker has no inbound sequence-monotonicity check, but the numbering is misleading to a future reader and contradicts the gateway-side monotonic-sequence convention gateway.md documents for outbound). |

2026-05-24 review (commit d692232)

Re-review pass at d692232. Diff against a020350 is the rename-only namespace/csproj update (commit dc9c0c9). The InternalsVisibleTo on the Worker project points at the new ZB.MOM.WW.MxGateway.Worker.Tests assembly name; the live-test gating attribute still reads the unchanged MXGATEWAY_RUN_LIVE_MXACCESS_TESTS env var. No behavioural changes; the prior findings (Worker.Tests-001 through -030) are unaffected.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
| 2 | mxaccessgw conventions | No issues found: namespaces updated; env-var names unchanged. |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found in this diff. |
| 5 | Security | No issues found in this diff. |
| 6 | Performance & resource management | No issues found in this diff. |
| 7 | Design-document adherence | No issues found in this diff. |
| 8 | Code organization & conventions | No issues found in this diff. |
| 9 | Testing coverage | No issues found in this diff. |
| 10 | Documentation & comments | No issues found in this diff. |

Findings

Worker.Tests-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/Sta/ (no StaMessagePumpTests.cs) |
| Status | Resolved |

Description: StaMessagePump — whose entire reason for existing is pumping Windows messages so MXAccess COM event sink calls deliver onto the STA — has no direct unit test. WaitForWorkOrMessages (timeout conversion, the MsgWaitForMultipleObjectsEx failure path) and PumpPendingMessages (drain count) are exercised only indirectly via StaRuntime, which never asserts the pump returns/throws correctly. The MsgWaitFailed error branch and ToTimeoutMilliseconds edge cases (InfiniteTimeSpan, <= Zero, >= uint.MaxValue) are completely uncovered.

Recommendation: Add StaMessagePumpTests that post a Windows message to the STA thread and assert PumpPendingMessages returns the expected count; cover WaitForWorkOrMessages waking on a signaled event vs timeout; cover ToTimeoutMilliseconds boundaries through an internals-visible seam.

Resolution: 2026-05-18 — Added src/MxGateway.Worker.Tests/Sta/StaMessagePumpTests.cs (8 [Fact] tests, run on dedicated STA threads). Covers WaitForWorkOrMessages null-argument validation, returning immediately when the wake event is pre-signalled, waking when the event is signalled mid-wait, returning on timeout when never signalled, the TimeSpan.Zero (<= Zero) conversion branch, and waking on a WM_NULL Windows message posted to the STA thread (the QS_ALLINPUT path). PumpPendingMessages is covered for both an empty queue (returns 0) and three posted messages (returns 3). Boundary noted in the file: the MsgWaitFailed branch is not exercised because forcing MsgWaitForMultipleObjectsEx to fail needs a deliberately invalid native handle, which is unsafe to construct in-process; ToTimeoutMilliseconds is private static and is covered indirectly through wait-latency assertions rather than reflection.
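For readers who want to see the three ToTimeoutMilliseconds boundaries concretely, the following is a minimal sketch of such a conversion. It is a hypothetical stand-in, not the code in StaMessagePump: the class name, the rounding, and the choice to clamp oversized values just below INFINITE are all assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the private ToTimeoutMilliseconds helper.
// The real StaMessagePump implementation may round or clamp differently.
public static class TimeoutConversionSketch
{
    // Win32 INFINITE: the "wait forever" value accepted by MsgWaitForMultipleObjectsEx.
    public const uint Infinite = 0xFFFFFFFF;

    public static uint ToTimeoutMilliseconds(TimeSpan timeout)
    {
        // Must be checked first: InfiniteTimeSpan is -1 ms and would otherwise
        // fall into the "<= Zero" branch below.
        if (timeout == Timeout.InfiniteTimeSpan)
        {
            return Infinite;
        }

        if (timeout <= TimeSpan.Zero)
        {
            return 0; // poll without blocking
        }

        double milliseconds = timeout.TotalMilliseconds;

        // Assumption: clamp just below INFINITE so a huge finite timeout
        // is never mistaken for "wait forever".
        if (milliseconds >= uint.MaxValue - 1)
        {
            return uint.MaxValue - 1;
        }

        return (uint)Math.Ceiling(milliseconds);
    }
}
```

A pure function of this shape can be tested directly at each boundary, which is the kind of internals-visible seam the recommendation had in mind; the resolution instead covers the private method indirectly through wait-latency assertions.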

Worker.Tests-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs, src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs |
| Status | Resolved |

Description: No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the MxAccessEventQueue. MxAccessEventMapperTests exercises the mapper directly with hand-built fakes, and AlarmDispatcherTests covers the alarm sink, but the non-alarm COM-event path (MxAccessBaseEventSink/MxAccessComServer event handlers → MxAccessEventMapper → queue, triggered by an actual sink callback) is never end-to-end tested. Given the worker's core purpose is to convert COM events to protobuf, this is a significant gap.

Recommendation: Add a test that invokes the base event sink's data-change handler (via an internal seam or a fake COM event source) and asserts a converted WorkerEvent with correct family/sequence appears in the queue.

Resolution: 2026-05-18 — Added src/MxGateway.Worker.Tests/MxAccess/MxAccessBaseEventSinkTests.cs (5 [Fact] tests). The four MxAccessBaseEventSink COM event handlers (OnDataChange, OnWriteComplete, OperationComplete, OnBufferedDataChange) — the exact delegate targets the MXAccess COM runtime invokes — were widened from private to internal (with XML-doc notes that this is a unit-test seam), and [assembly: InternalsVisibleTo("MxGateway.Worker.Tests")] was added to MxGateway.Worker.csproj. The tests construct a real MxAccessBaseEventSink over a real MxAccessEventMapper and MxAccessEventQueue, invoke each handler with COM-style arguments, and assert a correctly-converted protobuf WorkerEvent (family, body case, server/item handle, value, quality, source timestamp, monotonic WorkerSequence) lands in the queue. Boundary noted in the file: the COM += wire-up in Attach/Detach casts to the sealed LMXProxyServerClass RCW and cannot run without a live MXAccess COM object, so it is not exercised; invoking the handlers directly reproduces an STA-thread COM callback and exercises the genuine conversion + enqueue path.

Worker.Tests-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker.Tests/Sta/StaRuntimeTests.cs:46-48 |
| Status | Resolved |

Description: InvokeAsync_WakesIdlePumpForQueuedCommand asserts stopwatch.Elapsed < TimeSpan.FromSeconds(2) — a wall-clock assertion that on a loaded CI agent can exceed 2s, producing a false failure. The test also does not actually prove the wake event (vs the 50 ms idle pump) caused the dispatch.

Recommendation: Remove the wall-clock assertion (the awaited result already proves the command ran), or raise the budget substantially with a comment that it is a coarse smoke check.

Resolution: 2026-05-18 — Removed the Stopwatch and the stopwatch.Elapsed < TimeSpan.FromSeconds(2) wall-clock assertion from InvokeAsync_WakesIdlePumpForQueuedCommand. The test already constructs the StaRuntime with a 30-second idle pump period, so the awaited InvokeAsync completing at all proves the command wake event — not the idle pump tick — drove the dispatch; no timing budget is needed. The XML-doc comment now states this explicitly. The now-unused using System.Diagnostics; was removed (TreatWarningsAsErrors).

Worker.Tests-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:281-329 |
| Status | Resolved |

Description: StartAsync_WithAlarmCommandHandlerFactory_PollOnceCalledViaSta and Dispose_StopsAlarmPollLoop use poll-until loops, and Dispose_StopsAlarmPollLoop additionally does await Task.Delay(1000) then asserts PollCount is unchanged. The 1s "no further polls" window is a timing race: a poll scheduled just before disposal could increment the counter afterward, and a slow agent could simply not run a poll in the window even without correct stop logic.

Recommendation: Make the poll loop deterministically observable — expose a "poll loop stopped" signal or have Dispose join the poll task — then assert on that rather than on elapsed-time silence.

Resolution: 2026-05-18 — MxAccessStaSession.Dispose now joins the alarm poll task (pollTaskToJoin.Wait(TimeSpan.FromSeconds(5))) after cancelling the poll CTS, instead of setting alarmPollTask = null and discarding it. Once Dispose returns, the poll loop has provably exited and no PollOnce call can still be in flight. Dispose_StopsAlarmPollLoop was rewritten to drop the await Task.Delay(1000) "no further polls" window: it now captures PollCount immediately after Dispose() returns and re-asserts equality after a bare await Task.Yield() — a deterministic frozen-count check rather than an elapsed-time race. The success-direction poll-until loop in PollOnceCalledViaSta was left as-is: waiting for an event to occur is sound; only waiting for an event to not occur is the race, and that pattern is now eliminated. Note: ShutdownGracefullyAsync already joined the poll task, so this change makes Dispose consistent with the graceful path.
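The join-on-dispose pattern described in this resolution can be sketched in miniature as follows. This is an illustrative sketch, not the real MxAccessStaSession: the class, its fields, and the counter-based "poll" are hypothetical, and only the 5-second join timeout is taken from the resolution text.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical miniature of the pattern; not the real MxAccessStaSession.
public sealed class PollingSessionSketch : IDisposable
{
    private readonly CancellationTokenSource pollCancellation = new CancellationTokenSource();
    private readonly Task pollTask;
    private int pollCount;
    private bool disposed;

    public PollingSessionSketch(TimeSpan pollInterval)
    {
        pollTask = Task.Run(() => RunPollLoopAsync(pollInterval, pollCancellation.Token));
    }

    public int PollCount => Volatile.Read(ref pollCount);

    public void Dispose()
    {
        if (disposed)
        {
            return; // idempotent: a second call is a no-op
        }

        disposed = true;
        pollCancellation.Cancel();

        try
        {
            // Join the loop instead of discarding the task, so no poll can
            // still be in flight once Dispose returns (bounded at 5 seconds).
            pollTask.Wait(TimeSpan.FromSeconds(5));
        }
        catch (AggregateException)
        {
            // The loop faulted; it has exited either way.
        }

        pollCancellation.Dispose();
    }

    private async Task RunPollLoopAsync(TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            Interlocked.Increment(ref pollCount); // stands in for PollOnce

            try
            {
                await Task.Delay(interval, token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return; // cancelled during the wait: exit cleanly
            }
        }
    }
}
```

With the join in place, a test can read PollCount immediately after Dispose() and assert it never changes, with no delay window needed to observe that polling has stopped.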

Worker.Tests-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105, src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31 |
| Status | Resolved |

Description: MemoryStream instances are created and never disposed across the frame-protocol and pipe-session tests (MemoryStream stream = new(); with no using). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suite, which consistently wraps CancellationTokenSource, StaRuntime, and PipePair in using declarations. WorkerFrameWriter/WorkerFrameReader are also constructed without disposal.

Recommendation: Wrap MemoryStream (and reader/writer if they are IDisposable) in using declarations for consistency.

Resolution: 2026-05-18 — All six MemoryStream test-body declarations in WorkerFrameProtocolTests.cs and the five inbound/outbound MemoryStream declarations in the WorkerPipeSessionTests.cs handshake tests were converted to using declarations, matching how the rest of the suite handles CancellationTokenSource/StaRuntime/PipePair. Re-triage of the parenthetical: WorkerFrameWriter and WorkerFrameReader are not IDisposable (sealed class with no IDisposable and no Dispose member — verified in src/MxGateway.Worker/Ipc/), so the finding's "reader/writer if they are IDisposable" suggestion does not apply and no change was made there. The shared MemoryStream instances inside the WorkerPipeSessionTests harness/helper classes (ReadWrittenFrames parameter, the PipePair/harness fields) are out of the cited line scope and were left untouched.

Worker.Tests-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323 |
| Status | Resolved |

Description: Dispose_StopsAlarmPollLoop constructs MxAccessStaSession session without using (unlike every sibling test) and relies on an explicit session.Dispose(). If an assertion between StartAsync and Dispose() throws, the session (its STA thread and poll loop) leaks for the rest of the run. The StaRuntime is declared with using, so the thread is eventually reclaimed, but the alarm poll loop and handler are not.

Recommendation: Use using MxAccessStaSession session = ... and drop the manual Dispose(), or wrap the body in try/finally.

Resolution: 2026-05-18 — Dispose_StopsAlarmPollLoop now declares its MxAccessStaSession with a using declaration. The manual session.Dispose() is kept because the test's purpose is to observe poll behaviour across disposal — but MxAccessStaSession.Dispose is idempotent (guarded by the disposed field), so the explicit mid-test call and the using-scope call do not conflict. An assertion thrown anywhere in the body now still tears the session (STA poll loop + alarm handler) down. The cited line numbers in the finding were imprecise — they straddle PollOnceCalledViaSta and Dispose_StopsAlarmPollLoop — but the described root cause (one MxAccessStaSession constructed without using) was singular and is the one in Dispose_StopsAlarmPollLoop; the sibling tests PollOnceCalledViaSta and RunAlarmPollLoop_WhenPollOnceThrows_RecordsFaultOnEventQueue already used using and needed no change.

Worker.Tests-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | docs/WorkerFrameProtocol.md:38-49 |
| Status | Resolved |

Description: docs/WorkerFrameProtocol.md instructs running dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests and states the frame protocol "is part of MxGateway.Server". The frame protocol actually lives in MxGateway.Worker.Ipc and is tested by src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs. The doc's verification command points at the wrong project and build, so anyone following it after changing the worker frame protocol will not run the relevant tests.

Recommendation: Update docs/WorkerFrameProtocol.md to reference src/MxGateway.Worker.Tests and the x86 worker build (-p:Platform=x86).

Resolution: 2026-05-18 — Rewrote the ## Verification section of docs/WorkerFrameProtocol.md. The test command now targets src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests; the build command now targets src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86. The prose now states the frame protocol lives in MxGateway.Worker.Ipc (naming WorkerFrameReader/WorkerFrameWriter/WorkerFrameProtocolOptions and the WorkerFrameProtocolTests.cs test file) and notes the worker is an x86 process. Verified against the source: the frame-protocol types are confirmed under src/MxGateway.Worker/Ipc/ and the tests under src/MxGateway.Worker.Tests/Ipc/, so the original doc was wrong on both project and component. Fenced code blocks were also relabelled powershell (the build/test commands are run from PowerShell on this Windows dev box).

Worker.Tests-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Worker.Tests/Conversion/VariantConverterTests.cs:175-182 |
| Status | Resolved |

Description: Redactor_WithCredentialBearingValueFields_RedactsBeforeLogging lives in VariantConverterTests but asserts on WorkerLogRedactor.RedactValue, which has nothing to do with VariantConverter. It is also a near-duplicate of coverage in WorkerLogRedactorTests. Placing redaction coverage inside the variant-converter class is misleading.

Recommendation: Move this test into Bootstrap/WorkerLogRedactorTests.cs (which already exists and tests RedactFields).

Resolution: 2026-05-18 — The misplaced redaction test was removed from VariantConverterTests.cs and re-added to Bootstrap/WorkerLogRedactorTests.cs as RedactValue_WithCredentialBearingFieldNames_ReturnsRedactedValue, alongside the existing RedactFields coverage, where redaction tests belong. Confirmed root cause: the old test asserted only on WorkerLogRedactor.RedactValue and never touched VariantConverter. The now-orphaned using MxGateway.Worker.Bootstrap; was removed from VariantConverterTests.cs (TreatWarningsAsErrors). The relocated test covers RedactValue on a per-field basis, while WorkerLogRedactorTests.RedactFields_... already covers the dictionary path, so the two are complementary rather than duplicates.

Worker.Tests-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs, AlarmDispatcherTests.cs, AlarmCommandExecutorTests.cs, AlarmRecordTransitionMapperTests.cs, WnWrapAlarmConsumerXmlTests.cs |
| Status | Resolved |

Description: The alarm-related test files use snake_case method names while the rest of the project uses the Method_State_Result PascalCase convention. docs/style-guides/CSharpStyleGuide.md and the surrounding code establish PascalCase as the project convention; the alarm files diverge.

Recommendation: Rename alarm-test methods to the Method_Scenario_Expectation PascalCase form for one consistent convention.

Resolution: 2026-05-18 — Renamed every [Fact]/[Theory] method in the five alarm test files from snake_case to the project's Method_Scenario_Expectation PascalCase form (46 test methods total: 10 in AlarmCommandHandlerTests, 8 in AlarmDispatcherTests, 12 in AlarmCommandExecutorTests, 8 in AlarmRecordTransitionMapperTests, 9 in WnWrapAlarmConsumerXmlTests minus the existing PascalCase probe methods). Only test methods were renamed. No snake_case helper methods are present: the method names that look like helpers (Subscribe, PollOnce, Dispose on the fake doubles) are interface implementations of IAlarmCommandHandler/IAlarmTransitionConsumer/IDisposable and were correctly left unchanged. The suite stays green; xUnit discovers tests by attribute, not name, so the renames are behaviour-neutral.

Worker.Tests-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:230-258 |
| Status | Resolved |

Description: StartAsync_WithoutAlarmCommandHandlerFactory_SubscribeAlarmsReturnsInvalidRequest asserts Assert.Contains("alarm", reply.DiagnosticMessage, StringComparison.OrdinalIgnoreCase). The XML doc claims it verifies the diagnostic says "alarm consumer not configured", but the assertion only checks the substring "alarm" — which would also match an unrelated message like "invalid alarm GUID". The assertion is weaker than the documented intent.

Recommendation: Assert the full diagnostic phrase so the test fails if the diagnostic regresses to a misleading message.

Resolution: 2026-05-18 — The weak Assert.Contains("alarm", ...) was replaced with an exact Assert.Equal against the diagnostic the executor actually emits. Re-triage: the test's XML doc claimed the phrase was "alarm consumer not configured", but MxAccessCommandExecutor.ExecuteSubscribeAlarms (verified in src/MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:310-315) produces "SubscribeAlarms requires an alarm command handler; the worker was constructed without one." — the doc was wrong, so both the assertion and the XML doc were corrected to the real phrase. The test now fails if the diagnostic regresses to any other message.

Worker.Tests-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Worker.Tests/Sta/StaCommandDispatcherTests.cs:92-112 |
| Status | Resolved |

Description: DispatchAsync_WhenCanceledAfterExecutionStarts_StillReturnsLateReply is named and documented as if it proves cancellation arrived after execution began. The test does Started.Wait(...) then cancellation.Cancel(), which proves execution started, but because the executor is already running on the STA the cancellation is inherently a no-op — the test cannot distinguish "cancel was observed and ignored" from "cancel was never checked". The name overstates what is proven.

Recommendation: Either tighten the test (assert the dispatcher's cancel path was reached and declined) or rename/comment it to "cancellation cannot abort an in-flight STA command", matching gateway.md's stated behavior.

Resolution: 2026-05-18 — Took the rename/re-document option. The test is renamed DispatchAsync_WhenCanceledWhileExecuting_DoesNotAbortInFlightCommand and its XML doc rewritten to state exactly what it proves — an in-flight STA command is not aborted by cancellation — and to state explicitly that the test cannot and does not distinguish "cancel observed and ignored" from "cancel never checked". The doc now cites gateway.md's wording ("cannot safely abort an in-flight COM call on the STA"). The test body is unchanged: it already asserts the command runs to completion and returns its normal Ok reply, which is the genuine behaviour. No runtime behaviour changed.

Worker.Tests-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs |
| Status | Resolved |

Description: docs/WorkerFrameProtocol.md states the reader "rejects zero-length payloads and payloads larger than the configured maximum (default 16 MiB) before allocating the payload buffer." WorkerFrameProtocolTests covers malformed-length, wrong protocol version, wrong session, and malformed payload, but has no test for the zero-length-payload rejection or the oversized-frame rejection — both explicit security-relevant input-validation paths.

Recommendation: Add tests feeding a frame with payload_length == 0 and one with payload_length above the configured maximum, asserting the corresponding WorkerFrameProtocolErrorCode.

Resolution: 2026-05-18 — Re-triage of the zero-length half: the finding's "no test for the zero-length-payload rejection" is partly inaccurate. The pre-existing ReadAsync_WithMalformedLength_ThrowsMalformedLength fed a four-zero-byte stream, which is exactly a frame declaring payload_length == 0, so the zero-length path was already covered, just under a misleading name (the length prefix itself is well-formed; only the declared length is zero). That test was renamed ReadAsync_WithZeroLengthPayload_ThrowsMalformedLength with an XML doc explaining the four-zero-byte construction, rather than adding a duplicate. The oversized half was a genuine gap: a new ReadAsync_WithPayloadAboveConfiguredMaximum_ThrowsMessageTooLarge constructs WorkerFrameProtocolOptions with a 64-byte maximum, feeds a length prefix of 65, and asserts WorkerFrameProtocolErrorCode.MessageTooLarge. Both checks were verified against WorkerFrameReader.ReadAsync to fire before the payload buffer is rented. The small configured maximum keeps the test from allocating a multi-megabyte buffer.
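The two length checks exercised by these tests can be illustrated with a small sketch. The types and method below are hypothetical and are not WorkerFrameReader's API. The four-byte prefix and the MalformedLength/MessageTooLarge outcomes follow the text above, the little-endian byte order is inferred from the WriteUInt32LittleEndian test helper, and mapping a truncated prefix to EndOfStream is an assumption.

```csharp
using System.IO;

// Hypothetical names throughout; the real WorkerFrameReader may differ.
public enum FrameErrorSketch
{
    None,
    EndOfStream,
    MalformedLength,
    MessageTooLarge,
}

public static class FrameLengthCheckSketch
{
    // Reads the 4-byte little-endian length prefix and validates the declared
    // payload length before any payload buffer would be allocated.
    public static FrameErrorSketch ValidateLengthPrefix(
        Stream stream, uint maxPayloadBytes, out uint payloadLength)
    {
        payloadLength = 0;
        byte[] prefix = new byte[4];
        int read = 0;

        while (read < prefix.Length)
        {
            int n = stream.Read(prefix, read, prefix.Length - read);
            if (n == 0)
            {
                // Assumption: a truncated prefix is reported as end of stream.
                return FrameErrorSketch.EndOfStream;
            }

            read += n;
        }

        uint declared = (uint)prefix[0]
            | ((uint)prefix[1] << 8)
            | ((uint)prefix[2] << 16)
            | ((uint)prefix[3] << 24);

        if (declared == 0)
        {
            return FrameErrorSketch.MalformedLength; // four zero bytes land here
        }

        if (declared > maxPayloadBytes)
        {
            return FrameErrorSketch.MessageTooLarge; // rejected before allocation
        }

        payloadLength = declared;
        return FrameErrorSketch.None;
    }
}
```

Feeding a prefix of 65 against a 64-byte maximum mirrors the new oversized-frame test: the rejection happens on the four prefix bytes alone, so no large buffer is ever needed to exercise it.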

Worker.Tests-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:539-546 |
| Status | Resolved |

Description: ThrowIfCompletedAsync does an unconditional await Task.Delay(TimeSpan.FromMilliseconds(100)) then checks task.IsCompleted. This adds a fixed 100 ms to the test and only catches a RunAsync that fails within that arbitrary window; a session that faults after 100 ms slips past undetected.

Recommendation: Replace with a deterministic race: await Task.WhenAny(runTask, <first-expected-frame-read>) and assert the run task did not win.

Resolution: 2026-05-18 — ThrowIfCompletedAsync was deleted (it had a single call site, in RunAsync_SendsHeartbeatPayloadFromRuntimeSnapshot). That test now races runTask against the first-heartbeat ReadUntilAsync with Task.WhenAny; if runTask wins it is awaited to surface the underlying fault and the test fails via Assert.Fail. The fixed 100 ms delay is gone — the check is now deterministic: a RunAsync faulting at any time before the first heartbeat is caught, and a healthy run completes as soon as the heartbeat arrives instead of always paying 100 ms.
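The Task.WhenAny race that replaced the fixed delay can be sketched generically. The helper below is hypothetical (the actual test inlines the race and fails via Assert.Fail rather than throwing InvalidOperationException), but it shows the same decision: if the run task wins, surface its fault; otherwise return the awaited result.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the race; the real test inlines this logic.
public static class RunTaskRaceSketch
{
    // Awaits the expected-event task, but fails fast if the long-running
    // run task completes or faults first.
    public static async Task<T> AwaitUnlessRunCompletesAsync<T>(Task runTask, Task<T> expected)
    {
        Task winner = await Task.WhenAny(runTask, expected).ConfigureAwait(false);

        if (winner == runTask)
        {
            // Awaiting rethrows the underlying fault, if there is one.
            await runTask.ConfigureAwait(false);

            // The run task ended without a fault, which is still unexpected here.
            throw new InvalidOperationException(
                "Run task completed before the expected event arrived.");
        }

        return await expected.ConfigureAwait(false);
    }
}
```

Unlike a fixed delay, this catches a fault at any point before the expected event, and a healthy run pays no fixed waiting cost.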

Worker.Tests-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerPipeClientTests.cs:194, WorkerPipeSessionTests.cs:622, Sta/StaCommandDispatcherTests.cs:348, MxAccess/MxAccessStaSessionTests.cs:334, MxAccess/MxAccessCommandExecutorTests.cs:1124 |
| Status | Resolved |

Description: FakeRuntimeSession, NoopComApartmentInitializer, NoopEventSink/NullEventSink, and the CreateFrame/WriteUInt32LittleEndian helpers are re-implemented independently in multiple test files. The two FakeRuntimeSession implementations have already diverged (one supports BlockDispatch/event enqueue, one does not), and NoopComApartmentInitializer is defined four times.

Recommendation: Extract shared test doubles (NoopComApartmentInitializer, frame helpers, a single configurable FakeRuntimeSession) into a TestSupport folder/namespace consumed by all test classes.

Resolution: 2026-05-18 — Added a src/MxGateway.Worker.Tests/TestSupport/ folder (namespace MxGateway.Worker.Tests.TestSupport) with four shared doubles: NoopComApartmentInitializer, NoopEventSink, WorkerFrameTestHelpers (CreateFrame/WriteUInt32LittleEndian), and a single configurable FakeRuntimeSession. The consolidated FakeRuntimeSession is the richer of the two divergent copies (it supports BlockDispatch, event enqueue, shutdown-timeout, and throw-after-release); the minimal WorkerPipeClientTests caller simply leaves the options unset. The per-file copies were deleted from WorkerPipeClientTests, WorkerPipeSessionTests, StaCommandDispatcherTests, MxAccessStaSessionTests, MxAccessCommandExecutorTests, and WorkerFrameProtocolTests, and the orphaned NullEventSink in AlarmCommandExecutorTests was replaced with the shared NoopEventSink. Re-triage: the finding says NoopComApartmentInitializer "is defined four times" — it was defined three times (StaCommandDispatcherTests, MxAccessStaSessionTests, MxAccessCommandExecutorTests); the fourth alarm-area IStaComApartmentInitializer implementation is StaRuntimeTests.RecordingComApartmentInitializer, which is a recording double (asserts init/uninit ordering), not a no-op, so it was deliberately left in place rather than folded into the shared no-op. Unused using directives left behind by the removals were stripped (TreatWarningsAsErrors).

Worker.Tests-015

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessEventQueueTests.cs |
| Status | Resolved |

Description: MxAccessEventQueueTests covers monotonic sequencing, drain, capacity overflow, and first-fault-wins, but does not cover Drain with maxEvents: 0 (drain-all) — a branch FakeRuntimeSession.DrainEvents even special-cases — nor draining an empty queue, nor enqueue after a manual RecordFault. These are minor branches but the overflow/fault interaction is the worker's backpressure contract.

Recommendation: Add a Drain(0) drain-all test and an empty-queue drain test.

Resolution: 2026-05-18 — Added three tests to MxAccessEventQueueTests. Drain_WithZeroMaxEvents_DrainsAllEvents covers the maxEvents == 0 drain-all branch in MxAccessEventQueue.Drain (verified at src/MxGateway.Worker/MxAccess/MxAccessEventQueue.cs:174) — three events enqueued, Drain(0) returns all three in order and empties the queue. Drain_WhenQueueIsEmpty_ReturnsEmptyList covers the drainCount == 0 early-return branch for both Drain(0) and Drain(5) on an empty queue. Enqueue_AfterRecordFault_ThrowsInvalidOperationException covers the backpressure contract gap the finding flagged — after a manual RecordFault, Enqueue throws InvalidOperationException ("outbound event queue is faulted") and the event is not queued.
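Illustration only: a minimal sketch of the three branches these tests pin (drain-all on `maxEvents == 0`, the empty-queue early return, and enqueue-after-fault). `SketchEventQueue` is a hypothetical stand-in that stores strings; it is not `MxAccessEventQueue`, whose real members are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

public sealed class SketchEventQueue
{
    private readonly object gate = new object();
    private readonly Queue<string> events = new Queue<string>();
    private bool faulted;

    public void RecordFault()
    {
        lock (gate) { faulted = true; }
    }

    public void Enqueue(string evt)
    {
        lock (gate)
        {
            // Once faulted, the queue refuses new events.
            if (faulted) throw new InvalidOperationException("outbound event queue is faulted");
            events.Enqueue(evt);
        }
    }

    public IReadOnlyList<string> Drain(int maxEvents)
    {
        if (maxEvents < 0) throw new ArgumentOutOfRangeException(nameof(maxEvents));
        lock (gate)
        {
            // maxEvents == 0 means "drain everything currently queued".
            int drainCount = maxEvents == 0 ? events.Count : Math.Min(maxEvents, events.Count);
            if (drainCount == 0) return Array.Empty<string>();

            var drained = new List<string>(drainCount);
            for (int i = 0; i < drainCount; i++) drained.Add(events.Dequeue());
            return drained;
        }
    }
}
```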

Worker.Tests-016

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker.Tests/MxAccess/AlarmCommandExecutorTests.cs:317-393 |
| Status | Resolved |

Description: AlarmCommandExecutorTests reaches into MxAccessSession via reflection (typeof(MxAccessSession).GetConstructor(BindingFlags.NonPublic | BindingFlags.Instance, ..., new[] { typeof(object), typeof(IMxAccessServer), typeof(IMxAccessEventSink), typeof(MxAccessHandleRegistry), typeof(MxAccessValueCache), typeof(int) }, ...)) and provides an inline NullMxAccessServer no-op implementing every IMxAccessServer method. The XML doc admits the reflection-based path is fragile ("MxAccessSession private ctor signature changed; update the test seam."). The same NullMxAccessServer shape is reinventable wherever an executor is exercised in isolation; the consolidated TestSupport namespace introduced in Worker.Tests-014 was the natural home for it, but the no-op server lives in a single test file's private nested class instead. A future change to the private ctor signature breaks this one test in a way that requires re-reading the reflection call to diagnose, and a second test that wants the same no-op surface will reflectively duplicate it.

Recommendation: Either (a) add a non-reflective seam — a constructor or static factory marked internal-with-InternalsVisibleTo that takes IMxAccessServer + the existing dependencies, removing the reflection — or (b) move the NullMxAccessServer no-op and the reflection helper into TestSupport/NoopMxAccessSession.cs so any future test can share it and a ctor change is fixed in one place.

Resolution: 2026-05-20 — Took option (a) plus option (b). Added a non-reflective internal static MxAccessSession.CreateForTesting(IMxAccessServer, IMxAccessEventSink, MxAccessHandleRegistry?, MxAccessValueCache?, int?) factory in src/MxGateway.Worker/MxAccess/MxAccessSession.cs (lines 61-88), gated through the pre-existing <InternalsVisibleTo Include="MxGateway.Worker.Tests" /> in src/MxGateway.Worker/MxGateway.Worker.csproj. AlarmCommandExecutorTests.NewExecutor now calls MxAccessSession.CreateForTesting(new NoopMxAccessServer(), new NoopEventSink()) — no GetConstructor/Invoke/BindingFlags anywhere in the file. The previously per-file NullMxAccessServer no-op was extracted to the shared src/MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs (matching the TestSupport consolidation introduced in Worker.Tests-014); the XML doc on the new file explicitly cites Worker.Tests-016 for the rationale. A future change to the MxAccessSession private ctor signature now updates CreateForTesting in one place; the test file does not need to be edited.

Worker.Tests-017

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs |
| Status | Resolved |

Description: WorkerPipeSession.DispatchGatewayEnvelopeAsync (src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:365-385) has three documented branches: WorkerCommand, WorkerShutdown, and WorkerCancel. WorkerPipeSessionTests exercises the first two but never sends a WorkerCancel envelope, so the _runtimeSession?.CancelCommand(envelope.CorrelationId) path and the contract that the session forwards a cancel without faulting the pipe are uncovered. The default: arm (UnexpectedEnvelopeBody exception) is also uncovered — a gateway sending the wrong body case (e.g. another GatewayHello after the handshake) should produce a ProtocolViolation fault but no test asserts this.

Recommendation: Add two tests: one that writes a WorkerCancel envelope with a known correlation id and asserts FakeRuntimeSession.CancelCommand was called with that id (extend the shared FakeRuntimeSession to record cancel-correlation-ids); one that writes a post-handshake GatewayHello envelope and asserts the session writes a WorkerFault with category ProtocolViolation and exits the message loop.

Resolution: 2026-05-20 — Added two [Fact]s to WorkerPipeSessionTests and the supporting state to the shared FakeRuntimeSession. (1) RunAsync_WhenGatewaySendsWorkerCancel_ForwardsCorrelationIdToRuntimeSession writes a WorkerCancel envelope with correlation id "cancel-correlation-1" after the handshake, then drives a normal shutdown via SendShutdownAndWaitAsync — observing the shutdown ack proves the message loop kept running (no fault, no exit) and Assert.Contains("cancel-correlation-1", runtime.CancelledCorrelationIds) proves the cancel reached IWorkerRuntimeSession.CancelCommand. The shared FakeRuntimeSession was extended with a CancelledCorrelationIds snapshot list and an optional CancelCommandReturnValue (defaulting to false, preserving the prior behaviour). (2) RunAsync_WhenGatewaySendsUnexpectedEnvelopeBodyAfterHandshake_ThrowsAndExitsMessageLoop writes a second GatewayHello envelope post-handshake — valid envelope, invalid body case for the message-loop state — and asserts Assert.ThrowsAsync<WorkerFrameProtocolException>(async () => await runTask) with ErrorCode == WorkerFrameProtocolErrorCode.UnexpectedEnvelopeBody. Re-triage: the original recommendation said "the session writes a WorkerFault with category ProtocolViolation", but the source at src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:380-384 shows the default: arm throws WorkerFrameProtocolException; RunMessageLoopAsync has no fault-writing catch (only CompleteStartupHandshakeAsync writes faults during the handshake). The test XML doc records this — the contract pinned is the exception type/error-code and the message-loop exit, not a fault frame.

Worker.Tests-018

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessLiveComCreationTests.cs:18-31, 35-73, 75-145, 148-220, 222-342 |
| Status | Resolved |

Description: Every [Fact] in MxAccessLiveComCreationTests gates on RunLiveMxAccessTests() and returns silently when the opt-in env var is not set. xUnit reports a Fact that returns normally as passed, so a CI run without MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 shows five green "live MXAccess" tests that did not run a single line of MXAccess code. docs/GatewayTesting.md and the IntegrationTests project already provide the correct pattern — LiveMxAccessFactAttribute (in src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs) emits xUnit's native Skipped status when the env var is absent — but MxAccessLiveComCreationTests does not consume it, so the gate is invisible in test output. The first test (StartAsync_WhenOptedIn_CreatesInstalledMxAccessComObjectOnSta) additionally inlines the env-var check (string.Equals(Environment.GetEnvironmentVariable(...), "1", StringComparison.Ordinal)) instead of using the local RunLiveMxAccessTests() helper, so the convention is inconsistent even within the same file.

Recommendation: Move LiveMxAccessFactAttribute into a shared location both projects can reference (e.g. MxGateway.Contracts.TestSupport or a new MxGateway.TestSupport shared project), and decorate the five MxAccessLiveComCreationTests methods with [LiveMxAccessFact] instead of [Fact]. Drop the inline env-var checks. Skipped runs will then report Skipped rather than Passed, and CI will distinguish "live MXAccess unavailable" from "live MXAccess opted in, succeeded".

Resolution: 2026-05-20 — Added a self-contained LiveMxAccessFactAttribute at src/MxGateway.Worker.Tests/TestSupport/LiveMxAccessFactAttribute.cs (namespace MxGateway.Worker.Tests.TestSupport) that mirrors the MxGateway.IntegrationTests attribute: when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is not 1, the attribute sets Skip so xUnit emits a native Skipped result rather than a misleading Passed. All five MxAccessLiveComCreationTests methods now use [LiveMxAccessFact]; the inline env-var check at the top of StartAsync_WhenOptedIn_CreatesInstalledMxAccessComObjectOnSta and the per-method if (!RunLiveMxAccessTests()) return; silent-returns were deleted. The worker tests target net48/x86 and the integration tests target net10.0, so introducing a cross-project shared assembly was not practical; the Worker.Tests attribute is a near-duplicate of the IntegrationTests attribute and the XML doc on the new file calls this out so the next reviewer understands why two copies exist. xUnit output now reports the five live tests as [SKIP] when the env var is absent — dotnet test ... shows Skipped: 9, Total: 274, with the five MxAccessLiveComCreationTests correctly counted as skipped rather than passed.

Worker.Tests-019

| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | src/MxGateway.Worker.Tests/AlarmsLiveSmokeTests.cs:45, src/MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs:143, src/MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs:55 |
| Status | Resolved |

Description: Worker.Tests-009 renamed every snake_case alarm-test method to the project's Method_Scenario_Expectation convention, but the rename missed the dev-rig probe and live-smoke [Fact]s in the MxGateway.Worker.Tests root (not under MxAccess/): AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip, AlarmClientWmProbeTests.Probe_AlarmClient_for_alarm_messages (and its helpers), and WnWrapConsumerProbeTests.ProbeWnWrapConsumer. These are [Fact(Skip=...)] so they never execute in normal CI, but they still drift from docs/style-guides/CSharpStyleGuide.md and contradict the resolution claim in Worker.Tests-009 that "every [Fact]/[Theory] method in the five alarm test files" was renamed.

Recommendation: Rename Alarms_full_pipeline_round_trip → Alarms_FullPipelineRoundTrip_RaisesAndAcknowledges (or a similar Method_Scenario_Expectation form) and apply the same convention to the two probe methods. xUnit discovers by attribute, not name, so renames are behaviour-neutral.

Resolution: 2026-05-20 — Renamed the three snake_case probe/smoke [Fact] methods to the project's Method_Scenario_Expectation PascalCase convention: Alarms_full_pipeline_round_trip → Alarms_FullPipelineRoundTrip_RaisesAndAcknowledges (in Probes/AlarmsLiveSmokeTests.cs), ProbeAlarmClientWmMessages → ProbeAlarmClient_OnDevRig_LogsAlarmWindowMessages (in Probes/AlarmClientWmProbeTests.cs), and ProbeWnWrapConsumer → ProbeWnWrapConsumer_OnDevRig_LogsXmlAlarmStream (in Probes/WnWrapConsumerProbeTests.cs). The three files have moved to Probes/ as part of Worker.Tests-023; the location columns above predate that move. xUnit discovers tests by attribute, so the renames are behaviour-neutral and the Skip strings still apply unchanged.

Worker.Tests-020

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker.Tests/MxAccess/MxAccessValueCacheTests.cs:88-108 |
| Status | Resolved |

Description: TryWaitForUpdate_ReturnsFalseAfterDeadline_WhenNoSetOccurs asserts both a lower wall-clock bound (stopwatch.ElapsedMilliseconds >= 60, deadline was 80ms) and pumpCalls > 1. The 60ms floor is the same class of timing race Worker.Tests-003/004/013 corrected elsewhere: on a loaded CI agent a Task.Run scheduling delay can push the wait's start past the deadline so the loop runs zero or one iteration, the wait returns slightly short of the 60ms floor, and the test fails through no fault of the production code. The pumpCalls > 1 check additionally races against the same scheduler — if the agent stalls the wait thread, pumpStep might fire only once before the deadline. The test's purpose (verifying the timeout is honoured and pump-step is invoked) is sound, but the assertions are wall-clock floors rather than deterministic checks.

Recommendation: Drop the elapsed-time floor and the pumpCalls > 1 assertion; verify only that result is false, value is default, and pumpCalls >= 1 (the pump must fire at least once, but not "more than once"). The fact that TryWaitForUpdate returned false after the deadline is the contract the test exists to pin; the timing strictness is incidental.

Resolution: 2026-05-20 — Eliminated the wall-clock dependency entirely (the equivalent of a manual time source for the DateTime.UtcNow-based deadline). The test now passes DateTime.UtcNow.AddMilliseconds(-1) — a deadline already in the past — so TryWaitForUpdate's loop pumps once, immediately observes the elapsed deadline, and returns false with zero Thread.Sleep. The Stopwatch/stopwatch.ElapsedMilliseconds >= 60 floor and the pumpCalls > 1 strict-inequality assertions are gone. With an already-expired deadline the contract is deterministic: exactly one pump call (the loop must pump before checking the deadline so MXAccess messages can dispatch on the calling thread even when the deadline has just expired), result == false, value is default. Matches the pattern Worker.Tests-003/004/013 used — drop wall-clock floor checks in favour of a deterministic signal.
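Illustration only: a minimal sketch of a pump-before-deadline-check wait loop and why an already-expired deadline makes it deterministic. The signature is a simplified assumption (the real TryWaitForUpdate also yields a value); `hasUpdate` and `pumpStep` are hypothetical parameters.

```csharp
using System;
using System.Threading;

public static class SketchWait
{
    // Pumps first, then checks for an update, then checks the deadline.
    public static bool TryWaitForUpdate(Func<bool> hasUpdate, DateTime deadlineUtc, Action pumpStep)
    {
        while (true)
        {
            pumpStep();                                    // always pump at least once
            if (hasUpdate()) return true;
            if (DateTime.UtcNow >= deadlineUtc) return false;
            Thread.Sleep(1);                               // only reached while the deadline is still ahead
        }
    }
}
```

With a deadline of `DateTime.UtcNow.AddMilliseconds(-1)` and no update, the loop pumps exactly once, never sleeps, and returns false, so a test can assert on the pump count without any elapsed-time floor.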

Worker.Tests-021

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs |
| Status | Resolved |

Description: WorkerFrameProtocolTests covers MalformedLength, MessageTooLarge (read-side, added in Worker.Tests-012), ProtocolVersionMismatch, SessionMismatch, and InvalidEnvelope on WorkerFrameReader. Three documented protocol-error branches remain uncovered: (1) WorkerFrameProtocolErrorCode.EndOfStream from WorkerFrameReader.ReadExactlyOrThrowAsync (src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:106) when the stream closes mid-frame — important because the gateway closing its end of the pipe during a partial read is the most common production transport failure; (2) WorkerFrameWriter rejecting an envelope whose CalculateSize() returns 0 with WorkerFrameProtocolErrorCode.InvalidEnvelope (WorkerFrameWriter.cs:46); (3) WorkerFrameWriter rejecting an envelope larger than MaxMessageBytes with WorkerFrameProtocolErrorCode.MessageTooLarge (WorkerFrameWriter.cs:53). The writer-side checks defend against a session that constructs a too-large envelope before sending it down the pipe — completely separate from the reader-side bounds the existing tests pin.

Recommendation: Add three tests: (a) ReadAsync_WhenStreamEndsMidFrame_ThrowsEndOfStream — feed a 4-byte length prefix declaring 100 bytes followed by only 50 bytes, assert EndOfStream; (b) WriteAsync_WithEnvelopeAboveConfiguredMaximum_ThrowsMessageTooLarge — construct WorkerFrameProtocolOptions with a small MaxMessageBytes and an envelope whose serialised size exceeds it, assert MessageTooLarge; (c) since WorkerEnvelope.CalculateSize() never returns 0 for a valid envelope (the protocol version field alone serializes), the InvalidEnvelope writer branch is genuinely unreachable in normal operation — either document this as defensive code that is intentionally untestable, or drop the check.

Resolution: 2026-05-20 — Added three [Fact]s to WorkerFrameProtocolTests.cs for the three uncovered protocol-error branches. (a) ReadAsync_WhenStreamEndsMidFrame_ThrowsEndOfStream builds a 4-byte length prefix declaring 100 bytes followed by only 50 bytes, drives WorkerFrameReader.ReadAsync against it, and asserts WorkerFrameProtocolErrorCode.EndOfStream — pins the gateway-closes-mid-read transport failure. (b) WriteAsync_WithEnvelopeAboveConfiguredMaximum_ThrowsMessageTooLarge constructs WorkerFrameProtocolOptions with MaxMessageBytes=64, builds a GatewayHello envelope whose GatewayVersion is padded to 1024 bytes, asserts WorkerFrameProtocolErrorCode.MessageTooLarge and that the stream stayed empty (zero bytes written). (c) WriteAsync_WithEmptyEnvelope_ThrowsInvalidEnvelopeFromValidator exercises the body-less path — WorkerEnvelopeValidator.Validate runs first and rejects an envelope whose BodyCase is None with InvalidEnvelope, so the CalculateSize()==0 branch is intercepted before it fires; the XML doc explicitly documents that the defensive zero-length branch is unreachable through public API but is left in place as a one-comparison safety net against future serialisation regressions. Net change: three new tests, all green; the reader-side EndOfStream plus writer-side MessageTooLarge/InvalidEnvelope rejections are now regression-protected.
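Illustration only: a minimal sketch of the reader-side checks these tests exercise, assuming a 4-byte little-endian length prefix (suggested by the WriteUInt32LittleEndian helper). `SketchFrameReader`, `FrameError`, and `FrameException` are hypothetical stand-ins, not WorkerFrameReader or its error types.

```csharp
using System;
using System.IO;

public enum FrameError { MalformedLength, MessageTooLarge, EndOfStream }

public sealed class FrameException : Exception
{
    public FrameException(FrameError code) : base(code.ToString()) { Code = code; }
    public FrameError Code { get; }
}

public static class SketchFrameReader
{
    public static byte[] ReadFrame(Stream stream, int maxMessageBytes)
    {
        byte[] prefix = ReadExactly(stream, 4);
        uint length = (uint)(prefix[0] | prefix[1] << 8 | prefix[2] << 16 | prefix[3] << 24);

        // Both length checks run before any payload buffer is allocated.
        if (length == 0) throw new FrameException(FrameError.MalformedLength);
        if (length > (uint)maxMessageBytes) throw new FrameException(FrameError.MessageTooLarge);

        return ReadExactly(stream, (int)length);
    }

    private static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            // Zero bytes read means the peer closed the stream mid-frame.
            if (read == 0) throw new FrameException(FrameError.EndOfStream);
            offset += read;
        }

        return buffer;
    }
}
```

A prefix declaring 100 bytes followed by only 50 reaches the EndOfStream branch, and a prefix of 65 against a 64-byte maximum is rejected without allocating the payload.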

Worker.Tests-022

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.Worker.Tests/MxAccess/WnWrapAlarmConsumerXmlTests.cs |
| Status | Resolved |

Description: WnWrapAlarmConsumerXmlTests covers ParseSnapshotXml and TryParseHexGuid directly — the pure-helper layer — and pins the no-internal-timer Worker-001 invariant via reflection. The PollOnce transition-delta logic (src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:289-337) is what actually turns "snapshot N to snapshot N+1" into MxAlarmTransitionEvent instances, and is the only place the consumer makes state-management decisions: skip-when-state-unchanged, fire-with-previous-state-Unspecified for first sighting, and (implicitly) drop entries that vanished from the new snapshot. None of these branches are exercised — the live-smoke AlarmsLiveSmokeTests covers the end-to-end pipeline but is [Fact(Skip=...)] against the dev rig, so there is no in-CI coverage of "snapshot delta computation produces the right transitions" at all. A regression that, for example, emits a transition every poll regardless of state-change would slip through.

Recommendation: Refactor PollOnce's snapshot-diff loop into a pure internal static IReadOnlyList<MxAlarmTransitionEvent> ComputeTransitions(Dictionary<Guid,MxAlarmSnapshotRecord> previous, Dictionary<Guid,MxAlarmSnapshotRecord> next) and add direct unit tests: (a) new entry produces PreviousState=Unspecified; (b) state-unchanged produces no transition; (c) state-changed produces a transition with the prior state; (d) entry vanished from next produces no transition (an alarm cleared from the active set; the snapshot just no longer mentions it). MxAccessStaSession already drives the COM-side polling, so the diff is genuinely independent of any COM dependency.

Resolution: 2026-05-20 — Extracted the snapshot-diff loop from WnWrapAlarmConsumer.PollOnce into a pure internal static IReadOnlyList<MxAlarmTransitionEvent> ComputeTransitions(Dictionary<Guid,MxAlarmSnapshotRecord> previous, Dictionary<Guid,MxAlarmSnapshotRecord> next) in src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs. PollOnce now calls ComputeTransitions under the same syncRoot lock; the diff rules are unchanged. Added five [Fact]s in WnWrapAlarmConsumerXmlTests.cs exercising all four branches plus a multi-alarm fan-out case: ComputeTransitions_WhenAlarmIsNewInNextSnapshot_EmitsTransitionWithUnspecifiedPreviousState, ComputeTransitions_WhenAlarmStateUnchanged_EmitsNoTransition, ComputeTransitions_WhenAlarmStateChanged_EmitsTransitionWithPriorState, ComputeTransitions_WhenAlarmDroppedFromActiveSet_EmitsNoTransition, and ComputeTransitions_WithMixedDelta_EmitsOnlyNewAndChangedTransitions. Each test drives the function with Dictionary<Guid,MxAlarmSnapshotRecord> snapshots built from a NewRecord helper — no COM, no STA. A regression that emits a transition every poll regardless of state, swaps the previous/next ordering, or treats a dropped alarm as a transition now fails in-CI.
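Illustration only: a minimal sketch of the four diff rules as a pure function. It simplifies the snapshot value to a bare state enum; `AlarmState`, `Transition`, and `SketchAlarmDiff` are hypothetical stand-ins for MxAlarmSnapshotRecord, MxAlarmTransitionEvent, and the real ComputeTransitions.

```csharp
using System;
using System.Collections.Generic;

public enum AlarmState { Unspecified, Active, Acknowledged }

public sealed class Transition
{
    public Transition(Guid alarmId, AlarmState previousState, AlarmState currentState)
    {
        AlarmId = alarmId;
        PreviousState = previousState;
        CurrentState = currentState;
    }

    public Guid AlarmId { get; }
    public AlarmState PreviousState { get; }
    public AlarmState CurrentState { get; }
}

public static class SketchAlarmDiff
{
    public static IReadOnlyList<Transition> ComputeTransitions(
        Dictionary<Guid, AlarmState> previous,
        Dictionary<Guid, AlarmState> next)
    {
        var transitions = new List<Transition>();
        foreach (KeyValuePair<Guid, AlarmState> entry in next)
        {
            AlarmState prior;
            if (!previous.TryGetValue(entry.Key, out prior))
            {
                // First sighting: previous state is reported as Unspecified.
                transitions.Add(new Transition(entry.Key, AlarmState.Unspecified, entry.Value));
            }
            else if (prior != entry.Value)
            {
                // State changed: carry the prior state.
                transitions.Add(new Transition(entry.Key, prior, entry.Value));
            }
            // Unchanged state: nothing is emitted.
        }

        // Alarms present only in 'previous' dropped out of the active set; nothing is emitted.
        return transitions;
    }
}
```

Because the function touches only its two arguments, a mixed-delta test needs nothing beyond two dictionaries.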

Worker.Tests-023

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs (779 lines), src/MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs (287 lines), src/MxGateway.Worker.Tests/AlarmsLiveSmokeTests.cs (270 lines) |
| Status | Resolved |

Description: Three large dev-rig "probe" files are mixed into the worker unit-test project but are not unit tests in the usual sense: each is a [Fact(Skip="Runtime probe — flip Skip=null on the dev rig (AVEVA installed)...")] driver that runs for hundreds of seconds, opens real Galaxy subscriptions, posts Windows messages on STA threads, captures alarm payloads to ITestOutputHelper, and exists to document AVEVA COM behaviour rather than gate it. AlarmClientWmProbeTests alone is 779 lines, larger than every genuine unit-test file in the project. Together the three files add 1300+ lines of probe scaffolding that anyone inspecting the project to learn what Worker.Tests is for has to wade through. The Skip-attribute strings document why they exist, but a colocated docs/AlarmProbes.md (or moving the probes to a separate MxGateway.Worker.Probes non-test assembly) would make the distinction explicit and stop the probe files from inflating Worker.Tests' build/test surface.

Recommendation: Either (a) carve the three probe files out into src/MxGateway.Worker.Probes/ (a separate project the dev-rig user opts into; the assembly references stay the same), or (b) move them into a Probes/ subfolder inside MxGateway.Worker.Tests and add a one-paragraph header in docs/GatewayTesting.md describing the probe surface. Option (a) is cleaner because the live-smoke AlarmsLiveSmokeTests already references WnWrapAlarmConsumer directly and would naturally cohabit with the other AVEVA-COM probes.

Resolution: 2026-05-20 — Took option (b): moved AlarmClientWmProbeTests.cs, WnWrapConsumerProbeTests.cs, and AlarmsLiveSmokeTests.cs from src/MxGateway.Worker.Tests/ into a new src/MxGateway.Worker.Tests/Probes/ subfolder. The files keep their existing namespace (MxGateway.Worker.Tests) and their [Fact(Skip=...)] gating; the SDK-style project picks them up under the new path without a .csproj change. Option (b) was chosen over (a) because the probes still rely on the same test-project package references (xunit, Microsoft.NET.Test.Sdk, Xunit.Abstractions) plus the Interop.WNWRAPCONSUMERLib/ArchestrA.MxAccess/aaAlarmManagedClient/IAlarmMgrDataProvider references already declared in MxGateway.Worker.Tests.csproj; a separate MxGateway.Worker.Probes project would have to duplicate every one of these. The probes remain runnable on the dev rig by flipping Skip=null exactly as before. The Worker.Tests root listing now contains only genuine unit-test/regression files; probe scaffolding is visibly partitioned by directory.

Worker.Tests-024

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Worker.Tests/MxAccess/AlarmCommandHandlerTests.cs:42-54 |
| Status | Resolved |

Description: Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer asserts that an exception during IMxAccessAlarmConsumer.Subscribe triggers consumer disposal. The fake throws new InvalidOperationException("simulated wnwrap subscribe failure") and the test asserts Assert.Throws<InvalidOperationException>(() => handler.Subscribe(...)). Because AlarmCommandHandler.Subscribe (src/MxGateway.Worker/MxAccess/AlarmCommandHandler.cs:65-93) wraps the underlying call and re-throws, an InvalidOperationException from any code path inside Subscribe (e.g. its own "already subscribed" guard at line 73) also satisfies the type assertion. The disposal behaviour itself is pinned: if AlarmCommandHandler regressed to throw before reaching the consumer, consumer.Disposed would be false and the test's additional assertion that consumer.Disposed is true would fail. The genuine weakness is that the test does not pin the exception message ("simulated wnwrap subscribe failure"), so an unexpected InvalidOperationException from a different branch, with a misleading message, would pass without anyone noticing that the handler swallowed the real failure cause.

Recommendation: Strengthen to InvalidOperationException exception = Assert.Throws<InvalidOperationException>(...); Assert.Contains("simulated wnwrap subscribe failure", exception.Message) — pin both the type and the originating message so a regression that throws a different InvalidOperationException from inside AlarmCommandHandler fails the test.

Resolution: 2026-05-20 — Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer now captures the thrown exception and asserts Assert.Contains("simulated wnwrap subscribe failure", exception.Message) against the fake's exact thrown message. A regression that throws a different InvalidOperationException from inside AlarmCommandHandler (for example its own "already subscribed" guard at line 73 of AlarmCommandHandler.cs) now fails the message-contains assertion — the original test's type-only Assert.Throws<InvalidOperationException> would have passed silently while hiding the swallowed failure cause. The disposal assertion (consumer.Disposed == true) is unchanged; the test now pins both the disposal contract and the origin of the propagated exception. XML doc on the test method documents the regression scenario.

Worker.Tests-025

| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | src/MxGateway.Worker.Tests/TestSupport/LiveMxAccessFactAttribute.cs:23, src/MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:5, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs:9-12 |
| Status | Resolved |

Description: Worker.Tests-018 resolved the silent-skip issue by adding a Worker.Tests-local LiveMxAccessFactAttribute. The resolution called out that "introducing a cross-project shared assembly was not practical" because Worker.Tests targets net48/x86 and IntegrationTests targets net10.0. The two copies are correct today but the contract is held only by convention — both define LiveMxAccessVariableName = "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS" as separate public const string literals, with the same =="1" StringComparison.Ordinal check duplicated. The IntegrationTests copy delegates to IntegrationTestEnvironment.LiveMxAccessTestsEnabled/IsEnabled, so any future opt-in tweak (e.g. accepting "true" as well, or honouring a different env-var name) made in IntegrationTestEnvironment will silently leave Worker.Tests behind. The XML doc on the Worker.Tests copy acknowledges this risk in prose but the divergence is invisible at compile time — there's no test or assertion that pins the two opt-in checks return the same answer.

Recommendation: Do one of the following: (a) lift the env-var-name string into MxGateway.Contracts (which already multi-targets net10.0;net48) as a public const string, so that both LiveMxAccessFactAttribute copies reference the same constant; (b) add a single unit test in Worker.Tests that pins LiveMxAccessFactAttribute.LiveMxAccessVariableName == "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS" to make the contract literal-visible to any reviewer changing the name; or (c) document the synchronization requirement in docs/GatewayTesting.md alongside the existing live-opt-in section.

Resolution: 2026-05-20 — Added GatewayContractInfo.LiveMxAccessOptInVariableName to MxGateway.Contracts (net10.0/net48-multi-targeted) and routed both LiveMxAccessFactAttribute copies plus IntegrationTestEnvironment.LiveMxAccessVariableName through that single constant; the env-var literal now lives in one place.

Worker.Tests-026

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker/MxAccess/MxAccessSession.cs:74-88 |
| Status | Resolved |

Description: MxAccessSession.CreateForTesting (added in Worker.Tests-016) is declared internal static, gated only by <InternalsVisibleTo Include="MxGateway.Worker.Tests" /> in MxGateway.Worker.csproj. The XML doc states "production code must use the Create factory", but there is no runtime enforcement. The protection rests on (1) the internal modifier — which silently widens if any future InternalsVisibleTo directive is added (e.g. for an integration-test shim, a benchmark project, or an InternalsVisibleTo-using analyzer); and (2) reviewer attention. Worker.Tests itself contains real STA-running test code (the live tests, the probes), so a future test in Worker.Tests could call CreateForTesting from a context that has a real MXAccess COM object and the new object() placeholder would silently substitute. The factory hands out a session with mxAccessComObject = new object() so any code that later goes through Marshal.IsComObject or Marshal.FinalReleaseComObject on it would simply return false / no-op, masking lifetime regressions.

Recommendation: Add a one-line runtime guard. [Conditional("DEBUG")] is not appropriate because the worker also ships Release builds, but the factory could check whether eventSink is an MxAccessBaseEventSink (the production sink) and, if so, throw InvalidOperationException("CreateForTesting must not be used with the production MxAccessBaseEventSink"). Production code never passes that sink to a "for testing" factory; the asymmetry is the cheapest signal. Alternatively, mark the factory with [Obsolete("Test seam — never call from production code", error: false)] so any production call surfaces as a build warning (and TreatWarningsAsErrors would turn that into a build break).

Resolution: 2026-05-20 — Added a runtime guard to MxAccessSession.CreateForTesting that throws ArgumentException when the supplied eventSink is an MxAccessBaseEventSink (the production sink), so any future caller wiring the live sink into the test factory fails fast instead of silently bypassing Marshal.IsComObject on the new object() placeholder.
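
A minimal sketch of the kind of guard the resolution describes. The interface IMxAccessEventSink, the RecordingEventSink test double, the constructor shape, and the null check are assumptions made for illustration; only the names MxAccessSession, CreateForTesting, MxAccessBaseEventSink, and the new object() placeholder come from the finding.

```csharp
using System;

// Hypothetical stand-ins for the worker types; names and shapes are assumed.
public interface IMxAccessEventSink { }
public sealed class MxAccessBaseEventSink : IMxAccessEventSink { } // production sink
public sealed class RecordingEventSink : IMxAccessEventSink { }    // test double

public sealed class MxAccessSession
{
    private MxAccessSession(object mxAccessComObject, IMxAccessEventSink eventSink)
    {
        MxAccessComObject = mxAccessComObject;
        EventSink = eventSink;
    }

    internal object MxAccessComObject { get; }
    internal IMxAccessEventSink EventSink { get; }

    // Test seam: rejects the production sink so a live wiring fails fast
    // instead of running against the non-COM placeholder.
    internal static MxAccessSession CreateForTesting(IMxAccessEventSink eventSink)
    {
        if (eventSink == null)
        {
            throw new ArgumentNullException(nameof(eventSink));
        }

        if (eventSink is MxAccessBaseEventSink)
        {
            throw new ArgumentException(
                "CreateForTesting must not be used with the production MxAccessBaseEventSink.",
                nameof(eventSink));
        }

        // Placeholder only: this is a plain managed object, not a COM object.
        return new MxAccessSession(new object(), eventSink);
    }
}
```

With this shape, a test that passes a test double gets a session back, while one that passes the production sink fails at construction rather than later in a release path.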

Worker.Tests-027

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Worker.Tests/TestSupport/FakeRuntimeSession.cs:174, 179-187 |
| Status | Resolved |

Description: The consolidated FakeRuntimeSession (introduced by Worker.Tests-014, extended for Worker.Tests-017) reads/writes cancelledCorrelationIds, snapshot, and events under lock(gate). The new CancelCommandReturnValue (a bool set by the test) is mutated outside any lock and read inside CancelCommand outside the lock as well (return CancelCommandReturnValue; after the locked cancelledCorrelationIds.Add). For a plain bool set before the worker's message-loop runs this is harmless on x86 (atomic-on-aligned-write), but it contradicts the rest of the file's locking convention and a future test that flips CancelCommandReturnValue mid-dispatch from a different thread would see an undocumented race. The same applies to BlockDispatch, ThrowAfterDispatchReleased, ThrowTimeoutOnShutdown, and Disposed — all are bool/auto-property without the gate lock — but those existed before Worker.Tests-017 and the finding flags only the consistency drift the new property introduces.

Recommendation: Choose one of: (a) hold lock(gate) when reading CancelCommandReturnValue inside CancelCommand, matching the surrounding locked statement; (b) back CancelCommandReturnValue with a volatile field (volatile cannot be applied to an auto-property) to document the cross-thread visibility; or (c) add an XML-doc note stating the property must be set before RunAsync begins and is not safe to mutate mid-test. Option (c) is cheapest and matches how BlockDispatch is used today.

Resolution: 2026-05-20 — Converted CancelCommandReturnValue to a private-backing-field property whose get/set both hold lock(gate), and folded the return statement of CancelCommand inside the existing locked block, so the property now respects the same locking convention as cancelledCorrelationIds, snapshot, and events.
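
The resolved shape can be sketched roughly as follows. This is a reduced, hypothetical version of FakeRuntimeSession: the string correlation id, the default value of true, and the CancelledCorrelationIds accessor are assumptions for illustration, not taken from the real file.

```csharp
using System.Collections.Generic;

// Hypothetical reduction of FakeRuntimeSession; only the members discussed above.
public sealed class FakeRuntimeSession
{
    private readonly object gate = new object();
    private readonly List<string> cancelledCorrelationIds = new List<string>();
    private bool cancelCommandReturnValue = true; // assumed default

    // Both accessors take the same gate as the collections.
    public bool CancelCommandReturnValue
    {
        get { lock (gate) { return cancelCommandReturnValue; } }
        set { lock (gate) { cancelCommandReturnValue = value; } }
    }

    public bool CancelCommand(string correlationId)
    {
        lock (gate)
        {
            cancelledCorrelationIds.Add(correlationId);
            // The gate is already held, so read the backing field directly.
            return cancelCommandReturnValue;
        }
    }

    // Returns a copy so callers never observe the list mid-mutation.
    public IReadOnlyList<string> CancelledCorrelationIds
    {
        get { lock (gate) { return cancelledCorrelationIds.ToArray(); } }
    }
}
```

Folding the return into the locked block means the recorded correlation id and the returned flag are observed under one acquisition of the gate, which is the consistency the finding asked for.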

Worker.Tests-028

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | docs/GatewayTesting.md, src/MxGateway.Worker.Tests/Probes/ |
| Status | Resolved |

Description: The Worker.Tests-023 resolution (commit a020350) stated that option (b) was taken — moving the three probe files to Probes/ — but the recommendation for option (b) was "move them into a Probes/ subfolder inside MxGateway.Worker.Tests and add a one-paragraph header in docs/GatewayTesting.md describing the probe surface." The folder move was made; the documentation addition was not. docs/GatewayTesting.md has no mention of Probes/, AlarmClientWmProbeTests, WnWrapConsumerProbeTests, or AlarmsLiveSmokeTests (verified with Grep against the doc). A reader navigating docs/GatewayTesting.md to understand the testing surface cannot tell the probes exist, what they pin, or how to flip Skip=null on the dev rig — the only documentation is the in-source Skip=... strings and the per-probe XML doc.

Recommendation: Add a ## Dev-rig probes (or similar) section to docs/GatewayTesting.md that names the three probe files, explains the probe contract (live AVEVA COM, Skip=null flip, no in-CI coverage), and points to the source location src/MxGateway.Worker.Tests/Probes/. One paragraph is enough; the existing [Fact(Skip=...)] strings carry the rest of the detail.

Resolution: 2026-05-20 — Added a ## Dev-rig Probes section to docs/GatewayTesting.md between the Live MXAccess Smoke and Live Galaxy Repository sections; the new section names the three probe files (AlarmsLiveSmokeTests, AlarmClientWmProbeTests, WnWrapConsumerProbeTests), explains the probe contract (live AVEVA COM, Skip=null flip on the dev rig, not part of the regression contract), and points to the source location src/MxGateway.Worker.Tests/Probes/.

Worker.Tests-029

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Worker.Tests/Probes/AlarmsLiveSmokeTests.cs:9, src/MxGateway.Worker.Tests/Probes/AlarmClientWmProbeTests.cs:14, src/MxGateway.Worker.Tests/Probes/WnWrapConsumerProbeTests.cs:10 |
| Status | Resolved |

Description: Worker.Tests-023 partitioned the probes by directory (Probes/ subfolder) but kept their original declaration, namespace MxGateway.Worker.Tests;, rather than moving them to namespace MxGateway.Worker.Tests.Probes;. The folder/namespace mismatch is a minor C# convention drift: the project's other subfolder-grouped tests (Bootstrap/, Conversion/, MxAccess/, Sta/, Ipc/, TestSupport/, Contracts/, ProjectStructure/) all use a MxGateway.Worker.Tests.<Subfolder> namespace matching the directory. It also means a dotnet test filter such as --filter FullyQualifiedName~MxGateway.Worker.Tests.Probes matches zero tests, so the partition is invisible to the runner: any CI-side rule that wants to exclude probes still has to enumerate file/class names individually rather than match by namespace.

Recommendation: Move the three probe files to namespace MxGateway.Worker.Tests.Probes;. xUnit discovers by attribute, not by namespace, so the rename is behaviour-neutral and lets a FullyQualifiedName~Probes filter trivially target them. The two other consolidations introduced in this sweep (TestSupport/MxGateway.Worker.Tests.TestSupport) already follow this pattern.

Resolution: 2026-05-20 — Moved AlarmsLiveSmokeTests, AlarmClientWmProbeTests, and WnWrapConsumerProbeTests to namespace MxGateway.Worker.Tests.Probes; so the folder and namespace match the project's other subfolder-grouped tests; a FullyQualifiedName~MxGateway.Worker.Tests.Probes filter now targets exactly the three probe classes. Verified by xUnit discovery output: the three probes appear under their new namespace as [SKIP].

Worker.Tests-030

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:862-890 |
| Status | Resolved |

Description: Within WorkerPipeSessionTests, the inbound-envelope helpers assign Sequence values that are inconsistent with the order in which the tests send them: CreateGatewayHelloEnvelope is Sequence = 1, CreateCommandEnvelope is Sequence = 2, CreateShutdownEnvelope is Sequence = 3, and CreateCancelEnvelope is Sequence = 4. The Worker.Tests-017 cancel test sends the cancel (Sequence = 4) before the shutdown (Sequence = 3), so a future reader inspecting the wire trace will see decreasing sequence numbers. The test still passes because the worker has no inbound sequence-monotonicity check (verified by grepping Ipc/ for ValidateSequence/monotonic/sequence-comparison patterns; none exist). But gateway.md documents monotonic sequence numbers on the outbound side, and the test's literal sequence values suggest a convention that isn't enforced and can mislead anyone correlating a frame dump to test intent while debugging.

Recommendation: Choose one of: (a) reassign CreateCancelEnvelope to a sequence value greater than the shutdown's (or pass the sequence as a parameter, matching CreateGatewayHelloEnvelope's parameter style), so the wire trace reads in ascending order; (b) add an XML-doc note on the cancel test stating that the worker has no inbound monotonicity check and the test ignores envelope sequence ordering; or (c) parameterise all four helper methods so each test passes its desired sequence and the literal numbers stop carrying implicit meaning. Option (c) is the cleanest because CreateGatewayHelloEnvelope is already parameter-driven for nonce/version.

Resolution: 2026-05-20 — Took option (c): parameterised CreateGatewayHelloEnvelope/CreateCommandEnvelope/CreateCancelEnvelope/CreateShutdownEnvelope with a ulong sequence argument (defaults 1/2/2/3 respectively, matching the typical Hello/Command/Cancel/Shutdown ordering), so the literal sequence values no longer carry implicit meaning. Updated the cancel-correlation test's wire trace to ascend (Hello=1, Cancel=2, Shutdown=3) and added a comment noting that the worker has no inbound monotonicity check — the parameter exists so multi-frame tests can pin the trace ordering explicitly when needed.
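
Option (c) might look roughly like the sketch below. The Envelope type, its Kind string, the EnvelopeHelpers class name, and the CancelCorrelationTrace method are hypothetical stand-ins; the real helpers build contract messages whose shape is not shown in this finding. Only the four helper names, the ulong sequence argument, and the 1/2/2/3 defaults come from the resolution.

```csharp
// Hypothetical envelope shape, reduced to the two fields relevant here.
public sealed class Envelope
{
    public Envelope(string kind, ulong sequence)
    {
        Kind = kind;
        Sequence = sequence;
    }

    public string Kind { get; }
    public ulong Sequence { get; }
}

public static class EnvelopeHelpers
{
    // Defaults follow the typical Hello/Command/Cancel/Shutdown ordering.
    public static Envelope CreateGatewayHelloEnvelope(ulong sequence = 1) =>
        new Envelope("GatewayHello", sequence);

    public static Envelope CreateCommandEnvelope(ulong sequence = 2) =>
        new Envelope("Command", sequence);

    public static Envelope CreateCancelEnvelope(ulong sequence = 2) =>
        new Envelope("Cancel", sequence);

    public static Envelope CreateShutdownEnvelope(ulong sequence = 3) =>
        new Envelope("Shutdown", sequence);

    // Trace for a cancel-correlation style test: explicit, ascending sequences.
    // The worker has no inbound monotonicity check; the values only pin the trace.
    public static Envelope[] CancelCorrelationTrace() => new[]
    {
        CreateGatewayHelloEnvelope(sequence: 1),
        CreateCancelEnvelope(sequence: 2),
        CreateShutdownEnvelope(sequence: 3),
    };
}
```

Passing the sequence at the call site keeps the ordering visible in the test body, so a frame dump can be matched to the test without consulting the helper definitions.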