Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.

Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc cleaned up and missing scope-resolver
and state-machine tests added (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -5,7 +5,7 @@
| Module | `src/MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `1cd51bb` |
| Commit reviewed | `a020350` |
| Status | Reviewed |
| Open findings | 0 |
@@ -41,6 +41,21 @@
| 9 | Testing coverage | Issues found: Worker.Tests-017 (`WorkerCancel` envelope-dispatch path untested), Worker.Tests-022 (`WnWrapAlarmConsumer.PollOnce` transition-delta computation untested at the snapshot-to-transitions level). |
| 10 | Documentation & comments | Issues found: Worker.Tests-023 (`AlarmClientWmProbeTests` and `WnWrapConsumerProbeTests` are unit-test classes carrying 1000+ lines of probe-only code; their `[Fact(Skip=...)]` status is documented but the probe scaffolding is mixed into the same test assembly as regression tests). |

### 2026-05-20 re-review (commit `a020350`)

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No new issues — Worker.Tests-018/024 fixes hold; the new `WriteAsync_WithEmptyEnvelope_ThrowsInvalidEnvelopeFromValidator` correctly documents that the writer-side defensive zero-length branch is intercepted by `WorkerEnvelopeValidator.Validate`. |
| 2 | mxaccessgw conventions | Issues found: Worker.Tests-025 (`LiveMxAccessFactAttribute` duplicated in Worker.Tests and IntegrationTests with no shared constant — divergent-by-drift risk). |
| 3 | Concurrency & thread safety | Issues found: Worker.Tests-027 (`FakeRuntimeSession.CancelCommandReturnValue` mutated without the same `gate` lock that protects `cancelledCorrelationIds`/`snapshot`/`events`). |
| 4 | Error handling & resilience | No new issues — Worker.Tests-021 closed all three uncovered protocol branches. |
| 5 | Security | No new issues. |
| 6 | Performance & resource management | No new issues. |
| 7 | Design-document adherence | Issues found: Worker.Tests-028 (Worker.Tests-023 resolution promised a `docs/GatewayTesting.md` paragraph describing the probe surface; the doc was never updated, so the partition is invisible outside the source tree). |
| 8 | Code organization & conventions | Issues found: Worker.Tests-026 (`MxAccessSession.CreateForTesting` has no runtime guard preventing accidental production use — only the `internal` modifier plus `InternalsVisibleTo` separates it from the live `Create` path); Worker.Tests-029 (Probes moved to `Probes/` folder but kept the unit-test `MxGateway.Worker.Tests` namespace, so a namespace-based filter cannot distinguish probes from regression tests). |
| 9 | Testing coverage | No new issues — the five `LiveMxAccessFact`-gated tests in `MxAccessLiveComCreationTests` and the `ComputeTransitions` unit tests close the previously identified gaps. |
| 10 | Documentation & comments | Issues found: Worker.Tests-030 (`CreateCancelEnvelope` uses `Sequence = 4` while the immediately-following `CreateShutdownEnvelope` uses `Sequence = 3`; the cancel test writes them in 4-then-3 order, which works because the worker has no inbound sequence-monotonicity check — but the numbering is misleading to a future reader and contradicts the gateway-side monotonic-sequence convention that `gateway.md` documents for the outbound side). |

## Findings

### Worker.Tests-001
@@ -402,3 +417,93 @@
**Recommendation:** Strengthen to `InvalidOperationException exception = Assert.Throws<InvalidOperationException>(...); Assert.Contains("simulated wnwrap subscribe failure", exception.Message)` — pin both the type and the originating message so a regression that throws a *different* `InvalidOperationException` from inside `AlarmCommandHandler` fails the test.

**Resolution:** 2026-05-20 — `Subscribe_WhenUnderlyingSubscribeThrows_DisposesConsumer` now captures the thrown exception and asserts `Assert.Contains("simulated wnwrap subscribe failure", exception.Message)` against the fake's exact thrown message. A regression that throws a *different* `InvalidOperationException` from inside `AlarmCommandHandler` (for example its own "already subscribed" guard at line 73 of `AlarmCommandHandler.cs`) now fails the message-contains assertion — the original test's type-only `Assert.Throws<InvalidOperationException>` would have passed silently while hiding the swallowed failure cause. The disposal assertion (`consumer.Disposed == true`) is unchanged; the test now pins both the disposal contract and the origin of the propagated exception. XML doc on the test method documents the regression scenario.

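The difference between a type-only assertion and a type-plus-message assertion can be sketched in a few lines. The real test is C#/xUnit; the Python below is only an analogue, and `FakeConsumer`, `subscribe`, `capture`, and the use of `RuntimeError` in place of `InvalidOperationException` are all hypothetical stand-ins.

```python
class FakeConsumer:
    """Hypothetical stand-in for the consumer fake; records disposal."""

    def __init__(self):
        self.disposed = False

    def subscribe(self):
        # The fake's exact failure message, as quoted in the finding.
        raise RuntimeError("simulated wnwrap subscribe failure")

    def dispose(self):
        self.disposed = True


def subscribe(consumer, already_subscribed=False):
    """Hypothetical handler with its own guard that raises the same type."""
    if already_subscribed:
        raise RuntimeError("already subscribed")
    try:
        consumer.subscribe()
    except RuntimeError:
        # Dispose the consumer, then let the original failure propagate.
        consumer.dispose()
        raise


def capture(fn):
    """Run fn and return the RuntimeError it raises, or None."""
    try:
        fn()
    except RuntimeError as exc:
        return exc
    return None


consumer = FakeConsumer()
propagated = capture(lambda: subscribe(consumer))
guard_failure = capture(lambda: subscribe(FakeConsumer(), already_subscribed=True))
```

Both `propagated` and `guard_failure` are `RuntimeError` instances, so a type-only check cannot tell them apart; only the message check distinguishes the fake's failure from the handler's own guard.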
### Worker.Tests-025

| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Worker.Tests/TestSupport/LiveMxAccessFactAttribute.cs:23`, `src/MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:5`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs:9-12` |
| Status | Resolved |

**Description:** Worker.Tests-018 resolved the silent-skip issue by adding a Worker.Tests-local `LiveMxAccessFactAttribute`. The resolution called out that "introducing a cross-project shared assembly was not practical" because Worker.Tests targets net48/x86 and IntegrationTests targets net10.0. The two copies are correct today but the contract is held only by convention — both define `LiveMxAccessVariableName = "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS"` as separate `public const string` literals, with the same `=="1"` `StringComparison.Ordinal` check duplicated. The IntegrationTests copy delegates to `IntegrationTestEnvironment.LiveMxAccessTestsEnabled`/`IsEnabled`, so any future opt-in tweak (e.g. accepting `"true"` as well, or honouring a different env-var name) made in `IntegrationTestEnvironment` will silently leave Worker.Tests behind. The XML doc on the Worker.Tests copy acknowledges this risk in prose but the divergence is invisible at compile time — there's no test or assertion that pins the two opt-in checks return the same answer.

**Recommendation:** Either (a) lift the env-var-name string into `MxGateway.Contracts` (which already multi-targets `net10.0;net48`) as a `public const string`, so that both `LiveMxAccessFactAttribute` copies reference the same constant; (b) add a single unit test in Worker.Tests that pins `LiveMxAccessFactAttribute.LiveMxAccessVariableName == "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS"` to make the contract literal-visible to any reviewer changing the name; or (c) document the synchronization requirement in `docs/GatewayTesting.md` alongside the existing live-opt-in section.

**Resolution:** 2026-05-20 — Added `GatewayContractInfo.LiveMxAccessOptInVariableName` to `MxGateway.Contracts` (net10.0/net48-multi-targeted) and routed both `LiveMxAccessFactAttribute` copies plus `IntegrationTestEnvironment.LiveMxAccessVariableName` through that single constant; the env-var literal now lives in one place.

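As a rough illustration of the shape of this fix, the sketch below keeps one constant for the variable name and has two separate opt-in checks read it. It is a Python analogue of the C# change, not the project's code: the module-level constant stands in for `GatewayContractInfo.LiveMxAccessOptInVariableName`, and both function names are hypothetical.

```python
# Hypothetical single source of truth for the opt-in variable name.
LIVE_MXACCESS_OPT_IN_VARIABLE_NAME = "MXGATEWAY_RUN_LIVE_MXACCESS_TESTS"


def worker_tests_live_enabled(environ):
    """Stand-in for the Worker.Tests copy of the opt-in check."""
    return environ.get(LIVE_MXACCESS_OPT_IN_VARIABLE_NAME) == "1"


def integration_tests_live_enabled(environ):
    """Stand-in for the IntegrationTests copy of the opt-in check."""
    return environ.get(LIVE_MXACCESS_OPT_IN_VARIABLE_NAME, "") == "1"


def checks_agree(environ):
    """Parity probe: both copies must give the same answer for an environment."""
    return worker_tests_live_enabled(environ) == integration_tests_live_enabled(environ)
```

Sharing the name removes the duplicated literal, but the comparison itself is still written twice. A parity probe like `checks_agree` is one way to pin that the two checks stay in step if either is later relaxed.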
### Worker.Tests-026

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker/MxAccess/MxAccessSession.cs:74-88` |
| Status | Resolved |

**Description:** `MxAccessSession.CreateForTesting` (added in Worker.Tests-016) is declared `internal static`, gated only by `<InternalsVisibleTo Include="MxGateway.Worker.Tests" />` in `MxGateway.Worker.csproj`. The XML doc states "production code must use the `Create` factory", but there is no runtime enforcement. The protection rests on (1) the `internal` modifier — which silently widens if any future `InternalsVisibleTo` directive is added (e.g. for an integration-test shim, a benchmark project, or an `InternalsVisibleTo`-using analyzer); and (2) reviewer attention. Worker.Tests itself contains real STA-running test code (the live tests, the probes), so a future test in Worker.Tests could call `CreateForTesting` from a context that has a real MXAccess COM object and the `new object()` placeholder would silently substitute. The factory hands out a session with `mxAccessComObject = new object()` so any code that later goes through `Marshal.IsComObject` or `Marshal.FinalReleaseComObject` on it would simply return false / no-op, masking lifetime regressions.

**Recommendation:** Add a one-line runtime guard. `[Conditional("DEBUG")]` is not appropriate (the worker also ships Release builds), but the factory could check that `eventSink` is *not* an `MxAccessBaseEventSink` (the production sink) and throw `InvalidOperationException("CreateForTesting must not be used with the production MxAccessBaseEventSink")` when it is. Production code never passes that sink to a "for testing" factory; the asymmetry is the cheapest signal. Alternatively, gate the factory with `[Obsolete("Test seam — never call from production code", error: false)]` so any production call surfaces as a build warning (and `TreatWarningsAsErrors` would turn that into a build break).

**Resolution:** 2026-05-20 — Added a runtime guard to `MxAccessSession.CreateForTesting` that throws `ArgumentException` when the supplied `eventSink` is an `MxAccessBaseEventSink` (the production sink), so any future caller wiring the live sink into the test factory fails fast instead of silently bypassing `Marshal.IsComObject` on the `new object()` placeholder.

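The guard described above is a fail-fast type check at the top of a test-only factory. The following Python sketch shows the idea under stated assumptions: `ProductionEventSink`, `RecordingEventSink`, `Session`, and `create_for_testing` are hypothetical names, and `ValueError` plays the role that `ArgumentException` plays in the C# fix.

```python
class ProductionEventSink:
    """Hypothetical stand-in for the production sink (MxAccessBaseEventSink)."""


class RecordingEventSink:
    """Hypothetical test-only sink."""


class Session:
    """Minimal session holding a placeholder COM object and an event sink."""

    def __init__(self, com_object, event_sink):
        self.com_object = com_object
        self.event_sink = event_sink


def create_for_testing(event_sink):
    """Test-only factory that refuses the production sink."""
    if isinstance(event_sink, ProductionEventSink):
        # Fail fast: a live sink paired with a placeholder object would
        # hide lifetime regressions instead of surfacing them.
        raise ValueError(
            "create_for_testing must not be used with the production event sink"
        )
    return Session(object(), event_sink)
```

The check costs one `isinstance` call per test session, and it holds even if visibility of the factory is widened later, which is the scenario the finding is concerned about.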
### Worker.Tests-027

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Worker.Tests/TestSupport/FakeRuntimeSession.cs:174, 179-187` |
| Status | Resolved |

**Description:** The consolidated `FakeRuntimeSession` (introduced by Worker.Tests-014, extended for Worker.Tests-017) reads/writes `cancelledCorrelationIds`, `snapshot`, and `events` under `lock(gate)`. The new `CancelCommandReturnValue` (a `bool` set by the test) is mutated outside any lock and read inside `CancelCommand` outside the lock as well (`return CancelCommandReturnValue;` after the locked `cancelledCorrelationIds.Add`). For a plain `bool` set before the worker's message-loop runs this is harmless on x86 (atomic-on-aligned-write), but it contradicts the rest of the file's locking convention and a future test that flips `CancelCommandReturnValue` mid-dispatch from a different thread would see an undocumented race. The same applies to `BlockDispatch`, `ThrowAfterDispatchReleased`, `ThrowTimeoutOnShutdown`, and `Disposed` — all are `bool`/auto-property without the `gate` lock — but those existed before Worker.Tests-017 and the finding flags only the consistency drift the new property introduces.

**Recommendation:** Either (a) hold `lock(gate)` when reading `CancelCommandReturnValue` inside `CancelCommand`, matching the surrounding locked statement; (b) mark `CancelCommandReturnValue` with `volatile` to document the cross-thread visibility; or (c) add an XML-doc note stating the property must be set before `RunAsync` begins and is not safe to mutate mid-test. Option (c) is cheapest and matches how `BlockDispatch` is used today.

**Resolution:** 2026-05-20 — Converted `CancelCommandReturnValue` to a private-backing-field property whose get/set both hold `lock(gate)`, and folded the return statement of `CancelCommand` inside the existing locked block, so the property now respects the same locking convention as `cancelledCorrelationIds`, `snapshot`, and `events`.

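A minimal Python sketch of the locking shape the resolution describes: a backing field whose getter and setter take the same lock as the rest of the fake's state, with the return value read inside the locked block. The names mirror the C# members but are hypothetical here, as is the default value of `True`. One difference from C# matters: `threading.Lock` is not reentrant, so the locked method reads the backing field directly rather than going through the property.

```python
import threading


class FakeRuntimeSession:
    """Illustrative fake whose mutable state is all guarded by one lock."""

    def __init__(self):
        self._gate = threading.Lock()
        self._cancelled_correlation_ids = []
        self._cancel_command_return_value = True  # assumed default

    @property
    def cancel_command_return_value(self):
        with self._gate:
            return self._cancel_command_return_value

    @cancel_command_return_value.setter
    def cancel_command_return_value(self, value):
        with self._gate:
            self._cancel_command_return_value = value

    def cancel_command(self, correlation_id):
        with self._gate:
            self._cancelled_correlation_ids.append(correlation_id)
            # Read the backing field, not the property: the lock is already
            # held and threading.Lock would deadlock on re-entry.
            return self._cancel_command_return_value

    def cancelled_correlation_ids(self):
        with self._gate:
            # Return a copy so callers never see later mutations.
            return list(self._cancelled_correlation_ids)
```

Recording the correlation id and reading the return value in one critical section means a test that flips the value from another thread sees either the old or the new value for a given call, never a mix.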
### Worker.Tests-028

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/GatewayTesting.md`, `src/MxGateway.Worker.Tests/Probes/` |
| Status | Resolved |

**Description:** The Worker.Tests-023 resolution (commit `a020350`) stated that option (b) was taken — moving the three probe files to `Probes/` — but the recommendation for option (b) was "move them into a `Probes/` subfolder inside `MxGateway.Worker.Tests` **and** add a one-paragraph header in `docs/GatewayTesting.md` describing the probe surface." The folder move was made; the documentation addition was not. `docs/GatewayTesting.md` has no mention of `Probes/`, `AlarmClientWmProbeTests`, `WnWrapConsumerProbeTests`, or `AlarmsLiveSmokeTests` (verified with `Grep` against the doc). A reader navigating `docs/GatewayTesting.md` to understand the testing surface cannot tell the probes exist, what they pin, or how to flip `Skip=null` on the dev rig — the only documentation is the in-source `Skip=...` strings and the per-probe XML doc.

**Recommendation:** Add a `## Dev-rig probes` (or similar) section to `docs/GatewayTesting.md` that names the three probe files, explains the probe contract (live AVEVA COM, `Skip=null` flip, no in-CI coverage), and points to the source location `src/MxGateway.Worker.Tests/Probes/`. One paragraph is enough; the existing `[Fact(Skip=...)]` strings carry the rest of the detail.

**Resolution:** 2026-05-20 — Added a `## Dev-rig Probes` section to `docs/GatewayTesting.md` between the Live MXAccess Smoke and Live Galaxy Repository sections; the new section names the three probe files (`AlarmsLiveSmokeTests`, `AlarmClientWmProbeTests`, `WnWrapConsumerProbeTests`), explains the probe contract (live AVEVA COM, `Skip=null` flip on the dev rig, not part of the regression contract), and points to the source location `src/MxGateway.Worker.Tests/Probes/`.

### Worker.Tests-029

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Worker.Tests/Probes/AlarmsLiveSmokeTests.cs:9`, `src/MxGateway.Worker.Tests/Probes/AlarmClientWmProbeTests.cs:14`, `src/MxGateway.Worker.Tests/Probes/WnWrapConsumerProbeTests.cs:10` |
| Status | Resolved |

**Description:** Worker.Tests-023 partitioned the probes by directory (`Probes/` subfolder) but kept their original namespace `namespace MxGateway.Worker.Tests;` rather than moving them to `namespace MxGateway.Worker.Tests.Probes;`. The folder/namespace mismatch is a minor C# convention drift (the project's other subfolder-grouped tests — `Bootstrap/`, `Conversion/`, `MxAccess/`, `Sta/`, `Ipc/`, `TestSupport/`, `Contracts/`, `ProjectStructure/` — all use a `MxGateway.Worker.Tests.<Subfolder>` namespace matching the directory). It also means an xUnit test filter like `--filter FullyQualifiedName~MxGateway.Worker.Tests.Probes` will discover zero tests, so the partition is invisible to the runner: any CI-side rule that wants to exclude probes still has to enumerate file/class names individually rather than match by namespace.

**Recommendation:** Move the three probe files to `namespace MxGateway.Worker.Tests.Probes;`. xUnit discovers by attribute, not by namespace, so the rename is behaviour-neutral and lets a `FullyQualifiedName~Probes` filter trivially target them. The two other consolidations introduced in this sweep (`TestSupport/` → `MxGateway.Worker.Tests.TestSupport`) already follow this pattern.

**Resolution:** 2026-05-20 — Moved `AlarmsLiveSmokeTests`, `AlarmClientWmProbeTests`, and `WnWrapConsumerProbeTests` to `namespace MxGateway.Worker.Tests.Probes;` so the folder and namespace match the project's other subfolder-grouped tests; a `FullyQualifiedName~MxGateway.Worker.Tests.Probes` filter now targets exactly the three probe classes. Verified by xUnit discovery output: the three probes appear under their new namespace as `[SKIP]`.

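The `~` operator in a `FullyQualifiedName~...` filter is a "contains" match, which is why the old namespace matched nothing. The Python sketch below models only that substring semantics; it does not invoke the test runner, and the method name `Probe` is a placeholder because the finding does not list the real method names.

```python
PROBE_CLASSES = [
    "AlarmsLiveSmokeTests",
    "AlarmClientWmProbeTests",
    "WnWrapConsumerProbeTests",
]


def matches_contains_filter(fully_qualified_name, fragment):
    """Model of a 'contains' filter: plain substring match."""
    return fragment in fully_qualified_name


def qualified_names(namespace, classes, method="Probe"):
    """Build namespace.class.method names; 'Probe' is a placeholder method."""
    return [f"{namespace}.{cls}.{method}" for cls in classes]


fragment = "MxGateway.Worker.Tests.Probes"
before = qualified_names("MxGateway.Worker.Tests", PROBE_CLASSES)
after = qualified_names("MxGateway.Worker.Tests.Probes", PROBE_CLASSES)

matched_before = [n for n in before if matches_contains_filter(n, fragment)]
matched_after = [n for n in after if matches_contains_filter(n, fragment)]
```

Under the old namespace `matched_before` is empty, and under the new one `matched_after` holds all three probe classes, which matches the discovery result quoted in the resolution.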
### Worker.Tests-030

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:862-890` |
| Status | Resolved |

**Description:** Within `WorkerPipeSessionTests`, the inbound-envelope helpers assign `Sequence` values that are inconsistent with the order in which the tests send them: `CreateGatewayHelloEnvelope` is `Sequence = 1`, `CreateCommandEnvelope` is `Sequence = 2`, `CreateShutdownEnvelope` is `Sequence = 3`, and `CreateCancelEnvelope` is `Sequence = 4`. The Worker.Tests-017 cancel test sends the cancel (`Sequence = 4`) **before** the shutdown (`Sequence = 3`) — a future reader inspecting the wire trace will see decreasing sequence numbers. The test still passes because the worker has no inbound sequence-monotonicity check (verified by `Grep`ing `Ipc/` for `ValidateSequence`/`monotonic`/sequence-comparison patterns — none exist). But `gateway.md` documents monotonic sequence numbers on the outbound side, and the test's literal sequence values suggest a convention that isn't enforced and can mislead a debugger correlating a frame dump to test intent.

**Recommendation:** Either (a) reassign `CreateCancelEnvelope` to a sequence value `>` shutdown (or pass the sequence as a parameter, matching `CreateGatewayHelloEnvelope`'s parameter style), so the wire trace reads in ascending order; (b) add an XML-doc note on the cancel test stating that the worker has no inbound monotonicity check and the test ignores envelope sequence ordering; or (c) parameterise all four helper methods so each test passes its desired sequence and the literal numbers stop carrying implicit meaning. Option (c) is the cleanest because `CreateGatewayHelloEnvelope` is already parameter-driven for nonce/version.

**Resolution:** 2026-05-20 — Took option (c): parameterised `CreateGatewayHelloEnvelope`/`CreateCommandEnvelope`/`CreateCancelEnvelope`/`CreateShutdownEnvelope` with a `ulong sequence` argument (defaults 1/2/2/3 respectively, matching the typical Hello/Command/Cancel/Shutdown ordering), so the literal sequence values no longer carry implicit meaning. Updated the cancel-correlation test's wire trace to ascend (Hello=1, Cancel=2, Shutdown=3) and added a comment noting that the worker has no inbound monotonicity check — the parameter exists so multi-frame tests can pin the trace ordering explicitly when needed.

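Option (c) can be sketched as helpers that take the sequence as a parameter, with defaults that follow the usual frame order. This is a Python analogue of the C# helpers, not their actual code: the `Envelope` type, its `kind` strings, and `is_strictly_ascending` are hypothetical, while the default values 1/2/2/3 come from the resolution above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical minimal envelope: a frame kind plus its sequence number."""
    kind: str
    sequence: int


def create_gateway_hello_envelope(sequence=1):
    return Envelope("GatewayHello", sequence)


def create_command_envelope(sequence=2):
    return Envelope("Command", sequence)


def create_cancel_envelope(sequence=2):
    return Envelope("Cancel", sequence)


def create_shutdown_envelope(sequence=3):
    return Envelope("Shutdown", sequence)


def is_strictly_ascending(frames):
    """True when each frame's sequence is greater than the previous one."""
    sequences = [frame.sequence for frame in frames]
    return all(a < b for a, b in zip(sequences, sequences[1:]))


# Cancel-correlation trace after the fix: Hello=1, Cancel=2, Shutdown=3.
fixed_trace = [
    create_gateway_hello_envelope(),
    create_cancel_envelope(),
    create_shutdown_envelope(),
]

# The old hard-coded numbering, reproduced by passing sequences explicitly.
old_trace = [
    create_gateway_hello_envelope(1),
    create_cancel_envelope(4),
    create_shutdown_envelope(3),
]
```

Because the worker has no inbound monotonicity check, both traces would be accepted. A helper like `is_strictly_ascending` is only useful to a test that wants to pin the ordering of its own frames.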