code-reviews: 2026-06-16 re-review of all 11 modules at 8df5ab3

Re-review of the 99-commit delta since the 410acc9 baseline (session-resilience
epic, dashboard disable-login, galaxy browse fixes, and stillpending §8).

44 new Open findings, no Critical/High:
- Server 2 (incl. Medium design-doc drift), Worker 0 (026/027/028 confirmed
  resolved), Contracts 3, Tests 3, Worker.Tests 3, IntegrationTests 4
- Client.Dotnet 4 (Medium env-var key redaction), Client.Go 5 (Medium watch
  drain), Client.Java 9 (Medium overflow race), Client.Python 5 (Medium README
  API), Client.Rust 6 (Medium --tls/--plaintext downgrade)

README regenerated; regen-readme.py --check passes.
This commit is contained in:
Joseph Doherty
2026-06-16 18:57:56 -04:00
parent 8df5ab381a
commit 25d04ec37e
12 changed files with 936 additions and 44 deletions
+50 -3
View File
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 2 |
## Checklist coverage
@@ -69,6 +69,23 @@ findings (Server-001 through Server-032) are unchanged by this pass.
| 9 | Testing coverage | Issues found: Server-037 (no test for the corrupt-snapshot restore path or for `PersistSnapshot = false` at the cache level). |
| 10 | Documentation & comments | No issues found — XML docs match behavior; the `GalaxyRepository.md` "On-disk snapshot" section documents the Stale-on-restore lifecycle. |
### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the session-resilience epic + §8 delta (`git diff 410acc9..8df5ab3`): `SessionEventDistributor` multi-subscriber fan-out, replay-on-reconnect, detach-grace retention, bounded worker-ready wait, dashboard auto-login.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Server-055 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (replay/handoff atomicity, reconnect-vs-sweep, single-clock ready-wait all verified sound) |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found (DisableLogin auto-login is intentional/config-gated/documented) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Server-054 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037` scoped to the dashboard destructive-action wave on
@@ -1022,3 +1039,33 @@ Additionally, `GatewayAlarmMonitor.ApplyProviderModeChangeAsync` increments the
**Recommendation:** Add resolver tests for (a) cancellation propagation and (b) an include that is also excluded; and a `GatewayAlarmMonitorProviderMode` test pinning the provider-switch counter behaviour for a same-mode repeat event (whichever semantics the team intends). These lock down the contracts the Server-051/052 findings expose.
**Resolution:** Resolved 2026-06-15. Added all three missing tests: (a) `AlarmWatchListResolverTests.ResolveAsync_RepositoryCancelled_PropagatesOperationCanceled` (cancellation propagation, also covers Server-051); (b) `AlarmWatchListResolverTests.ResolveAsync_ExcludeAlsoSuppressesMatchingExplicitInclude` (exclude-vs-include precedence, also Server-052 item 2); and (c) `GatewayAlarmMonitorProviderModeTests.ProviderModeChange_RepeatedSameMode_RecordsASwitchForEachEvent`, which pins the existing semantics — each worker-reported `OnAlarmProviderModeChanged` event records a `provider_switches` increment (and resets `_providerSince`) even when `toMode` equals the current mode, since the worker is the authority on when a mode change occurred and the gateway does not synthesize or suppress it.
### Server-054
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) |
| Status | Open |
**Description:** The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still states "no reconnectable sessions for v1" and "one active StreamEvents subscriber per session for v1", and still files both as post-v1 "Later Revisit Items". `CLAUDE.md` likewise still says these are "explicitly out of scope". This is the stale-prose-vs-shipped-behavior drift the "update docs in the same change as the source" rule prohibits.
**Recommendation:** Update both `DesignDecisions.md` sections and the revisit list to describe the shipped behavior (gated by `AllowMultipleEventSubscribers`, `DetachGraceSeconds`, replay options), and amend the CLAUDE.md convention bullet.
**Resolution:** _(empty until closed)_
### Server-055
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` |
| Status | Open |
**Description:** When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` and detach-grace is enabled — stamps `_detachedAtUtc = now`. A freshly-`Ready` session that never had a successful subscriber thus enters the detach-grace window on a failed first attach, making it sweep-eligible after `DetachGraceSeconds` even though no client ever streamed. Impact is minor (the lease still protects it; a later successful attach clears the stamp) but the "last subscriber dropped" semantics are violated.
**Recommendation:** Only stamp `_detachedAtUtc` on a detach that mirrors a prior successful attach — roll the failure path back without entering grace, or guard the stamp on "a subscriber had previously been registered."
**Resolution:** _(empty until closed)_