Second re-review pass at commit a020350 caught 48 new findings, including
one High-severity regression I introduced in the prior sweep. All 48 were
fixed in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.

Lows (41, highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -5,12 +5,14 @@
| Module | `src/MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `1cd51bb` |
| Commit reviewed | `a020350` |
| Status | Reviewed |
| Open findings | 0 |

## Checklist coverage

### 2026-05-20 review (commit 1cd51bb)

This row summarizes the 2026-05-20 review pass at commit `1cd51bb`. Findings from
prior passes (Server-001 through Server-014) are all closed and remain below as
audit history.
@@ -28,6 +30,23 @@ audit history.
| 9 | Testing coverage | Issues found: Server-021 (`MxAccessGatewayService.ApplyConstraintsAsync` and the new `BulkConstraintPlan` / `ReadBulkConstraintPlan` / `WriteBulkConstraintPlan` / `SubscribeBulkConstraintPlan` merge logic is entirely untested). |
| 10 | Documentation & comments | Issues found: Server-022 (`IAlarmRpcDispatcher` XML doc still describes the dispatcher as "ships a not-yet-wired default"; stale after Server-014). |

### 2026-05-20 review (commit a020350)

Re-review pass at `a020350` — the cross-module sweep that resolved Server-015 through Server-022. Verified each fix is sound (lock discipline now uniform on `_syncRoot`; `DisposeAsync` gates on `_closeLock`; alarm RPCs map to `InvokeWrite`/`EventsRead`; glob cache is bounded; alarm dispatcher SessionNotFound flows through `MxAccessGatewayService.MapException` → gRPC `NotFound`; dashboard pages emit a single `@page`; 11 new `MxAccessGatewayServiceConstraintTests` cover the bulk-constraint plans). New findings filed against this pass.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Server-024 (`GalaxyGlobMatcher.GetOrCreateRegex` indexer access after `TryAdd` fails can throw `KeyNotFoundException` under contention near the cap). |
| 2 | mxaccessgw conventions | No issues found. |
| 3 | Concurrency & thread safety | No new issues found — Server-015/016 fixes verified sound. |
| 4 | Error handling & resilience | Issues found: Server-026 (`AlarmsOptions` is bound but not validated by `GatewayOptionsValidator`). |
| 5 | Security | No issues found — Server-017 mapping (`InvokeWrite` / `EventsRead`) is defensible and exercised by both resolver and interceptor tests. |
| 6 | Performance & resource management | No issues found — Server-018 cap is in place and tested. |
| 7 | Design-document adherence | Issues found: Server-027 (`docs/Authorization.md` `ResolveCommandScope` code snippet and Constraint Enforcement section omit the bulk read/write command families). |
| 8 | Code organization & conventions | Issues found: Server-025 (`GalaxyRepositoryGrpcService` still consumes the concrete `GalaxyRepository` after `IGalaxyRepository` was introduced for testability — inconsistent with `GalaxyHierarchyCache`). |
| 9 | Testing coverage | Issues found: Server-028 (`GatewayGrpcScopeResolverTests` does not exercise `WatchDeployEventsRequest` or `MxCommandKind.ReadBulk`; no `GatewaySessionTests` case asserts a `MarkFaulted` during in-flight Close). |
| 10 | Documentation & comments | Issues found: Server-023 (`NotWiredAlarmRpcDispatcher` class XML doc still says "PR A.6/A.7 — default … shipped while the worker-side AlarmClient event subscription is gated on dev-rig validation"; contradicts the cleanup that Server-014/Server-022 applied to the interface, gateway service, and `WorkerAlarmRpcDispatcher`); Server-029 (`OpenSession` capability list advertises `bulk-subscribe-commands` but not the now-shipping bulk-read or bulk-write families, so clients that gate on capability strings have no signal that those families exist). |

## Findings

### Server-001
@@ -359,3 +378,114 @@ audit history.
**Recommendation:** Rewrite the `IAlarmRpcDispatcher` `<remarks>` block to match the language now used on `WorkerAlarmRpcDispatcher` and on the gRPC service: DI binds `WorkerAlarmRpcDispatcher` by default; `NotWiredAlarmRpcDispatcher` is only the null fallback for tests/DI omission. Drop the "PR A.6 / A.7" prefix from the `<summary>` — the interface is now the public alarm-RPC seam.
**Resolution:** 2026-05-20 — Rewrote `IAlarmRpcDispatcher`'s `<summary>` and `<remarks>` (`src/MxGateway.Server/Sessions/IAlarmRpcDispatcher.cs`) to match the language now used on `WorkerAlarmRpcDispatcher` and on `MxAccessGatewayService.AcknowledgeAlarm` / `QueryActiveAlarms`: dropped the stale "PR A.6 / A.7" prefix from the summary, and replaced the "this PR ships a not-yet-wired default that returns a clear worker-pending diagnostic" clause with the correct statement that DI binds the production `WorkerAlarmRpcDispatcher` by default and `NotWiredAlarmRpcDispatcher` is only the null fallback for DI omission / standalone tests. Pure documentation change; no test.

### Server-023

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Server/Sessions/NotWiredAlarmRpcDispatcher.cs:10-26` |
| Status | Resolved |

**Description:** Server-014 and Server-022 swept the stale "PR A.6 / A.7" / "not-yet-wired" / "worker-pending" language off `MxAccessGatewayService.AcknowledgeAlarm` / `QueryActiveAlarms`, `WorkerAlarmRpcDispatcher`, and `IAlarmRpcDispatcher`. The concrete `NotWiredAlarmRpcDispatcher` class XML doc was not updated as part of either fix and still reads: *"PR A.6 / A.7 — default `IAlarmRpcDispatcher` shipped while the worker-side AlarmClient event subscription is gated on dev-rig validation"* and *"When the worker dispatcher (PR A.6/A.7 dev-rig follow-up) lands, `WorkerAlarmRpcDispatcher` replaces this implementation in the DI container"*. That is the exact prose the other sweeps removed, and it directly contradicts the now-current narrative everywhere else: `SessionServiceCollectionExtensions.AddGatewaySessions` registers `WorkerAlarmRpcDispatcher` as the default `IAlarmRpcDispatcher`; `NotWiredAlarmRpcDispatcher` is only the null fallback used when no dispatcher is registered (DI omission / standalone tests). The diagnostic string returned by `AcknowledgeAsync` (line 39) — `"the worker-side AlarmClient consumer (PR A.5) is in place but the dispatcher hookup is gated on validating the AVEVA alarm-provider event subscription on the dev rig"` — is also stale; the dispatcher hookup landed and any client that actually sees that diagnostic today is hitting the null-fallback path, not the dev-rig gate it describes.
**Recommendation:** Replace the `<summary>` and `<remarks>` on `NotWiredAlarmRpcDispatcher` with text that matches the language now used on the interface and `WorkerAlarmRpcDispatcher` — "null fallback `IAlarmRpcDispatcher` used when no dispatcher is registered (DI omission / standalone tests); production wires `WorkerAlarmRpcDispatcher`." Either drop the `AcknowledgeAsync` diagnostic string's dev-rig framing entirely or shorten it to "alarm dispatcher is not registered." `#pragma warning disable CS1998` on `QueryActiveAlarmsAsync` is correct here (empty stream is intentional for the null fallback) and should stay.
**Resolution:** 2026-05-20 — Rewrote `NotWiredAlarmRpcDispatcher` summary/remarks as the null-fallback dispatcher and shortened the `AcknowledgeAsync` diagnostic to "Alarm dispatcher is not registered."; updated the two tests that asserted the old "worker"-prefixed diagnostic.

### Server-024

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/MxGateway.Server/Galaxy/GalaxyGlobMatcher.cs:56-77` |
| Status | Resolved |

**Description:** `GetOrCreateRegex`'s race-loser branch reads `RegexCache[glob]` with an indexer (line 76) after `TryAdd` returned `false`. The indexer throws `KeyNotFoundException` if the key is missing. Under the new bounded cache (Server-018), there is a real — if narrow — race where the key vanishes between the failing `TryAdd` and the indexer read: thread A and thread B both compile a `Regex` for `glob`; A's `TryAdd` succeeds, A enqueues + enters `EvictIfOverCapacity`, the eviction loop dequeues `glob` (because some other thread had already enqueued + evicted enough that `glob` is now the oldest entry) and removes it; thread B's `TryAdd` then returns false, B reads `RegexCache[glob]`, and the indexer throws. The window is tiny but nonzero — eviction is approximate FIFO, and a hot pattern that is repeatedly re-added near the cap is the natural trigger. The same pre-Server-018 code used `GetOrAdd`, which had no such race because the dictionary handled the rebuild atomically.
**Recommendation:** Replace the `TryAdd` + indexer pair with `RegexCache.GetOrAdd(glob, _ => compiled)` so the dictionary atomically returns whichever instance won. Track the new insertion only when `GetOrAdd` returns the locally-compiled instance (`ReferenceEquals(result, compiled)`), then enqueue + evict. Alternatively, swap the trailing indexer read for `TryGetValue` + recursive recompile on miss. Add a stress test that mixes repeated reads of a single hot pattern with a flood of unique patterns near the cap and asserts no exception escapes `IsMatch`.
**Resolution:** 2026-05-20 — Replaced the `TryAdd` + indexer pair with `RegexCache.GetOrAdd(glob, compiled)`; FIFO enqueue + eviction now run only when `ReferenceEquals(result, compiled)` (i.e. our caller was the inserter), eliminating the post-eviction `KeyNotFoundException` window.
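
The shape of this fix can be sketched outside C#. The following is a minimal Python illustration, not the gateway's code: `BoundedRegexCache`, its default capacity, and the use of `fnmatch.translate` are all assumptions made for the example. `dict.setdefault` stands in for `ConcurrentDictionary.GetOrAdd` (assuming CPython, where it is atomic for string keys), and the caller enqueues and evicts only when it was the inserter.

```python
import fnmatch
import re
import threading
from collections import deque


class BoundedRegexCache:
    """FIFO-bounded glob cache. Hypothetical sketch; names and cap are assumptions."""

    def __init__(self, capacity=256):
        self._capacity = capacity
        self._cache = {}
        self._order = deque()
        self._evict_lock = threading.Lock()

    def get_or_create(self, glob):
        compiled = re.compile(fnmatch.translate(glob))
        # setdefault plays the role of GetOrAdd: it stores `compiled` only if
        # the key is absent and always returns the instance that is stored.
        result = self._cache.setdefault(glob, compiled)
        if result is compiled:
            # Only the caller that actually inserted tracks the key and evicts.
            self._order.append(glob)
            self._evict_if_over_capacity()
        # Return the reference already in hand. Re-reading the key here is the
        # bug: a concurrent eviction may have removed it in the meantime.
        return result

    def _evict_if_over_capacity(self):
        with self._evict_lock:
            while len(self._cache) > self._capacity and self._order:
                oldest = self._order.popleft()
                self._cache.pop(oldest, None)

    def __len__(self):
        return len(self._cache)
```

The race loser never touches the dictionary again after `setdefault`, so an eviction that lands between the two steps can at worst cause a later recompile, not an exception.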

### Server-025

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:19-25`, `src/MxGateway.Server/Galaxy/IGalaxyRepository.cs` |
| Status | Resolved |

**Description:** The Tests-016 fix introduced `IGalaxyRepository` so `GalaxyHierarchyCache` could be unit-tested against an in-memory fake, and `GalaxyHierarchyCache` was updated to depend on the interface. `GalaxyRepositoryGrpcService` was not updated and still receives the concrete `GalaxyDb.GalaxyRepository` via its primary constructor. Functionally this is fine — DI registers the concrete singleton and a thin `sp.GetRequiredService<GalaxyRepository>()` forwarder for the interface — but the seam is now half-applied: a future caller that wants to test or stub the gRPC service's `TestConnection` path has to construct a real `GalaxyRepository` against a SQL connection string, defeating the abstraction `IGalaxyRepository` was introduced for. The pattern also creates an inconsistency for new readers — two consumers in the same namespace, one on the interface and one on the concrete.
**Recommendation:** Change `GalaxyRepositoryGrpcService`'s `repository` parameter to `IGalaxyRepository`. No DI change is needed (both forwarders already resolve to the same singleton). Optionally drop the concrete singleton registration and register the interface directly.
**Resolution:** 2026-05-20 — Changed `GalaxyRepositoryGrpcService`'s `repository` primary-constructor parameter from the concrete `GalaxyRepository` to `IGalaxyRepository`; existing DI registration in `GalaxyRepositoryServiceCollectionExtensions` already resolves both the concrete and interface to the same singleton.

### Server-026

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Configuration/GatewayOptionsValidator.cs:17-32`, `src/MxGateway.Server/Configuration/AlarmsOptions.cs` |
| Status | Resolved |

**Description:** `GatewayOptions.Alarms` is bound from `MxGateway:Alarms` and consumed by `SessionManager.TryAutoSubscribeAlarmsAsync` (per-session SubscribeAlarms on Ready). `GatewayOptionsValidator.Validate` validates every other section (`Authentication`, `Ldap`, `Worker`, `Sessions`, `Events`, `Dashboard`, `Protocol`) but has no `ValidateAlarms` arm — `AlarmsOptions` is silently accepted regardless of contents. The runtime mitigates this by logging a warning when `Enabled = true` but neither `SubscriptionExpression` nor `DefaultArea` is set, then either faulting open-session (`RequireSubscribeOnOpen = true`) or skipping auto-subscribe — a configuration error therefore surfaces per-session at runtime instead of at startup. Other sections fail-fast at `ValidateOnStart()`, so the inconsistency makes alarm misconfiguration discoverable only after a client hits the gateway. A misformatted `SubscriptionExpression` (no `\\<host>\Galaxy!<area>` shape) likewise passes validation; the worker rejects it later.
**Recommendation:** Add a `ValidateAlarms(options.Alarms, failures)` arm in `GatewayOptionsValidator`. When `Enabled = true`, require either a non-blank `SubscriptionExpression` or a non-blank `DefaultArea`; when `SubscriptionExpression` is provided, sanity-check that it starts with `\\` (the AVEVA UNC subscription shape) — or document that the shape is left to the worker to validate. Either way, treat the configuration as part of the validated surface.
**Resolution:** 2026-05-20 — Added `ValidateAlarms` to `GatewayOptionsValidator`: when `Enabled = true`, requires a non-blank `SubscriptionExpression` or `DefaultArea`, and when `SubscriptionExpression` is provided, requires it to start with `\\` (canonical UNC subscription shape). Alarm misconfiguration now fails fast at startup instead of per-session.
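
As a rough illustration of the two rules this resolution describes, here is a Python sketch of the validation logic. It is not the C# `ValidateAlarms` implementation: the dataclass, the failure messages, and the choice to skip all checks when the section is disabled are assumptions made for the example.

```python
from dataclasses import dataclass

# Two literal backslashes: the UNC-style prefix the resolution requires.
UNC_PREFIX = "\\\\"


@dataclass
class AlarmsOptions:
    """Hypothetical mirror of the bound options; field names follow the review text."""
    enabled: bool = False
    subscription_expression: str = ""
    default_area: str = ""


def validate_alarms(options):
    """Return failure messages; an empty list means the section is valid."""
    failures = []
    if not options.enabled:
        # Assumption: a disabled section is not checked any further.
        return failures
    expression = options.subscription_expression.strip()
    area = options.default_area.strip()
    if not expression and not area:
        failures.append(
            "Alarms: SubscriptionExpression or DefaultArea is required when Enabled is true."
        )
    if expression and not expression.startswith(UNC_PREFIX):
        failures.append("Alarms: SubscriptionExpression must start with two backslashes.")
    return failures
```

Collecting failures into a list, rather than raising on the first one, matches the `ValidateAlarms(options.Alarms, failures)` shape suggested in the recommendation, so a startup check can report every problem at once.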

### Server-027

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/Authorization.md:120-141,176-181` |
| Status | Resolved |

**Description:** Two parts of `docs/Authorization.md` drifted from `GatewayGrpcScopeResolver.ResolveCommandScope` and from `MxAccessGatewayService.ApplyConstraintsAsync` over the bulk-read/bulk-write series (`f220908`/`5e375f6`/`758aca2`) and were not updated by the Server-017 / Server-021 fixes:
1. The `ResolveCommandScope` code snippet at lines 120-141 still shows only `Write`/`Write2` against `InvokeWrite` and `WriteSecured`/`WriteSecured2`/`AuthenticateUser` against `InvokeSecure`. The actual resolver also maps `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk`, `MxCommandKind.WriteSecuredBulk`, and `MxCommandKind.WriteSecured2Bulk`. A reader believing the snippet would conclude the bulk-write families inherit the fail-closed admin scope, when in fact they correctly map to `InvokeWrite` / `InvokeSecure` (the Scope Catalog table at lines 199-200 lists them).
2. The Constraint Enforcement section (lines 176-181) says: *"The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`, `SubscribeBulk`, and `AdviseItemBulk`. It checks write constraints for `Write`, `Write2`, `WriteSecured`, and `WriteSecured2`."* The actual `ApplyConstraintsAsync` switch also enforces constraints for `ReadBulk` (read scope), `WriteBulk` / `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk` (write scope, per-entry filtering with index-order merge). Server-021 added test coverage for all of these without touching the doc.
**Recommendation:** Update the `ResolveCommandScope` snippet to include the four bulk-write arms. Update the Constraint Enforcement prose to enumerate the bulk read/write commands that are actually filtered, and reference the per-entry index-ordered merge that `BulkConstraintPlan.MergeDeniedInto` performs. Adding `ReadBulk` to the `InvokeRead` row of the Scope Catalog would also be useful — the table currently lists `Register`/`AddItem`/`Advise` against `InvokeRead` but not `ReadBulk`.
**Resolution:** 2026-05-20 — Updated the `ResolveCommandScope` snippet in `docs/Authorization.md` to enumerate the four bulk-write arms (`WriteBulk`/`Write2Bulk` against `InvokeWrite`, `WriteSecuredBulk`/`WriteSecured2Bulk` against `InvokeSecure`); expanded the Constraint Enforcement prose to list `ReadBulk` and all four bulk-write commands and to call out `BulkConstraintPlan.MergeDeniedInto`'s index-ordered merge; added `ReadBulk` to the `InvokeRead` row of the Scope Catalog.
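
The per-entry filtering with an index-ordered merge mentioned here can be pictured as follows. This Python sketch is only an illustration of the idea under stated assumptions: the function names, the reply dictionaries, and the length check are hypothetical and are not taken from `BulkConstraintPlan`.

```python
def split_by_constraint(entries, is_allowed):
    """Partition entries into (index, entry) pairs that pass or fail the constraint."""
    allowed, denied = [], []
    for index, entry in enumerate(entries):
        (allowed if is_allowed(entry) else denied).append((index, entry))
    return allowed, denied


def merge_in_index_order(allowed, worker_replies, denied, make_denied_reply):
    """Rebuild one reply per original entry, in the caller's original order."""
    if len(worker_replies) != len(allowed):
        # Assumption: a reply-count mismatch is treated as a hard error.
        raise ValueError("worker reply count does not match forwarded entry count")
    merged = {}
    for (index, _entry), reply in zip(allowed, worker_replies):
        merged[index] = reply
    for index, entry in denied:
        merged[index] = make_denied_reply(entry)
    return [merged[index] for index in sorted(merged)]


# Usage: only tags under "Area1." are allowed; the worker sees two of three entries.
entries = ["Area1.Tank.Level", "Area2.Pump.Speed", "Area1.Valve.State"]
allowed, denied = split_by_constraint(entries, lambda tag: tag.startswith("Area1."))
worker_replies = [{"tag": tag, "ok": True} for _, tag in allowed]
replies = merge_in_index_order(
    allowed, worker_replies, denied, lambda tag: {"tag": tag, "ok": False}
)
```

Keeping the original index alongside each entry is what lets the denied results slot back into their request positions, so the client can still correlate replies by position.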

### Server-028

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcScopeResolverTests.cs:13-20`, `src/MxGateway.Tests/Gateway/Sessions/GatewaySessionTests.cs` |
| Status | Resolved |

**Description:** Two narrow test gaps were not closed by Server-017 / Server-015:
1. `GatewayGrpcScopeResolverTests.ResolveRequiredScope_KnownRpcRequest_ReturnsExpectedScope` enumerates `OpenSessionRequest`, `CloseSessionRequest`, `StreamEventsRequest`, `AcknowledgeAlarmRequest`, `QueryActiveAlarmsRequest`, `TestConnectionRequest`, `GetLastDeployTimeRequest`, and `DiscoverHierarchyRequest`. `WatchDeployEventsRequest` is missing even though it is named in the resolver's metadata-read arm and listed in the Scope Catalog. Similarly, the `ResolveRequiredScope_InvokeCommand_ReturnsExpectedScope` matrix covers every other write/secure/bulk command but omits `MxCommandKind.ReadBulk`, which is the only bulk family that falls into the `_ => GatewayScopes.InvokeRead` default arm. A regression that drops `WatchDeployEvents` from the request switch or that adds a new mapping for `ReadBulk` would not be caught.
2. `GatewaySessionTests` (added under Server-015 / Server-016) covers the `TransitionTo(Ready)` and `MarkFaulted(post-Close)` cases but does not cover the third edge that Server-015's tightened state machine permits: `MarkFaulted` issued while `CloseAsync` is parked between `TryBeginClose` (Closing) and `MarkClosed` (Closed). The current `MarkFaulted` (`GatewaySession.cs:314-326`) checks only for `Closed`, so it overwrites `Closing` → `Faulted`; the subsequent `MarkClosed` then overwrites `Faulted` → `Closed` while `_finalFault` is preserved. The behaviour is consistent with the docs ("Closing only allows a transition to Closed or Faulted") but the test bundle does not pin it, and a future tightening of `MarkFaulted` could silently regress.
**Recommendation:** Extend `GatewayGrpcScopeResolverTests.ResolveRequiredScope_KnownRpcRequest_ReturnsExpectedScope` with `[InlineData(typeof(WatchDeployEventsRequest), GatewayScopes.MetadataRead)]` and extend the command theory with `[InlineData(MxCommandKind.ReadBulk, GatewayScopes.InvokeRead)]`. Add a `GatewaySessionTests.MarkFaulted_DuringInFlightClose_PreservesFaultButYieldsToClose` case using `BlockingShutdownWorkerClient` to park `CloseAsync`, call `MarkFaulted` while parked, release the worker, and assert `State == Closed && FinalFault == "<the fault reason>"`.
**Resolution:** 2026-05-20 — Added `[InlineData(typeof(WatchDeployEventsRequest), GatewayScopes.MetadataRead)]` to `GatewayGrpcScopeResolverTests.ResolveRequiredScope_KnownRpcRequest_ReturnsExpectedScope` (the `ReadBulk` arm was already present); added `GatewaySessionTests.MarkFaulted_DuringInFlightClose_PreservesFaultButYieldsToClose` covering the parked-close + `MarkFaulted` interleave and asserting the post-release state is `Closed` with `FinalFault = "concurrent-fault"`.
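
The interleave that the new test pins can be modelled in a few lines. The Python sketch below is a simplified, single-threaded stand-in for `GatewaySession`, written only to make the state transitions concrete: the class, the behaviour of `try_begin_close`, and the absence of locking are assumptions, and only the `mark_faulted` / `mark_closed` rules follow the description in this finding.

```python
from enum import Enum


class SessionState(Enum):
    READY = "Ready"
    CLOSING = "Closing"
    FAULTED = "Faulted"
    CLOSED = "Closed"


class SessionSketch:
    """Hypothetical model of the state machine; no locking, for illustration only."""

    def __init__(self):
        self.state = SessionState.READY
        self.final_fault = None

    def try_begin_close(self):
        # Assumption: a close already in progress or finished is not restarted.
        if self.state in (SessionState.CLOSING, SessionState.CLOSED):
            return False
        self.state = SessionState.CLOSING
        return True

    def mark_faulted(self, reason):
        # Only Closed is terminal for faults, so Closing can still become Faulted.
        if self.state is SessionState.CLOSED:
            return
        self.state = SessionState.FAULTED
        self.final_fault = reason

    def mark_closed(self):
        # Close wins the state, but the recorded fault reason is preserved.
        self.state = SessionState.CLOSED


# Usage: fault arrives while the close is parked between begin and finish.
session = SessionSketch()
session.try_begin_close()
session.mark_faulted("concurrent-fault")
session.mark_closed()
```

After this sequence the sketch ends in `Closed` with `final_fault` still set to `"concurrent-fault"`, which is the same pair of assertions the finding asks the real test to make.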

### Server-029

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:52-58` |
| Status | Resolved |

**Description:** `OpenSession` advertises capabilities the gateway supports so clients can branch on them. The current list is `unary-open-session`, `unary-close-session`, `unary-invoke`, `server-stream-events`, `bulk-subscribe-commands`, `unary-acknowledge-alarm`, `server-stream-active-alarms`. The `bulk-subscribe-commands` token was added for the `AddItemBulk` / `AdviseItemBulk` / `RemoveItemBulk` / `UnAdviseItemBulk` / `SubscribeBulk` / `UnsubscribeBulk` family. The subsequent `ReadBulk` and `WriteBulk` / `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk` families landed without a corresponding capability token — the contract advertises bulk-subscribe support but is silent on bulk-read and bulk-write. A defensive client that gates on `bulk-write-commands` before issuing a `WriteBulk` has no signal that the family is supported; current clients sidestep this by ignoring the list entirely, but that just shifts the failure mode (an old client against a new server, or vice versa, will see `Unimplemented` instead of a structured `Capabilities` mismatch).

**Recommendation:** Either (a) extend the advertised list with `bulk-read-commands` and `bulk-write-commands` (`WriteBulk` / `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk` collectively), or (b) document in `gateway.md` and `docs/Contracts.md` that `Capabilities` is informational only and not the contract version. Option (a) is the simplest forward-compatible fix and keeps the capability token shape clients are already familiar with.

**Resolution:** 2026-05-20 — Extended the `OpenSession` capabilities list with `bulk-read-commands` and `bulk-write-commands` alongside the existing `bulk-subscribe-commands` token, so clients that gate on capability strings have an explicit signal for the bulk-read and bulk-write families.
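
A defensive client could use the new tokens roughly as sketched below. This Python fragment is illustrative only: the token strings come from this finding, but `pick_write_path`, the set literal's ordering, and the "per-item" fallback are assumptions and not part of any shipped client.

```python
# Capability tokens as listed in this finding, plus the two added by the fix.
ADVERTISED_CAPABILITIES = frozenset({
    "unary-open-session",
    "unary-close-session",
    "unary-invoke",
    "server-stream-events",
    "bulk-subscribe-commands",
    "bulk-read-commands",
    "bulk-write-commands",
    "unary-acknowledge-alarm",
    "server-stream-active-alarms",
})


def pick_write_path(capabilities):
    """Choose a write strategy from the capability strings the server advertised."""
    if "bulk-write-commands" in capabilities:
        return "bulk"
    # Hypothetical fallback: issue one write per item against an older server.
    return "per-item"


# Usage: a new server advertises the token; an older one does not.
new_server_path = pick_write_path(ADVERTISED_CAPABILITIES)
old_server_path = pick_write_path({"unary-invoke", "bulk-subscribe-commands"})
```

Gating on the token lets the client pick a fallback before sending anything, instead of discovering the gap through an `Unimplemented` status.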