Compare commits

...

7 Commits

Author SHA1 Message Date
Joseph Doherty a68f0cf222 code-review: regen README, all 21 open findings resolved
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open
findings from the d692232 re-review are now Resolved:

  Server          327e9c5  Server-031, -032, -038, -039, -040, -041, -042, -043
  Contracts       bd1d1f1  Contracts-016, -017
  Tests           d48099f  Tests-025, -026
  IntegrationTests 865c22a IntegrationTests-022, -023, -024
  Client.Java     10bd0c0  Client.Java-027, -028, -029, -030, -031
  Client.Rust     83eba4b  Client.Rust-021

Also fixes stale "Open findings" header counts in Client.Java and
IntegrationTests findings.md that survived the resolve passes
(the agents updated each finding's Status but missed the header
sum). `regen-readme.py --check` is now green.

Module status: 11 / 11 reviewed, 0 / 276 total open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:39 -04:00
Joseph Doherty 83eba4bec5 Resolve Client.Rust-021
Client.Rust-021 (Design adherence): RustClientDesign.md "Crate layout"
section now describes the actual flat workspace structure instead of
the aspirational nested form. The replacement text states that the
workspace root is clients/rust/, the top-level crate
zb-mom-ww-mxgateway-client is declared in clients/rust/Cargo.toml
directly, and crates/mxgw-cli/ is the sole [workspace.members] entry.
The accompanying tree lists the real files on disk (Cargo.toml,
Cargo.lock, build.rs, README.md, RustClientDesign.md,
src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs
plus the src/generated/ tonic-build output dir, tests/, and
crates/mxgw-cli/).

Doc-only change. cargo build --workspace + cargo test --workspace clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:07 -04:00
Joseph Doherty 10bd0c0e4d Resolve Client.Java-027..031
Client.Java-027 (Documentation): Updated 17 Gradle task references in
clients/java/README.md (lines 37, 108-110, 160-161, 169-176, 186, 206,
221) and 3 in clients/java/JavaClientDesign.md from the retired short
subproject names to the canonical zb-mom-ww-mxgateway-client /
zb-mom-ww-mxgateway-cli names. Copy-pasting any documented command now
matches the subproject names declared in settings.gradle.

Client.Java-028 (Design adherence): Build-layout block in
JavaClientDesign.md lines 23-27 updated to show the actual package
paths com/zb/mom/ww/mxgateway/{client,cli}/ instead of the retired
com/dohertylan/mxgateway/{client,cli}/ paths.

Client.Java-029 (Documentation): README.md line 210 corrected from
"zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli" to
"zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli" — Gradle
installDist produces a directory whose name matches the project name,
not the short suffix. The e2e script already used the correct path.

Client.Java-030 (Testing coverage): Added
queryActiveAlarmsForwardsRequestAndStreamsSnapshots to
MxGatewayClientSessionTests. The test pushes a QueryActiveAlarmsRequest
carrying session_id / client_correlation_id / alarm_filter_prefix
through an InProcessGateway + TestGatewayService and asserts the server
observed all three request fields, two ActiveAlarmSnapshots stream in
order, and onError is never called. TDD red→green confirmed via a
deliberately-wrong session_id assertion. The re-triage note in
Client.Java-030's resolution clarifies that the finding's reference to
"the existing acknowledgeAlarm test" was aspirational — the alarm RPC
surface had zero coverage before this commit.

Client.Java-031 (Conventions): README.md prose lines 17, 22, 26 updated
to use the canonical zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli
names so the layout description matches Gradle / IDE project names.

Verification: gradle build BUILD SUCCESSFUL; all Java unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:06 -04:00
Joseph Doherty 865c22a884 Resolve IntegrationTests-022..024
IntegrationTests-022 (Conventions): ResolveRepositoryRoot now throws
InvalidOperationException when the walk exhausts without finding a root
marker, with a message naming the start directory, the expected markers
(src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE
escape hatch. Replaces the silent fallback to
Directory.GetCurrentDirectory() that previously masked misconfiguration.
New regression test
ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers
in IntegrationTestEnvironmentTests asserts the throw and the message
contents. TDD red→green confirmed.

IntegrationTests-023 (Testing coverage): DashboardLdapLiveTests's
AuthenticateAsync_AdminInGwAdminGroup_Succeeds now asserts that the
authenticated principal carries a ClaimTypes.Role claim with value
DashboardRoles.Admin in addition to the existing LdapGroupClaimType
assertion. A regression in MapGroupsToRoles (returning an empty list or
missing the RDN fallback) would now surface here. Gated by
MXGATEWAY_RUN_LIVE_LDAP_TESTS.

IntegrationTests-024 (Conventions): Option (b) — extracted within
IntegrationTests. New file TestSupport/NullDashboardEventBroadcaster.cs
(public type, private ctor, singleton Instance). The inline class at
the bottom of WorkerLiveMxAccessSmokeTests is gone; the file now imports
the shared type. Matches the unit-test project's Tests-007 / Tests-021 /
Tests-025 pattern while keeping the two test projects independently
buildable (no shared test-helpers project crossing module boundaries).

Verification: dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests
clean; 19/19 integration tests passing (live MxAccess + LDAP + Galaxy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty d48099f0d0 Resolve Tests-025..026
Tests-025 (Conventions): Extracted the previously-duplicated
NullDashboardEventBroadcaster into TestSupport/NullDashboardEventBroadcaster.cs
(singleton Instance, private ctor). The two nested copies in
EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests were
removed; both files now use the shared type via
'using ZB.MOM.WW.MxGateway.Tests.TestSupport;'. The Server-041 regression
test's ThrowingDashboardEventBroadcaster is intentionally left nested —
single-file usage doesn't warrant promotion to TestSupport. The third
copy in IntegrationTests/WorkerLiveMxAccessSmokeTests was handled by
IntegrationTests-024 in its own commit.

Tests-026 (Testing coverage): Added a new RecordingDashboardEventBroadcaster
test double in TestSupport — a thread-safe (ConcurrentQueue<DashboardEventCapture>)
recorder. New fixture
StreamEventsAsync_PublishesEachEventToDashboardBroadcaster in
EventStreamServiceTests pushes two events through the fake session and
asserts the broadcaster received both with the correct sessionId and
WorkerSequence. TDD red→green confirmed: the deliberately-wrong
"Expected 3, Actual 2" red phase proved the recording fake was actually
invoked by the production code path.

Verification: 486/486 server tests passing (485 previous + 1 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty bd1d1f1c0e Resolve Contracts-016..017
Contracts-016 (Conventions): QueryActiveAlarmsRequest.session_id header
replaced with the unambiguous "Clients may leave session_id empty; the
gateway currently ignores it and serves the session-less central-monitor
cache. A future version may use it to scope the snapshot to one
session." Removes the ambiguity that the prior "reserved for future
use" wording introduced.

Contracts-017 (Documentation): The rpc QueryActiveAlarms comment now
includes the alarm_filter_prefix description: "QueryActiveAlarmsRequest.alarm_filter_prefix
optionally narrows the snapshot to alarms whose alarm_full_reference
starts with the given prefix; an empty prefix returns the full set."

Both are proto-comment-only changes — no wire-format impact, no field
renumbering, and the regenerated MxaccessGateway.cs / MxaccessGatewayGrpc.cs
carry only the doc-comment delta. Added the additive-only regression
guard QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter
to ProtobufContractRoundTripTests — pins
session_id=1 / client_correlation_id=2 / alarm_filter_prefix=3 by
descriptor lookup and round-trips the message with and without the
filter populated.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean;
ProtobufContractRoundTripTests 40/40 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:13 -04:00
Joseph Doherty 327e9c5f94 Resolve Server-031..032 (re-triaged) + Server-038..043
Server-031: re-triaged. The recommended gateway-side
"skip-while-command-in-flight" guard is already in place at
WorkerClient.HeartbeatLoopAsync via WorkerClientOptions.HeartbeatStuckCeiling
(default 75s = 5× HeartbeatGrace). Two regression tests pin the
behaviour. Recommendation #1 (decouple worker-side _writeLock) is a
Worker-module concern (Worker-017 / Worker-023) and out of scope here.

Server-032: re-triaged. Recommendation #2 (rich diagnostic) is already
in EnqueueWorkerEventAsync, with #3 (overflow grace) absorbed by the
TryWrite → WriteAsync-with-timeout fall-through. Test
EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic
pins the diagnostic string. Recommendation #1 (prose contract in
gateway.md / docs) is deferred — outside this pass's edit scope.

Server-038 (Security): EventsHub.SubscribeSession's missing per-session
ACL is documented with a TODO(per-session-acl) and a <remarks> block
explaining the v1 acceptance (any dashboard role can subscribe to any
session — non-secret metadata, redacted value logging). The per-session
ACL design lands in a follow-up once a session-scoped role exists.

Server-039 (Error handling): HubTokenService.Validate now rejects a
deserialized payload where both Name and NameIdentifier are null/empty.
New test file HubTokenServiceTests.cs covers the regression and five
sanity cases. TDD confirmed.

Server-040 (Conventions): MapGroupsToRoles gains a precedence comment
explaining "full literal match first, leading-RDN fallback;
OrdinalIgnoreCase via DashboardOptions.GroupToRole". Documentation-only.

Server-041 (Design adherence): EventStreamService.ProduceEventsAsync
wraps the broadcaster.Publish call in try/catch (Exception). The
producer loop and gRPC stream are no longer at the mercy of the
broadcaster's never-throw discipline. New regression test
StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession.

Server-042 (Performance): DashboardSnapshotPublisher.ExecuteAsync now
mirrors AlarmsHubPublisher's reconnect loop — wraps the await foreach
in a while-not-cancelled, catches general exceptions, and Task.Delays
5s before retrying. An internal ctor accepts a shorter delay for the
test. New test file DashboardSnapshotPublisherTests.cs covers the
throw-then-yield reconnect path and the normal-completion case.

Server-043 (Documentation): HubTokenService class XML doc gains a
<remarks> describing the singleton lifetime, the two consumer scopes
(DashboardHubConnectionFactory scoped, HubTokenAuthenticationHandler
transient), and the thread-safety contract.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean
(0 warnings / 0 errors); src/ZB.MOM.WW.MxGateway.Tests 486/486 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:18:52 -04:00
33 changed files with 1011 additions and 175 deletions
+9 -9
View File
@@ -20,11 +20,11 @@ clients/java/
src/main/generated/
zb-mom-ww-mxgateway-client/
build.gradle
src/main/java/com/dohertylan/mxgateway/client/
src/test/java/com/dohertylan/mxgateway/client/
src/main/java/com/zb/mom/ww/mxgateway/client/
src/test/java/com/zb/mom/ww/mxgateway/client/
zb-mom-ww-mxgateway-cli/
build.gradle
src/main/java/com/dohertylan/mxgateway/cli/
src/main/java/com/zb/mom/ww/mxgateway/cli/
```
Alternative Maven layout is acceptable if the repo standardizes on Maven.
@@ -192,8 +192,8 @@ stream for bounded time, and close.
Publish library and CLI separately:
- `mxgateway-client` jar,
- `mxgateway-cli` runnable distribution.
- `zb-mom-ww-mxgateway-client` jar,
- `zb-mom-ww-mxgateway-cli` runnable distribution.
Generated protobuf code should be produced during the build from shared proto
files and should not be hand-edited.
@@ -206,10 +206,10 @@ Run the Java scaffold checks from `clients/java`:
gradle test
```
The `mxgateway-client` project generates the gateway and worker protobuf/gRPC
bindings into `src/main/generated`, compiles the generated contracts, and runs
JUnit 5 tests. The `mxgateway-cli` project builds a Picocli-based `mxgw-java`
entry point for later command implementation.
The `zb-mom-ww-mxgateway-client` project generates the gateway and worker
protobuf/gRPC bindings into `src/main/generated`, compiles the generated
contracts, and runs JUnit 5 tests. The `zb-mom-ww-mxgateway-cli` project
builds a Picocli-based `mxgw-java` entry point for later command implementation.
## Related Documentation
+24 -23
View File
@@ -14,18 +14,19 @@ clients/java/
zb-mom-ww-mxgateway-cli/
```
`mxgateway-client` generates Java protobuf and gRPC sources from
`zb-mom-ww-mxgateway-client` generates Java protobuf and gRPC sources from
`../../src/ZB.MOM.WW.MxGateway.Contracts/Protos`. The Gradle protobuf plugin writes those
generated sources under `src/main/generated`, which matches the client proto
manifest in `../proto/proto-inputs.json`. Do not edit generated files by hand.
`mxgateway-client` exposes `MxGatewayClientOptions`, `MxGatewayClient`,
`zb-mom-ww-mxgateway-client` exposes `MxGatewayClientOptions`, `MxGatewayClient`,
`MxGatewaySession`, value/status helpers, typed gateway exceptions, raw
generated stubs, and generated protobuf messages for parity tests.
`mxgateway-cli` depends on `mxgateway-client` and provides the `mxgw-java`
application entry point. The CLI supports version, session, command, event
streaming, write, and smoke-test commands with deterministic JSON output.
`zb-mom-ww-mxgateway-cli` depends on `zb-mom-ww-mxgateway-client` and provides
the `mxgw-java` application entry point. The CLI supports version, session,
command, event streaming, write, and smoke-test commands with deterministic
JSON output.
## Regenerating Protobuf Bindings
@@ -33,7 +34,7 @@ Run generation from `clients/java` after the shared `.proto` files or Java
output path changes:
```powershell
gradle :mxgateway-client:generateProto
gradle :zb-mom-ww-mxgateway-client:generateProto
```
## Client Usage
@@ -104,9 +105,9 @@ The CLI exposes matching subcommands: `galaxy-test`, `galaxy-deploy-time`,
`--timeout`, and `--json` options as the gateway commands.
```powershell
gradle :mxgateway-cli:run --args="galaxy-test --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :mxgateway-cli:run --args="galaxy-deploy-time --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :mxgateway-cli:run --args="galaxy-discover --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="galaxy-test --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="galaxy-deploy-time --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="galaxy-discover --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
```
### Watching deploy events
@@ -156,8 +157,8 @@ The matching CLI subcommand streams events until cancelled (Ctrl+C) and prints
one line per event in text mode or one JSON object per event with `--json`:
```powershell
gradle :mxgateway-cli:run --args="galaxy-watch --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :mxgateway-cli:run --args="galaxy-watch --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --last-seen-deploy-time 2026-04-28T18:30:00Z --limit 5"
gradle :zb-mom-ww-mxgateway-cli:run --args="galaxy-watch --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="galaxy-watch --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --last-seen-deploy-time 2026-04-28T18:30:00Z --limit 5"
```
## CLI Usage
@@ -165,14 +166,14 @@ gradle :mxgateway-cli:run --args="galaxy-watch --endpoint localhost:5000 --api-k
Run the CLI through Gradle:
```powershell
gradle :mxgateway-cli:run --args="version --json"
gradle :mxgateway-cli:run --args="open-session --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --client-session-name java-cli --json"
gradle :mxgateway-cli:run --args="register --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --client-name java-cli --json"
gradle :mxgateway-cli:run --args="add-item --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item TestObject.TestInt --json"
gradle :mxgateway-cli:run --args="advise --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item-handle 1 --json"
gradle :mxgateway-cli:run --args="write --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123 --json"
gradle :mxgateway-cli:run --args="stream-events --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --limit 1 --json"
gradle :mxgateway-cli:run --args="smoke --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --item TestObject.TestInt --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="version --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="open-session --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --client-session-name java-cli --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="register --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --client-name java-cli --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="add-item --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item TestObject.TestInt --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="advise --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item-handle 1 --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="write --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123 --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="stream-events --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --limit 1 --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="smoke --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --item TestObject.TestInt --json"
```
The CLI accepts `--api-key`, `--api-key-env`, `--plaintext`, `--ca-file`,
@@ -182,7 +183,7 @@ output redacts API keys.
Use TLS options for a secured gateway:
```powershell
gradle :mxgateway-cli:run --args="smoke --endpoint mxgateway.example.local:5001 --ca-file C:\certs\mxgateway-ca.pem --server-name-override mxgateway.example.local --api-key-env MXGATEWAY_API_KEY --item TestObject.TestInt --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="smoke --endpoint mxgateway.example.local:5001 --ca-file C:\certs\mxgateway-ca.pem --server-name-override mxgateway.example.local --api-key-env MXGATEWAY_API_KEY --item TestObject.TestInt --json"
```
## Build And Test
@@ -202,11 +203,11 @@ in-process gRPC behavior, stream cancellation, and CLI parser/output behavior.
Create local library and CLI artifacts from `clients/java`:
```powershell
gradle :mxgateway-client:jar :mxgateway-cli:installDist
gradle :zb-mom-ww-mxgateway-client:jar :zb-mom-ww-mxgateway-cli:installDist
```
The library jar is under `zb-mom-ww-mxgateway-client/build/libs`. The installed CLI
distribution is under `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli`.
distribution is under `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`.
## Integration Checks
@@ -217,7 +218,7 @@ $env:MXGATEWAY_INTEGRATION = '1'
$env:MXGATEWAY_ENDPOINT = 'localhost:5000'
$env:MXGATEWAY_API_KEY = '<gateway-api-key>'
$env:MXGATEWAY_TEST_ITEM = 'TestObject.TestInt'
gradle :mxgateway-cli:run --args="smoke --endpoint $env:MXGATEWAY_ENDPOINT --plaintext --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="smoke --endpoint $env:MXGATEWAY_ENDPOINT --plaintext --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json"
```
## Related Documentation
@@ -354,6 +354,9 @@ public final class MxAccessGatewayGrpc {
* reconnect to seed Part 9 client state, or to reconcile alarms that may
* have been missed during a transport blip. Streamed so callers can
* begin processing without buffering the full set.
* `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
* snapshot to alarms whose `alarm_full_reference` starts with the given
* prefix; an empty prefix returns the full set.
* </pre>
*/
default void queryActiveAlarms(mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request,
@@ -457,6 +460,9 @@ public final class MxAccessGatewayGrpc {
* reconnect to seed Part 9 client state, or to reconcile alarms that may
* have been missed during a transport blip. Streamed so callers can
* begin processing without buffering the full set.
* `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
* snapshot to alarms whose `alarm_full_reference` starts with the given
* prefix; an empty prefix returns the full set.
* </pre>
*/
public void queryActiveAlarms(mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest request,
@@ -545,6 +551,9 @@ public final class MxAccessGatewayGrpc {
* reconnect to seed Part 9 client state, or to reconcile alarms that may
* have been missed during a transport blip. Streamed so callers can
* begin processing without buffering the full set.
* `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
* snapshot to alarms whose `alarm_full_reference` starts with the given
* prefix; an empty prefix returns the full set.
* </pre>
*/
@io.grpc.ExperimentalApi("https://github.com/grpc/grpc-java/issues/10918")
@@ -632,6 +641,9 @@ public final class MxAccessGatewayGrpc {
* reconnect to seed Part 9 client state, or to reconcile alarms that may
* have been missed during a transport blip. Streamed so callers can
* begin processing without buffering the full set.
* `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
* snapshot to alarms whose `alarm_full_reference` starts with the given
* prefix; an empty prefix returns the full set.
* </pre>
*/
public java.util.Iterator<mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot> queryActiveAlarms(
@@ -1995,9 +1995,10 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
}
/**
* <pre>
* Public request shape for QueryActiveAlarms. session_id is currently unused
* (the snapshot is session-less) but reserved so a future per-session view
* can be added without a wire break.
* Public request shape for QueryActiveAlarms.
* Clients may leave `session_id` empty; the gateway currently ignores it and
* serves the session-less central-monitor cache. A future version may use it
* to scope the snapshot to one session.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.QueryActiveAlarmsRequest}
@@ -2344,9 +2345,10 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
}
/**
* <pre>
* Public request shape for QueryActiveAlarms. session_id is currently unused
* (the snapshot is session-less) but reserved so a future per-session view
* can be added without a wire break.
* Public request shape for QueryActiveAlarms.
* Clients may leave `session_id` empty; the gateway currently ignores it and
* serves the session-less central-monitor cache. A future version may use it
* to scope the snapshot to one session.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.QueryActiveAlarmsRequest}
@@ -2,6 +2,7 @@ package com.zb.mom.ww.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
@@ -23,7 +24,9 @@ import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.ActiveAlarmSnapshot;
import mxaccess_gateway.v1.MxaccessGateway.AddItemReply;
import mxaccess_gateway.v1.MxaccessGateway.AlarmConditionState;
import mxaccess_gateway.v1.MxaccessGateway.BulkSubscribeReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionRequest;
@@ -35,6 +38,7 @@ import mxaccess_gateway.v1.MxaccessGateway.OpenSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.OpenSessionRequest;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatus;
import mxaccess_gateway.v1.MxaccessGateway.ProtocolStatusCode;
import mxaccess_gateway.v1.MxaccessGateway.QueryActiveAlarmsRequest;
import mxaccess_gateway.v1.MxaccessGateway.RegisterReply;
import mxaccess_gateway.v1.MxaccessGateway.SessionState;
import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
@@ -179,6 +183,91 @@ final class MxGatewayClientSessionTests {
}
}
@Test
void queryActiveAlarmsForwardsRequestAndStreamsSnapshots() throws Exception {
AtomicReference<QueryActiveAlarmsRequest> queryRequest = new AtomicReference<>();
TestGatewayService service = new TestGatewayService() {
@Override
public void queryActiveAlarms(
QueryActiveAlarmsRequest request, StreamObserver<ActiveAlarmSnapshot> responseObserver) {
queryRequest.set(request);
responseObserver.onNext(ActiveAlarmSnapshot.newBuilder()
.setAlarmFullReference("Galaxy!TestArea.TestMachine_001.HiTemp")
.setSourceObjectReference("TestMachine_001")
.setAlarmTypeName("HiAlarm")
.setSeverity(800)
.setCurrentState(AlarmConditionState.ALARM_CONDITION_STATE_ACTIVE)
.setCategory("Process")
.setDescription("High temperature alarm")
.build());
responseObserver.onNext(ActiveAlarmSnapshot.newBuilder()
.setAlarmFullReference("Galaxy!TestArea.TestMachine_002.LoTemp")
.setSourceObjectReference("TestMachine_002")
.setAlarmTypeName("LoAlarm")
.setSeverity(500)
.setCurrentState(AlarmConditionState.ALARM_CONDITION_STATE_ACTIVE_ACKED)
.setOperatorUser("operator-1")
.setOperatorComment("acknowledged")
.build());
responseObserver.onCompleted();
}
};
try (InProcessGateway gateway = InProcessGateway.start(service, new AtomicReference<>());
MxGatewayClient client = gateway.client("", Duration.ofSeconds(5))) {
CountDownLatch completed = new CountDownLatch(1);
java.util.List<ActiveAlarmSnapshot> received = new java.util.ArrayList<>();
AtomicReference<Throwable> errorRef = new AtomicReference<>();
QueryActiveAlarmsRequest request = QueryActiveAlarmsRequest.newBuilder()
.setSessionId("alarm-session")
.setClientCorrelationId("test-corr-1")
.setAlarmFilterPrefix("Galaxy!TestArea")
.build();
MxGatewayActiveAlarmsSubscription subscription = client.queryActiveAlarms(
request,
new StreamObserver<>() {
@Override
public void onNext(ActiveAlarmSnapshot value) {
received.add(value);
}
@Override
public void onError(Throwable t) {
errorRef.set(t);
completed.countDown();
}
@Override
public void onCompleted() {
completed.countDown();
}
});
assertTrue(completed.await(5, TimeUnit.SECONDS));
subscription.close();
assertNotNull(queryRequest.get());
assertEquals("alarm-session", queryRequest.get().getSessionId());
assertEquals("test-corr-1", queryRequest.get().getClientCorrelationId());
assertEquals("Galaxy!TestArea", queryRequest.get().getAlarmFilterPrefix());
assertEquals(2, received.size());
assertEquals("Galaxy!TestArea.TestMachine_001.HiTemp", received.get(0).getAlarmFullReference());
assertEquals(
AlarmConditionState.ALARM_CONDITION_STATE_ACTIVE,
received.get(0).getCurrentState());
assertEquals(800, received.get(0).getSeverity());
assertEquals("Galaxy!TestArea.TestMachine_002.LoTemp", received.get(1).getAlarmFullReference());
assertEquals(
AlarmConditionState.ALARM_CONDITION_STATE_ACTIVE_ACKED,
received.get(1).getCurrentState());
assertEquals("operator-1", received.get(1).getOperatorUser());
assertNull(errorRef.get());
}
}
@Test
void commandFailureKeepsRawReply() throws Exception {
TestGatewayService service = new TestGatewayService() {
+28 -15
View File
@@ -11,25 +11,38 @@ generated contract inputs.
## Crate Layout
Recommended layout:
The workspace is rooted at `clients/rust/`. The top-level crate
`zb-mom-ww-mxgateway-client` is declared by `clients/rust/Cargo.toml` itself
(flat layout — its `src/` sits directly under the workspace root, not nested
inside `crates/`). The only `[workspace.members]` entry is the `mxgw-cli`
binary subcrate under `crates/mxgw-cli/`.
```text
clients/rust/
Cargo.toml
build.rs
crates/
zb-mom-ww-mxgateway-client/
src/lib.rs
src/client.rs
src/session.rs
src/options.rs
src/auth.rs
src/value.rs
src/error.rs
src/generated/
mxgw-cli/
src/main.rs
Cargo.toml # workspace root + top-level crate `zb-mom-ww-mxgateway-client`
Cargo.lock
build.rs # tonic-build proto generation
README.md
RustClientDesign.md
src/
lib.rs
client.rs
session.rs
galaxy.rs
options.rs
auth.rs
error.rs
value.rs
version.rs
generated.rs # `pub mod` wrappers around files under `src/generated/`
generated/ # tonic-build output (not hand-edited)
tests/
client_behavior.rs
proto_fixtures.rs
crates/
mxgw-cli/ # sole workspace member — binary `mxgw`
Cargo.toml
src/main.rs
```
Expected dependencies:
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -451,13 +451,13 @@ documentation updates lag — see the doc-side findings below.
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/java/README.md:36,107-175,185,205,220`, `clients/java/JavaClientDesign.md:195-211` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `397d3c5` renamed the gradle subprojects to `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` in `settings.gradle`, but did not propagate that rename into the README's documented gradle commands or into `JavaClientDesign.md`. Every documented gradle invocation still uses the old short names — `gradle :mxgateway-client:generateProto`, `gradle :mxgateway-cli:run --args=...`, `gradle :mxgateway-client:jar :mxgateway-cli:installDist` — and every one fails with `project 'mxgateway-client' not found in root project 'zb-mom-ww-mxaccessgw-java'`. A user copy-pasting from the README or design doc will hit this on the very first command.
**Recommendation:** Find-and-replace `:mxgateway-client:``:zb-mom-ww-mxgateway-client:` and `:mxgateway-cli:``:zb-mom-ww-mxgateway-cli:` across `clients/java/README.md` and `clients/java/JavaClientDesign.md`. (Roughly 17 occurrences in the README, 3 in the design doc.) Test by running one updated command end-to-end.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Replaced every stale gradle task path in `clients/java/README.md` (line 37 `:mxgateway-client:generateProto`, lines 108110 `:mxgateway-cli:run` galaxy-* invocations, lines 160161 `:mxgateway-cli:run` galaxy-watch invocations, lines 169176 `:mxgateway-cli:run` gateway-command invocations, line 186 `:mxgateway-cli:run` TLS smoke invocation, line 206 `:mxgateway-client:jar :mxgateway-cli:installDist` packaging command, line 221 `:mxgateway-cli:run` integration-check invocation) and in `clients/java/JavaClientDesign.md` (lines 195196 packaging bullets, lines 209212 "Current Build" prose) with their prefixed `:zb-mom-ww-mxgateway-client:` / `:zb-mom-ww-mxgateway-cli:` equivalents. Verified by running `gradle :zb-mom-ww-mxgateway-client:test --tests …` (now the updated documented form) end-to-end and `gradle build`, both BUILD SUCCESSFUL. Doc-only change; no production code touched.
### Client.Java-028
@@ -466,13 +466,13 @@ documentation updates lag — see the doc-side findings below.
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/java/JavaClientDesign.md:23-27` |
| Status | Open |
| Status | Resolved |
**Description:** The build-layout block in `JavaClientDesign.md` still shows the old Java package paths `com/dohertylan/mxgateway/client/` and `com/dohertylan/mxgateway/cli/`. The actual source tree was moved to `com/zb/mom/ww/mxgateway/{client,cli}/` in commit `397d3c5`. Anyone using the design doc to locate or navigate code will look in the wrong place.
**Recommendation:** Update the layout block to reflect the new paths: `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/`. Comment-only change.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Updated the Build Layout block in `clients/java/JavaClientDesign.md:23-27` to show `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/` instead of the old `com/dohertylan/mxgateway/{client,cli}/` package paths, matching the actual on-disk source tree under `clients/java/zb-mom-ww-mxgateway-{client,cli}/src/{main,test}/java/`. Doc-only change; no production code touched.
### Client.Java-029
@@ -481,13 +481,13 @@ documentation updates lag — see the doc-side findings below.
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/README.md:208-209` |
| Status | Open |
| Status | Resolved |
**Description:** The packaging section states "The library jar is under `zb-mom-ww-mxgateway-client/build/libs`. The installed CLI distribution is under `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli`." The library-jar path is correct, but the install-distribution path is wrong — gradle's `installDist` produces a directory whose name matches the project name, not the (now-retired) short name, so the actual path is `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/`. The e2e script (`scripts/run-client-e2e-tests.ps1`) uses the correct path, so the script works; only the README is wrong.
**Recommendation:** Correct the README to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`. Comment-only.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Corrected the install-distribution path in `clients/java/README.md:210` from `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli` to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`, matching what gradle's `installDist` actually produces (the directory name matches the subproject name). The library-jar path on the preceding line was already correct and is unchanged. Doc-only change; the e2e script `scripts/run-client-e2e-tests.ps1` was already using the correct path.
### Client.Java-030
@@ -496,13 +496,13 @@ documentation updates lag — see the doc-side findings below.
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `397d3c5` added the missing `QueryActiveAlarmsRequest` proto message and the corresponding `rpc QueryActiveAlarms` to `mxaccess_gateway.proto`. The Java client now generates the request type and the gRPC stub method, and `MxGatewayClient.queryActiveAlarms` correctly references both. No unit test exercises the new RPC end-to-end on the Java side — the proto compiles, the import resolves, the method is callable, but the absence of a `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` (or similar) fixture means a serialisation regression in `QueryActiveAlarmsRequest` or the streaming reply would not surface in the Java unit tests.
**Recommendation:** Add a unit test in `MxGatewayFixtureTests` (or wherever the alarm fixtures live) that pushes a `QueryActiveAlarmsRequest` through a `FakeMxAccessGateway` and asserts the `ActiveAlarmSnapshot` stream is consumed correctly — mirror the existing `acknowledgeAlarm` test shape.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Re-triaged the recommendation: the suite uses the `InProcessGateway` + `TestGatewayService` fixture in `MxGatewayClientSessionTests` (no separate `FakeMxAccessGateway`), and there is in fact no existing `acknowledgeAlarm` test in the current tree to mirror — the alarm RPC surface has zero coverage. Added `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` to `MxGatewayClientSessionTests` (the same file/fixture style as the other unary/streaming RPC tests): the test overrides `TestGatewayService.queryActiveAlarms` to capture the inbound `QueryActiveAlarmsRequest` and emit two `ActiveAlarmSnapshot` messages (one `ACTIVE`, one `ACTIVE_ACKED` with an operator/comment populated), then calls `MxGatewayClient.queryActiveAlarms` with a request carrying `session_id`, `client_correlation_id`, and `alarm_filter_prefix`. It awaits `onCompleted` via a `CountDownLatch`, closes the subscription, and asserts (a) the server observed all three inbound request fields, (b) both snapshots arrived in order with the expected `alarm_full_reference`/`current_state`/`severity`/`operator_user`, and (c) the observer never received an error. TDD red phase confirmed by temporarily changing the `session_id` assertion to `"WRONG-SESSION-ID"``gradle :zb-mom-ww-mxgateway-client:test --tests "…queryActiveAlarmsForwardsRequestAndStreamsSnapshots"` failed with `AssertionFailedError at MxGatewayClientSessionTests.java:252`. Restoring the assertion turned the build green. Full `gradle build` from `clients/java` is BUILD SUCCESSFUL.
### Client.Java-031
@@ -511,10 +511,10 @@ documentation updates lag — see the doc-side findings below.
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/java/README.md:13,17,26` |
| Status | Open |
| Status | Resolved |
**Description:** The README prose at lines 1326 introduces the subprojects as `mxgateway-client` and `mxgateway-cli` (the old short names) when discussing the layout. Those are no longer the actual subproject names — `settings.gradle` declares `zb-mom-ww-mxgateway-client` / `zb-mom-ww-mxgateway-cli`. The prose works as a naming mnemonic, but it confuses anyone trying to map README descriptions to actual gradle output, IDE project trees, or the e2e script.
**Recommendation:** Either (a) update the prose to the full prefixed names, or (b) clarify in a one-line note: "The subprojects are `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli`; this README refers to them by their short suffixes below for readability." (a) is more honest; (b) preserves readability at the cost of one extra concept.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Took option (a): updated the README layout-section prose in `clients/java/README.md:17,22,26` (the `mxgateway-client` generates… paragraph, the `mxgateway-client` exposes… paragraph, and the `mxgateway-cli` depends on `mxgateway-client`… paragraph) to use the full prefixed `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` names, matching the layout block at lines 1314 and `settings.gradle`. Reflows the final paragraph slightly because the prefixed names push past the existing 80-column wrap; content is unchanged otherwise. Doc-only change.
+3 -3
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 1 |
| Open findings | 0 |
## Checklist coverage
@@ -420,10 +420,10 @@ The CLI integration in Client.Rust-014 works either way; this is solely about de
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-33` |
| Status | Open |
| Status | Resolved |
**Description:** The crate-name change in commit `397d3c5` (top-level `mxgateway-client``zb-mom-ww-mxgateway-client`) is reflected in `Cargo.toml`, `Cargo.lock`, every `use zb_mom_ww_mxgateway_client::` import, and `build.rs`. The "Recommended layout" block in `RustClientDesign.md:21-33` shows a nested structure with a top-level `crates/zb-mom-ww-mxgateway-client/` subdirectory containing `src/lib.rs`, `src/client.rs`, etc. — but the actual layout on disk is flat: the top-level crate lives at `clients/rust/` directly (with `src/`, `Cargo.toml`, and `build.rs` at the workspace root) and `crates/mxgw-cli/` is the only nested member. A reader consulting the design to understand the layout will be misled into looking for `crates/zb-mom-ww-mxgateway-client/` that does not exist. `CLAUDE.md` requires design docs to track code changes.
**Recommendation:** Update `RustClientDesign.md:14-33` to describe the actual flat layout: workspace root at `clients/rust/`, top-level crate `zb-mom-ww-mxgateway-client` (declared in `Cargo.toml`) with `src/lib.rs`, `src/client.rs`, etc. directly under it; `crates/mxgw-cli/` is the single member subcrate. Alternatively label the block as "Aspirational nested layout (not currently adopted)" and add a separate "Current layout" section.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Rewrote the "Crate Layout" section of `clients/rust/RustClientDesign.md` to describe the actual flat layout on disk instead of the aspirational nested form. The section now opens with a short paragraph stating that the workspace is rooted at `clients/rust/`, that `clients/rust/Cargo.toml` declares the top-level crate `zb-mom-ww-mxgateway-client` itself (flat layout — `src/` sits directly under the workspace root, not under `crates/`), and that the only `[workspace.members]` entry is the `mxgw-cli` binary subcrate under `crates/mxgw-cli/`. The accompanying tree replaces the fictitious `crates/zb-mom-ww-mxgateway-client/` subdirectory with the real files: `Cargo.toml`, `Cargo.lock`, `build.rs`, `README.md`, `RustClientDesign.md`, `src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs` plus `src/generated/` (tonic-build output), `tests/{client_behavior,proto_fixtures}.rs`, and `crates/mxgw-cli/` annotated as the sole workspace member producing the `mxgw` binary. `cargo build --workspace` and `cargo test --workspace` are clean at HEAD (29 tests pass across the library, integration, and proto-fixture targets). `cargo clippy --workspace --all-targets -- -D warnings` reports three pre-existing errors at HEAD on this dev box (`options.rs:98,143` missing docs, `galaxy.rs:282` `clone_on_copy`, `session.rs:522` `enum_variant_names`) that exist independently of this doc-only change — verified by running clippy with the design-doc edit stashed; tracking those is out of scope for Client.Rust-021.
+5 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -284,13 +284,13 @@ Python and Go descriptors. No fields renumbered or repurposed.
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:31-41` (`QueryActiveAlarmsRequest`) |
| Status | Open |
| Status | Resolved |
**Description:** The new public message `QueryActiveAlarmsRequest` (added in commit `397d3c5`) has `session_id = 1` with a comment "session_id is currently unused (the snapshot is session-less) but reserved so a future per-session view can be added without a wire break." The field is not marked `reserved` and clients are not told whether to populate it. As shipped today the server-side implementation (`MxAccessGatewayService.QueryActiveAlarms`) ignores it, but a Java/Rust/Go/Python client author reading the proto alone can't tell whether to leave it empty or write the caller's session id.
**Recommendation:** Either (a) tighten the comment to "Clients may leave `session_id` empty; the gateway currently ignores it and serves the session-less central-monitor cache. A future version may use it to scope the snapshot to one session." — making the "currently-ignored" semantic unambiguous — or (b) remove `session_id` and use `reserved 1; reserved "session_id";` until the per-session view actually exists. Option (a) is cheaper and preserves a forward-compat hint.
**Resolution:** _(empty until closed)_
**Resolution:** _(2026-05-24)_ Took the finding's recommended option (a) — comment-only, no wire-format change. Confirmed the ambiguity by reading lines 37-39 of `mxaccess_gateway.proto` at `d2d2e5f`: the prior wording said the field was "reserved" while remaining a populated `string session_id = 1` (the protobuf `reserved` keyword was never used), so a client author could reasonably read it either way. Reworded the header comment on `QueryActiveAlarmsRequest` to verbatim "Clients may leave `session_id` empty; the gateway currently ignores it and serves the session-less central-monitor cache. A future version may use it to scope the snapshot to one session." — making the "currently-ignored" semantic unambiguous, removing the misleading use of "reserved" (which now only describes the protobuf keyword in this file), and preserving the forward-compatibility hint. The field, its number (1), and `client_correlation_id = 2` / `alarm_filter_prefix = 3` are unchanged — the additive-only contract rule is honoured. Added regression test `ProtobufContractRoundTripTests.QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter` which pins the three field numbers (`session_id = 1`, `client_correlation_id = 2`, `alarm_filter_prefix = 3`) by name + number against the descriptor, ensuring a future editor cannot quietly renumber. `dotnet build src/ZB.MOM.WW.MxGateway.slnx` is green (0 warnings, 0 errors); the new test plus the existing 39 `ProtobufContractRoundTripTests` are 40 / 40 green.
### Contracts-017
@@ -299,10 +299,10 @@ Python and Go descriptors. No fields renumbered or repurposed.
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:23-29` (the `rpc QueryActiveAlarms` block) |
| Status | Open |
| Status | Resolved |
**Description:** The RPC comment on `QueryActiveAlarms` describes the stream order ("Point-in-time snapshot of the currently-active alarm set served from the gateway's always-on alarm monitor cache") and the session-less semantic, but does not mention that `QueryActiveAlarmsRequest.alarm_filter_prefix` narrows the snapshot by a `StartsWith(reference)` match on `alarm_full_reference`. A client author reading the RPC comment alone cannot discover the filter capability without inspecting the request message.
**Recommendation:** Extend the RPC comment with one line: "`QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the snapshot to alarms whose `alarm_full_reference` starts with the given prefix; an empty prefix returns the full set." Comment-only.
**Resolution:** _(empty until closed)_
**Resolution:** _(2026-05-24)_ Confirmed against `mxaccess_gateway.proto` at `d2d2e5f`: lines 29-34 of the `rpc QueryActiveAlarms` block describe the snapshot order and session-less origin but never mention `alarm_filter_prefix`, which is only documented on the request message itself two lines below — a client author reading the RPC comment alone cannot discover the filter capability. Extended the RPC comment with the finding's recommended one-line addition verbatim: "`QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the snapshot to alarms whose `alarm_full_reference` starts with the given prefix; an empty prefix returns the full set." Comment-only change; the RPC signature, request/reply types, and field numbers are unchanged. The same regression test added under Contracts-016 (`ProtobufContractRoundTripTests.QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter`) also exercises a round-trip of `QueryActiveAlarmsRequest` with `alarm_filter_prefix` populated and pins `QueryActiveAlarms` on the public `MxAccessGateway` service surface, so the RPC + filter field combination the new comment documents is wire-stable. `dotnet build src/ZB.MOM.WW.MxGateway.slnx` is green (0 warnings, 0 errors).
+7 -7
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 3 |
| Open findings | 0 |
## Checklist coverage
@@ -424,13 +424,13 @@ The Write parity test (IntegrationTests-012's resolution) added exactly this ass
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) |
| Status | Open |
| Status | Resolved |
**Description:** The walker introduced in `dc9c0c9` searches parents for a directory containing `src/` plus either `.git` or a `*.slnx`/`*.sln` file, falling back to `Directory.GetCurrentDirectory()` when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a `.git` or `.slnx` marker silently uses the wrong root and then the live `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current `bin/` layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.
**Recommendation:** Throw a clear `InvalidOperationException` from `ResolveRepositoryRoot` when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (`MXGATEWAY_LIVE_MXACCESS_WORKER_EXE`) remains the escape hatch for unusual deployments.
**Resolution:** _(empty until closed)_
**Resolution:** Resolved 2026-05-24 — Replaced the silent `Directory.GetCurrentDirectory()` fallback in `IntegrationTestEnvironment.ResolveRepositoryRoot` with an `InvalidOperationException` whose message names the start directory, the expected markers (`src/`, `.git`, `*.sln`, `*.slnx`), and the `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` escape hatch. Added regression test `IntegrationTestEnvironmentTests.ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers` that creates an isolated temp directory under `Path.GetTempPath()` containing no markers, asserts the throw, and asserts the message carries the start directory, `.git`, `.sln`, and the worker-exe env-var name so a future refactor that dropped any of these from the diagnostic would fail loudly. TDD red/green confirmed: the new test fails against the pre-fix walker (`Assert.Throws() Failure: No exception was thrown`) and passes after the throw is added. The existing `ResolveRepositoryRoot_AcceptsGitWorktreeFile` test still passes, so the happy path is unaffected.
### IntegrationTests-023
@@ -439,13 +439,13 @@ The Write parity test (IntegrationTests-012's resolution) added exactly this ass
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` |
| Status | Open |
| Status | Resolved |
**Description:** `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the principal carries an `LdapGroupClaimType` claim containing `GwAdmin` (line 26-28). After the `27ed651` refactor, `DashboardAuthenticator.CreatePrincipal` also emits a `ClaimTypes.Role` claim with the mapped role (`Admin` for the seeded `GwAdmin``Admin` mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting `Role: Admin` (e.g. a future refactor of `MapGroupsToRoles` that returned an empty list) would silently pass.
**Recommendation:** Add an `Assert.Contains` asserting the principal holds a `ClaimTypes.Role` claim with value `DashboardRoles.Admin` (or `"Admin"`). Mirror the style of the existing `LdapGroupClaimType` assertion.
**Resolution:** _(empty until closed)_
**Resolution:** Resolved 2026-05-24 — Added an `Assert.Contains` to `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` that mirrors the existing `LdapGroupClaimType` assertion style, asserting the principal carries a `ClaimTypes.Role` claim with value `DashboardRoles.Admin`. Couples the live test to the `GwAdmin``Admin` mapping wired through `CreateAuthenticator`'s `GroupToRole` dictionary, so a future regression to `MapGroupsToRoles` (returning an empty list, dropping the RDN-fallback lookup, breaking the case-insensitive comparer) would now surface as a failed live test rather than a silent pass. The test is gated by `MXGATEWAY_RUN_LIVE_LDAP_TESTS`; build is green and the test continues to skip cleanly when the env var is unset.
### IntegrationTests-024
@@ -454,10 +454,10 @@ The Write parity test (IntegrationTests-012's resolution) added exactly this ass
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) |
| Status | Open |
| Status | Resolved |
**Description:** The inline `NullDashboardEventBroadcaster` private class is the third copy in the repository (the other two live in `EventStreamServiceTests` and `GatewayEndToEndFakeWorkerSmokeTests` under the Tests module — see Tests-025). Each carries a singleton `Instance` field and a no-op `Publish`. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.
**Recommendation:** When Tests-025 lands the shared `TestSupport/NullDashboardEventBroadcaster.cs`, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.
**Resolution:** _(empty until closed)_
**Resolution:** Resolved 2026-05-24 — Chose option (b) (extract within IntegrationTests) so the unit-test and integration-test projects stay independently buildable without a shared test-helpers project. Moved the inline private class out of `WorkerLiveMxAccessSmokeTests.cs` into a new public type at `src/ZB.MOM.WW.MxGateway.IntegrationTests/TestSupport/NullDashboardEventBroadcaster.cs` (namespace `ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport`), mirroring the layout the unit-test Tests project established for Tests-007/Tests-021. The fixture wires through the same `NullDashboardEventBroadcaster.Instance` singleton, but the constructor is now private so future integration tests can rely on a single shared no-op. A new `using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport;` in `WorkerLiveMxAccessSmokeTests.cs` is the only change to its `EventStreamService` wiring. Build is green (0 warnings, 0 errors); no regression in the existing tests (4 passed / 15 skipped, identical to the pre-change baseline).
+28 -29
View File
@@ -12,13 +12,13 @@ Each module's `findings.md` is the source of truth; this file is generated from
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 17 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 21 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 5 | 31 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 31 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 21 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 1 | 21 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 2 | 17 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 3 | 24 |
| [Server](Server/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 8 | 43 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 2 | 26 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 21 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 17 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 24 |
| [Server](Server/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 43 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 26 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 25 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-24 | `d692232` | Reviewed | 0 | 30 |
@@ -26,29 +26,7 @@ Each module's `findings.md` is the source of truth; this file is generated from
Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Java-027 | Medium | Documentation & comments | `clients/java/README.md:36,107-175,185,205,220`, `clients/java/JavaClientDesign.md:195-211` | Commit `397d3c5` renamed the gradle subprojects to `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` in `settings.gradle`, but did not propagate that rename into the README's documented gradle commands or into `JavaClientDesign.md… |
| Client.Java-028 | Medium | Documentation & comments | `clients/java/JavaClientDesign.md:23-27` | The build-layout block in `JavaClientDesign.md` still shows the old Java package paths `com/dohertylan/mxgateway/client/` and `com/dohertylan/mxgateway/cli/`. The actual source tree was moved to `com/zb/mom/ww/mxgateway/{client,cli}/` in c… |
| Server-031 | Medium | Concurrency & thread safety | `src/MxGateway.Server/Workers/WorkerClient.cs:392-422` (gateway-side heartbeat watchdog); `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:588-617` (worker-side heartbeat loop); `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:14,67-76` (shared `_writeLock`) | Surfaced during the 2026-05-20 cross-language e2e re-run against gateway `b794c46`. The .NET phase succeeded through `open-session`/`register`/`bulk-subscribe`/`bulk-read`/`bulk-unsubscribe`/`stream-events`/`write` but then failed on its t… |
| Server-032 | Medium | Error handling & resilience | `src/MxGateway.Server/Workers/WorkerClient.cs:70-77,463-484` (gateway-side `_events` channel); `src/MxGateway.Server/Configuration/EventOptions.cs:8` (default capacity 10,000); `src/MxGateway.Server/Grpc/EventStreamService.cs` (consumer) | Surfaced during the 2026-05-20 cross-language e2e re-run against gateway `b794c46`. The Java phase advised ~55 items (`item-handle 63`) before failing on the next `advise` call with the Server-030 diagnostic `Session ... is not ready. Sess… |
| Server-038 | Medium | Security | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs:23-44` | `EventsHub` is gated by `[Authorize(Policy = DashboardAuthenticationDefaults.HubClientsPolicy)]`, which checks only that the caller carries a dashboard role (Admin or Viewer). `SubscribeSession(sessionId)` accepts any non-empty session id… |
| Tests-026 | Medium | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs`, `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126` | The new `IDashboardEventBroadcaster` is wired into `EventStreamService` at line 123 (commit `d692232`) and the broadcaster's `Publish` is the only path that mirrors per-session events into the dashboard `EventsHub`. The unit tests inject `… |
| Client.Java-029 | Low | Documentation & comments | `clients/java/README.md:208-209` | The packaging section states "The library jar is under `zb-mom-ww-mxgateway-client/build/libs`. The installed CLI distribution is under `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli`." The library-jar path is correct, but the instal… |
| Client.Java-030 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/` | Commit `397d3c5` added the missing `QueryActiveAlarmsRequest` proto message and the corresponding `rpc QueryActiveAlarms` to `mxaccess_gateway.proto`. The Java client now generates the request type and the gRPC stub method, and `MxGatewayC… |
| Client.Java-031 | Low | mxaccessgw conventions | `clients/java/README.md:13,17,26` | The README prose at lines 1326 introduces the subprojects as `mxgateway-client` and `mxgateway-cli` (the old short names) when discussing the layout. Those are no longer the actual subproject names — `settings.gradle` declares `zb-mom-ww-… |
| Client.Rust-021 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:14-33` | The crate-name change in commit `397d3c5` (top-level `mxgateway-client``zb-mom-ww-mxgateway-client`) is reflected in `Cargo.toml`, `Cargo.lock`, every `use zb_mom_ww_mxgateway_client::` import, and `build.rs`. The "Recommended layout" b… |
| Contracts-016 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:31-41` (`QueryActiveAlarmsRequest`) | The new public message `QueryActiveAlarmsRequest` (added in commit `397d3c5`) has `session_id = 1` with a comment "session_id is currently unused (the snapshot is session-less) but reserved so a future per-session view can be added without… |
| Contracts-017 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:23-29` (the `rpc QueryActiveAlarms` block) | The RPC comment on `QueryActiveAlarms` describes the stream order ("Point-in-time snapshot of the currently-active alarm set served from the gateway's always-on alarm monitor cache") and the session-less semantic, but does not mention that… |
| IntegrationTests-022 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) | The walker introduced in `dc9c0c9` searches parents for a directory containing `src/` plus either `.git` or a `*.slnx`/`*.sln` file, falling back to `Directory.GetCurrentDirectory()` when nothing matches. The fallback masks misconfiguratio… |
| IntegrationTests-023 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` | `AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the principal carries an `LdapGroupClaimType` claim containing `GwAdmin` (line 26-28). After the `27ed651` refactor, `DashboardAuthenticator.CreatePrincipal` also emits a `ClaimTypes… |
| IntegrationTests-024 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) | The inline `NullDashboardEventBroadcaster` private class is the third copy in the repository (the other two live in `EventStreamServiceTests` and `GatewayEndToEndFakeWorkerSmokeTests` under the Tests module — see Tests-025). Each carries a… |
| Server-039 | Low | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:37-58` | `HubTokenService.Validate` deserializes the protected JSON payload and trusts `payload.Roles` even when `payload.Name` and `payload.NameIdentifier` are both `null`. The resulting `ClaimsPrincipal` has the `MxGateway.Dashboard.HubToken` sch… |
| Server-040 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardAuthenticator.cs:140-160` (`MapGroupsToRoles`) | `MapGroupsToRoles` checks each LDAP group against the role map twice — first by the full group string, then by `ExtractFirstRdnValue(group)` — and `TryGetValue` short-circuits on the first hit. The precedence ("full match wins over RDN mat… |
| Server-041 | Low | Design-document adherence | `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/IDashboardEventBroadcaster.cs:6-10` | `IDashboardEventBroadcaster.Publish` is documented as "Implementations must never throw — broadcast failures are best-effort and must not disrupt the source gRPC stream." `EventStreamService` honors that contract by passing the call throug… |
| Server-042 | Low | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/DashboardSnapshotPublisher.cs:18-41` | `DashboardSnapshotPublisher.ExecuteAsync` reads from `IDashboardSnapshotService.WatchSnapshotsAsync` inside an outer `try` that catches `OperationCanceledException` only. A failure inside `WatchSnapshotsAsync` (e.g. the snapshot service th… |
| Server-043 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:1`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:24` | `HubTokenService` is registered as a singleton (good — data protection providers are thread-safe and a single protector instance is correct) and shared by both `DashboardHubConnectionFactory` (per-circuit scoped, mints fresh tokens from th… |
| Tests-025 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:285-289`, `src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:417-421` | Commit `d692232` widened the `EventStreamService` constructor with an `IDashboardEventBroadcaster` parameter. Two test files now carry an identical `private sealed class NullDashboardEventBroadcaster : IDashboardEventBroadcaster` with a si… |
_No pending findings._
## Closed findings
@@ -89,6 +67,8 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Java-014 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124` |
| Client.Java-015 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138`, `MxGatewayClient.java:183-191,224-232,322-329`, `GalaxyRepositoryClient.java:164-170,212-214` |
| Client.Java-021 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135` |
| Client.Java-027 | Medium | Resolved | Documentation & comments | `clients/java/README.md:36,107-175,185,205,220`, `clients/java/JavaClientDesign.md:195-211` |
| Client.Java-028 | Medium | Resolved | Documentation & comments | `clients/java/JavaClientDesign.md:23-27` |
| Client.Python-003 | Medium | Resolved | Error handling & resilience | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Client.Python-005 | Medium | Resolved | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Client.Python-009 | Medium | Resolved | Testing coverage | `clients/python/tests/` |
@@ -116,7 +96,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-016 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Sessions/GatewaySession.cs:790-797`, `src/MxGateway.Server/Sessions/SessionManager.cs:237-258` |
| Server-021 | Medium | Resolved | Testing coverage | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:266-664`, `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs` |
| Server-030 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Sessions/GatewaySession.cs:952-980` |
| Server-031 | Medium | Resolved | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:392-443` (gateway-side heartbeat watchdog); `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClientOptions.cs:14-67` (new `HeartbeatStuckCeiling` option) |
| Server-032 | Medium | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:510-569` (gateway-side `_events` channel); `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClientOptions.cs:45-53` (`EventChannelFullModeTimeout`) |
| Server-033 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:265-323` (`TryRestoreFromDiskAsync`), `:84-99` (`_firstLoad` / `WaitForFirstLoadAsync`); `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:141-163` (`WaitForCacheBootstrap`) |
| Server-038 | Medium | Resolved | Security | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs:23-44` |
| Tests-003 | Medium | Resolved | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Tests-004 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Tests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
@@ -124,6 +107,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Tests-013 | Medium | Resolved | Testing coverage | `src/MxGateway.Server/Sessions/GatewaySession.cs:449-679`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
| Tests-016 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Galaxy/GalaxyHierarchyCacheTests.cs:29-41,115-124` |
| Tests-020 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceConstraintTests.cs:275-347`, `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:803-829` |
| Tests-026 | Medium | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs`, `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126` |
| Worker-004 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:565-588` |
| Worker-005 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:205-258` (production alarm poll loop) |
| Worker-006 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:117-124`, `src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs:386-491` |
@@ -189,6 +173,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Java-024 | Low | Resolved | Correctness & logic bugs | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883` |
| Client.Java-025 | Low | Resolved | Code organization & conventions | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185` |
| Client.Java-026 | Low | Resolved | Testing coverage | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java` |
| Client.Java-029 | Low | Resolved | Documentation & comments | `clients/java/README.md:208-209` |
| Client.Java-030 | Low | Resolved | Testing coverage | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/` |
| Client.Java-031 | Low | Resolved | mxaccessgw conventions | `clients/java/README.md:13,17,26` |
| Client.Python-001 | Low | Resolved | Documentation & comments | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` |
| Client.Python-002 | Low | Resolved | Code organization & conventions | `clients/python/src/mxgateway/__init__.py:27` |
| Client.Python-004 | Low | Resolved | Correctness & logic bugs | `clients/python/src/mxgateway_cli/commands.py:386,402-404` |
@@ -215,6 +202,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Rust-017 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:79-99,156-163` |
| Client.Rust-019 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:96-100` |
| Client.Rust-020 | Low | Resolved | Documentation & comments | `clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39` |
| Client.Rust-021 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-33` |
| Contracts-001 | Low | Resolved | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Contracts-003 | Low | Won't Fix | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Contracts-004 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
@@ -228,6 +216,8 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Contracts-013 | Low | Resolved | Documentation & comments | `src/MxGateway.Tests/Contracts/GatewayContractInfoTests.cs:14` |
| Contracts-014 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:549-553` |
| Contracts-015 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:571-582` |
| Contracts-016 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:31-41` (`QueryActiveAlarmsRequest`) |
| Contracts-017 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:23-29` (the `rpc QueryActiveAlarms` block) |
| IntegrationTests-007 | Low | Resolved | Concurrency & thread safety | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| IntegrationTests-008 | Low | Resolved | Code organization & conventions | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| IntegrationTests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
@@ -239,6 +229,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| IntegrationTests-018 | Low | Resolved | Code organization & conventions | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037`, `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595` |
| IntegrationTests-020 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622` |
| IntegrationTests-021 | Low | Resolved | Testing coverage | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622` |
| IntegrationTests-022 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138` (`ResolveRepositoryRoot` / `IsRepositoryRoot`) |
| IntegrationTests-023 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29` |
| IntegrationTests-024 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (`NullDashboardEventBroadcaster` private class at end of file) |
| Server-007 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` |
| Server-008 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` |
| Server-009 | Low | Resolved | Error handling & resilience | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` |
@@ -262,6 +255,11 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-035 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:176` (call site), `:327-352` (`PersistSnapshotAsync`) |
| Server-036 | Low | Resolved | Error handling & resilience | `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:345-348` (`PersistSnapshotAsync` catch) |
| Server-037 | Low | Resolved | Testing coverage | `src/MxGateway.Tests/Galaxy/GalaxyHierarchySnapshotStoreTests.cs`, `src/MxGateway.Tests/Galaxy/GalaxyHierarchyCacheTests.cs` |
| Server-039 | Low | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:37-58` |
| Server-040 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardAuthenticator.cs:140-160` (`MapGroupsToRoles`) |
| Server-041 | Low | Resolved | Design-document adherence | `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/IDashboardEventBroadcaster.cs:6-10` |
| Server-042 | Low | Resolved | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/DashboardSnapshotPublisher.cs:18-41` |
| Server-043 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:1`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:24` |
| Tests-007 | Low | Resolved | Code organization & conventions | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Tests-008 | Low | Resolved | mxaccessgw conventions | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Tests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
@@ -277,6 +275,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Tests-022 | Low | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerBulkTests.cs:52-61,90-99,126-135,163-172,202-211,238-247,282-294,339-360,413-434,484-506,553-567,663-688` |
| Tests-023 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Tests/Gateway/Sessions/SessionWorkerClientFactoryFakeWorkerTests.cs:334-374` |
| Tests-024 | Low | Resolved | Testing coverage | `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:713-730,784-801,859-876`, `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceConstraintTests.cs` |
| Tests-025 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:285-289`, `src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:417-421` |
| Worker-009 | Low | Resolved | Performance & resource management | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` |
| Worker-010 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` |
| Worker-011 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` |
+19 -19
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 0 |
## Checklist coverage
@@ -556,8 +556,8 @@ contention nor the bounded `_events` channel saw any changes in this wave.
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/MxGateway.Server/Workers/WorkerClient.cs:392-422` (gateway-side heartbeat watchdog); `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:588-617` (worker-side heartbeat loop); `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:14,67-76` (shared `_writeLock`) |
| Status | Open |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:392-443` (gateway-side heartbeat watchdog); `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClientOptions.cs:14-67` (new `HeartbeatStuckCeiling` option) |
| Status | Resolved |
**Description:** Surfaced during the 2026-05-20 cross-language e2e re-run against gateway `b794c46`. The .NET phase succeeded through `open-session`/`register`/`bulk-subscribe`/`bulk-read`/`bulk-unsubscribe`/`stream-events`/`write` but then failed on its third `advise` call with the Server-030 diagnostic `Session ... is not ready. Session state is Ready; worker state is Faulted.` The gateway stdout log records the underlying cause: **`Worker client faulted for session session-01a1a07fa59c489983a719821fa46e72: Worker heartbeat expired. Last heartbeat was at 2026-05-20T17:20:39.+00:00.`** — a real 15s+ gap with no `WorkerHeartbeat` envelope arriving from the worker.
@@ -579,7 +579,7 @@ The .NET test pattern stresses the issue uniquely because each `dotnet run --pro
Add a regression test that floods the worker's outbound event channel (e.g., via a high-rate STA fixture or a mock event source emitting at > 1000 events/s for several seconds) and asserts the worker is not faulted while the gateway has no `StreamEvents` consumer attached.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** 2026-05-24 — Re-triaged at HEAD `d2d2e5f`: the gateway-side "skip-while-command-in-flight" guard (recommendation #2) is already implemented and verified against source. `WorkerClient.HeartbeatLoopAsync` (`src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:403-443`) now skips the `HeartbeatExpired` fault while `TryGetOldestPendingCommandAge` reports an in-flight command younger than `WorkerClientOptions.HeartbeatStuckCeiling` (default 75s = 5× `HeartbeatGrace`). Once the oldest pending command exceeds the ceiling the watchdog fires anyway, so a truly stuck COM call doesn't hide the worker forever. The new `HeartbeatStuckCeiling` option is documented inline with a back-reference to Worker-023, the worker-side mirror. Regression tests in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs`: `HeartbeatMonitor_WhenCommandInFlightWithinCeiling_DoesNotFaultOnExpiredHeartbeat` (the named scenario — parks an unanswered `InvokeAsync` past `HeartbeatGrace` but well within `HeartbeatStuckCeiling` and asserts the client stays `Ready`) and `HeartbeatMonitor_WhenPendingCommandExceedsStuckCeiling_FaultsClient` (advances past the ceiling and asserts the watchdog still fires). Recommendation #1 (decoupling the worker-side `_writeLock`) is the worker module's concern and is tracked by Worker-017 / Worker-023 — out of scope for the Server module here.
### Server-032
@@ -587,8 +587,8 @@ Add a regression test that floods the worker's outbound event channel (e.g., via
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/MxGateway.Server/Workers/WorkerClient.cs:70-77,463-484` (gateway-side `_events` channel); `src/MxGateway.Server/Configuration/EventOptions.cs:8` (default capacity 10,000); `src/MxGateway.Server/Grpc/EventStreamService.cs` (consumer) |
| Status | Open |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:510-569` (gateway-side `_events` channel); `src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClientOptions.cs:45-53` (`EventChannelFullModeTimeout`) |
| Status | Resolved |
**Description:** Surfaced during the 2026-05-20 cross-language e2e re-run against gateway `b794c46`. The Java phase advised ~55 items (`item-handle 63`) before failing on the next `advise` call with the Server-030 diagnostic `Session ... is not ready. Session state is Ready; worker state is Faulted.`. The gateway stdout log records: **`Worker client faulted for session session-adfcc808da974808947e87db060c2b03: Worker event channel rejected an event.`** — the gateway-side per-session bounded event channel filled up and `Channel.Writer.TryWrite` returned `false`, triggering the fail-fast path in `EnqueueWorkerEventAsync` (`WorkerClient.cs:467-484`).
@@ -612,7 +612,7 @@ The diagnostic `"Worker event channel rejected an event."` also does not name th
Add a regression test that advises N items without an active `StreamEvents` consumer, lets the channel fill, and asserts the produced fault message contains the channel-depth diagnostic (#2) — gated so that #3 is not required.
**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
**Resolution:** 2026-05-24 — Re-triaged at HEAD `d2d2e5f`: recommendation #2 (improved diagnostic) is already implemented and verified against source. `WorkerClient.EnqueueWorkerEventAsync` (`src/ZB.MOM.WW.MxGateway.Server/Workers/WorkerClient.cs:525-569`) now (a) attempts `TryWrite` first for the fast path, (b) on full-channel falls through to `WriteAsync` with a linked `CancellationTokenSource` cancelled after `WorkerClientOptions.EventChannelFullModeTimeout` (default 5s), so a transient consumer hiccup is absorbed instead of fail-fast on the first overflow event, and (c) on real overflow records `QueueOverflow("worker-events")` and faults with the rich diagnostic message naming the wait timeout, the current channel depth, the channel capacity, and the actionable remediation ("Attach a StreamEvents consumer or raise MxGateway:Events:QueueCapacity."). Regression test: `WorkerClientTests.EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic` (`src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:473-521`) fills a 4-slot channel + one overflow, asserts the worker is faulted, then drains the propagated `WorkerClientException` and pins the diagnostic string contains "Worker event channel rejected", "of 4 capacity", "StreamEvents", and "MxGateway:Events:QueueCapacity". Recommendation #1 (the prose contract in `gateway.md` / `docs/DesignDecisions.md` / client READMEs) is out of scope for this pass — the prompt restricts edits to `src/ZB.MOM.WW.MxGateway.Server/**`, `src/ZB.MOM.WW.MxGateway.Tests/**`, and this findings file; the documentation update needs to land in a follow-up that has docs access. Recommendation #3 (overflow grace window) was already implemented in spirit by the `WriteAsync` + timeout switch — the channel now absorbs a transient burst up to the configured wait timeout, satisfying #3's "consumer hiccup resilience" goal without requiring a separate counter.
### Server-033
@@ -696,13 +696,13 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Medium |
| Category | Security |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs:23-44` |
| Status | Open |
| Status | Resolved |
**Description:** `EventsHub` is gated by `[Authorize(Policy = DashboardAuthenticationDefaults.HubClientsPolicy)]`, which checks only that the caller carries a dashboard role (Admin or Viewer). `SubscribeSession(sessionId)` accepts any non-empty session id and joins the caller to `session:{id}`. A Viewer who knows or guesses a session id can therefore subscribe to any session's MxEvent stream once `DashboardEventBroadcaster` is broadcasting (which it now is, per `d692232`). The per-session ACL that gates the gRPC `StreamEvents` RPC is not replicated.
**Recommendation:** Before the EventsHub is exercised by Admin-only sessions or session-scoped Viewer roles, gate `SubscribeSession` on a session-access check — either via a per-session role check in the hub method itself, or by storing a per-user allowed-session-id set in the connection's `Context.Items` at connect time and rejecting subscribes outside that set. The current dashboard surfaces only a per-page Session Details view that the page can prove it's authorized for, but as soon as a Viewer role exists the gap matters.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Documented the v1 acceptance per the prompt's "practical fix for v1" direction. Added a detailed `<remarks>` block to `EventsHub.SubscribeSession` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs`) stating that (a) in v1 the hub-level `HubClientsPolicy` only requires one of the dashboard roles (Admin or Viewer) and both may subscribe to any session id, (b) this is acceptable today because the dashboard's per-session views show non-secret session metadata any authenticated user can already see and value logging is gated by the same redaction policy, and (c) the per-session ACL that gates the gRPC `StreamEvents` RPC is intentionally not yet mirrored here. Added an explicit `TODO(per-session-acl)` describing the future enforcement seam — once a role/scope is introduced that scopes a Viewer to a specific session or tenant, add a session-access check at this method (inline on `Context.User` claims/`Context.Items`, or via a dedicated authorization policy applied to the hub method). No code-behavior change in this pass; the per-session ACL data model design is out of scope for the resolution window. No new regression test (the change is documentation-only).
### Server-039
@@ -711,13 +711,13 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:37-58` |
| Status | Open |
| Status | Resolved |
**Description:** `HubTokenService.Validate` deserializes the protected JSON payload and trusts `payload.Roles` even when `payload.Name` and `payload.NameIdentifier` are both `null`. The resulting `ClaimsPrincipal` has the `MxGateway.Dashboard.HubToken` scheme as its `AuthenticationType` and the role claims, but no identity claims. `Identity?.IsAuthenticated` returns `true` because the auth type is non-empty, so the principal satisfies `IsAuthenticated` checks and `IsInRole` checks even though it has no caller identity. A token forged from a corrupted data-protection store could pass authorization without an associated user.
**Recommendation:** Mark `HubTokenPayload.Name` and `HubTokenPayload.NameIdentifier` as required (e.g. with `[JsonRequired]` once the project standardizes the JSON binder, or by validating non-null explicitly after deserialization) and reject the token if either is missing. Alternatively, document on `IDashboardAuthorizationHandler` consumers that they must check `Identity?.Name` is non-null before honoring role claims from this scheme.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — `HubTokenService.Validate` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs`) now rejects a deserialized payload where both `Name` and `NameIdentifier` are null/empty — returning `null` rather than emitting a principal with role claims but no caller identity. The check sits immediately after the protector unprotect and the null-payload guard, with a comment back-referencing Server-039. Either field is sufficient (a token minted with only a `NameIdentifier` still validates), matching the existing `Issue` path where the cookie principal may carry just one of them. New test file `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/HubTokenServiceTests.cs` with six tests: `Validate_TokenWithNullNameAndNullNameIdentifier_ReturnsNull` (the named regression — confirmed to fail before the fix and pass after), `Validate_TokenWithName_ReturnsAuthenticatedPrincipal`, `Validate_TokenWithOnlyNameIdentifier_ReturnsPrincipal`, plus null/empty/garbage-token sanity checks. Verified by passing tests.
### Server-040
@@ -726,13 +726,13 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardAuthenticator.cs:140-160` (`MapGroupsToRoles`) |
| Status | Open |
| Status | Resolved |
**Description:** `MapGroupsToRoles` checks each LDAP group against the role map twice — first by the full group string, then by `ExtractFirstRdnValue(group)` — and `TryGetValue` short-circuits on the first hit. The precedence ("full match wins over RDN match") is correct because the map's key set is operator-controlled and matches should resolve deterministically, but the lookup ordering is not documented. A future maintainer reading the code can't tell whether "fall through to RDN" is intentional or a leftover from refactoring `IsMemberOfRequiredGroup`.
**Recommendation:** Add a one-line comment above the loop explaining the precedence: full DN/CN literal first, leading-RDN fallback second. Mention the case-insensitive map comparer (`OrdinalIgnoreCase`) so the next reader doesn't ask why `"GwAdmin"` matches `"gwadmin"`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Added a precedence comment block above the lookup in `MapGroupsToRoles` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardAuthenticator.cs:156-163`) explaining that the full literal group string is tried first and the leading-RDN value (e.g. `GwAdmin` extracted from `ou=GwAdmin,ou=groups,...`) is the fallback, and back-referencing `DashboardOptions.GroupToRole` as the source of the `OrdinalIgnoreCase` comparer so a maintainer sees why `"GwAdmin"` matches `"gwadmin"`. No code change — existing `DashboardAuthenticatorTests.MapGroupsToRoles_ResolvesByShortNameAndDistinguishedName` already pins both the full-match and RDN-fallback paths and the case-insensitive lookup; pure documentation-only resolution, no new test.
### Server-041
@@ -741,13 +741,13 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/IDashboardEventBroadcaster.cs:6-10` |
| Status | Open |
| Status | Resolved |
**Description:** `IDashboardEventBroadcaster.Publish` is documented as "Implementations must never throw — broadcast failures are best-effort and must not disrupt the source gRPC stream." `EventStreamService` honors that contract by passing the call through without a try/catch. The current `DashboardEventBroadcaster` implementation observes the `SendAsync` task's continuation but does not raise synchronously, so the seam is safe today. A future implementation that adds synchronous validation or a serializer hop could throw, faulting the producer loop and ending the gRPC stream.
**Recommendation:** Either wrap the `Publish` call in a `try/catch (Exception ex)` that logs at debug and continues (matching the `DashboardSnapshotPublisher` pattern), or add a code-review checklist note enforcing the never-throw contract on implementations. The wrap is safer because it doesn't depend on convention.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Took the safer wrap. `EventStreamService.ProduceEventsAsync` (`src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-137`) now wraps the `dashboardEventBroadcaster.Publish(...)` call in a `try / catch (Exception ex)` that logs at debug and continues. The producer loop and the gRPC stream are no longer at the mercy of the broadcaster's never-throw discipline — a future implementation that adds synchronous validation or a serializer hop cannot fault the stream. Regression test in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs`: `StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession` (new) injects a `ThrowingDashboardEventBroadcaster` test double that throws `InvalidOperationException` on every `Publish`, then asserts (a) the gRPC stream still yields both events in order, (b) the broadcaster's `Publish` is attempted for every event (so the catch is exercised per-event rather than aborting the loop), and (c) the session does not transition to `Faulted`. Confirmed to fail before the fix (the producer loop surfaced the simulated `InvalidOperationException`) and pass after. Verified by passing tests.
### Server-042
@@ -756,13 +756,13 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/DashboardSnapshotPublisher.cs:18-41` |
| Status | Open |
| Status | Resolved |
**Description:** `DashboardSnapshotPublisher.ExecuteAsync` reads from `IDashboardSnapshotService.WatchSnapshotsAsync` inside an outer `try` that catches `OperationCanceledException` only. A failure inside `WatchSnapshotsAsync` (e.g. the snapshot service throws after a transient SQL failure for the Galaxy summary projection) escapes the outer try and ends the BackgroundService — no automatic reconnect. The sibling `AlarmsHubPublisher` (lines 55-61) wraps its `StreamAsync` consumer in a 5-second reconnect loop with `catch (Exception ex)` and continues. The snapshot publisher should follow the same shape.
**Recommendation:** Wrap the `await foreach` in a `while (!stoppingToken.IsCancellationRequested)` loop with a `catch (Exception ex)` plus a 5-second `Task.Delay`, mirroring `AlarmsHubPublisher`. Today's snapshot service rarely throws on the watch path, but a one-time logger-init failure or transient `IGatewayConfigurationProvider` exception would silently take the dashboard offline.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Mirrored `AlarmsHubPublisher`'s reconnect loop. `DashboardSnapshotPublisher.ExecuteAsync` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/DashboardSnapshotPublisher.cs`) now wraps the `await foreach` in a `while (!stoppingToken.IsCancellationRequested)` loop and catches general exceptions with a logged warning + `Task.Delay(reconnectDelay, stoppingToken)`. The 5-second `DefaultReconnectDelay` is preserved for production via the public constructor; an `internal` overload injects a shorter delay for the regression test (with `[InternalsVisibleTo("ZB.MOM.WW.MxGateway.Tests")]` already in place). Also tightened cancellation handling: the inner `OperationCanceledException` `return`s instead of merely catching, so a normal shutdown exits cleanly rather than re-looping on the cancelled token. New test file `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardSnapshotPublisherTests.cs` with two cases: `ExecuteAsync_WhenSnapshotServiceThrowsOnce_ReconnectsAfterDelay` (the named regression — `ThrowOnceThenYieldSnapshotService` throws on the first `WatchSnapshotsAsync` call and yields a snapshot on the second; the publisher must reconnect, broadcast the snapshot, and the gap between throw and reconnect must respect the configured delay) and `ExecuteAsync_WhenSnapshotServiceCompletes_ReconnectsAfterDelay` (sanity case: a normal `yield break` also triggers the reconnect loop). Confirmed both tests fail against the original single-try implementation (the BackgroundService exits and `SubscribeCount` stays at 1) and pass after the fix. Verified by passing tests.
### Server-043
@@ -771,10 +771,10 @@ Add a regression test that advises N items without an active `StreamEvents` cons
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs:1`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:24` |
| Status | Open |
| Status | Resolved |
**Description:** `HubTokenService` is registered as a singleton (good — data protection providers are thread-safe and a single protector instance is correct) and shared by both `DashboardHubConnectionFactory` (per-circuit scoped, mints fresh tokens from the cookie principal) and `HubTokenAuthenticationHandler` (per-request transient, validates inbound tokens). The class-level docs describe what the service does but not that it is intentionally a singleton with two consumer scopes, so a future maintainer rewriting the DI registration may pick the wrong lifetime.
**Recommendation:** Add a `<remarks>` block to `HubTokenService` noting "Registered as a singleton in `AddGatewayDashboard`; the underlying `ITimeLimitedDataProtector` is thread-safe and shared across hub-token issuance and validation." Optionally add a comment near the DI registration explaining the lifetime contract.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Added a `<remarks>` block to `HubTokenService` (`src/ZB.MOM.WW.MxGateway.Server/Dashboard/HubTokenService.cs`) documenting that the service is registered as a singleton in `DashboardServiceCollectionExtensions.AddGatewayDashboard` and is shared by two consumer scopes — `DashboardHubConnectionFactory` (scoped, per-circuit; calls `Issue` from the cookie-authenticated dashboard) and `HubTokenAuthenticationHandler` (transient, per-request; calls `Validate` from the SignalR negotiate / connection path). Notes that the underlying `ITimeLimitedDataProtector` is thread-safe so concurrent mint/validate from any number of callers is safe, and explicitly asks future maintainers to preserve the singleton lifetime to keep the protector instance stable. Pure documentation change; no test.
+5 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -429,13 +429,13 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:285-289`, `src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:417-421` |
| Status | Open |
| Status | Resolved |
**Description:** Commit `d692232` widened the `EventStreamService` constructor with an `IDashboardEventBroadcaster` parameter. Two test files now carry an identical `private sealed class NullDashboardEventBroadcaster : IDashboardEventBroadcaster` with a singleton `Instance` field and a no-op `Publish`. This mirrors the duplication pattern Tests-007 and Tests-021 already consolidated for `TestServerCallContext` / `RecordingServerStreamWriter` / `AllowAllConstraintEnforcer` / `ManualTimeProvider` into `src/MxGateway.Tests/TestSupport/`; the same pattern should apply here.
**Recommendation:** Extract `NullDashboardEventBroadcaster` to `src/ZB.MOM.WW.MxGateway.Tests/TestSupport/NullDashboardEventBroadcaster.cs` (or a single `DashboardTestDoubles.cs`), delete both nested copies, and update the two `using`-bearing files to import from `TestSupport`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Extracted the shared no-op broadcaster to `src/ZB.MOM.WW.MxGateway.Tests/TestSupport/NullDashboardEventBroadcaster.cs` (single `public sealed` class with the singleton `Instance` field and a private constructor — matches the Tests-007 / Tests-021 consolidation pattern). Deleted both nested duplicates (`EventStreamServiceTests.cs:319-323` and `GatewayEndToEndFakeWorkerSmokeTests.cs:417-421`); the latter already imported `ZB.MOM.WW.MxGateway.Tests.TestSupport` so `NullDashboardEventBroadcaster.Instance` resolves against the shared type. `EventStreamServiceTests.cs` gained a `using ZB.MOM.WW.MxGateway.Tests.TestSupport;`. The integration-tests copy in `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` was left alone (different module, per scope). Server-041's `ThrowingDashboardEventBroadcaster` remains nested in `EventStreamServiceTests` (single-file usage, no consolidation needed). Build clean (0 warnings), full Tests suite green (486/486).
### Tests-026
@@ -444,10 +444,10 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs`, `src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126` |
| Status | Open |
| Status | Resolved |
**Description:** The new `IDashboardEventBroadcaster` is wired into `EventStreamService` at line 123 (commit `d692232`) and the broadcaster's `Publish` is the only path that mirrors per-session events into the dashboard `EventsHub`. The unit tests inject `NullDashboardEventBroadcaster.Instance`, so the broadcaster invocation is never observed — a regression that silently dropped the `Publish` call (e.g. an `if` accidentally added around it, or removing the broadcaster ctor parameter) would not be caught by any test in this module. The hub-registration tests (`DashboardHubsRegistrationTests`) verify the endpoints exist but not the producer-side hook.
**Recommendation:** Add a fixture to `EventStreamServiceTests` named e.g. `StreamEventsAsync_PublishesEachEventToDashboardBroadcaster`: inject a recording fake that captures `(sessionId, mxEvent)` calls, push two events through the fake session, and assert the broadcaster received both with the correct session id and matching `WorkerSequence`. This pins the broadcast hook and proves the dashboard event mirror is not a no-op.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-05-24 — Added `src/ZB.MOM.WW.MxGateway.Tests/TestSupport/RecordingDashboardEventBroadcaster.cs` — a thread-safe `IDashboardEventBroadcaster` test double backed by a `ConcurrentQueue<DashboardEventCapture>` that captures every `(sessionId, mxEvent)` invocation. Added `StreamEventsAsync_PublishesEachEventToDashboardBroadcaster` to `EventStreamServiceTests`: pushes two events (`WorkerSequence` 7 / `OnDataChange` and 8 / `OnWriteComplete`) through the fake session, drains the stream, and asserts the recording broadcaster captured exactly two publishes with the matching `SessionId`, `WorkerSequence`, and `Family` for each. TDD red/green confirmed: a deliberate "expected 3 captures" failed (`Expected: 3, Actual: 2`) before flipping to the correct count; the green run passes deterministically. The fixture would have caught a regression that drops or wraps `dashboardEventBroadcaster.Publish` at `EventStreamService.cs:133`. Build clean (0 warnings); full Tests suite green (486/486).
@@ -735,9 +735,10 @@ namespace ZB.MOM.WW.MxGateway.Contracts.Proto {
#region Messages
/// <summary>
/// Public request shape for QueryActiveAlarms. session_id is currently unused
/// (the snapshot is session-less) but reserved so a future per-session view
/// can be added without a wire break.
/// Public request shape for QueryActiveAlarms.
/// Clients may leave `session_id` empty; the gateway currently ignores it and
/// serves the session-less central-monitor cache. A future version may use it
/// to scope the snapshot to one session.
/// </summary>
[global::System.Diagnostics.DebuggerDisplayAttribute("{ToString(),nq}")]
public sealed partial class QueryActiveAlarmsRequest : pb::IMessage<QueryActiveAlarmsRequest>
@@ -196,6 +196,9 @@ namespace ZB.MOM.WW.MxGateway.Contracts.Proto {
/// reconnect to seed Part 9 client state, or to reconcile alarms that may
/// have been missed during a transport blip. Streamed so callers can
/// begin processing without buffering the full set.
/// `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
/// snapshot to alarms whose `alarm_full_reference` starts with the given
/// prefix; an empty prefix returns the full set.
/// </summary>
/// <param name="request">The request received from the client.</param>
/// <param name="responseStream">Used for sending responses back to the client.</param>
@@ -364,6 +367,9 @@ namespace ZB.MOM.WW.MxGateway.Contracts.Proto {
/// reconnect to seed Part 9 client state, or to reconcile alarms that may
/// have been missed during a transport blip. Streamed so callers can
/// begin processing without buffering the full set.
/// `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
/// snapshot to alarms whose `alarm_full_reference` starts with the given
/// prefix; an empty prefix returns the full set.
/// </summary>
/// <param name="request">The request to send to the server.</param>
/// <param name="headers">The initial metadata to send with the call. This parameter is optional.</param>
@@ -381,6 +387,9 @@ namespace ZB.MOM.WW.MxGateway.Contracts.Proto {
/// reconnect to seed Part 9 client state, or to reconcile alarms that may
/// have been missed during a transport blip. Streamed so callers can
/// begin processing without buffering the full set.
/// `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
/// snapshot to alarms whose `alarm_full_reference` starts with the given
/// prefix; an empty prefix returns the full set.
/// </summary>
/// <param name="request">The request to send to the server.</param>
/// <param name="options">The options for the call.</param>
@@ -31,12 +31,16 @@ service MxAccessGateway {
// reconnect to seed Part 9 client state, or to reconcile alarms that may
// have been missed during a transport blip. Streamed so callers can
// begin processing without buffering the full set.
// `QueryActiveAlarmsRequest.alarm_filter_prefix` optionally narrows the
// snapshot to alarms whose `alarm_full_reference` starts with the given
// prefix; an empty prefix returns the full set.
rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
}
// Public request shape for QueryActiveAlarms. session_id is currently unused
// (the snapshot is session-less) but reserved so a future per-session view
// can be added without a wire break.
// Public request shape for QueryActiveAlarms.
// Clients may leave `session_id` empty; the gateway currently ignores it and
// serves the session-less central-monitor cache. A future version may use it
// to scope the snapshot to one session.
message QueryActiveAlarmsRequest {
string session_id = 1;
string client_correlation_id = 2;
@@ -26,6 +26,16 @@ public sealed class DashboardLdapLiveTests
Assert.Contains(result.Principal.Claims, claim =>
claim.Type == DashboardAuthenticationDefaults.LdapGroupClaimType
&& claim.Value.Contains("GwAdmin", StringComparison.OrdinalIgnoreCase));
// IntegrationTests-023: DashboardAuthenticator.CreatePrincipal emits a
// ClaimTypes.Role claim derived from MapGroupsToRoles. The seeded
// GroupToRole map (GwAdmin -> Admin) means the admin principal must
// carry Role=Admin alongside the raw LDAP-group claim. A regression in
// MapGroupsToRoles (returning an empty list, missing the RDN fallback)
// would silently pass without this assertion.
Assert.Contains(result.Principal.Claims, claim =>
claim.Type == ClaimTypes.Role
&& claim.Value == DashboardRoles.Admin);
}
[LiveLdapFact]
@@ -97,9 +97,22 @@ public static class IntegrationTestEnvironment
return defaultValue;
}
/// <summary>Resolves the root directory of the repository by walking parents for a src/ directory next to either a .git marker or a .sln/.slnx file.</summary>
/// <summary>
/// Resolves the root directory of the repository by walking parents for a
/// <c>src/</c> directory next to either a <c>.git</c> marker or a
/// <c>.sln</c>/<c>.slnx</c> file. Throws <see cref="InvalidOperationException"/>
/// when no root is found so a misconfigured run fails fast with an actionable
/// message rather than silently falling back to the current working directory
/// (which previously produced a misleading "worker exe not found" pointing at
/// a fabricated path — see IntegrationTests-022). The
/// <see cref="LiveMxAccessWorkerExecutableVariableName"/> environment variable
/// remains the escape hatch for unusual deployments.
/// </summary>
/// <param name="startDirectory">Starting directory to search from.</param>
/// <returns>The repository root path, or the start directory if not found.</returns>
/// <returns>The repository root path.</returns>
/// <exception cref="InvalidOperationException">
/// Thrown when the parent walk exhausts without finding a repository root.
/// </exception>
internal static string ResolveRepositoryRoot(string startDirectory)
{
DirectoryInfo? directory = new(startDirectory);
@@ -113,7 +126,11 @@ public static class IntegrationTestEnvironment
directory = directory.Parent;
}
return Directory.GetCurrentDirectory();
throw new InvalidOperationException(
$"Could not resolve repository root by walking parents of '{startDirectory}'. "
+ "Expected to find a directory containing a 'src' subdirectory next to either a '.git' marker "
+ "or a '*.sln' / '*.slnx' file in 'src'. "
+ $"Set the '{LiveMxAccessWorkerExecutableVariableName}' environment variable to bypass repository-root resolution.");
}
private static bool IsRepositoryRoot(DirectoryInfo directory)
@@ -45,4 +45,41 @@ public sealed class IntegrationTestEnvironmentTests
}
}
}
/// <summary>
/// Verifies that <see cref="IntegrationTestEnvironment.ResolveRepositoryRoot"/>
/// throws <see cref="InvalidOperationException"/> with a diagnostic message when
/// the walk exhausts without finding a repository root. The previous silent
/// fallback to <c>Directory.GetCurrentDirectory()</c> masked misconfiguration
/// (IntegrationTests-022); operators get a clear, actionable failure instead.
/// </summary>
[Fact]
public void ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers()
{
string isolatedRoot = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
string isolatedStart = Path.Combine(isolatedRoot, "nested");
try
{
Directory.CreateDirectory(isolatedStart);
InvalidOperationException ex = Assert.Throws<InvalidOperationException>(
() => IntegrationTestEnvironment.ResolveRepositoryRoot(isolatedStart));
Assert.Contains(isolatedStart, ex.Message, StringComparison.Ordinal);
Assert.Contains(".git", ex.Message, StringComparison.Ordinal);
Assert.Contains(".sln", ex.Message, StringComparison.Ordinal);
Assert.Contains(
IntegrationTestEnvironment.LiveMxAccessWorkerExecutableVariableName,
ex.Message,
StringComparison.Ordinal);
}
finally
{
if (Directory.Exists(isolatedRoot))
{
Directory.Delete(isolatedRoot, recursive: true);
}
}
}
}
@@ -0,0 +1,31 @@
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
namespace ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport;
/// <summary>
/// No-op <see cref="IDashboardEventBroadcaster"/> for integration tests that
/// construct an <see cref="ZB.MOM.WW.MxGateway.Server.Grpc.EventStreamService"/>
/// without exercising the dashboard SignalR fan-out. Singleton because the
/// type holds no state — every consumer can share <see cref="Instance"/>.
/// The unit-test project owns a parallel copy under
/// <c>ZB.MOM.WW.MxGateway.Tests/TestSupport/</c>; IntegrationTests keeps its
/// own copy here so the two test projects stay independently buildable
/// without a shared test-helpers project (IntegrationTests-024).
/// </summary>
public sealed class NullDashboardEventBroadcaster : IDashboardEventBroadcaster
{
/// <summary>Shared no-op instance.</summary>
public static readonly NullDashboardEventBroadcaster Instance = new();
private NullDashboardEventBroadcaster()
{
}
/// <inheritdoc />
public void Publish(string sessionId, MxEvent mxEvent)
{
// Intentionally empty — integration tests assert directly on the gRPC
// stream writer, not on dashboard fan-out.
}
}
@@ -15,6 +15,7 @@ using ZB.MOM.WW.MxGateway.Server.Security.Authentication;
using ZB.MOM.WW.MxGateway.Server.Security.Authorization;
using ZB.MOM.WW.MxGateway.Server.Sessions;
using ZB.MOM.WW.MxGateway.Server.Workers;
using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport;
using Xunit.Abstractions;
namespace ZB.MOM.WW.MxGateway.IntegrationTests;
@@ -1595,9 +1596,4 @@ public sealed class WorkerLiveMxAccessSmokeTests(ITestOutputHelper output)
CancellationToken cancellationToken) => Task.CompletedTask;
}
private sealed class NullDashboardEventBroadcaster : ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster
{
public static readonly NullDashboardEventBroadcaster Instance = new();
public void Publish(string sessionId, ZB.MOM.WW.MxGateway.Contracts.Proto.MxEvent mxEvent) { }
}
}
@@ -154,6 +154,13 @@ public sealed class DashboardAuthenticator(
foreach (string group in groups)
{
string normalizedGroup = group.Trim();
// Lookup precedence (Server-040): the full literal group string is
// tried first; only if that misses do we fall back to the leading
// RDN value (e.g. "GwAdmin" extracted from
// "ou=GwAdmin,ou=groups,..."). The map's comparer is
// OrdinalIgnoreCase (see DashboardOptions.GroupToRole), so e.g.
// "GwAdmin" and "gwadmin" both match.
if (groupToRole.TryGetValue(normalizedGroup, out string? mapped)
|| groupToRole.TryGetValue(ExtractFirstRdnValue(normalizedGroup), out mapped))
{
@@ -11,6 +11,18 @@ namespace ZB.MOM.WW.MxGateway.Server.Dashboard;
/// role claims. Validity is enforced by the data-protection time-limited
/// protector; no separate signing keys are configured.
/// </summary>
/// <remarks>
/// Server-043: this service is registered as a singleton in
/// <see cref="DashboardServiceCollectionExtensions.AddGatewayDashboard"/> and
/// is shared by two consumer scopes: <c>DashboardHubConnectionFactory</c>
/// (scoped, per-circuit; calls <see cref="Issue"/> from the cookie-authenticated
/// dashboard) and <c>HubTokenAuthenticationHandler</c> (transient, per-request;
/// calls <see cref="Validate"/> from the SignalR negotiate / connection path).
/// The underlying <see cref="ITimeLimitedDataProtector"/> is thread-safe, so
/// minting and validating concurrently from any number of callers is safe;
/// future maintainers should preserve the singleton lifetime to keep the
/// protector instance stable.
/// </remarks>
public sealed class HubTokenService
{
private const string ProtectorPurpose = "ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1";
@@ -51,6 +63,16 @@ public sealed class HubTokenService
return null;
}
// Server-039: reject a token whose payload carries no caller
// identity. A null/empty Name AND NameIdentifier would otherwise
// produce a principal that satisfies IsAuthenticated and IsInRole
// checks without any associated user, because the AuthenticationType
// (the HubToken scheme) is non-empty.
if (string.IsNullOrEmpty(payload.Name) && string.IsNullOrEmpty(payload.NameIdentifier))
{
return null;
}
List<Claim> claims = [];
if (!string.IsNullOrEmpty(payload.Name))
{
@@ -8,34 +8,94 @@ namespace ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
/// <see cref="DashboardSnapshotHub"/> client. There is one publisher per
/// gateway process; clients listen via the hub.
/// </summary>
public sealed class DashboardSnapshotPublisher(
IDashboardSnapshotService snapshotService,
IHubContext<DashboardSnapshotHub> hubContext,
ILogger<DashboardSnapshotPublisher> logger) : BackgroundService
/// <remarks>
/// Server-042: <see cref="ExecuteAsync"/> wraps the snapshot subscription in
/// a reconnect loop with a configurable retry delay (5s by default,
/// mirroring <see cref="AlarmsHubPublisher"/>). A transient failure inside
/// <see cref="IDashboardSnapshotService.WatchSnapshotsAsync"/> — e.g. a
/// one-time logger-init failure or a transient SQL error from the Galaxy
/// summary projection — would otherwise end the BackgroundService with no
/// reconnect, taking the dashboard offline until process restart.
/// </remarks>
public sealed class DashboardSnapshotPublisher : BackgroundService
{
private static readonly TimeSpan DefaultReconnectDelay = TimeSpan.FromSeconds(5);
private readonly IDashboardSnapshotService _snapshotService;
private readonly IHubContext<DashboardSnapshotHub> _hubContext;
private readonly ILogger<DashboardSnapshotPublisher> _logger;
private readonly TimeSpan _reconnectDelay;
public DashboardSnapshotPublisher(
IDashboardSnapshotService snapshotService,
IHubContext<DashboardSnapshotHub> hubContext,
ILogger<DashboardSnapshotPublisher> logger)
: this(snapshotService, hubContext, logger, DefaultReconnectDelay)
{
}
// Internal hook for the Server-042 regression test: tests inject a
// very short reconnect delay so the assertion doesn't wait the full
// 5s. Production wiring always uses the 5s default via the public ctor.
internal DashboardSnapshotPublisher(
IDashboardSnapshotService snapshotService,
IHubContext<DashboardSnapshotHub> hubContext,
ILogger<DashboardSnapshotPublisher> logger,
TimeSpan reconnectDelay)
{
_snapshotService = snapshotService;
_hubContext = hubContext;
_logger = logger;
_reconnectDelay = reconnectDelay;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
try
// Loop forever — when WatchSnapshotsAsync completes or throws, reopen
// the subscription after a short delay. The hosted-service lifetime
// ends only when the host stops. Mirrors AlarmsHubPublisher.
while (!stoppingToken.IsCancellationRequested)
{
await foreach (DashboardSnapshot snapshot in snapshotService
.WatchSnapshotsAsync(stoppingToken)
.ConfigureAwait(false))
try
{
await foreach (DashboardSnapshot snapshot in _snapshotService
.WatchSnapshotsAsync(stoppingToken)
.ConfigureAwait(false))
{
if (stoppingToken.IsCancellationRequested)
{
break;
}
try
{
await _hubContext.Clients
.All
.SendAsync(DashboardSnapshotHub.SnapshotMessage, snapshot, stoppingToken)
.ConfigureAwait(false);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
_logger.LogWarning(ex, "Snapshot broadcast failed; will retry on the next snapshot tick.");
}
}
}
catch (OperationCanceledException)
{
return;
}
catch (Exception ex)
{
_logger.LogWarning(ex, "Snapshot subscription faulted; reconnecting in {DelaySeconds:F1}s.", _reconnectDelay.TotalSeconds);
try
{
await hubContext.Clients
.All
.SendAsync(DashboardSnapshotHub.SnapshotMessage, snapshot, stoppingToken)
.ConfigureAwait(false);
await Task.Delay(_reconnectDelay, stoppingToken).ConfigureAwait(false);
}
catch (Exception ex) when (ex is not OperationCanceledException)
catch (OperationCanceledException)
{
logger.LogWarning(ex, "Snapshot broadcast failed; will retry on the next snapshot tick.");
return;
}
}
}
catch (OperationCanceledException)
{
}
}
}
@@ -23,6 +23,29 @@ public sealed class EventsHub : Hub
public static string GroupName(string sessionId) => $"session:{sessionId}";
/// <summary>
/// Subscribes the calling SignalR connection to the per-session events
/// group, so that events broadcast by
/// <see cref="DashboardEventBroadcaster"/> for that session reach this
/// client.
/// </summary>
/// <remarks>
/// Server-038: in v1 the hub-level <see cref="AuthorizeAttribute"/>
/// (<c>HubClientsPolicy</c>) only checks that the caller carries one of
/// the dashboard roles (Admin or Viewer); both roles may subscribe to
/// any session id they choose. This is acceptable today because (a) the
/// dashboard's per-session views show non-secret session metadata that
/// any authenticated dashboard user can already see, and (b) value
/// logging in the source gRPC stream is gated by the same redaction
/// policy that protects logs. The per-session ACL that gates the gRPC
/// <c>StreamEvents</c> RPC is intentionally not yet mirrored here.
/// TODO(per-session-acl): once a role/scope is introduced that scopes a
/// Viewer to a specific session or tenant, add a session-access check
/// at this seam — either inline (consult the per-user allowed-session
/// set on <c>Context.User</c> claims / <c>Context.Items</c>) or via a
/// dedicated authorization policy applied to the hub method itself.
/// </remarks>
/// <param name="sessionId">Session id to subscribe the caller to.</param>
public Task SubscribeSession(string sessionId)
{
if (string.IsNullOrWhiteSpace(sessionId))
@@ -122,8 +122,23 @@ public sealed class EventStreamService(
// Mirror the event to the dashboard EventsHub group for this
// session. Fire-and-forget — broadcast errors must not affect
// the source gRPC stream.
dashboardEventBroadcaster.Publish(session.SessionId, publicEvent);
// the source gRPC stream. Server-041: the
// IDashboardEventBroadcaster contract documents Publish as
// never-throw, but we enforce that at the seam too, so a
// future implementation that adds synchronous validation or
// a serializer hop cannot fault the producer loop and end
// this client's gRPC stream.
try
{
dashboardEventBroadcaster.Publish(session.SessionId, publicEvent);
}
catch (Exception ex)
{
logger.LogDebug(
ex,
"Dashboard event mirror threw for session {SessionId}; continuing.",
session.SessionId);
}
if (!writer.TryWrite(publicEvent))
{
@@ -437,6 +437,46 @@ public sealed class ProtobufContractRoundTripTests
Assert.Equal(withFilter, StreamAlarmsRequest.Parser.ParseFrom(withFilter.ToByteArray()));
}
/// <summary>
/// Verifies that <c>QueryActiveAlarmsRequest</c> pins the additive-only field numbering
/// (<c>session_id = 1</c>, <c>client_correlation_id = 2</c>, <c>alarm_filter_prefix = 3</c>)
/// advertised in its proto comment, that the message round-trips with the optional
/// <c>alarm_filter_prefix</c> populated (the filter semantic the public RPC comment
/// documents), and that <c>QueryActiveAlarms</c> remains on the public service surface.
/// </summary>
[Fact]
public void QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter()
{
var service = Assert.Single(
MxaccessGatewayReflection.Descriptor.Services,
descriptor => descriptor.Name == "MxAccessGateway");
Assert.Contains(service.Methods, method => method.Name == "QueryActiveAlarms");
var fields = QueryActiveAlarmsRequest.Descriptor.Fields;
Assert.Equal(1, fields[QueryActiveAlarmsRequest.SessionIdFieldNumber].FieldNumber);
Assert.Equal(2, fields[QueryActiveAlarmsRequest.ClientCorrelationIdFieldNumber].FieldNumber);
Assert.Equal(3, fields[QueryActiveAlarmsRequest.AlarmFilterPrefixFieldNumber].FieldNumber);
Assert.Equal("session_id", fields[QueryActiveAlarmsRequest.SessionIdFieldNumber].Name);
Assert.Equal("client_correlation_id", fields[QueryActiveAlarmsRequest.ClientCorrelationIdFieldNumber].Name);
Assert.Equal("alarm_filter_prefix", fields[QueryActiveAlarmsRequest.AlarmFilterPrefixFieldNumber].Name);
var withoutFilter = new QueryActiveAlarmsRequest
{
ClientCorrelationId = "client-correlation-10",
};
var withFilter = new QueryActiveAlarmsRequest
{
ClientCorrelationId = "client-correlation-11",
AlarmFilterPrefix = "Tank01.",
};
Assert.Equal(withoutFilter, QueryActiveAlarmsRequest.Parser.ParseFrom(withoutFilter.ToByteArray()));
var parsedWithFilter = QueryActiveAlarmsRequest.Parser.ParseFrom(withFilter.ToByteArray());
Assert.Equal(withFilter, parsedWithFilter);
Assert.Equal("Tank01.", parsedWithFilter.AlarmFilterPrefix);
}
/// <summary>Verifies that an MxValue carrying a raw_value bytes payload round-trips.</summary>
[Fact]
public void MxValue_RoundTripsRawValueBytesPayload()
@@ -0,0 +1,206 @@
using System.Runtime.CompilerServices;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.MxGateway.Server.Dashboard;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
namespace ZB.MOM.WW.MxGateway.Tests.Gateway.Dashboard;
public sealed class DashboardSnapshotPublisherTests
{
private static readonly TimeSpan TestTimeout = TimeSpan.FromSeconds(5);
/// <summary>
/// Server-042 regression: a transient failure inside
/// <see cref="IDashboardSnapshotService.WatchSnapshotsAsync"/> must not
/// end the BackgroundService; the publisher must wait the configured
/// reconnect delay and then re-open the subscription. Before the fix,
/// the publisher exited on the first non-cancellation exception and
/// the dashboard's snapshot stream went silent until process restart.
/// </summary>
[Fact]
public async Task ExecuteAsync_WhenSnapshotServiceThrowsOnce_ReconnectsAfterDelay()
{
ThrowOnceThenYieldSnapshotService snapshotService = new();
RecordingHubContext hubContext = new();
TimeSpan reconnectDelay = TimeSpan.FromMilliseconds(50);
DashboardSnapshotPublisher publisher = new(
snapshotService,
hubContext,
NullLogger<DashboardSnapshotPublisher>.Instance,
reconnectDelay);
using CancellationTokenSource cts = new();
DateTimeOffset startedAt = DateTimeOffset.UtcNow;
Task execute = publisher.StartAsync(cts.Token);
await execute.WaitAsync(TestTimeout);
// The publisher's first WatchSnapshotsAsync call throws; the second
// call yields one snapshot. We block here until the publisher has
// made the second subscribe attempt AND broadcast its first
// snapshot — proving the publisher did NOT exit on the throw.
await WaitUntilAsync(() => snapshotService.SubscribeCount >= 2);
await WaitUntilAsync(() => hubContext.SendCount >= 1);
DateTimeOffset secondSubscribeAt = snapshotService.SecondSubscribeAt
?? throw new InvalidOperationException("Second subscribe did not record a timestamp.");
await cts.CancelAsync();
await publisher.StopAsync(CancellationToken.None);
Assert.True(snapshotService.SubscribeCount >= 2,
$"Expected at least 2 subscribe calls, got {snapshotService.SubscribeCount}.");
Assert.True(hubContext.SendCount >= 1);
// The gap between the throw (first subscribe) and the reconnect
// (second subscribe) is bounded below by the reconnect delay. We
// give a small slack (10ms) for scheduling jitter on slow CI VMs.
TimeSpan gap = secondSubscribeAt - startedAt;
Assert.True(gap >= reconnectDelay - TimeSpan.FromMilliseconds(10),
$"Expected reconnect gap >= {reconnectDelay.TotalMilliseconds}ms; got {gap.TotalMilliseconds}ms.");
}
/// <summary>
/// Sanity: a normal completion of WatchSnapshotsAsync (no exception)
/// also reconnects after the delay — exits only on host shutdown.
/// </summary>
[Fact]
public async Task ExecuteAsync_WhenSnapshotServiceCompletes_ReconnectsAfterDelay()
{
CompleteImmediatelySnapshotService snapshotService = new();
RecordingHubContext hubContext = new();
TimeSpan reconnectDelay = TimeSpan.FromMilliseconds(50);
DashboardSnapshotPublisher publisher = new(
snapshotService,
hubContext,
NullLogger<DashboardSnapshotPublisher>.Instance,
reconnectDelay);
using CancellationTokenSource cts = new();
Task execute = publisher.StartAsync(cts.Token);
await execute.WaitAsync(TestTimeout);
await WaitUntilAsync(() => snapshotService.SubscribeCount >= 2);
await cts.CancelAsync();
await publisher.StopAsync(CancellationToken.None);
Assert.True(snapshotService.SubscribeCount >= 2);
}
private static async Task WaitUntilAsync(Func<bool> predicate)
{
using CancellationTokenSource cancellationTokenSource = new(TestTimeout);
while (!predicate())
{
await Task.Delay(TimeSpan.FromMilliseconds(5), cancellationTokenSource.Token);
}
}
private sealed class ThrowOnceThenYieldSnapshotService : IDashboardSnapshotService
{
public int SubscribeCount { get; private set; }
public DateTimeOffset? SecondSubscribeAt { get; private set; }
public DashboardSnapshot GetSnapshot()
{
return null!;
}
public async IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
[EnumeratorCancellation] CancellationToken cancellationToken)
{
SubscribeCount++;
int call = SubscribeCount;
if (call == 1)
{
// First call: throw after a brief yield so the publisher
// observes us as a live producer that failed.
await Task.Yield();
throw new InvalidOperationException("simulated transient snapshot failure");
}
SecondSubscribeAt = DateTimeOffset.UtcNow;
yield return GetSnapshot();
// Stay open until cancelled so the publisher's inner await
// foreach doesn't immediately re-loop.
try
{
await Task.Delay(Timeout.InfiniteTimeSpan, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException)
{
}
}
}
private sealed class CompleteImmediatelySnapshotService : IDashboardSnapshotService
{
public int SubscribeCount { get; private set; }
public DashboardSnapshot GetSnapshot()
{
return null!;
}
#pragma warning disable CS1998 // async without await — IAsyncEnumerable contract requires async signature
public async IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
[EnumeratorCancellation] CancellationToken cancellationToken)
#pragma warning restore CS1998
{
SubscribeCount++;
// Yield nothing and complete immediately — simulates a transient
// upstream disconnect that completes cleanly.
yield break;
}
}
private sealed class RecordingHubContext : IHubContext<DashboardSnapshotHub>
{
private readonly RecordingHubClients _clients = new();
public IHubClients Clients => _clients;
public IGroupManager Groups { get; } = new NoopGroupManager();
public int SendCount => _clients.AllProxy.SendCount;
}
private sealed class RecordingHubClients : IHubClients
{
public RecordingClientProxy AllProxy { get; } = new();
public IClientProxy All => AllProxy;
public IClientProxy AllExcept(IReadOnlyList<string> excludedConnectionIds) => AllProxy;
public IClientProxy Client(string connectionId) => AllProxy;
public IClientProxy Clients(IReadOnlyList<string> connectionIds) => AllProxy;
public IClientProxy Group(string groupName) => AllProxy;
public IClientProxy GroupExcept(string groupName, IReadOnlyList<string> excludedConnectionIds) => AllProxy;
public IClientProxy Groups(IReadOnlyList<string> groupNames) => AllProxy;
public IClientProxy User(string userId) => AllProxy;
public IClientProxy Users(IReadOnlyList<string> userIds) => AllProxy;
}
private sealed class RecordingClientProxy : IClientProxy
{
private int _sendCount;
public int SendCount => Volatile.Read(ref _sendCount);
public Task SendCoreAsync(string method, object?[] args, CancellationToken cancellationToken = default)
{
Interlocked.Increment(ref _sendCount);
return Task.CompletedTask;
}
}
private sealed class NoopGroupManager : IGroupManager
{
public Task AddToGroupAsync(string connectionId, string groupName, CancellationToken cancellationToken = default)
=> Task.CompletedTask;
public Task RemoveFromGroupAsync(string connectionId, string groupName, CancellationToken cancellationToken = default)
=> Task.CompletedTask;
}
}
@@ -0,0 +1,115 @@
using System.Security.Claims;
using Microsoft.AspNetCore.DataProtection;
using ZB.MOM.WW.MxGateway.Server.Dashboard;
namespace ZB.MOM.WW.MxGateway.Tests.Gateway.Dashboard;
public sealed class HubTokenServiceTests
{
/// <summary>
/// Server-039: a token whose data-protected payload has both
/// <c>Name</c> and <c>NameIdentifier</c> null (the principal that
/// minted the token had no identity claims) must be rejected by
/// <see cref="HubTokenService.Validate"/>. The role claims alone are
/// not enough — without a caller identity, the resulting
/// <see cref="ClaimsPrincipal"/> would satisfy
/// <c>IsAuthenticated</c> / <c>IsInRole</c> checks without an
/// associated user.
/// </summary>
[Fact]
public void Validate_TokenWithNullNameAndNullNameIdentifier_ReturnsNull()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
// Issue from a principal with NO Name claim and NO NameIdentifier
// claim. The Issue method's payload will then carry
// (Name = null, NameIdentifier = null, Roles = ["Viewer"]).
ClaimsIdentity identity = new(
[new Claim(ClaimTypes.Role, DashboardRoles.Viewer)],
authenticationType: "test");
ClaimsPrincipal principal = new(identity);
string token = service.Issue(principal);
ClaimsPrincipal? result = service.Validate(token);
Assert.Null(result);
}
/// <summary>
/// Sanity check: a token minted from a principal with a Name claim
/// validates and returns a principal carrying that identity. Pins
/// that the Server-039 fix does not over-reject valid tokens.
/// </summary>
[Fact]
public void Validate_TokenWithName_ReturnsAuthenticatedPrincipal()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
ClaimsIdentity identity = new(
[
new Claim(ClaimTypes.Name, "alice"),
new Claim(ClaimTypes.NameIdentifier, "alice-id"),
new Claim(ClaimTypes.Role, DashboardRoles.Admin),
],
authenticationType: "test",
nameType: ClaimTypes.Name,
roleType: ClaimTypes.Role);
ClaimsPrincipal principal = new(identity);
string token = service.Issue(principal);
ClaimsPrincipal? result = service.Validate(token);
Assert.NotNull(result);
Assert.Equal("alice", result.Identity?.Name);
Assert.True(result.IsInRole(DashboardRoles.Admin));
}
/// <summary>
/// Sanity check: a token minted with only a NameIdentifier (no Name)
/// still validates — a non-null caller identity is the contract,
/// either field is sufficient.
/// </summary>
[Fact]
public void Validate_TokenWithOnlyNameIdentifier_ReturnsPrincipal()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
ClaimsIdentity identity = new(
[
new Claim(ClaimTypes.NameIdentifier, "alice-id"),
new Claim(ClaimTypes.Role, DashboardRoles.Viewer),
],
authenticationType: "test");
ClaimsPrincipal principal = new(identity);
string token = service.Issue(principal);
ClaimsPrincipal? result = service.Validate(token);
Assert.NotNull(result);
Assert.True(result.IsInRole(DashboardRoles.Viewer));
}
[Fact]
public void Validate_NullToken_ReturnsNull()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
Assert.Null(service.Validate(null));
}
[Fact]
public void Validate_EmptyToken_ReturnsNull()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
Assert.Null(service.Validate(string.Empty));
}
[Fact]
public void Validate_GarbageToken_ReturnsNull()
{
HubTokenService service = new(new EphemeralDataProtectionProvider());
Assert.Null(service.Validate("this-is-not-a-protected-payload"));
}
}
@@ -414,9 +414,4 @@ public sealed class GatewayEndToEndFakeWorkerSmokeTests
}
}
private sealed class NullDashboardEventBroadcaster : ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster
{
public static readonly NullDashboardEventBroadcaster Instance = new();
public void Publish(string sessionId, ZB.MOM.WW.MxGateway.Contracts.Proto.MxEvent mxEvent) { }
}
}
@@ -9,6 +9,7 @@ using ZB.MOM.WW.MxGateway.Server.Grpc;
using ZB.MOM.WW.MxGateway.Server.Metrics;
using ZB.MOM.WW.MxGateway.Server.Sessions;
using ZB.MOM.WW.MxGateway.Server.Workers;
using ZB.MOM.WW.MxGateway.Tests.TestSupport;
namespace ZB.MOM.WW.MxGateway.Tests.Gateway.Grpc;
@@ -260,11 +261,81 @@ public sealed class EventStreamServiceTests
Assert.Equal(1, metrics.GetSnapshot().Faults);
}
/// <summary>
/// Tests-026 regression: <see cref="EventStreamService.StreamEventsAsync"/>
/// must mirror every yielded event to the
/// <see cref="ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster"/>
/// seam (the only path that fans events out to dashboard SignalR clients).
/// A regression that silently dropped the <c>Publish</c> call — e.g. an
/// <c>if</c> accidentally added around it, or the broadcaster ctor
/// parameter being removed — would have produced no failing test before
/// this fixture existed. The recording fake captures every call and we
/// assert one publish per yielded event, with the correct session id and
/// preserved <c>WorkerSequence</c>.
/// </summary>
[Fact]
public async Task StreamEventsAsync_PublishesEachEventToDashboardBroadcaster()
{
FakeWorkerClient workerClient = new();
GatewaySession session = CreateReadySession(workerClient);
RecordingDashboardEventBroadcaster recordingBroadcaster = new();
EventStreamService service = CreateService(
new FakeSessionManager(session),
dashboardEventBroadcaster: recordingBroadcaster);
workerClient.Events.Add(CreateWorkerEvent(sequence: 7, MxEventFamily.OnDataChange));
workerClient.Events.Add(CreateWorkerEvent(sequence: 8, MxEventFamily.OnWriteComplete));
workerClient.CompleteAfterConfiguredEvents = true;
List<MxEvent> events = await CollectEventsAsync(service, session.SessionId);
Assert.Equal([7UL, 8UL], events.Select(mxEvent => mxEvent.WorkerSequence).ToArray());
IReadOnlyList<DashboardEventCapture> captures = recordingBroadcaster.Captures;
Assert.Equal(2, captures.Count);
Assert.All(captures, capture => Assert.Equal(session.SessionId, capture.SessionId));
Assert.Equal([7UL, 8UL], captures.Select(capture => capture.MxEvent.WorkerSequence).ToArray());
Assert.Equal(MxEventFamily.OnDataChange, captures[0].MxEvent.Family);
Assert.Equal(MxEventFamily.OnWriteComplete, captures[1].MxEvent.Family);
}
/// <summary>
/// Server-041 regression: <see cref="EventStreamService"/> must not
/// abort the gRPC stream when the dashboard broadcaster throws.
/// <c>IDashboardEventBroadcaster.Publish</c> is documented as
/// best-effort and never-throw, but the gRPC consumer cannot rely on
/// implementation discipline alone — the seam itself swallows the
/// fault and logs at debug, mirroring the broadcaster's own
/// continuation handler. Without the wrap, the producer loop would
/// surface the exception and the client would see a faulted stream
/// for a dashboard-mirror failure.
/// </summary>
[Fact]
public async Task StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession()
{
FakeWorkerClient workerClient = new();
GatewaySession session = CreateReadySession(workerClient);
using GatewayMetrics metrics = new();
ThrowingDashboardEventBroadcaster throwingBroadcaster = new();
EventStreamService service = CreateService(
new FakeSessionManager(session),
metrics,
dashboardEventBroadcaster: throwingBroadcaster);
workerClient.Events.Add(CreateWorkerEvent(sequence: 1, MxEventFamily.OnDataChange));
workerClient.Events.Add(CreateWorkerEvent(sequence: 2, MxEventFamily.OnDataChange));
workerClient.CompleteAfterConfiguredEvents = true;
List<MxEvent> events = await CollectEventsAsync(service, session.SessionId);
Assert.Equal([1UL, 2UL], events.Select(mxEvent => mxEvent.WorkerSequence).ToArray());
Assert.Equal(2, throwingBroadcaster.PublishAttempts);
Assert.NotEqual(SessionState.Faulted, session.State);
}
private static EventStreamService CreateService(
FakeSessionManager sessionManager,
GatewayMetrics? metrics = null,
int queueCapacity = 8,
EventBackpressurePolicy backpressurePolicy = EventBackpressurePolicy.FailFast)
EventBackpressurePolicy backpressurePolicy = EventBackpressurePolicy.FailFast,
ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster? dashboardEventBroadcaster = null)
{
return new EventStreamService(
sessionManager,
@@ -278,14 +349,19 @@ public sealed class EventStreamServiceTests
}),
new MxAccessGrpcMapper(),
metrics ?? new GatewayMetrics(),
NullDashboardEventBroadcaster.Instance,
dashboardEventBroadcaster ?? NullDashboardEventBroadcaster.Instance,
NullLogger<EventStreamService>.Instance);
}
private sealed class NullDashboardEventBroadcaster : ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster
private sealed class ThrowingDashboardEventBroadcaster : ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs.IDashboardEventBroadcaster
{
public static readonly NullDashboardEventBroadcaster Instance = new();
public void Publish(string sessionId, MxEvent mxEvent) { }
public int PublishAttempts { get; private set; }
public void Publish(string sessionId, MxEvent mxEvent)
{
PublishAttempts++;
throw new InvalidOperationException("simulated dashboard broadcaster failure");
}
}
private static async Task<List<MxEvent>> CollectEventsAsync(
@@ -0,0 +1,26 @@
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
namespace ZB.MOM.WW.MxGateway.Tests.TestSupport;
/// <summary>
/// <see cref="IDashboardEventBroadcaster"/> that drops every event on the floor.
/// Used by tests that need to wire up <c>EventStreamService</c> but do not care
/// about the dashboard fan-out side-channel. The singleton <see cref="Instance"/>
/// mirrors the Tests-007 / Tests-021 shared-fake pattern under
/// <c>src/ZB.MOM.WW.MxGateway.Tests/TestSupport/</c>.
/// </summary>
public sealed class NullDashboardEventBroadcaster : IDashboardEventBroadcaster
{
/// <summary>Shared instance for tests that do not need to inspect broadcaster state.</summary>
public static readonly NullDashboardEventBroadcaster Instance = new();
private NullDashboardEventBroadcaster()
{
}
/// <inheritdoc />
public void Publish(string sessionId, MxEvent mxEvent)
{
}
}
@@ -0,0 +1,30 @@
using System.Collections.Concurrent;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.MxGateway.Server.Dashboard.Hubs;
namespace ZB.MOM.WW.MxGateway.Tests.TestSupport;
/// <summary>
/// <see cref="IDashboardEventBroadcaster"/> test double that captures every
/// <c>Publish(sessionId, mxEvent)</c> call into a thread-safe list, so tests
/// can prove the gRPC producer loop actually mirrors events to the dashboard
/// fan-out seam. Tests-026.
/// </summary>
public sealed class RecordingDashboardEventBroadcaster : IDashboardEventBroadcaster
{
private readonly ConcurrentQueue<DashboardEventCapture> _captures = new();
/// <summary>Gets a snapshot of every <c>Publish</c> call in arrival order.</summary>
public IReadOnlyList<DashboardEventCapture> Captures => _captures.ToArray();
/// <inheritdoc />
public void Publish(string sessionId, MxEvent mxEvent)
{
_captures.Enqueue(new DashboardEventCapture(sessionId, mxEvent));
}
}
/// <summary>Captured <c>(sessionId, mxEvent)</c> pair from a Publish invocation.</summary>
/// <param name="SessionId">The session id passed to Publish.</param>
/// <param name="MxEvent">The MxEvent passed to Publish.</param>
public sealed record DashboardEventCapture(string SessionId, MxEvent MxEvent);