stream-alarms attaches to the gateway's central alarm feed (mirrors
stream-events: --max-events cap, --json/--jsonl, --filter-prefix);
acknowledge-alarm is a unary session-less ack (--reference required,
--comment, --operator). Both wired through IMxGatewayCliClient and the
adapter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cross-language e2e matrix spawned one CLI process per operation —
~250 per client — paying a process (and, for the Java CLI, a full JVM)
cold-start every time. The Java leg alone ran ~16 minutes.
Each client CLI (dotnet, go, rust, python, java) gains a `batch`
subcommand: a single process that reads one command line from stdin,
runs it through the normal subcommand dispatch, writes the JSON result,
then a line containing exactly `__MXGW_BATCH_EOR__`. A failing command
writes its `{"error":...}` envelope and the loop continues.
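A minimal sketch of the batch loop described above, in Python. The `dispatch` function and its `ping` command are hypothetical stand-ins for a client's real subcommand dispatch, and the string-valued `error` field is an assumption about the envelope's contents; only the sentinel line and the continue-on-failure behavior come from this change.

```python
import io
import json

EOR = "__MXGW_BATCH_EOR__"  # sentinel line named in this commit


def dispatch(command_line: str) -> dict:
    """Hypothetical stand-in for the CLI's normal subcommand dispatch."""
    args = command_line.split()
    if args and args[0] == "ping":
        return {"ok": True}
    raise ValueError(f"unknown command: {command_line!r}")


def run_batch(stdin, stdout) -> None:
    """Read one command per line; write its JSON result, then the sentinel."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            result = dispatch(line)
        except Exception as exc:  # a failing command must not end the loop
            result = {"error": str(exc)}  # assumed envelope shape
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()  # the driver is blocked waiting for the sentinel
```

Flushing after every sentinel matters: the driver reads replies one at a time, so buffered output would stall it.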
run-client-e2e-tests.ps1 now launches one batch process per client and
pipes every operation through its stdin/stdout, so startup is paid once
per client. The orchestration and assertions are unchanged; the parity
and auth phases now read the `{"error":...}` envelope instead of a
process exit code.
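The driver side can be sketched the same way. The real harness is the PowerShell script; this Python version only illustrates the idea, and `read_reply` and `succeeded` are hypothetical names. It collects output up to the sentinel and treats an `error` key as failure in place of an exit code.

```python
import io
import json

EOR = "__MXGW_BATCH_EOR__"  # sentinel line written after every reply


def read_reply(stdout):
    """Collect lines up to the sentinel and parse them as one JSON document."""
    chunks = []
    for line in stdout:
        if line.rstrip("\r\n") == EOR:
            return json.loads("".join(chunks))
        chunks.append(line)
    raise EOFError("batch process exited before writing the sentinel")


def succeeded(reply) -> bool:
    """Stands in for the old exit-code check: an error envelope means failure."""
    return not (isinstance(reply, dict) and "error" in reply)
```

Because the reply is accumulated until the sentinel, multi-line (pretty-printed) JSON works as well as single-line output.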
Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java
leg dropped from ~16 min to ~2-3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both findings surfaced when running the cross-language e2e matrix
(scripts/run-client-e2e-tests.ps1) against the redeployed gateway at
commit 84d36b7. Filed in code-reviews/Server/findings.md and
code-reviews/Client.Dotnet/findings.md and fixed in the same change.
Server-030 (Medium / Error handling): GatewaySession.GetReadyWorkerClient
gated on `_state == Ready && _workerClient.State == Ready` but only
formatted `_state` into the SessionManagerException message. Under load
the gateway-driven `_state` and the worker-driven `WorkerClient.State`
can diverge, producing a self-contradictory diagnostic ("Session ... is
not ready. Current state is Ready."). The Java e2e client hit this on
the 56th item after 55 successful add-items. Rewrote the message to
include both states ("Session state is X; worker state is Y"), added
an XML doc explaining the two-state contract and that this branch is
the fail-fast for a divergence race, and added regression test
SessionManagerTests.InvokeAsync_WhenWorkerNotReadyButSessionReady_DiagnosticIncludesBothStates
that asserts both states appear in the message. The deeper race (should
the gateway briefly wait for worker-Ready before failing?) remains
open as a follow-up.
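The shape of the Server-030 fix, sketched in Python rather than the gateway's C#. The `State` members other than `READY`, the `SessionNotReadyError` type, and the returned placeholder are hypothetical; the point is that the gate checks two states, so the message has to report both.

```python
from enum import Enum


class State(Enum):
    CONNECTING = "Connecting"  # hypothetical non-ready state
    READY = "Ready"


class SessionNotReadyError(Exception):
    """Hypothetical analogue of SessionManagerException."""


def get_ready_worker(session_state: State, worker_state: State) -> str:
    """Fail fast unless both the session and its worker report Ready."""
    if session_state is State.READY and worker_state is State.READY:
        return "worker-client"  # placeholder for the real client object
    # Report both states so a divergence cannot produce a message that
    # says the session is not ready while also printing "Ready".
    raise SessionNotReadyError(
        f"Session state is {session_state.value}; "
        f"worker state is {worker_state.value}"
    )
```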
Client.Dotnet-017 (Low / Error handling): stream-events CLI threw
OperationCanceledException as an unhandled exception when the user's
--timeout expired before --max-events was reached. The process exited
with code -532462766 (0xE0434352, the CLR unhandled-exception code) and
emitted no aggregate JSON. The other client CLIs (Go, Rust, Python,
Java) exit 0 in this case. Wrapped the `await foreach` in
`catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)`
so the supplied token's cancellation (--timeout, Ctrl+C, or parent
CTS) becomes graceful completion; the aggregate `{ "events": [...] }`
JSON is still emitted after the catch. Added regression test
RunAsync_StreamEvents_WhenTimeoutFiresAfterEvents_EmitsCollectedEventsAndExitsZero
backed by a new FakeCliClient.StreamHangAfterEvents hook that yields
the configured events then parks on the cancellation token.
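An asyncio analogue of the Client.Dotnet-017 behavior, as a sketch only. `hanging_stream` mimics what the StreamHangAfterEvents hook is described as doing, and `stream_events` is a hypothetical name. Unlike the C# exception filter, this version cannot tell the caller's timeout apart from a `TimeoutError` raised inside the stream itself.

```python
import asyncio
import json


async def hanging_stream(events):
    """Yield the configured events, then park until cancelled."""
    for event in events:
        yield event
    await asyncio.Event().wait()  # never set, so this blocks until cancellation


async def stream_events(stream, max_events: int, timeout: float) -> str:
    collected = []

    async def pump():
        async for event in stream:
            collected.append(event)
            if len(collected) >= max_events:
                return

    try:
        await asyncio.wait_for(pump(), timeout)
    except asyncio.TimeoutError:
        pass  # the caller's own timeout is graceful completion, not a failure
    # The aggregate is emitted whether the cap or the timeout ended the stream.
    return json.dumps({"events": collected})
```

Keeping `collected` outside the cancelled task is what lets the events gathered before the timeout survive it.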
Side cleanup: the GatewayApplicationTests test added under Server-020
was asserting an invariant (`/dashboard/dashboard/X` doesn't exist)
that I broke by reverting Server-020 in 84d36b7. The doubled endpoint
shapes do exist now (MapGroup("/dashboard") prefixing an already
"/dashboard/X" @page directive) but they're harmless — no client
requests `/dashboard/dashboard/X`. Replaced the test with a positive
assertion (`/dashboard/X` routes ARE registered) and rewrote the XML
doc to record the actual contract.
Verified: dotnet test src/MxGateway.Tests passes 480/480, dotnet test
clients/dotnet/MxGateway.Client.Tests passes 77/77, gateway redeployed
at this commit and GET http://localhost:5130/dashboard returns 200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client.Dotnet-004: documented DefaultCallTimeout as both the per-attempt
deadline and the shared retry budget, and removed DeadlineExceeded from the
transient-retry set (a client-imposed deadline cannot be helped by retrying).
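One way to read the Client.Dotnet-004 contract, sketched in Python. The contents of `TRANSIENT`, the `CallError` type, and the `max_attempts` default are assumptions; the source states only that DeadlineExceeded is no longer retried and that DefaultCallTimeout bounds both each attempt and the retries as a whole.

```python
import time

# Assumed transient set; DEADLINE_EXCEEDED is deliberately absent.
TRANSIENT = frozenset({"UNAVAILABLE"})


class CallError(Exception):
    """Hypothetical error carrying a gRPC-style status code name."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def call_with_retry(attempt, default_call_timeout: float,
                    max_attempts: int = 3, clock=time.monotonic):
    """Retry transient failures while one shared time budget lasts."""
    budget_end = clock() + default_call_timeout
    for n in range(1, max_attempts + 1):
        remaining = budget_end - clock()
        if remaining <= 0:
            raise CallError("DEADLINE_EXCEEDED")
        try:
            # Each attempt gets whatever is left of the shared budget.
            return attempt(remaining)
        except CallError as exc:
            if exc.code not in TRANSIENT or n == max_attempts:
                raise
```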
Client.Dotnet-005: RegisterAsync/AddItemAsync/AddItem2Async silently returned
0 when a successful reply lacked the typed payload. They now throw a
descriptive MxGatewayException.
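The Client.Dotnet-005 change in miniature, as a Python sketch. The reply layout (`add_item`, `item_handle`) and the `GatewayError` type are hypothetical; the idea is that a missing payload raises instead of silently collapsing to 0.

```python
class GatewayError(Exception):
    """Hypothetical analogue of MxGatewayException."""


def item_handle_from_reply(reply: dict) -> int:
    """Extract the typed payload, refusing to invent a default handle."""
    payload = reply.get("add_item")  # hypothetical field name
    if payload is None:
        raise GatewayError(
            "AddItem reply reported success but carried no add_item payload"
        )
    return payload["item_handle"]
```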
Client.Dotnet-006: added XML docs to the previously undocumented public
members MaxGrpcMessageBytes, GatewayProtocolVersion, WorkerProtocolVersion.
Client.Dotnet-007: corrected the AcknowledgeAlarmAsync XML comment — the RPC
requires the admin scope, not a non-existent invoke:alarm-ack sub-scope.
Client.Dotnet-008: the CLI redactor missed env-var-sourced keys because the
caller passed only the --api-key option. Redaction now uses the same
resolver, stripping env-var keys too.
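A sketch of the Client.Dotnet-008 idea in Python: the redactor resolves the key the same way the caller does, so both sources are covered. The environment variable name `MXGW_API_KEY` and both function names are hypothetical.

```python
import os

API_KEY_ENV = "MXGW_API_KEY"  # hypothetical variable name


def resolve_api_key(option_value, env=os.environ):
    """Single resolver: the explicit option wins, else the environment."""
    return option_value or env.get(API_KEY_ENV)


def redact(text: str, option_value, env=os.environ) -> str:
    """Mask whichever key the resolver would actually hand to the client."""
    key = resolve_api_key(option_value, env)
    return text.replace(key, "***") if key else text
```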
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seventh PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR A.1 (proto, merged)
and E.1 (regen, merged).
Hand-written .NET SDK methods on top of the regenerated proto types:
- MxGatewayClient.AcknowledgeAlarmAsync — routes through the existing
safe-unary retry pipeline (Acks are idempotent at MxAccess), maps
Unauthenticated/PermissionDenied RpcExceptions to typed
MxGatewayAuthenticationException / MxGatewayAuthorizationException
via GrpcMxGatewayClientTransport.MapRpcException.
- MxGatewayClient.QueryActiveAlarmsAsync — server-streaming
IAsyncEnumerable<ActiveAlarmSnapshot> mirroring the StreamEvents
pattern.
- IMxGatewayClientTransport extended; GrpcMxGatewayClientTransport
implements both methods using the regenerated grpc client.
- FakeGatewayTransport extended with capture lists, exception queue,
and reply / snapshot enqueue helpers.
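The status-to-exception mapping above, sketched in Python for illustration. The class names are hypothetical analogues of the .NET exception types, and falling back to the base type for other codes is an assumption, not something this PR states.

```python
class GatewayError(Exception):
    """Hypothetical base type for gateway failures."""


class GatewayAuthenticationError(GatewayError):
    """Raised for UNAUTHENTICATED replies."""


class GatewayAuthorizationError(GatewayError):
    """Raised for PERMISSION_DENIED replies."""


_TYPED = {
    "UNAUTHENTICATED": GatewayAuthenticationError,
    "PERMISSION_DENIED": GatewayAuthorizationError,
}


def map_rpc_error(status_code: str, detail: str) -> GatewayError:
    """Translate a gRPC status code name into a typed exception instance."""
    return _TYPED.get(status_code, GatewayError)(detail)
```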
CLI version-string assertions updated for the GatewayProtocolVersion
2 → 3 bump from A.1.
The CLI alarms verb (subscribe / acknowledge / query-active) is
deferred to a follow-up — keeping this PR focused on the SDK surface
that lmxopcua's GalaxyDriver consumes in PR B.2. The other-language
SDKs (E.3-E.6) layer the same shape on the regen.
Tests:
- 6 new MxGatewayClientAlarmsTests — request shape, cancellation
honored (linked token via the retry pipeline), Unauthenticated mapping,
streaming snapshot enumeration, filter prefix passthrough,
cancellation during enumeration.
- Full client test suite: 57 passed (was 51; 6 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Resolve 14 conflicts from popping local stash on top of origin's
eed1e88 + 8d3352f doc-comment additions (11 mechanical, plus
version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs)
- Fix 4 test files that used AGENTS.md as the repo-root sentinel
(now use CLAUDE.md, since AGENTS.md was removed in 4731ab5)
- Redirect 10 doc citations from AGENTS.md to the matching gateway.md
sections (Value Model, Status Model, Security, STA Worker Thread
Model, gRPC Layer rule, cancellation rule)
Verified: solution build clean, x86 worker build clean, 266/266
gateway tests passing, 121/121 worker tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>