Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
source; switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a bench-read-bulk subcommand to every client CLI (.NET, Go, Rust,
Python, Java) and a PowerShell driver that runs all five concurrently
against the deployed gateway and prints a side-by-side comparison.
Each CLI''s bench:
- Opens its own session, registers, subscribes to bulk-size tags so the
worker''s MxAccessValueCache populates from real OnDataChange events.
- Runs a warmup-seconds-long pre-loop with identical calls so JIT /
connection-pool / first-call overhead is amortised before the
measurement window.
- Runs ReadBulk in a tight in-process loop for duration-seconds with
per-call high-resolution latency capture (Stopwatch in .NET,
time.Now in Go, std::time::Instant in Rust, time.perf_counter in
Python, System.nanoTime in Java).
- Unsubscribes + closes the session, then emits one JSON object with
the shared schema: { language, durationMs, totalCalls, successfulCalls,
failedCalls, totalReadResults, cachedReadResults, callsPerSecond,
latencyMs: { p50, p95, p99, max, mean } }.
The PS driver (scripts/bench-read-bulk.ps1) launches one detached process
per client, waits for all to finish, parses the trailing JSON object from
each stdout, prints a comparison table, and persists the combined report
under artifacts/bench/. Quoting around Java''s `gradle --args="..."` is
handled by writing a one-shot .bat that cmd.exe runs; the .NET CLI''s
per-call gRPC timeout is auto-scaled to (Duration + Warmup + 30s) so the
channel-wide timeout doesn''t cancel the bench mid-loop.
Live 30-second steady-state run against the deployed gateway, all five
clients hitting the same six TestMachine_001..006.TestChangingInt tags:
client calls/sec cached/total p50 ms p95 ms p99 ms max ms
dotnet 171.78 30924/30924 3.84 14.06 40.41 542.48
go 175.46 31590/31590 3.93 13.52 41.26 243.00
rust 123.26 22188/22188 5.52 15.78 48.11 544.41
python 145.79 26244/26244 4.86 14.85 41.65 645.84
java 181.12 32604/32604 3.80 10.59 33.37 344.27
143,550 ReadBulk results across all five clients during the 30s window;
100% were was_cached = true (the worker''s cache fast-path never fell
through to the snapshot lifecycle). Aggregate read throughput ~800
calls/sec against five concurrent sessions sharing the same cached tags.
A second variant with bulk-size 20 sustained the same per-client call
rate while delivering 3.3x more values per call (~37,000 cached reads/sec
aggregate across the five concurrent sessions), confirming the linear
per-tag cache lookup inside one call is not a bottleneck at this scale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit added the bulk read/write library surface in every
client; this commit makes that surface reachable from each client's CLI
and exercises it through scripts/run-client-e2e-tests.ps1.
Five new subcommands in every client CLI (.NET / Go / Rust / Python /
Java): read-bulk, write-bulk, write2-bulk, write-secured-bulk, and
write-secured2-bulk. Each follows the existing subscribe-bulk shape:
- read-bulk takes --server-handle, --items <csv tag list>, and
--timeout-ms (0 = worker default). JSON output carries the
BulkReadResult fields, including was_cached so the e2e matrix can
verify the cached-path semantics.
- The four bulk-write families take --server-handle, --item-handles
<csv>, --type, --values <csv>. write2-bulk and write-secured2-bulk
add a single --timestamp applied to every entry; the secured
variants take --current-user-id and --verifier-user-id. All four
output BulkWriteResult JSON.
A new -SkipReadWriteBulk switch on the matrix script (default OFF)
controls two new e2e phases:
- After the existing subscribe-bulk phase leaves tags advised, the
script runs read-bulk against the same tag list and asserts most
results return was_cached = true. This is the only e2e coverage of
the cache-then-snapshot fork — the unit + gateway tests verify the
semantics with a fake worker, but only the live cross-language
matrix proves the cache populates from real OnDataChange events and
survives the round-trip through every client''s JSON parser.
- When -VerifyWrite is set, the write phase now also runs a single-
entry write-bulk against the same writable item handle (using a
distinct sentinel value) and asserts a per-entry success. Confirms
the BulkWriteResult wire format end-to-end without complicating
the OnWriteComplete echo assertion the single-item phase already
verifies.
Dry-run validation passes for all five clients: each emits the correct
read-bulk and write-bulk CLI invocations with the right flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client.Java-006: close() on both clients only called shutdown(). It now
awaits termination up to the connect timeout and shutdownNow()s on timeout.
Client.Java-007: added MxGatewayLowFindingsTests covering the alarm surface,
async streaming, MxEventStream overflow, and TLS channel construction. A
latent bug surfaced: a missing CA file throws IllegalArgumentException, not
SSLException — the channel-builder catch was broadened accordingly.
Client.Java-008: async thenApply sites now route stray RuntimeExceptions
through MxGatewayErrors.fromGrpc via a normalising validator.
Client.Java-009: extracted ~80 duplicated lines (createChannel, withDeadline,
toCompletable, ...) into a shared MxGatewayChannels; both clients delegate.
Client.Java-010 (re-triaged): the README's metadata:read scope was correct;
the acknowledgeAlarm Javadoc's invoke:alarm-ack was wrong — corrected to the
admin scope.
Client.Java-011: documented the intentional fail-fast event-stream
backpressure in Javadoc and the README.
Client.Java-012: replaced CommonOptions.resolved()'s mutate-and-return-this
with side-effect-free resolvedApiKey()/resolvedTimeout() accessors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client.Java-001: redactApiKey echoed the last 4 secret characters. It now
keeps only the non-secret mxgw_<key-id>_ prefix plus ***; non-gateway-shaped
tokens return <redacted>.
Client.Java-002: a close() after a queue-overflow could wipe the enqueued
overflow exception. Terminal transitions are now serialized through a single
guarded terminate() — first terminal condition wins.
Client.Java-003: openSession never read gateway_protocol_version. Both
openSession paths now call ensureGatewayProtocolCompatible, rejecting a
non-zero mismatch and accepting unset (0) for older gateways.
Client.Java-004: register/addItem/addItem2 fell back to a return_value that
silently yields 0 when unset. The fallback is now guarded by hasReturnValue()
and throws on a protocol violation.
Client.Java-005: close() in try-with-resources could mask the body exception
when the CloseSession RPC failed. close() now catches and logs the
close-time failure; closeRaw() still surfaces it for callers that want it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Mirrors PR E.2's .NET surface
on the Java SDK. Depends on PR E.1 (regen, merged).
- MxGatewayClient.acknowledgeAlarm — blocking unary call, validates
protocol status via the existing MxGatewayErrors helper. Wraps
RuntimeException through MxGatewayErrors.fromGrpc for typed
failure mapping.
- MxGatewayClient.acknowledgeAlarmAsync — CompletableFuture variant
using the future stub.
- MxGatewayClient.queryActiveAlarms — async server-streaming RPC
observed via a new MxGatewayActiveAlarmsSubscription handle
(parallel to MxGatewayEventSubscription; the existing
subscription class is hard-typed to MxEvent so a parallel type
was simpler than retrofitting generics).
- MxGatewayClientVersion bumps GATEWAY_PROTOCOL_VERSION 2 → 3 to
match the .NET contract; CLI version-string assertions updated
to match.
Java SDK build green via Gradle 9.4.1 (mxgateway-client + mxgateway-cli).
17 tasks, all tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>