Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules

Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
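Client.Rust-013 keeps the lint suppression inside the generated code instead of applying it crate-wide. A minimal sketch of that scoping pattern follows. It assumes a module tree mirroring the `generated::mxaccess_gateway::v1` path that appears in the diff; `tag_count` and its doc comment are hypothetical placeholders for tonic output, which a real crate would typically pull in with `tonic::include_proto!` rather than write by hand.

```rust
// Hypothetical stand-in for the generated-code module tree.
pub mod generated {
    pub mod mxaccess_gateway {
        pub mod v1 {
            // Inner attribute: it must precede every item in the module and
            // applies to this module and its children only, so hand-written
            // code elsewhere in the crate is still linted.
            #![allow(clippy::doc_lazy_continuation)]

            /// Returns how many tags a bulk read would cover.
            ///
            /// - `tags`: tag names to read
            /// continuation text that is not indented under the bullet, the
            /// pattern `clippy::doc_lazy_continuation` flags
            pub fn tag_count(tags: &[&str]) -> usize {
                tags.len()
            }
        }
    }
}
```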

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
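The Worker -017 fix amounts to a guard in front of the hang check. The worker itself is not written in Rust, so the following is only a language-neutral sketch expressed in Rust. `WatchdogState`, `Verdict`, `evaluate` and the `hung_after` threshold are hypothetical names; the field is modelled on `CurrentCommandCorrelationId`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical snapshot of what the watchdog inspects on each tick.
pub struct WatchdogState {
    /// Last time the STA thread reported progress.
    pub last_progress: Instant,
    /// Empty when idle; set while a command (e.g. a slow ReadBulk) is executing.
    pub current_command_correlation_id: String,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Healthy,
    StaHung,
}

/// Reports `StaHung` only when no command is in flight and the STA thread has
/// been silent for longer than `hung_after`.
pub fn evaluate(state: &WatchdogState, now: Instant, hung_after: Duration) -> Verdict {
    if !state.current_command_correlation_id.is_empty() {
        // A command owns the STA thread; a long ReadBulk is not a hang.
        return Verdict::Healthy;
    }
    // Returns zero rather than panicking if `now` precedes `last_progress`.
    if now.saturating_duration_since(state.last_progress) > hung_after {
        Verdict::StaHung
    } else {
        Verdict::Healthy
    }
}
```

In this sketch, a genuinely stuck thread during a command is no longer caught by the watchdog, so it has to be bounded by whatever per-command timeout applies.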

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletionSource;
switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes; make RecordingServerStreamWriter
IDisposable; class-level [Trait]; single-source the ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so live tests whose env vars are absent are reported as skipped
rather than passed; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
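The Client.Rust change replaces hard-coded IDs such as `"rust-cli-ping"` with `mxgateway_client::session::next_correlation_id("cli-ping")`. The helper's body is not part of this commit view. The following is a plausible std-only sketch, assuming only that it takes a prefix and returns a `String` that differs on every call; the `prefix-pid-sequence` format is an assumption.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide sequence. `Relaxed` suffices: each `fetch_add` still returns a
/// distinct value, and no other memory is synchronised through this counter.
static NEXT_SEQUENCE: AtomicU64 = AtomicU64::new(1);

/// Hypothetical sketch: `<prefix>-<pid>-<sequence>`, unique per call within a
/// process and distinct across processes running at the same time.
pub fn next_correlation_id(prefix: &str) -> String {
    let sequence = NEXT_SEQUENCE.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}-{}-{sequence}", std::process::id())
}
```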

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -447,7 +447,9 @@ async fn run(cli: Cli) -> Result<(), Error> {
             let client = connect(connection).await?;
             let reply = client
                 .invoke(MxCommandRequest {
-                    client_correlation_id: "rust-cli-ping".to_owned(),
+                    client_correlation_id: mxgateway_client::session::next_correlation_id(
+                        "cli-ping",
+                    ),
                     command: Some(MxCommand {
                         kind: MxCommandKind::Ping as i32,
                         payload: Some(mxgateway_client::generated::mxaccess_gateway::v1::mx_command::Payload::Ping(
@@ -494,7 +496,9 @@ async fn run(cli: Cli) -> Result<(), Error> {
             let reply = client
                 .close_session_raw(CloseSessionRequest {
                     session_id,
-                    client_correlation_id: "rust-cli-close-session".to_owned(),
+                    client_correlation_id: mxgateway_client::session::next_correlation_id(
+                        "cli-close-session",
+                    ),
                 })
                 .await?;
             if json {
@@ -1034,19 +1038,13 @@ async fn run_bench_read_bulk(
             .map(|r| r.item_handle)
             .collect();
 
-        let warmup_deadline = std::time::Instant::now()
-            + std::time::Duration::from_secs(warmup_seconds);
+        let warmup_deadline =
+            std::time::Instant::now() + std::time::Duration::from_secs(warmup_seconds);
         while std::time::Instant::now() < warmup_deadline {
-            let _ = session
-                .read_bulk(server_handle, &tags, timeout_ms)
-                .await;
+            let _ = session.read_bulk(server_handle, &tags, timeout_ms).await;
         }
 
-        let mut latencies_ms: Vec<f64> = Vec::with_capacity(65_536);
-        let mut total_read_results: u64 = 0;
-        let mut cached_read_results: u64 = 0;
-        let mut successful_calls: u64 = 0;
-        let mut failed_calls: u64 = 0;
+        let mut stats = BenchReadBulkStats::default();
         let steady_start = std::time::Instant::now();
         let steady_deadline = steady_start + std::time::Duration::from_secs(duration_seconds);
 
@@ -1054,18 +1052,9 @@ async fn run_bench_read_bulk(
             let call_start = std::time::Instant::now();
             let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
             let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
-            latencies_ms.push(elapsed_ms);
             match outcome {
-                Ok(results) => {
-                    successful_calls += 1;
-                    for r in &results {
-                        total_read_results += 1;
-                        if r.was_cached {
-                            cached_read_results += 1;
-                        }
-                    }
-                }
-                Err(_) => failed_calls += 1,
+                Ok(results) => stats.record_success(elapsed_ms, &results),
+                Err(error) => stats.record_failure(elapsed_ms, &error),
             }
         }
         let steady_elapsed = steady_start.elapsed();
@@ -1074,36 +1063,20 @@ async fn run_bench_read_bulk(
             let _ = session.unsubscribe_bulk(server_handle, item_handles).await;
         }
 
-        let total_calls = successful_calls + failed_calls;
-        let calls_per_second = if steady_elapsed.as_secs_f64() > 0.0 {
-            total_calls as f64 / steady_elapsed.as_secs_f64()
-        } else {
-            0.0
+        let context = BenchReadBulkContext {
+            endpoint: &endpoint,
+            client_name: &client_name,
+            bulk_size,
+            duration_seconds,
+            warmup_seconds,
+            steady_elapsed,
+            tags: &tags,
         };
-
-        let summary = percentile_summary(&latencies_ms);
-        let stats = serde_json::json!({
-            "language": "rust",
-            "command": "bench-read-bulk",
-            "endpoint": endpoint,
-            "clientName": client_name,
-            "bulkSize": bulk_size,
-            "durationSeconds": duration_seconds,
-            "warmupSeconds": warmup_seconds,
-            "durationMs": steady_elapsed.as_millis() as u64,
-            "tags": tags,
-            "totalCalls": total_calls,
-            "successfulCalls": successful_calls,
-            "failedCalls": failed_calls,
-            "totalReadResults": total_read_results,
-            "cachedReadResults": cached_read_results,
-            "callsPerSecond": round_to(calls_per_second, 2),
-            "latencyMs": summary,
-        });
+        let json_stats = stats.to_json(&context);
         if use_json {
-            println!("{}", stats);
+            println!("{}", json_stats);
         } else {
-            println!("{calls_per_second}");
+            println!("{}", stats.calls_per_second(steady_elapsed));
         }
         Ok::<(), Error>(())
     }
@@ -1113,6 +1086,102 @@ async fn run_bench_read_bulk(
     bench_outcome
 }
 
+/// Per-iteration accounting for `bench-read-bulk`.
+///
+/// Only successful `read_bulk` calls contribute to the success-latency
+/// histogram (`success_latencies_ms`). Failures are tracked separately in
+/// `failure_latencies_ms` and the first failure's redacted error string is
+/// stashed in `first_failure` so a partial-failure run is visible in the
+/// emitted JSON. This keeps the cross-language `latencyMs.p99`/`max`
+/// contract honest: it reports successful-call latency only and never
+/// folds in a per-call timeout from a failed RPC.
+#[derive(Default)]
+struct BenchReadBulkStats {
+    success_latencies_ms: Vec<f64>,
+    failure_latencies_ms: Vec<f64>,
+    total_read_results: u64,
+    cached_read_results: u64,
+    successful_calls: u64,
+    failed_calls: u64,
+    first_failure: Option<String>,
+}
+
+impl BenchReadBulkStats {
+    fn record_success(
+        &mut self,
+        elapsed_ms: f64,
+        results: &[mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult],
+    ) {
+        self.success_latencies_ms.push(elapsed_ms);
+        self.successful_calls += 1;
+        for result in results {
+            self.total_read_results += 1;
+            if result.was_cached {
+                self.cached_read_results += 1;
+            }
+        }
+    }
+
+    fn record_failure(&mut self, elapsed_ms: f64, error: &Error) {
+        self.failure_latencies_ms.push(elapsed_ms);
+        self.failed_calls += 1;
+        if self.first_failure.is_none() {
+            self.first_failure = Some(error.to_string());
+        }
+    }
+
+    fn total_calls(&self) -> u64 {
+        self.successful_calls + self.failed_calls
+    }
+
+    fn calls_per_second(&self, elapsed: std::time::Duration) -> f64 {
+        let seconds = elapsed.as_secs_f64();
+        if seconds > 0.0 {
+            self.total_calls() as f64 / seconds
+        } else {
+            0.0
+        }
+    }
+
+    fn to_json(&self, context: &BenchReadBulkContext<'_>) -> serde_json::Value {
+        let calls_per_second = self.calls_per_second(context.steady_elapsed);
+        let success_summary = percentile_summary(&self.success_latencies_ms);
+        let failure_summary = percentile_summary(&self.failure_latencies_ms);
+        serde_json::json!({
+            "language": "rust",
+            "command": "bench-read-bulk",
+            "endpoint": context.endpoint,
+            "clientName": context.client_name,
+            "bulkSize": context.bulk_size,
+            "durationSeconds": context.duration_seconds,
+            "warmupSeconds": context.warmup_seconds,
+            "durationMs": context.steady_elapsed.as_millis() as u64,
+            "tags": context.tags,
+            "totalCalls": self.total_calls(),
+            "successfulCalls": self.successful_calls,
+            "failedCalls": self.failed_calls,
+            "totalReadResults": self.total_read_results,
+            "cachedReadResults": self.cached_read_results,
+            "callsPerSecond": round_to(calls_per_second, 2),
+            "latencyMs": success_summary,
+            "failureLatencyMs": failure_summary,
+            "firstFailure": self.first_failure,
+        })
+    }
+}
+
+/// Static configuration for one `bench-read-bulk` run, packaged so the
+/// JSON serialiser can quote it back without taking eight positional args.
+struct BenchReadBulkContext<'a> {
+    endpoint: &'a str,
+    client_name: &'a str,
+    bulk_size: usize,
+    duration_seconds: u64,
+    warmup_seconds: u64,
+    steady_elapsed: std::time::Duration,
+    tags: &'a [String],
+}
+
 fn percentile_summary(sample: &[f64]) -> serde_json::Value {
     if sample.is_empty() {
         return serde_json::json!({ "p50": 0.0, "p95": 0.0, "p99": 0.0, "max": 0.0, "mean": 0.0 });
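The hunk above cuts off `percentile_summary` after its empty-sample branch, so the real percentile method is not visible. A std-only sketch that produces the same five keys is shown below. It assumes nearest-rank percentiles and returns a plain struct instead of `serde_json::Value`; both are assumptions, and `LatencySummary` and `percentile_summary_sketch` are hypothetical names.

```rust
/// Summary statistics in milliseconds, mirroring the keys of the empty-case
/// JSON object (p50, p95, p99, max, mean).
#[derive(Debug, PartialEq)]
pub struct LatencySummary {
    pub p50: f64,
    pub p95: f64,
    pub p99: f64,
    pub max: f64,
    pub mean: f64,
}

/// Nearest-rank percentile over a sorted, non-empty slice:
/// rank = ceil(p / 100 * n), clamped to 1..=n.
fn nearest_rank(sorted: &[f64], p: f64) -> f64 {
    let n = sorted.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    sorted[rank.clamp(1, n) - 1]
}

pub fn percentile_summary_sketch(sample: &[f64]) -> LatencySummary {
    if sample.is_empty() {
        return LatencySummary { p50: 0.0, p95: 0.0, p99: 0.0, max: 0.0, mean: 0.0 };
    }
    let mut sorted = sample.to_vec();
    // `total_cmp` is a total order on f64, so a stray NaN cannot panic the sort.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mean = sorted.iter().sum::<f64>() / sorted.len() as f64;
    LatencySummary {
        p50: nearest_rank(&sorted, 50.0),
        p95: nearest_rank(&sorted, 95.0),
        p99: nearest_rank(&sorted, 99.0),
        max: sorted[sorted.len() - 1],
        mean,
    }
}
```

Given the success samples from the new test (1.5 ms and 2.0 ms), this sketch reports p99 = max = 2.0, and the 1,500 ms failure never enters that histogram.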
@@ -1294,7 +1363,13 @@ fn build_write_bulk_entries(
     item_handles: &[i32],
     value_type: CliValueType,
     values: &[String],
-) -> Result<Vec<(i32, mxgateway_client::generated::mxaccess_gateway::v1::MxValue)>, Error> {
+) -> Result<
+    Vec<(
+        i32,
+        mxgateway_client::generated::mxaccess_gateway::v1::MxValue,
+    )>,
+    Error,
+> {
     if item_handles.len() != values.len() {
         return Err(Error::InvalidArgument {
             name: "values".to_owned(),
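rustfmt has to explode the `build_write_bulk_entries` return type because the fully qualified `MxValue` path is long. A type alias is one common alternative. The sketch below pairs that alias with the length check visible in the hunk. It is a hypothetical, self-contained stand-in: `MxValue` is a placeholder newtype, `BuildError` replaces the crate's `Error` (its `detail` field is an assumption), and the `CliValueType` conversion step is omitted.

```rust
/// Placeholder for the generated `mxaccess_gateway::v1::MxValue` message.
#[derive(Debug, Clone, PartialEq)]
pub struct MxValue(pub String);

/// Naming the pair list keeps the function signature short under rustfmt.
pub type WriteBulkEntries = Vec<(i32, MxValue)>;

/// Stand-in for the crate's `Error::InvalidArgument`; `detail` is assumed.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    InvalidArgument { name: String, detail: String },
}

/// Pairs each item handle with its value. Rejecting mismatched lengths up
/// front means `zip` can never silently drop trailing entries.
pub fn build_write_bulk_entries_sketch(
    item_handles: &[i32],
    values: &[String],
) -> Result<WriteBulkEntries, BuildError> {
    if item_handles.len() != values.len() {
        return Err(BuildError::InvalidArgument {
            name: "values".to_owned(),
            detail: format!(
                "expected {} values to match the item handles, got {}",
                item_handles.len(),
                values.len()
            ),
        });
    }
    // The real function converts each string according to `value_type`;
    // this sketch simply wraps the raw string.
    Ok(item_handles
        .iter()
        .copied()
        .zip(values.iter().map(|v| MxValue(v.clone())))
        .collect())
}
```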
@@ -1660,4 +1735,77 @@ mod tests {
         assert_eq!(frac.seconds, utc.seconds);
         assert_eq!(frac.nanos, 250_000_000);
     }
+
+    #[test]
+    fn bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram() {
+        use mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult;
+        use mxgateway_client::Error;
+
+        let mut stats = super::BenchReadBulkStats::default();
+        let cached = BulkReadResult {
+            was_cached: true,
+            was_successful: true,
+            ..BulkReadResult::default()
+        };
+        let uncached = BulkReadResult {
+            was_cached: false,
+            was_successful: true,
+            ..BulkReadResult::default()
+        };
+
+        // Two fast successes and one slow failure: the slow failure must
+        // not pollute the success p99/max histogram.
+        stats.record_success(1.5, std::slice::from_ref(&cached));
+        stats.record_success(2.0, std::slice::from_ref(&uncached));
+        let failure = Error::MalformedReply {
+            detail: "synthetic failure for the bench test".to_owned(),
+        };
+        stats.record_failure(1_500.0, &failure);
+
+        assert_eq!(stats.success_latencies_ms, vec![1.5, 2.0]);
+        assert_eq!(stats.failure_latencies_ms, vec![1_500.0]);
+        assert_eq!(stats.successful_calls, 2);
+        assert_eq!(stats.failed_calls, 1);
+        assert_eq!(stats.total_calls(), 3);
+        assert_eq!(stats.total_read_results, 2);
+        assert_eq!(stats.cached_read_results, 1);
+        assert!(stats
+            .first_failure
+            .as_deref()
+            .unwrap()
+            .contains("synthetic failure"));
+
+        let elapsed = std::time::Duration::from_secs(1);
+        let context = super::BenchReadBulkContext {
+            endpoint: "http://fake",
+            client_name: "client",
+            bulk_size: 2,
+            duration_seconds: 1,
+            warmup_seconds: 0,
+            steady_elapsed: elapsed,
+            tags: &[],
+        };
+        let payload = stats.to_json(&context);
+        // The success-latency histogram must never see the 1_500 ms failure.
+        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 2.0);
+        assert!(payload["latencyMs"]["p99"].as_f64().unwrap() <= 2.0);
+        // The failure-latency histogram must own it instead.
+        assert_eq!(
+            payload["failureLatencyMs"]["max"].as_f64().unwrap(),
+            1_500.0
+        );
+        assert_eq!(payload["failedCalls"].as_u64().unwrap(), 1);
+        assert_eq!(payload["successfulCalls"].as_u64().unwrap(), 2);
+        assert!(payload["firstFailure"]
+            .as_str()
+            .unwrap()
+            .contains("synthetic failure"));
+    }
+
+    #[test]
+    fn bench_read_bulk_stats_calls_per_second_handles_zero_duration() {
+        let stats = super::BenchReadBulkStats::default();
+
+        assert_eq!(stats.calls_per_second(std::time::Duration::ZERO), 0.0);
+    }
 }