Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.

Medium (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
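
The Worker-023 ceiling can be illustrated with a minimal Rust sketch (the worker itself is not written in Rust). Everything here is an assumption for illustration: the `evaluate` function, the `WatchdogVerdict` enum, and the rule that a late heartbeat with no command in flight counts as hung. Only the `StaHung` name and the 75s = 5x grace default come from the text above.

```rust
use std::time::Duration;

/// Hypothetical verdict type; only the `StaHung` name appears in the commit message.
#[derive(Debug, PartialEq, Eq)]
enum WatchdogVerdict {
    Healthy,
    StaHung,
}

/// Decide whether the watchdog should fire (illustrative logic, not the worker's code).
///
/// `in_flight_for` is how long the current command has been running, if any.
fn evaluate(
    since_last_heartbeat: Duration,
    in_flight_for: Option<Duration>,
    grace: Duration,
    stuck_ceiling: Duration,
) -> WatchdogVerdict {
    // Heartbeat arrived recently enough: nothing to decide.
    if since_last_heartbeat <= grace {
        return WatchdogVerdict::Healthy;
    }
    match in_flight_for {
        // A command is in flight and still under the ceiling: suppress the watchdog.
        Some(running) if running < stuck_ceiling => WatchdogVerdict::Healthy,
        // No command in flight, or the command outlived the ceiling: report a hang.
        _ => WatchdogVerdict::StaHung,
    }
}
```

With a 15s grace and a 75s ceiling, a command that has been running for 30s keeps the watchdog quiet, while one that has been running for 80s no longer does.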

Low (41, highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
scope-resolver and state-machine tests filled in (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
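
The two-window stability check from IntegrationTests-017 can be sketched in Rust as follows. This is one possible reading, not the test's actual code: the function name, the closure-based sampler, and the choice of two equal-length windows are all assumptions. The idea is to let in-flight events drain during the first window, then require that the event count does not move during the second, instead of comparing against the pre-UnAdvise count.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: returns true when the sampled counter is unchanged
/// across the second observation window.
fn settles_within_two_windows(mut sample: impl FnMut() -> u64, window: Duration) -> bool {
    // Window 1: give events that were already in flight time to arrive.
    sleep(window);
    let after_first = sample();

    // Window 2: once the subscription is gone, no further events may arrive.
    sleep(window);
    let after_second = sample();

    after_first == after_second
}
```

A late event that arrives during the first window no longer fails the test; only growth during the second window does.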

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -447,9 +447,7 @@ async fn run(cli: Cli) -> Result<(), Error> {
     let client = connect(connection).await?;
     let reply = client
         .invoke(MxCommandRequest {
-            client_correlation_id: mxgateway_client::session::next_correlation_id(
-                "cli-ping",
-            ),
+            client_correlation_id: mxgateway_client::next_correlation_id("cli-ping"),
             command: Some(MxCommand {
                 kind: MxCommandKind::Ping as i32,
                 payload: Some(mxgateway_client::generated::mxaccess_gateway::v1::mx_command::Payload::Ping(
@@ -496,7 +494,7 @@ async fn run(cli: Cli) -> Result<(), Error> {
     let reply = client
         .close_session_raw(CloseSessionRequest {
             session_id,
-            client_correlation_id: mxgateway_client::session::next_correlation_id(
+            client_correlation_id: mxgateway_client::next_correlation_id(
                 "cli-close-session",
             ),
         })
@@ -1088,16 +1086,17 @@ async fn run_bench_read_bulk(
 
 /// Per-iteration accounting for `bench-read-bulk`.
 ///
-/// Only successful `read_bulk` calls contribute to the success-latency
-/// histogram (`success_latencies_ms`). Failures are tracked separately in
-/// `failure_latencies_ms` and the first failure's redacted error string is
-/// stashed in `first_failure` so a partial-failure run is visible in the
-/// emitted JSON. This keeps the cross-language `latencyMs.p99`/`max`
-/// contract honest: it reports successful-call latency only and never
-/// folds in a per-call timeout from a failed RPC.
+/// Every `read_bulk` call's elapsed time contributes to the all-calls
+/// histogram (`latencies_ms`), matching the .NET/Go/Python/Java bench
+/// implementations whose `latencyMs` field is the cross-language comparison
+/// contract collated by `scripts/bench-read-bulk.ps1`. Failures additionally
+/// land in `failure_latencies_ms` and the first failure's redacted error
+/// string is stashed in `first_failure`, both surfaced through the JSON as
+/// Rust-only enrichment so a partial-failure run is still visible at the
+/// report layer without breaking the side-by-side comparison.
 #[derive(Default)]
 struct BenchReadBulkStats {
-    success_latencies_ms: Vec<f64>,
+    latencies_ms: Vec<f64>,
     failure_latencies_ms: Vec<f64>,
     total_read_results: u64,
     cached_read_results: u64,
@@ -1112,7 +1111,7 @@ impl BenchReadBulkStats {
         elapsed_ms: f64,
         results: &[mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult],
     ) {
-        self.success_latencies_ms.push(elapsed_ms);
+        self.latencies_ms.push(elapsed_ms);
         self.successful_calls += 1;
         for result in results {
             self.total_read_results += 1;
@@ -1123,6 +1122,7 @@ impl BenchReadBulkStats {
     }
 
     fn record_failure(&mut self, elapsed_ms: f64, error: &Error) {
+        self.latencies_ms.push(elapsed_ms);
         self.failure_latencies_ms.push(elapsed_ms);
         self.failed_calls += 1;
         if self.first_failure.is_none() {
@@ -1145,7 +1145,7 @@ impl BenchReadBulkStats {
 
     fn to_json(&self, context: &BenchReadBulkContext<'_>) -> serde_json::Value {
         let calls_per_second = self.calls_per_second(context.steady_elapsed);
-        let success_summary = percentile_summary(&self.success_latencies_ms);
+        let latency_summary = percentile_summary(&self.latencies_ms);
         let failure_summary = percentile_summary(&self.failure_latencies_ms);
         serde_json::json!({
             "language": "rust",
@@ -1163,7 +1163,7 @@ impl BenchReadBulkStats {
             "totalReadResults": self.total_read_results,
             "cachedReadResults": self.cached_read_results,
             "callsPerSecond": round_to(calls_per_second, 2),
-            "latencyMs": success_summary,
+            "latencyMs": latency_summary,
             "failureLatencyMs": failure_summary,
             "firstFailure": self.first_failure,
         })
@@ -1737,7 +1737,7 @@ mod tests {
     }
 
     #[test]
-    fn bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram() {
+    fn bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately() {
         use mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult;
         use mxgateway_client::Error;
 
@@ -1753,8 +1753,10 @@ mod tests {
             ..BulkReadResult::default()
         };
 
-        // Two fast successes and one slow failure: the slow failure must
-        // not pollute the success p99/max histogram.
+        // Two fast successes and one slow failure: every call lands in the
+        // all-calls histogram (the cross-language `latencyMs` contract) and
+        // the failure additionally surfaces through `failureLatencyMs` plus
+        // `firstFailure` as Rust-only enrichment.
         stats.record_success(1.5, std::slice::from_ref(&cached));
         stats.record_success(2.0, std::slice::from_ref(&uncached));
         let failure = Error::MalformedReply {
@@ -1762,7 +1764,7 @@ mod tests {
         };
         stats.record_failure(1_500.0, &failure);
 
-        assert_eq!(stats.success_latencies_ms, vec![1.5, 2.0]);
+        assert_eq!(stats.latencies_ms, vec![1.5, 2.0, 1_500.0]);
         assert_eq!(stats.failure_latencies_ms, vec![1_500.0]);
         assert_eq!(stats.successful_calls, 2);
         assert_eq!(stats.failed_calls, 1);
@@ -1786,10 +1788,12 @@ mod tests {
             tags: &[],
         };
         let payload = stats.to_json(&context);
-        // The success-latency histogram must never see the 1_500 ms failure.
-        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 2.0);
-        assert!(payload["latencyMs"]["p99"].as_f64().unwrap() <= 2.0);
-        // The failure-latency histogram must own it instead.
+        // The all-calls histogram (cross-language `latencyMs` contract)
+        // includes the failure latency so the side-by-side comparison with
+        // .NET/Go/Python/Java stays apples-to-apples.
+        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 1_500.0);
+        // The Rust-only `failureLatencyMs` enrichment surfaces failures
+        // separately for partial-failure diagnostics.
         assert_eq!(
             payload["failureLatencyMs"]["max"].as_f64().unwrap(),
             1_500.0
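
The accounting this diff restores can be seen in isolation with a small self-contained sketch. It is not the CLI's code: `Stats` is a trimmed, hypothetical stand-in for `BenchReadBulkStats`, and `nearest_rank` is an assumed nearest-rank percentile used in place of `percentile_summary`, whose actual method is not shown in the diff.

```rust
/// Trimmed, hypothetical stand-in for `BenchReadBulkStats`.
#[derive(Default)]
struct Stats {
    latencies_ms: Vec<f64>,         // every call, success or failure
    failure_latencies_ms: Vec<f64>, // failures only (Rust-only enrichment)
    successful_calls: u64,
    failed_calls: u64,
}

impl Stats {
    fn record_success(&mut self, elapsed_ms: f64) {
        self.latencies_ms.push(elapsed_ms);
        self.successful_calls += 1;
    }

    fn record_failure(&mut self, elapsed_ms: f64) {
        // A failure lands in both histograms.
        self.latencies_ms.push(elapsed_ms);
        self.failure_latencies_ms.push(elapsed_ms);
        self.failed_calls += 1;
    }
}

/// Nearest-rank percentile over a copy of the samples (assumed method).
/// Returns `None` for an empty slice.
fn nearest_rank(samples: &[f64], pct: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is 1-based: ceil(pct/100 * n), clamped into [1, n].
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

With two fast successes and one slow failure, the slow call dominates the upper percentiles of the all-calls histogram, which is the trade-off the commit accepts to keep `latencyMs` comparable across languages.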