Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
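
The Worker-023 bound can be pictured as a pure decision function. The following is a minimal Rust sketch of that idea only, not the worker's actual code: `watchdog_verdict`, the `Verdict` enum, and the 15s grace value (inferred from "75s = 5x HeartbeatGrace") are assumptions made for illustration.

```rust
use std::time::Duration;

/// Outcome of one watchdog tick (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Verdict {
    Healthy,
    StaHung,
}

/// Hypothetical decision rule: an in-flight command suppresses the normal
/// grace period, but only up to `stuck_ceiling`.
fn watchdog_verdict(
    since_last_heartbeat: Duration,
    command_in_flight: bool,
    grace: Duration,
    stuck_ceiling: Duration,
) -> Verdict {
    // Pick the limit that applies to the current state.
    let limit = if command_in_flight { stuck_ceiling } else { grace };
    if since_last_heartbeat > limit {
        Verdict::StaHung
    } else {
        Verdict::Healthy
    }
}

fn main() {
    let grace = Duration::from_secs(15); // assumed HeartbeatGrace
    let ceiling = grace * 5; // 75s, the default named above

    // A slow but live command is tolerated past the grace period...
    let slow = watchdog_verdict(Duration::from_secs(40), true, grace, ceiling);
    // ...but a truly stuck call still trips the watchdog.
    let stuck = watchdog_verdict(Duration::from_secs(80), true, grace, ceiling);
    println!("slow = {:?}, stuck = {:?}", slow, stuck);
}
```

Without the ceiling, the in-flight branch would have no limit at all, which is the "permanently defeating the watchdog" case the finding describes.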
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
scope-resolver and state-machine tests added (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -93,23 +93,38 @@ impl Session {
     pub async fn subscribe_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
     pub async fn unsubscribe_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
     pub async fn write(&self, server_handle: i32, item_handle: i32, value: MxValue, user_id: i32) -> Result<(), Error>;
-    pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
-    pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
-    pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
-    pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
-    pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;
+    pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;
     pub async fn events(&self) -> Result<impl Stream<Item = Result<MxEvent, Error>>, Error>;
     pub async fn close(&self) -> Result<(), Error>;
 }
 ```
 
-The five bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`,
+The four bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`,
 `write_secured2_bulk`) and `read_bulk` mirror the worker's bulk command shapes
 in `mxaccess_gateway.proto` and use the same correlation-id discipline as the
-unary helpers — `session::next_correlation_id` is `pub` so that consumers
-constructing raw `MxCommandRequest`/`CloseSessionRequest` payloads outside
-the `Session` helpers (notably the `mxgw` test CLI's `ping` and
-`close-session` subcommands) share the same id generation.
+unary helpers — `next_correlation_id` is part of the public SDK surface,
+re-exported at the crate root (`mxgateway_client::next_correlation_id`), so
+that consumers constructing raw `MxCommandRequest`/`CloseSessionRequest`
+payloads outside the `Session` helpers (notably the `mxgw` test CLI's `ping`
+and `close-session` subcommands) share the same id generation. The returned
+id is documented as an opaque token with three guaranteed properties
+(embeds the caller's label, unique within a process, carries no secret);
+its textual format is intentionally *not* part of the contract.
+
+The per-entry fields that the matching MXAccess COM calls accept once per
+batch — `user_id` (`WriteBulkEntry`/`Write2BulkEntry`), `timestamp_value`
+(`Write2BulkEntry`/`WriteSecured2BulkEntry`), and `current_user_id` /
+`verifier_user_id` (`WriteSecuredBulkEntry`/`WriteSecured2BulkEntry`) — live
+on the entry structs themselves rather than as trailing positional arguments
+on the helper, matching the protobuf shapes in
+`mxaccess_gateway.proto` (`WriteBulkCommand` / `Write2BulkCommand` /
+`WriteSecuredBulkCommand` / `WriteSecured2BulkCommand`). `read_bulk` is
+generic over `AsRef<str>` so callers can pass `&[String]` or `&[&str]`
+without cloning at the call site.
 
 ## Authentication
 
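The `AsRef<str>` bound on `read_bulk` described in the hunk above can be tried in isolation. This is a minimal sketch assuming nothing about the real `Session` internals: `collect_tag_addresses` is a hypothetical stand-in for whatever conversion the helper performs before building its request, and the tag names are made up.

```rust
/// Hypothetical stand-in for the conversion a generic `read_bulk` might do:
/// borrow every tag as `&str` and take ownership exactly once.
fn collect_tag_addresses<S: AsRef<str>>(tag_addresses: &[S]) -> Vec<String> {
    tag_addresses
        .iter()
        .map(|tag| tag.as_ref().to_owned())
        .collect()
}

fn main() {
    // Illustrative tag names only; they do not come from the source.
    let owned: Vec<String> = vec!["Tank1.Level".to_string(), "Tank1.Temp".to_string()];
    let borrowed = ["Tank1.Level", "Tank1.Temp"];

    // Both call sites compile against the same signature, and neither has to
    // build an intermediate Vec<String> just to satisfy the parameter type.
    let from_owned = collect_tag_addresses(&owned);
    let from_borrowed = collect_tag_addresses(&borrowed);
    println!("{}", from_owned == from_borrowed);
}
```

With the earlier `&[String]` signature, the second call site would have needed to allocate a `Vec<String>` first.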
@@ -447,9 +447,7 @@ async fn run(cli: Cli) -> Result<(), Error> {
             let client = connect(connection).await?;
             let reply = client
                 .invoke(MxCommandRequest {
-                    client_correlation_id: mxgateway_client::session::next_correlation_id(
-                        "cli-ping",
-                    ),
+                    client_correlation_id: mxgateway_client::next_correlation_id("cli-ping"),
                     command: Some(MxCommand {
                         kind: MxCommandKind::Ping as i32,
                         payload: Some(mxgateway_client::generated::mxaccess_gateway::v1::mx_command::Payload::Ping(
@@ -496,7 +494,7 @@ async fn run(cli: Cli) -> Result<(), Error> {
             let reply = client
                 .close_session_raw(CloseSessionRequest {
                     session_id,
-                    client_correlation_id: mxgateway_client::session::next_correlation_id(
+                    client_correlation_id: mxgateway_client::next_correlation_id(
                         "cli-close-session",
                     ),
                 })
@@ -1088,16 +1086,17 @@ async fn run_bench_read_bulk(
 
 /// Per-iteration accounting for `bench-read-bulk`.
 ///
-/// Only successful `read_bulk` calls contribute to the success-latency
-/// histogram (`success_latencies_ms`). Failures are tracked separately in
-/// `failure_latencies_ms` and the first failure's redacted error string is
-/// stashed in `first_failure` so a partial-failure run is visible in the
-/// emitted JSON. This keeps the cross-language `latencyMs.p99`/`max`
-/// contract honest: it reports successful-call latency only and never
-/// folds in a per-call timeout from a failed RPC.
+/// Every `read_bulk` call's elapsed time contributes to the all-calls
+/// histogram (`latencies_ms`), matching the .NET/Go/Python/Java bench
+/// implementations whose `latencyMs` field is the cross-language comparison
+/// contract collated by `scripts/bench-read-bulk.ps1`. Failures additionally
+/// land in `failure_latencies_ms` and the first failure's redacted error
+/// string is stashed in `first_failure`, both surfaced through the JSON as
+/// Rust-only enrichment so a partial-failure run is still visible at the
+/// report layer without breaking the side-by-side comparison.
 #[derive(Default)]
 struct BenchReadBulkStats {
-    success_latencies_ms: Vec<f64>,
+    latencies_ms: Vec<f64>,
     failure_latencies_ms: Vec<f64>,
     total_read_results: u64,
     cached_read_results: u64,
@@ -1112,7 +1111,7 @@ impl BenchReadBulkStats {
         elapsed_ms: f64,
         results: &[mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult],
     ) {
-        self.success_latencies_ms.push(elapsed_ms);
+        self.latencies_ms.push(elapsed_ms);
         self.successful_calls += 1;
         for result in results {
             self.total_read_results += 1;
@@ -1123,6 +1122,7 @@ impl BenchReadBulkStats {
     }
 
     fn record_failure(&mut self, elapsed_ms: f64, error: &Error) {
+        self.latencies_ms.push(elapsed_ms);
         self.failure_latencies_ms.push(elapsed_ms);
         self.failed_calls += 1;
         if self.first_failure.is_none() {
@@ -1145,7 +1145,7 @@ impl BenchReadBulkStats {
 
     fn to_json(&self, context: &BenchReadBulkContext<'_>) -> serde_json::Value {
         let calls_per_second = self.calls_per_second(context.steady_elapsed);
-        let success_summary = percentile_summary(&self.success_latencies_ms);
+        let latency_summary = percentile_summary(&self.latencies_ms);
         let failure_summary = percentile_summary(&self.failure_latencies_ms);
         serde_json::json!({
             "language": "rust",
@@ -1163,7 +1163,7 @@ impl BenchReadBulkStats {
             "totalReadResults": self.total_read_results,
             "cachedReadResults": self.cached_read_results,
             "callsPerSecond": round_to(calls_per_second, 2),
-            "latencyMs": success_summary,
+            "latencyMs": latency_summary,
             "failureLatencyMs": failure_summary,
             "firstFailure": self.first_failure,
         })
@@ -1737,7 +1737,7 @@ mod tests {
     }
 
     #[test]
-    fn bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram() {
+    fn bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately() {
         use mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult;
         use mxgateway_client::Error;
 
@@ -1753,8 +1753,10 @@ mod tests {
             ..BulkReadResult::default()
         };
 
-        // Two fast successes and one slow failure: the slow failure must
-        // not pollute the success p99/max histogram.
+        // Two fast successes and one slow failure: every call lands in the
+        // all-calls histogram (the cross-language `latencyMs` contract) and
+        // the failure additionally surfaces through `failureLatencyMs` plus
+        // `firstFailure` as Rust-only enrichment.
         stats.record_success(1.5, std::slice::from_ref(&cached));
         stats.record_success(2.0, std::slice::from_ref(&uncached));
         let failure = Error::MalformedReply {
@@ -1762,7 +1764,7 @@ mod tests {
         };
         stats.record_failure(1_500.0, &failure);
 
-        assert_eq!(stats.success_latencies_ms, vec![1.5, 2.0]);
+        assert_eq!(stats.latencies_ms, vec![1.5, 2.0, 1_500.0]);
         assert_eq!(stats.failure_latencies_ms, vec![1_500.0]);
         assert_eq!(stats.successful_calls, 2);
         assert_eq!(stats.failed_calls, 1);
@@ -1786,10 +1788,12 @@ mod tests {
             tags: &[],
         };
         let payload = stats.to_json(&context);
-        // The success-latency histogram must never see the 1_500 ms failure.
-        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 2.0);
-        assert!(payload["latencyMs"]["p99"].as_f64().unwrap() <= 2.0);
-        // The failure-latency histogram must own it instead.
+        // The all-calls histogram (cross-language `latencyMs` contract)
+        // includes the failure latency so the side-by-side comparison with
+        // .NET/Go/Python/Java stays apples-to-apples.
+        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 1_500.0);
+        // The Rust-only `failureLatencyMs` enrichment surfaces failures
+        // separately for partial-failure diagnostics.
         assert_eq!(
             payload["failureLatencyMs"]["max"].as_f64().unwrap(),
             1_500.0
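The accounting change in the hunks above can be reproduced outside the CLI. The sketch below is a simplified model and not the project's `BenchReadBulkStats`: `BenchStatsSketch` and `max_ms` are hypothetical names, and the real `percentile_summary` helper is not reproduced because its implementation is not shown here.

```rust
/// Simplified, hypothetical model of the two histograms discussed above.
#[derive(Default)]
struct BenchStatsSketch {
    /// Every call, success or failure (the cross-language `latencyMs` view).
    latencies_ms: Vec<f64>,
    /// Failures only (the Rust-only `failureLatencyMs` view).
    failure_latencies_ms: Vec<f64>,
}

impl BenchStatsSketch {
    fn record_success(&mut self, elapsed_ms: f64) {
        self.latencies_ms.push(elapsed_ms);
    }

    fn record_failure(&mut self, elapsed_ms: f64) {
        // A failure lands in both histograms.
        self.latencies_ms.push(elapsed_ms);
        self.failure_latencies_ms.push(elapsed_ms);
    }
}

/// Largest sample, or `None` for an empty histogram.
fn max_ms(samples: &[f64]) -> Option<f64> {
    samples.iter().copied().reduce(f64::max)
}

fn main() {
    let mut stats = BenchStatsSketch::default();
    stats.record_success(1.5);
    stats.record_success(2.0);
    stats.record_failure(1_500.0);

    println!("all-calls max: {:?}", max_ms(&stats.latencies_ms));
    println!("failure max:   {:?}", max_ms(&stats.failure_latencies_ms));
}
```

Under this model the slow failure dominates the all-calls maximum, which is why the updated test expects `latencyMs.max` to be 1500 rather than 2.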
@@ -32,7 +32,7 @@ pub use galaxy::{DeployEventStream, GalaxyClient};
 #[doc(inline)]
 pub use options::ClientOptions;
 #[doc(inline)]
-pub use session::Session;
+pub use session::{next_correlation_id, Session};
 #[doc(inline)]
 pub use value::{MxArrayProjection, MxArrayValue, MxStatus, MxValue, MxValueProjection};
 #[doc(inline)]
@@ -37,8 +37,20 @@ static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(0);
 /// Exposed so consumers that construct raw [`MxCommandRequest`] /
 /// [`CloseSessionRequest`] payloads outside the `Session` helpers — notably
 /// the `mxgw` test CLI — share the same correlation-id discipline as the
-/// library. The returned id is `rust-client-{label}-{N}` where `N` comes
-/// from a process-wide atomic sequence.
+/// library. Also re-exported at the crate root as
+/// [`mxgateway_client::next_correlation_id`](crate::next_correlation_id).
+///
+/// The returned id has the following guaranteed properties:
+///
+/// - it embeds the supplied `label` verbatim so log readers can pick out
+///   which call site emitted it;
+/// - it is unique within the lifetime of a single process (driven by an
+///   internal monotonically-increasing atomic sequence);
+/// - it carries no embedded secret or user-supplied payload beyond `label`.
+///
+/// The exact textual format (currently `rust-client-{label}-{N}`) is *not*
+/// part of the public contract and may change between releases — callers
+/// must not parse it. Treat the returned `String` as an opaque token.
 #[must_use]
 pub fn next_correlation_id(label: &str) -> String {
     let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
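The hunk above is cut off after the `fetch_add` call, so the rest of the function body is not visible here. As a rough, standalone illustration of the documented contract (label embedded, unique per process, no secret), one possible shape is sketched below. The `format!` line and the choice of what `N` is are assumptions based only on the "currently `rust-client-{label}-{N}`" note in the doc comment.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide sequence, as in the hunk header above.
static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(0);

/// Sketch only: the real function body is not shown in the diff.
#[must_use]
pub fn next_correlation_id(label: &str) -> String {
    let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
    // Assumed formatting; callers should treat the result as opaque.
    format!("rust-client-{}-{}", label, sequence)
}

fn main() {
    let first = next_correlation_id("cli-ping");
    let second = next_correlation_id("cli-ping");

    // Check only the documented properties, never the exact text.
    assert!(first.contains("cli-ping"));
    assert_ne!(first, second);
    println!("ids differ and embed the label");
}
```

`fetch_add` hands each caller a distinct prior value of the counter even under `Ordering::Relaxed`, which is enough for uniqueness within a process; no ordering relative to other memory operations is implied.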