Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules

Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.
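
The Client.Rust-013 scoping idea can be sketched as follows. This is a minimal illustration, not the crate's actual layout: the module names and the stub function are hypothetical, and a real tonic crate would pull the generated items in (for example via `tonic::include_proto!`) where the stub sits.

```rust
// Hypothetical stand-in for the crate's generated-code module tree.
pub mod generated {
    pub mod gateway_v1 {
        // Inner attribute: applies to this module only, so hand-written
        // code elsewhere in the crate is still checked by the lint.
        #![allow(clippy::doc_lazy_continuation)]

        // A real crate would include the generated items here; a stub
        // keeps the sketch self-contained.

        /// Stand-in for a generated item whose doc comment has a list:
        /// - first bullet
        /// continuation line that the lint would flag as lazy
        pub fn stub_marker() -> &'static str {
            "generated"
        }
    }
}
```

Keeping the allowance on the generated submodule, rather than at the crate root, means a lazy continuation in hand-written docs is still reported.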

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).
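
The Client.Rust-015 item (keeping failed-call durations out of the success-latency histogram) can be illustrated with a small accumulator. This is a sketch under assumptions: `BenchStats` and its methods are hypothetical names, not the types used by bench-read-bulk, and the percentile uses a simple nearest-rank rule.

```rust
use std::time::Duration;

/// Hypothetical bench accumulator; not the actual bench-read-bulk types.
#[derive(Default)]
pub struct BenchStats {
    success_latencies: Vec<Duration>,
    failures: u64,
}

impl BenchStats {
    /// Record one call. Only successful calls contribute a latency sample;
    /// failed calls are counted but their duration is discarded.
    pub fn record<T, E>(&mut self, outcome: &Result<T, E>, elapsed: Duration) {
        match outcome {
            Ok(_) => self.success_latencies.push(elapsed),
            Err(_) => self.failures += 1,
        }
    }

    pub fn success_count(&self) -> usize {
        self.success_latencies.len()
    }

    pub fn failure_count(&self) -> u64 {
        self.failures
    }

    /// Nearest-rank percentile over successful calls only.
    /// Returns `None` when no call succeeded.
    pub fn percentile(&self, p: f64) -> Option<Duration> {
        if self.success_latencies.is_empty() {
            return None;
        }
        let mut sorted = self.success_latencies.clone();
        sorted.sort();
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        let index = rank.clamp(1, sorted.len()) - 1;
        Some(sorted[index])
    }
}
```

With this shape, a call that fails after a long timeout raises the failure count without dragging the reported success percentiles upward.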

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real
  TaskCompletionSource; switch the heartbeat-expires test to
  ManualTimeProvider; add InvariantCulture to the remaining
  DateTimeOffset.Parse sites; document GalaxyFilterInputSafetyTests
  in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.
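
One way a correlation-ID generator like the `next_correlation_id()` mentioned for Client.Rust might look is sketched below. Everything here is an assumption for illustration: the `CorrelationIds` type, the prefix, and the `prefix-N` format are hypothetical, and the real client may produce IDs differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical generator; the real client's type and ID format may differ.
pub struct CorrelationIds {
    prefix: String,
    next: AtomicU64,
}

impl CorrelationIds {
    pub fn new(prefix: impl Into<String>) -> Self {
        Self {
            prefix: prefix.into(),
            next: AtomicU64::new(1),
        }
    }

    /// Returns a fresh ID on every call. The atomic counter makes this
    /// safe to call through a shared reference from several tasks.
    pub fn next_correlation_id(&self) -> String {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        format!("{}-{}", self.prefix, n)
    }
}
```

A CLI subcommand would then ask the generator for an ID per request instead of embedding a fixed string, which keeps concurrent or repeated calls distinguishable in logs.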

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 09:46:47 -04:00
parent 1cd51bbda3
commit a0203503a7
122 changed files with 8723 additions and 757 deletions
+394 -30
@@ -20,7 +20,8 @@ use mxgateway_client::generated::mxaccess_gateway::v1::{
CloseSessionReply, CloseSessionRequest, MxCommandKind, MxCommandReply, MxDataType, MxEvent,
MxEventFamily, MxStatusCategory, MxStatusProxy, MxStatusSource, MxValue, OpenSessionReply,
OpenSessionRequest, ProtocolStatus, ProtocolStatusCode, QueryActiveAlarmsRequest, SessionState,
-StreamEventsRequest, SubscribeResult, WriteBulkEntry,
+StreamEventsRequest, SubscribeResult, Write2BulkEntry, WriteBulkEntry, WriteSecured2BulkEntry,
+WriteSecuredBulkEntry,
};
use mxgateway_client::{
ApiKey, ClientOptions, CommandError, Error, GatewayClient, MxStatus, MxValue as ClientMxValue,
@@ -160,7 +161,10 @@ async fn read_bulk_forwards_timeout_and_unpacks_cached_flag() {
let entry = &results[0];
assert!(entry.was_cached);
-assert_eq!(entry.value.as_ref().and_then(|v| v.kind.as_ref()), Some(&Kind::Int32Value(99)));
+assert_eq!(
+entry.value.as_ref().and_then(|v| v.kind.as_ref()),
+Some(&Kind::Int32Value(99))
+);
assert_eq!(*state.last_read_bulk_timeout_ms.lock().await, Some(750));
}
@@ -393,6 +397,238 @@ async fn connect_with_unreadable_ca_file_reports_invalid_endpoint() {
);
}
#[tokio::test]
async fn register_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.register("client-name").await.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("Register")),
"expected MalformedReply for register, got {error:?}"
);
}
#[tokio::test]
async fn add_item_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("AddItem")),
"expected MalformedReply for add_item, got {error:?}"
);
}
#[tokio::test]
async fn add_item2_returns_malformed_reply_when_ok_reply_has_no_payload() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.add_item2(12, "Plant.Area.Tag", "ctx")
.await
.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("AddItem2")),
"expected MalformedReply for add_item2, got {error:?}"
);
}
#[tokio::test]
async fn subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForBulk);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.subscribe_bulk(12, vec!["Tank01.Level".to_owned()])
.await
.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("bulk")),
"expected MalformedReply for subscribe_bulk, got {error:?}"
);
}
#[tokio::test]
async fn write_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForBulkWrite);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.write_bulk(
12,
vec![WriteBulkEntry {
item_handle: 901,
value: Some(int_value(11)),
user_id: 5,
}],
)
.await
.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("bulk write")),
"expected MalformedReply for write_bulk, got {error:?}"
);
}
#[tokio::test]
async fn read_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForReadBulk);
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session
.read_bulk(12, &["Tank01.Level"], 500)
.await
.unwrap_err();
assert!(
matches!(&error, Error::MalformedReply { detail } if detail.contains("ReadBulk")),
"expected MalformedReply for read_bulk, got {error:?}"
);
}
#[tokio::test]
async fn unary_invoke_maps_status_unavailable_to_error_unavailable() {
let state = Arc::new(FakeState::default());
*state.invoke_override.lock().await =
Some(InvokeOverride::Unavailable("gateway restarting".to_owned()));
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
assert!(
matches!(&error, Error::Unavailable { .. }),
"expected Error::Unavailable for unary unavailable, got {error:?}"
);
}
#[tokio::test]
async fn write2_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write2_bulk(
12,
vec![Write2BulkEntry {
item_handle: 901,
value: Some(int_value(11)),
timestamp_value: Some(int_value(0)),
user_id: 5,
}],
)
.await
.unwrap();
assert_eq!(results.len(), 2);
assert!(results[0].was_successful);
assert!(!results[1].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::Write2Bulk as i32));
}
#[tokio::test]
async fn write_secured_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write_secured_bulk(
12,
vec![WriteSecuredBulkEntry {
item_handle: 901,
current_user_id: 7,
verifier_user_id: 9,
value: Some(int_value(11)),
}],
)
.await
.unwrap();
assert_eq!(results.len(), 2);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::WriteSecuredBulk as i32));
}
#[tokio::test]
async fn write_secured2_bulk_round_trips_through_the_fake_gateway() {
let state = Arc::new(FakeState::default());
let endpoint = spawn_fake_gateway(state.clone()).await;
let client = GatewayClient::connect(ClientOptions::new(endpoint))
.await
.unwrap();
let session = client.session("session-fixture");
let results = session
.write_secured2_bulk(
12,
vec![WriteSecured2BulkEntry {
item_handle: 901,
current_user_id: 7,
verifier_user_id: 9,
value: Some(int_value(11)),
timestamp_value: Some(int_value(0)),
}],
)
.await
.unwrap();
assert_eq!(results.len(), 2);
assert!(results[0].was_successful);
let last_command = state.last_command_kind.lock().await;
assert_eq!(*last_command, Some(MxCommandKind::WriteSecured2Bulk as i32));
}
#[derive(Default)]
struct FakeState {
authorization: Mutex<Option<String>>,
@@ -400,6 +636,39 @@ struct FakeState {
last_read_bulk_timeout_ms: Mutex<Option<u32>>,
stream_dropped: Arc<AtomicBool>,
emit_stream_fault: AtomicBool,
/// Test-injected override for the next (and all subsequent) `Invoke`
/// calls. When `Some`, the fake gateway returns the override's response
/// instead of its default per-kind reply. Used by the malformed-reply
/// and unary-Unavailable tests; default `None` preserves existing
/// happy-path test behaviour.
invoke_override: Mutex<Option<InvokeOverride>>,
}
/// Test-injected override for the fake gateway's `Invoke` handler.
///
/// Each variant short-circuits the per-kind dispatch in `FakeGateway::invoke`
/// and reproduces one of the wire shapes the Rust client's error paths must
/// handle. The `OkReply*` variants model an OK envelope whose payload is
/// missing or wrong, which is the exact condition the new
/// `Error::MalformedReply` paths in `session.rs` are designed to catch.
#[derive(Clone)]
enum InvokeOverride {
/// Return `Status::unavailable(message)` from the unary Invoke RPC, so
/// the client maps it to `Error::Unavailable`.
Unavailable(String),
/// Return an OK `MxCommandReply` whose `payload` field is `None`. Used
/// to exercise `register_server_handle` / `add_item_handle` /
/// `add_item2_handle` falling through to the `MalformedReply` arm.
OkReplyNoPayload,
/// Return an OK reply whose payload arm does not match the bulk-read
/// command, so `read_bulk` falls through to its `MalformedReply` arm.
OkReplyWrongPayloadForReadBulk,
/// Return an OK reply whose payload arm does not match the requested
/// bulk command, so `bulk_results` falls through to `MalformedReply`.
OkReplyWrongPayloadForBulk,
/// Return an OK reply whose payload arm does not match the requested
/// bulk-write command, so `bulk_write_results` returns `MalformedReply`.
OkReplyWrongPayloadForBulkWrite,
}
#[derive(Clone)]
@@ -453,6 +722,58 @@ impl MxAccessGateway for FakeGateway {
.unwrap_or_default();
*self.state.last_command_kind.lock().await = Some(kind);
if let Some(override_) = self.state.invoke_override.lock().await.clone() {
return match override_ {
InvokeOverride::Unavailable(message) => Err(Status::unavailable(message)),
InvokeOverride::OkReplyNoPayload => Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok but payload omitted")),
payload: None,
..MxCommandReply::default()
})),
InvokeOverride::OkReplyWrongPayloadForReadBulk => {
Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("read-bulk wrong payload arm")),
// AddItem payload arm against a ReadBulk request:
// the client's `read_bulk` matcher must reject it.
payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
item_handle: 0,
})),
..MxCommandReply::default()
}))
}
InvokeOverride::OkReplyWrongPayloadForBulk => Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("bulk wrong payload arm")),
// AddItem payload arm against a SubscribeBulk request.
payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
item_handle: 0,
})),
..MxCommandReply::default()
})),
InvokeOverride::OkReplyWrongPayloadForBulkWrite => {
Ok(Response::new(MxCommandReply {
session_id: request.session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("bulk-write wrong payload arm")),
// AddItem payload arm against a WriteBulk request.
payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
item_handle: 0,
})),
..MxCommandReply::default()
}))
}
};
}
if kind == MxCommandKind::Write as i32 {
return Ok(Response::new(mxaccess_failure_reply()));
}
@@ -478,36 +799,41 @@ impl MxAccessGateway for FakeGateway {
}));
}
+// All four bulk-write families return `BulkWriteReply` over the
+// wire and only differ by which `payload` arm carries it. The
+// round-trip tests below want one entry per family, so wire them
+// all up to the same canned reply (one success + one failure) and
+// pick the matching payload arm by kind.
if kind == MxCommandKind::WriteBulk as i32 {
-// Echo one success and one failure so the test can assert the per-entry
-// shape and verify the call did not throw on per-entry failure.
-return Ok(Response::new(MxCommandReply {
-session_id: request.session_id,
-correlation_id: "fake-correlation".to_owned(),
+return Ok(Response::new(bulk_write_envelope(
+request.session_id,
kind,
-protocol_status: Some(ok_status("command ok")),
-payload: Some(mx_command_reply::Payload::WriteBulk(BulkWriteReply {
-results: vec![
-BulkWriteResult {
-server_handle: 12,
-item_handle: 901,
-was_successful: true,
-hresult: None,
-statuses: vec![],
-error_message: String::new(),
-},
-BulkWriteResult {
-server_handle: 12,
-item_handle: 902,
-was_successful: false,
-hresult: None,
-statuses: vec![],
-error_message: "invalid handle".to_owned(),
-},
-],
-})),
-..MxCommandReply::default()
-}));
+mx_command_reply::Payload::WriteBulk(canned_bulk_write_reply()),
+)));
}
+if kind == MxCommandKind::Write2Bulk as i32 {
+return Ok(Response::new(bulk_write_envelope(
+request.session_id,
+kind,
+mx_command_reply::Payload::Write2Bulk(canned_bulk_write_reply()),
+)));
+}
+if kind == MxCommandKind::WriteSecuredBulk as i32 {
+return Ok(Response::new(bulk_write_envelope(
+request.session_id,
+kind,
+mx_command_reply::Payload::WriteSecuredBulk(canned_bulk_write_reply()),
+)));
+}
+if kind == MxCommandKind::WriteSecured2Bulk as i32 {
+return Ok(Response::new(bulk_write_envelope(
+request.session_id,
+kind,
+mx_command_reply::Payload::WriteSecured2Bulk(canned_bulk_write_reply()),
+)));
+}
if kind == MxCommandKind::ReadBulk as i32 {
@@ -699,6 +1025,44 @@ fn mxaccess_failure_reply() -> MxCommandReply {
}
}
fn canned_bulk_write_reply() -> BulkWriteReply {
BulkWriteReply {
results: vec![
BulkWriteResult {
server_handle: 12,
item_handle: 901,
was_successful: true,
hresult: None,
statuses: vec![],
error_message: String::new(),
},
BulkWriteResult {
server_handle: 12,
item_handle: 902,
was_successful: false,
hresult: None,
statuses: vec![],
error_message: "invalid handle".to_owned(),
},
],
}
}
fn bulk_write_envelope(
session_id: String,
kind: i32,
payload: mx_command_reply::Payload,
) -> MxCommandReply {
MxCommandReply {
session_id,
correlation_id: "fake-correlation".to_owned(),
kind,
protocol_status: Some(ok_status("command ok")),
payload: Some(payload),
..MxCommandReply::default()
}
}
fn event(sequence: u64) -> MxEvent {
MxEvent {
family: MxEventFamily::OnDataChange as i32,