Files
mxaccessgw/clients/rust/RustClientDesign.md
T
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00

244 lines
9.0 KiB
Markdown

# Rust Client Detailed Design
## Purpose
Provide an async Rust client crate for MXAccess Gateway, plus a test CLI and
unit tests. The Rust client should use `tonic` and `tokio`.
Follow the [Rust Style Guide](../../docs/style-guides/RustStyleGuide.md) for handwritten
code and the [Protobuf Style Guide](../../docs/style-guides/ProtobufStyleGuide.md) for
generated contract inputs.
## Crate Layout
Actual layout — the `mxgateway-client` library crate is the workspace root,
with the `mxgw` test CLI as a workspace member:
```text
clients/rust/ # `mxgateway-client` library crate (workspace root)
Cargo.toml
build.rs
src/
lib.rs
client.rs
session.rs
galaxy.rs
options.rs
auth.rs
value.rs
version.rs
error.rs
generated.rs
crates/
mxgw-cli/ # `mxgw` test CLI (workspace member)
Cargo.toml
src/main.rs
tests/
client_behavior.rs
proto_fixtures.rs
```
Dependencies:
- `tonic`
- `prost`
- `prost-types`
- `tokio`
- `tokio-stream`
- `thiserror`
- `clap`
- `serde`
- `serde_json`
## Library API
Suggested API:
```rust
pub struct GatewayClient { /* tonic channel + generated client */ }
pub struct ClientOptions {
pub endpoint: String,
pub api_key: String,
pub plaintext: bool,
pub ca_file: Option<PathBuf>,
pub server_name_override: Option<String>,
pub connect_timeout: Duration,
pub call_timeout: Duration,
}
impl GatewayClient {
pub async fn connect(options: ClientOptions) -> Result<Self, Error>;
pub async fn open_session(&self, options: OpenSessionOptions) -> Result<Session, Error>;
pub async fn invoke(&self, request: MxCommandRequest) -> Result<MxCommandReply, Error>;
}
```
Session:
```rust
pub struct Session {
pub id: String,
}
impl Session {
pub async fn register(&self, client_name: &str) -> Result<i32, Error>;
pub async fn add_item(&self, server_handle: i32, item: &str) -> Result<i32, Error>;
pub async fn add_item2(&self, server_handle: i32, item: &str, context: &str) -> Result<i32, Error>;
pub async fn advise(&self, server_handle: i32, item_handle: i32) -> Result<(), Error>;
pub async fn add_item_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn advise_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn remove_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn un_advise_item_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn subscribe_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn unsubscribe_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
pub async fn write(&self, server_handle: i32, item_handle: i32, value: MxValue, user_id: i32) -> Result<(), Error>;
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;
pub async fn events(&self) -> Result<impl Stream<Item = Result<MxEvent, Error>>, Error>;
pub async fn close(&self) -> Result<(), Error>;
}
```
The five bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`,
`write_secured2_bulk`) and `read_bulk` mirror the worker's bulk command shapes
in `mxaccess_gateway.proto` and use the same correlation-id discipline as the
unary helpers — `session::next_correlation_id` is `pub` so that consumers
constructing raw `MxCommandRequest`/`CloseSessionRequest` payloads outside
the `Session` helpers (notably the `mxgw` test CLI's `ping` and
`close-session` subcommands) share the same id generation.
## Authentication
Use a `tonic` interceptor or request extension layer to add:
```text
authorization: Bearer <api key>
```
Use `SecretString` or equivalent if a dependency is acceptable. Always redact
API keys in `Debug` output.
## TLS
Support:
- plaintext channel for local development,
- native or rustls TLS depending on project preference,
- custom CA file,
- domain override.
## Streaming
Expose event streams as a `Stream<Item = Result<MxEvent, Error>>`. Dropping the
stream should cancel the underlying gRPC stream.
Do not buffer unboundedly in the client. If a helper channel is used, make it
bounded.
## Error Handling
Use `thiserror`:
```rust
pub enum Error {
InvalidEndpoint { endpoint: String, detail: String },
InvalidArgument { name: String, detail: String },
Transport(tonic::transport::Error),
Authentication { message: String, status: Box<tonic::Status> },
Authorization { message: String, status: Box<tonic::Status> },
Timeout { message: String, status: Box<tonic::Status> },
Cancelled { message: String, status: Box<tonic::Status> },
Unavailable { message: String, status: Box<tonic::Status> },
Status(Box<tonic::Status>),
Command(Box<CommandError>),
ProtocolStatus { operation: &'static str, code: ProtocolStatusCode, message: String },
MalformedReply { detail: String },
}
```
`Unavailable` classifies the transient `Code::Unavailable` /
`Code::ResourceExhausted` statuses so callers can decide whether to retry
without unwrapping the raw status. `MalformedReply` surfaces OK replies
whose payload does not carry the data the command contract requires (for
example, an `AddItem` reply missing the item handle, or a `WriteBulk` reply
carrying the wrong payload arm). `InvalidEndpoint` is returned when the
endpoint URL fails to parse or its TLS material cannot be loaded.
Preserve raw command replies in `CommandError` where applicable.
## Test CLI
Binary: `mxgw`.
Use `clap` derive.
Commands (see `clients/rust/README.md` for full argument lists):
```text
mxgw version
mxgw ping
mxgw open-session
mxgw close-session --session-id <id>
mxgw register --session-id <id> --client-name <name>
mxgw add-item --session-id <id> --server-handle <h> --item <tag>
mxgw advise --session-id <id> --server-handle <h> --item-handle <h>
mxgw subscribe-bulk --session-id <id> --server-handle <h> --items <a,b,c>
mxgw unsubscribe-bulk --session-id <id> --server-handle <h> --item-handles <1,2,3>
mxgw read-bulk --session-id <id> --server-handle <h> --items <a,b,c> --timeout-ms 1500
mxgw write --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123
mxgw write2 --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123 --timestamp <rfc3339>
mxgw write-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2>
mxgw write2-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2> --timestamp <rfc3339>
mxgw write-secured-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2>
mxgw write-secured2-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2> --timestamp <rfc3339>
mxgw stream-events --session-id <id> --json
mxgw bench-read-bulk --duration-seconds 30 --bulk-size 6 --json
mxgw smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item TestChildObject.TestInt
mxgw galaxy test-connection
mxgw galaxy last-deploy-time
mxgw galaxy discover-hierarchy
mxgw galaxy watch [--last-seen-deploy-time <rfc3339>] [--max-events N]
```
JSON output should use `serde_json`.
## Unit Tests
Use a fake `tonic` server started on a local ephemeral port, or abstract the
generated client behind a trait for unit tests.
Required tests:
- generated client compiles from proto,
- auth metadata injection,
- TLS/plaintext endpoint construction,
- value conversion,
- command request construction,
- error mapping from `tonic::Status`,
- event stream order,
- stream cancellation,
- CLI parsing,
- JSON redaction.
## Integration Tests
Skip unless:
```text
MXGATEWAY_INTEGRATION=1
```
Use `tokio::test`. Run bounded smoke flow and ensure `CloseSession` is attempted
with `drop` fallback docs, but do not rely on `Drop` for async close.
## Related Documentation
- [Client Libraries Detailed Design](../../docs/ClientLibrariesDesign.md)
- [Client Proto Generation](../../docs/ClientProtoGeneration.md)
- [Client Packaging](../../docs/ClientPackaging.md)
- [Rust Style Guide](../../docs/style-guides/RustStyleGuide.md)