Commit Graph

8 Commits

Author SHA1 Message Date
Joseph Doherty 1b6ca07bb5 Point the Rust client at the StreamAlarms alarm feed
Replace GatewayClient::query_active_alarms with stream_alarms, an
AlarmFeedStream over AlarmFeedMessage served by the gateway's central
alarm monitor (snapshot, snapshot_complete, then live transitions).
Drops session_id from the acknowledge surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:49:26 -04:00
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00
Joseph Doherty 61644e63fb Rust: take Session::read_bulk tag list by borrowed slice
Changes the public signature from `Vec<String>` (owned) to
`&[impl AsRef<str>]` so callers can re-issue the same call repeatedly
without cloning at the call site. The bench''s steady-state loop now
passes `&tags` instead of `tags.clone()`; the CLI subcommand passes the
parsed `&items`; the integration test passes `&["Area001.Pump001.Speed"]`
straight from a string literal slice.

Honest perf note: this is an ergonomics change, not a measurable speedup.
The method still has to materialise an owned `Vec<String>` internally
because prost''s generated `ReadBulkCommand` field requires it, so the
total heap traffic per call is unchanged. Across two 30-second, 5-way
concurrent bench runs at bulkSize=6:

  pre-fix  (.clone() at caller): 145.35 calls/sec, p99  62.31 ms
  post-fix run 1 (&tags):        165.98 calls/sec, p99  40.65 ms
  post-fix run 2 (&tags):        146.19 calls/sec, p99  60.04 ms

Run-to-run variance (145-166) dominates any signal from the fix. Solo
Rust release stayed at 261-267 calls/sec across both API shapes,
confirming the bench is gateway-bound under 5-way contention rather
than client-allocation-bound. The change is kept because the borrowed
slice is the idiomatic Rust API shape for "list of items the callee
does not need to take ownership of", and it cleans up the explicit
clone from the bench's inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:49:33 -04:00
Joseph Doherty 5e375f6d3d Add bulk read/write command family across worker, gateway, and clients
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.

ReadBulk has no MXAccess analogue. The worker satisfies it by:

  - Returning the last cached OnDataChange payload (was_cached=true)
    when the requested tag is already in the session''s item registry
    AND advised — the existing subscription is NOT touched, since the
    caller did not create it.
  - Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
    UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
    and leaving the session exactly as it was. The wait pumps Windows
    messages on the STA so the inbound MXAccess event can dispatch
    while the executor still holds the thread.

The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.

Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.

All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.

Tests added:
  - MxAccessValueCacheTests (6 cases) — Set/TryGet, Remove resets the
    version, TryWaitForUpdate signals on Set, pump step fires each poll.
  - MxAccessBaseEventSinkTests — OnDataChange populates the cache,
    ValueCache property exposes the bound instance.
  - MxAccessCommandExecutorTests — four bulk-write variants (per-entry
    success/failure, value+timestamp forwarding, secured user ids),
    ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
    was_successful=false), invalid-payload reply.
  - GatewayGrpcScopeResolverTests — five new MxCommandKind cases.
  - SessionManagerTests — WriteBulk and ReadBulk forwarding through
    FakeWorkerHarness; ReadBulk forwards timeout_ms.
  - Per-client (.NET, Go, Rust, Python, Java) — WriteBulk builds the
    right command and returns per-entry results, ReadBulk forwards the
    timeout and unpacks the was_cached flag.

Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.

Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:42:38 -04:00
Joseph Doherty 0d8a28d2fe Fix all MxGateway.Client.Rust code-review findings
Resolves Client.Rust-001 through Client.Rust-011.

Build/test/clippy gate (Client.Rust-001/002/003):
- options.rs: doc comments on with_max_grpc_message_bytes /
  max_grpc_message_bytes (#![warn(missing_docs)])
- session.rs: rename BulkReplyKind variants to drop the shared `Bulk`
  suffix (clippy::enum_variant_names)
- galaxy.rs: deref instead of clone on Option<Timestamp>
  (clippy::clone_on_copy — an extra violation the gate also hit)
- mxgw-cli: assert version_json against GATEWAY/WORKER_PROTOCOL_VERSION
  constants instead of the stale literal 2
`cargo clippy --workspace --all-targets -- -D warnings` now passes.

Correctness / error handling:
- version.rs: CLIENT_VERSION = env!("CARGO_PKG_VERSION") (Client.Rust-004)
- session.rs: register/add_item/add_item2 handle extractors and
  bulk_results now return Err(Error::MalformedReply) instead of a
  silent 0 / empty vec on a shapeless OK reply (Client.Rust-005/006)
- error.rs: new Error::Unavailable classifies Code::Unavailable /
  ResourceExhausted as transient (Client.Rust-010)
- session.rs: per-call unique correlation ids via an atomic counter
  (Client.Rust-011)

Other:
- value.rs: MxValue/MxArrayValue compute the projection on demand
  instead of caching it, so a wire-only value pays no projection cost
  (Client.Rust-008)
- RustClientDesign.md: correct the crate layout, drop the unused
  `tracing` dependency (Client.Rust-007)
- client_behavior.rs: tests for the bulk-size cap, a mid-stream status
  fault, and the unreadable-CA-file path (Client.Rust-009)

cargo fmt / test --workspace (27 tests) / clippy all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:08:55 -04:00
Joseph Doherty fe19c478c0 clients/rust: SDK methods for AcknowledgeAlarm + QueryActiveAlarms (PR E.6)
Eleventh PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Mirrors PR E.2's .NET surface
on the Rust async SDK. Depends on PR E.1 (regen, merged).

- GatewayClient::acknowledge_alarm — async unary call. Uses the
  existing unary_request helper (call timeout) and routes failures
  through Error mapping; non-OK protocol status promotes to
  Error::ProtocolStatus via ensure_protocol_success.
- GatewayClient::query_active_alarms — async server-streaming call
  returning a new ActiveAlarmStream type alias (parallel to
  EventStream). Errors are pre-mapped from tonic::Status; dropping
  the stream cancels the call cooperatively.
- GATEWAY_PROTOCOL_VERSION bumped 2 → 3 to match the .NET contract.
- FakeGateway test impl extends to satisfy the new trait methods so
  client_behavior.rs builds. Two new integration tests cover the
  new SDK methods.

Tests:
- 12 unit + 10 client_behavior + 4 proto_fixtures = 26 tests, all
  pass under cargo test (Rust 1.x via existing toolchain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:06:13 -04:00
Joseph Doherty 3d11ac3316 Add bulk MXAccess subscription commands 2026-04-26 22:29:27 -04:00
Joseph Doherty 89a8fb876a Issue #44: implement Rust client session values errors and CLI 2026-04-26 20:30:04 -04:00