Commit Graph

20 Commits

Author SHA1 Message Date
Joseph Doherty 61644e63fb Rust: take Session::read_bulk tag list by borrowed slice
Changes the public signature from `Vec<String>` (owned) to
`&[impl AsRef<str>]` so callers can re-issue the same call repeatedly
without cloning at the call site. The bench''s steady-state loop now
passes `&tags` instead of `tags.clone()`; the CLI subcommand passes the
parsed `&items`; the integration test passes `&["Area001.Pump001.Speed"]`
straight from a string literal slice.

Honest perf note: this is an ergonomics change, not a measurable speedup.
The method still has to materialise an owned `Vec<String>` internally
because prost''s generated `ReadBulkCommand` field requires it, so the
total heap traffic per call is unchanged. Across two 30-second, 5-way
concurrent bench runs at bulkSize=6:

  pre-fix  (.clone() at caller): 145.35 calls/sec, p99  62.31 ms
  post-fix run 1 (&tags):        165.98 calls/sec, p99  40.65 ms
  post-fix run 2 (&tags):        146.19 calls/sec, p99  60.04 ms

Run-to-run variance (145-166) dominates any signal from the fix. Solo
Rust release stayed at 261-267 calls/sec across both API shapes,
confirming the bench is gateway-bound under 5-way contention rather
than client-allocation-bound. The change is kept because the borrowed
slice is the idiomatic Rust API shape for "list of items the callee
does not need to take ownership of", and it cleans up the explicit
clone from the bench's inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:49:33 -04:00
Joseph Doherty 93633ce99c Cross-language ReadBulk stress benchmark
Adds a bench-read-bulk subcommand to every client CLI (.NET, Go, Rust,
Python, Java) and a PowerShell driver that runs all five concurrently
against the deployed gateway and prints a side-by-side comparison.

Each CLI''s bench:

  - Opens its own session, registers, subscribes to bulk-size tags so the
    worker''s MxAccessValueCache populates from real OnDataChange events.
  - Runs a warmup-seconds-long pre-loop with identical calls so JIT /
    connection-pool / first-call overhead is amortised before the
    measurement window.
  - Runs ReadBulk in a tight in-process loop for duration-seconds with
    per-call high-resolution latency capture (Stopwatch in .NET,
    time.Now in Go, std::time::Instant in Rust, time.perf_counter in
    Python, System.nanoTime in Java).
  - Unsubscribes + closes the session, then emits one JSON object with
    the shared schema: { language, durationMs, totalCalls, successfulCalls,
    failedCalls, totalReadResults, cachedReadResults, callsPerSecond,
    latencyMs: { p50, p95, p99, max, mean } }.

The PS driver (scripts/bench-read-bulk.ps1) launches one detached process
per client, waits for all to finish, parses the trailing JSON object from
each stdout, prints a comparison table, and persists the combined report
under artifacts/bench/. Quoting around Java''s `gradle --args="..."` is
handled by writing a one-shot .bat that cmd.exe runs; the .NET CLI''s
per-call gRPC timeout is auto-scaled to (Duration + Warmup + 30s) so the
channel-wide timeout doesn''t cancel the bench mid-loop.

Live 30-second steady-state run against the deployed gateway, all five
clients hitting the same six TestMachine_001..006.TestChangingInt tags:

  client    calls/sec  cached/total    p50 ms  p95 ms  p99 ms  max ms
  dotnet      171.78   30924/30924      3.84   14.06   40.41  542.48
  go          175.46   31590/31590      3.93   13.52   41.26  243.00
  rust        123.26   22188/22188      5.52   15.78   48.11  544.41
  python      145.79   26244/26244      4.86   14.85   41.65  645.84
  java        181.12   32604/32604      3.80   10.59   33.37  344.27

143,550 ReadBulk results across all five clients during the 30s window;
100% were was_cached = true (the worker''s cache fast-path never fell
through to the snapshot lifecycle). Aggregate read throughput ~800
calls/sec against five concurrent sessions sharing the same cached tags.

A second variant with bulk-size 20 sustained the same per-client call
rate while delivering 3.3x more values per call (~37,000 cached reads/sec
aggregate across the five concurrent sessions), confirming the linear
per-tag cache lookup inside one call is not a bottleneck at this scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 05:17:08 -04:00
Joseph Doherty f220908f3f Add bulk read/write CLI subcommands and e2e matrix coverage
The previous commit added the bulk read/write library surface in every
client; this commit makes that surface reachable from each client's CLI
and exercises it through scripts/run-client-e2e-tests.ps1.

Five new subcommands in every client CLI (.NET / Go / Rust / Python /
Java): read-bulk, write-bulk, write2-bulk, write-secured-bulk, and
write-secured2-bulk. Each follows the existing subscribe-bulk shape:

  - read-bulk takes --server-handle, --items <csv tag list>, and
    --timeout-ms (0 = worker default). JSON output carries the
    BulkReadResult fields, including was_cached so the e2e matrix can
    verify the cached-path semantics.
  - The four bulk-write families take --server-handle, --item-handles
    <csv>, --type, --values <csv>. write2-bulk and write-secured2-bulk
    add a single --timestamp applied to every entry; the secured
    variants take --current-user-id and --verifier-user-id. All four
    output BulkWriteResult JSON.

A new -SkipReadWriteBulk switch on the matrix script (default OFF)
controls two new e2e phases:

  - After the existing subscribe-bulk phase leaves tags advised, the
    script runs read-bulk against the same tag list and asserts most
    results return was_cached = true. This is the only e2e coverage of
    the cache-then-snapshot fork — the unit + gateway tests verify the
    semantics with a fake worker, but only the live cross-language
    matrix proves the cache populates from real OnDataChange events and
    survives the round-trip through every client''s JSON parser.
  - When -VerifyWrite is set, the write phase now also runs a single-
    entry write-bulk against the same writable item handle (using a
    distinct sentinel value) and asserts a per-entry success. Confirms
    the BulkWriteResult wire format end-to-end without complicating
    the OnWriteComplete echo assertion the single-item phase already
    verifies.

Dry-run validation passes for all five clients: each emits the correct
read-bulk and write-bulk CLI invocations with the right flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 04:06:14 -04:00
Joseph Doherty 5e375f6d3d Add bulk read/write command family across worker, gateway, and clients
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.

ReadBulk has no MXAccess analogue. The worker satisfies it by:

  - Returning the last cached OnDataChange payload (was_cached=true)
    when the requested tag is already in the session''s item registry
    AND advised — the existing subscription is NOT touched, since the
    caller did not create it.
  - Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
    UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
    and leaving the session exactly as it was. The wait pumps Windows
    messages on the STA so the inbound MXAccess event can dispatch
    while the executor still holds the thread.

The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.

Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.

All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.

Tests added:
  - MxAccessValueCacheTests (6 cases) — Set/TryGet, Remove resets the
    version, TryWaitForUpdate signals on Set, pump step fires each poll.
  - MxAccessBaseEventSinkTests — OnDataChange populates the cache,
    ValueCache property exposes the bound instance.
  - MxAccessCommandExecutorTests — four bulk-write variants (per-entry
    success/failure, value+timestamp forwarding, secured user ids),
    ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
    was_successful=false), invalid-payload reply.
  - GatewayGrpcScopeResolverTests — five new MxCommandKind cases.
  - SessionManagerTests — WriteBulk and ReadBulk forwarding through
    FakeWorkerHarness; ReadBulk forwards timeout_ms.
  - Per-client (.NET, Go, Rust, Python, Java) — WriteBulk builds the
    right command and returns per-entry results, ReadBulk forwards the
    timeout and unpacks the was_cached flag.

Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.

Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:42:38 -04:00
Joseph Doherty 758aca2355 Make the e2e write phase work live across all five clients
Running the matrix against a live gateway surfaced several issues:

- The write phase is now opt-in (-VerifyWrite, was -SkipWrite). It runs
  right after register so only a small event backlog precedes the write,
  and asserts the reliable OnWriteComplete signal (the written value is
  not echoed back by a provider-driven attribute like TestChangingInt, so
  the value compare is best-effort).
- Java was launched as bare "gradle", which .NET's Process.Start cannot
  exec (it is gradle.bat) — resolve the launcher and run it via cmd.exe.
- The Java client's MxEventStream queue capacity was 16, which overflows
  on any active session's backlog-replay burst; raised to 1024.
- The Rust stream-events CLI now renders the event family as the proto
  enum name, matching the protobuf-JSON the other four clients emit.

Update docs/GatewayTesting.md for the reworked write phase.

Verified live: the full five-client matrix passes with -VerifyWrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 14:45:47 -04:00
Joseph Doherty e355a7674b Add write, parity, auth, and parallel coverage to client e2e matrix
Close the notable gaps in scripts/run-client-e2e-tests.ps1:

- Write round-trip: write a per-client sentinel value to a configurable
  writable attribute, then assert it is echoed back through the event
  stream. Extends the Rust mxgw-cli stream-events output with full
  per-event JSON (itemHandle + protojson-shaped value) so all five
  language clients run an identical value compare.
- Parity: assert an invalid item handle and an unknown session id are
  rejected rather than silently succeeding.
- Auth rejection: assert open-session is rejected with a missing API key
  and, when -RejectScopeApiKeyEnv is supplied, with an insufficient-scope
  key.
- Parallel: -Parallel runs each language client as an isolated child
  process and merges their JSON reports.

Update docs/GatewayTesting.md for the new phases and flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:55:51 -04:00
Joseph Doherty 0d8a28d2fe Fix all MxGateway.Client.Rust code-review findings
Resolves Client.Rust-001 through Client.Rust-011.

Build/test/clippy gate (Client.Rust-001/002/003):
- options.rs: doc comments on with_max_grpc_message_bytes /
  max_grpc_message_bytes (#![warn(missing_docs)])
- session.rs: rename BulkReplyKind variants to drop the shared `Bulk`
  suffix (clippy::enum_variant_names)
- galaxy.rs: deref instead of clone on Option<Timestamp>
  (clippy::clone_on_copy — an extra violation the gate also hit)
- mxgw-cli: assert version_json against GATEWAY/WORKER_PROTOCOL_VERSION
  constants instead of the stale literal 2
`cargo clippy --workspace --all-targets -- -D warnings` now passes.

Correctness / error handling:
- version.rs: CLIENT_VERSION = env!("CARGO_PKG_VERSION") (Client.Rust-004)
- session.rs: register/add_item/add_item2 handle extractors and
  bulk_results now return Err(Error::MalformedReply) instead of a
  silent 0 / empty vec on a shapeless OK reply (Client.Rust-005/006)
- error.rs: new Error::Unavailable classifies Code::Unavailable /
  ResourceExhausted as transient (Client.Rust-010)
- session.rs: per-call unique correlation ids via an atomic counter
  (Client.Rust-011)

Other:
- value.rs: MxValue/MxArrayValue compute the projection on demand
  instead of caching it, so a wire-only value pays no projection cost
  (Client.Rust-008)
- RustClientDesign.md: correct the crate layout, drop the unused
  `tracing` dependency (Client.Rust-007)
- client_behavior.rs: tests for the bulk-size cap, a mid-stream status
  fault, and the unreadable-CA-file path (Client.Rust-009)

cargo fmt / test --workspace (27 tests) / clippy all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:08:55 -04:00
Joseph Doherty fe19c478c0 clients/rust: SDK methods for AcknowledgeAlarm + QueryActiveAlarms (PR E.6)
Eleventh PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Mirrors PR E.2's .NET surface
on the Rust async SDK. Depends on PR E.1 (regen, merged).

- GatewayClient::acknowledge_alarm — async unary call. Uses the
  existing unary_request helper (call timeout) and routes failures
  through Error mapping; non-OK protocol status promotes to
  Error::ProtocolStatus via ensure_protocol_success.
- GatewayClient::query_active_alarms — async server-streaming call
  returning a new ActiveAlarmStream type alias (parallel to
  EventStream). Errors are pre-mapped from tonic::Status; dropping
  the stream cancels the call cooperatively.
- GATEWAY_PROTOCOL_VERSION bumped 2 → 3 to match the .NET contract.
- FakeGateway test impl extends to satisfy the new trait methods so
  client_behavior.rs builds. Two new integration tests cover the
  new SDK methods.

Tests:
- 12 unit + 10 client_behavior + 4 proto_fixtures = 26 tests, all
  pass under cargo test (Rust 1.x via existing toolchain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:06:13 -04:00
Joseph Doherty ddad573b75 Merge origin/main with local pending work and update AGENTS.md references
- Resolve 14 conflicts from popping local stash on top of origin's
  eed1e88 + 8d3352f doc-comment additions (11 mechanical, plus
  version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs)
- Fix 4 test files that used AGENTS.md as the repo-root sentinel
  (now use CLAUDE.md, since AGENTS.md was removed in 4731ab5)
- Redirect 10 doc citations from AGENTS.md to the matching gateway.md
  sections (Value Model, Status Model, Security, STA Worker Thread
  Model, gRPC Layer rule, cancellation rule)

Verified: solution build clean, x86 worker build clean, 266/266
gateway tests passing, 121/121 worker tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:13:33 -04:00
Joseph Doherty 8d3352f2c6 Add idiomatic documentation to Go, Java, Python, and Rust clients 2026-04-30 12:04:46 -04:00
Joseph Doherty 51a9dadf62 Align docs with StyleGuide and add CLAUDE.md
- Rename 16 kebab-case docs to PascalCase per StyleGuide
- Move per-language client design docs from docs/ to clients/<lang>/
  alongside their READMEs
- Add ## Related Documentation sections to 15 docs that lacked one
- Fix sentence-case violations in H3 headings (StyleGuide rule)
- Update cross-references in gateway.md, client READMEs, scripts,
  and generate-proto.ps1 helpers to follow the new paths
- Add CLAUDE.md with build/test commands, the source-update
  verification matrix, the parity-first contract, and pointers
  to MXAccess and Galaxy Repository analysis sources

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:19:22 -04:00
Joseph Doherty 133c83029b Add Galaxy repository API and clients 2026-04-29 07:27:00 -04:00
Joseph Doherty 907aa49aea Improve gateway reliability and client e2e coverage 2026-04-28 06:11:18 -04:00
Joseph Doherty 4fc355b357 Improve gateway reliability and dashboard docs 2026-04-28 00:13:22 -04:00
Joseph Doherty 3d11ac3316 Add bulk MXAccess subscription commands 2026-04-26 22:29:27 -04:00
Joseph Doherty 4ea2c4fd86 Issue #50: clarify packaging API key placeholders 2026-04-26 21:26:28 -04:00
Joseph Doherty f2118f7028 Issue #50: document client packaging 2026-04-26 21:20:43 -04:00
Joseph Doherty 89a8fb876a Issue #44: implement Rust client session values errors and CLI 2026-04-26 20:30:04 -04:00
Joseph Doherty ee88f9d647 Scaffold Rust client workspace 2026-04-26 19:47:26 -04:00
Joseph Doherty 6a40d26366 Publish stable client proto inputs 2026-04-26 18:52:39 -04:00