mxaccessgw

Author	SHA1	Message	Date
Joseph Doherty	3e22285f09	Exercise the alarm subcommands in the client e2e matrix Add an opt-in alarm phase (-VerifyAlarms) to run-client-e2e-tests.ps1: each of the five client CLIs runs stream-alarms (asserting at least one AlarmFeedMessage) and acknowledge-alarm against the gateway's central alarm monitor. Both RPCs are session-less. -AlarmReference and -AlarmStreamMax tune the phase; GatewayTesting.md documents it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 19:47:20 -04:00
Joseph Doherty	6126099cdb	e2e: drive each client CLI through one long-lived batch process The cross-language e2e matrix spawned one CLI process per operation — ~250 per client — paying a process (and, for the Java CLI, a full JVM) cold-start every time. The Java leg alone ran ~16 minutes. Each client CLI (dotnet, go, rust, python, java) gains a `batch` subcommand: a single process that reads one command line from stdin, runs it through the normal subcommand dispatch, writes the JSON result, then a line containing exactly `__MXGW_BATCH_EOR__`. A failing command writes its `{"error":...}` envelope and the loop continues. run-client-e2e-tests.ps1 now launches one batch process per client and pings every operation through its stdin/stdout, so startup is paid once per client. The orchestration and assertions are unchanged; the parity and auth phases now read the `{"error":...}` envelope instead of a process exit code. Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java leg dropped from ~16 min to ~2-3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 06:20:13 -04:00
Joseph Doherty	c1ff8c94e8	e2e: build client CLIs once and drain events so dotnet/java pass The cross-language client e2e matrix failed for dotnet and Java. Both failures were in the harness, not the client code. 1. Per-call toolchain cold-start. The matrix issues ~250 CLI calls per client; it invoked `dotnet run` / `gradle :mxgateway-cli:run` every time, rebuilding and cold-starting the toolchain per call. Build each CLI once up front (`dotnet build`, `gradle :mxgateway-cli:installDist`) and invoke the compiled artifact directly. This alone fixes dotnet. 2. Worker event-channel overflow. The per-tag advise loop advises every discovered tag with no StreamEvents consumer attached, so change events accumulate in the worker event channel (MxGateway:Events:QueueCapacity) until FailFast faults the worker. dotnet's faster loop slipped under the window; the Java CLI's process-per-call JVM cold-start did not. Every -DrainEveryTags advised tags (default 15) the loop connects a short StreamEvents drain; the gateway's per-stream producer empties the channel the instant a subscriber attaches, so a small bounded read suffices. Full 5-client matrix (dotnet, go, rust, python, java) now passes with -VerifyWrite against a live gateway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 05:24:24 -04:00
Joseph Doherty	7db4bffa30	bench-read-bulk driver: invoke .NET in -c Release and Rust in --release Rust's debug profile costs the bench ~45% of solo throughput and ~3x of p99 latency vs release (267 vs 184 solo calls/sec, p99 5.7 vs 16ms). Debug disables inlining, runs overflow checks on every arithmetic op, keeps Future state machines un-collapsed, and lets every Vec allocation through unoptimized. Other compiled clients in the matrix don't see this gap: Go always builds optimized, Python is interpreted, and the JIT-tiered runtimes (HotSpot for Java, CoreCLR Tier 1 for .NET) close most of the gap during the warmup window. The driver now requests `cargo run --release` for Rust and `dotnet run -c Release --no-build` for .NET, so the two compiled-AOT clients race under their production-equivalent profiles. Callers must `cargo build --release -p mxgw-cli` and `dotnet build ... -c Release` once before running the bench; `--no-build` then keeps each measurement window free of compilation overhead. Live re-run (5-way concurrent, 30s, bulkSize 6) after the switch: rust: 145.35 calls/sec (was 123.26 in debug; 18% gain under contention), go: 185.59 calls/sec, java: 171.80 calls/sec, dotnet: 172.31 calls/sec, python: 140.52 calls/sec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:25:17 -04:00
Joseph Doherty	93633ce99c	Cross-language ReadBulk stress benchmark Adds a bench-read-bulk subcommand to every client CLI (.NET, Go, Rust, Python, Java) and a PowerShell driver that runs all five concurrently against the deployed gateway and prints a side-by-side comparison. Each CLI's bench: - Opens its own session, registers, subscribes to bulk-size tags so the worker's MxAccessValueCache populates from real OnDataChange events. - Runs a warmup-seconds-long pre-loop with identical calls so JIT / connection-pool / first-call overhead is amortised before the measurement window. - Runs ReadBulk in a tight in-process loop for duration-seconds with per-call high-resolution latency capture (Stopwatch in .NET, time.Now in Go, std::time::Instant in Rust, time.perf_counter in Python, System.nanoTime in Java). - Unsubscribes + closes the session, then emits one JSON object with the shared schema: { language, durationMs, totalCalls, successfulCalls, failedCalls, totalReadResults, cachedReadResults, callsPerSecond, latencyMs: { p50, p95, p99, max, mean } }. The PS driver (scripts/bench-read-bulk.ps1) launches one detached process per client, waits for all to finish, parses the trailing JSON object from each stdout, prints a comparison table, and persists the combined report under artifacts/bench/. Quoting around Java's `gradle --args="..."` is handled by writing a one-shot .bat that cmd.exe runs; the .NET CLI's per-call gRPC timeout is auto-scaled to (Duration + Warmup + 30s) so the channel-wide timeout doesn't cancel the bench mid-loop.
Live 30-second steady-state run against the deployed gateway, all five clients hitting the same six TestMachine_001..006.TestChangingInt tags:
client  calls/sec  cached/total  p50 ms  p95 ms  p99 ms  max ms
dotnet  171.78     30924/30924   3.84    14.06   40.41   542.48
go      175.46     31590/31590   3.93    13.52   41.26   243.00
rust    123.26     22188/22188   5.52    15.78   48.11   544.41
python  145.79     26244/26244   4.86    14.85   41.65   645.84
java    181.12     32604/32604   3.80    10.59   33.37   344.27
143,550 ReadBulk results across all five clients during the 30s window; 100% were was_cached = true (the worker's cache fast-path never fell through to the snapshot lifecycle). Aggregate read throughput ~800 calls/sec against five concurrent sessions sharing the same cached tags. A second variant with bulk-size 20 sustained the same per-client call rate while delivering 3.3x more values per call (~37,000 cached reads/sec aggregate across the five concurrent sessions), confirming the linear per-tag cache lookup inside one call is not a bottleneck at this scale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 05:17:08 -04:00
Joseph Doherty	f220908f3f	Add bulk read/write CLI subcommands and e2e matrix coverage The previous commit added the bulk read/write library surface in every client; this commit makes that surface reachable from each client's CLI and exercises it through scripts/run-client-e2e-tests.ps1. Five new subcommands in every client CLI (.NET / Go / Rust / Python / Java): read-bulk, write-bulk, write2-bulk, write-secured-bulk, and write-secured2-bulk. Each follows the existing subscribe-bulk shape: - read-bulk takes --server-handle, --items <csv tag list>, and --timeout-ms (0 = worker default). JSON output carries the BulkReadResult fields, including was_cached so the e2e matrix can verify the cached-path semantics. - The four bulk-write families take --server-handle, --item-handles <csv>, --type, --values <csv>. write2-bulk and write-secured2-bulk add a single --timestamp applied to every entry; the secured variants take --current-user-id and --verifier-user-id. All four output BulkWriteResult JSON. A new -SkipReadWriteBulk switch on the matrix script (default OFF) controls two new e2e phases: - After the existing subscribe-bulk phase leaves tags advised, the script runs read-bulk against the same tag list and asserts most results return was_cached = true. This is the only e2e coverage of the cache-then-snapshot fork: the unit + gateway tests verify the semantics with a fake worker, but only the live cross-language matrix proves the cache populates from real OnDataChange events and survives the round-trip through every client's JSON parser. - When -VerifyWrite is set, the write phase now also runs a single-entry write-bulk against the same writable item handle (using a distinct sentinel value) and asserts a per-entry success. Confirms the BulkWriteResult wire format end-to-end without complicating the OnWriteComplete echo assertion the single-item phase already verifies.
Dry-run validation passes for all five clients: each emits the correct read-bulk and write-bulk CLI invocations with the right flags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 04:06:14 -04:00
Joseph Doherty	758aca2355	Make the e2e write phase work live across all five clients Running the matrix against a live gateway surfaced several issues: - The write phase is now opt-in (-VerifyWrite, was -SkipWrite). It runs right after register so only a small event backlog precedes the write, and asserts the reliable OnWriteComplete signal (the written value is not echoed back by a provider-driven attribute like TestChangingInt, so the value compare is best-effort). - Java was launched as bare "gradle", which .NET's Process.Start cannot exec (it is gradle.bat) — resolve the launcher and run it via cmd.exe. - The Java client's MxEventStream queue capacity was 16, which overflows on any active session's backlog-replay burst; raised to 1024. - The Rust stream-events CLI now renders the event family as the proto enum name, matching the protobuf-JSON the other four clients emit. Update docs/GatewayTesting.md for the reworked write phase. Verified live: the full five-client matrix passes with -VerifyWrite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 14:45:47 -04:00
Joseph Doherty	e355a7674b	Add write, parity, auth, and parallel coverage to client e2e matrix Close the notable gaps in scripts/run-client-e2e-tests.ps1: - Write round-trip: write a per-client sentinel value to a configurable writable attribute, then assert it is echoed back through the event stream. Extends the Rust mxgw-cli stream-events output with full per-event JSON (itemHandle + protojson-shaped value) so all five language clients run an identical value compare. - Parity: assert an invalid item handle and an unknown session id are rejected rather than silently succeeding. - Auth rejection: assert open-session is rejected with a missing API key and, when -RejectScopeApiKeyEnv is supplied, with an insufficient-scope key. - Parallel: -Parallel runs each language client as an isolated child process and merges their JSON reports. Update docs/GatewayTesting.md for the new phases and flags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 11:55:51 -04:00
Joseph Doherty	3cc53a8c69	Harden code-review tooling and align REVIEW-PROCESS.md with mxaccessgw - regen-readme.py: use `python` not the broken `python3` Store alias in the generated note and docstring; --check now also fails when a module header's "Open findings" count disagrees with finding statuses or a finding has an unrecognised Status (find_inconsistencies) - REVIEW-PROCESS.md: rewritten for mxaccessgw (was describing ScadaLink) — MxGateway.* modules, "mxaccessgw conventions" checklist category, gateway.md/docs/ design context, `python` command - scripts/check-code-reviews-readme.ps1: CI/pre-commit wrapper for regen-readme.py --check - code-reviews/test_regen_readme.py: dependency-free parser tests - code-reviews/README.md: regenerated Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:36:25 -04:00
Joseph Doherty	51a9dadf62	Align docs with StyleGuide and add CLAUDE.md - Rename 16 kebab-case docs to PascalCase per StyleGuide - Move per-language client design docs from docs/ to clients/<lang>/ alongside their READMEs - Add ## Related Documentation sections to 15 docs that lacked one - Fix sentence-case violations in H3 headings (StyleGuide rule) - Update cross-references in gateway.md, client READMEs, scripts, and generate-proto.ps1 helpers to follow the new paths - Add CLAUDE.md with build/test commands, the source-update verification matrix, the parity-first contract, and pointers to MXAccess and Galaxy Repository analysis sources Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 10:19:22 -04:00
Joseph Doherty	133c83029b	Add Galaxy repository API and clients	2026-04-29 07:27:00 -04:00
Joseph Doherty	907aa49aea	Improve gateway reliability and client e2e coverage	2026-04-28 06:11:18 -04:00
Joseph Doherty	4fc355b357	Improve gateway reliability and dashboard docs	2026-04-28 00:13:22 -04:00
Joseph Doherty	d431ff9660	Fix dashboard static assets and add client e2e scripts	2026-04-27 12:10:40 -04:00
Joseph Doherty	108a3d3f8a	Add client behavior fixtures	2026-04-26 19:11:04 -04:00
Joseph Doherty	6a40d26366	Publish stable client proto inputs	2026-04-26 18:52:39 -04:00

16 Commits
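
The `batch` subcommand described in commit 6126099cdb can be sketched roughly as follows. This is a minimal illustration, not the code from any of the five client CLIs. The `dispatch` function, the `ping` command, the string-valued `error` field, and the choice to skip blank input lines are all assumptions made for the example. Only the `__MXGW_BATCH_EOR__` marker, the one-command-per-line framing, and the `{"error":...}` envelope come from the commit message.

```python
import io
import json
import shlex

EOR = "__MXGW_BATCH_EOR__"  # end-of-record marker named in the commit message


def dispatch(argv):
    """Hypothetical stand-in for a CLI's normal subcommand dispatch."""
    if argv and argv[0] == "ping":
        return {"ok": True}
    raise ValueError(f"unknown subcommand: {' '.join(argv)}")


def run_batch(stdin, stdout):
    """Read one command per line; write its JSON result, then the EOR line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        try:
            result = dispatch(shlex.split(line))
        except Exception as exc:
            # A failing command reports an error envelope and the loop continues.
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()


def read_reply(lines):
    """Driver side: collect lines up to the EOR marker and parse them as JSON."""
    body = []
    for line in lines:
        if line.rstrip("\n") == EOR:
            break
        body.append(line)
    return json.loads("".join(body))


# In-memory demo: one command that succeeds, one that fails.
out = io.StringIO()
run_batch(io.StringIO("ping\nbogus\n"), out)
replies = iter(out.getvalue().splitlines(keepends=True))
first = read_reply(replies)
second = read_reply(replies)
```

Framing each reply with a sentinel line lets the driver keep one process alive per client and still know where one result ends, whether the JSON is emitted on a single line or pretty-printed across several.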