0a54fa5e35cb3526a2b620ce83ba78e2d9e781fc
16 Commits
6126099cdb
e2e: drive each client CLI through one long-lived batch process
The cross-language e2e matrix spawned one CLI process per operation —
~250 per client — paying a process (and, for the Java CLI, a full JVM)
cold-start every time. The Java leg alone ran ~16 minutes.
Each client CLI (dotnet, go, rust, python, java) gains a `batch`
subcommand: a single process that reads command lines from stdin one at
a time, runs each through the normal subcommand dispatch, writes the JSON result,
then a line containing exactly `__MXGW_BATCH_EOR__`. A failing command
writes its `{"error":...}` envelope and the loop continues.
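The batch protocol described above is a read-dispatch-write loop. Below is a minimal Python illustration, not any client's actual code. `run_batch`, the `dispatch` callable, and the use of `shlex.split` to tokenize each line are assumptions. The commit elides the envelope's contents, so the `{"error": ...}` body here is a placeholder that just carries the exception message.

```python
import json
import shlex
from typing import Any, Callable, Dict, List, TextIO

EOR = "__MXGW_BATCH_EOR__"  # end-of-record sentinel named in the commit


def run_batch(
    stdin: TextIO,
    stdout: TextIO,
    dispatch: Callable[[List[str]], Dict[str, Any]],
) -> None:
    """Hypothetical batch loop: one command per stdin line, one JSON record out."""
    for raw in stdin:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between commands
        try:
            # Assumption: the line is tokenized shell-style before normal dispatch.
            result = dispatch(shlex.split(line))
        except Exception as exc:  # a failing command must not end the loop
            # Placeholder envelope; the real error fields are not described.
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()  # the driver blocks on the sentinel, so never leave it buffered
```

Because the sentinel always follows the record, the reader never has to guess where one command's output ends. A failing command produces the same record-then-sentinel shape as a successful one.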
run-client-e2e-tests.ps1 now launches one batch process per client and
pings every operation through its stdin/stdout, so startup is paid once
per client. The orchestration and assertions are unchanged; the parity
and auth phases now read the `{"error":...}` envelope instead of a
process exit code.
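On the driver side, each operation writes one command line, reads until the sentinel, and decides pass or fail from the envelope rather than from an exit code. The real driver is PowerShell. This Python sketch only illustrates the framing, and `send_command`, `read_record`, and `is_error` are hypothetical names. It assumes the text between sentinels is a single JSON value, and it strips `\r` so Windows line endings do not defeat the exact-match check on the sentinel.

```python
import json
from typing import Any, TextIO

EOR = "__MXGW_BATCH_EOR__"


def read_record(stdout: TextIO) -> Any:
    """Read lines up to the sentinel and parse everything before it as JSON."""
    lines = []
    while True:
        raw = stdout.readline()
        if raw == "":
            raise EOFError("batch process exited before the end-of-record sentinel")
        line = raw.rstrip("\r\n")
        if line == EOR:  # the sentinel must match the whole line exactly
            return json.loads("\n".join(lines))
        lines.append(line)


def send_command(stdin: TextIO, stdout: TextIO, command: str) -> Any:
    """Send one command to a long-lived batch process and return its record."""
    stdin.write(command + "\n")
    stdin.flush()
    return read_record(stdout)


def is_error(record: Any) -> bool:
    # Failures are reported in-band as an {"error": ...} object, not an exit code.
    return isinstance(record, dict) and "error" in record
```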
Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java
leg dropped from ~16 min to ~2-3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b794c46bc7
File and fix Server-030 and Client.Dotnet-017 from e2e surfacing
Both findings surfaced when running the cross-language e2e matrix (scripts/run-client-e2e-tests.ps1) against the redeployed gateway at commit
1aafd6bde4
Code-review 2026-05-20 sweep #2: re-review at a020350, resolve 48 findings
Second re-review pass at commit
a0203503a7
Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit
1cd51bbda3
.NET CLI: bench-stream-events for max event-throughput characterization
New subcommand drives the gateway's StreamEvents server-stream as fast
as it can from a single client process. Subscribes to --bulk-size tags
(rotating through all six TestMachine attributes by default) and counts
events received over a --duration-seconds steady-state window. Tracks
events/sec, end-to-end latency (now - event.worker_timestamp), and any
worker faults observed via a post-run DrainEvents probe.
--session-count opens N gateway sessions from the same client process.
Each session is independent at the gateway (own worker, own event
subscriber, own item handles), so this measures how the gateway
multiplexes concurrent event streams without needing multiple client
processes. Sessions are staggered open by default
(--session-start-stagger-ms 750) because firing N concurrent
OpenSession calls forces N concurrent x86 worker spawns, and on a dev
rig that exceeds the gateway's 30-second worker startup timeout
around N >= 6-8. The stagger gives each worker headroom to initialize
its COM apartment and attach the event sink before the next one starts.
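A sketch of the staggered open, assuming (as the phase-1 description in this commit says) that sessions are opened one at a time with a fixed pause between them. `open_sessions_staggered` and the injected `open_session` coroutine are hypothetical stand-ins for the CLI's gateway calls. Injecting `sleep` only makes the pacing testable.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def open_sessions_staggered(
    count: int,
    stagger_ms: int,
    open_session: Callable[[int], Awaitable[T]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> List[T]:
    """Open `count` sessions one at a time, pausing `stagger_ms` between opens."""
    sessions: List[T] = []
    for index in range(count):
        if index > 0 and stagger_ms > 0:
            # Headroom for the previous worker to finish starting up
            # before the next OpenSession triggers another worker spawn.
            await sleep(stagger_ms / 1000.0)
        sessions.append(await open_session(index))
    return sessions
```

With the default 750 ms stagger, opening 8 sessions adds 7 pauses, about 5.25 s in total, before the steady-state window can begin.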
Phase 1 of the bench opens and subscribes every session sequentially;
phase 2 opens the steady-state window once every session is advised, so
the measurement isn't skewed by late-arriving sessions still in
warmup. The latency sample is shared across sessions (a locked
List<double>); event counts use Interlocked.
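The shared-counter design carries over to other runtimes. Here is a hedged Python analogue, not the .NET code: `StreamStats` is hypothetical, and a single `threading.Lock` stands in for both the locked List<double> and Interlocked. `start_window` models the phase-2 cut-over so warmup events are not counted. Timestamps are assumed to be in milliseconds.

```python
import threading
from typing import List, Tuple


class StreamStats:
    """Shared across sessions: an event counter and a latency sample.

    Python has no Interlocked, so one lock guards every field.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._measuring = False
        self._events = 0
        self._latencies_ms: List[float] = []

    def start_window(self) -> None:
        # Phase 2: only events after every session is advised are counted.
        with self._lock:
            self._measuring = True

    def record(self, now_ms: float, worker_timestamp_ms: float) -> None:
        with self._lock:
            if not self._measuring:
                return  # phase-1 warmup event; ignored
            self._events += 1
            # End-to-end latency: now - event.worker_timestamp (assumed ms).
            self._latencies_ms.append(now_ms - worker_timestamp_ms)

    def snapshot(self) -> Tuple[int, List[float]]:
        with self._lock:
            return self._events, list(self._latencies_ms)

    def events_per_second(self, window_s: float) -> float:
        events, _ = self.snapshot()
        return events / window_s
```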
Initial sweep at --bulk-size 120 against the dev galaxy (20 machines
x 6 attributes = 120 unique tags) showed:
- Linear throughput scaling with subscribed-tag count: N=6→2 ev/s,
N=24→8 ev/s, N=60→20 ev/s, N=120→41 ev/s. The dev galaxy is
producer-bound at ~0.34 events/sec per advised tag — gateway has
plenty of headroom.
- Latency stayed at p50 ≈17ms, p95 ≈34ms across the entire range —
no degradation with subscribed-tag count.
- Zero queue-overflow faults; gateway 10k-event buffer never came
close to filling at this producer rate.
- Linear scaling with session count too (staggered open): 1→44, 2→81,
4→130, 8→324 events/sec at p50 16ms across all session counts.
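The per-tag figure follows from dividing each sweep point's throughput by its advised-tag count. This quick check uses only the numbers reported above:

```python
# (advised tags, events/sec) from the --bulk-size sweep reported above.
sweep = [(6, 2), (24, 8), (60, 20), (120, 41)]

for tags, events_per_sec in sweep:
    per_tag = events_per_sec / tags
    print(f"N={tags:>3}: {per_tag:.3f} events/sec per advised tag")
```

Every point lands at roughly 0.33 to 0.34 events/sec per tag. That is what a producer-bound setup predicts: throughput tracks how many tags are advised, not gateway capacity.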
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
93633ce99c
Cross-language ReadBulk stress benchmark
Adds a bench-read-bulk subcommand to every client CLI (.NET, Go, Rust,
Python, Java) and a PowerShell driver that runs all five concurrently
against the deployed gateway and prints a side-by-side comparison.
Each CLI's bench:
- Opens its own session, registers, subscribes to bulk-size tags so the
worker's MxAccessValueCache populates from real OnDataChange events.
- Runs a warmup-seconds-long pre-loop with identical calls so JIT /
connection-pool / first-call overhead is amortised before the
measurement window.
- Runs ReadBulk in a tight in-process loop for duration-seconds with
per-call high-resolution latency capture (Stopwatch in .NET,
time.Now in Go, std::time::Instant in Rust, time.perf_counter in
Python, System.nanoTime in Java).
- Unsubscribes + closes the session, then emits one JSON object with
the shared schema: { language, durationMs, totalCalls, successfulCalls,
failedCalls, totalReadResults, cachedReadResults, callsPerSecond,
latencyMs: { p50, p95, p99, max, mean } }.
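The shared output schema can be produced from raw per-call latencies roughly as follows. This is a sketch with several assumptions. The commit does not say which percentile method the clients use (nearest-rank is assumed here), whether failed calls contribute latencies (here only successful calls do), or whether callsPerSecond includes failures (here it uses totalCalls). `bench_summary` and `percentile` are hypothetical helpers.

```python
import json
import math
from typing import Any, Dict, List


def percentile(sorted_ms: List[float], p: float) -> float:
    """Nearest-rank percentile over an ascending list (assumed method)."""
    if not sorted_ms:
        return 0.0
    rank = max(1, math.ceil(p / 100.0 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def bench_summary(
    language: str,
    duration_ms: float,
    success_latencies_ms: List[float],
    failed_calls: int,
    total_read_results: int,
    cached_read_results: int,
) -> Dict[str, Any]:
    ordered = sorted(success_latencies_ms)
    successful = len(ordered)
    total = successful + failed_calls
    return {
        "language": language,
        "durationMs": duration_ms,
        "totalCalls": total,
        "successfulCalls": successful,
        "failedCalls": failed_calls,
        "totalReadResults": total_read_results,
        "cachedReadResults": cached_read_results,
        "callsPerSecond": total / (duration_ms / 1000.0) if duration_ms > 0 else 0.0,
        "latencyMs": {
            "p50": percentile(ordered, 50),
            "p95": percentile(ordered, 95),
            "p99": percentile(ordered, 99),
            "max": ordered[-1] if ordered else 0.0,
            "mean": sum(ordered) / successful if successful else 0.0,
        },
    }


if __name__ == "__main__":
    # Emit one JSON object on stdout, which is what the driver parses.
    print(json.dumps(bench_summary("python", 2000.0, [4.0, 1.0, 3.0, 2.0], 0, 24, 24)))
```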
The PS driver (scripts/bench-read-bulk.ps1) launches one detached process
per client, waits for all to finish, parses the trailing JSON object from
each stdout, prints a comparison table, and persists the combined report
under artifacts/bench/. Quoting around Java's `gradle --args="..."` is
handled by writing a one-shot .bat that cmd.exe runs; the .NET CLI's
per-call gRPC timeout is auto-scaled to (Duration + Warmup + 30s) so the
channel-wide timeout doesn't cancel the bench mid-loop.
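One way to extract the trailing JSON object from output that may also contain log lines is to try decoding at each `{` until one object runs exactly to the end of the text. Below is a Python sketch of that idea. The real driver is PowerShell, and `trailing_json_object` and `bench_grpc_timeout_seconds` are hypothetical names. The second helper simply restates the Duration + Warmup + 30s rule from the commit.

```python
import json
from typing import Any, Dict, Optional


def trailing_json_object(stdout: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object that ends the output, or None if there is none."""
    text = stdout.rstrip()
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value, end = None, -1
        # The earliest '{' whose object reaches the very end is the outermost
        # trailing object; nested objects end before the final '}'.
        if end == len(text) and isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


def bench_grpc_timeout_seconds(duration_s: float, warmup_s: float) -> float:
    # Per-call deadline long enough that it cannot cancel the bench mid-loop.
    return duration_s + warmup_s + 30.0
```

Unlike taking only the last line, this also works if a client pretty-prints its report across several lines.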
Live 30-second steady-state run against the deployed gateway, all five
clients hitting the same six TestMachine_001..006.TestChangingInt tags:
client   calls/sec   cached/total   p50 (ms)   p95 (ms)   p99 (ms)   max (ms)
dotnet      171.78    30924/30924       3.84      14.06      40.41     542.48
go          175.46    31590/31590       3.93      13.52      41.26     243.00
rust        123.26    22188/22188       5.52      15.78      48.11     544.41
python      145.79    26244/26244       4.86      14.85      41.65     645.84
java        181.12    32604/32604       3.80      10.59      33.37     344.27
143,550 ReadBulk results across all five clients during the 30s window;
100% returned was_cached = true (the worker's cache fast-path never fell
through to the snapshot lifecycle). Aggregate read throughput ~800
calls/sec against five concurrent sessions sharing the same cached tags.
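The aggregate figures can be reproduced from the table. Each ReadBulk call returns one result per subscribed tag (six here), so dividing the result count by 6 gives the call count, and the per-client rates sum to roughly 800 calls/sec:

```python
# Per-client figures from the 30-second table above: (calls/sec, read results).
runs = {
    "dotnet": (171.78, 30924),
    "go": (175.46, 31590),
    "rust": (123.26, 22188),
    "python": (145.79, 26244),
    "java": (181.12, 32604),
}
TAGS_PER_CALL = 6  # TestMachine_001..006.TestChangingInt
WINDOW_S = 30.0

total_results = sum(results for _, results in runs.values())
aggregate_calls_per_sec = sum(rate for rate, _ in runs.values())

for name, (rate, results) in runs.items():
    calls = results // TAGS_PER_CALL
    print(f"{name:<6} {calls} calls -> {calls / WINDOW_S:.2f}/s (reported {rate})")

print(total_results)                      # 143550
print(round(aggregate_calls_per_sec, 2))  # 797.41
```

The small gaps between derived and reported rates are consistent with each client's measured window running slightly over 30 s.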
A second variant with bulk-size 20 sustained the same per-client call
rate while delivering 3.3x more values per call (~37,000 cached reads/sec
aggregate across the five concurrent sessions), confirming the linear
per-tag cache lookup inside one call is not a bottleneck at this scale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eaa7093cd6
.NET CLI: register the five new bulk subcommands in IsKnownGatewayCommand
The previous commit added read-bulk / write-bulk / write2-bulk /
write-secured-bulk / write-secured2-bulk dispatch cases to RunCoreAsync
but left them out of IsKnownGatewayCommand, so the .NET CLI rejected
them at the pre-dispatch gate and printed the usage banner instead of
running the new code paths.
Surfaced when the live e2e exercised the read-bulk phase against the
deployed gateway: the call routed through the unknown-command path
before reaching the protobuf builder.
Also extends WriteUsage with one line per new subcommand so the banner
documents the new surface.
Live e2e against the deployed gateway now passes for all five clients
(dotnet, go, rust, python, java) with 4/4 tags returning was_cached=true
after the subscribe-bulk + read-bulk path, confirming the worker
MxAccessValueCache populates from real MXAccess OnDataChange events and
round-trips through every client's JSON parser.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f220908f3f
Add bulk read/write CLI subcommands and e2e matrix coverage
The previous commit added the bulk read/write library surface in every
client; this commit makes that surface reachable from each client's CLI
and exercises it through scripts/run-client-e2e-tests.ps1.
Five new subcommands in every client CLI (.NET / Go / Rust / Python /
Java): read-bulk, write-bulk, write2-bulk, write-secured-bulk, and
write-secured2-bulk. Each follows the existing subscribe-bulk shape:
- read-bulk takes --server-handle, --items <csv tag list>, and
--timeout-ms (0 = worker default). JSON output carries the
BulkReadResult fields, including was_cached so the e2e matrix can
verify the cached-path semantics.
- The four bulk-write families take --server-handle, --item-handles
<csv>, --type, --values <csv>. write2-bulk and write-secured2-bulk
add a single --timestamp applied to every entry; the secured
variants take --current-user-id and --verifier-user-id. All four
output BulkWriteResult JSON.
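The paired CSV flags might turn into per-entry write requests as in the hedged sketch below. The commit does not list the accepted `--type` values or spell out validation, so the type names, the equal-length check, and `build_bulk_write_entries` itself are assumptions. The optional timestamp mirrors the rule that write2-bulk and write-secured2-bulk apply one --timestamp to every entry.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical --type vocabulary; the real CLIs may use different names.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "string": str,
    "bool": lambda s: {"true": True, "false": False}[s.strip().lower()],
}


def build_bulk_write_entries(
    item_handles_csv: str,
    type_name: str,
    values_csv: str,
    timestamp: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Pair the i-th item handle with the i-th value, parsed by --type."""
    handles = [int(h) for h in item_handles_csv.split(",")]
    raw_values = values_csv.split(",")
    if len(handles) != len(raw_values):
        raise ValueError(f"{len(handles)} item handles but {len(raw_values)} values")
    parse = PARSERS[type_name]
    entries = []
    for handle, raw in zip(handles, raw_values):
        entry: Dict[str, Any] = {"item_handle": handle, "value": parse(raw)}
        if timestamp is not None:
            entry["timestamp"] = timestamp  # one timestamp shared by every entry
        entries.append(entry)
    return entries
```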
A new -SkipReadWriteBulk switch on the matrix script (off by default, so
both phases run unless it is set) gates two new e2e phases:
- After the existing subscribe-bulk phase leaves tags advised, the
script runs read-bulk against the same tag list and asserts most
results return was_cached = true. This is the only e2e coverage of
the cache-then-snapshot fork — the unit + gateway tests verify the
semantics with a fake worker, but only the live cross-language
matrix proves the cache populates from real OnDataChange events and
survives the round-trip through every client's JSON parser.
- When -VerifyWrite is set, the write phase now also runs a single-
entry write-bulk against the same writable item handle (using a
distinct sentinel value) and asserts a per-entry success. Confirms
the BulkWriteResult wire format end-to-end without complicating
the OnWriteComplete echo assertion the single-item phase already
verifies.
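The "most results return was_cached = true" assertion could look like the following. The threshold is an assumption: the commit only says "most", and strictly more than half is used here. The input shape is also assumed to be a list of per-tag result objects carrying `was_cached`. `assert_mostly_cached` is hypothetical.

```python
from typing import Any, Dict, List


def assert_mostly_cached(results: List[Dict[str, Any]], min_fraction: float = 0.5) -> int:
    """Fail unless more than `min_fraction` of read-bulk results were cache hits."""
    if not results:
        raise AssertionError("read-bulk returned no results")
    # A missing or non-boolean field counts as a miss, so a parser that
    # drops was_cached fails the check instead of passing silently.
    cached = sum(1 for r in results if r.get("was_cached") is True)
    if cached / len(results) <= min_fraction:
        raise AssertionError(f"only {cached}/{len(results)} results had was_cached = true")
    return cached
```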
Dry-run validation passes for all five clients: each emits the correct
read-bulk and write-bulk CLI invocations with the right flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
89043cb2b6
Resolve Client.Dotnet-004..008 code-review findings
- Client.Dotnet-004: documented DefaultCallTimeout as both the
per-attempt deadline and the shared retry budget, and removed
DeadlineExceeded from the transient-retry set (a client-imposed deadline
cannot be helped by retrying).
- Client.Dotnet-005: RegisterAsync/AddItemAsync/AddItem2Async silently
returned 0 when a successful reply lacked the typed payload. They now
throw a descriptive MxGatewayException.
- Client.Dotnet-006: added XML docs to the previously undocumented public
members MaxGrpcMessageBytes, GatewayProtocolVersion, and
WorkerProtocolVersion.
- Client.Dotnet-007: corrected the AcknowledgeAlarmAsync XML comment,
which named a non-existent invoke:alarm-ack sub-scope; the RPC requires
the admin scope.
- Client.Dotnet-008: the CLI redactor missed env-var-sourced keys because
the caller passed only the --api-key option. Redaction now uses the same
resolver, so env-var-sourced keys are stripped too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ddad573b75
Merge origin/main with local pending work and update AGENTS.md references
- Resolve 14 conflicts from popping local stash on top of origin's
eed1e88a37
Add XML documentation across gateway, worker, and .NET client
133c83029b
Add Galaxy repository API and clients
907aa49aea
Improve gateway reliability and client e2e coverage
4fc355b357
Improve gateway reliability and dashboard docs
499708b2a2
Issue #40: implement .NET values status errors and CLI
7331c6157a
Scaffold .NET client projects