Commit Graph

232 Commits

Author SHA1 Message Date
Joseph Doherty 430187c28b code-reviews: regenerate after batch 1 resolutions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:39 -04:00
Joseph Doherty f5b50c4484 Resolve Client.Python-022..026: TLS-by-default, batch CLI, README
Client.Python-022  README CLI examples for stream-alarms and
                   acknowledge-alarm now use the correct flags;
                   regression test parses every documented line through
                   Click.
Client.Python-023  Re-applied Client.Python-013 — _use_plaintext drops
                   the silent localhost / 127.0.0.1 auto-downgrade
                   branch; --plaintext and --tls are mutually exclusive
                   and TLS is the default.
Client.Python-024  batch dispatch routes through main.main(...,
                   standalone_mode=False) under a redirected stdout
                   instead of click.testing.CliRunner; recursive batch
                   lines are rejected outright.
Client.Python-025  Added behavioural tests for the five bulk SDK methods,
                   stream_alarms, and the new CLI subcommands.
Client.Python-026  _bench_read_bulk hoists 'import time' to module scope
                   and logs cleanup failures instead of swallowing them.

All resolved at 2026-05-24; python -m pytest is 65/65 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:27 -04:00
Joseph Doherty 4a0f88b17d Resolve Client.Rust-022..029: MalformedReply, correlation ids, clippy
Client.Rust-022  Restored Error::MalformedReply for register / add_item /
                 add_item2 and the bulk-subscribe / read-bulk / write-bulk
                 dispatch arms so malformed-but-OK replies fail loudly
                 instead of returning Vec::new().
Client.Rust-023  Restored next_correlation_id and routed every CLI close /
                 stream-alarms / acknowledge-alarm / bench-read-bulk call
                 through it so each call carries a unique opaque token.
Client.Rust-024  Added round-trip tests for read_bulk / write_bulk /
                 write2_bulk / write_secured_bulk / write_secured2_bulk
                 plus stream_alarms and percentile_summary unit tests.
Client.Rust-025  RustClientDesign.md re-synced — new bulk SDK, alarms
                 surface, Error variants, CLI command list, and the
                 Windows stack workaround.
Client.Rust-026  Session::read_bulk now borrows a tag slice; bench-read-
                 bulk binds tags once outside the warm-up / steady-state
                 loops.
Client.Rust-027  .cargo/config.toml selector tightened to
                 cfg(all(windows, target_env = "msvc")) and comment
                 rewritten to match reality (release + debug ship the
                 8 MB reservation).
Client.Rust-028  run_batch removed the empty-line break; stdin EOF is
                 the only terminator.
Client.Rust-029  Re-applied Client.Rust-001 / 002 / 012 — added the
                 missing doc comments, renamed BulkReplyKind variants,
                 and replaced the clone-on-copy with a deref under lock
                 so cargo clippy -D warnings is clean.

All resolved at 2026-05-24; cargo fmt + check + clippy + test all green
(55 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:50:15 -04:00
Joseph Doherty 82996aa8e6 Resolve Client.Go-022..027: bulk flags, bench cancel, batch loop
Client.Go-022  Re-applied Client.Go-015 shape — runWriteBulkVariant drops
               the unused secured param and gates -current-user-id /
               -verifier-user-id / -user-id behind the secured-only
               variants.
Client.Go-023  Re-applied Client.Go-018 shape — bench warm-up and steady-
               state loops respect ctx.Err().
Client.Go-024  Added SDK-level tests for WriteBulk / Write2Bulk /
               WriteSecuredBulk / WriteSecured2Bulk / ReadBulk and
               StreamAlarms via the existing bufconn fake gateway pattern.
Client.Go-025  Five bulk SDK methods short-circuit on empty input without
               an RPC round-trip and document the behavior.
Client.Go-026  runBatch widens scanner.Buffer to 16 MiB and emits an
               error-with-sentinel if a longer line still arrives, rather
               than aborting the session silently.
Client.Go-027  runBatch treats blank lines as skip-and-continue; only EOF
               ends the session.

All resolved at 2026-05-24; gofmt + go vet + go build + go test ./... all
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:58 -04:00
Joseph Doherty 712cb06442 Resolve Client.Dotnet-018..021: README + bench-read-bulk hardening
Client.Dotnet-018  README CLI examples for stream-alarms / acknowledge-alarm
                   replaced with parser-correct flags; new theory test
                   parses each documented README example through the CLI.
Client.Dotnet-019  BenchReadBulkAsync routes through new
                   RequireRegisterServerHandle helper that fails loudly when
                   the OK register reply has no typed payload.
Client.Dotnet-020  Bench steady-state catch is now
                   catch (Exception ex) when (ex is not OperationCanceledException)
                   so user-driven cancellation exits promptly.
Client.Dotnet-021  --timeout-ms now flows through ParseTimeoutMs which
                   rejects negatives with a clear error in both read-bulk
                   and bench-read-bulk.

All resolved at 2026-05-24; 67/67 .NET client tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:45 -04:00
Joseph Doherty 4d77279e7e Resolve Server-044..050: KillWorker accounting + admin service hardening
Server-044  KillWorkerAsync catch path now calls _metrics.SessionRemoved
            so the open-session gauge does not leak when KillWorker throws.
Server-045  KillWorkerAsync routes through a new
            GatewaySession.KillWorkerWithCloseGateAsync that takes the
            per-session close lock, so concurrent kills count SessionsClosed
            exactly once.
Server-046  CloseSessionCoreAsync's SessionCloseStartedException branch and
            ShutdownAsync's kill fallback both increment SessionsClosed (not
            just the gauge), so the counter and gauge stay consistent.
Server-047  ApiKeysPage.ConfirmPendingAsync holds PendingAction across the
            awaited action and clears it in finally, matching the sessions
            pages.
Server-048  Closed: the 044/045 regression tests cover the previously-
            untested kill paths.
Server-049  IDashboardSessionAdminService + DashboardSessionAdminService
            now carry XML docs that pin the Admin gate, missing-session
            return-Fail semantics, and the dashboard-admin-kill reason.
Server-050  CloseSessionAsync and KillWorkerAsync catch unexpected
            exceptions after the SessionManagerException catches and return
            a friendly Fail; OperationCanceledException tied to the caller
            token still propagates.

All resolved at 2026-05-24; 503/503 gateway tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:49:34 -04:00
Joseph Doherty 6079c62709 code-reviews: regenerate index at 42b0037
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:56 -04:00
Joseph Doherty 37ef27e8ed code-reviews: bump Worker + Worker.Tests headers to 42b0037
No source changes since d692232 — header bumped to track the latest
reviewed commit, all prior findings remain closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:56 -04:00
Joseph Doherty db2218f395 code-reviews: re-review Client.Java at 42b0037
Append 5 new findings (Client.Java-032..036): README flags for new
alarm subcommands do not exist; StreamAlarmsCommand bounded queue
silently drops alarms instead of fail-fast; BatchCommand whitespace
tokenisation shreds quoted args; no library-side stream_alarms test;
MxGatewayAlarmFeedSubscription is the fourth duplicate subscription
class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:55 -04:00
Joseph Doherty bc28fee641 code-reviews: re-review Client.Python at 42b0037
Append 5 new findings (Client.Python-022..026): README flags for new
alarm subcommands do not exist; Client.Python-013 regression — the
silent localhost auto-plaintext branch is still present (the prior
Resolution did not survive the rename); production batch path uses
the click.testing.CliRunner helper; no behavioural tests for new SDK
+ CLI; bench cleanup swallows exceptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:55 -04:00
Joseph Doherty 15fceed536 code-reviews: re-review Client.Rust at 42b0037
Append 8 new findings (Client.Rust-022..029): MalformedReply path
absent from the new bulk SDK methods, hard-coded client_correlation_id
in new CLI commands, no tests for stream_alarms / bulk SDK / bench,
RustClientDesign.md silent on new surface, run_bench_read_bulk clones
tags inside the loop, .cargo/config.toml comment is wrong, run_batch
exits on blank line, and cargo clippy fails at HEAD with three
warnings the prior reviewer punted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:41 -04:00
Joseph Doherty afa82e0989 code-reviews: re-review Client.Go at 42b0037
Append 6 new findings (Client.Go-022..027): regressions of Client.Go-015
(runWriteBulkVariant secured flag) and Client.Go-018 (bench loop
ignores ctx.Err()) reintroduced by the bulk port; no SDK tests for the
new WriteBulk/ReadBulk/StreamAlarms methods; bulk SDK accepts empty
slices; bufio.Scanner buffer + blank-line sentinel can abort the batch
session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:41 -04:00
Joseph Doherty b9ef09d26e code-reviews: re-review Client.Dotnet at 42b0037
Append 4 new findings (Client.Dotnet-018..021): README flags for the
new stream-alarms/acknowledge-alarm subcommands cite options that do
not exist on the CLI; BenchReadBulkAsync reinstates the silent
register-handle fallback and swallows OperationCanceledException;
both new --timeout-ms consumers cast int32 to uint without bounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:40 -04:00
Joseph Doherty 7d66967122 code-reviews: re-review IntegrationTests at 42b0037
Append 1 new finding (IntegrationTests-025): the
ResolveRepositoryRoot_NoMarkers test walks up from Path.GetTempPath()
through unisolated ancestors; a redirected TMP or co-located checkout
at C:\src silently breaks the assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:26 -04:00
Joseph Doherty 2f8404d2ef code-reviews: re-review Contracts at 42b0037
No new findings — the only Contracts commit in window (bd1d1f1) is a
comment-only proto edit; field numbering remains additive with no
reuse, renumbering, or type narrowing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:25 -04:00
Joseph Doherty 2b92be02b9 code-reviews: re-review Tests at 42b0037
Append 5 new findings (Tests-027..031) covering the
StreamEvents_WhenEventIsWritten_RecordsSendDuration flake root cause
(shared MeterListener by meter name), missing kill-path coverage
(reason propagation + concurrent-kill double-count), asymmetric guard
coverage between Close and Kill, missing audit-failure-path coverage
for ApiKey Delete, and the DashboardSnapshotPublisher reconnect-window
timer sensitivity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:15 -04:00
Joseph Doherty 056f0d8808 code-reviews: re-review Server at 42b0037
Append 7 new findings (Server-044..050) covering the destructive-action
wave: KillWorkerAsync metric/state leaks, ShutdownAsync kill-fallback
gauge leak, inconsistent ConfirmDialog cleanup across pages, missing
XML docs on the new DashboardSessionAdmin surface, and unhandled
RemoveSessionAsync exception paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:15 -04:00
Joseph Doherty 42b0037376 Dashboard: replace inline fully-qualified type refs with @using
ApiKeysPage and GalaxyPage referenced ApiKeyConstraints and
GalaxyRepositoryOptions by full namespace inside @code. Add the
appropriate @using directive at the top of each file and let the
short type names resolve through that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:51:00 -04:00
Joseph Doherty de7639a3e9 Dashboard: fix 500 on / from duplicate endpoint mapping
GatewayApplication.MapGatewayEndpoints registered MapGet("/", ...)
that redirected to /health/live. After the dashboard added a Blazor
component at @page "/", both endpoints matched GET / and the matcher
threw AmbiguousMatchException, surfacing as a 500. The redirect
predated the dashboard home page — drop it so / lands on Overview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:48:43 -04:00
Joseph Doherty 8738735f0d clients: document StreamAlarms + AcknowledgeAlarm in each README
Each client's README now covers the alarms surface in both the SDK
section (StreamAlarms / AcknowledgeAlarm beside the existing
QueryActiveAlarms entry, with the streaming-cancellation note) and
the CLI examples (stream-alarms / acknowledge-alarm invocations
mirroring the in-tree implementations across .NET, Go, Rust, Python,
and Java).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:40:23 -04:00
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00
Joseph Doherty 24cc5fd0f0 Dashboard: delete revoked API keys + confirm Rotate/Revoke/Delete
Add IApiKeyAdminStore.DeleteAsync that only deletes already-revoked
rows (active keys must be revoked first so the revoke event lands in
the audit log before the row disappears) and a matching admin-gated
DashboardApiKeyManagementService.DeleteAsync. ApiKeysPage now shows
Delete on revoked rows in place of the old "No actions" stub, and
Rotate/Revoke/Delete all route through ConfirmDialog so each
destructive action requires an explicit confirmation step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:30:30 -04:00
Joseph Doherty c5153d68bb Dashboard: fix API keys page stuck on "Loading"
ApiKeysPage.OnInitializedAsync overrode the base without chaining, so
DashboardPageBase's Snapshot seed and hub connect never ran and the
page rendered the null-snapshot empty state forever.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:22:38 -04:00
Joseph Doherty 0e56b5befb Dashboard: confirm before Close session / Kill worker
Add a shared ConfirmDialog component and route Sessions, Workers, and
SessionDetails Close/Kill buttons through it. The dialog shows the
target session id and a color-matched confirm button (yellow Close,
red Kill); Cancel dismisses without invoking the admin service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:17:32 -04:00
Joseph Doherty c5e7479ee4 Dashboard: admin-only Close session / Kill worker
Add IDashboardSessionAdminService (Admin-role gate, friendly errors,
audit logging) wrapping a new ISessionManager.KillWorkerAsync that
skips graceful shutdown and cleans up registry/metrics. Sessions,
Workers, and SessionDetails pages render Close / Kill buttons only
when CanManage; the service re-checks the role on every call so
forged clicks return Unauthenticated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:10:32 -04:00
Joseph Doherty 8a0c59d7e8 Java client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to the Java CLI. stream-alarms
attaches to the gateway's central alarm feed (--filter-prefix, --limit, --json
— NDJSON, one AlarmFeedMessage per line); acknowledge-alarm is a unary ack
(--reference required, --comment, --operator). streamAlarms joins
queryActiveAlarms on MxGatewayClient and uses a new
MxGatewayAlarmFeedSubscription cancellable handle. Batch dispatch re-enters the
picocli command line per stdin line, so registering the two new subcommands
suffices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 06:46:03 -04:00
Joseph Doherty 828e3e6cf6 Python client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to mxgw-py. stream-alarms reads a
bounded slice of the gateway's central alarm feed (--filter-prefix,
--max-messages, --timeout, --json; aggregate `{messages: [...]}`);
acknowledge-alarm is a unary ack (--reference required, --comment, --operator).
GatewayClient.stream_alarms joins query_active_alarms via a
_canceling_alarm_feed_iterator helper mirroring the existing
_canceling_active_alarms_iterator pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 06:45:54 -04:00
Joseph Doherty 7de4efeb02 Rust client: port stream-alarms and acknowledge-alarm + fix stream-events family + 8MB Windows stack
Adds the session-less alarm CLI subcommands to mxgw. stream-alarms attaches to
the gateway's central alarm feed (--filter-prefix, --max-events, --json/--jsonl;
aggregate shape `{messageCount, messages: [...]}`); acknowledge-alarm is a unary
ack (--reference required, --comment, --operator). stream_alarms joins
query_active_alarms on GatewayClient and re-exports AlarmFeedStream.

Also extends stream-events JSON to emit a full `events` array (itemHandle, value
projected to protojson-shaped `*Value` keys, etc.) instead of just `eventCount`,
matching the other four CLIs, and renders MxEvent.family as the protobuf enum
NAME (MX_EVENT_FAMILY_ON_WRITE_COMPLETE) rather than the raw i32 so the e2e
write round-trip can recognise the OnWriteComplete echo.

Adds clients/rust/.cargo/config.toml bumping the Windows main-thread stack to
8 MB via /STACK:8388608. clap-derive's Command enum (one variant per subcommand)
overflowed the default 1 MB stack in debug builds after the new variants
landed; release builds were unaffected but the e2e matrix runs Rust via
`cargo run` (debug).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 06:45:46 -04:00
Joseph Doherty 6f0d142639 Go client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to mxgw-go. stream-alarms attaches
to the gateway's central alarm feed (--filter-prefix, --limit, --json — NDJSON,
one AlarmFeedMessage per line); acknowledge-alarm is a unary ack (--reference
required, --comment, --operator). StreamAlarms joins QueryActiveAlarms on the
public Client and is wired through the existing batch dispatcher via runWithIO.
SDK type aliases for StreamAlarmsRequest / AlarmFeedMessage / StreamAlarmsClient
land alongside the existing alarm types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 06:45:32 -04:00
Joseph Doherty 11cc6715ed .NET client: port stream-alarms and acknowledge-alarm + fix stream-events OCE
Adds the session-less alarm CLI subcommands. stream-alarms attaches to the
gateway's central alarm feed (--filter-prefix, --max-events, --json/--jsonl);
acknowledge-alarm is a unary ack (--reference required, --comment, --operator).
StreamAlarmsAsync joins QueryActiveAlarmsAsync on MxGatewayClient and the
transport interface; the CLI client interface, adapter, and FakeGatewayTransport
follow.

Also fixes the OCE bug exposed by -VerifyWrite in the cross-language e2e:
StreamEventsAsync's await foreach now swallows OperationCanceledException when
the supplied cancellation token is the one that fired (graceful end-of-window),
and RunBatchAsync no longer excludes OCE from its outer catch — so a streaming
command that hits its --timeout reports a JSON error inside its EOR-delimited
record instead of killing the long-lived batch process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 06:45:24 -04:00
Joseph Doherty f90bff01db Java client: port bulk read/write SDK methods + CLI subcommands
Final language in the bulk-CLI port wave. HEAD's MxGatewaySession had
only the subscribe-style bulks; this commit adds the value-bulks plus
matching picocli subcommands and a bench-read-bulk harness.

SDK (MxGatewaySession.java):
- List<BulkWriteResult> writeBulk(int serverHandle, List<WriteBulkEntry> entries)
- List<BulkWriteResult> write2Bulk(int serverHandle, List<Write2BulkEntry> entries)
- List<BulkWriteResult> writeSecuredBulk(int serverHandle, List<WriteSecuredBulkEntry> entries)
- List<BulkWriteResult> writeSecured2Bulk(int serverHandle, List<WriteSecured2BulkEntry> entries)
- List<BulkReadResult> readBulk(int serverHandle, List<String> tagAddresses, Duration timeout)

readBulk uses java.time.Duration for the timeout parameter (idiomatic
Java) and internally converts to the timeoutMs proto field;
Duration.ZERO / null both delegate to the worker default. Per-entry
secured user ids stay on each WriteSecured(2)BulkEntry to match the
proto's per-row shape.

CLI (MxGatewayCli.java):
- read-bulk / write-bulk / write2-bulk / write-secured-bulk /
  write-secured2-bulk as picocli @Command subcommands. Write families
  share value-parsing logic; gating of --current-user-id /
  --verifier-user-id / --timestamp matches the cross-language flag
  contract.
- bench-read-bulk: --iterations / --warmup loop with avg/min/max ms
  reporting plus a --json mode that emits the cross-language bench
  JSON schema.

A small fixture in MxGatewayCliTests.FakeSession adds stub
implementations of the five new interface methods so the test module
compiles.

Verification: gradle build BUILD SUCCESSFUL (4 tasks executed, all
tests pass); gradle :zb-mom-ww-mxgateway-cli:installDist BUILD
SUCCESSFUL. Manual smoke against live gateway on localhost:5120:
open-session → register → read-bulk cold (wasCached=false both tags)
→ subscribe-bulk → read-bulk warm (wasCached=true both tags) →
write-bulk int32 111,222 (both wasSuccessful=true) → write2-bulk
timestamped (both wasSuccessful=true) → write-secured-bulk and
write-secured2-bulk return per-entry MXAccess "Value does not fall
within the expected range" failures with the configured user/verifier
ids (0,0) — confirming the SDK does NOT throw on per-entry MXAccess
failures and surfaces them through BulkWriteResult exactly as the
.NET and Go ports do → bench-read-bulk iterations=20 avg=9.5 ms
last_success=2/2 cached=2/2 → close-session SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:50:34 -04:00
Joseph Doherty 6add4b4acc Python client: port bulk read/write SDK methods + CLI subcommands
Mirrors the .NET / Go ports of divergent branch commit f220908. HEAD's
Session class had only the subscribe-style bulks; this commit adds the
value-bulk SDK surface plus matching CLI subcommands and a
bench-read-bulk harness.

SDK (zb_mom_ww_mxgateway/session.py):
- async def write_bulk(server_handle, entries, *, correlation_id="")
  → list[pb.BulkWriteResult]
- async def write2_bulk(server_handle, entries, *, correlation_id="")
  → list[pb.BulkWriteResult]
- async def write_secured_bulk(server_handle, entries, *, correlation_id="")
  → list[pb.BulkWriteResult]
- async def write_secured2_bulk(server_handle, entries, *, correlation_id="")
  → list[pb.BulkWriteResult]
- async def read_bulk(server_handle, tag_addresses, *, timeout_ms=0,
  correlation_id="") → list[pb.BulkReadResult]

All five reuse the existing _ensure_bulk_size validator and route
through the existing invoke() pipeline. read_bulk additionally enforces
timeout_ms >= 0.

CLI (zb_mom_ww_mxgateway_cli/commands.py):
- read-bulk / write-bulk / write2-bulk / write-secured-bulk /
  write-secured2-bulk registered as click @main.command(...). The
  write families share a _build_write_bulk_entries() helper that parses
  --item-handles and --values with a single --type, validates count
  match, converts via to_mx_value, and assembles the correct per-entry
  proto message.
- bench-read-bulk: opens its own session, subscribes to --bulk-size
  TestMachine_NNN.TestChangingInt tags, runs warmup then steady-state
  ReadBulk for --duration-seconds with time.perf_counter() latency
  capture, and emits the shared JSON schema (language, durationMs,
  totalCalls, successfulCalls, failedCalls, totalReadResults,
  cachedReadResults, callsPerSecond, latencyMs:{p50,p95,p99,max,mean})
  so scripts/bench-read-bulk.ps1 collates Python alongside the four
  other clients. _percentile_summary + linear-interpolation
  _percentile helper match the Go / .NET implementations.

to_mx_value is added to the existing values-module import line in
commands.py since the bulk-write commands need it.

Verification: python -m pip install -e . --quiet --no-deps; pytest
42/42 passing. Manual smoke against live gateway on localhost:5120:
open-session → register → subscribe-bulk on two
TestMachine_NNN.TestChangingInt tags (both wasSuccessful=true) →
read-bulk (both wasSuccessful=true / wasCached=true / int32 values
present) → close-session SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:50:10 -04:00
Joseph Doherty 325106920f Rust client: port BenchReadBulk subcommand + session.rs tightening
The bulk-write/read SDK methods (read_bulk, write_bulk, write2_bulk,
write_secured_bulk, write_secured2_bulk) and the matching clap
subcommands (ReadBulk, WriteBulk, Write2Bulk, WriteSecuredBulk,
WriteSecured2Bulk) were already on HEAD from a prior session — they
were the only bulk family that HEAD shipped before the .NET / Go /
Python / Java parallel ports. The one missing piece from the divergent
branch (commit f220908) was the BenchReadBulk benchmark harness.

mxgw-cli/src/main.rs adds:
- BenchReadBulk clap variant with flags --client-name,
  --duration-seconds, --warmup-seconds, --bulk-size, --tag-start,
  --tag-prefix, --tag-attribute, --timeout-ms, --json — defaults match
  the .NET and Go benches.
- run_bench_read_bulk(): open-session → register → subscribe_bulk on
  the synthesized TestMachine_NNN.TestChangingInt tags to populate the
  worker value cache → warmup → steady-state loop with per-call
  std::time::Instant capture → unsubscribe → close-session.
- BenchStats + LatencySummary structs and a percentile()
  helper (nearest-rank with linear interpolation, matching the Go and
  .NET implementations) so the cross-language JSON output is byte-for-
  byte comparable. JSON schema: language / command / endpoint /
  clientName / bulkSize / durationSeconds / warmupSeconds / durationMs
  / tags / totalCalls / successfulCalls / failedCalls /
  totalReadResults / cachedReadResults / callsPerSecond /
  latencyMs:{p50,p95,p99,max,mean}. scripts/bench-read-bulk.ps1 will
  pick up the Rust line on its next run.

session.rs picks up minor tightening tied to the bulk SDK methods that
were already in the file (per-entry validation paths, BulkReplyKind
dispatch coverage) — no public-surface change.

Verification: cargo build --workspace clean (the 2 pre-existing
options.rs missing_docs warnings remain — out of scope); cargo test
--workspace 34/34 passing; cargo clippy --workspace --all-targets has
only the 3 pre-existing tolerated warnings (enum_variant_names on
BulkReplyKind, missing_docs on options.rs, clone_on_copy on
galaxy.rs:282). Manual smoke against live gateway on localhost:5120:
read-bulk on two TestMachine tags returned wasCached=true,
wasSuccessful=true; bench-read-bulk --duration-seconds 2
--warmup-seconds 1 --bulk-size 2 --json ran 363 calls / 181.35 calls
per second / p50=5.3 ms / p99=7.8 ms / 726 of 726 cached reads, all
emitting valid JSON in the shared bench schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:50:09 -04:00
Joseph Doherty 8aaab82287 Go client: port bulk read/write SDK methods + CLI subcommands
Mirrors the .NET addition: HEAD's session.go had only the subscribe-style
bulks (AddItemBulk / AdviseItemBulk / RemoveItemBulk / UnAdviseItemBulk /
SubscribeBulk / UnsubscribeBulk). This commit ports the value-bulk SDK
surface and CLI subcommands from divergent branch commit f220908.

SDK (clients/go/mxgateway/session.go):
- WriteBulk(ctx, serverHandle int32, entries []*WriteBulkEntry)
- Write2Bulk(ctx, ..., entries []*Write2BulkEntry)
- WriteSecuredBulk(ctx, ..., entries []*WriteSecuredBulkEntry)
- WriteSecured2Bulk(ctx, ..., entries []*WriteSecured2BulkEntry)
- ReadBulk(ctx, serverHandle int32, tagAddresses []string, timeout time.Duration)
  → []*BulkReadResult

types.go gains public re-exports of the generated proto types
(WriteBulkCommand, WriteBulkEntry, Write2BulkCommand, Write2BulkEntry,
WriteSecuredBulkCommand, WriteSecuredBulkEntry, WriteSecured2BulkCommand,
WriteSecured2BulkEntry, ReadBulkCommand, BulkWriteReply, BulkWriteResult,
BulkReadReply, BulkReadResult) so external callers can construct entries
through the public `mxgateway` package without dipping into the internal
generated path.

CLI (clients/go/cmd/mxgw-go/main.go):
- read-bulk, write-bulk, write2-bulk, write-secured-bulk,
  write-secured2-bulk routed through runWithIO. write families share a
  runWriteBulkVariant helper that gates per-variant flags
  (--current-user-id, --verifier-user-id, --timestamp) so the
  Client.Go-015 flag-gating contract is preserved.
- bench-read-bulk: percentile + timing helpers; JSON output schema
  identical to the .NET / Rust / Python / Java benches.

parseInt32List was changed from panic-on-error to ([]int32, error) so
the new write-bulk commands surface parse errors gracefully; the
existing runUnsubscribeBulk caller is updated accordingly.

Verification: go build ./... + go vet ./... + go test ./... all clean.
Manual smoke against live gateway on localhost:5120: open-session →
register → subscribe-bulk on 3 TestMachine_NNN.TestChangingInt tags
(all wasSuccessful=true) → read-bulk (all wasSuccessful=true /
wasCached=true) → write-bulk int32 100/200/300 (all wasSuccessful=true)
→ close-session SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:49:33 -04:00
Joseph Doherty b3ae200b11 .NET client: port bulk read/write SDK methods + CLI subcommands
Adds the value-bulk SDK surface and CLI subcommands that lived on the
divergent branch (commit f220908) but were never merged into main.
HEAD's MxGatewaySession only had the subscribe-style bulks (AddItem /
Advise / Remove / UnAdvise / Subscribe / Unsubscribe). The proto
contract already defined ReadBulkCommand / WriteBulkCommand /
Write2BulkCommand / WriteSecuredBulkCommand / WriteSecured2BulkCommand
/ BulkReadReply / BulkWriteReply, so this is purely a client-side
addition.

SDK (MxGatewaySession.cs):
- WriteBulkAsync(serverHandle, IReadOnlyList<WriteBulkEntry>, ct)
- Write2BulkAsync(serverHandle, IReadOnlyList<Write2BulkEntry>, ct)
- WriteSecuredBulkAsync(serverHandle, IReadOnlyList<WriteSecuredBulkEntry>, ct)
- WriteSecured2BulkAsync(serverHandle, IReadOnlyList<WriteSecured2BulkEntry>, ct)
- ReadBulkAsync(serverHandle, IReadOnlyList<string> tagAddresses, TimeSpan timeout, ct)

Per-entry secured user ids live on each WriteSecured(2)BulkEntry — they
are NOT lifted to ctor args because the proto field shape allows distinct
ids per row.

CLI (MxGatewayClientCli.cs):
- read-bulk / write-bulk / write2-bulk / write-secured-bulk / write-secured2-bulk
  routed through the existing dispatch table, with --type, --values,
  --item-handles, --timeout-ms, --current-user-id, --verifier-user-id,
  --timestamp flags matching the cross-language CLI surface.
- bench-read-bulk benchmark harness: warmup + steady-state ReadBulk loop
  with p50/p95/p99/max/mean latency, emitting the shared JSON schema so
  scripts/bench-read-bulk.ps1 collates the .NET line alongside the four
  other clients.

The new subcommands flow through the existing batch dispatcher without
further changes.

Verification: dotnet build clean (0 warnings / 0 errors);
dotnet test 59/59 passing. Manual smoke against the live gateway
on localhost:5120: read-bulk returned 2 BulkReadResult entries with
wasSuccessful=true, wasCached=true; write-bulk on int32 returned
wasSuccessful=true; close-session returned SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:49:33 -04:00
Joseph Doherty 71d2c39f01 e2e: port batch subcommand to all five client CLIs
scripts/run-client-e2e-tests.ps1 expects each language CLI to expose a
`batch` subcommand that reads command lines from stdin, runs each
through the normal subcommand dispatch, writes the JSON result, then
a sentinel line `__MXGW_BATCH_EOR__`. The implementation lived on a
divergent branch (commit 6126099) that was never merged into main —
this commit ports the same protocol to HEAD's renamed CLIs so the
existing matrix script runs end-to-end.

The protocol:
  - one line of stdin = one full CLI invocation
  - successful output → stdout, then __MXGW_BATCH_EOR__
  - failure → {"error":"...","type":"error"} JSON on stdout, then
    __MXGW_BATCH_EOR__ (errors do NOT exit the loop)
  - empty line or EOF terminates the loop

Per-CLI additions:

  .NET: RunBatchAsync + per-line StringWriter capture, JSON error
    envelope when forceJsonErrors is true. Two new tests in
    MxGatewayClientCliTests covering the success and error paths.

  Go:   runBatch with bufio.Scanner, runs each line through the
    existing runWithIO switch with a buffered stdout writer. One new
    test pinning the EOR sentinel.

  Rust: new `Batch` variant on the clap Command enum, run_batch
    re-parses each line via Cli::try_parse_from. Two new tests in the
    inline mod tests block.

  Python: new `batch` click command in commands.py that uses
    CliRunner to dispatch each line; synthesises {"error",..."type"}
    JSON from click error messages when the captured output isn't
    already JSON-shaped. Three new tests in test_cli.py.

  Java: BatchCommand inner @Command with BufferedReader stdin loop,
    fresh commandLine() per dispatch with captured stdout/stderr
    PrintWriters; non-zero exit codes and uncaught exceptions both
    surface as JSON-error blocks. Two new tests.

Also fixes scripts/run-client-e2e-tests.ps1 line 705: the Python
invocation was still passing the old module name `mxgateway_cli` to
`python -m`; the client SDK rename in 397d3c5 moved it to
`zb_mom_ww_mxgateway_cli`. Without the fix the Python leg fails
with "No module named mxgateway_cli" before reaching open-session.

Verification: full matrix at the redeployed gateway (localhost:5120,
running ZB.MOM.WW.MxGateway.Server.exe / ZB.MOM.WW.MxGateway.Worker.exe)
with -SkipBulk -SkipReadWriteBulk -SkipParity -SkipAuth (those phases
exercise bulk read/write CLI subcommands that also live on the
divergent branch — porting those is a follow-up). All five clients
report `closed=true, addedItems=120, eventCount=5` and overall
`success=true`. Per-language unit tests pass:
  - dotnet: 59/59
  - go:     all packages clean
  - rust:   cargo test --workspace clean
  - python: 42/42
  - java:   gradle build SUCCESSFUL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:08:15 -04:00
Joseph Doherty a68f0cf222 code-review: regen README, all 21 open findings resolved
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open
findings from the d692232 re-review are now Resolved:

  Server          327e9c5  Server-031, -032, -038, -039, -040, -041, -042, -043
  Contracts       bd1d1f1  Contracts-016, -017
  Tests           d48099f  Tests-025, -026
  IntegrationTests 865c22a IntegrationTests-022, -023, -024
  Client.Java     10bd0c0  Client.Java-027, -028, -029, -030, -031
  Client.Rust     83eba4b  Client.Rust-021

Also fixes stale "Open findings" header counts in Client.Java and
IntegrationTests findings.md that survived the resolve passes
(the agents updated each finding's Status but missed the header
sum). `regen-readme.py --check` is now green.

Module status: 11 / 11 reviewed, 0 / 276 total open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:39 -04:00
Joseph Doherty 83eba4bec5 Resolve Client.Rust-021
Client.Rust-021 (Design adherence): RustClientDesign.md "Crate layout"
section now describes the actual flat workspace structure instead of
the aspirational nested form. The replacement text states that the
workspace root is clients/rust/, the top-level crate
zb-mom-ww-mxgateway-client is declared in clients/rust/Cargo.toml
directly, and crates/mxgw-cli/ is the sole [workspace.members] entry.
The accompanying tree lists the real files on disk (Cargo.toml,
Cargo.lock, build.rs, README.md, RustClientDesign.md,
src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs
plus the src/generated/ tonic-build output dir, tests/, and
crates/mxgw-cli/).

Doc-only change. cargo build --workspace + cargo test --workspace clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:07 -04:00
Joseph Doherty 10bd0c0e4d Resolve Client.Java-027..031
Client.Java-027 (Documentation): Updated 17 Gradle task references in
clients/java/README.md (lines 37, 108-110, 160-161, 169-176, 186, 206,
221) and 3 in clients/java/JavaClientDesign.md from the retired short
subproject names to the canonical zb-mom-ww-mxgateway-client /
zb-mom-ww-mxgateway-cli names. Copy-pasting any documented command now
matches the subproject names declared in settings.gradle.

Client.Java-028 (Design adherence): Build-layout block in
JavaClientDesign.md lines 23-27 updated to show the actual package
paths com/zb/mom/ww/mxgateway/{client,cli}/ instead of the retired
com/dohertylan/mxgateway/{client,cli}/ paths.

Client.Java-029 (Documentation): README.md line 210 corrected from
"zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli" to
"zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli" — Gradle
installDist produces a directory whose name matches the project name,
not the short suffix. The e2e script already used the correct path.

Client.Java-030 (Testing coverage): Added
queryActiveAlarmsForwardsRequestAndStreamsSnapshots to
MxGatewayClientSessionTests. The test pushes a QueryActiveAlarmsRequest
carrying session_id / client_correlation_id / alarm_filter_prefix
through an InProcessGateway + TestGatewayService and asserts the server
observed all three request fields, two ActiveAlarmSnapshots stream in
order, and onError is never called. TDD red→green confirmed via a
deliberately-wrong session_id assertion. The re-triage note in
Client.Java-030's resolution clarifies that the finding's reference to
"the existing acknowledgeAlarm test" was aspirational — the alarm RPC
surface had zero coverage before this commit.

Client.Java-031 (Conventions): README.md prose lines 17, 22, 26 updated
to use the canonical zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli
names so the layout description matches Gradle / IDE project names.

Verification: gradle build BUILD SUCCESSFUL; all Java unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:06 -04:00
Joseph Doherty 865c22a884 Resolve IntegrationTests-022..024
IntegrationTests-022 (Conventions): ResolveRepositoryRoot now throws
InvalidOperationException when the walk exhausts without finding a root
marker, with a message naming the start directory, the expected markers
(src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE
escape hatch. Replaces the silent fallback to
Directory.GetCurrentDirectory() that previously masked misconfiguration.
New regression test
ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers
in IntegrationTestEnvironmentTests asserts the throw and the message
contents. TDD red→green confirmed.

IntegrationTests-023 (Testing coverage): DashboardLdapLiveTests's
AuthenticateAsync_AdminInGwAdminGroup_Succeeds now asserts that the
authenticated principal carries a ClaimTypes.Role claim with value
DashboardRoles.Admin in addition to the existing LdapGroupClaimType
assertion. A regression in MapGroupsToRoles (returning an empty list or
missing the RDN fallback) would now surface here. Gated by
MXGATEWAY_RUN_LIVE_LDAP_TESTS.

IntegrationTests-024 (Conventions): Option (b) — extracted within
IntegrationTests. New file TestSupport/NullDashboardEventBroadcaster.cs
(public type, private ctor, singleton Instance). The inline class at
the bottom of WorkerLiveMxAccessSmokeTests is gone; the file now imports
the shared type. Matches the unit-test project's Tests-007 / Tests-021 /
Tests-025 pattern while keeping the two test projects independently
buildable (no shared test-helpers project crossing module boundaries).

Verification: dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests
clean; 19/19 integration tests passing (live MxAccess + LDAP + Galaxy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty d48099f0d0 Resolve Tests-025..026
Tests-025 (Conventions): Extracted the previously-duplicated
NullDashboardEventBroadcaster into TestSupport/NullDashboardEventBroadcaster.cs
(singleton Instance, private ctor). The two nested copies in
EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests were
removed; both files now use the shared type via
'using ZB.MOM.WW.MxGateway.Tests.TestSupport;'. The Server-041 regression
test's ThrowingDashboardEventBroadcaster is intentionally left nested —
single-file usage doesn't warrant promotion to TestSupport. The third
copy in IntegrationTests/WorkerLiveMxAccessSmokeTests was handled by
IntegrationTests-024 in its own commit.

Tests-026 (Testing coverage): Added a new RecordingDashboardEventBroadcaster
test double in TestSupport — a thread-safe (ConcurrentQueue<DashboardEventCapture>)
recorder. New fixture
StreamEventsAsync_PublishesEachEventToDashboardBroadcaster in
EventStreamServiceTests pushes two events through the fake session and
asserts the broadcaster received both with the correct sessionId and
WorkerSequence. TDD red→green confirmed: the deliberately-wrong
"Expected 3, Actual 2" red phase proved the recording fake was actually
invoked by the production code path.

Verification: 486/486 server tests passing (485 previous + 1 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty bd1d1f1c0e Resolve Contracts-016..017
Contracts-016 (Conventions): QueryActiveAlarmsRequest.session_id header
replaced with the unambiguous "Clients may leave session_id empty; the
gateway currently ignores it and serves the session-less central-monitor
cache. A future version may use it to scope the snapshot to one
session." Removes the ambiguity that the prior "reserved for future
use" wording introduced.

Contracts-017 (Documentation): The rpc QueryActiveAlarms comment now
includes the alarm_filter_prefix description: "QueryActiveAlarmsRequest.alarm_filter_prefix
optionally narrows the snapshot to alarms whose alarm_full_reference
starts with the given prefix; an empty prefix returns the full set."

Both are proto-comment-only changes — no wire-format impact, no field
renumbering, and the regenerated MxaccessGateway.cs / MxaccessGatewayGrpc.cs
carry only the doc-comment delta. Added the additive-only regression
guard QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter
to ProtobufContractRoundTripTests — pins
session_id=1 / client_correlation_id=2 / alarm_filter_prefix=3 by
descriptor lookup and round-trips the message with and without the
filter populated.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean;
ProtobufContractRoundTripTests 40/40 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:13 -04:00
Joseph Doherty 327e9c5f94 Resolve Server-031..032 (re-triaged) + Server-038..043
Server-031: re-triaged. The recommended gateway-side
"skip-while-command-in-flight" guard is already in place at
WorkerClient.HeartbeatLoopAsync via WorkerClientOptions.HeartbeatStuckCeiling
(default 75s = 5× HeartbeatGrace). Two regression tests pin the
behaviour. Recommendation #1 (decouple worker-side _writeLock) is a
Worker-module concern (Worker-017 / Worker-023) and out of scope here.

Server-032: re-triaged. Recommendation #2 (rich diagnostic) is already
in EnqueueWorkerEventAsync, with #3 (overflow grace) absorbed by the
TryWrite → WriteAsync-with-timeout fall-through. Test
EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic
pins the diagnostic string. Recommendation #1 (prose contract in
gateway.md / docs) is deferred — outside this pass's edit scope.

Server-038 (Security): EventsHub.SubscribeSession's missing per-session
ACL is documented with a TODO(per-session-acl) and a <remarks> block
explaining the v1 acceptance (any dashboard role can subscribe to any
session — non-secret metadata, redacted value logging). The per-session
ACL design lands in a follow-up once a session-scoped role exists.

Server-039 (Error handling): HubTokenService.Validate now rejects a
deserialized payload where both Name and NameIdentifier are null/empty.
New test file HubTokenServiceTests.cs covers the regression and five
sanity cases. TDD confirmed.

Server-040 (Conventions): MapGroupsToRoles gains a precedence comment
explaining "full literal match first, leading-RDN fallback;
OrdinalIgnoreCase via DashboardOptions.GroupToRole". Documentation-only.

Server-041 (Design adherence): EventStreamService.ProduceEventsAsync
wraps the broadcaster.Publish call in try/catch (Exception). The
producer loop and gRPC stream are no longer at the mercy of the
broadcaster's never-throw discipline. New regression test
StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession.

Server-042 (Performance): DashboardSnapshotPublisher.ExecuteAsync now
mirrors AlarmsHubPublisher's reconnect loop — wraps the await foreach
in a while-not-cancelled, catches general exceptions, and Task.Delays
5s before retrying. An internal ctor accepts a shorter delay for the
test. New test file DashboardSnapshotPublisherTests.cs covers the
throw-then-yield reconnect path and the normal-completion case.

Server-043 (Documentation): HubTokenService class XML doc gains a
<remarks> describing the singleton lifetime, the two consumer scopes
(DashboardHubConnectionFactory scoped, HubTokenAuthenticationHandler
transient), and the thread-safety contract.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean
(0 warnings / 0 errors); src/ZB.MOM.WW.MxGateway.Tests 486/486 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:18:52 -04:00
Joseph Doherty d2d2e5f68f code-review 2026-05-24: re-review at d692232 across all 11 modules
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).

Module status

| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server           | 8 | 43 | +6 |
| Contracts        | 2 | 17 | +2 |
| Tests            | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java      | 5 | 31 | +5 |
| Client.Rust      | 1 | 21 | +1 |
| Worker           | 0 | 25 |  0 (rename-only diff, clean) |
| Worker.Tests     | 0 | 30 |  0 (rename-only diff, clean) |
| Client.Dotnet    | 0 | 17 |  0 (rename + alarm-fix diff, clean) |
| Client.Python    | 0 | 21 |  0 (rename + alarm-fix diff, clean) |
| Client.Go        | 0 | 21 |  0 (rename + alarm-fix diff, clean) |

Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.

New findings

  * Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
    any session id from any Viewer; no per-session ACL guards the
    EventsHub group fan-out.
  * Server-039 (Low / Error handling) — HubTokenService.Validate
    accepts a payload with null Name/NameIdentifier.
  * Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
    full-vs-RDN lookup precedence.
  * Server-041 (Low / Design adherence) — EventStreamService calls
    IDashboardEventBroadcaster.Publish without a try/catch — fragile
    seam relying on the never-throw contract.
  * Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
    retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
  * Server-043 (Low / Documentation) — HubTokenService singleton
    sharing across login + hub-token validation undocumented.

  * Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
    reserved-for-future-use ambiguity.
  * Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
    omits the alarm_filter_prefix filter description.

  * Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
    fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
  * Tests-026 (Medium / Testing coverage) — no test proves
    EventStreamService actually calls IDashboardEventBroadcaster.Publish.

  * IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
    silent fallback to Directory.GetCurrentDirectory().
  * IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
    success-path asserts ldap_group but not the Role claim.
  * IntegrationTests-024 (Low / Conventions) — inline
    NullDashboardEventBroadcaster fake duplicates Tests-side copies.

  * Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
    Gradle task names still use the old short project names.
  * Client.Java-028 (Medium / Design adherence) — JavaClientDesign
    build-layout shows the old `com/dohertylan/mxgateway/` package paths.
  * Client.Java-029 (Low / Documentation) — README installDist path
    cites the wrong directory.
  * Client.Java-030 (Low / Testing coverage) — no Java test exercises
    the regenerated QueryActiveAlarmsRequest RPC.
  * Client.Java-031 (Low / Conventions) — README prose uses old short
    project names instead of canonical prefixed ones.

  * Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
    "Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
    that does not exist; actual layout is the flat top-level crate.

Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.

Process notes

- The `code-reviews/` tree was not in this working copy's git
  history (the local extract pre-dates the divergent branch that
  carried the reviews). Restored from `dd7ca16` via
  `git checkout dd7ca16 -- code-reviews/` before the re-review.
- Some "Resolved" entries in the restored findings.md reference
  fixes that landed on the divergent branch (the same one that
  carried the reviews) and are not present on the current main
  lineage. The re-review treats those statuses as historical;
  the new pass only files findings against HEAD's actual state.
- `python code-reviews/regen-readme.py --check` is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:34:30 -04:00
Joseph Doherty d692232191 dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

  * IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
    take a direct dependency on SignalR.
  * DashboardEventBroadcaster — singleton implementation that hands the
    SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
    logged at debug and dropped so the source gRPC stream is never
    blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:
  * implements IAsyncDisposable
  * opens a second HubConnection via DashboardHubConnectionFactory targeting
    /hubs/events
  * calls SubscribeSession(SessionId) on Start
  * renders the most recent 50 events in a small table (worker seq, family,
    server/item handle, alarm reference when the event is OnAlarmTransition)
  * shows a live/offline conn-pill driven by HubConnection.Closed /
    Reconnected events

The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

  * CLAUDE.md — Authentication section now explains LDAP bind +
    GroupToRole + HubToken bearer flow.
  * gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
    events SignalR hubs, LDAP cookie + bearer scheme.
  * docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
    add GroupToRole row, append "Authorization policies" and "SignalR hubs"
    subsections describing the three policies and the /hubs/* endpoints.
  * docs/GatewayDashboardDesign.md — hosting model (root mount, new
    endpoint layout), Realtime Updates rewritten as a hub table
    (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
    and routing), Authentication And Authorization rewritten around LDAP +
    role mapping + the hub bearer flow, Configuration block updated.
  * docs/GatewayProcessDesign.md — security-section dashboard paragraph
    and the example config block both refreshed to LDAP/role auth.
  * docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
    updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
    API-key login flow).
  * docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
    GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
    RequiredGroup default; success-path assertion explains the role-claim
    check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:07:30 -04:00
Joseph Doherty 65943597d4 dashboard: side-rail layout + SignalR push hubs (snapshot, alarms, events)
Layout
------
DashboardLayout.razor replaces the inline header nav with a left side rail
modelled on the OtOpcUa admin (Dashboard B). The top bar keeps only the
brand, breadcrumb, and signed-in status pill; navigation moves into a
fixed-width 218px rail with grouped section eyebrows (Overview,
Runtime, Galaxy, Admin) and a Session footer carrying the user name,
role claims, and a Sign-out button. dashboard.css gains the
`.app-shell` flex container, `.side-rail` column, `.rail-eyebrow`,
`.rail-link[.active]`, `.rail-foot`, `.rail-user`, `.rail-roles`, and
`.rail-btn` rules (all driven by the existing theme.css tokens, no new
hard-coded colours).

SignalR (push)
--------------
Adds three hubs under `Dashboard/Hubs/`, all gated by the
`HubClientsPolicy` registered in the previous commit:

  * DashboardSnapshotHub (/hubs/snapshot)
    Broadcasts the full DashboardSnapshot on every change. Sends the
    current snapshot to a new caller in OnConnectedAsync so the first
    paint is immediate.

  * AlarmsHub (/hubs/alarms)
    Connected clients auto-join the `__alarms__` group. Receives
    AlarmFeedMessage values (active_alarm / snapshot_complete /
    transition) re-broadcast from the gateway's central alarm monitor.

  * EventsHub (/hubs/events)
    Per-session push surface. Clients call SubscribeSession(sessionId)
    to join `session:{id}`. The publisher side is intentionally a
    follow-up — the snapshot hub already carries recent-events
    rollups; a dedicated MxEvent broadcaster on EventStreamService
    will plug into this hub's group convention.

Two BackgroundService publishers wire server-side data sources to the
hubs:

  * DashboardSnapshotPublisher subscribes to
    `IDashboardSnapshotService.WatchSnapshotsAsync` and forwards every
    snapshot to all connected hub clients.
  * AlarmsHubPublisher subscribes to `IGatewayAlarmService.StreamAsync`
    (no filter) and forwards every AlarmFeedMessage to the
    `__alarms__` group, reconnecting with a 5-second backoff if the
    stream faults.

Connection + auth plumbing
--------------------------
  * `GET /hubs/token` issues a fresh data-protected bearer token
    bound to the calling user's identity and roles. Gated by the
    cookie-only ViewerPolicy so a Blazor circuit (cookie-authenticated)
    can mint a token, but a hub bearer cannot self-bootstrap a new
    one.
  * DashboardHubConnectionFactory (scoped) is the client-side helper
    Razor pages inject. It builds a HubConnection with an
    AccessTokenProvider that calls HubTokenService.Issue on every
    (re)connect — keeps the connection alive across cookie refresh
    boundaries.

Pull → push refactor
--------------------
DashboardPageBase no longer drives its own `WatchSnapshotsAsync`
async-foreach loop. It now:
  1. seeds Snapshot synchronously from `IDashboardSnapshotService.GetSnapshot()`
     so the first render is non-empty;
  2. opens a `DashboardSnapshotHub` connection via the connection
     factory;
  3. updates Snapshot + triggers StateHasChanged on each
     `SnapshotUpdated` push.

The hub connection is best-effort: if SignalR can't start, the
synchronous snapshot seed keeps the UI populated. SignalR's
WithAutomaticReconnect handles the recovery path.

Package
-------
Adds `Microsoft.AspNetCore.SignalR.Client` 10.0.0 to the server csproj
so the in-process Blazor pages can open hub connections back to their
own hosting process.

Verification: 475 server tests (+ 2 new
`DashboardHubsRegistrationTests` that pin the hub negotiate endpoints
and the singleton/scoped DI shape), 275 worker tests (+ 9 dev-rig
skips), 18 integration tests (live MxAccess + LDAP + Galaxy) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:48:27 -04:00
Joseph Doherty 27ed65114e dashboard: role-based LDAP auth + hub bearer scheme, drop PathBase
Restructure dashboard auth around LDAP-driven Admin/Viewer roles, add a
bearer scheme so SignalR hubs (next commit) can authenticate without
forwarding the HttpOnly browser cookie, and mount the dashboard at the
host root instead of a configurable `/dashboard` prefix.

Configuration changes (breaking):
- `MxGateway:Dashboard:PathBase` removed — the dashboard now serves at `/`.
- `MxGateway:Dashboard:RequireAdminScope` removed — role checks replace
  the single admin-scope claim.
- `MxGateway:Ldap:RequiredGroup` removed — replaced by `MxGateway:Dashboard:GroupToRole`,
  a map from LDAP group name to dashboard role. Legal role values:
  `Admin` and `Viewer`. Users whose LDAP groups don't intersect this
  map are rejected at login (the existing fail-closed contract).
- appsettings.json ships a default mapping `{ GwAdmin: Admin, GwReader: Viewer }`.

Auth model:
- DashboardRoles: new static class with `Admin` and `Viewer` constants.
- DashboardAuthenticator.AuthenticateAsync: after LDAP bind, maps the
  user's groups through `DashboardOptions.GroupToRole` and emits one
  `ClaimTypes.Role` claim per resolved role. Empty result → login fails.
- DashboardAuthorizationRequirement now carries `RequiredRoles`; static
  presets `AnyDashboardRole` (Viewer ∨ Admin) and `AdminOnly`.
- DashboardAuthorizationHandler checks `IsInRole` against the
  requirement's role list instead of the old scope claim. The
  `AuthenticationMode.Disabled` and `AllowAnonymousLocalhost` bypasses
  are preserved.
- DashboardApiKeyAuthorization.CanManage now requires the `Admin` role
  (was: required LDAP group membership). The constructor's IOptions
  parameter is gone.

Policies / schemes:
- DashboardAuthenticationDefaults gains `ViewerPolicy`, `AdminPolicy`,
  `HubClientsPolicy`, and `HubAuthenticationScheme`. The legacy
  `AuthorizationPolicy` and `ScopeClaimType` constants are removed.
- DashboardServiceCollectionExtensions registers all three policies,
  adds the cookie scheme and the HubToken bearer scheme side by side,
  calls `AddSignalR()`, and hard-codes the cookie's login/logout/denied
  paths to root-relative `/login` etc.

Hub bearer infrastructure (no hubs wired yet — next commit):
- HubTokenService: mints time-limited data-protected JSON tokens
  carrying the user's name, NameIdentifier, and roles. 30-minute
  lifetime, purpose `ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1`.
- HubTokenAuthenticationHandler: validates the token from
  `Authorization: Bearer …` or `?access_token=…` (WebSocket upgrade
  query string) and rebuilds the principal.

Endpoint mapping:
- DashboardEndpointRouteBuilderExtensions drops the `MapGroup(pathBase)`
  wrapper. Login/logout/denied and Razor component routes are now
  mounted at `/`. The login form posts to `/login`. Razor components
  require the new `ViewerPolicy`.
- All page `@page "/dashboard/X"` dual-route directives are removed —
  pages live at their canonical roots (`@page "/"`, `@page "/sessions"`, …).
- App.razor and DashboardLayout.razor drop their PathBase computations.

EffectiveLdapConfiguration drops `RequiredGroup`; EffectiveDashboardConfiguration
drops `PathBase`/`RequireAdminScope` and gains `GroupToRole`. SettingsPage
renders the role mapping in place of the retired fields.

Tests updated:
- DashboardAuthenticatorTests: covers the new GroupToRole mapping
  (short name + DN + multi-role).
- DashboardAuthorizationHandlerTests: split into Viewer-policy and
  Admin-policy cases.
- DashboardApiKeyAuthorizationTests, DashboardApiKeyManagementServiceTests:
  authorized principal now carries the `Admin` role claim.
- DashboardCookieOptionsTests: expects root-relative login/logout paths.
- GatewayApplicationTests: dashboard component routes registered at `/`,
  `/sessions`, … and gated by `ViewerPolicy`. Filter on
  `ComponentTypeMetadata` to ignore minimal-API endpoints sharing `/`.
- GatewayOptionsTests + Validator: drop PathBase / RequireAdminScope /
  RequiredGroup assertions; add a `GroupToRole` value-validation case.
- DashboardLdapLiveTests: provides the default `GwAdmin` → `Admin`
  mapping so the live LDAP bind resolves to a role.

Verification: 473 server tests, 275 worker tests (+9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass.

This commit is intentionally UI-neutral. The sidebar layout and the
SignalR hubs that consume the new HubToken scheme land in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:38:33 -04:00
Joseph Doherty 397d3c5c4f rename: apply ZB.MOM.WW prefix to all client SDKs + fix pre-existing alarm-RPC breaks
Rename across every client surface using each language's idiomatic convention:

  * .NET   clients/dotnet/MxGateway.Client[.Cli|.Tests]/
             -> clients/dotnet/ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]/
             namespaces -> ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]
             contracts ProjectReference repointed to ZB.MOM.WW.MxGateway.Contracts
             sln migrated to slnx (dotnet sln migrate)
  * Python src/mxgateway -> src/zb_mom_ww_mxgateway
             src/mxgateway_cli -> src/zb_mom_ww_mxgateway_cli
             distribution: mxaccess-gateway-client -> zb-mom-ww-mxaccess-gateway-client
  * Rust   crate: mxgateway-client -> zb-mom-ww-mxgateway-client
             build.rs proto path repointed
  * Java   subprojects: mxgateway-{client,cli} -> zb-mom-ww-mxgateway-{client,cli}
             packages com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
             group   com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
             rootProject mxaccessgw-java -> zb-mom-ww-mxaccessgw-java
  * Go     generate-proto.ps1 proto path repointed; module path and
             package mxgateway kept (Go convention).
  * proto-inputs.json: generatedOutputs.python updated to new package path.
  * scripts/run-client-e2e-tests.ps1: Java CLI install path + gradle task
             updated to zb-mom-ww-mxgateway-cli.

CLI binary names (mxgw, mxgw-py, mxgw-go, mxgateway-cli) and wire-level
identifiers (MXGATEWAY_* env vars, the mxgw_<id>_<secret> API key
prefix, protobuf package names like mxaccess_gateway.v1, all MXAccess
references) intentionally NOT renamed.

Fix pre-existing alarms-over-gateway breaks unblocked by the rename:

  * mxaccess_gateway.proto: add missing public message QueryActiveAlarmsRequest
    {session_id, client_correlation_id, alarm_filter_prefix} and missing
    rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns
    (stream ActiveAlarmSnapshot). All four typed clients referenced
    these but they were absent from the proto.
  * MxAccessGatewayService.QueryActiveAlarms: implement the new RPC on
    the server, streaming from IGatewayAlarmService.CurrentAlarms with
    optional alarm_filter_prefix filter.
  * clients/dotnet/.../DiscoverHierarchyOptions.cs: add the hand-written
    .NET POCO that wraps DiscoverHierarchyRequest (referenced by
    GalaxyRepositoryClient.DiscoverHierarchyAsync but never authored).
  * Drop retired session_id field references from
    AcknowledgeAlarmRequest/AcknowledgeAlarmReply test fixtures across
    .NET, Rust, Go, and Python clients.
  * Rust integration test: add the missing stream_alarms impl on the
    fake MxAccessGateway server (the trait gained the method, fake
    didn't).
  * Rust CLI test: bump expected gatewayProtocolVersion 2 -> 3.

Regenerated artifacts updated in this commit:
  * src/ZB.MOM.WW.MxGateway.Contracts/Generated/{MxaccessGateway,MxaccessGatewayGrpc}.cs
  * clients/python/src/zb_mom_ww_mxgateway/generated/*_pb2{,_grpc}.py
  * clients/go/internal/generated/*.pb.go
(C# regenerated by Grpc.Tools on contracts build; Python and Go via
their generate-proto.ps1 scripts; Rust regenerates from .proto via
tonic-build at compile time so no checked-in artefact.)

Verification: 472 server tests, 275 worker tests (9 dev-rig skipped),
18 integration tests (live MxAccess + LDAP + Galaxy), 57 .NET client
tests, 32 Rust workspace tests, 39 Python tests, all Go packages, and
gradle build for Java all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:09:34 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00
dohertj2 867bf18116 alarms-over-gateway: full pipeline (#118)
Seven slices on this branch implement the full alarms-over-gateway path:

1. f711a55  A.2: WnWrapAlarmConsumer replaces aaAlarmManagedClient (wnwrapConsumer.dll, XML payload bypasses FILETIME crash)
2. 82eb0ad  A.3 in-process: AlarmDispatcher wires consumer events onto worker MxAccessEventQueue
3. 01f5e6a  A.3 worker IPC: SubscribeAlarms / UnsubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms commands + executor switch arms
4. 9b21ca3  A.3 gateway: WorkerAlarmRpcDispatcher routes RPCs through the IPC; replaces NotWiredAlarmRpcDispatcher in DI
5. 47b1fd4  A.3 auto-subscribe: SessionManager issues SubscribeAlarms on session open (gated by Alarms.Enabled config)
6. 4e02927  A.3 alarm-ack-by-name: public AcknowledgeAlarm now accepts Provider!Group.Tag references via AlarmAckByName
7. a4ed605  A.3 live smoke: end-to-end pipeline verified on dev rig; surfaced + fixed three production-relevant AVEVA quirks (SetXmlAlarmQuery required for reads, breaks acks; v2 8-arg AlarmAckByName is a stub; AlarmAckByGUID is a stub)

Known follow-ups not in scope:
 - WnWrapAlarmConsumer.PollOnce needs to be driven from the worker StaRuntime (production hosting); currently the timer-based path deadlocks on cross-apartment marshaling without an STA pump.
 - Pre-existing structure-test failure (test project ArchestrA.MxAccess ref) untouched.

Test counts at merge time:
  Worker: 195 pass / 4 skipped (live probes incl. AlarmsLiveSmokeTests) / 1 pre-existing fail
  Server: 308 pass / 0 fail
2026-05-01 12:31:27 -04:00