Commit Graph

200 Commits

Author SHA1 Message Date
Joseph Doherty 325106920f Rust client: port BenchReadBulk subcommand + session.rs tightening
The bulk-write/read SDK methods (read_bulk, write_bulk, write2_bulk,
write_secured_bulk, write_secured2_bulk) and the matching clap
subcommands (ReadBulk, WriteBulk, Write2Bulk, WriteSecuredBulk,
WriteSecured2Bulk) were already on HEAD from a prior session — they
were the only bulk family that HEAD shipped before the .NET / Go /
Python / Java parallel ports. The one missing piece from the divergent
branch (commit f220908) was the BenchReadBulk benchmark harness.

mxgw-cli/src/main.rs adds:
- BenchReadBulk clap variant with flags --client-name,
  --duration-seconds, --warmup-seconds, --bulk-size, --tag-start,
  --tag-prefix, --tag-attribute, --timeout-ms, --json — defaults match
  the .NET and Go benches.
- run_bench_read_bulk(): open-session → register → subscribe_bulk on
  the synthesized TestMachine_NNN.TestChangingInt tags to populate the
  worker value cache → warmup → steady-state loop with per-call
  std::time::Instant capture → unsubscribe → close-session.
- BenchStats + LatencySummary structs and a percentile()
  helper (nearest-rank with linear interpolation, matching the Go and
  .NET implementations) so the cross-language JSON output is byte-for-
  byte comparable. JSON schema: language / command / endpoint /
  clientName / bulkSize / durationSeconds / warmupSeconds / durationMs
  / tags / totalCalls / successfulCalls / failedCalls /
  totalReadResults / cachedReadResults / callsPerSecond /
  latencyMs:{p50,p95,p99,max,mean}. scripts/bench-read-bulk.ps1 will
  pick up the Rust line on its next run.

session.rs picks up minor tightening tied to the bulk SDK methods that
were already in the file (per-entry validation paths, BulkReplyKind
dispatch coverage) — no public-surface change.

Verification: cargo build --workspace clean (the 2 pre-existing
options.rs missing_docs warnings remain — out of scope); cargo test
--workspace 34/34 passing; cargo clippy --workspace --all-targets has
only the 3 pre-existing tolerated warnings (enum_variant_names on
BulkReplyKind, missing_docs on options.rs, clone_on_copy on
galaxy.rs:282). Manual smoke against live gateway on localhost:5120:
read-bulk on two TestMachine tags returned wasCached=true,
wasSuccessful=true; bench-read-bulk --duration-seconds 2
--warmup-seconds 1 --bulk-size 2 --json ran 363 calls / 181.35 calls
per second / p50=5.3 ms / p99=7.8 ms / 726 of 726 cached reads, all
emitting valid JSON in the shared bench schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:50:09 -04:00
Joseph Doherty 8aaab82287 Go client: port bulk read/write SDK methods + CLI subcommands
Mirrors the .NET addition: HEAD's session.go had only the subscribe-style
bulks (AddItemBulk / AdviseItemBulk / RemoveItemBulk / UnAdviseItemBulk /
SubscribeBulk / UnsubscribeBulk). This commit ports the value-bulk SDK
surface and CLI subcommands from divergent branch commit f220908.

SDK (clients/go/mxgateway/session.go):
- WriteBulk(ctx, serverHandle int32, entries []*WriteBulkEntry)
- Write2Bulk(ctx, ..., entries []*Write2BulkEntry)
- WriteSecuredBulk(ctx, ..., entries []*WriteSecuredBulkEntry)
- WriteSecured2Bulk(ctx, ..., entries []*WriteSecured2BulkEntry)
- ReadBulk(ctx, serverHandle int32, tagAddresses []string, timeout time.Duration)
  → []*BulkReadResult

types.go gains public re-exports of the generated proto types
(WriteBulkCommand, WriteBulkEntry, Write2BulkCommand, Write2BulkEntry,
WriteSecuredBulkCommand, WriteSecuredBulkEntry, WriteSecured2BulkCommand,
WriteSecured2BulkEntry, ReadBulkCommand, BulkWriteReply, BulkWriteResult,
BulkReadReply, BulkReadResult) so external callers can construct entries
through the public `mxgateway` package without dipping into the internal
generated path.

CLI (clients/go/cmd/mxgw-go/main.go):
- read-bulk, write-bulk, write2-bulk, write-secured-bulk,
  write-secured2-bulk routed through runWithIO. write families share a
  runWriteBulkVariant helper that gates per-variant flags
  (--current-user-id, --verifier-user-id, --timestamp) so the
  Client.Go-015 flag-gating contract is preserved.
- bench-read-bulk: percentile + timing helpers; JSON output schema
  identical to the .NET / Rust / Python / Java benches.

parseInt32List was changed from panic-on-error to ([]int32, error) so
the new write-bulk commands surface parse errors gracefully; the
existing runUnsubscribeBulk caller is updated accordingly.

Verification: go build ./... + go vet ./... + go test ./... all clean.
Manual smoke against live gateway on localhost:5120: open-session →
register → subscribe-bulk on 3 TestMachine_NNN.TestChangingInt tags
(all wasSuccessful=true) → read-bulk (all wasSuccessful=true /
wasCached=true) → write-bulk int32 100/200/300 (all wasSuccessful=true)
→ close-session SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:49:33 -04:00
Joseph Doherty b3ae200b11 .NET client: port bulk read/write SDK methods + CLI subcommands
Adds the value-bulk SDK surface and CLI subcommands that lived on the
divergent branch (commit f220908) but were never merged into main.
HEAD's MxGatewaySession only had the subscribe-style bulks (AddItem /
Advise / Remove / UnAdvise / Subscribe / Unsubscribe). The proto
contract already defined ReadBulkCommand / WriteBulkCommand /
Write2BulkCommand / WriteSecuredBulkCommand / WriteSecured2BulkCommand
/ BulkReadReply / BulkWriteReply, so this is purely a client-side
addition.

SDK (MxGatewaySession.cs):
- WriteBulkAsync(serverHandle, IReadOnlyList<WriteBulkEntry>, ct)
- Write2BulkAsync(serverHandle, IReadOnlyList<Write2BulkEntry>, ct)
- WriteSecuredBulkAsync(serverHandle, IReadOnlyList<WriteSecuredBulkEntry>, ct)
- WriteSecured2BulkAsync(serverHandle, IReadOnlyList<WriteSecured2BulkEntry>, ct)
- ReadBulkAsync(serverHandle, IReadOnlyList<string> tagAddresses, TimeSpan timeout, ct)

Per-entry secured user ids live on each WriteSecured(2)BulkEntry — they
are NOT lifted to ctor args because the proto field shape allows distinct
ids per row.

CLI (MxGatewayClientCli.cs):
- read-bulk / write-bulk / write2-bulk / write-secured-bulk / write-secured2-bulk
  routed through the existing dispatch table, with --type, --values,
  --item-handles, --timeout-ms, --current-user-id, --verifier-user-id,
  --timestamp flags matching the cross-language CLI surface.
- bench-read-bulk benchmark harness: warmup + steady-state ReadBulk loop
  with p50/p95/p99/max/mean latency, emitting the shared JSON schema so
  scripts/bench-read-bulk.ps1 collates the .NET line alongside the four
  other clients.

The new subcommands flow through the existing batch dispatcher without
further changes.

Verification: dotnet build clean (0 warnings / 0 errors);
dotnet test 59/59 passing. Manual smoke against the live gateway
on localhost:5120: read-bulk returned 2 BulkReadResult entries with
wasSuccessful=true, wasCached=true; write-bulk on int32 returned
wasSuccessful=true; close-session returned SESSION_STATE_CLOSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:49:33 -04:00
Joseph Doherty 71d2c39f01 e2e: port batch subcommand to all five client CLIs
scripts/run-client-e2e-tests.ps1 expects each language CLI to expose a
`batch` subcommand that reads command lines from stdin, runs each
through the normal subcommand dispatch, writes the JSON result, then
a sentinel line `__MXGW_BATCH_EOR__`. The implementation lived on a
divergent branch (commit 6126099) that was never merged into main —
this commit ports the same protocol to HEAD's renamed CLIs so the
existing matrix script runs end-to-end.

The protocol:
  - one line of stdin = one full CLI invocation
  - successful output → stdout, then __MXGW_BATCH_EOR__
  - failure → {"error":"...","type":"error"} JSON on stdout, then
    __MXGW_BATCH_EOR__ (errors do NOT exit the loop)
  - empty line or EOF terminates the loop

Per-CLI additions:

  .NET: RunBatchAsync + per-line StringWriter capture, JSON error
    envelope when forceJsonErrors is true. Two new tests in
    MxGatewayClientCliTests covering the success and error paths.

  Go:   runBatch with bufio.Scanner, runs each line through the
    existing runWithIO switch with a buffered stdout writer. One new
    test pinning the EOR sentinel.

  Rust: new `Batch` variant on the clap Command enum, run_batch
    re-parses each line via Cli::try_parse_from. Two new tests in the
    inline mod tests block.

  Python: new `batch` click command in commands.py that uses
    CliRunner to dispatch each line; synthesises {"error",..."type"}
    JSON from click error messages when the captured output isn't
    already JSON-shaped. Three new tests in test_cli.py.

  Java: BatchCommand inner @Command with BufferedReader stdin loop,
    fresh commandLine() per dispatch with captured stdout/stderr
    PrintWriters; non-zero exit codes and uncaught exceptions both
    surface as JSON-error blocks. Two new tests.

Also fixes scripts/run-client-e2e-tests.ps1 line 705: the Python
invocation was still passing the old module name `mxgateway_cli` to
`python -m`; the client SDK rename in 397d3c5 moved it to
`zb_mom_ww_mxgateway_cli`. Without the fix the Python leg fails
with "No module named mxgateway_cli" before reaching open-session.

Verification: full matrix at the redeployed gateway (localhost:5120,
running ZB.MOM.WW.MxGateway.Server.exe / ZB.MOM.WW.MxGateway.Worker.exe)
with -SkipBulk -SkipReadWriteBulk -SkipParity -SkipAuth (those phases
exercise bulk read/write CLI subcommands that also live on the
divergent branch — porting those is a follow-up). All five clients
report `closed=true, addedItems=120, eventCount=5` and overall
`success=true`. Per-language unit tests pass:
  - dotnet: 59/59
  - go:     all packages clean
  - rust:   cargo test --workspace clean
  - python: 42/42
  - java:   gradle build SUCCESSFUL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 04:08:15 -04:00
Joseph Doherty a68f0cf222 code-review: regen README, all 21 open findings resolved
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open
findings from the d692232 re-review are now Resolved:

  Server          327e9c5  Server-031, -032, -038, -039, -040, -041, -042, -043
  Contracts       bd1d1f1  Contracts-016, -017
  Tests           d48099f  Tests-025, -026
  IntegrationTests 865c22a IntegrationTests-022, -023, -024
  Client.Java     10bd0c0  Client.Java-027, -028, -029, -030, -031
  Client.Rust     83eba4b  Client.Rust-021

Also fixes stale "Open findings" header counts in Client.Java and
IntegrationTests findings.md that survived the resolve passes
(the agents updated each finding's Status but missed the header
sum). `regen-readme.py --check` is now green.

Module status: 11 / 11 reviewed, 0 / 276 total open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:39 -04:00
Joseph Doherty 83eba4bec5 Resolve Client.Rust-021
Client.Rust-021 (Design adherence): RustClientDesign.md "Crate layout"
section now describes the actual flat workspace structure instead of
the aspirational nested form. The replacement text states that the
workspace root is clients/rust/, the top-level crate
zb-mom-ww-mxgateway-client is declared in clients/rust/Cargo.toml
directly, and crates/mxgw-cli/ is the sole [workspace.members] entry.
The accompanying tree lists the real files on disk (Cargo.toml,
Cargo.lock, build.rs, README.md, RustClientDesign.md,
src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs
plus the src/generated/ tonic-build output dir, tests/, and
crates/mxgw-cli/).

Doc-only change. cargo build --workspace + cargo test --workspace clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:07 -04:00
Joseph Doherty 10bd0c0e4d Resolve Client.Java-027..031
Client.Java-027 (Documentation): Updated 17 Gradle task references in
clients/java/README.md (lines 37, 108-110, 160-161, 169-176, 186, 206,
221) and 3 in clients/java/JavaClientDesign.md from the retired short
subproject names to the canonical zb-mom-ww-mxgateway-client /
zb-mom-ww-mxgateway-cli names. Copy-pasting any documented command now
matches the subproject names declared in settings.gradle.

Client.Java-028 (Design adherence): Build-layout block in
JavaClientDesign.md lines 23-27 updated to show the actual package
paths com/zb/mom/ww/mxgateway/{client,cli}/ instead of the retired
com/dohertylan/mxgateway/{client,cli}/ paths.

Client.Java-029 (Documentation): README.md line 210 corrected from
"zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli" to
"zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli" — Gradle
installDist produces a directory whose name matches the project name,
not the short suffix. The e2e script already used the correct path.

Client.Java-030 (Testing coverage): Added
queryActiveAlarmsForwardsRequestAndStreamsSnapshots to
MxGatewayClientSessionTests. The test pushes a QueryActiveAlarmsRequest
carrying session_id / client_correlation_id / alarm_filter_prefix
through an InProcessGateway + TestGatewayService and asserts the server
observed all three request fields, two ActiveAlarmSnapshots stream in
order, and onError is never called. TDD red→green confirmed via a
deliberately-wrong session_id assertion. The re-triage note in
Client.Java-030's resolution clarifies that the finding's reference to
"the existing acknowledgeAlarm test" was aspirational — the alarm RPC
surface had zero coverage before this commit.

Client.Java-031 (Conventions): README.md prose lines 17, 22, 26 updated
to use the canonical zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli
names so the layout description matches Gradle / IDE project names.

Verification: gradle build BUILD SUCCESSFUL; all Java unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:06 -04:00
Joseph Doherty 865c22a884 Resolve IntegrationTests-022..024
IntegrationTests-022 (Conventions): ResolveRepositoryRoot now throws
InvalidOperationException when the walk exhausts without finding a root
marker, with a message naming the start directory, the expected markers
(src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE
escape hatch. Replaces the silent fallback to
Directory.GetCurrentDirectory() that previously masked misconfiguration.
New regression test
ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers
in IntegrationTestEnvironmentTests asserts the throw and the message
contents. TDD red→green confirmed.

IntegrationTests-023 (Testing coverage): DashboardLdapLiveTests's
AuthenticateAsync_AdminInGwAdminGroup_Succeeds now asserts that the
authenticated principal carries a ClaimTypes.Role claim with value
DashboardRoles.Admin in addition to the existing LdapGroupClaimType
assertion. A regression in MapGroupsToRoles (returning an empty list or
missing the RDN fallback) would now surface here. Gated by
MXGATEWAY_RUN_LIVE_LDAP_TESTS.

IntegrationTests-024 (Conventions): Option (b) — extracted within
IntegrationTests. New file TestSupport/NullDashboardEventBroadcaster.cs
(public type, private ctor, singleton Instance). The inline class at
the bottom of WorkerLiveMxAccessSmokeTests is gone; the file now imports
the shared type. Matches the unit-test project's Tests-007 / Tests-021 /
Tests-025 pattern while keeping the two test projects independently
buildable (no shared test-helpers project crossing module boundaries).

Verification: dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests
clean; 19/19 integration tests passing (live MxAccess + LDAP + Galaxy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty d48099f0d0 Resolve Tests-025..026
Tests-025 (Conventions): Extracted the previously-duplicated
NullDashboardEventBroadcaster into TestSupport/NullDashboardEventBroadcaster.cs
(singleton Instance, private ctor). The two nested copies in
EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests were
removed; both files now use the shared type via
'using ZB.MOM.WW.MxGateway.Tests.TestSupport;'. The Server-041 regression
test's ThrowingDashboardEventBroadcaster is intentionally left nested —
single-file usage doesn't warrant promotion to TestSupport. The third
copy in IntegrationTests/WorkerLiveMxAccessSmokeTests was handled by
IntegrationTests-024 in its own commit.

Tests-026 (Testing coverage): Added a new RecordingDashboardEventBroadcaster
test double in TestSupport — a thread-safe (ConcurrentQueue<DashboardEventCapture>)
recorder. New fixture
StreamEventsAsync_PublishesEachEventToDashboardBroadcaster in
EventStreamServiceTests pushes two events through the fake session and
asserts the broadcaster received both with the correct sessionId and
WorkerSequence. TDD red→green confirmed: the deliberately-wrong
"Expected 3, Actual 2" red phase proved the recording fake was actually
invoked by the production code path.

Verification: 486/486 server tests passing (485 previous + 1 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:40 -04:00
Joseph Doherty bd1d1f1c0e Resolve Contracts-016..017
Contracts-016 (Conventions): QueryActiveAlarmsRequest.session_id header
replaced with the unambiguous "Clients may leave session_id empty; the
gateway currently ignores it and serves the session-less central-monitor
cache. A future version may use it to scope the snapshot to one
session." Removes the ambiguity that the prior "reserved for future
use" wording introduced.

Contracts-017 (Documentation): The rpc QueryActiveAlarms comment now
includes the alarm_filter_prefix description: "QueryActiveAlarmsRequest.alarm_filter_prefix
optionally narrows the snapshot to alarms whose alarm_full_reference
starts with the given prefix; an empty prefix returns the full set."

Both are proto-comment-only changes — no wire-format impact, no field
renumbering, and the regenerated MxaccessGateway.cs / MxaccessGatewayGrpc.cs
carry only the doc-comment delta. Added the additive-only regression
guard QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter
to ProtobufContractRoundTripTests — pins
session_id=1 / client_correlation_id=2 / alarm_filter_prefix=3 by
descriptor lookup and round-trips the message with and without the
filter populated.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean;
ProtobufContractRoundTripTests 40/40 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:20:13 -04:00
Joseph Doherty 327e9c5f94 Resolve Server-031..032 (re-triaged) + Server-038..043
Server-031: re-triaged. The recommended gateway-side
"skip-while-command-in-flight" guard is already in place at
WorkerClient.HeartbeatLoopAsync via WorkerClientOptions.HeartbeatStuckCeiling
(default 75s = 5× HeartbeatGrace). Two regression tests pin the
behaviour. Recommendation #1 (decouple worker-side _writeLock) is a
Worker-module concern (Worker-017 / Worker-023) and out of scope here.

Server-032: re-triaged. Recommendation #2 (rich diagnostic) is already
in EnqueueWorkerEventAsync, with #3 (overflow grace) absorbed by the
TryWrite → WriteAsync-with-timeout fall-through. Test
EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic
pins the diagnostic string. Recommendation #1 (prose contract in
gateway.md / docs) is deferred — outside this pass's edit scope.

Server-038 (Security): EventsHub.SubscribeSession's missing per-session
ACL is documented with a TODO(per-session-acl) and a <remarks> block
explaining the v1 acceptance (any dashboard role can subscribe to any
session — non-secret metadata, redacted value logging). The per-session
ACL design lands in a follow-up once a session-scoped role exists.

Server-039 (Error handling): HubTokenService.Validate now rejects a
deserialized payload where both Name and NameIdentifier are null/empty.
New test file HubTokenServiceTests.cs covers the regression and five
sanity cases. TDD confirmed.

Server-040 (Conventions): MapGroupsToRoles gains a precedence comment
explaining "full literal match first, leading-RDN fallback;
OrdinalIgnoreCase via DashboardOptions.GroupToRole". Documentation-only.

Server-041 (Design adherence): EventStreamService.ProduceEventsAsync
wraps the broadcaster.Publish call in try/catch (Exception). The
producer loop and gRPC stream are no longer at the mercy of the
broadcaster's never-throw discipline. New regression test
StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession.

Server-042 (Performance): DashboardSnapshotPublisher.ExecuteAsync now
mirrors AlarmsHubPublisher's reconnect loop — wraps the await foreach
in a while-not-cancelled, catches general exceptions, and Task.Delays
5s before retrying. An internal ctor accepts a shorter delay for the
test. New test file DashboardSnapshotPublisherTests.cs covers the
throw-then-yield reconnect path and the normal-completion case.

Server-043 (Documentation): HubTokenService class XML doc gains a
<remarks> describing the singleton lifetime, the two consumer scopes
(DashboardHubConnectionFactory scoped, HubTokenAuthenticationHandler
transient), and the thread-safety contract.

Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean
(0 warnings / 0 errors); src/ZB.MOM.WW.MxGateway.Tests 486/486 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:18:52 -04:00
Joseph Doherty d2d2e5f68f code-review 2026-05-24: re-review at d692232 across all 11 modules
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).

Module status

| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server           | 8 | 43 | +6 |
| Contracts        | 2 | 17 | +2 |
| Tests            | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java      | 5 | 31 | +5 |
| Client.Rust      | 1 | 21 | +1 |
| Worker           | 0 | 25 |  0 (rename-only diff, clean) |
| Worker.Tests     | 0 | 30 |  0 (rename-only diff, clean) |
| Client.Dotnet    | 0 | 17 |  0 (rename + alarm-fix diff, clean) |
| Client.Python    | 0 | 21 |  0 (rename + alarm-fix diff, clean) |
| Client.Go        | 0 | 21 |  0 (rename + alarm-fix diff, clean) |

Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.

New findings

  * Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
    any session id from any Viewer; no per-session ACL guards the
    EventsHub group fan-out.
  * Server-039 (Low / Error handling) — HubTokenService.Validate
    accepts a payload with null Name/NameIdentifier.
  * Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
    full-vs-RDN lookup precedence.
  * Server-041 (Low / Design adherence) — EventStreamService calls
    IDashboardEventBroadcaster.Publish without a try/catch — fragile
    seam relying on the never-throw contract.
  * Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
    retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
  * Server-043 (Low / Documentation) — HubTokenService singleton
    sharing across login + hub-token validation undocumented.

  * Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
    reserved-for-future-use ambiguity.
  * Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
    omits the alarm_filter_prefix filter description.

  * Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
    fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
  * Tests-026 (Medium / Testing coverage) — no test proves
    EventStreamService actually calls IDashboardEventBroadcaster.Publish.

  * IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
    silent fallback to Directory.GetCurrentDirectory().
  * IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
    success-path asserts ldap_group but not the Role claim.
  * IntegrationTests-024 (Low / Conventions) — inline
    NullDashboardEventBroadcaster fake duplicates Tests-side copies.

  * Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
    Gradle task names still use the old short project names.
  * Client.Java-028 (Medium / Design adherence) — JavaClientDesign
    build-layout shows the old `com/dohertylan/mxgateway/` package paths.
  * Client.Java-029 (Low / Documentation) — README installDist path
    cites the wrong directory.
  * Client.Java-030 (Low / Testing coverage) — no Java test exercises
    the regenerated QueryActiveAlarmsRequest RPC.
  * Client.Java-031 (Low / Conventions) — README prose uses old short
    project names instead of canonical prefixed ones.

  * Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
    "Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
    that does not exist; actual layout is the flat top-level crate.

Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.

Process notes

- The `code-reviews/` tree was not in this working copy's git
  history (the local extract pre-dates the divergent branch that
  carried the reviews). Restored from `dd7ca16` via
  `git checkout dd7ca16 -- code-reviews/` before the re-review.
- Some "Resolved" entries in the restored findings.md reference
  fixes that landed on the divergent branch (the same one that
  carried the reviews) and are not present on the current main
  lineage. The re-review treats those statuses as historical;
  the new pass only files findings against HEAD's actual state.
- `python code-reviews/regen-readme.py --check` is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:34:30 -04:00
Joseph Doherty d692232191 dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

  * IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
    take a direct dependency on SignalR.
  * DashboardEventBroadcaster — singleton implementation that hands the
    SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
    logged at debug and dropped so the source gRPC stream is never
    blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:
  * implements IAsyncDisposable
  * opens a second HubConnection via DashboardHubConnectionFactory targeting
    /hubs/events
  * calls SubscribeSession(SessionId) on Start
  * renders the most recent 50 events in a small table (worker seq, family,
    server/item handle, alarm reference when the event is OnAlarmTransition)
  * shows a live/offline conn-pill driven by HubConnection.Closed /
    Reconnected events

The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

  * CLAUDE.md — Authentication section now explains LDAP bind +
    GroupToRole + HubToken bearer flow.
  * gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
    events SignalR hubs, LDAP cookie + bearer scheme.
  * docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
    add GroupToRole row, append "Authorization policies" and "SignalR hubs"
    subsections describing the three policies and the /hubs/* endpoints.
  * docs/GatewayDashboardDesign.md — hosting model (root mount, new
    endpoint layout), Realtime Updates rewritten as a hub table
    (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
    and routing), Authentication And Authorization rewritten around LDAP +
    role mapping + the hub bearer flow, Configuration block updated.
  * docs/GatewayProcessDesign.md — security-section dashboard paragraph
    and the example config block both refreshed to LDAP/role auth.
  * docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
    updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
    API-key login flow).
  * docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
    GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
    RequiredGroup default; success-path assertion explains the role-claim
    check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:07:30 -04:00
Joseph Doherty 65943597d4 dashboard: side-rail layout + SignalR push hubs (snapshot, alarms, events)
Layout
------
DashboardLayout.razor replaces the inline header nav with a left side rail
modelled on the OtOpcUa admin (Dashboard B). The top bar keeps only the
brand, breadcrumb, and signed-in status pill; navigation moves into a
fixed-width 218px rail with grouped section eyebrows (Overview,
Runtime, Galaxy, Admin) and a Session footer carrying the user name,
role claims, and a Sign-out button. dashboard.css gains the
`.app-shell` flex container, `.side-rail` column, `.rail-eyebrow`,
`.rail-link[.active]`, `.rail-foot`, `.rail-user`, `.rail-roles`, and
`.rail-btn` rules (all driven by the existing theme.css tokens, no new
hard-coded colours).

SignalR (push)
--------------
Adds three hubs under `Dashboard/Hubs/`, all gated by the
`HubClientsPolicy` registered in the previous commit:

  * DashboardSnapshotHub (/hubs/snapshot)
    Broadcasts the full DashboardSnapshot on every change. Sends the
    current snapshot to a new caller in OnConnectedAsync so the first
    paint is immediate.

  * AlarmsHub (/hubs/alarms)
    Connected clients auto-join the `__alarms__` group. Receives
    AlarmFeedMessage values (active_alarm / snapshot_complete /
    transition) re-broadcast from the gateway's central alarm monitor.

  * EventsHub (/hubs/events)
    Per-session push surface. Clients call SubscribeSession(sessionId)
    to join `session:{id}`. The publisher side is intentionally a
    follow-up — the snapshot hub already carries recent-events
    rollups; a dedicated MxEvent broadcaster on EventStreamService
    will plug into this hub's group convention.

Two BackgroundService publishers wire server-side data sources to the
hubs:

  * DashboardSnapshotPublisher subscribes to
    `IDashboardSnapshotService.WatchSnapshotsAsync` and forwards every
    snapshot to all connected hub clients.
  * AlarmsHubPublisher subscribes to `IGatewayAlarmService.StreamAsync`
    (no filter) and forwards every AlarmFeedMessage to the
    `__alarms__` group, reconnecting with a 5-second backoff if the
    stream faults.

Connection + auth plumbing
--------------------------
  * `GET /hubs/token` issues a fresh data-protected bearer token
    bound to the calling user's identity and roles. Gated by the
    cookie-only ViewerPolicy so a Blazor circuit (cookie-authenticated)
    can mint a token, but a hub bearer cannot self-bootstrap a new
    one.
  * DashboardHubConnectionFactory (scoped) is the client-side helper
    Razor pages inject. It builds a HubConnection with an
    AccessTokenProvider that calls HubTokenService.Issue on every
    (re)connect — keeps the connection alive across cookie refresh
    boundaries.

Pull → push refactor
--------------------
DashboardPageBase no longer drives its own `WatchSnapshotsAsync`
async-foreach loop. It now:
  1. seeds Snapshot synchronously from `IDashboardSnapshotService.GetSnapshot()`
     so the first render is non-empty;
  2. opens a `DashboardSnapshotHub` connection via the connection
     factory;
  3. updates Snapshot + triggers StateHasChanged on each
     `SnapshotUpdated` push.

The hub connection is best-effort: if SignalR can't start, the
synchronous snapshot seed keeps the UI populated. SignalR's
WithAutomaticReconnect handles the recovery path.

Package
-------
Adds `Microsoft.AspNetCore.SignalR.Client` 10.0.0 to the server csproj
so the in-process Blazor pages can open hub connections back to their
own hosting process.

Verification: 475 server tests (+ 2 new
`DashboardHubsRegistrationTests` that pin the hub negotiate endpoints
and the singleton/scoped DI shape), 275 worker tests (+ 9 dev-rig
skips), 18 integration tests (live MxAccess + LDAP + Galaxy) all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:48:27 -04:00
Joseph Doherty 27ed65114e dashboard: role-based LDAP auth + hub bearer scheme, drop PathBase
Restructure dashboard auth around LDAP-driven Admin/Viewer roles, add a
bearer scheme so SignalR hubs (next commit) can authenticate without
forwarding the HttpOnly browser cookie, and mount the dashboard at the
host root instead of a configurable `/dashboard` prefix.

Configuration changes (breaking):
- `MxGateway:Dashboard:PathBase` removed — the dashboard now serves at `/`.
- `MxGateway:Dashboard:RequireAdminScope` removed — role checks replace
  the single admin-scope claim.
- `MxGateway:Ldap:RequiredGroup` removed — replaced by `MxGateway:Dashboard:GroupToRole`,
  a map from LDAP group name to dashboard role. Legal role values:
  `Admin` and `Viewer`. Users whose LDAP groups don't intersect this
  map are rejected at login (the existing fail-closed contract).
- appsettings.json ships a default mapping `{ GwAdmin: Admin, GwReader: Viewer }`.

Auth model:
- DashboardRoles: new static class with `Admin` and `Viewer` constants.
- DashboardAuthenticator.AuthenticateAsync: after LDAP bind, maps the
  user's groups through `DashboardOptions.GroupToRole` and emits one
  `ClaimTypes.Role` claim per resolved role. Empty result → login fails.
- DashboardAuthorizationRequirement now carries `RequiredRoles`; static
  presets `AnyDashboardRole` (Viewer ∨ Admin) and `AdminOnly`.
- DashboardAuthorizationHandler checks `IsInRole` against the
  requirement's role list instead of the old scope claim. The
  `AuthenticationMode.Disabled` and `AllowAnonymousLocalhost` bypasses
  are preserved.
- DashboardApiKeyAuthorization.CanManage now requires the `Admin` role
  (was: required LDAP group membership). The constructor's IOptions
  parameter is gone.

Policies / schemes:
- DashboardAuthenticationDefaults gains `ViewerPolicy`, `AdminPolicy`,
  `HubClientsPolicy`, and `HubAuthenticationScheme`. The legacy
  `AuthorizationPolicy` and `ScopeClaimType` constants are removed.
- DashboardServiceCollectionExtensions registers all three policies,
  adds the cookie scheme and the HubToken bearer scheme side by side,
  calls `AddSignalR()`, and hard-codes the cookie's login/logout/denied
  paths to root-relative `/login` etc.

Hub bearer infrastructure (no hubs wired yet — next commit):
- HubTokenService: mints time-limited data-protected JSON tokens
  carrying the user's name, NameIdentifier, and roles. 30-minute
  lifetime, purpose `ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1`.
- HubTokenAuthenticationHandler: validates the token from
  `Authorization: Bearer …` or `?access_token=…` (WebSocket upgrade
  query string) and rebuilds the principal.

Endpoint mapping:
- DashboardEndpointRouteBuilderExtensions drops the `MapGroup(pathBase)`
  wrapper. Login/logout/denied and Razor component routes are now
  mounted at `/`. The login form posts to `/login`. Razor components
  require the new `ViewerPolicy`.
- All page `@page "/dashboard/X"` dual-route directives are removed —
  pages live at their canonical roots (`@page "/"`, `@page "/sessions"`, …).
- App.razor and DashboardLayout.razor drop their PathBase computations.

EffectiveLdapConfiguration drops `RequiredGroup`; EffectiveDashboardConfiguration
drops `PathBase`/`RequireAdminScope` and gains `GroupToRole`. SettingsPage
renders the role mapping in place of the retired fields.

Tests updated:
- DashboardAuthenticatorTests: covers the new GroupToRole mapping
  (short name + DN + multi-role).
- DashboardAuthorizationHandlerTests: split into Viewer-policy and
  Admin-policy cases.
- DashboardApiKeyAuthorizationTests, DashboardApiKeyManagementServiceTests:
  authorized principal now carries the `Admin` role claim.
- DashboardCookieOptionsTests: expects root-relative login/logout paths.
- GatewayApplicationTests: dashboard component routes registered at `/`,
  `/sessions`, … and gated by `ViewerPolicy`. Filter on
  `ComponentTypeMetadata` to ignore minimal-API endpoints sharing `/`.
- GatewayOptionsTests + Validator: drop PathBase / RequireAdminScope /
  RequiredGroup assertions; add a `GroupToRole` value-validation case.
- DashboardLdapLiveTests: provides the default `GwAdmin` → `Admin`
  mapping so the live LDAP bind resolves to a role.

Verification: 473 server tests, 275 worker tests (+9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass.

This commit is intentionally UI-neutral. The sidebar layout and the
SignalR hubs that consume the new HubToken scheme land in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 01:38:33 -04:00
Joseph Doherty 397d3c5c4f rename: apply ZB.MOM.WW prefix to all client SDKs + fix pre-existing alarm-RPC breaks
Rename across every client surface using each language's idiomatic convention:

  * .NET   clients/dotnet/MxGateway.Client[.Cli|.Tests]/
             -> clients/dotnet/ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]/
             namespaces -> ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]
             contracts ProjectReference repointed to ZB.MOM.WW.MxGateway.Contracts
             sln migrated to slnx (dotnet sln migrate)
  * Python src/mxgateway -> src/zb_mom_ww_mxgateway
             src/mxgateway_cli -> src/zb_mom_ww_mxgateway_cli
             distribution: mxaccess-gateway-client -> zb-mom-ww-mxaccess-gateway-client
  * Rust   crate: mxgateway-client -> zb-mom-ww-mxgateway-client
             build.rs proto path repointed
  * Java   subprojects: mxgateway-{client,cli} -> zb-mom-ww-mxgateway-{client,cli}
             packages com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
             group   com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
             rootProject mxaccessgw-java -> zb-mom-ww-mxaccessgw-java
  * Go     generate-proto.ps1 proto path repointed; module path and
             package mxgateway kept (Go convention).
  * proto-inputs.json: generatedOutputs.python updated to new package path.
  * scripts/run-client-e2e-tests.ps1: Java CLI install path + gradle task
             updated to zb-mom-ww-mxgateway-cli.

CLI binary names (mxgw, mxgw-py, mxgw-go, mxgateway-cli) and wire-level
identifiers (MXGATEWAY_* env vars, the mxgw_<id>_<secret> API key
prefix, protobuf package names like mxaccess_gateway.v1, all MXAccess
references) intentionally NOT renamed.

Fix pre-existing alarms-over-gateway breaks unblocked by the rename:

  * mxaccess_gateway.proto: add missing public message QueryActiveAlarmsRequest
    {session_id, client_correlation_id, alarm_filter_prefix} and missing
    rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns
    (stream ActiveAlarmSnapshot). All four typed clients referenced
    these but they were absent from the proto.
  * MxAccessGatewayService.QueryActiveAlarms: implement the new RPC on
    the server, streaming from IGatewayAlarmService.CurrentAlarms with
    optional alarm_filter_prefix filter.
  * clients/dotnet/.../DiscoverHierarchyOptions.cs: add the hand-written
    .NET POCO that wraps DiscoverHierarchyRequest (referenced by
    GalaxyRepositoryClient.DiscoverHierarchyAsync but never authored).
  * Drop retired session_id field references from
    AcknowledgeAlarmRequest/AcknowledgeAlarmReply test fixtures across
    .NET, Rust, Go, and Python clients.
  * Rust integration test: add the missing stream_alarms impl on the
    fake MxAccessGateway server (the trait gained the method, fake
    didn't).
  * Rust CLI test: bump expected gatewayProtocolVersion 2 -> 3.

Regenerated artifacts updated in this commit:
  * src/ZB.MOM.WW.MxGateway.Contracts/Generated/{MxaccessGateway,MxaccessGatewayGrpc}.cs
  * clients/python/src/zb_mom_ww_mxgateway/generated/*_pb2{,_grpc}.py
  * clients/go/internal/generated/*.pb.go
(C# regenerated by Grpc.Tools on contracts build; Python and Go via
their generate-proto.ps1 scripts; Rust regenerates from .proto via
tonic-build at compile time so no checked-in artefact.)

Verification: 472 server tests, 275 worker tests (9 dev-rig skipped),
18 integration tests (live MxAccess + LDAP + Galaxy), 57 .NET client
tests, 32 Rust workspace tests, 39 Python tests, all Go packages, and
gradle build for Java all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 19:09:34 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00
dohertj2 867bf18116 alarms-over-gateway: full pipeline (#118)
Seven slices on this branch implement the full alarms-over-gateway path:

1. f711a55  A.2: WnWrapAlarmConsumer replaces aaAlarmManagedClient (wnwrapConsumer.dll, XML payload bypasses FILETIME crash)
2. 82eb0ad  A.3 in-process: AlarmDispatcher wires consumer events onto worker MxAccessEventQueue
3. 01f5e6a  A.3 worker IPC: SubscribeAlarms / UnsubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms commands + executor switch arms
4. 9b21ca3  A.3 gateway: WorkerAlarmRpcDispatcher routes RPCs through the IPC; replaces NotWiredAlarmRpcDispatcher in DI
5. 47b1fd4  A.3 auto-subscribe: SessionManager issues SubscribeAlarms on session open (gated by Alarms.Enabled config)
6. 4e02927  A.3 alarm-ack-by-name: public AcknowledgeAlarm now accepts Provider!Group.Tag references via AlarmAckByName
7. a4ed605  A.3 live smoke: end-to-end pipeline verified on dev rig; surfaced + fixed three production-relevant AVEVA quirks (SetXmlAlarmQuery required for reads, breaks acks; v2 8-arg AlarmAckByName is a stub; AlarmAckByGUID is a stub)

Known follow-ups not in scope:
 - WnWrapAlarmConsumer.PollOnce needs to be driven from the worker StaRuntime (production hosting); currently the timer-based path deadlocks on cross-apartment marshaling without an STA pump.
 - Pre-existing structure-test failure (test project ArchestrA.MxAccess ref) untouched.

Test counts at merge time:
  Worker: 195 pass / 4 skipped (live probes incl. AlarmsLiveSmokeTests) / 1 pre-existing fail
  Server: 308 pass / 0 fail
2026-05-01 12:31:27 -04:00
Joseph Doherty a4ed605f74 A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
  - Subscribe + 1st PollOnce yield real transition events
  - Field-by-field decode correct (provider, group, tag, severity,
    UTC timestamp, comment, type)
  - SnapshotActiveAlarms reflects current state
  - AcknowledgeByName(real identity) -> rc=0
  - Pipeline keeps streaming transitions on the 10s cadence post-ack

Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:

1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
   earlier discovery-doc recommendation) makes the first
   GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
   call is unnecessary because the round-trip echo is mangled was
   wrong — mangled echo or not, the call is required.

2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
   instance (returns -55). Workaround: provision a parallel
   "ack-only" wnwrap consumer that runs Initialize → Register →
   Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
   Production WnWrapAlarmConsumer now holds two COM clients;
   AcknowledgeByName always dispatches through the ack-only one.

3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
   8-arg overload returns -55 on this AVEVA build (apparently a
   stub); the v1 6-arg overload works. Production now calls the
   6-arg overload, discarding the proto's operator_domain and
   operator_full_name fields. The proto contract keeps both for
   forward-compat if AVEVA fixes the v2 method.

Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).

Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.

Test counts after this slice:
  Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
          1 pre-existing structure-fail (untouched)
  Server: 308 pass / 0 fail
Solution builds clean.

docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:17:39 -04:00
Joseph Doherty 4e02927f01 A.3 (alarm-ack-by-name): public AcknowledgeAlarm now accepts Provider!Group.Tag references
Closes the gap where the public AcknowledgeAlarm RPC required canonical
GUIDs but OnAlarmTransitionEvent.AlarmFullReference is "Provider!Group.Tag".
Adds an AVEVA AlarmAckByName path that wraps wwAlarmConsumerClass.AlarmAckByName
so callers can ack with the natural reference.

Proto:
- New MxCommandKind.AcknowledgeAlarmByName (=29).
- New AcknowledgeAlarmByNameCommand(alarm_name, provider_name, group_name,
  comment, operator_user/node/domain/full_name) on MxCommand oneof.
- AcknowledgeAlarmReplyPayload (existing) carries the AVEVA native
  status; reused for the by-name path.

Worker:
- IMxAccessAlarmConsumer + WnWrapAlarmConsumer + AlarmDispatcher +
  AlarmCommandHandler all gain an AcknowledgeByName(name, provider,
  group, comment, operator-identity) overload that maps to
  wwAlarmConsumerClass.AlarmAckByName.
- MxAccessCommandExecutor: new switch arm routes
  MxCommandKind.AcknowledgeAlarmByName to the handler. Empty alarm_name
  yields InvalidRequest; handler exceptions surface as MxaccessFailure.

Gateway:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference: parses
  "Provider!Group.Tag" with the convention that the FIRST '!' separates
  provider, the FIRST '.' after '!' separates group; tag may contain
  more dots.
- AcknowledgeAsync now branches: GUID input → AcknowledgeAlarm command
  (existing path); reference input → AcknowledgeAlarmByName command
  (new path); neither parses → InvalidRequest with a clear diagnostic.

Tests: 13 new unit tests cover each layer end-to-end:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference (3 valid + 8 invalid
  forms) including the realistic 4-component "Galaxy!TestArea.
  TestMachine_001.TestAlarm001" reference.
- WorkerAlarmRpcDispatcher.AcknowledgeAsync routes references through
  AcknowledgeAlarmByName + propagates the full operator tuple.
- Executor switch arm carries the by-name tuple and rejects empty
  alarm_name.
- AlarmDispatcher.AcknowledgeByName forwards to consumer.
- Existing fakes extended for the new overload.

Counts: server 308/0, worker 195/3 skip / 1 pre-existing structure-fail
(untouched). Solution builds clean.

End-to-end alarms-over-gateway now serves the full lmxopcua flow:
client.AcknowledgeAlarm(reference="Galaxy!TestArea.TestMachine_001.TestAlarm001",
operator_user="alice") → gateway parses → IPC AcknowledgeAlarmByName →
worker AlarmAckByName → AVEVA history. The remaining piece for full
parity is a live dev-rig smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:17:15 -04:00
Joseph Doherty 47b1fd422c A.3 (auto-subscribe): SessionManager issues SubscribeAlarms on session open
Adds the missing trigger that activates the worker's wnwrap consumer.
Without this, every session opened in OK state but the consumer never
started, so AcknowledgeAlarm/QueryActiveAlarms returned "alarm consumer
not configured" forever.

New AlarmsOptions config block (under MxGateway:Alarms):
  - Enabled (default false): gates the auto-subscribe path so existing
    deployments without alarm configuration are unaffected.
  - SubscriptionExpression: explicit AVEVA expression like
    \<machine>\Galaxy!<area>.
  - DefaultArea: fallback used when SubscriptionExpression is empty;
    composes \$(MachineName)\Galaxy!$(DefaultArea).
  - RequireSubscribeOnOpen (default false): when true, an auto-subscribe
    failure faults the session; when false, the failure is logged and
    the session stays Ready (data subscriptions keep working, alarms
    return "not subscribed" until the operator retries).

SessionManager.OpenSessionAsync gains a TryAutoSubscribeAlarmsAsync hook
that runs after MarkReady. Skips when alarms are disabled; otherwise
builds a SubscribeAlarmsCommand, invokes it on the session's worker
client, and either logs the resulting status or escalates per
RequireSubscribeOnOpen. SessionManagerException is the failure mode for
the strict path so callers in MxAccessGatewayService surface it as
session-open-failed.

Tests: 7 new unit tests cover the disabled lane, expression-driven
subscribe, DefaultArea fallback, success path, soft-failure (require
off), strict-failure (require on), and missing-config-strict-throw.
Server suite total: 295 pass / 0 fail. Solution builds clean.

End-to-end alarms-over-gateway path is now live (with config). Open a
session against a gateway with Alarms.Enabled=true + a valid
SubscriptionExpression; the worker's wnwrap consumer auto-subscribes;
QueryActiveAlarms streams snapshots; AcknowledgeAlarm acks by GUID.
Reference→GUID resolution (AlarmAckByName worker command) and the live
dev-rig smoke test remain follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:10:13 -04:00
Joseph Doherty 9b21ca3554 A.3 (gateway dispatcher): WorkerAlarmRpcDispatcher routes alarm RPCs over the worker pipe
Replaces NotWiredAlarmRpcDispatcher in DI with a production
implementation that issues the new MxCommandKind.{AcknowledgeAlarm,
QueryActiveAlarms} commands across the IPC and unwraps the resulting
MxCommandReply into the public RPC types.

QueryActiveAlarms is fully wired: builds the QueryActiveAlarmsCommand
(forwarding alarm_filter_prefix), invokes it on the resolved
GatewaySession's worker client, and yields each ActiveAlarmSnapshot
from the QueryActiveAlarmsReplyPayload as the RPC stream. Worker
failures + missing sessions yield an empty stream — matches the
ConditionRefresh contract clients already speak to.

AcknowledgeAlarm is partially wired: the public RPC takes
AlarmFullReference (Provider!Group.Tag), but the worker's wnwrap
consumer acks by GUID. Strategy:
- If AlarmFullReference parses as a canonical GUID, forward it
  directly through MxCommandKind.AcknowledgeAlarm. Native status
  flows back via MxCommandReply.Hresult and the dedicated
  AcknowledgeAlarmReplyPayload.NativeStatus.
- Otherwise, return InvalidRequest with a clear diagnostic naming the
  follow-up — reference→GUID lookup needs a worker-side AlarmAckByName
  command wrapping wwAlarmConsumerClass.AlarmAckByName.

DI: SessionServiceCollectionExtensions registers WorkerAlarmRpcDispatcher
as the default IAlarmRpcDispatcher; MxAccessGatewayService picks it up
via constructor injection. NotWiredAlarmRpcDispatcher is retained for
test fixtures that want the no-side-effect fake.

Tests: 7 new unit tests cover session-not-found short-circuit, GUID-vs-
reference branching, native-status propagation, worker MxaccessFailure
diagnostic propagation, and snapshot-stream yielding. Server test
suite total: 288/0 fail. Solution builds clean.

End-to-end alarms-over-gateway pipeline status:
  consumer → sink → queue (A.2 + A.3 in-process slice)
  worker IPC commands (A.3 worker slice)
  gateway dispatcher (this slice)

Remaining for full E2E:
  - Auto-issue SubscribeAlarms on session open (or add a public
    SubscribeAlarms RPC). Without this trigger the consumer never
    starts and Acknowledge/Query return "not subscribed".
  - AlarmAckByName worker command for ack-by-reference.
  - End-to-end live test against the dev rig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:58:40 -04:00
Joseph Doherty 01f5e6ad91 A.3 (worker IPC slice): proto SubscribeAlarms/Acknowledge/QueryActive commands + executor routing
Adds the worker-side IPC surface for the alarm subsystem so the gateway
can drive the AlarmDispatcher across the named-pipe boundary. Adds four
proto MxCommandKind values + matching command messages and two
MxCommandReply payload variants:

- SubscribeAlarmsCommand(subscription_expression)
- UnsubscribeAlarmsCommand
- AcknowledgeAlarmCommand(alarm_guid, comment, operator_user/node/domain/full_name)
- QueryActiveAlarmsCommand(alarm_filter_prefix)
- AcknowledgeAlarmReplyPayload(native_status)
- QueryActiveAlarmsReplyPayload(repeated ActiveAlarmSnapshot snapshots)

Worker plumbing:

- New IAlarmCommandHandler interface + AlarmCommandHandler production
  impl. Lazy-creates an AlarmDispatcher (with a wnwrap-backed consumer
  by default) on the first SubscribeAlarms; routes Acknowledge / QueryActive /
  Unsubscribe through it. Idempotent under repeated Unsubscribe; rejects
  a second Subscribe without an intervening Unsubscribe; cleans up the
  consumer if the underlying Subscribe call throws.
- MxAccessCommandExecutor: 4 new switch arms map MxCommandKind values to
  IAlarmCommandHandler calls. Acknowledge surfaces the AVEVA native
  status into both MxCommandReply.Hresult and the dedicated
  AcknowledgeAlarmReplyPayload.NativeStatus so gateway-side consumers
  can echo it without unpacking the outer envelope. Invalid GUIDs and
  missing payloads return InvalidRequest; handler exceptions return
  MxaccessFailure with the exception message in DiagnosticMessage.
- MxAccessStaSession: new constructor overload accepts an
  alarmCommandHandlerFactory; it's invoked on the STA thread during
  StartAsync and the resulting handler is passed into the executor.
  ShutdownGracefullyAsync + Dispose tear it down on the STA before the
  data-side cleanup runs.

Tests: 20 new unit tests covering AlarmCommandHandler lazy lifecycle
(Subscribe/Unsubscribe/Acknowledge/Query/Dispose, error paths) and the
executor's 4 alarm switch arms (OK/InvalidRequest/MxaccessFailure paths,
hresult propagation, prefix filtering). Worker test suite total: 192
passed / 3 skipped (live probes) / 1 pre-existing structure-test fail
(untouched).

Deferred to next slice: gateway-side WorkerAlarmRpcDispatcher that
replaces NotWiredAlarmRpcDispatcher, builds + sends these commands across
the IPC, and unwraps the resulting MxCommandReply into AcknowledgeAlarmReply
/ ActiveAlarmSnapshot stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:52:04 -04:00
Joseph Doherty 82eb0ad569 A.3 (in-process slice): AlarmDispatcher wires consumer events onto event queue
Adds the in-process plumbing that connects WnWrapAlarmConsumer's
AlarmTransitionEmitted stream to the worker's MxAccessEventQueue via
MxAccessAlarmEventSink. With this change a transition raised by the
consumer lands as an OnAlarmTransitionEvent proto on the queue,
SessionId attached, ready for IPC dispatch.

Mapping: provider!group.tag → AlarmFullReference, tag → SourceObjectReference,
priority → severity, wnwrap STATE → AlarmConditionState (Active /
ActiveAcked / Inactive — wnwrap's ack-vs-unack-on-cleared distinction
collapses since OPC UA Part 9 doesn't model it). State delta drives
AlarmTransitionKind via the existing AlarmRecordTransitionMapper table.

Holding off on the proto IPC additions (SubscribeAlarms /
AcknowledgeAlarm / QueryActiveAlarms commands + WorkerAlarmRpcDispatcher)
for a follow-up — those touch every layer of the worker IPC and warrant
their own PR. This slice proves the consumer→sink→queue pipeline
end-to-end with unit tests and clears the path for the proto additions
to plug in cleanly.

Tests: 10 new unit tests cover field-by-field mapping, the
"unchanged-state-doesn't-emit" filter, the state→transition kind table,
Subscribe / Acknowledge passthrough, SnapshotActiveAlarms → proto
ActiveAlarmSnapshot mapping, and Dispose detaches the handler. All
passing; total worker test count 172/3 skip / 1 pre-existing structure
fail (untouched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:52:35 -04:00
Joseph Doherty f711a55be4 A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient`
to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by
`wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML
payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime
auto-marshaling that crashed `GetHighPriAlarm` with
ArgumentOutOfRangeException on every poll. Live captured 60/60 polls
clean against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform
script flipped TestMachine_001.TestAlarm001 every 10s; the GUID,
priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps
arrived end-to-end.

Implementation:
- `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under
  `lib/` so dev boxes don't need the SDK to build.
- New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a
  500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the
  snapshot keyed by alarm GUID, and raises one
  `MxAlarmTransitionEvent` per state change. Includes the
  Initialize→Register-before-Subscribe ordering fix found during
  Discovery probe runs.
- New library-agnostic types `MxAlarmSnapshotRecord` /
  `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build
  path is testable without an AVEVA install.
- `AlarmRecordTransitionMapper` retired the COM-coupled
  `MapTransitionKind(eAlmTransitions)`; new pure helpers
  `ParseStateKind`, `MapTransition(prev, curr)`, and
  `ParseTransitionTimestampUtc` cover XML decode + state-delta logic.
- `IMxAccessAlarmConsumer` event surface changed from
  `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>`
  and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` —
  decoupling the interface from any specific COM library.
- Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider`
  refs; adds `Interop.WNWRAPCONSUMERLib`.

Tests:
- 36 new unit tests (state-string mapping, prev/current → proto kind
  decision table, timestamp UTC reassembly, XML payload parser, 32-char
  hex GUID round-trip) covering everything that doesn't touch the live
  COM surface — all passing.
- Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives
  the live capture flow for regression / future probes.

Docs:
- `docs/AlarmClientDiscovery.md` "Option A — captured" section records
  sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip
  (prefer `Subscribe` for filtering), the `GetStatistics`
  AccessViolationException quirk, and the worker-integration outline.

Pre-existing failure noted (separate):
`MxAccessInteropReference_ExistsOnlyInWorkerProject` was already
failing on HEAD — the test project still references `ArchestrA.MxAccess`
for the Skip-gated discovery probes. Not regressed by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:44:15 -04:00
Joseph Doherty f490ae2593 docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements
only IDisposable (no [ComImport] interface, no class GUID) and
has a single field "CwwAlarmConsumer* m_almUnmanaged". So
AlarmClient is a C++/CLI managed wrapper around a native C++
class -- NOT a COM-interop class. The DateTime conversion happens
INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling
boundary. There's no separate COM interface to QI to.

Revised approach (in docs/AlarmClientDiscovery.md):

A. wnwrapConsumer.dll -- separate standalone COM library AVEVA
   ships at "C:\Program Files (x86)\Common Files\ArchestrA"
   exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with
   SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output
   bypasses FILETIME marshaling entirely. Best fit -- real COM,
   self-contained, conventional production-grade approach.
B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a
   vendor binary, brittle to upgrades.
C. Reflect into m_almUnmanaged and call native vtable directly
   -- requires reverse-engineering the C++ class layout.

Picking A. Probe restored to Skip; next commit starts the
wnwrapConsumer integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:15:37 -04:00
Joseph Doherty 39f9fd8946 probe: BREAKTHROUGH — alarms flow via canonical \Node\Galaxy!Area, blocked by DateTime marshaling
Two findings that turn the alarm capture path on:

1. Subscription expression: \<MachineName>\Galaxy!<Area> is the
   canonical AlarmClient subscription format per ArchestrA docs:
   \Node\Provider!Area!Filter, with Provider literally "Galaxy"
   (not the Galaxy name) and Node being the machine name. For
   this rig: \DESKTOP-6JL3KKO\Galaxy!DEV catches alarms.
2. InitializeConsumer before RegisterConsumer — discovered
   earlier; bug-fix for PR A.5's AlarmClientConsumer.

With these in place, GetHighPriAlarm returned a record on every
poll for 60s straight (117/117 calls). But every call throws
ArgumentOutOfRangeException: Not a valid Win32 FileTime, because
AlarmRecord has five DateTime fields (ar_Time / ar_OrigTime /
ar_AckTime / ar_RtnTime / ar_SubTime) and AVEVA writes sentinel
FILETIME values for unset ones (e.g., ar_AckTime on an
unacknowledged alarm). The aaAlarmManagedClient.dll auto-marshals
FILETIME -> DateTime and rejects out-of-range values.

GetStatistics still reports total=0 active=0 even with
GetHighPriAlarm returning records — those two APIs have
different views. The active read API for current alarms is
GetHighPriAlarm, not GetStatistics's change array.

So the consumer chain works. The blocking issue is now
extracting the payload past the AVEVA-shipped DateTime
auto-marshaling. Three approaches for the next PR:

1. Patch aaAlarmManagedClient.dll via ildasm/ilasm round-trip.
2. Define a custom [ComImport] interface with safe-blittable
   types and Marshal.QueryInterface to it.
3. Use IDispatch late binding to bypass strong-typed marshaling.

Option 2 is cleanest; needs the AlarmClient COM IID.

Probe changes:

- Subscription expression set to \<MachineName>\Galaxy!DEV.
- GetHighPriAlarm tally counters (ok-with-record vs throw).
- 117 throws / 0 ok-with-record over 60s confirms alarms are
  flowing continuously while the user's flip script runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:06:45 -04:00
Joseph Doherty bb7be14d1d probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified
Sixth probe iteration with every consumer-side knob exhausted:

- Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!,
  \Galaxy!TestArea, \.\Galaxy!.
- Read channels polled at 500ms: GetStatistics, GetHighPriAlarm,
  SFCreateSnapshot + SFGetStatistics.
- Filters: priority 0..32767, qtSummary + qtHistory both tried,
  asAlarmActiveNow.
- AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug
  on default(DateTime).

Result: every read API returns empty for the entire 60s window
even with TestMachine_001.TestAlarm001 firing every 10s and
aaObjectViewer confirming InAlarm transitions. The
aaAlarmManagedClient.AlarmClient is not the receive surface
AVEVA's alarm pipeline routes to in this Galaxy configuration.

The consumer chain is verified working end-to-end: Initialize +
Register + Subscribe all succeed, GetProviders finds the
provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's
internal hwnd. There is simply no alarm data flowing through
this consumer surface.

Next investigation is not consumer-side: either find the SDK
aaObjectViewer's alarm panel uses, or query the historian
event storage directly. If alarms only flow via the historian
path on this customer's Galaxy, the worker's PR A.5 architecture
is a dead-end and A.2 needs a different transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:26:29 -04:00
Joseph Doherty 8ac6642bf8 probe: subscribe-parameter sweep — alarms still absent, producer-side blocked
Tried every documented subscription knob with InitializeConsumer
present + provider visible at status 100:

- qtSummary AND qtHistory (the only eQueryType values).
- Priority 1..999 AND 0..32767.
- FilterMask/Spec asNone AND asAlarmActiveNow.

eAlarmFilterState is single-state-valued (asNone=0,
asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits,
so the filter surface is exhausted.

GetStatistics continued to report total=0 active=0 codes=[7]
for every poll across all combinations.

User confirmation: the BoolAlarm extension on
TestMachine_001.TestAlarm001 is evaluating (the $Alarm.InAlarm
sub-attribute flips true/false in lockstep with the script
writes, visible in aaObjectViewer). So the consumer chain is
verified working end-to-end on our side. What's missing is
producer-side publication into the aaAlarmManagedClient stream.

Probable causes (config, not code):

- BoolAlarm extension's "publish to alarm manager" / "Active" /
  "Enabled" flag may be off.
- Alarm-vs-event mode setting may have it routing to events,
  not alarms.
- Platform alarm area may not match the consumer's subscription
  scope.

Resolution path: check the BoolAlarm extension's config in System
Platform IDE; check aaObjectViewer's Active Alarms panel (not
attribute panel) to see if the alarm appears there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:53:26 -04:00
Joseph Doherty 4e8928cf71 probe: InitializeConsumer required — provider visible after, alarms still absent
InitializeConsumer was the missing call. Adding it before
RegisterConsumer makes the \Galaxy! provider appear in
GetProviders (status 0 -> 100 within 500ms). Without Initialize,
GetProviders returns an empty list even though everything else
returns rc=0 (success).

Probe trace 2026-05-01:

  InitializeConsumer -> 0
  RegisterConsumer -> 0
  GetProviders [after Register] -> count=0 list=[]
  Subscribe('\Galaxy!') -> 0
  GetProviders [after Subscribe] -> count=1 list=[  0 \Galaxy!]
  GetProviders [poll #1] -> count=1 list=[100 \Galaxy!]

Despite the provider being at "100% query complete" for the
entire 60s window, GetStatistics continued to report
total=0 active=0 codes=[7] -- no alarm transitions reached the
consumer even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run.

So the consumer chain works end-to-end. What's missing is alarm
traffic from the producer side. The next discriminator is
whether ObjectViewer (or another live consumer) sees the alarm
fire while the script runs.

API-ordering bug fix to apply to PR A.5's AlarmClientConsumer
regardless of how A.2 lands: AlarmClientConsumer.Subscribe
should call InitializeConsumer before RegisterConsumer (currently
omits Initialize entirely, which means the provider chain is
never visible from the worker either). That fix lifts a
fundamental bug independent of the polling-vs-callback question.

Probe changes:

- Added InitializeConsumer call before RegisterConsumer.
- Added LogProviders helper that logs only on change; called
  after Register, after Subscribe, and on every poll. Easier
  to spot when the provider chain transitions from empty to
  populated.
- Restored Skip-gating after run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:43:06 -04:00
Joseph Doherty f4423dfb6d probe: GetProviders=0 — alarm path upstream-blocked on dev rig
Extended AlarmClientWmProbeTests to call AlarmClient.GetProviders
after RegisterConsumer. Run 2026-05-01:

  GetProviders -> rc=0 count=0 list=[]

Zero alarm providers visible to the consumer. This explains every
preceding probe run — no providers means no alarm events,
regardless of subscription expression or value writes upstream.
Even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run,
GetStatistics reported no transitions, no positions[] entries,
no field changes after t=0.85s.

Possible causes (dev-rig configuration, not code):

1. No $Alarm extension on the test bool — flipping the value
   writes a value but doesn't fire an alarm.
2. AVEVA alarm-manager service (aaAlarmMgr or equivalent) not
   running on this rig.
3. Process security context — providers registered under a
   service account aren't visible to a consumer running under
   a normal user account.

A.2 implementation is blocked on this until at least one provider
is visible. Once a provider exists, the polling-vs-callback
question is answerable in one probe run; without a provider both
paths return the same "nothing happening" answer.

Probe changes:

- Added in-process MxAccess Write attempt (TriggerWriteValue) —
  hit TargetParameterCountException so the Write signature is
  not (handle, item, value); reflection diag added but not
  resolved. Now disabled in favor of external trigger.
- Added GetProviders enumeration after RegisterConsumer.
- Removed firePrint/clearPrint markers; probe is observe-only.
- Added ArchestrA.MxAccess reference to the test project.

Also updated docs/AlarmClientDiscovery.md with the
alarm-provider-visibility section explaining what's blocked
and why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:37:15 -04:00
Joseph Doherty 3ff4969224 probe: GetStatistics polling viable, Galaxy has no active alarms today
Extended AlarmClientWmProbeTests.ProbeAlarmClientWmMessages to also
call GetStatistics every ~2s during the pump window. Re-ran on the
dev rig 2026-05-01:

- GetStatistics is safely callable from the same thread that did
  RegisterConsumer + Subscribe. Every poll (9 calls / 20s window)
  returned rc=0, no exceptions.
- Galaxy currently has zero active alarms. total=0 active=0
  suppressed=0 newAlarms=0 across every poll. positions[] and
  handles[] arrays were empty.
- changes=1 codes=[7] was constant across all polls, matching the
  constant 1 Hz WM 0xC275 cadence — same heartbeat semantics
  exposed through both the WM path and the pull API.

Confirms the polling design is mechanically viable: GetStatistics
threading-affinity is fine and the call is cheap. The remaining
unknown is whether GetStatistics populates positions[] / handles[]
with real entries when an alarm actually fires. Proving that
requires triggering an alarm — next probe is an MxAccess write to a
$Alarm-extended boolean tag (reference pending).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:16:08 -04:00
Joseph Doherty 12881ca791 docs+test: live AlarmClient WM probe — heartbeat-only, hWnd not used
Added MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs as a Skip-gated
runtime probe. Run on the dev rig 2026-05-01 against the live AVEVA
install (Galaxy reachable, no manual alarm fired). Findings:

- RegisterConsumer(hWnd, ...) and Subscribe("\Galaxy!", ...) both
  return 0 (success). Calls are valid against the deployed assembly.
- A registered-message-class WM (ID 0xC275 in this OS session) fires
  every ~1 second after Subscribe completes. Constant wParam=0x1100,
  constant lParam=0x079E46D8 — looks like a heartbeat / keepalive,
  not a per-change notification.
- Critically, this WM is delivered to AVEVA's own internal window
  (hwnd=0x18032E), NOT to the consumer hWnd we registered. The
  consumer window receives only the standard WM_CREATE / WM_DESTROY
  sequence; no AVEVA traffic in between.

This invalidates the WM_APP-pump design previously documented. The
hWnd parameter to RegisterConsumer appears to be a registration
identity only — AVEVA's notification path runs entirely against
AVEVA's own internal window.

Two viable A.2 designs replace the previous one:

1. Polling. Call GetStatistics on a 500ms timer in the worker's STA
   and react to whatever change set it reports. No window plumbing
   needed. Latency floor = poll period. Matches AVEVA's own
   internal heartbeat cadence.
2. Hook AVEVA's internal window. Discover AVEVA's own hwnd,
   SetWindowSubclass on it, intercept WM 0xC275 on AVEVA's thread.
   Higher fidelity, lower latency, but invasive and fragile across
   AVEVA upgrades — likely a non-starter.

Recommendation in docs/AlarmClientDiscovery.md is option 1 (polling)
unless a follow-up probe with a real fired alarm shows AVEVA does
post change-specific WMs to a different hWnd.

Open follow-up probes documented:

- Fire a real Galaxy alarm during pump and check whether WM 0xC275
  cadence changes or GetStatistics returns non-empty arrays.
- GetStatistics threading affinity test.
- Hook AVEVA's internal window 0x18032E.
- Decompile aaAlarmManagedClient IL for RegisterConsumer to find
  whether WNAL_Register's callback surface is wrapped.

Test project changes:

- Added Reference to aaAlarmManagedClient + IAlarmMgrDataProvider
  (Private=true so the DLL gets copied into bin for test load).
- Test-suite-wide: 127 real tests still pass; both alarm-related
  Skip-gated tests skip cleanly.

Code change to the probe is additive — the worker is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:05:47 -04:00
Joseph Doherty 6e356da092 docs: AlarmClient public surface — managed-event premise wrong, WM_APP required
Reflection probe of the deployed aaAlarmManagedClient.dll
(v1.0.7368.41290) on 2026-05-01 confirmed the public AlarmClient class
exposes zero public events. The PR A.5 design that AlarmClientConsumer
is built on (managed-event surface, no message pump) does not hold
against this assembly.

The actual notification mechanism is WM_APP messaging:
RegisterConsumer(hWnd, ...) takes a window handle because AVEVA's alarm
provider WM_APP-pokes the registered window, then GetStatistics +
GetAlarmExtendedRec pull the change set on each poke.

Practical impact:

- AlarmClientConsumer.AlarmRecordReceived has no production caller.
  RaiseAlarmRecordReceived is invoked only from tests. Subscribe(...)
  returns OK from RegisterConsumer + Subscribe but no notifications
  reach the consumer at runtime because no window is attached.
- Until A.2 lands a hidden message-only window + WindowProc that routes
  WM_APP into MxAccessAlarmEventSink.EnqueueTransition, the gateway's
  MX_EVENT_FAMILY_ON_ALARM_TRANSITION family cannot carry events.
- AcknowledgeByGuid and SnapshotActiveAlarms are pull-style and remain
  correct as written.

Changes:

- docs/AlarmClientDiscovery.md (new) — reflection probe summary, full
  AlarmClient method list, open questions for A.2 implementation.
- AlarmClientConsumer.cs xmldoc — replaced the inaccurate "managed
  event surface" claim with the WM_APP finding; flagged
  AlarmRecordReceived as unreachable in production until the WM_APP
  pump lands.
- MxAccessAlarmEventSink.cs xmldoc — replaced the "verify on dev rig"
  hedge in the wiring plan with the resolved finding; expanded the
  open-questions list (WM_APP message ID, wParam/lParam semantics, STA
  affinity, subscription scope) so the next A.2 PR knows what the
  dev-rig probe needs to answer.

Code-only no-op for the worker; worker builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:50:57 -04:00
dohertj2 a739fadb5f Merge pull request 'gateway: alarm-RPC dispatcher seam (PRs A.6 + A.7)' (#117) from track-a6-a7-alarm-rpc-dispatch into main 2026-04-30 22:50:07 -04:00
Joseph Doherty 6b3c117d1e gateway: alarm-RPC dispatcher seam (PRs A.6 + A.7)
Replaces the inline diagnostic strings in PR A.3's AcknowledgeAlarm
+ QueryActiveAlarms handlers with an IAlarmRpcDispatcher seam.

- IAlarmRpcDispatcher (new) — gateway-side abstraction over the
  worker-RPC path that fronts AlarmClient.AlarmAckByGUID and the
  active-alarm walk. AcknowledgeAsync returns the
  AcknowledgeAlarmReply directly; QueryActiveAlarmsAsync yields an
  IAsyncEnumerable<ActiveAlarmSnapshot>.
- NotWiredAlarmRpcDispatcher (new, default impl) — returns
  PROTOCOL_STATUS_OK with a structured worker-pending diagnostic
  on Acknowledge, yields an empty stream on QueryActiveAlarms.
  Same observable shape as PR A.3, but the integration seam is
  now in code instead of hardcoded inside the handler.
- MxAccessGatewayService — handlers delegate to the dispatcher.
  Constructor accepts an optional IAlarmRpcDispatcher (default
  NotWiredAlarmRpcDispatcher); a future WorkerAlarmRpcDispatcher
  registration in DI swaps in the live worker-IPC routing without
  changing the public RPC surface.
- 2 new dispatcher tests pin the not-wired contract; 279 → 281
  total tests, all green.

Worker-side dispatch (translating Acknowledge / QueryActiveAlarms
to the IPC method that calls IMxAccessAlarmConsumer from PR A.5)
is the dev-rig follow-up — it depends on validating the AVEVA
GetAlarmChangesCompleted event subscription against a live alarm
provider before pinning a wire format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:47:42 -04:00
dohertj2 c7d5b83390 Merge pull request 'worker: AlarmClientConsumer + transition mapper (PR A.5)' (#116) from track-a5-alarm-consumer-wiring into main 2026-04-30 22:44:51 -04:00
Joseph Doherty 1ac5bcafb2 worker: AlarmClientConsumer + transition mapper (PR A.5)
Wires the worker-side consumer for AVEVA alarm transitions over the
aaAlarmManagedClient API discovered in the prior foundation PR.

- IAlarmMgrDataProvider.dll referenced — exposes AlarmRecord +
  eAlmTransitions / eQueryType / eSortFlags / eAlarmFilterState.
  Both DLLs (aaAlarmManagedClient + IAlarmMgrDataProvider) load in
  the worker's existing net48 x86 process; no new bitness boundary.
- IMxAccessAlarmConsumer abstraction — Subscribe / AcknowledgeByGuid
  / SnapshotActiveAlarms / AlarmRecordReceived event. Test seam.
- AlarmClientConsumer production wrapper — RegisterConsumer +
  Subscribe + AlarmAckByGUID + GetStatistics-based active-alarm
  walk, all delegated to AlarmClient. Uses AVEVA's managed event
  surface (GetAlarmChangesCompleted on IAlarmMgrDataProvider) so
  no Windows message pump is required — plain .NET events arrive
  on the alarm-client's internal callback thread.
- AlarmRecordTransitionMapper — pure-function helpers:
    MapTransitionKind(eAlmTransitions): ALM→Raise, ACK→Acknowledge,
        RTN→Clear, others (SUB/ENB/DIS/SUP/REL/REMOVE)→Unspecified
        so EventPump's decoding-failure counter records them.
    ComposeFullReference(provider, group, name): Provider!Group.Name
        format matching AVEVA's standard alarm-reference syntax.

Pinned during dev-rig validation (subsequent commits):

1. Confirm RegisterConsumer accepts hWnd=0 — if it requires a real
   hwnd, the worker creates a hidden message-only window and
   passes that handle. The managed event surface should make
   this irrelevant but the AVEVA API is older than its managed
   wrapper.
2. Wire AlarmClientConsumer.AlarmRecordReceived: the AVEVA
   IAlarmMgrDataProvider.GetAlarmChangesCompleted event needs to
   be hooked from inside the AlarmClient — find the proper
   accessor (likely a property exposing the inner provider).
3. AlarmRecord field-by-field translation into the proto event
   uses MxAccessAlarmEventSink.EnqueueTransition (existing
   plumbing). The AlarmRecord field names (ar_OrigTime,
   AlarmName, AckOperatorFullName, AckComment, etc.) are
   pinned in the discovery dump preserved in
   AlarmClientDiscoveryTests.

Tests: 127 pass (4 new ComposeFullReference cases + 1 Skip-gated
discovery probe). Transition-kind enum mapping is dev-rig-validated
rather than unit-tested because the AVEVA assembly is Private=false
on the reference and isn't copied to the test bin directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:42:22 -04:00
dohertj2 e7c2c546b5 Merge pull request 'worker: aaAlarmManagedClient discovery + reference (alarm-helper foundation)' (#115) from track-alarm-helper-discovery into main 2026-04-30 22:20:07 -04:00
Joseph Doherty a14098468b worker: aaAlarmManagedClient discovery + reference (alarm-helper foundation)
Discovers the surface of aaAlarmManagedClient.dll and stages the worker
csproj reference so subsequent PRs can wire native MxAccess alarm
subscription. Replaces the speculative "operator decision needed
between path 1 and path 2" framing in MxAccessAlarmEventSink with the
validated architecture.

Key findings from the discovery probe:

1. aaAlarmManagedClient.dll is x86 + .NET Framework (mixed-mode
   C++/CLI; PE Machine = i386, NativeEntryPoint flag set). The
   "x64-only" framing in the prior follow-up was wrong — confused
   by the file path under Wonderware\Historian\x64\.
   The assembly is bitness- and runtime-compatible with the
   worker (net48 x86), so it loads in the existing process. No
   sub-process needed.

2. AlarmClient is the public class. Its model mirrors MxAccess:
   RegisterConsumer takes a Windows hWnd and the AVEVA alarm
   service WM_APP-pokes that hwnd when alarms change. The worker's
   existing STA + WM_APP pump can drive both the data-change COM
   subscriber and the alarm-client consumer.

3. AlarmAckByGUID(alarmGuid, ackComment, oprName, oprNode,
   oprDomain, oprFullName) — the native ack carries the operator's
   full identity atomically with the comment. Closes the v1
   operator-comment fidelity gap completely.

This PR:

- Adds the aaAlarmManagedClient.dll reference to MxGateway.Worker.
  csproj. Worker still builds clean.
- Adds AlarmClientDiscoveryTests as a Skip-gated reflection probe;
  flip the Skip parameter to dump the public type surface for
  reference. Captured the dump into MxAccessAlarmEventSink
  documentation so it doesn't have to be re-run.
- Replaces MxAccessAlarmEventSink's "two paths forward" doc with
  the actual wiring plan against AlarmClient's RegisterConsumer +
  Subscribe + AlarmAckByGUID surface.

Subsequent PRs (gated on STA + WM_APP integration testing on the
dev rig):

- Wire RegisterConsumer + Subscribe at session-startup; route
  WM_APP messages through GetStatistics + GetAlarmExtendedRec into
  EnqueueTransition.
- Translate gateway-side AcknowledgeAlarm RPC to a worker command
  that calls AlarmAckByGUID with the OPC UA operator's identity;
  replaces the worker-pending diagnostic from PR A.3.
- Translate gateway-side QueryActiveAlarms to a worker command
  that walks GetStatistics's reported handles via GetAlarmExtendedRec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:17:38 -04:00
dohertj2 e030661c1b Merge pull request 'worker: document MXAccess Toolkit alarm-API gap (A.2 follow-up)' (#114) from track-a2-followup-com-api-finding into main 2026-04-30 21:30:58 -04:00
Joseph Doherty 4e933802a7 worker: document MXAccess Toolkit alarm-API gap (A.2 follow-up)
PR A.2 ship-pin discovery: the MXAccess COM Toolkit installed at
C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll
does not expose any alarm-event family. Reflection enumeration of
the assembly confirms ILMXProxyServerEvents and
ILMXProxyServerEvents2 only carry OnDataChange, OnWriteComplete,
OperationComplete, and OnBufferedDataChange — no IAlarmEventSink,
no Alarms collection, no OnAlarmTransition.

AVEVA's separate alarm-subscription managed assemblies
(aaAlarmManagedClient.dll under InTouch\ViewAppFramework\Content\MA\
and ArchestrAAlarmsAndEvents.SDK.Common.dll under
Wonderware\Historian\x64\) exist on this box but are x64-only —
incompatible with the worker's x86 bitness, which is the bitness
constraint the mxaccessgw architecture exists to isolate in the
first place.

This commit replaces the speculative "TBD pin during dev-rig
validation" comment in MxAccessAlarmEventSink with the actual
finding plus the two operator-facing paths forward:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). lmxopcua's AlarmConditionService already synthesizes
   Part 9 transitions from the four MXAccess sub-attributes.
   Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions to the
   worker over a small named-pipe IPC. Recovers full v1 fidelity
   but adds operational complexity.

Until that decision resolves, the sink's Attach is a no-op, the
worker continues to function for data subscriptions, and
lmxopcua-side AlarmConditionService keeps the sub-attribute
synthesis active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:28:31 -04:00
dohertj2 6c3edf4516 Merge pull request 'gateway: AcknowledgeAlarm + QueryActiveAlarms handler tests (PR A.4)' (#113) from track-a4-conditionrefresh-coverage into main 2026-04-30 21:23:20 -04:00
Joseph Doherty 9de2c0c43d gateway: AcknowledgeAlarm + QueryActiveAlarms handler tests (PR A.4)
Nineteenth (final) PR of the alarms-over-gateway epic. Pins the
public RPC handler contract added in PR A.3:

- AcknowledgeAlarm rejects empty session_id and empty
  alarm_full_reference with InvalidArgument.
- AcknowledgeAlarm with valid input returns OK and a
  worker-pending diagnostic so clients see a successful round-trip
  even before A.2's worker dispatch lands.
- QueryActiveAlarms rejects empty session_id with InvalidArgument.
- QueryActiveAlarms with valid input streams zero snapshots until
  PR A.2 wires the worker-side QueryActiveAlarmsCommand
  (filter-prefix passthrough verified at the proto layer).
- OpenSession advertises both new RPC capability strings
  (unary-acknowledge-alarm, server-stream-active-alarms) so client
  capability negotiation lights up against the contract surface.

Closes Track A's gateway-side surface. The remaining worker
ConditionRefresh walk + integration parity-rig validation lands
during dev-rig hardware validation alongside PR A.2's COM-side
alarm subscription pin.

Tests: 279 passed (was 273; 6 new). Per-handler integration tests
land alongside the dev-rig validation when the worker walks the
real MxAccess active-alarm collection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:20:57 -04:00
dohertj2 bc61598b44 Merge pull request 'worker: alarm event mapper + sink scaffold (PR A.2 — partial)' (#112) from track-a2-worker-alarm-mapper into main 2026-04-30 21:18:53 -04:00
Joseph Doherty 335c952f00 worker: alarm event mapper + sink scaffold (PR A.2 — partial)
Eighteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the proto-build path that
the worker uses to create OnAlarmTransition events. The COM-side
subscription that registers an alarm event sink against the MXAccess
Toolkit is pinned during dev-rig validation — the exact API differs
across AVEVA versions and needs hardware to verify.

Lands today (unit-testable, no hardware needed):
- MxAccessEventMapper.CreateOnAlarmTransition — mechanical proto
  builder. Takes decoded alarm fields (full reference, source
  object, alarm type, transition kind, severity, timestamps,
  operator user/comment, category, description) and produces an
  MxEvent with the OnAlarmTransition body populated. Mirrors the
  pattern of CreateOnDataChange / CreateOnWriteComplete / etc.
- MxAccessAlarmEventSink — scaffolded class with documented
  Attach / Detach + an internal EnqueueTransition entry point.
  When dev-rig validation pins the MXAccess Toolkit alarm
  subscription API, the only edit needed is to wire the COM
  delegate inside Attach to call EnqueueTransition. The mapper
  bridge is already done.

Pending dev-rig validation:
- Pin the MXAccess Toolkit alarm event source COM API (likely one
  of IAlarmEventSink, IAlarmEventSubscription, or a method on
  LMXProxyServerClass — verify against the worker host's installed
  version).
- Add cancellation/cleanup tests once the COM hook is wired.
- Integration test against the parity rig that fires a real Galaxy
  alarm and asserts the gateway emits OnAlarmTransition.

Tests:
- 2 new mapper tests pin the full-payload Acknowledge case and
  the bare-bones Raise case.
- Full Worker.Tests suite green: 123 passed (was 121; 2 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:29 -04:00
dohertj2 3256733d24 Merge pull request 'gateway: AcknowledgeAlarm + QueryActiveAlarms RPC handlers (PR A.3)' (#111) from track-a3-gateway-alarm-handlers into main 2026-04-30 21:06:20 -04:00
Joseph Doherty 4f0f03fca5 gateway: AcknowledgeAlarm + QueryActiveAlarms RPC handlers (PR A.3)
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the public RPC handler
surface that PR A.1's proto introduced. The actual worker-side
ack call + active-alarm walk depend on PR A.2 (worker MxAccess
subscription); this PR ensures clients can call the RPCs and
receive a meaningful response without UNIMPLEMENTED at the gRPC
layer.

- AcknowledgeAlarm — validates session_id + alarm_full_reference,
  resolves the session (NotFound on miss), returns a successful
  reply with a structured DiagnosticMessage indicating worker
  dispatch is pending PR A.2. Once A.2 ships, the body translates
  the request into a WorkerCommand and forwards through
  SessionManager.InvokeAsync.
- QueryActiveAlarms — validates session_id, returns an empty
  stream. PR A.4 layers the actual ConditionRefresh implementation
  once the worker's QueryActiveAlarmsCommand is available.
- OpenSessionReply.Capabilities advertises both new RPCs
  (unary-acknowledge-alarm, server-stream-active-alarms) so
  clients can negotiate against the contract surface.

OnAlarmTransition events flow through the existing StreamEvents
path automatically — EventStreamService and MxAccessGrpcMapper
forward whatever family the worker emits without filtering, so
no changes are needed there for A.3.

Tests: full 273-test suite still green. Per-handler unit tests
ship with PR A.4's expanded surface; A.3's stub handlers are
narrow enough that the existing parity-fixture tests cover the
contract round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:03:58 -04:00
dohertj2 9ca200f814 Merge pull request 'clients/rust: SDK methods for AcknowledgeAlarm + QueryActiveAlarms (PR E.6)' (#110) from track-e6-rust-alarm-sdk into main 2026-04-30 17:08:37 -04:00
Joseph Doherty fe19c478c0 clients/rust: SDK methods for AcknowledgeAlarm + QueryActiveAlarms (PR E.6)
Eleventh PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Mirrors PR E.2's .NET surface
on the Rust async SDK. Depends on PR E.1 (regen, merged).

- GatewayClient::acknowledge_alarm — async unary call. Uses the
  existing unary_request helper (call timeout) and routes failures
  through Error mapping; non-OK protocol status promotes to
  Error::ProtocolStatus via ensure_protocol_success.
- GatewayClient::query_active_alarms — async server-streaming call
  returning a new ActiveAlarmStream type alias (parallel to
  EventStream). Errors are pre-mapped from tonic::Status; dropping
  the stream cancels the call cooperatively.
- GATEWAY_PROTOCOL_VERSION bumped 2 → 3 to match the .NET contract.
- FakeGateway test impl extends to satisfy the new trait methods so
  client_behavior.rs builds. Two new integration tests cover the
  new SDK methods.

Tests:
- 12 unit + 10 client_behavior + 4 proto_fixtures = 26 tests, all
  pass under cargo test (Rust 1.x via existing toolchain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:06:13 -04:00