2b92be02b95e03f24bc36de7bcde87a58c540be5
217 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2b92be02b9 |
code-reviews: re-review Tests at 42b0037
Append 5 new findings (Tests-027..031) covering the StreamEvents_WhenEventIsWritten_RecordsSendDuration flake root cause (shared MeterListener by meter name), missing kill-path coverage (reason propagation + concurrent-kill double-count), asymmetric guard coverage between Close and Kill, missing audit-failure-path coverage for ApiKey Delete, and the DashboardSnapshotPublisher reconnect-window timer sensitivity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
056f0d8808 |
code-reviews: re-review Server at 42b0037
Append 7 new findings (Server-044..050) covering the destructive-action wave: KillWorkerAsync metric/state leaks, ShutdownAsync kill-fallback gauge leak, inconsistent ConfirmDialog cleanup across pages, missing XML docs on the new DashboardSessionAdmin surface, and unhandled RemoveSessionAsync exception paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
42b0037376 |
Dashboard: replace inline fully-qualified type refs with @using
ApiKeysPage and GalaxyPage referenced ApiKeyConstraints and GalaxyRepositoryOptions by full namespace inside @code. Add the appropriate @using directive at the top of each file and let the short type names resolve through that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
de7639a3e9 |
Dashboard: fix 500 on / from duplicate endpoint mapping
GatewayApplication.MapGatewayEndpoints registered MapGet("/", ...)
that redirected to /health/live. After the dashboard added a Blazor
component at @page "/", both endpoints matched GET / and the matcher
threw AmbiguousMatchException, surfacing as a 500. The redirect
predated the dashboard home page — drop it so / lands on Overview.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8738735f0d |
clients: document StreamAlarms + AcknowledgeAlarm in each README
Each client's README now covers the alarms surface in both the SDK section (StreamAlarms / AcknowledgeAlarm beside the existing QueryActiveAlarms entry, with the streaming-cancellation note) and the CLI examples (stream-alarms / acknowledge-alarm invocations mirroring the in-tree implementations across .NET, Go, Rust, Python, and Java). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e80f3c70b6 |
docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only dashboard surface. GatewayDashboardDesign now documents the Close session / Kill worker controls and the new Delete action on revoked API keys, plus the ConfirmDialog gate for every destructive action. Sessions.md adds the SessionManager.KillWorkerAsync entry alongside CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path and the dashboard-delete-key audit event. DashboardInterfaceDesign drops the "read-only until admin workflows have a separate design" line in favor of the confirm-before-act invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
24cc5fd0f0 |
Dashboard: delete revoked API keys + confirm Rotate/Revoke/Delete
Add IApiKeyAdminStore.DeleteAsync that only deletes already-revoked rows (active keys must be revoked first so the revoke event lands in the audit log before the row disappears) and a matching admin-gated DashboardApiKeyManagementService.DeleteAsync. ApiKeysPage now shows Delete on revoked rows in place of the old "No actions" stub, and Rotate/Revoke/Delete all route through ConfirmDialog so each destructive action requires an explicit confirmation step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c5153d68bb |
Dashboard: fix API keys page stuck on "Loading"
ApiKeysPage.OnInitializedAsync overrode the base without chaining, so DashboardPageBase's Snapshot seed and hub connect never ran and the page rendered the null-snapshot empty state forever. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0e56b5befb |
Dashboard: confirm before Close session / Kill worker
Add a shared ConfirmDialog component and route Sessions, Workers, and SessionDetails Close/Kill buttons through it. The dialog shows the target session id and a color-matched confirm button (yellow Close, red Kill); Cancel dismisses without invoking the admin service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c5e7479ee4 |
Dashboard: admin-only Close session / Kill worker
Add IDashboardSessionAdminService (Admin-role gate, friendly errors, audit logging) wrapping a new ISessionManager.KillWorkerAsync that skips graceful shutdown and cleans up registry/metrics. Sessions, Workers, and SessionDetails pages render Close / Kill buttons only when CanManage; the service re-checks the role on every call so forged clicks return Unauthenticated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8a0c59d7e8 |
Java client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to the Java CLI. stream-alarms attaches to the gateway's central alarm feed (--filter-prefix, --limit, --json — NDJSON, one AlarmFeedMessage per line); acknowledge-alarm is a unary ack (--reference required, --comment, --operator). streamAlarms joins queryActiveAlarms on MxGatewayClient and uses a new MxGatewayAlarmFeedSubscription cancellable handle. Batch dispatch re-enters the picocli command line per stdin line, so registering the two new subcommands suffices. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
828e3e6cf6 |
Python client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to mxgw-py. stream-alarms reads a
bounded slice of the gateway's central alarm feed (--filter-prefix,
--max-messages, --timeout, --json; aggregate `{messages: [...]}`);
acknowledge-alarm is a unary ack (--reference required, --comment, --operator).
GatewayClient.stream_alarms joins query_active_alarms via a
_canceling_alarm_feed_iterator helper mirroring the existing
_canceling_active_alarms_iterator pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7de4efeb02 |
Rust client: port stream-alarms and acknowledge-alarm + fix stream-events family + 8MB Windows stack
Adds the session-less alarm CLI subcommands to mxgw. stream-alarms attaches to
the gateway's central alarm feed (--filter-prefix, --max-events, --json/--jsonl;
aggregate shape `{messageCount, messages: [...]}`); acknowledge-alarm is a unary
ack (--reference required, --comment, --operator). stream_alarms joins
query_active_alarms on GatewayClient and re-exports AlarmFeedStream.
Also extends stream-events JSON to emit a full `events` array (itemHandle, value
projected to protojson-shaped `*Value` keys, etc.) instead of just `eventCount`,
matching the other four CLIs, and renders MxEvent.family as the protobuf enum
NAME (MX_EVENT_FAMILY_ON_WRITE_COMPLETE) rather than the raw i32 so the e2e
write round-trip can recognise the OnWriteComplete echo.
Adds clients/rust/.cargo/config.toml bumping the Windows main-thread stack to
8 MB via /STACK:8388608. clap-derive's Command enum (one variant per subcommand)
overflowed the default 1 MB stack in debug builds after the new variants
landed; release builds were unaffected but the e2e matrix runs Rust via
`cargo run` (debug).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6f0d142639 |
Go client: port stream-alarms and acknowledge-alarm
Adds the session-less alarm CLI subcommands to mxgw-go. stream-alarms attaches to the gateway's central alarm feed (--filter-prefix, --limit, --json — NDJSON, one AlarmFeedMessage per line); acknowledge-alarm is a unary ack (--reference required, --comment, --operator). StreamAlarms joins QueryActiveAlarms on the public Client and is wired through the existing batch dispatcher via runWithIO. SDK type aliases for StreamAlarmsRequest / AlarmFeedMessage / StreamAlarmsClient land alongside the existing alarm types. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
11cc6715ed |
.NET client: port stream-alarms and acknowledge-alarm + fix stream-events OCE
Adds the session-less alarm CLI subcommands. stream-alarms attaches to the gateway's central alarm feed (--filter-prefix, --max-events, --json/--jsonl); acknowledge-alarm is a unary ack (--reference required, --comment, --operator). StreamAlarmsAsync joins QueryActiveAlarmsAsync on MxGatewayClient and the transport interface; the CLI client interface, adapter, and FakeGatewayTransport follow. Also fixes the OCE bug exposed by -VerifyWrite in the cross-language e2e: StreamEventsAsync's await foreach now swallows OperationCanceledException when the supplied cancellation token is the one that fired (graceful end-of-window), and RunBatchAsync no longer excludes OCE from its outer catch — so a streaming command that hits its --timeout reports a JSON error inside its EOR-delimited record instead of killing the long-lived batch process. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f90bff01db |
Java client: port bulk read/write SDK methods + CLI subcommands
Final language in the bulk-CLI port wave. HEAD's MxGatewaySession had only the subscribe-style bulks; this commit adds the value-bulks plus matching picocli subcommands and a bench-read-bulk harness. SDK (MxGatewaySession.java): - List<BulkWriteResult> writeBulk(int serverHandle, List<WriteBulkEntry> entries) - List<BulkWriteResult> write2Bulk(int serverHandle, List<Write2BulkEntry> entries) - List<BulkWriteResult> writeSecuredBulk(int serverHandle, List<WriteSecuredBulkEntry> entries) - List<BulkWriteResult> writeSecured2Bulk(int serverHandle, List<WriteSecured2BulkEntry> entries) - List<BulkReadResult> readBulk(int serverHandle, List<String> tagAddresses, Duration timeout) readBulk uses java.time.Duration for the timeout parameter (idiomatic Java) and internally converts to the timeoutMs proto field; Duration.ZERO / null both delegate to the worker default. Per-entry secured user ids stay on each WriteSecured(2)BulkEntry to match the proto's per-row shape. CLI (MxGatewayCli.java): - read-bulk / write-bulk / write2-bulk / write-secured-bulk / write-secured2-bulk as picocli @Command subcommands. Write families share value-parsing logic; gating of --current-user-id / --verifier-user-id / --timestamp matches the cross-language flag contract. - bench-read-bulk: --iterations / --warmup loop with avg/min/max ms reporting plus a --json mode that emits the cross-language bench JSON schema. A small fixture in MxGatewayCliTests.FakeSession adds stub implementations of the five new interface methods so the test module compiles. Verification: gradle build BUILD SUCCESSFUL (4 tasks executed, all tests pass); gradle :zb-mom-ww-mxgateway-cli:installDist BUILD SUCCESSFUL. Manual smoke against live gateway on localhost:5120: open-session → register → read-bulk cold (wasCached=false both tags) → subscribe-bulk → read-bulk warm (wasCached=true both tags) → write-bulk int32 111,222 (both wasSuccessful=true) → write2-bulk timestamped (both wasSuccessful=true) → write-secured-bulk and write-secured2-bulk return per-entry MXAccess "Value does not fall within the expected range" failures with the configured user/verifier ids (0,0) — confirming the SDK does NOT throw on per-entry MXAccess failures and surfaces them through BulkWriteResult exactly as the .NET and Go ports do → bench-read-bulk iterations=20 avg=9.5 ms last_success=2/2 cached=2/2 → close-session SESSION_STATE_CLOSED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6add4b4acc |
Python client: port bulk read/write SDK methods + CLI subcommands
Mirrors the .NET / Go ports of divergent branch commit
|
||
|
|
325106920f |
Rust client: port BenchReadBulk subcommand + session.rs tightening
The bulk-write/read SDK methods (read_bulk, write_bulk, write2_bulk,
write_secured_bulk, write_secured2_bulk) and the matching clap
subcommands (ReadBulk, WriteBulk, Write2Bulk, WriteSecuredBulk,
WriteSecured2Bulk) were already on HEAD from a prior session — they
were the only bulk family that HEAD shipped before the .NET / Go /
Python / Java parallel ports. The one missing piece from the divergent
branch (commit
|
||
|
|
8aaab82287 |
Go client: port bulk read/write SDK methods + CLI subcommands
Mirrors the .NET addition: HEAD's session.go had only the subscribe-style
bulks (AddItemBulk / AdviseItemBulk / RemoveItemBulk / UnAdviseItemBulk /
SubscribeBulk / UnsubscribeBulk). This commit ports the value-bulk SDK
surface and CLI subcommands from divergent branch commit
|
||
|
|
b3ae200b11 |
.NET client: port bulk read/write SDK methods + CLI subcommands
Adds the value-bulk SDK surface and CLI subcommands that lived on the
divergent branch (commit
|
||
|
|
71d2c39f01 |
e2e: port batch subcommand to all five client CLIs
scripts/run-client-e2e-tests.ps1 expects each language CLI to expose a `batch` subcommand that reads command lines from stdin, runs each through the normal subcommand dispatch, writes the JSON result, then a sentinel line `__MXGW_BATCH_EOR__`. The implementation lived on a divergent branch (commit |
||
|
|
a68f0cf222 |
code-review: regen README, all 21 open findings resolved
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open findings from the |
||
|
|
83eba4bec5 |
Resolve Client.Rust-021
Client.Rust-021 (Design adherence): RustClientDesign.md "Crate layout"
section now describes the actual flat workspace structure instead of
the aspirational nested form. The replacement text states that the
workspace root is clients/rust/, the top-level crate
zb-mom-ww-mxgateway-client is declared in clients/rust/Cargo.toml
directly, and crates/mxgw-cli/ is the sole [workspace.members] entry.
The accompanying tree lists the real files on disk (Cargo.toml,
Cargo.lock, build.rs, README.md, RustClientDesign.md,
src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs
plus the src/generated/ tonic-build output dir, tests/, and
crates/mxgw-cli/).
Doc-only change. cargo build --workspace + cargo test --workspace clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
10bd0c0e4d |
Resolve Client.Java-027..031
Client.Java-027 (Documentation): Updated 17 Gradle task references in
clients/java/README.md (lines 37, 108-110, 160-161, 169-176, 186, 206,
221) and 3 in clients/java/JavaClientDesign.md from the retired short
subproject names to the canonical zb-mom-ww-mxgateway-client /
zb-mom-ww-mxgateway-cli names. Copy-pasting any documented command now
matches the subproject names declared in settings.gradle.
Client.Java-028 (Design adherence): Build-layout block in
JavaClientDesign.md lines 23-27 updated to show the actual package
paths com/zb/mom/ww/mxgateway/{client,cli}/ instead of the retired
com/dohertylan/mxgateway/{client,cli}/ paths.
Client.Java-029 (Documentation): README.md line 210 corrected from
"zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli" to
"zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli" — Gradle
installDist produces a directory whose name matches the project name,
not the short suffix. The e2e script already used the correct path.
Client.Java-030 (Testing coverage): Added
queryActiveAlarmsForwardsRequestAndStreamsSnapshots to
MxGatewayClientSessionTests. The test pushes a QueryActiveAlarmsRequest
carrying session_id / client_correlation_id / alarm_filter_prefix
through an InProcessGateway + TestGatewayService and asserts the server
observed all three request fields, two ActiveAlarmSnapshots stream in
order, and onError is never called. TDD red→green confirmed via a
deliberately-wrong session_id assertion. The re-triage note in
Client.Java-030's resolution clarifies that the finding's reference to
"the existing acknowledgeAlarm test" was aspirational — the alarm RPC
surface had zero coverage before this commit.
Client.Java-031 (Conventions): README.md prose lines 17, 22, 26 updated
to use the canonical zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli
names so the layout description matches Gradle / IDE project names.
Verification: gradle build BUILD SUCCESSFUL; all Java unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
865c22a884 |
Resolve IntegrationTests-022..024
IntegrationTests-022 (Conventions): ResolveRepositoryRoot now throws InvalidOperationException when the walk exhausts without finding a root marker, with a message naming the start directory, the expected markers (src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE escape hatch. Replaces the silent fallback to Directory.GetCurrentDirectory() that previously masked misconfiguration. New regression test ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers in IntegrationTestEnvironmentTests asserts the throw and the message contents. TDD red→green confirmed. IntegrationTests-023 (Testing coverage): DashboardLdapLiveTests's AuthenticateAsync_AdminInGwAdminGroup_Succeeds now asserts that the authenticated principal carries a ClaimTypes.Role claim with value DashboardRoles.Admin in addition to the existing LdapGroupClaimType assertion. A regression in MapGroupsToRoles (returning an empty list or missing the RDN fallback) would now surface here. Gated by MXGATEWAY_RUN_LIVE_LDAP_TESTS. IntegrationTests-024 (Conventions): Option (b) — extracted within IntegrationTests. New file TestSupport/NullDashboardEventBroadcaster.cs (public type, private ctor, singleton Instance). The inline class at the bottom of WorkerLiveMxAccessSmokeTests is gone; the file now imports the shared type. Matches the unit-test project's Tests-007 / Tests-021 / Tests-025 pattern while keeping the two test projects independently buildable (no shared test-helpers project crossing module boundaries). Verification: dotnet build src/ZB.MOM.WW.MxGateway.IntegrationTests clean; 19/19 integration tests passing (live MxAccess + LDAP + Galaxy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d48099f0d0 |
Resolve Tests-025..026
Tests-025 (Conventions): Extracted the previously-duplicated NullDashboardEventBroadcaster into TestSupport/NullDashboardEventBroadcaster.cs (singleton Instance, private ctor). The two nested copies in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests were removed; both files now use the shared type via 'using ZB.MOM.WW.MxGateway.Tests.TestSupport;'. The Server-041 regression test's ThrowingDashboardEventBroadcaster is intentionally left nested — single-file usage doesn't warrant promotion to TestSupport. The third copy in IntegrationTests/WorkerLiveMxAccessSmokeTests was handled by IntegrationTests-024 in its own commit. Tests-026 (Testing coverage): Added a new RecordingDashboardEventBroadcaster test double in TestSupport — a thread-safe (ConcurrentQueue<DashboardEventCapture>) recorder. New fixture StreamEventsAsync_PublishesEachEventToDashboardBroadcaster in EventStreamServiceTests pushes two events through the fake session and asserts the broadcaster received both with the correct sessionId and WorkerSequence. TDD red→green confirmed: the deliberately-wrong "Expected 3, Actual 2" red phase proved the recording fake was actually invoked by the production code path. Verification: 486/486 server tests passing (485 previous + 1 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bd1d1f1c0e |
Resolve Contracts-016..017
Contracts-016 (Conventions): QueryActiveAlarmsRequest.session_id header replaced with the unambiguous "Clients may leave session_id empty; the gateway currently ignores it and serves the session-less central-monitor cache. A future version may use it to scope the snapshot to one session." Removes the ambiguity that the prior "reserved for future use" wording introduced. Contracts-017 (Documentation): The rpc QueryActiveAlarms comment now includes the alarm_filter_prefix description: "QueryActiveAlarmsRequest.alarm_filter_prefix optionally narrows the snapshot to alarms whose alarm_full_reference starts with the given prefix; an empty prefix returns the full set." Both are proto-comment-only changes — no wire-format impact, no field renumbering, and the regenerated MxaccessGateway.cs / MxaccessGatewayGrpc.cs carry only the doc-comment delta. Added the additive-only regression guard QueryActiveAlarmsRequest_PinsFieldNumbersAndRoundTripsPrefixFilter to ProtobufContractRoundTripTests — pins session_id=1 / client_correlation_id=2 / alarm_filter_prefix=3 by descriptor lookup and round-trips the message with and without the filter populated. Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean; ProtobufContractRoundTripTests 40/40 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
327e9c5f94 |
Resolve Server-031..032 (re-triaged) + Server-038..043
Server-031: re-triaged. The recommended gateway-side "skip-while-command-in-flight" guard is already in place at WorkerClient.HeartbeatLoopAsync via WorkerClientOptions.HeartbeatStuckCeiling (default 75s = 5× HeartbeatGrace). Two regression tests pin the behaviour. Recommendation #1 (decouple worker-side _writeLock) is a Worker-module concern (Worker-017 / Worker-023) and out of scope here. Server-032: re-triaged. Recommendation #2 (rich diagnostic) is already in EnqueueWorkerEventAsync, with #3 (overflow grace) absorbed by the TryWrite → WriteAsync-with-timeout fall-through. Test EnqueueWorkerEvent_WhenChannelFullPastTimeout_FaultsWithRichDiagnostic pins the diagnostic string. Recommendation #1 (prose contract in gateway.md / docs) is deferred — outside this pass's edit scope. Server-038 (Security): EventsHub.SubscribeSession's missing per-session ACL is documented with a TODO(per-session-acl) and a <remarks> block explaining the v1 acceptance (any dashboard role can subscribe to any session — non-secret metadata, redacted value logging). The per-session ACL design lands in a follow-up once a session-scoped role exists. Server-039 (Error handling): HubTokenService.Validate now rejects a deserialized payload where both Name and NameIdentifier are null/empty. New test file HubTokenServiceTests.cs covers the regression and five sanity cases. TDD confirmed. Server-040 (Conventions): MapGroupsToRoles gains a precedence comment explaining "full literal match first, leading-RDN fallback; OrdinalIgnoreCase via DashboardOptions.GroupToRole". Documentation-only. Server-041 (Design adherence): EventStreamService.ProduceEventsAsync wraps the broadcaster.Publish call in try/catch (Exception). The producer loop and gRPC stream are no longer at the mercy of the broadcaster's never-throw discipline. New regression test StreamEventsAsync_WhenDashboardBroadcasterThrows_StillYieldsEventsAndDoesNotFaultSession. Server-042 (Performance): DashboardSnapshotPublisher.ExecuteAsync now mirrors AlarmsHubPublisher's reconnect loop — wraps the await foreach in a while-not-cancelled, catches general exceptions, and Task.Delays 5s before retrying. An internal ctor accepts a shorter delay for the test. New test file DashboardSnapshotPublisherTests.cs covers the throw-then-yield reconnect path and the normal-completion case. Server-043 (Documentation): HubTokenService class XML doc gains a <remarks> describing the singleton lifetime, the two consumer scopes (DashboardHubConnectionFactory scoped, HubTokenAuthenticationHandler transient), and the thread-safety contract. Verification: dotnet build src/ZB.MOM.WW.MxGateway.slnx clean (0 warnings / 0 errors); src/ZB.MOM.WW.MxGateway.Tests 486/486 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d2d2e5f68f |
code-review 2026-05-24: re-review at d692232 across all 11 modules
Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).
Module status
| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server | 8 | 43 | +6 |
| Contracts | 2 | 17 | +2 |
| Tests | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java | 5 | 31 | +5 |
| Client.Rust | 1 | 21 | +1 |
| Worker | 0 | 25 | 0 (rename-only diff, clean) |
| Worker.Tests | 0 | 30 | 0 (rename-only diff, clean) |
| Client.Dotnet | 0 | 17 | 0 (rename + alarm-fix diff, clean) |
| Client.Python | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
| Client.Go | 0 | 21 | 0 (rename + alarm-fix diff, clean) |
Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.
New findings
* Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
any session id from any Viewer; no per-session ACL guards the
EventsHub group fan-out.
* Server-039 (Low / Error handling) — HubTokenService.Validate
accepts a payload with null Name/NameIdentifier.
* Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
full-vs-RDN lookup precedence.
* Server-041 (Low / Design adherence) — EventStreamService calls
IDashboardEventBroadcaster.Publish without a try/catch — fragile
seam relying on the never-throw contract.
* Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
* Server-043 (Low / Documentation) — HubTokenService singleton
sharing across login + hub-token validation undocumented.
* Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
reserved-for-future-use ambiguity.
* Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
omits the alarm_filter_prefix filter description.
* Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
* Tests-026 (Medium / Testing coverage) — no test proves
EventStreamService actually calls IDashboardEventBroadcaster.Publish.
* IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
silent fallback to Directory.GetCurrentDirectory().
* IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
success-path asserts ldap_group but not the Role claim.
* IntegrationTests-024 (Low / Conventions) — inline
NullDashboardEventBroadcaster fake duplicates Tests-side copies.
* Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
Gradle task names still use the old short project names.
* Client.Java-028 (Medium / Design adherence) — JavaClientDesign
build-layout shows the old `com/dohertylan/mxgateway/` package paths.
* Client.Java-029 (Low / Documentation) — README installDist path
cites the wrong directory.
* Client.Java-030 (Low / Testing coverage) — no Java test exercises
the regenerated QueryActiveAlarmsRequest RPC.
* Client.Java-031 (Low / Conventions) — README prose uses old short
project names instead of canonical prefixed ones.
* Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
"Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
that does not exist; actual layout is the flat top-level crate.
Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.
Process notes
- The `code-reviews/` tree was not in this working copy's git
history (the local extract pre-dates the divergent branch that
carried the reviews). Restored from `dd7ca16` via
`git checkout
|
||
|
|
d692232191 |
dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)
EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:
* IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
take a direct dependency on SignalR.
* DashboardEventBroadcaster — singleton implementation that hands the
SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
logged at debug and dropped so the source gRPC stream is never
blocked.
EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.
SessionDetailsPage adds a "Recent events" panel:
* implements IAsyncDisposable
* opens a second HubConnection via DashboardHubConnectionFactory targeting
/hubs/events
* calls SubscribeSession(SessionId) on Start
* renders the most recent 50 events in a small table (worker seq, family,
server/item handle, alarm reference when the event is OnAlarmTransition)
* shows a live/offline conn-pill driven by HubConnection.Closed /
Reconnected events
The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.
Documentation refresh
Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:
* CLAUDE.md — Authentication section now explains LDAP bind +
GroupToRole + HubToken bearer flow.
* gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
events SignalR hubs, LDAP cookie + bearer scheme.
* docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
add GroupToRole row, append "Authorization policies" and "SignalR hubs"
subsections describing the three policies and the /hubs/* endpoints.
* docs/GatewayDashboardDesign.md — hosting model (root mount, new
endpoint layout), Realtime Updates rewritten as a hub table
(DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
and routing), Authentication And Authorization rewritten around LDAP +
role mapping + the hub bearer flow, Configuration block updated.
* docs/GatewayProcessDesign.md — security-section dashboard paragraph
and the example config block both refreshed to LDAP/role auth.
* docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
API-key login flow).
* docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
RequiredGroup default; success-path assertion explains the role-claim
check.
Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
65943597d4 |
dashboard: side-rail layout + SignalR push hubs (snapshot, alarms, events)
Layout
------
DashboardLayout.razor replaces the inline header nav with a left side rail
modelled on the OtOpcUa admin (Dashboard B). The top bar keeps only the
brand, breadcrumb, and signed-in status pill; navigation moves into a
fixed-width 218px rail with grouped section eyebrows (Overview,
Runtime, Galaxy, Admin) and a Session footer carrying the user name,
role claims, and a Sign-out button. dashboard.css gains the
`.app-shell` flex container, `.side-rail` column, `.rail-eyebrow`,
`.rail-link[.active]`, `.rail-foot`, `.rail-user`, `.rail-roles`, and
`.rail-btn` rules (all driven by the existing theme.css tokens, no new
hard-coded colours).
SignalR (push)
--------------
Adds three hubs under `Dashboard/Hubs/`, all gated by the
`HubClientsPolicy` registered in the previous commit:
* DashboardSnapshotHub (/hubs/snapshot)
Broadcasts the full DashboardSnapshot on every change. Sends the
current snapshot to a new caller in OnConnectedAsync so the first
paint is immediate.
* AlarmsHub (/hubs/alarms)
Connected clients auto-join the `__alarms__` group. Receives
AlarmFeedMessage values (active_alarm / snapshot_complete /
transition) re-broadcast from the gateway's central alarm monitor.
* EventsHub (/hubs/events)
Per-session push surface. Clients call SubscribeSession(sessionId)
to join `session:{id}`. The publisher side is intentionally a
follow-up — the snapshot hub already carries recent-events
rollups; a dedicated MxEvent broadcaster on EventStreamService
will plug into this hub's group convention.
Two BackgroundService publishers wire server-side data sources to the
hubs:
* DashboardSnapshotPublisher subscribes to
`IDashboardSnapshotService.WatchSnapshotsAsync` and forwards every
snapshot to all connected hub clients.
* AlarmsHubPublisher subscribes to `IGatewayAlarmService.StreamAsync`
(no filter) and forwards every AlarmFeedMessage to the
`__alarms__` group, reconnecting with a 5-second backoff if the
stream faults.
Connection + auth plumbing
--------------------------
* `GET /hubs/token` issues a fresh data-protected bearer token
bound to the calling user's identity and roles. Gated by the
cookie-only ViewerPolicy so a Blazor circuit (cookie-authenticated)
can mint a token, but a hub bearer cannot self-bootstrap a new
one.
* DashboardHubConnectionFactory (scoped) is the client-side helper
Razor pages inject. It builds a HubConnection with an
AccessTokenProvider that calls HubTokenService.Issue on every
(re)connect — keeps the connection alive across cookie refresh
boundaries.
Pull → push refactor
--------------------
DashboardPageBase no longer drives its own `WatchSnapshotsAsync`
async-foreach loop. It now:
1. seeds Snapshot synchronously from `IDashboardSnapshotService.GetSnapshot()`
so the first render is non-empty;
2. opens a `DashboardSnapshotHub` connection via the connection
factory;
3. updates Snapshot + triggers StateHasChanged on each
`SnapshotUpdated` push.
The hub connection is best-effort: if SignalR can't start, the
synchronous snapshot seed keeps the UI populated. SignalR's
WithAutomaticReconnect handles the recovery path.
Package
-------
Adds `Microsoft.AspNetCore.SignalR.Client` 10.0.0 to the server csproj
so the in-process Blazor pages can open hub connections back to their
own hosting process.
Verification: 475 server tests (+ 2 new
`DashboardHubsRegistrationTests` that pin the hub negotiate endpoints
and the singleton/scoped DI shape), 275 worker tests (+ 9 dev-rig
skips), 18 integration tests (live MxAccess + LDAP + Galaxy) all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
27ed65114e |
dashboard: role-based LDAP auth + hub bearer scheme, drop PathBase
Restructure dashboard auth around LDAP-driven Admin/Viewer roles, add a
bearer scheme so SignalR hubs (next commit) can authenticate without
forwarding the HttpOnly browser cookie, and mount the dashboard at the
host root instead of a configurable `/dashboard` prefix.
Configuration changes (breaking):
- `MxGateway:Dashboard:PathBase` removed — the dashboard now serves at `/`.
- `MxGateway:Dashboard:RequireAdminScope` removed — role checks replace
the single admin-scope claim.
- `MxGateway:Ldap:RequiredGroup` removed — replaced by `MxGateway:Dashboard:GroupToRole`,
a map from LDAP group name to dashboard role. Legal role values:
`Admin` and `Viewer`. Users whose LDAP groups don't intersect this
map are rejected at login (the existing fail-closed contract).
- appsettings.json ships a default mapping `{ GwAdmin: Admin, GwReader: Viewer }`.
Auth model:
- DashboardRoles: new static class with `Admin` and `Viewer` constants.
- DashboardAuthenticator.AuthenticateAsync: after LDAP bind, maps the
user's groups through `DashboardOptions.GroupToRole` and emits one
`ClaimTypes.Role` claim per resolved role. Empty result → login fails.
- DashboardAuthorizationRequirement now carries `RequiredRoles`; static
presets `AnyDashboardRole` (Viewer ∨ Admin) and `AdminOnly`.
- DashboardAuthorizationHandler checks `IsInRole` against the
requirement's role list instead of the old scope claim. The
`AuthenticationMode.Disabled` and `AllowAnonymousLocalhost` bypasses
are preserved.
- DashboardApiKeyAuthorization.CanManage now requires the `Admin` role
(was: required LDAP group membership). The constructor's IOptions
parameter is gone.
Policies / schemes:
- DashboardAuthenticationDefaults gains `ViewerPolicy`, `AdminPolicy`,
`HubClientsPolicy`, and `HubAuthenticationScheme`. The legacy
`AuthorizationPolicy` and `ScopeClaimType` constants are removed.
- DashboardServiceCollectionExtensions registers all three policies,
adds the cookie scheme and the HubToken bearer scheme side by side,
calls `AddSignalR()`, and hard-codes the cookie's login/logout/denied
paths to root-relative `/login` etc.
Hub bearer infrastructure (no hubs wired yet — next commit):
- HubTokenService: mints time-limited data-protected JSON tokens
carrying the user's name, NameIdentifier, and roles. 30-minute
lifetime, purpose `ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1`.
- HubTokenAuthenticationHandler: validates the token from
`Authorization: Bearer …` or `?access_token=…` (WebSocket upgrade
query string) and rebuilds the principal.
Endpoint mapping:
- DashboardEndpointRouteBuilderExtensions drops the `MapGroup(pathBase)`
wrapper. Login/logout/denied and Razor component routes are now
mounted at `/`. The login form posts to `/login`. Razor components
require the new `ViewerPolicy`.
- All page `@page "/dashboard/X"` dual-route directives are removed —
pages live at their canonical roots (`@page "/"`, `@page "/sessions"`, …).
- App.razor and DashboardLayout.razor drop their PathBase computations.
EffectiveLdapConfiguration drops `RequiredGroup`; EffectiveDashboardConfiguration
drops `PathBase`/`RequireAdminScope` and gains `GroupToRole`. SettingsPage
renders the role mapping in place of the retired fields.
Tests updated:
- DashboardAuthenticatorTests: covers the new GroupToRole mapping
(short name + DN + multi-role).
- DashboardAuthorizationHandlerTests: split into Viewer-policy and
Admin-policy cases.
- DashboardApiKeyAuthorizationTests, DashboardApiKeyManagementServiceTests:
authorized principal now carries the `Admin` role claim.
- DashboardCookieOptionsTests: expects root-relative login/logout paths.
- GatewayApplicationTests: dashboard component routes registered at `/`,
`/sessions`, … and gated by `ViewerPolicy`. Filter on
`ComponentTypeMetadata` to ignore minimal-API endpoints sharing `/`.
- GatewayOptionsTests + Validator: drop PathBase / RequireAdminScope /
RequiredGroup assertions; add a `GroupToRole` value-validation case.
- DashboardLdapLiveTests: provides the default `GwAdmin` → `Admin`
mapping so the live LDAP bind resolves to a role.
Verification: 473 server tests, 275 worker tests (+9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass.
This commit is intentionally UI-neutral. The sidebar layout and the
SignalR hubs that consume the new HubToken scheme land in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
397d3c5c4f |
rename: apply ZB.MOM.WW prefix to all client SDKs + fix pre-existing alarm-RPC breaks
Rename across every client surface using each language's idiomatic convention:
* .NET clients/dotnet/MxGateway.Client[.Cli|.Tests]/
-> clients/dotnet/ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]/
namespaces -> ZB.MOM.WW.MxGateway.Client[.Cli|.Tests]
contracts ProjectReference repointed to ZB.MOM.WW.MxGateway.Contracts
sln migrated to slnx (dotnet sln migrate)
* Python src/mxgateway -> src/zb_mom_ww_mxgateway
src/mxgateway_cli -> src/zb_mom_ww_mxgateway_cli
distribution: mxaccess-gateway-client -> zb-mom-ww-mxaccess-gateway-client
* Rust crate: mxgateway-client -> zb-mom-ww-mxgateway-client
build.rs proto path repointed
* Java subprojects: mxgateway-{client,cli} -> zb-mom-ww-mxgateway-{client,cli}
packages com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
group com.dohertylan.mxgateway -> com.zb.mom.ww.mxgateway
rootProject mxaccessgw-java -> zb-mom-ww-mxaccessgw-java
* Go generate-proto.ps1 proto path repointed; module path and
package mxgateway kept (Go convention).
* proto-inputs.json: generatedOutputs.python updated to new package path.
* scripts/run-client-e2e-tests.ps1: Java CLI install path + gradle task
updated to zb-mom-ww-mxgateway-cli.
CLI binary names (mxgw, mxgw-py, mxgw-go, mxgateway-cli) and wire-level
identifiers (MXGATEWAY_* env vars, the mxgw_<id>_<secret> API key
prefix, protobuf package names like mxaccess_gateway.v1, all MXAccess
references) intentionally NOT renamed.
Fix pre-existing alarms-over-gateway breaks unblocked by the rename:
* mxaccess_gateway.proto: add missing public message QueryActiveAlarmsRequest
{session_id, client_correlation_id, alarm_filter_prefix} and missing
rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns
(stream ActiveAlarmSnapshot). All four typed clients referenced
these but they were absent from the proto.
* MxAccessGatewayService.QueryActiveAlarms: implement the new RPC on
the server, streaming from IGatewayAlarmService.CurrentAlarms with
optional alarm_filter_prefix filter.
* clients/dotnet/.../DiscoverHierarchyOptions.cs: add the hand-written
.NET POCO that wraps DiscoverHierarchyRequest (referenced by
GalaxyRepositoryClient.DiscoverHierarchyAsync but never authored).
* Drop retired session_id field references from
AcknowledgeAlarmRequest/AcknowledgeAlarmReply test fixtures across
.NET, Rust, Go, and Python clients.
* Rust integration test: add the missing stream_alarms impl on the
fake MxAccessGateway server (the trait gained the method, fake
didn't).
* Rust CLI test: bump expected gatewayProtocolVersion 2 -> 3.
Regenerated artifacts updated in this commit:
* src/ZB.MOM.WW.MxGateway.Contracts/Generated/{MxaccessGateway,MxaccessGatewayGrpc}.cs
* clients/python/src/zb_mom_ww_mxgateway/generated/*_pb2{,_grpc}.py
* clients/go/internal/generated/*.pb.go
(C# regenerated by Grpc.Tools on contracts build; Python and Go via
their generate-proto.ps1 scripts; Rust regenerates from .proto via
tonic-build at compile time so no checked-in artefact.)
Verification: 472 server tests, 275 worker tests (9 dev-rig skipped),
18 integration tests (live MxAccess + LDAP + Galaxy), 57 .NET client
tests, 32 Rust workspace tests, 39 Python tests, all Go packages, and
gradle build for Java all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
dc9c0c950c |
rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.
External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths
Also fixes two tests that were not rename-related but became visible
while validating the rename:
- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
gateway service correctly maps to RpcException(Cancelled) per gRPC
convention was being misclassified as a stream fault. Added a sibling
catch on RpcException with StatusCode.Cancelled.
- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
and made it accept either a .git marker OR a .sln/.slnx next to src/
so the worker-exe walker works in non-git working copies.
clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.
Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
Tests: 472/472 pass
Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
IntegrationTests: 18/18 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
867bf18116 |
alarms-over-gateway: full pipeline (#118)
Seven slices on this branch implement the full alarms-over-gateway path: 1. |
||
|
|
a4ed605f74 |
A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
- Subscribe + 1st PollOnce yield real transition events
- Field-by-field decode correct (provider, group, tag, severity,
UTC timestamp, comment, type)
- SnapshotActiveAlarms reflects current state
- AcknowledgeByName(real identity) -> rc=0
- Pipeline keeps streaming transitions on the 10s cadence post-ack
Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:
1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
earlier discovery-doc recommendation) makes the first
GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
call is unnecessary because the round-trip echo is mangled was
wrong — mangled echo or not, the call is required.
2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
instance (returns -55). Workaround: provision a parallel
"ack-only" wnwrap consumer that runs Initialize → Register →
Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
Production WnWrapAlarmConsumer now holds two COM clients;
AcknowledgeByName always dispatches through the ack-only one.
3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
8-arg overload returns -55 on this AVEVA build (apparently a
stub); the v1 6-arg overload works. Production now calls the
6-arg overload, discarding the proto's operator_domain and
operator_full_name fields. The proto contract keeps both for
forward-compat if AVEVA fixes the v2 method.
Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).
Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.
Test counts after this slice:
Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
1 pre-existing structure-fail (untouched)
Server: 308 pass / 0 fail
Solution builds clean.
docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4e02927f01 |
A.3 (alarm-ack-by-name): public AcknowledgeAlarm now accepts Provider!Group.Tag references
Closes the gap where the public AcknowledgeAlarm RPC required canonical GUIDs but OnAlarmTransitionEvent.AlarmFullReference is "Provider!Group.Tag". Adds an AVEVA AlarmAckByName path that wraps wwAlarmConsumerClass.AlarmAckByName so callers can ack with the natural reference. Proto: - New MxCommandKind.AcknowledgeAlarmByName (=29). - New AcknowledgeAlarmByNameCommand(alarm_name, provider_name, group_name, comment, operator_user/node/domain/full_name) on MxCommand oneof. - AcknowledgeAlarmReplyPayload (existing) carries the AVEVA native status; reused for the by-name path. Worker: - IMxAccessAlarmConsumer + WnWrapAlarmConsumer + AlarmDispatcher + AlarmCommandHandler all gain an AcknowledgeByName(name, provider, group, comment, operator-identity) overload that maps to wwAlarmConsumerClass.AlarmAckByName. - MxAccessCommandExecutor: new switch arm routes MxCommandKind.AcknowledgeAlarmByName to the handler. Empty alarm_name yields InvalidRequest; handler exceptions surface as MxaccessFailure. Gateway: - WorkerAlarmRpcDispatcher.TryParseAlarmReference: parses "Provider!Group.Tag" with the convention that the FIRST '!' separates provider, the FIRST '.' after '!' separates group; tag may contain more dots. - AcknowledgeAsync now branches: GUID input → AcknowledgeAlarm command (existing path); reference input → AcknowledgeAlarmByName command (new path); neither parses → InvalidRequest with a clear diagnostic. Tests: 13 new unit tests cover each layer end-to-end: - WorkerAlarmRpcDispatcher.TryParseAlarmReference (3 valid + 8 invalid forms) including the realistic 4-component "Galaxy!TestArea. TestMachine_001.TestAlarm001" reference. - WorkerAlarmRpcDispatcher.AcknowledgeAsync routes references through AcknowledgeAlarmByName + propagates the full operator tuple. - Executor switch arm carries the by-name tuple and rejects empty alarm_name. - AlarmDispatcher.AcknowledgeByName forwards to consumer. - Existing fakes extended for the new overload. Counts: server 308/0, worker 195/3 skip / 1 pre-existing structure-fail (untouched). Solution builds clean. End-to-end alarms-over-gateway now serves the full lmxopcua flow: client.AcknowledgeAlarm(reference="Galaxy!TestArea.TestMachine_001.TestAlarm001", operator_user="alice") → gateway parses → IPC AcknowledgeAlarmByName → worker AlarmAckByName → AVEVA history. The remaining piece for full parity is a live dev-rig smoke test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
47b1fd422c |
A.3 (auto-subscribe): SessionManager issues SubscribeAlarms on session open
Adds the missing trigger that activates the worker's wnwrap consumer.
Without this, every session opened in OK state but the consumer never
started, so AcknowledgeAlarm/QueryActiveAlarms returned "alarm consumer
not configured" forever.
New AlarmsOptions config block (under MxGateway:Alarms):
- Enabled (default false): gates the auto-subscribe path so existing
deployments without alarm configuration are unaffected.
- SubscriptionExpression: explicit AVEVA expression like
\<machine>\Galaxy!<area>.
- DefaultArea: fallback used when SubscriptionExpression is empty;
composes \$(MachineName)\Galaxy!$(DefaultArea).
- RequireSubscribeOnOpen (default false): when true, an auto-subscribe
failure faults the session; when false, the failure is logged and
the session stays Ready (data subscriptions keep working, alarms
return "not subscribed" until the operator retries).
SessionManager.OpenSessionAsync gains a TryAutoSubscribeAlarmsAsync hook
that runs after MarkReady. Skips when alarms are disabled; otherwise
builds a SubscribeAlarmsCommand, invokes it on the session's worker
client, and either logs the resulting status or escalates per
RequireSubscribeOnOpen. SessionManagerException is the failure mode for
the strict path so callers in MxAccessGatewayService surface it as
session-open-failed.
Tests: 7 new unit tests cover the disabled lane, expression-driven
subscribe, DefaultArea fallback, success path, soft-failure (require
off), strict-failure (require on), and missing-config-strict-throw.
Server suite total: 295 pass / 0 fail. Solution builds clean.
End-to-end alarms-over-gateway path is now live (with config). Open a
session against a gateway with Alarms.Enabled=true + a valid
SubscriptionExpression; the worker's wnwrap consumer auto-subscribes;
QueryActiveAlarms streams snapshots; AcknowledgeAlarm acks by GUID.
Reference→GUID resolution (AlarmAckByName worker command) and the live
dev-rig smoke test remain follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9b21ca3554 |
A.3 (gateway dispatcher): WorkerAlarmRpcDispatcher routes alarm RPCs over the worker pipe
Replaces NotWiredAlarmRpcDispatcher in DI with a production
implementation that issues the new MxCommandKind.{AcknowledgeAlarm,
QueryActiveAlarms} commands across the IPC and unwraps the resulting
MxCommandReply into the public RPC types.
QueryActiveAlarms is fully wired: builds the QueryActiveAlarmsCommand
(forwarding alarm_filter_prefix), invokes it on the resolved
GatewaySession's worker client, and yields each ActiveAlarmSnapshot
from the QueryActiveAlarmsReplyPayload as the RPC stream. Worker
failures + missing sessions yield an empty stream — matches the
ConditionRefresh contract clients already speak to.
AcknowledgeAlarm is partially wired: the public RPC takes
AlarmFullReference (Provider!Group.Tag), but the worker's wnwrap
consumer acks by GUID. Strategy:
- If AlarmFullReference parses as a canonical GUID, forward it
directly through MxCommandKind.AcknowledgeAlarm. Native status
flows back via MxCommandReply.Hresult and the dedicated
AcknowledgeAlarmReplyPayload.NativeStatus.
- Otherwise, return InvalidRequest with a clear diagnostic naming the
follow-up — reference→GUID lookup needs a worker-side AlarmAckByName
command wrapping wwAlarmConsumerClass.AlarmAckByName.
DI: SessionServiceCollectionExtensions registers WorkerAlarmRpcDispatcher
as the default IAlarmRpcDispatcher; MxAccessGatewayService picks it up
via constructor injection. NotWiredAlarmRpcDispatcher is retained for
test fixtures that want the no-side-effect fake.
Tests: 7 new unit tests cover session-not-found short-circuit, GUID-vs-
reference branching, native-status propagation, worker MxaccessFailure
diagnostic propagation, and snapshot-stream yielding. Server test
suite total: 288/0 fail. Solution builds clean.
End-to-end alarms-over-gateway pipeline status:
consumer → sink → queue (A.2 + A.3 in-process slice)
worker IPC commands (A.3 worker slice)
gateway dispatcher (this slice)
Remaining for full E2E:
- Auto-issue SubscribeAlarms on session open (or add a public
SubscribeAlarms RPC). Without this trigger the consumer never
starts and Acknowledge/Query return "not subscribed".
- AlarmAckByName worker command for ack-by-reference.
- End-to-end live test against the dev rig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
01f5e6ad91 |
A.3 (worker IPC slice): proto SubscribeAlarms/Acknowledge/QueryActive commands + executor routing
Adds the worker-side IPC surface for the alarm subsystem so the gateway can drive the AlarmDispatcher across the named-pipe boundary. Adds four proto MxCommandKind values + matching command messages and two MxCommandReply payload variants: - SubscribeAlarmsCommand(subscription_expression) - UnsubscribeAlarmsCommand - AcknowledgeAlarmCommand(alarm_guid, comment, operator_user/node/domain/full_name) - QueryActiveAlarmsCommand(alarm_filter_prefix) - AcknowledgeAlarmReplyPayload(native_status) - QueryActiveAlarmsReplyPayload(repeated ActiveAlarmSnapshot snapshots) Worker plumbing: - New IAlarmCommandHandler interface + AlarmCommandHandler production impl. Lazy-creates an AlarmDispatcher (with a wnwrap-backed consumer by default) on the first SubscribeAlarms; routes Acknowledge / QueryActive / Unsubscribe through it. Idempotent under repeated Unsubscribe; rejects a second Subscribe without an intervening Unsubscribe; cleans up the consumer if the underlying Subscribe call throws. - MxAccessCommandExecutor: 4 new switch arms map MxCommandKind values to IAlarmCommandHandler calls. Acknowledge surfaces the AVEVA native status into both MxCommandReply.Hresult and the dedicated AcknowledgeAlarmReplyPayload.NativeStatus so gateway-side consumers can echo it without unpacking the outer envelope. Invalid GUIDs and missing payloads return InvalidRequest; handler exceptions return MxaccessFailure with the exception message in DiagnosticMessage. - MxAccessStaSession: new constructor overload accepts an alarmCommandHandlerFactory; it's invoked on the STA thread during StartAsync and the resulting handler is passed into the executor. ShutdownGracefullyAsync + Dispose tear it down on the STA before the data-side cleanup runs. Tests: 20 new unit tests covering AlarmCommandHandler lazy lifecycle (Subscribe/Unsubscribe/Acknowledge/Query/Dispose, error paths) and the executor's 4 alarm switch arms (OK/InvalidRequest/MxaccessFailure paths, hresult propagation, prefix filtering). Worker test suite total: 192 passed / 3 skipped (live probes) / 1 pre-existing structure-test fail (untouched). Deferred to next slice: gateway-side WorkerAlarmRpcDispatcher that replaces NotWiredAlarmRpcDispatcher, builds + sends these commands across the IPC, and unwraps the resulting MxCommandReply into AcknowledgeAlarmReply / ActiveAlarmSnapshot stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
82eb0ad569 |
A.3 (in-process slice): AlarmDispatcher wires consumer events onto event queue
Adds the in-process plumbing that connects WnWrapAlarmConsumer's AlarmTransitionEmitted stream to the worker's MxAccessEventQueue via MxAccessAlarmEventSink. With this change a transition raised by the consumer lands as an OnAlarmTransitionEvent proto on the queue, SessionId attached, ready for IPC dispatch. Mapping: provider!group.tag → AlarmFullReference, tag → SourceObjectReference, priority → severity, wnwrap STATE → AlarmConditionState (Active / ActiveAcked / Inactive — wnwrap's ack-vs-unack-on-cleared distinction collapses since OPC UA Part 9 doesn't model it). State delta drives AlarmTransitionKind via the existing AlarmRecordTransitionMapper table. Holding off on the proto IPC additions (SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms commands + WorkerAlarmRpcDispatcher) for a follow-up — those touch every layer of the worker IPC and warrant their own PR. This slice proves the consumer→sink→queue pipeline end-to-end with unit tests and clears the path for the proto additions to plug in cleanly. Tests: 10 new unit tests cover field-by-field mapping, the "unchanged-state-doesn't-emit" filter, the state→transition kind table, Subscribe / Acknowledge passthrough, SnapshotActiveAlarms → proto ActiveAlarmSnapshot mapping, and Dispose detaches the handler. All passing; total worker test count 172/3 skip / 1 pre-existing structure fail (untouched). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f711a55be4 |
A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient` to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by `wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime auto-marshaling that crashed `GetHighPriAlarm` with ArgumentOutOfRangeException on every poll. Live captured 60/60 polls clean against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform script flipped TestMachine_001.TestAlarm001 every 10s; the GUID, priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps arrived end-to-end. Implementation: - `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under `lib/` so dev boxes don't need the SDK to build. - New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a 500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the snapshot keyed by alarm GUID, and raises one `MxAlarmTransitionEvent` per state change. Includes the Initialize→Register-before-Subscribe ordering fix found during Discovery probe runs. - New library-agnostic types `MxAlarmSnapshotRecord` / `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build path is testable without an AVEVA install. - `AlarmRecordTransitionMapper` retired the COM-coupled `MapTransitionKind(eAlmTransitions)`; new pure helpers `ParseStateKind`, `MapTransition(prev, curr)`, and `ParseTransitionTimestampUtc` cover XML decode + state-delta logic. - `IMxAccessAlarmConsumer` event surface changed from `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>` and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` — decoupling the interface from any specific COM library. - Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider` refs; adds `Interop.WNWRAPCONSUMERLib`. Tests: - 36 new unit tests (state-string mapping, prev/current → proto kind decision table, timestamp UTC reassembly, XML payload parser, 32-char hex GUID round-trip) covering everything that doesn't touch the live COM surface — all passing. - Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives the live capture flow for regression / future probes. Docs: - `docs/AlarmClientDiscovery.md` "Option A — captured" section records sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip (prefer `Subscribe` for filtering), the `GetStatistics` AccessViolationException quirk, and the worker-integration outline. Pre-existing failure noted (separate): `MxAccessInteropReference_ExistsOnlyInWorkerProject` was already failing on HEAD — the test project still references `ArchestrA.MxAccess` for the Skip-gated discovery probes. Not regressed by this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f490ae2593 |
docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements only IDisposable (no [ComImport] interface, no class GUID) and has a single field "CwwAlarmConsumer* m_almUnmanaged". So AlarmClient is a C++/CLI managed wrapper around a native C++ class -- NOT a COM-interop class. The DateTime conversion happens INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling boundary. There's no separate COM interface to QI to. Revised approach (in docs/AlarmClientDiscovery.md): A. wnwrapConsumer.dll -- separate standalone COM library AVEVA ships at "C:\Program Files (x86)\Common Files\ArchestrA" exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output bypasses FILETIME marshaling entirely. Best fit -- real COM, self-contained, conventional production-grade approach. B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a vendor binary, brittle to upgrades. C. Reflect into m_almUnmanaged and call native vtable directly -- requires reverse-engineering the C++ class layout. Picking A. Probe restored to Skip; next commit starts the wnwrapConsumer integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
39f9fd8946 |
probe: BREAKTHROUGH — alarms flow via canonical \Node\Galaxy!Area, blocked by DateTime marshaling
Two findings that turn the alarm capture path on: 1. Subscription expression: \<MachineName>\Galaxy!<Area> is the canonical AlarmClient subscription format per ArchestrA docs: \Node\Provider!Area!Filter, with Provider literally "Galaxy" (not the Galaxy name) and Node being the machine name. For this rig: \DESKTOP-6JL3KKO\Galaxy!DEV catches alarms. 2. InitializeConsumer before RegisterConsumer — discovered earlier; bug-fix for PR A.5's AlarmClientConsumer. With these in place, GetHighPriAlarm returned a record on every poll for 60s straight (117/117 calls). But every call throws ArgumentOutOfRangeException: Not a valid Win32 FileTime, because AlarmRecord has five DateTime fields (ar_Time / ar_OrigTime / ar_AckTime / ar_RtnTime / ar_SubTime) and AVEVA writes sentinel FILETIME values for unset ones (e.g., ar_AckTime on an unacknowledged alarm). The aaAlarmManagedClient.dll auto-marshals FILETIME -> DateTime and rejects out-of-range values. GetStatistics still reports total=0 active=0 even with GetHighPriAlarm returning records — those two APIs have different views. The active read API for current alarms is GetHighPriAlarm, not GetStatistics's change array. So the consumer chain works. The blocking issue is now extracting the payload past the AVEVA-shipped DateTime auto-marshaling. Three approaches for the next PR: 1. Patch aaAlarmManagedClient.dll via ildasm/ilasm round-trip. 2. Define a custom [ComImport] interface with safe-blittable types and Marshal.QueryInterface to it. 3. Use IDispatch late binding to bypass strong-typed marshaling. Option 2 is cleanest; needs the AlarmClient COM IID. Probe changes: - Subscription expression set to \<MachineName>\Galaxy!DEV. - GetHighPriAlarm tally counters (ok-with-record vs throw). - 117 throws / 0 ok-with-record over 60s confirms alarms are flowing continuously while the user's flip script runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bb7be14d1d |
probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified
Sixth probe iteration with every consumer-side knob exhausted: - Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!, \Galaxy!TestArea, \.\Galaxy!. - Read channels polled at 500ms: GetStatistics, GetHighPriAlarm, SFCreateSnapshot + SFGetStatistics. - Filters: priority 0..32767, qtSummary + qtHistory both tried, asAlarmActiveNow. - AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug on default(DateTime). Result: every read API returns empty for the entire 60s window even with TestMachine_001.TestAlarm001 firing every 10s and aaObjectViewer confirming InAlarm transitions. The aaAlarmManagedClient.AlarmClient is not the receive surface AVEVA's alarm pipeline routes to in this Galaxy configuration. The consumer chain is verified working end-to-end: Initialize + Register + Subscribe all succeed, GetProviders finds the provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's internal hwnd. There is simply no alarm data flowing through this consumer surface. Next investigation is not consumer-side: either find the SDK aaObjectViewer's alarm panel uses, or query the historian event storage directly. If alarms only flow via the historian path on this customer's Galaxy, the worker's PR A.5 architecture is a dead-end and A.2 needs a different transport. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8ac6642bf8 |
probe: subscribe-parameter sweep — alarms still absent, producer-side blocked
Tried every documented subscription knob with InitializeConsumer present + provider visible at status 100: - qtSummary AND qtHistory (the only eQueryType values). - Priority 1..999 AND 0..32767. - FilterMask/Spec asNone AND asAlarmActiveNow. eAlarmFilterState is single-state-valued (asNone=0, asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits, so the filter surface is exhausted. GetStatistics continued to report total=0 active=0 codes=[7] for every poll across all combinations. User confirmation: the BoolAlarm extension on TestMachine_001.TestAlarm001 is evaluating (the $Alarm.InAlarm sub-attribute flips true/false in lockstep with the script writes, visible in aaObjectViewer). So the consumer chain is verified working end-to-end on our side. What's missing is producer-side publication into the aaAlarmManagedClient stream. Probable causes (config, not code): - BoolAlarm extension's "publish to alarm manager" / "Active" / "Enabled" flag may be off. - Alarm-vs-event mode setting may have it routing to events, not alarms. - Platform alarm area may not match the consumer's subscription scope. Resolution path: check the BoolAlarm extension's config in System Platform IDE; check aaObjectViewer's Active Alarms panel (not attribute panel) to see if the alarm appears there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4e8928cf71 |
probe: InitializeConsumer required — provider visible after, alarms still absent
InitializeConsumer was the missing call. Adding it before
RegisterConsumer makes the \Galaxy! provider appear in
GetProviders (status 0 -> 100 within 500ms). Without Initialize,
GetProviders returns an empty list even though everything else
returns rc=0 (success).
Probe trace 2026-05-01:
InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> count=1 list=[ 0 \Galaxy!]
GetProviders [poll #1] -> count=1 list=[100 \Galaxy!]
Despite the provider being at "100% query complete" for the
entire 60s window, GetStatistics continued to report
total=0 active=0 codes=[7] -- no alarm transitions reached the
consumer even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run.
So the consumer chain works end-to-end. What's missing is alarm
traffic from the producer side. The next discriminator is
whether ObjectViewer (or another live consumer) sees the alarm
fire while the script runs.
API-ordering bug fix to apply to PR A.5's AlarmClientConsumer
regardless of how A.2 lands: AlarmClientConsumer.Subscribe
should call InitializeConsumer before RegisterConsumer (currently
omits Initialize entirely, which means the provider chain is
never visible from the worker either). That fix lifts a
fundamental bug independent of the polling-vs-callback question.
Probe changes:
- Added InitializeConsumer call before RegisterConsumer.
- Added LogProviders helper that logs only on change; called
after Register, after Subscribe, and on every poll. Easier
to spot when the provider chain transitions from empty to
populated.
- Restored Skip-gating after run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f4423dfb6d |
probe: GetProviders=0 — alarm path upstream-blocked on dev rig
Extended AlarmClientWmProbeTests to call AlarmClient.GetProviders after RegisterConsumer. Run 2026-05-01: GetProviders -> rc=0 count=0 list=[] Zero alarm providers visible to the consumer. This explains every preceding probe run — no providers means no alarm events, regardless of subscription expression or value writes upstream. Even with a System Platform script flipping TestMachine_001.TestAlarm001 every 10s during the run, GetStatistics reported no transitions, no positions[] entries, no field changes after t=0.85s. Possible causes (dev-rig configuration, not code): 1. No $Alarm extension on the test bool — flipping the value writes a value but doesn't fire an alarm. 2. AVEVA alarm-manager service (aaAlarmMgr or equivalent) not running on this rig. 3. Process security context — providers registered under a service account aren't visible to a consumer running under a normal user account. A.2 implementation is blocked on this until at least one provider is visible. Once a provider exists, the polling-vs-callback question is answerable in one probe run; without a provider both paths return the same "nothing happening" answer. Probe changes: - Added in-process MxAccess Write attempt (TriggerWriteValue) — hit TargetParameterCountException so the Write signature is not (handle, item, value); reflection diag added but not resolved. Now disabled in favor of external trigger. - Added GetProviders enumeration after RegisterConsumer. - Removed firePrint/clearPrint markers; probe is observe-only. - Added ArchestrA.MxAccess reference to the test project. Also updated docs/AlarmClientDiscovery.md with the alarm-provider-visibility section explaining what's blocked and why. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3ff4969224 |
probe: GetStatistics polling viable, Galaxy has no active alarms today
Extended AlarmClientWmProbeTests.ProbeAlarmClientWmMessages to also call GetStatistics every ~2s during the pump window. Re-ran on the dev rig 2026-05-01: - GetStatistics is safely callable from the same thread that did RegisterConsumer + Subscribe. Every poll (9 calls / 20s window) returned rc=0, no exceptions. - Galaxy currently has zero active alarms. total=0 active=0 suppressed=0 newAlarms=0 across every poll. positions[] and handles[] arrays were empty. - changes=1 codes=[7] was constant across all polls, matching the constant 1 Hz WM 0xC275 cadence — same heartbeat semantics exposed through both the WM path and the pull API. Confirms the polling design is mechanically viable: GetStatistics threading-affinity is fine and the call is cheap. The remaining unknown is whether GetStatistics populates positions[] / handles[] with real entries when an alarm actually fires. Proving that requires triggering an alarm — next probe is an MxAccess write to a $Alarm-extended boolean tag (reference pending). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
12881ca791 |
docs+test: live AlarmClient WM probe — heartbeat-only, hWnd not used
Added MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs as a Skip-gated
runtime probe. Run on the dev rig 2026-05-01 against the live AVEVA
install (Galaxy reachable, no manual alarm fired). Findings:
- RegisterConsumer(hWnd, ...) and Subscribe("\Galaxy!", ...) both
return 0 (success). Calls are valid against the deployed assembly.
- A registered-message-class WM (ID 0xC275 in this OS session) fires
every ~1 second after Subscribe completes. Constant wParam=0x1100,
constant lParam=0x079E46D8 — looks like a heartbeat / keepalive,
not a per-change notification.
- Critically, this WM is delivered to AVEVA's own internal window
(hwnd=0x18032E), NOT to the consumer hWnd we registered. The
consumer window receives only the standard WM_CREATE / WM_DESTROY
sequence; no AVEVA traffic in between.
This invalidates the WM_APP-pump design previously documented. The
hWnd parameter to RegisterConsumer appears to be a registration
identity only — AVEVA's notification path runs entirely against
AVEVA's own internal window.
Two viable A.2 designs replace the previous one:
1. Polling. Call GetStatistics on a 500ms timer in the worker's STA
and react to whatever change set it reports. No window plumbing
needed. Latency floor = poll period. Matches AVEVA's own
internal heartbeat cadence.
2. Hook AVEVA's internal window. Discover AVEVA's own hwnd,
SetWindowSubclass on it, intercept WM 0xC275 on AVEVA's thread.
Higher fidelity, lower latency, but invasive and fragile across
AVEVA upgrades — likely a non-starter.
Recommendation in docs/AlarmClientDiscovery.md is option 1 (polling)
unless a follow-up probe with a real fired alarm shows AVEVA does
post change-specific WMs to a different hWnd.
Open follow-up probes documented:
- Fire a real Galaxy alarm during pump and check whether WM 0xC275
cadence changes or GetStatistics returns non-empty arrays.
- GetStatistics threading affinity test.
- Hook AVEVA's internal window 0x18032E.
- Decompile aaAlarmManagedClient IL for RegisterConsumer to find
whether WNAL_Register's callback surface is wrapped.
Test project changes:
- Added Reference to aaAlarmManagedClient + IAlarmMgrDataProvider
(Private=true so the DLL gets copied into bin for test load).
- Test-suite-wide: 127 real tests still pass; both alarm-related
Skip-gated tests skip cleanly.
Code change to the probe is additive — the worker is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|