758aca23555752a2e37f8a20945c658e3fc02a14
113 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
06030dd1ef |
Implement MXAccess write commands in the worker
The .proto contract and MxCommandKind already defined Write, Write2,
WriteSecured, and WriteSecured2, but the worker's MxAccessCommandExecutor
had no case for any of them — every write kind fell through to
CreateInvalidRequestReply ("Unsupported MXAccess command kind Write").
Implement all four:
- VariantConverter.ConvertToComValue projects an MxValue into a
COM-marshalable object (scalars, arrays, null) — the inverse of the
existing COM-to-MxValue projection.
- IMxAccessServer / MxAccessComServer gain Write/Write2/WriteSecured/
WriteSecured2, routed to ILMXProxyServer / ILMXProxyServer4.
- MxAccessSession and MxAccessCommandExecutor add the four write paths,
following the existing ExecuteAdvise pattern; the reply is a plain OK
reply and the outcome surfaces later as an OnWriteComplete event.
Verified live: a Write now returns PROTOCOL_STATUS_CODE_OK and produces
an OnWriteComplete event where it previously returned InvalidRequest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
964b40dcbc |
Fix stale WorkerProjectReferenceTests MXAccess-interop assertion
MxAccessInteropReference_ExistsOnlyInWorkerProject asserted the MXAccess COM interop was referenced only by MxGateway.Worker. The worker test project now legitimately references ArchestrA.MxAccess and Interop.WNWRAPCONSUMERLib so it can exercise the COM-facing worker code (WnWrapAlarmConsumer, the alarm tests). Renamed to ..._ExistsOnlyInWorkerAndWorkerTestProjects, updated the assertion to expect both projects, and made it order-independent. The architecture invariant the test protects — the gateway/contracts never reference MXAccess COM — still holds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bb5603b7ec |
Fix flaky GalaxyHierarchyRefreshServiceTests timing race
ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFault BackgroundService cancelled the service immediately after StartAsync, so under parallel load the first RefreshAsync could be skipped (RefreshCallCount 0) and `await executeTask` rethrew TaskCanceledException before the IsFaulted assertion. The test now waits for a TaskCompletionSource signal that the first refresh was attempted before cancelling, and uses Task.WhenAny so a Canceled ExecuteTask does not rethrow. Confirmed stable across full-suite runs (408/408). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ee959e46e6 |
Resolve Contracts-001/004/005/006/007/008 code-review findings
Contracts-001: docs/Grpc.md still described "four MxAccessGateway RPCs" —
updated to the actual six (adding AcknowledgeAlarm and QueryActiveAlarms to
the handler and validation-rule sections).
Contracts-003 (Won't Fix): the finding is factually wrong — the <Protobuf>
item for mxaccess_worker.proto already sets ProtoRoot="Protos"; all three
items are consistent (confirmed back to the reviewed commit).
Contracts-004: corrected the stale GatewayContractInfo XML summary
("before generated protobuf contracts are introduced").
Contracts-005: no proto field/enum value was ever removed, so no reserved
ranges were invented. Added a wire-compatibility policy comment to all three
.proto files instructing future editors to reserve removed numbers.
Contracts-006: documented MxStatusProxy.success — it mirrors the COM
MXSTATUS_PROXY numeric success member, is not a boolean, and clients should
branch on category.
Contracts-007: added 13 round-trip tests covering galaxy_repository.proto
messages, bulk-subscribe payloads, and raw-value/IPC worker bodies.
Contracts-008: WorkerAlarmRpcDispatcher never assigns AcknowledgeAlarmReply.
status, so the old "native status" proto comment was misleading. Corrected
the hresult/status proto comments and documented the worker native_status →
public reply mapping in AlarmClientDiscovery.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b4f5e8eb48 |
Resolve IntegrationTests-007..010 code-review findings
IntegrationTests-007: the three live test classes contend for shared singletons (one MXAccess COM, one ZB SQL DB, one GLAuth). Added LiveResourcesCollection with DisableParallelization and applied it to all three so they no longer run concurrently. IntegrationTests-008: the three live fact attributes each re-implemented the env-var check. Added IntegrationTestEnvironment.IsEnabled and all three now delegate to it. IntegrationTests-009: reworded the misleading "Mock server call context" XML doc — it is a hand-written stub with no verification behavior. IntegrationTests-010: WaitForMessageAsync ignored cancellation. It now takes an optional CancellationToken linked with the timeout; the smoke test shares one cancellation source with the StreamEvents call context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
371bcb3f91 |
Resolve Worker.Tests-008..015 code-review findings
Worker.Tests-008: moved the misplaced WorkerLogRedactor test out of VariantConverterTests into Bootstrap/WorkerLogRedactorTests. Worker.Tests-009: renamed 46 snake_case alarm-test methods to PascalCase Method_Scenario_Expectation. Worker.Tests-010: replaced a weak Assert.Contains with an exact assertion against the real diagnostic message and corrected the XML doc. Worker.Tests-011: renamed and re-documented a cancellation test that overstated what it proved. Worker.Tests-012: added an oversized-frame (MessageTooLarge) test; renamed the mislabeled zero-length-payload test. Worker.Tests-013: removed the fixed-100ms ThrowIfCompletedAsync helper; the caller now races runTask deterministically. Worker.Tests-014: consolidated duplicated test fakes/helpers (FakeRuntimeSession, NoopComApartmentInitializer, NoopEventSink, frame helpers) into a shared TestSupport namespace. Worker.Tests-015: added MxAccessEventQueue coverage for drain-all (maxEvents 0), empty-queue drain, and enqueue-after-fault. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9582de077b |
Resolve Tests-007..012 code-review findings
Tests-007: TestServerCallContext and stream-writer/constraint helpers were copy-pasted across five test files. Consolidated into a shared MxGateway.Tests.TestSupport namespace; duplicates deleted. Tests-008: renamed snake_case alarm-test methods to PascalCase Method_Condition_Result and dropped redundant usings. Re-triaged two inaccurate sub-claims (the "wnwrap" name and a required CompilerServices using). Tests-009: corrected three copy-paste-mismatched XML <summary> comments in SessionManagerTests. Tests-010: added the missing anonymous-localhost security negatives — bypass disallowed, and loopback-allowed from a remote address. Tests-011: SessionWorkerClientFactoryFakeWorkerTests discarded worker tasks. The test class now tracks each launcher and observes its task in DisposeAsync. Tests-012: added xunit.runner.json pinning collection parallelism and documented the ephemeral-port convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1764eff1cf |
Resolve Worker-009..015 code-review findings
Worker-009: WorkerFrameWriter serialized twice and WorkerFrameReader allocated a payload byte[] per frame. The writer now serializes once into a single prefix+payload buffer; the reader rents the payload buffer from ArrayPool and honors the logical frame length. Worker-010: VariantConverter projected a uint+Time value as a full FILETIME, producing a near-1601 timestamp. The FILETIME projection is now gated on `value is long`; uint falls through to the integer projection. Worker-011: replaced the opaque retryAttempts formula in WorkerPipeClient with MaxRetryAttempts = int.MaxValue, leaving the connect deadline as the sole bound. Worker-012: rewrote stale "future PR / polls on a Timer" comments in AlarmDispatcher, AlarmCommandHandler, MxAccessAlarmEventSink and MxAccessEventMapper to match the shipped, post-Worker-001 behavior. Worker-013 (re-triaged): already resolved — StaMessagePumpTests and MxAccessStaSessionTests cover the pump and poll loop directly. Worker-014: moved IAlarmCommandHandler into its own file so AlarmCommandHandler.cs declares one public type. Worker-015: clarified the MxAccessBaseEventSink.EnqueueEvent overflow-catch comment explaining the deliberate double RecordFault no-op. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
fe9044115b |
Resolve Server-007..014 code-review findings
Server-007: GalaxyHierarchyProjector re-filtered the whole hierarchy per page (O(total) paging). It now memoizes the filtered list per cache-entry + filter signature so subsequent pages are an O(pageSize) slice. Server-008: WatchDeployEvents re-resolved browse subtrees and rebuilt globs per streamed event. ResolveBrowseSubtrees is hoisted out of the loop and GalaxyGlobMatcher caches compiled Regex instances per pattern. Server-009: auth-store connections used no busy timeout or WAL. A new OpenConnectionAsync applies journal_mode=WAL and a busy_timeout; all auth call sites use it. docs/Authentication.md updated. Server-010: the dashboard rendered Rotate/Revoke for revoked keys, where Rotate silently reactivates them. ApiKeysPage now shows actions only for Active keys. docs/Authentication.md updated. Server-011: WorkerAlarmRpcDispatcher converted to a primary constructor and brought in line with module conventions. Server-012: CLAUDE.md corrected to the canonical *:* scope strings. Server-013 (partly re-triaged): three named coverage gaps were already closed; the genuine gap (WorkerExecutableValidator) is now covered. Server-014: rewrote stale "alarm path not yet wired" comments in MxAccessGatewayService to describe the production WorkerAlarmRpcDispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1f546c46ee |
Resolve Contracts-002 code-review finding
MxCommandReply.payload has no by-name ack case: MX_COMMAND_KIND_ACKNOWLEDGE_ ALARM_BY_NAME reuses the acknowledge_alarm reply payload. Verified the worker (MxAccessCommandExecutor.ExecuteAcknowledgeAlarmByName) and gateway (WorkerAlarmRpcDispatcher) already implement this correctly — the gap was purely undocumented contract asymmetry. Documented the reuse on the proto oneof case and the AcknowledgeAlarmReplyPayload message comment (regenerating the .NET contract), and in docs/AlarmClientDiscovery.md. Added ProtobufContractRoundTripTests.MxCommandReply_AcknowledgeAlarmByName_Reuses AcknowledgeAlarmPayloadCase to pin the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f13f35bc79 |
Resolve IntegrationTests-003..006 code-review findings
IntegrationTests-003: the live MXAccess smoke test asserted on the first streamed event, which a registration/quality bootstrap event could occupy. The recording writer now waits for the first event matching a predicate (Family == OnDataChange). IntegrationTests-004: the cleanup `finally` could throw and mask an original assertion failure. Shutdown now routes through a helper that logs cleanup exceptions instead of propagating them. IntegrationTests-005: added live MXAccess parity tests — a Write round-trip to an advised item, and an invalid-handle command surfacing the MXAccess failure without a transport fault. IntegrationTests-006: added live LDAP failure-path tests — wrong password (no password leak), unknown username, and server-unreachable. docs/GatewayTesting.md updated to describe the new cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
18ce2922e2 |
Resolve Worker.Tests-003..007 code-review findings
Worker.Tests-003: removed the wall-clock `Elapsed < 2s` assertion from InvokeAsync_WakesIdlePumpForQueuedCommand; the awaited completion against a 30s idle period already proves the wake event drove dispatch. Worker.Tests-004: MxAccessStaSession.Dispose now joins the alarm poll task after cancelling the CTS (consistent with ShutdownGracefullyAsync), and Dispose_StopsAlarmPollLoop asserts deterministically instead of via Task.Delay. Worker.Tests-005: undisposed MemoryStream instances across the frame-protocol and pipe-session tests are now `using` declarations. Worker.Tests-006: Dispose_StopsAlarmPollLoop now constructs MxAccessStaSession with `using` so a failed assertion cannot leak the STA poll loop. Worker.Tests-007: docs/WorkerFrameProtocol.md verification section corrected to target MxGateway.Worker.Tests / MxGateway.Worker with -p:Platform=x86. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
5ade3f4f48 |
Resolve Tests-003, -004, -005, -006 code-review findings
Tests-003: temp auth-DB directories leaked under %TEMP%. Added the TempDatabaseDirectory IDisposable helper (clears the Sqlite connection pool, then recursively deletes); SqliteAuthStoreTests and ApiKeyAdminCliRunnerTests now dispose every directory they create. Tests-004: added end-to-end coverage composing the real authorization interceptor in front of the real MxAccessGatewayService, plus scope-resolver tests confirming an unmapped request type fails closed to the admin scope. Tests-005: added coverage for a worker faulting mid-command — a pipe disconnect and a worker fault while an InvokeAsync is in flight both fail the pending invoke. No product change needed. Tests-006 (re-triaged): the flaky ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess is a test race, not a product bug — the kill runs synchronously inside SetFaulted. Rewrote it to await FakeWorkerProcess exit deterministically, and replaced fixed Task.Delay timing in the late-reply and heartbeat tests with FIFO ordering and an injected ManualTimeProvider. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
54325343bd |
Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state. ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing the StaHung fault. Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001; the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync, which caught only graceful-stop exceptions. A failing PollOnce now records a WorkerFault on the event queue instead of vanishing on a non-awaited task. Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed out, leaking the STA thread and COM object. It now always disposes (MxAccessStaSession.Dispose is idempotent and bounded). Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember reflection fallback with an IMxAccessServer fast path plus typed ILMXProxyServer* casts; a non-conforming object now fails fast. Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession records the alarm consumer's STA thread id and asserts every PollOnce runs on it via a unit-testable guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1d9e3afadd |
Resolve Server-002, -004, -005, -006 code-review findings
Server-002: the gateway never terminated leftover MxGateway.Worker.exe processes at startup, contradicting gateway.md and CLAUDE.md. Added IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator, and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are accepted); updated gateway.md to describe the implemented behavior. Server-004: API-key scopes were persisted verbatim with no validation. Added GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now reject unknown scope strings. Server-005: a non-SqlException/InvalidOperationException fault on the initial Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches all non-cancellation exceptions on first load and RefreshCoreAsync broadens its catch so the cache records Stale/Unavailable instead. Server-006: OpenSessionAsync incremented the open-sessions gauge before alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch path now calls SessionRemoved() when the gauge was incremented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1b4dcf32d5 |
Resolve Worker.Tests-001 and Worker.Tests-002 code-review findings
Worker.Tests-001: StaMessagePump had no direct unit test. Added Sta/StaMessagePumpTests.cs — 8 STA-thread facts covering WaitForWorkOrMessages (wake-event signalled before/during the wait, timeout expiry, zero-timeout fast path, the QS_ALLINPUT posted-message wake path) and PumpPendingMessages drain counting. Worker.Tests-002: no test drove a COM event through the integrated sink -> mapper -> queue path. Added MxAccess/MxAccessBaseEventSinkTests.cs — 5 facts driving OnDataChange, OnWriteComplete, OperationComplete and OnBufferedDataChange through a real MxAccessBaseEventSink + mapper + queue and asserting the converted WorkerEvent lands in MxAccessEventQueue. The four COM event handlers were widened private -> internal and InternalsVisibleTo for MxGateway.Worker.Tests was added, mirroring MxAccessAlarmEventSink's existing test seam; no worker behavior changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
53e3973209 |
Resolve Worker-001, Worker-002, Worker-003 code-review findings
Worker-001: WnWrapAlarmConsumer armed a System.Threading.Timer whose OnPoll callback ran GetXmlCurrentAlarms2 on a thread-pool thread against the Apartment-threaded wnwrap COM object, which can deadlock on cross-apartment marshaling. Removed the pollTimer/pollIntervalMs fields, OnPoll, the poll-interval constructor parameter, and the timer arm/disposal. Polls are driven externally by the STA via StaRuntime.InvokeAsync(PollOnce). Worker-002: RunHeartbeatLoopAsync delayed a full HeartbeatInterval before the first heartbeat. Restructured so the first beat is sent immediately on entering the loop and the delay applies only between subsequent beats. Worker-003: ProcessCommandAsync silently returned without a reply when _state was not a command-serving state after dispatch. Both drop sites now log a WorkerCommandResultDropped diagnostic with correlation_id via IWorkerLogger; _state is now volatile. Three pre-existing tests that asserted strict frame ordering were updated to tolerate an interleaved first heartbeat (Worker-002 consequence). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b381bfcaf1 |
Resolve Tests-001 and Tests-002 code-review findings
Tests-001: FakeSessionManager.TryGetSession unconditionally synthesized a session, so Invoke_WhenSessionMissing_ThrowsNotFound did not actually verify the missing-session path. Added ResolveOnlySeededSessions/SeedSession to the fake, rewrote the missing-session test, and added seeded-resolution and alarm-RPC missing-session coverage. Tests-002: re-triaged. GalaxyRepository issues only constant SQL; filters are applied in-memory by GalaxyHierarchyProjector/GalaxyGlobMatcher. Kept as a valid coverage gap and added GalaxyFilterInputSafetyTests exercising filter/glob input safety directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a8aafdf974 |
Enforce dashboard authorization on all component routes
Fixes code-review findings Server-001 (Critical) and Server-003 (High). Server-001: the dashboard Razor components were mapped with no authorization policy, so every dashboard page — including the API Keys page — was reachable unauthenticated. MapRazorComponents<App>() now requires DashboardAuthenticationDefaults.AuthorizationPolicy; unauthenticated requests are challenged by the cookie scheme and redirected to the login page. Server-003: DashboardAuthenticator.CreatePrincipal never issued the 'scope' claim that DashboardAuthorizationHandler checks when Dashboard:RequireAdminScope is enabled, so enforcing the policy would have denied every LDAP login. CreatePrincipal (reached only after the required-group check passes) now emits the admin scope claim. Replaces the GatewayApplicationTests case that asserted dashboard routes allow anonymous access — it encoded the bug as expected behavior — with tests that verify component routes require the policy and the login/logout/denied endpoints allow anonymous. All 309 MxGateway.Tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a67a5a4857 |
fix(worker): wire alarm command handler and STA poll loop (Gap 1 + Gap 2)
Gap 1 — WorkerPipeSession now passes `eq => new AlarmCommandHandler(eq)` as
the alarmCommandHandlerFactory in all three places it constructs
MxAccessStaSession (two convenience constructors and InitializeMxAccessAsync).
Previously the parameterless MxAccessStaSession() set the factory to null,
so every SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms command
returned "alarm consumer not configured" in a deployed worker.
- Added internal `MxAccessStaSession(Func<MxAccessEventQueue, IAlarmCommandHandler>?)`
constructor that builds all defaults but accepts a factory.
- Added public `MxAccessStaSession(StaRuntime, factory, eventQueue, alarmFactory?)`
4-arg overload to complete the constructor chain.
Gap 2 — WnWrapAlarmConsumer now disables its internal threadpool Timer
(pollIntervalMilliseconds=0 in the default constructor). MxAccessStaSession
starts a `RunAlarmPollLoopAsync` background task that sleeps off-STA then
calls `staRuntime.InvokeAsync(() => handler.PollOnce())` at 500ms intervals.
This satisfies the ThreadingModel=Apartment requirement of wwAlarmConsumerClass:
every GetXmlCurrentAlarms2 call now runs on the worker's STA.
- Added `PollOnce()` to `IMxAccessAlarmConsumer`, `AlarmDispatcher`,
`IAlarmCommandHandler`, and `AlarmCommandHandler`.
- Poll loop cancelled and awaited before alarm handler disposal in both
ShutdownGracefullyAsync and Dispose.
Tests: 4 new tests in MxAccessStaSessionTests verify that
- SubscribeAlarms reaches the handler when the factory is wired (Gap 1)
- SubscribeAlarms returns InvalidRequest without a factory (regression guard)
- PollOnce is called on the STA thread within 3s (Gap 2)
- The poll loop stops after Dispose (Gap 2 lifecycle)
All fake IMxAccessAlarmConsumer / IAlarmCommandHandler test implementations
updated with no-op PollOnce() to satisfy the new interface member.
Worker tests: 199 passed / 1 pre-existing failure / 4 skipped (was 195/1/4).
Server tests: 308 passed / 0 failures (unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e00ee61cf0 |
Place Last Refresh next to Last Deploy on the Galaxy page
Group the two double-width timestamp cards at the start of the metric row so the deploy/refresh pair reads together, ahead of the count cards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
271bf7edff |
Give Galaxy timestamp cards double-width boxes
The Last Deploy and Last Refresh metric cards hold full timestamps that wrapped to three or four lines in a single-width card. Add a Wide option to MetricCard (grid-column: span 2) and set it on both Galaxy timestamp cards. Also switch .metric-value to overflow-wrap: break-word so a date token is never split mid-value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f598b3a647 |
Stop dashboard table cells breaking whole words
overflow-wrap: anywhere let the table layout shrink a column below its longest word, splitting tokens like "unconstrained" mid-word. Switch to break-word so a word only breaks when it genuinely cannot fit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
509b0118d4 |
Keep dashboard button labels on a single line
Bootstrap 5 .btn no longer pins white-space, so the Rotate/Revoke action buttons broke mid-word in the narrow API Keys actions column. Pin .btn to white-space: nowrap so a label always reads as one word. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
298836d2f3 |
Keep dashboard status chips on a single line
Status chips wrapped to two lines in narrow table columns (e.g. the session State column). Pin .chip to white-space: nowrap so an enumerated state always reads as one token. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
96bea1d478 |
Apply technical-light design system to the gateway dashboard
Restyles the Blazor dashboard onto a portable token-based theme so it reads like an instrument panel: warm-paper background, hairline-ruled panels, IBM Plex type, monospace tabular numerics, and status carried by colour chips. Vendors theme.css + IBM Plex fonts, rewrites dashboard.css as a thin token-driven view layer, and swaps the Bootstrap navbar and status badges for the design-system app bar and chips. Also includes pending API-key management, Galaxy hierarchy projection, and constraint-enforcement work with their tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a4ed605f74 |
A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
- Subscribe + 1st PollOnce yield real transition events
- Field-by-field decode correct (provider, group, tag, severity,
UTC timestamp, comment, type)
- SnapshotActiveAlarms reflects current state
- AcknowledgeByName(real identity) -> rc=0
- Pipeline keeps streaming transitions on the 10s cadence post-ack
Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:
1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
earlier discovery-doc recommendation) makes the first
GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
call is unnecessary because the round-trip echo is mangled was
wrong — mangled echo or not, the call is required.
2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
instance (returns -55). Workaround: provision a parallel
"ack-only" wnwrap consumer that runs Initialize → Register →
Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
Production WnWrapAlarmConsumer now holds two COM clients;
AcknowledgeByName always dispatches through the ack-only one.
3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
8-arg overload returns -55 on this AVEVA build (apparently a
stub); the v1 6-arg overload works. Production now calls the
6-arg overload, discarding the proto's operator_domain and
operator_full_name fields. The proto contract keeps both for
forward-compat if AVEVA fixes the v2 method.
Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).
Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.
Test counts after this slice:
Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
1 pre-existing structure-fail (untouched)
Server: 308 pass / 0 fail
Solution builds clean.
docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4e02927f01 |
A.3 (alarm-ack-by-name): public AcknowledgeAlarm now accepts Provider!Group.Tag references
Closes the gap where the public AcknowledgeAlarm RPC required canonical GUIDs but OnAlarmTransitionEvent.AlarmFullReference is "Provider!Group.Tag". Adds an AVEVA AlarmAckByName path that wraps wwAlarmConsumerClass.AlarmAckByName so callers can ack with the natural reference. Proto: - New MxCommandKind.AcknowledgeAlarmByName (=29). - New AcknowledgeAlarmByNameCommand(alarm_name, provider_name, group_name, comment, operator_user/node/domain/full_name) on MxCommand oneof. - AcknowledgeAlarmReplyPayload (existing) carries the AVEVA native status; reused for the by-name path. Worker: - IMxAccessAlarmConsumer + WnWrapAlarmConsumer + AlarmDispatcher + AlarmCommandHandler all gain an AcknowledgeByName(name, provider, group, comment, operator-identity) overload that maps to wwAlarmConsumerClass.AlarmAckByName. - MxAccessCommandExecutor: new switch arm routes MxCommandKind.AcknowledgeAlarmByName to the handler. Empty alarm_name yields InvalidRequest; handler exceptions surface as MxaccessFailure. Gateway: - WorkerAlarmRpcDispatcher.TryParseAlarmReference: parses "Provider!Group.Tag" with the convention that the FIRST '!' separates provider, the FIRST '.' after '!' separates group; tag may contain more dots. - AcknowledgeAsync now branches: GUID input → AcknowledgeAlarm command (existing path); reference input → AcknowledgeAlarmByName command (new path); neither parses → InvalidRequest with a clear diagnostic. Tests: 13 new unit tests cover each layer end-to-end: - WorkerAlarmRpcDispatcher.TryParseAlarmReference (3 valid + 8 invalid forms) including the realistic 4-component "Galaxy!TestArea. TestMachine_001.TestAlarm001" reference. - WorkerAlarmRpcDispatcher.AcknowledgeAsync routes references through AcknowledgeAlarmByName + propagates the full operator tuple. - Executor switch arm carries the by-name tuple and rejects empty alarm_name. - AlarmDispatcher.AcknowledgeByName forwards to consumer. - Existing fakes extended for the new overload. Counts: server 308/0, worker 195/3 skip / 1 pre-existing structure-fail (untouched). Solution builds clean. End-to-end alarms-over-gateway now serves the full lmxopcua flow: client.AcknowledgeAlarm(reference="Galaxy!TestArea.TestMachine_001.TestAlarm001", operator_user="alice") → gateway parses → IPC AcknowledgeAlarmByName → worker AlarmAckByName → AVEVA history. The remaining piece for full parity is a live dev-rig smoke test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
47b1fd422c |
A.3 (auto-subscribe): SessionManager issues SubscribeAlarms on session open
Adds the missing trigger that activates the worker's wnwrap consumer.
Without this, every session opened in OK state but the consumer never
started, so AcknowledgeAlarm/QueryActiveAlarms returned "alarm consumer
not configured" forever.
New AlarmsOptions config block (under MxGateway:Alarms):
- Enabled (default false): gates the auto-subscribe path so existing
deployments without alarm configuration are unaffected.
- SubscriptionExpression: explicit AVEVA expression like
\<machine>\Galaxy!<area>.
- DefaultArea: fallback used when SubscriptionExpression is empty;
composes \$(MachineName)\Galaxy!$(DefaultArea).
- RequireSubscribeOnOpen (default false): when true, an auto-subscribe
failure faults the session; when false, the failure is logged and
the session stays Ready (data subscriptions keep working, alarms
return "not subscribed" until the operator retries).
SessionManager.OpenSessionAsync gains a TryAutoSubscribeAlarmsAsync hook
that runs after MarkReady. Skips when alarms are disabled; otherwise
builds a SubscribeAlarmsCommand, invokes it on the session's worker
client, and either logs the resulting status or escalates per
RequireSubscribeOnOpen. SessionManagerException is the failure mode for
the strict path so callers in MxAccessGatewayService surface it as
session-open-failed.
Tests: 7 new unit tests cover the disabled lane, expression-driven
subscribe, DefaultArea fallback, success path, soft-failure (require
off), strict-failure (require on), and missing-config-strict-throw.
Server suite total: 295 pass / 0 fail. Solution builds clean.
End-to-end alarms-over-gateway path is now live (with config). Open a
session against a gateway with Alarms.Enabled=true + a valid
SubscriptionExpression; the worker's wnwrap consumer auto-subscribes;
QueryActiveAlarms streams snapshots; AcknowledgeAlarm acks by GUID.
Reference→GUID resolution (AlarmAckByName worker command) and the live
dev-rig smoke test remain follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9b21ca3554 |
A.3 (gateway dispatcher): WorkerAlarmRpcDispatcher routes alarm RPCs over the worker pipe
Replaces NotWiredAlarmRpcDispatcher in DI with a production
implementation that issues the new MxCommandKind.{AcknowledgeAlarm,
QueryActiveAlarms} commands across the IPC and unwraps the resulting
MxCommandReply into the public RPC types.
QueryActiveAlarms is fully wired: builds the QueryActiveAlarmsCommand
(forwarding alarm_filter_prefix), invokes it on the resolved
GatewaySession's worker client, and yields each ActiveAlarmSnapshot
from the QueryActiveAlarmsReplyPayload as the RPC stream. Worker
failures + missing sessions yield an empty stream — matches the
ConditionRefresh contract clients already speak to.
AcknowledgeAlarm is partially wired: the public RPC takes
AlarmFullReference (Provider!Group.Tag), but the worker's wnwrap
consumer acks by GUID. Strategy:
- If AlarmFullReference parses as a canonical GUID, forward it
directly through MxCommandKind.AcknowledgeAlarm. Native status
flows back via MxCommandReply.Hresult and the dedicated
AcknowledgeAlarmReplyPayload.NativeStatus.
- Otherwise, return InvalidRequest with a clear diagnostic naming the
follow-up — reference→GUID lookup needs a worker-side AlarmAckByName
command wrapping wwAlarmConsumerClass.AlarmAckByName.
DI: SessionServiceCollectionExtensions registers WorkerAlarmRpcDispatcher
as the default IAlarmRpcDispatcher; MxAccessGatewayService picks it up
via constructor injection. NotWiredAlarmRpcDispatcher is retained for
test fixtures that want the no-side-effect fake.
Tests: 7 new unit tests cover session-not-found short-circuit, GUID-vs-
reference branching, native-status propagation, worker MxaccessFailure
diagnostic propagation, and snapshot-stream yielding. Server test
suite total: 288/0 fail. Solution builds clean.
End-to-end alarms-over-gateway pipeline status:
consumer → sink → queue (A.2 + A.3 in-process slice)
worker IPC commands (A.3 worker slice)
gateway dispatcher (this slice)
Remaining for full E2E:
- Auto-issue SubscribeAlarms on session open (or add a public
SubscribeAlarms RPC). Without this trigger the consumer never
starts and Acknowledge/Query return "not subscribed".
- AlarmAckByName worker command for ack-by-reference.
- End-to-end live test against the dev rig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
01f5e6ad91 |
A.3 (worker IPC slice): proto SubscribeAlarms/Acknowledge/QueryActive commands + executor routing
Adds the worker-side IPC surface for the alarm subsystem so the gateway can drive the AlarmDispatcher across the named-pipe boundary. Adds four proto MxCommandKind values + matching command messages and two MxCommandReply payload variants: - SubscribeAlarmsCommand(subscription_expression) - UnsubscribeAlarmsCommand - AcknowledgeAlarmCommand(alarm_guid, comment, operator_user/node/domain/full_name) - QueryActiveAlarmsCommand(alarm_filter_prefix) - AcknowledgeAlarmReplyPayload(native_status) - QueryActiveAlarmsReplyPayload(repeated ActiveAlarmSnapshot snapshots) Worker plumbing: - New IAlarmCommandHandler interface + AlarmCommandHandler production impl. Lazy-creates an AlarmDispatcher (with a wnwrap-backed consumer by default) on the first SubscribeAlarms; routes Acknowledge / QueryActive / Unsubscribe through it. Idempotent under repeated Unsubscribe; rejects a second Subscribe without an intervening Unsubscribe; cleans up the consumer if the underlying Subscribe call throws. - MxAccessCommandExecutor: 4 new switch arms map MxCommandKind values to IAlarmCommandHandler calls. Acknowledge surfaces the AVEVA native status into both MxCommandReply.Hresult and the dedicated AcknowledgeAlarmReplyPayload.NativeStatus so gateway-side consumers can echo it without unpacking the outer envelope. Invalid GUIDs and missing payloads return InvalidRequest; handler exceptions return MxaccessFailure with the exception message in DiagnosticMessage. - MxAccessStaSession: new constructor overload accepts an alarmCommandHandlerFactory; it's invoked on the STA thread during StartAsync and the resulting handler is passed into the executor. ShutdownGracefullyAsync + Dispose tear it down on the STA before the data-side cleanup runs. Tests: 20 new unit tests covering AlarmCommandHandler lazy lifecycle (Subscribe/Unsubscribe/Acknowledge/Query/Dispose, error paths) and the executor's 4 alarm switch arms (OK/InvalidRequest/MxaccessFailure paths, hresult propagation, prefix filtering). Worker test suite total: 192 passed / 3 skipped (live probes) / 1 pre-existing structure-test fail (untouched). Deferred to next slice: gateway-side WorkerAlarmRpcDispatcher that replaces NotWiredAlarmRpcDispatcher, builds + sends these commands across the IPC, and unwraps the resulting MxCommandReply into AcknowledgeAlarmReply / ActiveAlarmSnapshot stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
82eb0ad569 |
A.3 (in-process slice): AlarmDispatcher wires consumer events onto event queue
Adds the in-process plumbing that connects WnWrapAlarmConsumer's AlarmTransitionEmitted stream to the worker's MxAccessEventQueue via MxAccessAlarmEventSink. With this change a transition raised by the consumer lands as an OnAlarmTransitionEvent proto on the queue, SessionId attached, ready for IPC dispatch. Mapping: provider!group.tag → AlarmFullReference, tag → SourceObjectReference, priority → severity, wnwrap STATE → AlarmConditionState (Active / ActiveAcked / Inactive — wnwrap's ack-vs-unack-on-cleared distinction collapses since OPC UA Part 9 doesn't model it). State delta drives AlarmTransitionKind via the existing AlarmRecordTransitionMapper table. Holding off on the proto IPC additions (SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms commands + WorkerAlarmRpcDispatcher) for a follow-up — those touch every layer of the worker IPC and warrant their own PR. This slice proves the consumer→sink→queue pipeline end-to-end with unit tests and clears the path for the proto additions to plug in cleanly. Tests: 10 new unit tests cover field-by-field mapping, the "unchanged-state-doesn't-emit" filter, the state→transition kind table, Subscribe / Acknowledge passthrough, SnapshotActiveAlarms → proto ActiveAlarmSnapshot mapping, and Dispose detaches the handler. All passing; total worker test count 172/3 skip / 1 pre-existing structure fail (untouched). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f711a55be4 |
A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient` to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by `wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime auto-marshaling that crashed `GetHighPriAlarm` with ArgumentOutOfRangeException on every poll. Live captured 60/60 polls clean against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform script flipped TestMachine_001.TestAlarm001 every 10s; the GUID, priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps arrived end-to-end. Implementation: - `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under `lib/` so dev boxes don't need the SDK to build. - New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a 500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the snapshot keyed by alarm GUID, and raises one `MxAlarmTransitionEvent` per state change. Includes the Initialize→Register-before-Subscribe ordering fix found during Discovery probe runs. - New library-agnostic types `MxAlarmSnapshotRecord` / `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build path is testable without an AVEVA install. - `AlarmRecordTransitionMapper` retired the COM-coupled `MapTransitionKind(eAlmTransitions)`; new pure helpers `ParseStateKind`, `MapTransition(prev, curr)`, and `ParseTransitionTimestampUtc` cover XML decode + state-delta logic. - `IMxAccessAlarmConsumer` event surface changed from `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>` and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` — decoupling the interface from any specific COM library. - Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider` refs; adds `Interop.WNWRAPCONSUMERLib`. Tests: - 36 new unit tests (state-string mapping, prev/current → proto kind decision table, timestamp UTC reassembly, XML payload parser, 32-char hex GUID round-trip) covering everything that doesn't touch the live COM surface — all passing. - Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives the live capture flow for regression / future probes. Docs: - `docs/AlarmClientDiscovery.md` "Option A — captured" section records sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip (prefer `Subscribe` for filtering), the `GetStatistics` AccessViolationException quirk, and the worker-integration outline. Pre-existing failure noted (separate): `MxAccessInteropReference_ExistsOnlyInWorkerProject` was already failing on HEAD — the test project still references `ArchestrA.MxAccess` for the Skip-gated discovery probes. Not regressed by this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f490ae2593 |
docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements only IDisposable (no [ComImport] interface, no class GUID) and has a single field "CwwAlarmConsumer* m_almUnmanaged". So AlarmClient is a C++/CLI managed wrapper around a native C++ class -- NOT a COM-interop class. The DateTime conversion happens INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling boundary. There's no separate COM interface to QI to. Revised approach (in docs/AlarmClientDiscovery.md): A. wnwrapConsumer.dll -- separate standalone COM library AVEVA ships at "C:\Program Files (x86)\Common Files\ArchestrA" exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output bypasses FILETIME marshaling entirely. Best fit -- real COM, self-contained, conventional production-grade approach. B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a vendor binary, brittle to upgrades. C. Reflect into m_almUnmanaged and call native vtable directly -- requires reverse-engineering the C++ class layout. Picking A. Probe restored to Skip; next commit starts the wnwrapConsumer integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
39f9fd8946 |
probe: BREAKTHROUGH — alarms flow via canonical \Node\Galaxy!Area, blocked by DateTime marshaling
Two findings that turn the alarm capture path on: 1. Subscription expression: \<MachineName>\Galaxy!<Area> is the canonical AlarmClient subscription format per ArchestrA docs: \Node\Provider!Area!Filter, with Provider literally "Galaxy" (not the Galaxy name) and Node being the machine name. For this rig: \DESKTOP-6JL3KKO\Galaxy!DEV catches alarms. 2. InitializeConsumer before RegisterConsumer — discovered earlier; bug-fix for PR A.5's AlarmClientConsumer. With these in place, GetHighPriAlarm returned a record on every poll for 60s straight (117/117 calls). But every call throws ArgumentOutOfRangeException: Not a valid Win32 FileTime, because AlarmRecord has five DateTime fields (ar_Time / ar_OrigTime / ar_AckTime / ar_RtnTime / ar_SubTime) and AVEVA writes sentinel FILETIME values for unset ones (e.g., ar_AckTime on an unacknowledged alarm). The aaAlarmManagedClient.dll auto-marshals FILETIME -> DateTime and rejects out-of-range values. GetStatistics still reports total=0 active=0 even with GetHighPriAlarm returning records — those two APIs have different views. The active read API for current alarms is GetHighPriAlarm, not GetStatistics's change array. So the consumer chain works. The blocking issue is now extracting the payload past the AVEVA-shipped DateTime auto-marshaling. Three approaches for the next PR: 1. Patch aaAlarmManagedClient.dll via ildasm/ilasm round-trip. 2. Define a custom [ComImport] interface with safe-blittable types and Marshal.QueryInterface to it. 3. Use IDispatch late binding to bypass strong-typed marshaling. Option 2 is cleanest; needs the AlarmClient COM IID. Probe changes: - Subscription expression set to \<MachineName>\Galaxy!DEV. - GetHighPriAlarm tally counters (ok-with-record vs throw). - 117 throws / 0 ok-with-record over 60s confirms alarms are flowing continuously while the user's flip script runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bb7be14d1d |
probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified
Sixth probe iteration with every consumer-side knob exhausted: - Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!, \Galaxy!TestArea, \.\Galaxy!. - Read channels polled at 500ms: GetStatistics, GetHighPriAlarm, SFCreateSnapshot + SFGetStatistics. - Filters: priority 0..32767, qtSummary + qtHistory both tried, asAlarmActiveNow. - AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug on default(DateTime). Result: every read API returns empty for the entire 60s window even with TestMachine_001.TestAlarm001 firing every 10s and aaObjectViewer confirming InAlarm transitions. The aaAlarmManagedClient.AlarmClient is not the receive surface AVEVA's alarm pipeline routes to in this Galaxy configuration. The consumer chain is verified working end-to-end: Initialize + Register + Subscribe all succeed, GetProviders finds the provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's internal hwnd. There is simply no alarm data flowing through this consumer surface. Next investigation is not consumer-side: either find the SDK aaObjectViewer's alarm panel uses, or query the historian event storage directly. If alarms only flow via the historian path on this customer's Galaxy, the worker's PR A.5 architecture is a dead-end and A.2 needs a different transport. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8ac6642bf8 |
probe: subscribe-parameter sweep — alarms still absent, producer-side blocked
Tried every documented subscription knob with InitializeConsumer present + provider visible at status 100: - qtSummary AND qtHistory (the only eQueryType values). - Priority 1..999 AND 0..32767. - FilterMask/Spec asNone AND asAlarmActiveNow. eAlarmFilterState is single-state-valued (asNone=0, asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits, so the filter surface is exhausted. GetStatistics continued to report total=0 active=0 codes=[7] for every poll across all combinations. User confirmation: the BoolAlarm extension on TestMachine_001.TestAlarm001 is evaluating (the $Alarm.InAlarm sub-attribute flips true/false in lockstep with the script writes, visible in aaObjectViewer). So the consumer chain is verified working end-to-end on our side. What's missing is producer-side publication into the aaAlarmManagedClient stream. Probable causes (config, not code): - BoolAlarm extension's "publish to alarm manager" / "Active" / "Enabled" flag may be off. - Alarm-vs-event mode setting may have it routing to events, not alarms. - Platform alarm area may not match the consumer's subscription scope. Resolution path: check the BoolAlarm extension's config in System Platform IDE; check aaObjectViewer's Active Alarms panel (not attribute panel) to see if the alarm appears there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4e8928cf71 |
probe: InitializeConsumer required — provider visible after, alarms still absent
InitializeConsumer was the missing call. Adding it before
RegisterConsumer makes the \Galaxy! provider appear in
GetProviders (status 0 -> 100 within 500ms). Without Initialize,
GetProviders returns an empty list even though everything else
returns rc=0 (success).
Probe trace 2026-05-01:
InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> count=1 list=[ 0 \Galaxy!]
GetProviders [poll #1] -> count=1 list=[100 \Galaxy!]
Despite the provider being at "100% query complete" for the
entire 60s window, GetStatistics continued to report
total=0 active=0 codes=[7] -- no alarm transitions reached the
consumer even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run.
So the consumer chain works end-to-end. What's missing is alarm
traffic from the producer side. The next discriminator is
whether ObjectViewer (or another live consumer) sees the alarm
fire while the script runs.
API-ordering bug fix to apply to PR A.5's AlarmClientConsumer
regardless of how A.2 lands: AlarmClientConsumer.Subscribe
should call InitializeConsumer before RegisterConsumer (currently
omits Initialize entirely, which means the provider chain is
never visible from the worker either). That fix lifts a
fundamental bug independent of the polling-vs-callback question.
Probe changes:
- Added InitializeConsumer call before RegisterConsumer.
- Added LogProviders helper that logs only on change; called
after Register, after Subscribe, and on every poll. Easier
to spot when the provider chain transitions from empty to
populated.
- Restored Skip-gating after run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f4423dfb6d |
probe: GetProviders=0 — alarm path upstream-blocked on dev rig
Extended AlarmClientWmProbeTests to call AlarmClient.GetProviders after RegisterConsumer. Run 2026-05-01: GetProviders -> rc=0 count=0 list=[] Zero alarm providers visible to the consumer. This explains every preceding probe run — no providers means no alarm events, regardless of subscription expression or value writes upstream. Even with a System Platform script flipping TestMachine_001.TestAlarm001 every 10s during the run, GetStatistics reported no transitions, no positions[] entries, no field changes after t=0.85s. Possible causes (dev-rig configuration, not code): 1. No $Alarm extension on the test bool — flipping the value writes a value but doesn't fire an alarm. 2. AVEVA alarm-manager service (aaAlarmMgr or equivalent) not running on this rig. 3. Process security context — providers registered under a service account aren't visible to a consumer running under a normal user account. A.2 implementation is blocked on this until at least one provider is visible. Once a provider exists, the polling-vs-callback question is answerable in one probe run; without a provider both paths return the same "nothing happening" answer. Probe changes: - Added in-process MxAccess Write attempt (TriggerWriteValue) — hit TargetParameterCountException so the Write signature is not (handle, item, value); reflection diag added but not resolved. Now disabled in favor of external trigger. - Added GetProviders enumeration after RegisterConsumer. - Removed firePrint/clearPrint markers; probe is observe-only. - Added ArchestrA.MxAccess reference to the test project. Also updated docs/AlarmClientDiscovery.md with the alarm-provider-visibility section explaining what's blocked and why. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3ff4969224 |
probe: GetStatistics polling viable, Galaxy has no active alarms today
Extended AlarmClientWmProbeTests.ProbeAlarmClientWmMessages to also call GetStatistics every ~2s during the pump window. Re-ran on the dev rig 2026-05-01: - GetStatistics is safely callable from the same thread that did RegisterConsumer + Subscribe. Every poll (9 calls / 20s window) returned rc=0, no exceptions. - Galaxy currently has zero active alarms. total=0 active=0 suppressed=0 newAlarms=0 across every poll. positions[] and handles[] arrays were empty. - changes=1 codes=[7] was constant across all polls, matching the constant 1 Hz WM 0xC275 cadence — same heartbeat semantics exposed through both the WM path and the pull API. Confirms the polling design is mechanically viable: GetStatistics threading-affinity is fine and the call is cheap. The remaining unknown is whether GetStatistics populates positions[] / handles[] with real entries when an alarm actually fires. Proving that requires triggering an alarm — next probe is an MxAccess write to a $Alarm-extended boolean tag (reference pending). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
12881ca791 |
docs+test: live AlarmClient WM probe — heartbeat-only, hWnd not used
Added MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs as a Skip-gated
runtime probe. Run on the dev rig 2026-05-01 against the live AVEVA
install (Galaxy reachable, no manual alarm fired). Findings:
- RegisterConsumer(hWnd, ...) and Subscribe("\Galaxy!", ...) both
return 0 (success). Calls are valid against the deployed assembly.
- A registered-message-class WM (ID 0xC275 in this OS session) fires
every ~1 second after Subscribe completes. Constant wParam=0x1100,
constant lParam=0x079E46D8 — looks like a heartbeat / keepalive,
not a per-change notification.
- Critically, this WM is delivered to AVEVA's own internal window
(hwnd=0x18032E), NOT to the consumer hWnd we registered. The
consumer window receives only the standard WM_CREATE / WM_DESTROY
sequence; no AVEVA traffic in between.
This invalidates the WM_APP-pump design previously documented. The
hWnd parameter to RegisterConsumer appears to be a registration
identity only — AVEVA's notification path runs entirely against
AVEVA's own internal window.
Two viable A.2 designs replace the previous one:
1. Polling. Call GetStatistics on a 500ms timer in the worker's STA
and react to whatever change set it reports. No window plumbing
needed. Latency floor = poll period. Matches AVEVA's own
internal heartbeat cadence.
2. Hook AVEVA's internal window. Discover AVEVA's own hwnd,
SetWindowSubclass on it, intercept WM 0xC275 on AVEVA's thread.
Higher fidelity, lower latency, but invasive and fragile across
AVEVA upgrades — likely a non-starter.
Recommendation in docs/AlarmClientDiscovery.md is option 1 (polling)
unless a follow-up probe with a real fired alarm shows AVEVA does
post change-specific WMs to a different hWnd.
Open follow-up probes documented:
- Fire a real Galaxy alarm during pump and check whether WM 0xC275
cadence changes or GetStatistics returns non-empty arrays.
- GetStatistics threading affinity test.
- Hook AVEVA's internal window 0x18032E.
- Decompile aaAlarmManagedClient IL for RegisterConsumer to find
whether WNAL_Register's callback surface is wrapped.
Test project changes:
- Added Reference to aaAlarmManagedClient + IAlarmMgrDataProvider
(Private=true so the DLL gets copied into bin for test load).
- Test-suite-wide: 127 real tests still pass; both alarm-related
Skip-gated tests skip cleanly.
Code change to the probe is additive — the worker is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6e356da092 |
docs: AlarmClient public surface — managed-event premise wrong, WM_APP required
Reflection probe of the deployed aaAlarmManagedClient.dll (v1.0.7368.41290) on 2026-05-01 confirmed the public AlarmClient class exposes zero public events. The PR A.5 design that AlarmClientConsumer is built on (managed-event surface, no message pump) does not hold against this assembly. The actual notification mechanism is WM_APP messaging: RegisterConsumer(hWnd, ...) takes a window handle because AVEVA's alarm provider WM_APP-pokes the registered window, then GetStatistics + GetAlarmExtendedRec pull the change set on each poke. Practical impact: - AlarmClientConsumer.AlarmRecordReceived has no production caller. RaiseAlarmRecordReceived is invoked only from tests. Subscribe(...) returns OK from RegisterConsumer + Subscribe but no notifications reach the consumer at runtime because no window is attached. - Until A.2 lands a hidden message-only window + WindowProc that routes WM_APP into MxAccessAlarmEventSink.EnqueueTransition, the gateway's MX_EVENT_FAMILY_ON_ALARM_TRANSITION family cannot carry events. - AcknowledgeByGuid and SnapshotActiveAlarms are pull-style and remain correct as written. Changes: - docs/AlarmClientDiscovery.md (new) — reflection probe summary, full AlarmClient method list, open questions for A.2 implementation. - AlarmClientConsumer.cs xmldoc — replaced the inaccurate "managed event surface" claim with the WM_APP finding; flagged AlarmRecordReceived as unreachable in production until the WM_APP pump lands. - MxAccessAlarmEventSink.cs xmldoc — replaced the "verify on dev rig" hedge in the wiring plan with the resolved finding; expanded the open-questions list (WM_APP message ID, wParam/lParam semantics, STA affinity, subscription scope) so the next A.2 PR knows what the dev-rig probe needs to answer. Code-only no-op for the worker; worker builds clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6b3c117d1e |
gateway: alarm-RPC dispatcher seam (PRs A.6 + A.7)
Replaces the inline diagnostic strings in PR A.3's AcknowledgeAlarm + QueryActiveAlarms handlers with an IAlarmRpcDispatcher seam. - IAlarmRpcDispatcher (new) — gateway-side abstraction over the worker-RPC path that fronts AlarmClient.AlarmAckByGUID and the active-alarm walk. AcknowledgeAsync returns the AcknowledgeAlarmReply directly; QueryActiveAlarmsAsync yields an IAsyncEnumerable<ActiveAlarmSnapshot>. - NotWiredAlarmRpcDispatcher (new, default impl) — returns PROTOCOL_STATUS_OK with a structured worker-pending diagnostic on Acknowledge, yields an empty stream on QueryActiveAlarms. Same observable shape as PR A.3, but the integration seam is now in code instead of hardcoded inside the handler. - MxAccessGatewayService — handlers delegate to the dispatcher. Constructor accepts an optional IAlarmRpcDispatcher (default NotWiredAlarmRpcDispatcher); a future WorkerAlarmRpcDispatcher registration in DI swaps in the live worker-IPC routing without changing the public RPC surface. - 2 new dispatcher tests pin the not-wired contract; 279 → 281 total tests, all green. Worker-side dispatch (translating Acknowledge / QueryActiveAlarms to the IPC method that calls IMxAccessAlarmConsumer from PR A.5) is the dev-rig follow-up — it depends on validating the AVEVA GetAlarmChangesCompleted event subscription against a live alarm provider before pinning a wire format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1ac5bcafb2 |
worker: AlarmClientConsumer + transition mapper (PR A.5)
Wires the worker-side consumer for AVEVA alarm transitions over the
aaAlarmManagedClient API discovered in the prior foundation PR.
- IAlarmMgrDataProvider.dll referenced — exposes AlarmRecord +
eAlmTransitions / eQueryType / eSortFlags / eAlarmFilterState.
Both DLLs (aaAlarmManagedClient + IAlarmMgrDataProvider) load in
the worker's existing net48 x86 process; no new bitness boundary.
- IMxAccessAlarmConsumer abstraction — Subscribe / AcknowledgeByGuid
/ SnapshotActiveAlarms / AlarmRecordReceived event. Test seam.
- AlarmClientConsumer production wrapper — RegisterConsumer +
Subscribe + AlarmAckByGUID + GetStatistics-based active-alarm
walk, all delegated to AlarmClient. Uses AVEVA's managed event
surface (GetAlarmChangesCompleted on IAlarmMgrDataProvider) so
no Windows message pump is required — plain .NET events arrive
on the alarm-client's internal callback thread.
- AlarmRecordTransitionMapper — pure-function helpers:
MapTransitionKind(eAlmTransitions): ALM→Raise, ACK→Acknowledge,
RTN→Clear, others (SUB/ENB/DIS/SUP/REL/REMOVE)→Unspecified
so EventPump's decoding-failure counter records them.
ComposeFullReference(provider, group, name): Provider!Group.Name
format matching AVEVA's standard alarm-reference syntax.
Pinned during dev-rig validation (subsequent commits):
1. Confirm RegisterConsumer accepts hWnd=0 — if it requires a real
hwnd, the worker creates a hidden message-only window and
passes that handle. The managed event surface should make
this irrelevant but the AVEVA API is older than its managed
wrapper.
2. Wire AlarmClientConsumer.AlarmRecordReceived: the AVEVA
IAlarmMgrDataProvider.GetAlarmChangesCompleted event needs to
be hooked from inside the AlarmClient — find the proper
accessor (likely a property exposing the inner provider).
3. AlarmRecord field-by-field translation into the proto event
uses MxAccessAlarmEventSink.EnqueueTransition (existing
plumbing). The AlarmRecord field names (ar_OrigTime,
AlarmName, AckOperatorFullName, AckComment, etc.) are
pinned in the discovery dump preserved in
AlarmClientDiscoveryTests.
Tests: 127 pass (4 new ComposeFullReference cases + 1 Skip-gated
discovery probe). Transition-kind enum mapping is dev-rig-validated
rather than unit-tested because the AVEVA assembly is Private=false
on the reference and isn't copied to the test bin directory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a14098468b |
worker: aaAlarmManagedClient discovery + reference (alarm-helper foundation)
Discovers the surface of aaAlarmManagedClient.dll and stages the worker csproj reference so subsequent PRs can wire native MxAccess alarm subscription. Replaces the speculative "operator decision needed between path 1 and path 2" framing in MxAccessAlarmEventSink with the validated architecture. Key findings from the discovery probe: 1. aaAlarmManagedClient.dll is x86 + .NET Framework (mixed-mode C++/CLI; PE Machine = i386, NativeEntryPoint flag set). The "x64-only" framing in the prior follow-up was wrong — confused by the file path under Wonderware\Historian\x64\. The assembly is bitness- and runtime-compatible with the worker (net48 x86), so it loads in the existing process. No sub-process needed. 2. AlarmClient is the public class. Its model mirrors MxAccess: RegisterConsumer takes a Windows hWnd and the AVEVA alarm service WM_APP-pokes that hwnd when alarms change. The worker's existing STA + WM_APP pump can drive both the data-change COM subscriber and the alarm-client consumer. 3. AlarmAckByGUID(alarmGuid, ackComment, oprName, oprNode, oprDomain, oprFullName) — the native ack carries the operator's full identity atomically with the comment. Closes the v1 operator-comment fidelity gap completely. This PR: - Adds the aaAlarmManagedClient.dll reference to MxGateway.Worker. csproj. Worker still builds clean. - Adds AlarmClientDiscoveryTests as a Skip-gated reflection probe; flip the Skip parameter to dump the public type surface for reference. Captured the dump into MxAccessAlarmEventSink documentation so it doesn't have to be re-run. - Replaces MxAccessAlarmEventSink's "two paths forward" doc with the actual wiring plan against AlarmClient's RegisterConsumer + Subscribe + AlarmAckByGUID surface. Subsequent PRs (gated on STA + WM_APP integration testing on the dev rig): - Wire RegisterConsumer + Subscribe at session-startup; route WM_APP messages through GetStatistics + GetAlarmExtendedRec into EnqueueTransition. - Translate gateway-side AcknowledgeAlarm RPC to a worker command that calls AlarmAckByGUID with the OPC UA operator's identity; replaces the worker-pending diagnostic from PR A.3. - Translate gateway-side QueryActiveAlarms to a worker command that walks GetStatistics's reported handles via GetAlarmExtendedRec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4e933802a7 |
worker: document MXAccess Toolkit alarm-API gap (A.2 follow-up)
PR A.2 ship-pin discovery: the MXAccess COM Toolkit installed at C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll does not expose any alarm-event family. Reflection enumeration of the assembly confirms ILMXProxyServerEvents and ILMXProxyServerEvents2 only carry OnDataChange, OnWriteComplete, OperationComplete, and OnBufferedDataChange — no IAlarmEventSink, no Alarms collection, no OnAlarmTransition. AVEVA's separate alarm-subscription managed assemblies (aaAlarmManagedClient.dll under InTouch\ViewAppFramework\Content\MA\ and ArchestrAAlarmsAndEvents.SDK.Common.dll under Wonderware\Historian\x64\) exist on this box but are x64-only — incompatible with the worker's x86 bitness, which is the bitness constraint the mxaccessgw architecture exists to isolate in the first place. This commit replaces the speculative "TBD pin during dev-rig validation" comment in MxAccessAlarmEventSink with the actual finding plus the two operator-facing paths forward: 1. Stay on the value-driven sub-attribute path (current production behaviour). lmxopcua's AlarmConditionService already synthesizes Part 9 transitions from the four MXAccess sub-attributes. Operator-comment fidelity is the only v1 regression. 2. Add an x64 alarm-helper sub-process alongside the worker that loads aaAlarmManagedClient and forwards transitions to the worker over a small named-pipe IPC. Recovers full v1 fidelity but adds operational complexity. Until that decision resolves, the sink's Attach is a no-op, the worker continues to function for data subscriptions, and lmxopcua-side AlarmConditionService keeps the sub-attribute synthesis active. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9de2c0c43d |
gateway: AcknowledgeAlarm + QueryActiveAlarms handler tests (PR A.4)
Nineteenth (final) PR of the alarms-over-gateway epic. Pins the public RPC handler contract added in PR A.3: - AcknowledgeAlarm rejects empty session_id and empty alarm_full_reference with InvalidArgument. - AcknowledgeAlarm with valid input returns OK and a worker-pending diagnostic so clients see a successful round-trip even before A.2's worker dispatch lands. - QueryActiveAlarms rejects empty session_id with InvalidArgument. - QueryActiveAlarms with valid input streams zero snapshots until PR A.2 wires the worker-side QueryActiveAlarmsCommand (filter-prefix passthrough verified at the proto layer). - OpenSession advertises both new RPC capability strings (unary-acknowledge-alarm, server-stream-active-alarms) so client capability negotiation lights up against the contract surface. Closes Track A's gateway-side surface. The remaining worker ConditionRefresh walk + integration parity-rig validation lands during dev-rig hardware validation alongside PR A.2's COM-side alarm subscription pin. Tests: 279 passed (was 273; 6 new). Per-handler integration tests land alongside the dev-rig validation when the worker walks the real MxAccess active-alarm collection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
335c952f00 |
worker: alarm event mapper + sink scaffold (PR A.2 — partial)
Eighteenth PR of the alarms-over-gateway epic (docs/plans/alarms-over-gateway.md). Lands the proto-build path that the worker uses to create OnAlarmTransition events. The COM-side subscription that registers an alarm event sink against the MXAccess Toolkit is pinned during dev-rig validation — the exact API differs across AVEVA versions and needs hardware to verify. Lands today (unit-testable, no hardware needed): - MxAccessEventMapper.CreateOnAlarmTransition — mechanical proto builder. Takes decoded alarm fields (full reference, source object, alarm type, transition kind, severity, timestamps, operator user/comment, category, description) and produces an MxEvent with the OnAlarmTransition body populated. Mirrors the pattern of CreateOnDataChange / CreateOnWriteComplete / etc. - MxAccessAlarmEventSink — scaffolded class with documented Attach / Detach + an internal EnqueueTransition entry point. When dev-rig validation pins the MXAccess Toolkit alarm subscription API, the only edit needed is to wire the COM delegate inside Attach to call EnqueueTransition. The mapper bridge is already done. Pending dev-rig validation: - Pin the MXAccess Toolkit alarm event source COM API (likely one of IAlarmEventSink, IAlarmEventSubscription, or a method on LMXProxyServerClass — verify against the worker host's installed version). - Add cancellation/cleanup tests once the COM hook is wired. - Integration test against the parity rig that fires a real Galaxy alarm and asserts the gateway emits OnAlarmTransition. Tests: - 2 new mapper tests pin the full-payload Acknowledge case and the bare-bones Raise case. - Full Worker.Tests suite green: 123 passed (was 121; 2 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4f0f03fca5 |
gateway: AcknowledgeAlarm + QueryActiveAlarms RPC handlers (PR A.3)
Twelfth PR of the alarms-over-gateway epic (docs/plans/alarms-over-gateway.md). Lands the public RPC handler surface that PR A.1's proto introduced. The actual worker-side ack call + active-alarm walk depend on PR A.2 (worker MxAccess subscription); this PR ensures clients can call the RPCs and receive a meaningful response without UNIMPLEMENTED at the gRPC layer. - AcknowledgeAlarm — validates session_id + alarm_full_reference, resolves the session (NotFound on miss), returns a successful reply with a structured DiagnosticMessage indicating worker dispatch is pending PR A.2. Once A.2 ships, the body translates the request into a WorkerCommand and forwards through SessionManager.InvokeAsync. - QueryActiveAlarms — validates session_id, returns an empty stream. PR A.4 layers the actual ConditionRefresh implementation once the worker's QueryActiveAlarmsCommand is available. - OpenSessionReply.Capabilities advertises both new RPCs (unary-acknowledge-alarm, server-stream-active-alarms) so clients can negotiate against the contract surface. OnAlarmTransition events flow through the existing StreamEvents path automatically — EventStreamService and MxAccessGrpcMapper forward whatever family the worker emits without filtering, so no changes are needed there for A.3. Tests: full 273-test suite still green. Per-handler unit tests ship with PR A.4's expanded surface; A.3's stub handlers are narrow enough that the existing parity-fixture tests cover the contract round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0f88a953d7 |
proto: add alarm-transition event family + ack/query RPCs (PR A.1)
First PR of the alarms-over-gateway epic (docs/plans/alarms-over-gateway.md in lmxopcua). Pure contract-surface change — no functional wiring yet. Worker-side subscription (A.2), gateway-side dispatch + ack handler (A.3), and ConditionRefresh (A.4) follow. mxaccess_gateway.proto: - Extend MxEventFamily with MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5. - Extend MxEvent.body oneof with OnAlarmTransitionEvent on_alarm_transition = 24. - Add OnAlarmTransitionEvent message carrying the full MxAccess alarm payload (full reference, source object, alarm-type-name, transition kind, raw severity, original raise timestamp, transition timestamp, operator user/comment, category, description, current/limit value). Mapping to OPC UA 0-1000 severity ladder happens server-side in lmxopcua's MxAccessSeverityMapper (B.1) — gateway preserves the native MxAccess scale. - Add AlarmTransitionKind enum (Raise / Acknowledge / Clear / Retrigger). - Add ActiveAlarmSnapshot + AlarmConditionState for the ConditionRefresh stream. - Add public RPCs AcknowledgeAlarm (unary) and QueryActiveAlarms (server-streaming) on MxAccessGateway service. - Add AcknowledgeAlarmRequest/Reply + QueryActiveAlarmsRequest. GatewayContractInfo.GatewayProtocolVersion bumps 2 -> 3. Fixture manifests (proto-inputs, behavior, parity, golden OpenSessionReply) and protoset descriptor regenerated. Tests: round-trip serialization for the new messages with all-fields-populated and empty-optional-fields cases; oneof last-write-wins guard between OnDataChange and OnAlarmTransition; descriptor service-method enumeration includes the two new RPCs. All 273 existing tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |