Commit Graph

41 Commits

Author SHA1 Message Date
Joseph Doherty 5e375f6d3d Add bulk read/write command family across worker, gateway, and clients
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.

ReadBulk has no MXAccess analogue. The worker satisfies it by:

  - Returning the last cached OnDataChange payload (was_cached=true)
    when the requested tag is already in the session's item registry
    AND advised — the existing subscription is NOT touched, since the
    caller did not create it.
  - Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
    UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
    and leaving the session exactly as it was. The wait pumps Windows
    messages on the STA so the inbound MXAccess event can dispatch
    while the executor still holds the thread.
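
A minimal Python sketch of the cache-then-snapshot decision above. It is
illustrative only, not the worker's C# code: `read_one`, `read_bulk`, and the
`snapshot` callable are hypothetical names, and falling back to a snapshot
when an advised tag has no cached payload yet is an assumption the commit
does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass
class BulkReadResult:
    tag: str
    was_successful: bool
    was_cached: bool
    value: Optional[object] = None


def read_one(
    tag: str,
    advised_tags: Set[str],
    cache: Dict[str, object],
    snapshot: Callable[[str], Optional[object]],
) -> BulkReadResult:
    """Serve one ReadBulk entry.

    advised_tags: tags already in the item registry AND advised.
    cache: last OnDataChange payload per tag.
    snapshot: hypothetical stand-in for the AddItem + Advise + wait +
        UnAdvise + RemoveItem lifecycle; returns None on timeout.
    """
    # Cached path: the caller did not create the subscription, so it is
    # only read from, never modified.
    if tag in advised_tags and tag in cache:
        return BulkReadResult(tag, True, True, cache[tag])
    # Snapshot path (assumption: also used when no payload is cached yet).
    value = snapshot(tag)
    if value is None:
        # A timeout surfaces as a per-entry failure rather than an exception.
        return BulkReadResult(tag, False, False)
    return BulkReadResult(tag, True, False, value)


def read_bulk(
    tags: Iterable[str],
    advised_tags: Set[str],
    cache: Dict[str, object],
    snapshot: Callable[[str], Optional[object]],
) -> List[BulkReadResult]:
    # Sequential loop: one result per requested entry, in request order.
    return [read_one(t, advised_tags, cache, snapshot) for t in tags]
```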

The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.
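
As a rough illustration of the cache semantics described above (per-handle
version, eviction on remove, wait-for-update), here is a Python sketch. The
class and method names are hypothetical, and it waits on a
`threading.Condition`, whereas the worker pumps Windows messages on the STA
between polls.

```python
import threading
from typing import Dict, Optional, Tuple


class ValueCache:
    """Hypothetical versioned last-value cache keyed by item handle."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._entries: Dict[int, Tuple[int, object]] = {}  # handle -> (version, value)

    def _version(self, handle: int) -> int:
        return self._entries.get(handle, (0, None))[0]

    def set(self, handle: int, value: object) -> None:
        with self._cond:
            self._entries[handle] = (self._version(handle) + 1, value)
            self._cond.notify_all()  # wake any waiter for this handle

    def try_get(self, handle: int) -> Optional[Tuple[int, object]]:
        with self._cond:
            return self._entries.get(handle)

    def remove(self, handle: int) -> None:
        # Eviction resets the version, so a reused handle starts from scratch
        # instead of serving a value from its previous lifetime.
        with self._cond:
            self._entries.pop(handle, None)

    def try_wait_for_update(
        self, handle: int, after_version: int, timeout_s: float
    ) -> Optional[Tuple[int, object]]:
        with self._cond:
            updated = self._cond.wait_for(
                lambda: self._version(handle) > after_version, timeout_s
            )
            return self._entries.get(handle) if updated else None
```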

Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.
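
The scope mapping and the index-preserving denied-entry merge can be
sketched in Python as follows. This is a simplification under stated
assumptions: `filter_and_merge`, `is_allowed`, `dispatch`, and the dict
result shape are hypothetical stand-ins, not the gateway's C# types.

```python
from typing import Callable, Dict, List, Optional

# Scope per bulk command kind, as listed in this commit.
SCOPE_BY_KIND: Dict[str, str] = {
    "WriteBulk": "invoke:write",
    "Write2Bulk": "invoke:write",
    "WriteSecuredBulk": "invoke:secure",
    "WriteSecured2Bulk": "invoke:secure",
    "ReadBulk": "invoke:read",
}


def filter_and_merge(
    entries: List[str],
    is_allowed: Callable[[str], bool],
    dispatch: Callable[[List[str]], List[dict]],
) -> List[dict]:
    """Send only allowed entries to the worker, then re-insert denials.

    is_allowed: hypothetical stand-in for the per-entry constraint check.
    dispatch: hypothetical worker call returning one result per entry sent.
    """
    allowed = [(i, e) for i, e in enumerate(entries) if is_allowed(e)]
    worker_results = dispatch([e for _, e in allowed])

    merged: List[Optional[dict]] = [None] * len(entries)
    for (i, _), result in zip(allowed, worker_results):
        merged[i] = result
    # Denied entries take the same result shape at their original index.
    for i, entry in enumerate(entries):
        if merged[i] is None:
            merged[i] = {"tag": entry, "was_successful": False, "error": "denied"}
    return merged
```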

All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.

Tests added:
  - MxAccessValueCacheTests (6 cases) — Set/TryGet, Remove resets the
    version, TryWaitForUpdate signals on Set, pump step fires each poll.
  - MxAccessBaseEventSinkTests — OnDataChange populates the cache,
    ValueCache property exposes the bound instance.
  - MxAccessCommandExecutorTests — four bulk-write variants (per-entry
    success/failure, value+timestamp forwarding, secured user ids),
    ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
    was_successful=false), invalid-payload reply.
  - GatewayGrpcScopeResolverTests — five new MxCommandKind cases.
  - SessionManagerTests — WriteBulk and ReadBulk forwarding through
    FakeWorkerHarness; ReadBulk forwards timeout_ms.
  - Per-client (.NET, Go, Rust, Python, Java) — WriteBulk builds the
    right command and returns per-entry results, ReadBulk forwards the
    timeout and unpacks the was_cached flag.

Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.

Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 03:42:38 -04:00
Joseph Doherty 06030dd1ef Implement MXAccess write commands in the worker
The .proto contract and MxCommandKind already defined Write, Write2,
WriteSecured, and WriteSecured2, but the worker's MxAccessCommandExecutor
had no case for any of them — every write kind fell through to
CreateInvalidRequestReply ("Unsupported MXAccess command kind Write").

Implement all four:

- VariantConverter.ConvertToComValue projects an MxValue into a
  COM-marshalable object (scalars, arrays, null) — the inverse of the
  existing COM-to-MxValue projection.
- IMxAccessServer / MxAccessComServer gain Write/Write2/WriteSecured/
  WriteSecured2, routed to ILMXProxyServer / ILMXProxyServer4.
- MxAccessSession and MxAccessCommandExecutor add the four write paths,
  following the existing ExecuteAdvise pattern; the reply is a plain OK
  reply and the outcome surfaces later as an OnWriteComplete event.

Verified live: a Write now returns PROTOCOL_STATUS_CODE_OK and produces
an OnWriteComplete event where it previously returned InvalidRequest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 14:45:35 -04:00
Joseph Doherty 1764eff1cf Resolve Worker-009..015 code-review findings
Worker-009: WorkerFrameWriter serialized twice and WorkerFrameReader
allocated a payload byte[] per frame. The writer now serializes once into a
single prefix+payload buffer; the reader rents the payload buffer from
ArrayPool and honors the logical frame length.
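
A Python sketch of the single-buffer write and the "honor the logical
length" read. The 4-byte little-endian length prefix is an assumption made
for illustration (the real layout is in docs/WorkerFrameProtocol.md), and
`pool_buffer` merely stands in for a buffer rented from ArrayPool, which may
be larger than the frame.

```python
import io
import struct

PREFIX = struct.Struct("<I")  # assumed prefix format, for illustration only


def write_frame(payload: bytes) -> bytes:
    # Serialize once into one prefix+payload buffer instead of two passes.
    buf = bytearray(PREFIX.size + len(payload))
    PREFIX.pack_into(buf, 0, len(payload))
    buf[PREFIX.size:] = payload
    return bytes(buf)


def read_frame(stream: io.BufferedIOBase, pool_buffer: bytearray) -> bytes:
    header = stream.read(PREFIX.size)
    if len(header) != PREFIX.size:
        raise EOFError("truncated frame prefix")
    (length,) = PREFIX.unpack(header)
    if length > len(pool_buffer):
        raise ValueError("frame larger than rented buffer")
    # The rented buffer can be oversized; only the first `length` bytes
    # belong to this frame.
    view = memoryview(pool_buffer)[:length]
    if stream.readinto(view) != length:
        raise EOFError("truncated frame payload")
    return bytes(view)
```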

Worker-010: VariantConverter projected a uint+Time value as a full FILETIME,
producing a near-1601 timestamp. The FILETIME projection is now gated on
`value is long`; uint falls through to the integer projection.
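
To see why a uint lands near 1601: a FILETIME counts 100-nanosecond ticks
since 1601-01-01 UTC, so even the largest 32-bit value is only about seven
minutes past that epoch. A small Python sketch; the `project` helper and its
string type tags are hypothetical stand-ins for the C# `value is long` check.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(ticks: int) -> datetime:
    # 1 tick = 100 ns, so 10 ticks = 1 microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def project(value: int, clr_type: str, mx_type: str):
    """Hypothetical projection: only a 64-bit value is treated as FILETIME."""
    if mx_type == "Time" and clr_type == "long":
        return filetime_to_utc(value)
    # A uint falls through to the plain integer projection.
    return value
```

For example, `filetime_to_utc(0xFFFFFFFF)` is still on 1601-01-01, which is
the symptom the finding describes.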

Worker-011: replaced the opaque retryAttempts formula in WorkerPipeClient
with MaxRetryAttempts = int.MaxValue, leaving the connect deadline as the
sole bound.

Worker-012: rewrote stale "future PR / polls on a Timer" comments in
AlarmDispatcher, AlarmCommandHandler, MxAccessAlarmEventSink and
MxAccessEventMapper to match the shipped, post-Worker-001 behavior.

Worker-013 (re-triaged): already resolved — StaMessagePumpTests and
MxAccessStaSessionTests cover the pump and poll loop directly.

Worker-014: moved IAlarmCommandHandler into its own file so
AlarmCommandHandler.cs declares one public type.

Worker-015: clarified the MxAccessBaseEventSink.EnqueueEvent overflow-catch
comment explaining the deliberate double RecordFault no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:42:17 -04:00
Joseph Doherty 18ce2922e2 Resolve Worker.Tests-003..007 code-review findings
Worker.Tests-003: removed the wall-clock `Elapsed < 2s` assertion from
InvokeAsync_WakesIdlePumpForQueuedCommand; the awaited completion against a
30s idle period already proves the wake event drove dispatch.

Worker.Tests-004: MxAccessStaSession.Dispose now joins the alarm poll task
after cancelling the CTS (consistent with ShutdownGracefullyAsync), and
Dispose_StopsAlarmPollLoop asserts deterministically instead of via Task.Delay.

Worker.Tests-005: undisposed MemoryStream instances across the frame-protocol
and pipe-session tests are now `using` declarations.

Worker.Tests-006: Dispose_StopsAlarmPollLoop now constructs MxAccessStaSession
with `using` so a failed assertion cannot leak the STA poll loop.

Worker.Tests-007: docs/WorkerFrameProtocol.md verification section corrected
to target MxGateway.Worker.Tests / MxGateway.Worker with -p:Platform=x86.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:01 -04:00
Joseph Doherty 54325343bd Resolve Worker-004, -005, -006, -007, -008 code-review findings
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state.
ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing
the StaHung fault.

Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001;
the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync,
which caught only graceful-stop exceptions. A failing PollOnce now records a
WorkerFault on the event queue instead of vanishing on a non-awaited task.

Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed
out, leaking the STA thread and COM object. It now always disposes
(MxAccessStaSession.Dispose is idempotent and bounded).

Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember
reflection fallback with an IMxAccessServer fast path plus typed
ILMXProxyServer* casts; a non-conforming object now fails fast.

Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession
records the alarm consumer's STA thread id and asserts every PollOnce runs
on it via a unit-testable guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:23 -04:00
Joseph Doherty 1b4dcf32d5 Resolve Worker.Tests-001 and Worker.Tests-002 code-review findings
Worker.Tests-001: StaMessagePump had no direct unit test. Added
Sta/StaMessagePumpTests.cs — 8 STA-thread facts covering WaitForWorkOrMessages
(wake-event signalled before/during the wait, timeout expiry, zero-timeout
fast path, the QS_ALLINPUT posted-message wake path) and PumpPendingMessages
drain counting.

Worker.Tests-002: no test drove a COM event through the integrated
sink -> mapper -> queue path. Added MxAccess/MxAccessBaseEventSinkTests.cs —
5 facts driving OnDataChange, OnWriteComplete, OperationComplete and
OnBufferedDataChange through a real MxAccessBaseEventSink + mapper + queue and
asserting the converted WorkerEvent lands in MxAccessEventQueue. The four COM
event handlers were widened private -> internal and InternalsVisibleTo for
MxGateway.Worker.Tests was added, mirroring MxAccessAlarmEventSink's existing
test seam; no worker behavior changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:07:48 -04:00
Joseph Doherty 53e3973209 Resolve Worker-001, Worker-002, Worker-003 code-review findings
Worker-001: WnWrapAlarmConsumer armed a System.Threading.Timer whose OnPoll
callback ran GetXmlCurrentAlarms2 on a thread-pool thread against the
Apartment-threaded wnwrap COM object, which can deadlock on cross-apartment
marshaling. Removed the pollTimer/pollIntervalMs fields, OnPoll, the
poll-interval constructor parameter, and the timer arm/disposal. Polls are
driven externally by the STA via StaRuntime.InvokeAsync(PollOnce).

Worker-002: RunHeartbeatLoopAsync delayed a full HeartbeatInterval before
the first heartbeat. Restructured so the first beat is sent immediately on
entering the loop and the delay applies only between subsequent beats.
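
The restructured loop shape, sketched in Python with injectable `send` and
`sleep` callables so the ordering is visible. The names and the fixed beat
count are hypothetical; the worker's loop runs until cancelled.

```python
from typing import Callable


def run_heartbeat_loop(
    send: Callable[[], None],
    sleep: Callable[[float], None],
    interval_s: float,
    beats: int,
) -> None:
    for i in range(beats):
        # Delay only BETWEEN beats; the first beat goes out immediately.
        if i > 0:
            sleep(interval_s)
        send()
```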

Worker-003: ProcessCommandAsync silently returned without a reply when
_state was not a command-serving state after dispatch. Both drop sites now
log a WorkerCommandResultDropped diagnostic with correlation_id via
IWorkerLogger; _state is now volatile.

Three pre-existing tests that asserted strict frame ordering were updated to
tolerate an interleaved first heartbeat (Worker-002 consequence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:59:46 -04:00
Joseph Doherty a67a5a4857 fix(worker): wire alarm command handler and STA poll loop (Gap 1 + Gap 2)
Gap 1 — WorkerPipeSession now passes `eq => new AlarmCommandHandler(eq)` as
the alarmCommandHandlerFactory in all three places it constructs
MxAccessStaSession (two convenience constructors and InitializeMxAccessAsync).
Previously the parameterless MxAccessStaSession() set the factory to null,
so every SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms command
returned "alarm consumer not configured" in a deployed worker.

  - Added internal `MxAccessStaSession(Func<MxAccessEventQueue, IAlarmCommandHandler>?)`
    constructor that builds all defaults but accepts a factory.
  - Added public `MxAccessStaSession(StaRuntime, factory, eventQueue, alarmFactory?)`
    4-arg overload to complete the constructor chain.

Gap 2 — WnWrapAlarmConsumer now disables its internal threadpool Timer
(pollIntervalMilliseconds=0 in the default constructor). MxAccessStaSession
starts a `RunAlarmPollLoopAsync` background task that sleeps off-STA then
calls `staRuntime.InvokeAsync(() => handler.PollOnce())` at 500ms intervals.
This satisfies the ThreadingModel=Apartment requirement of wwAlarmConsumerClass:
every GetXmlCurrentAlarms2 call now runs on the worker's STA.

  - Added `PollOnce()` to `IMxAccessAlarmConsumer`, `AlarmDispatcher`,
    `IAlarmCommandHandler`, and `AlarmCommandHandler`.
  - Poll loop cancelled and awaited before alarm handler disposal in both
    ShutdownGracefullyAsync and Dispose.
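
The Gap 2 pattern (sleep off the STA, then marshal each poll onto it) can be
approximated in Python with a single-threaded executor standing in for the
STA. This sketch shows only the thread-affinity idea; `StaRuntime` here is a
hypothetical toy and has none of COM's apartment or message-pump semantics.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class StaRuntime:
    """Toy stand-in: every invoked callable runs on one dedicated thread."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)

    def invoke(self, fn: Callable[[], object]) -> object:
        return self._executor.submit(fn).result()

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def run_poll_loop(
    sta: StaRuntime,
    poll_once: Callable[[], object],
    stop: threading.Event,
    interval_s: float = 0.5,
) -> None:
    # The wait happens on the caller's thread (off the "STA"); only the poll
    # itself is marshalled onto the dedicated thread.
    while not stop.wait(interval_s):
        sta.invoke(poll_once)
```

Setting `stop` plays the role of cancelling the poll task before the alarm
handler is disposed.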

Tests: 4 new tests in MxAccessStaSessionTests verify that
  - SubscribeAlarms reaches the handler when the factory is wired (Gap 1)
  - SubscribeAlarms returns InvalidRequest without a factory (regression guard)
  - PollOnce is called on the STA thread within 3s (Gap 2)
  - The poll loop stops after Dispose (Gap 2 lifecycle)
All fake IMxAccessAlarmConsumer / IAlarmCommandHandler test implementations
updated with no-op PollOnce() to satisfy the new interface member.

Worker tests: 199 passed / 1 pre-existing failure / 4 skipped (was 195/1/4).
Server tests: 308 passed / 0 failures (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 06:30:14 -04:00
Joseph Doherty a4ed605f74 A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
  - Subscribe + 1st PollOnce yield real transition events
  - Field-by-field decode correct (provider, group, tag, severity,
    UTC timestamp, comment, type)
  - SnapshotActiveAlarms reflects current state
  - AcknowledgeByName(real identity) -> rc=0
  - Pipeline keeps streaming transitions on the 10s cadence post-ack

Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:

1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
   earlier discovery-doc recommendation) makes the first
   GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
   call is unnecessary because the round-trip echo is mangled was
   wrong — mangled echo or not, the call is required.

2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
   instance (returns -55). Workaround: provision a parallel
   "ack-only" wnwrap consumer that runs Initialize → Register →
   Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
   Production WnWrapAlarmConsumer now holds two COM clients;
   AcknowledgeByName always dispatches through the ack-only one.

3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
   8-arg overload returns -55 on this AVEVA build (apparently a
   stub); the v1 6-arg overload works. Production now calls the
   6-arg overload, discarding the proto's operator_domain and
   operator_full_name fields. The proto contract keeps both for
   forward-compat if AVEVA fixes the v2 method.

Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).

Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.

Test counts after this slice:
  Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
          1 pre-existing structure-fail (untouched)
  Server: 308 pass / 0 fail
Solution builds clean.

docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:17:39 -04:00
Joseph Doherty 4e02927f01 A.3 (alarm-ack-by-name): public AcknowledgeAlarm now accepts Provider!Group.Tag references
Closes the gap where the public AcknowledgeAlarm RPC required canonical
GUIDs but OnAlarmTransitionEvent.AlarmFullReference is "Provider!Group.Tag".
Adds an AVEVA AlarmAckByName path that wraps wwAlarmConsumerClass.AlarmAckByName
so callers can ack with the natural reference.

Proto:
- New MxCommandKind.AcknowledgeAlarmByName (=29).
- New AcknowledgeAlarmByNameCommand(alarm_name, provider_name, group_name,
  comment, operator_user/node/domain/full_name) on MxCommand oneof.
- AcknowledgeAlarmReplyPayload (existing) carries the AVEVA native
  status; reused for the by-name path.

Worker:
- IMxAccessAlarmConsumer + WnWrapAlarmConsumer + AlarmDispatcher +
  AlarmCommandHandler all gain an AcknowledgeByName(name, provider,
  group, comment, operator-identity) overload that maps to
  wwAlarmConsumerClass.AlarmAckByName.
- MxAccessCommandExecutor: new switch arm routes
  MxCommandKind.AcknowledgeAlarmByName to the handler. Empty alarm_name
  yields InvalidRequest; handler exceptions surface as MxaccessFailure.

Gateway:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference: parses
  "Provider!Group.Tag" with the convention that the FIRST '!' separates
  provider, the FIRST '.' after '!' separates group; tag may contain
  more dots.
- AcknowledgeAsync now branches: GUID input → AcknowledgeAlarm command
  (existing path); reference input → AcknowledgeAlarmByName command
  (new path); neither parses → InvalidRequest with a clear diagnostic.
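
A Python sketch of the parsing convention and the GUID-versus-reference
branch described above. The function names are hypothetical, and exactly
which malformed inputs the real TryParseAlarmReference rejects is an
assumption; only the first-'!' and first-'.' rules come from this commit.

```python
import uuid
from typing import Optional, Tuple


def try_parse_alarm_reference(reference: str) -> Optional[Tuple[str, str, str]]:
    """Split "Provider!Group.Tag"; the tag may itself contain dots."""
    if not reference:
        return None
    provider, bang, rest = reference.partition("!")  # FIRST '!' ends provider
    if not bang or not provider:
        return None
    group, dot, tag = rest.partition(".")  # FIRST '.' after '!' ends group
    if not dot or not group or not tag:
        return None
    return provider, group, tag


def route_acknowledge(value: str) -> str:
    """Return the command kind an AcknowledgeAlarm input would map to."""
    try:
        uuid.UUID(value)
        return "AcknowledgeAlarm"  # GUID input: existing path
    except ValueError:
        pass
    if try_parse_alarm_reference(value) is not None:
        return "AcknowledgeAlarmByName"  # reference input: new path
    return "InvalidRequest"
```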

Tests: 13 new unit tests cover each layer end-to-end:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference (3 valid + 8 invalid
  forms) including the realistic 4-component "Galaxy!TestArea.
  TestMachine_001.TestAlarm001" reference.
- WorkerAlarmRpcDispatcher.AcknowledgeAsync routes references through
  AcknowledgeAlarmByName + propagates the full operator tuple.
- Executor switch arm carries the by-name tuple and rejects empty
  alarm_name.
- AlarmDispatcher.AcknowledgeByName forwards to consumer.
- Existing fakes extended for the new overload.

Counts: server 308/0, worker 195/3 skip / 1 pre-existing structure-fail
(untouched). Solution builds clean.

End-to-end alarms-over-gateway now serves the full lmxopcua flow:
client.AcknowledgeAlarm(reference="Galaxy!TestArea.TestMachine_001.TestAlarm001",
operator_user="alice") → gateway parses → IPC AcknowledgeAlarmByName →
worker AlarmAckByName → AVEVA history. The remaining piece for full
parity is a live dev-rig smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:17:15 -04:00
Joseph Doherty 01f5e6ad91 A.3 (worker IPC slice): proto SubscribeAlarms/Acknowledge/QueryActive commands + executor routing
Adds the worker-side IPC surface for the alarm subsystem so the gateway
can drive the AlarmDispatcher across the named-pipe boundary. Adds four
proto MxCommandKind values + matching command messages and two
MxCommandReply payload variants:

- SubscribeAlarmsCommand(subscription_expression)
- UnsubscribeAlarmsCommand
- AcknowledgeAlarmCommand(alarm_guid, comment, operator_user/node/domain/full_name)
- QueryActiveAlarmsCommand(alarm_filter_prefix)
- AcknowledgeAlarmReplyPayload(native_status)
- QueryActiveAlarmsReplyPayload(repeated ActiveAlarmSnapshot snapshots)

Worker plumbing:

- New IAlarmCommandHandler interface + AlarmCommandHandler production
  impl. Lazy-creates an AlarmDispatcher (with a wnwrap-backed consumer
  by default) on the first SubscribeAlarms; routes Acknowledge / QueryActive /
  Unsubscribe through it. Idempotent under repeated Unsubscribe; rejects
  a second Subscribe without an intervening Unsubscribe; cleans up the
  consumer if the underlying Subscribe call throws.
- MxAccessCommandExecutor: 4 new switch arms map MxCommandKind values to
  IAlarmCommandHandler calls. Acknowledge surfaces the AVEVA native
  status into both MxCommandReply.Hresult and the dedicated
  AcknowledgeAlarmReplyPayload.NativeStatus so gateway-side consumers
  can echo it without unpacking the outer envelope. Invalid GUIDs and
  missing payloads return InvalidRequest; handler exceptions return
  MxaccessFailure with the exception message in DiagnosticMessage.
- MxAccessStaSession: new constructor overload accepts an
  alarmCommandHandlerFactory; it's invoked on the STA thread during
  StartAsync and the resulting handler is passed into the executor.
  ShutdownGracefullyAsync + Dispose tear it down on the STA before the
  data-side cleanup runs.

Tests: 20 new unit tests covering AlarmCommandHandler lazy lifecycle
(Subscribe/Unsubscribe/Acknowledge/Query/Dispose, error paths) and the
executor's 4 alarm switch arms (OK/InvalidRequest/MxaccessFailure paths,
hresult propagation, prefix filtering). Worker test suite total: 192
passed / 3 skipped (live probes) / 1 pre-existing structure-test fail
(untouched).

Deferred to next slice: gateway-side WorkerAlarmRpcDispatcher that
replaces NotWiredAlarmRpcDispatcher, builds + sends these commands across
the IPC, and unwraps the resulting MxCommandReply into AcknowledgeAlarmReply
/ ActiveAlarmSnapshot stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:52:04 -04:00
Joseph Doherty 82eb0ad569 A.3 (in-process slice): AlarmDispatcher wires consumer events onto event queue
Adds the in-process plumbing that connects WnWrapAlarmConsumer's
AlarmTransitionEmitted stream to the worker's MxAccessEventQueue via
MxAccessAlarmEventSink. With this change a transition raised by the
consumer lands as an OnAlarmTransitionEvent proto on the queue,
SessionId attached, ready for IPC dispatch.

Mapping: provider!group.tag → AlarmFullReference, tag → SourceObjectReference,
priority → severity, wnwrap STATE → AlarmConditionState (Active /
ActiveAcked / Inactive — wnwrap's ack-vs-unack-on-cleared distinction
collapses since OPC UA Part 9 doesn't model it). State delta drives
AlarmTransitionKind via the existing AlarmRecordTransitionMapper table.

Holding off on the proto IPC additions (SubscribeAlarms /
AcknowledgeAlarm / QueryActiveAlarms commands + WorkerAlarmRpcDispatcher)
for a follow-up — those touch every layer of the worker IPC and warrant
their own PR. This slice proves the consumer→sink→queue pipeline
end-to-end with unit tests and clears the path for the proto additions
to plug in cleanly.

Tests: 10 new unit tests cover field-by-field mapping, the
"unchanged-state-doesn't-emit" filter, the state→transition kind table,
Subscribe / Acknowledge passthrough, SnapshotActiveAlarms → proto
ActiveAlarmSnapshot mapping, and Dispose detaches the handler. All
passing; total worker test count 172/3 skip / 1 pre-existing structure
fail (untouched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:52:35 -04:00
Joseph Doherty f711a55be4 A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient`
to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by
`wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML
payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime
auto-marshaling that crashed `GetHighPriAlarm` with
ArgumentOutOfRangeException on every poll. A live capture ran 60/60 polls
cleanly against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform
script flipped TestMachine_001.TestAlarm001 every 10s; the GUID,
priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps
arrived end-to-end.

Implementation:
- `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under
  `lib/` so dev boxes don't need the SDK to build.
- New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a
  500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the
  snapshot keyed by alarm GUID, and raises one
  `MxAlarmTransitionEvent` per state change. Includes the
  Initialize→Register-before-Subscribe ordering fix found during
  Discovery probe runs.
- New library-agnostic types `MxAlarmSnapshotRecord` /
  `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build
  path is testable without an AVEVA install.
- `AlarmRecordTransitionMapper` retired the COM-coupled
  `MapTransitionKind(eAlmTransitions)`; new pure helpers
  `ParseStateKind`, `MapTransition(prev, curr)`, and
  `ParseTransitionTimestampUtc` cover XML decode + state-delta logic.
- `IMxAccessAlarmConsumer` event surface changed from
  `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>`
  and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` —
  decoupling the interface from any specific COM library.
- Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider`
  refs; adds `Interop.WNWRAPCONSUMERLib`.
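
The GUID-keyed snapshot diff at the heart of the polling consumer can be
sketched in Python like this. The function name and tuple shape are
hypothetical, and ignoring alarms that vanish from the snapshot is an
assumption of the sketch, not something this commit specifies.

```python
from typing import Dict, List, Optional, Tuple

Transition = Tuple[str, Optional[str], str]  # (guid, previous_state, current_state)


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> List[Transition]:
    """Emit one transition per alarm whose state changed since the last poll."""
    events: List[Transition] = []
    for guid, state in current.items():
        before = previous.get(guid)
        if before != state:  # unchanged state emits nothing
            events.append((guid, before, state))
    return events
```

Each poll would then replace the previous snapshot with the current one
before the next diff.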

Tests:
- 36 new unit tests (state-string mapping, prev/current → proto kind
  decision table, timestamp UTC reassembly, XML payload parser, 32-char
  hex GUID round-trip) covering everything that doesn't touch the live
  COM surface — all passing.
- Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives
  the live capture flow for regression / future probes.

Docs:
- `docs/AlarmClientDiscovery.md` "Option A — captured" section records
  sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip
  (prefer `Subscribe` for filtering), the `GetStatistics`
  AccessViolationException quirk, and the worker-integration outline.

Pre-existing failure noted (separate):
`MxAccessInteropReference_ExistsOnlyInWorkerProject` was already
failing on HEAD — the test project still references `ArchestrA.MxAccess`
for the Skip-gated discovery probes. Not regressed by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:44:15 -04:00
Joseph Doherty 6e356da092 docs: AlarmClient public surface — managed-event premise wrong, WM_APP required
Reflection probe of the deployed aaAlarmManagedClient.dll
(v1.0.7368.41290) on 2026-05-01 confirmed the public AlarmClient class
exposes zero public events. The PR A.5 design that AlarmClientConsumer
is built on (managed-event surface, no message pump) does not hold
against this assembly.

The actual notification mechanism is WM_APP messaging:
RegisterConsumer(hWnd, ...) takes a window handle because AVEVA's alarm
provider WM_APP-pokes the registered window, then GetStatistics +
GetAlarmExtendedRec pull the change set on each poke.

Practical impact:

- AlarmClientConsumer.AlarmRecordReceived has no production caller.
  RaiseAlarmRecordReceived is invoked only from tests. Subscribe(...)
  returns OK from RegisterConsumer + Subscribe but no notifications
  reach the consumer at runtime because no window is attached.
- Until A.2 lands a hidden message-only window + WindowProc that routes
  WM_APP into MxAccessAlarmEventSink.EnqueueTransition, the gateway's
  MX_EVENT_FAMILY_ON_ALARM_TRANSITION family cannot carry events.
- AcknowledgeByGuid and SnapshotActiveAlarms are pull-style and remain
  correct as written.

Changes:

- docs/AlarmClientDiscovery.md (new) — reflection probe summary, full
  AlarmClient method list, open questions for A.2 implementation.
- AlarmClientConsumer.cs xmldoc — replaced the inaccurate "managed
  event surface" claim with the WM_APP finding; flagged
  AlarmRecordReceived as unreachable in production until the WM_APP
  pump lands.
- MxAccessAlarmEventSink.cs xmldoc — replaced the "verify on dev rig"
  hedge in the wiring plan with the resolved finding; expanded the
  open-questions list (WM_APP message ID, wParam/lParam semantics, STA
  affinity, subscription scope) so the next A.2 PR knows what the
  dev-rig probe needs to answer.

No functional code change for the worker (xmldoc only); worker builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:50:57 -04:00
Joseph Doherty 1ac5bcafb2 worker: AlarmClientConsumer + transition mapper (PR A.5)
Wires the worker-side consumer for AVEVA alarm transitions over the
aaAlarmManagedClient API discovered in the prior foundation PR.

- IAlarmMgrDataProvider.dll referenced — exposes AlarmRecord +
  eAlmTransitions / eQueryType / eSortFlags / eAlarmFilterState.
  Both DLLs (aaAlarmManagedClient + IAlarmMgrDataProvider) load in
  the worker's existing net48 x86 process; no new bitness boundary.
- IMxAccessAlarmConsumer abstraction — Subscribe / AcknowledgeByGuid
  / SnapshotActiveAlarms / AlarmRecordReceived event. Test seam.
- AlarmClientConsumer production wrapper — RegisterConsumer +
  Subscribe + AlarmAckByGUID + GetStatistics-based active-alarm
  walk, all delegated to AlarmClient. Uses AVEVA's managed event
  surface (GetAlarmChangesCompleted on IAlarmMgrDataProvider) so
  no Windows message pump is required — plain .NET events arrive
  on the alarm-client's internal callback thread.
- AlarmRecordTransitionMapper — pure-function helpers:
    MapTransitionKind(eAlmTransitions): ALM→Raise, ACK→Acknowledge,
        RTN→Clear, others (SUB/ENB/DIS/SUP/REL/REMOVE)→Unspecified
        so EventPump's decoding-failure counter records them.
    ComposeFullReference(provider, group, name): Provider!Group.Name
        format matching AVEVA's standard alarm-reference syntax.
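
The two pure helpers above amount to a lookup table and a format string. A
Python rendering for illustration; the string keys stand in for the
eAlmTransitions enum members, and the function names are hypothetical
snake_case equivalents of the C# helpers.

```python
TRANSITION_KIND = {
    "ALM": "Raise",
    "ACK": "Acknowledge",
    "RTN": "Clear",
}


def map_transition_kind(transition: str) -> str:
    # SUB/ENB/DIS/SUP/REL/REMOVE (and anything else) map to Unspecified.
    return TRANSITION_KIND.get(transition, "Unspecified")


def compose_full_reference(provider: str, group: str, name: str) -> str:
    return f"{provider}!{group}.{name}"
```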

To be pinned during dev-rig validation (subsequent commits):

1. Confirm RegisterConsumer accepts hWnd=0 — if it requires a real
   hwnd, the worker creates a hidden message-only window and
   passes that handle. The managed event surface should make
   this irrelevant but the AVEVA API is older than its managed
   wrapper.
2. Wire AlarmClientConsumer.AlarmRecordReceived: the AVEVA
   IAlarmMgrDataProvider.GetAlarmChangesCompleted event needs to
   be hooked from inside the AlarmClient — find the proper
   accessor (likely a property exposing the inner provider).
3. AlarmRecord field-by-field translation into the proto event
   uses MxAccessAlarmEventSink.EnqueueTransition (existing
   plumbing). The AlarmRecord field names (ar_OrigTime,
   AlarmName, AckOperatorFullName, AckComment, etc.) are
   pinned in the discovery dump preserved in
   AlarmClientDiscoveryTests.

Tests: 127 pass (4 new ComposeFullReference cases + 1 Skip-gated
discovery probe). Transition-kind enum mapping is dev-rig-validated
rather than unit-tested because the AVEVA assembly is Private=false
on the reference and isn't copied to the test bin directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:42:22 -04:00
Joseph Doherty a14098468b worker: aaAlarmManagedClient discovery + reference (alarm-helper foundation)
Discovers the surface of aaAlarmManagedClient.dll and stages the worker
csproj reference so subsequent PRs can wire native MxAccess alarm
subscription. Replaces the speculative "operator decision needed
between path 1 and path 2" framing in MxAccessAlarmEventSink with the
validated architecture.

Key findings from the discovery probe:

1. aaAlarmManagedClient.dll is x86 + .NET Framework (mixed-mode
   C++/CLI; PE Machine = i386, NativeEntryPoint flag set). The
   "x64-only" framing in the prior follow-up was wrong — confused
   by the file path under Wonderware\Historian\x64\.
   The assembly is bitness- and runtime-compatible with the
   worker (net48 x86), so it loads in the existing process. No
   sub-process needed.

2. AlarmClient is the public class. Its model mirrors MxAccess:
   RegisterConsumer takes a Windows hWnd and the AVEVA alarm
   service WM_APP-pokes that hwnd when alarms change. The worker's
   existing STA + WM_APP pump can drive both the data-change COM
   subscriber and the alarm-client consumer.

3. AlarmAckByGUID(alarmGuid, ackComment, oprName, oprNode,
   oprDomain, oprFullName) — the native ack carries the operator's
   full identity atomically with the comment. Closes the v1
   operator-comment fidelity gap completely.

This PR:

- Adds the aaAlarmManagedClient.dll reference to
  MxGateway.Worker.csproj. Worker still builds clean.
- Adds AlarmClientDiscoveryTests as a Skip-gated reflection probe;
  flip the Skip parameter to dump the public type surface for
  reference. Captured the dump into MxAccessAlarmEventSink
  documentation so it doesn't have to be re-run.
- Replaces MxAccessAlarmEventSink's "two paths forward" doc with
  the actual wiring plan against AlarmClient's RegisterConsumer +
  Subscribe + AlarmAckByGUID surface.

Subsequent PRs (gated on STA + WM_APP integration testing on the
dev rig):

- Wire RegisterConsumer + Subscribe at session startup; route
  WM_APP messages through GetStatistics + GetAlarmExtendedRec into
  EnqueueTransition.
- Translate gateway-side AcknowledgeAlarm RPC to a worker command
  that calls AlarmAckByGUID with the OPC UA operator's identity;
  replaces the worker-pending diagnostic from PR A.3.
- Translate gateway-side QueryActiveAlarms to a worker command
  that walks GetStatistics's reported handles via GetAlarmExtendedRec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 22:17:38 -04:00
Joseph Doherty 4e933802a7 worker: document MXAccess Toolkit alarm-API gap (A.2 follow-up)
PR A.2 ship-pin discovery: the MXAccess COM Toolkit installed at
C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll
does not expose any alarm-event family. Reflection enumeration of
the assembly confirms ILMXProxyServerEvents and
ILMXProxyServerEvents2 only carry OnDataChange, OnWriteComplete,
OperationComplete, and OnBufferedDataChange — no IAlarmEventSink,
no Alarms collection, no OnAlarmTransition.

AVEVA's separate alarm-subscription managed assemblies
(aaAlarmManagedClient.dll under InTouch\ViewAppFramework\Content\MA\
and ArchestrAAlarmsAndEvents.SDK.Common.dll under
Wonderware\Historian\x64\) exist on this box but are x64-only —
incompatible with the worker's x86 bitness, which is the bitness
constraint the mxaccessgw architecture exists to isolate in the
first place.

This commit replaces the speculative "TBD pin during dev-rig
validation" comment in MxAccessAlarmEventSink with the actual
finding plus the two operator-facing paths forward:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). lmxopcua's AlarmConditionService already synthesizes
   Part 9 transitions from the four MXAccess sub-attributes.
   Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions to the
   worker over a small named-pipe IPC. Recovers full v1 fidelity
   but adds operational complexity.

Until that decision resolves, the sink's Attach is a no-op, the
worker continues to function for data subscriptions, and
lmxopcua-side AlarmConditionService keeps the sub-attribute
synthesis active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:28:31 -04:00
Joseph Doherty 335c952f00 worker: alarm event mapper + sink scaffold (PR A.2 — partial)
Eighteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the proto-build path that
the worker uses to create OnAlarmTransition events. The COM-side
subscription that registers an alarm event sink against the MXAccess
Toolkit is deferred until dev-rig validation, because the exact API
differs across AVEVA versions and needs hardware to verify.

Lands today (unit-testable, no hardware needed):
- MxAccessEventMapper.CreateOnAlarmTransition — mechanical proto
  builder. Takes decoded alarm fields (full reference, source
  object, alarm type, transition kind, severity, timestamps,
  operator user/comment, category, description) and produces an
  MxEvent with the OnAlarmTransition body populated. Mirrors the
  pattern of CreateOnDataChange / CreateOnWriteComplete / etc.
- MxAccessAlarmEventSink — scaffolded class with documented
  Attach / Detach + an internal EnqueueTransition entry point.
  When dev-rig validation pins the MXAccess Toolkit alarm
  subscription API, the only edit needed is to wire the COM
  delegate inside Attach to call EnqueueTransition. The mapper
  bridge is already done.

Pending dev-rig validation:
- Pin the MXAccess Toolkit alarm event source COM API (likely one
  of IAlarmEventSink, IAlarmEventSubscription, or a method on
  LMXProxyServerClass — verify against the worker host's installed
  version).
- Add cancellation/cleanup tests once the COM hook is wired.
- Integration test against the parity rig that fires a real Galaxy
  alarm and asserts the gateway emits OnAlarmTransition.

Tests:
- 2 new mapper tests pin the full-payload Acknowledge case and
  the bare-bones Raise case.
- Full Worker.Tests suite green: 123 passed (was 121; 2 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:16:29 -04:00
Joseph Doherty eed1e88a37 Add XML documentation across gateway, worker, and .NET client 2026-04-30 11:49:58 -04:00
Joseph Doherty b0041c5d18 Fix reliability findings 2026-04-28 06:27:01 -04:00
Joseph Doherty 907aa49aea Improve gateway reliability and client e2e coverage 2026-04-28 06:11:18 -04:00
Joseph Doherty 4fc355b357 Improve gateway reliability and dashboard docs 2026-04-28 00:13:22 -04:00
Joseph Doherty bd4a09a35e Add Polly resilience policies 2026-04-27 15:37:56 -04:00
Joseph Doherty 3d11ac3316 Add bulk MXAccess subscription commands 2026-04-26 22:29:27 -04:00
Joseph Doherty f7929cc12f Merge remote-tracking branch 'origin/main' into agent-2/issue-33-implement-graceful-shutdown
# Conflicts:
#	src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs
#	src/MxGateway.Worker/Ipc/WorkerPipeClient.cs
#	src/MxGateway.Worker/Ipc/WorkerPipeSession.cs
2026-04-26 19:41:04 -04:00
Joseph Doherty d890eff862 Implement graceful worker shutdown 2026-04-26 19:36:22 -04:00
Joseph Doherty 7d67313a7d Merge remote-tracking branch 'origin/main' into agent-3/issue-32-implement-heartbeat-and-watchdog
# Conflicts:
#	src/MxGateway.Worker/Ipc/WorkerPipeSession.cs
#	src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs
2026-04-26 19:16:42 -04:00
Joseph Doherty 4a3560c7ee Implement worker heartbeat watchdog 2026-04-26 19:12:06 -04:00
Joseph Doherty dd455089b4 Implement worker MXAccess event queue 2026-04-26 19:04:56 -04:00
Joseph Doherty a871f2f2e5 Implement worker advise commands 2026-04-26 18:41:10 -04:00
Joseph Doherty 59c710d789 Implement worker AddItem commands 2026-04-26 18:26:44 -04:00
Joseph Doherty 556c3bfa83 Implement worker register and unregister 2026-04-26 18:08:45 -04:00
Joseph Doherty 14419853c7 Issue #25: implement sta command dispatcher 2026-04-26 17:49:01 -04:00
Joseph Doherty 276288ad87 Merge remote-tracking branch 'origin/main' into agent-2/issue-31-implement-mxstatus-proxy-and-hresult-conversion 2026-04-26 17:39:48 -04:00
Joseph Doherty 29455fc1f6 Issue #31: implement mxstatus proxy and hresult conversion 2026-04-26 17:35:30 -04:00
Joseph Doherty 451dccf7e3 Issue #24: create mxaccess com object on sta 2026-04-26 17:34:12 -04:00
Joseph Doherty 6559672fc1 Issue #30: implement value conversion 2026-04-26 17:26:36 -04:00
Joseph Doherty e81682e367 Issue #23: implement sta runtime and message pump 2026-04-26 17:19:00 -04:00
Joseph Doherty d5a982152b Issue #22: implement pipe client and frame protocol 2026-04-26 17:16:49 -04:00
Joseph Doherty 0af1427859 Issue #21: implement worker bootstrap and options 2026-04-26 16:53:06 -04:00
Joseph Doherty b42c3c8b3b Issue #20: scaffold worker project 2026-04-26 16:37:23 -04:00