Commit Graph

119 Commits

Author SHA1 Message Date
Joseph Doherty ac42783e36 feat(sessions): multi-subscriber cap enforcement + mode-gated FailFast 2026-06-15 15:32:08 -04:00
Joseph Doherty 7b12eebbd1 docs: add MaxEventSubscribersPerSession to config shape example; assert its default
Add MaxEventSubscribersPerSession (value 8) to the Sessions block of the
Configuration Shape JSON example in GatewayConfiguration.md, matching the
appsettings.json default the options table already documents. Assert both
MaxEventSubscribersPerSession (8) and MaxPendingCommandsPerSession (128)
defaults in GatewayOptionsTests.OptionsBinding_UsesDesignDefaults.
2026-06-15 15:16:44 -04:00
Joseph Doherty bd190ab012 feat(config): allow multiple event subscribers + add MaxEventSubscribersPerSession cap
Remove the hard-rejection of AllowMultipleEventSubscribers=true in GatewayOptionsValidator
(fan-out is now implemented via SessionEventDistributor). Add MaxEventSubscribersPerSession
(default 8, must be >= 1) to SessionOptions, validate it, expose it in
EffectiveSessionConfiguration / GatewayConfigurationProvider, document it in
GatewayConfiguration.md and appsettings.json. Tests cover the no-error path for
AllowMultipleEventSubscribers=true, the 0/-1 rejection, positive pass, and default pass.
2026-06-15 15:13:21 -04:00
Joseph Doherty 1ea08c3b10 feat(dashboard): mirror events via SessionEventDistributor subscriber (fixes dark feed without gRPC client) 2026-06-15 14:42:32 -04:00
Joseph Doherty 039111ca05 feat(sessions): per-subscriber backpressure isolation in SessionEventDistributor 2026-06-15 13:39:25 -04:00
Joseph Doherty e962737d2c feat(sessions): add bounded replay ring buffer to SessionEventDistributor 2026-06-15 12:42:15 -04:00
Joseph Doherty 00c849e63b docs: session-resilience implementation plan (28 tasks, 5 phases) 2026-06-15 12:15:34 -04:00
Joseph Doherty 3fc6ccad30 docs: session-resilience epic design (fan-out, reconnect, ACL, reattach) 2026-06-15 12:11:46 -04:00
Joseph Doherty a56ce0ddbd docs: refresh stillpending.md for completed work; record residuals (§7/E2)
Mark §1.1 (11 worker commands), §1.2 (audit CorrelationId), and §4 client
CLI/helper parity as Resolved with commit refs; correct §4.4 (dotnet version
already worked). Record open residuals: §1.3 live failover counter, §3.2
multi-sample buffered conversion, §1.4 vendor-stub ack, DrainEvents snapshot
semantics.
2026-06-15 11:35:48 -04:00
Joseph Doherty bd46ba1270 fix(test): drop removed logger arg from GalaxyRepositoryGrpcService test call sites; docs: STA phrasing
Remove the trailing NullLogger<GalaxyRepositoryGrpcService>.Instance argument
from all four CreateService/inline constructions in GalaxyRepositoryGrpcServiceTests
and GalaxyFilterInputSafetyTests, matching the now-4-param constructor after the
dead logger parameter was removed in 0032d2d. Also drop the now-unused
Microsoft.Extensions.Logging.Abstractions using from both files.

Rephrase the §5 STA blurb in docs/AlarmClientDiscovery.md: GatewayAlarmMonitor
routes polling *through* the worker's StaRuntime (which owns the STA pump) rather
than owning the pump itself.
2026-06-15 09:52:07 -04:00
Joseph Doherty 0032d2dc44 docs+chore: fix stale prose, project names, remove dead MapSqlException (§7)
- docs/plans/2026-06-14-deferred-followups.md: mark D1 as executed
  (commit 4af24b9; metric emitted at DashboardSnapshotService.cs:198);
  note D2 resolved as no-op; D3-D5 remain pending
- docs/AlarmClientDiscovery.md §5: rewrite STA "production fix needed"
  to past tense — alarms now route through GatewayAlarmMonitor/worker STA
- EventsHub.cs: replace stale "publisher side is a future follow-up"
  comment; DashboardEventBroadcaster is live and DI-registered
- CLAUDE.md: fix all project-name drift (src/MxGateway.* →
  src/ZB.MOM.WW.MxGateway.*; MxGateway.sln → ZB.MOM.WW.MxGateway.slnx;
  clients/dotnet/MxGateway.Client.sln → ZB.MOM.WW.MxGateway.Client.slnx)
- GalaxyRepositoryGrpcService.cs: remove dead MapSqlException method and
  its IDE0051 suppression pragma; drop now-unused ILogger ctor param and
  Microsoft.Data.SqlClient using; build confirmed 0 warnings/errors
2026-06-15 09:43:00 -04:00
Joseph Doherty 883557fc8a docs: implementation plan for stillpending.md completion
28 tasks across 5 workstreams (A worker control cmds, B worker COM cmds,
C audit CorrelationId, D client CLI parity, E docs). Zero proto changes;
worker net48/x86 + Java on windev, rest local.
2026-06-15 09:35:50 -04:00
Joseph Doherty 4a00b1bdc1 docs: design for completing stillpending.md actionable items
Covers the 11 worker command kinds (§1.1), audit CorrelationId threading
(§1.2), client CLI/helper parity (§4), and doc hygiene (§7). Key finding:
all 11 commands already have proto/validation/scope/routing, so this is a
worker-executor + COM-wrapper + client-CLI effort with zero contract changes.
2026-06-15 09:32:01 -04:00
Joseph Doherty d2c776901b fix(integrationtests): repair GatewayAlarmMonitor ctor build break; LDAP bind + docs (IntegrationTests-026..029) 2026-06-15 02:39:11 -04:00
Joseph Doherty 258e09e0de fix(server): propagate watch-list cancellation; doc + test gaps (Server-051..053) 2026-06-15 02:39:11 -04:00
Joseph Doherty b40aaeef05 docs: forced-subtag fix resolution (#1 proto artifact, #2 fixed) 2026-06-15 01:33:14 -04:00
Joseph Doherty c6f17557f6 docs: forced-subtag mode fix plan 2026-06-15 01:04:46 -04:00
Joseph Doherty bbbef4d098 D2: document no current duplicate / endpoint (no-op) 2026-06-14 23:49:47 -04:00
Joseph Doherty 371ce53409 docs: add deferred follow-ups plan (D1-D5) 2026-06-14 23:46:21 -04:00
Joseph Doherty 37aadf72b3 docs(alarms): clarify resolver cancellation contract; mark design doc superseded
C6b: IAlarmWatchListResolver.ResolveAsync doc now notes that while discovery being
unavailable never throws, a triggered cancellation token still propagates.
C7: annotate the original design doc where it drifted from the shipped code — metric
names / unimplemented watch-list gauges, and the proto-type location (gateway proto, not
worker proto).
2026-06-14 02:33:14 -04:00
Joseph Doherty 5b31e99ab6 alarms: compose subtag reference from object's real Galaxy area for exact alarmmgr parity 2026-06-14 02:12:11 -04:00
Joseph Doherty 64db828d71 docs(alarms): record live confirmation of subtag path + ack; advise-before-write requirement 2026-06-13 11:26:08 -04:00
Joseph Doherty e72763d703 alarms: use confirmed AVEVA AlarmExtension subtag names (InAlarm/Acked/AckMsg/Priority) 2026-06-13 11:07:22 -04:00
Joseph Doherty 3c9becc8d6 docs(plan): mark all alarm-subtag-fallback tasks completed 2026-06-13 10:55:18 -04:00
Joseph Doherty 2f30f0c7c0 docs(alarms): document alarmmgr->subtag fallback (providers, failover, config, contract, parity) 2026-06-13 10:43:37 -04:00
Joseph Doherty c75920c620 docs(plan): correct alarm proto location to mxaccess_gateway.proto (Tasks 1-2) 2026-06-13 09:18:11 -04:00
Joseph Doherty 5ea5618315 docs: implementation plan for alarm subtag-monitoring fallback
18 TDD tasks across contracts, worker (SubtagAlarmConsumer + FailoverAlarmConsumer),
gateway (GR-SQL watch-list discovery, monitor mode reflection, metrics, dashboard),
and docs. Grounded in current signatures; parity-preserving (worker-side synthesis).
2026-06-13 08:44:42 -04:00
Joseph Doherty 38a0ad8ab4 docs: design for alarmmgr→subtag alarm-provider fallback
Auto-failover/failback between the wnwrap alarmmgr consumer and a new
worker-side SubtagAlarmConsumer that advises alarm subtags and synthesizes
transitions. GR-SQL+config watch-list discovery, ack via ack-comment write,
degraded state surfaced in the gRPC contract and dashboard/metrics.
2026-06-13 08:35:18 -04:00
Joseph Doherty e57d864ab2 fix(dashboard): make dashboard auth cookie name configurable
The dashboard auth cookie name was hardcoded to the constant
DashboardAuthenticationDefaults.CookieName (MxGatewayDashboard). Browser
cookies are scoped by host+path but NOT by port, so two gateway instances
sharing a hostname would clobber each other's dashboard session under the
shared name.

Add DashboardOptions.CookieName (MxGateway:Dashboard:CookieName); null/blank
keeps the canonical default. Applied in the existing dashboard cookie
PostConfigure (runs after the inline AddCookie default, so it wins). Behaviour
is unchanged when unset. Adds a Tests case for the override.
2026-06-03 13:11:29 -04:00
Joseph Doherty dbf550da8b docs(mxgateway): sync Metrics.md to renamed meter + seconds histogram units 2026-06-01 16:48:46 -04:00
Joseph Doherty 2eb81379e4 docs: TLS auto-cert and lenient client trust 2026-06-01 07:43:13 -04:00
Joseph Doherty c4e7ddea70 docs: implementation plan for gateway TLS auto-cert and lenient client trust 2026-06-01 07:01:58 -04:00
Joseph Doherty 6bfa4fe884 docs: design for gateway TLS auto-cert and lenient client trust 2026-06-01 06:54:23 -04:00
Joseph Doherty 97e583e96b docs: implementation plan for per-language LazyBrowseNode walker
9 tasks: Java toolchain install (Homebrew), 5 parallel per-language
walker implementations, README updates, final verification. Java
walker is gated on toolchain bootstrap success; other languages
proceed independently if Java fails.
2026-05-28 14:17:52 -04:00
Joseph Doherty eaf479349d docs: design for client-side LazyBrowseNode walker + per-language tests
Adds one high-level walker per client (.NET/Python/Rust/Go/Java) plus
six unit tests each against existing fake transports. One-shot idempotent
Expand semantics; pagination hidden inside the helper. Includes Java
toolchain bootstrap (Homebrew Temurin + Gradle) so the Java client can
build locally on the macOS dev host.
2026-05-28 14:12:03 -04:00
Joseph Doherty 83a4d41fce docs: align design doc test-plan with InvalidArgument error mapping 2026-05-28 13:30:19 -04:00
Joseph Doherty 0b389f5a97 docs: document BrowseChildren RPC and lazy browse architecture 2026-05-28 13:19:08 -04:00
Joseph Doherty cf54a278e1 docs: record lazy-browse stays wire-only; align error mapping 2026-05-28 13:18:23 -04:00
Joseph Doherty b3ebf583ad docs: implementation plan for lazy-browse BrowseChildren RPC
12-task bite-sized plan executing the approved design.
Includes native task persistence file.
2026-05-28 12:41:11 -04:00
Joseph Doherty edb812d859 docs: design for lazy-browse BrowseChildren RPC
OPC UA-style level-at-a-time browse across gRPC, dashboard, and the
shared cache projector. Server still loads the full Galaxy hierarchy;
laziness is wire-side and UI-side only.
2026-05-28 12:34:37 -04:00
Joseph Doherty 615b487a77 docs+ui: backfill XML doc comments and finish dashboard layout pass
Adds missing <summary>/<param> XML docs across 99 server, worker, and test
files so CommentChecker reports zero issues (TreatWarningsAsErrors needs the
analyzer clean). Bundles in WIP dashboard work: NavSection extraction,
MainLayout/site.css/js styling alignment, and DashboardOptions/Auth tweaks.
2026-05-27 14:20:10 -04:00
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00
Joseph Doherty d692232191 dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

  * IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
    take a direct dependency on SignalR.
  * DashboardEventBroadcaster — singleton implementation that hands the
    SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
    logged at debug and dropped so the source gRPC stream is never
    blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:
  * implements IAsyncDisposable
  * opens a second HubConnection via DashboardHubConnectionFactory targeting
    /hubs/events
  * calls SubscribeSession(SessionId) on Start
  * renders the most recent 50 events in a small table (worker seq, family,
    server/item handle, alarm reference when the event is OnAlarmTransition)
  * shows a live/offline conn-pill driven by HubConnection.Closed /
    Reconnected events

The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

  * CLAUDE.md — Authentication section now explains LDAP bind +
    GroupToRole + HubToken bearer flow.
  * gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
    events SignalR hubs, LDAP cookie + bearer scheme.
  * docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
    add GroupToRole row, append "Authorization policies" and "SignalR hubs"
    subsections describing the three policies and the /hubs/* endpoints.
  * docs/GatewayDashboardDesign.md — hosting model (root mount, new
    endpoint layout), Realtime Updates rewritten as a hub table
    (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
    and routing), Authentication And Authorization rewritten around LDAP +
    role mapping + the hub bearer flow, Configuration block updated.
  * docs/GatewayProcessDesign.md — security-section dashboard paragraph
    and the example config block both refreshed to LDAP/role auth.
  * docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
    updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
    API-key login flow).
  * docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
    GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
    RequiredGroup default; success-path assertion explains the role-claim
    check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:07:30 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00
Joseph Doherty a4ed605f74 A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
  - Subscribe + 1st PollOnce yield real transition events
  - Field-by-field decode correct (provider, group, tag, severity,
    UTC timestamp, comment, type)
  - SnapshotActiveAlarms reflects current state
  - AcknowledgeByName(real identity) -> rc=0
  - Pipeline keeps streaming transitions on the 10s cadence post-ack

Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:

1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
   earlier discovery-doc recommendation) makes the first
   GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
   call is unnecessary because the round-trip echo is mangled was
   wrong — mangled echo or not, the call is required.

2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
   instance (returns -55). Workaround: provision a parallel
   "ack-only" wnwrap consumer that runs Initialize → Register →
   Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
   Production WnWrapAlarmConsumer now holds two COM clients;
   AcknowledgeByName always dispatches through the ack-only one.

3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
   8-arg overload returns -55 on this AVEVA build (apparently a
   stub); the v1 6-arg overload works. Production now calls the
   6-arg overload, discarding the proto's operator_domain and
   operator_full_name fields. The proto contract keeps both for
   forward-compat if AVEVA fixes the v2 method.

Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).

Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.

Test counts after this slice:
  Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
          1 pre-existing structure-fail (untouched)
  Server: 308 pass / 0 fail
Solution builds clean.

docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:17:39 -04:00
Joseph Doherty f711a55be4 A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient`
to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by
`wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML
payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime
auto-marshaling that crashed `GetHighPriAlarm` with
ArgumentOutOfRangeException on every poll. Live captured 60/60 polls
clean against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform
script flipped TestMachine_001.TestAlarm001 every 10s; the GUID,
priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps
arrived end-to-end.

Implementation:
- `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under
  `lib/` so dev boxes don't need the SDK to build.
- New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a
  500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the
  snapshot keyed by alarm GUID, and raises one
  `MxAlarmTransitionEvent` per state change. Includes the
  Initialize→Register-before-Subscribe ordering fix found during
  Discovery probe runs.
- New library-agnostic types `MxAlarmSnapshotRecord` /
  `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build
  path is testable without an AVEVA install.
- `AlarmRecordTransitionMapper` retired the COM-coupled
  `MapTransitionKind(eAlmTransitions)`; new pure helpers
  `ParseStateKind`, `MapTransition(prev, curr)`, and
  `ParseTransitionTimestampUtc` cover XML decode + state-delta logic.
- `IMxAccessAlarmConsumer` event surface changed from
  `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>`
  and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` —
  decoupling the interface from any specific COM library.
- Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider`
  refs; adds `Interop.WNWRAPCONSUMERLib`.

Tests:
- 36 new unit tests (state-string mapping, prev/current → proto kind
  decision table, timestamp UTC reassembly, XML payload parser, 32-char
  hex GUID round-trip) covering everything that doesn't touch the live
  COM surface — all passing.
- Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives
  the live capture flow for regression / future probes.

Docs:
- `docs/AlarmClientDiscovery.md` "Option A — captured" section records
  sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip
  (prefer `Subscribe` for filtering), the `GetStatistics`
  AccessViolationException quirk, and the worker-integration outline.

Pre-existing failure noted (separate):
`MxAccessInteropReference_ExistsOnlyInWorkerProject` was already
failing on HEAD — the test project still references `ArchestrA.MxAccess`
for the Skip-gated discovery probes. Not regressed by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:44:15 -04:00
Joseph Doherty f490ae2593 docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements
only IDisposable (no [ComImport] interface, no class GUID) and
has a single field "CwwAlarmConsumer* m_almUnmanaged". So
AlarmClient is a C++/CLI managed wrapper around a native C++
class -- NOT a COM-interop class. The DateTime conversion happens
INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling
boundary. There's no separate COM interface to QI to.

Revised approach (in docs/AlarmClientDiscovery.md):

A. wnwrapConsumer.dll -- separate standalone COM library AVEVA
   ships at "C:\Program Files (x86)\Common Files\ArchestrA"
   exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with
   SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output
   bypasses FILETIME marshaling entirely. Best fit -- real COM,
   self-contained, conventional production-grade approach.
B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a
   vendor binary, brittle to upgrades.
C. Reflect into m_almUnmanaged and call native vtable directly
   -- requires reverse-engineering the C++ class layout.

Picking A. Probe restored to Skip; next commit starts the
wnwrapConsumer integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:15:37 -04:00
Joseph Doherty 39f9fd8946 probe: BREAKTHROUGH — alarms flow via canonical \Node\Galaxy!Area, blocked by DateTime marshaling
Two findings that turn the alarm capture path on:

1. Subscription expression: \<MachineName>\Galaxy!<Area> is the
   canonical AlarmClient subscription format per ArchestrA docs:
   \Node\Provider!Area!Filter, with Provider literally "Galaxy"
   (not the Galaxy name) and Node being the machine name. For
   this rig: \DESKTOP-6JL3KKO\Galaxy!DEV catches alarms.
2. InitializeConsumer before RegisterConsumer — discovered
   earlier; bug-fix for PR A.5's AlarmClientConsumer.

With these in place, GetHighPriAlarm returned a record on every
poll for 60s straight (117/117 calls). But every call throws
ArgumentOutOfRangeException: Not a valid Win32 FileTime, because
AlarmRecord has five DateTime fields (ar_Time / ar_OrigTime /
ar_AckTime / ar_RtnTime / ar_SubTime) and AVEVA writes sentinel
FILETIME values for unset ones (e.g., ar_AckTime on an
unacknowledged alarm). The aaAlarmManagedClient.dll auto-marshals
FILETIME -> DateTime and rejects out-of-range values.

GetStatistics still reports total=0 active=0 even with
GetHighPriAlarm returning records — those two APIs have
different views. The active read API for current alarms is
GetHighPriAlarm, not GetStatistics's change array.

So the consumer chain works. The blocking issue is now
extracting the payload past the AVEVA-shipped DateTime
auto-marshaling. Three approaches for the next PR:

1. Patch aaAlarmManagedClient.dll via ildasm/ilasm round-trip.
2. Define a custom [ComImport] interface with safe-blittable
   types and Marshal.QueryInterface to it.
3. Use IDispatch late binding to bypass strong-typed marshaling.

Option 2 is cleanest; needs the AlarmClient COM IID.

Probe changes:

- Subscription expression set to \<MachineName>\Galaxy!DEV.
- GetHighPriAlarm tally counters (ok-with-record vs throw).
- 117 throws / 0 ok-with-record over 60s confirms alarms are
  flowing continuously while the user's flip script runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:06:45 -04:00
Joseph Doherty bb7be14d1d probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified
Sixth probe iteration with every consumer-side knob exhausted:

- Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!,
  \Galaxy!TestArea, \.\Galaxy!.
- Read channels polled at 500ms: GetStatistics, GetHighPriAlarm,
  SFCreateSnapshot + SFGetStatistics.
- Filters: priority 0..32767, qtSummary + qtHistory both tried,
  asAlarmActiveNow.
- AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug
  on default(DateTime).

Result: every read API returns empty for the entire 60s window
even with TestMachine_001.TestAlarm001 firing every 10s and
aaObjectViewer confirming InAlarm transitions. The
aaAlarmManagedClient.AlarmClient is not the receive surface
AVEVA's alarm pipeline routes to in this Galaxy configuration.

The consumer chain is verified working end-to-end: Initialize +
Register + Subscribe all succeed, GetProviders finds the
provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's
internal hwnd. There is simply no alarm data flowing through
this consumer surface.

Next investigation is not consumer-side: either find the SDK
aaObjectViewer's alarm panel uses, or query the historian
event storage directly. If alarms only flow via the historian
path on this customer's Galaxy, the worker's PR A.5 architecture
is a dead-end and A.2 needs a different transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:26:29 -04:00
Joseph Doherty 8ac6642bf8 probe: subscribe-parameter sweep — alarms still absent, producer-side blocked
Tried every documented subscription knob with InitializeConsumer
present + provider visible at status 100:

- qtSummary AND qtHistory (the only eQueryType values).
- Priority 1..999 AND 0..32767.
- FilterMask/Spec asNone AND asAlarmActiveNow.

eAlarmFilterState is single-state-valued (asNone=0,
asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits,
so the filter surface is exhausted.

GetStatistics continued to report total=0 active=0 codes=[7]
for every poll across all combinations.

User confirmation: the BoolAlarm extension on
TestMachine_001.TestAlarm001 is evaluating (the $Alarm.InAlarm
sub-attribute flips true/false in lockstep with the script
writes, visible in aaObjectViewer). So the consumer chain is
verified working end-to-end on our side. What's missing is
producer-side publication into the aaAlarmManagedClient stream.

Probable causes (config, not code):

- BoolAlarm extension's "publish to alarm manager" / "Active" /
  "Enabled" flag may be off.
- Alarm-vs-event mode setting may have it routing to events,
  not alarms.
- Platform alarm area may not match the consumer's subscription
  scope.

Resolution path: check the BoolAlarm extension's config in System
Platform IDE; check aaObjectViewer's Active Alarms panel (not
attribute panel) to see if the alarm appears there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:53:26 -04:00