Commit Graph

90 Commits

Author SHA1 Message Date
Joseph Doherty dbf550da8b docs(mxgateway): sync Metrics.md to renamed meter + seconds histogram units 2026-06-01 16:48:46 -04:00
Joseph Doherty 2eb81379e4 docs: TLS auto-cert and lenient client trust 2026-06-01 07:43:13 -04:00
Joseph Doherty c4e7ddea70 docs: implementation plan for gateway TLS auto-cert and lenient client trust 2026-06-01 07:01:58 -04:00
Joseph Doherty 6bfa4fe884 docs: design for gateway TLS auto-cert and lenient client trust 2026-06-01 06:54:23 -04:00
Joseph Doherty 97e583e96b docs: implementation plan for per-language LazyBrowseNode walker
9 tasks: Java toolchain install (Homebrew), 5 parallel per-language
walker implementations, README updates, final verification. Java
walker is gated on toolchain bootstrap success; other languages
proceed independently if Java fails.
2026-05-28 14:17:52 -04:00
Joseph Doherty eaf479349d docs: design for client-side LazyBrowseNode walker + per-language tests
Adds one high-level walker per client (.NET/Python/Rust/Go/Java) plus
six unit tests each against existing fake transports. One-shot idempotent
Expand semantics; pagination hidden inside the helper. Includes Java
toolchain bootstrap (Homebrew Temurin + Gradle) so the Java client can
build locally on the macOS dev host.
2026-05-28 14:12:03 -04:00
Joseph Doherty 83a4d41fce docs: align design doc test-plan with InvalidArgument error mapping 2026-05-28 13:30:19 -04:00
Joseph Doherty 0b389f5a97 docs: document BrowseChildren RPC and lazy browse architecture 2026-05-28 13:19:08 -04:00
Joseph Doherty cf54a278e1 docs: record lazy-browse stays wire-only; align error mapping 2026-05-28 13:18:23 -04:00
Joseph Doherty b3ebf583ad docs: implementation plan for lazy-browse BrowseChildren RPC
12-task bite-sized plan executing the approved design.
Includes native task persistence file.
2026-05-28 12:41:11 -04:00
Joseph Doherty edb812d859 docs: design for lazy-browse BrowseChildren RPC
OPC UA-style level-at-a-time browse across gRPC, dashboard, and the
shared cache projector. Server still loads the full Galaxy hierarchy;
laziness is wire-side and UI-side only.
2026-05-28 12:34:37 -04:00
Joseph Doherty 615b487a77 docs+ui: backfill XML doc comments and finish dashboard layout pass
Adds missing <summary>/<param> XML docs across 99 server, worker, and test
files so CommentChecker reports zero issues (TreatWarningsAsErrors needs the
analyzer clean). Bundles in WIP dashboard work: NavSection extraction,
MainLayout/site.css/js styling alignment, and DashboardOptions/Auth tweaks.
2026-05-27 14:20:10 -04:00
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00
Joseph Doherty d692232191 dashboard: clear deferred items — EventsHub publisher + doc refresh
EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)

EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:

  * IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
    take a direct dependency on SignalR.
  * DashboardEventBroadcaster — singleton implementation that hands the
    SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
    logged at debug and dropped so the source gRPC stream is never
    blocked.

EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.

SessionDetailsPage adds a "Recent events" panel:
  * implements IAsyncDisposable
  * opens a second HubConnection via DashboardHubConnectionFactory targeting
    /hubs/events
  * calls SubscribeSession(SessionId) on Start
  * renders the most recent 50 events in a small table (worker seq, family,
    server/item handle, alarm reference when the event is OnAlarmTransition)
  * shows a live/offline conn-pill driven by HubConnection.Closed /
    Reconnected events

The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.

Documentation refresh

Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:

  * CLAUDE.md — Authentication section now explains LDAP bind +
    GroupToRole + HubToken bearer flow.
  * gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
    events SignalR hubs, LDAP cookie + bearer scheme.
  * docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
    add GroupToRole row, append "Authorization policies" and "SignalR hubs"
    subsections describing the three policies and the /hubs/* endpoints.
  * docs/GatewayDashboardDesign.md — hosting model (root mount, new
    endpoint layout), Realtime Updates rewritten as a hub table
    (DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
    and routing), Authentication And Authorization rewritten around LDAP +
    role mapping + the hub bearer flow, Configuration block updated.
  * docs/GatewayProcessDesign.md — security-section dashboard paragraph
    and the example config block both refreshed to LDAP/role auth.
  * docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
    updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
    API-key login flow).
  * docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
    GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
    RequiredGroup default; success-path assertion explains the role-claim
    check.

Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:07:30 -04:00
Joseph Doherty dc9c0c950c rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx
Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two tests that were not rename-related but became visible
while validating the rename:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:22:23 -04:00
Joseph Doherty a4ed605f74 A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
  - Subscribe + 1st PollOnce yield real transition events
  - Field-by-field decode correct (provider, group, tag, severity,
    UTC timestamp, comment, type)
  - SnapshotActiveAlarms reflects current state
  - AcknowledgeByName(real identity) -> rc=0
  - Pipeline keeps streaming transitions on the 10s cadence post-ack

Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:

1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
   earlier discovery-doc recommendation) makes the first
   GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
   call is unnecessary because the round-trip echo is mangled was
   wrong — mangled echo or not, the call is required.

2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
   instance (returns -55). Workaround: provision a parallel
   "ack-only" wnwrap consumer that runs Initialize → Register →
   Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
   Production WnWrapAlarmConsumer now holds two COM clients;
   AcknowledgeByName always dispatches through the ack-only one.

3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
   8-arg overload returns -55 on this AVEVA build (apparently a
   stub); the v1 6-arg overload works. Production now calls the
   6-arg overload, discarding the proto's operator_domain and
   operator_full_name fields. The proto contract keeps both for
   forward-compat if AVEVA fixes the v2 method.

Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).

Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.

Test counts after this slice:
  Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
          1 pre-existing structure-fail (untouched)
  Server: 308 pass / 0 fail
Solution builds clean.

docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:17:39 -04:00
Joseph Doherty f711a55be4 A.2: replace AlarmClientConsumer with wnwrap-based polling consumer
Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient`
to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by
`wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML
payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime
auto-marshaling that crashed `GetHighPriAlarm` with
ArgumentOutOfRangeException on every poll. Live captured 60/60 polls
clean against `\DESKTOP-6JL3KKO\Galaxy!DEV` while a System Platform
script flipped TestMachine_001.TestAlarm001 every 10s; the GUID,
priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted timestamps
arrived end-to-end.

Implementation:
- `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under
  `lib/` so dev boxes don't need the SDK to build.
- New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a
  500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the
  snapshot keyed by alarm GUID, and raises one
  `MxAlarmTransitionEvent` per state change. Includes the
  Initialize→Register-before-Subscribe ordering fix found during
  Discovery probe runs.
- New library-agnostic types `MxAlarmSnapshotRecord` /
  `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build
  path is testable without an AVEVA install.
- `AlarmRecordTransitionMapper` retired the COM-coupled
  `MapTransitionKind(eAlmTransitions)`; new pure helpers
  `ParseStateKind`, `MapTransition(prev, curr)`, and
  `ParseTransitionTimestampUtc` cover XML decode + state-delta logic.
- `IMxAccessAlarmConsumer` event surface changed from
  `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>`
  and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` —
  decoupling the interface from any specific COM library.
- Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider`
  refs; adds `Interop.WNWRAPCONSUMERLib`.

Tests:
- 36 new unit tests (state-string mapping, prev/current → proto kind
  decision table, timestamp UTC reassembly, XML payload parser, 32-char
  hex GUID round-trip) covering everything that doesn't touch the live
  COM surface — all passing.
- Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives
  the live capture flow for regression / future probes.

Docs:
- `docs/AlarmClientDiscovery.md` "Option A — captured" section records
  sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip
  (prefer `Subscribe` for filtering), the `GetStatistics`
  AccessViolationException quirk, and the worker-integration outline.

Pre-existing failure noted (separate):
`MxAccessInteropReference_ExistsOnlyInWorkerProject` was already
failing on HEAD — the test project still references `ArchestrA.MxAccess`
for the Skip-gated discovery probes. Not regressed by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:44:15 -04:00
Joseph Doherty f490ae2593 docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements
only IDisposable (no [ComImport] interface, no class GUID) and
has a single field "CwwAlarmConsumer* m_almUnmanaged". So
AlarmClient is a C++/CLI managed wrapper around a native C++
class -- NOT a COM-interop class. The DateTime conversion happens
INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling
boundary. There's no separate COM interface to QI to.

Revised approach (in docs/AlarmClientDiscovery.md):

A. wnwrapConsumer.dll -- separate standalone COM library AVEVA
   ships at "C:\Program Files (x86)\Common Files\ArchestrA"
   exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with
   SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output
   bypasses FILETIME marshaling entirely. Best fit -- real COM,
   self-contained, conventional production-grade approach.
B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a
   vendor binary, brittle to upgrades.
C. Reflect into m_almUnmanaged and call native vtable directly
   -- requires reverse-engineering the C++ class layout.

Picking A. Probe restored to Skip; next commit starts the
wnwrapConsumer integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:15:37 -04:00
Joseph Doherty 39f9fd8946 probe: BREAKTHROUGH — alarms flow via canonical \Node\Galaxy!Area, blocked by DateTime marshaling
Two findings that turn the alarm capture path on:

1. Subscription expression: \<MachineName>\Galaxy!<Area> is the
   canonical AlarmClient subscription format per ArchestrA docs:
   \Node\Provider!Area!Filter, with Provider literally "Galaxy"
   (not the Galaxy name) and Node being the machine name. For
   this rig: \DESKTOP-6JL3KKO\Galaxy!DEV catches alarms.
2. InitializeConsumer before RegisterConsumer — discovered
   earlier; bug-fix for PR A.5's AlarmClientConsumer.

With these in place, GetHighPriAlarm returned a record on every
poll for 60s straight (117/117 calls). But every call throws
ArgumentOutOfRangeException: Not a valid Win32 FileTime, because
AlarmRecord has five DateTime fields (ar_Time / ar_OrigTime /
ar_AckTime / ar_RtnTime / ar_SubTime) and AVEVA writes sentinel
FILETIME values for unset ones (e.g., ar_AckTime on an
unacknowledged alarm). The aaAlarmManagedClient.dll auto-marshals
FILETIME -> DateTime and rejects out-of-range values.

GetStatistics still reports total=0 active=0 even with
GetHighPriAlarm returning records — those two APIs have
different views. The active read API for current alarms is
GetHighPriAlarm, not GetStatistics's change array.

So the consumer chain works. The blocking issue is now
extracting the payload past the AVEVA-shipped DateTime
auto-marshaling. Three approaches for the next PR:

1. Patch aaAlarmManagedClient.dll via ildasm/ilasm round-trip.
2. Define a custom [ComImport] interface with safe-blittable
   types and Marshal.QueryInterface to it.
3. Use IDispatch late binding to bypass strong-typed marshaling.

Option 2 is cleanest; needs the AlarmClient COM IID.

Probe changes:

- Subscription expression set to \<MachineName>\Galaxy!DEV.
- GetHighPriAlarm tally counters (ok-with-record vs throw).
- 117 throws / 0 ok-with-record over 60s confirms alarms are
  flowing continuously while the user's flip script runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:06:45 -04:00
Joseph Doherty bb7be14d1d probe: aaAlarmManagedClient receives no alarm data — full consumer chain verified
Sixth probe iteration with every consumer-side knob exhausted:

- Subscriptions tried (all rc=0): \Galaxy!, \Galaxy!*, \Galaxy!,
  \Galaxy!TestArea, \.\Galaxy!.
- Read channels polled at 500ms: GetStatistics, GetHighPriAlarm,
  SFCreateSnapshot + SFGetStatistics.
- Filters: priority 0..32767, qtSummary + qtHistory both tried,
  asAlarmActiveNow.
- AlarmRecord pre-init to FILETIME epoch to dodge marshaler bug
  on default(DateTime).

Result: every read API returns empty for the entire 60s window
even with TestMachine_001.TestAlarm001 firing every 10s and
aaObjectViewer confirming InAlarm transitions. The
aaAlarmManagedClient.AlarmClient is not the receive surface
AVEVA's alarm pipeline routes to in this Galaxy configuration.

The consumer chain is verified working end-to-end: Initialize +
Register + Subscribe all succeed, GetProviders finds the
provider, the WM 0xC275 heartbeat fires at 1Hz to AVEVA's
internal hwnd. There is simply no alarm data flowing through
this consumer surface.

Next investigation is not consumer-side: either find the SDK
aaObjectViewer's alarm panel uses, or query the historian
event storage directly. If alarms only flow via the historian
path on this customer's Galaxy, the worker's PR A.5 architecture
is a dead-end and A.2 needs a different transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:26:29 -04:00
Joseph Doherty 8ac6642bf8 probe: subscribe-parameter sweep — alarms still absent, producer-side blocked
Tried every documented subscription knob with InitializeConsumer
present + provider visible at status 100:

- qtSummary AND qtHistory (the only eQueryType values).
- Priority 1..999 AND 0..32767.
- FilterMask/Spec asNone AND asAlarmActiveNow.

eAlarmFilterState is single-state-valued (asNone=0,
asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits,
so the filter surface is exhausted.

GetStatistics continued to report total=0 active=0 codes=[7]
for every poll across all combinations.

User confirmation: the BoolAlarm extension on
TestMachine_001.TestAlarm001 is evaluating (the $Alarm.InAlarm
sub-attribute flips true/false in lockstep with the script
writes, visible in aaObjectViewer). So the consumer chain is
verified working end-to-end on our side. What's missing is
producer-side publication into the aaAlarmManagedClient stream.

Probable causes (config, not code):

- BoolAlarm extension's "publish to alarm manager" / "Active" /
  "Enabled" flag may be off.
- Alarm-vs-event mode setting may have it routing to events,
  not alarms.
- Platform alarm area may not match the consumer's subscription
  scope.

Resolution path: check the BoolAlarm extension's config in System
Platform IDE; check aaObjectViewer's Active Alarms panel (not
attribute panel) to see if the alarm appears there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:53:26 -04:00
Joseph Doherty 4e8928cf71 probe: InitializeConsumer required — provider visible after, alarms still absent
InitializeConsumer was the missing call. Adding it before
RegisterConsumer makes the \Galaxy! provider appear in
GetProviders (status 0 -> 100 within 500ms). Without Initialize,
GetProviders returns an empty list even though everything else
returns rc=0 (success).

Probe trace 2026-05-01:

  InitializeConsumer -> 0
  RegisterConsumer -> 0
  GetProviders [after Register] -> count=0 list=[]
  Subscribe('\Galaxy!') -> 0
  GetProviders [after Subscribe] -> count=1 list=[  0 \Galaxy!]
  GetProviders [poll #1] -> count=1 list=[100 \Galaxy!]

Despite the provider being at "100% query complete" for the
entire 60s window, GetStatistics continued to report
total=0 active=0 codes=[7] -- no alarm transitions reached the
consumer even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run.

So the consumer chain works end-to-end. What's missing is alarm
traffic from the producer side. The next discriminator is
whether ObjectViewer (or another live consumer) sees the alarm
fire while the script runs.

API-ordering bug fix to apply to PR A.5's AlarmClientConsumer
regardless of how A.2 lands: AlarmClientConsumer.Subscribe
should call InitializeConsumer before RegisterConsumer (currently
omits Initialize entirely, which means the provider chain is
never visible from the worker either). That fix lifts a
fundamental bug independent of the polling-vs-callback question.

Probe changes:

- Added InitializeConsumer call before RegisterConsumer.
- Added LogProviders helper that logs only on change; called
  after Register, after Subscribe, and on every poll. Easier
  to spot when the provider chain transitions from empty to
  populated.
- Restored Skip-gating after run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:43:06 -04:00
Joseph Doherty f4423dfb6d probe: GetProviders=0 — alarm path upstream-blocked on dev rig
Extended AlarmClientWmProbeTests to call AlarmClient.GetProviders
after RegisterConsumer. Run 2026-05-01:

  GetProviders -> rc=0 count=0 list=[]

Zero alarm providers visible to the consumer. This explains every
preceding probe run — no providers means no alarm events,
regardless of subscription expression or value writes upstream.
Even with a System Platform script flipping
TestMachine_001.TestAlarm001 every 10s during the run,
GetStatistics reported no transitions, no positions[] entries,
no field changes after t=0.85s.

Possible causes (dev-rig configuration, not code):

1. No $Alarm extension on the test bool — flipping the value
   writes a value but doesn't fire an alarm.
2. AVEVA alarm-manager service (aaAlarmMgr or equivalent) not
   running on this rig.
3. Process security context — providers registered under a
   service account aren't visible to a consumer running under
   a normal user account.

A.2 implementation is blocked on this until at least one provider
is visible. Once a provider exists, the polling-vs-callback
question is answerable in one probe run; without a provider both
paths return the same "nothing happening" answer.

Probe changes:

- Added in-process MxAccess Write attempt (TriggerWriteValue) —
  hit TargetParameterCountException so the Write signature is
  not (handle, item, value); reflection diag added but not
  resolved. Now disabled in favor of external trigger.
- Added GetProviders enumeration after RegisterConsumer.
- Removed firePrint/clearPrint markers; probe is observe-only.
- Added ArchestrA.MxAccess reference to the test project.

Also updated docs/AlarmClientDiscovery.md with the
alarm-provider-visibility section explaining what's blocked
and why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:37:15 -04:00
Joseph Doherty 3ff4969224 probe: GetStatistics polling viable, Galaxy has no active alarms today
Extended AlarmClientWmProbeTests.ProbeAlarmClientWmMessages to also
call GetStatistics every ~2s during the pump window. Re-ran on the
dev rig 2026-05-01:

- GetStatistics is safely callable from the same thread that did
  RegisterConsumer + Subscribe. Every poll (9 calls / 20s window)
  returned rc=0, no exceptions.
- Galaxy currently has zero active alarms. total=0 active=0
  suppressed=0 newAlarms=0 across every poll. positions[] and
  handles[] arrays were empty.
- changes=1 codes=[7] was constant across all polls, matching the
  constant 1 Hz WM 0xC275 cadence — same heartbeat semantics
  exposed through both the WM path and the pull API.

Confirms the polling design is mechanically viable: GetStatistics
threading-affinity is fine and the call is cheap. The remaining
unknown is whether GetStatistics populates positions[] / handles[]
with real entries when an alarm actually fires. Proving that
requires triggering an alarm — next probe is an MxAccess write to a
$Alarm-extended boolean tag (reference pending).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:16:08 -04:00
Joseph Doherty 12881ca791 docs+test: live AlarmClient WM probe — heartbeat-only, hWnd not used
Added MxGateway.Worker.Tests/AlarmClientWmProbeTests.cs as a Skip-gated
runtime probe. Run on the dev rig 2026-05-01 against the live AVEVA
install (Galaxy reachable, no manual alarm fired). Findings:

- RegisterConsumer(hWnd, ...) and Subscribe("\Galaxy!", ...) both
  return 0 (success). Calls are valid against the deployed assembly.
- A registered-message-class WM (ID 0xC275 in this OS session) fires
  every ~1 second after Subscribe completes. Constant wParam=0x1100,
  constant lParam=0x079E46D8 — looks like a heartbeat / keepalive,
  not a per-change notification.
- Critically, this WM is delivered to AVEVA's own internal window
  (hwnd=0x18032E), NOT to the consumer hWnd we registered. The
  consumer window receives only the standard WM_CREATE / WM_DESTROY
  sequence; no AVEVA traffic in between.

This invalidates the WM_APP-pump design previously documented. The
hWnd parameter to RegisterConsumer appears to be a registration
identity only — AVEVA's notification path runs entirely against
AVEVA's own internal window.

Two viable A.2 designs replace the previous one:

1. Polling. Call GetStatistics on a 500ms timer in the worker's STA
   and react to whatever change set it reports. No window plumbing
   needed. Latency floor = poll period. Matches AVEVA's own
   internal heartbeat cadence.
2. Hook AVEVA's internal window. Discover AVEVA's own hwnd,
   SetWindowSubclass on it, intercept WM 0xC275 on AVEVA's thread.
   Higher fidelity, lower latency, but invasive and fragile across
   AVEVA upgrades — likely a non-starter.

Recommendation in docs/AlarmClientDiscovery.md is option 1 (polling)
unless a follow-up probe with a real fired alarm shows AVEVA does
post change-specific WMs to a different hWnd.

Open follow-up probes documented:

- Fire a real Galaxy alarm during pump and check whether WM 0xC275
  cadence changes or GetStatistics returns non-empty arrays.
- GetStatistics threading affinity test.
- Hook AVEVA's internal window 0x18032E.
- Decompile aaAlarmManagedClient IL for RegisterConsumer to find
  whether WNAL_Register's callback surface is wrapped.

Test project changes:

- Added Reference to aaAlarmManagedClient + IAlarmMgrDataProvider
  (Private=true so the DLL gets copied into bin for test load).
- Test-suite-wide: 127 real tests still pass; both alarm-related
  Skip-gated tests skip cleanly.

Code change to the probe is additive — the worker is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:05:47 -04:00
Joseph Doherty 6e356da092 docs: AlarmClient public surface — managed-event premise wrong, WM_APP required
Reflection probe of the deployed aaAlarmManagedClient.dll
(v1.0.7368.41290) on 2026-05-01 confirmed the public AlarmClient class
exposes zero public events. The PR A.5 design that AlarmClientConsumer
is built on (managed-event surface, no message pump) does not hold
against this assembly.

The actual notification mechanism is WM_APP messaging:
RegisterConsumer(hWnd, ...) takes a window handle because AVEVA's alarm
provider WM_APP-pokes the registered window, then GetStatistics +
GetAlarmExtendedRec pull the change set on each poke.

Practical impact:

- AlarmClientConsumer.AlarmRecordReceived has no production caller.
  RaiseAlarmRecordReceived is invoked only from tests. Subscribe(...)
  returns OK from RegisterConsumer + Subscribe but no notifications
  reach the consumer at runtime because no window is attached.
- Until A.2 lands a hidden message-only window + WindowProc that routes
  WM_APP into MxAccessAlarmEventSink.EnqueueTransition, the gateway's
  MX_EVENT_FAMILY_ON_ALARM_TRANSITION family cannot carry events.
- AcknowledgeByGuid and SnapshotActiveAlarms are pull-style and remain
  correct as written.

Changes:

- docs/AlarmClientDiscovery.md (new) — reflection probe summary, full
  AlarmClient method list, open questions for A.2 implementation.
- AlarmClientConsumer.cs xmldoc — replaced the inaccurate "managed
  event surface" claim with the WM_APP finding; flagged
  AlarmRecordReceived as unreachable in production until the WM_APP
  pump lands.
- MxAccessAlarmEventSink.cs xmldoc — replaced the "verify on dev rig"
  hedge in the wiring plan with the resolved finding; expanded the
  open-questions list (WM_APP message ID, wParam/lParam semantics, STA
  affinity, subscription scope) so the next A.2 PR knows what the
  dev-rig probe needs to answer.

Code-only no-op for the worker; worker builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:50:57 -04:00
Joseph Doherty ddad573b75 Merge origin/main with local pending work and update AGENTS.md references
- Resolve 14 conflicts from popping local stash on top of origin's
  eed1e88 + 8d3352f doc-comment additions (11 mechanical, plus
  version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs)
- Fix 4 test files that used AGENTS.md as the repo-root sentinel
  (now use CLAUDE.md, since AGENTS.md was removed in 4731ab5)
- Redirect 10 doc citations from AGENTS.md to the matching gateway.md
  sections (Value Model, Status Model, Security, STA Worker Thread
  Model, gRPC Layer rule, cancellation rule)

Verified: solution build clean, x86 worker build clean, 266/266
gateway tests passing, 121/121 worker tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:13:33 -04:00
Joseph Doherty 51a9dadf62 Align docs with StyleGuide and add CLAUDE.md
- Rename 16 kebab-case docs to PascalCase per StyleGuide
- Move per-language client design docs from docs/ to clients/<lang>/
  alongside their READMEs
- Add ## Related Documentation sections to 15 docs that lacked one
- Fix sentence-case violations in H3 headings (StyleGuide rule)
- Update cross-references in gateway.md, client READMEs, scripts,
  and generate-proto.ps1 helpers to follow the new paths
- Add CLAUDE.md with build/test commands, the source-update
  verification matrix, the parity-first contract, and pointers
  to MXAccess and Galaxy Repository analysis sources

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:19:22 -04:00
Joseph Doherty 133c83029b Add Galaxy repository API and clients 2026-04-29 07:27:00 -04:00
Joseph Doherty 907aa49aea Improve gateway reliability and client e2e coverage 2026-04-28 06:11:18 -04:00
Joseph Doherty 4fc355b357 Improve gateway reliability and dashboard docs 2026-04-28 00:13:22 -04:00
Joseph Doherty bd4a09a35e Add Polly resilience policies 2026-04-27 15:37:56 -04:00
Joseph Doherty d431ff9660 Fix dashboard static assets and add client e2e scripts 2026-04-27 12:10:40 -04:00
Joseph Doherty 3d11ac3316 Add bulk MXAccess subscription commands 2026-04-26 22:29:27 -04:00
Joseph Doherty 4ea2c4fd86 Issue #50: clarify packaging API key placeholders 2026-04-26 21:26:28 -04:00
Joseph Doherty 41a2d70f8f Merge remote-tracking branch 'origin/main' into agent-2/issue-50-client-packaging-documentation 2026-04-26 21:23:06 -04:00
Joseph Doherty 79f73e04fd Issue #49: add cross-language smoke matrix 2026-04-26 21:21:49 -04:00
Joseph Doherty f2118f7028 Issue #50: document client packaging 2026-04-26 21:20:43 -04:00
Joseph Doherty 0a670eb381 Issue #35: add parity fixture matrix 2026-04-26 20:47:05 -04:00
Joseph Doherty af42891d5a Issue #47: scaffold Java Gradle build 2026-04-26 20:36:27 -04:00
Joseph Doherty f861a8b3b8 Issue #45: scaffold Python package 2026-04-26 20:22:35 -04:00
Joseph Doherty 0f17a1d1d9 Add live MXAccess worker smoke test 2026-04-26 19:58:33 -04:00
Joseph Doherty f049d3e603 Merge remote-tracking branch 'origin/main' into agent-3/issue-43-scaffold-rust-workspace 2026-04-26 19:49:30 -04:00
Joseph Doherty ee88f9d647 Scaffold Rust client workspace 2026-04-26 19:47:26 -04:00
Joseph Doherty f7929cc12f Merge remote-tracking branch 'origin/main' into agent-2/issue-33-implement-graceful-shutdown
# Conflicts:
#	src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs
#	src/MxGateway.Worker/Ipc/WorkerPipeClient.cs
#	src/MxGateway.Worker/Ipc/WorkerPipeSession.cs
2026-04-26 19:41:04 -04:00
Joseph Doherty d890eff862 Implement graceful worker shutdown 2026-04-26 19:36:22 -04:00
Joseph Doherty bcfbd1cfc8 Merge remote-tracking branch 'origin/main' into agent-3/issue-41-scaffold-go-module 2026-04-26 19:30:16 -04:00
Joseph Doherty 8e3b0c1c4a Scaffold Go client module 2026-04-26 19:27:27 -04:00
Joseph Doherty bd4be85f26 Merge remote-tracking branch 'origin/main' into agent-1/issue-38-scaffold-dotnet-client-projects 2026-04-26 19:25:15 -04:00
Joseph Doherty 7331c6157a Scaffold .NET client projects 2026-04-26 19:25:07 -04:00