Commit Graph

669 Commits

Author SHA1 Message Date
Joseph Doherty
da8a3e46f7 Merge branch 'feat/tasks-12-14-22-23'
Wave 2 of the task-list run:
- #12 harden the Phase 6.2 deferred authz gates (28 new tests; gates already shipped in fb6dd34)
- #14 Phase 6.4 Stream B.5 — five-identifier ranked equipment search
- #22 docs/v2/phase-7-status.md — Phase 7 reconciliation (Phase 7 ~complete, 5 gaps)
- #23 retire docs/v2/lmx-followups.md
2026-05-18 04:42:39 -04:00
Joseph Doherty
09af8d2830 docs: add Phase 7 status reconciliation document
Audits every Phase 7 plan stream (A-H) against the repo, confirms the
exit gate is fully closed, and records the five genuine remaining gaps:
OPC UA method-call dispatch for alarm Ack/Confirm/Shelve, the
/virtual-tags and /scripted-alarms Admin UI tabs, the script log viewer,
and the missing production IHistoryWriter for virtual-tag historization.
Also notes that docs/v2/v2-release-readiness.md carries a stale "out of
scope" label — Phase 7 shipped completely after that doc was last updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:40:32 -04:00
Joseph Doherty
6c78027b5a docs: retire docs/v2/lmx-followups.md (all items DONE, pre-PR-7.2 arch)
Every item in lmx-followups.md was marked DONE and rooted in the
retired GalaxyProxyDriver / OtOpcUaGalaxyHost named-pipe architecture
deleted in PR 7.2. No live or unique content remains: v2-release-readiness.md
is the canonical open-work tracker. Remove the file and drop the
now-dead link from docs/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
bb1854b2f8 feat(admin): add five-identifier ranked equipment search (Phase 6.4 Stream B.5)
Implements the missing Stream B.5 search from the Phase 6.4 plan:
- EquipmentService.SearchAsync scopes to a cluster, scores hits across
  ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid (decision #117):
  exact = 100, prefix = 50, fuzzy (opt-in) = 20; published generation
  outranks draft on equal scores per spec.
- EquipmentSearchHit record carries Score + MatchedField for badge display.
- EquipmentTab.razor gains a search panel with per-row matched-field chips
  (green exact, amber prefix, grey fuzzy) and fuzzy opt-in checkbox.
- 14 new unit tests in EquipmentSearchTests.cs (Category=Unit) cover exact,
  prefix, fuzzy, case-insensitivity, tie-break, cross-cluster isolation, and
  maxResults cap; all 148 admin unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
70d7166a39 test(server): harden deferred authz gates — task #12 Browse/Subscribe/Call/AlarmAck
Add DeferredGateHardeningTests (28 unit tests) covering the Phase 6.2
compliance-checklist gaps left by the per-gate unit suites that shipped
with the gate implementations:

- Lax-mode fall-through for CreateMonitoredItems and Call gates (null
  identity and identity-without-LDAP-groups both skip denial in lax mode,
  consistent with BrowseGatingTests.Lax_mode_null_identity)
- Flag isolation: Subscribe-only grant does NOT imply Read; Read-only
  grant does NOT imply Subscribe; HistoryRead-only grant does NOT imply
  Read and vice versa (Phase 6.2 compliance: "HistoryRead uses its own flag")
- Alarm-bit isolation: AlarmAcknowledge alone does not grant AlarmConfirm
  or AlarmShelve; Browse alone does not grant AlarmAcknowledge
- AlarmShelve falls through to OpcUaOperation.Call in MapCallOperation
  (documents the ShelvedStateMachine per-instance NodeId limitation noted
  in the implementation, with the follow-up path: MethodCall grant covers it)
- Complete OpcUaOperation→NodePermissions mapping coverage for all deferred
  operations (Browse, CreateMonitoredItems, TransferSubscriptions, Call,
  AlarmAcknowledge, AlarmConfirm, AlarmShelve) — both positive and
  wrong-bit negative cases
- Multi-group union for deferred gates (grp-browse ∪ grp-ack gives both
  Browse and AlarmAcknowledge without leaking Read or Call)

Build: 0 errors on Server.csproj (verified against main repo build which
carries the gRPC-generated Galaxy driver artifacts the isolated worktree
lacks — that pre-existing gap is unrelated to these changes).
Test count: 247 → 275 (+28 unit, 0 failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:57 -04:00
Joseph Doherty
6968872e5d Merge branch 'feat/tasks-9-10-11-admin-hardening'
Wave 1 of the task-list run — three Admin hardening tasks:
- #9  resilient LDAP role-grant reads (ResilientLdapGroupRoleMappingService)
- #10 InteractiveServer render mode on 13 interactive pages + hub-URL fixes
- #11 ZTag/SAPID reservation pre-check in equipment CSV import (task #197)
2026-05-18 04:25:50 -04:00
Joseph Doherty
020c30f9a6 feat(admin): add ZTag/SAPID reservation pre-check to equipment CSV import (task #197)
ApplyReservationPreCheckAsync on EquipmentImportBatchService queries active
ExternalIdReservation rows in a single round-trip at parse time; rows whose ZTag
or SAPID is claimed by a different EquipmentUuid are moved from AcceptedRows to
RejectedRows with a descriptive reason. ImportEquipment.razor calls the check
after EquipmentCsvImporter.Parse so conflicts appear in the preview before the
operator clicks Stage + Finalise. Updated notice banner to reflect the pre-check
is now live; 6 new unit tests cover conflict, no-conflict, same-UUID, released-
reservation, and empty-input paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
a8dabc47f9 fix(admin): add InteractiveServer render mode to remaining interactive pages
ModbusAddressPreview (@bind on dropdowns + child ModbusAddressEditor with @oninput),
ModbusDiagnostics (@onclick Refresh), and NewCluster (EditForm with Nav.NavigateTo on submit)
were missed in the first pass — all three require interactivity but had no @rendermode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
43291d7fdd fix(admin): add InteractiveServer render mode to all interactive Blazor pages; fix wrong hub URLs
Eight pages were using @onclick handlers, Timers, or HubConnections but had no @rendermode,
causing interactivity to be silently dead under static SSR. Added @rendermode RenderMode.InteractiveServer
(with the required @using Microsoft.AspNetCore.Components.Web) to: AlarmsHistorian, Certificates,
Fleet, Home, Hosts, Reservations, DraftEditor, and ImportEquipment.

Also fixed two hub URL bugs: AclsTab and RedundancyTab were connecting to the non-existent
/hubs/fleet-status path; corrected to /hubs/fleet which matches the MapHub<FleetStatusHub>
call in Program.cs. Build: 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
75b91ebb97 feat(admin): wrap LdapGroupRoleMappingService in Phase 6.1-style resilience pipeline (Phase 6.2 Stream A.2)
Add ResilientLdapGroupRoleMappingService — a singleton decorator that wraps the
hot-path GetByGroupsAsync call in a Polly pipeline (timeout 2s → retry 3× jittered
→ fallback to in-memory sealed snapshot) so a transient Config DB outage at
Admin sign-in falls back to the last-known-good mapping set rather than denying
every login. The static LdapOptions.GroupToRole bootstrap dictionary in
AdminRoleGrantResolver remains the lock-out-proof floor regardless of DB state.

DI wiring uses keyed services: LdapGroupRoleMappingService (EF, scoped) is
registered under key "LdapGroupRoleMappingService.Inner"; the resilient singleton
decorator is the primary ILdapGroupRoleMappingService binding. The singleton
avoids the captive-dependency anti-pattern by using IServiceScopeFactory to open
a short-lived scope for each DB call.

Write methods (CreateAsync, DeleteAsync, ListAllAsync) pass through unchanged —
resilience is read-path only per Phase 6.1 design decision.

15 new unit tests cover: DB success/failure/retry paths, snapshot sealing and
per-group-set isolation, order-independent cache key normalisation, cancellation
propagation, and pass-through method routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:22 -04:00
Joseph Doherty
412cdec9b1 Merge branch 'docs/reservations'
Add docs/Reservations.md documenting the external-ID reservation flow.
2026-05-18 03:48:27 -04:00
Joseph Doherty
b90718013e docs: add Reservations.md — external-ID reservation flow
Document the ZTag / SAPID external-ID reservation subsystem: what a
reservation is, why it sits outside the generation flow (decision #124),
the ExternalIdReservation table, the lifecycle (author → publish
precheck → publish-time MERGE → FleetAdmin release), and the
/reservations Admin page. Linked from the docs README Operational table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:48:21 -04:00
Joseph Doherty
4a2e993a95 Merge branch 'chore/admin-trim-page-notices'
Trim explanatory notices from the role-grants and certificates pages.
2026-05-18 03:34:20 -04:00
Joseph Doherty
adbbb5e7d0 chore(admin): trim explanatory notices from role-grants and certificates
- Role grants: drop the page notice describing the LDAP-group → role
  mapping semantics; this is moving to the user instructions.
- Certificates: drop the trailing "operators should retry the rejected
  client's connection" note from the trust notice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:34:13 -04:00
Joseph Doherty
a8ef73dcb5 Merge branch 'feat/ldap-role-grants-signin'
Consume DB-backed LDAP role grants at Admin sign-in, with fleet-wide
and cluster-scoped roles, and fix the role-grants page interactivity.
2026-05-18 03:18:27 -04:00
Joseph Doherty
22fd314694 fix(admin): make the role-grants page interactive
The role-grants page is the authoring surface for LdapGroupRoleMapping
rows, but it had no @rendermode — so it rendered as static SSR and its
@onclick handlers (Add grant, Revoke) never fired. App.razor's <Routes/>
sets no global render mode; only ClusterDetail opted in.

- Add @rendermode RenderMode.InteractiveServer.
- Fix the SignalR hub URL: the page connected to /hubs/fleet-status,
  but FleetStatusHub is mapped at /hubs/fleet. Static SSR masked this
  (OnAfterRenderAsync never ran); enabling interactivity surfaced the
  404 that terminated the circuit.

Verified in-browser: Add grant opens the form, a cluster-scoped grant
saves and lists, Revoke removes it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:17:56 -04:00
Joseph Doherty
8adb83afee feat(admin): consume LDAP role grants at sign-in, incl. cluster scoping
The role-grants page authored LdapGroupRoleMapping rows but nothing
consumed them — sign-in only read the static appsettings GroupToRole
dictionary. Wire the DB-backed grants into the auth path.

- AdminRoleGrantResolver merges the static bootstrap dictionary (always
  fleet-wide, lock-out-proof) with DB grants; system-wide rows fold into
  fleet roles, cluster-scoped rows become (cluster, role) grants.
- Login emits a ClaimTypes.Role claim per fleet role and a cluster_role
  claim per cluster-scoped grant; lock-out check spans both scopes.
- ClusterRoleClaims + ClaimsPrincipal extensions resolve the effective
  role for a cluster (highest of fleet-wide and cluster-scoped).
- ClusterAuthorizeView gates cluster pages: ClusterDetail (view +
  ConfigEditor draft actions), DraftEditor (ConfigEditor / FleetAdmin
  publish), DiffViewer (ConfigViewer), ImportEquipment (ConfigEditor).
- RoleGrants page is now FleetAdmin-only; Account surfaces fleet-wide
  and cluster-scoped grants separately.

Control-plane only — decision #150 holds, NodeAcl is untouched.

Tests: AdminRoleGrantResolverTests + ClusterRoleClaimsTests (22).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:09:06 -04:00
Joseph Doherty
1e04796953 Merge branch 'feat/admin-technical-light-design'
Restyle the Admin web UI with the technical-light design system,
and fix the LDAP sign-in path so it actually authenticates against
GLAuth (form binding, service-account DN, user-search attribute).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:49:35 -04:00
Joseph Doherty
5f5bfe1ea5 fix: make Admin LDAP sign-in work against GLAuth
Three bugs blocked sign-in entirely:

- Login.razor is static-SSR but its form model lacked
  [SupplyParameterFromForm], so the posted username/password never
  bound — SignInAsync saw empty fields and bailed before LDAP was
  contacted. Annotate the model; seed it in OnInitialized since
  BL0008 forbids an initializer on a [SupplyParameterFromForm]
  property.
- appsettings.json ServiceAccountDn used ou=svcaccts, which GLAuth
  reads as a (non-existent) group — the service-account bind failed
  with "Group not found". Use cn=serviceaccount,dc=lmxopcua,dc=local.
- LdapAuthService resolved the user DN by searching (uid=...), but
  GLAuth keys users by cn. Add an LdapOptions.UserNameAttribute knob
  (default cn for GLAuth; set sAMAccountName for Active Directory)
  and use it for the search filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:48:00 -04:00
Joseph Doherty
482d5f5637 feat: restyle Admin UI with the technical-light design system
Adopt the technical-light design system across the Admin web UI:

- Vendor theme.css + IBM Plex woff2 fonts into wwwroot; include
  theme.css globally after Bootstrap.
- Rebuild MainLayout: top app-bar (brand mark, breadcrumb, connection
  pill) + hairline-ruled side rail with accent-bordered active link.
- Convert all 33 pages to the component catalog — tables to
  panel + data-table (num/mono columns), KPI cards to agg-grid,
  detail blocks to metric-card/kv rows, badges to chips, alerts to
  panel notice, headings to page-title/panel-head, .rise reveals.
- Buttons/forms stay on Bootstrap; theme.css restyles them via
  --bs-* overrides. View-specific layout lives in app.css; all
  colour/type comes from theme.css tokens.

Also fix a pre-existing /fleet 500: the node-state query ordered on
a property of a constructed FleetNodeRow record, which EF Core
cannot translate. Order the join's columns before projecting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:20:09 -04:00
Joseph Doherty
31b9468102 Merge branch 'fix/admin-configdb-host'
fix: point Admin ConfigDb at the shared SQL host (10.100.0.35,14330).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 06:27:07 -04:00
Joseph Doherty
cf024c8150 fix: point Admin ConfigDb at the shared SQL host
The Admin appsettings.json still carried Server=localhost,14330 — a
straggler from before the 2026-04-28 Docker migration that moved SQL
Server onto the shared Linux host. Every other checked-in appsettings
was rewritten then; this one was missed, so the Admin web UI returned
HTTP 500 on every page (SqlException, connection timeout). Repoint it
at 10.100.0.35,14330 to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 05:04:14 -04:00
Joseph Doherty
0aee14686b Merge branch 'chore/solution-module-folders'
Organize the solution into module folders (Core/Server/Drivers/Client/
Tooling) on disk and in ZB.MOM.WW.OtOpcUa.slnx, with all .csproj, script,
and docs path references updated to match. Build green; unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:28:05 -04:00
Joseph Doherty
4e1751e1a4 docs: correct CLAUDE.md test commands for per-module test layout
The Build Commands block referenced tests/ZB.MOM.WW.OtOpcUa.Tests and
.IntegrationTests, which never existed in v2. Replace with the actual
per-module layout under tests/<module>/ and note which suites need
Docker fixtures or the central SQL Server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:11:36 -04:00
Joseph Doherty
969b0847a1 docs: update path references for module-folder reorganization
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:10:29 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00
69f02fed7f Merge pull request 'docs: alarms-over-gateway plan banner — record A.2 dev-rig finding' (#418) from track-d1-followup-plan-banner into master 2026-04-30 21:31:40 -04:00
Joseph Doherty
5ed26d2ec6 docs: alarms-over-gateway plan banner — record A.2 dev-rig finding
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions over a
   named-pipe IPC. Recovers full v1 fidelity but adds operational
   complexity.

The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:29:16 -04:00
439b39463b Merge pull request 'scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)' (#417) from track-d1-refresh-services into master 2026-04-30 21:13:58 -04:00
62d01e76e5 Merge pull request 'docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)' (#416) from track-b5-docs-memory-housekeeping into master 2026-04-30 21:11:29 -04:00
Joseph Doherty
32b872d5c7 scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)
Seventeenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the script that the
plan calls for in Track D — the actual smoke-run validation
on the dev rig (publish, restart, fire alarms, capture artifacts)
remains operator work; this PR ships the automation that the
operator drives.

scripts/install/Refresh-Services.ps1 — single-shot refresh
script. Designed to run elevated on the deploy host
(DESKTOP-6JL3KKO today; production uses a separate runbook).
The script:

- Stops services in reverse-dependency order (OtOpcUa →
  OtOpcUaWonderwareHistorian → MxAccessGw) and force-kills any
  residual processes (avoids the publish-time MSB3027 file-lock
  the original install script hit).
- Snapshots existing C:\publish trees to
  C:\publish\.backup-YYYY-MM-DD-HHMMSS\ for rollback (skip with
  -SkipBackup).
- Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
  binaries from the sibling repo.
- Publishes OtOpcUa Server + Wonderware historian sidecar from
  this repo.
- Ensures OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true is set on
  the historian service env block (PR C.2 toggle).
- Starts services in forward-dependency order with the
  inter-service waits the original install used.
- Smoke-verifies (service status, listening ports 5120 / 4840
  / 4841, recent log tails).

Supports -WhatIf for dry-run inspection without touching the
running services.

docs/v2/dev-environment.md — new "Service Refresh —
Refresh-Services.ps1" section between Credential Management
and Test Data Seed. Cross-references the plan's Track D
functional verification scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:11:27 -04:00
Joseph Doherty
89004c052c docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)
Sixteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Closes the documentation sweep
the plan calls for.

- docs/AlarmTracking.md — promoted top-level v2-final architecture
  doc (was a worktree-only draft pre-epic). Covers the three alarm
  sources (Galaxy MxAccess driver-native / Galaxy sub-attribute
  fallback / scripted alarms), how they converge on
  AlarmConditionService, the Acknowledge routing decision in
  DriverNodeManager (driver-native preferred over IWritable
  sub-attribute fallback), the sidecar historian write-back path
  for non-Galaxy producers, and cross-references to the plan +
  v1 archive.
- docs/v1/AlarmTracking.md — banner pointing readers at the v2
  doc; preserved as historical record.
- docs/drivers/Galaxy.md — capability list updated to include
  IAlarmSource (now eight capabilities, restored by B.2). Replaced
  the "IAlarmSource retired in 7.2" sentence with the restoration
  note + cross-link to docs/AlarmTracking.md.
- docs/plans/alarms-over-gateway.md — completion banner at the
  top of the plan, marking 14 of 16 PRs shipped 2026-04-30 and
  noting that A.2 + A.4 + D.1 are the hardware-gated follow-up.

Memory entries updated separately:
- project_alarms_over_gateway_epic.md (new) — epic summary +
  per-PR digest.
- project_galaxy_via_mxgateway.md — added "Alarms restored"
  bullet pointing at the new architecture.
- project_server_history_alarm_subsystems.md — bullet 2 updated
  to describe the new ack-routing decision (B.3) + bullet 3
  added describing the historian write-back path that B.4 + C.1
  + C.2 light up.
- MEMORY.md index — new pointer entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:09:04 -04:00
2baca785ad Merge pull request 'abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)' (#415) from track-e7-alarm-event-args-extension into master 2026-04-30 17:49:19 -04:00
Joseph Doherty
1d62709060 abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)
Fourteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged) and B.3 (DriverNodeManager prefers
driver-native ack, merged).

Three new optional fields on Core.Abstractions.AlarmEventArgs:

- OperatorComment — populated by the driver-native gateway path on
  Acknowledge transitions. Null on raise / clear, and null on the
  sub-attribute fallback path where the comment collapses into a
  single string write.
- OriginalRaiseTimestampUtc — preserved across Acknowledge so OPC
  UA Part 9 conditions keep the original raise time.
- AlarmCategory — taxonomy bucket from the upstream alarm system.
  Maps to ConditionClassName downstream when a class mapping is
  configured.

GalaxyDriver.OnPumpAlarmTransition populates the new fields from
GalaxyAlarmTransition (PR B.1). Empty strings collapse to null so
consumers can use is-null rather than is-null-or-empty checks.

Client.Shared mirror DTO (Client.Shared/Models/AlarmEventArgs)
gains the same three properties so the Client.UI / Client.CLI
surfaces can reflect the rich payload — the actual UI/CLI
verbose-output and Show-Details rendering ship as a follow-up
PR; this PR locks in the payload contract.

Tests:
- 2 new tests in Driver.Galaxy.Tests pin the populated-vs-null
  behaviour for full-payload Acknowledge and bare-bones Raise
  transitions respectively.
- Solution build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:46:47 -04:00
0b5a4a676e Merge pull request 'server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)' (#414) from track-b3-prefer-driver-native-alarm into master 2026-04-30 17:23:09 -04:00
Joseph Doherty
edc984987b server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)
Thirteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged).

When DriverNodeManager registers an AlarmConditionState with
AlarmConditionService, it now picks the acknowledger:

- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger
  routes the operator comment through IAlarmSource.AcknowledgeAsync
  via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline,
  no-retry per decision #143). Preserves operator-comment fidelity
  end-to-end — the value-driven sub-attribute write collapses the
  comment into a single string write that loses MxAccess metadata.
- Driver does not implement IAlarmSource →
  DriverWritableAcknowledger fallback (existing behaviour for
  AbCip / Modbus / S7 / etc).

The dedup logic that prefers driver-native transitions over
sub-attribute synthesis lives in AlarmConditionService and is
already in place — drivers that surface OnAlarmEvent (B.2) feed
the service directly, while sub-attribute writes still flow
through DriverNodeManager's ConditionSink so a Galaxy template
without $Alarm extensions stays functional.

Tests:
- 2 new routing-decision tests in
  DriverAlarmSourceAcknowledgerRoutingTests pin the
  IAlarmSource detection used at registration time.
- Server build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:20:45 -04:00
6126374594 Merge pull request 'driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)' (#413) from track-b2-galaxy-driver-ialarmsource into master 2026-04-30 17:18:20 -04:00
Joseph Doherty
38afc234ff driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.1 (EventPump
dispatch, merged) and PR E.2 (.NET SDK alarm methods, merged).
Restores the v1 IAlarmSource capability that PR 7.2 retired with the
legacy Galaxy.Host / Galaxy.Proxy projects.

GalaxyDriver gains:
- IAlarmSource on the class declaration → eight capabilities total
  (IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable /
  IRediscoverable / IHostConnectivityProbe / IAlarmSource).
- SubscribeAlarmsAsync — returns a sentinel handle and starts the
  shared EventPump (alarm wiring is lazy on first sub).
  Multiple handles share the same gateway stream; the server-side
  AlarmConditionService dispatches per-source-node downstream.
- UnsubscribeAlarmsAsync — symmetric handle removal; rejects
  handles not issued by this driver.
- AcknowledgeAsync — issues one gateway RPC per acknowledgement
  through IGalaxyAlarmAcknowledger. ConditionId carries the alarm
  full reference; falls back to SourceNodeId when empty.
- OnAlarmEvent — bridges EventPump.OnAlarmTransition (B.1) onto
  AlarmEventArgs. Suppressed when no alarm subscription is active so
  untracked transitions don't leak through.

New runtime types:
- IGalaxyAlarmAcknowledger — test seam.
- GatewayGalaxyAlarmAcknowledger — production wrapper around
  MxGatewayClient.AcknowledgeAlarmAsync (PR E.2). Maps native
  MxStatus failures to a logged warning rather than a thrown
  exception so a transient MxAccess hiccup doesn't fail the
  operator's Acknowledge.
- GalaxyAlarmSubscriptionHandle — driver-side IAlarmSubscriptionHandle.

Production runtime construction in BuildProductionRuntimeAsync wires
the acknowledger when not pre-injected; tests inject a fake via the
internal ctor.

Tests:
- 7 new tests in GalaxyDriverAlarmSourceTests — subscribe → event
  fire path, suppress without subscription, unsubscribe stops flow,
  foreign-handle rejection, ack routes per-request, ack falls back
  to SourceNodeId, ack throws NotSupported without acknowledger.
- Full Driver.Galaxy.Tests: 203 passed (was 196; 7 new).

Operates as a "stub-ready" surface — runtime ack calls will return
PERMISSION_DENIED until A.3 ships the gateway-side dispatch, and no
alarm transitions will arrive until A.2 adds the worker MxAccess
subscription. Both will activate this code path automatically when
the gateway side lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:15:46 -04:00
95422995c0 Merge pull request 'server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)' (#412) from track-b4-sidecar-alarm-historian-writer into master 2026-04-30 16:33:27 -04:00
Joseph Doherty
6e282b9946 server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)
Sixth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.2 (sidecar
serves IAlarmEventWriter when enabled), already merged.

Today Phase7Composer.ResolveHistorianSink only scans drivers for an
IAlarmHistorianWriter — no Galaxy driver provides one since PR 7.2,
so the resolution falls through to NullAlarmHistorianSink and
scripted-alarm transitions are silently discarded.

WonderwareHistorianClient already implements IAlarmHistorianWriter
and Program.cs:178 already registers it as a singleton when
Historian:Wonderware:Enabled=true. The gap was that Phase7Composer
ignored DI: this PR adds an optional injectedWriter constructor
parameter, and ASP.NET Core DI resolves it from the same
registration when present.

- Phase7Composer constructor: new optional IAlarmHistorianWriter?
  injectedWriter parameter (default null). Backward-compatible —
  existing callers don't need to change; DI populates it
  automatically when the singleton is registered.
- New static SelectAlarmHistorianWriter helper — resolution order
  is driver → DI → null. Drivers win when both are present so a
  future GalaxyDriver-as-IAlarmHistorianWriter takes the write
  path directly, preserving the v1 invariant where a driver that
  natively owns the historian client doesn't bounce through the
  sidecar IPC.
- ResolveHistorianSink uses the helper + emits a structured log
  line identifying which source provided the writer.

Tests:
- 4 SelectAlarmHistorianWriter precedence tests — no source / DI
  only / driver wins over DI / first-driver-with-writer wins.
- Pre-existing 4 HostStatusPublisherTests SQL failures unrelated
  to this change (require the docker-host SQL Server at
  10.100.0.35,14330 per CLAUDE.md). Phase7 + alarm tests all
  green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:31:00 -04:00
f67b3b1b30 Merge pull request 'sidecar: wire IAlarmEventWriter into Program.cs (PR C.2)' (#411) from track-c2-program-wires-alarm-writer into master 2026-04-30 16:22:36 -04:00
Joseph Doherty
ffacbe0370 sidecar: wire IAlarmEventWriter into Program.cs (PR C.2)
Fifth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.1
(AahClientManagedAlarmEventWriter), already merged.

Today HistorianFrameHandler is constructed at Program.cs line 57
without an alarmWriter, so every WriteAlarmEvents frame replies
"Sidecar not configured with an alarm-event writer" and the lmxopcua
side keeps the row queued. C.2 wires a real writer behind a new
OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED toggle.

- Program.BuildAlarmWriter — gated on the env var (default true,
  fail-open under accidental misconfiguration). Constructs an
  AahClientManagedAlarmEventWriter wrapping a
  SdkAlarmHistorianWriteBackend with the same connection config the
  read path uses.
- Install-Services.ps1 — appends OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
  to the OtOpcUaWonderwareHistorian service env block when the
  sidecar is installed. Read-only deployments flip it to false at
  service-config edit time without re-installing.
- HistorianFrameHandler signature already accepts
  IAlarmEventWriter? — supplying non-null at line 57 lights up
  the WriteAlarmEvents reply path that's been dormant since PR 3.3.

Until PR D.1 pins the live aahClientManaged entry point, the
SdkAlarmHistorianWriteBackend reports RetryPlease for every event
with a structured diagnostic. The lmxopcua-side
SqliteStoreAndForwardSink retains queued events; same effective
behaviour as today's NullAlarmHistorianSink fallback but with
visible diagnostics rather than silent discard.

Tests:
- 6 BuildAlarmWriter env-var cases — unset / true / false /
  unrecognized → default-on / capitalization variants.
- Full sidecar test suite: 56 passed (was 48; 8 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:20:11 -04:00
8a4526a376 Merge pull request 'sidecar: AahClientManagedAlarmEventWriter implements IAlarmEventWriter (PR C.1)' (#410) from track-c1-aah-alarm-writer into master 2026-04-30 16:19:36 -04:00
Joseph Doherty
f99cf5033a sidecar: AahClientManagedAlarmEventWriter implements IAlarmEventWriter (PR C.1)
Fourth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Independent of Tracks A and B —
the sidecar slot defined in HistorianFrameHandler line 242 is unwired
today; PR C.2 (next) flips it on in Program.cs.

- AlarmHistorianWriteOutcome (sidecar-local, net48 — twin of
  Core.AlarmHistorian.HistorianWriteOutcome which is net10): Ack /
  RetryPlease / PermanentFail.
- IAlarmHistorianWriteBackend abstraction so the SDK call can be
  faked in unit tests.
- AahClientManagedAlarmEventWriter implements IAlarmEventWriter,
  delegates to the backend, maps Ack→true / Retry|Permanent→false
  for the IPC bool[] reply contract. Backend exception → whole
  batch RetryPlease (preserves the sender's queue across transients
  rather than dropping). Wrong-count return defends against a
  backend bug desyncing queue accounting.
- SdkAlarmHistorianWriteBackend — production binding skeleton.
  Reports RetryPlease for every event and logs a structured
  diagnostic until PR D.1 pins the live aahClientManaged entry
  point against the dev rig. The sender's SqliteStoreAndForwardSink
  retains queued events, mirroring today's NullAlarmHistorianSink
  behaviour but with visible diagnostics instead of silent discard.
- MapOutcome shared helper — pinned via theory tests so the D.1
  swap can change the SDK call site without reshuffling the
  HRESULT → outcome mapping.

Tests:
- 6 writer tests — empty batch / single Ack / mixed Ack-Retry-
  Permanent-Ack ordering / backend-throw → RetryPlease batch /
  cancellation propagates / wrong-count defensive degrade.
- 5 outcome theory cases — hresult 0 → Ack, malformed wins over
  hresult 0, comm error → Retry, unknown failure → Retry,
  malformed + comm → Permanent.
- Full sidecar test suite: 48 passed (was 42; 6 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:17:05 -04:00
c59bf59635 Merge pull request 'driver-galaxy: EventPump dispatches OnAlarmTransition family (PR B.1)' (#409) from track-b1-eventpump-alarm into master 2026-04-30 15:44:32 -04:00
Joseph Doherty
7853e94f4b driver-galaxy: EventPump dispatches OnAlarmTransition family (PR B.1)
Second PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR A.1 in mxaccessgw
(merged) which added the OnAlarmTransitionEvent body + family. No
runtime impact yet — the gateway doesn't emit the new family until
A.3 ships; this PR just stops dropping it on the floor.

EventPump.Dispatch becomes a switch on MxEventFamily. The new
DispatchAlarmTransition decodes the proto event, runs the raw severity
through MxAccessSeverityMapper (the same four-bucket ladder v1 used —
250/500/750/1000 boundaries per docs/v1/AlarmTracking.md), and fires
an internal OnAlarmTransition event with a GalaxyAlarmTransition
record carrying the full payload.

Body absent or transition-kind unspecified → counted via
galaxy.alarm_transitions.decoding_failures and dropped. Gateway
version skew or worker malformed event therefore degrades to "fall
back to the sub-attribute path" rather than crashing the pump.

GalaxyDriver consumes the internal event in PR B.2 (next), wrapping
it onto IAlarmSource.OnAlarmEvent. The richer fields (operator user
+ comment, original raise time, category) become visible on the OPC
UA Part 9 condition once AlarmEventArgs gets extended in E.7.

Tests:
- MxAccessSeverityMapperTests — full bucket ladder + clamp behaviour
  for negative + out-of-range inputs.
- EventPumpAlarmTests — raise/ack/clear sequence dispatches in order
  with operator metadata + original-raise preserved; unspecified
  kind drops; missing body drops; mixed data-change + alarm streams
  dispatch independently; OnWriteComplete / OperationComplete
  filtered out.

Full Driver.Galaxy.Tests suite: 196 passed (was 191 — 5 new tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:41:44 -04:00
Joseph Doherty
49ae6e7b6f docs: alarms-over-gateway — add Track E client surface refresh
Cover both client surfaces that become user-visible when the alarm
path lights up:

- mxaccessgw client SDKs in 5 languages (.NET, Python, Go, Java,
  Rust). E.1 regens proto across all of them; E.2-E.6 add per-language
  alarms helpers (subscribe / acknowledge / query-active) plus matching
  CLI verbs.
- lmxopcua OPC UA-facing clients (Client.CLI, Client.UI). E.7 extends
  AlarmEventArgs with the new optional fields, surfaces them in the
  CLI's --verbose / --json output and the UI's Show-details toggle,
  and updates ClientRequirements + Client.{CLI,UI}.md.

Sequencing: E.1 first (mechanical regen), then E.2-E.7 in parallel.
E.2 (.NET) is on the critical path because lmxopcua consumes it; the
other-language SDKs can ship asynchronously without gating D.1.

12 PRs grew to 19 total: 4 in A, 5 in B, 2 in C, 7 in E, 1 in D.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:20:57 -04:00
Joseph Doherty
8d0e13e69e docs: alarms-over-gateway plan — add Track D deployment refresh
After A/B/C all merge, the running services on C:\publish need to be
refreshed before the Galaxy alarm-event family flows end-to-end. Add
PR D.1: a Refresh-Services.ps1 script + runbook for stopping in
reverse-dependency order, restaging binaries from the build outputs,
restarting in forward-dependency order, and capturing a smoke-run
artifact.

D.1 gates B.5 (docs sweep) — the documentation records the
as-deployed shape, so the deployment has to be live first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:11:23 -04:00
Joseph Doherty
7367b3e23f docs: alarm-historian write moves from gateway to historian sidecar
Revise the alarms-over-gateway plan based on review feedback:

The gateway is for MxAccess (live data + Galaxy hierarchy); the
Wonderware historian sidecar is for aahClientManaged (time-series +
alarms historian). Two SDKs, two concerns. Routing alarm-historian
write-back through the gateway would force coupling that doesn't need
to exist — the sidecar already has a dormant WriteAlarmEvents IPC slot
ready to wire.

Drop A.5 (gateway WriteHistorianEvent RPC). Add Track C — two PRs in
the historian sidecar that complete the dormant slot:
  C.1 AahClientManagedAlarmEventWriter implementation
  C.2 Program.cs wires the writer into HistorianFrameHandler

B.4 reverses from "delete the IPC slot" to "consume the IPC slot" via
a new SidecarAlarmHistorianWriter on the lmxopcua side.

Also tightens Why-section #3 + D5 to make explicit that the path is
exclusively for non-Galaxy alarm producers (scripted alarms today, AB
CIP ALMD or others future). Galaxy-native alarms reach AVEVA Historian
via System Platform's own HistorizeToAveva toggle, independent of
anything in our stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:08:58 -04:00
Joseph Doherty
65a5f64931 docs: plan — alarms over the mxaccessgw gateway
Coordinated cross-repo epic to restore the three v1 alarm capabilities
that PR 7.2 regressed: rich MxAccess alarm-event metadata, native
Acknowledge semantics, and the IAlarmHistorianWriter write-back path.

Architectural split: gateway owns MxAccess transport (new
OnAlarmTransition event family + AcknowledgeAlarm / QueryActiveAlarms /
WriteHistorianEvent RPCs); lmxopcua keeps the OPC UA Part 9 state
machine, ACL/role enforcement, and multi-source aggregation. The
existing value-driven sub-attribute path stays as fallback.

10 PRs total — 5 in mxaccessgw, 5 in lmxopcua — sequenced so each
side's work is independently reviewable. End-of-epic gate is a parity
matrix run with five new alarm scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:02:48 -04:00