Commit Graph

678 Commits

Author SHA1 Message Date
Joseph Doherty
392b219233 fix(tests): stabilize three flaky tests under parallel full-solution load
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.

#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.

#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:59:00 -04:00
Joseph Doherty
41f133a337 feat(admin-ui): add /virtual-tags, /scripted-alarms, and /script-log pages (tasks #25, #26, #27)
Gap 2 (#25): VirtualTagsTab.razor + /virtual-tags global page — list/create/toggle
virtual tags per draft generation with DataType, Script, trigger, Historize, Enabled
fields. Tab wired into DraftEditor.

Gap 3 (#26): ScriptedAlarmsTab.razor + /scripted-alarms global page — list/create
scripted alarms with AlarmType, Severity, MessageTemplate, PredicateScript,
HistorizeToAveva, Retain. SeverityBand helper shows Low/Medium/High/Critical label.
Tab wired into DraftEditor.

Gap 4 (#27): ScriptLogHub (SignalR IAsyncEnumerable stream) tails scripts-*.log with
optional ScriptName filter; ScriptLog.razor provides Start/Stop/Clear controls plus
level filter dropdown. Hub registered at /hubs/script-log in Program.cs.

Nav rail gains a "Scripting" eyebrow with entries for all three pages.
19 new unit tests for ScriptLogHub parse/filter/tail helpers (Category=Unit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:59 -04:00
Joseph Doherty
bc8ff7a5fe feat(phase7): wire RingBufferHistoryWriter as production IHistoryWriter for virtual tags (Gap 5)
Closes Phase 7 Gap 5: VirtualTagEngine called IHistoryWriter.Record per evaluation
when Historize=true but Phase7EngineComposer always passed NullHistoryWriter, so
virtual-tag history was computed but never persisted.

The fix:
- New RingBufferHistoryWriter implements both IHistoryWriter (write port for the
  evaluation pipeline) and IHistorianDataSource (read port for IHistoryRouter so
  OPC UA HistoryRead on virtual-tag nodes resolves here). Maintains one bounded
  ring buffer (1000 samples, configurable) per tag path; Record() is O(1) and
  never blocks evaluation.
- Phase7EngineComposer.Compose now accepts IHistoryRouter? and, when any
  VirtualTagDefinition.Historize=true, creates a RingBufferHistoryWriter, passes
  it to VirtualTagEngine as historyWriter, adds it to the disposables list, and
  registers it under the "virtual:" prefix in the router for HistoryRead dispatch.
- Phase7Composer accepts IHistoryRouter? from DI (already registered as singleton
  in Program.cs) and threads it through to Phase7EngineComposer.Compose.
- NullHistoryWriter remains as fallback when no tags request historization.
- 16 new unit tests in RingBufferHistoryWriterTests.cs cover ring-buffer semantics,
  eviction, per-tag isolation, ReadRawAsync windowing, IHistorianDataSource stubs,
  router registration, and the Historize=false / null-router fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
ca149ce907 feat(phase7): route OPC UA Part 9 Acknowledge/Confirm methods to ScriptedAlarmEngine (task #24)
Gap 1 of phase-7-status.md. Intercepts AcknowledgeableConditionType_Acknowledge and
AcknowledgeableConditionType_Confirm calls in DriverNodeManager.Call and dispatches
them to ScriptedAlarmEngine so OPC UA HMI clients can acknowledge/confirm scripted alarms
in addition to the existing Admin UI path. Shelve methods deferred (per-instance NodeIds,
not well-known type MethodIds — follow-up task). AlarmEngine is now exposed through
Phase7ComposedSources so the server wire-up passes it to every DriverNodeManager. 13 new
unit tests cover dispatch kernel, identity fallback, batch handling, and error paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
1913bda6b8 Merge branch 'fix/phase-6-1-stale-compliance-check' 2026-05-18 05:25:11 -04:00
Joseph Doherty
fa965ede3d fix(compliance): drop stale Galaxy.Proxy assertions from phase 6.1 gate
phase-6-1-compliance.ps1 asserted CircuitBreaker.cs / Backoff.cs exist
under src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Supervisor/ — but the
Galaxy.Proxy project was retired in PR 7.2 with the legacy in-process
Galaxy stack. Circuit-breaker + backoff resilience is now the Core
pipeline (DriverResiliencePipelineBuilder, per-device-keyed), which the
same gate already asserts and passes. The two stale checks always fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:25:11 -04:00
Joseph Doherty
7b3b6580b3 Merge branch 'fix/test-fixture-sql-host'
Fix DB-test fixture defaults missed in the 2026-04-28 SQL-host migration.
2026-05-18 05:12:21 -04:00
Joseph Doherty
41da84293a fix(tests): point DB-test fixture defaults at the migrated SQL host
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).

Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:12:20 -04:00
Joseph Doherty
16a87b08f3 docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
  concrete test matrix (A/B/C blocks), and a step-by-step cutover
  runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
  and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
  CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
  procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
  worker wiring in the mxaccessgw sibling repo, documenting the
  discovered AVEVA API surface, the architectural decision that blocks
  A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:53:36 -04:00
Joseph Doherty
da8a3e46f7 Merge branch 'feat/tasks-12-14-22-23'
Wave 2 of the task-list run:
- #12 harden the Phase 6.2 deferred authz gates (28 new tests; gates already shipped in fb6dd34)
- #14 Phase 6.4 Stream B.5 — five-identifier ranked equipment search
- #22 docs/v2/phase-7-status.md — Phase 7 reconciliation (Phase 7 ~complete, 5 gaps)
- #23 retire docs/v2/lmx-followups.md
2026-05-18 04:42:39 -04:00
Joseph Doherty
09af8d2830 docs: add Phase 7 status reconciliation document
Audits every Phase 7 plan stream (A-H) against the repo, confirms the
exit gate is fully closed, and records the five genuine remaining gaps:
OPC UA method-call dispatch for alarm Ack/Confirm/Shelve, the
/virtual-tags and /scripted-alarms Admin UI tabs, the script log viewer,
and the missing production IHistoryWriter for virtual-tag historization.
Also notes that docs/v2/v2-release-readiness.md carries a stale "out of
scope" label — Phase 7 shipped completely after that doc was last updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:40:32 -04:00
Joseph Doherty
6c78027b5a docs: retire docs/v2/lmx-followups.md (all items DONE, pre-PR-7.2 arch)
Every item in lmx-followups.md was marked DONE and rooted in the
retired GalaxyProxyDriver / OtOpcUaGalaxyHost named-pipe architecture
deleted in PR 7.2. No live or unique content remains: v2-release-readiness.md
is the canonical open-work tracker. Remove the file and drop the
now-dead link from docs/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
bb1854b2f8 feat(admin): add five-identifier ranked equipment search (Phase 6.4 Stream B.5)
Implements the missing Stream B.5 search from the Phase 6.4 plan:
- EquipmentService.SearchAsync scopes to a cluster, scores hits across
  ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid (decision #117):
  exact = 100, prefix = 50, fuzzy (opt-in) = 20; published generation
  outranks draft on equal scores per spec.
- EquipmentSearchHit record carries Score + MatchedField for badge display.
- EquipmentTab.razor gains a search panel with per-row matched-field chips
  (green exact, amber prefix, grey fuzzy) and fuzzy opt-in checkbox.
- 14 new unit tests in EquipmentSearchTests.cs (Category=Unit) cover exact,
  prefix, fuzzy, case-insensitivity, tie-break, cross-cluster isolation, and
  maxResults cap; all 148 admin unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
70d7166a39 test(server): harden deferred authz gates — task #12 Browse/Subscribe/Call/AlarmAck
Add DeferredGateHardeningTests (28 unit tests) covering the Phase 6.2
compliance-checklist gaps left by the per-gate unit suites that shipped
with the gate implementations:

- Lax-mode fall-through for CreateMonitoredItems and Call gates (null
  identity and identity-without-LDAP-groups both skip denial in lax mode,
  consistent with BrowseGatingTests.Lax_mode_null_identity)
- Flag isolation: Subscribe-only grant does NOT imply Read; Read-only
  grant does NOT imply Subscribe; HistoryRead-only grant does NOT imply
  Read and vice versa (Phase 6.2 compliance: "HistoryRead uses its own flag")
- Alarm-bit isolation: AlarmAcknowledge alone does not grant AlarmConfirm
  or AlarmShelve; Browse alone does not grant AlarmAcknowledge
- AlarmShelve falls through to OpcUaOperation.Call in MapCallOperation
  (documents the ShelvedStateMachine per-instance NodeId limitation noted
  in the implementation, with the follow-up path: MethodCall grant covers it)
- Complete OpcUaOperation→NodePermissions mapping coverage for all deferred
  operations (Browse, CreateMonitoredItems, TransferSubscriptions, Call,
  AlarmAcknowledge, AlarmConfirm, AlarmShelve) — both positive and
  wrong-bit negative cases
- Multi-group union for deferred gates (grp-browse ∪ grp-ack gives both
  Browse and AlarmAcknowledge without leaking Read or Call)

Build: 0 errors on Server.csproj (verified against main repo build which
carries the gRPC-generated Galaxy driver artifacts the isolated worktree
lacks — that pre-existing gap is unrelated to these changes).
Test count: 247 → 275 (+28 unit, 0 failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:57 -04:00
Joseph Doherty
6968872e5d Merge branch 'feat/tasks-9-10-11-admin-hardening'
Wave 1 of the task-list run — three Admin hardening tasks:
- #9  resilient LDAP role-grant reads (ResilientLdapGroupRoleMappingService)
- #10 InteractiveServer render mode on 13 interactive pages + hub-URL fixes
- #11 ZTag/SAPID reservation pre-check in equipment CSV import (task #197)
2026-05-18 04:25:50 -04:00
Joseph Doherty
020c30f9a6 feat(admin): add ZTag/SAPID reservation pre-check to equipment CSV import (task #197)
ApplyReservationPreCheckAsync on EquipmentImportBatchService queries active
ExternalIdReservation rows in a single round-trip at parse time; rows whose ZTag
or SAPID is claimed by a different EquipmentUuid are moved from AcceptedRows to
RejectedRows with a descriptive reason. ImportEquipment.razor calls the check
after EquipmentCsvImporter.Parse so conflicts appear in the preview before the
operator clicks Stage + Finalise. Updated notice banner to reflect the pre-check
is now live; 6 new unit tests cover conflict, no-conflict, same-UUID, released-
reservation, and empty-input paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
a8dabc47f9 fix(admin): add InteractiveServer render mode to remaining interactive pages
ModbusAddressPreview (@bind on dropdowns + child ModbusAddressEditor with @oninput),
ModbusDiagnostics (@onclick Refresh), and NewCluster (EditForm with Nav.NavigateTo on submit)
were missed in the first pass — all three require interactivity but had no @rendermode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
43291d7fdd fix(admin): add InteractiveServer render mode to all interactive Blazor pages; fix wrong hub URLs
Eight pages were using @onclick handlers, Timers, or HubConnections but had no @rendermode,
causing interactivity to be silently dead under static SSR. Added @rendermode RenderMode.InteractiveServer
(with the required @using Microsoft.AspNetCore.Components.Web) to: AlarmsHistorian, Certificates,
Fleet, Home, Hosts, Reservations, DraftEditor, and ImportEquipment.

Also fixed two hub URL bugs: AclsTab and RedundancyTab were connecting to the non-existent
/hubs/fleet-status path; corrected to /hubs/fleet which matches the MapHub<FleetStatusHub>
call in Program.cs. Build: 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
75b91ebb97 feat(admin): wrap LdapGroupRoleMappingService in Phase 6.1-style resilience pipeline (Phase 6.2 Stream A.2)
Add ResilientLdapGroupRoleMappingService — a singleton decorator that wraps the
hot-path GetByGroupsAsync call in a Polly pipeline (timeout 2s → retry 3× jittered
→ fallback to in-memory sealed snapshot) so a transient Config DB outage at
Admin sign-in falls back to the last-known-good mapping set rather than denying
every login. The static LdapOptions.GroupToRole bootstrap dictionary in
AdminRoleGrantResolver remains the lock-out-proof floor regardless of DB state.

DI wiring uses keyed services: LdapGroupRoleMappingService (EF, scoped) is
registered under key "LdapGroupRoleMappingService.Inner"; the resilient singleton
decorator is the primary ILdapGroupRoleMappingService binding. The singleton
avoids the captive-dependency anti-pattern by using IServiceScopeFactory to open
a short-lived scope for each DB call.

Write methods (CreateAsync, DeleteAsync, ListAllAsync) pass through unchanged —
resilience is read-path only per Phase 6.1 design decision.

15 new unit tests cover: DB success/failure/retry paths, snapshot sealing and
per-group-set isolation, order-independent cache key normalisation, cancellation
propagation, and pass-through method routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:22 -04:00
Joseph Doherty
412cdec9b1 Merge branch 'docs/reservations'
Add docs/Reservations.md documenting the external-ID reservation flow.
2026-05-18 03:48:27 -04:00
Joseph Doherty
b90718013e docs: add Reservations.md — external-ID reservation flow
Document the ZTag / SAPID external-ID reservation subsystem: what a
reservation is, why it sits outside the generation flow (decision #124),
the ExternalIdReservation table, the lifecycle (author → publish
precheck → publish-time MERGE → FleetAdmin release), and the
/reservations Admin page. Linked from the docs README Operational table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:48:21 -04:00
Joseph Doherty
4a2e993a95 Merge branch 'chore/admin-trim-page-notices'
Trim explanatory notices from the role-grants and certificates pages.
2026-05-18 03:34:20 -04:00
Joseph Doherty
adbbb5e7d0 chore(admin): trim explanatory notices from role-grants and certificates
- Role grants: drop the page notice describing the LDAP-group → role
  mapping semantics; this is moving to the user instructions.
- Certificates: drop the trailing "operators should retry the rejected
  client's connection" note from the trust notice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:34:13 -04:00
Joseph Doherty
a8ef73dcb5 Merge branch 'feat/ldap-role-grants-signin'
Consume DB-backed LDAP role grants at Admin sign-in, with fleet-wide
and cluster-scoped roles, and fix the role-grants page interactivity.
2026-05-18 03:18:27 -04:00
Joseph Doherty
22fd314694 fix(admin): make the role-grants page interactive
The role-grants page is the authoring surface for LdapGroupRoleMapping
rows, but it had no @rendermode — so it rendered as static SSR and its
@onclick handlers (Add grant, Revoke) never fired. App.razor's <Routes/>
sets no global render mode; only ClusterDetail opted in.

- Add @rendermode RenderMode.InteractiveServer.
- Fix the SignalR hub URL: the page connected to /hubs/fleet-status,
  but FleetStatusHub is mapped at /hubs/fleet. Static SSR masked this
  (OnAfterRenderAsync never ran); enabling interactivity surfaced the
  404 that terminated the circuit.

Verified in-browser: Add grant opens the form, a cluster-scoped grant
saves and lists, Revoke removes it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:17:56 -04:00
Joseph Doherty
8adb83afee feat(admin): consume LDAP role grants at sign-in, incl. cluster scoping
The role-grants page authored LdapGroupRoleMapping rows but nothing
consumed them — sign-in only read the static appsettings GroupToRole
dictionary. Wire the DB-backed grants into the auth path.

- AdminRoleGrantResolver merges the static bootstrap dictionary (always
  fleet-wide, lock-out-proof) with DB grants; system-wide rows fold into
  fleet roles, cluster-scoped rows become (cluster, role) grants.
- Login emits a ClaimTypes.Role claim per fleet role and a cluster_role
  claim per cluster-scoped grant; lock-out check spans both scopes.
- ClusterRoleClaims + ClaimsPrincipal extensions resolve the effective
  role for a cluster (highest of fleet-wide and cluster-scoped).
- ClusterAuthorizeView gates cluster pages: ClusterDetail (view +
  ConfigEditor draft actions), DraftEditor (ConfigEditor / FleetAdmin
  publish), DiffViewer (ConfigViewer), ImportEquipment (ConfigEditor).
- RoleGrants page is now FleetAdmin-only; Account surfaces fleet-wide
  and cluster-scoped grants separately.

Control-plane only — decision #150 holds, NodeAcl is untouched.

Tests: AdminRoleGrantResolverTests + ClusterRoleClaimsTests (22).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:09:06 -04:00
Joseph Doherty
1e04796953 Merge branch 'feat/admin-technical-light-design'
Restyle the Admin web UI with the technical-light design system,
and fix the LDAP sign-in path so it actually authenticates against
GLAuth (form binding, service-account DN, user-search attribute).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:49:35 -04:00
Joseph Doherty
5f5bfe1ea5 fix: make Admin LDAP sign-in work against GLAuth
Three bugs blocked sign-in entirely:

- Login.razor is static-SSR but its form model lacked
  [SupplyParameterFromForm], so the posted username/password never
  bound — SignInAsync saw empty fields and bailed before LDAP was
  contacted. Annotate the model; seed it in OnInitialized since
  BL0008 forbids an initializer on a [SupplyParameterFromForm]
  property.
- appsettings.json ServiceAccountDn used ou=svcaccts, which GLAuth
  reads as a (non-existent) group — the service-account bind failed
  with "Group not found". Use cn=serviceaccount,dc=lmxopcua,dc=local.
- LdapAuthService resolved the user DN by searching (uid=...), but
  GLAuth keys users by cn. Add an LdapOptions.UserNameAttribute knob
  (default cn for GLAuth; set sAMAccountName for Active Directory)
  and use it for the search filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:48:00 -04:00
Joseph Doherty
482d5f5637 feat: restyle Admin UI with the technical-light design system
Adopt the technical-light design system across the Admin web UI:

- Vendor theme.css + IBM Plex woff2 fonts into wwwroot; include
  theme.css globally after Bootstrap.
- Rebuild MainLayout: top app-bar (brand mark, breadcrumb, connection
  pill) + hairline-ruled side rail with accent-bordered active link.
- Convert all 33 pages to the component catalog — tables to
  panel + data-table (num/mono columns), KPI cards to agg-grid,
  detail blocks to metric-card/kv rows, badges to chips, alerts to
  panel notice, headings to page-title/panel-head, .rise reveals.
- Buttons/forms stay on Bootstrap; theme.css restyles them via
  --bs-* overrides. View-specific layout lives in app.css; all
  colour/type comes from theme.css tokens.

Also fix a pre-existing /fleet 500: the node-state query ordered on
a property of a constructed FleetNodeRow record, which EF Core
cannot translate. Order the join's columns before projecting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:20:09 -04:00
Joseph Doherty
31b9468102 Merge branch 'fix/admin-configdb-host'
fix: point Admin ConfigDb at the shared SQL host (10.100.0.35,14330).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 06:27:07 -04:00
Joseph Doherty
cf024c8150 fix: point Admin ConfigDb at the shared SQL host
The Admin appsettings.json still carried Server=localhost,14330 — a
straggler from before the 2026-04-28 Docker migration that moved SQL
Server onto the shared Linux host. Every other checked-in appsettings
was rewritten then; this one was missed, so the Admin web UI returned
HTTP 500 on every page (SqlException, connection timeout). Repoint it
at 10.100.0.35,14330 to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 05:04:14 -04:00
Joseph Doherty
0aee14686b Merge branch 'chore/solution-module-folders'
Organize the solution into module folders (Core/Server/Drivers/Client/
Tooling) on disk and in ZB.MOM.WW.OtOpcUa.slnx, with all .csproj, script,
and docs path references updated to match. Build green; unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:28:05 -04:00
Joseph Doherty
4e1751e1a4 docs: correct CLAUDE.md test commands for per-module test layout
The Build Commands block referenced tests/ZB.MOM.WW.OtOpcUa.Tests and
.IntegrationTests, which never existed in v2. Replace with the actual
per-module layout under tests/<module>/ and note which suites need
Docker fixtures or the central SQL Server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:11:36 -04:00
Joseph Doherty
969b0847a1 docs: update path references for module-folder reorganization
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:10:29 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00
69f02fed7f Merge pull request 'docs: alarms-over-gateway plan banner — record A.2 dev-rig finding' (#418) from track-d1-followup-plan-banner into master 2026-04-30 21:31:40 -04:00
Joseph Doherty
5ed26d2ec6 docs: alarms-over-gateway plan banner — record A.2 dev-rig finding
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions over a
   named-pipe IPC. Recovers full v1 fidelity but adds operational
   complexity.

The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:29:16 -04:00
439b39463b Merge pull request 'scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)' (#417) from track-d1-refresh-services into master 2026-04-30 21:13:58 -04:00
62d01e76e5 Merge pull request 'docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)' (#416) from track-b5-docs-memory-housekeeping into master 2026-04-30 21:11:29 -04:00
Joseph Doherty
32b872d5c7 scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)
Seventeenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the script that the
plan calls for in Track D — the actual smoke-run validation
on the dev rig (publish, restart, fire alarms, capture artifacts)
remains operator work; this PR ships the automation that the
operator drives.

scripts/install/Refresh-Services.ps1 — single-shot refresh
script. Designed to run elevated on the deploy host
(DESKTOP-6JL3KKO today; production uses a separate runbook).
The script:

- Stops services in reverse-dependency order (OtOpcUa →
  OtOpcUaWonderwareHistorian → MxAccessGw) and force-kills any
  residual processes (avoids the publish-time MSB3027 file-lock
  the original install script hit).
- Snapshots existing C:\publish trees to
  C:\publish\.backup-YYYY-MM-DD-HHMMSS\ for rollback (skip with
  -SkipBackup).
- Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
  binaries from the sibling repo.
- Publishes OtOpcUa Server + Wonderware historian sidecar from
  this repo.
- Ensures OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true is set on
  the historian service env block (PR C.2 toggle).
- Starts services in forward-dependency order with the
  inter-service waits the original install used.
- Smoke-verifies (service status, listening ports 5120 / 4840
  / 4841, recent log tails).

Supports -WhatIf for dry-run inspection without touching the
running services.

docs/v2/dev-environment.md — new "Service Refresh —
Refresh-Services.ps1" section between Credential Management
and Test Data Seed. Cross-references the plan's Track D
functional verification scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:11:27 -04:00
Joseph Doherty
89004c052c docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)
Sixteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Closes the documentation sweep
the plan calls for.

- docs/AlarmTracking.md — promoted top-level v2-final architecture
  doc (was a worktree-only draft pre-epic). Covers the three alarm
  sources (Galaxy MxAccess driver-native / Galaxy sub-attribute
  fallback / scripted alarms), how they converge on
  AlarmConditionService, the Acknowledge routing decision in
  DriverNodeManager (driver-native preferred over IWritable
  sub-attribute fallback), the sidecar historian write-back path
  for non-Galaxy producers, and cross-references to the plan +
  v1 archive.
- docs/v1/AlarmTracking.md — banner pointing readers at the v2
  doc; preserved as historical record.
- docs/drivers/Galaxy.md — capability list updated to include
  IAlarmSource (now eight capabilities, restored by B.2). Replaced
  the "IAlarmSource retired in 7.2" sentence with the restoration
  note + cross-link to docs/AlarmTracking.md.
- docs/plans/alarms-over-gateway.md — completion banner at the
  top of the plan, marking 14 of 16 PRs shipped 2026-04-30 and
  noting that A.2 + A.4 + D.1 are the hardware-gated follow-up.

Memory entries updated separately:
- project_alarms_over_gateway_epic.md (new) — epic summary +
  per-PR digest.
- project_galaxy_via_mxgateway.md — added "Alarms restored"
  bullet pointing at the new architecture.
- project_server_history_alarm_subsystems.md — bullet 2 updated
  to describe the new ack-routing decision (B.3) + bullet 3
  added describing the historian write-back path that B.4 + C.1
  + C.2 light up.
- MEMORY.md index — new pointer entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:09:04 -04:00
2baca785ad Merge pull request 'abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)' (#415) from track-e7-alarm-event-args-extension into master 2026-04-30 17:49:19 -04:00
Joseph Doherty
1d62709060 abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)
Fourteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged) and B.3 (DriverNodeManager prefers
driver-native ack, merged).

Three new optional fields on Core.Abstractions.AlarmEventArgs:

- OperatorComment — populated by the driver-native gateway path on
  Acknowledge transitions. Null on raise / clear, and null on the
  sub-attribute fallback path where the comment collapses into a
  single string write.
- OriginalRaiseTimestampUtc — preserved across Acknowledge so OPC
  UA Part 9 conditions keep the original raise time.
- AlarmCategory — taxonomy bucket from the upstream alarm system.
  Maps to ConditionClassName downstream when a class mapping is
  configured.

GalaxyDriver.OnPumpAlarmTransition populates the new fields from
GalaxyAlarmTransition (PR B.1). Empty strings collapse to null so
consumers can use is-null rather than is-null-or-empty checks.

Client.Shared mirror DTO (Client.Shared/Models/AlarmEventArgs)
gains the same three properties so the Client.UI / Client.CLI
surfaces can reflect the rich payload — the actual UI/CLI
verbose-output and Show-Details rendering ship as a follow-up
PR; this PR locks in the payload contract.

Tests:
- 2 new tests in Driver.Galaxy.Tests pin the populated-vs-null
  behaviour for full-payload Acknowledge and bare-bones Raise
  transitions respectively.
- Solution build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:46:47 -04:00
0b5a4a676e Merge pull request 'server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)' (#414) from track-b3-prefer-driver-native-alarm into master 2026-04-30 17:23:09 -04:00
Joseph Doherty
edc984987b server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)
Thirteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged).

When DriverNodeManager registers an AlarmConditionState with
AlarmConditionService, it now picks the acknowledger:

- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger
  routes the operator comment through IAlarmSource.AcknowledgeAsync
  via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline,
  no-retry per decision #143). Preserves operator-comment fidelity
  end-to-end — the value-driven sub-attribute write collapses the
  comment into a single string write that loses MxAccess metadata.
- Driver does not implement IAlarmSource →
  DriverWritableAcknowledger fallback (existing behaviour for
  AbCip / Modbus / S7 / etc).

The dedup logic that prefers driver-native transitions over
sub-attribute synthesis lives in AlarmConditionService and is
already in place — drivers that surface OnAlarmEvent (B.2) feed
the service directly, while sub-attribute writes still flow
through DriverNodeManager's ConditionSink so a Galaxy template
without $Alarm extensions stays functional.

Tests:
- 2 new routing-decision tests in
  DriverAlarmSourceAcknowledgerRoutingTests pin the
  IAlarmSource detection used at registration time.
- Server build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:20:45 -04:00
6126374594 Merge pull request 'driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)' (#413) from track-b2-galaxy-driver-ialarmsource into master 2026-04-30 17:18:20 -04:00
Joseph Doherty
38afc234ff driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.1 (EventPump
dispatch, merged) and PR E.2 (.NET SDK alarm methods, merged).
Restores the v1 IAlarmSource capability that PR 7.2 retired with the
legacy Galaxy.Host / Galaxy.Proxy projects.

GalaxyDriver gains:
- IAlarmSource on the class declaration → eight capabilities total
  (IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable /
  IRediscoverable / IHostConnectivityProbe / IAlarmSource).
- SubscribeAlarmsAsync — returns a sentinel handle and starts the
  shared EventPump (alarm wiring is lazy on first sub).
  Multiple handles share the same gateway stream; the server-side
  AlarmConditionService dispatches per-source-node downstream.
- UnsubscribeAlarmsAsync — symmetric handle removal; rejects
  handles not issued by this driver.
- AcknowledgeAsync — issues one gateway RPC per acknowledgement
  through IGalaxyAlarmAcknowledger. ConditionId carries the alarm
  full reference; falls back to SourceNodeId when empty.
- OnAlarmEvent — bridges EventPump.OnAlarmTransition (B.1) onto
  AlarmEventArgs. Suppressed when no alarm subscription is active so
  untracked transitions don't leak through.

New runtime types:
- IGalaxyAlarmAcknowledger — test seam.
- GatewayGalaxyAlarmAcknowledger — production wrapper around
  MxGatewayClient.AcknowledgeAlarmAsync (PR E.2). Maps native
  MxStatus failures to a logged warning rather than a thrown
  exception so a transient MxAccess hiccup doesn't fail the
  operator's Acknowledge.
- GalaxyAlarmSubscriptionHandle — driver-side IAlarmSubscriptionHandle.

Production runtime construction in BuildProductionRuntimeAsync wires
the acknowledger when not pre-injected; tests inject a fake via the
internal ctor.

Tests:
- 7 new tests in GalaxyDriverAlarmSourceTests — subscribe → event
  fire path, suppress without subscription, unsubscribe stops flow,
  foreign-handle rejection, ack routes per-request, ack falls back
  to SourceNodeId, ack throws NotSupported without acknowledger.
- Full Driver.Galaxy.Tests: 203 passed (was 196; 7 new).

Operates as a "stub-ready" surface — runtime ack calls will return
PERMISSION_DENIED until A.3 ships the gateway-side dispatch, and no
alarm transitions will arrive until A.2 adds the worker MxAccess
subscription. Both will activate this code path automatically when
the gateway side lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:15:46 -04:00
95422995c0 Merge pull request 'server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)' (#412) from track-b4-sidecar-alarm-historian-writer into master 2026-04-30 16:33:27 -04:00
Joseph Doherty
6e282b9946 server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)
Sixth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.2 (sidecar
serves IAlarmEventWriter when enabled), already merged.

Today Phase7Composer.ResolveHistorianSink only scans drivers for an
IAlarmHistorianWriter — no Galaxy driver provides one since PR 7.2,
so the resolution falls through to NullAlarmHistorianSink and
scripted-alarm transitions are silently discarded.

WonderwareHistorianClient already implements IAlarmHistorianWriter
and Program.cs:178 already registers it as a singleton when
Historian:Wonderware:Enabled=true. The gap was that Phase7Composer
ignored DI: this PR adds an optional injectedWriter constructor
parameter, and ASP.NET Core DI resolves it from the same
registration when present.

- Phase7Composer constructor: new optional IAlarmHistorianWriter?
  injectedWriter parameter (default null). Backward-compatible —
  existing callers don't need to change; DI populates it
  automatically when the singleton is registered.
- New static SelectAlarmHistorianWriter helper — resolution order
  is driver → DI → null. Drivers win when both are present so a
  future GalaxyDriver-as-IAlarmHistorianWriter takes the write
  path directly, preserving the v1 invariant where a driver that
  natively owns the historian client doesn't bounce through the
  sidecar IPC.
- ResolveHistorianSink uses the helper + emits a structured log
  line identifying which source provided the writer.

Tests:
- 4 SelectAlarmHistorianWriter precedence tests — no source / DI
  only / driver wins over DI / first-driver-with-writer wins.
- Pre-existing 4 HostStatusPublisherTests SQL failures unrelated
  to this change (require the docker-host SQL Server at
  10.100.0.35,14330 per CLAUDE.md). Phase7 + alarm tests all
  green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:31:00 -04:00
f67b3b1b30 Merge pull request 'sidecar: wire IAlarmEventWriter into Program.cs (PR C.2)' (#411) from track-c2-program-wires-alarm-writer into master 2026-04-30 16:22:36 -04:00