Commit Graph

47 Commits

Author SHA1 Message Date
Joseph Doherty
32eeeb9e04 Phase 2 Streams A+B+C feature-complete — real Win32 pump, all 9 IDriver capabilities, end-to-end IPC dispatch. Streams D+E remain (Galaxy MXAccess code lift + parity-debug cycle, plan-budgeted 3-4 weeks). The 494 v1 IntegrationTests still pass — legacy OtOpcUa.Host untouched. StaPump replaces the BlockingCollection placeholder with a real Win32 message pump lifted from v1 StaComThread per CLAUDE.md "Reference Implementation": dedicated STA Thread with SetApartmentState(STA), GetMessage/PostThreadMessage/PeekMessage/TranslateMessage/DispatchMessage/PostQuitMessage P/Invoke, WM_APP=0x8000 for work-item dispatch, WM_APP+1 for graceful-drain → PostQuitMessage, peek-pm-noremove on entry to force the system to create the thread message queue before signalling Started, IsResponsiveAsync probe still no-op-round-trips through PostThreadMessage so the wedge detection works against the real pump. Concurrent ConcurrentQueue<WorkItem> drains on every WM_APP; fault path on dispose drains-and-faults all pending work-item TCSes with InvalidOperationException("STA pump has exited"). All three StaPumpTests pass against the real pump (apartment state STA, healthy probe true, wedged probe false). GalaxyProxyDriver now implements every Phase 2 Stream C capability — IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe — each forwarding through the matching IPC contract. ReadAsync preserves request order even when the Host returns out-of-order values; WriteAsync MessagePack-serializes the value into ValueBytes; SubscribeAsync wraps SubscriptionId in a GalaxySubscriptionHandle record; UnsubscribeAsync uses the new SendOneWayAsync helper on GalaxyIpcClient (fire-and-forget but still gated through the call-semaphore so it doesn't interleave with CallAsync); AlarmSubscribe is one-way and the Host pushes events back via OnAlarmEvent; ReadProcessedAsync short-circuits to NotSupportedException (Galaxy historian only does raw); IRediscoverable's OnRediscoveryNeeded fires when the Host pushes a deploy-watermark notification; IHostConnectivityProbe.GetHostStatuses() snapshots and OnHostStatusChanged fires on Running↔Stopped/Faulted transitions, with IpcHostConnectivityStatus aliased to disambiguate from the Core.Abstractions namespace's same-named type. Internal RaiseDataChange/RaiseAlarmEvent/RaiseRediscoveryNeeded/OnHostConnectivityUpdate methods are the entry points the IPC client will invoke when push frames arrive. Host side: new Backend/IGalaxyBackend interface defines the seam between IPC dispatch and the live MXAccess code (so the dispatcher is unit-testable against an in-memory mock without needing live Galaxy); Backend/StubGalaxyBackend returns success for OpenSession/CloseSession/Subscribe/Unsubscribe/AlarmSubscribe/AlarmAck/Recycle and a recognizable "stub: MXAccess code lift pending (Phase 2 Task B.1)"-tagged error for Discover/ReadValues/WriteValues/HistoryRead — keeps the IPC end-to-end testable today and gives the parity team a clear seam to slot the real implementation into; Ipc/GalaxyFrameHandler is the new real dispatcher (replaces StubFrameHandler in Program.cs) — switch on MessageKind, deserialize the matching contract, await backend method, write the response (one-way for Unsubscribe/AlarmSubscribe/AlarmAck/CloseSession), heartbeat handled inline so liveness still works if the backend is sick, exceptions caught and surfaced as ErrorResponse with code "handler-exception" so the Proxy raises GalaxyIpcException instead of disconnecting. End-to-end IPC integration test (EndToEndIpcTests) drives every operation through the full stack — Initialize → Read → Write → Subscribe → Unsubscribe → SubscribeAlarms → AlarmAck → ReadRaw → ReadProcessed (short-circuit) — proving the wire protocol, dispatcher, capability forwarding, and one-way semantics agree end-to-end. Skipped on Windows administrator shells per the same PipeAcl-denies-Administrators reasoning the IpcHandshakeIntegrationTests use. Full solution 952 pass / 1 pre-existing Phase 0 baseline. Phase 2 evidence doc updated: status header now reads "Streams A+B+C complete... Streams D+E remain — gated only on the iterative Galaxy code lift + parity-debug cycle"; new Update 2026-04-17 (later) callout enumerates the upgrade with explicit "what's left for the Phase 2 exit gate" — replace StubGalaxyBackend with a MxAccessClient-backed implementation calling on the StaPump, then run the v1 IntegrationTests against the v2 topology and iterate on parity defects until green, then delete legacy OtOpcUa.Host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 23:02:00 -04:00
Joseph Doherty
18f93d72bb Phase 1 LDAP auth + SignalR real-time — closes the last two open Admin UI TODOs. LDAP: Admin/Security/ gets SecurityOptions (bound from appsettings.json Authentication:Ldap), LdapAuthResult record, ILdapAuthService + LdapAuthService ported from scadalink-design's LdapAuthService (TLS guard, search-then-bind when a service account is configured, direct-bind fallback, service-account re-bind after user bind so attribute lookup uses the service principal's read rights, LdapException-to-friendly-message translation, OperationCanceledException pass-through), RoleMapper (pure function: case-insensitive group-name match against LdapOptions.GroupToRole, returns the distinct set of mapped Admin roles). EscapeLdapFilter escapes the five LDAP filter control chars (\, *, (, ), \0); ExtractFirstRdnValue pulls the value portion of a DN's leading RDN for memberOf parsing; ExtractOuSegment added as a GLAuth-specific fallback when the directory doesn't populate memberOf but does embed ou=PrimaryGroup into user DNs (actual GLAuth config in C:\publish\glauth\glauth.cfg uses nameformat=cn, groupformat=ou — direct bind is enough). Login page rewritten: EditForm → ILdapAuthService.AuthenticateAsync → cookie sign-in with claims (Name = displayName, NameIdentifier = username, Role for each mapped role, ldap_group for each raw group); failed bind shows the service's error; empty-role-map returns an explicit "no Admin role mapped" message rather than silently succeeding. appsettings.json gains an Authentication:Ldap section with dev-GLAuth defaults (localhost:3893, UseTls=false, AllowInsecureLdap=true for dev, GroupToRole maps GLAuth's ReadOnly/WriteOperate/AlarmAck → ConfigViewer/ConfigEditor/FleetAdmin). SignalR: two hubs + a BackgroundService poller. FleetStatusHub routes per-cluster NodeStateChanged pushes (SubscribeCluster/UnsubscribeCluster on connection; FleetGroup for dashboard-wide) with a typed NodeStateChangedMessage payload. AlertHub auto-subscribes every connection to the AllAlertsGroup and exposes AcknowledgeAsync (ack persistence deferred to v2.1). FleetStatusPoller (IHostedService, 5s default cadence) scans ClusterNodeGenerationState joined with ClusterNode, caches the prior snapshot per NodeId, pushes NodeStateChanged on any delta, raises AlertMessage("apply-failed") on transition INTO Failed (sticky — the hub client acks later). Program.cs registers HttpContextAccessor (sign-in needs it), SignalR, LdapOptions + ILdapAuthService, the poller as hosted service, and maps /hubs/fleet + /hubs/alerts endpoints. ClusterDetail adds @rendermode RenderMode.InteractiveServer, @implements IAsyncDisposable, and a HubConnectionBuilder subscription that calls LoadAsync() on each NodeStateChanged for its cluster so the "current published" card refreshes without a page reload; a dismissable "Live update" info banner surfaces the most recent event. Microsoft.AspNetCore.SignalR.Client 10.0.0 + Novell.Directory.Ldap.NETStandard 3.6.0 added. Tests: 13 new — RoleMapperTests (single group, case-insensitive match, multi-group distinct-roles, unknown-group ignored, empty-map); LdapAuthServiceTests (EscapeLdapFilter with 4 inputs, ExtractFirstRdnValue with 4 inputs — all via reflection against internals); LdapLiveBindTests (skip when localhost:3893 unreachable; valid-credentials-bind-succeeds; wrong-password-fails-with-recognizable-error; empty-username-rejected-before-hitting-directory); FleetStatusPollerTests (throwaway DB, seeds cluster+node+generation+apply-state, runs PollOnceAsync, asserts NodeStateChanged hit the recorder; second test seeds a Failed state and asserts AlertRaised fired) — backed by RecordingHubContext/RecordingHubClients/RecordingClientProxy that capture SendCoreAsync invocations while throwing NotImplementedException for the IHubClients methods the poller doesn't call (fail-fast if evolution adds new dependencies). InternalsVisibleTo added so the test project can call FleetStatusPoller.PollOnceAsync directly. Full solution 946 pass / 1 pre-existing Phase 0 baseline failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 22:28:49 -04:00
Joseph Doherty
7a5b535cd6 Phase 1 Stream E Admin UI — finish Blazor pages so operators can run the draft → publish → rollback workflow end-to-end without hand-executing SQL. Adds eight new scoped services that wrap the Configuration stored procs + managed validators: EquipmentService (CRUD with auto-derived EquipmentId per decision #125), UnsService (areas + lines), NamespaceService, DriverInstanceService (generic JSON DriverConfig editor per decision #94 — per-driver schema validation lands in each driver's phase), NodeAclService (grant + revoke with bundled-preset permission sets; full per-flag editor + bulk-grant + permission simulator deferred to v2.1), ReservationService (fleet-wide active + released reservation inspector + FleetAdmin-only sp_ReleaseExternalIdReservation wrapper with required-reason invariant), DraftValidationService (hydrates a DraftSnapshot from the draft's rows plus prior-cluster Equipment + active reservations, runs the managed DraftValidator to surface every rule in one pass for inline validation panel), AuditLogService (recent ConfigAuditLog reader). Pages: /clusters list with create-new shortcut; /clusters/new wizard that creates the cluster row + initial empty draft in one go; /clusters/{id} detail with 8 tabs (Overview / Generations / Equipment / UNS Structure / Namespaces / Drivers / ACLs / Audit) — tabs that write always target the active draft, published generations stay read-only; /clusters/{id}/draft/{gen} editor with live validation panel (errors list with stable code + message + context; publish button disabled while any error exists) and tab-embedded sub-components; /clusters/{id}/draft/{gen}/diff three-column view backed by sp_ComputeGenerationDiff with Added/Removed/Modified badges; Generations tab with per-row rollback action wired to sp_RollbackToGeneration; /reservations FleetAdmin-only page (CanPublish policy) with active + released lists and a modal release dialog that enforces non-empty reason and round-trips through sp_ReleaseExternalIdReservation; /login scaffold with stub credential accept + FleetAdmin-role cookie issuance (real LDAP bind via the ScadaLink-parity LdapAuthService is deferred until live GLAuth integration — marked in the login view and in the Phase 1 partial-exit TODO). Layout: sidebar gets Overview / Clusters / Reservations + AuthorizeView with signed-in username + roles + sign-out POST to /auth/logout; cascading authentication state registered for <AuthorizeView> to work in RenderMode.InteractiveServer. Integration testing: AdminServicesIntegrationTests creates a throwaway per-run database (same pattern as the Configuration test fixture), applies all three migrations, and exercises (1) create-cluster → add-namespace+UNS+driver+equipment → validate (expects zero errors) → publish (expects Published status) → rollback (expects one new Published + at least one Superseded); (2) cross-cluster namespace binding draft → validates to BadCrossClusterNamespaceBinding per decision #122. Old flat Components/Pages/Clusters.razor moved to Components/Pages/Clusters/ClustersList.razor so the Clusters folder can host tab sub-components without the razor generator creating a type-and-namespace collision. Dev appsettings.json connection string switched from Integrated Security to sa auth to match the otopcua-mssql container on port 14330 (remapped from 1433 to coexist with the native MSSQL14 Galaxy ZB instance). Browser smoke test completed: home page, clusters list, new-cluster form, cluster detail with a seeded row, reservations (redirected to login for anon user) all return 200 / 302-to-login as expected; full solution 928 pass / 1 pre-existing Phase 0 baseline failure. Phase 1 Stream E items explicitly deferred with TODOs: CSV import for Equipment, SignalR FleetStatusHub + AlertHub real-time push, bulk-grant workflow, permission-simulator trie, merge-equipment draft, AppServer-via-OI-Gateway end-to-end smoke test (decision #142), and the real LDAP bind replacing the Login page stub.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:52:42 -04:00
Joseph Doherty
01fd90c178 Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:35:25 -04:00
Joseph Doherty
980ea5190c Phase 1 Stream A — Core.Abstractions project + 11 capability interfaces + DriverTypeRegistry + interface-independence tests
New project src/ZB.MOM.WW.OtOpcUa.Core.Abstractions (.NET 10, BCL-only dependencies, GenerateDocumentationFile=true, TreatWarningsAsErrors=true) defining the contract surface every driver implements. Per docs/v2/plan.md decisions #4 (composable capability interfaces), #52 (streaming IAddressSpaceBuilder), #53 (capability discovery via `is` checks no flag enum), #54 (optional IRediscoverable sub-interface), #59 (Core.Abstractions internal-only for now design as if public).

Eleven capability interfaces:
- IDriver — required lifecycle / health / config-apply / memory-footprint accounting (per driver-stability.md Tier A/B allocation tracking)
- ITagDiscovery — discovers tags streaming to IAddressSpaceBuilder
- IReadable — on-demand reads idempotent for Polly retry
- IWritable — writes NOT auto-retried by default per decisions #44 + #45
- ISubscribable — data-change subscriptions covering both native (Galaxy MXAccess advisory, OPC UA monitored items, TwinCAT ADS) and driver-internal polled (Modbus, AB CIP, S7, FOCAS) mechanisms; OnDataChange callback regardless of source
- IAlarmSource — alarm events + acknowledge + AlarmSeverity enum mirroring acl-design.md NodePermissions alarm-severity values
- IHistoryProvider — HistoryReadRaw + HistoryReadProcessed with continuation points
- IRediscoverable — opt-in change-detection signal; static drivers don't implement
- IHostConnectivityProbe — generalized from Galaxy's GalaxyRuntimeProbeManager per plan §5a
- IDriverConfigEditor — Admin UI plug-point for per-driver custom config editors deferred to each driver's phase per decision #27
- IAddressSpaceBuilder — streaming builder API for driver-driven address-space construction

Plus DTOs: DriverDataType, SecurityClassification (mirroring v1 Galaxy model), DriverAttributeInfo (replaces Galaxy-specific GalaxyAttributeInfo per plan §5a), DriverHealth + DriverState, DataValueSnapshot (universal OPC UA quality + timestamp carrier per decision #13), HostConnectivityStatus + HostState + HostStatusChangedEventArgs, RediscoveryEventArgs, DataChangeEventArgs, AlarmEventArgs + AlarmAcknowledgeRequest + AlarmSeverity, WriteRequest + WriteResult, HistoryReadResult + HistoryAggregateType, ISubscriptionHandle + IAlarmSubscriptionHandle + IVariableHandle.

DriverTypeRegistry singleton with Register / Get / TryGet / All; thread-safe via Interlocked.Exchange snapshot replacement on registration; case-insensitive lookups; rejects duplicate registrations; rejects empty type names. DriverTypeMetadata record carries TypeName + AllowedNamespaceKinds (NamespaceKindCompatibility flags enum per decision #111) + per-config-tier JSON Schemas the validator checks at draft-publish time (decision #91).

Tests project tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests (xUnit v3 1.1.0 matching existing test projects). 24 tests covering: 1) interface independence reflection check (no references outside BCL/System; all public types in root namespace; every capability interface is public); 2) DriverTypeRegistry round-trip, case-insensitive lookups, KeyNotFoundException on unknown, null on TryGet of unknown, InvalidOperationException on duplicate registration (case-insensitive too), All() enumeration, NamespaceKindCompatibility bitmask combinations, ArgumentException on empty type names.

Build: 0 errors, 4 warnings (only pre-existing transitive package vulnerability + analyzer hints). Full test suite: 845 passing / 1 failing — strict improvement over Phase 0 baseline (821/1) by the 24 new Core.Abstractions tests; no regressions in any other test project.

Phase 1 entry-gate record (docs/v2/implementation/entry-gate-phase-1.md) documents the deviation: only Stream A executed in this continuation since Streams B-E need SQL Server / GLAuth / Galaxy infrastructure standup per dev-environment.md Step 1, which is currently TODO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:15:55 -04:00
Joseph Doherty
3b2defd94f Phase 0 — mechanical rename ZB.MOM.WW.LmxOpcUa.* → ZB.MOM.WW.OtOpcUa.*
Renames all 11 projects (5 src + 6 tests), the .slnx solution file, all source-file namespaces, all axaml namespace references, and all v1 documentation references in CLAUDE.md and docs/*.md (excluding docs/v2/ which is already in OtOpcUa form). Also updates the TopShelf service registration name from "LmxOpcUa" to "OtOpcUa" per Phase 0 Task 0.6.

Preserves runtime identifiers per Phase 0 Out-of-Scope rules to avoid breaking v1/v2 client trust during coexistence: OPC UA `ApplicationUri` defaults (`urn:{GalaxyName}:LmxOpcUa`), server `EndpointPath` (`/LmxOpcUa`), `ServerName` default (feeds cert subject CN), `MxAccessConfiguration.ClientName` default (defensive — stays "LmxOpcUa" for MxAccess audit-trail consistency), client OPC UA identifiers (`ApplicationName = "LmxOpcUaClient"`, `ApplicationUri = "urn:localhost:LmxOpcUaClient"`, cert directory `%LocalAppData%\LmxOpcUaClient\pki\`), and the `LmxOpcUaServer` class name (class rename out of Phase 0 scope per Task 0.5 sed pattern; happens in Phase 1 alongside `LmxNodeManager → GenericDriverNodeManager` Core extraction). 23 LmxOpcUa references retained, all enumerated and justified in `docs/v2/implementation/exit-gate-phase-0.md`.

Build clean: 0 errors, 30 warnings (lower than baseline 167). Tests at strict improvement over baseline: 821 passing / 1 failing vs baseline 820 / 2 (one flaky pre-existing failure passed this run; the other still fails — both pre-existing and unrelated to the rename). `Client.UI.Tests`, `Historian.Aveva.Tests`, `Client.Shared.Tests`, `IntegrationTests` all match baseline exactly. Exit gate compliance results recorded in `docs/v2/implementation/exit-gate-phase-0.md` with all 7 checks PASS or DEFERRED-to-PR-review (#7 service install verification needs Windows service permissions on the reviewer's box).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:57:47 -04:00
Joseph Doherty
bc282b6788 Add Galaxy platform scope filter so multi-node deployments can restrict the OPC UA address space to only objects hosted by the local platform, reducing memory footprint and MXAccess subscription count from the full Galaxy (49 objects / 4206 attributes) down to the local subtree (3 objects / 386 attributes on the dev Galaxy).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:39:11 -04:00
Joseph Doherty
c76ab8fdee Close all four stability-review 2026-04-13 findings so a failed runtime probe subscription can no longer leave a phantom entry that Tick() flips to Stopped and fans out false BadOutOfService quality across a host's subtree, a silently-failed dashboard bind no longer lets the service advertise a successful start while an operator-visible endpoint is dead, the seven sync-over-async sites in LmxNodeManager (rebuild probe sync, Read, Write, four HistoryRead overrides) can no longer park the OPC UA stack thread indefinitely on a hung backend, and alarm auto-subscribe + transferred-subscription restore no longer race shutdown as untracked fire-and-forget tasks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:48:07 -04:00
Joseph Doherty
731092595f Stop MxAccess from overwriting Bad quality on stopped-host variables: suppress pending data changes at dispatch, guard cross-host clear from wiping sibling state, and silence the Unknown→Running startup callback so recovering DevPlatform can no longer reset variables that a still-stopped DevAppEngine marked Bad.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:22:28 -04:00
Joseph Doherty
98ed6bd47b Stop OPC UA Read requests from serving stale Good-quality cached values while a Galaxy runtime host is Stopped, and defer probe-transition callbacks through a dispatch-thread queue so MarkHostVariablesBadQuality can no longer deadlock against worker threads waiting on the MxAccess STA thread
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:20:01 -04:00
Joseph Doherty
9d49cdcc58 Track Galaxy Platform and AppEngine runtime state via ScanState probes and proactively invalidate descendant variable quality on Stopped transitions so operators can detect a stopped runtime host before downstream clients read stale data and so the bridge delivers a uniform bad-quality signal instead of relying on MxAccess per-tag fan-out
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:44 -04:00
Joseph Doherty
8f340553d9 Instrument the historian plugin with runtime query health counters and read-only cluster failover so operators can detect silent query degradation and keep serving history when a single cluster node goes down
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:08:32 -04:00
Joseph Doherty
4fe37fd1b7 Promote service version into the dashboard title and surface the active alarm filter patterns in the Alarms panel so operators can verify scope at a glance without reading logs or the footer block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:05:47 -04:00
Joseph Doherty
517d92c76f Scope alarm tracking to selected templates and surface endpoint/security state on the dashboard so operators can deploy in large galaxies without drowning clients in irrelevant alarms or guessing what the server is advertising
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:48:57 -04:00
Joseph Doherty
c5ed5312a9 Surface historian plugin and alarm-tracking health in the status dashboard so operators can detect misconfiguration and runtime degradation that previously showed as fully healthy
Wraps the 4 HistoryRead overrides and OnAlarmAcknowledge with PerformanceMetrics.BeginOperation, adds alarm counters to LmxNodeManager, publishes a structured HistorianPluginOutcome from HistorianPluginLoader, and extends HealthCheckService with plugin-load, history-read, and alarm-ack-failure degradation rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:52:03 -04:00
Joseph Doherty
9b42b61eb6 Extract historian into a runtime-loaded plugin so hosts without the Wonderware SDK can run with Historian.Enabled=false
The aahClientManaged SDK is now isolated in ZB.MOM.WW.LmxOpcUa.Historian.Aveva and loaded via HistorianPluginLoader from a Historian/ subfolder only when enabled, removing the SDK from Host's compile-time and deploy-time surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:16:07 -04:00
Joseph Doherty
9e1a180ce3 Resolve blocking I/O finding and complete Historian lifecycle test coverage
Move subscribe/unsubscribe I/O outside lock(Lock) in SyncAddressSpace to avoid
blocking all OPC UA operations during rebuilds. Replace blocking ReadAsync calls
for alarm priority/description in dispatch loop with cached subscription values.
Extract IHistorianConnectionFactory so EnsureConnected can be tested without the
SDK runtime — adds 5 connection lifecycle tests (failure, timeout, reconnect,
state resilience, dispose-after-failure). All stability review findings and test
coverage gaps are now fully resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:16:03 -04:00
Joseph Doherty
95ad9c6866 Resolve 6 of 7 stability review findings and close test coverage gaps
Fixes P1 StaComThread hang (crash-path faulting via WorkItem queue), P1 subscription
fire-and-forget (block+log or ContinueWith on 5 call sites), P2 continuation point
leak (PurgeExpired on Retrieve/Release), P2 dashboard bind failure (localhost prefix,
bool Start), P3 background loop double-start (task handles + join on stop in 3 files),
and P3 config logging exposure (SqlConnectionStringBuilder password masking). Adds
FakeMxAccessClient fault injection and 12 new tests. Documents required runtime
assemblies in ServiceHosting.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:37:27 -04:00
Joseph Doherty
6d47687573 Resolve DA, A&C, and security spec gaps with ServerCapabilities, alarm methods, and modern profiles
Add ServerCapabilities/OperationLimits node, enable diagnostics, add OnModifyMonitoredItemsComplete
override for DA compliance. Wire shelving, enable/disable, confirm, and addcomment handlers on
alarm conditions with LocalTime/Quality event fields for Part 9 compliance. Add Aes128/Aes256
security profiles, X.509 certificate authentication, and AUDIT-prefixed auth logging. Fix flaky
probe monitor test. Update docs for all changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:02:05 -04:00
Joseph Doherty
41f0e9ec4c Migrate historian from SQL to aahClientManaged SDK and resolve all OPC UA Part 11 gaps
Replace direct SQL queries against Historian Runtime database with the Wonderware
Historian managed SDK (ArchestrA.HistorianAccess). Add HistoryServerCapabilities node,
AggregateFunctions folder, continuation points, ReadAtTime interpolation, ReturnBounds,
ReadModified rejection, HistoricalDataConfiguration per node, historical event access,
and client-side StandardDeviation aggregate support. Remove screenshot tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:38:00 -04:00
Joseph Doherty
5c89a44255 Document client stack XML docs progress 2026-04-01 08:58:17 -04:00
Joseph Doherty
188cbf7d24 Add UI features, alarm ack, historian UTC fix, and Client.UI documentation
Major changes across the client stack:
- Settings persistence (connection, subscriptions, alarm source)
- Deferred OPC UA SDK init for instant startup
- Array/status code formatting, write value popup, alarm acknowledgment
- Severity-colored alarm rows, condition dedup on server side
- DateTimeRangePicker control with preset buttons and UTC text input
- Historian queries use wwTimezone=UTC and OPCQuality column
- Recursive subscribe from tree, multi-select remove
- Connection panel with expander, folder chooser for cert path
- Dynamic tab headers showing subscription/alarm counts
- Client.UI.md documentation with headless-rendered screenshots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 20:46:45 -04:00
Joseph Doherty
41a6b66943 Apply code style formatting and restore partial modifiers on Avalonia views
Linter/formatter pass across the full codebase. Restores required partial
keyword on AXAML code-behind classes that the formatter incorrectly removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:58:13 -04:00
Joseph Doherty
55ef854612 Add tree context menu, missing connection settings, and fix lazy-load browse
Right-click on browse tree nodes to Subscribe (multi-select) or View History
(Variable nodes only), with automatic tab switching. Add missing UI controls
for failover URLs, session timeout, auto-accept certificates, and certificate
store path. Fix tree expansion by adding two-way IsExpanded binding on
TreeViewItem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 17:45:29 -04:00
Joseph Doherty
a2883b82d9 Add cross-platform OPC UA client stack: shared library, CLI tool, and Avalonia UI
Implements Client.Shared (IOpcUaClientService with connection lifecycle, failover,
browse, read/write, subscriptions, alarms, history, redundancy), Client.CLI (8 CliFx
commands mirroring tools/opcuacli-dotnet), and Client.UI (Avalonia desktop app with
tree browser, read/write, subscriptions, alarms, and history tabs). All three target
.NET 10 and are covered by 249 unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 15:49:42 -04:00
Joseph Doherty
50b85d41bd Consolidate LDAP roles into OPC UA session roles with granular write permissions
Map LDAP groups to custom OPC UA role NodeIds on RoleBasedIdentity.GrantedRoleIds
during authentication, replacing the username-to-role side cache. Split ReadWrite
into WriteOperate/WriteTune/WriteConfigure so write access is gated per Galaxy
security classification. AnonymousCanWrite now behaves consistently regardless
of LDAP state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 01:50:16 -04:00
Joseph Doherty
d9463d6998 Remove static Users auth, use shared QualityMapper for historian, simplify LDAP permission checks
- Remove ConfigUserAuthenticationProvider and Users property — LDAP is the only auth mechanism
- Fix historian quality mapping to use existing QualityMapper (OPC DA quality bytes, not custom mapping)
- Add AppRoles constants, unify HasWritePermission/HasAlarmAckPermission into shared HasRole helper
- Hoist write permission check out of per-item loop, eliminate redundant _ldapRolesEnabled field
- Update docs (Configuration.md, Security.md, OpcUaServer.md, HistoricalDataAccess.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:23:20 -04:00
Joseph Doherty
74107ea95e Add LDAP authentication with role-based OPC UA permissions
Replace static user list with GLAuth LDAP authentication. Group
membership (ReadOnly, ReadWrite, AlarmAck) maps to granular OPC UA
permissions for write and alarm-ack operations. Anonymous can still
browse and read but not write.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:57:30 -04:00
Joseph Doherty
9d3599fbb6 Add rich HTTP health endpoints for cluster monitoring
Enhance /api/health with component-level health, ServiceLevel, and
redundancy state for load balancer probes. Add /health HTML page for
operators to monitor node health in clustered System Platform deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:44:31 -04:00
Joseph Doherty
a55153d7d5 Add configurable non-transparent OPC UA server redundancy
Separates ApplicationUri from namespace identity so each instance in a
redundant pair has a unique server URI while sharing the same Galaxy
namespace. Exposes RedundancySupport, ServerUriArray, and dynamic
ServiceLevel through the standard OPC UA server object. ServiceLevel
is computed from role (Primary/Secondary) and runtime health (MXAccess
and DB connectivity). Adds CLI redundancy command, second deployed
service instance, and 31 new tests including paired-server integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:32:17 -04:00
Joseph Doherty
55173665b1 Add configurable transport security profiles and bind address
Adds Security section to appsettings.json with configurable OPC UA
transport profiles (None, Basic256Sha256-Sign, Basic256Sha256-SignAndEncrypt),
certificate policy settings, and a configurable BindAddress for the
OPC UA endpoint. Defaults preserve backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:59:43 -04:00
Joseph Doherty
bbd043e97b Add authentication and role-based write access control
Implements configurable user authentication (anonymous + username/password)
with pluggable credential provider (IUserAuthenticationProvider). Anonymous
writes can be disabled via AnonymousCanWrite setting while reads remain
open. Adds -U/-P flags to all CLI commands for authenticated sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:14:37 -04:00
Joseph Doherty
9368767b1b Add alarm acknowledge plan and incorporate code review fixes
Adds alarm_ack.md documenting the two-way acknowledge flow (OPC UA client
writes AckMsg, Galaxy confirms via Acked data change). Includes external
code review fixes for subscriptions and node manager, and removes stale
plan files now superseded by component documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:02:47 -04:00
Joseph Doherty
ce0b291664 Refine XML docs for historian, OPC UA, and tests 2026-03-26 15:33:14 -04:00
Joseph Doherty
3c326e2d45 Replace full address space rebuild with incremental subtree sync
On Galaxy deploy changes, only the affected gobject subtrees are torn down
and rebuilt instead of destroying the entire address space. Unchanged nodes,
subscriptions, and alarm tracking continue uninterrupted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:23:11 -04:00
Joseph Doherty
415e62c585 Add security classification, alarm detection, historical data access, and primitive grouping
Wire Galaxy security_classification to OPC UA AccessLevel (ReadOnly for SecuredWrite/VerifiedWrite/ViewOnly).
Use deployed package chain for attribute queries to exclude undeployed attributes.
Group primitive attributes under their parent variable node (merged Variable+Object).
Add is_historized and is_alarm detection via HistoryExtension/AlarmExtension primitives.
Implement OPC UA HistoryRead backed by Wonderware Historian Runtime database.
Implement AlarmConditionState nodes driven by InAlarm with condition refresh support.
Add historyread and alarms CLI commands for testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:32:33 -04:00
Joseph Doherty
bb0a89b2a1 Publish default values for null static arrays 2026-03-25 14:12:37 -04:00
Joseph Doherty
ed42b33512 Use bracketless OPC UA node IDs for arrays 2026-03-25 12:57:05 -04:00
Joseph Doherty
4833765606 Expand XML docs across bridge and test code 2026-03-25 11:45:12 -04:00
Joseph Doherty
3f813b3869 Add OPC UA array element write integration test 2026-03-25 11:05:04 -04:00
Joseph Doherty
09ed15bdda Fix second-pass review findings: subscription leak on rebuild, metrics accuracy, and MxAccess startup recovery
- Preserve and replay subscription ref counts across address space rebuilds to prevent MXAccess subscription leaks
- Mark read timeouts and write failures as unsuccessful in PerformanceMetrics for accurate health reporting
- Add deferred MxAccess reconnect path when initial connection fails at startup
- Update code review document with verified completions and new findings
- Add covering tests for all fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 09:41:12 -04:00
Joseph Doherty
ee7e190fab Add multi-client subscription sync and concurrency integration tests
9 tests verifying server handles multiple simultaneous OPC UA clients:

Subscription sync:
- 3 clients subscribe to same tag, all receive data changes
- Client disconnect doesn't affect other clients' subscriptions
- Client unsubscribe doesn't affect other clients' subscriptions
- Clients subscribing to different tags receive only their own data

Concurrency:
- 5 clients browse simultaneously, all get identical results
- 5 clients browse different nodes concurrently, all succeed
- 4 clients browse+subscribe simultaneously, no interference
- 3 clients subscribe+browse concurrently, no deadlock (timeout-guarded)
- Rapid connect/disconnect cycles (10x), server stays stable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 06:54:47 -04:00
Joseph Doherty
e4aaee10f7 Add runtime address space rebuild integration tests
Tests verify nodes can be added/removed from the OPC UA server at
runtime by mutating FakeGalaxyRepository and triggering a rebuild.
Uses real OPC UA client sessions to browse, subscribe, and observe
changes.

Tests cover:
- Browse initial hierarchy via OPC UA client
- Add object at runtime → new node appears on browse
- Remove object → node disappears from browse
- Subscribe to node, then remove it → publishes Bad quality
- Surviving nodes still browsable after partial rebuild
- Add/remove individual attributes at runtime

Infrastructure:
- OpcUaTestClient helper for programmatic OPC UA client connections
- OpcUaServerFixture updated with GalaxyRepository/MxProxy accessors
- OpcUaService.TriggerRebuild() exposed for test-driven rebuilds
- Namespace index resolved dynamically via session namespace table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 06:32:31 -04:00
Joseph Doherty
44177acf64 Add integration test harness: OpcUaServiceBuilder + OpcUaServerFixture
OpcUaServiceBuilder provides fluent API for constructing OpcUaService
with dependency overrides (IMxProxy, IGalaxyRepository, IMxAccessClient).
WithMxAccessClient skips the STA thread and COM interop entirely.

OpcUaServerFixture wraps the service lifecycle with automatic port
allocation (atomic counter starting at 16000), guaranteed cleanup via
IAsyncLifetime, and factory methods for common test scenarios:
- WithFakes() — FakeMxProxy + FakeGalaxyRepository with standard data
- WithFakeMxAccessClient() — bypasses COM, fastest for most tests

Also adds TestData helper with reusable hierarchy/attributes matching
gr/layout.md, and 5 fixture tests verifying startup, shutdown, port
isolation, and address space building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 06:22:31 -04:00
Joseph Doherty
a0edac81fb Add probe/stale tag monitoring tests
Cover the full probe staleness detection path that was previously
untested: stale probe forces reconnect, data changes prevent false
staleness, no-probe config skips staleness check, probe tag subscribed
on connect and protected from unsubscribe.

5 new tests, 184 total passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 06:10:49 -04:00
Joseph Doherty
72d7a21a9d Add ExtendedAttributes config toggle for system+user attributes
When GalaxyRepository.ExtendedAttributes is true, uses the extended
attributes query that includes both primitive (system) and dynamic
(user-defined) attributes. Default is false (dynamic only, preserving
existing behavior). Extended mode returns ~564 attributes vs ~48.

Adds PrimitiveName and AttributeSource fields to GalaxyAttributeInfo.
Includes 5 new unit tests and 6 new integration tests covering both
standard and extended attribute modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 06:05:55 -04:00
Joseph Doherty
a7576ffb38 Implement LmxOpcUa server — all 6 phases complete
Full OPC UA server on .NET Framework 4.8 (x86) exposing AVEVA System
Platform Galaxy tags via MXAccess. Mirrors Galaxy object hierarchy as
OPC UA address space, translating contained-name browse paths to
tag-name runtime references.

Components implemented:
- Configuration: AppConfiguration with 4 sections, validator
- Domain: ConnectionState, Quality, Vtq, MxDataTypeMapper, error codes
- MxAccess: StaComThread, MxAccessClient (partial classes), MxProxyAdapter
  using strongly-typed ArchestrA.MxAccess COM interop
- Galaxy Repository: SQL queries (hierarchy, attributes, change detection),
  ChangeDetectionService with auto-rebuild on deploy
- OPC UA Server: LmxNodeManager (CustomNodeManager2), LmxOpcUaServer,
  OpcUaServerHost with programmatic config, SecurityPolicy None
- Status Dashboard: HTTP server with HTML/JSON/health endpoints
- Integration: Full 14-step startup, graceful shutdown, component wiring

175 tests (174 unit + 1 integration), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 05:55:27 -04:00