Commit Graph

35 Commits

Author SHA1 Message Date
Joseph Doherty
c76ab8fdee Close all four findings from the 2026-04-13 stability review
- A failed runtime probe subscription can no longer leave a phantom entry that Tick() flips to Stopped, fanning out false BadOutOfService quality across a host's subtree
- A silently failed dashboard bind no longer lets the service advertise a successful start while an operator-visible endpoint is dead
- The seven sync-over-async sites in LmxNodeManager (rebuild probe sync, Read, Write, four HistoryRead overrides) can no longer park the OPC UA stack thread indefinitely on a hung backend
- Alarm auto-subscribe and transferred-subscription restore no longer race shutdown as untracked fire-and-forget tasks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:48:07 -04:00
Joseph Doherty
731092595f Stop MxAccess from overwriting Bad quality on stopped-host variables: suppress pending data changes at dispatch, guard cross-host clear from wiping sibling state, and silence the Unknown→Running startup callback so recovering DevPlatform can no longer reset variables that a still-stopped DevAppEngine marked Bad.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:22:28 -04:00
Joseph Doherty
4b209f64bb Expose per-host runtime status as synthetic OPC UA variables so clients can observe Platform/Engine ScanState transitions without the status dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:07:16 -04:00
Joseph Doherty
98ed6bd47b Stop OPC UA Read requests from serving stale Good-quality cached values while a Galaxy runtime host is Stopped, and defer probe-transition callbacks through a dispatch-thread queue so MarkHostVariablesBadQuality can no longer deadlock against worker threads waiting on the MxAccess STA thread
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:20:01 -04:00
Joseph Doherty
9d49cdcc58 Track Galaxy Platform and AppEngine runtime state via ScanState probes and proactively invalidate descendant variable quality on Stopped transitions so operators can detect a stopped runtime host before downstream clients read stale data and so the bridge delivers a uniform bad-quality signal instead of relying on MxAccess per-tag fan-out
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:44 -04:00
Joseph Doherty
8f340553d9 Instrument the historian plugin with runtime query health counters and read-only cluster failover so operators can detect silent query degradation and keep serving history when a single cluster node goes down
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:08:32 -04:00
Joseph Doherty
4fe37fd1b7 Promote service version into the dashboard title and surface the active alarm filter patterns in the Alarms panel so operators can verify scope at a glance without reading logs or the footer block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:05:47 -04:00
Joseph Doherty
517d92c76f Scope alarm tracking to selected templates and surface endpoint/security state on the dashboard so operators can deploy in large galaxies without drowning clients in irrelevant alarms or guessing what the server is advertising
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:48:57 -04:00
Joseph Doherty
c5ed5312a9 Surface historian plugin and alarm-tracking health in the status dashboard so operators can detect misconfiguration and runtime degradation that previously showed as fully healthy
Wraps the 4 HistoryRead overrides and OnAlarmAcknowledge with PerformanceMetrics.BeginOperation, adds alarm counters to LmxNodeManager, publishes a structured HistorianPluginOutcome from HistorianPluginLoader, and extends HealthCheckService with plugin-load, history-read, and alarm-ack-failure degradation rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:52:03 -04:00
Joseph Doherty
9b42b61eb6 Extract historian into a runtime-loaded plugin so hosts without the Wonderware SDK can run with Historian.Enabled=false
The aahClientManaged SDK is now isolated in ZB.MOM.WW.LmxOpcUa.Historian.Aveva and loaded via HistorianPluginLoader from a Historian/ subfolder only when enabled, removing the SDK from Host's compile-time and deploy-time surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:16:07 -04:00
Joseph Doherty
9e1a180ce3 Resolve blocking I/O finding and complete Historian lifecycle test coverage
Move subscribe/unsubscribe I/O outside lock(Lock) in SyncAddressSpace to avoid
blocking all OPC UA operations during rebuilds. Replace blocking ReadAsync calls
for alarm priority/description in dispatch loop with cached subscription values.
Extract IHistorianConnectionFactory so EnsureConnected can be tested without the
SDK runtime — adds 5 connection lifecycle tests (failure, timeout, reconnect,
state resilience, dispose-after-failure). All stability review findings and test
coverage gaps are now fully resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:16:03 -04:00
Joseph Doherty
95ad9c6866 Resolve 6 of 7 stability review findings and close test coverage gaps
Fixes P1 StaComThread hang (crash-path faulting via WorkItem queue), P1 subscription
fire-and-forget (block+log or ContinueWith on 5 call sites), P2 continuation point
leak (PurgeExpired on Retrieve/Release), P2 dashboard bind failure (localhost prefix,
bool Start), P3 background loop double-start (task handles + join on stop in 3 files),
and P3 config logging exposure (SqlConnectionStringBuilder password masking). Adds
FakeMxAccessClient fault injection and 12 new tests. Documents required runtime
assemblies in ServiceHosting.md.
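
The continuation point fix named above (PurgeExpired on Retrieve/Release) can be illustrated with a minimal Python sketch. It is not the project's C# code: the class name, the TTL parameter, and the injectable clock are hypothetical, and it only assumes that expired entries are swept whenever a point is retrieved or released.

```python
import time


class ContinuationPointStore:
    """Hypothetical store: expired entries are purged on every retrieve/release."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._items = {}             # cp_id -> (expires_at, state)

    def store(self, cp_id, state):
        self._items[cp_id] = (self._clock() + self._ttl, state)

    def purge_expired(self):
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._items.items() if expires_at <= now]
        for cp_id in expired:
            del self._items[cp_id]

    def retrieve(self, cp_id):
        self.purge_expired()         # abandoned points cannot accumulate
        entry = self._items.pop(cp_id, None)
        return None if entry is None else entry[1]

    def release(self, cp_id):
        self.purge_expired()
        self._items.pop(cp_id, None)

    def __len__(self):
        return len(self._items)
```

Sweeping on the existing access paths avoids a dedicated timer thread, at the cost that nothing is purged while the store is idle.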

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:37:27 -04:00
Joseph Doherty
6d47687573 Resolve DA, A&C, and security spec gaps with ServerCapabilities, alarm methods, and modern profiles
Add ServerCapabilities/OperationLimits node, enable diagnostics, add OnModifyMonitoredItemsComplete
override for DA compliance. Wire shelving, enable/disable, confirm, and AddComment handlers on
alarm conditions with LocalTime/Quality event fields for Part 9 compliance. Add Aes128/Aes256
security profiles, X.509 certificate authentication, and AUDIT-prefixed auth logging. Fix flaky
probe monitor test. Update docs for all changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:02:05 -04:00
Joseph Doherty
41f0e9ec4c Migrate historian from SQL to aahClientManaged SDK and resolve all OPC UA Part 11 gaps
Replace direct SQL queries against Historian Runtime database with the Wonderware
Historian managed SDK (ArchestrA.HistorianAccess). Add HistoryServerCapabilities node,
AggregateFunctions folder, continuation points, ReadAtTime interpolation, ReturnBounds,
ReadModified rejection, HistoricalDataConfiguration per node, historical event access,
and client-side StandardDeviation aggregate support. Remove screenshot tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:38:00 -04:00
Joseph Doherty
188cbf7d24 Add UI features, alarm ack, historian UTC fix, and Client.UI documentation
Major changes across the client stack:
- Settings persistence (connection, subscriptions, alarm source)
- Deferred OPC UA SDK init for instant startup
- Array/status code formatting, write value popup, alarm acknowledgment
- Severity-colored alarm rows, condition dedup on server side
- DateTimeRangePicker control with preset buttons and UTC text input
- Historian queries use wwTimezone=UTC and OPCQuality column
- Recursive subscribe from tree, multi-select remove
- Connection panel with expander, folder chooser for cert path
- Dynamic tab headers showing subscription/alarm counts
- Client.UI.md documentation with headless-rendered screenshots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 20:46:45 -04:00
Joseph Doherty
41a6b66943 Apply code style formatting and restore partial modifiers on Avalonia views
Linter/formatter pass across the full codebase. Restores required partial
keyword on AXAML code-behind classes that the formatter incorrectly removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:58:13 -04:00
Joseph Doherty
50b85d41bd Consolidate LDAP roles into OPC UA session roles with granular write permissions
Map LDAP groups to custom OPC UA role NodeIds on RoleBasedIdentity.GrantedRoleIds
during authentication, replacing the username-to-role side cache. Split ReadWrite
into WriteOperate/WriteTune/WriteConfigure so write access is gated per Galaxy
security classification. AnonymousCanWrite now behaves consistently regardless
of LDAP state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 01:50:16 -04:00
Joseph Doherty
50b9603465 Propagate alarm events up the full notifier chain so subscribers at any ancestor see them
Previously alarms were only reported to the immediate parent node and the Server node.
Now ReportEventUpNotifierChain walks the full parent chain so clients subscribed at
TestArea see alarms from TestMachine_001, and EventNotifier is set on all ancestors
of alarm-containing nodes.
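
The parent-chain walk described above can be sketched in Python as follows. This is an illustration only, not the project's ReportEventUpNotifierChain: the Node class and its fields are hypothetical, and the sketch assumes each node holds a single parent reference.

```python
class Node:
    """Hypothetical address-space node with a single parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.received = []           # events reported at this node


def report_event_up_chain(source, event):
    """Report `event` at every ancestor of `source`, nearest ancestor first."""
    visited = []
    node = source.parent
    while node is not None:
        node.received.append(event)  # a subscriber at this level sees the event
        visited.append(node.name)
        node = node.parent
    return visited
```

With a Server > TestArea > TestMachine_001 hierarchy, an event raised at TestMachine_001 is recorded at both TestArea and Server.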

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 20:25:55 -04:00
Joseph Doherty
d9463d6998 Remove static Users auth, use shared QualityMapper for historian, simplify LDAP permission checks
- Remove ConfigUserAuthenticationProvider and Users property — LDAP is the only auth mechanism
- Fix historian quality mapping to use existing QualityMapper (OPC DA quality bytes, not custom mapping)
- Add AppRoles constants, unify HasWritePermission/HasAlarmAckPermission into shared HasRole helper
- Hoist write permission check out of per-item loop, eliminate redundant _ldapRolesEnabled field
- Update docs (Configuration.md, Security.md, OpcUaServer.md, HistoricalDataAccess.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:23:20 -04:00
Joseph Doherty
74107ea95e Add LDAP authentication with role-based OPC UA permissions
Replace static user list with GLAuth LDAP authentication. Group
membership (ReadOnly, ReadWrite, AlarmAck) maps to granular OPC UA
permissions for write and alarm-ack operations. Anonymous can still
browse and read but not write.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:57:30 -04:00
Joseph Doherty
bbd043e97b Add authentication and role-based write access control
Implements configurable user authentication (anonymous + username/password)
with pluggable credential provider (IUserAuthenticationProvider). Anonymous
writes can be disabled via AnonymousCanWrite setting while reads remain
open. Adds -U/-P flags to all CLI commands for authenticated sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:14:37 -04:00
Joseph Doherty
b27d355763 Fix alarm acknowledge EventId validation and add auth plan
Set a new EventId (GUID) on AlarmConditionState each time an alarm event
is reported so the framework can match it when clients call Acknowledge.
Without this, the framework rejected all ack attempts with BadEventIdUnknown.
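
A minimal Python sketch of why a fresh EventId per report matters. It is not the OPC UA SDK: the class, its method names, and the string status values are hypothetical stand-ins for the behavior described above.

```python
import uuid


class AlarmCondition:
    """Hypothetical condition that only accepts acks for its latest event."""

    def __init__(self):
        self.event_id = None         # nothing reported yet

    def report(self):
        # Assign a new 16-byte id on every report so a later ack can be matched.
        self.event_id = uuid.uuid4().bytes
        return self.event_id

    def acknowledge(self, event_id):
        if self.event_id is None or event_id != self.event_id:
            return "BadEventIdUnknown"
        return "Good"
```

If `report` never assigned an id, every `acknowledge` call would fall into the `BadEventIdUnknown` branch, which mirrors the failure the commit describes.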

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:39:21 -04:00
Joseph Doherty
9368767b1b Add alarm acknowledge plan and incorporate code review fixes
Adds alarm_ack.md documenting the two-way acknowledge flow (OPC UA client
writes AckMsg, Galaxy confirms via Acked data change). Includes external
code review fixes for subscriptions and node manager, and removes stale
plan files now superseded by component documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:02:47 -04:00
Joseph Doherty
ce0b291664 Refine XML docs for historian, OPC UA, and tests
2026-03-26 15:33:14 -04:00
Joseph Doherty
3c326e2d45 Replace full address space rebuild with incremental subtree sync
On Galaxy deploy changes, only the affected gobject subtrees are torn down
and rebuilt instead of destroying the entire address space. Unchanged nodes,
subscriptions, and alarm tracking continue uninterrupted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:23:11 -04:00
Joseph Doherty
bfd360a6db Add enable/disable configuration for alarm tracking and historian integration
Both features now default to disabled and require explicit opt-in via
OpcUa.AlarmTrackingEnabled and Historian.Enabled in appsettings.json,
preventing errors in environments without a Historian database or alarm setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 13:56:38 -04:00
Joseph Doherty
415e62c585 Add security classification, alarm detection, historical data access, and primitive grouping
Wire Galaxy security_classification to OPC UA AccessLevel (ReadOnly for SecuredWrite/VerifiedWrite/ViewOnly).
Use deployed package chain for attribute queries to exclude undeployed attributes.
Group primitive attributes under their parent variable node (merged Variable+Object).
Add is_historized and is_alarm detection via HistoryExtension/AlarmExtension primitives.
Implement OPC UA HistoryRead backed by Wonderware Historian Runtime database.
Implement AlarmConditionState nodes driven by InAlarm with condition refresh support.
Add historyread and alarms CLI commands for testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:32:33 -04:00
Joseph Doherty
bb0a89b2a1 Publish default values for null static arrays
2026-03-25 14:12:37 -04:00
Joseph Doherty
ed42b33512 Use bracketless OPC UA node IDs for arrays
2026-03-25 12:57:05 -04:00
Joseph Doherty
4833765606 Expand XML docs across bridge and test code
2026-03-25 11:45:12 -04:00
Joseph Doherty
3f813b3869 Add OPC UA array element write integration test
2026-03-25 11:05:04 -04:00
Joseph Doherty
4351854754 Fix service deployment: set working directory for correct log paths and use MasterNodeManager for Objects→ZB reference
Windows services default to System32 as working directory, causing logs to land in the wrong location. Set Environment.CurrentDirectory to AppDomain.CurrentDomain.BaseDirectory before Serilog init. Also fix ZB root folder not appearing under Objects folder — BuildAddressSpace runs after CreateAddressSpace completes so the externalReferences dict is already consumed; use Server.NodeManager.AddReferences instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 10:57:18 -04:00
Joseph Doherty
09ed15bdda Fix second-pass review findings: subscription leak on rebuild, metrics accuracy, and MxAccess startup recovery
- Preserve and replay subscription ref counts across address space rebuilds to prevent MXAccess subscription leaks
- Mark read timeouts and write failures as unsuccessful in PerformanceMetrics for accurate health reporting
- Add deferred MxAccess reconnect path when initial connection fails at startup
- Update code review document with verified completions and new findings
- Add covering tests for all fixes
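
The first item above (preserving and replaying subscription ref counts across a rebuild) might look like the following Python sketch. It is not the project's implementation: FakeBackend, SubscriptionTable, and the `surviving_tags` parameter are hypothetical, and the sketch assumes a rebuild drops every backend subscription before restoring the ones whose tags still exist.

```python
class FakeBackend:
    """Hypothetical stand-in for the MXAccess subscription layer."""

    def __init__(self):
        self.active = set()

    def subscribe(self, tag):
        self.active.add(tag)

    def unsubscribe(self, tag):
        self.active.discard(tag)


class SubscriptionTable:
    def __init__(self, backend):
        self.backend = backend
        self.ref_counts = {}         # tag -> number of monitored items

    def subscribe(self, tag):
        count = self.ref_counts.get(tag, 0)
        if count == 0:
            self.backend.subscribe(tag)      # first reference opens the subscription
        self.ref_counts[tag] = count + 1

    def unsubscribe(self, tag):
        count = self.ref_counts.get(tag, 0)
        if count == 0:
            return
        if count == 1:
            del self.ref_counts[tag]
            self.backend.unsubscribe(tag)    # last reference closes it
        else:
            self.ref_counts[tag] = count - 1

    def rebuild(self, surviving_tags):
        saved = dict(self.ref_counts)        # preserve counts before teardown
        for tag in saved:
            self.backend.unsubscribe(tag)
        self.ref_counts.clear()
        for tag, count in saved.items():     # replay counts for tags that still exist
            if tag in surviving_tags:
                self.backend.subscribe(tag)
                self.ref_counts[tag] = count
```

Because the counts survive the rebuild, the later unsubscribes from existing monitored items still balance out and the backend subscription is released exactly once.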

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 09:41:12 -04:00
Joseph Doherty
71254e005e Fix 5 code review findings (P1-P3)
P1: Wire OPC UA monitored items to MXAccess subscriptions
  - Override OnCreateMonitoredItemsComplete/OnDeleteMonitoredItemsComplete
    in LmxNodeManager to trigger ref-counted SubscribeTag/UnsubscribeTag
  - Clients subscribing to tags now start live MXAccess data pushes

P1: Write timeout now returns false instead of true
  - Previously a missing OnWriteComplete callback was treated as success
  - Now correctly reports failure so OPC UA clients see the error

P1: Auto-reconnect retries from Error state (not just Disconnected)
  - Monitor loop now checks both Disconnected and Error states
  - Prevents permanent outages after a single failed reconnect attempt

P2: Topological sort on hierarchy before building address space
  - Parents guaranteed to appear before children regardless of input order
  - Prevents misplaced nodes when SQL returns unsorted results

P3: Skip redundant first-poll rebuild on startup
  - ChangeDetectionService accepts initial deploy time from OpcUaService
  - First poll only triggers rebuild if deploy time is actually unknown
  - Eliminates duplicate DB fetch and address space rebuild at startup

All 212 tests pass (205 unit + 7 integration).
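
The P2 ordering guarantee above (parents before children regardless of input order) can be sketched in Python. This is an illustration, not the project's code: it assumes each hierarchy row is a hypothetical `(node_id, parent_id)` pair with `None` for roots, and it treats rows whose parent is absent from the input as roots.

```python
from collections import defaultdict, deque


def sort_parents_first(rows):
    """Return rows ordered so that every parent precedes its children."""
    rows = list(rows)
    ids = {node_id for node_id, _ in rows}
    children = defaultdict(list)
    queue = deque()
    for row in rows:
        node_id, parent_id = row
        if parent_id is None or parent_id not in ids:
            queue.append(row)                # root: nothing must come before it
        else:
            children[parent_id].append(row)

    ordered = []
    while queue:
        row = queue.popleft()
        ordered.append(row)
        queue.extend(children.pop(row[0], []))   # its children are now eligible
    if len(ordered) != len(rows):
        raise ValueError("cycle detected in hierarchy")
    return ordered
```

Rows that are never reached from a root indicate a cycle, so the sketch raises instead of silently dropping them.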

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 07:16:23 -04:00
Joseph Doherty
a7576ffb38 Implement LmxOpcUa server — all 6 phases complete
Full OPC UA server on .NET Framework 4.8 (x86) exposing AVEVA System
Platform Galaxy tags via MXAccess. Mirrors Galaxy object hierarchy as
OPC UA address space, translating contained-name browse paths to
tag-name runtime references.

Components implemented:
- Configuration: AppConfiguration with 4 sections, validator
- Domain: ConnectionState, Quality, Vtq, MxDataTypeMapper, error codes
- MxAccess: StaComThread, MxAccessClient (partial classes), MxProxyAdapter
  using strongly-typed ArchestrA.MxAccess COM interop
- Galaxy Repository: SQL queries (hierarchy, attributes, change detection),
  ChangeDetectionService with auto-rebuild on deploy
- OPC UA Server: LmxNodeManager (CustomNodeManager2), LmxOpcUaServer,
  OpcUaServerHost with programmatic config, SecurityPolicy None
- Status Dashboard: HTTP server with HTML/JSON/health endpoints
- Integration: Full 14-step startup, graceful shutdown, component wiring

175 tests (174 unit + 1 integration), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 05:55:27 -04:00