Service Update Summary
Updated service instance: C:\publish\lmxopcua\instance1
Update time: 2026-03-25 12:54-12:55 America/New_York
Backup created before deploy: C:\publish\lmxopcua\backups\20260325-125444
Configuration preserved:
- C:\publish\lmxopcua\instance1\appsettings.json was not overwritten.
Deployed binary:
- C:\publish\lmxopcua\instance1\ZB.MOM.WW.LmxOpcUa.Host.exe
- Last write time: 2026-03-25 12:53:58
- Size: 143360
Windows service:
- Name: LmxOpcUa
- Display name: LMX OPC UA Server
- Account: LocalSystem
- Status after update: Running
- Process ID after restart: 29236
Restart evidence:
- Service log file: C:\publish\lmxopcua\instance1\logs\lmxopcua-20260325_004.log
- Last startup line: 2026-03-25 12:55:08.619 -04:00 [INF] The LmxOpcUa service was started.
CLI Verification
Endpoint from deployed config:
opc.tcp://localhost:4840/LmxOpcUa
CLI used:
C:\Users\dohertj2\Desktop\lmxopcua\tools\opcuacli-dotnet\bin\Debug\net10.0\opcuacli-dotnet.exe
Commands run:
opcuacli-dotnet.exe connect -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe read -u opc.tcp://localhost:4840/LmxOpcUa -n 'ns=1;s=MESReceiver_001.MoveInPartNumbers'
opcuacli-dotnet.exe read -u opc.tcp://localhost:4840/LmxOpcUa -n 'ns=1;s=MESReceiver_001.MoveInPartNumbers[]'
Observed results:
- connect: succeeded, server reported as LmxOpcUa.
- read ns=1;s=MESReceiver_001.MoveInPartNumbers: succeeded with good status 0x00000000.
- read ns=1;s=MESReceiver_001.MoveInPartNumbers[]: failed with BadNodeIdUnknown (0x80340000).
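The two status codes above can be told apart without a lookup table, because an OPC UA StatusCode carries its severity in the two most significant bits (00 = Good, 01 = Uncertain, 10 = Bad). A minimal sketch in Python, used here only for illustration; status_severity is a hypothetical helper, not part of the CLI:

```python
def status_severity(code: int) -> str:
    """Classify an OPC UA StatusCode by its two most significant bits."""
    top_bits = (code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(top_bits, "Reserved")


# The two codes observed in the CLI verification above.
for node, code in [
    ("MESReceiver_001.MoveInPartNumbers", 0x00000000),
    ("MESReceiver_001.MoveInPartNumbers[]", 0x80340000),
]:
    print(f"{node}: 0x{code:08X} -> {status_severity(code)}")
```

The bracketed node id fails with a Bad severity code, which is consistent with the node no longer existing under that identifier.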
Instance 2 (Redundant Secondary)
Deployed: 2026-03-28
Deployment path: C:\publish\lmxopcua\instance2
Configuration:
- OpcUa.Port: 4841
- OpcUa.ServerName: LmxOpcUa2
- OpcUa.ApplicationUri: urn:localhost:LmxOpcUa:instance2
- Dashboard.Port: 8082
- MxAccess.ClientName: LmxOpcUa2
- Redundancy.Enabled: true
- Redundancy.Mode: Warm
- Redundancy.Role: Secondary
- Redundancy.ServerUris: ["urn:localhost:LmxOpcUa:instance1", "urn:localhost:LmxOpcUa:instance2"]
Windows service:
- Name: LmxOpcUa2
- Display name: LMX OPC UA Server (Instance 2)
- Account: LocalSystem
- Endpoint: opc.tcp://localhost:4841/LmxOpcUa
Instance 1 redundancy update (same date):
- OpcUa.ApplicationUri: urn:localhost:LmxOpcUa:instance1
- Redundancy.Enabled: true
- Redundancy.Mode: Warm
- Redundancy.Role: Primary
- Redundancy.ServerUris: ["urn:localhost:LmxOpcUa:instance1", "urn:localhost:LmxOpcUa:instance2"]
CLI verification:
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
→ Redundancy Mode: Warm, Service Level: 200, Application URI: urn:localhost:LmxOpcUa:instance1
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4841/LmxOpcUa
→ Redundancy Mode: Warm, Service Level: 150, Application URI: urn:localhost:LmxOpcUa:instance2
Both instances report the same ServerUriArray and expose the same Galaxy namespace (urn:ZB:LmxOpcUa).
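A redundancy-aware client would typically prefer whichever server reports the higher ServiceLevel. The sketch below shows that selection in Python using the values reported above. ServerStatus and pick_preferred are hypothetical names, and treating a ServiceLevel of 0 as unusable is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerStatus:
    endpoint: str
    application_uri: str
    service_level: int  # OPC UA ServiceLevel is a byte: 0-255, higher is better


def pick_preferred(servers):
    """Return the server with the highest ServiceLevel, or None if none is usable."""
    # Assumption for this sketch: a ServiceLevel of 0 means "do not use".
    candidates = [s for s in servers if s.service_level > 0]
    return max(candidates, key=lambda s: s.service_level, default=None)


servers = [
    ServerStatus("opc.tcp://localhost:4840/LmxOpcUa", "urn:localhost:LmxOpcUa:instance1", 200),
    ServerStatus("opc.tcp://localhost:4841/LmxOpcUa", "urn:localhost:LmxOpcUa:instance2", 150),
]
preferred = pick_preferred(servers)
print(preferred.endpoint)
```

With the reported levels (200 and 150), the primary on port 4840 is selected.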
LDAP Authentication Update
Updated: 2026-03-28
Both instances updated to use LDAP authentication via GLAuth.
Configuration changes (both instances):
- Authentication.AllowAnonymous: true (anonymous can browse/read)
- Authentication.AnonymousCanWrite: false (anonymous writes blocked)
- Authentication.Ldap.Enabled: true
- Authentication.Ldap.Host: localhost
- Authentication.Ldap.Port: 3893
- Authentication.Ldap.BaseDN: dc=lmxopcua,dc=local
LDAP server: GLAuth v2.4.0 at C:\publish\glauth\ (Windows service: GLAuth)
Permission verification (instance1, port 4840):
anonymous read → allowed
anonymous write → denied (BadUserAccessDenied)
readonly read → allowed
readonly write → denied (BadUserAccessDenied)
readwrite write → allowed
admin write → allowed
alarmack write → denied (BadUserAccessDenied)
bad password → denied (connection rejected)
Alarm Notifier Chain Update
Updated: 2026-03-28
Both instances updated with alarm event propagation up the notifier chain.
Code changes:
- Alarm events now walk up the parent chain (ReportEventUpNotifierChain), reporting to every ancestor node
- EventNotifier = SubscribeToEvents is set on all ancestors of alarm-containing nodes (EnableEventNotifierUpChain)
- Removed separate Server.ReportEvent call (no longer needed; the walk reaches the root)
No configuration changes required — alarm tracking was already enabled (AlarmTrackingEnabled: true).
Verification (instance1, port 4840):
alarms --node TestArea --refresh:
TestMachine_001.TestAlarm001 → visible (Severity=500, Retain=True)
TestMachine_001.TestAlarm002 → visible (Severity=500, Retain=True)
TestMachine_001.TestAlarm003 → visible (Severity=500, Retain=True)
TestMachine_002.TestAlarm001 → visible (Severity=500, Retain=True)
TestMachine_002.TestAlarm003 → visible (Severity=500, Retain=True)
alarms --node DEV --refresh:
Same 5 alarms visible at DEV (grandparent) level
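The parent-chain walk described above can be sketched as follows. This is an illustrative Python model, not the service code: Node and report_event_up_chain are hypothetical names, and the three-level hierarchy mirrors the DEV / TestArea / TestMachine_001 nodes used in the verification.

```python
class Node:
    """Hypothetical address-space node with a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.received = []  # events reported to this node


def report_event_up_chain(source, event):
    """Report the event to every ancestor of source, ending at the root."""
    visited = []
    node = source.parent
    while node is not None:
        node.received.append(event)
        visited.append(node.name)
        node = node.parent
    return visited


dev = Node("DEV")
area = Node("TestArea", dev)
machine = Node("TestMachine_001", area)
print(report_event_up_chain(machine, "TestAlarm001"))  # ['TestArea', 'DEV']
```

Because the loop only stops when it runs out of parents, a subscriber at any ancestor level sees the event, which matches the TestArea and DEV results above.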
Auth Consolidation Update
Updated: 2026-03-28
Both instances updated to consolidate LDAP roles into OPC UA session roles (RoleBasedIdentity.GrantedRoleIds).
Code changes:
- LDAP groups now map to custom OPC UA role NodeIds in the urn:zbmom:lmxopcua:roles namespace
- Roles stored on session identity via GrantedRoleIds; no username-to-role side cache
- Permission checks use GrantedRoleIds.Contains() instead of username extraction
- AnonymousCanWrite behavior is consistent regardless of LDAP state
- Galaxy namespace moved from ns=2 to ns=3 (roles namespace is ns=2)
No configuration changes required.
Verification (instance1, port 4840):
anonymous read → allowed
anonymous write → denied (BadUserAccessDenied, AnonymousCanWrite=false)
readonly write → denied (BadUserAccessDenied)
readwrite write → allowed
admin write → allowed
alarmack write → denied (BadUserAccessDenied)
bad password → rejected (connection failed)
Granular Write Roles Update
Updated: 2026-03-28
Both instances updated with granular write roles replacing the single ReadWrite role.
Code changes:
- ReadWrite role replaced by WriteOperate, WriteTune, WriteConfigure
- Write permission checks now consider the Galaxy security classification of the target attribute
- SecurityClassification stored in TagMetadata for per-node lookup at write time
GLAuth changes:
- New groups: WriteOperate (5502), WriteTune (5504), WriteConfigure (5505)
- New users: writeop, writetune, writeconfig
- admin user added to all groups (5502, 5503, 5504, 5505)
Config changes (both instances):
- Authentication.Ldap.ReadWriteGroup replaced by WriteOperateGroup, WriteTuneGroup, WriteConfigureGroup
Verification (instance1, port 4840, Operate-classified attributes):
anonymous read → allowed
anonymous write → denied (AnonymousCanWrite=false)
readonly write → denied (no write role)
writeop write → allowed (WriteOperate matches Operate classification)
writetune write → denied (WriteTune doesn't match Operate)
writeconfig write → denied (WriteConfigure doesn't match Operate)
admin write → allowed (has all write roles)
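The verification matrix above follows from a simple rule: a write is allowed only when the session holds the role that matches the attribute's security classification. A minimal Python sketch of that rule, under stated assumptions: can_write and REQUIRED_ROLE are hypothetical, the Tune and Configure classification names are inferred from the role names (only Operate is confirmed above), and anonymous sessions are modelled as a single flag.

```python
# Assumed mapping from Galaxy security classification to the write role it needs.
REQUIRED_ROLE = {
    "Operate": "WriteOperate",
    "Tune": "WriteTune",
    "Configure": "WriteConfigure",
}


def can_write(granted_roles, classification, is_anonymous=False, anonymous_can_write=False):
    """Return True when the session may write an attribute of this classification."""
    if is_anonymous:
        # Assumption: anonymous writes depend only on the AnonymousCanWrite flag.
        return anonymous_can_write
    required = REQUIRED_ROLE.get(classification)
    return required is not None and required in granted_roles


users = {
    "readonly": set(),
    "writeop": {"WriteOperate"},
    "writetune": {"WriteTune"},
    "writeconfig": {"WriteConfigure"},
    "admin": {"WriteOperate", "WriteTune", "WriteConfigure"},
}
for name, roles in users.items():
    print(name, "allowed" if can_write(roles, "Operate") else "denied")
```

For an Operate-classified attribute this reproduces the results above: only writeop and admin are allowed.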
Historian SDK Migration
Updated: 2026-04-06
Both instances updated to use the Wonderware Historian SDK (aahClientManaged.dll) instead of direct SQL queries for historical data access.
Code changes:
- HistorianDataSource rewritten from SqlConnection/SqlDataReader to ArchestrA.HistorianAccess SDK
- Persistent connection with lazy connect and auto-reconnect on failure
- HistorianConfiguration.ConnectionString replaced with ServerName, IntegratedSecurity, UserName, Password, Port
- HistorianDataSource now implements IDisposable, disposed on service shutdown
- ConfigurationValidator validates Historian SDK settings at startup
SDK DLLs deployed to both instances:
- aahClientManaged.dll (primary SDK, v2.0.0.0)
- aahClient.dll, aahClientCommon.dll (dependencies)
- Historian.CBE.dll, Historian.DPAPI.dll, ArchestrA.CloudHistorian.Contract.dll
Configuration changes (both instances):
- Historian.ConnectionString removed
- Historian.ServerName: "localhost"
- Historian.IntegratedSecurity: true
- Historian.Port: 32568
- Historian.Enabled: true (unchanged)
Verification (instance1 startup log):
Historian.Enabled=true, ServerName=localhost, IntegratedSecurity=true, Port=32568
Historian.CommandTimeoutSeconds=30, MaxValuesPerRead=10000
=== Configuration Valid ===
LmxOpcUa service started successfully
HistoryServerCapabilities and Continuation Points
Updated: 2026-04-06
Both instances updated with OPC UA Part 11 spec compliance improvements.
Code changes:
- HistoryServerCapabilities node populated under ServerCapabilities with all boolean capability properties
- AggregateFunctions folder populated with references to 7 supported aggregate functions
- HistoryContinuationPointManager added; stores remaining data when results exceed NumValuesPerNode
- HistoryReadRawModified and HistoryReadProcessed now return ContinuationPoint in HistoryReadResult for partial reads
- Follow-up requests with ContinuationPoint resume from stored state; invalid/expired points return BadContinuationPointInvalid
No configuration changes required.
Verification (instance1 startup log):
HistoryServerCapabilities configured with 7 aggregate functions
LmxOpcUa service started successfully
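The continuation point behavior described in this section can be illustrated with a small paging store. This is a Python sketch under stated assumptions, not the HistoryContinuationPointManager itself: ContinuationPointStore, the 300-second expiry, and the use of LookupError to stand in for BadContinuationPointInvalid are all hypothetical.

```python
import time
import uuid


class ContinuationPointStore:
    """Hypothetical stand-in for a history continuation point manager."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed expiry window
        self._clock = clock
        self._pending = {}  # token -> (expiry, remaining values)

    def _page(self, values, max_values):
        # NumValuesPerNode == 0 means "no limit" in OPC UA history reads.
        if max_values <= 0 or len(values) <= max_values:
            return values, None
        token = uuid.uuid4().bytes
        self._pending[token] = (self._clock() + self._ttl, values[max_values:])
        return values[:max_values], token

    def start(self, values, max_values):
        """First read: return one page plus a token if data remains."""
        return self._page(list(values), max_values)

    def resume(self, token, max_values):
        """Follow-up read: tokens are single-use and expire after the TTL."""
        entry = self._pending.pop(token, None)
        if entry is None or entry[0] < self._clock():
            raise LookupError("BadContinuationPointInvalid")
        return self._page(entry[1], max_values)


store = ContinuationPointStore()
page, token = store.start(range(5), max_values=2)
pages = [page]
while token is not None:
    page, token = store.resume(token, max_values=2)
    pages.append(page)
print(pages)  # [[0, 1], [2, 3], [4]]
```

Popping the token on resume makes each continuation point single-use, so a replayed or expired token falls through to the invalid-point error path.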
Remaining Historian Gaps Fix
Updated: 2026-04-06
Both instances updated with remaining OPC UA Part 11 spec compliance fixes.
Code changes:
- Gap 4: HistoryReadRawModified returns BadHistoryOperationUnsupported when IsReadModified=true
- Gap 5: HistoryReadAtTime override added with ReadAtTimeAsync using SDK HistorianRetrievalMode.Interpolated
- Gap 8: HistoricalDataConfigurationState child nodes added to historized variables (Stepped=false, Definition="Wonderware Historian")
- Gap 10: ReturnBounds parameter handled; boundary DataValue entries with BadBoundNotFound inserted at StartTime/EndTime
- Gap 11: StandardDeviation aggregate added to client enum, mapper, CLI (aliases: stddev/stdev), and UI dropdown
No configuration changes required.
Historical Event Access
Updated: 2026-04-06
Both instances updated with OPC UA historical event access (Gap 7).
Code changes:
- HistorianDataSource.ReadEventsAsync queries Historian event store via separate HistorianConnectionType.Event connection
- LmxNodeManager.HistoryReadEvents override maps HistorianEvent records to OPC UA HistoryEventFieldList entries
- AccessHistoryEventsCapability set to true when AlarmTrackingEnabled is true
- Event fields: EventId, EventType, SourceNode, SourceName, Time, ReceiveTime, Message, Severity
No configuration changes required. All historian gaps (1-11) are now resolved.
Data Access Gaps Fix
Updated: 2026-04-06
Both instances updated with OPC UA DA spec compliance fixes.
Code changes:
- ConfigureServerCapabilities() populates ServerCapabilities node: ServerProfileArray, LocaleIdArray, MinSupportedSampleRate, continuation point limits, array/string limits, and 12 OperationLimits values
- Server_ServerDiagnostics_EnabledFlag set to true; SDK auto-tracks session/subscription counts
- OnModifyMonitoredItemsComplete override logs monitored item modifications
No configuration changes required. All DA gaps (1-8) resolved.
Alarms & Conditions Gaps Fix
Updated: 2026-04-06
Both instances updated with OPC UA Part 9 alarm spec compliance fixes.
Code changes:
- Wired OnConfirm, OnAddComment, OnEnableDisable, OnShelve, OnTimedUnshelve handlers on each AlarmConditionState
- Shelving: SetShelvingState() manages TimedShelve, OneShotShelve, Unshelve state machine
- ReportAlarmEvent now populates LocalTime (timezone offset + DST) and Quality event fields
- Flaky Monitor_ProbeDataChange_PreventsStaleReconnect test fixed (increased stale threshold from 2s to 5s)
No configuration changes required. All A&C gaps (1-10) resolved.
Security Gaps Fix
Updated: 2026-04-06
Both instances updated with OPC UA Part 2/4/7 security spec compliance fixes.
Code changes:
- SecurityProfileResolver: Added 4 modern AES profiles (Aes128_Sha256_RsaOaep-Sign/SignAndEncrypt, Aes256_Sha256_RsaPss-Sign/SignAndEncrypt)
- OnImpersonateUser: Added X509IdentityToken handling with CN extraction and role assignment
- BuildUserTokenPolicies: Advertises UserTokenType.Certificate when non-None security profiles are configured
- OnCertificateValidation: Enhanced logging with certificate thumbprint, subject, and expiry
- Authentication audit logging: AUDIT: prefixed log entries for success/failure with session ID and roles
No configuration changes required. All security gaps (1-10) resolved.
Historian Plugin Runtime Load + Dashboard Health
Updated: 2026-04-12 18:47-18:49 America/New_York
Both instances updated to the latest build. Brings in the runtime-loaded Historian plugin (Historian/ subfolder next to the Host) and the status dashboard health surface for historian plugin + alarm-tracking misconfiguration.
Backups created before deploy:
- C:\publish\lmxopcua\backups\20260412-184713-instance1
- C:\publish\lmxopcua\backups\20260412-184713-instance2
Configuration preserved:
- C:\publish\lmxopcua\instance1\appsettings.json was not overwritten.
- C:\publish\lmxopcua\instance2\appsettings.json was not overwritten.
Layout change:
- Flat historian interop DLLs removed from each instance root (aahClient*.dll, ArchestrA.CloudHistorian.Contract.dll, Historian.CBE.dll, Historian.DPAPI.dll).
- Historian plugin + interop DLLs now live under <instance>\Historian\ (including ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll), loaded by HistorianPluginLoader.
Deployed binary (both instances):
- ZB.MOM.WW.LmxOpcUa.Host.exe
- Last write time: 2026-04-12 18:46:22 -04:00
- Size: 7938048
Windows services:
- LmxOpcUa: Running, PID 40176
- LmxOpcUa2: Running, PID 34400
Restart evidence (instance1 logs/lmxopcua-20260412.log):
2026-04-12 18:48:02.968 -04:00 [INF] Historian.Enabled=true, ServerName=localhost, IntegratedSecurity=true, Port=32568
2026-04-12 18:48:02.971 -04:00 [INF] === Configuration Valid ===
2026-04-12 18:48:09.658 -04:00 [INF] Historian plugin loaded from C:\publish\lmxopcua\instance1\Historian\ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll
2026-04-12 18:48:13.691 -04:00 [INF] LmxOpcUa service started successfully
Restart evidence (instance2 logs/lmxopcua-20260412.log):
2026-04-12 18:49:08.152 -04:00 [INF] Historian.Enabled=true, ServerName=localhost, IntegratedSecurity=true, Port=32568
2026-04-12 18:49:08.155 -04:00 [INF] === Configuration Valid ===
2026-04-12 18:49:14.744 -04:00 [INF] Historian plugin loaded from C:\publish\lmxopcua\instance2\Historian\ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll
2026-04-12 18:49:18.777 -04:00 [INF] LmxOpcUa service started successfully
CLI verification (via dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI):
connect opc.tcp://localhost:4840/LmxOpcUa → Server: LmxOpcUa
connect opc.tcp://localhost:4841/LmxOpcUa → Server: LmxOpcUa2
redundancy opc.tcp://localhost:4840/LmxOpcUa → Warm, ServiceLevel=200, urn:localhost:LmxOpcUa:instance1
redundancy opc.tcp://localhost:4841/LmxOpcUa → Warm, ServiceLevel=150, urn:localhost:LmxOpcUa:instance2
Both instances report the same ServerUriArray and the primary advertises the higher ServiceLevel, matching the prior redundancy baseline.
Endpoints Panel on Dashboard
Updated: 2026-04-13 08:46-08:50 America/New_York
Both instances updated with a new Endpoints panel on the status dashboard surfacing the opc.tcp base addresses, active OPC UA security profiles (mode + policy name + full URI), and user token policies.
Code changes:
- StatusData.cs: added EndpointsInfo / SecurityProfileInfo DTOs on StatusData.
- OpcUaServerHost.cs: added BaseAddresses, SecurityPolicies, UserTokenPolicies runtime accessors reading ApplicationConfiguration.ServerConfiguration live state.
- StatusReportService.cs: builds EndpointsInfo from the host and renders a new panel with a graceful empty state when the server is not started.
No configuration changes required.
Verification (instance1 @ http://localhost:8085/):
Base Addresses: opc.tcp://localhost:4840/LmxOpcUa
Security Profiles: None / None / http://opcfoundation.org/UA/SecurityPolicy#None
User Token Policies: Anonymous, UserName
Verification (instance2 @ http://localhost:8086/):
Base Addresses: opc.tcp://localhost:4841/LmxOpcUa
Security Profiles: None / None / http://opcfoundation.org/UA/SecurityPolicy#None
User Token Policies: Anonymous, UserName
Template-Based Alarm Object Filter
Updated: 2026-04-13 09:39-09:43 America/New_York
Both instances updated with a new configurable alarm object filter. When OpcUa.AlarmFilter.ObjectFilters is non-empty, only Galaxy objects whose template derivation chain matches a pattern (and their containment-tree descendants) contribute AlarmConditionState nodes. When the list is empty, the current unfiltered behavior is preserved (backward-compatible default).
Backups created before deploy:
- C:\publish\lmxopcua\backups\20260413-093900-instance1
- C:\publish\lmxopcua\backups\20260413-093900-instance2
Deployed binary (both instances):
- ZB.MOM.WW.LmxOpcUa.Host.exe
- Last write time: 2026-04-13 09:38:46 -04:00
- Size: 7951360
Windows services:
- LmxOpcUa: Running, PID 40900
- LmxOpcUa2: Running, PID 29936
Code changes:
- gr/queries/hierarchy.sql: added recursive CTE on gobject.derived_from_gobject_id and a new template_chain column (pipe-delimited, innermost template first).
- Domain/GalaxyObjectInfo.cs: added TemplateChain: List<string> populated from the new SQL column.
- GalaxyRepositoryService.cs: reads the new column and splits into TemplateChain.
- Configuration/AlarmFilterConfiguration.cs (new): List<string> ObjectFilters; entries may themselves be comma-separated. Attached to OpcUaConfiguration.AlarmFilter.
- Configuration/ConfigurationValidator.cs: logs the effective filter and warns if patterns are configured while AlarmTrackingEnabled == false.
- Domain/AlarmObjectFilter.cs (new): compiles wildcard patterns (* only) to case-insensitive regexes with Galaxy $ prefix normalized on both sides; walks the hierarchy top-down with cycle defense; returns a HashSet<int> of included gobject IDs plus UnmatchedPatterns for startup warnings.
- OpcUa/LmxNodeManager.cs: constructor accepts the filter; the two alarm-creation loops (BuildAddressSpace full build and the subtree rebuild path) both call ResolveAlarmFilterIncludedIds(sorted) and skip any object not in the resolved set. New public properties expose filter state to the dashboard: AlarmFilterEnabled, AlarmFilterPatternCount, AlarmFilterIncludedObjectCount.
- OpcUa/OpcUaServerHost.cs, OpcUa/LmxOpcUaServer.cs, OpcUaService.cs, OpcUaServiceBuilder.cs: plumbing to construct and thread the filter from appsettings.json down to the node manager.
- Status/StatusData.cs + Status/StatusReportService.cs: AlarmStatusInfo gains FilterEnabled, FilterPatternCount, FilterIncludedObjectCount; a filter summary line renders in the Alarms panel when the filter is active.
Tests:
- 36 new unit tests in tests/.../Domain/AlarmObjectFilterTests.cs covering pattern parsing, wildcard semantics, regex escaping, Galaxy $ normalization, template-chain matching, subtree propagation, set semantics, orphan/cycle defense, and UnmatchedPatterns tracking.
- 5 new integration tests in tests/.../Integration/AlarmObjectFilterIntegrationTests.cs spinning up a real LmxNodeManager via OpcUaServerFixture and asserting AlarmConditionCount / AlarmFilterIncludedObjectCount under various filters.
- 1 new Status test verifying JSON exposes the filter counters.
- Full suite: 446/446 tests passing (no regressions).
Configuration change: both instances have OpcUa.AlarmFilter.ObjectFilters: [] (filter disabled, unfiltered alarm tracking preserved).
Live verification against instance1 Galaxy (filter temporarily set to "TestMachine"):
2026-04-13 09:41:31 [INF] OpcUa.AlarmTrackingEnabled=true, AlarmFilter.ObjectFilters=[TestMachine]
2026-04-13 09:41:42 [INF] Alarm filter: 42 of 49 objects included (1 pattern(s))
Dashboard Alarms panel: Tracking: True | Conditions: 60 | Active: 4
Filter: 1 pattern(s), 42 object(s) included
Final configuration restored to empty filter. Dashboard confirms unfiltered behavior on both endpoints:
instance1 @ http://localhost:8085/ → Conditions: 60 | Active: 4 (no filter line)
instance2 @ http://localhost:8086/ → Conditions: 60 | Active: 4 (no filter line)
Filter syntax quick reference (documented in AlarmFilterConfiguration.cs XML-doc):
- * is the only wildcard (glob-style; zero or more characters).
- Matching is case-insensitive and ignores the Galaxy leading $ template prefix on both the pattern and the stored chain entry, so operators write TestMachine* not $TestMachine*.
- Each entry may contain comma-separated patterns for convenience (e.g., "TestMachine*, Pump_*").
- Empty list → filter disabled → current unfiltered behavior.
- Match semantics: an object is included when any template in its derivation chain matches any pattern, and the inclusion propagates to all descendants in the containment hierarchy. Each object is evaluated once regardless of how many patterns or ancestors match.
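The filter rules above can be sketched end to end in a few lines. This is an illustrative Python model, not AlarmObjectFilter.cs: compile_patterns and included_ids are hypothetical names, the object table is made-up sample data, and treating orphans as roots is an assumption of this sketch.

```python
import re


def compile_patterns(entries):
    """Turn filter entries (each possibly comma-separated) into anchored regexes."""
    compiled = []
    for entry in entries:
        for raw in entry.split(","):
            pattern = raw.strip().lstrip("$")  # ignore the Galaxy $ prefix
            if not pattern:
                continue
            # '*' is the only wildcard; every other character is matched literally.
            body = ".*".join(re.escape(part) for part in pattern.split("*"))
            compiled.append(re.compile(f"^{body}$", re.IGNORECASE))
    return compiled


def included_ids(objects, entries):
    """objects maps id -> (parent_id or None, pipe-delimited template chain)."""
    patterns = compile_patterns(entries)
    if not patterns:
        return None  # filter disabled: caller keeps unfiltered behavior

    children = {}
    for obj_id, (parent_id, _) in objects.items():
        children.setdefault(parent_id, []).append(obj_id)

    def matches(chain):
        templates = [t.strip().lstrip("$") for t in chain.split("|") if t.strip()]
        return any(p.match(t) for p in patterns for t in templates)

    # Assumption: objects whose parent is missing (orphans) are treated as roots.
    roots = [i for i, (p, _) in objects.items() if p is None or p not in objects]
    included, visited = set(), set()
    stack = [(root, False) for root in roots]
    while stack:
        obj_id, parent_included = stack.pop()
        if obj_id in visited:  # each object is evaluated once; also stops cycles
            continue
        visited.add(obj_id)
        keep = parent_included or matches(objects[obj_id][1])
        if keep:
            included.add(obj_id)
        stack.extend((child, keep) for child in children.get(obj_id, []))
    return included


objects = {
    1: (None, "$Area"),
    2: (1, "$TestMachine|$UserDefined"),
    3: (2, "$Sensor"),
    4: (1, "$Pump_A|$UserDefined"),
}
print(sorted(included_ids(objects, ["testmachine*"])))  # [2, 3]
```

Object 3 is included only because its container (object 2) matched, which is the subtree propagation described in the last bullet.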
Historian Runtime Health Surface
Updated: 2026-04-13 10:44-10:52 America/New_York
Both instances updated with runtime historian query instrumentation so the status dashboard can detect silent query degradation that the load-time PluginStatus cannot catch.
Backups:
- C:\publish\lmxopcua\backups\20260413-104406-instance1
- C:\publish\lmxopcua\backups\20260413-104406-instance2
Code changes:
- Host/Historian/HistorianHealthSnapshot.cs (new): DTO with TotalQueries, TotalSuccesses, TotalFailures, ConsecutiveFailures, LastSuccessTime, LastFailureTime, LastError, ProcessConnectionOpen, EventConnectionOpen.
- Host/Historian/IHistorianDataSource.cs: added GetHealthSnapshot() interface method.
- Historian.Aveva/HistorianDataSource.cs: added _healthLock-guarded counters, RecordSuccess() / RecordFailure(path) helpers called at every terminal site in all four read methods (raw, aggregate, at-time, events). Error messages carry a raw: / aggregate: / at-time: / events: prefix so operators can tell which SDK call is broken.
- Host/OpcUa/LmxNodeManager.cs: exposes HistorianHealth property that proxies to IHistorianDataSource.GetHealthSnapshot().
- Host/Status/StatusData.cs: added 9 new fields on HistorianStatusInfo.
- Host/Status/StatusReportService.cs: BuildHistorianStatusInfo() populates the new fields from the node manager; panel color gradient: green → yellow (1-4 consecutive failures) → red (≥5 consecutive or plugin unloaded). Renders "Queries: N (Success: X, Failure: Y) | Consecutive Failures: Z", "Process Conn: open/closed | Event Conn: open/closed", plus Last Success: / Last Failure: / Last Error: lines when applicable.
- Host/Status/HealthCheckService.cs: new Rule 2b2: Degraded when ConsecutiveFailures >= 3. Threshold chosen to avoid flagging single transient blips.
Tests:
- 5 new unit tests in HistorianDataSourceLifecycleTests covering fresh zero-state, single failure, multi-failure consecutive increment, cross-read-path counting, and error-message-carries-path.
- Full suite: 16/16 plugin tests, 447/447 host tests passing.
Live verification on instance1:
Before any query:
Queries: 0 (Success: 0, Failure: 0) | Process Conn: closed | Event Conn: closed
After TestMachine_001.TestHistoryValue raw read:
Queries: 1 (Success: 1, Failure: 0) | Process Conn: open
Last Success: 2026-04-13T14:45:18Z
After aggregate hourly-average over 24h:
Queries: 2 (Success: 2, Failure: 0)
After historyread against an unknown node id (bad tag):
Queries: 2 (counter unchanged — rejected at node-lookup before reaching the plugin; correct)
JSON endpoint /api/status carries all 9 new fields with correct types. Both instances deployed; instance1 LmxOpcUa PID 33824, instance2 LmxOpcUa2 PID 30200.
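The counters and thresholds in this section interact as sketched below. This is an illustrative Python model only: HistorianHealth, panel_color, and is_degraded are hypothetical names, and the lock that guards the real counters is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistorianHealth:
    """Hypothetical mirror of a few health snapshot counters (no locking here)."""

    total_queries: int = 0
    total_successes: int = 0
    total_failures: int = 0
    consecutive_failures: int = 0
    last_error: Optional[str] = None

    def record_success(self):
        self.total_queries += 1
        self.total_successes += 1
        self.consecutive_failures = 0  # any success ends the failure streak

    def record_failure(self, path, message):
        self.total_queries += 1
        self.total_failures += 1
        self.consecutive_failures += 1
        self.last_error = f"{path}: {message}"  # prefix names the read path


def panel_color(health, plugin_loaded=True):
    """green -> yellow (1-4 consecutive failures) -> red (>=5 or plugin unloaded)."""
    if not plugin_loaded or health.consecutive_failures >= 5:
        return "red"
    return "yellow" if health.consecutive_failures >= 1 else "green"


def is_degraded(health):
    """Health-check rule: degraded at three or more consecutive failures."""
    return health.consecutive_failures >= 3


health = HistorianHealth()
health.record_success()
for _ in range(3):
    health.record_failure("raw", "timeout")
print(panel_color(health), is_degraded(health), health.last_error)
```

The panel turns yellow on the first failure, while the health check waits for the third, so a single transient blip is visible on the dashboard without tripping the degraded state.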
Historian Read-Only Cluster Support
Updated: 2026-04-13 11:25-12:00 America/New_York
Both instances updated with Wonderware Historian read-only cluster failover. Operators can supply an ordered list of historian cluster nodes; the plugin iterates them on each fresh connect and benches failed nodes for a configurable cooldown window. Single-node deployments are preserved via the existing ServerName field.
Backups:
- C:\publish\lmxopcua\backups\20260413-112519-instance1
- C:\publish\lmxopcua\backups\20260413-112519-instance2
Code changes:
- Host/Configuration/HistorianConfiguration.cs: added ServerNames: List<string> (defaults to []) and FailureCooldownSeconds: int (defaults to 60). ServerName preserved as fallback when ServerNames is empty.
- Host/Historian/HistorianClusterNodeState.cs (new): per-node DTO: Name, IsHealthy, CooldownUntil, FailureCount, LastError, LastFailureTime.
- Host/Historian/HistorianHealthSnapshot.cs: extended with ActiveProcessNode, ActiveEventNode, NodeCount, HealthyNodeCount, Nodes: List<HistorianClusterNodeState>.
- Historian.Aveva/HistorianClusterEndpointPicker.cs (new, internal): pure picker with injected clock, thread-safe via lock, BFS-style GetHealthyNodes() / MarkFailed() / MarkHealthy() / SnapshotNodeStates(). Nodes iterate in configuration order; failed nodes skip until cooldown elapses; the cumulative FailureCount and LastError are retained across recovery for operator diagnostics.
- Historian.Aveva/HistorianDataSource.cs: new ConnectToAnyHealthyNode(type) method iterates picker candidates, clones HistorianConfiguration per attempt with the candidate as ServerName, and returns the first successful (Connection, Node) tuple. EnsureConnected and EnsureEventConnected both call it. HandleConnectionError and HandleEventConnectionError now mark the active node failed in the picker before nulling. _activeProcessNode / _activeEventNode track the live node for the dashboard. Both silos (process + event) share a single picker instance so a node failure on one immediately benches it for the other.
- Host/Status/StatusData.cs: added NodeCount, HealthyNodeCount, ActiveProcessNode, ActiveEventNode, Nodes to HistorianStatusInfo.
- Host/Status/StatusReportService.cs: Historian panel renders Process Conn: open (<node>) badges and a cluster table (when NodeCount > 1) showing each node's state, cooldown expiry, failure count, and last error. Single-node deployments render a compact Node: <hostname> line.
- Host/Status/HealthCheckService.cs: new Rule 2b3: Degraded when NodeCount > 1 && HealthyNodeCount < NodeCount. Lets operators alert on a partially-failed cluster even while queries are still succeeding via the remaining nodes.
- Host/Configuration/ConfigurationValidator.cs: logs the effective node list and FailureCooldownSeconds at startup, validates that FailureCooldownSeconds >= 0, warns when ServerName is set alongside a non-empty ServerNames.
Tests:
- HistorianClusterEndpointPickerTests.cs: 19 unit tests covering config parsing, ordered iteration, cooldown expiry, zero-cooldown mode, mark-healthy clears, cumulative failure counting, unknown-node safety, concurrent writers (thread-safety smoke test).
- HistorianClusterFailoverTests.cs: 6 integration tests driving HistorianDataSource via a scripted FakeHistorianConnectionFactory: first-node-fails-picks-second, all-nodes-fail, second-call-skips-cooled-down-node, single-node-legacy-behavior, picker-order-respected, shared-picker-across-silos.
- Full plugin suite: 41/41 tests passing. Host suite: 446/447 (1 pre-existing flaky MxAccess monitor test passes on retry).
Live verification on instance1 (cluster = ["does-not-exist-historian.invalid", "localhost"], FailureCooldownSeconds=30):
Failover cycle 1 (fresh picker state, both nodes healthy):
2026-04-13 11:27:25.381 [WRN] Historian node does-not-exist-historian.invalid failed during connect attempt; trying next candidate
2026-04-13 11:27:25.910 [INF] Historian SDK connection opened to localhost:32568
- historyread returned 1 value successfully (Queries: 1 (Success: 1, Failure: 0)).
- Dashboard: panel yellow, Cluster: 1 of 2 nodes healthy, bad node cooldown until 11:27:55Z, Process Conn: open (localhost).
Cooldown expiry:
- At 11:29 UTC, the cooldown window had elapsed. Panel back to green, both nodes healthy, but does-not-exist-historian.invalid retains FailureCount=1 and LastError as history.
Failover cycle 2 (service restart to drop persistent connection):
2026-04-13 14:00:39.352 [WRN] Historian node does-not-exist-historian.invalid failed during connect attempt; trying next candidate
2026-04-13 14:00:39.885 [INF] Historian SDK connection opened to localhost:32568
- historyread returned 1 value successfully on the second restart cycle — proves the picker re-admits a cooled-down node and the whole failover cycle repeats cleanly.
Single-node restoration:
- Changed instance1 back to "ServerNames": [], restarted. Dashboard renders Node: localhost (no cluster table), panel green, backward compat verified.
Final configuration: both instances running with empty ServerNames (single-node mode). LmxOpcUa PID 31064, LmxOpcUa2 PID 15012.
Operator configuration shape:
"Historian": {
  "Enabled": true,
  "ServerName": "localhost", // ignored when ServerNames is non-empty
  "ServerNames": ["historian-a", "historian-b"],
  "FailureCooldownSeconds": 60,
  ...
}
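The ordered-iteration and cooldown behavior described in this section can be sketched as follows. This is an illustrative Python model, not HistorianClusterEndpointPicker.cs: the class and function names are hypothetical, locking is omitted, and OSError stands in for whatever the SDK raises on a failed connect.

```python
import time


class ClusterEndpointPicker:
    """Hypothetical picker: configuration order, per-node cooldown, injected clock."""

    def __init__(self, nodes, cooldown_seconds=60.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        # dict insertion order preserves the configured node order
        self._state = {
            n: {"cooldown_until": None, "failure_count": 0, "last_error": None}
            for n in nodes
        }

    def healthy_nodes(self):
        now = self._clock()
        return [
            n for n, s in self._state.items()
            if s["cooldown_until"] is None or s["cooldown_until"] <= now
        ]

    def mark_failed(self, node, error):
        state = self._state.get(node)
        if state is None:  # unknown node: ignore
            return
        state["failure_count"] += 1  # cumulative, kept across recovery
        state["last_error"] = error
        state["cooldown_until"] = self._clock() + self._cooldown

    def failure_count(self, node):
        return self._state[node]["failure_count"]


def connect_to_any(picker, connect):
    """Try healthy nodes in order; return (connection, node) for the first success."""
    for node in picker.healthy_nodes():
        try:
            return connect(node), node
        except OSError as exc:
            picker.mark_failed(node, str(exc))
    raise ConnectionError("no healthy historian node available")


now = [0.0]  # fake clock so the cooldown can be advanced deterministically
picker = ClusterEndpointPicker(["historian-a", "historian-b"], 60.0, clock=lambda: now[0])


def fake_connect(node):
    if node == "historian-a":
        raise OSError("unreachable")
    return f"connection to {node}"


print(connect_to_any(picker, fake_connect))
print(picker.healthy_nodes())  # historian-a is benched
now[0] = 61.0
print(picker.healthy_nodes())  # cooldown elapsed, historian-a is a candidate again
```

Injecting the clock is what makes the cooldown testable without sleeping; the failure count stays at 1 after the node is re-admitted, mirroring the retained FailureCount seen in the live verification.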
Notes
The service deployment and restart succeeded. The live CLI checks confirm the endpoint is reachable and that the array node identifier has changed to the bracketless form. The array value on the live service still prints as blank even though the status is good, so if this environment should have populated MoveInPartNumbers, the runtime data path still needs follow-up investigation.