# Full Review Update: `src` Overall verdict: **patch is still incorrect** This pass verified the previous findings against the current implementation, then performed another full review of `src/` with emphasis on service-mode reliability, continuous OPC UA publishing, and recovery from MXAccess disconnects. ## Verified Completed ### [DONE] OPC UA monitored items now start MXAccess subscriptions Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:355-423`. The node manager now hooks monitored-item create/delete callbacks and translates them into ref-counted `SubscribeTag` / `UnsubscribeTag` calls. This closes the original gap where OPC UA subscriptions never triggered runtime MX subscriptions. Validation: - `MultiClientTests.MultipleClients_SubscribeToSameTag_AllReceiveDataChanges` passed ### [DONE] Write timeout no longer reports success Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:101-107`. The timeout path now completes the write with `false` instead of `true`, so OPC UA writes are no longer acknowledged when MXAccess never confirms them. ### [DONE] Monitor retries from `Error` state as well as `Disconnected` Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.Monitor.cs:38-43`. The reconnect loop now treats `ConnectionState.Error` as reconnectable, which closes the prior issue where one failed reconnect left the client permanently stuck. Validation: - `MxAccessClientMonitorTests.Monitor_ReconnectsOnDisconnect` passed - `MxAccessClientMonitorTests.Monitor_ProbeStale_ForcesReconnect` passed ### [DONE] Address-space construction no longer depends on input ordering Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:75-77` and `:174-199`. The added topological sort ensures parents are materialized before children even when the hierarchy input is unsorted. ### [DONE] Startup no longer forces an immediate redundant rebuild Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:143-184` and `src/ZB.MOM.WW.LmxOpcUa.Host/GalaxyRepository/ChangeDetectionService.cs:25-30`. The initial deploy time is now captured during startup and passed into `ChangeDetectionService`, preventing the unconditional first-poll rebuild when startup already loaded the same deploy. Validation: - `ChangeDetectionServiceTests` passed ## New Findings ### [P1] Service never recovers if MXAccess is unavailable during startup Files: - `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:107-123` - `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:139-140` If the initial `MxAccessClient.ConnectAsync()` fails, startup disposes the real client and STA thread, replaces them with `NullMxAccessClient`, and still starts the OPC UA server. After that point there is no code path that ever retries the real MXAccess connection, so a temporary LMX outage at service startup becomes a permanent loss of runtime publishing until the whole Windows service is restarted. ### [P1] Address-space rebuild loses live subscription bookkeeping Files: - `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:163-169` - `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:390-423` `RebuildAddressSpace()` clears `_subscriptionRefCounts` without unsubscribing the existing MXAccess subscriptions or rebuilding the ref counts for still-active monitored items. If a Galaxy rebuild happens while clients are subscribed, later deletes no longer call `_mxAccessClient.UnsubscribeAsync(...)`, and any future subscription to the same tag starts from zero and creates another runtime subscription. In a long-running service this leaks live subscriptions and can duplicate data-change delivery after successive rebuilds. ### [P2] Failed reads and writes are still recorded as successful operations Files: - `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:36-42` - `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:97-109` - `src/ZB.MOM.WW.LmxOpcUa.Host/Status/HealthCheckService.cs:24-38` `PerformanceMetrics` only gets marked unsuccessful when an exception is thrown. A read timeout returns `Vtq.Bad(...)` and a rejected or timed-out write returns `false`, but neither path calls `scope.SetSuccess(false)`. Since the dashboard health logic uses those success rates to detect degraded behavior, sustained runtime failures can still leave the service reporting healthy even while reads or writes are failing. ## Test Notes Focused verification completed successfully: - `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~MultiClientTests.MultipleClients_SubscribeToSameTag_AllReceiveDataChanges"` - `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~MxAccessClientMonitorTests.Monitor_ReconnectsOnDisconnect|FullyQualifiedName~MxAccessClientMonitorTests.Monitor_ProbeStale_ForcesReconnect"` - `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~ChangeDetectionServiceTests"` I did not rerun the entire solution test suite in this pass. A previous full-solution run had timed out, so verification here was focused on the behavior touched by the prior fixes and the service-reliability paths reviewed above.