- Preserve and replay subscription ref counts across address space rebuilds to prevent MXAccess subscription leaks - Mark read timeouts and write failures as unsuccessful in PerformanceMetrics for accurate health reporting - Add deferred MxAccess reconnect path when initial connection fails at startup - Update code review document with verified completions and new findings - Add covering tests for all fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
76 lines
5.1 KiB
Markdown
76 lines
5.1 KiB
Markdown
# Full Review Update: `src`
|
|
|
|
Overall verdict: **patch is still incorrect**
|
|
|
|
This pass verified the previous findings against the current implementation, then performed another full review of `src/` with emphasis on service-mode reliability, continuous OPC UA publishing, and recovery from MXAccess disconnects.
|
|
|
|
## Verified Completed
|
|
|
|
### [DONE] OPC UA monitored items now start MXAccess subscriptions
|
|
Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:355-423`.
|
|
|
|
The node manager now hooks monitored-item create/delete callbacks and translates them into ref-counted `SubscribeTag` / `UnsubscribeTag` calls. This closes the original gap where OPC UA subscriptions never triggered runtime MX subscriptions.
|
|
|
|
Validation:
|
|
- `MultiClientTests.MultipleClients_SubscribeToSameTag_AllReceiveDataChanges` passed
|
|
|
|
### [DONE] Write timeout no longer reports success
|
|
Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:101-107`.
|
|
|
|
The timeout path now completes the write with `false` instead of `true`, so OPC UA writes are no longer acknowledged when MXAccess never confirms them.
|
|
|
|
### [DONE] Monitor retries from `Error` state as well as `Disconnected`
|
|
Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.Monitor.cs:38-43`.
|
|
|
|
The reconnect loop now treats `ConnectionState.Error` as reconnectable, which closes the prior issue where one failed reconnect left the client permanently stuck.
|
|
|
|
Validation:
|
|
- `MxAccessClientMonitorTests.Monitor_ReconnectsOnDisconnect` passed
|
|
- `MxAccessClientMonitorTests.Monitor_ProbeStale_ForcesReconnect` passed
|
|
|
|
### [DONE] Address-space construction no longer depends on input ordering
|
|
Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:75-77` and `:174-199`.
|
|
|
|
The added topological sort ensures parents are materialized before children even when the hierarchy input is unsorted.
|
|
|
|
### [DONE] Startup no longer forces an immediate redundant rebuild
|
|
Verified in `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:143-184` and `src/ZB.MOM.WW.LmxOpcUa.Host/GalaxyRepository/ChangeDetectionService.cs:25-30`.
|
|
|
|
The initial deploy time is now captured during startup and passed into `ChangeDetectionService`, preventing the unconditional first-poll rebuild when startup already loaded the same deploy.
|
|
|
|
Validation:
|
|
- `ChangeDetectionServiceTests` passed
|
|
|
|
## New Findings
|
|
|
|
### [P1] Service never recovers if MXAccess is unavailable during startup
|
|
Files:
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:107-123`
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUaService.cs:139-140`
|
|
|
|
If the initial `MxAccessClient.ConnectAsync()` fails, startup disposes the real client and STA thread, replaces them with `NullMxAccessClient`, and still starts the OPC UA server. After that point there is no code path that ever retries the real MXAccess connection, so a temporary LMX outage at service startup becomes a permanent loss of runtime publishing until the whole Windows service is restarted.
|
|
|
|
### [P1] Address-space rebuild loses live subscription bookkeeping
|
|
Files:
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:163-169`
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/OpcUa/LmxNodeManager.cs:390-423`
|
|
|
|
`RebuildAddressSpace()` clears `_subscriptionRefCounts` without unsubscribing the existing MXAccess subscriptions or rebuilding the ref counts for still-active monitored items. If a Galaxy rebuild happens while clients are subscribed, later deletes no longer call `_mxAccessClient.UnsubscribeAsync(...)`, and any future subscription to the same tag starts from zero and creates another runtime subscription. In a long-running service this leaks live subscriptions and can duplicate data-change delivery after successive rebuilds.
|
|
|
|
### [P2] Failed reads and writes are still recorded as successful operations
|
|
Files:
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:36-42`
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/MxAccess/MxAccessClient.ReadWrite.cs:97-109`
|
|
- `src/ZB.MOM.WW.LmxOpcUa.Host/Status/HealthCheckService.cs:24-38`
|
|
|
|
`PerformanceMetrics` only gets marked unsuccessful when an exception is thrown. A read timeout returns `Vtq.Bad(...)` and a rejected or timed-out write returns `false`, but neither path calls `scope.SetSuccess(false)`. Since the dashboard health logic uses those success rates to detect degraded behavior, sustained runtime failures can still leave the service reporting healthy even while reads or writes are failing.
|
|
|
|
## Test Notes
|
|
|
|
Focused verification completed successfully:
|
|
- `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~MultiClientTests.MultipleClients_SubscribeToSameTag_AllReceiveDataChanges"`
|
|
- `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~MxAccessClientMonitorTests.Monitor_ReconnectsOnDisconnect|FullyQualifiedName~MxAccessClientMonitorTests.Monitor_ProbeStale_ForcesReconnect"`
|
|
- `dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests/ZB.MOM.WW.LmxOpcUa.Tests.csproj --filter "FullyQualifiedName~ChangeDetectionServiceTests"`
|
|
|
|
I did not rerun the entire solution test suite in this pass. A previous full-solution run had timed out, so verification here was focused on the behavior touched by the prior fixes and the service-reliability paths reviewed above.
|