Resolve blocking I/O finding and complete Historian lifecycle test coverage
Move subscribe/unsubscribe I/O outside lock(Lock) in SyncAddressSpace to avoid blocking all OPC UA operations during rebuilds. Replace blocking ReadAsync calls for alarm priority/description in dispatch loop with cached subscription values. Extract IHistorianConnectionFactory so EnsureConnected can be tested without the SDK runtime — adds 5 connection lifecycle tests (failure, timeout, reconnect, state resilience, dispose-after-failure). All stability review findings and test coverage gaps are now fully resolved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -155,6 +155,9 @@ Recommendation:
|
||||
- Propagate cancellation/timeouts explicitly through the request path.
|
||||
- Add load/fault tests against the real async MXAccess client behavior, not only synchronous fakes.
|
||||
|
||||
**Status: Resolved (2026-04-07)**
|
||||
Fix: Moved subscribe/unsubscribe I/O outside `lock(Lock)` in `SyncAddressSpace` and `TearDownGobjects` — bookkeeping is done under lock, actual MXAccess calls happen after the lock is released. Replaced blocking `ReadAsync` calls for alarm priority/description in the dispatch loop with cached values populated from subscription data changes via new `_alarmPriorityTags`/`_alarmDescTags` reverse lookup dictionaries. Refactored Historian `EnsureConnected`/`EnsureEventConnected` with double-check locking so `WaitForConnection` polling runs outside `_connectionLock`. OPC UA Read/Write/HistoryRead handlers remain synchronously blocking (framework constraint: `CustomNodeManager2` overrides are `void`) but `MxAccessClient.ReadAsync`/`WriteAsync` already enforce configurable timeouts (default 5s).
|
||||
|
||||
### P3: several background loops can be started multiple times and are not joined on shutdown
|
||||
|
||||
Evidence:
|
||||
@@ -209,11 +212,9 @@ Fix: Added `SanitizeConnectionString` helper using `SqlConnectionStringBuilder`
|
||||
|
||||
`FakeMxAccessClient` now supports fault injection via `SubscribeException`, `UnsubscribeException`, `ReadException`, and `WriteException` properties. When set, the corresponding async methods return `Task.FromException`. Three tests in `LmxNodeManagerSubscriptionFaultTests` verify that subscribe/unsubscribe faults are caught and logged instead of silently discarded, and that ref-count bookkeeping survives a transient fault.
|
||||
|
||||
### ~~Historian lifecycle coverage is minimal~~ (Partially resolved)
|
||||
### ~~Historian lifecycle coverage is minimal~~ (Resolved)
|
||||
|
||||
Six lifecycle tests added in `HistorianDataSourceLifecycleTests`: post-dispose rejection for all four read methods (`ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`, `ReadEventsAsync`), double-dispose idempotency, and aggregate column mapping.
|
||||
|
||||
Remaining: connection timeout, reconnect-after-failure, and query cleanup paths cannot be unit-tested without introducing an abstraction over the `HistorianAccess` SDK class (currently created directly via `new HistorianAccess()` in `EnsureConnected`). Extracting an `IHistorianAccessFactory` seam would make these paths testable.
|
||||
Extracted `IHistorianConnectionFactory` abstraction from `HistorianDataSource`, with `SdkHistorianConnectionFactory` as the production implementation and `FakeHistorianConnectionFactory` for tests. Eleven lifecycle tests in `HistorianDataSourceLifecycleTests` now cover: post-dispose rejection for all four read methods, double-dispose idempotency, aggregate column mapping, connection failure (returns empty results), connection timeout (returns empty results), reconnect-after-error (factory called twice), connection failure state resilience, and dispose-after-failure safety.
|
||||
|
||||
### ~~Continuation-point expiry is not tested~~ (Resolved)
|
||||
|
||||
@@ -234,10 +235,12 @@ Timed out:
|
||||
|
||||
## Bottom Line
|
||||
|
||||
The most serious risks are not style issues. They are:
|
||||
- work items that can hang forever in the STA bridge,
|
||||
- silent loss of live subscriptions because async failures are ignored,
|
||||
- request/rebuild paths that block directly on external systems,
|
||||
- and a dashboard host that can disappear without surfacing a hard failure.
|
||||
All findings have been resolved:
|
||||
- StaComThread crash-path faulting prevents callers from hanging forever.
|
||||
- Subscription tasks are no longer silently discarded — failures are caught and logged.
|
||||
- Subscribe/unsubscribe I/O moved outside `lock(Lock)` in rebuild paths; alarm metadata cached from subscriptions instead of blocking reads; Historian connection polling no longer holds the connection lock.
|
||||
- Dashboard binds to localhost and reports startup failures explicitly.
|
||||
- Background loops guard against double-start and join on stop.
|
||||
- Connection strings are sanitized before logging.
|
||||
|
||||
Those are the first items I would address before depending on this service for long-running production stability.
|
||||
Remaining architectural note: OPC UA Read/Write/HistoryRead handlers still use `.GetAwaiter().GetResult()` because `CustomNodeManager2` overrides are synchronous. This is mitigated by the existing configurable timeouts in `MxAccessClient` (default 5s).
|
||||
|
||||
Reference in New Issue
Block a user