fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-005)

Status fields (_lastDrainUtc, _lastSuccessUtc, _lastError, _drainState,
_evictedCount) were written by the drain timer thread and read by
GetStatus() / health-check threads with no memory barrier, risking torn
DateTime? reads and stale DrainState observations.

- Added _statusLock object; all writes to status fields now happen inside
  lock(_statusLock) blocks in DrainOnceAsync and DrainTimerCallback.
- GetStatus() snapshots all fields atomically under the same lock so the
  Admin UI / /healthz endpoint always sees a consistent view.
- Regression test GetStatus_snapshot_is_consistent_under_concurrent_drain
  drives status writes and reads from concurrent threads; asserts no throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-22 09:27:31 -04:00
parent 0003f76301
commit 6d520c6756
2 changed files with 124 additions and 2 deletions

View File

@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
| Status | Open |
| Status | Resolved |
**Description:** The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin-UI / health-check threads with no memory barrier (no `lock`, no `volatile`, no `Interlocked`). `DateTime?` is not guaranteed to be written atomically, and the reader can observe a stale or torn value. This is a diagnostics surface so the impact is limited, but a torn `DateTime?` read is real undefined behavior.
**Recommendation:** Guard the status fields with a small lock, or make the scalars `volatile` where the type permits and snapshot `DateTime?` values under a lock. Take the snapshot atomically in `GetStatus()`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_statusLock` object; all writes to `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_evictedCount` (new) now happen inside `lock (_statusLock)` blocks; `GetStatus()` snapshots all fields atomically under the same lock. Regression test `GetStatus_snapshot_is_consistent_under_concurrent_drain` added.
### Core.AlarmHistorian-006