fix(health-monitoring): resolve HealthMonitoring-001/002 — populate S&F buffer depth, make SiteHealthState immutable

Joseph Doherty
2026-05-16 19:40:40 -04:00
parent 340a70f0e6
commit 7d7214a4ca
7 changed files with 287 additions and 60 deletions


@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 12 |
+| Open findings | 10 |
## Summary
@@ -55,7 +55,7 @@ design-adherence gap.
|--|--|
| Severity | High |
| Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.HealthMonitoring/SiteHealthCollector.cs:104`, `src/ScadaLink.HealthMonitoring/HealthReportSender.cs:79` |
**Description**
@@ -79,7 +79,17 @@ the dead setter. Update the placeholder test accordingly once implemented.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). `HealthReportSender.ExecuteAsync` now
+queries the existing public `StoreAndForwardStorage.GetBufferDepthByCategoryAsync()`
+API alongside the parked-count call and feeds the per-category depths into
+`SiteHealthCollector.SetStoreAndForwardDepths`, keyed by category enum name, so the
+documented store-and-forward buffer depth metric is populated in every emitted
+report. Regression test `HealthReportSenderTests.ReportsIncludeStoreAndForwardBufferDepthsFromStorage`
+verifies that the per-category depths are populated. The existing placeholder test
+`SiteHealthCollectorTests.StoreAndForwardBufferDepths_IsEmptyPlaceholder` still
+passes: it exercises the collector without calling the setter and correctly
+asserts the empty default, so it was kept as the collector-level default-state
+test. No StoreAndForward source was modified; only the existing public API is used.
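The depth hand-off described above can be sketched as follows. This sketch makes two assumptions. The first is that `GetBufferDepthByCategoryAsync()` returns an enum-keyed dictionary. The second is that `SetStoreAndForwardDepths` accepts a string-keyed one. `BufferCategory`, `IBufferDepthSource`, `DepthCollectorSketch`, and `ReportCycleSketch` are hypothetical stand-ins, not the ScadaLink types. The real signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical category enum; the real store-and-forward categories may differ.
public enum BufferCategory { Alarms, Events, Telemetry }

// Assumed shape of the storage query; the real return type may differ.
public interface IBufferDepthSource
{
    Task<IReadOnlyDictionary<BufferCategory, int>> GetBufferDepthByCategoryAsync();
}

// Hypothetical stand-in for the collector: holds the latest name-keyed depth map.
public sealed class DepthCollectorSketch
{
    private IReadOnlyDictionary<string, int> _depths = new Dictionary<string, int>();

    // Publishes a whole new map at once, so readers never see a half-filled one.
    public void SetStoreAndForwardDepths(IReadOnlyDictionary<string, int> depths) =>
        Volatile.Write(ref _depths, depths);

    public IReadOnlyDictionary<string, int> Depths => Volatile.Read(ref _depths);
}

public static class ReportCycleSketch
{
    // One report cycle: query depths by category, re-key by enum name, hand to collector.
    public static async Task RefreshDepthsAsync(IBufferDepthSource storage, DepthCollectorSketch collector)
    {
        var byCategory = await storage.GetBufferDepthByCategoryAsync();
        var byName = byCategory.ToDictionary(kv => kv.Key.ToString(), kv => kv.Value);
        collector.SetStoreAndForwardDepths(byName);
    }
}
```

The keys come from `ToString()`, which yields the enum member name (for example `"Alarms"`). This means the report payload does not depend on the enum's numeric values.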
### HealthMonitoring-002 — `SiteHealthState` mutable fields written from multiple threads without synchronization
@@ -87,7 +97,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.HealthMonitoring/SiteHealthState.cs:11`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:86`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:137` |
**Description**
@@ -112,7 +122,22 @@ a single atomic reference swap.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). `SiteHealthState` is now a `sealed record`
+with `init`-only properties. `CentralHealthAggregator.ProcessReport`,
+`MarkHeartbeat`, and `CheckForOfflineSites` were rewritten to perform every state
+transition as an atomic compare-and-swap (`TryAdd`/`TryUpdate`) that installs a new
+record instance; no field of a stored state is ever mutated in place. `ProcessReport`
+uses an explicit CAS retry loop instead of the `AddOrUpdate` update delegate, so the
+sequence-number guard and the field writes are evaluated against the exact value being
+replaced (this also closes the root cause behind HealthMonitoring-003). Reads via
+`GetAllSiteStates`/`GetSiteState` now return immutable snapshots, so a concurrent
+reader can never observe a torn or half-applied state. `LatestReport` was changed
+from a non-nullable `SiteHealthReport` initialized with `null!` to `SiteHealthReport?`,
+so its declared type now reflects that it can be null; all existing consumers (CentralUI,
+integration/perf tests) already null-checked it and still build cleanly. Regression test
+`CentralHealthAggregatorTests.ProcessReport_ConcurrentUpdates_NeverLoseSequenceOrTearState`
+exercises concurrent report/heartbeat/read threads and asserts snapshot consistency
+and no lost updates.
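The transition pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the ScadaLink code. `HealthReportSketch`, `SiteStateSketch`, and `AggregatorSketch` are hypothetical, trimmed-down stand-ins. The offline timeout is passed as a parameter, and heartbeats for unknown sites are ignored. Each transition reads a snapshot, builds a new record with a `with` expression, and installs it with `TryAdd`/`TryUpdate`. When another thread wins the race, the transition retries.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical, trimmed-down report: only the fields this sketch needs.
public sealed record HealthReportSketch(string SiteId, long SequenceNumber);

// Immutable per-site state; transitions create new instances with `with` expressions.
public sealed record SiteStateSketch
{
    public string SiteId { get; init; } = "";
    public long LastSequenceNumber { get; init; }
    public HealthReportSketch? LatestReport { get; init; }
    public DateTimeOffset LastSeen { get; init; }
    public bool IsOnline { get; init; }
}

public sealed class AggregatorSketch
{
    private readonly ConcurrentDictionary<string, SiteStateSketch> _states = new();

    // Returns true if the report was installed, false if it was stale.
    public bool ProcessReport(HealthReportSketch report, DateTimeOffset now)
    {
        while (true)
        {
            if (!_states.TryGetValue(report.SiteId, out var current))
            {
                var created = new SiteStateSketch
                {
                    SiteId = report.SiteId,
                    LastSequenceNumber = report.SequenceNumber,
                    LatestReport = report,
                    LastSeen = now,
                    IsOnline = true,
                };
                if (_states.TryAdd(report.SiteId, created)) return true;
                continue; // Another thread added the site first; retry against its value.
            }

            // The guard sees the same snapshot that TryUpdate will compare against.
            if (report.SequenceNumber <= current.LastSequenceNumber) return false;

            var next = current with
            {
                LastSequenceNumber = report.SequenceNumber,
                LatestReport = report,
                LastSeen = now,
                IsOnline = true,
            };
            // Installs `next` only if the stored value still equals `current`.
            if (_states.TryUpdate(report.SiteId, next, current)) return true;
        }
    }

    // Assumption for this sketch: heartbeats for unknown sites are ignored.
    public void MarkHeartbeat(string siteId, DateTimeOffset now)
    {
        while (_states.TryGetValue(siteId, out var current))
        {
            var next = current with { LastSeen = now, IsOnline = true };
            if (_states.TryUpdate(siteId, next, current)) return;
        }
    }

    public void CheckForOfflineSites(DateTimeOffset now, TimeSpan timeout)
    {
        foreach (var (siteId, state) in _states)
        {
            if (state.IsOnline && now - state.LastSeen > timeout)
            {
                // A failed swap means a fresher update landed; leave it in place.
                _states.TryUpdate(siteId, state with { IsOnline = false }, state);
            }
        }
    }

    // Callers receive the stored record itself; it is immutable, so it cannot tear.
    public SiteStateSketch? GetSiteState(string siteId) =>
        _states.TryGetValue(siteId, out var state) ? state : null;
}
```

`TryUpdate` compares the stored value with its `comparisonValue` argument using the default equality comparer, which for records is value equality. If another thread installed a value-equal record in the meantime, the swap replaces it with the intended successor of an identical state, so it remains safe.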
### HealthMonitoring-003 — Shared state mutated inside `ConcurrentDictionary.AddOrUpdate` update delegate