fix(communication): resolve Communication-009,010,011 — atomic site-cache refresh, XML doc correction, test coverage

This commit is contained in:
Joseph Doherty
2026-05-16 22:04:21 -04:00
parent c07f524ca4
commit 0b4c1563aa
4 changed files with 88 additions and 12 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 3 |
| Open findings | 0 |
## Summary
@@ -405,7 +405,7 @@ per-event reset (`Grpc_Error_Resets_RetryCount_On_Successful_Event`) was replace
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:53`, `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:240` |
**Description**
@@ -427,7 +427,16 @@ malformed site record cannot abort the whole refresh and leave a half-updated ca
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). Root cause confirmed: `_siteClients` was a
non-`readonly` field, and `HandleSiteAddressCacheLoaded`'s add/update loop called
`ActorPath.Parse` per site with no guard — a malformed `NodeAAddress` threw mid-loop and
aborted the refresh, leaving the cache half-updated until the next 60s cycle. Fix:
`_siteClients` is now `readonly`, and the per-site `ActorPath.Parse` is wrapped in a
try/catch that logs the bad address at Warning and `continue`s to the next site, so a
single garbage row cannot starve other sites of their ClusterClient. Regression test
`CentralCommunicationActorTests.MalformedSiteAddress_DoesNotAbortRefresh_OtherSitesStillRegistered`
(bad site ordered before a good one) fails against the pre-fix code (good site never
registered) and passes after.
### Communication-010 — `DebugStreamBridgeActor` XML doc incorrectly describes it as a "Persistent actor"
@@ -435,7 +444,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/DebugStreamBridgeActor.cs:10` |
**Description**
@@ -453,7 +462,13 @@ side..." or similar, removing the word "Persistent".
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). Root cause confirmed: the class summary opened
with "Persistent actor (one per active debug session)..." but the actor derives from
`ReceiveActor`, holds no `PersistenceId`, and writes no journal/snapshot. Fix
(documentation only — no behaviour change, so no regression test): the summary was
reworded to "Long-lived (one per active debug session) actor on the central side. Debug
sessions are session-based and temporary — this actor holds no persisted state and does
not derive from an Akka.Persistence base class; its state does not survive a restart."
### Communication-011 — No test coverage for snapshot-timeout cleanup, address-cache failure, or gRPC reconnect leak
@@ -461,7 +476,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Status | Resolved |
| Location | `tests/ScadaLink.Communication.Tests/` (module-wide) |
**Description**
@@ -487,4 +502,20 @@ failure logging and empty-cache behaviour; reusing a correlation ID across
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). This is a meta-coverage finding; every gap it
enumerates is now covered by a regression test (each fails against its pre-fix code and
passes after):
- Snapshot timeout / pre-snapshot termination (Communication-001) —
`DebugStreamServiceTests.StartStreamAsync_StreamTerminatesBeforeSnapshot_ThrowsMeaningfulException`.
- gRPC reconnect not unsubscribing the old node (Communication-002) —
`DebugStreamBridgeActorTests.On_GrpcError_Unsubscribes_Old_Stream_Before_Reconnect`.
- `SiteStreamGrpcClient` subscription-map overwrite/removal race (Communication-003) —
`SiteStreamGrpcClientTests.RegisterSubscription_ReusedCorrelationId_CancelsAndDisposesPriorCts`
and `RemoveSubscription_OnlyRemovesOwnCts_NotAReplacement`.
- `LoadSiteAddressesFromDb` fault (Communication-006) —
`CentralCommunicationActorTests.LoadSiteAddressesFailure_IsLoggedNotSilentlySwallowed`.
- Malformed `NodeAAddress` aborting `HandleSiteAddressCacheLoaded` (Communication-009) —
`CentralCommunicationActorTests.MalformedSiteAddress_DoesNotAbortRefresh_OtherSitesStillRegistered`
(added with this finding's resolution).
The full module suite (`dotnet test tests/ScadaLink.Communication.Tests`) is green at
111 passing tests.