fix(data-connection): resolve DataConnectionLayer-001 — off-thread actor state mutation

HandleSubscribe spawned a Task.Run that mutated DataConnectionActor private
state (_subscriptionIds, _subscriptionsByInstance, _totalSubscribed,
_resolvedTags, _unresolvedTags) from a thread-pool thread, racing the actor's
own message loop — a data race on non-thread-safe Dictionary/HashSet and
non-atomic counters.

Restructured HandleSubscribe to follow the actor's existing PipeTo(Self)
pattern: the background task now performs only adapter I/O and pipes a
SubscribeCompleted message to Self; all subscription-state mutation happens
in the new HandleSubscribeCompleted handler on the actor thread (wired into
the Connected, Connecting and Reconnecting states).
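
A minimal sketch of the PipeTo(Self) shape described above, not the actual
DataConnectionActor code (which this commit view does not show). It assumes the
Akka NuGet package; ITagAdapter, EchoAdapter, Subscribe, GetTotalSubscribed and
the reduced set of fields are hypothetical stand-ins chosen for illustration.

```csharp
// Sketch only. Every type below except the Akka.NET ones is hypothetical.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Actor;

public interface ITagAdapter
{
    // Assumed shape: returns a subscription id for the tag, throws on failure.
    Task<string> SubscribeAsync(string tagPath);
}

// Trivial in-memory adapter so the sketch can be exercised without a device.
public sealed class EchoAdapter : ITagAdapter
{
    public Task<string> SubscribeAsync(string tagPath) => Task.FromResult("sub-" + tagPath);
}

public sealed record Subscribe(string InstanceName, IReadOnlyList<string> TagPaths);
public sealed record GetTotalSubscribed;

// Immutable result built on the thread pool and piped back to the actor.
public sealed record SubscribeCompleted(
    string InstanceName,
    IReadOnlyDictionary<string, string> SubscriptionIdsByTag,
    IReadOnlyList<string> UnresolvedTags);

public sealed class SubscribeSketchActor : ReceiveActor
{
    private readonly ITagAdapter _adapter;
    private readonly Dictionary<string, string> _subscriptionIds = new();
    private readonly HashSet<string> _unresolvedTags = new();
    private int _totalSubscribed;

    public SubscribeSketchActor(ITagAdapter adapter)
    {
        _adapter = adapter;
        Receive<Subscribe>(HandleSubscribe);
        Receive<SubscribeCompleted>(HandleSubscribeCompleted);
        Receive<GetTotalSubscribed>(_ => Sender.Tell(_totalSubscribed));
    }

    private void HandleSubscribe(Subscribe msg)
    {
        var adapter = _adapter; // local copy: the closure reads no mutable actor state

        Task.Run(async () =>
        {
            // Thread-pool side: adapter I/O only, results go into task-local collections.
            var ids = new Dictionary<string, string>();
            var unresolved = new List<string>();
            foreach (var tag in msg.TagPaths)
            {
                try { ids[tag] = await adapter.SubscribeAsync(tag); }
                catch (Exception) { unresolved.Add(tag); }
            }
            return new SubscribeCompleted(msg.InstanceName, ids, unresolved);
        }).PipeTo(Self); // the result arrives later as an ordinary mailbox message
    }

    private void HandleSubscribeCompleted(SubscribeCompleted msg)
    {
        // Actor thread: the only place subscription state is mutated.
        foreach (var (tag, id) in msg.SubscriptionIdsByTag)
        {
            if (!_subscriptionIds.ContainsKey(tag)) _totalSubscribed++;
            _subscriptionIds[tag] = id;
            _unresolvedTags.Remove(tag);
        }
        foreach (var tag in msg.UnresolvedTags) _unresolvedTags.Add(tag);
    }
}
```

Because the mailbox delivers one message at a time, the Dictionary, HashSet and
counter need no locks in this shape. The cost is that the state update lags the
I/O by one message hop, which is presumably why the real handler is registered
in every state an in-flight subscribe can complete in.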

Adds DCL001_ConcurrentSubscribes_DoNotCorruptSubscriptionCounters (30x30
concurrent subscribes) which fails against the pre-fix code and passes after.
Joseph Doherty
2026-05-16 18:26:43 -04:00
parent 977d7369a7
commit 239bee3bc4
4 changed files with 213 additions and 54 deletions

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 13 |
+| Open findings | 12 |
## Summary
@@ -53,7 +53,7 @@ tag-resolution retry, disconnect/re-subscribe, and concurrency around `HandleSub
|--|--|
| Severity | Critical |
| Category | Concurrency & thread safety |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:473-538` |
**Description**
@@ -82,7 +82,18 @@ handler too.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16. `HandleSubscribe` was restructured to follow the actor's own
+`PipeTo(Self)` pattern (the one already used by `HandleRetryTagResolution`): the
+background `Task.Run` now performs only adapter I/O (`SubscribeAsync`/`ReadAsync`),
+collects per-tag outcomes into an immutable `SubscribeCompleted` message, and pipes
+that to `Self`. All mutation of `_subscriptionIds`, `_subscriptionsByInstance`,
+`_totalSubscribed`, `_resolvedTags` and `_unresolvedTags` now happens in the new
+`HandleSubscribeCompleted` handler on the actor thread; it is wired into the
+Connected, Connecting and Reconnecting states so an in-flight subscribe is applied
+regardless of state transitions. Regression test
+`DCL001_ConcurrentSubscribes_DoNotCorruptSubscriptionCounters` (30×30 concurrent
+subscribes) fails against the pre-fix code and passes after. Fixed by the commit
+whose message references `DataConnectionLayer-001`.
### DataConnectionLayer-002 — `Restart` supervision discards all subscription state on connection-actor crash

@@ -28,53 +28,55 @@ code-reviews/
## Baseline review — 2026-05-16
-All 19 modules were reviewed at commit `9c60592`. This established the baseline below.
+All 19 modules were reviewed at commit `9c60592` (241 findings: 6 Critical, 46 High,
+100 Medium, 89 Low). The tables below track what remains **open** as findings are
+resolved.
| Severity | Open findings |
|----------|---------------|
-| Critical | 6 |
+| Critical | 5 |
| High | 46 |
| Medium | 100 |
| Low | 89 |
-| **Total** | **241** |
+| **Total** | **240** |
## Module Status
-| Module | Review status | Last reviewed | Commit | Open (C/H/M/L) | Total |
-|--------|---------------|---------------|--------|----------------|-------|
-| [CentralUI](CentralUI/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/3/10/5 | 19 |
-| [CLI](CLI/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/1/6/6 | 13 |
-| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/1/4/3 | 8 |
-| [Commons](Commons/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/0/4/8 | 12 |
-| [Communication](Communication/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/2/5/3 | 11 |
-| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/1/4/6 | 11 |
-| [DataConnectionLayer](DataConnectionLayer/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/4/6/2 | 13 |
-| [DeploymentManager](DeploymentManager/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/3/6/5 | 14 |
-| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/2/7/4 | 14 |
-| [HealthMonitoring](HealthMonitoring/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/2/5/5 | 12 |
-| [Host](Host/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/1/3/7 | 11 |
-| [InboundAPI](InboundAPI/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 |
-| [ManagementService](ManagementService/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 |
-| [NotificationService](NotificationService/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/3/5/3 | 12 |
-| [Security](Security/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/3/4/4 | 11 |
-| [SiteEventLogging](SiteEventLogging/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/4/4/3 | 11 |
-| [SiteRuntime](SiteRuntime/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/3/8/5 | 16 |
-| [StoreAndForward](StoreAndForward/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 1/2/4/6 | 13 |
-| [TemplateEngine](TemplateEngine/findings.md) | Reviewed | 2026-05-16 | `9c60592` | 0/5/5/4 | 14 |
+| Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
+|--------|---------------|--------|----------------|------|-------|
+| [CentralUI](CentralUI/findings.md) | 2026-05-16 | `9c60592` | 1/3/10/5 | 19 | 19 |
+| [CLI](CLI/findings.md) | 2026-05-16 | `9c60592` | 0/1/6/6 | 13 | 13 |
+| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-16 | `9c60592` | 0/1/4/3 | 8 | 8 |
+| [Commons](Commons/findings.md) | 2026-05-16 | `9c60592` | 0/0/4/8 | 12 | 12 |
+| [Communication](Communication/findings.md) | 2026-05-16 | `9c60592` | 1/2/5/3 | 11 | 11 |
+| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-16 | `9c60592` | 0/1/4/6 | 11 | 11 |
+| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-16 | `9c60592` | 0/4/6/2 | 12 | 13 |
+| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-16 | `9c60592` | 0/3/6/5 | 14 | 14 |
+| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-16 | `9c60592` | 1/2/7/4 | 14 | 14 |
+| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-16 | `9c60592` | 0/2/5/5 | 12 | 12 |
+| [Host](Host/findings.md) | 2026-05-16 | `9c60592` | 0/1/3/7 | 11 | 11 |
+| [InboundAPI](InboundAPI/findings.md) | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 | 13 |
+| [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 | 13 |
+| [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 1/3/5/3 | 12 | 12 |
+| [Security](Security/findings.md) | 2026-05-16 | `9c60592` | 0/3/4/4 | 11 | 11 |
+| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/4/4/3 | 11 | 11 |
+| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/3/8/5 | 16 | 16 |
+| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 1/2/4/6 | 13 | 13 |
+| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/5/5/4 | 14 | 14 |
## Pending Findings
-All findings are currently `Open`. As findings are resolved, remove them from the
-tables below (see [REVIEW-PROCESS.md](REVIEW-PROCESS.md) §5). Full detail for each
-finding — description, location, recommendation — lives in the module's `findings.md`.
+Every `Open` / `In Progress` finding across all modules, highest severity first.
+Resolved findings drop off this list but remain recorded in their module's
+`findings.md` (see [REVIEW-PROCESS.md](REVIEW-PROCESS.md) §4§5). Full detail —
+description, location, recommendation — lives in the module's `findings.md`.
-### Critical (6)
+### Critical (5)
| ID | Module | Title |
|----|--------|-------|
| CentralUI-001 | [CentralUI](CentralUI/findings.md) | Test Run sandbox executes arbitrary C# with no trust-model enforcement |
| Communication-001 | [Communication](Communication/findings.md) | Snapshot timeout leaves orphaned bridge actor and site subscription |
-| DataConnectionLayer-001 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `Task.Run` in `HandleSubscribe` mutates actor state off the actor thread |
| ExternalSystemGateway-001 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | No S&F delivery handler registered; cached calls and writes can never be delivered |
| NotificationService-001 | [NotificationService](NotificationService/findings.md) | Buffered notifications are never retried (no S&F delivery handler) |
| StoreAndForward-001 | [StoreAndForward](StoreAndForward/findings.md) | Replication to standby is never triggered by the active node |