fix(concurrency): close 8 race / thread-safety findings across CD, DCL, SR
CD-015: rewrite NotificationOutboxRepository.InsertIfNotExistsAsync as raw-SQL IF NOT EXISTS … INSERT with SqlException 2601/2627 catch, ending the at-least-once livelock on the site→central notification handoff. DCL-018/019/020/021/022: add _subscribesInFlight guard so concurrent same-tag subscribes don't orphan an adapter handle; delete the latent dead _subscriptionHandles dictionary; stop double-counting _totalSubscribed when an unresolved tag is promoted via another instance; release adapter handles on mid-flight unsubscribe; gate the tag-resolution retry timer with IsTimerActive so subscribe bursts don't reset it into starvation. SR-020: add _terminatingActorsByName shadow so a third deploy arriving during a pending redeploy doesn't crash on InvalidActorNameException — displaced senders get a Failed/superseded response and the latest command wins on Terminated. SR-024: split OperationTrackingStore reads from writes (fresh SqliteConnection per GetStatusAsync) so long writes don't block status queries; rewrite Dispose to drop the sync-over-async bridge that could deadlock on a non-reentrant SyncContext; Interlocked.Exchange makes the dispose-once flag race-safe across both paths.
This commit is contained in:
+8
-16
@@ -40,10 +40,10 @@ module file and counted in **Total**.
|
||||
| Severity | Open findings |
|
||||
|----------|---------------|
|
||||
| Critical | 0 |
|
||||
| High | 12 |
|
||||
| Medium | 52 |
|
||||
| High | 10 |
|
||||
| Medium | 46 |
|
||||
| Low | 90 |
|
||||
| **Total** | **154** |
|
||||
| **Total** | **146** |
|
||||
|
||||
## Module Status
|
||||
|
||||
@@ -55,8 +55,8 @@ module file and counted in **Total**.
|
||||
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/4 | 4 | 14 |
|
||||
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/6 | 9 | 23 |
|
||||
| [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/1/5 | 7 | 22 |
|
||||
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/4/5 | 10 | 24 |
|
||||
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/4/0 | 5 | 22 |
|
||||
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/4/5 | 9 | 24 |
|
||||
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
|
||||
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/1/5 | 7 | 24 |
|
||||
| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/2/3 | 6 | 23 |
|
||||
| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/5 | 7 | 23 |
|
||||
@@ -68,7 +68,7 @@ module file and counted in **Total**.
|
||||
| [Security](Security/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 21 |
|
||||
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/4 | 6 | 6 |
|
||||
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/2/6 | 9 | 23 |
|
||||
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/4/3 | 7 | 26 |
|
||||
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/3 | 5 | 26 |
|
||||
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/3/3 | 7 | 24 |
|
||||
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-28 | `1eb6e97` | 0/1/4/1 | 6 | 22 |
|
||||
| [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/2/2/4 | 8 | 12 |
|
||||
@@ -84,13 +84,11 @@ description, location, recommendation — lives in the module's `findings.md`.
|
||||
|
||||
_None open._
|
||||
|
||||
### High (12)
|
||||
### High (10)
|
||||
|
||||
| ID | Module | Title |
|
||||
|----|--------|-------|
|
||||
| Communication-016 | [Communication](Communication/findings.md) | `HandleConnectionStateChanged` is dead code — the documented disconnect-cleanup workflow never fires |
|
||||
| ConfigurationDatabase-015 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | `NotificationOutboxRepository.InsertIfNotExistsAsync` is a check-then-act race with no duplicate-key catch |
|
||||
| DataConnectionLayer-018 | [DataConnectionLayer](DataConnectionLayer/findings.md) | Concurrent subscribes for the same tag from different instances orphan an adapter subscription handle |
|
||||
| DeploymentManager-018 | [DeploymentManager](DeploymentManager/findings.md) | Reconciliation force-sets `Enabled`, overwriting an intentional `Disabled` after central failover |
|
||||
| ExternalSystemGateway-018 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `DeliverBufferedAsync` lets `JsonException` propagate, turning a corrupt buffered row into a permanent retry-forever poison message |
|
||||
| InboundAPI-022 | [InboundAPI](InboundAPI/findings.md) | `IActiveNodeGate` has no production registration in Host — standby-node gating is silently disabled in production |
|
||||
@@ -101,7 +99,7 @@ _None open._
|
||||
| Transport-001 | [Transport](Transport/findings.md) | Template Overwrite never syncs attributes / alarms / scripts |
|
||||
| Transport-002 | [Transport](Transport/findings.md) | ExternalSystem Overwrite never syncs methods |
|
||||
|
||||
### Medium (52)
|
||||
### Medium (46)
|
||||
|
||||
| ID | Module | Title |
|
||||
|----|--------|-------|
|
||||
@@ -120,10 +118,6 @@ _None open._
|
||||
| ConfigurationDatabase-017 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Stub-attach delete on `DeploymentRecord` bypasses optimistic concurrency |
|
||||
| ConfigurationDatabase-018 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | `DateTime`-typed `*Utc` columns on `AuditEvent` / `SiteCall` carry no `DateTimeKind` enforcement |
|
||||
| ConfigurationDatabase-019 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | `EnsureLookaheadAsync` swallows non-idempotent SPLIT failures and continues, creating partition holes |
|
||||
| DataConnectionLayer-019 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `OpcUaDataConnection._subscriptionHandles` is a plain `Dictionary<,>` mutated from concurrent thread-pool continuations |
|
||||
| DataConnectionLayer-020 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `HandleSubscribeCompleted` double-counts `_totalSubscribed` when a previously-unresolved tag is resolved by a different instance's subscribe |
|
||||
| DataConnectionLayer-021 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `HandleSubscribeCompleted` re-creates and leaks `_subscriptionsByInstance` entry when the instance unsubscribed mid-flight |
|
||||
| DataConnectionLayer-022 | [DataConnectionLayer](DataConnectionLayer/findings.md) | `HandleSubscribeCompleted` and `HandleTagResolutionFailed` reset the tag-resolution retry timer on every call via `StartPeriodicTimer`, starving the retry under subscribe bursts |
|
||||
| DeploymentManager-019 | [DeploymentManager](DeploymentManager/findings.md) | Lifecycle command timeout writes no audit entry |
|
||||
| ExternalSystemGateway-019 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `HttpClient.Timeout` is not set; `DefaultHttpTimeout` > 100s is silently clipped by the framework default |
|
||||
| ExternalSystemGateway-020 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry |
|
||||
@@ -144,10 +138,8 @@ _None open._
|
||||
| SiteCallAudit-003 | [SiteCallAudit](SiteCallAudit/findings.md) | `OnUpsertAsync` does not refresh `IngestedAtUtc`; direct-write callers must remember to stamp it |
|
||||
| SiteEventLogging-015 | [SiteEventLogging](SiteEventLogging/findings.md) | Background write queue is unbounded; can grow without limit under sustained writer slowness |
|
||||
| SiteEventLogging-017 | [SiteEventLogging](SiteEventLogging/findings.md) | Central client's `PageSize` is unbounded; defeats the "configurable page size" design rationale |
|
||||
| SiteRuntime-020 | [SiteRuntime](SiteRuntime/findings.md) | Second `DeployInstanceCommand` arriving during a pending redeploy races the still-terminating actor on its name |
|
||||
| SiteRuntime-021 | [SiteRuntime](SiteRuntime/findings.md) | `HandleDeployArtifacts` updates `DataConnections` in SQLite but never sends `CreateConnectionCommand` to the DCL |
|
||||
| SiteRuntime-022 | [SiteRuntime](SiteRuntime/findings.md) | `AuditingDbCommand.DbConnection.set` uses reflection to read `AuditingDbConnection._inner` |
|
||||
| SiteRuntime-024 | [SiteRuntime](SiteRuntime/findings.md) | `OperationTrackingStore` serialises all writes through one connection + `SemaphoreSlim`, and `Dispose()` does sync-over-async |
|
||||
| StoreAndForward-019 | [StoreAndForward](StoreAndForward/findings.md) | Notifications park after `DefaultMaxRetries` exhaustion, contradicting "retried until central acks" |
|
||||
| StoreAndForward-020 | [StoreAndForward](StoreAndForward/findings.md) | `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load |
|
||||
| StoreAndForward-021 | [StoreAndForward](StoreAndForward/findings.md) | Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime |
|
||||
|
||||
Reference in New Issue
Block a user