fix(driver-modbus): resolve High code-review finding (Driver.Modbus-001)

_lastPublishedByRef was a plain Dictionary<string, object> mutated inside
ShouldPublish, which runs on the PollGroupEngine onChange callback. The engine
runs one background Task per subscription, so a driver with two or more
subscriptions invokes ShouldPublish concurrently on separate threads. Concurrent
TryGetValue/indexer writes on a non-thread-safe Dictionary can corrupt internal
state, drop entries, or throw, crashing the poll loop.

Switch _lastPublishedByRef to ConcurrentDictionary<string, object>; its
TryGetValue and indexer-set operations are individually thread-safe, so the
deadband cache is now correct under concurrent multi-subscription publishing,
consistent with the lock-guarded sibling cache _lastWrittenByRef.

Add an xUnit + Shouldly regression test that runs 24 deadband-configured
single-tag subscriptions concurrently and asserts the poll loop survives without
faulting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-22 06:53:51 -04:00
parent 7f2e144f8d
commit 532b961cf2
3 changed files with 63 additions and 4 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 11 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `ModbusDriver.cs:92,99-122` |
| Status | Open |
| Status | Resolved |
**Description:** `_lastPublishedByRef` is a plain `Dictionary<string, object>` mutated inside `ShouldPublish`, which runs on the `PollGroupEngine.onChange` callback. `PollGroupEngine` runs one background `Task` per subscription (`PollGroupEngine.cs:64`), so a driver with two or more subscriptions invokes `onChange` — and therefore `ShouldPublish` — concurrently on separate threads. `ShouldPublish` does `TryGetValue` and indexer writes on the unsynchronized dictionary (`ModbusDriver.cs:108`, `112`, `120`). Concurrent reads/writes of a non-thread-safe `Dictionary` can corrupt internal state, drop entries, or throw `IndexOutOfRangeException`/`InvalidOperationException`, crashing the poll loop. The sibling cache `_lastWrittenByRef` is correctly guarded by `_lastWrittenLock` — only the deadband cache was left unprotected.
**Recommendation:** Guard `_lastPublishedByRef` with a dedicated lock around every access in `ShouldPublish`, or switch it to `ConcurrentDictionary<string, object>` and use `AddOrUpdate`/`TryGetValue`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — switched `_lastPublishedByRef` to `ConcurrentDictionary<string, object>` so the `TryGetValue`/indexer-write accesses in `ShouldPublish` are thread-safe under concurrent multi-subscription `onChange` callbacks; added a concurrent-deadband-subscription regression test.
### Driver.Modbus-002

View File

@@ -1,4 +1,5 @@
using System.Buffers.Binary;
using System.Collections.Concurrent;
using System.Text.Json;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
@@ -89,7 +90,11 @@ public sealed class ModbusDriver
// Last-published value per tag, keyed by FullReference. Used by ShouldPublish to apply
// the deadband filter. Stored as object so all numeric types share one map; the comparison
// does a typed cast inside.
private readonly Dictionary<string, object> _lastPublishedByRef = new(StringComparer.OrdinalIgnoreCase);
// Driver.Modbus-001: ShouldPublish runs on the PollGroupEngine onChange callback, which
// executes on one background Task per subscription — so a multi-subscription driver mutates
// this map concurrently from several threads. A plain Dictionary corrupts under concurrent
// writes; ConcurrentDictionary makes every TryGetValue / indexer write thread-safe.
private readonly ConcurrentDictionary<string, object> _lastPublishedByRef = new(StringComparer.OrdinalIgnoreCase);
// Last-written value per tag for the WriteOnChangeOnly suppression. Invalidated by reads
// that return a different value (so an HMI-side change doesn't get masked).

View File

@@ -171,6 +171,60 @@ public sealed class ModbusSubscriptionTests
await drv.UnsubscribeAsync(hb, CancellationToken.None);
}
/// <summary>
/// Driver.Modbus-001 regression: the deadband cache <c>_lastPublishedByRef</c> is read and
/// written inside <c>ShouldPublish</c>, which runs on the <c>PollGroupEngine</c> onChange
/// callback — one background <c>Task</c> per subscription. With many deadband-configured
/// tags spread across many subscriptions all polling fast, the callback fires concurrently
/// on several threads. A plain <c>Dictionary</c> corrupts under concurrent mutation and
/// throws <c>InvalidOperationException</c>/<c>IndexOutOfRangeException</c>, faulting the
/// poll loop. With a <c>ConcurrentDictionary</c> the run completes cleanly. Each tag's
/// value steps up by 5 every poll (well over the deadband of 2) so every poll publishes,
/// maximising contention on the cache.
/// </summary>
[Fact]
public async Task Concurrent_deadband_subscriptions_do_not_corrupt_the_publish_cache()
{
const int tagCount = 24;
var tags = Enumerable.Range(0, tagCount)
.Select(i => new ModbusTagDefinition(
$"T{i}", ModbusRegion.HoldingRegisters, (ushort)i, ModbusDataType.Int16, Deadband: 2.0))
.ToArray();
var (drv, fake) = NewDriver(tags);
await drv.InitializeAsync("{}", CancellationToken.None);
var events = new ConcurrentQueue<DataChangeEventArgs>();
var faults = new ConcurrentQueue<Exception>();
drv.OnDataChange += (_, e) =>
{
try { events.Enqueue(e); }
catch (Exception ex) { faults.Enqueue(ex); }
};
// One single-tag subscription per tag => one PollGroupEngine background Task per tag,
// so ShouldPublish (and the cache it mutates) is hit concurrently from tagCount threads.
var handles = new List<ISubscriptionHandle>();
foreach (var t in tags)
handles.Add(await drv.SubscribeAsync([t.Name], TimeSpan.FromMilliseconds(100), CancellationToken.None));
// Churn every tag's value for ~1s so every poll clears the deadband and writes the cache.
var deadline = DateTime.UtcNow + TimeSpan.FromSeconds(1);
while (DateTime.UtcNow < deadline)
{
for (var i = 0; i < tagCount; i++)
fake.HoldingRegisters[i] = (ushort)(fake.HoldingRegisters[i] + 5);
await Task.Delay(20);
}
foreach (var h in handles)
await drv.UnsubscribeAsync(h, CancellationToken.None);
faults.ShouldBeEmpty();
// The poll loop survived: health is not Faulted and changes were published.
drv.GetHealth().State.ShouldNotBe(DriverState.Faulted);
events.ShouldNotBeEmpty();
}
private static async Task WaitForCountAsync<T>(ConcurrentQueue<T> q, int target, TimeSpan timeout)
{
var deadline = DateTime.UtcNow + timeout;