fix(abcip,focas): collapse alarm projection to a single poll loop (no reconnect leak)

The owning DriverInstanceActor re-subscribes alarms on every Connected
entry (DetachAlarmSource nulls its cached handle on Connected->Reconnecting
without calling UnsubscribeAlarmsAsync), and the driver object + its alarm
projection are reused across every in-place reconnect. Each SubscribeAsync
started a fresh, never-cancelled Task.Run poll loop and added it to _subs,
so N reconnects leaked N concurrent loops, all polling the device and all
firing the same raise/clear transitions. Result: duplicate alarm events and
CPU/memory use that grows with every reconnect.

Mirrors the Galaxy #399 fix (Clear-before-Add), but for live poll loops the
collapse must also CANCEL the superseded loops, not just drop references.
SubscribeAsync now snapshots existing subs under _subsLock, clears _subs,
adds the new sub, starts its loop, then retires each stale sub out-of-band
(RetireAsync: Cancel + await loop + Dispose CTS, fire-and-forget so the new
subscription's return isn't blocked on a poll interval). Snapshot+clear under
the same lock DisposeAsync uses guarantees no double-own / double-dispose.
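
A minimal sketch of the collapse pattern described above, not the actual
projection code. PollProjection, Sub, and the poll callback are hypothetical
names for illustration; only _subs, _subsLock, SubscribeAsync, RetireAsync and
DisposeAsync echo names from this message. The sketch also starts the new loop
inside the lock for simplicity, which may differ from the real ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a driver's alarm projection.
public sealed class PollProjection : IAsyncDisposable
{
    // One live poll loop plus the CTS that stops it.
    private sealed class Sub
    {
        public Sub(CancellationTokenSource cts, Task loop) { Cts = cts; Loop = loop; }
        public CancellationTokenSource Cts { get; }
        public Task Loop { get; }
    }

    private readonly object _subsLock = new object();
    private readonly List<Sub> _subs = new List<Sub>();
    private readonly TimeSpan _interval;
    private readonly Action _poll;

    public PollProjection(TimeSpan interval, Action poll)
    {
        _interval = interval;
        _poll = poll;
    }

    public Task SubscribeAsync()
    {
        Sub[] stale;
        lock (_subsLock)
        {
            // Snapshot + clear under the lock so each Sub has exactly one owner.
            stale = _subs.ToArray();
            _subs.Clear();

            var cts = new CancellationTokenSource();
            _subs.Add(new Sub(cts, Task.Run(() => PollLoopAsync(cts.Token))));
        }

        // Retire superseded loops out-of-band so the caller is not blocked
        // for up to one poll interval.
        foreach (var s in stale) _ = RetireAsync(s);
        return Task.CompletedTask;
    }

    private async Task PollLoopAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            _poll();
            await Task.Delay(_interval, ct).ConfigureAwait(false);
        }
    }

    // Cancel, wait for the loop to exit, then dispose the CTS.
    private static async Task RetireAsync(Sub sub)
    {
        sub.Cts.Cancel();
        try { await sub.Loop.ConfigureAwait(false); }
        catch (OperationCanceledException) { /* expected on cancel */ }
        sub.Cts.Dispose();
    }

    public async ValueTask DisposeAsync()
    {
        Sub[] all;
        lock (_subsLock)
        {
            all = _subs.ToArray();
            _subs.Clear();
        }
        foreach (var s in all) await RetireAsync(s).ConfigureAwait(false);
    }
}
```

Because both SubscribeAsync and DisposeAsync take their snapshot and clear
_subs under the same lock, a given Sub can end up in only one snapshot, so it
is cancelled and disposed at most once.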

There is exactly one consumer per driver instance (factory-per-actor), so
retiring all prior subscriptions whenever a new one arrives cannot drop a
subscriber that is still in use.

Regression tests (TDD, fail->pass): subscribe twice then drive one device
raise; assert OnAlarmEvent fires exactly once (was twice with two leaked
loops).
Joseph Doherty
2026-06-15 06:09:38 -04:00
parent 43b3769a1d
commit 6ba59f9d4d
4 changed files with 190 additions and 2 deletions
@@ -119,6 +119,68 @@ public sealed class AbCipAlarmProjectionTests
        await drv.ShutdownAsync(CancellationToken.None);
    }

    /// <summary>
    /// Regression for the reconnect poll-loop leak (#399 sibling): the owning
    /// DriverInstanceActor re-subscribes alarms on every Connected entry without first calling
    /// Unsubscribe, and the driver object (and its projection) survives the in-place reconnect.
    /// Each SubscribeAsync used to start a fresh, never-cancelled poll loop, so after N
    /// reconnects there were N live loops all polling the device and all firing the same
    /// raise/clear transition, producing DUPLICATE alarm events.
    ///
    /// This simulates two subscribes (the initial one plus one re-subscribe after a reconnect)
    /// against the same source, then drives ONE 0->1 device transition. After the
    /// collapse-to-single-loop fix exactly one loop is alive so the raise fires exactly once;
    /// before the fix both loops fire it ⇒ two events.
    /// </summary>
    [Fact]
    public async Task Resubscribe_Collapses_To_Single_Loop_No_Duplicate_Raise()
    {
        var factory = new FakeAbCipTagFactory();
        var opts = new AbCipDriverOptions
        {
            Devices = [new AbCipDeviceOptions(Device)],
            Tags = [AlmdTag("HighTemp")],
            EnableAlarmProjection = true,
            AlarmPollInterval = TimeSpan.FromMilliseconds(20),
            EnableDeclarationOnlyUdtGrouping = true,
        };
        var drv = new AbCipDriver(opts, "drv-1", factory);
        await drv.InitializeAsync("{}", CancellationToken.None);

        var raises = new List<AlarmEventArgs>();
        drv.OnAlarmEvent += (_, e) =>
        {
            if (e.Message.Contains("raised")) lock (raises) raises.Add(e);
        };

        // First subscribe creates + starts polling the HighTemp runtime. Seed InFaulted=false +
        // a severity so the loop's baseline settles on "not faulted".
        var sub1 = await drv.SubscribeAlarmsAsync(["HighTemp"], CancellationToken.None);
        await WaitForTagCreation(factory, "HighTemp");
        factory.Tags["HighTemp"].ValuesByOffset[0] = 0; // InFaulted=false at offset 0
        factory.Tags["HighTemp"].ValuesByOffset[8] = 500; // Severity at offset 8
        await Task.Delay(80); // let sub1's loop seed its "last-seen false" baseline

        // Second subscribe = the actor re-subscribing across a reconnect. Its loop reads the same
        // shared HighTemp runtime; give it time to seed its own "last-seen false" baseline too.
        var sub2 = await drv.SubscribeAlarmsAsync(["HighTemp"], CancellationToken.None);
        await Task.Delay(80);

        factory.Tags["HighTemp"].ValuesByOffset[0] = 1; // one 0->1 transition

        // Wait past several 20ms poll intervals so any leaked second loop has ample time to also
        // fire its duplicate raise before we assert.
        await Task.Delay(250);

        await drv.UnsubscribeAlarmsAsync(sub1, CancellationToken.None);
        await drv.UnsubscribeAlarmsAsync(sub2, CancellationToken.None);
        await drv.ShutdownAsync(CancellationToken.None);

        lock (raises)
        {
            raises.Count.ShouldBe(1,
                "exactly one poll loop must survive a re-subscribe; a leaked loop fires a duplicate raise");
        }
    }

    /// <summary>Verifies that alarm clear event fires on 1-to-0 transition.</summary>
    [Fact]
    public async Task Clear_Event_Fires_On_1_to_0_Transition()