fix(abcip,focas): collapse alarm projection to a single poll loop (no reconnect leak)

The owning DriverInstanceActor re-subscribes alarms on every Connected
entry (DetachAlarmSource nulls its cached handle on Connected->Reconnecting
without calling UnsubscribeAlarmsAsync), and the driver object + its alarm
projection are reused across every in-place reconnect. Each SubscribeAsync
started a fresh, never-cancelled Task.Run poll loop and added it to _subs,
so N reconnects leaked N concurrent loops all polling the device and all
firing the same raise/clear transitions => duplicate alarm events + CPU/mem
growth.
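The sketch below reproduces that failure mode in isolation. It assumes a heavily simplified projection: `LeakyProjection`, `RaiseOnDevice`, `EventCount`, `TrackedSubs`, `RunAsync` and the 10 ms poll interval are all hypothetical and are not the AbCip/FOCAS types. Each `SubscribeAsync` starts a `Task.Run` loop that nothing cancels, and each loop keeps its own raise/clear edge state. As a result, N subscribes on the same object turn one device raise into N events.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, heavily simplified stand-in for the PRE-fix projection shape.
// Not the AbCip/FOCAS code: no device I/O, no handles, no source-node filters.
public sealed class LeakyProjection
{
    private readonly object _subsLock = new();
    private readonly Dictionary<int, Task> _subs = new();
    private int _nextId;
    private volatile bool _deviceRaised;   // hypothetical alarm bit the loops poll
    private int _eventCount;               // stands in for OnAlarmEvent invocations

    public int EventCount => Volatile.Read(ref _eventCount);
    public int TrackedSubs { get { lock (_subsLock) return _subs.Count; } }
    public void RaiseOnDevice() => _deviceRaised = true;

    // Pre-fix behaviour: every call starts a fresh loop that nothing ever cancels,
    // and earlier entries in _subs keep running.
    public Task<int> SubscribeAsync()
    {
        var id = Interlocked.Increment(ref _nextId);
        var loop = Task.Run(() => PollLoopAsync(CancellationToken.None));
        lock (_subsLock) _subs[id] = loop;
        return Task.FromResult(id);
    }

    // Each loop tracks its own edge state, so every live loop reports the same raise.
    private async Task PollLoopAsync(CancellationToken ct)
    {
        var lastSeen = false;
        while (!ct.IsCancellationRequested)
        {
            var now = _deviceRaised;
            if (now && !lastSeen) Interlocked.Increment(ref _eventCount); // raise transition
            lastSeen = now;
            await Task.Delay(10);                                          // hypothetical poll interval
        }
    }

    // Simulates N Connected entries on the same (reused) projection, then one device raise.
    public static async Task<int> RunAsync(int reconnects)
    {
        var p = new LeakyProjection();
        for (var i = 0; i < reconnects; i++) await p.SubscribeAsync();
        p.RaiseOnDevice();
        await Task.Delay(300); // give every loop several poll intervals to observe the raise
        return p.EventCount;
    }
}
```

`RunAsync(n)` reports one event per leaked loop, so the duplicate count grows linearly with the number of reconnects.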

Mirrors the Galaxy #399 fix (Clear-before-Add), but for live poll loops the
collapse must also CANCEL the superseded loops, not just drop references.
SubscribeAsync now snapshots existing subs under _subsLock, clears _subs,
adds the new sub, starts its loop, then retires each stale sub out-of-band
(RetireAsync: Cancel + await loop + Dispose CTS, fire-and-forget so the new
subscription's return isn't blocked on a poll interval). Doing the snapshot+clear
under the same lock DisposeAsync uses guarantees that no stale sub is ever
double-owned or double-disposed.

There is exactly one consumer per driver instance (factory-per-actor), so
retiring all prior subscriptions before starting the new one is faithful.
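The collapse pattern can be sketched as follows, under the same simplifying assumptions. `CollapsingProjection`, `RaiseOnDevice`, `EventCount`, `LiveLoops` and `RunAsync` are hypothetical. `SubscribeAsync`, `RetireAsync` and `_subsLock` mirror names from the diff, but this is not the driver code. Snapshot, clear and add happen under one lock, the new loop starts outside it, and superseded subscriptions are retired fire-and-forget. The demo runs the regression scenario: several subscribes, one device raise, and it expects exactly one event and exactly one live loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the single-poll-loop collapse; not the AbCip/FOCAS implementation.
public sealed class CollapsingProjection : IAsyncDisposable
{
    private sealed class Subscription
    {
        public Subscription(CancellationTokenSource cts) => Cts = cts;
        public CancellationTokenSource Cts { get; }
        public Task Loop { get; set; } = Task.CompletedTask;
    }

    private readonly object _subsLock = new();
    private readonly Dictionary<int, Subscription> _subs = new();
    private int _nextId;
    private volatile bool _deviceRaised;   // hypothetical alarm bit the loop polls
    private int _eventCount;               // stands in for OnAlarmEvent invocations
    private int _liveLoops;                // loops currently inside PollLoopAsync

    public int EventCount => Volatile.Read(ref _eventCount);
    public int LiveLoops => Volatile.Read(ref _liveLoops);
    public void RaiseOnDevice() => _deviceRaised = true;

    public Task<int> SubscribeAsync()
    {
        var id = Interlocked.Increment(ref _nextId);
        var cts = new CancellationTokenSource();
        var token = cts.Token; // captured before the sub is published (see note below)
        var sub = new Subscription(cts);
        List<Subscription> stale;
        lock (_subsLock)
        {
            stale = _subs.Values.ToList(); // snapshot every superseded subscription
            _subs.Clear();
            _subs[id] = sub;               // the new sub is now the only owned one
        }
        sub.Loop = Task.Run(() => PollLoopAsync(token), token);
        foreach (var old in stale) _ = RetireAsync(old); // fire-and-forget: don't block on a poll interval
        return Task.FromResult(id);
    }

    // Cancel, wait for the loop to exit, then dispose the CTS; never lets an exception escape.
    private static async Task RetireAsync(Subscription sub)
    {
        try { sub.Cts.Cancel(); } catch { }
        try { await sub.Loop.ConfigureAwait(false); } catch { }
        sub.Cts.Dispose();
    }

    private async Task PollLoopAsync(CancellationToken ct)
    {
        Interlocked.Increment(ref _liveLoops);
        try
        {
            var lastSeen = false;
            while (!ct.IsCancellationRequested)
            {
                var now = _deviceRaised;
                if (now && !lastSeen) Interlocked.Increment(ref _eventCount); // raise transition
                lastSeen = now;
                await Task.Delay(10, ct); // hypothetical poll interval; cancellation ends it early
            }
        }
        catch (OperationCanceledException) { }
        finally { Interlocked.Decrement(ref _liveLoops); }
    }

    // Same lock as SubscribeAsync, so a sub is owned (and later disposed) by exactly one path.
    public async ValueTask DisposeAsync()
    {
        List<Subscription> all;
        lock (_subsLock) { all = _subs.Values.ToList(); _subs.Clear(); }
        foreach (var s in all) await RetireAsync(s);
    }

    public static async Task<(int Events, int LiveLoops)> RunAsync(int reconnects)
    {
        await using var p = new CollapsingProjection();
        for (var i = 0; i < reconnects; i++) await p.SubscribeAsync(); // one per Connected entry
        await Task.Delay(100);  // let the superseded loops wind down
        p.RaiseOnDevice();
        await Task.Delay(300);  // several poll intervals for the surviving loop
        return (p.EventCount, p.LiveLoops);
    }
}
```

Capturing `cts.Token` before publishing the subscription is a small hardening in this sketch, not something the diff does. If a racing call retires the new subscription before `Task.Run` executes, the captured token is already cancelled and `Task.Run` never starts the loop, instead of reading `Token` from a disposed CTS.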

Regression tests (TDD, fail->pass): subscribe twice, then drive one device
raise; assert OnAlarmEvent fires exactly once (was twice with two leaked
loops).
Joseph Doherty
2026-06-15 06:09:38 -04:00
parent 43b3769a1d
commit 6ba59f9d4d
4 changed files with 190 additions and 2 deletions
@@ -62,13 +62,46 @@ internal sealed class AbCipAlarmProjection : IAsyncDisposable
var cts = new CancellationTokenSource();
var sub = new Subscription(handle, [..sourceNodeIds], cts);
// Collapse to a SINGLE active poll loop. The owning DriverInstanceActor re-subscribes
// alarms on every Connected entry (its DetachAlarmSource nulls the cached handle on
// Connected→Reconnecting WITHOUT calling Unsubscribe), and this projection is reused across
// every in-place reconnect, so each SubscribeAsync would otherwise leak a live, never-
// cancelled poll loop. There is exactly one consumer per driver instance, so retiring all
// prior subscriptions before starting the new one is semantically faithful. Snapshotting +
// clearing under the same lock DisposeAsync uses guarantees the stale subs can never be
// double-owned (and thus never double-disposed) by a racing dispose.
List<Subscription> stale;
lock (_subsLock)
{
stale = _subs.Values.ToList();
_subs.Clear();
_subs[id] = sub;
}
sub.Loop = Task.Run(() => RunPollLoopAsync(sub, cts.Token), cts.Token);
// Retire superseded loops out-of-band so the new subscription's return isn't blocked on a
// full poll interval (awaiting a loop means waiting for it to exit its Task.Delay). The
// loops already catch internally; RetireAsync still wraps every await defensively.
foreach (var old in stale) _ = RetireAsync(old);
await Task.CompletedTask;
return handle;
}
/// <summary>
/// Cancels a superseded subscription's poll loop, waits for it to wind down, and disposes
/// its CTS. Fire-and-forget from <see cref="SubscribeAsync"/>; every await is wrapped so an
/// unobserved exception can never escape (the loops already swallow their own).
/// </summary>
/// <param name="sub">The retired subscription whose loop must be cancelled + disposed.</param>
private static async Task RetireAsync(Subscription sub)
{
try { sub.Cts.Cancel(); } catch { }
try { await sub.Loop.ConfigureAwait(false); } catch { }
sub.Cts.Dispose();
}
/// <summary>Unsubscribes from alarm events using the provided subscription handle.</summary>
/// <param name="handle">The subscription handle obtained from <see cref="SubscribeAsync"/>.</param>
/// <param name="cancellationToken">A cancellation token to stop the operation.</param>
@@ -56,12 +56,47 @@ internal sealed class FocasAlarmProjection : IAsyncDisposable
: new HashSet<string>(sourceNodeIds, StringComparer.OrdinalIgnoreCase);
var sub = new Subscription(handle, filter, cts);
// Collapse to a SINGLE active poll loop. The owning DriverInstanceActor re-subscribes
// alarms on every Connected entry (its DetachAlarmSource nulls the cached handle on
// Connected→Reconnecting WITHOUT calling Unsubscribe), and this projection is reused across
// every in-place reconnect, so each SubscribeAsync would otherwise leak a live, never-
// cancelled poll loop. There is exactly one consumer per driver instance, so retiring all
// prior subscriptions before starting the new one is semantically faithful. Snapshotting +
// clearing under the same lock DisposeAsync uses guarantees the stale subs can never be
// double-owned (and thus never double-disposed) by a racing dispose.
List<Subscription> stale;
lock (_subsLock)
{
stale = [.. _subs.Values];
_subs.Clear();
_subs[id] = sub;
}
sub.Loop = Task.Run(() => RunPollLoopAsync(sub, cts.Token), cts.Token);
// Retire superseded loops out-of-band so the new subscription's return isn't blocked on a
// full poll interval (awaiting a loop means waiting for it to exit its Task.Delay). The
// loops already catch internally; RetireAsync still wraps every await defensively.
foreach (var old in stale) _ = RetireAsync(old);
return Task.FromResult<IAlarmSubscriptionHandle>(handle);
}
/// <summary>
/// Cancels a superseded subscription's poll loop, waits for it to wind down, and disposes
/// its CTS. Fire-and-forget from <see cref="SubscribeAsync"/>; every await is wrapped so an
/// unobserved exception can never escape (the loops already swallow their own).
/// </summary>
/// <param name="sub">The retired subscription whose loop must be cancelled + disposed.</param>
private async Task RetireAsync(Subscription sub)
{
try { sub.Cts.Cancel(); }
catch (Exception ex) { _logger.LogDebug(ex, "Cancelling superseded alarm-subscription CTS failed"); }
try { await sub.Loop.ConfigureAwait(false); }
catch (Exception ex) { _logger.LogDebug(ex, "Awaiting superseded alarm-subscription loop failed during retire"); }
sub.Cts.Dispose();
}
/// <summary>Unsubscribes from an alarm subscription.</summary>
/// <param name="handle">Alarm subscription handle.</param>
/// <param name="cancellationToken">Cancellation token.</param>