fix(galaxy): bound alarm-subscription handles to one (no reconnect leak)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
GalaxyDriver's StreamAlarms feed is session-less and survives an in-place reconnect, so DriverInstanceActor re-subscribed on every Connected re-entry (after dropping its own cached handle without an Unsubscribe — sync teardown). The re-subscribe was additive: _alarmSubscriptions.Add grew the list by one untracked handle per reconnect cycle — a slow unbounded leak. Functionally harmless (the gate is Count>0 and OnAlarmFeedTransition only reads [0], firing once regardless), but it accumulated forever. Fix: SubscribeAlarmsAsync clears the set before adding, collapsing to a single live handle (under the existing _alarmHandlersLock, atomic w.r.t. the fan-out reader). There is exactly one consumer per driver instance (factory-per-actor lifecycle), so replacing the set with the latest handle is faithful. Chosen over making the actor's sync DetachAlarmSource call UnsubscribeAlarmsAsync async/fire-and-forget — disproportionate for a minor leak. Regression test Re_subscribe_collapses_to_a_single_handle_no_accumulation (TDD-verified: FAILS without the Clear — releasing the latest handle leaves the feed open because stale handles remain; PASSES with the fix). Galaxy tests 263 pass / 3 skip; Runtime native-alarm 24 pass. Code-reviewed (approved).
This commit is contained in:
@@ -1059,6 +1059,15 @@ public sealed class GalaxyDriver
|
||||
var handle = new GalaxyAlarmSubscriptionHandle(Guid.NewGuid().ToString("N"));
|
||||
lock (_alarmHandlersLock)
|
||||
{
|
||||
// Collapse to a SINGLE live handle. The session-less StreamAlarms feed survives an in-place
|
||||
// reconnect, so DriverInstanceActor re-subscribing on every Connected re-entry (after dropping
|
||||
// its own cached handle) would otherwise grow this list by one untracked handle per reconnect —
|
||||
// a slow leak (the gate only checks Count > 0 and OnAlarmFeedTransition only reads [0], so the
|
||||
// extras were pure dead weight). There is exactly one consumer per driver instance — enforced
|
||||
// structurally by the factory-per-actor lifecycle (one GalaxyDriver is created for one owning
|
||||
// DriverInstanceActor, the only caller of SubscribeAlarmsAsync), not by mere convention — so
|
||||
// replacing the set with the latest handle is semantically faithful and bounds it at 1.
|
||||
_alarmSubscriptions.Clear();
|
||||
_alarmSubscriptions.Add(handle);
|
||||
}
|
||||
return Task.FromResult<IAlarmSubscriptionHandle>(handle);
|
||||
|
||||
@@ -571,9 +571,9 @@ public sealed class DriverInstanceActor : ReceiveActor, IWithTimers
|
||||
// reconnects. NOTE: this does NOT tear down the driver-side subscription. For a session-bound
|
||||
// IAlarmSource the old subscription dies with the session (no accumulation). For a session-less feed
|
||||
// (GalaxyDriver's always-on central monitor) it survives an in-place reconnect, so the re-subscribe
|
||||
// is additive — harmless because the gate only checks Count > 0 and the feed fans out once
|
||||
// regardless of handle count, but it does slowly accumulate handles across many reconnects (a minor
|
||||
// leak tracked as a follow-up; the correct cleanup is a driver-side reset on re-init).
|
||||
// is additive — but the driver now collapses to a single live handle on each SubscribeAlarmsAsync
|
||||
// (GalaxyDriver.SubscribeAlarmsAsync clears the set before adding), so handles no longer accumulate
|
||||
// across reconnects. The gate (Count > 0) and the one-shot fan-out are unchanged.
|
||||
_alarmSubscriptionHandle = null;
|
||||
}
|
||||
|
||||
|
||||
@@ -116,6 +116,50 @@ public sealed class GalaxyDriverAlarmSourceTests
|
||||
observed.ShouldBeEmpty();
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Regression for the session-less alarm-handle leak: the gateway's StreamAlarms feed survives an
|
||||
/// in-place reconnect, so <c>DriverInstanceActor</c> re-subscribes on every Connected re-entry
|
||||
/// (after dropping its own cached handle). The driver must COLLAPSE to a single live handle on each
|
||||
/// subscribe rather than accumulate one per reconnect. Observable proof (the handle set is private):
|
||||
/// after three re-subscribes, releasing only the LATEST handle fully gates the feed — which can only
|
||||
/// happen if the earlier handles were not retained. Before the fix the set kept h1/h2/h3, so
|
||||
/// releasing h3 left the gate open and a transition still surfaced.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public async Task Re_subscribe_collapses_to_a_single_handle_no_accumulation()
|
||||
{
|
||||
var feed = new FakeAlarmFeed();
|
||||
var ack = new RecordingAcknowledger();
|
||||
using var driver = NewDriver(feed, ack);
|
||||
|
||||
// Simulate the actor re-subscribing on each in-place reconnect.
|
||||
var h1 = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
|
||||
var h2 = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
|
||||
var latest = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
|
||||
|
||||
// Each subscribe issues a DISTINCT handle (not a reused singleton) — otherwise the collapse
|
||||
// semantics would be vacuous.
|
||||
h1.DiagnosticId.ShouldNotBe(h2.DiagnosticId);
|
||||
h2.DiagnosticId.ShouldNotBe(latest.DiagnosticId);
|
||||
|
||||
var observed = new List<AlarmEventArgs>();
|
||||
driver.OnAlarmEvent += (_, args) => observed.Add(args);
|
||||
|
||||
// Releasing ONLY the latest handle must fully gate the feed — proving the driver held exactly one
|
||||
// live handle, not the three issued across the simulated reconnects.
|
||||
await driver.UnsubscribeAlarmsAsync(latest, CancellationToken.None);
|
||||
feed.Emit(NewTransition("Tank01.Level.HiHi", "Tank01",
|
||||
GalaxyAlarmTransitionKind.Raise, AlarmSeverity.High));
|
||||
observed.ShouldBeEmpty("re-subscribe must collapse to one handle; releasing it gates the feed");
|
||||
|
||||
// And the latest subscribe is the live one: a fresh subscribe re-opens the gate.
|
||||
var reopened = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
|
||||
feed.Emit(NewTransition("Tank01.Level.HiHi", "Tank01",
|
||||
GalaxyAlarmTransitionKind.Raise, AlarmSeverity.High));
|
||||
observed.ShouldHaveSingleItem();
|
||||
observed[0].SubscriptionHandle.ShouldBe(reopened);
|
||||
}
|
||||
|
||||
/// <summary>Verifies that UnsubscribeAlarmsAsync throws for a foreign handle.</summary>
|
||||
[Fact]
|
||||
public async Task UnsubscribeAlarmsAsync_throws_for_foreign_handle()
|
||||
|
||||
Reference in New Issue
Block a user