review(Core.ScriptedAlarms): stop shelving timer on failed reload + drop dead branch

Re-review at 7286d320. -015: dispose shelving timer at top of LoadAsync so a failed
reload doesn't leave it firing against partially-cleared state + test. -014: make
pendingEmissions required (removes unreachable fire-under-gate branch that could
reintroduce the -003 deadlock).
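
The -015 ordering can be sketched as below. This is a minimal illustration, not the engine's actual code: `ReloadableEngineSketch`, its synchronous `Load`, the `TimerRunning` probe, and the "contains `!`" stand-in for a compile failure are all hypothetical, since the real `LoadAsync` body is not shown in this diff. It only demonstrates stopping the timer before `_alarms` is cleared, so a throw later in the reload cannot leave the old timer firing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the engine; NOT the real ScriptedAlarmEngine.
public sealed class ReloadableEngineSketch : IDisposable
{
    private readonly Dictionary<string, string> _alarms = new();
    private Timer? _shelvingTimer;

    // Test-only probes (assumptions made for this sketch).
    public bool TimerRunning => _shelvingTimer is not null;
    public IReadOnlyCollection<string> LoadedAlarmIds => _alarms.Keys;

    public void Load(IReadOnlyDictionary<string, string> definitions)
    {
        // Stop the old timer FIRST, before any state is torn down, so that
        // a failure further down cannot leave it running against
        // partially-cleared state.
        _shelvingTimer?.Dispose();
        _shelvingTimer = null;

        _alarms.Clear();

        var compileFailures = new List<string>();
        foreach (var (id, source) in definitions)
        {
            // Stand-in for script compilation: any '!' counts as a failure.
            if (source.Contains('!')) compileFailures.Add(id);
            else _alarms[id] = source;
        }

        if (compileFailures.Count > 0)
            throw new InvalidOperationException(
                $"{compileFailures.Count} alarm(s) did not compile");

        // Only a fully successful load starts a new timer.
        _shelvingTimer = new Timer(_ => { /* shelving check would go here */ },
            null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
    }

    public void Dispose()
    {
        _shelvingTimer?.Dispose();
        _shelvingTimer = null;
    }
}
```

With this ordering, a failed reload leaves no timer alive, and the next successful load starts a fresh one.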
Joseph Doherty
2026-06-19 11:21:35 -04:00
parent 621d00e455
commit 272a9da61e
3 changed files with 186 additions and 14 deletions
@@ -947,6 +947,64 @@ public sealed class ScriptedAlarmEngineTests
r.NoOpReason.ShouldBeNull("None() factory has no reason — only NoOp() carries one");
}
// -------------------------------------------------------------------------
// Core.ScriptedAlarms-015: timer runs against partially-cleared _alarms on
// a failed LoadAsync reload. When a second LoadAsync throws at compile time,
// _shelvingTimer?.Dispose() (which lives AFTER the compileFailures throw in
// the pre-fix code) is skipped, so the old timer keeps firing against
// whatever _alarms was left in after the partial clear + partial recompile.
// The timer is eventually cleaned up by Dispose(), so there is no permanent
// resource leak; the observable risk is unexpected shelving-check side effects
// during the window between the failed reload and Dispose. The fix moves
// _shelvingTimer?.Dispose() to the START of the try block, alongside
// UnsubscribeFromUpstream, so the timer is always stopped before _alarms is
// cleared.
// -------------------------------------------------------------------------
/// <summary>Verifies that a failed second LoadAsync (compile errors) leaves the engine in a consistent
/// state: the old timer is eventually cleaned up by Dispose, a third successful LoadAsync restores
/// functionality, and no background task outlives the engine (Core.ScriptedAlarms-015).</summary>
[Fact]
public async Task Failed_reload_leaves_engine_recoverable_and_disposes_cleanly(/* -015 */)
{
var up = new FakeUpstream();
up.Set("Temp", 50);
var logger = new LoggerConfiguration().CreateLogger();
var store = new InMemoryAlarmStateStore();
var eng = new ScriptedAlarmEngine(up, store, new ScriptLoggerFactory(logger), logger);
// === First load: succeeds ===
await eng.LoadAsync([Alarm("A", "return false;")], TestContext.Current.CancellationToken);
eng.LoadedAlarmIds.ShouldContain("A");
// === Second load: fails at compile time ===
// Before the fix, the old timer (started by the first load) kept running
// against whatever _alarms contained after the partial clear + partial
// recompile, because _shelvingTimer?.Dispose() lived after the
// compileFailures throw and was skipped. With the fix, the timer is already
// stopped by the time the throw happens.
var ex = await Should.ThrowAsync<InvalidOperationException>(async () =>
await eng.LoadAsync([Alarm("B", "return cantCompileThis___!;")],
TestContext.Current.CancellationToken));
ex.Message.ShouldContain("did not compile");
// Engine is in a partially-cleared state — A is gone, B didn't compile.
eng.LoadedAlarmIds.ShouldNotContain("A");
eng.LoadedAlarmIds.ShouldNotContain("B");
// === Third load: succeeds — engine is still operational ===
up.Set("Temp", 50);
await eng.LoadAsync([Alarm("C", "return false;")], TestContext.Current.CancellationToken);
eng.LoadedAlarmIds.ShouldContain("C");
// Dispose must complete cleanly (no background tasks outlive the engine,
// _shelvingTimer is stopped). Before the fix, the old timer from the first
// load could keep firing between the failed second load and Dispose.
// After the fix, it is stopped at the START of every LoadAsync attempt
// (successful or not), so Dispose always finds a clean timer state.
var disposeTask = Task.Run(() => eng.Dispose());
await disposeTask.WaitAsync(TimeSpan.FromSeconds(3), TestContext.Current.CancellationToken);
// If Dispose threw or hung, the WaitAsync would surface it.
}
private static async Task WaitForAsync(Func<bool> cond, int timeoutMs = 2000)
{
var deadline = DateTime.UtcNow.AddMilliseconds(timeoutMs);