fix(historian-gateway): guard recorder outbox-append failures + retry-success test + Sender capture + mux deregister

I-1: Wrap the OnValueChangedAsync AppendAsync in try/catch so a durable-boundary
failure (e.g. a PerEntry fsync hitting a disk-full or I/O error) can no longer propagate
out of the handler and trip Akka supervision into a restart loop. A canceled append
during shutdown returns quietly; any other exception increments a new
_outboxAppendFailures counter, logs a Warning (exception type name only), and drops
the value without recording it or nudging the drain. The counter is surfaced on
RecorderStatus (new OutboxAppendFailures field).
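
For illustration only, the guard described in I-1 might look roughly like the sketch
below. It is not the recorder's actual code: IOutboxSketch, FailingOutboxSketch,
RecorderSketch, the logWarning delegate, and the bool return value are all hypothetical
stand-ins, and the cancellation filter is an assumption about how "canceled during
shutdown" is detected.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the durable outbox; not the gateway's real interface.
public interface IOutboxSketch
{
    Task AppendAsync(string tag, double value, CancellationToken ct);
}

// Hypothetical outbox that always fails, simulating a disk-full fsync.
public sealed class FailingOutboxSketch : IOutboxSketch
{
    public Task AppendAsync(string tag, double value, CancellationToken ct) =>
        Task.FromException(new IOException("simulated disk full"));
}

// Hypothetical recorder fragment showing only the guard around the append.
public sealed class RecorderSketch
{
    private readonly IOutboxSketch _outbox;
    private readonly Action<string> _logWarning;
    private long _outboxAppendFailures;

    public RecorderSketch(IOutboxSketch outbox, Action<string> logWarning)
    {
        _outbox = outbox;
        _logWarning = logWarning;
    }

    // Surfaced the way the commit describes the new status field.
    public long OutboxAppendFailures => Interlocked.Read(ref _outboxAppendFailures);

    // Returns true only when the value was appended (the caller may then nudge the drain).
    public async Task<bool> OnValueChangedAsync(string tag, double value, CancellationToken ct)
    {
        try
        {
            await _outbox.AppendAsync(tag, value, ct);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Assumed shutdown path: return quietly, not counted as a failure.
            return false;
        }
        catch (Exception ex)
        {
            // Any other failure: count it, log the exception type name only, drop the value.
            Interlocked.Increment(ref _outboxAppendFailures);
            _logWarning($"Outbox append failed ({ex.GetType().Name}); value dropped.");
            return false;
        }
    }
}
```

In this sketch the exception never leaves the handler, so a supervisor would never see
it; the counter is the only trace that a value was dropped.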

I-2: Strengthen Writer_failure_keeps_entry_for_retry to prove the drain actually ran:
assert, via AwaitAssertAsync, that the writer was invoked (the fake records every value
even on Succeed=false) AND that the outbox count stayed at 1 (RemoveAsync not called).

M-3: Capture Sender before the await in the GetStatus handler, then Tell the reply.

M-4: Add Retry_after_writer_failure_eventually_acks proving the retry -> success ->
ack path; FakeValueWriter gains a FailFirstN option + CallCount (Succeed behaviour
unchanged). Short minBackoff keeps it fast and deterministic (AwaitAssert, no sleep).

M-5: Deregister mux interest on PostStop via DependencyMuxActor.UnregisterInterest,
mirroring VirtualTagActor.PostStop, closing the dead-letter window before Terminated.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
Joseph Doherty
2026-06-26 18:34:19 -04:00
parent 82124ee4f8
commit 97528c500f
2 changed files with 94 additions and 5 deletions
@@ -66,8 +66,40 @@ public sealed class ContinuousHistorizationRecorderTests : TestKit
rec.Tell(new VirtualTagActor.DependencyValueChanged("Pump1.Temp", 7.0, DateTime.UtcNow));
// Prove the drain actually RAN, not just that the append happened: assert the writer was
// invoked (the fake records every value, even on Succeed=false) AND that the outbox stayed at 1
// (RemoveAsync was NOT called, so the un-acked entry is retained for retry). Count==1 alone is
// true the instant the append lands and would not catch a silently-broken drain.
await AwaitAssertAsync(async () =>
{
Assert.Contains(writer.Snapshot(), w => w.Tag == "Pump1.Temp" && w.Value == 7.0);
Assert.Equal(1, await outbox.CountAsync(default));
});
}
[Fact]
public async Task Retry_after_writer_failure_eventually_acks()
{
var mux = CreateTestProbe();
// First drain fails, the next succeeds — exercises the retry → success → ack path.
var writer = new FakeValueWriter { Succeed = true, FailFirstN = 1 };
var outbox = new InMemoryOutbox();
// Short backoff so the retry fires promptly; the assert below is still time-bounded and
// deterministic (AwaitAssert polls — no Thread.Sleep).
var rec = Sys.ActorOf(ContinuousHistorizationRecorder.Props(
mux.Ref, writer, outbox, new[] { "Pump1.Temp" },
minBackoff: TimeSpan.FromMilliseconds(50),
maxBackoff: TimeSpan.FromMilliseconds(200)));
rec.Tell(new VirtualTagActor.DependencyValueChanged("Pump1.Temp", 13.0, DateTime.UtcNow));
// The first drain returns false (entry retained); after the backoff the retry drain succeeds
// and acks, truncating the outbox to 0.
await AwaitAssertAsync(async () =>
Assert.Equal(0, await outbox.CountAsync(default)), TimeSpan.FromSeconds(5));
Assert.True(writer.CallCount >= 2, "the writer must have been called at least twice (a retry happened)");
}
[Fact]
@@ -99,9 +131,27 @@ public sealed class ContinuousHistorizationRecorderTests : TestKit
{
private readonly Lock _gate = new();
private readonly List<WrittenValue> _written = new();
private int _calls;
public bool Succeed { get; init; } = true;
/// <summary>When &gt; 0, the first N calls fail (return false) regardless of <see cref="Succeed"/>;
/// later calls use <see cref="Succeed"/>. Lets a test prove the retry → success → ack path while
/// leaving the plain <see cref="Succeed"/> behaviour (FailFirstN==0) untouched.</summary>
public int FailFirstN { get; init; }
/// <summary>Lifetime count of <see cref="WriteLiveValuesAsync"/> invocations (proves a retry ran).</summary>
public int CallCount
{
get
{
lock (_gate)
{
return _calls;
}
}
}
public IReadOnlyList<WrittenValue> Snapshot()
{
lock (_gate)
@@ -115,13 +165,16 @@ public sealed class ContinuousHistorizationRecorderTests : TestKit
{
lock (_gate)
{
_calls++;
foreach (HistorizationValue v in values)
{
_written.Add(new WrittenValue(tag, v.Value, v.Quality, v.TimestampUtc));
}
}
// Fail the first FailFirstN calls; thereafter honour Succeed.
bool result = _calls > FailFirstN && Succeed;
return Task.FromResult(result);
}
}
}