# Catch-Up Replay And Advanced Retry Implementation Plan

> **For Codex:** REQUIRED SUB-SKILL: Use `executeplan` to implement this plan task-by-task.

**Goal:** Add best-effort latest-value catch-up after reconnect and replace the fixed reconnect delay schedule with a production-grade retry policy, while also fixing the current reconnect quality issues.

**Architecture:** Extend the existing reconnect runtime with a small runtime-options layer, a retry-policy calculator, and a post-reconnect catch-up refresh phase. Keep reconnect success defined as restored live subscriptions, and treat catch-up as a best-effort follow-on phase that emits synthetic updates marked separately from live traffic.

**Tech Stack:** .NET 10, C#, xUnit, existing SuiteLink protocol/client/runtime/transport layers

---

### Task 1: Add Runtime Option Types

**Files:**
- Create: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRuntimeOptions.cs`
- Create: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRetryPolicy.cs`
- Create: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkCatchUpPolicy.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkConnectionOptions.cs`
- Test: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkConnectionOptionsTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public void ConnectionOptions_DefaultsRuntimeOptions()
{
    var options = new SuiteLinkConnectionOptions(
        host: "127.0.0.1",
        application: "App",
        topic: "Topic",
        clientName: "Client",
        clientNode: "Node",
        userName: "User",
        serverNode: "Server");

    Assert.NotNull(options.Runtime);
    Assert.Equal(SuiteLinkCatchUpPolicy.None, options.Runtime.CatchUpPolicy);
    Assert.NotNull(options.Runtime.RetryPolicy);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter ConnectionOptions_DefaultsRuntimeOptions -v minimal`
Expected: FAIL because runtime options do not exist yet

**Step 3: Write minimal implementation**

Create:

```csharp
public enum SuiteLinkCatchUpPolicy
{
    None = 0,
    RefreshLatestValue = 1
}
```

```csharp
public sealed record class SuiteLinkRetryPolicy(
    TimeSpan InitialDelay,
    double Multiplier,
    TimeSpan MaxDelay,
    int? MaxAttempts = null,
    bool UseJitter = true)
{
    public static SuiteLinkRetryPolicy Default { get; } =
        new(TimeSpan.FromSeconds(1), 2.0, TimeSpan.FromSeconds(30));
}
```

```csharp
public sealed record class SuiteLinkRuntimeOptions(
    SuiteLinkRetryPolicy RetryPolicy,
    SuiteLinkCatchUpPolicy CatchUpPolicy,
    TimeSpan CatchUpTimeout)
{
    public static SuiteLinkRuntimeOptions Default { get; } =
        new(SuiteLinkRetryPolicy.Default, SuiteLinkCatchUpPolicy.None, TimeSpan.FromSeconds(2));
}
```

Update `SuiteLinkConnectionOptions` to expose:

```csharp
public SuiteLinkRuntimeOptions Runtime { get; init; }
```

and default it to `SuiteLinkRuntimeOptions.Default`. The `init` accessor is needed because the Task 3 test replaces `Runtime` through a `with` expression, which also assumes `SuiteLinkConnectionOptions` is a record; a get-only property would not compile there.

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkConnectionOptionsTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRuntimeOptions.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRetryPolicy.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkCatchUpPolicy.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkConnectionOptions.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkConnectionOptionsTests.cs
git commit -m "feat: add runtime reconnect option types"
```

### Task 2: Add Retry Policy Delay Calculator

**Files:**
- Create: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs`
- Test: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs`


**Step 1: Write the failing test**

```csharp
[Fact]
public void GetDelay_UsesImmediateThenExponentialCap()
{
    var policy = new SuiteLinkRetryPolicy(
        InitialDelay: TimeSpan.FromSeconds(1),
        Multiplier: 2.0,
        MaxDelay: TimeSpan.FromSeconds(30),
        UseJitter: false);

    Assert.Equal(TimeSpan.Zero, SuiteLinkRetryDelayCalculator.GetDelay(policy, 0));
    Assert.Equal(TimeSpan.FromSeconds(1), SuiteLinkRetryDelayCalculator.GetDelay(policy, 1));
    Assert.Equal(TimeSpan.FromSeconds(2), SuiteLinkRetryDelayCalculator.GetDelay(policy, 2));
    Assert.Equal(TimeSpan.FromSeconds(4), SuiteLinkRetryDelayCalculator.GetDelay(policy, 3));
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal`
Expected: FAIL because calculator does not exist yet

**Step 3: Write minimal implementation**

Create:

```csharp
internal static class SuiteLinkRetryDelayCalculator
{
    public static TimeSpan GetDelay(SuiteLinkRetryPolicy policy, int attempt)
    {
        // Attempt 0 is the immediate first retry.
        if (attempt == 0)
        {
            return TimeSpan.Zero;
        }

        // Exponential growth from InitialDelay, capped at MaxDelay.
        var rawSeconds = policy.InitialDelay.TotalSeconds * Math.Pow(policy.Multiplier, attempt - 1);
        var bounded = TimeSpan.FromSeconds(Math.Min(rawSeconds, policy.MaxDelay.TotalSeconds));
        return bounded;
    }
}
```

Do not add jitter yet beyond the policy flag unless tests require it.

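
To make the cap behavior concrete, the sketch below evaluates the same formula for the default policy values (1 s initial delay, multiplier 2.0, 30 s cap). It is a stand-alone copy of the calculation using plain parameters rather than `SuiteLinkRetryPolicy`, so the local `GetDelay` function here is illustrative only and not the calculator itself.

```csharp
using System;
using System.Linq;

// Stand-alone copy of the Task 2 formula so the schedule can be inspected in isolation.
static TimeSpan GetDelay(TimeSpan initialDelay, double multiplier, TimeSpan maxDelay, int attempt)
{
    if (attempt == 0)
    {
        return TimeSpan.Zero; // immediate first retry
    }

    var rawSeconds = initialDelay.TotalSeconds * Math.Pow(multiplier, attempt - 1);
    return TimeSpan.FromSeconds(Math.Min(rawSeconds, maxDelay.TotalSeconds));
}

// Delay in seconds for attempts 0 through 7 under the default policy values.
var schedule = Enumerable.Range(0, 8)
    .Select(attempt => GetDelay(TimeSpan.FromSeconds(1), 2.0, TimeSpan.FromSeconds(30), attempt).TotalSeconds)
    .ToArray();

Console.WriteLine(string.Join(", ", schedule));
// 0, 1, 2, 4, 8, 16, 30, 30
```

Attempt 6 would be 32 s uncapped, so it is the first attempt where `MaxDelay` takes effect; every later attempt stays at 30 s.
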

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
git commit -m "feat: add reconnect retry delay calculator"
```

### Task 3: Wire Retry Policy Into Reconnect Runtime

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public async Task Reconnect_UsesConfiguredRetryPolicy()
{
    var observed = new List<TimeSpan>();
    var options = CreateOptions() with
    {
        Runtime = new SuiteLinkRuntimeOptions(
            new SuiteLinkRetryPolicy(TimeSpan.FromSeconds(3), 3.0, TimeSpan.FromSeconds(20), UseJitter: false),
            SuiteLinkCatchUpPolicy.None,
            TimeSpan.FromSeconds(2))
    };
    var client = CreateReconnectClient(delayAsync: (delay, _) =>
    {
        // Record each requested delay instead of actually waiting.
        observed.Add(delay);
        return Task.CompletedTask;
    });

    await client.ConnectAsync(options);
    await EventuallyReconnectAsync(client);

    Assert.Contains(TimeSpan.FromSeconds(3), observed);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_UsesConfiguredRetryPolicy -v minimal`
Expected: FAIL because reconnect still uses a fixed schedule

**Step 3: Write minimal implementation**

In `SuiteLinkClient`:
- remove direct use of `ReconnectDelaySchedule`
- read retry policy from `_connectionOptions!.Runtime.RetryPolicy`
- use `SuiteLinkRetryDelayCalculator.GetDelay(policy, attempt)`

Keep the current injected `_delayAsync` test seam.

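
The steps above could take roughly the following loop shape. This is a sketch under stated assumptions, not the existing `SuiteLinkClient` code: `ReconnectAsync`, `getDelay`, and `tryConnectAsync` are hypothetical names, and treating `MaxAttempts` as a count that includes the immediate attempt 0 is an assumption the plan does not settle.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical loop shape; the real reconnect loop in SuiteLinkClient may differ.
static async Task<bool> ReconnectAsync(
    Func<int, TimeSpan> getDelay,                        // e.g. attempt => GetDelay(policy, attempt)
    int? maxAttempts,                                    // assumed to mirror SuiteLinkRetryPolicy.MaxAttempts
    Func<TimeSpan, CancellationToken, Task> delayAsync,  // the injected delay seam
    Func<CancellationToken, Task<bool>> tryConnectAsync, // hypothetical single connect attempt
    CancellationToken cancellationToken)
{
    for (var attempt = 0; maxAttempts is null || attempt < maxAttempts; attempt++)
    {
        var delay = getDelay(attempt);
        if (delay > TimeSpan.Zero)
        {
            await delayAsync(delay, cancellationToken);
        }

        if (await tryConnectAsync(cancellationToken))
        {
            return true;
        }
    }

    return false; // attempts exhausted
}

// Usage: a fake delay seam records the requested delays; the third connect attempt succeeds.
var observed = new List<TimeSpan>();
var calls = 0;
var connected = await ReconnectAsync(
    getDelay: attempt => attempt == 0 ? TimeSpan.Zero : TimeSpan.FromSeconds(3 * Math.Pow(3, attempt - 1)),
    maxAttempts: 5,
    delayAsync: (delay, _) => { observed.Add(delay); return Task.CompletedTask; },
    tryConnectAsync: _ => Task.FromResult(++calls == 3),
    cancellationToken: CancellationToken.None);

Console.WriteLine($"{connected}: {string.Join(", ", observed)}");
```

Because every wait goes through `delayAsync`, a test can assert on the requested delays without any real waiting, which is why the plan keeps that seam.
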

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: apply retry policy to reconnect runtime"
```

### Task 4: Fix Fast-Fail Writes During Reconnect

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientWriteTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public async Task WriteAsync_DuringReconnect_ThrowsBeforeWaitingOnOperationGate()
{
    var client = CreateClientWithBlockedOperationGateAndReconnectState();

    // The generic type argument was lost from the original plan text;
    // InvalidOperationException is an assumption. Use whatever WriteAsync actually throws.
    var ex = await Assert.ThrowsAsync<InvalidOperationException>(
        () => client.WriteAsync("Pump001.Run", SuiteLinkValue.FromBoolean(true)));

    Assert.Contains("reconnecting", ex.Message, StringComparison.OrdinalIgnoreCase);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter WriteAsync_DuringReconnect_ThrowsBeforeWaitingOnOperationGate -v minimal`
Expected: FAIL because `WriteAsync` currently waits on `_operationGate` first

**Step 3: Write minimal implementation**

Move the reconnect state check ahead of:

```csharp
await _operationGate.WaitAsync(...)
```

while keeping disposed-state checks intact.

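
The intended ordering could be sketched as follows. The locals `disposed` and `reconnecting` and the local `WriteAsync` function are hypothetical stand-ins for the client's real state and method, and the exception type and message are assumptions chosen to match the test above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var operationGate = new SemaphoreSlim(1, 1);
var disposed = false;      // hypothetical stand-in for the client's disposed flag
var reconnecting = true;   // hypothetical stand-in for the client's reconnect state

async Task WriteAsync(CancellationToken cancellationToken = default)
{
    // 1. Disposed-state check stays first.
    ObjectDisposedException.ThrowIf(disposed, operationGate);

    // 2. Reconnect check runs before the gate, so a held gate cannot delay the failure.
    if (reconnecting)
    {
        throw new InvalidOperationException("Client is reconnecting; writes are rejected until the session is restored.");
    }

    // 3. Only now contend for the gate.
    await operationGate.WaitAsync(cancellationToken);
    try
    {
        // ... perform the write ...
    }
    finally
    {
        operationGate.Release();
    }
}

// Simulate the reconnect loop holding the gate; the write still fails immediately.
await operationGate.WaitAsync();
var rejected = false;
try
{
    await WriteAsync();
}
catch (InvalidOperationException ex)
{
    rejected = true;
    Console.WriteLine(ex.Message);
}
```

With the check placed after `WaitAsync`, the same call would block until the gate is released, which is the behavior the failing test is meant to catch.
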

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientWriteTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientWriteTests.cs
git commit -m "fix: fail writes before reconnect gate contention"
```

### Task 5: Fix Transport Reset Ownership Semantics

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/SuiteLinkTcpTransport.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/ISuiteLinkReconnectableTransport.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Transport/SuiteLinkTcpTransportTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public async Task ResetConnectionAsync_LeaveOpenTrue_DoesNotDisposeInjectedStream()
{
    var stream = new TrackingStream();
    await using var transport = new SuiteLinkTcpTransport(stream, leaveOpen: true);

    await transport.ResetConnectionAsync();

    Assert.False(stream.WasDisposed);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter ResetConnectionAsync_LeaveOpenTrue_DoesNotDisposeInjectedStream -v minimal`
Expected: FAIL because reset currently disposes caller-owned resources

**Step 3: Write minimal implementation**

Update `ResetConnectionAsync` to respect the same ownership rule as `DisposeAsync`:
- if `leaveOpen` is `true`, detach without disposing injected resources
- if `leaveOpen` is `false`, dispose detached resources

Do not broaden interface scope unnecessarily.

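
One way to keep the two code paths consistent is to route both through a single release helper. The sketch below is an assumption about shape only: `ReleaseAsync` is a hypothetical helper, and `MemoryStream` stands in for the injected stream.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical shared helper: disposes a detached stream only when the transport owns it.
static async ValueTask ReleaseAsync(Stream? detached, bool leaveOpen)
{
    if (detached is null || leaveOpen)
    {
        return; // nothing attached, or caller-owned: just drop the reference
    }

    await detached.DisposeAsync();
}

var callerOwned = new MemoryStream();
await ReleaseAsync(callerOwned, leaveOpen: true);
Console.WriteLine(callerOwned.CanRead);     // True: still usable by the caller

var transportOwned = new MemoryStream();
await ReleaseAsync(transportOwned, leaveOpen: false);
Console.WriteLine(transportOwned.CanRead);  // False: disposed along with the transport state
```

If `ResetConnectionAsync` and `DisposeAsync` both call one helper like this, the ownership rule cannot drift between them, and no interface change is required.
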

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkTcpTransportTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/SuiteLinkTcpTransport.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/ISuiteLinkReconnectableTransport.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Transport/SuiteLinkTcpTransportTests.cs
git commit -m "fix: preserve transport ownership during reconnect reset"
```

### Task 6: Add Update Source Metadata

**Files:**
- Create: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkUpdateSource.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkTagUpdate.cs`
- Test: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkValueTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public void TagUpdate_DefaultSource_IsLive()
{
    var update = new SuiteLinkTagUpdate(
        "Pump001.Run",
        1,
        SuiteLinkValue.FromBoolean(true),
        0x00C0,
        1,
        DateTimeOffset.UtcNow);

    Assert.Equal(SuiteLinkUpdateSource.Live, update.Source);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter TagUpdate_DefaultSource_IsLive -v minimal`
Expected: FAIL because source metadata does not exist

**Step 3: Write minimal implementation**

Create:

```csharp
public enum SuiteLinkUpdateSource
{
    Live = 0,
    CatchUpReplay = 1
}
```

Add `Source` to `SuiteLinkTagUpdate` with default:

```csharp
SuiteLinkUpdateSource.Live
```

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkValueTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkUpdateSource.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkTagUpdate.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkValueTests.cs
git commit -m "feat: add update source metadata"
```

### Task 7: Add Best-Effort Catch-Up Refresh Execution

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SubscriptionRegistrationEntry.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public async Task Reconnect_WithRefreshLatestValue_CanDispatchCatchUpReplay()
{
    SuiteLinkTagUpdate? catchUp = null;
    var client = CreateReconnectReplayClient(
        catchUpPolicy: SuiteLinkCatchUpPolicy.RefreshLatestValue,
        onUpdate: update =>
        {
            // Capture only the synthetic catch-up update, not live traffic.
            if (update.Source == SuiteLinkUpdateSource.CatchUpReplay)
            {
                catchUp = update;
            }
        });

    await client.ConnectAsync(CreateOptionsWithCatchUp());

    Assert.NotNull(catchUp);
    Assert.Equal(SuiteLinkUpdateSource.CatchUpReplay, catchUp.Source);
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_WithRefreshLatestValue_CanDispatchCatchUpReplay -v minimal`
Expected: FAIL because reconnect only resumes live dispatch today

**Step 3: Write minimal implementation**

After successful reconnect and durable subscription replay:
- if `Runtime.CatchUpPolicy == SuiteLinkCatchUpPolicy.RefreshLatestValue`
- run a sequential refresh pass over durable subscriptions
- obtain one fresh value per item using existing temporary-read machinery or a dedicated internal refresh path
- dispatch synthetic updates with:

```csharp
Source: SuiteLinkUpdateSource.CatchUpReplay
```

Do not fail reconnect if one item refresh fails or times out.

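
The refresh pass described above might look like the following. Everything here is a simplified stand-in: `RunCatchUpAsync` and `readLatestAsync` are hypothetical names, and `TagUpdate` and `UpdateSource` are cut-down substitutes for `SuiteLinkTagUpdate` and `SuiteLinkUpdateSource`, which carry more fields in the real client.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sequential refresh pass: one read per durable subscription, each result tagged as catch-up.
static async Task RunCatchUpAsync(
    IReadOnlyList<string> durableItems,
    Func<string, Task<object>> readLatestAsync,   // hypothetical refresh path
    Action<TagUpdate> dispatch)
{
    foreach (var item in durableItems)
    {
        try
        {
            var value = await readLatestAsync(item);
            dispatch(new TagUpdate(item, value, UpdateSource.CatchUpReplay));
        }
        catch (Exception)
        {
            // Best-effort: a failed refresh skips this item and leaves the reconnect result untouched.
        }
    }
}

// Usage: the second item fails to refresh, so only the first produces a synthetic update.
var dispatched = new List<TagUpdate>();
await RunCatchUpAsync(
    new[] { "Pump001.Run", "Pump002.Run" },
    item => item == "Pump002.Run"
        ? Task.FromException<object>(new TimeoutException())
        : Task.FromResult<object>(true),
    dispatched.Add);

Console.WriteLine(dispatched.Count); // 1

// Simplified stand-ins for SuiteLinkUpdateSource and SuiteLinkTagUpdate.
enum UpdateSource { Live = 0, CatchUpReplay = 1 }
record TagUpdate(string ItemName, object Value, UpdateSource Source = UpdateSource.Live);
```

Running the pass sequentially keeps at most one refresh in flight, which is the simplest way to avoid interleaving synthetic reads on a freshly restored session.
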

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SubscriptionRegistrationEntry.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: add reconnect catch-up refresh replay"
```

### Task 8: Make Catch-Up Partial Failure Non-Fatal

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public async Task Reconnect_CatchUpTimeout_DoesNotFailRecoveredSubscriptions()
{
    var client = CreateReconnectReplayClientWithTimedOutRefresh();

    await client.ConnectAsync(CreateOptionsWithCatchUp());

    await Eventually.AssertAsync(() => Assert.True(client.IsConnected));
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_CatchUpTimeout_DoesNotFailRecoveredSubscriptions -v minimal`
Expected: FAIL if catch-up failure tears down reconnect

**Step 3: Write minimal implementation**

Wrap each refresh item independently:
- timeout per item from `Runtime.CatchUpTimeout`
- swallow per-item failure after optionally recording internal debug signal
- continue to remaining items

Do not change the recovered `Ready`/`Subscribed` state.

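
The per-item wrapper could be sketched with a linked `CancellationTokenSource` and `CancelAfter`. `TryRefreshAsync` and `refreshItemAsync` are hypothetical names; letting cancellation of the outer token propagate while swallowing the per-item timeout is a design assumption, not something the plan specifies.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns false instead of throwing when one item's refresh fails or times out.
static async Task<bool> TryRefreshAsync(
    Func<CancellationToken, Task> refreshItemAsync,
    TimeSpan catchUpTimeout,
    CancellationToken outerToken)
{
    using var timeout = CancellationTokenSource.CreateLinkedTokenSource(outerToken);
    timeout.CancelAfter(catchUpTimeout);

    try
    {
        await refreshItemAsync(timeout.Token);
        return true;
    }
    catch (OperationCanceledException) when (!outerToken.IsCancellationRequested)
    {
        return false; // per-item timeout: skip this item, keep going
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        return false; // per-item failure: a debug signal could be recorded here
    }
}

// Usage: the first refresh never completes and times out; the second completes immediately.
var slow = await TryRefreshAsync(
    token => Task.Delay(Timeout.InfiniteTimeSpan, token),
    TimeSpan.FromMilliseconds(50),
    CancellationToken.None);
var ok = await TryRefreshAsync(_ => Task.CompletedTask, TimeSpan.FromSeconds(2), CancellationToken.None);

Console.WriteLine($"{slow} {ok}"); // False True
```

Since the helper only reports success or failure, the caller has no reason to touch connection state, which keeps the recovered `Ready`/`Subscribed` state independent of catch-up results.
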

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: tolerate partial catch-up refresh failures"
```

### Task 9: Add Jitter Coverage Without Flaky Tests

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs`

**Step 1: Write the failing test**

```csharp
[Fact]
public void GetDelay_WithJitterEnabled_StaysWithinCap()
{
    var policy = new SuiteLinkRetryPolicy(
        InitialDelay: TimeSpan.FromSeconds(2),
        Multiplier: 2.0,
        MaxDelay: TimeSpan.FromSeconds(10),
        UseJitter: true);

    // Inject a fixed random value so the test is deterministic.
    var delay = SuiteLinkRetryDelayCalculator.GetDelay(policy, 3, () => 0.5);

    Assert.InRange(delay, TimeSpan.Zero, TimeSpan.FromSeconds(10));
}
```

**Step 2: Run test to verify it fails**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal`
Expected: FAIL because jitter injection does not exist yet

**Step 3: Write minimal implementation**

Add an injected random source overload:

```csharp
public static TimeSpan GetDelay(SuiteLinkRetryPolicy policy, int attempt, Func<double>? nextDouble = null)
```

When jitter is enabled:
- compute bounded base delay
- apply deterministic injected random value in tests
- keep final value within `[0, MaxDelay]`

**Step 4: Run test to verify it passes**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal`
Expected: PASS

**Step 5: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
git commit -m "feat: add deterministic jitter coverage for retry policy"
```

### Task 10: Update Documentation And Final Verification

**Files:**
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/README.md`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-design.md`
- Modify: `/Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-implementation-plan.md`

**Step 1: Write the documentation diff**

Document:
- catch-up mode is latest-value refresh only
- retry policy is configurable and jittered by default
- reconnect success is separate from best-effort catch-up completion
- writes still fail during reconnect

**Step 2: Run targeted verification**

Run: `rg -n "catch-up|retry|reconnect|jitter|refresh latest|reconnecting" /Users/dohertj2/Desktop/suitelinkclient/README.md /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md`
Expected: matches in both files that show the updated wording

**Step 3: Run full verification**

Run: `dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx -v minimal`
Expected: PASS

**Step 4: Commit**

```bash
git add \
  /Users/dohertj2/Desktop/suitelinkclient/README.md \
  /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md \
  /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-design.md \
  /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-implementation-plan.md
git commit -m "docs: describe catch-up replay and retry policy"
```