Catch-Up Replay And Advanced Retry Implementation Plan
For Codex: REQUIRED SUB-SKILL: Use executeplan to implement this plan task-by-task.
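Goal: Add best-effort latest-value catch-up after reconnect, replace the fixed reconnect delay schedule with a configurable exponential-backoff retry policy with jitter, and fix two existing reconnect defects: writes that wait on the operation gate instead of failing fast, and transport resets that dispose caller-owned streams.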
Architecture: Extend the existing reconnect runtime with a small runtime-options layer, a retry-policy calculator, and a post-reconnect catch-up refresh phase. Keep reconnect success defined as restored live subscriptions, and treat catch-up as a best-effort follow-on phase that emits synthetic updates marked separately from live traffic.
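The recovery ordering described above can be sketched as follows. This is a hypothetical illustration, not SuiteLinkClient code: RecoveryCycleSketch, RecoveryState and both delegates are stand-ins for internals the plan does not name, and passing a null catchUpAsync models CatchUpPolicy.None.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum RecoveryState { Reconnecting, Ready }

// Hypothetical orchestration of one recovery cycle.
public sealed class RecoveryCycleSketch
{
    public RecoveryState State { get; private set; } = RecoveryState.Reconnecting;

    public async Task<bool> RecoverAsync(
        Func<CancellationToken, Task<bool>> restoreLiveSubscriptionsAsync,
        Func<CancellationToken, Task>? catchUpAsync,
        CancellationToken cancellationToken)
    {
        // Reconnect success means live subscriptions are restored, nothing more.
        if (!await restoreLiveSubscriptionsAsync(cancellationToken))
        {
            return false;
        }

        State = RecoveryState.Ready;

        // Catch-up runs only after Ready and cannot undo it.
        if (catchUpAsync is not null)
        {
            try
            {
                await catchUpAsync(cancellationToken);
            }
            catch (Exception) when (!cancellationToken.IsCancellationRequested)
            {
                // Best effort: a failed catch-up phase is swallowed.
            }
        }

        return true;
    }
}
```

The key property is ordering: State becomes Ready before catch-up starts, and nothing in the catch-up branch can move it back.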
Tech Stack: .NET 10, C#, xUnit, existing SuiteLink protocol/client/runtime/transport layers
Task 1: Add Runtime Option Types
Files:
- Create: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRuntimeOptions.cs
- Create: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRetryPolicy.cs
- Create: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkCatchUpPolicy.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkConnectionOptions.cs
- Test: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkConnectionOptionsTests.cs
Step 1: Write the failing test
[Fact]
public void ConnectionOptions_DefaultsRuntimeOptions()
{
    var options = new SuiteLinkConnectionOptions(
        host: "127.0.0.1",
        application: "App",
        topic: "Topic",
        clientName: "Client",
        clientNode: "Node",
        userName: "User",
        serverNode: "Server");

    Assert.NotNull(options.Runtime);
    Assert.Equal(SuiteLinkCatchUpPolicy.None, options.Runtime.CatchUpPolicy);
    Assert.NotNull(options.Runtime.RetryPolicy);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter ConnectionOptions_DefaultsRuntimeOptions -v minimal
Expected: FAIL because runtime options do not exist yet
Step 3: Write minimal implementation
Create:
// SuiteLinkCatchUpPolicy.cs
public enum SuiteLinkCatchUpPolicy
{
    None = 0,
    RefreshLatestValue = 1
}

// SuiteLinkRetryPolicy.cs
// MaxAttempts = null means retry indefinitely; UseJitter is honored from Task 9 on.
public sealed record class SuiteLinkRetryPolicy(
    TimeSpan InitialDelay,
    double Multiplier,
    TimeSpan MaxDelay,
    int? MaxAttempts = null,
    bool UseJitter = true)
{
    public static SuiteLinkRetryPolicy Default { get; } =
        new(TimeSpan.FromSeconds(1), 2.0, TimeSpan.FromSeconds(30));
}

// SuiteLinkRuntimeOptions.cs
// CatchUpTimeout bounds each per-item refresh during catch-up (Task 8).
public sealed record class SuiteLinkRuntimeOptions(
    SuiteLinkRetryPolicy RetryPolicy,
    SuiteLinkCatchUpPolicy CatchUpPolicy,
    TimeSpan CatchUpTimeout)
{
    public static SuiteLinkRuntimeOptions Default { get; } =
        new(SuiteLinkRetryPolicy.Default, SuiteLinkCatchUpPolicy.None, TimeSpan.FromSeconds(2));
}
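Update SuiteLinkConnectionOptions to expose:
public SuiteLinkRuntimeOptions Runtime { get; init; } = SuiteLinkRuntimeOptions.Default;
The init accessor matters: the Task 3 test builds options with a with expression, which compiles only if SuiteLinkConnectionOptions is a record and Runtime is init-settable.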
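A short usage sketch: because the option types are positional records, callers can derive custom settings from the shared Default instances with with expressions instead of mutating them. The type bodies are copied from Step 3; RuntimeOptionsExamples and the MaxAttempts value of 10 are illustrative only.

```csharp
using System;

public enum SuiteLinkCatchUpPolicy { None = 0, RefreshLatestValue = 1 }

public sealed record class SuiteLinkRetryPolicy(
    TimeSpan InitialDelay, double Multiplier, TimeSpan MaxDelay, int? MaxAttempts = null, bool UseJitter = true)
{
    public static SuiteLinkRetryPolicy Default { get; } =
        new(TimeSpan.FromSeconds(1), 2.0, TimeSpan.FromSeconds(30));
}

public sealed record class SuiteLinkRuntimeOptions(
    SuiteLinkRetryPolicy RetryPolicy, SuiteLinkCatchUpPolicy CatchUpPolicy, TimeSpan CatchUpTimeout)
{
    public static SuiteLinkRuntimeOptions Default { get; } =
        new(SuiteLinkRetryPolicy.Default, SuiteLinkCatchUpPolicy.None, TimeSpan.FromSeconds(2));
}

// Hypothetical helper showing how a caller might opt in.
public static class RuntimeOptionsExamples
{
    // Enable catch-up and cap reconnect attempts; every other value stays at its default.
    public static SuiteLinkRuntimeOptions CatchUpWithBoundedRetries() =>
        SuiteLinkRuntimeOptions.Default with
        {
            CatchUpPolicy = SuiteLinkCatchUpPolicy.RefreshLatestValue,
            RetryPolicy = SuiteLinkRetryPolicy.Default with { MaxAttempts = 10 }
        };
}
```

The with expressions produce copies, so SuiteLinkRuntimeOptions.Default remains unchanged for every other connection.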
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkConnectionOptionsTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRuntimeOptions.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkRetryPolicy.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkCatchUpPolicy.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkConnectionOptions.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkConnectionOptionsTests.cs
git commit -m "feat: add runtime reconnect option types"
Task 2: Add Retry Policy Delay Calculator
Files:
- Create: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs
- Test: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
Step 1: Write the failing test
[Fact]
public void GetDelay_UsesImmediateThenExponentialCap()
{
    var policy = new SuiteLinkRetryPolicy(
        InitialDelay: TimeSpan.FromSeconds(1),
        Multiplier: 2.0,
        MaxDelay: TimeSpan.FromSeconds(30),
        UseJitter: false);

    Assert.Equal(TimeSpan.Zero, SuiteLinkRetryDelayCalculator.GetDelay(policy, 0));
    Assert.Equal(TimeSpan.FromSeconds(1), SuiteLinkRetryDelayCalculator.GetDelay(policy, 1));
    Assert.Equal(TimeSpan.FromSeconds(2), SuiteLinkRetryDelayCalculator.GetDelay(policy, 2));
    Assert.Equal(TimeSpan.FromSeconds(4), SuiteLinkRetryDelayCalculator.GetDelay(policy, 3));

    // Attempt 10 would be 512 s uncapped, so this exercises MaxDelay.
    Assert.Equal(TimeSpan.FromSeconds(30), SuiteLinkRetryDelayCalculator.GetDelay(policy, 10));
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal
Expected: FAIL because calculator does not exist yet
Step 3: Write minimal implementation
Create:
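using System;

internal static class SuiteLinkRetryDelayCalculator
{
    // Attempt 0 is the immediate first retry; attempt n >= 1 waits
    // InitialDelay * Multiplier^(n - 1), capped at MaxDelay.
    public static TimeSpan GetDelay(SuiteLinkRetryPolicy policy, int attempt)
    {
        if (attempt <= 0)
        {
            return TimeSpan.Zero;
        }

        var rawSeconds = policy.InitialDelay.TotalSeconds * Math.Pow(policy.Multiplier, attempt - 1);

        // Math.Min also absorbs overflow: Math.Pow returns +Infinity for very large attempts.
        return TimeSpan.FromSeconds(Math.Min(rawSeconds, policy.MaxDelay.TotalSeconds));
    }
}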
Do not implement jitter yet: UseJitter exists on the policy but is ignored until Task 9.
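For reference, the schedule GetDelay computes before jitter, writing d_0 for InitialDelay, m for Multiplier and d_max for MaxDelay:

$$
d(n) = \begin{cases} 0 & n = 0 \\ \min\left(d_0 \cdot m^{\,n-1},\ d_{\max}\right) & n \ge 1 \end{cases}
$$

With the default policy (1 s, 2.0, 30 s), attempts 0 through 7 wait 0, 1, 2, 4, 8, 16, 30 and 30 seconds; attempt 6 would be 32 s uncapped.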
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
git commit -m "feat: add reconnect retry delay calculator"
Task 3: Wire Retry Policy Into Reconnect Runtime
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
Step 1: Write the failing test
[Fact]
public async Task Reconnect_UsesConfiguredRetryPolicy()
{
    var observed = new List<TimeSpan>();
    var options = CreateOptions() with
    {
        Runtime = new SuiteLinkRuntimeOptions(
            new SuiteLinkRetryPolicy(TimeSpan.FromSeconds(3), 3.0, TimeSpan.FromSeconds(20), UseJitter: false),
            SuiteLinkCatchUpPolicy.None,
            TimeSpan.FromSeconds(2))
    };

    var client = CreateReconnectClient(delayAsync: (delay, _) =>
    {
        observed.Add(delay);
        return Task.CompletedTask;
    });

    await client.ConnectAsync(options);
    await EventuallyReconnectAsync(client);

    // Attempt 0 is immediate, so 3 s is only observed if the helper fails the first attempt.
    Assert.Contains(TimeSpan.FromSeconds(3), observed);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_UsesConfiguredRetryPolicy -v minimal
Expected: FAIL because reconnect still uses a fixed schedule
Step 3: Write minimal implementation
In SuiteLinkClient:
- remove the direct use of ReconnectDelaySchedule
- read the retry policy from _connectionOptions!.Runtime.RetryPolicy
- compute each delay with SuiteLinkRetryDelayCalculator.GetDelay(policy, attempt)
Keep the current injected _delayAsync test seam.
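A sketch of how the reconnect loop could consume the policy. The plan defines MaxAttempts but does not say where it is enforced, so this assumes it limits the total number of attempts, including the immediate first one, and that the loop reports failure once it is exhausted. RetryPolicySketch, ReconnectLoopSketch and the tryReconnectAsync delegate are hypothetical; delayAsync plays the role of the existing _delayAsync seam.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical policy shape mirroring SuiteLinkRetryPolicy (jitter omitted).
public sealed record RetryPolicySketch(
    TimeSpan InitialDelay, double Multiplier, TimeSpan MaxDelay, int? MaxAttempts = null);

public static class ReconnectLoopSketch
{
    // Same schedule as Task 2: zero for attempt 0, then capped exponential growth.
    public static TimeSpan GetDelay(RetryPolicySketch policy, int attempt)
    {
        if (attempt <= 0) return TimeSpan.Zero;
        var raw = policy.InitialDelay.TotalSeconds * Math.Pow(policy.Multiplier, attempt - 1);
        return TimeSpan.FromSeconds(Math.Min(raw, policy.MaxDelay.TotalSeconds));
    }

    // Returns true when an attempt succeeds, false once MaxAttempts (if set) is used up.
    public static async Task<bool> RunAsync(
        RetryPolicySketch policy,
        Func<CancellationToken, Task<bool>> tryReconnectAsync,
        Func<TimeSpan, CancellationToken, Task> delayAsync,
        CancellationToken cancellationToken)
    {
        for (var attempt = 0; policy.MaxAttempts is null || attempt < policy.MaxAttempts.Value; attempt++)
        {
            await delayAsync(GetDelay(policy, attempt), cancellationToken);
            if (await tryReconnectAsync(cancellationToken))
            {
                return true;
            }
        }

        return false;
    }
}
```

With the Task 3 test policy (3 s, 3.0, 20 s), the observed delays would be 0, 3, 9, 20, 20 and so on.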
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: apply retry policy to reconnect runtime"
Task 4: Fix Fast-Fail Writes During Reconnect
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientWriteTests.cs
Step 1: Write the failing test
[Fact]
public async Task WriteAsync_DuringReconnect_ThrowsBeforeWaitingOnOperationGate()
{
    var client = CreateClientWithBlockedOperationGateAndReconnectState();

    var ex = await Assert.ThrowsAsync<InvalidOperationException>(
        () => client.WriteAsync("Pump001.Run", SuiteLinkValue.FromBoolean(true)));

    Assert.Contains("reconnecting", ex.Message, StringComparison.OrdinalIgnoreCase);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter WriteAsync_DuringReconnect_ThrowsBeforeWaitingOnOperationGate -v minimal
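Expected: FAIL because WriteAsync currently waits on _operationGate first. With the gate blocked, the old code path never completes, so the test hangs rather than fails unless the helper bounds the wait with a timeout.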
Step 3: Write minimal implementation
Move the reconnect-state check ahead of await _operationGate.WaitAsync(...), while keeping the disposed-state checks intact.
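A minimal sketch of the fail-fast ordering, using a SemaphoreSlim as the operation gate. FastFailGateSketch, SetReconnecting and the send delegate are hypothetical. The sketch also re-checks state after acquiring the gate, on the assumption that reconnect can begin while a write is queued; whether SuiteLinkClient needs that second check depends on how it publishes its reconnect state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class FastFailGateSketch : IDisposable
{
    private readonly SemaphoreSlim _operationGate = new(1, 1);
    private volatile bool _isReconnecting;
    private volatile bool _disposed;

    public void SetReconnecting(bool value) => _isReconnecting = value;

    public async Task WriteAsync(Func<CancellationToken, Task> send, CancellationToken cancellationToken = default)
    {
        ThrowIfUnavailable(); // fail fast: never queue on the gate while reconnecting
        await _operationGate.WaitAsync(cancellationToken);
        try
        {
            ThrowIfUnavailable(); // state may have changed while this call waited
            await send(cancellationToken);
        }
        finally
        {
            _operationGate.Release();
        }
    }

    private void ThrowIfUnavailable()
    {
        ObjectDisposedException.ThrowIf(_disposed, this);
        if (_isReconnecting)
        {
            throw new InvalidOperationException("Client is reconnecting; write rejected.");
        }
    }

    public void Dispose()
    {
        _disposed = true;
        _operationGate.Dispose();
    }
}
```

Because WriteAsync is async, the exception surfaces as a faulted Task, which is what Assert.ThrowsAsync observes.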
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientWriteTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientWriteTests.cs
git commit -m "fix: fail writes before reconnect gate contention"
Task 5: Fix Transport Reset Ownership Semantics
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/SuiteLinkTcpTransport.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/ISuiteLinkReconnectableTransport.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Transport/SuiteLinkTcpTransportTests.cs
Step 1: Write the failing test
[Fact]
public async Task ResetConnectionAsync_LeaveOpenTrue_DoesNotDisposeInjectedStream()
{
    var stream = new TrackingStream();
    await using var transport = new SuiteLinkTcpTransport(stream, leaveOpen: true);

    await transport.ResetConnectionAsync();

    Assert.False(stream.WasDisposed);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter ResetConnectionAsync_LeaveOpenTrue_DoesNotDisposeInjectedStream -v minimal
Expected: FAIL because reset currently disposes caller-owned resources
Step 3: Write minimal implementation
Update ResetConnectionAsync to respect the same ownership rule as DisposeAsync:
- if leaveOpen is true, detach without disposing the injected resources
- if leaveOpen is false, dispose the detached resources
Do not broaden interface scope unnecessarily.
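A sketch of the ownership rule in isolation. OwnedStreamHolderSketch is a hypothetical stand-in for the relevant part of SuiteLinkTcpTransport; it detaches atomically with Interlocked.Exchange so a concurrent reset and dispose cannot both act on the same stream.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class OwnedStreamHolderSketch : IAsyncDisposable
{
    private readonly bool _leaveOpen;
    private Stream? _stream;

    public OwnedStreamHolderSketch(Stream stream, bool leaveOpen)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
        _leaveOpen = leaveOpen;
    }

    public bool IsAttached => Volatile.Read(ref _stream) is not null;

    // Detach the current stream, then dispose it only if this instance owns it.
    public async ValueTask ResetConnectionAsync()
    {
        var detached = Interlocked.Exchange(ref _stream, null);
        if (detached is not null && !_leaveOpen)
        {
            await detached.DisposeAsync();
        }
    }

    // DisposeAsync applies the same rule, so reset and dispose agree on ownership.
    public ValueTask DisposeAsync() => ResetConnectionAsync();
}
```

This sketch treats every attached stream as caller-owned when leaveOpen is true. If the real transport opens its own connections on reconnect, those would presumably need separate ownership tracking so they are still disposed.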
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkTcpTransportTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/SuiteLinkTcpTransport.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Transport/ISuiteLinkReconnectableTransport.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Transport/SuiteLinkTcpTransportTests.cs
git commit -m "fix: preserve transport ownership during reconnect reset"
Task 6: Add Update Source Metadata
Files:
- Create: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkUpdateSource.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkTagUpdate.cs
- Test: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkValueTests.cs
Step 1: Write the failing test
[Fact]
public void TagUpdate_DefaultSource_IsLive()
{
    var update = new SuiteLinkTagUpdate(
        "Pump001.Run",
        1,
        SuiteLinkValue.FromBoolean(true),
        0x00C0,
        1,
        DateTimeOffset.UtcNow);

    Assert.Equal(SuiteLinkUpdateSource.Live, update.Source);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter TagUpdate_DefaultSource_IsLive -v minimal
Expected: FAIL because source metadata does not exist
Step 3: Write minimal implementation
Create:
public enum SuiteLinkUpdateSource
{
    Live = 0,
    CatchUpReplay = 1
}
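Add Source to SuiteLinkTagUpdate, defaulting to SuiteLinkUpdateSource.Live so that existing constructor calls, such as the six-argument call in the test above, keep compiling.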
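One way to do this, assuming SuiteLinkTagUpdate is a positional record: append Source as an optional trailing parameter. TagUpdateSketch and its ItemName, Value and ReceivedAt members are hypothetical placeholders for the real record's fields. Catch-up code can then derive replay updates with a with expression.

```csharp
using System;

public enum SuiteLinkUpdateSource { Live = 0, CatchUpReplay = 1 }

// Hypothetical, trimmed-down stand-in for SuiteLinkTagUpdate.
// The optional trailing Source parameter keeps older call sites source-compatible.
public sealed record TagUpdateSketch(
    string ItemName,
    object Value,
    DateTimeOffset ReceivedAt,
    SuiteLinkUpdateSource Source = SuiteLinkUpdateSource.Live);
```

If SuiteLinkTagUpdate is a class rather than a record, an init-only Source property initialized to Live gives the same default.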
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkValueTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkUpdateSource.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkTagUpdate.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkValueTests.cs
git commit -m "feat: add update source metadata"
Task 7: Add Best-Effort Catch-Up Refresh Execution
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SubscriptionRegistrationEntry.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
Step 1: Write the failing test
[Fact]
public async Task Reconnect_WithRefreshLatestValue_CanDispatchCatchUpReplay()
{
    SuiteLinkTagUpdate? catchUp = null;
    var client = CreateReconnectReplayClient(
        catchUpPolicy: SuiteLinkCatchUpPolicy.RefreshLatestValue,
        onUpdate: update =>
        {
            if (update.Source == SuiteLinkUpdateSource.CatchUpReplay)
            {
                catchUp = update;
            }
        });

    await client.ConnectAsync(CreateOptionsWithCatchUp());

    // Reconnect and catch-up run in the background, so poll instead of asserting once.
    await Eventually.AssertAsync(() => Assert.NotNull(catchUp));
    Assert.Equal(SuiteLinkUpdateSource.CatchUpReplay, catchUp!.Source);
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_WithRefreshLatestValue_CanDispatchCatchUpReplay -v minimal
Expected: FAIL because reconnect only resumes live dispatch today
Step 3: Write minimal implementation
After successful reconnect and durable subscription replay:
- if Runtime.CatchUpPolicy == SuiteLinkCatchUpPolicy.RefreshLatestValue, run a sequential refresh pass over the durable subscriptions
- obtain one fresh value per item, using the existing temporary-read machinery or a dedicated internal refresh path
- dispatch synthetic updates with Source: SuiteLinkUpdateSource.CatchUpReplay
Do not fail reconnect if one item refresh fails or times out.
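A sketch of the sequential pass under stated assumptions: durable subscriptions are represented as a list of item names, readLatestAsync stands in for the temporary-read or internal refresh path, and dispatch stands in for the client's update callback. CatchUpPassSketch and CatchUpUpdateSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum SuiteLinkUpdateSource { Live = 0, CatchUpReplay = 1 }

// Hypothetical shape of a synthetic catch-up update.
public sealed record CatchUpUpdateSketch(string ItemName, object Value, SuiteLinkUpdateSource Source);

public static class CatchUpPassSketch
{
    // Refreshes items one at a time and dispatches one CatchUpReplay update per successful read.
    // Returns how many items were refreshed; per-item failures are skipped, not rethrown.
    public static async Task<int> RunAsync(
        IReadOnlyList<string> durableItems,
        Func<string, CancellationToken, Task<object>> readLatestAsync,
        Action<CatchUpUpdateSketch> dispatch,
        CancellationToken cancellationToken)
    {
        var refreshed = 0;
        foreach (var item in durableItems)
        {
            object value;
            try
            {
                value = await readLatestAsync(item, cancellationToken);
            }
            catch (Exception) when (!cancellationToken.IsCancellationRequested)
            {
                continue; // best effort: one failed item does not fail the reconnect
            }

            dispatch(new CatchUpUpdateSketch(item, value, SuiteLinkUpdateSource.CatchUpReplay));
            refreshed++;
        }

        return refreshed;
    }
}
```

Running the pass sequentially keeps load on the server predictable right after reconnect, at the cost of a longer catch-up window for large subscription sets.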
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SubscriptionRegistrationEntry.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: add reconnect catch-up refresh replay"
Task 8: Make Catch-Up Partial Failure Non-Fatal
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
Step 1: Write the failing test
[Fact]
public async Task Reconnect_CatchUpTimeout_DoesNotFailRecoveredSubscriptions()
{
    var client = CreateReconnectReplayClientWithTimedOutRefresh();

    await client.ConnectAsync(CreateOptionsWithCatchUp());

    await Eventually.AssertAsync(() => Assert.True(client.IsConnected));
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter Reconnect_CatchUpTimeout_DoesNotFailRecoveredSubscriptions -v minimal
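Expected: FAIL if catch-up failure tears down reconnect. If the Task 7 implementation already isolates per-item failures, this test may pass immediately; keep it as a regression test.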
Step 3: Write minimal implementation
Wrap each refresh item independently:
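- apply a per-item timeout taken from Runtime.CatchUpTimeout
- catch and discard each item's failure, optionally recording an internal debug signal, but let cancellation of the client itself (shutdown or dispose) propagate
- continue to the remaining items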
Do not change the recovered Ready/Subscribed state.
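A sketch of the per-item wrapper, assuming the refresh path accepts a CancellationToken. CatchUpTimeoutSketch and TryRefreshAsync are hypothetical names. It links the client token to a per-item deadline and uses Task.WaitAsync so the deadline holds even if the refresh ignores its token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CatchUpTimeoutSketch
{
    // Runs one refresh with its own deadline. Returns (false, default) when that item times out
    // or fails, but rethrows when the client-wide token was cancelled (shutdown or dispose).
    public static async Task<(bool Ok, T? Value)> TryRefreshAsync<T>(
        Func<CancellationToken, Task<T>> refreshAsync,
        TimeSpan catchUpTimeout,
        CancellationToken clientToken)
    {
        using var perItem = CancellationTokenSource.CreateLinkedTokenSource(clientToken);
        perItem.CancelAfter(catchUpTimeout);
        try
        {
            // WaitAsync enforces the deadline even if refreshAsync never observes its token.
            var value = await refreshAsync(perItem.Token).WaitAsync(perItem.Token);
            return (true, value);
        }
        catch (Exception) when (!clientToken.IsCancellationRequested)
        {
            return (false, default);
        }
    }
}
```

Checking clientToken rather than the exception type distinguishes "this item was slow" from "the client is shutting down", which is the distinction the bullets above ask for.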
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkClientReconnectTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/SuiteLinkClient.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/SuiteLinkClientReconnectTests.cs
git commit -m "feat: tolerate partial catch-up refresh failures"
Task 9: Add Jitter Coverage Without Flaky Tests
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
Step 1: Write the failing test
[Fact]
public void GetDelay_WithJitterEnabled_StaysWithinCap()
{
    var policy = new SuiteLinkRetryPolicy(
        InitialDelay: TimeSpan.FromSeconds(2),
        Multiplier: 2.0,
        MaxDelay: TimeSpan.FromSeconds(10),
        UseJitter: true);

    var delay = SuiteLinkRetryDelayCalculator.GetDelay(policy, 3, () => 0.5);

    Assert.InRange(delay, TimeSpan.Zero, TimeSpan.FromSeconds(10));
}
Step 2: Run test to verify it fails
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal
Expected: FAIL because jitter injection does not exist yet
Step 3: Write minimal implementation
Add an injectable random source as an optional third parameter on the existing method, so existing two-argument callers keep compiling:
public static TimeSpan GetDelay(SuiteLinkRetryPolicy policy, int attempt, Func<double>? nextDouble = null)
When jitter is enabled:
- compute the bounded base delay
- apply the injected random value (deterministic in tests, a real random source otherwise)
- keep the final value within [0, MaxDelay]
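A sketch of the jittered calculator using "full jitter" (a uniform pick in [0, bounded delay]), which is one common choice; the plan does not prescribe a formula. RetryPolicyJitterSketch and JitterDelaySketch are hypothetical and omit MaxAttempts. Random.Shared is the default source when no sampler is injected.

```csharp
using System;

public sealed record RetryPolicyJitterSketch(
    TimeSpan InitialDelay, double Multiplier, TimeSpan MaxDelay, bool UseJitter = true);

public static class JitterDelaySketch
{
    public static TimeSpan GetDelay(RetryPolicyJitterSketch policy, int attempt, Func<double>? nextDouble = null)
    {
        if (attempt <= 0)
        {
            return TimeSpan.Zero; // first reconnect attempt is immediate
        }

        // Exponential base delay capped at MaxDelay, as in Task 2.
        var raw = policy.InitialDelay.TotalSeconds * Math.Pow(policy.Multiplier, attempt - 1);
        var boundedSeconds = Math.Min(raw, policy.MaxDelay.TotalSeconds);
        if (!policy.UseJitter)
        {
            return TimeSpan.FromSeconds(boundedSeconds);
        }

        // Full jitter: scale the bounded delay by a sample in [0, 1].
        var sample = nextDouble is null ? Random.Shared.NextDouble() : nextDouble();
        sample = Math.Clamp(sample, 0.0, 1.0); // guard against out-of-range injected values
        return TimeSpan.FromSeconds(boundedSeconds * sample);
    }
}
```

With this formula, the Task 9 test's injected 0.5 at attempt 3 yields exactly 4 seconds (half of the 8-second base), so the test could assert that value instead of only the range. Equal jitter (half fixed, half random) would also satisfy the [0, MaxDelay] bound.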
Step 4: Run test to verify it passes
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx --filter SuiteLinkRetryDelayCalculatorTests -v minimal
Expected: PASS
Step 5: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/src/SuiteLink.Client/Internal/SuiteLinkRetryDelayCalculator.cs /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.Tests/Internal/SuiteLinkRetryDelayCalculatorTests.cs
git commit -m "feat: add deterministic jitter coverage for retry policy"
Task 10: Update Documentation And Final Verification
Files:
- Modify: /Users/dohertj2/Desktop/suitelinkclient/README.md
- Modify: /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md
- Modify: /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-design.md
- Modify: /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-implementation-plan.md
Step 1: Write the documentation diff
Document:
- catch-up mode is latest-value refresh only
- retry policy is configurable and jittered by default
- reconnect success is separate from best-effort catch-up completion
- writes still fail during reconnect
Step 2: Run targeted verification
Run: rg -n "catch-up|retry|reconnect|jitter|refresh latest|reconnecting" /Users/dohertj2/Desktop/suitelinkclient/README.md /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md
Expected: the matched lines cover each of the four points above (rg exits 0 when it finds at least one match and 1 when it finds none)
Step 3: Run full verification
Run: dotnet test /Users/dohertj2/Desktop/suitelinkclient/SuiteLink.Client.slnx -v minimal
Expected: PASS
Step 4: Commit
git add /Users/dohertj2/Desktop/suitelinkclient/README.md /Users/dohertj2/Desktop/suitelinkclient/tests/SuiteLink.Client.IntegrationTests/README.md /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-design.md /Users/dohertj2/Desktop/suitelinkclient/docs/plans/2026-03-17-catchup-retry-implementation-plan.md
git commit -m "docs: describe catch-up replay and retry policy"