# SuiteLink Runtime Reconnect Design

## Goal

Add a background receive loop and automatic reconnect/recovery to the existing SuiteLink client so subscriptions are restored automatically and update callbacks resume without caller intervention.

## Scope

This design adds:

- a background receive loop owned by `SuiteLinkClient`
- automatic reconnect with bounded retry backoff
- automatic subscription replay after reconnect
- resumed update dispatch after replay

This design does not add:

- write queuing during reconnect
- catch-up replay of missed values
- secure SuiteLink V3/TLS support
- AlarmMgr support

## Runtime Model

The current client uses explicit on-demand inbound processing. The new model shifts normal operation to a managed runtime loop.

There are two categories of state:

- durable desired state
  - configured connection options
  - caller subscription intent
  - callbacks associated with subscribed items
- ephemeral connection state
  - current transport connection
  - current session state
  - current `itemName <-> tagId` mappings

Durable state survives reconnects. Ephemeral state is rebuilt on reconnect.

## Recommended Approach

Implement a supervised background receive loop inside `SuiteLinkClient`.

Behavior:

1. `ConnectAsync` establishes the initial transport/session and starts the receive loop.
2. The receive loop reads frames continuously.
3. Update frames are decoded and dispatched to user callbacks.
4. EOF, transport exceptions, malformed frames, or replay failures trigger recovery.
5. Recovery reconnects with bounded retry delays.
6. After reconnect succeeds, the client replays all current subscriptions and resumes dispatching.

This keeps the public API simple and avoids forcing callers to manually poll `ProcessIncomingAsync`.

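
The six behavior steps above can be sketched as a single supervising coroutine. This is a minimal illustration written in Python for brevity, not the client's implementation: `supervise`, `ConnectionLost`, and the injected `connect`, `replay`, `frames`, `dispatch`, and `sleep` parameters are all hypothetical names, and the default delay values are only an assumed schedule.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Sequence


class ConnectionLost(Exception):
    """Hypothetical marker for EOF, transport errors, or malformed frames."""


async def supervise(
    connect: Callable[[], Awaitable[None]],      # transport + handshake + connect
    replay: Callable[[], Awaitable[None]],       # re-advise every durable subscription
    frames: Callable[[], AsyncIterator[bytes]],  # yields frames, ends on EOF
    dispatch: Callable[[bytes], None],           # decode and hand off to callbacks
    delays: Sequence[float] = (0.0, 1.0, 2.0, 5.0, 10.0),  # assumed schedule
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Receive until cancelled; on failure, reconnect and replay."""
    await connect()  # step 1: an initial connect failure propagates to the caller
    while True:
        try:
            async for frame in frames():  # step 2: read frames continuously
                dispatch(frame)           # step 3: dispatch updates
            # Leaving the loop normally means the stream ended (EOF).
        except ConnectionLost:
            pass                          # step 4: any of these triggers recovery
        attempt = 0
        while True:                       # step 5: bounded retry delays
            # Past the end of the schedule, keep reusing the last (capped) delay.
            await sleep(delays[min(attempt, len(delays) - 1)])
            attempt += 1
            try:
                await connect()
                await replay()            # step 6: restore subscriptions
                break                     # back to receiving
            except ConnectionLost:
                continue                  # a failed replay also restarts recovery
```

In this sketch, shutdown is expressed by cancelling the task that runs `supervise`; because only `ConnectionLost` is caught, cancellation propagates out of both loops.
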
## State Model

Expand session/client lifecycle to distinguish pending vs ready vs reconnecting:

- `Disconnected`
- `Connecting`
- `ConnectSent`
- `Ready`
- `Reconnecting`
- `Faulted`
- `Disposed`

Definitions:

- `Connecting`: transport connect + handshake in progress
- `ConnectSent`: startup connect has been sent but the runtime is not yet considered ready
- `Ready`: background receive loop active and subscriptions can be served normally
- `Reconnecting`: recovery loop active after a connection failure

`IsConnected` should reflect `Ready` only.

## Recovery Policy

Failure triggers:

- transport read returns `0`
- transport exception while sending or receiving
- malformed or unexpected frame during active runtime
- reconnect replay failure

Recovery behavior:

- stop the current receive loop
- mark the ephemeral session as disconnected/faulted
- start reconnect attempts until success or explicit shutdown

Retry schedule:

- first retry immediately
- then bounded retry delays such as:
  - 1 second
  - 2 seconds
  - 5 seconds
  - 10 seconds
- cap the delay instead of growing without bound

Writes during `Reconnecting` are rejected with a clear exception.

## Subscription Replay

The client should maintain a durable subscription registry keyed by `itemName`.

Each entry stores:

- `itemName`
- callback
- requested tag id

During reconnect:

1. reconnect transport
2. send handshake
3. send connect
4. replay every subscribed item via `ADVISE`
5. rebuild live session mappings from fresh ACKs
6. transition to `Ready`

Subscription replay is serialized and must not run concurrently with normal writes or new replay attempts.

## Callback Rules

Callbacks must never run under client locks or gates.

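
One way to honor this constraint is to look up the callback while holding the lock and invoke it only after the lock has been released. The sketch below is illustrative Python, not the client's code: `Dispatcher`, `register`, `on_update`, and `callback_errors` are hypothetical names, and `threading.Lock` stands in for whatever lock or gate the client actually uses.

```python
import threading
from typing import Callable, Dict, List


class Dispatcher:
    """Hypothetical dispatcher that never invokes callbacks while locked."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._callbacks: Dict[int, Callable[[object], None]] = {}  # tagId -> callback
        self.callback_errors: List[Exception] = []

    def register(self, tag_id: int, callback: Callable[[object], None]) -> None:
        with self._lock:
            self._callbacks[tag_id] = callback

    def on_update(self, tag_id: int, value: object) -> None:
        # Resolve the callback under the lock...
        with self._lock:
            callback = self._callbacks.get(tag_id)
        # ...and call it only after the lock is released, so a slow or
        # re-entrant callback cannot block or deadlock the client.
        if callback is None:
            return
        try:
            callback(value)
        except Exception as exc:
            # Contain callback failures so the receive loop keeps running.
            self.callback_errors.append(exc)
```

Recording the exception in a list is only a placeholder; a real client would more likely log it or raise an error event.
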
Rules:

- decode frames under internal synchronization
- dispatch callbacks only after releasing gates
- callback exceptions remain contained and do not crash the receive loop

## Public API Effects

Expected public behavior:

- `ConnectAsync`
  - establishes initial runtime and starts background receive
- `SubscribeAsync`
  - records durable intent
  - advises immediately when ready
  - keeps durable subscription for replay after reconnect
- `ReadAsync`
  - can remain implemented as a temporary subscription
  - should still use the background runtime instead of manual caller polling
- `WriteAsync`
  - allowed only in `Ready`
  - fails during `Reconnecting`
- `DisconnectAsync`
  - stops receive and reconnect tasks
  - tears down transport

`ProcessIncomingAsync` should stop being the primary runtime API. It can be retained only as an internal/test helper if still useful.

## Internal Changes

### `SuiteLinkClient`

Add:

- receive loop task
- reconnect supervisor task or integrated recovery loop
- cancellation tokens for runtime shutdown
- durable subscription registry
- reconnect backoff helper

Responsibilities:

- own runtime lifecycle
- coordinate reconnect attempts
- replay subscriptions safely
- ensure only one receive loop and one reconnect flow are active

### `SuiteLinkSession`

Continue to manage:

- live connection/session state
- current `itemName <-> tagId` mappings
- live dispatch helpers

Do not make it responsible for durable reconnect intent.

### `SubscriptionHandle`

Should continue to remove durable subscription intent and trigger `UNADVISE` when possible.

If called during reconnect/disconnect, removal of durable intent still succeeds even if wire unadvise cannot be sent.

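
The removal behavior described for `SubscriptionHandle` can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the client's API: `SubscriptionRegistry`, `is_ready`, and `send_unadvise` are hypothetical names, and the built-in `ConnectionError` stands in for whatever transport failure the real client raises.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class SubscriptionRegistry:
    """Hypothetical durable registry keyed by item name."""

    def __init__(
        self,
        is_ready: Callable[[], bool],                     # True only in the Ready state
        send_unadvise: Callable[[str], Awaitable[None]],  # wire-level UNADVISE
    ) -> None:
        self._is_ready = is_ready
        self._send_unadvise = send_unadvise
        self._entries: Dict[str, Callable[[object], None]] = {}

    def add(self, item_name: str, callback: Callable[[object], None]) -> None:
        self._entries[item_name] = callback

    def items(self) -> List[str]:
        return sorted(self._entries)

    async def remove(self, item_name: str) -> bool:
        """Drop durable intent first; UNADVISE is best effort."""
        if item_name not in self._entries:
            return False
        del self._entries[item_name]  # intent is gone, so it will not be replayed
        if self._is_ready():
            try:
                await self._send_unadvise(item_name)
            except ConnectionError:
                # The link failed mid-unadvise; recovery rebuilds the session
                # without this item, so the removal still counts as a success.
                pass
        return True
```

Removing the durable entry before touching the wire means a reconnect that starts at any point afterwards cannot re-advise an item the caller has already released.
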
## Testing Strategy

### Runtime Loop Tests

Add tests proving:

- updates received by the background loop reach callbacks
- no manual `ProcessIncomingAsync` call is needed in normal operation

### Recovery Tests

Add tests proving:

- EOF triggers reconnect
- reconnect replays handshake/connect/subscriptions
- callback dispatch resumes after reconnect
- writes during reconnect fail predictably

### Lifecycle Tests

Add tests proving:

- `DisconnectAsync` stops background tasks
- `DisposeAsync` stops reconnect attempts
- repeated failures do not start multiple reconnect loops

## Recommended Next Step

Create an implementation plan that breaks this into small tasks:

- durable subscription registry
- background receive loop
- reconnect loop and backoff
- replay logic
- runtime tests