suitelinkclient/docs/plans/2026-03-17-runtime-reconnect-design.md
2026-03-17 11:04:19 -04:00

SuiteLink Runtime Reconnect Design

Goal

Add a background receive loop and automatic reconnect/recovery to the existing SuiteLink client so subscriptions are restored automatically and update callbacks resume without caller intervention.

Scope

This design adds:

  • a background receive loop owned by SuiteLinkClient
  • automatic reconnect with bounded retry backoff
  • automatic subscription replay after reconnect
  • resumed update dispatch after replay

This design does not add:

  • write queuing during reconnect
  • catch-up replay of missed values
  • secure SuiteLink V3/TLS support
  • AlarmMgr support

Runtime Model

The current client processes inbound frames only on demand, when the caller explicitly invokes ProcessIncomingAsync. The new model shifts normal operation to a managed runtime loop.

There are two categories of state:

  • durable desired state
    • configured connection options
    • caller subscription intent
    • callbacks associated with subscribed items
  • ephemeral connection state
    • current transport connection
    • current session state
    • current itemName <-> tagId mappings

Durable state survives reconnects. Ephemeral state is rebuilt on reconnect.
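
The split between the two categories can be sketched as two separate containers. This is a minimal illustration in Python, not the client's actual types: DurableState, EphemeralState, bind, and every field name are hypothetical, and the callback signature is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Callback = Callable[[object], None]  # assumed callback shape: receives the new value


@dataclass
class DurableState:
    """Desired state that survives reconnects."""
    host: str   # hypothetical connection options
    port: int
    # Caller subscription intent: itemName -> callback.
    subscriptions: Dict[str, Callback] = field(default_factory=dict)


@dataclass
class EphemeralState:
    """Connection-scoped state, discarded and rebuilt on every reconnect."""
    transport: Optional[object] = None
    session_ready: bool = False
    item_to_tag: Dict[str, int] = field(default_factory=dict)
    tag_to_item: Dict[int, str] = field(default_factory=dict)

    def bind(self, item_name: str, tag_id: int) -> None:
        """Record one itemName <-> tagId mapping in both directions."""
        self.item_to_tag[item_name] = tag_id
        self.tag_to_item[tag_id] = item_name
```

On reconnect, a sketch like this would simply construct a fresh EphemeralState and refill it from new ACKs, while the DurableState instance is left untouched.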

Implement a supervised background receive loop inside SuiteLinkClient.

Behavior:

  1. ConnectAsync establishes the initial transport/session and starts the receive loop.
  2. The receive loop reads frames continuously.
  3. Update frames are decoded and dispatched to user callbacks.
  4. EOF, transport exceptions, malformed frames, or replay failures trigger recovery.
  5. Recovery reconnects with bounded retry delays.
  6. After reconnect succeeds, the client replays all current subscriptions and resumes dispatching.

This keeps the public API simple and avoids forcing callers to manually poll ProcessIncomingAsync.
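
The behavior above can be sketched as a single supervised loop. This is a simplified Python illustration under stated assumptions: connect is a hypothetical coroutine that performs transport connect, handshake, and subscription replay and returns a frame reader; the reader returns None on EOF and raises ValueError on a malformed frame; a fixed retry_delay stands in for the bounded backoff schedule.

```python
import asyncio
from typing import Awaitable, Callable, Optional

ReadFrame = Callable[[], Awaitable[Optional[bytes]]]


class ConnectionLost(Exception):
    """Raised when the transport reports EOF."""


async def run_receive_loop(
    connect: Callable[[], Awaitable[ReadFrame]],
    on_update: Callable[[bytes], None],
    stop: asyncio.Event,
    retry_delay: float = 0.0,
) -> None:
    """Read frames until a failure, then reconnect and resume dispatching."""
    while not stop.is_set():
        try:
            # Assumed to cover connect + handshake + replay in one call.
            read_frame = await connect()
            while not stop.is_set():
                frame = await read_frame()
                if frame is None:  # EOF from the transport
                    raise ConnectionLost("EOF")
                try:
                    on_update(frame)
                except Exception:
                    # Callback failures are contained; the loop keeps running.
                    pass
        except (ConnectionLost, OSError, ValueError):
            # Any failure above is treated as a recovery trigger.
            await asyncio.sleep(retry_delay)
```

The stop event is only checked between frames here; a fuller implementation would likely also cancel a pending read on shutdown.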

State Model

Expand the session/client lifecycle so that it distinguishes pending, ready, and reconnecting states:

  • Disconnected
  • Connecting
  • ConnectSent
  • Ready
  • Reconnecting
  • Faulted
  • Disposed

Definitions:

  • Connecting: transport connect + handshake in progress
  • ConnectSent: startup connect has been sent but the runtime is not yet considered ready
  • Ready: background receive loop active and subscriptions can be served normally
  • Reconnecting: recovery loop active after a connection failure

IsConnected should return true only in the Ready state.
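
As an illustration, the lifecycle could be modeled as an enumeration with IsConnected derived from it. The Python names ClientState and is_connected are hypothetical; only the state names come from the list above.

```python
from enum import Enum, auto


class ClientState(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()     # transport connect + handshake in progress
    CONNECT_SENT = auto()   # connect sent, runtime not yet ready
    READY = auto()          # receive loop active, subscriptions served
    RECONNECTING = auto()   # recovery loop active after a failure
    FAULTED = auto()
    DISPOSED = auto()


def is_connected(state: ClientState) -> bool:
    """Mirror of IsConnected: true in READY and nowhere else."""
    return state is ClientState.READY
```

Deriving the flag from the state, rather than storing it separately, avoids the two drifting apart during recovery.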

Recovery Policy

Failure triggers:

  • transport read returns 0 bytes (EOF)
  • transport exception while sending or receiving
  • malformed or unexpected frame during active runtime
  • reconnect replay failure

Recovery behavior:

  • stop the current receive loop
  • mark the ephemeral session as disconnected/faulted
  • start reconnect attempts until success or explicit shutdown

Retry schedule:

  • first retry immediately
  • then bounded retry delays such as:
    • 1 second
    • 2 seconds
    • 5 seconds
    • 10 seconds
  • cap the delay instead of growing without bound

Writes during Reconnecting are rejected with a clear exception.
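
The retry schedule can be sketched as an endless sequence of delays that repeats its last value once the listed steps run out. The helper name retry_delays is hypothetical, and the default values are just the example delays listed above.

```python
from typing import Iterator, Sequence


def retry_delays(
    schedule: Sequence[float] = (0.0, 1.0, 2.0, 5.0, 10.0),
) -> Iterator[float]:
    """Yield each delay in order, then repeat the last one forever (the cap)."""
    if not schedule:
        # Because this is a generator, the error surfaces on first iteration.
        raise ValueError("schedule must not be empty")
    for delay in schedule:
        yield delay
    while True:
        yield schedule[-1]
```

A recovery loop would create one such iterator per outage and call next() on it before each attempt, so a successful reconnect naturally resets the schedule for the next failure.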

Subscription Replay

The client should maintain a durable subscription registry keyed by itemName.

Each entry stores:

  • itemName
  • callback
  • requested tag id

During reconnect:

  1. reconnect transport
  2. send handshake
  3. send connect
  4. replay every subscribed item via ADVISE
  5. rebuild live session mappings from fresh ACKs
  6. transition to Ready

Subscription replay is serialized: it must not run concurrently with normal writes or with another replay attempt.
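
Steps 4 and 5 can be sketched as follows. This is an illustrative Python outline, assuming a hypothetical advise coroutine that sends ADVISE for one item and returns the tag id from the matching ACK, and assuming that writes take the same gate so they cannot interleave with replay.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def replay_subscriptions(
    item_names: Iterable[str],
    advise: Callable[[str], Awaitable[int]],
    gate: asyncio.Lock,
) -> Dict[str, int]:
    """Re-advise every durable item and return fresh itemName -> tagId mappings."""
    async with gate:  # serialize against writes and other replay attempts
        fresh: Dict[str, int] = {}
        # Snapshot the names so registry changes during replay cannot break iteration.
        for name in list(item_names):
            # An exception here propagates to the caller as a replay failure.
            fresh[name] = await advise(name)
        return fresh
```

Building the mapping in a local dictionary and returning it only on success means a failed replay never leaves a half-populated live mapping behind.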

Callback Rules

Callbacks must never run under client locks or gates.

Rules:

  • decode frames under internal synchronization
  • dispatch callbacks only after releasing gates
  • callback exceptions remain contained and do not crash the receive loop
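
A minimal sketch of these rules: look up the callback while holding the lock, release the lock, then invoke the callback inside a try block. The Dispatcher class and its members are hypothetical, and a thread lock stands in for whatever gate the client actually uses.

```python
import threading
from typing import Callable, Dict, List


class Dispatcher:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._callbacks: Dict[int, Callable[[object], None]] = {}
        self.callback_errors: List[Exception] = []

    def register(self, tag_id: int, callback: Callable[[object], None]) -> None:
        with self._lock:
            self._callbacks[tag_id] = callback

    def on_update(self, tag_id: int, value: object) -> None:
        # Resolve the callback under the lock...
        with self._lock:
            callback = self._callbacks.get(tag_id)
        if callback is None:
            return
        # ...and run it only after the lock has been released.
        try:
            callback(value)
        except Exception as exc:
            # Contain the failure so the receive loop is not affected.
            self.callback_errors.append(exc)
```

Running the callback outside the lock lets a callback call back into the client (for example to subscribe or unsubscribe) without deadlocking.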

Public API Effects

Expected public behavior:

  • ConnectAsync
    • establishes initial runtime and starts background receive
  • SubscribeAsync
    • records durable intent
    • advises immediately when ready
    • keeps durable subscription for replay after reconnect
  • ReadAsync
    • can remain implemented as a temporary subscription
    • should still use the background runtime instead of manual caller polling
  • WriteAsync
    • allowed only in Ready
    • fails during Reconnecting
  • DisconnectAsync
    • stops receive and reconnect tasks
    • tears down transport

ProcessIncomingAsync should stop being the primary runtime API. It can be retained only as an internal/test helper if still useful.

Internal Changes

SuiteLinkClient

Add:

  • receive loop task
  • reconnect supervisor task or integrated recovery loop
  • cancellation tokens for runtime shutdown
  • durable subscription registry
  • reconnect backoff helper

Responsibilities:

  • own runtime lifecycle
  • coordinate reconnect attempts
  • replay subscriptions safely
  • ensure only one receive loop and one reconnect flow are active
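
One way to guarantee a single reconnect flow is a "single-flight" guard: a failure report starts recovery only if none is already running. The sketch below is illustrative Python; ReconnectSupervisor, trigger, and the reconnect coroutine are hypothetical names, and trigger is assumed to be called from within the running event loop.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ReconnectSupervisor:
    def __init__(self, reconnect: Callable[[], Awaitable[None]]) -> None:
        self._reconnect = reconnect
        self._task: Optional[asyncio.Future] = None

    def trigger(self) -> asyncio.Future:
        """Start recovery unless one is already in flight; return the active task."""
        if self._task is None or self._task.done():
            self._task = asyncio.ensure_future(self._reconnect())
        return self._task
```

Because trigger contains no await, the check and the task creation cannot be interleaved by another coroutine on the same event loop.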

SuiteLinkSession

Continue to manage:

  • live connection/session state
  • current itemName <-> tagId mappings
  • live dispatch helpers

Do not make it responsible for durable reconnect intent.

SubscriptionHandle

SubscriptionHandle should continue to remove durable subscription intent and trigger UNADVISE when possible.

If this happens during reconnect or disconnect, removal of durable intent still succeeds even if the wire UNADVISE cannot be sent.
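
This ordering can be sketched as: drop the durable entry first, then attempt the wire UNADVISE on a best-effort basis. The Python class below is a hypothetical stand-in; the release method name, the dictionary registry, and the assumption that a dead transport surfaces as OSError are all illustrative.

```python
from typing import Awaitable, Callable, Dict


class SubscriptionHandle:
    def __init__(
        self,
        item_name: str,
        registry: Dict[str, object],
        unadvise: Callable[[str], Awaitable[None]],
    ) -> None:
        self._item_name = item_name
        self._registry = registry
        self._unadvise = unadvise
        self._released = False

    async def release(self) -> None:
        if self._released:
            return  # releasing twice is a no-op
        self._released = True
        # Durable intent is removed first, so it is never replayed again.
        self._registry.pop(self._item_name, None)
        try:
            await self._unadvise(self._item_name)
        except OSError:
            # Transport unavailable (reconnecting or disconnected): nothing to undo.
            pass
```

Since the live mappings are rebuilt from the durable registry on reconnect, a skipped UNADVISE has no lasting effect once the old connection is gone.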

Testing Strategy

Runtime Loop Tests

Add tests proving:

  • updates received by the background loop reach callbacks
  • no manual ProcessIncomingAsync call is needed in normal operation

Recovery Tests

Add tests proving:

  • EOF triggers reconnect
  • reconnect replays handshake/connect/subscriptions
  • callback dispatch resumes after reconnect
  • writes during reconnect fail predictably

Lifecycle Tests

Add tests proving:

  • DisconnectAsync stops background tasks
  • DisposeAsync stops reconnect attempts
  • repeated failures do not start multiple reconnect loops
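
The shutdown behavior these tests target can be sketched as cancelling the background tasks and waiting for them to finish. This is an illustrative Python helper with a hypothetical name (stop_tasks); it assumes the receive and reconnect loops are ordinary asyncio tasks.

```python
import asyncio


async def stop_tasks(*tasks: asyncio.Future) -> None:
    """Cancel every background task and wait until all of them have finished."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each CancelledError instead of raising it.
    await asyncio.gather(*tasks, return_exceptions=True)
```

A lifecycle test could then start the tasks, call the helper, and assert that each task reports done() afterwards.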

Next step: create an implementation plan that breaks this design into small tasks:

  • durable subscription registry
  • background receive loop
  • reconnect loop and backoff
  • replay logic
  • runtime tests