# SuiteLink Runtime Reconnect Design

## Goal

Add a background receive loop and automatic reconnect/recovery to the existing SuiteLink client so that subscriptions are restored automatically and update callbacks resume without caller intervention.

## Scope

This design adds:

- a background receive loop owned by `SuiteLinkClient`
- automatic reconnect with bounded retry backoff
- automatic subscription replay after reconnect
- resumed update dispatch after replay

This design does not add:

- write queuing during reconnect
- catch-up replay of missed values
- secure SuiteLink V3/TLS support
- AlarmMgr support

## Runtime Model

The current client processes inbound frames only on demand: callers must explicitly call `ProcessIncomingAsync` to receive updates. The new model moves normal operation to a managed background runtime loop.

There are two categories of state:

- durable desired state
  - configured connection options
  - caller subscription intent
  - callbacks associated with subscribed items
- ephemeral connection state
  - current transport connection
  - current session state
  - current `itemName <-> tagId` mappings

Durable state survives reconnects. Ephemeral state is discarded on connection loss and rebuilt on reconnect.

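
The following minimal Python sketch makes the durable/ephemeral split concrete. The class and method names (`DurableState`, `EphemeralState`, `reset_after_connection_loss`) are hypothetical and are not part of the existing client. The point is that a connection loss replaces only the ephemeral object and leaves caller intent untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical callback shape: (item_name, value) -> None.
UpdateCallback = Callable[[str, object], None]


@dataclass
class DurableState:
    """Survives reconnects: what the caller asked for."""
    host: str
    port: int
    subscriptions: Dict[str, UpdateCallback] = field(default_factory=dict)


@dataclass
class EphemeralState:
    """Rebuilt on every reconnect: what the current connection knows."""
    transport: Optional[object] = None
    session_ready: bool = False
    item_to_tag: Dict[str, int] = field(default_factory=dict)
    tag_to_item: Dict[int, str] = field(default_factory=dict)

    def bind(self, item_name: str, tag_id: int) -> None:
        # Maintain both directions of the itemName <-> tagId mapping.
        self.item_to_tag[item_name] = tag_id
        self.tag_to_item[tag_id] = item_name


class ClientStateSketch:
    def __init__(self, durable: DurableState) -> None:
        self.durable = durable
        self.live = EphemeralState()

    def reset_after_connection_loss(self) -> None:
        # Drop everything tied to the dead connection; durable intent is untouched.
        self.live = EphemeralState()
```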
## Recommended Approach

Implement a supervised background receive loop inside `SuiteLinkClient`.

Behavior:

1. `ConnectAsync` establishes the initial transport/session and starts the receive loop.
2. The receive loop reads frames continuously.
3. Update frames are decoded and dispatched to user callbacks.
4. EOF, transport exceptions, malformed frames, or replay failures trigger recovery.
5. Recovery reconnects with bounded retry delays.
6. After reconnect succeeds, the client replays all current subscriptions and resumes dispatching.

This keeps the public API simple and avoids forcing callers to poll `ProcessIncomingAsync` manually.

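
The supervised loop described in the steps above could look roughly like the following Python asyncio sketch. It is not the client's actual code. It assumes three hypothetical collaborators: `read_frame` returns `None` on EOF, `dispatch` decodes a frame and invokes callbacks (raising only on malformed input), and `recover` reconnects and replays. The sketch is built around one design choice: every failure trigger goes through the same recovery path.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ReceiveLoop:
    """Supervised receive loop; every collaborator name here is hypothetical."""

    def __init__(
        self,
        read_frame: Callable[[], Awaitable[Optional[bytes]]],
        dispatch: Callable[[bytes], None],
        recover: Callable[[], Awaitable[None]],
    ) -> None:
        self._read_frame = read_frame  # returns None on EOF (transport read of 0 bytes)
        self._dispatch = dispatch      # decodes + invokes callbacks; must contain callback errors itself
        self._recover = recover        # reconnects with backoff and replays subscriptions
        self._stopping = False

    def stop(self) -> None:
        self._stopping = True

    async def run(self) -> None:
        while not self._stopping:
            try:
                frame = await self._read_frame()
                if frame is None:
                    raise ConnectionError("transport returned EOF")
                self._dispatch(frame)  # a malformed frame raises here
            except Exception:
                # EOF, transport errors and malformed frames all take this path.
                if self._stopping:
                    break
                await self._recover()  # returns once the session is Ready again
```

Task cancellation is not swallowed by the `except Exception` branch, because `asyncio.CancelledError` derives from `BaseException` in Python 3.8 and later.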
## State Model

Expand the session/client lifecycle to distinguish pending, ready, and reconnecting states:

- `Disconnected`
- `Connecting`
- `ConnectSent`
- `Ready`
- `Reconnecting`
- `Faulted`
- `Disposed`

Definitions:

- `Connecting`: transport connect and handshake are in progress.
- `ConnectSent`: the startup connect has been sent, but the runtime is not yet considered ready.
- `Ready`: the background receive loop is active and subscriptions can be served normally.
- `Reconnecting`: the recovery loop is active after a connection failure.

`IsConnected` should return `true` only in the `Ready` state. It is `false` in every other state, including `ConnectSent` and `Reconnecting`.

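
Here is a Python sketch of the lifecycle states and the `IsConnected` rule. The enum mirrors the states listed above. Treating `Disposed` as terminal is an assumption. The design does not yet define transitions for `Faulted`, so the sketch enforces none.

```python
from enum import Enum, auto


class ClientState(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()    # transport connect + handshake in progress
    CONNECT_SENT = auto()  # startup connect sent, runtime not yet ready
    READY = auto()         # receive loop active, subscriptions served normally
    RECONNECTING = auto()  # recovery loop active after a connection failure
    FAULTED = auto()
    DISPOSED = auto()


class Lifecycle:
    def __init__(self) -> None:
        self.state = ClientState.DISCONNECTED

    def transition(self, new_state: ClientState) -> None:
        # Assumption: Disposed is terminal; nothing may leave it.
        if self.state is ClientState.DISPOSED:
            raise RuntimeError("client has been disposed")
        self.state = new_state

    @property
    def is_connected(self) -> bool:
        # IsConnected reflects Ready only: ConnectSent and Reconnecting report False.
        return self.state is ClientState.READY
```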
## Recovery Policy

Failure triggers:

- transport read returns `0`
- transport exception while sending or receiving
- malformed or unexpected frame during active runtime
- reconnect replay failure

Recovery behavior:

- stop the current receive loop
- mark the ephemeral session as disconnected/faulted
- start reconnect attempts until success or explicit shutdown

Retry schedule:

- first retry immediately
- then bounded retry delays such as:
  - 1 second
  - 2 seconds
  - 5 seconds
  - 10 seconds
- cap the delay instead of growing without bound

Writes during `Reconnecting` are rejected with a clear exception.

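
The following hedged Python sketch shows one way to express the retry schedule. `retry_delays` yields an immediate first attempt, walks the example delays (1, 2, 5, and 10 seconds), and then stays at the cap. `reconnect_until_success` and its `attempt`/`stop` parameters are hypothetical names. The stop event lets an explicit shutdown interrupt a pending delay.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Iterable, Iterator, Optional, Sequence

DEFAULT_SCHEDULE = (0.0, 1.0, 2.0, 5.0, 10.0)  # seconds; first retry is immediate


def retry_delays(schedule: Sequence[float] = DEFAULT_SCHEDULE) -> Iterator[float]:
    """Delay before each attempt: walk the schedule, then stay at the cap forever."""
    yield from schedule
    yield from itertools.repeat(schedule[-1])


async def reconnect_until_success(
    attempt: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
    delays: Optional[Iterable[float]] = None,
) -> bool:
    """Retry `attempt` until it succeeds (True) or shutdown is requested (False)."""
    for delay in (delays if delays is not None else retry_delays()):
        if delay > 0:
            try:
                # Sleep for `delay`, but wake early if shutdown is requested.
                await asyncio.wait_for(stop.wait(), timeout=delay)
            except asyncio.TimeoutError:
                pass
        if stop.is_set():
            return False
        try:
            await attempt()  # reconnect, handshake, connect, replay
            return True
        except Exception:
            continue  # any failure just moves on to the next delay
    return False  # only reachable with a finite custom schedule
```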
## Subscription Replay

The client should maintain a durable subscription registry keyed by `itemName`.

Each entry stores:

- `itemName`
- callback
- requested tag id

During reconnect:

1. reconnect transport
2. send handshake
3. send connect
4. replay every subscribed item via `ADVISE`
5. rebuild live session mappings from fresh ACKs
6. transition to `Ready`

Subscription replay is serialized and must not run concurrently with normal writes or new replay attempts.

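
The registry and the reconnect sequence could be sketched in Python as follows. `Subscription`, `SubscriptionRegistry`, and `reestablish` are hypothetical names. `transport` is an assumed object exposing `connect`, `send_handshake`, `send_connect`, and `advise`, where `advise` returns the tag ID confirmed by the ACK. A single `asyncio.Lock` stands in for the gate that serializes replay against writes and other replays. The sketch covers steps 1 to 5; the caller performs the transition to `Ready`.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subscription:
    item_name: str
    callback: Callable[[str, object], None]
    requested_tag_id: int


class SubscriptionRegistry:
    """Durable subscription intent, keyed by item name."""

    def __init__(self) -> None:
        self._entries: Dict[str, Subscription] = {}

    def add(self, sub: Subscription) -> None:
        self._entries[sub.item_name] = sub

    def remove(self, item_name: str) -> None:
        self._entries.pop(item_name, None)

    def snapshot(self) -> List[Subscription]:
        return list(self._entries.values())


async def reestablish(transport, registry: SubscriptionRegistry, gate: asyncio.Lock) -> Dict[str, int]:
    """Reconnect sequence steps 1-5; the caller then transitions to Ready.

    `transport` is a hypothetical object; `advise` is assumed to return the
    tag id confirmed by the server's ACK.
    """
    async with gate:  # no writes or other replays while this runs
        await transport.connect()
        await transport.send_handshake()
        await transport.send_connect()
        fresh: Dict[str, int] = {}
        for sub in registry.snapshot():
            fresh[sub.item_name] = await transport.advise(sub.item_name, sub.requested_tag_id)
        return fresh  # becomes the new live itemName -> tagId mapping
```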
## Callback Rules

Callbacks must never run under client locks or gates.

Rules:

- decode frames under internal synchronization
- dispatch callbacks only after releasing gates
- callback exceptions remain contained and do not crash the receive loop

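
The callback rules reduce to a two-phase pattern: resolve the callback under the lock, then invoke it outside the lock. Here is a minimal Python asyncio sketch, with `decode` and `find_callback` as hypothetical stand-ins for the session's frame decoding and callback lookup.

```python
import asyncio
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("suitelink_sketch")
log.addHandler(logging.NullHandler())

Callback = Callable[[str, object], None]


async def handle_update(
    frame: bytes,
    state_lock: asyncio.Lock,
    decode: Callable[[bytes], Tuple[str, object]],
    find_callback: Callable[[str], Optional[Callback]],
) -> None:
    # 1. Decode and resolve the callback while holding the internal lock.
    async with state_lock:
        item_name, value = decode(frame)
        callback = find_callback(item_name)
    # 2. The lock is released before any user code runs.
    if callback is None:
        return
    try:
        callback(item_name, value)
    except Exception:
        # 3. A failing callback is logged and swallowed; the loop keeps reading.
        log.exception("update callback for %r raised", item_name)
```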
## Public API Effects

Expected public behavior:

- `ConnectAsync`
  - establishes the initial runtime and starts background receive
- `SubscribeAsync`
  - records durable intent
  - advises immediately when ready
  - keeps the durable subscription for replay after reconnect
- `ReadAsync`
  - can remain implemented as a temporary subscription
  - should still use the background runtime instead of manual caller polling
- `WriteAsync`
  - allowed only in `Ready`
  - fails during `Reconnecting`
- `DisconnectAsync`
  - stops receive and reconnect tasks
  - tears down transport

`ProcessIncomingAsync` should no longer be the primary runtime API. Retain it only as an internal/test helper if it is still useful.

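
Here is a synchronous Python sketch of the `SubscribeAsync` and `WriteAsync` rules, reduced to three states for brevity. `ApiSketch`, `ClientReconnectingError`, and the strings appended to `wire` are hypothetical placeholders, not real SuiteLink frames. The sketch reads "advises immediately when ready" this way: outside `Ready`, the intent is only recorded, and the next replay sends the `ADVISE`.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class State(Enum):
    DISCONNECTED = auto()
    READY = auto()
    RECONNECTING = auto()


class ClientReconnectingError(RuntimeError):
    """Hypothetical exception raised for writes attempted during recovery."""


class ApiSketch:
    def __init__(self) -> None:
        self.state = State.DISCONNECTED
        self.durable: Dict[str, Callable] = {}  # survives reconnects
        self.wire: List[str] = []               # placeholder for messages sent on the transport

    def subscribe(self, item_name: str, callback: Callable) -> None:
        self.durable[item_name] = callback      # always record intent
        if self.state is State.READY:
            self.wire.append(f"ADVISE {item_name}")
        # Otherwise the next replay advises it.

    def write(self, item_name: str, value: object) -> None:
        if self.state is State.RECONNECTING:
            raise ClientReconnectingError(f"cannot write {item_name!r}: client is reconnecting")
        if self.state is not State.READY:
            raise RuntimeError(f"cannot write {item_name!r}: client is {self.state.name}")
        self.wire.append(f"WRITE {item_name}={value}")  # placeholder, not a real frame
```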
## Internal Changes

### `SuiteLinkClient`

Add:

- receive loop task
- reconnect supervisor task or integrated recovery loop
- cancellation tokens for runtime shutdown
- durable subscription registry
- reconnect backoff helper

Responsibilities:

- own runtime lifecycle
- coordinate reconnect attempts
- replay subscriptions safely
- ensure that at most one receive loop and one reconnect flow are active at any time

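
One way to guarantee a single reconnect flow is to have each failure join an in-flight recovery instead of starting a new one. The following Python asyncio sketch uses a hypothetical `RecoveryCoordinator`; its `shutdown` method shows how disconnect or dispose could cancel a pending recovery.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class RecoveryCoordinator:
    """Guarantees at most one reconnect flow at a time."""

    def __init__(self, recover: Callable[[], Awaitable[None]]) -> None:
        self._recover = recover
        self._task: Optional[asyncio.Task] = None

    def report_failure(self) -> "asyncio.Task[None]":
        # Join the in-flight recovery rather than starting a second one.
        if self._task is None or self._task.done():
            self._task = asyncio.get_running_loop().create_task(self._recover())
        return self._task

    async def shutdown(self) -> None:
        # Cancel any in-flight recovery, e.g. from DisconnectAsync/DisposeAsync.
        task = self._task
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
```

The check-then-create in `report_failure` needs no lock only because asyncio runs it on a single thread. A multithreaded runtime would need an atomic compare-and-swap or a lock around that check.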
### `SuiteLinkSession`

Continue to manage:

- live connection/session state
- current `itemName <-> tagId` mappings
- live dispatch helpers

Do not make it responsible for durable reconnect intent.

### `SubscriptionHandle`

`SubscriptionHandle` should continue to remove durable subscription intent and to trigger `UNADVISE` when possible.

If removal is requested during reconnect or disconnect, the durable intent is still removed even if the wire-level `UNADVISE` cannot be sent.

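
Here is a Python sketch of the removal rule. It assumes a plain dict as the durable registry and hypothetical `send_unadvise`/`is_ready` hooks. Durable intent is removed first and unconditionally; the wire `UNADVISE` is best effort.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SubscriptionHandle:
    """Sketch: removing durable intent never depends on the wire."""

    def __init__(
        self,
        registry: Dict[str, object],
        item_name: str,
        send_unadvise: Callable[[str], Awaitable[None]],
        is_ready: Callable[[], bool],
    ) -> None:
        self._registry = registry
        self._item_name = item_name
        self._send_unadvise = send_unadvise
        self._is_ready = is_ready
        self._disposed = False

    async def dispose(self) -> None:
        if self._disposed:
            return
        self._disposed = True
        # Remove durable intent first, so a later replay cannot resurrect it.
        self._registry.pop(self._item_name, None)
        if not self._is_ready():
            return  # reconnecting or disconnected: skip the wire UNADVISE
        try:
            await self._send_unadvise(self._item_name)
        except OSError:
            pass  # best effort; the durable intent is already gone
```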
## Testing Strategy

### Runtime Loop Tests

Add tests proving:

- updates received by the background loop reach callbacks
- no manual `ProcessIncomingAsync` call is needed in normal operation

### Recovery Tests

Add tests proving:

- EOF triggers reconnect
- reconnect replays handshake/connect/subscriptions
- callback dispatch resumes after reconnect
- writes during reconnect fail predictably

### Lifecycle Tests

Add tests proving:

- `DisconnectAsync` stops background tasks
- `DisposeAsync` stops reconnect attempts
- repeated failures do not start multiple reconnect loops

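
Recovery tests are easier to write against a scripted fake transport, in which each connection serves a fixed list of inbound frames and then reports EOF. The Python sketch below pairs such a fake (`ScriptedTransport`, hypothetical) with a deliberately minimal stand-in for the client runtime (`run_client`, also hypothetical) to show the shape of an "EOF triggers reconnect and replay" test. A real test would drive `SuiteLinkClient` instead.

```python
import asyncio
from collections import deque
from typing import Callable, Iterable, List, Optional


class ScriptedTransport:
    """Fake transport: each connect() starts the next scripted session.

    A session is a list of inbound frames; once it is exhausted, read()
    returns None, which the runtime treats as EOF.
    """

    def __init__(self, sessions: Iterable[Iterable[str]]) -> None:
        self._sessions = deque(deque(s) for s in sessions)
        self._current: deque = deque()
        self.sent: List[str] = []  # outbound messages, in order
        self.connects = 0

    async def connect(self) -> None:
        self.connects += 1
        self._current = self._sessions.popleft() if self._sessions else deque()

    async def send(self, message: str) -> None:
        self.sent.append(message)

    async def read(self) -> Optional[str]:
        await asyncio.sleep(0)  # yield to the event loop like a real socket read
        return self._current.popleft() if self._current else None


async def run_client(transport, items, on_update: Callable[[str], None], max_connects: int) -> None:
    """Deliberately tiny stand-in for the client runtime."""
    while transport.connects < max_connects:
        await transport.connect()
        await transport.send("HANDSHAKE")
        await transport.send("CONNECT")
        for item in items:  # replay every durable subscription
            await transport.send(f"ADVISE {item}")
        while (frame := await transport.read()) is not None:
            on_update(frame)
        # read() returned None (EOF): loop around and reconnect


def test_eof_triggers_reconnect_and_replay() -> None:
    transport = ScriptedTransport([["Tag1=1"], ["Tag1=2"]])
    updates: List[str] = []
    asyncio.run(run_client(transport, ["Tag1"], updates.append, max_connects=2))
    assert transport.connects == 2
    assert transport.sent == ["HANDSHAKE", "CONNECT", "ADVISE Tag1"] * 2
    assert updates == ["Tag1=1", "Tag1=2"]
```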
## Recommended Next Step

Create an implementation plan that breaks this into small tasks:

- durable subscription registry
- background receive loop
- reconnect loop and backoff
- replay logic
- runtime tests