Component: Store-and-Forward Engine
Purpose
The Store-and-Forward Engine provides reliable message delivery for outbound communications from site clusters. It buffers messages when the target system is unavailable, retries them according to configured policies, and parks messages that exhaust retries for manual review.
Location
Site clusters only. The central cluster does not buffer messages.
Responsibilities
- Buffer outbound messages when the target system is unavailable.
- Manage three categories of buffered messages:
  - External system API calls.
  - Email notifications.
  - Cached database writes.
- Retry delivery per message according to the configured retry policy.
- Park messages that exhaust their retry limit (dead-letter).
- Persist buffered messages to local SQLite for durability.
- Replicate buffered messages to the standby node via application-level replication over Akka.NET remoting.
- On failover, the standby node takes over delivery from its replicated copy.
- Respond to remote queries from central for parked message management (list, retry, discard).
Message Lifecycle
    Script submits message
        │
        ▼
    Attempt immediate delivery
        │
        ├── Success → Remove from buffer
        │
        └── Failure → Buffer message
                        │
                        ▼
                  Retry loop (per retry policy)
                        │
                        ├── Success → Remove from buffer + notify standby
                        │
                        └── Max retries exhausted → Park message
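
The lifecycle above can be sketched as a small in-memory state machine. This is only an illustration in Python, not the engine's actual implementation (which runs on Akka.NET): the names `LifecycleSketch`, `submit`, and `retry_once` are hypothetical, persistence and standby notification are left out, and the sketch assumes the initial delivery attempt does not count against the retry limit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Status(Enum):
    PENDING = "pending"
    RETRYING = "retrying"
    PARKED = "parked"


@dataclass
class BufferedMessage:
    message_id: str
    payload: str
    retry_count: int = 0
    status: Status = Status.PENDING


class LifecycleSketch:
    """In-memory model of the lifecycle; persistence and replication are omitted."""

    def __init__(self, deliver: Callable[[str], bool], max_retries: int):
        self.deliver = deliver          # returns True on success, False on a transient failure
        self.max_retries = max_retries  # assumption: read from the source entity's retry policy
        self.buffer: Dict[str, BufferedMessage] = {}

    def submit(self, message_id: str, payload: str) -> bool:
        """Attempt immediate delivery; buffer the message only if that attempt fails."""
        if self.deliver(payload):
            return True
        self.buffer[message_id] = BufferedMessage(message_id, payload)
        return False

    def retry_once(self, message_id: str) -> Optional[Status]:
        """Run one retry attempt. Returns None if the message was delivered and removed."""
        msg = self.buffer[message_id]
        if msg.status is Status.PARKED:
            return msg.status  # parked messages wait for manual review
        msg.retry_count += 1
        if self.deliver(msg.payload):
            del self.buffer[message_id]
            return None
        exhausted = msg.retry_count >= self.max_retries
        msg.status = Status.PARKED if exhausted else Status.RETRYING
        return msg.status
```

In a real engine `retry_once` would be driven by a timer at the configured interval; here the caller invokes it directly so the transitions are easy to follow.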
Retry Policy
Retry settings are defined on the source entity (not per-message):
- External systems: Each external system definition includes max retry count and time between retries.
- Notifications: Email/SMTP configuration includes max retry count and time between retries.
- Cached database writes: Each database connection definition includes max retry count and time between retries.
The retry interval is fixed; there is no exponential backoff. A fixed interval is sufficient for the expected use cases.
Note: Only transient failures are eligible for store-and-forward buffering. For external system calls, transient failures are connection errors, timeouts, and HTTP 5xx responses. Permanent failures (HTTP 4xx) are returned directly to the calling script and are not queued for retry. This prevents the buffer from accumulating requests that will never succeed.
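
As a rough illustration of the rule in the note, the following Python sketch classifies a failed external system call and computes the next attempt time for a fixed interval. The function names and parameters are hypothetical, and treating any outcome other than a connection error, timeout, or 5xx response as non-transient is an assumption of the sketch.

```python
from typing import Optional


def is_transient_failure(http_status: Optional[int] = None,
                         connection_error: bool = False,
                         timed_out: bool = False) -> bool:
    """Return True if the failure is eligible for store-and-forward buffering."""
    if connection_error or timed_out:
        return True
    # HTTP 5xx is transient; 4xx (and, by assumption, anything else) is not buffered.
    return http_status is not None and 500 <= http_status <= 599


def next_attempt_at(last_attempt_at: float, retry_interval_seconds: float) -> float:
    """Fixed-interval schedule: the delay does not grow with the retry count."""
    return last_attempt_at + retry_interval_seconds
```

A caller would buffer the message when `is_transient_failure(...)` is true and otherwise return the error directly to the calling script.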
Buffer Size
There is no maximum buffer size. Messages accumulate in the buffer until delivery succeeds or retries are exhausted and the message is parked. Storage is bounded only by available disk space on the site node.
Persistence
- Buffered messages are persisted to a local SQLite database on each site node.
- The active node persists locally and forwards each buffer operation (add, remove, park) to the standby node asynchronously via Akka.NET remoting. The active node does not wait for standby acknowledgment, which avoids adding latency to every script that buffers a message.
- The standby node applies the same operations to its own local SQLite database.
- On failover, the new active node has a near-complete copy of the buffer. In rare cases, the most recent operations may not have been replicated (e.g., a message added or removed just before failover). This can result in a few duplicate deliveries (message delivered but the remove not replicated) or a few messages that the new active node never retries (message added but the add not replicated). Both are acceptable trade-offs for the latency benefit.
- On failover, the new active node resumes delivery from its local copy.
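
The replication behaviour described above can be modelled with two SQLite databases and an in-process queue standing in for Akka.NET remoting. This is a simplified sketch under those assumptions; `ActiveNode`, `apply_op`, `drain`, and the one-table schema are hypothetical names chosen for illustration.

```python
import sqlite3
from collections import deque
from typing import Deque, List, Optional, Tuple

Op = Tuple[str, str]  # (operation, message_id)


def open_buffer() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE buffer (message_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    return db


def apply_op(db: sqlite3.Connection, op: str, message_id: str) -> None:
    """Apply one buffer operation; both nodes run the same logic."""
    if op == "add":
        db.execute("INSERT OR IGNORE INTO buffer VALUES (?, 'pending')", (message_id,))
    elif op == "remove":
        db.execute("DELETE FROM buffer WHERE message_id = ?", (message_id,))
    elif op == "park":
        db.execute("UPDATE buffer SET status = 'parked' WHERE message_id = ?", (message_id,))
    else:
        raise ValueError(f"unknown operation: {op}")
    db.commit()


class ActiveNode:
    def __init__(self, local: sqlite3.Connection, outbox: Deque[Op]):
        self.local = local
        self.outbox = outbox  # stand-in for the asynchronous remoting channel

    def record(self, op: str, message_id: str) -> None:
        apply_op(self.local, op, message_id)  # persist locally first
        self.outbox.append((op, message_id))  # forward without waiting for an ack


def drain(outbox: Deque[Op], standby: sqlite3.Connection, limit: Optional[int] = None) -> int:
    """Deliver up to `limit` queued operations to the standby (all of them if None)."""
    applied = 0
    while outbox and (limit is None or applied < limit):
        op, message_id = outbox.popleft()
        apply_op(standby, op, message_id)
        applied += 1
    return applied


def buffered_ids(db: sqlite3.Connection) -> List[str]:
    rows = db.execute("SELECT message_id FROM buffer ORDER BY message_id")
    return [row[0] for row in rows]
```

If the active node fails while operations are still sitting in the outbox, the standby's copy diverges in exactly the two ways listed above: an unreplicated remove leaves a message that will be delivered again, and an unreplicated add leaves a message the standby never knew about.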
Parked Message Management
- Parked messages remain stored at the site in SQLite.
- The central UI can query sites for parked messages via the Communication Layer.
- Operators can:
  - Retry a parked message (moves it back to the retry queue).
  - Discard a parked message (removes it permanently).
- Store-and-forward messages are not automatically cleared when an instance is deleted. Pending and parked messages continue to exist and can be managed via the central UI.
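
The three operator actions (list, retry, discard) map naturally onto simple SQLite statements. The sketch below is illustrative only: the `messages` table and function names are hypothetical, and resetting the retry count when a parked message is retried is an assumption, since the behaviour is not specified above.

```python
import sqlite3
from typing import List, Tuple


def open_site_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the site's SQLite buffer (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        " message_id TEXT PRIMARY KEY,"
        " target TEXT NOT NULL,"
        " retry_count INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return db


def list_parked(db: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Return (message_id, target, retry_count) for every parked message."""
    rows = db.execute(
        "SELECT message_id, target, retry_count FROM messages"
        " WHERE status = 'parked' ORDER BY message_id"
    )
    return rows.fetchall()


def retry_parked(db: sqlite3.Connection, message_id: str) -> bool:
    """Move a parked message back to the retry queue; False if it was not parked."""
    cur = db.execute(
        "UPDATE messages SET status = 'pending', retry_count = 0"  # reset is an assumption
        " WHERE message_id = ? AND status = 'parked'",
        (message_id,),
    )
    db.commit()
    return cur.rowcount == 1


def discard_parked(db: sqlite3.Connection, message_id: str) -> bool:
    """Permanently delete a parked message; False if it was not parked."""
    cur = db.execute(
        "DELETE FROM messages WHERE message_id = ? AND status = 'parked'",
        (message_id,),
    )
    db.commit()
    return cur.rowcount == 1
```

Restricting both updates to rows with `status = 'parked'` keeps an operator command from interfering with a message that is still in the retry loop.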
Message Format
Each buffered message stores:
- Message ID: Unique identifier.
- Category: External system call, notification, or cached database write.
- Target: External system name, notification list name, or database connection name.
- Payload: Serialized message content (API method + parameters, email subject + body, SQL + parameters).
- Retry Count: Number of attempts so far.
- Created At: Timestamp when the message was first queued.
- Last Attempt At: Timestamp of the most recent delivery attempt.
- Status: Pending, retrying, or parked.
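
One possible way to represent these fields is a record type backed by a single SQLite table. The column names, the category and status string values, and the use of ISO 8601 text for timestamps are all assumptions made for this sketch; the fields themselves come from the list above.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS buffered_messages (
    message_id      TEXT PRIMARY KEY,
    category        TEXT NOT NULL CHECK (category IN
                        ('external_system_call', 'notification', 'cached_db_write')),
    target          TEXT NOT NULL,
    payload         TEXT NOT NULL,
    retry_count     INTEGER NOT NULL DEFAULT 0,
    created_at      TEXT NOT NULL,
    last_attempt_at TEXT,
    status          TEXT NOT NULL CHECK (status IN ('pending', 'retrying', 'parked'))
)
"""

COLUMNS = ("message_id, category, target, payload, "
           "retry_count, created_at, last_attempt_at, status")


@dataclass
class BufferedMessage:
    message_id: str
    category: str
    target: str
    payload: str                    # serialized content, e.g. JSON
    retry_count: int
    created_at: str                 # ISO 8601 timestamp (assumption)
    last_attempt_at: Optional[str]  # None until the first delivery attempt
    status: str


def save(db: sqlite3.Connection, msg: BufferedMessage) -> None:
    """Insert the message, or overwrite the existing row with the same ID."""
    db.execute(
        f"INSERT OR REPLACE INTO buffered_messages ({COLUMNS}) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        astuple(msg),
    )
    db.commit()


def load(db: sqlite3.Connection, message_id: str) -> Optional[BufferedMessage]:
    row = db.execute(
        f"SELECT {COLUMNS} FROM buffered_messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return BufferedMessage(*row) if row else None
```

The `CHECK` constraints reject rows whose category or status falls outside the values listed above, which catches serialization mistakes at write time.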
Dependencies
- SQLite: Local persistence on each node.
- Communication Layer: Application-level replication to standby node; remote query handling from central.
- External System Gateway: Delivers external system API calls.
- Notification Service: Delivers email notifications.
- Database Connections: Delivers cached database writes.
- Site Event Logging: Logs store-and-forward activity (queued, delivered, retried, parked).
Interactions
- Site Runtime (Script Actors): Scripts submit messages to the buffer (external calls, notifications, cached DB writes).
- Communication Layer: Handles parked message queries/commands from central.
- Health Monitoring: Reports buffer depth metrics.