Refine Communication Layer: timeouts, transport config, ordering, failure behavior
Add per-pattern message timeouts with sensible defaults (120s for deployments, 30s for queries/commands). Configure the Akka.NET transport heartbeat explicitly rather than relying on framework defaults. Document the per-site message ordering guarantee. Specify that in-flight messages on disconnect result in a timeout error (no central buffering) and that debug streams die on any disconnect.
@@ -0,0 +1,47 @@
# Communication Layer Refinement — Design
**Date**: 2026-03-16
**Component**: Central–Site Communication (`Component-Communication.md`)
**Status**: Approved

## Problem

The Communication Layer document defined its 8 message patterns clearly but did not specify timeouts, transport configuration, reconnection behavior, message ordering guarantees, or connection failure handling.
## Decisions

### Message Timeouts

- **Per-pattern timeouts with sensible defaults**, overridable in configuration.
  - Deployment and system-wide artifacts: 120 seconds (script compilation can be slow).
  - Lifecycle commands, integration routing, recipe/command delivery, remote queries: 30 seconds.
- Requests use the Akka.NET ask pattern; a timeout is reported to the caller as a failure.
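A minimal sketch of how the per-pattern defaults and configuration overrides could be resolved, and how a timeout surfaces to the caller. It is written in Python with `asyncio` purely as a language-neutral illustration and is not Akka.NET code. The pattern keys, `resolve_timeout`, and `ask` are hypothetical names; only the 120-second and 30-second values come from the decisions above.

```python
import asyncio

# Defaults in seconds, taken from the decisions above.
# The pattern keys are hypothetical labels, not names from the real config.
DEFAULT_TIMEOUTS = {
    "deployment": 120.0,
    "system_wide_artifact": 120.0,
    "lifecycle_command": 30.0,
    "integration_routing": 30.0,
    "recipe_command_delivery": 30.0,
    "remote_query": 30.0,
}


def resolve_timeout(pattern, overrides=None):
    """Return the configured override for a pattern, else its default."""
    if pattern not in DEFAULT_TIMEOUTS:
        raise KeyError(f"unknown message pattern: {pattern}")
    return (overrides or {}).get(pattern, DEFAULT_TIMEOUTS[pattern])


async def ask(send, message, pattern, overrides=None):
    """Send a request and await the reply within the pattern's timeout.

    `send` is any coroutine function that delivers the message and returns
    the reply. A missed deadline raises asyncio.TimeoutError to the caller;
    nothing is retried or buffered here.
    """
    timeout = resolve_timeout(pattern, overrides)
    return await asyncio.wait_for(send(message), timeout)
```

Keeping the lookup separate from the send path means an override changes only the deadline, not the failure semantics: the caller always sees either a reply or a timeout.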
### Transport Configuration

- **Akka.NET built-in reconnection** with an explicitly configured transport heartbeat interval and failure detection threshold.
- No custom reconnection logic; the framework handles it.
- Settings are documented explicitly rather than left to framework defaults, for predictability in a SCADA context.
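To illustrate why explicit values matter: once the heartbeat interval and the tolerated silence are both known, the worst-case time to notice a dead link can be computed rather than guessed. The sketch below models a simple deadline-style detector in Python. It is an illustration under stated assumptions, not Akka.NET's detector and not its configuration format; `HeartbeatSettings`, `DeadlineDetector`, and the example values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    heartbeat_interval_s: float  # how often a heartbeat is expected
    acceptable_pause_s: float    # extra silence tolerated before failing


class DeadlineDetector:
    """Marks the link unavailable once the heartbeat deadline has passed."""

    def __init__(self, settings, now_s):
        self.settings = settings
        self.last_heartbeat_s = now_s

    def heartbeat(self, now_s):
        # Record the arrival time of the latest heartbeat.
        self.last_heartbeat_s = now_s

    def worst_case_detection_s(self):
        # Longest silence that is still treated as a healthy link.
        s = self.settings
        return s.heartbeat_interval_s + s.acceptable_pause_s

    def is_available(self, now_s):
        return now_s - self.last_heartbeat_s <= self.worst_case_detection_s()
```

With a hypothetical 5-second interval and 10-second pause, a silent link is declared failed just after 15 seconds, a number that can be written down next to the message timeouts and compared against them.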
### Connection Failure Behavior

- **In-flight messages get a timeout error**: the caller retries manually. Central does no buffering, consistent with the existing design principle.
- Automatic retry was rejected because of the risk of duplicate processing (e.g., the site may have applied a deployment before the connection dropped).
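The duplicate-processing risk can be shown with a toy model: the site applies the deployment, the connection drops before the reply arrives, and the caller sees only a timeout. The Python sketch below is an illustration, not the system's code; `Site` and `send_once` are hypothetical names.

```python
class Site:
    """Toy site that records every deployment it applies."""

    def __init__(self):
        self.applied = []

    def apply_deployment(self, deployment_id):
        self.applied.append(deployment_id)


def send_once(site, deployment_id, connection_drops_before_reply):
    """Make exactly one delivery attempt and report what the caller sees.

    The site-side effect happens before the reply is sent, so a dropped
    connection leaves the caller with "timeout" even though the deployment
    was applied.
    """
    site.apply_deployment(deployment_id)
    if connection_drops_before_reply:
        return "timeout"
    return "ok"


site = Site()
first = send_once(site, "deploy-1", connection_drops_before_reply=True)
# A blind automatic retry at this point would apply "deploy-1" a second time.
retry = send_once(site, "deploy-1", connection_drops_before_reply=False)
```

After the retry, `site.applied` holds `"deploy-1"` twice, which is the outcome the decision avoids by leaving the retry to a person who can first check the site's state.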
### Message Ordering

- **Per-site ordering is guaranteed**, relying on Akka.NET's built-in ordering per sender–receiver pair. No custom sequencing logic is needed.
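What the guarantee does and does not promise can be modeled as one FIFO queue per site: messages to the same site arrive in send order, while nothing is promised about the relative order of messages to different sites. This Python sketch is a model of the guarantee only, not of how Akka.NET implements it; `Central`, `send`, and `deliver_next` are hypothetical names.

```python
from collections import defaultdict, deque


class Central:
    """Toy sender that keeps one FIFO queue per destination site."""

    def __init__(self):
        self._outbound = defaultdict(deque)

    def send(self, site_id, message):
        # Messages for one site are queued in the order they were sent.
        self._outbound[site_id].append(message)

    def deliver_next(self, site_id):
        # Deliver the oldest pending message for this site, if any.
        queue = self._outbound[site_id]
        return queue.popleft() if queue else None


central = Central()
central.send("site-a", "deploy v2")
central.send("site-b", "query status")
central.send("site-a", "enable v2")

site_a_order = [central.deliver_next("site-a"), central.deliver_next("site-a")]
```

Here `site_a_order` is `["deploy v2", "enable v2"]` regardless of when the `site-b` message is delivered, so a command that depends on an earlier one to the same site can rely on arriving after it.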
### Debug Stream Interruption

- **Stream dies on any disconnect** (failover or network blip). The engineer reopens the debug view manually.
- Auto-resume was rejected: it adds complexity for a transient diagnostic tool.
## Affected Documents

| Document | Change |
|----------|--------|
| `Component-Communication.md` | Added 4 new sections: Message Timeouts, Transport Configuration, Message Ordering, Connection Failure Behavior |
## Alternatives Considered

- **Global timeout for all patterns**: Rejected. Deployment involves compilation and needs more time than a simple query.
- **Default Akka.NET transport settings**: Rejected. Relying on undocumented defaults is risky for SCADA; explicit configuration ensures predictable behavior.
- **Automatic retry of in-flight messages**: Rejected. It risks duplicate processing and contradicts the no-buffering-at-central principle.
- **No ordering guarantee**: Rejected. Akka.NET provides this for free, and the design already relies on it implicitly.
- **Auto-resume debug streams on reconnection**: Rejected. It adds state-tracking complexity for a transient diagnostic feature.