# Communication Layer Refinement: Design

**Date**: 2026-03-16

**Component**: Central–Site Communication (`docs/requirements/Component-Communication.md`)

**Status**: Approved

## Problem

The Communication Layer doc defined 8 message patterns clearly but did not specify timeouts, transport configuration, reconnection behavior, message ordering guarantees, or connection failure handling.

## Decisions

### Message Timeouts

- **Per-pattern timeouts with sensible defaults**, overridable in configuration.
- Deployment and system-wide artifacts: 120 seconds (script compilation can be slow).
- Lifecycle commands, integration routing, recipe/command delivery, remote queries: 30 seconds.
- Requests use the Akka.NET ask pattern; a timeout is reported to the caller as a failure.

### Transport Configuration

- **Akka.NET built-in reconnection** with an explicitly configured transport heartbeat interval and failure detection threshold.
- No custom reconnection logic; the framework handles it.
- Settings are explicitly documented rather than left to framework defaults, for predictability in a SCADA context.

### Connection Failure Behavior

- **In-flight messages get a timeout error**; the caller retries manually. There is no buffering at central, consistent with the existing design principle.
- Automatic retry was rejected due to the risk of duplicate processing (e.g., the site may have applied a deployment before the connection dropped).

### Message Ordering

- **Per-site ordering guaranteed**, relying on Akka.NET's built-in per-sender/per-receiver ordering. No custom sequencing logic is needed.

### Debug Stream Interruption

- **Stream dies on any disconnect** (failover or network blip). The engineer reopens the debug view manually.
- Auto-resume was rejected because it adds complexity for a transient diagnostic tool.

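
As an illustration of the Message Timeouts decision, the sketch below shows one way per-pattern defaults with configuration overrides could be resolved. It is written in Python for brevity and is not the Akka.NET implementation. The `Pattern` enum, its key strings, the `resolve_timeout` function, and the shape of the `overrides` mapping are all hypothetical names chosen for this example. Treating recipe/command delivery as a single key is also an assumption. Only the 120-second and 30-second values come from the decisions above.

```python
from enum import Enum
from typing import Mapping, Optional


class Pattern(Enum):
    """Request/response patterns named in the timeout decision.

    The string values are hypothetical configuration keys.
    """
    DEPLOYMENT = "deployment"
    SYSTEM_WIDE_ARTIFACTS = "system-wide-artifacts"
    LIFECYCLE_COMMAND = "lifecycle-command"
    INTEGRATION_ROUTING = "integration-routing"
    RECIPE_COMMAND_DELIVERY = "recipe-command-delivery"
    REMOTE_QUERY = "remote-query"


# Defaults taken from the design decision: 120 s where script
# compilation may be involved, 30 s for everything else listed.
DEFAULT_TIMEOUT_SECONDS = {
    Pattern.DEPLOYMENT: 120.0,
    Pattern.SYSTEM_WIDE_ARTIFACTS: 120.0,
    Pattern.LIFECYCLE_COMMAND: 30.0,
    Pattern.INTEGRATION_ROUTING: 30.0,
    Pattern.RECIPE_COMMAND_DELIVERY: 30.0,
    Pattern.REMOTE_QUERY: 30.0,
}


def resolve_timeout(
    pattern: Pattern,
    overrides: Optional[Mapping[str, float]] = None,
) -> float:
    """Return the timeout in seconds for a pattern.

    A value in `overrides` (keyed by the pattern's string value)
    wins over the built-in default. Non-positive overrides are
    rejected so that a typo cannot disable the timeout.
    """
    if overrides is not None and pattern.value in overrides:
        seconds = float(overrides[pattern.value])
        if seconds <= 0:
            raise ValueError(
                f"timeout for {pattern.value} must be positive, got {seconds}"
            )
        return seconds
    return DEFAULT_TIMEOUT_SECONDS[pattern]
```

With no overrides, `resolve_timeout(Pattern.DEPLOYMENT)` yields the 120-second default. Passing `{"remote-query": 10}` shortens only the remote query timeout and leaves every other pattern at its default.
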
## Affected Documents

| Document | Change |
|----------|--------|
| `docs/requirements/Component-Communication.md` | Added 4 new sections: Message Timeouts, Transport Configuration, Message Ordering, Connection Failure Behavior |

## Alternatives Considered

- **Global timeout for all patterns**: Rejected. Deployment involves compilation and needs more time than a simple query.
- **Default Akka.NET transport settings**: Rejected. Relying on undocumented defaults is risky for SCADA; explicit configuration ensures predictable behavior.
- **Automatic retry of in-flight messages**: Rejected. It risks duplicate processing and contradicts the no-buffering-at-central principle.
- **No ordering guarantee**: Rejected. Akka.NET provides this for free, and the design already implicitly relies on it.
- **Auto-resume debug streams on reconnection**: Rejected. It adds state-tracking complexity for a transient diagnostic feature.
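
The chosen connection failure behavior (a timeout is surfaced to the caller, with no automatic retry) can be sketched as follows. This is a Python `asyncio` analogue of an ask-style request, not Akka.NET code. The `ask` helper, the `AskTimedOut` exception, and the `send` callable are hypothetical names introduced for this example. The point it illustrates is that the message is sent exactly once, so a site that already applied a deployment before the connection dropped is never asked to apply it again behind the caller's back.

```python
import asyncio
from typing import Any, Awaitable, Callable


class AskTimedOut(Exception):
    """Raised to the caller when no reply arrives in time (hypothetical name)."""


async def ask(
    send: Callable[[Any], Awaitable[Any]],
    message: Any,
    timeout_seconds: float,
) -> Any:
    """Send `message` once and wait for the reply.

    On timeout the failure goes straight to the caller. There is
    deliberately no retry loop and no buffering here: whether to
    resend is a manual decision made by the caller.
    """
    try:
        return await asyncio.wait_for(send(message), timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        raise AskTimedOut(
            f"no reply within {timeout_seconds} s for {message!r}"
        ) from exc
```

A caller would wrap `await ask(...)` in a `try`/`except AskTimedOut` block and report the failure to the operator, who then decides whether to resend.
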