# Communication Layer Refinement — Design
**Date**: 2026-03-16
**Component**: Central-Site Communication (`docs/requirements/Component-Communication.md`)
**Status**: Approved
## Problem
The Communication Layer doc clearly defined 8 message patterns but did not specify timeouts, transport configuration, reconnection behavior, message ordering guarantees, or connection failure handling.
## Decisions
### Message Timeouts
- **Per-pattern timeouts with sensible defaults**, overridable in configuration.
- Deployment and system-wide artifacts: 120 seconds (script compilation can be slow).
- Lifecycle commands, integration routing, recipe/command delivery, remote queries: 30 seconds.
- Requests use the Akka.NET ask pattern; if no reply arrives before the timeout elapses, the ask fails and the failure is returned to the caller.
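
The timeout rules above can be sketched in a few lines. This is an illustrative Python model rather than the Akka.NET implementation: the pattern keys, `resolve_timeout`, and `ask` are hypothetical names, and only the six patterns named in this section are included.

```python
import asyncio

# Default timeouts in seconds, taken from the decisions above.
# The pattern keys are hypothetical names, not identifiers from the codebase.
DEFAULT_TIMEOUTS = {
    "deployment": 120.0,
    "system_wide_artifacts": 120.0,
    "lifecycle_command": 30.0,
    "integration_routing": 30.0,
    "recipe_command_delivery": 30.0,
    "remote_query": 30.0,
}


def resolve_timeout(pattern, overrides=None):
    """Return the configured override for a pattern, else its default."""
    merged = {**DEFAULT_TIMEOUTS, **(overrides or {})}
    if pattern not in merged:
        raise KeyError(f"unknown message pattern: {pattern}")
    return merged[pattern]


async def ask(send, message, pattern, overrides=None):
    """Await a reply; raise TimeoutError to the caller if none arrives in time."""
    timeout = resolve_timeout(pattern, overrides)
    try:
        return await asyncio.wait_for(send(message), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"{pattern} timed out after {timeout}s") from None
```

Here `send` stands in for whatever delivers the message and awaits the reply. The sketch neither retries nor buffers, so a timeout surfaces directly as an error to the caller.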
### Transport Configuration
- **Akka.NET built-in reconnection** with explicitly configured transport heartbeat interval and failure detection threshold.
- No custom reconnection logic — framework handles it.
- Settings explicitly documented rather than relying on framework defaults, for predictability in a SCADA context.
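
To illustrate why explicit values matter, the following is a simplified Python model of heartbeat-based failure detection. It is a sketch under stated assumptions, not Akka.NET's actual detector: the class names, the default values, and the interpretation of the threshold as a count of missed heartbeats are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportSettings:
    # Hypothetical values; the design requires only that they be set explicitly.
    heartbeat_interval_s: float = 5.0
    missed_heartbeats_threshold: int = 3


class HeartbeatMonitor:
    """Treats a peer as unreachable once it has been silent for too long."""

    def __init__(self, settings, now):
        self.settings = settings
        self.last_heartbeat = now

    def record_heartbeat(self, now):
        self.last_heartbeat = now

    def is_reachable(self, now):
        # Longest silence tolerated before the peer is considered down.
        allowed_silence = (
            self.settings.heartbeat_interval_s
            * self.settings.missed_heartbeats_threshold
        )
        return (now - self.last_heartbeat) <= allowed_silence
```

With both values written down, the worst-case detection delay can be read straight from the configuration (interval times threshold in this model) instead of depending on whatever the framework ships as defaults.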
### Connection Failure Behavior
- **In-flight messages get a timeout error** — caller retries manually. No buffering at central. Consistent with existing design principle.
- Automatic retry rejected due to risk of duplicate processing (e.g., site may have applied a deployment before the connection dropped).
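
The duplicate-processing risk can be shown with a toy model. The Python sketch below is illustrative only; `Site`, `send_with_auto_retry`, and `send_once` are hypothetical names, and the dropped connection is simulated by raising `TimeoutError` after the work has already been applied.

```python
class Site:
    """Toy site that records every deployment it applies."""

    def __init__(self):
        self.applied = []

    def deploy(self, artifact, drop_reply=False):
        self.applied.append(artifact)  # work happens before the reply is sent
        if drop_reply:
            raise TimeoutError("connection dropped before the reply arrived")
        return "ok"


def send_with_auto_retry(site, artifact, drop_first_reply):
    """The rejected approach: resend blindly after a timeout."""
    try:
        return site.deploy(artifact, drop_reply=drop_first_reply)
    except TimeoutError:
        return site.deploy(artifact)


def send_once(site, artifact, drop_reply):
    """The chosen approach: surface the timeout and let the caller decide."""
    return site.deploy(artifact, drop_reply=drop_reply)
```

In this model the blind retry applies the same artifact twice when only the reply was lost, whereas the single send leaves exactly one application and hands the decision back to the caller.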
### Message Ordering
- **Per-site ordering guaranteed.** The design relies on Akka.NET's built-in message ordering guarantee for each sender-receiver pair, so no custom sequencing logic is needed.
### Debug Stream Interruption
- **Stream dies on any disconnect** (failover or network blip). Engineer reopens the debug view manually.
- Auto-resume rejected — adds complexity for a transient diagnostic tool.
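
A minimal Python sketch of the chosen behavior follows, assuming a hypothetical `DebugStream` class and a `"DISCONNECT"` marker that stands in for any failover or network interruption. It is a model of the decision, not the actual streaming code.

```python
class DebugStream:
    """Toy debug stream that closes permanently on the first disconnect."""

    def __init__(self, events):
        self._events = iter(events)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        event = next(self._events, None)
        if event is None or event == "DISCONNECT":
            self.closed = True  # no resume: the stream holds no replay state
            raise StopIteration
        return event
```

Because a closed stream never yields again, resuming means constructing a new `DebugStream`, which mirrors the engineer reopening the debug view by hand.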
## Affected Documents
| Document | Change |
|----------|--------|
| `docs/requirements/Component-Communication.md` | Added 4 new sections: Message Timeouts, Transport Configuration, Message Ordering, Connection Failure Behavior |
## Alternatives Considered
- **Global timeout for all patterns**: Rejected — deployment involves compilation and needs more time than a simple query.
- **Default Akka.NET transport settings**: Rejected — relying on undocumented defaults is risky for SCADA; explicit configuration ensures predictable behavior.
- **Automatic retry of in-flight messages**: Rejected — risks duplicate processing and contradicts the no-buffering-at-central principle.
- **No ordering guarantee**: Rejected — Akka.NET provides this for free; the design already implicitly relies on it.
- **Auto-resume debug streams on reconnection**: Rejected — adds state tracking complexity for a transient diagnostic feature.