scadalink-design/docs/plans/2026-03-16-data-connection-layer-refinement-design.md
Joseph Doherty d91aa83665 refactor(docs): move requirements and test infra docs into docs/ subdirectories
Organize documentation by moving requirements (HighLevelReqs, Component-*,
lmxproxy_protocol) to docs/requirements/ and test infrastructure docs to
docs/test_infra/. Updates all cross-references in README, CLAUDE.md,
infra/README, component docs, and 23 plan files.
2026-03-21 01:11:35 -04:00


# Data Connection Layer Refinement — Design
**Date**: 2026-03-16
**Component**: Data Connection Layer (`docs/requirements/Component-DataConnectionLayer.md`)
**Status**: Approved
## Problem
The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but lacked specification for error handling, reconnection behavior, write failures, tag path resolution, and health reporting granularity.
## Decisions
### Connection Lifecycle & Reconnection
- **Auto-reconnect with fixed interval** per data connection, consistent with the system's fixed-interval retry philosophy.
- **Immediate bad quality** on disconnect — all tags on the affected connection are pushed with quality `bad` as soon as the connection drops.
- **Transparent re-subscribe** on reconnection — the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to `good` as values resume.
- Connection state transitions (`connected` / `disconnected` / `reconnecting`) are logged to Site Event Logging.
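
The lifecycle decisions above can be illustrated with a minimal Python sketch. It is not the DCL implementation: `DataConnection`, `on_disconnect`, `try_reconnect`, `next_retry_delay`, and the 5-second default interval are hypothetical names and values chosen for illustration, and the in-memory `event_log` list stands in for Site Event Logging.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class ConnectionState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class DataConnection:
    """Hypothetical model of one data connection (illustration only)."""
    name: str
    reconnect_interval_s: float = 5.0  # assumed value; the interval is fixed
    state: ConnectionState = ConnectionState.CONNECTED
    subscriptions: List[str] = field(default_factory=list)
    quality: Dict[str, str] = field(default_factory=dict)
    event_log: List[str] = field(default_factory=list)  # stand-in for Site Event Logging

    def _transition(self, new_state: ConnectionState) -> None:
        # Every state transition is logged.
        self.event_log.append(f"{self.name}: {self.state.value} -> {new_state.value}")
        self.state = new_state

    def subscribe(self, tag_path: str) -> None:
        self.subscriptions.append(tag_path)
        self.quality[tag_path] = "good"

    def on_disconnect(self) -> None:
        # Immediate bad quality: no grace period before marking tags bad.
        self._transition(ConnectionState.DISCONNECTED)
        for tag_path in self.subscriptions:
            self.quality[tag_path] = "bad"

    def next_retry_delay(self, attempt: int) -> float:
        # Fixed interval: the delay does not depend on the attempt number.
        return self.reconnect_interval_s

    def try_reconnect(self, connect: Callable[[], bool]) -> bool:
        self._transition(ConnectionState.RECONNECTING)
        if not connect():
            self._transition(ConnectionState.DISCONNECTED)
            return False
        self._transition(ConnectionState.CONNECTED)
        # Transparent re-subscribe: the prior subscription list is reused as-is.
        # Simplification: quality is set to good here, whereas a real driver
        # would report good only once values resume.
        for tag_path in self.subscriptions:
            self.quality[tag_path] = "good"
        return True
```

A caller would schedule `try_reconnect` every `next_retry_delay(attempt)` seconds until it returns `True`. Subscribers never appear in this flow, which mirrors the decision that Instance Actors take no action on reconnection.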
### Write Failure Handling
- Writes are **synchronous** from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
- **No store-and-forward for device writes** — buffering stale setpoints is dangerous for industrial control.
- Write failures are also logged to Site Event Logging.
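
As a rough sketch of the synchronous write contract, the following Python shows one way a write call could surface each failure mode to the caller without buffering. `write_tag`, `WriteResult`, and `DeviceRejectedError` are hypothetical names, not part of the DCL interface, and the `event_log` list stands in for Site Event Logging.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


class DeviceRejectedError(Exception):
    """Hypothetical error raised when the device refuses a write."""


@dataclass(frozen=True)
class WriteResult:
    ok: bool
    error: Optional[str] = None


def write_tag(
    is_connected: bool,
    send: Callable[[str, Any], None],
    tag_path: str,
    value: Any,
    event_log: List[str],
) -> WriteResult:
    """Attempt one write and report the outcome to the caller immediately."""
    if not is_connected:
        # No store-and-forward: the value is dropped, never queued for later.
        result = WriteResult(False, "connection down")
    else:
        try:
            send(tag_path, value)
            result = WriteResult(True)
        except TimeoutError:
            result = WriteResult(False, "timeout")
        except DeviceRejectedError as exc:
            result = WriteResult(False, f"device rejection: {exc}")
    if not result.ok:
        # Failures are logged in addition to being returned to the caller.
        event_log.append(f"write to {tag_path} failed: {result.error}")
    return result
```

Returning a result object rather than raising is a choice made for this sketch only; the design requires that the calling script learns of the failure, not any particular mechanism.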
### Tag Path Resolution
- Unresolvable tag paths are marked with quality `bad` and logged.
- **Periodic retry** at a configurable interval to accommodate devices that start in stages.
- On successful resolution, the subscription activates normally.
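
The resolution behavior above might look like the following Python sketch. `TagResolver`, `retry_unresolved`, and the 30-second default are assumptions made for illustration; the `resolve` callback stands in for whatever lookup the underlying protocol provides.

```python
from typing import Callable, Dict, List, Set


class TagResolver:
    """Hypothetical tracker for resolved and unresolved tag paths."""

    def __init__(self, resolve: Callable[[str], bool], retry_interval_s: float = 30.0):
        self._resolve = resolve
        self.retry_interval_s = retry_interval_s  # configurable; default is assumed
        self.unresolved: Set[str] = set()
        self.active: Set[str] = set()
        self.quality: Dict[str, str] = {}
        self.log: List[str] = []

    def subscribe(self, tag_path: str) -> None:
        if self._resolve(tag_path):
            self._activate(tag_path)
        else:
            # Unresolvable paths are marked bad and logged, then retried later.
            self.unresolved.add(tag_path)
            self.quality[tag_path] = "bad"
            self.log.append(f"unresolved tag path: {tag_path}")

    def retry_unresolved(self) -> int:
        """Intended to run on a timer every retry_interval_s seconds."""
        resolved_now = [t for t in sorted(self.unresolved) if self._resolve(t)]
        for tag_path in resolved_now:
            self.unresolved.discard(tag_path)
            self._activate(tag_path)
        return len(resolved_now)

    def _activate(self, tag_path: str) -> None:
        # Simplification: a real subscription would report good quality
        # only after the first value arrives.
        self.active.add(tag_path)
        self.quality[tag_path] = "good"
```

For a device that starts in stages, a tag that fails on `subscribe` stays in `unresolved` until a later `retry_unresolved` pass finds it, at which point it joins `active` like any other subscription.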
### Health Reporting
- Per-connection status: `connected` / `disconnected` / `reconnecting`.
- Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
- Both are reported via the existing Health Monitoring heartbeat.
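
One possible shape for the per-connection health data is sketched below in Python. The field names, the `heartbeat_section` helper, and the dictionary layout are assumptions; the actual heartbeat schema is defined by Health Monitoring and is not specified here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List

VALID_STATUSES = {"connected", "disconnected", "reconnecting"}


@dataclass(frozen=True)
class ConnectionHealth:
    """Hypothetical per-connection health record."""
    name: str
    status: str
    subscribed_tags: int  # total subscribed tags
    resolved_tags: int    # tags whose paths resolved successfully

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown connection status: {self.status}")


def heartbeat_section(connections: List[ConnectionHealth]) -> Dict[str, dict]:
    """Build the data-connection part of a heartbeat, keyed by connection name."""
    return {
        c.name: {k: v for k, v in asdict(c).items() if k != "name"}
        for c in connections
    }
```

A gap between `subscribed_tags` and `resolved_tags` on a `connected` connection would point at tag path problems rather than connectivity problems, which is the granularity this decision is after.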
### Subscription Model
- **No deduplication** — each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.
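
To make the no-deduplication choice concrete, here is a minimal Python sketch of a registry that hands out one subscription per request. `SubscriptionRegistry` and its methods are hypothetical names, and the actor identifiers are plain strings for simplicity.

```python
from itertools import count
from typing import Dict, List, Tuple


class SubscriptionRegistry:
    """Hypothetical registry: one entry per subscribe call, no reference counting."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._subs: Dict[int, Tuple[str, str]] = {}  # sub_id -> (actor_id, tag_path)

    def subscribe(self, actor_id: str, tag_path: str) -> int:
        # Always creates a new subscription, even for an already-subscribed path.
        sub_id = next(self._ids)
        self._subs[sub_id] = (actor_id, tag_path)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        # Removing one subscription never affects another actor's subscription.
        self._subs.pop(sub_id, None)

    def subscribers_of(self, tag_path: str) -> List[str]:
        return [actor for actor, path in self._subs.values() if path == tag_path]
```

Because each subscription is independent, unsubscribing is a single dictionary removal with no shared state to reconcile, which is the simplicity the decision trades against some duplicate protocol-level subscriptions.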
## Affected Documents
| Document | Change |
|----------|--------|
| `docs/requirements/Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `docs/requirements/Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `docs/requirements/Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |
## Alternatives Considered
- **Exponential backoff for reconnection**: Rejected — fixed interval is simpler and consistent with the rest of the system.
- **Grace period before marking quality as bad**: Rejected — in SCADA, immediate staleness indication is safer.
- **Instance Actor-driven re-subscription**: Rejected — adds complexity to Instance Actors for no benefit.
- **Fire-and-forget writes**: Rejected — script authors need to know when device writes fail.
- **Subscription deduplication in DCL**: Rejected — adds reference-counting complexity for minimal gain at expected scale.