# Data Connection Layer Refinement — Design

**Date**: 2026-03-16
**Component**: Data Connection Layer (`docs/requirements/Component-DataConnectionLayer.md`)
**Status**: Approved

## Problem

The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but lacked specification for error handling, reconnection behavior, write failures, tag path resolution, and health reporting granularity.

## Decisions

### Connection Lifecycle & Reconnection

- **Auto-reconnect with fixed interval** per data connection, consistent with the system's fixed-interval retry philosophy.
- **Immediate bad quality** on disconnect: all tags on the affected connection are pushed with quality `bad` as soon as the connection drops.
- **Transparent re-subscribe** on reconnection: the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to `good` as values resume.
- Connection state transitions (`connected` / `disconnected` / `reconnecting`) are logged to Site Event Logging.
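
The lifecycle rules above can be sketched as a small state holder. This is an illustration under stated assumptions, not the DCL's actual API: the class `DataConnection`, its method names, the 5-second default interval, and the in-memory `event_log` (standing in for Site Event Logging) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical callback shape: (tag_path, value, quality) -> None
TagCallback = Callable[[str, object, str], None]


@dataclass
class DataConnection:
    """Illustrative connection wrapper; names are assumptions, not the real DCL."""
    name: str
    reconnect_interval_s: float = 5.0  # assumed default; the design only says "fixed"
    state: str = "disconnected"
    subscriptions: List[Tuple[str, TagCallback]] = field(default_factory=list)
    event_log: List[Tuple[str, str]] = field(default_factory=list)

    def _set_state(self, new_state: str) -> None:
        # Record every transition (stand-in for Site Event Logging).
        self.event_log.append((self.state, new_state))
        self.state = new_state

    def subscribe(self, tag_path: str, callback: TagCallback) -> None:
        self.subscriptions.append((tag_path, callback))

    def next_retry_delay(self, attempt: int) -> float:
        # Fixed interval: the delay ignores the attempt number (no backoff).
        return self.reconnect_interval_s

    def on_connection_lost(self) -> None:
        self._set_state("disconnected")
        # Push bad quality for every tag immediately, with no grace period.
        for tag_path, callback in self.subscriptions:
            callback(tag_path, None, "bad")
        self._set_state("reconnecting")

    def on_connected(self, resubscribe: Callable[[str], None]) -> None:
        self._set_state("connected")
        # Re-establish every prior subscription; subscribers take no action.
        for tag_path, _ in self.subscriptions:
            resubscribe(tag_path)
```

In this sketch a subscriber only ever sees quality changes through its callback, which mirrors the requirement that Instance Actors take no action during reconnection.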

### Write Failure Handling

- Writes are **synchronous** from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
- **No store-and-forward for device writes**: buffering stale setpoints is dangerous for industrial control.
- Write failures are also logged to Site Event Logging.
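
A minimal sketch of the synchronous write path described above follows. The names `WriteGateway`, `WriteResult`, `DeviceRejectedError`, and the `device_write` callable are hypothetical stand-ins for a protocol adapter; the list-based `event_log` stands in for Site Event Logging.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class DeviceRejectedError(Exception):
    """Hypothetical error raised when the device refuses a write."""


@dataclass(frozen=True)
class WriteResult:
    ok: bool
    error: Optional[str] = None


class WriteGateway:
    """Illustrative write path; `device_write` is an assumed adapter function."""

    def __init__(self, device_write: Callable[[str, object], None]) -> None:
        self._device_write = device_write
        self.connected = True
        self.event_log: List[str] = []

    def write(self, tag_path: str, value: object) -> WriteResult:
        # Connection down: fail immediately instead of queueing the value,
        # because a buffered setpoint could be stale when it is finally sent.
        if not self.connected:
            return self._fail(tag_path, "connection down")
        try:
            self._device_write(tag_path, value)
        except TimeoutError:
            return self._fail(tag_path, "timeout")
        except DeviceRejectedError as exc:
            return self._fail(tag_path, f"device rejected write: {exc}")
        return WriteResult(ok=True)

    def _fail(self, tag_path: str, reason: str) -> WriteResult:
        # Failures are logged and also returned to the caller.
        self.event_log.append(f"write to {tag_path} failed: {reason}")
        return WriteResult(ok=False, error=reason)
```

A calling script would check `result.ok` right after `write(...)` returns, so the failure is visible at the point where the write was attempted.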

### Tag Path Resolution

- Unresolvable tag paths are marked with quality `bad` and logged.
- **Periodic retry** at a configurable interval to accommodate devices that start in stages.
- On successful resolution, the subscription activates normally.
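
The resolution rules above might look roughly like this. `TagResolver`, the `resolve` callable, and the 30-second default are assumptions made for illustration; the design only states that the retry interval is configurable.

```python
from typing import Callable, Dict, List, Set


class TagResolver:
    """Illustrative resolver; `resolve` stands in for a protocol-level lookup."""

    def __init__(self, resolve: Callable[[str], bool],
                 retry_interval_s: float = 30.0) -> None:
        self._resolve = resolve
        self.retry_interval_s = retry_interval_s  # assumed default value
        self.quality: Dict[str, str] = {}   # only holds "bad" markers here
        self.unresolved: Set[str] = set()
        self.active: Set[str] = set()
        self.event_log: List[str] = []

    def add(self, tag_path: str) -> None:
        if self._resolve(tag_path):
            self._activate(tag_path)
            return
        # Unresolvable: mark bad, log once, and leave it for the retry pass.
        self.unresolved.add(tag_path)
        self.quality[tag_path] = "bad"
        self.event_log.append(f"unresolved tag path: {tag_path}")

    def retry_unresolved(self) -> None:
        # Meant to be driven by a timer every `retry_interval_s` seconds.
        # sorted() returns a copy, so the set can be modified while looping.
        for tag_path in sorted(self.unresolved):
            if self._resolve(tag_path):
                self.unresolved.discard(tag_path)
                self._activate(tag_path)

    def _activate(self, tag_path: str) -> None:
        # Once active, quality is driven by incoming values, so the
        # resolver's own "bad" marker is dropped.
        self.quality.pop(tag_path, None)
        self.active.add(tag_path)
```

A device that brings its tags online in stages is handled by the retry pass alone: a path that fails on `add` becomes active on the first retry after the device exposes it.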

### Health Reporting

- Per-connection status: `connected` / `disconnected` / `reconnecting`.
- Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
- Both are reported via the existing Health Monitoring heartbeat.
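
As a rough illustration of the per-connection health data, the sketch below builds a heartbeat fragment. The field names and the dictionary layout are assumptions; the actual heartbeat format is defined by Health Monitoring, not here.

```python
from dataclasses import dataclass
from typing import Dict, List

VALID_STATUSES = {"connected", "disconnected", "reconnecting"}


@dataclass(frozen=True)
class ConnectionHealth:
    """Hypothetical per-connection health record."""
    name: str
    status: str
    tags_subscribed: int
    tags_resolved: int


def build_heartbeat_section(connections: List[ConnectionHealth]) -> Dict[str, dict]:
    """Return one entry per connection, keyed by connection name."""
    section: Dict[str, dict] = {}
    for c in connections:
        if c.status not in VALID_STATUSES:
            raise ValueError(f"unknown status for {c.name}: {c.status}")
        # Resolved tags can never exceed subscribed tags.
        if not 0 <= c.tags_resolved <= c.tags_subscribed:
            raise ValueError(f"inconsistent tag counts for {c.name}")
        section[c.name] = {
            "status": c.status,
            "tags_subscribed": c.tags_subscribed,
            "tags_resolved": c.tags_resolved,
        }
    return section
```

Reporting both counts lets an operator distinguish a healthy connection with missing tags (`connected`, resolved below subscribed) from a connection that is down.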

### Subscription Model

- **No deduplication** — each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.
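
To show what the absence of deduplication means in practice, here is a small sketch of a subscription table. `SubscriptionTable`, the integer subscription ids, and the actor id strings are hypothetical; the point is only that no reference counting is needed.

```python
from itertools import count
from typing import Dict, List, Tuple


class SubscriptionTable:
    """Illustrative table: one entry per subscribe call, never shared."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._subs: Dict[int, Tuple[str, str]] = {}  # id -> (actor_id, tag_path)

    def subscribe(self, actor_id: str, tag_path: str) -> int:
        # Always create a new subscription, even if the tag path is
        # already subscribed by another actor.
        sub_id = next(self._ids)
        self._subs[sub_id] = (actor_id, tag_path)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        # Removing one subscription never affects another actor's.
        self._subs.pop(sub_id, None)

    def subscriptions_for(self, tag_path: str) -> List[int]:
        return [sid for sid, (_, path) in self._subs.items() if path == tag_path]
```

Two actors subscribing to the same path get two ids, and dropping one leaves the other untouched, which is the simplicity the decision trades a few duplicate protocol-level subscriptions for.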

## Affected Documents

| Document | Change |
|----------|--------|
| `docs/requirements/Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `docs/requirements/Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `docs/requirements/Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |

## Alternatives Considered

- **Exponential backoff for reconnection**: Rejected because a fixed interval is simpler and consistent with the rest of the system.
- **Grace period before marking quality as bad**: Rejected because, in SCADA, immediate staleness indication is safer.
- **Instance Actor-driven re-subscription**: Rejected because it adds complexity to Instance Actors for no benefit.
- **Fire-and-forget writes**: Rejected because script authors need to know when device writes fail.
- **Subscription deduplication in DCL**: Rejected because it adds reference-counting complexity for minimal gain at the expected scale.