# Data Connection Layer Refinement: Design

**Date**: 2026-03-16
**Component**: Data Connection Layer (`docs/requirements/Component-DataConnectionLayer.md`)
**Status**: Approved

## Problem

The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but lacked specification for error handling, reconnection behavior, write failures, tag path resolution, and health reporting granularity.

## Decisions

### Connection Lifecycle & Reconnection

- **Auto-reconnect with fixed interval** per data connection, consistent with the system's fixed-interval retry philosophy.
- **Immediate bad quality** on disconnect: all tags on the affected connection are pushed with quality `bad` as soon as the connection drops.
- **Transparent re-subscribe** on reconnection: the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to `good` as values resume.
- Connection state transitions (`connected` / `disconnected` / `reconnecting`) logged to Site Event Logging.

### Write Failure Handling

- Writes are **synchronous** from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
- **No store-and-forward for device writes**: buffering stale setpoints is dangerous for industrial control.
- Write failures also logged to Site Event Logging.

### Tag Path Resolution

- Unresolvable tag paths are marked with quality `bad` and logged.
- **Periodic retry** at a configurable interval to accommodate devices that start in stages.
- On successful resolution, the subscription activates normally.

### Health Reporting

- Per-connection status: `connected` / `disconnected` / `reconnecting`.
- Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
- Both reported via existing Health Monitoring heartbeat.

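The lifecycle and health-reporting decisions above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class `DataConnection`, its method names, and the health field names (`tags_subscribed`, `tags_resolved`) are hypothetical and do not come from the requirements documents. The fixed-interval timer and the real protocol driver are left out, and tag quality is set back to `good` directly on reconnect, whereas in the design it returns as values resume.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConnState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class DataConnection:
    """Hypothetical model of one data connection (not the real DCL API)."""
    name: str
    state: ConnState = ConnState.DISCONNECTED
    # tag path -> True once the path has been resolved on the device
    tags: dict[str, bool] = field(default_factory=dict)
    # tag path -> last quality pushed to subscribers
    quality: dict[str, str] = field(default_factory=dict)
    # stand-in for Site Event Logging
    event_log: list[str] = field(default_factory=list)

    def _transition(self, new_state: ConnState) -> None:
        # Every state transition is logged.
        self.event_log.append(
            f"{self.name}: {self.state.value} -> {new_state.value}"
        )
        self.state = new_state

    def subscribe(self, tag_path: str, resolved: bool = True) -> None:
        self.tags[tag_path] = resolved
        is_good = resolved and self.state is ConnState.CONNECTED
        self.quality[tag_path] = "good" if is_good else "bad"

    def on_disconnect(self) -> None:
        # Immediate bad quality: no grace period.
        self._transition(ConnState.DISCONNECTED)
        for tag_path in self.tags:
            self.quality[tag_path] = "bad"

    def on_retry(self) -> None:
        # Would be called by a fixed-interval timer (omitted here).
        self._transition(ConnState.RECONNECTING)

    def on_connected(self) -> None:
        # Transparent re-subscribe: prior subscriptions are kept, and
        # resolved tags return to good without subscriber involvement.
        self._transition(ConnState.CONNECTED)
        for tag_path, resolved in self.tags.items():
            self.quality[tag_path] = "good" if resolved else "bad"

    def health(self) -> dict:
        # Per-connection status plus tag resolution counts.
        return {
            "connection": self.name,
            "status": self.state.value,
            "tags_subscribed": len(self.tags),
            "tags_resolved": sum(self.tags.values()),
        }
```

In this model a subscriber only ever reads `quality`; it never needs to know whether a `bad` value came from a dropped connection or from an unresolved tag path, which matches the intent that Instance Actors take no action on reconnect.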
### Subscription Model

- **No deduplication**: each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.

## Affected Documents

| Document | Change |
|----------|--------|
| `docs/requirements/Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `docs/requirements/Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `docs/requirements/Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |

## Alternatives Considered

- **Exponential backoff for reconnection**: Rejected. Fixed interval is simpler and consistent with the rest of the system.
- **Grace period before marking quality as bad**: Rejected. In SCADA, immediate staleness indication is safer.
- **Instance Actor-driven re-subscription**: Rejected. Adds complexity to Instance Actors for no benefit.
- **Fire-and-forget writes**: Rejected. Script authors need to know when device writes fail.
- **Subscription deduplication in DCL**: Rejected. Adds reference-counting complexity for minimal gain at expected scale.
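
Rejecting fire-and-forget writes, together with the no-store-and-forward decision, amounts to a simple contract: a write either reaches the device or raises to the caller, and nothing is queued for later. The sketch below shows that contract under stated assumptions. `WriteError`, `FakeConnection`, and `write_tag` are hypothetical names used for illustration only, and timeout handling is omitted.

```python
class WriteError(Exception):
    """Raised to the calling script when a device write fails."""


class FakeConnection:
    """Hypothetical stand-in for a data connection; not a real driver API."""

    def __init__(self, connected=True, rejects=()):
        self.connected = connected
        self.rejects = set(rejects)   # tag paths the device will refuse
        self.values = {}              # values the device has accepted

    def send(self, tag_path, value):
        # Returns False when the device rejects the write.
        if tag_path in self.rejects:
            return False
        self.values[tag_path] = value
        return True


def write_tag(conn, tag_path, value, event_log):
    """Synchronous write: succeed or raise; never buffer for later."""
    if not conn.connected:
        reason = "connection down"
    elif not conn.send(tag_path, value):
        reason = "device rejected write"
    else:
        return
    # Failures are logged (stand-in for Site Event Logging) and surfaced.
    event_log.append(f"write failed: {tag_path} ({reason})")
    raise WriteError(f"{tag_path}: {reason}")
```

A calling script would wrap `write_tag` in its own error handling and decide whether to retry with a fresh value, which keeps stale setpoints from being replayed once the connection recovers.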