scadalink-design/docs/plans/2026-03-16-data-connection-layer-refinement-design.md
Joseph Doherty 19c7e6880f Refine Data Connection Layer: error handling, reconnection, write failures, health reporting
Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on
disconnect, transparent re-subscribe), synchronous write failure errors to scripts,
periodic tag path resolution retry, and enhanced health reporting with tag resolution
counts. Update cross-references in Health Monitoring and Site Runtime.
2026-03-16 07:51:37 -04:00


Data Connection Layer Refinement — Design

Date: 2026-03-16
Component: Data Connection Layer (Component-DataConnectionLayer.md)
Status: Approved

Problem

The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but did not specify error handling, reconnection behavior, write failures, tag path resolution, or health reporting granularity.

Decisions

Connection Lifecycle & Reconnection

  • Auto-reconnect with fixed interval per data connection, consistent with the system's fixed-interval retry philosophy.
  • Immediate bad quality on disconnect — all tags on the affected connection are pushed with quality bad as soon as the connection drops.
  • Transparent re-subscribe on reconnection — the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to good as values resume.
  • Connection state transitions (connected / disconnected / reconnecting) are logged to Site Event Logging.
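
The lifecycle decisions above can be sketched as a small state holder. This is a minimal illustration, not the DCL's actual interface: the class `ConnectionLifecycle`, the callbacks `publish`, `log_event`, `connect`, and `resubscribe`, and the choice to publish `None` as the value alongside bad quality are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class ConnState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class ConnectionLifecycle:
    """Hypothetical model of one data connection; not the actual DCL API."""
    name: str
    reconnect_interval_s: float                            # fixed, no backoff
    publish: Callable[[str, Optional[object], str], None]  # (tag, value, quality)
    log_event: Callable[[str], None]                       # stand-in for Site Event Logging
    state: ConnState = ConnState.DISCONNECTED
    subscriptions: List[str] = field(default_factory=list)

    def _transition(self, new_state: ConnState) -> None:
        # Log every state change exactly once.
        if new_state is not self.state:
            self.log_event(f"{self.name}: {self.state.value} -> {new_state.value}")
            self.state = new_state

    def subscribe(self, tag_path: str) -> None:
        self.subscriptions.append(tag_path)

    def on_connection_lost(self) -> None:
        # Push bad quality for every tag immediately; there is no grace period.
        self._transition(ConnState.DISCONNECTED)
        for tag_path in self.subscriptions:
            self.publish(tag_path, None, "bad")

    def next_retry_delay(self, attempt: int) -> float:
        # Same delay for every attempt (fixed interval, not exponential backoff).
        return self.reconnect_interval_s

    def try_reconnect(self, connect: Callable[[], bool],
                      resubscribe: Callable[[str], None]) -> bool:
        self._transition(ConnState.RECONNECTING)
        if not connect():
            return False  # caller waits next_retry_delay() and tries again
        self._transition(ConnState.CONNECTED)
        for tag_path in self.subscriptions:
            resubscribe(tag_path)  # transparent to Instance Actors
        return True
```

A scheduler (not shown) would call `try_reconnect` every `next_retry_delay()` seconds until it returns `True`. Instance Actors never appear in this code path, which is the point of the transparent re-subscribe decision.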

Write Failure Handling

  • Writes are synchronous from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
  • No store-and-forward for device writes — buffering stale setpoints is dangerous for industrial control.
  • Write failures are also logged to Site Event Logging.
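
As an illustration of the synchronous write contract, the sketch below returns a result to the caller for each of the three failure causes listed above and logs the failure. The names `DeviceConnection`, `device_write`, `WriteResult`, `write_tag`, and the default timeout are hypothetical; the design only states that the script receives an error, not how that error is represented.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class DeviceConnection(Protocol):
    """Hypothetical adapter interface, assumed for this sketch."""
    def is_connected(self) -> bool: ...
    def device_write(self, tag_path: str, value: object, timeout_s: float) -> bool: ...


@dataclass(frozen=True)
class WriteResult:
    ok: bool
    error: Optional[str] = None


def write_tag(conn: DeviceConnection, tag_path: str, value: object,
              log_event: Callable[[str], None],
              timeout_s: float = 2.0) -> WriteResult:
    """Attempt one write and report the outcome to the caller immediately."""
    if not conn.is_connected():
        # No store-and-forward: the write is refused rather than buffered.
        error: Optional[str] = "connection down"
    else:
        try:
            accepted = conn.device_write(tag_path, value, timeout_s)
            error = None if accepted else "device rejection"
        except TimeoutError:
            error = "timeout"
    if error is None:
        return WriteResult(ok=True)
    log_event(f"write failed: {tag_path}: {error}")  # stand-in for Site Event Logging
    return WriteResult(ok=False, error=error)
```

Because nothing is queued on failure, a setpoint that could not be delivered is never replayed later; the script decides whether to retry with a fresh value.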

Tag Path Resolution

  • Unresolvable tag paths are marked with quality bad and logged.
  • Periodic retry at a configurable interval to accommodate devices that start in stages.
  • On successful resolution, the subscription activates normally.
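
The resolution rules above might look like the following sketch. `TagResolver`, its `resolve` callback, and the 30-second default are assumptions for illustration only; the design says the retry interval is configurable but does not give a default.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class TagResolver:
    """Hypothetical tracker for resolved and unresolved tag paths."""
    resolve: Callable[[str], bool]                         # assumed protocol lookup
    publish: Callable[[str, Optional[object], str], None]  # (tag, value, quality)
    log_event: Callable[[str], None]                       # stand-in for Site Event Logging
    retry_interval_s: float = 30.0                         # configurable; default is made up
    resolved: Set[str] = field(default_factory=set)
    unresolved: Set[str] = field(default_factory=set)

    def subscribe(self, tag_path: str) -> None:
        if self.resolve(tag_path):
            self.resolved.add(tag_path)
        else:
            # Mark bad and log, but keep the path so it can be retried.
            self.unresolved.add(tag_path)
            self.publish(tag_path, None, "bad")
            self.log_event(f"unresolved tag path: {tag_path}")

    def retry_unresolved(self) -> int:
        """Intended to run every retry_interval_s; returns how many paths resolved."""
        newly_resolved = {t for t in self.unresolved if self.resolve(t)}
        self.unresolved -= newly_resolved
        self.resolved |= newly_resolved
        return len(newly_resolved)
```

A device that brings its tags online in stages is handled by repeated calls to `retry_unresolved`: each call picks up whatever has become available since the last one.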

Health Reporting

  • Per-connection status: connected / disconnected / reconnecting.
  • Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
  • Both are reported via the existing Health Monitoring heartbeat.
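
One possible shape for the per-connection health data is sketched below. The field names, the `data_connections` key, and the dictionary layout are hypothetical; the actual heartbeat format is defined by Health Monitoring and is not reproduced here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List

VALID_STATUSES = {"connected", "disconnected", "reconnecting"}


@dataclass(frozen=True)
class ConnectionHealth:
    """Hypothetical per-connection health record."""
    connection: str
    status: str
    subscribed_tags: int   # total subscribed tags
    resolved_tags: int     # successfully resolved tags

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if not 0 <= self.resolved_tags <= self.subscribed_tags:
            raise ValueError("resolved_tags must be between 0 and subscribed_tags")


def heartbeat_section(conns: Iterable[ConnectionHealth]) -> Dict[str, List[dict]]:
    """Build the data-connection portion of a heartbeat payload."""
    return {"data_connections": [asdict(c) for c in conns]}
```

Reporting both counts lets an operator distinguish a connection that is up but only partially resolved (for example 98 of 120 tags) from one that is fully healthy.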

Subscription Model

  • No deduplication — each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.
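
To make the no-deduplication rule concrete, the sketch below gives every subscribe call its own entry, even when the tag path repeats, so no reference counting is needed on unsubscribe. `SubscriptionTable` and its methods are hypothetical names used only for this example.

```python
from itertools import count
from typing import Dict, Tuple


class SubscriptionTable:
    """Hypothetical bookkeeping: one entry per (actor, tag) subscribe call."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._subs: Dict[int, Tuple[str, str]] = {}  # sub_id -> (actor_id, tag_path)

    def subscribe(self, actor_id: str, tag_path: str) -> int:
        # Always create a new subscription; never share an existing one.
        sub_id = next(self._ids)
        self._subs[sub_id] = (actor_id, tag_path)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        # Removing one subscription cannot affect another actor's subscription.
        self._subs.pop(sub_id, None)

    def subscription_count(self, tag_path: str) -> int:
        return sum(1 for _, t in self._subs.values() if t == tag_path)
```

Two actors subscribing to the same path produce two independent subscriptions, and dropping one leaves the other untouched.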

Affected Documents

| Document | Change |
| --- | --- |
| Component-DataConnectionLayer.md | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| Component-HealthMonitoring.md | Added tag resolution counts to monitored metrics table |
| Component-SiteRuntime.md | Updated SetAttribute description to note synchronous write failure errors |

Alternatives Considered

  • Exponential backoff for reconnection: Rejected — fixed interval is simpler and consistent with the rest of the system.
  • Grace period before marking quality as bad: Rejected — in SCADA, immediate staleness indication is safer.
  • Instance Actor-driven re-subscription: Rejected — adds complexity to Instance Actors for no benefit.
  • Fire-and-forget writes: Rejected — script authors need to know when device writes fail.
  • Subscription deduplication in DCL: Rejected — adds reference-counting complexity for minimal gain at expected scale.