scadalink-design/docs/plans/2026-03-16-data-connection-layer-refinement-design.md

Data Connection Layer Refinement — Design

Date: 2026-03-16
Component: Data Connection Layer (docs/requirements/Component-DataConnectionLayer.md)
Status: Approved

Problem

The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but lacked specification for error handling, reconnection behavior, write failures, tag path resolution, and health reporting granularity.

Decisions

Connection Lifecycle & Reconnection

  • Auto-reconnect with fixed interval per data connection, consistent with the system's fixed-interval retry philosophy.
  • Immediate bad quality on disconnect — all tags on the affected connection are pushed with quality bad as soon as the connection drops.
  • Transparent re-subscribe on reconnection — the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to good as values resume.
  • Connection state transitions (connected / disconnected / reconnecting) are logged to Site Event Logging.
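
The lifecycle decisions above can be sketched as a small state machine. This is a minimal illustration, not the DCL implementation: the class `DataConnection`, the `try_connect` callback, and the in-memory `event_log`, `pushed`, and `resubscribed` lists are all hypothetical stand-ins for the protocol adapter, Site Event Logging, and the Instance Actor notifications.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class ConnState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class DataConnection:
    """Hypothetical model of one data connection; every name is illustrative."""
    name: str
    reconnect_interval_s: float          # fixed: never grows between attempts
    try_connect: Callable[[], bool]      # stand-in for the protocol adapter
    subscriptions: List[str] = field(default_factory=list)
    state: ConnState = ConnState.CONNECTED
    event_log: List[str] = field(default_factory=list)           # stand-in for Site Event Logging
    pushed: List[Tuple[str, str]] = field(default_factory=list)  # (tag_path, quality)
    resubscribed: List[str] = field(default_factory=list)

    def _transition(self, new_state: ConnState) -> None:
        # Every state change is logged.
        self.event_log.append(f"{self.name}: {self.state.value} -> {new_state.value}")
        self.state = new_state

    def on_disconnect(self) -> None:
        self._transition(ConnState.DISCONNECTED)
        # Immediate bad quality for every tag on this connection, no grace period.
        for tag in self.subscriptions:
            self.pushed.append((tag, "bad"))

    def attempt_reconnect(self) -> Optional[float]:
        """Run one attempt; return seconds until the next one, or None once connected."""
        if self.state is ConnState.CONNECTED:
            return None
        if self.state is ConnState.DISCONNECTED:
            self._transition(ConnState.RECONNECTING)
        if self.try_connect():
            self._transition(ConnState.CONNECTED)
            # Transparent re-subscribe: the subscribers take no action.
            self.resubscribed = list(self.subscriptions)
            return None
        return self.reconnect_interval_s


# Usage: two failed attempts, then success.
attempts = iter([False, False, True])
conn = DataConnection("plc1", 5.0, lambda: next(attempts), ["Tank.Level", "Pump.Speed"])
conn.on_disconnect()
delays = [conn.attempt_reconnect() for _ in range(3)]
```

The delay returned after each failed attempt is always the configured interval, which is the point of the fixed-interval choice: the schedule is predictable and needs no per-connection backoff state.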

Write Failure Handling

  • Writes are synchronous from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
  • No store-and-forward for device writes — buffering stale setpoints is dangerous for industrial control.
  • Write failures are also logged to Site Event Logging.
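
A rough sketch of the write path described above, assuming a hypothetical `DeviceWriter` wrapper. The `connected` and `send` callbacks, the `WriteResult` type, and the mapping of `TimeoutError` and `ValueError` to timeout and device rejection are illustrative assumptions, not part of the DCL specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WriteResult:
    ok: bool
    error: Optional[str] = None


class DeviceWriter:
    """Hypothetical synchronous write path; names are not taken from the DCL docs."""

    def __init__(self, connected: Callable[[], bool], send: Callable[[str, object], None]):
        self._connected = connected
        self._send = send  # assumed to raise TimeoutError or ValueError (rejection)
        self.event_log: List[str] = []  # stand-in for Site Event Logging

    def write(self, tag_path: str, value: object) -> WriteResult:
        if not self._connected():
            # No store-and-forward: the write fails now instead of being queued.
            return self._fail(tag_path, "connection down")
        try:
            self._send(tag_path, value)
        except TimeoutError:
            return self._fail(tag_path, "timeout")
        except ValueError as exc:
            return self._fail(tag_path, f"device rejection: {exc}")
        return WriteResult(ok=True)

    def _fail(self, tag_path: str, reason: str) -> WriteResult:
        self.event_log.append(f"write failed: {tag_path}: {reason}")
        return WriteResult(ok=False, error=reason)


def reject(tag_path: str, value: object) -> None:
    raise ValueError("out of range")


# Usage: one write against a dropped connection, one rejected by the device.
down = DeviceWriter(lambda: False, lambda t, v: None).write("Pump.Setpoint", 42)
rejecting_writer = DeviceWriter(lambda: True, reject)
rejected = rejecting_writer.write("Pump.Setpoint", 9999)
```

Because the result comes back to the caller in the same call, a script can decide for itself whether to retry, alarm, or abort, and a stale setpoint is never applied later without the script knowing.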

Tag Path Resolution

  • Unresolvable tag paths are marked with quality bad and logged.
  • Periodic retry at a configurable interval to accommodate devices that start in stages.
  • On successful resolution, the subscription activates normally.
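
The resolution and retry behavior above might look like the following. This is a sketch under assumptions: `TagResolver`, its `resolve` callback, and the `active` and `bad_quality` sets are hypothetical, and the timer that would call `retry_unresolved` once per configured interval is left out.

```python
from typing import Callable, List, Set


class TagResolver:
    """Hypothetical tracker for tag paths that could not be resolved yet."""

    def __init__(self, resolve: Callable[[str], bool], retry_interval_s: float):
        self._resolve = resolve                   # stand-in for the protocol lookup
        self.retry_interval_s = retry_interval_s  # configurable retry period
        self.active: Set[str] = set()             # subscriptions delivering values
        self.bad_quality: Set[str] = set()        # tags currently reported as bad
        self.unresolved: List[str] = []
        self.event_log: List[str] = []

    def subscribe(self, tag_path: str) -> None:
        if self._resolve(tag_path):
            self.active.add(tag_path)
            return
        # Unresolvable: mark quality bad, log it, and queue it for periodic retry.
        self.bad_quality.add(tag_path)
        self.unresolved.append(tag_path)
        self.event_log.append(f"unresolved tag path: {tag_path}")

    def retry_unresolved(self) -> None:
        """Intended to run once per retry_interval_s; the scheduler is not shown."""
        still_unresolved = []
        for tag_path in self.unresolved:
            if self._resolve(tag_path):
                # Resolution succeeded: the subscription activates normally.
                self.bad_quality.discard(tag_path)
                self.active.add(tag_path)
            else:
                still_unresolved.append(tag_path)
        self.unresolved = still_unresolved


# Usage: a device exposes a second tag only after a later startup stage.
available = {"Tank.Level"}
resolver = TagResolver(lambda path: path in available, retry_interval_s=30.0)
resolver.subscribe("Tank.Level")
resolver.subscribe("Mixer.Speed")   # not available yet
available.add("Mixer.Speed")        # device finishes starting up
resolver.retry_unresolved()
```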

Health Reporting

  • Per-connection status: connected / disconnected / reconnecting.
  • Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
  • Both are reported via the existing Health Monitoring heartbeat.
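
One possible shape for the per-connection health data, shown only to make the two metrics concrete. The field names and the `build_heartbeat_section` helper are assumptions; the actual heartbeat format is defined by the Health Monitoring component, not here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ConnectionHealth:
    """Hypothetical per-connection health record."""
    connection: str
    status: str                  # "connected", "disconnected", or "reconnecting"
    total_subscribed_tags: int
    resolved_tags: int


def build_heartbeat_section(connections: List[ConnectionHealth]) -> Dict[str, dict]:
    # Key the records by connection name so the heartbeat consumer can look one up.
    return {c.connection: asdict(c) for c in connections}


section = build_heartbeat_section([
    ConnectionHealth("plc1", "connected", total_subscribed_tags=120, resolved_tags=118),
    ConnectionHealth("plc2", "reconnecting", total_subscribed_tags=40, resolved_tags=0),
])
```

Reporting both counts rather than a single percentage lets an operator tell a connection with two unresolved tags apart from one that is connected but has resolved nothing.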

Subscription Model

  • No deduplication — each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.
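
To illustrate what "no deduplication" means in practice, here is a sketch of a subscription table in which every request creates its own entry. `SubscriptionTable` and the actor names are hypothetical; the real DCL bookkeeping may differ.

```python
from itertools import count
from typing import Dict, List, Tuple


class SubscriptionTable:
    """Hypothetical table: one entry per (actor, tag path) request, never shared."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.by_id: Dict[int, Tuple[str, str]] = {}  # sub_id -> (actor, tag_path)

    def subscribe(self, actor: str, tag_path: str) -> int:
        # Always create a new subscription, even if the tag path is already in use.
        sub_id = next(self._ids)
        self.by_id[sub_id] = (actor, tag_path)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        # Removing one entry cannot affect another actor: no reference counts.
        self.by_id.pop(sub_id, None)

    def subscribers(self, tag_path: str) -> List[str]:
        return [actor for actor, path in self.by_id.values() if path == tag_path]


# Usage: two actors subscribe to the same tag path and get separate subscriptions.
table = SubscriptionTable()
first = table.subscribe("instance-actor-A", "Tank.Level")
second = table.subscribe("instance-actor-B", "Tank.Level")
table.unsubscribe(first)
remaining = table.subscribers("Tank.Level")
```

The trade-off is duplicate protocol-level subscriptions for the same tag in exchange for teardown logic that is a single dictionary removal.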

Affected Documents

| Document | Change |
|---|---|
| docs/requirements/Component-DataConnectionLayer.md | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| docs/requirements/Component-HealthMonitoring.md | Added tag resolution counts to monitored metrics table |
| docs/requirements/Component-SiteRuntime.md | Updated SetAttribute description to note synchronous write failure errors |

Alternatives Considered

  • Exponential backoff for reconnection: Rejected — fixed interval is simpler and consistent with the rest of the system.
  • Grace period before marking quality as bad: Rejected — in SCADA, immediate staleness indication is safer.
  • Instance Actor-driven re-subscription: Rejected — adds complexity to Instance Actors for no benefit.
  • Fire-and-forget writes: Rejected — script authors need to know when device writes fail.
  • Subscription deduplication in DCL: Rejected — adds reference-counting complexity for minimal gain at expected scale.