Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe), synchronous write failure errors to scripts, periodic tag path resolution retry, and enhanced health reporting with tag resolution counts. Update cross-references in Health Monitoring and Site Runtime.
Data Connection Layer Refinement — Design
Date: 2026-03-16
Component: Data Connection Layer (Component-DataConnectionLayer.md)
Status: Approved
Problem
The Data Connection Layer doc covered the happy path (interface, subscriptions, write-back, value format) but lacked specification for error handling, reconnection behavior, write failures, tag path resolution, and health reporting granularity.
Decisions
Connection Lifecycle & Reconnection
- Auto-reconnect with fixed interval per data connection, consistent with the system's fixed-interval retry philosophy.
- Immediate bad quality on disconnect: all tags on the affected connection are pushed with quality `bad` as soon as the connection drops.
- Transparent re-subscribe on reconnection: the DCL re-establishes all prior subscriptions automatically. Instance Actors take no action; they see quality return to `good` as values resume.
- Connection state transitions (`connected`/`disconnected`/`reconnecting`) logged to Site Event Logging.
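The lifecycle decisions above can be sketched as a small state holder. This is a minimal illustration, not the DCL implementation: `DataConnection`, `push_quality`, `log_event`, `connect`, and `resubscribe` are hypothetical names, and staying in `reconnecting` after a failed attempt is an assumption the design does not spell out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class ConnState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class DataConnection:
    """Hypothetical model of one data connection (all names are illustrative)."""
    name: str
    reconnect_interval_s: float               # fixed per connection, never grows
    push_quality: Callable[[str, str], None]  # (tag_path, quality) toward Instance Actors
    log_event: Callable[[str], None]          # stand-in for Site Event Logging
    subscriptions: List[str] = field(default_factory=list)
    state: ConnState = ConnState.CONNECTED

    def _transition(self, new_state: ConnState) -> None:
        if new_state is self.state:
            return  # nothing changed, nothing to log
        self.log_event(f"{self.name}: {self.state.value} -> {new_state.value}")
        self.state = new_state

    def on_disconnect(self) -> None:
        self._transition(ConnState.DISCONNECTED)
        # No grace period: every tag on this connection goes bad immediately.
        for tag in self.subscriptions:
            self.push_quality(tag, "bad")

    def try_reconnect(
        self,
        connect: Callable[[], bool],
        resubscribe: Callable[[str], None],
    ) -> Optional[float]:
        """Make one attempt; return the delay before the next one, or None on success."""
        self._transition(ConnState.RECONNECTING)
        if not connect():
            # Assumption: remain in 'reconnecting' and wait the same fixed interval.
            return self.reconnect_interval_s
        self._transition(ConnState.CONNECTED)
        # The DCL restores subscriptions itself; Instance Actors do nothing.
        for tag in self.subscriptions:
            resubscribe(tag)
        return None
```

The caller is expected to schedule the next attempt using the returned delay, which is always the same configured value for a given connection.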
Write Failure Handling
- Writes are synchronous from the script's perspective. Failures (connection down, device rejection, timeout) return an error to the calling script.
- No store-and-forward for device writes — buffering stale setpoints is dangerous for industrial control.
- Write failures also logged to Site Event Logging.
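A rough sketch of what a synchronous write with no store-and-forward might look like from the DCL side. The names `write_tag`, `DeviceWriteError`, `send`, and `log_event` are hypothetical, and `send` is assumed to block until the device answers, returning `False` on rejection and raising `TimeoutError` on timeout.

```python
from typing import Any, Callable, Optional


class DeviceWriteError(Exception):
    """Hypothetical error type returned to the calling script."""

    def __init__(self, tag_path: str, reason: str) -> None:
        super().__init__(f"write to {tag_path} failed: {reason}")
        self.tag_path = tag_path
        self.reason = reason


def write_tag(
    tag_path: str,
    value: Any,
    connected: bool,
    send: Callable[[str, Any], bool],
    log_event: Callable[[str], None],
) -> None:
    """Return only after the device accepted the write; raise otherwise. Never queues."""
    reason: Optional[str] = None
    if not connected:
        reason = "connection down"
    else:
        try:
            if not send(tag_path, value):
                reason = "device rejection"
        except TimeoutError:
            reason = "timeout"
    if reason is not None:
        # The failure is logged and surfaced; the value is dropped, not buffered.
        log_event(f"write failure: {tag_path}: {reason}")
        raise DeviceWriteError(tag_path, reason)
```

A script calling this would wrap the write in `try`/`except DeviceWriteError` and decide for itself whether to retry, since a buffered setpoint could be stale by the time the connection returns.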
Tag Path Resolution
- Unresolvable tag paths are marked with quality `bad` and logged.
- Periodic retry at a configurable interval to accommodate devices that start in stages.
- On successful resolution, the subscription activates normally.
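The resolution rules above could be modeled roughly as follows. `TagResolver`, `resolve`, and `log_event` are hypothetical names; the sketch leaves timing to the host, which is assumed to call `retry_unresolved()` once per configured interval.

```python
from typing import Callable, Dict, List, Set


class TagResolver:
    """Hypothetical tracker for resolved and unresolved tag paths on one connection."""

    def __init__(self, resolve: Callable[[str], bool], log_event: Callable[[str], None]) -> None:
        self._resolve = resolve        # asks the device whether the path exists
        self._log_event = log_event    # stand-in for Site Event Logging
        self.active: Set[str] = set()  # subscriptions that are live
        self.unresolved: List[str] = []
        self.quality: Dict[str, str] = {}

    def subscribe(self, tag_path: str) -> None:
        if self._resolve(tag_path):
            self.active.add(tag_path)
            return
        # Unresolvable for now: mark bad, log it, and keep it for later retries.
        self.quality[tag_path] = "bad"
        self.unresolved.append(tag_path)
        self._log_event(f"unresolved tag path: {tag_path}")

    def retry_unresolved(self) -> None:
        """Run one retry pass over every path that has not resolved yet."""
        still_unresolved: List[str] = []
        for tag_path in self.unresolved:
            if self._resolve(tag_path):
                # Activates like a normal subscription; quality then follows incoming values.
                self.active.add(tag_path)
                self.quality.pop(tag_path, None)
            else:
                still_unresolved.append(tag_path)
        self.unresolved = still_unresolved
```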
Health Reporting
- Per-connection status: `connected`/`disconnected`/`reconnecting`.
- Per-connection tag resolution counts: total subscribed tags vs. successfully resolved tags.
- Both reported via existing Health Monitoring heartbeat.
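As an illustration of the data involved, one possible shape for the per-connection health record is shown below. The field names and the `data_connections` key are assumptions made for this sketch; the actual heartbeat schema is defined in the Health Monitoring document.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ConnectionHealth:
    """Hypothetical per-connection health record carried in the heartbeat."""
    connection: str
    status: str            # "connected", "disconnected", or "reconnecting"
    tags_subscribed: int   # total subscribed tags
    tags_resolved: int     # tags whose paths resolved successfully


def heartbeat_section(connections: List[ConnectionHealth]) -> Dict[str, Any]:
    """Build the DCL portion of a heartbeat payload as plain dictionaries."""
    return {"data_connections": [asdict(c) for c in connections]}
```

A gap between `tags_subscribed` and `tags_resolved` on a connection whose status is `connected` would point to unresolved tag paths rather than a link problem.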
Subscription Model
- No deduplication — each Instance Actor gets its own subscription even if multiple actors subscribe to the same tag path. Protocol layers (e.g., OPC UA) handle this efficiently at the expected scale.
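To make the no-deduplication choice concrete, here is a minimal sketch of a subscription table with one entry per actor and tag path, and no reference counting. `SubscriptionTable` and its methods are hypothetical names used only for illustration.

```python
import itertools
from typing import Dict, List, Tuple


class SubscriptionTable:
    """Hypothetical table: one independent subscription per (actor, tag path) request."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._subs: Dict[int, Tuple[str, str]] = {}  # sub_id -> (actor_id, tag_path)

    def subscribe(self, actor_id: str, tag_path: str) -> int:
        # Always creates a new subscription, even if the tag path is already subscribed.
        sub_id = next(self._ids)
        self._subs[sub_id] = (actor_id, tag_path)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        # Removing one subscription never affects another actor's subscription.
        self._subs.pop(sub_id, None)

    def subscribers_of(self, tag_path: str) -> List[str]:
        return [actor for actor, path in self._subs.values() if path == tag_path]
```

Because every subscription is independent, removing one is a single dictionary delete with no shared state to reconcile.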
Affected Documents
| Document | Change |
|---|---|
| `Component-DataConnectionLayer.md` | Added 4 new sections: Connection Lifecycle & Reconnection, Write Failure Handling, Tag Path Resolution, Health Reporting |
| `Component-HealthMonitoring.md` | Added tag resolution counts to monitored metrics table |
| `Component-SiteRuntime.md` | Updated SetAttribute description to note synchronous write failure errors |
Alternatives Considered
- Exponential backoff for reconnection: Rejected — fixed interval is simpler and consistent with the rest of the system.
- Grace period before marking quality as bad: Rejected — in SCADA, immediate staleness indication is safer.
- Instance Actor-driven re-subscription: Rejected — adds complexity to Instance Actors for no benefit.
- Fire-and-forget writes: Rejected — script authors need to know when device writes fail.
- Subscription deduplication in DCL: Rejected — adds reference-counting complexity for minimal gain at expected scale.