Refine Data Connection Layer: error handling, reconnection, write failures, health reporting
Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe), synchronous write failure errors to scripts, periodic tag path resolution retry, and enhanced health reporting with tag resolution counts. Update cross-references in Health Monitoring and Site Runtime.
@@ -67,6 +67,41 @@ Each value update delivered to an Instance Actor includes:
- **Quality**: Data quality indicator (good, bad, uncertain).
- **Timestamp**: When the value was read from the device.

## Connection Lifecycle & Reconnection

The DCL manages the connection lifecycle automatically:

1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts that check quality) therefore see the staleness without delay.
2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system.
3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging.
4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action; they simply see quality return to `good` as fresh values arrive from the restored subscriptions.
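
The four lifecycle steps can be sketched as a small state machine. This is only an illustration under stated assumptions: `DataConnection`, `reconnect_loop`, and the injected `connect`, `subscribe`, `publish`, and `log_event` callables are hypothetical names, and the transition order (`disconnected` to `reconnecting` and back on a failed attempt) is a guess at how the three documented states relate, not something the design specifies.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Set


class ConnectionState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    RECONNECTING = "reconnecting"


@dataclass
class DataConnection:
    """Hypothetical model of one data connection; not the DCL's real API."""
    name: str
    connect: Callable[[], bool]           # True when the data source is reachable
    subscribe: Callable[[str], None]      # (re-)establish one tag subscription
    publish: Callable[[str, str], None]   # deliver (tag_path, quality) to Instance Actors
    log_event: Callable[[str], None]      # stand-in for Site Event Logging
    retry_interval_s: float = 5.0         # fixed interval, set per data connection
    state: ConnectionState = ConnectionState.CONNECTED
    subscribed_tags: Set[str] = field(default_factory=set)

    def _transition(self, new_state: ConnectionState) -> None:
        # Step 3: every state change is logged.
        self.log_event(f"{self.name}: {self.state.value} -> {new_state.value}")
        self.state = new_state

    def on_connection_lost(self) -> None:
        # Step 1: push quality "bad" for every tag on this connection.
        self._transition(ConnectionState.DISCONNECTED)
        for tag in sorted(self.subscribed_tags):
            self.publish(tag, "bad")

    def retry_once(self) -> bool:
        # Step 2: a single reconnect attempt.
        self._transition(ConnectionState.RECONNECTING)
        if not self.connect():
            self._transition(ConnectionState.DISCONNECTED)
            return False
        self._transition(ConnectionState.CONNECTED)
        # Step 4: restore all previously active subscriptions.
        for tag in sorted(self.subscribed_tags):
            self.subscribe(tag)
        return True


def reconnect_loop(conn: DataConnection, sleep=time.sleep) -> int:
    """Retry until connected; return the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        if conn.retry_once():
            return attempts
        sleep(conn.retry_interval_s)  # same delay every time, no backoff
```

Passing `sleep` as a parameter keeps the fixed-interval behaviour testable without real delays. Quality returning to `good` is not modelled here, since it follows from fresh values arriving on the restored subscriptions.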

## Write Failure Handling

Writes to physical devices are **synchronous** from the script's perspective:

- If the write fails (connection down, device rejection, timeout), the error is **returned to the calling script**. Script authors can catch and handle write errors (log, notify, retry, etc.).
- Write failures are also logged to Site Event Logging.
- There is **no store-and-forward for device writes**. These are real-time control operations, and buffering stale setpoints for later application would be dangerous in an industrial context.
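
As a rough illustration of the synchronous contract, the sketch below attempts a write once, logs any failure, and surfaces it to the caller without queuing. `DeviceWriteError`, `write_tag`, the `send` driver callable, and the tag path `Motor1.SpeedSetpoint` are all hypothetical; using `ValueError` to stand in for a device rejection is likewise an assumption.

```python
class DeviceWriteError(Exception):
    """Hypothetical error type surfaced to the calling script."""

    def __init__(self, tag_path: str, reason: str) -> None:
        super().__init__(f"write to {tag_path} failed: {reason}")
        self.tag_path = tag_path
        self.reason = reason


def write_tag(tag_path, value, send, log_event) -> None:
    """Attempt one write. On failure, log it and raise; never buffer it."""
    try:
        send(tag_path, value)  # hypothetical driver call that raises on failure
    except (ConnectionError, TimeoutError, ValueError) as exc:
        log_event(f"write failed: {tag_path}: {exc}")  # stand-in for Site Event Logging
        raise DeviceWriteError(tag_path, str(exc)) from exc


def set_speed(write, setpoint: float) -> str:
    """Example script body: the author decides how to react to a failed write."""
    try:
        write("Motor1.SpeedSetpoint", setpoint)
        return "ok"
    except DeviceWriteError as err:
        return f"failed: {err.reason}"
```

Because the error propagates immediately, any retry is an explicit decision in the script, made with the current setpoint and not a stale one.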

## Tag Path Resolution

When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):

1. The failure is **logged to Site Event Logging**.
2. The attribute is marked with quality `bad`.
3. The DCL **periodically retries resolution** at a configurable interval, accommodating devices that come online in stages or load modules after startup.
4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.

Note: Pre-deployment validation at central does **not** verify that tag paths resolve to real tags on physical devices; that is a runtime concern handled here.
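
One retry pass over unresolved tag paths might look like the sketch below, which a scheduler would invoke at the configured interval. `resolution_pass` and its `resolve`, `activate`, and `log_event` parameters are hypothetical. Logging only the first failure for each path, not every retry, is an assumption made here to keep the event log quiet; the design text does not say which is intended.

```python
from typing import Callable, Dict, Set


def resolution_pass(
    pending: Set[str],
    resolve: Callable[[str], bool],    # True if the path exists on the device
    activate: Callable[[str], None],   # start the real subscription
    quality: Dict[str, str],           # tag_path -> last quality set by the DCL
    log_event: Callable[[str], None],  # stand-in for Site Event Logging
) -> Set[str]:
    """Try each pending path once and return the paths still unresolved."""
    still_pending: Set[str] = set()
    for path in sorted(pending):
        if resolve(path):
            # Step 4: activate; quality is then driven by live device values.
            activate(path)
            continue
        if quality.get(path) != "bad":
            # Steps 1-2: log the failure and mark the attribute bad (first time only).
            log_event(f"tag path not found: {path}")
            quality[path] = "bad"
        # Step 3: keep the path for the next periodic pass.
        still_pending.add(path)
    return still_pending
```

A caller would keep the returned set and pass it back in on the next tick, so a device that loads modules after startup is picked up without redeployment.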

## Health Reporting

The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:

- **Connection status**: `connected`, `disconnected`, or `reconnecting` per data connection.
- **Tag resolution counts**: Per connection, the total number of subscribed tags versus the number successfully resolved. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.
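
The per-connection metrics could be assembled into a heartbeat payload along these lines. The field names (`tags_subscribed`, `tags_resolved`), the `ConnectionHealth` type, and the input shape are assumptions for illustration; the actual heartbeat format is not described here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Set, Tuple

VALID_STATUSES = {"connected", "disconnected", "reconnecting"}


@dataclass(frozen=True)
class ConnectionHealth:
    """Hypothetical per-connection entry in the heartbeat payload."""
    connection: str
    status: str
    tags_subscribed: int
    tags_resolved: int


def build_health_report(
    connections: Dict[str, Tuple[str, Set[str], Set[str]]],
) -> List[dict]:
    """Map name -> (status, subscribed paths, resolved paths) to report rows."""
    report = []
    for name, (status, subscribed, resolved) in sorted(connections.items()):
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown connection status: {status}")
        # Only count resolved paths that are actually subscribed.
        entry = ConnectionHealth(name, status, len(subscribed), len(resolved & subscribed))
        report.append(asdict(entry))
    return report
```

A gap between `tags_subscribed` and `tags_resolved` on an otherwise `connected` connection is the signal that points to a misconfigured template.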

## Dependencies

- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests.