Refine Data Connection Layer: error handling, reconnection, write failures, health reporting

Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on
disconnect, transparent re-subscribe), synchronous write failure errors to scripts,
periodic tag path resolution retry, and enhanced health reporting with tag resolution
counts. Update cross-references in Health Monitoring and Site Runtime.
This commit is contained in:
Joseph Doherty
2026-03-16 07:51:37 -04:00
parent f0108e161b
commit 19c7e6880f
4 changed files with 89 additions and 2 deletions

View File

@@ -25,7 +25,8 @@ Site clusters (metric collection and reporting). Central cluster (aggregation an
|--------|--------|-------------|
| Site online/offline | Communication Layer | Whether the site is reachable (based on heartbeat) |
| Active/standby node status | Cluster Infrastructure | Which node is active, which is standby |
| Data connection health | Data Connection Layer | Connected/disconnected per data connection |
| Data connection health | Data Connection Layer | Connected/disconnected/reconnecting per data connection |
| Tag resolution counts | Data Connection Layer | Per connection: total subscribed tags vs. successfully resolved tags |
| Script error rates | Site Runtime (Script Actors) | Frequency of script failures |
| Alarm evaluation error rates | Site Runtime (Alarm Actors) | Frequency of alarm evaluation failures |
| Store-and-forward buffer depth | Store-and-Forward Engine | Pending messages by category (external, notification, DB write) |