scadalink-design/docs/requirements/Component-DataConnectionLayer.md
Joseph Doherty 9dccf8e72f deprecate(lmxproxy): move all LmxProxy code, tests, and docs to deprecated/
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
2026-04-08 15:56:23 -04:00


Component: Data Connection Layer

Purpose

The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a clean data pipe — it performs no evaluation of triggers, alarm conditions, or business logic.

Location

Site clusters only. Central does not interact with machines directly.

Responsibilities

  • Manage data connections defined centrally and deployed to sites as part of artifact deployment (OPC UA servers). Data connection definitions are stored in local SQLite after deployment.
  • Establish and maintain connections to data sources based on deployed instance configurations.
  • Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
  • Deliver tag value updates to the requesting Instance Actors.
  • Support writing values to machines (when Instance Actors forward SetAttribute write requests for data-connected attributes).
  • Report data connection health status to the Health Monitoring component.

Common Interface

All protocol adapters implement the same interface:

IDataConnection : IAsyncDisposable
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── ReadBatch(tagPaths) → values
├── Write(tagPath, value) → void
├── WriteBatch(values) → void
├── WriteBatchAndWait(values, flagPath, flagValue, responsePath, responseValue, timeout) → bool
├── Status → ConnectionHealth
└── Disconnected → event Action?

The Disconnected event is raised by an adapter when it detects an unexpected connection loss (server offline, network failure, keep-alive timeout). The DataConnectionActor subscribes to this event to trigger the reconnection state machine. Additional protocols can be added by implementing this interface.
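The interface above can be exercised with a minimal in-memory adapter. The following is an illustrative Python sketch, not the .NET implementation: FakeDataConnection and the tag path Pump1/Speed are invented for the example, and only the Subscribe/Read/Write subset of the interface is shown.

```python
# Illustrative in-memory stand-in for an IDataConnection adapter (Python sketch;
# the real adapters are .NET classes behind the interface shown above).
from typing import Any, Callable
import itertools

class FakeDataConnection:
    """Minimal adapter: subscriptions, reads, and writes against a tag dict."""

    def __init__(self) -> None:
        self._tags: dict[str, Any] = {}
        self._subs: dict[int, tuple[str, Callable[[str, Any], None]]] = {}
        self._ids = itertools.count(1)

    def subscribe(self, tag_path: str, callback: Callable[[str, Any], None]) -> int:
        sub_id = next(self._ids)
        self._subs[sub_id] = (tag_path, callback)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        self._subs.pop(sub_id, None)

    def read(self, tag_path: str) -> Any:
        return self._tags[tag_path]

    def write(self, tag_path: str, value: Any) -> None:
        self._tags[tag_path] = value
        # The device confirms the write; active subscriptions deliver it back.
        for path, cb in self._subs.values():
            if path == tag_path:
                cb(tag_path, value)

updates: list[tuple[str, Any]] = []
conn = FakeDataConnection()
sub_id = conn.subscribe("Pump1/Speed", lambda p, v: updates.append((p, v)))
conn.write("Pump1/Speed", 42)
```

After the write, the subscription callback has delivered the confirmed value back to the subscriber, mirroring the write-back flow described later in this document.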

Common Value Type

All protocols produce the same value tuple consumed by Instance Actors. Before the first value update arrives from the DCL, data-sourced attributes are held at uncertain quality by the Instance Actor (see Site Runtime — Initialization):

| Concept | ScadaLink Design |
| --- | --- |
| Value container | TagValue(Value, Quality, Timestamp) |
| Quality | QualityCode enum: Good / Bad / Uncertain |
| Timestamp | DateTimeOffset (UTC) |
| Value type | object? |
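The value tuple can be sketched as follows. This is a Python illustration of the shape only; the real types are the .NET TagValue, QualityCode, and DateTimeOffset named in the table.

```python
# Python sketch of the common value type (the real types are .NET).
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional

class QualityCode(Enum):
    GOOD = "Good"
    BAD = "Bad"
    UNCERTAIN = "Uncertain"

@dataclass(frozen=True)
class TagValue:
    value: Optional[Any]    # maps to object? in the design
    quality: QualityCode
    timestamp: datetime     # UTC, maps to DateTimeOffset

# Before the first DCL update arrives, a data-sourced attribute is held
# at uncertain quality with no value:
initial = TagValue(None, QualityCode.UNCERTAIN, datetime.now(timezone.utc))
```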

Supported Protocols

OPC UA

  • Uses the OPC Foundation .NET Standard Library (OPCFoundation.NetStandard.Opc.Ua.Client).
  • Session-based connection with endpoint discovery, certificate handling, and configurable security modes.
  • Subscriptions via OPC UA Monitored Items with data change notifications (1000ms sampling, queue size 10, discard-oldest).
  • Read/Write via OPC UA Read/Write services with StatusCode-based quality mapping.
  • Disconnect detection via Session.KeepAlive event (see Disconnect Detection Pattern below).

Endpoint Redundancy

Data connections support an optional backup endpoint for automatic failover when the active endpoint becomes unreachable. Both endpoints use the same protocol.

Entity fields:

| Field | Type | Notes |
| --- | --- | --- |
| PrimaryConfiguration | string? (max 4000) | Required. Renamed from Configuration |
| BackupConfiguration | string? (max 4000) | Optional. Null = no backup |
| FailoverRetryCount | int (default 3) | Retries on active endpoint before switching |

Failover state machine:

Connected → disconnect → push bad quality → retry active endpoint (5s)
  → N failures (≥ FailoverRetryCount) → switch to other endpoint
    → dispose adapter, create fresh adapter with other config
    → reconnect → ReSubscribeAll → Connected
  • Round-robin: primary → backup → primary → backup. No preferred endpoint after first failover — the connection stays on whichever endpoint is working.
  • No auto-failback: The connection remains on the active endpoint until it fails.
  • Single-endpoint connections (no backup): Retry indefinitely on the same endpoint, preserving existing behavior.
  • Adapter lifecycle on failover: The actor disposes the current IDataConnection adapter and creates a fresh one via DataConnectionFactory.Create() with the other endpoint's configuration. Clean slate — no stale state.
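The failover decision rules above can be condensed into a small sketch. This is illustrative Python, not the actor implementation; FailoverTracker and its method names are invented for the example.

```python
# Sketch of the round-robin failover decision: FailoverRetryCount failures on
# the active endpoint switch to the other endpoint; single-endpoint connections
# retry the same endpoint indefinitely; no auto-failback after reconnect.
from typing import Optional

class FailoverTracker:
    def __init__(self, primary: str, backup: Optional[str], retry_count: int = 3):
        self.endpoints = [primary] + ([backup] if backup else [])
        self.retry_count = retry_count
        self.active = 0      # index of the active endpoint
        self.failures = 0

    @property
    def active_endpoint(self) -> str:
        return self.endpoints[self.active]

    def on_connect_failure(self) -> bool:
        """Record a failure; return True if we switched to the other endpoint."""
        self.failures += 1
        if len(self.endpoints) > 1 and self.failures >= self.retry_count:
            self.active = 1 - self.active   # primary <-> backup round-robin
            self.failures = 0
            return True
        return False

    def on_connected(self) -> None:
        self.failures = 0   # stay on whichever endpoint is working

tracker = FailoverTracker("opc.tcp://primary:4840", "opc.tcp://backup:4840", 3)
switched = [tracker.on_connect_failure() for _ in range(3)]
```

After three failures the tracker reports a switch and the actor would dispose the old adapter and create a fresh one for the backup configuration.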

Health reporting:

  • DataConnectionHealthReport includes ActiveEndpoint: "Primary", "Backup", or "Primary (no backup)".

Site event log entries:

  • DataConnectionFailover (Warning) — connection name, from-endpoint, to-endpoint, failure count.
  • DataConnectionRestored (Info) — connection name, active endpoint.

See 2026-03-22-primary-backup-data-connections-design.md for the full design.

Connection Configuration Reference

All settings are parsed from the data connection's configuration JSON (PrimaryConfiguration and the optional BackupConfiguration), each deserialized into an IDictionary&lt;string, string&gt; of connection details. Both endpoints use the same protocol-specific keys. Invalid numeric values fall back to defaults silently.
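The silent-fallback parsing rule can be sketched as a small helper. This is an illustrative Python sketch of the rule, not the actual parsing code; get_int is an invented name.

```python
# Sketch of the "invalid numeric values fall back to defaults silently" rule
# for the string-to-string connection-details dictionary.
def get_int(details: dict[str, str], key: str, default: int) -> int:
    try:
        return int(details[key])
    except (KeyError, ValueError):
        return default   # missing or malformed -> default, no error raised

details = {"PublishingIntervalMs": "500", "QueueSize": "not-a-number"}
publishing = get_int(details, "PublishingIntervalMs", 1000)   # parsed: 500
queue_size = get_int(details, "QueueSize", 10)                # falls back to 10
```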

OPC UA Settings

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint / EndpointUrl | string | opc.tcp://localhost:4840 | OPC UA server endpoint URL |
| SessionTimeoutMs | int | 60000 | OPC UA session timeout in milliseconds |
| OperationTimeoutMs | int | 15000 | Transport operation timeout in milliseconds |
| PublishingIntervalMs | int | 1000 | Subscription publishing interval in milliseconds |
| KeepAliveCount | int | 10 | Keep-alive frames before session timeout |
| LifetimeCount | int | 30 | Subscription lifetime in publish intervals |
| MaxNotificationsPerPublish | int | 100 | Max notifications batched per publish cycle |
| SamplingIntervalMs | int | 1000 | Per-item server sampling rate in milliseconds |
| QueueSize | int | 10 | Per-item notification buffer size |
| SecurityMode | string | None | Preferred endpoint security: None, Sign, or SignAndEncrypt |
| AutoAcceptUntrustedCerts | bool | true | Accept untrusted server certificates |

Shared Settings (appsettings.json)

These are configured via DataConnectionOptions in appsettings.json, not per-connection:

| Setting | Default | Description |
| --- | --- | --- |
| ReconnectInterval | 5s | Fixed interval between reconnection attempts |
| TagResolutionRetryInterval | 10s | Retry interval for unresolved tag paths |
| WriteTimeout | 30s | Timeout for write operations |

Subscription Management

  • When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
  • The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
  • Tag value updates are delivered directly to the requesting Instance Actor.
  • When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
  • When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.
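The register/cleanup lifecycle above can be sketched as simple bookkeeping. This is an illustrative Python sketch; SubscriptionRegistry and the actor/tag names are invented for the example.

```python
# Sketch of subscription bookkeeping per Instance Actor: register on actor
# creation, return all subscription ids for cleanup when the actor stops.
from collections import defaultdict

class SubscriptionRegistry:
    def __init__(self) -> None:
        self._by_actor: dict[str, list[int]] = defaultdict(list)
        self._next_id = 0

    def register(self, actor_id: str, tag_paths: list[str]) -> list[int]:
        ids = []
        for _ in tag_paths:
            self._next_id += 1
            ids.append(self._next_id)
        self._by_actor[actor_id].extend(ids)
        return ids

    def cleanup(self, actor_id: str) -> list[int]:
        """Subscription ids to unsubscribe for a stopped actor."""
        return self._by_actor.pop(actor_id, [])

registry = SubscriptionRegistry()
ids = registry.register("Instance-42", ["Pump1/Speed", "Pump1/Status"])
```

On redeployment, the old actor's cleanup runs first and the replacement actor registers fresh subscriptions from the new configuration.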

Write-Back Support

  • When a script calls Instance.SetAttribute for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
  • The DCL writes the value to the physical device via the appropriate protocol.
  • The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
  • The Instance Actor's in-memory value is not updated until the device confirms the write.
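The confirm-on-round-trip rule above can be sketched as follows. This is illustrative Python; InstanceAttribute and pending_write are invented names, not the actual Instance Actor implementation.

```python
# Sketch of the write-back rule: the cached value only changes when the
# subscription delivers the confirmed value back from the device.
from typing import Any, Callable

class InstanceAttribute:
    def __init__(self, value: Any = None):
        self.value = value
        self.pending_write: Any = None

    def set_attribute(self, new_value: Any, dcl_write: Callable[[Any], None]) -> None:
        self.pending_write = new_value
        dcl_write(new_value)            # forward to the DCL; do NOT update cache

    def on_value_update(self, confirmed: Any) -> None:
        self.value = confirmed          # device round-trip confirms the write
        self.pending_write = None

device: dict[str, Any] = {}
attr = InstanceAttribute(value=10)
attr.set_attribute(25, lambda v: device.update(speed=v))
cached_before_confirm = attr.value      # still 10: device has not confirmed
attr.on_value_update(device["speed"])   # subscription delivers the new value
```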

Value Update Message Format

Each value update delivered to an Instance Actor includes:

  • Tag path: The relative path of the attribute's data source reference.
  • Value: The new value from the device.
  • Quality: Data quality indicator (good, bad, uncertain).
  • Timestamp: When the value was read from the device.

Connection Actor Model

Each data connection is managed by a dedicated connection actor that uses the Akka.NET Become/Stash pattern to model its lifecycle as a state machine:

  • Connecting: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are stashed (buffered in the actor's stash).
  • Connected: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
  • Reconnecting: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.

This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.
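The Become/Stash behavior can be condensed into a language-agnostic sketch. This is illustrative Python, not Akka.NET; the real actor uses IWithUnboundedStash with Become, and the message strings here are invented.

```python
# Sketch of the Become/Stash lifecycle: messages arriving while Connecting are
# buffered (stashed) and replayed in order once the actor Becomes Connected.
class ConnectionActor:
    def __init__(self) -> None:
        self.state = "Connecting"
        self.stash: list[str] = []
        self.processed: list[str] = []

    def receive(self, msg: str) -> None:
        if msg == "connected":
            self.state = "Connected"
            while self.stash:                   # UnstashAll on entering Connected
                self.processed.append(self.stash.pop(0))
        elif self.state == "Connecting":
            self.stash.append(msg)              # Stash() while not yet connected
        else:
            self.processed.append(msg)          # serviced immediately

actor = ConnectionActor()
actor.receive("subscribe:Pump1/Speed")   # arrives before the connection is up
actor.receive("write:Pump1/Setpoint")
actor.receive("connected")               # Become(Connected) -> unstash all
```

No message is dropped: both buffered requests are serviced, in arrival order, as soon as the connection is established.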

OPC UA-specific notes: The RealOpcUaClient uses the OPC Foundation SDK's Session.KeepAlive event for proactive disconnect detection. The SDK sends keep-alive requests at the subscription's KeepAliveCount × PublishingInterval (default: 10s). When keep-alive fails, the ConnectionLost event fires, triggering the same reconnection flow. On reconnection, the DCL re-creates the OPC UA session and subscription, then re-adds all monitored items.

Connection Lifecycle & Reconnection

The DCL manages connection lifecycle automatically:

  1. Connection drop detection: When a connection to a data source is lost, the DCL immediately pushes a value update with quality bad for every tag subscribed on that connection. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
  2. Auto-reconnect with fixed interval: The DCL retries the connection at a fixed, configurable interval (ReconnectInterval, default 5 seconds, from the shared DataConnectionOptions settings). This is consistent with the fixed-interval retry philosophy used throughout the system. Individual protocol operations (reads, writes) fail immediately to the caller on error; there is no operation-level retry within the adapter.
  3. Connection state transitions: The DCL tracks each connection's state as connected, disconnected, or reconnecting. All transitions are logged to Site Event Logging.
  4. Transparent re-subscribe: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to good as fresh values arrive from restored subscriptions.
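Step 1 (the immediate bad-quality push) can be sketched as follows. This is illustrative Python; push_bad_quality and the quality string are invented stand-ins for the actual actor logic and QualityCode enum.

```python
# Sketch of the disconnect behavior: on connection loss, every subscribed tag
# immediately receives a bad-quality update so downstream consumers (alarms,
# scripts checking quality) see the staleness right away.
from typing import Callable

def push_bad_quality(subscriptions: dict[str, Callable]) -> list[tuple[str, str]]:
    delivered = []
    for tag_path, deliver in subscriptions.items():
        update = (tag_path, "Bad")   # quality flips to bad for every tag
        deliver(update)
        delivered.append(update)
    return delivered

seen: list[tuple[str, str]] = []
subs = {"Pump1/Speed": seen.append, "Pump1/Status": seen.append}
pushed = push_bad_quality(subs)
```

On reconnection the inverse happens implicitly: quality returns to good as fresh values arrive from the re-established subscriptions, with no explicit "good" push needed.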

Disconnect Detection Pattern

Each adapter implements the IDataConnection.Disconnected event to proactively signal connection loss to the DataConnectionActor. Detection uses two complementary paths:

Proactive detection (server goes offline between operations):

  • OPC UA: The OPC Foundation SDK fires Session.KeepAlive events at regular intervals. RealOpcUaClient hooks this event; when ServiceResult.IsBad(e.Status) (server unreachable, keep-alive timeout), it fires ConnectionLost. The OpcUaDataConnection adapter translates this into IDataConnection.Disconnected.

Reactive detection (failure discovered during an operation):

  • Adapters wrap ReadAsync (and by extension ReadBatchAsync) with exception handling. If a read throws a non-cancellation exception, the adapter calls RaiseDisconnected() and re-throws. The DataConnectionActor's existing error handling catches the exception while the disconnect event triggers the reconnection state machine.

Event marshalling: The DataConnectionActor subscribes to _adapter.Disconnected in PreStart(). Since Disconnected may fire from a background thread (e.g., the OPC UA keep-alive timer), the handler sends an AdapterDisconnected message to Self, marshalling the notification onto the actor's message loop. This triggers BecomeReconnecting() → bad quality push → retry timer.

Once-only guard: OpcUaDataConnection uses a volatile bool _disconnectFired flag to ensure RaiseDisconnected() fires exactly once per connection session. The flag resets on successful reconnection (ConnectAsync).
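The once-only guard can be sketched as follows. This is illustrative Python using a lock in place of the C# volatile bool; DisconnectGuard and its members are invented names.

```python
# Sketch of the once-only disconnect guard: the disconnect event fires at most
# once per connection session, and the flag re-arms on successful reconnect.
import threading

class DisconnectGuard:
    def __init__(self) -> None:
        self._lock = threading.Lock()   # stands in for the volatile bool flag
        self._fired = False
        self.events = 0                 # counts notifications actually raised

    def raise_disconnected(self) -> None:
        with self._lock:
            if self._fired:
                return                  # already signalled for this session
            self._fired = True
        self.events += 1                # notify subscribers exactly once

    def on_connected(self) -> None:
        with self._lock:
            self._fired = False         # re-arm for the new session

guard = DisconnectGuard()
guard.raise_disconnected()
guard.raise_disconnected()              # suppressed: same session
guard.on_connected()                    # reconnect resets the flag
guard.raise_disconnected()              # new session fires again
```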

Write Failure Handling

Writes to physical devices are synchronous from the script's perspective:

  • If the write fails (connection down, device rejection, timeout), the error is returned to the calling script. Script authors can catch and handle write errors (log, notify, retry, etc.).
  • Write failures are also logged to Site Event Logging.
  • There is no store-and-forward for device writes — these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.

Tag Path Resolution

When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):

  1. The failure is logged to Site Event Logging.
  2. The attribute is marked with quality bad.
  3. The DCL periodically retries resolution at a configurable interval, accommodating devices that come online in stages or load modules after startup.
  4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.

Note: Pre-deployment validation at central does not verify that tag paths resolve to real tags on physical devices — that is a runtime concern handled here.
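The retry loop in steps 1–4 can be condensed into a sketch. This is illustrative Python; retry_unresolved and the tag names are invented, and the device is modelled as a set of currently available tag paths.

```python
# Sketch of resolution retry: unresolved paths stay at bad quality and are
# retried each interval; paths that appear later (device modules loading
# after startup) resolve on a later pass.
def retry_unresolved(unresolved: set[str], device_tags: set[str]) -> tuple[set[str], set[str]]:
    """Return (resolved on this pass, still unresolved)."""
    resolved_now = {p for p in unresolved if p in device_tags}
    return resolved_now, unresolved - resolved_now

pending = {"Pump1/Speed", "Pump1/Pressure"}
device = {"Pump1/Speed"}                       # device still loading modules
first_pass, pending = retry_unresolved(pending, device)
device.add("Pump1/Pressure")                   # module comes online later
second_pass, pending = retry_unresolved(pending, device)
```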

Health Reporting

The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:

  • Connection status: connected, disconnected, or reconnecting per data connection.
  • Tag resolution counts: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.

Dependencies

  • Site Runtime (Instance Actors): Receives subscription registrations and delivers value updates. Receives write requests.
  • Health Monitoring: Reports connection status.
  • Site Event Logging: Logs connection status changes.

Interactions

  • Site Runtime (Instance Actors): Bidirectional — delivers value updates, receives subscription registrations and write-back commands.
  • Health Monitoring: Reports connection health periodically.
  • Site Event Logging: Logs connection/disconnection events.