# Component: Data Connection Layer

## Purpose

The Data Connection Layer (DCL) provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a **clean data pipe**: it performs no evaluation of triggers, alarm conditions, or business logic.

## Location

Site clusters only. Central does not interact with machines directly.

## Responsibilities

- Manage data connections (OPC UA servers) that are defined centrally and deployed to sites as part of artifact deployment. Data connection definitions are stored in local SQLite after deployment.
- Establish and maintain connections to data sources based on deployed instance configurations.
- Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
- Deliver tag value updates to the requesting Instance Actors.
- Support writing values to machines (when Instance Actors forward `SetAttribute` write requests for data-connected attributes).
- Report data connection health status to the Health Monitoring component.

## Common Interface

All protocol adapters implement the same interface:

```
IDataConnection : IAsyncDisposable
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── ReadBatch(tagPaths) → values
├── Write(tagPath, value) → void
├── WriteBatch(values) → void
├── WriteBatchAndWait(values, flagPath, flagValue, responsePath, responseValue, timeout) → bool
├── Status → ConnectionHealth
└── Disconnected → event Action?
```

The `Disconnected` event is raised by an adapter when it detects an unexpected connection loss (server offline, network failure, keep-alive timeout). The `DataConnectionActor` subscribes to this event to trigger the reconnection state machine. Additional protocols can be added by implementing this interface.

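To make the shape of the contract concrete, the following is a minimal, language-neutral sketch in Python of a subset of the members listed above, together with a hypothetical in-memory adapter. The class names `DataConnection` and `InMemoryConnection`, the `(tag_path, value)` callback signature, and the `sub-N` subscription ids are assumptions made for illustration; they are not the actual .NET types.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Tuple

Callback = Callable[[str, Any], None]  # receives (tag_path, value)


class DataConnection(ABC):
    """Subset of the IDataConnection members, rendered as a Python ABC."""

    @abstractmethod
    def connect(self, connection_details: Dict[str, str]) -> None: ...

    @abstractmethod
    def subscribe(self, tag_path: str, callback: Callback) -> str: ...

    @abstractmethod
    def unsubscribe(self, subscription_id: str) -> None: ...

    @abstractmethod
    def read(self, tag_path: str) -> Any: ...

    @abstractmethod
    def write(self, tag_path: str, value: Any) -> None: ...


class InMemoryConnection(DataConnection):
    """Hypothetical adapter: tags live in a dict instead of on a device."""

    def __init__(self) -> None:
        self._tags: Dict[str, Any] = {}
        self._subs: Dict[str, Tuple[str, Callback]] = {}
        self._ids = itertools.count(1)
        self.connected = False

    def connect(self, connection_details: Dict[str, str]) -> None:
        self.connected = True

    def subscribe(self, tag_path: str, callback: Callback) -> str:
        subscription_id = f"sub-{next(self._ids)}"
        self._subs[subscription_id] = (tag_path, callback)
        return subscription_id

    def unsubscribe(self, subscription_id: str) -> None:
        self._subs.pop(subscription_id, None)

    def read(self, tag_path: str) -> Any:
        return self._tags.get(tag_path)

    def write(self, tag_path: str, value: Any) -> None:
        self._tags[tag_path] = value
        # Notify every subscriber of this tag, as a data change would.
        for path, callback in list(self._subs.values()):
            if path == tag_path:
                callback(tag_path, value)
```

An adapter of this kind can stand in for a real protocol in tests, because callers only ever see the common interface.
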
### Common Value Type

All protocols produce the same value tuple consumed by Instance Actors. Before the first value update arrives from the DCL, data-sourced attributes are held at **uncertain** quality by the Instance Actor (see Site Runtime, Initialization):

| Concept | ScadaLink Design |
|---|---|
| Value container | `TagValue(Value, Quality, Timestamp)` |
| Quality | `QualityCode` enum: Good / Bad / Uncertain |
| Timestamp | `DateTimeOffset` (UTC) |
| Value type | `object?` |

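As an illustration of the tuple in the table, here is a small Python sketch of the value container. The field names mirror `TagValue(Value, Quality, Timestamp)`; the `initial_value` helper and its choice of `None` as the placeholder value are assumptions, since the source only states that quality is uncertain before the first update.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class QualityCode(Enum):
    GOOD = "Good"
    BAD = "Bad"
    UNCERTAIN = "Uncertain"


@dataclass(frozen=True)
class TagValue:
    value: Any               # counterpart of object?
    quality: QualityCode
    timestamp: datetime      # timezone-aware, UTC


def initial_value() -> TagValue:
    """Placeholder held before the first DCL update arrives (assumed shape)."""
    return TagValue(None, QualityCode.UNCERTAIN, datetime.now(timezone.utc))
```
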
## Supported Protocols

### OPC UA
- Uses the **OPC Foundation .NET Standard Library** (`OPCFoundation.NetStandard.Opc.Ua.Client`).
- Session-based connection with endpoint discovery, certificate handling, and configurable security modes.
- Subscriptions via OPC UA Monitored Items with data change notifications (1000ms sampling, queue size 10, discard-oldest).
- Read/Write via OPC UA Read/Write services with StatusCode-based quality mapping.
- Disconnect detection via `Session.KeepAlive` event (see Disconnect Detection Pattern below).

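One plausible way to implement StatusCode-based quality mapping is to look at the severity bits of the 32-bit OPC UA StatusCode, where the top two bits encode Good (`00`), Uncertain (`01`), or Bad (`10`). The sketch below assumes that approach; the function name is hypothetical, and the adapter may apply a different or finer-grained mapping.

```python
def quality_from_status_code(status_code: int) -> str:
    """Map a 32-bit OPC UA StatusCode to Good / Uncertain / Bad via its severity bits."""
    severity = (status_code >> 30) & 0b11
    if severity == 0b00:
        return "Good"
    if severity == 0b01:
        return "Uncertain"
    # 0b10 is Bad; 0b11 is reserved and is treated as Bad in this sketch.
    return "Bad"
```
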
## Endpoint Redundancy

Data connections support an optional backup endpoint for automatic failover when the active endpoint becomes unreachable. Both endpoints use the same protocol.

**Entity fields:**

| Field | Type | Notes |
|-------|------|-------|
| `PrimaryConfiguration` | string? (max 4000) | Required. Renamed from `Configuration` |
| `BackupConfiguration` | string? (max 4000) | Optional. Null = no backup |
| `FailoverRetryCount` | int (default 3) | Retries on active endpoint before switching |

**Failover state machine:**

```
Connected → disconnect → push bad quality → retry active endpoint (5s)
  → N failures (≥ FailoverRetryCount) → switch to other endpoint
  → dispose adapter, create fresh adapter with other config
  → reconnect → ReSubscribeAll → Connected
```

- **Round-robin**: primary → backup → primary → backup. There is no preferred endpoint after the first failover; the connection stays on whichever endpoint is working.
- **No auto-failback**: The connection remains on the active endpoint until it fails.
- **Single-endpoint connections** (no backup): Retry indefinitely on the same endpoint, preserving existing behavior.
- **Adapter lifecycle on failover**: The actor disposes the current `IDataConnection` adapter and creates a fresh one via `DataConnectionFactory.Create()` with the other endpoint's configuration. This gives a clean slate with no stale state.

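The endpoint-selection rules above can be sketched as a small counter-based selector. This is a simplified Python model under stated assumptions: the class name `EndpointSelector` is hypothetical, and resetting the failure count after a switch or a successful connect is an assumption that the source does not spell out.

```python
from typing import Optional


class EndpointSelector:
    """Tracks which endpoint is active and when to fail over."""

    def __init__(self, primary: str, backup: Optional[str] = None,
                 failover_retry_count: int = 3) -> None:
        self._configs = [primary] if backup is None else [primary, backup]
        self._active = 0
        self._failures = 0
        self._retry_count = failover_retry_count

    @property
    def active_config(self) -> str:
        return self._configs[self._active]

    @property
    def active_endpoint(self) -> str:
        if len(self._configs) == 1:
            return "Primary (no backup)"
        return "Primary" if self._active == 0 else "Backup"

    def on_connect_failed(self) -> bool:
        """Record a failed retry; return True when the caller should switch adapters."""
        if len(self._configs) == 1:
            return False  # single endpoint: keep retrying the same one
        self._failures += 1
        if self._failures >= self._retry_count:
            self._active = (self._active + 1) % 2  # round-robin, no preferred side
            self._failures = 0                     # assumed: count restarts after a switch
            return True
        return False

    def on_connected(self) -> None:
        self._failures = 0  # stay on this endpoint until it fails (no auto-failback)
```

When `on_connect_failed` returns `True`, the caller would dispose the current adapter and create a fresh one from `active_config`.
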
**Health reporting:**

- `DataConnectionHealthReport` includes `ActiveEndpoint`: `"Primary"`, `"Backup"`, or `"Primary (no backup)"`.

**Site event log entries:**

- `DataConnectionFailover` (Warning): connection name, from-endpoint, to-endpoint, failure count.
- `DataConnectionRestored` (Info): connection name, active endpoint.

See [`2026-03-22-primary-backup-data-connections-design.md`](../plans/2026-03-22-primary-backup-data-connections-design.md) for the full design.

## Connection Configuration Reference

All settings are parsed from the data connection's configuration JSON dictionaries (`PrimaryConfiguration` and optional `BackupConfiguration`, stored as `IDictionary<string, string>` connection details). Both endpoints use the same protocol-specific keys. Invalid numeric values fall back to defaults silently.

### OPC UA Settings

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `endpoint` / `EndpointUrl` | string | `opc.tcp://localhost:4840` | OPC UA server endpoint URL |
| `SessionTimeoutMs` | int | `60000` | OPC UA session timeout in milliseconds |
| `OperationTimeoutMs` | int | `15000` | Transport operation timeout in milliseconds |
| `PublishingIntervalMs` | int | `1000` | Subscription publishing interval in milliseconds |
| `KeepAliveCount` | int | `10` | Keep-alive frames before session timeout |
| `LifetimeCount` | int | `30` | Subscription lifetime in publish intervals |
| `MaxNotificationsPerPublish` | int | `100` | Max notifications batched per publish cycle |
| `SamplingIntervalMs` | int | `1000` | Per-item server sampling rate in milliseconds |
| `QueueSize` | int | `10` | Per-item notification buffer size |
| `SecurityMode` | string | `None` | Preferred endpoint security: `None`, `Sign`, or `SignAndEncrypt` |
| `AutoAcceptUntrustedCerts` | bool | `true` | Accept untrusted server certificates |

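The silent-fallback rule for numeric keys can be illustrated with a short parsing sketch. It is written in Python for brevity and covers only the endpoint and integer keys from the table. The function name is hypothetical, and the precedence of `endpoint` over `EndpointUrl` when both are present is an assumption.

```python
from typing import Dict, Mapping, Union

INT_DEFAULTS: Dict[str, int] = {
    "SessionTimeoutMs": 60000,
    "OperationTimeoutMs": 15000,
    "PublishingIntervalMs": 1000,
    "KeepAliveCount": 10,
    "LifetimeCount": 30,
    "MaxNotificationsPerPublish": 100,
    "SamplingIntervalMs": 1000,
    "QueueSize": 10,
}
DEFAULT_ENDPOINT = "opc.tcp://localhost:4840"


def parse_opcua_settings(details: Mapping[str, str]) -> Dict[str, Union[int, str]]:
    """Read the string dictionary and apply table defaults where needed."""
    settings: Dict[str, Union[int, str]] = {}
    # Assumption: `endpoint` wins when both spellings are present.
    settings["EndpointUrl"] = (
        details.get("endpoint") or details.get("EndpointUrl") or DEFAULT_ENDPOINT
    )
    for key, default in INT_DEFAULTS.items():
        try:
            settings[key] = int(details[key])
        except (KeyError, ValueError):
            settings[key] = default  # missing or invalid: fall back without raising
    return settings
```

For example, a dictionary containing `"QueueSize": "abc"` yields the default queue size of 10 rather than an error, and with all defaults `KeepAliveCount × PublishingIntervalMs` works out to 10 × 1000 ms = 10 s.
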
### Shared Settings (appsettings.json)

These are configured via `DataConnectionOptions` in `appsettings.json`, not per-connection:

| Setting | Default | Description |
|---------|---------|-------------|
| `ReconnectInterval` | 5s | Fixed interval between reconnection attempts |
| `TagResolutionRetryInterval` | 10s | Retry interval for unresolved tag paths |
| `WriteTimeout` | 30s | Timeout for write operations |

## Subscription Management

- When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
- The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
- Tag value updates are delivered directly to the requesting Instance Actor.
- When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
- When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.

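The register and clean-up behaviour described above can be modelled as a registry keyed by instance. The following Python sketch is illustrative only: `SubscriptionRegistry` is a hypothetical name, and sharing one device subscription between several instances that reference the same tag is an assumption, not something the source states.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class SubscriptionRegistry:
    """Tracks which instances need which tag paths."""

    def __init__(self) -> None:
        self._tags_by_instance: Dict[str, Set[str]] = defaultdict(set)
        self._instances_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def register(self, instance_id: str, tag_paths: Iterable[str]) -> Set[str]:
        """Return the tag paths that now need a new device subscription."""
        newly_needed: Set[str] = set()
        for tag in tag_paths:
            if not self._instances_by_tag[tag]:
                newly_needed.add(tag)
            self._instances_by_tag[tag].add(instance_id)
            self._tags_by_instance[instance_id].add(tag)
        return newly_needed

    def unregister(self, instance_id: str) -> Set[str]:
        """Return the tag paths whose device subscription can be dropped."""
        released: Set[str] = set()
        for tag in self._tags_by_instance.pop(instance_id, set()):
            self._instances_by_tag[tag].discard(instance_id)
            if not self._instances_by_tag[tag]:
                del self._instances_by_tag[tag]
                released.add(tag)
        return released
```

A redeployment then amounts to `unregister` for the stopped actor followed by `register` for the new one with the new configuration's tag paths.
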
## Write-Back Support

- When a script calls `Instance.SetAttribute` for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
- The DCL writes the value to the physical device via the appropriate protocol.
- The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
- The Instance Actor's in-memory value is **not** updated until the device confirms the write.

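The key property of the write-back flow is that a write never changes the in-memory value directly; only the subscription update does. A toy Python model of that round trip is shown below. `FakeDevice` and `DataSourcedAttribute` are hypothetical stand-ins for the device and the Instance Actor's attribute state.

```python
from typing import Any, Callable, Optional


class FakeDevice:
    """Stand-in for a device: accepts writes, reports them on the next publish."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self._pending: Optional[Any] = None
        self.listener: Optional[Callable[[Any], None]] = None

    def write(self, value: Any) -> None:
        self._pending = value  # accepted, but not yet reported to subscribers

    def publish(self) -> None:
        """Simulate the next data change notification."""
        if self._pending is not None:
            self.value, self._pending = self._pending, None
            if self.listener is not None:
                self.listener(self.value)


class DataSourcedAttribute:
    """Holds the in-memory value of an attribute with a data source reference."""

    def __init__(self, device: FakeDevice) -> None:
        self.value = device.value
        self._device = device
        device.listener = self._on_value_update

    def set(self, value: Any) -> None:
        self._device.write(value)  # note: self.value is deliberately untouched

    def _on_value_update(self, value: Any) -> None:
        self.value = value  # only the subscription path updates the value
```
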
## Value Update Message Format

Each value update delivered to an Instance Actor includes:
- **Tag path**: The relative path of the attribute's data source reference.
- **Value**: The new value from the device.
- **Quality**: Data quality indicator (good, bad, uncertain).
- **Timestamp**: When the value was read from the device.

## Connection Actor Model

Each data connection is managed by a dedicated connection actor that uses the Akka.NET **Become/Stash** pattern to model its lifecycle as a state machine:

- **Connecting**: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are **stashed** (buffered in the actor's stash).
- **Connected**: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
- **Reconnecting**: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.

This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.

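The three states can be modelled without an actor framework to show why no request is lost. The Python sketch below is a simplification of Become/Stash, not Akka.NET code: the class name and the string messages `"Connected"` and `"ConnectionLost"` are hypothetical, and stashed messages are replayed directly instead of being prepended to a mailbox.

```python
from collections import deque
from typing import Deque, List


class ConnectionActorSketch:
    """Buffers requests while not connected and replays them on connect."""

    def __init__(self) -> None:
        self.state = "Connecting"
        self._stash: Deque[str] = deque()
        self.processed: List[str] = []

    def receive(self, message: str) -> None:
        if message == "Connected":
            self.state = "Connected"
            self._unstash_all()
        elif message == "ConnectionLost":
            self.state = "Reconnecting"
        elif self.state == "Connected":
            self.processed.append(message)   # service the request now
        else:
            self._stash.append(message)      # Connecting / Reconnecting: buffer it

    def _unstash_all(self) -> None:
        # Replay buffered requests in arrival order.
        while self._stash:
            self.processed.append(self._stash.popleft())
```
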
**OPC UA-specific notes**: The `RealOpcUaClient` uses the OPC Foundation SDK's `Session.KeepAlive` event for proactive disconnect detection. The SDK sends keep-alive requests at an interval of the subscription's `KeepAliveCount × PublishingInterval` (10 × 1000 ms = 10 s with the defaults). When a keep-alive fails, the `ConnectionLost` event fires, triggering the same reconnection flow. On reconnection, the DCL re-creates the OPC UA session and subscription, then re-adds all monitored items.

## Connection Lifecycle & Reconnection

The DCL manages connection lifecycle automatically:

1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. Individual OPC UA operations (reads, writes) fail immediately to the caller on error; there is no operation-level retry within the adapter.
3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging.
4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action; they simply see quality return to `good` as fresh values arrive from restored subscriptions.

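Step 1 above, the bad-quality fan-out on connection loss, can be sketched as a pure function over the connection's subscription table. This Python example is illustrative: the function name and tuple layout are hypothetical, and sending `None` as the value alongside bad quality is an assumption, since the source does not say which value accompanies the update.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# (subscriber, tag_path, value, quality, timestamp)
Update = Tuple[str, str, object, str, datetime]


def bad_quality_updates(subscribers_by_tag: Dict[str, List[str]]) -> List[Update]:
    """Build one bad-quality update per subscribed tag and subscriber."""
    now = datetime.now(timezone.utc)
    updates: List[Update] = []
    for tag_path, subscribers in subscribers_by_tag.items():
        for subscriber in subscribers:
            # Assumption: no value is carried with the bad-quality update.
            updates.append((subscriber, tag_path, None, "Bad", now))
    return updates
```
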
### Disconnect Detection Pattern

Each adapter implements the `IDataConnection.Disconnected` event to proactively signal connection loss to the `DataConnectionActor`. Detection uses two complementary paths:

**Proactive detection** (server goes offline between operations):
- **OPC UA**: The OPC Foundation SDK fires `Session.KeepAlive` events at regular intervals. `RealOpcUaClient` hooks this event; when `ServiceResult.IsBad(e.Status)` (server unreachable, keep-alive timeout), it fires `ConnectionLost`. The `OpcUaDataConnection` adapter translates this into `IDataConnection.Disconnected`.

**Reactive detection** (failure discovered during an operation):
- The OPC UA adapter wraps `ReadAsync` (and by extension `ReadBatchAsync`) with exception handling. If a read throws a non-cancellation exception, the adapter calls `RaiseDisconnected()` and re-throws. The `DataConnectionActor`'s existing error handling catches the exception while the disconnect event triggers the reconnection state machine.

**Event marshalling**: The `DataConnectionActor` subscribes to `_adapter.Disconnected` in `PreStart()`. Since `Disconnected` may fire from a background thread (for example, the OPC UA keep-alive timer), the handler sends an `AdapterDisconnected` message to `Self`, marshalling the notification onto the actor's message loop. This triggers `BecomeReconnecting()` → bad quality push → retry timer.

**Once-only guard**: `OpcUaDataConnection` uses a `volatile bool _disconnectFired` flag to ensure `RaiseDisconnected()` fires exactly once per connection session. The flag resets on successful reconnection (`ConnectAsync`).

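The once-only guard and the marshalling step can be combined in a small sketch. The Python version below swaps in a `threading.Lock` for the `volatile bool` flag and a `queue.Queue` for the actor mailbox, so it is an analogue of the pattern and not the adapter's actual code; `DisconnectNotifier` is a hypothetical name.

```python
import queue
import threading


class DisconnectNotifier:
    """Posts at most one AdapterDisconnected message per connection session."""

    def __init__(self, mailbox: "queue.Queue[str]") -> None:
        self._mailbox = mailbox
        self._lock = threading.Lock()
        self._fired = False

    def raise_disconnected(self) -> None:
        # May be called from any thread (keep-alive timer, failed read, ...).
        with self._lock:
            if self._fired:
                return
            self._fired = True
        # Hand the notification to the single consumer loop.
        self._mailbox.put("AdapterDisconnected")

    def reset(self) -> None:
        """Call after a successful reconnect to re-arm the guard."""
        with self._lock:
            self._fired = False
```

Because both the proactive and the reactive path call the same method, the consumer sees a single disconnect message even when a keep-alive failure and a failed read occur at nearly the same time.
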
## Write Failure Handling

Writes to physical devices are **synchronous** from the script's perspective:

- If the write fails (connection down, device rejection, timeout), the error is **returned to the calling script**. Script authors can catch and handle write errors (log, notify, retry, etc.).
- Write failures are also logged to Site Event Logging.
- There is **no store-and-forward for device writes** because these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.

## Tag Path Resolution

When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):

1. The failure is **logged to Site Event Logging**.
2. The attribute is marked with quality `bad`.
3. The DCL **periodically retries resolution** at a configurable interval, accommodating devices that come online in stages or load modules after startup.
4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.

Note: Pre-deployment validation at central does **not** verify that tag paths resolve to real tags on physical devices; that is a runtime concern handled here.

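The resolution steps above can be sketched as a resolver that keeps an unresolved set and re-probes it on each retry tick. This is a Python illustration under stated assumptions: `TagResolver` and the `tag_exists` probe are hypothetical, and the timer that would call `retry_unresolved` at the configured interval is left out.

```python
from typing import Callable, List, Set, Tuple


class TagResolver:
    """Tracks resolved and unresolved tag paths for one connection."""

    def __init__(self, tag_exists: Callable[[str], bool]) -> None:
        self._tag_exists = tag_exists   # hypothetical probe against the device
        self.resolved: Set[str] = set()
        self.unresolved: Set[str] = set()
        self.event_log: List[str] = []

    def subscribe(self, tag_path: str) -> bool:
        """Return True if the tag resolved; otherwise log it and queue a retry."""
        if self._tag_exists(tag_path):
            self.resolved.add(tag_path)
            return True
        self.unresolved.add(tag_path)   # attribute would be marked bad here
        self.event_log.append(f"unresolved tag path: {tag_path}")
        return False

    def retry_unresolved(self) -> Set[str]:
        """One retry tick: return the tag paths that resolved this time."""
        newly_resolved = {t for t in self.unresolved if self._tag_exists(t)}
        self.unresolved -= newly_resolved
        self.resolved |= newly_resolved
        return newly_resolved

    def counts(self) -> Tuple[int, int]:
        """(total subscribed, successfully resolved), as used for health reporting."""
        return len(self.resolved) + len(self.unresolved), len(self.resolved)
```
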
## Health Reporting

The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:

- **Connection status**: `connected`, `disconnected`, or `reconnecting` per data connection.
- **Tag resolution counts**: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.

## Dependencies

- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests.
- **Health Monitoring**: Reports connection status.
- **Site Event Logging**: Logs connection status changes.

## Interactions

- **Site Runtime (Instance Actors)**: Bidirectional: delivers value updates, receives subscription registrations and write-back commands.
- **Health Monitoring**: Reports connection health periodically.
- **Site Event Logging**: Logs connection/disconnection events.