# Component: Data Connection Layer

## Purpose

The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a **clean data pipe** — it performs no evaluation of triggers, alarm conditions, or business logic.

## Location

Site clusters only. Central does not interact with machines directly.

## Responsibilities

- Manage data connections defined centrally and deployed to sites as part of artifact deployment (OPC UA servers, LmxProxy endpoints). Data connection definitions are stored in local SQLite after deployment.
- Establish and maintain connections to data sources based on deployed instance configurations.
- Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
- Deliver tag value updates to the requesting Instance Actors.
- Support writing values to machines (when Instance Actors forward `SetAttribute` write requests for data-connected attributes).
- Report data connection health status to the Health Monitoring component.

## Common Interface

Both OPC UA and LmxProxy implement the same interface:

```
IDataConnection : IAsyncDisposable
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── ReadBatch(tagPaths) → values
├── Write(tagPath, value) → void
├── WriteBatch(values) → void
├── WriteBatchAndWait(values, flagPath, flagValue, responsePath, responseValue, timeout) → bool
└── Status → ConnectionHealth
```
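
The tree above could be rendered as a C# interface roughly like this — a sketch only; parameter and return types (`ConnectionDetails`, `Guid` subscription IDs, the async signatures) are illustrative assumptions, not the shipped contract:

```csharp
// Sketch of IDataConnection; types and async signatures are assumptions
// inferred from the tree above, not the actual contract.
public interface IDataConnection : IAsyncDisposable
{
    Task ConnectAsync(ConnectionDetails connectionDetails);
    Task DisconnectAsync();
    Task<Guid> SubscribeAsync(string tagPath, Action<Vtq> callback);
    Task UnsubscribeAsync(Guid subscriptionId);
    Task<Vtq> ReadAsync(string tagPath);
    Task<IDictionary<string, Vtq>> ReadBatchAsync(IEnumerable<string> tagPaths);
    Task WriteAsync(string tagPath, object? value);
    Task WriteBatchAsync(IDictionary<string, object?> values);
    Task<bool> WriteBatchAndWaitAsync(
        IDictionary<string, object?> values,
        string flagPath, object? flagValue,
        string responsePath, object? responseValue,
        TimeSpan timeout);
    ConnectionHealth Status { get; }
}
```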

Additional protocols can be added by implementing this interface.

### Concrete Type Mappings

| IDataConnection | OPC UA SDK | LmxProxy SDK (`LmxProxyClient`) |
|---|---|---|
| `Connect()` | OPC UA session establishment | `ConnectAsync()` → gRPC `ConnectRequest`, server returns `SessionId` |
| `Disconnect()` | Close OPC UA session | `DisconnectAsync()` → gRPC `DisconnectRequest` |
| `Subscribe(tagPath, callback)` | OPC UA Monitored Items | `SubscribeAsync(addresses, onUpdate)` → server-streaming gRPC (`IAsyncEnumerable<VtqMessage>`) |
| `Unsubscribe(id)` | Remove Monitored Item | `ISubscription.DisposeAsync()` (cancels streaming RPC) |
| `Read(tagPath)` | OPC UA Read | `ReadAsync(address)` → `Vtq` |
| `ReadBatch(tagPaths)` | OPC UA Read (multiple nodes) | `ReadBatchAsync(addresses)` → `IDictionary<string, Vtq>` |
| `Write(tagPath, value)` | OPC UA Write | `WriteAsync(address, value)` |
| `WriteBatch(values)` | OPC UA Write (multiple nodes) | `WriteBatchAsync(values)` |
| `WriteBatchAndWait(...)` | OPC UA Write + poll for confirmation | `WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, timeout)` |
| `Status` | OPC UA session state | `IsConnected` property + keep-alive heartbeat (30-second interval via `GetConnectionStateAsync`) |

### Common Value Type

Both protocols produce the same value tuple consumed by Instance Actors. Before the first value update arrives from the DCL, data-sourced attributes are held at **uncertain** quality by the Instance Actor (see Site Runtime — Initialization):

| Concept | ScadaLink Design | LmxProxy SDK (`Vtq`) |
|---|---|---|
| Value container | `{value, quality, timestamp}` | `Vtq(Value, Timestamp, Quality)` — readonly record struct |
| Quality | good / bad / uncertain | `Quality` enum (byte, OPC UA compatible: Good=0xC0, Bad=0x00, Uncertain=0x40) |
| Timestamp | UTC | `DateTime` (UTC) |
| Value type | object | `object?` (parsed: double, bool, string) |

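
In C#, the value tuple described in the table above could look like this (a sketch following the table; the byte values mirror the OPC UA-compatible quality codes listed there):

```csharp
// Sketch of the common value type, following the table above.
// Quality byte values are OPC UA compatible: Good=0xC0, Uncertain=0x40, Bad=0x00.
public enum Quality : byte
{
    Bad = 0x00,
    Uncertain = 0x40,
    Good = 0xC0,
}

// Readonly record struct, as described for the LmxProxy SDK's Vtq.
public readonly record struct Vtq(object? Value, DateTime Timestamp, Quality Quality);
```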
## Supported Protocols

### OPC UA

- Standard OPC UA client implementation.
- Supports subscriptions (monitored items) and read/write operations.

### LmxProxy (Custom Protocol)

LmxProxy is a gRPC-based protocol for communicating with LMX data servers. An existing client SDK (`LmxProxyClient` NuGet package) provides a production-ready implementation.

**Transport & Connection**:

- gRPC over HTTP/2, using protobuf-net code-first contracts (service: `scada.ScadaService`).
- Default port: **5050**.
- Session-based: `ConnectAsync` returns a `SessionId` used for all subsequent operations.
- Keep-alive: 30-second heartbeat via `GetConnectionStateAsync`. On failure, the client marks itself disconnected and disposes subscriptions.

**Authentication & TLS**:

- API key-based authentication (sent in `ConnectRequest`).
- Full TLS support: TLS 1.2/1.3, mutual TLS (client cert + key in PEM), custom CA trust, self-signed cert allowance for dev.

**Subscriptions**:

- Server-streaming gRPC (`IAsyncEnumerable<VtqMessage>`).
- Configurable sampling interval (default: 1000ms; 0 = on-change).
- Wire format: `VtqMessage { Tag, Value (string), TimestampUtcTicks (long), Quality (string: "Good"/"Uncertain"/"Bad") }`.
- Subscription disposed via `ISubscription.DisposeAsync()`.
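
As a protobuf-net code-first contract, the wire format above might be declared roughly as follows — field numbers and attribute placement are illustrative assumptions, not the SDK's actual declaration:

```csharp
using ProtoBuf;

// Illustrative protobuf-net contract for the VtqMessage wire format
// described above; field numbers are assumptions.
[ProtoContract]
public class VtqMessage
{
    [ProtoMember(1)] public string Tag { get; set; } = "";
    [ProtoMember(2)] public string Value { get; set; } = "";
    [ProtoMember(3)] public long TimestampUtcTicks { get; set; }
    [ProtoMember(4)] public string Quality { get; set; } = "Good"; // "Good" | "Uncertain" | "Bad"
}
```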

**Additional Capabilities (beyond IDataConnection)**:

- Built-in retry policy via Polly: exponential backoff (base delay × 2^attempt), configurable max attempts (default: 3), applied to reads. Transient errors: `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`.
- Operation metrics: count, errors, p95/p99 latency (ring buffer of last 1000 samples per operation).
- Correlation ID propagation for distributed tracing (configurable header name).
- DI integration: `AddLmxProxyClient(IConfiguration)` binds to `"LmxProxy"` config section in `appsettings.json`.

**SDK Reference**: The client SDK source is at `LmxProxyClient` in the ScadaBridge repository. The DCL's LmxProxy adapter wraps this SDK behind the `IDataConnection` interface.

## Subscription Management

- When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
- The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
- Tag value updates are delivered directly to the requesting Instance Actor.
- When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
- When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.
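
The registration handshake above might be modeled with actor messages along these lines (message and member names are hypothetical, for illustration only):

```csharp
using Akka.Actor;

// Hypothetical messages for the subscription lifecycle described above.

// Sent by an Instance Actor when it is created: registers its data source
// references (tag paths from the flattened configuration) with the DCL.
public sealed record RegisterDataSources(
    string InstanceId,
    IReadOnlyList<string> TagPaths,
    IActorRef Subscriber); // the Instance Actor that receives value updates

// Sent when the Instance Actor stops (disable, delete, redeployment);
// the DCL cleans up the associated subscriptions.
public sealed record UnregisterDataSources(string InstanceId);
```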

## Write-Back Support

- When a script calls `Instance.SetAttribute` for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
- The DCL writes the value to the physical device via the appropriate protocol.
- The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
- The Instance Actor's in-memory value is **not** updated until the device confirms the write.
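
The confirm-before-update rule could look like this inside the Instance Actor — a sketch under assumed message and field names (`WriteTag`, `_dataSourceRefs`, `_values`), not the shipped implementation:

```csharp
// Sketch: the write goes out to the DCL, but the cached value only changes
// when the subscription echoes the confirmed value back from the device.
private void OnSetAttribute(SetAttribute msg)
{
    if (_dataSourceRefs.TryGetValue(msg.AttributeName, out var tagPath))
    {
        // Data-connected attribute: forward to the DCL; do NOT touch _values.
        _dataConnectionLayer.Tell(new WriteTag(tagPath, msg.Value));
    }
    else
    {
        // Plain attribute: update the in-memory value directly.
        _values[msg.AttributeName] = msg.Value;
    }
}

private void OnTagValueUpdate(TagValueUpdate update)
{
    // Confirmed value arrives via the standard subscription path.
    _values[_attributeByTag[update.TagPath]] = update.Value;
}
```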

## Value Update Message Format

Each value update delivered to an Instance Actor includes:

- **Tag path**: The relative path of the attribute's data source reference.
- **Value**: The new value from the device.
- **Quality**: Data quality indicator (good, bad, uncertain).
- **Timestamp**: When the value was read from the device.
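
As a C# record, the four fields above could be carried like this (the record and member names are assumed for illustration):

```csharp
// Sketch of the value update delivered to an Instance Actor; names assumed.
public sealed record TagValueUpdate(
    string TagPath,        // relative path of the attribute's data source reference
    object? Value,         // new value from the device
    Quality Quality,       // good / bad / uncertain
    DateTime TimestampUtc  // when the value was read from the device
);
```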

## Connection Actor Model

Each data connection is managed by a dedicated connection actor that uses the Akka.NET **Become/Stash** pattern to model its lifecycle as a state machine:

- **Connecting**: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are **stashed** (buffered in the actor's stash).
- **Connected**: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
- **Reconnecting**: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.

This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.
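
A minimal Akka.NET skeleton of the Become/Stash pattern described above, with illustrative message types (`ConnectionEstablished`, `ConnectionLost`, `SubscribeTag`, `WriteTag` are assumptions):

```csharp
using Akka.Actor;

// Sketch of a connection actor's state machine; message types and the
// I/O details are illustrative, not the shipped implementation.
public class ConnectionActor : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; } = null!;

    public ConnectionActor()
    {
        Connecting();
    }

    private void Connecting()
    {
        Receive<ConnectionEstablished>(_ =>
        {
            Become(Connected);
            Stash.UnstashAll(); // replay buffered subscribe/write requests
        });
        // Buffer requests that arrive before the connection is up.
        Receive<SubscribeTag>(_ => Stash.Stash());
        Receive<WriteTag>(_ => Stash.Stash());
    }

    private void Connected()
    {
        Receive<SubscribeTag>(msg => { /* subscribe via IDataConnection */ });
        Receive<WriteTag>(msg => { /* write via IDataConnection */ });
        Receive<ConnectionLost>(_ =>
        {
            // Reconnecting behaves like Connecting: stash while retrying.
            Become(Connecting);
        });
    }
}
```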

**LmxProxy-specific notes**: The LmxProxy connection actor holds the `SessionId` returned by `ConnectAsync` and passes it to all subsequent operations. On entering the **Connected** state, the actor starts the 30-second keep-alive timer. Subscriptions use server-streaming gRPC — the actor processes the `IAsyncEnumerable<VtqMessage>` stream and forwards updates to Instance Actors. On keep-alive failure, the actor transitions to **Reconnecting** and the client automatically disposes active subscriptions.

## Connection Lifecycle & Reconnection

The DCL manages connection lifecycle automatically:

1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. **Note on LmxProxy**: The LmxProxy SDK includes its own retry policy (exponential backoff via Polly) for individual operations (reads). The DCL's fixed-interval reconnect owns **connection-level** recovery (re-establishing the gRPC session after a keep-alive failure or disconnect). The SDK's retry policy handles **operation-level** transient failures within an active session. These are complementary — the DCL does not disable the SDK's retry policy.
3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging.
4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to `good` as fresh values arrive from restored subscriptions.

## Write Failure Handling

Writes to physical devices are **synchronous** from the script's perspective:

- If the write fails (connection down, device rejection, timeout), the error is **returned to the calling script**. Script authors can catch and handle write errors (log, notify, retry, etc.).
- Write failures are also logged to Site Event Logging.
- There is **no store-and-forward for device writes** — these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.
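
From the script author's side, handling a failed write might look like this — a sketch only; the tag path, exception type, and logging surface are assumptions, with only `Instance.SetAttribute` taken from the source:

```csharp
// Sketch: script-side handling of a synchronous write failure.
// The exception type and Log surface are assumed, not the actual script API.
try
{
    await Instance.SetAttribute("Pump1.Setpoint", 42.5);
}
catch (DataConnectionWriteException ex)
{
    // Connection down, device rejection, or timeout — handle locally:
    Log.Warning($"Setpoint write failed: {ex.Message}");
    // e.g. notify an operator or retry once; there is no store-and-forward.
}
```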

## Tag Path Resolution

When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):

1. The failure is **logged to Site Event Logging**.
2. The attribute is marked with quality `bad`.
3. The DCL **periodically retries resolution** at a configurable interval, accommodating devices that come online in stages or load modules after startup.
4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.

Note: Pre-deployment validation at central does **not** verify that tag paths resolve to real tags on physical devices — that is a runtime concern handled here.

## Health Reporting

The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:

- **Connection status**: `connected`, `disconnected`, or `reconnecting` per data connection.
- **Tag resolution counts**: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.

## Dependencies

- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests.
- **Health Monitoring**: Reports connection status.
- **Site Event Logging**: Logs connection status changes.

## Interactions

- **Site Runtime (Instance Actors)**: Bidirectional — delivers value updates, receives subscription registrations and write-back commands.
- **Health Monitoring**: Reports connection health periodically.
- **Site Event Logging**: Logs connection/disconnection events.