scadalink-design/Component-DataConnectionLayer.md
Joseph Doherty a9fa74d5ac Document LmxProxy protocol in DCL, strengthen plan generation traceability guards, and add UI constraints
- Replace "custom protocol" placeholder with full LmxProxy details (gRPC transport, SDK API mapping, session management, keep-alive, TLS, batch ops)
- Add bullet-level requirement traceability, design constraint traceability (52 KDD + 6 CD), split-section tracking, and post-generation orphan check to plan framework
- Resolve Q9 (LmxProxy), Q11 (REST test server), Q13 (solo dev), Q14 (self-test), Q15 (Machine Data DB out of scope)
- Set Central UI constraints: Blazor Server + Bootstrap only, no heavy frameworks, custom components, clean corporate design
2026-03-16 15:08:57 -04:00


Component: Data Connection Layer

Purpose

The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a clean data pipe — it performs no evaluation of triggers, alarm conditions, or business logic.

Location

Site clusters only. Central does not interact with machines directly.

Responsibilities

  • Manage data connections defined at the site level (OPC UA servers, LmxProxy endpoints).
  • Establish and maintain connections to data sources based on deployed instance configurations.
  • Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration).
  • Deliver tag value updates to the requesting Instance Actors.
  • Support writing values to machines (when Instance Actors forward SetAttribute write requests for data-connected attributes).
  • Report data connection health status to the Health Monitoring component.

Common Interface

Both OPC UA and LmxProxy implement the same interface:

IDataConnection
├── Connect(connectionDetails) → void
├── Disconnect() → void
├── Subscribe(tagPath, callback) → subscriptionId
├── Unsubscribe(subscriptionId) → void
├── Read(tagPath) → value
├── Write(tagPath, value) → void
└── Status → ConnectionHealth

Additional protocols can be added by implementing this interface.
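Expressed as a C# interface, the tree above might look like the following sketch. ConnectionDetails, TagValue, and ConnectionHealth are assumed supporting types not defined in this section, and the Guid subscription id is illustrative:

```csharp
// Sketch of the common interface. ConnectionDetails, TagValue, and
// ConnectionHealth are placeholder types; the document only specifies
// the member names and shapes shown in the tree above.
public interface IDataConnection
{
    void Connect(ConnectionDetails connectionDetails);
    void Disconnect();

    // Registers a callback invoked on each value update for the tag path.
    Guid Subscribe(string tagPath, Action<TagValue> callback);
    void Unsubscribe(Guid subscriptionId);

    TagValue Read(string tagPath);
    void Write(string tagPath, object value);

    ConnectionHealth Status { get; }
}
```

A new protocol adapter only needs to implement these seven members; the rest of the DCL (subscription management, health reporting) is protocol-agnostic.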

Concrete Type Mappings

| IDataConnection | OPC UA SDK | LmxProxy SDK (LmxProxyClient) |
|---|---|---|
| Connect() | OPC UA session establishment | ConnectAsync() → gRPC ConnectRequest; server returns SessionId |
| Disconnect() | Close OPC UA session | DisconnectAsync() → gRPC DisconnectRequest |
| Subscribe(tagPath, callback) | OPC UA Monitored Items | SubscribeAsync(addresses, onUpdate) → server-streaming gRPC (IAsyncEnumerable<VtqMessage>) |
| Unsubscribe(id) | Remove Monitored Item | ISubscription.DisposeAsync() (cancels streaming RPC) |
| Read(tagPath) | OPC UA Read | ReadAsync(address) → Vtq |
| Write(tagPath, value) | OPC UA Write | WriteAsync(address, value) |
| Status | OPC UA session state | IsConnected property + keep-alive heartbeat (30-second interval via GetConnectionStateAsync) |

Common Value Type

Both protocols produce the same value tuple consumed by Instance Actors:

| Concept | ScadaLink Design | LmxProxy SDK (Vtq) |
|---|---|---|
| Value container | {value, quality, timestamp} | Vtq(Value, Timestamp, Quality) — readonly record struct |
| Quality | good / bad / uncertain | Quality enum (byte, OPC UA compatible: Good=0xC0, Bad=0x00, Uncertain=0x40) |
| Timestamp | UTC | DateTime (UTC) |
| Value type | object | object? (parsed: double, bool, string) |
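The SDK's value tuple can be pictured as the following C# shape. This is a sketch reconstructed from the table above; the actual LmxProxyClient definition may differ in detail:

```csharp
// Quality uses OPC UA-compatible byte codes, per the table above.
public enum Quality : byte
{
    Bad       = 0x00,
    Uncertain = 0x40,
    Good      = 0xC0
}

// Immutable value/timestamp/quality tuple consumed by Instance Actors.
public readonly record struct Vtq(object? Value, DateTime Timestamp, Quality Quality);
```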

Supported Protocols

OPC UA

  • Standard OPC UA client implementation.
  • Supports subscriptions (monitored items) and read/write operations.

LmxProxy (Custom Protocol)

LmxProxy is a gRPC-based protocol for communicating with LMX data servers. An existing client SDK (LmxProxyClient NuGet package) provides a production-ready implementation.

Transport & Connection:

  • gRPC over HTTP/2, using protobuf-net code-first contracts (service: scada.ScadaService).
  • Default port: 5050.
  • Session-based: ConnectAsync returns a SessionId used for all subsequent operations.
  • Keep-alive: 30-second heartbeat via GetConnectionStateAsync. On failure, the client marks itself disconnected and disposes subscriptions.

Authentication & TLS:

  • API key-based authentication (sent in ConnectRequest).
  • Full TLS support: TLS 1.2/1.3, mutual TLS (client cert + key in PEM), custom CA trust, self-signed cert allowance for dev.

Subscriptions:

  • Server-streaming gRPC (IAsyncEnumerable<VtqMessage>).
  • Configurable sampling interval (default: 1000ms; 0 = on-change).
  • Wire format: VtqMessage { Tag, Value (string), TimestampUtcTicks (long), Quality (string: "Good"/"Uncertain"/"Bad") }.
  • Subscription disposed via ISubscription.DisposeAsync().
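Consuming such a subscription might look like the following sketch. The client variable and the Instance Actor forwarding target are placeholders, and the example tag addresses are illustrative; SubscribeAsync, ISubscription, and the VtqMessage fields follow the description above:

```csharp
// Sketch: subscribe and forward each streamed update to an Instance Actor.
// "client", "instanceActor", and TagValueUpdate are placeholders.
ISubscription subscription = await client.SubscribeAsync(
    addresses: new[] { "Line1.Motor.Speed", "Line1.Motor.Running" }, // illustrative tags
    onUpdate: msg =>
    {
        // Wire format: Tag, Value (string), TimestampUtcTicks, Quality (string).
        var timestamp = new DateTime(msg.TimestampUtcTicks, DateTimeKind.Utc);
        instanceActor.Tell(new TagValueUpdate(msg.Tag, msg.Value, msg.Quality, timestamp));
    });

// On unsubscribe: disposing cancels the streaming RPC.
await subscription.DisposeAsync();
```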

Additional Capabilities (beyond IDataConnection):

  • ReadBatchAsync(addresses) — bulk read in a single gRPC call.
  • WriteBatchAsync(values) — bulk write in a single gRPC call.
  • WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, timeout) — write-and-poll pattern for handshake protocols (default timeout: 30s, poll interval: 100ms).
  • Built-in retry policy via Polly: exponential backoff (base delay × 2^attempt), configurable max attempts (default: 3), applied to reads. Transient errors: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted.
  • Operation metrics: count, errors, p95/p99 latency (ring buffer of last 1000 samples per operation).
  • Correlation ID propagation for distributed tracing (configurable header name).
  • DI integration: AddLmxProxyClient(IConfiguration) binds to "LmxProxy" config section in appsettings.json.
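The write-and-poll handshake might be used as follows. This is a sketch: the tag addresses and the boolean return value are assumptions, while the parameter shape and the 30 s / 100 ms defaults follow the description above:

```csharp
// Sketch: write a batch of setpoints, raise an apply flag, and poll a
// response address until the device acknowledges (or the timeout elapses).
// All addresses and values here are illustrative placeholders.
bool acknowledged = await client.WriteBatchAndWaitAsync(
    values: new Dictionary<string, object> { ["Recipe.Speed"] = 1200.0 },
    flagAddress: "Recipe.ApplyFlag", flagValue: true,      // signals the device to apply
    responseAddress: "Recipe.Ack",   responseValue: true,  // device sets on completion
    timeout: TimeSpan.FromSeconds(30));                    // polled every 100 ms
```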

SDK Reference: The client SDK source is at LmxProxyClient in the ScadaBridge repository. The DCL's LmxProxy adapter wraps this SDK behind the IDataConnection interface.

Subscription Management

  • When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer.
  • The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration.
  • Tag value updates are delivered directly to the requesting Instance Actor.
  • When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions.
  • When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration.

Write-Back Support

  • When a script calls Instance.SetAttribute for an attribute with a data source reference, the Instance Actor sends a write request to the DCL.
  • The DCL writes the value to the physical device via the appropriate protocol.
  • The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update.
  • The Instance Actor's in-memory value is not updated until the device confirms the write.

Value Update Message Format

Each value update delivered to an Instance Actor includes:

  • Tag path: The relative path of the attribute's data source reference.
  • Value: The new value from the device.
  • Quality: Data quality indicator (good, bad, uncertain).
  • Timestamp: When the value was read from the device.
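As a C# shape, the message might be sketched as follows (the record and field names are placeholders; the four fields are those listed above):

```csharp
// Sketch of the value update message delivered to Instance Actors.
public readonly record struct TagValueUpdate(
    string   TagPath,   // relative path of the attribute's data source reference
    object?  Value,     // new value from the device
    Quality  Quality,   // good / bad / uncertain
    DateTime Timestamp  // when the value was read from the device (UTC)
);
```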

Connection Actor Model

Each data connection is managed by a dedicated connection actor that uses the Akka.NET Become/Stash pattern to model its lifecycle as a state machine:

  • Connecting: The actor attempts to establish the connection. Subscription requests and write commands received during this phase are stashed (buffered in the actor's stash).
  • Connected: The actor is actively servicing subscriptions. On entering this state, all stashed messages are unstashed and processed.
  • Reconnecting: The connection was lost. The actor transitions back to a connecting-like state, stashing new requests while it retries.

This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies.
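A minimal Akka.NET sketch of this state machine is shown below. The message types and the connection field are placeholders for this document's concepts; the Become/Stash usage itself is the standard pattern described above:

```csharp
using Akka.Actor;

// Sketch of a connection actor using Become/Stash. AttemptConnect, Connected,
// ConnectionLost, SubscribeTag, and WriteTag are hypothetical message records.
public class ConnectionActor : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; }
    private readonly IDataConnection _connection;

    public ConnectionActor(IDataConnection connection)
    {
        _connection = connection;
        Become(Connecting);
        Self.Tell(new AttemptConnect());
    }

    private void Connecting()
    {
        Receive<AttemptConnect>(_ =>
        {
            // Try _connection.Connect(...); on success:
            Self.Tell(new Connected());
        });
        Receive<Connected>(_ =>
        {
            Stash.UnstashAll();       // replay buffered subscribe/write requests
            Become(ConnectedState);
        });
        ReceiveAny(_ => Stash.Stash()); // buffer requests until connected
    }

    private void ConnectedState()
    {
        Receive<SubscribeTag>(msg => { /* _connection.Subscribe(...) */ });
        Receive<WriteTag>(msg => { /* _connection.Write(...) */ });
        Receive<ConnectionLost>(_ => Become(Connecting)); // reconnect, stashing anew
    }
}
```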

LmxProxy-specific notes: The LmxProxy connection actor holds the SessionId returned by ConnectAsync and passes it to all subsequent operations. On entering the Connected state, the actor starts the 30-second keep-alive timer. Subscriptions use server-streaming gRPC — the actor processes the IAsyncEnumerable<VtqMessage> stream and forwards updates to Instance Actors. On keep-alive failure, the actor transitions to Reconnecting and the client automatically disposes active subscriptions.

Connection Lifecycle & Reconnection

The DCL manages connection lifecycle automatically:

  1. Connection drop detection: When a connection to a data source is lost, the DCL immediately pushes a value update with quality bad for every tag subscribed on that connection. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately.
  2. Auto-reconnect with fixed interval: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds), defined per data connection. This is consistent with the fixed-interval retry philosophy used throughout the system.
     Note on LmxProxy: The LmxProxy SDK includes its own retry policy (exponential backoff via Polly) for individual operations (reads). The DCL's fixed-interval reconnect owns connection-level recovery (re-establishing the gRPC session after a keep-alive failure or disconnect), while the SDK's retry policy handles operation-level transient failures within an active session. The two are complementary; the DCL does not disable the SDK's retry policy.
  3. Connection state transitions: The DCL tracks each connection's state as connected, disconnected, or reconnecting. All transitions are logged to Site Event Logging.
  4. Transparent re-subscribe: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to good as fresh values arrive from restored subscriptions.
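Steps 1, 2, and 4 might be sketched as follows. The subscription bookkeeping types and TryConnect are hypothetical; the 5-second interval is the example value from the text:

```csharp
using System.Threading;

// Sketch: on connection loss, immediately mark every subscribed tag bad,
// then retry at the connection's configured fixed interval (e.g. 5 s).
private void OnConnectionLost()
{
    foreach (var sub in _activeSubscriptions)
        sub.Callback(new TagValue(sub.TagPath, Value: null,
            Quality.Bad, DateTime.UtcNow));      // downstream sees staleness immediately

    _retryTimer = new Timer(_ => TryReconnect(), null,
        dueTime: _retryInterval, period: _retryInterval);
}

private void TryReconnect()
{
    if (!_connection.TryConnect()) return;       // stay in the reconnecting state

    _retryTimer.Dispose();
    foreach (var sub in _activeSubscriptions)    // transparent re-subscribe (step 4)
        _connection.Subscribe(sub.TagPath, sub.Callback);
}
```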

Write Failure Handling

Writes to physical devices are synchronous from the script's perspective:

  • If the write fails (connection down, device rejection, timeout), the error is returned to the calling script. Script authors can catch and handle write errors (log, notify, retry, etc.).
  • Write failures are also logged to Site Event Logging.
  • There is no store-and-forward for device writes — these are real-time control operations. Buffering stale setpoints for later application would be dangerous in an industrial context.

Tag Path Resolution

When the DCL subscribes to a tag path from the flattened configuration but the path does not exist on the physical device (e.g., typo in the template, device firmware changed, device still booting):

  1. The failure is logged to Site Event Logging.
  2. The attribute is marked with quality bad.
  3. The DCL periodically retries resolution at a configurable interval, accommodating devices that come online in stages or load modules after startup.
  4. On successful resolution, the subscription activates normally and quality reflects the live value from the device.

Note: Pre-deployment validation at central does not verify that tag paths resolve to real tags on physical devices — that is a runtime concern handled here.

Health Reporting

The DCL reports the following metrics to the Health Monitoring component via the existing periodic heartbeat:

  • Connection status: connected, disconnected, or reconnecting per data connection.
  • Tag resolution counts: Per connection, the number of total subscribed tags vs. successfully resolved tags. This gives operators visibility into misconfigured templates without needing to open the debug view for individual instances.

Dependencies

  • Site Runtime (Instance Actors): Receives subscription registrations and delivers value updates. Receives write requests.
  • Health Monitoring: Reports connection status.
  • Site Event Logging: Logs connection status changes.

Interactions

  • Site Runtime (Instance Actors): Bidirectional — delivers value updates, receives subscription registrations and write-back commands.
  • Health Monitoring: Reports connection health periodically.
  • Site Event Logging: Logs connection/disconnection events.