Component: External System Gateway

Purpose

The External System Gateway manages predefined integrations with external systems (e.g., MES, recipe managers) and database connections. It provides the runtime for invoking external API methods and executing database operations from scripts at site clusters.

Location

Site clusters execute calls directly to external systems and read definitions from local SQLite. The central cluster stores definitions in the configuration database and brokers inbound requests from external systems to sites.

Responsibilities

Definitions (Central)

  • Store external system definitions in the configuration database: connection details, API method signatures (parameters and return types).
  • Store database connection definitions: server, database, credentials.
  • Deploy definitions uniformly to all sites (no per-site overrides). Deployment requires explicit action by a user with the Deployment role.
  • Definitions are managed by users with the Design role.

Execution (Site)

  • Invoke external system API methods as requested by scripts (via Script Execution Actors and Alarm Execution Actors).
  • Provide raw MS SQL client connections (ADO.NET) by name for synchronous database access.
  • Submit cached database writes to the Store-and-Forward Engine for reliable delivery.
  • Sites communicate with external systems directly (not routed through central).

Integration Brokering (Central)

  • Receive inbound requests from external systems (e.g., MES querying machine values).
  • Route requests to the appropriate site via the Communication Layer.
  • Return responses to the external system.

External System Definition

Each external system definition includes:

  • Name: Unique identifier (e.g., "MES", "RecipeManager").
  • Base URL: The root endpoint URL for the external system (e.g., https://mes.example.com/api).
  • Authentication: One of:
    • API Key: Header name (e.g., X-API-Key) and key value.
    • Basic Auth: Username and password.
  • Timeout: Per-system timeout for all method calls (e.g., 30 seconds). Applies to the HTTP request round-trip.
  • Retry Settings: Max retry count, fixed time between retries (used by Store-and-Forward Engine for transient failures only).
  • Method Definitions: List of available API methods, each with:
    • Method name.
    • HTTP method: GET, POST, PUT, or DELETE.
    • Path: Relative path appended to the base URL (e.g., /recipes/{id}).
    • Parameter definitions (name, type). Supports the extended type system (Boolean, Integer, Float, String, Object, List).
    • Return type definition. Supports the extended type system for complex response structures.
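
The fields listed above can be pictured as a small set of immutable records. The following is only a sketch in Python for illustration; every class and field name (`ExternalSystemDefinition`, `RetrySettings`, `timeout_seconds`, and so on) is hypothetical and not taken from the actual implementation, and the extended type system is represented here simply as type-name strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

ALLOWED_HTTP_METHODS = {"GET", "POST", "PUT", "DELETE"}


@dataclass(frozen=True)
class ApiKeyAuth:
    header_name: str  # e.g. "X-API-Key"
    key_value: str


@dataclass(frozen=True)
class BasicAuth:
    username: str
    password: str


@dataclass(frozen=True)
class RetrySettings:
    max_retries: int
    retry_interval_seconds: float  # fixed time between retries


@dataclass(frozen=True)
class MethodDefinition:
    name: str
    http_method: str            # GET, POST, PUT, or DELETE
    path: str                   # relative path, e.g. "/recipes/{id}"
    parameters: Dict[str, str]  # parameter name -> extended type name
    return_type: str            # extended type name

    def __post_init__(self) -> None:
        # Reject HTTP methods outside the four the definition allows.
        if self.http_method not in ALLOWED_HTTP_METHODS:
            raise ValueError(f"unsupported HTTP method: {self.http_method}")


@dataclass(frozen=True)
class ExternalSystemDefinition:
    name: str                   # unique identifier, e.g. "MES"
    base_url: str
    auth: Union[ApiKeyAuth, BasicAuth]
    timeout_seconds: float      # applies to every method call
    retry: RetrySettings
    methods: Tuple[MethodDefinition, ...] = ()

    def method(self, name: str) -> Optional[MethodDefinition]:
        """Look up a method definition by name; None if it is not defined."""
        for m in self.methods:
            if m.name == name:
                return m
        return None


# Example instance using the sample values from the list above.
mes = ExternalSystemDefinition(
    name="MES",
    base_url="https://mes.example.com/api",
    auth=ApiKeyAuth(header_name="X-API-Key", key_value="secret"),
    timeout_seconds=30.0,
    retry=RetrySettings(max_retries=3, retry_interval_seconds=10.0),
    methods=(
        MethodDefinition("GetRecipe", "GET", "/recipes/{id}", {"id": "String"}, "Object"),
    ),
)
```

The retry values in the example (3 retries, 10 seconds) are placeholders chosen for the sketch, not recommended defaults.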

Database Connection Definition

Each database connection definition includes:

  • Name: Unique identifier (e.g., "MES_DB", "HistorianDB").
  • Connection Details: Server address, database name, credentials.
  • Retry Settings: Max retry count, fixed time between retries (for cached writes).

Database Access Modes

Synchronous (Real-time)

  • Script calls Database.Connection("name") and receives a raw ADO.NET SqlConnection.
  • Full control: queries, updates, transactions, stored procedures.
  • Failures are immediate — no buffering.
  • Audit emission: script-initiated Execute/ExecuteScalar calls emit DbOutbound.SyncWrite rows; ExecuteReader emits DbOutbound.SyncRead. SQL parameter values are captured by default; redaction is opt-in per connection via the Audit Log configuration (see Component-AuditLog.md, Payload Capture Policy). Audit-write failure never aborts the script.

Cached Write (Store-and-Forward)

  • Script calls Database.CachedWrite("name", "sql", parameters). This is deferred delivery: the call returns a TrackedOperationId tracking handle immediately rather than the write result.
  • Payload includes: connection name, SQL statement, serialized parameter values.
  • The write is attempted immediately. On immediate success it is recorded as a terminal Delivered tracking record. On transient failure (database unavailable) it is buffered (Pending/Retrying) and retried per the connection's retry settings by the Store-and-Forward Engine.
  • On permanent failure (e.g. a SQL syntax or constraint error — a request that will never succeed), the error is returned synchronously to the calling script and the write is not buffered. The call is also recorded as a terminal Failed tracking record capturing the error.
  • Cached-write status is observable to scripts via Tracking.Status(id) (answered site-locally and authoritatively) and centrally via the Site Call Audit component.
  • Audit emission: each lifecycle transition (CachedEnqueued, CachedAttempt, CachedTerminal) emits an audit row via the combined cached-operation telemetry packet — one packet carries both the audit row and the SiteCalls upsert (see Component-AuditLog.md, Cached Operations — Combined Telemetry, and Component-SiteCallAudit.md). Audit-write failure never aborts the script.
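
The cached-write payload described above (connection name, SQL statement, serialized parameter values) could be carried as a small serializable record. The sketch below assumes JSON as the wire format and uses hypothetical names (`CachedWritePayload`, `to_json`, `from_json`); the document does not specify the actual serialization format, and the table and column names in the example are invented.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CachedWritePayload:
    connection_name: str        # named database connection, e.g. "MES_DB"
    sql: str                    # SQL statement to execute on delivery
    parameters: Dict[str, Any]  # parameter name -> JSON-serializable value

    def to_json(self) -> str:
        # sort_keys gives a stable representation for storage and comparison.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "CachedWritePayload":
        data = json.loads(text)
        return CachedWritePayload(
            connection_name=data["connection_name"],
            sql=data["sql"],
            parameters=data["parameters"],
        )


payload = CachedWritePayload(
    connection_name="MES_DB",
    sql="INSERT INTO Production (Line, Count) VALUES (@line, @count)",
    parameters={"line": "L1", "count": 42},
)
restored = CachedWritePayload.from_json(payload.to_json())
```

Because the statement and its parameters travel together, a buffered write can be replayed later without any state from the originating script.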

Invocation Protocol

All external system calls are HTTP/REST with JSON serialization:

  • The ESG acts as an HTTP client. The external system definition provides the base URL; each method definition specifies the HTTP method and relative path.
  • Request parameters are serialized as JSON in the request body (POST/PUT) or as query parameters (GET/DELETE).
  • Response bodies are deserialized from JSON into the method's defined return type.
  • Credentials (API key header or Basic Auth header) are attached to every request per the system's authentication configuration.
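
The request-building rules above can be sketched with Python's standard library. This is an illustration only: `build_request`, `api_key_header`, and `basic_auth_header` are hypothetical helper names, and substitution of path placeholders such as `{id}` is deliberately left out because the rules above do not describe it.

```python
import base64
import json
from typing import Any, Dict
from urllib.parse import urlencode
from urllib.request import Request


def api_key_header(header_name: str, key_value: str) -> Dict[str, str]:
    """API Key authentication: a single configured header."""
    return {header_name: key_value}


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Basic authentication: base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


def build_request(base_url: str, http_method: str, path: str,
                  params: Dict[str, Any], auth_headers: Dict[str, str]) -> Request:
    """Build (but do not send) the HTTP request for one method call."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = dict(auth_headers)  # credentials are attached to every request
    body = None
    if http_method in ("POST", "PUT"):
        # Parameters travel as a JSON request body.
        body = json.dumps(params).encode("utf-8")
        headers["Content-Type"] = "application/json"
    elif http_method in ("GET", "DELETE"):
        # Parameters travel as query parameters.
        if params:
            url += "?" + urlencode(params)
    else:
        raise ValueError(f"unsupported HTTP method: {http_method}")
    return Request(url, data=body, headers=headers, method=http_method)


get_req = build_request(
    "https://mes.example.com/api", "GET", "/inventory",
    {"line": "L1", "limit": 10}, api_key_header("X-API-Key", "secret"),
)
post_req = build_request(
    "https://mes.example.com/api", "POST", "/production",
    {"line": "L1", "count": 42}, basic_auth_header("user", "pass"),
)
```

The `/inventory` and `/production` paths are invented for the example. Sending the request (for instance with `urllib.request.urlopen(req, timeout=...)`) would apply the per-system timeout from the definition.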

External System Call Modes

Scripts choose between two call modes per invocation, mirroring the dual-mode database access pattern:

Synchronous (Real-time)

  • Script calls ExternalSystem.Call("systemName", "methodName", params).
  • The HTTP request is executed immediately. The script blocks until the response is received or the timeout elapses.
  • All failures (transient and permanent) return an error to the calling script. No store-and-forward buffering.
  • Use for request/response interactions where the script needs the result (e.g., fetching a recipe, querying inventory).
  • Audit emission: emits an ApiOutbound.SyncCall row to IAuditWriter at call completion (success or failure). Payload captured per the Audit Log policy (see Component-AuditLog.md, Payload Capture Policy). Audit-write failure never aborts the script.

Cached (Store-and-Forward)

  • Script calls ExternalSystem.CachedCall("systemName", "methodName", params). This is deferred delivery: the call returns a TrackedOperationId tracking handle immediately rather than the response body.
  • The call is attempted immediately. If it succeeds, the response is discarded and the call is recorded as a terminal Delivered tracking record.
  • On transient failure (connection refused, timeout, HTTP 408, 429, or 5xx), the call is routed to the Store-and-Forward Engine for retry per the system's retry settings. The script does not block: the call is buffered (Pending/Retrying) and the script continues.
  • On permanent failure (HTTP 4xx other than 408/429), the error is returned synchronously to the calling script. There is no retry because the request itself is wrong. The call is also recorded as a terminal Failed tracking record capturing the error.
  • Cached-call status is observable to scripts via Tracking.Status(id) (answered site-locally and authoritatively) and centrally via the Site Call Audit component.
  • Audit emission: each lifecycle transition (CachedEnqueued, CachedAttempt, CachedTerminal) emits an audit row via the combined cached-operation telemetry packet — one packet carries both the audit row and the SiteCalls upsert (see Component-AuditLog.md, Cached Operations — Combined Telemetry, and Component-SiteCallAudit.md). Audit-write failure never aborts the script.
  • Use for outbound data pushes where deferred delivery is acceptable (e.g., posting production data, sending quality reports).
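
The cached-call outcomes above amount to a three-way branch on the first attempt. The sketch below models that branch in memory. All names (`CachedCallSketch`, `TransientError`, `PermanentError`) are hypothetical, the tracking table is a plain dictionary, and returning the error alongside the tracking id as a tuple is an assumption made for the sketch; the bullets above do not say how the two are surfaced together.

```python
import uuid
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Status(Enum):
    PENDING = "Pending"
    DELIVERED = "Delivered"
    FAILED = "Failed"


class TransientError(Exception):
    """Stands in for connection refused, timeout, or a retryable HTTP status."""


class PermanentError(Exception):
    """Stands in for a non-retryable HTTP 4xx response."""


class CachedCallSketch:
    def __init__(self) -> None:
        self.tracking: Dict[str, Status] = {}                    # stand-in tracking table
        self.buffer: List[Tuple[str, Callable[[], object]]] = []  # stand-in S&F buffer

    def cached_call(self, send: Callable[[], object]) -> Tuple[str, Optional[str]]:
        """Attempt the call once; return (tracking id, error message or None)."""
        op_id = str(uuid.uuid4())
        try:
            send()  # the response body is discarded on success
        except TransientError:
            # Buffered for later retry; the script continues without blocking.
            self.tracking[op_id] = Status.PENDING
            self.buffer.append((op_id, send))
            return op_id, None
        except PermanentError as exc:
            # Never buffered, but still recorded as a terminal Failed record.
            self.tracking[op_id] = Status.FAILED
            return op_id, str(exc)
        self.tracking[op_id] = Status.DELIVERED
        return op_id, None

    def status(self, op_id: str) -> Status:
        return self.tracking[op_id]


def ok() -> str:
    return "response"


def unavailable() -> None:
    raise TransientError("HTTP 503")


def bad_request() -> None:
    raise PermanentError("HTTP 400")


gateway = CachedCallSketch()
delivered_id, delivered_err = gateway.cached_call(ok)
pending_id, pending_err = gateway.cached_call(unavailable)
failed_id, failed_err = gateway.cached_call(bad_request)
```

Retry scheduling and the Retrying state are omitted; in the design above they belong to the Store-and-Forward Engine.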

Call Timeout & Error Handling

  • Each external system definition specifies a timeout that applies to all method calls on that system.
  • Error classification by HTTP response:
    • Transient failures (connection refused, timeout, HTTP 408, 429, 5xx): Behavior depends on call mode — CachedCall buffers for retry; Call returns error to script.
    • Permanent failures (HTTP 4xx except 408/429): Always returned to the calling script regardless of call mode. Logged to Site Event Logging. For CachedCall, the failure is additionally recorded as a terminal Failed tracking record — so even a never-buffered cached call has an authoritative status record.
  • This classification ensures the S&F buffer is not polluted with requests that will never succeed.
  • Idempotency note: CachedCall retries may result in duplicate delivery if the external system received the original request but the response was lost. Callers should use CachedCall only for operations that are idempotent or where duplicate delivery is acceptable.
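
The classification rules above can be expressed as two small functions. This is a sketch with hypothetical names (`classify_outcome`, `action_for`); `None` stands for "no HTTP response at all" (connection refused or timeout). Status codes outside 2xx, 4xx, and 5xx are not covered by the rules above, so the sketch reports them as "unclassified" rather than guessing.

```python
from typing import Optional


def classify_outcome(status_code: Optional[int]) -> str:
    """Map an HTTP outcome to 'success', 'transient', 'permanent', or 'unclassified'."""
    if status_code is None:
        return "transient"  # connection refused or timeout
    if status_code in (408, 429) or 500 <= status_code <= 599:
        return "transient"
    if 400 <= status_code <= 499:
        return "permanent"  # every other 4xx
    if 200 <= status_code <= 299:
        return "success"
    return "unclassified"   # not addressed by the rules above


def action_for(mode: str, outcome: str) -> str:
    """Decide what the gateway does for a call mode ('Call' or 'CachedCall')."""
    if mode not in ("Call", "CachedCall"):
        raise ValueError(f"unknown call mode: {mode}")
    if outcome == "success":
        return "complete"
    if outcome == "transient":
        # Only cached calls are buffered; synchronous calls surface the error.
        return "buffer_for_retry" if mode == "CachedCall" else "return_error_to_script"
    if outcome == "permanent":
        return "return_error_to_script"
    raise ValueError(f"no rule for outcome: {outcome}")
```

For example, `action_for("CachedCall", classify_outcome(503))` yields `"buffer_for_retry"`, while the same 503 under `Call` yields `"return_error_to_script"`.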

Blocking I/O Isolation

  • External system HTTP calls and database operations are blocking I/O. Script Execution Actors (which are short-lived, per-invocation actors) execute these calls, ensuring that blocking does not starve the parent Script Actor or Instance Actor.
  • The Akka.NET actor system should configure a dedicated dispatcher for Script Execution Actors to isolate blocking I/O from the default dispatcher used by coordination actors.

Database Connection Management

  • Database connections use standard ADO.NET connection pooling per named connection. No custom pool management.
  • Pool behavior (max pool size, connection lifetime, etc.) can be tuned via connection string parameters in the database connection definition if needed.
  • Synchronous failures on Database.Connection() (e.g., unreachable server) return an error to the calling script, consistent with synchronous external system calls: every failure is returned to the script and nothing is buffered.

Dependencies

  • Configuration Database (MS SQL): Stores external system and database connection definitions (central only).
  • Local SQLite: At sites, external system and database connection definitions are read from local SQLite (populated by artifact deployment). Sites do not access the central config DB.
  • Store-and-Forward Engine: Buffers and retries transiently failed cached external system calls (CachedCall) and cached database writes (CachedWrite), and owns the site-local operation tracking table read by Tracking.Status(id).
  • Site Call Audit: Central audit mirror for cached calls — receives cached-call lifecycle telemetry so CachedCall/CachedWrite status is observable centrally.
  • Communication Layer: Routes inbound external system requests from central to sites.
  • Security & Auth: Design role manages definitions.
  • Configuration Database (via IAuditService): Definition changes are audit logged.

Interactions

  • Site Runtime (Script/Alarm Execution Actors): Scripts invoke external system methods and database operations through this component.
  • Store-and-Forward Engine: Transiently failed cached calls and cached writes are routed here for reliable delivery; it also assigns each cached call a TrackedOperationId tracking row.
  • Site Call Audit: The central observability sibling for cached calls — cached-call status reported here is queried via the Central UI Site Calls page.
  • Deployment Manager: Receives updated definitions as part of system-wide artifact deployment (triggered explicitly by Deployment role).