ScadaBridge/docs/components/ExternalSystemGateway.md
Joseph Doherty 25bae4e43b docs(components): accuracy fixes from deep review (batch 2)
TemplateEngine (alarm-script-ref ordering, native-alarm-sources not in
revision hash, composition cycle checks, 9-step pipeline), SiteRuntime
(alarm on-trigger scripts run with a restricted context; PreStart seeds
children from defaults before overrides arrive), DataConnectionLayer
(UnsubscribeAlarmsRequest stashed in Connecting), StoreAndForward (InFlight/
Delivered are dead enum values; notifications can park at 50 retries),
ExternalSystemGateway (CachedWrite returns void + enqueues directly; log levels).
2026-06-03 16:34:37 -04:00

External System Gateway

The External System Gateway gives site scripts two runtime capabilities: invoking HTTP/REST APIs on named external systems, and executing SQL writes against named database connections. Both capabilities expose a dual call mode — synchronous (blocking, result returned) and cached (store-and-forward on transient failure) — so scripts choose the right delivery guarantee per operation without knowing the underlying retry machinery.

Overview

External System Gateway (#7) runs exclusively at the site. Definitions — external system endpoints with their authentication and method catalogue, and database connection strings — are authored centrally and deployed to the site's local SQLite by the Deployment Manager. The site never reaches back to the configuration database at call time; the repository resolves each definition from SQLite on the hot path.

The component code lives in src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/, with all five source files at the root:

  • ExternalSystemClient.cs: IExternalSystemClient implementation; CallAsync (synchronous) and CachedCallAsync (store-and-forward on transient failure), plus the DeliverBufferedAsync entry point consumed by the Store-and-Forward Engine during retry sweeps.
  • DatabaseGateway.cs: IDatabaseGateway implementation; GetConnectionAsync (ADO.NET SqlConnection) and CachedWriteAsync (S&F-buffered SQL), plus its own DeliverBufferedAsync for the retry path.
  • ErrorClassifier.cs: static helper that maps HTTP status codes and exception types to TransientExternalSystemException / PermanentExternalSystemException.
  • ExternalSystemGatewayOptions.cs: options class bound from ScadaBridge:ExternalSystemGateway.
  • ServiceCollectionExtensions.cs: AddExternalSystemGateway extension; registers ExternalSystemClient and DatabaseGateway as scoped services and applies per-system connection limits to named HttpClient instances.

Both services are DI-scoped. Script Execution Actors (short-lived, per-invocation) resolve them; blocking I/O from both runs on a dedicated Akka.NET dispatcher to keep the default dispatcher free for coordination actors.

Key Concepts

Definitions at rest

An ExternalSystemDefinition carries the base EndpointUrl, AuthType ("apikey", "basic", or "none"), AuthConfiguration (the credential payload), and per-system retry settings (MaxRetries, RetryDelay). Its child ExternalSystemMethod records each carry HttpMethod, Path (relative to the base URL), and JSON-serialized ParameterDefinitions / ReturnDefinition. A DatabaseConnectionDefinition carries an ADO.NET ConnectionString and its own MaxRetries / RetryDelay.

Definitions are resolved from the site SQLite repository on every call via name-keyed indexed queries (GetExternalSystemByNameAsync, GetDatabaseConnectionByNameAsync) rather than a fetch-all-then-filter scan, because definitions are read on every script invocation.

Dual call modes

Every API call and every database write has two modes:

| Mode | API surface | Failure behaviour | Return value |
|---|---|---|---|
| Synchronous | ExternalSystem.Call() / Database.Connection() | All failures returned to script | Response JSON / DbConnection |
| Cached | ExternalSystem.CachedCall() / Database.CachedWrite() | Transient → buffered; permanent → returned | ExternalCallResult (buffered) / void |

CachedCallAsync attempts immediate delivery first; only a transient failure routes to the Store-and-Forward Engine. CachedWriteAsync makes no immediate SQL attempt — it resolves the connection definition and enqueues directly.

Error classification

ErrorClassifier is the authority on HTTP and exception transience for the synchronous call path:

  • HTTP status codes: 5xx, 408 (Request Timeout), 429 (Too Many Requests) → transient. All other non-success 4xx → permanent.
  • Exceptions: HttpRequestException, TaskCanceledException, TimeoutException, OperationCanceledException → transient.
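The two rules above can be sketched as a pair of predicates. This is a minimal illustration, not the actual ErrorClassifier: the class name TransienceSketch and both method shapes are assumptions made for this example.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical sketch; the real ErrorClassifier may be shaped differently.
public static class TransienceSketch
{
    // 5xx, 408 and 429 are retryable; every other non-success status is permanent.
    public static bool IsTransient(HttpStatusCode status)
    {
        int code = (int)status;
        return (code >= 500 && code <= 599) || code == 408 || code == 429;
    }

    // Network-level and timeout exceptions are retryable.
    // TaskCanceledException derives from OperationCanceledException,
    // so the second pattern covers both.
    public static bool IsTransient(Exception ex) =>
        ex is HttpRequestException
            or OperationCanceledException
            or TimeoutException;
}
```

Keeping the status-code rule and the exception rule in one place means the synchronous path and the retry path cannot drift apart on what counts as retryable.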

JsonException during buffered-delivery payload deserialization is classified as permanent inline inside DeliverBufferedAsync (both ExternalSystemClient and DatabaseGateway), not via ErrorClassifier — a malformed payload will not become well-formed on retry, so it is parked immediately.

Transient failures on CachedCall / CachedWrite are silently buffered (logged at Debug). Permanent failures on the synchronous (InvokeHttpAsync) path are logged at Warning and returned to the calling script. Permanent failures detected during buffered retry delivery (DeliverBufferedAsync) are logged at Error before parking.

Architecture

HTTP invocation (ExternalSystemClient)

InvokeHttpAsync constructs the request, applies auth, dispatches, and classifies the response. The gateway creates a named HttpClient per system (ExternalSystem_{systemName}) through IHttpClientFactory, with SocketsHttpHandler.MaxConnectionsPerServer capped by MaxConcurrentConnectionsPerSystem. The framework default HttpClient.Timeout (100 s) is deliberately overridden to Timeout.InfiniteTimeSpan so the gateway's own CancellationTokenSource(DefaultHttpTimeout) is the sole timeout source — without this, configured timeouts above 100 s would be silently clipped.
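The single-timeout-source idea can be sketched with a linked CancellationTokenSource. The helper below is an assumption written for illustration (RunWithTimeoutAsync is not a gateway method), and it surfaces the timeout as a plain TimeoutException rather than the gateway's own transient exception type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating one timeout source layered over the caller's token.
public static class TimeoutSketch
{
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken cancellationToken)
    {
        // Fires after the configured timeout, independently of the caller's token.
        using var timeoutCts = new CancellationTokenSource(timeout);
        // Cancelled when either the caller's token or the timeout fires.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            cancellationToken, timeoutCts.Token);
        try
        {
            return await operation(linked.Token);
        }
        catch (OperationCanceledException ex)
            when (!cancellationToken.IsCancellationRequested && timeoutCts.IsCancellationRequested)
        {
            // Only our own timer elapsed; caller-requested cancellation is left untouched.
            throw new TimeoutException($"Timed out after {timeout.TotalSeconds:0.##}s", ex);
        }
    }
}
```

With an HttpClient whose Timeout is Timeout.InfiniteTimeSpan, a call such as RunWithTimeoutAsync(ct => client.SendAsync(request, ct), timeout, cancellationToken) would make the supplied timeout the only one in effect.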

Parameter routing by verb:

  • POST, PUT, PATCH → JSON body (application/json).
  • GET, DELETE → URL query string (null-valued parameters omitted; no trailing ? when all values are null).
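For the GET / DELETE case, a minimal sketch of query-string construction follows. The helper name AppendQuery, the dictionary-shaped parameters, and the invariant-culture formatting are assumptions for this example; the gateway's actual parameter model is not shown in this document.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper; illustrates null omission and the no-trailing-'?' rule.
public static class QueryStringSketch
{
    public static string AppendQuery(string url, IReadOnlyDictionary<string, object?> parameters)
    {
        // Drop null-valued parameters, then escape both names and values.
        var pairs = parameters
            .Where(p => p.Value is not null)
            .Select(p =>
                Uri.EscapeDataString(p.Key) + "=" +
                Uri.EscapeDataString(Convert.ToString(p.Value, CultureInfo.InvariantCulture) ?? string.Empty))
            .ToList();

        // When nothing remains, return the URL unchanged so no trailing '?' appears.
        return pairs.Count == 0 ? url : url + "?" + string.Join("&", pairs);
    }
}
```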

Auth application:

  • apikey: AuthConfiguration format "HeaderName:KeyValue" or bare key value (default header X-API-Key).
  • basic: AuthConfiguration format "username:password", Base64-encoded as Authorization: Basic ....
  • none: silent no-op.
  • Missing or malformed AuthConfiguration for a type that requires credentials logs a Warning but does not abort the call.
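The auth rules above might look roughly like the sketch below. TryApplyAuth is a hypothetical name, splitting the API key at the first colon is an assumption about how "HeaderName:KeyValue" is told apart from a bare key, and UTF-8 is assumed for the Basic credential bytes.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical sketch of auth application; not the gateway's actual ApplyAuth.
public static class AuthSketch
{
    // Returns false when credentials are required but missing or malformed.
    // Per the list above, the caller would log a Warning and still send the request.
    public static bool TryApplyAuth(HttpRequestMessage request, string authType, string? authConfiguration)
    {
        switch (authType)
        {
            case "none":
                return true; // nothing to apply

            case "apikey":
                if (string.IsNullOrWhiteSpace(authConfiguration)) return false;
                int sep = authConfiguration.IndexOf(':');
                // "HeaderName:KeyValue" names the header; a bare key uses X-API-Key.
                string header = sep > 0 ? authConfiguration[..sep] : "X-API-Key";
                string key = sep > 0 ? authConfiguration[(sep + 1)..] : authConfiguration;
                request.Headers.TryAddWithoutValidation(header, key);
                return true;

            case "basic":
                if (string.IsNullOrWhiteSpace(authConfiguration) || !authConfiguration.Contains(':'))
                    return false;
                string token = Convert.ToBase64String(Encoding.UTF8.GetBytes(authConfiguration));
                request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);
                return true;

            default:
                return false; // unknown auth type
        }
    }
}
```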

Error body embedded in script-visible messages is capped at 2 048 characters so a misbehaving endpoint cannot inflate error strings.

// ExternalSystemClient.cs
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
    // The caller asked to abandon the work — do not reclassify as transient.
    throw;
}
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
    // Our own timeout elapsed — a transient failure per the design.
    throw ErrorClassifier.AsTransient(
        $"Timeout calling {system.Name} after {_options.DefaultHttpTimeout.TotalSeconds:0.##}s", ex);
}
catch (Exception ex) when (ErrorClassifier.IsTransient(ex))
{
    throw ErrorClassifier.AsTransient($"Connection error to {system.Name}: {ex.Message}", ex);
}

CachedCallAsync — the buffered path

On a transient failure, CachedCallAsync serializes {SystemName, MethodName, Parameters} as JSON and calls StoreAndForwardService.EnqueueAsync with StoreAndForwardCategory.ExternalSystem. Three details matter for correct S&F integration:

  • attemptImmediateDelivery: false. The HTTP attempt has already been made; passing true would dispatch the same request twice.
  • MaxRetries / RetryDelay defaulting. ExternalSystemDefinition.MaxRetries defaults to 0, and the S&F engine treats a stored 0 as "no limit". A 0 is therefore passed as null so the engine's own bounded default applies, avoiding unbounded retry loops on unconfigured systems.
  • messageId: trackedOperationId. This pins the S&F message GUID to the caller-supplied TrackedOperationId so the retry loop can emit per-attempt and terminal audit telemetry under the same tracking id.

// ExternalSystemClient.cs — transient branch of CachedCallAsync
await _storeAndForward.EnqueueAsync(
    StoreAndForwardCategory.ExternalSystem,
    systemName,
    payload,
    originInstanceName,
    system.MaxRetries > 0 ? system.MaxRetries : null,
    system.RetryDelay > TimeSpan.Zero ? system.RetryDelay : null,
    attemptImmediateDelivery: false,
    messageId: trackedOperationId?.ToString(),
    executionId: executionId,
    sourceScript: sourceScript,
    parentExecutionId: parentExecutionId);

return new ExternalCallResult(true, null, null, WasBuffered: true);

DeliverBufferedAsync — S&F retry delivery

The Store-and-Forward Engine calls ExternalSystemClient.DeliverBufferedAsync and DatabaseGateway.DeliverBufferedAsync during retry sweeps. Both methods:

  1. Deserialize the payload JSON; treat JsonException as permanent (return false → park).
  2. Re-resolve the definition by name; if gone, return false → park.
  3. Execute the operation. PermanentExternalSystemException → park. TransientExternalSystemException propagates → engine retries.
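The three steps can be sketched as a single method. Everything named here (DeliverySketch, BufferedCall, PermanentFailure, and the two delegates standing in for the repository lookup and the HTTP call) is hypothetical, and the meaning of a true return as "delivered" is an assumption; only the park / retry outcomes come from the steps above.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Stand-in for PermanentExternalSystemException in this sketch.
public sealed class PermanentFailure : Exception
{
    public PermanentFailure(string message) : base(message) { }
}

public static class DeliverySketch
{
    public sealed record BufferedCall(string SystemName, string MethodName);

    // true = delivered (assumed); false = park; a thrown exception = engine retries later.
    public static async Task<bool> DeliverAsync(
        string payloadJson,
        Func<string, Task<bool>> definitionExists,
        Func<BufferedCall, Task> execute)
    {
        BufferedCall? call;
        try
        {
            call = JsonSerializer.Deserialize<BufferedCall>(payloadJson);
        }
        catch (JsonException)
        {
            return false; // step 1: a malformed payload will not improve on retry
        }

        if (call is null || !await definitionExists(call.SystemName))
            return false; // step 2: definition removed since enqueue

        try
        {
            await execute(call);
            return true;
        }
        catch (PermanentFailure)
        {
            return false; // step 3: park; any other exception propagates to the engine
        }
    }
}
```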

Database gateway (DatabaseGateway)

GetConnectionAsync resolves the DatabaseConnectionDefinition, opens a SqlConnection against ConnectionString, and returns the open connection. The caller owns disposal. If OpenAsync throws (unreachable server, bad credentials), the connection is disposed before the exception propagates.

CachedWriteAsync serializes {ConnectionName, Sql, Parameters} and enqueues to S&F under StoreAndForwardCategory.CachedDbWrite, with the same MaxRetries / RetryDelay defaulting logic as CachedCallAsync.

During retry delivery, JsonElement parameter values are converted with a numeric type preference of long → decimal → double. This matters because a script's decimal SQL parameter is serialized as an untagged JSON number; naively casting to double loses precision for money and measurement values.

// DatabaseGateway.cs — JsonElementToParameterValue
JsonValueKind.Number => element.TryGetInt64(out var l)
    ? l
    : element.TryGetDecimal(out var dec)
        ? dec
        : element.GetDouble(),
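To see why the ordering matters, the standalone sketch below applies the same long, then decimal, then double preference to a parsed JSON number. NumericSketch and ToParameterValue are names invented for this example, and the method assumes the element is already known to be a JSON number.

```csharp
using System.Text.Json;

// Hypothetical standalone version of the numeric preference shown above.
public static class NumericSketch
{
    // Assumes element.ValueKind == JsonValueKind.Number.
    public static object ToParameterValue(JsonElement element)
    {
        if (element.TryGetInt64(out long l)) return l;          // whole numbers stay integral
        if (element.TryGetDecimal(out decimal dec)) return dec; // fractional values keep exact digits
        return element.GetDouble();                             // fallback for values decimal cannot hold
    }
}
```

A value such as 12345678901234567.89 has more significant digits than a double can represent, so reading it as decimal is what preserves the trailing .89.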

Usage

Scripts interact through IExternalSystemClient and IDatabaseGateway, which the Script Runtime Context exposes as ExternalSystem and Database respectively. Scripts never construct gateway types directly.

Synchronous external system call — blocks until the response arrives or the timeout elapses:

// Script code (via ScriptRuntimeContext)
var result = await ExternalSystem.Call("MES", "GetRecipe", new { RecipeId = 42 });
if (result.Success)
{
    var name = result.Response.recipeName; // dynamic JSON access
}

Cached external system call — returns immediately with a TrackedOperationId; the actual HTTP request is attempted once and, on transient failure, buffered for retry:

var tracked = await ExternalSystem.CachedCall("MES", "PostProductionResult", payload);
// tracked.WasBuffered == true when queued to S&F

Synchronous database access — caller controls the connection lifetime:

await using var conn = await Database.Connection("HistorianDB");
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT TOP 1 Value FROM dbo.Tags WHERE Name = @name";
cmd.Parameters.AddWithValue("@name", tagName);
var value = await cmd.ExecuteScalarAsync();

Cached database write — enqueued immediately; returns nothing (Task):

await Database.CachedWrite("MES_DB",
    "INSERT INTO dbo.ProductionLog (BatchId, Qty) VALUES (@batchId, @qty)",
    new { batchId = id, qty = quantity });

Call status is observable via Tracking.Status(trackedOperationId) — answered site-locally against the S&F tracking table, or centrally via the Site Call Audit page.

Configuration

Options are bound from ScadaBridge:ExternalSystemGateway into ExternalSystemGatewayOptions by AddExternalSystemGateway.

| Key | Default | Description |
|---|---|---|
| DefaultHttpTimeout | 00:00:30 | Per-call HTTP round-trip timeout. Applied via CancellationTokenSource; overrides the framework 100 s default. |
| MaxConcurrentConnectionsPerSystem | 10 | SocketsHttpHandler.MaxConnectionsPerServer applied to each named HttpClient (ExternalSystem_{name}). Does not affect other host HttpClient instances. |
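
As an illustration, the two options might appear in a JSON configuration file as follows. This fragment assumes the host uses the standard JSON configuration provider, where the ScadaBridge:ExternalSystemGateway key path maps to nested objects; the values shown are simply the documented defaults.

```json
{
  "ScadaBridge": {
    "ExternalSystemGateway": {
      "DefaultHttpTimeout": "00:00:30",
      "MaxConcurrentConnectionsPerSystem": 10
    }
  }
}
```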

Per-system retry settings (MaxRetries, RetryDelay) are properties of ExternalSystemDefinition and DatabaseConnectionDefinition, authored by operators in the Central UI and deployed as part of the system artifact. The gateway passes these directly to the Store-and-Forward Engine on enqueue.

There is no separate configuration section for database connections — connection strings reside in DatabaseConnectionDefinition.ConnectionString, deployed via artifact. Pool tuning (max pool size, connection lifetime) can be embedded in the connection string itself.

Dependencies & Interactions

  • Commons (#16) — owns IExternalSystemClient, IDatabaseGateway, ExternalCallResult, TrackedOperationId, ExternalSystemDefinition, ExternalSystemMethod, DatabaseConnectionDefinition, IExternalSystemRepository, and the StoreAndForwardCategory enum values consumed here.
  • Store-and-Forward Engine (#6) — receives buffered ExternalSystem and CachedDbWrite payloads from CachedCallAsync / CachedWriteAsync; drives retry sweeps by calling DeliverBufferedAsync on both gateway types; assigns TrackedOperationId tracking rows; owns the site-local operation tracking table read by Tracking.Status().
  • Configuration Database (#17) — provides IExternalSystemRepository, implemented against the site SQLite replica. Central uses the same interface against MS SQL for definition management.
  • Site Runtime (#3) — Script Execution Actors resolve IExternalSystemClient and IDatabaseGateway from DI and expose them to script code as ExternalSystem and Database. Actors run on a dedicated blocking I/O dispatcher to isolate HTTP and SQL waits from the actor system's default dispatcher.
  • Site Call Audit (#22) — receives cached-call lifecycle telemetry (via the combined CachedCallTelemetry packet) so cached call status is observable centrally; the gateway's S&F delivery writes the tracking row that Tracking.Status() reads.
  • Audit Log (#23) — audit rows for ApiOutbound and DbOutbound channels are emitted by the Script Runtime Context around gateway calls; gateway itself does not write audit rows directly. The trackedOperationId, executionId, and parentExecutionId threaded through CachedCallAsync / CachedWriteAsync keep audit rows correlated across the retry lifecycle.

Troubleshooting

A cached call is stuck retrying

A MaxRetries of 0 on an external system definition or database connection does not mean "no retries". The S&F engine interprets a stored 0 as "no limit" (retry forever), so the gateway normalizes 0 to null on enqueue and the engine's own bounded default applies instead. If a call retries more often than the operator intended, verify that the definition's MaxRetries field is set to the intended value in the Central UI and that the definition has been redeployed.

Timeout is not being respected

ExternalSystemGatewayOptions.DefaultHttpTimeout applies only when HttpClient.Timeout is Timeout.InfiniteTimeSpan. The gateway sets this explicitly on every factory-supplied client. If a custom HttpMessageHandler upstream resets Timeout, the gateway's CancellationTokenSource(DefaultHttpTimeout) is still the controlling token because SendAsync is called with the linked token, not the raw cancellationToken.

Auth header not sent

The gateway logs a Warning when AuthType is "apikey" or "basic" but AuthConfiguration is empty or absent, and when AuthType is "basic" but AuthConfiguration has no : separator. Check the site log for ApplyAuth: warning messages. The credential value is never logged — only the system name and auth type.

A buffered call is parked immediately

A JsonException during DeliverBufferedAsync payload deserialization is treated as permanent (the same malformed payload will fail every time). The message is parked rather than retried. Check the site log for "malformed JSON payload; parking" alongside the message GUID, then inspect the S&F store for the payload to identify the serialization issue.