External System Gateway
The External System Gateway gives site scripts two runtime capabilities: invoking HTTP/REST APIs on named external systems, and executing SQL writes against named database connections. Both capabilities expose a dual call mode — synchronous (blocking, result returned) and cached (store-and-forward on transient failure) — so scripts choose the right delivery guarantee per operation without knowing the underlying retry machinery.
Overview
External System Gateway (#7) runs exclusively at the site. Definitions — external system endpoints with their authentication and method catalogue, and database connection strings — are authored centrally and deployed to the site's local SQLite by the Deployment Manager. The site never reaches back to the configuration database at call time; the repository resolves each definition from SQLite on the hot path.
The component code lives in src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/, with all five source files at the root:
- ExternalSystemClient.cs: IExternalSystemClient implementation; CallAsync (synchronous) and CachedCallAsync (store-and-forward on transient failure), plus the DeliverBufferedAsync entry point consumed by the Store-and-Forward Engine during retry sweeps.
- DatabaseGateway.cs: IDatabaseGateway implementation; GetConnectionAsync (ADO.NET SqlConnection) and CachedWriteAsync (S&F-buffered SQL), plus its own DeliverBufferedAsync for the retry path.
- ErrorClassifier.cs: static helper that maps HTTP status codes and exception types to TransientExternalSystemException / PermanentExternalSystemException.
- ExternalSystemGatewayOptions.cs: options class bound from ScadaBridge:ExternalSystemGateway.
- ServiceCollectionExtensions.cs: AddExternalSystemGateway extension; registers ExternalSystemClient and DatabaseGateway as scoped services and applies per-system connection limits to named HttpClient instances.
Both services are DI-scoped. Script Execution Actors (short-lived, per-invocation) resolve them; blocking I/O from both runs on a dedicated Akka.NET dispatcher to keep the default dispatcher free for coordination actors.
Key Concepts
Definitions at rest
An ExternalSystemDefinition carries the base EndpointUrl, AuthType ("apikey", "basic", or "none"), AuthConfiguration (the credential payload), and per-system retry settings (MaxRetries, RetryDelay). Its child ExternalSystemMethod records each carry HttpMethod, Path (relative to the base URL), and JSON-serialized ParameterDefinitions / ReturnDefinition. A DatabaseConnectionDefinition carries an ADO.NET ConnectionString and its own MaxRetries / RetryDelay.
Definitions are resolved from the site SQLite repository on every call via name-keyed indexed queries (GetExternalSystemByNameAsync, GetDatabaseConnectionByNameAsync) rather than a fetch-all-then-filter scan, because definitions are read on every script invocation.
Dual call modes
Every API call and every database write has two modes:
| Mode | API surface | Failure behaviour | Return value |
|---|---|---|---|
| Synchronous | ExternalSystem.Call() / Database.Connection() | All failures returned to script | Response JSON / DbConnection |
| Cached | ExternalSystem.CachedCall() / Database.CachedWrite() | Transient → buffered; permanent → returned | ExternalCallResult (buffered) / void |
CachedCallAsync attempts immediate delivery first; only a transient failure routes to the Store-and-Forward Engine. CachedWriteAsync makes no immediate SQL attempt — it resolves the connection definition and enqueues directly.
Error classification
ErrorClassifier is the authority on HTTP and exception transience for the synchronous call path:
- HTTP status codes: 5xx, 408 (Request Timeout), 429 (Too Many Requests) → transient. All other non-success 4xx → permanent.
- Exceptions: HttpRequestException, TaskCanceledException, TimeoutException, OperationCanceledException → transient.
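As a rough illustration of these rules, the sketch below re-implements the classification as two local functions. The names IsTransientStatus and IsTransientException are hypothetical and are not taken from ErrorClassifier itself; only the status codes and exception types come from the list above.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper (not the real ErrorClassifier API):
// 5xx, 408 and 429 are transient, everything else is not.
static bool IsTransientStatus(HttpStatusCode status)
{
    int code = (int)status;
    return (code >= 500 && code <= 599) || code == 408 || code == 429;
}

// Hypothetical helper: the OperationCanceledException check also matches
// TaskCanceledException, which derives from it.
static bool IsTransientException(Exception ex) =>
    ex is HttpRequestException or TimeoutException or OperationCanceledException;

Console.WriteLine(IsTransientStatus(HttpStatusCode.ServiceUnavailable)); // 503
Console.WriteLine(IsTransientStatus(HttpStatusCode.NotFound));           // 404
Console.WriteLine(IsTransientException(new TimeoutException()));
```

A caller would presumably wrap a transient result in TransientExternalSystemException and anything else in PermanentExternalSystemException; that wrapping is left out here because the exception constructors are not shown in this document.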
JsonException during buffered-delivery payload deserialization is classified as permanent inline inside DeliverBufferedAsync (both ExternalSystemClient and DatabaseGateway), not via ErrorClassifier — a malformed payload will not become well-formed on retry, so it is parked immediately.
Transient failures on CachedCall / CachedWrite are silently buffered (logged at Debug). Permanent failures on the synchronous (InvokeHttpAsync) path are logged at Warning and returned to the calling script. Permanent failures detected during buffered retry delivery (DeliverBufferedAsync) are logged at Error before parking.
Architecture
HTTP invocation (ExternalSystemClient)
InvokeHttpAsync constructs the request, applies auth, dispatches, and classifies the response. The gateway creates a named HttpClient per system (ExternalSystem_{systemName}) through IHttpClientFactory, with SocketsHttpHandler.MaxConnectionsPerServer capped by MaxConcurrentConnectionsPerSystem. The framework default HttpClient.Timeout (100 s) is deliberately overridden to Timeout.InfiniteTimeSpan so the gateway's own CancellationTokenSource(DefaultHttpTimeout) is the sole timeout source — without this, configured timeouts above 100 s would be silently clipped.
Parameter routing by verb:
- POST, PUT, PATCH → JSON body (application/json).
- GET, DELETE → URL query string (null-valued parameters omitted; no trailing ? when all values are null).
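The routing rule can be sketched as follows. UsesJsonBody and AppendQuery are hypothetical helper names, and the URL escaping shown is an assumption about how values are encoded; only the verb split, the omission of null values, and the absence of a trailing ? come from the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: true when parameters travel in a JSON body.
static bool UsesJsonBody(string verb) =>
    verb.ToUpperInvariant() is "POST" or "PUT" or "PATCH";

// Hypothetical helper: appends non-null parameters as an escaped query string.
static string AppendQuery(string url, IEnumerable<KeyValuePair<string, object?>> parameters)
{
    var pairs = parameters
        .Where(p => p.Value is not null)
        .Select(p =>
            Uri.EscapeDataString(p.Key) + "=" +
            Uri.EscapeDataString(Convert.ToString(p.Value, CultureInfo.InvariantCulture) ?? ""))
        .ToList();

    // No non-null values: return the URL untouched, without a trailing '?'.
    return pairs.Count == 0 ? url : url + "?" + string.Join("&", pairs);
}

var query = new[]
{
    new KeyValuePair<string, object?>("recipeId", 42),
    new KeyValuePair<string, object?>("line", "A 1"),
    new KeyValuePair<string, object?>("revision", null),
};
Console.WriteLine(UsesJsonBody("GET"));
// Prints https://mes.example/recipes?recipeId=42&line=A%201
Console.WriteLine(AppendQuery("https://mes.example/recipes", query));
```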
Auth application:
- apikey: AuthConfiguration format "HeaderName:KeyValue" or bare key value (default header X-API-Key).
- basic: AuthConfiguration format "username:password", Base64-encoded as Authorization: Basic ....
- none: silent no-op.
- Missing or malformed AuthConfiguration for a type that requires credentials logs a Warning but does not abort the call.
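A minimal sketch of these rules is shown below. BuildAuthHeader is a hypothetical name, and two details are assumptions rather than documented behaviour: the apikey form is split on the first colon, and the basic credentials are encoded as UTF-8 before Base64. A null result stands for "send no header", which is where the real gateway would log its Warning and continue.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper: returns the header to add, or null when none applies.
static (string HeaderName, string HeaderValue)? BuildAuthHeader(
    string authType, string? authConfiguration)
{
    switch (authType)
    {
        case "apikey" when !string.IsNullOrEmpty(authConfiguration):
        {
            // Assumption: "HeaderName:KeyValue" is split on the first colon;
            // without a colon the whole string is the key for X-API-Key.
            int sep = authConfiguration.IndexOf(':');
            return sep > 0
                ? (authConfiguration[..sep], authConfiguration[(sep + 1)..])
                : ("X-API-Key", authConfiguration);
        }
        case "basic" when authConfiguration?.Contains(':') == true:
        {
            // Assumption: credentials are UTF-8 encoded before Base64.
            string token = Convert.ToBase64String(Encoding.UTF8.GetBytes(authConfiguration));
            return ("Authorization", "Basic " + token);
        }
        default:
            // "none", or missing/malformed credentials: no header is sent.
            return null;
    }
}

var header = BuildAuthHeader("apikey", "X-Token:abc123");
Console.WriteLine(header?.HeaderName + ": " + header?.HeaderValue);
```

The sketch never prints or logs the credential outside the demo line, mirroring the rule that only the system name and auth type should reach the log.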
Error body embedded in script-visible messages is capped at 2 048 characters so a misbehaving endpoint cannot inflate error strings.
// ExternalSystemClient.cs
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
    // The caller asked to abandon the work; do not reclassify as transient.
    throw;
}
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
    // Our own timeout elapsed: a transient failure per the design.
    throw ErrorClassifier.AsTransient(
        $"Timeout calling {system.Name} after {_options.DefaultHttpTimeout.TotalSeconds:0.##}s", ex);
}
catch (Exception ex) when (ErrorClassifier.IsTransient(ex))
{
    throw ErrorClassifier.AsTransient($"Connection error to {system.Name}: {ex.Message}", ex);
}
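The catch filters above depend on two separate tokens: the caller's token and the gateway's own timeout token, linked into the one passed to SendAsync. The following standalone sketch shows that pattern in isolation. RunWithTimeoutAsync is a hypothetical helper that returns a label instead of throwing, purely so the outcome is easy to observe; it is not the gateway's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an operation under a linked token and reports
// which source ended it.
static async Task<string> RunWithTimeoutAsync(
    Func<CancellationToken, Task> operation,
    TimeSpan timeout,
    CancellationToken callerToken)
{
    using var timeoutCts = new CancellationTokenSource(timeout);
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(
        callerToken, timeoutCts.Token);
    try
    {
        await operation(linked.Token);
        return "completed";
    }
    catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
    {
        // Checked first: a caller-initiated cancel is never treated as a timeout.
        return "caller-cancelled";
    }
    catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
    {
        return "timeout";
    }
}

// A 5 s delay under a 50 ms timeout ends through the timeout token.
var outcome = await RunWithTimeoutAsync(
    ct => Task.Delay(TimeSpan.FromSeconds(5), ct),
    TimeSpan.FromMilliseconds(50),
    CancellationToken.None);
Console.WriteLine(outcome);
```

The order of the filters matters: if both tokens have fired by the time the exception is observed, the caller's cancellation wins.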
CachedCallAsync — the buffered path
On a transient failure, CachedCallAsync serializes {SystemName, MethodName, Parameters} as JSON and calls StoreAndForwardService.EnqueueAsync with StoreAndForwardCategory.ExternalSystem. Three details matter for correct S&F integration:
- attemptImmediateDelivery: false. The HTTP attempt has already been made; passing true would dispatch the same request twice.
- MaxRetries / RetryDelay defaulting. ExternalSystemDefinition.MaxRetries defaults to 0, and the S&F engine treats a stored 0 as "no limit". A 0 is therefore passed as null so the engine's own bounded default applies, avoiding unbounded retry loops on unconfigured systems.
- messageId: trackedOperationId. Pins the S&F message GUID to the caller-supplied TrackedOperationId so the retry loop can emit per-attempt and terminal audit telemetry under the same tracking id.
// ExternalSystemClient.cs: transient branch of CachedCallAsync
await _storeAndForward.EnqueueAsync(
    StoreAndForwardCategory.ExternalSystem,
    systemName,
    payload,
    originInstanceName,
    system.MaxRetries > 0 ? system.MaxRetries : null,
    system.RetryDelay > TimeSpan.Zero ? system.RetryDelay : null,
    attemptImmediateDelivery: false,
    messageId: trackedOperationId?.ToString(),
    executionId: executionId,
    sourceScript: sourceScript,
    parentExecutionId: parentExecutionId);
return new ExternalCallResult(true, null, null, WasBuffered: true);
DeliverBufferedAsync — S&F retry delivery
The Store-and-Forward Engine calls ExternalSystemClient.DeliverBufferedAsync and DatabaseGateway.DeliverBufferedAsync during retry sweeps. Both methods:
- Deserialize the payload JSON; treat JsonException as permanent (return false → park).
- Re-resolve the definition by name; if gone, return false → park.
- Execute the operation. PermanentExternalSystemException → park. TransientExternalSystemException propagates → engine retries.
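The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the gateway's implementation: DeliverAsync, definitionExists and execute are hypothetical names, and InvalidOperationException stands in for PermanentExternalSystemException so that the sketch needs no custom types. The contract it models is the one described above: false means park, true means delivered, and a propagated exception means retry.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical sketch: true = delivered, false = park,
// propagated exception = the engine retries later.
static async Task<bool> DeliverAsync(
    string payloadJson,
    Func<string, bool> definitionExists,
    Func<string, Task> execute)
{
    string? systemName = null;
    try
    {
        using var doc = JsonDocument.Parse(payloadJson);
        if (doc.RootElement.ValueKind == JsonValueKind.Object &&
            doc.RootElement.TryGetProperty("SystemName", out var name) &&
            name.ValueKind == JsonValueKind.String)
        {
            systemName = name.GetString();
        }
    }
    catch (JsonException)
    {
        return false; // malformed payload will not improve on retry: park
    }

    if (systemName is null || !definitionExists(systemName))
        return false; // definition no longer deployed: park

    try
    {
        await execute(systemName);
        return true;
    }
    catch (InvalidOperationException) // stand-in for PermanentExternalSystemException
    {
        return false;
    }
    // Any other exception (for example HttpRequestException) propagates.
}

Console.WriteLine(await DeliverAsync("{not json", _ => true, _ => Task.CompletedTask));
Console.WriteLine(await DeliverAsync("{\"SystemName\":\"MES\"}", _ => true, _ => Task.CompletedTask));
```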
Database gateway (DatabaseGateway)
GetConnectionAsync resolves the DatabaseConnectionDefinition, opens a SqlConnection against ConnectionString, and returns the open connection. The caller owns disposal. If OpenAsync throws (unreachable server, bad credentials), the connection is disposed before the exception propagates.
CachedWriteAsync serializes {ConnectionName, Sql, Parameters} and enqueues to S&F under StoreAndForwardCategory.CachedDbWrite, with the same MaxRetries / RetryDelay defaulting logic as CachedCallAsync.
During retry delivery, JsonElement parameter values are converted with a numeric type preference of long → decimal → double. This matters because a script's decimal SQL parameter is serialized as an untagged JSON number; naively casting to double loses precision for money and measurement values.
// DatabaseGateway.cs: JsonElementToParameterValue
JsonValueKind.Number => element.TryGetInt64(out var l)
    ? l
    : element.TryGetDecimal(out var dec)
        ? dec
        : element.GetDouble(),
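To see the preference order in action outside the gateway, the following standalone sketch applies the same long → decimal → double cascade to a few JSON numbers. ToParameterValue and ParseNumber are hypothetical helper names; the TryGetInt64, TryGetDecimal and GetDouble calls are the System.Text.Json methods used in the excerpt above.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: same cascade as the excerpt, written with explicit
// returns so each branch keeps its own boxed type.
static object ToParameterValue(JsonElement number)
{
    if (number.TryGetInt64(out long l)) return l;
    if (number.TryGetDecimal(out decimal dec)) return dec;
    return number.GetDouble();
}

// Hypothetical helper: parses a JSON number and detaches it from its document.
static JsonElement ParseNumber(string json)
{
    using var doc = JsonDocument.Parse(json);
    return doc.RootElement.Clone();
}

foreach (var json in new[] { "42", "19.99", "12345678901234567.89", "1e30" })
{
    object value = ToParameterValue(ParseNumber(json));
    Console.WriteLine($"{json} -> {value.GetType().Name}: {value}");
}
```

The inputs are chosen to exercise each branch: an integer, two fractional values that fit in decimal (the second has more significant digits than a double can represent exactly), and a magnitude beyond the decimal range.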
Usage
Scripts interact through IExternalSystemClient and IDatabaseGateway, which the Script Runtime Context exposes as ExternalSystem and Database respectively. Scripts never construct gateway types directly.
Synchronous external system call — blocks until the response arrives or the timeout elapses:
// Script code (via ScriptRuntimeContext)
var result = await ExternalSystem.Call("MES", "GetRecipe", new { RecipeId = 42 });
if (result.Success)
{
    var name = result.Response.recipeName; // dynamic JSON access
}
Cached external system call. The HTTP request is attempted once and the call then returns with a TrackedOperationId; on a transient failure the request is buffered for retry:
var tracked = await ExternalSystem.CachedCall("MES", "PostProductionResult", payload);
// tracked.WasBuffered == true when queued to S&F
Synchronous database access — caller controls the connection lifetime:
await using var conn = await Database.Connection("HistorianDB");
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT TOP 1 Value FROM dbo.Tags WHERE Name = @name";
cmd.Parameters.AddWithValue("@name", tagName);
var value = await cmd.ExecuteScalarAsync();
Cached database write — enqueued immediately; returns nothing (Task):
await Database.CachedWrite("MES_DB",
    "INSERT INTO dbo.ProductionLog (BatchId, Qty) VALUES (@batchId, @qty)",
    new { batchId = id, qty = quantity });
Call status is observable via Tracking.Status(trackedOperationId) — answered site-locally against the S&F tracking table, or centrally via the Site Call Audit page.
Configuration
Options are bound from ScadaBridge:ExternalSystemGateway into ExternalSystemGatewayOptions by AddExternalSystemGateway.
| Key | Default | Description |
|---|---|---|
| DefaultHttpTimeout | 00:00:30 | Per-call HTTP round-trip timeout. Applied via CancellationTokenSource; overrides the framework 100 s default. |
| MaxConcurrentConnectionsPerSystem | 10 | SocketsHttpHandler.MaxConnectionsPerServer applied to each named HttpClient (ExternalSystem_{name}). Does not affect other host HttpClient instances. |
Per-system retry settings (MaxRetries, RetryDelay) are properties of ExternalSystemDefinition and DatabaseConnectionDefinition, authored by operators in the Central UI and deployed as part of the system artifact. The gateway passes these to the Store-and-Forward Engine on enqueue, substituting null for a MaxRetries of 0 or a zero RetryDelay so that the engine's own defaults apply.
There is no separate configuration section for database connections — connection strings reside in DatabaseConnectionDefinition.ConnectionString, deployed via artifact. Pool tuning (max pool size, connection lifetime) can be embedded in the connection string itself.
Dependencies & Interactions
- Commons (#16): owns IExternalSystemClient, IDatabaseGateway, ExternalCallResult, TrackedOperationId, ExternalSystemDefinition, ExternalSystemMethod, DatabaseConnectionDefinition, IExternalSystemRepository, and the StoreAndForwardCategory enum values consumed here.
- Store-and-Forward Engine (#6): receives buffered ExternalSystem and CachedDbWrite payloads from CachedCallAsync / CachedWriteAsync; drives retry sweeps by calling DeliverBufferedAsync on both gateway types; assigns TrackedOperationId tracking rows; owns the site-local operation tracking table read by Tracking.Status().
- Configuration Database (#17): provides IExternalSystemRepository, implemented against the site SQLite replica. Central uses the same interface against MS SQL for definition management.
- Site Runtime (#3): Script Execution Actors resolve IExternalSystemClient and IDatabaseGateway from DI and expose them to script code as ExternalSystem and Database. Actors run on a dedicated blocking I/O dispatcher to isolate HTTP and SQL waits from the actor system's default dispatcher.
- Site Call Audit (#22): receives cached-call lifecycle telemetry (via the combined CachedCallTelemetry packet) so cached call status is observable centrally; the gateway's S&F delivery writes the tracking row that Tracking.Status() reads.
- Audit Log (#23): audit rows for ApiOutbound and DbOutbound channels are emitted by the Script Runtime Context around gateway calls; the gateway itself does not write audit rows directly. The trackedOperationId, executionId, and parentExecutionId threaded through CachedCallAsync / CachedWriteAsync keep audit rows correlated across the retry lifecycle.
Troubleshooting
A cached call is stuck retrying
MaxRetries = 0 on an external system definition or database connection does not mean "no retries". The S&F engine interprets a stored 0 as "no limit" (retry forever), so the gateway normalizes 0 to null on enqueue and the engine's bounded default applies instead. If a call retries longer than intended, verify that the definition's MaxRetries field is set to the intended value in the Central UI and that the definition has been redeployed.
Timeout is not being respected
ExternalSystemGatewayOptions.DefaultHttpTimeout is guaranteed to be the effective limit only when HttpClient.Timeout is Timeout.InfiniteTimeSpan; a finite HttpClient.Timeout clips any larger configured value. The gateway sets the infinite value explicitly on every factory-supplied client. If upstream configuration (for example a custom HttpMessageHandler setup) resets Timeout to a finite value, the gateway's CancellationTokenSource(DefaultHttpTimeout) still bounds the call, because SendAsync is called with the linked token rather than the raw cancellationToken; the shorter of the two limits then wins.
Auth header not sent
The gateway logs a Warning when AuthType is "apikey" or "basic" but AuthConfiguration is empty or absent, and when AuthType is "basic" but AuthConfiguration has no : separator. Check the site log for ApplyAuth: warning messages. The credential value is never logged — only the system name and auth type.
A buffered call is parked immediately
A JsonException during DeliverBufferedAsync payload deserialization is treated as permanent (the same malformed payload will fail every time). The message is parked rather than retried. Check the site log for "malformed JSON payload; parking" alongside the message GUID, then inspect the S&F store for the payload to identify the serialization issue.