# External System Gateway

The External System Gateway gives site scripts two runtime capabilities: invoking HTTP/REST APIs on named external systems, and executing SQL writes against named database connections. Both capabilities expose a dual call mode, synchronous (blocking, result returned) and cached (store-and-forward on transient failure, `TrackedOperationId` returned), so scripts choose the right delivery guarantee per operation without knowing the underlying retry machinery.

## Overview

External System Gateway (#7) runs exclusively at the site. Definitions (external system endpoints with their authentication and method catalogue, and database connection strings) are authored centrally and deployed to the site's local SQLite by the Deployment Manager. The site never reaches back to the configuration database at call time; the repository resolves each definition from SQLite on the hot path.

The component code lives in `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/`, with all five source files at the root:

- `ExternalSystemClient.cs`: `IExternalSystemClient` implementation; `CallAsync` (synchronous) and `CachedCallAsync` (store-and-forward on transient failure), plus the `DeliverBufferedAsync` entry point consumed by the Store-and-Forward Engine during retry sweeps.
- `DatabaseGateway.cs`: `IDatabaseGateway` implementation; `GetConnectionAsync` (ADO.NET `SqlConnection`) and `CachedWriteAsync` (S&F-buffered SQL), plus its own `DeliverBufferedAsync` for the retry path.
- `ErrorClassifier.cs`: static helper that maps HTTP status codes and exception types to `TransientExternalSystemException` / `PermanentExternalSystemException`.
- `ExternalSystemGatewayOptions.cs`: options class bound from `ScadaBridge:ExternalSystemGateway`.
- `ServiceCollectionExtensions.cs`: `AddExternalSystemGateway` extension; registers `ExternalSystemClient` and `DatabaseGateway` as scoped services and applies per-system connection limits to named `HttpClient` instances.
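The per-system connection limit mentioned for `ServiceCollectionExtensions.cs` can be illustrated with a small, dependency-free sketch. It is not the gateway's registration code: the document states that real clients come from `IHttpClientFactory`, whereas this sketch builds the handler and client by hand so it compiles against the base class library alone. `GatewayOptionsSketch` and `HttpClientSketch` are hypothetical names; the defaults and the `ExternalSystem_{systemName}` convention are taken from this document.

```csharp
using System;
using System.Net.Http;

// Hypothetical stand-in for ExternalSystemGatewayOptions; defaults mirror the Configuration table.
public sealed class GatewayOptionsSketch
{
    public TimeSpan DefaultHttpTimeout { get; set; } = TimeSpan.FromSeconds(30);
    public int MaxConcurrentConnectionsPerSystem { get; set; } = 10;
}

public static class HttpClientSketch
{
    // Naming convention described in this document: ExternalSystem_{systemName}.
    public static string ClientName(string systemName) => $"ExternalSystem_{systemName}";

    // Builds a handler capped at the configured per-system connection limit.
    public static SocketsHttpHandler CreateHandler(GatewayOptionsSketch options) =>
        new SocketsHttpHandler
        {
            MaxConnectionsPerServer = options.MaxConcurrentConnectionsPerSystem
        };

    // Disables the built-in 100 s HttpClient timeout so that a per-call
    // CancellationTokenSource can act as the only timeout source.
    public static HttpClient CreateClient(GatewayOptionsSketch options) =>
        new HttpClient(CreateHandler(options), disposeHandler: true)
        {
            Timeout = System.Threading.Timeout.InfiniteTimeSpan
        };
}
```

Because each system gets its own handler, a slow endpoint can exhaust only its own connection budget and leaves other systems unaffected.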
Both services are DI-scoped. Script Execution Actors (short-lived, per-invocation) resolve them; blocking I/O from both runs on a dedicated Akka.NET dispatcher to keep the default dispatcher free for coordination actors.

## Key Concepts

### Definitions at rest

An `ExternalSystemDefinition` carries the base `EndpointUrl`, `AuthType` (`"apikey"`, `"basic"`, or `"none"`), `AuthConfiguration` (the credential payload), and per-system retry settings (`MaxRetries`, `RetryDelay`). Its child `ExternalSystemMethod` records each carry `HttpMethod`, `Path` (relative to the base URL), and JSON-serialized `ParameterDefinitions` / `ReturnDefinition`.

A `DatabaseConnectionDefinition` carries an ADO.NET `ConnectionString` and its own `MaxRetries` / `RetryDelay`.

Definitions are resolved from the site SQLite repository on every call. Because that lookup sits on the path of every script invocation, it uses name-keyed indexed queries (`GetExternalSystemByNameAsync`, `GetDatabaseConnectionByNameAsync`) rather than a fetch-all-then-filter scan.

### Dual call modes

Every API call and every database write has two modes:

| Mode | API surface | Failure behaviour | Return value |
|------|-------------|-------------------|--------------|
| Synchronous | `ExternalSystem.Call()` / `Database.Connection()` | All failures returned to script | Response JSON / `DbConnection` |
| Cached | `ExternalSystem.CachedCall()` / `Database.CachedWrite()` | Transient → buffered; permanent → returned | `TrackedOperationId` (on buffer) |

`CachedCallAsync` and `CachedWriteAsync` attempt immediate delivery first. Only a transient failure routes to the Store-and-Forward Engine.

### Error classification

`ErrorClassifier` is the single authority on what counts as transient:

- **HTTP status codes**: 5xx, 408 (Request Timeout), 429 (Too Many Requests) → transient. All other non-success 4xx → permanent.
- **Exceptions**: `HttpRequestException`, `TaskCanceledException`, `TimeoutException`, `OperationCanceledException` → transient. `JsonException` during payload deserialization → permanent (a malformed payload will not become well-formed on retry, so it is parked rather than retried forever).

Transient failures on `CachedCall` / `CachedWrite` are silently buffered (logged at `Debug`). Permanent failures are logged at `Warning` and returned to the calling script regardless of call mode, because a permanently-wrong request should surface immediately.

## Architecture

### HTTP invocation (`ExternalSystemClient`)

`InvokeHttpAsync` constructs the request, applies auth, dispatches, and classifies the response. The gateway creates a named `HttpClient` per system (`ExternalSystem_{systemName}`) through `IHttpClientFactory`, with `SocketsHttpHandler.MaxConnectionsPerServer` capped by `MaxConcurrentConnectionsPerSystem`. The framework default `HttpClient.Timeout` (100 s) is deliberately overridden to `Timeout.InfiniteTimeSpan` so the gateway's own `CancellationTokenSource(DefaultHttpTimeout)` is the sole timeout source. Without this override, configured timeouts above 100 s would be silently clipped.

Parameter routing by verb:

- `POST`, `PUT`, `PATCH` → JSON body (`application/json`).
- `GET`, `DELETE` → URL query string (null-valued parameters omitted; no trailing `?` when all values are null).

Auth application:

- `apikey`: `AuthConfiguration` format `"HeaderName:KeyValue"` or bare key value (default header `X-API-Key`).
- `basic`: `AuthConfiguration` format `"username:password"`, Base64-encoded as `Authorization: Basic ...`.
- `none`: silent no-op.
- Missing or malformed `AuthConfiguration` for a type that requires credentials logs a `Warning` but does not abort the call.

The error body embedded in script-visible messages is capped at 2,048 characters so a misbehaving endpoint cannot inflate error strings.

```csharp
// ExternalSystemClient.cs
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
    // Our own timeout elapsed: a transient failure per the design.
    throw ErrorClassifier.AsTransient(
        $"Timeout calling {system.Name} after {_options.DefaultHttpTimeout.TotalSeconds:0.##}s", ex);
}
catch (Exception ex) when (ErrorClassifier.IsTransient(ex))
{
    throw ErrorClassifier.AsTransient($"Connection error to {system.Name}: {ex.Message}", ex);
}
```

### `CachedCallAsync`: the buffered path

On a transient failure, `CachedCallAsync` serializes `{SystemName, MethodName, Parameters}` as JSON and calls `StoreAndForwardService.EnqueueAsync` with `StoreAndForwardCategory.ExternalSystem`. Three details matter for correct S&F integration:

- **`attemptImmediateDelivery: false`**: the HTTP attempt has already been made; passing `true` would dispatch the same request twice.
- **`MaxRetries` / `RetryDelay` defaulting**: `ExternalSystemDefinition.MaxRetries` defaults to `0`, and the S&F engine treats a stored `0` as "no limit". A `0` is therefore passed as `null` so the engine's own bounded default applies, avoiding unbounded retry loops on unconfigured systems.
- **`messageId: trackedOperationId`**: pins the S&F message GUID to the caller-supplied `TrackedOperationId` so the retry loop can emit per-attempt and terminal audit telemetry under the same tracking id.

```csharp
// ExternalSystemClient.cs: transient branch of CachedCallAsync
await _storeAndForward.EnqueueAsync(
    StoreAndForwardCategory.ExternalSystem,
    systemName,
    payload,
    originInstanceName,
    system.MaxRetries > 0 ? system.MaxRetries : null,             // 0 means "unset"
    system.RetryDelay > TimeSpan.Zero ? system.RetryDelay : null, // zero means "unset"
    attemptImmediateDelivery: false,                              // already attempted above
    messageId: trackedOperationId?.ToString(),
    executionId: executionId,
    sourceScript: sourceScript,
    parentExecutionId: parentExecutionId);

return new ExternalCallResult(true, null, null, WasBuffered: true);
```

### `DeliverBufferedAsync`: S&F retry delivery

The Store-and-Forward Engine calls `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` during retry sweeps. Both methods:

1. Deserialize the payload JSON; treat `JsonException` as permanent (return `false` → park).
2. Re-resolve the definition by name; if gone, return `false` → park.
3. Execute the operation. `PermanentExternalSystemException` → park. `TransientExternalSystemException` propagates → engine retries.

### Database gateway (`DatabaseGateway`)

`GetConnectionAsync` resolves the `DatabaseConnectionDefinition`, opens a `SqlConnection` against `ConnectionString`, and returns the open connection. The caller owns disposal. If `OpenAsync` throws (unreachable server, bad credentials), the connection is disposed before the exception propagates.

`CachedWriteAsync` serializes `{ConnectionName, Sql, Parameters}` and enqueues to S&F under `StoreAndForwardCategory.CachedDbWrite`, with the same `MaxRetries` / `RetryDelay` defaulting logic as `CachedCallAsync`.

During retry delivery, `JsonElement` parameter values are converted with a numeric type preference of `long` → `decimal` → `double`. This matters because a script's decimal SQL parameter is serialized as an untagged JSON number; naively casting to `double` loses precision for money and measurement values.

```csharp
// DatabaseGateway.cs: JsonElementToParameterValue
JsonValueKind.Number => element.TryGetInt64(out var l) ? l
    : element.TryGetDecimal(out var dec) ? dec
    : element.GetDouble(),
```

## Usage

Scripts interact through `IExternalSystemClient` and `IDatabaseGateway`, which the Script Runtime Context exposes as `ExternalSystem` and `Database` respectively. Scripts never construct gateway types directly.
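When choosing between the synchronous and cached call modes, it helps to know which failures the gateway treats as retryable. The rules listed under Error classification can be sketched as below. This is an illustration only: the real `ErrorClassifier` signatures are not shown in this document, `ErrorClassifierSketch` and its method names are hypothetical, and returning `false` for exception types the document does not list is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;

// Hypothetical sketch of the classification rules; not the gateway's ErrorClassifier.
public static class ErrorClassifierSketch
{
    // 5xx, 408 (Request Timeout) and 429 (Too Many Requests) are transient;
    // every other non-success status is treated as permanent.
    public static bool IsTransientStatus(int statusCode) =>
        (statusCode >= 500 && statusCode <= 599) || statusCode == 408 || statusCode == 429;

    // Network, timeout and cancellation failures are transient.
    // JsonException is permanent: a malformed payload will not fix itself on retry.
    public static bool IsTransient(Exception ex) => ex switch
    {
        JsonException => false,
        HttpRequestException => true,
        TimeoutException => true,
        OperationCanceledException => true, // also matches TaskCanceledException, its subclass
        _ => false                          // assumption: unlisted types are not retried
    };
}
```

Under these rules a `404` from `CachedCall` comes straight back to the script, while a `503` or a dropped connection is buffered for retry.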
**Synchronous external system call**: blocks until the response arrives or the timeout elapses:

```csharp
// Script code (via ScriptRuntimeContext)
var result = await ExternalSystem.Call("MES", "GetRecipe", new { RecipeId = 42 });
if (result.Success)
{
    var name = result.Response.recipeName; // dynamic JSON access
}
```

**Cached external system call**: attempts the HTTP request once; on a transient failure the call is buffered for retry and tracked by a `TrackedOperationId`:

```csharp
var tracked = await ExternalSystem.CachedCall("MES", "PostProductionResult", payload);
// tracked.WasBuffered == true when queued to S&F
```

**Synchronous database access**: caller controls the connection lifetime:

```csharp
await using var conn = await Database.Connection("HistorianDB");
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT TOP 1 Value FROM dbo.Tags WHERE Name = @name";

// CreateParameter works on the base DbCommand type, so this compiles
// whether the connection is typed as DbConnection or SqlConnection.
var nameParam = cmd.CreateParameter();
nameParam.ParameterName = "@name";
nameParam.Value = tagName;
cmd.Parameters.Add(nameParam);

var value = await cmd.ExecuteScalarAsync();
```

**Cached database write**: enqueued immediately; returns a `TrackedOperationId`:

```csharp
await Database.CachedWrite("MES_DB",
    "INSERT INTO dbo.ProductionLog (BatchId, Qty) VALUES (@batchId, @qty)",
    new { batchId = id, qty = quantity });
```

Call status is observable via `Tracking.Status(trackedOperationId)`, answered site-locally against the S&F tracking table, or centrally via the Site Call Audit page.

## Configuration

Options are bound from `ScadaBridge:ExternalSystemGateway` into `ExternalSystemGatewayOptions` by `AddExternalSystemGateway`.

| Key | Default | Description |
|-----|---------|-------------|
| `DefaultHttpTimeout` | `00:00:30` | Per-call HTTP round-trip timeout. Applied via `CancellationTokenSource`; overrides the framework 100 s default. |
| `MaxConcurrentConnectionsPerSystem` | `10` | `SocketsHttpHandler.MaxConnectionsPerServer` applied to each named `HttpClient` (`ExternalSystem_{name}`). Does not affect other host `HttpClient` instances. |

Per-system retry settings (`MaxRetries`, `RetryDelay`) are properties of `ExternalSystemDefinition` and `DatabaseConnectionDefinition`, authored by operators in the Central UI and deployed as part of the system artifact. The gateway passes these to the Store-and-Forward Engine on enqueue, substituting `null` for an unset (`0` or zero-length) value so the engine's own default applies.

There is no separate configuration section for database connections: connection strings reside in `DatabaseConnectionDefinition.ConnectionString`, deployed via artifact. Pool tuning (max pool size, connection lifetime) can be embedded in the connection string itself.

## Dependencies & Interactions

- [Commons (#16)](./Commons.md): owns `IExternalSystemClient`, `IDatabaseGateway`, `ExternalCallResult`, `TrackedOperationId`, `ExternalSystemDefinition`, `ExternalSystemMethod`, `DatabaseConnectionDefinition`, `IExternalSystemRepository`, and the `StoreAndForwardCategory` enum values consumed here.
- [Store-and-Forward Engine (#6)](./StoreAndForward.md): receives buffered `ExternalSystem` and `CachedDbWrite` payloads from `CachedCallAsync` / `CachedWriteAsync`; drives retry sweeps by calling `DeliverBufferedAsync` on both gateway types; assigns `TrackedOperationId` tracking rows; owns the site-local operation tracking table read by `Tracking.Status()`.
- [Configuration Database (#17)](./ConfigurationDatabase.md): provides `IExternalSystemRepository`, implemented against the site SQLite replica. Central uses the same interface against MS SQL for definition management.
- [Site Runtime (#3)](../requirements/Component-SiteRuntime.md): Script Execution Actors resolve `IExternalSystemClient` and `IDatabaseGateway` from DI and expose them to script code as `ExternalSystem` and `Database`. Actors run on a dedicated blocking I/O dispatcher to isolate HTTP and SQL waits from the actor system's default dispatcher.
- [Site Call Audit (#22)](./SiteCallAudit.md): receives cached-call lifecycle telemetry (via the combined `CachedCallTelemetry` packet) so cached call status is observable centrally; the gateway's S&F delivery writes the tracking row that `Tracking.Status()` reads.
- [Audit Log (#23)](./AuditLog.md): audit rows for `ApiOutbound` and `DbOutbound` channels are emitted by the Script Runtime Context around gateway calls; the gateway itself does not write audit rows directly. The `trackedOperationId`, `executionId`, and `parentExecutionId` threaded through `CachedCallAsync` / `CachedWriteAsync` keep audit rows correlated across the retry lifecycle.

## Troubleshooting

### A cached call is stuck retrying

A `MaxRetries` value of `0` on an external system definition or database connection does not mean "no retries". The S&F engine interprets a stored `0` as "no limit" (retry forever), so the gateway normalizes `0` to `null` on enqueue and the engine's bounded default applies instead. An operator who set `0` intending "no retries" will therefore still see the call retried up to that default. Verify that the definition's `MaxRetries` field is set to the intended value in the Central UI and that the definition has been redeployed.

### Timeout is not being respected

`ExternalSystemGatewayOptions.DefaultHttpTimeout` is the sole timeout source only while `HttpClient.Timeout` is `Timeout.InfiniteTimeSpan`, which the gateway sets explicitly on every factory-supplied client. If a custom `HttpMessageHandler` upstream resets `Timeout`, the gateway's `CancellationTokenSource(DefaultHttpTimeout)` is still the controlling token because `SendAsync` is called with the linked token, not the raw `cancellationToken`.

### Auth header not sent

The gateway logs a `Warning` when `AuthType` is `"apikey"` or `"basic"` but `AuthConfiguration` is empty or absent, and when `AuthType` is `"basic"` but `AuthConfiguration` has no `:` separator. Check the site log for `ApplyAuth:` warning messages. The credential value is never logged; only the system name and auth type appear.
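When diagnosing a missing header, it can help to run the `AuthConfiguration` value through the parsing rules by hand. The sketch below restates those rules as a pure function. It is not the gateway's `ApplyAuth` code: `AuthHeaderSketch.Build` is a hypothetical name, splitting on the first `:` and encoding basic credentials as UTF-8 are assumptions, and the real method logs a `Warning` where this sketch simply returns `null`.

```csharp
using System;
using System.Text;

// Hypothetical restatement of the auth rules described in this document.
public static class AuthHeaderSketch
{
    // Returns the header to attach, or null when no header can be built.
    public static (string Name, string Value)? Build(string authType, string authConfiguration)
    {
        if (authType == "none") return null;                           // silent no-op
        if (string.IsNullOrWhiteSpace(authConfiguration)) return null; // gateway would log a Warning

        int sep = authConfiguration.IndexOf(':');                      // assumption: split on first ':'
        switch (authType)
        {
            case "apikey":
                // "HeaderName:KeyValue", or a bare key sent under the default X-API-Key header.
                if (sep < 0) return ("X-API-Key", authConfiguration);
                return (authConfiguration.Substring(0, sep), authConfiguration.Substring(sep + 1));

            case "basic":
                if (sep < 0) return null;                              // malformed: no ':' separator
                // Assumption: credentials are encoded as UTF-8 before Base64.
                string token = Convert.ToBase64String(Encoding.UTF8.GetBytes(authConfiguration));
                return ("Authorization", "Basic " + token);

            default:
                return null;                                           // assumption: unknown types send nothing
        }
    }
}
```

For example, `Build("basic", "user:pass")` yields an `Authorization` header with the value `Basic dXNlcjpwYXNz`, while `Build("basic", "userpass")` yields `null`, which corresponds to the missing-separator warning described above.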
### A buffered call is parked immediately

A `JsonException` during `DeliverBufferedAsync` payload deserialization is treated as permanent (the same malformed payload will fail every time). The message is parked rather than retried. Check the site log for `"malformed JSON payload; parking"` alongside the message GUID, then inspect the S&F store for the payload to identify the serialization issue.

## Related Documentation

- [External System Gateway design specification](../requirements/Component-ExternalSystemGateway.md)
- [Store-and-Forward Engine](./StoreAndForward.md)
- [Site Call Audit](./SiteCallAudit.md)
- [Audit Log](./AuditLog.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)