Files
ScadaBridge/docs/components/ExternalSystemGateway.md
T

220 lines
16 KiB
Markdown

# External System Gateway
The External System Gateway gives site scripts two runtime capabilities: invoking HTTP/REST APIs on named external systems, and executing SQL writes against named database connections. Both capabilities expose a dual call mode — synchronous (blocking, result returned) and cached (store-and-forward on transient failure, `TrackedOperationId` returned) — so scripts choose the right delivery guarantee per operation without knowing the underlying retry machinery.
## Overview
External System Gateway (#7) runs exclusively at the site. Definitions — external system endpoints with their authentication and method catalogue, and database connection strings — are authored centrally and deployed to the site's local SQLite by the Deployment Manager. The site never reaches back to the configuration database at call time; the repository resolves each definition from SQLite on the hot path.
The component code lives in `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/`, with all four source files at the root:
- `ExternalSystemClient.cs``IExternalSystemClient` implementation; `CallAsync` (synchronous) and `CachedCallAsync` (store-and-forward on transient failure), plus the `DeliverBufferedAsync` entry point consumed by the Store-and-Forward Engine during retry sweeps.
- `DatabaseGateway.cs``IDatabaseGateway` implementation; `GetConnectionAsync` (ADO.NET `SqlConnection`) and `CachedWriteAsync` (S&F-buffered SQL), plus its own `DeliverBufferedAsync` for the retry path.
- `ErrorClassifier.cs` — static helper that maps HTTP status codes and exception types to `TransientExternalSystemException` / `PermanentExternalSystemException`.
- `ExternalSystemGatewayOptions.cs` — options class bound from `ScadaBridge:ExternalSystemGateway`.
- `ServiceCollectionExtensions.cs``AddExternalSystemGateway` extension; registers `ExternalSystemClient` and `DatabaseGateway` as scoped services and applies per-system connection limits to named `HttpClient` instances.
Both services are DI-scoped. Script Execution Actors (short-lived, per-invocation) resolve them; blocking I/O from both runs on a dedicated Akka.NET dispatcher to keep the default dispatcher free for coordination actors.
## Key Concepts
### Definitions at rest
An `ExternalSystemDefinition` carries the base `EndpointUrl`, `AuthType` (`"apikey"`, `"basic"`, or `"none"`), `AuthConfiguration` (the credential payload), and per-system retry settings (`MaxRetries`, `RetryDelay`). Its child `ExternalSystemMethod` records each carry `HttpMethod`, `Path` (relative to the base URL), and JSON-serialized `ParameterDefinitions` / `ReturnDefinition`. A `DatabaseConnectionDefinition` carries an ADO.NET `ConnectionString` and its own `MaxRetries` / `RetryDelay`.
Definitions are resolved from the site SQLite repository on every call via name-keyed indexed queries (`GetExternalSystemByNameAsync`, `GetDatabaseConnectionByNameAsync`) rather than a fetch-all-then-filter scan, because definitions are read on every script invocation.
### Dual call modes
Every API call and every database write has two modes:
| Mode | API surface | Failure behaviour | Return value |
|------|-------------|-------------------|--------------|
| Synchronous | `ExternalSystem.Call()` / `Database.Connection()` | All failures returned to script | Response JSON / `DbConnection` |
| Cached | `ExternalSystem.CachedCall()` / `Database.CachedWrite()` | Transient → buffered; permanent → returned | `TrackedOperationId` (on buffer) |
`CachedCallAsync` and `CachedWriteAsync` attempt immediate delivery first. Only a transient failure routes to the Store-and-Forward Engine.
### Error classification
`ErrorClassifier` is the single authority on what counts as transient:
- **HTTP status codes**: 5xx, 408 (Request Timeout), 429 (Too Many Requests) → transient. All other non-success 4xx → permanent.
- **Exceptions**: `HttpRequestException`, `TaskCanceledException`, `TimeoutException`, `OperationCanceledException` → transient. `JsonException` during payload deserialization → permanent (a malformed payload will not become well-formed on retry, so it is parked rather than retried forever).
Transient failures on `CachedCall` / `CachedWrite` are silently buffered (logged at `Debug`). Permanent failures are logged at `Warning` and returned to the calling script regardless of call mode, because a permanently-wrong request should surface immediately.
## Architecture
### HTTP invocation (`ExternalSystemClient`)
`InvokeHttpAsync` constructs the request, applies auth, dispatches, and classifies the response. The gateway creates a named `HttpClient` per system (`ExternalSystem_{systemName}`) through `IHttpClientFactory`, with `SocketsHttpHandler.MaxConnectionsPerServer` capped by `MaxConcurrentConnectionsPerSystem`. The framework default `HttpClient.Timeout` (100 s) is deliberately overridden to `Timeout.InfiniteTimeSpan` so the gateway's own `CancellationTokenSource(DefaultHttpTimeout)` is the sole timeout source — without this, configured timeouts above 100 s would be silently clipped.
Parameter routing by verb:
- `POST`, `PUT`, `PATCH` → JSON body (`application/json`).
- `GET`, `DELETE` → URL query string (null-valued parameters omitted; no trailing `?` when all values are null).
Auth application:
- `apikey``AuthConfiguration` format `"HeaderName:KeyValue"` or bare key value (default header `X-API-Key`).
- `basic``AuthConfiguration` format `"username:password"`, Base64-encoded as `Authorization: Basic ...`.
- `none` — silent no-op.
- Missing or malformed `AuthConfiguration` for a type that requires credentials logs a `Warning` but does not abort the call.
Error body embedded in script-visible messages is capped at 2 048 characters so a misbehaving endpoint cannot inflate error strings.
```csharp
// ExternalSystemClient.cs
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
// Our own timeout elapsed — a transient failure per the design.
throw ErrorClassifier.AsTransient(
$"Timeout calling {system.Name} after {_options.DefaultHttpTimeout.TotalSeconds:0.##}s", ex);
}
catch (Exception ex) when (ErrorClassifier.IsTransient(ex))
{
throw ErrorClassifier.AsTransient($"Connection error to {system.Name}: {ex.Message}", ex);
}
```
### `CachedCallAsync` — the buffered path
On a transient failure, `CachedCallAsync` serializes `{SystemName, MethodName, Parameters}` as JSON and calls `StoreAndForwardService.EnqueueAsync` with `StoreAndForwardCategory.ExternalSystem`. Three details matter for correct S&F integration:
- **`attemptImmediateDelivery: false`** — the HTTP attempt has already been made; passing `true` would dispatch the same request twice.
- **`MaxRetries` / `RetryDelay` defaulting** — `ExternalSystemDefinition.MaxRetries` defaults to `0`, and the S&F engine treats a stored `0` as "no limit". A `0` is therefore passed as `null` so the engine's own bounded default applies, avoiding unbounded retry loops on unconfigured systems.
- **`messageId: trackedOperationId`** — pins the S&F message GUID to the caller-supplied `TrackedOperationId` so the retry loop can emit per-attempt and terminal audit telemetry under the same tracking id.
```csharp
// ExternalSystemClient.cs — transient branch of CachedCallAsync
await _storeAndForward.EnqueueAsync(
StoreAndForwardCategory.ExternalSystem,
systemName,
payload,
originInstanceName,
system.MaxRetries > 0 ? system.MaxRetries : null,
system.RetryDelay > TimeSpan.Zero ? system.RetryDelay : null,
attemptImmediateDelivery: false,
messageId: trackedOperationId?.ToString(),
executionId: executionId,
sourceScript: sourceScript,
parentExecutionId: parentExecutionId);
return new ExternalCallResult(true, null, null, WasBuffered: true);
```
### `DeliverBufferedAsync` — S&F retry delivery
The Store-and-Forward Engine calls `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` during retry sweeps. Both methods:
1. Deserialize the payload JSON; treat `JsonException` as permanent (return `false` → park).
2. Re-resolve the definition by name; if gone, return `false` → park.
3. Execute the operation. `PermanentExternalSystemException` → park. `TransientExternalSystemException` propagates → engine retries.
### Database gateway (`DatabaseGateway`)
`GetConnectionAsync` resolves the `DatabaseConnectionDefinition`, opens a `SqlConnection` against `ConnectionString`, and returns the open connection. The caller owns disposal. If `OpenAsync` throws (unreachable server, bad credentials), the connection is disposed before the exception propagates.
`CachedWriteAsync` serializes `{ConnectionName, Sql, Parameters}` and enqueues to S&F under `StoreAndForwardCategory.CachedDbWrite`, with the same `MaxRetries` / `RetryDelay` defaulting logic as `CachedCallAsync`.
During retry delivery, `JsonElement` parameter values are converted with a numeric type preference of `long``decimal``double`. This matters because a script's decimal SQL parameter is serialized as an untagged JSON number; naively casting to `double` loses precision for money and measurement values.
```csharp
// DatabaseGateway.cs — JsonElementToParameterValue
JsonValueKind.Number => element.TryGetInt64(out var l)
? l
: element.TryGetDecimal(out var dec)
? dec
: element.GetDouble(),
```
## Usage
Scripts interact through `IExternalSystemClient` and `IDatabaseGateway`, which the Script Runtime Context exposes as `ExternalSystem` and `Database` respectively. Scripts never construct gateway types directly.
**Synchronous external system call** — blocks until the response arrives or the timeout elapses:
```csharp
// Script code (via ScriptRuntimeContext)
var result = await ExternalSystem.Call("MES", "GetRecipe", new { RecipeId = 42 });
if (result.Success)
{
var name = result.Response.recipeName; // dynamic JSON access
}
```
**Cached external system call** — returns immediately with a `TrackedOperationId`; the actual HTTP request is attempted once and, on transient failure, buffered for retry:
```csharp
var tracked = await ExternalSystem.CachedCall("MES", "PostProductionResult", payload);
// tracked.WasBuffered == true when queued to S&F
```
**Synchronous database access** — caller controls the connection lifetime:
```csharp
await using var conn = await Database.Connection("HistorianDB");
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT TOP 1 Value FROM dbo.Tags WHERE Name = @name";
cmd.Parameters.AddWithValue("@name", tagName);
var value = await cmd.ExecuteScalarAsync();
```
**Cached database write** — enqueued immediately; returns a `TrackedOperationId`:
```csharp
await Database.CachedWrite("MES_DB",
"INSERT INTO dbo.ProductionLog (BatchId, Qty) VALUES (@batchId, @qty)",
new { batchId = id, qty = quantity });
```
Call status is observable via `Tracking.Status(trackedOperationId)` — answered site-locally against the S&F tracking table, or centrally via the Site Call Audit page.
## Configuration
Options are bound from `ScadaBridge:ExternalSystemGateway` into `ExternalSystemGatewayOptions` by `AddExternalSystemGateway`.
| Key | Default | Description |
|-----|---------|-------------|
| `DefaultHttpTimeout` | `00:00:30` | Per-call HTTP round-trip timeout. Applied via `CancellationTokenSource`; overrides the framework 100 s default. |
| `MaxConcurrentConnectionsPerSystem` | `10` | `SocketsHttpHandler.MaxConnectionsPerServer` applied to each named `HttpClient` (`ExternalSystem_{name}`). Does not affect other host `HttpClient` instances. |
Per-system retry settings (`MaxRetries`, `RetryDelay`) are properties of `ExternalSystemDefinition` and `DatabaseConnectionDefinition`, authored by operators in the Central UI and deployed as part of the system artifact. The gateway passes these directly to the Store-and-Forward Engine on enqueue.
There is no separate configuration section for database connections — connection strings reside in `DatabaseConnectionDefinition.ConnectionString`, deployed via artifact. Pool tuning (max pool size, connection lifetime) can be embedded in the connection string itself.
## Dependencies & Interactions
- [Commons (#16)](./Commons.md) — owns `IExternalSystemClient`, `IDatabaseGateway`, `ExternalCallResult`, `TrackedOperationId`, `ExternalSystemDefinition`, `ExternalSystemMethod`, `DatabaseConnectionDefinition`, `IExternalSystemRepository`, and the `StoreAndForwardCategory` enum values consumed here.
- [Store-and-Forward Engine (#6)](./StoreAndForward.md) — receives buffered `ExternalSystem` and `CachedDbWrite` payloads from `CachedCallAsync` / `CachedWriteAsync`; drives retry sweeps by calling `DeliverBufferedAsync` on both gateway types; assigns `TrackedOperationId` tracking rows; owns the site-local operation tracking table read by `Tracking.Status()`.
- [Configuration Database (#17)](./ConfigurationDatabase.md) — provides `IExternalSystemRepository`, implemented against the site SQLite replica. Central uses the same interface against MS SQL for definition management.
- [Site Runtime (#3)](../requirements/Component-SiteRuntime.md) — Script Execution Actors resolve `IExternalSystemClient` and `IDatabaseGateway` from DI and expose them to script code as `ExternalSystem` and `Database`. Actors run on a dedicated blocking I/O dispatcher to isolate HTTP and SQL waits from the actor system's default dispatcher.
- [Site Call Audit (#22)](./SiteCallAudit.md) — receives cached-call lifecycle telemetry (via the combined `CachedCallTelemetry` packet) so cached call status is observable centrally; the gateway's S&F delivery writes the tracking row that `Tracking.Status()` reads.
- [Audit Log (#23)](./AuditLog.md) — audit rows for `ApiOutbound` and `DbOutbound` channels are emitted by the Script Runtime Context around gateway calls; gateway itself does not write audit rows directly. The `trackedOperationId`, `executionId`, and `parentExecutionId` threaded through `CachedCallAsync` / `CachedWriteAsync` keep audit rows correlated across the retry lifecycle.
## Troubleshooting
### A cached call is stuck retrying
If the external system definition or database connection has `MaxRetries = 0` and the operator intended "no retries", the S&F engine interprets `0` as "no limit" (retry forever). The gateway normalizes `0` to `null` on enqueue so the engine's bounded default applies. Verify the definition's `MaxRetries` field is set to the intended value in the Central UI and redeployed.
### Timeout is not being respected
`ExternalSystemGatewayOptions.DefaultHttpTimeout` applies only when `HttpClient.Timeout` is `Timeout.InfiniteTimeSpan`. The gateway sets this explicitly on every factory-supplied client. If a custom `HttpMessageHandler` upstream resets `Timeout`, the gateway's `CancellationTokenSource(DefaultHttpTimeout)` is still the controlling token because `SendAsync` is called with the linked token, not the raw `cancellationToken`.
### Auth header not sent
The gateway logs a `Warning` when `AuthType` is `"apikey"` or `"basic"` but `AuthConfiguration` is empty or absent, and when `AuthType` is `"basic"` but `AuthConfiguration` has no `:` separator. Check the site log for `ApplyAuth:` warning messages. The credential value is never logged — only the system name and auth type.
### A buffered call is parked immediately
A `JsonException` during `DeliverBufferedAsync` payload deserialization is treated as permanent (the same malformed payload will fail every time). The message is parked rather than retried. Check the site log for `"malformed JSON payload; parking"` alongside the message GUID, then inspect the S&F store for the payload to identify the serialization issue.
## Related Documentation
- [External System Gateway design specification](../requirements/Component-ExternalSystemGateway.md)
- [Store-and-Forward Engine](./StoreAndForward.md)
- [Site Call Audit](./SiteCallAudit.md)
- [Audit Log](./AuditLog.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)