docs(components): reference docs batch 2/4 — TemplateEngine, DeploymentManager, SiteRuntime, DataConnectionLayer, StoreAndForward, ExternalSystemGateway
@@ -0,0 +1,219 @@
# External System Gateway

The External System Gateway gives site scripts two runtime capabilities: invoking HTTP/REST APIs on named external systems, and executing SQL writes against named database connections. Both capabilities offer two call modes: synchronous (blocking, result returned) and cached (store-and-forward on transient failure, `TrackedOperationId` returned). Scripts choose the delivery guarantee per operation without knowing the underlying retry machinery.

## Overview

External System Gateway (#7) runs exclusively at the site. Definitions (external system endpoints with their authentication and method catalogue, and database connection strings) are authored centrally and deployed to the site's local SQLite by the Deployment Manager. The site never reaches back to the configuration database at call time; the repository resolves each definition from SQLite on the hot path.

The component code lives in `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/`, with all five source files at the root:

- `ExternalSystemClient.cs`: `IExternalSystemClient` implementation; `CallAsync` (synchronous) and `CachedCallAsync` (store-and-forward on transient failure), plus the `DeliverBufferedAsync` entry point consumed by the Store-and-Forward Engine during retry sweeps.
- `DatabaseGateway.cs`: `IDatabaseGateway` implementation; `GetConnectionAsync` (ADO.NET `SqlConnection`) and `CachedWriteAsync` (S&F-buffered SQL), plus its own `DeliverBufferedAsync` for the retry path.
- `ErrorClassifier.cs`: static helper that maps HTTP status codes and exception types to `TransientExternalSystemException` / `PermanentExternalSystemException`.
- `ExternalSystemGatewayOptions.cs`: options class bound from `ScadaBridge:ExternalSystemGateway`.
- `ServiceCollectionExtensions.cs`: `AddExternalSystemGateway` extension; registers `ExternalSystemClient` and `DatabaseGateway` as scoped services and applies per-system connection limits to named `HttpClient` instances.

Both services are DI-scoped. Script Execution Actors (short-lived, per-invocation) resolve them; blocking I/O from both runs on a dedicated Akka.NET dispatcher to keep the default dispatcher free for coordination actors.

## Key Concepts

### Definitions at rest

An `ExternalSystemDefinition` carries the base `EndpointUrl`, `AuthType` (`"apikey"`, `"basic"`, or `"none"`), `AuthConfiguration` (the credential payload), and per-system retry settings (`MaxRetries`, `RetryDelay`). Its child `ExternalSystemMethod` records each carry `HttpMethod`, `Path` (relative to the base URL), and JSON-serialized `ParameterDefinitions` / `ReturnDefinition`. A `DatabaseConnectionDefinition` carries an ADO.NET `ConnectionString` and its own `MaxRetries` / `RetryDelay`.

Definitions are resolved from the site SQLite repository on every call via name-keyed indexed queries (`GetExternalSystemByNameAsync`, `GetDatabaseConnectionByNameAsync`) rather than a fetch-all-then-filter scan, because definitions are read on every script invocation.
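
As a rough picture of the shapes described above, the definitions could be modelled as the records below. This is a sketch only: the property types, the `Name` and `Methods` members, and the record form itself are assumptions for illustration, not the actual Commons types.

```csharp
using System;
using System.Collections.Generic;

// Illustrative shapes only; member types are assumptions, not the real Commons types.
public sealed record ExternalSystemMethod(
    string Name,                 // assumed: method name used by scripts
    string HttpMethod,           // e.g. "GET", "POST"
    string Path,                 // relative to the system's EndpointUrl
    string ParameterDefinitions, // JSON-serialized
    string ReturnDefinition);    // JSON-serialized

public sealed record ExternalSystemDefinition(
    string Name,
    string EndpointUrl,
    string AuthType,             // "apikey", "basic", or "none"
    string? AuthConfiguration,   // credential payload
    int MaxRetries,
    TimeSpan RetryDelay,
    IReadOnlyList<ExternalSystemMethod> Methods); // assumed: child method records

public sealed record DatabaseConnectionDefinition(
    string Name,
    string ConnectionString,
    int MaxRetries,
    TimeSpan RetryDelay);
```
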
### Dual call modes

Every API call and every database write has two modes:

| Mode | API surface | Failure behaviour | Return value |
|------|-------------|-------------------|--------------|
| Synchronous | `ExternalSystem.Call()` / `Database.Connection()` | All failures returned to script | Response JSON / `DbConnection` |
| Cached | `ExternalSystem.CachedCall()` / `Database.CachedWrite()` | Transient → buffered; permanent → returned | `TrackedOperationId` (on buffer) |

`CachedCallAsync` and `CachedWriteAsync` attempt immediate delivery first. Only a transient failure routes to the Store-and-Forward Engine.

### Error classification

`ErrorClassifier` is the single authority on what counts as transient:

- **HTTP status codes**: 5xx, 408 (Request Timeout), 429 (Too Many Requests) → transient. All other non-success 4xx → permanent.
- **Exceptions**: `HttpRequestException`, `TaskCanceledException`, `TimeoutException`, `OperationCanceledException` → transient. `JsonException` during payload deserialization → permanent (a malformed payload will not become well-formed on retry, so it is parked rather than retried forever).

Transient failures on `CachedCall` / `CachedWrite` are silently buffered (logged at `Debug`). Permanent failures are logged at `Warning` and returned to the calling script regardless of call mode, because a permanently-wrong request should surface immediately.
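
The classification rules above can be sketched as two small predicates. This is a minimal illustration, not the real `ErrorClassifier`: the class name `ErrorClassifierSketch` is hypothetical, and treating unlisted exception types as permanent is an assumption the source does not state.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;

public static class ErrorClassifierSketch
{
    // 5xx, 408 and 429 are retryable; every other non-success code is not.
    public static bool IsTransient(int statusCode) =>
        statusCode is >= 500 and <= 599 || statusCode == 408 || statusCode == 429;

    public static bool IsTransient(Exception ex) => ex switch
    {
        JsonException => false,             // malformed payload never heals on retry
        HttpRequestException => true,
        TimeoutException => true,
        OperationCanceledException => true, // also matches TaskCanceledException
        _ => false,                         // assumption: unknown failures are permanent
    };
}
```

`TaskCanceledException` derives from `OperationCanceledException`, so one arm covers both entries in the list.
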
## Architecture

### HTTP invocation (`ExternalSystemClient`)

`InvokeHttpAsync` constructs the request, applies auth, dispatches, and classifies the response. The gateway creates a named `HttpClient` per system (`ExternalSystem_{systemName}`) through `IHttpClientFactory`, with `SocketsHttpHandler.MaxConnectionsPerServer` capped by `MaxConcurrentConnectionsPerSystem`. The framework default `HttpClient.Timeout` (100 s) is deliberately overridden to `Timeout.InfiniteTimeSpan` so the gateway's own `CancellationTokenSource(DefaultHttpTimeout)` is the sole timeout source. Without this override, configured timeouts above 100 s would be silently clipped.
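
The timeout arrangement described above can be sketched with a linked `CancellationTokenSource`. The names `TimeoutSketch`, `SendWithTimeoutAsync`, and `NeverRespondsHandler` are hypothetical, and the sketch assumes the client was already created with `Timeout = Timeout.InfiniteTimeSpan`; it is not the gateway's actual code.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Assumes client.Timeout is already Timeout.InfiniteTimeSpan,
    // so the token below is the only timeout in play.
    public static async Task<HttpResponseMessage> SendWithTimeoutAsync(
        HttpClient client, HttpRequestMessage request,
        TimeSpan timeout, CancellationToken cancellationToken)
    {
        using var timeoutCts = new CancellationTokenSource(timeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            timeoutCts.Token, cancellationToken);
        try
        {
            // Send with the linked token, not the raw caller token.
            return await client.SendAsync(request, linked.Token);
        }
        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
        {
            // Our own timer fired, as opposed to the caller cancelling.
            throw new TimeoutException($"Timed out after {timeout.TotalSeconds:0.##}s");
        }
    }
}

// Test double: a handler that never answers until it is cancelled.
public sealed class NeverRespondsHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(Timeout.InfiniteTimeSpan, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}
```

The exception filter is what separates "our timeout elapsed" from "the caller cancelled": only the first case is rewritten as a timeout.
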

Parameter routing by verb:

- `POST`, `PUT`, `PATCH` → JSON body (`application/json`).
- `GET`, `DELETE` → URL query string (null-valued parameters omitted; no trailing `?` when all values are null).
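
The query-string rule for `GET` and `DELETE` can be sketched as follows. `QueryStringSketch.AppendQuery` is a hypothetical helper; how the real gateway formats and escapes values is not stated in the source, so invariant-culture formatting and `Uri.EscapeDataString` are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class QueryStringSketch
{
    public static string AppendQuery(
        string url, IReadOnlyDictionary<string, object?> parameters)
    {
        var pairs = parameters
            .Where(p => p.Value is not null) // null-valued parameters are omitted
            .Select(p =>
                Uri.EscapeDataString(p.Key) + "=" +
                Uri.EscapeDataString(
                    Convert.ToString(p.Value, CultureInfo.InvariantCulture) ?? string.Empty))
            .ToList();

        // No trailing '?' when nothing survives the null filter.
        return pairs.Count == 0 ? url : url + "?" + string.Join("&", pairs);
    }
}
```
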

Auth application:

- `apikey`: `AuthConfiguration` format `"HeaderName:KeyValue"` or bare key value (default header `X-API-Key`).
- `basic`: `AuthConfiguration` format `"username:password"`, Base64-encoded as `Authorization: Basic ...`.
- `none`: silent no-op.
- Missing or malformed `AuthConfiguration` for a type that requires credentials logs a `Warning` but does not abort the call.
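
The auth rules above might look roughly like this in code. `AuthSketch.ApplyAuth` is a hypothetical stand-in: splitting on the first `:` and encoding credentials as UTF-8 are assumptions, and the boolean return is just a convenient way to show where the real gateway would log its `Warning`.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class AuthSketch
{
    // Returns false when credentials are required but unusable.
    // The caller would log a warning and still send the request.
    public static bool ApplyAuth(
        HttpRequestMessage request, string authType, string? authConfiguration)
    {
        switch (authType)
        {
            case "none":
                return true; // silent no-op

            case "apikey":
            {
                if (string.IsNullOrWhiteSpace(authConfiguration)) return false;
                var idx = authConfiguration.IndexOf(':');
                // "HeaderName:KeyValue", or a bare key under the default header.
                var header = idx > 0 ? authConfiguration[..idx] : "X-API-Key";
                var key = idx > 0 ? authConfiguration[(idx + 1)..] : authConfiguration;
                request.Headers.TryAddWithoutValidation(header, key);
                return true;
            }

            case "basic":
            {
                if (string.IsNullOrWhiteSpace(authConfiguration) ||
                    !authConfiguration.Contains(':')) return false;
                var token = Convert.ToBase64String(Encoding.UTF8.GetBytes(authConfiguration));
                request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);
                return true;
            }

            default:
                return false; // assumption: unknown auth types are reported, not thrown
        }
    }
}
```
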

Any error body embedded in a script-visible message is capped at 2,048 characters so a misbehaving endpoint cannot inflate error strings.

```csharp
// ExternalSystemClient.cs (excerpt: catch clauses following the SendAsync call)
catch (OperationCanceledException ex) when (timeoutCts.IsCancellationRequested)
{
    // Our own timeout elapsed, which is a transient failure per the design.
    throw ErrorClassifier.AsTransient(
        $"Timeout calling {system.Name} after {_options.DefaultHttpTimeout.TotalSeconds:0.##}s", ex);
}
catch (Exception ex) when (ErrorClassifier.IsTransient(ex))
{
    throw ErrorClassifier.AsTransient($"Connection error to {system.Name}: {ex.Message}", ex);
}
```

### `CachedCallAsync` — the buffered path

On a transient failure, `CachedCallAsync` serializes `{SystemName, MethodName, Parameters}` as JSON and calls `StoreAndForwardService.EnqueueAsync` with `StoreAndForwardCategory.ExternalSystem`. Three details matter for correct S&F integration:

- **`attemptImmediateDelivery: false`**: the HTTP attempt has already been made; passing `true` would dispatch the same request twice.
- **`MaxRetries` / `RetryDelay` defaulting**: `ExternalSystemDefinition.MaxRetries` defaults to `0`, and the S&F engine treats a stored `0` as "no limit". A `0` is therefore passed as `null` so the engine's own bounded default applies, avoiding unbounded retry loops on unconfigured systems.
- **`messageId: trackedOperationId`**: pins the S&F message GUID to the caller-supplied `TrackedOperationId` so the retry loop can emit per-attempt and terminal audit telemetry under the same tracking id.

```csharp
// ExternalSystemClient.cs: transient branch of CachedCallAsync
await _storeAndForward.EnqueueAsync(
    StoreAndForwardCategory.ExternalSystem,
    systemName,
    payload,
    originInstanceName,
    // 0 / zero means "unset": pass null so the engine's bounded default applies.
    system.MaxRetries > 0 ? system.MaxRetries : null,
    system.RetryDelay > TimeSpan.Zero ? system.RetryDelay : null,
    // The HTTP attempt was already made above; do not send it a second time.
    attemptImmediateDelivery: false,
    // Reuse the caller's tracking id as the S&F message id.
    messageId: trackedOperationId?.ToString(),
    executionId: executionId,
    sourceScript: sourceScript,
    parentExecutionId: parentExecutionId);

return new ExternalCallResult(true, null, null, WasBuffered: true);
```

### `DeliverBufferedAsync` — S&F retry delivery

The Store-and-Forward Engine calls `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` during retry sweeps. Both methods:

1. Deserialize the payload JSON; treat `JsonException` as permanent (return `false` → park).
2. Re-resolve the definition by name; if gone, return `false` → park.
3. Execute the operation. `PermanentExternalSystemException` → park. `TransientExternalSystemException` propagates → engine retries.
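
The three steps above can be sketched as a single method. Everything here is illustrative: the exception classes are local stand-ins for the real types, `BufferedCall` and the two delegates are hypothetical, and returning `true` on success is an assumption (the source only states that `false` means park).

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Local stand-ins for the real exception types.
public sealed class TransientExternalSystemException : Exception
{
    public TransientExternalSystemException(string message) : base(message) { }
}

public sealed class PermanentExternalSystemException : Exception
{
    public PermanentExternalSystemException(string message) : base(message) { }
}

// Hypothetical payload shape matching {SystemName, MethodName, Parameters}.
public sealed record BufferedCall(string SystemName, string MethodName, JsonElement Parameters);

public static class DeliverySketch
{
    // false = park the message; a thrown transient exception = retry later.
    public static async Task<bool> DeliverBufferedAsync(
        string payloadJson,
        Func<string, Task<bool>> systemExists, // hypothetical definition lookup
        Func<BufferedCall, Task> invoke)       // hypothetical HTTP invocation
    {
        BufferedCall? call;
        try
        {
            call = JsonSerializer.Deserialize<BufferedCall>(payloadJson); // step 1
        }
        catch (JsonException)
        {
            return false; // malformed payload: park
        }

        if (call is null || !await systemExists(call.SystemName))
            return false; // step 2: definition gone, park

        try
        {
            await invoke(call); // step 3
            return true;
        }
        catch (PermanentExternalSystemException)
        {
            return false; // park
        }
        // TransientExternalSystemException is deliberately not caught.
    }
}
```
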
### Database gateway (`DatabaseGateway`)

`GetConnectionAsync` resolves the `DatabaseConnectionDefinition`, opens a `SqlConnection` against `ConnectionString`, and returns the open connection. The caller owns disposal. If `OpenAsync` throws (unreachable server, bad credentials), the connection is disposed before the exception propagates.

`CachedWriteAsync` serializes `{ConnectionName, Sql, Parameters}` and enqueues to S&F under `StoreAndForwardCategory.CachedDbWrite`, with the same `MaxRetries` / `RetryDelay` defaulting logic as `CachedCallAsync`.

During retry delivery, `JsonElement` parameter values are converted with a numeric type preference of `long` → `decimal` → `double`. This matters because a script's decimal SQL parameter is serialized as an untagged JSON number; naively casting to `double` loses precision for money and measurement values.

```csharp
// DatabaseGateway.cs: JsonElementToParameterValue
JsonValueKind.Number => element.TryGetInt64(out var l)
    ? l
    : element.TryGetDecimal(out var dec)
        ? dec
        : element.GetDouble(),
```
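
To see the preference order in isolation, the same logic can be written as a standalone helper. `NumberSketch.ToParameterValue` is a hypothetical name, and the sketch assumes the element is already known to be a JSON number.

```csharp
using System;
using System.Text.Json;

public static class NumberSketch
{
    // Prefers long, then decimal, and falls back to double last.
    // Assumes element.ValueKind == JsonValueKind.Number.
    public static object ToParameterValue(JsonElement element)
    {
        if (element.TryGetInt64(out var l)) return l;
        if (element.TryGetDecimal(out var dec)) return dec;
        return element.GetDouble();
    }
}
```

For the JSON number `19.99` this yields a `decimal`, whereas a `double` would hold only the nearest binary approximation of that value.
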
## Usage

Scripts interact through `IExternalSystemClient` and `IDatabaseGateway`, which the Script Runtime Context exposes as `ExternalSystem` and `Database` respectively. Scripts never construct gateway types directly.

**Synchronous external system call.** Blocks until the response arrives or the timeout elapses:

```csharp
// Script code (via ScriptRuntimeContext)
var result = await ExternalSystem.Call("MES", "GetRecipe", new { RecipeId = 42 });
if (result.Success)
{
    var name = result.Response.recipeName; // dynamic JSON access
}
```

**Cached external system call.** The HTTP request is attempted once; on a transient failure the call is buffered for retry and a `TrackedOperationId` is returned:

```csharp
var tracked = await ExternalSystem.CachedCall("MES", "PostProductionResult", payload);
// tracked.WasBuffered == true when queued to S&F
```

**Synchronous database access.** The caller controls the connection lifetime:

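```csharp
await using var conn = await Database.Connection("HistorianDB");
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT TOP 1 Value FROM dbo.Tags WHERE Name = @name";

// AddWithValue exists only on SqlParameterCollection; building the parameter
// explicitly also works when the connection is typed as DbConnection.
var nameParam = cmd.CreateParameter();
nameParam.ParameterName = "@name";
nameParam.Value = tagName;
cmd.Parameters.Add(nameParam);

var value = await cmd.ExecuteScalarAsync();
```
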
**Cached database write.** Enqueued immediately; returns a `TrackedOperationId`:

```csharp
await Database.CachedWrite("MES_DB",
    "INSERT INTO dbo.ProductionLog (BatchId, Qty) VALUES (@batchId, @qty)",
    new { batchId = id, qty = quantity });
```

Call status is observable via `Tracking.Status(trackedOperationId)`, which is answered site-locally against the S&F tracking table, or centrally via the Site Call Audit page.

## Configuration

Options are bound from `ScadaBridge:ExternalSystemGateway` into `ExternalSystemGatewayOptions` by `AddExternalSystemGateway`.

| Key | Default | Description |
|-----|---------|-------------|
| `DefaultHttpTimeout` | `00:00:30` | Per-call HTTP round-trip timeout. Applied via `CancellationTokenSource`; overrides the framework 100 s default. |
| `MaxConcurrentConnectionsPerSystem` | `10` | `SocketsHttpHandler.MaxConnectionsPerServer` applied to each named `HttpClient` (`ExternalSystem_{name}`). Does not affect other host `HttpClient` instances. |
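
Assuming the host loads its settings from a standard .NET JSON configuration file such as `appsettings.json` (the source does not say which provider is used), the section with both keys at their documented defaults would look like this:

```json
{
  "ScadaBridge": {
    "ExternalSystemGateway": {
      "DefaultHttpTimeout": "00:00:30",
      "MaxConcurrentConnectionsPerSystem": 10
    }
  }
}
```

The colon in `ScadaBridge:ExternalSystemGateway` is the configuration key separator, which maps to one level of nesting in JSON.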

Per-system retry settings (`MaxRetries`, `RetryDelay`) are properties of `ExternalSystemDefinition` and `DatabaseConnectionDefinition`, authored by operators in the Central UI and deployed as part of the system artifact. The gateway passes these to the Store-and-Forward Engine on enqueue, substituting `null` for unset (`0`) values so the engine's bounded defaults apply.

There is no separate configuration section for database connections. Connection strings reside in `DatabaseConnectionDefinition.ConnectionString`, deployed via artifact. Pool tuning (max pool size, connection lifetime) can be embedded in the connection string itself.

## Dependencies & Interactions

- [Commons (#16)](./Commons.md): owns `IExternalSystemClient`, `IDatabaseGateway`, `ExternalCallResult`, `TrackedOperationId`, `ExternalSystemDefinition`, `ExternalSystemMethod`, `DatabaseConnectionDefinition`, `IExternalSystemRepository`, and the `StoreAndForwardCategory` enum values consumed here.
- [Store-and-Forward Engine (#6)](./StoreAndForward.md): receives buffered `ExternalSystem` and `CachedDbWrite` payloads from `CachedCallAsync` / `CachedWriteAsync`; drives retry sweeps by calling `DeliverBufferedAsync` on both gateway types; assigns `TrackedOperationId` tracking rows; owns the site-local operation tracking table read by `Tracking.Status()`.
- [Configuration Database (#17)](./ConfigurationDatabase.md): provides `IExternalSystemRepository`, implemented against the site SQLite replica. Central uses the same interface against MS SQL for definition management.
- [Site Runtime (#3)](../requirements/Component-SiteRuntime.md): Script Execution Actors resolve `IExternalSystemClient` and `IDatabaseGateway` from DI and expose them to script code as `ExternalSystem` and `Database`. Actors run on a dedicated blocking I/O dispatcher to isolate HTTP and SQL waits from the actor system's default dispatcher.
- [Site Call Audit (#22)](./SiteCallAudit.md): receives cached-call lifecycle telemetry (via the combined `CachedCallTelemetry` packet) so cached call status is observable centrally; the gateway's S&F delivery writes the tracking row that `Tracking.Status()` reads.
- [Audit Log (#23)](./AuditLog.md): audit rows for `ApiOutbound` and `DbOutbound` channels are emitted by the Script Runtime Context around gateway calls; the gateway itself does not write audit rows directly. The `trackedOperationId`, `executionId`, and `parentExecutionId` threaded through `CachedCallAsync` / `CachedWriteAsync` keep audit rows correlated across the retry lifecycle.

## Troubleshooting

### A cached call is stuck retrying

A `MaxRetries` of `0` does not mean "no retries": the S&F engine interprets a stored `0` as "no limit" (retry forever). To avoid unbounded loops, the gateway normalizes `0` to `null` on enqueue so the engine's bounded default applies. If a call retries more often than intended, verify that the definition's `MaxRetries` field is set to the intended value in the Central UI and that the definition has been redeployed.

### Timeout is not being respected

`ExternalSystemGatewayOptions.DefaultHttpTimeout` is the sole timeout only while `HttpClient.Timeout` is `Timeout.InfiniteTimeSpan`, which the gateway sets explicitly on every factory-supplied client. If a custom `HttpMessageHandler` upstream resets `Timeout`, the gateway's `CancellationTokenSource(DefaultHttpTimeout)` still bounds the call, because `SendAsync` is called with the linked token, not the raw `cancellationToken`.

### Auth header not sent

The gateway logs a `Warning` when `AuthType` is `"apikey"` or `"basic"` but `AuthConfiguration` is empty or absent, and when `AuthType` is `"basic"` but `AuthConfiguration` has no `:` separator. Check the site log for `ApplyAuth:` warning messages. The credential value is never logged; only the system name and auth type appear.

### A buffered call is parked immediately

A `JsonException` during `DeliverBufferedAsync` payload deserialization is treated as permanent (the same malformed payload will fail every time). The message is parked rather than retried. Check the site log for `"malformed JSON payload; parking"` alongside the message GUID, then inspect the S&F store for the payload to identify the serialization issue.

## Related Documentation

- [External System Gateway design specification](../requirements/Component-ExternalSystemGateway.md)
- [Store-and-Forward Engine](./StoreAndForward.md)
- [Site Call Audit](./SiteCallAudit.md)
- [Audit Log](./AuditLog.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)