Refine External System Gateway: protocol, auth, timeouts, error classification

Specify HTTP/REST with JSON as the invocation protocol. Add API key and Basic Auth
as outbound authentication modes. Add per-system call timeouts. Classify errors by
HTTP status for store-and-forward decisions (5xx/transient → retry, 4xx → permanent
error to script). Document ADO.NET connection pooling for database connections.
Update Store-and-Forward to clarify transient-only buffering.
Joseph Doherty
2026-03-16 07:57:00 -04:00
parent 19c7e6880f
commit 5fff1712a8
3 changed files with 89 additions and 2 deletions
+31 -2
@@ -31,10 +31,16 @@ Site clusters (executes calls directly to external systems). Central cluster (st
Each external system definition includes:
- **Name**: Unique identifier (e.g., "MES", "RecipeManager").
- **Connection Details**: Endpoint URL, authentication, protocol.
- **Retry Settings**: Max retry count, fixed time between retries (used by Store-and-Forward Engine).
- **Base URL**: The root endpoint URL for the external system (e.g., `https://mes.example.com/api`).
- **Authentication**: One of:
  - **API Key**: Header name (e.g., `X-API-Key`) and key value.
  - **Basic Auth**: Username and password.
- **Timeout**: Per-system timeout for all method calls (e.g., 30 seconds). Applies to the HTTP request round-trip.
- **Retry Settings**: Max retry count, fixed time between retries (used by Store-and-Forward Engine for transient failures only).
- **Method Definitions**: List of available API methods, each with:
  - Method name.
  - **HTTP method**: GET, POST, PUT, or DELETE.
  - **Path**: Relative path appended to the base URL (e.g., `/recipes/{id}`).
  - Parameter definitions (name, type).
  - Return type definition.
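
The definition fields above could be modeled roughly as follows. This is a minimal sketch, not the actual configuration schema: the class names, field names, the string-based type names, and the example retry values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ALLOWED_HTTP_METHODS = ("GET", "POST", "PUT", "DELETE")


@dataclass(frozen=True)
class MethodDefinition:
    """One callable API method on an external system (hypothetical model)."""
    name: str
    http_method: str
    path: str                                                  # relative to the base URL
    parameters: Dict[str, str] = field(default_factory=dict)  # name -> type name
    return_type: str = "object"

    def __post_init__(self):
        # The spec lists exactly four HTTP methods; reject anything else early.
        if self.http_method not in ALLOWED_HTTP_METHODS:
            raise ValueError(f"unsupported HTTP method: {self.http_method}")


@dataclass(frozen=True)
class ExternalSystemDefinition:
    """An external system definition (hypothetical model)."""
    name: str
    base_url: str
    auth_mode: str                    # assumed labels: "api_key" or "basic"
    auth_settings: Dict[str, str]
    timeout_seconds: float            # one timeout for every method on the system
    max_retries: int                  # used for transient failures only
    retry_interval_seconds: float     # fixed interval between retries
    methods: List[MethodDefinition] = field(default_factory=list)

    def method(self, name: str) -> MethodDefinition:
        """Look up a method definition by name."""
        for m in self.methods:
            if m.name == name:
                return m
        raise KeyError(name)


# Illustrative values only; the retry numbers are not taken from the spec.
mes = ExternalSystemDefinition(
    name="MES",
    base_url="https://mes.example.com/api",
    auth_mode="api_key",
    auth_settings={"header_name": "X-API-Key", "key_value": "secret"},
    timeout_seconds=30,
    max_retries=5,
    retry_interval_seconds=60,
    methods=[MethodDefinition("GetRecipe", "GET", "/recipes/{id}", {"id": "int"}, "Recipe")],
)
```

Keeping the timeout and retry settings on the system object rather than on each method mirrors the per-system granularity chosen in this change.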
@@ -58,6 +64,29 @@ Each database connection definition includes:
- Payload includes: connection name, SQL statement, serialized parameter values.
- If the database is unavailable, the write is buffered and retried per the connection's retry settings.
## Invocation Protocol
All external system calls are **HTTP/REST** with **JSON** serialization:
- The ESG acts as an HTTP client. The external system definition provides the base URL; each method definition specifies the HTTP method and relative path.
- Request parameters are serialized as JSON in the request body (POST/PUT) or as query parameters (GET/DELETE).
- Response bodies are deserialized from JSON into the method's defined return type.
- Credentials (API key header or Basic Auth header) are attached to every request per the system's authentication configuration.
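
As a rough illustration of the rules above, the sketch below assembles the URL and body for one call without sending anything. The helper name `build_request` is hypothetical. The spec also does not say how `{id}`-style path placeholders are filled, so the sketch assumes that parameters whose names match a placeholder go into the path and the rest go into the body or query string.

```python
import json
import re
from urllib.parse import quote, urlencode


def build_request(base_url, http_method, path, params):
    """Return (method, url, body) for one call. Hypothetical helper."""
    remaining = dict(params)

    # Assumption: a parameter named like a {placeholder} is consumed by the path.
    def fill(match):
        return quote(str(remaining.pop(match.group(1))), safe="")

    url = base_url.rstrip("/") + re.sub(r"\{(\w+)\}", fill, path)
    method = http_method.upper()

    if method in ("POST", "PUT"):
        # Remaining parameters travel as a JSON request body.
        body = json.dumps(remaining).encode("utf-8")
    elif method in ("GET", "DELETE"):
        # Remaining parameters travel as query parameters; no body is sent.
        body = None
        if remaining:
            url += "?" + urlencode(remaining)
    else:
        raise ValueError(f"unsupported HTTP method: {http_method}")
    return method, url, body
```

For example, `build_request("https://mes.example.com/api", "GET", "/recipes/{id}", {"id": 42, "verbose": "true"})` yields a GET to `https://mes.example.com/api/recipes/42?verbose=true` with no body.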
## Call Timeout & Error Handling
- Each external system definition specifies a **timeout** that applies to all method calls on that system.
- Error classification determines whether the Store-and-Forward Engine retries the call:
  - **Transient failures** (connection refused, timeout, HTTP 5xx): The call is routed to the Store-and-Forward Engine for retry per the system's retry settings. The script does **not** block waiting for eventual delivery; the call is buffered and the script continues.
  - **Permanent failures** (HTTP 4xx): No retry. The error is returned **synchronously** to the calling script for handling (log, notify, try different parameters, etc.). The failure is logged to Site Event Logging.
- This classification ensures the S&F buffer is not polluted with requests that will never succeed.
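
The classification can be sketched as a small pure function. The names `Outcome` and `classify` are hypothetical, and Python's built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions the real HTTP client raises. Status codes outside 2xx, 4xx, and 5xx are not covered by the rules above, so the sketch rejects them rather than guessing.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    TRANSIENT = "transient"   # hand off to the Store-and-Forward Engine
    PERMANENT = "permanent"   # return the error synchronously to the script


def classify(status: Optional[int] = None,
             error: Optional[BaseException] = None) -> Outcome:
    """Classify one call result from its HTTP status or raised exception."""
    if error is not None:
        # Connection refused and timeouts are transient per the spec.
        if isinstance(error, (ConnectionError, TimeoutError)):
            return Outcome.TRANSIENT
        raise error  # anything else is outside the documented rules
    if status is None:
        raise ValueError("need a status code or an exception")
    if 200 <= status < 300:
        return Outcome.SUCCESS
    if 500 <= status < 600:
        return Outcome.TRANSIENT
    if 400 <= status < 500:
        return Outcome.PERMANENT
    raise ValueError(f"status {status} is not covered by the classification")
```

Only a `TRANSIENT` outcome would ever reach the S&F buffer; `PERMANENT` goes straight back to the caller.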
## Database Connection Management
- Database connections use **standard ADO.NET connection pooling** per named connection. No custom pool management.
- Pool behavior (max pool size, connection lifetime, etc.) can be tuned via connection string parameters in the database connection definition if needed.
- Synchronous failures on `Database.Connection()` (e.g., unreachable server) return an error to the calling script, consistent with external system permanent failure handling.
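
To make the pool-tuning point concrete, the sketch below appends pooling keywords to a connection string. `Max Pool Size`, `Min Pool Size`, and `Connection Lifetime` are standard SqlClient connection string keywords; the helper name, the server name, and the numeric values are assumptions for illustration only.

```python
# Maps hypothetical setting names to SqlClient connection string keywords.
POOL_KEYWORDS = {
    "max_pool_size": "Max Pool Size",
    "min_pool_size": "Min Pool Size",
    "connection_lifetime": "Connection Lifetime",  # seconds
}


def with_pool_settings(connection_string, **settings):
    """Return the connection string with pool-tuning keywords appended."""
    parts = [p for p in connection_string.strip().split(";") if p.strip()]
    for name, value in settings.items():
        # An unknown setting name raises KeyError instead of being passed through.
        parts.append(f"{POOL_KEYWORDS[name]}={value}")
    return ";".join(parts) + ";"


tuned = with_pool_settings(
    "Server=sql01;Database=Recipes;",
    max_pool_size=50,
    connection_lifetime=300,
)
```

Because ADO.NET keys its pools by connection string, each named connection with a distinct string gets its own pool without any extra code.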
## Dependencies
- **Configuration Database (MS SQL)**: Stores external system and database connection definitions.
+2
@@ -51,6 +51,8 @@ Retry settings are defined on the **source entity** (not per-message):
The retry interval is **fixed** (not exponential backoff). Fixed interval is sufficient for the expected use cases.
**Note**: Only **transient failures** are eligible for store-and-forward buffering. For external system calls, transient failures are connection errors, timeouts, and HTTP 5xx responses. Permanent failures (HTTP 4xx) are returned directly to the calling script and are **not** queued for retry. This prevents the buffer from accumulating requests that will never succeed.
## Buffer Size
There is **no maximum buffer size**. Messages accumulate in the buffer until delivery succeeds or retries are exhausted and the message is parked. Storage is bounded only by available disk space on the site node.
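
The fixed-interval retry and parking behavior described above might look like the following sketch. The function name and the `"delivered"`/`"parked"` labels are hypothetical. It also assumes the initial attempt has already failed before the message was buffered, so `max_retries` counts retries only, and that `send` returns `True` on success and `False` on another transient failure.

```python
import time
from typing import Callable


def retry_buffered(send: Callable[[], bool],
                   max_retries: int,
                   interval_seconds: float,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry one buffered message at a fixed interval, then park it."""
    for _ in range(max_retries):
        sleep(interval_seconds)  # same wait every time: no exponential backoff
        if send():
            return "delivered"
    return "parked"  # retries exhausted
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.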
@@ -0,0 +1,56 @@
# External System Gateway Refinement — Design
**Date**: 2026-03-16
**Component**: External System Gateway (`Component-ExternalSystemGateway.md`)
**Status**: Approved
## Problem
The External System Gateway doc lacked specification for the invocation protocol, authentication methods, call timeouts, error classification for store-and-forward decisions, and database connection management.
## Decisions
### Invocation Protocol
- **HTTP/REST only** with **JSON** serialization. The ESG is an HTTP client with predefined endpoints.
- Method definitions include HTTP method (GET/POST/PUT/DELETE) and relative path.
- Parameters serialized as JSON body (POST/PUT) or query parameters (GET/DELETE).
### Outbound Authentication
- Two modes per external system definition:
  - **API Key**: Configurable header name and key value.
  - **Basic Auth**: Username and password, sent as standard HTTP Authorization header.
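
A minimal sketch of how the two modes could translate into request headers. The function name, the mode labels (`"api_key"`, `"basic"`), and the setting keys are hypothetical; the Basic scheme itself is the standard one, a base64-encoded `username:password` pair prefixed with `Basic `.

```python
import base64


def auth_headers(mode, **cfg):
    """Return the headers to attach to every request for one external system."""
    if mode == "api_key":
        # Configurable header name, e.g. X-API-Key.
        return {cfg["header_name"]: cfg["key_value"]}
    if mode == "basic":
        pair = f'{cfg["username"]}:{cfg["password"]}'.encode("utf-8")
        token = base64.b64encode(pair).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    raise ValueError(f"unsupported authentication mode: {mode}")
```

Both modes are static per system, which is what avoids the token lifecycle handling that OAuth2 would require.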
### Call Timeouts
- **Per-system timeout** — one timeout value applies to all methods on a given external system.
- Defined in the external system definition.
### Error Classification
- **Transient failures** (connection errors, timeouts, HTTP 5xx): Routed to Store-and-Forward for retry. Script does not block.
- **Permanent failures** (HTTP 4xx): No retry. Error returned synchronously to the calling script. Logged to Site Event Logging.
- S&F buffer only accepts transient failures to avoid accumulating unrecoverable requests.
### Permanent Failure Behavior
- Synchronous error back to script, consistent with DCL write failure handling.
### Database Connection Pooling
- Standard ADO.NET connection pooling per named connection. No custom pool logic.
- Pool tuning via connection string parameters if needed.
### Serialization
- JSON only, consistent with REST-only decision.
## Affected Documents
| Document | Change |
|----------|--------|
| `Component-ExternalSystemGateway.md` | Updated External System Definition fields (base URL, auth modes, timeout, HTTP method/path per method). Added 3 new sections: Invocation Protocol, Call Timeout & Error Handling, Database Connection Management. |
| `Component-StoreAndForward.md` | Clarified that only transient failures are buffered; 4xx errors are not queued. |
## Alternatives Considered
- **SOAP support**: Rejected — REST covers modern integrations; SOAP systems can be fronted by a thin REST wrapper.
- **OAuth2 client credentials**: Rejected — adds token lifecycle complexity at every site for marginal benefit; can be handled by a gateway/proxy.
- **Per-method timeouts**: Rejected — external systems tend to have consistent latency; per-system is the right granularity.
- **All failures retryable**: Rejected — retrying 4xx errors pollutes the S&F buffer with requests that will never succeed.
- **Custom connection pooling**: Rejected — ADO.NET pooling is battle-tested and handles this scenario natively.
- **XML serialization option**: Rejected — JSON-only is consistent with REST-only; XML systems can use a wrapper.