Refine External System Gateway: protocol, auth, timeouts, error classification

Specify HTTP/REST with JSON as the invocation protocol. Add API key and Basic Auth
as outbound authentication modes. Add per-system call timeouts. Classify errors by
HTTP status for store-and-forward decisions (5xx/transient → retry, 4xx → permanent
error to script). Document ADO.NET connection pooling for database connections.
Update Store-and-Forward to clarify transient-only buffering.
Joseph Doherty
2026-03-16 07:57:00 -04:00
parent 19c7e6880f
commit 5fff1712a8
3 changed files with 89 additions and 2 deletions


@@ -31,10 +31,16 @@ Site clusters (executes calls directly to external systems). Central cluster (st
Each external system definition includes:
- **Name**: Unique identifier (e.g., "MES", "RecipeManager").
- **Base URL**: The root endpoint URL for the external system (e.g., `https://mes.example.com/api`).
- **Authentication**: One of:
  - **API Key**: Header name (e.g., `X-API-Key`) and key value.
  - **Basic Auth**: Username and password.
- **Timeout**: Per-system timeout for all method calls (e.g., 30 seconds). Applies to the HTTP request round-trip.
- **Retry Settings**: Max retry count, fixed time between retries (used by Store-and-Forward Engine for transient failures only).
- **Method Definitions**: List of available API methods, each with:
  - Method name.
  - **HTTP method**: GET, POST, PUT, or DELETE.
  - **Path**: Relative path appended to the base URL (e.g., `/recipes/{id}`).
  - Parameter definitions (name, type).
  - Return type definition.
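As a rough illustration of how these fields fit together, the Python sketch below models a definition as plain dataclasses and derives the headers attached to each request. The class and function names (`ExternalSystemDefinition`, `auth_headers`, and so on) are hypothetical, not the component's actual schema. The Basic Auth branch uses the standard `Basic base64(username:password)` header form.

```python
import base64
from dataclasses import dataclass, field
from typing import Literal, Union

# Hypothetical in-memory model of an external system definition; field names
# mirror the bullets above but are illustrative, not the real schema.


@dataclass(frozen=True)
class ApiKeyAuth:
    header_name: str  # e.g. "X-API-Key"
    key: str


@dataclass(frozen=True)
class BasicAuth:
    username: str
    password: str


@dataclass(frozen=True)
class MethodDefinition:
    name: str
    http_method: Literal["GET", "POST", "PUT", "DELETE"]
    path: str  # relative to base_url, e.g. "/recipes/{id}"
    parameters: dict = field(default_factory=dict)  # parameter name -> type name
    return_type: str = "object"


@dataclass(frozen=True)
class ExternalSystemDefinition:
    name: str                      # e.g. "MES"
    base_url: str                  # e.g. "https://mes.example.com/api"
    auth: Union[ApiKeyAuth, BasicAuth]
    timeout_seconds: float         # one value for every method on this system
    max_retries: int               # used by Store-and-Forward, transient failures only
    retry_interval_seconds: float  # fixed interval, no backoff
    methods: tuple = ()            # tuple of MethodDefinition


def auth_headers(auth: Union[ApiKeyAuth, BasicAuth]) -> dict:
    """Return the header(s) attached to every request for this system."""
    if isinstance(auth, ApiKeyAuth):
        return {auth.header_name: auth.key}
    # Basic Auth: "Basic " followed by base64("username:password").
    raw = f"{auth.username}:{auth.password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```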
@@ -58,6 +64,29 @@ Each database connection definition includes:
- Payload includes: connection name, SQL statement, serialized parameter values.
- If the database is unavailable, the write is buffered and retried per the connection's retry settings.
## Invocation Protocol
All external system calls are **HTTP/REST** with **JSON** serialization:
- The ESG acts as an HTTP client. The external system definition provides the base URL; each method definition specifies the HTTP method and relative path.
- Request parameters are serialized as JSON in the request body (POST/PUT) or as query parameters (GET/DELETE).
- Response bodies are deserialized from JSON into the method's defined return type.
- Credentials (API key header or Basic Auth header) are attached to every request per the system's authentication configuration.
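Here is a minimal sketch of the request-building rules above, using only Python's standard library (`urllib.request.Request`). It assumes the caller has already computed the authentication headers. The helper name `build_request` is hypothetical. The sketch does not fill path placeholders such as `{id}`, because this section does not specify how they are populated.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url: str, http_method: str, path: str,
                  params: dict, auth_headers: dict) -> Request:
    """Turn a method definition plus call arguments into an HTTP request."""
    # Join base URL and relative path without doubling or dropping the slash.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json", **auth_headers}
    data = None
    if http_method in ("POST", "PUT"):
        # Parameters travel as a JSON request body.
        data = json.dumps(params).encode("utf-8")
        headers["Content-Type"] = "application/json"
    elif http_method in ("GET", "DELETE"):
        # Parameters travel as query-string arguments.
        if params:
            url += "?" + urlencode(params)
    else:
        raise ValueError(f"unsupported HTTP method: {http_method}")
    return Request(url, data=data, headers=headers, method=http_method)
```

You could send the result with `urllib.request.urlopen(req, timeout=...)` to pass the system's timeout down. Note, however, that urllib applies that value to each blocking socket operation (the connection attempt, individual reads), not to the whole round-trip. A strict round-trip limit therefore needs an overall deadline on top.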
## Call Timeout & Error Handling
- Each external system definition specifies a **timeout** that applies to all method calls on that system.
- Error classification determines whether the Store-and-Forward Engine retries the call:
  - **Transient failures** (connection refused, timeout, HTTP 5xx): The call is routed to the Store-and-Forward Engine for retry per the system's retry settings. The script does **not** block waiting for eventual delivery; the call is buffered and the script continues.
  - **Permanent failures** (HTTP 4xx): No retry. The error is returned **synchronously** to the calling script for handling (log, notify, try different parameters, etc.). The failure is logged to Site Event Logging.
- This classification ensures the S&F buffer is not polluted with requests that will never succeed.
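The classification rules can be written as a small pure function. In this sketch, a Python `ConnectionError` (which includes `ConnectionRefusedError`) or `TimeoutError` with no status code stands in for connection and timeout failures. A real HTTP client may wrap these in its own types; urllib, for instance, raises `HTTPError` (with `.code`) for 4xx/5xx responses and `URLError` for many connection failures. Outcomes the section does not mention are handled by assumption, as commented: 1xx/3xx statuses and other exceptions default to permanent, so nothing unexpected lands in the S&F buffer.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"      # deserialize the JSON body for the script
    TRANSIENT = "transient"  # hand the call to Store-and-Forward; script continues
    PERMANENT = "permanent"  # return the error synchronously and log it; no retry


def classify(status: Optional[int] = None,
             exc: Optional[BaseException] = None) -> Outcome:
    """Map an HTTP status code or a transport exception to an outcome."""
    if exc is not None:
        # Connection refused/reset and timeouts carry no status code;
        # the section treats both as transient.
        if isinstance(exc, (ConnectionError, TimeoutError)):
            return Outcome.TRANSIENT
        # Assumption: any other exception is not retried.
        return Outcome.PERMANENT
    if status is None:
        raise ValueError("expected an HTTP status code or an exception")
    if 500 <= status <= 599:
        return Outcome.TRANSIENT
    if 400 <= status <= 499:
        return Outcome.PERMANENT
    if 200 <= status <= 299:
        return Outcome.SUCCESS
    # Assumption: 1xx/3xx are not covered by the section; treat as permanent.
    return Outcome.PERMANENT
```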
## Database Connection Management
- Database connections use **standard ADO.NET connection pooling** per named connection. No custom pool management.
- Pool behavior (max pool size, connection lifetime, etc.) can be tuned via connection string parameters in the database connection definition if needed.
- Synchronous failures on `Database.Connection()` (e.g., unreachable server) return an error to the calling script, consistent with external system permanent failure handling.
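ADO.NET matches pools on the exact connection string, creating a separate pool for each distinct string. One pool per named connection therefore follows naturally, provided each definition always yields the same string. The sketch below appends the tuning keywords mentioned above in a fixed order. `Max Pool Size`, `Min Pool Size` and `Connection Lifetime` are SqlClient connection-string keywords. The helper and the server/database names are hypothetical.

```python
from typing import Optional


def pooled_connection_string(base: str,
                             max_pool_size: Optional[int] = None,
                             min_pool_size: Optional[int] = None,
                             connection_lifetime_s: Optional[int] = None) -> str:
    """Append SqlClient pooling keywords to a named connection's base string.

    Keywords are appended in a fixed order so the same definition always
    yields the same string, and therefore maps to the same ADO.NET pool.
    """
    parts = [base.rstrip(";")]
    if max_pool_size is not None:
        parts.append(f"Max Pool Size={max_pool_size}")
    if min_pool_size is not None:
        parts.append(f"Min Pool Size={min_pool_size}")
    if connection_lifetime_s is not None:
        parts.append(f"Connection Lifetime={connection_lifetime_s}")  # seconds
    return ";".join(parts)
```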
## Dependencies
- **Configuration Database (MS SQL)**: Stores external system and database connection definitions.


@@ -51,6 +51,8 @@ Retry settings are defined on the **source entity** (not per-message):
The retry interval is **fixed** (not exponential backoff). Fixed interval is sufficient for the expected use cases.

**Note**: Only **transient failures** are eligible for store-and-forward buffering. For external system calls, transient failures are connection errors, timeouts, and HTTP 5xx responses. Permanent failures (HTTP 4xx) are returned directly to the calling script and are **not** queued for retry. This prevents the buffer from accumulating requests that will never succeed.
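One way to picture the fixed-interval policy is a single step function that a buffer worker calls for each due message. The sketch makes three assumptions: the original failed call does not count against the retry budget; `deliver` returns `True` on success and `False` on a transient failure; and a message is parked once its retries are used up. Names such as `BufferedMessage` and `process` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BufferedMessage:
    payload: dict
    retries_used: int = 0
    next_attempt_at: float = 0.0  # epoch seconds; 0 means "due now"
    parked: bool = False


def process(msg: BufferedMessage,
            deliver: Callable[[dict], bool],
            now: float,
            max_retries: int,
            retry_interval_s: float) -> Optional[str]:
    """Run one retry for msg if it is due.

    Returns "delivered", "retry", "parked", or None when the message is
    already parked or not yet due.
    """
    if msg.parked or now < msg.next_attempt_at:
        return None
    if deliver(msg.payload):
        return "delivered"  # caller removes it from the buffer
    msg.retries_used += 1
    if msg.retries_used >= max_retries:
        msg.parked = True  # retries exhausted: park rather than drop
        return "parked"
    # Fixed interval: the same delay after every failure, no backoff.
    msg.next_attempt_at = now + retry_interval_s
    return "retry"
```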
## Buffer Size
There is **no maximum buffer size**. Messages accumulate in the buffer until delivery succeeds or retries are exhausted and the message is parked. Storage is bounded only by available disk space on the site node.


@@ -0,0 +1,56 @@
# External System Gateway Refinement — Design
**Date**: 2026-03-16
**Component**: External System Gateway (`Component-ExternalSystemGateway.md`)
**Status**: Approved
## Problem
The External System Gateway doc lacked specification for the invocation protocol, authentication methods, call timeouts, error classification for store-and-forward decisions, and database connection management.
## Decisions
### Invocation Protocol
- **HTTP/REST only** with **JSON** serialization. The ESG is an HTTP client with predefined endpoints.
- Method definitions include HTTP method (GET/POST/PUT/DELETE) and relative path.
- Parameters serialized as JSON body (POST/PUT) or query parameters (GET/DELETE).
### Outbound Authentication
- Two modes per external system definition:
  - **API Key**: Configurable header name and key value.
  - **Basic Auth**: Username and password, sent as standard HTTP Authorization header.
### Call Timeouts
- **Per-system timeout** — one timeout value applies to all methods on a given external system.
- Defined in the external system definition.
### Error Classification
- **Transient failures** (connection errors, timeouts, HTTP 5xx): Routed to Store-and-Forward for retry. Script does not block.
- **Permanent failures** (HTTP 4xx): No retry. Error returned synchronously to the calling script. Logged to Site Event Logging.
- S&F buffer only accepts transient failures to avoid accumulating unrecoverable requests.
### Permanent Failure Behavior
- Synchronous error back to script, consistent with DCL write failure handling.
### Database Connection Pooling
- Standard ADO.NET connection pooling per named connection. No custom pool logic.
- Pool tuning via connection string parameters if needed.
### Serialization
- JSON only, consistent with REST-only decision.
## Affected Documents
| Document | Change |
|----------|--------|
| `Component-ExternalSystemGateway.md` | Updated External System Definition fields (base URL, auth modes, timeout, HTTP method/path per method). Added 3 new sections: Invocation Protocol, Call Timeout & Error Handling, Database Connection Management. |
| `Component-StoreAndForward.md` | Clarified that only transient failures are buffered; 4xx errors are not queued. |
## Alternatives Considered
- **SOAP support**: Rejected — REST covers modern integrations; SOAP systems can be fronted by a thin REST wrapper.
- **OAuth2 client credentials**: Rejected — adds token lifecycle complexity at every site for marginal benefit; can be handled by a gateway/proxy.
- **Per-method timeouts**: Rejected — external systems tend to have consistent latency; per-system is the right granularity.
- **All failures retryable**: Rejected — retrying 4xx errors pollutes the S&F buffer with requests that will never succeed.
- **Custom connection pooling**: Rejected — ADO.NET pooling is battle-tested and handles this scenario natively.
- **XML serialization option**: Rejected — JSON-only is consistent with REST-only; XML systems can use a wrapper.