# Code Review — ExternalSystemGateway | Field | Value | |-------|-------| | Module | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway` | | Design doc | `docs/requirements/Component-ExternalSystemGateway.md` | | Status | Reviewed | | Last reviewed | 2026-05-28 | | Reviewer | claude-agent | | Commit reviewed | `1eb6e97` | | Open findings | 0 | ## Summary The External System Gateway is a small module (five source files plus options) that implements the HTTP/REST client (`ExternalSystemClient`), the database access surface (`DatabaseGateway`), and error classification (`ErrorClassifier`). The structure is clean and the dual call-mode semantics broadly match the design doc. However, the review surfaced several substantive problems that prevent the module from behaving as designed. The most serious is that **no store-and-forward delivery handler is ever registered** for the `ExternalSystem` or `CachedDbWrite` categories, so cached calls and cached writes are buffered but can never actually be delivered on retry — a silent data-loss path. Two further high-impact issues are that the **per-system call timeout is never applied** to the HTTP client (the design's central error-handling guarantee is absent), and that **`CachedCall` double-dispatches the HTTP request** because `StoreAndForwardService.EnqueueAsync` itself re-attempts immediate delivery, breaking the idempotency expectations. A cluster of medium issues concern resource leaks, classification gaps (cancellation conflation), and the dropped `StoreAndForwardResult`. Test coverage is thin — `CachedCall` transient/buffering paths and `DatabaseGateway` are entirely untested. Themes: incomplete wiring against the S&F engine, and design-doc requirements (timeout, retry settings) that are declared but not implemented. #### Re-review 2026-05-17 (commit `39d737e`) All fourteen prior findings remain `Resolved`; the resolutions for findings 001–014 were spot-checked against the current source and hold. The re-review walked the full 10-category checklist again and surfaced **three new findings**. The most serious (`ExternalSystemGateway-015`, High) is a regression *introduced by* the `ExternalSystemGateway-004` resolution: `CachedCall`/`CachedWrite` now pass a per-system/per-connection `MaxRetries` of `0` through verbatim, but `StoreAndForwardService.RetryMessageAsync` interprets a stored `MaxRetries` of `0` as **retry forever**, not "never retry" — so the very `0` the ESG-004 fix claims to "honour as never retry" actually produces an unbounded retry loop, and two ESG tests assert the broken behaviour. `ExternalSystemGateway-016` (Medium) is that the `ExternalSystemGateway-013` resolution used `ConfigureHttpClientDefaults`, which is a **process-global** registration — it forces a `SocketsHttpHandler` (capped at the ESG option) onto every `HttpClient` in the host, including the Notification Service's OAuth2 token client, not just the gateway's per-system clients. `ExternalSystemGateway-017` (Low) is a trailing-`?` URL nit when a GET method's parameters are all null. Theme: both substantive findings are second-order defects in earlier fixes — the earlier resolutions did not verify the downstream contract of the S&F engine they integrate with. #### Re-review 2026-05-28 (commit `1eb6e97`) All seventeen prior findings (001–017) remain `Resolved`; spot-checks against the current source confirm the fixes still hold. Between `39d737e` and this re-review the only source changes to the module are the documentation-only commit `1eb6e97` (XML doc additions) and the `executionId` / `sourceScript` / `parentExecutionId` plumbing threaded through `CachedCallAsync` / `CachedWriteAsync` to the S&F enqueue (Audit Log #23 Tasks 4/6). The re-review walked the full 10-category checklist again and surfaced **six new findings**, none Critical. The most serious (`ExternalSystemGateway-018`, High) is that `DeliverBufferedAsync` on both `ExternalSystemClient` and `DatabaseGateway` lets a `JsonException` from `JsonSerializer.Deserialize` propagate out of the delivery handler — the S&F engine treats any thrown exception as a transient retry, so a corrupted or schema-incompatible buffered row becomes a permanent poison message that is retried on every sweep forever (the same retry-forever class of hazard `-015` already addressed for a different cause). `ExternalSystemGateway-019` (Medium) is that `HttpClient.Timeout` is never set, so any operator-configured `DefaultHttpTimeout` greater than 100s is silently clipped by `HttpClient`'s built-in 100s default and the gateway's "timeout applies to the HTTP request round-trip" guarantee no longer holds — a partial reopen of the `-002` contract for the long-timeout case. `ExternalSystemGateway-020` (Medium) is a silent precision-loss bug in the cached-DB-write retry path: `JsonElementToParameterValue` collapses any JSON number that is not Int64-convertible to `double`, so a script's `decimal` SQL parameter is downcast on retry and only on retry. The remaining three (`-021`/`-022`/`-023`, Low) are an unauthenticated-by-default `ApplyAuth` for unknown `AuthType` / malformed Basic config, runtime-only HTTP-verb validation, and an undocumented PATCH HTTP method (code vs design-doc drift). Theme: every new finding is in a code path that was added or touched by the earlier fix bundle but whose error-propagation contract was not verified end-to-end against the S&F engine or the design doc. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ☑ | Prior: URL edge cases, dropped S&F result, classification — 003, 006, 009. Re-review: `MaxRetries == 0` retry-forever vs never-retry contradiction (015); trailing-`?` URL nit (017). | | 2 | Akka.NET conventions | ☑ | No actors in this module; `AddExternalSystemGatewayActors` is a no-op. Blocking-I/O isolation is delegated to Site Runtime. No issues found. | | 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; the S&F delivery handlers resolve in a fresh DI scope on the sweep thread. No findings raised. | | 4 | Error handling & resilience | ☑ | Prior: handler registration, double-dispatch, timeout, cancellation — 001, 002, 003, 008. Re-review: the unbounded-retry consequence of finding 015 is also a resilience defect (recorded under category 1). | | 5 | Security | ☑ | Error bodies now truncated (007). No new issues — auth secrets not logged, body capped. | | 6 | Performance & resource management | ☑ | Prior: disposal and repository-scan findings 005, 010, 011 — all resolved and verified. No new issues. | | 7 | Design-document adherence | ☑ | Prior: timeout, retry settings, logging — 002, 004, 012. Re-review: finding 015 is also a design-adherence gap (S&F retry contract); recorded under category 1. | | 8 | Code organization & conventions | ☑ | Prior: `MaxConcurrentConnectionsPerSystem` wiring — 013. Re-review: that wiring uses process-global `ConfigureHttpClientDefaults`, leaking the ESG cap onto every host `HttpClient` — finding 016. | | 9 | Testing coverage | ☑ | Coverage is broad after finding 014. Re-review note: the `ZeroMaxRetries...` tests assert the persisted column, not the sweep outcome, and so lock in the finding-015 defect. | | 10 | Documentation & comments | ☑ | Inline comments at `ExternalSystemClient.cs:118-119` / `DatabaseGateway.cs:99-101` assert a "never retry" semantic that the code does not deliver — see finding 015. | _Re-review (2026-05-28, `1eb6e97`):_ | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ☑ | `JsonException` not caught in either `DeliverBufferedAsync`, so a corrupt buffered payload becomes a permanent poison-message retried forever — finding 018. `JsonElementToParameterValue` collapses a non-Int64 number to `double`, silently losing precision for `decimal` SQL parameters on cached-write retry — finding 020. `new HttpMethod(method.HttpMethod)` accepts any string at runtime, so an invalid HTTP verb is only diagnosed at call time — finding 022. | | 2 | Akka.NET conventions | ☑ | Still no actors in this module; `AddExternalSystemGatewayActors` remains a no-op. The cached-call lifecycle/audit emission lives in `ScriptRuntimeContext` / `CachedCallTelemetryForwarder` (SiteRuntime / AuditLog), not here, and that boundary is correct. No issues found. | | 3 | Concurrency & thread safety | ☑ | Services are still stateless and DI-scoped; the S&F delivery handlers resolve in a fresh DI scope on the sweep thread. The added `executionId` / `sourceScript` / `parentExecutionId` plumbing flows through method arguments only — no shared state introduced. No findings. | | 4 | Error handling & resilience | ☑ | The poison-payload retry-forever path is the headline resilience issue (finding 018). `HttpClient.Timeout` not being set leaves the gateway's per-call round-trip cap clipped to the framework's 100s default whenever the configured `DefaultHttpTimeout` is larger — finding 019 (partial reopen of the `-002` contract). | | 5 | Security | ☑ | Auth secrets still never logged; error bodies still truncated. `ApplyAuth` is silent on unknown `AuthType` / empty `AuthConfiguration` / malformed Basic config — finding 021 (fail-open is a real but bounded risk; recorded Low because misconfiguration is the precondition). Connection-string handling in `DatabaseGateway` reads from the entity verbatim and never logs it. | | 6 | Performance & resource management | ☑ | Disposal paths from findings 005/010 still hold. The `IHttpClientFactory` name-keyed-options registration (finding 016 fix) creates a fresh `SocketsHttpHandler` per primary-handler build — acceptable because `IHttpClientFactory` recycles handlers. No new findings. | | 7 | Design-document adherence | ☑ | The design doc enumerates GET/POST/PUT/DELETE but the code also serializes a body for PATCH (and accepts arbitrary HTTP verbs at runtime) — finding 023 (drift to be reconciled). The per-call timeout guarantee is partially defeated by the unset `HttpClient.Timeout` for option values > 100s — finding 019. | | 8 | Code organization & conventions | ☑ | The `-016` fix replaced `ConfigureHttpClientDefaults` with a scoped `IConfigureNamedOptions` — verified clean, no new conventions issue. `internal virtual CreateConnection` (DatabaseGateway) and `internal InvokeHttpAsync` (ExternalSystemClient) are exposed via `InternalsVisibleTo` for tests — acceptable. No new findings. | | 9 | Testing coverage | ☑ | The `JsonException` deserialization path for `DeliverBufferedAsync` is untested; the `JsonElementToParameterValue` `double`-downcast path is untested; `ApplyAuth`'s unknown-AuthType / empty-config / malformed-Basic branches are untested. Recorded under findings 018 / 020 / 021 rather than a standalone coverage finding. | | 10 | Documentation & comments | ☑ | XML doc additions in `1eb6e97` are accurate and consistent. PATCH support is undocumented in the design doc (finding 023). The inline `ExternalSystemGateway-015` block-comment in `CachedCallAsync` (lines 126–133) and the equivalent in `DatabaseGateway.cs:106–113` now correctly describe the "treat 0 as unset" semantics. | ## Findings ### ExternalSystemGateway-001 — No S&F delivery handler registered; cached calls and writes can never be delivered | | | |--|--| | Severity | Critical | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:109`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:81` | **Description** `CachedCallAsync` and `CachedWriteAsync` enqueue messages under `StoreAndForwardCategory.ExternalSystem` and `StoreAndForwardCategory.CachedDbWrite`. `StoreAndForwardService.RegisterDeliveryHandler` is the only mechanism that lets the S&F engine actually deliver a buffered message, and a repository-wide search shows it is **never called for either category** anywhere in the codebase. Consequences: 1. On a transient failure, `EnqueueAsync` falls through to the "No handler registered — buffer for later" branch (`StoreAndForwardService.cs:163`) and the message is persisted. 2. During the retry sweep, `AttemptDeliveryAsync` (`StoreAndForwardService.cs:201`) logs `"No delivery handler for category {Category}"` and returns without ever removing or delivering the message. The result is that every cached external call and cached DB write is silently buffered forever and never delivered — a data-loss path for the exact "deferred delivery is acceptable" use cases the design doc calls out (posting production data, quality reports). The script also receives `WasBuffered: true` / a successful `CachedWriteAsync` completion, so the failure is completely invisible. **Recommendation** Register delivery handlers for `StoreAndForwardCategory.ExternalSystem` and `StoreAndForwardCategory.CachedDbWrite` during host/site startup. The `ExternalSystem` handler should deserialize the payload, re-resolve the system/method, and re-invoke `InvokeHttpAsync`, returning `true`/`false`/throwing per the transient-vs-permanent contract `EnqueueAsync` expects. The `CachedDbWrite` handler should execute the SQL against the named connection. Add an integration test that buffers a message and verifies it is delivered by a retry sweep. **Resolution** Resolved 2026-05-16. Delivery handlers for `StoreAndForwardCategory.ExternalSystem` and `CachedDbWrite` are now registered at site startup in `AkkaHostedService`, after `StoreAndForwardService.StartAsync()`. Each handler resolves its consumer in a fresh DI scope and calls a new `DeliverBufferedAsync`: `ExternalSystemClient.DeliverBufferedAsync` re-resolves the system/method and re-invokes `InvokeHttpAsync`, and `DatabaseGateway.DeliverBufferedAsync` executes the buffered SQL — each returning `true` on success, `false` (park) when the target no longer exists or fails permanently, and throwing on transient failure so the engine retries. `EnqueueAsync` gained an `attemptImmediateDelivery` parameter; `CachedCallAsync` passes `false` so registering the handler does not dispatch the request twice (the double-dispatch noted in `ExternalSystemGateway-003`). Regression tests cover the success, target-removed and transient-retry paths. Fixed by the commit whose message references `ExternalSystemGateway-001`. ### ExternalSystemGateway-002 — Per-system call timeout is never applied to HTTP requests | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:130`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ServiceCollectionExtensions.cs:13` | **Description** The design doc states each external system definition specifies a timeout that "applies to all method calls on that system" and "applies to the HTTP request round-trip", and `ExternalSystemGatewayOptions.DefaultHttpTimeout` exists as a fallback. In practice no timeout is ever configured. `ServiceCollectionExtensions` calls `services.AddHttpClient()` with no per-named-client configuration, and `InvokeHttpAsync` calls `_httpClientFactory.CreateClient($"ExternalSystem_{system.Name}")` without setting `client.Timeout` or passing a `CancellationToken` derived from a timeout. `SendAsync` is therefore subject only to `HttpClient`'s default 100-second timeout, regardless of the system definition or the configured `DefaultHttpTimeout`. A slow or hung external system will block the calling Script Execution Actor far longer than the operator configured, and the design's core error-handling guarantee (timeout → transient classification) does not hold within the intended window. There is also no `Timeout` field on `ExternalSystemDefinition` at all, so even a correct implementation has nowhere to read the per-system value from — the entity is missing the field the design requires. **Recommendation** Add a `Timeout` (TimeSpan) field to `ExternalSystemDefinition` and have `InvokeHttpAsync` enforce it — either by setting `client.Timeout` via a typed/named `HttpClient` registration, or by linking a `CancellationTokenSource` with the per-system (or `DefaultHttpTimeout`) timeout to the supplied `cancellationToken` before `SendAsync`. Ensure the resulting `TaskCanceledException`/`TimeoutException` is classified as transient. **Resolution** Resolved 2026-05-16 (commit ``). `InvokeHttpAsync` now enforces a call timeout: `ExternalSystemClient` takes an `IOptions` and links a `CancellationTokenSource(DefaultHttpTimeout)` with the caller's token before `SendAsync` and the response-body read, so the design's "timeout applies to the HTTP request round-trip" guarantee now holds within the configured window (default 30s) instead of `HttpClient`'s default 100s. A timeout is reclassified as a `TransientExternalSystemException`; a caller-initiated cancellation is distinguished from a timeout and propagated as `OperationCanceledException` rather than being swallowed as transient. Regression tests: `Call_SlowSystem_TimesOutAsTransientErrorWithinConfiguredWindow` and `Call_CallerCancellation_IsNotMisreportedAsTimeout`. Note (partial scope): the per-*system* `Timeout` field on `ExternalSystemDefinition` remains unimplemented — adding it requires a change to `ZB.MOM.WW.ScadaBridge.Commons`, which is outside this module's edit scope. Until that entity field exists, the configured `DefaultHttpTimeout` is the effective per-call limit for every system. A follow-up against the Commons module should add the `Timeout` field and have `InvokeHttpAsync` prefer it over the default. This is a tracked follow-up, not a regression. ### ExternalSystemGateway-003 — `CachedCall` double-dispatches the HTTP request | | | |--|--| | Severity | High | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:84-117` | **Description** `CachedCallAsync` first calls `InvokeHttpAsync` directly (line 86). On a `TransientExternalSystemException` it then calls `_storeAndForward.EnqueueAsync(...)` (line 109). `StoreAndForwardService.EnqueueAsync` is **not** a pure enqueue — it "Attempts immediate delivery" by invoking the registered delivery handler (`StoreAndForwardService.cs:128-159`). If a delivery handler for the `ExternalSystem` category is registered (as finding 001 recommends), the HTTP request will be executed a **second time** synchronously inside `EnqueueAsync`, immediately after the first attempt failed. For a transient failure that is actually a slow/overloaded system, this doubles the load and — critically — if the original request did reach the external system, the immediate retry produces a duplicate delivery before the script even returns, worsening the idempotency hazard the design doc explicitly warns about. **Recommendation** Decide on one dispatch path. Either (a) have `CachedCall` not pre-invoke `InvokeHttpAsync` and instead let `EnqueueAsync`'s immediate-delivery attempt be the single first attempt (requires the handler to exist and to surface permanent vs transient correctly); or (b) add an enqueue-only entry point to `StoreAndForwardService` that skips the immediate-delivery attempt, and have `CachedCall` use it after its own first attempt. Approach (a) is cleaner and removes the duplicated logic. **Resolution** Resolved 2026-05-16 (commit ``). Re-triage: this finding was already fixed in the codebase as a side effect of the `ExternalSystemGateway-001` fix and is no longer reproducible against the current source. `StoreAndForwardService.EnqueueAsync` gained an `attemptImmediateDelivery` parameter (recommendation approach (b)), and `CachedCallAsync` passes `attemptImmediateDelivery: false` after its own first HTTP attempt — so `EnqueueAsync` buffers the message for the background retry sweep without re-invoking the registered delivery handler, eliminating the duplicate dispatch. A dedicated regression test, `CachedCall_TransientFailure_DoesNotImmediatelyRedispatchViaRegisteredHandler`, was added in this module's test suite: it registers a counting delivery handler, drives a `CachedCall` whose HTTP attempt fails transiently, and asserts the handler is invoked zero times during enqueue. The test was verified to fail if `attemptImmediateDelivery` is flipped back to `true`. ### ExternalSystemGateway-004 — System retry settings are not honoured for cached calls/writes | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:114-115`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:86-87` | **Description** `CachedCallAsync` and `CachedWriteAsync` pass the definition's `MaxRetries` / `RetryDelay` to `EnqueueAsync` only when they are non-default (`MaxRetries > 0 ? ... : null`, `RetryDelay > TimeSpan.Zero ? ... : null`), otherwise falling back to the S&F defaults. The site-side repository that supplies these definitions, `SiteExternalSystemRepository.MapExternalSystem` (`src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Repositories/SiteExternalSystemRepository.cs:194`), never reads `MaxRetries`/`RetryDelay` from SQLite at all — the constructed entities always have `MaxRetries == 0` and `RetryDelay == TimeSpan.Zero`. As a result, at sites the per-system retry settings the design doc requires are *always* discarded and the global S&F defaults are silently used instead. The `> 0` guard in the ESG also makes a legitimately-configured `MaxRetries` of 0 ("never retry") indistinguishable from "unset", so an operator cannot express "do not retry". **Recommendation** Within this module, drop the `> 0` / `> Zero` guards and pass the definition values through directly (or use nullable fields on the entity to distinguish "unset"). The companion fix in `SiteExternalSystemRepository` to actually map the retry columns should be tracked against the SiteRuntime module. **Resolution** Resolved 2026-05-16 (commit pending). `CachedCallAsync` and `CachedWriteAsync` now pass the definition's `MaxRetries` to `EnqueueAsync` verbatim — the `> 0` guard is dropped, so a legitimately-configured `MaxRetries` of 0 ("never retry") is honoured instead of being collapsed to the S&F default. The `RetryDelay > TimeSpan.Zero` guard is deliberately **kept**: `TimeSpan.Zero` is the entity default for an unconfigured field and a literal zero-delay retry loop is not a valid configuration, so falling back to the S&F default interval for an unset delay is correct (only `MaxRetries == 0` is a meaningful operator choice). Regression test `CachedCall_TransientFailure_ZeroMaxRetriesIsHonouredNotTreatedAsUnset` buffers a transient failure and asserts the buffered message carries `MaxRetries == 0` rather than the S&F default; `CachedCall_TransientFailure_BuffersWithSystemRetrySettings` additionally covers a non-default settings pass-through. The companion fix in `SiteExternalSystemRepository.MapExternalSystem` to actually read the `MaxRetries` / `RetryDelay` columns from SQLite remains a tracked follow-up against the SiteRuntime module (outside this module's edit scope) — until then, sites still supply `MaxRetries == 0`, which this fix now correctly honours as "never retry". ### ExternalSystemGateway-005 — `HttpRequestMessage` and `HttpResponseMessage` are not disposed | | | |--|--| | Severity | Medium | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:133-167` | **Description** `InvokeHttpAsync` creates an `HttpRequestMessage` (line 133) and receives an `HttpResponseMessage` from `SendAsync` (line 155); neither is wrapped in a `using` nor explicitly disposed. Both are `IDisposable` and own resources (the request's `StringContent`, the response's content stream). Under the per-invocation call volume of a busy site this produces avoidable pressure on the finalizer queue and can hold socket/stream resources longer than necessary. The success path reads the content but never disposes the response; the error path likewise reads `errorBody` and then throws without disposing. **Recommendation** Wrap the request in `using var request = ...` and the response in `using var response = ...` (or call `Dispose()` in a `finally`). Ensure disposal still occurs on the exception paths. **Resolution** Resolved 2026-05-16 (commit pending). `InvokeHttpAsync` now declares the request as `using var request` and wraps all response handling in a `using (response)` block, so both `IDisposable` instances (and the request's `StringContent` / the response content stream) are released on the success path **and** on the permanent/transient exception paths. Regression tests `Call_SuccessfulHttp_DisposesRequestAndResponse` and `Call_PermanentFailure_StillDisposesRequestAndResponse` use a disposal-tracking `HttpMessageHandler`/`HttpContent` and assert both the request and the response content are disposed; both were verified to fail before the `using` wrappers were added. ### ExternalSystemGateway-006 — `BuildUrl` ignores path templates and appends a trailing slash for empty paths | | | |--|--| | Severity | Medium — partially re-triaged: trailing-slash bug fixed; path-templating sub-issue is a design decision (see Resolution) | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:180-196` | **Description** `BuildUrl` does `baseUrl.TrimEnd('/') + "/" + path.TrimStart('/')`. When `method.Path` is empty (a method that targets the base URL itself), this still appends a `/`, producing `https://host/api/` which some servers treat as a different resource than `https://host/api`. More importantly, the design doc shows method paths as templates like `/recipes/{id}`, but `BuildUrl` performs no placeholder substitution — a `{id}` token is sent literally in the URL and the corresponding parameter is instead appended as a query-string entry (for GET/DELETE) or placed in the JSON body (POST/PUT). Either the design's path-template feature is unimplemented, or the doc is stale; in the current code a method defined as `/recipes/{id}` will never produce a correct URL. **Recommendation** Decide whether path templating is in scope. If yes, implement `{name}` substitution from `parameters` in `BuildUrl` and exclude substituted parameters from the query string/body. If no, update the component design doc to remove the `/recipes/{id}` example and state that paths are literal. Also avoid appending a trailing `/` when `path` is empty. **Resolution** Resolved 2026-05-16 (commit pending). The **trailing-slash bug** is fixed: `BuildUrl` now appends a `/`-joined path segment only when the method's path is non-empty after trimming, so a method targeting the base URL itself produces `https://host/api` rather than `https://host/api/`. Regression tests `Call_MethodWithEmptyPath_DoesNotAppendTrailingSlash` and `Call_MethodWithPath_BuildsExpectedUrl` (asserting on the captured request URI) cover the empty-path and normal-path cases; the empty-path test was verified to fail before the fix. Re-triage of the **path-templating sub-issue** (`{id}` placeholder substitution): this is a genuine design decision, not a code bug, and it requires editing the component design doc — both outside this module's edit scope (`src/`, `tests/`, this file only). The current code treats method paths as literal strings and routes parameters to the query string (GET/DELETE) or JSON body (POST/PUT); a method authored as `/recipes/{id}` sends the `{id}` token verbatim. **Tracked follow-up / surfaced design question:** the design owner must decide whether path templating is in scope — if yes, implement `{name}` substitution in `BuildUrl` and exclude substituted params from the query/body; if no, the `Component-ExternalSystemGateway.md` `/recipes/{id}` example must be changed to a literal path. The trailing-slash defect (the concrete correctness bug in this finding) is fully resolved. ### ExternalSystemGateway-007 — External error response bodies are echoed verbatim into script-visible error messages | | | |--|--| | Severity | Medium | | Category | Security | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:167-177` | **Description** On a non-success HTTP response, the full response body is read into `errorBody` and embedded verbatim into the exception message (`$"HTTP {code} from {name}: {errorBody}"`), which then flows into `ExternalCallResult.ErrorMessage` and back to the calling script, and into Site Event Logging. An external system error page can be arbitrarily large (an HTML stack trace, a multi-megabyte body) and may contain sensitive detail. There is no size cap, so a hostile or misbehaving endpoint can inflate every error log entry and error string returned to scripts. There is also no content-type check before treating the body as text. **Recommendation** Truncate `errorBody` to a bounded length (e.g. 1–2 KB) before embedding it, and consider logging the full body separately at debug level rather than returning it to the script. Optionally only include the body when the content type is textual. **Resolution** Resolved 2026-05-16 (commit pending). `InvokeHttpAsync` now truncates the external error response body to `MaxErrorBodyChars` (2048) via a `Truncate` helper before embedding it into the transient/permanent exception message — so a misbehaving or hostile endpoint can no longer inflate every script-visible `ErrorMessage` and Site Event Logging entry with a multi-megabyte body. When truncation occurs the message is suffixed with `… [truncated, N chars total]` so the original size is still visible. Regression test `Call_PermanentFailureWithHugeErrorBody_TruncatesErrorMessage` drives a 400 with a 500 KB body and asserts the resulting `ErrorMessage` is bounded (< 4096 chars); it was verified to fail (500 040-char message) before the cap was added. Content-type filtering was considered optional in the recommendation and was not implemented — the size cap alone closes the inflation/disclosure vector. ### ExternalSystemGateway-008 — Cancellation is conflated with transient timeout failure | | | |--|--| | Severity | Medium — re-triaged: root cause already fixed in current source (see Resolution) | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ErrorClassifier.cs:24-30`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:157-159` | **Description** `ErrorClassifier.IsTransient(Exception)` returns `true` for `TaskCanceledException` and `OperationCanceledException`. `HttpClient.SendAsync` throws `TaskCanceledException` both when its internal timeout elapses *and* when the supplied `CancellationToken` is cancelled (e.g. the Script Execution Actor is stopped, or the actor system is shutting down). Because `InvokeHttpAsync`'s `catch` filter treats all of these as transient, a caller-initiated cancellation during a `CachedCall` will be misclassified as a transient failure and the message will be buffered for retry — work the caller explicitly asked to abandon. For a `Call`, a shutdown-time cancellation is reported to the script as a "Transient error" rather than an `OperationCanceledException`. **Recommendation** In `InvokeHttpAsync`, check `cancellationToken.IsCancellationRequested` first and rethrow `OperationCanceledException` (or let it propagate) before applying transient classification. Only treat a cancellation as a timeout when the supplied token is *not* the one that was cancelled. **Resolution** Resolved 2026-05-16 (commit pending). **Re-triage:** the root cause described — a caller-initiated cancellation being misclassified as a transient failure — is **no longer present in the current source** and is not reproducible. `InvokeHttpAsync` already wraps both `SendAsync` and the response-body `ReadAsStringAsync` in ordered `catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)` filters that rethrow the caller's cancellation *before* the `catch (Exception ex) when (ErrorClassifier.IsTransient(ex))` branch is ever reached (this was added alongside the `ExternalSystemGateway-002` timeout fix). A caller-cancel therefore propagates as `OperationCanceledException` and is never buffered; only the gateway's own timeout token reclassifies as transient. `ErrorClassifier.IsTransient(Exception)` does still return `true` for `TaskCanceledException`/`OperationCanceledException`, but that is **correct and intentional**: a `TaskCanceledException` raised by an HTTP timeout *is* a genuine transient failure, and the only caller (`InvokeHttpAsync:238`) is unreachable for a caller-cancellation because the two preceding `when`-filtered catches intercept it first. The transient-vs-cancel decision is contextual (which token fired) and cannot be made from the exception type alone, which is exactly why the call site does it. No source change was required. A regression guard, `CachedCall_CallerCancellation_IsNotBufferedAsTransient`, was added: it cancels the caller token mid-`CachedCall` and asserts an `OperationCanceledException` is thrown and the S&F buffer remains empty (the cancelled work is not retried). The existing `Call_CallerCancellation_IsNotMisreportedAsTimeout` covers the synchronous `Call` path. ### ExternalSystemGateway-009 — `StoreAndForwardResult` from `EnqueueAsync` is discarded; permanent failures during buffering are swallowed | | | |--|--| | Severity | Medium — re-triaged: root cause subsumed by the ExternalSystemGateway-003 dispatch redesign (see Resolution) | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:109-117` | **Description** `CachedCallAsync` assigns the result of `_storeAndForward.EnqueueAsync(...)` to `sfResult` and then never reads it — it unconditionally returns `new ExternalCallResult(true, null, null, WasBuffered: true)`. `EnqueueAsync` can return `Success == false` (a permanent failure encountered during its immediate-delivery attempt — `StoreAndForwardService.cs:142`) or `Buffered == false` (delivered immediately). In both cases the ESG still reports the call as buffered and successful to the script. A permanent failure surfaced by the S&F immediate attempt is therefore silently lost instead of being returned to the script as the design requires ("On permanent failure (HTTP 4xx), the error is returned synchronously"). **Recommendation** Inspect `sfResult`: if `Success == false` return an error `ExternalCallResult`; set `WasBuffered` from `sfResult.Buffered` rather than hard-coding `true`. (This finding is partly subsumed by the dispatch redesign in finding 003.) **Resolution** Resolved 2026-05-16 (commit pending). **Re-triage:** the stated root cause — "a permanent failure surfaced by `EnqueueAsync`'s immediate-delivery attempt is silently lost" — **can no longer occur** in the current source, and the dead `sfResult` variable the finding cites has already been removed. The `ExternalSystemGateway-003` fix changed `CachedCallAsync` to call `EnqueueAsync` with `attemptImmediateDelivery: false`. With that flag, `EnqueueAsync` never invokes the registered delivery handler: it skips the immediate-delivery block entirely (so the `StoreAndForwardResult(false, …, …)` permanent-failure return at `StoreAndForwardService.cs:147` is unreachable from this caller) and unconditionally buffers, returning `Accepted: true, WasBuffered: true` (`StoreAndForwardService.cs:180`). The `ExternalCallResult(true, null, null, WasBuffered: true)` that `CachedCallAsync` returns is therefore now factually correct in every reachable case — the message *is* buffered and there is no swallowed permanent failure. Permanent HTTP 4xx failures are still surfaced synchronously, because `CachedCallAsync` makes its own first HTTP attempt and catches `PermanentExternalSystemException` *before* it ever reaches `EnqueueAsync`. No source change was required beyond the `ExternalSystemGateway-003` redesign that already landed. Coverage: `CachedCall_TransientFailure_BuffersWithSystemRetrySettings` asserts both `result.WasBuffered == true` and that the message is genuinely present in the S&F buffer (depth == 1), confirming the `WasBuffered: true` claim is not a lie; the existing `CachedCall` permanent-failure path is exercised by `Call_Permanent400_ReturnsPermanentError` semantics shared via `InvokeHttpAsync`. ### ExternalSystemGateway-010 — `GetConnectionAsync` leaks the `SqlConnection` when `OpenAsync` fails | | | |--|--| | Severity | Medium | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:48-50` | **Description** `GetConnectionAsync` constructs `new SqlConnection(...)` and calls `await connection.OpenAsync(...)`. If `OpenAsync` throws (unreachable server, bad credentials, cancellation) the just-created `SqlConnection` instance is never disposed — the exception propagates and the local reference is lost. While an unopened `SqlConnection` is lightweight, over many failing calls this is an avoidable leak. The design doc says `Database.Connection()` failures return an error to the script; the current code lets a raw `SqlException` escape, which is acceptable, but the leak is not. **Recommendation** Wrap the open in a try/catch that disposes the connection before rethrowing: `try { await connection.OpenAsync(ct); } catch { connection.Dispose(); throw; }`. **Resolution** Resolved 2026-05-16 (commit pending). `GetConnectionAsync` now wraps `OpenAsync` in a `try/catch` that calls `await connection.DisposeAsync()` before rethrowing, so a failed open (unreachable server, bad credentials, cancellation) no longer leaks the `SqlConnection`. Connection creation was extracted into an `internal virtual CreateConnection(string)` factory so the failure path is unit-testable. Regression test `GetConnection_OpenFails_DisposesConnectionBeforeRethrowing` substitutes a `DbConnection` whose `OpenAsync` always throws and asserts the connection is disposed when the exception propagates; it was verified to fail before the `try/catch` was added. ### ExternalSystemGateway-011 — Every call performs a full repository scan of all systems and methods | | | |--|--| | Severity | Low | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:360-374`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:169-176` | **Description** `ResolveSystemAndMethodAsync` calls `GetAllExternalSystemsAsync()` and then `GetMethodsByExternalSystemIdAsync()` and filters in memory on every single call; `ResolveConnectionAsync` calls `GetAllDatabaseConnectionsAsync()` and filters in memory on every cached write / connection request. At sites this hits the SQLite repository, and `SiteExternalSystemRepository` re-reads and re-parses the methods JSON each time. For a hot script path this is unnecessary repeated I/O and allocation. Definitions only change on deployment, so they are eminently cacheable. **Recommendation** Add an in-memory cache of system/method/connection definitions keyed by name, invalidated on artifact deployment. Alternatively use a name-keyed repository lookup rather than fetch-all-then-filter. **Resolution** Resolved 2026-05-16 (commit ``). A cross-module change was explicitly authorized, so the **name-keyed repository lookup** recommendation was applied — the cleaner of the two options, and one that avoids the staleness hazard a deployment-invalidated cache would introduce. Three name-keyed methods were added to `IExternalSystemRepository` (`ZB.MOM.WW.ScadaBridge.Commons`): `GetExternalSystemByNameAsync(name)`, `GetMethodByNameAsync(externalSystemId, methodName)` and `GetDatabaseConnectionByNameAsync(name)`. The connection lookup belongs on the same interface because database connection definitions are already part of `IExternalSystemRepository` (alongside `GetAllDatabaseConnectionsAsync` / `GetDatabaseConnectionByIdAsync`), so the existing repository organization was followed rather than introducing a new interface. Both implementers of the interface were updated: - `ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.ExternalSystemRepository` — all three are genuine server-side keyed queries (`FirstOrDefaultAsync(x => x.Name == name)`, the method lookup additionally scoped by `ExternalSystemDefinitionId`), matching the existing `GetMethodByNameAsync` / `GetListByNameAsync` / `GetSharedScriptByNameAsync` convention in the other Central repositories. - `ZB.MOM.WW.ScadaBridge.SiteRuntime.SiteExternalSystemRepository` — `GetExternalSystemByNameAsync` and `GetDatabaseConnectionByNameAsync` are genuine single-row indexed SQLite queries (`WHERE name = @name`; both tables have `name` as the PRIMARY KEY). `GetMethodByNameAsync` resolves the named method from the parent system's `method_definitions` JSON column; this still requires resolving the parent system (one id→name scan), but the gateway's new call sequence performs **one** scan instead of the previous **two** (`GetAllExternalSystemsAsync` + `GetMethodsByExternalSystemIdAsync`, which itself scanned), so the site path is strictly better than before — noted as the one place a residual scan remains, bounded by the deployed system count. `ExternalSystemClient.ResolveSystemAndMethodAsync` and `DatabaseGateway.ResolveConnectionAsync` were rewritten to call the keyed lookups; the `GetAllExternalSystemsAsync` / `GetMethodsByExternalSystemIdAsync` / `GetAllDatabaseConnectionsAsync` + in-memory `FirstOrDefault` filtering is gone from both hot paths. Regression tests (TDD — written first, verified failing/not-compiling before the implementation, then confirmed green; one was additionally verified to fail when the keyed query is deliberately broken): - ConfigurationDatabase (`RepositoryCoverageTests`): `GetExternalSystemByName_ReturnsMatchingRow`, `GetExternalSystemByName_MissingName_ReturnsNull`, `GetMethodByName_ReturnsMethodScopedToParentSystem`, `GetMethodByName_MissingMethod_ReturnsNull`, `GetDatabaseConnectionByName_ReturnsMatchingRow`, `GetDatabaseConnectionByName_MissingName_ReturnsNull`. - SiteRuntime (`SiteRepositoryTests`): `ExternalSystemRepository_GetByName_ReturnsMatchingDefinition`, `ExternalSystemRepository_GetByName_MissingName_ReturnsNull`, `ExternalSystemRepository_GetMethodByName_ResolvesScopedToSystem`, `DatabaseConnectionRepository_GetByName_ReturnsMatchingDefinition`. - ExternalSystemGateway: the existing `ExternalSystemClientTests` / `DatabaseGatewayTests` resolution stubs were migrated to the keyed methods (via `StubResolution` / `StubConnection` helpers), so the full gateway suite now exercises and protects the keyed-lookup resolution path. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` is clean; `ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests` (54), `ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests` (106), `ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests` (196) and `ZB.MOM.WW.ScadaBridge.Commons.Tests` (226) all pass. ### ExternalSystemGateway-012 — Permanent-failure logging requirement is not met; `_logger` is injected but unused | | | |--|--| | Severity | Low | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:24,169-177`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:22` | **Description** The design doc states permanent failures are "Logged to Site Event Logging", but `InvokeHttpAsync` performs no logging on the permanent-failure path. In fact the injected `ILogger` and `ILogger` fields are never used at all in either class. Either the logging is expected to happen in the caller (Script Execution Actor) — in which case the design doc is imprecise about where — or it is missing. Separately, `IsTransient(HttpStatusCode)` treats any non-success, non-(5xx/408/429) status as permanent without an explicit comment, which is a reasonable default but undocumented. **Recommendation** Add a `_logger.LogWarning` on the permanent-failure path (and a debug log on transient), or clarify in the design doc that Site Event Logging capture is the caller's responsibility and remove the unused `_logger` fields. Add a comment in `ErrorClassifier` documenting the "default to permanent" behaviour. **Resolution** Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text: the "`_logger` injected but unused" claim is **stale** — both loggers are already used (the `DeliverBufferedAsync` retry-sweep handlers added by `ExternalSystemGateway-001` log park/error events). The genuine remaining gap — `InvokeHttpAsync` performing no logging on the HTTP-failure paths — is now fixed: `InvokeHttpAsync` emits a `_logger.LogWarning` on the permanent-failure path (status code, system, method, truncated error body) so a permanent failure is visible in Site Event Logging as the design requires, and a `_logger.LogDebug` on the transient-failure path (transient failures are normal retry/S&F operation and must not be noisy at warning level). A documentation comment was added to `ErrorClassifier.IsTransient(HttpStatusCode)` explaining the "any other non-success status defaults to permanent" behaviour and why permanent is the safe default. Regression tests: `Call_PermanentFailure_LogsAWarning` (asserts a warning carrying the system name is emitted; verified to fail before the `LogWarning` was added) and `Call_TransientFailure_DoesNotLogAtWarningOrAbove` (guards against over-logging transient failures). ### ExternalSystemGateway-013 — `MaxConcurrentConnectionsPerSystem` and `DefaultHttpTimeout` options are defined but never used | | | |--|--| | Severity | Low | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemGatewayOptions.cs:9,12`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ServiceCollectionExtensions.cs:13` | **Description** `ExternalSystemGatewayOptions.MaxConcurrentConnectionsPerSystem` (default 10) and `DefaultHttpTimeout` (default 30s) are bound from configuration but neither is read anywhere. `AddHttpClient()` registers the default factory with no `ConfigurePrimaryHttpMessageHandler`/`SocketsHttpHandler` `MaxConnectionsPerServer` and no `Timeout`, so both options have no effect. An operator setting these values gets them silently ignored — a misleading configuration surface (`DefaultHttpTimeout` is also referenced by finding 002). **Recommendation** Either wire the options into a named/typed `HttpClient` registration (set `MaxConnectionsPerServer` on the primary handler, set `Timeout`), or remove the unused options to avoid implying behaviour that does not exist. **Resolution** Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text: `DefaultHttpTimeout` is **no longer unused** — it became the effective per-call HTTP round-trip limit when `ExternalSystemGateway-002` was fixed (`InvokeHttpAsync` builds a linked `CancellationTokenSource(DefaultHttpTimeout)`). The genuinely-unused option, `MaxConcurrentConnectionsPerSystem`, is now wired in: `AddExternalSystemGateway` adds a `ConfigureHttpClientDefaults` registration that supplies a `SocketsHttpHandler` whose `MaxConnectionsPerServer` is bound from the option, so it applies to every per-system named client (`ExternalSystem_{name}`) the gateway creates rather than being silently ignored. Regression test `ServiceWiringTests.MaxConcurrentConnectionsPerSystem_IsAppliedToTheNamedHttpClientPrimaryHandler` builds the DI container, resolves the named client's handler chain via `IHttpMessageHandlerFactory`, walks to the primary handler and asserts `SocketsHttpHandler.MaxConnectionsPerServer` equals the configured value; it was verified to fail before the wiring was added. ### ExternalSystemGateway-014 — Cached-call buffering path and `DatabaseGateway` are untested | | | |--|--| | Severity | Low | | Category | Testing coverage | | Status | Resolved | | Location | `tests/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests/ExternalSystemClientTests.cs:1`, `tests/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests/DatabaseGatewayTests.cs` | **Description** `ExternalSystemClientTests` covers system/method not-found, success, transient 500 and permanent 400 for `CallAsync`, plus `CachedCall` not-found and success. It does **not** cover: the `CachedCall` transient-failure → S&F buffering branch (the most behaviour-rich path, including the `_storeAndForward == null` fallback and `WasBuffered` semantics), the `CachedCall` permanent-failure branch, connection-exception classification (`HttpRequestException` thrown by the handler), `BuildUrl` query-string construction, and `ApplyAuth` for the apikey/basic variants. There is **no test file for `DatabaseGateway`** at all — `GetConnectionAsync` not-found, `CachedWriteAsync` not-found, and the `_storeAndForward == null` guard are entirely uncovered. The `MockHttpMessageHandler` also does not assert request URL/headers/body, so auth and URL construction are unverified. **Recommendation** Add tests for the `CachedCall` transient/buffering paths (with a substituted S&F service), `DatabaseGateway` not-found and null-S&F guards, and `BuildUrl`/`ApplyAuth` by asserting on the captured `HttpRequestMessage` in the mock handler. **Resolution** Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text: a number of the listed gaps were **already closed** by tests added when findings 001–010 were fixed — `DatabaseGatewayTests.cs` now exists (not-found, null-S&F guard, buffered-write delivery), and the `CachedCall` transient/buffering and permanent paths are covered by the `ExternalSystemGateway-001/003/004/008` regression tests. The remaining genuine coverage gaps are now closed with new tests: - `Call_GetWithParameters_AppendsEscapedQueryString` — `BuildUrl` GET query-string construction, asserting on the captured request URI and that an `&` inside a value is percent-escaped. - `Call_PostWithParameters_SendsJsonBody` — POST JSON-body construction. - `Call_ApiKeyAuthWithDefaultHeader_SendsXApiKeyHeader`, `Call_ApiKeyAuthWithCustomHeader_SendsNamedHeader`, `Call_BasicAuth_SendsBase64AuthorizationHeader` — `ApplyAuth` for all three auth variants, asserting on the captured request headers. - `Call_ConnectionError_IsClassifiedAsTransient` — a connection-level `HttpRequestException` is classified transient. - `CachedWrite_BuffersTheWriteWithConnectionRetrySettings` and `CachedWrite_ZeroMaxRetriesIsHonouredNotTreatedAsUnset` — the `DatabaseGateway` `CachedWrite` happy-path enqueue against a real S&F service. The shared `RequestCapturingHandler` test helper was extended to capture request headers and body so URL/auth/body construction is now verified, not just status codes. These are new-coverage tests against already-correct behaviour, so they pass on the current source; the `BuildUrl` and `ApplyAuth` paths they exercise are now protected against regression. ### ExternalSystemGateway-015 — `MaxRetries == 0` is buffered as "retry forever", contradicting the ExternalSystemGateway-004 "never retry" claim | | | |--|--| | Severity | High | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:120-127`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:102-108` | **Description** The `ExternalSystemGateway-004` fix removed the `MaxRetries > 0 ? ... : null` guard so that `CachedCallAsync` and `CachedWriteAsync` now pass the definition's `MaxRetries` to `StoreAndForwardService.EnqueueAsync` verbatim. The stated rationale (and the inline comments at `ExternalSystemClient.cs:118-119` / `DatabaseGateway.cs:99-101`, plus the tests `CachedCall_TransientFailure_ZeroMaxRetriesIsHonouredNotTreatedAsUnset` and `CachedWrite_ZeroMaxRetriesIsHonouredNotTreatedAsUnset`) is that a configured `MaxRetries` of `0` means **"never retry"**. That is the opposite of what the Store-and-Forward engine actually does with the value. `EnqueueAsync` stores the passed `maxRetries` directly into `StoreAndForwardMessage.MaxRetries` (`StoreAndForwardService.cs:139`), whose own XML doc states **"Maximum retry-sweep attempts before parking (0 = no limit)"** (`StoreAndForwardMessage.cs:30`). The retry sweep enforces it as `if (message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries)` (`StoreAndForwardService.cs:285`) — when `MaxRetries == 0` that guard is false on every sweep, so the message is **never parked and is retried forever**. Consequences for a `CachedCall`/`CachedWrite` against a system or connection configured with `MaxRetries = 0`: 1. A transiently-failing message that the operator intended never to retry is instead retried on every sweep indefinitely, accumulating in the buffer and repeatedly re-dispatching the request — the exact unbounded-retry / duplicate-delivery hazard the idempotency note in the design doc warns about. 2. The two ESG regression tests cited above assert `max_retries == 0` is *stored* and describe that as "honoured" — they verify the persisted column, never the resulting sweep behaviour, so they lock in the broken semantics. 3. Because the SiteRuntime repository still always supplies `MaxRetries == 0` (the open companion gap noted in `ExternalSystemGateway-004`), **every** cached call and cached write at every site currently buffers as retry-forever. Before the ESG-004 fix the `> 0` guard sent `null`, so the S&F `DefaultMaxRetries` (a bounded value) applied — i.e. the ESG-004 fix turned a bounded retry into an unbounded one for the common path. **Recommendation** Reconcile the ESG and S&F interpretations of `MaxRetries == 0` — they must agree. Either: (a) treat the entity's `MaxRetries == 0` as "unset" and pass `null` so the bounded S&F default applies (reverting to the pre-ESG-004 behaviour, and accepting that "never retry" then needs a different representation such as a nullable field or a `MaxRetries == 1` convention); or (b) if `0` genuinely must mean "never retry", add an explicit no-retry path — e.g. do not enqueue at all on transient failure when `MaxRetries == 0`, or introduce a distinct sentinel — and fix the `StoreAndForwardMessage.MaxRetries` doc and `RetryMessageAsync` guard so `0` no longer means "no limit". Update the two `ZeroMaxRetries...` tests to assert the *sweep* outcome (parked / not retried), not just the stored column value. **Resolution** Resolved 2026-05-17. Root cause confirmed: the S&F engine treats a stored `MaxRetries == 0` as "no limit / retry forever" (`StoreAndForwardMessage.MaxRetries` doc "0 = no limit"; sweep guard `MaxRetries > 0 && RetryCount >= MaxRetries`), while the entity's non-nullable `int MaxRetries` defaults to `0` — so passing it verbatim buffered every cached call/write as an unbounded retry loop. Fix (ESG-side only, recommendation (a)): `CachedCallAsync` and `CachedWriteAsync` now pass `MaxRetries > 0 ? MaxRetries : null`, so an entity `0` is treated as "unset" and the bounded S&F `DefaultMaxRetries` applies; the misleading "0 = never retry" inline comments were corrected. The two `ZeroMaxRetries...` tests were rewritten to `CachedCall_TransientFailure_ZeroMaxRetriesIsTreatedAsUnsetNotRetryForever` / `CachedWrite_ZeroMaxRetriesIsTreatedAsUnsetNotRetryForever`, asserting the buffered message carries the bounded default (99) and never `0`. ### ExternalSystemGateway-016 — `ConfigureHttpClientDefaults` applies the ESG connection cap to every `HttpClient` in the host process | | | |--|--| | Severity | Medium | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ServiceCollectionExtensions.cs:21-29` | **Description** The `ExternalSystemGateway-013` fix wires `MaxConcurrentConnectionsPerSystem` by calling `services.ConfigureHttpClientDefaults(builder => builder.ConfigurePrimaryHttpMessageHandler(...))`. The inline comment claims this "applies to the dynamically-named clients created by `ExternalSystemClient`" — but `ConfigureHttpClientDefaults` is **process-global**: it adds the configuration to *every* `HttpClient`/`IHttpClientFactory` client created anywhere in the host, regardless of name. The Host registers the External System Gateway alongside other components that also use `IHttpClientFactory` — notably `ZB.MOM.WW.ScadaBridge.NotificationService` (`OAuth2TokenService` and its `ServiceCollectionExtensions` call `AddHttpClient`). With the ESG registration present, the OAuth2 token client (and any future `HttpClient` consumer in the host) has its **primary handler replaced** by a `SocketsHttpHandler` whose `MaxConnectionsPerServer` is the ESG's `MaxConcurrentConnectionsPerSystem`. That: 1. Silently caps unrelated clients at a value owned by, and named for, a different component — an operator tuning the ESG option unknowingly throttles Microsoft 365 OAuth2 token acquisition. 2. Overrides/discards any primary-handler configuration those other components add for their own clients (e.g. a custom handler, proxy, or certificate settings). This is a leaky, surprising side effect for what the option claims to be a per-ESG setting; `ConfigureHttpClientDefaults` should not be used to express a single component's policy. **Recommendation** Scope the handler configuration to the gateway's own clients only. The ESG already creates per-system clients with a deterministic name pattern (`ExternalSystem_{system.Name}`); register a typed/named `HttpClient` (or a small factory abstraction) for that pattern and call `ConfigurePrimaryHttpMessageHandler` on that registration instead of on the global defaults. If a name-pattern registration is impractical, document the global effect explicitly and rename the option, but the preferred fix is to stop using `ConfigureHttpClientDefaults`. **Resolution** Resolved 2026-05-17. Root cause confirmed: `ConfigureHttpClientDefaults` is process-global and replaced the primary handler of every `IHttpClientFactory` client in the host, leaking the ESG connection cap onto unrelated clients. Fix: the global `ConfigureHttpClientDefaults` registration was replaced with an `IConfigureNamedOptions` (`GatewayHttpClientConfigurator`) that applies the `SocketsHttpHandler`/`MaxConnectionsPerServer` cap only to clients whose name starts with `ExternalSystem_` (the gateway's own per-system clients), so clients owned by other components keep their own (or the framework default) primary handler. Regression test `ServiceWiringTests.MaxConcurrentConnectionsPerSystem_IsNotAppliedToNonGatewayHttpClients` asserts a non-gateway client does not inherit the cap while the gateway client still does; it was verified to fail before the fix. ### ExternalSystemGateway-017 — `BuildUrl` appends a bare trailing `?` when a GET method's parameters are all null | | | |--|--| | Severity | Low | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:324-333` | **Description** In `BuildUrl`, the GET/DELETE query-string branch is entered when `parameters != null && parameters.Count > 0`, but the projection then filters out null-valued entries (`parameters.Where(p => p.Value != null)`). When a GET/DELETE method is invoked with a non-empty parameter dictionary whose values are *all* null, `queryString` is the empty string and the code still executes `url += "?" + queryString`, producing a URL ending in a bare `?` (e.g. `https://host/api/resource?`). Most servers tolerate a trailing `?`, but it is an unintended artifact, can defeat response caching on some endpoints, and makes captured request URLs harder to read in logs. **Recommendation** Only append `"?" + queryString` when `queryString` is non-empty (compute the joined string first and check it), so a method whose effective parameter set is empty produces a clean URL identical to the no-parameters case. **Resolution** Resolved 2026-05-17. Root cause confirmed: `BuildUrl` appended `"?" + queryString` whenever the GET/DELETE parameter dictionary was non-empty, even when every value was null and `queryString` was the empty string, yielding a bare trailing `?`. Fix: `BuildUrl` now appends `"?" + queryString` only when `queryString.Length > 0`, so a method whose effective parameter set is empty produces a URL identical to the no-parameters case. Regression test `Call_GetWithAllNullParameters_DoesNotAppendTrailingQuestionMark` asserts the captured request URI has no trailing `?`; it was verified to fail before the fix. ### ExternalSystemGateway-018 — `DeliverBufferedAsync` lets `JsonException` propagate, turning a corrupt buffered row into a permanent retry-forever poison message | | | |--|--| | Severity | High | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:176`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:151` | **Resolution** — Wrapped the `JsonSerializer.Deserialize<...>(message.PayloadJson)` call in both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` in a `try`/`catch (JsonException)` block. A `JsonException` is by definition permanent (the same payload bytes always deserialize identically), so the catch branch logs at `LogError` and returns `false`, parking the message via the S&F engine instead of letting it throw and be retried as a transient failure. Regression tests `DeliverBuffered_MalformedJsonPayload_ReturnsFalseSoMessageParks` were added to both `ExternalSystemClientTests` and `DatabaseGatewayTests` — each feeds a truncated `PayloadJson` to the handler and asserts `delivered == false` and that no exception escapes. **Description** Both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync` begin with an unguarded `JsonSerializer.Deserialize<...>(message.PayloadJson)`: ```csharp var payload = JsonSerializer.Deserialize(message.PayloadJson); if (payload == null || string.IsNullOrEmpty(payload.SystemName) || ...) { _logger.LogError("... unreadable payload; parking."); return false; } ``` The "unreadable payload; parking" branch is only entered when `Deserialize` *succeeds* and produces a null / partially-empty object. If `PayloadJson` is **malformed JSON** — the column was truncated mid-write, an older payload schema is being deserialized into a newer record, or storage corruption occurred — `Deserialize` throws `JsonException` before that check is ever reached. The exception propagates out of the delivery handler. The Store-and-Forward retry loop treats *any* thrown exception from a delivery handler as a transient failure (only a returned `false` parks the message); see `StoreAndForwardService.RetryMessageAsync`. Combined with the `MaxRetries == 0` → "unset → bounded default" fix from `-015`, the resulting behaviour is: 1. Corrupt payload arrives in the buffer. 2. Every retry sweep deserializes, throws `JsonException`, increments `RetryCount`. 3. The message is retried until `RetryCount >= MaxRetries`, then parked — *only* if `MaxRetries > 0` is configured (which `-015` already established is not the default site configuration today). With the bounded S&F default it does eventually park, but it park-loops noisily for `DefaultMaxRetries` iterations first; without that bound it retries forever. 4. The script is unaware — the cached call was returned `WasBuffered: true` long ago. This is the same "poison message buffered forever" class of hazard that `ExternalSystemGateway-001` (no-handler) and `ExternalSystemGateway-015` (MaxRetries==0) already removed for their own causes; corrupt JSON is an alternative arrival path into the same bad state. The `DatabaseGateway.DeliverBufferedAsync` path has the same shape and the same defect: `JsonSerializer.Deserialize` at line 151 is not guarded. **Recommendation** Wrap the `Deserialize` call in a `try/catch (JsonException)` block in both `DeliverBufferedAsync` methods. A `JsonException` is by definition a permanent failure — re-running the same deserialization against the same payload will produce the same exception — so the catch should log at `LogError` and **return `false`** so the S&F engine parks the message rather than retrying. Add regression tests that feed a malformed `PayloadJson` to each handler and assert `delivered == false` (i.e. the message parks) and that no exception escapes the handler. ### ExternalSystemGateway-019 — `HttpClient.Timeout` is not set; `DefaultHttpTimeout` > 100s is silently clipped by the framework default | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:226,257-264`, `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ServiceCollectionExtensions.cs:90-102` | **Resolution (2026-05-28):** Set `client.Timeout = Timeout.InfiniteTimeSpan` immediately after `_httpClientFactory.CreateClient($"ExternalSystem_{system.Name}")` in `ExternalSystemClient.InvokeHttpAsync`, disabling the framework's 100 s default so the per-call `CancellationTokenSource(_options.DefaultHttpTimeout)` linked CTS already built below is the sole timeout source. An operator-configured `DefaultHttpTimeout` greater than 100 s is now honoured verbatim instead of being silently clipped and misclassified as a transient "connection error". Kept the fix local to the allowed file (`ExternalSystemClient.cs`) rather than touching `ServiceCollectionExtensions.cs`/`GatewayHttpClientConfigurator`. Regression test `Call_DisablesHttpClientFrameworkTimeoutSoLongTimeoutsArentClipped` asserts the rented client starts with the framework's 100 s default and is set to `Timeout.InfiniteTimeSpan` after `InvokeHttpAsync` runs. **Description** The `-002` fix enforces the per-call timeout via a linked `CancellationTokenSource` built from `_options.DefaultHttpTimeout` and passed into `SendAsync`. That correctly caps every call to *at most* the configured value when `DefaultHttpTimeout` ≤ 100s. However, `HttpClient.Timeout` (the framework default) is never set on either the named client or its primary handler — the `GatewayHttpClientConfigurator` only sets `MaxConnectionsPerServer`. `HttpClient.Timeout` defaults to **100 seconds**, and `SendAsync` enforces it internally by cancelling its own private CTS, raising a `TaskCanceledException` from `SendAsync` *without* cancelling either the caller's token or the gateway's `timeoutCts`. Consequences when an operator configures `DefaultHttpTimeout` to anything > 100s (a legitimate setting for external systems with long-running endpoints — recipe exports, large queries): 1. The gateway's `timeoutCts` (e.g. 5 minutes) has not yet fired. 2. `HttpClient.Timeout` fires at 100s, `SendAsync` throws. 3. Neither `when (cancellationToken.IsCancellationRequested)` nor `when (timeoutCts.IsCancellationRequested)` matches, so the exception falls into the generic `catch (Exception ex) when (ErrorClassifier.IsTransient(ex))` branch (line 277) and is re-thrown as a `TransientExternalSystemException` with the message `"Connection error to {Name}: A task was canceled."` — misattributing a timeout as a connection error. 4. The configured 5-minute round-trip window the design doc promises ("Each external system definition specifies a timeout that applies to all method calls on that system" / "applies to the HTTP request round-trip") is silently overridden. The opposite case (`DefaultHttpTimeout` < 100s) is the only one the `-002` regression test exercises (200ms), so the defect is not caught by the existing suite. **Recommendation** Set `HttpClient.Timeout = Timeout.InfiniteTimeSpan` on the gateway's named clients via the existing `GatewayHttpClientConfigurator` (delegate `HttpClientActions` rather than just `HttpMessageHandlerBuilderActions`), so the cancellation-token mechanism is the sole timeout source. The linked `timeoutCts` then reliably enforces `DefaultHttpTimeout` for every value, and the timeout-vs-cancellation classification at lines 266–276 stays accurate. Add a regression test that configures `DefaultHttpTimeout` to ~150s, hangs the handler, and asserts the call times out at the configured value and produces a `"Timeout calling..."` (not `"Connection error to..."`) error. ### ExternalSystemGateway-020 — `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:185-193` | **Resolution (2026-05-28):** `JsonElementToParameterValue` now probes `TryGetInt64` → `TryGetDecimal` → `GetDouble`, so a JSON number that fits in `decimal` materialises as a `decimal` (preserving the script's authored precision on cached-write retries) and only genuinely out-of-decimal-range values fall through to `double`. Regression test `JsonElementToParameterValue_DecimalShapedNumber_PreservesPrecisionViaDecimal` round-trips `1234567890.1234567890` through a `JsonElement` and asserts the result is a `decimal` carrying the original precision; companion tests guard the long-fast-path and the out-of-range-double fallback. **Description** `DatabaseGateway.JsonElementToParameterValue` materialises the buffered cached-write SQL parameter values during a retry-sweep delivery: ```csharp private static object JsonElementToParameterValue(JsonElement element) => element.ValueKind switch { JsonValueKind.String => (object?)element.GetString() ?? DBNull.Value, JsonValueKind.Number => element.TryGetInt64(out var l) ? l : element.GetDouble(), ... }; ``` For a JSON number, the helper attempts `Int64` first and otherwise returns a `double`. There is no `decimal` branch. The immediate-attempt path is unaffected — `CachedWriteAsync` on the original call serializes the script-provided typed parameters via `JsonSerializer.Serialize(new { ConnectionName, Sql, Parameters = parameters })` and executes the SQL directly outside this code path. But the **retry path** runs through `DeliverBufferedAsync` → `JsonElementToParameterValue`, so a script that submitted a `decimal` value (e.g. `123.4567890123m`) gets: 1. Immediate attempt: `decimal` parameter, full precision (or, more accurately, the value never enters this helper because cached writes today never re-execute on the immediate path — but on the retry path it does). 2. Retry attempt(s) after a transient failure: the value is deserialized as a JSON number, fails `TryGetInt64`, and is downcast to `double` — which has ~15–17 digits of precision against `decimal`'s 28–29. A SQL column of type `decimal(18, 6)` or `numeric` receives a value that has been truncated to `double` precision before parameter binding. Two further consequences worth recording: - The downcast is **silent** — there is no log, no error, and the cached-write acknowledgement to the script has long since happened. Data drift between a same-call immediate-success delivery and a same-call retry delivery is the worst shape of "looks like the right value but isn't" defect. - For SCADA telemetry (process variables, totals, currency-denominated quality reports) `decimal` is the correct CLR type and `double`'s representation error changes the persisted value. **Recommendation** Replace the `Number` branch with a precision-preserving cascade — try `Int64`, then `decimal` (`element.TryGetDecimal(out var d) ? d : element.GetDouble()`), and only fall back to `double` when even `decimal` fails. Add a regression test against `DatabaseGateway.DeliverBufferedAsync` that buffers a write with a high-precision `decimal` parameter, drives the delivery, and asserts the SQL parameter bound is a `decimal` (or compares the round-tripped value to the original at the parameter level) rather than a `double` with truncated precision. The same Number-branch decision should be reviewed against `JsonValueKind.True`/`False`/`Null` (currently fine) and a string that happens to encode a number (already correctly returns `string`). ### ExternalSystemGateway-021 — `ApplyAuth` silently sends an unauthenticated request on unknown `AuthType`, empty `AuthConfiguration`, or malformed Basic config | | | |--|--| | Severity | Low | | Category | Security | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:385-415` | **Resolution (2026-05-28):** `ApplyAuth` is now an instance method that uses the existing `_logger`. Three previously-silent fail-open paths now emit a `LogWarning` so an operator debugging a recurring 401 sees the cause inside ScadaBridge: (1) empty `AuthConfiguration` for `AuthType=apikey`/`basic`, (2) unknown `AuthType` (anything except `apikey`/`basic`/`none`), (3) malformed Basic config (no `:` separator). The `AuthConfiguration` value is NEVER included in the log message. `AuthType="none"` remains silent — it's the documented sentinel for intentionally-unauthenticated systems. Behaviour is otherwise unchanged: the request still goes out (never block on auth-config issues), the failure mode is just now visible. **Description** `ApplyAuth` has three fail-open paths that all result in an HTTP request being sent **without** the credential the operator configured: 1. Line 387 — `if (string.IsNullOrEmpty(system.AuthConfiguration)) return;` returns early regardless of `AuthType`. A system entity with `AuthType = "apikey"` but an empty `AuthConfiguration` (e.g. the secret column failed to deploy, or the protector key changed and decryption produced `""`) sends every request with no `X-API-Key` header — the gateway is silent. 2. The `switch` has no `default` arm. A system entity with `AuthType = "bearer"`, `"oauth2"`, a typo like `"ApiKey "` (trailing space) or even `"none"` falls off the `switch` and the request is sent without any auth header — again silent. 3. Line 408 — `if (basicParts.Length == 2)` skips the auth attach when `AuthConfiguration` for `basic` lacks a `:` separator. The request is sent with no `Authorization` header. Effectively the gateway treats every misconfiguration as "send anonymously" and relies on the remote system rejecting it with a 401/403. That is a defensible default on its own, but combined with `-007`'s 2 KB error-body cap and the fact that no audit or warning is emitted, an operator debugging "why does my external system always return 401" has nothing to go on inside ScadaBridge — the gateway never says it failed to apply auth. For `AuthType = "none"` (the design's expected sentinel for unauthenticated systems) the fall-through is correct; the failure mode is misconfig. **Recommendation** Add a `default:` arm to the `switch` that logs `_logger.LogWarning(...)` naming the unknown `AuthType` and the system, and emit a similar warning when `AuthConfiguration` is empty for an `AuthType` of `"apikey"` or `"basic"` (those require a value; `"none"` does not). For Basic auth specifically, the `basicParts.Length != 2` branch should also warn. Do **not** include the `AuthConfiguration` value in the log message — secrets must stay out of the log (consistent with the existing module). A small set of `ApplyAuth` unit tests verifying the warning emission and that no `Authorization` / `X-API-Key` header is ever leaked in the warning text would close the test gap as well. ### ExternalSystemGateway-022 — `new HttpMethod(method.HttpMethod)` accepts any string at runtime; an invalid HTTP verb fails only at call time | | | |--|--| | Severity | Low | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:233` | **Resolution (2026-05-28):** Added a `ValidateHttpMethod` helper called at the top of `InvokeHttpAsync` that rejects any verb outside the documented `GET/POST/PUT/PATCH/DELETE` allowlist (matching ESG-023's design-doc reconciliation) with a clear `ArgumentException` naming the offending verb. Allowlist is a `HashSet` with `OrdinalIgnoreCase` so the operator-authored entity column is case-insensitive. Regression tests `Call_UnsupportedHttpMethod_ThrowsArgumentException` (Theory: FOO/DLETE/GIT/OPTIONS/HEAD) and `Call_DocumentedHttpMethod_IsAccepted` (Theory: GET/get/Post/PATCH/delete) cover the rejection and the case-insensitive accept paths. **Description** `InvokeHttpAsync` constructs the request method directly from the string column: `new HttpRequestMessage(new HttpMethod(method.HttpMethod), url)`. `System.Net.Http.HttpMethod(string)` performs only a token-character validation (it rejects whitespace and control chars but accepts arbitrary non-standard tokens like `"FOO"` or `"GIT"`). The body-vs-query selection at lines 239–250 explicitly checks for POST/PUT/PATCH; for any other non-standard verb (`"FOO"`) the parameters silently go to neither body nor query string and the request is dispatched anyway. The design doc enumerates GET/POST/PUT/DELETE as the supported set. There is no validation at deployment time, at definition save time, or at gateway resolution time that `method.HttpMethod` is one of the expected verbs. An operator who typos `"DLETE"` discovers the issue only when a script invokes that method and the remote server rejects the request — usually as a 4xx that the gateway classifies as permanent, which is correct but obscures the root cause. **Recommendation** Validate `method.HttpMethod` at gateway entry — either with a small `switch` of allowed verbs in `InvokeHttpAsync` that throws `PermanentExternalSystemException` for an unsupported verb (cheap, immediate, surfaces a clear error to the script), or by adding a validation pass in the Template/Deployment Manager so it can never reach the gateway. The first option is local to this module and cheaper to land. Either way, the canonical list should agree with `BuildUrl`'s query-vs-body decision (which currently knows about POST/PUT/PATCH for body and GET/DELETE for query — note PATCH is in the body branch but not the design-doc list; see finding 023). ### ExternalSystemGateway-023 — PATCH HTTP method is supported by code but absent from the design doc; body-vs-query decision drifts from the documented set | | | |--|--| | Severity | Low | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/ExternalSystemClient.cs:241`, `docs/requirements/Component-ExternalSystemGateway.md:43` | **Resolution (2026-05-28):** Doc-only fix; confirmed PATCH is wired in `ExternalSystemClient.cs:258-260` alongside POST/PUT for body serialization. Added `PATCH` to the design doc's HTTP-method list (line 42) and updated the body/query-parameter sentence (line 75) so the documented set matches the code's `body = POST/PUT/PATCH; query = GET/DELETE` split. **Description** The component design doc lists the supported HTTP methods as `GET, POST, PUT, or DELETE` (line 43: `**HTTP method**: GET, POST, PUT, or DELETE.`). `InvokeHttpAsync`'s body-serialization branch at lines 239–250 explicitly includes `PATCH` alongside POST and PUT — so PATCH is in fact supported (and routes parameters into the JSON body), but operators reading the spec would not know it. Conversely, `BuildUrl`'s query-string branch at lines 364–366 lists only `GET` and `DELETE`, so a PATCH method's parameters always go to the body, matching the body-branch but not appearing anywhere in the documented contract. This is mild drift — the code is more permissive than the spec. It only becomes a real issue if a future change relies on the documented "only GET/POST/PUT/DELETE" set and breaks the PATCH path silently, or if PATCH is genuinely out of scope and a template author defines a PATCH method on purpose only to learn later it is unsupported. **Recommendation** Pick one direction and apply it in the same session, per the project's "design doc + code travel together" rule: - If PATCH is intentionally supported, add `PATCH` to the Component-ExternalSystemGateway.md HTTP-method list (line 43) and add a parameterised test confirming a PATCH method sends its parameters in the JSON body and resolves like POST/PUT for error classification. - If PATCH is not in scope, remove `method.HttpMethod.Equals("PATCH", ...)` from the body branch in `InvokeHttpAsync` and let finding-022's verb validation reject it. The design-doc list then remains the single source of truth.