# Inbound API
The Inbound API exposes a `POST /api/{methodName}` endpoint on the active central node so external systems can invoke C# scripts that live entirely on central, with the ability to reach any site instance through a routing surface. It is the inward counterpart of the External System Gateway — where that component handles scripts calling out, this handles callers coming in.
## Overview
Inbound API (#14) is a central-only, active-node-only component. Its code lives in `src/ZB.MOM.WW.ScadaBridge.InboundAPI/`, with shared entity and message types in `src/ZB.MOM.WW.ScadaBridge.Commons/`.
The component has three runtime responsibilities:
- **Auth and dispatch** — `EndpointExtensions.MapInboundAPI` registers the endpoint; `InboundApiEndpointFilter` enforces the active-node gate and body-size cap before the handler runs; the handler authenticates via `IApiKeyVerifier` and resolves the matching `ApiMethod` from `IInboundApiRepository`.
- **Script execution** — `InboundScriptExecutor` compiles `ApiMethod.Script` via Roslyn, caches the compiled delegate, and runs it inside `InboundScriptContext` against a method-level timeout.
- **Audit emission** — `AuditWriteMiddleware` wraps the entire request pipeline; it mints the per-request `ExecutionId`, buffers request and response bodies up to the configured cap, and writes one `ApiInbound` row to `ICentralAuditWriter` in its `finally` block regardless of outcome.
The DI entry point is `ServiceCollectionExtensions.AddInboundAPI`, which registers `InboundScriptExecutor` (singleton), `RouteHelper` (scoped), `CommunicationServiceInstanceRouter` (scoped), and `InboundApiEndpointFilter` (singleton). API key verification is registered separately by the Host composition root via `AddZbApiKeyAuth`; `AddInboundAPI` does not register it.
## Key Concepts
### API key authentication
Authentication uses a Bearer token in the `Authorization` header (`sbk_<keyId>_<secret>`). The shared `IApiKeyVerifier` performs a peppered-HMAC constant-time secret comparison against the key store. Every verifier failure — missing token, unknown key, revoked key, secret mismatch — maps to a single `401` with the body `{"error":"Invalid or missing API key"}` so the failure reason is never surfaced to the caller.
The spec describes `X-API-Key` header auth. The code has retired that header in favour of a `Bearer` token scheme (`Authorization: Bearer sbk_<keyId>_<secret>`). The `UnauthorizedMessage` and `NotApprovedMessage` constants in `EndpointExtensions` are each deliberately reused unchanged across their respective reject branches to prevent method enumeration.
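To make the token shape and the constant-time comparison concrete, here is a minimal sketch. It is not the shared `IApiKeyVerifier` implementation: the `ApiKeySketch` class, the use of HMAC-SHA256, the stored-hash format, and the rule that the key ID contains no underscore are all assumptions made for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; the real verifier sits behind IApiKeyVerifier.
public static class ApiKeySketch
{
    // Splits "sbk_<keyId>_<secret>". Assumes the key ID contains no underscore,
    // so everything after the second underscore is treated as the secret.
    public static bool TryParse(string token, out string keyId, out string secret)
    {
        keyId = secret = string.Empty;
        if (token is null || !token.StartsWith("sbk_", StringComparison.Ordinal)) return false;
        var rest = token.Substring(4);
        var sep = rest.IndexOf('_');
        if (sep <= 0 || sep == rest.Length - 1) return false;
        keyId = rest.Substring(0, sep);
        secret = rest.Substring(sep + 1);
        return true;
    }

    // Peppered HMAC of the secret (HMAC-SHA256 is an assumption).
    public static byte[] Hash(string secret, string pepper)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(pepper));
        return hmac.ComputeHash(Encoding.UTF8.GetBytes(secret));
    }

    // Constant-time comparison, so timing does not reveal how many bytes matched.
    public static bool Verify(string secret, string pepper, byte[] storedHash) =>
        CryptographicOperations.FixedTimeEquals(Hash(secret, pepper), storedHash);
}
```

In this sketch a caller would map every `false` result, from either `TryParse` or `Verify`, to the same `401` body.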
### Per-method scope authorization
Once a key verifies, the handler checks whether `identity.Scopes.Contains(methodName)` (ordinal, case-sensitive) before making any database call. A key must carry the exact method name as a scope — `"Echo"` does not grant `"echo"`. If the scope check fails, or the subsequent `IInboundApiRepository.GetMethodByNameAsync` returns null, both branches emit `403` with the same body `{"error":"API key not approved for this method"}`. The scope check runs first to avoid a DB round-trip on the reject path and to eliminate a latency timing oracle.
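The ordering described above can be sketched as follows. `ScopeCheckSketch` and the `methodExists` delegate are hypothetical stand-ins for the handler and for `IInboundApiRepository.GetMethodByNameAsync`; only the ordinal comparison, the shared `403` body, and the "scope first, lookup second" order come from the text.

```csharp
using System;
using System.Collections.Generic;

public static class ScopeCheckSketch
{
    public const string NotApproved = "{\"error\":\"API key not approved for this method\"}";

    // Returns (status, body). methodExists stands in for the repository lookup
    // and is only invoked once the scope check has passed.
    public static (int Status, string? Body) Authorize(
        IReadOnlyCollection<string> scopes, string methodName, Func<string, bool> methodExists)
    {
        var inScope = false;
        foreach (var scope in scopes)
        {
            // Ordinal comparison: "Echo" does not grant "echo".
            if (string.Equals(scope, methodName, StringComparison.Ordinal)) { inScope = true; break; }
        }

        if (!inScope) return (403, NotApproved);                  // no lookup on the reject path
        if (!methodExists(methodName)) return (403, NotApproved); // same body as the scope reject
        return (200, null);
    }
}
```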
### `ApiMethod` entity
`ApiMethod` (in `ZB.MOM.WW.ScadaBridge.Commons.Entities.InboundApi`) is the persistence-ignorant shape:
```csharp
public class ApiMethod
{
    public int Id { get; set; }
    public string Name { get; set; }                   // route segment
    public string Script { get; set; }                 // Roslyn C# script body
    public string? ParameterDefinitions { get; set; }  // JSON: List<ParameterDefinition>
    public string? ReturnDefinition { get; set; }      // JSON: List<ReturnFieldDefinition>
    public int TimeoutSeconds { get; set; }
}
```
`ParameterDefinitions` and `ReturnDefinition` are stored as JSON strings to keep the schema simple; both are deserialized on every request by `ParameterValidator` and `ReturnValueValidator`.
### Extended type system
Parameter and return field definitions share the same six-type vocabulary:
| Type | JSON shape | C# value after coercion |
|-----------|----------------------|-------------------------------------|
| `Boolean` | `true` / `false` | `bool` |
| `Integer` | number (whole) | `long` |
| `Float` | number | `double` |
| `String` | string | `string` |
| `Object` | JSON object | `Dictionary<string, object?>` |
| `List` | JSON array | `List<object?>` |
`Object` and `List` are validated for JSON shape only — field-level or element-level type constraints are the script's responsibility. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
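As an illustration of the table above, the following sketch coerces a `System.Text.Json` value into the listed C# types. It is not the `ParameterValidator` code: `CoercionSketch`, the use of `FormatException`, and the way nested values inside `Object` and `List` are converted are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class CoercionSketch
{
    // Coerces a JSON value to the declared type name; throws FormatException on mismatch.
    public static object? Coerce(JsonElement value, string declaredType)
    {
        var kind = value.ValueKind;
        switch (declaredType)
        {
            case "Boolean" when kind == JsonValueKind.True || kind == JsonValueKind.False:
                return value.GetBoolean();
            case "Integer" when kind == JsonValueKind.Number && value.TryGetInt64(out var whole):
                return whole;
            case "Float" when kind == JsonValueKind.Number:
                return value.GetDouble();
            case "String" when kind == JsonValueKind.String:
                return value.GetString();
            case "Object" when kind == JsonValueKind.Object:
                return ToDictionary(value);
            case "List" when kind == JsonValueKind.Array:
                return ToList(value);
            default:
                throw new FormatException($"Expected {declaredType}, got JSON {kind}.");
        }
    }

    private static Dictionary<string, object?> ToDictionary(JsonElement obj)
    {
        var result = new Dictionary<string, object?>();
        foreach (var property in obj.EnumerateObject()) result[property.Name] = Untyped(property.Value);
        return result;
    }

    private static List<object?> ToList(JsonElement array)
    {
        var result = new List<object?>();
        foreach (var item in array.EnumerateArray()) result.Add(Untyped(item));
        return result;
    }

    // Shape-only conversion for nested values: no field-level or element-level type constraints.
    private static object? Untyped(JsonElement e)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.Object: return ToDictionary(e);
            case JsonValueKind.Array: return ToList(e);
            case JsonValueKind.String: return e.GetString();
            case JsonValueKind.Number: return e.TryGetInt64(out var l) ? (object)l : e.GetDouble();
            case JsonValueKind.True: return true;
            case JsonValueKind.False: return false;
            default: return null;
        }
    }
}
```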
## Architecture
### Request pipeline
```
POST /api/{methodName}
├─ AuditWriteMiddleware ← mints ExecutionId; buffers bodies; emits audit row in finally
│ └─ InboundApiEndpointFilter ← 503 on standby node; 413 on oversized body
│ └─ HandleInboundApiRequest
│ ├─ IApiKeyVerifier.VerifyAsync ← 401 on any auth failure
│ ├─ scope check + GetMethodByNameAsync ← 403 on not-approved
│ ├─ ParameterValidator.Validate ← 400 on bad parameters
│ └─ InboundScriptExecutor.ExecuteAsync
│ ├─ ForbiddenApiChecker ← static trust model enforcement
│ ├─ Roslyn compile + cache ← handler cached by method name
│ └─ ReturnValueValidator ← 500 on return shape mismatch
└─ ICentralAuditWriter.WriteAsync ← fire-and-forget from middleware finally
```
The filter is applied at registration time via `.AddEndpointFilter<InboundApiEndpointFilter>()` in `EndpointExtensions.MapInboundAPI`; it runs before the handler so a standby node or an oversized body never reaches auth or script execution.
### Script compilation and handler cache
`InboundScriptExecutor` is a singleton holding two `ConcurrentDictionary` instances:
- `_scriptHandlers` — maps method name to a compiled `Func<InboundScriptContext, Task<object?>>`.
- `_knownBadMethods` — records methods whose scripts have failed to compile, capped at 1 000 entries, so a bad script is compiled at most once per startup and a flood of unique bogus names cannot grow the cache without bound.
The compilation path in `CompileAndRegister`:
```csharp
public bool CompileAndRegister(ApiMethod method)
{
    var handler = Compile(method);
    if (handler == null)
    {
        TryRecordBadMethod(method.Name);
        return false;
    }
    _knownBadMethods.TryRemove(method.Name, out _);
    return Register(method.Name, handler);
}
```
`Compile` runs `ForbiddenApiChecker.FindViolations` first — a Roslyn syntax-tree walk that rejects forbidden namespace references (`System.IO`, `System.Diagnostics`, `System.Threading` except `Tasks`, `System.Reflection`, `System.Net`, `System.Runtime.InteropServices`, `Microsoft.Win32`) and reflection-gateway member names (`GetType`, `Assembly`, `GetMethod`, `CreateInstance`, `InvokeMember`, and others). Scripts containing `dynamic` or `Activator` are also rejected. This is defence-in-depth, not a true sandbox.
If a method is invoked before it has been compiled — for example a method created after startup — `ExecuteAsync` performs a lazy compile on first call, then stores the handler via `GetOrAdd` so concurrent first callers share one delegate.
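The interplay between the handler cache, the bounded known-bad set, and the lazy compile can be sketched in isolation. This is a simplified stand-in, not the `InboundScriptExecutor` source: `HandlerCacheSketch`, the `Handler` delegate (which takes a `string` rather than `InboundScriptContext`), and the `compile` callback are hypothetical, and the cap check here is deliberately simple rather than strictly atomic.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical handler shape; the real delegate takes InboundScriptContext.
public delegate Task<object?> Handler(string input);

public sealed class HandlerCacheSketch
{
    private const int MaxKnownBad = 1000;
    private readonly ConcurrentDictionary<string, Handler> _handlers = new();
    private readonly ConcurrentDictionary<string, byte> _knownBad = new();

    public int KnownBadCount => _knownBad.Count;

    // compile returns null on failure, mirroring the Compile step described above.
    public Handler? GetOrCompile(string name, Func<string, Handler?> compile)
    {
        if (_handlers.TryGetValue(name, out var cached)) return cached;
        if (_knownBad.ContainsKey(name)) return null;   // do not recompile a known-bad script

        var compiled = compile(name);
        if (compiled is null)
        {
            // Bounded growth: past the cap, new bad names are simply not remembered.
            if (_knownBad.Count < MaxKnownBad) _knownBad.TryAdd(name, 0);
            return null;
        }

        // Concurrent first callers may each compile, but GetOrAdd stores one delegate
        // and hands that same instance back to all of them.
        return _handlers.GetOrAdd(name, compiled);
    }
}
```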
Scripts are compiled with a restricted reference set (`mscorlib`, `System.Linq`, `System.Collections.Generic`, `RouteHelper`'s assembly, `ScriptParameters`'s assembly, and the C# runtime binder) and with imports for `System`, `System.Collections.Generic`, `System.Linq`, and `System.Threading.Tasks`. The `globalsType` is `InboundScriptContext`.
### Script context and the `Route` surface
`InboundScriptContext` is the Roslyn globals object injected into every running script:
```csharp
public class InboundScriptContext
{
    public ScriptParameters Parameters { get; }
    public RouteHelper Route { get; }
    public CancellationToken CancellationToken { get; }
}
```
`Parameters` wraps the validated, type-coerced values. `Parameters["key"]` gives raw `object?` access; `Parameters.Get<T>("key")` adds typed conversion with clear error messages (`ScriptParameterException`). `Route` is a scoped `RouteHelper` already bound to the method-level deadline token and to the inbound request's `ParentExecutionId`.
`RouteHelper.To(instanceCode)` returns a `RouteTarget` that exposes five operations:
| Method | Description |
|--------|-------------|
| `Call(scriptName, parameters?)` | Invoke a script on the instance; returns the script's return value. |
| `GetAttribute(name)` | Read one attribute value. |
| `GetAttributes(names)` | Batch-read; returns `IReadOnlyDictionary<string, object?>`. |
| `SetAttribute(name, value)` | Write one attribute value. |
| `SetAttributes(dict)` | Batch-write. |
All five operations are blocking request/response calls from the script's perspective: the script awaits each one, and the central node waits until the site responds or the method timeout fires. There is no store-and-forward: a site-unreachable or timed-out routed call throws `InvalidOperationException` back to the script, which surfaces as a `500` to the caller.
`RouteTarget.Call` builds a `RouteToCallRequest` carrying `ParentExecutionId` so the spawned site script execution records the inbound request as its parent in the audit tree:
```csharp
var request = new RouteToCallRequest(
    correlationId, _instanceCode, scriptName, ScriptArgs.Normalize(parameters),
    DateTimeOffset.UtcNow, _parentExecutionId);
var response = await _instanceRouter.RouteToCallAsync(siteId, request, token);
```
`IInstanceRouter` is the seam over `CommunicationService`; in production, `CommunicationServiceInstanceRouter` delegates every call directly to `CommunicationService.RouteToCallAsync / RouteToGetAttributesAsync / RouteToSetAttributesAsync`.
### Active-node gating
`IActiveNodeGate.IsActiveNode` is the seam the Host implements using Akka cluster state. When `false`, `InboundApiEndpointFilter` returns `503` before any auth or script logic runs. When no implementation is registered — non-clustered hosts, tests — the endpoint is served, preserving prior behaviour.
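The gate's "serve when nothing is registered" rule, combined with the body-size cap from the same filter, can be sketched as a pure function. The types below are hypothetical stand-ins (the real seam is `IActiveNodeGate`), and checking the gate before the size cap is an assumption about ordering.

```csharp
// Stand-in for IActiveNodeGate; only the property named in the text is modelled.
public interface IActiveNodeGateSketch
{
    bool IsActiveNode { get; }
}

// Trivial implementation for experiments and tests.
public sealed class FixedGate : IActiveNodeGateSketch
{
    public FixedGate(bool active) => IsActiveNode = active;
    public bool IsActiveNode { get; }
}

public static class GateSketch
{
    // Returns the short-circuit status code, or null when the request may proceed.
    public static int? Check(IActiveNodeGateSketch? gate, long? contentLength, long maxBodyBytes)
    {
        if (gate is { IsActiveNode: false }) return 503;  // registered gate reports standby
        if (contentLength > maxBodyBytes) return 413;     // lifted comparison: a null length never exceeds the cap
        return null;                                      // no gate registered, or active node
    }
}
```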
### Audit integration
`AuditWriteMiddleware` sits in the pipeline above the endpoint filter and handler. At the start of every request it:
1. Mints a fresh `Guid` as the request's `ExecutionId` and stashes it on `HttpContext.Items[InboundExecutionIdItemKey]`.
2. Calls `HttpRequest.EnableBuffering()` (for POST/PUT/PATCH requests only) and reads up to `AuditLogOptions.InboundMaxBytes` bytes of the request body into a bounded audit copy, then rewinds the stream to position 0 so the downstream handler sees the full payload.
3. Wraps `HttpResponse.Body` in `CapturedResponseStream`, which mirrors every write to the real sink while capturing up to `InboundMaxBytes` bytes for the audit copy.
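Step 3 relies on a stream that forwards every write while keeping a bounded copy. A minimal sketch of that idea follows; `MirroringCaptureStream` is a hypothetical name and is not the `CapturedResponseStream` source. It handles synchronous writes only, whereas a wrapper used in a real ASP.NET Core pipeline would also need to override the asynchronous write path.

```csharp
using System;
using System.IO;

// Hypothetical write-only wrapper: forwards every write and keeps the first `cap` bytes.
public sealed class MirroringCaptureStream : Stream
{
    private readonly Stream _inner;
    private readonly MemoryStream _captured = new();
    private readonly int _cap;

    public MirroringCaptureStream(Stream inner, int cap)
    {
        _inner = inner;
        _cap = cap;
    }

    // Bounded audit copy; never longer than the cap.
    public byte[] Captured => _captured.ToArray();

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);   // the real sink always receives everything
        var room = _cap - (int)_captured.Length;
        if (room > 0) _captured.Write(buffer, offset, Math.Min(room, count));
    }

    public override void Flush() => _inner.Flush();

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```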
In the `finally` block, the middleware calls `ICentralAuditWriter.WriteAsync` (fire-and-forget with fault observation) to emit one `AuditChannel.ApiInbound` row. The row's `AuditKind` is `InboundAuthFailure` for `401`/`403` and `InboundRequest` for all other outcomes. Status is `Delivered` for 2xx and `Failed` for 4xx/5xx or a handler exception. `Actor` is the resolved API key display name (stashed by the endpoint handler on `HttpContext.Items[AuditActorItemKey]` after successful auth); it is forced null for auth failures so the middleware never echoes an unauthenticated principal. The audit row's `ExecutionId` is the same `Guid` minted in step 1.
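The classification rules in the paragraph above reduce to a small pure function. The sketch below uses hypothetical enum and class names rather than the real `AuditKind` and `AuditStatus` types. Two points are assumptions: any non-2xx status is treated as `Failed` (the text only names 4xx and 5xx), and both `401` and `403` null out the actor.

```csharp
// Stand-ins for the AuditKind and AuditStatus values named in the text.
public enum AuditKindSketch { InboundRequest, InboundAuthFailure }
public enum AuditStatusSketch { Delivered, Failed }

public static class AuditClassificationSketch
{
    public static (AuditKindSketch Kind, AuditStatusSketch Status, string? Actor) Classify(
        int statusCode, bool handlerThrew, string? resolvedActor)
    {
        var authFailure = statusCode == 401 || statusCode == 403;
        var kind = authFailure ? AuditKindSketch.InboundAuthFailure : AuditKindSketch.InboundRequest;

        // Delivered only for a 2xx response from a handler that did not throw.
        var delivered = !handlerThrew && statusCode >= 200 && statusCode < 300;
        var status = delivered ? AuditStatusSketch.Delivered : AuditStatusSketch.Failed;

        // Never echo a principal on an auth failure (assumed to cover 401 and 403 alike).
        return (kind, status, authFailure ? null : resolvedActor);
    }
}
```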
The endpoint handler reads that same `ExecutionId` from `HttpContext.Items` and threads it into `InboundScriptExecutor.ExecuteAsync` as `parentExecutionId`, which in turn passes it to `RouteHelper.WithParentExecutionId`. Any `Route.To().Call()` inside the script carries it as `RouteToCallRequest.ParentExecutionId`, so the spawned site script execution is linked back to this inbound request in the audit tree.
Audit emission is best-effort. A write failure is caught, logged at `Warning`, and dropped. It never alters the HTTP response.
## Usage
### HTTP contract
```http
POST /api/{methodName}
Authorization: Bearer sbk_<keyId>_<secret>
Content-Type: application/json

{
  "siteId": "SiteA",
  "startDate": "2026-03-01",
  "endDate": "2026-03-16"
}
```
Success response (`200`):
```json
{
  "siteName": "Site Alpha",
  "totalUnits": 14250,
  "lines": [
    { "lineName": "Line-1", "units": 8200, "efficiency": 92.5 }
  ]
}
```
Error responses:
| Status | Condition |
|--------|-----------|
| `401` | Missing, malformed, unknown, revoked, or secret-mismatched token. |
| `403` | Valid key, but not in scope for this method; or method not found. |
| `400` | Missing required parameters, wrong types, or unexpected fields. |
| `413` | Request body exceeds `MaxRequestBodyBytes`. |
| `500` | Script execution error, compilation failure, or return-shape mismatch. |
| `503` | Request reached a standby node. |
The `403` body is identical whether the method does not exist or the key lacks scope, so a caller holding a valid key cannot enumerate method names by observing status differences.
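For callers written in C#, a request matching the contract above could be built roughly as follows. `InboundApiClientSketch` and the example host name are hypothetical; only the route shape, the `Bearer` scheme, and the JSON content type come from this document.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class InboundApiClientSketch
{
    // Builds POST {baseAddress}api/{methodName} with a bearer key and a JSON body.
    // baseAddress is expected to end with a slash, e.g. https://central.example/
    public static HttpRequestMessage BuildRequest(
        Uri baseAddress, string methodName, string apiKey, string jsonBody)
    {
        var uri = new Uri(baseAddress, "api/" + Uri.EscapeDataString(methodName));
        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json"),
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

The request would then be sent with `HttpClient.SendAsync`, and the status code checked against the table above before the body is read.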
### Writing a method script
A method script runs as a Roslyn C# script with `InboundScriptContext` as globals. The script has access to `Parameters`, `Route`, and `CancellationToken`.
```csharp
// Example: read a parameter, call a site script, return a result
var siteId = Parameters.Get<string>("siteId");
var result = await Route.To(siteId).Call("GetProductionSummary", new { date = Parameters.Get<string>("date") });
return result;
```
The `Route.To().Call()` inherits the method-level timeout automatically. A script that needs a tighter per-call bound may pass an explicit `CancellationToken`. Scripts may not access `System.IO`, `System.Diagnostics.Process`, `System.Threading` (except `Tasks`), `System.Reflection`, `System.Net`, or reflection-gateway members — violations are rejected statically at compile time.
### Startup compilation and hot-reload
At startup the Host loads all `ApiMethod` rows from the configuration database and calls `CompileAndRegister` on each. After a method is updated via the Management API or CLI, the Management Service calls `CompileAndRegister` again — the updated script takes effect on the next request, with no node restart. Methods created after startup compile lazily on first invocation. Scripts modified directly in the database do not take effect until the next node restart; always use the Management API, CLI, or Central UI.
## Configuration
Options class: `InboundApiOptions`, bound from the `ScadaBridge:InboundApi` section.
| Key | Default | Description |
|-----|---------|-------------|
| `DefaultMethodTimeout` | `00:00:30` | Execution timeout applied when `ApiMethod.TimeoutSeconds` is zero or not set. |
| `MaxRequestBodyBytes` | `1048576` (1 MiB) | Body size cap enforced by `InboundApiEndpointFilter` before the body is parsed. Requests whose `Content-Length` exceeds this return `413`; chunked requests are cut off by Kestrel as they stream in. |
| `ApiKeyPepper` | _(required)_ | Server-side HMAC pepper for bearer credentials. Consumed by the shared `IApiKeyVerifier`; must be a strong, random value (`≥ 16` characters), different per environment, supplied via a secret store. |
The inbound body-capture cap for audit is configured separately under `AuditLog:InboundMaxBytes` (default 1 MiB; range `[8192, 16777216]`). It governs only the audit copy — the downstream handler always sees the full body.
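How the per-method timeout and the configured default combine can be sketched as below. `InboundApiOptionsSketch` is a stand-in that models only two rows of the table, and treating a negative `TimeoutSeconds` the same as zero is an assumption.

```csharp
using System;

// Stand-in for InboundApiOptions; defaults copied from the table above.
public sealed class InboundApiOptionsSketch
{
    public TimeSpan DefaultMethodTimeout { get; set; } = TimeSpan.FromSeconds(30);
    public long MaxRequestBodyBytes { get; set; } = 1_048_576;
}

public static class TimeoutSketch
{
    // A positive per-method value wins; zero (or, by assumption, a negative value) falls back to the default.
    public static TimeSpan Effective(int timeoutSeconds, InboundApiOptionsSketch options) =>
        timeoutSeconds > 0 ? TimeSpan.FromSeconds(timeoutSeconds) : options.DefaultMethodTimeout;
}
```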
## Dependencies & Interactions
- [Commons (#16)](./Commons.md) — owns `ApiMethod`, `ParameterDefinition`, `ScriptParameters`, `ScriptParameterException`, the `RouteToCall*` / `RouteToGetAttributes*` / `RouteToSetAttributes*` message records, `IInboundApiRepository`, and `IInstanceLocator`. Also owns `ICentralAuditWriter` (via `ZB.MOM.WW.Audit`), `AuditChannel`, `AuditKind`, `AuditStatus`, and `ScadaBridgeAuditEventFactory`.
- [Configuration Database (#17)](./ConfigurationDatabase.md) — provides the `IInboundApiRepository` implementation (`GetMethodByNameAsync`, `GetAllApiMethodsAsync`, CRUD). Method definitions persist in the central MS SQL configuration database.
- [Central-Site Communication (#5)](./Communication.md) — `CommunicationServiceInstanceRouter` delegates every `Route.To()` operation to `CommunicationService`. The routed call travels from the central `CommunicationActor` to the target site via `ClusterClient`, reaches the target `InstanceActor`, and a `ScriptExecutionActor` executes the named script. The return value flows back synchronously.
- [Audit Log (#23)](./AuditLog.md) — `AuditWriteMiddleware` resolves `ICentralAuditWriter` to emit the `ApiInbound` row via the central direct-write path. The inbound request is the parent execution for any site script it spawns: the middleware's `ExecutionId` becomes `RouteToCallRequest.ParentExecutionId` on every routed `Call`. Cross-link: `AuditWriteMiddleware.InboundExecutionIdItemKey` / `AuditWriteMiddleware.AuditActorItemKey` are the `HttpContext.Items` keys that tie the endpoint handler and middleware together.
- [Security (#10)](./Security.md) — API key verification (`IApiKeyVerifier`, `AddZbApiKeyAuth`) is registered by the Host. The inbound API uses a dedicated key scheme independent of LDAP/AD session auth.
- [Cluster Infrastructure (#13)](./ClusterInfrastructure.md) — `IActiveNodeGate` (interface in this project; implementation in the Host) gates the endpoint to the active central node. A standby returns `503` without running any script logic.
- [Health Monitoring (#11)](./HealthMonitoring.md) — `ScadaBridgeTelemetry.RecordInboundApiRequest(methodName)` is called on every request (after auth failures are classified); `CentralAuditWriteFailures` surfaces on the central health snapshot when an audit write fails.
- Design spec: [Component-InboundAPI.md](../requirements/Component-InboundAPI.md).
## Troubleshooting
### A method always returns 500 after a script update
The in-memory handler cache still holds the previous compiled delegate. If the update went through the Management API or CLI, `CompileAndRegister` should have been called automatically and the new script should be active on the next request. If the script was edited directly in the database, the cached delegate is stale until the next node restart. Check the `ScadaBridge.InboundAPI` log category for a `"script compilation failed"` or `"trust model violation"` warning to distinguish a compile error from a routing failure.
### A method is stuck in the known-bad-methods cache
If a previously broken script is fixed but `ExecuteAsync` still returns `"Script compilation failed for this method"`, the method name is in `_knownBadMethods`. `CompileAndRegister` clears the bad-method entry on a successful compile; calling it (via the Management API or CLI `api-method update`) after the fix is applied resets the cache and makes the corrected script active immediately.
### Routed calls time out but the site is reachable
The method-level timeout covers the entire execution, including every `Route.To().Call()`. A slow site script, a large return value, or network latency can consume the budget. `TimeoutSeconds` on the `ApiMethod` entity controls the cap per method; `DefaultMethodTimeout` in `InboundApiOptions` applies when `TimeoutSeconds` is zero. Increase `TimeoutSeconds` for long-running methods. If the site's `/health/active` endpoint returns `503`, the site failed over mid-call.
### Audit rows missing for inbound requests
`AuditWriteMiddleware` emits on a fire-and-forget `Task`; a write failure is caught and logged at `Warning` under `AuditWriteMiddleware`. `CentralAuditWriteFailures` increments on the central health snapshot. The request itself still returns its normal HTTP response — a missing audit row never means the call failed.
## Related Documentation
- [Inbound API design specification](../requirements/Component-InboundAPI.md)
- [Audit Log](./AuditLog.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)
- [Central-Site Communication](./Communication.md)
- [Security](./Security.md)
- [Cluster Infrastructure](./ClusterInfrastructure.md)
- [External System Gateway](./ExternalSystemGateway.md)
- [Site Runtime](./SiteRuntime.md)