Inbound API
The Inbound API exposes a POST /api/{methodName} endpoint on the active central node so external systems can invoke C# scripts that live entirely on central, with the ability to reach any site instance through a routing surface. It is the inward counterpart of the External System Gateway — where that component handles scripts calling out, this handles callers coming in.
Overview
Inbound API (#14) is a central-only, active-node-only component. Its code lives in src/ZB.MOM.WW.ScadaBridge.InboundAPI/, with shared entity and message types in src/ZB.MOM.WW.ScadaBridge.Commons/.
The component has three runtime responsibilities:
- Auth and dispatch: EndpointExtensions.MapInboundAPI registers the endpoint; InboundApiEndpointFilter enforces the active-node gate and body-size cap before the handler runs; the handler authenticates via IApiKeyVerifier and resolves the matching ApiMethod from IInboundApiRepository.
- Script execution: InboundScriptExecutor compiles ApiMethod.Script via Roslyn, caches the compiled delegate, and runs it inside InboundScriptContext against a method-level timeout.
- Audit emission: AuditWriteMiddleware wraps the entire request pipeline; it mints the per-request ExecutionId, buffers request and response bodies up to the configured cap, and writes one ApiInbound row to ICentralAuditWriter in its finally block regardless of outcome.
The DI entry point is ServiceCollectionExtensions.AddInboundAPI, which registers InboundScriptExecutor (singleton), RouteHelper (scoped), CommunicationServiceInstanceRouter (scoped), and InboundApiEndpointFilter (singleton). API key verification is registered separately by the Host composition root via AddZbApiKeyAuth — AddInboundAPI does not register it.
Key Concepts
API key authentication
Authentication uses a Bearer token in the Authorization header (sbk_<keyId>_<secret>). The shared IApiKeyVerifier performs a peppered-HMAC constant-time secret comparison against the key store. Every verifier failure — missing token, unknown key, revoked key, secret mismatch — maps to a single 401 with the body {"error":"Invalid or missing API key"} so the failure reason is never surfaced to the caller.
The spec describes X-API-Key header auth; the code has retired that header in favour of the Bearer token scheme (Authorization: Bearer sbk_<keyId>_<secret>). The constants UnauthorizedMessage and NotApprovedMessage in EndpointExtensions are each reused verbatim across every reject branch that returns their status code, so the response body never reveals which check failed and cannot be used for method enumeration.
Per-method scope authorization
Once a key verifies, the handler checks whether identity.Scopes.Contains(methodName) (ordinal, case-sensitive) before making any database call. A key must carry the exact method name as a scope — "Echo" does not grant "echo". If the scope check fails, or the subsequent IInboundApiRepository.GetMethodByNameAsync returns null, both branches emit 403 with the same body {"error":"API key not approved for this method"}. The scope check runs first to avoid a DB round-trip on the reject path and to eliminate a latency timing oracle.
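The ordering described above can be sketched as follows. This is a minimal illustration rather than the handler's actual code: ScopeCheckSketch, AuthorizeAsync, the lookup delegate, and the tuple result are hypothetical stand-ins. Only the ordinal scope comparison, the shared 403 body, and the scope-before-database order come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ScopeCheckSketch
{
    // One body for both reject branches, as described above.
    public const string NotApprovedMessage = "API key not approved for this method";

    // Hypothetical helper. 'lookup' stands in for IInboundApiRepository.GetMethodByNameAsync
    // and returns null when the method does not exist.
    public static async Task<(int Status, string? Error)> AuthorizeAsync(
        IReadOnlyCollection<string> scopes,
        string methodName,
        Func<string, Task<object?>> lookup)
    {
        // Ordinal, case-sensitive comparison: "Echo" does not grant "echo".
        bool inScope = false;
        foreach (var scope in scopes)
        {
            if (string.Equals(scope, methodName, StringComparison.Ordinal))
            {
                inScope = true;
                break;
            }
        }

        // Reject before any database round-trip.
        if (!inScope) return (403, NotApprovedMessage);

        // Unknown method: same status and same body as the scope failure.
        var method = await lookup(methodName);
        if (method is null) return (403, NotApprovedMessage);

        return (200, null);
    }
}
```

Because both reject paths return the same tuple, a caller cannot tell a missing method from a missing scope by inspecting the response.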
ApiMethod entity
ApiMethod (in ZB.MOM.WW.ScadaBridge.Commons.Entities.InboundApi) is the persistence-ignorant shape:
    public class ApiMethod
    {
        public int Id { get; set; }
        public string Name { get; set; }                  // route segment
        public string Script { get; set; }                // Roslyn C# script body
        public string? ParameterDefinitions { get; set; } // JSON: List<ParameterDefinition>
        public string? ReturnDefinition { get; set; }     // JSON: List<ReturnFieldDefinition>
        public int TimeoutSeconds { get; set; }
    }
ParameterDefinitions and ReturnDefinition are stored as JSON strings to keep the schema simple; both are deserialized on every request by ParameterValidator and ReturnValueValidator.
Extended type system
Parameter and return field definitions share the same six-type vocabulary:
| Type | JSON shape | C# value after coercion |
|---|---|---|
| Boolean | true / false | bool |
| Integer | number (whole) | long |
| Float | number | double |
| String | string | string |
| Object | JSON object | Dictionary<string, object?> |
| List | JSON array | List<object?> |
Object and List are validated for JSON shape only — field-level or element-level type constraints are the script's responsibility. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
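As an illustration of the table above, the following sketch coerces a System.Text.Json value into the listed C# value. It is an assumption about how such coercion could look, not the code in ParameterValidator: CoercionSketch, Coerce, and Untyped are hypothetical names, and nested values inside Object and List are converted by shape only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class CoercionSketch
{
    // Hypothetical helper: returns the C# value from the table, or throws
    // when the JSON shape does not match the declared type.
    public static object? Coerce(JsonElement value, string declaredType)
    {
        switch (declaredType)
        {
            case "Boolean" when value.ValueKind is JsonValueKind.True or JsonValueKind.False:
                return value.GetBoolean();
            case "Integer" when value.ValueKind == JsonValueKind.Number && value.TryGetInt64(out var whole):
                return whole;
            case "Float" when value.ValueKind == JsonValueKind.Number:
                return value.GetDouble();
            case "String" when value.ValueKind == JsonValueKind.String:
                return value.GetString();
            case "Object" when value.ValueKind == JsonValueKind.Object:
            case "List" when value.ValueKind == JsonValueKind.Array:
                return Untyped(value);
            default:
                throw new ArgumentException($"Value does not match declared type '{declaredType}'.");
        }
    }

    // Shape-only conversion for nested values; no field-level type constraints.
    private static object? Untyped(JsonElement e) => e.ValueKind switch
    {
        JsonValueKind.Object => e.EnumerateObject().ToDictionary(p => p.Name, p => Untyped(p.Value)),
        JsonValueKind.Array => e.EnumerateArray().Select(Untyped).ToList(),
        JsonValueKind.String => e.GetString(),
        // Cast to object so a whole number stays a long instead of widening to double.
        JsonValueKind.Number => e.TryGetInt64(out var i) ? (object)i : e.GetDouble(),
        JsonValueKind.True => true,
        JsonValueKind.False => false,
        _ => null
    };
}
```

A fractional number declared as Integer falls through to the default branch and is rejected, which mirrors the "number (whole)" row.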
Architecture
Request pipeline
POST /api/{methodName}
│
├─ AuditWriteMiddleware ← mints ExecutionId; buffers bodies; emits audit row in finally
│ └─ InboundApiEndpointFilter ← 503 on standby node; 413 on oversized body
│ └─ HandleInboundApiRequest
│ ├─ IApiKeyVerifier.VerifyAsync ← 401 on any auth failure
│ ├─ scope check + GetMethodByNameAsync ← 403 on not-approved
│ ├─ ParameterValidator.Validate ← 400 on bad parameters
│ └─ InboundScriptExecutor.ExecuteAsync
│ ├─ ForbiddenApiChecker ← static trust model enforcement
│ ├─ Roslyn compile + cache ← handler cached by method name
│ └─ ReturnValueValidator ← 500 on return shape mismatch
└─ ICentralAuditWriter.WriteAsync ← fire-and-forget from middleware finally
The filter is applied at registration time via .AddEndpointFilter<InboundApiEndpointFilter>() in EndpointExtensions.MapInboundAPI; it runs before the handler so a standby node or an oversized body never reaches auth or script execution.
Script compilation and handler cache
InboundScriptExecutor is a singleton holding two ConcurrentDictionary instances:
- _scriptHandlers: maps method name to a compiled Func<InboundScriptContext, Task<object?>>.
- _knownBadMethods: records methods whose scripts have failed to compile, capped at 1 000 entries, so a bad script is compiled at most once per startup and a flood of unique bogus names cannot grow the cache without bound.
The compilation path in CompileAndRegister:
    public bool CompileAndRegister(ApiMethod method)
    {
        var handler = Compile(method);
        if (handler == null)
        {
            TryRecordBadMethod(method.Name);
            return false;
        }
        _knownBadMethods.TryRemove(method.Name, out _);
        return Register(method.Name, handler);
    }
Compile runs ForbiddenApiChecker.FindViolations first — a Roslyn syntax-tree walk that rejects forbidden namespace references (System.IO, System.Diagnostics, System.Threading except Tasks, System.Reflection, System.Net, System.Runtime.InteropServices, Microsoft.Win32) and reflection-gateway member names (GetType, Assembly, GetMethod, CreateInstance, InvokeMember, and others). Scripts containing dynamic or Activator are also rejected. This is defence-in-depth, not a true sandbox.
If a method is invoked before it has been compiled — for example a method created after startup — ExecuteAsync performs a lazy compile on first call, then stores the handler via GetOrAdd so concurrent first callers share one delegate.
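The lazy-compile path and the bounded bad-method set can be sketched together as below. This is a simplified stand-in for InboundScriptExecutor, not its real code: HandlerCacheSketch, Resolve, and the compile delegate (keyed by method name, with object? in place of InboundScriptContext) are hypothetical. The GetOrAdd call and the 1 000-entry cap follow the description above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class HandlerCacheSketch
{
    private const int MaxKnownBad = 1000; // cap taken from the text

    private readonly ConcurrentDictionary<string, Func<object?, Task<object?>>> _handlers = new();
    private readonly ConcurrentDictionary<string, byte> _knownBad = new();

    // Hypothetical compile step: returns null when the script fails to compile.
    private readonly Func<string, Func<object?, Task<object?>>?> _compile;

    public HandlerCacheSketch(Func<string, Func<object?, Task<object?>>?> compile) => _compile = compile;

    public Func<object?, Task<object?>>? Resolve(string methodName)
    {
        if (_handlers.TryGetValue(methodName, out var cached)) return cached;

        // A script that already failed is not compiled again.
        if (_knownBad.ContainsKey(methodName)) return null;

        var compiled = _compile(methodName);
        if (compiled is null)
        {
            // Count-then-add is not atomic, so the cap is approximate under contention.
            if (_knownBad.Count < MaxKnownBad) _knownBad.TryAdd(methodName, 0);
            return null;
        }

        // Concurrent first callers may each compile, but all end up with the one stored delegate.
        return _handlers.GetOrAdd(methodName, compiled);
    }
}
```

Passing the already compiled delegate to GetOrAdd means the loser of a race discards its own copy and uses the winner's, so every caller observes the same handler instance.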
Scripts are compiled with a restricted reference set (mscorlib, System.Linq, System.Collections.Generic, RouteHelper's assembly, ScriptParameters's assembly, and the C# runtime binder) and with imports for System, System.Collections.Generic, System.Linq, and System.Threading.Tasks. The globalsType is InboundScriptContext.
Script context and the Route surface
InboundScriptContext is the Roslyn globals object injected into every running script:
    public class InboundScriptContext
    {
        public ScriptParameters Parameters { get; }
        public RouteHelper Route { get; }
        public CancellationToken CancellationToken { get; }
    }
Parameters wraps the validated, type-coerced values. Parameters["key"] gives raw object? access; Parameters.Get<T>("key") adds typed conversion with clear error messages (ScriptParameterException). Route is a scoped RouteHelper already bound to the method-level deadline token and to the inbound request's ParentExecutionId.
RouteHelper.To(instanceCode) returns a RouteTarget that exposes five operations:
| Method | Description |
|---|---|
| Call(scriptName, parameters?) | Invoke a script on the instance; returns the script's return value. |
| GetAttribute(name) | Read one attribute value. |
| GetAttributes(names) | Batch-read; returns IReadOnlyDictionary<string, object?>. |
| SetAttribute(name, value) | Write one attribute value. |
| SetAttributes(dict) | Batch-write. |
All five operations are synchronous from the script's perspective (the central node blocks until the site responds or the method timeout fires). There is no store-and-forward — a site-unreachable or timed-out routed call throws InvalidOperationException back to the script, which surfaces as a 500 to the caller.
RouteTarget.Call builds a RouteToCallRequest carrying ParentExecutionId so the spawned site script execution records the inbound request as its parent in the audit tree:
    var request = new RouteToCallRequest(
        correlationId, _instanceCode, scriptName, ScriptArgs.Normalize(parameters),
        DateTimeOffset.UtcNow, _parentExecutionId);
    var response = await _instanceRouter.RouteToCallAsync(siteId, request, token);
IInstanceRouter is the seam over CommunicationService; in production, CommunicationServiceInstanceRouter delegates every call directly to CommunicationService.RouteToCallAsync / RouteToGetAttributesAsync / RouteToSetAttributesAsync.
Active-node gating
IActiveNodeGate.IsActiveNode is the seam the Host implements using Akka cluster state. When false, InboundApiEndpointFilter returns 503 before any auth or script logic runs. When no implementation is registered — non-clustered hosts, tests — the endpoint is served, preserving prior behaviour.
Audit integration
AuditWriteMiddleware sits in the pipeline above the endpoint filter and handler. At the start of every request it:
1. Mints a fresh Guid as the request's ExecutionId and stashes it on HttpContext.Items[InboundExecutionIdItemKey].
2. Calls HttpRequest.EnableBuffering() (for POST/PUT/PATCH requests only) and reads up to AuditLogOptions.InboundMaxBytes bytes of the request body into a bounded audit copy, then rewinds the stream to position 0 so the downstream handler sees the full payload.
3. Wraps HttpResponse.Body in CapturedResponseStream, which mirrors every write to the real sink while capturing up to InboundMaxBytes bytes for the audit copy.
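The mirroring behaviour in the last step can be illustrated with a small write-only stream wrapper. This is a sketch of the idea, not the actual CapturedResponseStream: BoundedCaptureStream, Captured, and Truncated are hypothetical names, and only the synchronous Write path is overridden. A production wrapper would normally also override the asynchronous write methods.

```csharp
using System;
using System.IO;

// Hypothetical stand-in: forwards every write, keeps at most maxBytes for the audit copy.
public sealed class BoundedCaptureStream : Stream
{
    private readonly Stream _inner;
    private readonly MemoryStream _capture = new();
    private readonly int _maxBytes;

    public BoundedCaptureStream(Stream inner, int maxBytes)
    {
        _inner = inner;
        _maxBytes = maxBytes;
    }

    public byte[] Captured => _capture.ToArray();
    public bool Truncated { get; private set; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // The real sink always receives the full payload.
        _inner.Write(buffer, offset, count);

        // The audit copy stops growing once the cap is reached.
        int room = _maxBytes - (int)_capture.Length;
        int take = Math.Min(room, count);
        if (take > 0) _capture.Write(buffer, offset, take);
        if (take < count) Truncated = true;
    }

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

The cap applies only to the captured copy, so the caller's response is never shortened by audit buffering.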
In the finally block, the middleware calls ICentralAuditWriter.WriteAsync (fire-and-forget with fault observation) to emit one AuditChannel.ApiInbound row. The row's AuditKind is InboundAuthFailure for 401/403 and InboundRequest for all other outcomes. Status is Delivered for 2xx and Failed for 4xx/5xx or a handler exception. Actor is the resolved API key display name (stashed by the endpoint handler on HttpContext.Items[AuditActorItemKey] after successful auth); it is forced null for auth failures so the middleware never echoes an unauthenticated principal. The audit row's ExecutionId is the same Guid minted in step 1.
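The classification rules in the paragraph above reduce to a small pure function. The sketch below uses local enums (SketchAuditKind, SketchAuditStatus) and a hypothetical Classify helper rather than the real AuditKind and AuditStatus types. The text only specifies 2xx, 4xx, and 5xx outcomes, so treating every other non-2xx status as Failed is an assumption.

```csharp
using System;

public enum SketchAuditKind { InboundRequest, InboundAuthFailure }
public enum SketchAuditStatus { Delivered, Failed }

public static class AuditClassificationSketch
{
    // Hypothetical helper mirroring the rules described for the ApiInbound row.
    public static (SketchAuditKind Kind, SketchAuditStatus Status, string? Actor) Classify(
        int statusCode, bool handlerThrew, string? resolvedActor)
    {
        // 401 and 403 are the auth-failure outcomes.
        bool authFailure = statusCode is 401 or 403;
        var kind = authFailure ? SketchAuditKind.InboundAuthFailure : SketchAuditKind.InboundRequest;

        // Delivered only for a clean 2xx; a handler exception is always Failed.
        // Assumption: statuses outside 2xx/4xx/5xx are also treated as Failed.
        bool delivered = !handlerThrew && statusCode is >= 200 and <= 299;
        var status = delivered ? SketchAuditStatus.Delivered : SketchAuditStatus.Failed;

        // Never echo an unauthenticated principal.
        string? actor = authFailure ? null : resolvedActor;

        return (kind, status, actor);
    }
}
```

For example, Classify(403, false, "reporting-key") yields an InboundAuthFailure row with Status Failed and a null Actor, even though a display name was supplied.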
The endpoint handler reads that same ExecutionId from HttpContext.Items and threads it into InboundScriptExecutor.ExecuteAsync as parentExecutionId, which in turn passes it to RouteHelper.WithParentExecutionId. Any Route.To().Call() inside the script carries it as RouteToCallRequest.ParentExecutionId, so the spawned site script execution is linked back to this inbound request in the audit tree.
Audit emission is best-effort. A write failure is caught, logged at Warning, and dropped. It never alters the HTTP response.
Usage
HTTP contract
    POST /api/{methodName}
    Authorization: Bearer sbk_<keyId>_<secret>
    Content-Type: application/json

    {
      "siteId": "SiteA",
      "startDate": "2026-03-01",
      "endDate": "2026-03-16"
    }
Success response (200):
    {
      "siteName": "Site Alpha",
      "totalUnits": 14250,
      "lines": [
        { "lineName": "Line-1", "units": 8200, "efficiency": 92.5 }
      ]
    }
Error responses:
| Status | Condition |
|---|---|
| 401 | Missing, malformed, unknown, revoked, or secret-mismatched token. |
| 403 | Valid key, but not in scope for this method; or method not found. |
| 400 | Missing required parameters, wrong types, or unexpected fields. |
| 413 | Request body exceeds MaxRequestBodyBytes. |
| 500 | Script execution error, compilation failure, or return-shape mismatch. |
| 503 | Request reached a standby node. |
The 403 body is identical whether the method does not exist or the key lacks scope, so a caller holding a valid key cannot enumerate method names by observing status differences.
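From the caller's side, the contract above amounts to a POST with a Bearer header and a JSON body. The sketch below only builds the request. InboundApiClientSketch, BuildRequest, and the base address are hypothetical; the route shape, header scheme, and content type are the ones shown in the contract.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class InboundApiClientSketch
{
    // Hypothetical helper: builds POST {baseAddress}api/{methodName} with Bearer auth.
    public static HttpRequestMessage BuildRequest(
        Uri baseAddress, string methodName, string apiKey, object parameters)
    {
        var uri = new Uri(baseAddress, $"api/{Uri.EscapeDataString(methodName)}");
        var request = new HttpRequestMessage(HttpMethod.Post, uri);

        // apiKey is the full token, in the form sbk_<keyId>_<secret>.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        // Parameters travel as a flat JSON object in the body.
        request.Content = new StringContent(
            JsonSerializer.Serialize(parameters), Encoding.UTF8, "application/json");

        return request;
    }
}
```

The resulting message can be passed to HttpClient.SendAsync. A 503 response means the request reached a standby node, so a client would typically retry against the active node's address.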
Writing a method script
A method script runs as a Roslyn C# script with InboundScriptContext as globals. The script has access to Parameters, Route, and CancellationToken.
    // Example: read a parameter, call a site script, return a result
    var siteId = Parameters.Get<string>("siteId");
    var result = await Route.To(siteId).Call("GetProductionSummary", new { date = Parameters.Get<string>("date") });
    return result;
Route.To().Call() inherits the method-level timeout automatically. A script that needs a tighter per-call bound may pass an explicit CancellationToken. Scripts may not access System.IO, the entire System.Diagnostics namespace (including Process), System.Threading (except Tasks), System.Reflection, System.Net, System.Runtime.InteropServices, Microsoft.Win32, or reflection-gateway members; violations are rejected statically at compile time.
Startup compilation and hot-reload
At startup the Host loads all ApiMethod rows from the configuration database and calls CompileAndRegister on each. After a method is updated via the Management API or CLI, the Management Service calls CompileAndRegister again — the updated script takes effect on the next request, with no node restart. Methods created after startup compile lazily on first invocation. Scripts modified directly in the database do not take effect until the next node restart; always use the Management API, CLI, or Central UI.
Configuration
Options class: InboundApiOptions, bound from the ScadaBridge:InboundApi section.
| Key | Default | Description |
|---|---|---|
| DefaultMethodTimeout | 00:00:30 | Execution timeout applied when ApiMethod.TimeoutSeconds is zero or not set. |
| MaxRequestBodyBytes | 1048576 (1 MiB) | Body size cap enforced by InboundApiEndpointFilter before the body is parsed. Requests whose Content-Length exceeds this return 413; chunked requests are cut off by Kestrel as they stream in. |
| ApiKeyPepper | (required) | Server-side HMAC pepper for bearer credentials. Consumed by the shared IApiKeyVerifier; must be a strong, random value (≥ 16 characters), different per environment, supplied via a secret store. |
The inbound body-capture cap for audit is configured separately under AuditLog:InboundMaxBytes (default 1 MiB; range [8192, 16777216]). It governs only the audit copy — the downstream handler always sees the full body.
Dependencies & Interactions
- Commons (#16): owns ApiMethod, ParameterDefinition, ScriptParameters, ScriptParameterException, the RouteToCall* / RouteToGetAttributes* / RouteToSetAttributes* message records, IInboundApiRepository, and IInstanceLocator. Also owns ICentralAuditWriter (via ZB.MOM.WW.Audit), AuditChannel, AuditKind, AuditStatus, and ScadaBridgeAuditEventFactory.
- Configuration Database (#17): provides the IInboundApiRepository implementation (GetMethodByNameAsync, GetAllApiMethodsAsync, CRUD). Method definitions persist in the central MS SQL configuration database.
- Central–Site Communication (#5): CommunicationServiceInstanceRouter delegates every Route.To() operation to CommunicationService. The routed call travels from the central CentralCommunicationActor to the target site via ClusterClient, reaches the target InstanceActor, and a ScriptExecutionActor executes the named script. The return value flows back synchronously.
- Audit Log (#23): AuditWriteMiddleware resolves ICentralAuditWriter to emit the ApiInbound row via the central direct-write path. The inbound request is the parent execution for any site script it spawns: the middleware's ExecutionId becomes RouteToCallRequest.ParentExecutionId on every routed Call. Cross-link: AuditWriteMiddleware.InboundExecutionIdItemKey / AuditWriteMiddleware.AuditActorItemKey are the HttpContext.Items keys that tie the endpoint handler and middleware together.
- Security (#10): API key verification (IApiKeyVerifier, AddZbApiKeyAuth) is registered by the Host. The inbound API uses a dedicated key scheme independent of LDAP/AD session auth.
- Cluster Infrastructure (#13): IActiveNodeGate (interface in this project; implementation in the Host) gates the endpoint to the active central node. A standby returns 503 without running any script logic.
- Health Monitoring (#11): ScadaBridgeTelemetry.RecordInboundApiRequest(methodName) is called on every request (after auth failures are classified); CentralAuditWriteFailures surfaces on the central health snapshot when an audit write fails.
- Design spec: Component-InboundAPI.md.
Troubleshooting
A method always returns 500 after a script update
The in-memory handler cache still holds the previous compiled delegate. If the update went through the Management API or CLI, CompileAndRegister should have been called automatically and the new script should be active on the next request. If the script was edited directly in the database, the cached delegate is stale until the next node restart. Check the ScadaBridge.InboundAPI log category for a "script compilation failed" or "trust model violation" warning to distinguish a compile error from a routing failure.
A method is stuck in the known-bad-methods cache
If a previously broken script is fixed but ExecuteAsync still returns "Script compilation failed for this method", the method name is in _knownBadMethods. CompileAndRegister clears the bad-method entry on a successful compile; calling it (via the Management API or CLI api-method update) after the fix is applied resets the cache and makes the corrected script active immediately.
Routed calls time out but the site is reachable
The method-level timeout covers the entire execution including Route.To().Call(). A slow site script, a large return value, or network latency can consume the budget. TimeoutSeconds on the ApiMethod entity controls the cap per method; DefaultMethodTimeout in InboundApiOptions applies when TimeoutSeconds is zero. Increase TimeoutSeconds for long-running methods; a 503 from the /health/active endpoint on the site side indicates a site failover mid-call.
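The fallback between the two timeout settings can be written as a one-line rule. The helper below is a hypothetical illustration (TimeoutSketch and ResolveTimeout are not names from the codebase). Treating a negative TimeoutSeconds the same as zero is an assumption, since the text only mentions zero or unset.

```csharp
using System;

public static class TimeoutSketch
{
    // Hypothetical helper: per-method value wins when positive, otherwise the configured default.
    public static TimeSpan ResolveTimeout(int timeoutSeconds, TimeSpan defaultMethodTimeout)
        => timeoutSeconds > 0
            ? TimeSpan.FromSeconds(timeoutSeconds)
            : defaultMethodTimeout; // assumption: negative values also fall back
}
```

With the default of 00:00:30, a method whose TimeoutSeconds is 0 gets 30 seconds for its whole execution, including every routed call it makes, while a method with TimeoutSeconds set to 120 gets two minutes.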
Audit rows missing for inbound requests
AuditWriteMiddleware emits on a fire-and-forget Task; a write failure is caught and logged at Warning under AuditWriteMiddleware. CentralAuditWriteFailures increments on the central health snapshot. The request itself still returns its normal HTTP response — a missing audit row never means the call failed.