Files
ScadaBridge/docs/known-issues/2026-06-25-inbound-api-compile-error-not-client-visible.md
T
Joseph Doherty 8a78e759c0 docs: former-api-specs (MES + DNC/Delmia) + inbound compile-error known issue
- former-api-specs/mes: Alarm-API, MoveIn-MoveOut-API, API-key authgaps (from ~/Desktop/mesapi)
- former-api-specs/dnc: Delmia-Integration-API — Delmia document service + WW recipe-download notify (from ~/Desktop/delmiaintegration)
- known-issues: inbound API compile error not client-visible; no api-method validate
2026-06-26 04:13:19 -04:00

6.5 KiB

Known issue — inbound API method compile errors are not client-visible; no on-demand validation (2026-06-25 session)

Status: OPEN · Found: 2026-06-25 · Context: live ops session on wonder-app-vd03 — three deployed inbound methods (IpsenMESMoveIn, MesMoveIn, MesMoveOut) returned Script compilation failed for this method after being authored from a design doc that used the wrong DB-helper name (Database.QuerySingle<T> instead of the shipped async Database.QuerySingleAsync<T>). Diagnosing the actual Roslyn error required an SSH dive into the central log; nothing in the CLI or the Management/data-plane API surfaces it. Components: Inbound API (#14), CLI (#19), Management Service (#18)

Issues are listed worst-first. Severities are author estimates. Neither item caused data loss — once the scripts were corrected via UpdateApiMethod they compiled and ran (verified with a live MesMoveIn test against the Z28061Sim instance: {"WasSuccessful":true,"ErrorText":"","BatchID":0}).

Related: the runtime mechanics behind both items are captured in the recall notes inbound-known-bad-method-cache and scadabridge-inbound-db-helper-querysingleasync. The root-cause doc fix shipped in 66bbbb7a / 33da8c79.


1. The real inbound-script compile error is server-log-only; there is no api-method validate

Severity: Medium · Components: Inbound API (#14), CLI (#19), Management Service (#18)

Symptom: When an inbound method's script fails to compile, every caller of POST /api/{method} gets the same generic body — Script compilation failed for this method — with no diagnostic. The actual Roslyn error (e.g. 'InboundDatabaseHelper' does not contain a definition for 'QuerySingle') is written only to the central server log. There is no CLI command and no API verb to (a) retrieve the last compile error for a method, or (b) compile/validate a method's script on demand the way templates can be validated.

Reproduction (this session):

curl -s -X POST http://wonder-app-vd03.zmr.zimmer.com:8085/api/IpsenMESMoveIn \
  -H "X-API-Key: <key>" -H "Content-Type: application/json" -d '{ ...MoveIn... }'
# -> {"WasSuccessful":false,"ErrorText":null,"...":"Script compilation failed for this method"}   (no detail)

The only way to get the real cause was:

ssh -tt -p 2222 -i ~/.ssh/servecli_wonder dohertj2@wonder-app-vd03.zmr.zimmer.com
# grep E:\ApiInstall\ScadaBridge\central\logs\scadabridge-central-<date>.log for "script compilation failed"

Root cause: InboundScriptExecutor deliberately returns a non-leaky generic message to the data plane (InboundScriptExecutor.cs:299 and :311"Script compilation failed for this method"), while the genuine diagnostic only ever reaches the logger:

  • InboundScriptExecutor.cs:182-183LogWarning("API method {Method} script compilation failed: {Errors}", …)
  • InboundScriptExecutor.cs:197LogError(ex, "Failed to compile API method {Method} script", …)

Returning the raw Roslyn text to an external, API-key caller is the right default (it can leak code/internal type names), but it means an operator/admin has no first-class channel to that text either. Contrast template validate (CLI template validate --id, README §Template) which runs a real compile and returns the diagnostics — there is no api-method validate equivalent (grep for ValidateApiMethod/CompileApiMethod/RecompileApiMethod across src/ returns nothing; ApiMethodCommands.cs registers only list/get/create/update/delete).

Impact: turns a one-line fix into a host-access investigation. Authoring/repairing an inbound script becomes "update → fire a request → if it fails, SSH into the host and read the log → repeat," instead of "validate → read the error → fix."

Suggested fix (pick one or both):

  1. api-method validate --id <int> (CLI #19) backed by a management ValidateApiMethodCommand (#18) that runs CompileAndRegister's compile path and returns the structured diagnostics to the authenticated, role-gated management caller (never the data plane). Mirror template validate.
  2. Surface the last compile state on api-method get — e.g. LastCompileError (string, null when clean) + IsKnownBad (bool) — so an operator can see why a method is failing without re-firing it. Keep the data-plane /api/{method} body generic as-is.

2. The _knownBadMethods cache is neither observable per-method nor resettable without a full-replace update

Severity: Low-Medium · Components: Inbound API (#14)

Symptom: Once a method's script fails to compile, its name is recorded in an in-memory bad-methods set and every later request short-circuits to the generic message without recompiling. Consequences observed/known:

  • Fixing the stored script directly in the config DB does not take effect — the running process keeps serving the compile-failed message until a management UpdateApiMethod (which calls CompileAndRegister) or a service restart. (This is exactly why one of the three methods this session, MesMoveIn, had to be re-saved via the management API even though its stored script was already fine.)
  • There is no way to see whether a given method is currently in the bad set (only an aggregate internal int KnownBadMethodCount, not exposed to any client), and no lightweight way to force a recompile short of a full-entity replace.

Root cause: InboundScriptExecutor (InboundScriptExecutor.cs):

  • declares the cache at :39 (ConcurrentDictionary<string, byte> _knownBadMethods),
  • short-circuits on it at :58 / :298, adds to it at :62,
  • and clears an entry in exactly one placeCompileAndRegister (:103, removal at :118). A direct DB row edit never runs CompileAndRegister, so the stale entry persists.

The cache itself is correct and desirable (it stops every request re-running a doomed Roslyn compile). The gap is purely observability + a targeted reset.

Suggested fix:

  • Expose per-method state via the item-1 fix (IsKnownBad / LastCompileError on api-method get).
  • Add a lightweight api-method recompile --id (management RecompileApiMethodCommand) that re-runs CompileAndRegister for one method without requiring the caller to round-trip the whole entity (script + timeout + parameterDefinitions + returnDefinition) — today UpdateApiMethod is full-replace, so an operator must re-send every field just to bust the cache. This is the smaller, lower-risk sibling of item 1's validate verb.