# ScadaBridge — Pending / Deferred / Partial / Missing Functionality Audit

**Date:** 2026-06-15

**Scope:** Full system — design specs (`docs/requirements/`), all of `src/`, the Central UI / CLI / Management Service surfaces, and the plan/checklist archive (`docs/plans/`).

**Method:** Five parallel read-only investigators, each verifying doc claims against actual code (file:line evidence). Top findings were independently corroborated by 2+ agents.

## Executive summary

The codebase is unusually clean: **zero real `TODO`/`FIXME` markers in `src/`**, and all 11 implementation phases self-report complete. Consequently the unfinished work does **not** announce itself — it hides in four forms:

1. **Silent gaps (Tier 1)** — documented as working, not marked deferred, but absent or inert in production.
2. **Partial / behaviorally-divergent functionality (Tier 2)** — real, but narrower or different from the spec.
3. **Intentional deferrals (Tier 3)** — knowingly punted, correctly documented, with extensible seams. *Not defects.*
4. **Doc↔code drift (Tier 4)** — code is fine; the specs describe a superseded architecture.

**Actionable risk is concentrated in Tier 1.** Recommended starting point: **#3 / #4 (wire the two never-started audit actors)** — highest impact, smallest blast radius.

---

## Tier 1 — Silent gaps: documented as working, but not actually running

These are the dangerous ones. Specs present them as live behavior, they are **not** marked deferred, yet the functionality is absent or inert.

| # | Gap | Where | Impact |
|---|-----|-------|--------|
| **1** | **Script "test-compilation" does no real compilation.** The headline pre-deploy gate ("scripts must compile without errors") is a brace-balance + forbidden-API *substring* scan. No Roslyn reference in the project. | `TemplateEngine/Validation/ScriptCompiler.cs:56-104`; used by `ValidationService.cs:128` | Semantically-broken C# passes validation and **deploys**. Found independently by 2 agents. |
| **2** | **Script forbidden-API gate is bypassable.** The script trust model's only design-time enforcement is the same substring scan — defeated by aliases / `using static` / `global::`. Self-documented as "SECURITY LIMITATION (TemplateEngine-006)". | `ScriptCompiler.cs:14-22,61-72`; `ValidationService.cs:346` | Security boundary is advisory only. Found independently by 2 agents. |
| **3** | **Audit Log 365-day retention purge never starts.** `AuditLogPurgeActor` exists but has **zero** `ActorOf`/`Props.Create` callers; only the roll-*forward* partition service runs. | `AuditLog/.../AuditLogPurgeActor.cs:58`; `AkkaHostedService.cs:486` ("wired in a later bundle") | The documented purge does not run in production; `AuditLog` grows unbounded. |
| **4** | **Audit Log reconciliation self-heal never wired.** `IPullAuditEventsClient` has no implementation; `SiteAuditReconciliationActor` is never instantiated. | `IPullAuditEventsClient.cs:31`; `SiteAuditReconciliationActor.cs:68` | The documented "lost-telemetry fallback" doesn't exist; forward telemetry is the only path. |
| **5** | **Site Call Audit reconciliation pull + daily purge both missing.** The actor's own docstring admits it. `PurgeTerminalAsync` is implemented but never invoked. | `SiteCallAuditActor.cs:28-34`; `SiteCallAuditRepository.cs:213` | `SiteCalls` mirror has no self-heal and grows unbounded. |
| **6** | **Site Event Logging emits only 2 of 7 documented categories.** Only `connection` + `script` (error-path only). Alarm, Deployment, Store-and-Forward, Instance-Lifecycle, Notification events are never logged — `ISiteEventLogger` isn't injected into those subsystems. | `DataConnectionActor.cs`, `ScriptExecutionActor.cs` (only emitters); spec "Events Logged" §20-28 | Operational event log is materially incomplete vs spec. |

---

## Tier 2 — Partial / behaviorally-divergent functionality

Real, but narrower than the spec — wrong in a way that could surprise an operator or script author.

| # | Gap | Where |
|---|-----|-------|
| **7** | **`Database.CachedWrite` misclassifies permanent SQL errors as transient** → retries forever instead of failing fast to the script. The API path does it right; the DB path does not. No immediate attempt, no synchronous permanent-`Failed` return. | `DatabaseGateway.cs:78-204` (cf. `ExternalSystemClient.cs:100-161`) |
| **8** | **Alarm `conditionFilter` is plumbed end-to-end but applied nowhere** — set a filter on a native-alarm source and it silently mirrors *all* conditions. | `DataConnectionActor.cs:1482,1540-1554`; `RealOpcUaClient.cs:242,295`; `MxGatewayDataConnection.cs:154-167` |
| **9** | **Per-script execution timeout doesn't exist** — spec promises per-script; only a global `ScriptExecutionTimeoutSeconds`. No field in the template/flattened model to carry it. | `SiteRuntimeOptions.cs:31`; `ScriptExecutionActor.cs:100`; `AlarmExecutionActor.cs:66` |
| **10** | **Connection-level diffs never surface in the deployment diff** — `ComputeConnectionsDiff` is dead code (no callers); `ConfigurationDiff` has no slot for it. Per-attribute binding drift *is* caught; standalone connection endpoint (protocol/config/failover) diff is not. | `DiffService.cs:158-204`; `Commons/Types/Flattening/ConfigurationDiff.cs:7-24` |
| **11** | **Inbound API auth transport drift** — code uses `Authorization: Bearer sbk__`; doc says `X-API-Key` header. | `InboundAPI/.../EndpointExtensions.cs:83-90` |
| **12** | **Inbound API audit write is fire-and-forget *after* response flush** — doc says synchronous *before* flush. Row is still emitted (fail-soft), just non-blocking and after the body is forwarded. | `AuditWriteMiddleware.cs:195-212,281-290` |
| **13** | **Inbound `Object`/`List` extended types are shape-validated only** — no nested/field-level type validation, despite spec implying typed/nested validation. | `ParameterValidator.cs:109-145`; `ReturnValueValidator.cs:18` |
| **14** | **JWT-in-cookie session design not implemented** — `/auth/login` signs a plain `ClaimsPrincipal`; `GenerateToken` only used by the CLI `/auth/token` path; `ValidateToken` has no external callers. | `AuthEndpoints.cs:38,75-112,152`; `ServiceCollectionExtensions.cs:99-118` |
| **15** | **"Re-query LDAP every 15 min / roles never >15 min stale" not implemented for interactive sessions** — `JwtTokenService.RefreshToken`/`RecordActivity`/`ShouldRefresh`/`IsIdleTimedOut` have **zero** call sites; roles fixed until cookie expiry. The 15-min sliding + 30-min idle layers are collapsed into a single 30-min sliding cookie window. | `JwtTokenService.*` (no callers); `ServiceCollectionExtensions.cs:99-148` |
| **16** | **Transport stale-instance enumeration always returns empty** — `BundleImporter` returns `Array.Empty()`; UI shows a generic warning with no count, link not filtered to stale instances. | `BundleImporter.cs:733`; `TransportImport.razor:347-388` |
| **17** | **`MachineDataDb` fail-fast requirement not enforced** — spec (REQ-HOST-3/4) requires central nodes to validate a non-empty `MachineDataDb` connection string. `DatabaseOptions` has only `ConfigurationDb`/`SiteDbPath`; validator never checks it; 0 `grep` hits in `src/`. Key lives only in docker appsettings as dead config. | `DatabaseOptions.cs:6-12`; `StartupValidator.cs:60-61` |
| **18** | **CI grep-guard against `UPDATE/DELETE … AuditLog` not in the repo** — spec claims a build-time grep that fails on data-layer mutations. DB-role DENY enforcement *is* present in migrations (so this is a backstop, not the only control), but the claimed code-level guard is absent. | spec `Component-AuditLog.md:335-336`, `Component-ConfigurationDatabase.md:297` |

### Lower-severity Tier-2 / behavioral notes

| # | Gap | Where |
|---|-----|-------|
| 19 | **Script "started"/"completed" events not logged** (only failures, severity `Error`). | `ScriptExecutionActor.cs:239,256`; `ScriptActor.cs:369` |
| 20 | **Return-type compatibility check is dead scaffolding** — `BuildReturnMap` builds maps never read; no return-type comparison runs. | `SemanticValidator.cs:62-63,279-287` |
| 21 | **Argument *type* compatibility not checked** — only arg *count* (comma counting). | `SemanticValidator.cs:251-266,390-425` |
| 22 | **Native-alarm-source connection-capability validation never runs in deploy pipeline** — `alarmCapableConnectionNames` param no production caller supplies. | `SemanticValidator.cs:30-33,239-245`; `FlatteningPipeline.cs:93,115` |
| 23 | **Connection-binding completeness is a non-blocking Warning, not deploy-gating Error**; "name exists at site" half missing. | `ValidationService.cs:504-519`; `ValidationResult.cs:9` |
| 24 | **Debug snapshot/subscribe for unknown instance returns empty snapshot, not error** — caller can't distinguish "not deployed" from "deployed but empty." | `DeploymentManagerActor.cs:845-866` |
| 25 | **Recursion-limit error logged to .NET `ILogger`, not the site event log** as spec requires. | `ScriptRuntimeContext.cs:302-305,464-466` |
| 26 | **Debug-stream snapshot/stream ordering reversed; no timestamp-dedup replay** — `PreStart` sends snapshot first, opens stream after; gap-window events lost (spec wants stream-first + replay/dedup). | `DebugStreamBridgeActor.cs:89-103,163-166` |
| 27 | **OPC UA native-alarm transition leaves several display fields empty** (Category/Description/OperatorUser/OriginalRaiseTime/CurrentValue/LimitValue) — partly by design. | `RealOpcUaClient.cs:395-403`; `MxGatewayAlarmMapper.cs:79-113` |
| 28 | **Readiness gate omits "required cluster singletons running" criterion** — covers membership + DB connectivity only (softened by spec's "(if applicable)"). | `Program.cs:188-201,314-317`; `AkkaClusterHealthCheck.cs:54` |
| 29 | **SiteEventLog active-node purge gate never registered** — `SiteEventLogActiveNodeCheck` not added to DI; purge defaults to `() => true`, runs on standby too (harmless, but documented restriction unenforced). | `SiteEventLogging/ServiceCollectionExtensions.cs:33-37`; `EventLogPurgeService.cs:61` |
| 30 | **`FailedWriteCount` metric exposed "for future Health Monitoring" but never consumed** — dangling metric. | `ISiteEventLogger.cs:32-40` |
| 31 | **`StateTransitionValidator` allows Delete from `NotDeployed`; spec matrix says No** (deliberate per code comment, contradicts doc). | `StateTransitionValidator.cs:38-39` |

---

## Tier 3 — Intentional deferrals (correctly documented — NOT defects)

Knowingly punted, with extensible seams and explicit doc notes. `[PERM]` = permanent / v-next; `[SLICE]` = deferred-to-a-later-slice with seam present.

**Centralized Audit Log (#23)**

- `[PERM]` Hash-chain tamper evidence (v1.x). `verify-chain` CLI is a no-op stub that prints "not enabled in this release". — `AuditCommands.cs:243-246`; `AuditVerifyChainHelpers.cs:6-8`
- `[PERM]` Parquet export/archival. Server returns HTTP `501`; CSV + JSONL implemented. — `AuditEndpoints.cs:188-194`; `AuditExportHelpers.cs:139-148`
- `[PERM]` Per-channel retention overrides. — `2026-05-20-audit-log-code-roadmap.md:16`
- `[PERM]` Tag-cascade for `ParentExecutionId` — only the inbound-API→routed-site bridge is built; trigger-driven runs pass `parentExecutionId = null`. — `ScriptActor.cs:404,429`; `2026-05-21-audit-parent-executionid-design.md:209`
- `[PERM]` ExecutionId/ParentExecutionId backfill on historical rows; SourceNode backfill on legacy rows; per-node stuck-count KPIs.
- `[PERM]` Structured/response-header response capture; inbound request-header capture; per-method opt-out; `AuditInboundCeilingHits` metric. — `2026-05-23-inbound-api-full-response-audit-design.md:113-127`
- *Uncertain:* CLI `audit tree` command (doc "maybe", not found in CLI).

**Notifications (#8 / #21)**

- `[SLICE]` Teams (and all non-Email) notification types — `INotificationDeliveryAdapter` seam exists, only `EmailNotificationDeliveryAdapter` implemented; `NotificationType` enum is Email-only. Missing-adapter path parks gracefully. — `NotificationType.cs:6-9`; `NotificationOutboxActor.cs:457-474`
- `[SLICE]` Central UI notification-list form has no `Type` selector (Email hard-coded). — `NotificationListForm.razor`
- `[PERM]` Historical/trend KPI charts (no time-series store).

**Native Alarms / MxGateway / OPC UA**

- `[PERM]` Native-alarm ack/shelve/suppress write-back; central alarm tables/history/journal; alarm-driven notifications/scripts — read-only by design. — `2026-05-29-native-alarms-design.md:201-206`
- `[SLICE]` Dedicated operator Alarm Summary page (DebugView only for now).
- `[PERM]` MxGateway secured writes (operator+verifier).
- `[SLICE]` OPC UA address-space search; `BrowseNext` paging. — `RealOpcUaClient.cs:574`
- `[PERM]` OPC UA type-info surfacing; bulk override import/CSV.
- `[SLICE]` OPC UA "Verify endpoint" connectivity button; cert-management UI.

**Transport (#24)**

- `[PERM]` Site-scoped / instance-scoped artifact transport (needs name-mapping subsystem).
- `[PERM]` Direct cluster-to-cluster pull; asymmetric bundle signing; differential/incremental bundles.
- `[PERM/SLICE]` Per-line/Myers diff for Modified artifacts (coarse line-count delta only). — `ArtifactDiff.cs:18-25`

**TreeView**

- `[SLICE/PERM]` R6 lazy-loading, R7 keyboard nav, R16 multi-select — spec marks all "(Deferred)". — `Component-TreeView.md:87-93,288-295`

**Templates / Data Connections / Triggers UI**

- `[SLICE]` Template tree search/filter; `[PERM]` folder drag-drop, sibling reorder, root context menu.
- `[PERM]` Move data connection between sites; `[SLICE]` connection live-status indicators (blocked on DCL state surfacing).
- `[SLICE]` Base-template versioning "update-derived" flow; multiple inheritance levels; `[PERM]` promote-derived-to-base, cross-tenant libraries.
- `[SLICE]` Strict expression-trigger analysis kind; `[PERM]` WhileTrue trigger mode for alarms.
- `[SLICE]` Schema-driven value-entry forms; schema hover/completion; `[PERM]` JSON Schema `$ref` reuse / template-level schema library.

**Cached-call tracking (#6 / #22)**

- `[SLICE]` CLI surface for site-local Retry/Discard of cached calls; `[PERM]` unified notifications+site-calls outbox page.

**UI audit backlog (`2026-05-12-ui-audit.md:536-554`)**

- `IDialogService` modal abstraction; design-tokens/CSS-vars; dark-mode/theming; shared pagination+filter component; accessibility pass; replacing SignalR debug-view streaming.

**Environment / tooling**

- `[PERM]` True air-gapped second environment (env2 shares MSSQL/LDAP/SMTP); 3rd/4th env; `--env` flag on `deploy.sh`.
- `[PERM]` Repo/folder rename (kept as ScadaBridge to preserve context).
- `[SLICE]` Playwright alarm-override UI coverage.

---

## Tier 4 — Doc↔code drift (code is fine; docs describe a superseded architecture)

Worth fixing for anyone relying on the docs as the spec.

**Config DB / Commons re-architecture not reflected in specs (High doc-impact):**

- `AuditLog` table collapsed to 10 canonical + `DetailsJson` + 6 PERSISTED `JSON_VALUE` computed cols; doc still lists ~24 typed columns (`Kind`, `HttpStatus`, `RequestSummary`, …). — migration `20260602174346_CollapseAuditLogToCanonical.cs`; `Entities/AuditLogRow.cs:54-136`
- `AuditEvent` moved out of Commons into the external `ZB.MOM.WW.Audit` NuGet package; doc (REQ-COM-1/3/5b) still describes it as a Commons type. — `Commons.csproj:11`
- `ApiKey` entity / API-key persistence retired to shared `ZB.MOM.WW.Auth.ApiKeys` SQLite store; doc still lists `ApiKey` + `ApprovedApiKeyIds`. — migration `20260602092753_RetireInboundApiKeyStore.cs`

**CLI docs drift (README is the stale doc; `Component-CLI.md` mostly matches code):**

- Entire `bundle` (Transport #24) command group is shipped + registered but documented in **neither** `Component-CLI.md` **nor** `CLI/README.md`. — `Program.cs:36`; `BundleCommands.cs:24-372`
- `security api-key create` requires undocumented `--methods` (Required); docs show only `--name`. — `SecurityCommands.cs:41-45`
- `security api-key update`/`delete` use `--key-id`; docs document `--id` (and an unwired `--name` on update). — `SecurityCommands.cs:60,71`
- `security api-key set-methods` subcommand exists in code, documented nowhere. — `SecurityCommands.cs:91-102`
- `api-method create` uses required `--script`; docs document `--code` + `--description` (neither exists). README is internally inconsistent (create=`--code`, update=`--script`). — `ApiMethodCommands.cs:57-62`
- `db-connection create`/`update` documented with `--provider`; code has no such option. — `DbConnectionCommands.cs:56-72`
- Widespread README option-name drift where `Component-CLI.md` already matches code (scope-rule `--mapping-id`, health `--site`/`--keyword`, template attribute `--value`/`--data-source`, template alarm `--trigger-type`/`--priority`/`--trigger-config`, composition delete `--id`, etc.).
- `audit query` doc lists `--page` (code is keyset-only `--all`); undocumented `--execution-id`/`--parent-execution-id` filters exist.

**Stale "deferred" markers for things that have actually SHIPPED:**

- Transport CLI (`bundle export/preview/import`) — design doc §13 said "deferred"; now implemented.
- `SourceNode` capture — `.tasks.json` shows all 21 tasks "pending"; fully implemented across Commons/AuditLog/NotificationOutbox/SiteCallOperational.
- Site Call Audit Retry/Discard relay — DI comment says deferred; implemented + wired (`SiteCallAuditActor.cs:150-156,450-505`; `AkkaHostedService.cs:580-589`).
- Bundle-import audit filter UI (Transport-012) — doc says deferred follow-up; shipped (`ConfigurationAuditLog.razor` `?bundleImportId=` filter).
- Redaction/payload-cap "deferred to M5" comments in Site Runtime — already shipped (`ScadaBridgeAuditRedactor`, `AuditLogOptions.DefaultCapBytes/ErrorCapBytes`).
- `AuditLogPage.HandleRowSelected` class comment says "no-op seam"; method is fully wired (opens drawer).

**Other doc/spec inconsistencies (code richer/different than doc):**

- Security role names: doc says Admin/Design/Deployment; code uses Administrator/Designer/Deployer/Viewer (canonicalized via migration).
- `SiteCall` entity field names diverge from doc (`Channel` not `Kind`, `SourceSite` not `SourceSiteId`, adds `HttpStatus`/`IngestedAtUtc`).
- `ExecuteReader` audited as `DbWrite` (read/write distinguished via `Extra` JSON `op`, not a distinct `AuditKind`).
- Inbound audit doc references `ApiInbound.Completed`; actual kinds are `InboundRequest`/`InboundAuthFailure`.
- `Teams` claimed present in `NotificationType` enum by Commons/ConfigDB docs; enum is Email-only.
- Commons under-documents shipped code: MxGateway endpoint serializer/validator/config, `Observability/ScadaBridgeTelemetry.cs`, `IInboundApiKeyAdmin`, `IAuditActorAccessor` — none in the doc folder map.
- `IHealthMonitoringRepository` listed in ConfigDB repo table but doesn't exist (doc annotated "future").
- `requirements-traceability.md` and many `.md.tasks.json` show "Pending" for shipped features — they track *plan generation*, not implementation; unreliable as a status source.
- `ExternalSystemForm` "Recent audit activity" drill-in omits `channel=ApiOutbound` and uses exact-match `target` instead of starts-with (sibling `ApiKeyForm` link is correct). — `ExternalSystemForm.razor:20-24`

---

## Code-level sweep — investigated and ruled out (false positives)

For completeness, items that *look* unfinished but are intentional:

- ~44 empty `catch` blocks — all have explanatory comments / intentional fallback (JSON parse → default; disposal-race `ObjectDisposedException`). None silently swallow real errors.
- `SiteNotificationRepository` / `SiteExternalSystemRepository` write methods throw `NotSupportedException` — by design (site config is read-only, managed via central deployment).
- `StubOpcUaClient` (canned data; `BrowseChildrenAsync` throws `NotImplementedException`) — dev/test-only; production wires `RealOpcUaClientFactory`/`RealMxGatewayClientFactory` (`DataConnectionFactory.cs:38-47`).
- `NoOpSiteStreamAuditClient`, `SandboxNotifyHelper`, sandbox host fakes — legitimate DI-default / test composition seams.
- `AddSecurityActors` / `AddTemplateEngineActors` "Phase 0 placeholder" registrations — intentional empty seams (actor wiring lives in Host).
- Migration `Down()` `NotSupportedException`, MxGateway/Bundle version-rejection `NotSupportedException`, `AuditWriteMiddleware` write-only-stream `NotSupportedException` — intentional guards.
- Management Service: 113 handlers; all wire-registered `Mgmt*` commands dispatched. The three "unhandled" (`ResolveRolesCommand` retired; `BrowseNodeCommand`/`ReadTagValuesCommand` routed direct-to-site) are intentional.
- Central UI: no stub/placeholder pages, no `NotImplementedException`, no "coming soon" banners, no no-op `@onclick`. `disabled=`/`placeholder=` usages are legitimate (loading guards, edit locks, HTML hints).

---

## Phase completeness (self-reported)

All 11 phases report Complete with passing verification gates:

- **Phase 0** Solution Skeleton — Complete (gate 11/11, 57 tests).
- **Phase 1** Central Foundations — Complete (gate 20/20, 186 pass + 1 live-LDAP skip).
- **Phase 2** Modeling & Validation — Complete (gate 9/9, 359 tests).
- **Phase 3A** Runtime Foundation — Complete (gate 13/13, 389/389).
- **Phase 3B** Site I/O & Observability — Complete (gate 11/11, 541 cumulative).
- **Phase 3C** Deployment & Store-Forward — Complete (terse checklist).
- **Phase 4** Operator UI — Complete (terse checklist).
- **Phase 5** Authoring UI — Complete (terse checklist).
- **Phase 6** Deployment & Ops UI — Complete (terse checklist; Codex external-review step skipped, best-effort).
- **Phase 7** Integrations — Complete (terse checklist; Q12 SMTP-OAuth2 is a test-env dependency).
- **Phase 8** Production Readiness — Complete (terse checklist).

The ~665 unchecked `- [ ]` items in phase *plan* docs are spec-traceability references (each dispositioned Pass / Out-of-scope in Forward/Reverse tables), a documentation style — not a TODO list.

**Operational (not code):** `docs/deployment/production-checklist.md` has ~60 unchecked install-time operator steps (env vars, connection strings, firewall ports 8081/636/587/1433, TLS certs, smoke tests).

---

## Confidence & caveats

- **High confidence** on Tier 1 — each item verified by reading the code (class/interface existence + absence of callers via grep); top items corroborated by 2+ independent agents.
- Terse Phase 3C–8 checklists self-report "Complete / tests passing" with no per-gate breakdown; test counts for those phases were not independently re-run.
- Actual `src/` artifacts were treated as truth over `.tasks.json` status fields, which are demonstrably stale.
- Items marked *Uncertain* (e.g. `audit tree` CLI, per-channel retention) rest on doc text only.

## Recommended next steps

1. **Wire the two never-started audit actors** (#3, #4) — highest impact, smallest blast radius (DI/Host wiring + `IPullAuditEventsClient` impl).
2. **Site Call Audit reconciliation + purge** (#5) — same shape as #3/#4.
3. **Decide on script compilation/security** (#1, #2) — either implement the Roslyn gate or downgrade the spec's claims; currently the strongest functional + security gap.
4. **Site Event Logging categories** (#6) — inject `ISiteEventLogger` into the 5 missing subsystems.
5. **Reconcile Tier-4 doc drift** — update Config DB / Commons specs for the audit/auth re-architecture and the CLI docs for the `bundle` group + option names.
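
Item #18 reports that the build-time grep guard claimed by the spec is not in the repo. The following is a minimal sketch of what such a guard could look like, assuming a bash-compatible shell and a `grep` that supports `-r` and `--include`. The function names, file globs, and regex are hypothetical (the spec's actual pattern was not inspected here) and would need tuning against the real data layer, for example to exclude any files that must legitimately contain such statements.

```shell
#!/usr/bin/env bash
# Sketch of a build-time guard for an append-only AuditLog table.
# Assumptions (not taken from the repo): the file globs, the regex, and both
# function names are hypothetical.

# Print file:line:text for every UPDATE/DELETE statement that targets AuditLog.
# The pattern tolerates an optional dbo schema prefix and [bracketed] names,
# and requires a non-identifier character (or end of line) after the table
# name so that tables such as AuditLogArchive are not flagged.
find_audit_mutations() {
  local root="$1"
  grep -rnEi --include='*.cs' --include='*.sql' \
    '(UPDATE|DELETE([[:space:]]+FROM)?)[[:space:]]+(\[?dbo]?\.)?\[?AuditLog]?([^A-Za-z0-9_]|$)' \
    "$root" || true   # grep exits 1 when nothing matches; that is the good case
}

# Return 0 when the tree is clean, 1 (with a report on stderr) otherwise.
audit_guard() {
  local hits
  hits="$(find_audit_mutations "$1")"
  if [ -n "$hits" ]; then
    printf 'AuditLog is append-only; mutation statements found:\n%s\n' "$hits" >&2
    return 1
  fi
  return 0
}
```

A CI step could call `audit_guard src` and fail the build on a non-zero exit. As item #18 notes, the DB-role DENY in the migrations remains the primary control; a guard like this would only act as a backstop.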