Commit Graph

398 Commits

Author SHA1 Message Date
Joseph Doherty
b249ca3bf7 fix(management-service): resolve ManagementService-001/002/003 — enforce site scope on query/snapshot handlers and DebugStreamHub 2026-05-16 19:47:17 -04:00
Joseph Doherty
6f4efdfa2e fix(inbound-api): resolve InboundAPI-001/003/005 — concurrent handler cache, constant-time API key compare, script trust-model enforcement 2026-05-16 19:47:17 -04:00
Joseph Doherty
d30ded7e72 docs(code-reviews): regenerate index after batch 2 High fixes 2026-05-16 19:40:40 -04:00
Joseph Doherty
a0e6a36e79 fix(host): resolve Host-001 — exclude leader-only active-node check from /health/ready 2026-05-16 19:40:40 -04:00
Joseph Doherty
7d7214a4ca fix(health-monitoring): resolve HealthMonitoring-001/002 — populate S&F buffer depth, make SiteHealthState immutable 2026-05-16 19:40:40 -04:00
Joseph Doherty
340a70f0e6 fix(external-system-gateway): resolve ExternalSystemGateway-002/003 — apply HTTP call timeout, confirm CachedCall no double-dispatch 2026-05-16 19:40:40 -04:00
Joseph Doherty
ab098bf6c8 fix(deployment-manager): resolve DeploymentManager-001/002 — broaden failure catch, persist failure status with non-cancellable token 2026-05-16 19:40:40 -04:00
Joseph Doherty
fccd3274d3 fix(data-connection-layer): resolve DataConnectionLayer-002/003/004/005 — Resume supervision, concurrent dicts, subscribe-failure classification, write timeout 2026-05-16 19:40:40 -04:00
Joseph Doherty
d7630d80fe docs(code-reviews): regenerate index after batch 1 High fixes 2026-05-16 19:33:11 -04:00
Joseph Doherty
db08c6eb38 docs(code-reviews): re-triage ClusterInfrastructure-001 — bootstrap lives in Host, needs design decision 2026-05-16 19:33:09 -04:00
Joseph Doherty
9043f0089b fix(configuration-database): resolve ConfigurationDatabase-001 — remove dead child-template query in GetTemplateWithChildrenAsync 2026-05-16 19:33:09 -04:00
Joseph Doherty
301e7fb854 fix(communication): resolve Communication-002/003 — gRPC reconnect stream cleanup and subscription map safety 2026-05-16 19:33:09 -04:00
Joseph Doherty
87f14c190a fix(central-ui): resolve CentralUI-002/003/004 — site-scope enforcement, per-circuit console capture, cached auth state 2026-05-16 19:33:09 -04:00
Joseph Doherty
5a08b04535 fix(cli): resolve CLI-001 — honor SCADALINK_FORMAT and config-file format precedence 2026-05-16 19:33:09 -04:00
Joseph Doherty
d8f99ba781 docs(code-reviews): add regen-readme.py to generate the review index
README.md is now generated from the per-module findings.md files by
code-reviews/regen-readme.py (discovers modules, parses each finding's
severity/status, rebuilds the Pending Findings and Module Status tables).
Run with --check to fail when README.md is stale (CI-friendly).

REVIEW-PROCESS.md section 5 now points to the script instead of describing
a manual edit, and README.md carries a generated-file banner.
2026-05-16 19:18:18 -04:00
Joseph Doherty
91438dcc1b fix(store-and-forward): create the SQLite database directory on init (StoreAndForward-014)
StoreAndForwardStorage.InitializeAsync opened a SqliteConnection against the
configured SqliteDbPath (default ./data/store-and-forward.db) without ensuring
the parent directory exists. SQLite creates the database file but not its
directory, so when data/ was absent the connection failed with
"SQLite Error 14: unable to open database file" — aborting the site host's
RegisterSiteActors at StoreAndForwardService.StartAsync.

This was the root cause of the six failing SiteActorPathTests. Production
masked it because the Docker image / deployment creates data/.

InitializeAsync now calls EnsureDatabaseDirectoryExists, which parses the
connection string and creates the parent directory of a file-backed database
(in-memory databases and bare filenames are skipped).

Regression test InitializeAsync_FileInMissingDirectory_CreatesDirectory fails
against the pre-fix code. Host suite now 155/155 green (was 149/155).
2026-05-16 19:13:00 -04:00
Joseph Doherty
61253e3269 fix(store-and-forward): resolve S&F delivery + replication wiring (3 Critical findings)
Resolves StoreAndForward-001, ExternalSystemGateway-001, NotificationService-001
— one systemic gap where buffered messages were persisted but never delivered,
and the active node never replicated its buffer to the standby.

Delivery handlers (ExternalSystemGateway-001 / NotificationService-001):
- AkkaHostedService registers delivery handlers for the ExternalSystem,
  CachedDbWrite and Notification categories after StoreAndForwardService starts;
  each resolves its scoped consumer in a fresh DI scope.
- ExternalSystemClient, DatabaseGateway and NotificationDeliveryService each
  gain a DeliverBufferedAsync method: re-resolve the target and re-attempt
  delivery, returning true/false/throwing per the transient-vs-permanent contract.
- EnqueueAsync gains an attemptImmediateDelivery flag; CachedCallAsync and
  NotificationDeliveryService.SendAsync pass false (they already attempted
  delivery themselves) so registering a handler does not dispatch twice.

Replication (StoreAndForward-001):
- ReplicationService is injected into StoreAndForwardService; a new BufferAsync
  helper replicates every enqueue, and successful-retry removes and parks are
  replicated too. Fire-and-forget, no-op when replication is disabled.

Tests: StoreAndForwardReplicationTests (Add/Remove/Park observed),
attemptImmediateDelivery behaviour, and DeliverBufferedAsync paths for each
consumer. Full solution builds; StoreAndForward/ExternalSystemGateway/
NotificationService suites green.
2026-05-16 18:58:11 -04:00
Joseph Doherty
a9bd7ee37c fix(central-ui): resolve CentralUI-001 — enforce script trust model before sandbox execution
ScriptAnalysisService.RunInSandboxAsync compiled and executed arbitrary
user C# in the central host process with no trust-model enforcement — the
forbidden-API set was only a Monaco editor diagnostic. A Design-role user
could run System.IO/Process/Reflection/network code on the central node.

Added a Roslyn semantic gate (EnforceTrustModel) invoked after compilation
and before script.RunAsync, and on nested shared scripts in callSharedFunc;
a script referencing any forbidden API is rejected before it runs.

Reworked FindForbiddenApiUsages: it now resolves every identifier against
the semantic model and checks types and members, so a fully-qualified call
(System.IO.File.WriteAllText) is caught — the pre-fix check only inspected
the leftmost identifier and missed that shape. This is a static semantic
gate, not a process sandbox.

Adds gate regression tests that fail against the pre-fix code, plus a
clean-script test guarding against over-blocking.
2026-05-16 18:41:12 -04:00
Joseph Doherty
a9ceba00d0 fix(communication): resolve Communication-001 — early stream termination handling
DebugStreamService.StartStreamAsync awaited the initial debug snapshot inside
a try whose only handler was catch (OperationCanceledException). When the
stream terminated before the snapshot arrived, onTerminatedWrapper completed
the await with an InvalidOperationException that escaped the catch — the
caller got a raw, untranslated exception and the service did no teardown of
its own on that path.

Replaced with catch (Exception): it removes the session entry, sends
StopDebugStream to the bridge actor via the local reference (deterministic
teardown, idempotent), and throws a descriptive exception — TimeoutException
for the 30s timeout, otherwise an InvalidOperationException naming the
instance/site and wrapping the cause.

Re-triaged Critical -> Medium: the originally-claimed multi-minute site-side
resource leak does not occur (the bridge actor self-terminates on every
onTerminated path). Adds the first DebugStreamService test, which fails
against the pre-fix code.
2026-05-16 18:32:52 -04:00
Joseph Doherty
239bee3bc4 fix(data-connection): resolve DataConnectionLayer-001 — off-thread actor state mutation
HandleSubscribe spawned a Task.Run that mutated DataConnectionActor private
state (_subscriptionIds, _subscriptionsByInstance, _totalSubscribed,
_resolvedTags, _unresolvedTags) from a thread-pool thread, racing the actor's
own message loop — a data race on non-thread-safe Dictionary/HashSet and
non-atomic counters.

Restructured HandleSubscribe to follow the actor's existing PipeTo(Self)
pattern: the background task now performs only adapter I/O and pipes a
SubscribeCompleted message to Self; all subscription-state mutation happens
in the new HandleSubscribeCompleted handler on the actor thread (wired into
the Connected, Connecting and Reconnecting states).

Adds DCL001_ConcurrentSubscribes_DoNotCorruptSubscriptionCounters (30x30
concurrent subscribes) which fails against the pre-fix code and passes after.
2026-05-16 18:26:43 -04:00
Joseph Doherty
977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00
Joseph Doherty
9c60592632 build: adopt NuGet Central Package Management
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
2026-05-16 15:56:30 -04:00
Joseph Doherty
fd1518f4f4 test(central-ui): remove vacuous tests for removed analyzer diagnostics
Six tests asserted DoesNotContain(SCADA004/SCADA005) or an empty InlayHints
result — all pass for the wrong reason now that those diagnostics and the
positional InlayHints were removed in the analyzer realignment. They also
used the obsolete top-level CallScript syntax. Removed.
2026-05-16 15:06:30 -04:00
Joseph Doherty
b949dc4183 test(central-ui): realign analyzer tests with the reworked script-call API 2026-05-16 15:04:06 -04:00
Joseph Doherty
3cc174c3cd test(central-ui): fix the CentralUI.Tests build
Two stale references blocked compilation: the DataConnection page tests
still pointed at Components.Pages.Admin (the pages moved to .Design), and
ScriptAnalysisServiceTests constructed ScriptAnalysisService without the
IServiceProvider parameter. The project now compiles.
2026-05-16 14:44:30 -04:00
Joseph Doherty
d030153378 test(site-runtime): fix stale SetStaticAttribute tests
HandleSetStaticAttribute was made fire-and-forget (commit 2951507) — it no
longer replies with SetStaticAttributeResponse — but three InstanceActor
tests still ExpectMsg<SetStaticAttributeResponse> and timed out. Verify the
mutation via the GetAttributeRequest round-trip instead, which the FIFO
mailbox makes a sound sync point. Test intent (in-memory update, SQLite
persistence, serialized ordering) is unchanged.
2026-05-16 14:33:09 -04:00
Joseph Doherty
d63d412461 test(triggers): expect AlarmTriggerType.Expression in the enum membership test 2026-05-16 06:42:17 -04:00
Joseph Doherty
0a535cd4a5 fix(triggers): don't false-flag Children/Parent attribute refs in expression validation 2026-05-16 06:08:06 -04:00
Joseph Doherty
5065384305 fix(triggers): use explicit ValidationCategory + tighten expression syntax validation 2026-05-16 05:57:39 -04:00
Joseph Doherty
bf3f572ad9 feat(triggers): validate expression triggers pre-deployment 2026-05-16 05:52:25 -04:00
Joseph Doherty
3499d76f14 feat(ui/triggers): expression trigger panel in the script & alarm editors 2026-05-16 05:46:27 -04:00
Joseph Doherty
78b10d00d8 fix(triggers): bound expression evaluation, align AlarmActor error handling, dedupe config parsing 2026-05-16 05:43:18 -04:00
Joseph Doherty
41c3fa3d84 fix(triggers): seed the trigger-expression attribute snapshot at actor startup 2026-05-16 05:38:50 -04:00
Joseph Doherty
9e21b47080 feat(triggers): runtime expression trigger evaluation for scripts and alarms 2026-05-16 05:35:02 -04:00
Joseph Doherty
f789ab4a91 docs(triggers): list the Expression config shape in the codec summaries 2026-05-16 05:30:12 -04:00
Joseph Doherty
199cdbe798 feat(triggers): add Expression to the script & alarm trigger codecs 2026-05-16 05:27:33 -04:00
Joseph Doherty
8050a1996f docs(plans): implementation plan for expression triggers 2026-05-16 05:25:10 -04:00
Joseph Doherty
c94d3b7570 docs(plans): design for expression-based script & alarm triggers
Captures the brainstormed design for a new Expression trigger: a read-only
boolean C# expression evaluated on attribute updates, edge-triggered for
scripts and level-based for alarms, compiled against a restricted read-only
globals type.
2026-05-16 05:21:57 -04:00
Joseph Doherty
6fb313cf58 feat(ui/templates): structured trigger editor for template scripts
The script add/edit modal exposed a script's trigger as two raw free-text
inputs — a type string and hand-written config JSON — with no validation
and no parity with the alarm trigger UI.

Replace them with a ScriptTriggerEditor component (mirroring
AlarmTriggerEditor): a trigger-type dropdown plus type-specific panels for
Interval, ValueChange, Conditional, and Call, a grouped attribute picker,
and an auto-generated hint. A ScriptTriggerConfigCodec round-trips the
TriggerConfiguration JSON the site runtime's ScriptActor consumes, tolerant
of legacy keys; an unrecognized stored type is preserved untouched in a
read-only panel.
2026-05-16 04:03:42 -04:00
Joseph Doherty
295150751f feat(scripts): realign Test Run with runtime API, add anonymous-object calls and instance binding
The Test Run sandbox and Monaco analysis modelled a script API that had
drifted from the site runtime's ScriptGlobals, so real scripts failed to
compile in Test Run. Realign both to the runtime surface
(Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the
duplicate ScriptHost stub so the two cannot diverge again.

- Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call)
  accept an anonymous object instead of a hand-built dictionary, via a
  shared ScriptArgs normalizer; existing dictionary calls still compile.
- Test Run can optionally bind to a deployed instance, so Instance/
  Attributes/CallScript route to it cross-site; adds site-side
  RouteToGetAttributes/RouteToSetAttributes handlers.
- Adds Test Run panels to the API method and template script editors.
- Fixes the TestDatabaseQuery seed script, which queried a table that
  never existed.

Also commits unrelated in-progress work already in the tree: the health
monitoring report loop, site streaming changes, and the Admin/Design
data-connection and SMTP page reorganization.
2026-05-16 03:37:56 -04:00
Joseph Doherty
d7b05b40e9 fix(host): drop UseStaticFiles so MapStaticAssets controls caching
UseStaticFiles middleware ran before the MapStaticAssets endpoints and
served static assets (monaco-init.js, site.css, etc.) with no
Cache-Control header. Browsers then heuristically cached them and kept
serving stale copies across deploys — e.g. the Monaco editor ran an old
monaco-init.js that did not send the script kind, so inbound API method
scripts were analysed against the wrong globals and 'Route' was flagged
as undefined.

MapStaticAssets alone now serves every static asset, tagging
non-fingerprinted files with Cache-Control: no-cache so the browser
always revalidates via ETag.
2026-05-15 12:29:14 -04:00
Joseph Doherty
e54c4a6c2e feat(ui/auth): use a minimal layout for the login page
The login page previously rendered inside MainLayout, showing the full
nav sidebar and the authenticated-user footer. It now uses a bare
LoginLayout (no nav, no session-expiry watchdog, no dialog host) and
just renders its own centred card.
2026-05-15 12:16:36 -04:00
Joseph Doherty
fc18239b97 fix(ui/auth): stop /login redirect loop when the session is expired
SessionExpiry renders inside MainLayout, which also wraps the login
page. For a user with a still-present auth cookie but an expired
expires_at claim, it redirected /login back to /login indefinitely.
It now skips the redirect when already on the login page.
2026-05-15 12:14:57 -04:00
Joseph Doherty
1d5465f31c fix(deployment): instance delete fully removes the record
Deleting an instance only undeployed it from the site and set the state
to NotDeployed, leaving an orphan record that could never be removed —
the state-transition matrix rejected delete from NotDeployed.

Delete now removes the instance record entirely (deployment history,
snapshot, attribute/alarm overrides, and connection bindings go with
it), and is permitted from any state.
2026-05-15 12:05:13 -04:00
Joseph Doherty
17e24ddd20 fix(site-event-log): record script errors and route queries to the active node
Script execution failures were only written to Serilog, never to the
site event log — SiteRuntime did not reference the SiteEventLogging
project. ScriptExecutionActor now resolves ISiteEventLogger and emits a
'script'/'Error' event on timeout and exception.

The event-log query handler was a per-node actor bound to that node's
local SQLite. A ClusterClient query could land on the standby (which
records no events) and return nothing. The handler is now a cluster
singleton with a proxy, so queries always reach the active node.
2026-05-15 12:04:59 -04:00
Joseph Doherty
80ec16a6d0 feat(ui/auth): redirect to /login when the session times out
Previously a user idling past the 30-minute cookie expiry stayed parked
on a stale page until they tried to navigate. The auth cookie's UTC
expiry is now also stamped onto an expires_at claim at sign-in, and a
SessionExpiry component mounted in MainLayout schedules a delay until
expiry + 2s grace, then force-loads /login — at which point the standard
cookie middleware confirms the session is gone and serves the login page.
2026-05-13 16:13:53 -04:00
Joseph Doherty
3f37584728 feat(ui/topology): open instance in Debug View from context menu
Adds a Debug View item to the instance context menu on /deployment/topology
that navigates to /deployment/debug-view with siteId and instanceId query
parameters; the page now auto-connects when those are present (falling
back to the existing localStorage auto-reconnect otherwise). Disabled for
non-Enabled instances since debug streaming only targets enabled ones.

Also fixes a latent NRE in DebugView.OnInitializedAsync: the toast ref
isn't bound yet during init, so transient load failures are now stashed
and surfaced from OnAfterRenderAsync where the toast is ready.
2026-05-13 13:41:20 -04:00
Joseph Doherty
733679a376 feat(ui/api-keys): grant API method access on edit page
Admins can now check/uncheck which API methods this key is approved to
invoke directly on /admin/api-keys/{id}/edit, instead of having to bounce
through the Design role's API method editor. Membership is diffed against
the initial state and applied by mutating ApprovedApiKeyIds on each
affected ApiMethod in the same SaveChangesAsync.
2026-05-13 13:41:13 -04:00
Joseph Doherty
7044791a55 docs(plans): scrub LmxProxy references from design plans
Remove the LmxProxy work package (WP-8) from phase-3b, the CD-DCL-1..6
protocol details, Q9/Q-P3B-2 from the questions log, the LmxProxy
component-design rows in requirements-traceability, and the inline
mentions across phase-0, phase-4, the gRPC streaming plans, and the
primary/backup data-connection plans.
2026-05-13 13:30:07 -04:00
Joseph Doherty
72e7bbe968 chore: remove deprecated LmxProxy reference implementation
Delete the standalone ZB.MOM.WW.LmxProxy solution and loose adapter
stubs under deprecated/, plus the lone LmxProxy mention in
deprecated/windev.md. The protocol was never wired into the active
codebase and the runtime artifact has been removed from the cluster.
2026-05-13 13:29:58 -04:00