161 Commits

Author SHA1 Message Date
Joseph Doherty
746ab90444 fix(cluster-infrastructure): resolve ClusterInfrastructure-005,007,008 — confirm config-section constant, XML docs, phase-status cleanup 2026-05-16 22:04:21 -04:00
Joseph Doherty
d7b275fc9b fix(central-ui): resolve CentralUI-015..019 — pager windowing, logout CSRF, narrowed catch blocks, coverage; CentralUI-015 re-triaged Won't Fix 2026-05-16 22:04:21 -04:00
Joseph Doherty
404216b4ee fix(cli): resolve CLI-008..013 — format validation, exit-code semantics, debug-stream cancellation/disposal, test coverage 2026-05-16 22:04:21 -04:00
Joseph Doherty
804697f873 fix(template-engine): resolve TemplateEngine-006..010 — code-region-aware API/brace scanning, composed-alarm override validation, N+1 fix, doc correction 2026-05-16 21:44:11 -04:00
Joseph Doherty
5672502d83 fix(store-and-forward): resolve StoreAndForward-004,005,010,013 — accurate handler-contract doc, conditional sweep writes, reset LastAttemptAt on parked retry, test coverage 2026-05-16 21:44:10 -04:00
Joseph Doherty
a88bec9376 fix(site-runtime): resolve SiteRuntime-004..011 — deploy-after-persist, remove reflection, deterministic IDs, non-blocking startup, dedicated script scheduler, config-change detection, semantic trust-model check 2026-05-16 21:44:10 -04:00
Joseph Doherty
24a4a2d165 fix(site-event-logging): resolve SiteEventLogging-005,007,008,010 — background async writer, drop concrete downcast, surface write failures, test coverage 2026-05-16 21:44:10 -04:00
Joseph Doherty
632d44f38c fix(host,deployment-manager,communication): repair cross-module DI regressions from batch 1-2
- DeploymentManager-008: revert IConfiguration overload (violated OptionsTests
  component-convention); Host now binds the ScadaLink:DeploymentManager section
- SiteStreamGrpcServer: make test-only int ctor internal so DI sees one public
  ctor (resolves ambiguous-constructor failure in SiteCompositionRootTests)
- Host site composition-root test config: supply Cluster:SeedNodes for the new
  ClusterOptionsValidator
2026-05-16 21:28:50 -04:00
Joseph Doherty
30ebbdd183 fix(security): resolve Security-004..007 — configurable user-id attribute, DN escaping, JWT issuer/audience validation, idle-timeout preservation 2026-05-16 21:22:01 -04:00
Joseph Doherty
a702cb96a8 fix(notification-service): resolve NotificationService-005..009 — explicit TLS modes, per-credential token cache, timeout/throttle, address validation, credential redaction 2026-05-16 21:22:01 -04:00
Joseph Doherty
57679d49f2 fix(management-service): resolve ManagementService-004,006,007,013 — PipeTo dispatch, JsonDocument disposal, unified serialization, endpoint tests; re-triage MS-009 2026-05-16 21:22:01 -04:00
Joseph Doherty
da955042aa fix(inbound-api): resolve InboundAPI-002,004,006,008 — disconnect vs timeout, body size limit, active-node gate; surface InboundAPI-007 2026-05-16 21:22:01 -04:00
Joseph Doherty
6563511b5f fix(host): resolve Host-003,004 — replace plaintext secrets with env placeholders, validate site seed-node ports; re-triage Host-002 2026-05-16 21:22:01 -04:00
Joseph Doherty
9f634e37c3 fix(health-monitoring): resolve HealthMonitoring-003..009 — central offline grace, register unknown-site heartbeats, test coverage 2026-05-16 21:11:24 -04:00
Joseph Doherty
2502e4d10a fix(external-system-gateway): resolve ExternalSystemGateway-004..010 — honour retry settings, dispose HTTP messages, fix URL building, truncate error bodies, fix connection leak 2026-05-16 21:11:24 -04:00
Joseph Doherty
8c67ffad2a fix(deployment-manager): resolve DeploymentManager-003..011 — atomic status commit, orphan-delete handling, semaphore reclamation, structured diff, options binding, lifecycle test coverage 2026-05-16 21:11:24 -04:00
Joseph Doherty
c9b236e507 fix(data-connection): resolve DataConnectionLayer-006..012 — quality-counter reconciliation, per-tag batch reads, configurable failover threshold, dedup retry, stale-callback guard, secure cert default 2026-05-16 21:11:24 -04:00
Joseph Doherty
0c82ffcbe6 fix(configuration-database): resolve ConfigurationDatabase-002..007 — remove hardcoded sa creds, fail-fast no-arg DI, encrypt secret columns, resilient audit serialization 2026-05-16 21:11:24 -04:00
Joseph Doherty
31a6995d24 fix(communication): resolve Communication-004..008 — Resume supervision, gRPC option wiring, address-load logging, sync dispose, flap detection 2026-05-16 20:58:03 -04:00
Joseph Doherty
3e7a3d7e31 fix(commons): resolve Commons-001..004 — stale-fire race, JsonDocument lifetime, GetNullable strictness, registry symmetry 2026-05-16 20:58:03 -04:00
Joseph Doherty
dba1a1b25f fix(cluster-infrastructure): resolve ClusterInfrastructure-002..006 — options validation, DI registration, down-if-alone 2026-05-16 20:58:03 -04:00
Joseph Doherty
71b90ba499 fix(central-ui): resolve CentralUI-007..014 — nav authz, UTC date filters, disposal guards, N+1 fix, async script analysis 2026-05-16 20:58:03 -04:00
Joseph Doherty
738e67acc5 fix(cli): resolve CLI-002..007 — robust response rendering, URL/JSON arg validation, credential env-vars, doc refresh 2026-05-16 20:58:03 -04:00
Joseph Doherty
305b42ea6d feat(template-engine): resolve TemplateEngine-002 — per-slot alarm override for derived templates
Adds IsInherited/LockedInDerived to the TemplateAlarm entity (mirroring the
attribute/script override model), an EF migration, base-alarm copy-on-derive,
inherited-alarm flattening skip, and LockedInDerived override-rejection validation.
2026-05-16 20:12:24 -04:00
Joseph Doherty
bc548e1447 feat(deployment-manager): resolve DeploymentManager-006 — query site deployment state before redeploy and reconcile
Adds DeploymentStateQuery request/response contracts (Commons), a site-side
handler (SiteRuntime), a CommunicationService query method (Communication), and
reconciliation in DeploymentService: when a prior record is InProgress or
Failed-on-timeout, query the site; if it already holds the target revision hash
mark the record Success without re-sending; on query failure fall through to a
normal deploy (site-side stale-rejection is the safety net).
2026-05-16 20:12:24 -04:00
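The reconciliation flow described in bc548e1447 can be sketched roughly as below. This is an illustrative Python outline, not the project's C# code: `DeploymentRecord`, `redeploy`, `query_site_revision` and `send_deploy` are hypothetical names, and the status strings are assumed from the commit message.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRecord:
    status: str              # e.g. "InProgress", "Failed", "Success" (assumed values)
    timed_out: bool = False  # True when a failure was caused by a timeout


def redeploy(prior, target_hash, query_site_revision, send_deploy):
    """Return 'reconciled' if the site already holds target_hash, else 'deployed'."""
    needs_check = prior is not None and (
        prior.status == "InProgress"
        or (prior.status == "Failed" and prior.timed_out)
    )
    if needs_check:
        try:
            site_hash = query_site_revision()
        except Exception:
            # Query failed: fall through to a normal deploy.
            site_hash = None
        if site_hash is not None and site_hash == target_hash:
            # The site already has the target revision, so nothing is re-sent.
            prior.status = "Success"
            return "reconciled"
    send_deploy(target_hash)
    return "deployed"
```

In this sketch a failed query is deliberately treated the same as a mismatched hash; per the commit message, the site-side stale-rejection check is the safety net for that path.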
Joseph Doherty
74aae53500 fix(template-engine): resolve TemplateEngine-001/003/004/005, re-triage 002 — recursive composed flattening, fixed-field guard, alarm script refs, dead collision query 2026-05-16 19:57:28 -04:00
Joseph Doherty
71c0564ec0 fix(store-and-forward): resolve StoreAndForward-003, re-triage 002 — fix retry-count off-by-one 2026-05-16 19:57:28 -04:00
Joseph Doherty
09b4bd5dfa fix(site-runtime): resolve SiteRuntime-001/002/003 — route data-sourced writes to DCL, real per-attribute API results, race-free redeploy 2026-05-16 19:57:28 -04:00
Joseph Doherty
0529cf2d40 fix(site-event-logging): resolve SiteEventLogging-001/002/003, re-triage 004 — incremental auto_vacuum, cap-purge guard, write-lock connection access 2026-05-16 19:47:51 -04:00
Joseph Doherty
0d9363766d fix(security): resolve Security-001/002/003 — reachable StartTLS path, Secure cookie, JWT signing key validation 2026-05-16 19:47:17 -04:00
Joseph Doherty
393172f169 fix(notification-service): resolve NotificationService-002/003/004 — error classification by SMTP status code, single SMTP client 2026-05-16 19:47:17 -04:00
Joseph Doherty
b249ca3bf7 fix(management-service): resolve ManagementService-001/002/003 — enforce site scope on query/snapshot handlers and DebugStreamHub 2026-05-16 19:47:17 -04:00
Joseph Doherty
6f4efdfa2e fix(inbound-api): resolve InboundAPI-001/003/005 — concurrent handler cache, constant-time API key compare, script trust-model enforcement 2026-05-16 19:47:17 -04:00
Joseph Doherty
a0e6a36e79 fix(host): resolve Host-001 — exclude leader-only active-node check from /health/ready 2026-05-16 19:40:40 -04:00
Joseph Doherty
7d7214a4ca fix(health-monitoring): resolve HealthMonitoring-001/002 — populate S&F buffer depth, make SiteHealthState immutable 2026-05-16 19:40:40 -04:00
Joseph Doherty
340a70f0e6 fix(external-system-gateway): resolve ExternalSystemGateway-002/003 — apply HTTP call timeout, confirm CachedCall no double-dispatch 2026-05-16 19:40:40 -04:00
Joseph Doherty
ab098bf6c8 fix(deployment-manager): resolve DeploymentManager-001/002 — broaden failure catch, persist failure status with non-cancellable token 2026-05-16 19:40:40 -04:00
Joseph Doherty
fccd3274d3 fix(data-connection-layer): resolve DataConnectionLayer-002/003/004/005 — Resume supervision, concurrent dicts, subscribe-failure classification, write timeout 2026-05-16 19:40:40 -04:00
Joseph Doherty
9043f0089b fix(configuration-database): resolve ConfigurationDatabase-001 — remove dead child-template query in GetTemplateWithChildrenAsync 2026-05-16 19:33:09 -04:00
Joseph Doherty
301e7fb854 fix(communication): resolve Communication-002/003 — gRPC reconnect stream cleanup and subscription map safety 2026-05-16 19:33:09 -04:00
Joseph Doherty
87f14c190a fix(central-ui): resolve CentralUI-002/003/004 — site-scope enforcement, per-circuit console capture, cached auth state 2026-05-16 19:33:09 -04:00
Joseph Doherty
5a08b04535 fix(cli): resolve CLI-001 — honor SCADALINK_FORMAT and config-file format precedence 2026-05-16 19:33:09 -04:00
Joseph Doherty
91438dcc1b fix(store-and-forward): create the SQLite database directory on init (StoreAndForward-014)
StoreAndForwardStorage.InitializeAsync opened a SqliteConnection against the
configured SqliteDbPath (default ./data/store-and-forward.db) without ensuring
the parent directory exists. SQLite creates the database file but not its
directory, so when data/ was absent the connection failed with
"SQLite Error 14: unable to open database file" — aborting the site host's
RegisterSiteActors at StoreAndForwardService.StartAsync.

This was the root cause of the six failing SiteActorPathTests. Production
masked it because the Docker image / deployment creates data/.

InitializeAsync now calls EnsureDatabaseDirectoryExists, which parses the
connection string and creates the parent directory of a file-backed database
(in-memory databases and bare filenames are skipped).

Regression test InitializeAsync_FileInMissingDirectory_CreatesDirectory fails
against the pre-fix code. Host suite now 155/155 green (was 149/155).
2026-05-16 19:13:00 -04:00
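The directory check described in 91438dcc1b can be illustrated with a short Python sketch. It is not the project's C# implementation: the function name is a transliteration, the connection-string parsing is simplified (no quoting support), and treating `:memory:` or `Mode=Memory` as in-memory is an assumption based on common SQLite connection-string conventions.

```python
import os


def ensure_database_directory_exists(connection_string: str) -> None:
    # Parse "Key=Value;Key=Value" pairs (simplified: quoted values are not handled).
    pairs = {}
    for part in connection_string.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            pairs[key.strip().lower()] = value.strip()

    data_source = pairs.get("data source", "")

    # In-memory databases have no file on disk, so there is nothing to create.
    if not data_source or data_source == ":memory:":
        return
    if pairs.get("mode", "").lower() == "memory":
        return

    parent = os.path.dirname(data_source)
    if not parent:
        # Bare filename: it resolves against the working directory, which exists.
        return

    # SQLite creates the database file but not its directory, so create it here.
    os.makedirs(parent, exist_ok=True)
```

Calling this before opening the connection means a path such as `./data/store-and-forward.db` works even when `data/` is absent.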
Joseph Doherty
61253e3269 fix(store-and-forward): resolve S&F delivery + replication wiring (3 Critical findings)
Resolves StoreAndForward-001, ExternalSystemGateway-001, NotificationService-001
— one systemic gap where buffered messages were persisted but never delivered,
and the active node never replicated its buffer to the standby.

Delivery handlers (ExternalSystemGateway-001 / NotificationService-001):
- AkkaHostedService registers delivery handlers for the ExternalSystem,
  CachedDbWrite and Notification categories after StoreAndForwardService starts;
  each resolves its scoped consumer in a fresh DI scope.
- ExternalSystemClient, DatabaseGateway and NotificationDeliveryService each
  gain a DeliverBufferedAsync method: re-resolve the target and re-attempt
  delivery, returning true/false/throwing per the transient-vs-permanent contract.
- EnqueueAsync gains an attemptImmediateDelivery flag; CachedCallAsync and
  NotificationDeliveryService.SendAsync pass false (they already attempted
  delivery themselves) so registering a handler does not dispatch twice.

Replication (StoreAndForward-001):
- ReplicationService is injected into StoreAndForwardService; a new BufferAsync
  helper replicates every enqueue, and successful-retry removes and parks are
  replicated too. Fire-and-forget, no-op when replication is disabled.

Tests: StoreAndForwardReplicationTests (Add/Remove/Park observed),
attemptImmediateDelivery behaviour, and DeliverBufferedAsync paths for each
consumer. Full solution builds; StoreAndForward/ExternalSystemGateway/
NotificationService suites green.
2026-05-16 18:58:11 -04:00
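The double-dispatch guard from 61253e3269 can be sketched as follows. This Python outline is illustrative only: the class and method names are hypothetical, and it assumes a handler returns True when delivery succeeded (the commit's full true/false/throw contract is not modeled here).

```python
class StoreAndForwardBuffer:
    def __init__(self):
        self.handlers = {}  # category -> delivery handler
        self.pending = []   # buffered (category, payload) pairs awaiting retry

    def register_handler(self, category, handler):
        self.handlers[category] = handler

    def enqueue(self, category, payload, attempt_immediate_delivery=True):
        handler = self.handlers.get(category)
        # Callers that already tried delivery themselves pass False, so the
        # registered handler is not invoked a second time for the same message.
        if attempt_immediate_delivery and handler is not None:
            if handler(payload):
                return "delivered"
        self.pending.append((category, payload))
        return "buffered"
```

With the flag set to False the message goes straight to the buffer and is left for the retry sweep, which is the behaviour the commit describes for CachedCallAsync and NotificationDeliveryService.SendAsync.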
Joseph Doherty
a9bd7ee37c fix(central-ui): resolve CentralUI-001 — enforce script trust model before sandbox execution
ScriptAnalysisService.RunInSandboxAsync compiled and executed arbitrary
user C# in the central host process with no trust-model enforcement — the
forbidden-API set was only a Monaco editor diagnostic. A Design-role user
could run System.IO/Process/Reflection/network code on the central node.

Added a Roslyn semantic gate (EnforceTrustModel) invoked after compilation
and before script.RunAsync, and on nested shared scripts in callSharedFunc;
a script referencing any forbidden API is rejected before it runs.

Reworked FindForbiddenApiUsages: it now resolves every identifier against
the semantic model and checks types and members, so a fully-qualified call
(System.IO.File.WriteAllText) is caught — the pre-fix check only inspected
the leftmost identifier and missed that shape. This is a static semantic
gate, not a process sandbox.

Adds gate regression tests that fail against the pre-fix code, plus a
clean-script test guarding against over-blocking.
2026-05-16 18:41:12 -04:00
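The difference between the pre-fix and post-fix checks in a9bd7ee37c can be shown with a toy model. The Python below is not Roslyn and not the project's code: `KNOWN_TYPES` stands in for semantic-model resolution, the forbidden prefixes are an illustrative guess at the forbidden-API set, and every function name is hypothetical.

```python
# Illustrative forbidden set; the real list lives in the project, not here.
FORBIDDEN_PREFIXES = (
    "System.IO",
    "System.Diagnostics.Process",
    "System.Reflection",
    "System.Net",
)

# Stand-in for semantic resolution of simple type names brought in by usings.
KNOWN_TYPES = {
    "File": "System.IO.File",
    "Process": "System.Diagnostics.Process",
}


def leftmost_identifier_check(expression):
    # Pre-fix shape: only the leftmost identifier is inspected.
    return expression.split(".")[0] in KNOWN_TYPES


def resolve(expression):
    # Replace a leading simple type name with its fully qualified name.
    head, _, rest = expression.partition(".")
    full_head = KNOWN_TYPES.get(head, head)
    return full_head + ("." + rest if rest else "")


def resolved_name_check(expression):
    # Post-fix shape: compare the resolved name against forbidden prefixes.
    name = resolve(expression)
    return any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def enforce_trust_model(expressions):
    # Reject the script before it runs if any expression is forbidden.
    violations = [e for e in expressions if resolved_name_check(e)]
    if violations:
        raise PermissionError("forbidden API usage: " + ", ".join(violations))
```

In this model `System.IO.File.WriteAllText` slips past `leftmost_identifier_check` because its leftmost identifier is `System`, while `resolved_name_check` rejects it. As the commit notes, a gate of this kind is static analysis, not a process sandbox.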
Joseph Doherty
a9ceba00d0 fix(communication): resolve Communication-001 — early stream termination handling
DebugStreamService.StartStreamAsync awaited the initial debug snapshot inside
a try whose only handler was catch (OperationCanceledException). When the
stream terminated before the snapshot arrived, onTerminatedWrapper completed
the await with an InvalidOperationException that escaped the catch — the
caller got a raw, untranslated exception and the service did no teardown of
its own on that path.

Replaced with catch (Exception): it removes the session entry, sends
StopDebugStream to the bridge actor via the local reference (deterministic
teardown, idempotent), and throws a descriptive exception — TimeoutException
for the 30s timeout, otherwise an InvalidOperationException naming the
instance/site and wrapping the cause.

Re-triaged Critical -> Medium: the originally-claimed multi-minute site-side
resource leak does not occur (the bridge actor self-terminates on every
onTerminated path). Adds the first DebugStreamService test, which fails
against the pre-fix code.
2026-05-16 18:32:52 -04:00
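The teardown-and-translate pattern from a9ceba00d0 can be sketched in Python with asyncio. This is an analogy rather than the project's C# code: `start_stream`, `sessions` and `stop_stream` are hypothetical names, and `RuntimeError` stands in for InvalidOperationException.

```python
import asyncio


async def start_stream(sessions, session_id, snapshot, stop_stream, timeout=30.0):
    """Await the initial snapshot; on any failure tear down and translate."""
    sessions[session_id] = "active"
    try:
        return await asyncio.wait_for(snapshot, timeout)
    except Exception as exc:
        # Deterministic teardown on every failure path, not only cancellation.
        sessions.pop(session_id, None)
        stop_stream(session_id)
        if isinstance(exc, asyncio.TimeoutError):
            raise TimeoutError(
                f"no debug snapshot for {session_id} within {timeout}s"
            ) from exc
        raise RuntimeError(
            f"debug stream for {session_id} terminated before the snapshot arrived"
        ) from exc
```

The caller sees either a timeout or a descriptive error that wraps the original cause, and the session entry is gone in both cases.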
Joseph Doherty
239bee3bc4 fix(data-connection): resolve DataConnectionLayer-001 — off-thread actor state mutation
HandleSubscribe spawned a Task.Run that mutated DataConnectionActor private
state (_subscriptionIds, _subscriptionsByInstance, _totalSubscribed,
_resolvedTags, _unresolvedTags) from a thread-pool thread, racing the actor's
own message loop — a data race on non-thread-safe Dictionary/HashSet and
non-atomic counters.

Restructured HandleSubscribe to follow the actor's existing PipeTo(Self)
pattern: the background task now performs only adapter I/O and pipes a
SubscribeCompleted message to Self; all subscription-state mutation happens
in the new HandleSubscribeCompleted handler on the actor thread (wired into
the Connected, Connecting and Reconnecting states).

Adds DCL001_ConcurrentSubscribes_DoNotCorruptSubscriptionCounters (30x30
concurrent subscribes) which fails against the pre-fix code and passes after.
2026-05-16 18:26:43 -04:00
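The restructuring in 239bee3bc4 follows a general rule: background work does I/O only and sends its result back as a message, so state is mutated on a single thread. A rough Python analogy using a queue as the mailbox is shown below. It is not Akka.NET, and every name in it is hypothetical.

```python
import queue
import threading


class SubscriptionActor:
    def __init__(self, adapter_subscribe):
        self._mailbox = queue.Queue()
        self._adapter_subscribe = adapter_subscribe
        # State below is touched only by the thread that runs the message loop.
        self.subscription_ids = {}
        self.total_subscribed = 0

    def tell(self, message):
        self._mailbox.put(message)

    def _handle_subscribe(self, tag):
        def work():
            # Background thread: adapter I/O only, then pipe the result to self.
            sub_id = self._adapter_subscribe(tag)
            self.tell(("subscribe_completed", tag, sub_id))

        threading.Thread(target=work).start()

    def _handle_subscribe_completed(self, tag, sub_id):
        # Runs on the loop thread, so no locks or atomic counters are needed.
        self.subscription_ids[tag] = sub_id
        self.total_subscribed += 1

    def run(self, expected_completions):
        # Simplified loop that stops after a known number of completions.
        done = 0
        while done < expected_completions:
            kind, *payload = self._mailbox.get()
            if kind == "subscribe":
                self._handle_subscribe(*payload)
            elif kind == "subscribe_completed":
                self._handle_subscribe_completed(*payload)
                done += 1
```

Many subscribes can be in flight at once, yet the dictionary and the counter are only ever written from `run`, which removes the data race the commit describes.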
Joseph Doherty
9c60592632 build: adopt NuGet Central Package Management
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
2026-05-16 15:56:30 -04:00
Joseph Doherty
fd1518f4f4 test(central-ui): remove vacuous tests for removed analyzer diagnostics
Six tests asserted DoesNotContain(SCADA004/SCADA005) or an empty InlayHints
result — all pass for the wrong reason now that those diagnostics and the
positional InlayHints were removed in the analyzer realignment. They also
used the obsolete top-level CallScript syntax. Removed.
2026-05-16 15:06:30 -04:00
Joseph Doherty
b949dc4183 test(central-ui): realign analyzer tests with the reworked script-call API 2026-05-16 15:04:06 -04:00