Commit Graph

63 Commits

Author SHA1 Message Date
Joseph Doherty
fd2e96fea2 feat: replace debug view polling with real-time SignalR streaming
The debug view polled every 2s by re-subscribing for full snapshots. Now a
persistent DebugStreamBridgeActor on central subscribes once and receives
incremental Akka stream events from the site, forwarding them to the Blazor
component via callbacks and to the CLI via a new SignalR hub at
/hubs/debug-stream. Adds `debug stream` CLI command with auto-reconnect.
2026-03-21 01:34:53 -04:00
Joseph Doherty
0a85a839a2 feat(infra): add Traefik load balancer with active node health check for central cluster failover
Add ActiveNodeHealthCheck that returns 200 only on the Akka.NET cluster
leader, enabling Traefik to route traffic to the active central node and
automatically fail over when the leader changes. Also fixes AkkaClusterHealthCheck
to resolve ActorSystem from AkkaHostedService (was always null via DI).
2026-03-21 00:44:37 -04:00
Joseph Doherty
1a540f4f0a feat: add HTTP Management API, migrate CLI from Akka ClusterClient to HTTP
Replace the CLI's Akka.NET ClusterClient transport with a simple HTTP client
targeting a new POST /management endpoint on the Central Host. The endpoint
handles Basic Auth, LDAP authentication, role resolution, and ManagementActor
dispatch in a single round-trip — eliminating the CLI's Akka, LDAP, and
Security dependencies.

Also fixes DCL ReSubscribeAll losing subscriptions on repeated reconnect by
deriving the tag list from _subscriptionsByInstance instead of _subscriptionIds.
2026-03-20 23:55:31 -04:00
Joseph Doherty
7740a3bcf9 feat: add JoeAppEngine OPC UA nodes, fix DCL auto-reconnect and quality push
- Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime)
- Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads,
  preventing Self.Tell failure in Disconnected event handler
- Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect
- Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]"
- Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync
- Update test_infra_opcua.md with JoeAppEngine documentation
2026-03-19 13:27:54 -04:00
Joseph Doherty
e837eae2cc feat: wire real LmxProxy gRPC client into Data Connection Layer
Replace stub ILmxProxyClient with production proto-generated gRPC client
(RealLmxProxyClient) that connects to LmxProxy servers with x-api-key
metadata header authentication. Includes pre-generated proto stubs for
ARM64 Docker compatibility, updated adapter with proper quality mapping
(Good/Uncertain/Bad), subscription via server-streaming RPC, and 20 unit
tests covering all operations. Updated Component-DataConnectionLayer.md
to reflect the actual implementation.
2026-03-18 11:57:18 -04:00
Joseph Doherty
da683d4fe9 fix: lazy-compile API method scripts and prefix composed alarm trigger attributes
- InboundScriptExecutor lazy-compiles scripts on first request, solving
  the multi-node problem where methods created via CLI/UI were only compiled
  on the ManagementActor's node, not the node handling the HTTP request.
- ManagementActor hot-registers API method scripts on create/update/delete
  for the local node.
- FlatteningService prefixes the "attribute" field in composed alarm trigger
  configs with the composition instance name so alarms evaluate against the
  correct path-qualified attribute (e.g. CoolingTank.Level not Level).
2026-03-18 09:30:12 -04:00
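The lazy compilation described in da683d4fe9 can be illustrated with a small cache that compiles a script on first request rather than at startup. This is a minimal Python sketch, not the project's code: `LazyScriptCache`, `invalidate`, and the `AddOne` script are hypothetical names, and Python's built-in `compile` stands in for Roslyn.

```python
from types import CodeType
from typing import Dict


class LazyScriptCache:
    """Compiles a script the first time it is requested, then reuses the result."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self._sources = sources                    # method name -> script source
        self._compiled: Dict[str, CodeType] = {}
        self.compile_count = 0                     # only here to make caching visible

    def get(self, method_name: str) -> CodeType:
        code = self._compiled.get(method_name)
        if code is None:
            # First request on this node: compile now instead of at startup.
            code = compile(self._sources[method_name], method_name, "eval")
            self._compiled[method_name] = code
            self.compile_count += 1
        return code

    def invalidate(self, method_name: str) -> None:
        # Hypothetical hook for update/delete so stale code is not reused.
        self._compiled.pop(method_name, None)


cache = LazyScriptCache({"AddOne": "x + 1"})
result_first = eval(cache.get("AddOne"), {"x": 41})
result_second = eval(cache.get("AddOne"), {"x": 1})
```

Because each node compiles on demand, it no longer matters which node handled the create or update.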
Joseph Doherty
db387c6613 fix: include recipients in artifact deployment and load shared scripts on startup
NotificationRepository.GetAllNotificationListsAsync() was missing
.Include(Recipients), causing artifact deployments to push empty recipient
lists to sites. Also load shared scripts from SQLite on DeploymentManager
startup so they're available before Instance Actors compile their scripts.
2026-03-18 09:13:10 -04:00
Joseph Doherty
78fbb13df7 feat: wire Inbound API Route.To().Call() to site instance scripts and add Roslyn compilation
Completes the Inbound API → site script call chain by adding RouteToCallRequest
handlers in SiteCommunicationActor and DeploymentManagerActor. Also replaces the
placeholder dispatch table in InboundScriptExecutor with Roslyn compilation of
API method scripts at startup, enabling user-defined inbound API methods to call
instance scripts across the cluster.
2026-03-18 08:43:13 -04:00
Joseph Doherty
eb8ead58d2 feat: wire SQLite replication between site nodes and fix ConfigurationDatabase tests
Add SiteReplicationActor (runs on every site node) to replicate deployed
configs and store-and-forward buffer operations to the standby peer via
cluster member discovery and fire-and-forget Tell. Wire ReplicationService
handler and pass replication actor to DeploymentManagerActor singleton.

Fix 5 pre-existing ConfigurationDatabase test failures: RowVersion NOT NULL
on SQLite, stale migration name assertion, and seed data count mismatch.
2026-03-18 08:28:02 -04:00
Joseph Doherty
f063fb1ca3 fix: wire DCL tag value delivery, alarm evaluation, and snapshot timestamps
Three runtime bugs fixed:
- DataConnectionActor: TagValueReceived/TagResolutionSucceeded/Failed not
  handled in any Become state — OPC UA values went to dead letters. Added
  initial read after subscribe to seed current values immediately.
- AlarmActor: ParseEvalConfig expected "attributeName"/"matchValue"/"min"/
  "max" keys but seed data uses "attribute"/"value"/"high"/"low". Added
  support for both conventions and a "!=" prefix for not-equal matching.
- InstanceActor: snapshots now report all alarms (including unevaluated) with
  correct priorities, and use source timestamps instead of current UTC. Removed
  bogus Vibration template attribute that shadowed Speed's tag mapping.
2026-03-18 07:36:48 -04:00
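The AlarmActor fix in f063fb1ca3 accepts two key conventions and a "!=" prefix. The Python sketch below shows one way such normalization could work. It is an illustration only: the function names are hypothetical, and pairing "low" with "min" and "high" with "max" is an assumption the commit message does not spell out.

```python
import json
from typing import Any, Dict

# Alternate key -> canonical key. The low/min and high/max pairing is assumed.
_KEY_ALIASES = {
    "attribute": "attributeName",
    "value": "matchValue",
    "low": "min",
    "high": "max",
}


def parse_eval_config(config_json: str) -> Dict[str, Any]:
    """Normalize either key convention to one canonical set of keys."""
    raw = json.loads(config_json)
    config = {_KEY_ALIASES.get(key, key): value for key, value in raw.items()}
    match = config.get("matchValue")
    # A leading "!=" turns the comparison into a not-equal match.
    config["negate"] = isinstance(match, str) and match.startswith("!=")
    if config["negate"]:
        config["matchValue"] = match[2:]
    return config


def value_matches(config: Dict[str, Any], current: Any) -> bool:
    """Compare as strings; invert the result when the config was negated."""
    equal = str(current) == str(config["matchValue"])
    return not equal if config["negate"] else equal
```

For example, `{"attribute": "State", "value": "!=Running"}` would match any state other than `Running`.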
Joseph Doherty
9c6e3c2e56 feat: add CLI debug snapshot command for one-shot instance state inspection
Adds `debug snapshot --id <int>` to query a running instance's current
attribute values and alarm states without the subscribe/stream overhead
of the debug view. Routes through ManagementActor → CommunicationService
→ site DeploymentManager → InstanceActor using the existing remote query
pattern.
2026-03-18 07:16:22 -04:00
Joseph Doherty
6ee820b0f0 docs: add Docker/OrbStack CLI connection note to CLI README
2026-03-18 07:04:53 -04:00
Joseph Doherty
899dec6b6f feat: wire ExternalSystem, Database, and Notify APIs into script runtime
IServiceProvider now flows through the actor chain (DeploymentManagerActor
→ InstanceActor → ScriptActor → ScriptExecutionActor) so scripts can
resolve IExternalSystemClient, IDatabaseGateway, and
INotificationDeliveryService from DI. ScriptGlobals exposes ExternalSystem,
Database, Notify, and Scripts as top-level properties so scripts can use
them without the Instance. prefix.
2026-03-18 02:41:18 -04:00
Joseph Doherty
8095c8efbe fix: only active singleton node sends health reports
Both nodes of a site cluster were sending health reports. The standby
node (without the DeploymentManager singleton) reported 0 instances and
no connections, overwriting the active node's data in the aggregator.

Added IsActiveNode flag to ISiteHealthCollector, set by
DeploymentManagerActor on PreStart/PostStop. HealthReportSender skips
sending when the node is not active. Also ensured EnsureDclConnections
is called during startup batch creation so data connections survive
container restarts.
2026-03-18 01:44:57 -04:00
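The gating added in 8095c8efbe amounts to a flag checked before each send. A minimal Python sketch, assuming hypothetical class and method names modeled on the ones in the commit message:

```python
from typing import Callable, Dict


class SiteHealthCollector:
    """Holds node-local health data plus the active-node flag."""

    def __init__(self) -> None:
        self.is_active_node = False   # set by the singleton's start/stop hooks
        self.instance_count = 0

    def collect(self) -> Dict[str, int]:
        return {"instances": self.instance_count}


class HealthReportSender:
    """Sends a report on each tick, but only from the active node."""

    def __init__(self, collector: SiteHealthCollector,
                 send: Callable[[Dict[str, int]], None]) -> None:
        self._collector = collector
        self._send = send

    def tick(self) -> bool:
        if not self._collector.is_active_node:
            # The standby would report 0 instances and overwrite real data.
            return False
        self._send(self._collector.collect())
        return True
```

The standby node keeps collecting locally but never publishes, so the aggregator only ever sees the active node's counts.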
Joseph Doherty
213ca2698a fix: update instance counts after startup batch creation completes
UpdateInstanceCounts() was only called before Instance Actors were
created (in HandleStartupConfigsLoaded), showing 0 enabled on the
health dashboard. Now also called after each batch in
HandleStartNextBatch to reflect actual running actor count.
2026-03-18 01:28:46 -04:00
Joseph Doherty
c63fb1c4a6 feat: achieve CLI parity with Central UI
Add 33 new management message records, ManagementActor handlers, and CLI
commands to close all functionality gaps between the Central UI and the
Management CLI. New capabilities include:

- Template member CRUD (attributes, alarms, scripts, compositions)
- Shared script CRUD
- Database connection definition CRUD
- Inbound API method CRUD
- LDAP scope rule management
- API key enable/disable
- Area update
- Remote event log and parked message queries
- Missing get/update commands for templates, sites, instances, data
  connections, external systems, notifications, and SMTP config

Includes 12 new ManagementActor unit tests covering authorization,
happy-path queries, and error handling. Updates CLI README and component
design documents (Component-CLI.md, Component-ManagementService.md).
2026-03-18 01:21:20 -04:00
Joseph Doherty
b2385709f8 fix: raise health report sender log level to INFO for observability
Changed "Sent health report" from DEBUG to INFO and failure log from
WARNING to ERROR so health report activity is visible in default logging.
2026-03-18 01:08:44 -04:00
Joseph Doherty
f165ca2774 feat: wire all health metrics and add instance counts to dashboard
Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor),
alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and
S&F buffer depth placeholder. Added instance count tracking (deployed/
enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated
Health Dashboard UI to show instance counts per site. All metrics flow
through the existing health report pipeline via ClusterClient.
2026-03-18 00:57:49 -04:00
Joseph Doherty
88b5f6cb54 fix: handle mixed JSON types in data connection config deserialization
DeploymentManagerActor deserialized connection config JSON as
Dictionary<string, string>, which silently failed on non-string values
like {"publishInterval":1000}. The OPC UA adapter then fell back to
localhost:4840 (unreachable in Docker). Now uses JsonDocument to handle
any JSON value type. OPC PLC Simulator connects successfully.
2026-03-18 00:39:01 -04:00
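The failure mode in 88b5f6cb54 comes from assuming every config value is a string. A hedged Python sketch of the tolerant approach follows; `parse_connection_config` is a hypothetical name, and the endpoint URL in the usage note is made up for illustration.

```python
import json
from typing import Dict


def parse_connection_config(config_json: str) -> Dict[str, str]:
    """Flatten a JSON object to string values, whatever the value types are."""
    raw = json.loads(config_json)
    if not isinstance(raw, dict):
        raise ValueError("connection config must be a JSON object")
    result: Dict[str, str] = {}
    for key, value in raw.items():
        if isinstance(value, str):
            result[key] = value
        else:
            # Numbers, booleans, null, arrays and objects keep their JSON text.
            result[key] = json.dumps(value)
    return result
```

With this shape, `{"endpoint": "opc.tcp://opcua:50000", "publishInterval": 1000}` yields `"1000"` for `publishInterval` instead of failing the whole parse.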
Joseph Doherty
68115e7e38 feat: move Areas to Design role, fix logout, add Sign Out button
Areas management is a design concern, not admin. Moved Areas page
authorization from RequireAdmin to RequireDesign, moved nav link from
Admin to Design section, updated ManagementActor role check. Added
GET /logout endpoint (was 404, now redirects to login). Improved Sign
Out button visibility in sidebar next to username.
2026-03-18 00:28:35 -04:00
Joseph Doherty
75a6636a2c fix: wire DCL connection state changes into ISiteHealthCollector
DataConnectionActor now calls UpdateConnectionHealth() on state
transitions (Connecting/Connected/Reconnecting) and UpdateTagResolution()
on connection establishment. DataConnectionManagerActor calls
RemoveConnection() on actor removal. Health reports now include
data connection statuses when instances are deployed with bindings.
2026-03-18 00:20:02 -04:00
Joseph Doherty
4f22ca2b1f feat: replace ActorSelection with ClusterClient for inter-cluster communication
Central and site clusters now communicate via ClusterClient/
ClusterClientReceptionist instead of direct ActorSelection. Both
CentralCommunicationActor and SiteCommunicationActor are registered
with their cluster's receptionist. Central creates one ClusterClient
per site using NodeA/NodeB contact points from the DB. Sites configure
multiple CentralContactPoints for automatic failover between central
nodes. ISiteClientFactory enables test injection.
2026-03-18 00:08:47 -04:00
Joseph Doherty
e5eb871961 fix: wire up health report pipeline between sites and central aggregator
Sites now send SiteHealthReport via AkkaHealthReportTransport →
SiteCommunicationActor → CentralCommunicationActor → CentralHealthAggregator.
Added IHealthReportTransport impl, ISiteIdentityProvider impl, registered
HealthReportSender on site nodes, and added SiteHealthReport handler in
CentralCommunicationActor. Health Dashboard now shows all 3 sites online.
2026-03-17 23:46:17 -04:00
Joseph Doherty
9e97c1acd2 feat: replace site registration with database-driven site addressing
Central now resolves site Akka remoting addresses from the Sites DB table
(NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite
messages. Eliminates the race condition where sites starting before central
had their registration dead-lettered. Addresses are cached in
CentralCommunicationActor with 60s periodic refresh and on-demand refresh
when sites are added/edited/deleted via UI or CLI.
2026-03-17 23:13:10 -04:00
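The address caching described in 9e97c1acd2 can be sketched as a cache with a maximum age plus an explicit refresh. This Python sketch makes two assumptions: all names are hypothetical, and it checks staleness lazily on lookup rather than with the periodic timer the commit describes.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Addresses = Tuple[str, str]  # (node A address, node B address)


class SiteAddressCache:
    """Caches site addresses loaded from a backing store."""

    def __init__(self, load_all: Callable[[], Dict[str, Addresses]],
                 refresh_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_all = load_all
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._addresses: Dict[str, Addresses] = {}
        self._loaded_at: Optional[float] = None

    def refresh(self) -> None:
        # On-demand refresh, e.g. after a site is added, edited or deleted.
        self._addresses = dict(self._load_all())
        self._loaded_at = self._clock()

    def resolve(self, site_id: str) -> Optional[Addresses]:
        stale = (self._loaded_at is None
                 or self._clock() - self._loaded_at >= self._refresh_seconds)
        if stale:
            self.refresh()
        return self._addresses.get(site_id)
```

Because addresses come from the store rather than from a registration message, startup order between central and sites stops mattering.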
Joseph Doherty
775cb8084f feat: data-sourced attributes start with uncertain quality before first DCL value
Attributes bound to data connections now initialize with "Uncertain" quality,
distinguishing "never received a value" from "known good" or "connection lost."
Quality is tracked per attribute and included in GetAttributeResponse.
2026-03-17 18:25:39 -04:00
Joseph Doherty
85d351d729 chore: ignore all logs/ directories at any depth
2026-03-17 18:18:28 -04:00
Joseph Doherty
eea50014de fix: resolve CLI serialization failures and add README
Two Akka.NET deserialization bugs prevented CLI commands from reaching
ManagementActor: IReadOnlyList<string> in AuthenticatedUser serialized as a
compiler-generated internal type unknown to the server, and
ManagementSuccess.Data carried server-side assembly types the CLI couldn't
resolve on receipt. Fixed by using string[] for roles and pre-serializing
response data to JSON in ManagementActor before sending. Adds full CLI
reference documentation covering all 10 command groups.
2026-03-17 18:17:47 -04:00
Joseph Doherty
40f74e4a42 feat: implement all CLI command groups (10 groups, 11 files)
2026-03-17 14:59:08 -04:00
Joseph Doherty
229287cfd2 feat: scaffold CLI project with ClusterClient connection and System.CommandLine
2026-03-17 14:51:43 -04:00
Joseph Doherty
1942544769 feat: register ManagementActor on Central with ClusterClientReceptionist
2026-03-17 14:49:35 -04:00
Joseph Doherty
1dc7d50bce feat: implement ManagementActor with all command handlers and authorization
2026-03-17 14:46:57 -04:00
Joseph Doherty
8068c499bd feat: define management message contracts in Commons (10 command groups)
2026-03-17 14:41:54 -04:00
Joseph Doherty
e9acd2dd34 feat: scaffold ManagementService project and test project
2026-03-17 14:40:39 -04:00
Joseph Doherty
7dcdcc46c7 Replace hardcoded "system" user with actual logged-in user across all UI pages
All 22 occurrences of hardcoded "system" user string replaced with
GetCurrentUserAsync() which reads the Username claim from AuthenticationState.
Affected: Instances.razor (6), Sites.razor (2), Templates.razor (11),
SharedScripts.razor (3).
2026-03-17 14:09:04 -04:00
Joseph Doherty
1ae4d09614 feat: add Deploy Artifacts button to Sites admin page
2026-03-17 13:57:30 -04:00
Joseph Doherty
3b22a8f0da feat: wire site-local repos, remove config DB from Site, update artifact service
- SiteExternalSystemRepository and SiteNotificationRepository registered in Site DI
- Removed AddConfigurationDatabase from Site role in Program.cs
- Removed ConfigurationDb from appsettings.Site.json
- ArtifactDeploymentService collects all 6 artifact types including data connections and SMTP
2026-03-17 13:54:37 -04:00
Joseph Doherty
2f3e0ceecb feat: include data connections and SMTP in artifact deployment
2026-03-17 13:48:52 -04:00
Joseph Doherty
e313eda9fd feat: add SiteNotificationRepository and SMTP storage
2026-03-17 13:42:15 -04:00
Joseph Doherty
0a1de710e8 feat: add SiteExternalSystemRepository backed by SQLite
2026-03-17 13:39:37 -04:00
Joseph Doherty
1b06a4971e Fix OPC UA adapter: pass connection details, certificate stores, endpoint discovery
- DataConnectionActor now stores and passes connection details to adapter ConnectAsync
- DataConnectionManagerActor passes connection details when creating actor
- RealOpcUaClient uses DiscoveryClient for endpoint selection with no-security preference
- Added certificate trust store paths to prevent TrustedIssuerCertificates error
- Sanitize connection names for Akka actor paths (replace spaces)
2026-03-17 12:19:44 -04:00
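The last bullet of 1b06a4971e, sanitizing connection names for actor paths, might look like the Python sketch below. The commit only says spaces are replaced. The hyphen replacement, the collapsing of whitespace runs, and the name `sanitize_actor_name` are all assumptions.

```python
import re


def sanitize_actor_name(connection_name: str) -> str:
    """Replace whitespace so the name is usable as an actor path segment."""
    # Trim the ends, then turn each run of whitespace into a single hyphen.
    return re.sub(r"\s+", "-", connection_name.strip())
```

Under those assumptions, a connection named `OPC PLC Simulator` becomes `OPC-PLC-Simulator`.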
Joseph Doherty
8e1d0816b3 Complete OPC UA data flow: binding UI, flattening connections, real OPC UA client
- Add connection binding UI to Instances page (per-attribute and bulk assign)
- FlatteningService populates Connections dict from bound data connections
- Real OPC UA client using OPC Foundation SDK for live tag subscriptions
- DataConnectionFactory uses RealOpcUaClientFactory by default
- OpcUaDataConnection supports both "endpoint" and "EndpointUrl" config keys
2026-03-17 11:40:39 -04:00
Joseph Doherty
dfb809a909 Wire DCL to Instance Actors for OPC UA tag value flow
- Add TagValueUpdate/ConnectionQualityChanged handlers to InstanceActor
- InstanceActor subscribes to DCL on PreStart based on DataSourceReference
- DeploymentManagerActor creates DCL connections on deploy and passes DCL ref
- AkkaHostedService creates DCL Manager Actor for tag subscriptions
- Move CreateConnectionCommand to Commons for cross-project access
- Add ConnectionConfig to FlattenedConfiguration for deployment packaging
2026-03-17 11:21:11 -04:00
Joseph Doherty
2798b91fe1 Wire up debug view: route subscribe/unsubscribe through DeploymentManagerActor
DeploymentManagerActor now handles SubscribeDebugViewRequest and
UnsubscribeDebugViewRequest by forwarding to the appropriate Instance Actor.
This completes the debug view data flow from Central UI through to the site's
Instance Actor snapshot. Reduced refresh interval to 2s for responsiveness.
2026-03-17 10:55:47 -04:00
Joseph Doherty
60243ad619 Add Deploy/Redeploy button and fix actor replacement on redeployment
Instances page gains Deploy button that triggers flattening pipeline and sends
config to site. Button shows "Redeploy" when instance is stale. Fixed actor name
collision on redeployment by scheduling deferred recreation after Context.Stop.
2026-03-17 10:28:44 -04:00
Joseph Doherty
4879c4e01e Fix auth, Bootstrap, Blazor nav, LDAP, and deployment pipeline for working Central UI
Bootstrap served locally with absolute paths and <base href="/">.
LDAP auth uses search-then-bind with service account for GLAuth compatibility.
CookieAuthenticationStateProvider reads HttpContext.User instead of parsing JWT.
Login/logout forms opt out of Blazor enhanced nav (data-enhance="false").
Nav links use absolute paths; seed data includes Design/Deployment group mappings.
DataConnections page loads all connections (not just site-assigned).
Site appsettings configured for Test Plant A; Site registers with Central on startup.
DeploymentService resolves string site identifier for Akka routing.
Instances page gains Create Instance form.
2026-03-17 10:03:06 -04:00
Joseph Doherty
6fa4c101ab Fix blazor.web.js 404: move App.razor and Routes.razor to Host project
Root cause: App.razor was in CentralUI (Microsoft.NET.Sdk.Razor RCL) but
MapRazorComponents<App>().AddInteractiveServerRenderMode() serves
_framework/blazor.web.js from the host assembly (Microsoft.NET.Sdk.Web).
Per MS docs, the root component must be in the server (host) project.

- Move App.razor and Routes.razor from CentralUI to Host/Components/
- Add Host/_Imports.razor for Razor component usings
- Make MapCentralUI generic: MapCentralUI<TApp>() accepts root component
- Add AdditionalAssemblies to discover CentralUI's pages/layouts
- Routes.razor references both Host and CentralUI assemblies
- Fix all DI registrations (TemplateEngine services)
- SCADALINK_CONFIG env var for role-specific config loading

Verified: blazor.web.js HTTP 200 (200KB), login page renders with Blazor
interactive server mode, SignalR circuit establishes, LDAP auth works.
2026-03-17 04:01:12 -04:00
Joseph Doherty
0b10747bd2 Fix Central launch profile: auth middleware, cookie auth, antiforgery, static files
- Add UseAuthentication/UseAuthorization/UseAntiforgery/UseStaticFiles middleware
- Register ASP.NET Core cookie authentication scheme in AddSecurity()
- Update auth endpoints to use SignInAsync/SignOutAsync (proper cookie auth)
- Add [AllowAnonymous] to login page
- Create wwwroot for static file serving
- Regenerate clean EF migration after model changes

Verified with launch profile "ScadaLink Central":
- Host starts, connects to SQL Server, applies EF migrations
- Akka.NET cluster forms (remoting on 8081, node joins self as leader)
- /health/ready returns Healthy (DB + Akka checks)
- LDAP auth works (admin/password via GLAuth → 302 + auth cookie set)
- Login page renders (HTTP 200)
- Unauthenticated requests redirect to /login
2026-03-17 03:43:11 -04:00
Joseph Doherty
121983fd66 Add Rider launch profiles, fix DI and migrations for dev startup
- Rider launch profiles: "ScadaLink Central" and "ScadaLink Site"
- appsettings.Central.json: correct test_infra credentials (ScadaLink_Dev1#,
  scadalink_app user, GLAuth on 3893, Mailpit on 1025)
- Fix HealthMonitoring DI: split site vs central registration to avoid
  missing IHealthReportTransport on central
- Regenerate single clean EF migration (InitialSchema) covering all entities
- Suppress PendingModelChangesWarning in dev mode
- Fix isDevelopment check for ASPNETCORE_ENVIRONMENT propagation

Verified: Host starts, connects to SQL Server, applies migrations, boots
Akka.NET cluster, LDAP auth works (admin/password via GLAuth), health
endpoint returns Healthy.
2026-03-17 03:01:21 -04:00
Joseph Doherty
2ae807df37 Phase 7: Integration surfaces — Inbound API, External System Gateway, Notification Service
Inbound API (WP-1–5):
- POST /api/{methodName} with X-API-Key auth (401/403)
- Parameter validation with extended type system (Object, List)
- Central script execution with configurable timeout
- Route.To() cross-site calls (Call, GetAttribute/SetAttribute batch)
- Failures-only logging

External System Gateway (WP-6–10):
- HTTP/REST client with JSON, API Key + Basic Auth
- Dual call modes: Call() synchronous, CachedCall() with S&F
- Error classification (transient: 5xx/408/429, permanent: other 4xx)
- Database.Connection() (ADO.NET pooling) + Database.CachedWrite() (S&F)

Notification Service (WP-11–13):
- SMTP with OAuth2 Client Credentials + Basic Auth
- BCC delivery, plain text, token lifecycle
- Transient → S&F, permanent → returned to script
- ScriptRuntimeContext wired with ExternalSystem/Database/Notify APIs

Repository implementations: ExternalSystem, Notification, InboundApi, InstanceLocator
781 tests pass, zero warnings.
2026-03-16 22:19:12 -04:00
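The External System Gateway error classification listed in 2ae807df37 can be written as a small status-code check. This is an illustrative Python sketch. The function name and the "success" and "other" labels for non-error statuses are assumptions.

```python
def classify_http_status(status: int) -> str:
    """Classify an HTTP status as transient, permanent, success or other."""
    # 408 and 429 are 4xx codes but are treated as retryable, like 5xx.
    if status in (408, 429) or 500 <= status <= 599:
        return "transient"
    if 400 <= status <= 499:
        return "permanent"
    return "success" if 200 <= status <= 299 else "other"
```

The order of the checks matters: 408 and 429 have to be tested before the general 4xx range, or they would be classified as permanent.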
Joseph Doherty
b659978764 Phase 8: Production readiness — failover tests, security hardening, sandboxing, deployment docs
- WP-1-3: Central/site failover + dual-node recovery tests (17 tests)
- WP-4: Performance testing framework for target scale (7 tests)
- WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests)
- WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs)
- WP-7: Recovery drill test scaffolds (5 tests)
- WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests)
- WP-9: Message contract compatibility (forward/backward compat) (18 tests)
- WP-10: Deployment packaging (installation guide, production checklist, topology)
- WP-11: Operational runbooks (failover, troubleshooting, maintenance)
92 new tests, all passing. Zero warnings.
2026-03-16 22:12:31 -04:00