ScadaBridge

Author	SHA1	Message	Date
Joseph Doherty	17e24ddd20	fix(site-event-log): record script errors and route queries to the active node Script execution failures were only written to Serilog, never to the site event log — SiteRuntime did not reference the SiteEventLogging project. ScriptExecutionActor now resolves ISiteEventLogger and emits a 'script'/'Error' event on timeout and exception. The event-log query handler was a per-node actor bound to that node's local SQLite. A ClusterClient query could land on the standby (which records no events) and return nothing. The handler is now a cluster singleton with a proxy, so queries always reach the active node.	2026-05-15 12:04:59 -04:00
Joseph Doherty	1822e3c76f	fix(store-and-forward): wire up parked-message handler and start S&F service on sites The Parked Messages page returned "Parked message handler not available" because no actor was ever registered for ParkedMessages, and Retry/Discard requests had no Receive at all (would have hit deadletters). On top of that, StoreAndForwardService.StartAsync() was never called anywhere, so the sf_messages SQLite table was never created and the retry timer never ran — silently breaking all of S&F. - New ParkedMessageHandlerActor bridges StoreAndForwardService.{Get,Retry,Discard} using the Sender→Task→PipeTo pattern already used in DeploymentManagerActor. - SiteCommunicationActor now routes ParkedMessageRetryRequest and ParkedMessageDiscardRequest the same way as the existing Query handler. - AkkaHostedService.RegisterSiteActors() resolves StoreAndForwardService, calls StartAsync() to create the schema and start the timer, then creates and registers the handler actor.	2026-05-13 07:12:37 -04:00
Joseph Doherty	6f1f6b8467	fix(health): replicate site health reports between central nodes CentralHealthAggregator is a per-node hosted singleton, but site health reports flow through ClusterClient which round-robins each report to one central node only. The other node's aggregator never saw those reports and marked sites offline at the 60s threshold — sites constantly flapped between online and offline on the monitoring page. On receive, the active CentralCommunicationActor now republishes a SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both central nodes subscribe to the topic and process replicas through a dedicated path that updates the local aggregator without re-broadcasting (avoids fan-out loops). The aggregator's existing sequence-number idempotency makes self-delivery a cheap no-op. DistributedPubSubExtensionProvider is now listed in the HOCON `akka.extensions` block so the mediator is initialised at cluster start, eliminating a race where the first Subscribe arrived before the extension was loaded.	2026-05-13 06:20:07 -04:00
Joseph Doherty	65cc7b69cd	feat(health): wire up NodeHostname, ConnectionEndpoint, TagQuality, ParkedMessageCount collectors - AkkaHostedService: SetNodeHostname from NodeOptions - DataConnectionActor: UpdateConnectionEndpoint on state transitions, track per-tag quality counts and UpdateTagQuality on value changes - HealthReportSender: query StoreAndForwardStorage for parked message count - StoreAndForwardStorage: add GetParkedMessageCountAsync()	2026-03-24 16:19:39 -04:00
Joseph Doherty	b3222cf30b	fix(site-runtime): wire EventLogHandlerActor so site event log queries work The SiteCommunicationActor expected an event log handler but none was registered, causing "Event log handler not available" on the Event Logs page and CLI. Bridge IEventLogQueryService to Akka via a simple actor.	2026-03-23 00:37:33 -04:00
Joseph Doherty	801c0c1df2	feat(dcl): add active endpoint to health reports and log failover events Add ActiveEndpoint field to DataConnectionHealthReport showing which endpoint is active (Primary, Backup, or Primary with no backup configured). Log failover transitions and connection restoration events to the site event log via ISiteEventLogger, passed as an optional parameter through the actor hierarchy for backwards compatibility.	2026-03-22 08:34:05 -04:00
Joseph Doherty	416a03b782	feat: complete gRPC streaming channel — site host, docker config, docs, integration tests Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server, add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber, expose gRPC ports in docker-compose, add site seed script, update all 10 requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.	2026-03-21 12:38:33 -04:00
Joseph Doherty	fd2e96fea2	feat: replace debug view polling with real-time SignalR streaming The debug view polled every 2s by re-subscribing for full snapshots. Now a persistent DebugStreamBridgeActor on central subscribes once and receives incremental Akka stream events from the site, forwarding them to the Blazor component via callbacks and to the CLI via a new SignalR hub at /hubs/debug-stream. Adds `debug stream` CLI command with auto-reconnect.	2026-03-21 01:34:53 -04:00
Joseph Doherty	1a540f4f0a	feat: add HTTP Management API, migrate CLI from Akka ClusterClient to HTTP Replace the CLI's Akka.NET ClusterClient transport with a simple HTTP client targeting a new POST /management endpoint on the Central Host. The endpoint handles Basic Auth, LDAP authentication, role resolution, and ManagementActor dispatch in a single round-trip — eliminating the CLI's Akka, LDAP, and Security dependencies. Also fixes DCL ReSubscribeAll losing subscriptions on repeated reconnect by deriving the tag list from _subscriptionsByInstance instead of _subscriptionIds.	2026-03-20 23:55:31 -04:00
Joseph Doherty	eb8ead58d2	feat: wire SQLite replication between site nodes and fix ConfigurationDatabase tests Add SiteReplicationActor (runs on every site node) to replicate deployed configs and store-and-forward buffer operations to the standby peer via cluster member discovery and fire-and-forget Tell. Wire ReplicationService handler and pass replication actor to DeploymentManagerActor singleton. Fix 5 pre-existing ConfigurationDatabase test failures: RowVersion NOT NULL on SQLite, stale migration name assertion, and seed data count mismatch.	2026-03-18 08:28:02 -04:00
Joseph Doherty	899dec6b6f	feat: wire ExternalSystem, Database, and Notify APIs into script runtime IServiceProvider now flows through the actor chain (DeploymentManagerActor → InstanceActor → ScriptActor → ScriptExecutionActor) so scripts can resolve IExternalSystemClient, IDatabaseGateway, and INotificationDeliveryService from DI. ScriptGlobals exposes ExternalSystem, Database, Notify, and Scripts as top-level properties so scripts can use them without the Instance. prefix.	2026-03-18 02:41:18 -04:00
Joseph Doherty	f165ca2774	feat: wire all health metrics and add instance counts to dashboard Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor), alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and S&F buffer depth placeholder. Added instance count tracking (deployed/enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated Health Dashboard UI to show instance counts per site. All metrics flow through the existing health report pipeline via ClusterClient.	2026-03-18 00:57:49 -04:00
Joseph Doherty	75a6636a2c	fix: wire DCL connection state changes into ISiteHealthCollector DataConnectionActor now calls UpdateConnectionHealth() on state transitions (Connecting/Connected/Reconnecting) and UpdateTagResolution() on connection establishment. DataConnectionManagerActor calls RemoveConnection() on actor removal. Health reports now include data connection statuses when instances are deployed with bindings.	2026-03-18 00:20:02 -04:00
Joseph Doherty	4f22ca2b1f	feat: replace ActorSelection with ClusterClient for inter-cluster communication Central and site clusters now communicate via ClusterClient/ClusterClientReceptionist instead of direct ActorSelection. Both CentralCommunicationActor and SiteCommunicationActor are registered with their cluster's receptionist. Central creates one ClusterClient per site using NodeA/NodeB contact points from the DB. Sites configure multiple CentralContactPoints for automatic failover between central nodes. ISiteClientFactory enables test injection.	2026-03-18 00:08:47 -04:00
Joseph Doherty	9e97c1acd2	feat: replace site registration with database-driven site addressing Central now resolves site Akka remoting addresses from the Sites DB table (NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite messages. Eliminates the race condition where sites starting before central had their registration dead-lettered. Addresses are cached in CentralCommunicationActor with 60s periodic refresh and on-demand refresh when sites are added/edited/deleted via UI or CLI.	2026-03-17 23:13:10 -04:00
Joseph Doherty	1942544769	feat: register ManagementActor on Central with ClusterClientReceptionist	2026-03-17 14:49:35 -04:00
Joseph Doherty	dfb809a909	Wire DCL to Instance Actors for OPC UA tag value flow - Add TagValueUpdate/ConnectionQualityChanged handlers to InstanceActor - InstanceActor subscribes to DCL on PreStart based on DataSourceReference - DeploymentManagerActor creates DCL connections on deploy and passes DCL ref - AkkaHostedService creates DCL Manager Actor for tag subscriptions - Move CreateConnectionCommand to Commons for cross-project access - Add ConnectionConfig to FlattenedConfiguration for deployment packaging	2026-03-17 11:21:11 -04:00
Joseph Doherty	4879c4e01e	Fix auth, Bootstrap, Blazor nav, LDAP, and deployment pipeline for working Central UI Bootstrap served locally with absolute paths and <base href="/">. LDAP auth uses search-then-bind with service account for GLAuth compatibility. CookieAuthenticationStateProvider reads HttpContext.User instead of parsing JWT. Login/logout forms opt out of Blazor enhanced nav (data-enhance="false"). Nav links use absolute paths; seed data includes Design/Deployment group mappings. DataConnections page loads all connections (not just site-assigned). Site appsettings configured for Test Plant A; Site registers with Central on startup. DeploymentService resolves string site identifier for Akka routing. Instances page gains Create Instance form.	2026-03-17 10:03:06 -04:00
Joseph Doherty	389f5a0378	Phase 3B: Site I/O & Observability — Communication, DCL, Script/Alarm actors, Health, Event Logging Communication Layer (WP-1–5): - 8 message patterns with correlation IDs, per-pattern timeouts - Central/Site communication actors, transport heartbeat config - Connection failure handling (no central buffering, debug streams killed) Data Connection Layer (WP-6–14, WP-34): - Connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting) - OPC UA + LmxProxy adapters behind IDataConnection - Auto-reconnect, bad quality propagation, transparent re-subscribe - Write-back, tag path resolution with retry, health reporting - Protocol extensibility via DataConnectionFactory Site Runtime (WP-15–25, WP-32–33): - ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher) - AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state) - SharedScriptLibrary (inline execution), ScriptRuntimeContext (API) - ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout) - Recursion limit (default 10), call direction enforcement - SiteStreamManager (per-subscriber bounded buffers, fire-and-forget) - Debug view backend (snapshot + stream), concurrency serialization - Local artifact storage (4 SQLite tables) Health Monitoring (WP-26–28): - SiteHealthCollector (thread-safe counters, connection state) - HealthReportSender (30s interval, monotonic sequence numbers) - CentralHealthAggregator (offline detection 60s, online recovery) Site Event Logging (WP-29–31): - SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC) - EventLogPurgeService (30-day retention, 1GB cap) - EventLogQueryService (filters, keyword search, keyset pagination) 541 tests pass, zero warnings.	2026-03-16 20:57:25 -04:00
Joseph Doherty	e9e6165914	Phase 3A: Site runtime foundation — Akka cluster, SQLite persistence, Deployment Manager singleton, Instance Actor - WP-1: Site cluster config (keep-oldest SBR, down-if-alone, 2s/10s failure detection) - WP-2: Site-role host bootstrap (no Kestrel, SQLite paths) - WP-3: SiteStorageService with deployed_configurations + static_attribute_overrides tables - WP-4: DeploymentManagerActor as cluster singleton with staggered Instance Actor creation, OneForOneStrategy/Resume supervision, deploy/disable/enable/delete lifecycle - WP-5: InstanceActor with attribute state, GetAttribute/SetAttribute, SQLite override persistence - WP-6: CoordinatedShutdown verified for graceful singleton handover - WP-7: Dual-node recovery (both seed nodes, min-nr-of-members=1) - WP-8: 31 tests (storage CRUD, actor lifecycle, supervision, negative checks) 389 total tests pass, zero warnings.	2026-03-16 20:34:56 -04:00
Joseph Doherty	d38356efdb	Phase 1 WP-11–22: Host infrastructure, Blazor Server UI, and integration tests Host infrastructure (WP-11–17): - StartupValidator with 19 validation rules - /health/ready endpoint with DB + Akka health checks - Akka.NET bootstrap via AkkaHostedService (HOCON config, cluster, remoting, SBR) - Serilog with SiteId/NodeHostname/NodeRole enrichment - DeadLetterMonitorActor with count tracking - CoordinatedShutdown wiring (no Environment.Exit) - Windows Service support (UseWindowsService) Central UI (WP-18–21): - Blazor Server shell with Bootstrap 5, role-aware NavMenu - Login/logout flow (LDAP auth → JWT → HTTP-only cookie) - CookieAuthenticationStateProvider with idle timeout - LDAP group mapping CRUD page (Admin role) - Route guards with Authorize attributes per role - SignalR reconnection overlay for failover Integration tests (WP-22): - Startup validation, auth flow, audit transactions, readiness gating 186 tests pass (1 skipped: LDAP integration), zero warnings.	2026-03-16 19:50:59 -04:00

21 Commits