ScadaBridge

Author	SHA1	Message	Date
Joseph Doherty	8c78913503	fix(communication): correct audit-ingest timeout-path docs and add timeout test	2026-05-21 03:29:54 -04:00
Joseph Doherty	6d073046c6	feat(communication): route audit ingest commands through CentralCommunicationActor	2026-05-21 03:23:30 -04:00
Joseph Doherty	2ff62a2ceb	feat(notification-outbox): route NotificationSubmit to the outbox actor	2026-05-19 02:38:04 -04:00
Joseph Doherty	0b4c1563aa	fix(communication): resolve Communication-009,010,011 — atomic site-cache refresh, XML doc correction, test coverage	2026-05-16 22:04:21 -04:00
Joseph Doherty	31a6995d24	fix(communication): resolve Communication-004..008 — Resume supervision, gRPC option wiring, address-load logging, sync dispose, flap detection	2026-05-16 20:58:03 -04:00
Joseph Doherty	f66dc031a4	fix(health): route site heartbeats into the aggregator. CentralCommunicationActor.HandleHeartbeat was forwarding each incoming HeartbeatMessage to Context.Parent, which resolves to the /user guardian, a non-actor. Every site heartbeat went straight to dead letters (~1026 per central node per 30 minutes at the default ~2s interval across three sites). The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which bumps LastReportReceivedAt on already-known sites (and clears IsOnline if it had flipped) without touching LatestReport. Heartbeats from unregistered sites are dropped; first registration still happens on the first full report. CentralCommunicationActor calls this in place of the no-op Tell. The result: heartbeats now serve their stated health-monitoring purpose (per CLAUDE.md) by keeping a site marked online between the 30s full reports if a single report is briefly delayed, and the dead letter noise disappears entirely.	2026-05-13 08:11:43 -04:00
Joseph Doherty	6f1f6b8467	fix(health): replicate site health reports between central nodes. CentralHealthAggregator is a per-node hosted singleton, but site health reports flow through ClusterClient which round-robins each report to one central node only. The other node's aggregator never saw those reports and marked sites offline at the 60s threshold, so sites constantly flapped between online and offline on the monitoring page. On receive, the active CentralCommunicationActor now republishes a SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both central nodes subscribe to the topic and process replicas through a dedicated path that updates the local aggregator without re-broadcasting (avoids fan-out loops). The aggregator's existing sequence-number idempotency makes self-delivery a cheap no-op. DistributedPubSubExtensionProvider is now listed in the HOCON `akka.extensions` block so the mediator is initialised at cluster start, eliminating a race where the first Subscribe arrived before the extension was loaded.	2026-05-13 06:20:07 -04:00
Joseph Doherty	49f042a937	refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC	2026-03-21 12:18:52 -04:00
Joseph Doherty	3efec91386	fix: route debug stream events through ClusterClient site→central path. ClusterClient Sender refs are temporary proxies, valid for immediate reply but not durable for future Tells. Events now flow as DebugStreamEvent through SiteCommunicationActor → ClusterClient → CentralCommunicationActor → bridge actor (same pattern as health reports). Also fix DebugStreamHub to use IHubContext for long-lived callbacks instead of transient hub instance.	2026-03-21 11:32:17 -04:00
Joseph Doherty	4f22ca2b1f	feat: replace ActorSelection with ClusterClient for inter-cluster communication. Central and site clusters now communicate via ClusterClient/ClusterClientReceptionist instead of direct ActorSelection. Both CentralCommunicationActor and SiteCommunicationActor are registered with their cluster's receptionist. Central creates one ClusterClient per site using NodeA/NodeB contact points from the DB. Sites configure multiple CentralContactPoints for automatic failover between central nodes. ISiteClientFactory enables test injection.	2026-03-18 00:08:47 -04:00
Joseph Doherty	e5eb871961	fix: wire up health report pipeline between sites and central aggregator. Sites now send SiteHealthReport via AkkaHealthReportTransport → SiteCommunicationActor → CentralCommunicationActor → CentralHealthAggregator. Added IHealthReportTransport impl, ISiteIdentityProvider impl, registered HealthReportSender on site nodes, and added SiteHealthReport handler in CentralCommunicationActor. Health Dashboard now shows all 3 sites online.	2026-03-17 23:46:17 -04:00
Joseph Doherty	9e97c1acd2	feat: replace site registration with database-driven site addressing. Central now resolves site Akka remoting addresses from the Sites DB table (NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite messages. Eliminates the race condition where sites starting before central had their registration dead-lettered. Addresses are cached in CentralCommunicationActor with 60s periodic refresh and on-demand refresh when sites are added/edited/deleted via UI or CLI.	2026-03-17 23:13:10 -04:00
Joseph Doherty	389f5a0378	Phase 3B: Site I/O & Observability (Communication, DCL, Script/Alarm actors, Health, Event Logging). Communication Layer (WP-1–5): 8 message patterns with correlation IDs, per-pattern timeouts; Central/Site communication actors, transport heartbeat config; connection failure handling (no central buffering, debug streams killed). Data Connection Layer (WP-6–14, WP-34): connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting); OPC UA + LmxProxy adapters behind IDataConnection; auto-reconnect, bad quality propagation, transparent re-subscribe; write-back, tag path resolution with retry, health reporting; protocol extensibility via DataConnectionFactory. Site Runtime (WP-15–25, WP-32–33): ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher); AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state); SharedScriptLibrary (inline execution), ScriptRuntimeContext (API); ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout); recursion limit (default 10), call direction enforcement; SiteStreamManager (per-subscriber bounded buffers, fire-and-forget); debug view backend (snapshot + stream), concurrency serialization; local artifact storage (4 SQLite tables). Health Monitoring (WP-26–28): SiteHealthCollector (thread-safe counters, connection state); HealthReportSender (30s interval, monotonic sequence numbers); CentralHealthAggregator (offline detection 60s, online recovery). Site Event Logging (WP-29–31): SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC); EventLogPurgeService (30-day retention, 1GB cap); EventLogQueryService (filters, keyword search, keyset pagination). 541 tests pass, zero warnings.	2026-03-16 20:57:25 -04:00

13 Commits
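
The health-aggregation behaviour described across commits f66dc031a4, 6f1f6b8467 and 389f5a0378 (sequence-number idempotency, heartbeats that refresh only the last-received timestamp, offline detection at 60s) can be illustrated with a minimal sketch. This is not the project's code: the repository is not shown here, so the Python class, method and field names below are hypothetical stand-ins, and the reading that a heartbeat restores the online flag for a known site is an assumption based on the commit text.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Taken from the log: sites are marked offline after 60s without a report.
OFFLINE_THRESHOLD_SECONDS = 60.0


@dataclass
class SiteState:
    """Hypothetical per-site record kept by the aggregator."""
    last_sequence: int
    latest_report: Any
    last_report_received_at: float
    is_online: bool = True


class HealthAggregatorSketch:
    """Illustrative stand-in for the aggregator, not the real class."""

    def __init__(self) -> None:
        self.sites: Dict[str, SiteState] = {}

    def process_report(self, site_id: str, sequence: int,
                       report: Any, received_at: float) -> bool:
        """Apply a full report; return False if it was a duplicate."""
        state = self.sites.get(site_id)
        if state is None:
            # First full report registers the site.
            self.sites[site_id] = SiteState(sequence, report, received_at)
            return True
        if sequence <= state.last_sequence:
            # Already seen (e.g. a replica delivered back to its sender).
            return False
        state.last_sequence = sequence
        state.latest_report = report
        state.last_report_received_at = received_at
        state.is_online = True
        return True

    def mark_heartbeat(self, site_id: str, received_at: float) -> bool:
        """Refresh the timestamp of a known site; drop unknown sites."""
        state = self.sites.get(site_id)
        if state is None:
            return False  # unregistered site: heartbeat is ignored
        state.last_report_received_at = received_at
        state.is_online = True  # assumption: heartbeat restores online
        return True  # latest_report is deliberately left untouched

    def check_offline(self, now: float) -> None:
        """Mark sites offline once the threshold has elapsed."""
        for state in self.sites.values():
            if now - state.last_report_received_at > OFFLINE_THRESHOLD_SECONDS:
                state.is_online = False
```

In this sketch a duplicate sequence number is a no-op, which is why replicating reports to both central nodes does not need extra deduplication. A real implementation would also need to decide how to handle a sequence counter that restarts when a site restarts; the sketch ignores that case.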