scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	2ff62a2ceb	feat(notification-outbox): route NotificationSubmit to the outbox actor	2026-05-19 02:38:04 -04:00
Joseph Doherty	0b4c1563aa	fix(communication): resolve Communication-009,010,011 — atomic site-cache refresh, XML doc correction, test coverage	2026-05-16 22:04:21 -04:00
Joseph Doherty	31a6995d24	fix(communication): resolve Communication-004..008 — Resume supervision, gRPC option wiring, address-load logging, sync dispose, flap detection	2026-05-16 20:58:03 -04:00
Joseph Doherty	f66dc031a4	fix(health): route site heartbeats into the aggregator CentralCommunicationActor.HandleHeartbeat was forwarding each incoming HeartbeatMessage to Context.Parent, which resolves to the /user guardian — a non-actor. Every site heartbeat went straight to dead letters (~1026 per central node per 30 minutes at the default ~2s interval across three sites). The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which bumps LastReportReceivedAt on already-known sites (and clears IsOnline if it had flipped) without touching LatestReport. Heartbeats from unregistered sites are dropped — first registration still happens on the first full report. CentralCommunicationActor calls this in place of the no-op Tell. The result: heartbeats now serve their stated health-monitoring purpose (per CLAUDE.md) by keeping a site marked online between the 30s full reports if a single report is briefly delayed, and the dead letter noise disappears entirely.	2026-05-13 08:11:43 -04:00
Joseph Doherty	4f22ca2b1f	feat: replace ActorSelection with ClusterClient for inter-cluster communication Central and site clusters now communicate via ClusterClient/ ClusterClientReceptionist instead of direct ActorSelection. Both CentralCommunicationActor and SiteCommunicationActor are registered with their cluster's receptionist. Central creates one ClusterClient per site using NodeA/NodeB contact points from the DB. Sites configure multiple CentralContactPoints for automatic failover between central nodes. ISiteClientFactory enables test injection.	2026-03-18 00:08:47 -04:00
Joseph Doherty	9e97c1acd2	feat: replace site registration with database-driven site addressing Central now resolves site Akka remoting addresses from the Sites DB table (NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite messages. Eliminates the race condition where sites starting before central had their registration dead-lettered. Addresses are cached in CentralCommunicationActor with 60s periodic refresh and on-demand refresh when sites are added/edited/deleted via UI or CLI.	2026-03-17 23:13:10 -04:00
Joseph Doherty	389f5a0378	Phase 3B: Site I/O & Observability — Communication, DCL, Script/Alarm actors, Health, Event Logging Communication Layer (WP-1–5): - 8 message patterns with correlation IDs, per-pattern timeouts - Central/Site communication actors, transport heartbeat config - Connection failure handling (no central buffering, debug streams killed) Data Connection Layer (WP-6–14, WP-34): - Connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting) - OPC UA + LmxProxy adapters behind IDataConnection - Auto-reconnect, bad quality propagation, transparent re-subscribe - Write-back, tag path resolution with retry, health reporting - Protocol extensibility via DataConnectionFactory Site Runtime (WP-15–25, WP-32–33): - ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher) - AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state) - SharedScriptLibrary (inline execution), ScriptRuntimeContext (API) - ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout) - Recursion limit (default 10), call direction enforcement - SiteStreamManager (per-subscriber bounded buffers, fire-and-forget) - Debug view backend (snapshot + stream), concurrency serialization - Local artifact storage (4 SQLite tables) Health Monitoring (WP-26–28): - SiteHealthCollector (thread-safe counters, connection state) - HealthReportSender (30s interval, monotonic sequence numbers) - CentralHealthAggregator (offline detection 60s, online recovery) Site Event Logging (WP-29–31): - SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC) - EventLogPurgeService (30-day retention, 1GB cap) - EventLogQueryService (filters, keyword search, keyset pagination) 541 tests pass, zero warnings.	2026-03-16 20:57:25 -04:00

7 Commits