scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	0b4c1563aa	fix(communication): resolve Communication-009,010,011 — atomic site-cache refresh, XML doc correction, test coverage	2026-05-16 22:04:21 -04:00
Joseph Doherty	632d44f38c	fix(host,deployment-manager,communication): repair cross-module DI regressions from batch 1-2 - DeploymentManager-008: revert IConfiguration overload (violated OptionsTests component-convention); Host now binds the ScadaLink:DeploymentManager section - SiteStreamGrpcServer: make test-only int ctor internal so DI sees one public ctor (resolves ambiguous-constructor failure in SiteCompositionRootTests) - Host site composition-root test config: supply Cluster:SeedNodes for the new ClusterOptionsValidator	2026-05-16 21:28:50 -04:00
Joseph Doherty	31a6995d24	fix(communication): resolve Communication-004..008 — Resume supervision, gRPC option wiring, address-load logging, sync dispose, flap detection	2026-05-16 20:58:03 -04:00
Joseph Doherty	bc548e1447	feat(deployment-manager): resolve DeploymentManager-006 — query site deployment state before redeploy and reconcile. Adds DeploymentStateQuery request/response contracts (Commons), a site-side handler (SiteRuntime), a CommunicationService query method (Communication), and reconciliation in DeploymentService: when a prior record is InProgress or Failed-on-timeout, query the site; if it already holds the target revision hash, mark the record Success without re-sending; on query failure, fall through to a normal deploy (site-side stale-rejection is the safety net).	2026-05-16 20:12:24 -04:00
Joseph Doherty	301e7fb854	fix(communication): resolve Communication-002/003 — gRPC reconnect stream cleanup and subscription map safety	2026-05-16 19:33:09 -04:00
Joseph Doherty	a9ceba00d0	fix(communication): resolve Communication-001 — early stream termination handling. DebugStreamService.StartStreamAsync awaited the initial debug snapshot inside a try whose only handler was catch (OperationCanceledException). When the stream terminated before the snapshot arrived, onTerminatedWrapper completed the await with an InvalidOperationException that escaped the catch — the caller got a raw, untranslated exception and the service did no teardown of its own on that path. Replaced with catch (Exception): it removes the session entry, sends StopDebugStream to the bridge actor via the local reference (deterministic teardown, idempotent), and throws a descriptive exception — TimeoutException for the 30s timeout, otherwise an InvalidOperationException naming the instance/site and wrapping the cause. Re-triaged Critical -> Medium: the originally-claimed multi-minute site-side resource leak does not occur (the bridge actor self-terminates on every onTerminated path). Adds the first DebugStreamService test, which fails against the pre-fix code.	2026-05-16 18:32:52 -04:00
Joseph Doherty	9c60592632	build: adopt NuGet Central Package Management. Move all package versions into Directory.Packages.props so every project resolves a single consistent version. Consolidates the Roslyn packages (Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which resolves the pre-existing NU1608 version-skew error in the test projects.	2026-05-16 15:56:30 -04:00
Joseph Doherty	295150751f	feat(scripts): realign Test Run with runtime API, add anonymous-object calls and instance binding. The Test Run sandbox and Monaco analysis modelled a script API that had drifted from the site runtime's ScriptGlobals, so real scripts failed to compile in Test Run. Realign both to the runtime surface (Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the duplicate ScriptHost stub so the two cannot diverge again. - Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call) accept an anonymous object instead of a hand-built dictionary, via a shared ScriptArgs normalizer; existing dictionary calls still compile. - Test Run can optionally bind to a deployed instance, so Instance/Attributes/CallScript route to it cross-site; adds site-side RouteToGetAttributes/RouteToSetAttributes handlers. - Adds Test Run panels to the API method and template script editors. - Fixes the TestDatabaseQuery seed script, which queried a table that never existed. Also commits unrelated in-progress work already in the tree: the health monitoring report loop, site streaming changes, and the Admin/Design data-connection and SMTP page reorganization.	2026-05-16 03:37:56 -04:00
Joseph Doherty	f66dc031a4	fix(health): route site heartbeats into the aggregator. CentralCommunicationActor.HandleHeartbeat was forwarding each incoming HeartbeatMessage to Context.Parent, which resolves to the /user guardian — a non-actor. Every site heartbeat went straight to dead letters (~1026 per central node per 30 minutes at the default ~2s interval across three sites). The aggregator now exposes MarkHeartbeat(siteId, receivedAt) which bumps LastReportReceivedAt on already-known sites (and clears IsOnline if it had flipped) without touching LatestReport. Heartbeats from unregistered sites are dropped — first registration still happens on the first full report. CentralCommunicationActor calls this in place of the no-op Tell. The result: heartbeats now serve their stated health-monitoring purpose (per CLAUDE.md) by keeping a site marked online between the 30s full reports if a single report is briefly delayed, and the dead letter noise disappears entirely.	2026-05-13 08:11:43 -04:00
Joseph Doherty	1822e3c76f	fix(store-and-forward): wire up parked-message handler and start S&F service on sites. The Parked Messages page returned "Parked message handler not available" because no actor was ever registered for ParkedMessages, and Retry/Discard requests had no Receive at all (they would have hit dead letters). On top of that, StoreAndForwardService.StartAsync() was never called anywhere, so the sf_messages SQLite table was never created and the retry timer never ran — silently breaking all of S&F. - New ParkedMessageHandlerActor bridges StoreAndForwardService.{Get,Retry,Discard} using the Sender→Task→PipeTo pattern already used in DeploymentManagerActor. - SiteCommunicationActor now routes ParkedMessageRetryRequest and ParkedMessageDiscardRequest the same way as the existing Query handler. - AkkaHostedService.RegisterSiteActors() resolves StoreAndForwardService, calls StartAsync() to create the schema and start the timer, then creates and registers the handler actor.	2026-05-13 07:12:37 -04:00
Joseph Doherty	6f1f6b8467	fix(health): replicate site health reports between central nodes. CentralHealthAggregator is a per-node hosted singleton, but site health reports flow through ClusterClient which round-robins each report to one central node only. The other node's aggregator never saw those reports and marked sites offline at the 60s threshold — sites constantly flapped between online and offline on the monitoring page. On receive, the active CentralCommunicationActor now republishes a SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both central nodes subscribe to the topic and process replicas through a dedicated path that updates the local aggregator without re-broadcasting (avoids fan-out loops). The aggregator's existing sequence-number idempotency makes self-delivery a cheap no-op. DistributedPubSubExtensionProvider is now listed in the HOCON `akka.extensions` block so the mediator is initialised at cluster start, eliminating a race where the first Subscribe arrived before the extension was loaded.	2026-05-13 06:20:07 -04:00
Joseph Doherty	751248feb6	feat(alarms): HiLo trigger type with per-band level, hysteresis, messages, overrides. Adds a new HiLo alarm trigger type with four configurable setpoints (LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority, deadband (for hysteresis), and operator message. The site runtime emits AlarmStateChanged with an AlarmLevel field so consumers can differentiate warning vs critical bands. Plumbing: - new AlarmLevel enum + AlarmStateChanged.Level/Message init properties - AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting - AlarmTriggerConfigCodec extracted from the editor for testability - sitestream.proto carries level + message over gRPC - SemanticValidator enforces numeric attribute, setpoint ordering, non-negative deadband - on-trigger scripts get an Alarm global (Name/Level/Priority/Message) so notification routing can branch by severity - per-instance InstanceAlarmOverride entity + EF migration + flattening step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary types whole-replace - DebugView shows a Level badge + per-band message tooltip - App.razor auto-reloads on permanent Blazor circuit failure - docker/regen-proto.sh automates the proto regen workflow (the linux/arm64 protoc segfault means generated files are checked in for now)	2026-05-13 03:23:32 -04:00
Joseph Doherty	dcdf79afdc	fix(dcl): format ArrayValue objects as comma-separated strings for display. ArrayValue from the LmxProxy client was showing up as its type name in debug views. Added a ValueFormatter utility and NormalizeValue in LmxProxyDataConnection to convert arrays at the adapter boundary. DateTime arrays remain as "System.DateTime[]" due to server-side v1 string serialization.	2026-03-22 14:46:15 -04:00
Joseph Doherty	416a03b782	feat: complete gRPC streaming channel — site host, docker config, docs, integration tests. Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server, add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber, expose gRPC ports in docker-compose, add site seed script, update all 10 requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.	2026-03-21 12:38:33 -04:00
Joseph Doherty	49f042a937	refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC	2026-03-21 12:18:52 -04:00
Joseph Doherty	2cd43b6992	feat: update DebugStreamBridgeActor to use gRPC for streaming events. After receiving the initial snapshot via ClusterClient, the bridge actor now opens a gRPC server-streaming subscription via SiteStreamGrpcClient for ongoing AttributeValueChanged/AlarmStateChanged events. Adds NodeA/NodeB failover with max 3 retries, retry count reset on successful event, and IWithTimers-based reconnect scheduling. - DebugStreamBridgeActor: gRPC stream after snapshot, reconnect state machine - DebugStreamService: inject SiteStreamGrpcClientFactory, resolve gRPC addresses - ServiceCollectionExtensions: register SiteStreamGrpcClientFactory singleton - SiteStreamGrpcClient: make SubscribeAsync/Unsubscribe virtual for testability - SiteStreamGrpcClientFactory: make GetOrCreate virtual for testability - New test suite: DebugStreamBridgeActorTests (8 tests)	2026-03-21 12:14:24 -04:00
Joseph Doherty	25a6022f7b	feat: add SiteStreamGrpcClient and SiteStreamGrpcClientFactory. Per-site gRPC client for central-side streaming subscriptions to site servers. SiteStreamGrpcClient manages server-streaming calls with keepalive, converts proto events to domain types, and supports cancellation via Unsubscribe. SiteStreamGrpcClientFactory caches one client per site identifier. Includes InternalsVisibleTo for test access to conversion helpers and comprehensive unit tests for event mapping, quality/alarm-state conversion, unsubscribe behavior, and factory caching.	2026-03-21 12:06:38 -04:00
Joseph Doherty	55a05914d0	feat: add SiteStreamGrpcServer with Channel<T> bridge and stream limits - Define ISiteStreamSubscriber interface for decoupling from SiteRuntime - Implement SiteStreamGrpcServer (inherits SiteStreamServiceBase) with: - Readiness gate (SetReady) - Max concurrent stream enforcement - Duplicate correlationId replacement (cancels previous stream) - StreamRelayActor creation per subscription - Bounded Channel<SiteStreamEvent> bridge (1000 capacity, drop-oldest) - Clean teardown: unsubscribe, stop actor, remove tracking entry - Identity-safe cleanup using ConcurrentDictionary.TryRemove(KeyValuePair) to prevent replacement streams from being removed by predecessor cleanup - 7 unit tests covering reject-not-ready, max-streams, duplicate cancel, cleanup-on-cancel, subscribe/remove lifecycle, event forwarding	2026-03-21 11:52:31 -04:00
Joseph Doherty	d70bbbe739	feat: add StreamRelayActor bridging Akka events to gRPC proto channel	2026-03-21 11:48:04 -04:00
Joseph Doherty	826cfbee31	feat: add sitestream.proto definition and generated gRPC stubs. Define the SiteStreamService proto for real-time instance event streaming (attribute value changes, alarm state changes) from site nodes to central. Add pre-generated C# stubs following the existing LmxProxy pattern, gRPC NuGet packages with FrameworkReference for ASP.NET Core server types, and proto roundtrip tests.	2026-03-21 11:37:39 -04:00
Joseph Doherty	3efec91386	fix: route debug stream events through ClusterClient site→central path. ClusterClient Sender refs are temporary proxies — valid for immediate reply but not durable for future Tells. Events now flow as DebugStreamEvent through SiteCommunicationActor → ClusterClient → CentralCommunicationActor → bridge actor (same pattern as health reports). Also fix DebugStreamHub to use IHubContext for long-lived callbacks instead of the transient hub instance.	2026-03-21 11:32:17 -04:00
Joseph Doherty	fd2e96fea2	feat: replace debug view polling with real-time SignalR streaming. The debug view polled every 2s by re-subscribing for full snapshots. Now a persistent DebugStreamBridgeActor on central subscribes once and receives incremental Akka stream events from the site, forwarding them to the Blazor component via callbacks and to the CLI via a new SignalR hub at /hubs/debug-stream. Adds `debug stream` CLI command with auto-reconnect.	2026-03-21 01:34:53 -04:00
Joseph Doherty	7740a3bcf9	feat: add JoeAppEngine OPC UA nodes, fix DCL auto-reconnect and quality push - Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime) - Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads, preventing Self.Tell failure in Disconnected event handler - Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect - Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]" - Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync - Update test_infra_opcua.md with JoeAppEngine documentation	2026-03-19 13:27:54 -04:00
Joseph Doherty	78fbb13df7	feat: wire Inbound API Route.To().Call() to site instance scripts and add Roslyn compilation. Completes the Inbound API → site script call chain by adding RouteToCallRequest handlers in SiteCommunicationActor and DeploymentManagerActor. Also replaces the placeholder dispatch table in InboundScriptExecutor with Roslyn compilation of API method scripts at startup, enabling user-defined inbound API methods to call instance scripts across the cluster.	2026-03-18 08:43:13 -04:00
Joseph Doherty	9c6e3c2e56	feat: add CLI debug snapshot command for one-shot instance state inspection. Adds `debug snapshot --id <int>` to query a running instance's current attribute values and alarm states without the subscribe/stream overhead of the debug view. Routes through ManagementActor → CommunicationService → site DeploymentManager → InstanceActor using the existing remote query pattern.	2026-03-18 07:16:22 -04:00
Joseph Doherty	4f22ca2b1f	feat: replace ActorSelection with ClusterClient for inter-cluster communication. Central and site clusters now communicate via ClusterClient/ClusterClientReceptionist instead of direct ActorSelection. Both CentralCommunicationActor and SiteCommunicationActor are registered with their cluster's receptionist. Central creates one ClusterClient per site using NodeA/NodeB contact points from the DB. Sites configure multiple CentralContactPoints for automatic failover between central nodes. ISiteClientFactory enables test injection.	2026-03-18 00:08:47 -04:00
Joseph Doherty	e5eb871961	fix: wire up health report pipeline between sites and central aggregator. Sites now send SiteHealthReport via AkkaHealthReportTransport → SiteCommunicationActor → CentralCommunicationActor → CentralHealthAggregator. Added IHealthReportTransport impl, ISiteIdentityProvider impl, registered HealthReportSender on site nodes, and added SiteHealthReport handler in CentralCommunicationActor. Health Dashboard now shows all 3 sites online.	2026-03-17 23:46:17 -04:00
Joseph Doherty	9e97c1acd2	feat: replace site registration with database-driven site addressing. Central now resolves site Akka remoting addresses from the Sites DB table (NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite messages. Eliminates the race condition where sites starting before central had their registration dead-lettered. Addresses are cached in CentralCommunicationActor with 60s periodic refresh and on-demand refresh when sites are added/edited/deleted via UI or CLI.	2026-03-17 23:13:10 -04:00
Joseph Doherty	4879c4e01e	Fix auth, Bootstrap, Blazor nav, LDAP, and deployment pipeline for working Central UI. Bootstrap served locally with absolute paths and <base href="/">. LDAP auth uses search-then-bind with service account for GLAuth compatibility. CookieAuthenticationStateProvider reads HttpContext.User instead of parsing JWT. Login/logout forms opt out of Blazor enhanced nav (data-enhance="false"). Nav links use absolute paths; seed data includes Design/Deployment group mappings. DataConnections page loads all connections (not just site-assigned). Site appsettings configured for Test Plant A; Site registers with Central on startup. DeploymentService resolves string site identifier for Akka routing. Instances page gains Create Instance form.	2026-03-17 10:03:06 -04:00
Joseph Doherty	b659978764	Phase 8: Production readiness — failover tests, security hardening, sandboxing, deployment docs - WP-1-3: Central/site failover + dual-node recovery tests (17 tests) - WP-4: Performance testing framework for target scale (7 tests) - WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests) - WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs) - WP-7: Recovery drill test scaffolds (5 tests) - WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests) - WP-9: Message contract compatibility (forward/backward compat) (18 tests) - WP-10: Deployment packaging (installation guide, production checklist, topology) - WP-11: Operational runbooks (failover, troubleshooting, maintenance) 92 new tests, all passing. Zero warnings.	2026-03-16 22:12:31 -04:00
Joseph Doherty	389f5a0378	Phase 3B: Site I/O & Observability — Communication, DCL, Script/Alarm actors, Health, Event Logging. Communication Layer (WP-1–5): - 8 message patterns with correlation IDs, per-pattern timeouts - Central/Site communication actors, transport heartbeat config - Connection failure handling (no central buffering, debug streams killed). Data Connection Layer (WP-6–14, WP-34): - Connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting) - OPC UA + LmxProxy adapters behind IDataConnection - Auto-reconnect, bad quality propagation, transparent re-subscribe - Write-back, tag path resolution with retry, health reporting - Protocol extensibility via DataConnectionFactory. Site Runtime (WP-15–25, WP-32–33): - ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher) - AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state) - SharedScriptLibrary (inline execution), ScriptRuntimeContext (API) - ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout) - Recursion limit (default 10), call direction enforcement - SiteStreamManager (per-subscriber bounded buffers, fire-and-forget) - Debug view backend (snapshot + stream), concurrency serialization - Local artifact storage (4 SQLite tables). Health Monitoring (WP-26–28): - SiteHealthCollector (thread-safe counters, connection state) - HealthReportSender (30s interval, monotonic sequence numbers) - CentralHealthAggregator (offline detection 60s, online recovery). Site Event Logging (WP-29–31): - SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC) - EventLogPurgeService (30-day retention, 1GB cap) - EventLogQueryService (filters, keyword search, keyset pagination). 541 tests pass, zero warnings.	2026-03-16 20:57:25 -04:00
Joseph Doherty	8c2091dc0a	Phase 0 WP-0.10–0.12: Host skeleton, options classes, sample configs, and execution framework - WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host), 15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs - WP-0.11: 12 per-component options classes with config binding - WP-0.12: Sample appsettings for central and site topologies - Add execution procedure and checklist template to generate_plans.md - Add phase-0-checklist.md for execution tracking - Resolve all 21 open questions from plan generation - Update IDataConnection with batch ops and IAsyncDisposable 57 tests pass, zero warnings.	2026-03-16 18:59:07 -04:00
Joseph Doherty	fed5f5a82c	Add .gitignore and remove tracked build artifacts (bin/obj)	2026-03-16 18:38:00 -04:00
Joseph Doherty	34190e1347	Phase 0 WP-0.1: Create .NET 10 solution structure with all 17 component projects. 17 source projects (Commons + Host + 15 components) and 17 xUnit test projects. SLNX format, net10.0, nullable enabled, warnings as errors. All components reference Commons; Host references all components. Builds and tests clean.	2026-03-16 18:37:36 -04:00

34 Commits
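Commit a9ceba00d0 replaces a catch that only handled cancellation with a catch-all that removes the session, stops the bridge, and throws a descriptive error. The following is a minimal Python asyncio sketch of that pattern, not the project's C# code; start_debug_stream, sessions, and stop_bridge are hypothetical names. One difference from C#: Python's asyncio.CancelledError derives from BaseException, so except Exception here lets task cancellation propagate unchanged.

```python
import asyncio

SNAPSHOT_TIMEOUT_SECONDS = 30.0  # the 30s timeout named in the commit message


async def start_debug_stream(sessions, session_id, instance, site,
                             snapshot, stop_bridge,
                             timeout=SNAPSHOT_TIMEOUT_SECONDS):
    """Await the initial snapshot; on any failure, tear down and raise a clear error.

    sessions    -- dict of active sessions (hypothetical)
    snapshot    -- awaitable resolving to the first snapshot, or failing if the
                   stream terminates before it arrives
    stop_bridge -- callable that stops the bridge; assumed safe to call twice
    """
    sessions[session_id] = {"instance": instance, "site": site}
    try:
        return await asyncio.wait_for(snapshot, timeout)
    except Exception as exc:  # CancelledError is a BaseException, so it is not caught
        sessions.pop(session_id, None)  # remove the session entry
        stop_bridge()                   # deterministic, idempotent teardown
        if isinstance(exc, asyncio.TimeoutError):
            raise TimeoutError(
                f"No debug snapshot from {instance} at {site} within {timeout}s"
            ) from exc
        raise RuntimeError(
            f"Debug stream for {instance} at {site} ended before its snapshot"
        ) from exc
```

Both failure paths share one teardown, so callers always get either a snapshot or a translated exception with the session already cleaned up.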
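Commit 55a05914d0 lists three rules for tracking streams: a concurrency cap, replacement of a stream whose correlationId is reused (cancelling the old one), and identity-safe cleanup via ConcurrentDictionary.TryRemove(KeyValuePair). A minimal Python sketch of those rules follows, assuming a lock-guarded dict in place of ConcurrentDictionary; StreamHandle and StreamRegistry are hypothetical names, and whether a replacement counts against the cap is an assumption.

```python
import threading


class StreamHandle:
    """Stand-in for one server-streaming call (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class StreamRegistry:
    """Tracks active streams by correlation id, with a concurrency cap."""

    def __init__(self, max_streams):
        self._max = max_streams
        self._lock = threading.Lock()
        self._active = {}  # correlation_id -> StreamHandle

    def register(self, correlation_id, handle):
        with self._lock:
            previous = self._active.get(correlation_id)
            # Assumption: replacing a duplicate id does not count against the cap.
            if previous is None and len(self._active) >= self._max:
                return False
            self._active[correlation_id] = handle
        if previous is not None:
            previous.cancel()  # duplicate correlation id: cancel the older stream
        return True

    def cleanup(self, correlation_id, handle):
        # Compare-and-remove: delete the entry only if it still maps to *this*
        # handle, so a predecessor's cleanup cannot evict its replacement.
        with self._lock:
            if self._active.get(correlation_id) is handle:
                del self._active[correlation_id]
                return True
            return False

    def count(self):
        with self._lock:
            return len(self._active)
```

The compare-and-remove in cleanup is the important part: when stream B replaces stream A under the same id, A's teardown still runs later, and a plain delete by key would evict B.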
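The same commit (55a05914d0) describes a bounded Channel<SiteStreamEvent> bridge with capacity 1000 and a drop-oldest policy. In Python, collections.deque with maxlen gives the same eviction behaviour; the sketch below is an illustration under that substitution, not the project's implementation, and the dropped counter is an added assumption for observability.

```python
from collections import deque


class DropOldestChannel:
    """Bounded buffer that evicts the oldest item when full (sketch)."""

    def __init__(self, capacity=1000):  # 1000 matches the capacity in the commit
        self._items = deque(maxlen=capacity)  # deque discards from the left when full
        self.dropped = 0

    def write(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest item is about to be evicted
        self._items.append(item)  # never blocks the producer

    def read_all(self):
        items = list(self._items)
        self._items.clear()
        return items
```

Dropping the oldest item means a slow gRPC consumer never blocks the producer, and for a live debug view the most recent values are usually the ones worth keeping.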
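Commit 2cd43b6992 describes NodeA/NodeB failover for the gRPC stream with at most 3 retries and a retry counter that resets when an event arrives. A minimal Python sketch of that counter logic follows; it assumes every stream error switches to the other node and counts as one retry, and it leaves out the timer-based scheduling (IWithTimers) entirely. The node names are placeholders.

```python
class FailoverReconnector:
    """Alternates between two site nodes, giving up after max_retries failures in a row."""

    def __init__(self, node_a, node_b, max_retries=3):
        self._nodes = (node_a, node_b)
        self._index = 0  # start on NodeA
        self.max_retries = max_retries
        self.retries = 0

    @property
    def current(self):
        return self._nodes[self._index]

    def on_stream_error(self):
        """Return the node to reconnect to, or None once retries are exhausted."""
        if self.retries >= self.max_retries:
            return None
        self.retries += 1
        self._index = 1 - self._index  # fail over to the other node
        return self.current

    def on_event(self):
        """A successfully received event shows the stream is healthy again."""
        self.retries = 0
```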
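Three commits touch the central health aggregator: 389f5a0378 (offline detection at 60s), 6f1f6b8467 (sequence-number idempotency makes a replicated report a no-op on a node that already applied it), and f66dc031a4 (MarkHeartbeat refreshes liveness for known sites only, without touching LatestReport). A Python sketch combining those rules follows, assuming timestamps in seconds and per-site sequence numbers that only increase; a site restart that resets its counter is not handled here.

```python
from dataclasses import dataclass
from typing import Any, Dict

OFFLINE_AFTER_SECONDS = 60.0  # offline threshold quoted in the Phase 3B commit


@dataclass
class SiteState:
    last_sequence: int
    latest_report: Any
    last_received_at: float


class HealthAggregator:
    def __init__(self):
        self.sites: Dict[str, SiteState] = {}

    def apply_report(self, site_id, sequence, report, received_at):
        """Apply a full report; duplicates and replicas with an old sequence are no-ops."""
        state = self.sites.get(site_id)
        if state is not None and sequence <= state.last_sequence:
            return False
        self.sites[site_id] = SiteState(sequence, report, received_at)
        return True

    def mark_heartbeat(self, site_id, received_at):
        """Refresh liveness only; never touches latest_report."""
        state = self.sites.get(site_id)
        if state is None:
            return False  # unknown site: registration waits for a full report
        state.last_received_at = max(state.last_received_at, received_at)
        return True

    def is_online(self, site_id, now):
        state = self.sites.get(site_id)
        return state is not None and now - state.last_received_at <= OFFLINE_AFTER_SECONDS
```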
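Commit 751248feb6 adds a HiLo trigger with LoLo/Lo/Hi/HiHi setpoints, each with an optional deadband for hysteresis. A Python sketch of one plausible evaluation rule follows. The exact semantics are assumptions: reaching a setpoint enters its band, leaving a band requires moving past the setpoint by its deadband, and that hold also applies while the alarm sits in a more severe band on the same side. Priority and operator message are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Setpoint:
    value: float
    deadband: float = 0.0  # hysteresis width, assumed non-negative


# Bands that count as "already in or beyond" each band on the same side.
_HELD_BY = {
    "HiHi": ("HiHi",),
    "Hi": ("HiHi", "Hi"),
    "LoLo": ("LoLo",),
    "Lo": ("LoLo", "Lo"),
}


def evaluate(value: float, active: str, setpoints: Dict[str, Setpoint]) -> str:
    """Return the new level: "LoLo", "Lo", "Normal", "Hi", or "HiHi"."""
    for band in ("HiHi", "Hi"):  # most severe first
        sp = setpoints.get(band)
        if sp is None:
            continue
        held = active in _HELD_BY[band]
        # Entering needs value >= setpoint; leaving needs value < setpoint - deadband.
        if value >= sp.value - (sp.deadband if held else 0.0):
            return band
    for band in ("LoLo", "Lo"):
        sp = setpoints.get(band)
        if sp is None:
            continue
        held = active in _HELD_BY[band]
        if value <= sp.value + (sp.deadband if held else 0.0):
            return band
    return "Normal"
```

For example, with Hi at 80 and a deadband of 2, a value hovering at 79 keeps an active Hi alarm latched instead of chattering between Hi and Normal.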
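Commit 751248feb6 also states that per-instance HiLo overrides merge setpoint-by-setpoint while binary trigger types are replaced whole. A Python sketch of that rule follows, assuming configs are plain dicts keyed by setpoint name (real setpoints also carry priority, deadband, and message). The trigger type string "ValueMatch" is taken from the Phase 3B commit.

```python
def merge_override(trigger_type, base, override):
    """Merge a per-instance override into a template's trigger config (sketch).

    HiLo: setpoint-by-setpoint, so an override can change just "Hi".
    Other (binary) trigger types: the override replaces the config wholesale.
    """
    if override is None:
        return base
    if trigger_type == "HiLo":
        merged = dict(base)      # copy so the template config is not mutated
        merged.update(override)  # only the setpoints present in the override change
        return merged
    return dict(override)
```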
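Commit bc548e1447 describes the reconciliation step before a redeploy. The decision can be sketched in Python as below; this is an illustration, not the DeploymentService code, and the status strings, the Action enum, and the query callable are hypothetical stand-ins.

```python
from enum import Enum


class Action(Enum):
    DEPLOY = "deploy"              # send the deployment as normal
    MARK_SUCCESS = "mark-success"  # site already runs the target revision


def reconcile(prior_status, failed_on_timeout, target_hash, query_site_hash):
    """Decide whether a redeploy needs to be sent.

    query_site_hash -- callable returning the revision hash the site holds;
                       it may raise if the site cannot be reached.
    """
    uncertain = prior_status == "InProgress" or (
        prior_status == "Failed" and failed_on_timeout)
    if not uncertain:
        return Action.DEPLOY
    try:
        site_hash = query_site_hash()
    except Exception:
        # Query failure falls through to a normal deploy; the site's own
        # stale-revision rejection is the safety net.
        return Action.DEPLOY
    return Action.MARK_SUCCESS if site_hash == target_hash else Action.DEPLOY
```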
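Commit 389f5a0378 lists an EventLogQueryService with filters, keyword search, and keyset pagination over SQLite. A minimal sketch with Python's built-in sqlite3 module follows; the event_log table, its columns, and the category names in the usage lines are hypothetical, since the commit does not name them.

```python
import sqlite3


def open_event_log():
    """In-memory stand-in for the site event log (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE event_log ("
        " id INTEGER PRIMARY KEY,"   # increasing rowid used as the page cursor
        " ts TEXT NOT NULL,"         # ISO 8601 UTC timestamp
        " category TEXT NOT NULL,"
        " message TEXT NOT NULL)"
    )
    return db


def query_page(db, category=None, keyword=None, before_id=None, limit=50):
    """Return (rows, next_cursor), newest first, using keyset pagination."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if keyword is not None:
        clauses.append("message LIKE ?")
        params.append(f"%{keyword}%")
    if before_id is not None:
        clauses.append("id < ?")  # resume strictly after the last row seen
        params.append(before_id)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    rows = db.execute(
        f"SELECT id, ts, category, message FROM event_log {where} "
        "ORDER BY id DESC LIMIT ?",
        (*params, limit),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor
```

Filtering on the last seen id instead of using OFFSET lets SQLite seek directly to the cursor on the primary key, and rows inserted between requests do not shift later pages.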
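Commit 9e97c1acd2 caches site addresses read from the Sites table with a 60s periodic refresh plus on-demand refresh, and commit 0b4c1563aa mentions an atomic site-cache refresh. Assuming "atomic" means readers never observe a partially rebuilt cache, a common way to get that is to build a complete new map and swap the reference in one step, as in this Python sketch; load_sites and SiteAddressCache are hypothetical names.

```python
import threading
import time


class SiteAddressCache:
    """Caches site node addresses loaded from a database (sketch).

    load_sites -- callable returning {site_id: (node_a_address, node_b_address)};
                  stands in for reading the Sites table.
    """

    def __init__(self, load_sites, refresh_seconds=60.0, clock=time.monotonic):
        self._load = load_sites
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._lock = threading.Lock()  # serializes refreshes, not reads
        self._addresses = {}
        self._loaded_at = None

    def refresh(self):
        # Build the whole new map first, then swap the reference in one
        # assignment, so readers see either the old map or the new one,
        # never a half-cleared mix.
        with self._lock:
            fresh = dict(self._load())
            self._addresses = fresh
            self._loaded_at = self._clock()

    def refresh_if_due(self):
        if self._loaded_at is None or self._clock() - self._loaded_at >= self._refresh_seconds:
            self.refresh()

    def get(self, site_id):
        return self._addresses.get(site_id)  # single read of the current map
```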
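The EventLogPurgeService in commit 389f5a0378 enforces 30-day retention and a 1GB cap. A Python sketch of one way to apply both limits follows, assuming the age limit is applied first and the oldest remaining events are then dropped until the total fits; whether "1GB" means 10^9 or 2^30 bytes is also an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)
SIZE_CAP_BYTES = 1024 ** 3  # "1GB cap"; binary vs decimal GB is an assumption


def purge(events, now, retention=RETENTION, cap_bytes=SIZE_CAP_BYTES):
    """Return the events to keep; events are (timestamp, size_bytes), oldest first."""
    cutoff = now - retention
    kept = [e for e in events if e[0] >= cutoff]  # age limit first
    total = sum(size for _, size in kept)
    start = 0
    while total > cap_bytes and start < len(kept):
        total -= kept[start][1]  # then drop oldest until under the size cap
        start += 1
    return kept[start:]
```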