ScadaBridge

Author	SHA1	Message	Date
Joseph Doherty	ec92d55ebf	feat(smtp): UpdateSmtpConfigCommand carries TlsMode + Credentials Add two optional nullable fields (TlsMode, Credentials) to the UpdateSmtpConfigCommand record. The handler applies preserve-if-null semantics: an update that omits a field leaves the existing value intact, so existing 5-arg callers remain non-breaking.	2026-05-21 02:11:03 -04:00
Joseph Doherty	e93f655ce4	feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6)	2026-05-20 19:02:01 -04:00
Joseph Doherty	23c0fd417e	feat(health): AuditRedactionFailure counter + bridge (#23 M5) Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor over-redactions as a Site Health metric so a misconfigured / catastrophic regex shows up on /monitoring/health rather than disappearing into a NoOp sink. - SiteHealthReport: new 'AuditRedactionFailure' int field (defaulted to 0 for back-compat with existing producers/tests). - ISiteHealthCollector / SiteHealthCollector: new IncrementAuditRedactionFailure() — per-interval atomic counter with Interlocked, reset on CollectReport, mirroring the M2 Bundle G SiteAuditWriteFailures pattern. - HealthMetricsAuditRedactionFailureCounter: new bridge in ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter increments to ISiteHealthCollector — mirrors HealthMetricsAuditWriteFailureCounter one-for-one. - AddAuditLogHealthMetricsBridge: now ALSO Replaces the NoOpAuditRedactionFailureCounter binding with the health-metrics bridge, so a single AddAuditLogHealthMetricsBridge() call wires both the M2 Bundle G write-failure counter and the M5 Bundle C redaction-failure counter into the health report. Site-side only for M5 — the filter also runs on CentralAuditWriter and AuditLogIngestActor (where it just keeps the NoOp default), but a central-side health-metric surface for AuditRedactionFailure is deferred to M6 alongside the rest of the central health collector work. Tests: - AuditRedactionFailureMetricTests (HealthMonitoring) covers the SiteHealthCollector increment/report/reset shape (3 tests). - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers the AuditLog → HealthMonitoring bridge (3 tests). - Existing CountCapturingHealthCollector stub in DeploymentManagerRedeployTests extended with the new no-op interface method. Verified: dotnet build clean, all 24 test projects green (the only Failed at first ScadaLink.SiteRuntime.Tests run was the known-flaky InstanceActorChildAttributeRaceTests; passes on re-run in isolation and full suite, unrelated to these changes).	2026-05-20 17:28:33 -04:00
Joseph Doherty	0a97fff906	feat(auditlog): combined telemetry dual-write transaction (#23 M3)	2026-05-20 14:33:14 -04:00
Joseph Doherty	de110f8b42	feat(scaudit): SiteCallAuditActor minimum surface (#22, #23 M3) Bundle C of Audit Log #23 M3. Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring the ScadaLink.AuditLog scaffolding pattern (net10.0, central package management, InternalsVisibleTo to the tests assembly). SiteCallAuditActor is the central singleton entry point for Site Call Audit (#22): it receives UpsertSiteCallCommand and persists the SiteCall via ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order or duplicate updates are silent no-ops at the repo). Audit-write failures NEVER abort the user-facing action (CLAUDE.md): repository throws are caught + logged, the actor replies Accepted=false, and the singleton stays alive (Resume supervisor strategy as defence in depth). Two constructors mirror AuditLogIngestActor: - IServiceProvider production constructor resolves the scoped EF repository from a fresh DI scope per message. - ISiteCallAuditRepository test constructor injects a concrete repository so the TestKit tests exercise the real monotonic-upsert SQL end to end. UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct them without taking a project reference on the actor's host project. AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog / AddNotificationOutbox; Bundle F will populate it with the actor's Props factory + options bindings. Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2): - Receive_UpsertSiteCallCommand_Persists_Replies_Accepted - Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency) - Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive Reconciliation, KPIs, and the central->site Retry/Discard relay are deferred per CLAUDE.md scope discipline. ScadaLink.slnx updated to include both new projects. All 3 new tests pass against the running infra/mssql container; full suite (2683 tests across 27 projects) passes with no regressions.	2026-05-20 14:18:49 -04:00
Joseph Doherty	e416b21dad	feat(commons): CachedCallTelemetry combined operational+audit packet (#23 M3)	2026-05-20 13:58:57 -04:00
Joseph Doherty	dd3351da93	feat(health): SiteAuditWriteFailures counter + AuditLog bridge (#23) Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary-failure counter into the Site Health Monitoring report payload so a sustained audit-write outage surfaces on /monitoring/health instead of disappearing into a NoOp sink. - SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive). - ISiteHealthCollector + SiteHealthCollector: new IncrementSiteAuditWriteFailures() counter, per-interval reset semantics matching ScriptErrorCount / DeadLetterCount. - HealthMetricsAuditWriteFailureCounter: adapter forwarding IAuditWriteFailureCounter.Increment() to the collector. - AddAuditLogHealthMetricsBridge(): swaps the NoOp default registration for the real bridge; called from SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog. - Existing host-wiring test updated: site composition now resolves HealthMetricsAuditWriteFailureCounter (not NoOp). Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new), full solution green.	2026-05-20 13:22:25 -04:00
Joseph Doherty	87cae88f92	feat(auditlog): AuditLogIngestActor + gRPC handler (#23)	2026-05-20 12:48:26 -04:00
Joseph Doherty	08743bc42d	feat(commons): add audit telemetry + pull message DTOs (#23)	2026-05-20 09:57:39 -04:00
Joseph Doherty	adcab9dcfc	feat(notification-outbox): per-site KPI request/response message contracts	2026-05-19 05:33:37 -04:00
Joseph Doherty	c8b5871782	fix(notification-outbox): re-align Central UI sandbox Notify API with production The script-analysis sandbox Notify surface was stale after the Notification Outbox change: SandboxNotifyTarget.Send returned Task<NotificationResult> and there was no Status method, while production NotifyTarget.Send returns Task<string> (a NotificationId) plus NotifyHelper.Status. A script that test-ran cleanly in the sandbox would not compile against the real site runtime. - Move the NotificationDeliveryStatus record from ScadaLink.SiteRuntime.Scripts into ScadaLink.Commons.Messages.Notification so both production and the CentralUI sandbox reference the exact same type (CentralUI does not, and should not, reference SiteRuntime). Production NotifyHelper.Status is otherwise untouched. - Rewrite SandboxNotifyHelper/SandboxNotifyTarget to be a signature-faithful no-op fake: Send returns Task<string> (a fake NotificationId), Status returns Task<NotificationDeliveryStatus>. Production now enqueues into the site S&F engine, which has no central-side equivalent in the sandbox, so the fake no longer carries an INotificationDeliveryService. - Add script-analysis tests proving a script using the new Notify shape both diagnoses clean and runs in the sandbox.	2026-05-19 03:44:34 -04:00
Joseph Doherty	77a05a8960	fix(notification-outbox): give KPI response a failure shape; log status-query faults	2026-05-19 01:55:46 -04:00
Joseph Doherty	c547f82957	feat(notification-outbox): add notification message and outbox query contracts	2026-05-19 01:13:36 -04:00
Joseph Doherty	b1f4251d75	fix(commons): resolve Commons-008 — replace ValueTuple in SetConnectionBindingsCommand with named ConnectionBinding record (CLI, ManagementService, TemplateEngine, CentralUI)	2026-05-16 23:54:31 -04:00
Joseph Doherty	3e7a3d7e31	fix(commons): resolve Commons-001..004 — stale-fire race, JsonDocument lifetime, GetNullable strictness, registry symmetry	2026-05-16 20:58:03 -04:00
Joseph Doherty	bc548e1447	feat(deployment-manager): resolve DeploymentManager-006 — query site deployment state before redeploy and reconcile Adds DeploymentStateQuery request/response contracts (Commons), a site-side handler (SiteRuntime), a CommunicationService query method (Communication), and reconciliation in DeploymentService: when a prior record is InProgress or Failed-on-timeout, query the site; if it already holds the target revision hash mark the record Success without re-sending; on query failure fall through to a normal deploy (site-side stale-rejection is the safety net).	2026-05-16 20:12:24 -04:00
Joseph Doherty	295150751f	feat(scripts): realign Test Run with runtime API, add anonymous-object calls and instance binding The Test Run sandbox and Monaco analysis modelled a script API that had drifted from the site runtime's ScriptGlobals, so real scripts failed to compile in Test Run. Realign both to the runtime surface (Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the duplicate ScriptHost stub so the two cannot diverge again. - Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call) accept an anonymous object instead of a hand-built dictionary, via a shared ScriptArgs normalizer; existing dictionary calls still compile. - Test Run can optionally bind to a deployed instance, so Instance/Attributes/CallScript route to it cross-site; adds site-side RouteToGetAttributes/RouteToSetAttributes handlers. - Adds Test Run panels to the API method and template script editors. - Fixes the TestDatabaseQuery seed script, which queried a table that never existed. Also commits unrelated in-progress work already in the tree: the health monitoring report loop, site streaming changes, and the Admin/Design data-connection and SMTP page reorganization.	2026-05-16 03:37:56 -04:00
Joseph Doherty	7bba48a14a	feat(ui/monitoring): redesign Parked Messages page with filters, drawer, and bulk actions Triage was painful on the old layout: a lone Site dropdown sat on a sparse row, errors were truncated mid-sentence with a per-row View/Hide toggle that on expand pushed an unwrapped <pre> through the table and shoved the Actions column off-screen, all rows looked the same regardless of age or attempt count, and OriginInstance — which tells you which instance produced the failure — wasn't displayed at all even though the data was on the entity. This pass: - Adds a real filter bar: Site, Category, Target system, Origin instance, Age window, free-text search. Category/Target/Origin/Age/Search filter the loaded page client-side; Site still drives the server query (and changing site now auto-queries — one fewer click). - Replaces the in-table expansion with an Offcanvas detail drawer. Clicking a row slides in a side panel with full message ID + copy, category label, origin, attempts, both timestamps in relative + absolute form, the complete error (pre-wrap, scrollable), and big Retry / Discard buttons. The table never overflows. - Stacks Target + Method into one column (target in semibold, method small/muted below) and surfaces Origin as a code-styled chip in a new column ("—" muted when null). - Severity left-border on each row, derived client-side from AttemptCount/MaxAttempts and age of the last attempt: red when retries are exhausted and last attempt was in the past hour, amber when exhausted but stale, muted grey otherwise. - Mini attempt progress bar under the n/max count, red when fully exhausted and amber while partial. - Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on hover via the title attribute — applies in both the table and the drawer. - Bulk select: header checkbox selects the filtered set, per-row checkboxes. When ≥1 selected, a sticky action strip slides in below the filter bar offering Retry selected / Discard selected with the usual confirm dialog. Toast reports per-item success/failure counts. - Summary line next to the title: "N parked · K target systems · oldest Xh ago" (and "(showing M of N)" when filters are active). - ParkedMessageEntry contract extended additively with MaxAttempts, Category, and OriginInstance so the UI has the data it needs for severity, the category filter, and the new column. - Bumped page size from 25 to 50 to better match the dense layout.	2026-05-13 08:05:22 -04:00
Joseph Doherty	6f1f6b8467	fix(health): replicate site health reports between central nodes CentralHealthAggregator is a per-node hosted singleton, but site health reports flow through ClusterClient which round-robins each report to one central node only. The other node's aggregator never saw those reports and marked sites offline at the 60s threshold — sites constantly flapped between online and offline on the monitoring page. On receive, the active CentralCommunicationActor now republishes a SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both central nodes subscribe to the topic and process replicas through a dedicated path that updates the local aggregator without re-broadcasting (avoids fan-out loops). The aggregator's existing sequence-number idempotency makes self-delivery a cheap no-op. DistributedPubSubExtensionProvider is now listed in the HOCON `akka.extensions` block so the mediator is initialised at cluster start, eliminating a race where the first Subscribe arrived before the extension was loaded.	2026-05-13 06:20:07 -04:00
Joseph Doherty	751248feb6	feat(alarms): HiLo trigger type with per-band level, hysteresis, messages, overrides Adds a new HiLo alarm trigger type with four configurable setpoints (LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority, deadband (for hysteresis), and operator message. The site runtime emits AlarmStateChanged with an AlarmLevel field so consumers can differentiate warning vs critical bands. Plumbing: - new AlarmLevel enum + AlarmStateChanged.Level/Message init properties - AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting - AlarmTriggerConfigCodec extracted from the editor for testability - sitestream.proto carries level + message over gRPC - SemanticValidator enforces numeric attribute, setpoint ordering, non-negative deadband - on-trigger scripts get an Alarm global (Name/Level/Priority/Message) so notification routing can branch by severity - per-instance InstanceAlarmOverride entity + EF migration + flattening step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary types whole-replace - DebugView shows a Level badge + per-band message tooltip - App.razor auto-reloads on permanent Blazor circuit failure - docker/regen-proto.sh automates the proto regen workflow (the linux/arm64 protoc segfault means generated files are checked in for now)	2026-05-13 03:23:32 -04:00
Joseph Doherty	a293f5a365	feat(management): add TemplateFolder command records	2026-05-11 11:05:32 -04:00
Joseph Doherty	02a7e8abc6	feat(health): show all cluster nodes (online/offline, primary/standby) in health dashboard Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider that queries Akka cluster membership for all site-role nodes. HealthReportSender populates ClusterNodes before each report. UI shows a row per node with hostname, Online/Offline badge, and Primary/Standby badge. Falls back to single-node display if ClusterNodes is not populated.	2026-03-24 16:19:39 -04:00
Joseph Doherty	e84a831a02	feat(health): redesign health dashboard with 4-column layout and new metrics New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints (primary/secondary), DataConnectionTagQuality (good/bad/uncertain), ParkedMessageCount. New collector methods to populate them. Health dashboard redesigned to match mockup: Nodes | Data Connections (with per-connection tag quality) | Instances + S&F Buffers | Error Counts + Parked Messages. Site names resolved from repository.	2026-03-24 16:19:39 -04:00
Joseph Doherty	e8df71ea64	feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands Thread backup data connection fields through management command messages, ManagementActor handlers, SiteService, site-side SQLite storage, and deployment/replication actors. The old --configuration CLI flag is kept as a hidden alias for backwards compatibility.	2026-03-22 08:41:57 -04:00
Joseph Doherty	801c0c1df2	feat(dcl): add active endpoint to health reports and log failover events Add ActiveEndpoint field to DataConnectionHealthReport showing which endpoint is active (Primary, Backup, or Primary with no backup configured). Log failover transitions and connection restoration events to the site event log via ISiteEventLogger, passed as an optional parameter through the actor hierarchy for backwards compatibility.	2026-03-22 08:34:05 -04:00
Joseph Doherty	46304678da	feat(dcl): extend CreateConnectionCommand with backup config and failover retry count Update CreateConnectionCommand to carry PrimaryConnectionDetails, BackupConnectionDetails, and FailoverRetryCount. Update all callers: DataConnectionManagerActor, DataConnectionActor, DeploymentManagerActor, FlatteningService, and ConnectionConfig. The actor stores both configs but continues using primary only — failover logic comes in Task 3.	2026-03-22 08:24:39 -04:00
Joseph Doherty	04af03980e	feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount	2026-03-22 08:18:31 -04:00
Joseph Doherty	970d0a5cb3	refactor: simplify data connections from many-to-many site assignment to direct site ownership Replace SiteDataConnectionAssignment join table with a direct SiteId FK on DataConnection, simplifying the data model, repositories, UI, CLI, and deployment service.	2026-03-21 21:07:10 -04:00
Joseph Doherty	49f042a937	refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC	2026-03-21 12:18:52 -04:00
Joseph Doherty	9b0a80dcbd	feat: add GrpcNodeAAddress/GrpcNodeBAddress to Site entity, CLI, and UI	2026-03-21 11:45:22 -04:00
Joseph Doherty	3efec91386	fix: route debug stream events through ClusterClient site→central path ClusterClient Sender refs are temporary proxies — valid for immediate reply but not durable for future Tells. Events now flow as DebugStreamEvent through SiteCommunicationActor → ClusterClient → CentralCommunicationActor → bridge actor (same pattern as health reports). Also fix DebugStreamHub to use IHubContext for long-lived callbacks instead of transient hub instance.	2026-03-21 11:32:17 -04:00
Joseph Doherty	1a540f4f0a	feat: add HTTP Management API, migrate CLI from Akka ClusterClient to HTTP Replace the CLI's Akka.NET ClusterClient transport with a simple HTTP client targeting a new POST /management endpoint on the Central Host. The endpoint handles Basic Auth, LDAP authentication, role resolution, and ManagementActor dispatch in a single round-trip — eliminating the CLI's Akka, LDAP, and Security dependencies. Also fixes DCL ReSubscribeAll losing subscriptions on repeated reconnect by deriving the tag list from _subscriptionsByInstance instead of _subscriptionIds.	2026-03-20 23:55:31 -04:00
Joseph Doherty	7740a3bcf9	feat: add JoeAppEngine OPC UA nodes, fix DCL auto-reconnect and quality push - Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime) - Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads, preventing Self.Tell failure in Disconnected event handler - Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect - Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]" - Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync - Update test_infra_opcua.md with JoeAppEngine documentation	2026-03-19 13:27:54 -04:00
Joseph Doherty	eb8ead58d2	feat: wire SQLite replication between site nodes and fix ConfigurationDatabase tests Add SiteReplicationActor (runs on every site node) to replicate deployed configs and store-and-forward buffer operations to the standby peer via cluster member discovery and fire-and-forget Tell. Wire ReplicationService handler and pass replication actor to DeploymentManagerActor singleton. Fix 5 pre-existing ConfigurationDatabase test failures: RowVersion NOT NULL on SQLite, stale migration name assertion, and seed data count mismatch.	2026-03-18 08:28:02 -04:00
Joseph Doherty	9c6e3c2e56	feat: add CLI debug snapshot command for one-shot instance state inspection Adds `debug snapshot --id <int>` to query a running instance's current attribute values and alarm states without the subscribe/stream overhead of the debug view. Routes through ManagementActor → CommunicationService → site DeploymentManager → InstanceActor using the existing remote query pattern.	2026-03-18 07:16:22 -04:00
Joseph Doherty	c63fb1c4a6	feat: achieve CLI parity with Central UI Add 33 new management message records, ManagementActor handlers, and CLI commands to close all functionality gaps between the Central UI and the Management CLI. New capabilities include: - Template member CRUD (attributes, alarms, scripts, compositions) - Shared script CRUD - Database connection definition CRUD - Inbound API method CRUD - LDAP scope rule management - API key enable/disable - Area update - Remote event log and parked message queries - Missing get/update commands for templates, sites, instances, data connections, external systems, notifications, and SMTP config Includes 12 new ManagementActor unit tests covering authorization, happy-path queries, and error handling. Updates CLI README and component design documents (Component-CLI.md, Component-ManagementService.md).	2026-03-18 01:21:20 -04:00
Joseph Doherty	f165ca2774	feat: wire all health metrics and add instance counts to dashboard Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor), alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and S&F buffer depth placeholder. Added instance count tracking (deployed/enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated Health Dashboard UI to show instance counts per site. All metrics flow through the existing health report pipeline via ClusterClient.	2026-03-17 00:57:49 -04:00
Joseph Doherty	9e97c1acd2	feat: replace site registration with database-driven site addressing Central now resolves site Akka remoting addresses from the Sites DB table (NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite messages. Eliminates the race condition where sites starting before central had their registration dead-lettered. Addresses are cached in CentralCommunicationActor with 60s periodic refresh and on-demand refresh when sites are added/edited/deleted via UI or CLI.	2026-03-17 23:13:10 -04:00
Joseph Doherty	775cb8084f	feat: data-sourced attributes start with uncertain quality before first DCL value Attributes bound to data connections now initialize with "Uncertain" quality, distinguishing "never received a value" from "known good" or "connection lost." Quality is tracked per attribute and included in GetAttributeResponse.	2026-03-17 18:25:39 -04:00
Joseph Doherty	eea50014de	fix: resolve CLI serialization failures and add README Two Akka.NET deserialization bugs prevented CLI commands from reaching ManagementActor: IReadOnlyList<string> in AuthenticatedUser serialized as a compiler-generated internal type unknown to the server, and ManagementSuccess.Data carried server-side assembly types the CLI couldn't resolve on receipt. Fixed by using string[] for roles and pre-serializing response data to JSON in ManagementActor before sending. Adds full CLI reference documentation covering all 10 command groups.	2026-03-17 18:17:47 -04:00
Joseph Doherty	8068c499bd	feat: define management message contracts in Commons (10 command groups)	2026-03-17 14:41:54 -04:00
Joseph Doherty	2f3e0ceecb	feat: include data connections and SMTP in artifact deployment	2026-03-17 13:48:52 -04:00
Joseph Doherty	dfb809a909	Wire DCL to Instance Actors for OPC UA tag value flow - Add TagValueUpdate/ConnectionQualityChanged handlers to InstanceActor - InstanceActor subscribes to DCL on PreStart based on DataSourceReference - DeploymentManagerActor creates DCL connections on deploy and passes DCL ref - AkkaHostedService creates DCL Manager Actor for tag subscriptions - Move CreateConnectionCommand to Commons for cross-project access - Add ConnectionConfig to FlattenedConfiguration for deployment packaging	2026-03-17 11:21:11 -04:00
Joseph Doherty	b659978764	Phase 8: Production readiness — failover tests, security hardening, sandboxing, deployment docs - WP-1-3: Central/site failover + dual-node recovery tests (17 tests) - WP-4: Performance testing framework for target scale (7 tests) - WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests) - WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs) - WP-7: Recovery drill test scaffolds (5 tests) - WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests) - WP-9: Message contract compatibility (forward/backward compat) (18 tests) - WP-10: Deployment packaging (installation guide, production checklist, topology) - WP-11: Operational runbooks (failover, troubleshooting, maintenance) 92 new tests, all passing. Zero warnings.	2026-03-16 22:12:31 -04:00
Joseph Doherty	389f5a0378	Phase 3B: Site I/O & Observability — Communication, DCL, Script/Alarm actors, Health, Event Logging Communication Layer (WP-1–5): - 8 message patterns with correlation IDs, per-pattern timeouts - Central/Site communication actors, transport heartbeat config - Connection failure handling (no central buffering, debug streams killed) Data Connection Layer (WP-6–14, WP-34): - Connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting) - OPC UA + LmxProxy adapters behind IDataConnection - Auto-reconnect, bad quality propagation, transparent re-subscribe - Write-back, tag path resolution with retry, health reporting - Protocol extensibility via DataConnectionFactory Site Runtime (WP-15–25, WP-32–33): - ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher) - AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state) - SharedScriptLibrary (inline execution), ScriptRuntimeContext (API) - ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout) - Recursion limit (default 10), call direction enforcement - SiteStreamManager (per-subscriber bounded buffers, fire-and-forget) - Debug view backend (snapshot + stream), concurrency serialization - Local artifact storage (4 SQLite tables) Health Monitoring (WP-26–28): - SiteHealthCollector (thread-safe counters, connection state) - HealthReportSender (30s interval, monotonic sequence numbers) - CentralHealthAggregator (offline detection 60s, online recovery) Site Event Logging (WP-29–31): - SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC) - EventLogPurgeService (30-day retention, 1GB cap) - EventLogQueryService (filters, keyword search, keyset pagination) 541 tests pass, zero warnings.	2026-03-16 20:57:25 -04:00
Joseph Doherty	e9e6165914	Phase 3A: Site runtime foundation — Akka cluster, SQLite persistence, Deployment Manager singleton, Instance Actor - WP-1: Site cluster config (keep-oldest SBR, down-if-alone, 2s/10s failure detection) - WP-2: Site-role host bootstrap (no Kestrel, SQLite paths) - WP-3: SiteStorageService with deployed_configurations + static_attribute_overrides tables - WP-4: DeploymentManagerActor as cluster singleton with staggered Instance Actor creation, OneForOneStrategy/Resume supervision, deploy/disable/enable/delete lifecycle - WP-5: InstanceActor with attribute state, GetAttribute/SetAttribute, SQLite override persistence - WP-6: CoordinatedShutdown verified for graceful singleton handover - WP-7: Dual-node recovery (both seed nodes, min-nr-of-members=1) - WP-8: 31 tests (storage CRUD, actor lifecycle, supervision, negative checks) 389 total tests pass, zero warnings.	2026-03-16 20:34:56 -04:00
Joseph Doherty	22e1eba58a	Phase 0 WP-0.2–0.9: Implement Commons (types, entities, interfaces, messages, protocol, tests) - WP-0.2: Namespace/folder skeleton (26 directories) - WP-0.3: Shared data types (6 enums, RetryPolicy, Result<T>) - WP-0.4: 24 domain entity POCOs across 10 domain areas - WP-0.5: 7 repository interfaces with full CRUD signatures - WP-0.6: IAuditService cross-cutting interface - WP-0.7: 26 message contract records across 8 concern areas - WP-0.8: IDataConnection protocol abstraction with batch ops - WP-0.9: 8 architectural constraint enforcement tests All 40 tests pass, zero warnings.	2026-03-16 18:48:24 -04:00
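
Commit ec92d55ebf describes preserve-if-null semantics for UpdateSmtpConfigCommand: an update that omits TlsMode or Credentials leaves the stored value intact. The sketch below illustrates that idea in Python, not the project's actual C# handler. Only the two optional fields come from the commit message; the `host` and `port` fields, the class names, and `apply_update` are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SmtpConfig:
    # Hypothetical stored configuration; host/port are illustrative only.
    host: str
    port: int
    tls_mode: str
    credentials: str


@dataclass(frozen=True)
class UpdateSmtpConfig:
    # Required fields are always overwritten; the two optional fields
    # default to None so that older callers can simply omit them.
    host: str
    port: int
    tls_mode: Optional[str] = None
    credentials: Optional[str] = None


def apply_update(current: SmtpConfig, cmd: UpdateSmtpConfig) -> SmtpConfig:
    """Return a new config; a None optional field keeps the existing value."""
    return replace(
        current,
        host=cmd.host,
        port=cmd.port,
        tls_mode=cmd.tls_mode if cmd.tls_mode is not None else current.tls_mode,
        credentials=(
            cmd.credentials if cmd.credentials is not None else current.credentials
        ),
    )
```

A caller that never learned about the new fields keeps working: `apply_update(cfg, UpdateSmtpConfig("mail.example.com", 25))` changes the host and port and leaves the TLS mode and credentials as they were. One consequence of this design is that a field cannot be cleared by passing None, so clearing would need a separate explicit signal.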

47 Commits
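
Commits dd3351da93 and 23c0fd417e both describe a per-interval failure counter that is incremented atomically and reset when the health report is collected. A minimal Python sketch of that pattern follows. It assumes a lock in place of .NET's Interlocked, and the class and method names are hypothetical rather than the project's API.

```python
import threading


class IntervalCounter:
    """Thread-safe counter whose value is handed out and zeroed on collect."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def increment(self) -> None:
        # Called from any thread each time a failure is observed.
        with self._lock:
            self._count += 1

    def collect(self) -> int:
        # Read and reset in one critical section, so an increment that
        # races with collection lands in this interval or the next one
        # and is never lost.
        with self._lock:
            value, self._count = self._count, 0
        return value
```

Each report therefore carries only the failures seen since the previous report, which makes a sustained outage show up as a run of non-zero intervals rather than an ever-growing total.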