Commit Graph

28 Commits

Author SHA1 Message Date
Joseph Doherty
eb8ead58d2 feat: wire SQLite replication between site nodes and fix ConfigurationDatabase tests
Add SiteReplicationActor (runs on every site node) to replicate deployed
configs and store-and-forward buffer operations to the standby peer via
cluster member discovery and fire-and-forget Tell. Wire ReplicationService
handler and pass replication actor to DeploymentManagerActor singleton.

Fix 5 pre-existing ConfigurationDatabase test failures: RowVersion NOT NULL
on SQLite, stale migration name assertion, and seed data count mismatch.
2026-03-18 08:28:02 -04:00
Joseph Doherty
899dec6b6f feat: wire ExternalSystem, Database, and Notify APIs into script runtime
IServiceProvider now flows through the actor chain (DeploymentManagerActor
→ InstanceActor → ScriptActor → ScriptExecutionActor) so scripts can
resolve IExternalSystemClient, IDatabaseGateway, and
INotificationDeliveryService from DI. ScriptGlobals exposes ExternalSystem,
Database, Notify, and Scripts as top-level properties so scripts can use
them without the Instance. prefix.
2026-03-18 02:41:18 -04:00
Joseph Doherty
c63fb1c4a6 feat: achieve CLI parity with Central UI
Add 33 new management message records, ManagementActor handlers, and CLI
commands to close all functionality gaps between the Central UI and the
Management CLI. New capabilities include:

- Template member CRUD (attributes, alarms, scripts, compositions)
- Shared script CRUD
- Database connection definition CRUD
- Inbound API method CRUD
- LDAP scope rule management
- API key enable/disable
- Area update
- Remote event log and parked message queries
- Missing get/update commands for templates, sites, instances, data
  connections, external systems, notifications, and SMTP config

Includes 12 new ManagementActor unit tests covering authorization,
happy-path queries, and error handling. Updates CLI README and component
design documents (Component-CLI.md, Component-ManagementService.md).
2026-03-18 01:21:20 -04:00
Joseph Doherty
f165ca2774 feat: wire all health metrics and add instance counts to dashboard
Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor),
alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and
S&F buffer depth placeholder. Added instance count tracking (deployed/
enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated
Health Dashboard UI to show instance counts per site. All metrics flow
through the existing health report pipeline via ClusterClient.
2026-03-18 00:57:49 -04:00
Joseph Doherty
75a6636a2c fix: wire DCL connection state changes into ISiteHealthCollector
DataConnectionActor now calls UpdateConnectionHealth() on state
transitions (Connecting/Connected/Reconnecting) and UpdateTagResolution()
on connection establishment. DataConnectionManagerActor calls
RemoveConnection() on actor removal. Health reports now include
data connection statuses when instances are deployed with bindings.
2026-03-18 00:20:02 -04:00
Joseph Doherty
4f22ca2b1f feat: replace ActorSelection with ClusterClient for inter-cluster communication
Central and site clusters now communicate via ClusterClient/
ClusterClientReceptionist instead of direct ActorSelection. Both
CentralCommunicationActor and SiteCommunicationActor are registered
with their cluster's receptionist. Central creates one ClusterClient
per site using NodeA/NodeB contact points from the DB. Sites configure
multiple CentralContactPoints for automatic failover between central
nodes. ISiteClientFactory enables test injection.
2026-03-18 00:08:47 -04:00
Joseph Doherty
9e97c1acd2 feat: replace site registration with database-driven site addressing
Central now resolves site Akka remoting addresses from the Sites DB table
(NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite
messages. Eliminates the race condition where sites starting before central
had their registration dead-lettered. Addresses are cached in
CentralCommunicationActor with 60s periodic refresh and on-demand refresh
when sites are added/edited/deleted via UI or CLI.
2026-03-17 23:13:10 -04:00
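The caching behaviour described in 9e97c1acd2 (60s periodic refresh plus on-demand refresh) could look roughly like the sketch below. It is written in Python for illustration and is not the project's C# code; `SiteAddressCache`, `load_addresses`, and the injectable `clock` are hypothetical names, and the lazy "reload on read when stale" approach is an assumption standing in for whatever timer the actor actually uses.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical shape: site id -> (NodeAAddress, NodeBAddress)
Addresses = Dict[str, Tuple[str, str]]


class SiteAddressCache:
    """Caches site addresses; reloads when stale or when explicitly refreshed."""

    def __init__(self,
                 load_addresses: Callable[[], Addresses],
                 refresh_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_addresses          # assumed to read the Sites table
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._cache: Addresses = {}
        self._loaded_at: Optional[float] = None

    def refresh(self) -> None:
        """On-demand refresh, e.g. after a site is added, edited, or deleted."""
        self._cache = dict(self._load())
        self._loaded_at = self._clock()

    def get(self, site_id: str) -> Optional[Tuple[str, str]]:
        """Return cached addresses, reloading first if the cache has expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds:
            self.refresh()
        return self._cache.get(site_id)
```

Because addresses come from a table rather than from a startup message, a site that starts before central loses nothing: central picks the address up on its next load.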
Joseph Doherty
775cb8084f feat: data-sourced attributes start with uncertain quality before first DCL value
Attributes bound to data connections now initialize with "Uncertain" quality,
distinguishing "never received a value" from "known good" or "connection lost."
Quality is tracked per attribute and included in GetAttributeResponse.
2026-03-17 18:25:39 -04:00
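A minimal sketch of the three-state quality idea in 775cb8084f, in Python for illustration only. The `Uncertain` name comes from the commit message; `Good`, `Bad`, `BoundAttribute`, and the choice to keep the last value when the connection drops are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Quality(Enum):
    UNCERTAIN = "Uncertain"  # no value has been received yet
    GOOD = "Good"            # assumed name for "known good"
    BAD = "Bad"              # assumed name for "connection lost"


@dataclass
class BoundAttribute:
    """An attribute bound to a data connection (hypothetical shape)."""
    name: str
    value: Optional[Any] = None
    quality: Quality = Quality.UNCERTAIN  # initial state before the first value

    def on_value(self, value: Any) -> None:
        """A value arrived from the data connection."""
        self.value = value
        self.quality = Quality.GOOD

    def on_connection_lost(self) -> None:
        """Keep the last value (an assumption) but flag it as no longer trustworthy."""
        self.quality = Quality.BAD
```

Starting in a distinct state means a reader of the attribute can tell "nothing received yet" apart from a real reading or a dropped connection.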
Joseph Doherty
d41e156fe4 test: add ManagementActor unit tests (12 tests, authorization + data flow + errors)
2026-03-17 14:57:46 -04:00
Joseph Doherty
e9acd2dd34 feat: scaffold ManagementService project and test project
2026-03-17 14:40:39 -04:00
Joseph Doherty
3b22a8f0da feat: wire site-local repos, remove config DB from Site, update artifact service
- SiteExternalSystemRepository and SiteNotificationRepository registered in Site DI
- Removed AddConfigurationDatabase from Site role in Program.cs
- Removed ConfigurationDb from appsettings.Site.json
- ArtifactDeploymentService collects all 6 artifact types including data connections and SMTP
2026-03-17 13:54:37 -04:00
Joseph Doherty
2f3e0ceecb feat: include data connections and SMTP in artifact deployment
2026-03-17 13:48:52 -04:00
Joseph Doherty
4879c4e01e Fix auth, Bootstrap, Blazor nav, LDAP, and deployment pipeline for working Central UI
Bootstrap served locally with absolute paths and <base href="/">.
LDAP auth uses search-then-bind with service account for GLAuth compatibility.
CookieAuthenticationStateProvider reads HttpContext.User instead of parsing JWT.
Login/logout forms opt out of Blazor enhanced nav (data-enhance="false").
Nav links use absolute paths; seed data includes Design/Deployment group mappings.
DataConnections page loads all connections (not just site-assigned).
Site appsettings configured for Test Plant A; Site registers with Central on startup.
DeploymentService resolves string site identifier for Akka routing.
Instances page gains Create Instance form.
2026-03-17 10:03:06 -04:00
Joseph Doherty
2ae807df37 Phase 7: Integration surfaces — Inbound API, External System Gateway, Notification Service
Inbound API (WP-1–5):
- POST /api/{methodName} with X-API-Key auth (401/403)
- Parameter validation with extended type system (Object, List)
- Central script execution with configurable timeout
- Route.To() cross-site calls (Call, GetAttribute/SetAttribute batch)
- Failures-only logging

External System Gateway (WP-6–10):
- HTTP/REST client with JSON, API Key + Basic Auth
- Dual call modes: Call() synchronous, CachedCall() with S&F
- Error classification (transient: 5xx/408/429, permanent: all other 4xx)
- Database.Connection() (ADO.NET pooling) + Database.CachedWrite() (S&F)

Notification Service (WP-11–13):
- SMTP with OAuth2 Client Credentials + Basic Auth
- BCC delivery, plain text, token lifecycle
- Transient → S&F, permanent → returned to script
- ScriptRuntimeContext wired with ExternalSystem/Database/Notify APIs

Repository implementations: ExternalSystem, Notification, InboundApi, InstanceLocator
781 tests pass, zero warnings.
2026-03-16 22:19:12 -04:00
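The error classification listed for the External System Gateway in 2ae807df37 can be sketched as a small function. This is an illustrative Python sketch, not the project's C# code; `classify_http_status` and the string labels are hypothetical, and treating everything below 400 as "ok" is an assumption. Note that 408 and 429 are themselves 4xx codes, so they have to be checked before the general 4xx rule.

```python
def classify_http_status(status: int) -> str:
    """Classify an HTTP status code for retry purposes.

    transient -> worth buffering and retrying (5xx, 408, 429)
    permanent -> retrying will not help (any other 4xx)
    ok        -> anything below 400 (assumption: not an error)
    """
    if 500 <= status <= 599 or status in (408, 429):
        return "transient"
    if 400 <= status <= 499:
        return "permanent"
    return "ok"
```

Under this scheme a transient result would be handed to store-and-forward, while a permanent one is returned to the calling script.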
Joseph Doherty
b659978764 Phase 8: Production readiness — failover tests, security hardening, sandboxing, deployment docs
- WP-1–3: Central/site failover + dual-node recovery tests (17 tests)
- WP-4: Performance testing framework for target scale (7 tests)
- WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests)
- WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs)
- WP-7: Recovery drill test scaffolds (5 tests)
- WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests)
- WP-9: Message contract compatibility (forward/backward compat) (18 tests)
- WP-10: Deployment packaging (installation guide, production checklist, topology)
- WP-11: Operational runbooks (failover, troubleshooting, maintenance)
92 new tests, all passing. Zero warnings.
2026-03-16 22:12:31 -04:00
Joseph Doherty
3b2320bd35 Phases 4-6: Complete Central UI — Admin, Design, Deployment, and Operations pages
Phase 4 — Operator/Admin UI:
- Sites, DataConnections, Areas (hierarchical), API Keys (auto-generated) CRUD
- Health Dashboard (live refresh, per-site metrics from CentralHealthAggregator)
- Instance list with filtering/staleness/lifecycle actions
- Deployment status tracking with auto-refresh

Phase 5 — Authoring UI:
- Template authoring with inheritance tree, tabs (attrs/alarms/scripts/compositions)
- Lock indicators, on-demand validation, collision detection
- Shared scripts with syntax check
- External systems, DB connections, notification lists, Inbound API methods

Phase 6 — Deployment Operations UI:
- Staleness indicators, validation gating
- Debug view (instance selection, attribute/alarm live tables)
- Site event log viewer (filters, keyword search, keyset pagination)
- Parked message management, Audit log viewer with JSON state

Shared components: DataTable, ConfirmDialog, ToastNotification, LoadingSpinner, TimestampDisplay
623 tests pass, zero warnings. All Bootstrap 5, clean corporate design.
2026-03-16 21:47:37 -04:00
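Keyset pagination, mentioned above for the site event log viewer, pages by "rows after the last key seen" instead of by offset. Here is a minimal sketch using Python's built-in `sqlite3`; the `events` table, its columns, and the function names are hypothetical and do not reflect the project's schema.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with five rows (ids 1..5) for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany("INSERT INTO events (message) VALUES (?)",
                     [(f"event {i}",) for i in range(1, 6)])
    return conn


def fetch_page(conn: sqlite3.Connection, page_size: int,
               before_id: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return up to page_size rows, newest first, strictly older than before_id."""
    if before_id is None:
        cursor = conn.execute(
            "SELECT id, message FROM events ORDER BY id DESC LIMIT ?",
            (page_size,))
    else:
        # The key of the last row on the previous page replaces OFFSET.
        cursor = conn.execute(
            "SELECT id, message FROM events WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, page_size))
    return cursor.fetchall()
```

A caller passes the `id` of the last row of one page as `before_id` for the next. Unlike OFFSET-based paging, the query cost does not grow with page depth, and rows inserted while the user is paging do not shift later pages.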
Joseph Doherty
6ea38faa6f Phase 3C: Deployment pipeline & Store-and-Forward engine
Deployment Manager (WP-1–8, WP-16):
- DeploymentService: full pipeline (flatten→validate→send→track→audit)
- OperationLockManager: per-instance concurrency control
- StateTransitionValidator: Enabled/Disabled/NotDeployed transition matrix
- ArtifactDeploymentService: broadcast to all sites with per-site results
- Deployment identity (GUID + revision hash), idempotency, staleness detection
- Instance lifecycle commands (disable/enable/delete) with deduplication

Store-and-Forward (WP-9–15):
- StoreAndForwardStorage: SQLite persistence, 3 categories, no max buffer
- StoreAndForwardService: fixed-interval retry, transient-only buffering, parking
- ReplicationService: async best-effort to standby (fire-and-forget)
- Parked message management (query/retry/discard from central)
- Messages survive instance deletion, S&F drains on disable

620 tests pass (+79 new), zero warnings.
2026-03-16 21:27:18 -04:00
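The Store-and-Forward behaviour summarised in 6ea38faa6f (transient-only buffering, fixed-interval retry, parking) might be sketched as below. This is a Python illustration under several assumptions, not the project's C# code: the class and method names are hypothetical, the `send` callback is assumed to return `"ok"`, `"transient"`, or `"permanent"`, and parking after `max_retries` failed retries (or on a permanent failure during retry) is a guess at the parking rule, which the commit message does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BufferedMessage:
    payload: str
    attempts: int = 0  # number of failed retries so far


class StoreAndForwardBuffer:
    def __init__(self, send: Callable[[str], str], max_retries: int = 3) -> None:
        self._send = send
        self._max_retries = max_retries
        self.pending: List[BufferedMessage] = []
        self.parked: List[BufferedMessage] = []

    def submit(self, payload: str) -> str:
        """Try once immediately; buffer only when the failure is transient."""
        outcome = self._send(payload)
        if outcome == "transient":
            self.pending.append(BufferedMessage(payload))
        return outcome

    def retry_tick(self) -> None:
        """Run once per fixed retry interval (driven by an external timer)."""
        still_pending: List[BufferedMessage] = []
        for message in self.pending:
            outcome = self._send(message.payload)
            if outcome == "ok":
                continue  # delivered, drop from the buffer
            message.attempts += 1
            if outcome == "permanent" or message.attempts >= self._max_retries:
                self.parked.append(message)  # held for manual retry or discard
            else:
                still_pending.append(message)
        self.pending = still_pending
```

Parked messages stay out of the retry loop until someone acts on them, which matches the query/retry/discard management operations listed in the commit.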
Joseph Doherty
389f5a0378 Phase 3B: Site I/O & Observability — Communication, DCL, Script/Alarm actors, Health, Event Logging
Communication Layer (WP-1–5):
- 8 message patterns with correlation IDs, per-pattern timeouts
- Central/Site communication actors, transport heartbeat config
- Connection failure handling (no central buffering, debug streams killed)

Data Connection Layer (WP-6–14, WP-34):
- Connection actor with Become/Stash lifecycle (Connecting/Connected/Reconnecting)
- OPC UA + LmxProxy adapters behind IDataConnection
- Auto-reconnect, bad quality propagation, transparent re-subscribe
- Write-back, tag path resolution with retry, health reporting
- Protocol extensibility via DataConnectionFactory

Site Runtime (WP-15–25, WP-32–33):
- ScriptActor/ScriptExecutionActor (triggers, concurrent execution, blocking I/O dispatcher)
- AlarmActor/AlarmExecutionActor (ValueMatch/RangeViolation/RateOfChange, in-memory state)
- SharedScriptLibrary (inline execution), ScriptRuntimeContext (API)
- ScriptCompilationService (Roslyn, forbidden API enforcement, execution timeout)
- Recursion limit (default 10), call direction enforcement
- SiteStreamManager (per-subscriber bounded buffers, fire-and-forget)
- Debug view backend (snapshot + stream), concurrency serialization
- Local artifact storage (4 SQLite tables)

Health Monitoring (WP-26–28):
- SiteHealthCollector (thread-safe counters, connection state)
- HealthReportSender (30s interval, monotonic sequence numbers)
- CentralHealthAggregator (offline detection 60s, online recovery)

Site Event Logging (WP-29–31):
- SiteEventLogger (SQLite, 6 event categories, ISO 8601 UTC)
- EventLogPurgeService (30-day retention, 1GB cap)
- EventLogQueryService (filters, keyword search, keyset pagination)

541 tests pass, zero warnings.
2026-03-16 20:57:25 -04:00
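The health monitoring items in 389f5a0378 (monotonic sequence numbers, 60s offline detection, online recovery) suggest an aggregator along the following lines. This is an illustrative Python sketch, not the project's C# code; the names are hypothetical, and using the sequence number to discard stale or duplicate reports is an assumption about its purpose.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteState:
    last_sequence: int
    last_seen: float   # timestamp of the most recent accepted report
    online: bool = True


class HealthAggregator:
    def __init__(self, offline_after_seconds: float = 60.0) -> None:
        self._offline_after = offline_after_seconds
        self.sites: Dict[str, SiteState] = {}

    def on_report(self, site_id: str, sequence: int, now: float) -> bool:
        """Accept a report unless its sequence number is not newer than the last one."""
        state = self.sites.get(site_id)
        if state is not None and sequence <= state.last_sequence:
            return False  # stale or duplicate report, ignored
        # Any accepted report marks the site online again (recovery).
        self.sites[site_id] = SiteState(sequence, now, online=True)
        return True

    def check_offline(self, now: float) -> List[str]:
        """Mark sites offline when no report has arrived within the window."""
        newly_offline: List[str] = []
        for site_id, state in self.sites.items():
            if state.online and now - state.last_seen >= self._offline_after:
                state.online = False
                newly_offline.append(site_id)
        return newly_offline
```

With a 30s send interval and a 60s window, a site is flagged only after roughly two consecutive reports go missing.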
Joseph Doherty
e9e6165914 Phase 3A: Site runtime foundation — Akka cluster, SQLite persistence, Deployment Manager singleton, Instance Actor
- WP-1: Site cluster config (keep-oldest SBR, down-if-alone, 2s/10s failure detection)
- WP-2: Site-role host bootstrap (no Kestrel, SQLite paths)
- WP-3: SiteStorageService with deployed_configurations + static_attribute_overrides tables
- WP-4: DeploymentManagerActor as cluster singleton with staggered Instance Actor creation,
  OneForOneStrategy/Resume supervision, deploy/disable/enable/delete lifecycle
- WP-5: InstanceActor with attribute state, GetAttribute/SetAttribute, SQLite override persistence
- WP-6: CoordinatedShutdown verified for graceful singleton handover
- WP-7: Dual-node recovery (both seed nodes, min-nr-of-members=1)
- WP-8: 31 tests (storage CRUD, actor lifecycle, supervision, negative checks)
389 total tests pass, zero warnings.
2026-03-16 20:34:56 -04:00
Joseph Doherty
faef2d0de6 Phase 2 WP-1–13+23: Template Engine CRUD, composition, overrides, locking, collision detection, acyclicity
- WP-23: ITemplateEngineRepository full EF Core implementation
- WP-1: Template CRUD with deletion constraints (instances, children, compositions)
- WP-2–4: Attribute, alarm, script definitions with lock flags and override granularity
- WP-5: Shared script CRUD with syntax validation
- WP-6–7: Composition with recursive nesting and canonical naming
- WP-8–11: Override granularity, locking rules, inheritance/composition scope
- WP-12: Naming collision detection on canonical names (recursive)
- WP-13: Graph acyclicity (inheritance + composition cycles)
Core services: TemplateService, SharedScriptService, TemplateResolver,
LockEnforcer, CollisionDetector, CycleDetector. 358 tests pass.
2026-03-16 20:10:34 -04:00
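The acyclicity check in WP-13 above can be illustrated with a standard depth-first search over a combined graph. In this Python sketch (not the project's C# `CycleDetector`), `find_cycle` is a hypothetical name and each edge is assumed to point from a template to a template it inherits from or composes.

```python
from typing import Dict, Iterable, List, Optional


def find_cycle(edges: Dict[str, Iterable[str]]) -> Optional[List[str]]:
    """Return one cycle as a node list (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for neighbour in edges.get(node, ()):
            state = colour.get(neighbour, WHITE)
            if state == GREY:
                # Back edge to a node on the current path closes a cycle.
                return path[path.index(neighbour):] + [neighbour]
            if state == WHITE:
                found = visit(neighbour)
                if found is not None:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(edges):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found is not None:
                return found
    return None
```

Putting inheritance and composition edges into one graph catches mixed cycles, for example a template that composes a descendant of itself.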
Joseph Doherty
84ad6bb77d Fix LDAP integration test: use GLAuth test credentials and runtime availability check
- Password "admin" → "password" (matches GLAuth config.toml)
- Replace hard Skip attribute with TCP connectivity check (test runs when GLAuth available)
- Add LdapSearchBase + AllowInsecureLdap to appsettings.Central.json for dev
2026-03-16 19:56:05 -04:00
Joseph Doherty
d38356efdb Phase 1 WP-11–22: Host infrastructure, Blazor Server UI, and integration tests
Host infrastructure (WP-11–17):
- StartupValidator with 19 validation rules
- /health/ready endpoint with DB + Akka health checks
- Akka.NET bootstrap via AkkaHostedService (HOCON config, cluster, remoting, SBR)
- Serilog with SiteId/NodeHostname/NodeRole enrichment
- DeadLetterMonitorActor with count tracking
- CoordinatedShutdown wiring (no Environment.Exit)
- Windows Service support (UseWindowsService)

Central UI (WP-18–21):
- Blazor Server shell with Bootstrap 5, role-aware NavMenu
- Login/logout flow (LDAP auth → JWT → HTTP-only cookie)
- CookieAuthenticationStateProvider with idle timeout
- LDAP group mapping CRUD page (Admin role)
- Route guards with Authorize attributes per role
- SignalR reconnection overlay for failover

Integration tests (WP-22):
- Startup validation, auth flow, audit transactions, readiness gating
186 tests pass (1 skipped: LDAP integration), zero warnings.
2026-03-16 19:50:59 -04:00
Joseph Doherty
cafb7d2006 Phase 1 WP-2–10: Repositories, audit service, security & auth (LDAP, JWT, roles, policies, data protection)
- WP-2: SecurityRepository + CentralUiRepository with audit log queries
- WP-3: AuditService with transactional guarantee (same SaveChangesAsync)
- WP-4: Optimistic concurrency tests (deployment records vs template last-write-wins)
- WP-5: Seed data (SCADA-Admins → Admin role mapping)
- WP-6: LdapAuthService (direct bind, TLS enforcement, group query)
- WP-7: JwtTokenService (HMAC-SHA256, 15-min refresh, 30-min idle timeout)
- WP-8: RoleMapper (LDAP groups → roles with site-scoped deployment)
- WP-9: Authorization policies (Admin/Design/Deployment + site scope handler)
- WP-10: Shared Data Protection keys via EF Core
141 tests pass, zero warnings.
2026-03-16 19:32:43 -04:00
Joseph Doherty
1996b21961 Phase 1 WP-1: EF Core DbContext with Fluent API mappings for all 26 entities
ScadaLinkDbContext with 10 configuration classes (Fluent API only), initial
migration creating 25 tables, environment-aware migration helper (auto-apply
dev, validate-only prod), DesignTimeDbContextFactory, optimistic concurrency
on DeploymentRecord. 20 tests verify schema, CRUD, relationships, cascades.
2026-03-16 19:15:50 -04:00
Joseph Doherty
8c2091dc0a Phase 0 WP-0.10–0.12: Host skeleton, options classes, sample configs, and execution framework
- WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host),
  15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs
- WP-0.11: 12 per-component options classes with config binding
- WP-0.12: Sample appsettings for central and site topologies
- Add execution procedure and checklist template to generate_plans.md
- Add phase-0-checklist.md for execution tracking
- Resolve all 21 open questions from plan generation
- Update IDataConnection with batch ops and IAsyncDisposable
57 tests pass, zero warnings.
2026-03-16 18:59:07 -04:00
Joseph Doherty
22e1eba58a Phase 0 WP-0.2–0.9: Implement Commons (types, entities, interfaces, messages, protocol, tests)
- WP-0.2: Namespace/folder skeleton (26 directories)
- WP-0.3: Shared data types (6 enums, RetryPolicy, Result<T>)
- WP-0.4: 24 domain entity POCOs across 10 domain areas
- WP-0.5: 7 repository interfaces with full CRUD signatures
- WP-0.6: IAuditService cross-cutting interface
- WP-0.7: 26 message contract records across 8 concern areas
- WP-0.8: IDataConnection protocol abstraction with batch ops
- WP-0.9: 8 architectural constraint enforcement tests
All 40 tests pass, zero warnings.
2026-03-16 18:48:24 -04:00
Joseph Doherty
fed5f5a82c Add .gitignore and remove tracked build artifacts (bin/obj)
2026-03-16 18:38:00 -04:00
Joseph Doherty
34190e1347 Phase 0 WP-0.1: Create .NET 10 solution structure with all 17 component projects
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.
2026-03-16 18:37:36 -04:00