- WP-1-3: Central/site failover + dual-node recovery tests (17 tests)
- WP-4: Performance testing framework for target scale (7 tests)
- WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests)
- WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs)
- WP-7: Recovery drill test scaffolds (5 tests)
- WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests)
- WP-9: Message contract compatibility (forward/backward compat) (18 tests)
- WP-10: Deployment packaging (installation guide, production checklist, topology)
- WP-11: Operational runbooks (failover, troubleshooting, maintenance)

92 new tests, all passing. Zero warnings.
ScadaLink Production Deployment Checklist
Pre-Deployment
Configuration Verification
- ScadaLink:Node:Role is set correctly (Central or Site)
- ScadaLink:Node:NodeHostname matches the machine's resolvable hostname
- ScadaLink:Cluster:SeedNodes contains exactly 2 entries for the cluster pair
- Seed node addresses use fully qualified hostnames (not localhost)
- Remoting port (default 8081) is open bidirectionally between cluster peers
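The role and seed-node checks above lend themselves to a small pre-flight script. The following is a minimal sketch, assuming the colon-separated keys map to nested JSON objects (as is usual for .NET configuration) and that seed nodes are written as URI-style addresses; the `akka.tcp://scadalink@...` form and the function name `check_cluster_config` are illustrative assumptions, not taken from the ScadaLink code base.

```python
from urllib.parse import urlsplit


def check_cluster_config(config: dict) -> list[str]:
    """Return a list of problems found in the cluster-related settings."""
    problems = []
    scada = config.get("ScadaLink", {})

    # Node:Role must be one of the two supported roles.
    role = scada.get("Node", {}).get("Role")
    if role not in ("Central", "Site"):
        problems.append(f"Node:Role must be Central or Site, got {role!r}")

    # The checklist requires exactly two seed nodes for the cluster pair.
    seeds = scada.get("Cluster", {}).get("SeedNodes", [])
    if len(seeds) != 2:
        problems.append(
            f"Cluster:SeedNodes must have exactly 2 entries, got {len(seeds)}"
        )

    # Assumption: each seed is a URI such as akka.tcp://system@host:port.
    for seed in seeds:
        host = urlsplit(seed).hostname or ""
        if host in ("localhost", "127.0.0.1", "::1") or "." not in host:
            problems.append(
                f"Seed node {seed!r} does not use a fully qualified hostname"
            )
    return problems
```

An empty list means the checked settings look plausible; it does not prove that the hostnames resolve or that the remoting port is reachable.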
Central Node
- ScadaLink:Database:ConfigurationDb connection string is valid and tested
- ScadaLink:Database:MachineDataDb connection string is valid and tested
- SQL Server login has db_owner role on both databases
- EF Core migrations have been applied (SQL script reviewed and executed)
- ScadaLink:Security:JwtSigningKey is at least 32 characters, randomly generated
- Both central nodes use the same JwtSigningKey (required for JWT failover)
- ScadaLink:Security:LdapServer points to the production LDAP/AD server
- ScadaLink:Security:LdapUseTls is true (LDAPS required in production)
- ScadaLink:Security:AllowInsecureLdap is false
- LDAP search base DN is correct for the organization
- LDAP group-to-role mappings are configured
- Load balancer is configured in front of central UI (sticky sessions not required)
- ASP.NET Data Protection keys are shared between central nodes (for cookie failover)
- HTTPS certificate is installed and configured
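One way to confirm that both central nodes carry the same JwtSigningKey without pasting the key into a chat or ticket is to compare fingerprints. This is a minimal sketch under that assumption; the helper names are hypothetical, and how the key is read on each node (environment variable, secrets manager) is left out.

```python
import hashlib
import hmac

MIN_KEY_LENGTH = 32  # from the checklist: at least 32 characters


def key_is_long_enough(signing_key: str) -> bool:
    """Check the minimum length required by the checklist."""
    return len(signing_key) >= MIN_KEY_LENGTH


def key_fingerprint(signing_key: str) -> str:
    """Return a SHA-256 hex digest that can be shared instead of the key."""
    return hashlib.sha256(signing_key.encode("utf-8")).hexdigest()


def keys_match(fingerprint_a: str, fingerprint_b: str) -> bool:
    """Compare two fingerprints in constant time."""
    return hmac.compare_digest(fingerprint_a, fingerprint_b)
```

Run `key_fingerprint` on each node and compare the two digests. A fingerprint is only safe to share when the key itself is long and random, since a short or guessable key could be recovered from its hash by brute force.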
Site Node
- ScadaLink:Node:SiteId is set and unique across all sites
- ScadaLink:Database:SiteDbPath points to a writable directory
- SQLite data directory has sufficient disk space (no max buffer size for S&F)
- ScadaLink:Communication:CentralSeedNode points to a reachable central node
- OPC UA server endpoints are accessible from site nodes
- OPC UA security certificates are configured if required
Security
- No secrets in appsettings.json committed to source control
- Secrets managed via environment variables or a secrets manager
- Windows Service account has minimum necessary permissions
- Log directory permissions restrict access to service account and administrators
- SMTP credentials use OAuth2 Client Credentials (preferred) or secure Basic Auth
- API keys for Inbound API are generated with sufficient entropy (32+ chars)
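For the entropy requirement on Inbound API keys, a cryptographically secure generator is the simplest route. The sketch below uses Python's standard `secrets` module; the function name `generate_api_key` is hypothetical, and ScadaLink may well ship its own key-generation tooling.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key built from num_bytes of OS randomness.

    With the default of 32 bytes the result is 43 characters long,
    which satisfies the checklist's 32+ character requirement.
    """
    return secrets.token_urlsafe(num_bytes)
```

The same helper could produce a JwtSigningKey candidate. Either way, the generated value should go straight into the secrets store rather than into appsettings.json.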
Network
- DNS resolution works between all cluster nodes
- Firewall rules permit Akka.NET remoting (TCP 8081)
- Firewall rules permit LDAP (TCP 636 for LDAPS)
- Firewall rules permit SMTP (TCP 587 for TLS)
- Firewall rules permit SQL Server (TCP 1433) from central nodes only
- Load balancer health check configured against /health/ready
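The firewall items above can be spot-checked from each machine with a plain TCP connect. This is a minimal sketch; the host names in the usage note are placeholders, and a successful connect only shows that the port accepts connections from the machine running the script, so it needs to be run from both peers to cover the bidirectional requirement.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `tcp_port_open("central-b.example.com", 8081)` run on central node A would exercise the remoting rule in one direction, and `tcp_port_open("ldap.example.com", 636)` the LDAPS rule.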
Deployment
Order of Operations
- Deploy central node A (forms single-node cluster)
- Verify central node A is healthy: GET /health/ready returns 200
- Deploy central node B (joins existing cluster)
- Verify both central nodes show as cluster members in logs
- Deploy site nodes (order does not matter)
- Verify sites register with central via health dashboard
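The readiness check in the sequence above can be automated so that node B is deployed only after node A reports ready. The following is a rough sketch using only the standard library; the function names, retry counts, and example URL are assumptions, and only the /health/ready path and the 200 status come from the checklist.

```python
import time
import urllib.error
import urllib.request


def is_ready(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, and timeouts all count
        # as "not ready yet".
        return False


def wait_until_ready(url: str, attempts: int = 30, delay: float = 2.0,
                     probe=is_ready) -> bool:
    """Poll url until probe succeeds or the attempts are used up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A deployment script might call `wait_until_ready("https://central-a.example.com/health/ready")` and abort before deploying node B if it returns False. The `probe` parameter exists so the polling logic can be tested without a live server.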
Rollback Plan
- Previous version binaries are retained for rollback
- Database backup taken before migration
- Rollback SQL script is available (if migration requires it)
- Service can be stopped and previous binary restored
Post-Deployment
Smoke Tests
- Central UI is accessible and login works
- Health dashboard shows all expected sites as online
- Template engine can create/save/delete a test template
- Deployment pipeline can deploy a test instance to a site
- Inbound API responds to test requests with valid API key
- Notification Service can send a test email
Monitoring Setup
- Log aggregation is configured (Serilog file sink + centralized collector)
- Health dashboard bookmarked for operations team
- Alerting configured for site offline threshold violations
- Disk space monitoring on site nodes (SQLite growth)
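Where no monitoring agent is available on a site node yet, disk headroom for the SQLite directory can be checked with a few lines of script. This is a minimal sketch; the 20% threshold and the function names are arbitrary assumptions to be tuned per site, and the path passed in would be the directory configured in ScadaLink:Database:SiteDbPath.

```python
import shutil


def free_space_fraction(path: str) -> float:
    """Return the fraction of the volume holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_space_low(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True when free space drops below the given fraction."""
    return free_space_fraction(path) < min_free_fraction
```

A scheduled task could call `disk_space_low` on the SQLite data directory and raise an alert through whatever channel the operations team already uses.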
Documentation
- Cluster topology documented (hostnames, ports, roles)
- Runbook updated with environment-specific details
- On-call team briefed on failover procedures