- WP-1-3: Central/site failover + dual-node recovery tests (17 tests)
- WP-4: Performance testing framework for target scale (7 tests)
- WP-5: Security hardening (LDAPS, JWT key length, no secrets in logs) (11 tests)
- WP-6: Script sandboxing adversarial tests (28 tests, all forbidden APIs)
- WP-7: Recovery drill test scaffolds (5 tests)
- WP-8: Observability validation (structured logs, correlation IDs, metrics) (6 tests)
- WP-9: Message contract compatibility (forward/backward compat) (18 tests)
- WP-10: Deployment packaging (installation guide, production checklist, topology)
- WP-11: Operational runbooks (failover, troubleshooting, maintenance)

92 new tests, all passing. Zero warnings.

# ScadaLink Production Deployment Checklist

## Pre-Deployment

### Configuration Verification

- [ ] `ScadaLink:Node:Role` is set correctly (`Central` or `Site`)
- [ ] `ScadaLink:Node:NodeHostname` matches the machine's resolvable hostname
- [ ] `ScadaLink:Cluster:SeedNodes` contains exactly 2 entries for the cluster pair
- [ ] Seed node addresses use fully qualified hostnames (not `localhost`)
- [ ] Remoting port (default 8081) is open bidirectionally between cluster peers

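Several of the configuration checks above are mechanical and could be scripted before a rollout. The following is a minimal sketch, not part of ScadaLink: the function names are hypothetical, the role and seed node values are assumed to have been read from configuration already, and seed node addresses are assumed to be URI-style strings such as `akka.tcp://system@host:8081`.

```python
from urllib.parse import urlsplit

ALLOWED_ROLES = {"Central", "Site"}
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def seed_node_host(address: str) -> str:
    """Extract the host from a seed node address.

    Assumption: addresses look like 'akka.tcp://system@host:8081'.
    Returns an empty string if no host can be parsed.
    """
    return urlsplit(address).hostname or ""


def check_node_config(role: str, seed_nodes: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if role not in ALLOWED_ROLES:
        problems.append(f"Role must be Central or Site, got {role!r}")
    if len(seed_nodes) != 2:
        problems.append(f"expected exactly 2 seed nodes, got {len(seed_nodes)}")
    for address in seed_nodes:
        host = seed_node_host(address)
        if host in LOOPBACK_HOSTS:
            problems.append(f"seed node uses a loopback host: {address}")
        elif "." not in host:
            # Heuristic only: a dot in the name suggests a qualified hostname.
            problems.append(f"seed node host is not fully qualified: {address}")
    return problems
```

Calling `check_node_config("Central", [...])` with the two configured seed node addresses returns an empty list when the role, the entry count, and the hostnames all pass. The hostname check is a heuristic and does not replace a real DNS lookup from each peer.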
### Central Node

- [ ] `ScadaLink:Database:ConfigurationDb` connection string is valid and tested
- [ ] `ScadaLink:Database:MachineDataDb` connection string is valid and tested
- [ ] SQL Server login has `db_owner` role on both databases
- [ ] EF Core migrations have been applied (SQL script reviewed and executed)
- [ ] `ScadaLink:Security:JwtSigningKey` is at least 32 characters, randomly generated
- [ ] **Both central nodes use the same JwtSigningKey** (required for JWT failover)
- [ ] `ScadaLink:Security:LdapServer` points to the production LDAP/AD server
- [ ] `ScadaLink:Security:LdapUseTls` is `true` (LDAPS required in production)
- [ ] `ScadaLink:Security:AllowInsecureLdap` is `false`
- [ ] LDAP search base DN is correct for the organization
- [ ] LDAP group-to-role mappings are configured
- [ ] Load balancer is configured in front of central UI (sticky sessions not required)
- [ ] ASP.NET Data Protection keys are shared between central nodes (for cookie failover)
- [ ] HTTPS certificate is installed and configured

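One way to satisfy the "at least 32 characters, randomly generated" requirement is to draw the key from the operating system's cryptographic random source. This is a sketch under assumptions: the function names and the 48-byte default are illustrative choices, not ScadaLink requirements. The key would be generated once and the same value supplied to both central nodes.

```python
import secrets

MIN_KEY_CHARS = 32  # minimum length stated in the checklist


def generate_signing_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string from the OS CSPRNG.

    48 random bytes encode to 64 characters, well above the minimum.
    """
    return secrets.token_urlsafe(num_bytes)


def is_acceptable_key(key: str) -> bool:
    """Length check only; randomness cannot be verified after the fact."""
    return len(key) >= MIN_KEY_CHARS
```

The generated value should go straight into the secrets store or environment variable rather than into a file under source control.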
### Site Node

- [ ] `ScadaLink:Node:SiteId` is set and unique across all sites
- [ ] `ScadaLink:Database:SiteDbPath` points to a writable directory
- [ ] SQLite data directory has sufficient disk space (the store-and-forward buffer has no maximum size)
- [ ] `ScadaLink:Communication:CentralSeedNode` points to a reachable central node
- [ ] OPC UA server endpoints are accessible from site nodes
- [ ] OPC UA security certificates are configured if required

### Security

- [ ] No secrets in `appsettings.json` committed to source control
- [ ] Secrets managed via environment variables or a secrets manager
- [ ] Windows Service account has minimum necessary permissions
- [ ] Log directory permissions restrict access to service account and administrators
- [ ] SMTP credentials use OAuth2 Client Credentials (preferred) or secure Basic Auth
- [ ] API keys for Inbound API are generated with sufficient entropy (32+ chars)

### Network

- [ ] DNS resolution works between all cluster nodes
- [ ] Firewall rules permit Akka.NET remoting (TCP 8081)
- [ ] Firewall rules permit LDAP (TCP 636 for LDAPS)
- [ ] Firewall rules permit SMTP (TCP 587 for TLS)
- [ ] Firewall rules permit SQL Server (TCP 1433) from central nodes only
- [ ] Load balancer health check configured against `/health/ready`

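The firewall items above can be spot-checked from each node with a plain TCP connection attempt. The sketch below uses only the Python standard library, and the helper name `can_connect` is hypothetical. A successful connect shows that the port is reachable in one direction. It does not prove that the service behind it is healthy, so the check would need to be run from both peers to cover the bidirectional requirement.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Ports named in the checklist; the hostnames to pair them with are
# environment-specific and deliberately not filled in here.
CHECKLIST_PORTS = {"akka-remoting": 8081, "ldaps": 636, "smtp": 587, "sql-server": 1433}
```

For example, `can_connect(peer_hostname, CHECKLIST_PORTS["akka-remoting"])` run on each cluster node against its peer covers the remoting rule.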
## Deployment

### Order of Operations

1. Deploy central node A (forms single-node cluster)
2. Verify central node A is healthy: `GET /health/ready` returns 200
3. Deploy central node B (joins existing cluster)
4. Verify both central nodes show as cluster members in logs
5. Deploy site nodes (order does not matter)
6. Verify sites register with central via health dashboard

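Step 2 can be automated by polling the readiness endpoint until it answers 200 before moving on to node B. The following is a minimal sketch using the Python standard library. The function names, retry count, and delay are assumptions, and the `fetch` parameter exists only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, and similar


def wait_until_ready(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it returns 200; give up after the given attempts."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no need to wait after the final attempt
    return False
```

A deployment script would call `wait_until_ready` with the `/health/ready` URL of central node A and stop the rollout if it returns `False`.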
### Rollback Plan

- [ ] Previous version binaries are retained for rollback
- [ ] Database backup taken before migration
- [ ] Rollback SQL script is available (if migration requires it)
- [ ] Service can be stopped and previous binary restored

## Post-Deployment

### Smoke Tests

- [ ] Central UI is accessible and login works
- [ ] Health dashboard shows all expected sites as online
- [ ] Template engine can create/save/delete a test template
- [ ] Deployment pipeline can deploy a test instance to a site
- [ ] Inbound API responds to test requests with valid API key
- [ ] Notification Service can send a test email

### Monitoring Setup

- [ ] Log aggregation is configured (Serilog file sink + centralized collector)
- [ ] Health dashboard bookmarked for operations team
- [ ] Alerting configured for site offline threshold violations
- [ ] Disk space monitoring on site nodes (SQLite growth)

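Where no monitoring agent is available on a site node yet, the disk space item can be covered by a small scheduled check. This sketch assumes it is pointed at the directory that holds the site's SQLite files. The 20% threshold and the function names are illustrative, not values from ScadaLink.

```python
import shutil


def free_space_fraction(path: str) -> float:
    """Return the free fraction (0.0 to 1.0) of the volume containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_space_low(path: str, min_free_fraction: float = 0.2) -> bool:
    """Return True when free space has dropped below the given fraction."""
    return free_space_fraction(path) < min_free_fraction
```

A scheduled task could call `disk_space_low` on the SQLite data directory and raise an alert through whatever channel the operations team already uses.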
### Documentation

- [ ] Cluster topology documented (hostnames, ports, roles)
- [ ] Runbook updated with environment-specific details
- [ ] On-call team briefed on failover procedures