# ScadaLink Maintenance Procedures

## SQL Server Maintenance (Central)

### Regular Maintenance Schedule

| Task | Frequency | Window |
|------|-----------|--------|
| Index rebuild | Weekly | Off-peak hours |
| Statistics update | Daily | Automated |
| Backup (full) | Daily | Off-peak hours |
| Backup (differential) | Every 4 hours | Anytime |
| Backup (transaction log) | Every 15 minutes | Anytime |
| Integrity check (DBCC CHECKDB) | Weekly | Off-peak hours |

### Index Maintenance

```sql
-- Rebuild all indexes on every table in the configuration database.
-- Note: sp_MSforeachtable is an undocumented procedure, and ONLINE = ON
-- requires an edition that supports online index operations (e.g. Enterprise).
USE ScadaLink;
EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD WITH (ONLINE = ON)';
```

For large tables (AuditLogEntries, DeploymentRecords), consider rebuilding individual indexes instead of all of them:

```sql
ALTER INDEX IX_AuditLogEntries_Timestamp ON AuditLogEntries
REBUILD WITH (ONLINE = ON, FILLFACTOR = 90);
```

### Audit Log Retention

The AuditLogEntries table grows continuously. Implement a retention policy:

```sql
-- Delete audit entries older than 1 year
DELETE FROM AuditLogEntries
WHERE Timestamp < DATEADD(YEAR, -1, GETUTCDATE());
```

Consider partitioning the AuditLogEntries table by month for efficient purging.

### Database Growth Monitoring

```sql
-- Check database sizes
EXEC sp_helpdb 'ScadaLink';
EXEC sp_helpdb 'ScadaLink_MachineData';

-- Check table sizes
-- (the alias is RowCounts because ROWCOUNT is a reserved keyword in T-SQL)
SELECT
    t.name AS TableName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 / 1024.0 AS TotalSpaceMB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name, p.rows
ORDER BY TotalSpaceMB DESC;
```

## SQLite Management (Site)

### Database Files

| File | Purpose | Growth Pattern |
|------|---------|---------------|
| `site.db` | Deployed configs, static overrides | Stable (grows with deployments) |
| `store-and-forward.db` | S&F message buffer | Variable (grows during outages) |

### Monitoring SQLite Size

```powershell
# Check SQLite file sizes
Get-ChildItem C:\ScadaLink\data\*.db |
    Select-Object Name, @{N='SizeMB';E={[math]::Round($_.Length/1MB,2)}}
```

### S&F Database Growth

The S&F database has **no max buffer size** by design. During extended outages, it can grow significantly.

**Monitoring:**
- Check buffer depth in the health dashboard.
- Alert if `store-and-forward.db` exceeds 1 GB.

**Manual cleanup (if needed):**
1. Identify and discard permanently undeliverable parked messages via the central UI.
2. If the database is very large but the site is healthy, no action is needed: buffered messages are delivered and removed automatically.

### SQLite Vacuum

By default, SQLite does not shrink the database file when rows are deleted; freed pages are kept for reuse. Periodically vacuum to return that space to the file system:

```powershell
# Stop the ScadaLink service first.
# sc.exe returns immediately; confirm the STOPPED state with
# "sc.exe query ScadaLink-Site" before continuing.
sc.exe stop ScadaLink-Site

# Vacuum the S&F database
sqlite3 C:\ScadaLink\data\store-and-forward.db "VACUUM;"

# Restart the service
sc.exe start ScadaLink-Site
```

**Important:** Only vacuum when the service is stopped. `VACUUM` requires an exclusive lock on the database, so it cannot run while the service has the file in use. It also rebuilds the database into a temporary copy, so up to twice the database size may be needed in free disk space.
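
Because a vacuum requires stopping the service, it can be worth estimating first how much space it would actually reclaim. The following is a minimal sketch in Python, using SQLite's `PRAGMA freelist_count` and `PRAGMA page_size`. The function name `reclaimable_bytes` is hypothetical and not part of ScadaLink, and the sketch assumes the database can be opened read-only from the maintenance account.

```python
import sqlite3
from pathlib import Path


def reclaimable_bytes(db_path: str) -> int:
    """Estimate how many bytes VACUUM would release (free pages * page size).

    Hypothetical helper for illustration; not part of ScadaLink.
    """
    # Open read-only via a file: URI so the check cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # Pages on the freelist are unused but still occupy space in the file.
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
    return free_pages * page_size
```

For example, if `reclaimable_bytes(r"C:\ScadaLink\data\store-and-forward.db")` reports only a few megabytes, a vacuum is unlikely to justify the service interruption. If it reports a large share of the file size, scheduling a vacuum is more clearly worthwhile.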

### SQLite Backup

```powershell
# Hot backup using SQLite backup API (safe while service is running)
sqlite3 C:\ScadaLink\data\site.db ".backup C:\Backups\site-$(Get-Date -Format yyyyMMdd).db"
sqlite3 C:\ScadaLink\data\store-and-forward.db ".backup C:\Backups\sf-$(Get-Date -Format yyyyMMdd).db"
```

## Log Rotation

### Serilog File Sink

ScadaLink uses Serilog's rolling file sink with daily rotation:
- A new file is created each day, for example `scadalink-20260316.log`.
- Files are not automatically deleted.

### Log Retention Policy

Implement a scheduled task to delete old log files:

```powershell
# Delete log files older than 30 days
Get-ChildItem C:\ScadaLink\logs\scadalink-*.log |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-30) } |
    Remove-Item -Force
```

Schedule this as a Windows Task:

```powershell
$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -Command `"Get-ChildItem C:\ScadaLink\logs\scadalink-*.log | Where-Object { `$_.LastWriteTime -lt (Get-Date).AddDays(-30) } | Remove-Item -Force`""
$trigger = New-ScheduledTaskTrigger -Daily -At "03:00"
Register-ScheduledTask -TaskName "ScadaLink-LogCleanup" -Action $action -Trigger $trigger -Description "Clean up ScadaLink log files older than 30 days"
```

### Log Disk Space

Monitor disk space on all nodes:

```powershell
Get-PSDrive C | Select-Object @{N='UsedGB';E={[math]::Round($_.Used/1GB,1)}}, @{N='FreeGB';E={[math]::Round($_.Free/1GB,1)}}
```

Alert if free space drops below 5 GB.

## Site Event Log Maintenance

### Automatic Purge

The Site Event Logging component has a built-in purge:
- **Retention**: 30 days (configurable via `SiteEventLog:RetentionDays`)
- **Storage cap**: 1 GB (configurable via `SiteEventLog:MaxStorageMB`)
- **Purge interval**: Every 24 hours (configurable via `SiteEventLog:PurgeIntervalHours`)

No manual intervention is needed under normal conditions.
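
To confirm that the automatic purge is keeping the event log under its cap, the database file size can be compared against the configured limit. The sketch below is a rough illustration in Python. The function name `event_log_usage` is hypothetical, and it assumes the cap applies to the size of the database file on disk, which this document does not confirm.

```python
import os


def event_log_usage(db_path: str, max_storage_mb: int = 1024) -> tuple[float, bool]:
    """Return (size in MB, whether the size exceeds the cap).

    Hypothetical helper: max_storage_mb mirrors SiteEventLog:MaxStorageMB
    (default 1 GB), and the cap is ASSUMED to apply to the file size on disk.
    """
    size_mb = os.path.getsize(db_path) / (1024 * 1024)
    return round(size_mb, 2), size_mb > max_storage_mb
```

Called with the event log database path (for example `C:\ScadaLink\data\event-log.db`), a result whose second element is `True` across several purge intervals would suggest the purge is not keeping up and the emergency procedure may be needed.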

### Manual Purge (Emergency)

If event log storage is consuming excessive disk space:

```powershell
# Stop the service
sc.exe stop ScadaLink-Site

# Delete the event log database and let it be recreated.
# This discards all stored site events.
Remove-Item C:\ScadaLink\data\event-log.db

# Restart the service
sc.exe start ScadaLink-Site
```

## Certificate Management

### LDAP Certificates

If using LDAPS (port 636), the LDAP server's TLS certificate must be trusted:
1. Export the CA certificate from Active Directory.
2. Import it into the Windows certificate store on both central nodes.
3. Restart the ScadaLink service.

### OPC UA Certificates

OPC UA connections may require certificate trust configuration:
1. On first connection, the OPC UA client generates a self-signed certificate.
2. The OPC UA server must trust this certificate.
3. If the site node is replaced, a new certificate is generated; update the server trust list.

## Scheduled Maintenance Window

### Recommended Procedure

1. **Notify operators** that the system will be in maintenance mode.
2. **Gracefully stop the standby node** first (the singleton remains on the active node).
3. Perform maintenance on the standby node (OS updates, disk cleanup, etc.).
4. **Start the standby node** and verify it joins the cluster.
5. **Gracefully stop the active node** (CoordinatedShutdown migrates singletons to the now-running standby).
6. Perform maintenance on the former active node.
7. **Start the former active node**; it rejoins as standby.

This procedure maintains availability throughout the maintenance window.

### Emergency Maintenance (Both Nodes)

If both nodes must be stopped simultaneously:
1. Stop both nodes.
2. Perform maintenance.
3. Start one node (it forms a single-node cluster).
4. Verify health.
5. Start the second node.

Sites continue operating independently during central maintenance. Site-buffered data (S&F) is delivered once communication with central is restored.