# Production Hardening - Implementation Guide

## Quick Reference

### Configuration (appsettings.json)

```json
{
  "CBDDC": {
    "Network": {
      "TcpPort": 5000,
      "UdpPort": 6000,
      "RetryAttempts": 3
    },
    "Persistence": {
      "DatabasePath": "data/cbddc.db",
      "EnableWalMode": true,
      "CacheSizeMb": 50,
      "EnableAutoBackup": true,
      "BackupPath": "backups/"
    },
    "Sync": {
      "EnableOfflineQueue": true,
      "MaxQueueSize": 1000
    }
  }
}
```

### DI Setup

```csharp
services.Configure(configuration.GetSection("CBDDC"));

services.AddSingleton();
services.AddSingleton();
services.AddSingleton();
services.AddSingleton();

services.AddCBDDCHosting(options =>
{
    options.Cluster.NodeId = "server-01";
    options.Cluster.TcpPort = 5001;
    options.Cluster.PeerConfirmationLagThresholdMs = 30_000;
    options.Cluster.PeerConfirmationCriticalLagThresholdMs = 120_000;
});
```

### Health Check

```csharp
app.MapHealthChecks("/health");
```

### Peer Confirmation Lag Thresholds

`CBDDCHealthCheck` evaluates tracked peers using confirmation lag thresholds:

- `PeerConfirmationLagThresholdMs` (default `30000`) marks peers as lagging and returns `Degraded`.
- `PeerConfirmationCriticalLagThresholdMs` (default `120000`) marks critical lag and returns `Unhealthy`.

Thresholds are clamped so that the critical threshold is never lower than the lag threshold.

### Health Status Interpretation

For the `cbddc` health check:

| Status | Meaning | Typical operator action |
|------|-----------|-------------|
| Healthy | All active tracked peers have confirmations and lag is within threshold. | No action required. |
| Degraded | At least one tracked peer is lagging, or at least one tracked peer has no confirmation rows yet. | Investigate slow/unconfirmed peers, confirm whether any should be untracked/deprecated. |
| Unhealthy | At least one tracked peer exceeds the critical lag threshold, or the persistence check throws. | Page on-call, verify storage/network path, and evaluate emergency peer de-tracking for permanently retired peers. |

Health payload fields:

- `trackedPeerCount`
- `peersWithNoConfirmation`
- `maxLagMs`
- `laggingPeers`
- `lastSuccessfulConfirmationUpdateByPeer`

Use these fields to distinguish temporary lag from stale peer registrations.

### Offline Queue

```csharp
// Enqueue while offline
if (!isOnline)
{
    offlineQueue.Enqueue(new PendingOperation
    {
        Type = "put",
        Collection = "users",
        Key = "user1",
        Data = user
    });
}

// Flush when back online
var (successful, failed) = await offlineQueue.FlushAsync(async op =>
{
    var collection = database.Collection(op.Collection);
    if (op.Type == "put" && op.Data != null)
        await collection.Put(op.Key, op.Data);
    else if (op.Type == "delete")
        await collection.Delete(op.Key);
});
```

### Document Cache

```csharp
var cache = new DocumentCache(maxSizeMb: 50);

// Check cache first
var cached = cache.Get("users", "user1");
if (cached != null) return cached;

// Load from database and populate the cache for later reads
var doc = await store.GetDocumentAsync("users", "user1");
if (doc != null) cache.Set("users", "user1", doc);
return doc;
```

### SQLite Backup

```csharp
await store.BackupAsync("backups/backup-20260115.db");
```

### Retry Policy

```csharp
var retry = new RetryPolicy(logger, maxAttempts: 3, delayMs: 1000);

await retry.ExecuteAsync(
    () => tcpClient.ConnectAsync(endpoint),
    "TCP Connect"
);
```

### Error Handling

Use specific exceptions for robust control flow:

```csharp
try
{
    await operation();
}
catch (DocumentNotFoundException ex)
{
    // Handle the missing-document case
    logger.LogWarning("Document {Key} missing", ex.Key);
}
catch (CBDDCConcurrencyException ex)
{
    // Handle conflict (though LWW usually resolves it automatically)
    logger.LogWarning("Concurrency conflict: {Message}", ex.Message);
}
catch (NetworkException ex)
{
    logger.LogError(ex, "Network operation failed");
    syncTracker.RecordError(ex.Message, peerNodeId, ex.ErrorCode);
}
catch (PersistenceException ex) when (ex is DatabaseCorruptionException)
{
    logger.LogCritical(ex, "Database corruption detected!");
    // Attempt recovery or alert admin
}
```

## Error Codes

| Code | Exception | Description |
|------|-----------|-------------|
| NETWORK_ERROR | NetworkException | Network operation failed |
| PERSISTENCE_ERROR | PersistenceException | Database operation failed |
| SYNC_ERROR | SyncException | Synchronization failed |
| CONFIG_ERROR | ConfigurationException | Invalid configuration |
| TIMEOUT_ERROR | TimeoutException | Operation timed out |

## Logging Levels

- **Trace**: Internal details (cache hits/misses)
- **Debug**: Debugging info (sync operations)
- **Information**: Normal events (peer discovered, backup created)
- **Warning**: Recoverable errors (queue full, retry attempt, documents not found)
- **Error**: Failures requiring attention (sync failed, corruption detected)
- **Critical**: System failures (database initialization failed)

## Best Practices

1. **Always use structured logging**

   ```csharp
   _logger.LogInformation("User {UserId} synced {Count} documents", userId, count);
   ```

2. **Wrap network operations with retry policy**

   ```csharp
   await _retryPolicy.ExecuteAsync(() => client.ConnectAsync(), "Connect");
   ```

3. **Check cache before database**

   ```csharp
   var doc = _cache.Get(collection, key) ?? await _store.GetDocumentAsync(collection, key);
   ```

4. **Enable offline queue for LAN instability**

   ```csharp
   if (options.Sync.EnableOfflineQueue && !isOnline)
       _offlineQueue.Enqueue(operation);
   ```

5. **Periodic health checks**

   ```csharp
   var timer = new Timer(async _ =>
   {
       try
       {
           var report = await healthCheckService.CheckHealthAsync(r => r.Name == "cbddc");
           var entry = report.Entries["cbddc"];
           if (entry.Status != HealthStatus.Healthy)
           {
               _logger.LogWarning(
                   "CBDDC health is {Status}. LaggingPeers={LaggingPeers} UnconfirmedPeers={UnconfirmedPeers}",
                   entry.Status,
                   entry.Data["laggingPeers"],
                   entry.Data["peersWithNoConfirmation"]);
           }
       }
       catch (Exception ex)
       {
           // An exception escaping an async timer callback would otherwise go unhandled
           _logger.LogError(ex, "CBDDC health check failed");
       }
   }, null, TimeSpan.Zero, TimeSpan.FromMinutes(5));
   ```

## Deployment Checklist

- [ ] Configuration file created (appsettings.json)
- [ ] Log directory permissions set
- [ ] Backup directory configured
- [ ] Database file location specified
- [ ] Network ports configured (firewall)
- [ ] Health check endpoint tested
- [ ] Offline queue tested
- [ ] Backup/restore tested
- [ ] Graceful shutdown tested

## Troubleshooting

### Database corruption

```csharp
try
{
    await store.CheckIntegrityAsync();
}
catch (DatabaseCorruptionException)
{
    // Restore from backup
    File.Copy("backups/latest.db", options.Persistence.DatabasePath, overwrite: true);
}
```

### Network issues

```
Check sync tracker:
- Last sync time
- Active peers
- Recent errors
```

### Performance degradation

```csharp
var stats = cache.GetStatistics();
if (stats.HitRate < 0.5)
{
    // Consider increasing cache size
    options.Persistence.CacheSizeMb = 100;
}
```
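
To make the hit-rate check above concrete, the following is a minimal sketch of how a hit rate could be derived from raw counters and turned into a sizing suggestion. `CacheCounters`, `CacheTuning`, the 100-lookup minimum, and the doubling rule are all hypothetical and only illustrate the idea; the object returned by `cache.GetStatistics()` may expose different members.

```csharp
using System;

// Hypothetical counters (assumption): not the library's statistics type.
public readonly record struct CacheCounters(long Hits, long Misses)
{
    // Fraction of lookups served from the cache; 0 when there were no lookups yet.
    public double HitRate => Hits + Misses == 0 ? 0.0 : (double)Hits / (Hits + Misses);
}

public static class CacheTuning
{
    // Suggests a cache size in MB: doubles the current size (capped at maxMb)
    // when the hit rate is below minHitRate, otherwise keeps the current size.
    public static int SuggestCacheSizeMb(
        CacheCounters counters, int currentMb, int maxMb, double minHitRate = 0.5)
    {
        // With very few lookups the hit rate is mostly noise, so do not resize.
        if (counters.Hits + counters.Misses < 100) return currentMb;

        return counters.HitRate < minHitRate
            ? Math.Min(currentMb * 2, maxMb)
            : currentMb;
    }
}
```

For example, `CacheTuning.SuggestCacheSizeMb(new CacheCounters(30, 70), currentMb: 50, maxMb: 400)` yields `100`, which matches the 50 MB to 100 MB step used in the snippet above. Capping the result keeps a persistently cold cache from growing without bound.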