7.2 KiB
Executable File
7.2 KiB
Executable File
Production Hardening - Implementation Guide
Quick Reference
Configuration (appsettings.json)
{
"CBDDC": {
"Network": {
"TcpPort": 5000,
"UdpPort": 6000,
"RetryAttempts": 3
},
"Persistence": {
"DatabasePath": "data/cbddc.db",
"EnableWalMode": true,
"CacheSizeMb": 50,
"EnableAutoBackup": true,
"BackupPath": "backups/"
},
"Sync": {
"EnableOfflineQueue": true,
"MaxQueueSize": 1000
}
}
}
DI Setup
services.Configure<CBDDCOptions>(configuration.GetSection("CBDDC"));
services.AddSingleton<RetryPolicy>();
services.AddSingleton<DocumentCache>();
services.AddSingleton<OfflineQueue>();
services.AddSingleton<SyncStatusTracker>();
services.AddCBDDCHosting(options =>
{
options.Cluster.NodeId = "server-01";
options.Cluster.TcpPort = 5001;
options.Cluster.PeerConfirmationLagThresholdMs = 30_000;
options.Cluster.PeerConfirmationCriticalLagThresholdMs = 120_000;
});
Health Check
app.MapHealthChecks("/health");
Peer Confirmation Lag Thresholds
CBDDCHealthCheck evaluates tracked peers using confirmation lag thresholds:
PeerConfirmationLagThresholdMs(default30000) marks peers as lagging and returnsDegraded.PeerConfirmationCriticalLagThresholdMs(default120000) marks critical lag and returnsUnhealthy.
Thresholds are clamped so critical is never lower than lag.
Health Status Interpretation
For the cbddc health check:
| Status | Meaning | Typical operator action |
|---|---|---|
| Healthy | All active tracked peers have confirmations and lag is within threshold. | No action required. |
| Degraded | At least one tracked peer is lagging, or at least one tracked peer has no confirmation rows yet. | Investigate slow/unconfirmed peers, confirm whether any should be untracked/deprecated. |
| Unhealthy | At least one tracked peer exceeds critical lag threshold, or persistence check throws. | Page on-call, verify storage/network path, and evaluate emergency peer de-tracking for permanently retired peers. |
Health payload fields:
trackedPeerCountpeersWithNoConfirmationmaxLagMslaggingPeerslastSuccessfulConfirmationUpdateByPeer
Use these fields to distinguish temporary lag from stale peer registrations.
Offline Queue
// Enqueue during offline
if (!isOnline)
{
offlineQueue.Enqueue(new PendingOperation
{
Type = "put",
Collection = "users",
Key = "user1",
Data = user
});
}
// Flush when back online
var (successful, failed) = await offlineQueue.FlushAsync(async op =>
{
var collection = database.Collection(op.Collection);
if (op.Type == "put" && op.Data != null)
await collection.Put(op.Key, op.Data);
else if (op.Type == "delete")
await collection.Delete(op.Key);
});
Document Cache
var cache = new DocumentCache(maxSizeMb: 50);
// Check cache first
var cached = cache.Get("users", "user1");
if (cached != null) return cached;
// Load from database
var doc = await store.GetDocumentAsync("users", "user1");
if (doc != null) cache.Set("users", "user1", doc);
SQLite Backup
await store.BackupAsync("backups/backup-20260115.db");
Retry Policy
var retry = new RetryPolicy(logger, maxAttempts: 3, delayMs: 1000);
await retry.ExecuteAsync(
() => tcpClient.ConnectAsync(endpoint),
"TCP Connect"
);
Error Handling
Use specific exceptions for robust control flow:
try
{
await operation();
}
catch (DocumentNotFoundException ex)
{
// Handle specific document missing case
logger.LogWarning("Document {Key} missing", ex.Key);
}
catch (CBDDCConcurrencyException ex)
{
// Handle conflict (though LWW usually resolves it automatically)
logger.LogWarning("Concurrency conflict: {Message}", ex.Message);
}
catch (NetworkException ex)
{
logger.LogError(ex, "Network operation failed");
syncTracker.RecordError(ex.Message, peerNodeId, ex.ErrorCode);
}
catch (PersistenceException ex) when (ex is DatabaseCorruptionException)
{
logger.LogCritical(ex, "Database corruption detected!");
// Attempt recovery or alert admin
}
Error Codes
| Code | Exception | Description |
|---|---|---|
| NETWORK_ERROR | NetworkException | Network operation failed |
| PERSISTENCE_ERROR | PersistenceException | Database operation failed |
| SYNC_ERROR | SyncException | Synchronization failed |
| CONFIG_ERROR | ConfigurationException | Invalid configuration |
| TIMEOUT_ERROR | TimeoutException | Operation timed out |
Logging Levels
- Trace: Internal details (cache hits/misses)
- Debug: Debugging info (sync operations)
- Information: Normal events (peer discovered, backup created)
- Warning: Recoverable errors (queue full, retry attempt, documents not found)
- Error: Failures requiring attention (sync failed, corruption detected)
- Critical: System failures (database initialization failed)
Best Practices
-
Always use structured logging
_logger.LogInformation("User {UserId} synced {Count} documents", userId, count); -
Wrap network operations with retry policy
await _retryPolicy.ExecuteAsync(() => client.ConnectAsync(), "Connect"); -
Check cache before database
var doc = _cache.Get(collection, key) ?? await _store.GetDocumentAsync(collection, key); -
Enable offline queue for LAN instability
if (options.Sync.EnableOfflineQueue && !isOnline) _offlineQueue.Enqueue(operation); -
Periodic health checks
var timer = new Timer(async _ => { var report = await healthCheckService.CheckHealthAsync(r => r.Name == "cbddc"); var entry = report.Entries["cbddc"]; if (entry.Status != HealthStatus.Healthy) { _logger.LogWarning( "CBDDC health is {Status}. LaggingPeers={LaggingPeers} UnconfirmedPeers={UnconfirmedPeers}", entry.Status, entry.Data["laggingPeers"], entry.Data["peersWithNoConfirmation"]); } }, null, TimeSpan.Zero, TimeSpan.FromMinutes(5));
Deployment Checklist
- Configuration file created (appsettings.json)
- Log directory permissions set
- Backup directory configured
- Database file location specified
- Network ports configured (firewall)
- Health check endpoint tested
- Offline queue tested
- Backup/restore tested
- Graceful shutdown tested
Troubleshooting
Database corruption
try
{
await store.CheckIntegrityAsync();
}
catch (DatabaseCorruptionException)
{
// Restore from backup
File.Copy("backups/latest.db", options.Persistence.DatabasePath, overwrite: true);
}
Network issues
Check sync tracker:
- Last sync time
- Active peers
- Recent errors
Performance degradation
var stats = cache.GetStatistics();
if (stats.HitRate < 0.5)
{
// Consider increasing cache size
options.Persistence.CacheSizeMb = 100;
}