# Troubleshooting

This guide lists recurring CBDDC failure modes, their likely causes, and remediation steps.

## Peer Cannot Connect

Symptoms:

- Node remains disconnected from expected peers.
- Health check reports lagging or unconfirmed peers.

Likely causes:

- Network path blocked (port or firewall mismatch).
- `AuthToken` mismatch between peers.
- Peer configuration drift.

Resolution:

1. Verify the TCP/UDP port configuration on both peers.
2. Confirm that the shared token and node identity settings match.
3. Restart the peer service and monitor its logs.
4. Recheck the `cbddc` health payload.

## Replication Delay or Missing Updates

Symptoms:

- Writes are visible locally but not on remote peers.
- `maxLagMs` grows continuously.

Likely causes:

- A retired peer is still tracked and is gating pruning.
- High load or transient network instability.
- Invalid collection watch configuration.

Resolution:

1. Confirm that the affected collections are registered with `WatchCollection()`.
2. Inspect peer confirmation metrics.
3. If needed, stop tracking retired peers by following the [Runbook](runbook.md).
4. Re-run smoke sync validation after any changes.

## Persistence Errors

Symptoms:

- Startup or write failures from the persistence layer.
- Health check reports unhealthy due to storage exceptions.

Likely causes:

- File or path permission errors.
- Storage corruption.
- Misconfigured provider settings.

Resolution:

1. Validate the storage path and runtime permissions.
2. Run integrity checks.
3. Restore from the latest good backup if corruption is detected.
4. Validate reads, writes, and replication after the restore.

## Configuration Regressions After Release

Symptoms:

- Behavior changed immediately after deployment.
- Multiple nodes fail with the same error pattern.

Likely causes:

- Incorrect environment variables or `appsettings` values.
- Partial rollout with incompatible settings.

Resolution:

1. Compare the deployed configuration to the approved baseline.
2. Roll back to the last known-good release if production impact is high.
3. Redeploy with the corrected configuration.
4. Document the root cause and preventive controls.
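The baseline comparison in step 1 of the configuration resolution can be partly scripted. The sketch below is a standalone helper, not part of CBDDC. It assumes that both the approved baseline and the deployed settings are available as `appsettings`-style JSON files. It flattens nested sections into colon-separated keys and reports keys that are missing, unexpected, or changed. The section names used in the test data (`Cbddc`, `Peers`) and the list of sensitive key fragments used to mask values such as `AuthToken` are hypothetical; adjust them to your deployment.

```python
import json
import sys
from typing import Any

# Assumption: key fragments whose values should never be printed in full.
SENSITIVE_HINTS = ("token", "password", "secret")


def flatten(config: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"Section:Key": value} pairs."""
    if isinstance(config, dict):
        items = [(str(k), v) for k, v in config.items()]
    elif isinstance(config, list):
        items = [(str(i), v) for i, v in enumerate(config)]
    else:
        # Leaf value: record it under the accumulated key path.
        return {prefix: config}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}:{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def diff_config(baseline: dict, deployed: dict) -> dict[str, list]:
    """Compare two configurations and classify every differing key."""
    base, live = flatten(baseline), flatten(deployed)
    changed = [
        (key, base[key], live[key])
        for key in sorted(base.keys() & live.keys())
        if base[key] != live[key]
    ]
    return {
        "missing": sorted(base.keys() - live.keys()),     # in baseline only
        "unexpected": sorted(live.keys() - base.keys()),  # in deployment only
        "changed": changed,                               # same key, new value
    }


def redact(key: str, value: Any) -> Any:
    """Mask values whose key looks like a credential (e.g. AuthToken)."""
    return "***" if any(h in key.lower() for h in SENSITIVE_HINTS) else value


def load(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__" and len(sys.argv) == 3:
    report = diff_config(load(sys.argv[1]), load(sys.argv[2]))
    for key in report["missing"]:
        print(f"missing     {key}")
    for key in report["unexpected"]:
        print(f"unexpected  {key}")
    for key, old, new in report["changed"]:
        print(f"changed     {key}: {redact(key, old)!r} -> {redact(key, new)!r}")
    # A non-zero exit lets a deployment pipeline stop on configuration drift.
    sys.exit(1 if any(report.values()) else 0)
```

Run it as, for example, `python config_diff.py baseline.json deployed.json` (the file names are illustrative). The script only sees what is in the JSON files. Environment-variable overrides are not included, so export the effective configuration of a node first if your deployment relies on them.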
## Escalation

If a Sev 1 or Sev 2 condition cannot be resolved quickly, follow the escalation and incident procedures in the [Runbook](runbook.md).