Peer Deprecation & Removal Runbook

Operational workflow for safely deprecating or removing peers in clusters using peer-confirmed pruning.

When to use this runbook

  • A site is permanently decommissioned.
  • A peer has been unreachable long enough to block prune progress.
  • A peer is being replaced and should stop gating prune decisions.

Decision matrix

| Scenario | Action |
| --- | --- |
| Peer is temporarily offline and expected to return soon | Keep tracking; monitor lag and confirmations. |
| Peer should stay configured but must stop gating pruning | RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false) |
| Peer is permanently removed from topology | RemoveRemotePeerAsync(nodeId) |
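
The matrix above can be read as a two-question decision. The following is a minimal sketch of that logic only; the `PeerAction` enum, the `PeerDecision` class, and both parameter names are hypothetical and are not part of the library.

```csharp
// Hypothetical names throughout; only the three outcomes come from the decision matrix.
public enum PeerAction
{
    KeepTracking,        // temporary outage: keep tracking, monitor lag and confirmations
    RemoveTrackingOnly,  // RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false)
    RemoveRemotePeer     // RemoveRemotePeerAsync(nodeId)
}

public static class PeerDecision
{
    public static PeerAction Decide(bool expectedToReturnSoon, bool keepConfigured)
    {
        // A peer that is expected back soon keeps gating prune decisions.
        if (expectedToReturnSoon)
        {
            return PeerAction.KeepTracking;
        }

        // Otherwise the choice depends on whether the peer stays in the topology.
        return keepConfigured ? PeerAction.RemoveTrackingOnly : PeerAction.RemoveRemotePeer;
    }
}
```

For example, a decommissioned site would map to `Decide(expectedToReturnSoon: false, keepConfigured: false)`, which selects full removal.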

Procedure

  1. Confirm peer intent (temporary outage vs. decommission).
  2. Inspect health payload:
    • peersWithNoConfirmation
    • laggingPeers
    • lastSuccessfulConfirmationUpdateByPeer
  3. If deprecating from prune gating only, run:
await peerManagement.RemovePeerTrackingAsync(
    nodeId: "peer-to-deprecate",
    removeRemoteConfig: false,
    cancellationToken);
  4. If permanently removing, run:
await peerManagement.RemoveRemotePeerAsync("peer-to-remove", cancellationToken);
  5. Re-check /health and verify status transition:
    • Degraded/Unhealthy should clear if the removed peer was the cause.
  6. Confirm maintenance logs no longer report prune blocking for that peer.
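
The health payload inspection in step 2 could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes the three fields sit at the top level of the JSON object passed in, that `peersWithNoConfirmation` and `laggingPeers` are arrays of node ID strings, and that `lastSuccessfulConfirmationUpdateByPeer` maps node IDs to ISO 8601 timestamps. The actual payload shape may differ, and the `PeerHealthInspector` class is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical helper; the payload shape is assumed, not taken from the library.
public static class PeerHealthInspector
{
    // True when the peer is listed as unconfirmed or lagging, i.e. it may be gating pruning.
    public static bool IsPeerSuspect(string healthJson, string nodeId)
    {
        using var doc = JsonDocument.Parse(healthJson);
        var root = doc.RootElement;
        return ListsPeer(root, "peersWithNoConfirmation", nodeId)
            || ListsPeer(root, "laggingPeers", nodeId);
    }

    // Last successful confirmation for the peer, or null when absent or unparsable.
    public static DateTimeOffset? LastConfirmation(string healthJson, string nodeId)
    {
        using var doc = JsonDocument.Parse(healthJson);
        if (doc.RootElement.TryGetProperty("lastSuccessfulConfirmationUpdateByPeer", out var map)
            && map.ValueKind == JsonValueKind.Object
            && map.TryGetProperty(nodeId, out var value)
            && value.ValueKind == JsonValueKind.String
            && value.TryGetDateTimeOffset(out var timestamp))
        {
            return timestamp;
        }

        return null;
    }

    private static bool ListsPeer(JsonElement root, string property, string nodeId)
    {
        if (!root.TryGetProperty(property, out var array) || array.ValueKind != JsonValueKind.Array)
        {
            return false;
        }

        return array.EnumerateArray()
            .Any(e => e.ValueKind == JsonValueKind.String && e.GetString() == nodeId);
    }
}
```

A peer that is flagged here and has an old last-confirmation timestamp is a likely candidate for step 3 or step 4, subject to the intent check in step 1.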

Post-change verification

  • Peer no longer appears in active tracked peers.
  • maxLagMs now reflects only the currently active peers.
  • Pruning resumes with a valid effective cutoff (or a known non-peer reason).

Emergency path

If pruning is blocked and storage pressure is high:

  1. De-track the clearly retired peer first (removeRemoteConfig: false).
  2. Validate pruning resumes.
  3. Perform full peer removal after change-control approval.
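
The first two emergency steps could be automated along these lines. This is only a sketch: the `IPeerManagement` interface is hypothetical and mirrors the two calls shown in the procedure (the real return types and parameter lists may differ), and `pruningResumed` is a caller-supplied check, since this runbook does not define how prune progress is queried. Full removal is deliberately left out because it requires change-control approval.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface: only the method names and argument order come from the runbook.
public interface IPeerManagement
{
    Task RemovePeerTrackingAsync(string nodeId, bool removeRemoteConfig, CancellationToken cancellationToken);
    Task RemoveRemotePeerAsync(string nodeId, CancellationToken cancellationToken);
}

public static class EmergencyDetrack
{
    // De-tracks the peer (keeping its remote config), then polls until pruning resumes
    // or the attempts are exhausted. Returns true when pruning was observed to resume.
    public static async Task<bool> DetrackAndVerifyAsync(
        IPeerManagement peerManagement,
        string nodeId,
        Func<CancellationToken, Task<bool>> pruningResumed,
        int maxAttempts,
        TimeSpan delayBetweenAttempts,
        CancellationToken cancellationToken)
    {
        // Step 1: stop the retired peer from gating prune decisions.
        await peerManagement.RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false, cancellationToken);

        // Step 2: validate that pruning resumes.
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await pruningResumed(cancellationToken))
            {
                return true;
            }

            if (attempt < maxAttempts)
            {
                await Task.Delay(delayBetweenAttempts, cancellationToken);
            }
        }

        // Step 3 (RemoveRemotePeerAsync) stays a manual, approved action.
        return false;
    }
}
```

A `false` result suggests that pruning is blocked by something other than the de-tracked peer, which is worth investigating before removing anything else.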

Re-activation path

If a deprecated peer returns and should gate pruning again:

  1. Ensure peer config is enabled/available.
  2. Allow sync to re-register and refresh confirmations.
  3. Watch health payload until peer exits peersWithNoConfirmation.