Peer Deprecation & Removal Runbook
Operational workflow for safely deprecating or removing peers in clusters using peer-confirmed pruning.
When to use this runbook
- A site is permanently decommissioned.
- A peer has been unreachable long enough to block prune progress.
- A peer is being replaced and should stop gating prune decisions.
Decision matrix
| Scenario | Action |
|---|---|
| Peer is temporarily offline and expected to return soon | Keep tracking; monitor lag and confirmations. |
| Peer should stay configured but must stop gating pruning | RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false) |
| Peer is permanently removed from topology | RemoveRemotePeerAsync(nodeId) |
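
The matrix above can be expressed as a small dispatch helper. This is only a sketch: the `IPeerManagement` interface, the `PeerIntent` enum, and the `PeerDeprecation` class are hypothetical names introduced for illustration, and the method signatures are assumed from the calls shown in this runbook rather than taken from the actual library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real peer-management service. Only the two
// method names come from this runbook; the exact signatures are assumed.
public interface IPeerManagement
{
    Task RemovePeerTrackingAsync(string nodeId, bool removeRemoteConfig, CancellationToken cancellationToken);
    Task RemoveRemotePeerAsync(string nodeId, CancellationToken cancellationToken);
}

// Hypothetical classification of the operator's intent for a peer.
public enum PeerIntent
{
    TemporarilyOffline,  // expected to return soon
    StopGatingOnly,      // stays configured, must stop gating pruning
    PermanentlyRemoved   // leaves the topology
}

public static class PeerDeprecation
{
    // Applies the decision matrix. Returns true if a change was made.
    public static async Task<bool> ApplyAsync(
        IPeerManagement peerManagement,
        string nodeId,
        PeerIntent intent,
        CancellationToken cancellationToken = default)
    {
        switch (intent)
        {
            case PeerIntent.TemporarilyOffline:
                // Keep tracking; the operator monitors lag and confirmations.
                return false;

            case PeerIntent.StopGatingOnly:
                await peerManagement.RemovePeerTrackingAsync(
                    nodeId, removeRemoteConfig: false, cancellationToken);
                return true;

            case PeerIntent.PermanentlyRemoved:
                await peerManagement.RemoveRemotePeerAsync(nodeId, cancellationToken);
                return true;

            default:
                throw new ArgumentOutOfRangeException(nameof(intent));
        }
    }
}
```

Keeping the intent as an explicit enum forces the "temporary outage vs. decommission" decision to be made before any call that changes tracking.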
Procedure
- Confirm peer intent (temporary outage vs. decommission).
- Inspect the health payload:
  - peersWithNoConfirmation
  - laggingPeers
  - lastSuccessfulConfirmationUpdateByPeer
- If deprecating from prune gating only, run:

      await peerManagement.RemovePeerTrackingAsync(
          nodeId: "peer-to-deprecate",
          removeRemoteConfig: false,
          cancellationToken);

- If permanently removing, run:

      await peerManagement.RemoveRemotePeerAsync("peer-to-remove", cancellationToken);

- Re-check /health and verify the status transition: Degraded/Unhealthy should clear if the removed peer was the cause.
- Confirm maintenance logs no longer report prune blocking for that peer.
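
The health re-check in the procedure above could be scripted roughly as follows. The payload shape is an assumption: this sketch supposes that `peersWithNoConfirmation` and `laggingPeers` are JSON arrays of node ID strings at the top level of the health response, which this runbook does not confirm. `HealthPayload` and `IsPeerFlagged` are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class HealthPayload
{
    // Returns true if nodeId is listed in peersWithNoConfirmation or
    // laggingPeers. ASSUMPTION: both fields are top-level arrays of strings.
    public static bool IsPeerFlagged(string healthJson, string nodeId)
    {
        using JsonDocument doc = JsonDocument.Parse(healthJson);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
        {
            return false;
        }

        return ListContains(root, "peersWithNoConfirmation", nodeId)
            || ListContains(root, "laggingPeers", nodeId);
    }

    private static bool ListContains(JsonElement root, string field, string nodeId)
    {
        // A missing or non-array field is treated as "peer not listed".
        if (!root.TryGetProperty(field, out JsonElement list)
            || list.ValueKind != JsonValueKind.Array)
        {
            return false;
        }

        foreach (JsonElement item in list.EnumerateArray())
        {
            if (item.ValueKind == JsonValueKind.String && item.GetString() == nodeId)
            {
                return true;
            }
        }
        return false;
    }
}
```

After a removal, `IsPeerFlagged(json, "peer-to-remove")` returning `false` is consistent with the peer no longer gating pruning, but the maintenance logs remain the authoritative check.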
Post-change verification
- Peer no longer appears in active tracked peers.
- maxLagMs trends reflect only the current active peers.
- Pruning resumes with a valid effective cutoff (or a known non-peer reason).
Emergency path
If pruning is blocked and storage pressure is high:
- De-track the clearly retired peer first (removeRemoteConfig: false).
- Validate pruning resumes.
- Perform full peer removal after change-control approval.
Re-activation path
If a deprecated peer returns and should gate pruning again:
- Ensure peer config is enabled/available.
- Allow sync to re-register and refresh confirmations.
- Watch the health payload until the peer exits peersWithNoConfirmation.
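
The final watch step can be automated with a bounded polling loop. In this sketch the `fetchHealthJson` delegate, the `ReactivationWatch` class, and the polling parameters are all hypothetical, and `peersWithNoConfirmation` is again assumed to be a top-level JSON array of node ID strings. How the health payload is fetched is left to the caller.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class ReactivationWatch
{
    // Polls the health payload until nodeId is no longer listed in
    // peersWithNoConfirmation. Returns false if maxAttempts is exhausted.
    public static async Task<bool> WaitUntilConfirmedAsync(
        Func<CancellationToken, Task<string>> fetchHealthJson, // caller-supplied fetch
        string nodeId,
        TimeSpan pollInterval,
        int maxAttempts,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string json = await fetchHealthJson(cancellationToken);
            if (!IsUnconfirmed(json, nodeId))
            {
                return true;
            }

            // Wait between attempts, but not after the last one.
            if (attempt + 1 < maxAttempts)
            {
                await Task.Delay(pollInterval, cancellationToken);
            }
        }
        return false;
    }

    // ASSUMPTION: peersWithNoConfirmation is a top-level array of strings.
    // A missing or non-array field is treated as "peer not listed".
    private static bool IsUnconfirmed(string healthJson, string nodeId)
    {
        using JsonDocument doc = JsonDocument.Parse(healthJson);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("peersWithNoConfirmation", out JsonElement list)
            || list.ValueKind != JsonValueKind.Array)
        {
            return false;
        }

        foreach (JsonElement item in list.EnumerateArray())
        {
            if (item.ValueKind == JsonValueKind.String && item.GetString() == nodeId)
            {
                return true;
            }
        }
        return false;
    }
}
```

Bounding the loop with `maxAttempts` keeps a peer that never re-registers from stalling the operator indefinitely. A `false` result is a cue to re-check the peer's configuration and sync status.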