Initial import of the CBDDC codebase with docs and tests. Add a .NET-focused gitignore to keep generated artifacts out of source control.
Some checks failed
CI / verify (push) Has been cancelled
66
docs/peer-deprecation-removal-runbook.md
Normal file
@@ -0,0 +1,66 @@
# Peer Deprecation & Removal Runbook

Operational workflow for safely deprecating or removing peers in clusters using
peer-confirmed pruning.

## When to use this runbook

- A site is permanently decommissioned.
- A peer has been unreachable long enough to block prune progress.
- A peer is being replaced and should stop gating prune decisions.

## Decision matrix

| Scenario | Action |
|------|-----------|
| Peer is temporarily offline and expected to return soon | Keep tracking; monitor lag and confirmations. |
| Peer should stay configured but must stop gating pruning | `RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false)` |
| Peer is permanently removed from topology | `RemoveRemotePeerAsync(nodeId)` |

## Procedure

1. Confirm peer intent (temporary outage vs. decommission).
2. Inspect health payload:
   - `peersWithNoConfirmation`
   - `laggingPeers`
   - `lastSuccessfulConfirmationUpdateByPeer`
3. If deprecating from prune gating only, run:

```csharp
await peerManagement.RemovePeerTrackingAsync(
    nodeId: "peer-to-deprecate",
    removeRemoteConfig: false,
    cancellationToken);
```

4. If permanently removing, run:

```csharp
await peerManagement.RemoveRemotePeerAsync("peer-to-remove", cancellationToken);
```

5. Re-check `/health` and verify status transition:
   - `Degraded`/`Unhealthy` should clear if the removed peer was the cause.
6. Confirm maintenance logs no longer report prune blocking for that peer.
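
Step 2 of the procedure can be partly automated. The following is a minimal sketch, assuming the health payload is JSON and that `peersWithNoConfirmation` and `laggingPeers` are arrays of node ID strings. The nesting of the payload is not specified in this runbook, so the helper searches for a field by name instead of assuming a fixed path. `HealthPayloadInspector`, `FindProperty`, and `ReadPeerList` are hypothetical names used for illustration, not part of the CBDDC API.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class HealthPayloadInspector
{
    // Depth-first search for the first property with the given name.
    // Assumption: the field name is unique enough that the first match is the right one.
    public static JsonElement? FindProperty(JsonElement element, string name)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in element.EnumerateObject())
            {
                if (property.NameEquals(name)) return property.Value;
                JsonElement? nested = FindProperty(property.Value, name);
                if (nested.HasValue) return nested;
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in element.EnumerateArray())
            {
                JsonElement? nested = FindProperty(item, name);
                if (nested.HasValue) return nested;
            }
        }
        return null;
    }

    // Assumption: the named field is a JSON array of node ID strings.
    // Returns an empty list when the field is missing or has another shape.
    public static IReadOnlyList<string> ReadPeerList(string healthJson, string fieldName)
    {
        using JsonDocument document = JsonDocument.Parse(healthJson);
        JsonElement? field = FindProperty(document.RootElement, fieldName);
        var peers = new List<string>();
        if (field is { ValueKind: JsonValueKind.Array } array)
        {
            foreach (JsonElement item in array.EnumerateArray())
            {
                // Copy the strings out before the document is disposed.
                if (item.ValueKind == JsonValueKind.String)
                    peers.Add(item.GetString()!);
            }
        }
        return peers;
    }
}
```

For example, `HealthPayloadInspector.ReadPeerList(json, "peersWithNoConfirmation")` would list the node IDs to review before choosing between steps 3 and 4, where `json` is the response body fetched from `/health`. Fetching the payload is left out on purpose, because the endpoint may return a non-success status code while the node is `Degraded` or `Unhealthy`.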
## Post-change verification

- Peer no longer appears in active tracked peers.
- `maxLagMs` trends with only currently active peers.
- Pruning resumes with a valid effective cutoff (or a known non-peer reason).

## Emergency path

If pruning is blocked and storage pressure is high:

1. De-track the clearly retired peer first (`removeRemoteConfig: false`).
2. Validate pruning resumes.
3. Perform full peer removal after change-control approval.
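
The ordering of the emergency steps can be sketched in code. This is an illustration under stated assumptions, not the CBDDC implementation: `IPeerManagementSketch` is a hypothetical stand-in that declares only the two calls shown in this runbook (their real return types and owning interface are not given here), and the pruning check and approval gate are hypothetical delegates supplied by the operator's tooling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the type behind `peerManagement`.
// Assumption: both calls return a plain Task.
public interface IPeerManagementSketch
{
    Task RemovePeerTrackingAsync(string nodeId, bool removeRemoteConfig, CancellationToken cancellationToken);
    Task RemoveRemotePeerAsync(string nodeId, CancellationToken cancellationToken);
}

public static class EmergencyPath
{
    public static async Task RunAsync(
        IPeerManagementSketch peerManagement,
        string retiredNodeId,
        Func<CancellationToken, Task<bool>> isPruningResumedAsync, // hypothetical check, e.g. derived from /health
        Func<bool> isFullRemovalApproved,                          // hypothetical change-control gate
        CancellationToken cancellationToken)
    {
        // 1. De-track only: the peer stops gating pruning but its remote config stays in place.
        await peerManagement.RemovePeerTrackingAsync(
            retiredNodeId,
            removeRemoteConfig: false,
            cancellationToken);

        // 2. Stop here if pruning is still blocked; the cause is then something other than this peer.
        if (!await isPruningResumedAsync(cancellationToken))
        {
            throw new InvalidOperationException(
                $"Pruning did not resume after de-tracking '{retiredNodeId}'.");
        }

        // 3. Full removal happens only once change control has approved it.
        if (isFullRemovalApproved())
        {
            await peerManagement.RemoveRemotePeerAsync(retiredNodeId, cancellationToken);
        }
    }
}
```

Keeping the de-track and the full removal as separate calls means the reversible step can relieve storage pressure immediately, while the irreversible step waits for approval.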
## Re-activation path

If a deprecated peer returns and should gate pruning again:

1. Ensure peer config is enabled/available.
2. Allow sync to re-register and refresh confirmations.
3. Watch health payload until peer exits `peersWithNoConfirmation`.
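
Step 3 amounts to polling until the peer is no longer listed as unconfirmed. A minimal sketch of such a loop follows, assuming the caller supplies a delegate that returns the current contents of `peersWithNoConfirmation`. `ReactivationWatcher`, `WaitForConfirmationAsync`, and all parameter names are hypothetical, and the polling interval and attempt limit are operator choices rather than values defined by CBDDC.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ReactivationWatcher
{
    // Returns true once nodeId is absent from the unconfirmed list,
    // or false if it is still listed after maxAttempts polls.
    public static async Task<bool> WaitForConfirmationAsync(
        string nodeId,
        Func<CancellationToken, Task<IReadOnlyCollection<string>>> getPeersWithNoConfirmationAsync,
        TimeSpan pollInterval,
        int maxAttempts,
        CancellationToken cancellationToken)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            IReadOnlyCollection<string> unconfirmed =
                await getPeersWithNoConfirmationAsync(cancellationToken);

            // The peer has exited the list, so its confirmations are being tracked again.
            if (!unconfirmed.Contains(nodeId))
            {
                return true;
            }

            // Wait between polls, but not after the final attempt.
            if (attempt < maxAttempts - 1)
            {
                await Task.Delay(pollInterval, cancellationToken);
            }
        }

        return false;
    }
}
```

A `false` result is a signal to revisit steps 1 and 2 rather than to keep waiting indefinitely.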