Upgrade Notes: Peer-Confirmed Pruning
This guide covers adopting peer-confirmed pruning semantics introduced across Phases 1-4.
What changed
- Oplog pruning now uses an effective cutoff:
  - min(retention cutoff, confirmation cutoff) when peer confirmations are complete.
  - Pruning is skipped for that cycle when an active tracked peer is missing a required confirmation.
- Peer tracking is now a managed lifecycle:
  - RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false) deprecates a peer: it stops gating pruning, but its static peer config is kept.
  - RemoveRemotePeerAsync(nodeId) removes both the static peer config and its tracking.
- Hosting health now includes confirmation lag semantics:
  - Degraded for lagging or unconfirmed tracked peers.
  - Unhealthy for critical lag or storage failures.
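The cutoff rule above can be sketched as a small pure function. This is a minimal illustration, not the library's implementation: PruneCutoff and ComputeEffectiveCutoff are hypothetical names, cutoffs are modelled as DateTimeOffset values (the real cutoff type may differ), and the behaviour with zero active tracked peers is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PruneCutoff
{
    // Hypothetical helper. Input maps each active tracked peer's node ID to the newest
    // point it has confirmed, or null if it has not confirmed anything yet.
    // Returns the cutoff to prune up to, or null when this cycle's prune is skipped.
    public static DateTimeOffset? ComputeEffectiveCutoff(
        DateTimeOffset retentionCutoff,
        IReadOnlyDictionary<string, DateTimeOffset?> activePeerConfirmations)
    {
        // Any active tracked peer missing a confirmation blocks pruning this cycle.
        if (activePeerConfirmations.Values.Any(confirmed => confirmed is null))
        {
            return null;
        }

        // Assumption: with no active tracked peers, only the retention cutoff applies.
        if (activePeerConfirmations.Count == 0)
        {
            return retentionCutoff;
        }

        // Confirmation cutoff: the oldest point that every active peer has confirmed.
        DateTimeOffset confirmationCutoff =
            activePeerConfirmations.Values.Min(confirmed => confirmed.Value);

        // Effective cutoff = min(retention cutoff, confirmation cutoff).
        return retentionCutoff < confirmationCutoff ? retentionCutoff : confirmationCutoff;
    }
}
```

Taking the minimum means a slow peer can only make pruning more conservative, never more aggressive than retention alone allows.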
Upgrade impact to expect
- During initial rollout, a peer may appear in peersWithNoConfirmation until its first successful confirmation update.
- Any stale active tracked peer can block prune progress, keep health Degraded, or both.
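Why an unconfirmed or stale peer keeps health Degraded can be seen in a sketch of the classification rules listed under What changed. It is illustrative only: HealthState, PeerLag, ConfirmationHealth, the two threshold parameters, and the strict ">" comparisons are all assumptions rather than the real health check or its configuration options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum HealthState { Healthy, Degraded, Unhealthy }

// Hypothetical snapshot of one tracked peer; Lag is null when it has never confirmed.
public sealed record PeerLag(string NodeId, TimeSpan? Lag);

public static class ConfirmationHealth
{
    // Classifies health from tracked-peer lag. Threshold names and the strict ">"
    // comparisons are assumptions, not the real configuration options.
    public static (HealthState State, List<string> LaggingPeers, List<string> PeersWithNoConfirmation) Evaluate(
        IEnumerable<PeerLag> trackedPeers,
        TimeSpan laggingThreshold,
        TimeSpan criticalThreshold,
        bool storageFailed)
    {
        List<PeerLag> peers = trackedPeers.ToList();

        // Peers that have not confirmed anything yet (expected briefly during rollout).
        List<string> noConfirmation = peers
            .Where(p => p.Lag is null)
            .Select(p => p.NodeId)
            .ToList();

        // Peers whose confirmation lag exceeds the lagging threshold.
        List<string> lagging = peers
            .Where(p => p.Lag is TimeSpan lag && lag > laggingThreshold)
            .Select(p => p.NodeId)
            .ToList();

        bool criticalLag = peers.Any(p => p.Lag is TimeSpan lag && lag > criticalThreshold);

        // Storage failure or critical lag wins; otherwise any lagging or unconfirmed peer degrades.
        HealthState state =
            storageFailed || criticalLag ? HealthState.Unhealthy
            : lagging.Count > 0 || noConfirmation.Count > 0 ? HealthState.Degraded
            : HealthState.Healthy;

        return (state, lagging, noConfirmation);
    }
}
```

Under these rules a retired peer that is still tracked never confirms, so it stays in peersWithNoConfirmation and holds the node at Degraded until its tracking is removed.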
Recommended rollout sequence
- Upgrade one node and validate health payload and pruning logs.
- Upgrade remaining nodes in the cluster.
- Audit peer inventory and remove/deprecate stale peers.
- Tune lag thresholds after observing normal confirmation latency.
Peer inventory and cleanup
List configured peers:
var peers = await peerManagement.GetAllRemotePeersAsync(cancellationToken);
Deprecate from pruning only:
await peerManagement.RemovePeerTrackingAsync(
nodeId: "retired-peer",
removeRemoteConfig: false,
cancellationToken);
Fully remove peer + tracking:
await peerManagement.RemoveRemotePeerAsync("retired-peer", cancellationToken);
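The three calls above can be combined into a single audit pass: deprecate peers that are retired but whose config should stay, and fully remove peers that are confirmed decommissioned. This is a sketch under stated assumptions: IPeerManagement is a hypothetical interface mirroring the calls shown, GetAllRemotePeersAsync is assumed to return plain node IDs (the real method likely returns richer peer records), and PeerAudit and InMemoryPeerManagement are illustrative names only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface mirroring the calls above; real return types may differ.
public interface IPeerManagement
{
    Task<IReadOnlyList<string>> GetAllRemotePeersAsync(CancellationToken cancellationToken);
    Task RemovePeerTrackingAsync(string nodeId, bool removeRemoteConfig, CancellationToken cancellationToken);
    Task RemoveRemotePeerAsync(string nodeId, CancellationToken cancellationToken);
}

public static class PeerAudit
{
    // Deprecates retired peers from prune gating and fully removes decommissioned ones.
    // Returns a log of the actions taken, in inventory order.
    public static async Task<List<string>> CleanupAsync(
        IPeerManagement peerManagement,
        ISet<string> retired,
        ISet<string> decommissioned,
        CancellationToken cancellationToken)
    {
        var actions = new List<string>();
        var peers = await peerManagement.GetAllRemotePeersAsync(cancellationToken);
        foreach (var nodeId in peers)
        {
            if (decommissioned.Contains(nodeId))
            {
                // Confirmed decommissioned: drop static config and tracking.
                await peerManagement.RemoveRemotePeerAsync(nodeId, cancellationToken);
                actions.Add($"removed:{nodeId}");
            }
            else if (retired.Contains(nodeId))
            {
                // Retired but config kept: stop this peer from gating prune.
                await peerManagement.RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false, cancellationToken);
                actions.Add($"deprecated:{nodeId}");
            }
        }
        return actions;
    }
}

// In-memory stand-in for trying the routine without a cluster (hypothetical).
public sealed class InMemoryPeerManagement : IPeerManagement
{
    public List<string> Configured { get; } = new List<string>();
    public HashSet<string> Tracked { get; } = new HashSet<string>();

    // Returns a snapshot copy so callers can remove peers while iterating.
    public Task<IReadOnlyList<string>> GetAllRemotePeersAsync(CancellationToken cancellationToken) =>
        Task.FromResult<IReadOnlyList<string>>(Configured.ToList());

    public Task RemovePeerTrackingAsync(string nodeId, bool removeRemoteConfig, CancellationToken cancellationToken)
    {
        Tracked.Remove(nodeId);
        if (removeRemoteConfig)
        {
            Configured.Remove(nodeId);
        }
        return Task.CompletedTask;
    }

    public Task RemoveRemotePeerAsync(string nodeId, CancellationToken cancellationToken) =>
        RemovePeerTrackingAsync(nodeId, removeRemoteConfig: true, cancellationToken);
}
```

Keeping the retired and decommissioned sets as explicit operator input, rather than inferring them from lag, avoids removing a peer that is only temporarily offline.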
Validation checklist
- /health returns Healthy, or the expected transient Degraded during warm-up.
- laggingPeers and peersWithNoConfirmation converge toward zero for active peers.
- Maintenance logs no longer report prune skip reasons for retired peers.
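The first two checklist items can be automated against a captured /health response. This sketch assumes the payload is JSON with a top-level status string and laggingPeers / peersWithNoConfirmation arrays of node IDs; the real payload may nest these fields differently, and HealthCheckValidation, IsAcceptable, and the warmingUp flag are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class HealthCheckValidation
{
    // Assumed payload shape: top-level "status" string plus "laggingPeers" and
    // "peersWithNoConfirmation" arrays of node IDs. Adjust paths to the real payload.
    public static bool IsAcceptable(string healthJson, ISet<string> activePeers, bool warmingUp)
    {
        using JsonDocument doc = JsonDocument.Parse(healthJson);
        JsonElement root = doc.RootElement;
        string status = root.GetProperty("status").GetString() ?? "";

        // Healthy is always fine; Degraded is tolerated only during warm-up.
        bool statusOk = status == "Healthy" || (warmingUp && status == "Degraded");

        // After warm-up, no active peer should remain lagging or unconfirmed.
        bool peersClear = warmingUp
            || (!Names(root, "laggingPeers").Any(n => activePeers.Contains(n))
                && !Names(root, "peersWithNoConfirmation").Any(n => activePeers.Contains(n)));

        return statusOk && peersClear;
    }

    // Reads a string array property, treating a missing or non-array property as empty.
    private static IEnumerable<string> Names(JsonElement root, string property) =>
        root.TryGetProperty(property, out JsonElement array) && array.ValueKind == JsonValueKind.Array
            ? array.EnumerateArray().Select(e => e.GetString() ?? "").ToList()
            : Enumerable.Empty<string>();
}
```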
Rollback/mitigation
If rollout exposes unexpected persistent degradation:
- Remove tracking for permanently retired peers.
- Temporarily raise lag thresholds to reduce alert noise while investigating.
- Reserve full peer removal for nodes that are confirmed decommissioned.
For a detailed operator procedure, see the Peer Deprecation & Removal Runbook.