Initial import of the CBDDC codebase with docs and tests. Add a .NET-focused gitignore to keep generated artifacts out of source control.
docs/upgrade-peer-confirmed-pruning.md
@@ -0,0 +1,69 @@
# Upgrade Notes: Peer-Confirmed Pruning

This guide covers adopting peer-confirmed pruning semantics introduced across
Phases 1-4.

## What changed

1. Oplog pruning now uses an **effective cutoff**:
   - `min(retention cutoff, confirmation cutoff)` when peer confirmations are complete.
   - Pruning is skipped for that cycle when an active tracked peer is missing a required confirmation.
2. Peer tracking is now a managed lifecycle:
   - `RemovePeerTrackingAsync(nodeId, removeRemoteConfig: false)` deprecates a peer from prune gating.
   - `RemoveRemotePeerAsync(nodeId)` removes both the static peer config and tracking.
3. Hosting health now includes confirmation lag semantics:
   - `Degraded` for lagging or unconfirmed tracked peers.
   - `Unhealthy` for critical lag or storage failures.
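
The effective-cutoff rule in item 1 can be sketched roughly as follows. This is an illustration only, not the CBDDC implementation: `ComputeEffectiveCutoff`, the `DateTimeOffset` representation of cutoffs, and the dictionary of per-peer confirmations are all assumptions made for the example, as is the choice to fall back to the retention cutoff when no peers are tracked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a CBDDC API): returns the effective prune cutoff,
// or null when the prune cycle should be skipped.
// confirmedUpTo maps each active tracked peer to the point it has confirmed,
// with null meaning "no confirmation recorded yet".
DateTimeOffset? ComputeEffectiveCutoff(
    DateTimeOffset retentionCutoff,
    IReadOnlyDictionary<string, DateTimeOffset?> confirmedUpTo)
{
    // Start from the retention cutoff; with no tracked peers it is returned as-is
    // (an assumption of this sketch).
    var effective = retentionCutoff;

    foreach (var confirmed in confirmedUpTo.Values)
    {
        // An active tracked peer without a confirmation blocks pruning this cycle.
        if (confirmed is not DateTimeOffset value)
            return null;

        // The slowest peer bounds the cutoff: min(retention, confirmation).
        if (value < effective)
            effective = value;
    }

    return effective;
}

var retention = new DateTimeOffset(2024, 1, 10, 0, 0, 0, TimeSpan.Zero);
var peers = new Dictionary<string, DateTimeOffset?>
{
    ["node-a"] = retention.AddDays(-2), // slowest peer, so its confirmation wins
    ["node-b"] = retention.AddDays(1),
};
Console.WriteLine(ComputeEffectiveCutoff(retention, peers));
```

Returning `null` rather than an older cutoff keeps the "skip this cycle" case explicit, which matches the skip behavior described above.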

## Upgrade impact to expect

- During initial rollout, a peer may appear in `peersWithNoConfirmation` until the
  first successful confirmation update.
- Any stale active tracked peer can block prune progress, keep health degraded, or both.

## Recommended rollout sequence

1. Upgrade one node and validate health payload and pruning logs.
2. Upgrade remaining nodes in the cluster.
3. Audit peer inventory and remove/deprecate stale peers.
4. Tune lag thresholds after observing normal confirmation latency.

## Peer inventory and cleanup

List configured peers:

```csharp
var peers = await peerManagement.GetAllRemotePeersAsync(cancellationToken);
```

Deprecate from pruning only:

```csharp
await peerManagement.RemovePeerTrackingAsync(
    nodeId: "retired-peer",
    removeRemoteConfig: false,
    cancellationToken);
```

Fully remove peer + tracking:

```csharp
await peerManagement.RemoveRemotePeerAsync("retired-peer", cancellationToken);
```
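
Which peers count as stale is an operator decision. One possible way to shortlist candidates is sketched below, assuming you can obtain a last-confirmation timestamp per tracked peer. The `SelectStalePeers` helper, the tuple shape, and the 7-day threshold are hypothetical and are not part of the CBDDC API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns node IDs whose last confirmation is older than staleAfter.
// Peers that have never confirmed are left out on purpose, because they may simply
// be warming up after the upgrade and need manual review instead.
IReadOnlyList<string> SelectStalePeers(
    IEnumerable<(string NodeId, DateTimeOffset? LastConfirmedAt)> trackedPeers,
    DateTimeOffset now,
    TimeSpan staleAfter)
{
    return trackedPeers
        .Where(p => p.LastConfirmedAt is DateTimeOffset at && now - at > staleAfter)
        .Select(p => p.NodeId)
        .ToList();
}

var asOf = new DateTimeOffset(2024, 1, 10, 0, 0, 0, TimeSpan.Zero);
var inventory = new (string NodeId, DateTimeOffset? LastConfirmedAt)[]
{
    ("node-a", asOf.AddMinutes(-5)),     // recently confirmed
    ("retired-peer", asOf.AddDays(-30)), // long silent
    ("new-peer", null),                  // no confirmation yet
};

foreach (var nodeId in SelectStalePeers(inventory, asOf, TimeSpan.FromDays(7)))
{
    // Each candidate could then be passed to RemovePeerTrackingAsync (deprecate only)
    // or RemoveRemotePeerAsync (full removal), as shown above.
    Console.WriteLine(nodeId);
}
```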

## Validation checklist

- `/health` returns `Healthy`, or an expected transient `Degraded` during warm-up.
- `laggingPeers` and `peersWithNoConfirmation` converge toward zero for active peers.
- Maintenance logs no longer report prune skip reasons for retired peers.
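
To reason about which status to expect while validating, the health semantics described earlier can be approximated as below. This is a sketch under assumptions: `ClassifyHealth`, the two threshold parameters, and the use of plain strings for status names are illustrative and do not mirror the actual health check code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical classifier. confirmationLag maps each active tracked peer to its
// confirmation lag, with null meaning "no confirmation recorded yet".
string ClassifyHealth(
    IReadOnlyDictionary<string, TimeSpan?> confirmationLag,
    TimeSpan lagThreshold,
    TimeSpan criticalLagThreshold,
    bool storageFailed)
{
    // Storage failures are reported as Unhealthy regardless of peer state.
    if (storageFailed)
        return "Unhealthy";

    var degraded = false;
    foreach (var peerLag in confirmationLag.Values)
    {
        if (peerLag is not TimeSpan value)
        {
            degraded = true; // unconfirmed tracked peer
            continue;
        }

        if (value >= criticalLagThreshold)
            return "Unhealthy"; // critical lag
        if (value >= lagThreshold)
            degraded = true;    // lagging peer
    }

    return degraded ? "Degraded" : "Healthy";
}

var observedLag = new Dictionary<string, TimeSpan?>
{
    ["node-a"] = TimeSpan.FromSeconds(5),
    ["node-b"] = null, // no confirmation yet, for example during warm-up
};
Console.WriteLine(ClassifyHealth(
    observedLag, TimeSpan.FromSeconds(30), TimeSpan.FromMinutes(5), storageFailed: false));
// prints "Degraded"
```

In this sketch, raising `lagThreshold` only affects the `Degraded` boundary, so an unconfirmed peer still reports `Degraded` until it confirms or its tracking is removed.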

## Rollback/mitigation

If rollout exposes unexpected persistent degradation:

1. Remove tracking for permanently retired peers.
2. Temporarily raise lag thresholds to reduce alert noise while investigating.
3. Reserve full peer removal for nodes that are confirmed decommissioned.

For a detailed operator procedure, see
[Peer Deprecation & Removal Runbook](peer-deprecation-removal-runbook.html).