docs: align internal docs to enterprise standards
All checks were successful
CI / verify (push) Successful in 2m33s
Add canonical operations/security/access/feature docs and fix path integrity to improve onboarding and incident readiness.
68
docs/features/peer-confirmed-pruning.md
Normal file
@@ -0,0 +1,68 @@
# Feature: Peer-Confirmed Pruning

## Purpose and Business Outcome

Prune oplog history safely while preventing data loss for active peers that have not confirmed required streams.

## Scope and Non-Goals

Scope:

- Retention cutoff plus peer confirmation cutoff logic.
- Operational controls for peer tracking and de-tracking.

Non-goals:

- Automatic removal of retired peers without operator action.
- Replacement for backup/restore strategy.

## User and System Workflows

1. Maintenance job calculates retention and confirmation cutoffs.
2. System blocks pruning when required confirmations are missing.
3. Operator de-tracks retired peers when appropriate.
4. Pruning resumes once constraints are satisfied.

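The workflow above can be sketched as a single decision function. This is a minimal illustration, not the project's implementation: `TrackedPeer`, `PruneDecision`, and `decide_prune` are hypothetical names, and the rule that the effective cutoff is the older of the retention cutoff and the oldest peer confirmation is an assumption about how the two cutoffs combine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TrackedPeer:
    # Hypothetical shape; the real peer tracking metadata may differ.
    peer_id: str
    # Per source stream: the point up to which the peer has confirmed,
    # or None when no confirmation has been received yet.
    confirmed_through: dict[str, Optional[datetime]]


@dataclass
class PruneDecision:
    allowed: bool
    cutoff: Optional[datetime]
    blocking_peers: list[str]


def decide_prune(
    now: datetime,
    retention: timedelta,
    required_streams: list[str],
    peers: list[TrackedPeer],
) -> PruneDecision:
    """Step 1: compute both cutoffs. Step 2: block if a confirmation is missing."""
    retention_cutoff = now - retention
    blocking: list[str] = []
    confirmed: list[datetime] = []

    for peer in peers:
        for stream in required_streams:
            ts = peer.confirmed_through.get(stream)
            if ts is None:
                # A tracked peer without a required confirmation blocks pruning.
                blocking.append(peer.peer_id)
                break
            confirmed.append(ts)

    if blocking:
        return PruneDecision(False, None, sorted(blocking))

    # Assumption: never prune past what the slowest tracked peer has confirmed.
    confirmation_cutoff = min(confirmed, default=retention_cutoff)
    return PruneDecision(True, min(retention_cutoff, confirmation_cutoff), [])
```

In this sketch, steps 3 and 4 correspond to calling `decide_prune` again with the retired peer removed from `peers`: once no tracked peer is missing a confirmation, the function returns an allowed decision with a concrete cutoff.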
## Interfaces, APIs, and Events Involved

- Maintenance pruning scheduler
- Peer tracking operations (`RemovePeerTrackingAsync`, `RemoveRemotePeerAsync`)
- Health metrics for lag and missing confirmations

## Permissions and Data Handling

- Only approved operators should modify peer tracking state.
- Pruning decisions must be auditable through logs and incident notes.

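One way to make both requirements concrete is to gate every peer tracking change behind an operator check and emit a structured log line for it. The sketch below is illustrative only: the `APPROVED_OPERATORS` set, the `record_peer_tracking_change` function, the logger name, and the record fields are all assumptions, not part of the documented system.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Hypothetical logger name; route it to wherever audit logs are retained.
audit_log = logging.getLogger("pruning.audit")

# Hypothetical allow-list; a real deployment would likely consult an access system.
APPROVED_OPERATORS = frozenset({"alice"})


def record_peer_tracking_change(
    operator: str,
    action: str,
    peer_id: str,
    reason: str,
    at: Optional[datetime] = None,
) -> str:
    """Reject unapproved operators, then log one JSON line describing the change."""
    if operator not in APPROVED_OPERATORS:
        raise PermissionError(f"{operator} may not modify peer tracking state")

    entry = {
        "at": (at or datetime.now(timezone.utc)).isoformat(),
        "operator": operator,
        "action": action,
        "peer_id": peer_id,
        "reason": reason,  # e.g. a link to the incident note
    }
    line = json.dumps(entry, sort_keys=True)
    audit_log.info(line)
    return line
```

Keeping the record as a single JSON line makes it straightforward to search by `peer_id` or `operator` when reconstructing why a pruning decision changed.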
## Dependencies and Failure Modes

Dependencies:

- Accurate peer tracking metadata
- Timely confirmation updates per source stream

Failure modes:

- Pruning blocked indefinitely by stale peer tracking
- Unsafe pruning if controls are bypassed

## Monitoring, Alerts, and Troubleshooting Pointers

- Monitor peers with missing confirmations and sustained lag.
- Use [Peer Deprecation and Removal Runbook](../peer-deprecation-removal-runbook.md) for operational actions.

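As a rough illustration of what "sustained lag" could mean in an alert rule, the sketch below flags a peer only when its most recent lag samples all exceed a threshold, so a single slow sample does not page anyone. The function names, status strings, and the default of three samples are assumptions chosen for the example.

```python
from datetime import timedelta
from typing import Sequence


def has_sustained_lag(
    lag_samples: Sequence[timedelta],
    threshold: timedelta,
    min_samples: int = 3,
) -> bool:
    """True when the last `min_samples` samples (min_samples >= 1) all exceed the threshold."""
    if len(lag_samples) < min_samples:
        return False  # not enough history to call the lag sustained
    return all(sample > threshold for sample in lag_samples[-min_samples:])


def peer_health(
    has_missing_confirmation: bool,
    lag_samples: Sequence[timedelta],
    threshold: timedelta,
) -> str:
    """Classify a peer for dashboards; missing confirmations take priority over lag."""
    if has_missing_confirmation:
        return "missing_confirmation"
    if has_sustained_lag(lag_samples, threshold):
        return "lagging"
    return "healthy"
```

A peer reported as `missing_confirmation` or `lagging` for an extended period is a candidate for the runbook linked above.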
## Rollout and Change Considerations

- Introduce pruning policy changes with explicit maintenance windows.
- Validate expected cutoff behavior in staging before production rollout.

## Validation and Testability Guidance

- Add tests for blocked prune when confirmations are missing.
- Add tests for resumed prune after de-tracking retired peers.
- Smoke test health status transitions around peer lifecycle changes.

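The first two test cases above might look like the following. This is a sketch against a hypothetical in-memory stand-in (`InMemoryPeerTracker`), not the real peer tracking store; its `remove_peer_tracking` method only loosely mirrors the role of `RemovePeerTrackingAsync` and does not reflect that method's actual signature.

```python
import unittest


class InMemoryPeerTracker:
    """Hypothetical stand-in for the real peer tracking store."""

    def __init__(self) -> None:
        # peer_id -> whether the peer has confirmed all required streams
        self._confirmed: dict[str, bool] = {}

    def track(self, peer_id: str, confirmed: bool = False) -> None:
        self._confirmed[peer_id] = confirmed

    def remove_peer_tracking(self, peer_id: str) -> None:
        self._confirmed.pop(peer_id, None)

    def can_prune(self) -> bool:
        # Pruning is allowed only when every tracked peer has confirmed.
        return all(self._confirmed.values())


class PeerConfirmedPruningTests(unittest.TestCase):
    def test_prune_blocked_when_confirmation_missing(self) -> None:
        tracker = InMemoryPeerTracker()
        tracker.track("active-peer", confirmed=True)
        tracker.track("retired-peer", confirmed=False)
        self.assertFalse(tracker.can_prune())

    def test_prune_resumes_after_detracking_retired_peer(self) -> None:
        tracker = InMemoryPeerTracker()
        tracker.track("active-peer", confirmed=True)
        tracker.track("retired-peer", confirmed=False)
        tracker.remove_peer_tracking("retired-peer")
        self.assertTrue(tracker.can_prune())
```

Saved as a module, these can be run with `python -m unittest`. Real tests would exercise the actual pruning service and also cover the health status transitions mentioned in the third bullet.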
## Related Security Controls

- [Security](../security.md)
- [Runbook](../runbook.md)