Runbook
Purpose
This runbook provides standard operations, incident triage, escalation, and recovery procedures for CBDD maintainers.
Signals And Entry Points
- CI failures on `main`
- Failing integration tests in consumer repositories
- Regression issues labeled `incident`
- Recovery or corruption reports from consumers
Alert Triage Procedure
- Capture incident context: version, environment, failing operation, and first failure timestamp.
- Classify severity:
  - `SEV-1`: data loss risk, persistent startup failure, or transaction correctness risk.
  - `SEV-2`: feature-level regression without confirmed data loss.
  - `SEV-3`: non-critical behavior or documentation defects.
- Create or update the incident issue with owner and current mitigation status.
- Reproduce with targeted tests in `/Users/dohertj2/Desktop/CBDD/tests/CBDD.Tests`.
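
The severity classes above can be expressed as a small lookup, which helps keep classification consistent between maintainers. This is a minimal sketch: the finding keywords (`data-loss-risk`, `startup-failure`, and so on) are hypothetical labels chosen for illustration and are not defined anywhere in the project.

```bash
#!/usr/bin/env bash
# Sketch: map a triage finding to one of the runbook's severity levels.
# The finding keywords are hypothetical labels, not project conventions.
classify_severity() {
  local finding="$1"
  case "$finding" in
    data-loss-risk|startup-failure|transaction-correctness)
      echo "SEV-1" ;;
    feature-regression)
      echo "SEV-2" ;;
    non-critical|docs-defect)
      echo "SEV-3" ;;
    *)
      # Unknown findings are rejected so they get classified by hand.
      echo "unknown finding: $finding" >&2
      return 1 ;;
  esac
}

# Example: a persistent startup failure maps to SEV-1.
classify_severity startup-failure
```

Rejecting unknown keywords, rather than defaulting to a low severity, means an unfamiliar failure mode is never silently under-classified.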
Diagnostics
- Validate build and tests.
dotnet test CBDD.slnx -c Release
- Run coverage threshold gate when behavior changed in core paths.
bash scripts/coverage-check.sh
- For storage and recovery incidents, prioritize:
  - `StorageEngine.Recovery`
  - `WriteAheadLog`
  - transaction protocol tests
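
The diagnostic order above can be sketched as a dry-run helper that prints the commands to run rather than executing them. The `area` and `core_changed` arguments are hypothetical inputs, and the `--filter` expression assumes the relevant test names contain `Recovery`, `WriteAheadLog`, or `Transaction`; adjust it to the actual test names in the repository.

```bash
#!/usr/bin/env bash
# Sketch: print the diagnostic commands for an incident, in runbook order.
# "area" and "core_changed" are hypothetical inputs; the filter terms are
# assumptions about how the storage and recovery tests are named.
diagnostic_commands() {
  local area="$1" core_changed="$2"
  if [ "$area" = "storage" ]; then
    # Storage and recovery incidents: targeted tests come first.
    echo 'dotnet test CBDD.slnx -c Release --filter "FullyQualifiedName~Recovery|FullyQualifiedName~WriteAheadLog|FullyQualifiedName~Transaction"'
  fi
  # Full build and test validation always runs.
  echo 'dotnet test CBDD.slnx -c Release'
  if [ "$core_changed" = "yes" ]; then
    # Coverage gate only when behavior changed in core paths.
    echo 'bash scripts/coverage-check.sh'
  fi
}

# Example: a storage incident that touched core paths.
diagnostic_commands storage yes
```

Printing the plan first lets the incident owner paste the exact commands into the incident issue before running them.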
Escalation Path
- Initial owner: maintainer on incident issue.
- Escalate to release maintainer when severity is `SEV-1` or rollback is required.
- Communicate status updates on each milestone: triage complete, mitigation active, fix merged, validation complete.
Recovery Actions
- Contain impact by pinning consumers to last known-good package version.
- Apply rollback steps from `deployment.md`.
- Validate repaired build with targeted and full regression suites.
- Publish fixed package and confirm consumer recovery.
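
As an illustration of the containment step, the sketch below builds the `dotnet add package` command that pins a consumer project to a specific version. The project path, package ID, and version shown are placeholders, not real release data; substitute the last known-good version recorded in the incident issue.

```bash
#!/usr/bin/env bash
# Sketch: build the command that pins a consumer project to a known-good version.
# The project path, package ID, and version are placeholders for illustration.
pin_command() {
  local project="$1" package_id="$2" version="$3"
  if [ -z "$project" ] || [ -z "$package_id" ] || [ -z "$version" ]; then
    echo "usage: pin_command <project> <package-id> <version>" >&2
    return 2
  fi
  printf 'dotnet add "%s" package %s --version %s\n' "$project" "$package_id" "$version"
}

# Example with placeholder values.
pin_command src/Consumer/Consumer.csproj Example.Package 1.0.0
```

The helper only prints the command, so it can be reviewed and shared with consumer teams before anyone changes a project file.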
Post-Incident Expectations
- Document root cause, blast radius, and timeline.
- Add regression tests to prevent recurrence.
- Record follow-up actions in issue tracker with owners and due dates.