# Runbook

## Purpose

This runbook provides standard operations, incident triage, escalation, and recovery procedures for CBDD maintainers.

## Signals And Entry Points

- CI failures on `main`
- Failing integration tests in consumer repositories
- Regression issues labeled `incident`
- Recovery or corruption reports from consumers

## Alert Triage Procedure

1. Capture incident context: version, environment, failing operation, and first failure timestamp.
2. Classify severity:
   - `SEV-1`: data loss risk, persistent startup failure, or transaction correctness risk.
   - `SEV-2`: feature-level regression without confirmed data loss.
   - `SEV-3`: non-critical behavior or documentation defects.
3. Create or update the incident issue with owner and current mitigation status.
4. Reproduce with targeted tests in `/Users/dohertj2/Desktop/CBDD/tests/CBDD.Tests`.

## Diagnostics

1. Validate build and tests.

```bash
dotnet test CBDD.slnx -c Release
```

2. Run coverage threshold gate when behavior changed in core paths.

```bash
bash scripts/coverage-check.sh
```

3. For storage and recovery incidents, prioritize:
   - `StorageEngine.Recovery`
   - `WriteAheadLog`
   - transaction protocol tests

## Escalation Path

1. Initial owner: maintainer on incident issue.
2. Escalate to release maintainer when severity is `SEV-1` or rollback is required.
3. Communicate status updates on each milestone: triage complete, mitigation active, fix merged, validation complete.

## Recovery Actions

1. Contain impact by pinning consumers to last known-good package version.
2. Apply rollback steps from [`deployment.md`](deployment.md#rollback-procedure).
3. Validate repaired build with targeted and full regression suites.
4. Publish fixed package and confirm consumer recovery.

## Post-Incident Expectations

1. Document root cause, blast radius, and timeline.
2. Add regression tests to prevent recurrence.
3. Record follow-up actions in issue tracker with owners and due dates.
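
## Appendix: Triage Helper Sketch

The severity classification and context capture from the Alert Triage Procedure can be sketched as a small shell helper. This is only an illustration, not an existing script in the repository: the function names `classify_severity` and `incident_context` and their yes/no arguments are hypothetical. The first argument to `classify_severity` stands for any `SEV-1` condition (data loss risk, persistent startup failure, or transaction correctness risk).

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers; names and arguments are assumptions for illustration.

# classify_severity CRITICAL_RISK FEATURE_REGRESSION
# Each argument is "yes" or "no". Prints the matching severity label.
classify_severity() {
  local critical_risk="$1" feature_regression="$2"
  if [ "$critical_risk" = "yes" ]; then
    echo "SEV-1"   # data loss, persistent startup failure, or transaction correctness risk
  elif [ "$feature_regression" = "yes" ]; then
    echo "SEV-2"   # feature-level regression without confirmed data loss
  else
    echo "SEV-3"   # non-critical behavior or documentation defect
  fi
}

# incident_context VERSION ENVIRONMENT OPERATION FIRST_FAILURE
# Prints the four context fields to paste into the incident issue.
incident_context() {
  printf 'version: %s\n' "$1"
  printf 'environment: %s\n' "$2"
  printf 'failing operation: %s\n' "$3"
  printf 'first failure: %s\n' "$4"
}
```

For example, `classify_severity no yes` prints `SEV-2`, and the output of `incident_context` can be pasted directly into the incident issue when it is created or updated.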