# Runbook

## Purpose

This runbook provides standard operations, incident triage, escalation, and recovery procedures for CBDD maintainers.

## Signals and Entry Points

- CI failures on `main`
- Failing integration tests in consumer repositories
- Regression issues labeled `incident`
- Recovery or corruption reports from consumers

## Alert Triage Procedure

1. Capture the incident context: version, environment, failing operation, and timestamp of the first failure.
2. Classify severity:
   - `SEV-1`: data loss risk, persistent startup failure, or transaction correctness risk.
   - `SEV-2`: feature-level regression without confirmed data loss.
   - `SEV-3`: non-critical behavior or documentation defects.
3. Create or update the incident issue with the owner and current mitigation status.
4. Reproduce with targeted tests in `tests/CBDD.Tests` (relative to the repository root).
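
The severity rules in step 2 can be expressed as a small decision function, alongside a helper that prints the step 1 context for pasting into the incident issue. This is a sketch only: `classify_severity` and `incident_context` are hypothetical helpers, not part of the CBDD repository, and the yes/no arguments stand in for the maintainer's own judgment on each criterion.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not CBDD tooling): map triage answers to SEV levels.
# Each argument is "yes" or "no":
#   $1 data loss risk, $2 persistent startup failure,
#   $3 transaction correctness risk, $4 feature-level regression.
classify_severity() {
  local data_loss="$1" startup="$2" txn="$3" regression="$4"
  # Any SEV-1 criterion wins over everything else.
  if [[ "$data_loss" == yes || "$startup" == yes || "$txn" == yes ]]; then
    echo "SEV-1"
  elif [[ "$regression" == yes ]]; then
    echo "SEV-2"
  else
    echo "SEV-3"
  fi
}

# Hypothetical helper: print the incident context fields from step 1.
incident_context() {
  local version="$1" environment="$2" operation="$3" first_failure="$4"
  printf 'Version: %s\nEnvironment: %s\nFailing operation: %s\nFirst failure: %s\n' \
    "$version" "$environment" "$operation" "$first_failure"
}
```

For example, `classify_severity no no no yes` prints `SEV-2`: a feature-level regression with none of the SEV-1 risks present.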

## Diagnostics

1. Validate the build and tests.
   ```bash
   dotnet test CBDD.slnx -c Release
   ```
2. Run the coverage threshold gate when behavior in core paths has changed.
   ```bash
   bash scripts/coverage-check.sh
   ```
3. For storage and recovery incidents, prioritize:
   - `StorageEngine.Recovery`
   - `WriteAheadLog`
   - transaction protocol tests
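
To narrow a run to the priority areas in step 3, `dotnet test` accepts a `--filter` expression in which `~` means "contains" and `|` means OR. The sketch below builds such an expression. The helper names are hypothetical, the project path `tests/CBDD.Tests` is assumed to be relative to the repository root, and any fragment used for the transaction protocol tests is a guess that should be checked against the actual test class names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a `dotnet test --filter` expression that selects
# tests whose fully qualified name contains any of the given fragments.
build_test_filter() {
  local filter="" fragment
  for fragment in "$@"; do
    # Prepend "|" (OR) only once the expression is non-empty.
    filter+="${filter:+|}FullyQualifiedName~${fragment}"
  done
  printf '%s\n' "$filter"
}

# Hypothetical helper: run only the selected tests in Release configuration.
# Assumes the test project lives at tests/CBDD.Tests relative to the repo root.
run_targeted_tests() {
  dotnet test tests/CBDD.Tests -c Release --filter "$(build_test_filter "$@")"
}
```

`run_targeted_tests StorageEngine.Recovery WriteAheadLog Transaction` would then run the recovery and write-ahead log tests, plus the transaction protocol tests if their names really contain `Transaction`.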

## Escalation Path

1. Initial owner: the maintainer assigned to the incident issue.
2. Escalate to the release maintainer when the severity is `SEV-1` or a rollback is required.
3. Post a status update at each milestone: triage complete, mitigation active, fix merged, and validation complete.
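
The escalation rule and milestone updates are simple enough to encode directly, which can help when scripting issue comments. In this sketch `needs_release_escalation` and `status_update` are hypothetical helpers; the milestone names come from step 3, while the message format is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the release maintainer must be
# involved, i.e. the incident is SEV-1 or a rollback is required ("yes").
needs_release_escalation() {
  local severity="$1" rollback_required="$2"
  [[ "$severity" == "SEV-1" || "$rollback_required" == yes ]]
}

# Milestones that each trigger a status update, in order.
MILESTONES=("triage complete" "mitigation active" "fix merged" "validation complete")

# Hypothetical helper: print a one-line status update (assumed format),
# rejecting anything that is not one of the defined milestones.
status_update() {
  local issue="$1" milestone="$2" m
  for m in "${MILESTONES[@]}"; do
    if [[ "$m" == "$milestone" ]]; then
      printf 'Incident %s: %s\n' "$issue" "$milestone"
      return 0
    fi
  done
  echo "unknown milestone: $milestone" >&2
  return 1
}
```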

## Recovery Actions

1. Contain the impact by pinning consumers to the last known-good package version.
2. Apply the rollback steps in [`deployment.md`](deployment.md#rollback-procedure).
3. Validate the repaired build with both the targeted and the full regression suites.
4. Publish the fixed package and confirm that consumers have recovered.
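
For step 1, one way to pin a consumer is an exact-version `PackageReference`: NuGet's bracket notation `[x.y.z]` matches only that version, whereas a bare `Version="x.y.z"` is treated as a minimum. The package ID and version in the usage below are placeholders, and `pinned_package_reference` is a hypothetical helper that only prints the element for pasting into the consumer's project file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a PackageReference pinned to exactly one version.
# "[x.y.z]" is NuGet's exact-match range; substitute the real CBDD package ID.
pinned_package_reference() {
  local package_id="$1" version="$2"
  printf '<PackageReference Include="%s" Version="[%s]" />\n' "$package_id" "$version"
}
```

For instance, `pinned_package_reference CBDD 1.4.2` prints `<PackageReference Include="CBDD" Version="[1.4.2]" />`, where both the ID and the version are illustrative. Once the fixed package is published, the bracket pin should be relaxed again so consumers can pick it up.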

## Post-Incident Expectations

1. Document the root cause, blast radius, and timeline.
2. Add regression tests that prevent recurrence.
3. Record follow-up actions in the issue tracker, each with an owner and a due date.
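
A post-incident record covering these expectations can be started from a fixed skeleton, with a quick check that every follow-up names an owner and a due date. The section headings, the `owner:`/`due:` line format, and both helper functions are assumptions for illustration rather than an established CBDD convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a post-incident record skeleton (assumed layout).
post_incident_skeleton() {
  local issue="$1"
  cat <<EOF
# Post-incident review: ${issue}

## Root cause

## Blast radius

## Timeline

## Regression tests added

## Follow-up actions
- [ ] <action> (owner: <name>, due: <YYYY-MM-DD>)
EOF
}

# Hypothetical check: a follow-up line must contain "owner: <someone>" and
# "due: YYYY-MM-DD" (the assumed line format shown in the skeleton above).
has_owner_and_due_date() {
  local owner_re='owner: *[^,)]+' due_re='due: *[0-9]{4}-[0-9]{2}-[0-9]{2}'
  [[ "$1" =~ $owner_re && "$1" =~ $due_re ]]
}
```

The unfilled template line fails the check on purpose, so a follow-up cannot be recorded without a concrete due date.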