Runbook

Purpose

This runbook provides standard operations, incident triage, escalation, and recovery procedures for CBDD maintainers.

Signals And Entry Points

  • CI failures on main
  • Failing integration tests in consumer repositories
  • Regression issues labeled "incident"
  • Recovery or corruption reports from consumers

Alert Triage Procedure

  1. Capture incident context: version, environment, failing operation, and first failure timestamp.
  2. Classify severity:
     • SEV-1: data loss risk, persistent startup failure, or transaction correctness risk.
     • SEV-2: feature-level regression without confirmed data loss.
     • SEV-3: non-critical behavior or documentation defects.
  3. Create or update the incident issue with owner and current mitigation status.
  4. Reproduce with targeted tests in tests/CBDD.Tests (path relative to the repository root).
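
The context fields and severity levels above can be sketched as two small shell helpers. This is a minimal illustration, not part of CBDD: the function names classify_severity and incident_context and the yes/no argument convention are assumptions made for this example.

```shell
# Hypothetical triage helpers; names and argument conventions are
# illustrative assumptions, not part of the CBDD tooling.

# classify_severity DATA_LOSS_RISK STARTUP_FAILURE TXN_RISK FEATURE_REGRESSION
# Each argument is "yes" or "no". Prints SEV-1, SEV-2, or SEV-3.
classify_severity() {
  local data_loss="$1" startup_failure="$2" txn_risk="$3" feature_regression="$4"
  if [ "$data_loss" = yes ] || [ "$startup_failure" = yes ] || [ "$txn_risk" = yes ]; then
    echo "SEV-1"
  elif [ "$feature_regression" = yes ]; then
    echo "SEV-2"
  else
    echo "SEV-3"
  fi
}

# incident_context VERSION ENVIRONMENT OPERATION FIRST_FAILURE_TIMESTAMP
# Prints the captured context in a form that can be pasted into the incident issue.
incident_context() {
  printf 'Version: %s\nEnvironment: %s\nFailing operation: %s\nFirst failure: %s\n' \
    "$1" "$2" "$3" "$4"
}
```

After sourcing the helpers, classify_severity no no yes no prints SEV-1, because a transaction correctness risk alone is enough for the highest severity.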

Diagnostics

  1. Validate build and tests.
     dotnet test CBDD.slnx -c Release
  2. Run coverage threshold gate when behavior changed in core paths.
     bash scripts/coverage-check.sh
  3. For storage and recovery incidents, prioritize:
     • StorageEngine.Recovery
     • WriteAheadLog
     • transaction protocol tests
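
One way to run only the prioritized areas is the --filter option of dotnet test, which accepts FullyQualifiedName~ substring matches joined with |. The sketch below assumes the relevant test names contain fragments such as Recovery or WriteAheadLog; that naming, and the helper names build_test_filter and run_targeted_tests, are assumptions and should be checked against the actual test project.

```shell
# Hypothetical helpers for targeted test runs; the name fragments passed in
# are assumptions about how the CBDD tests are named.

# build_test_filter FRAGMENT...
# Joins fragments into a `dotnet test --filter` expression that matches any
# test whose fully qualified name contains at least one fragment.
build_test_filter() {
  local filter="" name
  for name in "$@"; do
    if [ -z "$filter" ]; then
      filter="FullyQualifiedName~$name"
    else
      filter="$filter|FullyQualifiedName~$name"
    fi
  done
  printf '%s\n' "$filter"
}

# run_targeted_tests FRAGMENT...
# Runs the Release test pass restricted to the matching tests.
run_targeted_tests() {
  dotnet test CBDD.slnx -c Release --filter "$(build_test_filter "$@")"
}
```

For a storage incident, run_targeted_tests Recovery WriteAheadLog would restrict the run to tests whose names contain either fragment. The full suite still needs to pass before a fix is published.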

Escalation Path

  1. Initial owner: maintainer on incident issue.
  2. Escalate to release maintainer when severity is SEV-1 or rollback is required.
  3. Communicate status updates on each milestone: triage complete, mitigation active, fix merged, validation complete.

Recovery Actions

  1. Contain impact by pinning consumers to last known-good package version.
  2. Apply rollback steps from deployment.md.
  3. Validate repaired build with targeted and full regression suites.
  4. Publish fixed package and confirm consumer recovery.
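
The containment step can be sketched with dotnet add package, which writes a specific version into a consumer's project file. The package ID CBDD, the version 1.2.3, and the project paths used below are placeholders, and pin_consumers is a hypothetical helper. Consumers that manage versions centrally or need a strict exact-version range would pin differently.

```shell
# Hypothetical containment helper; package ID, version, and project paths
# are placeholders supplied by the caller.

# pin_consumers PACKAGE VERSION PROJECT...
# Sets each consumer project to the given package version.
# With DRY_RUN=1 the commands are printed instead of executed.
pin_consumers() {
  local package="$1" version="$2" project
  shift 2
  for project in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "dotnet add $project package $package --version $version"
    else
      dotnet add "$project" package "$package" --version "$version"
    fi
  done
}
```

Setting DRY_RUN=1 first makes it possible to review the commands before any project file changes, which is useful when several consumer repositories are pinned in one pass.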

Post-Incident Expectations

  1. Document root cause, blast radius, and timeline.
  2. Add regression tests to prevent recurrence.
  3. Record follow-up actions in issue tracker with owners and due dates.