docs: align internal docs to enterprise standards

Add canonical operations/security/access/feature docs and fix path integrity to improve onboarding and incident readiness.
2026-02-20 13:23:55 -05:00
parent e6d81f6350
commit ce727eb30d
18 changed files with 783 additions and 186 deletions
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -0,0 +1,61 @@
+# Deployment
+
+This document defines the canonical CBDDC deployment workflow, validation gates, and rollback procedures.
+
+## Environments
+
+- Local development: single node for functional verification.
+- Staging: multi-node mesh or hosted mode with production-like configuration.
+- Production: controlled rollout with health checks, monitoring, and rollback readiness.
+
+## Promotion Workflow
+
+1. Build and test in CI (`dotnet build`, `dotnet test`).
+2. Validate configuration and secrets for the target environment.
+3. Deploy to staging.
+4. Run smoke checks:
+   - Node starts and joins expected peers.
+   - `/health` reports healthy.
+   - Sync operations replicate across target collections.
+5. Promote to production using approved change window.
+
+## Validation Gates
+
+- No failed test suites.
+- No unresolved critical incidents.
+- Health check status is healthy or approved degraded state with mitigation.
+- Backup/restore path confirmed for the active persistence provider.
+
+## Rollback Triggers
+
+Rollback immediately when any of the following occurs:
+
+- Sync correctness regression (missed or duplicate replication events).
+- Persistent unhealthy status after remediation attempt.
+- Authentication or connectivity failure across required peers.
+- Data integrity concern reported by validation checks.
+
+## Rollback Procedure
+
+1. Stop traffic or disable rollout to newly deployed nodes.
+2. Revert to the last known-good build.
+3. Restore previous configuration and secret set.
+4. If needed, restore persistence snapshot/backup.
+5. Re-run health and replication smoke checks.
+6. Record incident details in the runbook timeline.
+
+## Emergency Changes
+
+Use emergency deployment only for incident containment:
+
+1. Capture incident reference and approver.
+2. Apply minimal scoped change.
+3. Run abbreviated smoke checks (startup, health, critical replication path).
+4. Follow up with standard post-incident review.
+
+## Related Documents
+
+- [Deployment Modes](deployment-modes.md)
+- [Deployment (LAN)](deployment-lan.md)
+- [Production Hardening](production-hardening.md)
+- [Runbook](runbook.md)