CBDDC/docs/features/peer-to-peer-gossip-sync.md
Joseph Doherty ce727eb30d
docs: align internal docs to enterprise standards
Add canonical operations/security/access/feature docs and fix path integrity to improve onboarding and incident readiness.
2026-02-20 13:23:55 -05:00


# Feature: Peer-to-Peer Gossip Sync
## Purpose and Business Outcome
Propagate updates across mesh nodes without a central coordinator so local-first applications remain resilient to intermittent connectivity.
## Scope and Non-Goals
Scope:
- Peer discovery and sync orchestration.
- Push/pull propagation of oplog changes.
Non-goals:
- Strong global consistency guarantees.
- Public internet exposure without additional network controls.
## User and System Workflows
1. Node starts and discovers peers.
2. Node exchanges sync metadata, such as vector clocks, with connected peers.
3. Missing operations are requested and applied.
4. Mesh converges over repeated gossip rounds.
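
Steps 2 and 3 can be sketched as a push/pull exchange. Each side summarizes the operations it holds as a vector clock (the highest sequence number seen per originating node), and the other side replies with the oplog entries that clock does not yet cover. This is a minimal sketch under stated assumptions, not CBDDC's implementation. `Op`, `Node`, `missing_for`, and `push_pull` are hypothetical names, and the sketch assumes each origin numbers its operations 1, 2, 3 and so on without gaps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    origin: str   # node that created the operation (hypothetical model)
    seq: int      # per-origin sequence number: 1, 2, 3, ...
    payload: str


@dataclass
class Node:
    node_id: str
    oplog: dict = field(default_factory=dict)  # (origin, seq) -> Op
    next_seq: int = 1

    def write(self, payload: str) -> Op:
        """Record a local change in the oplog."""
        op = Op(self.node_id, self.next_seq, payload)
        self.next_seq += 1
        self.oplog[(op.origin, op.seq)] = op
        return op

    def clock(self) -> dict:
        """Vector clock: highest sequence number held for each origin."""
        vc = {}
        for origin, seq in self.oplog:
            vc[origin] = max(vc.get(origin, 0), seq)
        return vc

    def missing_for(self, remote_clock: dict) -> list:
        """Ops held here that the remote clock does not cover, in origin/seq order.

        Assumes every node holds a gap-free prefix of each origin's ops;
        sending everything above the remote's high-water mark preserves that.
        """
        ops = [op for op in self.oplog.values()
               if op.seq > remote_clock.get(op.origin, 0)]
        return sorted(ops, key=lambda op: (op.origin, op.seq))

    def apply(self, ops: list) -> None:
        """Apply received ops; re-applying a known op is a no-op (idempotent)."""
        for op in ops:
            self.oplog.setdefault((op.origin, op.seq), op)


def push_pull(a: Node, b: Node) -> None:
    """One gossip exchange: each side sends what the other's clock lacks."""
    to_b = a.missing_for(b.clock())   # push
    to_a = b.missing_for(a.clock())   # pull
    b.apply(to_b)
    a.apply(to_a)


# Usage: two nodes with independent writes converge after one exchange.
n1, n2 = Node("n1"), Node("n2")
n1.write("x=1")
n2.write("y=2")
n2.write("y=3")
push_pull(n1, n2)
assert n1.clock() == n2.clock() == {"n1": 1, "n2": 2}
```

Making `apply` idempotent means that an operation delivered twice through different peers does no harm. Repeated rounds (step 4) therefore only ever add missing entries.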
## Interfaces, APIs, and Events Involved
- Sync orchestrator scheduling
- TCP peer sync channels
- Vector clock exchange and reconciliation
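
Vector clock reconciliation usually comes down to two operations. Comparison tells whether one replica's history includes the other's or whether the two diverged concurrently. Merging combines two clocks by taking the per-node maximum. The following is a minimal sketch that assumes clocks are plain maps from node ID to counter. `Order`, `compare`, and `merge` are hypothetical names, not CBDDC's API, and how concurrent updates are resolved is outside this sketch.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # left happened-before right
    AFTER = "after"            # right happened-before left
    CONCURRENT = "concurrent"  # neither history contains the other


def compare(left: dict, right: dict) -> Order:
    """Partial order on vector clocks; a missing entry counts as 0."""
    nodes = left.keys() | right.keys()
    left_le = all(left.get(n, 0) <= right.get(n, 0) for n in nodes)
    right_le = all(right.get(n, 0) <= left.get(n, 0) for n in nodes)
    if left_le and right_le:
        return Order.EQUAL
    if left_le:
        return Order.BEFORE
    if right_le:
        return Order.AFTER
    return Order.CONCURRENT


def merge(left: dict, right: dict) -> dict:
    """Pointwise maximum: the smallest clock that covers both inputs."""
    return {n: max(left.get(n, 0), right.get(n, 0))
            for n in left.keys() | right.keys()}


# Usage
a = {"n1": 2, "n2": 1}
b = {"n1": 1, "n2": 3}
assert compare(a, b) is Order.CONCURRENT
assert merge(a, b) == {"n1": 2, "n2": 3}
assert compare(a, merge(a, b)) is Order.BEFORE
```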
## Permissions and Data Handling
- Peers that present a valid authentication token can exchange replicated collection data.
- Cluster membership should be restricted to trusted nodes.
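
The admission rule can be sketched as a check that runs when a peer connects: the peer must be on the trusted membership list and must present the shared token. This minimal sketch assumes a single pre-shared token and a static allow-list, and `admit_peer` and its parameters are hypothetical. It uses `hmac.compare_digest` so that comparison time does not depend on where the first mismatching byte falls.

```python
import hmac


def admit_peer(peer_id: str, presented_token: str,
               expected_token: str, trusted_peers: set) -> bool:
    """Admit a peer only if it is a trusted member AND its token matches."""
    if peer_id not in trusted_peers:
        return False  # restrict membership to known nodes
    # Constant-time comparison of the shared secret.
    return hmac.compare_digest(presented_token.encode("utf-8"),
                               expected_token.encode("utf-8"))


trusted = {"node-a", "node-b"}
assert admit_peer("node-a", "s3cret", "s3cret", trusted)
assert not admit_peer("node-a", "wrong", "s3cret", trusted)   # token mismatch
assert not admit_peer("node-z", "s3cret", "s3cret", trusted)  # not a member
```

A rejected token here corresponds to the "token mismatch blocking synchronization" failure mode: the peer stays connected to the network but never exchanges data.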
## Dependencies and Failure Modes
Dependencies:
- Network reachability
- Shared authentication material
- Healthy persistence layer
Failure modes:
- Peer isolation due to network outage
- Token mismatch blocking synchronization
- Sustained lag under high write pressure
## Monitoring, Alerts, and Troubleshooting Pointers
- Monitor `laggingPeers`, `maxLagMs`, and active peer counts.
- Follow the [Runbook](../runbook.md) playbooks for lagging or disconnected peers.
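
To show how these signals relate, the sketch below derives a lagging-peer list, a maximum lag, and an active-peer count from the time each peer last completed a sync. These definitions are assumptions, not CBDDC's metric semantics. Lag is taken as milliseconds since the last successful sync, a peer counts as lagging above a hypothetical `threshold_ms`, and every other tracked peer counts as active.

```python
from dataclasses import dataclass


@dataclass
class LagSummary:
    active_peers: int     # tracked peers within the threshold (assumed definition)
    lagging_peers: list   # peer IDs over the threshold, sorted
    max_lag_ms: int       # worst lag across all tracked peers


def summarize_lag(last_sync_ms: dict, now_ms: int,
                  threshold_ms: int = 30_000) -> LagSummary:
    """Lag per peer = now minus its last successful sync time (assumption)."""
    # Clamp at 0 so a peer clock slightly ahead of ours never yields negative lag.
    lags = {peer: max(0, now_ms - ts) for peer, ts in last_sync_ms.items()}
    lagging = sorted(peer for peer, lag in lags.items() if lag > threshold_ms)
    return LagSummary(
        active_peers=len(lags) - len(lagging),
        lagging_peers=lagging,
        max_lag_ms=max(lags.values(), default=0),
    )


summary = summarize_lag({"n1": 55_000, "n2": 10_000}, now_ms=60_000)
assert summary.lagging_peers == ["n2"]
assert summary.max_lag_ms == 50_000
assert summary.active_peers == 1
```

The 30-second default is only a placeholder. Any real alert threshold should come from the runbook.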
## Rollout and Change Considerations
- Roll out protocol-affecting changes in a controlled window.
- Confirm backward and forward compatibility in a staging mesh before production rollout.
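
One common way to keep a mixed-version mesh working during such a rollout is version-range negotiation. Each node advertises the range of protocol versions it supports, both sides use the highest version they share, and the session is refused when the ranges do not overlap. The sketch below assumes integer protocol versions and inclusive `(min, max)` ranges. Nothing here is taken from CBDDC's wire protocol, and `negotiate_version` is a hypothetical name.

```python
from typing import Optional, Tuple


def negotiate_version(local: Tuple[int, int],
                      remote: Tuple[int, int]) -> Optional[int]:
    """Highest version inside both inclusive (min, max) ranges, else None."""
    low = max(local[0], remote[0])
    high = min(local[1], remote[1])
    return high if low <= high else None


# A node upgraded to support v2-v3 can still talk to an older v1-v2 node.
assert negotiate_version((2, 3), (1, 2)) == 2
# If the other side only speaks v1, the ranges do not overlap and sync is refused.
assert negotiate_version((2, 3), (1, 1)) is None
```

In a staging mesh, this kind of check makes compatibility testable: every old/new node pairing should negotiate a version before production nodes are upgraded.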
## Validation and Testability Guidance
- Run multi-node integration tests with controlled partitions.
- Validate eventual convergence after peers reconnect.
- Verify that no data is lost across repeated reconnect scenarios.
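
These checks can also be exercised deterministically in memory before running real multi-node tests. The sketch below models each node as a set of operation IDs and splits four nodes into two partitions that both keep writing. It then heals the partition and gossips until every node holds the same set. It is a simplified stand-in with its own assumptions (random peer choice with a fixed seed, full-set exchange instead of oplog deltas), not CBDDC's test harness, and all names are hypothetical.

```python
import random


def links_within(names: list) -> set:
    """All undirected links among the given nodes."""
    return {frozenset((a, b)) for a in names for b in names if a < b}


def gossip_round(nodes: dict, links: set, rng: random.Random) -> None:
    """Each node syncs (push/pull) with one random reachable peer."""
    for node in sorted(nodes):
        peers = sorted(p for p in nodes
                       if p != node and frozenset((node, p)) in links)
        if not peers:
            continue  # isolated node: nothing to do this round
        peer = rng.choice(peers)
        merged = nodes[node] | nodes[peer]
        nodes[node], nodes[peer] = set(merged), set(merged)


def run_partition_scenario(seed: int = 7, max_rounds: int = 100) -> dict:
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    names = ["n1", "n2", "n3", "n4"]
    nodes = {n: set() for n in names}

    # Partition: {n1, n2} cannot reach {n3, n4}; both sides keep writing.
    partitioned = links_within(["n1", "n2"]) | links_within(["n3", "n4"])
    for i in range(5):
        nodes["n1"].add(f"n1-op{i}")
        nodes["n3"].add(f"n3-op{i}")
        gossip_round(nodes, partitioned, rng)
    assert not any(op.startswith("n3-") for op in nodes["n2"])

    # Heal the partition and gossip until all nodes hold the same set.
    healed = links_within(names)
    for _ in range(max_rounds):
        if len({frozenset(ops) for ops in nodes.values()}) == 1:
            break
        gossip_round(nodes, healed, rng)
    return nodes


result = run_partition_scenario()
expected = {f"{n}-op{i}" for n in ("n1", "n3") for i in range(5)}
assert all(ops == expected for ops in result.values())  # converged, nothing lost
```

Asserting that the final set equals every write made on either side checks convergence and no-data-loss together. Repeating the partition/heal cycle in a loop covers the repeated-reconnect case.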
## Related Security Controls
- [Security](../security.md)
- [Access and Permissions](../access.md)