docs: align internal docs to enterprise standards
All checks were successful
CI / verify (push) Successful in 2m33s
Add canonical operations/security/access/feature docs and fix path integrity to improve onboarding and incident readiness.
70
docs/features/peer-to-peer-gossip-sync.md
Normal file
@@ -0,0 +1,70 @@
# Feature: Peer-to-Peer Gossip Sync

## Purpose and Business Outcome

Propagate updates across mesh nodes without a central coordinator so local-first applications remain resilient to intermittent connectivity.

## Scope and Non-Goals

Scope:

- Peer discovery and sync orchestration.
- Push/pull propagation of oplog changes.

Non-goals:

- Strong global consistency guarantees.
- Public internet exposure without additional network controls.

## User and System Workflows

1. Node starts and discovers peers.
2. Node exchanges sync metadata with connected peers.
3. Missing operations are requested and applied.
4. Mesh converges over repeated gossip rounds.
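
The workflow above can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the project's implementation: the `Node` class, the `(origin, seq)` oplog keys, and the `summary`, `missing_for`, `sync`, and `gossip_round` helpers are all hypothetical names, and the sync metadata is assumed to be the highest sequence number seen per origin node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical mesh node; the oplog maps (origin, seq) -> payload."""
    name: str
    oplog: dict = field(default_factory=dict)

    def write(self, payload):
        # Local write: use the next sequence number for this node's own ops.
        seq = self.summary().get(self.name, 0) + 1
        self.oplog[(self.name, seq)] = payload

    def summary(self):
        # Assumed sync metadata: highest sequence seen per origin node.
        seen = {}
        for origin, seq in self.oplog:
            seen[origin] = max(seen.get(origin, 0), seq)
        return seen

    def missing_for(self, peer_summary):
        # Operations held locally that the peer has not seen yet.
        return {key: payload for key, payload in self.oplog.items()
                if key[1] > peer_summary.get(key[0], 0)}


def sync(a, b):
    # Step 2: exchange metadata. Step 3: transfer and apply missing operations.
    to_b = a.missing_for(b.summary())
    to_a = b.missing_for(a.summary())
    b.oplog.update(to_b)
    a.oplog.update(to_a)


def gossip_round(nodes, links):
    # One round: every connected pair syncs once, in link order.
    for i, j in links:
        sync(nodes[i], nodes[j])


if __name__ == "__main__":
    nodes = [Node("a"), Node("b"), Node("c")]
    nodes[0].write("x")
    nodes[2].write("y")
    links = [(0, 1), (1, 2)]  # a-b and b-c; a and c are not directly connected
    for _ in range(2):
        gossip_round(nodes, links)
    print(nodes[0].oplog == nodes[1].oplog == nodes[2].oplog)
```

In this line topology, node `a` only learns about node `c`'s write in the second round, which is why step 4 speaks of convergence over repeated rounds and not after a single exchange.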

## Interfaces, APIs, and Events Involved

- Sync orchestrator scheduling
- TCP peer sync channels
- Vector clock exchange and reconciliation
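
To make the vector clock item concrete, here is a generic sketch of the standard comparison and merge rules. The function names `compare` and `merge` and the dict-based clock representation are assumptions for illustration; this document does not specify the actual wire format or reconciliation API.

```python
def compare(a, b):
    """Classify clock a relative to clock b.

    Clocks are dicts of node id -> counter; missing entries count as 0.
    Returns "equal", "before", "after", or "concurrent".
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b has seen everything a has, and more
    if b_le_a:
        return "after"    # a has seen everything b has, and more
    return "concurrent"   # each side has updates the other lacks


def merge(a, b):
    # Reconciled clock: element-wise maximum over all known node ids.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

A "concurrent" result is the case where both peers need to pull from each other; "before" or "after" means only one direction has missing operations.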

## Permissions and Data Handling

- Peers with a valid authentication token can exchange replicated collection data.
- Cluster membership should be restricted to trusted nodes.
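
The two rules above could be enforced at connection time along these lines. This is a hedged sketch only: `is_authorized`, `accept_peer`, and the `trusted_peers` allowlist are hypothetical, and this document does not describe how tokens are actually compared or how membership is configured. The one real library call used is `hmac.compare_digest` from the Python standard library, which performs a constant-time comparison.

```python
import hmac


def is_authorized(presented_token: str, expected_token: str) -> bool:
    # Constant-time comparison, so timing does not reveal how much of the
    # token matched. Encoding to bytes also allows non-ASCII tokens.
    return hmac.compare_digest(presented_token.encode(), expected_token.encode())


def accept_peer(peer_id, presented_token, expected_token, trusted_peers):
    # Hypothetical admission check: the peer must be on the allowlist
    # and present the shared token.
    return peer_id in trusted_peers and is_authorized(presented_token, expected_token)
```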

## Dependencies and Failure Modes

Dependencies:

- Network reachability
- Shared authentication material
- Healthy persistence layer

Failure modes:

- Peer isolation due to network outage
- Token mismatch blocking synchronization
- Sustained lag under high write pressure

## Monitoring, Alerts, and Troubleshooting Pointers

- Monitor `laggingPeers`, `maxLagMs`, and active peer counts.
- Follow [Runbook](../runbook.md) playbooks for lagging or disconnected peers.
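
As an illustration of how the monitored values might relate to each other, the sketch below derives them from per-peer last-sync timestamps. Everything except the metric names `laggingPeers` and `maxLagMs` is an assumption: the `lag_metrics` function, the `activePeers` key, the 30-second threshold, and the choice to report `laggingPeers` as a list of peer ids (it may be a count in the real system).

```python
def lag_metrics(last_sync_ms, now_ms, lag_threshold_ms=30_000):
    """Derive lag metrics from a dict of peer id -> last successful sync time (ms)."""
    lags = {peer: now_ms - ts for peer, ts in last_sync_ms.items()}
    lagging = sorted(peer for peer, lag in lags.items() if lag > lag_threshold_ms)
    return {
        "activePeers": len(lags),                   # hypothetical metric name
        "laggingPeers": lagging,                    # assumed to be a list of ids
        "maxLagMs": max(lags.values(), default=0),  # 0 when no peers are known
    }
```

An alert rule built on this shape would typically fire when `laggingPeers` stays non-empty or `maxLagMs` stays above the threshold for several consecutive checks, so that a single slow round does not trigger it.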

## Rollout and Change Considerations

- Roll out protocol-affecting changes in a controlled window.
- Confirm backward and forward compatibility in a staging mesh before production rollout.

## Validation and Testability Guidance

- Run multi-node integration tests with controlled partitions.
- Validate eventual convergence after reconnect.
- Verify that no data is lost under repeated reconnect scenarios.

## Related Security Controls

- [Security](../security.md)
- [Access and Permissions](../access.md)