Implement in-process multi-dataset sync isolation across core, network, persistence, and tests
All checks were successful
NuGet Package Publish / nuget (push) Successful in 1m14s
@@ -30,6 +30,13 @@ To optimize reconnection, each node maintains a **Snapshot** of the last known s
- If the chain hash matches, they only exchange the delta.
- This avoids re-processing the entire operation history and ensures efficient gap recovery.

### Multi-Dataset Sync

CBDDC supports per-dataset sync pipelines in one process.

- Dataset identity (`datasetId`) is propagated in protocol and persistence records.
- Each dataset has independent oplog reads, confirmation state, and maintenance cadence.
- Legacy peers without dataset fields interoperate on `primary`.

### Peer-Confirmed Oplog Pruning

CBDDC maintenance pruning now uses a two-cutoff model:

@@ -8,6 +8,7 @@ This index tracks CBDDC major functionality. Each feature has one canonical docu
- [Peer-to-Peer Gossip Sync](peer-to-peer-gossip-sync.md)
- [Secure Peer Transport](secure-peer-transport.md)
- [Peer-Confirmed Pruning](peer-confirmed-pruning.md)
- [Multi-Dataset Sync](multi-dataset-sync.md)

## Maintenance Rules

67
docs/features/multi-dataset-sync.md
Normal file
@@ -0,0 +1,67 @@
# Multi-Dataset Sync

## Summary

CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a `datasetId` (for example `primary`, `logs`, `timeseries`).
Each dataset pipeline has independent oplog state, vector-clock reads, peer confirmation watermarks, and maintenance scheduling.

## Why Use It

- Keep primary business data sync latency stable during high telemetry volume.
- Isolate append-only streams (`logs`, `timeseries`) from CRUD-heavy collections.
- Roll out incrementally using runtime flags and per-dataset enablement.

## Configuration

Register dataset options and enable the runtime coordinator:

```csharp
services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
    {
        o.InterestingCollections = ["Users", "TodoLists"];
    })
    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
    {
        o.InterestingCollections = ["Logs"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
    {
        o.InterestingCollections = ["Timeseries"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();

services.AddCBDDCMultiDataset(options =>
{
    options.EnableMultiDatasetSync = true;
    options.EnableDatasetPrimary = true;
    options.EnableDatasetLogs = true;
    options.EnableDatasetTimeseries = true;
});
```

## Wire and Storage Compatibility

- Protocol messages include optional `dataset_id` fields.
- Missing `dataset_id` is treated as `primary`.
- Surreal persistence records include `datasetId`; legacy rows without `datasetId` are read as `primary`.
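
The fallback rule above can be sketched as a small helper. This is an illustration only: `SyncRequest` and `DatasetIds` are hypothetical names, not CBDDC types, and treating an empty string the same as a missing value is an assumption.

```csharp
#nullable enable
using System;

// Hypothetical wire message; the real CBDDC message types may look different.
// DatasetId stays null when a legacy peer does not send the field.
public sealed record SyncRequest(string NodeId, string? DatasetId = null);

public static class DatasetIds
{
    public const string Primary = "primary";

    // Maps an absent (null or empty) dataset id to the primary dataset,
    // and returns any other value unchanged.
    public static string Resolve(string? datasetId) =>
        string.IsNullOrEmpty(datasetId) ? Primary : datasetId;
}
```

With this sketch, a request from a legacy peer such as `new SyncRequest("node-a")` resolves to `primary`, while a request carrying `logs` stays on `logs`.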

## Operational Notes

- Each dataset runs its own `SyncOrchestrator` instance.
- Maintenance pruning is dataset-scoped (`datasetId` + cutoff).
- Snapshot APIs support dataset-scoped operations (`CreateSnapshotAsync(stream, datasetId)`).

## Migration

1. Deploy with `EnableMultiDatasetSync = false`.
2. Enable multi-dataset mode with only `primary` enabled.
3. Enable `logs` and verify the primary sync SLO.
4. Enable `timeseries` and verify the primary sync SLO again.
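
One way to keep the four steps reviewable is to write them down as data. The sketch below uses a hypothetical `MultiDatasetFlags` record that mirrors the flag names from the Configuration section; it is not a CBDDC type, and the per-dataset flag values in step 1 are an assumption, since the step only specifies `EnableMultiDatasetSync = false`.

```csharp
using System.Collections.Generic;

// Hypothetical mirror of the flags shown in the Configuration section.
public sealed record MultiDatasetFlags(
    bool EnableMultiDatasetSync,
    bool EnableDatasetPrimary,
    bool EnableDatasetLogs,
    bool EnableDatasetTimeseries);

public static class MigrationPlan
{
    // One entry per migration step, in the order the steps are applied.
    public static IReadOnlyList<MultiDatasetFlags> Stages { get; } = new[]
    {
        new MultiDatasetFlags(false, true, false, false), // 1. multi-dataset mode off
        new MultiDatasetFlags(true, true, false, false),  // 2. mode on, `primary` only
        new MultiDatasetFlags(true, true, true, false),   // 3. add `logs`
        new MultiDatasetFlags(true, true, true, true),    // 4. add `timeseries`
    };
}
```

Each stage differs from the previous one by a single flag, so a primary SLO regression can be attributed to the most recent change.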

## Rollback

- Set `EnableDatasetLogs = false` and `EnableDatasetTimeseries = false` first.
- If needed, set `EnableMultiDatasetSync = false` to return to the single `primary` sync path.
@@ -221,6 +221,14 @@ services.AddCBDDCCore()
});
```

### Multi-Dataset Partitioning

Surreal persistence now stores `datasetId` on oplog, metadata, snapshot metadata, confirmation, and CDC checkpoint records.

- Composite indexes include `datasetId` to prevent cross-dataset reads.
- Legacy rows missing `datasetId` are interpreted as `primary` during reads.
- Dataset-scoped store APIs (`ExportAsync(datasetId)`, `GetOplogAfterAsync(..., datasetId, ...)`) enforce isolation.
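
The isolation rules above can be illustrated with an in-memory stand-in for a dataset-scoped read. This is a sketch, not the Surreal-backed store: `OplogRow` and `DatasetScopedReads.GetOplogAfter` are hypothetical names, and the row shape and numeric timestamp are assumptions made for the example.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; DatasetId is null on rows written before the field existed.
public sealed record OplogRow(string? DatasetId, string Collection, string Key, long Timestamp);

public static class DatasetScopedReads
{
    // Returns the rows of one dataset with a timestamp after the cutoff, oldest first.
    // Rows without a dataset id are read as belonging to "primary".
    public static IReadOnlyList<OplogRow> GetOplogAfter(
        IEnumerable<OplogRow> rows, string datasetId, long afterTimestamp) =>
        rows.Where(r => (r.DatasetId ?? "primary") == datasetId)
            .Where(r => r.Timestamp > afterTimestamp)
            .OrderBy(r => r.Timestamp)
            .ToList();
}
```

Because every read filters on the dataset id first, a query for `logs` never returns `primary` rows, and legacy rows surface only in `primary` reads.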

### CDC Durability Notes

1. **Checkpoint semantics**: each consumer id has an independent durable cursor (`timestamp + hash`).

@@ -27,6 +27,15 @@ Capture these artifacts before remediation:
- Current runtime configuration (excluding secrets).
- Most recent deployment identifier and change window.

## Multi-Dataset Gates

Before enabling telemetry datasets in production:

1. Enable `primary` only and record baseline primary sync lag.
2. Enable `logs`; confirm primary lag remains within SLO.
3. Enable `timeseries`; confirm primary lag remains within SLO.
4. If the primary SLO regresses, disable the telemetry datasets before attempting a broader rollback.
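
The gate decision in steps 2 to 4 can be expressed as a small check. The sketch below is hypothetical: `PrimaryLagGate` is not part of CBDDC, and how primary sync lag is measured and what the SLO threshold is are left to the operator.

```csharp
using System;
using System.Collections.Generic;

public static class PrimaryLagGate
{
    // The telemetry datasets named in this runbook.
    private static readonly string[] TelemetryDatasets = { "logs", "timeseries" };

    // Returns the datasets to disable: none while primary lag is within the SLO,
    // otherwise the telemetry datasets (never `primary` itself).
    public static IReadOnlyList<string> DatasetsToDisable(TimeSpan primaryLag, TimeSpan slo) =>
        primaryLag <= slo ? Array.Empty<string>() : TelemetryDatasets;
}
```

An empty result means the rollout can proceed to the next gate; a non-empty result lists what to switch off before considering a broader rollback.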

## Recovery Plays

### Peer unreachable or lagging