CBDDC/docs/features/multi-dataset-sync.md
Joseph Doherty 8e97061ab8
Implement in-process multi-dataset sync isolation across core, network, persistence, and tests
2026-02-22 11:58:34 -05:00


# Multi-Dataset Sync
## Summary
CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a `datasetId` (for example, `primary`, `logs`, or `timeseries`).
Each dataset pipeline has its own oplog state, vector-clock reads, peer confirmation watermarks, and maintenance schedule.
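To make the isolation concrete, here is a minimal in-memory sketch. It assumes a simplified model in which each dataset owns its oplog and vector clock, both keyed by `datasetId`. The collections and the `Append`/`ClockOf` helpers are hypothetical and are not CBDDC APIs; peer watermarks and maintenance state could be keyed the same way.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: every dataset gets its own oplog and its own vector clock.
// None of these names are CBDDC types; they only illustrate per-dataset isolation.
var oplogs = new Dictionary<string, List<(string NodeId, long Counter, string Op)>>();
var clocks = new Dictionary<string, Dictionary<string, long>>();

// Append an operation to one dataset's oplog and advance only that dataset's clock.
long Append(string datasetId, string nodeId, string op)
{
    if (!clocks.TryGetValue(datasetId, out var clock))
        clocks[datasetId] = clock = new();
    if (!oplogs.TryGetValue(datasetId, out var log))
        oplogs[datasetId] = log = new();

    long next = clock.GetValueOrDefault(nodeId) + 1;
    clock[nodeId] = next;
    log.Add((nodeId, next, op));
    return next;
}

// Read a dataset's vector-clock entry for a node (0 if the node never wrote there).
long ClockOf(string datasetId, string nodeId) =>
    clocks.TryGetValue(datasetId, out var clock) ? clock.GetValueOrDefault(nodeId) : 0;

// A burst of log writes does not move the primary dataset's clock.
Append("primary", "node-a", "upsert Users/1");
for (int i = 0; i < 1000; i++)
    Append("logs", "node-a", $"insert Logs/{i}");

Console.WriteLine(ClockOf("primary", "node-a")); // 1
Console.WriteLine(ClockOf("logs", "node-a"));    // 1000
```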
## Why Use It
- Keep sync latency for primary business data stable when telemetry volume is high.
- Isolate append-only streams (`logs`, `timeseries`) from CRUD-heavy collections.
- Roll out incrementally using runtime flags that enable each dataset independently.
## Configuration
Register options for each dataset, then enable the runtime coordinator with `AddCBDDCMultiDataset`:
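```csharp
// Assumes `services` is your IServiceCollection and `options` is a CBDDC
// Surreal embedded options object built earlier (not shown).
services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
    // Primary: CRUD-heavy business collections.
    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
    {
        o.InterestingCollections = ["Users", "TodoLists"];
    })
    // Logs: append-only stream with its own sync loop delay.
    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
    {
        o.InterestingCollections = ["Logs"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    // Timeseries: append-only stream with its own sync loop delay.
    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
    {
        o.InterestingCollections = ["Timeseries"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();

// Enable the runtime coordinator and choose which dataset pipelines run.
services.AddCBDDCMultiDataset(multi =>
{
    multi.EnableMultiDatasetSync = true;
    multi.EnableDatasetPrimary = true;
    multi.EnableDatasetLogs = true;
    multi.EnableDatasetTimeseries = true;
});
```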
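The flags passed to `AddCBDDCMultiDataset` decide which pipelines run. The sketch below models that decision as a pure function, assuming the behavior described under Migration and Rollback: with `EnableMultiDatasetSync = false` only the single `primary` path runs; otherwise each `EnableDataset*` flag switches its own pipeline on. `ActiveDatasets` is a hypothetical helper, not part of CBDDC.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resolution of the multi-dataset flags into the set of running pipelines.
List<string> ActiveDatasets(bool multiDataset, bool primary, bool logs, bool timeseries)
{
    // Multi-dataset mode off: fall back to the single primary sync path.
    if (!multiDataset)
        return new List<string> { "primary" };

    // Multi-dataset mode on: each per-dataset flag is honored independently.
    var active = new List<string>();
    if (primary) active.Add("primary");
    if (logs) active.Add("logs");
    if (timeseries) active.Add("timeseries");
    return active;
}

Console.WriteLine(string.Join(",", ActiveDatasets(false, true, true, true))); // primary
Console.WriteLine(string.Join(",", ActiveDatasets(true, true, true, false))); // primary,logs
```

Keeping the rule a pure function of the flags makes each rollout or rollback step easy to reason about: the same flag values always yield the same set of pipelines.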
## Wire and Storage Compatibility
- Protocol messages include optional `dataset_id` fields.
- Missing `dataset_id` is treated as `primary`.
- Surreal persistence records include `datasetId`; legacy rows without `datasetId` are read as `primary`.
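A minimal sketch of the fallback rule, assuming a hypothetical `NormalizeDatasetId` helper applied to both incoming messages and stored rows. Treating an empty string the same as a missing value is an assumption made here; the rows are illustrative `(Id, DatasetId)` tuples, with `null` standing in for a legacy row written before `datasetId` existed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical compatibility helper: absent dataset ids resolve to "primary".
const string DefaultDatasetId = "primary";

// Assumption: an empty string is treated the same as a missing value.
string NormalizeDatasetId(string? datasetId) =>
    string.IsNullOrEmpty(datasetId) ? DefaultDatasetId : datasetId;

// Illustrative stored rows; null models a legacy row without datasetId.
var rows = new List<(string Id, string? DatasetId)>
{
    ("u1", null),
    ("u2", "primary"),
    ("l1", "logs"),
};

// Legacy rows land in the primary dataset alongside explicitly tagged ones.
var primaryRows = rows
    .Where(r => NormalizeDatasetId(r.DatasetId) == DefaultDatasetId)
    .Select(r => r.Id)
    .ToList();

Console.WriteLine(string.Join(",", primaryRows)); // u1,u2
```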
## Operational Notes
- Each dataset runs its own `SyncOrchestrator` instance.
- Maintenance pruning is dataset-scoped: it is selected by `datasetId` and a cutoff.
- Snapshot APIs support dataset-scoped operations (`CreateSnapshotAsync(stream, datasetId)`).
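Dataset-scoped pruning can be pictured as a filter on both fields at once. This sketch assumes a flat in-memory oplog of `(DatasetId, Timestamp, Op)` entries and a hypothetical `PruneDataset` helper. It ignores peer confirmation watermarks, which a real implementation would likely also consult before discarding entries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared oplog; each entry is tagged with its dataset.
var oplog = new List<(string DatasetId, DateTimeOffset Timestamp, string Op)>
{
    ("primary", new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero), "upsert Users/1"),
    ("logs",    new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero), "insert Logs/1"),
    ("logs",    new DateTimeOffset(2026, 2, 1, 0, 0, 0, TimeSpan.Zero), "insert Logs/2"),
};

// Remove only entries that match the dataset AND are older than the cutoff.
// Returns the number of removed entries.
int PruneDataset(string datasetId, DateTimeOffset cutoff) =>
    oplog.RemoveAll(e => e.DatasetId == datasetId && e.Timestamp < cutoff);

var removed = PruneDataset("logs", new DateTimeOffset(2026, 1, 15, 0, 0, 0, TimeSpan.Zero));
Console.WriteLine(removed);     // 1
Console.WriteLine(oplog.Count); // 2 (the equally old primary entry survives)
```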
## Migration
1. Deploy with `EnableMultiDatasetSync = false`.
2. Enable multi-dataset mode with only `primary` enabled.
3. Enable `logs`, then verify that `primary` still meets its sync SLO.
4. Enable `timeseries`, then verify the `primary` sync SLO again.
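The "verify primary sync SLO" checks in steps 3 and 4 can be automated as a gate before moving on. This sketch assumes you collect primary sync latency samples in milliseconds and treat a nearest-rank p95 at or under 2000 ms as passing; the metric, the target, and the `P95`/`PrimaryWithinSlo` helpers are hypothetical choices, not CBDDC defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nearest-rank 95th percentile of latency samples (milliseconds).
double P95(IReadOnlyList<double> samplesMs)
{
    if (samplesMs.Count == 0)
        throw new ArgumentException("need at least one sample", nameof(samplesMs));
    var sorted = samplesMs.OrderBy(x => x).ToArray();
    int rank = (95 * sorted.Length + 99) / 100; // ceil(0.95 * n) in integer math, 1-based
    return sorted[rank - 1];
}

// Hypothetical gate: advance only if primary's p95 stays within the target.
bool PrimaryWithinSlo(IReadOnlyList<double> primarySamplesMs, double sloMs = 2000) =>
    primarySamplesMs.Count > 0 && P95(primarySamplesMs) <= sloMs;

// Illustrative samples after enabling `logs`: 10, 20, ..., 1000 ms.
var afterLogs = Enumerable.Range(1, 100).Select(i => i * 10.0).ToList();
Console.WriteLine(P95(afterLogs));              // 950
Console.WriteLine(PrimaryWithinSlo(afterLogs)); // True
```

A percentile is used instead of a mean because tail latency is what a noisy neighbor dataset degrades first, and an average can hide a few slow syncs among many fast ones.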
## Rollback
- Set `EnableDatasetLogs = false` and `EnableDatasetTimeseries = false` first.
- If needed, set `EnableMultiDatasetSync = false` to return to the single `primary` sync path.