Implement in-process multi-dataset sync isolation across core, network, persistence, and tests
All checks were successful
NuGet Package Publish / nuget (push) Successful in 1m14s
67
docs/features/multi-dataset-sync.md
Normal file
@@ -0,0 +1,67 @@
# Multi-Dataset Sync

## Summary

CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a `datasetId` (for example `primary`, `logs`, `timeseries`).
Each dataset pipeline has independent oplog state, vector-clock reads, peer confirmation watermarks, and maintenance scheduling.
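
The isolation described above can be pictured as state that is looked up by `datasetId` and never shared between pipelines. The following is only a minimal sketch of that idea: `DatasetState` and `DatasetStateRegistry` are hypothetical names invented for illustration and are not CBDDC types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: these are not CBDDC types.
public sealed class DatasetState
{
    // Each dataset gets its own oplog and vector clock instance.
    public List<string> Oplog { get; } = new();
    public Dictionary<string, long> VectorClock { get; } = new();
}

public sealed class DatasetStateRegistry
{
    private readonly Dictionary<string, DatasetState> _states =
        new(StringComparer.Ordinal);

    // Returns the state for one dataset, creating it on first use.
    public DatasetState For(string datasetId)
    {
        if (!_states.TryGetValue(datasetId, out var state))
        {
            state = new DatasetState();
            _states[datasetId] = state;
        }
        return state;
    }
}
```

In this sketch, appending to the `logs` oplog cannot touch the `primary` oplog because the two are separate objects.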

## Why Use It

- Keep primary business data sync latency stable during high telemetry volume.
- Isolate append-only streams (`logs`, `timeseries`) from CRUD-heavy collections.
- Roll out incrementally using runtime flags and per-dataset enablement.

## Configuration

Register dataset options and enable the runtime coordinator:

```csharp
// Register the embedded store, then one dataset pipeline per datasetId.
services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
    {
        o.InterestingCollections = ["Users", "TodoLists"];
    })
    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
    {
        o.InterestingCollections = ["Logs"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
    {
        o.InterestingCollections = ["Timeseries"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();

// Enable the runtime coordinator and each dataset individually.
services.AddCBDDCMultiDataset(options =>
{
    options.EnableMultiDatasetSync = true;
    options.EnableDatasetPrimary = true;
    options.EnableDatasetLogs = true;
    options.EnableDatasetTimeseries = true;
});
```

## Wire and Storage Compatibility

- Protocol messages include optional `dataset_id` fields.
- A missing `dataset_id` is treated as `primary`.
- Surreal persistence records include `datasetId`; legacy rows without `datasetId` are read as `primary`.
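
The fallback rule in the list above amounts to normalizing the identifier at the boundary where a message or row is read. A minimal sketch, assuming a hypothetical helper named `DatasetIds` (not part of CBDDC) and assuming that an empty or whitespace value should be treated the same as a missing one:

```csharp
using System;

// Hypothetical helper for illustration; not a CBDDC API.
public static class DatasetIds
{
    public const string Primary = "primary";

    // Maps a missing dataset id to "primary".
    // Assumption: empty or whitespace-only values count as missing.
    public static string Normalize(string datasetId) =>
        string.IsNullOrWhiteSpace(datasetId) ? Primary : datasetId;
}
```

Applying one such function on both the wire path and the persistence path would keep legacy peers and legacy rows on the same `primary` dataset.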

## Operational Notes

- Each dataset runs its own `SyncOrchestrator` instance.
- Maintenance pruning is dataset-scoped (`datasetId` + cutoff).
- Snapshot APIs support dataset-scoped operations (`CreateSnapshotAsync(stream, datasetId)`).
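
To make "dataset-scoped (`datasetId` + cutoff)" concrete, here is a minimal in-memory sketch of a prune that only removes old entries belonging to one dataset. The `OplogEntry` record and `OplogPruning` class are hypothetical; the real pruning runs against the persistence layer and its exact cutoff semantics are not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry shape for illustration; not the CBDDC oplog type.
public sealed record OplogEntry(string DatasetId, DateTimeOffset Timestamp);

public static class OplogPruning
{
    // Keeps every entry from other datasets, plus entries of the
    // target dataset whose timestamp is at or after the cutoff.
    public static List<OplogEntry> Prune(
        IEnumerable<OplogEntry> entries,
        string datasetId,
        DateTimeOffset cutoff) =>
        entries
            .Where(e => e.DatasetId != datasetId || e.Timestamp >= cutoff)
            .ToList();
}
```

Pruning `logs` with an aggressive cutoff leaves `primary` entries untouched in this sketch, regardless of their age.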

## Migration

1. Deploy with `EnableMultiDatasetSync = false`.
2. Enable multi-dataset mode with only `primary` enabled.
3. Enable `logs`, then verify the primary sync SLO.
4. Enable `timeseries`, then verify the primary sync SLO again.

## Rollback

- Set `EnableDatasetLogs = false` and `EnableDatasetTimeseries = false` first.
- If needed, set `EnableMultiDatasetSync = false` to return to the single `primary` sync path.
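
The migration and rollback steps can be read as a mapping from flag values to the set of active dataset pipelines. The sketch below illustrates that mapping under stated assumptions: `MultiDatasetFlags` is a hypothetical stand-in that mirrors the flag names used in this document, and `DatasetRollout.ActiveDatasets` is an invented helper, not the coordinator's actual logic.

```csharp
using System.Collections.Generic;

// Hypothetical mirror of the flags named in this document.
public sealed class MultiDatasetFlags
{
    public bool EnableMultiDatasetSync { get; set; }
    public bool EnableDatasetPrimary { get; set; } = true;
    public bool EnableDatasetLogs { get; set; }
    public bool EnableDatasetTimeseries { get; set; }
}

public static class DatasetRollout
{
    // Returns the dataset ids that would run for a given flag combination.
    public static IReadOnlyList<string> ActiveDatasets(MultiDatasetFlags flags)
    {
        // With the master flag off, only the single primary path runs.
        if (!flags.EnableMultiDatasetSync)
            return new[] { "primary" };

        var active = new List<string>();
        if (flags.EnableDatasetPrimary) active.Add("primary");
        if (flags.EnableDatasetLogs) active.Add("logs");
        if (flags.EnableDatasetTimeseries) active.Add("timeseries");
        return active;
    }
}
```

Under these assumptions, migration step 2 yields only `primary`, step 3 yields `primary` and `logs`, and turning `EnableMultiDatasetSync` off yields `primary` again whatever the per-dataset flags say.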