Implement in-process multi-dataset sync isolation across core, network, persistence, and tests

2026-02-22 11:58:34 -05:00
parent c06b56172a
commit 8e97061ab8
60 changed files with 4519 additions and 559 deletions
--- a/docs/features/multi-dataset-sync.md
+++ b/docs/features/multi-dataset-sync.md
@@ -0,0 +1,67 @@
+# Multi-Dataset Sync
+
+## Summary
+
+CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a `datasetId` (for example `primary`, `logs`, `timeseries`).
+Each dataset pipeline has independent oplog state, vector-clock reads, peer confirmation watermarks, and maintenance scheduling.
+
+## Why Use It
+
+- Keep sync latency for primary business data stable under high telemetry volume.
+- Isolate append-only streams (`logs`, `timeseries`) from CRUD-heavy collections.
+- Roll out incrementally using runtime flags and per-dataset enablement.
+
+## Configuration
+
+Register dataset options and enable the runtime coordinator:
+
+```csharp
+services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
+    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
+    {
+        o.InterestingCollections = ["Users", "TodoLists"];
+    })
+    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
+    {
+        o.InterestingCollections = ["Logs"];
+        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
+    })
+    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
+    {
+        o.InterestingCollections = ["Timeseries"];
+        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
+    })
+    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();
+
+services.AddCBDDCMultiDataset(options =>
+{
+    options.EnableMultiDatasetSync = true;
+    options.EnableDatasetPrimary = true;
+    options.EnableDatasetLogs = true;
+    options.EnableDatasetTimeseries = true;
+});
+```
+
+## Wire and Storage Compatibility
+
+- Protocol messages include optional `dataset_id` fields.
+- Missing `dataset_id` is treated as `primary`.
+- Surreal persistence records include `datasetId`; legacy rows without `datasetId` are read as `primary`.
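+
+The defaulting rule above can be sketched as a small helper. `DatasetIdDefaults` and `Normalize` are hypothetical names used only for illustration and are not taken from the CBDDC codebase; treating a blank value the same as a missing one is also an assumption.
+
+```csharp
+using System;
+
+// Hypothetical helper (not part of CBDDC): illustrates the defaulting rule only.
+public static class DatasetIdDefaults
+{
+    public const string Primary = "primary";
+
+    // Maps a missing dataset id (absent wire field or legacy row) to "primary".
+    // Assumption: blank strings are treated the same as null.
+    public static string Normalize(string? datasetId) =>
+        string.IsNullOrWhiteSpace(datasetId) ? Primary : datasetId;
+}
+```
+
+Applying one such function at both the protocol and persistence boundaries would let the rest of the pipeline assume a non-empty `datasetId`.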
+
+## Operational Notes
+
+- Each dataset runs its own `SyncOrchestrator` instance.
+- Maintenance pruning is dataset-scoped (`datasetId` + cutoff).
+- Snapshot APIs support dataset-scoped operations (`CreateSnapshotAsync(stream, datasetId)`).
+
+## Migration
+
+1. Deploy with `EnableMultiDatasetSync = false`.
+2. Enable multi-dataset mode with only `primary` enabled.
+3. Enable `logs`, then verify that the primary sync SLO still holds.
+4. Enable `timeseries`, then verify the primary sync SLO again.
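+
+As an illustration, step 2 might correspond to the following flag settings. This is a sketch that reuses only the option names from the Configuration section; the actual defaults for each flag are not stated here, so every flag is set explicitly.
+
+```csharp
+services.AddCBDDCMultiDataset(options =>
+{
+    options.EnableMultiDatasetSync = true;
+    options.EnableDatasetPrimary = true;
+    options.EnableDatasetLogs = false;       // set to true in step 3
+    options.EnableDatasetTimeseries = false; // set to true in step 4
+});
+```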
+
+## Rollback
+
+- Set `EnableDatasetLogs = false` and `EnableDatasetTimeseries = false` first.
+- If needed, set `EnableMultiDatasetSync = false` to return to the single `primary` sync path.