CBDDC/docs/features/multi-dataset-sync.md
Joseph Doherty 8e97061ab8
Implement in-process multi-dataset sync isolation across core, network, persistence, and tests
2026-02-22 11:58:34 -05:00

Multi-Dataset Sync

Summary

CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a datasetId (for example primary, logs, timeseries). Each dataset pipeline has independent oplog state, vector-clock reads, peer confirmation watermarks, and maintenance scheduling.

Why Use It

  • Keep primary business data sync latency stable during high telemetry volume.
  • Isolate append-only streams (logs, timeseries) from CRUD-heavy collections.
  • Roll out incrementally using runtime flags and per-dataset enablement.

Configuration

Register dataset options and enable the runtime coordinator:

services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
    {
        o.InterestingCollections = ["Users", "TodoLists"];
    })
    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
    {
        o.InterestingCollections = ["Logs"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
    {
        o.InterestingCollections = ["Timeseries"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();

services.AddCBDDCMultiDataset(options =>
{
    options.EnableMultiDatasetSync = true;
    options.EnableDatasetPrimary = true;
    options.EnableDatasetLogs = true;
    options.EnableDatasetTimeseries = true;
});

Wire and Storage Compatibility

  • Protocol messages include optional dataset_id fields.
  • Missing dataset_id is treated as primary.
  • Surreal persistence records include datasetId; legacy rows without datasetId are read as primary.
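
The defaulting rule above can be sketched as a one-line lookup. This is a minimal illustration, not CBDDC's actual implementation: `ResolveDatasetId` is a hypothetical helper name, and only the "missing means primary" behavior comes from this document.

```csharp
#nullable enable
using System;

// Hypothetical helper, not part of the CBDDC API: a missing dataset id
// (legacy protocol message or legacy persisted row) maps to "primary".
static string ResolveDatasetId(string? datasetId) => datasetId ?? "primary";

Console.WriteLine(ResolveDatasetId(null));    // legacy peer sent no dataset_id
Console.WriteLine(ResolveDatasetId("logs"));  // dataset-aware peer
```

Applying the same rule on both the wire path and the storage path is what would let upgraded and legacy nodes agree on which dataset an unlabeled record belongs to.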

Operational Notes

  • Each dataset runs its own SyncOrchestrator instance.
  • Maintenance pruning is dataset-scoped (datasetId + cutoff).
  • Snapshot APIs support dataset-scoped operations (CreateSnapshotAsync(stream, datasetId)).
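
To illustrate what dataset-scoped pruning (datasetId + cutoff) means, here is a small in-memory sketch. The tuple-based oplog and the `Prune` function are hypothetical stand-ins, not CBDDC types, and treating the cutoff as exclusive (strictly older entries are removed) is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory oplog; real CBDDC oplog entries have a different shape.
var oplog = new List<(string DatasetId, DateTimeOffset Timestamp, string Payload)>
{
    ("primary", new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero), "user-1"),
    ("logs",    new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero), "log-1"),
    ("logs",    new DateTimeOffset(2026, 2, 1, 0, 0, 0, TimeSpan.Zero), "log-2"),
};

// Removes only entries that belong to datasetId AND are older than cutoff.
// Returns the number of entries removed.
static int Prune(
    List<(string DatasetId, DateTimeOffset Timestamp, string Payload)> entries,
    string datasetId,
    DateTimeOffset cutoff) =>
    entries.RemoveAll(e => e.DatasetId == datasetId && e.Timestamp < cutoff);

var cutoff = new DateTimeOffset(2026, 1, 15, 0, 0, 0, TimeSpan.Zero);
int removed = Prune(oplog, "logs", cutoff);
Console.WriteLine($"{removed} removed, {oplog.Count} remaining");
```

The primary entry has the same timestamp as the pruned logs entry but survives, because the filter matches on both the dataset and the cutoff.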

Migration

  1. Deploy with EnableMultiDatasetSync = false.
  2. Enable multi-dataset mode with only primary enabled.
  3. Enable logs, then verify that the primary sync SLO still holds.
  4. Enable timeseries, then verify the primary sync SLO again.
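
As a configuration fragment, step 2 might look like the following. It reuses the option names from the Configuration section and assumes `services` is the same service collection used there. Setting the logs and timeseries flags explicitly to false is an assumption, made because this document does not state their defaults.

```csharp
// Step 2: multi-dataset mode is on, but only the primary dataset is enabled.
services.AddCBDDCMultiDataset(options =>
{
    options.EnableMultiDatasetSync = true;
    options.EnableDatasetPrimary = true;
    options.EnableDatasetLogs = false;        // flipped to true in step 3
    options.EnableDatasetTimeseries = false;  // flipped to true in step 4
});
```

Changing one flag per deployment keeps each stage attributable: if the primary sync SLO regresses, the most recently enabled dataset is the first suspect.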

Rollback

  • Set EnableDatasetLogs = false and EnableDatasetTimeseries = false first.
  • If needed, set EnableMultiDatasetSync = false to return to the single primary sync path.