# Multi-Dataset Sync

## Summary

CBDDC can run multiple sync pipelines inside one process by assigning each pipeline a `datasetId` (for example `primary`, `logs`, `timeseries`).
Each dataset pipeline has independent oplog state, vector-clock reads, peer confirmation watermarks, and maintenance scheduling.

## Why Use It

- Keep sync latency for primary business data stable when telemetry volume is high.
- Isolate append-only streams (`logs`, `timeseries`) from CRUD-heavy collections.
- Roll out incrementally using runtime flags and per-dataset enablement.

## Configuration

Register dataset options and enable the runtime coordinator:

```csharp
// `options` is assumed to be the embedded Surreal options instance built earlier in startup.
services.AddCBDDCSurrealEmbedded<SampleDocumentStore>(sp => options)
    .AddCBDDCSurrealEmbeddedDataset("primary", o =>
    {
        o.InterestingCollections = ["Users", "TodoLists"];
    })
    .AddCBDDCSurrealEmbeddedDataset("logs", o =>
    {
        o.InterestingCollections = ["Logs"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCSurrealEmbeddedDataset("timeseries", o =>
    {
        o.InterestingCollections = ["Timeseries"];
        o.SyncLoopDelay = TimeSpan.FromMilliseconds(500);
    })
    .AddCBDDCNetwork<StaticPeerNodeConfigurationProvider>();

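// The lambda parameter is named `multiDataset` so it does not clash with the
// `options` variable captured by AddCBDDCSurrealEmbedded above.
services.AddCBDDCMultiDataset(multiDataset =>
{
    multiDataset.EnableMultiDatasetSync = true;   // master switch for the coordinator
    multiDataset.EnableDatasetPrimary = true;
    multiDataset.EnableDatasetLogs = true;
    multiDataset.EnableDatasetTimeseries = true;
});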
```

## Wire and Storage Compatibility

- Protocol messages include optional `dataset_id` fields.
- Missing `dataset_id` is treated as `primary`.
- Surreal persistence records include `datasetId`; legacy rows without `datasetId` are read as `primary`.
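
The fallback rule above can be sketched as a small normalization helper. This is only an illustration: `DatasetIds` and `Normalize` are hypothetical names, not part of the CBDDC API, and treating empty or whitespace-only values the same as a missing value is an assumption beyond what the rule states.

```csharp
using System;

// Hypothetical helper, not part of the CBDDC API: illustrates the
// "missing dataset_id means primary" compatibility rule.
public static class DatasetIds
{
    public const string Primary = "primary";

    // Returns the dataset a wire message or stored row belongs to.
    // Null falls back to "primary"; empty and whitespace-only values are
    // also mapped to "primary" (an assumption made for this sketch).
    public static string Normalize(string? datasetId) =>
        string.IsNullOrWhiteSpace(datasetId) ? Primary : datasetId;
}
```

With a helper like this, a legacy peer that never sends `dataset_id` and a legacy row that has no `datasetId` both resolve to the same pipeline as an explicit `primary`.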

## Operational Notes

- Each dataset runs its own `SyncOrchestrator` instance.
- Maintenance pruning is dataset-scoped: each prune is keyed by a `datasetId` and a cutoff, so pruning one dataset never removes another dataset's entries.
- Snapshot APIs support dataset-scoped operations (`CreateSnapshotAsync(stream, datasetId)`).
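
To make the `datasetId` + cutoff scoping concrete, here is a minimal in-memory sketch. `OplogEntry` and `DatasetPruning` are hypothetical types invented for this example; the real Surreal-backed store is not shown and may work differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory oplog entry used only for this illustration.
public sealed record OplogEntry(string DatasetId, DateTimeOffset Timestamp, string Key);

public static class DatasetPruning
{
    // Removes entries that belong to `datasetId` and are strictly older than `cutoff`.
    // Entries from other datasets are left untouched regardless of age.
    // Returns the number of entries removed.
    public static int Prune(List<OplogEntry> oplog, string datasetId, DateTimeOffset cutoff) =>
        oplog.RemoveAll(e => e.DatasetId == datasetId && e.Timestamp < cutoff);
}
```

The point of the sketch is that the predicate always includes the dataset check, so an aggressive cutoff on `logs` cannot affect `primary` history.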

## Migration

1. Deploy with `EnableMultiDatasetSync = false`.
2. Enable multi-dataset mode with only `primary` enabled.
3. Enable `logs`, then verify that the primary sync SLO still holds.
4. Enable `timeseries`, then verify the primary sync SLO again.
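
As a sketch of step 2, the flags from the Configuration section would look roughly like this. It reuses only the option names shown earlier, and it assumes that setting a per-dataset flag to `false` keeps that dataset's pipeline from starting.

```csharp
// Step 2 of the rollout: multi-dataset mode on, only `primary` active.
services.AddCBDDCMultiDataset(multiDataset =>
{
    multiDataset.EnableMultiDatasetSync = true;
    multiDataset.EnableDatasetPrimary = true;
    multiDataset.EnableDatasetLogs = false;        // set to true in step 3
    multiDataset.EnableDatasetTimeseries = false;  // set to true in step 4
});
```

Changing one flag per step keeps each deployment small, so a regression in the primary sync SLO can be attributed to the dataset that was just enabled.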

## Rollback

- Set `EnableDatasetLogs = false` and `EnableDatasetTimeseries = false` first.
- If needed, set `EnableMultiDatasetSync = false` to return to the single `primary` sync path.
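
One way to make rollback a configuration change instead of a code change is to read the flags from `IConfiguration`. This is a sketch under assumptions: the `CBDDC:MultiDataset:*` keys are hypothetical names chosen for this example, `configuration` is assumed to be the application's `IConfiguration` instance, and the values are read once at startup, so a change takes effect after a restart.

```csharp
using Microsoft.Extensions.Configuration;

// Hypothetical configuration keys; CBDDC does not define them.
services.AddCBDDCMultiDataset(multiDataset =>
{
    // Defaults are the fully rolled-back state: single `primary` sync path.
    multiDataset.EnableMultiDatasetSync =
        configuration.GetValue<bool>("CBDDC:MultiDataset:Enabled", false);
    multiDataset.EnableDatasetPrimary = true;
    multiDataset.EnableDatasetLogs =
        configuration.GetValue<bool>("CBDDC:MultiDataset:Logs", false);
    multiDataset.EnableDatasetTimeseries =
        configuration.GetValue<bool>("CBDDC:MultiDataset:Timeseries", false);
});
```

Defaulting every optional flag to `false` means a missing or deleted setting falls back to the safest configuration.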