# In-Process Multi-Dataset Sync Plan (Worktree Execution)

## Goal

Add true in-process multi-dataset sync so primary business data can sync independently from high-volume append-only datasets (logs, timeseries), with separate state, scheduling, and backpressure behavior.

## Desired Outcomes

1. Primary dataset sync throughput/latency is not materially impacted by telemetry dataset volume.
2. Log and timeseries datasets use independent sync pipelines in the same process.
3. Existing single-dataset apps continue to work with minimal or no code changes.
4. Test coverage explicitly verifies isolation and no cross-dataset leakage.

## Current Baseline (Why This Change Is Needed)

1. Current host wiring registers a single `IDocumentStore`, `IOplogStore`, and `ISyncOrchestrator` graph.
2. Collection filtering exists, but all collections still share one orchestrator/sync loop and one oplog/vector clock lifecycle.
3. The protocol filters by collection only; there is no dataset identity boundary.
4. Surreal schema objects have fixed names per configured namespace/database and are not dataset-aware by design.

## Proposed Target Architecture

### New Concepts

1. `DatasetId`:
   - Stable identifier (`primary`, `logs`, `timeseries`, etc.).
   - Included in all sync-state-bearing entities and wire messages.
2. `DatasetSyncContext`:
   - Encapsulates one dataset's services: document store adapter, oplog store, snapshot metadata, peer confirmation state, orchestrator configuration.
3. `IMultiDatasetSyncOrchestrator`:
   - Host-level coordinator that starts/stops one `ISyncOrchestrator` per dataset.
4. `DatasetSyncOptions`:
   - Per-dataset scheduling and limits (loop delay, max peers, optional bandwidth/entry caps, maintenance interval override).

### Isolation Model

1. Independent per-dataset oplog stream and vector clock.
2. Independent per-dataset peer confirmation watermarks for pruning.
3. Independent per-dataset transport filtering (handshake and pull/push include dataset id).
4. Independent per-dataset observability counters.
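The new concepts above might take a shape like the following minimal C# sketch. The type names (`DatasetId`, `DatasetSyncOptions`) come from this plan; the members beyond those listed are illustrative assumptions, not a final API:

```csharp
using System;

// Sketch only: DatasetId as a value object with a "primary" default.
// Member shapes are assumptions derived from the plan text.
public readonly record struct DatasetId(string Value)
{
    public static readonly DatasetId Primary = new("primary");

    // Treat a missing/empty wire value as the primary dataset
    // (the backward-compatibility rule used throughout this plan).
    public static DatasetId FromWire(string? value) =>
        string.IsNullOrEmpty(value) ? Primary : new DatasetId(value);

    public override string ToString() => Value;
}

// Per-dataset scheduling and limit knobs; names are hypothetical.
public sealed class DatasetSyncOptions
{
    public TimeSpan LoopDelay { get; init; } = TimeSpan.FromSeconds(5);
    public int MaxPeers { get; init; } = 8;
    public int? MaxEntriesPerCycle { get; init; }       // optional cap
    public TimeSpan? MaintenanceInterval { get; init; } // optional override
}
```

Modeling `DatasetId` as a value type (rather than bare strings) keeps the dataset boundary visible in every store and orchestrator signature, which supports the "no cross-dataset leakage" test goal.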
## Compatibility Strategy

1. Backward-compatible wire changes:
   - Add optional `dataset_id` fields; default to `"primary"` when absent.
2. Backward-compatible storage:
   - Add `datasetId` columns/fields where needed.
   - Existing rows default to `"primary"` during migration/read fallback.
3. API defaults:
   - Existing single-store registration maps to dataset `"primary"` with no functional change.

## Git Worktree Execution Plan

## 0. Worktree Preparation

1. Create the worktree and branch:
   - `git worktree add ../CBDDC-multidataset -b codex/multidataset-sync`
2. Build the baseline in the worktree:
   - `dotnet build CBDDC.slnx`
3. Capture baseline tests (save the output artifact in the worktree):
   - `dotnet test CBDDC.slnx`

Deliverable:

1. Clean baseline build/test result captured before changes.

## 1. Design and Contract Layer

### Code Changes

1. Add dataset contracts in `src/ZB.MOM.WW.CBDDC.Core`:
   - `DatasetId` value object or constants.
   - `DatasetSyncOptions`.
   - `IDatasetSyncContext`/`IMultiDatasetSyncOrchestrator`.
2. Extend domain models where sync identity is required:
   - `OplogEntry`: add `DatasetId` (constructor defaults to `"primary"`).
   - Any metadata types used for causal state/pruning that need dataset partitioning.
3. Extend store interfaces (minimally invasive):
   - Keep existing methods as compatibility overloads.
   - Add dataset-aware variants where cross-dataset ambiguity exists.

### Test Work

1. Add Core unit tests:
   - `OplogEntry` hash stability with `DatasetId`.
   - Defaulting behavior to `"primary"`.
   - Equality/serialization behavior for dataset-aware records.
2. Update existing Core tests that construct `OplogEntry` directly.

Exit Criteria:

1. Core tests compile and pass with default dataset behavior unchanged.

## 2. Persistence Partitioning (Surreal)

### Code Changes

1. Add a dataset partition key to persistence records:
   - Oplog rows.
   - Document metadata rows.
   - Snapshot metadata rows (if used in dataset-scoped recoveries).
   - Peer confirmation records.
   - CDC checkpoints (the consumer id should include the dataset id, or add a dedicated field).
2. Update the schema initializer:
   - Add `datasetId` fields and composite indexes (`datasetId` + existing key dimensions).
3. Update queries in all Surreal stores:
   - Enforce the dataset filter in every select/update/delete path.
   - Guard against full-table scans that omit the dataset filter.
4. Add migration/read fallback:
   - If `datasetId` is missing on older records, treat it as `"primary"` during transitional reads.

### Test Work

1. Extend `SurrealStoreContractTests`:
   - Write records in two datasets and verify strict isolation.
   - Verify prune/merge/export/import scoped by dataset.
2. Add regression tests:
   - Legacy records without `datasetId` load as `"primary"` only.
3. Update durability tests:
   - CDC checkpoints do not collide between datasets.

Exit Criteria:

1. Persistence tests prove no cross-dataset reads/writes.

## 3. Network Protocol Dataset Awareness

### Code Changes

1. Update `sync.proto` (backward compatible):
   - Add `dataset_id` to `HandshakeRequest`, `HandshakeResponse`, `PullChangesRequest`, `PushChangesRequest`, and optionally the snapshot requests.
2. Regenerate protocol classes and adapt transport handlers:
   - `TcpPeerClient` sends the dataset id for every dataset pipeline.
   - `TcpSyncServer` routes requests to the correct dataset context.
3. Defaulting rules:
   - Missing/empty `dataset_id` => `"primary"`.
4. Add explicit rejection semantics:
   - If the remote peer does not support the requested dataset, return an accepted handshake with a dataset capability mismatch response path (or reject per dataset connection).

### Test Work

1. Add protocol-level unit tests:
   - Message parse/serialize with and without the dataset field.
2. Update network tests:
   - Handshake stores remote interests per dataset.
   - Pull/push operations do not cross datasets.
   - Backward compatibility when no dataset id is present.

Exit Criteria:

1. Network tests pass for both new and legacy message shapes.
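The wire change can be sketched as a backward-compatible `sync.proto` addition. The field number below is a placeholder assumption; the real schema must pick an unused number per message:

```protobuf
// Sketch of the additive change. Only the new field is shown;
// existing fields are elided.
message HandshakeRequest {
  // ...existing fields...
  string dataset_id = 15;  // absent/empty => "primary"
}

message PullChangesRequest {
  // ...existing fields...
  string dataset_id = 15;  // absent/empty => "primary"
}
```

In proto3 an unset string field deserializes as the empty string, and unknown fields are ignored by older readers, so this addition stays wire-compatible and the empty value can drive the `"primary"` defaulting rule above.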
## 4. Multi-Orchestrator Runtime and DI

### Code Changes

1. Add multi-dataset DI registration extensions:
   - `AddCBDDCSurrealEmbeddedDataset(...)`
   - `AddCBDDCMultiDataset(...)`
2. Build `MultiDatasetSyncOrchestrator`:
   - Start/stop orchestrators for configured datasets.
   - Isolated cancellation tokens, loops, and failure handling per dataset.
3. Ensure hosting services (`CBDDCNodeService`, `TcpSyncServerHostedService`) initialize dataset contexts deterministically.
4. Add per-dataset knobs:
   - Sync interval, max entries per cycle, maintenance interval, optional parallelism limits.

### Test Work

1. Add Hosting tests:
   - Multiple datasets register/start/stop cleanly.
   - A failure in one dataset does not stop the others.
2. Add orchestrator tests:
   - Scheduling fairness and per-dataset failure backoff isolation.
3. Update `NoOp`/fallback tests for multi-dataset mode.

Exit Criteria:

1. The runtime starts N dataset pipelines with independent lifecycle behavior.

## 5. Snapshot and Recovery Semantics

### Code Changes

1. Define snapshot scope options:
   - Per-dataset snapshot and full multi-dataset snapshot.
2. Update snapshot service APIs and implementations to support:
   - Export/import/merge by dataset id.
3. Ensure emergency recovery paths in the orchestrator are dataset-scoped.

### Test Work

1. Add snapshot tests:
   - Replace/merge for one dataset leaves the others untouched.
2. Update reconnect regression tests:
   - The snapshot-required flow only affects the targeted dataset pipeline.

Exit Criteria:

1. Recovery operations preserve dataset isolation.

## 6. Sample App and Developer Experience

### Code Changes

1. Add sample configuration for three datasets:
   - `primary`, `logs`, `timeseries`.
2. Implement append-only sample stores for `logs` and `timeseries`.
3. Expose sample CLI commands to emit load independently per dataset.

### Test Work

1. Add sample integration tests:
   - Heavy append load on logs/timeseries does not significantly delay primary data convergence.
2. Add benchmark harness cases:
   - Single-dataset baseline vs. multi-dataset under telemetry load.

Exit Criteria:

1. Demonstrable isolation in the sample workload.

## 7. Documentation and Migration Guides

### Code/Docs Changes

1. New doc: `docs/features/multi-dataset-sync.md`.
2. Update:
   - `docs/architecture.md`
   - `docs/persistence-providers.md`
   - `docs/runbook.md`
3. Add migration notes:
   - From a single pipeline to a multi-dataset configuration.
   - Backward compatibility and rollout toggles.

### Test Work

1. Doc examples compile check (if applicable).
2. Add config parsing tests for dataset option sections.

Exit Criteria:

1. Operators have explicit rollout and rollback steps.

## 8. Rollout Strategy (Safe Adoption)

1. Feature flags:
   - `EnableMultiDatasetSync` (global).
   - `EnableDatasetPrimary/Logs/Timeseries`.
2. Rollout sequence:
   - Stage 1: Deploy with the flag off.
   - Stage 2: Enable `primary` only in the new runtime path.
   - Stage 3: Enable `logs`, then `timeseries`.
3. Observability gates:
   - The primary sync latency SLO must remain within threshold before enabling telemetry datasets.

## 9. Test Plan (Comprehensive Coverage Matrix)

### Unit Tests

1. Core model defaults and hash behavior with the dataset id.
2. Dataset routing logic in the orchestrator dispatcher.
3. Protocol adapters default `dataset_id` to `"primary"` when absent.
4. Persistence query builders always include the dataset predicate.

### Integration Tests

1. Surreal stores:
   - The same key/collection in different datasets remains isolated.
2. Network:
   - Pull/push with mixed datasets never cross-stream.
3. Hosting:
   - Independent orchestrator lifecycle and failure isolation.

### E2E Tests

1. Multi-node cluster:
   - Primary converges under heavy append-only telemetry load.
2. Snapshot/recovery:
   - A dataset-scoped restore preserves the other datasets.
3. Backward compatibility:
   - A legacy node (no dataset id) interoperates on `"primary"`.

### Non-Functional Tests

1. Throughput and latency benchmarks:
   - Compare primary p95 sync lag before/after.
2. Resource isolation:
   - CPU/memory pressure from telemetry datasets should not break the primary SLO.

## Test Update Checklist (Existing Tests to Modify)

1. `tests/ZB.MOM.WW.CBDDC.Core.Tests`:
   - Update direct `OplogEntry` constructions.
2. `tests/ZB.MOM.WW.CBDDC.Network.Tests`:
   - Handshake/connection/vector-clock tests for dataset-aware flows.
3. `tests/ZB.MOM.WW.CBDDC.Hosting.Tests`:
   - Add multi-dataset startup/shutdown/failure cases.
4. `tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests`:
   - Extend Surreal contract and durability tests for dataset partitioning.
5. `tests/ZB.MOM.WW.CBDDC.E2E.Tests`:
   - Add multi-dataset convergence + interference tests.

## Worktree Task Breakdown (Execution Order)

1. `Phase-A`: Contracts + Core model updates + unit tests.
2. `Phase-B`: Surreal schema/store partitioning + persistence tests.
3. `Phase-C`: Protocol and network routing + network tests.
4. `Phase-D`: Multi-orchestrator DI/runtime + hosting tests.
5. `Phase-E`: Snapshot/recovery updates + regression tests.
6. `Phase-F`: Sample/bench/docs + end-to-end verification.

Each phase should be committed separately in the worktree to keep reviewable deltas.

## Validation Commands (Run in Worktree)

1. `dotnet build /Users/dohertj2/Desktop/CBDDC/CBDDC.slnx`
2. `dotnet test /Users/dohertj2/Desktop/CBDDC/CBDDC.slnx`
3. Focused suites during implementation:
   - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Core.Tests/ZB.MOM.WW.CBDDC.Core.Tests.csproj`
   - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/ZB.MOM.WW.CBDDC.Network.Tests.csproj`
   - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Hosting.Tests/ZB.MOM.WW.CBDDC.Hosting.Tests.csproj`
   - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests.csproj`
   - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ZB.MOM.WW.CBDDC.E2E.Tests.csproj`
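The per-dataset knobs (phase 4) and feature flags (phase 8) will likely surface to operators as configuration. A hypothetical `appsettings.json` fragment, purely as a sketch; the section and key names are assumptions, not a final schema:

```json
{
  "CBDDC": {
    "EnableMultiDatasetSync": true,
    "Datasets": {
      "primary":    { "SyncInterval": "00:00:02", "MaxEntriesPerCycle": 500 },
      "logs":       { "SyncInterval": "00:00:15", "MaxEntriesPerCycle": 5000 },
      "timeseries": { "SyncInterval": "00:00:30", "MaxEntriesPerCycle": 10000 }
    }
  }
}
```

A shape like this would let the config parsing tests (phase 7) and the rollout flags (phase 8) share one option section per dataset.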
## Definition of Done

1. Multi-dataset mode runs `primary`, `logs`, and `timeseries` in one process with independent sync paths.
2. No cross-dataset data movement in persistence, protocol, or runtime.
3. Existing single-dataset usage still works via the default `"primary"` dataset.
4. Added/updated unit, integration, and E2E tests pass in CI.
5. Docs include migration and operational guidance.
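As a closing illustration of the target developer experience, a hypothetical host wiring sketch using the extension method names from this plan; the lambda shapes and option members are assumptions, not a committed API:

```csharp
using System;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Multi-dataset mode: one isolated pipeline per dataset, each with its
// own schedule. Existing single-store registration (not shown) stays
// unchanged and implicitly maps to the "primary" dataset.
builder.Services.AddCBDDCMultiDataset(multi =>
{
    multi.AddCBDDCSurrealEmbeddedDataset("primary",
        o => o.LoopDelay = TimeSpan.FromSeconds(2));
    multi.AddCBDDCSurrealEmbeddedDataset("logs",
        o => o.LoopDelay = TimeSpan.FromSeconds(15));
    multi.AddCBDDCSurrealEmbeddedDataset("timeseries",
        o => o.LoopDelay = TimeSpan.FromSeconds(30));
});

builder.Build().Run();
```

Keeping dataset registration additive like this is what lets stages 2 and 3 of the rollout enable `logs` and `timeseries` without touching the `primary` pipeline.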