# In-Process Multi-Dataset Sync Plan (Worktree Execution)

## Goal

Add true in-process multi-dataset sync so primary business data can sync independently from high-volume append-only datasets (logs, timeseries), with separate state, scheduling, and backpressure behavior.

## Desired Outcome

- Primary dataset sync throughput/latency is not materially impacted by telemetry dataset volume.
- Log and timeseries datasets use independent sync pipelines in the same process.
- Existing single-dataset apps continue to work with minimal or no code changes.
- Test coverage explicitly verifies isolation and no cross-dataset leakage.
## Current Baseline (Why This Change Is Needed)

- Current host wiring registers a single `IDocumentStore`, `IOplogStore`, and `ISyncOrchestrator` graph.
- Collection filtering exists, but all collections still share one orchestrator/sync loop and one oplog/vector clock lifecycle.
- The protocol filters by collection only; there is no dataset identity boundary.
- Surreal schema objects have fixed names per configured namespace/database and are not dataset-aware by design.
## Proposed Target Architecture

### New Concepts

- `DatasetId`:
  - Stable identifier (`primary`, `logs`, `timeseries`, etc.).
  - Included in all sync-state-bearing entities and wire messages.
- `DatasetSyncContext`:
  - Encapsulates one dataset's services: document store adapter, oplog store, snapshot metadata, peer confirmation state, orchestrator configuration.
- `IMultiDatasetSyncOrchestrator`:
  - Host-level coordinator that starts/stops one `ISyncOrchestrator` per dataset.
- `DatasetSyncOptions`:
  - Per-dataset scheduling and limits (loop delay, max peers, optional bandwidth/entry caps, maintenance interval override).
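A minimal C# sketch of these contracts (names follow the plan; members and defaults are illustrative assumptions, not the actual CBDDC API):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stable dataset identity, carried by sync-state-bearing entities and wire messages.
public readonly record struct DatasetId(string Value)
{
    public static readonly DatasetId Primary = new("primary");
    public override string ToString() => Value;
}

// Per-dataset scheduling and limits; concrete defaults here are illustrative.
public sealed class DatasetSyncOptions
{
    public TimeSpan LoopDelay { get; set; } = TimeSpan.FromSeconds(5);
    public int MaxPeers { get; set; } = 8;
    public int? MaxEntriesPerCycle { get; set; }      // optional entry cap
    public TimeSpan? MaintenanceInterval { get; set; } // optional override
}

// Host-level coordinator: one underlying orchestrator per configured dataset.
public interface IMultiDatasetSyncOrchestrator
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);
}
```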
### Isolation Model

- Independent per-dataset oplog stream and vector clock.
- Independent per-dataset peer confirmation watermarks for pruning.
- Independent per-dataset transport filtering (handshake and pull/push include the dataset id).
- Independent per-dataset observability counters.
### Compatibility Strategy

- Backward compatible wire changes:
  - Add optional `dataset_id` fields; default to `"primary"` when absent.
- Backward compatible storage:
  - Add `datasetId` columns/fields where needed.
  - Existing rows default to `"primary"` during migration/read fallback.
- API defaults:
  - Existing single-store registration maps to dataset `"primary"` with no functional change.
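The defaulting rule above can be sketched as a small normalization helper (a hypothetical name; the real code may fold this into the dataset id type itself):

```csharp
// Sketch of the compatibility defaulting rule: any absent/empty dataset id
// (legacy rows, legacy peers, legacy registrations) maps to "primary".
public static class DatasetIdDefaults
{
    public const string Primary = "primary";

    public static string Normalize(string? datasetId) =>
        string.IsNullOrWhiteSpace(datasetId) ? Primary : datasetId;
}
```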
## Git Worktree Execution Plan

### 0. Worktree Preparation

- Create worktree and branch:
  `git worktree add ../CBDDC-multidataset -b codex/multidataset-sync`
- Build baseline in worktree:
  `dotnet build CBDDC.slnx`
- Capture baseline tests (save output artifact in worktree):
  `dotnet test CBDDC.slnx`

Deliverable:
- Clean baseline build/test result captured before changes.
### 1. Design and Contract Layer

Code Changes
- Add dataset contracts in `src/ZB.MOM.WW.CBDDC.Core`:
  - `DatasetId` value object or constants.
  - `DatasetSyncOptions`.
  - `IDatasetSyncContext` / `IMultiDatasetSyncOrchestrator`.
- Extend domain models where sync identity is required:
  - `OplogEntry`: add `DatasetId` (constructor defaults to `"primary"`).
  - Any metadata types used for causal state/pruning that need dataset partitioning.
- Extend store interfaces (minimally invasive):
  - Keep existing methods as compatibility overloads.
  - Add dataset-aware variants where cross-dataset ambiguity exists.

Test Work
- Add Core unit tests:
  - `OplogEntry` hash stability with `DatasetId`.
  - Defaulting behavior to `"primary"`.
  - Equality/serialization behavior for dataset-aware records.
- Update existing Core tests that construct `OplogEntry` directly.

Exit Criteria:
- Core tests compile and pass with default dataset behavior unchanged.
### 2. Persistence Partitioning (Surreal)

Code Changes
- Add dataset partition key to persistence records:
  - Oplog rows.
  - Document metadata rows.
  - Snapshot metadata rows (if used in dataset-scoped recoveries).
  - Peer confirmation records.
  - CDC checkpoints (the consumer id should include the dataset id, or add a dedicated field).
- Update schema initializer:
  - Add `datasetId` fields and composite indexes (`datasetId` + existing key dimensions).
- Update queries in all Surreal stores:
  - Enforce the dataset filter in every select/update/delete path.
  - Guard against full-table scans that omit the dataset filter.
- Add migration/read fallback:
  - If `datasetId` is missing on older records, treat it as `"primary"` during transitional reads.

Test Work
- Extend `SurrealStoreContractTests`:
  - Write records in two datasets and verify strict isolation.
  - Verify prune/merge/export/import scoped by dataset.
- Add regression tests:
  - Legacy records without `datasetId` load as `"primary"` only.
- Update durability tests:
  - CDC checkpoints do not collide between datasets.

Exit Criteria:
- Persistence tests prove no cross-dataset reads/writes.
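The schema change above might look like the following SurrealQL sketch (table, field, and index names are illustrative, not the project's actual schema):

```surql
-- Dataset partition key with a read-fallback-friendly default for legacy rows.
DEFINE FIELD datasetId ON TABLE oplog TYPE string DEFAULT "primary";

-- Composite index: dataset id leads, followed by the existing key dimensions,
-- so every dataset-filtered query stays off the full-table-scan path.
DEFINE INDEX oplog_dataset_seq ON TABLE oplog COLUMNS datasetId, nodeId, sequence;
```

Leading the composite index with `datasetId` keeps per-dataset scans cheap even when one dataset (e.g. telemetry) dominates row counts.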
### 3. Network Protocol Dataset Awareness

Code Changes
- Update `sync.proto` (backward compatible):
  - Add `dataset_id` to `HandshakeRequest`, `HandshakeResponse`, `PullChangesRequest`, `PushChangesRequest`, and optionally the snapshot requests.
- Regenerate protocol classes and adapt transport handlers:
  - `TcpPeerClient` sends the dataset id for every dataset pipeline.
  - `TcpSyncServer` routes requests to the correct dataset context.
- Defaulting rules:
  - Missing/empty `dataset_id` => `"primary"`.
- Add explicit rejection semantics:
  - If the remote peer does not support a requested dataset, accept the handshake but respond through a dataset-capability-mismatch path (or reject the per-dataset connection).

Test Work
- Add protocol-level unit tests:
  - Message parse/serialize with and without the dataset field.
- Update network tests:
  - Handshake stores remote interests per dataset.
  - Pull/push operations do not cross datasets.
  - Backward compatibility with no dataset id present.

Exit Criteria:
- Network tests pass for both new and legacy message shapes.
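The wire change could be sketched in `sync.proto` as follows (the field number and surrounding message layout are illustrative; pick an unused number in the real file):

```proto
syntax = "proto3";

message HandshakeRequest {
  // ...existing fields unchanged...

  // New optional field. In proto3 an unset string deserializes as "",
  // which receivers treat as "primary" — so legacy peers that never
  // send it interoperate on the primary dataset without changes.
  string dataset_id = 15;
}
```

Because proto3 ignores unknown fields, a legacy node receiving this message also parses it cleanly, which is what makes the change backward compatible in both directions.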
### 4. Multi-Orchestrator Runtime and DI

Code Changes
- Add multi-dataset DI registration extensions:
  - `AddCBDDCSurrealEmbeddedDataset(...)`
  - `AddCBDDCMultiDataset(...)`
- Build `MultiDatasetSyncOrchestrator`:
  - Start/stop orchestrators for configured datasets.
  - Isolated cancellation tokens, loops, and failure handling per dataset.
- Ensure hosting services (`CBDDCNodeService`, `TcpSyncServerHostedService`) initialize dataset contexts deterministically.
- Add per-dataset knobs:
  - Sync interval, max entries per cycle, maintenance interval, optional parallelism limits.

Test Work
- Add Hosting tests:
  - Multiple datasets register/start/stop cleanly.
  - Failure in one dataset does not stop others.
- Add orchestrator tests:
  - Scheduling fairness and per-dataset failure backoff isolation.
- Update `NoOp`/fallback tests for multi-dataset mode.

Exit Criteria:
- Runtime starts N dataset pipelines with independent lifecycle behavior.
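The per-dataset lifecycle isolation can be sketched like this (a simplified stand-in, assuming the real `MultiDatasetSyncOrchestrator` wraps one `ISyncOrchestrator` loop per dataset):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified coordinator: each dataset loop owns its own CancellationTokenSource,
// so cancelling or faulting one dataset never touches the others.
public sealed class MultiDatasetRunner
{
    private readonly Dictionary<string, CancellationTokenSource> _loops = new();

    public IReadOnlyCollection<string> Running => _loops.Keys;

    public void Start(string datasetId, Func<CancellationToken, Task> loop)
    {
        var cts = new CancellationTokenSource();
        _loops[datasetId] = cts;
        _ = Task.Run(async () =>
        {
            try { await loop(cts.Token); }
            catch { Stop(datasetId); } // failure is isolated to this dataset
        });
    }

    public void Stop(string datasetId)
    {
        if (_loops.Remove(datasetId, out var cts)) cts.Cancel();
    }
}
```

Per-dataset tokens (rather than one linked token for all pipelines) are what lets "stop telemetry" or "telemetry crashed" leave the primary loop running.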
### 5. Snapshot and Recovery Semantics

Code Changes
- Define snapshot scope options:
  - Per-dataset snapshot and full multi-dataset snapshot.
- Update snapshot service APIs and implementations to support:
  - Export/import/merge by dataset id.
- Ensure emergency recovery paths in the orchestrator are dataset-scoped.

Test Work
- Add snapshot tests:
  - Replace/merge for one dataset leaves others untouched.
- Update reconnect regression tests:
  - The snapshot-required flow only affects the targeted dataset pipeline.

Exit Criteria:
- Recovery operations preserve dataset isolation.
### 6. Sample App and Developer Experience

Code Changes
- Add sample configuration for three datasets: `primary`, `logs`, `timeseries`.
- Implement append-only sample stores for `logs` and `timeseries`.
- Expose sample CLI commands to emit load independently per dataset.

Test Work
- Add sample integration tests:
  - Heavy append load on logs/timeseries does not significantly delay primary data convergence.
- Add benchmark harness cases:
  - Single-dataset baseline vs multi-dataset under telemetry load.

Exit Criteria:
- Demonstrable isolation in the sample workload.
### 7. Documentation and Migration Guides

Code/Docs Changes
- New doc: `docs/features/multi-dataset-sync.md`.
- Update:
  - `docs/architecture.md`
  - `docs/persistence-providers.md`
  - `docs/runbook.md`
- Add migration notes:
  - From single pipeline to multi-dataset configuration.
  - Backward compatibility and rollout toggles.

Test Work
- Compile-check doc examples (if applicable).
- Add config parsing tests for dataset option sections.

Exit Criteria:
- Operators have explicit rollout and rollback steps.
### 8. Rollout Strategy (Safe Adoption)

- Feature flags:
  - `EnableMultiDatasetSync` (global).
  - `EnableDatasetPrimary/Logs/Timeseries` (per dataset).
- Rollout sequence:
  - Stage 1: Deploy with the flag off.
  - Stage 2: Enable `primary` only in the new runtime path.
  - Stage 3: Enable `logs`, then `timeseries`.
- Observability gates:
  - The primary sync latency SLO must remain within threshold before enabling telemetry datasets.
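A hypothetical configuration shape for these flags (section and key names are assumptions for illustration, not the project's actual options binding):

```json
{
  "CBDDC": {
    "EnableMultiDatasetSync": false,
    "Datasets": {
      "primary":    { "Enabled": true,  "SyncIntervalSeconds": 5 },
      "logs":       { "Enabled": false, "SyncIntervalSeconds": 30 },
      "timeseries": { "Enabled": false, "SyncIntervalSeconds": 30 }
    }
  }
}
```

This shape supports the staged rollout directly: ship with the global flag off, then flip `primary`, then enable the telemetry datasets one at a time once the SLO gate holds.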
### 9. Test Plan (Comprehensive Coverage Matrix)

Unit Tests
- Core model defaults and hash behavior with dataset id.
- Dataset routing logic in the orchestrator dispatcher.
- Protocol adapters default `dataset_id` to `"primary"` when absent.
- Persistence query builders always include the dataset predicate.

Integration Tests
- Surreal stores:
  - The same key/collection in different datasets remains isolated.
- Network:
  - Pull/push with mixed datasets never cross-stream.
- Hosting:
  - Independent orchestrator lifecycle and failure isolation.

E2E Tests
- Multi-node cluster:
  - Primary converges under heavy append-only telemetry load.
- Snapshot/recovery:
  - Dataset-scoped restore preserves other datasets.
- Backward compatibility:
  - A legacy node (no dataset id) interoperates on `"primary"`.

Non-Functional Tests
- Throughput and latency benchmarks:
  - Compare primary p95 sync lag before/after.
- Resource isolation:
  - CPU/memory pressure from telemetry datasets should not break the primary SLO.
## Test Update Checklist (Existing Tests to Modify)

- `tests/ZB.MOM.WW.CBDDC.Core.Tests`:
  - Update direct `OplogEntry` constructions.
- `tests/ZB.MOM.WW.CBDDC.Network.Tests`:
  - Handshake/connection/vector-clock tests for dataset-aware flows.
- `tests/ZB.MOM.WW.CBDDC.Hosting.Tests`:
  - Add multi-dataset startup/shutdown/failure cases.
- `tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests`:
  - Extend Surreal contract and durability tests for dataset partitioning.
- `tests/ZB.MOM.WW.CBDDC.E2E.Tests`:
  - Add multi-dataset convergence and interference tests.
## Worktree Task Breakdown (Execution Order)

- Phase-A: Contracts + Core model updates + unit tests.
- Phase-B: Surreal schema/store partitioning + persistence tests.
- Phase-C: Protocol and network routing + network tests.
- Phase-D: Multi-orchestrator DI/runtime + hosting tests.
- Phase-E: Snapshot/recovery updates + regression tests.
- Phase-F: Sample/bench/docs + end-to-end verification.

Each phase should be committed separately in the worktree to keep deltas reviewable.
## Validation Commands (Run in Worktree)

- `dotnet build /Users/dohertj2/Desktop/CBDDC/CBDDC.slnx`
- `dotnet test /Users/dohertj2/Desktop/CBDDC/CBDDC.slnx`
- Focused suites during implementation:
  - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Core.Tests/ZB.MOM.WW.CBDDC.Core.Tests.csproj`
  - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/ZB.MOM.WW.CBDDC.Network.Tests.csproj`
  - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Hosting.Tests/ZB.MOM.WW.CBDDC.Hosting.Tests.csproj`
  - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests.csproj`
  - `dotnet test /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ZB.MOM.WW.CBDDC.E2E.Tests.csproj`
## Definition of Done

- Multi-dataset mode runs `primary`, `logs`, and `timeseries` in one process with independent sync paths.
- No cross-dataset data movement in persistence, protocol, or runtime.
- Existing single-dataset usage still works via the default `"primary"` dataset.
- Added/updated unit, integration, and E2E tests pass in CI.
- Docs include migration and operational guidance.