# LMDB Oplog Migration Plan

## 1. Goal

Move `IOplogStore` persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.

Primary outcomes:

- Keep existing `IOplogStore` contract semantics.
- Make `PruneOplogAsync` efficient and safe under the current timestamp-based cutoff behavior.
- Keep roll-forward and rollback low risk via feature flags and verification steps.

## 2. Current Constraints That Must Be Preserved

The oplog is not just a queue; the implementation must support:

- append + dedupe by hash
- lookup by hash
- node/time range scans
- chain-range reconstruction by hash linkage
- prune by cutoff timestamp
- per-dataset isolation

Key references:

- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs`

Important behavior notes:

- Prune is cutoff-based, not pure FIFO dequeue.
- Late-arriving remote entries can have older timestamps than recent local writes, so prune can be non-contiguous in append order.
- The current Surreal local write path performs atomic oplog + metadata (+ checkpoint) persistence inside one transaction; cross-engine behavior must be handled intentionally.

## 3. Target Design (LMDB)

### 3.1 New Provider

Create an LMDB oplog provider in persistence:

- New class: `LmdbOplogStore : OplogStore`
- New options class: `LmdbOplogOptions`
- New DI extension, e.g. `AddCBDDCLmdbOplog(...)`

Suggested options:

- `EnvironmentPath`
- `MapSizeBytes`
- `MaxDatabases`
- `SyncMode` (durability/perf trade-off)
- `PruneBatchSize`
- `EnableCompactionCopy` (for an optional file-shrink operation)

### 3.2 LMDB Data Layout

Use multiple named DBIs in a single environment:

1. `oplog_by_hash`
   - Key: `{datasetId}|{hash}`
   - Value: serialized `OplogEntry` (compact binary or UTF-8 JSON)
2.
`oplog_by_hlc`
   - Key: `{datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}`
   - Value: empty or a small marker
   - Purpose: `GetOplogAfterAsync`, prune range scan
3. `oplog_by_node_hlc`
   - Key: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}`
   - Value: empty or a marker
   - Purpose: `GetOplogForNodeAfterAsync`, fast node-head updates
4. `oplog_prev_to_hash` (duplicate-allowed)
   - Key: `{datasetId}|{previousHash}`
   - Value: `{hash}`
   - Purpose: chain-traversal support for `GetChainRangeAsync`
5. `oplog_node_head`
   - Key: `{datasetId}|{nodeId}`
   - Value: `{wall, logic, hash}`
   - Purpose: O(1) `GetLastEntryHashAsync`
6. `oplog_meta`
   - Stores the schema version, migration markers, and an optional per-dataset prune watermark.

Notes:

- Use deterministic byte encoding for composite keys so lexical byte order matches logical order.
- Keep the dataset prefix in every index key to guarantee dataset isolation.

### 3.3 Write Transaction Rules

`AppendOplogEntryAsync` transaction:

1. Check dedupe in `oplog_by_hash`.
2. If absent: insert into `oplog_by_hash` plus all secondary indexes.
3. Update `oplog_node_head` only if the incoming timestamp is greater than the current head timestamp for that node.
4. Commit once.

`MergeAsync`/`ImportAsync`:

- Reuse the same insert routine in loops with write batching.
- Dedupe strictly by hash.

### 3.4 Prune Strategy

Base prune operation (must-have):

1. Cursor-scan `oplog_by_hlc` up to the cutoff key for the target dataset.
2. For each candidate hash:
   - delete from `oplog_by_hash`
   - delete the node index key
   - delete the prev->hash duplicate mapping
   - delete the HLC index key
3. Recompute affected `oplog_node_head` entries lazily (on read) or eagerly for touched nodes.

Efficiency enhancements (recommended):

- Process deletes in batches (`PruneBatchSize`) inside bounded write txns.
- Keep an optional per-node dirty set during prune to limit head recomputation.
- Optionally run a periodic LMDB compact copy if a physical file shrink is needed (LMDB reuses freed pages but does not shrink the data file on its own).
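The batched prune loop described above can be sketched in language-agnostic form. The following Python sketch uses hypothetical names (`prune_in_batches`, `delete_entry`); the real implementation would be C# over LightningDB, with each batch running inside its own bounded write transaction. Here, a sorted list of `(wall, node_id, entry_hash)` tuples stands in for an LMDB cursor over `oplog_by_hlc`, and one outer loop iteration models one write txn:

```python
def prune_in_batches(hlc_index, cutoff, batch_size, delete_entry):
    """Sketch of the batched prune loop (hypothetical names).

    hlc_index: list of (wall, node_id, entry_hash) tuples in key order,
    standing in for an LMDB cursor over oplog_by_hlc.
    Returns (entries deleted, nodes whose cached head needs recompute).
    """
    deleted = 0
    dirty_nodes = set()  # per-node dirty set to limit head recomputation
    while hlc_index and hlc_index[0][0] <= cutoff:
        # Take at most batch_size leading keys at or below the cutoff;
        # in real code this is one bounded write transaction.
        batch = []
        while hlc_index and hlc_index[0][0] <= cutoff and len(batch) < batch_size:
            batch.append(hlc_index.pop(0))
        for _wall, node_id, entry_hash in batch:
            delete_entry(entry_hash)  # caller removes hash record + secondary index keys
            dirty_nodes.add(node_id)
        deleted += len(batch)
    return deleted, dirty_nodes
```

Note the inclusive `<= cutoff` comparison, matching the prune-correctness test cases in section 6.2, and that entries above the cutoff are left untouched even when a late-arriving old entry sits between newer ones in append order.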
### 3.5 Atomicity with Document Metadata

A decision is required here and should be made explicit in implementation review:

Option A (recommended for phase 1):

- Accept cross-engine eventual atomicity.
- Keep the current document write flow.
- Add reconciliation/repair on startup:
  - detect metadata entries missing an oplog hash for recent writes
  - rebuild node-head and index consistency from `oplog_by_hash`.

Option B (hard mode):

- Introduce a durable outbox pattern to guarantee atomic handoff across engines.
- Higher complexity; schedule after functional cutover.

The plan uses Option A first for lower migration risk.

## 4. Phased Execution Plan

### Phase 0: Prep and Design Freeze

- Add an ADR documenting:
  - key encoding format
  - index schema
  - prune algorithm
  - consistency model (Option A above)
- Add the config model and feature flags:
  - `UseLmdbOplog`
  - `DualWriteOplog`
  - `PreferLmdbReads`

Exit criteria:

- ADR approved.
- Configuration contract approved.

### Phase 1: LMDB Store Skeleton

- Add a package reference to `LightningDB`.
- Implement `LmdbOplogStore` with:
  - `AppendOplogEntryAsync`
  - `GetEntryByHashAsync`
  - `GetLastEntryHashAsync`
  - `GetOplogAfterAsync`
  - `GetOplogForNodeAfterAsync`
  - `GetChainRangeAsync`
  - `PruneOplogAsync`
  - snapshot import/export/drop/merge APIs.
- Implement the startup/open/close lifecycle and map-size handling.

Exit criteria:

- Local contract tests pass for the LMDB store.

### Phase 2: Dual-Write + Read Shadow Validation

- Keep the Surreal oplog as the source of truth.
- Write every oplog mutation to both stores (`DualWriteOplog=true`).
- Enable read-path comparison mode in non-prod:
  - query both stores
  - assert the same hashes/order for key APIs
  - log mismatches.

Exit criteria:

- Zero mismatches in soak tests.

### Phase 3: Cutover

- Set `PreferLmdbReads=true` in staging first.
- Keep dual-write enabled for one release window.
- Monitor:
  - prune duration
  - oplog query latency
  - mismatch counters
  - restart recovery behavior.

Exit criteria:

- Stable staging and production canary.
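Phase 2's read-path comparison mode amounts to a decorator over the oplog store: serve reads from the primary, replay them against the candidate, and count mismatches rather than fail the caller. The Python sketch below is illustrative only (hypothetical names and dict-shaped entries); the production wrapper would be a C# decorator over `IOplogStore` feeding the mismatch counters monitored in Phase 3:

```python
import logging

log = logging.getLogger("oplog.shadow")

class ShadowReadOplog:
    """Illustrative shadow-read wrapper (hypothetical names).

    Reads are answered by the primary (Surreal) store; the same query is
    replayed against the candidate (LMDB) store, and any divergence in
    hash order is counted and logged instead of surfacing to the caller.
    """

    def __init__(self, primary, candidate):
        self.primary = primary
        self.candidate = candidate
        self.mismatches = 0

    def get_oplog_after(self, dataset_id, cutoff):
        expected = self.primary.get_oplog_after(dataset_id, cutoff)
        try:
            actual = self.candidate.get_oplog_after(dataset_id, cutoff)
            if [e["hash"] for e in expected] != [e["hash"] for e in actual]:
                self.mismatches += 1
                log.warning("oplog shadow mismatch for dataset %s", dataset_id)
        except Exception:
            # A candidate failure must never break the read path.
            self.mismatches += 1
            log.exception("candidate read failed for dataset %s", dataset_id)
        return expected  # primary remains the source of truth
```

The key design property is that the candidate store can be wrong, slow, or broken without affecting callers; only the mismatch counter moves, which is exactly what the "zero mismatches in soak tests" exit criterion measures.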
### Phase 4: Cleanup

- Disable Surreal oplog writes.
- Keep the migration utility for the rollback window.
- Remove dual-compare instrumentation after the confidence period.

## 5. Data Migration / Backfill

Backfill tool steps:

1. Read a dataset-scoped Surreal oplog export.
2. Bulk-import into LMDB in HLC order.
3. Rebuild the node-head table.
4. Validate:
   - counts per dataset
   - counts per node
   - latest hash per node
   - random hash spot checks
   - chain-range spot checks.

Rollback:

- Keep the Surreal oplog untouched during the dual-write window.
- Flip feature flags back to Surreal reads.

## 6. Unit Test Update Instructions

### 6.1 Reuse Existing Oplog Contract Tests

Use these as baseline parity requirements:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs` (class `SurrealStoreExportImportTests`)

Actions:

1. Extract the oplog contract cases into a shared, provider-agnostic test base.
2. Run the same suite against:
   - the Surreal store
   - the new LMDB store

Minimum parity cases:

- append/query/merge/drop
- dataset isolation
- legacy/default dataset behavior (if supported)
- `GetChainRangeAsync` correctness
- `GetLastEntryHashAsync` persistence across restart

### 6.2 Add LMDB-Specific Unit Tests

Create a new file:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs`

Add tests for:

1. Index consistency:
   - inserting one entry populates all indexes
   - deleting/pruning removes all index records
2. Prune correctness:
   - removes entries `<= cutoff`
   - does not remove entries `> cutoff`
   - handles interleaved node timestamps
   - handles a late-arriving entry with an old timestamp safely
3. Node-head maintenance:
   - the head advances on a newer entry
   - prune invalidates/recomputes correctly
4. Restart durability:
   - reopen the LMDB env and verify last-hash + scans
5.
Dedupe:
   - a duplicate hash append is idempotent

### 6.3 Update Integration/E2E Coverage

Files to touch:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs` (or an LMDB-focused equivalent)

Add/adjust scenarios:

1. Gap recovery still works with the LMDB oplog backend.
2. Peer-confirmed prune still blocks/allows correctly.
3. A crash between document commit and oplog write (Option A behavior) is detected and repaired by startup reconciliation.
4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with a generous margin).

### 6.4 Keep Existing Network Unit Tests Intact

Most network tests mock `IOplogStore` and should remain unchanged:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs`

Update these only if method behavior or ordering contracts are intentionally changed.

## 7. Performance and Observability Plan

Track and compare (Surreal vs LMDB):

- `AppendOplogEntryAsync` latency p50/p95/p99
- `GetOplogForNodeAfterAsync` latency
- prune duration and entries/sec deleted
- LMDB env file size and reclaimed free-page ratio
- mismatch counters in dual-read compare mode

Add logs/metrics:

- prune batches processed
- dirty nodes recomputed
- startup repair actions and counts

## 8. Risks and Mitigations

1. Cross-engine consistency gaps (document metadata vs oplog)
   - Mitigation: startup reconciliation + dual-write shadow period.
2. Incorrect composite key encoding
   - Mitigation: an explicit encoding helper + property tests for sort/order invariants.
3. Prune causing stale node-head values
   - Mitigation: touched-node tracking and lazy/eager recompute tests.
4.
LMDB map-size exhaustion
   - Mitigation: configurable map size, monitoring, and an operational runbook for resize.

## 9. Review Checklist

- [x] ADR approved for LMDB key/index schema.
- [x] Feature flags merged (`UseLmdbOplog`, `DualWriteOplog`, `PreferLmdbReads`).
- [x] LMDB contract tests passing.
- [x] Dual-write mismatch telemetry in place.
- [x] Backfill tool implemented and validated in automated tests (staging execution ready).
- [x] Prune correctness + efficiency tests passing.
- [x] Rollback path documented and tested.
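As an illustration of the sort/order property tests called out in risk 2, the sketch below shows a deterministic composite-key encoder in Python (`encode_hlc_key` and all field widths are hypothetical; the real helper would be C#). Packing `wall` and `logic` as fixed-width big-endian integers makes LMDB's default lexicographic byte comparison order keys by `(wall, logic)` within a dataset, which is exactly the invariant the property tests must pin down:

```python
import struct

def encode_hlc_key(dataset_id: bytes, wall: int, logic: int,
                   node_id: bytes, entry_hash: bytes) -> bytes:
    """Hypothetical encoder for the oplog_by_hlc composite key.

    wall and logic are fixed-width big-endian, so plain byte comparison
    of two keys with the same dataset prefix compares wall first, then
    logic, then the node/hash tail.
    """
    return (dataset_id + b"|" +
            struct.pack(">Q", wall) + b"|" +   # 8-byte big-endian wall clock
            struct.pack(">I", logic) + b"|" +  # 4-byte big-endian logical counter
            node_id + b"|" + entry_hash)
```

A decimal-string encoding would fail the third assertion below (`"256" < "255"` lexically); the big-endian fixed-width form is what survives byte-boundary rollover, and is the kind of case the property tests should generate.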