Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
LMDB Oplog Migration Plan
1. Goal
Move IOplogStore persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.
Primary outcomes:
- Keep existing `IOplogStore` contract semantics.
- Make `PruneOplogAsync` efficient and safe under current timestamp-based cutoff behavior.
- Keep roll-forward and rollback low risk via feature flags and verification steps.
2. Current Constraints That Must Be Preserved
The oplog is not just a queue; the implementation must support:
- append + dedupe by hash
- lookup by hash
- node/time range scans
- chain-range reconstruction by hash linkage
- prune by cutoff timestamp
- per-dataset isolation
Key references:
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs`
Important behavior notes:
- Prune is cutoff-based, not pure FIFO dequeue.
- Late-arriving remote entries can have older timestamps than recent local writes, so prune can be non-contiguous in append order.
- Current Surreal local write path performs atomic oplog+metadata(+checkpoint) persistence inside one transaction; cross-engine behavior must be handled intentionally.
3. Target Design (LMDB)
3.1 New Provider
Create an LMDB oplog provider in persistence:
- New class: `LmdbOplogStore : OplogStore`
- New options class: `LmdbOplogOptions`
- New DI extension, e.g. `AddCBDDCLmdbOplog(...)`
Suggested options:
- `EnvironmentPath`
- `MapSizeBytes`
- `MaxDatabases`
- `SyncMode` (durability/perf)
- `PruneBatchSize`
- `EnableCompactionCopy` (for optional file shrink operation)
3.2 LMDB Data Layout
Use multiple named DBIs (single environment):
- `oplog_by_hash`
  - Key: `{datasetId}|{hash}`
  - Value: serialized `OplogEntry` (compact binary or UTF-8 JSON)
- `oplog_by_hlc`
  - Key: `{datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}`
  - Value: empty or small marker
  - Purpose: `GetOplogAfterAsync`, prune range scan
- `oplog_by_node_hlc`
  - Key: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}`
  - Value: empty or marker
  - Purpose: `GetOplogForNodeAfterAsync`, fast node-head updates
- `oplog_prev_to_hash` (duplicate-allowed)
  - Key: `{datasetId}|{previousHash}`
  - Value: `{hash}`
  - Purpose: chain traversal support for `GetChainRangeAsync`
- `oplog_node_head`
  - Key: `{datasetId}|{nodeId}`
  - Value: `{wall, logic, hash}`
  - Purpose: O(1) `GetLastEntryHashAsync`
- `oplog_meta`
  - Stores schema version, migration markers, and optional prune watermark per dataset.
Notes:
- Use deterministic byte encoding for composite keys to preserve lexical order.
- Keep dataset prefix in every index key to guarantee dataset isolation.
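The deterministic encoding note above can be sketched as follows. This is an illustrative Python model of the byte layout (the real store would implement this in C#; the separator choice and field widths are assumptions, not decided schema): `wall` and `logic` are packed big-endian so byte-lexicographic order equals numeric HLC order.

```python
import struct

SEP = b"\x00"  # assumed separator; requires it never appears inside ids or hashes

def encode_hlc_key(dataset_id, wall, logic, node_id, entry_hash):
    """Composite key for oplog_by_hlc: dataset | wall | logic | node | hash.

    wall and logic are packed as unsigned big-endian integers so that
    LMDB's default lexicographic byte comparison equals numeric ordering.
    """
    return SEP.join([
        dataset_id.encode("utf-8"),
        struct.pack(">Q", wall),   # 8-byte big-endian wall component
        struct.pack(">I", logic),  # 4-byte big-endian logical counter
        node_id.encode("utf-8"),
        entry_hash.encode("utf-8"),
    ])

# Byte-lexical order of encoded keys matches numeric (wall, logic) order;
# a decimal-string encoding would sort "9" after "10" and break range scans.
k_old = encode_hlc_key("ds1", 9, 5, "nodeA", "h1")
k_mid = encode_hlc_key("ds1", 10, 0, "nodeA", "h2")
k_new = encode_hlc_key("ds1", 10, 7, "nodeA", "h3")
assert k_old < k_mid < k_new
```

Because the numeric fields are fixed-width, an embedded zero byte inside them cannot be confused with the separator; only the variable-length string fields carry the no-zero-byte assumption.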
3.3 Write Transaction Rules
`AppendOplogEntryAsync` transaction:
- Check dedupe in `oplog_by_hash`.
- If absent: insert into `oplog_by_hash` plus all secondary indexes.
- Update `oplog_node_head` only if the incoming timestamp is newer than the current head timestamp for that node.
- Commit once.
MergeAsync/ImportAsync:
- Reuse same insert routine in loops with write batching.
- Dedupe strictly by hash.
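The append rules above can be modelled in a few lines. This is a Python sketch over plain dicts, not the C# implementation; the entry field names (`prev_hash`, `wall`, `logic`) are illustrative stand-ins for whatever `OplogEntry` actually carries:

```python
def new_db():
    """In-memory stand-in for the five LMDB DBIs described in section 3.2."""
    return {"by_hash": {}, "by_hlc": {}, "by_node_hlc": {},
            "prev_to_hash": {}, "node_head": {}}

def append_entry(db, e):
    """Apply the append rules as one atomic unit (modelled on plain dicts)."""
    ds, h = e["dataset"], e["hash"]
    if (ds, h) in db["by_hash"]:        # dedupe strictly by hash
        return False                    # duplicate append is a no-op
    db["by_hash"][(ds, h)] = e
    db["by_hlc"][(ds, e["wall"], e["logic"], e["node"], h)] = b""
    db["by_node_hlc"][(ds, e["node"], e["wall"], e["logic"], h)] = b""
    db["prev_to_hash"].setdefault((ds, e["prev_hash"]), []).append(h)
    head = db["node_head"].get((ds, e["node"]))
    if head is None or (e["wall"], e["logic"]) > (head["wall"], head["logic"]):
        db["node_head"][(ds, e["node"])] = {"wall": e["wall"], "logic": e["logic"], "hash": h}
    return True

db = new_db()
e1 = {"dataset": "ds", "hash": "h1", "prev_hash": "", "node": "A", "wall": 5, "logic": 0}
late = {"dataset": "ds", "hash": "h0", "prev_hash": "", "node": "A", "wall": 3, "logic": 0}
assert append_entry(db, e1)
assert not append_entry(db, e1)                      # idempotent duplicate
assert append_entry(db, late)                        # late-arriving older entry stored...
assert db["node_head"][("ds", "A")]["hash"] == "h1"  # ...but head does not regress
```

The head-guard comparison is what makes late-arriving remote entries safe: they land in every index but never move `oplog_node_head` backwards.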
3.4 Prune Strategy
Base prune operation (must-have):
- Cursor-scan `oplog_by_hlc` up to the cutoff key for the target dataset.
- For each candidate hash:
  - delete from `oplog_by_hash`
  - delete the node index key
  - delete the prev->hash duplicate mapping
  - delete the hlc index key
- Recompute affected `oplog_node_head` entries lazily (on read) or eagerly for touched nodes.
Efficiency enhancements (recommended):
- Process deletes in batches (`PruneBatchSize`) inside bounded write txns.
- Keep an optional per-node dirty set during prune to limit head recomputation.
- Optional periodic LMDB compact copy if physical file shrink is needed (LMDB naturally reuses freed pages, but does not always shrink file immediately).
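The batched prune with a per-node dirty set can be sketched like this. Again a Python model over the dict layout from section 3.2, assuming cutoff is an `(wall, logic)` pair and eager head recomputation; batch sizes and field names are illustrative:

```python
def prune(db, dataset, cutoff, batch_size=100):
    """Delete all entries with HLC (wall, logic) <= cutoff in bounded batches.

    db mirrors the DBI layout of section 3.2 as plain dicts. Touched nodes
    go into a dirty set so only their heads are recomputed afterwards.
    """
    dirty, deleted = set(), 0
    while True:
        batch = [k for k in sorted(db["by_hlc"])
                 if k[0] == dataset and (k[1], k[2]) <= cutoff][:batch_size]
        if not batch:
            break
        for (_, wall, logic, node, h) in batch:      # one bounded "write txn"
            del db["by_hlc"][(dataset, wall, logic, node, h)]
            entry = db["by_hash"].pop((dataset, h))
            del db["by_node_hlc"][(dataset, node, wall, logic, h)]
            db["prev_to_hash"][(dataset, entry["prev_hash"])].remove(h)
            dirty.add(node)
            deleted += 1
    for node in dirty:                               # eager recompute, touched nodes only
        left = [k for k in db["by_node_hlc"] if k[0] == dataset and k[1] == node]
        if left:
            _, _, wall, logic, h = max(left, key=lambda k: (k[2], k[3]))
            db["node_head"][(dataset, node)] = {"wall": wall, "logic": logic, "hash": h}
        else:
            db["node_head"].pop((dataset, node), None)
    return deleted

# Three entries: A@1, A@2, B@3; prune everything at or before (2, 0).
db = {"by_hash": {}, "by_hlc": {}, "by_node_hlc": {}, "prev_to_hash": {}, "node_head": {}}
for h, prev, node, wall in [("h1", "", "A", 1), ("h2", "h1", "A", 2), ("h3", "", "B", 3)]:
    e = {"dataset": "ds", "hash": h, "prev_hash": prev, "node": node, "wall": wall, "logic": 0}
    db["by_hash"][("ds", h)] = e
    db["by_hlc"][("ds", wall, 0, node, h)] = b""
    db["by_node_hlc"][("ds", node, wall, 0, h)] = b""
    db["prev_to_hash"].setdefault(("ds", prev), []).append(h)
    db["node_head"][("ds", node)] = {"wall": wall, "logic": 0, "hash": h}

assert prune(db, "ds", cutoff=(2, 0)) == 2
assert ("ds", "h3") in db["by_hash"] and ("ds", "h1") not in db["by_hash"]
assert ("ds", "A") not in db["node_head"]            # A fully pruned, head removed
assert db["node_head"][("ds", "B")]["hash"] == "h3"  # B untouched
```

In the real store the batch scan would be a single LMDB cursor range walk rather than a full-sort filter; the sketch only shows the bookkeeping per deleted entry.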
3.5 Atomicity with Document Metadata
Decision required (explicit in implementation review):
Option A (phase 1 recommended):
- Accept cross-engine eventual atomicity.
- Keep current document write flow.
- Add reconciliation/repair on startup:
- detect metadata entries missing oplog hash for recent writes
- rebuild node-head and index consistency from `oplog_by_hash`.
Option B (hard mode):
- Introduce durable outbox pattern to guarantee atomic handoff across engines.
- Higher complexity; schedule after functional cutover.
Plan uses Option A first for lower migration risk.
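The Option A startup reconciliation can be sketched as two small checks; a Python model under the assumption that metadata rows record the hash of their last oplog entry (field names are hypothetical):

```python
def find_orphan_metadata(metadata_rows, oplog_by_hash):
    """Detect metadata rows whose recorded oplog hash is absent from
    oplog_by_hash, i.e. a crash landed between the two engines' writes."""
    return [m for m in metadata_rows
            if (m["dataset"], m["oplog_hash"]) not in oplog_by_hash]

def rebuild_node_heads(oplog_by_hash):
    """Recompute oplog_node_head entirely from oplog_by_hash."""
    heads = {}
    for (ds, h), e in oplog_by_hash.items():
        cur = heads.get((ds, e["node"]))
        if cur is None or (e["wall"], e["logic"]) > (cur["wall"], cur["logic"]):
            heads[(ds, e["node"])] = {"wall": e["wall"], "logic": e["logic"], "hash": h}
    return heads

oplog = {("ds", "h1"): {"node": "A", "wall": 1, "logic": 0},
         ("ds", "h2"): {"node": "A", "wall": 2, "logic": 0}}
meta = [{"dataset": "ds", "oplog_hash": "h2"},
        {"dataset": "ds", "oplog_hash": "h9"}]   # h9 never reached the oplog
assert [m["oplog_hash"] for m in find_orphan_metadata(meta, oplog)] == ["h9"]
assert rebuild_node_heads(oplog)[("ds", "A")]["hash"] == "h2"
```

What to do with a detected orphan (re-emit the oplog entry vs. flag for resync) is a decision for the implementation review, not settled here.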
4. Phased Execution Plan
Phase 0: Prep and Design Freeze
- Add ADR documenting:
- key encoding format
- index schema
- prune algorithm
- consistency model (Option A above)
- Add config model and feature flags:
  - `UseLmdbOplog`
  - `DualWriteOplog`
  - `PreferLmdbReads`
Exit criteria:
- ADR approved.
- Configuration contract approved.
Phase 1: LMDB Store Skeleton
- Add package reference `LightningDB`.
- Implement `LmdbOplogStore` with:
  - `AppendOplogEntryAsync`
  - `GetEntryByHashAsync`
  - `GetLastEntryHashAsync`
  - `GetOplogAfterAsync`
  - `GetOplogForNodeAfterAsync`
  - `GetChainRangeAsync`
  - `PruneOplogAsync`
  - snapshot import/export/drop/merge APIs.
- Implement startup/open/close lifecycle and map-size handling.
Exit criteria:
- Local contract tests pass for LMDB store.
Phase 2: Dual-Write + Read Shadow Validation
- Keep Surreal oplog as source of truth.
- Write every oplog mutation to both stores (`DualWriteOplog=true`).
- Read-path comparison mode in non-prod:
- query both stores
- assert same hashes/order for key APIs
- log mismatches.
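The compare mode can be as simple as the following sketch; function and API names here are placeholders, and the real version would emit metrics rather than append to a list:

```python
def first_mismatch(primary, shadow):
    """Index of the first divergence between two ordered hash lists, or -1."""
    for i in range(min(len(primary), len(shadow))):
        if primary[i] != shadow[i]:
            return i
    return -1 if len(primary) == len(shadow) else min(len(primary), len(shadow))

mismatch_log = []
def compare_reads(api_name, surreal_result, lmdb_result):
    """Shadow-compare one API call; log rather than throw so the serving
    read path (Surreal, still source of truth) is never disrupted."""
    i = first_mismatch(surreal_result, lmdb_result)
    if i >= 0:
        mismatch_log.append({"api": api_name, "index": i})
    return i < 0

assert compare_reads("GetOplogAfterAsync", ["a", "b"], ["a", "b"])
assert not compare_reads("GetOplogAfterAsync", ["a", "b", "c"], ["a", "x", "c"])
assert mismatch_log == [{"api": "GetOplogAfterAsync", "index": 1}]
```

Recording the divergence index (not just a boolean) makes soak-test mismatches much faster to triage, since ordering bugs and missing-entry bugs show up differently.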
Exit criteria:
- Zero mismatches in soak tests.
Phase 3: Cutover
- Set `PreferLmdbReads=true` in staging first.
- Keep dual-write enabled for one release window.
- Monitor:
- prune duration
- oplog query latency
- mismatch counters
- restart recovery behavior.
Exit criteria:
- Stable staging and production canary.
Phase 4: Cleanup
- Disable Surreal oplog writes.
- Keep migration utility for rollback window.
- Remove dual-compare instrumentation after confidence period.
5. Data Migration / Backfill
Backfill tool steps:
- Read dataset-scoped Surreal oplog export.
- Bulk import into LMDB by HLC order.
- Rebuild node-head table.
- Validate:
- counts per dataset
- counts per node
- latest hash per node
- random hash spot checks
- chain-range spot checks.
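The validation step above reduces to count and membership cross-checks; a Python sketch over the dict model used earlier (the export row shape is assumed, and the spot check here is exhaustive for brevity where the real tool would sample):

```python
from collections import Counter

def validate_backfill(source_entries, by_hash, by_node_hlc):
    """Cross-check imported LMDB state against the Surreal export:
    counts per dataset, counts per node, and hash membership checks."""
    src_ds = Counter(e["dataset"] for e in source_entries)
    dst_ds = Counter(ds for (ds, _) in by_hash)
    if src_ds != dst_ds:
        return "per-dataset count mismatch"
    src_node = Counter((e["dataset"], e["node"]) for e in source_entries)
    dst_node = Counter((k[0], k[1]) for k in by_node_hlc)
    if src_node != dst_node:
        return "per-node count mismatch"
    for e in source_entries:               # hash spot check (here: exhaustive)
        if (e["dataset"], e["hash"]) not in by_hash:
            return f"missing hash {e['hash']}"
    return None  # validation passed

src = [{"dataset": "ds", "node": "A", "hash": "h1"},
       {"dataset": "ds", "node": "B", "hash": "h2"}]
by_hash = {("ds", "h1"): {}, ("ds", "h2"): {}}
by_node_hlc = {("ds", "A", 1, 0, "h1"): b"", ("ds", "B", 2, 0, "h2"): b""}
assert validate_backfill(src, by_hash, by_node_hlc) is None
assert validate_backfill(src, {("ds", "h1"): {}}, by_node_hlc) == "per-dataset count mismatch"
```

Latest-hash-per-node and chain-range spot checks would layer on top of this, reading `oplog_node_head` and walking `oplog_prev_to_hash` respectively.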
Rollback:
- Keep Surreal oplog untouched during dual-write window.
- Flip feature flags back to Surreal reads.
6. Unit Test Update Instructions
6.1 Reuse Existing Oplog Contract Tests
Use these as baseline parity requirements:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs` (class `SurrealStoreExportImportTests`)
Actions:
- Extract oplog contract cases into shared test base (provider-agnostic).
- Run same suite against:
- Surreal store
- new LMDB store
Minimum parity cases:
- append/query/merge/drop
- dataset isolation
- legacy/default dataset behavior (if supported)
- `GetChainRangeAsync` correctness
- `GetLastEntryHashAsync` persistence across restart
6.2 Add LMDB-Specific Unit Tests
Create new file:
/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs
Add tests for:
- Index consistency:
- inserting one entry populates all indexes
- deleting/pruning removes all index records
- Prune correctness:
  - removes entries `<= cutoff`
  - does not remove entries `> cutoff`
  - handles interleaved node timestamps
  - handles late-arriving old-timestamp entries safely
- Node-head maintenance:
- head advances on newer entry
- prune invalidates/recomputes correctly
- Restart durability:
- reopen LMDB env and verify last-hash + scans
- Dedupe:
- duplicate hash append is idempotent
6.3 Update Integration/E2E Coverage
Files to touch:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs` (or an LMDB-focused equivalent)
Add/adjust scenarios:
- Gap recovery still works with LMDB oplog backend.
- Peer-confirmed prune still blocks/allows correctly.
- Crash between document commit and oplog write (Option A behavior) is detected/repaired by startup reconciliation.
- Prune performance smoke test (large synthetic oplog, bounded runtime threshold with generous margin).
6.4 Keep Existing Network Unit Tests Intact
Most network tests mock IOplogStore and should remain unchanged:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs`
Only update if method behavior/ordering contracts are intentionally changed.
7. Performance and Observability Plan
Track and compare (Surreal vs LMDB):
- `AppendOplogEntryAsync` latency p50/p95/p99
- `GetOplogForNodeAfterAsync` latency
- prune duration and entries/sec deleted
- LMDB env file size and reclaimed free-page ratio
- mismatch counters in dual-read compare mode
Add logs/metrics:
- prune batches processed
- dirty nodes recomputed
- startup repair actions and counts
8. Risks and Mitigations
- Cross-engine consistency gaps (document metadata vs oplog)
- Mitigation: startup reconciliation + dual-write shadow period.
- Incorrect composite key encoding
- Mitigation: explicit encoding helper + property tests for sort/order invariants.
- Prune causing stale node-head values
- Mitigation: touched-node tracking and lazy/eager recompute tests.
- LMDB map-size exhaustion
- Mitigation: configurable mapsize, monitoring, and operational runbook for resize.
9. Review Checklist
- ADR approved for LMDB key/index schema.
- Feature flags merged (`UseLmdbOplog`, `DualWriteOplog`, `PreferLmdbReads`).
- LMDB contract tests passing.
- Dual-write mismatch telemetry in place.
- Backfill tool implemented and validated in automated tests (staging execution ready).
- Prune correctness + efficiency tests passing.
- Rollback path documented and tested.