Joseph Doherty cce24fa8f3
Add LMDB oplog migration path with dual-write cutover support
Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
2026-02-22 17:44:57 -05:00

# LMDB Oplog Migration Plan
## 1. Goal
Move `IOplogStore` persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.
Primary outcomes:
- Keep existing `IOplogStore` contract semantics.
- Make `PruneOplogAsync` efficient and safe under current timestamp-based cutoff behavior.
- Keep roll-forward and rollback low risk via feature flags and verification steps.
## 2. Current Constraints That Must Be Preserved
The oplog is not just a queue; the implementation must support:
- append + dedupe by hash
- lookup by hash
- node/time range scans
- chain-range reconstruction by hash linkage
- prune by cutoff timestamp
- per-dataset isolation
Key references:
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs`
Important behavior notes:
- Prune is cutoff-based, not pure FIFO dequeue.
- Late-arriving remote entries can have older timestamps than recent local writes, so prune can be non-contiguous in append order.
- Current Surreal local write path performs atomic oplog+metadata(+checkpoint) persistence inside one transaction; cross-engine behavior must be handled intentionally.
## 3. Target Design (LMDB)
### 3.1 New Provider
Create an LMDB oplog provider in persistence:
- New class: `LmdbOplogStore : OplogStore`
- New options class: `LmdbOplogOptions`
- New DI extension, e.g. `AddCBDDCLmdbOplog(...)`
Suggested options:
- `EnvironmentPath`
- `MapSizeBytes`
- `MaxDatabases`
- `SyncMode` (durability/perf)
- `PruneBatchSize`
- `EnableCompactionCopy` (for optional file shrink operation)
### 3.2 LMDB Data Layout
Use multiple named DBIs (single environment):
1. `oplog_by_hash`
Key: `{datasetId}|{hash}`
Value: serialized `OplogEntry` (compact binary or UTF-8 JSON)
2. `oplog_by_hlc`
Key: `{datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}`
Value: empty or small marker
Purpose: `GetOplogAfterAsync`, prune range scan
3. `oplog_by_node_hlc`
Key: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}`
Value: empty or marker
Purpose: `GetOplogForNodeAfterAsync`, fast node head updates
4. `oplog_prev_to_hash` (duplicate-allowed)
Key: `{datasetId}|{previousHash}`
Value: `{hash}`
Purpose: chain traversal support for `GetChainRangeAsync`
5. `oplog_node_head`
Key: `{datasetId}|{nodeId}`
Value: `{wall, logic, hash}`
Purpose: O(1) `GetLastEntryHashAsync`
6. `oplog_meta`
Stores schema version, migration markers, and optional prune watermark per dataset.
Notes:
- Use deterministic byte encoding for composite keys to preserve lexical order.
- Keep dataset prefix in every index key to guarantee dataset isolation.
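The encoding note above can be sketched briefly. The snippet below is illustrative Python (the actual store is C#), and `encode_hlc_key` is a hypothetical helper; it shows why fixed-width big-endian integer fields make lexicographic byte order match numeric `(wall, logic)` order:

```python
import struct

def encode_hlc_key(dataset_id: str, wall: int, logic: int,
                   node_id: str, entry_hash: str) -> bytes:
    """Composite key for oplog_by_hlc: big-endian fixed-width integers
    keep lexicographic byte order aligned with numeric (wall, logic) order."""
    return b"|".join([
        dataset_id.encode("utf-8"),
        struct.pack(">Q", wall),   # 8-byte big-endian wall clock
        struct.pack(">I", logic),  # 4-byte big-endian logical counter
        node_id.encode("utf-8"),
        entry_hash.encode("utf-8"),
    ])

# Numeric order survives raw byte comparison:
a = encode_hlc_key("ds1", 100, 5, "n1", "h1")
b = encode_hlc_key("ds1", 100, 10, "n1", "h2")
c = encode_hlc_key("ds1", 200, 0, "n1", "h3")
assert a < b < c
```

Little-endian or ASCII-decimal encodings would break cursor range scans, which is why the encoding helper deserves its own property tests.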
### 3.3 Write Transaction Rules
`AppendOplogEntryAsync` transaction:
1. Check dedupe in `oplog_by_hash`.
2. If absent: insert `oplog_by_hash` + all secondary indexes.
3. Update `oplog_node_head` only if incoming timestamp > current head timestamp for that node.
4. Commit once.
`MergeAsync`/`ImportAsync`:
- Reuse same insert routine in loops with write batching.
- Dedupe strictly by hash.
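The four transaction steps can be sketched as follows. This is a minimal illustration in Python with plain dicts standing in for the LMDB DBIs from §3.2 (the real implementation is C# inside one LMDB write transaction); all names here are hypothetical:

```python
def append_entry(db, entry):
    """Sketch of the append rules: dedupe by hash, write primary record
    plus secondary indexes, advance the node head only for newer HLCs.
    Returns False on a dedupe hit (idempotent append)."""
    hash_key = (entry["dataset"], entry["hash"])
    if hash_key in db["by_hash"]:                       # 1. dedupe check
        return False
    db["by_hash"][hash_key] = entry                     # 2. primary + indexes
    db["by_hlc"][(entry["dataset"], entry["wall"], entry["logic"],
                  entry["node"], entry["hash"])] = b""
    db["prev_to_hash"].setdefault(
        (entry["dataset"], entry["prev"]), []).append(entry["hash"])
    head = db["node_head"].get((entry["dataset"], entry["node"]))
    # 3. advance head only if (wall, logic) is strictly newer
    if head is None or (entry["wall"], entry["logic"]) > (head["wall"], head["logic"]):
        db["node_head"][(entry["dataset"], entry["node"])] = {
            "wall": entry["wall"], "logic": entry["logic"], "hash": entry["hash"]}
    return True                                         # 4. single commit here

db = {"by_hash": {}, "by_hlc": {}, "prev_to_hash": {}, "node_head": {}}
e = {"dataset": "ds1", "hash": "h1", "prev": "h0",
     "node": "n1", "wall": 100, "logic": 0}
assert append_entry(db, e) is True
assert append_entry(db, e) is False   # duplicate hash append is a no-op
```

The same routine, batched, is what `MergeAsync`/`ImportAsync` would loop over.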
### 3.4 Prune Strategy
Base prune operation (must-have):
1. Cursor-scan `oplog_by_hlc` up to cutoff key for target dataset.
2. For each candidate hash:
- delete from `oplog_by_hash`
- delete node index key
- delete prev->hash duplicate mapping
- delete hlc index key
3. Recompute affected `oplog_node_head` entries lazily (on read) or eagerly for touched nodes.
Efficiency enhancements (recommended):
- Process deletes in batches (`PruneBatchSize`) inside bounded write txns.
- Keep optional per-node dirty set during prune to limit head recomputation.
- Optional periodic LMDB compact copy if a physical file shrink is needed (LMDB reuses freed pages but does not automatically shrink the file on disk).
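The prune algorithm above, including the batching and dirty-node tracking, can be sketched like this (illustrative Python over the dict stand-ins for the §3.2 DBIs; the real code runs one bounded LMDB write txn per batch):

```python
def prune(db, dataset, cutoff_wall, batch_size=500):
    """Sketch of cutoff-based prune: ordered scan of the HLC index up to
    the cutoff, delete every index record per candidate, track dirty
    nodes, then recompute their heads eagerly."""
    dirty_nodes, deleted = set(), 0
    # oplog_by_hlc keys sort by (wall, logic); collect everything <= cutoff
    candidates = sorted(k for k in db["by_hlc"]
                        if k[0] == dataset and k[1] <= cutoff_wall)
    for i in range(0, len(candidates), batch_size):   # one bounded txn per batch
        for key in candidates[i:i + batch_size]:
            _, wall, logic, node, h = key
            entry = db["by_hash"].pop((dataset, h))
            db["by_hlc"].pop(key)
            db["prev_to_hash"].get((dataset, entry["prev"]), []).remove(h)
            dirty_nodes.add(node)
            deleted += 1
    for node in dirty_nodes:                          # eager head recompute
        remaining = [e for e in db["by_hash"].values()
                     if e["dataset"] == dataset and e["node"] == node]
        db["node_head"][(dataset, node)] = (
            max(remaining, key=lambda e: (e["wall"], e["logic"]))
            if remaining else None)
    return deleted

db = {"by_hash": {}, "by_hlc": {}, "prev_to_hash": {}, "node_head": {}}
for h, prev, wall in [("h1", "h0", 100), ("h2", "h1", 200)]:
    db["by_hash"][("ds", h)] = {"dataset": "ds", "hash": h, "prev": prev,
                                "node": "n1", "wall": wall, "logic": 0}
    db["by_hlc"][("ds", wall, 0, "n1", h)] = b""
    db["prev_to_hash"].setdefault(("ds", prev), []).append(h)

assert prune(db, "ds", 150) == 1                  # only h1 is <= cutoff
assert db["node_head"][("ds", "n1")]["hash"] == "h2"
```

Because the scan is over the HLC index rather than append order, late-arriving old-timestamp entries are pruned correctly even when they are not contiguous in the log.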
### 3.5 Atomicity with Document Metadata
Decision required (explicit in implementation review):
Option A (phase 1 recommended):
- Accept eventual consistency across the two engines (the document write and the oplog write are no longer in a single transaction).
- Keep current document write flow.
- Add reconciliation/repair on startup:
- detect metadata entries missing oplog hash for recent writes
- rebuild node-head and index consistency from `oplog_by_hash`.
Option B (hard mode):
- Introduce durable outbox pattern to guarantee atomic handoff across engines.
- Higher complexity; schedule after functional cutover.
Plan uses Option A first for lower migration risk.
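The Option A repair pass can be expressed as a small sketch. This is illustrative Python with plain dicts standing in for the two stores; `reconcile_on_startup` is a hypothetical name:

```python
def reconcile_on_startup(doc_metadata, oplog_by_hash):
    """Sketch of the Option A startup repair: find documents whose
    metadata references an oplog hash missing from the LMDB store
    (e.g. a crash landed between document commit and oplog write)."""
    missing = [doc_id for doc_id, meta in doc_metadata.items()
               if meta.get("oplog_hash") and meta["oplog_hash"] not in oplog_by_hash]
    # Callers would re-emit an oplog entry (or flag the doc for resync)
    # per hit, then rebuild node heads and indexes from oplog_by_hash.
    return missing

meta = {"doc1": {"oplog_hash": "h1"}, "doc2": {"oplog_hash": "h2"}}
assert reconcile_on_startup(meta, {"h1": {}}) == ["doc2"]
```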
## 4. Phased Execution Plan
### Phase 0: Prep and Design Freeze
- Add ADR documenting:
- key encoding format
- index schema
- prune algorithm
- consistency model (Option A above)
- Add config model and feature flags:
- `UseLmdbOplog`
- `DualWriteOplog`
- `PreferLmdbReads`
Exit criteria:
- ADR approved.
- Configuration contract approved.
### Phase 1: LMDB Store Skeleton
- Add package reference `LightningDB`.
- Implement `LmdbOplogStore` with:
- `AppendOplogEntryAsync`
- `GetEntryByHashAsync`
- `GetLastEntryHashAsync`
- `GetOplogAfterAsync`
- `GetOplogForNodeAfterAsync`
- `GetChainRangeAsync`
- `PruneOplogAsync`
- snapshot import/export/drop/merge APIs.
- Implement startup/open/close lifecycle and map-size handling.
Exit criteria:
- Local contract tests pass for LMDB store.
### Phase 2: Dual-Write + Read Shadow Validation
- Keep Surreal oplog as source of truth.
- Write every oplog mutation to both stores (`DualWriteOplog=true`).
- Read-path comparison mode in non-prod:
- query both stores
- assert same hashes/order for key APIs
- log mismatches.
Exit criteria:
- Zero mismatches in soak tests.
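The read-path comparison amounts to running each key API against both stores and counting divergences. A minimal sketch (illustrative Python; `shadow_compare` and its logging hook are hypothetical names, and the real code would emit a mismatch counter metric):

```python
def shadow_compare(primary_hashes, shadow_hashes, api_name, log):
    """Sketch of the read-shadow check: same query against both stores,
    log and count a mismatch if the hash sequences differ."""
    if list(primary_hashes) != list(shadow_hashes):
        log(f"oplog mismatch in {api_name}: "
            f"primary={len(list(primary_hashes))} shadow={len(list(shadow_hashes))}")
        return False
    return True

mismatches = []
assert shadow_compare(["h1", "h2"], ["h1", "h2"],
                      "GetOplogAfterAsync", mismatches.append) is True
assert shadow_compare(["h1", "h2"], ["h1"],
                      "GetOplogForNodeAfterAsync", mismatches.append) is False
assert len(mismatches) == 1
```

Order matters in the comparison on purpose: the APIs return HLC-ordered sequences, so an ordering bug should count as a mismatch, not just a missing entry.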
### Phase 3: Cutover
- Set `PreferLmdbReads=true` in staging first.
- Keep dual-write enabled for one release window.
- Monitor:
- prune duration
- oplog query latency
- mismatch counters
- restart recovery behavior.
Exit criteria:
- Stable staging and production canary.
### Phase 4: Cleanup
- Disable Surreal oplog writes.
- Keep migration utility for rollback window.
- Remove dual-compare instrumentation after confidence period.
## 5. Data Migration / Backfill
Backfill tool steps:
1. Read dataset-scoped Surreal oplog export.
2. Bulk import into LMDB by HLC order.
3. Rebuild node-head table.
4. Validate:
- counts per dataset
- counts per node
- latest hash per node
- random hash spot checks
- chain-range spot checks.
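Step 4's count and head checks reduce to comparing summaries of both sides. A sketch (illustrative Python; the entry shape is assumed, and `validate_backfill` is a hypothetical name):

```python
from collections import Counter

def validate_backfill(surreal_entries, lmdb_entries):
    """Sketch of the post-backfill validation: per-dataset counts,
    per-node counts, and latest hash per node must match between the
    exported Surreal oplog and the imported LMDB oplog."""
    def summarize(entries):
        per_dataset = Counter(e["dataset"] for e in entries)
        per_node = Counter((e["dataset"], e["node"]) for e in entries)
        heads = {}
        for e in sorted(entries, key=lambda e: (e["wall"], e["logic"])):
            heads[(e["dataset"], e["node"])] = e["hash"]   # last wins = head
        return per_dataset, per_node, heads
    return summarize(surreal_entries) == summarize(lmdb_entries)

src = [{"dataset": "ds", "node": "n1", "wall": 1, "logic": 0, "hash": "h1"},
       {"dataset": "ds", "node": "n1", "wall": 2, "logic": 0, "hash": "h2"}]
assert validate_backfill(src, list(src)) is True
assert validate_backfill(src, src[:1]) is False   # a dropped entry is caught
```

Random hash and chain-range spot checks then cover what aggregate counts cannot (corrupted payloads, broken linkage).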
Rollback:
- Keep Surreal oplog untouched during dual-write window.
- Flip feature flags back to Surreal reads.
## 6. Unit Test Update Instructions
### 6.1 Reuse Existing Oplog Contract Tests
Use these as baseline parity requirements:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs` (class `SurrealStoreExportImportTests`)
Actions:
1. Extract oplog contract cases into shared test base (provider-agnostic).
2. Run same suite against:
- Surreal store
- new LMDB store
Minimum parity cases:
- append/query/merge/drop
- dataset isolation
- legacy/default dataset behavior (if supported)
- `GetChainRangeAsync` correctness
- `GetLastEntryHashAsync` persistence across restart
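The provider-agnostic extraction in step 1 boils down to a suite that only depends on a store factory. A rough shape, shown in Python with an in-memory stand-in (the actual suite would be a C# test base class run against both stores):

```python
def run_oplog_contract(store_factory):
    """Sketch of one shared contract case: the same assertions run
    against any store exposing append/get/last-hash, so the Surreal
    and LMDB providers share a single suite."""
    store = store_factory()
    store.append({"dataset": "ds", "hash": "h1", "node": "n1",
                  "wall": 1, "logic": 0})
    assert store.get("ds", "h1")["node"] == "n1"     # lookup by hash
    assert store.get("other", "h1") is None          # dataset isolation
    assert store.last_hash("ds", "n1") == "h1"       # node head

class DictOplogStore:
    """Minimal in-memory stand-in used to exercise the contract."""
    def __init__(self):
        self.entries, self.heads = {}, {}
    def append(self, e):
        self.entries[(e["dataset"], e["hash"])] = e
        self.heads[(e["dataset"], e["node"])] = e["hash"]
    def get(self, dataset, h):
        return self.entries.get((dataset, h))
    def last_hash(self, dataset, node):
        return self.heads.get((dataset, node))

run_oplog_contract(DictOplogStore)
```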
### 6.2 Add LMDB-Specific Unit Tests
Create new file:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs`
Add tests for:
1. Index consistency:
- inserting one entry populates all indexes
- deleting/pruning removes all index records
2. Prune correctness:
- removes `<= cutoff`
- does not remove `> cutoff`
- handles interleaved node timestamps
- handles late-arriving old timestamp entry safely
3. Node-head maintenance:
- head advances on newer entry
- prune invalidates/recomputes correctly
4. Restart durability:
- reopen LMDB env and verify last-hash + scans
5. Dedupe:
- duplicate hash append is idempotent
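Case 2's cutoff semantics are worth pinning down in isolation, because prune order is independent of append order. A stand-in sketch (illustrative Python; the real tests target the C# store):

```python
def prune_by_cutoff(entries, cutoff):
    """Stand-in prune expressing case 2 above: pure cutoff semantics,
    with no dependence on the order entries were appended."""
    return [e for e in entries if e["wall"] > cutoff]

# Interleaved nodes plus a late-arriving entry with an old timestamp:
log = [
    {"hash": "a", "node": "n1", "wall": 100},
    {"hash": "b", "node": "n2", "wall": 300},
    {"hash": "c", "node": "n1", "wall": 150},  # appended last, old wall clock
]
kept = prune_by_cutoff(log, 200)
# Prune removed entries that are non-contiguous in append order:
assert [e["hash"] for e in kept] == ["b"]
```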
### 6.3 Update Integration/E2E Coverage
Files to touch:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs` (or LMDB-focused equivalent)
Add/adjust scenarios:
1. Gap recovery still works with LMDB oplog backend.
2. Peer-confirmed prune still blocks/allows correctly.
3. Crash between document commit and oplog write (Option A behavior) is detected/repaired by startup reconciliation.
4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with generous margin).
### 6.4 Keep Existing Network Unit Tests Intact
Most network tests mock `IOplogStore` and should remain unchanged:
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs`
Only update if method behavior/ordering contracts are intentionally changed.
## 7. Performance and Observability Plan
Track and compare (Surreal vs LMDB):
- `AppendOplogEntryAsync` latency p50/p95/p99
- `GetOplogForNodeAfterAsync` latency
- prune duration and entries/sec deleted
- LMDB env file size and reclaimed free-page ratio
- mismatch counters in dual-read compare mode
Add logs/metrics:
- prune batches processed
- dirty nodes recomputed
- startup repair actions and counts
## 8. Risks and Mitigations
1. Cross-engine consistency gaps (document metadata vs oplog)
- Mitigation: startup reconciliation + dual-write shadow period.
2. Incorrect composite key encoding
- Mitigation: explicit encoding helper + property tests for sort/order invariants.
3. Prune causing stale node-head values
- Mitigation: touched-node tracking and lazy/eager recompute tests.
4. LMDB map-size exhaustion
- Mitigation: configurable mapsize, monitoring, and operational runbook for resize.
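The property test proposed for risk 2 can be as simple as checking that byte order of encoded keys agrees with numeric order on random samples (illustrative Python; `encode` is a hypothetical stand-in for the real encoding helper):

```python
import random
import struct

def encode(wall, logic):
    """Hypothetical fixed-width big-endian encoding under test."""
    return struct.pack(">QI", wall, logic)

# Property: sorting by encoded bytes must equal sorting by (wall, logic).
rng = random.Random(42)
pairs = [(rng.randrange(2**63), rng.randrange(2**31)) for _ in range(1000)]
assert sorted(pairs) == sorted(pairs, key=lambda p: encode(*p))
```

A broken encoding (little-endian, variable-width, or decimal strings) fails this immediately, which catches the bug long before a cursor range scan silently skips entries.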
## 9. Review Checklist
- [x] ADR approved for LMDB key/index schema.
- [x] Feature flags merged (`UseLmdbOplog`, `DualWriteOplog`, `PreferLmdbReads`).
- [x] LMDB contract tests passing.
- [x] Dual-write mismatch telemetry in place.
- [x] Backfill tool implemented and validated in automated tests (staging execution ready).
- [x] Prune correctness + efficiency tests passing.
- [x] Rollback path documented and tested.