CBDDC/lmdbop.md
Joseph Doherty cce24fa8f3
Add LMDB oplog migration path with dual-write cutover support
Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
2026-02-22 17:44:57 -05:00

LMDB Oplog Migration Plan

1. Goal

Move IOplogStore persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.

Primary outcomes:

  • Keep existing IOplogStore contract semantics.
  • Make PruneOplogAsync efficient and safe under current timestamp-based cutoff behavior.
  • Keep roll-forward and rollback low risk via feature flags and verification steps.

2. Current Constraints That Must Be Preserved

The oplog is not just a queue; the implementation must support:

  • append + dedupe by hash
  • lookup by hash
  • node/time range scans
  • chain-range reconstruction by hash linkage
  • prune by cutoff timestamp
  • per-dataset isolation

Key references:

  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs
  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs
  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs

Important behavior notes:

  • Prune is cutoff-based, not pure FIFO dequeue.
  • Late-arriving remote entries can have older timestamps than recent local writes, so prune can be non-contiguous in append order.
  • Current Surreal local write path performs atomic oplog+metadata(+checkpoint) persistence inside one transaction; cross-engine behavior must be handled intentionally.

3. Target Design (LMDB)

3.1 New Provider

Create an LMDB oplog provider in persistence:

  • New class: LmdbOplogStore implementing IOplogStore
  • New options class: LmdbOplogOptions
  • New DI extension, e.g. AddCBDDCLmdbOplog(...)

Suggested options:

  • EnvironmentPath
  • MapSizeBytes
  • MaxDatabases
  • SyncMode (durability/perf)
  • PruneBatchSize
  • EnableCompactionCopy (for optional file shrink operation)

3.2 LMDB Data Layout

Use multiple named DBIs (single environment):

  1. oplog_by_hash
    Key: {datasetId}|{hash}
    Value: serialized OplogEntry (compact binary or UTF-8 JSON)

  2. oplog_by_hlc
    Key: {datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}
    Value: empty or small marker
    Purpose: GetOplogAfterAsync, prune range scan

  3. oplog_by_node_hlc
    Key: {datasetId}|{nodeId}|{wall}|{logic}|{hash}
    Value: empty or marker
    Purpose: GetOplogForNodeAfterAsync, fast node head updates

  4. oplog_prev_to_hash (duplicate-allowed)
    Key: {datasetId}|{previousHash}
    Value: {hash}
    Purpose: chain traversal support for GetChainRangeAsync

  5. oplog_node_head
    Key: {datasetId}|{nodeId}
    Value: {wall, logic, hash}
    Purpose: O(1) GetLastEntryHashAsync

  6. oplog_meta
    Stores schema version, migration markers, and optional prune watermark per dataset.

Notes:

  • Use deterministic byte encoding for composite keys to preserve lexical order.
  • Keep dataset prefix in every index key to guarantee dataset isolation.
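The encoding note above can be made concrete. Below is a minimal sketch, in Python for illustration only (the store itself is C#), of a composite-key builder whose byte order matches HLC order; the helper name `hlc_index_key`, the 8/4-byte integer widths, and the NUL separator are assumptions for the ADR to settle, not the final format. It also assumes ids contain no NUL bytes; a real implementation might length-prefix variable segments instead.

```python
import struct

SEP = b"\x00"  # separator byte that sorts below any printable character

def hlc_index_key(dataset_id: str, wall: int, logic: int,
                  node_id: str, entry_hash: str) -> bytes:
    """Build an oplog_by_hlc-style key whose lexical byte order equals
    (wall, logic) numeric order, which is what LMDB cursor range scans
    compare by. Fixed-width big-endian integers are the key trick."""
    return SEP.join([
        dataset_id.encode("utf-8"),
        struct.pack(">Q", wall),   # 8-byte big-endian wall clock
        struct.pack(">I", logic),  # 4-byte big-endian logical counter
        node_id.encode("utf-8"),
        entry_hash.encode("utf-8"),
    ])
```

Because the dataset id is the first segment, a cursor positioned at a dataset prefix never crosses into another dataset's keys, which is what guarantees the isolation requirement.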

3.3 Write Transaction Rules

AppendOplogEntryAsync transaction:

  1. Check dedupe in oplog_by_hash.
  2. If absent: insert oplog_by_hash + all secondary indexes.
  3. Update oplog_node_head only if the incoming HLC (wall, logic) is greater than the current head for that node.
  4. Commit once.

MergeAsync/ImportAsync:

  • Reuse same insert routine in loops with write batching.
  • Dedupe strictly by hash.
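The append and merge rules above can be sketched with an in-memory stand-in for the DBIs. This is illustrative Python, not the real LMDB transaction code; the structure names mirror the databases in section 3.2, and the entry field names (`hash`, `prev`, `node`, `wall`, `logic`) are placeholders for whatever OplogEntry actually carries.

```python
class OplogSketch:
    """In-memory stand-in for the section 3.2 DBIs, used to illustrate
    the append transaction rules (dedupe, index fan-out, head update)."""

    def __init__(self):
        self.by_hash = {}         # (dataset, hash) -> entry
        self.by_hlc = set()       # (dataset, wall, logic, node, hash)
        self.by_node_hlc = set()  # (dataset, node, wall, logic, hash)
        self.prev_to_hash = {}    # (dataset, prev_hash) -> set of hashes
        self.node_head = {}       # (dataset, node) -> (wall, logic, hash)

    def append(self, dataset, entry):
        """Insert once; a duplicate hash is a no-op, so merge/import
        loops can reuse this routine and stay idempotent."""
        key = (dataset, entry["hash"])
        if key in self.by_hash:                      # step 1: dedupe
            return False
        self.by_hash[key] = entry                    # step 2: all indexes
        self.by_hlc.add((dataset, entry["wall"], entry["logic"],
                         entry["node"], entry["hash"]))
        self.by_node_hlc.add((dataset, entry["node"], entry["wall"],
                              entry["logic"], entry["hash"]))
        self.prev_to_hash.setdefault(
            (dataset, entry["prev"]), set()).add(entry["hash"])
        head = self.node_head.get((dataset, entry["node"]))
        if head is None or (entry["wall"], entry["logic"]) > head[:2]:
            self.node_head[(dataset, entry["node"])] = (  # step 3: head
                entry["wall"], entry["logic"], entry["hash"])
        return True
```

Note that a late-arriving entry with an older HLC still gets indexed but does not move the node head, which is the property the prune and head-maintenance tests in section 6.2 exercise.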

3.4 Prune Strategy

Base prune operation (must-have):

  1. Cursor-scan oplog_by_hlc up to cutoff key for target dataset.
  2. For each candidate hash:
    • delete from oplog_by_hash
    • delete node index key
    • delete prev->hash duplicate mapping
    • delete hlc index key
  3. Recompute affected oplog_node_head entries lazily (on read) or eagerly for touched nodes.

Efficiency enhancements (recommended):

  • Process deletes in batches (PruneBatchSize) inside bounded write txns.
  • Keep optional per-node dirty set during prune to limit head recomputation.
  • Optional periodic LMDB compact copy if physical file shrink is needed (LMDB naturally reuses freed pages, but does not always shrink file immediately).
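The batched cutoff prune can be sketched as follows, again in Python over dict/set stand-ins for the DBIs (the node and prev->hash indexes are elided for brevity). The eager head recompute shown here is one of the two options from step 3; the batch size plays the role of PruneBatchSize bounding each write transaction.

```python
def prune_oplog(by_hash, by_hlc, node_head, dataset, cutoff_wall, batch_size=2):
    """Batched cutoff prune over simple stand-ins for the DBIs:
      by_hash:   dict (dataset, hash) -> entry dict
      by_hlc:    set of (dataset, wall, logic, node, hash)
      node_head: dict (dataset, node) -> (wall, logic, hash)
    Deletes everything at or below cutoff_wall in bounded batches, then
    recomputes heads only for the nodes the prune actually touched."""
    touched = set()
    while True:
        batch = sorted(k for k in by_hlc
                       if k[0] == dataset and k[1] <= cutoff_wall)[:batch_size]
        if not batch:
            break
        for (_, wall, logic, node, h) in batch:  # one bounded "txn" per batch
            by_hlc.discard((dataset, wall, logic, node, h))
            by_hash.pop((dataset, h), None)
            touched.add(node)
    for node in touched:  # eager recompute limited to the dirty set
        remaining = [(e["wall"], e["logic"], e["hash"])
                     for (d, _), e in by_hash.items()
                     if d == dataset and e["node"] == node]
        if remaining:
            node_head[(dataset, node)] = max(remaining)
        else:
            node_head.pop((dataset, node), None)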

3.5 Atomicity with Document Metadata

Decision required (explicit in implementation review):

Option A (phase 1 recommended):

  • Accept cross-engine eventual atomicity.
  • Keep current document write flow.
  • Add reconciliation/repair on startup:
    • detect metadata entries missing oplog hash for recent writes
    • rebuild node-head and index consistency from oplog_by_hash.

Option B (hard mode):

  • Introduce durable outbox pattern to guarantee atomic handoff across engines.
  • Higher complexity; schedule after functional cutover.

Plan uses Option A first for lower migration risk.
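The startup reconciliation in Option A can be sketched as a two-step scan: detect recent document writes whose oplog entry never landed (the crash window between engines), and rebuild node heads from the surviving oplog_by_hash records. Hypothetical names and shapes; the real repair would run inside the store's open path.

```python
def reconcile_on_startup(recent_metadata_hashes, by_hash, dataset):
    """Option A repair sketch.
      recent_metadata_hashes: oplog hashes the document store recorded
      by_hash: dict (dataset, hash) -> {"node", "wall", "logic"}
    Returns (hashes missing from the oplog, rebuilt node heads)."""
    missing = [h for h in recent_metadata_hashes
               if (dataset, h) not in by_hash]
    heads = {}
    for (d, h), e in by_hash.items():
        if d != dataset:
            continue
        cur = heads.get(e["node"])
        if cur is None or (e["wall"], e["logic"]) > cur[:2]:
            heads[e["node"]] = (e["wall"], e["logic"], h)
    return missing, heads
```

Anything in `missing` is a detected cross-engine gap to surface via the startup-repair metrics in section 7.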

4. Phased Execution Plan

Phase 0: Prep and Design Freeze

  • Add ADR documenting:
    • key encoding format
    • index schema
    • prune algorithm
    • consistency model (Option A above)
  • Add config model and feature flags:
    • UseLmdbOplog
    • DualWriteOplog
    • PreferLmdbReads

Exit criteria:

  • ADR approved.
  • Configuration contract approved.

Phase 1: LMDB Store Skeleton

  • Add package reference LightningDB.
  • Implement LmdbOplogStore with:
    • AppendOplogEntryAsync
    • GetEntryByHashAsync
    • GetLastEntryHashAsync
    • GetOplogAfterAsync
    • GetOplogForNodeAfterAsync
    • GetChainRangeAsync
    • PruneOplogAsync
    • snapshot import/export/drop/merge APIs.
  • Implement startup/open/close lifecycle and map-size handling.

Exit criteria:

  • Local contract tests pass for LMDB store.

Phase 2: Dual-Write + Read Shadow Validation

  • Keep Surreal oplog as source of truth.
  • Write every oplog mutation to both stores (DualWriteOplog=true).
  • Read-path comparison mode in non-prod:
    • query both stores
    • assert same hashes/order for key APIs
    • log mismatches.
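The read-path comparison reduces to an ordered-sequence check over the hashes each store returns for the same query. A minimal sketch (hypothetical helper name; the real compare mode would also log the query and feed the mismatch counter in section 7):

```python
def compare_shadow_reads(surreal_hashes, lmdb_hashes):
    """Return None when both stores agree (same hashes, same order),
    otherwise the index of the first divergence, for mismatch logging."""
    for i, (a, b) in enumerate(zip(surreal_hashes, lmdb_hashes)):
        if a != b:
            return i
    if len(surreal_hashes) != len(lmdb_hashes):
        return min(len(surreal_hashes), len(lmdb_hashes))
    return None
```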

Exit criteria:

  • Zero mismatches in soak tests.

Phase 3: Cutover

  • Set PreferLmdbReads=true in staging first.
  • Keep dual-write enabled for one release window.
  • Monitor:
    • prune duration
    • oplog query latency
    • mismatch counters
    • restart recovery behavior.

Exit criteria:

  • Stable staging and production canary.

Phase 4: Cleanup

  • Disable Surreal oplog writes.
  • Keep migration utility for rollback window.
  • Remove dual-compare instrumentation after confidence period.

5. Data Migration / Backfill

Backfill tool steps:

  1. Read dataset-scoped Surreal oplog export.
  2. Bulk import into LMDB by HLC order.
  3. Rebuild node-head table.
  4. Validate:
    • counts per dataset
    • counts per node
    • latest hash per node
    • random hash spot checks
    • chain-range spot checks.
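The count-based parts of step 4 can be expressed as a summary computed independently from each side and compared. A sketch, assuming entries are flattened to (dataset, node, wall, logic, hash) tuples; the function name and tuple shape are illustrative:

```python
from collections import Counter

def validation_summary(entries):
    """entries: iterable of (dataset, node, wall, logic, hash).
    Returns per-dataset counts, per-node counts, and latest hash per
    node, i.e. the first three checks of the backfill validation."""
    per_dataset = Counter()
    per_node = Counter()
    latest = {}  # (dataset, node) -> (wall, logic, hash)
    for (d, n, w, l, h) in entries:
        per_dataset[d] += 1
        per_node[(d, n)] += 1
        cur = latest.get((d, n))
        if cur is None or (w, l) > cur[:2]:
            latest[(d, n)] = (w, l, h)
    return per_dataset, per_node, {k: v[2] for k, v in latest.items()}
```

Because the summary is order-independent, the Surreal export and the LMDB import can be iterated in different orders and still compare equal when the backfill is correct.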

Rollback:

  • Keep Surreal oplog untouched during dual-write window.
  • Flip feature flags back to Surreal reads.

6. Unit Test Update Instructions

6.1 Reuse Existing Oplog Contract Tests

Use these as baseline parity requirements:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs (class SurrealStoreExportImportTests)

Actions:

  1. Extract oplog contract cases into shared test base (provider-agnostic).
  2. Run same suite against:
    • Surreal store
    • new LMDB store

Minimum parity cases:

  • append/query/merge/drop
  • dataset isolation
  • legacy/default dataset behavior (if supported)
  • GetChainRangeAsync correctness
  • GetLastEntryHashAsync persistence across restart
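The extract-and-share step can be sketched as a suite parameterized by a store factory. Python illustration only: the real shared base would be a C# test class; the `make_store` pattern, the fake's method names, and the in-memory double below are all hypothetical stand-ins showing the provider-agnostic shape.

```python
def run_oplog_contract_suite(make_store):
    """Provider-agnostic parity cases (append/dedupe, dataset isolation,
    last-hash); the same assertions run against any store factory."""
    s = make_store()
    assert s.append("ds1", "h1", node="n1", wall=1)      # first append accepted
    assert not s.append("ds1", "h1", node="n1", wall=1)  # dedupe by hash
    s.append("ds2", "h1", node="n1", wall=1)             # same hash, other dataset
    assert s.hashes("ds1") == ["h1"]                     # dataset isolation
    s.append("ds1", "h2", node="n1", wall=2)
    assert s.last_hash("ds1", "n1") == "h2"

class InMemoryOplogFake:
    """Trivial double so the suite above is runnable here."""
    def __init__(self):
        self.entries = {}  # dataset -> list of (wall, hash, node)
    def append(self, ds, h, node, wall):
        rows = self.entries.setdefault(ds, [])
        if any(r[1] == h for r in rows):
            return False
        rows.append((wall, h, node))
        return True
    def hashes(self, ds):
        return [h for (_, h, _) in sorted(self.entries.get(ds, []))]
    def last_hash(self, ds, node):
        rows = [r for r in self.entries.get(ds, []) if r[2] == node]
        return max(rows)[1] if rows else None
```

In the actual test project, the factory would construct the Surreal-backed store or the new LmdbOplogStore against a temp directory, so both providers run the identical cases.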

6.2 Add LMDB-Specific Unit Tests

Create new file:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs

Add tests for:

  1. Index consistency:
    • inserting one entry populates all indexes
    • deleting/pruning removes all index records
  2. Prune correctness:
    • removes <= cutoff
    • does not remove > cutoff
    • handles interleaved node timestamps
    • handles late-arriving old timestamp entry safely
  3. Node-head maintenance:
    • head advances on newer entry
    • prune invalidates/recomputes correctly
  4. Restart durability:
    • reopen LMDB env and verify last-hash + scans
  5. Dedupe:
    • duplicate hash append is idempotent

6.3 Update Integration/E2E Coverage

Files to touch:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs (or LMDB-focused equivalent)

Add/adjust scenarios:

  1. Gap recovery still works with LMDB oplog backend.
  2. Peer-confirmed prune still blocks/allows correctly.
  3. Crash between document commit and oplog write (Option A behavior) is detected/repaired by startup reconciliation.
  4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with generous margin).

6.4 Keep Existing Network Unit Tests Intact

Most network tests mock IOplogStore and should remain unchanged:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs

Only update if method behavior/ordering contracts are intentionally changed.

7. Performance and Observability Plan

Track and compare (Surreal vs LMDB):

  • AppendOplogEntryAsync latency p50/p95/p99
  • GetOplogForNodeAfterAsync latency
  • prune duration and entries/sec deleted
  • LMDB env file size and reclaimed free-page ratio
  • mismatch counters in dual-read compare mode

Add logs/metrics:

  • prune batches processed
  • dirty nodes recomputed
  • startup repair actions and counts

8. Risks and Mitigations

  1. Cross-engine consistency gaps (document metadata vs oplog)
    • Mitigation: startup reconciliation + dual-write shadow period.
  2. Incorrect composite key encoding
    • Mitigation: explicit encoding helper + property tests for sort/order invariants.
  3. Prune causing stale node-head values
    • Mitigation: touched-node tracking and lazy/eager recompute tests.
  4. LMDB map-size exhaustion
    • Mitigation: configurable map size, monitoring, and operational runbook for resize.

9. Review Checklist

  • ADR approved for LMDB key/index schema.
  • Feature flags merged (UseLmdbOplog, DualWriteOplog, PreferLmdbReads).
  • LMDB contract tests passing.
  • Dual-write mismatch telemetry in place.
  • Backfill tool implemented and validated in automated tests (staging execution ready).
  • Prune correctness + efficiency tests passing.
  • Rollback path documented and tested.