CBDDC/lmdbop.md
Joseph Doherty cce24fa8f3
Add LMDB oplog migration path with dual-write cutover support
Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
2026-02-22 17:44:57 -05:00

LMDB Oplog Migration Plan

1. Goal

Move IOplogStore persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.

Primary outcomes:

  • Keep existing IOplogStore contract semantics.
  • Make PruneOplogAsync efficient and safe under current timestamp-based cutoff behavior.
  • Keep roll-forward and rollback low risk via feature flags and verification steps.

2. Current Constraints That Must Be Preserved

The oplog is not just a queue; the implementation must support:

  • append + dedupe by hash
  • lookup by hash
  • node/time range scans
  • chain-range reconstruction by hash linkage
  • prune by cutoff timestamp
  • per-dataset isolation

Key references:

  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs
  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs
  • /Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs

Important behavior notes:

  • Prune is cutoff-based, not pure FIFO dequeue.
  • Late-arriving remote entries can have older timestamps than recent local writes, so prune can be non-contiguous in append order.
  • Current Surreal local write path performs atomic oplog+metadata(+checkpoint) persistence inside one transaction; cross-engine behavior must be handled intentionally.

3. Target Design (LMDB)

3.1 New Provider

Create an LMDB oplog provider in persistence:

  • New class: LmdbOplogStore implementing IOplogStore
  • New options class: LmdbOplogOptions
  • New DI extension, e.g. AddCBDDCLmdbOplog(...)

Suggested options:

  • EnvironmentPath
  • MapSizeBytes
  • MaxDatabases
  • SyncMode (durability/perf)
  • PruneBatchSize
  • EnableCompactionCopy (for optional file shrink operation)

3.2 LMDB Data Layout

Use multiple named DBIs (single environment):

  1. oplog_by_hash
    Key: {datasetId}|{hash}
    Value: serialized OplogEntry (compact binary or UTF-8 JSON)

  2. oplog_by_hlc
    Key: {datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}
    Value: empty or small marker
    Purpose: GetOplogAfterAsync, prune range scan

  3. oplog_by_node_hlc
    Key: {datasetId}|{nodeId}|{wall}|{logic}|{hash}
    Value: empty or marker
    Purpose: GetOplogForNodeAfterAsync, fast node head updates

  4. oplog_prev_to_hash (duplicate-allowed)
    Key: {datasetId}|{previousHash}
    Value: {hash}
    Purpose: chain traversal support for GetChainRangeAsync

  5. oplog_node_head
    Key: {datasetId}|{nodeId}
    Value: {wall, logic, hash}
    Purpose: O(1) GetLastEntryHashAsync

  6. oplog_meta
    Stores schema version, migration markers, and optional prune watermark per dataset.

Notes:

  • Use deterministic byte encoding for composite keys to preserve lexical order.
  • Keep dataset prefix in every index key to guarantee dataset isolation.
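The encoding note above can be made concrete. Below is a minimal sketch, in Python for illustration only (the store itself is C#), of a composite-key builder whose byte order matches HLC order; the helper name `hlc_index_key`, the 8/4-byte integer widths, and the NUL separator are assumptions for the ADR to settle, not the final format. It also assumes ids contain no NUL bytes; a real implementation might length-prefix variable segments instead.

```python
import struct

SEP = b"\x00"  # separator byte that sorts below any printable character

def hlc_index_key(dataset_id: str, wall: int, logic: int,
                  node_id: str, entry_hash: str) -> bytes:
    """Build an oplog_by_hlc-style key whose lexical byte order equals
    (wall, logic) numeric order, which is what LMDB cursor range scans
    compare by. Fixed-width big-endian integers are the key trick."""
    return SEP.join([
        dataset_id.encode("utf-8"),
        struct.pack(">Q", wall),   # 8-byte big-endian wall clock
        struct.pack(">I", logic),  # 4-byte big-endian logical counter
        node_id.encode("utf-8"),
        entry_hash.encode("utf-8"),
    ])
```

Because the dataset id is the first segment, a cursor positioned at a dataset prefix never crosses into another dataset's keys, which is what guarantees the isolation requirement.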

3.3 Write Transaction Rules

AppendOplogEntryAsync transaction:

  1. Check dedupe in oplog_by_hash.
  2. If absent: insert oplog_by_hash + all secondary indexes.
  3. Update oplog_node_head only if the incoming HLC (wall, logic) is greater than the current head for that node.
  4. Commit once.

MergeAsync/ImportAsync:

  • Reuse same insert routine in loops with write batching.
  • Dedupe strictly by hash.
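The append and merge rules above can be sketched with an in-memory stand-in for the DBIs. This is illustrative Python, not the real LMDB transaction code; the structure names mirror the databases in section 3.2, and the entry field names (`hash`, `prev`, `node`, `wall`, `logic`) are placeholders for whatever OplogEntry actually carries.

```python
class OplogSketch:
    """In-memory stand-in for the section 3.2 DBIs, used to illustrate
    the append transaction rules (dedupe, index fan-out, head update)."""

    def __init__(self):
        self.by_hash = {}         # (dataset, hash) -> entry
        self.by_hlc = set()       # (dataset, wall, logic, node, hash)
        self.by_node_hlc = set()  # (dataset, node, wall, logic, hash)
        self.prev_to_hash = {}    # (dataset, prev_hash) -> set of hashes
        self.node_head = {}       # (dataset, node) -> (wall, logic, hash)

    def append(self, dataset, entry):
        """Insert once; a duplicate hash is a no-op, so merge/import
        loops can reuse this routine and stay idempotent."""
        key = (dataset, entry["hash"])
        if key in self.by_hash:                      # step 1: dedupe
            return False
        self.by_hash[key] = entry                    # step 2: all indexes
        self.by_hlc.add((dataset, entry["wall"], entry["logic"],
                         entry["node"], entry["hash"]))
        self.by_node_hlc.add((dataset, entry["node"], entry["wall"],
                              entry["logic"], entry["hash"]))
        self.prev_to_hash.setdefault(
            (dataset, entry["prev"]), set()).add(entry["hash"])
        head = self.node_head.get((dataset, entry["node"]))
        if head is None or (entry["wall"], entry["logic"]) > head[:2]:
            self.node_head[(dataset, entry["node"])] = (  # step 3: head
                entry["wall"], entry["logic"], entry["hash"])
        return True
```

Note that a late-arriving entry with an older HLC still gets indexed but does not move the node head, which is the property the prune and head-maintenance tests in section 6.2 exercise.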

3.4 Prune Strategy

Base prune operation (must-have):

  1. Cursor-scan oplog_by_hlc up to cutoff key for target dataset.
  2. For each candidate hash:
    • delete from oplog_by_hash
    • delete node index key
    • delete prev->hash duplicate mapping
    • delete hlc index key
  3. Recompute affected oplog_node_head entries lazily (on read) or eagerly for touched nodes.

Efficiency enhancements (recommended):

  • Process deletes in batches (PruneBatchSize) inside bounded write txns.
  • Keep optional per-node dirty set during prune to limit head recomputation.
  • Optional periodic LMDB compact copy if physical file shrink is needed (LMDB naturally reuses freed pages, but does not always shrink file immediately).
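The batched cutoff prune can be sketched as follows, again in Python over dict/set stand-ins for the DBIs (the node and prev->hash indexes are elided for brevity). The eager head recompute shown here is one of the two options from step 3; the batch size plays the role of PruneBatchSize bounding each write transaction.

```python
def prune_oplog(by_hash, by_hlc, node_head, dataset, cutoff_wall, batch_size=2):
    """Batched cutoff prune over simple stand-ins for the DBIs:
      by_hash:   dict (dataset, hash) -> entry dict
      by_hlc:    set of (dataset, wall, logic, node, hash)
      node_head: dict (dataset, node) -> (wall, logic, hash)
    Deletes everything at or below cutoff_wall in bounded batches, then
    recomputes heads only for the nodes the prune actually touched."""
    touched = set()
    while True:
        batch = sorted(k for k in by_hlc
                       if k[0] == dataset and k[1] <= cutoff_wall)[:batch_size]
        if not batch:
            break
        for (_, wall, logic, node, h) in batch:  # one bounded "txn" per batch
            by_hlc.discard((dataset, wall, logic, node, h))
            by_hash.pop((dataset, h), None)
            touched.add(node)
    for node in touched:  # eager recompute limited to the dirty set
        remaining = [(e["wall"], e["logic"], e["hash"])
                     for (d, _), e in by_hash.items()
                     if d == dataset and e["node"] == node]
        if remaining:
            node_head[(dataset, node)] = max(remaining)
        else:
            node_head.pop((dataset, node), None)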

3.5 Atomicity with Document Metadata

Decision required (explicit in implementation review):

Option A (phase 1 recommended):

  • Accept cross-engine eventual atomicity.
  • Keep current document write flow.
  • Add reconciliation/repair on startup:
    • detect metadata entries missing oplog hash for recent writes
    • rebuild node-head and index consistency from oplog_by_hash.

Option B (hard mode):

  • Introduce durable outbox pattern to guarantee atomic handoff across engines.
  • Higher complexity; schedule after functional cutover.

Plan uses Option A first for lower migration risk.
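The startup reconciliation in Option A can be sketched as a two-step scan: detect recent document writes whose oplog entry never landed (the crash window between engines), and rebuild node heads from the surviving oplog_by_hash records. Hypothetical names and shapes; the real repair would run inside the store's open path.

```python
def reconcile_on_startup(recent_metadata_hashes, by_hash, dataset):
    """Option A repair sketch.
      recent_metadata_hashes: oplog hashes the document store recorded
      by_hash: dict (dataset, hash) -> {"node", "wall", "logic"}
    Returns (hashes missing from the oplog, rebuilt node heads)."""
    missing = [h for h in recent_metadata_hashes
               if (dataset, h) not in by_hash]
    heads = {}
    for (d, h), e in by_hash.items():
        if d != dataset:
            continue
        cur = heads.get(e["node"])
        if cur is None or (e["wall"], e["logic"]) > cur[:2]:
            heads[e["node"]] = (e["wall"], e["logic"], h)
    return missing, heads
```

Anything in `missing` is a detected cross-engine gap to surface via the startup-repair metrics in section 7.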

4. Phased Execution Plan

Phase 0: Prep and Design Freeze

  • Add ADR documenting:
    • key encoding format
    • index schema
    • prune algorithm
    • consistency model (Option A above)
  • Add config model and feature flags:
    • UseLmdbOplog
    • DualWriteOplog
    • PreferLmdbReads

Exit criteria:

  • ADR approved.
  • Configuration contract approved.

Phase 1: LMDB Store Skeleton

  • Add package reference LightningDB.
  • Implement LmdbOplogStore with:
    • AppendOplogEntryAsync
    • GetEntryByHashAsync
    • GetLastEntryHashAsync
    • GetOplogAfterAsync
    • GetOplogForNodeAfterAsync
    • GetChainRangeAsync
    • PruneOplogAsync
    • snapshot import/export/drop/merge APIs.
  • Implement startup/open/close lifecycle and map-size handling.

Exit criteria:

  • Local contract tests pass for LMDB store.

Phase 2: Dual-Write + Read Shadow Validation

  • Keep Surreal oplog as source of truth.
  • Write every oplog mutation to both stores (DualWriteOplog=true).
  • Read-path comparison mode in non-prod:
    • query both stores
    • assert same hashes/order for key APIs
    • log mismatches.
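The read-path comparison reduces to an ordered-sequence check over the hashes each store returns for the same query. A minimal sketch (hypothetical helper name; the real compare mode would also log the query and feed the mismatch counter in section 7):

```python
def compare_shadow_reads(surreal_hashes, lmdb_hashes):
    """Return None when both stores agree (same hashes, same order),
    otherwise the index of the first divergence, for mismatch logging."""
    for i, (a, b) in enumerate(zip(surreal_hashes, lmdb_hashes)):
        if a != b:
            return i
    if len(surreal_hashes) != len(lmdb_hashes):
        return min(len(surreal_hashes), len(lmdb_hashes))
    return None
```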

Exit criteria:

  • Zero mismatches in soak tests.

Phase 3: Cutover

  • Set PreferLmdbReads=true in staging first.
  • Keep dual-write enabled for one release window.
  • Monitor:
    • prune duration
    • oplog query latency
    • mismatch counters
    • restart recovery behavior.

Exit criteria:

  • Stable staging and production canary.

Phase 4: Cleanup

  • Disable Surreal oplog writes.
  • Keep migration utility for rollback window.
  • Remove dual-compare instrumentation after confidence period.

5. Data Migration / Backfill

Backfill tool steps:

  1. Read dataset-scoped Surreal oplog export.
  2. Bulk import into LMDB by HLC order.
  3. Rebuild node-head table.
  4. Validate:
    • counts per dataset
    • counts per node
    • latest hash per node
    • random hash spot checks
    • chain-range spot checks.
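The count-based parts of step 4 can be expressed as a summary computed independently from each side and compared. A sketch, assuming entries are flattened to (dataset, node, wall, logic, hash) tuples; the function name and tuple shape are illustrative:

```python
from collections import Counter

def validation_summary(entries):
    """entries: iterable of (dataset, node, wall, logic, hash).
    Returns per-dataset counts, per-node counts, and latest hash per
    node, i.e. the first three checks of the backfill validation."""
    per_dataset = Counter()
    per_node = Counter()
    latest = {}  # (dataset, node) -> (wall, logic, hash)
    for (d, n, w, l, h) in entries:
        per_dataset[d] += 1
        per_node[(d, n)] += 1
        cur = latest.get((d, n))
        if cur is None or (w, l) > cur[:2]:
            latest[(d, n)] = (w, l, h)
    return per_dataset, per_node, {k: v[2] for k, v in latest.items()}
```

Because the summary is order-independent, the Surreal export and the LMDB import can be iterated in different orders and still compare equal when the backfill is correct.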

Rollback:

  • Keep Surreal oplog untouched during dual-write window.
  • Flip feature flags back to Surreal reads.

6. Unit Test Update Instructions

6.1 Reuse Existing Oplog Contract Tests

Use these as baseline parity requirements:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs (class SurrealStoreExportImportTests)

Actions:

  1. Extract oplog contract cases into shared test base (provider-agnostic).
  2. Run same suite against:
    • Surreal store
    • new LMDB store

Minimum parity cases:

  • append/query/merge/drop
  • dataset isolation
  • legacy/default dataset behavior (if supported)
  • GetChainRangeAsync correctness
  • GetLastEntryHashAsync persistence across restart
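The extract-and-share step can be sketched as a suite parameterized by a store factory. Python illustration only: the real shared base would be a C# test class; the `make_store` pattern, the fake's method names, and the in-memory double below are all hypothetical stand-ins showing the provider-agnostic shape.

```python
def run_oplog_contract_suite(make_store):
    """Provider-agnostic parity cases (append/dedupe, dataset isolation,
    last-hash); the same assertions run against any store factory."""
    s = make_store()
    assert s.append("ds1", "h1", node="n1", wall=1)      # first append accepted
    assert not s.append("ds1", "h1", node="n1", wall=1)  # dedupe by hash
    s.append("ds2", "h1", node="n1", wall=1)             # same hash, other dataset
    assert s.hashes("ds1") == ["h1"]                     # dataset isolation
    s.append("ds1", "h2", node="n1", wall=2)
    assert s.last_hash("ds1", "n1") == "h2"

class InMemoryOplogFake:
    """Trivial double so the suite above is runnable here."""
    def __init__(self):
        self.entries = {}  # dataset -> list of (wall, hash, node)
    def append(self, ds, h, node, wall):
        rows = self.entries.setdefault(ds, [])
        if any(r[1] == h for r in rows):
            return False
        rows.append((wall, h, node))
        return True
    def hashes(self, ds):
        return [h for (_, h, _) in sorted(self.entries.get(ds, []))]
    def last_hash(self, ds, node):
        rows = [r for r in self.entries.get(ds, []) if r[2] == node]
        return max(rows)[1] if rows else None
```

In the actual test project, the factory would construct the Surreal-backed store or the new LmdbOplogStore against a temp directory, so both providers run the identical cases.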

6.2 Add LMDB-Specific Unit Tests

Create new file:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs

Add tests for:

  1. Index consistency:
    • inserting one entry populates all indexes
    • deleting/pruning removes all index records
  2. Prune correctness:
    • removes <= cutoff
    • does not remove > cutoff
    • handles interleaved node timestamps
    • handles late-arriving old timestamp entry safely
  3. Node-head maintenance:
    • head advances on newer entry
    • prune invalidates/recomputes correctly
  4. Restart durability:
    • reopen LMDB env and verify last-hash + scans
  5. Dedupe:
    • duplicate hash append is idempotent

6.3 Update Integration/E2E Coverage

Files to touch:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs (or LMDB-focused equivalent)

Add/adjust scenarios:

  1. Gap recovery still works with LMDB oplog backend.
  2. Peer-confirmed prune still blocks/allows correctly.
  3. Crash between document commit and oplog write (Option A behavior) is detected/repaired by startup reconciliation.
  4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with generous margin).

6.4 Keep Existing Network Unit Tests Intact

Most network tests mock IOplogStore and should remain unchanged:

  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs
  • /Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs

Only update if method behavior/ordering contracts are intentionally changed.

7. Performance and Observability Plan

Track and compare (Surreal vs LMDB):

  • AppendOplogEntryAsync latency p50/p95/p99
  • GetOplogForNodeAfterAsync latency
  • prune duration and entries/sec deleted
  • LMDB env file size and reclaimed free-page ratio
  • mismatch counters in dual-read compare mode

Add logs/metrics:

  • prune batches processed
  • dirty nodes recomputed
  • startup repair actions and counts

8. Risks and Mitigations

  1. Cross-engine consistency gaps (document metadata vs oplog)
    • Mitigation: startup reconciliation + dual-write shadow period.
  2. Incorrect composite key encoding
    • Mitigation: explicit encoding helper + property tests for sort/order invariants.
  3. Prune causing stale node-head values
    • Mitigation: touched-node tracking and lazy/eager recompute tests.
  4. LMDB map-size exhaustion
    • Mitigation: configurable map size, monitoring, and operational runbook for resize.

9. Review Checklist

  • ADR approved for LMDB key/index schema.
  • Feature flags merged (UseLmdbOplog, DualWriteOplog, PreferLmdbReads).
  • LMDB contract tests passing.
  • Dual-write mismatch telemetry in place.
  • Backfill tool implemented and validated in automated tests (staging execution ready).
  • Prune correctness + efficiency tests passing.
  • Rollback path documented and tested.