# LMDB Oplog Migration Plan
## 1. Goal

Move `IOplogStore` persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.

Primary outcomes:

- Keep existing `IOplogStore` contract semantics.
- Make `PruneOplogAsync` efficient and safe under the current timestamp-based cutoff behavior.
- Keep roll-forward and rollback low risk via feature flags and verification steps.
## 2. Current Constraints That Must Be Preserved

The oplog is not just a queue; the implementation must support:

- append + dedupe by hash
- lookup by hash
- node/time range scans
- chain-range reconstruction by hash linkage
- prune by cutoff timestamp
- per-dataset isolation

Key references:

- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs`

Important behavior notes:

- Prune is cutoff-based, not a pure FIFO dequeue.
- Late-arriving remote entries can carry older timestamps than recent local writes, so prune can be non-contiguous in append order.
- The current Surreal local write path persists oplog + metadata (+ checkpoint) atomically inside one transaction; cross-engine behavior must be handled intentionally.
## 3. Target Design (LMDB)
## 3.1 New Provider

Create an LMDB oplog provider in persistence:

- New class: `LmdbOplogStore : OplogStore`
- New options class: `LmdbOplogOptions`
- New DI extension, e.g. `AddCBDDCLmdbOplog(...)`

Suggested options:

- `EnvironmentPath`
- `MapSizeBytes`
- `MaxDatabases`
- `SyncMode` (durability/perf trade-off)
- `PruneBatchSize`
- `EnableCompactionCopy` (for an optional file-shrink operation)
## 3.2 LMDB Data Layout

Use multiple named DBIs in a single environment:

1. `oplog_by_hash`
   - Key: `{datasetId}|{hash}`
   - Value: serialized `OplogEntry` (compact binary or UTF-8 JSON)
2. `oplog_by_hlc`
   - Key: `{datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}`
   - Value: empty or a small marker
   - Purpose: `GetOplogAfterAsync`, prune range scan
3. `oplog_by_node_hlc`
   - Key: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}`
   - Value: empty or a marker
   - Purpose: `GetOplogForNodeAfterAsync`, fast node head updates
4. `oplog_prev_to_hash` (duplicates allowed)
   - Key: `{datasetId}|{previousHash}`
   - Value: `{hash}`
   - Purpose: chain traversal support for `GetChainRangeAsync`
5. `oplog_node_head`
   - Key: `{datasetId}|{nodeId}`
   - Value: `{wall, logic, hash}`
   - Purpose: O(1) `GetLastEntryHashAsync`
6. `oplog_meta`
   - Stores the schema version, migration markers, and an optional prune watermark per dataset.

Notes:

- Use a deterministic byte encoding for composite keys to preserve lexical order.
- Keep the dataset prefix in every index key to guarantee dataset isolation.
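
The key-encoding note above matters because LMDB compares keys byte-wise. A minimal sketch of an `oplog_by_hlc` key encoder follows; it is illustrative, not from the codebase. The `0x00` separator byte stands in for the `|` shown in the layout, and the parameter names mirror the key template:

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

static class HlcKey
{
    // Builds the oplog_by_hlc key: datasetId | wall | logic | nodeId | hash.
    // Integers are written big-endian so byte-wise lexical order equals
    // numeric order, which is what cursor range scans rely on.
    public static byte[] Encode(string datasetId, ulong wall, ulong logic,
                                string nodeId, string hash)
    {
        using var ms = new MemoryStream();
        void Part(byte[] bytes) { ms.Write(bytes, 0, bytes.Length); ms.WriteByte(0x00); }

        Part(Encoding.UTF8.GetBytes(datasetId));

        var num = new byte[8];
        BinaryPrimitives.WriteUInt64BigEndian(num, wall);
        Part(num);
        BinaryPrimitives.WriteUInt64BigEndian(num, logic);
        Part(num);

        Part(Encoding.UTF8.GetBytes(nodeId));
        var h = Encoding.UTF8.GetBytes(hash);
        ms.Write(h, 0, h.Length);
        return ms.ToArray();
    }
}
```

With little-endian or decimal-string encoding, `wall = 2` would sort after `wall = 10`; big-endian keeps numeric and byte order aligned, so a property test over random `(wall, logic)` pairs is a cheap guard here.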
## 3.3 Write Transaction Rules

`AppendOplogEntryAsync` transaction:

1. Check dedupe in `oplog_by_hash`.
2. If absent: insert into `oplog_by_hash` plus all secondary indexes.
3. Update `oplog_node_head` only if the incoming timestamp is newer than the current head timestamp for that node.
4. Commit once.

`MergeAsync`/`ImportAsync`:

- Reuse the same insert routine in loops with write batching.
- Dedupe strictly by hash.
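
The head-update rule in step 3 is the subtle part: late-arriving old entries must never rewind a node head. A sketch of that rule as a pure predicate (field names `Wall`/`Logic` are assumptions about the HLC shape):

```csharp
// Hypothetical HLC pair; the real OplogEntry timestamp shape may differ.
public readonly record struct Hlc(ulong Wall, ulong Logic);

public static class NodeHeadRule
{
    // The head advances only for a strictly newer (wall, logic) pair,
    // so a late-arriving entry with an older timestamp is indexed but
    // leaves oplog_node_head untouched.
    public static bool ShouldAdvance(Hlc? currentHead, Hlc incoming)
        => currentHead is not Hlc head
           || incoming.Wall > head.Wall
           || (incoming.Wall == head.Wall && incoming.Logic > head.Logic);
}
```

Keeping this as an isolated function makes the interleaved-timestamp unit tests in section 6.2 trivial to write.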
## 3.4 Prune Strategy

Base prune operation (must-have):

1. Cursor-scan `oplog_by_hlc` up to the cutoff key for the target dataset.
2. For each candidate hash:
   - delete from `oplog_by_hash`
   - delete the node index key
   - delete the prev->hash duplicate mapping
   - delete the HLC index key
3. Recompute affected `oplog_node_head` entries lazily (on read) or eagerly for touched nodes.

Efficiency enhancements (recommended):

- Process deletes in batches (`PruneBatchSize`) inside bounded write txns.
- Keep an optional per-node dirty set during prune to limit head recomputation.
- Optionally run a periodic LMDB compact copy if the physical file must shrink (LMDB reuses freed pages but does not shrink the file automatically).
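
The batching and dirty-set ideas above can be sketched with an in-memory stand-in for the HLC index (this is not the LMDB implementation; `Entry` and its fields are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for an indexed oplog entry.
record Entry(string NodeId, ulong Wall, string Hash);

static class PruneSketch
{
    // Deletes all entries at or below cutoffWall in bounded batches and
    // returns the set of nodes whose head may need recomputation.
    public static HashSet<string> Prune(SortedDictionary<ulong, List<Entry>> byHlc,
                                        ulong cutoffWall, int batchSize)
    {
        var dirtyNodes = new HashSet<string>();
        while (true)
        {
            // One bounded "write txn": at most batchSize victims <= cutoff.
            var batch = byHlc.Where(kv => kv.Key <= cutoffWall)
                             .SelectMany(kv => kv.Value.Select(e => (kv.Key, e)))
                             .Take(batchSize)
                             .ToList();
            if (batch.Count == 0) break;

            foreach (var (wall, e) in batch)
            {
                var bucket = byHlc[wall];
                bucket.Remove(e);
                if (bucket.Count == 0) byHlc.Remove(wall);
                dirtyNodes.Add(e.NodeId); // head recompute limited to these nodes
            }
        }
        return dirtyNodes;
    }
}
```

Bounding each batch keeps individual LMDB write transactions (and their page churn) small, which is what keeps prune latency predictable on large oplogs.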
## 3.5 Atomicity with Document Metadata

Decision required (explicit in implementation review):

Option A (phase 1, recommended):

- Accept cross-engine eventual atomicity.
- Keep the current document write flow.
- Add reconciliation/repair on startup:
  - detect metadata entries missing an oplog hash for recent writes
  - rebuild node-head and index consistency from `oplog_by_hash`.

Option B (hard mode):

- Introduce a durable outbox pattern to guarantee atomic handoff across engines.
- Higher complexity; schedule after functional cutover.

The plan uses Option A first for lower migration risk.
## 4. Phased Execution Plan
## Phase 0: Prep and Design Freeze

- Add an ADR documenting:
  - key encoding format
  - index schema
  - prune algorithm
  - consistency model (Option A above)
- Add the config model and feature flags:
  - `UseLmdbOplog`
  - `DualWriteOplog`
  - `PreferLmdbReads`

Exit criteria:

- ADR approved.
- Configuration contract approved.
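
The flag surface could take a shape like the following; the flag names come from this plan, while the class name and defaults are assumptions chosen so that the all-false state is the safe pre-migration configuration:

```csharp
// Hypothetical options class binding the three migration flags.
public sealed class OplogMigrationOptions
{
    public bool UseLmdbOplog { get; set; } = false;    // register the LMDB provider at all
    public bool DualWriteOplog { get; set; } = false;  // mirror writes to both stores (Phase 2+)
    public bool PreferLmdbReads { get; set; } = false; // serve reads from LMDB (Phase 3)
}
```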
## Phase 1: LMDB Store Skeleton

- Add a package reference to `LightningDB`.
- Implement `LmdbOplogStore` with:
  - `AppendOplogEntryAsync`
  - `GetEntryByHashAsync`
  - `GetLastEntryHashAsync`
  - `GetOplogAfterAsync`
  - `GetOplogForNodeAfterAsync`
  - `GetChainRangeAsync`
  - `PruneOplogAsync`
  - snapshot import/export/drop/merge APIs.
- Implement the startup/open/close lifecycle and map-size handling.

Exit criteria:

- Local contract tests pass for the LMDB store.
## Phase 2: Dual-Write + Read Shadow Validation

- Keep the Surreal oplog as the source of truth.
- Write every oplog mutation to both stores (`DualWriteOplog=true`).
- Enable read-path comparison mode in non-prod:
  - query both stores
  - assert the same hashes/order for key APIs
  - log mismatches.

Exit criteria:

- Zero mismatches in soak tests.
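
One way to realize dual-write with read comparison is a decorator over the store interface. The sketch below uses a reduced two-method subset (the real `IOplogStore` is larger, and these signatures are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed subset of the real IOplogStore, for illustration only.
public interface IOplogStoreSubset
{
    Task AppendOplogEntryAsync(string datasetId, string hash, byte[] payload);
    Task<IReadOnlyList<string>> GetOplogAfterAsync(string datasetId, ulong cutoff);
}

public sealed class DualWriteOplogStore : IOplogStoreSubset
{
    private readonly IOplogStoreSubset _surreal; // source of truth in Phase 2
    private readonly IOplogStoreSubset _lmdb;    // shadow store under validation
    private readonly Action<string> _logMismatch;

    public DualWriteOplogStore(IOplogStoreSubset surreal, IOplogStoreSubset lmdb,
                               Action<string> logMismatch)
        => (_surreal, _lmdb, _logMismatch) = (surreal, lmdb, logMismatch);

    public async Task AppendOplogEntryAsync(string datasetId, string hash, byte[] payload)
    {
        await _surreal.AppendOplogEntryAsync(datasetId, hash, payload);
        await _lmdb.AppendOplogEntryAsync(datasetId, hash, payload); // shadow write
    }

    public async Task<IReadOnlyList<string>> GetOplogAfterAsync(string datasetId, ulong cutoff)
    {
        var primary = await _surreal.GetOplogAfterAsync(datasetId, cutoff);
        var shadow = await _lmdb.GetOplogAfterAsync(datasetId, cutoff);
        if (!primary.SequenceEqual(shadow))
            _logMismatch($"oplog mismatch for {datasetId} after {cutoff}");
        return primary; // Surreal stays authoritative until cutover
    }
}
```

Because only the primary result is returned, a shadow-store bug surfaces as a mismatch counter rather than a behavior change, which is what makes this phase low risk.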
## Phase 3: Cutover

- Set `PreferLmdbReads=true` in staging first.
- Keep dual-write enabled for one release window.
- Monitor:
  - prune duration
  - oplog query latency
  - mismatch counters
  - restart recovery behavior.

Exit criteria:

- Stable staging and production canary.
## Phase 4: Cleanup

- Disable Surreal oplog writes.
- Keep the migration utility for the rollback window.
- Remove dual-compare instrumentation after a confidence period.
## 5. Data Migration / Backfill

Backfill tool steps:

1. Read the dataset-scoped Surreal oplog export.
2. Bulk-import into LMDB in HLC order.
3. Rebuild the node-head table.
4. Validate:
   - counts per dataset
   - counts per node
   - latest hash per node
   - random hash spot checks
   - chain-range spot checks.

Rollback:

- Keep the Surreal oplog untouched during the dual-write window.
- Flip feature flags back to Surreal reads.
## 6. Unit Test Update Instructions

## 6.1 Reuse Existing Oplog Contract Tests

Use these as baseline parity requirements:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs` (class `SurrealStoreExportImportTests`)

Actions:

1. Extract the oplog contract cases into a shared, provider-agnostic test base.
2. Run the same suite against:
   - the Surreal store
   - the new LMDB store

Minimum parity cases:

- append/query/merge/drop
- dataset isolation
- legacy/default dataset behavior (if supported)
- `GetChainRangeAsync` correctness
- `GetLastEntryHashAsync` persistence across restart
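
The extraction in step 1 can follow the standard abstract-test-class pattern. A minimal sketch (xUnit, with an assumed reduced store interface; the real contract base would cover all the parity cases listed above):

```csharp
using System.Threading.Tasks;
using Xunit;

// Assumed minimal surface for the sketch; the real IOplogStore is larger.
public interface IOplogStoreLite
{
    Task<bool> AppendOplogEntryAsync(string datasetId, string hash);
}

public abstract class OplogStoreContractTestsBase
{
    // Each provider supplies its own store instance.
    protected abstract IOplogStoreLite CreateStore();

    [Fact]
    public async Task Duplicate_hash_append_is_idempotent()
    {
        var store = CreateStore();
        Assert.True(await store.AppendOplogEntryAsync("ds1", "h1"));
        Assert.False(await store.AppendOplogEntryAsync("ds1", "h1"));
    }

    [Fact]
    public async Task Datasets_are_isolated()
    {
        var store = CreateStore();
        await store.AppendOplogEntryAsync("ds1", "h1");
        // Same hash in another dataset must not be treated as a duplicate.
        Assert.True(await store.AppendOplogEntryAsync("ds2", "h1"));
    }
}

// Each provider then derives once, e.g.:
// public sealed class LmdbOplogStoreContractTests : OplogStoreContractTestsBase
// {
//     protected override IOplogStoreLite CreateStore() => /* new LmdbOplogStore(...) */ null!;
// }
```

This keeps every parity case written exactly once, so any Surreal/LMDB divergence fails one suite or the other instead of going unnoticed.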
## 6.2 Add LMDB-Specific Unit Tests

Create a new file:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs`

Add tests for:

1. Index consistency:
   - inserting one entry populates all indexes
   - deleting/pruning removes all index records
2. Prune correctness:
   - removes entries `<= cutoff`
   - does not remove entries `> cutoff`
   - handles interleaved node timestamps
   - handles a late-arriving old-timestamp entry safely
3. Node-head maintenance:
   - the head advances on a newer entry
   - prune invalidates/recomputes it correctly
4. Restart durability:
   - reopen the LMDB env and verify last-hash + scans
5. Dedupe:
   - a duplicate hash append is idempotent
## 6.3 Update Integration/E2E Coverage

Files to touch:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs` (or an LMDB-focused equivalent)

Add/adjust scenarios:

1. Gap recovery still works with the LMDB oplog backend.
2. Peer-confirmed prune still blocks/allows correctly.
3. A crash between document commit and oplog write (Option A behavior) is detected and repaired by startup reconciliation.
4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with a generous margin).
## 6.4 Keep Existing Network Unit Tests Intact

Most network tests mock `IOplogStore` and should remain unchanged:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs`

Only update them if method behavior or ordering contracts are intentionally changed.
## 7. Performance and Observability Plan

Track and compare (Surreal vs LMDB):

- `AppendOplogEntryAsync` latency p50/p95/p99
- `GetOplogForNodeAfterAsync` latency
- prune duration and entries/sec deleted
- LMDB env file size and reclaimed free-page ratio
- mismatch counters in dual-read compare mode

Add logs/metrics for:

- prune batches processed
- dirty nodes recomputed
- startup repair actions and counts
## 8. Risks and Mitigations

1. Cross-engine consistency gaps (document metadata vs oplog)
   - Mitigation: startup reconciliation + a dual-write shadow period.
2. Incorrect composite key encoding
   - Mitigation: an explicit encoding helper + property tests for sort-order invariants.
3. Prune causing stale node-head values
   - Mitigation: touched-node tracking and lazy/eager recompute tests.
4. LMDB map-size exhaustion
   - Mitigation: configurable map size, monitoring, and an operational runbook for resizing.
## 9. Review Checklist

- [x] ADR approved for the LMDB key/index schema.
- [x] Feature flags merged (`UseLmdbOplog`, `DualWriteOplog`, `PreferLmdbReads`).
- [x] LMDB contract tests passing.
- [x] Dual-write mismatch telemetry in place.
- [x] Backfill tool implemented and validated in automated tests (staging execution ready).
- [x] Prune correctness + efficiency tests passing.
- [x] Rollback path documented and tested.