# LMDB Oplog Migration Plan
## 1. Goal

Move `IOplogStore` persistence from Surreal-backed oplog tables to an LMDB-backed store while preserving current sync behavior and improving prune efficiency.

Primary outcomes:

- Keep existing `IOplogStore` contract semantics.
- Make `PruneOplogAsync` efficient and safe under the current timestamp-based cutoff behavior.
- Keep roll-forward and rollback low risk via feature flags and verification steps.
## 2. Current Constraints That Must Be Preserved

The oplog is not just a queue; the implementation must support:

- append + dedupe by hash
- lookup by hash
- node/time range scans
- chain-range reconstruction by hash linkage
- prune by cutoff timestamp
- per-dataset isolation

Key references:

- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Core/Storage/IOplogStore.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/SyncOrchestrator.cs`
- `/Users/dohertj2/Desktop/CBDDC/src/ZB.MOM.WW.CBDDC.Network/TcpSyncServer.cs`

Important behavior notes:

- Prune is cutoff-based, not a pure FIFO dequeue.
- Late-arriving remote entries can carry older timestamps than recent local writes, so prune can be non-contiguous in append order.
- The current Surreal local write path persists oplog + metadata (+ checkpoint) atomically inside one transaction; cross-engine behavior must be handled intentionally.
## 3. Target Design (LMDB)
## 3.1 New Provider

Create an LMDB oplog provider in persistence:

- New class: `LmdbOplogStore : OplogStore`
- New options class: `LmdbOplogOptions`
- New DI extension, e.g. `AddCBDDCLmdbOplog(...)`

Suggested options:

- `EnvironmentPath`
- `MapSizeBytes`
- `MaxDatabases`
- `SyncMode` (durability/perf trade-off)
- `PruneBatchSize`
- `EnableCompactionCopy` (for an optional file-shrink operation)
## 3.2 LMDB Data Layout

Use multiple named DBIs in a single environment:

1. `oplog_by_hash`
   - Key: `{datasetId}|{hash}`
   - Value: serialized `OplogEntry` (compact binary or UTF-8 JSON)
2. `oplog_by_hlc`
   - Key: `{datasetId}|{wall:big-endian}|{logic:big-endian}|{nodeId}|{hash}`
   - Value: empty or a small marker
   - Purpose: `GetOplogAfterAsync`, prune range scan
3. `oplog_by_node_hlc`
   - Key: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}`
   - Value: empty or a marker
   - Purpose: `GetOplogForNodeAfterAsync`, fast node head updates
4. `oplog_prev_to_hash` (duplicates allowed)
   - Key: `{datasetId}|{previousHash}`
   - Value: `{hash}`
   - Purpose: chain traversal support for `GetChainRangeAsync`
5. `oplog_node_head`
   - Key: `{datasetId}|{nodeId}`
   - Value: `{wall, logic, hash}`
   - Purpose: O(1) `GetLastEntryHashAsync`
6. `oplog_meta`
   - Stores the schema version, migration markers, and an optional prune watermark per dataset.

Notes:

- Use a deterministic byte encoding for composite keys to preserve lexical order.
- Keep the dataset prefix in every index key to guarantee dataset isolation.
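
The key-encoding note above matters because LMDB compares keys byte-wise. A minimal sketch of an `oplog_by_hlc` key encoder follows; it is illustrative, not from the codebase. The `0x00` separator byte stands in for the `|` shown in the layout, and the parameter names mirror the key template:

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

static class HlcKey
{
    // Builds the oplog_by_hlc key: datasetId | wall | logic | nodeId | hash.
    // Integers are written big-endian so byte-wise lexical order equals
    // numeric order, which is what cursor range scans rely on.
    public static byte[] Encode(string datasetId, ulong wall, ulong logic,
                                string nodeId, string hash)
    {
        using var ms = new MemoryStream();
        void Part(byte[] bytes) { ms.Write(bytes, 0, bytes.Length); ms.WriteByte(0x00); }

        Part(Encoding.UTF8.GetBytes(datasetId));

        var num = new byte[8];
        BinaryPrimitives.WriteUInt64BigEndian(num, wall);
        Part(num);
        BinaryPrimitives.WriteUInt64BigEndian(num, logic);
        Part(num);

        Part(Encoding.UTF8.GetBytes(nodeId));
        var h = Encoding.UTF8.GetBytes(hash);
        ms.Write(h, 0, h.Length);
        return ms.ToArray();
    }
}
```

With little-endian or decimal-string encoding, `wall = 2` would sort after `wall = 10`; big-endian keeps numeric and byte order aligned, so a property test over random `(wall, logic)` pairs is a cheap guard here.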
## 3.3 Write Transaction Rules

`AppendOplogEntryAsync` transaction:

1. Check dedupe in `oplog_by_hash`.
2. If absent: insert into `oplog_by_hash` plus all secondary indexes.
3. Update `oplog_node_head` only if the incoming timestamp is newer than the current head timestamp for that node.
4. Commit once.

`MergeAsync`/`ImportAsync`:

- Reuse the same insert routine in loops with write batching.
- Dedupe strictly by hash.
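
The head-update rule in step 3 is the subtle part: late-arriving old entries must never rewind a node head. A sketch of that rule as a pure predicate (field names `Wall`/`Logic` are assumptions about the HLC shape):

```csharp
// Hypothetical HLC pair; the real OplogEntry timestamp shape may differ.
public readonly record struct Hlc(ulong Wall, ulong Logic);

public static class NodeHeadRule
{
    // The head advances only for a strictly newer (wall, logic) pair,
    // so a late-arriving entry with an older timestamp is indexed but
    // leaves oplog_node_head untouched.
    public static bool ShouldAdvance(Hlc? currentHead, Hlc incoming)
        => currentHead is not Hlc head
           || incoming.Wall > head.Wall
           || (incoming.Wall == head.Wall && incoming.Logic > head.Logic);
}
```

Keeping this as an isolated function makes the interleaved-timestamp unit tests in section 6.2 trivial to write.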
## 3.4 Prune Strategy

Base prune operation (must-have):

1. Cursor-scan `oplog_by_hlc` up to the cutoff key for the target dataset.
2. For each candidate hash:
   - delete from `oplog_by_hash`
   - delete the node index key
   - delete the prev->hash duplicate mapping
   - delete the HLC index key
3. Recompute affected `oplog_node_head` entries lazily (on read) or eagerly for touched nodes.

Efficiency enhancements (recommended):

- Process deletes in batches (`PruneBatchSize`) inside bounded write txns.
- Keep an optional per-node dirty set during prune to limit head recomputation.
- Optionally run a periodic LMDB compact copy if the physical file must shrink (LMDB reuses freed pages but does not shrink the file automatically).
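
The batching and dirty-set ideas above can be sketched with an in-memory stand-in for the HLC index (this is not the LMDB implementation; `Entry` and its fields are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for an indexed oplog entry.
record Entry(string NodeId, ulong Wall, string Hash);

static class PruneSketch
{
    // Deletes all entries at or below cutoffWall in bounded batches and
    // returns the set of nodes whose head may need recomputation.
    public static HashSet<string> Prune(SortedDictionary<ulong, List<Entry>> byHlc,
                                        ulong cutoffWall, int batchSize)
    {
        var dirtyNodes = new HashSet<string>();
        while (true)
        {
            // One bounded "write txn": at most batchSize victims <= cutoff.
            var batch = byHlc.Where(kv => kv.Key <= cutoffWall)
                             .SelectMany(kv => kv.Value.Select(e => (kv.Key, e)))
                             .Take(batchSize)
                             .ToList();
            if (batch.Count == 0) break;

            foreach (var (wall, e) in batch)
            {
                var bucket = byHlc[wall];
                bucket.Remove(e);
                if (bucket.Count == 0) byHlc.Remove(wall);
                dirtyNodes.Add(e.NodeId); // head recompute limited to these nodes
            }
        }
        return dirtyNodes;
    }
}
```

Bounding each batch keeps individual LMDB write transactions (and their page churn) small, which is what keeps prune latency predictable on large oplogs.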
## 3.5 Atomicity with Document Metadata

Decision required (explicit in implementation review):

Option A (phase 1, recommended):

- Accept cross-engine eventual atomicity.
- Keep the current document write flow.
- Add reconciliation/repair on startup:
  - detect metadata entries missing an oplog hash for recent writes
  - rebuild node-head and index consistency from `oplog_by_hash`.

Option B (hard mode):

- Introduce a durable outbox pattern to guarantee atomic handoff across engines.
- Higher complexity; schedule after functional cutover.

The plan uses Option A first for lower migration risk.
## 4. Phased Execution Plan
## Phase 0: Prep and Design Freeze

- Add an ADR documenting:
  - key encoding format
  - index schema
  - prune algorithm
  - consistency model (Option A above)
- Add the config model and feature flags:
  - `UseLmdbOplog`
  - `DualWriteOplog`
  - `PreferLmdbReads`

Exit criteria:

- ADR approved.
- Configuration contract approved.
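
The flag surface could take a shape like the following; the flag names come from this plan, while the class name and defaults are assumptions chosen so that the all-false state is the safe pre-migration configuration:

```csharp
// Hypothetical options class binding the three migration flags.
public sealed class OplogMigrationOptions
{
    public bool UseLmdbOplog { get; set; } = false;    // register the LMDB provider at all
    public bool DualWriteOplog { get; set; } = false;  // mirror writes to both stores (Phase 2+)
    public bool PreferLmdbReads { get; set; } = false; // serve reads from LMDB (Phase 3)
}
```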
## Phase 1: LMDB Store Skeleton

- Add a package reference to `LightningDB`.
- Implement `LmdbOplogStore` with:
  - `AppendOplogEntryAsync`
  - `GetEntryByHashAsync`
  - `GetLastEntryHashAsync`
  - `GetOplogAfterAsync`
  - `GetOplogForNodeAfterAsync`
  - `GetChainRangeAsync`
  - `PruneOplogAsync`
  - snapshot import/export/drop/merge APIs.
- Implement the startup/open/close lifecycle and map-size handling.

Exit criteria:

- Local contract tests pass for the LMDB store.
## Phase 2: Dual-Write + Read Shadow Validation

- Keep the Surreal oplog as the source of truth.
- Write every oplog mutation to both stores (`DualWriteOplog=true`).
- Enable read-path comparison mode in non-prod:
  - query both stores
  - assert the same hashes/order for key APIs
  - log mismatches.

Exit criteria:

- Zero mismatches in soak tests.
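
One way to realize dual-write with read comparison is a decorator over the store interface. The sketch below uses a reduced two-method subset (the real `IOplogStore` is larger, and these signatures are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed subset of the real IOplogStore, for illustration only.
public interface IOplogStoreSubset
{
    Task AppendOplogEntryAsync(string datasetId, string hash, byte[] payload);
    Task<IReadOnlyList<string>> GetOplogAfterAsync(string datasetId, ulong cutoff);
}

public sealed class DualWriteOplogStore : IOplogStoreSubset
{
    private readonly IOplogStoreSubset _surreal; // source of truth in Phase 2
    private readonly IOplogStoreSubset _lmdb;    // shadow store under validation
    private readonly Action<string> _logMismatch;

    public DualWriteOplogStore(IOplogStoreSubset surreal, IOplogStoreSubset lmdb,
                               Action<string> logMismatch)
        => (_surreal, _lmdb, _logMismatch) = (surreal, lmdb, logMismatch);

    public async Task AppendOplogEntryAsync(string datasetId, string hash, byte[] payload)
    {
        await _surreal.AppendOplogEntryAsync(datasetId, hash, payload);
        await _lmdb.AppendOplogEntryAsync(datasetId, hash, payload); // shadow write
    }

    public async Task<IReadOnlyList<string>> GetOplogAfterAsync(string datasetId, ulong cutoff)
    {
        var primary = await _surreal.GetOplogAfterAsync(datasetId, cutoff);
        var shadow = await _lmdb.GetOplogAfterAsync(datasetId, cutoff);
        if (!primary.SequenceEqual(shadow))
            _logMismatch($"oplog mismatch for {datasetId} after {cutoff}");
        return primary; // Surreal stays authoritative until cutover
    }
}
```

Because only the primary result is returned, a shadow-store bug surfaces as a mismatch counter rather than a behavior change, which is what makes this phase low risk.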
## Phase 3: Cutover

- Set `PreferLmdbReads=true` in staging first.
- Keep dual-write enabled for one release window.
- Monitor:
  - prune duration
  - oplog query latency
  - mismatch counters
  - restart recovery behavior.

Exit criteria:

- Stable staging and production canary.
## Phase 4: Cleanup

- Disable Surreal oplog writes.
- Keep the migration utility for the rollback window.
- Remove dual-compare instrumentation after a confidence period.
## 5. Data Migration / Backfill

Backfill tool steps:

1. Read the dataset-scoped Surreal oplog export.
2. Bulk-import into LMDB in HLC order.
3. Rebuild the node-head table.
4. Validate:
   - counts per dataset
   - counts per node
   - latest hash per node
   - random hash spot checks
   - chain-range spot checks.

Rollback:

- Keep the Surreal oplog untouched during the dual-write window.
- Flip feature flags back to Surreal reads.
## 6. Unit Test Update Instructions

## 6.1 Reuse Existing Oplog Contract Tests

Use these as baseline parity requirements:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealStoreContractTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/BLiteStoreExportImportTests.cs` (class `SurrealStoreExportImportTests`)

Actions:

1. Extract the oplog contract cases into a shared, provider-agnostic test base.
2. Run the same suite against:
   - the Surreal store
   - the new LMDB store

Minimum parity cases:

- append/query/merge/drop
- dataset isolation
- legacy/default dataset behavior (if supported)
- `GetChainRangeAsync` correctness
- `GetLastEntryHashAsync` persistence across restart
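
The extraction in step 1 can follow the standard abstract-test-class pattern. A minimal sketch (xUnit, with an assumed reduced store interface; the real contract base would cover all the parity cases listed above):

```csharp
using System.Threading.Tasks;
using Xunit;

// Assumed minimal surface for the sketch; the real IOplogStore is larger.
public interface IOplogStoreLite
{
    Task<bool> AppendOplogEntryAsync(string datasetId, string hash);
}

public abstract class OplogStoreContractTestsBase
{
    // Each provider supplies its own store instance.
    protected abstract IOplogStoreLite CreateStore();

    [Fact]
    public async Task Duplicate_hash_append_is_idempotent()
    {
        var store = CreateStore();
        Assert.True(await store.AppendOplogEntryAsync("ds1", "h1"));
        Assert.False(await store.AppendOplogEntryAsync("ds1", "h1"));
    }

    [Fact]
    public async Task Datasets_are_isolated()
    {
        var store = CreateStore();
        await store.AppendOplogEntryAsync("ds1", "h1");
        // Same hash in another dataset must not be treated as a duplicate.
        Assert.True(await store.AppendOplogEntryAsync("ds2", "h1"));
    }
}

// Each provider then derives once, e.g.:
// public sealed class LmdbOplogStoreContractTests : OplogStoreContractTestsBase
// {
//     protected override IOplogStoreLite CreateStore() => /* new LmdbOplogStore(...) */ null!;
// }
```

This keeps every parity case written exactly once, so any Surreal/LMDB divergence fails one suite or the other instead of going unnoticed.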
## 6.2 Add LMDB-Specific Unit Tests

Create a new file:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/LmdbOplogStoreContractTests.cs`

Add tests for:

1. Index consistency:
   - inserting one entry populates all indexes
   - deleting/pruning removes all index records
2. Prune correctness:
   - removes entries `<= cutoff`
   - does not remove entries `> cutoff`
   - handles interleaved node timestamps
   - handles a late-arriving old-timestamp entry safely
3. Node-head maintenance:
   - the head advances on a newer entry
   - prune invalidates/recomputes it correctly
4. Restart durability:
   - reopen the LMDB env and verify last-hash + scans
5. Dedupe:
   - a duplicate hash append is idempotent
## 6.3 Update Integration/E2E Coverage

Files to touch:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.E2E.Tests/ClusterCrudSyncE2ETests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Sample.Console.Tests/SurrealCdcDurabilityTests.cs` (or an LMDB-focused equivalent)

Add/adjust scenarios:

1. Gap recovery still works with the LMDB oplog backend.
2. Peer-confirmed prune still blocks/allows correctly.
3. A crash between document commit and oplog write (Option A behavior) is detected and repaired by startup reconciliation.
4. Prune performance smoke test (large synthetic oplog, bounded runtime threshold with a generous margin).
## 6.4 Keep Existing Network Unit Tests Intact

Most network tests mock `IOplogStore` and should remain unchanged:

- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorMaintenancePruningTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SyncOrchestratorConfirmationTests.cs`
- `/Users/dohertj2/Desktop/CBDDC/tests/ZB.MOM.WW.CBDDC.Network.Tests/SnapshotReconnectRegressionTests.cs`

Only update them if method behavior or ordering contracts are intentionally changed.
## 7. Performance and Observability Plan

Track and compare (Surreal vs LMDB):

- `AppendOplogEntryAsync` latency p50/p95/p99
- `GetOplogForNodeAfterAsync` latency
- prune duration and entries/sec deleted
- LMDB env file size and reclaimed free-page ratio
- mismatch counters in dual-read compare mode

Add logs/metrics for:

- prune batches processed
- dirty nodes recomputed
- startup repair actions and counts
## 8. Risks and Mitigations

1. Cross-engine consistency gaps (document metadata vs oplog)
   - Mitigation: startup reconciliation + a dual-write shadow period.
2. Incorrect composite key encoding
   - Mitigation: an explicit encoding helper + property tests for sort-order invariants.
3. Prune causing stale node-head values
   - Mitigation: touched-node tracking and lazy/eager recompute tests.
4. LMDB map-size exhaustion
   - Mitigation: configurable map size, monitoring, and an operational runbook for resizing.
## 9. Review Checklist

- [x] ADR approved for the LMDB key/index schema.
- [x] Feature flags merged (`UseLmdbOplog`, `DualWriteOplog`, `PreferLmdbReads`).
- [x] LMDB contract tests passing.
- [x] Dual-write mismatch telemetry in place.
- [x] Backfill tool implemented and validated in automated tests (staging execution ready).
- [x] Prune correctness + efficiency tests passing.
- [x] Rollback path documented and tested.