Add LMDB oplog migration path with dual-write cutover support

Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
2026-02-22 17:44:57 -05:00
parent 3b9ff69adc
commit cce24fa8f3
16 changed files with 3601 additions and 6 deletions
@@ -0,0 +1,55 @@
+# ADR 0002: LMDB Oplog Migration
+
+## Status
+
+Accepted
+
+## Context
+
+The existing oplog persistence layer is Surreal-backed and tightly coupled to local CDC transaction boundaries. We need an LMDB-backed oplog path to improve prune efficiency and reduce query latency while preserving existing `IOplogStore` semantics and keeping rollback low-risk.
+
+Key constraints:
+- Deduplicate entries by hash.
+- Support hash lookup, node/time scans, and chain reconstruction.
+- Prune by cutoff rather than strict FIFO, since late arrivals interleave with newer entries.
+- Isolate datasets in every index.
+- Handle cross-engine atomicity explicitly while document metadata remains Surreal-backed.
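To make the dedupe and chain-reconstruction constraints concrete, here is a minimal in-memory sketch. It is not the `LmdbOplogStore` implementation: `OplogEntrySketch` and `InMemoryOplogSketch` are hypothetical names, plain dictionaries stand in for the `oplog_by_hash` and `oplog_prev_to_hash` DBIs, and keys use the `{datasetId}|{...}` notation literally as strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified entry with only the fields needed for dedupe and chaining.
public sealed record OplogEntrySketch(string DatasetId, string Hash, string? PreviousHash, string NodeId);

public sealed class InMemoryOplogSketch
{
    // Stand-in for oplog_by_hash: "{datasetId}|{hash}" -> entry.
    private readonly Dictionary<string, OplogEntrySketch> _byHash = new();

    // Stand-in for oplog_prev_to_hash (DUPSORT): "{datasetId}|{previousHash}" -> successor hashes.
    private readonly Dictionary<string, SortedSet<string>> _prevToHash = new();

    private static string Key(string datasetId, string part) => $"{datasetId}|{part}";

    // Returns false (and changes nothing) if the dataset already holds this hash.
    public bool TryAppend(OplogEntrySketch entry)
    {
        if (!_byHash.TryAdd(Key(entry.DatasetId, entry.Hash), entry))
            return false;

        if (entry.PreviousHash is not null)
        {
            string prevKey = Key(entry.DatasetId, entry.PreviousHash);
            if (!_prevToHash.TryGetValue(prevKey, out SortedSet<string>? successors))
            {
                successors = new SortedSet<string>(StringComparer.Ordinal);
                _prevToHash[prevKey] = successors;
            }
            successors.Add(entry.Hash);
        }
        return true;
    }

    // Walks PreviousHash links back from headHash until the chain ends, an entry is
    // missing (e.g. pruned), or stopHash is reached (exclusive); returns oldest-first.
    public List<OplogEntrySketch> ChainBack(string datasetId, string headHash, string? stopHash = null)
    {
        var chain = new List<OplogEntrySketch>();
        string? current = headHash;
        while (current is not null && current != stopHash
               && _byHash.TryGetValue(Key(datasetId, current), out OplogEntrySketch? entry))
        {
            chain.Add(entry);
            current = entry.PreviousHash;
        }
        chain.Reverse();
        return chain;
    }

    // One forward step through the prev -> hash index.
    public IReadOnlyCollection<string> Successors(string datasetId, string hash) =>
        _prevToHash.TryGetValue(Key(datasetId, hash), out SortedSet<string>? next)
            ? (IReadOnlyCollection<string>)next
            : Array.Empty<string>();
}
```

Keeping successors in a sorted set mirrors the `DUPSORT` layout of `oplog_prev_to_hash`, where one previous hash may map to several successor hashes.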
+
+## Decision
+
+Adopt an LMDB oplog provider (`LmdbOplogStore`) with feature-flag-controlled migration:
+- `UseLmdbOplog`
+- `DualWriteOplog`
+- `PreferLmdbReads`
+
+### LMDB index schema
+
+A single LMDB environment with named databases (DBIs):
+- `oplog_by_hash`: `{datasetId}|{hash}` -> serialized `OplogEntry`
+- `oplog_by_hlc`: `{datasetId}|{wall}|{logic}|{nodeId}|{hash}` -> marker
+- `oplog_by_node_hlc`: `{datasetId}|{nodeId}|{wall}|{logic}|{hash}` -> marker
+- `oplog_prev_to_hash` (`DUPSORT`): `{datasetId}|{previousHash}` -> `{hash}`
+- `oplog_node_head`: `{datasetId}|{nodeId}` -> `{wall,logic,hash}`
+- `oplog_meta`: schema/version markers and per-dataset prune watermarks
+
+Composite keys use deterministic byte encodings with dataset prefixes on every index.
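One way to get deterministic, order-preserving composite keys is sketched below. Assumptions: strings are length-prefixed UTF-8 instead of being joined with a literal `|` (so dataset `ds1` can never share a key prefix with `ds10`), HLC wall/logic values are written big-endian with the sign bit flipped, and keys compare byte-wise, matching LMDB's default key ordering. `OplogKeySketch` and its methods are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

public static class OplogKeySketch
{
    // Length-prefixed UTF-8: "ds1" and "ds10" can never share a key prefix.
    private static void WriteString(List<byte> buf, string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        var len = new byte[2];
        BinaryPrimitives.WriteUInt16BigEndian(len, checked((ushort)bytes.Length));
        buf.AddRange(len);
        buf.AddRange(bytes);
    }

    // Big-endian with the sign bit flipped, so byte order equals numeric order.
    private static void WriteInt64(List<byte> buf, long value)
    {
        var b = new byte[8];
        BinaryPrimitives.WriteUInt64BigEndian(b, unchecked((ulong)value ^ 0x8000_0000_0000_0000UL));
        buf.AddRange(b);
    }

    private static void WriteInt32(List<byte> buf, int value)
    {
        var b = new byte[4];
        BinaryPrimitives.WriteUInt32BigEndian(b, unchecked((uint)value ^ 0x8000_0000U));
        buf.AddRange(b);
    }

    // oplog_by_hlc key: {datasetId}|{wall}|{logic}|{nodeId}|{hash}
    public static byte[] ByHlc(string datasetId, long wall, int logic, string nodeId, string hash)
    {
        var buf = new List<byte>();
        WriteString(buf, datasetId);
        WriteInt64(buf, wall);
        WriteInt32(buf, logic);
        WriteString(buf, nodeId);
        WriteString(buf, hash);
        return buf.ToArray();
    }

    // Prefix shared by every key of one dataset; a cursor seek target for range scans.
    public static byte[] DatasetPrefix(string datasetId)
    {
        var buf = new List<byte>();
        WriteString(buf, datasetId);
        return buf.ToArray();
    }

    public static bool HasPrefix(byte[] key, byte[] prefix)
    {
        if (key.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (key[i] != prefix[i]) return false;
        return true;
    }

    // Lexicographic byte comparison (shorter key first on a tie), like LMDB's default.
    public static int Compare(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        return a.Length.CompareTo(b.Length);
    }
}
```

Because the wall clock precedes the logical counter, node ID, and hash, a forward cursor over `oplog_by_hlc` visits a dataset's entries in HLC order, which is what a cutoff-based prune scan needs.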
+
+### Prune algorithm
+
+Prune scans `oplog_by_hlc` in HLC order up to the cutoff, removes each candidate from all indexes, and then recomputes the node-head entry of every affected node. Deletes run in bounded batches (`PruneBatchSize`) inside write transactions.
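The prune loop can be sketched over in-memory stand-ins for the indexes. This is an illustration under assumptions, not the store's code: it models a single dataset, `HlcKey` and `PruneSketch` are hypothetical, `oplog_by_node_hlc` is omitted (so head recomputation uses a linear scan), and a node whose entries are all pruned simply loses its head entry, which the real store may handle differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical HLC-ordered key inside one dataset: (wall, logic, nodeId, hash).
public readonly record struct HlcKey(long Wall, int Logic, string NodeId, string Hash) : IComparable<HlcKey>
{
    public int CompareTo(HlcKey other)
    {
        int c = Wall.CompareTo(other.Wall);
        if (c != 0) return c;
        c = Logic.CompareTo(other.Logic);
        if (c != 0) return c;
        c = string.CompareOrdinal(NodeId, other.NodeId);
        return c != 0 ? c : string.CompareOrdinal(Hash, other.Hash);
    }
}

public sealed class PruneSketch
{
    public SortedSet<HlcKey> ByHlc { get; } = new();             // ~ oplog_by_hlc
    public Dictionary<string, HlcKey> ByHash { get; } = new();   // ~ oplog_by_hash (payload omitted)
    public Dictionary<string, HlcKey> NodeHead { get; } = new(); // ~ oplog_node_head

    public void Append(HlcKey key)
    {
        if (!ByHash.TryAdd(key.Hash, key)) return; // dedupe by hash
        ByHlc.Add(key);
        if (!NodeHead.TryGetValue(key.NodeId, out HlcKey head) || key.CompareTo(head) > 0)
            NodeHead[key.NodeId] = key;
    }

    // Removes every entry at or below (cutoffWall, cutoffLogic), at most batchSize per batch;
    // each loop iteration stands in for one LMDB write transaction.
    public int Prune(long cutoffWall, int cutoffLogic, int batchSize)
    {
        int removed = 0;
        while (true)
        {
            List<HlcKey> batch = ByHlc
                .TakeWhile(k => k.Wall < cutoffWall || (k.Wall == cutoffWall && k.Logic <= cutoffLogic))
                .Take(batchSize)
                .ToList();
            if (batch.Count == 0) return removed;

            var touchedNodes = new HashSet<string>(StringComparer.Ordinal);
            foreach (HlcKey k in batch)
            {
                ByHlc.Remove(k);
                ByHash.Remove(k.Hash);
                touchedNodes.Add(k.NodeId);
            }

            // Recompute heads only for nodes that lost entries in this batch.
            foreach (string node in touchedNodes)
            {
                HlcKey? newest = null;
                foreach (HlcKey k in ByHlc) // ascending HLC order, so the last match is newest
                    if (k.NodeId == node) newest = k;

                if (newest.HasValue) NodeHead[node] = newest.Value;
                else NodeHead.Remove(node); // assumption: no entries left means no head
            }
            removed += batch.Count;
        }
    }
}
```

Because candidates come from the HLC-ordered index rather than insertion order, an entry that arrives late with an older HLC is pruned in the same pass as its HLC neighbors.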
+
+### Consistency model
+
+The Phase 1 consistency model is Option A (eventual cross-engine atomicity):
+- Surreal local CDC writes remain authoritative for atomic document+metadata+checkpoint transactions.
+- LMDB is backfilled/reconciled from Surreal when LMDB reads are preferred and gaps are detected.
+- Dual-write is available for sync-path writes to build cutover confidence earlier.
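Gap detection and reconciliation under Option A might look like the following sketch, which treats Surreal as authoritative and copies into LMDB any entry whose hash LMDB lacks. `OplogRow` and `ReconcileSketch` are hypothetical. LMDB is modeled as a hash-keyed dictionary, and a real implementation would presumably bound the comparison (for example by HLC range or node head) rather than scanning a whole dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal oplog row; (Wall, Logic) stands in for the HLC timestamp.
public sealed record OplogRow(string Hash, string NodeId, long Wall, int Logic);

public static class ReconcileSketch
{
    // A gap is any authoritative (Surreal) entry whose hash LMDB has not stored yet,
    // returned in deterministic HLC order.
    public static List<OplogRow> FindGap(IEnumerable<OplogRow> surreal, ISet<string> lmdbHashes) =>
        surreal.Where(row => !lmdbHashes.Contains(row.Hash))
               .OrderBy(row => row.Wall)
               .ThenBy(row => row.Logic)
               .ThenBy(row => row.NodeId, StringComparer.Ordinal)
               .ToList();

    // Copies the gap into LMDB (modeled as a hash-keyed dictionary); returns entries reconciled.
    public static int Reconcile(IEnumerable<OplogRow> surreal, Dictionary<string, OplogRow> lmdbByHash)
    {
        List<OplogRow> gap = FindGap(surreal, new HashSet<string>(lmdbByHash.Keys, StringComparer.Ordinal));
        foreach (OplogRow row in gap)
            lmdbByHash[row.Hash] = row;
        return gap.Count;
    }
}
```

Keying the copy by hash makes reconciliation idempotent: a second run over the same data copies nothing.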
+
+## Consequences
+
+- Enables staged rollout (dual-write and read shadow validation before cutover).
+- Improves prune/query performance characteristics via ordered LMDB indexes.
+- Keeps rollback low-risk by retaining Surreal source-of-truth during migration windows.
+- Requires reconciliation logic and operational monitoring of mismatch counters/logs during migration.
+- Includes a dedicated backfill utility (`LmdbOplogBackfillTool`) with parity report output.
+- Exposes migration telemetry counters (`OplogMigrationTelemetry`) for mismatch/reconciliation tracking.
@@ -237,6 +237,83 @@ Surreal persistence now stores `datasetId` on oplog, metadata, snapshot metadata
 4. **Delete durability**: deletes persist as oplog delete operations plus tombstone metadata.
 5. **Remote apply behavior**: remote sync applies documents without generating local loopback CDC entries.

+## LMDB Oplog Migration Mode
+
+CBDDC now supports an LMDB-backed oplog provider for staged cutover from Surreal oplog tables.
+
+### Registration
+
+```csharp
+services.AddCBDDCCore()
+    .AddCBDDCSurrealEmbedded<SampleDocumentStore>(optionsFactory)
+    .AddCBDDCLmdbOplog(
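+        _ => new LmdbOplogOptions
+        {
+            EnvironmentPath = "/var/lib/cbddc/oplog-lmdb",
+            MapSizeBytes = 256L * 1024 * 1024, // 256 MiB LMDB map size
+            MaxDatabases = 16,                 // headroom for the named oplog DBIs
+            PruneBatchSize = 512               // max entries deleted per prune batch
+        },
+        flags =>
+        {
+            // Dual-write phase: mirror writes to LMDB, keep reads on Surreal.
+            flags.UseLmdbOplog = true;
+            flags.DualWriteOplog = true;
+            flags.PreferLmdbReads = false;
+        });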
+```
+
+### Feature Flags
+
+- `UseLmdbOplog`: enables the LMDB migration path.
+- `DualWriteOplog`: mirrors oplog writes to both Surreal and LMDB.
+- `PreferLmdbReads`: cuts oplog reads over to LMDB.
+- `EnableReadShadowValidation`: compares Surreal and LMDB read results and logs mismatches.
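The flag semantics can be summarized as a small routing function. This sketch is hypothetical (`MigrationFlags`, `OplogBackend`, and `OplogRouting` are not CBDDC types) and assumes, consistent with the Option A model, that Surreal receives every write in all flag combinations; the actual `FeatureFlagOplogStore` routing may differ after cutover.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the documented flags (not the real CBDDC options type).
public sealed record MigrationFlags(
    bool UseLmdbOplog, bool DualWriteOplog, bool PreferLmdbReads, bool EnableReadShadowValidation);

public enum OplogBackend { Surreal, Lmdb }

public static class OplogRouting
{
    // Assumption: Surreal receives every write (Option A); LMDB gets a mirrored write
    // only when the migration path and dual-write are both enabled.
    public static IReadOnlyList<OplogBackend> WriteTargets(MigrationFlags f) =>
        f.UseLmdbOplog && f.DualWriteOplog
            ? new[] { OplogBackend.Surreal, OplogBackend.Lmdb }
            : new[] { OplogBackend.Surreal };

    // Reads move to LMDB only when the migration path is on and reads are cut over.
    public static OplogBackend ReadSource(MigrationFlags f) =>
        f.UseLmdbOplog && f.PreferLmdbReads ? OplogBackend.Lmdb : OplogBackend.Surreal;

    // Shadow validation: when enabled, run both reads and report whether they differ.
    public static bool ShadowMismatch<T>(
        MigrationFlags f, Func<T> surrealRead, Func<T> lmdbRead, IEqualityComparer<T>? comparer = null)
    {
        if (!f.UseLmdbOplog || !f.EnableReadShadowValidation) return false;
        return !(comparer ?? EqualityComparer<T>.Default).Equals(surrealRead(), lmdbRead());
    }
}
```

Under these assumptions, the rollback settings (`DualWriteOplog = false`, `PreferLmdbReads = false`) resolve to Surreal-only reads and writes even while `UseLmdbOplog` stays `true`.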
+
+### Consistency Model
+
+The initial migration model is eventual cross-engine atomicity (Option A):
+
+- Surreal local CDC transactions remain authoritative for atomic document + metadata persistence.
+- LMDB is backfilled/reconciled when LMDB reads are preferred and LMDB is missing recent Surreal writes.
+- During rollout, keep dual-write enabled until mismatch logs remain stable.
+
+### Backfill Utility
+
+`LmdbOplogBackfillTool` performs Surreal-to-LMDB oplog backfill and parity validation per dataset:
+
+```csharp
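+using Microsoft.Extensions.DependencyInjection; // GetRequiredService<T>
+
+var backfill = provider.GetRequiredService<LmdbOplogBackfillTool>(); // provider: built IServiceProvider
+// Backfills the dataset from Surreal, validates parity, and (per its name) throws on failure.
+LmdbOplogBackfillReport report = await backfill.BackfillOrThrowAsync(DatasetId.Primary);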
+```
+
+Validation includes:
+- total entry counts
+- per-node entry counts
+- latest hash per node
+- hash spot checks
+- chain-range spot checks
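A parity comparison covering the first four checks could be sketched as follows. `ParityRow` and `ParitySketch` are hypothetical, both sides are in-memory collections for a single dataset, "latest" means the highest `(Wall, Logic)` pair, and chain-range spot checks are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal row for one dataset; (Wall, Logic) stands in for the HLC.
public sealed record ParityRow(string Hash, string NodeId, long Wall, int Logic);

public static class ParitySketch
{
    // Returns mismatch descriptions; empty means the modeled checks passed.
    // Chain-range spot checks are not modeled here.
    public static List<string> Compare(
        IReadOnlyCollection<ParityRow> surreal, IReadOnlyCollection<ParityRow> lmdb, IEnumerable<string> spotHashes)
    {
        var problems = new List<string>();

        // 1. Total entry counts.
        if (surreal.Count != lmdb.Count)
            problems.Add($"total count: surreal={surreal.Count} lmdb={lmdb.Count}");

        // 2-3. Per-node entry counts and latest hash per node.
        var sNodes = Summarize(surreal);
        var lNodes = Summarize(lmdb);
        foreach (string node in sNodes.Keys.Union(lNodes.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            sNodes.TryGetValue(node, out var s); // missing node -> (0, null)
            lNodes.TryGetValue(node, out var l);
            if (s.Count != l.Count)
                problems.Add($"node {node} count: surreal={s.Count} lmdb={l.Count}");
            if (s.LatestHash != l.LatestHash)
                problems.Add($"node {node} latest hash: surreal={s.LatestHash} lmdb={l.LatestHash}");
        }

        // 4. Hash spot checks: the same hash must resolve to an equal row on both sides.
        // ToDictionary assumes hashes are unique per dataset (dedupe by hash).
        var sByHash = surreal.ToDictionary(r => r.Hash);
        var lByHash = lmdb.ToDictionary(r => r.Hash);
        foreach (string hash in spotHashes)
        {
            sByHash.TryGetValue(hash, out ParityRow? sRow);
            lByHash.TryGetValue(hash, out ParityRow? lRow);
            if (sRow != lRow)
                problems.Add($"spot check {hash}: rows differ");
        }
        return problems;
    }

    // Per-node count plus the hash of the newest entry by (Wall, Logic).
    private static Dictionary<string, (int Count, string? LatestHash)> Summarize(IEnumerable<ParityRow> rows) =>
        rows.GroupBy(r => r.NodeId).ToDictionary(
            g => g.Key,
            g => (Count: g.Count(), LatestHash: (string?)g.OrderBy(r => r.Wall).ThenBy(r => r.Logic).Last().Hash));
}
```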
+
+### Migration Telemetry
+
+`FeatureFlagOplogStore` records migration counters through `OplogMigrationTelemetry`:
+
+- shadow comparisons
+- shadow mismatches
+- LMDB preferred-read fallbacks to Surreal
+- reconciliation runs and reconciled entry counts (global + per dataset)
+
+You can resolve `OplogMigrationTelemetry` from DI or call `GetTelemetrySnapshot()` on `FeatureFlagOplogStore`.
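Counters like these are typically kept lock-free. The sketch below is a hypothetical stand-in for `OplogMigrationTelemetry` (the type names, method names, and snapshot shape are assumptions), using `Interlocked` for global counters and a `ConcurrentDictionary` for per-dataset reconciled-entry counts.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical snapshot shape; the real OplogMigrationTelemetry API may differ.
public sealed record TelemetrySnapshotSketch(
    long ShadowComparisons, long ShadowMismatches, long PreferredReadFallbacks,
    long ReconciliationRuns, long ReconciledEntries,
    IReadOnlyDictionary<string, long> ReconciledEntriesByDataset);

public sealed class MigrationTelemetrySketch
{
    private long _shadowComparisons, _shadowMismatches, _fallbacks, _reconcileRuns, _reconciledEntries;
    private readonly ConcurrentDictionary<string, long> _reconciledByDataset = new();

    public void RecordShadowComparison(bool mismatch)
    {
        Interlocked.Increment(ref _shadowComparisons);
        if (mismatch) Interlocked.Increment(ref _shadowMismatches);
    }

    public void RecordPreferredReadFallback() => Interlocked.Increment(ref _fallbacks);

    public void RecordReconciliation(string datasetId, long entries)
    {
        Interlocked.Increment(ref _reconcileRuns);
        Interlocked.Add(ref _reconciledEntries, entries);
        _reconciledByDataset.AddOrUpdate(datasetId, entries, (_, current) => current + entries);
    }

    // Reads each counter atomically; the snapshot is not one consistent cut across counters.
    public TelemetrySnapshotSketch GetSnapshot() => new(
        Interlocked.Read(ref _shadowComparisons),
        Interlocked.Read(ref _shadowMismatches),
        Interlocked.Read(ref _fallbacks),
        Interlocked.Read(ref _reconcileRuns),
        Interlocked.Read(ref _reconciledEntries),
        new Dictionary<string, long>(_reconciledByDataset));
}
```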
+
+### Rollback Path
+
+To roll back read/write behavior to Surreal during migration:
+
+- set `PreferLmdbReads = false`
+- set `DualWriteOplog = false`
+
+With `UseLmdbOplog` left `true`, these settings keep LMDB services available while routing all reads and writes to Surreal only.
+To disable LMDB entirely, also set `UseLmdbOplog = false`.
+
 ## Feature Comparison

 | Feature | SQLite (Direct) | EF Core | PostgreSQL | Surreal Embedded |