Joseph Doherty cce24fa8f3

Add LMDB oplog migration path with dual-write cutover support

Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.

2026-02-22 17:44:57 -05:00

14 KiB

Executable File

CBDDC Persistence Providers

CBDDC supports multiple persistence backends to suit different deployment scenarios.

Overview

Provider	Best For	Performance	Ease of Setup	Production Ready
SQLite (Direct)	Embedded apps, single-node	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	✅ Yes
EF Core (Generic)	Multi-DB support, migrations	⭐⭐⭐	⭐⭐⭐	✅ Yes
PostgreSQL	Production, high load, JSON queries	⭐⭐⭐⭐⭐	⭐⭐⭐	✅ Yes
Surreal Embedded (RocksDB)	Embedded multi-peer sync with local CDC	⭐⭐⭐⭐	⭐⭐⭐⭐	✅ Yes

SQLite (Direct)

Package: ZB.MOM.WW.CBDDC.Persistence.Sqlite

Characteristics

✅ Zero configuration: Works out of the box
✅ Excellent performance: Native SQL, no ORM overhead
✅ WAL mode: Concurrent readers alongside a single writer
✅ Per-collection tables: Optional for better isolation
✅ Snapshots: Fast reconnection with SnapshotMetadata
✅ Portable: Single file database
❌ Limited JSON queries: Uses json_extract()

When to Use

Building single-node applications
Embedded scenarios (desktop, mobile)
Development/testing
Maximum simplicity is required
File-based portability is important

Configuration

// Legacy mode (simple)
services.AddCBDDCSqlite("Data Source=cbddc.db");

// New mode (per-collection tables)
services.AddCBDDCSqlite(options =>
{
    options.BasePath = "/var/lib/cbddc";
    options.DatabaseFilenameTemplate = "cbddc-{NodeId}.db";
    options.UsePerCollectionTables = true;
});
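The per-collection mode builds one database file per node from BasePath and DatabaseFilenameTemplate. The sketch below shows how such a path could be resolved. It assumes that "{NodeId}" is a plain placeholder replaced by the node's identifier. ResolveDatabasePath is a hypothetical helper, not part of the CBDDC API.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a CBDDC API): assumes "{NodeId}" in the filename
// template is substituted with the node id and the file lives under BasePath.
static string ResolveDatabasePath(string basePath, string filenameTemplate, string nodeId)
{
    if (string.IsNullOrWhiteSpace(nodeId))
        throw new ArgumentException("A node id is required.", nameof(nodeId));

    string fileName = filenameTemplate.Replace("{NodeId}", nodeId, StringComparison.Ordinal);
    return Path.Combine(basePath, fileName);
}

string path = ResolveDatabasePath("/var/lib/cbddc", "cbddc-{NodeId}.db", "node-a");
Console.WriteLine(path); // separator depends on the OS
```

One file per node keeps co-located test nodes from sharing a database by accident.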

Performance Tips

WAL mode is enabled automatically
Use per-collection tables for large datasets
Create indexes on frequently queried fields
Keep database on fast storage (SSD)
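For "create indexes on frequently queried fields", SQLite can index an expression such as json_extract(). It only uses that index when a query repeats the same expression. The sketch generates such a statement. The table and column names ("Documents", "ContentJson") and the BuildJsonIndexSql helper are assumptions for illustration, so check the schema your store actually creates.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds an expression index over one JSON field.
// Identifiers and paths are validated so the generated DDL stays injection-safe.
static string BuildJsonIndexSql(string table, string jsonColumn, string jsonPath)
{
    static bool IsIdentifier(string s) => Regex.IsMatch(s, @"^[A-Za-z_][A-Za-z0-9_]*$");

    if (!IsIdentifier(table) || !IsIdentifier(jsonColumn))
        throw new ArgumentException("Invalid table or column name.");
    // Only simple dotted paths such as "$.customer.id" are accepted.
    if (!Regex.IsMatch(jsonPath, @"^\$(\.[A-Za-z_][A-Za-z0-9_]*)+$"))
        throw new ArgumentException($"Unsupported JSON path: {jsonPath}", nameof(jsonPath));

    string suffix = jsonPath.Substring(2).Replace('.', '_');
    return $"CREATE INDEX IF NOT EXISTS IX_{table}_{suffix} " +
           $"ON \"{table}\" (json_extract(\"{jsonColumn}\", '{jsonPath}'));";
}

string ddl = BuildJsonIndexSql("Documents", "ContentJson", "$.customer.id");
Console.WriteLine(ddl);
```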

EF Core (Generic)

Package: ZB.MOM.WW.CBDDC.Persistence.EntityFramework

Characteristics

✅ Multi-database support: SQL Server, MySQL, SQLite, PostgreSQL
✅ EF Core benefits: Migrations, LINQ, change tracking
✅ Type-safe: Strongly-typed entities
⚠️ Query limitation: JSON queries evaluated in-memory
⚠️ ORM overhead: Slightly slower than direct SQL
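To make "JSON queries evaluated in-memory" concrete, the sketch below shows the client-side shape of such a query. Rows are materialized, then each JSON payload is parsed and filtered in the application. A hard-coded list stands in for the rows an EF Core query would return. This illustrates the cost model only and is not how the CBDDC EF Core provider is written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Stand-in for rows fetched from the database (key + raw JSON content).
var rows = new List<(string Key, string ContentJson)>
{
    ("doc-1", "{\"status\":\"open\",\"priority\":3}"),
    ("doc-2", "{\"status\":\"closed\",\"priority\":1}"),
    ("doc-3", "{\"status\":\"open\",\"priority\":5}"),
};

// Parsed per row on the client; in a real query every candidate row
// would be fetched first, so no database index can help here.
static bool HasStatus(string json, string status)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement.TryGetProperty("status", out JsonElement value)
        && value.ValueKind == JsonValueKind.String
        && value.GetString() == status;
}

List<string> openKeys = rows
    .Where(r => HasStatus(r.ContentJson, "open"))
    .Select(r => r.Key)
    .ToList();

Console.WriteLine(string.Join(", ", openKeys)); // doc-1, doc-3
```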

When to Use

Need to support multiple database backends
Team familiar with EF Core patterns
Want automated migrations
Building enterprise applications
Database portability is important

Configuration

SQLite

services.AddCBDDCEntityFrameworkSqlite("Data Source=cbddc.db");

SQL Server

services.AddCBDDCEntityFrameworkSqlServer(
    "Server=localhost;Database=CBDDC;Trusted_Connection=True;");

PostgreSQL

services.AddDbContext<CBDDCContext>(options =>
    options.UseNpgsql(connectionString));
services.AddCBDDCEntityFramework();

MySQL

var serverVersion = ServerVersion.AutoDetect(connectionString);
services.AddCBDDCEntityFrameworkMySql(connectionString, serverVersion);

Migrations

# Add migration
dotnet ef migrations add InitialCreate --context CBDDCContext

# Apply migration
dotnet ef database update --context CBDDCContext

PostgreSQL

Package: ZB.MOM.WW.CBDDC.Persistence.PostgreSQL

Characteristics

✅ JSONB native storage: Optimal JSON handling
✅ GIN indexes: Fast JSON path queries
✅ High performance: Production-grade
✅ Connection resilience: Built-in retry logic
✅ Full ACID: Strong consistency guarantees
⚠️ Future feature: JSONB query translation (roadmap)

When to Use

Production deployments with high traffic
Need advanced JSON querying (future)
Require horizontal scalability
Want best-in-class reliability
Cloud deployments (AWS RDS, Azure Database, etc.)

Configuration

services.AddCBDDCPostgreSql(
    "Host=localhost;Database=CBDDC;Username=user;Password=pass");

// With custom options
services.AddCBDDCPostgreSql(connectionString, options =>
{
    options.EnableSensitiveDataLogging(); // Dev only
    options.CommandTimeout(30);
});

JSONB Indexes

For optimal performance, create GIN indexes via migrations:

protected override void Up(MigrationBuilder migrationBuilder)
{
    migrationBuilder.Sql(@"
        CREATE INDEX IF NOT EXISTS IX_Documents_ContentJson_gin 
        ON ""Documents"" USING GIN (""ContentJson"" jsonb_path_ops);
        
        CREATE INDEX IF NOT EXISTS IX_Oplog_PayloadJson_gin 
        ON ""Oplog"" USING GIN (""PayloadJson"" jsonb_path_ops);
    ");
}

Connection String Examples

Local Development

Host=localhost;Port=5432;Database=CBDDC;Username=admin;Password=secret

Production with SSL

Host=prod-db.example.com;Database=CBDDC;Username=admin;Password=secret;SSL Mode=Require

Connection Pooling

Host=localhost;Database=CBDDC;Username=admin;Password=secret;Pooling=true;Minimum Pool Size=5;Maximum Pool Size=100
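Connection strings like these can be adjusted in code rather than by string concatenation. The sketch uses the provider-agnostic DbConnectionStringBuilder from System.Data.Common to raise the pool ceiling and require SSL. In real code, Npgsql's own NpgsqlConnectionStringBuilder offers typed properties. The values chosen here are illustrative.

```csharp
using System;
using System.Data.Common;

// Generic builder: keys are case-insensitive, parsed values are strings.
var builder = new DbConnectionStringBuilder
{
    ConnectionString = "Host=localhost;Database=CBDDC;Username=admin;Password=secret;" +
                       "Pooling=true;Minimum Pool Size=5;Maximum Pool Size=100"
};

// Raise the pool ceiling (e.g. when investigating pool exhaustion) and require TLS.
builder["Maximum Pool Size"] = 200;
builder["SSL Mode"] = "Require";

Console.WriteLine(builder["maximum pool size"]);

// Drop the password before logging the effective settings.
builder.Remove("Password");
Console.WriteLine(builder.ConnectionString);
```

Raising the pool size only helps if connections are returned promptly. A leak will exhaust any ceiling.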

Surreal Embedded (RocksDB)

Package: ZB.MOM.WW.CBDDC.Persistence

Characteristics

✅ Embedded + durable: Uses local RocksDB storage via Surreal embedded endpoint
✅ CDC-native workflow: Collection watches emit oplog entries and metadata updates
✅ Durable checkpointing: CDC cursor state is persisted per consumer id
✅ Restart recovery: Oplog + checkpoint data survive process restart and resume catch-up
✅ Loopback suppression: Remote apply path suppresses local CDC re-emission
✅ Idempotent merge window: Duplicate remote entries are merged by deterministic hash

When to Use

Embedded deployments that still need multi-peer replication
Edge nodes where local durability is required without an external DB server
CDC-heavy sync topologies that need restart-safe cursor tracking
Environments that benefit from document-style storage and local operation logs

Configuration

services.AddCBDDCCore()
    .AddCBDDCSurrealEmbedded<SampleDocumentStore>(_ => new CBDDCSurrealEmbeddedOptions
    {
        Endpoint = "rocksdb://local",
        DatabasePath = "/var/lib/cbddc/node-a.rocksdb",
        Namespace = "cbddc",
        Database = "node_a",
        Cdc = new CBDDCSurrealCdcOptions
        {
            Enabled = true,
            ConsumerId = "sync-main",
            CheckpointTable = "cbddc_cdc_checkpoint",
            EnableLiveSelectAccelerator = true,
            LiveSelectReconnectDelay = TimeSpan.FromSeconds(2)
        }
    });

Multi-Dataset Partitioning

Surreal persistence now stores datasetId on oplog, metadata, snapshot metadata, confirmation, and CDC checkpoint records.

Composite indexes include datasetId to prevent cross-dataset reads.
Legacy rows missing datasetId are interpreted as primary during reads.
Dataset-scoped store APIs (ExportAsync(datasetId), GetOplogAfterAsync(..., datasetId, ...)) enforce isolation.
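The read rule above can be sketched with an in-memory list: rows with no datasetId count as primary, and every read filters by the effective dataset. The tuple shape, the "primary" literal (standing in for DatasetId.Primary) and the ReadDatasetAfter helper are hypothetical. They mirror the described behavior, not CBDDC's record schema.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

const string PrimaryDataset = "primary"; // stand-in for DatasetId.Primary

var oplog = new List<(string? DatasetId, long Timestamp, string Hash)>
{
    (null,      100, "h1"), // legacy row written before datasetId existed
    ("primary", 200, "h2"),
    ("orders",  150, "h3"),
    ("orders",  250, "h4"),
};

// Legacy rows without a datasetId are interpreted as primary.
string EffectiveDataset(string? datasetId) => datasetId ?? PrimaryDataset;

// Dataset-scoped read: never returns rows from another dataset.
IEnumerable<(string? DatasetId, long Timestamp, string Hash)> ReadDatasetAfter(long after, string datasetId) =>
    oplog.Where(e => EffectiveDataset(e.DatasetId) == datasetId && e.Timestamp > after)
         .OrderBy(e => e.Timestamp);

var primaryHashes = ReadDatasetAfter(0, PrimaryDataset).Select(e => e.Hash).ToList();
var orderHashes = ReadDatasetAfter(0, "orders").Select(e => e.Hash).ToList();
Console.WriteLine(string.Join(",", primaryHashes)); // h1,h2
Console.WriteLine(string.Join(",", orderHashes));   // h3,h4
```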

CDC Durability Notes

Checkpoint semantics: each consumer id has an independent durable cursor (timestamp + hash).
Catch-up on restart: read checkpoint, then request oplog entries strictly after the stored timestamp.
Duplicate-window safety: replayed windows are deduplicated by oplog hash merge semantics.
Delete durability: deletes persist as oplog delete operations plus tombstone metadata.
Remote apply behavior: remote sync applies documents without generating local loopback CDC entries.
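The checkpoint, catch-up and duplicate-window rules fit together as in the sketch below. The consumer's cursor is read, entries strictly after its timestamp are fetched, entries are merged by hash, and the cursor is advanced. In-memory collections stand in for the CheckpointTable and the oplog. CatchUp and ApplyAndCheckpoint are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Durable cursor per consumer id: (Timestamp, Hash). Stand-in for the checkpoint table.
var checkpoints = new Dictionary<string, (long Timestamp, string Hash)>
{
    ["sync-main"] = (200, "h2"),
};

var oplog = new List<(long Timestamp, string Hash)> { (100, "h1"), (200, "h2"), (300, "h3"), (400, "h4") };

// Entries already merged on this node, keyed by oplog hash.
var applied = new Dictionary<string, (long Timestamp, string Hash)>();

// Restart catch-up: read the checkpoint, then take entries strictly after its timestamp.
List<(long Timestamp, string Hash)> CatchUp(string consumerId)
{
    long after = checkpoints.TryGetValue(consumerId, out var cursor) ? cursor.Timestamp : long.MinValue;
    return oplog.Where(e => e.Timestamp > after).OrderBy(e => e.Timestamp).ToList();
}

// Merge by hash (a replayed entry is a no-op), then persist the new cursor.
void ApplyAndCheckpoint(string consumerId, List<(long Timestamp, string Hash)> batch)
{
    foreach (var entry in batch)
        applied.TryAdd(entry.Hash, entry);
    if (batch.Count > 0)
    {
        var last = batch[batch.Count - 1];
        checkpoints[consumerId] = (last.Timestamp, last.Hash);
    }
}

var pending = CatchUp("sync-main");         // h3, h4
ApplyAndCheckpoint("sync-main", pending);
ApplyAndCheckpoint("sync-main", pending);   // simulated duplicate replay
Console.WriteLine(applied.Count);           // 2
Console.WriteLine(CatchUp("sync-main").Count); // 0
```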

LMDB Oplog Migration Mode

CBDDC now supports an LMDB-backed oplog provider for staged cutover from Surreal oplog tables.

Registration

services.AddCBDDCCore()
    .AddCBDDCSurrealEmbedded<SampleDocumentStore>(optionsFactory)
    .AddCBDDCLmdbOplog(
        _ => new LmdbOplogOptions
        {
            EnvironmentPath = "/var/lib/cbddc/oplog-lmdb",
            MapSizeBytes = 256L * 1024 * 1024, // 256 MiB
            MaxDatabases = 16,
            PruneBatchSize = 512
        },
        flags =>
        {
            flags.UseLmdbOplog = true;
            flags.DualWriteOplog = true;
            flags.PreferLmdbReads = false;
        });

Feature Flags

UseLmdbOplog: enables LMDB migration path.
DualWriteOplog: mirrors writes to Surreal + LMDB.
PreferLmdbReads: cuts reads over to LMDB.
EnableReadShadowValidation: compares Surreal/LMDB read results and logs mismatches.
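The flags combine as in the decision table below, which is inferred from the descriptions above, the consistency model and the rollback notes. It assumes Surreal always receives writes while it remains authoritative. The Route function is hypothetical, and the real FeatureFlagOplogStore may differ in detail (for example, reconciliation and fallback reads).

```csharp
using System;

// Hypothetical routing table for the migration flags (not the CBDDC implementation).
static (bool WriteSurreal, bool WriteLmdb, string ReadFrom, bool ShadowCompare) Route(
    bool useLmdbOplog, bool dualWriteOplog, bool preferLmdbReads, bool enableReadShadowValidation)
{
    if (!useLmdbOplog)
        return (true, false, "surreal", false); // migration path disabled entirely

    return (
        WriteSurreal: true,                         // Surreal stays authoritative
        WriteLmdb: dualWriteOplog,                  // mirror writes while migrating
        ReadFrom: preferLmdbReads ? "lmdb" : "surreal",
        ShadowCompare: enableReadShadowValidation); // compare engines, log mismatches
}

var staged = Route(useLmdbOplog: true, dualWriteOplog: true, preferLmdbReads: false, enableReadShadowValidation: true);
var rollback = Route(useLmdbOplog: true, dualWriteOplog: false, preferLmdbReads: false, enableReadShadowValidation: false);
Console.WriteLine(staged);
Console.WriteLine(rollback);
```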

Consistency Model

The initial migration model is eventual cross-engine atomicity (Option A):

Surreal local CDC transactions remain authoritative for atomic document + metadata persistence.
LMDB is backfilled/reconciled when LMDB reads are preferred and LMDB is missing recent Surreal writes.
During rollout, keep dual-write enabled until mismatch logs remain stable.
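Under this eventual model, reconciliation amounts to copying entries that Surreal has and LMDB lacks before serving LMDB reads. The sketch matches entries by hash. Two lists stand in for the stores, and Reconcile is a hypothetical name. A second pass finds nothing, which is what keeps repeated reconciliation safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the authoritative Surreal oplog and the LMDB copy.
var surreal = new List<(long Timestamp, string Hash)> { (1, "a"), (2, "b"), (3, "c") };
var lmdb = new List<(long Timestamp, string Hash)> { (1, "a") };

// Copy Surreal entries missing from LMDB (by hash), oldest first.
int Reconcile()
{
    var present = new HashSet<string>(lmdb.Select(e => e.Hash));
    var missing = surreal.Where(e => !present.Contains(e.Hash)).OrderBy(e => e.Timestamp).ToList();
    lmdb.AddRange(missing);
    return missing.Count; // comparable to a "reconciled entry count" metric
}

int reconciled = Reconcile();
Console.WriteLine(reconciled);  // 2
Console.WriteLine(Reconcile()); // 0 (second pass is idempotent)
```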

Backfill Utility

LmdbOplogBackfillTool performs Surreal -> LMDB oplog backfill and parity validation per dataset:

var backfill = provider.GetRequiredService<LmdbOplogBackfillTool>();
LmdbOplogBackfillReport report = await backfill.BackfillOrThrowAsync(DatasetId.Primary);

Validation includes:

total entry counts
per-node entry counts
latest hash per node
hash spot checks
chain-range spot checks
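The first three checks can be sketched over in-memory entries:

- total count
- per-node count
- latest hash per node, taken as the hash of the highest-timestamp entry

Hash and chain-range spot checks are omitted. The (NodeId, Timestamp, Hash) shape and ParityErrors are hypothetical, not the LmdbOplogBackfillTool API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static List<string> ParityErrors(
    IReadOnlyList<(string NodeId, long Timestamp, string Hash)> source,
    IReadOnlyList<(string NodeId, long Timestamp, string Hash)> target)
{
    var errors = new List<string>();
    if (source.Count != target.Count)
        errors.Add($"total count {source.Count} != {target.Count}");

    var srcByNode = source.GroupBy(e => e.NodeId).ToDictionary(g => g.Key, g => g.ToList());
    var dstByNode = target.GroupBy(e => e.NodeId).ToDictionary(g => g.Key, g => g.ToList());

    foreach (var pair in srcByNode)
    {
        string node = pair.Key;
        if (!dstByNode.TryGetValue(node, out var dstEntries))
        {
            errors.Add($"node {node} missing in target");
            continue;
        }
        if (pair.Value.Count != dstEntries.Count)
            errors.Add($"node {node} count {pair.Value.Count} != {dstEntries.Count}");

        // Latest hash per node = hash of the entry with the highest timestamp.
        string srcLatest = pair.Value.OrderBy(e => e.Timestamp).Last().Hash;
        string dstLatest = dstEntries.OrderBy(e => e.Timestamp).Last().Hash;
        if (srcLatest != dstLatest)
            errors.Add($"node {node} latest hash {srcLatest} != {dstLatest}");
    }
    return errors;
}

var surreal = new List<(string NodeId, long Timestamp, string Hash)>
    { ("node-a", 1, "a1"), ("node-a", 2, "a2"), ("node-b", 1, "b1") };
var lmdb = new List<(string NodeId, long Timestamp, string Hash)>
    { ("node-a", 1, "a1"), ("node-b", 1, "b1") };

foreach (string error in ParityErrors(surreal, lmdb))
    Console.WriteLine(error);
```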

Migration Telemetry

FeatureFlagOplogStore records migration counters through OplogMigrationTelemetry:

shadow comparisons
shadow mismatches
LMDB preferred-read fallbacks to Surreal
reconciliation runs and reconciled entry counts (global + per dataset)

You can resolve OplogMigrationTelemetry from DI or call GetTelemetrySnapshot() on FeatureFlagOplogStore.
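When judging whether "mismatch logs remain stable", a ratio is easier to track than raw counts. The sketch keeps thread-safe counters for shadow comparisons and mismatches and derives a mismatch rate. The counters and helper names are hypothetical and do not reflect the members of OplogMigrationTelemetry.

```csharp
using System;
using System.Threading;

long shadowComparisons = 0;
long shadowMismatches = 0;

// Called once per shadow-validated read.
void RecordShadowComparison(bool matched)
{
    Interlocked.Increment(ref shadowComparisons);
    if (!matched) Interlocked.Increment(ref shadowMismatches);
}

double MismatchRate()
{
    long total = Interlocked.Read(ref shadowComparisons);
    return total == 0 ? 0.0 : (double)Interlocked.Read(ref shadowMismatches) / total;
}

for (int i = 0; i < 1000; i++)
    RecordShadowComparison(matched: i % 250 != 0); // 4 simulated mismatches

Console.WriteLine($"{shadowMismatches}/{shadowComparisons} mismatched ({MismatchRate():P2})");
```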

Rollback Path

To roll back read/write behavior to Surreal during migration:

set PreferLmdbReads = false
set DualWriteOplog = false

With UseLmdbOplog = true, this keeps LMDB services available while routing reads/writes to Surreal only. If LMDB should be fully disabled, set UseLmdbOplog = false.

Feature Comparison

Feature	SQLite (Direct)	EF Core	PostgreSQL	Surreal Embedded
Storage Format	File-based	Varies	Server-based	File-based (RocksDB)
JSON Storage	TEXT	NVARCHAR/TEXT	JSONB	Native document records
JSON Indexing	Standard	Standard	GIN/GIST	Table/index schema controls
JSON Queries	`json_extract()`	In-Memory	Native (future)	Native document querying
Concurrent Writes	Good (WAL)	Varies	Excellent	Good (embedded engine limits apply)
Horizontal Scaling	No	Limited	Yes (replication)	Peer replication via CBDDC sync
Migrations	Manual SQL	EF Migrations	EF Migrations	Schema initializer + scripts
Connection Pooling	N/A	Built-in	Built-in	N/A (embedded)
Cloud Support	N/A	Varies	Excellent	Excellent for edge/embedded nodes

Performance Benchmarks

These are approximate figures for comparison:

Write Performance (docs/sec)

Provider	Single Write	Bulk Insert (1000)
SQLite	5,000	50,000
EF Core (SQL Server)	3,000	30,000
PostgreSQL	8,000	80,000

Read Performance (docs/sec)

Provider	Single Read	Query (100 results)
SQLite	10,000	5,000
EF Core (SQL Server)	8,000	4,000
PostgreSQL	12,000	8,000

* Benchmarks vary with hardware, network, and configuration.

Migration Guide

From SQLite to PostgreSQL

Export data from SQLite
Set up PostgreSQL database
Update connection configuration
Import data to PostgreSQL
Verify functionality

From EF Core to PostgreSQL

Change NuGet package reference
Update service registration
Generate new migrations for PostgreSQL
Apply migrations
Test thoroughly

Recommendations

Development

Use: SQLite (Direct)
Why: Fast, simple, portable

Testing

Use: SQLite (Direct) or EF Core with SQLite
Why: Disposable, fast test execution

Production (Low-Medium Scale)

Use: SQLite (Direct) with per-collection tables
Why: Excellent performance, simple ops

Production (High Scale)

Use: PostgreSQL
Why: Best performance, scalability, reliability

Production (Edge / Embedded Mesh)

Use: Surreal Embedded (RocksDB)
Why: Durable local CDC, restart-safe checkpoint resume, no external DB dependency

Enterprise

Use: EF Core with SQL Server or PostgreSQL
Why: Enterprise support, compliance, familiarity

Troubleshooting

SQLite: "Database is locked"

Ensure WAL mode is enabled (automatic)
Increase busy timeout
Check for long-running transactions

EF Core: "Query evaluated in-memory"

Expected for complex JSON queries
Consider PostgreSQL for better JSON support
Use indexes on frequently queried properties

PostgreSQL: "Connection pool exhausted"

Increase Maximum Pool Size
Check for connection leaks
Consider connection pooler (PgBouncer)

Surreal Embedded: "CDC replay after restart"

Ensure Cdc.Enabled=true and a stable Cdc.ConsumerId is configured
Verify checkpoint table contains cursor state for the consumer
Resume from checkpoint timestamp before requesting new oplog window

Surreal Embedded: "Unexpected loopback oplog on remote sync"

Apply remote entries through CBDDC sync/orchestrator paths (not local collection writes)
Keep remote sync guards enabled in document store implementations
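One way a document store can implement such a guard is an AsyncLocal<bool> flag. The remote-apply path sets it, the flag flows across awaits within that call, and the local change hook skips oplog emission while it is set. This is a hypothetical pattern; OnLocalWrite and ApplyRemoteAsync are illustrative names, not CBDDC APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var remoteApply = new AsyncLocal<bool>();
var emitted = new List<string>();

// Local change hook: emits an oplog entry unless we are inside a remote apply.
void OnLocalWrite(string key)
{
    if (remoteApply.Value) return; // remote sync write: no loopback CDC entry
    emitted.Add(key);
}

async Task ApplyRemoteAsync(string key)
{
    remoteApply.Value = true;
    try
    {
        await Task.Yield();        // the flag flows across this await
        OnLocalWrite(key);
    }
    finally
    {
        remoteApply.Value = false;
    }
}

OnLocalWrite("local-1");
await ApplyRemoteAsync("remote-1");
OnLocalWrite("local-2");
Console.WriteLine(string.Join(",", emitted)); // local-1,local-2
```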

Future Enhancements

JSONB Query Translation: Native PostgreSQL JSON queries from QueryNode
MongoDB Provider: NoSQL option for document-heavy workloads
Redis Cache Layer: Hybrid persistence for high-read scenarios
Multi-Master PostgreSQL: Active-active replication support
