CBDDC/docs/persistence-providers.md
Joseph Doherty cce24fa8f3
Add LMDB oplog migration path with dual-write cutover support
Introduce LMDB oplog store, migration flags, telemetry/backfill tooling, and parity tests to enable staged Surreal-to-LMDB rollout with rollback coverage.
2026-02-22 17:44:57 -05:00


CBDDC Persistence Providers

CBDDC supports multiple persistence backends to suit different deployment scenarios.

Overview

| Provider | Best For | Performance | Setup | Production Ready |
|---|---|---|---|---|
| SQLite (Direct) | Embedded apps, single-node | | | Yes |
| EF Core (Generic) | Multi-DB support, migrations | | | Yes |
| PostgreSQL | Production, high load, JSON queries | | | Yes |
| Surreal Embedded (RocksDB) | Embedded multi-peer sync with local CDC | | | Yes |

SQLite (Direct)

Package: ZB.MOM.WW.CBDDC.Persistence.Sqlite

Characteristics

  • Zero configuration: Works out of the box
  • Excellent performance: Native SQL, no ORM overhead
  • WAL mode: Concurrent readers alongside a single writer
  • Per-collection tables: Optional for better isolation
  • Snapshots: Fast reconnection with SnapshotMetadata
  • Portable: Single file database
  • Limited JSON queries: Uses json_extract()

When to Use

  • Building single-node applications
  • Embedded scenarios (desktop, mobile)
  • Development/testing
  • Maximum simplicity required
  • File-based portability important

Configuration

// Legacy mode (simple)
services.AddCBDDCSqlite("Data Source=cbddc.db");

// New mode (per-collection tables)
services.AddCBDDCSqlite(options =>
{
    options.BasePath = "/var/lib/cbddc";
    options.DatabaseFilenameTemplate = "cbddc-{NodeId}.db";
    options.UsePerCollectionTables = true;
});

Performance Tips

  1. Enable WAL mode (done automatically)
  2. Use per-collection tables for large datasets
  3. Create indexes on frequently queried fields
  4. Keep database on fast storage (SSD)

EF Core (Generic)

Package: ZB.MOM.WW.CBDDC.Persistence.EntityFramework

Characteristics

  • Multi-database support: SQL Server, MySQL, SQLite, PostgreSQL
  • EF Core benefits: Migrations, LINQ, change tracking
  • Type-safe: Strongly-typed entities
  • ⚠️ Query limitation: JSON queries evaluated in-memory
  • ⚠️ ORM overhead: Slightly slower than direct SQL

When to Use

  • Need to support multiple database backends
  • Team familiar with EF Core patterns
  • Want automated migrations
  • Building enterprise applications
  • Database portability is important

Configuration

SQLite

services.AddCBDDCEntityFrameworkSqlite("Data Source=cbddc.db");

SQL Server

services.AddCBDDCEntityFrameworkSqlServer(
    "Server=localhost;Database=CBDDC;Trusted_Connection=True;");

PostgreSQL

services.AddDbContext<CBDDCContext>(options =>
    options.UseNpgsql(connectionString));
services.AddCBDDCEntityFramework();

MySQL

var serverVersion = ServerVersion.AutoDetect(connectionString);
services.AddCBDDCEntityFrameworkMySql(connectionString, serverVersion);

Migrations

# Add migration
dotnet ef migrations add InitialCreate --context CBDDCContext

# Apply migration
dotnet ef database update --context CBDDCContext

PostgreSQL

Package: ZB.MOM.WW.CBDDC.Persistence.PostgreSQL

Characteristics

  • JSONB native storage: Optimal JSON handling
  • GIN indexes: Fast JSON path queries
  • High performance: Production-grade
  • Connection resilience: Built-in retry logic
  • Full ACID: Strong consistency guarantees
  • ⚠️ Future feature: JSONB query translation (roadmap)

When to Use

  • Production deployments with high traffic
  • Need advanced JSON querying (future)
  • Require horizontal scalability
  • Want best-in-class reliability
  • Cloud deployments (AWS RDS, Azure Database, etc.)

Configuration

services.AddCBDDCPostgreSql(
    "Host=localhost;Database=CBDDC;Username=user;Password=pass");

// With custom options
services.AddCBDDCPostgreSql(connectionString, options =>
{
    options.EnableSensitiveDataLogging(); // Dev only
    options.CommandTimeout(30);
});

JSONB Indexes

For optimal performance, create GIN indexes via migrations:

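protected override void Up(MigrationBuilder migrationBuilder)
{
    // GIN indexes with jsonb_path_ops accelerate containment (@>) and jsonpath (@?, @@)
    // queries; they do not support the key-existence operators (?, ?|, ?&).
    // Both indexed columns must be of type jsonb.
    migrationBuilder.Sql(@"
        CREATE INDEX IF NOT EXISTS IX_Documents_ContentJson_gin
        ON ""Documents"" USING GIN (""ContentJson"" jsonb_path_ops);

        CREATE INDEX IF NOT EXISTS IX_Oplog_PayloadJson_gin
        ON ""Oplog"" USING GIN (""PayloadJson"" jsonb_path_ops);
    ");
}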

Connection String Examples

Local Development

Host=localhost;Port=5432;Database=CBDDC;Username=admin;Password=secret

Production with SSL

Host=prod-db.example.com;Database=CBDDC;Username=admin;Password=secret;SSL Mode=Require

Connection Pooling

Host=localhost;Database=CBDDC;Username=admin;Password=secret;Pooling=true;Minimum Pool Size=5;Maximum Pool Size=100

Surreal Embedded (RocksDB)

Package: ZB.MOM.WW.CBDDC.Persistence

Characteristics

  • Embedded + durable: Uses local RocksDB storage via Surreal embedded endpoint
  • CDC-native workflow: Collection watches emit oplog entries and metadata updates
  • Durable checkpointing: CDC cursor state is persisted per consumer id
  • Restart recovery: Oplog + checkpoint data survive process restart and resume catch-up
  • Loopback suppression: Remote apply path suppresses local CDC re-emission
  • Idempotent merge window: Duplicate remote entries are merged by deterministic hash

When to Use

  • Embedded deployments that still need multi-peer replication
  • Edge nodes where local durability is required without an external DB server
  • CDC-heavy sync topologies that need restart-safe cursor tracking
  • Environments that benefit from document-style storage and local operation logs

Configuration

services.AddCBDDCCore()
    .AddCBDDCSurrealEmbedded<SampleDocumentStore>(_ => new CBDDCSurrealEmbeddedOptions
    {
        Endpoint = "rocksdb://local",
        DatabasePath = "/var/lib/cbddc/node-a.rocksdb",
        Namespace = "cbddc",
        Database = "node_a",
        Cdc = new CBDDCSurrealCdcOptions
        {
            Enabled = true,
            ConsumerId = "sync-main",
            CheckpointTable = "cbddc_cdc_checkpoint",
            EnableLiveSelectAccelerator = true,
            LiveSelectReconnectDelay = TimeSpan.FromSeconds(2)
        }
    });

Multi-Dataset Partitioning

Surreal persistence now stores datasetId on oplog, metadata, snapshot metadata, confirmation, and CDC checkpoint records.

  • Composite indexes include datasetId to prevent cross-dataset reads.
  • Legacy rows missing datasetId are treated as belonging to the primary dataset during reads.
  • Dataset-scoped store APIs (ExportAsync(datasetId), GetOplogAfterAsync(..., datasetId, ...)) enforce isolation.
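
The read-side rule above can be illustrated with a minimal sketch. The StoredRow type, the DatasetScopeSketch class, and the literal id "primary" are assumptions made for illustration; they are not CBDDC's actual types or constants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; DatasetId is null for rows written before partitioning existed.
public sealed record StoredRow(string Key, string? DatasetId);

public static class DatasetScopeSketch
{
    // Assumption: the primary dataset is identified by the literal id "primary".
    public const string PrimaryDatasetId = "primary";

    // Legacy rows without a datasetId are treated as belonging to the primary dataset.
    public static string EffectiveDataset(StoredRow row) => row.DatasetId ?? PrimaryDatasetId;

    // A dataset-scoped read returns only rows whose effective dataset matches the request.
    public static List<StoredRow> ReadDataset(IEnumerable<StoredRow> rows, string datasetId) =>
        rows.Where(r => string.Equals(EffectiveDataset(r), datasetId, StringComparison.Ordinal))
            .ToList();
}
```

In this sketch a read scoped to the primary dataset returns both explicitly tagged rows and legacy rows, while a read scoped to any other dataset never sees legacy rows.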

CDC Durability Notes

  1. Checkpoint semantics: each consumer id has an independent durable cursor (timestamp + hash).
  2. Catch-up on restart: read checkpoint, then request oplog entries strictly after the stored timestamp.
  3. Duplicate-window safety: replayed windows are deduplicated by oplog hash merge semantics.
  4. Delete durability: deletes persist as oplog delete operations plus tombstone metadata.
  5. Remote apply behavior: remote sync applies documents without generating local loopback CDC entries.
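
Notes 1 to 3 can be sketched as a small catch-up routine. This is an illustration under stated assumptions, not CBDDC's implementation: OplogEntry, Checkpoint, and CdcCatchUpSketch are hypothetical names, and the in-memory hash set stands in for whatever merge-by-hash mechanism the store uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; these are not CBDDC's actual classes.
public sealed record OplogEntry(long Timestamp, string Hash, string Payload);
public sealed record Checkpoint(long Timestamp, string Hash);

public static class CdcCatchUpSketch
{
    // Returns the entries a consumer still has to apply after a restart.
    // Entries at or before the checkpoint timestamp are skipped, and entries whose
    // hash was already seen are dropped so a replayed window is applied only once.
    public static List<OplogEntry> CatchUp(
        Checkpoint? checkpoint,
        IEnumerable<OplogEntry> oplog,
        ISet<string> appliedHashes)
    {
        long after = checkpoint?.Timestamp ?? long.MinValue;
        var pending = new List<OplogEntry>();
        foreach (var entry in oplog.Where(e => e.Timestamp > after).OrderBy(e => e.Timestamp))
        {
            // ISet<T>.Add returns false when the hash is already present.
            if (appliedHashes.Add(entry.Hash))
                pending.Add(entry);
        }
        return pending;
    }

    // The new durable cursor is the last applied entry, or the old cursor if nothing was applied.
    public static Checkpoint? Advance(Checkpoint? current, IReadOnlyList<OplogEntry> applied) =>
        applied.Count == 0 ? current : new Checkpoint(applied[^1].Timestamp, applied[^1].Hash);
}
```

A consumer would call CatchUp with its stored checkpoint, apply the returned entries, and then persist the cursor returned by Advance under its consumer id.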

LMDB Oplog Migration Mode

CBDDC now supports an LMDB-backed oplog provider for staged cutover from Surreal oplog tables.

Registration

services.AddCBDDCCore()
    .AddCBDDCSurrealEmbedded<SampleDocumentStore>(optionsFactory)
    .AddCBDDCLmdbOplog(
        _ => new LmdbOplogOptions
        {
            EnvironmentPath = "/var/lib/cbddc/oplog-lmdb",
            MapSizeBytes = 256L * 1024 * 1024,
            MaxDatabases = 16,
            PruneBatchSize = 512
        },
        flags =>
        {
            flags.UseLmdbOplog = true;
            flags.DualWriteOplog = true;
            flags.PreferLmdbReads = false;
        });

Feature Flags

  • UseLmdbOplog: enables LMDB migration path.
  • DualWriteOplog: mirrors writes to Surreal + LMDB.
  • PreferLmdbReads: cuts reads over to LMDB.
  • EnableReadShadowValidation: compares Surreal/LMDB read results and logs mismatches.
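
As a rough mental model, the three routing flags can be read as the decision functions below. This is a sketch that assumes Surreal always receives writes during migration, as the rollback notes in this document suggest; OplogFlags, OplogEngine, and OplogRoutingSketch are hypothetical names rather than CBDDC types.

```csharp
using System;

// Hypothetical mirror of the three routing flags described above; not CBDDC's actual type.
public sealed record OplogFlags(bool UseLmdbOplog, bool DualWriteOplog, bool PreferLmdbReads);

public enum OplogEngine { Surreal, Lmdb }

public static class OplogRoutingSketch
{
    // Assumption: Surreal is always written. LMDB receives a mirrored copy only
    // when the migration path and dual-write are both enabled.
    public static bool WritesToLmdb(OplogFlags flags) =>
        flags.UseLmdbOplog && flags.DualWriteOplog;

    // Reads move to LMDB only when the migration path is on and reads are cut over.
    public static OplogEngine ReadEngine(OplogFlags flags) =>
        flags.UseLmdbOplog && flags.PreferLmdbReads ? OplogEngine.Lmdb : OplogEngine.Surreal;
}
```

Under this reading, a staged rollout first enables UseLmdbOplog and DualWriteOplog, and flips PreferLmdbReads only once the mirrored data has been validated.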

Consistency Model

The initial migration model (Option A) is eventually consistent across engines; writes are not atomic across Surreal and LMDB:

  • Surreal local CDC transactions remain authoritative for atomic document + metadata persistence.
  • When LMDB reads are preferred and LMDB is missing recent Surreal writes, the missing entries are backfilled and reconciled from Surreal.
  • During rollout, keep DualWriteOplog enabled until the shadow-validation mismatch logs stay stable.

Backfill Utility

LmdbOplogBackfillTool performs Surreal -> LMDB oplog backfill and parity validation per dataset:

var backfill = provider.GetRequiredService<LmdbOplogBackfillTool>();
LmdbOplogBackfillReport report = await backfill.BackfillOrThrowAsync(DatasetId.Primary);

Validation includes:

  • total entry counts
  • per-node entry counts
  • latest hash per node
  • hash spot checks
  • chain-range spot checks
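
The first three checks can be pictured with a minimal comparison sketch. It is not the logic of LmdbOplogBackfillTool; ParityEntry and OplogParitySketch are hypothetical names, and the hash and chain-range spot checks are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry shape for illustration; not CBDDC's actual oplog type.
public sealed record ParityEntry(string NodeId, long Timestamp, string Hash);

public static class OplogParitySketch
{
    // Compares two oplog copies and returns a readable list of differences.
    // An empty list means all three checks below passed.
    public static List<string> Compare(
        IReadOnlyCollection<ParityEntry> source,
        IReadOnlyCollection<ParityEntry> target)
    {
        var problems = new List<string>();

        // Check 1: total entry counts.
        if (source.Count != target.Count)
            problems.Add($"total: {source.Count} vs {target.Count}");

        // Checks 2 and 3: per-node entry counts and latest hash per node.
        var s = Summarize(source);
        var t = Summarize(target);
        foreach (var node in s.Keys.Union(t.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            s.TryGetValue(node, out var a); // a missing node yields (0, null)
            t.TryGetValue(node, out var b);
            if (a.Count != b.Count)
                problems.Add($"{node} count: {a.Count} vs {b.Count}");
            if (a.LatestHash != b.LatestHash)
                problems.Add($"{node} latest hash: {a.LatestHash} vs {b.LatestHash}");
        }
        return problems;
    }

    // Groups entries by node and records the count and the hash of the newest entry.
    private static Dictionary<string, (int Count, string LatestHash)> Summarize(
        IEnumerable<ParityEntry> entries) =>
        entries.GroupBy(e => e.NodeId)
               .ToDictionary(g => g.Key, g => (g.Count(), g.OrderBy(e => e.Timestamp).Last().Hash));
}
```

A backfill run in this style would treat any non-empty result as a parity failure for the dataset being validated.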

Migration Telemetry

FeatureFlagOplogStore records migration counters through OplogMigrationTelemetry:

  • shadow comparisons
  • shadow mismatches
  • LMDB preferred-read fallbacks to Surreal
  • reconciliation runs and reconciled entry counts (global + per dataset)

You can resolve OplogMigrationTelemetry from DI or call GetTelemetrySnapshot() on FeatureFlagOplogStore.

Rollback Path

To roll back read/write behavior to Surreal during migration:

  • set PreferLmdbReads = false
  • set DualWriteOplog = false

With UseLmdbOplog = true, this keeps LMDB services available while routing reads/writes to Surreal only. If LMDB should be fully disabled, set UseLmdbOplog = false.

Feature Comparison

| Feature | SQLite (Direct) | EF Core | PostgreSQL | Surreal Embedded |
|---|---|---|---|---|
| Storage Format | File-based | Varies | Server-based | File-based (RocksDB) |
| JSON Storage | TEXT | NVARCHAR/TEXT | JSONB | Native document records |
| JSON Indexing | Standard | Standard | GIN/GiST | Table/index schema controls |
| JSON Queries | json_extract() | In-Memory | Native (future) | Native document querying |
| Concurrent Writes | Good (WAL) | Varies | Excellent | Good (embedded engine limits apply) |
| Horizontal Scaling | No | Limited | Yes (replication) | Peer replication via CBDDC sync |
| Migrations | Manual SQL | EF Migrations | EF Migrations | Schema initializer + scripts |
| Connection Pooling | N/A | Built-in | Built-in | N/A (embedded) |
| Cloud Support | N/A | Varies | Excellent | Excellent for edge/embedded nodes |

Performance Benchmarks

These are approximate figures for comparison:

Write Performance (docs/sec)

| Provider | Single Write | Bulk Insert (1000) |
|---|---|---|
| SQLite | 5,000 | 50,000 |
| EF Core (SQL Server) | 3,000 | 30,000 |
| PostgreSQL | 8,000 | 80,000 |

Read Performance (docs/sec)

| Provider | Single Read | Query (100 results) |
|---|---|---|
| SQLite | 10,000 | 5,000 |
| EF Core (SQL Server) | 8,000 | 4,000 |
| PostgreSQL | 12,000 | 8,000 |

Benchmarks vary based on hardware, network, and configuration.

Migration Guide

From SQLite to PostgreSQL

  1. Export data from SQLite
  2. Set up PostgreSQL database
  3. Update connection configuration
  4. Import data to PostgreSQL
  5. Verify functionality

From EF Core to PostgreSQL

  1. Change NuGet package reference
  2. Update service registration
  3. Generate new migrations for PostgreSQL
  4. Apply migrations
  5. Test thoroughly

Recommendations

Development

  • Use: SQLite (Direct)
  • Why: Fast, simple, portable

Testing

  • Use: SQLite (Direct) or EF Core with SQLite
  • Why: Disposable, fast test execution

Production (Low-Medium Scale)

  • Use: SQLite (Direct) with per-collection tables
  • Why: Excellent performance, simple ops

Production (High Scale)

  • Use: PostgreSQL
  • Why: Best performance, scalability, reliability

Production (Edge / Embedded Mesh)

  • Use: Surreal Embedded (RocksDB)
  • Why: Durable local CDC, restart-safe checkpoint resume, no external DB dependency

Enterprise

  • Use: EF Core with SQL Server or PostgreSQL
  • Why: Enterprise support, compliance, familiarity

Troubleshooting

SQLite: "Database is locked"

  • Ensure WAL mode is enabled (automatic)
  • Increase busy timeout
  • Check for long-running transactions

EF Core: "Query evaluated in-memory"

  • Expected for complex JSON queries
  • Consider PostgreSQL for better JSON support
  • Use indexes on frequently queried properties

PostgreSQL: "Connection pool exhausted"

  • Increase Maximum Pool Size
  • Check for connection leaks
  • Consider connection pooler (PgBouncer)

Surreal Embedded: "CDC replay after restart"

  • Ensure Cdc.Enabled=true and a stable Cdc.ConsumerId is configured
  • Verify checkpoint table contains cursor state for the consumer
  • Resume from checkpoint timestamp before requesting new oplog window

Surreal Embedded: "Unexpected loopback oplog on remote sync"

  • Apply remote entries through CBDDC sync/orchestrator paths (not local collection writes)
  • Keep remote sync guards enabled in document store implementations

Future Enhancements

  • JSONB Query Translation: Native PostgreSQL JSON queries from QueryNode
  • MongoDB Provider: NoSQL option for document-heavy workloads
  • Redis Cache Layer: Hybrid persistence for high-read scenarios
  • Multi-Master PostgreSQL: Active-active replication support