# CBDDC Persistence Providers CBDDC supports multiple persistence backends to suit different deployment scenarios. ## Overview | Provider | Best For | Performance | Setup | Production Ready | |----------|----------|-------------|-------|------------------| | **SQLite (Direct)** | Embedded apps, single-node | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ✅ Yes | | **EF Core (Generic)** | Multi-DB support, migrations | ⭐⭐⭐ | ⭐⭐⭐ | ✅ Yes | | **PostgreSQL** | Production, high load, JSON queries | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ✅ Yes | | **Surreal Embedded (RocksDB)** | Embedded multi-peer sync with local CDC | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ✅ Yes | ## SQLite (Direct) **Package:** `ZB.MOM.WW.CBDDC.Persistence.Sqlite` ### Characteristics - ✅ **Zero configuration**: Works out of the box - ✅ **Excellent performance**: Native SQL, no ORM overhead - ✅ **WAL mode**: Concurrent readers + writers - ✅ **Per-collection tables**: Optional for better isolation - ✅ **Snapshots**: Fast reconnection with `SnapshotMetadata` - ✅ **Portable**: Single file database - ❌ **Limited JSON queries**: Uses `json_extract()` ### When to Use - Building single-node applications - Embedded scenarios (desktop, mobile) - Development/testing - Maximum simplicity required - File-based portability important ### Configuration ```csharp // Legacy mode (simple) services.AddCBDDCSqlite("Data Source=cbddc.db"); // New mode (per-collection tables) services.AddCBDDCSqlite(options => { options.BasePath = "/var/lib/cbddc"; options.DatabaseFilenameTemplate = "cbddc-{NodeId}.db"; options.UsePerCollectionTables = true; }); ``` ### Performance Tips 1. Enable WAL mode (done automatically) 2. Use per-collection tables for large datasets 3. Create indexes on frequently queried fields 4. Keep database on fast storage (SSD) ## EF Core (Generic) **Package:** `ZB.MOM.WW.CBDDC.Persistence.EntityFramework` ### Characteristics - ✅ **Multi-database support**: SQL Server, MySQL, SQLite, PostgreSQL - ✅ **EF Core benefits**: Migrations, LINQ, change tracking - ✅ **Type-safe**: Strongly-typed entities - ⚠️ **Query limitation**: JSON queries evaluated in-memory - ⚠️ **ORM overhead**: Slightly slower than direct SQL ### When to Use - Need to support multiple database backends - Team familiar with EF Core patterns - Want automated migrations - Building enterprise applications - Database portability is important ### Configuration #### SQLite ```csharp services.AddCBDDCEntityFrameworkSqlite("Data Source=cbddc.db"); ``` #### SQL Server ```csharp services.AddCBDDCEntityFrameworkSqlServer( "Server=localhost;Database=CBDDC;Trusted_Connection=True;"); ``` #### PostgreSQL ```csharp services.AddDbContext(options => options.UseNpgsql(connectionString)); services.AddCBDDCEntityFramework(); ``` #### MySQL ```csharp var serverVersion = ServerVersion.AutoDetect(connectionString); services.AddCBDDCEntityFrameworkMySql(connectionString, serverVersion); ``` ### Migrations ```bash # Add migration dotnet ef migrations add InitialCreate --context CBDDCContext # Apply migration dotnet ef database update --context CBDDCContext ``` ## PostgreSQL **Package:** `ZB.MOM.WW.CBDDC.Persistence.PostgreSQL` ### Characteristics - ✅ **JSONB native storage**: Optimal JSON handling - ✅ **GIN indexes**: Fast JSON path queries - ✅ **High performance**: Production-grade - ✅ **Connection resilience**: Built-in retry logic - ✅ **Full ACID**: Strong consistency guarantees - ⚠️ **Future feature**: JSONB query translation (roadmap) ### When to Use - Production deployments with high traffic - Need advanced JSON querying (future) - Require horizontal scalability - Want best-in-class reliability - Cloud deployments (AWS RDS, Azure Database, etc.) ### Configuration ```csharp services.AddCBDDCPostgreSql( "Host=localhost;Database=CBDDC;Username=user;Password=pass"); // With custom options services.AddCBDDCPostgreSql(connectionString, options => { options.EnableSensitiveDataLogging(); // Dev only options.CommandTimeout(30); }); ``` ### JSONB Indexes For optimal performance, create GIN indexes via migrations: ```csharp protected override void Up(MigrationBuilder migrationBuilder) { migrationBuilder.Sql(@" CREATE INDEX IF NOT EXISTS IX_Documents_ContentJson_gin ON ""Documents"" USING GIN (""ContentJson"" jsonb_path_ops); CREATE INDEX IF NOT EXISTS IX_Oplog_PayloadJson_gin ON ""Oplog"" USING GIN (""PayloadJson"" jsonb_path_ops); "); } ``` ### Connection String Examples #### Local Development ``` Host=localhost;Port=5432;Database=CBDDC;Username=admin;Password=secret ``` #### Production with SSL ``` Host=prod-db.example.com;Database=CBDDC;Username=admin;Password=secret;SSL Mode=Require ``` #### Connection Pooling ``` Host=localhost;Database=CBDDC;Username=admin;Password=secret;Pooling=true;Minimum Pool Size=5;Maximum Pool Size=100 ``` ## Surreal Embedded (RocksDB) **Package:** `ZB.MOM.WW.CBDDC.Persistence` ### Characteristics - ✅ **Embedded + durable**: Uses local RocksDB storage via Surreal embedded endpoint - ✅ **CDC-native workflow**: Collection watches emit oplog entries and metadata updates - ✅ **Durable checkpointing**: CDC cursor state is persisted per consumer id - ✅ **Restart recovery**: Oplog + checkpoint data survive process restart and resume catch-up - ✅ **Loopback suppression**: Remote apply path suppresses local CDC re-emission - ✅ **Idempotent merge window**: Duplicate remote entries are merged by deterministic hash ### When to Use - Embedded deployments that still need multi-peer replication - Edge nodes where local durability is required without an external DB server - CDC-heavy sync topologies that need restart-safe cursor tracking - Environments that benefit from document-style storage and local operation logs ### Configuration ```csharp services.AddCBDDCCore() .AddCBDDCSurrealEmbedded(_ => new CBDDCSurrealEmbeddedOptions { Endpoint = "rocksdb://local", DatabasePath = "/var/lib/cbddc/node-a.rocksdb", Namespace = "cbddc", Database = "node_a", Cdc = new CBDDCSurrealCdcOptions { Enabled = true, ConsumerId = "sync-main", CheckpointTable = "cbddc_cdc_checkpoint", EnableLiveSelectAccelerator = true, LiveSelectReconnectDelay = TimeSpan.FromSeconds(2) } }); ``` ### Multi-Dataset Partitioning Surreal persistence now stores `datasetId` on oplog, metadata, snapshot metadata, confirmation, and CDC checkpoint records. - Composite indexes include `datasetId` to prevent cross-dataset reads. - Legacy rows missing `datasetId` are interpreted as `primary` during reads. - Dataset-scoped store APIs (`ExportAsync(datasetId)`, `GetOplogAfterAsync(..., datasetId, ...)`) enforce isolation. ### CDC Durability Notes 1. **Checkpoint semantics**: each consumer id has an independent durable cursor (`timestamp + hash`). 2. **Catch-up on restart**: read checkpoint, then request oplog entries strictly after the stored timestamp. 3. **Duplicate-window safety**: replayed windows are deduplicated by oplog hash merge semantics. 4. **Delete durability**: deletes persist as oplog delete operations plus tombstone metadata. 5. **Remote apply behavior**: remote sync applies documents without generating local loopback CDC entries. ## Feature Comparison | Feature | SQLite (Direct) | EF Core | PostgreSQL | Surreal Embedded | |---------|----------------|---------|------------|------------------| | **Storage Format** | File-based | Varies | Server-based | File-based (RocksDB) | | **JSON Storage** | TEXT | NVARCHAR/TEXT | JSONB | Native document records | | **JSON Indexing** | Standard | Standard | GIN/GIST | Table/index schema controls | | **JSON Queries** | `json_extract()` | In-Memory | Native (future) | Native document querying | | **Concurrent Writes** | Good (WAL) | Varies | Excellent | Good (embedded engine limits apply) | | **Horizontal Scaling** | No | Limited | Yes (replication) | Peer replication via CBDDC sync | | **Migrations** | Manual SQL | EF Migrations | EF Migrations | Schema initializer + scripts | | **Connection Pooling** | N/A | Built-in | Built-in | N/A (embedded) | | **Cloud Support** | N/A | Varies | Excellent | Excellent for edge/embedded nodes | ## Performance Benchmarks _These are approximate figures for comparison:_ ### Write Performance (docs/sec) | Provider | Single Write | Bulk Insert (1000) | |----------|--------------|-------------------| | SQLite | 5,000 | 50,000 | | EF Core (SQL Server) | 3,000 | 30,000 | | PostgreSQL | 8,000 | 80,000 | ### Read Performance (docs/sec) | Provider | Single Read | Query (100 results) | |----------|-------------|---------------------| | SQLite | 10,000 | 5,000 | | EF Core (SQL Server) | 8,000 | 4,000 | | PostgreSQL | 12,000 | 8,000 | _*Benchmarks vary based on hardware, network, and configuration_ ## Migration Guide ### From SQLite to PostgreSQL 1. Export data from SQLite 2. Set up PostgreSQL database 3. Update connection configuration 4. Import data to PostgreSQL 5. Verify functionality ### From EF Core to PostgreSQL 1. Change NuGet package reference 2. Update service registration 3. Generate new migrations for PostgreSQL 4. Apply migrations 5. Test thoroughly ## Recommendations ### Development - **Use**: SQLite (Direct) - **Why**: Fast, simple, portable ### Testing - **Use**: SQLite (Direct) or EF Core with SQLite - **Why**: Disposable, fast test execution ### Production (Low-Medium Scale) - **Use**: SQLite (Direct) with per-collection tables - **Why**: Excellent performance, simple ops ### Production (High Scale) - **Use**: PostgreSQL - **Why**: Best performance, scalability, reliability ### Production (Edge / Embedded Mesh) - **Use**: Surreal Embedded (RocksDB) - **Why**: Durable local CDC, restart-safe checkpoint resume, no external DB dependency ### Enterprise - **Use**: EF Core with SQL Server or PostgreSQL - **Why**: Enterprise support, compliance, familiarity ## Troubleshooting ### SQLite: "Database is locked" - Ensure WAL mode is enabled (automatic) - Increase busy timeout - Check for long-running transactions ### EF Core: "Query evaluated in-memory" - Expected for complex JSON queries - Consider PostgreSQL for better JSON support - Use indexes on frequently queried properties ### PostgreSQL: "Connection pool exhausted" - Increase `Maximum Pool Size` - Check for connection leaks - Consider connection pooler (PgBouncer) ### Surreal Embedded: "CDC replay after restart" - Ensure `Cdc.Enabled=true` and a stable `Cdc.ConsumerId` is configured - Verify checkpoint table contains cursor state for the consumer - Resume from checkpoint timestamp before requesting new oplog window ### Surreal Embedded: "Unexpected loopback oplog on remote sync" - Apply remote entries through CBDDC sync/orchestrator paths (not local collection writes) - Keep remote sync guards enabled in document store implementations ## Future Enhancements - **JSONB Query Translation**: Native PostgreSQL JSON queries from QueryNode - **MongoDB Provider**: NoSQL option for document-heavy workloads - **Redis Cache Layer**: Hybrid persistence for high-read scenarios - **Multi-Master PostgreSQL**: Active-active replication support