jdescopingtool/PLANS/2026-01-06-dbexporter-design.md
Joseph Doherty d2136cacf7 fix(DbExporter): fix compressed size calculation and clean up
- Move file size read after streams are disposed to get accurate compressed size
- Clean up definition files to use working example queries
- Add .gitignore for output directory
2026-01-06 17:06:16 -05:00


DbExporter Tool Design

Purpose

A command-line tool that queries databases (SQL Server or Oracle) and exports results to compressed protobuf files using protobuf-net-data and zstd compression.

CLI Interface

Usage: DbExporter <definition-file> [options]

Arguments:
  definition-file    Path to JSON definition file

Options:
  --verify          Verify output (row count + schema)
  --verify-full     Verify output with SHA256 checksum
  --help            Show help

Examples:

# Export data
dotnet run -- ./definitions/scada-clients.json

# Export and verify
dotnet run -- ./definitions/scada-clients.json --verify

# Full verification with checksum
dotnet run -- ./definitions/scada-clients.json --verify-full
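
The command line above could be parsed along the following lines. This is a minimal sketch, not the tool's actual Program.cs: the names `CliOptions`, `VerifyMode`, and `CliParser` are hypothetical, and the rule that `--verify-full` wins when both flags are given is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; the design only names Program.cs as the parsing location.
public enum VerifyMode { None, Quick, Full }

public sealed record CliOptions(string? DefinitionFile, VerifyMode Verify, bool ShowHelp);

public static class CliParser
{
    // Parses: DbExporter <definition-file> [--verify | --verify-full | --help]
    public static CliOptions Parse(IReadOnlyList<string> args)
    {
        string? definitionFile = null;
        var verify = VerifyMode.None;
        var showHelp = false;

        foreach (var arg in args)
        {
            switch (arg)
            {
                case "--help":
                    showHelp = true;
                    break;
                case "--verify":
                    // Assumption: --verify-full takes precedence if both flags appear.
                    if (verify != VerifyMode.Full) verify = VerifyMode.Quick;
                    break;
                case "--verify-full":
                    verify = VerifyMode.Full;
                    break;
                default:
                    if (arg.StartsWith("--", StringComparison.Ordinal))
                        throw new ArgumentException($"Unknown option: {arg}");
                    if (definitionFile is not null)
                        throw new ArgumentException("Only one definition file may be given.");
                    definitionFile = arg;
                    break;
            }
        }

        // The definition file is required unless the user only asked for help.
        if (!showHelp && definitionFile is null)
            throw new ArgumentException("Missing required argument: definition-file");

        return new CliOptions(definitionFile, verify, showHelp);
    }
}
```

Calling `CliParser.Parse(args)` from the entry point yields the definition path and the requested verification mode; unknown options fail fast instead of being silently ignored.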

Definition File Format (JSON)

{
  "providerType": "SqlServer",
  "connectionString": "Server=...;Database=...;User Id=...;Password=...;",
  "query": "SELECT * FROM MyTable",
  "outputPath": "./output/mytable.pb.zstd",
  "compressionLevel": 10
}

| Field | Required | Default | Description |
|---|---|---|---|
| providerType | Yes | - | "SqlServer" or "Oracle" |
| connectionString | Yes | - | ADO.NET connection string |
| query | Yes | - | SQL query to execute |
| outputPath | Yes | - | Output file path (.pb.zstd) |
| compressionLevel | No | 10 | Zstd level 1-19 (higher = smaller, slower) |
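
A possible shape for the JSON model and its validation is sketched below, using only System.Text.Json. The `Parse`, `Validate`, and `Require` members are hypothetical, and keeping `providerType` as a string rather than an enum is a simplification made for this example.

```csharp
using System;
using System.Text.Json;

public sealed class ExportDefinition
{
    public string? ProviderType { get; set; }
    public string? ConnectionString { get; set; }
    public string? Query { get; set; }
    public string? OutputPath { get; set; }
    public int CompressionLevel { get; set; } = 10; // documented default when the field is omitted

    // Case-insensitive matching lets "providerType" in JSON bind to ProviderType.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Hypothetical helper: deserialize and validate in one step.
    public static ExportDefinition Parse(string json)
    {
        var definition = JsonSerializer.Deserialize<ExportDefinition>(json, Options)
                         ?? throw new InvalidOperationException("Definition file is empty.");
        definition.Validate();
        return definition;
    }

    public void Validate()
    {
        if (ProviderType is not ("SqlServer" or "Oracle"))
            throw new InvalidOperationException("providerType must be \"SqlServer\" or \"Oracle\".");
        Require(ConnectionString, "connectionString");
        Require(Query, "query");
        Require(OutputPath, "outputPath");
        if (CompressionLevel is < 1 or > 19)
            throw new InvalidOperationException("compressionLevel must be between 1 and 19.");
    }

    private static void Require(string? value, string field)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Missing required field: {field}");
    }
}
```

Because the property initializer supplies the default, a definition file that omits `compressionLevel` ends up with level 10 without any extra code.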

Core Workflow

Export Flow

  1. Parse definition file (JSON)
  2. Validate required fields (provider type, connection string, query, output path)
  3. Create appropriate DbConnection (SqlConnection or OracleConnection)
  4. Execute query → IDataReader
  5. Serialize IDataReader → protobuf stream (via protobuf-net-data)
  6. Compress protobuf stream → zstd (via ZstdSharp)
  7. While writing, compute SHA256 incrementally
  8. Write to output file + sidecar .sha256 file
  9. Print summary: row count, file size, compression ratio
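
Step 7 can be handled by a pass-through stream that hashes bytes as they are written. The sketch below uses only the base class library; `HashingWriteStream` and `ExportSummary` are hypothetical names. It assumes the checksum covers the uncompressed protobuf bytes, which matches the verify-full flow hashing after decompression but is not stated explicitly in this design. The ratio definition (uncompressed divided by compressed) is likewise an assumption.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: forwards every write to an inner stream and hashes the same bytes.
// It does not dispose the inner stream; the caller owns it.
public sealed class HashingWriteStream : Stream
{
    private readonly Stream _inner;
    private readonly IncrementalHash _hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

    public HashingWriteStream(Stream inner) => _inner = inner;

    // Number of bytes that passed through, useful for the summary line.
    public long BytesWritten { get; private set; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _hash.AppendData(buffer, offset, count);
        _inner.Write(buffer, offset, count);
        BytesWritten += count;
    }

    // Call once after the last write; returns the digest as lowercase hex.
    public string FinishHex() => Convert.ToHexString(_hash.GetHashAndReset()).ToLowerInvariant();

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _hash.Dispose();
        base.Dispose(disposing);
    }
}

public static class ExportSummary
{
    // Assumed definition: uncompressed bytes divided by compressed bytes (4.0 = a quarter of the raw size).
    public static double CompressionRatio(long uncompressedBytes, long compressedBytes) =>
        compressedBytes == 0 ? 0 : (double)uncompressedBytes / compressedBytes;
}
```

In the export pipeline the inner stream would be the zstd compression stream from step 6, so the serializer writes into the hashing stream and the compressor receives identical bytes. The compressed file size for step 9 should be read only after the compression and file streams are disposed, since buffered data may not have reached disk before then.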

Verify Flow (--verify)

  1. Open output file
  2. Decompress zstd → protobuf stream
  3. Deserialize protobuf → IDataReader
  4. Loop through all rows, count them (streaming)
  5. Extract schema (column names + types)
  6. Print: ✓ row count, schema
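
Steps 4 and 5 only need the standard `IDataReader` interface, so they can be sketched independently of the protobuf and zstd layers. `ColumnInfo`, `VerifyResult`, and `ReaderInspector` are hypothetical names. The sketch reads the schema before iterating, which assumes the reader exposes column metadata before the first `Read()` call; the order could be swapped if the deserialized reader behaves differently.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical result types for illustration.
public sealed record ColumnInfo(string Name, Type ClrType);
public sealed record VerifyResult(long RowCount, IReadOnlyList<ColumnInfo> Schema);

public static class ReaderInspector
{
    // Streams through the reader once: memory use is independent of the row count.
    public static VerifyResult Inspect(IDataReader reader)
    {
        var schema = new List<ColumnInfo>(reader.FieldCount);
        for (var i = 0; i < reader.FieldCount; i++)
            schema.Add(new ColumnInfo(reader.GetName(i), reader.GetFieldType(i)));

        long rows = 0;
        while (reader.Read())
            rows++;

        return new VerifyResult(rows, schema);
    }
}
```

In the tool, the reader passed to `Inspect` would be the one produced by deserializing the decompressed stream in step 3; for a quick local check, `DataTable.CreateDataReader()` provides a compatible in-memory reader.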

Verify-Full Flow (--verify-full)

  1. Open output file
  2. Decompress zstd → stream protobuf data
  3. While streaming: count rows, extract schema, compute SHA256 incrementally
  4. Compare computed SHA256 to stored sidecar file
  5. Print: ✓ row count, schema, checksum match/mismatch
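
The single-pass hashing in step 3 and the comparison in step 4 could look like the sketch below, which uses only the base class library. `HashingReadStream` and `Sidecar` are hypothetical names, and the sidecar format is an assumption: the design does not specify it, so the example treats the .sha256 file as containing just the hex digest.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes every byte a consumer reads through it.
// It does not dispose the inner stream; the caller owns it.
public sealed class HashingReadStream : Stream
{
    private readonly Stream _inner;
    private readonly IncrementalHash _hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

    public HashingReadStream(Stream inner) => _inner = inner;

    public override int Read(byte[] buffer, int offset, int count)
    {
        var read = _inner.Read(buffer, offset, count);
        if (read > 0) _hash.AppendData(buffer, offset, read);
        return read;
    }

    // Drains any bytes the consumer left unread so the digest covers the whole stream,
    // then returns the digest as uppercase hex. Call once.
    public string FinishHex()
    {
        var scratch = new byte[8192];
        while (Read(scratch, 0, scratch.Length) > 0) { }
        return Convert.ToHexString(_hash.GetHashAndReset());
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _hash.Dispose();
        base.Dispose(disposing);
    }
}

public static class Sidecar
{
    // Assumption: the sidecar file contains only the hex digest, possibly with surrounding whitespace.
    public static bool Matches(string computedHex, string sidecarText) =>
        string.Equals(computedHex, sidecarText.Trim(), StringComparison.OrdinalIgnoreCase);
}
```

Wrapping the decompressed stream in `HashingReadStream` before handing it to the deserializer lets row counting, schema extraction, and hashing share one pass over the data. Draining in `FinishHex` guards against a deserializer that stops reading before the physical end of the stream.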

Project Structure

Tools/DbExporter/
├── DbExporter.csproj
├── Program.cs              # CLI entry point, argument parsing
├── ExportDefinition.cs     # JSON model for definition file
├── DatabaseExporter.cs     # Core export logic
└── Verifier.cs             # Verify and verify-full logic

Dependencies

| Package | Purpose |
|---|---|
| protobuf-net-data | Serialize IDataReader to protobuf |
| ZstdSharp.Port | Zstd compression |
| Microsoft.Data.SqlClient | SQL Server connectivity |
| Oracle.ManagedDataAccess.Core | Oracle connectivity |
| System.Text.Json | Parse definition files |

Target Framework: net10.0

Testing with ScadaBridge

Connection:

Server=10.100.0.35;Database=ScadaBridge_Test;User Id=sa;Password=ScadaBridge2024;TrustServerCertificate=true;

Definition files will be created in Tools/DbExporter/definitions/ for ScadaBridge tables.

Test approach:

  1. Build the tool
  2. Run export for each definition file
  3. Run --verify to confirm row counts and schemas
  4. Run --verify-full on at least one to confirm checksum works