jdescopingtool/PLANS/2026-01-06-dbexporter-design.md
Joseph Doherty d2136cacf7 fix(DbExporter): fix compressed size calculation and clean up
- Move file size read after streams are disposed to get accurate compressed size
- Clean up definition files to use working example queries
- Add .gitignore for output directory
2026-01-06 17:06:16 -05:00


# DbExporter Tool Design
## Purpose
A command-line tool that queries databases (SQL Server or Oracle) and exports results to compressed protobuf files using `protobuf-net-data` and zstd compression.
## CLI Interface
```
Usage: DbExporter <definition-file> [options]

Arguments:
  definition-file    Path to JSON definition file

Options:
  --verify           Verify output (row count + schema)
  --verify-full      Verify output with SHA256 checksum
  --help             Show help
```
**Examples:**
```bash
# Export data
dotnet run -- ./definitions/scada-clients.json
# Export and verify
dotnet run -- ./definitions/scada-clients.json --verify
# Full verification with checksum
dotnet run -- ./definitions/scada-clients.json --verify-full
```
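
A minimal sketch of how `Program.cs` might map these arguments, written as a top-level C# program. `ParseArgs` and its tuple shape are hypothetical names, and treating a missing definition file as a request for help is an assumption, not documented behavior.

```csharp
using System;
#nullable enable

// Hypothetical sketch: ParseArgs and its tuple shape are illustrative, not the tool's API.
(string? DefinitionFile, bool Verify, bool VerifyFull, bool ShowHelp) ParseArgs(string[] argv)
{
    string? definitionFile = null;
    bool verify = false, verifyFull = false, showHelp = false;

    foreach (var arg in argv)
    {
        switch (arg)
        {
            case "--verify": verify = true; break;
            case "--verify-full": verifyFull = true; break;
            case "--help": showHelp = true; break;
            default:
                // Anything else starting with "--" is an unknown option.
                if (arg.StartsWith("--", StringComparison.Ordinal))
                    throw new ArgumentException($"Unknown option: {arg}");
                // The first positional argument is the definition file; only one is allowed.
                if (definitionFile is not null)
                    throw new ArgumentException("Expected exactly one definition file.");
                definitionFile = arg;
                break;
        }
    }

    // Assumption: no definition file means there is nothing to export, so show help.
    if (definitionFile is null) showHelp = true;
    return (definitionFile, verify, verifyFull, showHelp);
}
```

`Program.cs` would then call `ParseArgs(args)` and dispatch to the export, verify, or verify-full path. Because verify-full covers everything `--verify` checks, it could simply take precedence when both flags are given.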
## Definition File Format (JSON)
```json
{
  "providerType": "SqlServer",
  "connectionString": "Server=...;Database=...;User Id=...;Password=...;",
  "query": "SELECT * FROM MyTable",
  "outputPath": "./output/mytable.pb.zstd",
  "compressionLevel": 10
}
```
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `providerType` | Yes | - | `"SqlServer"` or `"Oracle"` |
| `connectionString` | Yes | - | ADO.NET connection string |
| `query` | Yes | - | SQL query to execute |
| `outputPath` | Yes | - | Output file path (.pb.zstd) |
| `compressionLevel` | No | `10` | Zstd level 1-19 (higher = smaller, slower) |
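
The following sketch shows one way to load and validate a definition file against the table above. It uses `JsonDocument` from `System.Text.Json` for explicit field checks. `LoadDefinition` is a hypothetical name, and the tool's own `ExportDefinition.cs` model may deserialize differently.

```csharp
using System;
using System.Text.Json;

// Hypothetical sketch: LoadDefinition and its tuple shape are illustrative names.
(string ProviderType, string ConnectionString, string Query, string OutputPath, int CompressionLevel)
    LoadDefinition(string json)
{
    using var doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    // Required string fields must be present and non-blank.
    string Required(string name) =>
        root.TryGetProperty(name, out JsonElement value)
            && value.ValueKind == JsonValueKind.String
            && !string.IsNullOrWhiteSpace(value.GetString())
            ? value.GetString()!
            : throw new InvalidOperationException($"Missing required field '{name}'.");

    string providerType = Required("providerType");
    if (providerType is not ("SqlServer" or "Oracle"))
        throw new InvalidOperationException($"Unsupported providerType '{providerType}'.");

    // compressionLevel is optional: default 10, allowed range 1-19.
    int level = root.TryGetProperty("compressionLevel", out JsonElement levelElement)
        ? levelElement.GetInt32()
        : 10;
    if (level is < 1 or > 19)
        throw new InvalidOperationException($"compressionLevel must be 1-19, got {level}.");

    return (providerType, Required("connectionString"), Required("query"),
            Required("outputPath"), level);
}
```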
## Core Workflow
### Export Flow
1. Parse definition file (JSON)
2. Validate fields (provider type, connection string, query, output path, compression level range)
3. Create appropriate DbConnection (SqlConnection or OracleConnection)
4. Execute query → IDataReader
5. Serialize IDataReader → protobuf stream (via protobuf-net-data)
6. Compress protobuf stream → zstd (via ZstdSharp)
7. While writing, compute SHA256 incrementally
8. Write to output file + sidecar .sha256 file
9. Print summary: row count, file size, compression ratio
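
Steps 6 to 9 can be sketched as a stream pipeline. This sketch assumes the SHA256 is taken over the compressed bytes written to disk, so the sidecar describes the output file itself; the flow above does not say which bytes are hashed. A write-mode `CryptoStream` over a `SHA256` instance passes bytes through to the file while hashing them. `ExportToFile` and its callbacks are hypothetical, and the zstd and protobuf-net-data calls in the comments are assumed usage. In line with the commit note above, the file size is read only after every stream has been disposed.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of export steps 6-9. In the tool, the callbacks would presumably be
//   openCompressor: s => new ZstdSharp.CompressionStream(s, compressionLevel)
//   writePayload:   s => ProtoBuf.Data.DataSerializer.Serialize(s, reader)
// (assumed usage of ZstdSharp.Port and protobuf-net-data, not verified against the tool).
(string Sha256Hex, long CompressedBytes) ExportToFile(
    string outputPath, Func<Stream, Stream> openCompressor, Action<Stream> writePayload)
{
    using var sha256 = SHA256.Create();

    using (var file = File.Create(outputPath))
    using (var hashing = new CryptoStream(file, sha256, CryptoStreamMode.Write))
    using (var compressor = openCompressor(hashing))
    {
        // Bytes flow: payload -> compressor -> hashing (SHA256 + pass-through) -> file.
        writePayload(compressor);
    } // Disposed innermost-first: compressor flushes its final frame, then the hash is finalized.

    string hex = Convert.ToHexString(sha256.Hash!).ToLowerInvariant();

    // Assumed sidecar format: "<hex>  <file name>", as written by sha256sum.
    File.WriteAllText(outputPath + ".sha256", $"{hex}  {Path.GetFileName(outputPath)}\n");

    // Read the size only after all streams are disposed, so it includes the final frame.
    return (hex, new FileInfo(outputPath).Length);
}
```

Because the compressor is injected, a test can substitute `GZipStream` for the zstd stream and check the returned hash against the bytes on disk.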
### Verify Flow (--verify)
1. Open output file
2. Decompress zstd → protobuf stream
3. Deserialize protobuf → IDataReader
4. Stream through all rows, counting them
5. Extract schema (column names + types)
6. Print: ✓ row count, schema
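
Steps 4 and 5 depend only on `System.Data.IDataReader`, so they can be sketched independently of decompression. `CountRowsAndSchema` is a hypothetical helper. It reads column names and CLR types before the first `Read()`, which standard ADO.NET readers support and which is assumed to hold for the reader protobuf-net-data returns.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper for verify steps 4-5. In the tool the reader would presumably come
// from ProtoBuf.Data.DataSerializer.Deserialize(decompressedStream) (assumed usage).
(long RowCount, List<(string Name, Type ClrType)> Schema) CountRowsAndSchema(IDataReader reader)
{
    // Column metadata is available before the first Read().
    var schema = new List<(string Name, Type ClrType)>();
    for (int i = 0; i < reader.FieldCount; i++)
        schema.Add((reader.GetName(i), reader.GetFieldType(i)));

    // Count rows one at a time; nothing is buffered beyond the current row.
    long rowCount = 0;
    while (reader.Read())
        rowCount++;

    return (rowCount, schema);
}
```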
### Verify-Full Flow (--verify-full)
1. Open output file
2. Decompress zstd → stream protobuf data
3. While streaming: count rows, extract schema, compute SHA256 incrementally
4. Compare the computed SHA256 with the value stored in the sidecar `.sha256` file
5. Print: ✓ row count, schema, checksum match/mismatch
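
One way to compute the SHA256 incrementally during verify-full is to hash the raw file bytes as the decompressor pulls them through a read-mode `CryptoStream`. This sketch assumes the sidecar holds the hex SHA256 of the compressed file as its first space-separated token (for example, `sha256sum`-style output). `VerifyChecksum` and the `consumeCompressed` callback are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of the verify-full checksum step. Assumes the sidecar's first
// space-separated token is the hex SHA256 of the compressed output file.
bool VerifyChecksum(string outputPath, Action<Stream> consumeCompressed)
{
    using var sha256 = SHA256.Create();

    using (var file = File.OpenRead(outputPath))
    using (var hashing = new CryptoStream(file, sha256, CryptoStreamMode.Read))
    {
        // The tool would decompress and deserialize here (count rows, read schema).
        // The consumer must leave `hashing` open (e.g. leaveOpen: true on its decompressor).
        consumeCompressed(hashing);

        // Read any trailing bytes so the hash covers the whole file and is finalized.
        hashing.CopyTo(Stream.Null);
    }

    string actual = Convert.ToHexString(sha256.Hash!);
    string expected = File.ReadAllText(outputPath + ".sha256").Trim().Split(' ')[0];
    return string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
}
```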
## Project Structure
```
Tools/DbExporter/
├── DbExporter.csproj
├── Program.cs # CLI entry point, argument parsing
├── ExportDefinition.cs # JSON model for definition file
├── DatabaseExporter.cs # Core export logic
└── Verifier.cs # Verify and verify-full logic
```
## Dependencies
| Package | Purpose |
|---------|---------|
| `protobuf-net-data` | Serialize IDataReader to protobuf |
| `ZstdSharp.Port` | Zstd compression |
| `Microsoft.Data.SqlClient` | SQL Server connectivity |
| `Oracle.ManagedDataAccess.Core` | Oracle connectivity |
| `System.Text.Json` | Parse definition files |

**Target Framework:** `net10.0`
## Testing with ScadaBridge
**Connection:**
```
Server=10.100.0.35;Database=ScadaBridge_Test;User Id=sa;Password=ScadaBridge2024;TrustServerCertificate=true;
```
Definition files will be created in `Tools/DbExporter/definitions/` for ScadaBridge tables.

**Test approach:**
1. Build the tool
2. Run export for each definition file
3. Run `--verify` to confirm row counts and schemas
4. Run `--verify-full` on at least one output to confirm that checksum verification works