protobuf-net-data is purpose-built for IDataReader serialization and returns IDataReader directly from Deserialize(), eliminating the need for custom streaming reader implementations.
Protobuf Cache Conversion Design
Purpose
Convert the development cache files in CACHED_DB_FILES/ from zstd-compressed JSON (.json.zstd) to zstd-compressed Protocol Buffers (.pb.zstd) using protobuf-net-data for faster deserialization and smaller file sizes.
Goals
- Faster deserialization - Protobuf is faster to parse than JSON
- Smaller file sizes - Protobuf is more compact than JSON
- Simpler code - protobuf-net-data returns `IDataReader` directly, no custom reader needed
Current State
- 22 cache files in `CACHED_DB_FILES/` totaling ~3.6 GB (zstd-compressed JSON)
- `JsonZstdFileSource` reads files using ZstdSharp + `Utf8JsonReader`
- Custom `Utf8JsonStreamingDataReader` implements `IDataReader` for streaming
- Each `*DevEtl.cs` defines a schema and creates a pipeline from JSON files
Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Serialization library | protobuf-net-data | Purpose-built for IDataReader, returns IDataReader directly |
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Compression | zstd on whole file | Consistent with current approach, excellent compression |
| Converter location | Standalone console app in Tools/CacheConverter/ | Isolated utility, not part of main solution |
File Format
New extension: .pb.zstd
File naming:
- `branch.json.zstd` → `branch.pb.zstd`
- `workordertime_curr.json.zstd` → `workordertime_curr.pb.zstd`
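The renaming rule can be expressed as a small helper. This is only a sketch: `CacheFileNames` and `ToProtobufName` are hypothetical names that do not appear in the design, and the converter may derive output names differently.

```csharp
using System;

Console.WriteLine(CacheFileNames.ToProtobufName("branch.json.zstd")); // prints branch.pb.zstd

// Hypothetical helper: swaps the .json.zstd suffix for .pb.zstd.
static class CacheFileNames
{
    public static string ToProtobufName(string jsonName)
    {
        const string oldSuffix = ".json.zstd";
        if (!jsonName.EndsWith(oldSuffix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException($"Expected a {oldSuffix} file name.", nameof(jsonName));

        // Keep everything before the old suffix and append the new one.
        return jsonName[..^oldSuffix.Length] + ".pb.zstd";
    }
}
```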
Data structure: protobuf-net-data binary format
- Schema embedded in stream (column names, types, nullability)
- Rows serialized sequentially
- Native ADO.NET type support (DateTime, Guid, decimal, etc.)
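To illustrate what an embedded schema means in practice, the following in-memory round trip may help. It is a minimal sketch, not the project's code: the table and its column names are made up, and a `DataTable` stands in for real cache data. Only `DataSerializer.Serialize` and `DataSerializer.Deserialize` come from protobuf-net-data.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data;   // NuGet: protobuf-net-data

// Illustrative table; the column names are assumptions for this sketch.
var table = new DataTable("branch");
table.Columns.Add("BranchId", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("UpdatedAt", typeof(DateTime));
table.Rows.Add(1, "North", new DateTime(2024, 1, 15));
table.Rows.Add(2, "South", new DateTime(2024, 2, 1));

using var buffer = new MemoryStream();
using (IDataReader source = table.CreateDataReader())
{
    DataSerializer.Serialize(buffer, source);
}

buffer.Position = 0;
using IDataReader restored = DataSerializer.Deserialize(buffer);

// No schema is supplied on the read side: names and types come from the stream.
while (restored.Read())
{
    for (int i = 0; i < restored.FieldCount; i++)
        Console.Write($"{restored.GetName(i)} ({restored.GetFieldType(i).Name}) = {restored.GetValue(i)}; ");
    Console.WriteLine();
}
```

Because the reader recovers column names and CLR types on its own, the per-table schema arrays used by the JSON path have nothing left to describe.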
Libraries:
- `protobuf-net-data` - IDataReader serialization/deserialization
- `ZstdSharp.Port` - compression
Components
1. Converter Tool
Location: /JdeScopingTool/Tools/CacheConverter/
```text
Tools/
└── CacheConverter/
    ├── CacheConverter.csproj
    └── Program.cs
```
Dependencies:
- `ZstdSharp.Port` - read zstd JSON, write zstd protobuf
- `protobuf-net-data` - protobuf serialization
Behavior:
- Read each `.json.zstd` file from `CACHED_DB_FILES/`
- Decompress and parse JSON into an `IDataReader`
- Use `DataSerializer.Serialize(stream, reader)` to write protobuf
- Compress with zstd and write to `.pb.zstd`
- Print before/after sizes for comparison
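The write half of these steps could look roughly like the sketch below. It assumes the JSON side has already produced an `IDataReader`; here a small `DataTable` stands in for it. `CacheWriter`, `WriteProtobufZstd`, the sample columns and the temp-directory output path are all hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data;   // NuGet: protobuf-net-data
using ZstdSharp;       // NuGet: ZstdSharp.Port

// Stand-in for the reader the real tool would build from a decompressed JSON file.
var table = new DataTable("branch");
table.Columns.Add("BranchId", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Rows.Add(1, "North");
table.Rows.Add(2, "South");

string outputPath = Path.Combine(Path.GetTempPath(), "branch.pb.zstd");
using (IDataReader reader = table.CreateDataReader())
{
    CacheWriter.WriteProtobufZstd(reader, outputPath);
}
Console.WriteLine($"{outputPath}: {new FileInfo(outputPath).Length:N0} bytes");

// Hypothetical helper, not part of the design.
static class CacheWriter
{
    // Serializes every row of `reader` as protobuf and zstd-compresses the result.
    public static void WriteProtobufZstd(IDataReader reader, string outputPath)
    {
        using var file = File.Create(outputPath);
        using var zstd = new CompressionStream(file);
        DataSerializer.Serialize(zstd, reader);
        // `zstd` is disposed before `file`, which writes out the final zstd frame.
    }
}
```

Writing through the compression stream avoids holding the uncompressed protobuf in memory, which matters for the larger cache files.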
Usage:
```bash
cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES
```
2. ProtobufZstdFileSource
New file: NEW/src/JdeScoping.DataSync.Dev/Sources/ProtobufZstdFileSource.cs
Key simplification: No custom IDataReader implementation needed!
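```csharp
using System.Data;
using ProtoBuf.Data;
using ZstdSharp;

public sealed class ProtobufZstdFileSource : IImportSource
{
    private readonly string _filePath;
    private FileStream? _fileStream;
    private DecompressionStream? _decompressionStream;

    public ProtobufZstdFileSource(string filePath) => _filePath = filePath;

    public async Task<IDataReader> ReadDataAsync(CancellationToken ct = default)
    {
        _fileStream = new FileStream(_filePath, FileMode.Open, ...);
        _decompressionStream = new DecompressionStream(_fileStream);
        // protobuf-net-data returns IDataReader directly!
        return DataSerializer.Deserialize(_decompressionStream);
    }
}
```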
Package additions to JdeScoping.DataSync.Dev.csproj:
- `protobuf-net-data`
3. DevEtl Class Updates
Changes to each *DevEtl.cs file (22 files):
- Update `CacheFileName` constant:

  ```csharp
  // Before
  public static readonly string CacheFileName = "branch.json.zstd";
  // After
  public static readonly string CacheFileName = "branch.pb.zstd";
  ```

- Update `Create()` method:

  ```csharp
  // Before
  .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
  // After
  .WithSource(new ProtobufZstdFileSource(cacheFilePath))
  ```

- Remove schema definitions - protobuf-net-data embeds schema in the file, so `JsonColumnSchema[]` arrays are no longer needed in DevEtl classes.
No changes to:
- Pipeline structure
- `DevEtlRegistry.cs`
4. Cleanup (After Verification)
Remove obsolete files:
- `Sources/JsonZstdFileSource.cs`
- `Sources/JsonStreamingDataReader.cs`
- `Sources/Utf8JsonStreamingDataReader.cs`
- `Models/JsonColumnSchema.cs`
Remove old cache files:
- All `*.json.zstd` files in `CACHED_DB_FILES/`
Code Simplification Summary
| Before (JSON) | After (Protobuf) |
|---|---|
| `JsonZstdFileSource` | `ProtobufZstdFileSource` |
| `Utf8JsonStreamingDataReader` (custom) | `DataSerializer.Deserialize()` (library) |
| `JsonStreamingDataReader` (legacy) | Removed |
| `JsonColumnSchema[]` per table | Not needed (embedded in file) |
Test Strategy
- Run converter tool, verify all 22 files convert without errors
- Compare file sizes (expect 10-30% reduction)
- Run existing `JdeScoping.DataSync.Dev.Tests` - all tests should pass unchanged
- Verify data loaded matches previous JSON-based loads
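One way to carry out the final check is to walk the JSON-based and protobuf-based readers side by side. The sketch below uses only `System.Data`; `ReaderComparer` and `FirstDifference` are hypothetical names, and in-memory tables stand in for the two loads. Values are compared with `Equals`, so `byte[]` columns, or columns whose CLR type differs between the two paths (for example `int` vs `long`), would need extra handling.

```csharp
using System;
using System.Data;

// In-memory tables standing in for the JSON-based and protobuf-based loads.
DataTable Make(string name)
{
    var t = new DataTable();
    t.Columns.Add("Id", typeof(int));
    t.Columns.Add("Name", typeof(string));
    t.Rows.Add(1, name);
    t.Rows.Add(2, DBNull.Value);
    return t;
}

Console.WriteLine(ReaderComparer.FirstDifference(
    Make("a").CreateDataReader(), Make("a").CreateDataReader()) ?? "match");
Console.WriteLine(ReaderComparer.FirstDifference(
    Make("a").CreateDataReader(), Make("b").CreateDataReader()) ?? "match");

// Hypothetical helper, not part of the design.
static class ReaderComparer
{
    // Returns null when both readers yield the same columns and rows,
    // otherwise a short description of the first mismatch.
    public static string? FirstDifference(IDataReader expected, IDataReader actual)
    {
        if (expected.FieldCount != actual.FieldCount)
            return $"column count {expected.FieldCount} vs {actual.FieldCount}";
        for (int i = 0; i < expected.FieldCount; i++)
            if (expected.GetName(i) != actual.GetName(i))
                return $"column {i} name '{expected.GetName(i)}' vs '{actual.GetName(i)}'";

        for (int row = 0; ; row++)
        {
            bool hasExpected = expected.Read();
            bool hasActual = actual.Read();
            if (hasExpected != hasActual) return $"row count differs at row {row}";
            if (!hasExpected) return null; // both readers are exhausted
            for (int i = 0; i < expected.FieldCount; i++)
                if (!Equals(expected.GetValue(i), actual.GetValue(i)))
                    return $"row {row}, column '{expected.GetName(i)}' differs";
        }
    }
}
```

The first call prints `match`; the second reports `row 0, column 'Name' differs`.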
Files Changed
| File | Change |
|---|---|
| `Tools/CacheConverter/` (new) | Standalone converter tool |
| `Sources/ProtobufZstdFileSource.cs` (new) | New protobuf reader (much simpler) |
| `JdeScoping.DataSync.Dev.csproj` | Add protobuf-net-data package |
| `*DevEtl.cs` (22 files) | Update file extension, source class, remove schema |
| `Sources/JsonZstdFileSource.cs` | Delete after migration |
| `Sources/JsonStreamingDataReader.cs` | Delete after migration |
| `Sources/Utf8JsonStreamingDataReader.cs` | Delete after migration |
| `Models/JsonColumnSchema.cs` | Delete after migration |
| `CACHED_DB_FILES/*.json.zstd` | Delete after verification |