# Protobuf Cache Conversion Design

## Purpose

Convert the development cache files in `CACHED_DB_FILES/` from zstd-compressed JSON (`.json.zstd`) to zstd-compressed Protocol Buffers (`.pb.zstd`) using protobuf-net-data for faster deserialization and smaller file sizes.

## Goals

1. **Faster deserialization** - Protobuf is faster to parse than JSON
2. **Smaller file sizes** - Protobuf is more compact than JSON
3. **Simpler code** - protobuf-net-data returns `IDataReader` directly, no custom reader needed

## Current State

- 22 cache files in `CACHED_DB_FILES/` totaling ~3.6 GB (zstd-compressed JSON)
- `JsonZstdFileSource` reads files using ZstdSharp + Utf8JsonReader
- Custom `Utf8JsonStreamingDataReader` implements `IDataReader` for streaming
- Each `*DevEtl.cs` defines a schema and creates a pipeline from JSON files

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Serialization library | protobuf-net-data | Purpose-built for IDataReader, returns IDataReader directly |
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Compression | zstd on whole file | Consistent with current approach, excellent compression |
| Converter location | Standalone console app in `Tools/CacheConverter/` | Isolated utility, not part of main solution |

## File Format

**New extension:** `.pb.zstd`

**File naming:**
- `branch.json.zstd` → `branch.pb.zstd`
- `workordertime_curr.json.zstd` → `workordertime_curr.pb.zstd`

**Data structure:** protobuf-net-data binary format
- Schema embedded in stream (column names, types, nullability)
- Rows serialized sequentially
- Native ADO.NET type support (DateTime, Guid, decimal, etc.)

**Libraries:**
- `protobuf-net-data` - IDataReader serialization/deserialization
- `ZstdSharp.Port` - compression

## Components

### 1. Converter Tool

**Location:** `/JdeScopingTool/Tools/CacheConverter/`

```
Tools/
└── CacheConverter/
    ├── CacheConverter.csproj
    └── Program.cs
```

**Dependencies:**
- `ZstdSharp.Port` - read zstd JSON, write zstd protobuf
- `protobuf-net-data` - protobuf serialization

**Behavior:**
1. Read each `.json.zstd` file from `CACHED_DB_FILES/`
2. Decompress and parse JSON into an `IDataReader`
3. Use `DataSerializer.Serialize(stream, reader)` to write protobuf
4. Compress with zstd and write to `.pb.zstd`
5. Print before/after sizes for comparison

**Usage:**
```bash
cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES
```

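
As a rough illustration of behavior steps 3 to 5 above, the sketch below streams an `IDataReader` through protobuf-net-data into a zstd `CompressionStream` and prints the size change. It is a sketch under assumptions, not the tool's actual code: the class and method names, the `openJsonReader` delegate (a stand-in for whatever JSON reader the converter uses in step 2), and the compression level are all hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data;   // NuGet: protobuf-net-data
using ZstdSharp;       // NuGet: ZstdSharp.Port

// Hypothetical helper; names are illustrative only.
public static class CacheConverterSketch
{
    private const string JsonSuffix = ".json.zstd";
    private const string ProtobufSuffix = ".pb.zstd";

    // Maps e.g. branch.json.zstd to branch.pb.zstd.
    public static string ToProtobufPath(string jsonPath)
    {
        if (!jsonPath.EndsWith(JsonSuffix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException($"Expected a {JsonSuffix} file.", nameof(jsonPath));
        return jsonPath.Substring(0, jsonPath.Length - JsonSuffix.Length) + ProtobufSuffix;
    }

    // Steps 3-4: serialize rows to protobuf, compressing them as they are written.
    // Disposing the compressor at the end of the method flushes the final zstd frame.
    public static void WriteProtobufZstd(IDataReader rows, Stream output, int level)
    {
        using var compressor = new CompressionStream(output, level);
        DataSerializer.Serialize(compressor, rows);
    }

    // Converts one file and prints before/after sizes (step 5).
    // openJsonReader is an assumed hook for the existing JSON reading code.
    public static void ConvertFile(string jsonPath, Func<string, IDataReader> openJsonReader, int level = 3)
    {
        string pbPath = ToProtobufPath(jsonPath);
        using (IDataReader rows = openJsonReader(jsonPath))
        using (FileStream output = File.Create(pbPath))
        {
            WriteProtobufZstd(rows, output, level);
        }

        long before = new FileInfo(jsonPath).Length;
        long after = new FileInfo(pbPath).Length;
        Console.WriteLine($"{Path.GetFileName(jsonPath)}: {before:N0} -> {after:N0} bytes");
    }
}
```

Compressing while serializing avoids holding an uncompressed protobuf copy of a large table in memory, which seems worthwhile given the ~3.6 GB of cache files.
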
### 2. ProtobufZstdFileSource

**New file:** `NEW/src/JdeScoping.DataSync.Dev/Sources/ProtobufZstdFileSource.cs`

**Key simplification:** No custom `IDataReader` implementation needed!

```csharp
public sealed class ProtobufZstdFileSource : IImportSource
{
    // Nothing is awaited here, so the method returns a completed task
    // instead of being marked async.
    public Task<IDataReader> ReadDataAsync(CancellationToken ct = default)
    {
        // The streams are kept in fields because they must stay open
        // while the caller consumes the returned reader.
        _fileStream = new FileStream(_filePath, FileMode.Open, ...);
        _decompressionStream = new DecompressionStream(_fileStream);

        // protobuf-net-data returns IDataReader directly!
        return Task.FromResult<IDataReader>(DataSerializer.Deserialize(_decompressionStream));
    }
}
```

**Package additions to `JdeScoping.DataSync.Dev.csproj`:**
- `protobuf-net-data`

### 3. DevEtl Class Updates

**Changes to each `*DevEtl.cs` file (22 files):**

1. Update `CacheFileName` constant:
   ```csharp
   // Before
   public static readonly string CacheFileName = "branch.json.zstd";
   // After
   public static readonly string CacheFileName = "branch.pb.zstd";
   ```

2. Update `Create()` method:
   ```csharp
   // Before
   .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
   // After
   .WithSource(new ProtobufZstdFileSource(cacheFilePath))
   ```

3. **Remove schema definitions** - protobuf-net-data embeds schema in the file, so `JsonColumnSchema[]` arrays are no longer needed in DevEtl classes.

**No changes to:**
- Pipeline structure
- `DevEtlRegistry.cs`

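
To illustrate why the per-table schema arrays become redundant, the sketch below reads column names and CLR types straight from an `IDataReader`, which is the information a reader returned by `DataSerializer.Deserialize()` would carry with it. This is an assumption-laden example: the helper name is hypothetical, and a `DataTable` reader with made-up `BranchId` and `Name` columns stands in for a deserialized cache file.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper; not part of the existing code base.
public static class ReaderSchemaSketch
{
    // Lists (name, CLR type) for every column the reader exposes.
    public static IReadOnlyList<(string Name, Type ClrType)> DescribeColumns(IDataReader reader)
    {
        var columns = new List<(string Name, Type ClrType)>(reader.FieldCount);
        for (int i = 0; i < reader.FieldCount; i++)
        {
            columns.Add((reader.GetName(i), reader.GetFieldType(i)));
        }
        return columns;
    }

    // Demo with an in-memory table standing in for a .pb.zstd file.
    public static void Demo()
    {
        var table = new DataTable();
        table.Columns.Add("BranchId", typeof(int));   // illustrative column
        table.Columns.Add("Name", typeof(string));    // illustrative column

        using IDataReader reader = table.CreateDataReader();
        foreach (var (name, clrType) in DescribeColumns(reader))
        {
            Console.WriteLine($"{name}: {clrType.Name}");
        }
        // Prints:
        // BranchId: Int32
        // Name: String
    }
}
```
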
### 4. Cleanup (After Verification)

**Remove obsolete files:**
- `Sources/JsonZstdFileSource.cs`
- `Sources/JsonStreamingDataReader.cs`
- `Sources/Utf8JsonStreamingDataReader.cs`
- `Models/JsonColumnSchema.cs`

**Remove old cache files:**
- All `*.json.zstd` files in `CACHED_DB_FILES/`

## Code Simplification Summary

| Before (JSON) | After (Protobuf) |
|---------------|------------------|
| `JsonZstdFileSource` | `ProtobufZstdFileSource` |
| `Utf8JsonStreamingDataReader` (custom) | `DataSerializer.Deserialize()` (library) |
| `JsonStreamingDataReader` (legacy) | Removed |
| `JsonColumnSchema[]` per table | Not needed (embedded in file) |

## Test Strategy

1. Run the converter tool and verify that all 22 files convert without errors
2. Compare file sizes (expect 10-30% reduction)
3. Run existing `JdeScoping.DataSync.Dev.Tests` - all tests should pass unchanged
4. Verify that the loaded data matches previous JSON-based loads

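
One possible way to approach step 4 is to walk the JSON-based reader and the protobuf-based reader side by side and stop at the first mismatch. The helper below is a minimal sketch with hypothetical names, written against `IDataReader` only so it works with either source. It assumes both readers return the same CLR type for each column; if the JSON reader yields, say, `long` where protobuf yields `int`, values would need normalizing first, and `byte[]` columns would need an element-wise comparison.

```csharp
using System;
using System.Data;

// Hypothetical verification helper; not part of the existing test project.
public static class ReaderComparisonSketch
{
    // Returns true and describes the first difference if the readers disagree;
    // returns false (with an empty description) when columns and rows all match.
    public static bool TryFindDifference(IDataReader expected, IDataReader actual, out string difference)
    {
        difference = string.Empty;

        if (expected.FieldCount != actual.FieldCount)
        {
            difference = $"Column count differs: {expected.FieldCount} vs {actual.FieldCount}";
            return true;
        }

        for (int i = 0; i < expected.FieldCount; i++)
        {
            if (!string.Equals(expected.GetName(i), actual.GetName(i), StringComparison.Ordinal))
            {
                difference = $"Column {i} name differs: '{expected.GetName(i)}' vs '{actual.GetName(i)}'";
                return true;
            }
        }

        long row = 0;
        while (true)
        {
            bool hasExpected = expected.Read();
            bool hasActual = actual.Read();

            if (hasExpected != hasActual)
            {
                difference = $"Row count differs after {row} rows";
                return true;
            }
            if (!hasExpected)
            {
                return false; // both readers are exhausted with no mismatch
            }

            for (int i = 0; i < expected.FieldCount; i++)
            {
                // DBNull.Value is a singleton, so NULLs compare equal here.
                if (!object.Equals(expected.GetValue(i), actual.GetValue(i)))
                {
                    difference = $"Row {row}, column '{expected.GetName(i)}' differs";
                    return true;
                }
            }
            row++;
        }
    }
}
```

Because the comparison is streaming, it never materializes either table, so it should stay usable on the largest cache files.
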
## Files Changed

| File | Change |
|------|--------|
| `Tools/CacheConverter/` (new) | Standalone converter tool |
| `Sources/ProtobufZstdFileSource.cs` (new) | New protobuf reader (much simpler) |
| `JdeScoping.DataSync.Dev.csproj` | Add protobuf-net-data package |
| `*DevEtl.cs` (22 files) | Update file extension, source class, remove schema |
| `Sources/JsonZstdFileSource.cs` | Delete after migration |
| `Sources/JsonStreamingDataReader.cs` | Delete after migration |
| `Sources/Utf8JsonStreamingDataReader.cs` | Delete after migration |
| `Models/JsonColumnSchema.cs` | Delete after migration |
| `CACHED_DB_FILES/*.json.zstd` | Delete after verification |