docs: add MessagePack cache conversion design plan
Design for converting CACHED_DB_FILES from zstd-compressed JSON to zstd-compressed MessagePack for faster deserialization and smaller file sizes.
@@ -0,0 +1,142 @@
# MessagePack Cache Conversion Design

## Purpose

Convert the development cache files in `CACHED_DB_FILES/` from zstd-compressed JSON (`.json.zstd`) to zstd-compressed MessagePack (`.msgpack.zstd`) for faster deserialization and smaller file sizes.

## Goals

1. **Faster deserialization** - MessagePack is faster to parse than JSON
2. **Smaller file sizes** - MessagePack is more compact than JSON

## Current State

- 22 cache files in `CACHED_DB_FILES/` totaling ~3.6 GB (zstd-compressed JSON)
- `JsonZstdFileSource` reads files using ZstdSharp + Utf8JsonReader
- Each `*DevEtl.cs` defines a schema and creates a pipeline from JSON files
- Tests verify ETL loads data from cache files into SQL Server

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Data structure | Map format (field names as keys) | Self-describing, maintainable, keys compress well with zstd |
| Compression | Keep zstd | Largest file is 878 MB; raw MessagePack would be 2-4x larger |
| Converter location | Standalone console app in `Tools/CacheConverter/` | Isolated utility, not part of main solution |

## File Format

**New extension:** `.msgpack.zstd`

**File naming:**
- `branch.json.zstd` → `branch.msgpack.zstd`
- `workordertime_curr.json.zstd` → `workordertime_curr.msgpack.zstd`

**Data structure:** Array of maps (same logical structure as JSON)
```
[
  { "Code": "ABC", "Description": "Branch ABC", "LastUpdateDT": <DateTime> },
  { "Code": "DEF", "Description": "Branch DEF", "LastUpdateDT": <DateTime> },
  ...
]
```

**Library:** MessagePack-CSharp (`MessagePack` NuGet package)

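
To make the array-of-maps layout concrete, here is a minimal sketch of a round trip through MessagePack and zstd. It assumes the default MessagePack-CSharp resolver and the one-shot `Compressor`/`Decompressor` API from `ZstdSharp.Port`. The class and method names (`MsgPackZstdRoundTrip`, `Pack`, `Unpack`) are hypothetical and not part of the plan.

```csharp
using System;
using System.Collections.Generic;
using MessagePack;
using ZstdSharp;

// Hypothetical helper, for illustration only.
public static class MsgPackZstdRoundTrip
{
    // Serialize rows as a MessagePack array of maps, then zstd-compress the bytes.
    public static byte[] Pack(Dictionary<string, object?>[] rows)
    {
        byte[] raw = MessagePackSerializer.Serialize(rows);
        using var compressor = new Compressor();
        return compressor.Wrap(raw).ToArray();
    }

    // Decompress, then deserialize back into an array of maps.
    public static Dictionary<string, object?>[] Unpack(byte[] compressed)
    {
        using var decompressor = new Decompressor();
        byte[] raw = decompressor.Unwrap(compressed).ToArray();
        return MessagePackSerializer.Deserialize<Dictionary<string, object?>[]>(raw);
    }
}
```

When values are typed as `object`, numeric values may come back as a narrower CLR type than the one written (for example, a small integer read as `byte`), and `DateTime` values come back as UTC. This is one reason the schema is still the safer source of column types on the read side.
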
## Components

### 1. Converter Tool

**Location:** `/JdeScopingTool/Tools/CacheConverter/`

```
Tools/
└── CacheConverter/
    ├── CacheConverter.csproj
    └── Program.cs
```

**Dependencies:**
- `ZstdSharp.Port` - read zstd JSON, write zstd MessagePack
- `MessagePack` - MessagePack serialization

**Behavior:**
1. Read each `.json.zstd` file from `CACHED_DB_FILES/`
2. Stream JSON → deserialize to `Dictionary<string, object?>[]`
3. Serialize to MessagePack (map format) → compress with zstd
4. Write to `.msgpack.zstd` alongside originals
5. Print before/after sizes for comparison

**Usage:**
```bash
cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES
```

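
The behavior steps above could be sketched for a single file roughly as follows. This is an illustration under stated assumptions, not the planned `Program.cs`. `CacheConverterSketch`, `ConvertFile`, `ToRow`, and `ToClr` are hypothetical names. Each file is assumed to hold a flat JSON array of objects. Numbers are mapped to `long` or `double`. Strings are left as strings, because the plan does not say how date columns become `<DateTime>` values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using MessagePack;
using ZstdSharp;

// Hypothetical sketch of the per-file conversion step.
public static class CacheConverterSketch
{
    public static (long Before, long After) ConvertFile(string jsonZstdPath)
    {
        const string oldExt = ".json.zstd";
        if (!jsonZstdPath.EndsWith(oldExt, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Expected a .json.zstd file", nameof(jsonZstdPath));

        // branch.json.zstd -> branch.msgpack.zstd, written alongside the original.
        string outPath = jsonZstdPath[..^oldExt.Length] + ".msgpack.zstd";

        Dictionary<string, object?>[] rows;
        using (var file = File.OpenRead(jsonZstdPath))
        using (var zstd = new DecompressionStream(file))
        {
            // Assumption: the whole file fits in memory as an array of rows.
            var parsed = JsonSerializer.Deserialize<Dictionary<string, JsonElement>[]>(zstd)
                         ?? Array.Empty<Dictionary<string, JsonElement>>();
            rows = Array.ConvertAll(parsed, ToRow);
        }

        using (var file = File.Create(outPath))
        using (var zstd = new CompressionStream(file))
        {
            MessagePackSerializer.Serialize(zstd, rows);
        }

        return (new FileInfo(jsonZstdPath).Length, new FileInfo(outPath).Length);
    }

    private static Dictionary<string, object?> ToRow(Dictionary<string, JsonElement> source)
    {
        var row = new Dictionary<string, object?>(source.Count);
        foreach (var pair in source) row[pair.Key] = ToClr(pair.Value);
        return row;
    }

    // Map JSON scalars to CLR values; nested objects and arrays are not handled here.
    private static object? ToClr(JsonElement e)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.Null: return null;
            case JsonValueKind.True: return true;
            case JsonValueKind.False: return false;
            case JsonValueKind.String: return e.GetString(); // dates stay strings (assumption)
            case JsonValueKind.Number:
                if (e.TryGetInt64(out long l)) return l;
                return e.GetDouble(); // assumption: double precision is acceptable
            default:
                throw new NotSupportedException($"Unsupported JSON value kind: {e.ValueKind}");
        }
    }
}
```

Materializing every row before writing keeps the sketch short, but it may be too memory-hungry for the largest cache files. A row-by-row variant would be worth considering if that turns out to be a problem.
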
### 2. MessagePackZstdFileSource

**New file:** `NEW/src/JdeScoping.DataSync.Dev/Sources/MessagePackZstdFileSource.cs`

- Implements `IImportSource` (same interface as `JsonZstdFileSource`)
- Reads `.msgpack.zstd` files using streaming decompression
- Uses `MessagePackStreamReader` for efficient streaming deserialization
- Returns an `IDataReader` that yields rows one at a time
- Schema still needed for `IDataReader` field metadata (column names, types, ordinals)

**Package addition:** Add `MessagePack` to `JdeScoping.DataSync.Dev.csproj`

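
As a rough illustration of the streaming read path described above, the sketch below yields one row at a time from a `.msgpack.zstd` file without loading the whole array. It assumes the `ReadArrayAsync` method on `MessagePackStreamReader`, which to my knowledge is available in MessagePack-CSharp 2.x, and a top-level MessagePack array in the file. `MsgPackRowStream` and `ReadRowsAsync` are hypothetical names. The `IImportSource`/`IDataReader` wiring is deliberately left out because those interfaces are not shown in this plan.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using MessagePack;
using ZstdSharp;

// Hypothetical sketch: streams rows out of a zstd-compressed MessagePack array.
public static class MsgPackRowStream
{
    public static async IAsyncEnumerable<Dictionary<string, object?>> ReadRowsAsync(
        string path,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await using var file = File.OpenRead(path);
        await using var zstd = new DecompressionStream(file);
        using var reader = new MessagePackStreamReader(zstd);

        // Each element is the raw bytes of one map; deserialize it before reading the next.
        await foreach (var element in reader.ReadArrayAsync(ct))
        {
            yield return MessagePackSerializer.Deserialize<Dictionary<string, object?>>(
                element, cancellationToken: ct);
        }
    }
}
```

Because `IDataReader.Read()` is synchronous, a real source would need to bridge this async sequence, or read synchronously instead. That choice is left open here.
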
### 3. DevEtl Class Updates

**Changes to each `*DevEtl.cs` file (22 files):**

1. Update `CacheFileName` constant:
   ```csharp
   // Before
   public static readonly string CacheFileName = "branch.json.zstd";
   // After
   public static readonly string CacheFileName = "branch.msgpack.zstd";
   ```

2. Update `Create()` method:
   ```csharp
   // Before
   .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
   // After
   .WithSource(new MessagePackZstdFileSource(cacheFilePath, Schema))
   ```

**No changes to:**
- Schema definitions (same column names and types)
- Pipeline structure
- `DevEtlRegistry.cs`

### 4. Cleanup (After Verification)

Remove obsolete JSON readers:
- `JsonZstdFileSource.cs`
- `JsonStreamingDataReader.cs`
- `Utf8JsonStreamingDataReader.cs`

Remove old cache files:
- All `*.json.zstd` files in `CACHED_DB_FILES/`

## Test Strategy

1. Run converter tool, verify all 22 files convert without errors
2. Compare file sizes (expect 10-30% reduction)
3. Run existing `JdeScoping.DataSync.Dev.Tests` - all tests should pass unchanged
4. Verify data loaded matches previous JSON-based loads

## Files Changed

| File | Change |
|------|--------|
| `Tools/CacheConverter/` (new) | Standalone converter tool |
| `Sources/MessagePackZstdFileSource.cs` (new) | New MessagePack reader |
| `JdeScoping.DataSync.Dev.csproj` | Add MessagePack package |
| `*DevEtl.cs` (22 files) | Update file extension and source class |
| `Sources/JsonZstdFileSource.cs` | Delete after migration |
| `Sources/JsonStreamingDataReader.cs` | Delete after migration |
| `Sources/Utf8JsonStreamingDataReader.cs` | Delete after migration |
| `CACHED_DB_FILES/*.json.zstd` | Delete after verification |