# MessagePack Cache Conversion Design

## Purpose

Convert the development cache files in `CACHED_DB_FILES/` from zstd-compressed JSON (`.json.zstd`) to zstd-compressed MessagePack (`.msgpack.zstd`) for faster deserialization and smaller file sizes.

## Goals

1. **Faster deserialization** - MessagePack is faster to parse than JSON
2. **Smaller file sizes** - MessagePack is more compact than JSON

## Current State

- 22 cache files in `CACHED_DB_FILES/` totaling ~3.6 GB (zstd-compressed JSON)
- `JsonZstdFileSource` reads files using ZstdSharp + Utf8JsonReader
- Each `*DevEtl.cs` defines a schema and creates a pipeline from JSON files
- Tests verify ETL loads data from cache files into SQL Server

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Data structure | Map format (field names as keys) | Self-describing, maintainable, keys compress well with zstd |
| Compression | Keep zstd | Largest file is 878 MB; raw MessagePack would be 2-4x larger |
| Converter location | Standalone console app in `Tools/CacheConverter/` | Isolated utility, not part of main solution |

## File Format

**New extension:** `.msgpack.zstd`

**File naming:**

- `branch.json.zstd` → `branch.msgpack.zstd`
- `workordertime_curr.json.zstd` → `workordertime_curr.msgpack.zstd`

**Data structure:** Array of maps (same logical structure as JSON)

```
[
  { "Code": "ABC", "Description": "Branch ABC", "LastUpdateDT": },
  { "Code": "DEF", "Description": "Branch DEF", "LastUpdateDT": },
  ...
]
```

**Library:** MessagePack-CSharp (`MessagePack` NuGet package)

## Components

### 1. Converter Tool

**Location:** `/JdeScopingTool/Tools/CacheConverter/`

```
Tools/
└── CacheConverter/
    ├── CacheConverter.csproj
    └── Program.cs
```

**Dependencies:**

- `ZstdSharp.Port` - read zstd JSON, write zstd MessagePack
- `MessagePack` - MessagePack serialization

**Behavior:**

1. Read each `.json.zstd` file from `CACHED_DB_FILES/`
2. Stream JSON → deserialize to `Dictionary[]`
3. Serialize to MessagePack (map format) → compress with zstd
4. Write to `.msgpack.zstd` alongside originals
5. Print before/after sizes for comparison

**Usage:**

```bash
cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES
```

### 2. MessagePackZstdFileSource

**New file:** `NEW/src/JdeScoping.DataSync.Dev/Sources/MessagePackZstdFileSource.cs`

- Implements `IImportSource` (same interface as `JsonZstdFileSource`)
- Reads `.msgpack.zstd` files using streaming decompression
- Uses `MessagePackStreamReader` for efficient streaming deserialization
- Returns an `IDataReader` that yields rows one at a time
- Schema still needed for `IDataReader` field metadata (column names, types, ordinals)

**Package addition:** Add `MessagePack` to `JdeScoping.DataSync.Dev.csproj`

### 3. DevEtl Class Updates

**Changes to each `*DevEtl.cs` file (22 files):**

1. Update `CacheFileName` constant:

   ```csharp
   // Before
   public static readonly string CacheFileName = "branch.json.zstd";

   // After
   public static readonly string CacheFileName = "branch.msgpack.zstd";
   ```

2. Update `Create()` method:

   ```csharp
   // Before
   .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))

   // After
   .WithSource(new MessagePackZstdFileSource(cacheFilePath, Schema))
   ```

**No changes to:**

- Schema definitions (same column names and types)
- Pipeline structure
- `DevEtlRegistry.cs`

### 4. Cleanup (After Verification)

Remove obsolete JSON readers:

- `JsonZstdFileSource.cs`
- `JsonStreamingDataReader.cs`
- `Utf8JsonStreamingDataReader.cs`

Remove old cache files:

- All `*.json.zstd` files in `CACHED_DB_FILES/`

## Test Strategy

1. Run converter tool, verify all 22 files convert without errors
2. Compare file sizes (expect 10-30% reduction)
3. Run existing `JdeScoping.DataSync.Dev.Tests` - all tests should pass unchanged
4. Verify data loaded matches previous JSON-based loads

## Files Changed

| File | Change |
|------|--------|
| `Tools/CacheConverter/` (new) | Standalone converter tool |
| `Sources/MessagePackZstdFileSource.cs` (new) | New MessagePack reader |
| `JdeScoping.DataSync.Dev.csproj` | Add MessagePack package |
| `*DevEtl.cs` (22 files) | Update file extension and source class |
| `Sources/JsonZstdFileSource.cs` | Delete after migration |
| `Sources/JsonStreamingDataReader.cs` | Delete after migration |
| `Sources/Utf8JsonStreamingDataReader.cs` | Delete after migration |
| `CACHED_DB_FILES/*.json.zstd` | Delete after verification |
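
**Verification sketch (optional):**

Steps 1 and 2 of the Test Strategy, and the check that should precede deleting `CACHED_DB_FILES/*.json.zstd`, could be scripted. The following is a minimal sketch, not part of the converter tool. It assumes the converted files sit alongside the originals as described above, and the function name `compare_cache_sizes` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the design): for every *.json.zstd file,
# report the size change against its *.msgpack.zstd counterpart and flag
# any file that has not been converted yet.

compare_cache_sizes() {
  local dir="$1" missing=0 json msgpack name old new change
  for json in "$dir"/*.json.zstd; do
    [ -e "$json" ] || continue                 # glob matched nothing
    name=$(basename "$json" .json.zstd)
    msgpack="$dir/$name.msgpack.zstd"
    if [ ! -f "$msgpack" ]; then
      echo "MISSING $name.msgpack.zstd"
      missing=$((missing + 1))
      continue
    fi
    # wc -c may pad its output with spaces on some platforms; strip them.
    old=$(wc -c < "$json" | tr -d '[:space:]')
    new=$(wc -c < "$msgpack" | tr -d '[:space:]')
    if [ "$old" -eq 0 ]; then
      echo "$name: $old -> $new bytes (n/a)"
      continue
    fi
    # Integer percent change relative to the original; negative means smaller.
    change=$(( (new - old) * 100 / old ))
    printf '%s: %d -> %d bytes (%+d%%)\n' "$name" "$old" "$new" "$change"
  done
  # Succeed only if every original has a converted counterpart.
  [ "$missing" -eq 0 ]
}
```

Calling `compare_cache_sizes CACHED_DB_FILES` from the repository root prints one line per cache file and returns a non-zero status if any `.msgpack.zstd` file is missing, so it can gate the cleanup step. It compares sizes only; matching row contents still has to be confirmed by the existing tests.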