diff --git a/PLANS/2026-01-06-messagepack-cache-conversion-design.md b/PLANS/2026-01-06-messagepack-cache-conversion-design.md
new file mode 100644
index 0000000..e51f04d
--- /dev/null
+++ b/PLANS/2026-01-06-messagepack-cache-conversion-design.md
@@ -0,0 +1,142 @@
+# MessagePack Cache Conversion Design
+
+## Purpose
+
+Convert the development cache files in `CACHED_DB_FILES/` from zstd-compressed JSON (`.json.zstd`) to zstd-compressed MessagePack (`.msgpack.zstd`) for faster deserialization and smaller file sizes.
+
+## Goals
+
+1. **Faster deserialization** - MessagePack's binary encoding is faster to parse than JSON text
+2. **Smaller file sizes** - MessagePack's binary encoding is more compact than JSON text
+
+## Current State
+
+- 22 cache files in `CACHED_DB_FILES/` totaling ~3.6 GB (zstd-compressed JSON)
+- `JsonZstdFileSource` reads files using ZstdSharp + Utf8JsonReader
+- Each `*DevEtl.cs` defines a schema and creates a pipeline from JSON files
+- Tests verify that the ETL loads data from cache files into SQL Server
+
+## Design Decisions
+
+| Decision | Choice | Rationale |
+|----------|--------|-----------|
+| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
+| Data structure | Map format (field names as keys) | Self-describing, maintainable, keys compress well with zstd |
+| Compression | Keep zstd | Largest file is 878 MB; uncompressed MessagePack would be 2-4x larger |
+| Converter location | Standalone console app in `Tools/CacheConverter/` | Isolated utility, not part of main solution |
+
+## File Format
+
+**New extension:** `.msgpack.zstd`
+
+**File naming:**
+- `branch.json.zstd` → `branch.msgpack.zstd`
+- `workordertime_curr.json.zstd` → `workordertime_curr.msgpack.zstd`
+
+**Data structure:** Array of maps (same logical structure as JSON)
+```
+[
+  { "Code": "ABC", "Description": "Branch ABC", "LastUpdateDT": },
+  { "Code": "DEF", "Description": "Branch DEF", "LastUpdateDT": },
+  ...
+]
+```
+
+**Library:** MessagePack-CSharp (`MessagePack` NuGet package)
+
+## Components
+
+### 1. Converter Tool
+
+**Location:** `/JdeScopingTool/Tools/CacheConverter/`
+
+```
+Tools/
+└── CacheConverter/
+    ├── CacheConverter.csproj
+    └── Program.cs
+```
+
+**Dependencies:**
+- `ZstdSharp.Port` - read zstd JSON, write zstd MessagePack
+- `MessagePack` - MessagePack serialization
+
+**Behavior:**
+1. Read each `.json.zstd` file from `CACHED_DB_FILES/`
+2. Stream JSON → deserialize to `Dictionary[]`
+3. Serialize to MessagePack (map format) → compress with zstd
+4. Write to `.msgpack.zstd` alongside originals
+5. Print before/after sizes for comparison
+
+**Usage:**
+```bash
+cd Tools/CacheConverter
+dotnet run -- ../../CACHED_DB_FILES
+```
+
+### 2. MessagePackZstdFileSource
+
+**New file:** `NEW/src/JdeScoping.DataSync.Dev/Sources/MessagePackZstdFileSource.cs`
+
+- Implements `IImportSource` (same interface as `JsonZstdFileSource`)
+- Reads `.msgpack.zstd` files using streaming decompression
+- Uses `MessagePackStreamReader` for efficient streaming deserialization
+- Returns an `IDataReader` that yields rows one at a time
+- Schema still needed for `IDataReader` field metadata (column names, types, ordinals)
+
+**Package addition:** Add `MessagePack` to `JdeScoping.DataSync.Dev.csproj`
+
+### 3. DevEtl Class Updates
+
+**Changes to each `*DevEtl.cs` file (22 files):**
+
+1. Update `CacheFileName` constant:
+   ```csharp
+   // Before
+   public static readonly string CacheFileName = "branch.json.zstd";
+   // After
+   public static readonly string CacheFileName = "branch.msgpack.zstd";
+   ```
+
+2. Update `Create()` method:
+   ```csharp
+   // Before
+   .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
+   // After
+   .WithSource(new MessagePackZstdFileSource(cacheFilePath, Schema))
+   ```
+
+**No changes to:**
+- Schema definitions (same column names and types)
+- Pipeline structure
+- `DevEtlRegistry.cs`
+
+### 4. Cleanup (After Verification)
+
+Remove obsolete JSON readers:
+- `JsonZstdFileSource.cs`
+- `JsonStreamingDataReader.cs`
+- `Utf8JsonStreamingDataReader.cs`
+
+Remove old cache files:
+- All `*.json.zstd` files in `CACHED_DB_FILES/`
+
+## Test Strategy
+
+1. Run converter tool, verify all 22 files convert without errors
+2. Compare file sizes (expect 10-30% reduction)
+3. Run existing `JdeScoping.DataSync.Dev.Tests` - all tests should pass unchanged
+4. Verify that the loaded data matches the previous JSON-based loads
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `Tools/CacheConverter/` (new) | Standalone converter tool |
+| `Sources/MessagePackZstdFileSource.cs` (new) | New MessagePack reader |
+| `JdeScoping.DataSync.Dev.csproj` | Add MessagePack package |
+| `*DevEtl.cs` (22 files) | Update file extension and source class |
+| `Sources/JsonZstdFileSource.cs` | Delete after migration |
+| `Sources/JsonStreamingDataReader.cs` | Delete after migration |
+| `Sources/Utf8JsonStreamingDataReader.cs` | Delete after migration |
+| `CACHED_DB_FILES/*.json.zstd` | Delete after verification |
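
Steps 1 and 2 of the test strategy in this plan (every file converted, sizes compared) could be checked with a short script before any `.json.zstd` file is deleted. The sketch below is an assumption, not part of the planned tooling: `compare_cache_sizes` is a hypothetical helper that pairs each `.json.zstd` file with its `.msgpack.zstd` sibling, flags missing conversions, and prints both byte sizes with the integer percentage reduction.

```bash
#!/bin/sh
# Hypothetical helper (not part of the plan): for every <name>.json.zstd in
# the given directory, report the size of the converted <name>.msgpack.zstd
# and the percentage reduction. Returns non-zero if any conversion is missing.
compare_cache_sizes() {
    dir=$1
    status=0
    for json in "$dir"/*.json.zstd; do
        [ -e "$json" ] || continue              # glob matched nothing
        name=$(basename "$json" .json.zstd)
        msgpack="$dir/$name.msgpack.zstd"
        if [ ! -f "$msgpack" ]; then
            echo "MISSING $name.msgpack.zstd"
            status=1
            continue
        fi
        old=$(wc -c < "$json" | tr -d ' ')      # size in bytes
        new=$(wc -c < "$msgpack" | tr -d ' ')
        if [ "$old" -gt 0 ]; then
            pct=$(( (old - new) * 100 / old ))  # integer percent saved
        else
            pct=0
        fi
        echo "$name $old $new ${pct}%"
    done
    return $status
}
```

Calling `compare_cache_sizes ../../CACHED_DB_FILES` from `Tools/CacheConverter` would print one line per cache file. A negative percentage means the MessagePack file came out larger than the JSON original.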