Design for converting CACHED_DB_FILES from zstd-compressed JSON to zstd-compressed MessagePack for faster deserialization and smaller file sizes.
MessagePack Cache Conversion Design
Purpose
Convert the development cache files in CACHED_DB_FILES/ from zstd-compressed JSON (.json.zstd) to zstd-compressed MessagePack (.msgpack.zstd) for faster deserialization and smaller file sizes.
Goals
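- Faster deserialization - MessagePack is a binary format with typed, length-prefixed values, so it typically parses faster than JSON text
- Smaller file sizes - MessagePack encodes numbers and structure more compactly than JSON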
Current State
- 22 cache files in CACHED_DB_FILES/ totaling ~3.6 GB (zstd-compressed JSON)
- JsonZstdFileSource reads files using ZstdSharp + Utf8JsonReader
- Each *DevEtl.cs defines a schema and creates a pipeline from JSON files
- Tests verify ETL loads data from cache files into SQL Server
Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Data structure | Map format (field names as keys) | Self-describing, maintainable, keys compress well with zstd |
| Compression | Keep zstd | Largest file is 878 MB; raw MessagePack would be 2-4x larger |
| Converter location | Standalone console app in Tools/CacheConverter/ | Isolated utility, not part of main solution |
File Format
New extension: .msgpack.zstd
File naming:
- branch.json.zstd → branch.msgpack.zstd
- workordertime_curr.json.zstd → workordertime_curr.msgpack.zstd
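The renaming rule above can be sketched as a small helper. The class and method names are hypothetical and not part of the design. Note that Path.ChangeExtension only replaces the final extension (.zstd), so the compound suffix is swapped explicitly here.

```csharp
using System;

static class CacheFileNames
{
    const string OldSuffix = ".json.zstd";
    const string NewSuffix = ".msgpack.zstd";

    // Hypothetical helper: replaces the compound suffix and leaves the base name untouched.
    public static string ToMessagePackName(string jsonZstdName)
    {
        if (!jsonZstdName.EndsWith(OldSuffix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException($"Expected a {OldSuffix} file name.", nameof(jsonZstdName));

        // Drop the old suffix, then append the new one.
        return jsonZstdName[..^OldSuffix.Length] + NewSuffix;
    }
}
```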
Data structure: Array of maps (same logical structure as JSON)
[
{ "Code": "ABC", "Description": "Branch ABC", "LastUpdateDT": <DateTime> },
{ "Code": "DEF", "Description": "Branch DEF", "LastUpdateDT": <DateTime> },
...
]
Library: MessagePack-CSharp (MessagePack NuGet package)
Components
1. Converter Tool
Location: /JdeScopingTool/Tools/CacheConverter/
Tools/
└── CacheConverter/
├── CacheConverter.csproj
└── Program.cs
Dependencies:
- ZstdSharp.Port - read zstd JSON, write zstd MessagePack
- MessagePack - MessagePack serialization
Behavior:
- Read each .json.zstd file from CACHED_DB_FILES/
- Stream JSON → deserialize to Dictionary<string, object?>[]
- Serialize to MessagePack (map format) → compress with zstd
- Write to .msgpack.zstd alongside originals
- Print before/after sizes for comparison
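A minimal sketch of the behavior listed above, not the actual Program.cs. The names CacheConverterSketch, ConvertFile, ConvertDirectory and ToClrValue are hypothetical. It assumes each cache file holds a single top-level JSON array of flat objects, and it buffers a whole file in memory, which may not be acceptable for the largest files. JSON strings (including dates) are left as strings and non-integer numbers become doubles; mapping them to DateTime or decimal would need the schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using MessagePack;
using ZstdSharp;

static class CacheConverterSketch
{
    // Converts one file and returns (old size, new size) in bytes.
    public static (long OldBytes, long NewBytes) ConvertFile(string jsonZstdPath)
    {
        string outputPath = jsonZstdPath[..^".json.zstd".Length] + ".msgpack.zstd";

        Dictionary<string, object?>[] rows;
        using (var input = File.OpenRead(jsonZstdPath))
        using (var decompressed = new DecompressionStream(input))
        using (var doc = JsonDocument.Parse(decompressed))
        {
            // Assumption: the root element is an array of flat objects.
            rows = doc.RootElement.EnumerateArray()
                .Select(row => row.EnumerateObject()
                    .ToDictionary(p => p.Name, p => ToClrValue(p.Value)))
                .ToArray();
        }

        using (var output = File.Create(outputPath))
        using (var compressed = new CompressionStream(output))
        {
            // A dictionary per row produces the map format (field names as keys).
            MessagePackSerializer.Serialize(compressed, rows);
        }

        return (new FileInfo(jsonZstdPath).Length, new FileInfo(outputPath).Length);
    }

    // Converts every .json.zstd file in a directory and prints before/after sizes.
    public static void ConvertDirectory(string directory)
    {
        foreach (string path in Directory.EnumerateFiles(directory, "*.json.zstd"))
        {
            var (oldBytes, newBytes) = ConvertFile(path);
            Console.WriteLine($"{Path.GetFileName(path)}: {oldBytes:N0} -> {newBytes:N0} bytes");
        }
    }

    // Maps a JSON value to a plain CLR value that MessagePack can write.
    static object? ToClrValue(JsonElement value)
    {
        switch (value.ValueKind)
        {
            case JsonValueKind.Null: return null;
            case JsonValueKind.True: return true;
            case JsonValueKind.False: return false;
            case JsonValueKind.String: return value.GetString();
            case JsonValueKind.Number:
                if (value.TryGetInt64(out long integer)) return integer;
                return value.GetDouble();
            default: return value.GetRawText(); // nested values kept as raw JSON text
        }
    }
}
```

With a helper like this, Program.cs would reduce to calling ConvertDirectory(args[0]).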
Usage:
cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES
2. MessagePackZstdFileSource
New file: NEW/src/JdeScoping.DataSync.Dev/Sources/MessagePackZstdFileSource.cs
- Implements IImportSource (same interface as JsonZstdFileSource)
- Reads .msgpack.zstd files using streaming decompression
- Uses MessagePackStreamReader for efficient streaming deserialization
- Returns an IDataReader that yields rows one at a time
- Schema still needed for IDataReader field metadata (column names, types, ordinals)
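The streaming read described above might look roughly like the sketch below. It only covers the read loop: the IImportSource and IDataReader wiring is project-specific and not shown, and the class and method names are hypothetical. It assumes the file holds one top-level MessagePack array of maps, and that the installed MessagePack-CSharp version provides MessagePackStreamReader.ReadArrayAsync (present in 2.x releases as far as I know, but worth verifying).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using MessagePack;
using ZstdSharp;

static class MessagePackZstdReaderSketch
{
    // Yields one row at a time without materializing the whole array.
    public static async IAsyncEnumerable<Dictionary<string, object?>> ReadRowsAsync(
        string path,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        using var file = File.OpenRead(path);
        using var decompressed = new DecompressionStream(file);
        using var reader = new MessagePackStreamReader(decompressed);

        // ReadArrayAsync reads the array header, then hands back each element
        // as a complete, self-contained byte sequence.
        await foreach (var rowBytes in reader.ReadArrayAsync(ct))
        {
            yield return MessagePackSerializer.Deserialize<Dictionary<string, object?>>(
                rowBytes, cancellationToken: ct);
        }
    }
}
```

An IDataReader implementation would still need the schema to map each dictionary key to a column ordinal and type, as noted in the list above.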
Package addition: Add MessagePack to JdeScoping.DataSync.Dev.csproj
3. DevEtl Class Updates
Changes to each *DevEtl.cs file (22 files):
- Update CacheFileName constant:
    // Before
    public static readonly string CacheFileName = "branch.json.zstd";
    // After
    public static readonly string CacheFileName = "branch.msgpack.zstd";
- Update Create() method:
    // Before
    .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
    // After
    .WithSource(new MessagePackZstdFileSource(cacheFilePath, Schema))
No changes to:
- Schema definitions (same column names and types)
- Pipeline structure
- DevEtlRegistry.cs
4. Cleanup (After Verification)
Remove obsolete JSON readers:
- JsonZstdFileSource.cs
- JsonStreamingDataReader.cs
- Utf8JsonStreamingDataReader.cs
Remove old cache files:
- All *.json.zstd files in CACHED_DB_FILES/
Test Strategy
- Run converter tool, verify all 22 files convert without errors
- Compare file sizes (expect 10-30% reduction)
- Run existing JdeScoping.DataSync.Dev.Tests - all tests should pass unchanged
- Verify data loaded matches previous JSON-based loads
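The size comparison could be checked with a helper along these lines. The names are hypothetical, and the 10-30% bounds are simply the expectation stated above, exposed as default parameters.

```csharp
using System;

static class SizeReport
{
    // Percentage by which the new file is smaller; negative if it grew.
    public static double ReductionPercent(long oldBytes, long newBytes)
    {
        if (oldBytes <= 0) throw new ArgumentOutOfRangeException(nameof(oldBytes));
        return (oldBytes - newBytes) * 100.0 / oldBytes;
    }

    // True when the reduction falls inside the expected range (inclusive).
    public static bool IsWithinExpectedRange(
        long oldBytes, long newBytes, double minPercent = 10, double maxPercent = 30)
    {
        double reduction = ReductionPercent(oldBytes, newBytes);
        return reduction >= minPercent && reduction <= maxPercent;
    }
}
```

A reduction outside the range is not necessarily a failure, but it is a cue to inspect that file before deleting the original.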
Files Changed
| File | Change |
|---|---|
| Tools/CacheConverter/ (new) | Standalone converter tool |
| Sources/MessagePackZstdFileSource.cs (new) | New MessagePack reader |
| JdeScoping.DataSync.Dev.csproj | Add MessagePack package |
| *DevEtl.cs (22 files) | Update file extension and source class |
| Sources/JsonZstdFileSource.cs | Delete after migration |
| Sources/JsonStreamingDataReader.cs | Delete after migration |
| Sources/Utf8JsonStreamingDataReader.cs | Delete after migration |
| CACHED_DB_FILES/*.json.zstd | Delete after verification |