jdescopingtool/PLANS/2026-01-06-messagepack-cache-conversion-design.md
Joseph Doherty 01da261d6c docs: add MessagePack cache conversion design plan
Design for converting CACHED_DB_FILES from zstd-compressed JSON
to zstd-compressed MessagePack for faster deserialization and
smaller file sizes.
2026-01-06 14:03:47 -05:00


MessagePack Cache Conversion Design

Purpose

Convert the development cache files in CACHED_DB_FILES/ from zstd-compressed JSON (.json.zstd) to zstd-compressed MessagePack (.msgpack.zstd) for faster deserialization and smaller file sizes.

Goals

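  1. Faster deserialization - MessagePack is a binary format, so the reader skips the text tokenizing and string-to-number parsing that JSON requires
  2. Smaller file sizes - MessagePack stores numbers and structural markers in binary, which is more compact than JSON text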

Current State

  • 22 cache files in CACHED_DB_FILES/ totaling ~3.6 GB (zstd-compressed JSON)
  • JsonZstdFileSource reads files using ZstdSharp + Utf8JsonReader
  • Each *DevEtl.cs defines a schema and creates a pipeline from JSON files
  • Tests verify ETL loads data from cache files into SQL Server

Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Conversion approach | One-time manual | Cache files are static snapshots, not actively regenerated |
| Data structure | Map format (field names as keys) | Self-describing, maintainable, keys compress well with zstd |
| Compression | Keep zstd | Largest file is 878 MB; raw MessagePack would be 2-4x larger |
| Converter location | Standalone console app in Tools/CacheConverter/ | Isolated utility, not part of main solution |

File Format

New extension: .msgpack.zstd

File naming:

  • branch.json.zstd → branch.msgpack.zstd
  • workordertime_curr.json.zstd → workordertime_curr.msgpack.zstd

Data structure: Array of maps (same logical structure as JSON)

[
  { "Code": "ABC", "Description": "Branch ABC", "LastUpdateDT": <DateTime> },
  { "Code": "DEF", "Description": "Branch DEF", "LastUpdateDT": <DateTime> },
  ...
]

Library: MessagePack-CSharp (MessagePack NuGet package)
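
To make the array-of-maps layout concrete, here is a minimal sketch that writes two rows with the low-level `MessagePackWriter` from MessagePack-CSharp. The class name `MapFormatExample` and the sample timestamp are illustrative assumptions, not part of the design; the converter itself may well use the higher-level serializer instead.

```csharp
using System;
using System.Buffers;
using MessagePack;

// Hypothetical helper, for illustration only.
public static class MapFormatExample
{
    // Encodes two rows as a MessagePack array of maps; field names are the map keys.
    public static byte[] EncodeSampleRows()
    {
        var buffer = new ArrayBufferWriter<byte>();
        var writer = new MessagePackWriter(buffer);

        // Placeholder timestamp: the design only specifies <DateTime> here.
        var sampleTime = new DateTime(2026, 1, 6, 0, 0, 0, DateTimeKind.Utc);

        writer.WriteArrayHeader(2);
        WriteRow(ref writer, "ABC", "Branch ABC", sampleTime);
        WriteRow(ref writer, "DEF", "Branch DEF", sampleTime);
        writer.Flush();

        return buffer.WrittenSpan.ToArray();
    }

    // Writes one row as a three-entry map: key, value, key, value, ...
    private static void WriteRow(
        ref MessagePackWriter writer, string code, string description, DateTime lastUpdate)
    {
        writer.WriteMapHeader(3);
        writer.Write("Code");
        writer.Write(code);
        writer.Write("Description");
        writer.Write(description);
        writer.Write("LastUpdateDT");
        writer.Write(lastUpdate); // stored as a MessagePack timestamp extension
    }
}
```

Passing the resulting bytes to `MessagePackSerializer.ConvertToJson` gives a JSON rendering that is handy for eyeballing a converted file.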

Components

1. Converter Tool

Location: /JdeScopingTool/Tools/CacheConverter/

Tools/
└── CacheConverter/
    ├── CacheConverter.csproj
    └── Program.cs

Dependencies:

  • ZstdSharp.Port - read zstd JSON, write zstd MessagePack
  • MessagePack - MessagePack serialization

Behavior:

  1. Read each .json.zstd file from CACHED_DB_FILES/
  2. Stream JSON → deserialize to Dictionary<string, object?>[]
  3. Serialize to MessagePack (map format) → compress with zstd
  4. Write to .msgpack.zstd alongside originals
  5. Print before/after sizes for comparison
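
The steps above could be sketched roughly as follows. One detail worth noting: System.Text.Json materializes untyped values as `JsonElement`, so the sketch maps each value to a plain CLR type before handing rows to MessagePack. The class and method names are hypothetical, the sketch parses a whole file into memory (the largest files would likely need a streaming approach), and date values stay as strings because JSON carries no date type.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using MessagePack;
using ZstdSharp;

// Hypothetical sketch of the converter core, not the final Program.cs.
public static class CacheConverterSketch
{
    // Maps a JSON value to a plain CLR value that MessagePack can serialize.
    // Assumption: non-integer numbers fit in a double; nested values are kept as raw JSON text.
    public static object? ToClrValue(JsonElement value) => value.ValueKind switch
    {
        JsonValueKind.Null or JsonValueKind.Undefined => null,
        JsonValueKind.String => value.GetString(),
        JsonValueKind.True => true,
        JsonValueKind.False => false,
        // The cast keeps integers as long instead of widening them to double.
        JsonValueKind.Number => value.TryGetInt64(out var l) ? (object)l : value.GetDouble(),
        _ => value.GetRawText(),
    };

    // Converts one JSON object into a row keyed by field name.
    public static Dictionary<string, object?> ToRow(JsonElement obj) =>
        obj.EnumerateObject().ToDictionary(p => p.Name, p => ToClrValue(p.Value));

    // Reads a .json.zstd file and writes the same rows as .msgpack.zstd.
    public static void ConvertFile(string jsonZstdPath, string msgpackZstdPath)
    {
        Dictionary<string, object?>[] rows;
        using (var inFile = File.OpenRead(jsonZstdPath))
        using (var input = new DecompressionStream(inFile))
        using (var doc = JsonDocument.Parse(input))
        {
            rows = doc.RootElement.EnumerateArray().Select(ToRow).ToArray();
        }

        using var outFile = File.Create(msgpackZstdPath);
        // Disposing the CompressionStream finishes the zstd frame before the file closes.
        using var output = new CompressionStream(outFile, level: 3);
        MessagePackSerializer.Serialize(output, rows);
    }
}
```

The compression level of 3 is an arbitrary choice for the sketch; the design does not specify one.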

Usage:

cd Tools/CacheConverter
dotnet run -- ../../CACHED_DB_FILES

2. MessagePackZstdFileSource

New file: NEW/src/JdeScoping.DataSync.Dev/Sources/MessagePackZstdFileSource.cs

  • Implements IImportSource (same interface as JsonZstdFileSource)
  • Reads .msgpack.zstd files using streaming decompression
  • Uses MessagePackStreamReader for efficient streaming deserialization
  • Returns an IDataReader that yields rows one at a time
  • Schema still needed for IDataReader field metadata (column names, types, ordinals)
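
As a rough illustration of the streaming read, the sketch below uses `MessagePackStreamReader.ReadArrayAsync` to pull one array element at a time and deserializes each into a dictionary. It is an assumption-laden sketch: the class name is hypothetical, it yields dictionaries through an async enumerable rather than implementing `IDataReader`, and it ignores the schema mapping that the real source would need.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using MessagePack;

// Hypothetical sketch, not the final MessagePackZstdFileSource.
public static class MessagePackRowReaderSketch
{
    // Yields rows one at a time from a stream holding a MessagePack array of maps.
    // The stream is expected to be already decompressed (for example a zstd DecompressionStream).
    public static async IAsyncEnumerable<Dictionary<string, object?>> ReadRowsAsync(
        Stream decompressed,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Disposing the reader also disposes the underlying stream.
        using var reader = new MessagePackStreamReader(decompressed);

        await foreach (var rowBytes in reader.ReadArrayAsync(cancellationToken))
        {
            // rowBytes is only valid until the next read, so deserialize immediately.
            yield return MessagePackSerializer.Deserialize<Dictionary<string, object?>>(
                rowBytes, cancellationToken: cancellationToken);
        }
    }
}
```

Because only one element is buffered at a time, memory use stays proportional to the largest row rather than to the file size, which is the property the streaming bullets above are after.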

Package addition: Add MessagePack to JdeScoping.DataSync.Dev.csproj

3. DevEtl Class Updates

Changes to each *DevEtl.cs file (22 files):

  1. Update CacheFileName constant:

    // Before
    public static readonly string CacheFileName = "branch.json.zstd";
    // After
    public static readonly string CacheFileName = "branch.msgpack.zstd";
    
  2. Update Create() method:

    // Before
    .WithSource(new JsonZstdFileSource(cacheFilePath, Schema))
    // After
    .WithSource(new MessagePackZstdFileSource(cacheFilePath, Schema))
    

No changes to:

  • Schema definitions (same column names and types)
  • Pipeline structure
  • DevEtlRegistry.cs

4. Cleanup (After Verification)

Remove obsolete JSON readers:

  • JsonZstdFileSource.cs
  • JsonStreamingDataReader.cs
  • Utf8JsonStreamingDataReader.cs

Remove old cache files:

  • All *.json.zstd files in CACHED_DB_FILES/

Test Strategy

  1. Run converter tool, verify all 22 files convert without errors
  2. Compare file sizes (expect 10-30% reduction)
  3. Run existing JdeScoping.DataSync.Dev.Tests - all tests should pass unchanged
  4. Verify data loaded matches previous JSON-based loads
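
For step 4, one possible way to compare rows read from the old and new formats is sketched below. This is an assumption about how verification might be done, not part of the plan: the names are hypothetical, and values are normalized to invariant strings because the two formats can surface the same number as different CLR types (for example `int` versus `long`).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical verification helper, for illustration only.
public static class RowComparisonSketch
{
    // Renders a value culture-invariantly so that 5 (int) and 5L (long) compare equal.
    public static string Normalize(object? value) => value switch
    {
        null => "\0null", // sentinel unlikely to collide with real string data
        IFormattable f => f.ToString(null, CultureInfo.InvariantCulture),
        _ => value.ToString() ?? string.Empty,
    };

    // True when both rows have the same keys and equal normalized values.
    public static bool RowsMatch(Dictionary<string, object?> a, Dictionary<string, object?> b)
    {
        if (a.Count != b.Count) return false;
        foreach (var (key, value) in a)
        {
            if (!b.TryGetValue(key, out var other)) return false;
            if (Normalize(value) != Normalize(other)) return false;
        }
        return true;
    }

    // Returns the index of the first differing row, or -1 when both lists match.
    public static int FirstMismatch(
        IReadOnlyList<Dictionary<string, object?>> expected,
        IReadOnlyList<Dictionary<string, object?>> actual)
    {
        var shared = Math.Min(expected.Count, actual.Count);
        for (var i = 0; i < shared; i++)
        {
            if (!RowsMatch(expected[i], actual[i])) return i;
        }
        // A length difference counts as a mismatch at the first missing row.
        return expected.Count == actual.Count ? -1 : shared;
    }
}
```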

Files Changed

| File | Change |
|------|--------|
| Tools/CacheConverter/ (new) | Standalone converter tool |
| Sources/MessagePackZstdFileSource.cs (new) | New MessagePack reader |
| JdeScoping.DataSync.Dev.csproj | Add MessagePack package |
| *DevEtl.cs (22 files) | Update file extension and source class |
| Sources/JsonZstdFileSource.cs | Delete after migration |
| Sources/JsonStreamingDataReader.cs | Delete after migration |
| Sources/Utf8JsonStreamingDataReader.cs | Delete after migration |
| CACHED_DB_FILES/*.json.zstd | Delete after verification |