Commit Graph

6 Commits

Author SHA1 Message Date
Joseph Doherty cd68b2c655 fix: read file size after streams are closed in converter 2026-01-06 15:39:47 -05:00
Joseph Doherty 6ebd78d487 feat: add parallel file conversion support to cache converter
- Add optional third parameter for parallelism (default: 8)
- Use Parallel.ForEachAsync for concurrent file processing
- Thread-safe console output with lock
- Thread-safe size counters with Interlocked

Usage: dotnet run -- <cache-dir> <scripts-dir> [parallelism]
2026-01-06 15:35:00 -05:00
Joseph Doherty c6aeb20d9c docs: update documentation for extraction functions migration
- Add ExtractionFunctions.md reference document
- Update database-schema spec with 11 extraction functions
- Update data-access spec to document extraction function approach
- Update search-processing spec with new query builder interface
- Add Database.Tests to Testing.md architecture doc
- Update DataFlow.md with extraction function flow
2026-01-06 14:54:10 -05:00
Joseph Doherty 35c1e6baf0 refactor: use SQL schema and streaming in converter
- Read schema from SQL CREATE TABLE scripts instead of inferring from JSON
- Stream JSON records using Utf8JsonReader instead of loading all into memory
- Write protobuf output in batches of 10000 rows to reduce memory usage
- Add mapping from cache file names to SQL scripts and table names
- Map SQL types (VARCHAR, BIGINT, DECIMAL, DATETIME2, BIT) to .NET types
- Update usage to require scripts directory as second argument
2026-01-06 14:39:22 -05:00
Joseph Doherty 8b1dfeb6c6 fix: address code review issues in converter tool 2026-01-06 14:24:23 -05:00
Joseph Doherty 6d08fd4a6c feat: add protobuf cache converter tool
Add standalone CLI tool to convert zstd-compressed JSON cache files
to zstd-compressed Protocol Buffers format for faster deserialization.
2026-01-06 14:21:46 -05:00