# Bulk Merge Helper Design

**Date:** 2026-01-01

**Status:** Draft - Pending Review

## Overview

Replace the current `StagingTableManager` approach with a streamlined `IBulkMergeHelper` backed by source-generated `IDataReader` converters for efficient `SqlBulkCopy` operations.

## Goals

1. Simplify bulk merge operations to a single method call with expression-based configuration
2. Generate efficient `IAsyncEnumerable<T>` to `IDataReader` converters at compile time
3. Provide better error diagnostics with optional pre-validation
4. Remove manual staging table management code

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       JdeScoping.DataSync                       │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │ BulkCopyTypeRegistry│    │ IBulkMergeHelper                │ │
│  │ - Lists types to    │    │ - MergeAsync(...)               │ │
│  │   generate for      │    │ - Uses IDataReaderFactory       │ │
│  └─────────────────────┘    │ - Builds MERGE SQL from exprs   │ │
│             │               └─────────────────────────────────┘ │
│             │ (analyzed by)                  │                  │
│             ▼                                │ (uses)           │
│  ┌─────────────────────┐                     ▼                  │
│  │ Source Generator    │    ┌─────────────────────────────────┐ │
│  │ - Generates         │───▶│ Generated Code:                 │ │
│  │   IDataReader       │    │ - WorkOrderDataReader           │ │
│  │   wrappers          │    │ - LotDataReader                 │ │
│  │ - Generates DI      │    │ - DataReaderFactory impl        │ │
│  │   registration      │    │ - AddBulkCopyConverters()       │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Type identification | Explicit list in `BulkCopyTypeRegistry.cs` | Keeps Core project free of bulk copy concerns |
| Registry location | DataSync project | Consolidates bulk copy knowledge in one place |
| API style | Single method with expression parameters | Simple, all config visible in one place |
| Conditional updates | Explicit `updateWhen` expression | Flexible, not tied to property naming conventions |
| Error handling | Hybrid - context wrapping + optional validation | Balances performance with debuggability |
| Transactions | None - each batch independent | Matches current behavior, idempotent syncs |
| Generator project | Single `JdeScoping.DataSync.SourceGenerators` | Simple, can extract later if needed |
| DI pattern | Generic `IDataReaderFactory` | Single injection point, easy to mock |
| DELETE support | None | YAGNI, matches current behavior |
| Migration config | Convention + override | Less boilerplate, explicit when needed |

## Component Details

### 1. BulkCopyTypeRegistry

Location: `JdeScoping.DataSync/BulkCopyTypeRegistry.cs`

```csharp
namespace JdeScoping.DataSync;

public static class BulkCopyTypeRegistry
{
    public static readonly Type[] Types =
    [
        typeof(WorkOrder),
        typeof(Lot),
        typeof(LotUsage),
        typeof(Item),
        typeof(WorkCenter),
        typeof(ProfitCenter),
        typeof(JdeUser),
        typeof(Branch),
        typeof(MisData),
    ];
}
```

### 2. Source Generator

Project: `JdeScoping.DataSync.SourceGenerators`

**Generated DataReader wrapper (per type):**

```csharp
public sealed class WorkOrderDataReader : IDataReader
{
    private readonly IAsyncEnumerator<WorkOrder> _enumerator;
    private WorkOrder? _current;

    private static readonly string[] _columnNames =
        ["WorkOrderNumber", "BranchCode", "LotNumber", ...];

    public object GetValue(int i) => i switch
    {
        0 => _current!.WorkOrderNumber,
        1 => _current!.BranchCode,
        // ... generated for each property
        _ => throw new IndexOutOfRangeException()
    };

    public bool Read()
    {
        // SqlBulkCopy reads synchronously, so block on the async enumerator.
        if (!_enumerator.MoveNextAsync().AsTask().GetAwaiter().GetResult())
            return false;

        // Capture the row so GetValue reads from the current record.
        _current = _enumerator.Current;
        return true;
    }

    // IDataReader implementation...
}
```

**Generated factory:**

```csharp
public sealed class DataReaderFactory : IDataReaderFactory
{
    public IDataReader CreateReader<T>(IAsyncEnumerable<T> source)
    {
        return source switch
        {
            // One arm per type listed in BulkCopyTypeRegistry.
            IAsyncEnumerable<WorkOrder> wo => new WorkOrderDataReader(wo),
            IAsyncEnumerable<Lot> lot => new LotDataReader(lot),
            _ => throw new NotSupportedException($"No converter for {typeof(T).Name}")
        };
    }
}
```

**Generated DI extension:**

```csharp
public static class BulkCopyServiceCollectionExtensions
{
    public static IServiceCollection AddBulkCopyConverters(this IServiceCollection services)
    {
        services.AddSingleton<IDataReaderFactory, DataReaderFactory>();
        return services;
    }
}
```

### 3. IBulkMergeHelper Interface

Location: `JdeScoping.DataSync/Contracts/IBulkMergeHelper.cs`

```csharp
namespace JdeScoping.DataSync.Contracts;

public interface IBulkMergeHelper
{
    Task<MergeResult> MergeAsync<T>(
        IAsyncEnumerable<T> data,
        string destinationTable,
        Expression<Func<T, object>> matchOn,
        Expression<Func<T, object>>? updateColumns = null,
        Expression<Func<T, T, bool>>? updateWhen = null,
        Expression<Func<T, object>>? insertColumns = null,
        string? tempTableName = null,
        int batchSize = 0,
        bool validateBeforeCopy = false,
        CancellationToken cancellationToken = default);
}

public record MergeResult(
    int TotalRowsProcessed,
    int RowsInserted,
    int RowsUpdated,
    int BatchCount,
    TimeSpan Elapsed);
```

**Parameters:**

| Parameter | Purpose | Default |
|-----------|---------|---------|
| `data` | Source records to merge | required |
| `destinationTable` | Target SQL table name | required |
| `matchOn` | PK expression for MERGE ON clause | required |
| `updateColumns` | Columns to SET on match | null = all non-PK |
| `updateWhen` | Condition for UPDATE | null = always update |
| `insertColumns` | Columns for INSERT | null = all columns |
| `tempTableName` | Staging table name | `#TEMP_{table}` |
| `batchSize` | Rows per batch | 0 = all at once |
| `validateBeforeCopy` | Pre-validate data against schema | false |

### 4. BulkMergeHelper Implementation

Location: `JdeScoping.DataSync/Services/BulkMergeHelper.cs`

**Processing flow:**

```
1. Parse expressions → extract column names
   matchOn: x => new { x.A, x.B } → ["A", "B"]

2. Get destination table schema (for temp table creation)
   SELECT TOP 0 * FROM WorkOrder → column types/lengths

3. Create temp table matching destination schema
   CREATE TABLE #TEMP_WorkOrder (... same columns ...)

4. If validateBeforeCopy: load schema constraints

5. Stream data in batches:
   foreach batch:
     a. Collect batchSize records from IAsyncEnumerable
     b. If validate: check each row against schema
     c. Create IDataReader via IDataReaderFactory
     d. SqlBulkCopy to temp table
     e. Execute MERGE statement
     f. TRUNCATE temp table
     g. Accumulate inserted/updated counts

6. DROP temp table (in finally block)

7. Return MergeResult with totals
```

**Generated MERGE SQL:**

```sql
MERGE INTO [WorkOrder] AS target
USING [#TEMP_WorkOrder] AS source
ON target.[WorkOrderNumber] = source.[WorkOrderNumber]
   AND target.[BranchCode] = source.[BranchCode]
WHEN MATCHED AND source.[LastUpdateDt] > target.[LastUpdateDt] THEN
    UPDATE SET
        target.[StatusCode] = source.[StatusCode],
        target.[OrderQuantity] = source.[OrderQuantity],
        target.[LastUpdateDt] = source.[LastUpdateDt]
WHEN NOT MATCHED THEN
    INSERT ([WorkOrderNumber], [BranchCode], [StatusCode], ...)
    VALUES (source.[WorkOrderNumber], source.[BranchCode], ...);

SELECT @@ROWCOUNT;
```

### 5. Error Handling

**Exception hierarchy:**

```csharp
public class BulkMergeException : Exception
{
    public string TableName { get; init; }
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

public class BulkMergeValidationException : BulkMergeException
{
    public IReadOnlyList<ValidationError> Errors { get; init; }
}

public record ValidationError(
    int RowIndex,
    string ColumnName,
    object? Value,
    string Message);
```

**Validation checks (when `validateBeforeCopy: true`):**

| Check | Example Error |
|-------|---------------|
| String length | `"Column 'StatusCode' value 'TOOLONG' exceeds max length 5 at row 42"` |
| Null in non-nullable | `"Column 'WorkOrderNumber' cannot be null at row 17"` |
| Type mismatch | `"Column 'OrderQuantity' expected int, got string at row 89"` |
| Decimal precision | `"Column 'Amount' value 12345.6789 exceeds precision(10,2) at row 5"` |

### 6. DI Registration

```csharp
public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddDataSync(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Existing registrations...

        // Add bulk copy converters (generated)
        services.AddBulkCopyConverters();

        // Add bulk merge helper
        services.AddScoped<IBulkMergeHelper, BulkMergeHelper>();

        return services;
    }
}
```

## Testing Strategy

### Unit Tests (JdeScoping.DataSync.Tests)

| Test Class | Coverage |
|------------|----------|
| `BulkMergeHelperTests` | Expression parsing, SQL generation, batch splitting |
| `ExpressionParserTests` | Column name extraction from expressions |
| `MergeSqlBuilderTests` | Generated MERGE SQL correctness |
| `DataReaderFactoryTests` | Factory type resolution |
| `ValidationTests` | Schema validation logic |
| `BulkMergeExceptionTests` | Exception properties and formatting |

### Integration Tests (JdeScoping.DataSync.IntegrationTests)

| Test Class | Coverage |
|------------|----------|
| `BulkMergeHelperIntegrationTests` | End-to-end merge against SQL Server |
| `BatchingIntegrationTests` | Large datasets, multiple batches |
| `ValidationIntegrationTests` | Schema validation against real table |

**Key scenarios:**

- Insert new records (WHEN NOT MATCHED)
- Update existing records (WHEN MATCHED)
- Conditional update respects `updateWhen`
- Composite primary key matching
- Batch processing (10k+ records across multiple batches)
- Temp table cleanup on success and failure
- Validation catches truncation before SQL error

## Migration Plan

### Code to Replace

| File | Change |
|------|--------|
| `StagingTableManager.cs` | Delete - replaced by `BulkMergeHelper` |
| `TableSyncOperation.cs` | Simplify to use `IBulkMergeHelper` |
| `LotFinderRepository.DataSync.cs` | Remove bulk-related methods |

### Before/After

**Before:**

```csharp
await _stagingTableManager.CreateStagingTableAsync(...);
await _stagingTableManager.BulkCopyToStagingAsync(...);
await _stagingTableManager.MergeFromStagingAsync(...);
await _stagingTableManager.DropStagingTableAsync(...);
```

**After:**

```csharp
var result = await _bulkMergeHelper.MergeAsync(
    data: fetcher.FetchAsync(lastUpdate),
    destinationTable: config.TableName,
    matchOn: config.MatchExpression,
    updateColumns: config.UpdateExpression,
    updateWhen: config.UpdateCondition,
    batchSize: _options.BatchSize);
```

### Tests to Remove

- `StagingTableManagerTests.cs` (unit)
- `StagingTableManagerTests.cs` (integration)

## Example Usage

```csharp
// Simple case - match on single PK, update all columns
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => x.WorkOrderNumber);

// Full configuration
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => new { x.WorkOrderNumber, x.BranchCode },
    updateColumns: x => new { x.StatusCode, x.OrderQuantity, x.LastUpdateDt },
    updateWhen: (src, tgt) => src.LastUpdateDt > tgt.LastUpdateDt,
    batchSize: 10000,
    validateBeforeCopy: true);
```
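
Step 1 of the processing flow (turning `matchOn`-style expressions into column names) and the `ON` clause of the generated MERGE could be sketched roughly as below. This is an illustration only, not the planned implementation: `ExpressionColumns`, `Extract`, and `BuildOnClause` are hypothetical names, the `WorkOrder` record is a trimmed-down stand-in for the real entity, and the `Expression<Func<T, object>>` selector shape is an assumption. It also assumes property names map one-to-one to column names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Trimmed-down stand-in for the real entity; the properties are illustrative.
public sealed record WorkOrder(string WorkOrderNumber, string BranchCode, int OrderQuantity);

// Hypothetical helper: the type and method names are not part of the design.
public static class ExpressionColumns
{
    // Returns the property names referenced by a selector such as
    // x => x.A  or  x => new { x.A, x.B }.
    public static IReadOnlyList<string> Extract<T>(Expression<Func<T, object>> selector)
    {
        return Unwrap(selector.Body) switch
        {
            // x => x.A
            MemberExpression member => new[] { member.Member.Name },

            // x => new { x.A, x.B }: each constructor argument is a property access.
            NewExpression created => created.Arguments
                .Select(arg => Unwrap(arg) is MemberExpression m
                    ? m.Member.Name
                    : throw new NotSupportedException($"Unsupported column expression: {arg}"))
                .ToArray(),

            var other => throw new NotSupportedException($"Unsupported selector: {other}")
        };
    }

    // Builds the MERGE ON predicate from the key columns (assumes column == property name).
    public static string BuildOnClause(IEnumerable<string> keyColumns) =>
        string.Join(" AND ", keyColumns.Select(c => $"target.[{c}] = source.[{c}]"));

    // Value-type properties are boxed to object, which shows up in the tree
    // as a Convert node wrapped around the member access.
    private static Expression Unwrap(Expression expression) =>
        expression is UnaryExpression { NodeType: ExpressionType.Convert } unary
            ? unary.Operand
            : expression;
}

public static class Program
{
    public static void Main()
    {
        var keys = ExpressionColumns.Extract<WorkOrder>(
            x => new { x.WorkOrderNumber, x.BranchCode });

        Console.WriteLine(string.Join(", ", keys));
        // prints: WorkOrderNumber, BranchCode

        Console.WriteLine(ExpressionColumns.BuildOnClause(keys));
        // prints: target.[WorkOrderNumber] = source.[WorkOrderNumber] AND target.[BranchCode] = source.[BranchCode]
    }
}
```

Rejecting anything other than a plain property access keeps the generated SQL predictable. Translating `updateWhen` would need a fuller expression visitor and is deliberately left out of this sketch.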