jdescopingtool/PLANS/2026-01-01-bulk-merge-helper-design.md

Bulk Merge Helper Design

Date: 2026-01-01
Status: Draft - Pending Review

Overview

Replace the current StagingTableManager approach with a streamlined IBulkMergeHelper backed by source-generated IDataReader converters for efficient SqlBulkCopy operations.

Goals

  1. Simplify bulk merge operations to a single method call with expression-based configuration
  2. Generate efficient IAsyncEnumerable<T> to IDataReader converters at compile time
  3. Provide better error diagnostics with optional pre-validation
  4. Remove manual staging table management code

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    JdeScoping.DataSync                          │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │ BulkCopyTypeRegistry│    │      IBulkMergeHelper           │ │
│  │ - Lists types to    │    │  - MergeAsync<T>(...)           │ │
│  │   generate for      │    │  - Uses IDataReaderFactory      │ │
│  └─────────────────────┘    │  - Builds MERGE SQL from exprs  │ │
│            │                └─────────────────────────────────┘ │
│            │ (analyzed by)              │                       │
│            ▼                            │ (uses)                │
│  ┌─────────────────────┐                ▼                       │
│  │  Source Generator   │    ┌─────────────────────────────────┐ │
│  │  - Generates        │───▶│  Generated Code:                │ │
│  │    IDataReader      │    │  - WorkOrderDataReader          │ │
│  │    wrappers         │    │  - LotDataReader                │ │
│  │  - Generates DI     │    │  - DataReaderFactory impl       │ │
│  │    registration     │    │  - AddBulkCopyConverters()      │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Type identification | Explicit list in BulkCopyTypeRegistry.cs | Keeps Core project free of bulk copy concerns |
| Registry location | DataSync project | Consolidates bulk copy knowledge in one place |
| API style | Single method with expression parameters | Simple, all config visible in one place |
| Conditional updates | Explicit updateWhen expression | Flexible, not tied to property naming conventions |
| Error handling | Hybrid - context wrapping + optional validation | Balances performance with debuggability |
| Transactions | None - each batch independent | Matches current behavior, idempotent syncs |
| Generator project | Single JdeScoping.DataSync.SourceGenerators | Simple, can extract later if needed |
| DI pattern | Generic IDataReaderFactory | Single injection point, easy to mock |
| DELETE support | None | YAGNI, matches current behavior |
| Migration config | Convention + override | Less boilerplate, explicit when needed |

Component Details

1. BulkCopyTypeRegistry

Location: JdeScoping.DataSync/BulkCopyTypeRegistry.cs

namespace JdeScoping.DataSync;

public static class BulkCopyTypeRegistry
{
    public static readonly Type[] Types =
    [
        typeof(WorkOrder),
        typeof(Lot),
        typeof(LotUsage),
        typeof(Item),
        typeof(WorkCenter),
        typeof(ProfitCenter),
        typeof(JdeUser),
        typeof(Branch),
        typeof(MisData),
    ];
}

2. Source Generator

Project: JdeScoping.DataSync.SourceGenerators

Generated DataReader wrapper (per type):

public sealed class WorkOrderDataReader : IDataReader
{
    private readonly IAsyncEnumerator<WorkOrder> _enumerator;
    private WorkOrder? _current;

    private static readonly string[] _columnNames =
        ["WorkOrderNumber", "BranchCode", "LotNumber", ...];

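    public object GetValue(int i) => i switch
    {
        0 => _current!.WorkOrderNumber,
        1 => _current!.BranchCode,
        // ... generated for each property
        _ => throw new IndexOutOfRangeException($"No column at ordinal {i}")
    };

    public bool Read()
    {
        // SqlBulkCopy reads synchronously, so block on the async enumerator here.
        if (!_enumerator.MoveNextAsync().AsTask().GetAwaiter().GetResult())
            return false;

        // Capture the row so GetValue can serve its columns.
        _current = _enumerator.Current;
        return true;
    }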

    // IDataReader implementation...
}
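As a rough illustration of what the elided IDataReader members might look like, here is a hand-written generic sketch. It is not the generated code: the class name AsyncEnumerableDataReader and the column tuple shape are assumptions made for this example, and the generated readers would use a per-type switch rather than delegates. Members that a bulk copy is unlikely to need simply throw.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical generic reader; names and constructor shape are illustrative only.
public sealed class AsyncEnumerableDataReader<T> : IDataReader
{
    private readonly IAsyncEnumerator<T> _rows;
    private readonly (string Name, Type Type, Func<T, object?> Get)[] _cols;
    private bool _closed;

    public AsyncEnumerableDataReader(
        IAsyncEnumerable<T> source,
        (string Name, Type Type, Func<T, object?> Get)[] columns)
    {
        _rows = source.GetAsyncEnumerator();
        _cols = columns;
    }

    public int FieldCount => _cols.Length;
    public int Depth => 0;
    public bool IsClosed => _closed;
    public int RecordsAffected => -1;

    // Rows are pulled synchronously, so this blocks on the async source.
    public bool Read() => _rows.MoveNextAsync().AsTask().GetAwaiter().GetResult();
    public bool NextResult() => false;

    // Null property values are surfaced as DBNull, as ADO.NET expects.
    public object GetValue(int i) => _cols[i].Get(_rows.Current) ?? DBNull.Value;
    public bool IsDBNull(int i) => GetValue(i) is DBNull;
    public string GetName(int i) => _cols[i].Name;
    public Type GetFieldType(int i) => _cols[i].Type;
    public string GetDataTypeName(int i) => _cols[i].Type.Name;
    public int GetOrdinal(string name) => Array.FindIndex(_cols, c => c.Name == name);
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));

    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, _cols.Length);
        for (int i = 0; i < n; i++) values[i] = GetValue(i);
        return n;
    }

    // Typed getters unbox the current value.
    public bool GetBoolean(int i) => (bool)GetValue(i);
    public byte GetByte(int i) => (byte)GetValue(i);
    public char GetChar(int i) => (char)GetValue(i);
    public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
    public decimal GetDecimal(int i) => (decimal)GetValue(i);
    public double GetDouble(int i) => (double)GetValue(i);
    public float GetFloat(int i) => (float)GetValue(i);
    public Guid GetGuid(int i) => (Guid)GetValue(i);
    public short GetInt16(int i) => (short)GetValue(i);
    public int GetInt32(int i) => (int)GetValue(i);
    public long GetInt64(int i) => (long)GetValue(i);
    public string GetString(int i) => (string)GetValue(i);

    // Not needed for this sketch.
    public long GetBytes(int i, long offset, byte[]? buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public long GetChars(int i, long offset, char[]? buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DataTable? GetSchemaTable() => throw new NotSupportedException();

    public void Close()
    {
        if (_closed) return;
        _closed = true;
        _rows.DisposeAsync().AsTask().GetAwaiter().GetResult();
    }

    public void Dispose() => Close();
}
```

Blocking inside Read() is a deliberate trade-off in this sketch: the IDataReader contract is synchronous, so the async source has to be awaited synchronously somewhere.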

Generated factory:

public sealed class DataReaderFactory : IDataReaderFactory
{
    public IDataReader CreateReader<T>(IAsyncEnumerable<T> source)
    {
        return source switch
        {
            IAsyncEnumerable<WorkOrder> wo => new WorkOrderDataReader(wo),
            IAsyncEnumerable<Lot> lot => new LotDataReader(lot),
            _ => throw new NotSupportedException($"No converter for {typeof(T).Name}")
        };
    }
}

Generated DI extension:

public static class BulkCopyServiceCollectionExtensions
{
    public static IServiceCollection AddBulkCopyConverters(this IServiceCollection services)
    {
        services.AddSingleton<IDataReaderFactory, DataReaderFactory>();
        return services;
    }
}

3. IBulkMergeHelper Interface

Location: JdeScoping.DataSync/Contracts/IBulkMergeHelper.cs

namespace JdeScoping.DataSync.Contracts;

public interface IBulkMergeHelper
{
    Task<MergeResult> MergeAsync<T>(
        IAsyncEnumerable<T> data,
        string destinationTable,
        Expression<Func<T, object>> matchOn,
        Expression<Func<T, object>>? updateColumns = null,
        Expression<Func<T, T, bool>>? updateWhen = null,
        Expression<Func<T, object>>? insertColumns = null,
        string? tempTableName = null,
        int batchSize = 0,
        bool validateBeforeCopy = false,
        CancellationToken cancellationToken = default);
}

public record MergeResult(
    int TotalRowsProcessed,
    int RowsInserted,
    int RowsUpdated,
    int BatchCount,
    TimeSpan Elapsed);

Parameters:

| Parameter | Purpose | Default |
| --- | --- | --- |
| data | Source records to merge | required |
| destinationTable | Target SQL table name | required |
| matchOn | PK expression for MERGE ON clause | required |
| updateColumns | Columns to SET on match | null = all non-PK |
| updateWhen | Condition for UPDATE | null = always update |
| insertColumns | Columns for INSERT | null = all columns |
| tempTableName | Staging table name | #TEMP_{table} |
| batchSize | Rows per batch | 0 = all at once |
| validateBeforeCopy | Pre-validate data against schema | false |
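To make the batchSize semantics concrete, the following is a minimal sketch of splitting an IAsyncEnumerable<T> into batches, where 0 yields everything as one batch. The AsyncBatching class and BatchAsync method are hypothetical names for this example, not part of the design, and treating negative values like 0 is an assumption.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper; not part of the design above.
public static class AsyncBatching
{
    // Yields lists of up to batchSize items. A batchSize of 0 (or less, by
    // assumption) buffers the whole stream and yields it as a single batch.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        this IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var batch = new List<T>();
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);
            if (batchSize > 0 && batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(); // start a fresh list; the yielded one is now owned by the caller
            }
        }

        // Emit the final partial batch (or the only batch when batchSize is 0).
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Note that "all at once" means the full data set is held in memory, so a positive batchSize is the safer choice for large syncs.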

4. BulkMergeHelper Implementation

Location: JdeScoping.DataSync/Services/BulkMergeHelper.cs

Processing flow:

1. Parse expressions → extract column names
   matchOn: x => new { x.A, x.B }  →  ["A", "B"]

2. Get destination table schema (for temp table creation)
   SELECT TOP 0 * FROM WorkOrder → column types/lengths

3. Create temp table matching destination schema
   CREATE TABLE #TEMP_WorkOrder (... same columns ...)

4. If validateBeforeCopy: load schema constraints

5. Stream data in batches:
   foreach batch:
     a. Collect batchSize records from IAsyncEnumerable
     b. If validate: check each row against schema
     c. Create IDataReader via IDataReaderFactory
     d. SqlBulkCopy to temp table
     e. Execute MERGE statement
     f. TRUNCATE temp table
     g. Accumulate inserted/updated counts

6. DROP temp table (in finally block)

7. Return MergeResult with totals
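Step 1 of the flow above (turning a selector expression into column names) could be sketched as follows. ColumnExpressionParser is a hypothetical name used for illustration, and the sketch assumes column names equal property names, which the design does not state explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper; the class and method names are not from the design.
public static class ColumnExpressionParser
{
    // Handles x => x.A and x => new { x.A, x.B }.
    public static IReadOnlyList<string> GetColumnNames<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body;

        // Value-type properties are boxed to object, which appears as a Convert node.
        if (body is UnaryExpression { NodeType: ExpressionType.Convert } convert)
            body = convert.Operand;

        return body switch
        {
            // Single property: x => x.A
            MemberExpression member => new[] { member.Member.Name },

            // Anonymous type: x => new { x.A, x.B }
            NewExpression anon => anon.Arguments
                .Select(arg => arg is MemberExpression m
                    ? m.Member.Name
                    : throw new NotSupportedException($"Unsupported column expression: {arg}"))
                .ToArray(),

            _ => throw new NotSupportedException($"Unsupported selector: {selector}")
        };
    }
}
```

For example, `GetColumnNames<DateTime>(d => new { d.Year, d.Month })` yields the two names "Year" and "Month" in declaration order. Anything other than plain property access is rejected rather than guessed at.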

Generated MERGE SQL:

MERGE INTO [WorkOrder] AS target
USING [#TEMP_WorkOrder] AS source
ON target.[WorkOrderNumber] = source.[WorkOrderNumber]
   AND target.[BranchCode] = source.[BranchCode]

WHEN MATCHED AND source.[LastUpdateDt] > target.[LastUpdateDt] THEN
    UPDATE SET
        target.[StatusCode] = source.[StatusCode],
        target.[OrderQuantity] = source.[OrderQuantity],
        target.[LastUpdateDt] = source.[LastUpdateDt]

WHEN NOT MATCHED THEN
    INSERT ([WorkOrderNumber], [BranchCode], [StatusCode], ...)
    VALUES (source.[WorkOrderNumber], source.[BranchCode], ...);

SELECT @@ROWCOUNT;
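One point worth noting about the statement above: after a MERGE, @@ROWCOUNT reports the combined number of affected rows, so it cannot by itself populate RowsInserted and RowsUpdated separately. A possible alternative, sketched below under assumptions, is to capture `OUTPUT $action` into a table variable and count per action. MergeSqlSketch, its parameters, and the @actions variable are hypothetical; the sketch also assumes unqualified table names, non-empty column lists, and an updateWhen predicate that has already been translated to SQL.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder for illustration; not the design's MergeSqlBuilder.
public static class MergeSqlSketch
{
    // Bracket-quote an identifier, doubling any closing bracket.
    private static string Q(string name) => "[" + name.Replace("]", "]]") + "]";

    public static string Build(
        string table,
        string tempTable,
        IReadOnlyList<string> matchOn,
        IReadOnlyList<string> updateColumns,
        IReadOnlyList<string> insertColumns,
        string? updateWhenSql = null) // pre-translated predicate; null means always update
    {
        string on = string.Join(" AND ", matchOn.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        string set = string.Join(", ", updateColumns.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        string cols = string.Join(", ", insertColumns.Select(c => Q(c)));
        string vals = string.Join(", ", insertColumns.Select(c => $"source.{Q(c)}"));
        string matched = updateWhenSql is null ? "WHEN MATCHED" : $"WHEN MATCHED AND {updateWhenSql}";

        var lines = new[]
        {
            "DECLARE @actions TABLE (MergeAction nvarchar(10));",
            $"MERGE INTO {Q(table)} AS target",
            $"USING {Q(tempTable)} AS source",
            $"ON {on}",
            $"{matched} THEN UPDATE SET {set}",
            $"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})",
            // $action is 'INSERT' or 'UPDATE' for each affected row.
            "OUTPUT $action INTO @actions;",
            "SELECT",
            "    COUNT(CASE WHEN MergeAction = 'INSERT' THEN 1 END) AS RowsInserted,",
            "    COUNT(CASE WHEN MergeAction = 'UPDATE' THEN 1 END) AS RowsUpdated",
            "FROM @actions;"
        };
        return string.Join("\n", lines);
    }
}
```

The table variable adds a small per-batch cost; if only a combined count is needed, the simpler @@ROWCOUNT form is sufficient.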

5. Error Handling

Exception hierarchy:

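public class BulkMergeException : Exception
{
    // Wraps the underlying failure so batch context travels with the original exception.
    public BulkMergeException(string message, Exception? innerException = null)
        : base(message, innerException) { }

    public string TableName { get; init; } = string.Empty;
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

public class BulkMergeValidationException : BulkMergeException
{
    public BulkMergeValidationException(string message) : base(message) { }

    public IReadOnlyList<ValidationError> Errors { get; init; } = [];
}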

public record ValidationError(
    int RowIndex,
    string ColumnName,
    object? Value,
    string Message);

Validation checks (when validateBeforeCopy: true):

| Check | Example Error |
| --- | --- |
| String length | "Column 'StatusCode' value 'TOOLONG' exceeds max length 5 at row 42" |
| Null in non-nullable | "Column 'WorkOrderNumber' cannot be null at row 17" |
| Type mismatch | "Column 'OrderQuantity' expected int, got string at row 89" |
| Decimal precision | "Column 'Amount' value 12345.6789 exceeds precision(10,2) at row 5" |
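The first two checks in the table (string length and null in a non-nullable column) might be implemented along these lines. This is a sketch under assumptions: ColumnConstraint and RowValidatorSketch are hypothetical, and rows are modeled as a dictionary of column values purely to keep the example self-contained. ValidationError is repeated from the design so the block compiles on its own.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical shape for constraints loaded from the destination schema.
public sealed record ColumnConstraint(string Name, bool IsNullable, int? MaxLength);

// Same shape as the ValidationError record in the design.
public sealed record ValidationError(int RowIndex, string ColumnName, object? Value, string Message);

public static class RowValidatorSketch
{
    // Checks one row against the constraints and returns every violation found.
    public static List<ValidationError> Validate(
        int rowIndex,
        IReadOnlyDictionary<string, object?> row,
        IEnumerable<ColumnConstraint> constraints)
    {
        var errors = new List<ValidationError>();
        foreach (var c in constraints)
        {
            // A missing key is treated the same as a null value.
            row.TryGetValue(c.Name, out var value);

            if (value is null)
            {
                if (!c.IsNullable)
                    errors.Add(new(rowIndex, c.Name, null,
                        $"Column '{c.Name}' cannot be null at row {rowIndex}"));
                continue;
            }

            if (value is string s && c.MaxLength is int max && s.Length > max)
                errors.Add(new(rowIndex, c.Name, s,
                    $"Column '{c.Name}' value '{s}' exceeds max length {max} at row {rowIndex}"));
        }
        return errors;
    }
}
```

Collecting all violations for a row, rather than stopping at the first, matches the Errors list on BulkMergeValidationException and gives a fuller picture in a single pass.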

6. DI Registration

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddDataSync(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Existing registrations...

        // Add bulk copy converters (generated)
        services.AddBulkCopyConverters();

        // Add bulk merge helper
        services.AddScoped<IBulkMergeHelper, BulkMergeHelper>();

        return services;
    }
}

Testing Strategy

Unit Tests (JdeScoping.DataSync.Tests)

| Test Class | Coverage |
| --- | --- |
| BulkMergeHelperTests | Expression parsing, SQL generation, batch splitting |
| ExpressionParserTests | Column name extraction from expressions |
| MergeSqlBuilderTests | Generated MERGE SQL correctness |
| DataReaderFactoryTests | Factory type resolution |
| ValidationTests | Schema validation logic |
| BulkMergeExceptionTests | Exception properties and formatting |

Integration Tests (JdeScoping.DataSync.IntegrationTests)

| Test Class | Coverage |
| --- | --- |
| BulkMergeHelperIntegrationTests | End-to-end merge against SQL Server |
| BatchingIntegrationTests | Large datasets, multiple batches |
| ValidationIntegrationTests | Schema validation against real table |

Key scenarios:

  • Insert new records (WHEN NOT MATCHED)
  • Update existing records (WHEN MATCHED)
  • Conditional update respects updateWhen
  • Composite primary key matching
  • Batch processing (10k+ records across multiple batches)
  • Temp table cleanup on success and failure
  • Validation catches truncation before SQL error

Migration Plan

Code to Replace

| File | Change |
| --- | --- |
| StagingTableManager.cs | Delete - replaced by BulkMergeHelper |
| TableSyncOperation.cs | Simplify to use IBulkMergeHelper |
| LotFinderRepository.DataSync.cs | Remove bulk-related methods |

Before/After

Before:

await _stagingTableManager.CreateStagingTableAsync(...);
await _stagingTableManager.BulkCopyToStagingAsync(...);
await _stagingTableManager.MergeFromStagingAsync(...);
await _stagingTableManager.DropStagingTableAsync(...);

After:

var result = await _bulkMergeHelper.MergeAsync(
    data: fetcher.FetchAsync(lastUpdate),
    destinationTable: config.TableName,
    matchOn: config.MatchExpression,
    updateColumns: config.UpdateExpression,
    updateWhen: config.UpdateCondition,
    batchSize: _options.BatchSize);

Tests to Remove

  • StagingTableManagerTests.cs (unit)
  • StagingTableManagerTests.cs (integration)

Example Usage

// Simple case - match on single PK, update all columns
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => x.WorkOrderNumber);

// Full configuration
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => new { x.WorkOrderNumber, x.BranchCode },
    updateColumns: x => new { x.StatusCode, x.OrderQuantity, x.LastUpdateDt },
    updateWhen: (src, tgt) => src.LastUpdateDt > tgt.LastUpdateDt,
    batchSize: 10000,
    validateBeforeCopy: true);