Bulk Merge Helper Design
Date: 2026-01-01
Status: Draft - Pending Review
Overview
Replace the current StagingTableManager approach with a streamlined IBulkMergeHelper backed by source-generated IDataReader converters for efficient SqlBulkCopy operations.
Goals
- Simplify bulk merge operations to a single method call with expression-based configuration
- Generate efficient IAsyncEnumerable<T> to IDataReader converters at compile time
- Provide better error diagnostics with optional pre-validation
- Remove manual staging table management code
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                       JdeScoping.DataSync                       │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │ BulkCopyTypeRegistry│    │ IBulkMergeHelper                │ │
│  │ - Lists types to    │    │ - MergeAsync<T>(...)            │ │
│  │   generate for      │    │ - Uses IDataReaderFactory       │ │
│  └─────────────────────┘    │ - Builds MERGE SQL from exprs   │ │
│            │                └─────────────────────────────────┘ │
│            │ (analyzed by)                   │                  │
│            ▼                                 │ (uses)           │
│  ┌─────────────────────┐                     ▼                  │
│  │ Source Generator    │    ┌─────────────────────────────────┐ │
│  │ - Generates         │───▶│ Generated Code:                 │ │
│  │   IDataReader       │    │ - WorkOrderDataReader           │ │
│  │   wrappers          │    │ - LotDataReader                 │ │
│  │ - Generates DI      │    │ - DataReaderFactory impl        │ │
│  │   registration      │    │ - AddBulkCopyConverters()       │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Type identification | Explicit list in BulkCopyTypeRegistry.cs | Keeps Core project free of bulk copy concerns |
| Registry location | DataSync project | Consolidates bulk copy knowledge in one place |
| API style | Single method with expression parameters | Simple, all config visible in one place |
| Conditional updates | Explicit updateWhen expression | Flexible, not tied to property naming conventions |
| Error handling | Hybrid - context wrapping + optional validation | Balances performance with debuggability |
| Transactions | None - each batch independent | Matches current behavior, idempotent syncs |
| Generator project | Single JdeScoping.DataSync.SourceGenerators | Simple, can extract later if needed |
| DI pattern | Generic IDataReaderFactory | Single injection point, easy to mock |
| DELETE support | None | YAGNI, matches current behavior |
| Migration config | Convention + override | Less boilerplate, explicit when needed |
Component Details
1. BulkCopyTypeRegistry
Location: JdeScoping.DataSync/BulkCopyTypeRegistry.cs
namespace JdeScoping.DataSync;

public static class BulkCopyTypeRegistry
{
    public static readonly Type[] Types =
    [
        typeof(WorkOrder),
        typeof(Lot),
        typeof(LotUsage),
        typeof(Item),
        typeof(WorkCenter),
        typeof(ProfitCenter),
        typeof(JdeUser),
        typeof(Branch),
        typeof(MisData),
    ];
}
2. Source Generator
Project: JdeScoping.DataSync.SourceGenerators
Generated DataReader wrapper (per type):
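public sealed class WorkOrderDataReader : IDataReader
{
    private readonly IAsyncEnumerator<WorkOrder> _enumerator;
    private WorkOrder? _current;

    private static readonly string[] _columnNames =
        ["WorkOrderNumber", "BranchCode", "LotNumber", ...];

    public WorkOrderDataReader(IAsyncEnumerable<WorkOrder> source) =>
        _enumerator = source.GetAsyncEnumerator();

    public object GetValue(int i) => i switch
    {
        0 => _current!.WorkOrderNumber,
        1 => _current!.BranchCode,
        // ... generated for each property
        _ => throw new IndexOutOfRangeException()
    };

    public bool Read()
    {
        // SqlBulkCopy drives IDataReader synchronously, so block on the async enumerator.
        if (!_enumerator.MoveNextAsync().AsTask().GetAwaiter().GetResult())
            return false;

        // Capture the row so GetValue reads from the record just advanced to.
        _current = _enumerator.Current;
        return true;
    }

    // IDataReader implementation...
}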
Generated factory:
public sealed class DataReaderFactory : IDataReaderFactory
{
    public IDataReader CreateReader<T>(IAsyncEnumerable<T> source)
    {
        return source switch
        {
            IAsyncEnumerable<WorkOrder> wo => new WorkOrderDataReader(wo),
            IAsyncEnumerable<Lot> lot => new LotDataReader(lot),
            _ => throw new NotSupportedException($"No converter for {typeof(T).Name}")
        };
    }
}
Generated DI extension:
public static class BulkCopyServiceCollectionExtensions
{
    public static IServiceCollection AddBulkCopyConverters(this IServiceCollection services)
    {
        services.AddSingleton<IDataReaderFactory, DataReaderFactory>();
        return services;
    }
}
3. IBulkMergeHelper Interface
Location: JdeScoping.DataSync/Contracts/IBulkMergeHelper.cs
using System.Linq.Expressions;

namespace JdeScoping.DataSync.Contracts;

public interface IBulkMergeHelper
{
    Task<MergeResult> MergeAsync<T>(
        IAsyncEnumerable<T> data,
        string destinationTable,
        Expression<Func<T, object>> matchOn,
        Expression<Func<T, object>>? updateColumns = null,
        Expression<Func<T, T, bool>>? updateWhen = null,
        Expression<Func<T, object>>? insertColumns = null,
        string? tempTableName = null,
        int batchSize = 0,
        bool validateBeforeCopy = false,
        CancellationToken cancellationToken = default);
}

public record MergeResult(
    int TotalRowsProcessed,
    int RowsInserted,
    int RowsUpdated,
    int BatchCount,
    TimeSpan Elapsed);
Parameters:
| Parameter | Purpose | Default |
|---|---|---|
| data | Source records to merge | required |
| destinationTable | Target SQL table name | required |
| matchOn | PK expression for MERGE ON clause | required |
| updateColumns | Columns to SET on match | null = all non-PK |
| updateWhen | Condition for UPDATE | null = always update |
| insertColumns | Columns for INSERT | null = all columns |
| tempTableName | Staging table name | #TEMP_{table} |
| batchSize | Rows per batch | 0 = all at once |
| validateBeforeCopy | Pre-validate data against schema | false |
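The matchOn, updateColumns, and insertColumns selectors all reduce to a list of column names. A minimal sketch of that extraction follows, assuming property names map one-to-one to column names. ExpressionColumnParser is a hypothetical name, not part of this design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class ExpressionColumnParser
{
    // Returns the property names referenced by a selector such as
    // x => x.A  or  x => new { x.A, x.B }.
    public static IReadOnlyList<string> GetColumnNames<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body;

        // Value-type members are boxed to object, which wraps the access in a Convert node.
        if (body is UnaryExpression { NodeType: ExpressionType.Convert } convert)
            body = convert.Operand;

        return body switch
        {
            MemberExpression member => new[] { member.Member.Name },
            NewExpression anon => anon.Arguments.Select(GetMemberName).ToArray(),
            _ => throw new NotSupportedException($"Unsupported selector: {selector}")
        };
    }

    private static string GetMemberName(Expression argument) =>
        argument is MemberExpression member
            ? member.Member.Name
            : throw new NotSupportedException($"Unsupported selector argument: {argument}");
}
```

Anything other than a direct property access or an anonymous-type projection throws, which keeps unsupported selectors from silently producing a wrong column list.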
4. BulkMergeHelper Implementation
Location: JdeScoping.DataSync/Services/BulkMergeHelper.cs
Processing flow:
1. Parse expressions → extract column names
   matchOn: x => new { x.A, x.B } → ["A", "B"]
2. Get destination table schema (for temp table creation)
   SELECT TOP 0 * FROM WorkOrder → column types/lengths
3. Create temp table matching destination schema
   CREATE TABLE #TEMP_WorkOrder (... same columns ...)
4. If validateBeforeCopy: load schema constraints
5. Stream data in batches:
   foreach batch:
     a. Collect batchSize records from IAsyncEnumerable
     b. If validate: check each row against schema
     c. Create IDataReader via IDataReaderFactory
     d. SqlBulkCopy to temp table
     e. Execute MERGE statement
     f. TRUNCATE temp table
     g. Accumulate inserted/updated counts
6. DROP temp table (in finally block)
7. Return MergeResult with totals
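Step 5a can be sketched as an async iterator that groups the source stream into lists. This is one possible shape under stated assumptions, not the planned implementation. AsyncBatching and BatchAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; names are illustrative only.
public static class AsyncBatching
{
    // batchSize <= 0 yields a single batch with every record,
    // matching the "0 = all at once" default of MergeAsync.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var batch = new List<T>();
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);
            if (batchSize > 0 && batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(); // fresh list so the caller may keep the previous one
            }
        }

        // Emit the final partial batch, if any.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Because IDataReaderFactory.CreateReader takes an IAsyncEnumerable<T>, a buffered batch would need a small adapter (or the generated readers an extra constructor) before step 5c. The design leaves that detail open.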
Generated MERGE SQL:
MERGE INTO [WorkOrder] AS target
USING [#TEMP_WorkOrder] AS source
ON target.[WorkOrderNumber] = source.[WorkOrderNumber]
   AND target.[BranchCode] = source.[BranchCode]
WHEN MATCHED AND source.[LastUpdateDt] > target.[LastUpdateDt] THEN
    UPDATE SET
        target.[StatusCode] = source.[StatusCode],
        target.[OrderQuantity] = source.[OrderQuantity],
        target.[LastUpdateDt] = source.[LastUpdateDt]
WHEN NOT MATCHED THEN
    INSERT ([WorkOrderNumber], [BranchCode], [StatusCode], ...)
    VALUES (source.[WorkOrderNumber], source.[BranchCode], ...);
SELECT @@ROWCOUNT;
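A minimal sketch of how such a statement could be assembled from the parsed column lists. MergeSqlSketch and its signature are hypothetical, and table names are assumed to be unqualified (no schema prefix). Note that @@ROWCOUNT after a MERGE reports only the combined number of affected rows, so this sketch swaps in an OUTPUT $action clause as one way to split RowsInserted from RowsUpdated. That substitution is an assumption, not part of the design above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder; names and signature are illustrative only.
public static class MergeSqlSketch
{
    public static string Build(
        string table,
        string tempTable,
        IReadOnlyList<string> matchOn,
        IReadOnlyList<string> updateColumns,
        IReadOnlyList<string> insertColumns,
        string? updateWhenSql = null)
    {
        // Bracket-quote identifiers, escaping any closing bracket.
        static string Q(string name) => "[" + name.Replace("]", "]]") + "]";

        string on = string.Join(" AND ", matchOn.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        string set = string.Join(", ", updateColumns.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        string cols = string.Join(", ", insertColumns.Select(Q));
        string vals = string.Join(", ", insertColumns.Select(c => $"source.{Q(c)}"));
        string matched = updateWhenSql is null ? "WHEN MATCHED" : $"WHEN MATCHED AND {updateWhenSql}";

        // OUTPUT $action emits one row per affected row ('INSERT' or 'UPDATE'),
        // which the caller can count to fill MergeResult.
        return
            $"MERGE INTO {Q(table)} AS target USING {Q(tempTable)} AS source ON {on} " +
            $"{matched} THEN UPDATE SET {set} " +
            $"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals}) " +
            "OUTPUT $action;";
    }
}
```

For large batches, returning one row per action may be wasteful. Collecting the actions into a table variable and grouping server-side would be an alternative worth measuring.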
5. Error Handling
Exception hierarchy:
public class BulkMergeException : Exception
{
    public string TableName { get; init; }
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

public class BulkMergeValidationException : BulkMergeException
{
    public IReadOnlyList<ValidationError> Errors { get; init; }
}

public record ValidationError(
    int RowIndex,
    string ColumnName,
    object? Value,
    string Message);
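The "context wrapping" half of the hybrid approach could look roughly like the sketch below. It redeclares BulkMergeException with a message/inner-exception constructor, which the design above does not specify, and BatchErrorContext.RunAsync is a hypothetical helper.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Variant of the design's exception with a constructor added (an assumption of this sketch).
public class BulkMergeException : Exception
{
    public BulkMergeException(string message, Exception inner) : base(message, inner) { }

    public string TableName { get; init; } = "";
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

// Hypothetical helper that runs one batch step and attaches batch context on failure.
public static class BatchErrorContext
{
    public static async Task RunAsync(
        string table, int batchNumber, int rowsInBatch, string? sql, Func<Task> step)
    {
        try
        {
            await step();
        }
        // Let cancellation and already-wrapped failures pass through untouched.
        catch (Exception ex) when (ex is not OperationCanceledException and not BulkMergeException)
        {
            throw new BulkMergeException(
                $"Bulk merge into '{table}' failed in batch {batchNumber} ({rowsInBatch} rows): {ex.Message}", ex)
            {
                TableName = table,
                BatchNumber = batchNumber,
                RowsInBatch = rowsInBatch,
                SqlStatement = sql
            };
        }
    }
}
```

Wrapping at the batch level keeps the hot path free of per-row checks while still telling the caller which table and batch failed.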
Validation checks (when validateBeforeCopy: true):
| Check | Example Error |
|---|---|
| String length | "Column 'StatusCode' value 'TOOLONG' exceeds max length 5 at row 42" |
| Null in non-nullable | "Column 'WorkOrderNumber' cannot be null at row 17" |
| Type mismatch | "Column 'OrderQuantity' expected int, got string at row 89" |
| Decimal precision | "Column 'Amount' value 12345.6789 exceeds precision(10,2) at row 5" |
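A minimal sketch of the first two checks (string length and null in non-nullable), assuming rows are exposed as name/value dictionaries and schema constraints as a simple ColumnRule record. Both are hypothetical. The design does not say how either is represented, and the generated readers could supply values by ordinal instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public record ValidationError(int RowIndex, string ColumnName, object? Value, string Message);

// Hypothetical schema description for this sketch only.
public record ColumnRule(string Name, bool Nullable, int? MaxLength);

public static class RowValidator
{
    public static List<ValidationError> Validate(
        IReadOnlyList<IReadOnlyDictionary<string, object?>> rows,
        IReadOnlyList<ColumnRule> rules)
    {
        var errors = new List<ValidationError>();
        for (int i = 0; i < rows.Count; i++)
        {
            foreach (var rule in rules)
            {
                // A missing key is treated the same as an explicit null.
                rows[i].TryGetValue(rule.Name, out var value);

                if (value is null)
                {
                    if (!rule.Nullable)
                        errors.Add(new(i, rule.Name, null,
                            $"Column '{rule.Name}' cannot be null at row {i}"));
                }
                else if (value is string s && rule.MaxLength is int max && s.Length > max)
                {
                    errors.Add(new(i, rule.Name, s,
                        $"Column '{rule.Name}' value '{s}' exceeds max length {max} at row {i}"));
                }
            }
        }
        return errors;
    }
}
```

Collecting every error in the batch, rather than stopping at the first, matches the Errors list on BulkMergeValidationException.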
6. DI Registration
public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddDataSync(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Existing registrations...

        // Add bulk copy converters (generated)
        services.AddBulkCopyConverters();

        // Add bulk merge helper
        services.AddScoped<IBulkMergeHelper, BulkMergeHelper>();

        return services;
    }
}
Testing Strategy
Unit Tests (JdeScoping.DataSync.Tests)
| Test Class | Coverage |
|---|---|
| BulkMergeHelperTests | Expression parsing, SQL generation, batch splitting |
| ExpressionParserTests | Column name extraction from expressions |
| MergeSqlBuilderTests | Generated MERGE SQL correctness |
| DataReaderFactoryTests | Factory type resolution |
| ValidationTests | Schema validation logic |
| BulkMergeExceptionTests | Exception properties and formatting |
Integration Tests (JdeScoping.DataSync.IntegrationTests)
| Test Class | Coverage |
|---|---|
| BulkMergeHelperIntegrationTests | End-to-end merge against SQL Server |
| BatchingIntegrationTests | Large datasets, multiple batches |
| ValidationIntegrationTests | Schema validation against real table |
Key scenarios:
- Insert new records (WHEN NOT MATCHED)
- Update existing records (WHEN MATCHED)
- Conditional update respects updateWhen
- Composite primary key matching
- Batch processing (10k+ records across multiple batches)
- Temp table cleanup on success and failure
- Validation catches truncation before SQL error
Migration Plan
Code to Replace
| File | Change |
|---|---|
| StagingTableManager.cs | Delete - replaced by BulkMergeHelper |
| TableSyncOperation.cs | Simplify to use IBulkMergeHelper |
| LotFinderRepository.DataSync.cs | Remove bulk-related methods |
Before/After
Before:
await _stagingTableManager.CreateStagingTableAsync(...);
await _stagingTableManager.BulkCopyToStagingAsync(...);
await _stagingTableManager.MergeFromStagingAsync(...);
await _stagingTableManager.DropStagingTableAsync(...);
After:
var result = await _bulkMergeHelper.MergeAsync(
    data: fetcher.FetchAsync(lastUpdate),
    destinationTable: config.TableName,
    matchOn: config.MatchExpression,
    updateColumns: config.UpdateExpression,
    updateWhen: config.UpdateCondition,
    batchSize: _options.BatchSize);
Tests to Remove
- StagingTableManagerTests.cs (unit)
- StagingTableManagerTests.cs (integration)
Example Usage
// Simple case - match on single PK, update all columns
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => x.WorkOrderNumber);

// Full configuration (separate variable name so both calls can share a scope)
var fullResult = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => new { x.WorkOrderNumber, x.BranchCode },
    updateColumns: x => new { x.StatusCode, x.OrderQuantity, x.LastUpdateDt },
    updateWhen: (src, tgt) => src.LastUpdateDt > tgt.LastUpdateDt,
    batchSize: 10000,
    validateBeforeCopy: true);
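The updateWhen lambda in the full configuration has to end up as the AND condition of the WHEN MATCHED clause. A minimal sketch of that translation follows, limited to comparisons and AND/OR between direct property accesses on the two parameters. UpdateWhenTranslator is a hypothetical name, and constants, method calls, and type conversions are deliberately rejected.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical translator; names are illustrative only.
public static class UpdateWhenTranslator
{
    // Turns (src, tgt) => src.X > tgt.X into "(source.[X] > target.[X])".
    public static string ToSql<T>(Expression<Func<T, T, bool>> condition)
    {
        ParameterExpression src = condition.Parameters[0];
        ParameterExpression tgt = condition.Parameters[1];

        string Visit(Expression e) => e switch
        {
            BinaryExpression b => $"({Visit(b.Left)} {Op(b.NodeType)} {Visit(b.Right)})",
            MemberExpression { Expression: ParameterExpression p } m when p == src
                => $"source.[{m.Member.Name}]",
            MemberExpression { Expression: ParameterExpression p } m when p == tgt
                => $"target.[{m.Member.Name}]",
            _ => throw new NotSupportedException($"Unsupported expression: {e}")
        };

        return Visit(condition.Body);
    }

    private static string Op(ExpressionType type) => type switch
    {
        ExpressionType.GreaterThan => ">",
        ExpressionType.GreaterThanOrEqual => ">=",
        ExpressionType.LessThan => "<",
        ExpressionType.LessThanOrEqual => "<=",
        ExpressionType.Equal => "=",
        ExpressionType.NotEqual => "<>",
        ExpressionType.AndAlso => "AND",
        ExpressionType.OrElse => "OR",
        _ => throw new NotSupportedException($"Unsupported operator: {type}")
    };
}
```

Every binary node is parenthesized, so the output is more verbose than the hand-written MERGE shown earlier but needs no operator-precedence handling.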