# Bulk Merge Helper Design
**Date:** 2026-01-01

**Status:** Draft - Pending Review
## Overview
Replace the current `StagingTableManager` approach with a streamlined `IBulkMergeHelper` backed by source-generated `IDataReader` converters for efficient `SqlBulkCopy` operations.
## Goals
1. Simplify bulk merge operations to a single method call with expression-based configuration
2. Generate efficient `IAsyncEnumerable<T>` to `IDataReader` converters at compile time
3. Provide better error diagnostics with optional pre-validation
4. Remove manual staging table management code
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                       JdeScoping.DataSync                       │
│  ┌─────────────────────┐    ┌─────────────────────────────────┐ │
│  │ BulkCopyTypeRegistry│    │ IBulkMergeHelper                │ │
│  │ - Lists types to    │    │ - MergeAsync<T>(...)            │ │
│  │   generate for      │    │ - Uses IDataReaderFactory       │ │
│  └─────────────────────┘    │ - Builds MERGE SQL from exprs   │ │
│             │               └─────────────────────────────────┘ │
│             │ (analyzed by)                  │                  │
│             ▼                                │ (uses)           │
│  ┌─────────────────────┐                     ▼                  │
│  │ Source Generator    │    ┌─────────────────────────────────┐ │
│  │ - Generates         │───▶│ Generated Code:                 │ │
│  │   IDataReader       │    │ - WorkOrderDataReader           │ │
│  │   wrappers          │    │ - LotDataReader                 │ │
│  │ - Generates DI      │    │ - DataReaderFactory impl        │ │
│  │   registration      │    │ - AddBulkCopyConverters()       │ │
│  └─────────────────────┘    └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## Design Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Type identification | Explicit list in `BulkCopyTypeRegistry.cs` | Keeps Core project free of bulk copy concerns |
| Registry location | DataSync project | Consolidates bulk copy knowledge in one place |
| API style | Single method with expression parameters | Simple, all config visible in one place |
| Conditional updates | Explicit `updateWhen` expression | Flexible, not tied to property naming conventions |
| Error handling | Hybrid - context wrapping + optional validation | Balances performance with debuggability |
| Transactions | None - each batch independent | Matches current behavior, idempotent syncs |
| Generator project | Single `JdeScoping.DataSync.SourceGenerators` | Simple, can extract later if needed |
| DI pattern | Generic `IDataReaderFactory` | Single injection point, easy to mock |
| DELETE support | None | YAGNI, matches current behavior |
| Migration config | Convention + override | Less boilerplate, explicit when needed |
## Component Details
### 1. BulkCopyTypeRegistry
Location: `JdeScoping.DataSync/BulkCopyTypeRegistry.cs`
```csharp
namespace JdeScoping.DataSync;

public static class BulkCopyTypeRegistry
{
    public static readonly Type[] Types =
    [
        typeof(WorkOrder),
        typeof(Lot),
        typeof(LotUsage),
        typeof(Item),
        typeof(WorkCenter),
        typeof(ProfitCenter),
        typeof(JdeUser),
        typeof(Branch),
        typeof(MisData),
    ];
}
```
### 2. Source Generator
Project: `JdeScoping.DataSync.SourceGenerators`
**Generated DataReader wrapper (per type):**
```csharp
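public sealed class WorkOrderDataReader : IDataReader
{
    private readonly IAsyncEnumerator<WorkOrder> _enumerator;
    private WorkOrder? _current;

    private static readonly string[] _columnNames =
        ["WorkOrderNumber", "BranchCode", "LotNumber", ...];

    public object GetValue(int i) => i switch
    {
        0 => _current!.WorkOrderNumber,
        1 => _current!.BranchCode,
        // ... generated for each property
        _ => throw new IndexOutOfRangeException($"No column at ordinal {i}")
    };

    public bool Read()
    {
        // IDataReader.Read is synchronous, so block on the async enumerator.
        if (!_enumerator.MoveNextAsync().AsTask().GetAwaiter().GetResult())
        {
            _current = null;
            return false;
        }

        // Capture the row so GetValue reads from the current element.
        _current = _enumerator.Current;
        return true;
    }

    // IDataReader implementation...
}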
```
**Generated factory:**
```csharp
public sealed class DataReaderFactory : IDataReaderFactory
{
    public IDataReader CreateReader<T>(IAsyncEnumerable<T> source)
    {
        return source switch
        {
            IAsyncEnumerable<WorkOrder> wo => new WorkOrderDataReader(wo),
            IAsyncEnumerable<Lot> lot => new LotDataReader(lot),
            // ... one arm per type listed in BulkCopyTypeRegistry
            _ => throw new NotSupportedException($"No converter for {typeof(T).Name}")
        };
    }
}
```
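The design injects `IDataReaderFactory` but does not show its declaration. The following is a minimal sketch of the interface, inferred from the `CreateReader<T>` signature of the generated `DataReaderFactory`. The `RecordingDataReaderFactory` test double is hypothetical and only illustrates the "easy to mock" rationale from the design decisions table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Assumed shape: inferred from the generated DataReaderFactory, not taken from the design.
public interface IDataReaderFactory
{
    IDataReader CreateReader<T>(IAsyncEnumerable<T> source);
}

// Hypothetical test double: records which element types were requested and
// returns an empty reader, so a unit test can run without generated code.
public sealed class RecordingDataReaderFactory : IDataReaderFactory
{
    public List<Type> RequestedTypes { get; } = new();

    public IDataReader CreateReader<T>(IAsyncEnumerable<T> source)
    {
        RequestedTypes.Add(typeof(T));
        // A DataTable with no rows yields a reader whose Read() returns false.
        return new DataTable().CreateDataReader();
    }
}
```

A unit test for `BulkMergeHelper` could inject the recording factory and assert that the expected entity type was requested, without touching SQL Server.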
**Generated DI extension:**
```csharp
public static class BulkCopyServiceCollectionExtensions
{
    public static IServiceCollection AddBulkCopyConverters(this IServiceCollection services)
    {
        services.AddSingleton<IDataReaderFactory, DataReaderFactory>();
        return services;
    }
}
```
### 3. IBulkMergeHelper Interface
Location: `JdeScoping.DataSync/Contracts/IBulkMergeHelper.cs`
```csharp
using System.Linq.Expressions;

namespace JdeScoping.DataSync.Contracts;

public interface IBulkMergeHelper
{
    Task<MergeResult> MergeAsync<T>(
        IAsyncEnumerable<T> data,
        string destinationTable,
        Expression<Func<T, object>> matchOn,
        Expression<Func<T, object>>? updateColumns = null,
        Expression<Func<T, T, bool>>? updateWhen = null,
        Expression<Func<T, object>>? insertColumns = null,
        string? tempTableName = null,
        int batchSize = 0,
        bool validateBeforeCopy = false,
        CancellationToken cancellationToken = default);
}

public record MergeResult(
    int TotalRowsProcessed,
    int RowsInserted,
    int RowsUpdated,
    int BatchCount,
    TimeSpan Elapsed);
```
**Parameters:**
| Parameter | Purpose | Default |
|-----------|---------|---------|
| `data` | Source records to merge | required |
| `destinationTable` | Target SQL table name | required |
| `matchOn` | PK expression for MERGE ON clause | required |
| `updateColumns` | Columns to SET on match | null = all non-PK |
| `updateWhen` | Condition for UPDATE | null = always update |
| `insertColumns` | Columns for INSERT | null = all columns |
| `tempTableName` | Staging table name | `#TEMP_{table}` |
| `batchSize` | Rows per batch | 0 = all at once |
| `validateBeforeCopy` | Pre-validate data against schema | false |
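The `matchOn`, `updateColumns`, and `insertColumns` parameters all carry column lists as expressions. The sketch below shows one way to extract names from either a single property access or an anonymous-type projection. `ColumnSelector` is a hypothetical helper, and it assumes property names equal column names, which the design implies but does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper; assumes property names equal column names.
public static class ColumnSelector
{
    public static IReadOnlyList<string> GetColumnNames<T>(Expression<Func<T, object>> selector)
    {
        var body = Unbox(selector.Body);
        return body switch
        {
            // x => x.WorkOrderNumber
            MemberExpression member => new[] { member.Member.Name },
            // x => new { x.WorkOrderNumber, x.BranchCode }
            NewExpression created => created.Arguments.Select(a => MemberName(Unbox(a))).ToArray(),
            _ => throw new ArgumentException($"Unsupported column selector: {selector}", nameof(selector))
        };
    }

    // Value-typed members are wrapped in Convert(...) to satisfy the object return type.
    private static Expression Unbox(Expression e) =>
        e is UnaryExpression { NodeType: ExpressionType.Convert } u ? u.Operand : e;

    private static string MemberName(Expression e) =>
        e is MemberExpression m
            ? m.Member.Name
            : throw new ArgumentException($"Expected a property access but found: {e}");
}
```

With this sketch, `x => new { x.WorkOrderNumber, x.BranchCode }` yields the two names in declaration order. Anything other than a property access, such as a method call or constant, is rejected with an `ArgumentException`.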
### 4. BulkMergeHelper Implementation
Location: `JdeScoping.DataSync/Services/BulkMergeHelper.cs`
**Processing flow:**
```
1. Parse expressions → extract column names
   matchOn: x => new { x.A, x.B } → ["A", "B"]

2. Get destination table schema (for temp table creation)
   SELECT TOP 0 * FROM WorkOrder → column types/lengths

3. Create temp table matching destination schema
   CREATE TABLE #TEMP_WorkOrder (... same columns ...)

4. If validateBeforeCopy: load schema constraints

5. Stream data in batches:
   foreach batch:
     a. Collect batchSize records from IAsyncEnumerable
     b. If validate: check each row against schema
     c. Create IDataReader via IDataReaderFactory
     d. SqlBulkCopy to temp table
     e. Execute MERGE statement
     f. TRUNCATE temp table
     g. Accumulate inserted/updated counts

6. DROP temp table (in finally block)

7. Return MergeResult with totals
```
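Step 5a of the flow above collects `batchSize` records from the source stream. A minimal sketch of that step follows; `AsyncBatching.InBatchesAsync` is a hypothetical name, and treating `batchSize = 0` as "one batch with everything" follows the parameter table rather than any stated implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper illustrating step 5a.
public static class AsyncBatching
{
    public static async IAsyncEnumerable<IReadOnlyList<T>> InBatchesAsync<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Runs on first enumeration, because iterator bodies are deferred.
        ArgumentOutOfRangeException.ThrowIfNegative(batchSize);

        var batch = new List<T>();
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);

            // batchSize == 0 never flushes early, so everything lands in one batch.
            if (batchSize > 0 && batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>();
            }
        }

        // Emit the final partial batch, if any.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Note that `batchSize = 0` buffers the whole source in memory under this sketch. An implementation that streams the source straight into a single reader would avoid that cost.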
**Generated MERGE SQL:**
```sql
MERGE INTO [WorkOrder] AS target
USING [#TEMP_WorkOrder] AS source
ON target.[WorkOrderNumber] = source.[WorkOrderNumber]
   AND target.[BranchCode] = source.[BranchCode]
WHEN MATCHED AND source.[LastUpdateDt] > target.[LastUpdateDt] THEN
    UPDATE SET
        target.[StatusCode] = source.[StatusCode],
        target.[OrderQuantity] = source.[OrderQuantity],
        target.[LastUpdateDt] = source.[LastUpdateDt]
WHEN NOT MATCHED THEN
    INSERT ([WorkOrderNumber], [BranchCode], [StatusCode], ...)
    VALUES (source.[WorkOrderNumber], source.[BranchCode], ...);

SELECT @@ROWCOUNT;
```
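`SELECT @@ROWCOUNT` returns one combined total, while `MergeResult` reports `RowsInserted` and `RowsUpdated` separately. One possible way to obtain both counts, offered here as an assumption rather than the design's chosen approach, is to capture `$action` through the MERGE `OUTPUT` clause. The hypothetical `MergeSqlSketch.Build` below assembles such a statement from already-parsed column lists. It brackets the table name as a single identifier, so schema-qualified names would need extra handling.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder; names and output shape are assumptions, not the design's API.
public static class MergeSqlSketch
{
    // Brackets an identifier, doubling any embedded closing bracket.
    private static string Q(string name) => "[" + name.Replace("]", "]]") + "]";

    public static string Build(
        string destinationTable,
        string tempTable,
        IReadOnlyList<string> matchColumns,
        IReadOnlyList<string> allColumns,
        IReadOnlyList<string>? updateColumns = null,
        string? updateWhenSql = null)
    {
        // null updateColumns means every non-key column, as in the parameter table.
        IReadOnlyList<string> updates = updateColumns ?? allColumns.Except(matchColumns).ToList();

        var on = string.Join(" AND ", matchColumns.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        var set = string.Join(", ", updates.Select(c => $"target.{Q(c)} = source.{Q(c)}"));
        var insertColumns = string.Join(", ", allColumns.Select(c => Q(c)));
        var insertValues = string.Join(", ", allColumns.Select(c => $"source.{Q(c)}"));
        var condition = updateWhenSql is null ? "" : $" AND {updateWhenSql}";

        // Skip the UPDATE branch entirely when there is nothing to set.
        var matched = updates.Count == 0 ? "" : $"WHEN MATCHED{condition} THEN UPDATE SET {set}\n";

        return $"""
            DECLARE @actions TABLE (MergeAction nvarchar(10));
            MERGE INTO {Q(destinationTable)} AS target
            USING {Q(tempTable)} AS source
            ON {on}
            {matched}WHEN NOT MATCHED THEN
                INSERT ({insertColumns})
                VALUES ({insertValues})
            OUTPUT $action INTO @actions;
            SELECT
                COALESCE(SUM(CASE WHEN MergeAction = 'INSERT' THEN 1 ELSE 0 END), 0) AS RowsInserted,
                COALESCE(SUM(CASE WHEN MergeAction = 'UPDATE' THEN 1 ELSE 0 END), 0) AS RowsUpdated
            FROM @actions;
            """;
    }
}
```

The caller would read the single result row to fill `RowsInserted` and `RowsUpdated` for the batch. Because the table variable is declared inside the batch text, each execution starts with an empty `@actions`.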
### 5. Error Handling
**Exception hierarchy:**
```csharp
public class BulkMergeException : Exception
{
    public BulkMergeException(string message, Exception? innerException = null)
        : base(message, innerException) { }

    public required string TableName { get; init; }
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

public class BulkMergeValidationException : BulkMergeException
{
    public BulkMergeValidationException(string message)
        : base(message) { }

    public IReadOnlyList<ValidationError> Errors { get; init; } = [];
}

public record ValidationError(
    int RowIndex,
    string ColumnName,
    object? Value,
    string Message);
```
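The "context wrapping" half of the hybrid error-handling decision can be sketched as a small wrapper around each batch. `BatchRunner` is a hypothetical name, and `BulkMergeException` is repeated here with an assumed constructor so that the block is self-contained. Letting `OperationCanceledException` pass through unwrapped is also an assumption.

```csharp
using System;
using System.Threading.Tasks;

// Repeated from the hierarchy above; the constructor is an assumption.
public class BulkMergeException : Exception
{
    public BulkMergeException(string message, Exception inner) : base(message, inner) { }

    public required string TableName { get; init; }
    public int BatchNumber { get; init; }
    public int RowsInBatch { get; init; }
    public string? SqlStatement { get; init; }
}

// Hypothetical wrapper: runs one batch and attaches merge context to any failure.
public static class BatchRunner
{
    public static async Task RunAsync(
        string tableName, int batchNumber, int rowsInBatch, string sql, Func<Task> work)
    {
        try
        {
            await work();
        }
        catch (Exception ex) when (ex is not OperationCanceledException and not BulkMergeException)
        {
            throw new BulkMergeException(
                $"Bulk merge into '{tableName}' failed in batch {batchNumber} ({rowsInBatch} rows).", ex)
            {
                TableName = tableName,
                BatchNumber = batchNumber,
                RowsInBatch = rowsInBatch,
                SqlStatement = sql
            };
        }
    }
}
```

The original exception stays available as `InnerException`, so SQL error numbers are not lost when the context is added.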
**Validation checks (when `validateBeforeCopy: true`):**
| Check | Example Error |
|-------|---------------|
| String length | `"Column 'StatusCode' value 'TOOLONG' exceeds max length 5 at row 42"` |
| Null in non-nullable | `"Column 'WorkOrderNumber' cannot be null at row 17"` |
| Type mismatch | `"Column 'OrderQuantity' expected int, got string at row 89"` |
| Decimal precision | `"Column 'Amount' value 12345.6789 exceeds precision(10,2) at row 5"` |
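Two of the checks in the table, string length and null in a non-nullable column, can be sketched as follows. `ColumnRule` and `RowValidator` are hypothetical, as is the row representation as a name-to-value dictionary. The generated readers would more likely validate typed entities directly. `ValidationError` is repeated from the hierarchy above so the block stands alone.

```csharp
using System.Collections.Generic;

// Repeated from the exception hierarchy above.
public record ValidationError(int RowIndex, string ColumnName, object? Value, string Message);

// Hypothetical schema description for a single column.
public record ColumnRule(string Name, bool IsNullable, int? MaxLength);

// Hypothetical validator covering the null and string-length checks only.
public static class RowValidator
{
    public static List<ValidationError> Validate(
        IReadOnlyList<IReadOnlyDictionary<string, object?>> rows,
        IReadOnlyList<ColumnRule> rules)
    {
        var errors = new List<ValidationError>();
        for (var rowIndex = 0; rowIndex < rows.Count; rowIndex++)
        {
            foreach (var rule in rules)
            {
                // A missing key is treated the same as an explicit null.
                rows[rowIndex].TryGetValue(rule.Name, out var value);

                if (value is null)
                {
                    if (!rule.IsNullable)
                    {
                        errors.Add(new ValidationError(rowIndex, rule.Name, null,
                            $"Column '{rule.Name}' cannot be null at row {rowIndex}"));
                    }
                }
                else if (value is string s && rule.MaxLength is int max && s.Length > max)
                {
                    errors.Add(new ValidationError(rowIndex, rule.Name, s,
                        $"Column '{rule.Name}' value '{s}' exceeds max length {max} at row {rowIndex}"));
                }
            }
        }

        return errors;
    }
}
```

Collecting every error before throwing lets a `BulkMergeValidationException` report all bad rows in a batch at once instead of stopping at the first.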
### 6. DI Registration
```csharp
public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddDataSync(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Existing registrations...

        // Add bulk copy converters (generated)
        services.AddBulkCopyConverters();

        // Add bulk merge helper
        services.AddScoped<IBulkMergeHelper, BulkMergeHelper>();

        return services;
    }
}
```
## Testing Strategy
### Unit Tests (JdeScoping.DataSync.Tests)
| Test Class | Coverage |
|------------|----------|
| `BulkMergeHelperTests` | Expression parsing, SQL generation, batch splitting |
| `ExpressionParserTests` | Column name extraction from expressions |
| `MergeSqlBuilderTests` | Generated MERGE SQL correctness |
| `DataReaderFactoryTests` | Factory type resolution |
| `ValidationTests` | Schema validation logic |
| `BulkMergeExceptionTests` | Exception properties and formatting |
### Integration Tests (JdeScoping.DataSync.IntegrationTests)
| Test Class | Coverage |
|------------|----------|
| `BulkMergeHelperIntegrationTests` | End-to-end merge against SQL Server |
| `BatchingIntegrationTests` | Large datasets, multiple batches |
| `ValidationIntegrationTests` | Schema validation against real table |
**Key scenarios:**
- Insert new records (WHEN NOT MATCHED)
- Update existing records (WHEN MATCHED)
- Conditional update respects `updateWhen`
- Composite primary key matching
- Batch processing (10k+ records across multiple batches)
- Temp table cleanup on success and failure
- Validation catches truncation before SQL error
## Migration Plan
### Code to Replace
| File | Change |
|------|--------|
| `StagingTableManager.cs` | Delete - replaced by `BulkMergeHelper` |
| `TableSyncOperation.cs` | Simplify to use `IBulkMergeHelper` |
| `LotFinderRepository.DataSync.cs` | Remove bulk-related methods |
### Before/After
**Before:**
```csharp
await _stagingTableManager.CreateStagingTableAsync(...);
await _stagingTableManager.BulkCopyToStagingAsync(...);
await _stagingTableManager.MergeFromStagingAsync(...);
await _stagingTableManager.DropStagingTableAsync(...);
```
**After:**
```csharp
var result = await _bulkMergeHelper.MergeAsync(
    data: fetcher.FetchAsync(lastUpdate),
    destinationTable: config.TableName,
    matchOn: config.MatchExpression,
    updateColumns: config.UpdateExpression,
    updateWhen: config.UpdateCondition,
    batchSize: _options.BatchSize);
```
### Tests to Remove
- `StagingTableManagerTests.cs` (unit)
- `StagingTableManagerTests.cs` (integration)
## Example Usage
```csharp
// Simple case - match on single PK, update all columns
var result = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => x.WorkOrderNumber);

// Full configuration
var fullResult = await _bulkMergeHelper.MergeAsync(
    data: workOrders,
    destinationTable: "WorkOrder",
    matchOn: x => new { x.WorkOrderNumber, x.BranchCode },
    updateColumns: x => new { x.StatusCode, x.OrderQuantity, x.LastUpdateDt },
    updateWhen: (src, tgt) => src.LastUpdateDt > tgt.LastUpdateDt,
    batchSize: 10000,
    validateBeforeCopy: true);
```
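The `updateWhen` lambda in the full configuration has to become the `WHEN MATCHED AND ...` predicate of the MERGE statement. A minimal sketch of that translation is shown below, assuming the first lambda parameter maps to the `source` alias and the second to `target`. `UpdateWhenTranslator` is a hypothetical name, and only property comparisons joined by `&&` or `||` are handled.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical translator; supports only comparisons and AND/OR over properties.
public static class UpdateWhenTranslator
{
    public static string ToSql<T>(Expression<Func<T, T, bool>> condition) =>
        Visit(condition.Body, condition.Parameters[0], condition.Parameters[1]);

    private static string Visit(Expression e, ParameterExpression source, ParameterExpression target) => e switch
    {
        BinaryExpression b =>
            $"({Visit(b.Left, source, target)} {Operator(b.NodeType)} {Visit(b.Right, source, target)})",
        // Property access on one of the two lambda parameters becomes an aliased column.
        MemberExpression { Expression: ParameterExpression p } m when p == source => $"source.[{m.Member.Name}]",
        MemberExpression { Expression: ParameterExpression p } m when p == target => $"target.[{m.Member.Name}]",
        // Compiler-inserted conversions wrap the column reference; unwrap them.
        UnaryExpression { NodeType: ExpressionType.Convert } u => Visit(u.Operand, source, target),
        _ => throw new NotSupportedException($"Cannot translate '{e}' to SQL")
    };

    private static string Operator(ExpressionType type) => type switch
    {
        ExpressionType.GreaterThan => ">",
        ExpressionType.GreaterThanOrEqual => ">=",
        ExpressionType.LessThan => "<",
        ExpressionType.LessThanOrEqual => "<=",
        ExpressionType.Equal => "=",
        ExpressionType.NotEqual => "<>",
        ExpressionType.AndAlso => "AND",
        ExpressionType.OrElse => "OR",
        _ => throw new NotSupportedException($"Operator {type} is not supported")
    };
}
```

For the condition in the example above, this sketch produces `(source.[LastUpdateDt] > target.[LastUpdateDt])`. Constants and method calls are rejected on purpose: supporting them safely would mean emitting SQL parameters instead of inlined values.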