The WHERE clause was comparing Code to itself instead of the aliased table reference, which would always be true.
26 KiB
Old ETL Removal Design
Goal
Remove the legacy ETL implementation (Fetchers, MergeConfigurations, BulkMerge services, SourceGenerator) and wire the existing orchestration layer to use the new EtlPipeline system.
Background
The codebase has two parallel ETL implementations:
- OLD:
IDataFetcher<T>→BulkMergeHelper→IMergeConfiguration<T>with source-generatedIDataReaderimplementations - NEW:
EtlPipelinewithIImportSource→IDataTransformer→IImportDestination
The new pipeline is working well. The old implementation can be removed.
Design Decisions
- Keep orchestration layer -
DataSyncService,SyncOrchestrator,ScheduleCheckerremain; only internals change - Keep tracking -
DataUpdateRepositorypreserved; new pipeline writes sync timestamps - JSON config-driven pipelines - Pipeline definitions loaded from JSON files at runtime (not compiled code)
- Builder pattern for factory -
IEtlPipelineFactoryuses fluent builder:.ForTable().WithMode().Build() - Generic DbQuerySource - Single source class with connection type specified in config (not separate Oracle/Sybase classes)
- Conditional merge support - Extend
DbBulkMergeDestinationwithUpdateWhencondition - Relative time offsets - MinDt parameter uses TimeSpan format (e.g.,
"-7.00:00:00") computed at runtime - Config table names - Factory uses table names exactly as defined in config (e.g.,
WorkOrder_Curr) - MisData post-processing - Convert
MisDataPostProcessorto SQL post-script in pipeline - Sync mode mapping - Daily and Hourly both map to
incrementalmode; ScheduleChecker can override offset at runtime - Parameter mapping - Config defines parameter mappings for provider-specific syntax (
:dateUpdatedvs@MinDt) and format conversions (JDE Julian) - Destination override - Base destination config, sync modes can override destination type/settings
- Exclude list for updates - Default: update all non-match columns; config can specify
excludeFromUpdatefor exceptions - Implement new first - Build new factory/sources/config before deleting old code to keep build working
- Config as content file -
pipelines.jsoncopied to output directory, loaded from disk at runtime - PrePurge/ReIndex as scripts - PrePurge becomes pre-script (TRUNCATE), ReIndex becomes post-script
- Partial merge for overrides - Mode-specific destination config merges with base (only specified fields override)
- Generic parameters - Support arbitrary parameters with source types:
offset,static,runtime - Configurable timezone - JDE Julian conversion uses configurable timezone (UTC or local)
- Fail fast on missing config - Factory throws if requested sync mode not defined in config
- Runtime parameters deferred - Only
offsetandstaticparameter sources supported; throw ifruntimeused - Pipeline config owns PrePurge/ReIndex - Remove schedule flags; pipeline config is single source of truth
- JSON camelCase - Use
JsonSerializerOptionswithPropertyNameCaseInsensitive = true
Files to Delete
Source Generator Project (entire project)
src/JdeScoping.DataSync.SourceGenerators/
├── DataReaderGenerator.cs
├── IsExternalInit.cs
└── JdeScoping.DataSync.SourceGenerators.csproj
DataSync Source Files (~32 files)
src/JdeScoping.DataSync/
├── BulkCopyTypeRegistry.cs
├── Contracts/
│ ├── IBulkMergeHelper.cs
│ ├── IDataFetcher.cs
│ ├── IDataReaderFactory.cs
│ ├── IMergeConfiguration.cs
│ ├── IMergeConfigurationRegistry.cs
│ ├── IPostProcessor.cs
│ └── ISchemaValidator.cs
├── Configuration/MergeConfigurations/
│ ├── BranchMergeConfiguration.cs
│ ├── ItemMergeConfiguration.cs
│ ├── JdeUserMergeConfiguration.cs
│ ├── LotMergeConfiguration.cs
│ ├── LotUsageMergeConfiguration.cs
│ ├── MisDataMergeConfiguration.cs
│ ├── ProfitCenterMergeConfiguration.cs
│ ├── WorkCenterMergeConfiguration.cs
│ └── WorkOrderMergeConfiguration.cs
├── Exceptions/BulkMergeException.cs
├── Fetchers/
│ ├── Cms/CmsMisDataFetcher.cs
│ └── Jde/
│ ├── JdeBranchFetcher.cs
│ ├── JdeItemFetcher.cs
│ ├── JdeLotFetcher.cs
│ ├── JdeLotUsageFetcher.cs
│ ├── JdeProfitCenterFetcher.cs
│ ├── JdeUserFetcher.cs
│ ├── JdeWorkCenterFetcher.cs
│ └── JdeWorkOrderFetcher.cs
├── Models/
│ ├── ColumnSchema.cs
│ └── MergeResult.cs
└── Services/
├── BulkMergeHelper.cs
├── ExpressionParser.cs
├── MergeConfigurationRegistry.cs
├── MergeSqlBuilder.cs
├── MisDataPostProcessor.cs
└── SchemaValidator.cs
Test Files (~8 files)
tests/JdeScoping.DataSync.Tests/
├── Services/
│ ├── BulkMergeHelperTests.cs
│ ├── ExpressionParserTests.cs
│ ├── MergeConfigurationRegistryTests.cs
│ ├── MergeSqlBuilderTests.cs
│ └── SchemaValidatorTests.cs
└── TableSyncOperationTests.cs
tests/JdeScoping.DataSync.IntegrationTests/
├── BulkMergeHelperTests.cs
└── TableSyncOperationTests.cs
Integration Test Infrastructure (entire folder)
tests/JdeScoping.DataSync.IntegrationTests/Infrastructure/
├── TestDataReaderFactory.cs
├── BulkMergeTestEntityDataReader.cs
├── BulkMergeTestEntity.cs
├── TestDatabaseInitializer.cs
├── TestDbConnectionFactory.cs
├── TestDataGenerator.cs
└── SqlServerFixture.cs
Consider removing entire tests/JdeScoping.DataSync.IntegrationTests/ project if all tests are obsolete.
Files to Modify
1. TableSyncOperation.cs - Major Rewrite
Current: Uses IDataFetcher<T>, IBulkMergeHelper, IMergeConfiguration<T>
New: Uses IEtlPipelineFactory to get and execute pipelines
2. DependencyInjection.cs - Remove Old Registrations
Remove:
using JdeScoping.DataSync.Generated;statement- All
IDataFetcher<T>registrations - All
IMergeConfiguration<T>registrations IBulkMergeHelper,IDataReaderFactory,ISchemaValidatorIMergeConfigurationRegistryIPostProcessor,MisDataPostProcessor- Named fetcher registrations
DataReaderFactoryregistration
Keep:
DataSyncService,ISyncOrchestrator,IScheduleCheckerIDataUpdateRepository- Health check and metrics
Add:
IEtlPipelineFactoryregistrationPipelineOptionsconfiguration binding
3. DataSourceConfig.cs - Remove Unused Properties
Remove these properties:
FetcherTypeNamePostProcessorTypeNamePrepurgeData(now in pipeline config)ReIndexData(now in pipeline config)
4. appsettings.json / appsettings.Development.json
Remove FetcherTypeName and PostProcessorTypeName from data source configurations
5. JdeScoping.slnx
Remove JdeScoping.DataSync.SourceGenerators project reference
6. JdeScoping.DataSync.csproj
- Remove reference to SourceGenerators project
- Remove
InternalsVisibleTofor integration tests if project removed
7. Tests to Update (not delete)
ScheduleCheckerTests.cs- Update test fixtures to remove FetcherTypeNameSyncOrchestratorTests.cs- Update test fixtures to remove FetcherTypeName
Files to Create
1. Pipeline Configuration Files
Pipeline definitions stored in JSON, copied to output directory at build time.
Location: src/JdeScoping.DataSync/Pipelines/pipelines.json
Project file entry:
<ItemGroup>
<Content Include="Pipelines\pipelines.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
</ItemGroup>
Complete Schema Example:
{
"settings": {
"timezone": "UTC" // or "Local" - used for JDE Julian conversion
},
"pipelines": {
"WorkOrder_Curr": {
"source": {
"connection": "jde",
"query": "SELECT WADOCO, WADC0J, ... FROM PRODDTA.F4801 WHERE UPMJ >= :dateUpdated",
"parameters": {
"minDt": {
"name": ":dateUpdated",
"format": "jdeJulian",
"source": "offset"
}
}
},
"syncModes": {
"mass": {
"minDtOffset": "-365.00:00:00",
"prePurge": true,
"reIndex": true
},
"incremental": {
"minDtOffset": "-1.00:00:00"
}
},
"transformers": [
{ "type": "jdeDate", "columns": ["OrderDate", "CompletionDate", "StartDate"] },
{ "type": "columnRename", "mappings": { "WADOCO": "OrderNumber", "WADC0J": "Branch" } }
],
"destination": {
"table": "WorkOrder_Curr",
"matchColumns": ["OrderNumber"],
"excludeFromUpdate": ["CreatedDate"]
}
},
"MisData": {
"source": {
"connection": "cms",
"query": "SELECT ... FROM MIS_DATA WHERE LastUpdate >= @MinDt",
"parameters": {
"minDt": {
"name": "@MinDt",
"source": "offset"
}
}
},
"syncModes": {
"mass": {
"minDtOffset": "-365.00:00:00",
"prePurge": true,
"destination": { "type": "bulkImport" }
},
"incremental": {
"minDtOffset": "-7.00:00:00",
"destination": { "type": "bulkMerge" }
}
},
"destination": {
"table": "MisData",
"matchColumns": ["MisDataId"]
},
"postScripts": [
"UPDATE MisData SET ProcessedFlag = 1 WHERE ProcessedFlag IS NULL"
]
}
}
}
PrePurge/ReIndex behavior:
prePurge: true→ Factory adds pre-script:TRUNCATE TABLE [TableName]reIndex: true→ Factory adds post-script:ALTER INDEX ALL ON [TableName] REBUILD
2. Pipeline Configuration Models
PipelinesRoot.cs - Root config structure
namespace JdeScoping.DataSync.Configuration;
public record PipelinesRoot(
PipelineSettings Settings,
Dictionary<string, PipelineConfig> Pipelines);
public record PipelineSettings(
string Timezone = "UTC"); // "UTC" or "Local"
PipelineConfig.cs
namespace JdeScoping.DataSync.Configuration;
public record PipelineConfig(
SourceConfig Source,
Dictionary<string, SyncModeConfig> SyncModes,
List<TransformerConfig>? Transformers,
DestinationConfig Destination,
List<string>? PreScripts,
List<string>? PostScripts);
public record SourceConfig(
string Connection, // "jde", "cms", "lotfinder"
string Query,
Dictionary<string, ParameterConfig>? Parameters);
public record ParameterConfig(
string Name, // Provider-specific: ":dateUpdated" or "@MinDt"
string? Format, // "jdeJulian", "jdeTime", null for DateTime
string Source = "offset", // "offset", "static", "runtime"
string? Value); // For static source
public record SyncModeConfig(
string? MinDtOffset, // TimeSpan format: "-7.00:00:00"
bool PrePurge = false,
bool ReIndex = false,
string? UpdateWhen = null, // Conditional update expression
DestinationOverride? Destination = null); // Override base destination (partial merge)
public record DestinationOverride(
string? Type, // "bulkImport" or "bulkMerge"
List<string>? MatchColumns, // Override match columns
List<string>? ExcludeFromUpdate); // Override exclude list
public record TransformerConfig(
string Type,
List<string>? Columns,
Dictionary<string, string>? Mappings);
public record DestinationConfig(
string Table,
List<string>? MatchColumns, // For merge operations
List<string>? ExcludeFromUpdate); // Columns to skip on update
PipelineOptions.cs
namespace JdeScoping.DataSync.Options;
public class PipelineOptions
{
public const string SectionName = "Pipelines";
public string ConfigPath { get; set; } = "Pipelines/pipelines.json";
}
Config loading with validation:
private static readonly JsonSerializerOptions JsonOptions = new()
{
PropertyNameCaseInsensitive = true,
ReadCommentHandling = JsonCommentHandling.Skip,
AllowTrailingCommas = true
};
private PipelinesRoot LoadPipelineConfigs(string configPath)
{
// Resolve path relative to assembly location (handles both debug and publish)
var assemblyDir = Path.GetDirectoryName(typeof(EtlPipelineFactory).Assembly.Location)!;
var fullPath = Path.Combine(assemblyDir, configPath);
if (!File.Exists(fullPath))
throw new FileNotFoundException($"Pipeline config not found: {fullPath}");
var json = File.ReadAllText(fullPath);
var root = JsonSerializer.Deserialize<PipelinesRoot>(json, JsonOptions)
?? throw new InvalidOperationException("Failed to deserialize pipeline config");
// Validate all pipelines have required sync modes
foreach (var (name, config) in root.Pipelines)
{
if (!config.SyncModes.ContainsKey("mass"))
throw new InvalidOperationException($"Pipeline '{name}' missing 'mass' sync mode");
if (!config.SyncModes.ContainsKey("incremental"))
throw new InvalidOperationException($"Pipeline '{name}' missing 'incremental' sync mode");
// Validate no runtime parameters (not yet supported)
if (config.Source.Parameters != null)
{
foreach (var (paramName, paramConfig) in config.Source.Parameters)
{
if (paramConfig.Source == "runtime")
throw new NotSupportedException(
$"Pipeline '{name}' parameter '{paramName}': runtime source not yet supported");
}
}
}
return root;
}
3. IEtlPipelineFactory.cs (Builder Pattern)
namespace JdeScoping.DataSync.Contracts;
public interface IEtlPipelineFactory
{
IEtlPipelineBuilder ForTable(string tableName);
}
public interface IEtlPipelineBuilder
{
IEtlPipelineBuilder WithMode(SyncMode mode);
IEtlPipelineBuilder WithMinimumDate(DateTime? minDt); // Override config offset
EtlPipeline Build();
}
// Note: No WithPrePurge/WithReIndex - pipeline config is source of truth
public enum SyncMode
{
Mass, // Full refresh - uses bulkImport by default
Incremental // Delta sync - uses bulkMerge by default
}
4. EtlPipelineFactory.cs
namespace JdeScoping.DataSync.Services;
public class EtlPipelineFactory : IEtlPipelineFactory
{
private readonly IDbConnectionFactory _connectionFactory;
private readonly Dictionary<string, PipelineConfig> _configs;
public EtlPipelineFactory(
IDbConnectionFactory connectionFactory,
IOptions<PipelineOptions> options)
{
_connectionFactory = connectionFactory;
_configs = LoadPipelineConfigs(options.Value.ConfigPath);
}
public IEtlPipelineBuilder ForTable(string tableName)
{
if (!_configs.TryGetValue(tableName, out var config))
throw new ArgumentException($"No pipeline configured for table: {tableName}");
return new PipelineBuilder(_connectionFactory, tableName, config);
}
private class PipelineBuilder : IEtlPipelineBuilder
{
private readonly IDbConnectionFactory _connectionFactory;
private readonly string _tableName;
private readonly PipelineConfig _config;
private readonly PipelineSettings _settings;
private SyncMode _mode = SyncMode.Incremental;
private DateTime? _minDtOverride;
public EtlPipeline Build()
{
var modeKey = _mode == SyncMode.Mass ? "mass" : "incremental";
var modeConfig = _config.SyncModes[modeKey]; // Already validated at load
// Compute MinDt from offset or override
var minDt = _minDtOverride ?? ComputeMinDt(modeConfig.MinDtOffset);
// Create source with parameter substitution
var source = CreateSource(_config.Source, minDt);
// Determine destination type (mode override > default by mode)
var destType = modeConfig.Destination?.Type
?? (_mode == SyncMode.Mass ? "bulkImport" : "bulkMerge");
var destination = CreateDestination(destType, _config.Destination, modeConfig);
// Build pipeline with scripts
var builder = new EtlPipelineBuilder()
.WithName(_tableName)
.WithSource(source)
.WithDestination(destination);
// Add pre-scripts: config scripts first, then prePurge
foreach (var script in _config.PreScripts ?? [])
builder.WithPreScript(new SqlScriptRunner(_connectionFactory, script));
if (modeConfig.PrePurge)
builder.WithPreScript(new SqlScriptRunner(_connectionFactory,
$"TRUNCATE TABLE [{_config.Destination.Table}]"));
// Add post-scripts: reIndex first, then config scripts
if (modeConfig.ReIndex)
builder.WithPostScript(new SqlScriptRunner(_connectionFactory,
$"ALTER INDEX ALL ON [{_config.Destination.Table}] REBUILD"));
foreach (var script in _config.PostScripts ?? [])
builder.WithPostScript(new SqlScriptRunner(_connectionFactory, script));
// Add transformers
foreach (var t in _config.Transformers ?? [])
builder.WithTransformer(CreateTransformer(t));
return builder.Build();
}
}
}
5. DbQuerySource.cs (Generic)
namespace JdeScoping.DataSync.Etl.Sources;
/// <summary>
/// Generic database query source that works with any connection type.
/// </summary>
public class DbQuerySource : IImportSource
{
private readonly IDbConnectionFactory _connectionFactory;
private readonly string _connectionType; // "jde", "cms", "lotfinder"
private readonly string _query;
private readonly Dictionary<string, object> _parameters;
public string SourceName => $"DbQuery:{_connectionType}";
public DbQuerySource(
IDbConnectionFactory connectionFactory,
string connectionType,
string query,
Dictionary<string, object>? parameters = null)
{
_connectionFactory = connectionFactory;
_connectionType = connectionType;
_query = query;
_parameters = parameters ?? new();
}
public async Task<IDataReader> ReadDataAsync(CancellationToken ct)
{
var connection = _connectionType switch
{
"jde" => await _connectionFactory.CreateJdeConnectionAsync(),
"cms" => await _connectionFactory.CreateCmsConnectionAsync(),
"lotfinder" => await _connectionFactory.CreateLotFinderConnectionAsync(),
_ => throw new ArgumentException($"Unknown connection type: {_connectionType}")
};
var command = connection.CreateCommand();
command.CommandText = _query;
foreach (var (name, value) in _parameters)
{
var param = command.CreateParameter();
param.ParameterName = name;
param.Value = value;
command.Parameters.Add(param);
}
return await command.ExecuteReaderAsync(CommandBehavior.CloseConnection, ct);
}
}
6. Parameter Format Converters
namespace JdeScoping.DataSync.Services;
public class ParameterFormatConverter
{
private readonly TimeZoneInfo _timezone;
public ParameterFormatConverter(string timezone)
{
_timezone = timezone.ToUpperInvariant() switch
{
"UTC" => TimeZoneInfo.Utc,
"LOCAL" => TimeZoneInfo.Local,
_ => TimeZoneInfo.FindSystemTimeZoneById(timezone)
};
}
public object Convert(DateTime minDt, string? format)
{
// Convert to configured timezone
var adjusted = TimeZoneInfo.ConvertTime(minDt, _timezone);
return format switch
{
"jdeJulian" => ToJdeJulianDate(adjusted),
"jdeTime" => ToJdeTime(adjusted),
null => adjusted,
_ => throw new ArgumentException($"Unknown format: {format}")
};
}
private static int ToJdeJulianDate(DateTime date)
{
// JDE Julian: CYYDDD where C=century (0=19xx, 1=20xx), YY=year, DDD=day of year
int century = date.Year >= 2000 ? 1 : 0;
int year = date.Year % 100;
int dayOfYear = date.DayOfYear;
return century * 100000 + year * 1000 + dayOfYear;
}
private static int ToJdeTime(DateTime time)
{
// JDE Time: HHMMSS
return time.Hour * 10000 + time.Minute * 100 + time.Second;
}
}
7. DbBulkMergeDestination Extension
Extend to support conditional updates and exclude columns:
// Updated constructor
public DbBulkMergeDestination(
IDbConnectionFactory connectionFactory,
string tableName,
string[] matchColumns,
string[]? excludeFromUpdate = null,
string? updateCondition = null) // e.g., "source.LastUpdateDt > target.LastUpdateDt"
Modify MERGE SQL generation:
MERGE INTO [Target] AS target
USING #TempTable AS source
ON target.OrderNumber = source.OrderNumber
WHEN MATCHED AND source.LastUpdateDt > target.LastUpdateDt THEN -- updateCondition
UPDATE SET
target.Col1 = source.Col1,
-- excludeFromUpdate columns omitted
WHEN NOT MATCHED THEN
INSERT (...)
VALUES (...);
TableSyncOperation.cs - Error Handling
EtlPipeline.ExecuteAsync returns PipelineResult with Success=false on failure (doesn't throw).
TableSyncOperation must check Success and throw to keep DataUpdate records correct:
var pipeline = _pipelineFactory
.ForTable(config.TableName)
.WithMode(updateTask.IsMassUpdate ? SyncMode.Mass : SyncMode.Incremental)
.WithMinimumDate(updateTask.MinimumDt) // ScheduleChecker can override
.Build();
var result = await pipeline.ExecuteAsync(ct);
if (!result.Success)
throw new InvalidOperationException($"Pipeline failed for {config.TableName}: {result.ErrorMessage}");
Files to Keep (No Changes)
DataSyncService.cs- Background service hostSyncOrchestrator.cs- Orchestration logicScheduleChecker.cs- Schedule checking logic (provides mode and can override offset)DataUpdateRepository.cs- Sync timestamp trackingDataSyncHealthCheck.cs- Health monitoringDataSyncMetrics.cs- TelemetryDataSyncActivitySource.cs- Tracing
Test Files to Keep (with updates)
SyncOrchestratorTests.cs- Update fixtures to remove FetcherTypeNameScheduleCheckerTests.cs- Update fixtures to remove FetcherTypeNameDataSyncServiceTests.csDataSyncHealthCheckTests.cs
New Tests to Create
EtlPipelineFactoryTests.cs- Test config loading and pipeline buildingDbQuerySourceTests.cs- Test connection type switching and parameter handlingParameterFormatConverterTests.cs- Test JDE Julian/time conversionsDbBulkMergeDestinationTests.cs- Test UpdateWhen and excludeFromUpdate
Validation & Precedence Rules
Required Fields (fail at config load)
source.connection- must be "jde", "cms", or "lotfinder"source.query- must be non-emptydestination.table- must be non-emptysyncModes.massandsyncModes.incremental- both required
Precedence Rules
- MinDt:
WithMinimumDate()override > configminDtOffsetcomputation - PrePurge/ReIndex: Removed from builder; pipeline config is only source
- Scripts order: Config
preScriptsrun first, then generated prePurge script; generated reIndex script runs first, then configpostScripts - Destination merge: Mode-specific fields override base; missing fields inherit from base
Parameter Resolution
offset: Computed fromminDtOffset+ current time; format conversion appliedstatic: Value taken from configvaluefield; must be present; no format conversionruntime: ThrowsNotSupportedException(deferred)
Risk Assessment
Low risk:
- New ETL pipeline already working and tested
- Orchestration layer unchanged (just different internals)
- Clear separation between old and new code
Medium risk:
- Generic DbQuerySource with multiple connection types - need testing
- Conditional merge (UpdateWhen) is new - need testing
- JSON config loading is new - need validation
- Parameter format conversion (JDE Julian) - need testing
Migration Path
Phase 1: Build New (keep build working)
- Create pipeline config models and options
- Create
DbQuerySource(generic) - Extend
DbBulkMergeDestinationwith UpdateWhen and excludeFromUpdate - Create
ParameterFormatConverter - Create
IEtlPipelineFactoryandEtlPipelineFactory - Create
pipelines.jsonconfig file - Register new services in DI (alongside old)
Phase 2: Wire Up
8. Update TableSyncOperation to use IEtlPipelineFactory
9. Update DependencyInjection.cs to wire new factory
10. Test end-to-end with new pipeline
Phase 3: Clean Up 11. Delete old source files (Fetchers, MergeConfigurations, BulkMerge services) 12. Delete old contracts 13. Delete SourceGenerator project 14. Update solution file 15. Update tests (remove FetcherTypeName references) 16. Delete obsolete test files and infrastructure