Files
2026-01-03 15:41:21 -05:00

7.7 KiB

Troubleshooting

This document covers common errors, debugging patterns, and performance tuning for the ETL pipeline.

Common Errors

Column mapping errors

Error Cause Resolution
"No columns from source exist in destination table" Source column names don't match destination Check source query column aliases match destination table columns exactly (case-insensitive)
"Column name collision" Transformer creates duplicate column names Review rename mappings; ensure no two columns map to the same output name
"Column '{name}' not found or was dropped" Accessing a column that was dropped Check transformer chain order; don't access dropped columns in later transformers

Computed column errors

Error Cause Resolution
"GetBytes not supported for computed column at ordinal N" Binary access on transformed column Use GetValue() instead; computed columns (like JDE dates) don't support binary access
"GetChars not supported for computed column at ordinal N" Same as above Use GetValue() or GetString()
"GetData not supported for computed column at ordinal N" Same as above Computed columns can't return nested readers

Timeout errors

Error Cause Resolution
SqlException: Timeout expired during bulk copy Large dataset, slow network Increase commandTimeoutSeconds on destination
SqlException: Timeout expired during MERGE Many rows to match Increase timeout; consider smaller batches
SqlException: Timeout expired during script Index rebuild on large table Increase script timeoutSeconds (default 3600s for rebuild)

Validation errors

Error Cause Resolution
"Source is required. Call WithSource() before Build()" Missing source in pipeline Add .WithSource() to builder chain
"Destination is required. Call WithDestination() before Build()" Missing destination in pipeline Add .WithDestination() to builder chain
"At least one match column is required" Empty matchColumns array Provide key columns for MERGE matching
"Timeout must be between 0 and 24 hours" Invalid timeout value Use TimeSpan between 0 and 24 hours

Debugging Patterns

Inspecting pipeline results

Check PipelineResult after execution to understand what happened:

var result = await pipeline.ExecuteAsync(cancellationToken);

if (!result.Success)
{
    logger.LogError(result.Error, "Pipeline failed after {Rows} rows in {Elapsed}",
        result.TotalRows, result.Elapsed);

    // Find which step failed
    var lastStep = result.Steps.LastOrDefault();
    if (lastStep != null)
    {
        logger.LogError("Failed at step: {Step} ({Type})",
            lastStep.StepName, lastStep.StepType);
    }
}

Tracking step-by-step progress

Each step records timing and row counts:

foreach (var step in result.Steps)
{
    logger.LogInformation("Step {Name} ({Type}): {Rows} rows in {Elapsed}ms",
        step.StepName,
        step.StepType,
        step.RowsAffected,
        step.Elapsed.TotalMilliseconds);
}

Enabling detailed logging

Inject a logger into the pipeline for execution-level logging:

var pipeline = new EtlPipelineBuilder()
    .WithName("DebugPipeline")
    .WithSource(source)
    .WithDestination(destination)
    .WithLogger(loggerFactory.CreateLogger<EtlPipeline>())
    .Build();

Pipeline logs include:

  • Information: Pipeline start/complete with row counts
  • Debug: Individual script execution
  • Error: Failure with exception and last step

Identifying the failure point

When a pipeline fails, PipelineResult.Steps contains all completed steps:

if (!result.Success)
{
    // Steps completed before failure
    var completedSteps = result.Steps.Select(s => s.StepName);
    logger.LogError("Completed steps: {Steps}", string.Join(" → ", completedSteps));

    // The exception contains root cause
    logger.LogError(result.Error, "Root cause");
}

Performance Tuning

Batch size optimization

Default batch size is 10,000 rows. Adjust based on row width:

Row Size Recommended Batch Size
Narrow (< 20 columns) 10,000 - 50,000
Medium (20-50 columns) 5,000 - 10,000
Wide (> 50 columns) 1,000 - 5,000
// Large batch for narrow rows
new DbBulkImportDestination(factory, "LookupTable", batchSize: 50000)

// Small batch for wide rows
new DbBulkMergeDestination(factory, "DetailTable", matchColumns, batchSize: 2000)

Index management for bulk loads

Disable indexes before large imports, rebuild after:

var pipeline = new EtlPipelineBuilder()
    .WithName("FullTableRefresh")
    .WithPreScript(CommonScripts.DisableIndexes(factory, "LargeTable"))
    .WithSource(source)
    .WithDestination(new DbBulkImportDestination(factory, "LargeTable"))
    .WithPostScript(CommonScripts.RebuildIndexes(factory, "LargeTable"))
    .WithPostScript(CommonScripts.UpdateStatistics(factory, "LargeTable"))
    .Build();

When to use:

  • Full table refreshes (TRUNCATE + import)
  • Tables with 3+ non-clustered indexes
  • Import of 100,000+ rows

When to skip:

  • Incremental merges with few rows
  • Tables with only a clustered index
  • Frequent small updates

Timeout sizing guidelines

Operation Rows Suggested Timeout
Bulk import < 100K 600s (default)
Bulk import 100K - 1M 1800s (30 min)
Bulk import > 1M 3600s (1 hour)
Bulk merge < 50K 600s (default)
Bulk merge 50K - 500K 1800s (30 min)
Index rebuild Any 3600s (default)
// Large table with extended timeout
new DbBulkMergeDestination(factory, "HistoricalData",
    matchColumns: new[] { "RecordId" },
    commandTimeoutSeconds: 1800)

Reducing network and memory usage

Select only needed columns in the source query:

// Good - select only needed columns
var source = new DbQuerySource(factory,
    "SELECT OrderNumber, Status, StartDate FROM JDE.WorkOrders");

// Avoid - selecting all columns wastes bandwidth
var source = new DbQuerySource(factory,
    "SELECT * FROM JDE.WorkOrders");

Extra columns in the source are ignored by the destination column mapping, but they still consume network bandwidth and memory.

Monitoring baseline performance

Track PipelineResult.Elapsed over time to detect degradation:

var result = await pipeline.ExecuteAsync(ct);

metrics.RecordPipeline(
    pipelineName: pipeline.PipelineName,
    success: result.Success,
    rows: result.TotalRows,
    durationMs: result.Elapsed.TotalMilliseconds);

// Alert if duration exceeds baseline by 50%
if (result.Elapsed > baselineDuration * 1.5)
{
    logger.LogWarning("Pipeline {Name} took {Elapsed} (baseline: {Baseline})",
        pipeline.PipelineName, result.Elapsed, baselineDuration);
}

Step-level performance analysis

Identify slow steps using StepResult.Elapsed:

var slowSteps = result.Steps
    .Where(s => s.Elapsed > TimeSpan.FromSeconds(30))
    .OrderByDescending(s => s.Elapsed);

foreach (var step in slowSteps)
{
    logger.LogWarning("Slow step: {Name} took {Elapsed}",
        step.StepName, step.Elapsed);
}