jdescopingtool/openspec/specs/data-sync/spec.md
Joseph Doherty 26ff8d9b4f Initial commit: JDE Scoping Tool migration project
Set up repository with legacy .NET Framework 4.8 source (OLD/),
new .NET 10 Blazor solution (NEW/), OpenSpec specifications,
documentation, and project configuration.
2026-01-02 07:43:29 -05:00

Data Sync Specification

Purpose

The Data Sync subsystem maintains a local SQL Server cache of enterprise data from JDE (JD Edwards - Oracle) and CMS (Sybase) source systems. Implemented as a .NET 10 BackgroundService, it enables fast search operations by synchronizing data on configurable schedules (mass/daily/hourly) and uses incremental updates with MERGE operations to minimize data transfer while keeping the cache current. The service integrates with the ASP.NET Core hosting model, supporting graceful shutdown, health checks, and telemetry.

Source Reference

| Legacy Files | Purpose |
|---|---|
| OLD/WorkerService/Process/UpdateProcessor.cs | Main sync orchestration, schedule checking, update execution |
| OLD/WorkerService/Process/UpdateProcessor.TableManagement.cs | Staging table creation, MERGE generation, bulk copy, index management |
| OLD/WorkerService/Process/UpdateProcessor.DataUpdateEntry.cs | Update logging, history tracking, cleanup |
| OLD/WorkerService/dsconfig/*.json | Per-table sync configuration files |
| OLD/WorkerService/Models/DataSourceConfig.cs | Configuration model with fetch functions |
| OLD/WorkerService/Models/DataUpdateConfig.cs | Schedule configuration (interval, prepurge, reindex) |
| OLD/WorkerService/Process/WorkProcessor.cs | Work loop that triggers sync checks |
| OLD/Database/Views/LastDataUpdates.sql | View for determining last successful sync per table/type |

Requirements

Requirement: Background Service Lifecycle

The system SHALL implement data synchronization as a .NET BackgroundService with proper lifecycle management.

Inputs

  • CancellationToken from the host for graceful shutdown signals
  • IServiceScopeFactory for creating scoped services per sync operation
  • IOptions<DataSyncOptions> for configuration

Outputs

  • Long-running background task that processes sync schedules
  • Graceful shutdown with in-progress operation completion or cancellation

Business Rules

  • The service MUST inherit from BackgroundService and implement ExecuteAsync
  • The service SHALL respect CancellationToken for graceful shutdown
  • Each sync operation MUST create a new IServiceScope via IServiceScopeFactory
  • At startup, the service MUST call CloseOpenUpdateEntries() to mark interrupted syncs as failed
  • The service SHALL call PurgeUpdateEntries() periodically to clean old history records
  • The main loop SHALL use Task.Delay with the cancellation token between sync checks

Scenario: Service startup initialization

  • WHEN the BackgroundService starts
  • THEN the system SHALL invoke CloseOpenUpdateEntries() to mark any DataUpdate records with NumberRecords = -2 as failed
  • THEN the system SHALL begin the main sync check loop

Scenario: Graceful shutdown during sync

  • WHEN the host signals shutdown via CancellationToken
  • AND a sync operation is in progress
  • THEN the cancellation token SHALL propagate to all child operations
  • THEN the service SHALL wait for current batch completion or cancel gracefully
  • THEN any incomplete syncs SHALL be marked as failed with WasSuccessful = false

Scenario: Scoped service creation per sync

  • WHEN a sync operation begins
  • THEN the system SHALL create a new IServiceScope
  • THEN all services for that sync operation SHALL be resolved from the scope
  • THEN the scope SHALL be disposed after the sync completes or fails

Requirement: Strongly-Typed Configuration

The system SHALL use strongly-typed options classes bound from configuration instead of JSON file parsing with reflection.

Inputs

  • IOptions<DataSyncOptions> injected via dependency injection
  • Configuration bound from appsettings.json or environment variables

Outputs

  • DataSyncOptions containing global sync settings
  • DataSourceOptions containing per-table configuration
  • Type-resolved IDataFetcher<T> implementations

Business Rules

  • Configuration SHALL use IOptions<DataSyncOptions> pattern instead of JSON file loading
  • DataSyncOptions SHALL define: MaxDegreeOfParallelism, BatchSize, BulkCopyBatchSize, LookbackMultiplier, PurgeRetentionDays
  • DataSourceOptions SHALL define: SourceSystem, TableName, IsEnabled, MassConfig, DailyConfig, HourlyConfig, FetcherTypeName, PostProcessorTypeName
  • Each schedule config (MassConfig, DailyConfig, HourlyConfig) SHALL include an Enabled boolean flag for explicit schedule enable/disable control
  • The FetcherTypeName SHALL be resolved to an IDataFetcher<T> implementation at startup
  • The PostProcessorTypeName SHALL be resolved to an IPostProcessor implementation at startup
  • Invalid or unresolvable type names SHALL cause startup failure with a descriptive error

Scenario: Configuration binding at startup

  • WHEN the application starts
  • THEN DataSyncOptions SHALL be bound from the DataSync configuration section
  • THEN each DataSourceOptions entry SHALL be validated for required fields
  • THEN FetcherTypeName values SHALL be resolved to registered IDataFetcher<T> services

Scenario: Invalid fetcher type configuration

  • WHEN a DataSourceOptions.FetcherTypeName cannot be resolved to a registered service
  • THEN the system SHALL throw a descriptive exception at startup
  • THEN the error message SHALL include the invalid type name and table name
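
A minimal sketch of how the options classes and the startup validation could look. The property names on DataSyncOptions and DataSourceOptions come from the business rules above; the ScheduleOptions class name, the Sources list, the PurgeRetentionDays default, the mapping of BatchSize to the 1,000,000-record batch, and the DataSyncOptionsValidator helper are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class name; the spec only requires an Enabled flag and an interval per schedule.
public sealed class ScheduleOptions
{
    public bool Enabled { get; set; }
    public int Interval { get; set; }          // minutes
    public bool PrepurgeData { get; set; }
    public bool ReIndexData { get; set; }
}

public sealed class DataSourceOptions
{
    public string SourceSystem { get; set; } = "";
    public string TableName { get; set; } = "";
    public bool IsEnabled { get; set; }
    public ScheduleOptions MassConfig { get; set; } = new();
    public ScheduleOptions DailyConfig { get; set; } = new();
    public ScheduleOptions HourlyConfig { get; set; } = new();
    public string FetcherTypeName { get; set; } = "";
    public string? PostProcessorTypeName { get; set; }   // optional
}

public sealed class DataSyncOptions
{
    public int MaxDegreeOfParallelism { get; set; } = 8;
    public int BatchSize { get; set; } = 1_000_000;       // assumed to be the merge batch size
    public int BulkCopyBatchSize { get; set; } = 10_000;
    public int LookbackMultiplier { get; set; } = 3;
    public int PurgeRetentionDays { get; set; } = 30;     // assumed default, not stated in the spec
    public List<DataSourceOptions> Sources { get; set; } = new(); // assumed property name
}

public static class DataSyncOptionsValidator
{
    // Throws at startup when a fetcher name is unknown; the message names both type and table.
    public static void Validate(DataSyncOptions options, ISet<string> registeredFetchers)
    {
        foreach (var source in options.Sources)
        {
            if (string.IsNullOrWhiteSpace(source.TableName))
                throw new InvalidOperationException("A data source entry is missing TableName.");
            if (!registeredFetchers.Contains(source.FetcherTypeName))
                throw new InvalidOperationException(
                    $"Fetcher type '{source.FetcherTypeName}' configured for table " +
                    $"'{source.TableName}' is not registered.");
        }
    }
}
```

In a real host these classes would typically be bound from the DataSync configuration section and validated before the service starts; the set of registered fetcher names stands in for whatever lookup the dependency injection container provides.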

Requirement: Data Fetcher Abstraction

The system SHALL use IDataFetcher<TEntity> interfaces instead of reflection-based delegates for data retrieval.

Inputs

  • DateTime? minimumDT parameter for incremental fetches
  • CancellationToken for cancellation support
  • Source system connection (JDE Oracle or CMS Sybase)

Outputs

  • IAsyncEnumerable<TEntity> streaming data from source systems
  • Support for cancellation during long-running fetches

Business Rules

  • Each data source MUST have a corresponding IDataFetcher<TEntity> implementation
  • The FetchAsync method SHALL return IAsyncEnumerable<TEntity> for memory-efficient streaming
  • All fetch operations MUST accept and respect CancellationToken
  • JDE fetchers SHALL use Oracle.ManagedDataAccess.Core connections
  • CMS fetchers SHALL use Oracle.ManagedDataAccess.Core connections (CMS uses Oracle via legacy DDTek driver, consolidated in migration)
  • Initial implementation MAY use stub fetchers that return empty IAsyncEnumerable<T> streams while JDE/CMS connectivity is deferred
  • Stub fetchers SHALL implement IDataFetcher<T> interface with yield break to enable testing without external dependencies

Scenario: Streaming data fetch

  • WHEN a sync operation requests data from a source system
  • THEN the system SHALL call IDataFetcher<T>.FetchAsync(minimumDT, cancellationToken)
  • THEN data SHALL stream via IAsyncEnumerable<T> without loading all records into memory
  • THEN cancellation SHALL stop the enumeration gracefully

Scenario: Cancellation during fetch

  • WHEN the cancellation token is triggered during a fetch operation
  • THEN the async enumerable SHALL stop yielding records
  • THEN database resources SHALL be properly disposed
  • THEN the sync operation SHALL be marked as failed
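
The fetcher contract and a stub implementation might look like the sketch below. The interface name and the FetchAsync parameters follow the requirement above; the StubDataFetcher class name is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public interface IDataFetcher<TEntity>
{
    // minimumDT is null for a full (Mass) fetch and set for incremental fetches.
    IAsyncEnumerable<TEntity> FetchAsync(DateTime? minimumDT, CancellationToken cancellationToken);
}

// Hypothetical stub used while JDE/CMS connectivity is deferred: yields no records.
public sealed class StubDataFetcher<TEntity> : IDataFetcher<TEntity>
{
    public async IAsyncEnumerable<TEntity> FetchAsync(
        DateTime? minimumDT,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        await Task.CompletedTask; // keeps the iterator async without doing real I/O
        yield break;
    }
}
```

A real fetcher would replace the body with a data reader loop that yields one entity per row and checks the token between rows, so that disposing the enumerator releases the connection.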

Requirement: Health Checks

The system SHALL expose health check endpoints for monitoring sync status.

Inputs

  • IHealthCheck registration with ASP.NET Core health checks
  • Current sync state and last successful sync timestamps

Outputs

  • Health status: Healthy, Degraded, or Unhealthy
  • Diagnostic data including last sync times and any error messages

Business Rules

  • The health check SHALL report Healthy when all enabled tables have synced within their configured intervals
  • The health check SHALL report Degraded when any table is overdue but syncs are progressing
  • The health check SHALL report Unhealthy when syncs have been failing repeatedly or the service is not running
  • Health check response SHALL include per-table sync status and timestamps

Scenario: All syncs current

  • WHEN health check executes
  • AND all enabled tables have successful syncs within their intervals
  • THEN the check SHALL return Healthy status
  • THEN response SHALL include last sync timestamps per table

Scenario: Overdue syncs with progress

  • WHEN health check executes
  • AND some tables are overdue for sync
  • AND sync operations are currently running or recently completed
  • THEN the check SHALL return Degraded status
  • THEN response SHALL identify which tables are overdue

Scenario: Repeated failures

  • WHEN health check executes
  • AND multiple recent sync operations have failed
  • THEN the check SHALL return Unhealthy status
  • THEN response SHALL include error details from failed syncs
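
One possible reading of the three status rules is sketched below as a pure function. The SyncHealth enum, the TableSyncState record, and the failure threshold of 3 are assumptions; the spec does not define how many failures count as "failing repeatedly", and this sketch treats any overdue table without repeated failures as "progressing". In an ASP.NET Core host the result would be mapped onto the framework's health check result type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum SyncHealth { Healthy, Degraded, Unhealthy }   // hypothetical stand-in for the framework status

public sealed record TableSyncState(
    string TableName,
    DateTime LastSuccessUtc,
    TimeSpan Interval,
    int ConsecutiveFailures);

public static class SyncHealthEvaluator
{
    public static SyncHealth Evaluate(
        IReadOnlyCollection<TableSyncState> tables,
        DateTime nowUtc,
        bool serviceRunning,
        int failureThreshold = 3)   // assumed threshold
    {
        // Unhealthy: the service is down or some table keeps failing.
        if (!serviceRunning || tables.Any(t => t.ConsecutiveFailures >= failureThreshold))
            return SyncHealth.Unhealthy;

        // Degraded: at least one table has not synced within its interval.
        bool anyOverdue = tables.Any(t => nowUtc - t.LastSuccessUtc > t.Interval);
        return anyOverdue ? SyncHealth.Degraded : SyncHealth.Healthy;
    }
}
```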

Requirement: Telemetry and Metrics

The system SHALL emit metrics and traces for observability.

Inputs

  • System.Diagnostics.Metrics meter for metrics
  • System.Diagnostics.ActivitySource for distributed tracing

Outputs

  • Counters: sync operations started, completed, failed
  • Histograms: sync duration, records processed
  • Activity spans for distributed tracing

Business Rules

  • The service SHALL create a Meter named DataSync
  • The service SHALL emit counters for: sync.operations.started, sync.operations.completed, sync.operations.failed
  • The service SHALL emit histograms for: sync.duration.seconds, sync.records.processed
  • Each sync operation SHALL create an Activity span with tags for table name, update type, and source system
  • Activity spans SHALL include record count and duration on completion

Scenario: Sync operation telemetry

  • WHEN a sync operation starts
  • THEN the system SHALL increment sync.operations.started counter
  • THEN the system SHALL start an Activity span with table and type tags
  • WHEN a sync operation completes successfully
  • THEN the system SHALL increment sync.operations.completed counter
  • THEN the system SHALL record duration in sync.duration.seconds histogram
  • THEN the system SHALL record count in sync.records.processed histogram
  • THEN the Activity span SHALL be completed with success status
  • WHEN a sync operation fails
  • THEN the system SHALL increment sync.operations.failed counter
  • THEN the Activity span SHALL be completed with error status and exception details
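
The counters, histograms, and span described above could be wired up roughly as follows. The meter name and instrument names are taken from the business rules; the SyncTelemetry class, the ActivitySource name, the span name, and the tag keys are assumptions for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

public sealed class SyncTelemetry : IDisposable
{
    private readonly Meter _meter = new("DataSync");
    private readonly ActivitySource _activitySource = new("DataSync"); // assumed source name
    private readonly Counter<long> _started, _completed, _failed;
    private readonly Histogram<double> _duration;
    private readonly Histogram<long> _records;

    public SyncTelemetry()
    {
        _started = _meter.CreateCounter<long>("sync.operations.started");
        _completed = _meter.CreateCounter<long>("sync.operations.completed");
        _failed = _meter.CreateCounter<long>("sync.operations.failed");
        _duration = _meter.CreateHistogram<double>("sync.duration.seconds", unit: "s");
        _records = _meter.CreateHistogram<long>("sync.records.processed");
    }

    // Wraps one sync operation; `sync` returns the number of records processed.
    public async Task<long> TrackAsync(
        string table, string updateType, string sourceSystem, Func<Task<long>> sync)
    {
        // StartActivity returns null when no listener is attached, hence the null checks.
        using Activity? activity = _activitySource.StartActivity("sync");
        activity?.SetTag("sync.table", table);          // tag keys are assumptions
        activity?.SetTag("sync.update_type", updateType);
        activity?.SetTag("sync.source_system", sourceSystem);
        _started.Add(1);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            long records = await sync();
            _completed.Add(1);
            _duration.Record(stopwatch.Elapsed.TotalSeconds);
            _records.Record(records);
            activity?.SetTag("sync.records", records);
            activity?.SetStatus(ActivityStatusCode.Ok);
            return records;
        }
        catch (Exception ex)
        {
            _failed.Add(1);
            activity?.SetTag("exception.type", ex.GetType().FullName);
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }

    public void Dispose()
    {
        _meter.Dispose();
        _activitySource.Dispose();
    }
}
```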

Requirement: Schedule-Based Sync Triggering

The system SHALL support three distinct sync schedule types: Mass, Daily, and Hourly, each with independent intervals and behaviors per table.

Inputs

  • Data source configuration via IOptions<DataSyncOptions>
  • LastDataUpdates view providing timestamps of last successful syncs
  • Current system time

Outputs

  • List of pending DataUpdateTask objects requiring execution
  • Each task specifies: target table, update type, and minimum timestamp for incremental fetches

Business Rules

  • Mass updates SHALL trigger when no prior successful mass update exists OR when the configured mass interval has elapsed since the last mass update
  • Daily updates SHALL trigger when mass is current AND daily interval has elapsed since last daily update
  • Hourly updates SHALL trigger when mass and daily are current AND hourly interval has elapsed since last hourly update
  • Schedule priority SHALL be: Mass > Daily > Hourly (mass takes precedence)
  • Incremental updates (Daily/Hourly) SHALL use a configurable lookback window (default 3x the interval) to capture delayed records
  • Hourly incremental updates SHALL use the last Daily sync timestamp with the lookback multiplier applied to the Daily interval (not the Hourly interval)
  • Only tables with IsEnabled = true AND the specific schedule enabled SHALL be considered for automatic sync

Scenario: Initial system startup with no prior syncs

  • WHEN the system starts and no DataUpdate records exist for a table
  • AND the table has IsEnabled = true and MassConfig.Enabled = true
  • THEN the system SHALL queue a Mass update task for that table
  • THEN the MinimumDT parameter SHALL be null (full data fetch)

Scenario: Mass sync interval elapsed

  • WHEN a table's last successful Mass update occurred more than MassConfig.Interval minutes ago
  • THEN the system SHALL queue a Mass update task for that table
  • THEN any pending Daily or Hourly updates for that table SHALL be superseded

Scenario: Daily sync triggers after mass is current

  • WHEN a table's Mass update is current (within interval)
  • AND the last Daily update occurred more than DailyConfig.Interval minutes ago
  • THEN the system SHALL queue a Daily update task
  • THEN the MinimumDT SHALL be set to LastDailyUpdateDT - (LookbackMultiplier * DailyInterval) minutes

Scenario: Hourly sync with lookback window

  • WHEN an Hourly update is triggered
  • THEN the system SHALL fetch records modified since MinimumDT
  • AND MinimumDT SHALL equal LastDailyUpdateDT - (LookbackMultiplier * DailyInterval) minutes (using Daily timestamp, not Hourly)

Scenario: Disabled table not scheduled

  • WHEN a table has IsEnabled = false OR all schedule configs have Enabled = false
  • THEN the table SHALL NOT be automatically scheduled for sync
  • THEN syncs MAY only occur via explicit manual trigger through the admin API
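
The priority order and the lookback arithmetic can be sketched as a pure function. The TableSchedule record and SyncScheduler class are hypothetical; DataUpdateTask is named in the spec, but its shape here is assumed. The spec does not say what MinimumDT should be when no Daily sync has run yet, so this sketch leaves it null in that case.

```csharp
using System;

public enum UpdateType { Mass, Daily, Hourly }

public sealed record DataUpdateTask(string TableName, UpdateType UpdateType, DateTime? MinimumDT);

// Hypothetical flattened view of one table's config plus its LastDataUpdates rows.
public sealed record TableSchedule(
    string TableName, bool IsEnabled,
    bool MassEnabled, int MassInterval, DateTime? LastMass,
    bool DailyEnabled, int DailyInterval, DateTime? LastDaily,
    bool HourlyEnabled, int HourlyInterval, DateTime? LastHourly);

public static class SyncScheduler
{
    // Returns the single highest-priority task for the table, or null when nothing is due.
    public static DataUpdateTask? Next(TableSchedule s, DateTime now, int lookbackMultiplier = 3)
    {
        if (!s.IsEnabled) return null;

        // Mass takes precedence and always performs a full fetch (MinimumDT = null).
        if (s.MassEnabled && IsDue(s.LastMass, s.MassInterval, now))
            return new DataUpdateTask(s.TableName, UpdateType.Mass, null);

        // Daily and Hourly both look back from the last Daily sync by the Daily interval.
        DateTime? minimumDT = s.LastDaily?.AddMinutes(-lookbackMultiplier * s.DailyInterval);

        if (s.DailyEnabled && IsDue(s.LastDaily, s.DailyInterval, now))
            return new DataUpdateTask(s.TableName, UpdateType.Daily, minimumDT);
        if (s.HourlyEnabled && IsDue(s.LastHourly, s.HourlyInterval, now))
            return new DataUpdateTask(s.TableName, UpdateType.Hourly, minimumDT);
        return null;
    }

    private static bool IsDue(DateTime? last, int intervalMinutes, DateTime now) =>
        last is null || now - last.Value > TimeSpan.FromMinutes(intervalMinutes);
}
```

With the standard intervals and the default multiplier, a Daily sync whose previous Daily run was two days ago fetches records modified in the last five days (2 days + 3 x 1440 minutes).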

Requirement: Data Source Configuration

The system SHALL load and validate data source configurations defining sync behavior per table.

Inputs

  • DataSourceOptions entries within DataSyncOptions
  • Each entry specifies: SourceSystem, TableName, FetcherTypeName, PostProcessorTypeName, and schedule configs

Outputs

  • Validated DataSourceOptions with resolved service types
  • Only configurations with IsEnabled = true are active

Business Rules

  • Each data source MUST specify a FetcherTypeName that resolves to an IDataFetcher<T> implementation
  • PostProcessorTypeName is optional and specifies an IPostProcessor implementation
  • Standard intervals SHALL be: Mass = 10080 minutes (7 days), Daily = 1440 minutes (24 hours), Hourly = 60 minutes
  • CMS data sources MAY have different intervals (e.g., MisData uses Mass = 100800 minutes / 70 days)
  • Archive tables MAY disable all schedules and require manual triggering via admin API

Scenario: Configuration validation at startup

  • WHEN the service starts
  • THEN all DataSourceOptions entries SHALL be validated
  • THEN FetcherTypeName values SHALL be resolved to registered services
  • THEN only configurations with IsEnabled = true SHALL be added to the active configs list

Scenario: Disabled archive table configuration

  • WHEN a configuration has MassConfig.Enabled = false, DailyConfig.Enabled = false, and HourlyConfig.Enabled = false
  • THEN the table SHALL never be automatically scheduled for sync
  • THEN syncs MAY only occur via explicit manual trigger through the admin API

Scenario: Post-processing action execution

  • WHEN a data source specifies a PostProcessorTypeName
  • AND the data merge completes successfully
  • THEN the system SHALL resolve and invoke the IPostProcessor.ProcessAsync() method
  • THEN the update SHALL only be marked complete after post-processing succeeds

Scenario: CMS vs JDE source configuration

  • WHEN a data source has SourceSystem = "CMS"
  • THEN the FetcherTypeName SHALL reference a CMS-specific IDataFetcher<T> implementation
  • WHEN a data source has SourceSystem = "JDE"
  • THEN the FetcherTypeName SHALL reference a JDE-specific IDataFetcher<T> implementation

Requirement: Table Management and Merge Operations

The system SHALL use staging tables and SQL MERGE operations to efficiently upsert data while preserving existing records.

Inputs

  • Source data from IDataFetcher<T>.FetchAsync() execution
  • Destination table schema (columns, primary key, indexes)
  • Update configuration (PrepurgeData, ReIndexData flags)

Outputs

  • Updated destination table with merged data
  • Rebuilt indexes (if configured)
  • Staging and temp tables cleaned up

Business Rules

  • Mass updates with PrepurgeData = true SHALL TRUNCATE the destination table before loading
  • Incremental updates (Daily/Hourly) SHALL use MERGE to upsert without deleting existing records
  • Data SHALL be batched in groups of 1,000,000 records for bulk copy operations
  • Bulk copy SHALL use batch size of 10,000 rows with streaming enabled
  • Staging tables SHALL be named #Staging{TableName}_{OperationId} (local temp tables with unique suffix for parallel isolation)
  • Temp tables SHALL be named #{TableName}_{OperationId} (local temp tables with unique suffix)
  • MERGE SHALL update existing records only when LastUpdateDT in source is greater than target (if column exists)
  • Tables without LastUpdateDT column SHALL update all matched rows unconditionally
  • Non-primary-key indexes SHALL be disabled during bulk load and rebuilt after

Scenario: Mass update with table truncation

  • WHEN a Mass update executes with PrepurgeData = true
  • THEN the destination table SHALL be truncated before data load
  • THEN all records from source SHALL be inserted
  • THEN indexes SHALL be rebuilt if ReIndexData = true

Scenario: Incremental update with MERGE

  • WHEN a Daily or Hourly update executes
  • THEN the system SHALL create a staging table matching destination schema with unique suffix
  • THEN source data SHALL be bulk copied to staging table
  • THEN data SHALL be deduplicated into temp table using ROW_NUMBER() OVER(PARTITION BY PK ORDER BY LastUpdateDT DESC)
  • THEN MERGE SHALL insert new records and update existing records where source LastUpdateDT > target.LastUpdateDT

Scenario: Table without LastUpdateDT column

  • WHEN MERGE executes on a table without LastUpdateDT column
  • THEN all matched rows SHALL be updated unconditionally
  • THEN the ReleaseDate column (if present) SHALL only be used for ORDER BY in deduplication, not for update filtering
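
A rough sketch of how the deduplication and MERGE statements could be generated from table metadata. The MergeSqlBuilder helper, its parameters, and the dbo schema are assumptions; the staging and temp table names follow the naming rule above. The sketch assumes identifiers come from trusted configuration (they are bracket-quoted, not parameterized) and that the table has at least one non-key column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSqlBuilder
{
    // orderColumn: "LastUpdateDT", "ReleaseDate", or null when the table has neither.
    public static string Build(
        string table, string operationId,
        IReadOnlyList<string> keyColumns, IReadOnlyList<string> allColumns,
        string? orderColumn)
    {
        static string Q(string name) => $"[{name}]";

        string staging = Q($"#Staging{table}_{operationId}");
        string temp = Q($"#{table}_{operationId}");
        string cols = string.Join(", ", allColumns.Select(Q));
        string keys = string.Join(", ", keyColumns.Select(Q));
        string values = string.Join(", ", allColumns.Select(c => $"source.{Q(c)}"));
        string on = string.Join(" AND ", keyColumns.Select(k => $"target.{Q(k)} = source.{Q(k)}"));
        string set = string.Join(", ",
            allColumns.Except(keyColumns).Select(c => $"target.{Q(c)} = source.{Q(c)}"));

        // ROW_NUMBER needs an ORDER BY; (SELECT NULL) picks an arbitrary row per key.
        string order = orderColumn is null ? "(SELECT NULL)" : $"{Q(orderColumn)} DESC";

        // Only LastUpdateDT filters updates; ReleaseDate is used for ordering alone.
        string matched = orderColumn == "LastUpdateDT"
            ? "WHEN MATCHED AND source.[LastUpdateDT] > target.[LastUpdateDT] THEN"
            : "WHEN MATCHED THEN";

        return string.Join(Environment.NewLine,
            $"SELECT {cols} INTO {temp}",
            $"FROM (SELECT {cols}, ROW_NUMBER() OVER (PARTITION BY {keys} ORDER BY {order}) AS rn",
            $"      FROM {staging}) AS ranked",
            "WHERE rn = 1;",
            $"MERGE INTO [dbo].{Q(table)} AS target",
            $"USING {temp} AS source ON {on}",
            $"{matched} UPDATE SET {set}",
            $"WHEN NOT MATCHED BY TARGET THEN INSERT ({cols}) VALUES ({values});");
    }
}
```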

Scenario: Large dataset batching

  • WHEN the data fetch streams more than 1,000,000 records
  • THEN records SHALL be processed in batches of 1,000,000
  • THEN each batch SHALL create fresh staging/temp tables with unique suffixes
  • THEN each batch SHALL execute MERGE independently
  • THEN total record count SHALL accumulate across all batches
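
The batching rule can be illustrated with a small helper that groups a streamed fetch into fixed-size batches, so only one batch is held in memory at a time. AsyncBatching and BatchAsync are hypothetical names, not part of the spec.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncBatching
{
    // Groups a stream into lists of at most batchSize items; the last batch may be smaller.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>();
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>();   // fresh list so the caller can keep the previous one
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}
```

A caller would run the staging, bulk copy, and MERGE steps once per yielded batch and add each batch's row count to a running total that it returns when the stream ends.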

Scenario: Index management during bulk load

  • WHEN staging table is created
  • THEN an index SHALL be created on primary key columns plus LastUpdateDT (or ReleaseDate)
  • THEN non-PK, non-unique indexes SHALL be disabled before bulk copy
  • THEN indexes SHALL be rebuilt after bulk copy completes

Requirement: Update Logging and Recovery

The system SHALL log all sync operations and support recovery from interrupted syncs.

Inputs

  • DataUpdate table for recording sync history
  • LastDataUpdates view for querying last successful syncs

Outputs

  • Complete audit trail of all sync operations
  • Automatic recovery of interrupted syncs

Business Rules

  • Each sync operation MUST create a DataUpdate record at start with NumberRecords = -2 (in-progress marker)
  • The sync operation MUST be wrapped in try/catch to ensure failed operations are marked properly
  • Successful completion SHALL update EndDT, WasSuccessful = true, and actual NumberRecords
  • Failed operations SHALL set WasSuccessful = false and NumberRecords = -1
  • Open entries (NumberRecords = -2) from prior runs SHALL be closed as failed at service startup via CloseOpenUpdateEntries()
  • Old DataUpdate records SHALL be purged periodically via PurgeUpdateEntries() after configurable retention period
  • All logging SHALL use ILogger<T> with BeginScope() for structured context (table name, update type, operation ID)

Scenario: Sync operation start logging

  • WHEN a sync operation begins
  • THEN a DataUpdate record SHALL be inserted with NumberRecords = -2
  • THEN the record SHALL include SourceSystem, SourceData, TableName, UpdateType, StartDT
  • THEN the operation SHALL create a logging scope with table name and operation ID

Scenario: Successful sync completion

  • WHEN a sync operation completes without errors
  • THEN the DataUpdate record SHALL be updated with EndDT = GETDATE()
  • THEN WasSuccessful SHALL be set to true
  • THEN NumberRecords SHALL reflect the total rows processed

Scenario: Failed sync handling

  • WHEN a sync operation throws an exception
  • THEN the exception SHALL be caught in the operation wrapper
  • THEN the DataUpdate record SHALL be updated with WasSuccessful = false, NumberRecords = -1
  • THEN the error SHALL be logged via ILogger<T> with full exception details
  • THEN subsequent sync attempts SHALL retry the operation

Scenario: Recovery from interrupted sync at startup

  • WHEN the service starts and finds DataUpdate records with NumberRecords = -2
  • THEN CloseOpenUpdateEntries() SHALL update those records to EndDT = GETDATE(), WasSuccessful = false, NumberRecords = -1
  • THEN the system SHALL treat those tables as needing fresh sync based on last successful update

Scenario: Periodic history purge

  • WHEN PurgeUpdateEntries() executes
  • THEN DataUpdate records older than PurgeRetentionDays SHALL be deleted
  • THEN the purge SHALL run periodically (e.g., daily) independent of sync operations
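
The marker values and the try/catch wrapper can be sketched against an in-memory stand-in for the DataUpdate table. The column names come from the scenarios above; the DataUpdateEntry and SyncLogging types are hypothetical, and a real implementation would write to SQL Server and log the exception through ILogger<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for one DataUpdate row.
public sealed class DataUpdateEntry
{
    public const int InProgress = -2;   // marker written when the sync starts
    public const int Failed = -1;       // marker written when the sync fails

    public string TableName { get; init; } = "";
    public string UpdateType { get; init; } = "";
    public DateTime StartDT { get; init; }
    public DateTime? EndDT { get; set; }
    public bool WasSuccessful { get; set; }
    public int NumberRecords { get; set; } = InProgress;
}

public static class SyncLogging
{
    // Runs one sync and records its outcome; failures are captured, not rethrown,
    // so a later cycle can retry the table.
    public static async Task<DataUpdateEntry> RunLoggedAsync(
        string tableName, string updateType,
        Func<CancellationToken, Task<int>> sync, CancellationToken cancellationToken)
    {
        var entry = new DataUpdateEntry
        {
            TableName = tableName, UpdateType = updateType, StartDT = DateTime.Now
        };
        try
        {
            entry.NumberRecords = await sync(cancellationToken);
            entry.WasSuccessful = true;
        }
        catch (Exception)
        {
            entry.WasSuccessful = false;
            entry.NumberRecords = DataUpdateEntry.Failed;
        }
        finally
        {
            entry.EndDT = DateTime.Now;
        }
        return entry;
    }

    // Startup recovery: closes entries still marked in-progress and returns how many were closed.
    public static int CloseOpenUpdateEntries(IEnumerable<DataUpdateEntry> entries)
    {
        int closed = 0;
        foreach (var entry in entries)
        {
            if (entry.NumberRecords != DataUpdateEntry.InProgress) continue;
            entry.EndDT = DateTime.Now;
            entry.WasSuccessful = false;
            entry.NumberRecords = DataUpdateEntry.Failed;
            closed++;
        }
        return closed;
    }
}
```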

Requirement: Parallel Sync Execution

The system SHALL execute multiple table syncs in parallel to optimize throughput with proper cancellation support.

Inputs

  • List of pending DataUpdateTask objects
  • MaxDegreeOfParallelism from DataSyncOptions
  • CancellationToken for cancellation support

Outputs

  • Concurrent execution of sync operations
  • Proper isolation between parallel syncs
  • Graceful cancellation of parallel operations

Business Rules

  • Pending updates SHALL be executed in parallel using Parallel.ForEachAsync or SemaphoreSlim with Task.WhenAll
  • Maximum degree of parallelism SHALL be configurable (default = 8)
  • Each sync operation MUST use its own IServiceScope for scoped service resolution
  • Each sync operation MUST use its own database connection
  • Staging tables MUST use unique suffixes (_{OperationId}) to avoid conflicts in parallel scenarios
  • CancellationToken MUST be passed to all parallel operations
  • Search processing SHALL be blocked while any sync operations are pending

Scenario: Multiple tables need sync

  • WHEN multiple tables have pending sync tasks
  • THEN the system SHALL execute up to MaxDegreeOfParallelism sync operations concurrently
  • THEN each operation SHALL create its own IServiceScope
  • THEN each operation SHALL use independent SQL connections
  • THEN completion of one operation SHALL not affect others

Scenario: Cancellation during parallel sync

  • WHEN cancellation is requested during parallel sync execution
  • THEN the CancellationToken SHALL propagate to all running operations
  • THEN operations SHALL check the token and exit gracefully
  • THEN incomplete operations SHALL be marked as failed

Scenario: Sync blocks search processing

  • WHEN the work processor checks for pending sync tasks
  • AND pending tasks exist
  • THEN sync operations SHALL execute before processing any queued searches
  • THEN search processing SHALL only begin when no sync tasks remain pending

Scenario: Sync with isolated resources

  • WHEN multiple sync operations run in parallel
  • THEN each operation SHALL create staging tables with unique suffixes
  • THEN each operation SHALL use its own scoped database connection
  • THEN no shared mutable state SHALL exist between parallel operations

Requirement: CMS Availability and Circuit Breaker

The system SHALL handle CMS (Sybase) connectivity issues with circuit breaker pattern.

Inputs

  • CMS connection state
  • Recent CMS sync failure history

Outputs

  • Automatic retry with backoff for transient failures
  • Circuit breaker to prevent repeated failed connection attempts

Business Rules

  • CMS connections SHALL use Polly or similar circuit breaker pattern
  • The circuit SHALL open after a configurable number of consecutive failures (default = 3)
  • The circuit SHALL remain open for a configurable duration (default = 5 minutes)
  • Health checks SHALL report CMS circuit state
  • JDE syncs SHALL continue independently of CMS circuit state

Scenario: CMS transient failure

  • WHEN a CMS sync fails with a transient error
  • THEN the system SHALL retry with exponential backoff
  • THEN the failure count SHALL increment

Scenario: Circuit breaker opens

  • WHEN consecutive CMS sync failures reach the configured threshold
  • THEN the circuit breaker SHALL open
  • THEN subsequent CMS sync attempts SHALL fail fast without attempting connection
  • THEN JDE syncs SHALL continue normally

Scenario: Circuit breaker recovery

  • WHEN the circuit breaker open duration elapses
  • THEN the circuit SHALL transition to half-open state
  • THEN the next CMS sync attempt SHALL be allowed
  • THEN success SHALL close the circuit; failure SHALL reopen it
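
To make the closed, open, and half-open transitions concrete, here is a hand-rolled state machine. It is not Polly and not the project's implementation; SimpleCircuitBreaker and its members are hypothetical, and the clock is injected so the open duration can be exercised without waiting. Retry with exponential backoff is out of scope for this sketch.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _utcNow;
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTime _openedAtUtc;
    private CircuitState _state = CircuitState.Closed;

    public SimpleCircuitBreaker(
        int failureThreshold = 3, TimeSpan? openDuration = null, Func<DateTime>? utcNow = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration ?? TimeSpan.FromMinutes(5);
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public CircuitState State
    {
        get { lock (_gate) { Refresh(); return _state; } }
    }

    // False while the circuit is open, so callers fail fast without connecting.
    public bool AllowRequest()
    {
        lock (_gate) { Refresh(); return _state != CircuitState.Open; }
    }

    public void RecordSuccess()
    {
        lock (_gate) { _consecutiveFailures = 0; _state = CircuitState.Closed; }
    }

    public void RecordFailure()
    {
        lock (_gate)
        {
            Refresh();
            _consecutiveFailures++;
            // A failed trial in half-open state reopens the circuit immediately.
            if (_state == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
            {
                _state = CircuitState.Open;
                _openedAtUtc = _utcNow();
            }
        }
    }

    // Moves Open to HalfOpen once the open duration has elapsed.
    private void Refresh()
    {
        if (_state == CircuitState.Open && _utcNow() - _openedAtUtc >= _openDuration)
            _state = CircuitState.HalfOpen;
    }
}
```

Only CMS fetchers would consult such a breaker; JDE syncs never touch it, which keeps them independent of the CMS circuit state.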

Requirement: Archive Sync Manual Trigger

The system SHALL support manual triggering of archive table syncs via admin API.

Inputs

  • HTTP request to admin API endpoint
  • Table name and optional update type parameters

Outputs

  • Queued sync task for the specified archive table
  • Status response indicating task queued

Business Rules

  • Archive tables with all schedules disabled SHALL only sync via manual trigger
  • The admin API endpoint SHALL require authentication and authorization
  • Manual triggers SHALL queue a Mass update task for the specified table
  • The system SHALL return immediate acknowledgment; sync runs asynchronously

Scenario: Manual archive sync trigger

  • WHEN an authenticated admin calls the manual sync API for an archive table
  • THEN a Mass update task SHALL be queued for that table
  • THEN the API SHALL return 202 Accepted with task ID
  • THEN the sync SHALL execute in the background service

Requirement: Periodic Index Maintenance

The system SHALL support periodic index maintenance independent of mass syncs.

Inputs

  • Index maintenance configuration (schedule, tables)
  • Current table statistics

Outputs

  • Rebuilt or reorganized indexes
  • Updated statistics

Business Rules

  • Index maintenance MAY be configured to run on a schedule independent of mass syncs
  • Maintenance SHALL check index fragmentation before rebuilding
  • Indexes with fragmentation > 30% SHALL be rebuilt; 10-30% SHALL be reorganized
  • Statistics SHALL be updated after index maintenance
  • Maintenance operations SHALL be logged for audit

Scenario: Scheduled index maintenance

  • WHEN the index maintenance schedule triggers
  • THEN the system SHALL check fragmentation levels for configured tables
  • THEN highly fragmented indexes SHALL be rebuilt
  • THEN moderately fragmented indexes SHALL be reorganized
  • THEN table statistics SHALL be updated
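
The fragmentation thresholds translate into a small decision function. IndexAction and IndexMaintenance are hypothetical names, and the dbo schema in the generated statement is an assumption; exactly 30% is treated as "reorganize" because the rule rebuilds only above 30%.

```csharp
using System;

public enum IndexAction { None, Reorganize, Rebuild }

public static class IndexMaintenance
{
    // > 30% rebuild, 10-30% reorganize, below 10% leave alone.
    public static IndexAction Decide(double fragmentationPercent) => fragmentationPercent switch
    {
        > 30.0 => IndexAction.Rebuild,
        >= 10.0 => IndexAction.Reorganize,
        _ => IndexAction.None,
    };

    // Returns the T-SQL for the chosen action, or null when no maintenance is needed.
    public static string? ToSql(string table, string index, IndexAction action) => action switch
    {
        IndexAction.Rebuild => $"ALTER INDEX [{index}] ON [dbo].[{table}] REBUILD;",
        IndexAction.Reorganize => $"ALTER INDEX [{index}] ON [dbo].[{table}] REORGANIZE;",
        _ => null,
    };
}
```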

Requirement: Background service implementation pattern

The system SHALL implement the data synchronization service following .NET BackgroundService best practices for hosted service lifecycle management.

Inputs

  • IServiceScopeFactory for creating scoped service instances
  • IOptions<DataSyncOptions> for configuration access
  • ILogger<DataSyncService> for structured logging
  • CancellationToken from ExecuteAsync stoppingToken parameter

Outputs

  • Continuously running background task that checks schedules and executes syncs
  • Proper cleanup on shutdown with all resources disposed
  • Logging scope context for all operations

Business Rules

  • The service MUST implement BackgroundService.ExecuteAsync(CancellationToken)
  • The main loop MUST use Task.Delay(checkInterval, stoppingToken) between cycles
  • Each sync cycle MUST create a new IServiceScope via IServiceScopeFactory.CreateAsyncScope()
  • All scoped services MUST be resolved from the current scope, not from root provider
  • The scope MUST be disposed using the await using pattern after each cycle
  • Exception handling MUST catch and log errors without crashing the service
  • OperationCanceledException MUST be caught and result in graceful loop exit when stoppingToken.IsCancellationRequested
  • The service MUST NOT use static state or shared mutable collections

Scenario: Normal sync cycle execution

  • WHEN the BackgroundService enters ExecuteAsync
  • THEN the service SHALL call CloseOpenUpdateEntriesAsync to recover from prior crashes
  • THEN the service SHALL enter a while loop checking !stoppingToken.IsCancellationRequested
  • THEN each iteration SHALL create a new IServiceScope
  • THEN the ISyncOrchestrator SHALL be resolved from the scope
  • THEN ExecutePendingSyncsAsync SHALL be called with the stoppingToken
  • THEN the scope SHALL be disposed after the call completes
  • THEN Task.Delay SHALL pause before the next iteration

Scenario: Exception during sync cycle

  • WHEN an exception occurs during sync execution (not OperationCanceledException)
  • THEN the exception SHALL be caught and logged with LogError
  • THEN the service SHALL continue to the next iteration
  • THEN the current scope SHALL still be disposed properly
  • THEN the service SHALL NOT crash or stop unexpectedly

Scenario: Graceful shutdown request

  • WHEN the host signals shutdown by canceling the stoppingToken
  • THEN any running Task.Delay SHALL throw OperationCanceledException
  • THEN the while loop SHALL exit on the IsCancellationRequested check
  • THEN the ExecuteAsync method SHALL complete normally
  • THEN any in-progress sync operations SHALL receive the cancellation and complete or cancel
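
The scenarios above could be realized along the following lines. ISyncOrchestrator, CloseOpenUpdateEntriesAsync, and ExecutePendingSyncsAsync are named in the spec, but placing both methods on the orchestrator is an assumption, as is passing the check interval directly instead of reading it from IOptions<DataSyncOptions>. The sketch needs the Microsoft.Extensions.Hosting package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Assumed shape: the spec names these methods but not the interface that owns them.
public interface ISyncOrchestrator
{
    Task CloseOpenUpdateEntriesAsync(CancellationToken cancellationToken);
    Task ExecutePendingSyncsAsync(CancellationToken cancellationToken);
}

public sealed class DataSyncService : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;
    private readonly ILogger<DataSyncService> _logger;
    private readonly TimeSpan _checkInterval;

    public DataSyncService(
        IServiceScopeFactory scopeFactory,
        ILogger<DataSyncService> logger,
        TimeSpan? checkInterval = null)   // assumed; normally taken from options
    {
        _scopeFactory = scopeFactory;
        _logger = logger;
        _checkInterval = checkInterval ?? TimeSpan.FromMinutes(1);
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            // Startup recovery: close entries left in progress by a prior crash.
            await using (var startupScope = _scopeFactory.CreateAsyncScope())
            {
                var recovery = startupScope.ServiceProvider.GetRequiredService<ISyncOrchestrator>();
                await recovery.CloseOpenUpdateEntriesAsync(stoppingToken);
            }

            while (!stoppingToken.IsCancellationRequested)
            {
                try
                {
                    // One scope per cycle; disposed even when the cycle throws.
                    await using var scope = _scopeFactory.CreateAsyncScope();
                    var orchestrator = scope.ServiceProvider.GetRequiredService<ISyncOrchestrator>();
                    await orchestrator.ExecutePendingSyncsAsync(stoppingToken);
                }
                catch (Exception ex) when (ex is not OperationCanceledException)
                {
                    _logger.LogError(ex, "Sync cycle failed; continuing with the next cycle");
                }

                await Task.Delay(_checkInterval, stoppingToken);
            }
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Host requested shutdown: leave the loop and complete normally.
        }
    }
}
```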

Requirement: Parallel fetch isolation with scoped resources

The system SHALL ensure complete isolation between parallel sync operations using scoped resources and unique identifiers.

Inputs

  • List of DataUpdateTask objects to execute in parallel
  • MaxDegreeOfParallelism configuration value
  • CancellationToken for coordinated cancellation

Outputs

  • Concurrent execution of sync operations with no resource conflicts
  • Unique staging tables per operation that do not collide
  • Independent database connections per operation

Business Rules

  • Parallel.ForEachAsync MUST be used with ParallelOptions.CancellationToken set
  • Each parallel task MUST create its own IServiceScope inside the parallel delegate
  • Database connections MUST NOT be shared across parallel operations
  • Staging table names MUST include a unique OperationId suffix (GUID or sequential ID)
  • Format: #Staging{TableName}_{OperationId} and #{TableName}_{OperationId}
  • Each parallel operation MUST resolve its own instances of all scoped services
  • No ConcurrentDictionary, shared counters, or other shared mutable state SHALL exist between operations
  • Total record counts SHALL be accumulated via return values, not shared state

Scenario: Parallel sync with isolated scopes

  • WHEN multiple DataUpdateTasks are executed via Parallel.ForEachAsync
  • THEN each task SHALL execute the async delegate independently
  • THEN each delegate SHALL create a new IServiceScope using CreateAsyncScope
  • THEN ITableSyncOperation SHALL be resolved from each scope independently
  • THEN each operation SHALL use its own database connection from the scope
  • THEN staging tables SHALL use unique OperationId suffixes preventing name collisions
  • THEN completion of one operation SHALL NOT affect the execution of others

Scenario: Parallel cancellation propagation

  • WHEN cancellation is requested during Parallel.ForEachAsync execution
  • THEN the CancellationToken SHALL propagate to all running parallel operations
  • THEN Parallel.ForEachAsync SHALL stop starting new operations
  • THEN running operations SHALL receive the token in their async methods
  • THEN each operation SHALL check the token and exit gracefully
  • THEN incomplete operations SHALL mark their DataUpdate records as failed

Scenario: Staging table uniqueness verification

  • WHEN two sync operations for the same table run in parallel
  • THEN each operation SHALL generate a unique GUID as its OperationId
  • THEN operation A SHALL create its staging tables with the GuidA suffix
  • THEN operation B SHALL create its staging tables with the GuidB suffix
  • THEN no SQL errors SHALL occur from table name conflicts
  • THEN each operation's cleanup SHALL drop only its own staging tables

Requirement: Structured logging context

The system SHALL use ILogger.BeginScope to attach contextual information to all log entries during sync operations.

Inputs

  • ILogger<T> injected into sync operation classes
  • TableName, UpdateType, OperationId values from current operation

Outputs

  • All log entries within the scope contain the contextual properties
  • Log aggregation systems can filter and group by table, type, or operation

Business Rules

  • Each sync operation MUST call _logger.BeginScope(...) at the start
  • The scope MUST include at minimum: TableName, UpdateType, OperationId
  • The scope MUST be disposed via a using statement when the operation completes
  • Nested scopes for batches SHALL preserve parent scope properties
  • LogInformation, LogWarning, LogError calls within the scope SHALL include the context automatically

Scenario: Log scope creation and usage

  • WHEN a TableSyncOperation begins execution
  • THEN the operation SHALL create a logging scope with TableName, UpdateType, OperationId
  • THEN all log calls within ExecuteAsync SHALL include these properties
  • THEN the scope SHALL be disposed when the operation completes
  • THEN subsequent operations SHALL have their own independent scopes
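
As an illustration of the scope rules above, the sketch below opens a scope carrying TableName, UpdateType, and OperationId, plus a nested batch scope. It assumes the Microsoft.Extensions.Logging abstractions package is referenced; the method signatures, the `BatchNumber` property, and the placeholder batch work are hypothetical and not taken from the codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public sealed class TableSyncOperation
{
    private readonly ILogger<TableSyncOperation> _logger;

    public TableSyncOperation(ILogger<TableSyncOperation> logger) => _logger = logger;

    public async Task<int> ExecuteAsync(
        string tableName, string updateType, Guid operationId, CancellationToken ct)
    {
        // Key/value scope state is exposed as structured properties by
        // providers that support scopes.
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["TableName"] = tableName,
            ["UpdateType"] = updateType,
            ["OperationId"] = operationId
        }))
        {
            _logger.LogInformation("Sync started");
            var rows = await SyncBatchAsync(batchNumber: 1, ct);
            _logger.LogInformation("Sync finished with {RowCount} rows", rows);
            return rows;
        } // scope disposed here, when the operation completes
    }

    private async Task<int> SyncBatchAsync(int batchNumber, CancellationToken ct)
    {
        // Nested scope: the parent properties stay attached alongside BatchNumber.
        using (_logger.BeginScope(new Dictionary<string, object> { ["BatchNumber"] = batchNumber }))
        {
            await Task.Delay(10, ct); // placeholder for the real batch work
            _logger.LogInformation("Batch merged");
            return 0; // placeholder row count
        }
    }
}
```

Whether the scope properties actually appear in output depends on the logging provider; the built-in console provider, for example, only writes them when its IncludeScopes option is enabled.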

Data Source Configurations

Current/Transactional Tables (Full Schedule)

| Table | Source | Mass Interval | Daily Interval | Hourly Interval |
|---|---|---|---|---|
| WorkOrder_Curr | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| LotUsage_Curr | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| WorkOrderTime_Curr | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| WorkOrderStep_Curr | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| WorkOrderComponent_Curr | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| WorkOrderRouting | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |

Reference Tables (Full Schedule)

| Table | Source | Mass Interval | Daily Interval | Hourly Interval |
|---|---|---|---|---|
| Item | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| Lot | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| WorkCenter | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| ProfitCenter | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| Branch | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| JdeUser | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| StatusCode | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| FunctionCode | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| OrgHierarchy | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |
| RouteMaster | JDE | 10080 min (7d) | 1440 min (24h) | 60 min |

CMS Tables

| Table | Source | Mass Interval | Daily Interval | Hourly Interval | Notes |
|---|---|---|---|---|---|
| MisData | CMS | 100800 min (70d) | 1440 min (24h) | Disabled | Has PostProcessor |

Archive Tables (Disabled - Manual Trigger via Admin API)

| Table | Source | Notes |
|---|---|---|
| WorkOrder_Hist | JDE | All schedules disabled |
| LotUsage_Hist | JDE | All schedules disabled |
| WorkOrderStep_Hist | JDE | All schedules disabled |
| WorkOrderTime_Hist | JDE | All schedules disabled |
| WorkOrderComponent_Hist | JDE | All schedules disabled |

Migration Notes

| Legacy Pattern | New Pattern | Rationale |
|---|---|---|
| Topshelf Windows Service | .NET BackgroundService | Native .NET hosting, cross-platform support |
| ManualResetEvent for shutdown | CancellationToken | Standard .NET cancellation pattern |
| Thread while loop | BackgroundService.ExecuteAsync with Task.Delay | Proper async/await, no thread blocking |
| Parallel.ForEach with MaxDegreeOfParallelism | Parallel.ForEachAsync or SemaphoreSlim with Task.WhenAll | Modern async patterns, cancellation support |
| JSON config files + Newtonsoft.Json | System.Text.Json + IOptions<T> pattern | Built-in JSON support, configuration binding |
| FunctionConverter reflection-based delegates | IDataFetcher<T> interfaces | Type safety, dependency injection, testability |
| ActionConverter reflection-based delegates | IPostProcessor interfaces | Type safety, dependency injection, testability |
| Static UpdateProcessor class | Scoped/singleton services with DI | Testability, proper lifecycle management |
| NLog | ILogger<T> injected + BeginScope() for context | Framework-integrated logging, structured context |
| Global temp tables ##staging_* | Local temp tables #Staging{Table}_{OperationId} | Better isolation in parallel scenarios |
| System.Data.SqlClient | Microsoft.Data.SqlClient | Modern SQL Server driver with better performance |
| Manual SQL MERGE generation | Continue with Dapper + manual MERGE | Performance critical, maintain fine control |
| No health checks | IHealthCheck implementation | Kubernetes/container orchestration support |
| No metrics/tracing | System.Diagnostics.Metrics + ActivitySource | Observability, distributed tracing |

Resolved Design Decisions

Archive Sync Strategy

Decision: Archive tables will be synced via manual trigger through an authenticated admin API endpoint.

Rationale: Archive data changes infrequently and full syncs are expensive. Manual triggering allows administrators to control when these resource-intensive operations occur.

CMS Availability Handling

Decision: Use the circuit breaker pattern (Polly) for CMS connections, with a configurable failure threshold and open duration.

Rationale: CMS (Sybase) may have different availability characteristics than JDE. Circuit breaker prevents cascading failures and allows JDE syncs to continue independently.
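
To make the two settings concrete, here is a hand-rolled sketch of the pattern using only the base class library. It is not Polly's API and is not thread-safe; `SimpleCircuitBreaker` and its members are hypothetical names chosen for illustration, and a real implementation would configure the equivalent behavior through Polly.

```csharp
using System;

// Illustrative only: a minimal circuit breaker showing what "failure threshold"
// and "open duration" control. Not Polly, and not safe for concurrent use.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(
        int failureThreshold, TimeSpan openDuration, Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Open while the open duration has not yet elapsed since the breaker tripped.
    public bool IsOpen => _openedAt is { } opened && _clock() - opened < _openDuration;

    public T Execute<T>(Func<T> action)
    {
        if (IsOpen)
            throw new InvalidOperationException("Circuit is open; call skipped.");

        try
        {
            var result = action();
            _consecutiveFailures = 0; // a success closes the circuit
            _openedAt = null;
            return result;
        }
        catch
        {
            // Trip (or re-trip after a failed trial call) once the threshold is reached.
            if (++_consecutiveFailures >= _failureThreshold)
                _openedAt = _clock();
            throw;
        }
    }
}
```

With a breaker instance dedicated to CMS, repeated CMS failures short-circuit further CMS calls for the open duration, while JDE fetches that do not pass through the breaker are unaffected.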

Post-Processing Migration

Decision: Replace reflection-based PostProcessingAction with IPostProcessor interfaces resolved via DI.

Rationale: Type-safe interfaces enable compile-time checking, better testability, and clearer contracts. DI resolution allows for proper scoping and dependency management.

Lookback Window Configuration

Decision: Make lookback multiplier configurable via DataSyncOptions.LookbackMultiplier (default = 3).

Rationale: Different environments may need different lookback windows based on data arrival patterns. Configuration allows tuning without code changes.
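
A minimal sketch of how the multiplier could feed the lookback calculation follows. Only the option name and its default of 3 come from this specification; the formula (last successful sync minus interval times multiplier), the `LookbackWindow` helper, and the parameter names are assumptions made for illustration.

```csharp
using System;

public sealed class DataSyncOptions
{
    // Default of 3 is stated in the spec; the property shape is assumed.
    public int LookbackMultiplier { get; set; } = 3;
}

public static class LookbackWindow
{
    // Assumed formula: minimum timestamp = last successful sync - (interval * multiplier).
    public static DateTime MinimumDt(
        DateTime lastSuccessfulSyncUtc, TimeSpan scheduleInterval, DataSyncOptions options)
    {
        if (options.LookbackMultiplier < 1)
            throw new ArgumentOutOfRangeException(
                nameof(options), "LookbackMultiplier must be at least 1.");

        return lastSuccessfulSyncUtc - scheduleInterval * options.LookbackMultiplier;
    }
}
```

Under that assumed formula, a last successful sync at 2026-01-02 12:00 UTC with a 1440-minute interval and the default multiplier gives a minimum timestamp of 2025-12-30 12:00 UTC. In the hosted service the options object would typically be bound from configuration and injected via IOptions<DataSyncOptions>.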

Index Rebuild Strategy

Decision: Add periodic index maintenance independent of mass syncs, checking fragmentation before rebuilding.

Rationale: Mass syncs may not run frequently enough for optimal index health. Separate maintenance allows proactive optimization based on actual fragmentation levels.

Codex Review Findings (Addressed)

The following issues were identified during code review and have been addressed in this specification:

  1. Hourly MinimumDT Calculation: ADDRESSED - Spec now correctly documents that hourly updates use the daily timestamp with daily interval lookback (not hourly interval). See "Schedule-Based Sync Triggering" requirement.

  2. Failure Recovery: ADDRESSED - Spec now requires DoUpdate wrapper with try/catch to mark failed updates. CloseOpenUpdateEntries() is invoked at startup. PurgeUpdateEntries() is invoked periodically. See "Update Logging and Recovery" and "Background Service Lifecycle" requirements.

  3. Disabled Schedules Can Run: ADDRESSED - Spec now requires checking both IsEnabled AND specific schedule Enabled flags. Tables with all schedules disabled are only synced via manual trigger. See "Schedule-Based Sync Triggering" requirement.

  4. Temp Table Naming: ADDRESSED - Spec now correctly documents #Staging{Table}_{OperationId} and #{Table}_{OperationId} naming with unique suffixes for parallel isolation. See "Table Management and Merge Operations" requirement.

  5. Archive Table Names: ADDRESSED - The Data Source Configurations tables now use the correct _Hist suffix (LotUsage_Hist, WorkOrderStep_Hist, etc.).

  6. WorkOrderRouting Table: ADDRESSED - The Data Source Configurations tables now correctly list WorkOrderRouting (no _Curr suffix).

  7. MERGE LastUpdateDT Edge Case: ADDRESSED - Spec now documents that tables without LastUpdateDT column update all matched rows unconditionally, and ReleaseDate is only used for ORDER BY in deduplication. See "Table without LastUpdateDT column" scenario.