Joseph Doherty 26ff8d9b4f Initial commit: JDE Scoping Tool migration project

Set up repository with legacy .NET Framework 4.8 source (OLD/),
new .NET 10 Blazor solution (NEW/), OpenSpec specifications,
documentation, and project configuration.

2026-01-02 07:43:29 -05:00

Data Sync Specification

Purpose

The Data Sync subsystem maintains a local SQL Server cache of enterprise data from the JDE (JD Edwards, Oracle) and CMS (Sybase) source systems. It is implemented as a .NET 10 BackgroundService and keeps search operations fast by synchronizing data on configurable schedules (mass, daily, hourly). Incremental updates use MERGE operations to minimize data transfer while keeping the cache current. The service integrates with the ASP.NET Core hosting model and supports graceful shutdown, health checks, and telemetry.

Source Reference

Legacy Files	Purpose
`OLD/WorkerService/Process/UpdateProcessor.cs`	Main sync orchestration, schedule checking, update execution
`OLD/WorkerService/Process/UpdateProcessor.TableManagement.cs`	Staging table creation, MERGE generation, bulk copy, index management
`OLD/WorkerService/Process/UpdateProcessor.DataUpdateEntry.cs`	Update logging, history tracking, cleanup
`OLD/WorkerService/dsconfig/*.json`	Per-table sync configuration files
`OLD/WorkerService/Models/DataSourceConfig.cs`	Configuration model with fetch functions
`OLD/WorkerService/Models/DataUpdateConfig.cs`	Schedule configuration (interval, prepurge, reindex)
`OLD/WorkerService/Process/WorkProcessor.cs`	Work loop that triggers sync checks
`OLD/Database/Views/LastDataUpdates.sql`	View for determining last successful sync per table/type

Requirements

Requirement: Background Service Lifecycle

The system SHALL implement data synchronization as a .NET BackgroundService with proper lifecycle management.

Inputs

CancellationToken from the host for graceful shutdown signals
IServiceScopeFactory for creating scoped services per sync operation
IOptions<DataSyncOptions> for configuration

Outputs

Long-running background task that processes sync schedules
Graceful shutdown with in-progress operation completion or cancellation

Business Rules

The service MUST inherit from BackgroundService and implement ExecuteAsync
The service SHALL respect CancellationToken for graceful shutdown
Each sync operation MUST create a new IServiceScope via IServiceScopeFactory
At startup, the service MUST call CloseOpenUpdateEntries() to mark interrupted syncs as failed
The service SHALL call PurgeUpdateEntries() periodically to clean old history records
The main loop SHALL use Task.Delay with the cancellation token between sync checks

Scenario: Service startup initialization

WHEN the BackgroundService starts
THEN the system SHALL invoke CloseOpenUpdateEntries() to mark any DataUpdate records with NumberRecords = -2 as failed
THEN the system SHALL begin the main sync check loop

Scenario: Graceful shutdown during sync

WHEN the host signals shutdown via CancellationToken
AND a sync operation is in progress
THEN the cancellation token SHALL propagate to all child operations
THEN the service SHALL wait for current batch completion or cancel gracefully
THEN any incomplete syncs SHALL be marked as failed with WasSuccessful = false

Scenario: Scoped service creation per sync

WHEN a sync operation begins
THEN the system SHALL create a new IServiceScope
THEN all services for that sync operation SHALL be resolved from the scope
THEN the scope SHALL be disposed after the sync completes or fails

Requirement: Strongly-Typed Configuration

The system SHALL use strongly-typed options classes bound from configuration instead of JSON file parsing with reflection.

Inputs

IOptions<DataSyncOptions> injected via dependency injection
Configuration bound from appsettings.json or environment variables

Outputs

DataSyncOptions containing global sync settings
DataSourceOptions containing per-table configuration
Type-resolved IDataFetcher<T> implementations

Business Rules

Configuration SHALL use IOptions<DataSyncOptions> pattern instead of JSON file loading
DataSyncOptions SHALL define: MaxDegreeOfParallelism, BatchSize, BulkCopyBatchSize, LookbackMultiplier, PurgeRetentionDays
DataSourceOptions SHALL define: SourceSystem, TableName, IsEnabled, MassConfig, DailyConfig, HourlyConfig, FetcherTypeName, PostProcessorTypeName
Each schedule config (MassConfig, DailyConfig, HourlyConfig) SHALL include an Enabled boolean flag for explicit schedule enable/disable control
The FetcherTypeName SHALL be resolved to an IDataFetcher<T> implementation at startup
The PostProcessorTypeName SHALL be resolved to an IPostProcessor implementation at startup
Invalid or unresolvable type names SHALL cause startup failure with a descriptive error
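
The option shapes listed above could be sketched as plain classes. This is a minimal illustration, not the project's actual code: the `DataSources` property, the `ScheduleOptions` type, and the `DataSourceValidator` helper are hypothetical names, the schedule fields are taken from the legacy schedule configuration (interval, prepurge, reindex), and the mapping of `BatchSize` and `BulkCopyBatchSize` to the 1,000,000 and 10,000 figures in the Table Management requirement is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Global settings, bound from the DataSync configuration section.
public sealed class DataSyncOptions
{
    public int MaxDegreeOfParallelism { get; set; } = 8;  // default from Parallel Sync Execution
    public int BatchSize { get; set; } = 1_000_000;       // assumed: records per staging batch
    public int BulkCopyBatchSize { get; set; } = 10_000;  // assumed: rows per bulk copy batch
    public int LookbackMultiplier { get; set; } = 3;      // default lookback of 3x the interval
    public int PurgeRetentionDays { get; set; }           // no default is stated in this spec
    public List<DataSourceOptions> DataSources { get; set; } = new(); // hypothetical property name
}

// Per-table configuration.
public sealed class DataSourceOptions
{
    public string SourceSystem { get; set; } = "";
    public string TableName { get; set; } = "";
    public bool IsEnabled { get; set; }
    public ScheduleOptions MassConfig { get; set; } = new();
    public ScheduleOptions DailyConfig { get; set; } = new();
    public ScheduleOptions HourlyConfig { get; set; } = new();
    public string FetcherTypeName { get; set; } = "";
    public string? PostProcessorTypeName { get; set; }    // optional
}

// Hypothetical type name for one schedule entry.
public sealed class ScheduleOptions
{
    public bool Enabled { get; set; }
    public int Interval { get; set; }                     // minutes
    public bool PrepurgeData { get; set; }
    public bool ReIndexData { get; set; }
}

public static class DataSourceValidator
{
    // Throws at startup when a required field is missing or the fetcher is unknown.
    public static void Validate(DataSourceOptions source, ISet<string> registeredFetchers)
    {
        if (string.IsNullOrWhiteSpace(source.TableName))
            throw new InvalidOperationException("A data source entry is missing TableName.");
        if (!registeredFetchers.Contains(source.FetcherTypeName))
            throw new InvalidOperationException(
                $"Fetcher type '{source.FetcherTypeName}' for table '{source.TableName}' is not registered.");
    }
}
```

The exception message carries both the type name and the table name, which is what the invalid fetcher scenario below asks for.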

Scenario: Configuration binding at startup

WHEN the application starts
THEN DataSyncOptions SHALL be bound from the DataSync configuration section
THEN each DataSourceOptions entry SHALL be validated for required fields
THEN FetcherTypeName values SHALL be resolved to registered IDataFetcher<T> services

Scenario: Invalid fetcher type configuration

WHEN a DataSourceOptions.FetcherTypeName cannot be resolved to a registered service
THEN the system SHALL throw a descriptive exception at startup
THEN the error message SHALL include the invalid type name and table name

Requirement: Data Fetcher Abstraction

The system SHALL use IDataFetcher<TEntity> interfaces instead of reflection-based delegates for data retrieval.

Inputs

DateTime? minimumDT parameter for incremental fetches
CancellationToken for cancellation support
Source system connection (JDE Oracle or CMS Sybase)

Outputs

IAsyncEnumerable<TEntity> streaming data from source systems
Support for cancellation during long-running fetches

Business Rules

Each data source MUST have a corresponding IDataFetcher<TEntity> implementation
The FetchAsync method SHALL return IAsyncEnumerable<TEntity> for memory-efficient streaming
All fetch operations MUST accept and respect CancellationToken
JDE fetchers SHALL use Oracle.ManagedDataAccess.Core connections
CMS fetchers SHALL use Oracle.ManagedDataAccess.Core connections (CMS uses Oracle via legacy DDTek driver, consolidated in migration)
Initial implementation MAY use stub fetchers that return empty IAsyncEnumerable<T> streams while JDE/CMS connectivity is deferred
Stub fetchers SHALL implement IDataFetcher<T> interface with yield break to enable testing without external dependencies
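
A minimal sketch of the fetcher contract and a stub implementation follows. The spec names `IDataFetcher<T>` and `FetchAsync(minimumDT, cancellationToken)` but does not give the full signature, so the exact shape here is an assumption, and `StubFetcher<TEntity>` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Assumed shape of the fetcher contract described above.
public interface IDataFetcher<TEntity>
{
    IAsyncEnumerable<TEntity> FetchAsync(DateTime? minimumDT, CancellationToken cancellationToken);
}

// Stub that yields nothing, for use while JDE/CMS connectivity is deferred.
public sealed class StubFetcher<TEntity> : IDataFetcher<TEntity>
{
    public async IAsyncEnumerable<TEntity> FetchAsync(
        DateTime? minimumDT,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        // Honor cancellation even though there is no data to stream.
        cancellationToken.ThrowIfCancellationRequested();
        await Task.CompletedTask; // keeps the iterator async without performing I/O
        yield break;
    }
}
```

A caller would consume it with `await foreach (var row in fetcher.FetchAsync(minimumDT, token))`, which is the same loop a real Oracle-backed fetcher would serve.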

Scenario: Streaming data fetch

WHEN a sync operation requests data from a source system
THEN the system SHALL call IDataFetcher<T>.FetchAsync(minimumDT, cancellationToken)
THEN data SHALL stream via IAsyncEnumerable<T> without loading all records into memory
THEN cancellation SHALL stop the enumeration gracefully

Scenario: Cancellation during fetch

WHEN the cancellation token is triggered during a fetch operation
THEN the async enumerable SHALL stop yielding records
THEN database resources SHALL be properly disposed
THEN the sync operation SHALL be marked as failed

Requirement: Health Checks

The system SHALL expose health check endpoints for monitoring sync status.

Inputs

IHealthCheck registration with ASP.NET Core health checks
Current sync state and last successful sync timestamps

Outputs

Health status: Healthy, Degraded, or Unhealthy
Diagnostic data including last sync times and any error messages

Business Rules

The health check SHALL report Healthy when all enabled tables have synced within their configured intervals
The health check SHALL report Degraded when any table is overdue but syncs are progressing
The health check SHALL report Unhealthy when syncs have been failing repeatedly or the service is not running
Health check response SHALL include per-table sync status and timestamps

Scenario: All syncs current

WHEN health check executes
AND all enabled tables have successful syncs within their intervals
THEN the check SHALL return Healthy status
THEN response SHALL include last sync timestamps per table

Scenario: Overdue syncs with progress

WHEN health check executes
AND some tables are overdue for sync
AND sync operations are currently running or recently completed
THEN the check SHALL return Degraded status
THEN response SHALL identify which tables are overdue

Scenario: Repeated failures

WHEN health check executes
AND multiple recent sync operations have failed
THEN the check SHALL return Unhealthy status
THEN response SHALL include error details from failed syncs

Requirement: Telemetry and Metrics

The system SHALL emit metrics and traces for observability.

Inputs

System.Diagnostics.Metrics meter for metrics
System.Diagnostics.ActivitySource for distributed tracing

Outputs

Counters: sync operations started, completed, failed
Histograms: sync duration, records processed
Activity spans for distributed tracing

Business Rules

The service SHALL create a Meter named DataSync
The service SHALL emit counters for: sync.operations.started, sync.operations.completed, sync.operations.failed
The service SHALL emit histograms for: sync.duration.seconds, sync.records.processed
Each sync operation SHALL create an Activity span with tags for table name, update type, and source system
Activity spans SHALL include record count and duration on completion

Scenario: Sync operation telemetry

WHEN a sync operation starts
THEN the system SHALL increment sync.operations.started counter
THEN the system SHALL start an Activity span with table and type tags
WHEN a sync operation completes successfully
THEN the system SHALL increment sync.operations.completed counter
THEN the system SHALL record duration in sync.duration.seconds histogram
THEN the system SHALL record count in sync.records.processed histogram
THEN the Activity span SHALL be completed with success status
WHEN a sync operation fails
THEN the system SHALL increment sync.operations.failed counter
THEN the Activity span SHALL be completed with error status and exception details
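
The instruments and spans above might be wrapped as shown in this sketch. The Meter name and instrument names come from the business rules; the `SyncTelemetry` class, the ActivitySource name, the span name, and the tag keys are assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public sealed class SyncTelemetry : IDisposable
{
    private readonly Meter _meter = new("DataSync");
    // The ActivitySource name is an assumption; the spec only names the Meter.
    private readonly ActivitySource _activities = new("DataSync");
    private readonly Counter<long> _started;
    private readonly Counter<long> _completed;
    private readonly Counter<long> _failed;
    private readonly Histogram<double> _durationSeconds;
    private readonly Histogram<long> _recordsProcessed;

    public SyncTelemetry()
    {
        _started = _meter.CreateCounter<long>("sync.operations.started");
        _completed = _meter.CreateCounter<long>("sync.operations.completed");
        _failed = _meter.CreateCounter<long>("sync.operations.failed");
        _durationSeconds = _meter.CreateHistogram<double>("sync.duration.seconds");
        _recordsProcessed = _meter.CreateHistogram<long>("sync.records.processed");
    }

    public Activity? Start(string tableName, string updateType, string sourceSystem)
    {
        _started.Add(1);
        // StartActivity returns null when no listener is subscribed to the source.
        var activity = _activities.StartActivity("sync");
        activity?.SetTag("table.name", tableName);       // tag keys are hypothetical
        activity?.SetTag("update.type", updateType);
        activity?.SetTag("source.system", sourceSystem);
        return activity;
    }

    public void Complete(Activity? activity, TimeSpan elapsed, long records)
    {
        _completed.Add(1);
        _durationSeconds.Record(elapsed.TotalSeconds);
        _recordsProcessed.Record(records);
        activity?.SetTag("record.count", records);
        activity?.SetStatus(ActivityStatusCode.Ok);
        activity?.Dispose(); // stops the span, which fixes its duration
    }

    public void Fail(Activity? activity, Exception error)
    {
        _failed.Add(1);
        activity?.SetTag("exception.type", error.GetType().FullName);
        activity?.SetStatus(ActivityStatusCode.Error, error.Message);
        activity?.Dispose();
    }

    public void Dispose()
    {
        _meter.Dispose();
        _activities.Dispose();
    }
}
```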

Requirement: Schedule-Based Sync Triggering

The system SHALL support three distinct sync schedule types: Mass, Daily, and Hourly, each with independent intervals and behaviors per table.

Inputs

Data source configuration via IOptions<DataSyncOptions>
LastDataUpdates view providing timestamps of last successful syncs
Current system time

Outputs

List of pending DataUpdateTask objects requiring execution
Each task specifies: target table, update type, and minimum timestamp for incremental fetches

Business Rules

Mass updates SHALL trigger when no prior successful mass update exists OR when the configured mass interval has elapsed since the last mass update
Daily updates SHALL trigger when mass is current AND daily interval has elapsed since last daily update
Hourly updates SHALL trigger when mass and daily are current AND hourly interval has elapsed since last hourly update
Schedule priority SHALL be: Mass > Daily > Hourly (mass takes precedence)
Incremental updates (Daily/Hourly) SHALL use a configurable lookback window (default 3x the interval) to capture delayed records
Hourly incremental updates SHALL use the last Daily sync timestamp with the lookback multiplier applied to the Daily interval (not the Hourly interval)
Only tables with IsEnabled = true AND the specific schedule enabled SHALL be considered for automatic sync

Scenario: Initial system startup with no prior syncs

WHEN the system starts and no DataUpdate records exist for a table
AND the table has IsEnabled = true and MassConfig.Enabled = true
THEN the system SHALL queue a Mass update task for that table
THEN the MinimumDT parameter SHALL be null (full data fetch)

Scenario: Mass sync interval elapsed

WHEN a table's last successful Mass update occurred more than MassConfig.Interval minutes ago
THEN the system SHALL queue a Mass update task for that table
THEN any pending Daily or Hourly updates for that table SHALL be superseded

Scenario: Daily sync triggers after mass is current

WHEN a table's Mass update is current (within interval)
AND the last Daily update occurred more than DailyConfig.Interval minutes ago
THEN the system SHALL queue a Daily update task
THEN the MinimumDT SHALL be set to LastDailyUpdateDT - (LookbackMultiplier * DailyInterval) minutes

Scenario: Hourly sync with lookback window

WHEN an Hourly update is triggered
THEN the system SHALL fetch records modified since MinimumDT
AND MinimumDT SHALL equal LastDailyUpdateDT - (LookbackMultiplier * DailyInterval) minutes (using Daily timestamp, not Hourly)
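
The priority order and the MinimumDT arithmetic above can be sketched as a pure function. The record and class names are hypothetical. Two behaviors are assumptions because the spec does not state them: a disabled schedule is treated as never due, and when no Daily timestamp exists yet the last Mass timestamp is used as the base for the lookback.

```csharp
#nullable enable
using System;

public enum UpdateType { Mass, Daily, Hourly }
public sealed record Schedule(bool Enabled, int IntervalMinutes);
public sealed record TableSchedule(bool IsEnabled, Schedule Mass, Schedule Daily, Schedule Hourly);
public sealed record LastUpdates(DateTime? Mass, DateTime? Daily, DateTime? Hourly);
public sealed record DataUpdateTask(string TableName, UpdateType UpdateType, DateTime? MinimumDT);

public static class SyncScheduler
{
    // Returns the single highest-priority task that is due, or null when nothing is due.
    public static DataUpdateTask? Next(
        string table, TableSchedule cfg, LastUpdates last, int lookbackMultiplier, DateTime now)
    {
        if (!cfg.IsEnabled) return null;

        // Mass wins over Daily and Hourly; MinimumDT = null means a full fetch.
        if (cfg.Mass.Enabled && IsDue(last.Mass, cfg.Mass, now))
            return new DataUpdateTask(table, UpdateType.Mass, null);

        // Both incremental types use the Daily timestamp and the Daily interval.
        // Falling back to the Mass timestamp is an assumption of this sketch.
        DateTime? minimumDT = (last.Daily ?? last.Mass)
            ?.AddMinutes(-lookbackMultiplier * cfg.Daily.IntervalMinutes);

        if (cfg.Daily.Enabled && IsDue(last.Daily, cfg.Daily, now))
            return new DataUpdateTask(table, UpdateType.Daily, minimumDT);
        if (cfg.Hourly.Enabled && IsDue(last.Hourly, cfg.Hourly, now))
            return new DataUpdateTask(table, UpdateType.Hourly, minimumDT);
        return null;
    }

    private static bool IsDue(DateTime? lastRun, Schedule schedule, DateTime now) =>
        lastRun is null || now - lastRun.Value > TimeSpan.FromMinutes(schedule.IntervalMinutes);
}
```

With the standard intervals and a multiplier of 3, a last Daily sync at 2026-01-10 00:00 gives a MinimumDT of 2026-01-07 00:00, since 3 x 1440 minutes is three days.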

Scenario: Disabled table not scheduled

WHEN a table has IsEnabled = false OR all schedule configs have Enabled = false
THEN the table SHALL NOT be automatically scheduled for sync
THEN syncs MAY only occur via explicit manual trigger through the admin API

Requirement: Data Source Configuration

The system SHALL load and validate data source configurations defining sync behavior per table.

Inputs

DataSourceOptions entries within DataSyncOptions
Each entry specifies: SourceSystem, TableName, FetcherTypeName, PostProcessorTypeName, and schedule configs

Outputs

Validated DataSourceOptions with resolved service types
Only configurations with IsEnabled = true are active

Business Rules

Each data source MUST specify a FetcherTypeName that resolves to an IDataFetcher<T> implementation
PostProcessorTypeName is optional and specifies an IPostProcessor implementation
Standard intervals SHALL be: Mass = 10080 minutes (7 days), Daily = 1440 minutes (24 hours), Hourly = 60 minutes
CMS data sources MAY have different intervals (e.g., MisData uses Mass = 100800 minutes / 70 days)
Archive tables MAY disable all schedules and require manual triggering via admin API

Scenario: Configuration validation at startup

WHEN the service starts
THEN all DataSourceOptions entries SHALL be validated
THEN FetcherTypeName values SHALL be resolved to registered services
THEN only configurations with IsEnabled = true SHALL be added to the active configs list

Scenario: Disabled archive table configuration

WHEN a configuration has MassConfig.Enabled = false, DailyConfig.Enabled = false, and HourlyConfig.Enabled = false
THEN the table SHALL never be automatically scheduled for sync
THEN syncs MAY only occur via explicit manual trigger through the admin API

Scenario: Post-processing action execution

WHEN a data source specifies a PostProcessorTypeName
AND the data merge completes successfully
THEN the system SHALL resolve and invoke the IPostProcessor.ProcessAsync() method
THEN the update SHALL only be marked complete after post-processing succeeds

Scenario: CMS vs JDE source configuration

WHEN a data source has SourceSystem = "CMS"
THEN the FetcherTypeName SHALL reference a CMS-specific IDataFetcher<T> implementation
WHEN a data source has SourceSystem = "JDE"
THEN the FetcherTypeName SHALL reference a JDE-specific IDataFetcher<T> implementation

Requirement: Table Management and Merge Operations

The system SHALL use staging tables and SQL MERGE operations to efficiently upsert data while preserving existing records.

Inputs

Source data from IDataFetcher<T>.FetchAsync() execution
Destination table schema (columns, primary key, indexes)
Update configuration (PrepurgeData, ReIndexData flags)

Outputs

Updated destination table with merged data
Rebuilt indexes (if configured)
Staging and temp tables cleaned up

Business Rules

Mass updates with PrepurgeData = true SHALL TRUNCATE the destination table before loading
Incremental updates (Daily/Hourly) SHALL use MERGE to upsert without deleting existing records
Data SHALL be batched in groups of 1,000,000 records for bulk copy operations
Bulk copy SHALL use a batch size of 10,000 rows with streaming enabled
Staging tables SHALL be named #Staging{TableName}_{OperationId} (local temp tables with unique suffix for parallel isolation)
Temp tables SHALL be named #{TableName}_{OperationId} (local temp tables with unique suffix)
MERGE SHALL update existing records only when the source LastUpdateDT is greater than the target LastUpdateDT (when the column exists)
Tables without LastUpdateDT column SHALL update all matched rows unconditionally
Non-primary-key indexes SHALL be disabled during bulk load and rebuilt after
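
To make the naming and MERGE rules concrete, here is a sketch that builds the table names and the MERGE statement as strings. `MergeSqlBuilder` is a hypothetical helper, not the legacy generator from `UpdateProcessor.TableManagement.cs`. It assumes the OperationId is a GUID, that the deduplicated temp table is the MERGE source, and that column names come from trusted schema metadata rather than user input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSqlBuilder
{
    // The "N" format drops the hyphens from the GUID, keeping the name compact.
    public static string StagingName(string table, Guid operationId) => $"#Staging{table}_{operationId:N}";
    public static string TempName(string table, Guid operationId) => $"#{table}_{operationId:N}";

    // dataColumns lists the non-key columns, including LastUpdateDT when the table has it.
    public static string BuildMerge(
        string table, Guid operationId,
        IReadOnlyList<string> keyColumns, IReadOnlyList<string> dataColumns,
        bool hasLastUpdateDT)
    {
        var allColumns = keyColumns.Concat(dataColumns).ToList();
        var on = string.Join(" AND ", keyColumns.Select(c => $"target.[{c}] = source.[{c}]"));
        var set = string.Join(", ", dataColumns.Select(c => $"target.[{c}] = source.[{c}]"));
        var insertColumns = string.Join(", ", allColumns.Select(c => $"[{c}]"));
        var insertValues = string.Join(", ", allColumns.Select(c => $"source.[{c}]"));

        // Without LastUpdateDT every matched row is updated unconditionally.
        var matched = hasLastUpdateDT
            ? "WHEN MATCHED AND source.[LastUpdateDT] > target.[LastUpdateDT]"
            : "WHEN MATCHED";

        return
            $"MERGE [{table}] AS target\n" +
            $"USING [{TempName(table, operationId)}] AS source\n" +
            $"ON {on}\n" +
            $"{matched} THEN UPDATE SET {set}\n" +
            $"WHEN NOT MATCHED BY TARGET THEN INSERT ({insertColumns}) VALUES ({insertValues});";
    }
}
```

Because the statement has no `WHEN NOT MATCHED BY SOURCE` clause, rows that exist only in the destination are left untouched, which matches the rule that incremental updates never delete existing records.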

Scenario: Mass update with table truncation

WHEN a Mass update executes with PrepurgeData = true
THEN the destination table SHALL be truncated before data load
THEN all records from source SHALL be inserted
THEN indexes SHALL be rebuilt if ReIndexData = true

Scenario: Incremental update with MERGE

WHEN a Daily or Hourly update executes
THEN the system SHALL create a staging table matching destination schema with unique suffix
THEN source data SHALL be bulk copied to staging table
THEN data SHALL be deduplicated into temp table using ROW_NUMBER() OVER(PARTITION BY PK ORDER BY LastUpdateDT DESC)
THEN MERGE SHALL insert new records and update existing records where source.LastUpdateDT > target.LastUpdateDT

Scenario: Table without LastUpdateDT column

WHEN MERGE executes on a table without LastUpdateDT column
THEN all matched rows SHALL be updated unconditionally
THEN the ReleaseDate column (if present) SHALL only be used for ORDER BY in deduplication, not for update filtering

Scenario: Large dataset batching

WHEN the data fetch streams more than 1,000,000 records
THEN records SHALL be processed in batches of 1,000,000
THEN each batch SHALL create fresh staging/temp tables with unique suffixes
THEN each batch SHALL execute MERGE independently
THEN total record count SHALL accumulate across all batches
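
One way to batch a streamed fetch without holding the whole result in memory is a small extension over IAsyncEnumerable. `BatchAsync` and `AsyncBatching` are hypothetical names; the sketch only shows the chunking step, and each yielded batch would then go through its own staging, deduplication, and MERGE pass.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

public static class AsyncBatching
{
    // Groups a stream into lists of at most batchSize items; the last batch may be smaller.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        this IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>();
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(); // start a fresh list so yielded batches are never mutated
            }
        }

        if (batch.Count > 0) yield return batch;
    }
}
```

The caller can keep a running total by adding `batch.Count` after each MERGE, so the record count accumulates across batches as the scenario requires.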

Scenario: Index management during bulk load

WHEN staging table is created
THEN an index SHALL be created on primary key columns plus LastUpdateDT (or ReleaseDate)
THEN non-PK, non-unique indexes SHALL be disabled before bulk copy
THEN indexes SHALL be rebuilt after bulk copy completes

Requirement: Update Logging and Recovery

The system SHALL log all sync operations and support recovery from interrupted syncs.

Inputs

DataUpdate table for recording sync history
LastDataUpdates view for querying last successful syncs

Outputs

Complete audit trail of all sync operations
Automatic recovery of interrupted syncs

Business Rules

Each sync operation MUST create a DataUpdate record at start with NumberRecords = -2 (in-progress marker)
The sync operation MUST be wrapped in try/catch to ensure failed operations are marked properly
Successful completion SHALL update EndDT, WasSuccessful = true, and actual NumberRecords
Failed operations SHALL set WasSuccessful = false and NumberRecords = -1
Open entries (NumberRecords = -2) from prior runs SHALL be closed as failed at service startup via CloseOpenUpdateEntries()
Old DataUpdate records SHALL be purged periodically via PurgeUpdateEntries() after configurable retention period
All logging SHALL use ILogger<T> with BeginScope() for structured context (table name, update type, operation ID)
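
The marker values above (-2 in progress, -1 failed) could be applied by a wrapper like the following sketch. It keeps the entry in memory for illustration; the real service would write to the DataUpdate table and log through ILogger<T>. `DataUpdateEntry`, `LoggedSync`, and the `logError` callback are hypothetical, and setting EndDT on failure is an assumption, since the rules only state it for success and startup recovery.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-in for one DataUpdate row.
public sealed class DataUpdateEntry
{
    public DateTime StartDT { get; set; }
    public DateTime? EndDT { get; set; }
    public bool? WasSuccessful { get; set; }
    public int NumberRecords { get; set; } = -2; // -2 marks an in-progress sync
}

public static class LoggedSync
{
    // Runs one sync and records its outcome; exceptions are reported, not rethrown.
    public static async Task<DataUpdateEntry> RunAsync(
        Func<CancellationToken, Task<int>> sync,
        Action<Exception> logError,
        CancellationToken cancellationToken)
    {
        var entry = new DataUpdateEntry { StartDT = DateTime.Now, NumberRecords = -2 };
        try
        {
            entry.NumberRecords = await sync(cancellationToken);
            entry.WasSuccessful = true;
        }
        catch (Exception ex)
        {
            // Cancellation lands here too, so interrupted syncs are marked as failed.
            entry.NumberRecords = -1;
            entry.WasSuccessful = false;
            logError(ex);
        }
        finally
        {
            entry.EndDT = DateTime.Now; // assumption: EndDT is also set for failures
        }
        return entry;
    }
}
```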

Scenario: Sync operation start logging

WHEN a sync operation begins
THEN a DataUpdate record SHALL be inserted with NumberRecords = -2
THEN the record SHALL include SourceSystem, SourceData, TableName, UpdateType, StartDT
THEN the operation SHALL create a logging scope with table name and operation ID

Scenario: Successful sync completion

WHEN a sync operation completes without errors
THEN the DataUpdate record SHALL be updated with EndDT = GETDATE()
THEN WasSuccessful SHALL be set to true
THEN NumberRecords SHALL reflect the total rows processed

Scenario: Failed sync handling

WHEN a sync operation throws an exception
THEN the exception SHALL be caught in the operation wrapper
THEN the DataUpdate record SHALL be updated with WasSuccessful = false, NumberRecords = -1
THEN the error SHALL be logged via ILogger<T> with full exception details
THEN subsequent sync attempts SHALL retry the operation

Scenario: Recovery from interrupted sync at startup

WHEN the service starts and finds DataUpdate records with NumberRecords = -2
THEN CloseOpenUpdateEntries() SHALL update those records to EndDT = GETDATE(), WasSuccessful = false, NumberRecords = -1
THEN the system SHALL treat those tables as needing fresh sync based on last successful update

Scenario: Periodic history purge

WHEN PurgeUpdateEntries() executes
THEN DataUpdate records older than PurgeRetentionDays SHALL be deleted
THEN the purge SHALL run periodically (e.g., daily) independent of sync operations

Requirement: Parallel Sync Execution

The system SHALL execute multiple table syncs in parallel to optimize throughput with proper cancellation support.

Inputs

List of pending DataUpdateTask objects
MaxDegreeOfParallelism from DataSyncOptions
CancellationToken for cancellation support

Outputs

Concurrent execution of sync operations
Proper isolation between parallel syncs
Graceful cancellation of parallel operations

Business Rules

Pending updates SHALL be executed in parallel using Parallel.ForEachAsync or SemaphoreSlim with Task.WhenAll
Maximum degree of parallelism SHALL be configurable (default = 8)
Each sync operation MUST use its own IServiceScope for scoped service resolution
Each sync operation MUST use its own database connection
Staging tables MUST use unique suffixes (_{OperationId}) to avoid conflicts in parallel scenarios
CancellationToken MUST be passed to all parallel operations
Search processing SHALL be blocked while any sync operations are pending
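
As an illustration of the SemaphoreSlim with Task.WhenAll option named above, the sketch below caps concurrency and collects each operation's record count through its return value rather than a shared counter. `ParallelSync` and the `syncOne` delegate are hypothetical; in the service, `syncOne` would create its own IServiceScope and database connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSync
{
    // Runs every pending task with at most maxDegreeOfParallelism in flight
    // and returns the total number of records processed.
    public static async Task<long> RunAllAsync<TTask>(
        IReadOnlyList<TTask> pending,
        int maxDegreeOfParallelism,
        Func<TTask, CancellationToken, Task<int>> syncOne,
        CancellationToken cancellationToken)
    {
        using var gate = new SemaphoreSlim(maxDegreeOfParallelism);

        var running = pending.Select(async task =>
        {
            await gate.WaitAsync(cancellationToken);
            try
            {
                return await syncOne(task, cancellationToken);
            }
            finally
            {
                gate.Release();
            }
        }).ToList(); // ToList starts every task; the semaphore limits how many proceed

        // WhenAll completes only after all tasks finish, so the gate is not disposed early.
        int[] counts = await Task.WhenAll(running);
        return counts.Sum(c => (long)c);
    }
}
```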

Scenario: Multiple tables need sync

WHEN multiple tables have pending sync tasks
THEN the system SHALL execute up to MaxDegreeOfParallelism sync operations concurrently
THEN each operation SHALL create its own IServiceScope
THEN each operation SHALL use independent SQL connections
THEN completion of one operation SHALL NOT affect others

Scenario: Cancellation during parallel sync

WHEN cancellation is requested during parallel sync execution
THEN the CancellationToken SHALL propagate to all running operations
THEN operations SHALL check the token and exit gracefully
THEN incomplete operations SHALL be marked as failed

Scenario: Sync blocks search processing

WHEN the work processor checks for pending sync tasks
AND pending tasks exist
THEN sync operations SHALL execute before processing any queued searches
THEN search processing SHALL only begin when no sync tasks remain pending

Scenario: Sync with isolated resources

WHEN multiple sync operations run in parallel
THEN each operation SHALL create staging tables with unique suffixes
THEN each operation SHALL use its own scoped database connection
THEN no shared mutable state SHALL exist between parallel operations

Requirement: CMS Availability and Circuit Breaker

The system SHALL handle CMS (Sybase) connectivity issues with a circuit breaker pattern.

Inputs

CMS connection state
Recent CMS sync failure history

Outputs

Automatic retry with backoff for transient failures
Circuit breaker to prevent repeated failed connection attempts

Business Rules

CMS connections SHALL use Polly or a similar circuit breaker implementation
The circuit SHALL open after consecutive failures (configurable, default = 3)
The circuit SHALL remain open for a configurable duration (default = 5 minutes)
Health checks SHALL report CMS circuit state
JDE syncs SHALL continue independently of CMS circuit state

Scenario: CMS transient failure

WHEN a CMS sync fails with a transient error
THEN the system SHALL retry with exponential backoff
THEN the failure count SHALL increment

Scenario: Circuit breaker opens

WHEN consecutive CMS sync failures reach the configured threshold
THEN the circuit breaker SHALL open
THEN subsequent CMS sync attempts SHALL fail fast without attempting connection
THEN JDE syncs SHALL continue normally

Scenario: Circuit breaker recovery

WHEN the circuit breaker open duration elapses
THEN the circuit SHALL transition to half-open state
THEN the next CMS sync attempt SHALL be allowed
THEN success SHALL close the circuit; failure SHALL reopen it
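
The closed, open, and half-open transitions described above can be sketched as a small state machine. This is a hand-rolled illustration of the pattern, not Polly's API; `SimpleCircuitBreaker` is a hypothetical name, the clock is injected so the timing can be tested, and the class is not thread-safe, so a real implementation would need locking if CMS syncs run in parallel.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _clock;
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime> clock)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock;
    }

    public CircuitState State { get; private set; } = CircuitState.Closed;

    // Returns false while the circuit is open, so callers fail fast without connecting.
    public bool AllowAttempt()
    {
        if (State == CircuitState.Open && _clock() - _openedAt >= _openDuration)
            State = CircuitState.HalfOpen; // open duration elapsed: allow one trial attempt
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    public void RecordFailure()
    {
        _consecutiveFailures++;
        // A failed trial reopens immediately; otherwise open once the threshold is reached.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = _clock();
        }
    }
}
```

With the stated defaults this would be constructed as `new SimpleCircuitBreaker(3, TimeSpan.FromMinutes(5), () => DateTime.UtcNow)`, and only CMS fetchers would consult it, leaving JDE syncs unaffected.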

Requirement: Archive Sync Manual Trigger

The system SHALL support manual triggering of archive table syncs via admin API.

Inputs

HTTP request to admin API endpoint
Table name and optional update type parameters

Outputs

Queued sync task for the specified archive table
Status response indicating task queued

Business Rules

Archive tables with all schedules disabled SHALL only sync via manual trigger
The admin API endpoint SHALL require authentication and authorization
Manual triggers SHALL queue a Mass update task for the specified table
The system SHALL return immediate acknowledgment; sync runs asynchronously

Scenario: Manual archive sync trigger

WHEN an authenticated admin calls the manual sync API for an archive table
THEN a Mass update task SHALL be queued for that table
THEN the API SHALL return 202 Accepted with task ID
THEN the sync SHALL execute in the background service

Requirement: Periodic Index Maintenance

The system SHALL support periodic index maintenance independent of mass syncs.

Inputs

Index maintenance configuration (schedule, tables)
Current table statistics

Outputs

Rebuilt or reorganized indexes
Updated statistics

Business Rules

Index maintenance MAY be configured to run on a schedule independent of mass syncs
Maintenance SHALL check index fragmentation before rebuilding
Indexes with fragmentation > 30% SHALL be rebuilt; 10-30% SHALL be reorganized
Statistics SHALL be updated after index maintenance
Maintenance operations SHALL be logged for audit

Scenario: Scheduled index maintenance

WHEN the index maintenance schedule triggers
THEN the system SHALL check fragmentation levels for configured tables
THEN highly fragmented indexes SHALL be rebuilt
THEN moderately fragmented indexes SHALL be reorganized
THEN table statistics SHALL be updated
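
The fragmentation thresholds map to a simple decision, sketched here with hypothetical names (`IndexAction`, `IndexMaintenance`). The sketch assumes both ends of the 10-30% band are inclusive and that the fragmentation percentage has already been read from SQL Server, for example from `sys.dm_db_index_physical_stats`.

```csharp
#nullable enable
public enum IndexAction { None, Reorganize, Rebuild }

public static class IndexMaintenance
{
    // Above 30% rebuild, 10% to 30% reorganize, below 10% leave the index alone.
    public static IndexAction Decide(double fragmentationPercent) =>
        fragmentationPercent > 30 ? IndexAction.Rebuild
        : fragmentationPercent >= 10 ? IndexAction.Reorganize
        : IndexAction.None;

    // Builds the T-SQL statement for the chosen action, or null when nothing is needed.
    // Table and index names are assumed to come from trusted schema metadata.
    public static string? ToSql(string table, string index, IndexAction action) => action switch
    {
        IndexAction.Rebuild => $"ALTER INDEX [{index}] ON [{table}] REBUILD;",
        IndexAction.Reorganize => $"ALTER INDEX [{index}] ON [{table}] REORGANIZE;",
        _ => null
    };
}
```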

Requirement: Background service implementation pattern

The system SHALL implement the data synchronization service following .NET BackgroundService best practices for hosted service lifecycle management.

Inputs

IServiceScopeFactory for creating scoped service instances
IOptions<DataSyncOptions> for configuration access
ILogger<DataSyncService> for structured logging
CancellationToken from ExecuteAsync stoppingToken parameter

Outputs

Continuously running background task that checks schedules and executes syncs
Proper cleanup on shutdown with all resources disposed
Logging scope context for all operations

Business Rules

The service MUST implement BackgroundService.ExecuteAsync(CancellationToken)
The main loop MUST use Task.Delay(checkInterval, stoppingToken) between cycles
Each sync cycle MUST create a new IServiceScope via IServiceScopeFactory.CreateAsyncScope()
All scoped services MUST be resolved from the current scope, not from root provider
The scope MUST be disposed using the await using pattern after each cycle
Exception handling MUST catch and log errors without crashing the service
OperationCanceledException MUST be caught and result in graceful loop exit when stoppingToken.IsCancellationRequested
The service MUST NOT use static state or shared mutable collections

Scenario: Normal sync cycle execution

WHEN the BackgroundService enters ExecuteAsync
THEN the service SHALL call CloseOpenUpdateEntriesAsync to recover from prior crashes
THEN the service SHALL enter a while loop checking !stoppingToken.IsCancellationRequested
THEN each iteration SHALL create a new IServiceScope
THEN the ISyncOrchestrator SHALL be resolved from the scope
THEN ExecutePendingSyncsAsync SHALL be called with the stoppingToken
THEN the scope SHALL be disposed after the call completes
THEN Task.Delay SHALL pause before the next iteration

Scenario: Exception during sync cycle

WHEN an exception occurs during sync execution (not OperationCanceledException)
THEN the exception SHALL be caught and logged with LogError
THEN the service SHALL continue to the next iteration
THEN the current scope SHALL still be disposed properly
THEN the service SHALL NOT crash or stop unexpectedly

Scenario: Graceful shutdown request

WHEN the host signals shutdown by canceling the stoppingToken
THEN any running Task.Delay SHALL throw OperationCanceledException
THEN the while loop SHALL exit on the IsCancellationRequested check
THEN the ExecuteAsync method SHALL complete normally
THEN any in-progress sync operations SHALL receive the cancellation and complete or cancel
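
The loop behavior in these scenarios can be sketched without the hosting packages by passing the moving parts in as delegates. `SyncLoop` and its parameters are hypothetical. In the actual service this logic would be the body of `ExecuteAsync`, `recover` would call CloseOpenUpdateEntriesAsync, and `runCycle` would create a scope with `CreateAsyncScope()`, resolve ISyncOrchestrator, and call ExecutePendingSyncsAsync.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncLoop
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> recover,
        Func<CancellationToken, Task> runCycle,
        TimeSpan checkInterval,
        Action<Exception> logError,
        CancellationToken stoppingToken)
    {
        // Close entries left open by a prior crash before scheduling anything.
        await recover(stoppingToken);

        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await runCycle(stoppingToken);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                break; // shutdown requested during a cycle: exit quietly
            }
            catch (Exception ex)
            {
                logError(ex); // log and keep the service alive for the next cycle
            }

            try
            {
                await Task.Delay(checkInterval, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break; // shutdown requested while waiting
            }
        }
    }
}
```

Keeping the delay outside the cycle's try block means a cycle that fails every time still waits `checkInterval` before retrying, instead of spinning.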

Requirement: Parallel fetch isolation with scoped resources

The system SHALL ensure complete isolation between parallel sync operations using scoped resources and unique identifiers.

Inputs

List of DataUpdateTask objects to execute in parallel
MaxDegreeOfParallelism configuration value
CancellationToken for coordinated cancellation

Outputs

Concurrent execution of sync operations with no resource conflicts
Unique staging tables per operation that do not collide
Independent database connections per operation

Business Rules

Parallel.ForEachAsync MUST be used with ParallelOptions.CancellationToken set
Each parallel task MUST create its own IServiceScope inside the parallel delegate
Database connections MUST NOT be shared across parallel operations
Staging table names MUST include a unique OperationId suffix (GUID or sequential ID)
Format: #Staging{TableName}_{OperationId} and #{TableName}_{OperationId}
Each parallel operation MUST resolve its own instances of all scoped services
No ConcurrentDictionary, shared counters, or other shared mutable state SHALL exist between operations
Total record counts SHALL be accumulated via return values, not shared state

Scenario: Parallel sync with isolated scopes

WHEN multiple DataUpdateTasks are executed via Parallel.ForEachAsync
THEN each task SHALL execute the async delegate independently
THEN each delegate SHALL create a new IServiceScope using CreateAsyncScope
THEN ITableSyncOperation SHALL be resolved from each scope independently
THEN each operation SHALL use its own database connection from the scope
THEN staging tables SHALL use unique OperationId suffixes preventing name collisions
THEN completion of one operation SHALL NOT affect the execution of others

Scenario: Parallel cancellation propagation

WHEN cancellation is requested during Parallel.ForEachAsync execution
THEN the CancellationToken SHALL propagate to all running parallel operations
THEN Parallel.ForEachAsync SHALL stop starting new operations
THEN running operations SHALL receive the token in their async methods
THEN each operation SHALL check the token and exit gracefully
THEN incomplete operations SHALL mark their DataUpdate records as failed

Scenario: Staging table uniqueness verification

WHEN two sync operations for the same table run in parallel
THEN each operation SHALL generate a unique OperationId as GUID
THEN operation A SHALL create staging table with GuidA suffix
THEN operation B SHALL create staging table with GuidB suffix
THEN no SQL errors SHALL occur from table name conflicts
THEN each operation cleanup SHALL only drop its own staging tables

Requirement: Structured logging context

The system SHALL use ILogger.BeginScope to attach contextual information to all log entries during sync operations.

Inputs

ILogger<T> injected into sync operation classes
TableName, UpdateType, OperationId values from current operation

Outputs

All log entries within the scope contain the contextual properties
Log aggregation systems can filter and group by table, type, or operation

Business Rules

Each sync operation MUST call _logger.BeginScope(...) at the start
The scope MUST include at minimum: TableName, UpdateType, OperationId
The scope MUST be disposed with a using statement when the operation completes
Nested scopes for batches SHALL preserve parent scope properties
LogInformation, LogWarning, LogError calls within the scope SHALL include the context automatically

Scenario: Log scope creation and usage

WHEN a TableSyncOperation begins execution
THEN the operation SHALL create a logging scope with TableName, UpdateType, OperationId
THEN all log calls within ExecuteAsync SHALL include these properties
THEN when the operation completes the scope SHALL be disposed
THEN subsequent operations SHALL have their own independent scopes
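
A minimal sketch of such a scope, assuming the `Microsoft.Extensions.Logging` abstractions are referenced. `TableSyncOperationSketch` and its `ExecuteAsync` signature are hypothetical, and the `Task.Delay` call is only a placeholder for the real sync work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public sealed class TableSyncOperationSketch
{
    private readonly ILogger<TableSyncOperationSketch> _logger;

    public TableSyncOperationSketch(ILogger<TableSyncOperationSketch> logger)
        => _logger = logger;

    public async Task ExecuteAsync(
        string tableName, string updateType, Guid operationId, CancellationToken ct)
    {
        // A dictionary state lets structured logging providers expose each
        // key as a separate property on every entry written inside the scope.
        using (_logger.BeginScope(new Dictionary<string, object>
        {
            ["TableName"] = tableName,
            ["UpdateType"] = updateType,
            ["OperationId"] = operationId
        }))
        {
            _logger.LogInformation("Sync started");
            await Task.Delay(TimeSpan.FromMilliseconds(10), ct); // placeholder for real work
            _logger.LogInformation("Sync completed");
        } // The scope is disposed here, so the next operation starts with a fresh one.
    }
}
```

Whether the scope properties actually appear in the output depends on the configured logging provider; some providers need scopes to be enabled explicitly.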

Data Source Configurations

Current/Transactional Tables (Full Schedule)

Table	Source	Mass Interval	Daily Interval	Hourly Interval
WorkOrder_Curr	JDE	10080 min (7d)	1440 min (24h)	60 min
LotUsage_Curr	JDE	10080 min (7d)	1440 min (24h)	60 min
WorkOrderTime_Curr	JDE	10080 min (7d)	1440 min (24h)	60 min
WorkOrderStep_Curr	JDE	10080 min (7d)	1440 min (24h)	60 min
WorkOrderComponent_Curr	JDE	10080 min (7d)	1440 min (24h)	60 min
WorkOrderRouting	JDE	10080 min (7d)	1440 min (24h)	60 min

Reference Tables (Full Schedule)

Table	Source	Mass Interval	Daily Interval	Hourly Interval
Item	JDE	10080 min (7d)	1440 min (24h)	60 min
Lot	JDE	10080 min (7d)	1440 min (24h)	60 min
WorkCenter	JDE	10080 min (7d)	1440 min (24h)	60 min
ProfitCenter	JDE	10080 min (7d)	1440 min (24h)	60 min
Branch	JDE	10080 min (7d)	1440 min (24h)	60 min
JdeUser	JDE	10080 min (7d)	1440 min (24h)	60 min
StatusCode	JDE	10080 min (7d)	1440 min (24h)	60 min
FunctionCode	JDE	10080 min (7d)	1440 min (24h)	60 min
OrgHierarchy	JDE	10080 min (7d)	1440 min (24h)	60 min
RouteMaster	JDE	10080 min (7d)	1440 min (24h)	60 min

CMS Tables

Table	Source	Mass Interval	Daily Interval	Hourly Interval	Notes
MisData	CMS	100800 min (70d)	1440 min (24h)	Disabled	Has PostProcessor

Archive Tables (Disabled - Manual Trigger via Admin API)

Table	Source	Notes
WorkOrder_Hist	JDE	All schedules disabled
LotUsage_Hist	JDE	All schedules disabled
WorkOrderStep_Hist	JDE	All schedules disabled
WorkOrderTime_Hist	JDE	All schedules disabled
WorkOrderComponent_Hist	JDE	All schedules disabled

Migration Notes

Legacy Pattern	New Pattern	Rationale
`Topshelf` Windows Service	.NET `BackgroundService`	Native .NET hosting, cross-platform support
`ManualResetEvent` for shutdown	`CancellationToken`	Standard .NET cancellation pattern
`Thread` while loop	`BackgroundService.ExecuteAsync` with `Task.Delay`	Proper async/await, no thread blocking
`Parallel.ForEach` with `MaxDegreeOfParallelism`	`Parallel.ForEachAsync` or `SemaphoreSlim` with `Task.WhenAll`	Modern async patterns, cancellation support
JSON config files + `Newtonsoft.Json`	`System.Text.Json` + `IOptions<T>` pattern	Built-in JSON support, configuration binding
`FunctionConverter` reflection-based delegates	`IDataFetcher<T>` interfaces	Type safety, dependency injection, testability
`ActionConverter` reflection-based delegates	`IPostProcessor` interfaces	Type safety, dependency injection, testability
Static `UpdateProcessor` class	Scoped/singleton services with DI	Testability, proper lifecycle management
NLog	`ILogger<T>` injected + `BeginScope()` for context	Framework-integrated logging, structured context
Global temp tables `##staging_*`	Local temp tables `#Staging{Table}_{OperationId}`	Better isolation in parallel scenarios
`System.Data.SqlClient`	`Microsoft.Data.SqlClient`	Modern SQL Server driver with better performance
Manual SQL MERGE generation	Continue with Dapper + manual MERGE	Performance critical, maintain fine control
No health checks	`IHealthCheck` implementation	Kubernetes/container orchestration support
No metrics/tracing	`System.Diagnostics.Metrics` + `ActivitySource`	Observability, distributed tracing

Resolved Design Decisions

Archive Sync Strategy

Decision: Archive tables will be synced via manual trigger through an authenticated admin API endpoint.

Rationale: Archive data changes infrequently and full syncs are expensive. Manual triggering allows administrators to control when these resource-intensive operations occur.

CMS Availability Handling

Decision: Use circuit breaker pattern (Polly) for CMS connections with configurable failure threshold and open duration.

Rationale: CMS (Sybase) may have different availability characteristics than JDE. Circuit breaker prevents cascading failures and allows JDE syncs to continue independently.

Post-Processing Migration

Decision: Replace reflection-based PostProcessingAction with IPostProcessor interfaces resolved via DI.

Rationale: Type-safe interfaces enable compile-time checking, better testability, and clearer contracts. DI resolution allows for proper scoping and dependency management.

Lookback Window Configuration

Decision: Make lookback multiplier configurable via DataSyncOptions.LookbackMultiplier (default = 3).

Rationale: Different environments may need different lookback windows based on data arrival patterns. Configuration allows tuning without code changes.
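
One possible shape for this option is sketched below. Only the `LookbackMultiplier` name and its default of 3 come from the decision above; the `DataSyncOptionsSketch` and `Lookback` types are hypothetical, and the formula (last successful sync minus multiplier times interval) is an assumption about how the multiplier is applied, not a statement of the actual calculation.

```csharp
using System;

public sealed class DataSyncOptionsSketch
{
    // Default taken from the design decision; overridable via configuration.
    public int LookbackMultiplier { get; set; } = 3;
}

public static class Lookback
{
    // ASSUMPTION: the lookback window is the schedule interval scaled by the
    // multiplier, subtracted from the last successful sync timestamp.
    public static DateTime MinimumDT(
        DateTime lastSuccessfulSync, TimeSpan interval, int multiplier)
        => lastSuccessfulSync - TimeSpan.FromTicks(interval.Ticks * multiplier);
}
```

Under that assumption, a daily interval of 1440 minutes with the default multiplier would look back three days from the last successful sync.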

Index Rebuild Strategy

Decision: Add periodic index maintenance independent of mass syncs, checking fragmentation before rebuilding.

Rationale: Mass syncs may not run frequently enough for optimal index health. Separate maintenance allows proactive optimization based on actual fragmentation levels.

Codex Review Findings (Addressed)

The following issues were identified during code review and have been addressed in this specification:

Hourly MinimumDT Calculation: ADDRESSED - Spec now correctly documents that hourly updates use the daily timestamp with daily interval lookback (not hourly interval). See "Schedule-Based Sync Triggering" requirement.
Failure Recovery: ADDRESSED - Spec now requires DoUpdate wrapper with try/catch to mark failed updates. CloseOpenUpdateEntries() is invoked at startup. PurgeUpdateEntries() is invoked periodically. See "Update Logging and Recovery" and "Background Service Lifecycle" requirements.
Disabled Schedules Can Run: ADDRESSED - Spec now requires checking both IsEnabled AND specific schedule Enabled flags. Tables with all schedules disabled are only synced via manual trigger. See "Schedule-Based Sync Triggering" requirement.
Temp Table Naming: ADDRESSED - Spec now correctly documents #Staging{Table}_{OperationId} and #{Table}_{OperationId} naming with unique suffixes for parallel isolation. See "Table Management and Merge Operations" requirement.
Archive Table Names: ADDRESSED - Data Source Configurations table now uses correct _Hist suffix (LotUsage_Hist, WorkOrderStep_Hist, etc.).
WorkOrderRouting Table: ADDRESSED - Data Source Configurations table now correctly shows WorkOrderRouting (no _Curr suffix).
MERGE LastUpdateDT Edge Case: ADDRESSED - Spec now documents that tables without LastUpdateDT column update all matched rows unconditionally, and ReleaseDate is only used for ORDER BY in deduplication. See "Table without LastUpdateDT column" scenario.
