lmxopcua

Author	SHA1	Message	Date
Joseph Doherty	0da4f3b63a	fix(core-alarm-historian): resolve Low code-review findings (Core.AlarmHistorian-008,011) - Core.AlarmHistorian-008: cache queue depth in an Interlocked counter so EnqueueAsync no longer runs COUNT(*) on every alarm; consolidate DrainOnceAsync onto a single SqliteConnection per tick (purge, batch read, dead-letter, and outcome transaction all share it). - Core.AlarmHistorian-011: confirm the stale Galaxy.Host XML doc references were already fixed under earlier commits; flip to Resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 05:38:26 -04:00
Joseph Doherty	cec7ab6ec4	fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-010) The test suite lacked coverage for four critical paths: corrupt/null-deserializing PayloadJson rows, StartDrainLoop timer behavior and backoff honoring, concurrent EnqueueAsync+DrainOnceAsync stress, and the outcomes.Count != events.Count cardinality-mismatch branch. Added tests covering all four gaps (committed across companion findings): - Drain_with_corrupt_payload_row_deadletters_it_and_keeps_good_rows_aligned - Drain_with_corrupt_head_row_does_not_stall_queue - StartDrainLoop_honors_backoff_and_slows_cadence_under_retry - StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy - StartDrainLoop_records_drain_fault_and_keeps_running - Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy - Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows - Capacity_eviction_increments_evicted_count_on_status - GetStatus_snapshot_is_consistent_under_concurrent_drain Updated Open findings count to 2 (Core.AlarmHistorian-008 + -011, both Low). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 09:30:17 -04:00
Joseph Doherty	f6d487b167	fix(driver-historian-wonderware-client): suppress xUnit1051 false-positive in ContractsWireParityTests Add #pragma warning disable xUnit1051 at the top of ContractsWireParityTests.cs. The xUnit1051 analyser fires on MessagePack's Serialize/Deserialize overloads that have an optional CancellationToken parameter; these are synchronous parity tests where the token is not meaningful — the suppression is scoped to this file only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 09:28:20 -04:00
Joseph Doherty	5718cb5778	fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-007) When WriteBatchAsync returned a wrong-cardinality outcome list, DrainOnceAsync threw InvalidOperationException after the batch may already have been delivered, causing duplicate events on re-drain or a permanent queue stall on a deterministic writer bug. - The throw is replaced with log + backoff: the mismatch is recorded into _lastError, _drainState is set to BackingOff, the backoff is bumped, and the method returns without applying any outcomes, mirroring the writer-exception path. - Regression test Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows asserts rows stay queued, DrainState = BackingOff, LastError populated, and that a fixed writer subsequently drains cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 09:27:55 -04:00
Joseph Doherty	6d520c6756	fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-005) Status fields (_lastDrainUtc, _lastSuccessUtc, _lastError, _drainState, _evictedCount) were written by the drain timer thread and read by GetStatus() / health-check threads with no memory barrier, risking torn DateTime? reads and stale DrainState observations. - Added _statusLock object; all writes to status fields now happen inside lock(_statusLock) blocks in DrainOnceAsync and DrainTimerCallback. - GetStatus() snapshots all fields atomically under the same lock so the Admin UI / /healthz endpoint always sees a consistent view. - Regression test GetStatus_snapshot_is_consistent_under_concurrent_drain drives status writes and reads from concurrent threads; asserts no throws. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 09:27:31 -04:00
Joseph Doherty	0cc3b23101	fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-003) EnqueueAsync used synchronous SQLite I/O (conn.Open / ExecuteNonQuery / COUNT(*)) on the caller's thread, blocking the alarm-emitting thread under write contention with the drain worker. The cancellationToken parameter was silently ignored. - EnqueueAsync converted to genuine async: OpenAsync / ExecuteNonQueryAsync / ExecuteScalarAsync used throughout; ct threaded to every await. - ApplyPragmasAsync added alongside the existing ApplyPragmas helper so the WAL + busy_timeout PRAGMAs are applied on the async open path too. - EnforceCapacityAsync added to handle capacity eviction on the async path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 09:23:14 -04:00
Joseph Doherty	4638366b77	fix(alarm-historian): resolve High code-review findings (Core.AlarmHistorian-002, -004, -006) Core.AlarmHistorian-002 — drain loop now honors exponential backoff: StartDrainLoop arms a self-rescheduling one-shot Timer. RescheduleDrain sets the next due-time to max(tickInterval, CurrentBackoff) while the sink is BackingOff, so a historian outage genuinely slows the cadence down the 1s->2s->5s->15s->60s ladder instead of hammering at the fixed tick. Class doc-comment updated. Core.AlarmHistorian-004 — SQLite busy handling: the connection string is built via SqliteConnectionStringBuilder with DefaultTimeout=5, and a new OpenConnection helper applies PRAGMA busy_timeout=5000 and PRAGMA journal_mode=WAL on every open. A concurrent enqueue-vs-drain file-lock collision now waits the lock out instead of failing fast with SQLITE_BUSY. All connection open sites switched to the helper. Core.AlarmHistorian-006 — drain-loop faults are no longer unobserved: the timer callback (DrainTimerCallback) awaits DrainOnceAsync inside a try/catch that logs via _logger.Error, records the message into _lastError, and sets _drainState=BackingOff so a stalled drain is visible on GetStatus; a finally always re-arms the timer. Regression tests added to SqliteStoreAndForwardSinkTests: StartDrainLoop_honors_backoff_and_slows_cadence_under_retry, StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy, StartDrainLoop_records_drain_fault_and_keeps_running, Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy. findings.md: 002/004/006 marked Resolved; open count 10 -> 7. Build: clean (0 warnings). Tests: 20/20 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 06:27:39 -04:00
Joseph Doherty	796871c210	fix(alarm-historian): keep queue rows aligned to events on drain (Core.AlarmHistorian-001) ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every row but events.Add was guarded by `if (evt is not null)`. A corrupt / null-deserializing payload desynced the lists, so DrainOnceAsync applied each outcome to the wrong RowId — an Ack could delete an un-sent event (silent alarm-event data loss) and the corrupt row stalled the queue head forever. ReadBatch now returns a single list of QueueRow(long RowId, AlarmHistorianEvent? Event) records so a rowId can never drift from its event; deserialization is wrapped to yield null on JsonException. DrainOnceAsync immediately dead-letters rows whose payload is null/un-deserializable and forwards only well-formed events to the writer, mapping outcomes by RowId. Regression tests cover a corrupt row mid-batch and at the queue head. Core.AlarmHistorian suite: 16/16 pass. Resolves code-review finding Core.AlarmHistorian-001 (Critical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 05:54:20 -04:00
Joseph Doherty	8568f5cd85	docs(code-reviews): comprehensive per-module review pass at `76d35d1` Reviewed all 31 src/ production projects against the 10-category checklist in REVIEW-PROCESS.md. Each module gets its own findings.md; code-reviews/README.md is regenerated from them. 334 findings: 6 Critical, 46 High, 126 Medium, 156 Low. Critical findings: - Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead on an unresolvable node crashes the process (remote DoS). - Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView) plus unauthenticated mutating routes. - Core.Scripting-001: System.Environment reachable from operator scripts; Environment.Exit() terminates the server. - Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt payload misapplies outcomes — silent alarm-event data loss. - Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so a transient gateway drop permanently kills the event stream. All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md section 4. Review only — no source code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 05:20:27 -04:00
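
The drain cadence described in commit 4638366b77 (next due-time of max(tickInterval, CurrentBackoff) while the sink is BackingOff, climbing the 1s->2s->5s->15s->60s ladder) can be sketched roughly as below. This is a hypothetical Python illustration, not the sink's C# code: the names DrainScheduler, record_failure, record_success and the Idle state, and the choice to reset the ladder on the first successful drain, are all assumptions. It only computes the delay the timer callback would re-arm with; the one-shot Timer itself is left out.

```python
# Hypothetical Python sketch of the self-rescheduling drain cadence (4638366b77).
# Names are illustrative; the real sink is C# and may differ.
from dataclasses import dataclass
from enum import Enum

# Backoff ladder quoted in the commit message: 1s -> 2s -> 5s -> 15s -> 60s.
BACKOFF_LADDER_S = (1.0, 2.0, 5.0, 15.0, 60.0)


class DrainState(Enum):
    IDLE = "Idle"              # assumed name for the healthy state
    BACKING_OFF = "BackingOff"


@dataclass
class DrainScheduler:
    tick_interval_s: float
    state: DrainState = DrainState.IDLE
    rung: int = -1  # index into BACKOFF_LADDER_S; -1 means no backoff yet

    def current_backoff_s(self) -> float:
        return 0.0 if self.rung < 0 else BACKOFF_LADDER_S[self.rung]

    def record_failure(self) -> None:
        # Climb one rung per failed drain, staying on the last rung once reached.
        self.rung = min(self.rung + 1, len(BACKOFF_LADDER_S) - 1)
        self.state = DrainState.BACKING_OFF

    def record_success(self) -> None:
        # Assumption: a successful drain drops straight back to the plain tick.
        self.rung = -1
        self.state = DrainState.IDLE

    def next_due_s(self) -> float:
        # Delay used to re-arm the one-shot timer after each drain attempt.
        if self.state is DrainState.BACKING_OFF:
            return max(self.tick_interval_s, self.current_backoff_s())
        return self.tick_interval_s
```

With a 2 s tick, the first two failures still re-arm at 2 s, because max() never lets the cadence run faster than the tick; only from the third consecutive failure (5 s) does an outage visibly slow the loop.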

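The drain fix in 796871c210 (one QueueRow per RowId, un-deserializable payloads dead-lettered before the writer sees them, outcomes mapped by RowId) and the cardinality guard in 5718cb5778 fit together roughly as in this sketch. It is a hypothetical Python model, not the C# implementation: read_batch, drain_once, DrainResult, the two-valued outcome ("Ack" versus anything else kept for retry), and dead-lettering corrupt rows even when the writer's result is later rejected are simplifying assumptions.

```python
# Hypothetical sketch of aligned drain (796871c210) plus the cardinality
# guard (5718cb5778). Names and outcome values are illustrative only.
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class QueueRow:
    row_id: int
    event: Optional[dict]  # None when the payload is null or not valid JSON


def read_batch(raw_rows: list[tuple[int, Optional[str]]]) -> list[QueueRow]:
    """Turn (RowId, PayloadJson) pairs into exactly one record per row."""
    rows: list[QueueRow] = []
    for row_id, payload in raw_rows:
        try:
            event = json.loads(payload) if payload is not None else None
        except json.JSONDecodeError:
            event = None  # a bad payload yields None instead of an exception
        # The RowId travels with its own event, so the two cannot drift apart.
        rows.append(QueueRow(row_id, event))
    return rows


@dataclass
class DrainResult:
    acked: list[int] = field(default_factory=list)          # delete from queue
    dead_lettered: list[int] = field(default_factory=list)  # move to dead-letter
    kept: list[int] = field(default_factory=list)           # stay queued
    backing_off: bool = False
    last_error: Optional[str] = None


def drain_once(rows: list[QueueRow],
               write_batch: Callable[[list[dict]], list[str]]) -> DrainResult:
    result = DrainResult()
    # Corrupt rows are dead-lettered up front and never reach the writer.
    result.dead_lettered = [r.row_id for r in rows if r.event is None]
    good = [r for r in rows if r.event is not None]
    if not good:
        return result
    outcomes = write_batch([r.event for r in good])
    if len(outcomes) != len(good):
        # Wrong cardinality: apply no outcomes, keep every good row, back off.
        result.kept = [r.row_id for r in good]
        result.backing_off = True
        result.last_error = (f"writer returned {len(outcomes)} outcomes "
                             f"for {len(good)} events")
        return result
    # Each outcome is applied to the RowId of the event that produced it.
    for row, outcome in zip(good, outcomes):
        if outcome == "Ack":
            result.acked.append(row.row_id)
        else:
            result.kept.append(row.row_id)
    return result
```

Because each outcome is paired with the RowId of its own event, a corrupt row can no longer shift an Ack onto an unsent event, and a writer that returns the wrong number of outcomes leaves every good row queued rather than guessing which outcome belongs where.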
9 Commits
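
The busy-handling part of 4638366b77 (a 5 s connection timeout plus PRAGMA busy_timeout=5000 and PRAGMA journal_mode=WAL applied on every open) has a close analogue in Python's standard sqlite3 module, sketched below. The open_connection name and the decision to raise when WAL cannot be enabled are assumptions; the real helper is C# on Microsoft.Data.Sqlite, where the timeout comes from SqliteConnectionStringBuilder.DefaultTimeout rather than a connect argument.

```python
# Hypothetical Python analogue of the OpenConnection helper (4638366b77).
import sqlite3


def open_connection(path: str) -> sqlite3.Connection:
    # timeout=5.0: wait up to 5 s on a locked database instead of failing
    # immediately with "database is locked" (SQLITE_BUSY).
    conn = sqlite3.connect(path, timeout=5.0)
    # The same wait expressed as SQLite's own busy handler, in milliseconds,
    # mirroring the PRAGMA named in the commit.
    conn.execute("PRAGMA busy_timeout=5000")
    # WAL lets readers proceed while a single writer holds the write lock.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if str(mode).lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    return conn
```

WAL mode is persistent in the database file, but busy_timeout is per connection, which is why the commit applies both PRAGMAs on every open rather than once at creation.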