lmxopcua

Author	SHA1	Message	Date
Joseph Doherty	0001cdd579	fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary + AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch held in _scratchByAlarmId, refilled in place under _evalGate on each evaluation. The hot path no longer allocates per upstream tag change. Why this matters: On a busy line where many tags feeding many alarms change frequently, the old BuildReadCache allocated a fresh dictionary + context on every predicate evaluation — a steady stream of short-lived allocations the GC eventually has to reclaim. With the reuse, the dictionary and context are allocated once per alarm (on first evaluation) and refilled in place across every subsequent re-eval. Implementation: - New private AlarmScratch class holds the reusable Dictionary<string, DataValueSnapshot> read cache (pre-sized to the alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by reference. The context observes refilled values without being re-created. - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the engine, cleared in LoadAsync alongside _alarms so a config-publish drops the prior generation's scratch (Inputs / Logger may change). - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear + repopulate the dictionary in place, then runs the predicate against the reused context. - BuildReadCache removed. Safety: Reuse is serialised under _evalGate which guarantees no two threads ever observe the same scratch in a half-refilled state. The AlarmPredicateContext is bound to the scratch dictionary by reference, so the predicate's ctx.GetTag(path) sees the freshly-refilled values rather than a stale snapshot. Verification: - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests locking the reuse contract). - All 56 VirtualTags tests still pass (unchanged). - All 104 Core.Scripting tests still pass (unchanged). New tests in ScriptedAlarmEngineTests: - Reevaluation_reuses_the_same_read_cache_dictionary — asserts ReferenceEquals(scratch_before, scratch_after) across two evaluations of the same alarm. - Reevaluation_reuses_the_same_predicate_context — same, for the context. - LoadAsync_drops_the_prior_generations_scratch — asserts a config publish wipes the prior scratch (so a stale Logger / Inputs can't leak into the new generation). Internal test hooks TryGetScratchReadCacheForTest / TryGetScratchContextForTest added via the existing InternalsVisibleTo for the tests project. Kept internal — not part of the public engine surface. Docs: - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section rewritten as "hot-path allocation reuse" documenting the new contract + reuse safety reasoning + the three regression tests. - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped Won't Fix → Resolved. - code-reviews/README.md regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 16:10:09 -04:00
Joseph Doherty	99354bfaf2	fix(core-scripted-alarms): resolve Low code-review findings (Core.ScriptedAlarms-003,006,008,010,011; -009 documented) - Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting pending emissions during the gate-held section and flushing them after release; eliminates re-entrancy deadlock the docs already promised. - Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate / ShelvingCheck task in _inFlight; Dispose drains the set so the engine no longer races store writes against teardown. - Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment> so AppendComment is O(log n) instead of O(n). - Core.ScriptedAlarms-010: document the deliberate input-quality asymmetry (Uncertain drives the predicate, renders {?} in the message) in docs/ScriptedAlarms.md and on MessageTemplate.Resolve remarks. - Core.ScriptedAlarms-011: propagate the no-op reason through TransitionResult.NoOp(state, reason) and log it from ScriptedAlarmEngine.ApplyAsync. - Core.ScriptedAlarms-009 (Won't Fix per recommendation): documented the per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md with a mitigation path if a future soak surfaces pressure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 07:23:31 -04:00
Joseph Doherty	4dcfaace62	fix(scripted-alarms): update findings.md for resolved Medium findings Mark Core.ScriptedAlarms-002, -004, -005, -007, -012 as Resolved with one-line descriptions. Update open-findings count from 11 to 6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 08:24:54 -04:00
Joseph Doherty	e3f8fa535a	fix(scripted-alarms): resolve High code-review finding (Core.ScriptedAlarms-001) _alarms was a plain Dictionary<string, AlarmState> mutated under the _evalGate semaphore, but four read paths (GetState, GetAllStates, the LoadedAlarmIds property, and RunShelvingCheck) touched it from arbitrary threads with no synchronisation. A Dictionary read concurrent with a writer's entry reassignment can throw InvalidOperationException or return torn state. Switched _alarms to ConcurrentDictionary<string, AlarmState>. The only write shapes are indexer-set and Clear, both atomic on ConcurrentDictionary, so all mutations stay correct without further change; reads now get safe snapshot semantics. LoadedAlarmIds materialises the key snapshot to keep its IReadOnlyCollection<string> return type. This matches _valueCache, which is already a ConcurrentDictionary. Added a regression test (Concurrent_reads_during_mutation_do_not_throw) that hammers the engine with state mutations while four reader threads continuously call the three unguarded read paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 06:27:40 -04:00
Joseph Doherty	8568f5cd85	docs(code-reviews): comprehensive per-module review pass at `76d35d1` Reviewed all 31 src/ production projects against the 10-category checklist in REVIEW-PROCESS.md. Each module gets its own findings.md; code-reviews/README.md is regenerated from them. 334 findings: 6 Critical, 46 High, 126 Medium, 156 Low. Critical findings: - Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead on an unresolvable node crashes the process (remote DoS). - Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView) plus unauthenticated mutating routes. - Core.Scripting-001: System.Environment reachable from operator scripts; Environment.Exit() terminates the server. - Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt payload misapplies outcomes — silent alarm-event data loss. - Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so a transient gateway drop permanently kills the event stream. All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md section 4. Review only — no source code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 05:20:27 -04:00

5 Commits
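The scratch-reuse contract described in commit 0001cdd579 can be sketched in Python roughly as follows. This is a hypothetical analogue, not the engine's C# code. `AlarmEngine`, `AlarmScratch`, `PredicateContext`, `refill_read_cache`, and `scratch_for_test` are illustrative names. A `threading.Lock` stands in for `_evalGate`. Because every evaluation already holds the gate, a plain dict with an explicit first-use check plays the role of `ConcurrentDictionary.GetOrAdd`.

```python
import threading


class PredicateContext:
    """Hypothetical stand-in for AlarmPredicateContext.

    It holds the read cache by reference, so refilling the dict in place is
    visible through get_tag() without re-creating the context.
    """

    def __init__(self, read_cache):
        self._read_cache = read_cache  # same dict object, never copied

    def get_tag(self, path):
        return self._read_cache.get(path)


class AlarmScratch:
    """Per-alarm reusable state: one dict and one context, created once."""

    def __init__(self):
        self.read_cache = {}
        self.context = PredicateContext(self.read_cache)


def refill_read_cache(read_cache, inputs, value_cache):
    """Clear and repopulate read_cache in place; no new dict is allocated."""
    read_cache.clear()
    for path in inputs:
        read_cache[path] = value_cache.get(path)


class AlarmEngine:
    """Hypothetical engine showing the scratch-reuse pattern, not the real API."""

    def __init__(self):
        self._eval_gate = threading.Lock()  # serialises every evaluation
        self._value_cache = {}              # latest upstream tag values
        self._scratch_by_alarm_id = {}

    def set_tag(self, path, value):
        self._value_cache[path] = value

    def load(self):
        # A config publish may change Inputs, so drop the prior generation's scratch.
        with self._eval_gate:
            self._scratch_by_alarm_id.clear()

    def evaluate(self, alarm_id, inputs, predicate):
        with self._eval_gate:
            scratch = self._scratch_by_alarm_id.get(alarm_id)
            if scratch is None:
                # Allocated only on the first evaluation of this alarm.
                scratch = AlarmScratch()
                self._scratch_by_alarm_id[alarm_id] = scratch
            refill_read_cache(scratch.read_cache, inputs, self._value_cache)
            return predicate(scratch.context)

    def scratch_for_test(self, alarm_id):
        # Test hook, analogous in spirit to TryGetScratchReadCacheForTest.
        return self._scratch_by_alarm_id.get(alarm_id)
```

The explicit `get` / `is None` check is deliberate: `dict.setdefault(alarm_id, AlarmScratch())` would construct a throwaway `AlarmScratch` on every call, because Python evaluates the argument eagerly, and that would reintroduce the per-evaluation allocation the commit removes. Python dicts cannot be pre-sized, so the sketch omits the `Inputs.Count` pre-sizing.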
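Core.ScriptedAlarms-003 in commit 99354bfaf2 collects pending emissions while `_evalGate` is held and raises `OnEvent` only after the gate is released. A minimal Python sketch of that collect-then-flush shape follows. It assumes a non-reentrant `threading.Lock` as the gate, and `apply`, `get_state`, and `on_event` are hypothetical names. It shows why the ordering matters: a subscriber that calls back into the engine would deadlock if the event were raised inside the gate.

```python
import threading


class AlarmEngine:
    """Hypothetical engine illustrating emit-outside-the-gate; not the real API."""

    def __init__(self):
        self._eval_gate = threading.Lock()  # non-reentrant, like a one-permit semaphore
        self._active = {}
        self.on_event = []  # subscriber callables

    def get_state(self, alarm_id):
        with self._eval_gate:
            return self._active.get(alarm_id, False)

    def apply(self, alarm_id, active):
        pending = []
        with self._eval_gate:
            previous = self._active.get(alarm_id, False)
            self._active[alarm_id] = active
            if previous != active:
                # Record the emission instead of raising it while the gate is held.
                pending.append((alarm_id, active))
        # Gate released: handlers may safely re-enter the engine.
        for event in pending:
            for handler in list(self.on_event):
                handler(event)
```

A handler that calls `get_state` from inside `OnEvent` works here. If the `for event in pending` loop were moved inside the `with` block, the same handler would block forever on the lock.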
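Core.ScriptedAlarms-006 tracks every fire-and-forget Reevaluate / ShelvingCheck task in `_inFlight` and has Dispose drain the set before teardown. Below is an asyncio sketch of the same idea. `AlarmEngine`, `schedule_reevaluate`, `_reevaluate`, and `dispose` are hypothetical names, and `asyncio.Task` stands in for a .NET `Task`. Holding each task in a set also keeps a strong reference to it. The asyncio documentation recommends this so that a task is not garbage-collected mid-execution.

```python
import asyncio


class AlarmEngine:
    """Hypothetical engine illustrating in-flight task tracking; not the real API."""

    def __init__(self):
        self._in_flight = set()
        self.store_writes = []

    def schedule_reevaluate(self, alarm_id):
        task = asyncio.get_running_loop().create_task(self._reevaluate(alarm_id))
        self._in_flight.add(task)
        # Remove the task from the set as soon as it finishes.
        task.add_done_callback(self._in_flight.discard)
        return task

    async def _reevaluate(self, alarm_id):
        await asyncio.sleep(0.01)  # stands in for the predicate run + store write
        self.store_writes.append(alarm_id)

    async def dispose(self):
        # Loop in case a task being drained scheduled more work.
        while self._in_flight:
            await asyncio.gather(*self._in_flight, return_exceptions=True)
```

`return_exceptions=True` keeps one failing task from aborting the drain, so Dispose still waits for the rest before the store is torn down.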
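Core.ScriptedAlarms-011 propagates the no-op reason through `TransitionResult.NoOp(state, reason)` so that `ScriptedAlarmEngine.ApplyAsync` can log why a transition was ignored. The following Python sketch illustrates that result shape. The `acknowledge` transition, the state names, and the logger name are hypothetical and are not taken from the engine.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("scripted_alarms")


@dataclass(frozen=True)
class TransitionResult:
    state: str
    changed: bool
    no_op_reason: Optional[str] = None

    @classmethod
    def no_op(cls, state, reason):
        # Unchanged state plus a human-readable reason for the caller to log.
        return cls(state, False, reason)


def acknowledge(state):
    # Hypothetical transition: only an active alarm can be acknowledged.
    if state != "Active":
        return TransitionResult.no_op(state, f"cannot acknowledge from {state}")
    return TransitionResult("Acknowledged", True)


def apply(state, transition):
    result = transition(state)
    if not result.changed:
        log.info("transition ignored: %s", result.no_op_reason)
    return result
```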