Commit Graph

11 Commits

Author SHA1 Message Date
Joseph Doherty
23d59d73f2 fix(scripting+alarms): close remaining re-review findings
Single commit covering the four small/medium fixes from the updated
code review.

Core.Scripting-014 (Medium, Concurrency):
  CompiledScriptCache.Clear() used the key-only TryRemove(key, out var
  lazy) overload — same race shape Core.Scripting-006 closed in
  GetOrCompile's catch block. A concurrent re-add between snapshot and
  TryRemove was evicted + disposed while the new caller still held it.
  Replaced with the value-scoped TryRemove(KeyValuePair<,>) overload.
  Regression test
  Clear_uses_value_scoped_TryRemove_so_a_race_inserted_entry_survives
  added.

Core.Scripting-013 (Medium, Security):
  Hand-rolled BuildWrapperSource pastes user source between literal
  braces; brace-balanced source could inject sibling methods/classes
  alongside CompiledScript.Run. Analyzer still walked the injected
  members so it wasn't a direct escape, but it relaxed the documented
  'method body' authoring contract. Added EnforceSingleRunMember:
  after ParseText, the compilation unit must hold exactly one type
  (CompiledScript) and that type must hold exactly one member (the Run
  method). Any deviation throws CompilationErrorException with LMX001/
  LMX002 diagnostic IDs and a Core.Scripting-013 reference in the
  message. Two regression tests added covering the sibling-method and
  sibling-class injection vectors.

Core.Scripting-015 (Low, Correctness, latent):
  ToCSharpTypeName's generic branch truncated at the first backtick via
  IndexOf, silently dropping closed args of nested-generic shapes
  (Outer<T>.Inner<U>). No production caller exercises this shape today
  (all TContext/TResult are top-level non-nested), so the bug was
  latent. Rewrote the generic branch to walk the FullName segment-by-
  segment, consuming generic args per segment so nested shapes emit
  valid C# (global::Ns.Outer<T>.Inner<U> rather than the broken
  Outer<T,U>).

Core.ScriptedAlarms-013 (Low, Documentation):
  The internal test accessors TryGetScratchReadCacheForTest /
  TryGetScratchContextForTest return live mutable scratch refilled in
  place under _evalGate. XML docs didn't warn future test authors about
  the synchronization contract. Added a <remarks> block to each
  documenting the only-safe-on-quiesced-engine + identity-or-single-key
  contract.

Verification (suites green):
  Core.Scripting.Tests: 110/110 (was 107 — +3 new rejection/race tests)
  Core.ScriptedAlarms.Tests: 67/67 (unchanged — doc-only fix)
  Core.VirtualTags.Tests: 57/57 (unchanged)

After this commit, all 12 findings from the updated re-review are
closed (10 Resolved, 1 Won't Fix none, 1 Deferred — Driver.Galaxy-017).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:00:59 -04:00
Joseph Doherty
fb7c6c7046 fix(scripting): route engines through CompiledScriptCache (Core.Scripting-016)
Both VirtualTagEngine.Load and ScriptedAlarmEngine.LoadAsync were calling
ScriptEvaluator.Compile directly, bypassing CompiledScriptCache. The
Core.Scripting-008 collectible-ALC fix wired Dispose only through the cache's
Clear()/Dispose(), so the per-publish accretion the -008 fix was meant to
eliminate was still in effect on the actual production path — the headline
'no more restarts needed' guarantee wasn't delivered.

Resolution:
  - VirtualTagEngine + ScriptedAlarmEngine each gained a private
    CompiledScriptCache<TContext, TResult> instance.
  - Both Load methods now call _compileCache.GetOrCompile(source).
  - Publish-replace path: _compileCache.Clear() runs alongside the existing
    _tags / _alarms clears so the prior generation's ALCs are disposed
    before recompile.
  - Engine Dispose now calls _compileCache.Dispose() so shutdown actually
    releases the emitted assemblies.

Side-fix in CompiledScriptCache: Dispose() set _disposed=true then called
Clear(), but Clear() had a pre-existing 'if (_disposed) return' guard that
aborted the drain unconditionally — making the Dispose-triggered cleanup a
silent no-op. Removed the disposed-guard on Clear() (clearing an empty/
cleared cache is idempotent).

Side-fix in ScriptedAlarmEngine.Dispose: cleared _alarms AFTER the
Task.WhenAll drain. The drain guarantees no background callback is mid-
flight, so clearing is safe. Previously _alarms was deliberately NOT
cleared on Dispose (per Core.ScriptedAlarms-005), but that left the
AlarmState records holding TimedScriptEvaluator → ScriptEvaluator → delegate
references that rooted the emitted assemblies, defeating the cache's
Dispose work on the engine side.

Regression tests:
  - VirtualTagEngineTests.Dispose_unloads_compiled_script_assembly
  - ScriptedAlarmEngineTests.Dispose_unloads_compiled_predicate_assembly
  Both use WeakReference + bounded GC.Collect() to prove the emitted
  assembly is reclaimable after engine.Dispose(). The alarms test had to
  be synchronous (not 'async Task<WeakReference>') because async state
  machines capture locals as state-struct fields, keeping them alive past
  the method's apparent end and defeating GC.

Verification:
  - Core.Scripting.Tests: 104/104 (unchanged).
  - VirtualTags.Tests: 57/57 (was 56 — +1 unload test).
  - ScriptedAlarms.Tests: 67/67 (was 66 — +1 unload test).
  - All other consumer suites still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 17:33:34 -04:00
Joseph Doherty
0001cdd579 fix(scripted-alarms): reuse per-alarm evaluation scratch on the hot path
Core.ScriptedAlarms-009 resolution: replace the per-call Dictionary +
AlarmPredicateContext allocation with a per-alarm reusable AlarmScratch
held in _scratchByAlarmId, refilled in place under _evalGate on each
evaluation. The hot path no longer allocates per upstream tag change.

Why this matters:
  On a busy line where many tags feeding many alarms change frequently,
  the old BuildReadCache allocated a fresh dictionary + context on every
  predicate evaluation — a steady stream of short-lived allocations the
  GC eventually has to reclaim. With the reuse, the dictionary and
  context are allocated once per alarm (on first evaluation) and refilled
  in place across every subsequent re-eval.

Implementation:
  - New private AlarmScratch class holds the reusable
    Dictionary<string, DataValueSnapshot> read cache (pre-sized to the
    alarm's Inputs.Count) and the AlarmPredicateContext that wraps it by
    reference. The context observes refilled values without being
    re-created.
  - ConcurrentDictionary<string, AlarmScratch> _scratchByAlarmId on the
    engine, cleared in LoadAsync alongside _alarms so a config-publish
    drops the prior generation's scratch (Inputs / Logger may change).
  - EvaluatePredicateToStateAsync looks up scratch via GetOrAdd, calls
    the new RefillReadCache(Dictionary, IReadOnlySet) helper to clear +
    repopulate the dictionary in place, then runs the predicate against
    the reused context.
  - BuildReadCache removed.

Safety:
  Reuse is serialised under _evalGate which guarantees no two threads
  ever observe the same scratch in a half-refilled state. The
  AlarmPredicateContext is bound to the scratch dictionary by reference,
  so the predicate's ctx.GetTag(path) sees the freshly-refilled values
  rather than a stale snapshot.

Verification:
  - All 66 ScriptedAlarms tests pass (was 63 — three new regression tests
    locking the reuse contract).
  - All 56 VirtualTags tests still pass (unchanged).
  - All 104 Core.Scripting tests still pass (unchanged).

New tests in ScriptedAlarmEngineTests:
  - Reevaluation_reuses_the_same_read_cache_dictionary — asserts
    ReferenceEquals(scratch_before, scratch_after) across two
    evaluations of the same alarm.
  - Reevaluation_reuses_the_same_predicate_context — same, for the
    context.
  - LoadAsync_drops_the_prior_generations_scratch — asserts a config
    publish wipes the prior scratch (so a stale Logger / Inputs can't
    leak into the new generation).

Internal test hooks TryGetScratchReadCacheForTest /
TryGetScratchContextForTest added via the existing
InternalsVisibleTo for the tests project. Kept internal — not part of
the public engine surface.

Docs:
  - docs/v2/Galaxy.Performance.md "Scripted-alarm engine" section
    rewritten as "hot-path allocation reuse" documenting the new
    contract + reuse safety reasoning + the three regression tests.
  - code-reviews/Core.ScriptedAlarms/findings.md -009 flipped
    Won't Fix → Resolved.
  - code-reviews/README.md regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:10:09 -04:00
Joseph Doherty
99354bfaf2 fix(core-scripted-alarms): resolve Low code-review findings (Core.ScriptedAlarms-003,006,008,010,011; -009 documented)
- Core.ScriptedAlarms-003: emit OnEvent OUTSIDE _evalGate by collecting
  pending emissions during the gate-held section and flushing them after
  release; eliminates re-entrancy deadlock the docs already promised.
- Core.ScriptedAlarms-006: track every fire-and-forget Reevaluate /
  ShelvingCheck task in _inFlight; Dispose drains the set so the engine
  no longer races store writes against teardown.
- Core.ScriptedAlarms-008: store comments as ImmutableList<AlarmComment>
  so AppendComment is O(log n) instead of O(n).
- Core.ScriptedAlarms-010: document the deliberate input-quality
  asymmetry (Uncertain drives the predicate, renders {?} in the message)
  in docs/ScriptedAlarms.md and on MessageTemplate.Resolve remarks.
- Core.ScriptedAlarms-011: propagate the no-op reason through
  TransitionResult.NoOp(state, reason) and log it from
  ScriptedAlarmEngine.ApplyAsync.
- Core.ScriptedAlarms-009 (Won't Fix per recommendation): documented the
  per-evaluation dictionary allocation in docs/v2/Galaxy.Performance.md
  with a mitigation path if a future soak surfaces pressure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:23:31 -04:00
Joseph Doherty
69994f9cf6 fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-012)
Add engine-level tests covering the six gaps identified in the finding:
(1) timed-shelve auto-expiry driven via injectable clock + RunShelvingCheckForTest
    hook so timer tests are deterministic;
(2) ConfirmAsync, TimedShelveAsync/UnshelveAsync round-trip, EnableAsync engine
    methods exercised end-to-end;
(3) OnEvent subscriber-throws isolation — engine state advances and stays
    operational after a subscriber throws;
(4) IAlarmStateStore.SaveAsync failure leaves in-memory state unchanged (locks in
    the persist-before-update invariant from finding-007);
(5) second LoadAsync does not leak the old timer (regression for finding-002);
(6) AreInputsReady cold-start guard correctly blocks on Bad/missing inputs and
    allows Uncertain-quality inputs through.

Expose RunShelvingCheckForTest() internal method on ScriptedAlarmEngine to
support deterministic timer tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:19 -04:00
Joseph Doherty
a0b3a4c8a7 fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-007)
Reorder persist/update in ApplyAsync, ReevaluateAsync, and ShelvingCheckAsync:
SaveAsync is now called before the in-memory _alarms entry is advanced. A store
failure therefore leaves both the persisted and in-memory views at the prior state
rather than diverging, maintaining the invariant that startup recovery reflects
actual persisted state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:18:42 -04:00
Joseph Doherty
cdaa0da45d fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-005)
Add _disposed re-checks inside ReevaluateAsync and ShelvingCheckAsync after
acquiring _evalGate so callbacks in flight when Dispose() runs bail out cleanly
instead of mutating _alarms or writing to a disposed store. Drop the
_alarms.Clear() from Dispose() — clearing outside the gate races concurrent
reads and is unnecessary since the object is being discarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:17:50 -04:00
Joseph Doherty
72b5f7a20c fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-004)
Split the LoadAsync seed-read + subscribe loop: ReadTag seed fills _valueCache
first, then persisted-state restore runs, then _loaded = true, then SubscribeTag
is called. Any synchronous initial push from the upstream now arrives after
_alarms is fully initialised and _loaded = true, so ReevaluateAsync will queue
correctly behind the gate rather than racing the half-built state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:16:44 -04:00
Joseph Doherty
b75542bbac fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-002)
Dispose any existing _shelvingTimer before reassigning it inside LoadAsync so
that a second LoadAsync call does not leak the old timer and leave two timers
running concurrently against the same engine state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:15:58 -04:00
Joseph Doherty
e3f8fa535a fix(scripted-alarms): resolve High code-review finding (Core.ScriptedAlarms-001)
_alarms was a plain Dictionary<string, AlarmState> mutated under the
_evalGate semaphore, but four read paths (GetState, GetAllStates, the
LoadedAlarmIds property, and RunShelvingCheck) touched it from arbitrary
threads with no synchronisation. A Dictionary read concurrent with a
writer's entry reassignment can throw InvalidOperationException or return
torn state.

Switched _alarms to ConcurrentDictionary<string, AlarmState>. The only
write shapes are indexer-set and Clear, both atomic on ConcurrentDictionary,
so all mutations stay correct without further change; reads now get safe
snapshot semantics. LoadedAlarmIds materialises the key snapshot to keep
its IReadOnlyCollection<string> return type. This matches _valueCache,
which is already a ConcurrentDictionary.

Added a regression test (Concurrent_reads_during_mutation_do_not_throw)
that hammers the engine with state mutations while four reader threads
continuously call the three unguarded read paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:40 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00