lmxopcua/docs/ScriptedAlarms.md
Joseph Doherty 7b6ab2ec6f fix(scripting): unload compiled-script assemblies via collectible ALC
Core.Scripting-008 resolution: replace the legacy CSharpScript.CreateDelegate
path with hand-rolled CSharpCompilation + Emit + collectible AssemblyLoadContext,
so per-publish compile accretion no longer requires a server restart to reclaim.

Why this was needed:
  Roslyn's CSharpScript path emits dynamically-compiled script assemblies into
  the default AssemblyLoadContext, which is non-collectible. Across config-
  publish generations each Clear() drops dictionary entries but the emitted
  assemblies stay loaded for process lifetime, so memory grows steadily on
  long-running servers with frequent publishes. The accepted-limitation note
  in docs/VirtualTags.md recommended scheduled restarts as the workaround;
  operator feedback was that restarts are difficult, so the underlying
  limitation was the right thing to fix.

Implementation:
  - New ScriptAssemblyLoadContext(name, isCollectible: true) hosts one emitted
    script assembly per evaluator.
  - ScriptEvaluator.Compile synthesises a wrapper class around the user source
    (CompiledScript.Run(globals) — explicit return required per ordinary C#
    semantics, which every existing script already uses), builds a
    CSharpCompilation against the sandbox references, runs the
    ForbiddenTypeAnalyzer over the semantic model unchanged, emits to an
    in-memory PE stream, loads via ScriptAssemblyLoadContext.LoadFromStream,
    and binds a strongly-typed Func<ScriptGlobals<TContext>, TResult> delegate
    via reflection.
  - ScriptEvaluator now implements IDisposable — Dispose calls
    AssemblyLoadContext.Unload(), which makes the emitted assembly eligible
    for GC at the next collection cycle.
  - CompiledScriptCache.Clear() disposes every materialised evaluator before
    dropping its dictionary entry; CompiledScriptCache itself is now
    IDisposable for graceful server shutdown.
  - ScriptSandbox.Build returns a new SandboxConfig (References + Imports)
    instead of a Roslyn ScriptOptions; references now span BCL via the
    TRUSTED_PLATFORM_ASSEMBLIES set filtered to System.* + netstandard +
    Microsoft.Win32.Registry, so forbidden BCL types resolve at compile and
    ForbiddenTypeAnalyzer is the sole security gate (consistent with the
    Core.Scripting-001 / -002 model — references-list-only restriction is
    porous against type forwarding, so the analyzer must be the real gate).

Verification:
  - All 104 Core.Scripting tests pass (was 101 — three new regression tests
    locking the unload contract).
  - All 56 VirtualTags tests pass (unchanged).
  - All 63 ScriptedAlarms tests pass (unchanged).
  - New CompiledScriptCacheTests:
    - Dispose_unloads_compiled_script_assembly_load_context — proves single-
      evaluator ALC unload via WeakReference + bounded GC.Collect() loop.
    - Clear_disposes_every_materialised_evaluator — proves publish-replace
      releases every prior generation's ALC.
    - GetOrCompile_after_Dispose_throws_ObjectDisposedException — locks the
      post-dispose contract.

Docs:
  - docs/VirtualTags.md "Compile cache" section rewritten: the accepted-
    limitation note replaced with the unload contract + the new authoring
    convention (explicit return).
  - docs/ScriptedAlarms.md cross-reference updated to drop the obsolete
    restart guidance.
  - code-reviews/Core.Scripting/findings.md Core.Scripting-008 flipped
    Won't Fix → Resolved with the implementation summary.
  - code-reviews/README.md regenerated.

Pre-existing breakage note: Driver.Galaxy fails the solution-wide build on
master because its ProjectReference to the sibling mxaccessgw repo's
MxGateway.Client targets a path that the sibling repo no longer has after a
recent restructuring. This is unrelated to Core.Scripting-008 and was
verified to exist on master before this branch was cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 15:55:04 -04:00


Scripted Alarms

Core.ScriptedAlarms is the Phase 7 subsystem that raises OPC UA Part 9 alarms from operator-authored C# predicates rather than from driver-native alarm streams. Scripted alarms are additive: Galaxy, AB CIP, FOCAS, and OPC UA Client drivers keep their native IAlarmSource implementations unchanged, and a ScriptedAlarmSource simply registers as another source in the same fan-out. Predicates read tags from any source (driver tags or virtual tags) through the shared ITagUpstreamSource and emit condition transitions through the engine's Part 9 state machine.

This file covers the engine internals — predicate evaluation, state machine, persistence, and the engine-to-IAlarmSource adapter. The server-side plumbing that turns those emissions into OPC UA AlarmConditionState nodes, applies retries, persists alarm transitions to the Historian, and routes operator acks through the session's AlarmAck permission lives in AlarmTracking.md and is not repeated here.

Definition shape

ScriptedAlarmDefinition (src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmDefinition.cs) is the runtime contract the engine consumes. The generation-publish path materialises these from the ScriptedAlarm + Script config tables via Phase7EngineComposer.ProjectScriptedAlarms.

| Field | Notes |
| --- | --- |
| AlarmId | Stable identity. Also the OPC UA ConditionId and the key in IAlarmStateStore. Convention: {EquipmentPath}::{AlarmName}. |
| EquipmentPath | UNS path the alarm hangs under in the address space. ACL scope inherits from the equipment node. |
| AlarmName | Browse-tree display name. |
| Kind | AlarmKind: AlarmCondition, LimitAlarm, DiscreteAlarm, or OffNormalAlarm. Controls only the OPC UA ObjectType the node surfaces as; the internal state machine is identical for all four. |
| Severity | AlarmSeverity enum (Low / Medium / High / Critical). Static per decision #13: the predicate does not compute severity. The DB column is an OPC UA Part 9 1..1000 integer; Phase7EngineComposer.MapSeverity bands it into the four-value enum. |
| MessageTemplate | String with {TagPath} placeholders, resolved at emission time. See below. |
| PredicateScriptSource | Roslyn C# script returning bool. true = condition active; false = cleared. |
| HistorizeToAveva | When true, every emission is enqueued to IAlarmHistorianSink. Default true. Galaxy-native alarms default false since Galaxy historises them directly. |
| Retain | Part 9 retain flag: keeps the condition visible after clear while un-acked/un-confirmed transitions remain. Default true. |
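To make the Severity row concrete, here is a minimal sketch of how a 1..1000 Part 9 severity could be banded into the four-value enum. The band edges (250 / 500 / 750), the out-of-range behaviour, and the names SeverityBands and Map are assumptions for illustration; the actual boundaries used by Phase7EngineComposer.MapSeverity are not stated in this document.

```csharp
using System;

// Hypothetical banding; the real band edges live in Phase7EngineComposer.MapSeverity.
Console.WriteLine(SeverityBands.Map(120));
Console.WriteLine(SeverityBands.Map(800));

enum AlarmSeverity { Low, Medium, High, Critical }

static class SeverityBands
{
    // Maps an OPC UA Part 9 severity (1..1000) onto four bands.
    // The 250 / 500 / 750 edges are assumed, not taken from the source.
    public static AlarmSeverity Map(int part9Severity) => part9Severity switch
    {
        < 1 or > 1000 => throw new ArgumentOutOfRangeException(nameof(part9Severity)),
        <= 250 => AlarmSeverity.Low,
        <= 500 => AlarmSeverity.Medium,
        <= 750 => AlarmSeverity.High,
        _ => AlarmSeverity.Critical,
    };
}
```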

Illustrative definition:

new ScriptedAlarmDefinition(
    AlarmId:       "Plant/Line1/Oven::OverTemp",
    EquipmentPath: "Plant/Line1/Oven",
    AlarmName:     "OverTemp",
    Kind:          AlarmKind.LimitAlarm,
    Severity:      AlarmSeverity.High,
    MessageTemplate: "Oven {Plant/Line1/Oven/Temp} exceeds limit {Plant/Line1/Oven/TempLimit}",
    PredicateScriptSource: "return GetTag(\"Plant/Line1/Oven/Temp\").AsDouble() > GetTag(\"Plant/Line1/Oven/TempLimit\").AsDouble();");

Predicate evaluation

Alarm predicates reuse the same Roslyn sandbox as virtual tags — ScriptEvaluator<AlarmPredicateContext, bool> compiles the source, TimedScriptEvaluator wraps it with the configured timeout (default from TimedScriptEvaluator.DefaultTimeout), and DependencyExtractor statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in VirtualTags.md; ScriptedAlarms does not redefine them. The known resource limits — unbounded script-side memory and the orphan-thread CPU-budget caveat — are documented in that file as well; per-publish assembly accretion was resolved by the Core.Scripting-008 collectible-AssemblyLoadContext rewrite and no longer requires periodic server restarts.

AlarmPredicateContext (AlarmPredicateContext.cs) is the script's ScriptContext subclass:

  • GetTag(path) returns a DataValueSnapshot from the engine-maintained read cache. Missing path → DataValueSnapshot(null, 0x80340000u, null, now) (BadNodeIdUnknown). An empty path returns the same.
  • SetVirtualTag(path, value) throws InvalidOperationException. Predicates must be side-effect free per plan decision #6; writes would couple alarm state to virtual-tag state in ways that are near-impossible to reason about. Operators see the rejection in scripts-*.log.
  • Now and Logger are provided by the engine.
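The read-only contract above can be sketched as follows. This is an illustration under assumptions, not the real AlarmPredicateContext: SnapshotSketch and PredicateContextSketch are hypothetical stand-ins, the snapshot is reduced to a value plus status code, and the exception message is invented.

```csharp
using System;
using System.Collections.Generic;

var ctx = new PredicateContextSketch(new Dictionary<string, SnapshotSketch>
{
    ["Plant/Line1/Oven/Temp"] = new SnapshotSketch(412.5, 0u),
});

var known = ctx.GetTag("Plant/Line1/Oven/Temp");
var typo = ctx.GetTag("Plant/Line1/Oven/Typo");
Console.WriteLine(known.Value);
Console.WriteLine(typo.StatusCode.ToString("X8"));

try { ctx.SetVirtualTag("Plant/Line1/Oven/Flag", true); }
catch (InvalidOperationException ex) { Console.WriteLine(ex.Message); }

// Simplified stand-in for DataValueSnapshot (timestamps omitted).
record SnapshotSketch(object Value, uint StatusCode);

sealed class PredicateContextSketch
{
    private const uint BadNodeIdUnknown = 0x80340000u;
    private readonly IReadOnlyDictionary<string, SnapshotSketch> _cache;

    public PredicateContextSketch(IReadOnlyDictionary<string, SnapshotSketch> cache) => _cache = cache;

    // Missing or empty paths yield a Bad snapshot instead of throwing.
    public SnapshotSketch GetTag(string path)
    {
        if (!string.IsNullOrEmpty(path) && _cache.TryGetValue(path, out var snap)) return snap;
        return new SnapshotSketch(null, BadNodeIdUnknown);
    }

    // Predicates must stay side-effect free, so writes are rejected outright.
    public void SetVirtualTag(string path, object value)
        => throw new InvalidOperationException("Alarm predicates are read-only; SetVirtualTag is not available.");
}
```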

Evaluation cadence:

  • On every upstream tag change that any alarm's input set references (OnUpstreamChangeReevaluateAsync). The engine maintains an inverse index tag path → alarm ids (_alarmsReferencing); only affected alarms re-run.
  • On a 5-second shelving-check timer (_shelvingTimer) for timed-shelve expiry.
  • At LoadAsync for every alarm, to re-derive ActiveState per plan decision #14 (startup recovery).
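The inverse index in the first bullet can be sketched as below. This is a minimal illustration, assuming the per-alarm input sets are already known (in the engine they come from DependencyExtractor and the message-template tokens); InverseIndex, Build, and Affected are hypothetical names, not the engine's API.

```csharp
using System;
using System.Collections.Generic;

// Each alarm id maps to the tag paths it reads.
var inputsByAlarm = new Dictionary<string, string[]>
{
    ["Plant/Line1/Oven::OverTemp"] = new[] { "Plant/Line1/Oven/Temp", "Plant/Line1/Oven/TempLimit" },
    ["Plant/Line1/Oven::DoorOpen"] = new[] { "Plant/Line1/Oven/Door" },
};

var index = InverseIndex.Build(inputsByAlarm);
foreach (var alarmId in InverseIndex.Affected(index, "Plant/Line1/Oven/Temp"))
    Console.WriteLine(alarmId);

static class InverseIndex
{
    // Inverts alarm -> paths into path -> alarm ids.
    public static Dictionary<string, HashSet<string>> Build(IReadOnlyDictionary<string, string[]> inputsByAlarm)
    {
        var index = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
        foreach (var (alarmId, paths) in inputsByAlarm)
        {
            foreach (var path in paths)
            {
                if (!index.TryGetValue(path, out var ids))
                    index[path] = ids = new HashSet<string>(StringComparer.Ordinal);
                ids.Add(alarmId);
            }
        }
        return index;
    }

    // Only alarms that reference the changed path need to re-run.
    public static IReadOnlyCollection<string> Affected(Dictionary<string, HashSet<string>> index, string changedPath)
    {
        if (index.TryGetValue(changedPath, out var ids)) return ids;
        return Array.Empty<string>();
    }
}
```

A change to a path no alarm references returns an empty collection, so the upstream-change handler has nothing to evaluate.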

If a predicate throws or times out, the engine logs the failure and leaves the prior ActiveState intact — it does not synthesise a clear. Operators investigating a broken predicate should never see a phantom clear preceding the error.

Part 9 state machine

Part9StateMachine (Part9StateMachine.cs) is a pure static function set. Every transition takes the current AlarmConditionState plus the event, returns a new record and an EmissionKind. No I/O, no mutation, trivially unit-testable. Transitions map to OPC UA Part 9:

  • ApplyPredicate(current, predicateTrue, nowUtc) — predicate re-evaluation. Inactive → Active sets Acked = Unacknowledged and Confirmed = Unconfirmed; Active → Inactive updates LastClearedUtc and consumes OneShot shelving. Disabled alarms no-op.
  • ApplyAcknowledge / ApplyConfirm — operator ack/confirm. Require a non-empty user string (audit requirement). Each appends an AlarmComment with Kind = "Acknowledge" / "Confirm".
  • ApplyOneShotShelve / ApplyTimedShelve(unshelveAtUtc) / ApplyUnshelve — shelving transitions. Timed requires unshelveAtUtc > nowUtc.
  • ApplyEnable / ApplyDisable — operator enable/disable. Disabled alarms ignore predicate results until re-enabled; on enable, ActiveState is re-derived from the next evaluation.
  • ApplyAddComment(text) — append-only audit entry, no state change.
  • ApplyShelvingCheck(nowUtc) — called by the 5s timer; promotes expired Timed shelving to Unshelved with a system / AutoUnshelve audit entry.

Two invariants the machine enforces:

  1. Disabled alarms ignore every predicate evaluation — they never transition ActiveState / AckedState / ConfirmedState until re-enabled.
  2. Shelved alarms still advance their internal state but emit EmissionKind.Suppressed instead of Activated / Cleared. The engine advances the state record (so startup recovery reflects reality) but ScriptedAlarmSource does not publish the suppressed transition to subscribers. OneShot expires on the next clear; Timed expires at ShelvingState.UnshelveAtUtc.

EmissionKind values: None, Suppressed, Activated, Cleared, Acknowledged, Confirmed, Shelved, Unshelved, Enabled, Disabled, CommentAdded.
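A heavily simplified sketch of the pure-function style follows, covering only ApplyPredicate and the two invariants. ConditionSketch, EmissionSketch, and Part9Sketch are hypothetical stand-ins: shelving is collapsed to a single bool, and OneShot consumption, audit comments, and the remaining transitions are omitted.

```csharp
using System;

var now = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var inactive = new ConditionSketch(Enabled: true, Active: false, Acked: true, Confirmed: true, Shelved: false, LastClearedUtc: null);

var (raised, kind1) = Part9Sketch.ApplyPredicate(inactive, true, now);
Console.WriteLine($"{kind1}: Active={raised.Active}, Acked={raised.Acked}");

var (_, kind2) = Part9Sketch.ApplyPredicate(raised, true, now);
Console.WriteLine(kind2); // no change in predicate result, so nothing is emitted

enum EmissionSketch { None, Suppressed, Activated, Cleared }

record ConditionSketch(bool Enabled, bool Active, bool Acked, bool Confirmed, bool Shelved, DateTime? LastClearedUtc);

static class Part9Sketch
{
    // Pure function: returns a new record plus the emission kind, never mutates its input.
    public static (ConditionSketch Next, EmissionSketch Emission) ApplyPredicate(
        ConditionSketch current, bool predicateTrue, DateTime nowUtc)
    {
        if (!current.Enabled) return (current, EmissionSketch.None);               // invariant 1
        if (predicateTrue == current.Active) return (current, EmissionSketch.None);

        var next = predicateTrue
            ? current with { Active = true, Acked = false, Confirmed = false }
            : current with { Active = false, LastClearedUtc = nowUtc };

        // Invariant 2: shelved alarms advance state but report Suppressed.
        var emission = current.Shelved
            ? EmissionSketch.Suppressed
            : (predicateTrue ? EmissionSketch.Activated : EmissionSketch.Cleared);
        return (next, emission);
    }
}
```

Because the function takes and returns immutable records, a unit test needs no engine, store, or clock beyond the nowUtc argument.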

Message templates

MessageTemplate (MessageTemplate.cs) resolves {path} placeholders in the configured message at emission time. Syntax:

  • {path/with/slashes} — brace-stripped contents are looked up via the engine's tag cache.
  • No escaping. Literal braces in messages are not currently supported.
  • ExtractTokenPaths(template) is called at LoadAsync so the engine subscribes to every referenced path (ensuring the value cache is populated before the first resolve).

Fallback rules: a resolved DataValueSnapshot with a non-zero StatusCode, a null Value, or an unknown path becomes {?}. The event still fires — the operator sees where the reference broke rather than having the alarm swallowed.
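The placeholder and fallback rules can be illustrated with a small resolver. This is a sketch under assumptions: the regex-based approach, the TemplateSketch name, the tuple-shaped cache, and invariant-culture value formatting are all choices made for the example and may differ from MessageTemplate.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical tag cache: path -> (value, OPC UA status code).
var cache = new Dictionary<string, (object Value, uint StatusCode)>
{
    ["Plant/Line1/Oven/Temp"] = (412.5, 0u),
    ["Plant/Line1/Oven/TempLimit"] = (400.0, 0x40000000u), // Uncertain
};

var template = "Oven {Plant/Line1/Oven/Temp} exceeds limit {Plant/Line1/Oven/TempLimit} on {Plant/Line1/Oven/Missing}";
Console.WriteLine(TemplateSketch.Resolve(template, cache));
// prints: Oven 412.5 exceeds limit {?} on {?}

static class TemplateSketch
{
    // No escaping: anything between a pair of braces is treated as a tag path.
    private static readonly Regex Token = new Regex(@"\{([^{}]+)\}", RegexOptions.Compiled);

    // Paths the engine would subscribe to at load time.
    public static IReadOnlyList<string> ExtractTokenPaths(string template)
    {
        var paths = new List<string>();
        foreach (Match m in Token.Matches(template))
            if (!paths.Contains(m.Groups[1].Value)) paths.Add(m.Groups[1].Value);
        return paths;
    }

    // Non-Good status, null value, or unknown path all render as {?}.
    public static string Resolve(string template, IReadOnlyDictionary<string, (object Value, uint StatusCode)> cache)
        => Token.Replace(template, m =>
            cache.TryGetValue(m.Groups[1].Value, out var snap) && snap.StatusCode == 0 && snap.Value != null
                ? Convert.ToString(snap.Value, CultureInfo.InvariantCulture) ?? "{?}"
                : "{?}");
}
```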

Input-quality policy

Predicate evaluation and message-template resolution deliberately treat tag-input quality differently:

| Surface | Quality bar | Rationale |
| --- | --- | --- |
| ScriptedAlarmEngine.AreInputsReady (predicate gate) | Bad rejected (StatusCode bit 31 set). Good and Uncertain are both accepted. | Uncertain quality still carries a value the predicate can inspect; rejecting it would mask a transitional alarm condition. Predicate evaluation is a state-machine input, and operators want it to track reality as closely as the quality allows. |
| MessageTemplate.Resolve (operator-facing message) | Any non-zero StatusCode rejected: only Good substitutes; Uncertain / Bad / unknown all render as {?}. | The message is a human-readable signal; substituting an Uncertain value would let operators act on a questionable reading without seeing the qualifier. Rendering {?} makes the doubt explicit. |

AlarmPredicateContext.GetTag returns a BadNodeIdUnknown (0x80340000) snapshot for missing or empty paths, so a typo in the predicate flows through AreInputsReady (Bad → predicate skipped, prior state held) and MessageTemplate.Resolve (non-Good → {?}) without crashing the engine. (Core.ScriptedAlarms-010)
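The two quality bars reduce to two different bit tests on the status code. The sketch below shows them side by side; QualitySketch and its method names are hypothetical, while the status values are the standard OPC UA Good (0x00000000) and Uncertain (0x40000000) codes plus the BadNodeIdUnknown code quoted above.

```csharp
using System;

const uint Good = 0x00000000u;
const uint Uncertain = 0x40000000u;        // severity bits 01
const uint BadNodeIdUnknown = 0x80340000u; // severity bits 10, bit 31 set

foreach (var status in new[] { Good, Uncertain, BadNodeIdUnknown })
{
    Console.WriteLine(
        $"0x{status:X8}: predicate={QualitySketch.AcceptForPredicate(status)}, message={QualitySketch.AcceptForMessage(status)}");
}

static class QualitySketch
{
    // Predicate gate: reject only Bad (bit 31 set); Good and Uncertain pass.
    public static bool AcceptForPredicate(uint statusCode) => (statusCode & 0x80000000u) == 0;

    // Message substitution: only an all-zero (Good) status passes.
    public static bool AcceptForMessage(uint statusCode) => statusCode == 0;
}
```

Uncertain is the only class where the two surfaces disagree: the predicate runs, but the message renders {?}.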

State persistence

IAlarmStateStore (IAlarmStateStore.cs) is the persistence contract: LoadAsync(alarmId), LoadAllAsync, SaveAsync(state), RemoveAsync(alarmId). InMemoryAlarmStateStore in the same file is the default for tests and dev deployments without a SQL backend. Stream E wires the production implementation against the ScriptedAlarmState config-DB table with audit logging through Core.Abstractions.IAuditLogger.

Persisted scope per plan decision #14: Enabled, Acked, Confirmed, Shelving, LastTransitionUtc, the LastAck* / LastConfirm* audit fields, and the append-only Comments list. Active is not trusted across restart — the engine re-runs the predicate at LoadAsync so operators never re-ack an alarm that was already acknowledged before an outage, and alarms whose condition cleared during downtime settle to Inactive without a spurious clear-event.

Every mutation the state machine produces is immediately persisted inside the engine's _evalGate semaphore, so the store's view is always consistent with the in-memory state.
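A minimal sketch of the "persist inside the gate" pattern is shown below, assuming a SemaphoreSlim-based gate. GateSketch is a hypothetical stand-in: state is reduced to a string, and the store is an in-memory dictionary with an artificial delay in place of a real IAlarmStateStore.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var engine = new GateSketch();
var id = "Plant/Line1/Oven::OverTemp";

// Two concurrent mutations are serialised by the gate.
await Task.WhenAll(
    engine.ApplyAsync(id, _ => "Active"),
    engine.ApplyAsync(id, s => s + "+Acked"));

Console.WriteLine(engine.Memory[id] == engine.Store[id]);

sealed class GateSketch
{
    private readonly SemaphoreSlim _evalGate = new SemaphoreSlim(1, 1);
    public ConcurrentDictionary<string, string> Memory { get; } = new ConcurrentDictionary<string, string>();
    public ConcurrentDictionary<string, string> Store { get; } = new ConcurrentDictionary<string, string>();

    public async Task ApplyAsync(string alarmId, Func<string, string> transition)
    {
        await _evalGate.WaitAsync();
        try
        {
            var current = Memory.TryGetValue(alarmId, out var s) ? s : "Inactive";
            var next = transition(current);
            Memory[alarmId] = next;
            await SaveAsync(alarmId, next); // persisted before the gate is released
        }
        finally
        {
            _evalGate.Release();
        }
    }

    // Stand-in for IAlarmStateStore.SaveAsync with some latency.
    private async Task SaveAsync(string alarmId, string state)
    {
        await Task.Delay(1);
        Store[alarmId] = state;
    }
}
```

Holding the gate across the save is what keeps the next mutation from observing in-memory state that the store has not yet recorded.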

Source integration

ScriptedAlarmSource (ScriptedAlarmSource.cs) adapts the engine to the driver-agnostic IAlarmSource interface. The existing AlarmSurfaceInvoker + GenericDriverNodeManager fan-out consumes it the same way it consumes Galaxy / AB CIP / FOCAS sources — there is no scripted-alarm-specific code path in the server plumbing. From that point on, the flow into AlarmConditionState nodes, the AlarmAck session check, and the Historian sink is shared — see AlarmTracking.md.

Two mapping notes specific to this adapter:

  • SubscribeAlarmsAsync accepts a list of source-node-id filters, interpreted as Equipment-path prefixes. Empty list matches every alarm. Each emission is matched against every live subscription — the adapter keeps no per-subscription cursor.
  • IAlarmSource.AcknowledgeAsync does not carry a user identity. The adapter defaults the audit user to "opcua-client" so callers using the base interface still produce an audit entry. The server's Part 9 method handlers (Stream G) call the engine's richer AcknowledgeAsync / ConfirmAsync / OneShotShelveAsync / TimedShelveAsync / UnshelveAsync / AddCommentAsync directly with the authenticated principal instead.
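The subscription filter in the first note can be sketched as a prefix test. This is an assumption-laden illustration: FilterSketch is a hypothetical name, and plain ordinal StartsWith is assumed, which means a filter of Plant/Line1 would also match Plant/Line10; whether the adapter matches on path-segment boundaries is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var line1Only = new[] { "Plant/Line1" };

Console.WriteLine(FilterSketch.Matches(line1Only, "Plant/Line1/Oven"));
Console.WriteLine(FilterSketch.Matches(line1Only, "Plant/Line2/Press"));
Console.WriteLine(FilterSketch.Matches(Array.Empty<string>(), "Plant/Line2/Press"));

static class FilterSketch
{
    // An empty filter list matches every alarm; otherwise any prefix hit is enough.
    public static bool Matches(IReadOnlyCollection<string> prefixes, string equipmentPath)
        => prefixes.Count == 0
           || prefixes.Any(p => equipmentPath.StartsWith(p, StringComparison.Ordinal));
}
```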

Emissions map into AlarmEventArgs as AlarmType = Kind.ToString(), SourceNodeId = EquipmentPath, ConditionId = AlarmId, Message = resolved template string, Severity carried verbatim, SourceTimestampUtc = emission time.

Composition

Phase7EngineComposer.Compose (src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs) is the single call site that instantiates the engine. It takes the generation's Script / VirtualTag / ScriptedAlarm rows, the shared CachedTagUpstreamSource, an IAlarmStateStore, and an IAlarmHistorianSink, and returns a Phase7ComposedSources the caller owns. When scriptedAlarms.Count > 0:

  1. ProjectScriptedAlarms resolves each row's PredicateScriptId against the script dictionary and produces a ScriptedAlarmDefinition list. Unknown or disabled scripts throw immediately — the DB publish guarantees referential integrity but this is a belt-and-braces check.
  2. A ScriptedAlarmEngine is constructed with the upstream source, the store, a shared ScriptLoggerFactory keyed to scripts-*.log, and the root Serilog logger.
  3. alarmEngine.OnEvent is wired to RouteToHistorianAsync, which projects each emission into an AlarmHistorianEvent and enqueues it on the sink. Fire-and-forget — the SQLite store-and-forward sink is already non-blocking.
  4. LoadAsync(alarmDefs) runs synchronously on the startup thread: it compiles every predicate, subscribes to the union of predicate inputs and message-template tokens, seeds the value cache, loads persisted state, re-derives ActiveState from a fresh predicate evaluation, and starts the 5s shelving timer. Compile failures are aggregated into one InvalidOperationException so operators see every bad predicate in one startup log line rather than one at a time.
  5. A ScriptedAlarmSource is created for the event stream, and a ScriptedAlarmReadable (src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/ScriptedAlarmReadable.cs) is created for OPC UA variable reads on the alarm's active-state node (task #245) — unknown alarm ids return BadNodeIdUnknown rather than silently reading false.

Both engine and source are added to Phase7ComposedSources.Disposables, which Phase7Composer disposes on server shutdown.
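The compile-failure aggregation in step 4 can be sketched as below. LoadSketch, CompileAll, and FakeCompile are hypothetical; the fake compiler merely checks for an explicit return so the example runs without Roslyn, and the message format is invented.

```csharp
using System;
using System.Collections.Generic;

var predicates = new Dictionary<string, string>
{
    ["Plant/Line1/Oven::OverTemp"] = "return true;",
    ["Plant/Line1/Oven::DoorOpen"] = "GetTag(\"Plant/Line1/Oven/Door\")",
    ["Plant/Line2/Press::Jam"] = "1 + 1",
};

try
{
    LoadSketch.CompileAll(predicates, FakeCompile);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // one message listing every failing predicate
}

// Stand-in for the real compile step.
static void FakeCompile(string source)
{
    if (!source.Contains("return")) throw new FormatException("missing explicit return");
}

static class LoadSketch
{
    // Compiles everything first, then throws once with all failures collected.
    public static void CompileAll(IReadOnlyDictionary<string, string> predicates, Action<string> compile)
    {
        var failures = new List<string>();
        foreach (var (alarmId, source) in predicates)
        {
            try { compile(source); }
            catch (Exception ex) { failures.Add($"{alarmId}: {ex.Message}"); }
        }

        if (failures.Count > 0)
            throw new InvalidOperationException(
                $"{failures.Count} predicate(s) failed to compile: " + string.Join("; ", failures));
    }
}
```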

Key source files

  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmEngine.cs: orchestrator, cascade wiring, shelving timer, OnEvent emission
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmSource.cs: IAlarmSource adapter over the engine
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/ScriptedAlarmDefinition.cs: runtime definition record
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/Part9StateMachine.cs: pure-function state machine + TransitionResult / EmissionKind
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmConditionState.cs: persisted state record + AlarmComment audit entry + ShelvingState
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmPredicateContext.cs: script-side ScriptContext (read-only, write rejected)
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/AlarmTypes.cs: AlarmKind + the four Part 9 enums
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/MessageTemplate.cs: {path} placeholder resolver
  • src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/IAlarmStateStore.cs: persistence contract + InMemoryAlarmStateStore default
  • src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs: composition, config-row projection, historian routing
  • src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/ScriptedAlarmReadable.cs: IReadable adapter exposing ActiveState to OPC UA variable reads