Script-log Engine Emit + Scripted-Alarm Runtime — Design

Status: approved 2026-06-10 (brainstorming). Next: implementation plan (writing-plans). Companion gap analysis: pending.md (repo root, verified against master @ ac1e1dfd).

Goal

Make the Script-log page show real script output, and stand up scripted alarms end-to-end — including real OPC UA Part 9 condition nodes and client-initiated acknowledge/shelve. Delivered in three sequenced layers.

Problem / current state (verified)

  • The Script-log page (ScriptLog.razor) + transport (IInProcessBroadcaster ← ScriptLogSignalRBridge ← script-logs DPS topic) are fully built, but a healthy script's ctx.Logger.* output never reaches them: the Roslyn evaluators inject a static Serilog Log.ForContext<…>() into the script context. Only eval failures publish a ScriptLogEntry (from VirtualTagActor). Hence the page note "No script-log entries yet. Engine emit (F8/F9) is pending."
  • ScriptLoggerFactory / ScriptLogCompanionSink / the scripts-*.log sink exist but are not wired into the Host — the evaluators bypass them.
  • Scripted alarms do not run at all. The Part-9-complete ScriptedAlarmEngine + ScriptedAlarmSource are orphaned (never constructed); DriverHostActor wires only the virtual-tag host. The new runtime materialises a placeholder [active, acknowledged] variable per alarm; real Part 9 nodes are the unstarted F14b / #85 workstream.
  • phase-7-status.md marks the alarm runtime "Done", but it describes a superseded architecture (OpcUaApplicationHost / Phase7EngineComposer etc., all 0 hits today). Trust the code, not that doc. Full detail + audit table in pending.md.

Decisions (from brainstorming)

| #  | Decision |
|----|----------|
| D1 | Full scope: all three layers (shared emit · F9 engine runtime · F14b real Part 9 + client ack). |
| D2 | Build F9 on the heavyweight ScriptedAlarmEngine (Part-9-complete, tested), not the lightweight ScriptedAlarmActor. Retire ScriptedAlarmActor + RoslynScriptedAlarmEvaluator. |
| D3 | A root script logger (file + companion + new topic sink) is the shared emit seam for F8 and F9. Build once at Host startup, inject into both evaluators and the engine's ScriptLoggerFactory. |
| D4 | ScriptLogTopicSink gates at a configurable minimum level, default Information (Debug/Trace stay in the file, off the wire). |
| D5 | Composition carries the alarm predicate source + dependency refs via a new EquipmentScriptedAlarmPlan (parallel to EquipmentVirtualTagPlan), built byte-parity in both compose seams (Phase7Composer + DeploymentArtifact). |
| D6 | Ack plumbing is grouped into Layer 2: AdminUI ack/shelve and OPC-UA-client ack both route to the same engine.AcknowledgeAsync(...). Layer 1 stays "alarms run + persist + historize + emit". |

No Configuration entity / EF migration change is required: the ScriptedAlarmState table already exists; the EF store reads/writes it, and the composition enrichment is in-memory plan types only.

Architecture

The unifying seam is the root script logger:

root script logger ─┬→ scripts-*.log (rolling, all levels)
                    ├→ ScriptLogCompanionSink (Error+ → main opcua-*.log)
                    └→ ScriptLogTopicSink (≥ min level) → IScriptLogPublisher
                          → DPS "script-logs" → ScriptLogSignalRBridge
                          → IInProcessBroadcaster → ScriptLog.razor

Inject it into both Roslyn evaluators (F8) and the ScriptedAlarmEngine's ScriptLoggerFactory (F9); every layer's script logging lights up the page for free.
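
The fan-out above could be assembled roughly as follows. This is a minimal sketch, assuming Serilog with the Serilog.Sinks.File package; the helper name RootScriptLogger.Build, the daily rolling interval, and the scripts-.log file pattern are illustrative assumptions, and the two sink parameters stand in for ScriptLogCompanionSink and ScriptLogTopicSink.

```csharp
using System.IO;
using Serilog;
using Serilog.Core;
using Serilog.Events;

public static class RootScriptLogger
{
    // Hypothetical helper: builds one logger that fans out to the three sinks.
    public static Logger Build(
        string logDirectory,
        ILogEventSink companionSink, // stand-in for ScriptLogCompanionSink
        ILogEventSink topicSink)     // stand-in for ScriptLogTopicSink
    {
        return new LoggerConfiguration()
            // Let every level through; each sink narrows its own slice.
            .MinimumLevel.Verbose()
            // Rolling file receives all levels (daily rolling is an assumption).
            .WriteTo.File(
                Path.Combine(logDirectory, "scripts-.log"),
                rollingInterval: RollingInterval.Day)
            // Companion sink only sees Error and above.
            .WriteTo.Sink(companionSink, restrictedToMinimumLevel: LogEventLevel.Error)
            // Topic sink applies its own minimum-level gate (D4), so no restriction here.
            .WriteTo.Sink(topicSink)
            .CreateLogger();
    }
}
```

Returning the concrete Logger rather than ILogger lets the Host dispose it on shutdown so the file sink flushes.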

Layer 0  Shared emit          ScriptLogTopicSink + root logger + per-script ForContext   [foundation]
Layer 1  F9 engine runtime    ScriptedAlarmHostActor wraps ScriptedAlarmEngine            [the payoff]
Layer 2  F14b real Part 9     real AlarmConditionState nodes + events + inbound ack        [SDK epic]

Layer 0 — Shared script-log emit + F8 live

Outcome: a healthy virtual-tag script's ctx.Logger.Information(...) shows on the page in ~½s.

New:

  • ScriptLogTopicSink : Serilog.Core.ILogEventSink (Core.Scripting) — reads ScriptId/VirtualTagId/AlarmId/EquipmentId off the LogEvent, builds a ScriptLogEntry, hands it to an injected IScriptLogPublisher. Min-level gate (D4).
  • IScriptLogPublisher (Core.Scripting) — void Publish(ScriptLogEntry entry). Keeps Akka out of Core.
  • DpsScriptLogPublisher : IScriptLogPublisher (Runtime/Host): holds the ActorSystem mediator; Publish → Mediator.Tell(new Publish("script-logs", entry)).
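
A minimal sketch of the sink and publisher seam described in the list above, assuming Serilog. The ScriptLogEntry shape shown here is hypothetical (the real record may carry different fields), as are the constructor signature and the ReadString helper; only the property names and the Publish signature come from this design.

```csharp
#nullable enable
using System;
using Serilog.Core;
using Serilog.Events;

// Hypothetical entry shape, for illustration only.
public sealed record ScriptLogEntry(
    DateTimeOffset Timestamp, string Level, string Message,
    string? ScriptId, string? VirtualTagId, string? AlarmId, string? EquipmentId);

// Keeps Akka out of Core: the sink only knows this interface.
public interface IScriptLogPublisher
{
    void Publish(ScriptLogEntry entry);
}

public sealed class ScriptLogTopicSink : ILogEventSink
{
    private readonly IScriptLogPublisher _publisher;
    private readonly LogEventLevel _minimumLevel;

    public ScriptLogTopicSink(
        IScriptLogPublisher publisher,
        LogEventLevel minimumLevel = LogEventLevel.Information) // D4 default
    {
        _publisher = publisher ?? throw new ArgumentNullException(nameof(publisher));
        _minimumLevel = minimumLevel;
    }

    public void Emit(LogEvent logEvent)
    {
        // Min-level gate: Debug/Verbose stay in the file, off the wire.
        if (logEvent.Level < _minimumLevel) return;

        _publisher.Publish(new ScriptLogEntry(
            logEvent.Timestamp,
            logEvent.Level.ToString(),
            logEvent.RenderMessage(),
            ReadString(logEvent, "ScriptId"),
            ReadString(logEvent, "VirtualTagId"),
            ReadString(logEvent, "AlarmId"),
            ReadString(logEvent, "EquipmentId")));
    }

    // Returns null when the property is absent or not a non-null scalar.
    private static string? ReadString(LogEvent logEvent, string name) =>
        logEvent.Properties.TryGetValue(name, out var value)
        && value is ScalarValue { Value: { } raw }
            ? raw.ToString()
            : null;
}
```

Missing identity properties map to null rather than throwing, which matches the "null props" test case listed for this layer.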

Changed:

  • Host startup builds the root script logger (file + companion + topic sink), registers it in DI.
  • RoslynVirtualTagEvaluator + RoslynScriptedAlarmEvaluator take the root logger instead of the static field; on each evaluation they ForContext the identity (ScriptId/VirtualTagId/EquipmentId) and inject the resulting logger into the script context. Requires threading scriptId+equipmentId to the evaluator per call (small extension; VirtualTagId is already present, and in the live path scriptId == virtualTagId today).
  • ScriptLoggerFactory gains a binding for the standard property set (so the engine's per-alarm logger carries AlarmId/EquipmentId the topic sink understands).
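
The per-evaluation enrichment could look like the sketch below, assuming Serilog's ILogger.ForContext(string, object) overload. The helper name ScriptLoggerContext.ForVirtualTag is hypothetical; the property names are the ones the topic sink is described as reading.

```csharp
using Serilog;

public static class ScriptLoggerContext
{
    // Hypothetical helper: returns a child logger stamped with the script identity.
    // The root logger is untouched, so it can be shared across evaluations.
    public static ILogger ForVirtualTag(
        ILogger root, string scriptId, string virtualTagId, string equipmentId) =>
        root.ForContext("ScriptId", scriptId)
            .ForContext("VirtualTagId", virtualTagId)
            .ForContext("EquipmentId", equipmentId);
}
```

Because ForContext returns a new logger each time, the evaluator can create one per evaluation without sharing mutable state between scripts.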

No regression: VirtualTagActor's existing failure PublishLog stays (catches compile errors/timeouts the script can't log) — distinct messages, no double-emit.

Tests (xUnit+Shouldly): sink props→entry mapping + min-level gate + null props; root-logger fan-out (Error→all three sinks, Debug→file only); evaluator emits via a fake IScriptLogPublisher when a script logs. Live-verify: author a logging vtag in docker-dev.


Layer 1 — F9 engine runtime

Outcome: scripted alarms run — predicates evaluate against live tags, state persists, transitions drive the alarm node + historian + Alerts page, predicate logs hit the page (via Layer 0).

Host: ScriptedAlarmHostActor (new, Runtime) — child of DriverHostActor, spawned where VirtualTagHostActor is. Owns one ScriptedAlarmEngine; on ApplyScriptedAlarms calls engine.LoadAsync(defs); disposes engine on stop.

Supporting pieces:

  1. DependencyMuxTagUpstreamSource : ITagUpstreamSource (new) — host registers interest with DependencyMuxActor for the union of alarm dep refs; each DependencyValueChanged pushes into the adapter cache and fires the engine's SubscribeTag observers. ReadTag = cache lookup (Bad if absent). Values wrapped as DataValueSnapshot (Good) like the vtag path.
  2. EfAlarmConditionStateStore : IAlarmStateStore (new) — persists AlarmConditionState ↔ the existing ScriptedAlarmState table (enabled/acked/confirmed/shelving + ShelvingExpiresUtc + LastAck*/LastConfirm*/CommentsJson; ActiveState re-derived, not stored). Mirrors EfAlarmActorStateStore; uses IDbContextFactory<OtOpcUaConfigDbContext>.
  3. Composition enrichment (D5): new EquipmentScriptedAlarmPlan(ScriptedAlarmId, EquipmentId, Name, AlarmType, Severity, MessageTemplate, PredicateScriptId, PredicateSource, DependencyRefs, HistorizeToAveva, Retain, Enabled). PredicateSource is resolved PredicateScriptId → Script.SourceCode; DependencyRefs = DependencyExtractor.Extract(source).Reads ∪ message-template token paths. Built in Phase7Composer.Compose (live DB) and DeploymentArtifact (artifact encode/decode), byte-parity. DriverHostActor → ApplyScriptedAlarms(composition.EquipmentScriptedAlarms).
  4. Engine→outputs bridge (host engine.OnEvent handler): map Condition → (active, acknowledged) → OpcUaPublishActor.WriteAlarmState(AlarmId, …); if HistorizeToAveva → HistorianAdapterActor; publish AlarmTransitionEvent on the alerts topic. Script-log emit is automatic: the host passes Layer 0's root logger into the engine's ScriptLoggerFactory (the Layer 0↔1 join).
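
The upstream adapter in item 1 is essentially a cache plus an observer list. The sketch below illustrates that shape using only the base class library; the ITagUpstreamSource members, the DataValueSnapshot record, and the Push method are hypothetical stand-ins, since this design only names ReadTag, SubscribeTag, and the "Bad if absent" rule.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical snapshot shape; the real DataValueSnapshot may differ.
public readonly record struct DataValueSnapshot(object? Value, bool IsGood, DateTime TimestampUtc)
{
    public static DataValueSnapshot Bad { get; } = new(null, false, DateTime.MinValue);
}

// Hypothetical interface shape, for illustration only.
public interface ITagUpstreamSource
{
    DataValueSnapshot ReadTag(string path);
    IDisposable SubscribeTag(string path, Action<string, DataValueSnapshot> observer);
}

public sealed class DependencyMuxTagUpstreamSource : ITagUpstreamSource
{
    private readonly ConcurrentDictionary<string, DataValueSnapshot> _cache = new();
    private readonly Dictionary<string, List<Action<string, DataValueSnapshot>>> _observers = new();
    private readonly object _gate = new();

    // ReadTag = cache lookup; Bad when no value has arrived yet.
    public DataValueSnapshot ReadTag(string path) =>
        _cache.TryGetValue(path, out var snapshot) ? snapshot : DataValueSnapshot.Bad;

    public IDisposable SubscribeTag(string path, Action<string, DataValueSnapshot> observer)
    {
        lock (_gate)
        {
            if (!_observers.TryGetValue(path, out var list))
                _observers[path] = list = new List<Action<string, DataValueSnapshot>>();
            list.Add(observer);
        }
        return new Unsubscriber(() =>
        {
            lock (_gate)
            {
                if (_observers.TryGetValue(path, out var list)) list.Remove(observer);
            }
        });
    }

    // Called by the host actor for each DependencyValueChanged message.
    public void Push(string path, object? value, DateTime timestampUtc)
    {
        var snapshot = new DataValueSnapshot(value, true, timestampUtc); // wrapped as Good
        _cache[path] = snapshot;

        Action<string, DataValueSnapshot>[] targets;
        lock (_gate)
        {
            targets = _observers.TryGetValue(path, out var list)
                ? list.ToArray()
                : Array.Empty<Action<string, DataValueSnapshot>>();
        }
        // Invoke outside the lock so observers cannot deadlock the adapter.
        foreach (var target in targets) target(path, snapshot);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private Action? _dispose;
        public Unsubscriber(Action dispose) => _dispose = dispose;
        public void Dispose() => Interlocked.Exchange(ref _dispose, null)?.Invoke();
    }
}
```

Updating the cache before notifying observers means a predicate that calls ReadTag from inside its callback sees the new value.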

Retire: ScriptedAlarmActor, RoslynScriptedAlarmEvaluator, the F9b DI registration in Program.cs.

deploy → EquipmentScriptedAlarmPlan(source+deps) → ScriptedAlarmHostActor.LoadAsync
tag change → DependencyMux → adapter → engine predicate eval → Part9StateMachine
   engine.OnEvent ─┬→ WriteAlarmState (active/ack node)
                   ├→ HistorianAdapterActor (if Historize)
                   ├→ alerts topic (Alerts page)
                   └→ per-alarm logger → [Layer 0] → Script-log page

Tests: EF store round-trip (in-memory EF); upstream adapter push→observer; composition enrichment + Phase7Composer ↔ DeploymentArtifact parity (same discipline as {{equip}}); host actor TestKit (apply → tag change → asserts WriteAlarmState / historian / alerts emitted). Engine internals already tested. Live-verify: author an alarm, drive its tag, watch the node flip + historian queue + predicate logs.


Layer 2 — F14b real Part 9 + client ack

Outcome: alarms become real Part 9 conditions — clients see them in event subscriptions and can Acknowledge/Shelve/Confirm; the placeholder variable is retired.

SDK-heavy epic (issue #85). All SDK address-space work funnels through OpcUaPublishActor on the pinned dispatcher. SDK specifics (creating AlarmConditionState, the event model, method-handler wiring) are to be confirmed via the DeepWiki MCP (OPCFoundation/UA-.NETStandard) during planning; they are not assumed here.

Components:

  1. Real condition nodes: materialise a proper AlarmConditionState (or the subtype per AlarmType: Limit/OffNormal/Discrete) under the equipment node with the standard sub-properties (ActiveState/AckedState/ConfirmedState/EnabledState/ShelvingState/Severity/Retain/Comment). Replaces WriteAlarmState's flat variable.
  2. State → condition: the Layer 1 bridge now carries the full AlarmConditionState and sets it on the real node.
  3. Event firing: condition.ReportEvent(...) on each transition so subscribers get the alarm (AlarmsAndConditions).
  4. Inbound method dispatch + ack plumbing (D6): wire Acknowledge/Confirm/AddComment/OneShotShelve/TimedShelve/Unshelve → engine.<Op>Async(conditionId, principal, comment, ct) with the authenticated session principal. Both the OPC UA client path and ScriptedAlarms.razor route to the same engine methods; engine transition → persist → emit → node update + event.
  5. Permission gating: method calls gate at the AlarmAck tier (LDAP-group → OPC-UA permission map); Confirm/Shelve/AddComment equivalently.

OPC UA client ─┐                          ┌→ engine updates AlarmConditionState
AdminUI page ──┴→ engine.AcknowledgeAsync  ┤   → OpcUaPublishActor → real condition node
                  (principal, AlarmAck gate)└→ condition.ReportEvent → client event stream
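
The single-entry-point routing in the diagram above can be sketched as a small router that both callers share. Everything here is a hypothetical illustration: the IScriptedAlarmOperations interface, the AlarmMethodRouter class, the string-typed principal, and the permission delegate are assumptions; only the engine call shape (conditionId, principal, comment, ct) and the AlarmAck tier name come from this design.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical slice of the engine surface.
public interface IScriptedAlarmOperations
{
    Task AcknowledgeAsync(string conditionId, string principal, string? comment, CancellationToken ct);
}

// Shared by the OPC UA method handler and the AdminUI page so both paths
// pass through the same permission gate before reaching the engine.
public sealed class AlarmMethodRouter
{
    public const string AlarmAckPermission = "AlarmAck";

    private readonly IScriptedAlarmOperations _engine;
    private readonly Func<string, string, bool> _hasPermission; // (principal, permission) => allowed

    public AlarmMethodRouter(
        IScriptedAlarmOperations engine,
        Func<string, string, bool> hasPermission)
    {
        _engine = engine ?? throw new ArgumentNullException(nameof(engine));
        _hasPermission = hasPermission ?? throw new ArgumentNullException(nameof(hasPermission));
    }

    // Returns false when the principal lacks the AlarmAck tier; the engine is not called.
    public async Task<bool> TryAcknowledgeAsync(
        string conditionId, string principal, string? comment, CancellationToken ct = default)
    {
        if (!_hasPermission(principal, AlarmAckPermission)) return false;

        await _engine.AcknowledgeAsync(conditionId, principal, comment, ct).ConfigureAwait(false);
        return true;
    }
}
```

Confirm, AddComment, and the shelve operations would follow the same pattern; how a denied call is reported back to an OPC UA client is an SDK detail left to planning.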

Tests: engine-state→condition-state mapping; method→engine routing (fake engine) + permission gating; SDK node/event behaviour proven by Client.CLI alarm subscribe + ack round-trip in docker-dev. Highest-risk layer; most live-verification dependent.


Cross-cutting

Hard rules (carry into the plan):

  • Stage by explicit path; never run "git add .". Never stage sql_login.txt or src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/. Never echo the gateway API key into a new tracked file. No force-push, no --no-verify.
  • No Configuration entity / EF migration change (the ScriptedAlarmState table already exists).
  • Agent must not sign in to the AdminUI for live verification — the user signs in.
  • Razor/JS proven only by live docker-dev /run; everything else unit-tested (xUnit + Shouldly, in-memory EF, Akka TestKit). No bUnit.
  • Build on a feature branch off master; planning docs (this design + the plan + .tasks.json) committed to master per established pattern.

Touched code (indicative — plan nails exact files):

  • Layer 0: Core.Scripting/ (new sink + publisher iface + factory binding), Host/Program.cs (root logger wiring + retire static loggers), Host/Engines/Roslyn*Evaluator.cs, Runtime (DPS publisher).
  • Layer 1: Runtime/ScriptedAlarms/ (host actor, upstream adapter, EF store), OpcUaServer/Phase7Composer.cs + Runtime/Drivers/DeploymentArtifact.cs (enriched plan), Runtime/Drivers/DriverHostActor.cs (spawn + apply), retire ScriptedAlarmActor + RoslynScriptedAlarmEvaluator.
  • Layer 2: OpcUaServer/OtOpcUaNodeManager.cs (real condition nodes + methods), Runtime/OpcUa/OpcUaPublishActor.cs (richer alarm message), security gate, AdminUI/Components/Pages/ScriptedAlarms.razor (ack/shelve control).

Out of scope: virtual-tag historization production sink (Gap 5 / B.6 — separate); Phase-7 Playwright E2E (F.7).

Sequencing: Layer 0 is the low-risk foundation and independently shippable. Layer 1 is the self-contained "alarms work" payoff. Layer 2 is the SDK epic — largest, highest risk, most live-verification dependent.