# Script-log Engine Emit + Scripted-Alarm Runtime — Design
> **Status:** approved 2026-06-10 (brainstorming). Next: implementation plan
> (`writing-plans`). Companion gap analysis: `pending.md` (repo root, verified
> against `master @ ac1e1dfd`).
## Goal
Make the Script-log page show real script output, and stand up scripted alarms
end-to-end — including real OPC UA Part 9 condition nodes and client-initiated
acknowledge/shelve. Delivered in three sequenced layers.
## Problem / current state (verified)
- The Script-log page (`ScriptLog.razor`) + transport (`IInProcessBroadcaster` ←
`ScriptLogSignalRBridge` ← `script-logs` DPS topic) are fully built, but a healthy
script's `ctx.Logger.*` output never reaches them — the Roslyn evaluators inject a
**static** `SerilogLog.ForContext<…>()` into the script context. Only eval
*failures* publish a `ScriptLogEntry` (from `VirtualTagActor`). Hence the page note
*"No script-log entries yet. Engine emit (F8/F9) is pending."*
- `ScriptLoggerFactory` / `ScriptLogCompanionSink` / the `scripts-*.log` sink exist
but are **not wired into the Host** — the evaluators bypass them.
- **Scripted alarms do not run at all.** The Part-9-complete `ScriptedAlarmEngine` +
`ScriptedAlarmSource` are orphaned (never constructed); `DriverHostActor` wires only
the virtual-tag host. The new runtime materialises a placeholder `[active,
acknowledged]` variable per alarm; real Part 9 nodes are the unstarted **F14b /
#85** workstream.
- `phase-7-status.md` marks the alarm runtime "Done", but it describes a **superseded
architecture** (`OpcUaApplicationHost` / `Phase7EngineComposer` etc., all 0 hits
today). Trust the code, not that doc. Full detail + audit table in `pending.md`.
## Decisions (from brainstorming)
| # | Decision |
|---|---|
| D1 | **Full scope: all three layers** (shared emit · F9 engine runtime · F14b real Part 9 + client ack). |
| D2 | Build F9 on the **heavyweight `ScriptedAlarmEngine`** (Part-9-complete, tested), **not** the lightweight `ScriptedAlarmActor`. Retire `ScriptedAlarmActor` + `RoslynScriptedAlarmEvaluator`. |
| D3 | A **root script logger** (file + companion + new topic sink) is the **shared emit seam** for F8 *and* F9. Build once at Host startup, inject into both evaluators and the engine's `ScriptLoggerFactory`. |
| D4 | `ScriptLogTopicSink` gates at a **configurable minimum level, default `Information`** (Debug/Trace stay in the file, off the wire). |
| D5 | Composition carries the alarm predicate **source + dependency refs** via a new `EquipmentScriptedAlarmPlan` (parallel to `EquipmentVirtualTagPlan`), built byte-parity in **both** compose seams (`Phase7Composer` + `DeploymentArtifact`). |
| D6 | **Ack plumbing is grouped into Layer 2**: AdminUI ack/shelve and OPC-UA-client ack both route to the same `engine.AcknowledgeAsync(...)`. Layer 1 stays "alarms run + persist + historize + emit". |

**No Configuration entity / EF migration change** is required: the `ScriptedAlarmState`
table already exists; the EF store reads/writes it, and the composition enrichment is
in-memory plan types only.
## Architecture
The unifying seam is the **root script logger**:
```
root script logger ─┬→ scripts-*.log            (rolling, all levels)
                    ├→ ScriptLogCompanionSink   (Error+ → main opcua-*.log)
                    └→ ScriptLogTopicSink       (≥ min level) → IScriptLogPublisher
                          → DPS "script-logs" → ScriptLogSignalRBridge
                          → IInProcessBroadcaster → ScriptLog.razor
```
Inject it into both Roslyn evaluators (F8) and the `ScriptedAlarmEngine`'s
`ScriptLoggerFactory` (F9); every layer's script logging lights up the page for free.
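The fan-out above can be sketched with Serilog's standard `LoggerConfiguration` API. This is a minimal illustration under stated assumptions: `CollectingSink` is a hypothetical stand-in for the real file, companion, and topic sinks, and the per-sink `restrictedToMinimumLevel` argument is only one way to express the level split, not necessarily how the Host will wire it.

```csharp
using System;
using System.Collections.Generic;
using Serilog;
using Serilog.Core;
using Serilog.Events;

// Hypothetical stand-ins for the three real sinks; each records what it receives.
var file = new CollectingSink();
var companion = new CollectingSink();
var topic = new CollectingSink();

// Root script logger: the file sink sees every level, the companion sink only
// Error and above, the topic sink only Information and above (the D4 default).
using var root = new LoggerConfiguration()
    .MinimumLevel.Verbose()
    .WriteTo.Sink(file)
    .WriteTo.Sink(companion, restrictedToMinimumLevel: LogEventLevel.Error)
    .WriteTo.Sink(topic, restrictedToMinimumLevel: LogEventLevel.Information)
    .CreateLogger();

root.Debug("scratch value {Value}", 42); // file only
root.Error("predicate threw");           // all three sinks

Console.WriteLine(
    $"file={file.Events.Count} companion={companion.Events.Count} topic={topic.Events.Count}");
// prints: file=2 companion=1 topic=1

sealed class CollectingSink : ILogEventSink
{
    public List<LogEvent> Events { get; } = new();
    public void Emit(LogEvent logEvent) => Events.Add(logEvent);
}
```

The same Debug-to-file-only and Error-to-all-three behaviour is what the Layer 0 fan-out test is meant to assert.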
```
Layer 0  Shared emit         ScriptLogTopicSink + root logger + per-script ForContext   [foundation]
Layer 1  F9 engine runtime   ScriptedAlarmHostActor wraps ScriptedAlarmEngine           [the payoff]
Layer 2  F14b real Part 9    real AlarmConditionState nodes + events + inbound ack      [SDK epic]
```
---
## Layer 0 — Shared script-log emit + F8 live
**Outcome:** a healthy virtual-tag script's `ctx.Logger.Information(...)` shows on the
page in ~½s.
**New:**
- `ScriptLogTopicSink : Serilog.Core.ILogEventSink` (`Core.Scripting`) — reads
`ScriptId`/`VirtualTagId`/`AlarmId`/`EquipmentId` off the `LogEvent`, builds a
`ScriptLogEntry`, hands it to an injected `IScriptLogPublisher`. Min-level gate (D4).
- `IScriptLogPublisher` (`Core.Scripting`) — `void Publish(ScriptLogEntry entry)`.
Keeps Akka out of Core.
- `DpsScriptLogPublisher : IScriptLogPublisher` (Runtime/Host) — holds the
`ActorSystem` mediator; `Publish` → `Mediator.Tell(new Publish("script-logs", entry))`.
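A minimal sketch of the sink and publisher seam described in the list above, assuming Serilog's `ILogEventSink` contract. The `ScriptLogEntry` field list here is hypothetical (the real record may carry different members), and mapping missing properties to `null` is an assumption drawn from the "null props" test case rather than a confirmed behaviour.

```csharp
using System;
using Serilog.Core;
using Serilog.Events;

// Hypothetical entry shape, for illustration only.
public sealed record ScriptLogEntry(
    DateTimeOffset Timestamp, string Level, string Message,
    string? ScriptId, string? VirtualTagId, string? AlarmId, string? EquipmentId);

// Keeps Akka out of Core: the sink only knows this interface.
public interface IScriptLogPublisher
{
    void Publish(ScriptLogEntry entry);
}

public sealed class ScriptLogTopicSink : ILogEventSink
{
    private readonly IScriptLogPublisher _publisher;
    private readonly LogEventLevel _minimumLevel;

    public ScriptLogTopicSink(
        IScriptLogPublisher publisher,
        LogEventLevel minimumLevel = LogEventLevel.Information)
    {
        _publisher = publisher;
        _minimumLevel = minimumLevel;
    }

    public void Emit(LogEvent logEvent)
    {
        // D4 gate: events below the configured level never reach the topic.
        if (logEvent.Level < _minimumLevel) return;

        _publisher.Publish(new ScriptLogEntry(
            logEvent.Timestamp,
            logEvent.Level.ToString(),
            logEvent.RenderMessage(),
            Scalar(logEvent, "ScriptId"),
            Scalar(logEvent, "VirtualTagId"),
            Scalar(logEvent, "AlarmId"),
            Scalar(logEvent, "EquipmentId")));
    }

    // Missing or non-scalar properties map to null instead of throwing.
    private static string? Scalar(LogEvent e, string name) =>
        e.Properties.TryGetValue(name, out var value) && value is ScalarValue s
            ? s.Value?.ToString()
            : null;
}
```

Because the sink depends only on `IScriptLogPublisher`, a unit test can pass a fake publisher that collects entries, while the Host supplies the DPS-backed implementation.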
**Changed:**
- Host startup builds the **root script logger** (file + companion + topic sink),
registers it in DI.
- `RoslynVirtualTagEvaluator` + `RoslynScriptedAlarmEvaluator` take the root logger
instead of the static field; per evaluation `ForContext` the identity
(`ScriptId`/`VirtualTagId`/`EquipmentId`) and inject into the script context.
Requires threading `scriptId`+`equipmentId` to the evaluator per call (small
extension; `VirtualTagId` already present — in the live path `scriptId ==
virtualTagId` today).
- `ScriptLoggerFactory` gains a binding for the standard property set (so the engine's
per-alarm logger carries `AlarmId`/`EquipmentId` the topic sink understands).
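The per-evaluation binding can be sketched as a chain of Serilog `ForContext` calls. The helper name `ScriptLoggerBinding.ForScript` is hypothetical; only the property names come from the text above.

```csharp
using Serilog;

public static class ScriptLoggerBinding
{
    // Returns a child of the root script logger stamped with the identity
    // properties that the topic sink reads off each LogEvent.
    public static ILogger ForScript(
        ILogger root, string scriptId, string virtualTagId, string equipmentId) =>
        root.ForContext("ScriptId", scriptId)
            .ForContext("VirtualTagId", virtualTagId)
            .ForContext("EquipmentId", equipmentId);
}
```

An evaluator would call this once per evaluation and hand the result to the script context, so every `ctx.Logger.*` call from that script carries the same identity.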
**No regression:** `VirtualTagActor`'s existing failure `PublishLog` stays (catches
compile errors/timeouts the script can't log) — distinct messages, no double-emit.
**Tests (xUnit+Shouldly):** sink props→entry mapping + min-level gate + null props;
root-logger fan-out (Error→all three sinks, Debug→file only); evaluator emits via a
fake `IScriptLogPublisher` when a script logs. Live-verify: author a logging vtag in
docker-dev.

---
## Layer 1 — F9 engine runtime
**Outcome:** scripted alarms run — predicates evaluate against live tags, state
persists, transitions drive the alarm node + historian + Alerts page, predicate logs
hit the page (via Layer 0).
**Host:** `ScriptedAlarmHostActor` (new, Runtime) — child of `DriverHostActor`, spawned
where `VirtualTagHostActor` is. Owns one `ScriptedAlarmEngine`; on `ApplyScriptedAlarms`
calls `engine.LoadAsync(defs)`; disposes engine on stop.
**Supporting pieces:**
1. `DependencyMuxTagUpstreamSource : ITagUpstreamSource` (new) — host registers
interest with `DependencyMuxActor` for the union of alarm dep refs; each
`DependencyValueChanged` pushes into the adapter cache and fires the engine's
`SubscribeTag` observers. `ReadTag` = cache lookup (Bad if absent). Values wrapped
as `DataValueSnapshot` (Good) like the vtag path.
2. `EfAlarmConditionStateStore : IAlarmStateStore` (new) — persists `AlarmConditionState`
↔ the existing `ScriptedAlarmState` table (enabled/acked/confirmed/shelving +
`ShelvingExpiresUtc` + LastAck*/LastConfirm*/CommentsJson; **ActiveState re-derived,
not stored**). Mirrors `EfAlarmActorStateStore`; uses `IDbContextFactory<OtOpcUaConfigDbContext>`.
3. **Composition enrichment** (D5) — new `EquipmentScriptedAlarmPlan(ScriptedAlarmId,
EquipmentId, Name, AlarmType, Severity, MessageTemplate, PredicateScriptId,
PredicateSource, DependencyRefs, HistorizeToAveva, Retain, Enabled)`. `PredicateSource`
resolved `PredicateScriptId → Script.SourceCode`; `DependencyRefs` =
`DependencyExtractor.Extract(source).Reads` plus the message-template token paths. Built in
`Phase7Composer.Compose` (live DB) and `DeploymentArtifact` (artifact encode/decode),
byte-parity. `DriverHostActor` → `ApplyScriptedAlarms(composition.EquipmentScriptedAlarms)`.
4. **Engine→outputs bridge** (host `engine.OnEvent` handler): map `Condition` →
`(active, acknowledged)` → `OpcUaPublishActor.WriteAlarmState(AlarmId, …)`; if
`HistorizeToAveva` → `HistorianAdapterActor`; publish `AlarmTransitionEvent` on the
`alerts` topic. **Script-log emit is automatic** — the host passes Layer 0's root
logger into the engine's `ScriptLoggerFactory` (the Layer 0↔1 join).
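The upstream adapter in item 1 is essentially a cache with observers. The sketch below shows that idea without the real interfaces: `DataValueSnapshot`, `Quality`, and the method signatures are hypothetical shapes, since the actual `ITagUpstreamSource` contract is not reproduced in this document.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical shapes; the real DataValueSnapshot may differ.
public enum Quality { Good, Bad }
public sealed record DataValueSnapshot(object? Value, Quality Quality, DateTime TimestampUtc);

public sealed class DependencyMuxTagUpstreamSourceSketch
{
    private readonly ConcurrentDictionary<string, DataValueSnapshot> _cache = new();
    private readonly ConcurrentDictionary<string, List<Action<string, DataValueSnapshot>>> _observers = new();

    // ReadTag is a pure cache lookup; an unseen tag reads as Bad.
    public DataValueSnapshot ReadTag(string path) =>
        _cache.TryGetValue(path, out var snapshot)
            ? snapshot
            : new DataValueSnapshot(null, Quality.Bad, DateTime.UtcNow);

    public IDisposable SubscribeTag(string path, Action<string, DataValueSnapshot> observer)
    {
        var list = _observers.GetOrAdd(path, _ => new List<Action<string, DataValueSnapshot>>());
        lock (list) list.Add(observer);
        return new Unsubscriber(() => { lock (list) list.Remove(observer); });
    }

    // Called by the host actor for each DependencyValueChanged message.
    public void Push(string path, object? value)
    {
        var snapshot = new DataValueSnapshot(value, Quality.Good, DateTime.UtcNow);
        _cache[path] = snapshot;

        if (!_observers.TryGetValue(path, out var list)) return;
        Action<string, DataValueSnapshot>[] copy;
        lock (list) copy = list.ToArray(); // notify outside the lock
        foreach (var observer in copy) observer(path, snapshot);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _dispose;
        public Unsubscriber(Action dispose) => _dispose = dispose;
        public void Dispose() => _dispose();
    }
}
```

Updating the cache before notifying observers means a predicate that calls `ReadTag` from inside its callback sees the new value.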
**Retire:** `ScriptedAlarmActor`, `RoslynScriptedAlarmEvaluator`, the F9b DI
registration in `Program.cs`.
```
deploy     → EquipmentScriptedAlarmPlan(source+deps) → ScriptedAlarmHostActor.LoadAsync
tag change → DependencyMux → adapter → engine predicate eval → Part9StateMachine
engine.OnEvent ─┬→ WriteAlarmState        (active/ack node)
                ├→ HistorianAdapterActor  (if Historize)
                ├→ alerts topic           (Alerts page)
                └→ per-alarm logger → [Layer 0] → Script-log page
```
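The `engine.OnEvent` fan-out in the flow above could be expressed as a small bridge. Every type here is a hypothetical, trimmed-down shape: the real event carries a full `AlarmConditionState`, and the real outputs are actor messages rather than delegates.

```csharp
using System;

// Hypothetical, reduced shapes for illustration only.
public sealed record AlarmCondition(bool Active, bool Acknowledged);
public sealed record ScriptedAlarmEvent(string AlarmId, AlarmCondition Condition, bool HistorizeToAveva);

public sealed class AlarmOutputBridge
{
    private readonly Action<string, bool, bool> _writeAlarmState; // OpcUaPublishActor stand-in
    private readonly Action<ScriptedAlarmEvent> _historize;       // HistorianAdapterActor stand-in
    private readonly Action<ScriptedAlarmEvent> _publishAlert;    // alerts topic stand-in

    public AlarmOutputBridge(
        Action<string, bool, bool> writeAlarmState,
        Action<ScriptedAlarmEvent> historize,
        Action<ScriptedAlarmEvent> publishAlert)
    {
        _writeAlarmState = writeAlarmState;
        _historize = historize;
        _publishAlert = publishAlert;
    }

    // Handler the host would attach to the engine's event callback.
    public void OnEvent(ScriptedAlarmEvent e)
    {
        // Condition -> (active, acknowledged) for the placeholder node.
        _writeAlarmState(e.AlarmId, e.Condition.Active, e.Condition.Acknowledged);

        // The historian is opt-in per alarm.
        if (e.HistorizeToAveva) _historize(e);

        // The Alerts page always sees the transition.
        _publishAlert(e);
    }
}
```

Keeping the outputs behind delegates in this sketch mirrors how the host-actor TestKit test can assert each output independently.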
**Tests:** EF store round-trip (in-memory EF); upstream adapter push→observer;
composition enrichment + `Phase7Composer`↔`DeploymentArtifact` parity (same discipline
as `{{equip}}`); host actor TestKit (apply → tag change → asserts WriteAlarmState /
historian / alerts emitted). Engine internals already tested. Live-verify: author an
alarm, drive its tag, watch the node flip + historian queue + predicate logs.

---
## Layer 2 — F14b real Part 9 + client ack
**Outcome:** alarms become real Part 9 conditions — clients see them in event
subscriptions and can Acknowledge/Shelve/Confirm; the placeholder variable is retired.
**SDK-heavy epic (issue #85).** All SDK address-space work funnels through
`OpcUaPublishActor` on the pinned dispatcher. **SDK specifics (creating
`AlarmConditionState`, the event model, method-handler wiring) are to be confirmed via the
DeepWiki MCP (`OPCFoundation/UA-.NETStandard`) during planning**, not assumed here.
**Components:**
1. **Real condition nodes** — materialise a proper `AlarmConditionState` (or the
subtype per `AlarmType`: Limit/OffNormal/Discrete) under the equipment node with the
standard sub-properties (ActiveState/AckedState/ConfirmedState/EnabledState/
ShelvingState/Severity/Retain/Comment). Replaces `WriteAlarmState`'s flat variable.
2. **State → condition** — the Layer 1 bridge now carries the full `AlarmConditionState`
and sets it on the real node.
3. **Event firing** — `condition.ReportEvent(...)` on each transition so subscribers get
the alarm (AlarmsAndConditions).
4. **Inbound method dispatch + ack plumbing (D6)** — wire
`Acknowledge`/`Confirm`/`AddComment`/`OneShotShelve`/`TimedShelve`/`Unshelve` →
`engine.<Op>Async(conditionId, principal, comment, ct)` with the **authenticated
session principal**. Both the OPC UA client path and `ScriptedAlarms.razor` route to
the same engine methods; engine transition → persist → emit → node update + event.
5. **Permission gating** — method calls gate at the `AlarmAck` tier (LDAP-group → OPC-UA
permission map); Confirm/Shelve/AddComment equivalently.
```
OPC UA client ─┐                          ┌→ engine updates AlarmConditionState
AdminUI page ──┴→ engine.AcknowledgeAsync ┤ → OpcUaPublishActor → real condition node
                (principal, AlarmAck gate)└→ condition.ReportEvent → client event stream
```
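The shared ack path and its permission gate can be sketched independently of the OPC UA SDK. `IAlarmEngine` below is a hypothetical subset that follows the `engine.<Op>Async(conditionId, principal, comment, ct)` shape from item 4, and the `hasAlarmAck` delegate stands in for the LDAP-group permission map; neither is the confirmed API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum AlarmOp { Acknowledge, Confirm }

// Hypothetical subset of the engine surface.
public interface IAlarmEngine
{
    Task AcknowledgeAsync(string conditionId, string principal, string? comment, CancellationToken ct);
    Task ConfirmAsync(string conditionId, string principal, string? comment, CancellationToken ct);
}

public sealed class AlarmMethodRouter
{
    private readonly IAlarmEngine _engine;
    private readonly Func<string, bool> _hasAlarmAck; // permission-map stand-in

    public AlarmMethodRouter(IAlarmEngine engine, Func<string, bool> hasAlarmAck)
    {
        _engine = engine;
        _hasAlarmAck = hasAlarmAck;
    }

    // Both the OPC UA method handler and the AdminUI page would call this.
    // Returns false when the caller lacks the AlarmAck tier; the engine is not touched.
    public async Task<bool> DispatchAsync(
        AlarmOp op, string conditionId, string principal, string? comment,
        CancellationToken ct = default)
    {
        if (!_hasAlarmAck(principal)) return false;

        switch (op)
        {
            case AlarmOp.Acknowledge:
                await _engine.AcknowledgeAsync(conditionId, principal, comment, ct);
                return true;
            case AlarmOp.Confirm:
                await _engine.ConfirmAsync(conditionId, principal, comment, ct);
                return true;
            default:
                throw new ArgumentOutOfRangeException(nameof(op), op, "Unsupported alarm operation");
        }
    }
}
```

The shelve and comment operations would add further cases in the same pattern. A fake `IAlarmEngine` is enough to unit-test both the routing and the gate.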
**Tests:** engine-state→condition-state mapping; method→engine routing (fake engine) +
permission gating; SDK node/event behaviour proven by **Client.CLI** alarm subscribe +
ack round-trip in docker-dev. Highest-risk layer; most live-verification dependent.

---
## Cross-cutting
**Hard rules (carry into the plan):**
- Stage by explicit path — never `git add .`. Never stage `sql_login.txt` or
`src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`. Never echo the gateway API key into a new
tracked file. No force-push, no `--no-verify`.
- **No Configuration entity / EF migration change** (the `ScriptedAlarmState` table
already exists).
- Agent must **not** sign in to the AdminUI for live verification — the user signs in.
- Razor/JS proven only by live docker-dev `/run`; everything else unit-tested
(xUnit + Shouldly, in-memory EF, Akka TestKit). No bUnit.
- Build on a feature branch off `master`; planning docs (this design + the plan +
`.tasks.json`) committed to `master` per established pattern.
**Touched code (indicative — plan nails exact files):**
- Layer 0: `Core.Scripting/` (new sink + publisher iface + factory binding),
`Host/Program.cs` (root logger wiring + retire static loggers), `Host/Engines/Roslyn*Evaluator.cs`,
`Runtime` (DPS publisher).
- Layer 1: `Runtime/ScriptedAlarms/` (host actor, upstream adapter, EF store),
`OpcUaServer/Phase7Composer.cs` + `Runtime/Drivers/DeploymentArtifact.cs` (enriched plan),
`Runtime/Drivers/DriverHostActor.cs` (spawn + apply), retire `ScriptedAlarmActor` +
`RoslynScriptedAlarmEvaluator`.
- Layer 2: `OpcUaServer/OtOpcUaNodeManager.cs` (real condition nodes + methods),
`Runtime/OpcUa/OpcUaPublishActor.cs` (richer alarm message), security gate,
`AdminUI/Components/Pages/ScriptedAlarms.razor` (ack/shelve control).
**Out of scope:** virtual-tag historization production sink (Gap 5 / B.6 — separate);
Phase-7 Playwright E2E (F.7).
**Sequencing:** Layer 0 is the low-risk foundation and independently shippable. Layer 1
is the self-contained "alarms work" payoff. Layer 2 is the SDK epic — largest, highest
risk, most live-verification dependent.