Compare commits

...

4 Commits

Author SHA1 Message Date
Joseph Doherty
f2c1cc84e9 Phase 7 plan doc — scripting runtime + virtual tags + scripted alarms + historian alarm sink. Draft output from the 2026-04-20 interactive planning session. Phase 7 is the last phase before v2 release readiness; adds two additive runtime capabilities on top of the existing driver + Equipment address-space foundation: (1) virtual (calculated) tags — OPC UA variables whose values are computed by user-authored C# scripts against other tags, evaluated on change and/or timer, living in the existing Equipment tree alongside driver tags, behaving identically to clients; (2) Part 9 scripted alarms — full state machine (EnabledState/ActiveState/AckedState/ConfirmedState/ShelvingState) with persistent operator-supplied state across restarts, complementing (not replacing) the existing Galaxy-native and AB CIP ALMD alarm sources. A third tie-in capability — Aveva Historian as alarm system of record — routes every qualifying alarm transition from any IAlarmSource (scripted + Galaxy + ALMD) through a local SQLite store-and-forward queue to Galaxy.Host, which uses its already-loaded aahClientManaged DLLs to write to the Historian alarm schema; per-alarm HistorizeToAveva toggle gates which sources flow (default off for Galaxy-native to avoid duplicating the direct Galaxy historian path, default on for scripted).
Locks in 22 design decisions from the planning conversation: C# via Roslyn scripting; virtual tags in the Equipment tree (not a separate /Virtual/ namespace); change-driven + timer-driven triggers operator-configurable per tag; Shape A one-script-per-tag-or-alarm (no predicate/action split); full OPC UA Part 9 alarm fidelity; read-only sandbox (scripts read any tag, write only to virtual tags, no File/HttpClient/Process/reflection); AST-inferred dependencies via CSharpSyntaxWalker (non-literal tag paths rejected at publish); config DB storage with generation-sealed cache; ctx.GetTag returns a full DataValue {Value, StatusCode, Timestamp}; per-tag Historize checkbox; per-tag error isolation (throwing script sets tag quality BadInternalError, engine unaffected); dedicated scripts-*.log Serilog sink bound to ctx.Logger; alarm message as template with {TagPath} substitution resolved at event emission; ActiveState recomputed from tags on startup while EnabledState/AckedState/ConfirmedState/ShelvingState + audit persist to config DB; historian sink scope = all IAlarmSource impls with per-alarm toggle; SQLite store-and-forward on the node so operators are never blocked by Historian downtime; IPC to Galaxy.Host for ingestion reusing the already-loaded aahClientManaged DLLs; Monaco editor for Admin code editing; serial cascade evaluation for v1 (parallel as follow-up); shelving UX via OPC UA method calls only with no custom Admin controls (operator drives state transitions from plant HMIs or Client.CLI); 30-day dead-letter retention with manual retry button; test harness accepts only declared-input paths so the harness enforces dependency declaration.

Eight streams totaling ~10-12 weeks, scope-comparable to Phase 6: A - Core.Scripting (Roslyn engine + sandbox + AST inference + logger); B - virtual tag engine (dependency graph + change/timer schedulers + historize); C - scripted alarm engine (Part 9 state machine + template messages + startup recovery + OPC UA method binding); D - historian alarm sink (SQLite store-and-forward + Galaxy.Host IPC contract extension); E - config DB schema (four new tables under sp_PublishGeneration); F - Admin UI scripting tab (Monaco + test harness + dependency preview + script-log viewer + historian diagnostics); G - address-space integration (extend EquipmentNodeWalker for virtual source kind + extend DriverNodeManager dispatch); H - exit gate.

Compliance-check surface covers sandbox escape (typeof/Assembly.Load/File/HttpClient attempts must fail at compile), dependency inference (literal-only paths), change cascade (topological ordering), cycle rejection at publish, startup recovery (ack/confirm/shelve survive restart but ActiveState recomputed), ack audit trail persistence, historian queue durability (Galaxy.Host offline → online drains in-order), per-alarm historian toggle gating, script timeout isolation, log sink isolation, ACL binding (virtual tags inherit Equipment scope grants).

Follow-up artifacts tracked as tasks #231-#238 (stream placeholders). Supporting doc updates (plan.md §6 Migration Strategy, config-db-schema.md §§ for the four new tables, driver-specs.md §Alarm semantics clarification, new ADR-002 for driver-vs-virtual dispatch) will land alongside the streams that touch them, not in this doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:05:12 -04:00
8384e58655 Merge pull request 'Modbus exception-injection profile — wire-level coverage for codes 0x01/0x03/0x04/0x05/0x06/0x0A/0x0B' (#174) from modbus-exception-injection-profile into v2 2026-04-20 15:14:00 -04:00
Joseph Doherty
96940aeb24 Modbus exception-injection profile — closes the end-to-end test gap for exception codes 0x01/0x03/0x04/0x05/0x06/0x0A/0x0B. pymodbus simulator naturally emits only 0x02 (Illegal Data Address on reads outside configured ranges) + 0x03 (Illegal Data Value on over-length); the driver's MapModbusExceptionToStatus table translates eight codes, but only 0x02 had integration-level coverage (via DL205's unmapped-register test). Unit tests lock the translation function in isolation but an integration test was missing for everything else. This PR lands wire-level coverage for the remaining seven codes without depending on device-specific quirks to naturally produce them.
New exception_injector.py — standalone pure-Python-stdlib Modbus/TCP server shipped alongside the pymodbus image. Speaks the wire protocol directly (MBAP header parse + FC 01/02/03/04/05/06/15/16 dispatch + store-backed happy-path reads/writes + spec-enforced length caps) and looks up each (fc, starting-address) against a rules list loaded from JSON; a matching rule makes the server respond [fc|0x80, exception_code] instead of the normal response. Zero runtime dependencies outside the stdlib — the Dockerfile just COPY's the script into /fixtures/ alongside the pymodbus profile JSONs, no new pip install needed. ~200 lines. New exception_injection.json profile carries rules for every exception code on FC03 (addresses 1000-1007, one per code), FC06 (2000-2001 for CPU-PROGRAM-mode and busy), and FC16 (3000 for server failure). New exception_injection compose profile binds :5020 like every other service + runs python /fixtures/exception_injector.py --config /fixtures/exception_injection.json.

New ExceptionInjectionTests.cs in Modbus.IntegrationTests — 11 tests. Eight FC03-read theories exercise every exception code 0x01/0x02/0x03/0x04/0x05/0x06/0x0A/0x0B asserting the driver's expected OPC UA StatusCode mapping (BadNotSupported/BadOutOfRange/BadOutOfRange/BadDeviceFailure/BadDeviceFailure/BadDeviceFailure/BadCommunicationError/BadCommunicationError). Two FC06-write theories cover the write path for 0x04 (Server Failure, CPU in PROGRAM mode) + 0x06 (Server Busy). One sanity-check read at address 5 confirms the injector isn't globally broken + non-injected reads round-trip cleanly with Value=5/StatusCode=Good. All tests follow the MODBUS_SIM_PROFILE=exception_injection skip guard so they no-op on a fresh clone without Docker running.

Docker/README.md gains an §Exception injection section explaining what pymodbus can and cannot emit, what the injector does, where the rules live, and how to append new ones. docs/drivers/Modbus-Test-Fixture.md follow-up item #2 (extend pymodbus profiles to inject exceptions) gets a shipped strikethrough with the new coverage inventory; the unit-level section adds ExceptionInjectionTests next to DL205ExceptionCodeTests so the split-of-responsibilities is explicit (DL205 test = natural out-of-range via dl205 profile, ExceptionInjectionTests = every other code via the injector).

Test baselines: Modbus unit 182/182 green (unchanged); Modbus integration with exception_injection profile live 11/11 new tests green. Existing DL205/S7/Mitsubishi integration tests unaffected since they skip on MODBUS_SIM_PROFILE mismatch.

Found + fixed during validation: a stale native pymodbus simulator from April 18 was still listening on port 5020 on IPv6 localhost (Windows was load-balancing between it + the Docker IPv4 forward, making injected exceptions intermittently come back as pymodbus's default 0x02). Killed the leftover. Documented the debugging path in the commit as a note for anyone who hits the same "my tests see exception 0x02 but the injector log has no request" symptom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:11:32 -04:00
340f580be0 Merge pull request 'FOCAS Tier-C PR E — ops glue: ProcessHostLauncher + post-mortem MMF + NSSM scripts' (#173) from focas-tier-c-pr-e-ops-glue into v2 2026-04-20 14:26:35 -04:00
8 changed files with 680 additions and 5 deletions

View File

@@ -34,7 +34,8 @@ shaped (neither is a Modbus-side concept).
- `DL205SmokeTests` — FC16 write → FC03 read round-trip on holding register
- `DL205CoilMappingTests` — Y-output / C-relay / X-input address mapping
(octal → Modbus offset)
- `DL205ExceptionCodeTests` — Modbus exception → OPC UA StatusCode mapping
- `DL205ExceptionCodeTests` — Modbus exception 0x02 → OPC UA `BadOutOfRange` against the dl205 profile (natural out-of-range path)
- `ExceptionInjectionTests` — every other exception code in the mapping table (0x01 / 0x03 / 0x04 / 0x05 / 0x06 / 0x0A / 0x0B) against the `exception_injection` profile on both read + write paths
- `DL205FloatCdabQuirkTests` — CDAB word-swap float encoding
- `DL205StringQuirkTests` — packed-string V-memory layout
- `DL205VMemoryQuirkTests` — V-memory octal addressing
@@ -103,8 +104,13 @@ Not a Modbus concept. Driver doesn't implement `IAlarmSource` or
1. Add `MODBUS_SIM_ENDPOINT` override documentation to
`docs/v2/test-data-sources.md` so operators can point the suite at a lab rig.
2. Extend `pymodbus` profiles to inject exception responses — a JSON flag per
register saying "next read returns exception 0x04."
2. ~~Extend `pymodbus` profiles to inject exception responses~~**shipped**
via the `exception_injection` compose profile + standalone
`exception_injector.py` server. Rules in
`Docker/profiles/exception_injection.json` map `(fc, address)` to an
exception code; `ExceptionInjectionTests` exercises every code in
`MapModbusExceptionToStatus` (0x01 / 0x02 / 0x03 / 0x04 / 0x05 / 0x06 /
0x0A / 0x0B) end-to-end on both read (FC03) and write (FC06) paths.
3. Add an FX5U profile once a lab rig is available; the scaffolding is in place.
## Key fixture / config files

View File

@@ -0,0 +1,190 @@
# Phase 7 — Scripting Runtime, Virtual Tags, and Scripted Alarms
> **Status**: DRAFT — planning output from the 2026-04-20 interactive planning session. Pending review before work begins. Task #230 tracks the draft; #231#238 are the stream placeholders.
>
> **Branch**: `v2/phase-7-scripting-and-alarming`
> **Estimated duration**: 1012 weeks (scope-comparable to Phase 6; largest single phase outside Phase 2 Galaxy split)
> **Predecessor**: Phase 6.4 (Admin UI completion) — reuses the tab-plugin pattern + draft/publish flow
> **Successor**: v2 release-readiness capstone
## Phase Objective
Add two **additive** runtime capabilities on top of the existing driver + Equipment address-space foundation:
1. **Virtual (calculated) tags** — OPC UA variables whose values are computed by user-authored C# scripts against other tags (driver or virtual), evaluated on change and/or timer. They live in the existing Equipment/UNS tree alongside driver tags and behave identically to clients (browse, subscribe, historize).
2. **Scripted alarms** — OPC UA Part 9 alarms whose condition is a user-authored C# predicate. Full state machine (EnabledState / ActiveState / AckedState / ConfirmedState / ShelvingState) with persistent operator-supplied state across restarts. Complement the existing Galaxy-native and AB CIP ALMD alarm sources — they do not replace them.
Tie-in capability — **historian alarm sink**:
3. **Aveva Historian as alarm system of record** — every qualifying alarm transition (activation, ack, confirm, clear, shelve, disable, comment) from **any `IAlarmSource`** (scripted + Galaxy + ALMD) routes through a new local SQLite store-and-forward queue to Galaxy.Host, which uses its already-loaded `aahClientManaged` DLLs to write to the Historian's alarm schema. Per-alarm `HistorizeToAveva` toggle gates which sources flow (default off for Galaxy-native since Galaxy itself already historizes them). Plant operators query one uniform historical alarm timeline.
**Why it's additive, not a rewrite**: every `IAlarmSource` implementation shipped in Phase 6.x stays unchanged; scripted alarms register as an additional source in the existing fan-out. The Equipment node walker built in ADR-001 gains a "virtual" source kind alongside "driver" without removing anything. Operator-facing semantics for existing driver tags and alarms are unchanged.
## Design Decisions (locked in the 2026-04-20 planning session)
| # | Decision | Rationale |
|---|---------|-----------|
| 1 | Script language = **C# via Roslyn scripting** | Developer audience, strong typing, AST walkable for dependency inference, existing .NET 10 runtime in main server. |
| 2 | Virtual tags live in the **Equipment tree** alongside driver tags (not a separate `/Virtual/...` namespace) | Operator mental model stays unified; calculated `LineRate` shows up under the Line1 folder next to the driver-sourced `SpeedSetpoint` it's derived from. |
| 3 | Evaluation trigger = **change-driven + timer-driven**; operator chooses per-tag | Change-driven is cheap at steady state; timer is the escape hatch for polling derivations that don't have a discrete "input changed" signal. |
| 4 | Script shape = **Shape A — one script per virtual tag/alarm**; `return` produces the value (or `bool` for alarm condition) | Minimal surface; no predicate/action split. Alarm side-effects (severity, message) configured out-of-band, not in the script. |
| 5 | Alarm fidelity = **full OPC UA Part 9** | Uniform with Galaxy + ALMD on the wire; client-side tooling (HMIs, historians, event pipelines) gets one shape. |
| 6 | Sandbox = **read-only context**; scripts can only read any tag + write to virtual tags | Strict Roslyn `ScriptOptions` allow-list. No HttpClient / File / Process / reflection. |
| 7 | Dependency declaration = **AST inference**; operator doesn't maintain a separate dependency list | `CSharpSyntaxWalker` extracts `ctx.GetTag("path")` string-literal calls at compile time; dynamic paths rejected at publish. |
| 8 | Config storage = **config DB with generation-sealed cache** (same as driver instances) | Virtual tags + alarms publish atomically in the same generation as the driver instance config they may depend on. |
| 9 | Script return value shape (`ctx.GetTag`) = **`DataValue { Value, StatusCode, Timestamp }`** | Scripts branch on quality naturally without separate `ctx.GetQuality(...)` calls. |
| 10 | Historize virtual tags = **per-tag checkbox** | Writes flow through the same history-write path as driver tags. Consumed by existing `IHistoryProvider`. |
| 11 | Per-tag error isolation — a throwing script sets that tag's quality to `BadInternalError`; engine keeps running for every other tag | Mirrors Phase 6.1 Stream B's per-surface error handling. |
| 12 | Dedicated Serilog sink = `scripts-*.log` rolling file; structured-property `ScriptName` for filtering | Keeps noisy script logs out of the main `opcua-*.log`. `ctx.Logger.Info/Warning/Error/Debug` bound in the script context. |
| 13 | Alarm message = **template with substitution** (`"Reactor temp {Reactor/Temp} exceeded {Limit}"`) | Middle ground between static and separate message-script; engine resolves `{path}` tokens at event emission. |
| 14 | Alarm state persistence — `ActiveState` recomputed from tag values on startup; `EnabledState / AckedState / ConfirmedState / ShelvingState` + audit trail persist to config DB | Operators don't re-ack after restart; ack history survives for compliance (GxP / 21 CFR Part 11). |
| 15 | Historian sink scope = **all `IAlarmSource` implementations**, not just scripted; per-alarm `HistorizeToAveva` toggle | Plant gets one consolidated alarm timeline; Galaxy-native alarms default off to avoid duplication. |
| 16 | Historian failure mode = **SQLite store-and-forward queue on the node**; config DB is source of truth, Historian is best-effort projection | Operators never blocked by Historian downtime; failed writes queue + retry when Historian recovers. |
| 17 | Historian ingestion path = **IPC to Galaxy.Host**, which calls the already-loaded `aahClientManaged` DLLs | Reuses existing bitness / licensing / Tier-C isolation. No new 32-bit DLL load in the main server. |
| 18 | Admin UI code editor = **Monaco** via the Admin project's asset pipeline | Industry default for C# editing in a browser; ~3 MB bundle acceptable given Admin is operator-facing only, not public. Revisitable if bundle size becomes a deployment constraint. |
| 19 | Cascade evaluation order = **serial** for v1; parallel promoted to a Phase 7 follow-up | Deterministic, easier to reason about, simplifies cycle + ordering bugs in the rollout. Parallel becomes a tuning knob when real 1000+ virtual-tag deployments measure contention. |
| 20 | Shelving UX = **OPC UA method calls only** (`OneShotShelve` / `TimedShelve` / `Unshelve` on the `AlarmConditionType` node); **no Admin UI shelve controls** | Plant HMIs + OPC UA clients already speak these methods by spec; reinventing the UI adds surface without operator value. Admin still renders current shelve state + audit trail read-only on the alarm detail page. |
| 21 | Dead-lettered historian events retained for **30 days** in the SQLite queue; Admin `/alarms/historian` exposes a "Retry dead-lettered" button | Long enough for a Historian outage or licensing glitch to be resolved + operator to investigate; short enough that the SQLite file doesn't grow unbounded. Configurable via `AlarmHistorian:DeadLetterRetentionDays` for deployments with stricter compliance windows. |
| 22 | Test harness synthetic inputs = **declared inputs only** (from the AST walker's extracted dependency set) | Enforces the dependency declaration — if a path can't be supplied to the harness, the AST walker didn't see it and the script can't reference it at runtime. Catches dependency-inference drift at test time, not publish time. |
## Scope — What Changes
| Concern | Change |
|---------|--------|
| **New project `OtOpcUa.Core.Scripting`** (.NET 10) | Roslyn-based script engine. Compiles user C# scripts with a sandboxed `ScriptOptions` allow-list (numeric / string / datetime / `ScriptContext` API only — no reflection / File / Process / HttpClient). `DependencyExtractor` uses `CSharpSyntaxWalker` to enumerate `ctx.GetTag("...")` literal-string calls; rejects non-literal paths at publish time. Per-script compile cache keyed by source hash. Per-evaluation timeout. Exception in script → tag goes `BadInternalError`; engine unaffected for other tags. `ctx.Logger` is a Serilog `ILogger` bound to the `scripts-*.log` rolling sink with structured property `ScriptName`. |
| **New project `OtOpcUa.Core.VirtualTags`** (.NET 10) | `VirtualTagEngine` consumes the `DependencyExtractor` output, builds a topological dependency graph spanning driver tags + other virtual tags (cycle detection at publish time), schedules re-evaluation on change + on timer, propagates results through an `IVirtualTagSource` that implements `IReadable` + `ISubscribable` so `DriverNodeManager` routes reads / subscriptions uniformly. Per-tag `Historize` flag routes to the same history-write path driver tags use. |
| **New project `OtOpcUa.Core.ScriptedAlarms`** (.NET 10) | `ScriptedAlarmEngine` materializes each configured alarm as an OPC UA `AlarmConditionType` (or `LimitAlarmType` / `OffNormalAlarmType`). On startup, re-evaluates every predicate against current tag values to rebuild `ActiveState` — no persistence needed for the active flag. Persistent state: `EnabledState`, `AckedState`, `ConfirmedState`, `ShelvingState`, branch stack, ack audit (user/time/comment). Template message substitution resolves `{TagPath}` tokens at event emission. Ack / Confirm / Shelve method nodes bound to the engine; transitions audit-logged via the existing `IAuditLogger` (Phase 6.2). Registers as an additional `IAlarmSource` — no change to the existing fan-out. |
| **New project `OtOpcUa.Core.AlarmHistorian`** (.NET 10) | `IAlarmHistorianSink` abstraction + `SqliteStoreAndForwardSink` default implementation. Every qualifying `IAlarmSource` emission (per-alarm `HistorizeToAveva` toggle) persists to a local SQLite queue (`%ProgramData%\OtOpcUa\alarm-historian-queue.db`). Background drain worker reads unsent rows + forwards over IPC to Galaxy.Host. Failed writes keep the row pending with exponential backoff. Queue capacity bounded (default 1M events, oldest-dropped with a structured warning log). |
| **`Driver.Galaxy.Shared`** — new IPC contracts | `HistorianAlarmEventRequest` (activation / ack / confirm / clear / shelve / disable / comment payloads matching the Aveva Historian alarm schema) + `HistorianAlarmEventResponse` (ack / retry-please / permanent-fail). `HistorianConnectivityStatusNotification` so the main server can surface "Historian disconnected" on the Admin `/hosts` page. |
| **`Driver.Galaxy.Host`** — new frame handler for alarm writes | Reuses the already-loaded `aahClientManaged.dll` + `aahClientCommon.dll`. Maps the IPC request DTOs to the historian SDK's alarm-event API (exact method TBD during Stream D.2 — needs a live-historian smoke to confirm the right SDK entry point). Errors map to structured response codes so the main server's backoff logic can distinguish "transient" from "permanent". |
| **Config DB schema** — new tables | `VirtualTag (Id, EquipmentPath, Name, DataType, IntervalMs?, ChangeTriggerEnabled, Historize, ScriptId)`; `Script (Id, SourceCode, CompiledHash, Language='CSharp')`; `ScriptedAlarm (Id, EquipmentPath, Name, AlarmType, Severity, MessageTemplate, HistorizeToAveva, PredicateScriptId)`; `ScriptedAlarmState (AlarmId, EnabledState, AckedState, ConfirmedState, ShelvingState, ShelvingExpiresUtc?, LastAckUser, LastAckComment, LastAckUtc, BranchStack_JSON)`. Every write goes through `sp_PublishGeneration` + `IAuditLogger`. |
| **Address-space build** — Phase 6 `EquipmentNodeWalker` extension | Emits virtual-tag nodes alongside driver-sourced nodes under the same Equipment folder. `NodeScopeResolver` gains a `Virtual` source kind alongside `Driver`. `DriverNodeManager` dispatch routes reads / writes / subscriptions to the `VirtualTagEngine` when the source is virtual. |
| **Admin UI** — new tabs | `/virtual-tags` and `/scripted-alarms` tabs under the existing draft/publish flow. Monaco-based C# code editor (syntax highlighting, IntelliSense against a hand-written type stub for `ScriptContext`). Dependency preview panel shows the inferred input list from the AST walker. Test-harness lets operator supply synthetic `DataValue` inputs + see script output + logger emissions without publishing. Per-alarm controls: `AlarmType`, `Severity`, `MessageTemplate`, `HistorizeToAveva`. New `/alarms/historian` diagnostics view: queue depth, drain rate, last-successful-write, per-alarm "last routed to historian" timestamp. |
| **`DriverTypeRegistry`** — no change | Scripting is not a driver — it doesn't register as a `DriverType`. The engine hangs off the same `SealedBootstrap` as drivers but through a different composition root. |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| Existing `IAlarmSource` implementations (Galaxy, AB CIP ALMD) | Scripted alarms register as an *additional* source; existing sources pass through unchanged. Default `HistorizeToAveva=false` for Galaxy alarms avoids duplicating records the Galaxy historian wiring already captures. |
| Driver capability surface (`IReadable` / `IWritable` / `ISubscribable` / etc.) | Virtual tags implement the same interfaces — drivers and virtual tags are interchangeable from the node manager's perspective. No new capability. |
| Config DB publication flow (`sp_PublishGeneration` + sealed cache) | Virtual tag + alarm tables plug in as additional rows. Atomic publish semantics unchanged. |
| Authorization trie (Phase 6.2) | Virtual-tag nodes inherit the Equipment scope's grants — same treatment as the Phase 6.4 Identification sub-folder. No new scope level. |
| Tier-C isolation topology | Scripting engine runs in the main .NET 10 server process. Roslyn scripts are already sandboxed via `ScriptOptions`; no need for process isolation because they have no unmanaged reach. Galaxy.Host's existing Tier-C boundary already owns the historian SDK writes. |
| Galaxy alarm ingestion path into the historian | Galaxy writes alarms directly via `aahClientManaged` today; Phase 7 Stream D gives it a *second* path (via the new sink) when a Galaxy alarm has `HistorizeToAveva=true`, but the direct path stays for the default case. |
| OPC UA wire protocol / AddressSpace schema | Clients see new nodes under existing folders + new alarm conditions. No new namespaces, no new ObjectTypes beyond what Part 9 already defines. |
## Entry Gate Checklist
- [ ] All Phase 6.x exit gates cleared (#133, #142, #151, #158)
- [ ] Equipment node walker wired into `DriverNodeManager` (task #212 — done)
- [ ] `IAuditLogger` surface live (Phase 6.2 Stream A)
- [ ] `sp_PublishGeneration` + sealed-cache flow verified on the existing driver-config tables
- [ ] Dev Aveva Historian reachable from the dev box (for Stream D.2 smoke)
- [ ] `v2` branch clean + baseline tests green
- [ ] Blazor editor component library picked (Monaco confirmed vs alternatives — see decision to log)
- [ ] Review this plan — decisions #1#17 signed off, no open questions
## Task Breakdown
### Stream A — `Core.Scripting` (Roslyn engine + sandbox + AST inference + logger) — **2 weeks**
1. **A.1** Project scaffold + NuGet `Microsoft.CodeAnalysis.CSharp.Scripting`. `ScriptOptions` allow-list (`typeof(object).Assembly`, `typeof(Enumerable).Assembly`, the Core.Scripting assembly itself — nothing else). Hand-written `ScriptContext` base class with `GetTag(string)` / `SetVirtualTag(string, object)` / `Logger` / `Now` / `Deadband(double, double, double)` helpers.
2. **A.2** `DependencyExtractor : CSharpSyntaxWalker`. Visits every `InvocationExpressionSyntax` targeting `ctx.GetTag` / `ctx.SetVirtualTag`; accepts only a `LiteralExpressionSyntax` argument. Non-literal arguments (concat, variable, method call) → publish-time rejection with an actionable error pointing the operator at the exact span. Outputs `IReadOnlySet<string> Inputs` + `IReadOnlySet<string> Outputs`.
3. **A.3** Compile cache. `(source_hash) → compiled Script<T>`. Recompile only when source changes. Warm on `SealedBootstrap`.
4. **A.4** Per-evaluation timeout wrapper (default 250ms; configurable per tag). Timeout = tag quality `BadInternalError` + structured warning log. Keeps a single runaway script from starving the engine.
5. **A.5** Serilog sink wiring. New `scripts-*.log` rolling file enricher; `ctx.Logger` returns an `ILogger` with `ForContext("ScriptName", ...)`. Main `opcua-*.log` gets a companion entry at WARN level if a script logs ERROR, so the operator sees it in the primary log.
6. **A.6** Tests: AST extraction unit tests (30+ cases covering literal / concat / variable / null / method-returned paths); sandbox escape tests (attempt `typeof`, `Assembly.Load`, `File.OpenRead` — all must fail at compile); exception isolation (throwing script doesn't kill the engine); timeout behavior; logger structured-property binding.
### Stream B — Virtual tag engine (dependency graph + change/timer schedulers + historize) — **1.5 weeks**
1. **B.1** `VirtualTagEngine`. Ingests the set of compiled scripts + their inputs/outputs; builds a directed dependency graph (driver tag ID → virtual tag ID → virtual tag ID). Cycle detection at publish-time via Tarjan; publish rejects with a clear error message listing the cycle.
2. **B.2** `ChangeTriggerDispatcher`. Subscribes to every referenced driver tag via the existing `ISubscribable` fan-out. On a `DataValueSnapshot` delta (value / status / timestamp — any of the three), enqueues affected virtual tags for re-evaluation in topological order.
3. **B.3** `TimerTriggerDispatcher`. Per-tag `IntervalMs` scheduled via a shared timer-wheel. Independent of change triggers — a tag can have both, either, or neither.
4. **B.4** `EvaluationPipeline`. Serial evaluation per cascade (parallel promoted to a follow-up — avoids cross-tag ordering bugs on first rollout). Exception handling per A.4; propagates results via `IVirtualTagSource`.
5. **B.5** `IVirtualTagSource` implementation. Implements `IReadable` + `ISubscribable`. Reads return the most recent evaluated value; subscriptions receive `OnDataChange` events on each re-evaluation.
6. **B.6** History routing. Per-tag `Historize` flag emits the value + timestamp to the existing history-write path used by drivers.
7. **B.7** Tests: dependency graph (happy + cycle); change cascade through two levels of virtual tags; timer-only tag ignores input changes; change + timer both configured; error propagation; historize on/off.
### Stream C — Scripted alarm engine + Part 9 state machine + template messages — **2.5 weeks**
1. **C.1** Alarm config model + `ScriptedAlarmEngine` skeleton. Alarms materialize as `AlarmConditionType` (or subtype — `LimitAlarm`, `OffNormal`) nodes under their configured Equipment path. Severity loaded from config.
2. **C.2** `Part9StateMachine`. Tracks `EnabledState`, `ActiveState`, `AckedState`, `ConfirmedState`, `ShelvingState` per condition ID. Shelving has `OneShotShelving` + `TimedShelving` variants + an `UnshelveTime` timer.
3. **C.3** Predicate evaluation. On any input change (same trigger mechanism as Stream B), run the `bool` predicate. On `false → true` transition, activate (increment branch stack if prior Ack-but-not-Confirmed state exists). On `true → false`, clear (but keep condition visible if retain flag set).
4. **C.4** Startup recovery. For every configured alarm, run the predicate against current tag values to rebuild `ActiveState` *only*. Load `EnabledState` / `AckedState` / `ConfirmedState` / `ShelvingState` + audit from the `ScriptedAlarmState` table. No re-acknowledgment required for conditions that were acked before restart.
5. **C.5** Template substitution. Engine resolves `{TagPath}` tokens in `MessageTemplate` at event emission time using current tag values. Unresolvable tokens (bad path, missing tag) emit a structured error log + substitute `{?}` so the event still fires.
6. **C.6** OPC UA method binding. `Acknowledge`, `Confirm`, `AddComment`, `OneShotShelve`, `TimedShelve`, `Unshelve` methods on each condition node route to the engine + persist via audit-logged writes to `ScriptedAlarmState`.
7. **C.7** `IAlarmSource` implementation. Emits Part 9-shaped events through the existing fan-out the `AlarmTracker` composes.
8. **C.8** Tests: every transition (all 32 state combinations the state machine can produce); startup recovery (seed table with varied ack/confirm/shelve state, restart, verify correct recovery); template substitution (literal path, nested path, bad path); shelving timer expiry; OPC UA method calls via Client.CLI.
### Stream D — Historian alarm sink (SQLite store-and-forward + Galaxy.Host IPC) — **2 weeks**
1. **D.1** `Core.AlarmHistorian` project. `IAlarmHistorianSink` interface; `SqliteStoreAndForwardSink` default implementation using Microsoft.Data.Sqlite. Schema: `Queue (RowId, AlarmId, EventType, PayloadJson, EnqueuedUtc, LastAttemptUtc?, AttemptCount, DeadLettered)`. Queue capacity bounded; oldest-dropped on overflow with structured warning.
2. **D.2** **Live-historian smoke** against the dev box's Aveva Historian. Identify the exact `aahClientManaged` alarm-write API entry point (likely `IAlarmsDatabase.WriteAlarmEvent` or equivalent — verify with a throwaway Galaxy.Host test hook). Document in a short `docs/v2/historian-alarm-api.md` artifact.
3. **D.3** `Driver.Galaxy.Shared` contract additions. `HistorianAlarmEventRequest` / `HistorianAlarmEventResponse` / `HistorianConnectivityStatusNotification`. Round-trip tests in `Driver.Galaxy.Shared.Tests`.
4. **D.4** `Driver.Galaxy.Host` handler. Translates incoming `HistorianAlarmEventRequest` to the SDK call identified in D.2. Returns structured response (Ack / RetryPlease / PermanentFail). Connectivity notifications sent proactively when the SDK's session drops.
5. **D.5** Drain worker in the main server. Polls the SQLite queue; batches up to 100 events per IPC round-trip; exponential backoff on `RetryPlease` (1s → 2s → 5s → 15s → 60s cap); `PermanentFail` dead-letters the row + structured error log.
6. **D.6** Per-alarm toggle wired through: `HistorizeToAveva` column on both `ScriptedAlarm` + a new `AlarmHistorizationPolicy` projection the Galaxy / ALMD alarm sources consult (default `false` for Galaxy, `true` for scripted, operator-adjustable per-alarm).
7. **D.7** `/alarms/historian` diagnostics view in Admin. Queue depth, drain rate, last-successful-write, last-error, per-alarm last-routed timestamp.
8. **D.8** Tests: SQLite queue round-trip; drain worker with fake IPC (success / retry / perm-fail); overflow eviction; Galaxy.Host handler against a stub historian API; end-to-end with the live historian on the dev box (non-CI — operator-invoked).
### Stream E — Config DB schema + generation-sealed cache extensions — **1 week**
1. **E.1** EF migration for new tables. Foreign keys from `VirtualTag.ScriptId` / `ScriptedAlarm.PredicateScriptId` to `Script.Id`.
2. **E.2** `sp_PublishGeneration` extension. Sealed-cache snapshot includes virtual tags + scripted alarms + their scripts. Atomic publish guarantees the address-space build sees a consistent view.
3. **E.3** CRUD services. `VirtualTagService`, `ScriptedAlarmService`, `ScriptService`. Each audit-logged; Ack / Confirm / Shelve persist through `ScriptedAlarmStateService` with full audit trail (who / when / comment / previous state).
4. **E.4** Tests: migration up / down; publish atomicity (concurrent writes to different alarm rows don't leak into an in-flight publish); audit trail on every mutation.
### Stream F — Admin UI scripting tab — **2 weeks**
1. **F.1** Monaco editor Razor component. CSS-isolated; loads Monaco via NPM + the Admin project's existing asset pipeline. C# syntax highlighting (Monaco ships it). IntelliSense via a hand-written `ScriptContext.cs` type stub delivered with the editor (not the compiled Core.Scripting DLL — keeps the browser bundle small).
2. **F.2** `/virtual-tags` tab. List view (Equipment path / Name / DataType / inputs-summary / Historize / actions). Edit pane splits: Monaco editor left, dependency preview panel right (live-updates from a debounced `/api/scripting/analyze` endpoint that runs the `DependencyExtractor`). Publish button gated by Phase 6.2 `WriteConfigure` permission.
3. **F.3** `/scripted-alarms` tab. Same editor shape + extra controls: AlarmType dropdown, Severity slider, MessageTemplate textbox with live-preview showing `{path}` token resolution against latest tag values, `HistorizeToAveva` checkbox. **Alarm detail page displays current `ShelvingState` + `LastAckUser / LastAckUtc / LastAckComment` read-only** — no shelve/unshelve / ack / confirm buttons per decision #20. Operators drive state transitions via OPC UA method calls from plant HMIs or the Client.CLI.
4. **F.4** Test harness. Modal that lets the operator supply synthetic `DataValue` inputs for the dependency set + see script output + logger emissions (rendered in a virtual terminal). Enables testing without publishing.
5. **F.5** Script log viewer. SignalR stream of the `scripts-*.log` sink filtered by the script under edit (using the structured `ScriptName` property). Tail-last-200 + "load more".
6. **F.6** `/alarms/historian` diagnostics view per Stream D.7.
7. **F.7** Playwright smoke. Author a calc tag, publish, verify it appears in the equipment tree via a probe OPC UA read. Author an alarm, verify it appears in `AlarmsAndConditions`.
### Stream G — Address-space integration — **1 week**
1. **G.1** `EquipmentNodeWalker` extension. Current walker iterates driver tags per equipment; extend to also iterate virtual tags + alarms. `NodeScopeResolver` returns `NodeSource.Virtual` for virtual nodes and `NodeSource.Driver` for existing.
2. **G.2** `DriverNodeManager` dispatch. Read / Write / Subscribe operations check the resolved source and route to `VirtualTagEngine` or the driver as appropriate. Writes to virtual tags allowed only from scripts (per decision #6) — OPC UA client writes to a virtual node return `BadUserAccessDenied`.
3. **G.3** `AlarmTracker` composition. The `ScriptedAlarmEngine` registers as an additional `IAlarmSource` — no new composition code, the existing fan-out already accepts multiple sources.
4. **G.4** Tests: mixed equipment folder (driver tag + virtual tag + driver-native alarm + scripted alarm) browsable via Client.CLI; read / subscribe round-trip for the virtual tag; scripted alarm transitions visible in the alarm event stream.
### Stream H — Exit gate — **1 week**
1. **H.1** Compliance script real-checks: schema migrations applied; new tables populated from a draft→publish cycle; sealed-generation snapshot includes virtual tags + alarms; SQLite alarm queue initialized; `scripts-*.log` sink emitting; `AlarmConditionType` nodes materialize in the address space; per-alarm `HistorizeToAveva` toggle enforced end-to-end.
2. **H.2** Full-solution `dotnet test` baseline. Target: Phase 6 baseline + ~300 new tests across Streams AG.
3. **H.3** `docs/v2/plan.md` Migration Strategy §6 update — add Phase 7.
4. **H.4** Phase-status memory update.
5. **H.5** Merge `v2/phase-7-scripting-and-alarming``v2`.
## Compliance Checks (run at exit gate)
- [ ] **Sandbox escape**: attempts to reference `System.IO.File`, `System.Net.Http.HttpClient`, `System.Diagnostics.Process`, or `typeof(X).Assembly.Load` fail at script compile with an actionable error.
- [ ] **Dependency inference**: `ctx.GetTag(myStringVar)` (non-literal path) is rejected at publish with a span-pointed error; `ctx.GetTag("Line1/Speed")` is accepted + appears in the inferred input set.
- [ ] **Change cascade**: tag A → virtual tag B → virtual tag C. When A changes, B recomputes, then C recomputes. Single change event triggers the full cascade in topological order within one evaluation pass.
- [ ] **Cycle rejection**: publish a config where virtual tag B depends on A and A depends on B. Publish fails pre-commit with a clear cycle message.
- [ ] **Startup recovery**: seed `ScriptedAlarmState` with one acked+confirmed alarm + one shelved alarm + one clean alarm, restart, verify operator does NOT see ack prompts for the first two, shelving remains in effect, clean alarm is clear.
- [ ] **Ack audit**: acknowledge an alarm; `IAuditLogger` captures user / timestamp / comment / prior state; row persists through restart.
- [ ] **Historian queue durability**: take Galaxy.Host offline, fire 10 alarm transitions, bring Galaxy.Host back; queue drains all 10 in order.
- [ ] **Per-alarm historian toggle**: Galaxy-native alarm with `HistorizeToAveva=false` does NOT enqueue; scripted alarm with `HistorizeToAveva=true` DOES enqueue.
- [ ] **Script timeout**: infinite-loop script times out at 250ms; tag quality `BadInternalError`; other tags unaffected.
- [ ] **Log isolation**: `ctx.Logger.Error("test")` lands in `scripts-*.log` with structured property `ScriptName=<name>`; main `opcua-*.log` gets a WARN companion entry.
- [ ] **ACL binding**: virtual tag under an Equipment scope inherits the Equipment's grants. User without the Equipment grant reads the virtual tag and gets `BadUserAccessDenied`.
## Decisions Resolved in Plan Review
Every open question from the initial draft was resolved in the 2026-04-20 plan review — see decisions #18#22 in the decisions table above. No pending questions block Stream A.
## References
- [`docs/v2/plan.md`](../plan.md) §6 Migration Strategy — add Phase 7 as the final additive phase before v2 release readiness.
- [`docs/v2/implementation/overview.md`](overview.md) — phase gate conventions.
- [`docs/v2/implementation/phase-6-2-authorization-runtime.md`](phase-6-2-authorization-runtime.md) — `IAuditLogger` surface reused for Ack/Confirm/Shelve + script edits.
- [`docs/v2/implementation/phase-6-4-admin-ui-completion.md`](phase-6-4-admin-ui-completion.md) — draft/publish flow, diff viewer, tab-plugin pattern reused.
- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md) — Galaxy.Host IPC shape + shared-contract conventions reused for Stream D.
- [`docs/v2/driver-specs.md`](../driver-specs.md) §Alarm semantics — Part 9 fidelity requirements.
- [`docs/v2/driver-stability.md`](../driver-stability.md) — per-surface error handling, crash-loop breaker patterns Stream A.4 mirrors.
- [`docs/v2/config-db-schema.md`](../config-db-schema.md) — add a Phase 7 §§ for `VirtualTag`, `Script`, `ScriptedAlarm`, `ScriptedAlarmState`.

View File

@@ -15,6 +15,12 @@ RUN pip install --no-cache-dir "pymodbus[simulator]==3.13.0"
WORKDIR /fixtures
COPY profiles/ /fixtures/
# Standalone exception-injection server (pure Python stdlib — no pymodbus
# dependency). Speaks raw Modbus/TCP and emits arbitrary exception codes
# per rules in exception_injection.json. Drives the `exception_injection`
# compose profile. See Docker/README.md §exception injection.
COPY exception_injector.py /fixtures/
EXPOSE 5020
# Default to the standard profile; docker-compose.yml overrides per service.

View File

@@ -9,9 +9,10 @@ nothing else.
| File | Purpose |
|---|---|
| [`Dockerfile`](Dockerfile) | `python:3.12-slim-bookworm` + `pymodbus[simulator]==3.13.0` + the four profile JSONs |
| [`docker-compose.yml`](docker-compose.yml) | One service per profile (`standard` / `dl205` / `mitsubishi` / `s7_1500`); all bind `:5020` so only one runs at a time |
| [`Dockerfile`](Dockerfile) | `python:3.12-slim-bookworm` + `pymodbus[simulator]==3.13.0` + every profile JSON + `exception_injector.py` |
| [`docker-compose.yml`](docker-compose.yml) | One service per profile (`standard` / `dl205` / `mitsubishi` / `s7_1500` / `exception_injection`); all bind `:5020` so only one runs at a time |
| [`profiles/*.json`](profiles/) | Same seed-register definitions the native launcher uses — canonical source |
| [`exception_injector.py`](exception_injector.py) | Pure-stdlib Modbus/TCP server that emits arbitrary exception codes per rule — used by the `exception_injection` profile |
## Run
@@ -29,6 +30,10 @@ docker compose -f tests\ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests\Docker\
# Siemens S7-1500 MB_SERVER quirks
docker compose -f tests\ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests\Docker\docker-compose.yml --profile s7_1500 up
# Exception-injection — end-to-end coverage of every Modbus exception code
# (01/02/03/04/05/06/0A/0B), not just the 02 + 03 pymodbus emits naturally
docker compose -f tests\ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests\Docker\docker-compose.yml --profile exception_injection up
```
Detached + stop:
@@ -61,6 +66,36 @@ dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests
records a `SkipReason` when unreachable, so tests stay green on a fresh
clone without Docker running.
## Exception injection
pymodbus's simulator naturally emits only Modbus exception codes `0x02`
(Illegal Data Address, on reads outside its configured ranges) and
`0x03` (Illegal Data Value, on over-length requests). The driver's
`MapModbusExceptionToStatus` table translates eight codes: `0x01`,
`0x02`, `0x03`, `0x04`, `0x05`, `0x06`, `0x0A`, `0x0B`. Unit tests
lock the translation function; the integration side previously only
proved the wire-to-status path for `0x02`.
The `exception_injection` profile runs
[`exception_injector.py`](exception_injector.py) — a tiny standalone
Modbus/TCP server written against the Python stdlib (zero
dependencies outside what's in the base image). It speaks the wire
protocol directly (FC 01/02/03/04/05/06/15/16) and looks up each
incoming `(fc, address)` against the rules in
[`profiles/exception_injection.json`](profiles/exception_injection.json);
a matching rule makes the server reply with
`[fc | 0x80, exception_code]` instead of the normal response.
Current rules (see the JSON file for the canonical list):
- `FC03 @1000..1007` — one per exception code (`0x01`/`0x02`/`0x03`/`0x04`/`0x05`/`0x06`/`0x0A`/`0x0B`)
- `FC06 @2000..2001``0x04` Server Failure, `0x06` Server Busy (write-path coverage)
- `FC16 @3000``0x04` Server Failure (multi-register write path)
Adding more coverage is append-only: drop a new `{fc, address,
exception, description}` entry into the JSON, restart the service,
add an `[InlineData]` row in `ExceptionInjectionTests`.
## References
- [`docs/drivers/Modbus-Test-Fixture.md`](../../../docs/drivers/Modbus-Test-Fixture.md) — coverage map + gap inventory

View File

@@ -77,3 +77,24 @@ services:
"--modbus_device", "dev",
"--json_file", "/fixtures/s7_1500.json"
]
# Exception-injection profile. Runs the standalone pure-stdlib Modbus/TCP
# server shipped as exception_injector.py instead of the pymodbus
# simulator — pymodbus naturally emits only exception codes 02 + 03, and
# this profile extends integration coverage to the other codes the
# driver's MapModbusExceptionToStatus table handles (01, 04, 05, 06,
# 0A, 0B). Rules are driven by exception_injection.json.
exception_injection:
profiles: ["exception_injection"]
image: otopcua-pymodbus:3.13.0
build:
context: .
dockerfile: Dockerfile
container_name: otopcua-modbus-exception-injector
restart: "no"
ports:
- "5020:5020"
command: [
"python", "/fixtures/exception_injector.py",
"--config", "/fixtures/exception_injection.json"
]

View File

@@ -0,0 +1,261 @@
#!/usr/bin/env python3
"""
Minimal Modbus/TCP server that supports per-address + per-function-code
exception injection — the missing piece of the pymodbus simulator, which
only naturally emits exception code 02 (Illegal Data Address) via its
"invalid" list and 03 (Illegal Data Value) via spec-enforced length caps.
Integration tests against this fixture drive the driver's
`MapModbusExceptionToStatus` end-to-end over the wire for codes 01, 04,
05, 06, 0A, 0B — the ones the pymodbus simulator can't be configured to
return.
Wire protocol — straight Modbus/TCP (spec chapter 7.1):
MBAP header (7 bytes): [tx_id:u16 BE][proto=0:u16][length:u16][unit_id:u8]
then length-1 bytes of PDU. Length covers unit_id + PDU.
Supported function codes (enough for the driver's RMW + read paths):
01 Read Coils, 02 Read Discrete Inputs,
03 Read Holding Registers, 04 Read Input Registers,
05 Write Single Coil, 06 Write Single Register,
15 Write Multiple Coils, 16 Write Multiple Registers.
Config JSON schema (see exception_injection.json):
{
"listen": { "host": "0.0.0.0", "port": 5020 },
"seeds": { "hr": { "<addr>": <uint16>, ... },
"ir": { "<addr>": <uint16>, ... },
"co": { "<addr>": <0|1>, ... },
"di": { "<addr>": <0|1>, ... } },
"rules": [ { "fc": <int>, "address": <int>, "exception": <int>,
"description": "..." }, ... ]
}
Rules match on (fc, starting address). A matching rule wins and the server
responds with the PDU `[fc | 0x80, exception_code]`.
Zero runtime dependencies outside the Python stdlib so the Docker image
stays tiny.
"""
from __future__ import annotations
import argparse
import asyncio
import json
import logging
import struct
import sys
from dataclasses import dataclass
log = logging.getLogger("exception_injector")
@dataclass(frozen=True)
class Rule:
fc: int
address: int
exception: int
description: str = ""
class Store:
"""In-memory data store backing non-injected reads + writes."""
def __init__(self, seeds: dict[str, dict[str, int]]) -> None:
self.hr: dict[int, int] = {int(k): int(v) for k, v in seeds.get("hr", {}).items()}
self.ir: dict[int, int] = {int(k): int(v) for k, v in seeds.get("ir", {}).items()}
self.co: dict[int, int] = {int(k): int(v) for k, v in seeds.get("co", {}).items()}
self.di: dict[int, int] = {int(k): int(v) for k, v in seeds.get("di", {}).items()}
def read_bits(self, table: dict[int, int], addr: int, count: int) -> bytes:
"""Pack `count` bits LSB-first into the Modbus bit response body."""
bits = [table.get(addr + i, 0) & 1 for i in range(count)]
out = bytearray((count + 7) // 8)
for i, b in enumerate(bits):
if b:
out[i // 8] |= 1 << (i % 8)
return bytes(out)
def read_regs(self, table: dict[int, int], addr: int, count: int) -> bytes:
"""Pack `count` uint16 BE into the Modbus register response body."""
return b"".join(struct.pack(">H", table.get(addr + i, 0) & 0xFFFF) for i in range(count))
class Server:
EXC_ILLEGAL_FUNCTION = 0x01
EXC_ILLEGAL_DATA_ADDRESS = 0x02
EXC_ILLEGAL_DATA_VALUE = 0x03
def __init__(self, store: Store, rules: list[Rule]) -> None:
self._store = store
# Index rules by (fc, address) for O(1) lookup.
self._rules: dict[tuple[int, int], Rule] = {(r.fc, r.address): r for r in rules}
def lookup_rule(self, fc: int, address: int) -> Rule | None:
return self._rules.get((fc, address))
def exception_pdu(self, fc: int, code: int) -> bytes:
return bytes([fc | 0x80, code & 0xFF])
def handle_pdu(self, pdu: bytes) -> bytes:
if not pdu:
return self.exception_pdu(0, self.EXC_ILLEGAL_FUNCTION)
fc = pdu[0]
# Reads: FC 01/02/03/04 — [fc u8][addr u16][quantity u16]
if fc in (0x01, 0x02, 0x03, 0x04):
if len(pdu) != 5:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
addr, count = struct.unpack(">HH", pdu[1:5])
rule = self.lookup_rule(fc, addr)
if rule is not None:
log.info("inject fc=%d addr=%d -> exception 0x%02X (%s)",
fc, addr, rule.exception, rule.description)
return self.exception_pdu(fc, rule.exception)
# Spec caps — FC01/02 allow 1..2000 bits; FC03/04 allow 1..125 regs.
if fc in (0x01, 0x02):
if not 1 <= count <= 2000:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
body = self._store.read_bits(
self._store.co if fc == 0x01 else self._store.di, addr, count)
return bytes([fc, len(body)]) + body
if not 1 <= count <= 125:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
body = self._store.read_regs(
self._store.hr if fc == 0x03 else self._store.ir, addr, count)
return bytes([fc, len(body)]) + body
# FC05 — [fc u8][addr u16][value u16] where value is 0xFF00=ON or 0x0000=OFF.
if fc == 0x05:
if len(pdu) != 5:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
addr, value = struct.unpack(">HH", pdu[1:5])
rule = self.lookup_rule(fc, addr)
if rule is not None:
return self.exception_pdu(fc, rule.exception)
if value == 0xFF00:
self._store.co[addr] = 1
elif value == 0x0000:
self._store.co[addr] = 0
else:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
return pdu # FC05 echoes the request on success.
# FC06 — [fc u8][addr u16][value u16].
if fc == 0x06:
if len(pdu) != 5:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
addr, value = struct.unpack(">HH", pdu[1:5])
rule = self.lookup_rule(fc, addr)
if rule is not None:
return self.exception_pdu(fc, rule.exception)
self._store.hr[addr] = value
return pdu # FC06 echoes on success.
# FC15 — [fc u8][addr u16][count u16][byte_count u8][values...]
if fc == 0x0F:
if len(pdu) < 6:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
addr, count = struct.unpack(">HH", pdu[1:5])
rule = self.lookup_rule(fc, addr)
if rule is not None:
return self.exception_pdu(fc, rule.exception)
# Happy-path ignore-the-data, ack with standard response.
return struct.pack(">BHH", fc, addr, count)
# FC16 — [fc u8][addr u16][count u16][byte_count u8][u16 values...]
if fc == 0x10:
if len(pdu) < 6:
return self.exception_pdu(fc, self.EXC_ILLEGAL_DATA_VALUE)
addr, count = struct.unpack(">HH", pdu[1:5])
rule = self.lookup_rule(fc, addr)
if rule is not None:
return self.exception_pdu(fc, rule.exception)
byte_count = pdu[5]
data = pdu[6:6 + byte_count]
for i in range(count):
self._store.hr[addr + i] = struct.unpack(">H", data[i * 2:i * 2 + 2])[0]
return struct.pack(">BHH", fc, addr, count)
return self.exception_pdu(fc, self.EXC_ILLEGAL_FUNCTION)
async def handle_connection(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
peer = writer.get_extra_info("peername")
log.info("client connected from %s", peer)
try:
while True:
hdr = await reader.readexactly(7)
tx_id, proto, length, unit_id = struct.unpack(">HHHB", hdr)
if length < 1:
return
pdu = await reader.readexactly(length - 1)
resp = self.handle_pdu(pdu)
out = struct.pack(">HHHB", tx_id, proto, len(resp) + 1, unit_id) + resp
writer.write(out)
await writer.drain()
except asyncio.IncompleteReadError:
log.info("client %s disconnected", peer)
except Exception: # pylint: disable=broad-except
log.exception("unexpected error serving %s", peer)
finally:
try:
writer.close()
await writer.wait_closed()
except Exception: # pylint: disable=broad-except
pass
def load_config(path: str) -> tuple[Store, list[Rule], str, int]:
with open(path, "r", encoding="utf-8") as fh:
raw = json.load(fh)
listen = raw.get("listen", {})
host = listen.get("host", "0.0.0.0")
port = int(listen.get("port", 5020))
store = Store(raw.get("seeds", {}))
rules = [
Rule(
fc=int(r["fc"]),
address=int(r["address"]),
exception=int(r["exception"]),
description=str(r.get("description", "")),
)
for r in raw.get("rules", [])
]
return store, rules, host, port
async def main(argv: list[str]) -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--config", required=True, help="Path to exception-injection JSON config.")
args = parser.parse_args(argv)
logging.basicConfig(level=logging.INFO,
format="%(asctime)s %(levelname)s %(name)s - %(message)s")
store, rules, host, port = load_config(args.config)
server = Server(store, rules)
listener = await asyncio.start_server(server.handle_connection, host, port)
log.info("exception-injector listening on %s:%d with %d rule(s)", host, port, len(rules))
for r in rules:
log.info(" rule: fc=%d addr=%d -> exception 0x%02X (%s)",
r.fc, r.address, r.exception, r.description)
async with listener:
await listener.serve_forever()
return 0
if __name__ == "__main__":
try:
sys.exit(asyncio.run(main(sys.argv[1:])))
except KeyboardInterrupt:
sys.exit(0)

View File

@@ -0,0 +1,34 @@
{
"_comment": "Modbus exception-injection profile — feeds exception_injector.py (not pymodbus). Rules match by (fc, address). HR[0-31] are address-as-value for the happy-path reads; HR[1000..1010] + coils[2000..2010] carry per-exception-code rules. Every code in the driver's MapModbusExceptionToStatus table that pymodbus can't naturally emit has a dedicated slot. See Docker/README.md §exception injection.",
"listen": { "host": "0.0.0.0", "port": 5020 },
"seeds": {
"hr": {
"0": 0, "1": 1, "2": 2, "3": 3,
"4": 4, "5": 5, "6": 6, "7": 7,
"8": 8, "9": 9, "10": 10, "11": 11,
"12": 12, "13": 13, "14": 14, "15": 15,
"16": 16, "17": 17, "18": 18, "19": 19,
"20": 20, "21": 21, "22": 22, "23": 23,
"24": 24, "25": 25, "26": 26, "27": 27,
"28": 28, "29": 29, "30": 30, "31": 31
}
},
"rules": [
{ "fc": 3, "address": 1000, "exception": 1, "description": "FC03 @1000 -> Illegal Function (0x01)" },
{ "fc": 3, "address": 1001, "exception": 2, "description": "FC03 @1001 -> Illegal Data Address (0x02)" },
{ "fc": 3, "address": 1002, "exception": 3, "description": "FC03 @1002 -> Illegal Data Value (0x03)" },
{ "fc": 3, "address": 1003, "exception": 4, "description": "FC03 @1003 -> Server Failure (0x04)" },
{ "fc": 3, "address": 1004, "exception": 5, "description": "FC03 @1004 -> Acknowledge (0x05)" },
{ "fc": 3, "address": 1005, "exception": 6, "description": "FC03 @1005 -> Server Busy (0x06)" },
{ "fc": 3, "address": 1006, "exception": 10, "description": "FC03 @1006 -> Gateway Path Unavailable (0x0A)" },
{ "fc": 3, "address": 1007, "exception": 11, "description": "FC03 @1007 -> Gateway Target No Response (0x0B)" },
{ "fc": 6, "address": 2000, "exception": 4, "description": "FC06 @2000 -> Server Failure (0x04, e.g. CPU in PROGRAM mode)" },
{ "fc": 6, "address": 2001, "exception": 6, "description": "FC06 @2001 -> Server Busy (0x06)" },
{ "fc": 16, "address": 3000, "exception": 4, "description": "FC16 @3000 -> Server Failure (0x04)" }
]
}

View File

@@ -0,0 +1,122 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests;
/// <summary>
/// End-to-end verification that the driver's <c>MapModbusExceptionToStatus</c>
/// translation is wire-correct for every exception code in the mapping table —
/// not just 0x02, which is the only code the pymodbus simulator naturally emits.
/// Drives the standalone <c>exception_injector.py</c> server (<c>exception_injection</c>
/// compose profile) at each of the rule addresses in
/// <c>Docker/profiles/exception_injection.json</c> and asserts the driver surfaces
/// the expected OPC UA StatusCode.
/// </summary>
/// <remarks>
/// Why integration coverage on top of the unit tests: the unit tests prove the
/// translation function is correct; these prove the driver wires it through on
/// the read + write paths unchanged, after the MBAP header + PDU round-trip
/// (where a subtle framing bug could swallow or misclassify the exception).
/// </remarks>
[Collection(ModbusSimulatorCollection.Name)]
[Trait("Category", "Integration")]
[Trait("Device", "ExceptionInjection")]
public sealed class ExceptionInjectionTests(ModbusSimulatorFixture sim)
{
private const uint StatusGood = 0u;
private const uint StatusBadOutOfRange = 0x803C0000u;
private const uint StatusBadNotSupported = 0x803D0000u;
private const uint StatusBadDeviceFailure = 0x80550000u;
private const uint StatusBadCommunicationError = 0x80050000u;
private void SkipUnlessInjectorLive()
{
if (sim.SkipReason is not null) Assert.Skip(sim.SkipReason);
var profile = Environment.GetEnvironmentVariable("MODBUS_SIM_PROFILE");
if (!string.Equals(profile, "exception_injection", StringComparison.OrdinalIgnoreCase))
Assert.Skip("MODBUS_SIM_PROFILE != exception_injection — skipping. " +
"Start the fixture with --profile exception_injection.");
}
private async Task<IReadOnlyList<DataValueSnapshot>> ReadSingleAsync(int address, string tagName)
{
var opts = new ModbusDriverOptions
{
Host = sim.Host,
Port = sim.Port,
UnitId = 1,
Timeout = TimeSpan.FromSeconds(2),
Tags =
[
new ModbusTagDefinition(tagName,
ModbusRegion.HoldingRegisters, Address: (ushort)address,
DataType: ModbusDataType.UInt16, Writable: false),
],
Probe = new ModbusProbeOptions { Enabled = false },
};
await using var driver = new ModbusDriver(opts, driverInstanceId: "modbus-exc");
await driver.InitializeAsync("{}", TestContext.Current.CancellationToken);
return await driver.ReadAsync([tagName], TestContext.Current.CancellationToken);
}
[Theory]
[InlineData(1000, StatusBadNotSupported, "exc 0x01 (Illegal Function) -> BadNotSupported")]
[InlineData(1001, StatusBadOutOfRange, "exc 0x02 (Illegal Data Address) -> BadOutOfRange")]
[InlineData(1002, StatusBadOutOfRange, "exc 0x03 (Illegal Data Value) -> BadOutOfRange")]
[InlineData(1003, StatusBadDeviceFailure, "exc 0x04 (Server Failure) -> BadDeviceFailure")]
[InlineData(1004, StatusBadDeviceFailure, "exc 0x05 (Acknowledge / long op) -> BadDeviceFailure")]
[InlineData(1005, StatusBadDeviceFailure, "exc 0x06 (Server Busy) -> BadDeviceFailure")]
[InlineData(1006, StatusBadCommunicationError, "exc 0x0A (Gateway Path Unavailable) -> BadCommunicationError")]
[InlineData(1007, StatusBadCommunicationError, "exc 0x0B (Gateway Target No Response) -> BadCommunicationError")]
public async Task FC03_read_at_injection_address_surfaces_expected_status(
int address, uint expectedStatus, string scenario)
{
SkipUnlessInjectorLive();
var results = await ReadSingleAsync(address, $"Injected_{address}");
results[0].StatusCode.ShouldBe(expectedStatus, scenario);
}
[Fact]
public async Task FC03_read_at_non_injected_address_returns_Good()
{
// Sanity: HR[0..31] are seeded with address-as-value in the profile. A read at
// one of those addresses must come back Good (0) — otherwise the injector is
// misbehaving and every other assertion in this class is uninformative.
SkipUnlessInjectorLive();
var results = await ReadSingleAsync(address: 5, tagName: "Healthy_5");
results[0].StatusCode.ShouldBe(StatusGood);
results[0].Value.ShouldBe((ushort)5);
}
[Theory]
[InlineData(2000, StatusBadDeviceFailure, "exc 0x04 on FC06 -> BadDeviceFailure (CPU in PROGRAM mode)")]
[InlineData(2001, StatusBadDeviceFailure, "exc 0x06 on FC06 -> BadDeviceFailure (Server Busy)")]
public async Task FC06_write_at_injection_address_surfaces_expected_status(
int address, uint expectedStatus, string scenario)
{
SkipUnlessInjectorLive();
var tag = $"InjectedWrite_{address}";
var opts = new ModbusDriverOptions
{
Host = sim.Host,
Port = sim.Port,
UnitId = 1,
Timeout = TimeSpan.FromSeconds(2),
Tags =
[
new ModbusTagDefinition(tag,
ModbusRegion.HoldingRegisters, Address: (ushort)address,
DataType: ModbusDataType.UInt16, Writable: true),
],
Probe = new ModbusProbeOptions { Enabled = false },
};
await using var driver = new ModbusDriver(opts, driverInstanceId: "modbus-exc-write");
await driver.InitializeAsync("{}", TestContext.Current.CancellationToken);
var writes = await driver.WriteAsync(
[new WriteRequest(tag, (ushort)42)],
TestContext.Current.CancellationToken);
writes[0].StatusCode.ShouldBe(expectedStatus, scenario);
}
}