From b89d69a008800c7521c524bfca2d3aab4c286212 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Wed, 17 Jun 2026 08:14:09 -0400 Subject: [PATCH] docs(siteruntime): add WaitAsync attribute-change helper spec --- ...17-waitfor-attribute-change-helper-spec.md | 251 ++++++++++++++++++ 1 file changed, 251 insertions(+) create mode 100644 docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md diff --git a/docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md b/docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md new file mode 100644 index 00000000..512bcd78 --- /dev/null +++ b/docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md @@ -0,0 +1,251 @@ +# Patch request — event-driven "wait for attribute change (with timeout)" script helper + +**Date:** 2026-06-17 +**Type:** Source enhancement (small, additive) to the SiteRuntime script surface +**Why now:** the DELMIA/MES receiver re-implementation +([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1) +currently has to **busy-poll** for the handshake completion flag. This spec describes the gap +and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait +**event-driven** for a tag/attribute to reach a value, bounded by a timeout. + +> All file paths, line numbers, message records, and signatures below were read from source on +> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors. + +--- + +## 1. The gap + +The receiver handshake (and any request/response tag interaction) needs to **wait until a +data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true` +or `MoveInCompleteFlag == true` after setting the trigger flag. + +ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write** +(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. 
The only way to wait is +a manual poll loop: + +```csharp +// current workaround — every handshake script repeats this +var deadline = DateTime.UtcNow.AddSeconds(30); +while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested) +{ + if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break; + await Task.Delay(200, CancellationToken); +} +``` + +Why this is unsatisfactory: + +- **Latency** — completion is detected up to one poll interval late (200 ms here). +- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the + `InstanceActor`); N handshakes × M polls = a lot of needless messages. +- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong + (forgetting `CancellationToken`, off-by-one on the deadline, not handling quality). +- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality. + +Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced +attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises +`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely. + +--- + +## 2. 
Where the change goes (verified wiring) + +| Concern | Type / file | Notes | +|---|---|---| +| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` — `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change | +| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` — `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate` ← `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` | +| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here | +| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask(GetAttributeRequest)` | the helper mirrors this | +| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in | +| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom | +| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. | + +**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter, +not in the script via polling. 
Because the actor is single-threaded and `HandleAttributeValueChanged` +is the one place every change passes through, a waiter that (a) checks the current value on registration and +(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe". + +--- + +## 3. Proposed API (script-facing) + +Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition +path resolution (`Resolve(name)`) applies just like get/set: + +```csharp +// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched +// within the timeout, false if it timed out. Honors the script CancellationToken. +Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout); + +// Predicate form: site-local template scripts only (predicate is an in-process delegate). +Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout); + +// Optional richer variant that also returns the matched value + quality. +Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout); +// record WaitResult(bool Matched, object? Value, string Quality, bool TimedOut); +``` + +Return **bool** (not throw) for the common case: the handshake wants matched/timed-out, not an +exception. The value-equality overload is the one the handshake needs and is the one that can also +be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not. + +Handshake, rewritten (replaces the §1 poll loop): + +```csharp +await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger +var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30)); +if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" }; +return new { + Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false, + ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? 
"" +}; +``` + +```csharp +await Attributes.SetAsync("MoveInFlag", true); +var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30)); +// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID … +``` + +--- + +## 4. Implementation outline (the patch) + +### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`) +```csharp +// actor protocol (site-local; delegate is fine because messaging is in-process) +public record WaitForAttributeRequest( + string CorrelationId, + string InstanceName, + string AttributeName, // already scope-resolved by the accessor + string? TargetValueEncoded, // AttributeValueCodec.Encode(targetValue); null = "any change" + Func<object?, bool>? Predicate, // local-only; null when TargetValueEncoded is used + TimeSpan Timeout, + DateTimeOffset OccurredAtUtc); + +public record WaitForAttributeResponse( + string CorrelationId, + bool Matched, + object? Value, + string Quality, + bool TimedOut, + string? ErrorMessage = null); + +// internal self-message used to fire the timeout +public record WaitForAttributeTimeout(string CorrelationId); +``` + +### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`) +- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where + `PendingWait` holds the attribute name, the match test (decoded target value **or** predicate), + the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle. +- **Handle `WaitForAttributeRequest`:** + 1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or + use `Predicate`). + 2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply + `WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return. + 3. Otherwise register the waiter and schedule the timeout: + `Context.System.Scheduler.ScheduleTellOnceCancelable(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`, + storing the returned `ICancelable` (plain `ScheduleTellOnce` returns `void`). 
Capture `Sender` now (it is only valid while this message is being handled). + 4. Bound `effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` before scheduling in + step 3 (the caller's `Ask` already carries the script token; see §4.3). Optionally cap the number of concurrent waiters + per instance (defensive; reply with `ErrorMessage` if exceeded). +- **In `HandleAttributeValueChanged` (after state is updated):** iterate the `_attributeWaiters` entries whose + attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout, + reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to + allow removal during enumeration.) +- **Handle `WaitForAttributeTimeout`:** if still registered, reply + `WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove. +- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore + Bad-quality transients. + +### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`) +- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor + parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not** + passed). `ScriptExecutionActor` already has `cts.Token`; pass it when constructing the context. +- Add a method that the accessor calls: + ```csharp + public async Task<bool> WaitAttribute(string name, string? targetValueEncoded, + Func<object?, bool>? predicate, TimeSpan timeout) + { + var cid = Guid.NewGuid().ToString(); + var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded, + predicate, timeout, DateTimeOffset.UtcNow); + // Ask bounded by the script timeout token so a script-deadline abort cancels the await. 
+ var resp = await _instanceActor.Ask<WaitForAttributeResponse>( + req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken); + return resp.Matched; + } + ``` + +### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) +- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)` + site) into the new `ScriptRuntimeContext` constructor parameter from §4.3. + +### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`) +```csharp +public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout) + => _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout); + +public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout) + => _ctx.WaitAttribute(Resolve(key), null, predicate, timeout); +``` + +### 4.6 Trust model: no change +`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script +only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks` ++ `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify that the new helper +type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`); they don't. + +--- + +## 5. Correctness notes + +- **No missed edge.** Registration (current-value check) and change-handling both run on the + `InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter" + is caught by the fast-path check; a value that flips after registration is caught by + `HandleAttributeValueChanged`. The poll loop and this design are both correct; this one is + event-driven and cheaper. +- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees + the waiter is removed and the caller answered even if the value never changes. A match cancels the + scheduled timeout. 
+- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits + its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the + actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout). +- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`). + Consider a per-instance cap as a guard against a script leaking waiters in a loop. + +--- + +## 6. Optional: inbound / routed variant + +For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script +could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed +pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync` +method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the +`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized, +so the routed form takes the encoded target value (the predicate overload stays site-local). This is +optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone +fully cover the DELMIA/MES use case. + +--- + +## 7. Acceptance criteria + +1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it + returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update), + with no poll loop. +2. Returns `false` (no throw) when the value never matches within the timeout. +3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins). +4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the + value in the same actor step as registration, and one step after). +5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward). +6. 
Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the + composed child's attribute). +7. Passes `ScriptAnalysis` trust validation unchanged. +8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place + of the poll loop. + +Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and +the script-surface tests under `tests/…/SiteRuntime*`.
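
The four waiter behaviours named in the suggested tests (fast path, change-match, timeout, removal) can be modeled without Akka. The following is a minimal sketch under stated assumptions, not the proposed `InstanceActor` code: `WaiterRegistry`, its method names, and the callback-based reply are hypothetical stand-ins, and the scheduled `WaitForAttributeTimeout` self-message is simulated by calling `OnTimeout` by hand.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the waiter bookkeeping described in section 4.2.
// Replies are plain callbacks; there is no actor, scheduler, or codec here.
public sealed class WaiterRegistry
{
    private sealed record PendingWait(string Attribute, Func<object?, bool> Test, Action<bool> Reply);

    private readonly Dictionary<string, object?> _attributes = new();
    private readonly Dictionary<string, PendingWait> _waiters = new();

    public int PendingCount => _waiters.Count;

    // Models handling WaitForAttributeRequest.
    public void Register(string cid, string attribute, Func<object?, bool> test, Action<bool> reply)
    {
        // Fast path: the current value already satisfies the test, so nothing is registered.
        if (_attributes.TryGetValue(attribute, out var current) && test(current))
        {
            reply(true);
            return;
        }
        _waiters[cid] = new PendingWait(attribute, test, reply);
    }

    // Models HandleAttributeValueChanged: update state first, then re-evaluate waiters.
    public void OnValueChanged(string attribute, object? value)
    {
        _attributes[attribute] = value;
        // Iterate over a snapshot so entries can be removed during enumeration.
        foreach (var (cid, wait) in _waiters.ToList())
        {
            if (wait.Attribute == attribute && wait.Test(value))
            {
                _waiters.Remove(cid);
                wait.Reply(true);
            }
        }
    }

    // Models handling WaitForAttributeTimeout; a no-op if the waiter already matched.
    public void OnTimeout(string cid)
    {
        if (_waiters.Remove(cid, out var wait))
            wait.Reply(false);
    }
}

public static class Demo
{
    public static void Main()
    {
        var reg = new WaiterRegistry();
        var log = new List<string>();

        // Change-match: register first, then the value arrives.
        reg.Register("w1", "RecipeProcessedFlag", v => object.Equals(v, true), ok => log.Add($"w1:{ok}"));
        reg.OnValueChanged("RecipeProcessedFlag", true);

        // Fast path: the value is already true at registration time.
        reg.Register("w2", "RecipeProcessedFlag", v => object.Equals(v, true), ok => log.Add($"w2:{ok}"));

        // Timeout: the value never arrives, so the simulated timer fires.
        reg.Register("w3", "MoveInCompleteFlag", v => object.Equals(v, true), ok => log.Add($"w3:{ok}"));
        reg.OnTimeout("w3");
        reg.OnTimeout("w1"); // late timer for an already-matched waiter does nothing

        Console.WriteLine(string.Join(", ", log)); // w1:True, w2:True, w3:False
        Console.WriteLine(reg.PendingCount);       // 0
    }
}
```

Because every method runs to completion before the next is called, the sketch has the same ordering guarantee the spec relies on from the actor's single thread: a value change can land before or after `Register`, never during it.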