# Patch request — event-driven "wait for attribute change (with timeout)" script helper
**Date:** 2026-06-17

**Type:** Source enhancement (small, additive) to the SiteRuntime script surface

**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can do an
**event-driven** wait for a tag/attribute to reach a value, bounded by a timeout.

> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
---
## 1. The gap
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value**, e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.

ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:
```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
    if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
    await Task.Delay(200, CancellationToken);
}
```
Why this is unsatisfactory:

- **Latency:** completion is detected up to one poll interval late (200 ms here).
- **Wasted work:** each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
  `InstanceActor`). A single 30 s wait at a 200 ms interval can issue up to 150 round-trips, and
  that cost multiplies across N concurrent handshakes.
- **Boilerplate:** the same loop is copy-pasted into every handshake script and is easy to get wrong
  (forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness:** the poll reads whatever value is cached regardless of OPC/MX quality.

Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.

---
## 2. Where the change goes (verified wiring)
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs`) | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` (`src/…/SiteRuntime/Actors/InstanceActor.cs`) | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate`, `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed**: the wait runs in host code; the script just `await`s a provided method. |

**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".

---
## 3. Proposed API (script-facing)
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
path resolution (`Resolve(name)`) applies just like get/set:
```csharp
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
// within the timeout, false if it timed out. Honors the script CancellationToken.
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
// Optional richer variant that also returns the matched value + quality. It has its own name
// (WaitForAsync) because C# cannot overload on return type alone.
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
// record WaitResult(bool Matched, object? Value, string Quality, bool TimedOut);
```
Return **bool** (not throw) for the common case: the handshake wants matched/timed-out, not an
exception. The value-equality overload is the one the handshake needs and is the one that can also
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.

Handshake, rewritten (replaces the §1 poll loop):
```csharp
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
return new {
    Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
    ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
};
```
```csharp
await Attributes.SetAsync("MoveInFlag", true);
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
```
---
## 4. Implementation outline (the patch)
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
```csharp
// actor protocol (site-local; delegate is fine because messaging is in-process)
public record WaitForAttributeRequest(
    string CorrelationId,
    string InstanceName,
    string AttributeName,            // already scope-resolved by the accessor
    string? TargetValueEncoded,      // AttributeValueCodec.Encode(targetValue); null = "any change"
    Func<object?, bool>? Predicate,  // local-only; null when TargetValueEncoded is used
    TimeSpan Timeout,
    DateTimeOffset OccurredAtUtc);

public record WaitForAttributeResponse(
    string CorrelationId,
    bool Matched,
    object? Value,
    string Quality,
    bool TimedOut,
    string? ErrorMessage = null);

// internal self-message used to fire the timeout
public record WaitForAttributeTimeout(string CorrelationId);
```
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
  `PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
  the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
- **Handle `WaitForAttributeRequest`:**
  1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
     use `Predicate`).
  2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
     `WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
  3. Otherwise register the waiter and schedule the timeout, using `effectiveTimeout` as bounded in
     step 4:
     `Context.System.Scheduler.ScheduleTellOnceCancelable(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
     storing the returned `ICancelable`. (Plain `ScheduleTellOnce` returns nothing to store.) Capture
     `Sender` now; it is only valid while the current message is being processed.
  4. Bound `effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` (the caller's `Ask`
     already carries the script token; see §4.3). Optionally cap the number of concurrent waiters
     per instance (defensive; reply with `ErrorMessage` if exceeded).
- **In `HandleAttributeValueChanged` (after state is updated):** iterate the `_attributeWaiters`
  entries whose attribute matches the changed `AttributeName`; for any whose test now passes, cancel
  its timeout, reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a
  snapshot to allow removal during enumeration.)
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
  `WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
  Bad-quality transients.
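
The registry steps above can be modeled without Akka.NET to check the ordering logic. The sketch below is a minimal, hypothetical model and not the `InstanceActor` patch itself: `WaiterRegistry`, the shape of `PendingWait`, and the `Action<bool>` reply callback (standing in for the captured `Sender`) are illustrative assumptions, and timeout scheduling is reduced to an explicit `OnTimeout` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PendingWait; Reply replaces the captured Sender.
public sealed record PendingWait(string AttributeName, Func<object?, bool> Test, Action<bool> Reply);

// Models only the registry logic; a real actor would also schedule and cancel
// the timeout message and reply with WaitForAttributeResponse.
public sealed class WaiterRegistry
{
    private readonly Dictionary<string, object?> _attributes = new();
    private readonly Dictionary<string, PendingWait> _waiters = new();

    public int PendingCount => _waiters.Count;

    // WaitForAttributeRequest: fast path first, otherwise register the waiter.
    public void Register(string correlationId, string name, Func<object?, bool> test, Action<bool> reply)
    {
        if (_attributes.TryGetValue(name, out var current) && test(current))
        {
            reply(true); // already satisfied, nothing to register
            return;
        }
        _waiters[correlationId] = new PendingWait(name, test, reply);
    }

    // Called after state is updated, as in HandleAttributeValueChanged.
    public void OnAttributeChanged(string name, object? value)
    {
        _attributes[name] = value;
        // Snapshot so entries can be removed while iterating.
        foreach (var (cid, wait) in _waiters.ToList())
        {
            if (wait.AttributeName == name && wait.Test(value))
            {
                _waiters.Remove(cid);
                wait.Reply(true);
            }
        }
    }

    // WaitForAttributeTimeout: answer false only if the waiter is still registered.
    public void OnTimeout(string correlationId)
    {
        if (_waiters.Remove(correlationId, out var wait))
            wait.Reply(false);
    }
}
```

Because registration and change handling would run on the same actor thread, a value that already matches at registration is answered by the fast path, and a later change is answered by the change handler. In both cases the waiter is removed, and a timeout that arrives after a match finds nothing to answer.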
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
- Add a method that the accessor calls:
```csharp
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
    Func<object?, bool>? predicate, TimeSpan timeout)
{
    var cid = Guid.NewGuid().ToString();
    var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
        predicate, timeout, DateTimeOffset.UtcNow);
    // Ask bounded by the script timeout token so a script-deadline abort cancels the await.
    var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
        req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
    return resp.Matched;
}
```
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
```csharp
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
    => _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);

public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
    => _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
```
### 4.6 Trust model — no change
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
only `await`s it and may pass a `Func<>` (a normal closure, not reflection).
`System.Threading.Tasks` and `CancellationToken` are already in
`ScriptTrustPolicy.AllowedExceptions`. A check of the new helper type/members against
`ForbiddenIdentifiers` (`dynamic`, `Activator`) finds no collisions.

---
## 5. Correctness notes
- **No missed edge.** Registration (current-value check) and change-handling both run on the
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
is caught by the fast-path check; a value that flips after registration is caught by
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
event-driven and cheaper.
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
the waiter is removed and the caller answered even if the value never changes. Match cancels the
scheduled timeout.
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
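
The best-effort cancel mentioned under **Cancellation** could be wired with a token registration. This is a minimal sketch under stated assumptions: the `CancelWaitForAttribute` record and the `WaitCancellation` helper are hypothetical names that do not appear in §4.1, and the `tell` delegate stands in for sending a message to the `InstanceActor` (for example `msg => instanceActor.Tell(msg)`). Disposing the returned registration after a normal match or timeout keeps the callback from firing for a waiter that is already gone.

```csharp
using System;
using System.Threading;

// Hypothetical cancel message; the actor would remove the matching waiter on receipt.
public record CancelWaitForAttribute(string CorrelationId);

public static class WaitCancellation
{
    // Posts a cancel message when the script timeout token fires.
    // The caller disposes the registration once the wait has completed.
    public static CancellationTokenRegistration EvictOnCancel(
        CancellationToken scriptToken, string correlationId, Action<object> tell)
        => scriptToken.Register(() => tell(new CancelWaitForAttribute(correlationId)));
}
```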
---
## 6. Optional: inbound / routed variant
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
`InstanceActor`. **Value-equality only** across the wire: a `Func<>` predicate cannot be serialized,
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
fully cover the DELMIA/MES use case.

---
## 7. Acceptance criteria
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
with no poll loop.
2. Returns `false` (no throw) when the value never matches within the timeout.
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
value in the same actor step as registration, and one step after).
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
composed child's attribute).
7. Passes `ScriptAnalysis` trust validation unchanged.
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
   of the poll loop.

Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
the script-surface tests under `tests/…/SiteRuntime*`.