docs(siteruntime): add WaitAsync attribute-change helper spec

Joseph Doherty
2026-06-17 08:14:09 -04:00
parent 639e331db1
commit b89d69a008
@@ -0,0 +1,251 @@
# Patch request — event-driven "wait for attribute change (with timeout)" script helper
**Date:** 2026-06-17

**Type:** Source enhancement (small, additive) to the SiteRuntime script surface

**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
**event-driven** for a tag/attribute to reach a value, bounded by a timeout.

> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
---
## 1. The gap
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.
ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:
```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
    if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
    await Task.Delay(200, CancellationToken);
}
```
Why this is unsatisfactory:
- **Latency** — completion is detected up to one poll interval late (200 ms here).
- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
  `InstanceActor`); one 30 s wait polled every 200 ms is up to 150 round-trips, repeated for
  every handshake in flight.
- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
(forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.

Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.

---
## 2. Where the change goes (verified wiring)
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` in `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` in `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate`, handling `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |

**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".

---
## 3. Proposed API (script-facing)
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
path resolution (`Resolve(name)`) applies just like get/set:
```csharp
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
// within the timeout, false if it timed out. Honors the script CancellationToken.
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
// Optional richer overload that also returns the matched value + quality.
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
// record WaitResult(bool Matched, object? Value, string Quality, bool TimedOut);
```
Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
exception. The value-equality overload is the one the handshake needs and is the one that can also
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
Handshake, rewritten (replaces the §1 poll loop):
```csharp
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
return new {
    Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
    ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
};
```
```csharp
await Attributes.SetAsync("MoveInFlag", true);
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
```
---
## 4. Implementation outline (the patch)
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
```csharp
// actor protocol (site-local; delegate is fine because messaging is in-process)
public record WaitForAttributeRequest(
    string CorrelationId,
    string InstanceName,
    string AttributeName,            // already scope-resolved by the accessor
    string? TargetValueEncoded,      // AttributeValueCodec.Encode(targetValue); null = "any change"
    Func<object?, bool>? Predicate,  // local-only; null when TargetValueEncoded is used
    TimeSpan Timeout,
    DateTimeOffset OccurredAtUtc);

public record WaitForAttributeResponse(
    string CorrelationId,
    bool Matched,
    object? Value,
    string Quality,
    bool TimedOut,
    string? ErrorMessage = null);

// internal self-message used to fire the timeout
public record WaitForAttributeTimeout(string CorrelationId);
```
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
  `PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
  the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
- **Handle `WaitForAttributeRequest`:**
  1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
     use `Predicate`).
  2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
     `WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
  3. Otherwise register the waiter and schedule the timeout:
     `Context.System.Scheduler.ScheduleTellOnceCancelable(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
     storing the returned `ICancelable` (plain `ScheduleTellOnce` returns `void`). Capture `Sender`
     now; when a later message is handled, `Sender` refers to that message's sender instead.
  4. `effectiveTimeout` in step 3 is bounded: `min(request.Timeout, requestDeadlineFromCaller)` (the
     caller's `Ask` already carries the script token; see §4.3). Optionally cap the number of
     concurrent waiters per instance (defensive; reply with `ErrorMessage` if exceeded).
- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
  attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
  reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
  allow removal during enumeration.)
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
  `WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
  Bad-quality transients.
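
The waiter lifecycle described in the handlers above can be exercised without Akka.NET. The following is a minimal sketch under stated assumptions, not the patch itself: `WaiterModel`, `PendingWait`, and `Reply` are hypothetical stand-ins, replies are appended to a list instead of being sent to the captured `Sender`, and the scheduled timeout is modeled by calling `OnTimeout` by hand.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var model = new WaiterModel();
model.OnWaitRequest("w1", "RecipeProcessedFlag", v => v is true); // not yet true: registers
model.OnValueChanged("RecipeProcessedFlag", true);                // match: reply and remove
model.OnTimeout("w1");                                            // late timeout: no-op
model.OnWaitRequest("w2", "MoveInCompleteFlag", v => v is true);  // registers
model.OnTimeout("w2");                                            // value never changed: timed out
Console.WriteLine(string.Join(", ", model.Replies.Select(r => $"{r.CorrelationId}:{r.Matched}")));
Console.WriteLine($"pending={model.PendingCount}");

// Hypothetical stand-in for WaitForAttributeResponse (subset of its fields).
public record Reply(string CorrelationId, bool Matched, object? Value, bool TimedOut);

// Hypothetical per-waiter state; TimeoutCancelled models ICancelable.Cancel().
public sealed record PendingWait(string AttributeName, Func<object?, bool> Test)
{
    public bool TimeoutCancelled { get; set; }
}

public sealed class WaiterModel
{
    private readonly Dictionary<string, object?> _attributes = new();
    private readonly Dictionary<string, PendingWait> _waiters = new();
    public List<Reply> Replies { get; } = new();
    public int PendingCount => _waiters.Count;

    // Models the WaitForAttributeRequest handler: fast path, else register.
    public void OnWaitRequest(string cid, string name, Func<object?, bool> test)
    {
        if (_attributes.TryGetValue(name, out var current) && test(current))
        {
            Replies.Add(new Reply(cid, true, current, false));
            return;
        }
        _waiters[cid] = new PendingWait(name, test);
    }

    // Models HandleAttributeValueChanged: update state first, then re-test waiters.
    public void OnValueChanged(string name, object? value)
    {
        _attributes[name] = value;
        // Snapshot the matches so entries can be removed while iterating.
        foreach (var (cid, wait) in _waiters.Where(kv => kv.Value.AttributeName == name).ToList())
        {
            if (!wait.Test(value)) continue;
            wait.TimeoutCancelled = true;
            _waiters.Remove(cid);
            Replies.Add(new Reply(cid, true, value, false));
        }
    }

    // Models the WaitForAttributeTimeout handler: only answers if still registered.
    public void OnTimeout(string cid)
    {
        if (_waiters.Remove(cid))
            Replies.Add(new Reply(cid, false, null, true));
    }
}
```

Run as a .NET 6+ console program, this prints `w1:True, w2:False` followed by `pending=0`: the first waiter is answered by the change and ignores its late timeout, the second is answered by the timeout, and the registry ends empty.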
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
- Add a method that the accessor calls:
```csharp
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
    Func<object?, bool>? predicate, TimeSpan timeout)
{
    var cid = Guid.NewGuid().ToString();
    var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
        predicate, timeout, DateTimeOffset.UtcNow);
    // Ask bounded by the script timeout token so a script-deadline abort cancels the await.
    var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
        req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
    return resp.Matched;
}
```
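
How the requested `timeout` and the script token interact can be seen in isolation. This is a sketch, assuming .NET 6+ where `Task.WaitAsync(TimeSpan, CancellationToken)` is available; it stands in for the Akka.NET `Ask` and does not call it. `Bounded.WaitFor` and its string labels are hypothetical and exist only to name the three ways the await can end.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A reply that never arrives stands in for an attribute that never reaches the target.
var neverReplies = new TaskCompletionSource<bool>().Task;

// The script deadline (100 ms) is shorter than the requested wait (5 s), so it wins.
using var scriptCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
var outcome = await Bounded.WaitFor(neverReplies, TimeSpan.FromSeconds(5), scriptCts.Token);
Console.WriteLine(outcome);

public static class Bounded
{
    // Hypothetical helper: awaits the reply, bounded by both a timeout and a token.
    public static async Task<string> WaitFor(
        Task<bool> reply, TimeSpan waitTimeout, CancellationToken scriptToken)
    {
        try
        {
            var matched = await reply.WaitAsync(waitTimeout, scriptToken);
            return matched ? "matched" : "not-matched";
        }
        catch (OperationCanceledException)
        {
            return "script-deadline";   // the token fired before a reply arrived
        }
        catch (TimeoutException)
        {
            return "ask-timeout";       // the outer timeout elapsed with no reply
        }
    }
}
```

This prints `script-deadline`. In `WaitAttribute` as written, the equivalent cancellation surfaces as an exception out of `Ask` rather than a `false` return, which matches the intent that a script past its own deadline abandons the wait.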
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
```csharp
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
    => _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);

public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
    => _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
```
### 4.6 Trust model — no change
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
and `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify that the new helper
type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`); they don't.

---
## 5. Correctness notes
- **No missed edge.** Registration (current-value check) and change-handling both run on the
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
is caught by the fast-path check; a value that flips after registration is caught by
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
event-driven and cheaper.
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
the waiter is removed and the caller answered even if the value never changes. Match cancels the
scheduled timeout.
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
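
The best-effort cancel mentioned under **Cancellation** could be wired through a token callback. This is a sketch only: `CancelWaitForAttribute` is a hypothetical message that is not part of §4.1, a `Queue<object>` stands in for a `Tell` to the `InstanceActor`, and it is assumed that the actor would treat an unknown correlation ID as a no-op.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Stand-in for the actor mailbox.
var mailbox = new Queue<object>();
using var scriptCts = new CancellationTokenSource();
var cid = Guid.NewGuid().ToString();

// While the wait is pending, a token callback posts an eviction message for this waiter.
// Disposing the registration once the wait completes stops a later cancel from posting.
using (scriptCts.Token.Register(() => mailbox.Enqueue(new CancelWaitForAttribute(cid))))
{
    scriptCts.Cancel(); // simulate the script hitting its deadline mid-wait
}

Console.WriteLine(mailbox.Count);
Console.WriteLine(mailbox.Peek() is CancelWaitForAttribute c && c.CorrelationId == cid);

// Hypothetical eviction message keyed by the waiter's correlation ID.
public record CancelWaitForAttribute(string CorrelationId);
```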
---
## 6. Optional: inbound / routed variant
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
fully cover the DELMIA/MES use case.

---
## 7. Acceptance criteria
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
with no poll loop.
2. Returns `false` (no throw) when the value never matches within the timeout.
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
value in the same actor step as registration, and one step after).
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
composed child's attribute).
7. Passes `ScriptAnalysis` trust validation unchanged.
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
of the poll loop.

Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
the script-surface tests under `tests/…/SiteRuntime*`.