merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes

Joseph Doherty
2026-06-17 09:28:15 -04:00
88 changed files with 7714 additions and 169 deletions
@@ -0,0 +1,150 @@
# M5 — Audit Hardening (T3–T8) — Design
**Status:** Approved (awaiting plan).
**Worktree/branch:** `worktree-m5-audit-hardening` off `main` (`e77e209`).
**Source:** Phase-2 milestone M5 from `docs/plans/2026-06-15-stillpending-completion-design.md`.
## Goal
Harden the centralized Audit Log with six independent, ready-to-build items. Two
items originally listed under M5 — **T1 hash-chain tamper evidence** and **T2
Parquet export** — remain **deferred to v1.x** (per CLAUDE.md's audit design
decisions); their stubs (CLI `verify-chain` no-op, export `501`) stay unchanged.
## Scope (in)
T3 per-channel retention · T4 ParentExecutionId tag-cascade · T5 historical
backfill (reframed) · T6 per-node stuck KPIs · T7 structured response-capture
increments · T8 CLI `audit tree`.
## Scope (out / deferred to v1.x)
T1 hash-chain (no Hash/PrevHash columns, no real verify-chain), T2 Parquet
export (the `501` gate stays). Reversing those deferrals is a separate decision.
---
## Items
### T8 — CLI `audit tree` (smallest; reuses existing server walk + UI)
The recursive execution-tree walk (`IAuditLogRepository.GetExecutionTreeAsync`,
backed by `IX_AuditLog_ParentExecution`) and the Blazor `ExecutionTreePage`
already exist; only an HTTP projection + CLI surface are missing.
- **Server:** add `GET /api/audit/tree?executionId=…` in
`AuditEndpoints.MapAuditAPI` → `repo.GetExecutionTreeAsync` → serialize
`ExecutionTreeNode[]`.
- **CLI:** add `audit tree --execution-id <guid> [--format table|json]` in
`AuditCommands` + an `AuditTreeHelpers` renderer (indented ASCII tree for
`table`; raw nodes for `json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- No schema change. **Tests:** endpoint returns the tree; CLI renders a
multi-level tree + handles not-found.
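A minimal sketch of the `table` rendering, assuming a hypothetical flat node shape `(Id, ParentId, Label)`; the real `ExecutionTreeNode` fields are not shown in this design and may differ. `RenderTree` and the `+- ` prefix are illustrative choices, not the shipped `AuditTreeHelpers` output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical flat node list standing in for ExecutionTreeNode[].
var a = Guid.NewGuid();
var b = Guid.NewGuid();
var c = Guid.NewGuid();
var nodes = new List<(Guid Id, Guid? ParentId, string Label)>
{
    (a, null, "InboundApi"),
    (b, a, "ScriptA"),
    (c, b, "ScriptB"),
};

Console.Write(RenderTree(nodes));

// Renders a flat node list as an indented ASCII tree, two spaces per depth level.
static string RenderTree(IReadOnlyList<(Guid Id, Guid? ParentId, string Label)> flat)
{
    var ids = flat.Select(n => n.Id).ToHashSet();
    // Children grouped by parent id; a node whose parent is not in the set is treated as a root.
    var children = flat
        .Where(n => n.ParentId is Guid p && ids.Contains(p))
        .ToLookup(n => n.ParentId!.Value);
    var roots = flat.Where(n => n.ParentId is not Guid p || !ids.Contains(p));

    var sb = new StringBuilder();
    foreach (var root in roots) Walk(root, 0);
    return sb.ToString();

    void Walk((Guid Id, Guid? ParentId, string Label) node, int depth)
    {
        sb.Append(' ', depth * 2)
          .Append(depth == 0 ? "" : "+- ")
          .Append(node.Label)
          .Append('\n');
        foreach (var child in children[node.Id]) Walk(child, depth + 1);
    }
}
```

The sketch assumes the server returns an acyclic parent chain; a not-found execution id yields an empty list and therefore an empty string.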
### T6 — Per-node stuck-count KPIs
KPIs are per-site today; `SourceNode` is on the `Notification` and `SiteCalls`
rows but not aggregated.
- Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to the existing
`ComputePerSiteKpisAsync` in `NotificationOutboxRepository` and
`SiteCallAuditRepository`.
- New `PerNode…KpiRequest`/`Response` message pair per actor; register in each
actor's `Receive<>`.
- Surface a per-node breakdown on the existing KPI tiles
(`AuditKpiTiles`/`SiteCallKpiTiles`) — additive, behind the existing tiles.
- **Tests:** repository grouping returns correct per-node counts (stuck/parked/
queue-depth); message round-trip.
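The grouping itself can be sketched with in-memory LINQ. This is an illustration only: the row shape and the `"Stuck"`/`"Parked"` status values are assumptions, and the real `ComputePerNodeKpisAsync` would run as an EF Core query against the repository tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: only SourceNode is named in the design; Status values are assumed.
var rows = new List<(string? SourceNode, string Status)>
{
    ("node-a", "Stuck"),
    ("node-a", "Parked"),
    ("node-b", "Stuck"),
    ("node-a", "Stuck"),
    (null, "Stuck"),
};

var kpis = ComputePerNodeKpis(rows);
foreach (var kpi in kpis)
    Console.WriteLine($"{kpi.Node}: stuck={kpi.Stuck} parked={kpi.Parked}");

// Groups rows by SourceNode and counts stuck/parked rows per node.
// Rows with a NULL SourceNode are bucketed under "(unknown)" rather than dropped.
static List<(string Node, int Stuck, int Parked)> ComputePerNodeKpis(
    IEnumerable<(string? SourceNode, string Status)> source) =>
    source
        .GroupBy(r => r.SourceNode ?? "(unknown)")
        .Select(g => (
            Node: g.Key,
            Stuck: g.Count(r => r.Status == "Stuck"),
            Parked: g.Count(r => r.Status == "Parked")))
        .OrderBy(k => k.Node, StringComparer.Ordinal)
        .ToList();
```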
### T7 — Structured response-capture increments (no schema change)
- **(a) Inbound request headers** → captured into the existing `Extra` JSON in
`AuditWriteMiddleware.EmitInboundAudit`, passed through the existing header
redactor (auth headers redacted by default).
- **(b) `AuditInboundCeilingHits`** counter on `AuditCentralHealthSnapshot`
(alongside the existing failure counters), incremented when an inbound row
truncates (request or response hits `InboundMaxBytes`). Surfaced via the
health snapshot.
- **(c) Per-method opt-out** of body capture: a `SkipBodyCapture` flag on
`PerTargetRedactionOverride`, checked in the capture pipeline so a noisy/
sensitive method can suppress body capture (headers + metadata still recorded).
- **Tests:** request headers land in `Extra` and are redacted; ceiling-hit
increments the counter; opt-out suppresses body but keeps the row.
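Item (a) can be sketched as follows, assuming a hypothetical `RedactHeaders` helper and a `requestHeaders` key inside `Extra`; the existing header redactor's real name, sensitive-header list, and placeholder text are not given in this design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

var headers = new Dictionary<string, string>
{
    ["Authorization"] = "Bearer abc123",
    ["X-Request-Id"] = "42",
};

// "requestHeaders" is an assumed key name for the Extra JSON payload.
var extraJson = JsonSerializer.Serialize(new { requestHeaders = RedactHeaders(headers) });
Console.WriteLine(extraJson);

// Replaces the value of auth-style headers with a placeholder; all other headers pass through.
static Dictionary<string, string> RedactHeaders(IReadOnlyDictionary<string, string> source)
{
    // Assumed sensitive set; header names compare case-insensitively.
    var sensitive = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "Authorization", "Proxy-Authorization", "Cookie",
    };
    return source.ToDictionary(
        h => h.Key,
        h => sensitive.Contains(h.Key) ? "[REDACTED]" : h.Value);
}
```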
### T4 — `ParentExecutionId` tag-cascade (touches the actor model — high-risk)
Completes the execution tree beyond the inbound-API→routed-script case.
- **Alarm on-trigger:** thread a `Guid? parentExecutionId` through
`AlarmActor.SpawnAlarmExecutionActor` → `AlarmExecutionActor` →
`ScriptRuntimeContext`, so an alarm-triggered script chains to its firing
context (the alarm's own execution id where one exists; otherwise a root).
- **Nested `CallScript`/`CallShared`:** in `ScriptRuntimeContext`, pass **the
current run's `ExecutionId`** (not the inherited `_parentExecutionId`) as the
child invocation's `ParentExecutionId`, so `A → CallScript(B)` records B's
parent as A — a true multi-level tree.
- **Timer/expression-trigger top-level runs** stay roots (no spawner) — unchanged.
- **Tests:** alarm-triggered script row carries the expected parent; a 2-level
nested `CallScript` produces a chain A→B→C walkable by `GetExecutionTreeAsync`.
- **Risk:** serialized actor state + correlation plumbing; covered by targeted
SiteRuntime actor tests + a tree-walk integration assertion.
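The nested-call rule can be illustrated without the actor model. The sketch below uses a hypothetical call graph and a local `Run` function; it only shows that passing the current run's id (not the inherited parent) yields the chain A→B→C.

```csharp
using System;
using System.Collections.Generic;

// Emitted (Script, ExecutionId, ParentExecutionId) rows, standing in for audit rows.
var emitted = new List<(string Script, Guid ExecutionId, Guid? ParentExecutionId)>();

// Hypothetical call graph: A calls B, B calls C.
var calls = new Dictionary<string, string[]>
{
    ["A"] = new[] { "B" },
    ["B"] = new[] { "C" },
    ["C"] = Array.Empty<string>(),
};

Run("A", parentExecutionId: null);

foreach (var row in emitted)
    Console.WriteLine($"{row.Script}: id={row.ExecutionId} parent={row.ParentExecutionId}");

// Each run gets a fresh ExecutionId; nested calls receive THIS run's id as their parent,
// not the parent this run inherited.
void Run(string script, Guid? parentExecutionId)
{
    var executionId = Guid.NewGuid();
    emitted.Add((script, executionId, parentExecutionId));
    foreach (var child in calls[script])
        Run(child, parentExecutionId: executionId);
}
```

Had `Run` forwarded `parentExecutionId` instead of `executionId`, B and C would both report A's parent (here `null`) and the tree would collapse to a single level.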
### T3 — Per-channel retention overrides (one design wrinkle, resolved)
Retention is a single global `RetentionDays`; the purge actor switches out whole
month partitions by `OccurredAtUtc` (channel-blind).
- Add `PerChannelRetentionDays` (`Dictionary<string,int>`, keyed by channel /
`Action` name) to `AuditLogOptions`, validated like the global value; a channel
override may only be **shorter** than the global window (longer is meaningless
under month-partition switch-out, which is governed by the largest retention).
- **Mechanism (resolved):** after the coarse global partition purge, the purge
actor runs a **bounded row-level delete** for channels whose override is
shorter than global (`DELETE … WHERE Action=@channel AND OccurredAtUtc<@thr`,
batched). This runs from the **purge/maintenance path, not the writer role**:
the append-only invariant binds the writer/ingest role, not maintenance. The
**M2.10 CI grep-guard is widened** to allow the purge actor's single audited
deletion call site (an allow-list entry, not a blanket exemption).
- **Tests:** a channel with a shorter override is purged earlier than the global;
channels without an override follow the global; the guard still rejects
UPDATE/DELETE everywhere except the sanctioned purge site.
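A sketch of the validation and threshold computation, under stated assumptions: the channel names are made up, the lower bound of 1 day is a placeholder, and `ValidateOverrides`/`ComputeThresholds` are hypothetical helpers rather than the actual `AuditLogOptions` validator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var nowUtc = new DateTime(2026, 6, 17, 0, 0, 0, DateTimeKind.Utc);
var overrides = new Dictionary<string, int> { ["ApiInbound"] = 30, ["Notification"] = 400 };

var errors = ValidateOverrides(365, overrides);
foreach (var error in errors) Console.WriteLine(error);

var thresholds = ComputeThresholds(nowUtc, 365, overrides);
foreach (var (channel, threshold) in thresholds)
    Console.WriteLine($"{channel}: delete rows with OccurredAtUtc < {threshold:O}");

// Returns one error per invalid override; an empty list means every override is valid.
static List<string> ValidateOverrides(int globalDays, IReadOnlyDictionary<string, int> perChannel) =>
    perChannel
        .Where(kv => kv.Value < 1 || kv.Value >= globalDays)
        .Select(kv => $"{kv.Key}: {kv.Value} days must be >= 1 and shorter than the global {globalDays}")
        .ToList();

// Computes the row-level delete threshold for each valid override; invalid ones are skipped
// so those channels simply follow the global partition purge.
static Dictionary<string, DateTime> ComputeThresholds(
    DateTime nowUtc, int globalDays, IReadOnlyDictionary<string, int> perChannel) =>
    perChannel
        .Where(kv => kv.Value >= 1 && kv.Value < globalDays)
        .ToDictionary(kv => kv.Key, kv => nowUtc.AddDays(-kv.Value));
```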
### T5 — Historical backfill (reframed per the computed-column reality)
- **`SourceNode`** is a physical nullable column. For truly historical rows the
node-of-origin is **unknowable**, so the backfill sets a **configurable
sentinel** (default `"unknown"`) on `NULL` rows via a one-shot maintenance
command (run from the purge/maintenance path), rather than guessing a node.
- **`ExecutionId`/`ParentExecutionId`** are **persisted computed columns derived
from `DetailsJson`**; backfilling them means mutating the JSON, which
append-only forbids. These are **documented as a runbook limitation** (pre-feature
rows stay NULL) — no code.
- **Tests:** the SourceNode backfill sets the sentinel only on NULL rows within a
bounded range and is idempotent; documentation note added.
---
## Cross-cutting
- **Shared seams:** `AuditLogOptions` (T3, T7), `AuditEndpoints.MapAuditAPI`
(T8), `AuditCommands` (T8), `AuditCentralHealthSnapshot` (T6, T7),
`IAuditLogRepository`/the KPI repositories (T6), the purge/maintenance role
(T3, T5). No AuditLog **schema** change in M5 (T1/T2 deferred).
- **Append-only:** the only new mutations are T3's purge-role channel DELETE and
T5's purge-role sentinel UPDATE. Both run on the maintenance path and both are
reflected in the CI guard's allow-list. Writer/ingest paths stay INSERT-only.
## Testing strategy
Per-item unit + targeted integration tests (above). T4 additionally gets a
tree-walk integration assertion. Full-solution build + targeted suites at the
integration step. No new infra dependency (Parquet deferred).
## Sequencing
Independent items, parallelizable by disjoint area:
- **Wave A (parallel):** T8 (CLI+endpoint), T6 (KPI repos+actors+tiles), T7
(middleware+health+redaction-override) — disjoint projects.
- **Wave B (parallel):** T4 (SiteRuntime actors — high-risk), T3 (AuditLog
options+purge actor+CI guard), T5 (purge-path backfill command + runbook).
- **Wave C:** integration verification + docs (Component-AuditLog/-CLI, CLAUDE.md
KPI/retention notes, runbook).
## Risks
- **T4** actor-model correlation (serialized state) — targeted tests + tree-walk
assertion.
- **T3** append-only tension — resolved via maintenance-role delete + CI-guard
allow-list; verify the guard still blocks all other DELETE/UPDATE.
- **T5** node-of-origin unknowable — sentinel + documented limitation (no false
precision).
@@ -0,0 +1,92 @@
# M5 — Audit Hardening (T3–T8) Implementation Plan
> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
**Conventions:** targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
---
# Wave A — leverage-existing-infra (parallel; disjoint projects)
### Task M5.1 (T8): CLI `audit tree` + tree endpoint
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=<guid>` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
**AC:** `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
### Task M5.2 (T6): Per-node stuck-count KPIs
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
**Files:**
- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
### Task M5.3 (T7): Structured response-capture increments
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
---
# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
### Task M5.4 (T4): ParentExecutionId tag-cascade
**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
### Task M5.5 (T3): Per-channel retention overrides
**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
### Task M5.6 (T5): SourceNode sentinel backfill + runbook
**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
**Files:**
- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
---
# Wave C — integration + docs
### Task M5.7: Integration verification + docs
**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1–M5.6
**Steps:**
1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
---
## Native tasks & dependencies
Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
@@ -0,0 +1,13 @@
{
"planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
"tasks": [
{"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
{"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
{"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
{"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
{"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
{"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
{"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
],
"lastUpdated": "2026-06-16"
}
@@ -0,0 +1,264 @@
# Patch request — event-driven "wait for attribute change (with timeout)" script helper
**Date:** 2026-06-17
**Type:** Source enhancement (small, additive) to the SiteRuntime script surface
**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
**event-driven** for a tag/attribute to reach a value, bounded by a timeout.
> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
---
## 1. The gap
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.
ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:
```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
await Task.Delay(200, CancellationToken);
}
```
Why this is unsatisfactory:
- **Latency** — completion is detected up to one poll interval late (200 ms here).
- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
`InstanceActor`); N handshakes × M polls = a lot of needless messages.
- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
(forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.
Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.
---
## 2. Where the change goes (verified wiring)
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` in `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` in `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate` for `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |
**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".
---
## 3. Proposed API (script-facing)
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
path resolution (`Resolve(name)`) applies just like get/set:
```csharp
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
// within the timeout, false if it timed out. Honors the script CancellationToken.
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
// Optional richer overload that also returns the matched value + quality.
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
// record WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
```
> **Status:** IMPLEMENTED. `Attributes.WaitForAsync(...)` returns a `WaitResult`
> (`readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut)`
> in Commons), populated on match (Value + Quality) and `Matched:false, TimedOut:true` on timeout.
Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
exception. The value-equality overload is the one the handshake needs and is the one that can also
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
Handshake, rewritten (replaces the §1 poll loop):
```csharp
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
return new {
Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
};
```
```csharp
await Attributes.SetAsync("MoveInFlag", true);
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
```
---
## 4. Implementation outline (the patch)
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
```csharp
// actor protocol (site-local; delegate is fine because messaging is in-process)
public record WaitForAttributeRequest(
string CorrelationId,
string InstanceName,
string AttributeName, // already scope-resolved by the accessor
string? TargetValueEncoded, // AttributeValueCodec.Encode(targetValue); null = "any change"
Func<object?, bool>? Predicate, // local-only; null when TargetValueEncoded is used
TimeSpan Timeout,
DateTimeOffset OccurredAtUtc);
public record WaitForAttributeResponse(
string CorrelationId,
bool Matched,
object? Value,
string Quality,
bool TimedOut,
string? ErrorMessage = null);
// internal self-message used to fire the timeout
public record WaitForAttributeTimeout(string CorrelationId);
```
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
`PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
- **Handle `WaitForAttributeRequest`:**
1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
use `Predicate`).
2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
`WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
3. Otherwise bound the timeout first:
`effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` (the caller's `Ask`
already carries the script token; see §4.3). Then register the waiter and schedule the timeout:
`Context.System.Scheduler.ScheduleTellOnce(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
storing the returned `ICancelable`. Capture `Sender` now (it is invalid later).
4. Optionally cap the number of concurrent waiters per instance (defensive; reply with
`ErrorMessage` if exceeded).
- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
allow removal during enumeration.)
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
`WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
Bad-quality transients.
> **Status:** IMPLEMENTED as an opt-in `requireGoodQuality` parameter on `WaitAsync`/`WaitForAsync`
> (additive trailing `RequireGoodQuality` field on `WaitForAttributeRequest`, gated at both the
> fast-path and resolve-loop match sites). Default `false` = quality-agnostic (matches on value only).
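
The registry behavior described in this section can be sketched in plain C#, without Akka.NET. This is a simplified illustration: a `TaskCompletionSource<bool>` stands in for the captured `Sender`, the timeout is fired by calling `OnTimeout` directly instead of through the scheduler, and `Register`, `OnAttributeChanged`, and `OnTimeout` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// State the single-threaded owner would hold (names are illustrative).
var attributes = new Dictionary<string, object?>();
var waiters = new Dictionary<string,
    (string Attribute, Func<object?, bool> Test, TaskCompletionSource<bool> Reply)>();

// Case 1: the value already matches, so registration completes on the fast path.
attributes["RecipeProcessedFlag"] = true;
var fast = Register("w1", "RecipeProcessedFlag", v => Equals(v, true));

// Case 2: the value changes after registration, so the change handler resolves the waiter.
attributes["MoveInCompleteFlag"] = false;
var late = Register("w2", "MoveInCompleteFlag", v => Equals(v, true));
OnAttributeChanged("MoveInCompleteFlag", true);

// Case 3: the value never matches, so the timeout resolves the waiter with false.
var never = Register("w3", "OtherFlag", v => Equals(v, true));
OnTimeout("w3");

Console.WriteLine($"fast={fast.Result} late={late.Result} never={never.Result} pending={waiters.Count}");

// Registers a one-shot waiter; checks the current value first so no edge is missed.
Task<bool> Register(string correlationId, string attribute, Func<object?, bool> test)
{
    var reply = new TaskCompletionSource<bool>();
    if (attributes.TryGetValue(attribute, out var current) && test(current))
    {
        reply.SetResult(true); // fast path: already satisfied
        return reply.Task;
    }
    waiters[correlationId] = (attribute, test, reply);
    return reply.Task;
}

// Called with the new value; resolves and removes every waiter whose test now passes.
void OnAttributeChanged(string attribute, object? value)
{
    attributes[attribute] = value;
    // Snapshot the candidates so entries can be removed while iterating.
    foreach (var (id, w) in waiters.Where(kv => kv.Value.Attribute == attribute).ToList())
    {
        if (!w.Test(value)) continue;
        waiters.Remove(id);
        w.Reply.SetResult(true);
    }
}

// Timeout self-message: if the waiter is still registered, answer false and evict it.
void OnTimeout(string correlationId)
{
    if (waiters.Remove(correlationId, out var w))
        w.Reply.SetResult(false);
}
```

Because every function here runs on one thread, the check in `Register` and the re-evaluation in `OnAttributeChanged` cannot interleave, which is the property the actor's single-threaded mailbox provides in the real design.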
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
- Add a method that the accessor calls:
```csharp
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
Func<object?,bool>? predicate, TimeSpan timeout)
{
var cid = Guid.NewGuid().ToString();
var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
predicate, timeout, DateTimeOffset.UtcNow);
// Ask bounded by the script timeout token so a script-deadline abort cancels the await.
var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
return resp.Matched;
}
```
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
```csharp
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
```
### 4.6 Trust model — no change
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
+ `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify the new helper
type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`) — they don't.
---
## 5. Correctness notes
- **No missed edge.** Registration (current-value check) and change-handling both run on the
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
is caught by the fast-path check; a value that flips after registration is caught by
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
event-driven and cheaper.
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
the waiter is removed and the caller answered even if the value never changes. Match cancels the
scheduled timeout.
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
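The "shorter deadline wins" behavior from the cancellation note can be sketched with `Task.WaitAsync` (.NET 6+). This illustrates the bounding semantics only; it is not the Akka.NET `Ask` call from §4.3, and `WaitBounded` plus its string outcomes are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A reply that never arrives, standing in for an attribute that never matches.
var neverReplies = new TaskCompletionSource<bool>();

// The script deadline (100 ms) is shorter than the requested wait (5 s), so it wins.
using var scriptDeadline = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
var outcome = await WaitBounded(neverReplies.Task, TimeSpan.FromSeconds(5), scriptDeadline.Token);
Console.WriteLine(outcome);

// Returns "matched", "timed-out" (the wait's own timeout elapsed or the reply said no match),
// or "cancelled" (the script deadline fired first).
static async Task<string> WaitBounded(Task<bool> reply, TimeSpan timeout, CancellationToken scriptToken)
{
    try
    {
        return await reply.WaitAsync(timeout, scriptToken) ? "matched" : "timed-out";
    }
    catch (TimeoutException)
    {
        return "timed-out";
    }
    catch (OperationCanceledException)
    {
        return "cancelled";
    }
}
```

In this sketch an abandoned wait leaves the underlying reply task untouched, which mirrors the orphan-waiter concern above: the caller stops waiting, but the registered waiter still needs its own eviction path.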
---
## 6. Optional: inbound / routed variant
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
fully cover the DELMIA/MES use case.
> **Status:** IMPLEMENTED. `Route.To(code).WaitForAttribute(name, targetValue, timeout)` is wired
> end-to-end (`RouteToWaitForAttributeRequest/Response` → `IInstanceRouter` → `CommunicationService`
> → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only
> across the wire. NOT wired into the CentralUI Test-Run sandbox — that remains a follow-up.
---
## 7. Acceptance criteria
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
with no poll loop.
2. Returns `false` (no throw) when the value never matches within the timeout.
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
value in the same actor step as registration, and one step after).
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
composed child's attribute).
7. Passes `ScriptAnalysis` trust validation unchanged.
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
of the poll loop.
Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
the script-surface tests under `tests/…/SiteRuntime*`.
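
Criteria 2 and 3 together imply two different outcomes: the wait's own timeout yields `false`, while the script deadline cancels the wait. A minimal sketch of that distinction, assuming the script deadline is exposed as a `CancellationToken`; `BoundedWaitSketch` and `WaitBounded` are hypothetical names, and the real helper goes through an actor `Ask` rather than a bare `Task`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; the real helper bounds an actor Ask, not a raw Task.
public static class BoundedWaitSketch
{
    // Returns true when `signal` completes in time, false when the wait timeout elapses.
    // Cancellation of `scriptToken` is NOT swallowed: it surfaces as OperationCanceledException.
    public static async Task<bool> WaitBounded(Task signal, TimeSpan waitTimeout, CancellationToken scriptToken)
    {
        try
        {
            await signal.WaitAsync(waitTimeout, scriptToken);
            return true;
        }
        catch (TimeoutException)
        {
            return false; // the wait's own timeout: report "no match", do not throw
        }
    }
}
```

Whichever bound fires first wins, so a script deadline shorter than the wait timeout ends the wait early with a cancellation rather than a `false` result.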
```
@@ -0,0 +1,226 @@
# WaitAsync Deferred Optional Items — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (subagent-driven) to implement this plan task-by-task.
**Goal:** Implement the three items deferred from the WaitAsync spec (`docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md`): §3 `WaitForAsync`/`WaitResult` richer overload, §4.2 quality-gated ("Good"-only) matching, and §6 inbound/routed `Route.To(...).WaitForAttribute` variant.
**Architecture:** Builds on the shipped core (`b89d69a` → `04e97f4`). Two of the items (§3, §4.2) are site-local enrichments of the existing `Attributes` script surface + `InstanceActor` waiter; no new actor protocol shapes beyond an additive `RequireGoodQuality` field. The third (§6) mirrors the existing `Route.To(...).GetAttributes` cross-cluster path end-to-end (`RouteTarget` → `IInstanceRouter` → `CommunicationService` → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only across the wire, with the cluster Ask bounded by the *wait* timeout rather than the generic integration timeout.
**Tech Stack:** C#/.NET 10, Akka.NET 1.5, xUnit + Akka.TestKit + NSubstitute.
**Branch/worktree:** `waitfor-attr-helper` at `/Users/dohertj2/Desktop/ScadaBridge/.claude/worktrees/waitfor-attr-helper` (off local main; carries the core feature). Implementers do NOT create worktrees, commit in **pathspec form** (`git commit -m "…" -- <paths>`), do NOT push, and do NOT touch main. Targeted builds/tests per task; full-solution build only in WD-3.
---
## Naming / shared shapes
- New script return type `WaitResult` (Commons): `public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);`
- `WaitForAttributeRequest` gains a trailing additive field `bool RequireGoodQuality = false` (site-local request). `RequireGoodQuality` semantics: a match requires the value test to pass **and** `string.Equals(quality, "Good", StringComparison.Ordinal)`.
- Routed contract (value-equality only, no predicate, no quality flag across the wire — §6 says value-equality only): `RouteToWaitForAttributeRequest` / `RouteToWaitForAttributeResponse` (Commons `Messages/InboundApi`).
- The `WaitForAttributeResponse.Quality` field is already `string?` (null on timeout/error).
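
As a rough illustration of how these shapes fit together, the sketch below pairs the `WaitResult` record from the list above with the `RequireGoodQuality` rule. `QualityGateSketch`, `IsMatch`, and `Describe` are hypothetical names introduced only for this example.

```csharp
using System;

// Shape copied from the list above.
public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);

// Hypothetical helper names; only the matching rule itself comes from the plan.
public static class QualityGateSketch
{
    // A match needs the value test to pass AND, when gated, an exact ordinal "Good" quality.
    public static bool IsMatch(object? value, string? quality, Func<object?, bool> test, bool requireGoodQuality)
    {
        bool qualityOk = !requireGoodQuality || string.Equals(quality, "Good", StringComparison.Ordinal);
        return qualityOk && test(value);
    }

    // One way a caller might branch on the richer result.
    public static string Describe(WaitResult result) =>
        result.Matched ? $"matched value={result.Value} quality={result.Quality}"
        : result.TimedOut ? "timed out"
        : "not matched";
}
```

The ordinal comparison means a quality string such as `"good"` does not satisfy the gate, and a `null` quality (timeout or error) never does.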
---
## Execution waves
- **Wave 1 (parallel, disjoint files):** WD-1 ∥ WD-2a. (2 concurrent committers; post-wave HEAD-presence check.)
- **Wave 2:** WD-2b (after WD-2a).
- **Wave 3:** WD-3 (after WD-1, WD-2a, WD-2b).
WD-1 must add `RequireGoodQuality` ONLY as a **trailing defaulted** ctor param of `WaitForAttributeRequest`, so WD-2b's `new WaitForAttributeRequest(...)` (built in wave 2) compiles regardless.
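
A minimal sketch of why the trailing default keeps both waves compiling. `WaitForAttributeRequestSketch` is a hypothetical stand-in: the parameter order follows the constructor call shown in WD-2b, but the property name `Predicate` and the sample argument values are assumptions.

```csharp
using System;

// Hypothetical stand-in for WaitForAttributeRequest; `Predicate` is an assumed name.
public record WaitForAttributeRequestSketch(
    string CorrelationId, string InstanceUniqueName, string AttributeName,
    string? TargetValueEncoded, Func<object?, bool>? Predicate, TimeSpan Timeout,
    DateTimeOffset Timestamp,
    bool RequireGoodQuality = false); // trailing + defaulted: existing call sites still compile

public static class AdditiveFieldSketch
{
    // A call written before the field existed: seven positional arguments.
    public static WaitForAttributeRequestSketch SevenArgCall() =>
        new("c-1", "inst", "Flag", "encoded-target", null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow);

    // A caller that opts in to the new behaviour.
    public static WaitForAttributeRequestSketch GatedCall() =>
        SevenArgCall() with { RequireGoodQuality = true };
}
```

Optional parameters are resolved at compile time, so this guarantee is about source compatibility within one build; it says nothing about binaries compiled against the older signature.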
---
### Task WD-1: Site-local `WaitForAsync` + `WaitResult` + quality-gated mode (§3 + §4.2)
**Classification:** high-risk (modifies the `InstanceActor` single-threaded match evaluation + an additive message-contract field)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-2a
**Files:**
- Create: `src/ZB.MOM.WW.ScadaBridge.Commons/Types/WaitResult.cs`
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Instance/WaitForAttribute.cs` (add trailing `bool RequireGoodQuality = false` to `WaitForAttributeRequest`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/InstanceActor.cs` (thread `RequireGoodQuality` into `PendingWait` + both match sites)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (add `WaitAttributeFull` returning `WaitResult`; add `requireGoodQuality` param)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScopeAccessors.cs` (add `WaitForAsync` overloads + `requireGoodQuality` optional param on `WaitAsync`)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/InstanceActorWaitForAttributeTests.cs` + `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Scripts/ScopeAccessorTests.cs`
**Steps (TDD):**
1. **`WaitResult`** — add the readonly record struct above.
2. **`WaitForAttributeRequest`** — add trailing `bool RequireGoodQuality = false`. Keep the `Func<>` predicate field as-is. Update the XML-doc.
3. **`InstanceActor`** — add `bool RequireGoodQuality` to the `PendingWait` record. At BOTH match sites build the effective match as:
```csharp
// fast-path (HandleWaitForAttribute): quality from _attributeQualities.GetValueOrDefault(name, <existing default>)
// resolve loop (ResolveMatchedWaiters): quality from changed.Quality
bool QualityOk(string? q) => !requireGoodQuality || string.Equals(q, "Good", StringComparison.Ordinal);
bool matched = QualityOk(quality) && test(value); // keep test() inside its existing try/catch
```
Store `RequireGoodQuality` on the `PendingWait` so the resolve loop can read it. Keep the throwing-predicate guard: the `QualityOk(quality) && test(value)` expression must still sit inside the existing try/catch. On the fast path, a quality failure under `requireGoodQuality` is just a non-match: register the waiter and schedule the timeout as normal (do NOT fast-reply matched).
4. **`ScriptRuntimeContext`** — refactor: a private `Task<WaitForAttributeResponse> WaitInternal(name, encoded, predicate, timeout, requireGoodQuality)` that does the token-bounded `Ask` (keep the existing `AskTimeoutException → ...` handling; on AskTimeout return a synthetic `WaitForAttributeResponse(.., Matched:false, TimedOut:true)`). Then:
```csharp
public async Task<bool> WaitAttribute(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
=> (await WaitInternal(name, enc, pred, t, requireGoodQuality)).Matched;
public async Task<WaitResult> WaitAttributeFull(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
{ var r = await WaitInternal(name, enc, pred, t, requireGoodQuality); return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut); }
```
(Note: `WaitAttribute`'s existing `AskTimeoutException → return false` must be preserved — fold it into `WaitInternal` returning a non-matched/timed-out response, OR catch in both. Do NOT catch `OperationCanceledException`/`TaskCanceledException`.)
5. **`AttributeAccessor`** — add `requireGoodQuality` optional param to both existing `WaitAsync` overloads, and add two `WaitForAsync` overloads:
```csharp
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
public Task<WaitResult> WaitForAsync(string key, Func<object?,bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
```
XML-doc: `requireGoodQuality:true` ignores Bad/Uncertain-quality transients.
6. **Tests** (extend existing files): (a) `WaitForAsync` returns a populated `WaitResult` on match (Value+Quality) and on timeout (`Matched:false, TimedOut:true`). (b) quality-gated: a value reaching target at **Bad** quality does NOT match when `requireGoodQuality:true` (stays pending → times out), but DOES match when `false`; and matches when it reaches target at Good quality. Cover both fast-path (already-at-target-but-Bad) and change-match. (c) scope resolution still applied for `WaitForAsync`.
7. Build `Commons` + `SiteRuntime` + the SiteRuntime test project; run `--filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync"` and the `~InstanceActor|~ScopeAccessor` regression filter. All green.
8. Commit (pathspec).
---
### Task WD-2a: Routed contract + central path (§6, part 1)
**Classification:** high-risk (cross-cluster message contract + `IInstanceRouter` surface)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-1
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs` (add the two records)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/IInstanceRouter.cs` (add method)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/CommunicationServiceInstanceRouter.cs` (delegate)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs` (`RouteTarget.WaitForAttribute`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/CommunicationService.cs` (`RouteToWaitForAttributeAsync` — **wait-timeout-aware** Ask)
- Modify (compile-break fixes — interface gained a member): `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Integration/ParentExecutionIdCorrelationTests.cs` (`BridgingInstanceRouter`) and the inline `IInstanceRouter` double in `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointContentTypeTests.cs`
- Test: `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/RouteHelperTests.cs`
**Steps (TDD):**
1. **Commons records** (mirror `RouteToGetAttributes*`, value-equality only):
```csharp
public record RouteToWaitForAttributeRequest(
string CorrelationId, string InstanceUniqueName, string AttributeName,
string? TargetValueEncoded, TimeSpan Timeout, DateTimeOffset Timestamp,
Guid? ParentExecutionId = null);
public record RouteToWaitForAttributeResponse(
string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
```
(`Success`/`ErrorMessage` = routing-level outcome, e.g. instance-not-found; `Matched`/`TimedOut`/`Value`/`Quality` = wait outcome.)
2. **`IInstanceRouter`** — add `Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);`. **Update all 3 implementers** (prod `CommunicationServiceInstanceRouter` + the 2 test doubles listed above; the test doubles can return a canned response / throw NotImplemented only if never exercised — prefer a sane canned response).
3. **`CommunicationServiceInstanceRouter`** — delegate to `_communicationService.RouteToWaitForAttributeAsync(...)`.
4. **`RouteHelper.RouteTarget`** — add (mirror `GetAttributes`, throw on `!Success`):
```csharp
public async Task<bool> WaitForAttribute(string attributeName, object? targetValue, TimeSpan timeout, CancellationToken cancellationToken = default)
{
var token = Effective(cancellationToken);
var siteId = await ResolveSiteAsync(token);
var request = new RouteToWaitForAttributeRequest(Guid.NewGuid().ToString(), _instanceCode,
attributeName, AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow, _parentExecutionId);
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
if (!response.Success) throw new InvalidOperationException(response.ErrorMessage ?? "Remote attribute wait failed");
return response.Matched;
}
```
(`AttributeValueCodec` is in Commons.Types — add the using if needed.)
5. **`CommunicationService.RouteToWaitForAttributeAsync`** — mirror `RouteToGetAttributesAsync` BUT bound the Ask by the wait timeout, not the generic integration timeout:
```csharp
var envelope = new SiteEnvelope(siteId, request);
var askTimeout = request.Timeout + _options.IntegrationTimeout; // slack beyond the wait
return await GetActor().Ask<RouteToWaitForAttributeResponse>(envelope, askTimeout, cancellationToken);
```
6. **Test** (`RouteHelperTests`): with a substitute `IInstanceRouter` returning a canned `RouteToWaitForAttributeResponse(Matched:true,...)`, `Route.To("x").WaitForAttribute("Flag", true, 30s)` returns true; `Success:false` → throws `InvalidOperationException`; the encoded target equals `AttributeValueCodec.Encode(true)`.
7. Build `Commons` + `InboundAPI` + `Communication` + the two affected test projects; run `--filter "FullyQualifiedName~RouteHelper"` + a build of AuditLog.Tests/InboundAPI.Tests to confirm the interface-addition compiles. Commit (pathspec).
---
### Task WD-2b: Site unpacking + handler (§6, part 2)
**Classification:** high-risk (actor handler crossing into `InstanceActor`; Ask-timeout correctness)
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-2a
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/Actors/SiteCommunicationActor.cs` (add `Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));` next to the other RouteTo forwards ~line 145)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (`Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);` + handler)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/DeploymentManagerActorTests.cs`
**Steps (TDD):**
1. **`SiteCommunicationActor`** — add the `Receive`/Forward line.
2. **`DeploymentManagerActor.RouteInboundApiWaitForAttribute`** — mirror `RouteInboundApiGetAttributes`:
```csharp
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
{
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
{
Sender.Tell(new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false,
false, $"Instance '{request.InstanceUniqueName}' not found on this site.", DateTimeOffset.UtcNow));
return;
}
var sender = Sender;
var inner = new WaitForAttributeRequest(request.CorrelationId, request.InstanceUniqueName,
request.AttributeName, request.TargetValueEncoded, null /*predicate*/, request.Timeout,
DateTimeOffset.UtcNow /*, RequireGoodQuality defaults false */);
// Ask bounded by the WAIT timeout + slack (NOT a fixed 30s).
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
.ContinueWith(t => t.IsCompletedSuccessfully
? new RouteToWaitForAttributeResponse(request.CorrelationId, t.Result.Matched, t.Result.Value,
t.Result.Quality, t.Result.TimedOut, true, null, DateTimeOffset.UtcNow)
: new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false, false,
t.Exception?.GetBaseException().Message ?? "Attribute wait timed out", DateTimeOffset.UtcNow))
.PipeTo(sender);
}
```
(`WaitForAttributeRequest` lives in Commons `Messages/Instance` — add the using. Build with both the trailing-`RequireGoodQuality` and pre-field signatures in mind; passing 7 positional args + default is fine.)
3. **Test** (`DeploymentManagerActorTests`, mirror the routed get-attributes test): deploy/register an instance whose attribute already equals the target → `RouteToWaitForAttributeRequest` → `RouteToWaitForAttributeResponse(Success:true, Matched:true)`; unknown instance → `Success:false`.
4. Build `Communication` + `SiteRuntime` + SiteRuntime test project; run `--filter "FullyQualifiedName~DeploymentManagerActor"`. Commit (pathspec).
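
The two Ask bounds from WD-2a and WD-2b stack on top of the wait itself. The sketch below only does the arithmetic; `TimeoutBudgetSketch` and its members are hypothetical, and the sample `integrationTimeout` values are assumptions rather than the configured `_options.IntegrationTimeout`.

```csharp
using System;

// Hypothetical helper that restates the two Ask bounds as plain arithmetic.
public static class TimeoutBudgetSketch
{
    // Site side (WD-2b): the Ask to the InstanceActor is the wait plus a fixed 5 s slack.
    public static TimeSpan SiteAsk(TimeSpan wait) => wait + TimeSpan.FromSeconds(5);

    // Central side (WD-2a): the cluster Ask is the wait plus the generic integration timeout.
    public static TimeSpan CentralAsk(TimeSpan wait, TimeSpan integrationTimeout) => wait + integrationTimeout;

    // True when each outer layer outlasts the layer it wraps.
    public static bool IsNested(TimeSpan wait, TimeSpan integrationTimeout) =>
        wait < SiteAsk(wait) && SiteAsk(wait) < CentralAsk(wait, integrationTimeout);
}
```

Under these formulas the layers nest cleanly only when the integration timeout exceeds the 5 s site slack; with a smaller value the central Ask could expire before the site has reported its own timeout, which may be worth confirming against the configured value.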
---
### Task WD-3: Integration — docs + full verification
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-1, WD-2a, WD-2b
**Files:**
- Modify: `docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md` (mark §3 `WaitForAsync`/`WaitResult`, §4.2 quality-gated mode, and §6 routed variant as IMPLEMENTED; note Test-Run sandbox parity excluded)
- Modify: `docs/requirements/Component-SiteRuntime.md` (script-surface note: `Attributes.WaitForAsync` + `requireGoodQuality`) and `docs/requirements/Component-InboundAPI.md` (`Route.To(...).WaitForAttribute`) — brief, only if those docs enumerate the script surface
- (No new component, no migration, no docker config change)
**Steps:**
1. Update the spec doc + component docs as above.
2. **Full-solution build:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` — 0 errors.
3. **Targeted test sweep** across everything touched:
`dotnet test tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/... --filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync|FullyQualifiedName~DeploymentManagerActor"`,
`dotnet test tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/... --filter "FullyQualifiedName~RouteHelper"`,
and a build of `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` + `tests/ZB.MOM.WW.ScadaBridge.Communication.Tests` to confirm no compile/regression from the interface addition.
4. `git diff` review; commit (pathspec).
---
## Out of scope (explicit)
- Routed `WaitForAttribute` is NOT wired into the CentralUI Test-Run sandbox (`ISandboxInstanceGateway`/`SandboxInstanceGateway`); production inbound scripts get it. Follow-up if Test-Run parity is wanted.
- No predicate or quality flag across the wire (§6 is value-equality only, per spec).
- No docker redeploy (no cluster-runtime config change; additive script surface only).
@@ -0,0 +1,10 @@
{
"planPath": "docs/plans/2026-06-17-waitfor-deferred-items.md",
"tasks": [
{"id": 1, "subject": "WD-1: site-local WaitForAsync + WaitResult + quality-gated mode (§3+§4.2)", "classification": "high-risk", "status": "pending", "parallelizableWith": [2]},
{"id": 2, "subject": "WD-2a: routed contract + central path (§6 part 1)", "classification": "high-risk", "status": "pending", "parallelizableWith": [1]},
{"id": 3, "subject": "WD-2b: site unpacking + DeploymentManager handler (§6 part 2)", "classification": "high-risk", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "WD-3: integration — docs + full verification", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]}
],
"lastUpdated": "2026-06-17"
}