merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes

2026-06-17 09:28:15 -04:00
parent bf2f481bb4 11534089b9
commit af54c8ad11
88 changed files with 7714 additions and 169 deletions
@@ -0,0 +1,150 @@
+# M5 — Audit Hardening (T3–T8) — Design
+
+**Status:** Approved (awaiting plan).
+**Worktree/branch:** `worktree-m5-audit-hardening` off `main` (`e77e209`).
+**Source:** Phase-2 milestone M5 from `docs/plans/2026-06-15-stillpending-completion-design.md`.
+
+## Goal
+
+Harden the centralized Audit Log with six independent, ready-to-build items. Two
+items originally listed under M5 — **T1 hash-chain tamper evidence** and **T2
+Parquet export** — remain **deferred to v1.x** (per CLAUDE.md's audit design
+decisions); their stubs (CLI `verify-chain` no-op, export `501`) stay unchanged.
+
+## Scope (in)
+
+T3 per-channel retention · T4 ParentExecutionId tag-cascade · T5 historical
+backfill (reframed) · T6 per-node stuck KPIs · T7 structured response-capture
+increments · T8 CLI `audit tree`.
+
+## Scope (out / deferred to v1.x)
+
+T1 hash-chain (no Hash/PrevHash columns, no real verify-chain), T2 Parquet
+export (the `501` gate stays). Reversing those deferrals is a separate decision.
+
+---
+
+## Items
+
+### T8 — CLI `audit tree` (smallest; reuses existing server walk + UI)
+The recursive execution-tree walk (`IAuditLogRepository.GetExecutionTreeAsync`,
+backed by `IX_AuditLog_ParentExecution`) and the Blazor `ExecutionTreePage`
+already exist; only an HTTP projection + CLI surface are missing.
+- **Server:** add `GET /api/audit/tree?executionId=…` in
+  `AuditEndpoints.MapAuditAPI` → `repo.GetExecutionTreeAsync` → serialize
+  `ExecutionTreeNode[]`.
+- **CLI:** add `audit tree --execution-id <guid> [--format table|json]` in
+  `AuditCommands` + an `AuditTreeHelpers` renderer (indented ASCII tree for
+  `table`; raw nodes for `json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
+- No schema change. **Tests:** endpoint returns the tree; CLI renders a
+  multi-level tree + handles not-found.
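+
+A minimal sketch of the `table` rendering described above, assuming the endpoint returns a flat
+node list in which each node carries its own id and an optional parent id. `TreeNode`,
+`TreeRenderSketch`, and the `Label` field are hypothetical names for illustration; the real
+`ExecutionTreeNode` shape and the `AuditTreeHelpers` signature may differ. The sketch also assumes
+the result set is acyclic.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Text;
+
+// Hypothetical node shape; the real ExecutionTreeNode may carry different fields.
+public sealed record TreeNode(Guid ExecutionId, Guid? ParentExecutionId, string Label);
+
+public static class TreeRenderSketch
+{
+    // Renders a flat node list as an indented ASCII tree, two spaces per depth level.
+    public static string Render(IReadOnlyList<TreeNode> nodes)
+    {
+        var ids = nodes.Select(n => n.ExecutionId).ToHashSet();
+        // Index children by parent id, ignoring parents that are not in this result set.
+        var children = nodes
+            .Where(n => n.ParentExecutionId.HasValue && ids.Contains(n.ParentExecutionId.Value))
+            .ToLookup(n => n.ParentExecutionId!.Value);
+        var sb = new StringBuilder();
+
+        void Walk(TreeNode node, int depth)
+        {
+            sb.Append(' ', depth * 2).Append(depth == 0 ? "" : "+- ").AppendLine(node.Label);
+            foreach (var child in children[node.ExecutionId]) Walk(child, depth + 1);
+        }
+
+        // Roots: no parent, or a parent that is absent from the result set.
+        foreach (var root in nodes.Where(n =>
+                     n.ParentExecutionId is null || !ids.Contains(n.ParentExecutionId.Value)))
+            Walk(root, 0);
+        return sb.ToString();
+    }
+}
+```
+
+Treating a node whose parent is missing from the result set as a root keeps a partial walk
+printable instead of silently dropping the orphaned subtree.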
+
+### T6 — Per-node stuck-count KPIs
+KPIs are per-site today; `SourceNode` is on the `Notification` and `SiteCalls`
+rows but not aggregated.
+- Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to the existing
+  `ComputePerSiteKpisAsync` in `NotificationOutboxRepository` and
+  `SiteCallAuditRepository`.
+- New `PerNode…KpiRequest`/`Response` message pair per actor; register in each
+  actor's `Receive<>`.
+- Surface a per-node breakdown on the existing KPI tiles
+  (`AuditKpiTiles`/`SiteCallKpiTiles`) — additive, behind the existing tiles.
+- **Tests:** repository grouping returns correct per-node counts (stuck/parked/
+  queue-depth); message round-trip.
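+
+The grouping itself can be sketched in memory as below. This is an illustration only:
+`OutboxRow`, `PerNodeKpi`, and the `IsStuck`/`IsParked` flags are hypothetical shapes, the real
+repositories would presumably express the same `GROUP BY SourceNode` through EF Core, and
+bucketing `NULL` nodes under `"unknown"` is an assumption borrowed from T5's default sentinel.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Hypothetical row and result shapes, for illustration only.
+public sealed record OutboxRow(string? SourceNode, bool IsStuck, bool IsParked);
+public sealed record PerNodeKpi(string SourceNode, int Stuck, int Parked, int QueueDepth);
+
+public static class PerNodeKpiSketch
+{
+    // One KPI row per SourceNode; rows without a node land in an "unknown" bucket (assumption).
+    public static IReadOnlyList<PerNodeKpi> Compute(IEnumerable<OutboxRow> rows) =>
+        rows.GroupBy(r => r.SourceNode ?? "unknown")
+            .Select(g => new PerNodeKpi(
+                g.Key,
+                g.Count(r => r.IsStuck),
+                g.Count(r => r.IsParked),
+                g.Count()))
+            .OrderBy(k => k.SourceNode, StringComparer.Ordinal)
+            .ToList();
+}
+```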
+
+### T7 — Structured response-capture increments (no schema change)
+- **(a) Inbound request headers** → captured into the existing `Extra` JSON in
+  `AuditWriteMiddleware.EmitInboundAudit`, passed through the existing header
+  redactor (auth headers redacted by default).
+- **(b) `AuditInboundCeilingHits`** counter on `AuditCentralHealthSnapshot`
+  (alongside the existing failure counters), incremented when an inbound row
+  truncates (request or response hits `InboundMaxBytes`). Surfaced via the
+  health snapshot.
+- **(c) Per-method opt-out** of body capture: a `SkipBodyCapture` flag on
+  `PerTargetRedactionOverride`, checked in the capture pipeline so a noisy/
+  sensitive method can suppress body capture (headers + metadata still recorded).
+- **Tests:** request headers land in `Extra` and are redacted; ceiling-hit
+  increments the counter; opt-out suppresses body but keeps the row.
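+
+Item (a) might look roughly like the sketch below. Everything named here is an assumption made
+for illustration: the `requestHeaders` key inside `Extra`, the `[REDACTED]` placeholder, and the
+default redaction set are not taken from the existing header redactor, which this sketch does
+not call. It also assumes header names are unique in the input.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Text.Json;
+
+public static class HeaderCaptureSketch
+{
+    // Assumed default redaction set; the real redactor's list may differ.
+    private static readonly HashSet<string> Redacted =
+        new(StringComparer.OrdinalIgnoreCase) { "Authorization", "Cookie" };
+
+    // Builds the JSON fragment that would be merged into the row's Extra column.
+    public static string ToExtraJson(IEnumerable<KeyValuePair<string, string>> headers)
+    {
+        var captured = headers.ToDictionary(
+            h => h.Key,
+            h => Redacted.Contains(h.Key) ? "[REDACTED]" : h.Value,
+            StringComparer.OrdinalIgnoreCase);
+        return JsonSerializer.Serialize(new { requestHeaders = captured });
+    }
+}
+```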
+
+### T4 — `ParentExecutionId` tag-cascade (touches the actor model — high-risk)
+Completes the execution tree beyond the inbound-API→routed-script case.
+- **Alarm on-trigger:** thread a `Guid? parentExecutionId` through
+  `AlarmActor.SpawnAlarmExecutionActor` → `AlarmExecutionActor` →
+  `ScriptRuntimeContext`, so an alarm-triggered script chains to its firing
+  context (the alarm's own execution id where one exists; otherwise a root).
+- **Nested `CallScript`/`CallShared`:** in `ScriptRuntimeContext`, pass **the
+  current run's `ExecutionId`** (not the inherited `_parentExecutionId`) as the
+  child invocation's `ParentExecutionId`, so `A → CallScript(B)` records B's
+  parent as A — a true multi-level tree.
+- **Timer/expression-trigger top-level runs** stay roots (no spawner) — unchanged.
+- **Tests:** alarm-triggered script row carries the expected parent; a 2-level
+  nested `CallScript` produces a chain A→B→C walkable by `GetExecutionTreeAsync`.
+- **Risk:** serialized actor state + correlation plumbing; covered by targeted
+  SiteRuntime actor tests + a tree-walk integration assertion.
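+
+The nested-call rule can be illustrated with a small stand-alone sketch. `RunContextSketch` and
+`RunRecord` are hypothetical stand-ins, not the real `ScriptRuntimeContext`; the only point shown
+is that a child receives the caller's own `ExecutionId` as its parent rather than the caller's
+inherited parent.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical audit row: which script ran, under which id, spawned by which parent.
+public sealed record RunRecord(string Script, Guid ExecutionId, Guid? ParentExecutionId);
+
+public sealed class RunContextSketch
+{
+    private readonly List<RunRecord> _log;
+
+    public Guid ExecutionId { get; } = Guid.NewGuid();
+    public Guid? ParentExecutionId { get; }
+
+    public RunContextSketch(string script, Guid? parentExecutionId, List<RunRecord> log)
+    {
+        ParentExecutionId = parentExecutionId;
+        _log = log;
+        _log.Add(new RunRecord(script, ExecutionId, parentExecutionId));
+    }
+
+    // The child's parent is this run's own ExecutionId, not the inherited parent.
+    public RunContextSketch CallScript(string script) => new(script, ExecutionId, _log);
+}
+```
+
+With this rule, `A.CallScript("B").CallScript("C")` logs C with B as its parent. Passing the
+inherited parent instead would flatten B and C into siblings under A's parent.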
+
+### T3 — Per-channel retention overrides (one design wrinkle, resolved)
+Retention is a single global `RetentionDays`; the purge actor switches out whole
+month partitions by `OccurredAtUtc` (channel-blind).
+- Add `PerChannelRetentionDays` (`Dictionary<string,int>`, keyed by channel /
+  `Action` name) to `AuditLogOptions`, validated like the global value; a channel
+  override may only be **shorter** than the global window (longer is meaningless
+  under month-partition switch-out, which is governed by the largest retention).
+- **Mechanism (resolved):** after the coarse global partition purge, the purge
+  actor runs a **bounded row-level delete** for channels whose override is
+  shorter than global (`DELETE … WHERE Action=@channel AND OccurredAtUtc<@thr`,
+  batched). This runs from the **purge/maintenance path, not the writer role** —
+  the append-only invariant binds the writer/ingest role, not maintenance. The
+  **M2.10 CI grep-guard is widened** to allow the purge actor's single audited
+  deletion call site (an allow-list entry, not a blanket exemption).
+- **Tests:** a channel with a shorter override is purged earlier than the global;
+  channels without an override follow the global; the guard still rejects
+  UPDATE/DELETE everywhere except the sanctioned purge site.
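+
+A rough sketch of the validation and threshold computation, under stated assumptions:
+`ChannelRetentionSketch`, its parameter names, and the `minDays` floor are hypothetical, and the
+real validator and purge actor may structure this differently. The sketch only derives the
+per-channel cut-off that the batched delete would use as `@thr`; it performs no deletion.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+public static class ChannelRetentionSketch
+{
+    // Returns the delete cut-off for every channel whose override is shorter than global.
+    // Throws when an override falls outside [minDays, globalRetentionDays].
+    public static IReadOnlyDictionary<string, DateTime> Thresholds(
+        int globalRetentionDays,
+        int minDays,
+        IReadOnlyDictionary<string, int> perChannelDays,
+        DateTime utcNow)
+    {
+        foreach (var (channel, days) in perChannelDays)
+        {
+            if (days < minDays || days > globalRetentionDays)
+                throw new ArgumentOutOfRangeException(nameof(perChannelDays),
+                    $"Override for '{channel}' must be within {minDays}..{globalRetentionDays} days.");
+        }
+
+        // An override equal to the global window needs no row-level pass.
+        return perChannelDays
+            .Where(kv => kv.Value < globalRetentionDays)
+            .ToDictionary(kv => kv.Key, kv => utcNow.AddDays(-kv.Value));
+    }
+}
+```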
+
+### T5 — Historical backfill (reframed per the computed-column reality)
+- **`SourceNode`** is a physical nullable column. For truly historical rows the
+  node-of-origin is **unknowable**, so the backfill sets a **configurable
+  sentinel** (default `"unknown"`) on `NULL` rows via a one-shot maintenance
+  command (run from the purge/maintenance path), rather than guessing a node.
+- **`ExecutionId`/`ParentExecutionId`** are **persisted computed columns derived
+  from `DetailsJson`**; backfilling them means mutating the JSON, which
+  append-only forbids. These are **documented as a runbook limitation** (pre-feature
+  rows stay NULL) — no code.
+- **Tests:** the SourceNode backfill sets the sentinel only on NULL rows within a
+  bounded range and is idempotent; documentation note added.
+
+---
+
+## Cross-cutting
+
+- **Shared seams:** `AuditLogOptions` (T3, T7), `AuditEndpoints.MapAuditAPI`
+  (T8), `AuditCommands` (T8), `AuditCentralHealthSnapshot` (T6, T7),
+  `IAuditLogRepository`/the KPI repositories (T6), the purge/maintenance role
+  (T3, T5). No AuditLog **schema** change in M5 (T1/T2 deferred).
+- **Append-only:** the only new mutations are T3's purge-role channel DELETE and T5's
+  purge-role sentinel UPDATE; both run on the maintenance path and both are reflected in the
+  CI guard's allow-list. Writer/ingest paths stay INSERT-only.
+
+## Testing strategy
+
+Per-item unit + targeted integration tests (above). T4 additionally gets a
+tree-walk integration assertion. Full-solution build + targeted suites at the
+integration step. No new infra dependency (Parquet deferred).
+
+## Sequencing
+
+Independent items, parallelizable by disjoint area:
+- **Wave A (parallel):** T8 (CLI+endpoint), T6 (KPI repos+actors+tiles), T7
+  (middleware+health+redaction-override) — disjoint projects.
+- **Wave B (parallel):** T4 (SiteRuntime actors — high-risk), T3 (AuditLog
+  options+purge actor+CI guard), T5 (purge-path backfill command + runbook).
+- **Wave C:** integration verification + docs (Component-AuditLog/-CLI, CLAUDE.md
+  KPI/retention notes, runbook).
+
+## Risks
+
+- **T4** actor-model correlation (serialized state) — targeted tests + tree-walk
+  assertion.
+- **T3** append-only tension — resolved via maintenance-role delete + CI-guard
+  allow-list; verify the guard still blocks all other DELETE/UPDATE.
+- **T5** node-of-origin unknowable — sentinel + documented limitation (no false
+  precision).
@@ -0,0 +1,92 @@
+# M5 — Audit Hardening (T3–T8) Implementation Plan
+
+> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
+
+**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
+
+**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
+
+**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
+
+**Conventions:** targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
+
+---
+
+# Wave A — leverage-existing-infra (parallel; disjoint projects)
+
+### Task M5.1 (T8): CLI `audit tree` + tree endpoint
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=<guid>` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
+- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
+- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
+**AC:** `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
+
+### Task M5.2 (T6): Per-node stuck-count KPIs
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
+**Files:**
+- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
+- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
+- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
+- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
+**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
+
+### Task M5.3 (T7): Structured response-capture increments
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
+- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
+**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
+
+---
+
+# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
+
+### Task M5.4 (T4): ParentExecutionId tag-cascade
+**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
+- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
+**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
+
+### Task M5.5 (T3): Per-channel retention overrides
+**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
+- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
+- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
+**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
+
+### Task M5.6 (T5): SourceNode sentinel backfill + runbook
+**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
+**Files:**
+- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
+- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
+- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
+**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
+
+---
+
+# Wave C — integration + docs
+
+### Task M5.7: Integration verification + docs
+**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1–M5.6
+**Steps:**
+1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
+2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
+3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
+4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
+**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
+
+---
+
+## Native tasks & dependencies
+
+Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
@@ -0,0 +1,13 @@
+{
+  "planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
+  "tasks": [
+    {"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
+    {"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
+    {"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
+    {"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
+    {"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
+    {"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
+    {"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
+  ],
+  "lastUpdated": "2026-06-16"
+}
@@ -0,0 +1,264 @@
+# Patch request — event-driven "wait for attribute change (with timeout)" script helper
+
+**Date:** 2026-06-17
+**Type:** Source enhancement (small, additive) to the SiteRuntime script surface
+**Why now:** the DELMIA/MES receiver re-implementation
+([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
+currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
+and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
+**event-driven** for a tag/attribute to reach a value, bounded by a timeout.
+
+> All file paths, line numbers, message records, and signatures below were read from source on
+> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
+
+---
+
+## 1. The gap
+
+The receiver handshake (and any request/response tag interaction) needs to **wait until a
+data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
+or `MoveInCompleteFlag == true` after setting the trigger flag.
+
+ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
+(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
+a manual poll loop:
+
+```csharp
+// current workaround — every handshake script repeats this
+var deadline = DateTime.UtcNow.AddSeconds(30);
+while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
+{
+    if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
+    await Task.Delay(200, CancellationToken);
+}
+```
+
+Why this is unsatisfactory:
+
+- **Latency** — completion is detected up to one poll interval late (200 ms here).
+- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
+  `InstanceActor`); N handshakes × M polls = a lot of needless messages.
+- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
+  (forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
+- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.
+
+Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
+attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
+`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.
+
+---
+
+## 2. Where the change goes (verified wiring)
+
+| Concern | Type / file | Notes |
+|---|---|---|
+| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` — `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
+| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` — `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate` ← `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
+| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
+| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
+| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
+| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
+| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |
+
+**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
+not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
+is the one place every change passes, a waiter that (a) checks the current value on registration and
+(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".
+
+---
+
+## 3. Proposed API (script-facing)
+
+Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
+path resolution (`Resolve(name)`) applies just like get/set:
+
+```csharp
+// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
+// within the timeout, false if it timed out. Honors the script CancellationToken.
+Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
+
+// Predicate form — site-local template scripts only (predicate is an in-process delegate).
+Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
+
+// Optional richer overload that also returns the matched value + quality.
+Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
+// record WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
+```
+
+> **Status:** IMPLEMENTED. `Attributes.WaitForAsync(...)` returns a `WaitResult`
+> (`readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut)`
+> in Commons), populated on match (Value + Quality) and `Matched:false, TimedOut:true` on timeout.
+
+Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
+exception. The value-equality overload is the one the handshake needs and is the one that can also
+be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
+
+Handshake, rewritten (replaces the §1 poll loop):
+
+```csharp
+await Attributes.SetAsync("RecipeDownloadFlag", true);                 // trigger
+var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
+if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
+return new {
+    Result     = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
+    ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
+};
+```
+
+```csharp
+await Attributes.SetAsync("MoveInFlag", true);
+var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
+// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
+```
+
+---
+
+## 4. Implementation outline (the patch)
+
+### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
+```csharp
+// actor protocol (site-local; delegate is fine because messaging is in-process)
+public record WaitForAttributeRequest(
+    string  CorrelationId,
+    string  InstanceName,
+    string  AttributeName,            // already scope-resolved by the accessor
+    string? TargetValueEncoded,       // AttributeValueCodec.Encode(targetValue); null = "any change"
+    Func<object?, bool>? Predicate,   // local-only; null when TargetValueEncoded is used
+    TimeSpan Timeout,
+    DateTimeOffset OccurredAtUtc);
+
+public record WaitForAttributeResponse(
+    string CorrelationId,
+    bool   Matched,
+    object? Value,
+    string Quality,
+    bool   TimedOut,
+    string? ErrorMessage = null);
+
+// internal self-message used to fire the timeout
+public record WaitForAttributeTimeout(string CorrelationId);
+```
+
+### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
+- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
+  `PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
+  the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
+- **Handle `WaitForAttributeRequest`:**
+  1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
+     use `Predicate`).
+  2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
+     `WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
+  3. Otherwise bound the wait: `effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)`
+     (the caller's `Ask` already carries the script token; see §4.3). Optionally cap the number of
+     concurrent waiters per instance (defensive; reply with `ErrorMessage` if exceeded).
+  4. Register the waiter and schedule the timeout:
+     `Context.System.Scheduler.ScheduleTellOnceCancelable(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
+     storing the returned `ICancelable` (plain `ScheduleTellOnce` returns `void`, so it cannot be
+     cancelled this way). Capture `Sender` now (it is invalid later).
+- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
+  attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
+  reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
+  allow removal during enumeration.)
+- **Handle `WaitForAttributeTimeout`:** if still registered, reply
+  `WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
+- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
+  Bad-quality transients.
+
+> **Status:** IMPLEMENTED as an opt-in `requireGoodQuality` parameter on `WaitAsync`/`WaitForAsync`
+> (additive trailing `RequireGoodQuality` field on `WaitForAttributeRequest`, gated at both the
+> fast-path and resolve-loop match sites). Default `false` = quality-agnostic (matches on value only).
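+
+The registry logic in §4.2 can be sketched without Akka as a plain single-threaded class. This is
+an illustration under stated assumptions, not the shipped `InstanceActor` code:
+`WaiterRegistrySketch` and its member names are hypothetical, replies are modelled as an
+`Action<bool>` callback instead of a `Tell` to the captured `Sender`, and the scheduled timeout
+handle (and its cancellation on match) is omitted.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+public sealed class WaiterRegistrySketch
+{
+    private sealed record PendingWait(string Attribute, Func<object?, bool> Test, Action<bool> Reply);
+
+    private readonly Dictionary<string, object?> _attributes = new();
+    private readonly Dictionary<string, PendingWait> _waiters = new();
+
+    // Fast path: answer immediately if the current value already passes; otherwise register.
+    // Returns true when the request was answered without registering a waiter.
+    public bool Register(string cid, string attribute, Func<object?, bool> test, Action<bool> reply)
+    {
+        if (_attributes.TryGetValue(attribute, out var current) && test(current))
+        {
+            reply(true);
+            return true;
+        }
+        _waiters[cid] = new PendingWait(attribute, test, reply);
+        return false;
+    }
+
+    // Called after state is updated; resolves every waiter on this attribute whose test passes.
+    public void OnChanged(string attribute, object? value)
+    {
+        _attributes[attribute] = value;
+        // Iterate over a snapshot so waiters can be removed during enumeration.
+        foreach (var (cid, wait) in _waiters.Where(w => w.Value.Attribute == attribute).ToList())
+        {
+            if (!wait.Test(value)) continue;
+            _waiters.Remove(cid);
+            wait.Reply(true);
+        }
+    }
+
+    // Timeout self-message: a no-op if the waiter has already been resolved.
+    public void OnTimeout(string cid)
+    {
+        if (_waiters.Remove(cid, out var wait)) wait.Reply(false);
+    }
+}
+```
+
+Because `Register`, `OnChanged`, and `OnTimeout` would all run on the actor's single thread, each
+waiter is answered exactly once: whichever of match or timeout arrives first removes the entry,
+and the later message finds nothing to do.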
+
+### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
+- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
+  parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
+  passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
+- Add a method that the accessor calls:
+  ```csharp
+  public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
+                                        Func<object?,bool>? predicate, TimeSpan timeout)
+  {
+      var cid = Guid.NewGuid().ToString();
+      var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
+                                            predicate, timeout, DateTimeOffset.UtcNow);
+      // Ask bounded by the script timeout token so a script-deadline abort cancels the await.
+      var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
+                     req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
+      return resp.Matched;
+  }
+  ```
+
+### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
+- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
+  site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
+
+### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
+```csharp
+public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
+    => _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);
+
+public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
+    => _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
+```
+
+### 4.6 Trust model — no change
+`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
+only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
+and `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verified: the new helper
+type/members do not collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`).
+
+---
+
+## 5. Correctness notes
+
+- **No missed edge.** Registration (current-value check) and change-handling both run on the
+  `InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
+  is caught by the fast-path check; a value that flips after registration is caught by
+  `HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
+  event-driven and cheaper.
+- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
+  the waiter is removed and the caller answered even if the value never changes. Match cancels the
+  scheduled timeout.
+- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
+  its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
+  actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
+- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
+  Consider a per-instance cap as a guard against a script leaking waiters in a loop.
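+
+A minimal sketch of the waiter bookkeeping described in the notes above, assuming a plain dictionary
+keyed by `CorrelationId`. `WaiterRegistry`, its members, and the `cap` parameter are hypothetical names
+for illustration, not the actual `InstanceActor` fields. The sketch ignores the attribute name and
+needs no locking only because, in the design, it would run on the actor's single thread.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical registry (illustration only); the real waiters live inside InstanceActor.
+public sealed class WaiterRegistry
+{
+    private readonly Dictionary<string, Func<object?, bool>> _waiters = new();
+    private readonly int _cap;
+
+    public WaiterRegistry(int cap) => _cap = cap;
+
+    public int Count => _waiters.Count;
+
+    // Returns false once the per-instance cap is reached, so a script leaking waiters fails fast.
+    public bool TryRegister(string correlationId, Func<object?, bool> test)
+    {
+        if (_waiters.Count >= _cap) return false;
+        _waiters[correlationId] = test;
+        return true;
+    }
+
+    // On an attribute change: remove and return every waiter whose test passes.
+    public List<string> ResolveMatched(object? newValue)
+    {
+        var matched = new List<string>();
+        foreach (var (id, test) in _waiters)
+            if (test(newValue)) matched.Add(id);
+        foreach (var id in matched) _waiters.Remove(id);
+        return matched;
+    }
+
+    // When the scheduled timeout fires: returns false if the waiter was already removed by a match.
+    public bool Evict(string correlationId) => _waiters.Remove(correlationId);
+}
+```
+
+Because match and timeout both go through a removal, asserting `Count == 0` after either path is one
+way to check the "no leak" property.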
+
+---
+
+## 6. Optional: inbound / routed variant
+
+For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
+could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
+pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
+method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
+`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
+so the routed form takes the encoded target value (the predicate overload stays site-local). This is
+optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
+fully cover the DELMIA/MES use case.
+
+> **Status:** IMPLEMENTED. `Route.To(code).WaitForAttribute(name, targetValue, timeout)` is wired
+> end-to-end (`RouteToWaitForAttributeRequest/Response` → `IInstanceRouter` → `CommunicationService`
+> → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only
+> across the wire. NOT wired into the CentralUI Test-Run sandbox — that remains a follow-up.
+
+---
+
+## 7. Acceptance criteria
+
+1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
+   returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
+   with no poll loop.
+2. Returns `false` (no throw) when the value never matches within the timeout.
+3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
+4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
+   value in the same actor step as registration, and one step after).
+5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
+6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
+   composed child's attribute).
+7. Passes `ScriptAnalysis` trust validation unchanged.
+8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
+   of the poll loop.
+
+Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
+the script-surface tests under `tests/…/SiteRuntime*`.
+```
@@ -0,0 +1,226 @@
+# WaitAsync Deferred Optional Items — Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (subagent-driven) to implement this plan task-by-task.
+
+**Goal:** Implement the three items deferred from the WaitAsync spec (`docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md`): §3 `WaitForAsync`/`WaitResult` richer overload, §4.2 quality-gated ("Good"-only) matching, and §6 inbound/routed `Route.To(...).WaitForAttribute` variant.
+
+**Architecture:** Builds on the shipped core (`b89d69a`→`04e97f4`). Two of the items (§3, §4.2) are site-local enrichments of the existing `Attributes` script surface + `InstanceActor` waiter; no new actor protocol shapes beyond an additive `RequireGoodQuality` field. The third (§6) mirrors the existing `Route.To(...).GetAttributes` cross-cluster path end-to-end (`RouteTarget` → `IInstanceRouter` → `CommunicationService` → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only across the wire, with the cluster Ask bounded by the *wait* timeout rather than the generic integration timeout.
+
+**Tech Stack:** C#/.NET 10, Akka.NET 1.5, xUnit + Akka.TestKit + NSubstitute.
+
+**Branch/worktree:** `waitfor-attr-helper` at `/Users/dohertj2/Desktop/ScadaBridge/.claude/worktrees/waitfor-attr-helper` (off local main; carries the core feature). Implementers do NOT create worktrees; they commit in **pathspec form** (`git commit -m "…" -- <paths>`), do NOT push, and do NOT touch main. Targeted builds/tests per task; full-solution build only in WD-3.
+
+---
+
+## Naming / shared shapes
+
+- New script return type `WaitResult` (Commons): `public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);`
+- `WaitForAttributeRequest` gains a trailing additive field `bool RequireGoodQuality = false` (site-local request). `RequireGoodQuality` semantics: a match requires the value test to pass **and** `string.Equals(quality, "Good", StringComparison.Ordinal)`.
+- Routed contract (value-equality only, no predicate, no quality flag across the wire — §6 says value-equality only): `RouteToWaitForAttributeRequest` / `RouteToWaitForAttributeResponse` (Commons `Messages/InboundApi`).
+- The `WaitForAttributeResponse.Quality` field is already `string?` (null on timeout/error).
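+
+A small sketch of how a caller might branch on the `WaitResult` shape listed above. The record is
+copied from this plan; `WaitResultExamples.Describe` is a hypothetical helper that exists only for
+illustration. The third arm is an assumption about how a wait could end with neither flag set.
+
+```csharp
+using System;
+
+// Shape as declared in this plan.
+public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
+
+public static class WaitResultExamples
+{
+    // Hypothetical helper: turns a WaitResult into a log-friendly string.
+    public static string Describe(WaitResult r) => r switch
+    {
+        { Matched: true } => $"matched value={r.Value} quality={r.Quality}",
+        { TimedOut: true } => "timed out without a match",
+        _ => "ended without a match or a timeout", // assumed to be reachable; not stated in the plan
+    };
+}
+```
+
+For example, `Describe(new WaitResult(true, 42, "Good", false))` yields
+`matched value=42 quality=Good`.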
+
+---
+
+## Execution waves
+
+- **Wave 1 (parallel, disjoint files):** WD-1 ∥ WD-2a. (2 concurrent committers; post-wave HEAD-presence check.)
+- **Wave 2:** WD-2b (after WD-2a).
+- **Wave 3:** WD-3 (after WD-1, WD-2a, WD-2b).
+
+WD-1 must add `RequireGoodQuality` ONLY as a **trailing defaulted** ctor param of `WaitForAttributeRequest`, so WD-2b's `new WaitForAttributeRequest(...)` (built in wave 2) compiles regardless.
+
+---
+
+### Task WD-1: Site-local `WaitForAsync` + `WaitResult` + quality-gated mode (§3 + §4.2)
+
+**Classification:** high-risk (modifies the `InstanceActor` single-threaded match evaluation + an additive message-contract field)
+**Estimated implement time:** ~5 min
+**Parallelizable with:** WD-2a
+
+**Files:**
+- Create: `src/ZB.MOM.WW.ScadaBridge.Commons/Types/WaitResult.cs`
+- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Instance/WaitForAttribute.cs` (add trailing `bool RequireGoodQuality = false` to `WaitForAttributeRequest`)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/InstanceActor.cs` (thread `RequireGoodQuality` into `PendingWait` + both match sites)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (add `WaitAttributeFull` returning `WaitResult`; add `requireGoodQuality` param)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScopeAccessors.cs` (add `WaitForAsync` overloads + `requireGoodQuality` optional param on `WaitAsync`)
+- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/InstanceActorWaitForAttributeTests.cs` + `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Scripts/ScopeAccessorTests.cs`
+
+**Steps (TDD):**
+
+1. **`WaitResult`** — add the readonly record struct above.
+
+2. **`WaitForAttributeRequest`** — add trailing `bool RequireGoodQuality = false`. Keep the `Func<>` predicate field as-is. Update the XML-doc.
+
+3. **`InstanceActor`** — add `bool RequireGoodQuality` to the `PendingWait` record. At BOTH match sites build the effective match as:
+   ```csharp
+   // fast-path (HandleWaitForAttribute): quality from _attributeQualities.GetValueOrDefault(name, <existing default>)
+   // resolve loop (ResolveMatchedWaiters): quality from changed.Quality
+   bool QualityOk(string? q) => !requireGoodQuality || string.Equals(q, "Good", StringComparison.Ordinal);
+   bool matched = QualityOk(quality) && test(value);   // keep test() inside its existing try/catch
+   ```
+   Store `RequireGoodQuality` on the `PendingWait` so the resolve loop can apply it. Keep the throwing-predicate guard (the `QualityOk && test` expression must still sit inside the existing try/catch). On the fast path, a quality failure under `requireGoodQuality` is just a non-match: register the waiter and schedule the timeout as normal (do NOT fast-reply matched).
+
+4. **`ScriptRuntimeContext`** — refactor: a private `Task<WaitForAttributeResponse> WaitInternal(name, encoded, predicate, timeout, requireGoodQuality)` that does the token-bounded `Ask` (keep the existing `AskTimeoutException → ...` handling; on AskTimeout return a synthetic `WaitForAttributeResponse(.., Matched:false, TimedOut:true)`). Then:
+   ```csharp
+   public async Task<bool> WaitAttribute(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
+       => (await WaitInternal(name, enc, pred, t, requireGoodQuality)).Matched;
+   public async Task<WaitResult> WaitAttributeFull(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
+   { var r = await WaitInternal(name, enc, pred, t, requireGoodQuality); return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut); }
+   ```
+   (Note: `WaitAttribute`'s existing `AskTimeoutException → return false` must be preserved — fold it into `WaitInternal` returning a non-matched/timed-out response, OR catch in both. Do NOT catch `OperationCanceledException`/`TaskCanceledException`.)
+
+5. **`AttributeAccessor`** — add `requireGoodQuality` optional param to both existing `WaitAsync` overloads, and add two `WaitForAsync` overloads:
+   ```csharp
+   public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
+       => _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
+   public Task<WaitResult> WaitForAsync(string key, Func<object?,bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
+       => _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
+   ```
+   XML-doc: `requireGoodQuality:true` ignores Bad/Uncertain-quality transients.
+
+6. **Tests** (extend existing files): (a) `WaitForAsync` returns a populated `WaitResult` on match (Value+Quality) and on timeout (`Matched:false, TimedOut:true`). (b) quality-gated: a value reaching target at **Bad** quality does NOT match when `requireGoodQuality:true` (stays pending → times out), but DOES match when `false`; and matches when it reaches target at Good quality. Cover both fast-path (already-at-target-but-Bad) and change-match. (c) scope resolution still applied for `WaitForAsync`.
+
+7. Build `Commons` + `SiteRuntime` + the SiteRuntime test project; run `--filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync"` and the `~InstanceActor|~ScopeAccessor` regression filter. All green.
+
+8. Commit (pathspec).
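+
+The quality-gated match from step 3 can be isolated as a pure function, which makes the Bad-versus-Good
+cases in step 6 easy to reason about. This is a sketch under assumptions: `WaitMatch.Evaluate` is a
+hypothetical name, and treating a throwing predicate as a non-match is an assumption about what the
+existing try/catch does, not something this plan states.
+
+```csharp
+using System;
+
+// Hypothetical stand-in for the match evaluation at both InstanceActor match sites.
+public static class WaitMatch
+{
+    public static bool Evaluate(Func<object?, bool> test, object? value, string? quality, bool requireGoodQuality)
+    {
+        try
+        {
+            // Ordinal comparison: "good" or "GOOD" would not pass the gate.
+            bool qualityOk = !requireGoodQuality
+                || string.Equals(quality, "Good", StringComparison.Ordinal);
+            // Short-circuit: the value test is skipped entirely when the quality gate fails.
+            return qualityOk && test(value);
+        }
+        catch (Exception)
+        {
+            // Assumption: a throwing predicate counts as a non-match, so the waiter stays pending.
+            return false;
+        }
+    }
+}
+```
+
+With `requireGoodQuality: true`, a value at target but with `"Bad"` quality evaluates to `false`; the
+same value with `"Good"` quality evaluates to `true`.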
+
+---
+
+### Task WD-2a: Routed contract + central path (§6, part 1)
+
+**Classification:** high-risk (cross-cluster message contract + `IInstanceRouter` surface)
+**Estimated implement time:** ~5 min
+**Parallelizable with:** WD-1
+
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs` (add the two records)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/IInstanceRouter.cs` (add method)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/CommunicationServiceInstanceRouter.cs` (delegate)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs` (`RouteTarget.WaitForAttribute`)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/CommunicationService.cs` (`RouteToWaitForAttributeAsync` — **wait-timeout-aware** Ask)
+- Modify (compile-break fixes — interface gained a member): `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Integration/ParentExecutionIdCorrelationTests.cs` (`BridgingInstanceRouter`) and the inline `IInstanceRouter` double in `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointContentTypeTests.cs`
+- Test: `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/RouteHelperTests.cs`
+
+**Steps (TDD):**
+
+1. **Commons records** (mirror `RouteToGetAttributes*`, value-equality only):
+   ```csharp
+   public record RouteToWaitForAttributeRequest(
+       string CorrelationId, string InstanceUniqueName, string AttributeName,
+       string? TargetValueEncoded, TimeSpan Timeout, DateTimeOffset Timestamp,
+       Guid? ParentExecutionId = null);
+   public record RouteToWaitForAttributeResponse(
+       string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
+       bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
+   ```
+   (`Success`/`ErrorMessage` = routing-level outcome, e.g. instance-not-found; `Matched`/`TimedOut`/`Value`/`Quality` = wait outcome.)
+
+2. **`IInstanceRouter`** — add `Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);`. **Update all 3 implementers**: the prod `CommunicationServiceInstanceRouter` plus the 2 test doubles listed above. A test double may throw `NotImplementedException` only if the member is never exercised; prefer a sane canned response.
+
+3. **`CommunicationServiceInstanceRouter`** — delegate to `_communicationService.RouteToWaitForAttributeAsync(...)`.
+
+4. **`RouteHelper.RouteTarget`** — add (mirror `GetAttributes`, throw on `!Success`):
+   ```csharp
+   public async Task<bool> WaitForAttribute(string attributeName, object? targetValue, TimeSpan timeout, CancellationToken cancellationToken = default)
+   {
+       var token = Effective(cancellationToken);
+       var siteId = await ResolveSiteAsync(token);
+       var request = new RouteToWaitForAttributeRequest(Guid.NewGuid().ToString(), _instanceCode,
+           attributeName, AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow, _parentExecutionId);
+       var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
+       if (!response.Success) throw new InvalidOperationException(response.ErrorMessage ?? "Remote attribute wait failed");
+       return response.Matched;
+   }
+   ```
+   (`AttributeValueCodec` is in Commons.Types — add the using if needed.)
+
+5. **`CommunicationService.RouteToWaitForAttributeAsync`** — mirror `RouteToGetAttributesAsync` BUT bound the Ask by the wait timeout, not the generic integration timeout:
+   ```csharp
+   var envelope = new SiteEnvelope(siteId, request);
+   var askTimeout = request.Timeout + _options.IntegrationTimeout; // slack beyond the wait
+   return await GetActor().Ask<RouteToWaitForAttributeResponse>(envelope, askTimeout, cancellationToken);
+   ```
+
+6. **Test** (`RouteHelperTests`): with a substitute `IInstanceRouter` returning a canned `RouteToWaitForAttributeResponse(Matched:true,...)`, `Route.To("x").WaitForAttribute("Flag", true, 30s)` returns true; `Success:false` → throws `InvalidOperationException`; the encoded target equals `AttributeValueCodec.Encode(true)`.
+
+7. Build `Commons` + `InboundAPI` + `Communication` + the two affected test projects; run `--filter "FullyQualifiedName~RouteHelper"` + a build of AuditLog.Tests/InboundAPI.Tests to confirm the interface-addition compiles. Commit (pathspec).
+
+---
+
+### Task WD-2b: Site unpacking + handler (§6, part 2)
+
+**Classification:** high-risk (actor handler crossing into `InstanceActor`; Ask-timeout correctness)
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none
+**blockedBy:** WD-2a
+
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/Actors/SiteCommunicationActor.cs` (add `Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));` next to the other RouteTo forwards ~line 145)
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (`Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);` + handler)
+- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/DeploymentManagerActorTests.cs`
+
+**Steps (TDD):**
+
+1. **`SiteCommunicationActor`** — add the `Receive`/Forward line.
+
+2. **`DeploymentManagerActor.RouteInboundApiWaitForAttribute`** — mirror `RouteInboundApiGetAttributes`:
+   ```csharp
+   private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
+   {
+       if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
+       {
+           Sender.Tell(new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false,
+               false, $"Instance '{request.InstanceUniqueName}' not found on this site.", DateTimeOffset.UtcNow));
+           return;
+       }
+       var sender = Sender;
+       var inner = new WaitForAttributeRequest(request.CorrelationId, request.InstanceUniqueName,
+           request.AttributeName, request.TargetValueEncoded, null /*predicate*/, request.Timeout,
+           DateTimeOffset.UtcNow /*, RequireGoodQuality defaults false */);
+       // Ask bounded by the WAIT timeout + slack (NOT a fixed 30s).
+       instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
+           .ContinueWith(t => t.IsCompletedSuccessfully
+               ? new RouteToWaitForAttributeResponse(request.CorrelationId, t.Result.Matched, t.Result.Value,
+                   t.Result.Quality, t.Result.TimedOut, true, null, DateTimeOffset.UtcNow)
+               : new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false, false,
+                   t.Exception?.GetBaseException().Message ?? "Attribute wait timed out", DateTimeOffset.UtcNow))
+           .PipeTo(sender);
+   }
+   ```
+   (`WaitForAttributeRequest` lives in Commons `Messages/Instance` — add the using. Build with both the trailing-`RequireGoodQuality` and pre-field signatures in mind; passing 7 positional args + default is fine.)
+
+3. **Test** (`DeploymentManagerActorTests`, mirror the routed get-attributes test): deploy/register an instance whose attribute already equals the target → `RouteToWaitForAttributeRequest` → `RouteToWaitForAttributeResponse(Success:true, Matched:true)`; unknown instance → `Success:false`.
+
+4. Build `Communication` + `SiteRuntime` + SiteRuntime test project; run `--filter "FullyQualifiedName~DeploymentManagerActor"`. Commit (pathspec).
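+
+The routed path stacks three deadlines: the waiter's own timeout inside `InstanceActor`, the site-side
+Ask (`request.Timeout` + 5 s, from the handler above), and the central Ask (`request.Timeout` +
+`IntegrationTimeout`, from WD-2a). The sketch below only restates that arithmetic; `WaitTimeouts` and
+its members are hypothetical names, and the check that the central Ask outlasts the site Ask is an
+inferred sanity condition rather than a requirement stated in the spec.
+
+```csharp
+using System;
+
+// Hypothetical helper (illustration only) mirroring the two Ask bounds used on the routed path.
+public static class WaitTimeouts
+{
+    // Slack used by the DeploymentManagerActor handler in this plan.
+    private static readonly TimeSpan SiteSlack = TimeSpan.FromSeconds(5);
+
+    public static TimeSpan SiteAsk(TimeSpan wait) => wait + SiteSlack;
+
+    public static TimeSpan CentralAsk(TimeSpan wait, TimeSpan integrationTimeout) => wait + integrationTimeout;
+
+    // True when the site has time to answer (match, timeout, or error) before the central Ask expires.
+    public static bool CentralOutlastsSite(TimeSpan wait, TimeSpan integrationTimeout)
+        => CentralAsk(wait, integrationTimeout) > SiteAsk(wait);
+}
+```
+
+For a 30 s wait the site Ask is 35 s. With an integration timeout of 30 s (an example value, not taken
+from the configuration) the central Ask is 60 s, so the site reply arrives first.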
+
+---
+
+### Task WD-3: Integration — docs + full verification
+
+**Classification:** standard
+**Estimated implement time:** ~4 min
+**Parallelizable with:** none
+**blockedBy:** WD-1, WD-2a, WD-2b
+
+**Files:**
+- Modify: `docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md` (mark §3 `WaitForAsync`/`WaitResult`, §4.2 quality-gated mode, and §6 routed variant as IMPLEMENTED; note Test-Run sandbox parity excluded)
+- Modify: `docs/requirements/Component-SiteRuntime.md` (script-surface note: `Attributes.WaitForAsync` + `requireGoodQuality`) and `docs/requirements/Component-InboundAPI.md` (`Route.To(...).WaitForAttribute`) — brief, only if those docs enumerate the script surface
+- (No new component, no migration, no docker config change)
+
+**Steps:**
+
+1. Update the spec doc + component docs as above.
+2. **Full-solution build:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` — 0 errors.
+3. **Targeted test sweep** across everything touched:
+   `dotnet test tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/... --filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync|FullyQualifiedName~DeploymentManagerActor"`,
+   `dotnet test tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/... --filter "FullyQualifiedName~RouteHelper"`,
+   and a build of `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` + `tests/ZB.MOM.WW.ScadaBridge.Communication.Tests` to confirm no compile/regression from the interface addition.
+4. `git diff` review; commit (pathspec).
+
+---
+
+## Out of scope (explicit)
+
+- Routed `WaitForAttribute` is NOT wired into the CentralUI Test-Run sandbox (`ISandboxInstanceGateway`/`SandboxInstanceGateway`); production inbound scripts get it. Follow-up if Test-Run parity is wanted.
+- No predicate or quality flag across the wire (§6 is value-equality only, per spec).
+- No docker redeploy (no cluster-runtime config change; additive script surface only).
@@ -0,0 +1,10 @@
+{
+  "planPath": "docs/plans/2026-06-17-waitfor-deferred-items.md",
+  "tasks": [
+    {"id": 1, "subject": "WD-1: site-local WaitForAsync + WaitResult + quality-gated mode (§3+§4.2)", "classification": "high-risk", "status": "pending", "parallelizableWith": [2]},
+    {"id": 2, "subject": "WD-2a: routed contract + central path (§6 part 1)", "classification": "high-risk", "status": "pending", "parallelizableWith": [1]},
+    {"id": 3, "subject": "WD-2b: site unpacking + DeploymentManager handler (§6 part 2)", "classification": "high-risk", "status": "pending", "blockedBy": [2]},
+    {"id": 4, "subject": "WD-3: integration — docs + full verification", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]}
+  ],
+  "lastUpdated": "2026-06-17"
+}