merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes

Joseph Doherty
2026-06-17 09:28:15 -04:00
88 changed files with 7714 additions and 169 deletions
@@ -0,0 +1,150 @@
# M5 — Audit Hardening (T3–T8) — Design
**Status:** Approved (awaiting plan).
**Worktree/branch:** `worktree-m5-audit-hardening` off `main` (`e77e209`).
**Source:** Phase-2 milestone M5 from `docs/plans/2026-06-15-stillpending-completion-design.md`.
## Goal
Harden the centralized Audit Log with six independent, ready-to-build items. Two
items originally listed under M5 — **T1 hash-chain tamper evidence** and **T2
Parquet export** — remain **deferred to v1.x** (per CLAUDE.md's audit design
decisions); their stubs (CLI `verify-chain` no-op, export `501`) stay unchanged.
## Scope (in)
T3 per-channel retention · T4 ParentExecutionId tag-cascade · T5 historical
backfill (reframed) · T6 per-node stuck KPIs · T7 structured response-capture
increments · T8 CLI `audit tree`.
## Scope (out / deferred to v1.x)
T1 hash-chain (no Hash/PrevHash columns, no real verify-chain), T2 Parquet
export (the `501` gate stays). Reversing those deferrals is a separate decision.
---
## Items
### T8 — CLI `audit tree` (smallest; reuses existing server walk + UI)
The recursive execution-tree walk (`IAuditLogRepository.GetExecutionTreeAsync`,
backed by `IX_AuditLog_ParentExecution`) and the Blazor `ExecutionTreePage`
already exist; only an HTTP projection + CLI surface are missing.
- **Server:** add `GET /api/audit/tree?executionId=…` in
`AuditEndpoints.MapAuditAPI` → `repo.GetExecutionTreeAsync` → serialize
`ExecutionTreeNode[]`.
- **CLI:** add `audit tree --execution-id <guid> [--format table|json]` in
`AuditCommands` + an `AuditTreeHelpers` renderer (indented ASCII tree for
`table`; raw nodes for `json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- No schema change. **Tests:** endpoint returns the tree; CLI renders a
multi-level tree + handles not-found.
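
The `table` renderer described above could look roughly like the following. This is a minimal sketch, not the actual `AuditTreeHelpers`: the `TreeNodeSketch` record, its `Label` field, and the `+- ` branch marker are assumptions made for illustration, since the real `ExecutionTreeNode` shape is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical node shape; the real ExecutionTreeNode may carry different fields.
public record TreeNodeSketch(Guid ExecutionId, Guid? ParentExecutionId, string Label);

public static class AuditTreeRenderSketch
{
    // Renders nodes as an indented tree: two spaces per level, roots first.
    public static string Render(IReadOnlyList<TreeNodeSketch> nodes)
    {
        var ids = nodes.Select(n => n.ExecutionId).ToHashSet();

        // Index children by parent id, ignoring parents that are not in the set.
        var childrenByParent = nodes
            .Where(n => n.ParentExecutionId is Guid p && ids.Contains(p))
            .ToLookup(n => n.ParentExecutionId!.Value);

        // A node is a root when it has no parent or its parent is not in the set.
        var roots = nodes.Where(n => n.ParentExecutionId is not Guid p || !ids.Contains(p));

        var sb = new StringBuilder();
        foreach (var root in roots) Append(sb, root, childrenByParent, 0);
        return sb.ToString();
    }

    private static void Append(StringBuilder sb, TreeNodeSketch node,
        ILookup<Guid, TreeNodeSketch> childrenByParent, int depth)
    {
        sb.Append(' ', depth * 2)
          .Append(depth == 0 ? "" : "+- ")
          .Append(node.Label)
          .Append('\n');
        foreach (var child in childrenByParent[node.ExecutionId])
            Append(sb, child, childrenByParent, depth + 1);
    }
}
```

An empty node array renders as an empty string, which leaves the CLI free to print its own not-found message.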
### T6 — Per-node stuck-count KPIs
KPIs are per-site today; `SourceNode` is on the `Notification` and `SiteCalls`
rows but not aggregated.
- Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to the existing
`ComputePerSiteKpisAsync` in `NotificationOutboxRepository` and
`SiteCallAuditRepository`.
- New `PerNode…KpiRequest`/`Response` message pair per actor; register in each
actor's `Receive<>`.
- Surface a per-node breakdown on the existing KPI tiles
(`AuditKpiTiles`/`SiteCallKpiTiles`) — additive, behind the existing tiles.
- **Tests:** repository grouping returns correct per-node counts (stuck/parked/
queue-depth); message round-trip.
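
The per-node grouping can be pictured with an in-memory LINQ sketch. The row shape, the status strings (`Stuck`, `Parked`, `Pending`), the reading of queue depth as the count of pending rows, and the `unknown` bucket for rows without a `SourceNode` are all assumptions for illustration; the real repositories would express the same grouping as an EF Core query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row and result shapes, for illustration only.
public record OutboxRowSketch(string? SourceNode, string Status);
public record PerNodeKpiSketch(string SourceNode, int Stuck, int Parked, int QueueDepth);

public static class PerNodeKpiCalculatorSketch
{
    // Groups rows by SourceNode and counts each status bucket per node.
    public static IReadOnlyList<PerNodeKpiSketch> Compute(IEnumerable<OutboxRowSketch> rows) =>
        rows.GroupBy(r => r.SourceNode ?? "unknown") // assumed bucket for NULL nodes
            .Select(g => new PerNodeKpiSketch(
                g.Key,
                g.Count(r => r.Status == "Stuck"),
                g.Count(r => r.Status == "Parked"),
                g.Count(r => r.Status == "Pending"))) // assumed meaning of queue depth
            .OrderBy(k => k.SourceNode, StringComparer.Ordinal)
            .ToList();
}
```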
### T7 — Structured response-capture increments (no schema change)
- **(a) Inbound request headers** → captured into the existing `Extra` JSON in
`AuditWriteMiddleware.EmitInboundAudit`, passed through the existing header
redactor (auth headers redacted by default).
- **(b) `AuditInboundCeilingHits`** counter on `AuditCentralHealthSnapshot`
(alongside the existing failure counters), incremented when an inbound row
truncates (request or response hits `InboundMaxBytes`). Surfaced via the
health snapshot.
- **(c) Per-method opt-out** of body capture: a `SkipBodyCapture` flag on
`PerTargetRedactionOverride`, checked in the capture pipeline so a noisy/
sensitive method can suppress body capture (headers + metadata still recorded).
- **Tests:** request headers land in `Extra` and are redacted; ceiling-hit
increments the counter; opt-out suppresses body but keeps the row.
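
How the three increments interact can be sketched in one small class. Everything here is hypothetical: the class name, the redacted-header list, the `[REDACTED]` placeholder, and the use of a character limit in place of `InboundMaxBytes` are assumptions, and the sketch also assumes that a skipped body does not count as a ceiling hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public sealed class InboundCaptureSketch
{
    // Assumed default redaction list; the real header redactor is configured elsewhere.
    private static readonly HashSet<string> RedactedHeaders =
        new(StringComparer.OrdinalIgnoreCase) { "Authorization", "Cookie" };

    private long _ceilingHits;

    // Stand-in for the AuditInboundCeilingHits counter on the health snapshot.
    public long CeilingHits => Interlocked.Read(ref _ceilingHits);

    public (Dictionary<string, string> Headers, string? Body) Capture(
        IReadOnlyDictionary<string, string> headers, string body, int maxChars, bool skipBodyCapture)
    {
        // (a) Headers are always captured, with sensitive values replaced.
        var captured = headers.ToDictionary(
            h => h.Key,
            h => RedactedHeaders.Contains(h.Key) ? "[REDACTED]" : h.Value,
            StringComparer.OrdinalIgnoreCase);

        // (c) Opt-out: drop the body but keep the headers (and therefore the row).
        if (skipBodyCapture) return (captured, null);

        // (b) Truncate at the ceiling and count the hit.
        if (body.Length > maxChars)
        {
            Interlocked.Increment(ref _ceilingHits);
            body = body.Substring(0, maxChars);
        }
        return (captured, body);
    }
}
```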
### T4 — `ParentExecutionId` tag-cascade (touches the actor model — high-risk)
Completes the execution tree beyond the inbound-API→routed-script case.
- **Alarm on-trigger:** thread a `Guid? parentExecutionId` through
`AlarmActor.SpawnAlarmExecutionActor` → `AlarmExecutionActor` →
`ScriptRuntimeContext`, so an alarm-triggered script chains to its firing
context (the alarm's own execution id where one exists; otherwise a root).
- **Nested `CallScript`/`CallShared`:** in `ScriptRuntimeContext`, pass **the
current run's `ExecutionId`** (not the inherited `_parentExecutionId`) as the
child invocation's `ParentExecutionId`, so `A → CallScript(B)` records B's
parent as A — a true multi-level tree.
- **Timer/expression-trigger top-level runs** stay roots (no spawner) — unchanged.
- **Tests:** alarm-triggered script row carries the expected parent; a 2-level
nested `CallScript` produces a chain A→B→C walkable by `GetExecutionTreeAsync`.
- **Risk:** serialized actor state + correlation plumbing; covered by targeted
SiteRuntime actor tests + a tree-walk integration assertion.
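
The nested-call rule is easiest to see in isolation. The following sketch uses a hypothetical `RunContextSketch` in place of `ScriptRuntimeContext` and leaves out actors and audit rows entirely; it only shows that a child's parent is the calling run's own `ExecutionId`.

```csharp
using System;

public sealed class RunContextSketch
{
    public string Script { get; }
    public Guid ExecutionId { get; } = Guid.NewGuid();
    public Guid? ParentExecutionId { get; }

    public RunContextSketch(string script, Guid? parentExecutionId)
    {
        Script = script;
        ParentExecutionId = parentExecutionId;
    }

    // The child's parent is THIS run's ExecutionId, not the parent this run inherited.
    // Passing ParentExecutionId here instead would flatten every nested call under the root.
    public RunContextSketch CallScript(string script) => new(script, ExecutionId);
}
```

With this rule, `A.CallScript("B").CallScript("C")` yields a chain in which C points at B and B points at A, which is the shape the tree walk expects.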
### T3 — Per-channel retention overrides (one design wrinkle, resolved)
Retention is a single global `RetentionDays`; the purge actor switches out whole
month partitions by `OccurredAtUtc` (channel-blind).
- Add `PerChannelRetentionDays` (`Dictionary<string,int>`, keyed by channel /
`Action` name) to `AuditLogOptions`, validated like the global value; a channel
override may only be **shorter** than the global window (longer is meaningless
under month-partition switch-out, which is governed by the largest retention).
- **Mechanism (resolved):** after the coarse global partition purge, the purge
actor runs a **bounded row-level delete** for channels whose override is
shorter than global (`DELETE … WHERE Action=@channel AND OccurredAtUtc<@thr`,
batched). This runs from the **purge/maintenance path, not the writer role**:
the append-only invariant binds the writer/ingest role, not maintenance. The
**M2.10 CI grep-guard is widened** to allow the purge actor's single audited
deletion call site (an allow-list entry, not a blanket exemption).
- **Tests:** a channel with a shorter override is purged earlier than the global;
channels without an override follow the global; the guard still rejects
UPDATE/DELETE everywhere except the sanctioned purge site.
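
A sketch of the validation rule and the per-channel delete thresholds, under stated assumptions: the helper names and the `minDays` parameter are hypothetical, and the actual `AuditLogOptionsValidator` and purge actor may structure this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChannelRetentionSketch
{
    // Returns validation errors; an empty list means the overrides are acceptable.
    public static List<string> Validate(
        int globalDays, int minDays, IReadOnlyDictionary<string, int> perChannel)
    {
        var errors = new List<string>();
        foreach (var (channel, days) in perChannel)
        {
            if (days < minDays)
                errors.Add($"{channel}: {days} is below the minimum of {minDays} days");
            else if (days >= globalDays)
                errors.Add($"{channel}: {days} must be shorter than the global {globalDays} days");
        }
        return errors;
    }

    // One row-level delete threshold per channel whose override is shorter than global.
    public static Dictionary<string, DateTime> DeleteThresholds(
        DateTime nowUtc, int globalDays, IReadOnlyDictionary<string, int> perChannel) =>
        perChannel
            .Where(kv => kv.Value < globalDays)
            .ToDictionary(kv => kv.Key, kv => nowUtc.AddDays(-kv.Value));
}
```

Each threshold would feed the `@thr` parameter of the batched delete for its channel; channels absent from the result are left to the global partition purge.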
### T5 — Historical backfill (reframed per the computed-column reality)
- **`SourceNode`** is a physical nullable column. For truly historical rows the
node-of-origin is **unknowable**, so the backfill sets a **configurable
sentinel** (default `"unknown"`) on `NULL` rows via a one-shot maintenance
command (run from the purge/maintenance path), rather than guessing a node.
- **`ExecutionId`/`ParentExecutionId`** are **persisted computed columns derived
from `DetailsJson`**; backfilling them means mutating the JSON, which
append-only forbids. These are **documented as a runbook limitation** (pre-feature
rows stay NULL) — no code.
- **Tests:** the SourceNode backfill sets the sentinel only on NULL rows within a
bounded range and is idempotent; documentation note added.
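
The idempotency property of the `SourceNode` backfill can be shown with an in-memory sketch. The row type and method name are hypothetical; the real command would issue a bounded UPDATE from the maintenance path instead of mutating objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape, for illustration only.
public sealed class AuditRowSketch
{
    public DateTime OccurredAtUtc { get; init; }
    public string? SourceNode { get; set; }
}

public static class SourceNodeBackfillSketch
{
    // Sets the sentinel on NULL rows older than beforeUtc and returns the number changed.
    // A second run finds no NULL rows in range, so it changes nothing.
    public static int Backfill(IEnumerable<AuditRowSketch> rows, DateTime beforeUtc,
        string sentinel = "unknown")
    {
        var changed = 0;
        foreach (var row in rows)
        {
            if (row.SourceNode is null && row.OccurredAtUtc < beforeUtc)
            {
                row.SourceNode = sentinel;
                changed++;
            }
        }
        return changed;
    }
}
```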
---
## Cross-cutting
- **Shared seams:** `AuditLogOptions` (T3, T7), `AuditEndpoints.MapAuditAPI`
(T8), `AuditCommands` (T8), `AuditCentralHealthSnapshot` (T6, T7),
`IAuditLogRepository`/the KPI repositories (T6), the purge/maintenance role
(T3, T5). No AuditLog **schema** change in M5 (T1/T2 deferred).
- **Append-only:** the only new mutations are T3's purge-role channel delete and
T5's purge-role sentinel UPDATE; both run on the maintenance path and both are
reflected in the CI guard's allow-list. Writer/ingest paths stay INSERT-only.
## Testing strategy
Per-item unit + targeted integration tests (above). T4 additionally gets a
tree-walk integration assertion. Full-solution build + targeted suites at the
integration step. No new infra dependency (Parquet deferred).
## Sequencing
Independent items, parallelizable by disjoint area:
- **Wave A (parallel):** T8 (CLI+endpoint), T6 (KPI repos+actors+tiles), T7
(middleware+health+redaction-override) — disjoint projects.
- **Wave B (parallel):** T4 (SiteRuntime actors — high-risk), T3 (AuditLog
options+purge actor+CI guard), T5 (purge-path backfill command + runbook).
- **Wave C:** integration verification + docs (Component-AuditLog/-CLI, CLAUDE.md
KPI/retention notes, runbook).
## Risks
- **T4** actor-model correlation (serialized state) — targeted tests + tree-walk
assertion.
- **T3** append-only tension — resolved via maintenance-role delete + CI-guard
allow-list; verify the guard still blocks all other DELETE/UPDATE.
- **T5** node-of-origin unknowable — sentinel + documented limitation (no false
precision).
@@ -0,0 +1,92 @@
# M5 — Audit Hardening (T3–T8) Implementation Plan
> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
**Conventions:** targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
---
# Wave A — leverage-existing-infra (parallel; disjoint projects)
### Task M5.1 (T8): CLI `audit tree` + tree endpoint
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=<guid>` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
**AC:** `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
### Task M5.2 (T6): Per-node stuck-count KPIs
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
**Files:**
- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
### Task M5.3 (T7): Structured response-capture increments
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
---
# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
### Task M5.4 (T4): ParentExecutionId tag-cascade
**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
### Task M5.5 (T3): Per-channel retention overrides
**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global)`, i.e. strictly shorter than the global window).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
### Task M5.6 (T5): SourceNode sentinel backfill + runbook
**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
**Files:**
- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
---
# Wave C — integration + docs
### Task M5.7: Integration verification + docs
**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1–M5.6
**Steps:**
1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
---
## Native tasks & dependencies
Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
@@ -0,0 +1,13 @@
{
"planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
"tasks": [
{"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
{"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
{"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
{"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
{"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
{"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
{"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
],
"lastUpdated": "2026-06-16"
}
@@ -0,0 +1,264 @@
# Patch request — event-driven "wait for attribute change (with timeout)" script helper
**Date:** 2026-06-17
**Type:** Source enhancement (small, additive) to the SiteRuntime script surface
**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
**event-driven** for a tag/attribute to reach a value, bounded by a timeout.
> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
---
## 1. The gap
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.
ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:
```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
await Task.Delay(200, CancellationToken);
}
```
Why this is unsatisfactory:
- **Latency** — completion is detected up to one poll interval late (200 ms here).
- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
`InstanceActor`); N handshakes × M polls = a lot of needless messages.
- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
(forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.
Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.
---
## 2. Where the change goes (verified wiring)
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` in `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` in `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate`, on `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |
**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".
---
## 3. Proposed API (script-facing)
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
path resolution (`Resolve(name)`) applies just like get/set:
```csharp
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
// within the timeout, false if it timed out. Honors the script CancellationToken.
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
// Optional richer overload that also returns the matched value + quality.
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
// record WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
```
> **Status:** IMPLEMENTED. `Attributes.WaitForAsync(...)` returns a `WaitResult`
> (`readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut)`
> in Commons), populated on match (Value + Quality) and `Matched:false, TimedOut:true` on timeout.
Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
exception. The value-equality overload is the one the handshake needs and is the one that can also
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
Handshake, rewritten (replaces the §1 poll loop):
```csharp
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
return new {
Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
};
```
```csharp
await Attributes.SetAsync("MoveInFlag", true);
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
```
---
## 4. Implementation outline (the patch)
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
```csharp
// actor protocol (site-local; delegate is fine because messaging is in-process)
public record WaitForAttributeRequest(
string CorrelationId,
string InstanceName,
string AttributeName, // already scope-resolved by the accessor
string? TargetValueEncoded, // AttributeValueCodec.Encode(targetValue); null = "any change"
Func<object?, bool>? Predicate, // local-only; null when TargetValueEncoded is used
TimeSpan Timeout,
DateTimeOffset OccurredAtUtc);
public record WaitForAttributeResponse(
string CorrelationId,
bool Matched,
object? Value,
string Quality,
bool TimedOut,
string? ErrorMessage = null);
// internal self-message used to fire the timeout
public record WaitForAttributeTimeout(string CorrelationId);
```
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
`PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
- **Handle `WaitForAttributeRequest`:**
1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
use `Predicate`).
2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
`WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
3. If the fast path did not match, bound the wait:
`effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` (the caller's `Ask`
already carries the script token; see §4.3). Optionally cap the number of concurrent waiters
per instance (defensive; reply with `ErrorMessage` if exceeded).
4. Register the waiter and schedule the timeout:
`Context.System.Scheduler.ScheduleTellOnce(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
storing the returned `ICancelable`. Capture `Sender` now (it is invalid later).
- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
allow removal during enumeration.)
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
`WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
Bad-quality transients.
> **Status:** IMPLEMENTED as an opt-in `requireGoodQuality` parameter on `WaitAsync`/`WaitForAsync`
> (additive trailing `RequireGoodQuality` field on `WaitForAttributeRequest`, gated at both the
> fast-path and resolve-loop match sites). Default `false` = quality-agnostic (matches on value only).
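
The registry logic in §4.2 can be sketched without Akka to make the no-missed-edge argument concrete. This is an illustration under assumptions, not the `InstanceActor` code: the class name is hypothetical, an `Action<bool>` callback stands in for replying to the captured `Sender`, and timeout scheduling is reduced to a plain `OnTimeout` call. As in the actor, all methods are assumed to run on a single thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class AttributeWaiterRegistrySketch
{
    private sealed record PendingWait(string Attribute, Func<object?, bool> Test, Action<bool> Reply);

    private readonly Dictionary<string, object?> _attributes = new();
    private readonly Dictionary<string, PendingWait> _waiters = new();

    public int PendingCount => _waiters.Count;

    // Registration: the fast path replies at once if the current value already matches.
    public void Register(string correlationId, string attribute,
        Func<object?, bool> test, Action<bool> reply)
    {
        if (_attributes.TryGetValue(attribute, out var current) && test(current))
        {
            reply(true);
            return;
        }
        _waiters[correlationId] = new PendingWait(attribute, test, reply);
    }

    // Called for every change, after state has been updated.
    public void OnAttributeChanged(string attribute, object? value)
    {
        _attributes[attribute] = value;
        // Iterate over a snapshot so matched waiters can be removed during the loop.
        foreach (var (cid, wait) in _waiters.Where(w => w.Value.Attribute == attribute).ToList())
        {
            if (!wait.Test(value)) continue;
            _waiters.Remove(cid);
            wait.Reply(true);
        }
    }

    // Timeout self-message: a no-op when the waiter has already matched and been removed.
    public void OnTimeout(string correlationId)
    {
        if (_waiters.Remove(correlationId, out var wait)) wait.Reply(false);
    }
}
```

Because registration and change handling share one thread, a value that flips before `Register` is caught by the fast path and a value that flips afterwards is caught by `OnAttributeChanged`; there is no window in between.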
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
- Add a method that the accessor calls:
```csharp
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
Func<object?,bool>? predicate, TimeSpan timeout)
{
var cid = Guid.NewGuid().ToString();
var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
predicate, timeout, DateTimeOffset.UtcNow);
// Ask bounded by the script timeout token so a script-deadline abort cancels the await.
var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
return resp.Matched;
}
```
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
```csharp
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
```
### 4.6 Trust model — no change
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
+ `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify the new helper
type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`) — they don't.
---
## 5. Correctness notes
- **No missed edge.** Registration (current-value check) and change-handling both run on the
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
is caught by the fast-path check; a value that flips after registration is caught by
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
event-driven and cheaper.
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
the waiter is removed and the caller answered even if the value never changes. Match cancels the
scheduled timeout.
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
---
## 6. Optional: inbound / routed variant
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
fully cover the DELMIA/MES use case.
> **Status:** IMPLEMENTED. `Route.To(code).WaitForAttribute(name, targetValue, timeout)` is wired
> end-to-end (`RouteToWaitForAttributeRequest/Response` → `IInstanceRouter` → `CommunicationService`
> → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only
> across the wire. NOT wired into the CentralUI Test-Run sandbox — that remains a follow-up.
---
## 7. Acceptance criteria
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
with no poll loop.
2. Returns `false` (no throw) when the value never matches within the timeout.
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
value in the same actor step as registration, and one step after).
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
composed child's attribute).
7. Passes `ScriptAnalysis` trust validation unchanged.
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
of the poll loop.
Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
the script-surface tests under `tests/…/SiteRuntime*`.
```
@@ -0,0 +1,226 @@
# WaitAsync Deferred Optional Items — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (subagent-driven) to implement this plan task-by-task.
**Goal:** Implement the three items deferred from the WaitAsync spec (`docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md`): §3 `WaitForAsync`/`WaitResult` richer overload, §4.2 quality-gated ("Good"-only) matching, and §6 inbound/routed `Route.To(...).WaitForAttribute` variant.
**Architecture:** Builds on the shipped core (`b89d69a` → `04e97f4`). Two of the items (§3, §4.2) are site-local enrichments of the existing `Attributes` script surface + `InstanceActor` waiter; no new actor protocol shapes beyond an additive `RequireGoodQuality` field. The third (§6) mirrors the existing `Route.To(...).GetAttributes` cross-cluster path end-to-end (`RouteTarget` → `IInstanceRouter` → `CommunicationService` → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only across the wire, with the cluster Ask bounded by the *wait* timeout rather than the generic integration timeout.
**Tech Stack:** C#/.NET 10, Akka.NET 1.5, xUnit + Akka.TestKit + NSubstitute.
**Branch/worktree:** `waitfor-attr-helper` at `/Users/dohertj2/Desktop/ScadaBridge/.claude/worktrees/waitfor-attr-helper` (off local main; carries the core feature). Implementers do NOT create worktrees, commit **pathspec form** (`git commit -m "…" -- <paths>`), do NOT push, do NOT touch main. Targeted builds/tests per task; full-solution build only in WD-3.
---
## Naming / shared shapes
- New script return type `WaitResult` (Commons): `public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);`
- `WaitForAttributeRequest` gains a trailing additive field `bool RequireGoodQuality = false` (site-local request). `RequireGoodQuality` semantics: a match requires the value test to pass **and** `string.Equals(quality, "Good", StringComparison.Ordinal)`.
- Routed contract (value-equality only, no predicate, no quality flag across the wire — §6 says value-equality only): `RouteToWaitForAttributeRequest` / `RouteToWaitForAttributeResponse` (Commons `Messages/InboundApi`).
- The `WaitForAttributeResponse.Quality` field is already `string?` (null on timeout/error).
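
As a rough illustration of how a caller might read the four `WaitResult` fields together, the sketch below copies the record definition from the first bullet and adds a hypothetical `Describe` helper. The helper and the sample values are assumptions for illustration only.

```csharp
using System;

// Hypothetical helper: Matched wins, then TimedOut, otherwise neither happened
// (for example an error response).
string Describe(WaitResult r) => r switch
{
    { Matched: true } => $"matched value={r.Value} quality={r.Quality}",
    { TimedOut: true } => "timed out",
    _ => "not matched",
};

var onMatch = Describe(new WaitResult(true, 42, "Good", false));
var onTimeout = Describe(new WaitResult(false, null, null, true));

Console.WriteLine(onMatch);   // matched value=42 quality=Good
Console.WriteLine(onTimeout); // timed out

// Definition as given in this plan.
public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
```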
---
## Execution waves
- **Wave 1 (parallel, disjoint files):** WD-1 ∥ WD-2a. (2 concurrent committers; post-wave HEAD-presence check.)
- **Wave 2:** WD-2b (after WD-2a).
- **Wave 3:** WD-3 (after WD-1, WD-2a, WD-2b).
WD-1 must add `RequireGoodQuality` ONLY as a **trailing defaulted** ctor param of `WaitForAttributeRequest`, so WD-2b's `new WaitForAttributeRequest(...)` (built in wave 2) compiles regardless.
---
### Task WD-1: Site-local `WaitForAsync` + `WaitResult` + quality-gated mode (§3 + §4.2)
**Classification:** high-risk (modifies the `InstanceActor` single-threaded match evaluation + an additive message-contract field)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-2a
**Files:**
- Create: `src/ZB.MOM.WW.ScadaBridge.Commons/Types/WaitResult.cs`
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Instance/WaitForAttribute.cs` (add trailing `bool RequireGoodQuality = false` to `WaitForAttributeRequest`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/InstanceActor.cs` (thread `RequireGoodQuality` into `PendingWait` + both match sites)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (add `WaitAttributeFull` returning `WaitResult`; add `requireGoodQuality` param)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScopeAccessors.cs` (add `WaitForAsync` overloads + `requireGoodQuality` optional param on `WaitAsync`)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/InstanceActorWaitForAttributeTests.cs` + `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Scripts/ScopeAccessorTests.cs`
**Steps (TDD):**
1. **`WaitResult`** — add the readonly record struct above.
2. **`WaitForAttributeRequest`** — add trailing `bool RequireGoodQuality = false`. Keep the `Func<>` predicate field as-is. Update the XML-doc.
3. **`InstanceActor`** — add `bool RequireGoodQuality` to the `PendingWait` record. At BOTH match sites build the effective match as:
```csharp
// fast-path (HandleWaitForAttribute): quality from _attributeQualities.GetValueOrDefault(name, <existing default>)
// resolve loop (ResolveMatchedWaiters): quality from changed.Quality
bool QualityOk(string? q) => !requireGoodQuality || string.Equals(q, "Good", StringComparison.Ordinal);
bool matched = QualityOk(quality) && test(value); // keep test() inside its existing try/catch
```
Store `RequireGoodQuality` on the `PendingWait` so the resolve loop knows it. Keep the throwing-predicate guard (the `QualityOk && test` must still be inside the existing try/catch). The fast-path quality-fail when `requireGoodQuality` is just a non-match → register + schedule timeout as normal (do NOT fast-reply matched).
4. **`ScriptRuntimeContext`** — refactor: a private `Task<WaitForAttributeResponse> WaitInternal(name, encoded, predicate, timeout, requireGoodQuality)` that does the token-bounded `Ask` (keep the existing `AskTimeoutException → ...` handling; on AskTimeout return a synthetic `WaitForAttributeResponse(.., Matched:false, TimedOut:true)`). Then:
```csharp
public async Task<bool> WaitAttribute(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
=> (await WaitInternal(name, enc, pred, t, requireGoodQuality)).Matched;
public async Task<WaitResult> WaitAttributeFull(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
{ var r = await WaitInternal(name, enc, pred, t, requireGoodQuality); return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut); }
```
(Note: `WaitAttribute`'s existing `AskTimeoutException → return false` must be preserved — fold it into `WaitInternal` returning a non-matched/timed-out response, OR catch in both. Do NOT catch `OperationCanceledException`/`TaskCanceledException`.)
5. **`AttributeAccessor`** — add `requireGoodQuality` optional param to both existing `WaitAsync` overloads, and add two `WaitForAsync` overloads:
```csharp
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
public Task<WaitResult> WaitForAsync(string key, Func<object?,bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
```
XML-doc: `requireGoodQuality:true` ignores Bad/Uncertain-quality transients.
6. **Tests** (extend existing files): (a) `WaitForAsync` returns a populated `WaitResult` on match (Value+Quality) and on timeout (`Matched:false, TimedOut:true`). (b) quality-gated: a value reaching target at **Bad** quality does NOT match when `requireGoodQuality:true` (stays pending → times out), but DOES match when `false`; and matches when it reaches target at Good quality. Cover both fast-path (already-at-target-but-Bad) and change-match. (c) scope resolution still applied for `WaitForAsync`.
7. Build `Commons` + `SiteRuntime` + the SiteRuntime test project; run `--filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync"` and the `~InstanceActor|~ScopeAccessor` regression filter. All green.
8. Commit (pathspec).
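
The match rule from step 3 can be sketched as a pure function, which is also a convenient shape for the step 6 test cases. `EvaluateMatch` is a hypothetical name, and treating a throwing predicate as a non-match is an assumption about what the existing try/catch does.

```csharp
using System;

// Hypothetical pure-function version of the effective match from step 3.
bool EvaluateMatch(Func<object?, bool> test, object? value, string? quality, bool requireGoodQuality)
{
    try
    {
        bool qualityOk = !requireGoodQuality || string.Equals(quality, "Good", StringComparison.Ordinal);
        // Short-circuit: the value test is skipped when the quality gate already failed.
        return qualityOk && test(value);
    }
    catch (Exception)
    {
        // Assumption: a throwing predicate counts as a non-match.
        return false;
    }
}

Func<object?, bool> isTrue = v => Equals(v, true);

Console.WriteLine(EvaluateMatch(isTrue, true, "Bad", requireGoodQuality: true));   // False
Console.WriteLine(EvaluateMatch(isTrue, true, "Bad", requireGoodQuality: false));  // True
Console.WriteLine(EvaluateMatch(isTrue, true, "Good", requireGoodQuality: true));  // True
Console.WriteLine(EvaluateMatch(_ => throw new InvalidOperationException(), true, "Good", true)); // False
```

In the actor, a `false` result on the fast path means the waiter is registered and its timeout scheduled as normal.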
---
### Task WD-2a: Routed contract + central path (§6, part 1)
**Classification:** high-risk (cross-cluster message contract + `IInstanceRouter` surface)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-1
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs` (add the two records)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/IInstanceRouter.cs` (add method)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/CommunicationServiceInstanceRouter.cs` (delegate)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs` (`RouteTarget.WaitForAttribute`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/CommunicationService.cs` (`RouteToWaitForAttributeAsync` — **wait-timeout-aware** Ask)
- Modify (compile-break fixes — interface gained a member): `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Integration/ParentExecutionIdCorrelationTests.cs` (`BridgingInstanceRouter`) and the inline `IInstanceRouter` double in `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointContentTypeTests.cs`
- Test: `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/RouteHelperTests.cs`
**Steps (TDD):**
1. **Commons records** (mirror `RouteToGetAttributes*`, value-equality only):
```csharp
public record RouteToWaitForAttributeRequest(
string CorrelationId, string InstanceUniqueName, string AttributeName,
string? TargetValueEncoded, TimeSpan Timeout, DateTimeOffset Timestamp,
Guid? ParentExecutionId = null);
public record RouteToWaitForAttributeResponse(
string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
```
(`Success`/`ErrorMessage` = routing-level outcome, e.g. instance-not-found; `Matched`/`TimedOut`/`Value`/`Quality` = wait outcome.)
2. **`IInstanceRouter`** — add `Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);`. **Update all 3 implementers** (prod `CommunicationServiceInstanceRouter` + the 2 test doubles listed above; the test doubles can return a canned response / throw NotImplemented only if never exercised — prefer a sane canned response).
3. **`CommunicationServiceInstanceRouter`** — delegate to `_communicationService.RouteToWaitForAttributeAsync(...)`.
4. **`RouteHelper.RouteTarget`** — add (mirror `GetAttributes`, throw on `!Success`):
```csharp
public async Task<bool> WaitForAttribute(string attributeName, object? targetValue, TimeSpan timeout, CancellationToken cancellationToken = default)
{
var token = Effective(cancellationToken);
var siteId = await ResolveSiteAsync(token);
var request = new RouteToWaitForAttributeRequest(Guid.NewGuid().ToString(), _instanceCode,
attributeName, AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow, _parentExecutionId);
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
if (!response.Success) throw new InvalidOperationException(response.ErrorMessage ?? "Remote attribute wait failed");
return response.Matched;
}
```
(`AttributeValueCodec` is in Commons.Types — add the using if needed.)
5. **`CommunicationService.RouteToWaitForAttributeAsync`** — mirror `RouteToGetAttributesAsync` BUT bound the Ask by the wait timeout, not the generic integration timeout:
```csharp
var envelope = new SiteEnvelope(siteId, request);
var askTimeout = request.Timeout + _options.IntegrationTimeout; // slack beyond the wait
return await GetActor().Ask<RouteToWaitForAttributeResponse>(envelope, askTimeout, cancellationToken);
```
6. **Test** (`RouteHelperTests`): with a substitute `IInstanceRouter` returning a canned `RouteToWaitForAttributeResponse(Matched:true,...)`, `Route.To("x").WaitForAttribute("Flag", true, 30s)` returns true; `Success:false` → throws `InvalidOperationException`; the encoded target equals `AttributeValueCodec.Encode(true)`.
7. Build `Commons` + `InboundAPI` + `Communication` + the two affected test projects; run `--filter "FullyQualifiedName~RouteHelper"` + a build of AuditLog.Tests/InboundAPI.Tests to confirm the interface-addition compiles. Commit (pathspec).
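
The split between routing-level and wait-level outcomes, and the wait-aware Ask timeout from step 5, can be sketched as below. `AskTimeoutFor` and `Interpret` are hypothetical helpers; the record shape is copied from step 1, and the sample timeouts are made up.

```csharp
using System;

// Hypothetical helper: the cluster Ask has to outlive the wait it carries,
// so the wait timeout is extended by the integration timeout as slack.
TimeSpan AskTimeoutFor(TimeSpan waitTimeout, TimeSpan integrationTimeout) => waitTimeout + integrationTimeout;

// Hypothetical helper mirroring the rule in step 4: a routing failure throws,
// while a wait that simply timed out is a normal false result.
bool Interpret(RouteToWaitForAttributeResponse r) =>
    r.Success ? r.Matched : throw new InvalidOperationException(r.ErrorMessage ?? "Remote attribute wait failed");

var now = DateTimeOffset.UtcNow;
var timedOut = new RouteToWaitForAttributeResponse("c1", false, null, null, true, true, null, now);
var notFound = new RouteToWaitForAttributeResponse("c2", false, null, null, false, false,
    "Instance 'x' not found on this site.", now);

Console.WriteLine(AskTimeoutFor(TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(10))); // 00:00:40
Console.WriteLine(Interpret(timedOut)); // False

string? routingError = null;
try { Interpret(notFound); }
catch (InvalidOperationException ex) { routingError = ex.Message; }
Console.WriteLine(routingError);

// Record shape as given in step 1.
public record RouteToWaitForAttributeResponse(
    string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
    bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
```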
---
### Task WD-2b: Site unpacking + handler (§6, part 2)
**Classification:** high-risk (actor handler crossing into `InstanceActor`; Ask-timeout correctness)
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-2a
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/Actors/SiteCommunicationActor.cs` (add `Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));` next to the other RouteTo forwards ~line 145)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (`Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);` + handler)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/DeploymentManagerActorTests.cs`
**Steps (TDD):**
1. **`SiteCommunicationActor`** — add the `Receive`/Forward line.
2. **`DeploymentManagerActor.RouteInboundApiWaitForAttribute`** — mirror `RouteInboundApiGetAttributes`:
```csharp
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
{
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
{
Sender.Tell(new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false,
false, $"Instance '{request.InstanceUniqueName}' not found on this site.", DateTimeOffset.UtcNow));
return;
}
var sender = Sender;
var inner = new WaitForAttributeRequest(request.CorrelationId, request.InstanceUniqueName,
request.AttributeName, request.TargetValueEncoded, null /*predicate*/, request.Timeout,
DateTimeOffset.UtcNow /*, RequireGoodQuality defaults false */);
// Ask bounded by the WAIT timeout + slack (NOT a fixed 30s).
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
.ContinueWith(t => t.IsCompletedSuccessfully
? new RouteToWaitForAttributeResponse(request.CorrelationId, t.Result.Matched, t.Result.Value,
t.Result.Quality, t.Result.TimedOut, true, null, DateTimeOffset.UtcNow)
: new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false, false,
t.Exception?.GetBaseException().Message ?? "Attribute wait timed out", DateTimeOffset.UtcNow))
.PipeTo(sender);
}
```
(`WaitForAttributeRequest` lives in Commons `Messages/Instance` — add the using. Build with both the trailing-`RequireGoodQuality` and pre-field signatures in mind; passing 7 positional args + default is fine.)
3. **Test** (`DeploymentManagerActorTests`, mirror the routed get-attributes test): deploy/register an instance whose attribute already equals the target → `RouteToWaitForAttributeRequest` → `RouteToWaitForAttributeResponse(Success:true, Matched:true)`; unknown instance → `Success:false`.
4. Build `Communication` + `SiteRuntime` + SiteRuntime test project; run `--filter "FullyQualifiedName~DeploymentManagerActor"`. Commit (pathspec).
---
### Task WD-3: Integration — docs + full verification
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-1, WD-2a, WD-2b
**Files:**
- Modify: `docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md` (mark §3 `WaitForAsync`/`WaitResult`, §4.2 quality-gated mode, and §6 routed variant as IMPLEMENTED; note Test-Run sandbox parity excluded)
- Modify: `docs/requirements/Component-SiteRuntime.md` (script-surface note: `Attributes.WaitForAsync` + `requireGoodQuality`) and `docs/requirements/Component-InboundAPI.md` (`Route.To(...).WaitForAttribute`) — brief, only if those docs enumerate the script surface
- (No new component, no migration, no docker config change)
**Steps:**
1. Update the spec doc + component docs as above.
2. **Full-solution build:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` — 0 errors.
3. **Targeted test sweep** across everything touched:
`dotnet test tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/... --filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync|FullyQualifiedName~DeploymentManagerActor"`,
`dotnet test tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/... --filter "FullyQualifiedName~RouteHelper"`,
and a build of `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` + `tests/ZB.MOM.WW.ScadaBridge.Communication.Tests` to confirm no compile/regression from the interface addition.
4. `git diff` review; commit (pathspec).
---
## Out of scope (explicit)
- Routed `WaitForAttribute` is NOT wired into the CentralUI Test-Run sandbox (`ISandboxInstanceGateway`/`SandboxInstanceGateway`); production inbound scripts get it. Follow-up if Test-Run parity is wanted.
- No predicate or quality flag across the wire (§6 is value-equality only, per spec).
- No docker redeploy (no cluster-runtime config change; additive script surface only).
@@ -0,0 +1,10 @@
{
"planPath": "docs/plans/2026-06-17-waitfor-deferred-items.md",
"tasks": [
{"id": 1, "subject": "WD-1: site-local WaitForAsync + WaitResult + quality-gated mode (§3+§4.2)", "classification": "high-risk", "status": "pending", "parallelizableWith": [2]},
{"id": 2, "subject": "WD-2a: routed contract + central path (§6 part 1)", "classification": "high-risk", "status": "pending", "parallelizableWith": [1]},
{"id": 3, "subject": "WD-2b: site unpacking + DeploymentManager handler (§6 part 2)", "classification": "high-risk", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "WD-3: integration — docs + full verification", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]}
],
"lastUpdated": "2026-06-17"
}
@@ -158,16 +158,32 @@ is per-run and flat — `WHERE ExecutionId = X` returns everything one run did,
nothing links a run to the run that *spawned* it. `ParentExecutionId` carries the
spawning execution's `ExecutionId`: a spawned run still gets its own fresh
`ExecutionId`, and every audit row it emits also carries the spawner's id in
`ParentExecutionId`. The first cut bridges the **inbound API → routed-site-script**
case: an inbound request runs a method script that calls `Route.Call`, routing to
a site instance; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
itself is top-level (`ParentExecutionId` NULL). The pointer always references the
*immediate* spawner, so a routed run that itself routes onward threads its own
`ExecutionId` — walking `ParentExecutionId → ExecutionId` recursively
reconstructs the call chain as a tree of arbitrary depth. The tag-cascade case
(an attribute write triggering another script) is **deferred** — the model
generalises to it with no schema change once that spawn point is threaded.
`ParentExecutionId`. The pointer always references the *immediate* spawner, so a
run that itself spawns further runs threads its own `ExecutionId` — walking
`ParentExecutionId → ExecutionId` recursively reconstructs the call chain as a
tree of arbitrary depth.
**Tag-cascade coverage (M5.4 T4):** `ParentExecutionId` threading now spans all
known spawn points:
- **Inbound API → routed site script** — an inbound request runs a method script
that calls `Route.Call`; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
is top-level (`ParentExecutionId` NULL).
- **Alarm-triggered on-trigger script** — when an alarm fires and its on-trigger
script runs (via `AlarmActor → AlarmExecutionActor`), the alarm context's
`ExecutionId` is carried as the run's `ParentExecutionId`. Currently the alarm
subsystem has no Guid-typed firing id, so on-trigger runs are roots (NULL) in
practice, but the wiring is in place for a future alarm `ExecutionId`.
- **Nested `CallScript` / `CallShared` invocations** — when a script calls
`Instance.CallScript(...)` or a shared script via `CallShared`, the calling
execution's `ExecutionId` threads into the spawned run as its
`ParentExecutionId`, making deeply nested call chains visible as a tree.
Attribute-write-triggered cascades (one tag change triggering another script via a
tag subscription) are also wired: trigger-driven runs carry `ParentExecutionId =
NULL` (top-level roots), and any nested `CallScript`/`CallShared` they perform
chains as above. The schema is unchanged — no further tag-cascade work is deferred.
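
As an illustration of the recursive walk, the sketch below rebuilds a three-level chain from in-memory `(ExecutionId, ParentExecutionId)` pairs. The row shape, the variable names, and the one-row-per-execution simplification are assumptions; a real query would run against the `AuditLog` table, where one execution emits many rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows: one entry per execution.
var inbound = Guid.NewGuid();
var routed = Guid.NewGuid();
var nested = Guid.NewGuid();
var rows = new List<(Guid ExecutionId, Guid? ParentExecutionId)>
{
    (inbound, null),   // top-level inbound request
    (routed, inbound), // routed site script spawned by the inbound request
    (nested, routed),  // nested call spawned by the routed script
};

// Index children by their immediate spawner.
var childrenOf = rows
    .Where(r => r.ParentExecutionId.HasValue)
    .ToLookup(r => r.ParentExecutionId!.Value, r => r.ExecutionId);

var lines = new List<string>();
void Walk(Guid id, int depth)
{
    lines.Add(new string(' ', depth * 2) + id);
    // A lookup returns an empty sequence for leaves, which ends the recursion.
    foreach (var child in childrenOf[id]) Walk(child, depth + 1);
}

// Roots are executions with no spawner.
foreach (var root in rows.Where(r => r.ParentExecutionId is null)) Walk(root.ExecutionId, 0);
lines.ForEach(Console.WriteLine);
```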
## The Site-Local `AuditLog` (SQLite)
@@ -268,7 +284,34 @@ operational `SiteCalls` shape for the dispatcher and UI.
- **Default cap** — 8 KB for each of `RequestSummary` and `ResponseSummary`;
raised to 64 KB on any error row (`Status IN ('Failed', 'Parked', 'Discarded')`).
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and `ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB (configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min 8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to other channels do not apply here. `PayloadTruncated = 1` is set only when the inbound ceiling is hit — verbatim capture is the normal case. The ceiling applies independently to each body. Header redaction and per-target body redactors still run before persistence.
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and
`ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB
(configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min
8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to
other channels do not apply here. `PayloadTruncated = 1` is set only when the
inbound ceiling is hit — verbatim capture is the normal case. The ceiling
applies independently to each body. Header redaction and per-target body
redactors still run before persistence.
- **Inbound ceiling hits (M5.3 T7).** Every time the `InboundMaxBytes` ceiling
truncates a body an `IAuditInboundCeilingHitsCounter.Increment()` call fires.
This counter is surfaced as `AuditInboundCeilingHits` on the central health
snapshot (alongside `CentralAuditWriteFailures` / `AuditRedactionFailure`) so
operators can detect persistently oversized payloads and raise the ceiling or
add per-target body redactors.
- **Request headers in `Extra` (M5.3 T7).** For `Channel = ApiInbound`, the
`AuditWriteMiddleware` captures the inbound HTTP request headers after redaction
(`Authorization`, `X-API-Key`, `Cookie`, `Set-Cookie`, and the headers named in the configured
`HeaderRedactList` are scrubbed before serialization) into the `Extra` JSON
column under the key `"requestHeaders"`. This makes the full header envelope
visible in the Audit Log UI's detail drawer and the CLI's `audit query` output
without widening the schema.
- **Per-method `SkipBodyCapture` (M5.3 T7).** `PerTargetOverrides` now includes
a `SkipBodyCapture: true` flag. When set for an inbound API method, the audit
row is always emitted (headers, status, duration, actor, etc. are recorded) but
`RequestSummary` and `ResponseSummary` are left null. Use this for methods whose
payloads are structurally large or contain secrets not covered by body redactors.
Headers are still captured into `Extra.requestHeaders` (after redaction) even
when `SkipBodyCapture` is true.
- **Truncation** — UTF-8 byte-safe; `PayloadTruncated = 1` when applied. Full
bodies are never stored.
- **HTTP headers** — `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`, and
@@ -311,16 +354,33 @@ MS SQL for direct-write events). Unredacted secrets never persist.
## Retention & Purge
- **Central:** 365-day default based on `OccurredAtUtc`, configurable via
`AuditLog:RetentionDays` (min 7, max 3650). Single global retention in v1 —
no per-channel overrides.
`AuditLog:RetentionDays` (min 30, max 3650).
- **Partitioning:** monthly partitions on `OccurredAtUtc` from day one
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). Purge is a partition switch;
there are no row-level deletes at central.
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). The global partition switch is
channel-blind; it drops a whole month once every row in it is older than the
global window. There are no row-level deletes at central for the global purge.
- **Purge actor:** `AuditLogPurgeActor` singleton on the active central node
runs daily, switches out any partition whose latest `OccurredAtUtc` is older
than the retention window, and emits an `AuditLog:Purged` event (partition
range, rowcount, duration). A partition-maintenance step rolls forward each
month, creating the next month's partition ahead of time.
than the retention window, then applies any per-channel overrides (see below),
and emits an `AuditLog:Purged` event (partition range, rowcount, duration) per
switched partition. A partition-maintenance step rolls forward each month,
creating the next month's partition ahead of time.
- **Per-channel retention overrides (M5.5 T3):** `AuditLog:PerChannelRetentionDays`
is a dictionary keyed by canonical channel name (`ApiOutbound`, `DbOutbound`,
`Notification`, `ApiInbound`) whose value is a retention window in days that
MUST be strictly shorter than the global `RetentionDays`. After the daily
partition switch-out, the purge actor runs a bounded, batched row DELETE
(`PurgeChannelOlderThanAsync`) for each channel whose override is shorter than
the global window — expiring rows of that channel earlier than the global
partition switch would. Overrides equal to or longer than the global window are
silently skipped (the global switch already covers them). The DELETE runs under
`scadabridge_audit_purger` (the maintenance role); the append-only writer role
is unaffected. Batch size is configurable via
`AuditLogPurge:ChannelPurgeBatchSize` (default 5000). Each channel override
runs in its own try/catch, mirroring the per-boundary error-isolation of the
partition switch-out loop. Values are validated to be in
`[30, RetentionDays]`; keys that are not a recognized `AuditChannel` enum name
are rejected at startup.
- **Sites:** daily site job; default 7-day retention (configurable, min 1,
max 90). Respects the hard `ForwardState` invariant — `Pending` rows are
never purged on age alone.
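
The per-channel override rules above can be sketched as two small checks. `Validate` and `EffectiveOverrides` are hypothetical helpers, the channel list is taken from the bullet above, and the case-sensitive name match is an assumption. This is not the purge actor's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Canonical channel names as listed in the per-channel retention bullet.
var channels = new[] { "ApiOutbound", "DbOutbound", "Notification", "ApiInbound" };

// Hypothetical startup validation: unknown keys and out-of-range values are rejected.
List<string> Validate(int retentionDays, IReadOnlyDictionary<string, int> overrides)
{
    var errors = new List<string>();
    foreach (var (channel, days) in overrides)
    {
        if (!channels.Contains(channel)) errors.Add($"Unknown channel '{channel}'");
        else if (days < 30 || days > retentionDays) errors.Add($"'{channel}' must be in [30, {retentionDays}]");
    }
    return errors;
}

// Only overrides strictly shorter than the global window lead to a row DELETE;
// the rest are skipped because the partition switch already covers them.
IEnumerable<string> EffectiveOverrides(int retentionDays, IReadOnlyDictionary<string, int> overrides) =>
    overrides.Where(kv => kv.Value < retentionDays).Select(kv => kv.Key);

var bad = new Dictionary<string, int> { ["Bogus"] = 60, ["DbOutbound"] = 10, ["ApiOutbound"] = 90 };
var good = new Dictionary<string, int> { ["ApiOutbound"] = 90, ["Notification"] = 365 };

var errors = Validate(365, bad);
var effective = EffectiveOverrides(365, good).ToList();

Console.WriteLine(errors.Count);                 // 2
Console.WriteLine(string.Join(",", effective));  // ApiOutbound
```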
@@ -340,10 +400,13 @@ MS SQL for direct-write events). Unredacted secrets never persist.
**AuditExport** permission.
- **Payload redaction at write.** See Payload Capture Policy. Unredacted
secrets never persist; the safety net over-redacts on misconfiguration.
- **Hash-chain tamper evidence — deferred to v1.x.** A future `RowHash` column,
computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will be
verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. Off by
default in v1.
- **Hash-chain tamper evidence (T1) — deferred to v1.x.** A future `RowHash`
column, computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will
be verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. The
`verify-chain` CLI command is a no-op placeholder today. Off by default in v1.
- **Parquet archival (T2) — deferred to v1.x.** Long-term cold storage of purged
monthly partitions as Parquet files (suitable for offline analytics) will be
added in a future milestone. T1 and T2 are not shipped as part of M5.
- **Site SQLite security.** File permissions: read/write by the ScadaBridge
service account only. Not backed up off-machine — site SQLite is a buffer,
not a record.
@@ -355,11 +418,22 @@ Point-in-time, computed from the central `AuditLog` table; global and per-site.
- **Audit volume** — events/min landing in the central `AuditLog`; global plus per-site sparkline.
- **Audit error rate** — % of central `AuditLog` rows with `Status IN ('Failed', 'Parked', 'Discarded')` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, permanent failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — sum of `Pending` site rows across sites; click drills into a per-site breakdown.
- **`AuditInboundCeilingHits`** (M5.3 T7) — rolling count of inbound API responses truncated by the `InboundMaxBytes` ceiling; surfaced on the central health snapshot alongside `CentralAuditWriteFailures`.
**Per-node stuck KPIs (M5.3 T6):** Both [Notification Outbox](Component-NotificationOutbox.md)
and [Site Call Audit](Component-SiteCallAudit.md) now expose a
`PerNodeNotificationKpiRequest` / `PerNodeSiteCallKpiRequest` message pair that
groups the existing stuck, parked, and delivered-last-interval counts by the
`SourceNode` that emitted the original row. This surfaces per-node breakdowns on
the Health dashboard tiles and the Notification Outbox / Site Calls pages,
making it possible to identify a single misbehaving node (e.g., `site-a:node-b`)
as the source of a spike rather than a site-wide problem. The existing global and
per-site KPI shapes are unchanged; the per-node slice is additive.
[Notification Outbox](Component-NotificationOutbox.md) and
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected; they remain
sourced from `Notifications` and `SiteCalls` respectively. Audit Log KPIs
describe the audit table itself.
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected for their
operational dispatch responsibilities — they remain sourced from `Notifications`
and `SiteCalls` respectively. Audit Log KPIs describe the audit table itself.
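
The per-node slice amounts to grouping the existing counts by `SourceNode`. A minimal LINQ sketch follows; the row shape and the status strings are hypothetical stand-ins, not the actual `Notifications` or `SiteCalls` schema.

```csharp
using System;
using System.Linq;

// Hypothetical rows; the real tables and status names may differ.
var rows = new[]
{
    (SourceNode: "site-a:node-a", Status: "Delivered"),
    (SourceNode: "site-a:node-b", Status: "Stuck"),
    (SourceNode: "site-a:node-b", Status: "Stuck"),
    (SourceNode: "site-a:node-b", Status: "Parked"),
};

// Group by emitting node, then count each status within the group.
var perNode = rows
    .GroupBy(r => r.SourceNode)
    .ToDictionary(
        g => g.Key,
        g => (Stuck: g.Count(r => r.Status == "Stuck"),
              Parked: g.Count(r => r.Status == "Parked"),
              Delivered: g.Count(r => r.Status == "Delivered")));

foreach (var (node, kpi) in perNode.OrderBy(kv => kv.Key))
    Console.WriteLine($"{node}: stuck={kpi.Stuck} parked={kpi.Parked} delivered={kpi.Delivered}");
```

With this sample data the spike is attributable to `site-a:node-b` alone, which is the kind of diagnosis the per-node breakdown is meant to support.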
## Configuration
@@ -370,21 +444,78 @@ component (Options pattern):
"AuditLog": {
"DefaultCapBytes": 8192,
"ErrorCapBytes": 65536,
"InboundMaxBytes": 1048576,
"HeaderRedactList": [ "Authorization", "Cookie", "Set-Cookie", "X-API-Key" ],
"GlobalBodyRedactors": [
{ "Pattern": "\"password\"\\s*:\\s*\"[^\"]+\"", "Replacement": "\"password\":\"<redacted>\"" }
],
"PerTargetOverrides": {
"Weather/GetForecast": { "CapBytes": 4096 },
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" }
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" },
"HighVolumeMethod": { "SkipBodyCapture": true }
},
"RetentionDays": 365
"RetentionDays": 365,
"PerChannelRetentionDays": {
"ApiOutbound": 90,
"Notification": 180
}
}
```
`PerTargetOverrides` keys bind by External System / Inbound Method /
Notification List / Database Connection name. `RetentionDays` is a single
global value in v1; per-channel overrides are deferred to v1.x.
Notification List / Database Connection name. `SkipBodyCapture: true` omits
`RequestSummary`/`ResponseSummary` for that method while still capturing headers
into `Extra.requestHeaders` and emitting the full audit row. `RetentionDays` is
the global window; `PerChannelRetentionDays` specifies per-channel windows that
are strictly shorter — any channel whose override equals or exceeds the global
value is silently ignored (the global partition switch-out already governs it).
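The "strictly shorter" rule can be sketched in a few lines of Python. The function name and the dictionary shape are hypothetical; the sketch only mirrors the rule as stated above and is not the component's actual resolution code.

```python
def effective_retention_days(channel, global_days, per_channel):
    """Resolve the retention window for one channel.

    An override applies only when it is strictly shorter than the global
    window; an equal or longer override is ignored, as is a missing one.
    """
    override = per_channel.get(channel)
    if override is not None and override < global_days:
        return override
    return global_days


# Values taken from the sample configuration.
per_channel = {"ApiOutbound": 90, "Notification": 180}
print(effective_retention_days("ApiOutbound", 365, per_channel))
```

With the sample configuration, a channel with no entry falls back to the global 365 days, and so would an override of 365 or 400.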
The `AuditLogPurge` section controls the purge actor's cadence and batch size:
```jsonc
"AuditLogPurge": {
"IntervalHours": 24,
"ChannelPurgeBatchSize": 5000
}
```
## Ops Notes — Historical Null Columns
### `SourceNode` backfill (M5.6 T5)
`SourceNode` (`varchar(64)` NULL) is a physical column stamped on every row at
write time. Rows ingested before M5.6 shipped have `SourceNode IS NULL` because
the value was not populated until the feature landed. A one-time CLI command sets
these to a configurable sentinel:
```
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel unknown] [--batch 5000]
```
The default sentinel is `"unknown"`. The true node-of-origin for pre-feature rows
is **unknowable** retroactively — the emitting node is long gone from the telemetry
pipeline. The sentinel makes that explicit rather than leaving the column NULL
(which the Audit Log UI's Node filter already treats as "unresolved", but which
an operator might mistake for a data-quality bug).
The backfill runs via `POST /api/audit/backfill-source-node` (Admin role required).
It executes on the maintenance/purge path rather than under the append-only
`scadabridge_audit_writer` role. It is idempotent and can be re-run safely.
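To make the batching and idempotency concrete, here is a minimal Python sketch against an in-memory SQLite table. The table name `AuditLog`, the use of SQLite, and the function name are assumptions for illustration only; the real backfill runs server-side against the production store.

```python
import sqlite3


def backfill_source_node(conn, before_utc, sentinel="unknown", batch=5000):
    """Set SourceNode to the sentinel on NULL rows older than before_utc.

    Works in batches and returns the total number of rows updated.
    Re-running is a no-op because updated rows no longer match IS NULL.
    Timestamps are ASSUMED to be uniform ISO-8601 UTC strings, so plain
    string comparison orders them correctly.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE AuditLog SET SourceNode = ?
            WHERE rowid IN (
                SELECT rowid FROM AuditLog
                WHERE SourceNode IS NULL AND OccurredAtUtc < ?
                LIMIT ?
            )
            """,
            (sentinel, before_utc, batch),
        )
        conn.commit()  # one transaction per batch keeps each one short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Only the `IS NULL` predicate makes the operation idempotent: rows that already carry a real node name or the sentinel are never touched.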
### `ExecutionId` and `ParentExecutionId` — cannot be backfilled
`ExecutionId` and `ParentExecutionId` are **PERSISTED COMPUTED columns** derived
from `DetailsJson`. They were introduced in the same feature window as the column
itself but their value comes from the JSON payload that was written at ingest time.
The AuditLog append-only invariant **forbids mutating `DetailsJson`** — rows may
only be inserted, never updated. Because backfilling the computed values would
require rewriting the underlying `DetailsJson`, it is impossible under the
append-only contract. Pre-feature rows carry `NULL` in both columns permanently.
This is a documented limitation, not a defect. The NULL values are visible in the
Audit Log UI's execution-tree drilldown (rows with no `ExecutionId` appear as
orphaned entries) and in the CLI's `audit tree` output.
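For orientation, the root-then-descend walk behind the execution tree can be sketched in Python as follows. The dictionary input, the function names, and the two-space indentation are hypothetical, and the sketch assumes the parent links are acyclic; it does not reproduce the server's `GetExecutionTreeAsync` implementation.

```python
def find_root(parents, execution_id):
    """Walk ParentExecutionId links upward until a node has no parent.

    parents maps execution_id -> parent_execution_id (None at the root).
    Assumes the links contain no cycles.
    """
    current = execution_id
    while parents.get(current) is not None:
        current = parents[current]
    return current


def render_tree(parents, execution_id):
    """Return the whole chain as indented lines, starting from its root."""
    root = find_root(parents, execution_id)

    # Invert the child -> parent map so we can traverse downward.
    children = {}
    for child, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```

Starting from any node yields the same tree, because the walk always climbs to the root first. Rows with a NULL `ExecutionId` never enter the map, which is why they cannot be placed in any chain.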
## Dependencies
@@ -442,6 +573,8 @@ global value in v1; per-channel overrides are deferred to v1.x.
tiles (Volume, Error rate, Backlog) plus new health metrics:
`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`,
`CentralAuditWriteFailures`, `AuditRedactionFailure`.
- **[CLI (#19)](Component-CLI.md)** — new `scadabridge audit query`,
`scadabridge audit export`, and `scadabridge audit verify-chain` commands; same
permission requirements as the UI.
- **[CLI (#19)](Component-CLI.md)** — `scadabridge audit query`,
`scadabridge audit export`, `scadabridge audit tree --execution-id <guid>`,
`scadabridge audit backfill-source-node --sentinel <s> --before <date>`, and
`scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain
feature); same permission requirements as the UI.
+20 -5
@@ -228,14 +228,17 @@ The new centralized Audit Log component (#23) is exposed via the `scadabridge au
The `scadabridge audit` group targets the centralized Audit Log component (#23) and
exposes the UI-equivalent operational audit surface. Permissions follow the same
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
require the `OperationalAudit` permission; `audit export` additionally requires
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
exit code 2) on denial.
Tamper-Evidence, and Security & Auth #10): `audit query`, `audit tree`, and
`audit verify-chain` require the `OperationalAudit` permission; `audit export`
additionally requires `AuditExport`; `audit backfill-source-node` requires the
`Admin` role (maintenance path only). The server enforces permission checks and
returns HTTP 403 (CLI exit code 2) on denial.
```
scadabridge audit query [--since <t>] [--until <t>] [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>] [--correlation-id <id>] [--execution-id <id>] [--parent-execution-id <id>] [--errors-only] [--page-size <n>] [--all]
scadabridge audit export --since <t> --until <t> --format csv|jsonl|parquet --output <path> [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>]
scadabridge audit tree --execution-id <guid> [--format table|json]
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
scadabridge audit verify-chain --month <YYYY-MM>
```
@@ -247,6 +250,18 @@ scadabridge audit verify-chain --month <YYYY-MM>
requested format (`csv`, `jsonl`, `parquet`) written to `--output`. The server
streams rows rather than materializing them in memory; the CLI writes bytes
through to disk. Supports the same scoping filters as `audit query`.
- `audit tree --execution-id <guid>` (M5.3 T8) — renders the full execution-chain
tree for the given `ExecutionId`. The server resolves the root from any node in
the chain (walks `ParentExecutionId` to find the root, then traverses downward)
and returns all reachable executions with their summary row counts and first/last
occurred timestamps. Output format: `json` (default — structured tree suitable
for scripting) or `table` (human-readable indented tree). Requires
`OperationalAudit` permission. Backed by `GET /api/audit/tree?executionId=<guid>`.
- `audit backfill-source-node --before <ISO-8601-UTC>` (M5.6 T5) — sets
`SourceNode` to a sentinel value (`--sentinel`, default `"unknown"`) on pre-feature
rows where `SourceNode IS NULL` and `OccurredAtUtc < --before`, in batches
(`--batch`, default 5000). Admin-only maintenance command. Idempotent.
Backed by `POST /api/audit/backfill-source-node`.
- `audit verify-chain` — hash-chain verification for the named month.
**No-op in v1**: the command is defined so the command tree is stable, but
verification only becomes meaningful once the hash-chain ships (see
@@ -366,7 +381,7 @@ Configuration is resolved in the following priority order (highest wins):
- **System.CommandLine**: Command-line argument parsing.
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadabridge audit` command group rides a parallel REST surface on the same Host (`GET /api/audit/query` and `GET /api/audit/export`), sharing HTTP Basic Auth with `/management` but bypassing the actor for read-only, keyset-paged / streaming workloads.
- **Audit Log (#23)**: The `scadabridge audit query` and `audit export` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) on the Host's Management API surface; `audit verify-chain` rides `POST /management` until hash-chain verification ships. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side by `AuditEndpoints`.
- **Audit Log (#23)**: The `scadabridge audit query`, `audit export`, `audit tree`, and `audit backfill-source-node` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`, `GET /api/audit/tree`, `POST /api/audit/backfill-source-node`) on the Host's Management API surface; `audit verify-chain` is a client-side no-op today (hash-chain deferred to v1.x). Permission checks (`OperationalAudit`, `AuditExport`, `Admin`) are enforced server-side by `AuditEndpoints`.
## Interactions
@@ -189,6 +189,7 @@ Inbound API scripts **cannot** call shared scripts directly — shared scripts a
- `Route.To("instanceUniqueCode").GetAttributes("attr1", "attr2", ...)` — Read multiple attribute values in a **single call**, returned as a dictionary of name-value pairs.
- `Route.To("instanceUniqueCode").SetAttribute("attributeName", value)` — Write a single attribute value on a specific instance at any site.
- `Route.To("instanceUniqueCode").SetAttributes(dictionary)` — Write multiple attribute values in a **single call**, accepting a dictionary of name-value pairs.
- `Route.To("instanceUniqueCode").WaitForAttribute("attributeName", targetValue, timeout)` — Wait, event-driven, until an attribute on a specific instance at any site reaches `targetValue` (value-equality only across the wire), bounded by `timeout`. Returns `true` if matched within the timeout, `false` if it timed out. The cluster call is bounded by the wait timeout rather than the generic integration timeout.
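The event-driven, timeout-bounded semantics of `WaitForAttribute` can be illustrated with a small in-process Python sketch. The `AttributeStore` class and its method names are hypothetical stand-ins; the real call crosses the cluster, which this sketch does not model.

```python
import threading


class AttributeStore:
    """Hypothetical in-memory store used only to illustrate the wait."""

    def __init__(self):
        self._values = {}
        self._cond = threading.Condition()

    def set_attribute(self, name, value):
        with self._cond:
            self._values[name] = value
            self._cond.notify_all()  # wake waiters so they re-check

    def wait_for_attribute(self, name, target, timeout):
        """Block until the attribute equals target or timeout (seconds) passes.

        Returns True on a match within the timeout, False otherwise.
        Waiters sleep on the condition instead of polling.
        """
        def matched():
            return name in self._values and self._values[name] == target

        with self._cond:
            return self._cond.wait_for(matched, timeout=timeout)
```

The comparison is plain value equality, matching the "value-equality only" note above; richer predicates are outside the scope of this sketch.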
#### Input/Output
- **Input parameters** are available as defined in the method definition.