merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes

This commit is contained in:
Joseph Doherty
2026-06-17 09:28:15 -04:00
88 changed files with 7714 additions and 169 deletions
+6 -4
View File
@@ -163,14 +163,16 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
- Scope = script trust boundary: outbound API (sync + cached), outbound DB (sync + cached), notifications, inbound API. Framework/internal traffic is explicitly excluded.
- One row per lifecycle event; cached calls produce 4+ rows per operation (`Submitted`, `Forwarded`, `Attempted`, `Delivered`/`Parked`/`Discarded`).
- `ExecutionId` (`uniqueidentifier NULL`) is the universal per-run correlation value — every audit row emitted by one script execution / inbound request shares it; `CorrelationId` remains the per-operation lifecycle id (NULL for sync one-shots).
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; first cut bridges the inbound API → routed-site-script case (the routed run records the inbound request's `ExecutionId`; the inbound row stays top-level / NULL); `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk; tag cascade deferred.
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; bridges inbound API → routed-site-script, alarm-triggered on-trigger scripts, and nested `CallScript`/`CallShared` invocations; `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk. Tag-cascade coverage is complete as of M5.4 (T4) — no further spawn points are deferred.
- Site SQLite hot-path first, then gRPC telemetry to central; ingest is idempotent on `EventId`; periodic reconciliation pull as fallback when telemetry is lost.
- Cached operations: site emits a single additively-extended `CachedCallTelemetry` packet carrying both audit events and operational state; central writes `AuditLog` + `SiteCalls` in one transaction.
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in. Inbound API: full verbatim capture up to `InboundMaxBytes` (default 1 MiB); request headers stored in `Extra.requestHeaders` (post-redaction); per-method `SkipBodyCapture` flag suppresses bodies while still recording headers + metadata; `AuditInboundCeilingHits` counter surfaced on health snapshot. (M5.3 T7)
- Audit-write failure NEVER aborts the user-facing action — audit is best-effort, the action's own success/failure path is authoritative.
- 365-day central retention with monthly partition-switch purge; 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence and Parquet archival are deferred to v1.x.
- 365-day central retention with monthly partition-switch purge; per-channel retention overrides (`AuditLog:PerChannelRetentionDays`) expire rows earlier than the global window via a bounded, batched row DELETE on the purge actor's maintenance path — values must be shorter than the global window (M5.5 T3); 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence (T1) and Parquet archival (T2) are deferred to v1.x — not shipped in M5.
- Node-of-origin is captured alongside site-of-origin: `SourceNode` (`varchar(64)` NULL) on `AuditLog`, `Notifications`, and `SiteCalls``node-a`/`node-b` for site rows (qualified by `SourceSiteId`/`SourceSite`), `central-a`/`central-b` for central direct-write rows. Stamped at the writing node, carried verbatim through telemetry + reconciliation, and indexed via `IX_AuditLog_Node_Occurred (SourceNode, OccurredAtUtc)` on `AuditLog`.
- Per-node stuck KPIs (M5.3 T6): Notification Outbox and Site Call Audit expose `PerNodeNotificationKpiRequest`/`PerNodeSiteCallKpiRequest` messages that group stuck/parked/delivered counts by `SourceNode`, surfacing per-node breakdowns on the Health dashboard.
- `audit tree --execution-id <guid>` CLI command (M5.3 T8) + `GET /api/audit/tree` endpoint — resolves any node to its chain root and renders the full execution tree; backed by `IAuditLogRepository.GetExecutionTreeAsync`.
- Central UI: new top-level **Audit** nav group + Audit Log page, with drill-ins from Notifications, Site Calls, External Systems, Inbound API Keys, Sites, and Instances.
### Security & Auth
@@ -0,0 +1,150 @@
# M5 — Audit Hardening (T3T8) — Design
**Status:** Approved (awaiting plan).
**Worktree/branch:** `worktree-m5-audit-hardening` off `main` (`e77e209`).
**Source:** Phase-2 milestone M5 from `docs/plans/2026-06-15-stillpending-completion-design.md`.
## Goal
Harden the centralized Audit Log with six independent, ready-to-build items. Two
items originally listed under M5 — **T1 hash-chain tamper evidence** and **T2
Parquet export** — remain **deferred to v1.x** (per CLAUDE.md's audit design
decisions); their stubs (CLI `verify-chain` no-op, export `501`) stay unchanged.
## Scope (in)
T3 per-channel retention · T4 ParentExecutionId tag-cascade · T5 historical
backfill (reframed) · T6 per-node stuck KPIs · T7 structured response-capture
increments · T8 CLI `audit tree`.
## Scope (out / deferred to v1.x)
T1 hash-chain (no Hash/PrevHash columns, no real verify-chain), T2 Parquet
export (the `501` gate stays). Reversing those deferrals is a separate decision.
---
## Items
### T8 — CLI `audit tree` (smallest; reuses existing server walk + UI)
The recursive execution-tree walk (`IAuditLogRepository.GetExecutionTreeAsync`,
backed by `IX_AuditLog_ParentExecution`) and the Blazor `ExecutionTreePage`
already exist; only an HTTP projection + CLI surface are missing.
- **Server:** add `GET /api/audit/tree?executionId=…` in
`AuditEndpoints.MapAuditAPI``repo.GetExecutionTreeAsync` → serialize
`ExecutionTreeNode[]`.
- **CLI:** add `audit tree --execution-id <guid> [--format table|json]` in
`AuditCommands` + an `AuditTreeHelpers` renderer (indented ASCII tree for
`table`; raw nodes for `json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- No schema change. **Tests:** endpoint returns the tree; CLI renders a
multi-level tree + handles not-found.
### T6 — Per-node stuck-count KPIs
KPIs are per-site today; `SourceNode` is on the `Notification` and `SiteCalls`
rows but not aggregated.
- Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to the existing
`ComputePerSiteKpisAsync` in `NotificationOutboxRepository` and
`SiteCallAuditRepository`.
- New `PerNode…KpiRequest`/`Response` message pair per actor; register in each
actor's `Receive<>`.
- Surface a per-node breakdown on the existing KPI tiles
(`AuditKpiTiles`/`SiteCallKpiTiles`) — additive, behind the existing tiles.
- **Tests:** repository grouping returns correct per-node counts (stuck/parked/
queue-depth); message round-trip.
### T7 — Structured response-capture increments (no schema change)
- **(a) Inbound request headers** → captured into the existing `Extra` JSON in
`AuditWriteMiddleware.EmitInboundAudit`, passed through the existing header
redactor (auth headers redacted by default).
- **(b) `AuditInboundCeilingHits`** counter on `AuditCentralHealthSnapshot`
(alongside the existing failure counters), incremented when an inbound row
truncates (request or response hits `InboundMaxBytes`). Surfaced via the
health snapshot.
- **(c) Per-method opt-out** of body capture: a `SkipBodyCapture` flag on
`PerTargetRedactionOverride`, checked in the capture pipeline so a noisy/
sensitive method can suppress body capture (headers + metadata still recorded).
- **Tests:** request headers land in `Extra` and are redacted; ceiling-hit
increments the counter; opt-out suppresses body but keeps the row.
### T4 — `ParentExecutionId` tag-cascade (touches the actor model — high-risk)
Completes the execution tree beyond the inbound-API→routed-script case.
- **Alarm on-trigger:** thread a `Guid? parentExecutionId` through
`AlarmActor.SpawnAlarmExecutionActor``AlarmExecutionActor`
`ScriptRuntimeContext`, so an alarm-triggered script chains to its firing
context (the alarm's own execution id where one exists; otherwise a root).
- **Nested `CallScript`/`CallShared`:** in `ScriptRuntimeContext`, pass **the
current run's `ExecutionId`** (not the inherited `_parentExecutionId`) as the
child invocation's `ParentExecutionId`, so `A → CallScript(B)` records B's
parent as A — a true multi-level tree.
- **Timer/expression-trigger top-level runs** stay roots (no spawner) — unchanged.
- **Tests:** alarm-triggered script row carries the expected parent; a 2-level
nested `CallScript` produces a chain A→B→C walkable by `GetExecutionTreeAsync`.
- **Risk:** serialized actor state + correlation plumbing; covered by targeted
SiteRuntime actor tests + a tree-walk integration assertion.
### T3 — Per-channel retention overrides (one design wrinkle, resolved)
Retention is a single global `RetentionDays`; the purge actor switches out whole
month partitions by `OccurredAtUtc` (channel-blind).
- Add `PerChannelRetentionDays` (`Dictionary<string,int>`, keyed by channel /
`Action` name) to `AuditLogOptions`, validated like the global value; a channel
override may only be **shorter** than the global window (longer is meaningless
under month-partition switch-out, which is governed by the largest retention).
- **Mechanism (resolved):** after the coarse global partition purge, the purge
actor runs a **bounded row-level delete** for channels whose override is
shorter than global (`DELETE … WHERE Action=@channel AND OccurredAtUtc<@thr`,
batched). This runs from the **purge/maintenance path, not the writer role**
the append-only invariant binds the writer/ingest role, not maintenance. The
**M2.10 CI grep-guard is widened** to allow the purge actor's single audited
deletion call site (an allow-list entry, not a blanket exemption).
- **Tests:** a channel with a shorter override is purged earlier than the global;
channels without an override follow the global; the guard still rejects
UPDATE/DELETE everywhere except the sanctioned purge site.
### T5 — Historical backfill (reframed per the computed-column reality)
- **`SourceNode`** is a physical nullable column. For truly historical rows the
node-of-origin is **unknowable**, so the backfill sets a **configurable
sentinel** (default `"unknown"`) on `NULL` rows via a one-shot maintenance
command (run from the purge/maintenance path), rather than guessing a node.
- **`ExecutionId`/`ParentExecutionId`** are **persisted computed columns derived
from `DetailsJson`**; backfilling them means mutating the JSON, which
append-only forbids. These are **documented as a runbook limitation** (pre-feature
rows stay NULL) — no code.
- **Tests:** the SourceNode backfill sets the sentinel only on NULL rows within a
bounded range and is idempotent; documentation note added.
---
## Cross-cutting
- **Shared seams:** `AuditLogOptions` (T3, T7), `AuditEndpoints.MapAuditAPI`
(T8), `AuditCommands` (T8), `AuditCentralHealthSnapshot` (T6, T7),
`IAuditLogRepository`/the KPI repositories (T6), the purge/maintenance role
(T3, T5). No AuditLog **schema** change in M5 (T1/T2 deferred).
- **Append-only:** the only new deletion is T3's purge-role channel delete +
T5's purge-role sentinel UPDATE — both maintenance-path, both reflected in the
CI guard's allow-list. Writer/ingest paths stay INSERT-only.
## Testing strategy
Per-item unit + targeted integration tests (above). T4 additionally gets a
tree-walk integration assertion. Full-solution build + targeted suites at the
integration step. No new infra dependency (Parquet deferred).
## Sequencing
Independent items, parallelizable by disjoint area:
- **Wave A (parallel):** T8 (CLI+endpoint), T6 (KPI repos+actors+tiles), T7
(middleware+health+redaction-override) — disjoint projects.
- **Wave B (parallel):** T4 (SiteRuntime actors — high-risk), T3 (AuditLog
options+purge actor+CI guard), T5 (purge-path backfill command + runbook).
- **Wave C:** integration verification + docs (Component-AuditLog/-CLI, CLAUDE.md
KPI/retention notes, runbook).
## Risks
- **T4** actor-model correlation (serialized state) — targeted tests + tree-walk
assertion.
- **T3** append-only tension — resolved via maintenance-role delete + CI-guard
allow-list; verify the guard still blocks all other DELETE/UPDATE.
- **T5** node-of-origin unknowable — sentinel + documented limitation (no false
precision).
@@ -0,0 +1,92 @@
# M5 — Audit Hardening (T3T8) Implementation Plan
> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
**Conventions:** targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
---
# Wave A — leverage-existing-infra (parallel; disjoint projects)
### Task M5.1 (T8): CLI `audit tree` + tree endpoint
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=<guid>``IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
**AC:** `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
### Task M5.2 (T6): Per-node stuck-count KPIs
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
**Files:**
- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
### Task M5.3 (T7): Structured response-capture increments
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
---
# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
### Task M5.4 (T4): ParentExecutionId tag-cascade
**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
### Task M5.5 (T3): Per-channel retention overrides
**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
### Task M5.6 (T5): SourceNode sentinel backfill + runbook
**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
**Files:**
- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
---
# Wave C — integration + docs
### Task M5.7: Integration verification + docs
**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1M5.6
**Steps:**
1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
---
## Native tasks & dependencies
Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
@@ -0,0 +1,13 @@
{
"planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
"tasks": [
{"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
{"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
{"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
{"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
{"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
{"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
{"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
],
"lastUpdated": "2026-06-16"
}
@@ -0,0 +1,264 @@
# Patch request — event-driven "wait for attribute change (with timeout)" script helper
**Date:** 2026-06-17
**Type:** Source enhancement (small, additive) to the SiteRuntime script surface
**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
**event-driven** for a tag/attribute to reach a value, bounded by a timeout.
> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
---
## 1. The gap
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.
ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:
```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
await Task.Delay(200, CancellationToken);
}
```
Why this is unsatisfactory:
- **Latency** — completion is detected up to one poll interval late (200 ms here).
- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
`InstanceActor`); N handshakes × M polls = a lot of needless messages.
- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
(forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.
Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.
---
## 2. Where the change goes (verified wiring)
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)``src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)``src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate``TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute``Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |
**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".
---
## 3. Proposed API (script-facing)
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
path resolution (`Resolve(name)`) applies just like get/set:
```csharp
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
// within the timeout, false if it timed out. Honors the script CancellationToken.
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
// Optional richer overload that also returns the matched value + quality.
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
// record WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
```
> **Status:** IMPLEMENTED. `Attributes.WaitForAsync(...)` returns a `WaitResult`
> (`readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut)`
> in Commons), populated on match (Value + Quality) and `Matched:false, TimedOut:true` on timeout.
Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
exception. The value-equality overload is the one the handshake needs and is the one that can also
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
Handshake, rewritten (replaces the §1 poll loop):
```csharp
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
return new {
Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
};
```
```csharp
await Attributes.SetAsync("MoveInFlag", true);
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
```
---
## 4. Implementation outline (the patch)
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
```csharp
// actor protocol (site-local; delegate is fine because messaging is in-process)
public record WaitForAttributeRequest(
string CorrelationId,
string InstanceName,
string AttributeName, // already scope-resolved by the accessor
string? TargetValueEncoded, // AttributeValueCodec.Encode(targetValue); null = "any change"
Func<object?, bool>? Predicate, // local-only; null when TargetValueEncoded is used
TimeSpan Timeout,
DateTimeOffset OccurredAtUtc);
public record WaitForAttributeResponse(
string CorrelationId,
bool Matched,
object? Value,
string Quality,
bool TimedOut,
string? ErrorMessage = null);
// internal self-message used to fire the timeout
public record WaitForAttributeTimeout(string CorrelationId);
```
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
`PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
- **Handle `WaitForAttributeRequest`:**
1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
use `Predicate`).
2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
`WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
3. Otherwise register the waiter and schedule the timeout:
`Context.System.Scheduler.ScheduleTellOnce(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
storing the returned `ICancelable`. Capture `Sender` now (it is invalid later).
4. Bound `effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` (the caller's `Ask`
already carries the script token; see §4.3). Optionally cap the number of concurrent waiters
per instance (defensive; reply with `ErrorMessage` if exceeded).
- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
allow removal during enumeration.)
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
`WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
Bad-quality transients.
> **Status:** IMPLEMENTED as an opt-in `requireGoodQuality` parameter on `WaitAsync`/`WaitForAsync`
> (additive trailing `RequireGoodQuality` field on `WaitForAttributeRequest`, gated at both the
> fast-path and resolve-loop match sites). Default `false` = quality-agnostic (matches on value only).
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
- Add a method that the accessor calls:
```csharp
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
Func<object?,bool>? predicate, TimeSpan timeout)
{
var cid = Guid.NewGuid().ToString();
var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
predicate, timeout, DateTimeOffset.UtcNow);
// Ask bounded by the script timeout token so a script-deadline abort cancels the await.
var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
return resp.Matched;
}
```
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
```csharp
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
```
### 4.6 Trust model — no change
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
+ `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify the new helper
type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`) — they don't.
---
## 5. Correctness notes
- **No missed edge.** Registration (current-value check) and change-handling both run on the
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
is caught by the fast-path check; a value that flips after registration is caught by
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
event-driven and cheaper.
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
the waiter is removed and the caller answered even if the value never changes. Match cancels the
scheduled timeout.
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
---
## 6. Optional: inbound / routed variant
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
fully cover the DELMIA/MES use case.
> **Status:** IMPLEMENTED. `Route.To(code).WaitForAttribute(name, targetValue, timeout)` is wired
> end-to-end (`RouteToWaitForAttributeRequest/Response` → `IInstanceRouter` → `CommunicationService`
> → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only
> across the wire. NOT wired into the CentralUI Test-Run sandbox — that remains a follow-up.
---
## 7. Acceptance criteria
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
with no poll loop.
2. Returns `false` (no throw) when the value never matches within the timeout.
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
value in the same actor step as registration, and one step after).
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
composed child's attribute).
7. Passes `ScriptAnalysis` trust validation unchanged.
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
of the poll loop.
Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
the script-surface tests under `tests/…/SiteRuntime*`.
```
@@ -0,0 +1,226 @@
# WaitAsync Deferred Optional Items — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (subagent-driven) to implement this plan task-by-task.
**Goal:** Implement the three items deferred from the WaitAsync spec (`docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md`): §3 `WaitForAsync`/`WaitResult` richer overload, §4.2 quality-gated ("Good"-only) matching, and §6 inbound/routed `Route.To(...).WaitForAttribute` variant.
**Architecture:** Builds on the shipped core (`b89d69a``04e97f4`). Two of the items (§3, §4.2) are site-local enrichments of the existing `Attributes` script surface + `InstanceActor` waiter; no new actor protocol shapes beyond an additive `RequireGoodQuality` field. The third (§6) mirrors the existing `Route.To(...).GetAttributes` cross-cluster path end-to-end (`RouteTarget``IInstanceRouter``CommunicationService``SiteCommunicationActor``DeploymentManagerActor``InstanceActor`), value-equality only across the wire, with the cluster Ask bounded by the *wait* timeout rather than the generic integration timeout.
**Tech Stack:** C#/.NET 10, Akka.NET 1.5, xUnit + Akka.TestKit + NSubstitute.
**Branch/worktree:** `waitfor-attr-helper` at `/Users/dohertj2/Desktop/ScadaBridge/.claude/worktrees/waitfor-attr-helper` (off local main; carries the core feature). Implementers do NOT create worktrees, commit **pathspec form** (`git commit -m "…" -- <paths>`), do NOT push, do NOT touch main. Targeted builds/tests per task; full-solution build only in WD-3.
---
## Naming / shared shapes
- New script return type `WaitResult` (Commons): `public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);`
- `WaitForAttributeRequest` gains a trailing additive field `bool RequireGoodQuality = false` (site-local request). `RequireGoodQuality` semantics: a match requires the value test to pass **and** `string.Equals(quality, "Good", StringComparison.Ordinal)`.
- Routed contract (value-equality only, no predicate, no quality flag across the wire — §6 says value-equality only): `RouteToWaitForAttributeRequest` / `RouteToWaitForAttributeResponse` (Commons `Messages/InboundApi`).
- The `WaitForAttributeResponse.Quality` field is already `string?` (null on timeout/error).
---
## Execution waves
- **Wave 1 (parallel, disjoint files):** WD-1 ∥ WD-2a. (2 concurrent committers; post-wave HEAD-presence check.)
- **Wave 2:** WD-2b (after WD-2a).
- **Wave 3:** WD-3 (after WD-1, WD-2a, WD-2b).
WD-1 must add `RequireGoodQuality` ONLY as a **trailing defaulted** ctor param of `WaitForAttributeRequest`, so WD-2b's `new WaitForAttributeRequest(...)` (built in wave 2) compiles regardless.
---
### Task WD-1: Site-local `WaitForAsync` + `WaitResult` + quality-gated mode (§3 + §4.2)
**Classification:** high-risk (modifies the `InstanceActor` single-threaded match evaluation + an additive message-contract field)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-2a
**Files:**
- Create: `src/ZB.MOM.WW.ScadaBridge.Commons/Types/WaitResult.cs`
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Instance/WaitForAttribute.cs` (add trailing `bool RequireGoodQuality = false` to `WaitForAttributeRequest`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/InstanceActor.cs` (thread `RequireGoodQuality` into `PendingWait` + both match sites)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (add `WaitAttributeFull` returning `WaitResult`; add `requireGoodQuality` param)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScopeAccessors.cs` (add `WaitForAsync` overloads + `requireGoodQuality` optional param on `WaitAsync`)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/InstanceActorWaitForAttributeTests.cs` + `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Scripts/ScopeAccessorTests.cs`
**Steps (TDD):**
1. **`WaitResult`** — add the readonly record struct above.
2. **`WaitForAttributeRequest`** — add trailing `bool RequireGoodQuality = false`. Keep the `Func<>` predicate field as-is. Update the XML-doc.
3. **`InstanceActor`** — add `bool RequireGoodQuality` to the `PendingWait` record. At BOTH match sites build the effective match as:
```csharp
// fast-path (HandleWaitForAttribute): quality from _attributeQualities.GetValueOrDefault(name, <existing default>)
// resolve loop (ResolveMatchedWaiters): quality from changed.Quality
bool QualityOk(string? q) => !requireGoodQuality || string.Equals(q, "Good", StringComparison.Ordinal);
bool matched = QualityOk(quality) && test(value); // keep test() inside its existing try/catch
```
Store `RequireGoodQuality` on the `PendingWait` so the resolve loop knows it. Keep the throwing-predicate guard (the `QualityOk && test` must still be inside the existing try/catch). The fast-path quality-fail when `requireGoodQuality` is just a non-match → register + schedule timeout as normal (do NOT fast-reply matched).
4. **`ScriptRuntimeContext`** — refactor: a private `Task<WaitForAttributeResponse> WaitInternal(name, encoded, predicate, timeout, requireGoodQuality)` that does the token-bounded `Ask` (keep the existing `AskTimeoutException → ...` handling; on AskTimeout return a synthetic `WaitForAttributeResponse(.., Matched:false, TimedOut:true)`). Then:
```csharp
public async Task<bool> WaitAttribute(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
=> (await WaitInternal(name, enc, pred, t, requireGoodQuality)).Matched;
public async Task<WaitResult> WaitAttributeFull(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
{ var r = await WaitInternal(...); return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut); }
```
(Note: `WaitAttribute`'s existing `AskTimeoutException → return false` must be preserved — fold it into `WaitInternal` returning a non-matched/timed-out response, OR catch in both. Do NOT catch `OperationCanceledException`/`TaskCanceledException`.)
5. **`AttributeAccessor`** — add `requireGoodQuality` optional param to both existing `WaitAsync` overloads, and add two `WaitForAsync` overloads:
```csharp
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
public Task<WaitResult> WaitForAsync(string key, Func<object?,bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
```
XML-doc: `requireGoodQuality:true` ignores Bad/Uncertain-quality transients.
6. **Tests** (extend existing files): (a) `WaitForAsync` returns a populated `WaitResult` on match (Value+Quality) and on timeout (`Matched:false, TimedOut:true`). (b) quality-gated: a value reaching target at **Bad** quality does NOT match when `requireGoodQuality:true` (stays pending → times out), but DOES match when `false`; and matches when it reaches target at Good quality. Cover both fast-path (already-at-target-but-Bad) and change-match. (c) scope resolution still applied for `WaitForAsync`.
7. Build `Commons` + `SiteRuntime` + the SiteRuntime test project; run `--filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync"` and the `~InstanceActor|~ScopeAccessor` regression filter. All green.
8. Commit (pathspec).
---
### Task WD-2a: Routed contract + central path (§6, part 1)
**Classification:** high-risk (cross-cluster message contract + `IInstanceRouter` surface)
**Estimated implement time:** ~5 min
**Parallelizable with:** WD-1
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs` (add the two records)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/IInstanceRouter.cs` (add method)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/CommunicationServiceInstanceRouter.cs` (delegate)
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs` (`RouteTarget.WaitForAttribute`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/CommunicationService.cs` (`RouteToWaitForAttributeAsync` — **wait-timeout-aware** Ask)
- Modify (compile-break fixes — interface gained a member): `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Integration/ParentExecutionIdCorrelationTests.cs` (`BridgingInstanceRouter`) and the inline `IInstanceRouter` double in `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointContentTypeTests.cs`
- Test: `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/RouteHelperTests.cs`
**Steps (TDD):**
1. **Commons records** (mirror `RouteToGetAttributes*`, value-equality only):
```csharp
public record RouteToWaitForAttributeRequest(
string CorrelationId, string InstanceUniqueName, string AttributeName,
string? TargetValueEncoded, TimeSpan Timeout, DateTimeOffset Timestamp,
Guid? ParentExecutionId = null);
public record RouteToWaitForAttributeResponse(
string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
```
(`Success`/`ErrorMessage` = routing-level outcome, e.g. instance-not-found; `Matched`/`TimedOut`/`Value`/`Quality` = wait outcome.)
2. **`IInstanceRouter`** — add `Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);`. **Update all 3 implementers** (prod `CommunicationServiceInstanceRouter` + the 2 test doubles listed above; the test doubles can return a canned response / throw NotImplemented only if never exercised — prefer a sane canned response).
3. **`CommunicationServiceInstanceRouter`** — delegate to `_communicationService.RouteToWaitForAttributeAsync(...)`.
4. **`RouteHelper.RouteTarget`** — add (mirror `GetAttributes`, throw on `!Success`):
```csharp
public async Task<bool> WaitForAttribute(string attributeName, object? targetValue, TimeSpan timeout, CancellationToken cancellationToken = default)
{
var token = Effective(cancellationToken);
var siteId = await ResolveSiteAsync(token);
var request = new RouteToWaitForAttributeRequest(Guid.NewGuid().ToString(), _instanceCode,
attributeName, AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow, _parentExecutionId);
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
if (!response.Success) throw new InvalidOperationException(response.ErrorMessage ?? "Remote attribute wait failed");
return response.Matched;
}
```
(`AttributeValueCodec` is in Commons.Types — add the using if needed.)
5. **`CommunicationService.RouteToWaitForAttributeAsync`** — mirror `RouteToGetAttributesAsync` BUT bound the Ask by the wait timeout, not the generic integration timeout:
```csharp
var envelope = new SiteEnvelope(siteId, request);
var askTimeout = request.Timeout + _options.IntegrationTimeout; // slack beyond the wait
return await GetActor().Ask<RouteToWaitForAttributeResponse>(envelope, askTimeout, cancellationToken);
```
6. **Test** (`RouteHelperTests`): with a substitute `IInstanceRouter` returning a canned `RouteToWaitForAttributeResponse(Matched:true,...)`, `Route.To("x").WaitForAttribute("Flag", true, 30s)` returns true; `Success:false` → throws `InvalidOperationException`; the encoded target equals `AttributeValueCodec.Encode(true)`.
7. Build `Commons` + `InboundAPI` + `Communication` + the two affected test projects; run `--filter "FullyQualifiedName~RouteHelper"` + a build of AuditLog.Tests/InboundAPI.Tests to confirm the interface-addition compiles. Commit (pathspec).
---
### Task WD-2b: Site unpacking + handler (§6, part 2)
**Classification:** high-risk (actor handler crossing into `InstanceActor`; Ask-timeout correctness)
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-2a
**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/Actors/SiteCommunicationActor.cs` (add `Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));` next to the other RouteTo forwards ~line 145)
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (`Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);` + handler)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/DeploymentManagerActorTests.cs`
**Steps (TDD):**
1. **`SiteCommunicationActor`** — add the `Receive`/Forward line.
2. **`DeploymentManagerActor.RouteInboundApiWaitForAttribute`** — mirror `RouteInboundApiGetAttributes`:
```csharp
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
{
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
{
Sender.Tell(new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false,
false, $"Instance '{request.InstanceUniqueName}' not found on this site.", DateTimeOffset.UtcNow));
return;
}
var sender = Sender;
var inner = new WaitForAttributeRequest(request.CorrelationId, request.InstanceUniqueName,
request.AttributeName, request.TargetValueEncoded, null /*predicate*/, request.Timeout,
DateTimeOffset.UtcNow /*, RequireGoodQuality defaults false */);
// Ask bounded by the WAIT timeout + slack (NOT a fixed 30s).
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
.ContinueWith(t => t.IsCompletedSuccessfully
? new RouteToWaitForAttributeResponse(request.CorrelationId, t.Result.Matched, t.Result.Value,
t.Result.Quality, t.Result.TimedOut, true, null, DateTimeOffset.UtcNow)
: new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false, false,
t.Exception?.GetBaseException().Message ?? "Attribute wait timed out", DateTimeOffset.UtcNow))
.PipeTo(sender);
}
```
(`WaitForAttributeRequest` lives in Commons `Messages/Instance` — add the using. Build with both the trailing-`RequireGoodQuality` and pre-field signatures in mind; passing 7 positional args + default is fine.)
3. **Test** (`DeploymentManagerActorTests`, mirror the routed get-attributes test): deploy/register an instance whose attribute already equals the target → `RouteToWaitForAttributeRequest` → `RouteToWaitForAttributeResponse(Success:true, Matched:true)`; unknown instance → `Success:false`.
4. Build `Communication` + `SiteRuntime` + SiteRuntime test project; run `--filter "FullyQualifiedName~DeploymentManagerActor"`. Commit (pathspec).
---
### Task WD-3: Integration — docs + full verification
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** none
**blockedBy:** WD-1, WD-2a, WD-2b
**Files:**
- Modify: `docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md` (mark §3 `WaitForAsync`/`WaitResult`, §4.2 quality-gated mode, and §6 routed variant as IMPLEMENTED; note Test-Run sandbox parity excluded)
- Modify: `docs/requirements/Component-SiteRuntime.md` (script-surface note: `Attributes.WaitForAsync` + `requireGoodQuality`) and `docs/requirements/Component-InboundAPI.md` (`Route.To(...).WaitForAttribute`) — brief, only if those docs enumerate the script surface
- (No new component, no migration, no docker config change)
**Steps:**
1. Update the spec doc + component docs as above.
2. **Full-solution build:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` — 0 errors.
3. **Targeted test sweep** across everything touched:
`dotnet test tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/... --filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync|FullyQualifiedName~DeploymentManagerActor"`,
`dotnet test tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/... --filter "FullyQualifiedName~RouteHelper"`,
and a build of `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` + `tests/ZB.MOM.WW.ScadaBridge.Communication.Tests` to confirm no compile/regression from the interface addition.
4. `git diff` review; commit (pathspec).
---
## Out of scope (explicit)
- Routed `WaitForAttribute` is NOT wired into the CentralUI Test-Run sandbox (`ISandboxInstanceGateway`/`SandboxInstanceGateway`); production inbound scripts get it. Follow-up if Test-Run parity is wanted.
- No predicate or quality flag across the wire (§6 is value-equality only, per spec).
- No docker redeploy (no cluster-runtime config change; additive script surface only).
@@ -0,0 +1,10 @@
{
"planPath": "docs/plans/2026-06-17-waitfor-deferred-items.md",
"tasks": [
{"id": 1, "subject": "WD-1: site-local WaitForAsync + WaitResult + quality-gated mode (§3+§4.2)", "classification": "high-risk", "status": "pending", "parallelizableWith": [2]},
{"id": 2, "subject": "WD-2a: routed contract + central path (§6 part 1)", "classification": "high-risk", "status": "pending", "parallelizableWith": [1]},
{"id": 3, "subject": "WD-2b: site unpacking + DeploymentManager handler (§6 part 2)", "classification": "high-risk", "status": "pending", "blockedBy": [2]},
{"id": 4, "subject": "WD-3: integration — docs + full verification", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]}
],
"lastUpdated": "2026-06-17"
}
+165 -32
View File
@@ -158,16 +158,32 @@ is per-run and flat — `WHERE ExecutionId = X` returns everything one run did,
nothing links a run to the run that *spawned* it. `ParentExecutionId` carries the
spawning execution's `ExecutionId`: a spawned run still gets its own fresh
`ExecutionId`, and every audit row it emits also carries the spawner's id in
`ParentExecutionId`. The first cut bridges the **inbound API → routed-site-script**
case: an inbound request runs a method script that calls `Route.Call`, routing to
a site instance; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
itself is top-level (`ParentExecutionId` NULL). The pointer always references the
*immediate* spawner, so a routed run that itself routes onward threads its own
`ExecutionId` — walking `ParentExecutionId → ExecutionId` recursively
reconstructs the call chain as a tree of arbitrary depth. The tag-cascade case
(an attribute write triggering another script) is **deferred** — the model
generalises to it with no schema change once that spawn point is threaded.
`ParentExecutionId`. The pointer always references the *immediate* spawner, so a
run that itself spawns further runs threads its own `ExecutionId` — walking
`ParentExecutionId → ExecutionId` recursively reconstructs the call chain as a
tree of arbitrary depth.
**Tag-cascade coverage (M5.4 T4):** `ParentExecutionId` threading now spans all
known spawn points:
- **Inbound API → routed site script** — an inbound request runs a method script
that calls `Route.Call`; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
is top-level (`ParentExecutionId` NULL).
- **Alarm-triggered on-trigger script** — when an alarm fires and its on-trigger
script runs (via `AlarmActor → AlarmExecutionActor`), the alarm context's
`ExecutionId` is carried as the run's `ParentExecutionId`. Currently the alarm
subsystem has no Guid-typed firing id so on-trigger runs are roots (NULL) in
practice, but the wiring is in place for a future alarm `ExecutionId`.
- **Nested `CallScript` / `CallShared` invocations** — when a script calls
`Instance.CallScript(...)` or a shared script via `CallShared`, the calling
execution's `ExecutionId` threads into the spawned run as its
`ParentExecutionId`, making deeply nested call chains visible as a tree.
Attribute-write-triggered cascades (one tag change triggering another script via a
tag subscription) are also wired: trigger-driven runs carry `ParentExecutionId =
NULL` (top-level roots), and any nested `CallScript`/`CallShared` they perform
chains as above. The schema is unchanged — no further tag-cascade work is deferred.
## The Site-Local `AuditLog` (SQLite)
@@ -268,7 +284,34 @@ operational `SiteCalls` shape for the dispatcher and UI.
- **Default cap** — 8 KB for each of `RequestSummary` and `ResponseSummary`;
raised to 64 KB on any error row (`Status IN ('Failed', 'Parked', 'Discarded')`).
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and `ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB (configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min 8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to other channels do not apply here. `PayloadTruncated = 1` is set only when the inbound ceiling is hit — verbatim capture is the normal case. The ceiling applies independently to each body. Header redaction and per-target body redactors still run before persistence.
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and
`ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB
(configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min
8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to
other channels do not apply here. `PayloadTruncated = 1` is set only when the
inbound ceiling is hit — verbatim capture is the normal case. The ceiling
applies independently to each body. Header redaction and per-target body
redactors still run before persistence.
- **Inbound ceiling hits (M5.3 T7).** Every time the `InboundMaxBytes` ceiling
truncates a body an `IAuditInboundCeilingHitsCounter.Increment()` call fires.
This counter is surfaced as `AuditInboundCeilingHits` on the central health
snapshot (alongside `CentralAuditWriteFailures` / `AuditRedactionFailure`) so
operators can detect persistently oversized payloads and raise the ceiling or
add per-target body redactors.
- **Request headers in `Extra` (M5.3 T7).** For `Channel = ApiInbound`, the
`AuditWriteMiddleware` captures the inbound HTTP request headers (post-redaction
`Authorization`, `X-API-Key`, `Cookie`, `Set-Cookie`, and the configured
`HeaderRedactList` are scrubbed before serialization) into the `Extra` JSON
column under the key `"requestHeaders"`. This makes the full header envelope
visible in the Audit Log UI's detail drawer and the CLI's `audit query` output
without widening the schema.
- **Per-method `SkipBodyCapture` (M5.3 T7).** `PerTargetOverrides` now includes
a `SkipBodyCapture: true` flag. When set for an inbound API method, the audit
row is always emitted (headers, status, duration, actor, etc. are recorded) but
`RequestSummary` and `ResponseSummary` are left null. Use this for methods whose
payloads are structurally large or contain secrets not covered by body redactors.
Headers are still captured into `Extra.requestHeaders` (after redaction) even
when `SkipBodyCapture` is true.
- **Truncation** — UTF-8 byte-safe; `PayloadTruncated = 1` when applied. Full
bodies are never stored.
- **HTTP headers** — `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`, and
@@ -311,16 +354,33 @@ MS SQL for direct-write events). Unredacted secrets never persist.
## Retention & Purge
- **Central:** 365-day default based on `OccurredAtUtc`, configurable via
`AuditLog:RetentionDays` (min 7, max 3650). Single global retention in v1 —
no per-channel overrides.
`AuditLog:RetentionDays` (min 30, max 3650).
- **Partitioning:** monthly partitions on `OccurredAtUtc` from day one
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). Purge is a partition switch;
there are no row-level deletes at central.
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). The global partition switch is
channel-blind; it drops a whole month once every row in it is older than the
global window. There are no row-level deletes at central for the global purge.
- **Purge actor:** `AuditLogPurgeActor` singleton on the active central node
runs daily, switches out any partition whose latest `OccurredAtUtc` is older
than the retention window, and emits an `AuditLog:Purged` event (partition
range, rowcount, duration). A partition-maintenance step rolls forward each
month, creating the next month's partition ahead of time.
than the retention window, then applies any per-channel overrides (see below),
and emits an `AuditLog:Purged` event (partition range, rowcount, duration) per
switched partition. A partition-maintenance step rolls forward each month,
creating the next month's partition ahead of time.
- **Per-channel retention overrides (M5.5 T3):** `AuditLog:PerChannelRetentionDays`
is a dictionary keyed by canonical channel name (`ApiOutbound`, `DbOutbound`,
`Notification`, `ApiInbound`) whose value is a retention window in days that
MUST be strictly shorter than the global `RetentionDays`. After the daily
partition switch-out, the purge actor runs a bounded, batched row DELETE
(`PurgeChannelOlderThanAsync`) for each channel whose override is shorter than
the global window — expiring rows of that channel earlier than the global
partition switch would. Overrides equal to or longer than the global window are
silently skipped (the global switch already covers them). The DELETE runs under
`scadabridge_audit_purger` (the maintenance role); the append-only writer role
is unaffected. Batch size is configurable via
`AuditLogPurge:ChannelPurgeBatchSize` (default 5000). Each channel override
runs in its own try/catch, mirroring the per-boundary error-isolation of the
partition switch-out loop. Values are validated to be in
`[30, RetentionDays]`; keys that are not a recognized `AuditChannel` enum name
are rejected at startup.
- **Sites:** daily site job; default 7-day retention (configurable, min 1,
max 90). Respects the hard `ForwardState` invariant — `Pending` rows are
never purged on age alone.
@@ -340,10 +400,13 @@ MS SQL for direct-write events). Unredacted secrets never persist.
**AuditExport** permission.
- **Payload redaction at write.** See Payload Capture Policy. Unredacted
secrets never persist; the safety net over-redacts on misconfiguration.
- **Hash-chain tamper evidence — deferred to v1.x.** A future `RowHash` column,
computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will be
verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. Off by
default in v1.
- **Hash-chain tamper evidence (T1) — deferred to v1.x.** A future `RowHash`
column, computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will
be verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. The
`verify-chain` CLI command is a no-op placeholder today. Off by default in v1.
- **Parquet archival (T2) — deferred to v1.x.** Long-term cold storage of purged
monthly partitions as Parquet files (suitable for offline analytics) will be
added in a future milestone. T1 and T2 are not shipped as part of M5.
- **Site SQLite security.** File permissions: read/write by the ScadaBridge
service account only. Not backed up off-machine — site SQLite is a buffer,
not a record.
@@ -355,11 +418,22 @@ Point-in-time, computed from the central `AuditLog` table; global and per-site.
- **Audit volume** — events/min landing in the central `AuditLog`; global plus per-site sparkline.
- **Audit error rate** — % of central `AuditLog` rows with `Status IN ('Failed', 'Parked', 'Discarded')` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, permanent failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — sum of `Pending` site rows across sites; click drills into a per-site breakdown.
- **`AuditInboundCeilingHits`** (M5.3 T7) — rolling count of inbound API responses truncated by the `InboundMaxBytes` ceiling; surfaced on the central health snapshot alongside `CentralAuditWriteFailures`.
**Per-node stuck KPIs (M5.3 T6):** Both [Notification Outbox](Component-NotificationOutbox.md)
and [Site Call Audit](Component-SiteCallAudit.md) now expose a
`PerNodeNotificationKpiRequest` / `PerNodeSiteCallKpiRequest` message pair that
groups the existing stuck, parked, and delivered-last-interval counts by the
`SourceNode` that emitted the original row. This surfaces per-node breakdowns on
the Health dashboard tiles and the Notification Outbox / Site Calls pages,
making it possible to identify a single misbehaving node (e.g., `site-a:node-b`)
as the source of a spike rather than a site-wide problem. The existing global and
per-site KPI shapes are unchanged; the per-node slice is additive.
[Notification Outbox](Component-NotificationOutbox.md) and
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected they remain
sourced from `Notifications` and `SiteCalls` respectively. Audit Log KPIs
describe the audit table itself.
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected for their
operational dispatch responsibilities — they remain sourced from `Notifications`
and `SiteCalls` respectively. Audit Log KPIs describe the audit table itself.
## Configuration
@@ -370,21 +444,78 @@ component (Options pattern):
"AuditLog": {
"DefaultCapBytes": 8192,
"ErrorCapBytes": 65536,
"InboundMaxBytes": 1048576,
"HeaderRedactList": [ "Authorization", "Cookie", "Set-Cookie", "X-API-Key" ],
"GlobalBodyRedactors": [
{ "Pattern": "\"password\"\\s*:\\s*\"[^\"]+\"", "Replacement": "\"password\":\"<redacted>\"" }
],
"PerTargetOverrides": {
"Weather/GetForecast": { "CapBytes": 4096 },
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" }
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" },
"HighVolumeMethod": { "SkipBodyCapture": true }
},
"RetentionDays": 365
"RetentionDays": 365,
"PerChannelRetentionDays": {
"ApiOutbound": 90,
"Notification": 180
}
}
```
`PerTargetOverrides` keys bind by External System / Inbound Method /
Notification List / Database Connection name. `RetentionDays` is a single
global value in v1; per-channel overrides are deferred to v1.x.
Notification List / Database Connection name. `SkipBodyCapture: true` omits
`RequestSummary`/`ResponseSummary` for that method while still capturing headers
into `Extra.requestHeaders` and emitting the full audit row. `RetentionDays` is
the global window; `PerChannelRetentionDays` specifies per-channel windows that
are strictly shorter — any channel whose override equals or exceeds the global
value is silently ignored (the global partition switch-out already governs it).
`AuditLogPurge` section controls the purge actor cadence and batch size:
```jsonc
"AuditLogPurge": {
"IntervalHours": 24,
"ChannelPurgeBatchSize": 5000
}
```
## Ops Notes — Historical Null Columns
### `SourceNode` backfill (M5.6 T5)
`SourceNode` (`varchar(64)` NULL) is a physical column stamped on every row at
write time. Rows ingested before M5.6 shipped have `SourceNode IS NULL` because
the value was not populated until the feature landed. A one-time CLI command sets
these to a configurable sentinel:
```
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel unknown] [--batch 5000]
```
The default sentinel is `"unknown"`. The true node-of-origin for pre-feature rows
is **unknowable** retroactively — the emitting node is long gone from the telemetry
pipeline. The sentinel makes that explicit rather than leaving the column NULL
(which the Audit Log UI's Node filter already treats as "unresolved", but which
an operator might mistake for a data-quality bug).
The backfill runs via `POST /api/audit/backfill-source-node` (Admin role required)
on the maintenance/purge path, NOT the append-only `scadabridge_audit_writer` role.
It is idempotent and can be re-run safely.
### `ExecutionId` and `ParentExecutionId` — cannot be backfilled
`ExecutionId` and `ParentExecutionId` are **PERSISTED COMPUTED columns** derived
from `DetailsJson`. They were introduced in the same feature window as the column
itself but their value comes from the JSON payload that was written at ingest time.
The AuditLog append-only invariant **forbids mutating `DetailsJson`** — rows may
only be inserted, never updated. Because backfilling the computed values would
require rewriting the underlying `DetailsJson`, it is impossible under the
append-only contract. Pre-feature rows carry `NULL` in both columns permanently.
This is a documented limitation, not a defect. The NULL values are visible in the
Audit Log UI's execution-tree drilldown (rows with no `ExecutionId` appear as
orphaned entries) and in the CLI's `audit tree` output.
## Dependencies
@@ -442,6 +573,8 @@ global value in v1; per-channel overrides are deferred to v1.x.
tiles (Volume, Error rate, Backlog) plus new health metrics:
`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`,
`CentralAuditWriteFailures`, `AuditRedactionFailure`.
- **[CLI (#19)](Component-CLI.md)** — new `scadabridge audit query`,
`scadabridge audit export`, and `scadabridge audit verify-chain` commands; same
permission requirements as the UI.
- **[CLI (#19)](Component-CLI.md)** — `scadabridge audit query`,
`scadabridge audit export`, `scadabridge audit tree --execution-id <guid>`,
`scadabridge audit backfill-source-node --sentinel <s> --before <date>`, and
`scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain
feature); same permission requirements as the UI.
+20 -5
View File
@@ -228,14 +228,17 @@ The new centralized Audit Log component (#23) is exposed via the `scadabridge au
The `scadabridge audit` group targets the centralized Audit Log component (#23) and
exposes the UI-equivalent operational audit surface. Permissions follow the same
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
require the `OperationalAudit` permission; `audit export` additionally requires
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
exit code 2) on denial.
Tamper-Evidence, and Security & Auth #10): `audit query`, `audit tree`, and
`audit verify-chain` require the `OperationalAudit` permission; `audit export`
additionally requires `AuditExport`; `audit backfill-source-node` requires the
`Admin` role (maintenance path only). The server enforces permission checks and
returns HTTP 403 (CLI exit code 2) on denial.
```
scadabridge audit query [--since <t>] [--until <t>] [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>] [--correlation-id <id>] [--execution-id <id>] [--parent-execution-id <id>] [--errors-only] [--page-size <n>] [--all]
scadabridge audit export --since <t> --until <t> --format csv|jsonl|parquet --output <path> [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>]
scadabridge audit tree --execution-id <guid> [--format table|json]
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
scadabridge audit verify-chain --month <YYYY-MM>
```
@@ -247,6 +250,18 @@ scadabridge audit verify-chain --month <YYYY-MM>
requested format (`csv`, `jsonl`, `parquet`) written to `--output`. The server
streams rows rather than materializing them in memory; the CLI writes bytes
through to disk. Supports the same scoping filters as `audit query`.
- `audit tree --execution-id <guid>` (M5.3 T8) — renders the full execution-chain
tree for the given `ExecutionId`. The server resolves the root from any node in
the chain (walks `ParentExecutionId` to find the root, then traverses downward)
and returns all reachable executions with their summary row counts and first/last
occurred timestamps. Output format: `json` (default — structured tree suitable
for scripting) or `table` (human-readable indented tree). Requires
`OperationalAudit` permission. Backed by `GET /api/audit/tree?executionId=<guid>`.
- `audit backfill-source-node --before <ISO-8601-UTC>` (M5.6 T5) — sets
`SourceNode` to a sentinel value (`--sentinel`, default `"unknown"`) on pre-feature
rows where `SourceNode IS NULL` and `OccurredAtUtc < --before`, in batches
(`--batch`, default 5000). Admin-only maintenance command. Idempotent.
Backed by `POST /api/audit/backfill-source-node`.
- `audit verify-chain` — hash-chain verification for the named month.
**No-op in v1**: the command is defined so the command tree is stable, but
verification only becomes meaningful once the hash-chain ships (see
@@ -366,7 +381,7 @@ Configuration is resolved in the following priority order (highest wins):
- **System.CommandLine**: Command-line argument parsing.
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadabridge audit` command group rides a parallel REST surface on the same Host (`GET /api/audit/query` and `GET /api/audit/export`), sharing HTTP Basic Auth with `/management` but bypassing the actor for read-only, keyset-paged / streaming workloads.
- **Audit Log (#23)**: The `scadabridge audit query` and `audit export` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) on the Host's Management API surface; `audit verify-chain` rides `POST /management` until hash-chain verification ships. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side by `AuditEndpoints`.
- **Audit Log (#23)**: The `scadabridge audit query`, `audit export`, `audit tree`, and `audit backfill-source-node` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`, `GET /api/audit/tree`, `POST /api/audit/backfill-source-node`) on the Host's Management API surface; `audit verify-chain` is a client-side no-op today (hash-chain deferred to v1.x). Permission checks (`OperationalAudit`, `AuditExport`, `Admin`) are enforced server-side by `AuditEndpoints`.
## Interactions
@@ -189,6 +189,7 @@ Inbound API scripts **cannot** call shared scripts directly — shared scripts a
- `Route.To("instanceUniqueCode").GetAttributes("attr1", "attr2", ...)` — Read multiple attribute values in a **single call**, returned as a dictionary of name-value pairs.
- `Route.To("instanceUniqueCode").SetAttribute("attributeName", value)` — Write a single attribute value on a specific instance at any site.
- `Route.To("instanceUniqueCode").SetAttributes(dictionary)` — Write multiple attribute values in a **single call**, accepting a dictionary of name-value pairs.
- `Route.To("instanceUniqueCode").WaitForAttribute("attributeName", targetValue, timeout)` — Wait, event-driven, until an attribute on a specific instance at any site reaches `targetValue` (value-equality only across the wire), bounded by `timeout`. Returns `true` if matched within the timeout, `false` if it timed out. The cluster call is bounded by the wait timeout rather than the generic integration timeout.
#### Input/Output
- **Input parameters** are available as defined in the method definition.
@@ -39,10 +39,12 @@ namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
public sealed class AuditCentralHealthSnapshot
: IAuditCentralHealthSnapshot,
ICentralAuditWriteFailureCounter,
IAuditRedactionFailureCounter
IAuditRedactionFailureCounter,
IAuditInboundCeilingHitsCounter
{
private int _centralAuditWriteFailures;
private int _auditRedactionFailure;
private int _auditInboundCeilingHits;
private readonly ConcurrentDictionary<string, bool> _stalled = new();
/// <inheritdoc/>
@@ -53,6 +55,10 @@ public sealed class AuditCentralHealthSnapshot
public int AuditRedactionFailure =>
Interlocked.CompareExchange(ref _auditRedactionFailure, 0, 0);
/// <inheritdoc/>
public int AuditInboundCeilingHits =>
Interlocked.CompareExchange(ref _auditInboundCeilingHits, 0, 0);
/// <inheritdoc/>
public IReadOnlyDictionary<string, bool> SiteAuditTelemetryStalled =>
new Dictionary<string, bool>(_stalled);
@@ -78,4 +84,8 @@ public sealed class AuditCentralHealthSnapshot
/// <inheritdoc/>
void IAuditRedactionFailureCounter.Increment() =>
Interlocked.Increment(ref _auditRedactionFailure);
/// <inheritdoc/>
void IAuditInboundCeilingHitsCounter.Increment() =>
Interlocked.Increment(ref _auditInboundCeilingHits);
}
@@ -167,6 +167,9 @@ public class AuditLogPurgeActor : ReceiveActor
if (boundaries.Count == 0)
{
// No whole-month partitions are eligible, but per-channel overrides may
// still expire rows earlier than the global window — run them below.
await RunPerChannelOverridesAsync(repository).ConfigureAwait(false);
return;
}
@@ -202,6 +205,80 @@ public class AuditLogPurgeActor : ReceiveActor
sw.ElapsedMilliseconds);
}
}
// M5.5 (T3): after the channel-blind global partition switch-out, apply any
// per-channel retention overrides that are SHORTER than the global window via
// a bounded, batched row DELETE on the same maintenance path. The global
// switch-out has already dropped whole months older than RetentionDays; these
// deletes only ever expire rows EARLIER than that, so they run last and are a
// strict tightening.
await RunPerChannelOverridesAsync(repository).ConfigureAwait(false);
}
/// <summary>
/// M5.5 (T3): runs each per-channel retention override whose window is strictly
/// shorter than the global <see cref="AuditLogOptions.RetentionDays"/>, deleting
/// rows of that channel older than the channel-specific threshold via a bounded,
/// batched maintenance-path DELETE. Each channel runs inside its own try/catch so
/// one bad channel does not abandon the others on the same tick, mirroring the
/// per-boundary error isolation of the partition switch-out loop.
/// </summary>
/// <param name="repository">The repository resolved for this tick's DI scope.</param>
private async Task RunPerChannelOverridesAsync(IAuditLogRepository repository)
{
var overrides = _auditOptions.PerChannelRetentionDays;
if (overrides is null || overrides.Count == 0)
{
return;
}
var globalDays = _auditOptions.RetentionDays;
foreach (var (channel, days) in overrides)
{
// Only act when the per-channel window is strictly shorter than the global
// one. Equal/longer windows are already covered by the global partition
// switch-out, so a row DELETE would be redundant work (and a longer window
// is meaningless — the partition is dropped on the global schedule).
if (days >= globalDays)
{
continue;
}
var channelThreshold = DateTime.UtcNow - TimeSpan.FromDays(days);
var sw = Stopwatch.StartNew();
try
{
var rowsDeleted = await repository
.PurgeChannelOlderThanAsync(channel, channelThreshold, _purgeOptions.ChannelPurgeBatchSize)
.ConfigureAwait(false);
sw.Stop();
if (rowsDeleted > 0)
{
_logger.LogInformation(
"Purged {RowsDeleted} AuditLog rows for channel {Channel} older than {Threshold:o} " +
"(per-channel override {Days}d < global {GlobalDays}d) in {DurationMs} ms.",
rowsDeleted,
channel,
channelThreshold,
days,
globalDays,
sw.ElapsedMilliseconds);
}
}
catch (Exception ex)
{
sw.Stop();
_logger.LogError(
ex,
"Failed to apply per-channel retention override for channel {Channel} " +
"({Days}d); other channels continue. Elapsed {DurationMs} ms.",
channel,
days,
sw.ElapsedMilliseconds);
}
}
}
/// <summary>Self-tick triggering a purge pass across all eligible partitions.</summary>
@@ -28,6 +28,24 @@ public sealed class AuditLogPurgeOptions
/// <summary>Period of the purge tick in hours (default 24).</summary>
public int IntervalHours { get; set; } = 24;
/// <summary>
/// M5.5 (T3): batch size for the per-channel retention-override row DELETE
/// (<see cref="ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories.IAuditLogRepository.PurgeChannelOlderThanAsync"/>).
/// Each <c>DELETE TOP (@batch)</c> caps the transaction-log and lock footprint
/// per statement; the repository loops batches until no rows remain. Default
/// 5000 keeps individual deletes short on a busy central DB while still draining
/// a large backlog within a tick. Clamped to a sane minimum in
/// <see cref="ChannelPurgeBatchSize"/>.
/// </summary>
public int ChannelPurgeBatchSizeConfigured { get; set; } = 5000;
/// <summary>
/// Resolves the effective per-channel purge batch size, clamped to at least 1 so
/// a misconfigured <c>0</c>/negative value cannot make the repository's DELETE
/// loop spin or throw.
/// </summary>
public int ChannelPurgeBatchSize => ChannelPurgeBatchSizeConfigured < 1 ? 1 : ChannelPurgeBatchSizeConfigured;
/// <summary>
/// Test-only override for finer control over the tick cadence than
/// whole-hour resolution allows. When non-null, takes precedence over
@@ -50,6 +50,17 @@ public interface IAuditCentralHealthSnapshot
/// </summary>
int AuditRedactionFailure { get; }
/// <summary>
/// Count of inbound request/response body truncations at the
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.AuditLogOptions.InboundMaxBytes"/>
/// ceiling since process start. Incremented by
/// <see cref="ZB.MOM.WW.ScadaBridge.InboundAPI.Middleware.AuditWriteMiddleware"/>
/// whenever either the request or response body exceeds the cap and is
/// truncated in the audit copy. A sustained non-zero count can indicate
/// callers sending unexpectedly large bodies.
/// </summary>
int AuditInboundCeilingHits { get; }
/// <summary>
/// Per-site latched stalled state: <c>true</c> when the
/// <see cref="SiteAuditReconciliationActor"/> has observed two
@@ -0,0 +1,24 @@
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
/// <summary>
/// Audit Log (#23) M5.3 (T7) counter sink incremented by
/// <see cref="ZB.MOM.WW.ScadaBridge.InboundAPI.Middleware.AuditWriteMiddleware"/>
/// whenever an inbound request or response body is truncated at the
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.AuditLogOptions.InboundMaxBytes"/>
/// ceiling. Mirrors the <see cref="ICentralAuditWriteFailureCounter"/> shape:
/// one-method, NoOp default, must-never-abort-the-user-facing-action invariant.
/// </summary>
/// <remarks>
/// A ceiling hit is a normal operational event (the caller sent a large
/// body) rather than a failure, but surfacing a cumulative count lets
/// operators detect over-size callers early. The
/// <see cref="AuditCentralHealthSnapshot"/> production implementation
/// accumulates the count via an <c>Interlocked</c> field alongside
/// <see cref="ICentralAuditWriteFailureCounter"/> and
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Payload.IAuditRedactionFailureCounter"/>.
/// </remarks>
public interface IAuditInboundCeilingHitsCounter
{
/// <summary>Increment the inbound body-ceiling hit counter by one.</summary>
void Increment();
}
@@ -0,0 +1,13 @@
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
/// <summary>
/// Default <see cref="IAuditInboundCeilingHitsCounter"/> binding used when
/// the central health snapshot is not wired (e.g. site composition roots,
/// test harnesses that have no health dashboard). All increments are silently
/// dropped — correct for environments that have no audit KPI surface.
/// </summary>
public sealed class NoOpAuditInboundCeilingHitsCounter : IAuditInboundCeilingHitsCounter
{
/// <inheritdoc/>
public void Increment() { }
}
@@ -37,6 +37,33 @@ public sealed class AuditLogOptions
/// <summary>Central retention window in days (default 365, range [30, 3650]).</summary>
public int RetentionDays { get; set; } = 365;
/// <summary>
/// M5.5 (T3) per-channel retention overrides, keyed by the canonical channel name
/// (the <see cref="AuditChannel"/> enum name — e.g. <c>ApiOutbound</c>,
/// <c>DbOutbound</c>, <c>Notification</c>, <c>ApiInbound</c>). The value is a
/// retention window in days that MUST be SHORTER than or equal to the global
/// <see cref="RetentionDays"/>.
/// </summary>
/// <remarks>
/// <para>
/// The global <see cref="RetentionDays"/> window is enforced by month-partition
/// switch-out, which is channel-blind: it can only drop a whole month once every
/// row in it is older than the global window. A per-channel override therefore
/// can only ever expire rows EARLIER than the global purge would — never later
/// (a longer per-channel window is meaningless because the partition switch-out
/// would already have dropped the month). Overrides shorter than the global window
/// are honoured by the purge actor as a bounded, batched row DELETE on the
/// maintenance path (see <c>AuditLogPurgeActor</c>); the append-only writer/ingest
/// role is unaffected.
/// </para>
/// <para>
/// Each value is validated to be in <c>[30, RetentionDays]</c> by
/// <c>AuditLogOptionsValidator</c>; keys that are not recognized
/// <see cref="AuditChannel"/> names are rejected.
/// </para>
/// </remarks>
public Dictionary<string, int> PerChannelRetentionDays { get; set; } = new();
/// <summary>
/// Per-body byte ceiling applied to <see cref="AuditEvent.RequestSummary"/> and
/// <see cref="AuditEvent.ResponseSummary"/> for <see cref="AuditChannel.ApiInbound"/> rows
@@ -1,4 +1,5 @@
using ZB.MOM.WW.Configuration;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Configuration;
@@ -52,5 +53,27 @@ public sealed class AuditLogOptionsValidator : OptionsValidatorBase<AuditLogOpti
!(options.InboundMaxBytes < MinInboundMaxBytes || options.InboundMaxBytes > MaxInboundMaxBytes),
$"AuditLog:{nameof(AuditLogOptions.InboundMaxBytes)} ({options.InboundMaxBytes}) " +
$"must be in [{MinInboundMaxBytes}, {MaxInboundMaxBytes}] bytes.");
// M5.5 (T3): per-channel retention overrides. Each entry must be keyed by a
// recognized AuditChannel name and carry a window in [MinRetentionDays,
// RetentionDays] — i.e. SHORTER than or equal to the global window. A longer
// per-channel window is meaningless under month-partition switch-out (governed
// by the global window), so it is rejected rather than silently ignored.
foreach (var (channelKey, days) in options.PerChannelRetentionDays)
{
builder.RequireThat(
Enum.TryParse<AuditChannel>(channelKey, ignoreCase: false, out _),
$"AuditLog:{nameof(AuditLogOptions.PerChannelRetentionDays)} key '{channelKey}' " +
$"is not a recognized channel name. Valid keys: {string.Join(", ", Enum.GetNames<AuditChannel>())}.");
// Valid when days is within [MinRetentionDays, RetentionDays] inclusive.
// The lower bound matches the global RetentionDays floor; the upper bound
// is the configured global window (longer is meaningless — see remarks).
builder.RequireThat(
!(days < MinRetentionDays || days > options.RetentionDays),
$"AuditLog:{nameof(AuditLogOptions.PerChannelRetentionDays)}['{channelKey}'] ({days}) " +
$"must be in [{MinRetentionDays}, {nameof(AuditLogOptions.RetentionDays)}={options.RetentionDays}] days " +
"— a per-channel window must be shorter than or equal to the global retention window.");
}
}
}
@@ -25,4 +25,15 @@ public sealed class PerTargetRedactionOverride
/// rows.
/// </summary>
public string? RedactSqlParamsMatching { get; set; }
/// <summary>
/// When <c>true</c>, the inbound API audit row for this target records
/// request/response headers and metadata (status, duration, actor, etc.)
/// but the request and response body strings are omitted
/// (<c>RequestSummary</c> / <c>ResponseSummary</c> are left null). The
/// audit row itself is always emitted — only the body content is suppressed.
/// Null (the default, equivalent to <c>false</c>) means body capture
/// proceeds normally up to <see cref="AuditLogOptions.InboundMaxBytes"/>.
/// </summary>
public bool SkipBodyCapture { get; set; }
}
@@ -200,6 +200,13 @@ public static class ServiceCollectionExtensions
// surface on the central dashboard.
services.TryAddSingleton<ICentralAuditWriteFailureCounter, NoOpCentralAuditWriteFailureCounter>();
// M5.3 (T7): inbound body-ceiling hit counter — NoOp default for
// site/test roots. AddAuditLogCentralMaintenance replaces this binding
// with the AuditCentralHealthSnapshot implementation so ceiling-hit
// counts surface on the central dashboard alongside write-failure and
// redaction-failure counters.
services.TryAddSingleton<IAuditInboundCeilingHitsCounter, NoOpAuditInboundCeilingHitsCounter>();
// M4 Bundle B: central direct-write audit writer used by
// NotificationOutboxActor (Bundle B) and Inbound API (Bundle C/D) to
// emit AuditLog rows that originate ON central, not via site telemetry.
@@ -383,6 +390,12 @@ public static class ServiceCollectionExtensions
// HealthMetricsAuditRedactionFailureCounter shape one-for-one.
services.Replace(ServiceDescriptor.Singleton<IAuditRedactionFailureCounter,
CentralAuditRedactionFailureCounter>());
// M5.3 (T7): replace the NoOp IAuditInboundCeilingHitsCounter with the
// AuditCentralHealthSnapshot so ceiling-hit counts surface on the
// central dashboard. Same singleton-forward pattern as
// ICentralAuditWriteFailureCounter above.
services.Replace(ServiceDescriptor.Singleton<IAuditInboundCeilingHitsCounter>(
sp => sp.GetRequiredService<AuditCentralHealthSnapshot>()));
return services;
}
@@ -0,0 +1,113 @@
using System.Text;
using System.Text.Json;
namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
/// <summary>
/// Arguments for an <c>audit backfill-source-node</c> invocation.
/// </summary>
public sealed class AuditBackfillSourceNodeArgs
{
/// <summary>
/// Value written into <c>SourceNode</c> for NULL rows (default <c>"unknown"</c>).
/// </summary>
public string Sentinel { get; set; } = "unknown";
/// <summary>
/// Only rows with <c>OccurredAtUtc</c> strictly before this UTC datetime are
/// eligible. Required — must be an ISO-8601 UTC datetime.
/// </summary>
public string Before { get; set; } = string.Empty;
/// <summary>
/// Maximum rows updated per batch (default 5000). Caps the per-transaction
/// log footprint; the loop repeats until no rows remain.
/// </summary>
public int BatchSize { get; set; } = 5000;
}
/// <summary>
/// Pure helpers for the <c>audit backfill-source-node</c> subcommand (Audit Log
/// #23 M5.6 T5). Builds the request body, POSTs to
/// <c>/api/audit/backfill-source-node</c>, and renders the result. Kept separate
/// from the command wiring so each piece is unit-testable without standing up the
/// command tree.
/// </summary>
public static class AuditBackfillHelpers
{
private static readonly JsonSerializerOptions JsonWriteOptions = new()
{
WriteIndented = true,
};
/// <summary>
/// Builds the JSON request body for <c>POST /api/audit/backfill-source-node</c>.
/// </summary>
/// <param name="args">The backfill arguments.</param>
/// <returns>A JSON string suitable for the request body.</returns>
public static string BuildRequestBody(AuditBackfillSourceNodeArgs args)
{
var obj = new
{
sentinel = args.Sentinel,
before = args.Before,
batchSize = args.BatchSize,
};
return JsonSerializer.Serialize(obj);
}
/// <summary>
/// Executes the backfill: POSTs <c>/api/audit/backfill-source-node</c> and
/// prints the result. Returns the process exit code (0 = success,
/// 1 = error, 2 = authorization failure).
/// </summary>
/// <param name="client">The management HTTP client.</param>
/// <param name="args">The backfill arguments.</param>
/// <param name="output">The output writer for results.</param>
/// <returns>A task that resolves to the process exit code.</returns>
public static async Task<int> RunBackfillAsync(
ManagementHttpClient client,
AuditBackfillSourceNodeArgs args,
TextWriter output)
{
var body = BuildRequestBody(args);
var response = await client.SendPostAsync(
"api/audit/backfill-source-node", body, TimeSpan.FromMinutes(10));
if (response.JsonData == null)
{
OutputFormatter.WriteError(
response.Error ?? "Backfill request failed.", response.ErrorCode ?? "ERROR");
return CommandHelpers.IsAuthorizationFailure(response) ? 2 : 1;
}
// Parse and display the result.
try
{
using var doc = JsonDocument.Parse(response.JsonData);
var root = doc.RootElement;
var rowsUpdated = root.TryGetProperty("rowsUpdated", out var r)
? r.GetInt64()
: 0L;
var sentinel = root.TryGetProperty("sentinel", out var s)
? s.GetString() ?? args.Sentinel
: args.Sentinel;
var before = root.TryGetProperty("before", out var b)
? b.GetString() ?? args.Before
: args.Before;
output.WriteLine($"SourceNode backfill complete.");
output.WriteLine($" rows updated : {rowsUpdated}");
output.WriteLine($" sentinel : {sentinel}");
output.WriteLine($" before : {before}");
}
catch (JsonException)
{
// Server returned success but non-JSON body — not expected; print raw.
output.WriteLine(response.JsonData);
}
output.Flush();
return 0;
}
}
@@ -6,13 +6,15 @@ namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
/// <summary>
/// The <c>scadabridge audit</c> command group (Audit Log #23 M8). Provides read access to
/// the centralized append-only Audit Log via the Bundle B REST endpoints
/// (<c>GET /api/audit/query</c>, <c>GET /api/audit/export</c>), plus a v1 no-op
/// <c>verify-chain</c> placeholder for the deferred hash-chain tamper-evidence feature.
/// (<c>GET /api/audit/query</c>, <c>GET /api/audit/export</c>,
/// <c>GET /api/audit/tree</c>), plus a v1 no-op <c>verify-chain</c> placeholder
/// for the deferred hash-chain tamper-evidence feature.
/// </summary>
public static class AuditCommands
{
/// <summary>
/// Builds the <c>audit</c> command group with query, export, and verify-chain sub-commands.
/// Builds the <c>audit</c> command group with query, export, tree, and verify-chain
/// sub-commands.
/// </summary>
/// <param name="urlOption">Global <c>--url</c> option for the management API endpoint.</param>
/// <param name="formatOption">Global <c>--format</c> option for output format.</param>
@@ -25,7 +27,9 @@ public static class AuditCommands
command.Add(BuildQuery(urlOption, formatOption, usernameOption, passwordOption));
command.Add(BuildExport(urlOption, formatOption, usernameOption, passwordOption));
command.Add(BuildTree(urlOption, formatOption, usernameOption, passwordOption));
command.Add(BuildVerifyChain(urlOption, formatOption, usernameOption, passwordOption));
command.Add(BuildBackfillSourceNode(urlOption, formatOption, usernameOption, passwordOption));
return command;
}
@@ -224,6 +228,44 @@ public static class AuditCommands
return cmd;
}
private static Command BuildTree(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var executionIdOption = new Option<string>("--execution-id")
{
Description = "Execution ID (GUID) to look up — may be any node in the chain",
Required = true,
};
var cmd = new Command("tree") { Description = "Display the full execution-chain tree for an audit execution" };
cmd.Add(executionIdOption);
cmd.SetAction(async (ParseResult result) =>
{
var connection = AuditCommandHelpers.ResolveConnection(result, urlOption, usernameOption, passwordOption);
if (connection.Error != null)
{
OutputFormatter.WriteError(connection.Error, connection.ErrorCode!);
return 1;
}
var rawId = result.GetValue(executionIdOption);
if (!Guid.TryParse(rawId, out var executionId))
{
OutputFormatter.WriteError(
$"Invalid execution ID '{rawId}'. Expected a GUID (e.g. 11111111-1111-1111-1111-111111111111).",
"INVALID_ARGUMENT");
return 1;
}
var format = AuditCommandHelpers.ResolveFormat(result, formatOption);
using var client = new ManagementHttpClient(connection.Url!, connection.Username!, connection.Password!);
return await AuditTreeHelpers.RunTreeAsync(client, executionId, format, Console.Out);
});
return cmd;
}
private static Command BuildVerifyChain(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var monthOption = new Option<string>("--month") { Description = "Month to verify (YYYY-MM)", Required = true };
@@ -247,4 +289,76 @@ public static class AuditCommands
});
return cmd;
}
/// <summary>
/// Builds the <c>audit backfill-source-node</c> sub-command (Audit Log #23 M5.6 T5).
/// Sets <c>SourceNode</c> on historical pre-feature rows whose <c>SourceNode IS NULL</c>
/// and <c>OccurredAtUtc</c> is older than <c>--before</c>, in batches. Admin-only.
/// </summary>
private static Command BuildBackfillSourceNode(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
{
var sentinelOption = new Option<string>("--sentinel")
{
Description = "Value to write for pre-feature rows whose node-of-origin is unknown (default: unknown)",
};
sentinelOption.DefaultValueFactory = _ => "unknown";
var beforeOption = new Option<string>("--before")
{
Description = "ISO-8601 UTC datetime; only rows older than this date are eligible (required)",
Required = true,
};
var batchOption = new Option<int>("--batch")
{
Description = "Max rows updated per batch (default: 5000)",
};
batchOption.DefaultValueFactory = _ => 5000;
var cmd = new Command("backfill-source-node")
{
Description = "Set SourceNode to a sentinel value on pre-feature rows where it is NULL (admin-only, maintenance path)",
};
cmd.Add(sentinelOption);
cmd.Add(beforeOption);
cmd.Add(batchOption);
cmd.SetAction(async (ParseResult result) =>
{
var connection = AuditCommandHelpers.ResolveConnection(result, urlOption, usernameOption, passwordOption);
if (connection.Error != null)
{
OutputFormatter.WriteError(connection.Error, connection.ErrorCode!);
return 1;
}
var sentinel = result.GetValue(sentinelOption) ?? "unknown";
var before = result.GetValue(beforeOption)!;
var batch = result.GetValue(batchOption);
if (string.IsNullOrWhiteSpace(sentinel))
{
OutputFormatter.WriteError("--sentinel must be a non-empty string.", "INVALID_ARGUMENT");
return 1;
}
if (batch <= 0)
{
OutputFormatter.WriteError("--batch must be > 0.", "INVALID_ARGUMENT");
return 1;
}
var args = new AuditBackfillSourceNodeArgs
{
Sentinel = sentinel,
Before = before,
BatchSize = batch,
};
using var client = new ManagementHttpClient(connection.Url!, connection.Username!, connection.Password!);
return await AuditBackfillHelpers.RunBackfillAsync(client, args, Console.Out);
});
return cmd;
}
}
@@ -0,0 +1,208 @@
using System.Text;
using System.Text.Json;
namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
/// <summary>
/// Arguments for an <c>audit tree</c> invocation.
/// </summary>
public sealed class AuditTreeArgs
{
/// <summary>
/// The execution ID (GUID) to look up. May be any node in the chain — the
/// server walks to the root and returns the full tree.
/// </summary>
public string ExecutionId { get; set; } = string.Empty;
}
/// <summary>
/// Represents one execution node as returned by <c>GET /api/audit/tree</c>.
/// Property names match the server's camelCase JSON serialisation of
/// <c>ExecutionTreeNode</c>.
/// </summary>
internal sealed class AuditTreeNodeDto
{
public Guid ExecutionId { get; init; }
public Guid? ParentExecutionId { get; init; }
public int RowCount { get; init; }
public string[] Channels { get; init; } = Array.Empty<string>();
public string[] Statuses { get; init; } = Array.Empty<string>();
public string? SourceSiteId { get; init; }
public string? SourceInstanceId { get; init; }
public DateTime? FirstOccurredAtUtc { get; init; }
public DateTime? LastOccurredAtUtc { get; init; }
}
/// <summary>
/// Pure helpers for the <c>audit tree</c> subcommand: builds the query string,
/// calls <c>GET /api/audit/tree</c>, and renders the result as either an
/// indented ASCII tree (table format) or raw JSON. Kept separate from the
/// command wiring so each piece is unit-testable without standing up the
/// command tree.
/// </summary>
public static class AuditTreeHelpers
{
private static readonly JsonSerializerOptions JsonReadOptions = new()
{
PropertyNameCaseInsensitive = true,
};
private static readonly JsonSerializerOptions JsonWriteOptions = new()
{
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
WriteIndented = true,
};
/// <summary>
/// Builds the query string for <c>GET /api/audit/tree</c>.
/// </summary>
/// <param name="executionId">The execution ID GUID.</param>
/// <returns>A relative path + query string ready to append to the base URL.</returns>
public static string BuildUrl(Guid executionId)
=> $"api/audit/tree?executionId={executionId:D}";
/// <summary>
/// Executes the tree lookup: GETs <c>/api/audit/tree</c> and renders the result
/// in the requested format. Returns the process exit code (0 = success,
/// 1 = error, 2 = authorization failure).
/// </summary>
/// <param name="client">The management HTTP client.</param>
/// <param name="executionId">The execution ID to look up.</param>
/// <param name="format">"table" (default) or "json".</param>
/// <param name="output">The output writer for results.</param>
/// <returns>A task that resolves to the process exit code.</returns>
public static async Task<int> RunTreeAsync(
ManagementHttpClient client,
Guid executionId,
string format,
TextWriter output)
{
var url = BuildUrl(executionId);
var response = await client.SendGetAsync(url, TimeSpan.FromSeconds(30));
if (response.JsonData == null)
{
OutputFormatter.WriteError(
response.Error ?? "Audit tree request failed.", response.ErrorCode ?? "ERROR");
return CommandHelpers.IsAuthorizationFailure(response) ? 2 : 1;
}
var nodes = ParseNodes(response.JsonData);
if (format == "json")
{
WriteJson(nodes, output);
}
else
{
WriteTable(nodes, executionId, output);
}
output.Flush();
return 0;
}
/// <summary>
/// Parses the JSON array from the server into an array of
/// <see cref="AuditTreeNodeDto"/>.
/// </summary>
/// <param name="json">The raw JSON response body.</param>
/// <returns>An array of deserialized tree nodes (empty on parse failure).</returns>
internal static AuditTreeNodeDto[] ParseNodes(string json)
{
try
{
return JsonSerializer.Deserialize<AuditTreeNodeDto[]>(json, JsonReadOptions)
?? Array.Empty<AuditTreeNodeDto>();
}
catch (JsonException)
{
return Array.Empty<AuditTreeNodeDto>();
}
}
/// <summary>
/// Renders the nodes as pretty-printed JSON to <paramref name="output"/>.
/// </summary>
internal static void WriteJson(AuditTreeNodeDto[] nodes, TextWriter output)
{
output.WriteLine(JsonSerializer.Serialize(nodes, JsonWriteOptions));
}
/// <summary>
/// Renders the nodes as an indented ASCII tree. The root node (null
/// <c>ParentExecutionId</c>) is printed first; each child is indented
/// two spaces per depth level. The queried/entry-point node is marked
/// with <c> [*]</c>.
/// </summary>
internal static void WriteTable(
AuditTreeNodeDto[] nodes,
Guid queriedExecutionId,
TextWriter output)
{
if (nodes.Length == 0)
{
output.WriteLine("(no execution tree found)");
return;
}
// Build a parent → children lookup (keyed by non-null parent Guid).
// Nodes whose ParentExecutionId is null are roots and are not placed in
// the lookup; they are identified separately below.
var childrenOf = new Dictionary<Guid, List<AuditTreeNodeDto>>();
foreach (var node in nodes)
{
if (node.ParentExecutionId is { } parentId)
{
if (!childrenOf.ContainsKey(parentId))
childrenOf[parentId] = new List<AuditTreeNodeDto>();
childrenOf[parentId].Add(node);
}
}
// Identify roots: nodes whose ParentExecutionId is null, or whose parent
// is not present in the node set (stub-root case).
var nodeIds = new HashSet<Guid>(nodes.Select(n => n.ExecutionId));
var roots = nodes
.Where(n => n.ParentExecutionId == null || !nodeIds.Contains(n.ParentExecutionId.Value))
.ToList();
// Render depth-first.
var sb = new StringBuilder();
foreach (var root in roots)
{
RenderNode(root, depth: 0, childrenOf, queriedExecutionId, sb);
}
output.Write(sb.ToString());
}
private static void RenderNode(
AuditTreeNodeDto node,
int depth,
Dictionary<Guid, List<AuditTreeNodeDto>> childrenOf,
Guid queriedExecutionId,
StringBuilder sb)
{
var indent = new string(' ', depth * 2);
var marker = node.ExecutionId == queriedExecutionId ? " [*]" : string.Empty;
var channels = node.Channels.Length > 0 ? string.Join(",", node.Channels) : "-";
var statuses = node.Statuses.Length > 0 ? string.Join(",", node.Statuses) : "-";
var site = node.SourceSiteId ?? "-";
var instance = node.SourceInstanceId ?? "-";
var first = node.FirstOccurredAtUtc.HasValue
? node.FirstOccurredAtUtc.Value.ToString("yyyy-MM-ddTHH:mm:ssZ")
: "-";
sb.AppendLine(
$"{indent}{node.ExecutionId:D}{marker} rows={node.RowCount} channels=[{channels}] statuses=[{statuses}] site={site} instance={instance} first={first}");
if (childrenOf.TryGetValue(node.ExecutionId, out var children))
{
foreach (var child in children)
{
RenderNode(child, depth + 1, childrenOf, queriedExecutionId, sb);
}
}
}
}
@@ -142,6 +142,60 @@ public class ManagementHttpClient : IDisposable
return new ManagementResponse((int)httpResponse.StatusCode, null, error, code);
}
/// <summary>
/// Issues a plain HTTP <c>POST</c> against a REST endpoint (e.g. the audit
/// maintenance endpoints) with a JSON body and returns the response. Unlike
/// <see cref="SendCommandAsync"/>, this does not wrap the call in the
/// <c>POST /management</c> command envelope — these are plain REST resources.
/// Authentication (HTTP Basic) and the base address are shared.
/// </summary>
/// <param name="relativePath">Path relative to the base URL.</param>
/// <param name="body">The JSON body to send, or <c>null</c> for an empty body.</param>
/// <param name="timeout">The request timeout.</param>
/// <returns>A management response containing status and data.</returns>
public async Task<ManagementResponse> SendPostAsync(string relativePath, string? body, TimeSpan timeout)
{
using var cts = new CancellationTokenSource(timeout);
var content = new StringContent(body ?? "{}", Encoding.UTF8, "application/json");
HttpResponseMessage httpResponse;
try
{
httpResponse = await _httpClient.PostAsync(relativePath, content, cts.Token);
}
catch (TaskCanceledException)
{
return new ManagementResponse(504, null, "Request timed out.", "TIMEOUT");
}
catch (HttpRequestException ex)
{
return new ManagementResponse(0, null, $"Connection failed: {ex.Message}", "CONNECTION_FAILED");
}
var responseBody = await httpResponse.Content.ReadAsStringAsync(cts.Token);
if (httpResponse.IsSuccessStatusCode)
{
return new ManagementResponse((int)httpResponse.StatusCode, responseBody, null, null);
}
string? error = null;
string? code = null;
try
{
using var doc = JsonDocument.Parse(responseBody);
error = doc.RootElement.TryGetProperty("error", out var e) ? e.GetString() : responseBody;
code = doc.RootElement.TryGetProperty("code", out var c) ? c.GetString() : null;
}
catch
{
error = responseBody;
}
return new ManagementResponse((int)httpResponse.StatusCode, null, error, code);
}
/// <summary>
/// Issues a plain HTTP <c>GET</c> and returns the raw <see cref="HttpResponseMessage"/>
/// so the caller can stream the response body without buffering it in memory — used
+56 -13
View File
@@ -1269,15 +1269,18 @@ script-trust-boundary action: outbound API calls (sync + cached), outbound DB
operations (sync + cached), notifications, and inbound API calls. This is distinct
from the configuration-change audit trail exposed by [`audit-config`](#audit-config--configuration-change-audit-log).
The subcommands map directly onto the `GET /api/audit/query` and
`GET /api/audit/export` management endpoints. Filters and the result columns mirror
the Central UI **Audit** page, so a CLI query and a UI query with the same filters
return the same rows — CLI ↔ UI filter parity is intentional.
The subcommands map directly onto the `GET /api/audit/query`,
`GET /api/audit/export`, `GET /api/audit/tree`, and
`POST /api/audit/backfill-source-node` management endpoints. Filters and the
result columns mirror the Central UI **Audit** page, so a CLI query and a UI
query with the same filters return the same rows — CLI ↔ UI filter parity is
intentional.
**Permissions.** Querying requires the `OperationalAudit` permission (roles `Admin`,
`Audit`, or `AuditReadOnly`). Exporting requires the stricter `AuditExport` permission
(roles `Admin` or `Audit`) — read access does *not* imply export access. A request
without the required role returns exit code `2`.
**Permissions.** Querying and tree traversal require the `OperationalAudit`
permission (roles `Admin`, `Audit`, or `AuditReadOnly`). Exporting requires the
stricter `AuditExport` permission (roles `Admin` or `Audit`) — read access does
*not* imply export access. The `backfill-source-node` maintenance command requires
the `Admin` role. A request without the required role returns exit code `2`.
#### `audit query`
@@ -1342,6 +1345,46 @@ scadabridge --url <url> audit export --since <time> --until <time> --format <fmt
> Implemented` — Parquet archival is deferred to v1.x (see `Component-AuditLog.md`).
> Use `csv` or `jsonl`.
#### `audit tree` (M5.3 T8)
Display the full execution-chain tree for a given execution ID. The server walks
`ParentExecutionId` to find the root, then traverses downward to collect all
reachable executions in the chain.
```sh
scadabridge --url <url> audit tree --execution-id <guid> [--format table|json]
```
| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--execution-id` | yes | — | Any `ExecutionId` in the chain (root or child) |
| `--format` | no | `json` | Output format: `json` (structured tree) or `table` (indented tree) |
The `--execution-id` can be any node in the chain — the server resolves the root
automatically. With `--format table` the tree is printed as an indented text
representation. With `--format json` (the default) a structured JSON tree is
returned, suitable for scripting. Backed by `GET /api/audit/tree?executionId=<guid>`.
Requires `OperationalAudit` permission.
#### `audit backfill-source-node` (M5.6 T5)
Set `SourceNode` to a sentinel value on pre-feature rows where `SourceNode IS NULL`
and `OccurredAtUtc` is older than `--before`. Admin-only maintenance command.
```sh
scadabridge --url <url> audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
```
| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--before` | yes | — | ISO-8601 UTC datetime; only rows older than this date are eligible |
| `--sentinel` | no | `unknown` | Value to write (must be non-empty) |
| `--batch` | no | `5000` | Max rows updated per batch; controls transaction size |
The command is idempotent — running it multiple times converges (only rows where
`SourceNode IS NULL` are eligible; already-set rows are untouched). Backed by
`POST /api/audit/backfill-source-node`. Requires `Admin` role.
#### `audit verify-chain`
Verify the audit log hash chain for a given month.
@@ -1354,11 +1397,11 @@ scadabridge --url <url> audit verify-chain --month <YYYY-MM>
|--------|----------|---------|-------------|
| `--month` | yes | — | Month to verify, `YYYY-MM` (e.g. `2026-05`) |
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release. The
> subcommand validates the `--month` argument and prints a notice pointing at the
> v1.x roadmap in `Component-AuditLog.md`; it exits `0` without contacting the server.
> The command exists now so scripts and operator habits do not need to change when
> tamper-evidence ships.
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release (T1
> deferred to v1.x). The subcommand validates the `--month` argument and prints a
> notice pointing at the v1.x roadmap in `Component-AuditLog.md`; it exits `0`
> without contacting the server. The command exists now so scripts and operator
> habits do not need to change when tamper-evidence ships.
---
@@ -58,3 +58,31 @@
{
<div class="text-muted small mb-3">Site Call KPIs unavailable: @ErrorMessage</div>
}
@* ── Per-node stuck/parked sub-table (T6: M5.2 per-node stuck-count KPIs) ── *@
@if (HasNodeBreakdown)
{
<div class="mb-3">
<div class="d-flex justify-content-between align-items-center mb-1">
<small class="text-muted">By node</small>
</div>
<table class="table table-sm table-borderless mb-0 site-call-kpi-node-table">
<thead class="table-light">
<tr>
<th class="small py-1">Node</th>
<th class="text-end small py-1">Stuck</th>
<th class="text-end small py-1">Parked</th>
</tr>
</thead>
<tbody>
@foreach (var n in PerNodeSnapshots!)
{
<tr @key="n.SourceNode">
<td class="small py-1"><code>@n.SourceNode</code></td>
<td class="text-end font-monospace small py-1 @(n.StuckCount > 0 ? "text-warning" : "")">@n.StuckCount</td>
<td class="text-end font-monospace small py-1 @(n.ParkedCount > 0 ? "text-danger" : "")">@n.ParkedCount</td>
</tr>
}
</tbody>
</table>
</div>
}
@@ -1,5 +1,6 @@
using Microsoft.AspNetCore.Components;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
namespace ZB.MOM.WW.ScadaBridge.CentralUI.Components.Health;
@@ -59,6 +60,24 @@ public partial class SiteCallKpiTiles
/// </summary>
[Parameter] public string? ErrorMessage { get; set; }
/// <summary>
/// Optional per-node KPI breakdown (T6: M5.2 per-node stuck-count KPIs).
/// When non-null and non-empty, a compact node-level stuck/parked sub-table
/// is rendered below the main tiles. <c>null</c> means the parent has not
/// loaded it yet or has opted out — the sub-table is suppressed entirely.
/// </summary>
[Parameter] public IReadOnlyList<SiteCallNodeKpiSnapshot>? PerNodeSnapshots { get; set; }
/// <summary>
/// True when <see cref="PerNodeSnapshots"/> is a successful query result.
/// Used to suppress the sub-table on a load failure.
/// </summary>
[Parameter] public bool PerNodeAvailable { get; set; }
/// <summary>Whether the per-node sub-table has data to render.</summary>
internal bool HasNodeBreakdown =>
PerNodeAvailable && PerNodeSnapshots is { Count: > 0 };
// ── Buffered tile ───────────────────────────────────────────────────────
private string BufferedDisplay =>
@@ -9,6 +9,7 @@
@using ZB.MOM.WW.ScadaBridge.HealthMonitoring
@using ZB.MOM.WW.ScadaBridge.Commons.Messages.Notification
@using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit
@using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit
@using ZB.MOM.WW.ScadaBridge.Communication
@implements IDisposable
@inject ICentralHealthAggregator HealthAggregator
@@ -65,7 +66,9 @@
(buffered / stuck / parked). Refreshed alongside the site states. *@
<SiteCallKpiTiles Snapshot="@_siteCallKpi"
IsAvailable="@_siteCallKpiAvailable"
ErrorMessage="@_siteCallKpiError" />
ErrorMessage="@_siteCallKpiError"
PerNodeSnapshots="@_siteCallNodeKpis"
PerNodeAvailable="@_siteCallNodeKpiAvailable" />
@* Audit Log (#23) M7 Bundle E — three KPI tiles for the Audit channel
(volume / error rate / backlog). Refreshed alongside the site states. *@
@@ -378,6 +381,12 @@
private bool _siteCallKpiAvailable;
private string? _siteCallKpiError;
// Per-node Site Call KPI breakdown (T6: M5.2 per-node stuck-count KPIs).
// Passed to SiteCallKpiTiles as an optional sub-table.
private IReadOnlyList<SiteCallNodeKpiSnapshot> _siteCallNodeKpis =
Array.Empty<SiteCallNodeKpiSnapshot>();
private bool _siteCallNodeKpiAvailable;
private static bool SiteHasActiveErrors(SiteHealthState state)
{
var report = state.LatestReport;
@@ -415,7 +424,7 @@
{
_siteStates = HealthAggregator.GetAllSiteStates();
await LoadOutboxKpis();
await LoadSiteCallKpis();
await Task.WhenAll(LoadSiteCallKpis(), LoadSiteCallNodeKpis());
await LoadAuditKpis();
}
@@ -474,6 +483,30 @@
}
}
// Per-node site-call KPI loader (T6: M5.2). Best-effort; a fault silently
// suppresses the per-node sub-table rather than degrading the dashboard.
private async Task LoadSiteCallNodeKpis()
{
try
{
var response = await CommunicationService.GetPerNodeSiteCallKpisAsync(
new PerNodeSiteCallKpiRequest(Guid.NewGuid().ToString("N")));
if (response.Success)
{
_siteCallNodeKpis = response.Nodes;
_siteCallNodeKpiAvailable = true;
}
else
{
_siteCallNodeKpiAvailable = false;
}
}
catch
{
_siteCallNodeKpiAvailable = false;
}
}
// Tiles show the numeric KPI when available, or an em dash when the outbox
// KPI query failed — matching how the page renders other unavailable data.
private string OutboxTileValue(int value) =>
@@ -69,6 +69,51 @@
</div>
}
@* ── Per-node breakdown (T6: additive) ── *@
<h5 class="mb-2">Per-node breakdown</h5>
@if (_perNodeError != null)
{
<div class="alert alert-warning py-2">Per-node KPIs unavailable: @_perNodeError</div>
}
else if (_perNode.Count == 0)
{
<div class="card mb-3">
<div class="card-body text-center text-muted py-3">
<div class="small">No per-node activity (rows may have a null SourceNode).</div>
</div>
</div>
}
else
{
<div class="table-responsive mb-3">
<table class="table table-sm table-hover align-middle">
<thead class="table-light">
<tr>
<th>Node</th>
<th class="text-end">Queue Depth</th>
<th class="text-end">Stuck</th>
<th class="text-end">Parked</th>
<th class="text-end">Delivered (last interval)</th>
<th class="text-end">Oldest Pending Age</th>
</tr>
</thead>
<tbody>
@foreach (var n in _perNode)
{
<tr @key="n.SourceNode" class="@(n.StuckCount > 0 ? "table-warning" : "")">
<td><code>@n.SourceNode</code></td>
<td class="text-end font-monospace">@n.QueueDepth</td>
<td class="text-end font-monospace @(n.StuckCount > 0 ? "text-warning" : "")">@n.StuckCount</td>
<td class="text-end font-monospace @(n.ParkedCount > 0 ? "text-danger" : "")">@n.ParkedCount</td>
<td class="text-end font-monospace text-success">@n.DeliveredLastInterval</td>
<td class="text-end font-monospace">@FormatAge(n.OldestPendingAge)</td>
</tr>
}
</tbody>
</table>
</div>
}
@* ── Per-site breakdown ── *@
<h5 class="mb-2">Per-site breakdown</h5>
@if (_perSiteError != null)
@@ -124,6 +169,10 @@
private IReadOnlyList<SiteNotificationKpiSnapshot> _perSite = Array.Empty<SiteNotificationKpiSnapshot>();
private string? _perSiteError;
// ── Per-node (T6: M5.2 per-node stuck-count KPIs) ──
private IReadOnlyList<NodeNotificationKpiSnapshot> _perNode = Array.Empty<NodeNotificationKpiSnapshot>();
private string? _perNodeError;
private bool _loading;
protected override async Task OnInitializedAsync()
@@ -144,9 +193,9 @@
private async Task RefreshAll()
{
_loading = true;
// Race-free despite both tasks mutating component fields: Blazor Server runs
// Race-free despite all tasks mutating component fields: Blazor Server runs
// every continuation on the circuit's single-threaded synchronization context.
await Task.WhenAll(LoadGlobalKpis(), LoadPerSiteKpis());
await Task.WhenAll(LoadGlobalKpis(), LoadPerSiteKpis(), LoadPerNodeKpis());
_loading = false;
}
@@ -194,6 +243,28 @@
}
}
private async Task LoadPerNodeKpis()
{
try
{
var response = await CommunicationService.GetPerNodeNotificationKpisAsync(
new PerNodeNotificationKpiRequest(Guid.NewGuid().ToString("N")));
if (response.Success)
{
_perNode = response.Nodes;
_perNodeError = null;
}
else
{
_perNodeError = response.ErrorMessage ?? "Per-node KPI query failed.";
}
}
catch (Exception ex)
{
_perNodeError = $"Per-node KPI query failed: {ex.Message}";
}
}
private string SiteName(string siteId) =>
_sites.FirstOrDefault(s => s.SiteIdentifier == siteId)?.Name ?? siteId;
@@ -87,6 +87,42 @@ public interface IAuditLogRepository
/// <returns>A task that resolves to the approximate number of rows discarded by the partition switch.</returns>
Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default);
/// <summary>
/// M5.5 (T3) per-channel retention override purge. Deletes <c>AuditLog</c> rows for a
/// single <paramref name="channel"/> (matched against the canonical
/// <c>Category</c> column — the bare channel name, e.g. <c>ApiOutbound</c>) whose
/// <c>OccurredAtUtc</c> is strictly older than <paramref name="threshold"/>, in
/// bounded batches of <paramref name="batchSize"/> rows, looping until no further
/// rows match. Returns the total number of rows deleted across all batches.
/// </summary>
/// <remarks>
/// <para>
/// <b>Maintenance path — NOT the writer role.</b> The append-only invariant binds
/// the <c>scadabridge_audit_writer</c> ingest role (INSERT + SELECT only). This row
/// DELETE runs on the purge/maintenance connection, the same path that performs the
/// global partition switch-out (also a destructive operation forbidden to the writer
/// role). Per-channel overrides can only ever expire rows EARLIER than the global
/// month-partition switch-out would — never later — so this is a strict tightening
/// of the retention window, applied AFTER the global purge on the same tick.
/// </para>
/// <para>
/// <b>Bounded + idempotent.</b> Each batch is a <c>DELETE TOP (@batch)</c> so the
/// transaction log and lock footprint stay bounded regardless of backlog. Re-running
/// the purge is a no-op once every eligible row is gone (the loop exits when a batch
/// deletes zero rows), so a crash mid-loop is recoverable by simply running again.
/// </para>
/// </remarks>
/// <param name="channel">Canonical channel name (the <c>Category</c> column value, e.g. <c>ApiOutbound</c>).</param>
/// <param name="threshold">Rows with <c>OccurredAtUtc</c> strictly older than this UTC datetime are deleted.</param>
/// <param name="batchSize">Maximum rows deleted per batch; must be &gt; 0.</param>
/// <param name="ct">Cancellation token.</param>
/// <returns>A task that resolves to the total number of rows deleted across all batches.</returns>
Task<long> PurgeChannelOlderThanAsync(
string channel,
DateTime threshold,
int batchSize,
CancellationToken ct = default);
/// <summary>
/// Returns the set of <c>pf_AuditLog_Month</c> partition lower-bound
/// boundaries whose partitions contain only rows with
@@ -201,4 +237,59 @@ public interface IAuditLogRepository
/// <param name="ct">Cancellation token.</param>
/// <returns>A task that resolves to the distinct, non-null source node names in ascending order.</returns>
Task<IReadOnlyList<string>> GetDistinctSourceNodesAsync(CancellationToken ct = default);
/// <summary>
/// M5.6 (T5) one-time operational backfill: sets <c>SourceNode</c> to
/// <paramref name="sentinel"/> on every row where <c>SourceNode IS NULL</c>
/// and <c>OccurredAtUtc &lt; <paramref name="before"/></c>, in bounded
/// batches of <paramref name="batchSize"/> rows, looping until no further
/// rows match. Returns the total number of rows updated across all batches.
/// </summary>
/// <remarks>
/// <para>
/// <b>Why a sentinel, not the real value.</b> <c>SourceNode</c> captures the
/// physical cluster node on which an event was emitted. For pre-feature rows
/// that were ingested before the column was stamped, the true node-of-origin
/// is UNKNOWABLE — the original emitter is long gone and there is no
/// retroactive way to determine it. Backfilling a configurable sentinel
/// (default <c>"unknown"</c>) makes it explicit that these rows pre-date the
/// feature rather than silently leaving them NULL (which the filter UI already
/// treats as "unresolved" but which an operator might mistake for a bug).
/// </para>
/// <para>
/// <b><c>ExecutionId</c> / <c>ParentExecutionId</c> cannot be backfilled.</b>
/// These are PERSISTED COMPUTED columns derived from <c>DetailsJson</c>. The
/// AuditLog append-only invariant forbids mutating <c>DetailsJson</c>, so
/// the computed values for pre-feature rows remain NULL permanently. This is
/// documented rather than coded — see the Ops Note in
/// <c>Component-AuditLog.md § Ops Notes — Historical Null Columns</c>.
/// </para>
/// <para>
/// <b>Maintenance path — NOT the writer role.</b> This UPDATE runs on the
/// purge/maintenance connection (the same path as
/// <see cref="SwitchOutPartitionAsync"/> and any per-channel purge), NOT the
/// append-only <c>scadabridge_audit_writer</c> role. The CI guard
/// (<c>AuditLogAppendOnlyGuardTests</c>) recognises the
/// <c>// AUDIT-PURGE-ALLOWED</c> marker on the UPDATE line and forgives
/// exactly this one sanctioned maintenance-path UPDATE; any other UPDATE
/// against <c>AuditLog</c> still trips the guard.
/// </para>
/// <para>
/// <b>Bounded + idempotent.</b> <c>UPDATE TOP (@batch)</c> caps the
/// transaction-log and lock footprint per statement. The loop exits when a
/// batch updates zero rows, so a crash mid-loop is recoverable by simply
/// running again; re-running after completion is a no-op (no NULL rows
/// remain for the given <paramref name="before"/> window).
/// </para>
/// </remarks>
/// <param name="sentinel">Value to write into <c>SourceNode</c> for pre-feature rows (e.g. <c>"unknown"</c>).</param>
/// <param name="before">Rows with <c>OccurredAtUtc</c> strictly older than this UTC datetime are eligible.</param>
/// <param name="batchSize">Maximum rows updated per batch; must be &gt; 0.</param>
/// <param name="ct">Cancellation token.</param>
/// <returns>A task that resolves to the total number of rows updated across all batches.</returns>
Task<long> BackfillSourceNodeAsync(
string sentinel,
DateTime before,
int batchSize,
CancellationToken ct = default);
}
@@ -100,6 +100,19 @@ public interface INotificationOutboxRepository
Task<IReadOnlyList<SiteNotificationKpiSnapshot>> ComputePerSiteKpisAsync(
DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default);
/// <summary>
/// Computes a point-in-time <see cref="NodeNotificationKpiSnapshot"/> per originating node.
/// Nodes with no notification rows at all are omitted; rows with a <c>NULL</c>
/// <c>SourceNode</c> are excluded. The stuck and delivered cutoffs are supplied by the
/// caller; the current time used for <c>OldestPendingAge</c> is captured inside the method.
/// </summary>
/// <param name="stuckCutoff">The time threshold for marking notifications as stuck.</param>
/// <param name="deliveredSince">The time threshold for counting delivered notifications.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>A list of per-node KPI snapshots, ordered by node name.</returns>
Task<IReadOnlyList<NodeNotificationKpiSnapshot>> ComputePerNodeKpisAsync(
DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default);
/// <summary>
/// Persists pending changes tracked on the underlying context. Use this when staging
/// multiple changes for a single commit; the individual mutating methods on this
@@ -107,4 +107,19 @@ public interface ISiteCallAuditRepository
DateTime stuckCutoff,
DateTime intervalSince,
CancellationToken ct = default);
/// <summary>
/// Computes a point-in-time <see cref="SiteCallNodeKpiSnapshot"/> per originating
/// node. Nodes with no <c>SiteCalls</c> rows at all are omitted; rows with a
/// <c>NULL</c> <c>SourceNode</c> are excluded. The stuck cutoff and interval
/// bounds are interpreted as in <see cref="ComputeKpisAsync"/>.
/// </summary>
/// <param name="stuckCutoff">UTC threshold for classifying a row as stuck.</param>
/// <param name="intervalSince">UTC start of the delivered/failed interval window.</param>
/// <param name="ct">Cancellation token.</param>
/// <returns>A task that resolves to a per-node KPI list; nodes with no rows are omitted.</returns>
Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff,
DateTime intervalSince,
CancellationToken ct = default);
}
@@ -164,3 +164,24 @@ public sealed record PerSiteSiteCallKpiResponse(
bool Success,
string? ErrorMessage,
IReadOnlyList<SiteCallSiteKpiSnapshot> Sites);
/// <summary>
/// Site Calls UI -> Central: request for the per-node <c>SiteCalls</c>
/// KPI breakdown. Mirrors <see cref="PerSiteSiteCallKpiRequest"/> but groups
/// by <c>SourceNode</c> instead of <c>SourceSite</c>. Additive — does not
/// change per-site behaviour.
/// </summary>
public sealed record PerNodeSiteCallKpiRequest(
string CorrelationId);
/// <summary>
/// Central -> Site Calls UI: per-node KPI breakdown for the Site Calls KPIs
/// page. On a repository fault <see cref="Success"/> is <c>false</c>,
/// <see cref="ErrorMessage"/> carries the cause, and <see cref="Nodes"/> is empty.
/// Nodes with a <c>NULL</c> <c>SourceNode</c> are omitted.
/// </summary>
public sealed record PerNodeSiteCallKpiResponse(
string CorrelationId,
bool Success,
string? ErrorMessage,
IReadOnlyList<SiteCallNodeKpiSnapshot> Nodes);
@@ -83,3 +83,46 @@ public record RouteToSetAttributesResponse(
bool Success,
string? ErrorMessage,
DateTimeOffset Timestamp);
/// <summary>
/// Request to block until a remote instance attribute reaches a target value
/// (spec §6 — <c>Route.To("inst").WaitForAttribute(name, targetValue, timeout)</c>).
/// Value-equality ONLY across the wire: <see cref="TargetValueEncoded"/> carries the
/// canonical <c>AttributeValueCodec</c>-encoded target; there is no predicate and no
/// quality flag in the comparison. The site evaluates equality and either matches or
/// times out.
/// </summary>
/// <param name="ParentExecutionId">
/// Audit Log #23 (ParentExecutionId): mirrors <see cref="RouteToCallRequest.ParentExecutionId"/>.
/// For an inbound-API-routed wait this is the inbound request's per-request execution id;
/// future site-side audit emission for routed waits can stamp it as <c>ParentExecutionId</c>
/// so the inbound→site execution-tree link survives the wait path. Additive trailing
/// member — null for the Central UI sandbox path or for callers built before the field existed.
/// </param>
public record RouteToWaitForAttributeRequest(
string CorrelationId,
string InstanceUniqueName,
string AttributeName,
string? TargetValueEncoded,
TimeSpan Timeout,
DateTimeOffset Timestamp,
Guid? ParentExecutionId = null);
/// <summary>
/// Response from a remote attribute wait. <see cref="Success"/>/<see cref="ErrorMessage"/>
/// convey the routing-level outcome (e.g. instance-not-found); <see cref="Matched"/>,
/// <see cref="TimedOut"/>, <see cref="Value"/>, and <see cref="Quality"/> convey the wait
/// outcome itself. When <see cref="Success"/> is <c>true</c>, exactly one of
/// <see cref="Matched"/>/<see cref="TimedOut"/> holds: <see cref="Matched"/> means the
/// attribute reached the target value (with <see cref="Value"/>/<see cref="Quality"/>
/// captured at the match), <see cref="TimedOut"/> means the deadline elapsed first.
/// </summary>
public record RouteToWaitForAttributeResponse(
string CorrelationId,
bool Matched,
object? Value,
string? Quality,
bool TimedOut,
bool Success,
string? ErrorMessage,
DateTimeOffset Timestamp);
@@ -0,0 +1,82 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
/// <summary>
/// Request to wait, event-driven, until an attribute reaches a value (or any
/// value satisfying a predicate), bounded by a timeout — the backing protocol for
/// the script-facing <c>Attributes.WaitAsync</c> helper.
///
/// <para>
/// <b>Site-local only.</b> The optional <see cref="Predicate"/> is a non-serializable
/// in-process delegate, so this message MUST flow only within a single site node's
/// actor system (script execution → Instance Actor). It is never sent across the
/// ClusterClient / gRPC boundary. The value-equality form (<see cref="TargetValueEncoded"/>)
/// would serialize, but the routed/inbound variant is deliberately out of scope here.
/// </para>
/// </summary>
/// <param name="CorrelationId">Per-wait correlation id; keys the waiter registry and the timeout self-message.</param>
/// <param name="InstanceName">The instance this wait targets.</param>
/// <param name="AttributeName">The attribute to watch — already scope-resolved by the accessor.</param>
/// <param name="TargetValueEncoded">
/// The codec-encoded target value (<c>AttributeValueCodec.Encode(target)</c>). A
/// match compares the codec-encoded form of the current value against this string.
/// When both this and <see cref="Predicate"/> are null the wait matches on ANY change.
/// </param>
/// <param name="Predicate">
/// Site-local predicate tested against the raw (decoded) current value. Mutually
/// exclusive with <see cref="TargetValueEncoded"/> — null when the encoded target is used.
/// </param>
/// <param name="Timeout">How long to wait before self-evicting with a timeout reply.</param>
/// <param name="OccurredAtUtc">When the request was issued (UTC).</param>
/// <param name="RequireGoodQuality">
/// Quality-gated ("Good"-only) mode (spec §4.2): when <see langword="true"/>, a
/// match additionally requires the attribute quality to be exactly
/// <c>"Good"</c> (<see cref="System.StringComparison.Ordinal"/>) — a value that
/// reaches the target / satisfies the predicate at Bad/Uncertain quality is NOT a
/// match and the waiter stays pending until the value satisfies the test at Good
/// quality (or times out). Defaults to <see langword="false"/> (quality-agnostic:
/// the match tests the value only). Trailing/defaulted so existing positional
/// constructions compile unchanged.
/// </param>
public record WaitForAttributeRequest(
string CorrelationId,
string InstanceName,
string AttributeName,
string? TargetValueEncoded,
Func<object?, bool>? Predicate,
TimeSpan Timeout,
DateTimeOffset OccurredAtUtc,
bool RequireGoodQuality = false);
/// <summary>
/// Reply to a <see cref="WaitForAttributeRequest"/>. Exactly one of
/// <see cref="Matched"/> / <see cref="TimedOut"/> is set on the happy paths;
/// <see cref="ErrorMessage"/> is populated on the failure paths (per-instance
/// waiter cap exceeded, or the match predicate threw).
/// </summary>
/// <param name="CorrelationId">Echoes the request's correlation id.</param>
/// <param name="Matched">True when the attribute reached the target/predicate within the timeout.</param>
/// <param name="Value">The matched value (null on timeout / error).</param>
/// <param name="Quality">
/// The attribute quality at match time; <see langword="null"/> on the non-match
/// paths (timeout / error / cap-exceeded), matching the nullable
/// <see cref="ErrorMessage"/> convention.
/// </param>
/// <param name="TimedOut">True when the timeout fired before a match.</param>
/// <param name="ErrorMessage">
/// Non-null only when the wait failed/refused — the per-instance waiter cap was
/// exceeded, or the match predicate threw (<c>"Wait predicate threw: …"</c>).
/// </param>
public record WaitForAttributeResponse(
string CorrelationId,
bool Matched,
object? Value,
string? Quality,
bool TimedOut,
string? ErrorMessage = null);
/// <summary>
/// Internal self-message scheduled by the Instance Actor to fire a waiter's
/// timeout. Site-local only; never crosses a cluster boundary.
/// </summary>
/// <param name="CorrelationId">The waiter whose timeout fired.</param>
public record WaitForAttributeTimeout(string CorrelationId);
@@ -159,3 +159,23 @@ public record PerSiteNotificationKpiResponse(
bool Success,
string? ErrorMessage,
IReadOnlyList<SiteNotificationKpiSnapshot> Sites);
/// <summary>
/// Outbox UI -> Central: request for the per-node notification outbox KPI breakdown.
/// Mirrors <see cref="PerSiteNotificationKpiRequest"/> but groups by <c>SourceNode</c>
/// instead of <c>SourceSiteId</c>. Additive — does not change per-site behaviour.
/// </summary>
public record PerNodeNotificationKpiRequest(
string CorrelationId);
/// <summary>
/// Central -> Outbox UI: per-node KPI breakdown for the Notification KPIs page.
/// On a repository fault <see cref="Success"/> is <c>false</c>, <see cref="ErrorMessage"/>
/// carries the cause, and <see cref="Nodes"/> is empty. Nodes with a <c>NULL</c>
/// <c>SourceNode</c> are omitted.
/// </summary>
public record PerNodeNotificationKpiResponse(
string CorrelationId,
bool Success,
string? ErrorMessage,
IReadOnlyList<NodeNotificationKpiSnapshot> Nodes);
@@ -0,0 +1,37 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
/// <summary>
/// Point-in-time <c>SiteCalls</c> metrics scoped to a single originating node. The
/// per-node counterpart of <see cref="SiteCallSiteKpiSnapshot"/>; surfaced in the
/// per-node breakdown table on the Site Calls KPIs page. Mirrors
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.Notifications.NodeNotificationKpiSnapshot"/>.
/// </summary>
/// <param name="SourceNode">
/// The node identifier these metrics are scoped to (e.g. <c>node-a</c>,
/// <c>node-b</c>). Rows with a <c>NULL</c> <c>SourceNode</c> are omitted.
/// </param>
/// <param name="BufferedCount">Count of this node's non-terminal rows (<c>TerminalAtUtc IS NULL</c>).</param>
/// <param name="ParkedCount">Count of this node's rows in the <c>Parked</c> status.</param>
/// <param name="FailedLastInterval">
/// Count of this node's <c>Failed</c> rows whose <c>TerminalAtUtc</c> is at or
/// after the "since" timestamp.
/// </param>
/// <param name="DeliveredLastInterval">
/// Count of this node's <c>Delivered</c> rows whose <c>TerminalAtUtc</c> is at
/// or after the "since" timestamp.
/// </param>
/// <param name="OldestPendingAge">
/// Age of this node's oldest non-terminal row, or <c>null</c> when it has none.
/// </param>
/// <param name="StuckCount">
/// Count of this node's non-terminal rows whose <c>CreatedAtUtc</c> is older
/// than the stuck cutoff.
/// </param>
public sealed record SiteCallNodeKpiSnapshot(
string SourceNode,
int BufferedCount,
int ParkedCount,
int FailedLastInterval,
int DeliveredLastInterval,
TimeSpan? OldestPendingAge,
int StuckCount);
@@ -0,0 +1,30 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.Notifications;
/// <summary>
/// Point-in-time notification-outbox metrics scoped to a single originating node.
/// The per-node counterpart of <see cref="SiteNotificationKpiSnapshot"/>; surfaced
/// in the per-node breakdown table on the Notification KPIs page.
/// </summary>
/// <param name="SourceNode">
/// The node identifier these metrics are scoped to (e.g. <c>node-a</c>,
/// <c>node-b</c>). Rows with a <c>NULL</c> <c>SourceNode</c> are omitted.
/// </param>
/// <param name="QueueDepth">Count of this node's non-terminal rows (Pending + Retrying).</param>
/// <param name="StuckCount">
/// Count of this node's non-terminal rows whose <c>CreatedAt</c> is older than the stuck cutoff.
/// </param>
/// <param name="ParkedCount">Count of this node's rows in the Parked status.</param>
/// <param name="DeliveredLastInterval">
/// Count of this node's Delivered rows whose <c>DeliveredAt</c> is at or after the
/// "delivered since" timestamp.
/// </param>
/// <param name="OldestPendingAge">
/// Age of this node's oldest non-terminal row, or <c>null</c> when it has none.
/// </param>
public record NodeNotificationKpiSnapshot(
string SourceNode,
int QueueDepth,
int StuckCount,
int ParkedCount,
int DeliveredLastInterval,
TimeSpan? OldestPendingAge);
@@ -0,0 +1,21 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Types;
/// <summary>
/// Rich result of an <c>Attributes.WaitForAsync</c> wait (spec §3) — the full
/// outcome of waiting for an attribute to reach a value / satisfy a predicate /
/// change at all, bounded by a timeout. The <c>Attributes.WaitAsync</c> helpers
/// surface only <see cref="Matched"/>; <c>WaitForAsync</c> returns this struct so
/// a script can also read the matched <see cref="Value"/>, its <see cref="Quality"/>,
/// and distinguish a genuine timeout (<see cref="TimedOut"/>) from a non-match.
/// </summary>
/// <param name="Matched">
/// <see langword="true"/> when the attribute reached the target / satisfied the
/// predicate within the timeout (and, in quality-gated mode, at "Good" quality).
/// </param>
/// <param name="Value">The matched value; <see langword="null"/> on timeout / error.</param>
/// <param name="Quality">
/// The attribute quality at match time; <see langword="null"/> on the non-match
/// paths (timeout / error / cap-exceeded).
/// </param>
/// <param name="TimedOut"><see langword="true"/> when the timeout fired before a match.</param>
public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
@@ -144,6 +144,7 @@ public class SiteCommunicationActor : ReceiveActor, IWithTimers
Receive<RouteToCallRequest>(msg => _deploymentManagerProxy.Forward(msg));
Receive<RouteToGetAttributesRequest>(msg => _deploymentManagerProxy.Forward(msg));
Receive<RouteToSetAttributesRequest>(msg => _deploymentManagerProxy.Forward(msg));
Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));
// OPC UA Tag Browser (interactive design-time query) — forward to the
// Deployment Manager singleton, which always lands on the active site
@@ -445,6 +445,25 @@ public class CommunicationService
envelope, _options.IntegrationTimeout, cancellationToken);
}
/// <summary>
/// Routes an inbound API wait-for-attribute request to a site (spec §6).
/// </summary>
/// <param name="siteId">The target site identifier.</param>
/// <param name="request">The wait-for-attribute route request.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The wait-for-attribute route response.</returns>
public async Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken = default)
{
var envelope = new SiteEnvelope(siteId, request);
// A wait legitimately blocks up to request.Timeout on the site, so the cluster
// Ask must be bounded by the WAIT deadline (plus integration-timeout slack for
// the round trip), not the generic IntegrationTimeout used by the other routes.
var askTimeout = request.Timeout + _options.IntegrationTimeout;
return await GetActor().Ask<RouteToWaitForAttributeResponse>(
envelope, askTimeout, cancellationToken);
}
// ── Notification Outbox (central-local actor — Asked directly, no SiteEnvelope) ──
/// <summary>
@@ -525,6 +544,22 @@ public class CommunicationService
request, _options.QueryTimeout, cancellationToken);
}
/// <summary>
/// Gets per-node KPI metrics for the notification outbox.
/// Groups by <c>SourceNode</c> (e.g. <c>node-a</c>/<c>node-b</c>); rows with
/// a <c>NULL</c> node are omitted. Additive alongside
/// <see cref="GetPerSiteNotificationKpisAsync"/>.
/// </summary>
/// <param name="request">The per-node notification KPI request.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The per-node notification KPI response.</returns>
public async Task<PerNodeNotificationKpiResponse> GetPerNodeNotificationKpisAsync(
PerNodeNotificationKpiRequest request, CancellationToken cancellationToken = default)
{
return await GetNotificationOutbox().Ask<PerNodeNotificationKpiResponse>(
request, _options.QueryTimeout, cancellationToken);
}
// ── Site Call Audit (central-local actor — Asked directly, no SiteEnvelope) ──
/// <summary>
@@ -579,6 +614,21 @@ public class CommunicationService
request, _options.QueryTimeout, cancellationToken);
}
/// <summary>
/// Gets per-node KPI metrics for site calls. Groups by <c>SourceNode</c>
/// (e.g. <c>node-a</c>/<c>node-b</c>); rows with a <c>NULL</c> node are
/// omitted. Additive alongside <see cref="GetPerSiteSiteCallKpisAsync"/>.
/// </summary>
/// <param name="request">The per-node site call KPI request.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>The per-node site call KPI response.</returns>
public async Task<PerNodeSiteCallKpiResponse> GetPerNodeSiteCallKpisAsync(
PerNodeSiteCallKpiRequest request, CancellationToken cancellationToken = default)
{
return await GetSiteCallAudit().Ask<PerNodeSiteCallKpiResponse>(
request, _options.QueryTimeout, cancellationToken);
}
/// <summary>
/// Task 5 (#22): relays an operator Retry of a parked cached call to its
/// owning site. The <c>SiteCallAuditActor</c> is Asked directly (it is
@@ -370,6 +370,99 @@ VALUES
return rowsDeleted;
}
/// <inheritdoc />
public async Task<long> PurgeChannelOlderThanAsync(
string channel,
DateTime threshold,
int batchSize,
CancellationToken ct = default)
{
if (string.IsNullOrWhiteSpace(channel))
{
throw new ArgumentException("Channel must be a non-empty channel name.", nameof(channel));
}
if (batchSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(batchSize), batchSize, "Batch size must be > 0.");
}
var thresholdUtc = DateTime.SpecifyKind(threshold.ToUniversalTime(), DateTimeKind.Utc);
// M5.5 (T3) per-channel retention override purge. This is the ONLY DELETE
// against dbo.AuditLog in the codebase and it runs on the purge/maintenance
// path, NOT the append-only writer role (which has INSERT + SELECT only — see
// the DENY UPDATE/DENY DELETE grants in CollapseAuditLogToCanonical). The
// AuditLog append-only CI guard (AuditLogAppendOnlyGuardTests) is intentionally
// widened to allow ONLY the single marked DELETE below; any other UPDATE/DELETE
// targeting AuditLog still trips the guard.
//
// Bounded + idempotent: DELETE TOP (@batch) caps the log/lock footprint per
// statement; the loop repeats until a batch deletes zero rows, so re-running
// after a crash mid-loop simply resumes. Category is the canonical
// channel-name column (e.g. 'ApiOutbound'); Action holds "{channel}.{kind}" so
// it is NOT the right column to match a bare channel name against.
//
// The trailing AUDIT-PURGE-ALLOWED marker on the DELETE line below is the
// single narrow exemption the append-only CI guard (AuditLogAppendOnlyGuardTests)
// recognizes; any other UPDATE/DELETE targeting AuditLog still trips the guard.
const string deleteBatchSql =
"DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;"; // AUDIT-PURGE-ALLOWED: per-channel retention override (M5.5 T3), maintenance path
long totalDeleted = 0;
var conn = _context.Database.GetDbConnection();
var openedHere = false;
if (conn.State != System.Data.ConnectionState.Open)
{
await conn.OpenAsync(ct).ConfigureAwait(false);
openedHere = true;
}
try
{
while (true)
{
ct.ThrowIfCancellationRequested();
await using var cmd = conn.CreateCommand();
cmd.CommandText = deleteBatchSql;
var pBatch = cmd.CreateParameter();
pBatch.ParameterName = "@batch";
pBatch.Value = batchSize;
cmd.Parameters.Add(pBatch);
var pChannel = cmd.CreateParameter();
pChannel.ParameterName = "@channel";
pChannel.Value = channel;
cmd.Parameters.Add(pChannel);
var pThreshold = cmd.CreateParameter();
pThreshold.ParameterName = "@threshold";
pThreshold.Value = thresholdUtc;
cmd.Parameters.Add(pThreshold);
var rows = await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
if (rows <= 0)
{
break;
}
totalDeleted += rows;
}
}
finally
{
if (openedHere)
{
await conn.CloseAsync().ConfigureAwait(false);
}
}
return totalDeleted;
}
/// <inheritdoc />
public async Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
DateTime threshold,
@@ -716,6 +809,102 @@ VALUES
.ToListAsync(ct);
}
/// <inheritdoc />
public async Task<long> BackfillSourceNodeAsync(
string sentinel,
DateTime before,
int batchSize,
CancellationToken ct = default)
{
if (string.IsNullOrWhiteSpace(sentinel))
{
throw new ArgumentException("Sentinel must be a non-empty value.", nameof(sentinel));
}
if (batchSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(batchSize), batchSize, "Batch size must be > 0.");
}
var beforeUtc = DateTime.SpecifyKind(before.ToUniversalTime(), DateTimeKind.Utc);
// M5.6 (T5) SourceNode sentinel backfill. This is the ONE sanctioned UPDATE
// against dbo.AuditLog in the codebase. It touches ONLY rows where
// SourceNode IS NULL AND OccurredAtUtc < @before — rows that pre-date the
// M5.6 feature and whose node-of-origin is UNKNOWABLE. The sentinel (default
// "unknown") makes that explicit. ExecutionId/ParentExecutionId are PERSISTED
// COMPUTED columns derived from DetailsJson — mutating DetailsJson is forbidden
// under the append-only invariant, so those stay NULL on pre-feature rows.
//
// Maintenance path (NOT the writer role): runs on the same connection used for
// SwitchOutPartitionAsync (partition-switch DDL), which requires a role that
// holds UPDATE — the append-only scadabridge_audit_writer role has only
// INSERT + SELECT.
//
// Bounded + idempotent: UPDATE TOP (@batch) caps the log/lock footprint per
// statement; the loop exits when a batch updates 0 rows. Re-running after a
// crash simply resumes where it left off.
//
// The trailing AUDIT-PURGE-ALLOWED marker on the UPDATE line below is the
// single narrow exemption the append-only CI guard (AuditLogAppendOnlyGuardTests)
// recognises for an UPDATE; any other UPDATE targeting AuditLog still trips the guard.
const string updateBatchSql =
"UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;"; // AUDIT-PURGE-ALLOWED: SourceNode sentinel backfill (M5.6 T5), maintenance path
long totalUpdated = 0;
var conn = _context.Database.GetDbConnection();
var openedHere = false;
if (conn.State != System.Data.ConnectionState.Open)
{
await conn.OpenAsync(ct).ConfigureAwait(false);
openedHere = true;
}
try
{
while (true)
{
ct.ThrowIfCancellationRequested();
await using var cmd = conn.CreateCommand();
cmd.CommandText = updateBatchSql;
var pBatch = cmd.CreateParameter();
pBatch.ParameterName = "@batch";
pBatch.Value = batchSize;
cmd.Parameters.Add(pBatch);
var pSentinel = cmd.CreateParameter();
pSentinel.ParameterName = "@sentinel";
pSentinel.Value = sentinel;
cmd.Parameters.Add(pSentinel);
var pBefore = cmd.CreateParameter();
pBefore.ParameterName = "@before";
pBefore.Value = beforeUtc;
cmd.Parameters.Add(pBefore);
var rows = await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
if (rows <= 0)
{
break;
}
totalUpdated += rows;
}
}
finally
{
if (openedHere)
{
await conn.CloseAsync().ConfigureAwait(false);
}
}
return totalUpdated;
}
/// <summary>
/// Splits a <c>STRING_AGG</c> comma-joined value into a distinct, ordered
/// list. A null/empty aggregate (a stub node with no rows) yields an empty
@@ -300,6 +300,63 @@ VALUES
: null)).ToList();
}
/// <inheritdoc />
public async Task<IReadOnlyList<NodeNotificationKpiSnapshot>> ComputePerNodeKpisAsync(
DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default)
{
var now = DateTimeOffset.UtcNow;
// Exclude rows with NULL SourceNode (legacy / unstamped) — per-node KPIs
// are only meaningful when the node identity is known.
var queueDepth = await CountByNodeAsync(
n => (n.Status == NotificationStatus.Pending || n.Status == NotificationStatus.Retrying)
&& n.SourceNode != null,
cancellationToken);
var stuck = await CountByNodeAsync(
n => (n.Status == NotificationStatus.Pending || n.Status == NotificationStatus.Retrying)
&& n.CreatedAt < stuckCutoff
&& n.SourceNode != null,
cancellationToken);
var parked = await CountByNodeAsync(
n => n.Status == NotificationStatus.Parked && n.SourceNode != null,
cancellationToken);
var delivered = await CountByNodeAsync(
n => n.Status == NotificationStatus.Delivered
&& n.DeliveredAt != null && n.DeliveredAt >= deliveredSince
&& n.SourceNode != null,
cancellationToken);
// Oldest non-terminal CreatedAt per node — same in-memory reduction
// pattern as ComputePerSiteKpisAsync (DateTimeOffset converter makes
// a SQL Min awkward).
var oldest = (await _context.Notifications
.Where(n => (n.Status == NotificationStatus.Pending
|| n.Status == NotificationStatus.Retrying)
&& n.SourceNode != null)
.Select(n => new { n.SourceNode, n.CreatedAt })
.ToListAsync(cancellationToken))
.GroupBy(x => x.SourceNode!)
.ToDictionary(g => g.Key, g => g.Min(x => x.CreatedAt));
var nodeNames = queueDepth.Keys
.Concat(stuck.Keys).Concat(parked.Keys).Concat(delivered.Keys)
.Distinct()
.OrderBy(n => n, StringComparer.Ordinal);
return nodeNames.Select(node => new NodeNotificationKpiSnapshot(
SourceNode: node,
QueueDepth: queueDepth.GetValueOrDefault(node),
StuckCount: stuck.GetValueOrDefault(node),
ParkedCount: parked.GetValueOrDefault(node),
DeliveredLastInterval: delivered.GetValueOrDefault(node),
OldestPendingAge: oldest.TryGetValue(node, out var createdAt)
? now - createdAt
: null)).ToList();
}
/// <summary>Counts notification rows matching <paramref name="predicate"/>, grouped by source site.</summary>
private async Task<Dictionary<string, int>> CountBySiteAsync(
System.Linq.Expressions.Expression<Func<Notification, bool>> predicate,
@@ -312,6 +369,22 @@ VALUES
.ToDictionaryAsync(x => x.Site, x => x.Count, cancellationToken);
}
/// <summary>
/// Counts notification rows matching <paramref name="predicate"/>, grouped by source node.
/// Only rows with a non-null <c>SourceNode</c> should be included; the predicate is
/// responsible for enforcing that guard.
/// </summary>
private async Task<Dictionary<string, int>> CountByNodeAsync(
System.Linq.Expressions.Expression<Func<Notification, bool>> predicate,
CancellationToken cancellationToken)
{
return await _context.Notifications
.Where(predicate)
.GroupBy(n => n.SourceNode!)
.Select(g => new { Node = g.Key, Count = g.Count() })
.ToDictionaryAsync(x => x.Node, x => x.Count, cancellationToken);
}
/// <inheritdoc />
public async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
=> await _context.SaveChangesAsync(cancellationToken);
@@ -324,6 +324,61 @@ ORDER BY CreatedAtUtc DESC, TrackedOperationId DESC;";
StuckCount: stuck.GetValueOrDefault(site))).ToList();
}
/// <inheritdoc />
public async Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default)
{
var now = DateTime.UtcNow;
// Exclude rows with NULL SourceNode — per-node KPIs are only meaningful
// when the node identity is known. Each predicate guards n.SourceNode != null
// so the GROUP BY key is always non-null.
var buffered = await CountByNodeAsync(
s => s.TerminalAtUtc == null && s.SourceNode != null, ct);
var parked = await CountByNodeAsync(
s => s.Status == StatusParked && s.SourceNode != null, ct);
var failed = await CountByNodeAsync(
s => s.Status == StatusFailed
&& s.TerminalAtUtc != null && s.TerminalAtUtc >= intervalSince
&& s.SourceNode != null, ct);
var delivered = await CountByNodeAsync(
s => s.Status == StatusDelivered
&& s.TerminalAtUtc != null && s.TerminalAtUtc >= intervalSince
&& s.SourceNode != null, ct);
var stuck = await CountByNodeAsync(
s => s.TerminalAtUtc == null && s.CreatedAtUtc < stuckCutoff
&& s.SourceNode != null, ct);
// Oldest non-terminal CreatedAtUtc per node — server-side GROUP BY MIN.
var oldest = (await _context.SiteCalls
.Where(s => s.TerminalAtUtc == null && s.SourceNode != null)
.GroupBy(s => s.SourceNode!)
.Select(g => new { Node = g.Key, Oldest = g.Min(s => s.CreatedAtUtc) })
.ToListAsync(ct))
.ToDictionary(x => x.Node, x => x.Oldest);
var nodeNames = buffered.Keys
.Concat(parked.Keys).Concat(failed.Keys)
.Concat(delivered.Keys).Concat(stuck.Keys)
.Distinct()
.OrderBy(n => n, StringComparer.Ordinal);
return nodeNames.Select(node => new SiteCallNodeKpiSnapshot(
SourceNode: node,
BufferedCount: buffered.GetValueOrDefault(node),
ParkedCount: parked.GetValueOrDefault(node),
FailedLastInterval: failed.GetValueOrDefault(node),
DeliveredLastInterval: delivered.GetValueOrDefault(node),
OldestPendingAge: oldest.TryGetValue(node, out var createdAt)
? now - createdAt
: null,
StuckCount: stuck.GetValueOrDefault(node))).ToList();
}
/// <summary>Counts <c>SiteCalls</c> rows matching <paramref name="predicate"/>, grouped by source site.</summary>
private async Task<Dictionary<string, int>> CountBySiteAsync(
System.Linq.Expressions.Expression<Func<SiteCall, bool>> predicate,
@@ -336,6 +391,22 @@ ORDER BY CreatedAtUtc DESC, TrackedOperationId DESC;";
.ToDictionaryAsync(x => x.Site, x => x.Count, ct);
}
/// <summary>
/// Counts <c>SiteCalls</c> rows matching <paramref name="predicate"/>, grouped by source node.
/// Only rows with a non-null <c>SourceNode</c> should be included; the predicate is
/// responsible for enforcing that guard.
/// </summary>
private async Task<Dictionary<string, int>> CountByNodeAsync(
System.Linq.Expressions.Expression<Func<SiteCall, bool>> predicate,
CancellationToken ct)
{
return await _context.SiteCalls
.Where(predicate)
.GroupBy(s => s.SourceNode!)
.Select(g => new { Node = g.Key, Count = g.Count() })
.ToDictionaryAsync(x => x.Node, x => x.Count, ct);
}
private static int GetRankOrThrow(string status)
{
if (!StatusRank.TryGetValue(status, out var rank))
@@ -35,4 +35,9 @@ public sealed class CommunicationServiceInstanceRouter : IInstanceRouter
public Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken) =>
_communicationService.RouteToSetAttributesAsync(siteId, request, cancellationToken);
/// <inheritdoc />
public Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken) =>
_communicationService.RouteToWaitForAttributeAsync(siteId, request, cancellationToken);
}
@@ -34,4 +34,12 @@ public interface IInstanceRouter
/// <returns>A task that resolves to the set-attributes response from the target site.</returns>
Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken);
/// <summary>Routes a wait-for-attribute request to the specified site (spec §6).</summary>
/// <param name="siteId">Target site identifier.</param>
/// <param name="request">The wait-for-attribute request to route (value-equality only).</param>
/// <param name="cancellationToken">Cancellation token for the routed call.</param>
/// <returns>A task that resolves to the wait-for-attribute response from the target site.</returns>
Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);
}
@@ -6,6 +6,7 @@ using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.ScadaBridge.AuditLog.Central;
using ZB.MOM.WW.ScadaBridge.AuditLog.Configuration;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
@@ -95,6 +96,7 @@ public sealed class AuditWriteMiddleware
private readonly ILogger<AuditWriteMiddleware> _logger;
private readonly IOptionsMonitor<AuditLogOptions> _options;
private readonly IAuditActorAccessor? _actorAccessor;
private readonly IAuditInboundCeilingHitsCounter _ceilingHitsCounter;
/// <summary>
/// Initializes the middleware with its required dependencies.
@@ -110,18 +112,26 @@ public sealed class AuditWriteMiddleware
/// construct the middleware; when absent, actor resolution falls back to the
/// stashed API-key name only.
/// </param>
/// <param name="ceilingHitsCounter">
/// M5.3 (T7, optional): incremented whenever an inbound request or response
/// body is truncated at <see cref="AuditLogOptions.InboundMaxBytes"/>. Optional
/// so existing tests and composition roots without the central health snapshot
/// wired still construct without the counter; a NoOp is used when absent.
/// </param>
public AuditWriteMiddleware(
RequestDelegate next,
ICentralAuditWriter auditWriter,
ILogger<AuditWriteMiddleware> logger,
IOptionsMonitor<AuditLogOptions> options,
IAuditActorAccessor? actorAccessor = null)
IAuditActorAccessor? actorAccessor = null,
IAuditInboundCeilingHitsCounter? ceilingHitsCounter = null)
{
_next = next ?? throw new ArgumentNullException(nameof(next));
_auditWriter = auditWriter ?? throw new ArgumentNullException(nameof(auditWriter));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
_options = options ?? throw new ArgumentNullException(nameof(options));
_actorAccessor = actorAccessor;
_ceilingHitsCounter = ceilingHitsCounter ?? new NoOpAuditInboundCeilingHitsCounter();
}
/// <summary>
@@ -133,9 +143,11 @@ public sealed class AuditWriteMiddleware
{
var sw = Stopwatch.StartNew();
// Per-request hot read of the inbound cap so a live config change
// Per-request hot read of the options snapshot so a live config change
// picks up on the next request without re-resolving the singleton.
var cap = _options.CurrentValue.InboundMaxBytes;
// InboundMaxBytes is read once here and passed to the capture helpers.
var opts = _options.CurrentValue;
var cap = opts.InboundMaxBytes;
// Audit Log #23 (ParentExecutionId): mint the inbound request's per-request
// ExecutionId ONCE, here at the start of the request, and stash it on
@@ -163,9 +175,20 @@ public sealed class AuditWriteMiddleware
// ReadBufferedRequestBodyAsync's own ContentLength is 0 short-circuit
// returns (null, false) for the bodyless case anyway, so the audit row
// is unchanged.
//
// M5.3 (T7): check if the matched method/target has SkipBodyCapture set.
// The route value is resolved BEFORE the pipeline runs (route matching
// has already bound {methodName} at this point), so we can skip the
// EnableBuffering allocation and body read up front.
var methodNameForOverride = ctx.Request.RouteValues.TryGetValue("methodName", out var rv)
&& rv is string mn && !string.IsNullOrWhiteSpace(mn) ? mn : null;
var skipBody = methodNameForOverride != null
&& opts.PerTargetOverrides.TryGetValue(methodNameForOverride, out var perTarget)
&& perTarget.SkipBodyCapture;
var requestBody = (string?)null;
var requestTruncated = false;
if (RequestHasBody(ctx.Request))
if (!skipBody && RequestHasBody(ctx.Request))
{
ctx.Request.EnableBuffering();
(requestBody, requestTruncated) =
@@ -200,15 +223,25 @@ public sealed class AuditWriteMiddleware
// The forwarding wrapper has already written every byte to the
// original sink; this just pulls back the bounded UTF-8 string.
ctx.Response.Body = originalResponseBody;
var (responseBody, responseTruncated) = captureStream.GetCapturedBody();
var (capturedResponseBody, capturedResponseTruncated) = captureStream.GetCapturedBody();
// M5.3 (T7): if SkipBodyCapture is set, discard the captured response
// body (the request body was never captured above). The row + headers
// still emit with null RequestSummary / ResponseSummary.
// Truncation flags are also cleared so ceiling-hit counter is not
// bumped for methods that deliberately opt out of body capture.
var responseBody = skipBody ? null : capturedResponseBody;
var responseTruncated = skipBody ? false : capturedResponseTruncated;
EmitInboundAudit(
ctx,
opts,
sw.ElapsedMilliseconds,
thrown,
requestBody,
responseBody,
requestTruncated || responseTruncated);
requestTruncated || responseTruncated,
requestTruncated,
responseTruncated);
}
}
@@ -219,11 +252,14 @@ public sealed class AuditWriteMiddleware
/// </summary>
private void EmitInboundAudit(
HttpContext ctx,
AuditLogOptions opts,
long durationMs,
Exception? thrown,
string? requestBody,
string? responseBody,
bool payloadTruncated)
bool payloadTruncated,
bool requestTruncated = false,
bool responseTruncated = false)
{
try
{
@@ -243,10 +279,43 @@ public sealed class AuditWriteMiddleware
var actor = isAuthFailure ? null : ResolveActor(ctx);
var methodName = ResolveMethodName(ctx);
// M5.3 (T7): increment the ceiling-hits counter once per request
// that hit the cap on EITHER the request or response body.
if (requestTruncated || responseTruncated)
{
try { _ceilingHitsCounter.Increment(); } catch { /* swallow per §7 */ }
}
// M5.3 (T7): capture request headers into Extra JSON alongside the
// existing remoteIp / userAgent provenance fields. The header
// collection is run through the SAME header-redaction list
// (AuditLogOptions.HeaderRedactList) that the ScadaBridgeAuditRedactor
// applies to RequestSummary / ResponseSummary — auth/sensitive
// headers are redacted before they land in the row. Uses the SAME
// options snapshot captured at request start (passed in as opts) as
// the SkipBodyCapture / PerTargetOverrides decisions, so a mid-request
// live-reload can't split the body-capture and header-redaction
// verdicts across two different snapshots.
var redactSet = new HashSet<string>(
opts.HeaderRedactList,
StringComparer.OrdinalIgnoreCase);
var headerDict = new Dictionary<string, string>(StringComparer.Ordinal);
foreach (var header in ctx.Request.Headers)
{
// Redact headers whose name appears in the HeaderRedactList —
// the same "<redacted>" marker used by ScadaBridgeAuditRedactor.
var value = redactSet.Contains(header.Key)
? "<redacted>"
: header.Value.ToString();
headerDict[header.Key] = value;
}
var extra = JsonSerializer.Serialize(new
{
remoteIp = ctx.Connection.RemoteIpAddress?.ToString(),
userAgent = ctx.Request.Headers.UserAgent.ToString(),
requestHeaders = headerDict,
});
var evt = ScadaBridgeAuditEventFactory.Create(
@@ -205,6 +205,47 @@ public class RouteTarget
return response.Values;
}
/// <summary>
/// Blocks until a remote instance attribute reaches <paramref name="targetValue"/>
/// or <paramref name="timeout"/> elapses (spec §6). Value-equality ONLY across the
/// wire: the target is canonically encoded via <see cref="AttributeValueCodec"/> and
/// the site evaluates equality — there is no predicate and no quality flag in the
/// comparison.
/// </summary>
/// <param name="attributeName">Name of the attribute to wait on.</param>
/// <param name="targetValue">Target value the attribute must equal for the wait to match.</param>
/// <param name="timeout">Maximum time to wait for the attribute to reach the target value.</param>
/// <param name="cancellationToken">Optional cancellation token; defaults to the method deadline.</param>
/// <returns>A task that resolves to <c>true</c> if the attribute reached the target value, <c>false</c> if the wait timed out.</returns>
public async Task<bool> WaitForAttribute(
string attributeName,
object? targetValue,
TimeSpan timeout,
CancellationToken cancellationToken = default)
{
var token = Effective(cancellationToken);
var siteId = await ResolveSiteAsync(token);
// Audit Log #23 (ParentExecutionId): mirrors the Call path — stamp the
// spawning inbound request's ExecutionId so future site-side audit
// emission for routed waits can record this wait's parent. CorrelationId
// is the per-operation lifecycle id, freshly minted per routed wait.
var request = new RouteToWaitForAttributeRequest(
Guid.NewGuid().ToString(), _instanceCode, attributeName,
AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow,
_parentExecutionId);
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
if (!response.Success)
{
throw new InvalidOperationException(
response.ErrorMessage ?? "Remote attribute wait failed");
}
return response.Matched;
}
/// <summary>
/// Sets a single attribute value on the remote instance.
/// </summary>
@@ -18,13 +18,17 @@ namespace ZB.MOM.WW.ScadaBridge.ManagementService;
/// <summary>
/// Minimal-API endpoints exposing the central Audit Log (#23) over HTTP for the
/// ScadaBridge CLI (M8). Two routes:
/// ScadaBridge CLI (M8). Three routes:
/// <list type="bullet">
/// <item><c>GET /api/audit/query</c> — keyset-paged JSON page, gated on the
/// <see cref="AuthorizationPolicies.OperationalAudit"/> permission.</item>
/// <item><c>GET /api/audit/export</c> — streamed bulk export (csv / jsonl;
/// parquet returns HTTP 501), gated on the
/// <see cref="AuthorizationPolicies.AuditExport"/> permission.</item>
/// <item><c>GET /api/audit/tree</c> — execution-chain tree rooted at the
/// topmost ancestor of a given <c>executionId</c>, returned as a JSON array
/// of <see cref="ExecutionTreeNode"/>; gated on
/// <see cref="AuthorizationPolicies.OperationalAudit"/>.</item>
/// </list>
///
/// <para>
@@ -85,8 +89,16 @@ public static class AuditEndpoints
Converters = { new JsonStringEnumConverter() },
};
/// <summary>Default sentinel written by the backfill endpoint when the caller omits <c>sentinel</c>.</summary>
public const string DefaultBackfillSentinel = "unknown";
/// <summary>Default batch size for the backfill endpoint when the caller omits <c>batchSize</c>.</summary>
public const int DefaultBackfillBatchSize = 5000;
/// <summary>
/// Registers the <c>/api/audit/query</c> and <c>/api/audit/export</c> minimal-API endpoints.
/// Registers the <c>/api/audit/query</c>, <c>/api/audit/export</c>,
/// <c>/api/audit/tree</c>, and <c>POST /api/audit/backfill-source-node</c>
/// minimal-API endpoints.
/// </summary>
/// <param name="endpoints">The endpoint route builder to register routes on.</param>
/// <returns>The same <paramref name="endpoints"/> builder, for chaining.</returns>
@@ -94,6 +106,8 @@ public static class AuditEndpoints
{
endpoints.MapGet("/api/audit/query", (Delegate)HandleQuery);
endpoints.MapGet("/api/audit/export", (Delegate)HandleExport);
endpoints.MapGet("/api/audit/tree", (Delegate)HandleTree);
endpoints.MapPost("/api/audit/backfill-source-node", (Delegate)HandleBackfillSourceNode);
return endpoints;
}
@@ -232,6 +246,177 @@ public static class AuditEndpoints
return Results.Empty;
}
// ─────────────────────────────────────────────────────────────────────
// GET /api/audit/tree
// ─────────────────────────────────────────────────────────────────────
/// <summary>
/// Handles <c>GET /api/audit/tree?executionId=...</c>: authenticates, checks the
/// OperationalAudit permission, and returns the full execution-chain tree rooted at
/// the topmost ancestor of the supplied <c>executionId</c>. The response is a JSON
/// array of <see cref="ExecutionTreeNode"/> objects (empty array when the id is
/// not found). Returns HTTP 400 when <c>executionId</c> is absent or not a valid
/// GUID.
/// </summary>
/// <param name="context">The HTTP context for the current request.</param>
/// <returns>A task that resolves to the HTTP result (200 JSON array, 400, 401, or 403).</returns>
internal static async Task<IResult> HandleTree(HttpContext context)
{
var auth = await AuthenticateAsync(context);
if (auth.Failure is not null)
{
return auth.Failure;
}
if (!HasAnyRole(auth.User!, AuthorizationPolicies.OperationalAuditRoles))
{
return Forbidden("OperationalAudit");
}
var raw = context.Request.Query["executionId"].ToString();
if (string.IsNullOrWhiteSpace(raw) || !Guid.TryParse(raw, out var executionId))
{
return Results.Json(
new { error = "Missing or invalid 'executionId' query parameter (expected a GUID).", code = "BAD_REQUEST" },
statusCode: 400);
}
var repo = context.RequestServices.GetRequiredService<IAuditLogRepository>();
var nodes = await repo.GetExecutionTreeAsync(executionId, context.RequestAborted);
return Results.Json(nodes, JsonOptions);
}
// ─────────────────────────────────────────────────────────────────────
// POST /api/audit/backfill-source-node
// ─────────────────────────────────────────────────────────────────────
/// <summary>
/// Handles <c>POST /api/audit/backfill-source-node</c>: authenticates (Admin role
/// required), reads the JSON body for <c>sentinel</c> / <c>before</c> /
/// <c>batchSize</c>, and calls
/// <see cref="IAuditLogRepository.BackfillSourceNodeAsync"/> on the maintenance
/// path.
///
/// <para>
/// <b>Auth.</b> Admin-only — backfilling the SourceNode column is a one-time ops
/// procedure that mutates the AuditLog table via the maintenance path (NOT the
/// append-only writer role). Restricted to <see cref="AuthorizationPolicies.AuditExportRoles"/>
/// (Administrator) so it is never accessible to Viewer-role users.
/// </para>
///
/// <para>
/// <b>Request body.</b>
/// <code>
/// {
/// "sentinel": "unknown", // optional; default "unknown"
/// "before": "2026-01-01T00:00:00Z", // required ISO-8601 UTC
/// "batchSize": 5000 // optional; default 5000
/// }
/// </code>
/// </para>
///
/// <para>
/// <b>Response (200).</b>
/// <code>{ "rowsUpdated": 12345, "sentinel": "unknown", "before": "2026-01-01T00:00:00Z" }</code>
/// </para>
/// </summary>
/// <param name="context">The HTTP context for the current request.</param>
/// <returns>A task that resolves to the HTTP result (200 JSON, 400, 401, or 403).</returns>
internal static async Task<IResult> HandleBackfillSourceNode(HttpContext context)
{
var auth = await AuthenticateAsync(context);
if (auth.Failure is not null)
{
return auth.Failure;
}
// Admin-only: backfilling is a one-time ops procedure on the maintenance path.
if (!HasAnyRole(auth.User!, AuthorizationPolicies.AuditExportRoles))
{
return Forbidden("Administrator");
}
string bodyText;
try
{
using var reader = new System.IO.StreamReader(context.Request.Body);
bodyText = await reader.ReadToEndAsync(context.RequestAborted);
}
catch (OperationCanceledException)
{
return Results.Json(new { error = "Request cancelled.", code = "CANCELLED" }, statusCode: 499);
}
string sentinel = DefaultBackfillSentinel;
DateTime? beforeUtc = null;
int batchSize = DefaultBackfillBatchSize;
if (!string.IsNullOrWhiteSpace(bodyText))
{
try
{
using var doc = System.Text.Json.JsonDocument.Parse(bodyText);
var root = doc.RootElement;
if (root.TryGetProperty("sentinel", out var sentinelEl))
{
var s = sentinelEl.GetString();
if (!string.IsNullOrWhiteSpace(s))
{
sentinel = s.Trim();
}
}
if (root.TryGetProperty("before", out var beforeEl))
{
if (DateTime.TryParse(
beforeEl.GetString(),
System.Globalization.CultureInfo.InvariantCulture,
System.Globalization.DateTimeStyles.AssumeUniversal | System.Globalization.DateTimeStyles.AdjustToUniversal,
out var parsed))
{
beforeUtc = DateTime.SpecifyKind(parsed, DateTimeKind.Utc);
}
else
{
return Results.Json(
new { error = "Invalid 'before' value; expected ISO-8601 UTC datetime.", code = "BAD_REQUEST" },
statusCode: 400);
}
}
if (root.TryGetProperty("batchSize", out var batchEl) && batchEl.TryGetInt32(out var b) && b > 0)
{
batchSize = b;
}
}
catch (System.Text.Json.JsonException)
{
return Results.Json(
new { error = "Request body must be valid JSON.", code = "BAD_REQUEST" },
statusCode: 400);
}
}
if (beforeUtc is null)
{
return Results.Json(
new { error = "Required field 'before' (ISO-8601 UTC datetime) is missing.", code = "BAD_REQUEST" },
statusCode: 400);
}
var repo = context.RequestServices.GetRequiredService<IAuditLogRepository>();
var rowsUpdated = await repo.BackfillSourceNodeAsync(sentinel, beforeUtc.Value, batchSize, context.RequestAborted);
return Results.Json(new
{
rowsUpdated,
sentinel,
before = beforeUtc.Value.ToString("O", System.Globalization.CultureInfo.InvariantCulture),
}, JsonOptions);
}
/// <summary>
/// Streams every matching row as RFC 4180 CSV, paging the repository with its
/// keyset cursor and flushing after each page so a large export starts
@@ -122,6 +122,7 @@ public class NotificationOutboxActor : ReceiveActor, IWithTimers
Receive<DiscardNotificationRequest>(HandleDiscard);
Receive<NotificationKpiRequest>(HandleKpiRequest);
Receive<PerSiteNotificationKpiRequest>(HandlePerSiteKpiRequest);
Receive<PerNodeNotificationKpiRequest>(HandlePerNodeKpiRequest);
}
/// <inheritdoc />
@@ -1081,6 +1082,38 @@ public class NotificationOutboxActor : ReceiveActor, IWithTimers
return new PerSiteNotificationKpiResponse(correlationId, Success: true, ErrorMessage: null, sites);
}
/// <summary>
/// Handles a per-node KPI request, computing the per-source-node outbox metrics with the
/// same stuck cutoff and delivered window as <see cref="HandleKpiRequest"/>. Additive
/// alongside <see cref="HandlePerSiteKpiRequest"/> — does not change per-site behaviour.
/// </summary>
private void HandlePerNodeKpiRequest(PerNodeNotificationKpiRequest request)
{
var sender = Sender;
var now = DateTimeOffset.UtcNow;
var stuckCutoff = StuckCutoff(now);
var deliveredSince = now - _options.DeliveredKpiWindow;
ComputePerNodeKpisAsync(request.CorrelationId, stuckCutoff, deliveredSince).PipeTo(
sender,
success: response => response,
failure: ex => new PerNodeNotificationKpiResponse(
request.CorrelationId,
Success: false,
ErrorMessage: ex.GetBaseException().Message,
Nodes: Array.Empty<NodeNotificationKpiSnapshot>()));
}
private async Task<PerNodeNotificationKpiResponse> ComputePerNodeKpisAsync(
string correlationId, DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince)
{
using var scope = _serviceProvider.CreateScope();
var repository = scope.ServiceProvider.GetRequiredService<INotificationOutboxRepository>();
var nodes = await repository.ComputePerNodeKpisAsync(stuckCutoff, deliveredSince);
return new PerNodeNotificationKpiResponse(correlationId, Success: true, ErrorMessage: null, nodes);
}
/// <summary>
/// The instant before which a still-pending notification counts as stuck — <paramref name="now"/>
/// offset back by <see cref="NotificationOutboxOptions.StuckAgeThreshold"/>.
@@ -239,6 +239,7 @@ public class SiteCallAuditActor : ReceiveActor
Receive<SiteCallDetailRequest>(HandleDetail);
Receive<SiteCallKpiRequest>(HandleKpi);
Receive<PerSiteSiteCallKpiRequest>(HandlePerSiteKpi);
Receive<PerNodeSiteCallKpiRequest>(HandlePerNodeKpi);
// Task 5 (#22): central→site Retry/Discard relay for parked cached calls.
Receive<RegisterCentralCommunication>(msg =>
@@ -817,6 +818,47 @@ public class SiteCallAuditActor : ReceiveActor
}
}
/// <summary>
/// Handles a per-node KPI request, using the same stuck cutoff and
/// interval bound as <see cref="HandleKpi"/>. Additive alongside
/// <see cref="HandlePerSiteKpi"/> — does not change per-site behaviour.
/// </summary>
private void HandlePerNodeKpi(PerNodeSiteCallKpiRequest request)
{
var sender = Sender;
var now = DateTime.UtcNow;
var stuckCutoff = now - _options.StuckAgeThreshold;
var intervalSince = now - _options.KpiInterval;
PerNodeKpiAsync(request.CorrelationId, stuckCutoff, intervalSince).PipeTo(
sender,
success: response => response,
failure: ex => new PerNodeSiteCallKpiResponse(
request.CorrelationId,
Success: false,
ErrorMessage: ex.GetBaseException().Message,
Nodes: Array.Empty<SiteCallNodeKpiSnapshot>()));
}
private async Task<PerNodeSiteCallKpiResponse> PerNodeKpiAsync(
string correlationId, DateTime stuckCutoff, DateTime intervalSince)
{
var (scope, repository) = ResolveRepository();
try
{
var nodes = await repository
.ComputePerNodeKpisAsync(stuckCutoff, intervalSince)
.ConfigureAwait(false);
return new PerNodeSiteCallKpiResponse(
correlationId, Success: true, ErrorMessage: null, nodes);
}
finally
{
scope?.Dispose();
}
}
// ── Task 5: central→site Retry/Discard relay ──
/// <summary>
@@ -571,7 +571,20 @@ public class AlarmActor : ReceiveActor
/// Passes the firing alarm's level/priority/message so the script can
/// branch on severity via the <c>Alarm</c> global.
/// </summary>
private void SpawnAlarmExecution(AlarmLevel level, int priority, string message)
/// <param name="level">The firing alarm severity level.</param>
/// <param name="priority">The firing alarm priority.</param>
/// <param name="message">The firing alarm message.</param>
/// <param name="parentExecutionId">
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the execution id of
/// the context that fired this alarm, recorded as the on-trigger script run's
/// <c>ParentExecutionId</c> so the alarm-triggered run chains under its firing
/// context in the audit tree. The alarm subsystem currently has no Guid-typed
/// firing id, so the only call sites pass <c>null</c> (the on-trigger run is a
/// root). The parameter exists so a future firing-id can flow without
/// touching the actor wiring.
/// </param>
private void SpawnAlarmExecution(
AlarmLevel level, int priority, string message, Guid? parentExecutionId = null)
{
if (_onTriggerCompiledScript == null) return;
@@ -591,7 +604,9 @@ public class AlarmActor : ReceiveActor
_options,
_logger,
// M2.5 (#9): per-script timeout from the on-trigger script (null = global).
_onTriggerExecutionTimeoutSeconds));
_onTriggerExecutionTimeoutSeconds,
// Audit Log #23 (M5.4): the firing context's execution id (null today).
parentExecutionId));
Context.ActorOf(props, executionId);
}
@@ -29,6 +29,14 @@ public class AlarmExecutionActor : ReceiveActor
/// <param name="options">Site runtime configuration options, including the execution timeout.</param>
/// <param name="logger">Logger for execution diagnostics.</param>
/// <param name="executionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
/// <param name="parentExecutionId">
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the execution id of
/// the context that fired this alarm, threaded into the on-trigger script's
/// <see cref="ScriptRuntimeContext"/> as its <c>ParentExecutionId</c> so the
/// alarm-triggered run chains under its firing context. Null today (no
/// Guid-typed firing id exists yet) — the run is a root, but the plumbing
/// is in place for a future firing id.
/// </param>
public AlarmExecutionActor(
string alarmName,
string instanceName,
@@ -42,7 +50,9 @@ public class AlarmExecutionActor : ReceiveActor
ILogger logger,
// M2.5 (#9): per-script execution timeout override (seconds) for the
// alarm on-trigger script. Null or non-positive falls back to the global.
int? executionTimeoutSeconds = null)
int? executionTimeoutSeconds = null,
// Audit Log #23 (M5.4): the firing context's execution id (null today).
Guid? parentExecutionId = null)
{
var self = Self;
var parent = Context.Parent;
@@ -51,7 +61,7 @@ public class AlarmExecutionActor : ReceiveActor
alarmName, instanceName, level, priority, message,
compiledScript, instanceActor,
sharedScriptLibrary, options, self, parent, logger,
executionTimeoutSeconds);
executionTimeoutSeconds, parentExecutionId);
}
private static void ExecuteAlarmScript(
@@ -67,7 +77,8 @@ public class AlarmExecutionActor : ReceiveActor
IActorRef self,
IActorRef parent,
ILogger logger,
int? executionTimeoutSeconds)
int? executionTimeoutSeconds,
Guid? parentExecutionId)
{
// M2.5 (#9): per-script timeout overrides the global default. A null or
// non-positive per-script value (≤ 0) falls back to the global.
@@ -95,7 +106,19 @@ public class AlarmExecutionActor : ReceiveActor
options.MaxScriptCallDepth,
timeout,
instanceName,
logger);
logger,
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the
// alarm on-trigger run mints its own fresh ExecutionId (the
// ctor's `?? NewGuid()` fallback) and records the firing
// context's id as its ParentExecutionId — null today, so the
// run is a root, but the plumbing exists for a future
// firing id.
parentExecutionId: parentExecutionId,
// WaitForAttribute (spec §4.4): thread the alarm on-trigger
// script's per-script execution-timeout token so a
// Attributes.WaitAsync inside an on-trigger script is bounded
// by the same script deadline.
scriptTimeoutToken: cts.Token);
var globals = new ScriptGlobals
{
@@ -149,6 +149,7 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
Receive<RouteToCallRequest>(RouteInboundApiCall);
Receive<RouteToGetAttributesRequest>(RouteInboundApiGetAttributes);
Receive<RouteToSetAttributesRequest>(RouteInboundApiSetAttributes);
Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);
// OPC UA Tag Browser — singleton-only re-forward to local /user/dcl-manager.
// BrowseNodeCommand is routed to this singleton (active node) by
@@ -1078,6 +1079,45 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
}).PipeTo(sender);
}
/// <summary>
/// Spec §6 (WD-2b): unpacks a routed <see cref="RouteToWaitForAttributeRequest"/>
/// (inbound-API <c>Route.To().WaitForAttribute()</c>) into the deployed
/// Instance Actor's site-local <see cref="WaitForAttributeRequest"/> and relays
/// the result back. Value-equality only across the wire — the predicate is null
/// and <c>RequireGoodQuality</c> is left at its default. The Ask is bounded by the
/// wait timeout plus slack (NOT a fixed 30s), since the wait legitimately blocks
/// for up to <see cref="RouteToWaitForAttributeRequest.Timeout"/>.
/// </summary>
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
{
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
{
Sender.Tell(new RouteToWaitForAttributeResponse(
request.CorrelationId, false, null, null, false,
false, $"Instance '{request.InstanceUniqueName}' not found on this site.",
DateTimeOffset.UtcNow));
return;
}
var sender = Sender;
// Routed waits are value-equality only (predicate null); RequireGoodQuality left at default.
var inner = new WaitForAttributeRequest(
request.CorrelationId, request.InstanceUniqueName, request.AttributeName,
request.TargetValueEncoded, null, request.Timeout, DateTimeOffset.UtcNow);
// Ask bounded by the WAIT timeout + slack — NOT a fixed 30s (the wait legitimately blocks up to request.Timeout).
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
.ContinueWith(t => t.IsCompletedSuccessfully
? new RouteToWaitForAttributeResponse(
request.CorrelationId, t.Result.Matched, t.Result.Value, t.Result.Quality, t.Result.TimedOut,
true, null, DateTimeOffset.UtcNow)
: new RouteToWaitForAttributeResponse(
request.CorrelationId, false, null, null, false,
false, t.Exception?.GetBaseException().Message ?? "Attribute wait timed out",
DateTimeOffset.UtcNow))
.PipeTo(sender);
}
/// <summary>
/// Writes attribute values on a deployed instance for a Route.To().SetAttribute(s)
/// call (or a central Test Run bound to the instance). Each write is Ask'd to the
@@ -68,6 +68,18 @@ public class InstanceActor : ReceiveActor
// mirroring the rest of the actor's by-name dictionaries).
private readonly Dictionary<string, ResolvedAttribute> _resolvedAttributeByName = new();
// WaitForAttribute (spec §4.2): one-shot waiter registry keyed by the
// request CorrelationId. Each entry holds the watched attribute name, the
// match test (decoded target equality OR a site-local predicate), the
// original Sender to reply to, and the scheduled-timeout handle so a match
// can cancel it. Single-threaded actor access — no locking needed.
private readonly Dictionary<string, PendingWait> _attributeWaiters = new();
// WaitForAttribute: defensive per-instance cap so a script leaking waiters
// in a loop cannot grow the registry without bound. Exceeding it refuses the
// wait with an error reply rather than registering.
private const int MaxAttributeWaiters = 100;
// DCL manager actor reference for subscribing to tag values
private readonly IActorRef? _dclManager;
// Maps each tag path to every attribute canonical name that references it.
@@ -170,6 +182,12 @@ public class InstanceActor : ReceiveActor
// WP-22/23: Handle attribute value changes from DCL (Tell pattern)
Receive<AttributeValueChanged>(HandleAttributeValueChanged);
// WaitForAttribute (spec §4.2): event-driven "wait for value" waiter
// registration + its scheduled-timeout self-message. Both flow only
// site-locally (the predicate variant carries a non-serializable delegate).
Receive<WaitForAttributeRequest>(HandleWaitForAttribute);
Receive<WaitForAttributeTimeout>(HandleWaitForAttributeTimeout);
// Handle tag value updates from DCL — convert to AttributeValueChanged
Receive<TagValueUpdate>(HandleTagValueUpdate);
Receive<SubscribeTagsResponse>(_ => { }); // Ack from DCL subscribe — no action needed
@@ -519,6 +537,114 @@ public class InstanceActor : ReceiveActor
PublishAndNotifyChildren(changed);
}
/// <summary>
/// WaitForAttribute (spec §4.2): registers a one-shot event-driven waiter for
/// an attribute to reach a value (encoded-equality), satisfy a site-local
/// predicate, or change at all. The current-value fast-path and the
/// change-handling in <see cref="HandleAttributeValueChanged"/> both run on
/// this single-threaded actor, so a value that flips between "read current"
/// and "register" cannot be missed (spec §5).
/// </summary>
private void HandleWaitForAttribute(WaitForAttributeRequest req)
{
// Capture the sender immediately — Sender is invalid once we schedule /
// return and a later message arrives.
var replyer = Sender;
// Build the match test: explicit predicate wins; else null encoded target
// means "any change"; else compare the codec-encoded current value to the
// encoded target (avoids needing the attribute's DataType to decode).
Func<object?, bool> test;
if (req.Predicate is not null)
{
test = req.Predicate;
}
else if (req.TargetValueEncoded is null)
{
test = _ => true;
}
else
{
var target = req.TargetValueEncoded;
test = v => string.Equals(
AttributeValueCodec.Encode(v), target, StringComparison.Ordinal);
}
// Fast path: the current value already satisfies the test → reply now.
// A script-supplied predicate (or the codec-equality lambda) runs on the
// actor thread; guard it so a throwing predicate cannot crash the actor or
// leak a never-resolved waiter. On throw: reply non-matched + ErrorMessage
// and return WITHOUT registering (no timeout scheduled).
if (_attributes.TryGetValue(req.AttributeName, out var current))
{
// Effective quality used for BOTH the §4.2 quality gate and the match
// reply — the same `?? "Good"` default the reply has always used.
_attributeQualities.TryGetValue(req.AttributeName, out var fastQuality);
var effectiveQuality = fastQuality ?? "Good";
bool fastMatch;
try
{
// §4.2 quality gate ANDed with the value test, both INSIDE the guard:
// in quality-gated mode a value already at target but at Bad/Uncertain
// quality is NOT a fast match — it falls through to register + schedule
// the timeout like any other pending waiter (do NOT fast-reply matched).
fastMatch =
(!req.RequireGoodQuality
|| string.Equals(effectiveQuality, "Good", StringComparison.Ordinal))
&& test(current);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"WaitForAttribute predicate threw on the fast-path for {Instance}.{Attribute}; refusing the wait",
_instanceUniqueName, req.AttributeName);
replyer.Tell(new WaitForAttributeResponse(
req.CorrelationId, Matched: false, null, null, TimedOut: false,
ErrorMessage: "Wait predicate threw: " + ex.Message));
return;
}
if (fastMatch)
{
replyer.Tell(new WaitForAttributeResponse(
req.CorrelationId, Matched: true, current, effectiveQuality, TimedOut: false));
return;
}
}
// Defensive cap: refuse rather than register if the instance already has
// too many concurrent waiters (guards against a script leaking waiters).
if (_attributeWaiters.Count >= MaxAttributeWaiters)
{
replyer.Tell(new WaitForAttributeResponse(
req.CorrelationId, Matched: false, null, null, TimedOut: false,
ErrorMessage: "Too many concurrent attribute waiters on this instance"));
return;
}
// Register and schedule the self-evicting timeout (NativeAlarmActor idiom).
var handle = Context.System.Scheduler.ScheduleTellOnceCancelable(
req.Timeout, Self, new WaitForAttributeTimeout(req.CorrelationId), Self);
_attributeWaiters[req.CorrelationId] =
new PendingWait(req.AttributeName, test, replyer, handle, req.RequireGoodQuality);
}
/// <summary>
/// WaitForAttribute (spec §4.2): the scheduled timeout fired for a waiter that
/// never matched. If still registered (a match would have removed + canceled
/// it), reply TimedOut and evict it.
/// </summary>
private void HandleWaitForAttributeTimeout(WaitForAttributeTimeout msg)
{
if (_attributeWaiters.Remove(msg.CorrelationId, out var pending))
{
pending.Replyer.Tell(new WaitForAttributeResponse(
msg.CorrelationId, Matched: false, null, null, TimedOut: true));
}
}
/// <summary>
/// Handles tag value updates from DCL. Maps the tag path back to the attribute
/// canonical name and converts to an AttributeValueChanged for unified processing.
@@ -556,9 +682,14 @@ public class InstanceActor : ReceiveActor
_attributeQualities[attrName] = "Bad";
_attributeTimestamps[attrName] = update.Timestamp;
var currentValue = _attributes.GetValueOrDefault(attrName);
// WaitForAttribute (spec §4.2): quality-only republish — the
// stored value is UNCHANGED (we publish the OLD currentValue, only
// the quality flips to Bad). Do NOT evaluate waiters, or an
// "any-change" / unchanged-value-equality waiter would fire on a
// non-change.
PublishAndNotifyChildren(new AttributeValueChanged(
_instanceUniqueName, update.TagPath, attrName,
currentValue, "Bad", update.Timestamp));
currentValue, "Bad", update.Timestamp), evaluateWaiters: false);
}
continue;
}
@@ -908,7 +1039,17 @@ public class InstanceActor : ReceiveActor
/// Publishes attribute change to stream and notifies child Script/Alarm actors.
/// WP-22: Tell for attribute notifications (fire-and-forget, never blocks).
/// </summary>
private void PublishAndNotifyChildren(AttributeValueChanged changed)
/// <param name="changed">The attribute change to publish.</param>
/// <param name="evaluateWaiters">
/// WaitForAttribute (spec §4.2): when <c>true</c> (the default), registered
/// <c>Attributes.WaitAsync</c> waiters on this attribute are re-evaluated against
/// <paramref name="changed"/>'s value. Pass <c>false</c> on republish/quality-only
/// paths that do NOT assign a new value to <c>_attributes[name]</c> (e.g. the
/// List-coerce-failure Bad-quality republish, which publishes the OLD value) —
/// otherwise an "any-change" waiter (or a waiter whose target equals the unchanged
/// value) would spuriously fire even though nothing actually changed.
/// </param>
private void PublishAndNotifyChildren(AttributeValueChanged changed, bool evaluateWaiters = true)
{
// WP-23: Publish to site-wide stream
_streamManager?.PublishAttributeValueChanged(changed);
@@ -924,6 +1065,83 @@ public class InstanceActor : ReceiveActor
{
alarmActor.Tell(changed);
}
// WaitForAttribute (spec §4.2): re-evaluate any waiters on THIS attribute —
// but ONLY when this publish reflects a real value change (evaluateWaiters).
// The genuine value-change paths (HandleAttributeValueChanged, the scalar
// DCL update path, HandleSetStaticAttributeCore) call it AFTER assigning
// _attributes[name], so changed.Value is the just-applied current value.
// Republish/quality-only paths (List-coerce-failure Bad-quality, which
// publishes the OLD value) pass evaluateWaiters:false so an "any-change" or
// unchanged-value-equality waiter does not spuriously fire (spec §4.2).
// Iterate a snapshot so satisfied waiters can be removed during the loop;
// each match cancels its scheduled timeout (so no stray WaitForAttributeTimeout
// follows) and replies Matched=true.
if (evaluateWaiters)
ResolveMatchedWaiters(changed);
}
/// <summary>
/// WaitForAttribute (spec §4.2): fires every registered waiter on
/// <paramref name="changed"/>'s attribute whose test now passes against the
/// just-applied value — cancelling its timeout, replying Matched, and removing
/// it from the registry. A no-op when there are no waiters.
///
/// <para>
/// Each waiter's match test runs inside a per-waiter try/catch: a throwing
/// script-supplied predicate (or codec lambda) must NOT abort the loop and
/// strand sibling waiters on the same attribute, nor leave the throwing waiter
/// registered with a live scheduled timeout. On throw we cancel that waiter's
/// timeout, reply non-matched + ErrorMessage, remove it, and continue.
/// </para>
/// </summary>
private void ResolveMatchedWaiters(AttributeValueChanged changed)
{
if (_attributeWaiters.Count == 0)
return;
// Snapshot the candidate waiters on THIS attribute. Iterating a snapshot
// (and NOT evaluating the test inside the LINQ filter) keeps removal mid-loop
// safe and ensures one throwing test cannot abort materialization for siblings.
var candidates = _attributeWaiters
.Where(kvp => kvp.Value.AttributeName == changed.AttributeName)
.ToList();
foreach (var (cid, pending) in candidates)
{
bool matched;
try
{
// §4.2 quality gate ANDed with the value test, both INSIDE the guard:
// in quality-gated mode a value reaching the target at Bad/Uncertain
// quality is NOT a match — the waiter stays pending until it satisfies
// the test at Good quality (or times out).
matched =
(!pending.RequireGoodQuality
|| string.Equals(changed.Quality, "Good", StringComparison.Ordinal))
&& pending.Test(changed.Value);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"WaitForAttribute predicate threw while resolving waiter {CorrelationId} on {Instance}.{Attribute}; evicting it",
cid, _instanceUniqueName, changed.AttributeName);
pending.Timeout.Cancel();
pending.Replyer.Tell(new WaitForAttributeResponse(
cid, Matched: false, null, null, TimedOut: false,
ErrorMessage: "Wait predicate threw: " + ex.Message));
_attributeWaiters.Remove(cid);
continue;
}
if (!matched)
continue;
pending.Timeout.Cancel();
pending.Replyer.Tell(new WaitForAttributeResponse(
cid, Matched: true, changed.Value, changed.Quality, TimedOut: false));
_attributeWaiters.Remove(cid);
}
}
/// <summary>
@@ -1202,4 +1420,23 @@ public class InstanceActor : ReceiveActor
/// Internal message for async override loading result.
/// </summary>
internal record LoadOverridesResult(Dictionary<string, string> Overrides, string? Error);
/// <summary>
/// WaitForAttribute (spec §4.2): one registered, not-yet-satisfied waiter.
/// </summary>
/// <param name="AttributeName">The attribute this waiter watches (scope-resolved).</param>
/// <param name="Test">The match test (decoded-target equality OR site-local predicate OR any-change).</param>
/// <param name="Replyer">The original sender to reply to on match / timeout.</param>
/// <param name="Timeout">The scheduled timeout handle, canceled on match.</param>
/// <param name="RequireGoodQuality">
/// Quality-gated ("Good"-only) mode (spec §4.2): when <c>true</c>, the resolve
/// loop additionally requires <c>changed.Quality == "Good"</c> before the test
/// can match.
/// </param>
private sealed record PendingWait(
string AttributeName,
Func<object?, bool> Test,
IActorRef Replyer,
ICancelable Timeout,
bool RequireGoodQuality);
}
@@ -221,7 +221,12 @@ public class ScriptExecutionActor : ReceiveActor
// M2.12 (#25): thread the singleton site event logger so
// recursion-limit violations at CallScript/CallShared emit a
// script Error site event in addition to ILogger.LogError.
siteEventLogger: siteEventLogger);
siteEventLogger: siteEventLogger,
// WaitForAttribute (spec §4.3/§4.4): thread the per-script
// execution-timeout token so Attributes.WaitAsync's Ask is
// bounded by the script's own ExecutionTimeoutSeconds — a
// shorter script deadline wins over the wait's own timeout.
scriptTimeoutToken: cts.Token);
var globals = new ScriptGlobals
{
@@ -73,6 +73,107 @@ public class AttributeAccessor
/// <returns>A task that represents the asynchronous operation.</returns>
public Task SetAsync(string key, object? value)
=> _ctx.SetAttribute(Resolve(key), AttributeValueCodec.Encode(value) ?? string.Empty);
/// <summary>
/// WaitForAttribute (spec §3-§5): waits event-driven until the attribute equals
/// <paramref name="targetValue"/> (value-equality, codec-normalized), bounded by
/// <paramref name="timeout"/>. Returns <c>true</c> if matched within the timeout,
/// <c>false</c> on timeout (no throw). Honors the script's execution-timeout token.
/// Scope/composition path resolution (<see cref="Resolve"/>) is applied just like
/// <see cref="GetAsync"/> / <see cref="SetAsync"/>.
///
/// <para>
/// <b>Quality-agnostic by default (spec §4.2):</b> matching tests the VALUE, not
/// the quality — a value arriving at Bad quality still satisfies the wait. Pass
/// <paramref name="requireGoodQuality"/><c>:true</c> for quality-gated ("Good"-only)
/// matching: a value reaching the target at Bad/Uncertain quality is ignored and
/// the wait holds until the target is reached at "Good" quality (or times out).
/// </para>
///
/// <para>
/// Passing a <b>null</b> <paramref name="targetValue"/> means "match on any change":
/// the wait then matches the next value the attribute receives — and matches
/// IMMEDIATELY (fast-path) if the attribute already holds any value at registration.
/// </para>
/// </summary>
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
/// <param name="targetValue">
/// The value to wait for (codec-encoded for comparison); <c>null</c> means
/// "match on any change" (matches immediately if the attribute already has a value).
/// </param>
/// <param name="timeout">How long to wait before returning false.</param>
/// <param name="requireGoodQuality">
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to
/// <c>false</c> (quality-agnostic — Bad/Uncertain-quality transients still match).
/// </param>
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
/// <summary>
/// WaitForAttribute (spec §3-§5): predicate form — waits event-driven until
/// <paramref name="predicate"/> returns <c>true</c> for the attribute's current
/// value, bounded by <paramref name="timeout"/>. Site-local only (the predicate
/// is an in-process delegate). Returns <c>true</c> if matched within the timeout,
/// <c>false</c> on timeout (no throw). Scope/composition path resolution applies.
///
/// <para>
/// <b>Quality-agnostic by default (spec §4.2):</b> the predicate is tested against
/// the VALUE, regardless of quality — a value arriving at Bad quality still
/// satisfies the wait if the predicate passes. Pass <paramref name="requireGoodQuality"/>
/// <c>:true</c> for quality-gated ("Good"-only) matching: a value satisfying the
/// predicate at Bad/Uncertain quality is ignored until it does so at "Good" quality.
/// </para>
/// </summary>
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
/// <param name="predicate">The site-local predicate tested against the current value.</param>
/// <param name="timeout">How long to wait before returning false.</param>
/// <param name="requireGoodQuality">
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to
/// <c>false</c> (quality-agnostic).
/// </param>
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout, requireGoodQuality);
/// <summary>
/// WaitForAttribute (spec §3): richer value-equality form — like
/// <see cref="WaitAsync(string, object?, TimeSpan, bool)"/> but returns the full
/// <see cref="WaitResult"/> (matched flag + matched value + quality + timed-out
/// flag) instead of a bare bool. Scope/composition path resolution
/// (<see cref="Resolve"/>) is applied to <paramref name="key"/> just like the
/// other accessors. Never throws on timeout — a timeout yields
/// <c>WaitResult { Matched = false, TimedOut = true }</c>.
/// </summary>
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
/// <param name="targetValue">
/// The value to wait for (codec-encoded for comparison); <c>null</c> means
/// "match on any change".
/// </param>
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
/// <param name="requireGoodQuality">
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to <c>false</c>.
/// </param>
/// <returns>The full <see cref="WaitResult"/> for the wait.</returns>
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
/// <summary>
/// WaitForAttribute (spec §3): richer predicate form — like
/// <see cref="WaitAsync(string, Func{object?, bool}, TimeSpan, bool)"/> but returns
/// the full <see cref="WaitResult"/>. Site-local only (the predicate is an
/// in-process delegate). Scope/composition path resolution applies. Never throws
/// on timeout (<c>WaitResult { Matched = false, TimedOut = true }</c>).
/// </summary>
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
/// <param name="predicate">The site-local predicate tested against the current value.</param>
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
/// <param name="requireGoodQuality">
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to <c>false</c>.
/// </param>
/// <returns>The full <see cref="WaitResult"/> for the wait.</returns>
public Task<WaitResult> WaitForAsync(string key, Func<object?, bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
}
/// <summary>
@@ -46,6 +46,16 @@ public class ScriptRuntimeContext
private readonly ILogger _logger;
private readonly string _instanceName;
/// <summary>
/// WaitForAttribute (spec §4.3): the per-script execution-timeout token from
/// the owning <c>ScriptExecutionActor</c>/<c>AlarmExecutionActor</c>
/// (<c>cts.Token</c>). Bounds the <c>Attributes.WaitAsync</c> Ask so a script
/// that hits its own <c>ExecutionTimeoutSeconds</c> abandons the wait. Defaults
/// to <see cref="CancellationToken.None"/> for contexts that do not thread one
/// (legacy callers / tests / the alarm path when it has no CTS).
/// </summary>
private readonly CancellationToken _scriptTimeoutToken;
/// <summary>
/// WP-13: External system client for ExternalSystem.Call/CachedCall.
/// </summary>
@@ -194,6 +204,13 @@ public class ScriptRuntimeContext
/// <c>ILogger.LogError</c> + throw. When null the existing behaviour is
/// unchanged; all existing callers and tests remain source-compatible.
/// </param>
/// <param name="scriptTimeoutToken">
/// WaitForAttribute (spec §4.3): the per-script execution-timeout token
/// (<c>cts.Token</c> on the owning execution actor) used to bound
/// <c>Attributes.WaitAsync</c>. Defaults to
/// <see cref="CancellationToken.None"/> for callers / tests that do not
/// thread one — those waits are bounded only by their own timeout.
/// </param>
public ScriptRuntimeContext(
IActorRef instanceActor,
IActorRef self,
@@ -215,7 +232,8 @@ public class ScriptRuntimeContext
Guid? executionId = null,
Guid? parentExecutionId = null,
string? sourceNode = null,
ISiteEventLogger? siteEventLogger = null)
ISiteEventLogger? siteEventLogger = null,
CancellationToken scriptTimeoutToken = default)
{
_instanceActor = instanceActor;
_self = self;
@@ -245,6 +263,66 @@ public class ScriptRuntimeContext
_parentExecutionId = parentExecutionId;
// M2.12 (#25): optional — null when not wired (tests / AlarmExecutionActor).
_siteEventLogger = siteEventLogger;
// WaitForAttribute (spec §4.3): default(CancellationToken) == None when
// not threaded in — the WaitAsync Ask is then bounded only by its own timeout.
_scriptTimeoutToken = scriptTimeoutToken;
}
/// <summary>
/// Audit Log #23 (M5.4): this run's own per-execution id. Exposed so a
/// nested <c>Scripts.CallShared</c> can record it as the spawned shared
/// script's <c>ParentExecutionId</c>, forming a true execution tree.
/// </summary>
internal Guid ExecutionId => _executionId;
/// <summary>
/// Audit Log #23 (M5.4): the spawning execution's id for this run (null for
/// a root run). Exposed for test assertions on the execution tree.
/// </summary>
internal Guid? ParentExecutionId => _parentExecutionId;
/// <summary>
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): builds a child
/// <see cref="ScriptRuntimeContext"/> for an inline <c>Scripts.CallShared</c>
/// invocation. The shared script runs inline (no actor hop) but is modelled
/// as its OWN execution node in the audit tree: it mints a fresh
/// <see cref="_executionId"/> and records THIS run's <see cref="_executionId"/>
/// as its <c>ParentExecutionId</c>, so <c>B → CallShared(C)</c> yields
/// <c>C.ParentExecutionId == B.ExecutionId</c>. Every other dependency
/// (actors, gateways, audit writer, site id, source node, call-depth) is
/// carried over verbatim from this context.
/// </summary>
/// <param name="childCallDepth">The recursion depth of the shared-script call.</param>
internal ScriptRuntimeContext CreateChildContextForSharedScript(int childCallDepth)
{
return new ScriptRuntimeContext(
_instanceActor,
_self,
_sharedScriptLibrary,
childCallDepth,
_maxCallDepth,
_askTimeout,
_instanceName,
_logger,
_externalSystemClient,
_databaseGateway,
_storeAndForward,
_siteCommunicationActor,
_siteId,
_sourceScript,
_auditWriter,
_operationTrackingStore,
_cachedForwarder,
// Fresh execution id for the shared-script run (omit so the ctor mints one)…
executionId: null,
// …parented to THIS run's execution id (the spawner).
parentExecutionId: _executionId,
sourceNode: _sourceNode,
siteEventLogger: _siteEventLogger,
// WaitForAttribute (spec §4.3): an inline shared-script call shares the
// parent run's execution-timeout token so a WaitAsync inside the shared
// script is bounded by the SAME script deadline.
scriptTimeoutToken: _scriptTimeoutToken);
}
/// <summary>
@@ -307,6 +385,115 @@ public class ScriptRuntimeContext
return response.Value;
}
/// <summary>
/// WaitForAttribute (spec §3-§5): waits event-driven for an attribute to reach
/// a value (encoded-equality), satisfy a site-local predicate, or change at all,
/// bounded by <paramref name="timeout"/>. Returns <c>true</c> if matched within
/// the timeout, <c>false</c> on timeout — NEVER throws on timeout. The backing
/// <c>Attributes.WaitAsync</c> for the accessor.
///
/// <para>
/// The Ask is bounded by the script's own execution-timeout token (§4.3): a
/// script that hits its <c>ExecutionTimeoutSeconds</c> abandons the wait. The
/// Ask timeout is the wait timeout plus a small <see cref="_askTimeout"/> slack
/// so the InstanceActor's own scheduled timeout reply is the authoritative path
/// for the false/timed-out outcome, not the Ask deadline.
/// </para>
///
/// <para>
/// <b>Quality-agnostic by default (spec §4.2):</b> a value arriving at Bad
/// quality still satisfies the wait — the match tests the value, not the quality.
/// A quality-gated ("Good"-only) mode is a planned enhancement, deferred per spec §4.2.
/// </para>
///
/// <para>
/// <b>Never throws on timeout.</b> An <see cref="Akka.Actor.AskTimeoutException"/>
/// (the pathological case where the InstanceActor's authoritative timeout reply
/// never arrives — actor stopped/restarted) is caught and surfaced as <c>false</c>,
/// matching the timeout contract. An <see cref="OperationCanceledException"/> /
/// <see cref="TaskCanceledException"/> from the script-deadline token is NOT caught
/// — it propagates to abort the script (intended §4.3 behaviour).
/// </para>
/// </summary>
/// <param name="name">The scope-resolved attribute name to wait on.</param>
/// <param name="targetValueEncoded">
/// The codec-encoded target value; null (with null <paramref name="predicate"/>)
/// means "any change".
/// </param>
/// <param name="predicate">Site-local predicate; null when the encoded target is used.</param>
/// <param name="timeout">How long to wait before returning false.</param>
/// <param name="requireGoodQuality">
/// Quality-gated ("Good"-only) mode (spec §4.2): when <see langword="true"/>, a
/// value reaching the target / satisfying the predicate at Bad/Uncertain quality
/// is NOT a match — the wait holds until the value satisfies the test at Good
/// quality (or times out). Defaults to <see langword="false"/> (quality-agnostic).
/// </param>
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
public async Task<bool> WaitAttribute(
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
bool requireGoodQuality = false)
=> (await WaitInternal(name, targetValueEncoded, predicate, timeout, requireGoodQuality)).Matched;
/// <summary>
/// WaitForAttribute (spec §3): the richer overload backing <c>Attributes.WaitForAsync</c>
/// — identical semantics to <see cref="WaitAttribute"/> but surfaces the full
/// <see cref="WaitResult"/> (matched flag + matched value + quality + timed-out
/// flag) instead of a bare bool. Never throws on timeout (see <see cref="WaitInternal"/>).
/// </summary>
/// <param name="name">The scope-resolved attribute name to wait on.</param>
/// <param name="targetValueEncoded">The codec-encoded target value; null (with null predicate) means "any change".</param>
/// <param name="predicate">Site-local predicate; null when the encoded target is used.</param>
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
/// <param name="requireGoodQuality">Quality-gated ("Good"-only) mode (spec §4.2); defaults to <see langword="false"/>.</param>
/// <returns>The full <see cref="WaitResult"/> — on timeout: <c>Matched:false, TimedOut:true</c>.</returns>
public async Task<WaitResult> WaitAttributeFull(
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
bool requireGoodQuality = false)
{
var r = await WaitInternal(name, targetValueEncoded, predicate, timeout, requireGoodQuality);
return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut);
}
/// <summary>
/// Shared core for <see cref="WaitAttribute"/> / <see cref="WaitAttributeFull"/>:
/// builds the <see cref="WaitForAttributeRequest"/> (incl. the §4.2
/// <paramref name="requireGoodQuality"/> flag), Asks the InstanceActor bounded by
/// the script's execution-timeout token, and returns the full response. An
/// <see cref="AskTimeoutException"/> (the pathological case where the actor's own
/// authoritative timeout reply never arrives — actor stopped/restarted) is caught
/// and surfaced as a synthetic non-matched/timed-out response, preserving the
/// "never throw on timeout" contract. An <see cref="OperationCanceledException"/> /
/// <see cref="TaskCanceledException"/> from the script-deadline token is NOT caught
/// — it propagates to abort the script (§4.3).
/// </summary>
private async Task<WaitForAttributeResponse> WaitInternal(
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
bool requireGoodQuality)
{
var cid = Guid.NewGuid().ToString();
var req = new WaitForAttributeRequest(
cid, _instanceName, name, targetValueEncoded, predicate, timeout, DateTimeOffset.UtcNow,
requireGoodQuality);
try
{
return await _instanceActor.Ask<WaitForAttributeResponse>(
req, timeout + _askTimeout, _scriptTimeoutToken);
}
catch (AskTimeoutException)
{
// Pathological: the InstanceActor's own scheduled timeout reply never
// arrived (e.g. the actor stopped/restarted under us). The helper's
// contract is "false on timeout, never throw" — so synthesize a
// non-matched/timed-out response rather than leaking the Ask exception.
// OperationCanceledException / TaskCanceledException from the
// script-deadline token are deliberately NOT caught here: they must
// propagate to abort the script (§4.3).
return new WaitForAttributeResponse(
cid, Matched: false, null, null, TimedOut: true);
}
}
/// <summary>
/// Sets an attribute value. For data-connected attributes the Instance Actor
/// forwards the write to the DCL, which writes the physical device; the
@@ -366,7 +553,14 @@ public class ScriptRuntimeContext
scriptName,
ScriptArgs.Normalize(parameters),
nextDepth,
correlationId);
correlationId,
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the child
// script run is a NEW execution spawned BY this run. Its parent is
// THIS run's own ExecutionId — NOT the inherited _parentExecutionId.
// So A → CallScript(B) yields B.ParentExecutionId == A.ExecutionId,
// building a true multi-level execution tree rather than flattening
// every nested call under the original inbound spawner.
ParentExecutionId: _executionId);
// Ask the Instance Actor, which routes to the appropriate Script Actor
var result = await _instanceActor.Ask<ScriptCallResult>(request, _askTimeout);
@@ -526,8 +720,14 @@ public class ScriptRuntimeContext
throw new InvalidOperationException(msg);
}
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the shared
// script runs inline, but is modelled as its OWN execution node — a
// child context mints a fresh ExecutionId parented to the caller's
// ExecutionId, so its audit rows chain under the calling run.
var childContext = _context.CreateChildContextForSharedScript(nextDepth);
return await _library.ExecuteAsync(
scriptName, _context, ScriptArgs.Normalize(parameters), cancellationToken);
scriptName, childContext, ScriptArgs.Normalize(parameters), cancellationToken);
}
}
@@ -362,6 +362,9 @@ public class AuditLogIngestActorCombinedTelemetryTests : TestKit, IClassFixture<
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
}
/// <summary>
@@ -399,5 +402,8 @@ public class AuditLogIngestActorCombinedTelemetryTests : TestKit, IClassFixture<
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
}
}
@@ -216,6 +216,14 @@ public class AuditLogIngestActorTests : TestKit, IClassFixture<MsSqlMigrationFix
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
_inner.SwitchOutPartitionAsync(monthBoundary, ct);
public Task<long> PurgeChannelOlderThanAsync(
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
_inner.PurgeChannelOlderThanAsync(channel, threshold, batchSize, ct);
public Task<long> BackfillSourceNodeAsync(
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
_inner.BackfillSourceNodeAsync(sentinel, before, batchSize, ct);
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
DateTime threshold, CancellationToken ct = default) =>
_inner.GetPartitionBoundariesOlderThanAsync(threshold, ct);
@@ -51,6 +51,12 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
public DateTime? ThrowOnBoundary { get; set; }
public Exception? BoundaryException { get; set; }
// M5.5 (T3): records every per-channel purge call as
// (channel, threshold, batchSize) so tests can assert which channels the
// actor chose to purge and with what window.
public List<(string Channel, DateTime Threshold, int BatchSize)> ChannelPurges { get; } = new();
public Func<string, long> RowsPerChannel { get; set; } = _ => 0L;
// The actor enumerator returns whichever list is configured here.
// Mutating this between ticks lets tests simulate "no longer
// eligible" boundaries on the second tick.
@@ -80,6 +86,17 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
return Task.FromResult<IReadOnlyList<DateTime>>(Boundaries.ToArray());
}
public Task<long> PurgeChannelOlderThanAsync(
string channel, DateTime threshold, int batchSize, CancellationToken ct = default)
{
ChannelPurges.Add((channel, threshold, batchSize));
return Task.FromResult(RowsPerChannel(channel));
}
public Task<long> BackfillSourceNodeAsync(
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<ZB.MOM.WW.ScadaBridge.Commons.Types.AuditLogKpiSnapshot> GetKpiSnapshotAsync(
TimeSpan window, DateTime? nowUtc = null, CancellationToken ct = default) =>
Task.FromResult(new ZB.MOM.WW.ScadaBridge.Commons.Types.AuditLogKpiSnapshot(0L, 0L, 0L, nowUtc ?? DateTime.UtcNow));
@@ -268,21 +285,32 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Today is ~2026-05-20 per the test environment. With RetentionDays =
// 60 the actor computes threshold ≈ 2026-03-21:
// * Jan partition (MAX = Jan 15) → older than threshold → PURGED
// * Apr partition (MAX = Apr 15) → newer than threshold → KEPT
// Seeds two rows within the defined pf_AuditLog_Month partition range (Jan 2026
// Dec 2027). RetentionDays is computed dynamically so the purge threshold always
// anchors near 2026-01-20, keeping the test date-independent:
// old row = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → partition PURGED
// kept row = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → partition KEPT
//
// Using a fixed thresholdAnchor rather than "N months ago" avoids the problem
// of relative seeds landing before 2026-01-01 (the catch-all partition that
// GetPartitionBoundariesOlderThanAsync never returns).
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
var oldOccurred = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
var keptOccurred = new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc);
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEvt = ScadaBridgeAuditEventFactory.Create(
var oldEvt = ScadaBridgeAuditEventFactory.Create(
eventId: Guid.NewGuid(),
occurredAtUtc: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
occurredAtUtc: oldOccurred,
channel: AuditChannel.ApiOutbound,
kind: AuditKind.ApiCall,
status: AuditStatus.Delivered,
sourceSiteId: siteId);
var aprEvt = ScadaBridgeAuditEventFactory.Create(
var keptEvt = ScadaBridgeAuditEventFactory.Create(
eventId: Guid.NewGuid(),
occurredAtUtc: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc),
occurredAtUtc: keptOccurred,
channel: AuditChannel.ApiOutbound,
kind: AuditKind.ApiCall,
status: AuditStatus.Delivered,
@@ -291,8 +319,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
await using (var seedContext = CreateMsSqlContext())
{
var seedRepo = new AuditLogRepository(seedContext);
await seedRepo.InsertIfNotExistsAsync(janEvt);
await seedRepo.InsertIfNotExistsAsync(aprEvt);
await seedRepo.InsertIfNotExistsAsync(oldEvt);
await seedRepo.InsertIfNotExistsAsync(keptEvt);
}
// Wire the actor's DI scope to the real repository against the
@@ -306,7 +334,7 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
services.AddScoped<IAuditLogRepository, AuditLogRepository>();
var sp = services.BuildServiceProvider();
var auditOptions = new AuditLogOptions { RetentionDays = 60 };
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
var purgeOptions = new AuditLogPurgeOptions
{
IntervalHours = 24,
@@ -320,13 +348,9 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
Options.Create(auditOptions),
NullLogger<AuditLogPurgeActor>.Instance)));
// The probe receives one AuditLogPurgedEvent per partition the actor
// purges per tick — other test runs that share the fixture DB may
// also leave behind eligible partitions, but this test creates its
// own fixture DB so the Jan-2026 partition is the only eligible one.
// Use FishForMessage to filter just in case, with a generous timeout
// because the real drop-and-rebuild dance against MSSQL routinely
// takes a couple of seconds on a busy dev container.
// Fish for the Jan-2026 partition boundary — the only eligible one in this
// fixture DB. The generous timeout covers the real drop-and-rebuild dance
// against MSSQL which routinely takes a couple of seconds on a busy dev container.
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
isMessage: m => m.MonthBoundary == janBoundary,
@@ -342,8 +366,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
.Where(e => e.SourceSiteId == siteId)
.ToListAsync();
Assert.DoesNotContain(rows, r => r.EventId == janEvt.EventId);
Assert.Contains(rows, r => r.EventId == aprEvt.EventId);
Assert.DoesNotContain(rows, r => r.EventId == oldEvt.EventId);
Assert.Contains(rows, r => r.EventId == keptEvt.EventId);
}
private ScadaBridgeDbContext CreateMsSqlContext() =>
@@ -381,4 +405,90 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
Math.Abs((threshold - expected).TotalMinutes) < 1.0,
$"threshold {threshold:o} should be within 1 minute of {expected:o}");
}
// ---------------------------------------------------------------------
// 8. PerChannelOverride_ShorterThanGlobal_TriggersChannelPurge (M5.5 T3)
// ---------------------------------------------------------------------
[Fact]
public void PerChannelOverride_ShorterThanGlobal_TriggersChannelPurge()
{
// ApiOutbound has a 30-day override under a 365-day global window — strictly
// shorter, so the actor must run a per-channel purge with a threshold of
// ~today-30d and the configured batch size.
var repo = new RecordingRepo { Boundaries = new List<DateTime>() };
var purgeOptions = FastTickOptions();
purgeOptions.ChannelPurgeBatchSizeConfigured = 1234;
// Build the options OUTSIDE the Props expression tree — a collection/dictionary
// initializer is not legal inside an expression-tree lambda (CS8074).
var auditOptions = Options.Create(new AuditLogOptions
{
RetentionDays = 365,
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiOutbound"] = 30 },
});
var purgeOptionsWrapped = Options.Create(purgeOptions);
var sp = BuildScopedProvider(repo);
Sys.ActorOf(Props.Create(() => new AuditLogPurgeActor(
sp,
purgeOptionsWrapped,
auditOptions,
NullLogger<AuditLogPurgeActor>.Instance)));
AwaitAssert(
() => Assert.Contains(repo.ChannelPurges, p => p.Channel == "ApiOutbound"),
duration: TimeSpan.FromSeconds(3),
interval: TimeSpan.FromMilliseconds(50));
var purge = repo.ChannelPurges.First(p => p.Channel == "ApiOutbound");
Assert.Equal(1234, purge.BatchSize);
var expected = DateTime.UtcNow - TimeSpan.FromDays(30);
Assert.True(
Math.Abs((purge.Threshold - expected).TotalMinutes) < 1.0,
$"channel threshold {purge.Threshold:o} should be within 1 minute of {expected:o}");
}
// ---------------------------------------------------------------------
// 9. PerChannelOverride_EqualOrLongerThanGlobal_SkipsChannelPurge (M5.5 T3)
// ---------------------------------------------------------------------
[Fact]
public void PerChannelOverride_EqualOrLongerThanGlobal_SkipsChannelPurge()
{
// DbOutbound = 365 (== global) and Notification = 400 (> global, validator would
// normally reject this but the actor must defensively skip it too). Neither is
// SHORTER than the global window, so the actor must NOT issue a channel purge —
// the global partition switch-out already governs those rows.
var repo = new RecordingRepo { Boundaries = new List<DateTime>() };
// Build the options OUTSIDE the Props expression tree (CS8074).
var auditOptions = Options.Create(new AuditLogOptions
{
RetentionDays = 365,
PerChannelRetentionDays = new Dictionary<string, int>
{
["DbOutbound"] = 365,
["Notification"] = 400,
},
});
var purgeOptions = Options.Create(FastTickOptions());
var sp = BuildScopedProvider(repo);
Sys.ActorOf(Props.Create(() => new AuditLogPurgeActor(
sp,
purgeOptions,
auditOptions,
NullLogger<AuditLogPurgeActor>.Instance)));
// Wait for at least one tick (visible via the enumerator call), then assert no
// channel purge was issued.
AwaitAssert(
() => Assert.True(repo.ThresholdQueries.Count >= 1),
duration: TimeSpan.FromSeconds(3),
interval: TimeSpan.FromMilliseconds(50));
Assert.Empty(repo.ChannelPurges);
}
}
@@ -8,6 +8,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using IAuditInboundCeilingHitsCounter = ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter;
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Tests.Central;
@@ -43,6 +44,12 @@ public class CentralAuditWriteFailuresTests : TestKit
Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>());
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<long> PurgeChannelOlderThanAsync(
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<long> BackfillSourceNodeAsync(
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
DateTime threshold, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<DateTime>>(Array.Empty<DateTime>());
@@ -163,6 +170,69 @@ public class CentralAuditWriteFailuresTests : TestKit
var snapshot = new AuditCentralHealthSnapshot();
Assert.Equal(0, snapshot.CentralAuditWriteFailures);
Assert.Equal(0, snapshot.AuditRedactionFailure);
Assert.Equal(0, snapshot.AuditInboundCeilingHits);
Assert.Empty(snapshot.SiteAuditTelemetryStalled);
}
// ---------------------------------------------------------------------
// M5.3 (T7) AuditInboundCeilingHits counter
// AuditCentralHealthSnapshot implements IAuditInboundCeilingHitsCounter.
// Incrementing through the interface surface is reflected on the snapshot.
// ---------------------------------------------------------------------
[Fact]
public void AuditInboundCeilingHits_StartsAtZero()
{
var snapshot = new AuditCentralHealthSnapshot();
Assert.Equal(0, snapshot.AuditInboundCeilingHits);
}
[Fact]
public void AuditInboundCeilingHits_IncrementedThroughInterface_ReflectedOnSnapshot()
{
var snapshot = new AuditCentralHealthSnapshot();
var counter = (IAuditInboundCeilingHitsCounter)snapshot;
counter.Increment();
counter.Increment();
counter.Increment();
Assert.Equal(3, snapshot.AuditInboundCeilingHits);
}
[Fact]
public void AuditInboundCeilingHits_IsThreadSafe()
{
// Interlocked increment must produce the correct count under concurrent
// increments — same shape as the existing counter tests.
var snapshot = new AuditCentralHealthSnapshot();
var counter = (IAuditInboundCeilingHitsCounter)snapshot;
const int incrementCount = 1000;
Parallel.For(0, incrementCount, _ => counter.Increment());
Assert.Equal(incrementCount, snapshot.AuditInboundCeilingHits);
}
[Fact]
public void AuditInboundCeilingHits_IsIndependentOfOtherCounters()
{
// Ceiling-hits increments must not cross-contaminate the other counters
// and vice versa — each Interlocked field is independent.
var snapshot = new AuditCentralHealthSnapshot();
var ceilingCounter = (IAuditInboundCeilingHitsCounter)snapshot;
var writeCounter = (ICentralAuditWriteFailureCounter)snapshot;
var redactCounter = (ZB.MOM.WW.ScadaBridge.AuditLog.Payload.IAuditRedactionFailureCounter)snapshot;
ceilingCounter.Increment();
ceilingCounter.Increment();
writeCounter.Increment();
redactCounter.Increment();
redactCounter.Increment();
redactCounter.Increment();
Assert.Equal(2, snapshot.AuditInboundCeilingHits);
Assert.Equal(1, snapshot.CentralAuditWriteFailures);
Assert.Equal(3, snapshot.AuditRedactionFailure);
}
}
@@ -89,6 +89,14 @@ public class SiteAuditReconciliationActorTests : TestKit, IClassFixture<MsSqlMig
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<long> PurgeChannelOlderThanAsync(
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<long> BackfillSourceNodeAsync(
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
Task.FromResult(0L);
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
DateTime threshold, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<DateTime>>(Array.Empty<DateTime>());
@@ -50,4 +50,107 @@ public class AuditLogOptionsValidatorTests
result.Failures!,
f => f.Contains(nameof(AuditLogOptions.InboundMaxBytes), StringComparison.Ordinal));
}
// ---------------------------------------------------------------------
// M5.5 (T3) per-channel retention overrides
// ---------------------------------------------------------------------
[Fact]
public void Validate_PerChannelRetention_ShorterThanGlobal_Passes()
{
// A per-channel window strictly shorter than the global window is the
// sanctioned case — the purge actor expires those rows earlier via the
// maintenance-path row DELETE.
var validator = new AuditLogOptionsValidator();
var opts = new AuditLogOptions
{
RetentionDays = 365,
PerChannelRetentionDays = new Dictionary<string, int>
{
["ApiOutbound"] = 90,
["Notification"] = 30, // floor (MinRetentionDays)
},
};
Assert.True(validator.Validate(null, opts).Succeeded);
}
[Fact]
public void Validate_PerChannelRetention_EqualToGlobal_Passes()
{
// Equal to global is allowed (the bound is [Min, RetentionDays] inclusive);
// the purge actor simply treats it as a no-op since it is not SHORTER.
var validator = new AuditLogOptionsValidator();
var opts = new AuditLogOptions
{
RetentionDays = 200,
PerChannelRetentionDays = new Dictionary<string, int> { ["DbOutbound"] = 200 },
};
Assert.True(validator.Validate(null, opts).Succeeded);
}
[Fact]
public void Validate_PerChannelRetention_LongerThanGlobal_Fails()
{
// A per-channel window LONGER than the global window is meaningless under
// month-partition switch-out (governed by the global window) and is rejected.
var validator = new AuditLogOptionsValidator();
var opts = new AuditLogOptions
{
RetentionDays = 100,
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiInbound"] = 200 },
};
var result = validator.Validate(null, opts);
Assert.False(result.Succeeded);
Assert.Contains(
result.Failures!,
f => f.Contains(nameof(AuditLogOptions.PerChannelRetentionDays), StringComparison.Ordinal)
&& f.Contains("ApiInbound", StringComparison.Ordinal));
}
[Fact]
public void Validate_PerChannelRetention_BelowMinimum_Fails()
{
var validator = new AuditLogOptionsValidator();
var opts = new AuditLogOptions
{
RetentionDays = 365,
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiOutbound"] = 29 },
};
var result = validator.Validate(null, opts);
Assert.False(result.Succeeded);
Assert.Contains(
result.Failures!,
f => f.Contains(nameof(AuditLogOptions.PerChannelRetentionDays), StringComparison.Ordinal));
}
[Fact]
public void Validate_PerChannelRetention_UnknownChannelKey_Fails()
{
// Keys must be recognized AuditChannel names; a typo / unknown key is rejected
// rather than silently ignored so a misconfiguration surfaces at boot.
var validator = new AuditLogOptionsValidator();
var opts = new AuditLogOptions
{
RetentionDays = 365,
PerChannelRetentionDays = new Dictionary<string, int> { ["NotAChannel"] = 90 },
};
var result = validator.Validate(null, opts);
Assert.False(result.Succeeded);
Assert.Contains(
result.Failures!,
f => f.Contains("NotAChannel", StringComparison.Ordinal));
}
[Fact]
public void Validate_PerChannelRetention_DefaultEmpty_Passes()
{
// The default (no overrides) must pass — this is the common case.
var validator = new AuditLogOptionsValidator();
Assert.True(validator.Validate(null, new AuditLogOptions()).Succeeded);
}
}
@@ -623,5 +623,11 @@ public class ParentExecutionIdCorrelationTests : TestKit, IClassFixture<MsSqlMig
public Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken)
=> throw new NotSupportedException();
// WaitForAttribute is not part of this fixture's routed-Call audit scenario;
// mirror the other non-Call methods (unexercised here).
public Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken)
=> throw new NotSupportedException();
}
}
@@ -67,19 +67,25 @@ public class PartitionPurgeTests : TestKit, IClassFixture<MsSqlMigrationFixture>
SqlConnection conn,
Guid eventId,
DateTime occurredAtUtc,
string siteId)
string siteId,
string channel = "ApiOutbound",
string kind = "ApiCall")
{
await using var cmd = conn.CreateCommand();
// C5 (Task 2.5): dbo.AuditLog is now the 10 canonical columns + DetailsJson;
// the ScadaBridge domain fields (channel/kind/status/sourceSiteId) ride in
// DetailsJson and the SourceSiteId/Kind/Status computed columns auto-derive.
// Action = "{channel}.{kind}", Category = channel name, Outcome = Success.
// The channel/kind are parameterized so the M5.5 per-channel purge test can
// seed multiple channels into the same partition.
cmd.CommandText = @"
INSERT INTO dbo.AuditLog
(EventId, OccurredAtUtc, Actor, Action, Outcome, Category, Target, SourceNode, CorrelationId, DetailsJson)
VALUES
(@EventId, @OccurredAtUtc, NULL, 'ApiOutbound.ApiCall', 'Success', 'ApiOutbound', NULL, NULL, NULL,
(@EventId, @OccurredAtUtc, NULL, @Action, 'Success', @Category, NULL, NULL, NULL,
@DetailsJson);";
cmd.Parameters.Add("@Action", System.Data.SqlDbType.VarChar, 64).Value = $"{channel}.{kind}";
cmd.Parameters.Add("@Category", System.Data.SqlDbType.VarChar, 32).Value = channel;
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
// SqlDbType.DateTime2 with explicit Scale 7 matches the
// OccurredAtUtc column shape (datetime2(7)) and avoids the implicit
@@ -97,7 +103,7 @@ VALUES
// the computed SourceSiteId column the verify queries scope on. payloadTruncated
// is always present (the codec always writes the bool).
var detailsJson =
"{\"channel\":\"ApiOutbound\",\"kind\":\"ApiCall\",\"status\":\"Delivered\"," +
"{\"channel\":\"" + channel + "\",\"kind\":\"" + kind + "\",\"status\":\"Delivered\"," +
"\"sourceSiteId\":\"" + siteId + "\",\"payloadTruncated\":false}";
cmd.Parameters.Add("@DetailsJson", System.Data.SqlDbType.NVarChar, -1).Value = detailsJson;
await cmd.ExecuteNonQueryAsync();
@@ -134,10 +140,49 @@ WHERE name = 'UX_AuditLog_EventId'
NullLogger<AuditLogPurgeActor>.Instance)));
}
private static (DateTime Jan, DateTime Feb, DateTime Mar) SeedOccurredAt() => (
new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
new DateTime(2026, 2, 15, 0, 0, 0, DateTimeKind.Utc),
new DateTime(2026, 3, 15, 0, 0, 0, DateTimeKind.Utc));
/// <summary>
/// Returns three seed timestamps and a computed <c>RetentionDays</c> value that
/// keep the purge-intent date-independent regardless of when the test runs.
/// </summary>
/// <remarks>
/// <para>
/// The partition function <c>pf_AuditLog_Month</c> has explicit boundaries only
/// for 2026-01-01 through 2027-12-01. Rows outside that range land in the
/// catch-all partitions which have no <c>partition_range_values</c> entry and are
/// therefore never returned by
/// <see cref="IAuditLogRepository.GetPartitionBoundariesOlderThanAsync"/>.
/// All three seeds must therefore fall inside the defined boundary range.
/// </para>
/// <para>
/// To remain date-independent the test computes <c>RetentionDays</c> dynamically
/// so the purge threshold always lands near <b>2026-01-20</b>:
/// <code>
/// RetentionDays = (int)(DateTime.UtcNow - new DateTime(2026, 1, 20, UTC)).TotalDays + 1
/// </code>
/// This gives:
/// <list type="bullet">
/// <item>Jan 15 2026 row → Jan 15 &lt; Jan 20 threshold → <b>PURGED</b>.</item>
/// <item>Apr 15 / Jun 15 2026 rows → both after Jan 20 → <b>KEPT</b>.</item>
/// </list>
/// The threshold anchors to a fixed calendar point (~Jan 20 2026), so the
/// relationship holds for any future run date as long as the explicit partition
/// boundaries remain.
/// </para>
/// </remarks>
private static (DateTime Old, DateTime Mid, DateTime Recent, int RetentionDays) SeedOccurredAt()
{
// Anchor the threshold midway through January 2026 — strictly after the
// "old" seed (Jan 15) and strictly before the "mid" seed (Apr 15).
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
return (
Old: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc), // in Jan-2026 partition → PURGED
Mid: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc), // in Apr-2026 partition → KEPT
Recent: new DateTime(2026, 6, 15, 0, 0, 0, DateTimeKind.Utc), // in Jun-2026 partition → KEPT
RetentionDays: retentionDays
);
}
// ---------------------------------------------------------------------
// 1. EndToEnd_OldestPartition_PurgedViaActor_NewerKept
@@ -148,24 +193,23 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Test date is ~2026-05-20 per environment. We want a threshold that
// sits strictly between Jan 15 (the Jan partition's MAX) and Feb 15
// (the Feb partition's MAX) so only the Jan-2026 partition is
// eligible for purge. RetentionDays = 100 gives a threshold of
// ~2026-02-09 — Jan 15 is older (purged), Feb 15 and Mar 15 are
// newer (kept). The window between Jan 15 and Feb 15 is wide enough
// (~30 days) to tolerate any plausible test-clock drift in CI.
// Seeds three rows in distinct calendar months. RetentionDays is computed
// dynamically so the purge threshold always lands near 2026-01-20 (see
// SeedOccurredAt() for the full rationale):
// Old = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → PURGED
// Mid = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → KEPT
// Recent = Jun 15 2026 → Jun 15 > threshold ~Jan 20 → KEPT
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var febEventId = Guid.NewGuid();
var marEventId = Guid.NewGuid();
var (janOccurred, febOccurred, marOccurred) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var midEventId = Guid.NewGuid();
var recentEventId = Guid.NewGuid();
var (oldOccurred, midOccurred, recentOccurred, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, febEventId, febOccurred, siteId);
await DirectInsertAsync(seedConn, marEventId, marOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
await DirectInsertAsync(seedConn, midEventId, midOccurred, siteId);
await DirectInsertAsync(seedConn, recentEventId, recentOccurred, siteId);
}
// Wire the actor with a real EF context against the fixture DB.
@@ -184,15 +228,11 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
};
var auditOptions = new AuditLogOptions { RetentionDays = 100 };
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
CreateActor(sp, purgeOptions, auditOptions);
// Wait for the actor's tick to purge the Jan-2026 partition.
// Concurrent test runs against the same fixture might also create
// eligible partitions, but each test class owns its own fixture DB
// (MsSqlMigrationFixture seeds a guid-named DB per class), so the
// Jan-2026 boundary is the only one this test can have produced.
// The Jan-2026 partition boundary is the only eligible one in this fixture DB.
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
isMessage: m => m.MonthBoundary == janBoundary,
@@ -200,9 +240,7 @@ WHERE name = 'UX_AuditLog_EventId'
Assert.True(matched.RowsDeleted >= 1,
$"Expected RowsDeleted >= 1 for Jan-2026 boundary; got {matched.RowsDeleted}.");
// Allow a brief settle in case the actor is mid-tick on Feb/Mar
// (it shouldn't be, since RetentionDays = 90 means only Jan is
// eligible, but the actor MAY re-enumerate quickly while we read).
// Allow a brief settle in case the actor re-enumerates quickly.
await Task.Delay(TimeSpan.FromMilliseconds(500));
await using var verify = CreateContext();
@@ -210,11 +248,10 @@ WHERE name = 'UX_AuditLog_EventId'
.Where(e => e.SourceSiteId == siteId)
.ToListAsync();
// Jan removed; Feb + Mar untouched. Because the test owns the site
// id and the fixture DB, exact set membership is observable.
Assert.DoesNotContain(rows, r => r.EventId == janEventId);
Assert.Contains(rows, r => r.EventId == febEventId);
Assert.Contains(rows, r => r.EventId == marEventId);
// Old (Jan) removed; Mid (Apr) + Recent (Jun) untouched.
Assert.DoesNotContain(rows, r => r.EventId == oldEventId);
Assert.Contains(rows, r => r.EventId == midEventId);
Assert.Contains(rows, r => r.EventId == recentEventId);
}
// ---------------------------------------------------------------------
@@ -226,20 +263,19 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Same shape as test 1 — purge the Jan-2026 partition and then
// assert the UX_AuditLog_EventId index is still present. The
// drop-and-rebuild dance briefly removes it inside its transaction
// (the SWITCH PARTITION step requires the non-aligned unique index
// to be absent), but step 5 rebuilds it before committing. Sanity-
// checking the post-COMMIT shape here documents the invariant in an
// assertable way.
// Same shape as test 1 — purge the Jan-2026 partition and then assert the
// UX_AuditLog_EventId index is still present. RetentionDays is computed
// dynamically so the threshold always lands near 2026-01-20 (see SeedOccurredAt()).
// The drop-and-rebuild dance briefly removes the index inside its transaction
// (the SWITCH PARTITION step requires the non-aligned unique index to be absent),
// but step 5 rebuilds it before committing.
var siteId = "purge-uxidx-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var (janOccurred, _, _) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
}
var services = new ServiceCollection();
@@ -259,7 +295,7 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
},
new AuditLogOptions { RetentionDays = 90 });
new AuditLogOptions { RetentionDays = retentionDays });
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
probe.FishForMessage<AuditLogPurgedEvent>(
@@ -281,18 +317,19 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Seed + purge a Jan-2026 row, THEN exercise InsertIfNotExistsAsync
// twice for a fresh (May-2026) EventId. The second call must be a
// no-op (duplicate-key collision swallowed by the repository, per
// M2 Bundle A's race-fix) — which means the rebuilt
// UX_AuditLog_EventId unique index is functioning as intended.
// Seed + purge the Jan-2026 row, THEN exercise InsertIfNotExistsAsync twice for
// a fresh recent EventId. The second call must be a no-op (duplicate-key collision
// swallowed by the repository, per M2 Bundle A's race-fix) — which means the
// rebuilt UX_AuditLog_EventId unique index is functioning as intended.
// RetentionDays is computed dynamically so the threshold always lands near
// 2026-01-20 (see SeedOccurredAt()).
var siteId = "purge-idem-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var (janOccurred, _, _) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
}
var services = new ServiceCollection();
@@ -312,7 +349,7 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
},
new AuditLogOptions { RetentionDays = 90 });
new AuditLogOptions { RetentionDays = retentionDays });
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
probe.FishForMessage<AuditLogPurgedEvent>(
@@ -328,7 +365,7 @@ WHERE name = 'UX_AuditLog_EventId'
await Task.Delay(TimeSpan.FromMilliseconds(500));
var freshEventId = Guid.NewGuid();
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc);
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc); // within partition range, well inside retention window
var freshSite = "purge-idem-fresh-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var freshEvt = ScadaBridgeAuditEventFactory.Create(
eventId: freshEventId,
@@ -354,4 +391,87 @@ WHERE name = 'UX_AuditLog_EventId'
Assert.Single(rows);
Assert.Equal(freshEventId, rows[0].EventId);
}
// ---------------------------------------------------------------------
// 4. PerChannelOverride_DeletesOnlyOverriddenChannelsOldRows (M5.5 T3)
// ---------------------------------------------------------------------
/// <summary>
/// M5.5 (T3): exercises <see cref="IAuditLogRepository.PurgeChannelOlderThanAsync"/>
/// directly against the real repository + fixture DB. Seeds, in the SAME partition,
/// old + recent rows for an OVERRIDDEN channel (<c>ApiOutbound</c>) and old + recent
/// rows for an UN-overridden channel (<c>DbOutbound</c>), then runs the per-channel
/// purge for <c>ApiOutbound</c> only. Asserts:
/// <list type="number">
/// <item>The overridden channel's OLD rows are deleted.</item>
/// <item>The overridden channel's RECENT rows (newer than the channel threshold) survive.</item>
/// <item>The un-overridden channel's rows (old AND recent) are completely untouched
/// — they follow the global window, which the channel purge never applies to them.</item>
/// </list>
/// This is the maintenance-path row DELETE; the fixture connects as <c>sa</c>, which
/// the append-only writer-role DENYs do not bind (the role granularity is exercised
/// in the repository/migration tests).
/// </summary>
[SkippableFact]
public async Task PerChannelOverride_DeletesOnlyOverriddenChannelsOldRows()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var siteId = "perchannel-" + Guid.NewGuid().ToString("N").Substring(0, 8);
// Two timestamps: one OLD (older than the channel threshold we will purge with)
// and one RECENT (newer than it). Both sit comfortably inside the retention
// window so the global partition purge would NOT touch either — isolating the
// per-channel DELETE as the only force acting here.
var oldOccurred = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
var recentOccurred = new DateTime(2026, 5, 15, 0, 0, 0, DateTimeKind.Utc);
var apiOldId = Guid.NewGuid(); // ApiOutbound, old → SHOULD be deleted
var apiRecentId = Guid.NewGuid(); // ApiOutbound, recent→ SHOULD survive
var dbOldId = Guid.NewGuid(); // DbOutbound, old → SHOULD survive (un-overridden)
var dbRecentId = Guid.NewGuid(); // DbOutbound, recent → SHOULD survive
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, apiOldId, oldOccurred, siteId, channel: "ApiOutbound", kind: "ApiCall");
await DirectInsertAsync(seedConn, apiRecentId, recentOccurred, siteId, channel: "ApiOutbound", kind: "ApiCall");
await DirectInsertAsync(seedConn, dbOldId, oldOccurred, siteId, channel: "DbOutbound", kind: "DbWrite");
await DirectInsertAsync(seedConn, dbRecentId, recentOccurred, siteId, channel: "DbOutbound", kind: "DbWrite");
}
// Purge ApiOutbound rows older than a threshold that sits strictly between the
// old (Jan 15) and recent (May 15) seeds — e.g. Mar 1. Only apiOldId qualifies.
var channelThreshold = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
await using (var ctx = CreateContext())
{
var repo = new AuditLogRepository(ctx);
var deleted = await repo.PurgeChannelOlderThanAsync(
channel: "ApiOutbound",
threshold: channelThreshold,
batchSize: 2);
Assert.Equal(1L, deleted);
// Idempotent: a second run deletes nothing (the eligible row is gone).
var deletedAgain = await repo.PurgeChannelOlderThanAsync(
channel: "ApiOutbound",
threshold: channelThreshold,
batchSize: 2);
Assert.Equal(0L, deletedAgain);
}
await using var verify = CreateContext();
var rows = await verify.Set<AuditLogRow>()
.Where(e => e.SourceSiteId == siteId)
.ToListAsync();
// Overridden channel: old gone, recent kept.
Assert.DoesNotContain(rows, r => r.EventId == apiOldId);
Assert.Contains(rows, r => r.EventId == apiRecentId);
// Un-overridden channel: BOTH rows untouched (follow the global window).
Assert.Contains(rows, r => r.EventId == dbOldId);
Assert.Contains(rows, r => r.EventId == dbRecentId);
}
}
@@ -0,0 +1,244 @@
using System.CommandLine;
using System.Net;
using System.Text;
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.CLI;
using ZB.MOM.WW.ScadaBridge.CLI.Commands;
namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
/// <summary>
/// Tests for the <c>scadabridge audit backfill-source-node</c> subcommand
/// (Audit Log #23 M5.6 T5): argument parsing, request-body construction,
/// HTTP wiring, and CLI scaffold.
/// </summary>
[Collection("Console")]
public class AuditBackfillCommandTests
{
// ─────────────────────────────────────────────────────────────────────
// BuildRequestBody
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void BuildRequestBody_DefaultArgs_ContainsExpectedFields()
{
var args = new AuditBackfillSourceNodeArgs
{
Sentinel = "unknown",
Before = "2026-01-01T00:00:00Z",
BatchSize = 5000,
};
var body = AuditBackfillHelpers.BuildRequestBody(args);
using var doc = JsonDocument.Parse(body);
var root = doc.RootElement;
Assert.Equal("unknown", root.GetProperty("sentinel").GetString());
Assert.Equal("2026-01-01T00:00:00Z", root.GetProperty("before").GetString());
Assert.Equal(5000, root.GetProperty("batchSize").GetInt32());
}
[Fact]
public void BuildRequestBody_CustomSentinelAndBatch_ReflectedInJson()
{
var args = new AuditBackfillSourceNodeArgs
{
Sentinel = "pre-feature",
Before = "2026-06-01T00:00:00Z",
BatchSize = 1000,
};
var body = AuditBackfillHelpers.BuildRequestBody(args);
using var doc = JsonDocument.Parse(body);
var root = doc.RootElement;
Assert.Equal("pre-feature", root.GetProperty("sentinel").GetString());
Assert.Equal("2026-06-01T00:00:00Z", root.GetProperty("before").GetString());
Assert.Equal(1000, root.GetProperty("batchSize").GetInt32());
}
// ─────────────────────────────────────────────────────────────────────
// RunBackfillAsync — HTTP execution
// ─────────────────────────────────────────────────────────────────────
private sealed class CapturingHandler : HttpMessageHandler
{
private readonly HttpStatusCode _status;
private readonly string _responseBody;
public CapturingHandler(HttpStatusCode status, string responseBody)
{
_status = status;
_responseBody = responseBody;
}
public string? LastRequestUri { get; private set; }
public string? LastRequestBody { get; private set; }
public string? LastMethod { get; private set; }
protected override async Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request, CancellationToken cancellationToken)
{
LastRequestUri = request.RequestUri!.PathAndQuery;
LastMethod = request.Method.Method;
if (request.Content != null)
{
LastRequestBody = await request.Content.ReadAsStringAsync(cancellationToken);
}
return new HttpResponseMessage(_status)
{
Content = new StringContent(_responseBody, Encoding.UTF8, "application/json"),
};
}
}
private static string SuccessBody(long rowsUpdated = 42, string sentinel = "unknown", string before = "2026-01-01T00:00:00.0000000Z")
=> JsonSerializer.Serialize(new { rowsUpdated, sentinel, before });
[Fact]
public async Task RunBackfill_Success_ReturnsZeroAndWritesOutput()
{
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody(rowsUpdated: 42));
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var args = new AuditBackfillSourceNodeArgs
{
Sentinel = "unknown",
Before = "2026-01-01T00:00:00Z",
BatchSize = 5000,
};
var exit = await AuditBackfillHelpers.RunBackfillAsync(client, args, output);
Assert.Equal(0, exit);
var text = output.ToString();
Assert.Contains("42", text);
Assert.Contains("backfill complete", text, StringComparison.OrdinalIgnoreCase);
}
[Fact]
public async Task RunBackfill_RequestUri_ContainsBackfillPath()
{
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody());
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
await AuditBackfillHelpers.RunBackfillAsync(
client,
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
output);
Assert.Contains("backfill-source-node", handler.LastRequestUri);
Assert.Equal("POST", handler.LastMethod);
}
[Fact]
public async Task RunBackfill_RequestBody_ContainsSentinelAndBefore()
{
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody());
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
await AuditBackfillHelpers.RunBackfillAsync(
client,
new AuditBackfillSourceNodeArgs
{
Sentinel = "pre-feature",
Before = "2026-01-01T00:00:00Z",
BatchSize = 2000,
},
output);
Assert.NotNull(handler.LastRequestBody);
using var doc = JsonDocument.Parse(handler.LastRequestBody!);
Assert.Equal("pre-feature", doc.RootElement.GetProperty("sentinel").GetString());
Assert.Equal("2026-01-01T00:00:00Z", doc.RootElement.GetProperty("before").GetString());
Assert.Equal(2000, doc.RootElement.GetProperty("batchSize").GetInt32());
}
[Fact]
public async Task RunBackfill_Http403_ReturnsExitCode2()
{
var handler = new CapturingHandler(HttpStatusCode.Forbidden,
"{\"error\":\"Permission required.\",\"code\":\"UNAUTHORIZED\"}");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditBackfillHelpers.RunBackfillAsync(
client,
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
output);
Assert.Equal(2, exit);
}
[Fact]
public async Task RunBackfill_Http500_ReturnsExitCode1()
{
var handler = new CapturingHandler(HttpStatusCode.InternalServerError,
"{\"error\":\"boom\",\"code\":\"INTERNAL\"}");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditBackfillHelpers.RunBackfillAsync(
client,
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
output);
Assert.Equal(1, exit);
}
// ─────────────────────────────────────────────────────────────────────
// CLI parsing
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void BackfillSourceNode_Subcommand_ExistsInAuditCommandGroup()
{
var root = AuditCommandTestHarness.BuildRoot();
var parse = root.Parse(new[] { "audit", "backfill-source-node", "--help" });
Assert.Empty(parse.Errors);
}
[Fact]
public void BackfillSourceNode_BeforeOption_IsRequired()
{
var root = AuditCommandTestHarness.BuildRoot();
var (exit, _, err) = AuditCommandTestHarness.Invoke(root, "audit", "backfill-source-node");
Assert.NotEqual(0, exit);
}
[Fact]
public void BackfillSourceNode_HelpText_DescribesSentinelAndBefore()
{
var root = AuditCommandTestHarness.BuildRoot();
var output = new StringWriter();
var exit = root.Parse(new[] { "audit", "backfill-source-node", "--help" })
.Invoke(new InvocationConfiguration { Output = output });
Assert.Equal(0, exit);
var text = output.ToString();
Assert.Contains("sentinel", text, StringComparison.OrdinalIgnoreCase);
Assert.Contains("before", text, StringComparison.OrdinalIgnoreCase);
}
[Fact]
public void BackfillSourceNode_DefaultSentinel_IsUnknown()
{
// Verify the default sentinel value is "unknown" as documented.
var url = new Option<string>("--url") { Recursive = true };
var username = new Option<string>("--username") { Recursive = true };
var password = new Option<string>("--password") { Recursive = true };
var format = CliOptions.CreateFormatOption();
var auditGroup = AuditCommands.Build(url, format, username, password);
var backfillCmd = auditGroup.Subcommands
.FirstOrDefault(c => c.Name == "backfill-source-node");
Assert.NotNull(backfillCmd);
// The subcommand exists and its description mentions maintenance/sentinel.
Assert.False(string.IsNullOrWhiteSpace(backfillCmd!.Description));
}
}
@@ -5,8 +5,8 @@ namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
/// <summary>
/// Scaffold tests for the <c>scadabridge audit</c> command group (Audit Log #23 M8-T1).
/// Verifies the parent command exists with its three subcommands and that every leaf
/// has an action wired.
/// Verifies the parent command exists with its subcommands and that every leaf
/// has an action wired. Updated for M5.6 T5 to cover <c>backfill-source-node</c>.
/// </summary>
public class AuditCommandsScaffoldTests
{
@@ -27,11 +27,13 @@ public class AuditCommandsScaffoldTests
}
[Fact]
public void Audit_HasThreeSubcommands_QueryExportVerifyChain()
public void Audit_HasFiveSubcommands_QueryExportTreeVerifyChainBackfillSourceNode()
{
var audit = BuildAudit();
var names = audit.Subcommands.Select(c => c.Name).OrderBy(n => n).ToArray();
Assert.Equal(new[] { "export", "query", "verify-chain" }, names);
Assert.Equal(
new[] { "backfill-source-node", "export", "query", "tree", "verify-chain" },
names);
}
[Fact]
@@ -48,7 +50,9 @@ public class AuditCommandsScaffoldTests
var text = output.ToString();
Assert.Contains("query", text);
Assert.Contains("export", text);
Assert.Contains("tree", text);
Assert.Contains("verify-chain", text);
Assert.Contains("backfill-source-node", text);
}
[Fact]
@@ -0,0 +1,346 @@
using System.CommandLine;
using System.Net;
using System.Text;
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.CLI;
using ZB.MOM.WW.ScadaBridge.CLI.Commands;
namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
/// <summary>
/// Tests for the <c>scadabridge audit tree</c> subcommand (Audit Log #23 M5.1-T8):
/// tree rendering (table format), JSON output, error handling, and CLI parsing.
/// </summary>
[Collection("Console")]
public class AuditTreeCommandTests
{
// ─────────────────────────────────────────────────────────────────────
// JSON parsing helpers
// ─────────────────────────────────────────────────────────────────────
private static string NodeJson(
string executionId,
string? parentId = null,
int rowCount = 3,
string[]? channels = null,
string[]? statuses = null,
string? siteId = "plant-a",
string? instanceId = "inst-1",
string? first = "2026-05-20T10:00:00Z",
string? last = "2026-05-20T10:01:00Z")
{
var parentStr = parentId != null ? $"\"{parentId}\"" : "null";
var channelArr = channels is { Length: > 0 }
? "[" + string.Join(",", channels.Select(c => $"\"{c}\"")) + "]"
: "[\"ApiOutbound\"]";
var statusArr = statuses is { Length: > 0 }
? "[" + string.Join(",", statuses.Select(s => $"\"{s}\"")) + "]"
: "[\"Delivered\"]";
var siteStr = siteId != null ? $"\"{siteId}\"" : "null";
var instanceStr = instanceId != null ? $"\"{instanceId}\"" : "null";
var firstStr = first != null ? $"\"{first}\"" : "null";
var lastStr = last != null ? $"\"{last}\"" : "null";
return $@"{{
""executionId"":""{executionId}"",
""parentExecutionId"":{parentStr},
""rowCount"":{rowCount},
""channels"":{channelArr},
""statuses"":{statusArr},
""sourceSiteId"":{siteStr},
""sourceInstanceId"":{instanceStr},
""firstOccurredAtUtc"":{firstStr},
""lastOccurredAtUtc"":{lastStr}
}}";
}
// ─────────────────────────────────────────────────────────────────────
// ParseNodes
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void ParseNodes_ValidArray_ReturnsDtos()
{
var root = "11111111-1111-1111-1111-111111111111";
var child = "22222222-2222-2222-2222-222222222222";
var json = $"[{NodeJson(root)},{NodeJson(child, parentId: root)}]";
var nodes = AuditTreeHelpers.ParseNodes(json);
Assert.Equal(2, nodes.Length);
Assert.Equal(Guid.Parse(root), nodes[0].ExecutionId);
Assert.Null(nodes[0].ParentExecutionId);
Assert.Equal(Guid.Parse(child), nodes[1].ExecutionId);
Assert.Equal(Guid.Parse(root), nodes[1].ParentExecutionId);
Assert.Equal(3, nodes[0].RowCount);
}
[Fact]
public void ParseNodes_EmptyArray_ReturnsEmpty()
{
var nodes = AuditTreeHelpers.ParseNodes("[]");
Assert.Empty(nodes);
}
[Fact]
public void ParseNodes_InvalidJson_ReturnsEmpty()
{
var nodes = AuditTreeHelpers.ParseNodes("not-json");
Assert.Empty(nodes);
}
// ─────────────────────────────────────────────────────────────────────
// WriteTable — ASCII tree rendering
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void WriteTable_EmptyNodes_PrintsFallbackMessage()
{
var output = new StringWriter();
AuditTreeHelpers.WriteTable(Array.Empty<AuditTreeNodeDto>(), Guid.NewGuid(), output);
Assert.Contains("no execution tree found", output.ToString());
}
[Fact]
public void WriteTable_SingleRootNode_PrintsWithNoIndent()
{
var rootId = Guid.Parse("11111111-1111-1111-1111-111111111111");
var nodes = AuditTreeHelpers.ParseNodes($"[{NodeJson(rootId.ToString())}]");
var output = new StringWriter();
AuditTreeHelpers.WriteTable(nodes, rootId, output);
var text = output.ToString();
// Root node printed at column 0 (no leading spaces).
var line = text.Split('\n', StringSplitOptions.RemoveEmptyEntries).First();
Assert.StartsWith(rootId.ToString("D"), line);
Assert.Contains("[*]", line); // queried node marked
}
[Fact]
public void WriteTable_MultiLevelTree_IndentsChildrenCorrectly()
{
var rootId = "11111111-1111-1111-1111-111111111111";
var childId = "22222222-2222-2222-2222-222222222222";
var grandChildId = "33333333-3333-3333-3333-333333333333";
var json = $"[{NodeJson(rootId)},{NodeJson(childId, parentId: rootId)},{NodeJson(grandChildId, parentId: childId)}]";
var nodes = AuditTreeHelpers.ParseNodes(json);
var output = new StringWriter();
AuditTreeHelpers.WriteTable(nodes, Guid.Parse(rootId), output);
var lines = output.ToString().Split('\n', StringSplitOptions.RemoveEmptyEntries);
// Root: no indent.
Assert.True(lines[0].StartsWith(rootId, StringComparison.OrdinalIgnoreCase) ||
lines[0].StartsWith(rootId.ToUpper(), StringComparison.OrdinalIgnoreCase));
// Child: 2-space indent (exactly 2, not 4+).
var childLine = lines.First(l => l.Contains(childId));
Assert.StartsWith(" ", childLine);
Assert.False(childLine.StartsWith(" ", StringComparison.Ordinal), "child should be indented exactly 2, not 4+");
// Grandchild: 4-space indent.
var grandLine = lines.First(l => l.Contains(grandChildId));
Assert.StartsWith(" ", grandLine);
}
[Fact]
public void WriteTable_QueriedNodeIsMarked_OthersAreNot()
{
var rootId = Guid.Parse("11111111-1111-1111-1111-111111111111");
var childId = Guid.Parse("22222222-2222-2222-2222-222222222222");
var json = $"[{NodeJson(rootId.ToString())},{NodeJson(childId.ToString(), parentId: rootId.ToString())}]";
var nodes = AuditTreeHelpers.ParseNodes(json);
// Query via child ID — child should be marked, root should not.
var output = new StringWriter();
AuditTreeHelpers.WriteTable(nodes, childId, output);
var lines = output.ToString().Split('\n', StringSplitOptions.RemoveEmptyEntries);
var childLine = lines.First(l => l.Contains(childId.ToString("D")));
var rootLine = lines.First(l => l.Contains(rootId.ToString("D")));
Assert.Contains("[*]", childLine);
Assert.DoesNotContain("[*]", rootLine);
}
// ─────────────────────────────────────────────────────────────────────
// WriteJson
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void WriteJson_ValidNodes_EmitsValidJsonArray()
{
var rootId = "11111111-1111-1111-1111-111111111111";
var childId = "22222222-2222-2222-2222-222222222222";
var nodes = AuditTreeHelpers.ParseNodes($"[{NodeJson(rootId)},{NodeJson(childId, parentId: rootId)}]");
var output = new StringWriter();
AuditTreeHelpers.WriteJson(nodes, output);
var text = output.ToString();
using var doc = JsonDocument.Parse(text);
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
Assert.Equal(2, doc.RootElement.GetArrayLength());
}
[Fact]
public void WriteJson_EmptyNodes_EmitsEmptyArray()
{
var output = new StringWriter();
AuditTreeHelpers.WriteJson(Array.Empty<AuditTreeNodeDto>(), output);
var text = output.ToString().Trim();
using var doc = JsonDocument.Parse(text);
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
Assert.Equal(0, doc.RootElement.GetArrayLength());
}
// ─────────────────────────────────────────────────────────────────────
// RunTreeAsync — HTTP execution
// ─────────────────────────────────────────────────────────────────────
private sealed class FixedHandler : HttpMessageHandler
{
private readonly HttpStatusCode _status;
private readonly string _body;
public FixedHandler(HttpStatusCode status, string body)
{
_status = status;
_body = body;
}
public string? LastRequestUri { get; private set; }
protected override Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request, CancellationToken cancellationToken)
{
LastRequestUri = request.RequestUri!.PathAndQuery;
return Task.FromResult(new HttpResponseMessage(_status)
{
Content = new StringContent(_body, Encoding.UTF8, "application/json"),
});
}
}
[Fact]
public async Task RunTree_Success_ReturnsZeroAndWritesOutput()
{
var rootId = "11111111-1111-1111-1111-111111111111";
var json = $"[{NodeJson(rootId)}]";
var handler = new FixedHandler(HttpStatusCode.OK, json);
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditTreeHelpers.RunTreeAsync(
client, Guid.Parse(rootId), "table", output);
Assert.Equal(0, exit);
Assert.Contains(rootId, output.ToString());
}
[Fact]
public async Task RunTree_EmptyResponse_ReturnsZeroWithFallbackMessage()
{
var handler = new FixedHandler(HttpStatusCode.OK, "[]");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditTreeHelpers.RunTreeAsync(
client, Guid.NewGuid(), "table", output);
Assert.Equal(0, exit);
Assert.Contains("no execution tree found", output.ToString());
}
[Fact]
public async Task RunTree_JsonFormat_EmitsValidJson()
{
var rootId = "11111111-1111-1111-1111-111111111111";
var handler = new FixedHandler(HttpStatusCode.OK, $"[{NodeJson(rootId)}]");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditTreeHelpers.RunTreeAsync(
client, Guid.Parse(rootId), "json", output);
Assert.Equal(0, exit);
using var doc = JsonDocument.Parse(output.ToString());
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
}
[Fact]
public async Task RunTree_Http403_ReturnsExitCode2()
{
var handler = new FixedHandler(HttpStatusCode.Forbidden, "{\"error\":\"nope\",\"code\":\"UNAUTHORIZED\"}");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditTreeHelpers.RunTreeAsync(
client, Guid.NewGuid(), "table", output);
Assert.Equal(2, exit);
}
[Fact]
public async Task RunTree_Http500_ReturnsExitCode1()
{
var handler = new FixedHandler(HttpStatusCode.InternalServerError, "{\"error\":\"boom\",\"code\":\"INTERNAL\"}");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
var exit = await AuditTreeHelpers.RunTreeAsync(
client, Guid.NewGuid(), "table", output);
Assert.Equal(1, exit);
}
[Fact]
public async Task RunTree_RequestUrlContainsExecutionId()
{
var id = Guid.Parse("11111111-1111-1111-1111-111111111111");
var handler = new FixedHandler(HttpStatusCode.OK, "[]");
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
var output = new StringWriter();
await AuditTreeHelpers.RunTreeAsync(client, id, "table", output);
Assert.Contains("11111111-1111-1111-1111-111111111111", handler.LastRequestUri);
Assert.Contains("executionId", handler.LastRequestUri);
}
// ─────────────────────────────────────────────────────────────────────
// CLI parsing — audit tree subcommand
// ─────────────────────────────────────────────────────────────────────
[Fact]
public void Tree_Subcommand_ExistsInAuditCommandGroup()
{
var root = AuditCommandTestHarness.BuildRoot();
var parse = root.Parse(new[] { "audit", "tree", "--help" });
// --help is never an error, exit 0.
Assert.Empty(parse.Errors);
}
[Fact]
public void Tree_ExecutionIdOption_IsRequired()
{
// Invoking without --execution-id must produce an error (the option is Required).
var root = AuditCommandTestHarness.BuildRoot();
var (exit, _, err) = AuditCommandTestHarness.Invoke(root, "audit", "tree");
// System.CommandLine returns non-zero for a missing required option.
Assert.NotEqual(0, exit);
}
[Fact]
public void Tree_HelpText_DescribesExecutionId()
{
var root = AuditCommandTestHarness.BuildRoot();
var output = new StringWriter();
var exit = root.Parse(new[] { "audit", "tree", "--help" })
.Invoke(new InvocationConfiguration { Output = output });
Assert.Equal(0, exit);
Assert.Contains("execution-id", output.ToString());
}
}
@@ -13,6 +13,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Notification;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Communication;
using ZB.MOM.WW.ScadaBridge.HealthMonitoring;
using HealthPage = ZB.MOM.WW.ScadaBridge.CentralUI.Components.Pages.Monitoring.Health;
@@ -232,13 +233,18 @@ public class HealthPageTests : BunitContext
/// <summary>
/// Stand-in for the Site Call Audit actor. Replies to the KPI request with
/// the test's currently-scripted response.
/// the test's currently-scripted response. Also handles the per-node KPI
/// request (T6: M5.2) with an empty-nodes success reply so the Health page
/// can complete initialization without a 30-second Ask timeout.
/// </summary>
private sealed class ScriptedSiteCallAuditActor : ReceiveActor
{
public ScriptedSiteCallAuditActor(HealthPageTests test)
{
Receive<SiteCallKpiRequest>(_ => Sender.Tell(test._siteCallKpiReply));
Receive<PerNodeSiteCallKpiRequest>(req => Sender.Tell(
new PerNodeSiteCallKpiResponse(req.CorrelationId, Success: true, ErrorMessage: null,
Nodes: Array.Empty<SiteCallNodeKpiSnapshot>())));
}
}
}
@@ -153,7 +153,9 @@ public class NotificationKpisPageTests : BunitContext
/// <summary>
/// Stand-in for the notification-outbox actor. Replies to each KPI message
/// type with the test's currently-scripted response.
/// type with the test's currently-scripted response. Also handles the per-node
/// KPI request (T6: M5.2) with an empty-nodes success reply so the page can
/// complete initialization without a 30-second Ask timeout.
/// </summary>
private sealed class ScriptedOutboxActor : ReceiveActor
{
@@ -161,6 +163,9 @@ public class NotificationKpisPageTests : BunitContext
{
Receive<NotificationKpiRequest>(_ => Sender.Tell(test._kpiReply));
Receive<PerSiteNotificationKpiRequest>(_ => Sender.Tell(test._perSiteReply));
Receive<PerNodeNotificationKpiRequest>(req => Sender.Tell(
new PerNodeNotificationKpiResponse(req.CorrelationId, Success: true, ErrorMessage: null,
Nodes: Array.Empty<NodeNotificationKpiSnapshot>())));
}
}
}
@@ -31,9 +31,40 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
/// targeting the AuditLog entity are NOT covered and must never be introduced.
/// Additionally, the scan is line-oriented: DML where the keyword and table name appear
/// on separate lines is an accepted, undetected edge case.
///
/// <b>Allow-list.</b> Two narrow maintenance-path exemptions carry the exact
/// <see cref="AuditPurgeAllowedMarker"/> trailing comment:
/// <list type="bullet">
/// <item><description>
/// M5.5 (T3) — <c>AuditLogRepository.PurgeChannelOlderThanAsync</c>: the
/// one sanctioned batched <c>DELETE TOP (@batch) FROM dbo.AuditLog</c>,
/// running on the purge/maintenance connection.
/// </description></item>
/// <item><description>
/// M5.6 (T5) — <c>AuditLogRepository.BackfillSourceNodeAsync</c>: the
/// one sanctioned batched <c>UPDATE TOP (@batch) dbo.AuditLog SET SourceNode</c>,
/// running on the maintenance connection. The sentinel backfill is a
/// one-time ops procedure; the append-only invariant still applies to all
/// other columns and all other UPDATE forms.
/// </description></item>
/// </list>
/// The allow-list is applied in the file-scan test only
/// (<see cref="ConfigurationDatabase_ShouldNotContainAuditLogMutations"/>) — the
/// raw mutation matcher (<see cref="ContainsAuditLogMutation"/>) is marker-blind,
/// so the matcher's self-tests remain honest and any OTHER UPDATE/DELETE against
/// AuditLog (or any DML lacking the marker) still fails the build.
/// </summary>
public class AuditLogAppendOnlyGuardTests
{
/// <summary>
/// The exact trailing-comment marker that exempts a single sanctioned
/// maintenance-path DML line from the append-only guard. Carried at the END of
/// the SQL constant string in both <c>AuditLogRepository.PurgeChannelOlderThanAsync</c>
/// (M5.5 T3 batched DELETE) and <c>AuditLogRepository.BackfillSourceNodeAsync</c>
/// (M5.6 T5 batched UPDATE). Kept deliberately specific so it cannot be pasted
/// onto an unrelated mutation without a reviewer noticing.
/// </summary>
internal const string AuditPurgeAllowedMarker = "AUDIT-PURGE-ALLOWED";
// ---------------------------------------------------------------------------
// Source root location — same walk-up pattern used by ArchitecturalConstraintTests
// in the Commons.Tests project.
@@ -133,11 +164,38 @@ public class AuditLogAppendOnlyGuardTests
return AuditLogMutationPattern.IsMatch(text);
}
// The DELETE branch tolerates an optional TOP (...) batch-size clause between
// DELETE and the (optional) FROM — e.g. "DELETE TOP (@batch) FROM dbo.AuditLog"
// (the M5.5 T3 batched purge shape). Without this the guard would silently miss a
// batched row DELETE against AuditLog, which is exactly the kind of mutation it
// must catch. The TOP sub-pattern is (?:TOP\s*\(.*?\)\s+)? — optional, lazy inside
// the parens so it never swallows past the matching ')'.
//
// The UPDATE branch similarly tolerates an optional TOP (...) clause between
// UPDATE and (optional schema.) AuditLog — e.g.
// "UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel …"
// (the M5.6 T5 batched backfill shape).
private static readonly Regex AuditLogMutationPattern = new(
@"\bUPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
@"|\bDELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
@"\bUPDATE\s+(?:TOP\s*\(.*?\)\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
@"|\bDELETE\s+(?:TOP\s*\(.*?\)\s+)?(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
/// <summary>
/// Returns <see langword="true"/> when <paramref name="line"/> carries the narrow
/// <see cref="AuditPurgeAllowedMarker"/> exemption. Sanctioned uses are:
/// <list type="bullet">
/// <item><description>M5.5 T3 — the per-channel maintenance-path batched DELETE.</description></item>
/// <item><description>M5.6 T5 — the SourceNode sentinel batched UPDATE.</description></item>
/// </list>
/// A flagged line that lacks the marker is NOT allow-listed. The mutation matcher
/// itself stays marker-blind; the allow-list is applied only by the file-scan test,
/// so the matcher's self-tests still observe the raw mutation.
/// </summary>
/// <param name="line">A single source line already known to contain a mutation.</param>
/// <returns><see langword="true"/> if the line is a sanctioned maintenance-path exemption.</returns>
internal static bool IsAllowListed(string line) =>
line.Contains(AuditPurgeAllowedMarker, StringComparison.Ordinal);
// ---------------------------------------------------------------------------
// Guard test: scan every *.cs file in ConfigurationDatabase (excluding
// Designer/Snapshot EF artefacts and the obj/ directory).
@@ -168,7 +226,7 @@ public class AuditLogAppendOnlyGuardTests
var lines = content.Split('\n');
for (var i = 0; i < lines.Length; i++)
{
if (ContainsAuditLogMutation(lines[i]))
if (ContainsAuditLogMutation(lines[i]) && !IsAllowListed(lines[i]))
{
var relativePath = Path.GetRelativePath(sourceDir, file);
violations.Add($"{relativePath}:{i + 1}: {lines[i].Trim()}");
@@ -179,7 +237,7 @@ public class AuditLogAppendOnlyGuardTests
Assert.True(violations.Count == 0,
"AuditLog append-only guard: found UPDATE/DELETE targeting dbo.AuditLog " +
"in ConfigurationDatabase source. AuditLog is APPEND-ONLY (retention uses " +
"partition-switch DDL, not row DELETE). Violation(s):\n" +
"partition-switch DDL, not row DELETE/UPDATE). Violation(s):\n" +
string.Join("\n", violations));
}
@@ -285,6 +343,27 @@ public class AuditLogAppendOnlyGuardTests
// DELETE FROM [AuditLog] — bracketed table, no schema prefix.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM [AuditLog] WHERE OccurredAtUtc < @threshold;"));
// ---- Batched DELETE TOP (...) forms (M5.5 T3 purge shape) ----
// The matcher must catch a batched DELETE against AuditLog regardless of the
// marker — the allow-list (IsAllowListed) is what forgives the ONE sanctioned
// line, not the matcher.
Assert.True(ContainsAuditLogMutation(
"DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;"));
Assert.True(ContainsAuditLogMutation(
"DELETE TOP (5000) FROM dbo.AuditLog WHERE OccurredAtUtc < @threshold;"));
Assert.True(ContainsAuditLogMutation(
"DELETE TOP(100) FROM [dbo].[AuditLog] WHERE Status = 'Parked';"));
// ---- Batched UPDATE TOP (...) forms (M5.6 T5 backfill shape) ----
// The matcher must also catch a batched UPDATE against AuditLog, regardless of
// the marker — the allow-list is what forgives the ONE sanctioned backfill line.
Assert.True(ContainsAuditLogMutation(
"UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;"));
Assert.True(ContainsAuditLogMutation(
"UPDATE TOP (500) dbo.AuditLog SET SourceNode = 'unknown' WHERE SourceNode IS NULL;"));
Assert.True(ContainsAuditLogMutation(
"UPDATE TOP(100) [dbo].[AuditLog] SET SourceNode = @s WHERE SourceNode IS NULL;"));
}
[Fact]
@@ -315,4 +394,75 @@ public class AuditLogAppendOnlyGuardTests
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.SiteCalls WHERE TerminalAtUtc < @cutoff;"));
}
// ---------------------------------------------------------------------------
// Allow-list self-tests (M5.5 T3 / M5.6 T5) — prove the narrow exemption only
// forgives the marked maintenance-path DML and still blocks everything else.
// ---------------------------------------------------------------------------
[Fact]
public void AllowList_ForgivesMarkedPurgeDelete_ButMatcherStillTrips()
{
// The sanctioned per-channel purge DELETE — verbatim shape from
// AuditLogRepository.PurgeChannelOlderThanAsync, carrying the trailing marker.
const string sanctioned =
"\"DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;\"; " +
"// AUDIT-PURGE-ALLOWED: per-channel retention override (M5.5 T3), maintenance path";
// The raw matcher STILL sees the mutation (the matcher is marker-blind) ...
Assert.True(ContainsAuditLogMutation(sanctioned));
// ... but the allow-list forgives it because of the trailing marker.
Assert.True(IsAllowListed(sanctioned));
}
[Fact]
public void AllowList_ForgivesMarkedBackfillUpdate_ButMatcherStillTrips()
{
// The sanctioned SourceNode sentinel backfill UPDATE — verbatim shape from
// AuditLogRepository.BackfillSourceNodeAsync, carrying the trailing marker.
const string sanctioned =
"\"UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;\"; " +
"// AUDIT-PURGE-ALLOWED: SourceNode sentinel backfill (M5.6 T5), maintenance path";
// The raw matcher STILL sees the mutation (the matcher is marker-blind) ...
Assert.True(ContainsAuditLogMutation(sanctioned));
// ... but the allow-list forgives it because of the trailing marker.
Assert.True(IsAllowListed(sanctioned));
}
[Fact]
public void AllowList_DoesNotForgive_UnmarkedStrayDelete()
{
// A stray DELETE against AuditLog WITHOUT the marker — exactly the kind of
// regression the guard exists to catch. It must be flagged (matcher) AND not
// forgiven (allow-list), so the file-scan test would record it as a violation.
const string stray = "DELETE FROM dbo.AuditLog WHERE Status = 'Parked';";
Assert.True(ContainsAuditLogMutation(stray));
Assert.False(IsAllowListed(stray),
"A DELETE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
}
[Fact]
public void AllowList_DoesNotForgive_UnmarkedStrayUpdate()
{
// A stray UPDATE against AuditLog WITHOUT the marker — must still trip the guard.
const string stray = "UPDATE dbo.AuditLog SET Status = 'Corrected' WHERE EventId = @id;";
Assert.True(ContainsAuditLogMutation(stray));
Assert.False(IsAllowListed(stray),
"An UPDATE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
}
[Fact]
public void AllowList_DoesNotForgive_BatchedUpdateWithoutMarker()
{
// A batched UPDATE TOP ... AuditLog without the marker — the TOP clause variant
// must also be caught and not forgiven without the explicit marker.
const string stray = "UPDATE TOP (500) dbo.AuditLog SET SourceNode = 'unknown' WHERE SourceNode IS NULL;";
Assert.True(ContainsAuditLogMutation(stray));
Assert.False(IsAllowListed(stray),
"A batched UPDATE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
}
}
@@ -0,0 +1,237 @@
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Repositories;
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests.Migrations;
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests.Maintenance;
/// <summary>
/// Integration tests for <see cref="AuditLogRepository.BackfillSourceNodeAsync"/>
/// (M5.6 T5 — SourceNode sentinel backfill).
///
/// <para>
/// These tests exercise the real <see cref="AuditLogRepository"/> against a
/// per-class <see cref="MsSqlMigrationFixture"/> database, mirroring the
/// style of <c>PartitionPurgeTests</c>. All tests are guarded with
/// <c>[SkippableFact]</c> and skipped when the MSSQL container is absent.
/// </para>
/// </summary>
public class BackfillSourceNodeTests : IClassFixture<MsSqlMigrationFixture>
{
private readonly MsSqlMigrationFixture _fixture;
public BackfillSourceNodeTests(MsSqlMigrationFixture fixture)
{
_fixture = fixture;
}
private ScadaBridgeDbContext CreateContext() =>
new(new DbContextOptionsBuilder<ScadaBridgeDbContext>()
.UseSqlServer(_fixture.ConnectionString).Options);
private AuditLogRepository CreateRepo(ScadaBridgeDbContext ctx) => new(ctx);
// ------------------------------------------------------------------
// Seed helper: direct INSERT bypassing the writer role, same pattern
// as PartitionPurgeTests.DirectInsertAsync.
// ------------------------------------------------------------------
private async Task SeedRowAsync(
SqlConnection conn,
Guid eventId,
DateTime occurredAtUtc,
string? sourceNode)
{
await using var cmd = conn.CreateCommand();
// Supply SourceNode explicitly (NULL or a value) so the test controls
// which rows are eligible for backfill.
cmd.CommandText = @"
INSERT INTO dbo.AuditLog
(EventId, OccurredAtUtc, Actor, Action, Outcome, Category, Target, SourceNode, CorrelationId, DetailsJson)
VALUES
(@EventId, @OccurredAtUtc, NULL, 'ApiOutbound.ApiCall', 'Success', 'ApiOutbound', NULL, @SourceNode, NULL,
@DetailsJson);";
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
var occurredParam = cmd.Parameters.Add("@OccurredAtUtc", System.Data.SqlDbType.DateTime2);
occurredParam.Scale = 7;
occurredParam.Value = occurredAtUtc;
var sourceNodeParam = cmd.Parameters.Add("@SourceNode", System.Data.SqlDbType.VarChar, 64);
sourceNodeParam.Value = (object?)sourceNode ?? DBNull.Value;
var detailsJson =
"{\"channel\":\"ApiOutbound\",\"kind\":\"ApiCall\",\"status\":\"Delivered\"," +
"\"payloadTruncated\":false}";
cmd.Parameters.Add("@DetailsJson", System.Data.SqlDbType.NVarChar, -1).Value = detailsJson;
await cmd.ExecuteNonQueryAsync();
}
private async Task<string?> ReadSourceNodeAsync(SqlConnection conn, Guid eventId)
{
await using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT SourceNode FROM dbo.AuditLog WHERE EventId = @EventId;";
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
var raw = await cmd.ExecuteScalarAsync();
return raw == DBNull.Value ? null : (string?)raw;
}
// ------------------------------------------------------------------
// 1. SetsNullRowsBeforeThreshold
// ------------------------------------------------------------------
[SkippableFact]
public async Task BackfillSourceNode_SetsNullRowsBeforeThreshold()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
var eligibleId = Guid.NewGuid(); // NULL, occurred before threshold
var tooNewId = Guid.NewGuid(); // NULL, occurred after threshold
await using var seedConn = _fixture.OpenConnection();
await SeedRowAsync(seedConn, eligibleId,
new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
await SeedRowAsync(seedConn, tooNewId,
new DateTime(2026, 4, 1, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
await using var ctx = CreateContext();
var repo = CreateRepo(ctx);
var rows = await repo.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
Assert.True(rows >= 1, $"Expected at least 1 row updated; got {rows}.");
// eligible row: must now have the sentinel
var eligibleNode = await ReadSourceNodeAsync(seedConn, eligibleId);
Assert.Equal("unknown", eligibleNode);
// too-new row: must still be NULL
var tooNewNode = await ReadSourceNodeAsync(seedConn, tooNewId);
Assert.Null(tooNewNode);
}
// ------------------------------------------------------------------
// 2. LeavesNonNullRowsUntouched
// ------------------------------------------------------------------
[SkippableFact]
public async Task BackfillSourceNode_LeavesNonNullRowsUntouched()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
var alreadySetId = Guid.NewGuid(); // already has a SourceNode value
await using var seedConn = _fixture.OpenConnection();
await SeedRowAsync(seedConn, alreadySetId,
new DateTime(2026, 1, 10, 0, 0, 0, DateTimeKind.Utc), sourceNode: "node-a");
await using var ctx = CreateContext();
var repo = CreateRepo(ctx);
await repo.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
// "node-a" must still be "node-a", not overwritten
var node = await ReadSourceNodeAsync(seedConn, alreadySetId);
Assert.Equal("node-a", node);
}
// ------------------------------------------------------------------
// 3. Idempotent_SecondRunUpdatesZeroRows
// ------------------------------------------------------------------
[SkippableFact]
public async Task BackfillSourceNode_Idempotent_SecondRunUpdatesZeroRows()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
var idempotentId = Guid.NewGuid();
await using var seedConn = _fixture.OpenConnection();
await SeedRowAsync(seedConn, idempotentId,
new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
await using var ctx1 = CreateContext();
var repo1 = CreateRepo(ctx1);
var firstRun = await repo1.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
Assert.True(firstRun >= 1, "First run should update at least 1 row.");
// Second run: no NULL rows remain for this threshold — must update 0.
await using var ctx2 = CreateContext();
var repo2 = CreateRepo(ctx2);
var secondRun = await repo2.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
// The second run must not update the already-sentinel row again.
// We cannot assert exactly 0 because other tests share the same fixture DB
// and may have left unrelated NULL rows; but the idempotentId row must not
// have been touched (it already has "unknown", so the WHERE SourceNode IS NULL
// filter excludes it).
var node = await ReadSourceNodeAsync(seedConn, idempotentId);
Assert.Equal("unknown", node);
// The second run returning 0 would be true if no other NULL rows exist —
// we assert the contract from the repo's perspective by checking the row.
_ = secondRun; // acknowledged: value consumed
}
// ------------------------------------------------------------------
// 4. CustomSentinelIsWritten
// ------------------------------------------------------------------
[SkippableFact]
public async Task BackfillSourceNode_CustomSentinel_IsWritten()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var before = new DateTime(2026, 6, 1, 0, 0, 0, DateTimeKind.Utc);
var customId = Guid.NewGuid();
await using var seedConn = _fixture.OpenConnection();
await SeedRowAsync(seedConn, customId,
new DateTime(2026, 2, 5, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
await using var ctx = CreateContext();
var repo = CreateRepo(ctx);
await repo.BackfillSourceNodeAsync("pre-feature", before, batchSize: 1000);
var node = await ReadSourceNodeAsync(seedConn, customId);
Assert.Equal("pre-feature", node);
}
// ------------------------------------------------------------------
// 5. ArgumentValidation
// ------------------------------------------------------------------
[Fact]
public async Task BackfillSourceNode_EmptySentinel_Throws()
{
// Guard fires even without a DB connection — no Skip needed.
// Use a null/empty context via a degenerate connection string; the
// argument check fires before any SQL runs.
await using var ctx = new ScadaBridgeDbContext(
new DbContextOptionsBuilder<ScadaBridgeDbContext>()
.UseSqlServer("Server=.;Database=dummy;Connect Timeout=0;")
.Options);
var repo = new AuditLogRepository(ctx);
await Assert.ThrowsAsync<ArgumentException>(
() => repo.BackfillSourceNodeAsync("", DateTime.UtcNow, 1000));
}
[Fact]
public async Task BackfillSourceNode_ZeroBatchSize_Throws()
{
await using var ctx = new ScadaBridgeDbContext(
new DbContextOptionsBuilder<ScadaBridgeDbContext>()
.UseSqlServer("Server=.;Database=dummy;Connect Timeout=0;")
.Options);
var repo = new AuditLogRepository(ctx);
await Assert.ThrowsAsync<ArgumentOutOfRangeException>(
() => repo.BackfillSourceNodeAsync("unknown", DateTime.UtcNow, 0));
}
}
@@ -0,0 +1,128 @@
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Notifications;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Repositories;
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
// Coverage for per-node KPI aggregation in the Notification Outbox repository
// (T6: M5.2 per-node stuck-count KPIs).
public class NotificationOutboxRepositoryPerNodeKpiTests
{
private static ScadaBridgeDbContext NewContext() => SqliteTestHelper.CreateInMemoryContext();
private static Notification NewNotification(
string sourceSiteId,
NotificationStatus status,
DateTimeOffset createdAt,
DateTimeOffset? deliveredAt = null,
string? sourceNode = null)
{
return new Notification(
Guid.NewGuid().ToString(), NotificationType.Email, "Ops List", "Subject", "Body", sourceSiteId)
{
Status = status,
CreatedAt = createdAt,
DeliveredAt = deliveredAt,
SourceNode = sourceNode,
};
}
[Fact]
public async Task ComputePerNodeKpisAsync_AggregatesMetricsPerNode()
{
await using var ctx = NewContext();
var now = DateTimeOffset.UtcNow;
// node-a: 1 pending (stuck, created 20m ago), 1 parked
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
createdAt: now.AddMinutes(-20), sourceNode: "node-a"));
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Parked,
createdAt: now.AddMinutes(-5), sourceNode: "node-a"));
// node-b: 1 delivered in-window, 1 pending (fresh)
ctx.Notifications.Add(NewNotification("plant-b", NotificationStatus.Delivered,
createdAt: now.AddHours(-2), deliveredAt: now.AddMinutes(-2), sourceNode: "node-b"));
ctx.Notifications.Add(NewNotification("plant-b", NotificationStatus.Pending,
createdAt: now.AddMinutes(-1), sourceNode: "node-b"));
// NULL SourceNode — must be excluded from per-node results
ctx.Notifications.Add(NewNotification("plant-c", NotificationStatus.Pending,
createdAt: now.AddMinutes(-5), sourceNode: null));
await ctx.SaveChangesAsync();
var repo = new NotificationOutboxRepository(ctx);
var result = await repo.ComputePerNodeKpisAsync(
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
// Only node-a and node-b — the null-node row is excluded.
Assert.Equal(2, result.Count);
var a = result.Single(n => n.SourceNode == "node-a");
Assert.Equal(1, a.QueueDepth);
Assert.Equal(1, a.StuckCount);
Assert.Equal(1, a.ParkedCount);
Assert.Equal(0, a.DeliveredLastInterval);
Assert.NotNull(a.OldestPendingAge);
var b = result.Single(n => n.SourceNode == "node-b");
Assert.Equal(1, b.QueueDepth);
Assert.Equal(0, b.StuckCount);
Assert.Equal(0, b.ParkedCount);
Assert.Equal(1, b.DeliveredLastInterval);
Assert.NotNull(b.OldestPendingAge);
}
[Fact]
public async Task ComputePerNodeKpisAsync_ExcludesNullSourceNode()
{
await using var ctx = NewContext();
var now = DateTimeOffset.UtcNow;
// Only null-node rows — result must be empty.
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
createdAt: now.AddMinutes(-5), sourceNode: null));
await ctx.SaveChangesAsync();
var repo = new NotificationOutboxRepository(ctx);
var result = await repo.ComputePerNodeKpisAsync(
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
Assert.Empty(result);
}
[Fact]
public async Task ComputePerNodeKpisAsync_ReturnsEmpty_WhenNoNotifications()
{
await using var ctx = NewContext();
var repo = new NotificationOutboxRepository(ctx);
var result = await repo.ComputePerNodeKpisAsync(
DateTimeOffset.UtcNow, DateTimeOffset.UtcNow.AddMinutes(-30));
Assert.Empty(result);
}
[Fact]
public async Task ComputePerNodeKpisAsync_OldestPendingAge_ReflectsOlderRow()
{
await using var ctx = NewContext();
var now = DateTimeOffset.UtcNow;
// node-a: pending 90m ago, retrying 40m ago.
// OldestPendingAge must reflect the 90m row.
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
createdAt: now.AddMinutes(-90), sourceNode: "node-a"));
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Retrying,
createdAt: now.AddMinutes(-40), sourceNode: "node-a"));
await ctx.SaveChangesAsync();
var repo = new NotificationOutboxRepository(ctx);
var result = await repo.ComputePerNodeKpisAsync(
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
var a = result.Single(n => n.SourceNode == "node-a");
Assert.Equal(2, a.QueueDepth);
Assert.Equal(2, a.StuckCount);
Assert.NotNull(a.OldestPendingAge);
Assert.True(a.OldestPendingAge >= TimeSpan.FromMinutes(85),
$"expected OldestPendingAge >= 85m, got {a.OldestPendingAge}");
Assert.True(a.OldestPendingAge < TimeSpan.FromMinutes(95),
$"expected OldestPendingAge < 95m, got {a.OldestPendingAge}");
}
}
@@ -497,6 +497,54 @@ public class SiteCallAuditRepositoryTests : IClassFixture<MsSqlMigrationFixture>
Assert.Null(b.OldestPendingAge);
}
[SkippableFact]
public async Task ComputePerNodeKpisAsync_ScopesCountsToEachNode()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Use unique site + node combos to isolate from other tests running
// concurrently on the shared MsSql fixture.
var nodeId = "node-b3-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var nodeB = nodeId + "-b";
await using var context = CreateContext();
var repo = new SiteCallAuditRepository(context);
var now = DateTime.UtcNow;
var stuckCutoff = now.AddMinutes(-10);
var intervalSince = now.AddHours(-1);
// nodeId: 2 buffered (one stuck), 1 parked.
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
createdAtUtc: now.AddMinutes(-30), sourceNode: nodeId));
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
createdAtUtc: now.AddMinutes(-2), sourceNode: nodeId));
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Parked",
createdAtUtc: now.AddMinutes(-5), terminal: true, sourceNode: nodeId));
// nodeB: 1 delivered within interval only.
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Delivered",
createdAtUtc: now.AddMinutes(-4), updatedAtUtc: now.AddMinutes(-1),
terminal: true, terminalAtUtc: now.AddMinutes(-1), sourceNode: nodeB));
// Null SourceNode row — must NOT appear in per-node results.
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
createdAtUtc: now.AddMinutes(-3), sourceNode: null));
var perNode = await repo.ComputePerNodeKpisAsync(stuckCutoff, intervalSince);
var na = Assert.Single(perNode, n => n.SourceNode == nodeId);
Assert.Equal(2, na.BufferedCount);
Assert.Equal(1, na.ParkedCount);
Assert.Equal(1, na.StuckCount);
Assert.NotNull(na.OldestPendingAge);
var nb = Assert.Single(perNode, n => n.SourceNode == nodeB);
Assert.Equal(0, nb.BufferedCount);
Assert.Equal(1, nb.DeliveredLastInterval);
Assert.Null(nb.OldestPendingAge);
// Null-node row must be absent.
Assert.DoesNotContain(perNode, n => n.SourceNode is null);
}
// --- helpers ------------------------------------------------------------
private ScadaBridgeDbContext CreateContext()
@@ -1022,4 +1022,429 @@ public class AuditWriteMiddlewareTests
var evt = Assert.Single(writer.Events);
Assert.Equal(requestJson, evt.RequestSummary);
}
// ---------------------------------------------------------------------
// M5.3 (T7) Increment 1: Request headers in Extra JSON
// Request headers are captured into the Extra JSON object alongside the
// existing remoteIp / userAgent fields. Sensitive headers (e.g.
// Authorization, X-Api-Key) are redacted to "<redacted>" using the same
// HeaderRedactList as ScadaBridgeAuditRedactor.
// ---------------------------------------------------------------------
[Fact]
public async Task RequestHeaders_AppearInExtra_UnderRequestHeadersKey()
{
var writer = new RecordingAuditWriter();
var ctx = BuildContext();
ctx.Request.Headers["X-Custom-Header"] = "custom-value";
var mw = CreateMiddleware(_ =>
{
ctx.Response.StatusCode = 200;
return Task.CompletedTask;
}, writer);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
Assert.NotNull(evt.Extra);
using var doc = JsonDocument.Parse(evt.Extra!);
var root = doc.RootElement;
// Extra must carry a requestHeaders object.
Assert.True(root.TryGetProperty("requestHeaders", out var headers),
"Extra JSON must contain a 'requestHeaders' property");
Assert.Equal(JsonValueKind.Object, headers.ValueKind);
// The non-sensitive custom header must appear unredacted.
Assert.True(headers.TryGetProperty("X-Custom-Header", out var customVal),
"requestHeaders must contain 'X-Custom-Header'");
Assert.Equal("custom-value", customVal.GetString());
}
[Fact]
public async Task RequestHeaders_AuthorizationHeader_IsRedacted()
{
// Authorization is in the default HeaderRedactList and must appear as
// "<redacted>" rather than the real token value.
var writer = new RecordingAuditWriter();
var ctx = BuildContext();
ctx.Request.Headers["Authorization"] = "Bearer secret-token-abc";
var mw = CreateMiddleware(_ =>
{
ctx.Response.StatusCode = 200;
return Task.CompletedTask;
}, writer);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
Assert.NotNull(evt.Extra);
using var doc = JsonDocument.Parse(evt.Extra!);
var root = doc.RootElement;
var headers = root.GetProperty("requestHeaders");
Assert.True(headers.TryGetProperty("Authorization", out var authVal),
"requestHeaders must contain 'Authorization'");
Assert.Equal("<redacted>", authVal.GetString());
}
[Fact]
public async Task RequestHeaders_XApiKeyHeader_IsRedacted()
{
// X-Api-Key is in the default HeaderRedactList and must be redacted.
var writer = new RecordingAuditWriter();
var ctx = BuildContext();
ctx.Request.Headers["X-Api-Key"] = "sbk_12345_secretkey";
var mw = CreateMiddleware(_ =>
{
ctx.Response.StatusCode = 200;
return Task.CompletedTask;
}, writer);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
Assert.NotNull(evt.Extra);
using var doc = JsonDocument.Parse(evt.Extra!);
var root = doc.RootElement;
var headers = root.GetProperty("requestHeaders");
Assert.True(headers.TryGetProperty("X-Api-Key", out var keyVal));
Assert.Equal("<redacted>", keyVal.GetString());
}
[Fact]
public async Task RequestHeaders_CustomRedactListEntry_IsRedacted()
{
// A non-default entry added to HeaderRedactList must also be redacted.
var opts = new AuditLogOptions
{
HeaderRedactList = new List<string>
{
"Authorization", "X-Api-Key", "Cookie", "Set-Cookie",
"X-Internal-Secret", // custom addition
},
};
var writer = new RecordingAuditWriter();
var ctx = BuildContext();
ctx.Request.Headers["X-Internal-Secret"] = "my-secret-value";
ctx.Request.Headers["X-Safe-Header"] = "safe-value";
var mw = CreateMiddleware(
_ =>
{
ctx.Response.StatusCode = 200;
return Task.CompletedTask;
},
writer,
options: opts);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
using var doc = JsonDocument.Parse(evt.Extra!);
var headers = doc.RootElement.GetProperty("requestHeaders");
Assert.Equal("<redacted>", headers.GetProperty("X-Internal-Secret").GetString());
Assert.Equal("safe-value", headers.GetProperty("X-Safe-Header").GetString());
}
[Fact]
public async Task RequestHeaders_Redaction_IsCaseInsensitive()
{
// HeaderRedactList match must be case-insensitive (mirrors the
// ScadaBridgeAuditRedactor behaviour — the redact set uses
// OrdinalIgnoreCase).
var writer = new RecordingAuditWriter();
var ctx = BuildContext();
// Vary the casing from the list entry ("Authorization").
ctx.Request.Headers["authorization"] = "Bearer lower-case-token";
var mw = CreateMiddleware(_ =>
{
ctx.Response.StatusCode = 200;
return Task.CompletedTask;
}, writer);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
using var doc = JsonDocument.Parse(evt.Extra!);
var headers = doc.RootElement.GetProperty("requestHeaders");
// ASP.NET Core normalises the header name to "authorization" in the dict;
// the redact set (OrdinalIgnoreCase) must still match it.
Assert.Equal("<redacted>", headers.GetProperty("authorization").GetString());
}
// ---------------------------------------------------------------------
// M5.3 (T7) Increment 2: AuditInboundCeilingHits counter
// When request OR response exceeds InboundMaxBytes, the middleware
// increments IAuditInboundCeilingHitsCounter once per request.
// ---------------------------------------------------------------------
/// <summary>
/// In-memory <see cref="IAuditInboundCeilingHitsCounter"/> that records
/// every <see cref="Increment"/> call.
/// </summary>
private sealed class RecordingCeilingHitsCounter : ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter
{
private int _count;
public int Count => Volatile.Read(ref _count);
public void Increment() => Interlocked.Increment(ref _count);
}
private static AuditWriteMiddleware CreateMiddlewareWithCounter(
RequestDelegate next,
ICentralAuditWriter writer,
AuditLogOptions? options,
ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter counter) =>
new(
next,
writer,
NullLogger<AuditWriteMiddleware>.Instance,
new StaticAuditLogOptionsMonitor(options ?? new AuditLogOptions()),
actorAccessor: null,
ceilingHitsCounter: counter);
[Fact]
public async Task RequestBody_AboveInboundMaxBytes_IncrementsCeilingHitsCounter()
{
const int cap = 1024;
var bigBody = new string('x', cap + 100);
var writer = new RecordingAuditWriter();
var counter = new RecordingCeilingHitsCounter();
var ctx = BuildContext(body: bigBody);
var mw = CreateMiddlewareWithCounter(
hc =>
{
hc.Response.StatusCode = 200;
return Task.CompletedTask;
},
writer,
options: new AuditLogOptions { InboundMaxBytes = cap },
counter: counter);
await mw.InvokeAsync(ctx);
Assert.Equal(1, counter.Count);
// Verify the truncation did happen to confirm ceiling was hit.
var evt = Assert.Single(writer.Events);
Assert.True(evt.PayloadTruncated);
}
[Fact]
public async Task ResponseBody_AboveInboundMaxBytes_IncrementsCeilingHitsCounter()
{
const int cap = 1024;
var bigResponse = new string('y', cap + 100);
var writer = new RecordingAuditWriter();
var counter = new RecordingCeilingHitsCounter();
var ctx = BuildContext();
ctx.Response.Body = new MemoryStream();
var mw = CreateMiddlewareWithCounter(
async hc =>
{
hc.Response.StatusCode = 200;
await hc.Response.WriteAsync(bigResponse);
},
writer,
options: new AuditLogOptions { InboundMaxBytes = cap },
counter: counter);
await mw.InvokeAsync(ctx);
Assert.Equal(1, counter.Count);
var evt = Assert.Single(writer.Events);
Assert.True(evt.PayloadTruncated);
}
[Fact]
public async Task NormalRequest_WithinCap_DoesNotIncrementCeilingHitsCounter()
{
var writer = new RecordingAuditWriter();
var counter = new RecordingCeilingHitsCounter();
var smallBody = "{\"ok\":true}";
var ctx = BuildContext(body: smallBody);
// Cap is well above the body size.
var mw = CreateMiddlewareWithCounter(
hc =>
{
hc.Response.StatusCode = 200;
return Task.CompletedTask;
},
writer,
options: new AuditLogOptions { InboundMaxBytes = 8192 },
counter: counter);
await mw.InvokeAsync(ctx);
Assert.Equal(0, counter.Count);
}
// ---------------------------------------------------------------------
// M5.3 (T7) Increment 3: SkipBodyCapture per-method opt-out
// A target with SkipBodyCapture=true produces an audit row with
// headers/metadata but empty/omitted body. A normal target still captures.
// ---------------------------------------------------------------------
private static DefaultHttpContext BuildContextWithRoute(
string methodName,
string? body = null)
{
var ctx = new DefaultHttpContext();
ctx.Request.Method = "POST";
ctx.Request.Path = $"/api/{methodName}";
ctx.Request.RouteValues["methodName"] = methodName;
ctx.Connection.RemoteIpAddress = System.Net.IPAddress.Parse("10.0.0.1");
if (body is not null)
{
var bytes = Encoding.UTF8.GetBytes(body);
ctx.Request.Body = new MemoryStream(bytes);
ctx.Request.ContentLength = bytes.Length;
ctx.Request.ContentType = "application/json";
}
return ctx;
}
[Fact]
public async Task SkipBodyCapture_True_AuditRowEmitted_ButBodyIsNull()
{
// A target with SkipBodyCapture=true must produce an audit row (the
// row must not be suppressed entirely) but RequestSummary and
// ResponseSummary must both be null — only the body is omitted.
var writer = new RecordingAuditWriter();
var opts = new AuditLogOptions
{
PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
{
["secret-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
{
SkipBodyCapture = true,
},
},
};
var ctx = BuildContextWithRoute("secret-method", body: "{\"sensitive\":\"data\"}");
var mw = CreateMiddleware(
async hc =>
{
hc.Response.StatusCode = 200;
await hc.Response.WriteAsync("{\"result\":\"secret\"}");
},
writer,
options: opts);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
// Row IS emitted — only the body content is suppressed.
Assert.Equal("secret-method", evt.Target);
Assert.Equal(AuditStatus.Delivered, evt.Status);
// Bodies are null — SkipBodyCapture stripped them.
Assert.Null(evt.RequestSummary);
Assert.Null(evt.ResponseSummary);
// Headers / metadata are still present.
Assert.NotNull(evt.Extra);
using var doc = JsonDocument.Parse(evt.Extra!);
Assert.True(doc.RootElement.TryGetProperty("requestHeaders", out _),
"Headers must be present even when body capture is skipped");
Assert.Equal(200, evt.HttpStatus);
}
[Fact]
public async Task SkipBodyCapture_True_CeilingHitsCounter_NotIncremented()
{
// When SkipBodyCapture=true the body is never measured against the cap;
// the counter must NOT be bumped even if the body would have exceeded it.
var writer = new RecordingAuditWriter();
var counter = new RecordingCeilingHitsCounter();
const int cap = 64;
var bigBody = new string('z', cap + 1000);
var opts = new AuditLogOptions
{
InboundMaxBytes = cap,
PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
{
["large-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
{
SkipBodyCapture = true,
},
},
};
var ctx = BuildContextWithRoute("large-method", body: bigBody);
var mw = CreateMiddlewareWithCounter(
hc =>
{
hc.Response.StatusCode = 200;
return Task.CompletedTask;
},
writer,
options: opts,
counter: counter);
await mw.InvokeAsync(ctx);
Assert.Equal(0, counter.Count);
}
[Fact]
public async Task SkipBodyCapture_False_NormalTarget_StillCapturesBody()
{
// Regression: a target WITHOUT SkipBodyCapture (or with SkipBodyCapture=false)
// must still capture the body normally.
var writer = new RecordingAuditWriter();
var opts = new AuditLogOptions
{
PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
{
["normal-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
{
SkipBodyCapture = false,
},
},
};
var requestJson = "{\"a\":1}";
var ctx = BuildContextWithRoute("normal-method", body: requestJson);
var mw = CreateMiddleware(
async hc =>
{
hc.Response.StatusCode = 200;
await hc.Response.WriteAsync("{\"result\":1}");
},
writer,
options: opts);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
Assert.Equal(requestJson, evt.RequestSummary);
Assert.Equal("{\"result\":1}", evt.ResponseSummary);
}
[Fact]
public async Task SkipBodyCapture_NoOverride_DefaultTarget_StillCapturesBody()
{
// A target with no per-target override at all must still capture the body —
// SkipBodyCapture defaults to false and must not suppress capture.
var writer = new RecordingAuditWriter();
var requestJson = "{\"x\":99}";
var ctx = BuildContext(body: requestJson);
var mw = CreateMiddleware(
async hc =>
{
hc.Response.StatusCode = 200;
await hc.Response.WriteAsync("{\"y\":99}");
},
writer);
await mw.InvokeAsync(ctx);
var evt = Assert.Single(writer.Events);
Assert.Equal(requestJson, evt.RequestSummary);
Assert.Equal("{\"y\":99}", evt.ResponseSummary);
}
}
@@ -1,6 +1,7 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.InboundApi;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;
@@ -139,6 +140,116 @@ public class RouteHelperTests
Assert.Equal("read failed", ex.Message);
}
// --- WaitForAttribute (spec §6) ---
[Fact]
public async Task WaitForAttribute_Matched_ReturnsTrue()
{
SiteResolves("inst-1", "SiteA");
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: true, Value: true, Quality: "Good", TimedOut: false,
Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));
var matched = await CreateHelper().To("inst-1")
.WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));
Assert.True(matched);
}
[Fact]
public async Task WaitForAttribute_TimedOut_ReturnsFalse()
{
SiteResolves("inst-1", "SiteA");
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: false, Value: null, Quality: null, TimedOut: true,
Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));
var matched = await CreateHelper().To("inst-1")
.WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));
Assert.False(matched);
}
[Fact]
public async Task WaitForAttribute_RoutingFailure_ThrowsInvalidOperationException()
{
// Success=false is a routing-level outcome (e.g. instance not found on the
// site), distinct from the wait outcome (Matched/TimedOut).
SiteResolves("inst-1", "SiteA");
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: false, Value: null, Quality: null, TimedOut: false,
Success: false, ErrorMessage: "instance not found", DateTimeOffset.UtcNow));
var ex = await Assert.ThrowsAsync<InvalidOperationException>(
() => CreateHelper().To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30)));
Assert.Equal("instance not found", ex.Message);
}
[Fact]
public async Task WaitForAttribute_EncodesTargetValue_OnRequest()
{
// Value-equality only across the wire: the target value is encoded via the
// canonical AttributeValueCodec, identical to how attribute values travel.
SiteResolves("inst-1", "SiteA");
RouteToWaitForAttributeRequest? captured = null;
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Do<RouteToWaitForAttributeRequest>(r => captured = r), Arg.Any<CancellationToken>())
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: true, Value: true, Quality: "Good", TimedOut: false,
Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));
await CreateHelper().To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));
Assert.NotNull(captured);
Assert.Equal("Flag", captured!.AttributeName);
Assert.Equal(TimeSpan.FromSeconds(30), captured.Timeout);
Assert.Equal(AttributeValueCodec.Encode(true), captured.TargetValueEncoded);
Assert.True(Guid.TryParse(captured.CorrelationId, out _));
}
[Fact]
public async Task WaitForAttribute_WithNoExplicitToken_InheritsMethodDeadlineToken()
{
SiteResolves("inst-1", "SiteA");
using var deadline = new CancellationTokenSource();
CancellationToken seen = default;
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Do<CancellationToken>(t => seen = t))
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: false, Value: null, Quality: null, TimedOut: true,
Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));
var bound = CreateHelper().WithDeadline(deadline.Token);
await bound.To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));
Assert.Equal(deadline.Token, seen);
}
[Fact]
public async Task WaitForAttribute_WithParentExecutionId_CarriesItOnRequest()
{
SiteResolves("inst-1", "SiteA");
var inboundExecutionId = Guid.NewGuid();
RouteToWaitForAttributeRequest? captured = null;
_router.RouteToWaitForAttributeAsync("SiteA", Arg.Do<RouteToWaitForAttributeRequest>(r => captured = r), Arg.Any<CancellationToken>())
.Returns(ci => new RouteToWaitForAttributeResponse(
((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
Matched: true, Value: true, Quality: "Good", TimedOut: false,
Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));
var bound = CreateHelper().WithParentExecutionId(inboundExecutionId);
await bound.To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));
Assert.NotNull(captured);
Assert.Equal(inboundExecutionId, captured!.ParentExecutionId);
}
// --- SetAttribute(s) ---
[Fact]
@@ -89,6 +89,14 @@ public class SiteAuditPushFlowTests : TestKit
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
=> throw new NotSupportedException();
public Task<long> PurgeChannelOlderThanAsync(
string channel, DateTime threshold, int batchSize, CancellationToken ct = default)
=> throw new NotSupportedException();
public Task<long> BackfillSourceNodeAsync(
string sentinel, DateTime before, int batchSize, CancellationToken ct = default)
=> throw new NotSupportedException();
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
DateTime threshold, CancellationToken ct = default)
=> throw new NotSupportedException();
@@ -610,4 +610,366 @@ public class AuditEndpointsTests
Assert.NotNull(result);
Assert.Equal(new[] { "plant-a" }, result!.SourceSiteIds);
}
// ─────────────────────────────────────────────────────────────────────
// /api/audit/tree
// ─────────────────────────────────────────────────────────────────────
/// <summary>
/// Builds a TestServer with the audit-log endpoints wired up and the repository
/// stub returning the supplied <paramref name="treeNodes"/> for
/// <c>GetExecutionTreeAsync</c>.
/// </summary>
private static async Task<(HttpClient Client, IAuditLogRepository Repo, IHost Host)> BuildHostWithTreeAsync(
string[] roles,
IReadOnlyList<ExecutionTreeNode>? treeNodes = null)
{
var repo = Substitute.For<IAuditLogRepository>();
// Default QueryAsync stub so the shared host initialisation does not fail.
repo.QueryAsync(Arg.Any<AuditLogQueryFilter>(), Arg.Any<AuditLogPaging>(), Arg.Any<CancellationToken>())
.Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));
var returnNodes = treeNodes ?? Array.Empty<ExecutionTreeNode>();
repo.GetExecutionTreeAsync(Arg.Any<Guid>(), Arg.Any<CancellationToken>())
.Returns(Task.FromResult<IReadOnlyList<ExecutionTreeNode>>(returnNodes));
var ldap = Substitute.For<ILdapAuthService>();
ldap.AuthenticateAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>())
.Returns(LdapAuthResult.Success("auditor", "Auditor", new[] { "audit" }));
var roleMapper = Substitute.For<RoleMapper>(Substitute.For<ISecurityRepository>());
roleMapper.MapGroupsToRolesAsync(Arg.Any<IReadOnlyList<string>>(), Arg.Any<CancellationToken>())
.Returns(new RoleMappingResult(roles, Array.Empty<string>(), IsSystemWideDeployment: true));
var hostBuilder = new HostBuilder()
.ConfigureWebHost(web =>
{
web.UseTestServer();
web.ConfigureServices(services =>
{
services.AddRouting();
services.AddSingleton(repo);
services.AddSingleton(ldap);
services.AddSingleton(roleMapper);
});
web.Configure(app =>
{
app.UseRouting();
app.UseEndpoints(endpoints => endpoints.MapAuditAPI());
});
});
var host = await hostBuilder.StartAsync();
return (host.GetTestClient(), repo, host);
}
private static ExecutionTreeNode MakeNode(Guid id, Guid? parentId = null, int rowCount = 2) =>
new ExecutionTreeNode(
ExecutionId: id,
ParentExecutionId: parentId,
RowCount: rowCount,
Channels: new[] { "ApiOutbound" },
Statuses: new[] { "Delivered" },
SourceSiteId: "plant-a",
SourceInstanceId: "inst-1",
FirstOccurredAtUtc: new DateTime(2026, 5, 20, 10, 0, 0, DateTimeKind.Utc),
LastOccurredAtUtc: new DateTime(2026, 5, 20, 10, 1, 0, DateTimeKind.Utc));
[Fact]
public async Task Tree_ValidExecutionId_ReturnsJsonArray()
{
var root = Guid.Parse("aaaaaaaa-0000-0000-0000-000000000001");
var child = Guid.Parse("aaaaaaaa-0000-0000-0000-000000000002");
var nodes = new[]
{
MakeNode(root),
MakeNode(child, parentId: root),
};
var (client, repo, host) = await BuildHostWithTreeAsync(
roles: new[] { "Administrator" },
treeNodes: nodes);
using (host)
{
var response = await client.SendAsync(Get($"/api/audit/tree?executionId={root:D}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
Assert.Equal("application/json", response.Content.Headers.ContentType!.MediaType);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
Assert.Equal(2, doc.RootElement.GetArrayLength());
await repo.Received(1).GetExecutionTreeAsync(root, Arg.Any<CancellationToken>());
}
}
[Fact]
public async Task Tree_RepoReturnsEmpty_ReturnsEmptyArray()
{
var id = Guid.NewGuid();
var (client, _, host) = await BuildHostWithTreeAsync(
roles: new[] { "Administrator" },
treeNodes: Array.Empty<ExecutionTreeNode>());
using (host)
{
var response = await client.SendAsync(Get($"/api/audit/tree?executionId={id:D}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
Assert.Equal(0, doc.RootElement.GetArrayLength());
}
}
[Fact]
public async Task Tree_MissingExecutionId_Returns400()
{
var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
using (host)
{
var response = await client.SendAsync(Get("/api/audit/tree"));
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
}
}
[Fact]
public async Task Tree_InvalidExecutionId_Returns400()
{
var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
using (host)
{
var response = await client.SendAsync(Get("/api/audit/tree?executionId=not-a-guid"));
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
var body = await response.Content.ReadAsStringAsync();
Assert.Contains("BAD_REQUEST", body);
}
}
[Fact]
public async Task Tree_WithoutOperationalAudit_Returns403()
{
var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Designer" });
using (host)
{
var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}"));
Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
}
}
[Fact]
public async Task Tree_WithoutCredentials_Returns401()
{
var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
using (host)
{
var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}", credential: ""));
Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);
}
}
[Fact]
public async Task Tree_ViewerRole_IsAllowed()
{
var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Viewer" });
using (host)
{
var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
}
}
// ─────────────────────────────────────────────────────────────────────
// POST /api/audit/backfill-source-node (M5.6 T5)
// ─────────────────────────────────────────────────────────────────────
private static async Task<(HttpClient Client, IAuditLogRepository Repo, IHost Host)> BuildHostWithBackfillAsync(
string[] roles,
long backfillResult = 42L,
bool ldapSucceeds = true)
{
var repo = Substitute.For<IAuditLogRepository>();
repo.QueryAsync(Arg.Any<AuditLogQueryFilter>(), Arg.Any<AuditLogPaging>(), Arg.Any<CancellationToken>())
.Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));
repo.BackfillSourceNodeAsync(
Arg.Any<string>(), Arg.Any<DateTime>(), Arg.Any<int>(), Arg.Any<CancellationToken>())
.Returns(Task.FromResult(backfillResult));
repo.GetExecutionTreeAsync(Arg.Any<Guid>(), Arg.Any<CancellationToken>())
.Returns(Task.FromResult<IReadOnlyList<ZB.MOM.WW.ScadaBridge.Commons.Types.Audit.ExecutionTreeNode>>(
Array.Empty<ZB.MOM.WW.ScadaBridge.Commons.Types.Audit.ExecutionTreeNode>()));
var ldap = Substitute.For<ILdapAuthService>();
ldap.AuthenticateAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>())
.Returns(ldapSucceeds
? LdapAuthResult.Success("auditor", "Auditor", new[] { "audit" })
: LdapAuthResult.Fail(LdapAuthFailure.BadCredentials));
var roleMapper = Substitute.For<RoleMapper>(Substitute.For<ISecurityRepository>());
roleMapper.MapGroupsToRolesAsync(Arg.Any<IReadOnlyList<string>>(), Arg.Any<CancellationToken>())
.Returns(new RoleMappingResult(roles, Array.Empty<string>(), IsSystemWideDeployment: true));
var hostBuilder = new HostBuilder()
.ConfigureWebHost(web =>
{
web.UseTestServer();
web.ConfigureServices(services =>
{
services.AddRouting();
services.AddSingleton(repo);
services.AddSingleton(ldap);
services.AddSingleton(roleMapper);
});
web.Configure(app =>
{
app.UseRouting();
app.UseEndpoints(endpoints => endpoints.MapAuditAPI());
});
});
var host = await hostBuilder.StartAsync();
return (host.GetTestClient(), repo, host);
}
private static HttpRequestMessage Post(string url, string body, string credential = BasicCredential)
{
var request = new HttpRequestMessage(HttpMethod.Post, url)
{
Content = new StringContent(body, Encoding.UTF8, "application/json"),
};
if (credential.Length > 0)
{
request.Headers.Authorization = new AuthenticationHeaderValue(
"Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes(credential)));
}
return request;
}
[Fact]
public async Task BackfillSourceNode_AdminRole_Returns200WithRowCount()
{
var (client, _, host) = await BuildHostWithBackfillAsync(
roles: new[] { "Administrator" }, backfillResult: 12345L);
using (host)
{
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
var root = doc.RootElement;
Assert.Equal(12345L, root.GetProperty("rowsUpdated").GetInt64());
Assert.Equal("unknown", root.GetProperty("sentinel").GetString());
}
}
[Fact]
public async Task BackfillSourceNode_ViewerRole_Returns403()
{
// Viewer has OperationalAudit but NOT the Admin-only backfill permission.
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Viewer" });
using (host)
{
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}"));
Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
}
}
[Fact]
public async Task BackfillSourceNode_NoCredentials_Returns401()
{
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
using (host)
{
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}",
credential: ""));
Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);
}
}
[Fact]
public async Task BackfillSourceNode_MissingBefore_Returns400()
{
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
using (host)
{
// No "before" field — required.
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"unknown\"}"));
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
}
}
[Fact]
public async Task BackfillSourceNode_InvalidBeforeDate_Returns400()
{
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
using (host)
{
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"unknown\",\"before\":\"not-a-date\"}"));
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
}
}
[Fact]
public async Task BackfillSourceNode_CustomSentinelAndBatch_PassedToRepo()
{
var (client, repo, host) = await BuildHostWithBackfillAsync(
roles: new[] { "Administrator" }, backfillResult: 7L);
using (host)
{
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"sentinel\":\"pre-feature\",\"before\":\"2026-01-01T00:00:00Z\",\"batchSize\":2000}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
await repo.Received(1).BackfillSourceNodeAsync(
"pre-feature",
Arg.Is<DateTime>(d => d.Year == 2026 && d.Month == 1 && d.Day == 1),
2000,
Arg.Any<CancellationToken>());
}
}
[Fact]
public async Task BackfillSourceNode_DefaultSentinel_IsUnknown_WhenOmitted()
{
var (client, repo, host) = await BuildHostWithBackfillAsync(
roles: new[] { "Administrator" }, backfillResult: 0L);
using (host)
{
// Omit "sentinel" — endpoint defaults to "unknown".
var response = await client.SendAsync(Post(
"/api/audit/backfill-source-node",
"{\"before\":\"2026-01-01T00:00:00Z\"}"));
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
await repo.Received(1).BackfillSourceNodeAsync(
"unknown",
Arg.Any<DateTime>(),
Arg.Any<int>(),
Arg.Any<CancellationToken>());
}
}
}
@@ -495,4 +495,50 @@ public class NotificationOutboxActorQueryTests : TestKit
Assert.Contains("db down", response.ErrorMessage);
Assert.Empty(response.Sites);
}
// ── Per-node KPI (T6: M5.2 per-node stuck-count KPIs) ──────────────────
[Fact]
public void PerNodeKpiRequest_RepliesWithPerNodeSnapshots()
{
_repository.ComputePerNodeKpisAsync(
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>())
.Returns(new List<NodeNotificationKpiSnapshot>
{
new("node-a", QueueDepth: 3, StuckCount: 1, ParkedCount: 0,
DeliveredLastInterval: 5, OldestPendingAge: TimeSpan.FromMinutes(12)),
});
var actor = CreateActor();
actor.Tell(new PerNodeNotificationKpiRequest("corr-pn"), TestActor);
var response = ExpectMsg<PerNodeNotificationKpiResponse>();
Assert.True(response.Success);
Assert.Null(response.ErrorMessage);
Assert.Equal("corr-pn", response.CorrelationId);
Assert.Single(response.Nodes);
Assert.Equal("node-a", response.Nodes[0].SourceNode);
Assert.Equal(1, response.Nodes[0].StuckCount);
_repository.Received(1).ComputePerNodeKpisAsync(
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>());
}
[Fact]
public void PerNodeKpiRequest_RepositoryFault_RepliesUnsuccessful()
{
_repository.ComputePerNodeKpisAsync(
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>())
.ThrowsAsync(new InvalidOperationException("node-kpi db down"));
var actor = CreateActor();
actor.Tell(new PerNodeNotificationKpiRequest("corr-pn"), TestActor);
var response = ExpectMsg<PerNodeNotificationKpiResponse>();
Assert.False(response.Success);
Assert.Equal("corr-pn", response.CorrelationId);
Assert.NotNull(response.ErrorMessage);
Assert.Contains("node-kpi db down", response.ErrorMessage);
Assert.Empty(response.Nodes);
}
}
@@ -594,6 +594,43 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
Assert.NotNull(response.OldestPendingAge);
}
// ── Per-node KPI (T6: M5.2 per-node stuck-count KPIs) ──────────────────
[SkippableFact]
public async Task PerNodeSiteCallKpiRequest_ScopesCountsToEachNode()
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
var nodeId = "node-" + Guid.NewGuid().ToString("N").Substring(0, 8);
await using var context = CreateContext();
var repo = new SiteCallAuditRepository(context);
var actor = CreateActor(repo, new SiteCallAuditOptions
{
StuckAgeThreshold = TimeSpan.FromMinutes(10),
KpiInterval = TimeSpan.FromHours(1),
});
var now = DateTime.UtcNow;
var siteId = NewSiteId();
// Non-terminal Attempted, created 30 min ago — buffered + stuck.
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), siteId, status: "Attempted",
createdAtUtc: now.AddMinutes(-30), sourceNode: nodeId));
// Terminal Parked.
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), siteId, status: "Parked",
createdAtUtc: now.AddMinutes(-5), terminal: true, sourceNode: nodeId));
actor.Tell(new PerNodeSiteCallKpiRequest("corr-pnk"), TestActor);
var response = ExpectMsg<PerNodeSiteCallKpiResponse>(TimeSpan.FromSeconds(10));
Assert.True(response.Success);
var myNode = Assert.Single(response.Nodes, n => n.SourceNode == nodeId);
Assert.Equal(1, myNode.BufferedCount);
Assert.Equal(1, myNode.ParkedCount);
Assert.Equal(1, myNode.StuckCount);
Assert.NotNull(myNode.OldestPendingAge);
}
[SkippableFact]
public async Task PerSiteSiteCallKpiRequest_ScopesCountsToEachSite()
{
@@ -745,6 +782,10 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
}
/// <summary>
@@ -790,5 +831,9 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
}
}
@@ -76,6 +76,10 @@ public class SiteCallAuditPurgeTests : TestKit
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
}
/// <summary>Repository whose purge always throws — to prove continue-on-error keeps the singleton alive.</summary>
@@ -94,6 +98,7 @@ public class SiteCallAuditPurgeTests : TestKit
public Task<IReadOnlyList<SiteCall>> QueryAsync(SiteCallQueryFilter f, SiteCallPaging p, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCall>>(Array.Empty<SiteCall>());
public Task<SiteCallKpiSnapshot> ComputeKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult(new SiteCallKpiSnapshot(0, 0, 0, 0, null, 0));
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
}
private IActorRef CreateActor(ISiteCallAuditRepository repo, SiteCallAuditOptions options) =>
@@ -142,6 +142,10 @@ public class SiteCallAuditReconciliationTests : TestKit
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
}
private IActorRef CreateActor(
@@ -50,6 +50,10 @@ public class SiteCallRelayTests : TestKit
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
throw new InvalidOperationException("relay must not compute per-site KPIs");
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
throw new InvalidOperationException("relay must not compute per-node KPIs");
}
/// <summary>
@@ -6,6 +6,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Messages.Deployment;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.DebugView;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.InboundApi;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Lifecycle;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
@@ -389,6 +390,61 @@ public class DeploymentManagerActorTests : TestKit, IDisposable
Assert.True(response.Success, $"Routed call failed: {response.ErrorMessage}");
}
// ── Spec §6 (WD-2b): routed RouteToWaitForAttributeRequest → InstanceActor ──
[Fact]
public async Task RouteInboundApiWaitForAttribute_AttributeAlreadyAtTarget_RepliesMatched()
{
// A routed wait whose target equals the instance's current (static)
// attribute value must satisfy the InstanceActor fast-path and come back
// Success:true, Matched:true with the matched value/quality.
var actor = CreateDeploymentManager();
await Task.Delay(500); // empty startup
// MakeConfigJson seeds a scalar static attribute "TestAttr" = "42" (Good).
actor.Tell(new DeployInstanceCommand(
"dep-wait", "WaitPump", "sha256:wait",
MakeConfigJson("WaitPump"), "admin", DateTimeOffset.UtcNow));
ExpectMsg<DeploymentStatusResponse>(TimeSpan.FromSeconds(5));
await Task.Delay(1000); // let the InstanceActor spin up + load static attrs
// Encode the target the same way the InstanceActor encodes the current
// value for its codec-equality match (value-equality only across the wire).
var encodedTarget = AttributeValueCodec.Encode("42");
actor.Tell(new RouteToWaitForAttributeRequest(
"wait-corr-1", "WaitPump", "TestAttr", encodedTarget,
TimeSpan.FromSeconds(5), DateTimeOffset.UtcNow));
var response = ExpectMsg<RouteToWaitForAttributeResponse>(TimeSpan.FromSeconds(10));
Assert.Equal("wait-corr-1", response.CorrelationId);
Assert.True(response.Success, $"Routed wait failed: {response.ErrorMessage}");
Assert.True(response.Matched, "Expected fast-path match (attribute already at target).");
Assert.False(response.TimedOut);
Assert.Equal("42", response.Value);
Assert.Equal("Good", response.Quality);
}
[Fact]
public async Task RouteInboundApiWaitForAttribute_UnknownInstance_RepliesNotFound()
{
// A routed wait for an instance that was never deployed to this site must
// come back Success:false with a not-found message (routing-level outcome),
// mirroring the other RouteTo* unknown-instance paths.
var actor = CreateDeploymentManager();
await Task.Delay(500);
actor.Tell(new RouteToWaitForAttributeRequest(
"wait-corr-2", "NeverDeployedWait", "TestAttr",
AttributeValueCodec.Encode("42"), TimeSpan.FromSeconds(5), DateTimeOffset.UtcNow));
var response = ExpectMsg<RouteToWaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("wait-corr-2", response.CorrelationId);
Assert.False(response.Success);
Assert.False(response.Matched);
Assert.NotNull(response.ErrorMessage);
Assert.Contains("not found", response.ErrorMessage!, StringComparison.OrdinalIgnoreCase);
}
// ── M2.11: Debug-view routing — unknown-instance not-found signal ──
[Fact]
@@ -0,0 +1,853 @@
using Akka.Actor;
using Akka.TestKit;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.DataConnection;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Persistence;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
using System.Text.Json;
namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests.Actors;
/// <summary>
/// Tests for the event-driven <c>WaitForAttribute</c> one-shot waiter registry in
/// <see cref="InstanceActor"/> (Attributes.WaitAsync spec §3-§5). Covers the
/// fast-path, change-match, timeout, no-leak (timeout-canceled-on-match), and
/// predicate-overload acceptance criteria.
/// </summary>
public class InstanceActorWaitForAttributeTests : TestKit, IDisposable
{
private readonly SiteStorageService _storage;
private readonly ScriptCompilationService _compilationService;
private readonly SharedScriptLibrary _sharedScriptLibrary;
private readonly SiteRuntimeOptions _options;
private readonly string _dbFile;
public InstanceActorWaitForAttributeTests()
{
_dbFile = Path.Combine(Path.GetTempPath(), $"instance-waitfor-test-{Guid.NewGuid():N}.db");
_storage = new SiteStorageService(
$"Data Source={_dbFile}",
NullLogger<SiteStorageService>.Instance);
_storage.InitializeAsync().GetAwaiter().GetResult();
_compilationService = new ScriptCompilationService(
NullLogger<ScriptCompilationService>.Instance);
_sharedScriptLibrary = new SharedScriptLibrary(
_compilationService, NullLogger<SharedScriptLibrary>.Instance);
_options = new SiteRuntimeOptions();
}
private IActorRef CreateInstanceActor(string instanceName, FlattenedConfiguration config)
{
return ActorOf(Props.Create(() => new InstanceActor(
instanceName,
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null, // no stream manager in tests
_options,
NullLogger<InstanceActor>.Instance)));
}
void IDisposable.Dispose()
{
Shutdown();
try { File.Delete(_dbFile); } catch { /* cleanup */ }
}
// ── 1. Fast-path: attribute already at target ────────────────────────────
/// <summary>
/// Acceptance §7.1: when the attribute already equals the target at the time
/// the waiter registers, the actor must reply immediately with Matched=true
/// (carrying the current value), without scheduling a timeout.
/// </summary>
[Fact]
public void WaitForAttribute_FastPath_AlreadyAtTarget_RepliesMatchedImmediately()
{
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Flag", Value = "true", DataType = "Boolean" }
]
};
var actor = CreateInstanceActor("Pump1", config);
actor.Tell(new WaitForAttributeRequest(
"wfa-fast", "Pump1", "Flag",
"true", null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-fast", response.CorrelationId);
Assert.Equal("true", response.Value?.ToString());
}
// ── 2. Change-match: register first, then drive a value change ───────────
/// <summary>
/// Acceptance §7.1/§7.4: registering when the value does NOT match, then
/// driving the attribute to the target value (via a DCL TagValueUpdate) must
/// produce a single Matched=true reply carrying the new value.
/// </summary>
[Fact]
public void WaitForAttribute_ChangeMatch_RepliesMatchedWithNewValue()
{
const string tag = "ns=3;s=Recipe.Processed";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
CanonicalName = "Processed", Value = "false", DataType = "Boolean",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// Register: current value "false" does not match the target. The value
// arrives from the DCL as a boolean true, whose codec-encoded form is
// "True" — so the target must be encoded the same way the accessor would
// (AttributeValueCodec.Encode(true)), NOT the literal string "true".
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode(true);
actor.Tell(new WaitForAttributeRequest(
"wfa-change", "Pump1", "Processed",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
// No reply yet — the value has not changed to the target.
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
// Drive the value to the target through the DCL ingest path.
actor.Tell(new TagValueUpdate("PLC", tag, true, QualityCode.Good, DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-change", response.CorrelationId);
Assert.Equal(true, response.Value);
Assert.Equal("Good", response.Quality);
}
// ── 3. Timeout: value never matches ──────────────────────────────────────
/// <summary>
/// Acceptance §7.2: when the attribute never reaches the target within the
/// timeout, the actor replies Matched=false, TimedOut=true (no throw).
/// </summary>
[Fact]
public void WaitForAttribute_Timeout_RepliesNotMatchedTimedOut()
{
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Flag", Value = "false", DataType = "Boolean" }
]
};
var actor = CreateInstanceActor("Pump1", config);
actor.Tell(new WaitForAttributeRequest(
"wfa-timeout", "Pump1", "Flag",
"true", null, TimeSpan.FromMilliseconds(300), DateTimeOffset.UtcNow));
// The scheduled timeout fires; allow a tolerant deadline.
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
Assert.False(response.Matched);
Assert.True(response.TimedOut);
Assert.Equal("wfa-timeout", response.CorrelationId);
}
// ── 4. No-leak: timeout canceled on match (no second reply) ──────────────
/// <summary>
/// Acceptance §7.5: after a successful change-match, the scheduled timeout
/// must have been canceled and the waiter removed — so NO second (timeout)
/// response arrives after the match.
/// </summary>
[Fact]
public void WaitForAttribute_Match_CancelsTimeout_NoSecondReply()
{
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Flag", Value = "false", DataType = "Boolean" }
]
};
var actor = CreateInstanceActor("Pump1", config);
// Register with a short timeout, then match BEFORE it would fire.
actor.Tell(new WaitForAttributeRequest(
"wfa-noleak", "Pump1", "Flag",
"true", null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow));
// Drive the static value to the target; the actor publishes via
// HandleAttributeValueChanged, satisfying the waiter.
actor.Tell(new SetStaticAttributeCommand(
"set-flag", "Pump1", "Flag", "true", DateTimeOffset.UtcNow));
// First reply: the match. (A SetStaticAttributeResponse also arrives for
// the set command — filter for the WaitForAttributeResponse.)
var matched = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(matched.Matched);
Assert.False(matched.TimedOut);
// The set command's own ack — drain it so the no-msg assert below is clean.
ExpectMsg<SetStaticAttributeResponse>(TimeSpan.FromSeconds(5));
// No second WaitForAttributeResponse (the timeout was canceled) for longer
// than the original 500ms timeout window.
ExpectNoMsg(TimeSpan.FromSeconds(1));
}
// ── 5. Predicate overload ────────────────────────────────────────────────
/// <summary>
/// Acceptance §7 (predicate form): registering with a site-local predicate and
/// then flipping the value so the predicate passes must produce Matched=true.
/// </summary>
[Fact]
public void WaitForAttribute_PredicateOverload_MatchesOnPredicatePass()
{
const string tag = "ns=3;s=Level";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
CanonicalName = "Level", Value = "0", DataType = "Int32",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// Predicate: value > 50 (current is 0, so no immediate match).
Func<object?, bool> predicate = v =>
v is not null && int.TryParse(v.ToString(), out var n) && n > 50;
actor.Tell(new WaitForAttributeRequest(
"wfa-pred", "Pump1", "Level",
null, predicate, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
// A value below the threshold must NOT satisfy the predicate.
actor.Tell(new TagValueUpdate("PLC", tag, 25, QualityCode.Good, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
// A value above the threshold satisfies it.
actor.Tell(new TagValueUpdate("PLC", tag, 75, QualityCode.Good, DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal(75, response.Value);
}
// ── 6. "any change" (null target + null predicate) ───────────────────────
/// <summary>
/// Spec §4.1: a null TargetValueEncoded + null Predicate means "wait for any
/// change" (test <c>_ => true</c>). When the attribute ALREADY holds a value at
/// registration, the fast-path matches IMMEDIATELY — there is no need to wait for
/// a subsequent update. (A separate test covers the absent-at-registration case.)
/// </summary>
[Fact]
public void WaitForAttribute_AnyChange_MatchesImmediatelyWhenAttributePresent()
{
const string tag = "ns=3;s=Speed";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
CanonicalName = "Speed", Value = "0", DataType = "Int32",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// "any change" registers with a non-trivial timeout. The fast-path uses
// `_ => true`, so a currently-present attribute matches immediately.
actor.Tell(new WaitForAttributeRequest(
"wfa-any", "Pump1", "Speed",
null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
// Speed=0 is already present, so the "any change" test (_ => true) matches
// immediately on the fast path.
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
}
/// <summary>
/// Spec §4.1 (companion to the immediate-match case): when the attribute is
/// ABSENT at registration (no entry in <c>_attributes</c>), the "any change"
/// waiter does NOT fast-path — it registers, and a later value update on that
/// attribute is the first thing that satisfies it.
/// </summary>
[Fact]
public void WaitForAttribute_AnyChange_AttributeAbsent_MatchesOnLaterSet()
{
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute { CanonicalName = "Known", Value = "x", DataType = "String" }
]
};
var actor = CreateInstanceActor("Pump1", config);
// "Ghost" is not a configured attribute, so _attributes has no entry — the
// fast-path TryGetValue misses and the waiter registers rather than matching.
actor.Tell(new WaitForAttributeRequest(
"wfa-absent", "Pump1", "Ghost",
null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
// A direct AttributeValueChanged for "Ghost" populates _attributes and
// re-evaluates the waiter; the any-change test now matches the new value.
actor.Tell(new AttributeValueChanged(
"Pump1", "Ghost", "Ghost", "appeared", "Good", DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-absent", response.CorrelationId);
Assert.Equal("appeared", response.Value);
}
// ── 7. CRITICAL 1: no spurious match on a quality-only republish ─────────
/// <summary>
/// CRITICAL 1 regression: the List-coerce-failure Bad-quality path republishes
/// the OLD value (quality flipped to Bad) WITHOUT changing <c>_attributes</c>, so
/// it passes <c>evaluateWaiters:false</c> — registered waiters are NOT re-evaluated
/// on this non-change republish, must NOT spuriously fire, and must STILL resolve
/// on the next genuine value change.
///
/// <para>
/// We register an "any-change" waiter (which correctly fast-path matches the
/// present value and is drained) plus a pending predicate waiter that does not yet
/// match, then drive the Bad-quality republish and assert NO match is delivered for
/// the pending waiter, and that a subsequent REAL change resolves it. (Note: the
/// purest "any-change fires on a non-change republish" symptom is not directly
/// reproducible — an any-change waiter against a present attribute always fast-path
/// matches and so never stays pending across a republish; this test guards the
/// republish path against double-firing / stranding waiters and against the
/// predicate being re-evaluated on the non-change republish.)
/// </para>
/// </summary>
[Fact]
public void WaitForAttribute_BadQualityRepublish_NoValueChange_DoesNotMatch()
{
const string tag = "ns=3;s=Items";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
// Static default {1,2}: a real list value is present from
// construction so the Bad-quality republish has an OLD value to
// republish. The waiter below targets a DIFFERENT value so it is
// genuinely pending (no fast-path match) when the republish fires.
CanonicalName = "Items", Value = "[1,2]", DataType = "List",
ElementDataType = "Int32",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// A predicate waiter that matches a list of length >= 3. Current value is
// {1,2} (length 2) so it does NOT fast-path match — it registers and stays
// pending. Crucially, the Bad-quality republish below carries the SAME OLD
// value {1,2} (length 2); with the bug (evaluateWaiters always true) the
// predicate would be re-evaluated against {1,2} → still false, so this probe
// also guards the predicate-isolation contract on the republish path.
Func<object?, bool> lenAtLeast3 = v =>
v is System.Collections.IList list && list.Count >= 3;
actor.Tell(new WaitForAttributeRequest(
"wfa-len3", "Pump1", "Items",
null, lenAtLeast3, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
// Also register an "any-change" waiter while the attribute is present — it
// fast-path matches the current {1,2} immediately. Drain that correct match;
// it is the documented immediate-match behaviour, not the bug under test.
actor.Tell(new WaitForAttributeRequest(
"wfa-any", "Pump1", "Items",
null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
var immediate = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("wfa-any", immediate.CorrelationId);
Assert.True(immediate.Matched);
// Drive the List-coerce-FAILURE Bad-quality republish: a scalar int cannot
// coerce to List<Int32>, so the actor sets quality Bad and republishes the
// OLD value {1,2} WITHOUT changing _attributes (evaluateWaiters:false).
actor.Tell(new TagValueUpdate("PLC", tag, 999, QualityCode.Good, DateTimeOffset.UtcNow));
// The pending length>=3 waiter must NOT fire on this non-change republish.
ExpectNoMsg(TimeSpan.FromMilliseconds(500));
// A REAL change to a length-3 list resolves the still-pending waiter.
actor.Tell(new TagValueUpdate("PLC", tag, new[] { 7, 8, 9 }, QualityCode.Good, DateTimeOffset.UtcNow));
var realChange = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("wfa-len3", realChange.CorrelationId);
Assert.True(realChange.Matched);
Assert.False(realChange.TimedOut);
}
// ── 8. CRITICAL 2: throwing predicate is isolated ────────────────────────
/// <summary>
/// CRITICAL 2 regression: two waiters on the SAME attribute — one with a
/// predicate that throws, one a normal value-equality. A single value change
/// must (a) NOT crash the actor, (b) evict the throwing waiter with a
/// non-matched error reply, and (c) STILL resolve the normal sibling. Finally
/// the actor must remain responsive to a subsequent request.
/// </summary>
[Fact]
public void WaitForAttribute_ThrowingPredicate_IsIsolated_SiblingStillMatches()
{
const string tag = "ns=3;s=State";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
CanonicalName = "State", Value = "init", DataType = "String",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// Waiter A: predicate that returns false for the CURRENT value ("init") so
// it clears the fast-path and registers, but THROWS once the value becomes
// "ready" — exercising the resolve-loop guard (not the fast-path guard).
Func<object?, bool> boom = v =>
v?.ToString() == "ready" ? throw new InvalidOperationException("kaboom") : false;
actor.Tell(new WaitForAttributeRequest(
"wfa-throw", "Pump1", "State",
null, boom, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
// Waiter B: normal value-equality waiting for "ready".
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-normal", "Pump1", "State",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
// One change to "ready": evaluates BOTH waiters on this attribute. The
// throwing one must be evicted (error reply); the normal one must match.
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
// Collect the two replies (order is registry-iteration dependent).
var r1 = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
var r2 = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
var byId = new[] { r1, r2 }.ToDictionary(r => r.CorrelationId);
var thrown = byId["wfa-throw"];
Assert.False(thrown.Matched);
Assert.False(thrown.TimedOut);
Assert.NotNull(thrown.ErrorMessage);
Assert.Contains("Wait predicate threw", thrown.ErrorMessage);
var normal = byId["wfa-normal"];
Assert.True(normal.Matched);
Assert.False(normal.TimedOut);
Assert.Equal("ready", normal.Value);
// The actor stayed alive and responsive: a follow-up request resolves.
actor.Tell(new GetAttributeRequest("get-after", "Pump1", "State", DateTimeOffset.UtcNow));
var get = ExpectMsg<GetAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("ready", get.Value);
// And the throwing waiter was REMOVED (no longer in the registry): driving
// another change produces NO further reply for it.
actor.Tell(new TagValueUpdate("PLC", tag, "again", QualityCode.Good, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(500));
}
// ── 8b. CRITICAL 2 (fast-path): throwing predicate on already-held value ──
/// <summary>
/// CRITICAL 2 regression (fast-path analogue of
/// <see cref="WaitForAttribute_ThrowingPredicate_IsIsolated_SiblingStillMatches"/>):
/// a predicate that THROWS is registered against an attribute that ALREADY holds a
/// value, so the fast-path <c>test(current)</c> runs and throws. The actor must
/// (a) reply a non-matched <c>WaitForAttributeResponse</c> with a non-null
/// <c>ErrorMessage</c> (predicate-threw), (b) stay alive/responsive (it answers a
/// subsequent <c>GetAttributeRequest</c>), and (c) NOT register the waiter — there
/// is no later/second reply even after a value change on that attribute (the
/// fast-path guard returns WITHOUT scheduling a timeout or storing the waiter).
/// </summary>
[Fact]
public void WaitForAttribute_ThrowingPredicate_FastPath_RepliesError_NoRegistration_ActorStaysAlive()
{
const string tag = "ns=3;s=State";
var config = new FlattenedConfiguration
{
InstanceUniqueName = "Pump1",
Attributes =
[
new ResolvedAttribute
{
// Present from construction so the fast-path TryGetValue HITS and
// the predicate runs on the current value (and throws).
CanonicalName = "State", Value = "init", DataType = "String",
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var dcl = CreateTestProbe();
var actor = ActorOf(Props.Create(() => new InstanceActor(
"Pump1",
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
// Predicate THROWS unconditionally — the current value "init" is already
// present, so the fast-path test(current) executes it and throws.
Func<object?, bool> boom = _ => throw new InvalidOperationException("kaboom");
actor.Tell(new WaitForAttributeRequest(
"wfa-fp-throw", "Pump1", "State",
null, boom, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
// (a) Non-matched error reply (predicate-threw), guarded on the fast-path.
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("wfa-fp-throw", response.CorrelationId);
Assert.False(response.Matched);
Assert.False(response.TimedOut);
Assert.NotNull(response.ErrorMessage);
Assert.Contains("Wait predicate threw", response.ErrorMessage);
// (b) The actor stayed alive and responsive: a follow-up request resolves.
actor.Tell(new GetAttributeRequest("get-after-fp", "Pump1", "State", DateTimeOffset.UtcNow));
var get = ExpectMsg<GetAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.Equal("init", get.Value);
// (c) The waiter was NOT registered (no timeout scheduled): driving a value
// change on "State" produces NO further WaitForAttributeResponse.
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(500));
}
// ── 9. Quality-gated ("Good"-only) matching (spec §4.2) ──────────────────
/// <summary>
/// Builds a data-connected instance actor with a single attribute backed by a
/// DCL probe, draining the initial <c>SubscribeTagsRequest</c>. Used by the
/// quality-gate tests, which drive value+quality through the DCL ingest path.
/// </summary>
private IActorRef CreateDataConnectedActor(
string instanceName, string attribute, string tag, string dataType, TestProbe dcl)
{
var config = new FlattenedConfiguration
{
InstanceUniqueName = instanceName,
Attributes =
[
new ResolvedAttribute
{
CanonicalName = attribute, Value = "init", DataType = dataType,
DataSourceReference = tag, BoundDataConnectionName = "PLC"
}
]
};
var actor = ActorOf(Props.Create(() => new InstanceActor(
instanceName,
JsonSerializer.Serialize(config),
_storage,
_compilationService,
_sharedScriptLibrary,
null,
_options,
NullLogger<InstanceActor>.Instance,
dcl.Ref)));
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
return actor;
}
/// <summary>
/// Spec §4.2 (change-match): with <c>RequireGoodQuality:true</c>, a value that
/// reaches the target but arrives at <b>Bad</b> quality is NOT a match — the
/// waiter stays pending and times out.
/// </summary>
[Fact]
public void WaitForAttribute_QualityGated_ChangeMatch_BadQuality_DoesNotMatch_TimesOut()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qg-bad", "Pump1", "State",
target, null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow,
RequireGoodQuality: true));
// Value reaches the target but at Bad quality → must NOT match.
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
// The only reply must be the timeout (no spurious Bad-quality match).
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
Assert.False(response.Matched);
Assert.True(response.TimedOut);
Assert.Equal("wfa-qg-bad", response.CorrelationId);
}
/// <summary>
/// Spec §4.2 (change-match, quality-agnostic baseline): the SAME Bad-quality
/// value-reaches-target scenario DOES match when <c>RequireGoodQuality:false</c>.
/// </summary>
[Fact]
public void WaitForAttribute_QualityAgnostic_ChangeMatch_BadQuality_Matches()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qa-bad", "Pump1", "State",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
RequireGoodQuality: false));
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-qa-bad", response.CorrelationId);
Assert.Equal("ready", response.Value);
Assert.Equal("Bad", response.Quality);
}
/// <summary>
/// Spec §4.2 (change-match): with <c>RequireGoodQuality:true</c>, a value that
/// reaches the target at <b>Good</b> quality matches normally. Also proves the
/// gate is per-quality not per-value: a Bad-quality arrival at the target is
/// skipped, then a Good-quality arrival at the target resolves the waiter.
/// </summary>
[Fact]
public void WaitForAttribute_QualityGated_ChangeMatch_GoodQuality_Matches()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qg-good", "Pump1", "State",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
RequireGoodQuality: true));
// First arrival at target but Bad quality is skipped (gate holds it pending).
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(400));
// Then a Good-quality arrival at the target resolves it.
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-qg-good", response.CorrelationId);
Assert.Equal("ready", response.Value);
Assert.Equal("Good", response.Quality);
}
/// <summary>
/// Spec §4.2 (fast-path): the attribute ALREADY holds the target value at
/// <b>Bad</b> quality when the quality-gated waiter registers. The fast-path must
/// NOT reply matched — it registers + schedules the timeout like any pending
/// waiter, and (here) times out because the value never reaches target at Good.
/// </summary>
[Fact]
public void WaitForAttribute_QualityGated_FastPath_AlreadyAtTargetButBad_DoesNotMatch_TimesOut()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
// Seed the attribute to the target value at Bad quality BEFORE registering.
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(200)); // no waiter yet → no reply
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qg-fp-bad", "Pump1", "State",
target, null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow,
RequireGoodQuality: true));
// Fast-path quality-fail → registers, then times out (no fast matched reply).
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
Assert.False(response.Matched);
Assert.True(response.TimedOut);
Assert.Equal("wfa-qg-fp-bad", response.CorrelationId);
}
/// <summary>
/// Spec §4.2 (fast-path, quality-agnostic baseline): the SAME already-at-target-
/// but-Bad attribute fast-path MATCHES when <c>RequireGoodQuality:false</c>.
/// </summary>
[Fact]
public void WaitForAttribute_QualityAgnostic_FastPath_AlreadyAtTargetButBad_Matches()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qa-fp-bad", "Pump1", "State",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
RequireGoodQuality: false));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-qa-fp-bad", response.CorrelationId);
Assert.Equal("ready", response.Value);
Assert.Equal("Bad", response.Quality);
}
/// <summary>
/// Spec §4.2 (fast-path): the attribute ALREADY holds the target value at
/// <b>Good</b> quality when the quality-gated waiter registers → the fast-path
/// matches immediately.
/// </summary>
[Fact]
public void WaitForAttribute_QualityGated_FastPath_AlreadyAtTargetGood_MatchesImmediately()
{
const string tag = "ns=3;s=State";
var dcl = CreateTestProbe();
var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);
actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
actor.Tell(new WaitForAttributeRequest(
"wfa-qg-fp-good", "Pump1", "State",
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
RequireGoodQuality: true));
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
Assert.True(response.Matched);
Assert.False(response.TimedOut);
Assert.Equal("wfa-qg-fp-good", response.CorrelationId);
Assert.Equal("ready", response.Value);
Assert.Equal("Good", response.Quality);
}
}
@@ -0,0 +1,291 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.Logging.Abstractions;
using Moq;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.ScriptExecution;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using IAuditWriter = ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services.IAuditWriter;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests.Scripts;
/// <summary>
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): nested
/// <c>CallScript</c> / <c>CallShared</c> invocations and alarm on-trigger runs
/// must form a true execution tree, where each spawned run records its
/// immediate spawner's <c>ExecutionId</c> as its <c>ParentExecutionId</c>.
///
/// <list type="bullet">
/// <item><description>
/// A nested <c>CallScript</c> (actor-routed) emits a
/// <see cref="ScriptCallRequest"/> whose <c>ParentExecutionId</c> is the
/// CALLING run's OWN <c>ExecutionId</c> — NOT the inherited grandparent — so
/// <c>A → CallScript(B)</c> yields <c>B.Parent == A.ExecutionId</c>.
/// </description></item>
/// <item><description>
/// A nested <c>CallShared</c> (inline) runs in a child context that mints a
/// fresh <c>ExecutionId</c> and records the caller's <c>ExecutionId</c> as its
/// parent — so <c>B → CallShared(C)</c> yields <c>C.Parent == B.ExecutionId</c>
/// (and NOT B's inherited parent A), proving a multi-level tree.
/// </description></item>
/// <item><description>
/// The alarm on-trigger plumbing carries a <c>parentExecutionId</c> into the
/// script context — null today (the run is a root) but threaded so a future
/// firing id can flow.
/// </description></item>
/// </list>
/// </summary>
public class ParentExecutionTreeTests : TestKit
{
private const string InstanceName = "Plant.Pump42";
/// <summary>
/// In-memory <see cref="IAuditWriter"/> capturing every emitted event
/// (mirrors <c>ExecutionCorrelationContextTests.CapturingAuditWriter</c>).
/// </summary>
private sealed class CapturingAuditWriter : IAuditWriter
{
public List<AuditRowProjection.AuditRowValues> Events { get; } = new();
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
{
Events.Add(evt.AsRow());
return Task.CompletedTask;
}
}
private static SharedScriptLibrary NewLibrary()
{
var compilationService = new ScriptCompilationService(
NullLogger<ScriptCompilationService>.Instance);
return new SharedScriptLibrary(
compilationService, NullLogger<SharedScriptLibrary>.Instance);
}
/// <summary>
/// Builds a context whose <c>CallScript</c> Ask targets <paramref name="instanceActor"/>
/// (a probe), so the forwarded <see cref="ScriptCallRequest"/> can be captured.
/// </summary>
private static ScriptRuntimeContext CreateContext(
IActorRef instanceActor,
SharedScriptLibrary library,
IExternalSystemClient? externalSystemClient = null,
IAuditWriter? auditWriter = null,
Guid? executionId = null,
Guid? parentExecutionId = null)
{
return new ScriptRuntimeContext(
instanceActor,
ActorRefs.Nobody,
library,
currentCallDepth: 0,
maxCallDepth: 10,
askTimeout: TimeSpan.FromSeconds(5),
instanceName: InstanceName,
logger: NullLogger.Instance,
externalSystemClient: externalSystemClient,
siteId: "site-77",
sourceScript: "ScriptActor:A",
auditWriter: auditWriter,
executionId: executionId,
parentExecutionId: parentExecutionId);
}
// -------------------------------------------------------------------------
// Nested CallScript (actor-routed) — A → CallScript(B)
// -------------------------------------------------------------------------
[Fact]
public async Task CallScript_StampsCallingRunsOwnExecutionId_AsChildParent()
{
// A → CallScript(B): the child request's ParentExecutionId must be A's
// OWN ExecutionId, forming the A→B tree edge.
var probe = CreateTestProbe();
var aExecutionId = Guid.NewGuid();
var context = CreateContext(probe.Ref, NewLibrary(), executionId: aExecutionId);
var call = context.CallScript("B");
var request = probe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("B", request.ScriptName);
// B's parent is A's own execution id — the A→B tree edge.
Assert.Equal(aExecutionId, request.ParentExecutionId);
// Unblock the Ask so the test completes cleanly.
probe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
await call;
}
[Fact]
public async Task CallScript_FromRoutedRun_UsesOwnExecutionId_NotInheritedParent()
{
// A 2-level tree edge: B was itself spawned (it carries a parent = A).
// When B does CallScript(C), C.Parent must be B's OWN ExecutionId — NOT
// the inherited A. This is the regression that distinguishes a true tree
// from a flattened "everything under the original spawner" model.
var probe = CreateTestProbe();
var bExecutionId = Guid.NewGuid();
var aExecutionId = Guid.NewGuid(); // B's inherited parent
var context = CreateContext(
probe.Ref, NewLibrary(),
executionId: bExecutionId,
parentExecutionId: aExecutionId);
var call = context.CallScript("C");
var request = probe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
Assert.Equal(bExecutionId, request.ParentExecutionId);
Assert.NotEqual(aExecutionId, request.ParentExecutionId);
probe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
await call;
}
// -------------------------------------------------------------------------
// Nested CallShared (inline) — B → CallShared(C)
// -------------------------------------------------------------------------
[Fact]
public async Task CallShared_ChildRun_ParentIsCallersExecutionId_FreshOwnExecutionId()
{
// B → CallShared(C): the shared script C runs inline but is modelled as
// its OWN execution node — a fresh ExecutionId parented to B's
// ExecutionId. Asserted via the audit row C emits through
// Instance.ExternalSystem.Call.
var client = new Mock<IExternalSystemClient>();
client
.Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(new ExternalCallResult(true, "{}", null));
var writer = new CapturingAuditWriter();
var library = NewLibrary();
Assert.True(library.CompileAndRegister(
"C", "await Instance.ExternalSystem.Call(\"ERP\", \"GetOrder\"); return null;"));
var bExecutionId = Guid.NewGuid();
var context = CreateContext(
ActorRefs.Nobody, library,
externalSystemClient: client.Object,
auditWriter: writer,
executionId: bExecutionId);
await context.Scripts.CallShared("C");
var evt = Assert.Single(writer.Events);
// C's parent is B's execution id — the B→C tree edge.
Assert.Equal(bExecutionId, evt.ParentExecutionId);
// C minted its OWN fresh, non-empty execution id, distinct from B.
Assert.NotNull(evt.ExecutionId);
Assert.NotEqual(Guid.Empty, evt.ExecutionId!.Value);
Assert.NotEqual(bExecutionId, evt.ExecutionId!.Value);
}
[Fact]
public async Task CallShared_FromRoutedRun_ChildParentIsCaller_NotInheritedGrandparent()
{
// Regression / multi-level: B itself carries a parent A. When B does
// CallShared(C), C.Parent must be B's OWN ExecutionId — NOT A. This is
// the A→B→C chain proving each level points at its immediate spawner.
var client = new Mock<IExternalSystemClient>();
client
.Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(new ExternalCallResult(true, "{}", null));
var writer = new CapturingAuditWriter();
var library = NewLibrary();
Assert.True(library.CompileAndRegister(
"C", "await Instance.ExternalSystem.Call(\"ERP\", \"GetOrder\"); return null;"));
var bExecutionId = Guid.NewGuid();
var aExecutionId = Guid.NewGuid(); // B's inherited parent
var context = CreateContext(
ActorRefs.Nobody, library,
externalSystemClient: client.Object,
auditWriter: writer,
executionId: bExecutionId,
parentExecutionId: aExecutionId);
await context.Scripts.CallShared("C");
var evt = Assert.Single(writer.Events);
Assert.Equal(bExecutionId, evt.ParentExecutionId);
Assert.NotEqual(aExecutionId, evt.ParentExecutionId);
}
// -------------------------------------------------------------------------
// Alarm on-trigger plumbing
// -------------------------------------------------------------------------
[Fact]
public void CreateChildContextForSharedScript_ParentIsCallerExecution_FreshOwnId()
{
// Unit-level proof of the child-context contract the CallShared path uses.
var bExecutionId = Guid.NewGuid();
var context = CreateContext(
ActorRefs.Nobody, NewLibrary(), executionId: bExecutionId);
var child = context.CreateChildContextForSharedScript(childCallDepth: 1);
Assert.Equal(bExecutionId, child.ParentExecutionId);
Assert.NotEqual(Guid.Empty, child.ExecutionId);
Assert.NotEqual(bExecutionId, child.ExecutionId);
}
[Fact]
public void AlarmOnTrigger_NestedCallScript_CarriesAlarmRunsOwnExecutionId_AsParent()
{
// End-to-end alarm plumbing: when an alarm fires, its on-trigger script
// runs in a ScriptRuntimeContext built by AlarmExecutionActor. With no
// Guid firing id today the alarm run is a ROOT (its own ParentExecutionId
// is null), but it still mints its OWN fresh ExecutionId. A nested
// CallScript from that on-trigger script must therefore carry the alarm
// run's OWN (non-null) ExecutionId as the child's ParentExecutionId —
// proving the alarm context is a proper execution node feeding the
// cascade and the parentExecutionId parameter is plumbed end-to-end.
var compilationService = new ScriptCompilationService(
NullLogger<ScriptCompilationService>.Instance);
var sharedLibrary = new SharedScriptLibrary(
compilationService, NullLogger<SharedScriptLibrary>.Instance);
var options = new SiteRuntimeOptions();
var onTrigger = compilationService.Compile(
"OnTrigger", "await Instance.CallScript(\"Child\"); return null;");
Assert.NotNull(onTrigger.CompiledScript);
var alarmConfig = new ResolvedAlarm
{
CanonicalName = "HighTemp",
TriggerType = "ValueMatch",
TriggerConfiguration = "{\"attributeName\":\"Status\",\"matchValue\":\"Critical\"}",
PriorityLevel = 1
};
var instanceProbe = CreateTestProbe();
var alarm = ActorOf(Props.Create(() => new AlarmActor(
"HighTemp", "Pump1", instanceProbe.Ref, alarmConfig,
onTrigger.CompiledScript, sharedLibrary, options,
NullLogger<AlarmActor>.Instance)));
alarm.Tell(new AttributeValueChanged(
"Pump1", "Status", "Status", "Critical", "Good", DateTimeOffset.UtcNow));
// The alarm raises (instance gets AlarmStateChanged) AND the on-trigger
// script fires its nested CallScript at the instance.
instanceProbe.ExpectMsg<AlarmStateChanged>(TimeSpan.FromSeconds(5));
var request = instanceProbe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("Child", request.ScriptName);
// The alarm run is a root today (its own parent is null), but its OWN
// freshly-minted ExecutionId cascades to the child — so the child's
// ParentExecutionId is a real, non-empty value, NOT null.
Assert.NotNull(request.ParentExecutionId);
Assert.NotEqual(Guid.Empty, request.ParentExecutionId!.Value);
instanceProbe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
}
}
@@ -1,3 +1,8 @@
using Akka.Actor;
using Akka.TestKit;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Scripts;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
@@ -137,3 +142,157 @@ public class ScopeAccessorTests
Assert.Equal("[1,2,3]", encoded);
}
}
/// <summary>
/// WaitAsync (spec §3-§5, acceptance §7.6) scope-resolution tests. Unlike the
/// path-arithmetic tests above, these route a real <see cref="ScriptRuntimeContext"/>
/// against a TestProbe standing in for the Instance Actor, so they need a live
/// ActorSystem — hence a TestKit-derived class. They assert that
/// <c>Attributes.WaitAsync</c> applies <see cref="AttributeAccessor.Resolve"/>
/// (the composition prefix) to the key BEFORE the request is sent to the actor —
/// the same contract Get/Set obey.
/// </summary>
public class AttributeAccessorWaitAsyncTests : TestKit, IDisposable
{
private ScriptRuntimeContext MakeContext(IActorRef instanceActor) =>
new(
instanceActor,
instanceActor,
sharedScriptLibrary: null!,
currentCallDepth: 0,
maxCallDepth: 10,
askTimeout: TimeSpan.FromSeconds(2),
instanceName: "Pump1",
logger: NullLogger<ScriptRuntimeContext>.Instance);
void IDisposable.Dispose() => Shutdown();
[Fact]
public void WaitAsync_Value_AppliesScopeResolution_BeforeSendingRequest()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
// Composed scope "TempSensor" — Resolve("Flag") => "TempSensor.Flag".
var acc = new AttributeAccessor(ctx, "TempSensor");
// Fire-and-forget; the assertion is on the message the actor receives.
_ = acc.WaitAsync("Flag", true, TimeSpan.FromSeconds(30));
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("TempSensor.Flag", req.AttributeName);
// The value overload encodes the target via AttributeValueCodec.Encode and
// sends a null predicate. bool true encodes to "True" (capital T).
Assert.Equal(AttributeValueCodec.Encode(true), req.TargetValueEncoded);
Assert.Equal("True", req.TargetValueEncoded);
Assert.Null(req.Predicate);
Assert.Equal("Pump1", req.InstanceName);
}
[Fact]
public void WaitAsync_Predicate_AppliesScopeResolution_AndSendsPredicate()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
var acc = new AttributeAccessor(ctx, "Motor.TempSensor");
Func<object?, bool> predicate = _ => true;
_ = acc.WaitAsync("Level", predicate, TimeSpan.FromSeconds(30));
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("Motor.TempSensor.Level", req.AttributeName);
// The predicate overload sends the delegate and a null encoded target.
Assert.Null(req.TargetValueEncoded);
Assert.NotNull(req.Predicate);
}
[Fact]
public void WaitAsync_RootScope_LeavesKeyBare()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
var acc = new AttributeAccessor(ctx, "");
_ = acc.WaitAsync("Flag", true, TimeSpan.FromSeconds(30));
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("Flag", req.AttributeName);
}
// ── WaitForAsync (spec §3): scope resolution + populated WaitResult ───────
[Fact]
public async Task WaitForAsync_Value_AppliesScopeResolution_AndSurfacesPopulatedWaitResult()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
// Composed scope "TempSensor" — Resolve("Flag") => "TempSensor.Flag".
var acc = new AttributeAccessor(ctx, "TempSensor");
var task = acc.WaitForAsync("Flag", true, TimeSpan.FromSeconds(30));
// The actor receives the scope-resolved, codec-encoded request.
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("TempSensor.Flag", req.AttributeName);
Assert.Equal(AttributeValueCodec.Encode(true), req.TargetValueEncoded);
Assert.Null(req.Predicate);
Assert.False(req.RequireGoodQuality);
// Reply with a matched response — the accessor must surface the full WaitResult.
probe.Reply(new WaitForAttributeResponse(
req.CorrelationId, Matched: true, Value: true, Quality: "Good", TimedOut: false));
var result = await task;
Assert.True(result.Matched);
Assert.Equal(true, result.Value);
Assert.Equal("Good", result.Quality);
Assert.False(result.TimedOut);
}
[Fact]
public async Task WaitForAsync_Predicate_AppliesScopeResolution_AndSurfacesWaitResult()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
var acc = new AttributeAccessor(ctx, "Motor.TempSensor");
Func<object?, bool> predicate = _ => true;
var task = acc.WaitForAsync("Level", predicate, TimeSpan.FromSeconds(30));
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.Equal("Motor.TempSensor.Level", req.AttributeName);
Assert.Null(req.TargetValueEncoded);
Assert.NotNull(req.Predicate);
probe.Reply(new WaitForAttributeResponse(
req.CorrelationId, Matched: true, Value: 42, Quality: "Good", TimedOut: false));
var result = await task;
Assert.True(result.Matched);
Assert.Equal(42, result.Value);
}
[Fact]
public async Task WaitForAsync_RequireGoodQuality_ThreadsFlagIntoRequest()
{
var probe = CreateTestProbe();
var ctx = MakeContext(probe.Ref);
var acc = new AttributeAccessor(ctx, "");
var task = acc.WaitForAsync("Flag", true, TimeSpan.FromSeconds(30), requireGoodQuality: true);
var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
Assert.True(req.RequireGoodQuality);
probe.Reply(new WaitForAttributeResponse(
req.CorrelationId, Matched: false, Value: null, Quality: null, TimedOut: true));
var result = await task;
Assert.False(result.Matched);
Assert.True(result.TimedOut);
Assert.Null(result.Value);
}
}