merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes
@@ -163,14 +163,16 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
|
||||
- Scope = script trust boundary: outbound API (sync + cached), outbound DB (sync + cached), notifications, inbound API. Framework/internal traffic is explicitly excluded.
|
||||
- One row per lifecycle event; cached calls produce 4+ rows per operation (`Submitted`, `Forwarded`, `Attempted`, `Delivered`/`Parked`/`Discarded`).
|
||||
- `ExecutionId` (`uniqueidentifier NULL`) is the universal per-run correlation value — every audit row emitted by one script execution / inbound request shares it; `CorrelationId` remains the per-operation lifecycle id (NULL for sync one-shots).
|
||||
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; first cut bridges the inbound API → routed-site-script case (the routed run records the inbound request's `ExecutionId`; the inbound row stays top-level / NULL); `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk; tag cascade deferred.
|
||||
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; bridges inbound API → routed-site-script, alarm-triggered on-trigger scripts, and nested `CallScript`/`CallShared` invocations; `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk. Tag-cascade coverage is complete as of M5.4 (T4) — no further spawn points are deferred.
|
||||
- Site SQLite hot-path first, then gRPC telemetry to central; ingest is idempotent on `EventId`; periodic reconciliation pull as fallback when telemetry is lost.
|
||||
- Cached operations: site emits a single additively-extended `CachedCallTelemetry` packet carrying both audit events and operational state; central writes `AuditLog` + `SiteCalls` in one transaction.
|
||||
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
|
||||
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in. Inbound API: full verbatim capture up to `InboundMaxBytes` (default 1 MiB); request headers stored in `Extra.requestHeaders` (post-redaction); per-method `SkipBodyCapture` flag suppresses bodies while still recording headers + metadata; `AuditInboundCeilingHits` counter surfaced on health snapshot. (M5.3 T7)
|
||||
- Audit-write failure NEVER aborts the user-facing action — audit is best-effort, the action's own success/failure path is authoritative.
|
||||
- 365-day central retention with monthly partition-switch purge; 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
|
||||
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence and Parquet archival are deferred to v1.x.
|
||||
- 365-day central retention with monthly partition-switch purge; per-channel retention overrides (`AuditLog:PerChannelRetentionDays`) expire rows earlier than the global window via a bounded, batched row DELETE on the purge actor's maintenance path — values must be shorter than the global window (M5.5 T3); 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
|
||||
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence (T1) and Parquet archival (T2) are deferred to v1.x — not shipped in M5.
|
||||
- Node-of-origin is captured alongside site-of-origin: `SourceNode` (`varchar(64)` NULL) on `AuditLog`, `Notifications`, and `SiteCalls` — `node-a`/`node-b` for site rows (qualified by `SourceSiteId`/`SourceSite`), `central-a`/`central-b` for central direct-write rows. Stamped at the writing node, carried verbatim through telemetry + reconciliation, and indexed via `IX_AuditLog_Node_Occurred (SourceNode, OccurredAtUtc)` on `AuditLog`.
|
||||
- Per-node stuck KPIs (M5.3 T6): Notification Outbox and Site Call Audit expose `PerNodeNotificationKpiRequest`/`PerNodeSiteCallKpiRequest` messages that group stuck/parked/delivered counts by `SourceNode`, surfacing per-node breakdowns on the Health dashboard.
|
||||
- `audit tree --execution-id <guid>` CLI command (M5.3 T8) + `GET /api/audit/tree` endpoint — resolves any node to its chain root and renders the full execution tree; backed by `IAuditLogRepository.GetExecutionTreeAsync`.
|
||||
- Central UI: new top-level **Audit** nav group + Audit Log page, with drill-ins from Notifications, Site Calls, External Systems, Inbound API Keys, Sites, and Instances.
|
||||
|
||||
### Security & Auth
|
||||
|
||||
@@ -0,0 +1,150 @@
|
||||
# M5 — Audit Hardening (T3–T8) — Design
|
||||
|
||||
**Status:** Approved (awaiting plan).
|
||||
**Worktree/branch:** `worktree-m5-audit-hardening` off `main` (`e77e209`).
|
||||
**Source:** Phase-2 milestone M5 from `docs/plans/2026-06-15-stillpending-completion-design.md`.
|
||||
|
||||
## Goal
|
||||
|
||||
Harden the centralized Audit Log with six independent, ready-to-build items. Two
items originally listed under M5 — **T1 hash-chain tamper evidence** and **T2
Parquet export** — remain **deferred to v1.x** (per CLAUDE.md's audit design
decisions); their stubs (CLI `verify-chain` no-op, export `501`) stay unchanged.

## Scope (in)

T3 per-channel retention · T4 ParentExecutionId tag-cascade · T5 historical
backfill (reframed) · T6 per-node stuck KPIs · T7 structured response-capture
increments · T8 CLI `audit tree`.

## Scope (out / deferred to v1.x)

T1 hash-chain (no Hash/PrevHash columns, no real verify-chain), T2 Parquet
export (the `501` gate stays). Reversing those deferrals is a separate decision.
|
||||
|
||||
---
|
||||
|
||||
## Items
|
||||
|
||||
### T8 — CLI `audit tree` (smallest; reuses existing server walk + UI)
|
||||
The recursive execution-tree walk (`IAuditLogRepository.GetExecutionTreeAsync`,
backed by `IX_AuditLog_ParentExecution`) and the Blazor `ExecutionTreePage`
already exist; only an HTTP projection + CLI surface are missing.
- **Server:** add `GET /api/audit/tree?executionId=…` in
  `AuditEndpoints.MapAuditAPI` → `repo.GetExecutionTreeAsync` → serialize
  `ExecutionTreeNode[]`.
- **CLI:** add `audit tree --execution-id <guid> [--format table|json]` in
  `AuditCommands` + an `AuditTreeHelpers` renderer (indented ASCII tree for
  `table`; raw nodes for `json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- No schema change. **Tests:** endpoint returns the tree; CLI renders a
  multi-level tree + handles not-found.
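
The `table` rendering described above could look roughly like the sketch below. It is an illustration only, not the shipped `AuditTreeHelpers`: the `TreeNode` record, its fields, and the `+- ` connector are assumptions, since the real `ExecutionTreeNode` shape is not shown in this design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for ExecutionTreeNode; the real type's fields may differ.
public sealed record TreeNode(Guid ExecutionId, Guid? ParentExecutionId, string Label);

public static class TreeRenderSketch
{
    // Renders a flat node list as an indented ASCII tree, two spaces per level.
    public static string Render(IReadOnlyList<TreeNode> nodes)
    {
        var known = nodes.Select(n => n.ExecutionId).ToHashSet();

        // A node is a child only if its parent is part of this result set.
        bool HasKnownParent(TreeNode n) =>
            n.ParentExecutionId is Guid p && known.Contains(p);

        // ToLookup yields an empty sequence for ids that have no children.
        var children = nodes.Where(HasKnownParent)
                            .ToLookup(n => n.ParentExecutionId!.Value);

        var sb = new StringBuilder();
        foreach (var root in nodes.Where(n => !HasKnownParent(n)))
            Append(sb, root, children, 0);
        return sb.ToString();
    }

    private static void Append(
        StringBuilder sb, TreeNode node, ILookup<Guid, TreeNode> children, int depth)
    {
        sb.Append(' ', depth * 2)
          .Append(depth == 0 ? "" : "+- ")
          .Append(node.Label)
          .Append('\n');
        foreach (var child in children[node.ExecutionId])
            Append(sb, child, children, depth + 1);
    }
}
```

Treating a node whose parent is absent from the result as a root keeps the renderer usable on partial trees; whether the real helper does the same is not stated here.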
|
||||
|
||||
### T6 — Per-node stuck-count KPIs
|
||||
KPIs are per-site today; `SourceNode` is on the `Notification` and `SiteCalls`
rows but not aggregated.
- Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to the existing
  `ComputePerSiteKpisAsync` in `NotificationOutboxRepository` and
  `SiteCallAuditRepository`.
- New `PerNode…KpiRequest`/`Response` message pair per actor; register in each
  actor's `Receive<>`.
- Surface a per-node breakdown on the existing KPI tiles
  (`AuditKpiTiles`/`SiteCallKpiTiles`) — additive, behind the existing tiles.
- **Tests:** repository grouping returns correct per-node counts (stuck/parked/
  queue-depth); message round-trip.
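
The grouping itself is a plain group-by on `SourceNode`. The sketch below shows the idea in memory with LINQ; the `OutboxRow` and `NodeKpi` records, the status strings, and the `"unknown"` bucket for `NULL` nodes are all assumptions made for illustration, and the real repositories would presumably run the equivalent query through EF Core.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row and result shapes; the real entities and status values may differ.
public sealed record OutboxRow(string? SourceNode, string Status);
public sealed record NodeKpi(string SourceNode, int Stuck, int Parked);

public static class PerNodeKpiSketch
{
    // Groups rows by node-of-origin and counts stuck/parked rows per node.
    public static IReadOnlyList<NodeKpi> Compute(IEnumerable<OutboxRow> rows) =>
        rows.GroupBy(r => r.SourceNode ?? "unknown") // assumed bucket for NULL nodes
            .Select(g => new NodeKpi(
                g.Key,
                g.Count(r => r.Status == "Stuck"),
                g.Count(r => r.Status == "Parked")))
            .OrderBy(k => k.SourceNode, StringComparer.Ordinal)
            .ToList();
}
```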
|
||||
|
||||
### T7 — Structured response-capture increments (no schema change)
|
||||
- **(a) Inbound request headers** → captured into the existing `Extra` JSON in
  `AuditWriteMiddleware.EmitInboundAudit`, passed through the existing header
  redactor (auth headers redacted by default).
- **(b) `AuditInboundCeilingHits`** counter on `AuditCentralHealthSnapshot`
  (alongside the existing failure counters), incremented when an inbound row
  truncates (request or response hits `InboundMaxBytes`). Surfaced via the
  health snapshot.
- **(c) Per-method opt-out** of body capture: a `SkipBodyCapture` flag on
  `PerTargetRedactionOverride`, checked in the capture pipeline so a noisy/
  sensitive method can suppress body capture (headers + metadata still recorded).
- **Tests:** request headers land in `Extra` and are redacted; ceiling-hit
  increments the counter; opt-out suppresses body but keeps the row.
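
Item (a) amounts to "redact, then serialize under `requestHeaders`". A minimal sketch of that step follows, assuming a fixed set of sensitive header names and a `[REDACTED]` placeholder; neither is taken from the real header redactor, whose rules are not shown in this design.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class HeaderCaptureSketch
{
    // Assumed list of sensitive headers; the real redactor's list is not shown here.
    private static readonly HashSet<string> Sensitive =
        new(StringComparer.OrdinalIgnoreCase) { "Authorization", "Cookie", "X-Api-Key" };

    // Builds the Extra JSON fragment with post-redaction request headers.
    public static string BuildExtra(IEnumerable<KeyValuePair<string, string>> headers)
    {
        var redacted = new SortedDictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var (name, value) in headers)
            redacted[name] = Sensitive.Contains(name) ? "[REDACTED]" : value;

        return JsonSerializer.Serialize(
            new Dictionary<string, object> { ["requestHeaders"] = redacted });
    }
}
```

Redacting before serialization means the secret never reaches the JSON writer, which matches the "post-redaction" wording used for `Extra.requestHeaders`.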
|
||||
|
||||
### T4 — `ParentExecutionId` tag-cascade (touches the actor model — high-risk)
|
||||
Completes the execution tree beyond the inbound-API→routed-script case.
- **Alarm on-trigger:** thread a `Guid? parentExecutionId` through
  `AlarmActor.SpawnAlarmExecutionActor` → `AlarmExecutionActor` →
  `ScriptRuntimeContext`, so an alarm-triggered script chains to its firing
  context (the alarm's own execution id where one exists; otherwise a root).
- **Nested `CallScript`/`CallShared`:** in `ScriptRuntimeContext`, pass **the
  current run's `ExecutionId`** (not the inherited `_parentExecutionId`) as the
  child invocation's `ParentExecutionId`, so `A → CallScript(B)` records B's
  parent as A — a true multi-level tree.
- **Timer/expression-trigger top-level runs** stay roots (no spawner) — unchanged.
- **Tests:** alarm-triggered script row carries the expected parent; a 2-level
  nested `CallScript` produces a chain A→B→C walkable by `GetExecutionTreeAsync`.
- **Risk:** serialized actor state + correlation plumbing; covered by targeted
  SiteRuntime actor tests + a tree-walk integration assertion.
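
The nested-call rule is easy to get backwards, so here is a deliberately tiny model of it. `RunContextSketch` is hypothetical and carries none of the real `ScriptRuntimeContext` state; it only shows which id is handed to the child.

```csharp
using System;

// Hypothetical miniature of a script run; the real ScriptRuntimeContext carries far more state.
public sealed class RunContextSketch
{
    public Guid ExecutionId { get; } = Guid.NewGuid();
    public Guid? ParentExecutionId { get; }

    public RunContextSketch(Guid? parentExecutionId = null) =>
        ParentExecutionId = parentExecutionId;

    // The child is parented to THIS run's ExecutionId,
    // not to the parent id this run inherited from its own spawner.
    public RunContextSketch CallScript() => new(ExecutionId);
}
```

With this rule, a chain A → B → C records C's parent as B. Passing the inherited parent instead would flatten the tree, making both B and C children of whatever spawned A.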
|
||||
|
||||
### T3 — Per-channel retention overrides (one design wrinkle, resolved)
|
||||
Retention is a single global `RetentionDays`; the purge actor switches out whole
month partitions by `OccurredAtUtc` (channel-blind).
- Add `PerChannelRetentionDays` (`Dictionary<string,int>`, keyed by channel /
  `Action` name) to `AuditLogOptions`, validated like the global value; a channel
  override may only be **shorter** than the global window (longer is meaningless
  under month-partition switch-out, which is governed by the largest retention).
- **Mechanism (resolved):** after the coarse global partition purge, the purge
  actor runs a **bounded row-level delete** for channels whose override is
  shorter than global (`DELETE … WHERE Action=@channel AND OccurredAtUtc<@thr`,
  batched). This runs from the **purge/maintenance path, not the writer role** —
  the append-only invariant binds the writer/ingest role, not maintenance. The
  **M2.10 CI grep-guard is widened** to allow the purge actor's single audited
  deletion call site (an allow-list entry, not a blanket exemption).
- **Tests:** a channel with a shorter override is purged earlier than the global;
  channels without an override follow the global; the guard still rejects
  UPDATE/DELETE everywhere except the sanctioned purge site.
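
The "shorter than global" rule and the per-channel `@thr` value can be sketched as one small pure function. The class and method names are hypothetical, and the lower bound is simplified to "positive"; the real validator may enforce a stricter minimum.

```csharp
using System;
using System.Collections.Generic;

public static class ChannelRetentionSketch
{
    // Returns the delete threshold per channel: rows with OccurredAtUtc before
    // the threshold would be eligible for the batched row-level delete.
    public static IReadOnlyDictionary<string, DateTime> Thresholds(
        int globalDays, IReadOnlyDictionary<string, int> perChannelDays, DateTime utcNow)
    {
        var result = new Dictionary<string, DateTime>(StringComparer.Ordinal);
        foreach (var (channel, days) in perChannelDays)
        {
            // An override equal to or longer than the global window is rejected:
            // the partition switch-out would have removed those rows already.
            if (days <= 0 || days >= globalDays)
                throw new ArgumentException(
                    $"Override for '{channel}' must be positive and shorter than {globalDays} days.");
            result[channel] = utcNow.AddDays(-days);
        }
        return result;
    }
}
```

Channels absent from the dictionary get no threshold, so they are left entirely to the global partition purge.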
|
||||
|
||||
### T5 — Historical backfill (reframed per the computed-column reality)
|
||||
- **`SourceNode`** is a physical nullable column. For truly historical rows the
  node-of-origin is **unknowable**, so the backfill sets a **configurable
  sentinel** (default `"unknown"`) on `NULL` rows via a one-shot maintenance
  command (run from the purge/maintenance path), rather than guessing a node.
- **`ExecutionId`/`ParentExecutionId`** are **persisted computed columns derived
  from `DetailsJson`**; backfilling them means mutating the JSON, which
  append-only forbids. These are **documented as a runbook limitation** (pre-feature
  rows stay NULL) — no code.
- **Tests:** the SourceNode backfill sets the sentinel only on NULL rows within a
  bounded range and is idempotent; documentation note added.
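
The idempotency property in the tests follows directly from only touching `NULL` rows. The in-memory sketch below illustrates that; `AuditRowSketch` and `SourceNodeBackfillSketch` are hypothetical names, and the real command would issue a bounded UPDATE from the maintenance path rather than loop over objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory row; stands in for an AuditLog row during a maintenance pass.
public sealed class AuditRowSketch
{
    public DateTime OccurredAtUtc { get; init; }
    public string? SourceNode { get; set; }
}

public static class SourceNodeBackfillSketch
{
    // Sets the sentinel on NULL rows older than `before`; returns how many rows changed.
    public static int Run(
        IEnumerable<AuditRowSketch> rows, DateTime before, string sentinel = "unknown")
    {
        var changed = 0;
        foreach (var row in rows)
        {
            // Skip rows that already have a node, and rows outside the bounded range.
            if (row.SourceNode is not null || row.OccurredAtUtc >= before) continue;
            row.SourceNode = sentinel;
            changed++;
        }
        return changed;
    }
}
```

A second run over the same rows changes nothing, because every row in range now has a non-`NULL` `SourceNode`.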
|
||||
|
||||
---
|
||||
|
||||
## Cross-cutting
|
||||
|
||||
- **Shared seams:** `AuditLogOptions` (T3, T7), `AuditEndpoints.MapAuditAPI`
  (T8), `AuditCommands` (T5, T8), `AuditCentralHealthSnapshot` (T6, T7),
  `IAuditLogRepository`/the KPI repositories (T6), the purge/maintenance role
  (T3, T5). No AuditLog **schema** change in M5 (T1/T2 deferred).
- **Append-only:** the only new mutations are T3's purge-role channel DELETE and
  T5's purge-role sentinel UPDATE; both run on the maintenance path and both are
  reflected in the CI guard's allow-list. Writer/ingest paths stay INSERT-only.
|
||||
|
||||
## Testing strategy
|
||||
|
||||
Per-item unit + targeted integration tests (above). T4 additionally gets a
tree-walk integration assertion. Full-solution build + targeted suites at the
integration step. No new infra dependency (Parquet deferred).

## Sequencing

Independent items, parallelizable by disjoint area:
- **Wave A (parallel):** T8 (CLI+endpoint), T6 (KPI repos+actors+tiles), T7
  (middleware+health+redaction-override) — disjoint projects.
- **Wave B (parallel):** T4 (SiteRuntime actors — high-risk), T3 (AuditLog
  options+purge actor+CI guard), T5 (purge-path backfill command + runbook).
- **Wave C:** integration verification + docs (Component-AuditLog/-CLI, CLAUDE.md
  KPI/retention notes, runbook).

## Risks

- **T4** actor-model correlation (serialized state) — targeted tests + tree-walk
  assertion.
- **T3** append-only tension — resolved via maintenance-role delete + CI-guard
  allow-list; verify the guard still blocks all other DELETE/UPDATE.
- **T5** node-of-origin unknowable — sentinel + documented limitation (no false
  precision).
|
||||
@@ -0,0 +1,92 @@
|
||||
# M5 — Audit Hardening (T3–T8) Implementation Plan
|
||||
|
||||
> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
|
||||
|
||||
**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
|
||||
|
||||
**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
|
||||
|
||||
**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
|
||||
|
||||
**Conventions:** targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
|
||||
|
||||
---
|
||||
|
||||
# Wave A — leverage-existing-infra (parallel; disjoint projects)
|
||||
|
||||
### Task M5.1 (T8): CLI `audit tree` + tree endpoint
|
||||
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=<guid>` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
|
||||
- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
|
||||
- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
|
||||
**AC:** `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
|
||||
|
||||
### Task M5.2 (T6): Per-node stuck-count KPIs
|
||||
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
|
||||
**Files:**
|
||||
- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
|
||||
- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
|
||||
- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
|
||||
- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
|
||||
**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
|
||||
|
||||
### Task M5.3 (T7): Structured response-capture increments
|
||||
**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
|
||||
- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
|
||||
**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
|
||||
|
||||
---
|
||||
|
||||
# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
|
||||
|
||||
### Task M5.4 (T4): ParentExecutionId tag-cascade
|
||||
**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
|
||||
- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
|
||||
**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
|
||||
|
||||
### Task M5.5 (T3): Per-channel retention overrides
|
||||
**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
|
||||
- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
|
||||
- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
|
||||
**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
|
||||
|
||||
### Task M5.6 (T5): SourceNode sentinel backfill + runbook
|
||||
**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
|
||||
**Files:**
|
||||
- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
|
||||
- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
|
||||
- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
|
||||
**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
|
||||
|
||||
---
|
||||
|
||||
# Wave C — integration + docs
|
||||
|
||||
### Task M5.7: Integration verification + docs
|
||||
**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1–M5.6
|
||||
**Steps:**
|
||||
1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
|
||||
2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
|
||||
3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
|
||||
4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
|
||||
**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
|
||||
|
||||
---
|
||||
|
||||
## Native tasks & dependencies
|
||||
|
||||
Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
|
||||
@@ -0,0 +1,13 @@
|
||||
{
  "planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
  "tasks": [
    {"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
    {"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
    {"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
    {"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
    {"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
    {"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
    {"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
  ],
  "lastUpdated": "2026-06-16"
}
|
||||
@@ -0,0 +1,264 @@
|
||||
# Patch request — event-driven "wait for attribute change (with timeout)" script helper
|
||||
|
||||
**Date:** 2026-06-17
**Type:** Source enhancement (small, additive) to the SiteRuntime script surface
**Why now:** the DELMIA/MES receiver re-implementation
([`2026-06-17-delmia-mes-receiver-templates-design.md`](2026-06-17-delmia-mes-receiver-templates-design.md), §9 risk #1)
currently has to **busy-poll** for the handshake completion flag. This spec describes the gap
and a precise, patch-ready design for a host-provided `WaitAsync` helper so scripts can wait
**event-driven** for a tag/attribute to reach a value, bounded by a timeout.

> All file paths, line numbers, message records, and signatures below were read from source on
> 2026-06-17. Treat line numbers as guides (they drift); the type/method names are the anchors.
|
||||
|
||||
---
|
||||
|
||||
## 1. The gap
|
||||
|
||||
The receiver handshake (and any request/response tag interaction) needs to **wait until a
data-sourced attribute reaches a value** — e.g. wait up to 30 s for `RecipeProcessedFlag == true`
or `MoveInCompleteFlag == true` after setting the trigger flag.

ScadaBridge's script surface today has **read** (`Attributes.GetAsync` / indexer) and **write**
(`Attributes.SetAsync` / indexer), but **no "wait for value" primitive**. The only way to wait is
a manual poll loop:

```csharp
// current workaround — every handshake script repeats this
var deadline = DateTime.UtcNow.AddSeconds(30);
while (DateTime.UtcNow < deadline && !CancellationToken.IsCancellationRequested)
{
    if ((bool?)(await Attributes.GetAsync("RecipeProcessedFlag")) == true) break;
    await Task.Delay(200, CancellationToken);
}
```

Why this is unsatisfactory:

- **Latency** — completion is detected up to one poll interval late (200 ms here).
- **Wasted work** — each iteration is an actor `Ask` (`GetAttributeRequest` round-trip to the
  `InstanceActor`); N handshakes × M polls = a lot of needless messages.
- **Boilerplate** — the same loop is copy-pasted into every handshake script, easy to get wrong
  (forgetting `CancellationToken`, off-by-one on the deadline, not handling quality).
- **No quality awareness** — the poll reads whatever value is cached regardless of OPC/MX quality.

Crucially, **the data is already being pushed to the actor that owns it.** A data-sourced
attribute's value arrives from the DCL and is applied in the `InstanceActor`, which then raises
`AttributeValueChanged`. So an event-driven waiter is natural and removes the poll entirely.
|
||||
|
||||
---
|
||||
|
||||
## 2. Where the change goes (verified wiring)
|
||||
|
||||
| Concern | Type / file | Notes |
|---|---|---|
| Change notification | `AttributeValueChanged(InstanceUniqueName, AttributePath, AttributeName, Value, Quality, Timestamp)` — `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Streaming/AttributeValueChanged.cs` | raised on **every** change |
| **Single choke point** | `InstanceActor.HandleAttributeValueChanged(...)` — `src/…/SiteRuntime/Actors/InstanceActor.cs` | both static writes (`HandleSetStaticAttributeCore`) **and** DCL/subscription updates (`HandleTagValueUpdate` ← `TagValueUpdate`) funnel through here, then `PublishAndNotifyChildren` |
| Owner of state | `InstanceActor` (`_attributes`, `_attributeQualities`, `_attributeTimestamps`) | **single-threaded** — registration + current-value check is atomic here |
| Script read path | `AttributeAccessor` (`ScopeAccessors.cs`) → `ScriptRuntimeContext.GetAttribute` → `Ask<GetAttributeResponse>(GetAttributeRequest)` | the helper mirrors this |
| Script globals build | `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`) builds `ScriptRuntimeContext` (passes `instanceActor`, `self`, `_askTimeout`) and `ScriptGlobals` (`CancellationToken = cts.Token` from the per-script timeout) | **the script timeout token is NOT currently passed into `ScriptRuntimeContext`** — this patch must thread it in |
| Helper idiom | `ScriptRuntimeContext` nested helpers (e.g. `ExternalSystemHelper`) — ctor deps stored as readonly fields, exposed via an on-demand property | follow this idiom |
| Trust model | `ScriptTrustPolicy` (`src/…/ScriptAnalysis/`) | `System.Threading.Tasks` + `CancellationToken`/`CancellationTokenSource` are in `AllowedExceptions`; lambdas/`Func<>` are fine. **No trust change needed** — the wait runs in host code; the script just `await`s a provided method. |

**Design principle:** do the wait **inside the `InstanceActor`** as a one-shot registered waiter,
not in the script via polling. Because the actor is single-threaded and `HandleAttributeValueChanged`
is the one place every change passes, a waiter that (a) checks the current value on registration and
(b) is re-evaluated on each change **cannot miss the edge** between "read current" and "subscribe".
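
To make the "cannot miss the edge" argument concrete, here is a stripped-down, actor-free sketch of a one-shot waiter registry. It is not the proposed `InstanceActor` code: `WaiterRegistrySketch` and its members are hypothetical, the timeout path is left out, and single-threaded access is simply assumed (in the real design the actor mailbox provides it).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, actor-free model; both methods are assumed to run on one thread.
public sealed class WaiterRegistrySketch
{
    private readonly Dictionary<string, object?> _values = new();
    private readonly List<(string Name, Func<object?, bool> Match, TaskCompletionSource<bool> Done)>
        _waiters = new();

    // (a) Check the current value first; only register if it does not match yet.
    public Task<bool> Register(string name, Func<object?, bool> match)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        if (_values.TryGetValue(name, out var current) && match(current))
            tcs.SetResult(true);
        else
            _waiters.Add((name, match, tcs));
        return tcs.Task;
    }

    // (b) Every change passes through here and re-evaluates the pending waiters.
    public void OnChanged(string name, object? value)
    {
        _values[name] = value;
        for (var i = _waiters.Count - 1; i >= 0; i--)
        {
            var waiter = _waiters[i];
            if (waiter.Name != name || !waiter.Match(value)) continue;
            _waiters.RemoveAt(i);          // one-shot: remove before completing
            waiter.Done.SetResult(true);
        }
    }
}
```

Because `Register` and `OnChanged` never interleave, a change either lands before registration (and is seen by the current-value check) or after it (and is seen by the re-evaluation). There is no window in between.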
|
||||
|
||||
---
|
||||
|
||||
## 3. Proposed API (script-facing)
|
||||
|
||||
Add to the `Attributes` accessor (`AttributeAccessor` in `ScopeAccessors.cs`), so scope/composition
|
||||
path resolution (`Resolve(name)`) applies just like get/set:
|
||||
|
||||
```csharp
|
||||
// Wait until `name` equals targetValue (value-equality, codec-normalized). Returns true if matched
|
||||
// within the timeout, false if it timed out. Honors the script CancellationToken.
|
||||
Task<bool> Attributes.WaitAsync(string name, object? targetValue, TimeSpan timeout);
|
||||
|
||||
// Predicate form — site-local template scripts only (predicate is an in-process delegate).
|
||||
Task<bool> Attributes.WaitAsync(string name, Func<object?, bool> predicate, TimeSpan timeout);
|
||||
|
||||
// Optional richer overload that also returns the matched value + quality.
|
||||
Task<WaitResult> Attributes.WaitForAsync(string name, object? targetValue, TimeSpan timeout);
|
||||
// record WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
|
||||
```
|
||||
|
||||
> **Status:** IMPLEMENTED. `Attributes.WaitForAsync(...)` returns a `WaitResult`
|
||||
> (`readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut)`
|
||||
> in Commons), populated on match (Value + Quality) and `Matched:false, TimedOut:true` on timeout.
|
||||
|
||||
Return **bool** (not throw) for the common case — the handshake wants matched/timed-out, not an
|
||||
exception. The value-equality overload is the one the handshake needs and is the one that can also
|
||||
be exposed on the inbound/routed side (§6), because a value serializes and a delegate does not.
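
Why an encoded target value can cross the wire while a delegate cannot is sketched below. This is a rough illustration only: it assumes, hypothetically, that codec normalization behaves like canonical JSON encoding. `WaitEncodingSketch`, `EncodeForWait`, and `SameEncodedValue` are made-up names, and the real `AttributeValueCodec.Encode` may normalize differently. The sketch also does not model the "null = any change" convention from §4.1.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for AttributeValueCodec; the real codec's rules are not shown in this spec.
public static class WaitEncodingSketch
{
    // Encodes a non-null value as a plain string, which is trivially serializable.
    public static string EncodeForWait(object value)
        => JsonSerializer.Serialize(value, value.GetType());

    // Compares the current attribute value against an already-encoded target.
    public static bool SameEncodedValue(object? current, string targetEncoded)
        => current is not null
           && string.Equals(EncodeForWait(current), targetEncoded, StringComparison.Ordinal);
}
```

Under this assumption, `SameEncodedValue(1L, EncodeForWait(1))` is true even though `((object)1L).Equals(1)` is false, which is the kind of type drift a codec-normalized comparison is meant to absorb.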
|
||||
|
||||
Handshake, rewritten (replaces the §1 poll loop):
|
||||
|
||||
```csharp
|
||||
await Attributes.SetAsync("RecipeDownloadFlag", true); // trigger
|
||||
var ok = await Attributes.WaitAsync("RecipeProcessedFlag", true, TimeSpan.FromSeconds(30));
|
||||
if (!ok) return new { Result = false, ResultText = "Timeout waiting for recipe to be processed" };
|
||||
return new {
|
||||
Result = (bool?)(await Attributes.GetAsync("RecipeProcessResult")) ?? false,
|
||||
ResultText = (string?)(await Attributes.GetAsync("RecipeProcessResultText")) ?? ""
|
||||
};
|
||||
```
|
||||
|
||||
```csharp
|
||||
await Attributes.SetAsync("MoveInFlag", true);
|
||||
var ok = await Attributes.WaitAsync("MoveInCompleteFlag", true, TimeSpan.FromSeconds(30));
|
||||
// … read MoveInSuccessfulFlag / MoveInErrorText / MoveInBatchID …
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Implementation outline (the patch)
|
||||
|
||||
### 4.1 New messages (`src/ZB.MOM.WW.ScadaBridge.Commons/Messages/…`)
|
||||
```csharp
|
||||
// actor protocol (site-local; delegate is fine because messaging is in-process)
|
||||
public record WaitForAttributeRequest(
|
||||
string CorrelationId,
|
||||
string InstanceName,
|
||||
string AttributeName, // already scope-resolved by the accessor
|
||||
string? TargetValueEncoded, // AttributeValueCodec.Encode(targetValue); null = "any change"
|
||||
Func<object?, bool>? Predicate, // local-only; null when TargetValueEncoded is used
|
||||
TimeSpan Timeout,
|
||||
DateTimeOffset OccurredAtUtc);
|
||||
|
||||
public record WaitForAttributeResponse(
|
||||
string CorrelationId,
|
||||
bool Matched,
|
||||
object? Value,
|
||||
string? Quality,              // null on timeout/error
|
||||
bool TimedOut,
|
||||
string? ErrorMessage = null);
|
||||
|
||||
// internal self-message used to fire the timeout
|
||||
public record WaitForAttributeTimeout(string CorrelationId);
|
||||
```
|
||||
|
||||
### 4.2 `InstanceActor` (`src/…/SiteRuntime/Actors/InstanceActor.cs`)
|
||||
- Add a registry: `Dictionary<string, PendingWait> _attributeWaiters` keyed by `CorrelationId`, where
|
||||
`PendingWait` holds the attribute name, the match test (decoded target value **or** predicate),
|
||||
the original `Sender` (`IActorRef`), and the scheduled `ICancelable` timeout handle.
|
||||
- **Handle `WaitForAttributeRequest`:**
|
||||
1. Build the match test (decode `TargetValueEncoded` via `AttributeValueCodec` → equality test, or
|
||||
use `Predicate`).
|
||||
2. **Fast path:** if the current `_attributes[name]` already satisfies the test, reply
|
||||
`WaitForAttributeResponse(Matched: true, Value, Quality)` immediately and return.
|
||||
3. Otherwise register the waiter and schedule the timeout:
|
||||
`Context.System.Scheduler.ScheduleTellOnce(effectiveTimeout, Self, new WaitForAttributeTimeout(cid), Self)`,
|
||||
storing the returned `ICancelable`. Capture `Sender` now (it is invalid later).
|
||||
4. Compute `effectiveTimeout = min(request.Timeout, requestDeadlineFromCaller)` before scheduling
in step 3 (the caller's `Ask` already carries the script token; see §4.3). Optionally cap the
number of concurrent waiters per instance (defensive; reply with `ErrorMessage` if exceeded).
|
||||
- **In `HandleAttributeValueChanged` (after state is updated):** iterate `_attributeWaiters` whose
|
||||
attribute matches the changed `AttributeName`; for any whose test now passes, cancel its timeout,
|
||||
reply `WaitForAttributeResponse(Matched: true, …)`, and remove it. (Iterate over a snapshot to
|
||||
allow removal during enumeration.)
|
||||
- **Handle `WaitForAttributeTimeout`:** if still registered, reply
|
||||
`WaitForAttributeResponse(Matched: false, TimedOut: true)` and remove.
|
||||
- Optional: a `quality == "Good"`-only mode (parameter on the request) if a handshake must ignore
|
||||
Bad-quality transients.
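
The registry behaviour described in the bullets above can be sketched without Akka as follows. This is a simplified model, not the actual `InstanceActor` code: `WaiterRegistrySketch` and its members are hypothetical names, replies are modelled as an `Action<bool>` callback instead of an `IActorRef`, and timeout scheduling is left out (the sketch assumes something else calls `OnTimeout`). All methods are assumed to run on a single thread, as the actor mailbox would guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Actor-free model of the waiter registry; every method is assumed to run on one thread.
public sealed class WaiterRegistrySketch
{
    private sealed record PendingWait(string Attribute, Func<object?, bool> Test, Action<bool> Reply);

    private readonly Dictionary<string, object?> _attributes = new();
    private readonly Dictionary<string, PendingWait> _waiters = new();

    public int PendingCount => _waiters.Count;

    // Fast path: reply immediately if the current value already satisfies the test.
    public void Register(string correlationId, string attribute, Func<object?, bool> test, Action<bool> reply)
    {
        if (_attributes.TryGetValue(attribute, out var current) && SafeTest(test, current))
        {
            reply(true);
            return;
        }
        _waiters[correlationId] = new PendingWait(attribute, test, reply);
    }

    // Called after state is updated; iterates a snapshot so removal during the loop is safe.
    public void OnValueChanged(string attribute, object? value)
    {
        _attributes[attribute] = value;
        foreach (var (id, wait) in _waiters.ToList())
        {
            if (wait.Attribute == attribute && SafeTest(wait.Test, value))
            {
                _waiters.Remove(id);
                wait.Reply(true);
            }
        }
    }

    // Timeout self-message: a no-op if the waiter has already matched and been removed.
    public void OnTimeout(string correlationId)
    {
        if (_waiters.Remove(correlationId, out var wait))
            wait.Reply(false);
    }

    // A throwing predicate counts as a non-match instead of crashing the owner.
    private static bool SafeTest(Func<object?, bool> test, object? value)
    {
        try { return test(value); } catch { return false; }
    }
}
```

Because registration and change handling share one thread, a late `OnTimeout` for an already-matched waiter simply finds nothing to remove, so each waiter is answered exactly once.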
|
||||
|
||||
> **Status:** IMPLEMENTED as an opt-in `requireGoodQuality` parameter on `WaitAsync`/`WaitForAsync`
|
||||
> (additive trailing `RequireGoodQuality` field on `WaitForAttributeRequest`, gated at both the
|
||||
> fast-path and resolve-loop match sites). Default `false` = quality-agnostic (matches on value only).
|
||||
|
||||
### 4.3 `ScriptRuntimeContext` (`src/…/SiteRuntime/Scripts/ScriptRuntimeContext.cs`)
|
||||
- **Thread the script timeout token in.** Add a `CancellationToken scriptTimeoutToken` constructor
|
||||
parameter (today only `_askTimeout` is available to helpers; the per-script `cts.Token` is **not**
|
||||
passed). `ScriptExecutionActor` already has `cts.Token` — pass it when constructing the context.
|
||||
- Add a method that the accessor calls:
|
||||
```csharp
|
||||
public async Task<bool> WaitAttribute(string name, string? targetValueEncoded,
|
||||
Func<object?,bool>? predicate, TimeSpan timeout)
|
||||
{
|
||||
var cid = Guid.NewGuid().ToString();
|
||||
var req = new WaitForAttributeRequest(cid, _instanceName, name, targetValueEncoded,
|
||||
predicate, timeout, DateTimeOffset.UtcNow);
|
||||
// Ask bounded by the script timeout token so a script-deadline abort cancels the await.
|
||||
var resp = await _instanceActor.Ask<WaitForAttributeResponse>(
|
||||
req, timeout + _askTimeout /* small slack */, _scriptTimeoutToken);
|
||||
return resp.Matched;
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 `ScriptExecutionActor` (`src/…/SiteRuntime/Actors/ScriptExecutionActor.cs`)
|
||||
- Pass `cts.Token` (the per-script timeout, created at the `new CancellationTokenSource(timeout)`
|
||||
site) into the new `ScriptRuntimeContext` constructor parameter from §4.3.
|
||||
|
||||
### 4.5 `AttributeAccessor` (`src/…/SiteRuntime/Scripts/ScopeAccessors.cs`)
|
||||
```csharp
|
||||
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout)
|
||||
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout);
|
||||
|
||||
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout)
|
||||
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout);
|
||||
```
|
||||
|
||||
### 4.6 Trust model — no change
|
||||
`WaitAsync` is a host-provided async method; the wait/scheduling happens in host code. The script
|
||||
only `await`s it and may pass a `Func<>` (a normal closure, not reflection). `System.Threading.Tasks`
|
||||
+ `CancellationToken` are already in `ScriptTrustPolicy.AllowedExceptions`. Verify the new helper
|
||||
type/members don't collide with `ForbiddenIdentifiers` (`dynamic`, `Activator`) — they don't.
|
||||
|
||||
---
|
||||
|
||||
## 5. Correctness notes
|
||||
|
||||
- **No missed edge.** Registration (current-value check) and change-handling both run on the
|
||||
`InstanceActor`'s single thread, so a value that flips between "set trigger" and "register waiter"
|
||||
is caught by the fast-path check; a value that flips after registration is caught by
|
||||
`HandleAttributeValueChanged`. The poll-loop and this design are both correct; this one is
|
||||
event-driven and cheaper.
|
||||
- **Timeout is authoritative and self-cleaning.** The scheduled `WaitForAttributeTimeout` guarantees
|
||||
the waiter is removed and the caller answered even if the value never changes. Match cancels the
|
||||
scheduled timeout.
|
||||
- **Cancellation.** Bounding the helper `Ask` with the script timeout token means a script that hits
|
||||
its own `ExecutionTimeoutSeconds` abandons the wait; pair with a best-effort cancel message to the
|
||||
actor to evict the orphan waiter promptly (otherwise it self-evicts at its own timeout).
|
||||
- **Concurrency / re-entrancy.** Multiple waiters per instance are fine (keyed by `CorrelationId`).
|
||||
Consider a per-instance cap as a guard against a script leaking waiters in a loop.
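
The split between "timed out" (a normal `false` result) and "script cancelled" (an exception that propagates) can be illustrated with plain BCL types. This is only an analogy under stated assumptions: the real helper bounds an Akka `Ask`, whereas the hypothetical `WaitBoundSketch.AwaitMatch` below uses `Task.WaitAsync` (.NET 6+) on an arbitrary task standing in for the actor reply.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitBoundSketch
{
    // Returns true when `match` completes in time and false when the wait timeout elapses.
    // Cancellation of the script token is NOT swallowed: OperationCanceledException propagates.
    public static async Task<bool> AwaitMatch(Task match, TimeSpan waitTimeout, CancellationToken scriptToken)
    {
        try
        {
            await match.WaitAsync(waitTimeout, scriptToken);
            return true;
        }
        catch (TimeoutException)
        {
            return false; // the wait's own timeout is an expected outcome, not an error
        }
    }
}
```

Keeping cancellation as an exception lets the script host tell "the handshake did not complete" apart from "the script itself ran out of time".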
|
||||
|
||||
---
|
||||
|
||||
## 6. Optional: inbound / routed variant
|
||||
|
||||
For symmetry with `RouteTarget.GetAttributes` (`src/…/InboundAPI/RouteHelper.cs`), an inbound script
|
||||
could call `Route.To(code).WaitForAttribute(name, targetValue, timeout)`. Mirror the existing routed
|
||||
pattern: add `RouteToWaitForAttributeRequest/Response`, an `IInstanceRouter.RouteToWaitForAttributeAsync`
|
||||
method, and unpack it on the site comms actor into the same `WaitForAttributeRequest` to the
|
||||
`InstanceActor`. **Value-equality only** across the wire — a `Func<>` predicate cannot be serialized,
|
||||
so the routed form takes the encoded target value (the predicate overload stays site-local). This is
|
||||
optional: the receiver handshake runs **inside** the template script (site-local), so §3–§5 alone
|
||||
fully cover the DELMIA/MES use case.
|
||||
|
||||
> **Status:** IMPLEMENTED. `Route.To(code).WaitForAttribute(name, targetValue, timeout)` is wired
|
||||
> end-to-end (`RouteToWaitForAttributeRequest/Response` → `IInstanceRouter` → `CommunicationService`
|
||||
> → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only
|
||||
> across the wire. NOT wired into the CentralUI Test-Run sandbox — that remains a follow-up.
|
||||
|
||||
---
|
||||
|
||||
## 7. Acceptance criteria
|
||||
|
||||
1. A template script can `await Attributes.WaitAsync("Flag", true, TimeSpan.FromSeconds(30))` and it
|
||||
returns `true` promptly when the data-sourced attribute reaches `true` (driven by a DCL update),
|
||||
with no poll loop.
|
||||
2. Returns `false` (no throw) when the value never matches within the timeout.
|
||||
3. The wait is bounded by the script's own `ExecutionTimeoutSeconds` (a shorter script deadline wins).
|
||||
4. No `AttributeValueChanged` edge is missed across the register/change boundary (unit test: flip the
|
||||
value in the same actor step as registration, and one step after).
|
||||
5. Waiters are removed on match and on timeout (no leak; assert registry empty afterward).
|
||||
6. Scope/composition path resolution works (`Children["DelmiaReceiver"]`-scoped wait resolves to the
|
||||
composed child's attribute).
|
||||
7. Passes `ScriptAnalysis` trust validation unchanged.
|
||||
8. The DELMIA/MES handshake base scripts (design doc §4) compile and pass using `WaitAsync` in place
|
||||
of the poll loop.
|
||||
|
||||
Suggested tests: extend `InstanceActor` tests (waiter fast-path, change-match, timeout, removal) and
|
||||
the script-surface tests under `tests/…/SiteRuntime*`.
|
||||
```
|
||||
@@ -0,0 +1,226 @@
|
||||
# WaitAsync Deferred Optional Items — Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (subagent-driven) to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Implement the three items deferred from the WaitAsync spec (`docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md`): §3 `WaitForAsync`/`WaitResult` richer overload, §4.2 quality-gated ("Good"-only) matching, and §6 inbound/routed `Route.To(...).WaitForAttribute` variant.
|
||||
|
||||
**Architecture:** Builds on the shipped core (`b89d69a`→`04e97f4`). Two of the items (§3, §4.2) are site-local enrichments of the existing `Attributes` script surface + `InstanceActor` waiter; no new actor protocol shapes beyond an additive `RequireGoodQuality` field. The third (§6) mirrors the existing `Route.To(...).GetAttributes` cross-cluster path end-to-end (`RouteTarget` → `IInstanceRouter` → `CommunicationService` → `SiteCommunicationActor` → `DeploymentManagerActor` → `InstanceActor`), value-equality only across the wire, with the cluster Ask bounded by the *wait* timeout rather than the generic integration timeout.
|
||||
|
||||
**Tech Stack:** C#/.NET 10, Akka.NET 1.5, xUnit + Akka.TestKit + NSubstitute.
|
||||
|
||||
**Branch/worktree:** `waitfor-attr-helper` at `/Users/dohertj2/Desktop/ScadaBridge/.claude/worktrees/waitfor-attr-helper` (off local main; carries the core feature). Implementers do NOT create worktrees, commit in **pathspec form** (`git commit -m "…" -- <paths>`), do NOT push, and do NOT touch main. Targeted builds/tests per task; full-solution build only in WD-3.
|
||||
|
||||
---
|
||||
|
||||
## Naming / shared shapes
|
||||
|
||||
- New script return type `WaitResult` (Commons): `public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);`
|
||||
- `WaitForAttributeRequest` gains a trailing additive field `bool RequireGoodQuality = false` (site-local request). `RequireGoodQuality` semantics: a match requires the value test to pass **and** `string.Equals(quality, "Good", StringComparison.Ordinal)`.
|
||||
- Routed contract (value-equality only, no predicate, no quality flag across the wire — §6 says value-equality only): `RouteToWaitForAttributeRequest` / `RouteToWaitForAttributeResponse` (Commons `Messages/InboundApi`).
|
||||
- The `WaitForAttributeResponse.Quality` field is already `string?` (null on timeout/error).
|
||||
|
||||
---
|
||||
|
||||
## Execution waves
|
||||
|
||||
- **Wave 1 (parallel, disjoint files):** WD-1 ∥ WD-2a. (2 concurrent committers; post-wave HEAD-presence check.)
|
||||
- **Wave 2:** WD-2b (after WD-2a).
|
||||
- **Wave 3:** WD-3 (after WD-1, WD-2a, WD-2b).
|
||||
|
||||
WD-1 must add `RequireGoodQuality` ONLY as a **trailing defaulted** ctor param of `WaitForAttributeRequest`, so WD-2b's `new WaitForAttributeRequest(...)` (built in wave 2) compiles regardless.
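
A trimmed-down illustration of why the trailing defaulted parameter keeps existing call sites compiling. `WaitRequestSketch` is a hypothetical stand-in, not the real `WaitForAttributeRequest`, which has more fields.

```csharp
using System;

// Hypothetical, reduced request shape used only to show the source-compatibility point.
public record WaitRequestSketch(
    string CorrelationId,
    string AttributeName,
    TimeSpan Timeout,
    bool RequireGoodQuality = false); // trailing + defaulted: older call sites still compile

public static class WaitRequestSketchDemo
{
    // A call site written before the field existed: three positional arguments.
    public static WaitRequestSketch OldStyle()
        => new("cid-1", "Flag", TimeSpan.FromSeconds(30));

    // A call site that opts in to the new behaviour.
    public static WaitRequestSketch NewStyle()
        => new("cid-2", "Flag", TimeSpan.FromSeconds(30), RequireGoodQuality: true);
}
```

Note that this is source compatibility only: the default is baked into each call site at compile time, so assemblies that construct the record still need to be rebuilt against the new signature.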
|
||||
|
||||
---
|
||||
|
||||
### Task WD-1: Site-local `WaitForAsync` + `WaitResult` + quality-gated mode (§3 + §4.2)
|
||||
|
||||
**Classification:** high-risk (modifies the `InstanceActor` single-threaded match evaluation + an additive message-contract field)
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** WD-2a
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ZB.MOM.WW.ScadaBridge.Commons/Types/WaitResult.cs`
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Instance/WaitForAttribute.cs` (add trailing `bool RequireGoodQuality = false` to `WaitForAttributeRequest`)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/InstanceActor.cs` (thread `RequireGoodQuality` into `PendingWait` + both match sites)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (add `WaitAttributeFull` returning `WaitResult`; add `requireGoodQuality` param)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScopeAccessors.cs` (add `WaitForAsync` overloads + `requireGoodQuality` optional param on `WaitAsync`)
|
||||
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/InstanceActorWaitForAttributeTests.cs` + `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Scripts/ScopeAccessorTests.cs`
|
||||
|
||||
**Steps (TDD):**
|
||||
|
||||
1. **`WaitResult`** — add the readonly record struct above.
|
||||
|
||||
2. **`WaitForAttributeRequest`** — add trailing `bool RequireGoodQuality = false`. Keep the `Func<>` predicate field as-is. Update the XML-doc.
|
||||
|
||||
3. **`InstanceActor`** — add `bool RequireGoodQuality` to the `PendingWait` record. At BOTH match sites build the effective match as:
|
||||
```csharp
|
||||
// fast-path (HandleWaitForAttribute): quality from _attributeQualities.GetValueOrDefault(name, <existing default>)
|
||||
// resolve loop (ResolveMatchedWaiters): quality from changed.Quality
|
||||
bool QualityOk(string? q) => !requireGoodQuality || string.Equals(q, "Good", StringComparison.Ordinal);
|
||||
bool matched = QualityOk(quality) && test(value); // keep test() inside its existing try/catch
|
||||
```
|
||||
Store `RequireGoodQuality` on the `PendingWait` so the resolve loop knows it. Keep the throwing-predicate guard (the `QualityOk && test` must still be inside the existing try/catch). A fast-path quality failure under `requireGoodQuality` is just a non-match: register the waiter and schedule the timeout as normal (do NOT fast-reply matched).
|
||||
|
||||
4. **`ScriptRuntimeContext`** — refactor: a private `Task<WaitForAttributeResponse> WaitInternal(name, encoded, predicate, timeout, requireGoodQuality)` that does the token-bounded `Ask` (keep the existing `AskTimeoutException → ...` handling; on AskTimeout return a synthetic `WaitForAttributeResponse(.., Matched:false, TimedOut:true)`). Then:
|
||||
```csharp
|
||||
public async Task<bool> WaitAttribute(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
|
||||
=> (await WaitInternal(name, enc, pred, t, requireGoodQuality)).Matched;
|
||||
public async Task<WaitResult> WaitAttributeFull(string name, string? enc, Func<object?,bool>? pred, TimeSpan t, bool requireGoodQuality = false)
|
||||
{ var r = await WaitInternal(...); return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut); }
|
||||
```
|
||||
(Note: `WaitAttribute`'s existing `AskTimeoutException → return false` must be preserved — fold it into `WaitInternal` returning a non-matched/timed-out response, OR catch in both. Do NOT catch `OperationCanceledException`/`TaskCanceledException`.)
|
||||
|
||||
5. **`AttributeAccessor`** — add `requireGoodQuality` optional param to both existing `WaitAsync` overloads, and add two `WaitForAsync` overloads:
|
||||
```csharp
|
||||
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
|
||||
public Task<WaitResult> WaitForAsync(string key, Func<object?,bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
|
||||
```
|
||||
XML-doc: `requireGoodQuality:true` ignores Bad/Uncertain-quality transients.
|
||||
|
||||
6. **Tests** (extend existing files): (a) `WaitForAsync` returns a populated `WaitResult` on match (Value+Quality) and on timeout (`Matched:false, TimedOut:true`). (b) quality-gated: a value reaching target at **Bad** quality does NOT match when `requireGoodQuality:true` (stays pending → times out), but DOES match when `false`; and matches when it reaches target at Good quality. Cover both fast-path (already-at-target-but-Bad) and change-match. (c) scope resolution still applied for `WaitForAsync`.
|
||||
|
||||
7. Build `Commons` + `SiteRuntime` + the SiteRuntime test project; run `--filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync"` and the `~InstanceActor|~ScopeAccessor` regression filter. All green.
|
||||
|
||||
8. Commit (pathspec).
|
||||
|
||||
---
|
||||
|
||||
### Task WD-2a: Routed contract + central path (§6, part 1)
|
||||
|
||||
**Classification:** high-risk (cross-cluster message contract + `IInstanceRouter` surface)
|
||||
**Estimated implement time:** ~5 min
|
||||
**Parallelizable with:** WD-1
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs` (add the two records)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/IInstanceRouter.cs` (add method)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/CommunicationServiceInstanceRouter.cs` (delegate)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs` (`RouteTarget.WaitForAttribute`)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/CommunicationService.cs` (`RouteToWaitForAttributeAsync` — **wait-timeout-aware** Ask)
|
||||
- Modify (compile-break fixes — interface gained a member): `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Integration/ParentExecutionIdCorrelationTests.cs` (`BridgingInstanceRouter`) and the inline `IInstanceRouter` double in `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointContentTypeTests.cs`
|
||||
- Test: `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/RouteHelperTests.cs`
|
||||
|
||||
**Steps (TDD):**
|
||||
|
||||
1. **Commons records** (mirror `RouteToGetAttributes*`, value-equality only):
|
||||
```csharp
|
||||
public record RouteToWaitForAttributeRequest(
|
||||
string CorrelationId, string InstanceUniqueName, string AttributeName,
|
||||
string? TargetValueEncoded, TimeSpan Timeout, DateTimeOffset Timestamp,
|
||||
Guid? ParentExecutionId = null);
|
||||
public record RouteToWaitForAttributeResponse(
|
||||
string CorrelationId, bool Matched, object? Value, string? Quality, bool TimedOut,
|
||||
bool Success, string? ErrorMessage, DateTimeOffset Timestamp);
|
||||
```
|
||||
(`Success`/`ErrorMessage` = routing-level outcome, e.g. instance-not-found; `Matched`/`TimedOut`/`Value`/`Quality` = wait outcome.)
|
||||
|
||||
2. **`IInstanceRouter`** — add `Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);`. **Update all 3 implementers** (prod `CommunicationServiceInstanceRouter` + the 2 test doubles listed above; the test doubles can return a canned response / throw NotImplemented only if never exercised — prefer a sane canned response).
|
||||
|
||||
3. **`CommunicationServiceInstanceRouter`** — delegate to `_communicationService.RouteToWaitForAttributeAsync(...)`.
|
||||
|
||||
4. **`RouteHelper.RouteTarget`** — add (mirror `GetAttributes`, throw on `!Success`):
|
||||
```csharp
|
||||
public async Task<bool> WaitForAttribute(string attributeName, object? targetValue, TimeSpan timeout, CancellationToken cancellationToken = default)
|
||||
{
|
||||
var token = Effective(cancellationToken);
|
||||
var siteId = await ResolveSiteAsync(token);
|
||||
var request = new RouteToWaitForAttributeRequest(Guid.NewGuid().ToString(), _instanceCode,
|
||||
attributeName, AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow, _parentExecutionId);
|
||||
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
|
||||
if (!response.Success) throw new InvalidOperationException(response.ErrorMessage ?? "Remote attribute wait failed");
|
||||
return response.Matched;
|
||||
}
|
||||
```
|
||||
(`AttributeValueCodec` is in Commons.Types — add the using if needed.)
|
||||
|
||||
5. **`CommunicationService.RouteToWaitForAttributeAsync`** — mirror `RouteToGetAttributesAsync` BUT bound the Ask by the wait timeout, not the generic integration timeout:
|
||||
```csharp
|
||||
var envelope = new SiteEnvelope(siteId, request);
|
||||
var askTimeout = request.Timeout + _options.IntegrationTimeout; // slack beyond the wait
|
||||
return await GetActor().Ask<RouteToWaitForAttributeResponse>(envelope, askTimeout, cancellationToken);
|
||||
```
|
||||
|
||||
6. **Test** (`RouteHelperTests`): with a substitute `IInstanceRouter` returning a canned `RouteToWaitForAttributeResponse(Matched:true,...)`, `Route.To("x").WaitForAttribute("Flag", true, 30s)` returns true; `Success:false` → throws `InvalidOperationException`; the encoded target equals `AttributeValueCodec.Encode(true)`.
|
||||
|
||||
7. Build `Commons` + `InboundAPI` + `Communication` + the two affected test projects; run `--filter "FullyQualifiedName~RouteHelper"` + a build of AuditLog.Tests/InboundAPI.Tests to confirm the interface-addition compiles. Commit (pathspec).
|
||||
|
||||
---
|
||||
|
||||
### Task WD-2b: Site unpacking + handler (§6, part 2)
|
||||
|
||||
**Classification:** high-risk (actor handler crossing into `InstanceActor`; Ask-timeout correctness)
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** none
|
||||
**blockedBy:** WD-2a
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.Communication/Actors/SiteCommunicationActor.cs` (add `Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));` next to the other RouteTo forwards ~line 145)
|
||||
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (`Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);` + handler)
|
||||
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/Actors/DeploymentManagerActorTests.cs`
|
||||
|
||||
**Steps (TDD):**
|
||||
|
||||
1. **`SiteCommunicationActor`** — add the `Receive`/Forward line.
|
||||
|
||||
2. **`DeploymentManagerActor.RouteInboundApiWaitForAttribute`** — mirror `RouteInboundApiGetAttributes`:
|
||||
```csharp
|
||||
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
|
||||
{
|
||||
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
|
||||
{
|
||||
Sender.Tell(new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false,
|
||||
false, $"Instance '{request.InstanceUniqueName}' not found on this site.", DateTimeOffset.UtcNow));
|
||||
return;
|
||||
}
|
||||
var sender = Sender;
|
||||
var inner = new WaitForAttributeRequest(request.CorrelationId, request.InstanceUniqueName,
|
||||
request.AttributeName, request.TargetValueEncoded, null /*predicate*/, request.Timeout,
|
||||
DateTimeOffset.UtcNow /*, RequireGoodQuality defaults false */);
|
||||
// Ask bounded by the WAIT timeout + slack (NOT a fixed 30s).
|
||||
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
|
||||
.ContinueWith(t => t.IsCompletedSuccessfully
|
||||
? new RouteToWaitForAttributeResponse(request.CorrelationId, t.Result.Matched, t.Result.Value,
|
||||
t.Result.Quality, t.Result.TimedOut, true, null, DateTimeOffset.UtcNow)
|
||||
: new RouteToWaitForAttributeResponse(request.CorrelationId, false, null, null, false, false,
|
||||
t.Exception?.GetBaseException().Message ?? "Attribute wait timed out", DateTimeOffset.UtcNow))
|
||||
.PipeTo(sender);
|
||||
}
|
||||
```
|
||||
(`WaitForAttributeRequest` lives in Commons `Messages/Instance` — add the using. Build with both the trailing-`RequireGoodQuality` and pre-field signatures in mind; passing 7 positional args + default is fine.)
|
||||
|
||||
3. **Test** (`DeploymentManagerActorTests`, mirror the routed get-attributes test): deploy/register an instance whose attribute already equals the target → `RouteToWaitForAttributeRequest` → `RouteToWaitForAttributeResponse(Success:true, Matched:true)`; unknown instance → `Success:false`.
|
||||
|
||||
4. Build `Communication` + `SiteRuntime` + SiteRuntime test project; run `--filter "FullyQualifiedName~DeploymentManagerActor"`. Commit (pathspec).
|
||||
|
||||
---
|
||||
|
||||
### Task WD-3: Integration — docs + full verification
|
||||
|
||||
**Classification:** standard
|
||||
**Estimated implement time:** ~4 min
|
||||
**Parallelizable with:** none
|
||||
**blockedBy:** WD-1, WD-2a, WD-2b
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md` (mark §3 `WaitForAsync`/`WaitResult`, §4.2 quality-gated mode, and §6 routed variant as IMPLEMENTED; note Test-Run sandbox parity excluded)
|
||||
- Modify: `docs/requirements/Component-SiteRuntime.md` (script-surface note: `Attributes.WaitForAsync` + `requireGoodQuality`) and `docs/requirements/Component-InboundAPI.md` (`Route.To(...).WaitForAttribute`) — brief, only if those docs enumerate the script surface
|
||||
- (No new component, no migration, no docker config change)
|
||||
|
||||
**Steps:**
|
||||
|
||||
1. Update the spec doc + component docs as above.
|
||||
2. **Full-solution build:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` — 0 errors.
|
||||
3. **Targeted test sweep** across everything touched:
|
||||
`dotnet test tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/... --filter "FullyQualifiedName~WaitForAttribute|FullyQualifiedName~WaitAsync|FullyQualifiedName~WaitForAsync|FullyQualifiedName~DeploymentManagerActor"`,
|
||||
`dotnet test tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/... --filter "FullyQualifiedName~RouteHelper"`,
|
||||
and a build of `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` + `tests/ZB.MOM.WW.ScadaBridge.Communication.Tests` to confirm the interface addition causes no compile break or regression.
|
||||
4. `git diff` review; commit (pathspec).
|
||||
|
||||
---
|
||||
|
||||
## Out of scope (explicit)
|
||||
|
||||
- Routed `WaitForAttribute` is NOT wired into the CentralUI Test-Run sandbox (`ISandboxInstanceGateway`/`SandboxInstanceGateway`); production inbound scripts get it. Follow-up if Test-Run parity is wanted.
|
||||
- No predicate or quality flag across the wire (§6 is value-equality only, per spec).
|
||||
- No docker redeploy (no cluster-runtime config change; additive script surface only).
|
||||
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-06-17-waitfor-deferred-items.md",
|
||||
"tasks": [
|
||||
{"id": 1, "subject": "WD-1: site-local WaitForAsync + WaitResult + quality-gated mode (§3+§4.2)", "classification": "high-risk", "status": "pending", "parallelizableWith": [2]},
|
||||
{"id": 2, "subject": "WD-2a: routed contract + central path (§6 part 1)", "classification": "high-risk", "status": "pending", "parallelizableWith": [1]},
|
||||
{"id": 3, "subject": "WD-2b: site unpacking + DeploymentManager handler (§6 part 2)", "classification": "high-risk", "status": "pending", "blockedBy": [2]},
|
||||
{"id": 4, "subject": "WD-3: integration — docs + full verification", "classification": "standard", "status": "pending", "blockedBy": [1, 2, 3]}
|
||||
],
|
||||
"lastUpdated": "2026-06-17"
|
||||
}
|
||||
@@ -158,16 +158,32 @@ is per-run and flat — `WHERE ExecutionId = X` returns everything one run did,
|
||||
nothing links a run to the run that *spawned* it. `ParentExecutionId` carries the
|
||||
spawning execution's `ExecutionId`: a spawned run still gets its own fresh
|
||||
`ExecutionId`, and every audit row it emits also carries the spawner's id in
|
||||
`ParentExecutionId`. The first cut bridges the **inbound API → routed-site-script**
|
||||
case: an inbound request runs a method script that calls `Route.Call`, routing to
|
||||
a site instance; the routed site script records the inbound request's
|
||||
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
|
||||
itself is top-level (`ParentExecutionId` NULL). The pointer always references the
|
||||
*immediate* spawner, so a routed run that itself routes onward threads its own
|
||||
`ExecutionId` — walking `ParentExecutionId → ExecutionId` recursively
|
||||
reconstructs the call chain as a tree of arbitrary depth. The tag-cascade case
|
||||
(an attribute write triggering another script) is **deferred** — the model
|
||||
generalises to it with no schema change once that spawn point is threaded.
|
||||
`ParentExecutionId`. The pointer always references the *immediate* spawner, so a
|
||||
run that itself spawns further runs threads its own `ExecutionId` — walking
|
||||
`ParentExecutionId → ExecutionId` recursively reconstructs the call chain as a
|
||||
tree of arbitrary depth.
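
The recursive walk can be sketched in memory as follows. This is an illustration under assumptions, not the production query: `ExecutionTreeSketch`, `Run`, and `Walk` are hypothetical names, and the input is assumed to be one `(ExecutionId, ParentExecutionId)` pair per run, already projected from the audit rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionTreeSketch
{
    // Hypothetical projection of audit rows: one pair per run.
    public record Run(Guid ExecutionId, Guid? ParentExecutionId);

    // Returns every run reachable from rootId, paired with its depth, in depth-first order.
    public static List<(Guid Id, int Depth)> Walk(IEnumerable<Run> runs, Guid rootId)
    {
        // Index children by the ExecutionId of their immediate spawner.
        var children = runs
            .Where(r => r.ParentExecutionId.HasValue)
            .ToLookup(r => r.ParentExecutionId!.Value, r => r.ExecutionId);

        var result = new List<(Guid Id, int Depth)>();
        var visited = new HashSet<Guid>();             // guards against malformed cyclic data
        var stack = new Stack<(Guid Id, int Depth)>();
        stack.Push((rootId, 0));

        while (stack.Count > 0)
        {
            var (id, depth) = stack.Pop();
            if (!visited.Add(id)) continue;
            result.Add((id, depth));
            foreach (var child in children[id].Reverse())
                stack.Push((child, depth + 1));
        }
        return result;
    }
}
```

Starting from a top-level run (one whose `ParentExecutionId` is NULL), the result lists the whole spawn tree; runs that belong to other trees are never visited.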
|
||||
|
||||
**Tag-cascade coverage (M5.4 T4):** `ParentExecutionId` threading now spans all
|
||||
known spawn points:
|
||||
|
||||
- **Inbound API → routed site script** — an inbound request runs a method script
|
||||
that calls `Route.Call`; the routed site script records the inbound request's
|
||||
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
|
||||
is top-level (`ParentExecutionId` NULL).
|
||||
- **Alarm-triggered on-trigger script** — when an alarm fires and its on-trigger
|
||||
script runs (via `AlarmActor → AlarmExecutionActor`), the alarm context's
|
||||
`ExecutionId` is carried as the run's `ParentExecutionId`. Currently the alarm
|
||||
subsystem has no Guid-typed firing id, so on-trigger runs are roots (NULL) in
|
||||
practice, but the wiring is in place for a future alarm `ExecutionId`.
|
||||
- **Nested `CallScript` / `CallShared` invocations** — when a script calls
|
||||
`Instance.CallScript(...)` or a shared script via `CallShared`, the calling
|
||||
execution's `ExecutionId` threads into the spawned run as its
|
||||
`ParentExecutionId`, making deeply nested call chains visible as a tree.
|
||||
|
||||
Attribute-write-triggered cascades (one tag change triggering another script via a
|
||||
tag subscription) are also wired: trigger-driven runs carry `ParentExecutionId =
|
||||
NULL` (top-level roots), and any nested `CallScript`/`CallShared` they perform
|
||||
chains as above. The schema is unchanged — no further tag-cascade work is deferred.
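
The recursive `ParentExecutionId → ExecutionId` walk described above can be sketched in memory as follows. This is a minimal illustration, not the shipped query path: the `ExecutionNode` record and the `ExecutionTree` helper are hypothetical names introduced here, and the real walk runs server-side against `IX_AuditLog_ParentExecution`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of audit rows: one entry per execution.
public sealed record ExecutionNode(Guid ExecutionId, Guid? ParentExecutionId);

public static class ExecutionTree
{
    // Walks ParentExecutionId upward from any execution until a node with no
    // known parent is reached; that execution is the root of the chain.
    public static Guid FindRoot(IReadOnlyDictionary<Guid, ExecutionNode> byId, Guid start)
    {
        var current = start;
        var seen = new HashSet<Guid>(); // guards against accidental cycles
        while (seen.Add(current)
               && byId.TryGetValue(current, out var node)
               && node.ParentExecutionId is Guid parent
               && byId.ContainsKey(parent))
        {
            current = parent;
        }
        return current;
    }

    // Groups executions by their immediate spawner.
    public static ILookup<Guid, Guid> ChildrenByParent(IEnumerable<ExecutionNode> nodes) =>
        nodes.Where(n => n.ParentExecutionId.HasValue)
             .ToLookup(n => n.ParentExecutionId!.Value, n => n.ExecutionId);

    // Depth-first traversal downward from the root, yielding (id, depth).
    public static IEnumerable<(Guid Id, int Depth)> Walk(ILookup<Guid, Guid> children, Guid root)
    {
        var stack = new Stack<(Guid Id, int Depth)>();
        var seen = new HashSet<Guid>();
        stack.Push((root, 0));
        while (stack.Count > 0)
        {
            var (id, depth) = stack.Pop();
            if (!seen.Add(id)) continue;
            yield return (id, depth);
            foreach (var child in children[id])
                stack.Push((child, depth + 1));
        }
    }
}
```

Because every spawned run stores only its immediate spawner, the same two steps (climb to the root, then traverse downward) work for a chain of any depth.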
|
||||
|
||||
## The Site-Local `AuditLog` (SQLite)
|
||||
|
||||
@@ -268,7 +284,34 @@ operational `SiteCalls` shape for the dispatcher and UI.
|
||||
|
||||
- **Default cap** — 8 KB for each of `RequestSummary` and `ResponseSummary`;
|
||||
raised to 64 KB on any error row (`Status IN ('Failed', 'Parked', 'Discarded')`).
|
||||
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and `ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB (configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min 8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to other channels do not apply here. `PayloadTruncated = 1` is set only when the inbound ceiling is hit — verbatim capture is the normal case. The ceiling applies independently to each body. Header redaction and per-target body redactors still run before persistence.
|
||||
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and
|
||||
`ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB
|
||||
(configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min
|
||||
8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to
|
||||
other channels do not apply here. `PayloadTruncated = 1` is set only when the
|
||||
inbound ceiling is hit — verbatim capture is the normal case. The ceiling
|
||||
applies independently to each body. Header redaction and per-target body
|
||||
redactors still run before persistence.
|
||||
- **Inbound ceiling hits (M5.3 T7).** Every time the `InboundMaxBytes` ceiling
|
||||
truncates a body, an `IAuditInboundCeilingHitsCounter.Increment()` call fires.
|
||||
This counter is surfaced as `AuditInboundCeilingHits` on the central health
|
||||
snapshot (alongside `CentralAuditWriteFailures` / `AuditRedactionFailure`) so
|
||||
operators can detect persistently oversized payloads and raise the ceiling or
|
||||
add per-target body redactors.
|
||||
- **Request headers in `Extra` (M5.3 T7).** For `Channel = ApiInbound`, the
|
||||
`AuditWriteMiddleware` captures the inbound HTTP request headers (post-redaction
|
||||
— `Authorization`, `X-API-Key`, `Cookie`, `Set-Cookie`, and the configured
|
||||
`HeaderRedactList` are scrubbed before serialization) into the `Extra` JSON
|
||||
column under the key `"requestHeaders"`. This makes the full header envelope
|
||||
visible in the Audit Log UI's detail drawer and the CLI's `audit query` output
|
||||
without widening the schema.
|
||||
- **Per-method `SkipBodyCapture` (M5.3 T7).** `PerTargetOverrides` now includes
|
||||
a `SkipBodyCapture: true` flag. When set for an inbound API method, the audit
|
||||
row is always emitted (headers, status, duration, actor, etc. are recorded) but
|
||||
`RequestSummary` and `ResponseSummary` are left null. Use this for methods whose
|
||||
payloads are structurally large or contain secrets not covered by body redactors.
|
||||
Headers are still captured into `Extra.requestHeaders` (after redaction) even
|
||||
when `SkipBodyCapture` is true.
|
||||
- **Truncation** — UTF-8 byte-safe; `PayloadTruncated = 1` when applied. Full
|
||||
bodies are never stored.
|
||||
- **HTTP headers** — `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`, and
|
||||
@@ -311,16 +354,33 @@ MS SQL for direct-write events). Unredacted secrets never persist.
|
||||
## Retention & Purge
|
||||
|
||||
- **Central:** 365-day default based on `OccurredAtUtc`, configurable via
|
||||
`AuditLog:RetentionDays` (min 7, max 3650). Single global retention in v1 —
|
||||
no per-channel overrides.
|
||||
`AuditLog:RetentionDays` (min 30, max 3650).
|
||||
- **Partitioning:** monthly partitions on `OccurredAtUtc` from day one
|
||||
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). Purge is a partition switch;
|
||||
there are no row-level deletes at central.
|
||||
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). The global partition switch is
|
||||
channel-blind; it drops a whole month once every row in it is older than the
|
||||
global window. There are no row-level deletes at central for the global purge.
|
||||
- **Purge actor:** `AuditLogPurgeActor` singleton on the active central node
|
||||
runs daily, switches out any partition whose latest `OccurredAtUtc` is older
|
||||
than the retention window, and emits an `AuditLog:Purged` event (partition
|
||||
range, rowcount, duration). A partition-maintenance step rolls forward each
|
||||
month, creating the next month's partition ahead of time.
|
||||
than the retention window, then applies any per-channel overrides (see below),
|
||||
and emits an `AuditLog:Purged` event (partition range, rowcount, duration) per
|
||||
switched partition. A partition-maintenance step rolls forward each month,
|
||||
creating the next month's partition ahead of time.
|
||||
- **Per-channel retention overrides (M5.5 T3):** `AuditLog:PerChannelRetentionDays`
|
||||
is a dictionary keyed by canonical channel name (`ApiOutbound`, `DbOutbound`,
|
||||
`Notification`, `ApiInbound`) whose value is a retention window in days that
|
||||
must not exceed the global `RetentionDays`. After the daily
|
||||
partition switch-out, the purge actor runs a bounded, batched row DELETE
|
||||
(`PurgeChannelOlderThanAsync`) for each channel whose override is shorter than
|
||||
the global window — expiring rows of that channel earlier than the global
|
||||
partition switch would. Overrides equal to the global window are skipped (the
global switch already covers them); longer ones fail validation. The DELETE runs under
|
||||
`scadabridge_audit_purger` (the maintenance role); the append-only writer role
|
||||
is unaffected. Batch size is configurable via
|
||||
`AuditLogPurge:ChannelPurgeBatchSize` (default 5000). Each channel override
|
||||
runs in its own try/catch, mirroring the per-boundary error-isolation of the
|
||||
partition switch-out loop. Values are validated to be in
|
||||
`[30, RetentionDays]`; keys that are not a recognized `AuditChannel` enum name
|
||||
are rejected at startup.
|
||||
- **Sites:** daily site job; default 7-day retention (configurable, min 1,
|
||||
max 90). Respects the hard `ForwardState` invariant — `Pending` rows are
|
||||
never purged on age alone.
|
||||
@@ -340,10 +400,13 @@ MS SQL for direct-write events). Unredacted secrets never persist.
|
||||
**AuditExport** permission.
|
||||
- **Payload redaction at write.** See Payload Capture Policy. Unredacted
|
||||
secrets never persist; the safety net over-redacts on misconfiguration.
|
||||
- **Hash-chain tamper evidence — deferred to v1.x.** A future `RowHash` column,
|
||||
computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will be
|
||||
verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. Off by
|
||||
default in v1.
|
||||
- **Hash-chain tamper evidence (T1) — deferred to v1.x.** A future `RowHash`
|
||||
column, computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will
|
||||
be verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. The
|
||||
`verify-chain` CLI command is a no-op placeholder today. Off by default in v1.
|
||||
- **Parquet archival (T2) — deferred to v1.x.** Long-term cold storage of purged
|
||||
monthly partitions as Parquet files (suitable for offline analytics) will be
|
||||
added in a future milestone. T1 and T2 are not shipped as part of M5.
|
||||
- **Site SQLite security.** File permissions: read/write by the ScadaBridge
|
||||
service account only. Not backed up off-machine — site SQLite is a buffer,
|
||||
not a record.
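
The deferred hash chain, `SHA-256(prev.RowHash || canonical(row))`, could look roughly like the sketch below. The feature has not shipped, so everything here is illustrative: the `RowHashChain` helper, the all-zero genesis value for the first row of a partition, and the UTF-8 encoding of the canonical row are assumptions, and the canonical form itself is left as an opaque string because the source does not define it.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class RowHashChain
{
    // Assumption: the first row of a partition chains from 32 zero bytes.
    private static readonly byte[] Genesis = new byte[32];

    // RowHash = SHA-256(prev.RowHash || canonical(row)), where || is byte concatenation.
    public static byte[] Next(byte[] prevHash, string canonicalRow)
    {
        var rowBytes = Encoding.UTF8.GetBytes(canonicalRow);
        var buffer = new byte[prevHash.Length + rowBytes.Length];
        Buffer.BlockCopy(prevHash, 0, buffer, 0, prevHash.Length);
        Buffer.BlockCopy(rowBytes, 0, buffer, prevHash.Length, rowBytes.Length);
        return SHA256.HashData(buffer);
    }

    // Computes the chained hash for every row of one partition, in order.
    public static List<byte[]> Build(IEnumerable<string> canonicalRows)
    {
        var hashes = new List<byte[]>();
        var prev = Genesis;
        foreach (var row in canonicalRows)
        {
            prev = Next(prev, row);
            hashes.Add(prev);
        }
        return hashes;
    }

    // Returns the index of the first row whose stored hash does not match
    // the recomputed chain, or -1 when the chain is intact.
    public static int FirstBrokenIndex(IReadOnlyList<string> canonicalRows, IReadOnlyList<byte[]> storedHashes)
    {
        var prev = Genesis;
        for (var i = 0; i < canonicalRows.Count; i++)
        {
            var expected = Next(prev, canonicalRows[i]);
            if (i >= storedHashes.Count || !CryptographicOperations.FixedTimeEquals(expected, storedHashes[i]))
                return i;
            prev = expected;
        }
        return -1;
    }
}
```

Since each hash folds in its predecessor, altering one row invalidates that row's stored hash, and an attacker who recomputes it must also rewrite every later hash in the partition. This is the property that would let an offline verifier detect tampering.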
|
||||
@@ -355,11 +418,22 @@ Point-in-time, computed from the central `AuditLog` table; global and per-site.
|
||||
- **Audit volume** — events/min landing in the central `AuditLog`; global plus per-site sparkline.
|
||||
- **Audit error rate** — % of central `AuditLog` rows with `Status IN ('Failed', 'Parked', 'Discarded')` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, permanent failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
|
||||
- **Audit backlog** — sum of `Pending` site rows across sites; click drills into a per-site breakdown.
|
||||
- **`AuditInboundCeilingHits`** (M5.3 T7): cumulative count, since process start, of inbound API request and response bodies truncated by the `InboundMaxBytes` ceiling; surfaced on the central health snapshot alongside `CentralAuditWriteFailures`.
|
||||
|
||||
**Per-node stuck KPIs (M5.3 T6):** Both [Notification Outbox](Component-NotificationOutbox.md)
|
||||
and [Site Call Audit](Component-SiteCallAudit.md) now expose a
|
||||
`PerNodeNotificationKpiRequest` / `PerNodeSiteCallKpiRequest` message pair that
|
||||
groups the existing stuck, parked, and delivered-last-interval counts by the
|
||||
`SourceNode` that emitted the original row. This surfaces per-node breakdowns on
|
||||
the Health dashboard tiles and the Notification Outbox / Site Calls pages,
|
||||
making it possible to identify a single misbehaving node (e.g., `site-a:node-b`)
|
||||
as the source of a spike rather than a site-wide problem. The existing global and
|
||||
per-site KPI shapes are unchanged; the per-node slice is additive.
|
||||
|
||||
[Notification Outbox](Component-NotificationOutbox.md) and
|
||||
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected — they remain
|
||||
sourced from `Notifications` and `SiteCalls` respectively. Audit Log KPIs
|
||||
describe the audit table itself.
|
||||
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected for their
|
||||
operational dispatch responsibilities — they remain sourced from `Notifications`
|
||||
and `SiteCalls` respectively. Audit Log KPIs describe the audit table itself.
|
||||
|
||||
## Configuration
|
||||
|
||||
@@ -370,21 +444,78 @@ component (Options pattern):
|
||||
"AuditLog": {
|
||||
"DefaultCapBytes": 8192,
|
||||
"ErrorCapBytes": 65536,
|
||||
"InboundMaxBytes": 1048576,
|
||||
"HeaderRedactList": [ "Authorization", "Cookie", "Set-Cookie", "X-API-Key" ],
|
||||
"GlobalBodyRedactors": [
|
||||
{ "Pattern": "\"password\"\\s*:\\s*\"[^\"]+\"", "Replacement": "\"password\":\"<redacted>\"" }
|
||||
],
|
||||
"PerTargetOverrides": {
|
||||
"Weather/GetForecast": { "CapBytes": 4096 },
|
||||
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" }
|
||||
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" },
|
||||
"HighVolumeMethod": { "SkipBodyCapture": true }
|
||||
},
|
||||
"RetentionDays": 365
|
||||
"RetentionDays": 365,
|
||||
"PerChannelRetentionDays": {
|
||||
"ApiOutbound": 90,
|
||||
"Notification": 180
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`PerTargetOverrides` keys bind by External System / Inbound Method /
|
||||
Notification List / Database Connection name. `RetentionDays` is a single
|
||||
global value in v1; per-channel overrides are deferred to v1.x.
|
||||
Notification List / Database Connection name. `SkipBodyCapture: true` omits
|
||||
`RequestSummary`/`ResponseSummary` for that method while still capturing headers
|
||||
into `Extra.requestHeaders` and emitting the full audit row. `RetentionDays` is
|
||||
the global window; `PerChannelRetentionDays` specifies per-channel windows that
must not exceed it. An override equal to the global value is accepted but has no
effect (the global partition switch-out already governs it), and a longer value
is rejected at startup by `AuditLogOptionsValidator`.
|
||||
|
||||
`AuditLogPurge` section controls the purge actor cadence and batch size:
|
||||
|
||||
```jsonc
|
||||
"AuditLogPurge": {
|
||||
"IntervalHours": 24,
|
||||
"ChannelPurgeBatchSize": 5000
|
||||
}
|
||||
```
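
To show how `ChannelPurgeBatchSize` bounds each delete, here is a small in-memory sketch of a batched purge loop. It is not the repository implementation: the `AuditRow` record and `ChannelPurgeSketch` helper are hypothetical, the list stands in for the central table, and treating "older than" as a strict `<` comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a central AuditLog row.
public sealed record AuditRow(string Channel, DateTime OccurredAtUtc);

public static class ChannelPurgeSketch
{
    // Mimics a repeated `DELETE TOP (@batch)`: removes at most batchSize
    // matching rows per pass and stops once a pass comes up short.
    public static int PurgeChannelOlderThan(
        List<AuditRow> table, string channel, DateTime thresholdUtc, int batchSize)
    {
        batchSize = Math.Max(1, batchSize); // same floor as ChannelPurgeBatchSize
        var total = 0;
        while (true)
        {
            var batch = table
                .Where(r => r.Channel == channel && r.OccurredAtUtc < thresholdUtc)
                .Take(batchSize)
                .ToList();
            foreach (var row in batch)
                table.Remove(row);
            total += batch.Count;
            if (batch.Count < batchSize)
                return total;
        }
    }
}
```

Capping each pass keeps individual statements short, and the loop still drains an arbitrarily large backlog within one tick. The floor of 1 keeps a zero or negative setting from producing empty passes.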
|
||||
|
||||
## Ops Notes — Historical Null Columns
|
||||
|
||||
### `SourceNode` backfill (M5.6 T5)
|
||||
|
||||
`SourceNode` (`varchar(64)` NULL) is a physical column stamped on every row at
|
||||
write time. Rows ingested before M5.6 shipped have `SourceNode IS NULL` because
|
||||
the value was not populated until the feature landed. A one-time CLI command sets
|
||||
these to a configurable sentinel:
|
||||
|
||||
```
|
||||
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel unknown] [--batch 5000]
|
||||
```
|
||||
|
||||
The default sentinel is `"unknown"`. The true node-of-origin for pre-feature rows
|
||||
is **unknowable** retroactively — the emitting node is long gone from the telemetry
|
||||
pipeline. The sentinel makes that explicit rather than leaving the column NULL
|
||||
(which the Audit Log UI's Node filter already treats as "unresolved", but which
|
||||
an operator might mistake for a data-quality bug).
|
||||
|
||||
The backfill runs via `POST /api/audit/backfill-source-node` (Admin role required)
|
||||
on the maintenance/purge path, NOT the append-only `scadabridge_audit_writer` role.
|
||||
It is idempotent and can be re-run safely.
|
||||
|
||||
### `ExecutionId` and `ParentExecutionId` — cannot be backfilled
|
||||
|
||||
`ExecutionId` and `ParentExecutionId` are **PERSISTED COMPUTED columns** derived
|
||||
from `DetailsJson`. They were introduced in the same feature window as the column
|
||||
itself but their value comes from the JSON payload that was written at ingest time.
|
||||
|
||||
The AuditLog append-only invariant **forbids mutating `DetailsJson`** — rows may
|
||||
only be inserted, never updated. Because backfilling the computed values would
|
||||
require rewriting the underlying `DetailsJson`, it is impossible under the
|
||||
append-only contract. Pre-feature rows carry `NULL` in both columns permanently.
|
||||
|
||||
This is a documented limitation, not a defect. The NULL values are visible in the
|
||||
Audit Log UI's execution-tree drilldown (rows with no `ExecutionId` appear as
|
||||
orphaned entries) and in the CLI's `audit tree` output.
|
||||
|
||||
## Dependencies
|
||||
|
||||
@@ -442,6 +573,8 @@ global value in v1; per-channel overrides are deferred to v1.x.
|
||||
tiles (Volume, Error rate, Backlog) plus new health metrics:
|
||||
`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`,
|
||||
`CentralAuditWriteFailures`, `AuditRedactionFailure`.
|
||||
- **[CLI (#19)](Component-CLI.md)** — new `scadabridge audit query`,
|
||||
`scadabridge audit export`, and `scadabridge audit verify-chain` commands; same
|
||||
permission requirements as the UI.
|
||||
- **[CLI (#19)](Component-CLI.md)** — `scadabridge audit query`,
|
||||
`scadabridge audit export`, `scadabridge audit tree --execution-id <guid>`,
|
||||
`scadabridge audit backfill-source-node --sentinel <s> --before <date>`, and
|
||||
`scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain
|
||||
feature); same permission requirements as the UI.
|
||||
|
||||
@@ -228,14 +228,17 @@ The new centralized Audit Log component (#23) is exposed via the `scadabridge au
|
||||
The `scadabridge audit` group targets the centralized Audit Log component (#23) and
|
||||
exposes the UI-equivalent operational audit surface. Permissions follow the same
|
||||
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
|
||||
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
|
||||
require the `OperationalAudit` permission; `audit export` additionally requires
|
||||
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
|
||||
exit code 2) on denial.
|
||||
Tamper-Evidence, and Security & Auth #10): `audit query`, `audit tree`, and
|
||||
`audit verify-chain` require the `OperationalAudit` permission; `audit export`
|
||||
additionally requires `AuditExport`; `audit backfill-source-node` requires the
|
||||
`Admin` role (maintenance path only). The server enforces permission checks and
|
||||
returns HTTP 403 (CLI exit code 2) on denial.
|
||||
|
||||
```
|
||||
scadabridge audit query [--since <t>] [--until <t>] [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>] [--correlation-id <id>] [--execution-id <id>] [--parent-execution-id <id>] [--errors-only] [--page-size <n>] [--all]
|
||||
scadabridge audit export --since <t> --until <t> --format csv|jsonl|parquet --output <path> [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>]
|
||||
scadabridge audit tree --execution-id <guid> [--format table|json]
|
||||
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
|
||||
scadabridge audit verify-chain --month <YYYY-MM>
|
||||
```
|
||||
|
||||
@@ -247,6 +250,18 @@ scadabridge audit verify-chain --month <YYYY-MM>
|
||||
requested format (`csv`, `jsonl`, `parquet`) written to `--output`. The server
|
||||
streams rows rather than materializing them in memory; the CLI writes bytes
|
||||
through to disk. Supports the same scoping filters as `audit query`.
|
||||
- `audit tree --execution-id <guid>` (M5.3 T8) — renders the full execution-chain
|
||||
tree for the given `ExecutionId`. The server resolves the root from any node in
|
||||
the chain (walks `ParentExecutionId` to find the root, then traverses downward)
|
||||
and returns all reachable executions with their summary row counts and first/last
|
||||
occurred timestamps. Output format: `json` (default — structured tree suitable
|
||||
for scripting) or `table` (human-readable indented tree). Requires
|
||||
`OperationalAudit` permission. Backed by `GET /api/audit/tree?executionId=<guid>`.
|
||||
- `audit backfill-source-node --before <ISO-8601-UTC>` (M5.6 T5) — sets
|
||||
`SourceNode` to a sentinel value (`--sentinel`, default `"unknown"`) on pre-feature
|
||||
rows where `SourceNode IS NULL` and `OccurredAtUtc < --before`, in batches
|
||||
(`--batch`, default 5000). Admin-only maintenance command. Idempotent.
|
||||
Backed by `POST /api/audit/backfill-source-node`.
|
||||
- `audit verify-chain` — hash-chain verification for the named month.
|
||||
**No-op in v1**: the command is defined so the command tree is stable, but
|
||||
verification only becomes meaningful once the hash-chain ships (see
|
||||
@@ -366,7 +381,7 @@ Configuration is resolved in the following priority order (highest wins):
|
||||
- **System.CommandLine**: Command-line argument parsing.
|
||||
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
|
||||
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadabridge audit` command group rides a parallel REST surface on the same Host (`GET /api/audit/query` and `GET /api/audit/export`), sharing HTTP Basic Auth with `/management` but bypassing the actor for read-only, keyset-paged / streaming workloads.
|
||||
- **Audit Log (#23)**: The `scadabridge audit query` and `audit export` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) on the Host's Management API surface; `audit verify-chain` rides `POST /management` until hash-chain verification ships. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side by `AuditEndpoints`.
|
||||
- **Audit Log (#23)**: The `scadabridge audit query`, `audit export`, `audit tree`, and `audit backfill-source-node` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`, `GET /api/audit/tree`, `POST /api/audit/backfill-source-node`) on the Host's Management API surface; `audit verify-chain` is a client-side no-op today (hash-chain deferred to v1.x). Permission checks (`OperationalAudit`, `AuditExport`, `Admin`) are enforced server-side by `AuditEndpoints`.
|
||||
|
||||
## Interactions
|
||||
|
||||
|
||||
@@ -189,6 +189,7 @@ Inbound API scripts **cannot** call shared scripts directly — shared scripts a
|
||||
- `Route.To("instanceUniqueCode").GetAttributes("attr1", "attr2", ...)` — Read multiple attribute values in a **single call**, returned as a dictionary of name-value pairs.
|
||||
- `Route.To("instanceUniqueCode").SetAttribute("attributeName", value)` — Write a single attribute value on a specific instance at any site.
|
||||
- `Route.To("instanceUniqueCode").SetAttributes(dictionary)` — Write multiple attribute values in a **single call**, accepting a dictionary of name-value pairs.
|
||||
- `Route.To("instanceUniqueCode").WaitForAttribute("attributeName", targetValue, timeout)` — Wait, event-driven, until an attribute on a specific instance at any site reaches `targetValue` (value-equality only across the wire), bounded by `timeout`. Returns `true` if matched within the timeout, `false` if it timed out. The cluster call is bounded by the wait timeout rather than the generic integration timeout.
|
||||
|
||||
#### Input/Output
|
||||
- **Input parameters** are available as defined in the method definition.
|
||||
|
||||
@@ -39,10 +39,12 @@ namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
|
||||
public sealed class AuditCentralHealthSnapshot
|
||||
: IAuditCentralHealthSnapshot,
|
||||
ICentralAuditWriteFailureCounter,
|
||||
IAuditRedactionFailureCounter
|
||||
IAuditRedactionFailureCounter,
|
||||
IAuditInboundCeilingHitsCounter
|
||||
{
|
||||
private int _centralAuditWriteFailures;
|
||||
private int _auditRedactionFailure;
|
||||
private int _auditInboundCeilingHits;
|
||||
private readonly ConcurrentDictionary<string, bool> _stalled = new();
|
||||
|
||||
/// <inheritdoc/>
|
||||
@@ -53,6 +55,10 @@ public sealed class AuditCentralHealthSnapshot
|
||||
public int AuditRedactionFailure =>
|
||||
Interlocked.CompareExchange(ref _auditRedactionFailure, 0, 0);
|
||||
|
||||
/// <inheritdoc/>
|
||||
public int AuditInboundCeilingHits =>
|
||||
Interlocked.CompareExchange(ref _auditInboundCeilingHits, 0, 0);
|
||||
|
||||
/// <inheritdoc/>
|
||||
public IReadOnlyDictionary<string, bool> SiteAuditTelemetryStalled =>
|
||||
new Dictionary<string, bool>(_stalled);
|
||||
@@ -78,4 +84,8 @@ public sealed class AuditCentralHealthSnapshot
|
||||
/// <inheritdoc/>
|
||||
void IAuditRedactionFailureCounter.Increment() =>
|
||||
Interlocked.Increment(ref _auditRedactionFailure);
|
||||
|
||||
/// <inheritdoc/>
|
||||
void IAuditInboundCeilingHitsCounter.Increment() =>
|
||||
Interlocked.Increment(ref _auditInboundCeilingHits);
|
||||
}
|
||||
|
||||
@@ -167,6 +167,9 @@ public class AuditLogPurgeActor : ReceiveActor
|
||||
|
||||
if (boundaries.Count == 0)
|
||||
{
|
||||
// No whole-month partitions are eligible, but per-channel overrides may
|
||||
// still expire rows earlier than the global window — run them below.
|
||||
await RunPerChannelOverridesAsync(repository).ConfigureAwait(false);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -202,6 +205,80 @@ public class AuditLogPurgeActor : ReceiveActor
|
||||
sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
|
||||
// M5.5 (T3): after the channel-blind global partition switch-out, apply any
|
||||
// per-channel retention overrides that are SHORTER than the global window via
|
||||
// a bounded, batched row DELETE on the same maintenance path. The global
|
||||
// switch-out has already dropped whole months older than RetentionDays; these
|
||||
// deletes only ever expire rows EARLIER than that, so they run last and are a
|
||||
// strict tightening.
|
||||
await RunPerChannelOverridesAsync(repository).ConfigureAwait(false);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// M5.5 (T3): runs each per-channel retention override whose window is strictly
|
||||
/// shorter than the global <see cref="AuditLogOptions.RetentionDays"/>, deleting
|
||||
/// rows of that channel older than the channel-specific threshold via a bounded,
|
||||
/// batched maintenance-path DELETE. Each channel runs inside its own try/catch so
|
||||
/// one bad channel does not abandon the others on the same tick, mirroring the
|
||||
/// per-boundary error isolation of the partition switch-out loop.
|
||||
/// </summary>
|
||||
/// <param name="repository">The repository resolved for this tick's DI scope.</param>
|
||||
private async Task RunPerChannelOverridesAsync(IAuditLogRepository repository)
|
||||
{
|
||||
var overrides = _auditOptions.PerChannelRetentionDays;
|
||||
if (overrides is null || overrides.Count == 0)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
var globalDays = _auditOptions.RetentionDays;
|
||||
|
||||
foreach (var (channel, days) in overrides)
|
||||
{
|
||||
// Only act when the per-channel window is strictly shorter than the global
|
||||
// one. Equal/longer windows are already covered by the global partition
|
||||
// switch-out, so a row DELETE would be redundant work (and a longer window
|
||||
// is meaningless — the partition is dropped on the global schedule).
|
||||
if (days >= globalDays)
|
||||
{
|
||||
continue;
|
||||
}
|
||||
|
||||
var channelThreshold = DateTime.UtcNow - TimeSpan.FromDays(days);
|
||||
var sw = Stopwatch.StartNew();
|
||||
try
|
||||
{
|
||||
var rowsDeleted = await repository
|
||||
.PurgeChannelOlderThanAsync(channel, channelThreshold, _purgeOptions.ChannelPurgeBatchSize)
|
||||
.ConfigureAwait(false);
|
||||
sw.Stop();
|
||||
|
||||
if (rowsDeleted > 0)
|
||||
{
|
||||
_logger.LogInformation(
|
||||
"Purged {RowsDeleted} AuditLog rows for channel {Channel} older than {Threshold:o} " +
|
||||
"(per-channel override {Days}d < global {GlobalDays}d) in {DurationMs} ms.",
|
||||
rowsDeleted,
|
||||
channel,
|
||||
channelThreshold,
|
||||
days,
|
||||
globalDays,
|
||||
sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
sw.Stop();
|
||||
_logger.LogError(
|
||||
ex,
|
||||
"Failed to apply per-channel retention override for channel {Channel} " +
|
||||
"({Days}d); other channels continue. Elapsed {DurationMs} ms.",
|
||||
channel,
|
||||
days,
|
||||
sw.ElapsedMilliseconds);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>Self-tick triggering a purge pass across all eligible partitions.</summary>
|
||||
|
||||
@@ -28,6 +28,24 @@ public sealed class AuditLogPurgeOptions
|
||||
/// <summary>Period of the purge tick in hours (default 24).</summary>
|
||||
public int IntervalHours { get; set; } = 24;
|
||||
|
||||
/// <summary>
|
||||
/// M5.5 (T3): batch size for the per-channel retention-override row DELETE
|
||||
/// (<see cref="ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories.IAuditLogRepository.PurgeChannelOlderThanAsync"/>).
|
||||
/// Each <c>DELETE TOP (@batch)</c> caps the transaction-log and lock footprint
|
||||
/// per statement; the repository loops batches until no rows remain. Default
|
||||
/// 5000 keeps individual deletes short on a busy central DB while still draining
|
||||
/// a large backlog within a tick. Clamped to a sane minimum in
|
||||
/// <see cref="ChannelPurgeBatchSize"/>.
|
||||
/// </summary>
|
||||
public int ChannelPurgeBatchSizeConfigured { get; set; } = 5000;
|
||||
|
||||
/// <summary>
|
||||
/// Resolves the effective per-channel purge batch size, clamped to at least 1 so
|
||||
/// a misconfigured <c>0</c>/negative value cannot make the repository's DELETE
|
||||
/// loop spin or throw.
|
||||
/// </summary>
|
||||
public int ChannelPurgeBatchSize => ChannelPurgeBatchSizeConfigured < 1 ? 1 : ChannelPurgeBatchSizeConfigured;
|
||||
|
||||
/// <summary>
|
||||
/// Test-only override for finer control over the tick cadence than
|
||||
/// whole-hour resolution allows. When non-null, takes precedence over
|
||||
|
||||
@@ -50,6 +50,17 @@ public interface IAuditCentralHealthSnapshot
|
||||
/// </summary>
|
||||
int AuditRedactionFailure { get; }
|
||||
|
||||
/// <summary>
|
||||
/// Count of inbound request/response body truncations at the
|
||||
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.AuditLogOptions.InboundMaxBytes"/>
|
||||
/// ceiling since process start. Incremented by
|
||||
/// <see cref="ZB.MOM.WW.ScadaBridge.InboundAPI.Middleware.AuditWriteMiddleware"/>
|
||||
/// whenever either the request or response body exceeds the cap and is
|
||||
/// truncated in the audit copy. A sustained non-zero count can indicate
|
||||
/// callers sending unexpectedly large bodies.
|
||||
/// </summary>
|
||||
int AuditInboundCeilingHits { get; }
|
||||
|
||||
/// <summary>
|
||||
/// Per-site latched stalled state: <c>true</c> when the
|
||||
/// <see cref="SiteAuditReconciliationActor"/> has observed two
|
||||
|
||||
@@ -0,0 +1,24 @@
|
||||
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
|
||||
|
||||
/// <summary>
|
||||
/// Audit Log (#23) M5.3 (T7) counter sink incremented by
|
||||
/// <see cref="ZB.MOM.WW.ScadaBridge.InboundAPI.Middleware.AuditWriteMiddleware"/>
|
||||
/// whenever an inbound request or response body is truncated at the
|
||||
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.AuditLogOptions.InboundMaxBytes"/>
|
||||
/// ceiling. Mirrors the <see cref="ICentralAuditWriteFailureCounter"/> shape:
|
||||
/// one-method, NoOp default, must-never-abort-the-user-facing-action invariant.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// A ceiling hit is a normal operational event (the caller sent a large
|
||||
/// body) rather than a failure, but surfacing a cumulative count lets
|
||||
/// operators detect over-size callers early. The
|
||||
/// <see cref="AuditCentralHealthSnapshot"/> production implementation
|
||||
/// accumulates the count via an <c>Interlocked</c> field alongside
|
||||
/// <see cref="ICentralAuditWriteFailureCounter"/> and
|
||||
/// <see cref="ZB.MOM.WW.ScadaBridge.AuditLog.Payload.IAuditRedactionFailureCounter"/>.
|
||||
/// </remarks>
|
||||
public interface IAuditInboundCeilingHitsCounter
|
||||
{
|
||||
/// <summary>Increment the inbound body-ceiling hit counter by one.</summary>
|
||||
void Increment();
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Central;
|
||||
|
||||
/// <summary>
|
||||
/// Default <see cref="IAuditInboundCeilingHitsCounter"/> binding used when
|
||||
/// the central health snapshot is not wired (e.g. site composition roots,
|
||||
/// test harnesses that have no health dashboard). All increments are silently
|
||||
/// dropped — correct for environments that have no audit KPI surface.
|
||||
/// </summary>
|
||||
public sealed class NoOpAuditInboundCeilingHitsCounter : IAuditInboundCeilingHitsCounter
|
||||
{
|
||||
/// <inheritdoc/>
|
||||
public void Increment() { }
|
||||
}
|
||||
@@ -37,6 +37,33 @@ public sealed class AuditLogOptions
|
||||
/// <summary>Central retention window in days (default 365, range [30, 3650]).</summary>
|
||||
public int RetentionDays { get; set; } = 365;
|
||||
|
||||
/// <summary>
|
||||
/// M5.5 (T3) per-channel retention overrides, keyed by the canonical channel name
|
||||
/// (the <see cref="AuditChannel"/> enum name — e.g. <c>ApiOutbound</c>,
|
||||
/// <c>DbOutbound</c>, <c>Notification</c>, <c>ApiInbound</c>). The value is a
|
||||
/// retention window in days that MUST be SHORTER than or equal to the global
|
||||
/// <see cref="RetentionDays"/>.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// The global <see cref="RetentionDays"/> window is enforced by month-partition
|
||||
/// switch-out, which is channel-blind: it can only drop a whole month once every
|
||||
/// row in it is older than the global window. A per-channel override therefore
|
||||
/// can only ever expire rows EARLIER than the global purge would — never later
|
||||
/// (a longer per-channel window is meaningless because the partition switch-out
|
||||
/// would already have dropped the month). Overrides shorter than the global window
|
||||
/// are honoured by the purge actor as a bounded, batched row DELETE on the
|
||||
/// maintenance path (see <c>AuditLogPurgeActor</c>); the append-only writer/ingest
|
||||
/// role is unaffected.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// Each value is validated to be in <c>[30, RetentionDays]</c> by
|
||||
/// <c>AuditLogOptionsValidator</c>; keys that are not recognized
|
||||
/// <see cref="AuditChannel"/> names are rejected.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
public Dictionary<string, int> PerChannelRetentionDays { get; set; } = new();
|
||||
|
||||
/// <summary>
|
||||
/// Per-body byte ceiling applied to <see cref="AuditEvent.RequestSummary"/> and
|
||||
/// <see cref="AuditEvent.ResponseSummary"/> for <see cref="AuditChannel.ApiInbound"/> rows
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
using ZB.MOM.WW.Configuration;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Configuration;
|
||||
|
||||
@@ -52,5 +53,27 @@ public sealed class AuditLogOptionsValidator : OptionsValidatorBase<AuditLogOpti
|
||||
!(options.InboundMaxBytes < MinInboundMaxBytes || options.InboundMaxBytes > MaxInboundMaxBytes),
|
||||
$"AuditLog:{nameof(AuditLogOptions.InboundMaxBytes)} ({options.InboundMaxBytes}) " +
|
||||
$"must be in [{MinInboundMaxBytes}, {MaxInboundMaxBytes}] bytes.");
|
||||
|
||||
// M5.5 (T3): per-channel retention overrides. Each entry must be keyed by a
|
||||
// recognized AuditChannel name and carry a window in [MinRetentionDays,
|
||||
// RetentionDays] — i.e. SHORTER than or equal to the global window. A longer
|
||||
// per-channel window is meaningless under month-partition switch-out (governed
|
||||
// by the global window), so it is rejected rather than silently ignored.
|
||||
foreach (var (channelKey, days) in options.PerChannelRetentionDays)
|
||||
{
|
||||
builder.RequireThat(
|
||||
Enum.TryParse<AuditChannel>(channelKey, ignoreCase: false, out _),
|
||||
$"AuditLog:{nameof(AuditLogOptions.PerChannelRetentionDays)} key '{channelKey}' " +
|
||||
$"is not a recognized channel name. Valid keys: {string.Join(", ", Enum.GetNames<AuditChannel>())}.");
|
||||
|
||||
// Valid when days is within [MinRetentionDays, RetentionDays] inclusive.
|
||||
// The lower bound matches the global RetentionDays floor; the upper bound
|
||||
// is the configured global window (longer is meaningless — see remarks).
|
||||
builder.RequireThat(
|
||||
!(days < MinRetentionDays || days > options.RetentionDays),
|
||||
$"AuditLog:{nameof(AuditLogOptions.PerChannelRetentionDays)}['{channelKey}'] ({days}) " +
|
||||
$"must be in [{MinRetentionDays}, {nameof(AuditLogOptions.RetentionDays)}={options.RetentionDays}] days " +
|
||||
"— a per-channel window must be shorter than or equal to the global retention window.");
|
||||
}
|
||||
}
|
||||
}
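The "shorter than or equal to the global window" rule can be illustrated with a small sketch of how an effective per-channel window might be resolved. `RetentionSketch`, `EffectiveRetentionDays`, and `PurgeCutoffUtc` are hypothetical helpers, not part of this commit; the clamp below only mirrors the rule for illustration (the validator above rejects a longer override instead of clamping it), and the strict "before the cutoff" comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class RetentionSketch
{
    // Returns the retention window (days) for one channel: the override when
    // one is configured, otherwise the global window. An override is never
    // allowed to exceed the global window.
    public static int EffectiveRetentionDays(
        string channel,
        int globalRetentionDays,
        IReadOnlyDictionary<string, int> perChannelRetentionDays)
    {
        return perChannelRetentionDays.TryGetValue(channel, out var days)
            ? Math.Min(days, globalRetentionDays)
            : globalRetentionDays;
    }

    // Assumed rule: rows that occurred before this instant are past retention.
    public static DateTime PurgeCutoffUtc(DateTime nowUtc, int retentionDays)
        => nowUtc.AddDays(-retentionDays);
}
```

With a global window of 365 days and an `ApiInbound` override of 90, inbound rows would become eligible for the batched DELETE after 90 days, while every other channel waits for the month-partition switch-out.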
|
||||
|
||||
@@ -25,4 +25,15 @@ public sealed class PerTargetRedactionOverride
|
||||
/// rows.
|
||||
/// </summary>
|
||||
public string? RedactSqlParamsMatching { get; set; }
|
||||
|
||||
/// <summary>
|
||||
/// When <c>true</c>, the inbound API audit row for this target records
|
||||
/// request/response headers and metadata (status, duration, actor, etc.)
|
||||
/// but the request and response body strings are omitted
|
||||
/// (<c>RequestSummary</c> / <c>ResponseSummary</c> are left null). The
|
||||
/// audit row itself is always emitted — only the body content is suppressed.
|
||||
/// <c>false</c> (the default) means body capture
|
||||
/// proceeds normally up to <see cref="AuditLogOptions.InboundMaxBytes"/>.
|
||||
/// </summary>
|
||||
public bool SkipBodyCapture { get; set; }
|
||||
}
|
||||
|
||||
@@ -200,6 +200,13 @@ public static class ServiceCollectionExtensions
|
||||
// surface on the central dashboard.
|
||||
services.TryAddSingleton<ICentralAuditWriteFailureCounter, NoOpCentralAuditWriteFailureCounter>();
|
||||
|
||||
// M5.3 (T7): inbound body-ceiling hit counter — NoOp default for
|
||||
// site/test roots. AddAuditLogCentralMaintenance replaces this binding
|
||||
// with the AuditCentralHealthSnapshot implementation so ceiling-hit
|
||||
// counts surface on the central dashboard alongside write-failure and
|
||||
// redaction-failure counters.
|
||||
services.TryAddSingleton<IAuditInboundCeilingHitsCounter, NoOpAuditInboundCeilingHitsCounter>();
|
||||
|
||||
// M4 Bundle B: central direct-write audit writer used by
|
||||
// NotificationOutboxActor (Bundle B) and Inbound API (Bundle C/D) to
|
||||
// emit AuditLog rows that originate ON central, not via site telemetry.
|
||||
@@ -383,6 +390,12 @@ public static class ServiceCollectionExtensions
|
||||
// HealthMetricsAuditRedactionFailureCounter shape one-for-one.
|
||||
services.Replace(ServiceDescriptor.Singleton<IAuditRedactionFailureCounter,
|
||||
CentralAuditRedactionFailureCounter>());
|
||||
// M5.3 (T7): replace the NoOp IAuditInboundCeilingHitsCounter with the
|
||||
// AuditCentralHealthSnapshot so ceiling-hit counts surface on the
|
||||
// central dashboard. Same singleton-forward pattern as
|
||||
// ICentralAuditWriteFailureCounter above.
|
||||
services.Replace(ServiceDescriptor.Singleton<IAuditInboundCeilingHitsCounter>(
|
||||
sp => sp.GetRequiredService<AuditCentralHealthSnapshot>()));
|
||||
|
||||
return services;
|
||||
}
|
||||
|
||||
@@ -0,0 +1,113 @@
|
||||
using System.Text;
|
||||
using System.Text.Json;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
|
||||
|
||||
/// <summary>
|
||||
/// Arguments for an <c>audit backfill-source-node</c> invocation.
|
||||
/// </summary>
|
||||
public sealed class AuditBackfillSourceNodeArgs
|
||||
{
|
||||
/// <summary>
|
||||
/// Value written into <c>SourceNode</c> for NULL rows (default <c>"unknown"</c>).
|
||||
/// </summary>
|
||||
public string Sentinel { get; set; } = "unknown";
|
||||
|
||||
/// <summary>
|
||||
/// Only rows with <c>OccurredAtUtc</c> strictly before this UTC datetime are
|
||||
/// eligible. Required — must be an ISO-8601 UTC datetime.
|
||||
/// </summary>
|
||||
public string Before { get; set; } = string.Empty;
|
||||
|
||||
/// <summary>
|
||||
/// Maximum rows updated per batch (default 5000). Caps the per-transaction
|
||||
/// log footprint; the loop repeats until no rows remain.
|
||||
/// </summary>
|
||||
public int BatchSize { get; set; } = 5000;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Pure helpers for the <c>audit backfill-source-node</c> subcommand (Audit Log
|
||||
/// #23 M5.6 T5). Builds the request body, POSTs to
|
||||
/// <c>/api/audit/backfill-source-node</c>, and renders the result. Kept separate
|
||||
/// from the command wiring so each piece is unit-testable without standing up the
|
||||
/// command tree.
|
||||
/// </summary>
|
||||
public static class AuditBackfillHelpers
|
||||
{
|
||||
private static readonly JsonSerializerOptions JsonWriteOptions = new()
|
||||
{
|
||||
WriteIndented = true,
|
||||
};
|
||||
|
||||
/// <summary>
|
||||
/// Builds the JSON request body for <c>POST /api/audit/backfill-source-node</c>.
|
||||
/// </summary>
|
||||
/// <param name="args">The backfill arguments.</param>
|
||||
/// <returns>A JSON string suitable for the request body.</returns>
|
||||
public static string BuildRequestBody(AuditBackfillSourceNodeArgs args)
|
||||
{
|
||||
var obj = new
|
||||
{
|
||||
sentinel = args.Sentinel,
|
||||
before = args.Before,
|
||||
batchSize = args.BatchSize,
|
||||
};
|
||||
return JsonSerializer.Serialize(obj);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Executes the backfill: POSTs <c>/api/audit/backfill-source-node</c> and
|
||||
/// prints the result. Returns the process exit code (0 = success,
|
||||
/// 1 = error, 2 = authorization failure).
|
||||
/// </summary>
|
||||
/// <param name="client">The management HTTP client.</param>
|
||||
/// <param name="args">The backfill arguments.</param>
|
||||
/// <param name="output">The output writer for results.</param>
|
||||
/// <returns>A task that resolves to the process exit code.</returns>
|
||||
public static async Task<int> RunBackfillAsync(
|
||||
ManagementHttpClient client,
|
||||
AuditBackfillSourceNodeArgs args,
|
||||
TextWriter output)
|
||||
{
|
||||
var body = BuildRequestBody(args);
|
||||
var response = await client.SendPostAsync(
|
||||
"api/audit/backfill-source-node", body, TimeSpan.FromMinutes(10));
|
||||
|
||||
if (response.JsonData == null)
|
||||
{
|
||||
OutputFormatter.WriteError(
|
||||
response.Error ?? "Backfill request failed.", response.ErrorCode ?? "ERROR");
|
||||
return CommandHelpers.IsAuthorizationFailure(response) ? 2 : 1;
|
||||
}
|
||||
|
||||
// Parse and display the result.
|
||||
try
|
||||
{
|
||||
using var doc = JsonDocument.Parse(response.JsonData);
|
||||
var root = doc.RootElement;
|
||||
var rowsUpdated = root.TryGetProperty("rowsUpdated", out var r)
|
||||
? r.GetInt64()
|
||||
: 0L;
|
||||
var sentinel = root.TryGetProperty("sentinel", out var s)
|
||||
? s.GetString() ?? args.Sentinel
|
||||
: args.Sentinel;
|
||||
var before = root.TryGetProperty("before", out var b)
|
||||
? b.GetString() ?? args.Before
|
||||
: args.Before;
|
||||
|
||||
output.WriteLine("SourceNode backfill complete.");
|
||||
output.WriteLine($" rows updated : {rowsUpdated}");
|
||||
output.WriteLine($" sentinel : {sentinel}");
|
||||
output.WriteLine($" before : {before}");
|
||||
}
|
||||
catch (JsonException)
|
||||
{
|
||||
// Server returned success but non-JSON body — not expected; print raw.
|
||||
output.WriteLine(response.JsonData);
|
||||
}
|
||||
|
||||
output.Flush();
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
@@ -6,13 +6,15 @@ namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
|
||||
/// <summary>
|
||||
/// The <c>scadabridge audit</c> command group (Audit Log #23 M8). Provides read access to
|
||||
/// the centralized append-only Audit Log via the Bundle B REST endpoints
|
||||
/// (<c>GET /api/audit/query</c>, <c>GET /api/audit/export</c>), plus a v1 no-op
|
||||
/// <c>verify-chain</c> placeholder for the deferred hash-chain tamper-evidence feature.
|
||||
/// (<c>GET /api/audit/query</c>, <c>GET /api/audit/export</c>,
|
||||
/// <c>GET /api/audit/tree</c>), plus a v1 no-op <c>verify-chain</c> placeholder
|
||||
/// for the deferred hash-chain tamper-evidence feature.
|
||||
/// </summary>
|
||||
public static class AuditCommands
|
||||
{
|
||||
/// <summary>
|
||||
/// Builds the <c>audit</c> command group with query, export, and verify-chain sub-commands.
|
||||
/// Builds the <c>audit</c> command group with query, export, tree, and verify-chain
|
||||
/// sub-commands.
|
||||
/// </summary>
|
||||
/// <param name="urlOption">Global <c>--url</c> option for the management API endpoint.</param>
|
||||
/// <param name="formatOption">Global <c>--format</c> option for output format.</param>
|
||||
@@ -25,7 +27,9 @@ public static class AuditCommands
|
||||
|
||||
command.Add(BuildQuery(urlOption, formatOption, usernameOption, passwordOption));
|
||||
command.Add(BuildExport(urlOption, formatOption, usernameOption, passwordOption));
|
||||
command.Add(BuildTree(urlOption, formatOption, usernameOption, passwordOption));
|
||||
command.Add(BuildVerifyChain(urlOption, formatOption, usernameOption, passwordOption));
|
||||
command.Add(BuildBackfillSourceNode(urlOption, formatOption, usernameOption, passwordOption));
|
||||
|
||||
return command;
|
||||
}
|
||||
@@ -224,6 +228,44 @@ public static class AuditCommands
|
||||
return cmd;
|
||||
}
|
||||
|
||||
private static Command BuildTree(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
|
||||
{
|
||||
var executionIdOption = new Option<string>("--execution-id")
|
||||
{
|
||||
Description = "Execution ID (GUID) to look up — may be any node in the chain",
|
||||
Required = true,
|
||||
};
|
||||
|
||||
var cmd = new Command("tree") { Description = "Display the full execution-chain tree for an audit execution" };
|
||||
cmd.Add(executionIdOption);
|
||||
|
||||
cmd.SetAction(async (ParseResult result) =>
|
||||
{
|
||||
var connection = AuditCommandHelpers.ResolveConnection(result, urlOption, usernameOption, passwordOption);
|
||||
if (connection.Error != null)
|
||||
{
|
||||
OutputFormatter.WriteError(connection.Error, connection.ErrorCode!);
|
||||
return 1;
|
||||
}
|
||||
|
||||
var rawId = result.GetValue(executionIdOption);
|
||||
if (!Guid.TryParse(rawId, out var executionId))
|
||||
{
|
||||
OutputFormatter.WriteError(
|
||||
$"Invalid execution ID '{rawId}'. Expected a GUID (e.g. 11111111-1111-1111-1111-111111111111).",
|
||||
"INVALID_ARGUMENT");
|
||||
return 1;
|
||||
}
|
||||
|
||||
var format = AuditCommandHelpers.ResolveFormat(result, formatOption);
|
||||
|
||||
using var client = new ManagementHttpClient(connection.Url!, connection.Username!, connection.Password!);
|
||||
return await AuditTreeHelpers.RunTreeAsync(client, executionId, format, Console.Out);
|
||||
});
|
||||
|
||||
return cmd;
|
||||
}
|
||||
|
||||
private static Command BuildVerifyChain(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
|
||||
{
|
||||
var monthOption = new Option<string>("--month") { Description = "Month to verify (YYYY-MM)", Required = true };
|
||||
@@ -247,4 +289,76 @@ public static class AuditCommands
|
||||
});
|
||||
return cmd;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Builds the <c>audit backfill-source-node</c> sub-command (Audit Log #23 M5.6 T5).
|
||||
/// Sets <c>SourceNode</c> on historical pre-feature rows whose <c>SourceNode IS NULL</c>
|
||||
/// and <c>OccurredAtUtc</c> is older than <c>--before</c>, in batches. Admin-only.
|
||||
/// </summary>
|
||||
private static Command BuildBackfillSourceNode(Option<string> urlOption, Option<string> formatOption, Option<string> usernameOption, Option<string> passwordOption)
|
||||
{
|
||||
var sentinelOption = new Option<string>("--sentinel")
|
||||
{
|
||||
Description = "Value to write for pre-feature rows whose node-of-origin is unknown (default: unknown)",
|
||||
};
|
||||
sentinelOption.DefaultValueFactory = _ => "unknown";
|
||||
|
||||
var beforeOption = new Option<string>("--before")
|
||||
{
|
||||
Description = "ISO-8601 UTC datetime; only rows older than this date are eligible (required)",
|
||||
Required = true,
|
||||
};
|
||||
|
||||
var batchOption = new Option<int>("--batch")
|
||||
{
|
||||
Description = "Max rows updated per batch (default: 5000)",
|
||||
};
|
||||
batchOption.DefaultValueFactory = _ => 5000;
|
||||
|
||||
var cmd = new Command("backfill-source-node")
|
||||
{
|
||||
Description = "Set SourceNode to a sentinel value on pre-feature rows where it is NULL (admin-only, maintenance path)",
|
||||
};
|
||||
cmd.Add(sentinelOption);
|
||||
cmd.Add(beforeOption);
|
||||
cmd.Add(batchOption);
|
||||
|
||||
cmd.SetAction(async (ParseResult result) =>
|
||||
{
|
||||
var connection = AuditCommandHelpers.ResolveConnection(result, urlOption, usernameOption, passwordOption);
|
||||
if (connection.Error != null)
|
||||
{
|
||||
OutputFormatter.WriteError(connection.Error, connection.ErrorCode!);
|
||||
return 1;
|
||||
}
|
||||
|
||||
var sentinel = result.GetValue(sentinelOption) ?? "unknown";
|
||||
var before = result.GetValue(beforeOption)!;
|
||||
var batch = result.GetValue(batchOption);
|
||||
|
||||
if (string.IsNullOrWhiteSpace(sentinel))
|
||||
{
|
||||
OutputFormatter.WriteError("--sentinel must be a non-empty string.", "INVALID_ARGUMENT");
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (batch <= 0)
|
||||
{
|
||||
OutputFormatter.WriteError("--batch must be > 0.", "INVALID_ARGUMENT");
|
||||
return 1;
|
||||
}
|
||||
|
||||
var args = new AuditBackfillSourceNodeArgs
|
||||
{
|
||||
Sentinel = sentinel,
|
||||
Before = before,
|
||||
BatchSize = batch,
|
||||
};
|
||||
|
||||
using var client = new ManagementHttpClient(connection.Url!, connection.Username!, connection.Password!);
|
||||
return await AuditBackfillHelpers.RunBackfillAsync(client, args, Console.Out);
|
||||
});
|
||||
|
||||
return cmd;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,208 @@
|
||||
using System.Text;
|
||||
using System.Text.Json;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.CLI.Commands;
|
||||
|
||||
/// <summary>
|
||||
/// Arguments for an <c>audit tree</c> invocation.
|
||||
/// </summary>
|
||||
public sealed class AuditTreeArgs
|
||||
{
|
||||
/// <summary>
|
||||
/// The execution ID (GUID) to look up. May be any node in the chain — the
|
||||
/// server walks to the root and returns the full tree.
|
||||
/// </summary>
|
||||
public string ExecutionId { get; set; } = string.Empty;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Represents one execution node as returned by <c>GET /api/audit/tree</c>.
|
||||
/// Property names match the server's camelCase JSON serialisation of
|
||||
/// <c>ExecutionTreeNode</c>.
|
||||
/// </summary>
|
||||
internal sealed class AuditTreeNodeDto
|
||||
{
|
||||
public Guid ExecutionId { get; init; }
|
||||
public Guid? ParentExecutionId { get; init; }
|
||||
public int RowCount { get; init; }
|
||||
public string[] Channels { get; init; } = Array.Empty<string>();
|
||||
public string[] Statuses { get; init; } = Array.Empty<string>();
|
||||
public string? SourceSiteId { get; init; }
|
||||
public string? SourceInstanceId { get; init; }
|
||||
public DateTime? FirstOccurredAtUtc { get; init; }
|
||||
public DateTime? LastOccurredAtUtc { get; init; }
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Pure helpers for the <c>audit tree</c> subcommand: builds the query string,
|
||||
/// calls <c>GET /api/audit/tree</c>, and renders the result as either an
|
||||
/// indented ASCII tree (table format) or raw JSON. Kept separate from the
|
||||
/// command wiring so each piece is unit-testable without standing up the
|
||||
/// command tree.
|
||||
/// </summary>
|
||||
public static class AuditTreeHelpers
|
||||
{
|
||||
private static readonly JsonSerializerOptions JsonReadOptions = new()
|
||||
{
|
||||
PropertyNameCaseInsensitive = true,
|
||||
};
|
||||
|
||||
private static readonly JsonSerializerOptions JsonWriteOptions = new()
|
||||
{
|
||||
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
|
||||
WriteIndented = true,
|
||||
};
|
||||
|
||||
/// <summary>
|
||||
/// Builds the query string for <c>GET /api/audit/tree</c>.
|
||||
/// </summary>
|
||||
/// <param name="executionId">The execution ID GUID.</param>
|
||||
/// <returns>A relative path + query string ready to append to the base URL.</returns>
|
||||
public static string BuildUrl(Guid executionId)
|
||||
=> $"api/audit/tree?executionId={executionId:D}";
|
||||
|
||||
/// <summary>
|
||||
/// Executes the tree lookup: GETs <c>/api/audit/tree</c> and renders the result
|
||||
/// in the requested format. Returns the process exit code (0 = success,
|
||||
/// 1 = error, 2 = authorization failure).
|
||||
/// </summary>
|
||||
/// <param name="client">The management HTTP client.</param>
|
||||
/// <param name="executionId">The execution ID to look up.</param>
|
||||
/// <param name="format"><c>"json"</c> for raw JSON; any other value (e.g. <c>"table"</c>) renders the indented tree.</param>
|
||||
/// <param name="output">The output writer for results.</param>
|
||||
/// <returns>A task that resolves to the process exit code.</returns>
|
||||
public static async Task<int> RunTreeAsync(
|
||||
ManagementHttpClient client,
|
||||
Guid executionId,
|
||||
string format,
|
||||
TextWriter output)
|
||||
{
|
||||
var url = BuildUrl(executionId);
|
||||
var response = await client.SendGetAsync(url, TimeSpan.FromSeconds(30));
|
||||
|
||||
if (response.JsonData == null)
|
||||
{
|
||||
OutputFormatter.WriteError(
|
||||
response.Error ?? "Audit tree request failed.", response.ErrorCode ?? "ERROR");
|
||||
return CommandHelpers.IsAuthorizationFailure(response) ? 2 : 1;
|
||||
}
|
||||
|
||||
var nodes = ParseNodes(response.JsonData);
|
||||
|
||||
if (format == "json")
|
||||
{
|
||||
WriteJson(nodes, output);
|
||||
}
|
||||
else
|
||||
{
|
||||
WriteTable(nodes, executionId, output);
|
||||
}
|
||||
|
||||
output.Flush();
|
||||
return 0;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Parses the JSON array from the server into an array of
|
||||
/// <see cref="AuditTreeNodeDto"/>.
|
||||
/// </summary>
|
||||
/// <param name="json">The raw JSON response body.</param>
|
||||
/// <returns>An array of deserialized tree nodes (empty on parse failure).</returns>
|
||||
internal static AuditTreeNodeDto[] ParseNodes(string json)
|
||||
{
|
||||
try
|
||||
{
|
||||
return JsonSerializer.Deserialize<AuditTreeNodeDto[]>(json, JsonReadOptions)
|
||||
?? Array.Empty<AuditTreeNodeDto>();
|
||||
}
|
||||
catch (JsonException)
|
||||
{
|
||||
return Array.Empty<AuditTreeNodeDto>();
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Renders the nodes as pretty-printed JSON to <paramref name="output"/>.
|
||||
/// </summary>
|
||||
internal static void WriteJson(AuditTreeNodeDto[] nodes, TextWriter output)
|
||||
{
|
||||
output.WriteLine(JsonSerializer.Serialize(nodes, JsonWriteOptions));
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Renders the nodes as an indented ASCII tree. The root node (null
|
||||
/// <c>ParentExecutionId</c>) is printed first; each child is indented
|
||||
/// two spaces per depth level. The queried/entry-point node is marked
|
||||
/// with <c> [*]</c>.
|
||||
/// </summary>
|
||||
internal static void WriteTable(
|
||||
AuditTreeNodeDto[] nodes,
|
||||
Guid queriedExecutionId,
|
||||
TextWriter output)
|
||||
{
|
||||
if (nodes.Length == 0)
|
||||
{
|
||||
output.WriteLine("(no execution tree found)");
|
||||
return;
|
||||
}
|
||||
|
||||
// Build a parent → children lookup (keyed by non-null parent Guid).
|
||||
// Nodes whose ParentExecutionId is null are roots and are not placed in
|
||||
// the lookup; they are identified separately below.
|
||||
var childrenOf = new Dictionary<Guid, List<AuditTreeNodeDto>>();
|
||||
foreach (var node in nodes)
|
||||
{
|
||||
if (node.ParentExecutionId is { } parentId)
|
||||
{
|
||||
if (!childrenOf.ContainsKey(parentId))
|
||||
childrenOf[parentId] = new List<AuditTreeNodeDto>();
|
||||
childrenOf[parentId].Add(node);
|
||||
}
|
||||
}
|
||||
|
||||
// Identify roots: nodes whose ParentExecutionId is null, or whose parent
|
||||
// is not present in the node set (stub-root case).
|
||||
var nodeIds = new HashSet<Guid>(nodes.Select(n => n.ExecutionId));
|
||||
var roots = nodes
|
||||
.Where(n => n.ParentExecutionId == null || !nodeIds.Contains(n.ParentExecutionId.Value))
|
||||
.ToList();
|
||||
|
||||
// Render depth-first.
|
||||
var sb = new StringBuilder();
|
||||
foreach (var root in roots)
|
||||
{
|
||||
RenderNode(root, depth: 0, childrenOf, queriedExecutionId, sb);
|
||||
}
|
||||
|
||||
output.Write(sb.ToString());
|
||||
}
|
||||
|
||||
private static void RenderNode(
|
||||
AuditTreeNodeDto node,
|
||||
int depth,
|
||||
Dictionary<Guid, List<AuditTreeNodeDto>> childrenOf,
|
||||
Guid queriedExecutionId,
|
||||
StringBuilder sb)
|
||||
{
|
||||
var indent = new string(' ', depth * 2);
|
||||
var marker = node.ExecutionId == queriedExecutionId ? " [*]" : string.Empty;
|
||||
var channels = node.Channels.Length > 0 ? string.Join(",", node.Channels) : "-";
|
||||
var statuses = node.Statuses.Length > 0 ? string.Join(",", node.Statuses) : "-";
|
||||
var site = node.SourceSiteId ?? "-";
|
||||
var instance = node.SourceInstanceId ?? "-";
|
||||
var first = node.FirstOccurredAtUtc.HasValue
|
||||
? node.FirstOccurredAtUtc.Value.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", System.Globalization.CultureInfo.InvariantCulture)
|
||||
: "-";
|
||||
|
||||
sb.AppendLine(
|
||||
$"{indent}{node.ExecutionId:D}{marker} rows={node.RowCount} channels=[{channels}] statuses=[{statuses}] site={site} instance={instance} first={first}");
|
||||
|
||||
if (childrenOf.TryGetValue(node.ExecutionId, out var children))
|
||||
{
|
||||
foreach (var child in children)
|
||||
{
|
||||
RenderNode(child, depth + 1, childrenOf, queriedExecutionId, sb);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -142,6 +142,60 @@ public class ManagementHttpClient : IDisposable
|
||||
return new ManagementResponse((int)httpResponse.StatusCode, null, error, code);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Issues a plain HTTP <c>POST</c> against a REST endpoint (e.g. the audit
|
||||
/// maintenance endpoints) with a JSON body and returns the response. Unlike
|
||||
/// <see cref="SendCommandAsync"/>, this does not wrap the call in the
|
||||
/// <c>POST /management</c> command envelope — these are plain REST resources.
|
||||
/// Authentication (HTTP Basic) and the base address are shared.
|
||||
/// </summary>
|
||||
/// <param name="relativePath">Path relative to the base URL.</param>
|
||||
/// <param name="body">The JSON body to send, or <c>null</c> for an empty body.</param>
|
||||
/// <param name="timeout">The request timeout.</param>
|
||||
/// <returns>A management response containing status and data.</returns>
|
||||
public async Task<ManagementResponse> SendPostAsync(string relativePath, string? body, TimeSpan timeout)
|
||||
{
|
||||
using var cts = new CancellationTokenSource(timeout);
|
||||
|
||||
var content = new StringContent(body ?? "{}", Encoding.UTF8, "application/json");
|
||||
|
||||
HttpResponseMessage httpResponse;
|
||||
try
|
||||
{
|
||||
httpResponse = await _httpClient.PostAsync(relativePath, content, cts.Token);
|
||||
}
|
||||
catch (TaskCanceledException)
|
||||
{
|
||||
return new ManagementResponse(504, null, "Request timed out.", "TIMEOUT");
|
||||
}
|
||||
catch (HttpRequestException ex)
|
||||
{
|
||||
return new ManagementResponse(0, null, $"Connection failed: {ex.Message}", "CONNECTION_FAILED");
|
||||
}
|
||||
|
||||
var responseBody = await httpResponse.Content.ReadAsStringAsync(cts.Token);
|
||||
|
||||
if (httpResponse.IsSuccessStatusCode)
|
||||
{
|
||||
return new ManagementResponse((int)httpResponse.StatusCode, responseBody, null, null);
|
||||
}
|
||||
|
||||
string? error = null;
|
||||
string? code = null;
|
||||
try
|
||||
{
|
||||
using var doc = JsonDocument.Parse(responseBody);
|
||||
error = doc.RootElement.TryGetProperty("error", out var e) ? e.GetString() : responseBody;
|
||||
code = doc.RootElement.TryGetProperty("code", out var c) ? c.GetString() : null;
|
||||
}
|
||||
catch
|
||||
{
|
||||
error = responseBody;
|
||||
}
|
||||
|
||||
return new ManagementResponse((int)httpResponse.StatusCode, null, error, code);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Issues a plain HTTP <c>GET</c> and returns the raw <see cref="HttpResponseMessage"/>
|
||||
/// so the caller can stream the response body without buffering it in memory — used
|
||||
|
||||
@@ -1269,15 +1269,18 @@ script-trust-boundary action: outbound API calls (sync + cached), outbound DB
|
||||
operations (sync + cached), notifications, and inbound API calls. This is distinct
|
||||
from the configuration-change audit trail exposed by [`audit-config`](#audit-config--configuration-change-audit-log).
|
||||
|
||||
The subcommands map directly onto the `GET /api/audit/query` and
|
||||
`GET /api/audit/export` management endpoints. Filters and the result columns mirror
|
||||
the Central UI **Audit** page, so a CLI query and a UI query with the same filters
|
||||
return the same rows — CLI ↔ UI filter parity is intentional.
|
||||
The subcommands map directly onto the `GET /api/audit/query`,
|
||||
`GET /api/audit/export`, `GET /api/audit/tree`, and
|
||||
`POST /api/audit/backfill-source-node` management endpoints. Filters and the
|
||||
result columns mirror the Central UI **Audit** page, so a CLI query and a UI
|
||||
query with the same filters return the same rows — CLI ↔ UI filter parity is
|
||||
intentional.
|
||||
|
||||
**Permissions.** Querying requires the `OperationalAudit` permission (roles `Admin`,
|
||||
`Audit`, or `AuditReadOnly`). Exporting requires the stricter `AuditExport` permission
|
||||
(roles `Admin` or `Audit`) — read access does *not* imply export access. A request
|
||||
without the required role returns exit code `2`.
|
||||
**Permissions.** Querying and tree traversal require the `OperationalAudit`
|
||||
permission (roles `Admin`, `Audit`, or `AuditReadOnly`). Exporting requires the
|
||||
stricter `AuditExport` permission (roles `Admin` or `Audit`) — read access does
|
||||
*not* imply export access. The `backfill-source-node` maintenance command requires
|
||||
the `Admin` role. A request without the required role returns exit code `2`.
|
||||
|
||||
#### `audit query`
|
||||
|
||||
@@ -1342,6 +1345,46 @@ scadabridge --url <url> audit export --since <time> --until <time> --format <fmt
|
||||
> Implemented` — Parquet archival is deferred to v1.x (see `Component-AuditLog.md`).
|
||||
> Use `csv` or `jsonl`.
|
||||
|
||||
#### `audit tree` (M5.3 T8)
|
||||
|
||||
Display the full execution-chain tree for a given execution ID. The server walks
|
||||
`ParentExecutionId` to find the root, then traverses downward to collect all
|
||||
reachable executions in the chain.
|
||||
|
||||
```sh
|
||||
scadabridge --url <url> audit tree --execution-id <guid> [--format table|json]
|
||||
```
|
||||
|
||||
| Option | Required | Default | Description |
|
||||
|--------|----------|---------|-------------|
|
||||
| `--execution-id` | yes | — | Any `ExecutionId` in the chain (root or child) |
|
||||
| `--format` | no | `json` | Output format: `json` (array of execution nodes) or `table` (indented text tree) |
|
||||
|
||||
The `--execution-id` can be any node in the chain — the server resolves the root
|
||||
automatically. With `--format table` the tree is printed as an indented text
|
||||
representation. With `--format json` (the default) a JSON array of execution nodes is
|
||||
returned, suitable for scripting. Backed by `GET /api/audit/tree?executionId=<guid>`.
|
||||
Requires `OperationalAudit` permission.
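The two-phase walk described above (up to the root, then down over every reachable execution) can be sketched in memory as follows. This is an illustration only: `ExecutionTreeSketch`, `FindRoot`, and `CollectChain` are hypothetical names, the real server runs the walk against the `AuditLog` table, and treating a node whose parent has no rows as the root is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionTreeSketch
{
    // Walks ParentExecutionId links upward from any node to the chain root.
    // 'parentOf' maps ExecutionId -> ParentExecutionId (null for a root).
    public static Guid FindRoot(Guid start, IReadOnlyDictionary<Guid, Guid?> parentOf)
    {
        var current = start;
        var seen = new HashSet<Guid> { start }; // guards against cyclic data
        while (parentOf.TryGetValue(current, out var parent)
               && parent is { } parentId
               && parentOf.ContainsKey(parentId)
               && seen.Add(parentId))
        {
            current = parentId;
        }
        return current;
    }

    // Collects the root and every execution reachable from it, depth-first.
    public static List<Guid> CollectChain(Guid root, IReadOnlyDictionary<Guid, Guid?> parentOf)
    {
        // Invert the parent pointers into a parent -> children lookup.
        var childrenOf = parentOf
            .Where(kv => kv.Value.HasValue)
            .ToLookup(kv => kv.Value.GetValueOrDefault(), kv => kv.Key);

        var result = new List<Guid>();
        var visited = new HashSet<Guid>();
        var stack = new Stack<Guid>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var id = stack.Pop();
            if (!visited.Add(id)) continue;
            result.Add(id);
            foreach (var child in childrenOf[id]) stack.Push(child);
        }
        return result;
    }
}
```

Starting from any child, `CollectChain(FindRoot(child, parentOf), parentOf)` yields the same set of executions, which is why `--execution-id` accepts any node in the chain.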
|
||||
|
||||
#### `audit backfill-source-node` (M5.6 T5)
|
||||
|
||||
Set `SourceNode` to a sentinel value on pre-feature rows where `SourceNode IS NULL`
|
||||
and `OccurredAtUtc` is older than `--before`. Admin-only maintenance command.
|
||||
|
||||
```sh
|
||||
scadabridge --url <url> audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
|
||||
```
|
||||
|
||||
| Option | Required | Default | Description |
|
||||
|--------|----------|---------|-------------|
|
||||
| `--before` | yes | — | ISO-8601 UTC datetime; only rows older than this date are eligible |
|
||||
| `--sentinel` | no | `unknown` | Value to write (must be non-empty) |
|
||||
| `--batch` | no | `5000` | Max rows updated per batch; controls transaction size |
|
||||
|
||||
The command is idempotent — running it multiple times converges (only rows where
|
||||
`SourceNode IS NULL` are eligible; already-set rows are untouched). Backed by
|
||||
`POST /api/audit/backfill-source-node`. Requires `Admin` role.
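The batching and idempotency described above can be sketched with an in-memory stand-in for the table. `AuditRowSketch` and `BackfillSketch` are hypothetical types for illustration; the actual endpoint presumably issues batched SQL updates, which this diff does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory row standing in for an AuditLog record.
public sealed class AuditRowSketch
{
    public DateTime OccurredAtUtc { get; init; }
    public string? SourceNode { get; set; }
}

public static class BackfillSketch
{
    // Repeatedly updates up to 'batchSize' eligible rows until none remain,
    // and returns the total number of rows updated.
    public static long Backfill(
        List<AuditRowSketch> rows, string sentinel, DateTime beforeUtc, int batchSize)
    {
        if (string.IsNullOrWhiteSpace(sentinel))
            throw new ArgumentException("Sentinel must be non-empty.", nameof(sentinel));
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        long total = 0;
        while (true)
        {
            // Eligible: SourceNode is still NULL and the row predates the cutoff.
            var batch = rows
                .Where(r => r.SourceNode is null && r.OccurredAtUtc < beforeUtc)
                .Take(batchSize)
                .ToList();
            if (batch.Count == 0) return total;

            foreach (var row in batch) row.SourceNode = sentinel;
            total += batch.Count;
        }
    }
}
```

A second run over the same data updates nothing, because rows that already carry a `SourceNode` value no longer match the eligibility filter.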
|
||||
|
||||
#### `audit verify-chain`
|
||||
|
||||
Verify the audit log hash chain for a given month.
|
||||
@@ -1354,11 +1397,11 @@ scadabridge --url <url> audit verify-chain --month <YYYY-MM>
|
||||
|--------|----------|---------|-------------|
|
||||
| `--month` | yes | — | Month to verify, `YYYY-MM` (e.g. `2026-05`) |
|
||||
|
||||
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release. The
|
||||
> subcommand validates the `--month` argument and prints a notice pointing at the
|
||||
> v1.x roadmap in `Component-AuditLog.md`; it exits `0` without contacting the server.
|
||||
> The command exists now so scripts and operator habits do not need to change when
|
||||
> tamper-evidence ships.
|
||||
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release (T1
|
||||
> deferred to v1.x). The subcommand validates the `--month` argument and prints a
|
||||
> notice pointing at the v1.x roadmap in `Component-AuditLog.md`; it exits `0`
|
||||
> without contacting the server. The command exists now so scripts and operator
|
||||
> habits do not need to change when tamper-evidence ships.
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -58,3 +58,31 @@
|
||||
{
|
||||
<div class="text-muted small mb-3">Site Call KPIs unavailable: @ErrorMessage</div>
|
||||
}
|
||||
@* ── Per-node stuck/parked sub-table (T6: M5.2 per-node stuck-count KPIs) ── *@
|
||||
@if (HasNodeBreakdown)
|
||||
{
|
||||
<div class="mb-3">
|
||||
<div class="d-flex justify-content-between align-items-center mb-1">
|
||||
<small class="text-muted">By node</small>
|
||||
</div>
|
||||
<table class="table table-sm table-borderless mb-0 site-call-kpi-node-table">
|
||||
<thead class="table-light">
|
||||
<tr>
|
||||
<th class="small py-1">Node</th>
|
||||
<th class="text-end small py-1">Stuck</th>
|
||||
<th class="text-end small py-1">Parked</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
@foreach (var n in PerNodeSnapshots!)
|
||||
{
|
||||
<tr @key="n.SourceNode">
|
||||
<td class="small py-1"><code>@n.SourceNode</code></td>
|
||||
<td class="text-end font-monospace small py-1 @(n.StuckCount > 0 ? "text-warning" : "")">@n.StuckCount</td>
|
||||
<td class="text-end font-monospace small py-1 @(n.ParkedCount > 0 ? "text-danger" : "")">@n.ParkedCount</td>
|
||||
</tr>
|
||||
}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
}
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
using Microsoft.AspNetCore.Components;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.CentralUI.Components.Health;
|
||||
|
||||
@@ -59,6 +60,24 @@ public partial class SiteCallKpiTiles
|
||||
/// </summary>
|
||||
[Parameter] public string? ErrorMessage { get; set; }
|
||||
|
||||
/// <summary>
|
||||
/// Optional per-node KPI breakdown (T6: M5.2 per-node stuck-count KPIs).
|
||||
/// When non-null and non-empty, a compact node-level stuck/parked sub-table
|
||||
/// is rendered below the main tiles. <c>null</c> means the parent has not
|
||||
/// loaded it yet or has opted out — the sub-table is suppressed entirely.
|
||||
/// </summary>
|
||||
[Parameter] public IReadOnlyList<SiteCallNodeKpiSnapshot>? PerNodeSnapshots { get; set; }
|
||||
|
||||
/// <summary>
|
||||
/// True when <see cref="PerNodeSnapshots"/> holds the result of a successful query.
/// When false (for example after a load failure), the sub-table is suppressed.
|
||||
/// </summary>
|
||||
[Parameter] public bool PerNodeAvailable { get; set; }
|
||||
|
||||
/// <summary>Whether the per-node sub-table has data to render.</summary>
|
||||
internal bool HasNodeBreakdown =>
|
||||
PerNodeAvailable && PerNodeSnapshots is { Count: > 0 };
|
||||
|
||||
// ── Buffered tile ───────────────────────────────────────────────────────
|
||||
|
||||
private string BufferedDisplay =>
|
||||
|
||||
@@ -9,6 +9,7 @@
|
||||
@using ZB.MOM.WW.ScadaBridge.HealthMonitoring
|
||||
@using ZB.MOM.WW.ScadaBridge.Commons.Messages.Notification
|
||||
@using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit
|
||||
@using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit
|
||||
@using ZB.MOM.WW.ScadaBridge.Communication
|
||||
@implements IDisposable
|
||||
@inject ICentralHealthAggregator HealthAggregator
|
||||
@@ -65,7 +66,9 @@
|
||||
(buffered / stuck / parked). Refreshed alongside the site states. *@
|
||||
<SiteCallKpiTiles Snapshot="@_siteCallKpi"
|
||||
IsAvailable="@_siteCallKpiAvailable"
|
||||
ErrorMessage="@_siteCallKpiError" />
|
||||
ErrorMessage="@_siteCallKpiError"
|
||||
PerNodeSnapshots="@_siteCallNodeKpis"
|
||||
PerNodeAvailable="@_siteCallNodeKpiAvailable" />
|
||||
|
||||
@* Audit Log (#23) M7 Bundle E — three KPI tiles for the Audit channel
|
||||
(volume / error rate / backlog). Refreshed alongside the site states. *@
|
||||
@@ -378,6 +381,12 @@
|
||||
private bool _siteCallKpiAvailable;
|
||||
private string? _siteCallKpiError;
|
||||
|
||||
// Per-node Site Call KPI breakdown (T6: M5.2 per-node stuck-count KPIs).
|
||||
// Passed to SiteCallKpiTiles as an optional sub-table.
|
||||
private IReadOnlyList<SiteCallNodeKpiSnapshot> _siteCallNodeKpis =
|
||||
Array.Empty<SiteCallNodeKpiSnapshot>();
|
||||
private bool _siteCallNodeKpiAvailable;
|
||||
|
||||
private static bool SiteHasActiveErrors(SiteHealthState state)
|
||||
{
|
||||
var report = state.LatestReport;
|
||||
@@ -415,7 +424,7 @@
|
||||
{
|
||||
_siteStates = HealthAggregator.GetAllSiteStates();
|
||||
await LoadOutboxKpis();
|
||||
await LoadSiteCallKpis();
|
||||
await Task.WhenAll(LoadSiteCallKpis(), LoadSiteCallNodeKpis());
|
||||
await LoadAuditKpis();
|
||||
}
|
||||
|
||||
@@ -474,6 +483,30 @@
|
||||
}
|
||||
}
|
||||
|
||||
// Per-node site-call KPI loader (T6: M5.2). Best-effort; a fault silently
|
||||
// suppresses the per-node sub-table rather than degrading the dashboard.
|
||||
private async Task LoadSiteCallNodeKpis()
|
||||
{
|
||||
try
|
||||
{
|
||||
var response = await CommunicationService.GetPerNodeSiteCallKpisAsync(
|
||||
new PerNodeSiteCallKpiRequest(Guid.NewGuid().ToString("N")));
|
||||
if (response.Success)
|
||||
{
|
||||
_siteCallNodeKpis = response.Nodes;
|
||||
_siteCallNodeKpiAvailable = true;
|
||||
}
|
||||
else
|
||||
{
|
||||
_siteCallNodeKpiAvailable = false;
|
||||
}
|
||||
}
|
||||
catch
|
||||
{
|
||||
_siteCallNodeKpiAvailable = false;
|
||||
}
|
||||
}
|
||||
|
||||
// Tiles show the numeric KPI when available, or an em dash when the outbox
|
||||
// KPI query failed — matching how the page renders other unavailable data.
|
||||
private string OutboxTileValue(int value) =>
|
||||
|
||||
+73
-2
@@ -69,6 +69,51 @@
|
||||
</div>
|
||||
}
|
||||
|
||||
@* ── Per-node breakdown (T6: additive) ── *@
|
||||
<h5 class="mb-2">Per-node breakdown</h5>
|
||||
@if (_perNodeError != null)
|
||||
{
|
||||
<div class="alert alert-warning py-2">Per-node KPIs unavailable: @_perNodeError</div>
|
||||
}
|
||||
else if (_perNode.Count == 0)
|
||||
{
|
||||
<div class="card mb-3">
|
||||
<div class="card-body text-center text-muted py-3">
|
||||
<div class="small">No per-node activity (rows may have a null SourceNode).</div>
|
||||
</div>
|
||||
</div>
|
||||
}
|
||||
else
|
||||
{
|
||||
<div class="table-responsive mb-3">
|
||||
<table class="table table-sm table-hover align-middle">
|
||||
<thead class="table-light">
|
||||
<tr>
|
||||
<th>Node</th>
|
||||
<th class="text-end">Queue Depth</th>
|
||||
<th class="text-end">Stuck</th>
|
||||
<th class="text-end">Parked</th>
|
||||
<th class="text-end">Delivered (last interval)</th>
|
||||
<th class="text-end">Oldest Pending Age</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
@foreach (var n in _perNode)
|
||||
{
|
||||
<tr @key="n.SourceNode" class="@(n.StuckCount > 0 ? "table-warning" : "")">
|
||||
<td><code>@n.SourceNode</code></td>
|
||||
<td class="text-end font-monospace">@n.QueueDepth</td>
|
||||
<td class="text-end font-monospace @(n.StuckCount > 0 ? "text-warning" : "")">@n.StuckCount</td>
|
||||
<td class="text-end font-monospace @(n.ParkedCount > 0 ? "text-danger" : "")">@n.ParkedCount</td>
|
||||
<td class="text-end font-monospace text-success">@n.DeliveredLastInterval</td>
|
||||
<td class="text-end font-monospace">@FormatAge(n.OldestPendingAge)</td>
|
||||
</tr>
|
||||
}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
}
|
||||
|
||||
@* ── Per-site breakdown ── *@
|
||||
<h5 class="mb-2">Per-site breakdown</h5>
|
||||
@if (_perSiteError != null)
|
||||
@@ -124,6 +169,10 @@
|
||||
private IReadOnlyList<SiteNotificationKpiSnapshot> _perSite = Array.Empty<SiteNotificationKpiSnapshot>();
|
||||
private string? _perSiteError;
|
||||
|
||||
// ── Per-node (T6: M5.2 per-node stuck-count KPIs) ──
|
||||
private IReadOnlyList<NodeNotificationKpiSnapshot> _perNode = Array.Empty<NodeNotificationKpiSnapshot>();
|
||||
private string? _perNodeError;
|
||||
|
||||
private bool _loading;
|
||||
|
||||
protected override async Task OnInitializedAsync()
|
||||
@@ -144,9 +193,9 @@
|
||||
private async Task RefreshAll()
|
||||
{
|
||||
_loading = true;
|
||||
// Race-free despite both tasks mutating component fields: Blazor Server runs
|
||||
// Race-free despite all tasks mutating component fields: Blazor Server runs
|
||||
// every continuation on the circuit's single-threaded synchronization context.
|
||||
await Task.WhenAll(LoadGlobalKpis(), LoadPerSiteKpis());
|
||||
await Task.WhenAll(LoadGlobalKpis(), LoadPerSiteKpis(), LoadPerNodeKpis());
|
||||
_loading = false;
|
||||
}
|
||||
|
||||
@@ -194,6 +243,28 @@
|
||||
}
|
||||
}
|
||||
|
||||
private async Task LoadPerNodeKpis()
|
||||
{
|
||||
try
|
||||
{
|
||||
var response = await CommunicationService.GetPerNodeNotificationKpisAsync(
|
||||
new PerNodeNotificationKpiRequest(Guid.NewGuid().ToString("N")));
|
||||
if (response.Success)
|
||||
{
|
||||
_perNode = response.Nodes;
|
||||
_perNodeError = null;
|
||||
}
|
||||
else
|
||||
{
|
||||
_perNodeError = response.ErrorMessage ?? "Per-node KPI query failed.";
|
||||
}
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_perNodeError = $"Per-node KPI query failed: {ex.Message}";
|
||||
}
|
||||
}
|
||||
|
||||
private string SiteName(string siteId) =>
|
||||
_sites.FirstOrDefault(s => s.SiteIdentifier == siteId)?.Name ?? siteId;
|
||||
|
||||
|
||||
@@ -87,6 +87,42 @@ public interface IAuditLogRepository
|
||||
/// <returns>A task that resolves to the approximate number of rows discarded by the partition switch.</returns>
|
||||
Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default);
|
||||
|
||||
/// <summary>
|
||||
/// M5.5 (T3) per-channel retention override purge. Deletes <c>AuditLog</c> rows for a
|
||||
/// single <paramref name="channel"/> (matched against the canonical
|
||||
/// <c>Category</c> column — the bare channel name, e.g. <c>ApiOutbound</c>) whose
|
||||
/// <c>OccurredAtUtc</c> is strictly older than <paramref name="threshold"/>, in
|
||||
/// bounded batches of <paramref name="batchSize"/> rows, looping until no further
|
||||
/// rows match. Returns the total number of rows deleted across all batches.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// <b>Maintenance path — NOT the writer role.</b> The append-only invariant binds
|
||||
/// the <c>scadabridge_audit_writer</c> ingest role (INSERT + SELECT only). This row
|
||||
/// DELETE runs on the purge/maintenance connection, the same path that performs the
|
||||
/// global partition switch-out (also a destructive operation forbidden to the writer
|
||||
/// role). Per-channel overrides can only ever expire rows EARLIER than the global
|
||||
/// month-partition switch-out would — never later — so this is a strict tightening
|
||||
/// of the retention window, applied AFTER the global purge on the same tick.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// <b>Bounded + idempotent.</b> Each batch is a <c>DELETE TOP (@batch)</c> so the
|
||||
/// transaction log and lock footprint stay bounded regardless of backlog. Re-running
|
||||
/// the purge is a no-op once every eligible row is gone (the loop exits when a batch
|
||||
/// deletes zero rows), so a crash mid-loop is recoverable by simply running again.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
/// <param name="channel">Canonical channel name (the <c>Category</c> column value, e.g. <c>ApiOutbound</c>).</param>
|
||||
/// <param name="threshold">Rows with <c>OccurredAtUtc</c> strictly older than this UTC datetime are deleted.</param>
|
||||
/// <param name="batchSize">Maximum rows deleted per batch; must be > 0.</param>
|
||||
/// <param name="ct">Cancellation token.</param>
|
||||
/// <returns>A task that resolves to the total number of rows deleted across all batches.</returns>
|
||||
Task<long> PurgeChannelOlderThanAsync(
|
||||
string channel,
|
||||
DateTime threshold,
|
||||
int batchSize,
|
||||
CancellationToken ct = default);
|
||||
|
||||
/// <summary>
|
||||
/// Returns the set of <c>pf_AuditLog_Month</c> partition lower-bound
|
||||
/// boundaries whose partitions contain only rows with
|
||||
@@ -201,4 +237,59 @@ public interface IAuditLogRepository
|
||||
/// <param name="ct">Cancellation token.</param>
|
||||
/// <returns>A task that resolves to the distinct, non-null source node names in ascending order.</returns>
|
||||
Task<IReadOnlyList<string>> GetDistinctSourceNodesAsync(CancellationToken ct = default);
|
||||
|
||||
/// <summary>
|
||||
/// M5.6 (T5) one-time operational backfill: sets <c>SourceNode</c> to
|
||||
/// <paramref name="sentinel"/> on every row where <c>SourceNode IS NULL</c>
|
||||
/// and <c>OccurredAtUtc &lt; <paramref name="before"/></c>, in bounded
|
||||
/// batches of <paramref name="batchSize"/> rows, looping until no further
|
||||
/// rows match. Returns the total number of rows updated across all batches.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// <b>Why a sentinel, not the real value.</b> <c>SourceNode</c> captures the
|
||||
/// physical cluster node on which an event was emitted. For pre-feature rows
|
||||
/// that were ingested before the column was stamped, the true node-of-origin
|
||||
/// is UNKNOWABLE — the original emitter is long gone and there is no
|
||||
/// retroactive way to determine it. Backfilling a configurable sentinel
|
||||
/// (default <c>"unknown"</c>) makes it explicit that these rows pre-date the
|
||||
/// feature rather than silently leaving them NULL (which the filter UI already
|
||||
/// treats as "unresolved" but which an operator might mistake for a bug).
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// <b><c>ExecutionId</c> / <c>ParentExecutionId</c> cannot be backfilled.</b>
|
||||
/// These are PERSISTED COMPUTED columns derived from <c>DetailsJson</c>. The
|
||||
/// AuditLog append-only invariant forbids mutating <c>DetailsJson</c>, so
|
||||
/// the computed values for pre-feature rows remain NULL permanently. This is
|
||||
/// documented rather than coded — see the Ops Note in
|
||||
/// <c>Component-AuditLog.md § Ops Notes — Historical Null Columns</c>.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// <b>Maintenance path — NOT the writer role.</b> This UPDATE runs on the
|
||||
/// purge/maintenance connection (the same path as
|
||||
/// <see cref="SwitchOutPartitionAsync"/> and any per-channel purge), NOT the
|
||||
/// append-only <c>scadabridge_audit_writer</c> role. The CI guard
|
||||
/// (<c>AuditLogAppendOnlyGuardTests</c>) recognises the
|
||||
/// <c>// AUDIT-PURGE-ALLOWED</c> marker on the UPDATE line and forgives
|
||||
/// exactly this one sanctioned maintenance-path UPDATE; any other UPDATE
|
||||
/// against <c>AuditLog</c> still trips the guard.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// <b>Bounded + idempotent.</b> <c>UPDATE TOP (@batch)</c> caps the
|
||||
/// transaction-log and lock footprint per statement. The loop exits when a
|
||||
/// batch updates zero rows, so a crash mid-loop is recoverable by simply
|
||||
/// running again; re-running after completion is a no-op (no NULL rows
|
||||
/// remain for the given <paramref name="before"/> window).
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
/// <param name="sentinel">Value to write into <c>SourceNode</c> for pre-feature rows (e.g. <c>"unknown"</c>).</param>
|
||||
/// <param name="before">Rows with <c>OccurredAtUtc</c> strictly older than this UTC datetime are eligible.</param>
|
||||
/// <param name="batchSize">Maximum rows updated per batch; must be > 0.</param>
|
||||
/// <param name="ct">Cancellation token.</param>
|
||||
/// <returns>A task that resolves to the total number of rows updated across all batches.</returns>
|
||||
Task<long> BackfillSourceNodeAsync(
|
||||
string sentinel,
|
||||
DateTime before,
|
||||
int batchSize,
|
||||
CancellationToken ct = default);
|
||||
}
|
||||
|
||||
+13
@@ -100,6 +100,19 @@ public interface INotificationOutboxRepository
|
||||
Task<IReadOnlyList<SiteNotificationKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>
|
||||
/// Computes a point-in-time <see cref="NodeNotificationKpiSnapshot"/> per originating node.
|
||||
/// Nodes with no notification rows at all are omitted; rows with a <c>NULL</c>
|
||||
/// <c>SourceNode</c> are excluded. The stuck and delivered cutoffs are supplied by the
|
||||
/// caller; the current time used for <c>OldestPendingAge</c> is captured inside the method.
|
||||
/// </summary>
|
||||
/// <param name="stuckCutoff">The time threshold for marking notifications as stuck.</param>
|
||||
/// <param name="deliveredSince">The time threshold for counting delivered notifications.</param>
|
||||
/// <param name="cancellationToken">Cancellation token.</param>
|
||||
/// <returns>A list of per-node KPI snapshots, ordered by node name.</returns>
|
||||
Task<IReadOnlyList<NodeNotificationKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default);
|
||||
|
||||
/// <summary>
|
||||
/// Persists pending changes tracked on the underlying context. Use this when staging
|
||||
/// multiple changes for a single commit; the individual mutating methods on this
|
||||
|
||||
@@ -107,4 +107,19 @@ public interface ISiteCallAuditRepository
|
||||
DateTime stuckCutoff,
|
||||
DateTime intervalSince,
|
||||
CancellationToken ct = default);
|
||||
|
||||
/// <summary>
|
||||
/// Computes a point-in-time <see cref="SiteCallNodeKpiSnapshot"/> per originating
|
||||
/// node. Nodes with no <c>SiteCalls</c> rows at all are omitted; rows with a
|
||||
/// <c>NULL</c> <c>SourceNode</c> are excluded. The stuck cutoff and interval
|
||||
/// bounds are interpreted as in <see cref="ComputeKpisAsync"/>.
|
||||
/// </summary>
|
||||
/// <param name="stuckCutoff">UTC threshold for classifying a row as stuck.</param>
|
||||
/// <param name="intervalSince">UTC start of the delivered/failed interval window.</param>
|
||||
/// <param name="ct">Cancellation token.</param>
|
||||
/// <returns>A task that resolves to a per-node KPI list; nodes with no rows are omitted.</returns>
|
||||
Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff,
|
||||
DateTime intervalSince,
|
||||
CancellationToken ct = default);
|
||||
}
|
||||
|
||||
@@ -164,3 +164,24 @@ public sealed record PerSiteSiteCallKpiResponse(
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
IReadOnlyList<SiteCallSiteKpiSnapshot> Sites);
|
||||
|
||||
/// <summary>
|
||||
/// Site Calls UI -> Central: request for the per-node <c>SiteCalls</c>
|
||||
/// KPI breakdown. Mirrors <see cref="PerSiteSiteCallKpiRequest"/> but groups
|
||||
/// by <c>SourceNode</c> instead of <c>SourceSite</c>. Additive — does not
|
||||
/// change per-site behaviour.
|
||||
/// </summary>
|
||||
public sealed record PerNodeSiteCallKpiRequest(
|
||||
string CorrelationId);
|
||||
|
||||
/// <summary>
|
||||
/// Central -> Site Calls UI: per-node KPI breakdown for the Site Calls KPIs
|
||||
/// page. On a repository fault <see cref="Success"/> is <c>false</c>,
|
||||
/// <see cref="ErrorMessage"/> carries the cause, and <see cref="Nodes"/> is empty.
|
||||
/// Nodes with a <c>NULL</c> <c>SourceNode</c> are omitted.
|
||||
/// </summary>
|
||||
public sealed record PerNodeSiteCallKpiResponse(
|
||||
string CorrelationId,
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
IReadOnlyList<SiteCallNodeKpiSnapshot> Nodes);
|
||||
|
||||
@@ -83,3 +83,46 @@ public record RouteToSetAttributesResponse(
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
DateTimeOffset Timestamp);
|
||||
|
||||
/// <summary>
|
||||
/// Request to block until a remote instance attribute reaches a target value
|
||||
/// (spec §6 — <c>Route.To("inst").WaitForAttribute(name, targetValue, timeout)</c>).
|
||||
/// Value-equality ONLY across the wire: <see cref="TargetValueEncoded"/> carries the
|
||||
/// canonical <c>AttributeValueCodec</c>-encoded target; there is no predicate and no
|
||||
/// quality flag in the comparison. The site evaluates equality and either matches or
|
||||
/// times out.
|
||||
/// </summary>
|
||||
/// <param name="ParentExecutionId">
|
||||
/// Audit Log #23 (ParentExecutionId): mirrors <see cref="RouteToCallRequest.ParentExecutionId"/>.
|
||||
/// For an inbound-API-routed wait this is the inbound request's per-request execution id;
|
||||
/// future site-side audit emission for routed waits can stamp it as <c>ParentExecutionId</c>
|
||||
/// so the inbound→site execution-tree link survives the wait path. Additive trailing
|
||||
/// member — null for the Central UI sandbox path or for callers built before the field existed.
|
||||
/// </param>
|
||||
public record RouteToWaitForAttributeRequest(
|
||||
string CorrelationId,
|
||||
string InstanceUniqueName,
|
||||
string AttributeName,
|
||||
string? TargetValueEncoded,
|
||||
TimeSpan Timeout,
|
||||
DateTimeOffset Timestamp,
|
||||
Guid? ParentExecutionId = null);
|
||||
|
||||
/// <summary>
|
||||
/// Response from a remote attribute wait. <see cref="Success"/>/<see cref="ErrorMessage"/>
|
||||
/// convey the routing-level outcome (e.g. instance-not-found); <see cref="Matched"/>,
|
||||
/// <see cref="TimedOut"/>, <see cref="Value"/>, and <see cref="Quality"/> convey the wait
|
||||
/// outcome itself. When <see cref="Success"/> is <c>true</c>, exactly one of
|
||||
/// <see cref="Matched"/>/<see cref="TimedOut"/> holds: <see cref="Matched"/> means the
|
||||
/// attribute reached the target value (with <see cref="Value"/>/<see cref="Quality"/>
|
||||
/// captured at the match), <see cref="TimedOut"/> means the deadline elapsed first.
|
||||
/// </summary>
|
||||
public record RouteToWaitForAttributeResponse(
|
||||
string CorrelationId,
|
||||
bool Matched,
|
||||
object? Value,
|
||||
string? Quality,
|
||||
bool TimedOut,
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
DateTimeOffset Timestamp);
|
||||
|
||||
@@ -0,0 +1,82 @@
|
||||
namespace ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
|
||||
|
||||
/// <summary>
|
||||
/// Request to wait, event-driven, until an attribute reaches a value (or any
|
||||
/// value satisfying a predicate), bounded by a timeout — the backing protocol for
|
||||
/// the script-facing <c>Attributes.WaitAsync</c> helper.
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Site-local only.</b> The optional <see cref="Predicate"/> is a non-serializable
|
||||
/// in-process delegate, so this message MUST flow only within a single site node's
|
||||
/// actor system (script execution → Instance Actor). It is never sent across the
|
||||
/// ClusterClient / gRPC boundary. The value-equality form (<see cref="TargetValueEncoded"/>)
|
||||
/// would serialize, but the routed/inbound variant is deliberately out of scope here.
|
||||
/// </para>
|
||||
/// </summary>
|
||||
/// <param name="CorrelationId">Per-wait correlation id; keys the waiter registry and the timeout self-message.</param>
|
||||
/// <param name="InstanceName">The instance this wait targets.</param>
|
||||
/// <param name="AttributeName">The attribute to watch — already scope-resolved by the accessor.</param>
|
||||
/// <param name="TargetValueEncoded">
|
||||
/// The codec-encoded target value (<c>AttributeValueCodec.Encode(target)</c>). A
|
||||
/// match compares the codec-encoded form of the current value against this string.
|
||||
/// When both this and <see cref="Predicate"/> are null the wait matches on ANY change.
|
||||
/// </param>
|
||||
/// <param name="Predicate">
|
||||
/// Site-local predicate tested against the raw (decoded) current value. Mutually
|
||||
/// exclusive with <see cref="TargetValueEncoded"/> — null when the encoded target is used.
|
||||
/// </param>
|
||||
/// <param name="Timeout">How long to wait before self-evicting with a timeout reply.</param>
|
||||
/// <param name="OccurredAtUtc">When the request was issued (UTC).</param>
|
||||
/// <param name="RequireGoodQuality">
|
||||
/// Quality-gated ("Good"-only) mode (spec §4.2): when <see langword="true"/>, a
|
||||
/// match additionally requires the attribute quality to be exactly
|
||||
/// <c>"Good"</c> (<see cref="System.StringComparison.Ordinal"/>) — a value that
|
||||
/// reaches the target / satisfies the predicate at Bad/Uncertain quality is NOT a
|
||||
/// match and the waiter stays pending until the value satisfies the test at Good
|
||||
/// quality (or times out). Defaults to <see langword="false"/> (quality-agnostic:
|
||||
/// the match tests the value only). Trailing/defaulted so existing positional
|
||||
/// constructions compile unchanged.
|
||||
/// </param>
|
||||
public record WaitForAttributeRequest(
|
||||
string CorrelationId,
|
||||
string InstanceName,
|
||||
string AttributeName,
|
||||
string? TargetValueEncoded,
|
||||
Func<object?, bool>? Predicate,
|
||||
TimeSpan Timeout,
|
||||
DateTimeOffset OccurredAtUtc,
|
||||
bool RequireGoodQuality = false);
|
||||
|
||||
/// <summary>
|
||||
/// Reply to a <see cref="WaitForAttributeRequest"/>. Exactly one of
|
||||
/// <see cref="Matched"/> / <see cref="TimedOut"/> is set on the happy paths;
|
||||
/// <see cref="ErrorMessage"/> is populated on the failure paths (per-instance
|
||||
/// waiter cap exceeded, or the match predicate threw).
|
||||
/// </summary>
|
||||
/// <param name="CorrelationId">Echoes the request's correlation id.</param>
|
||||
/// <param name="Matched">True when the attribute reached the target/predicate within the timeout.</param>
|
||||
/// <param name="Value">The matched value (null on timeout / error).</param>
|
||||
/// <param name="Quality">
|
||||
/// The attribute quality at match time; <see langword="null"/> on the non-match
|
||||
/// paths (timeout / error / cap-exceeded), matching the nullable
|
||||
/// <see cref="ErrorMessage"/> convention.
|
||||
/// </param>
|
||||
/// <param name="TimedOut">True when the timeout fired before a match.</param>
|
||||
/// <param name="ErrorMessage">
|
||||
/// Non-null only when the wait failed/refused — the per-instance waiter cap was
|
||||
/// exceeded, or the match predicate threw (<c>"Wait predicate threw: …"</c>).
|
||||
/// </param>
|
||||
public record WaitForAttributeResponse(
|
||||
string CorrelationId,
|
||||
bool Matched,
|
||||
object? Value,
|
||||
string? Quality,
|
||||
bool TimedOut,
|
||||
string? ErrorMessage = null);
|
||||
|
||||
/// <summary>
|
||||
/// Internal self-message scheduled by the Instance Actor to fire a waiter's
|
||||
/// timeout. Site-local only; never crosses a cluster boundary.
|
||||
/// </summary>
|
||||
/// <param name="CorrelationId">The waiter whose timeout fired.</param>
|
||||
public record WaitForAttributeTimeout(string CorrelationId);
|
||||
@@ -159,3 +159,23 @@ public record PerSiteNotificationKpiResponse(
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
IReadOnlyList<SiteNotificationKpiSnapshot> Sites);
|
||||
|
||||
/// <summary>
|
||||
/// Outbox UI -> Central: request for the per-node notification outbox KPI breakdown.
|
||||
/// Mirrors <see cref="PerSiteNotificationKpiRequest"/> but groups by <c>SourceNode</c>
|
||||
/// instead of <c>SourceSiteId</c>. Additive — does not change per-site behaviour.
|
||||
/// </summary>
|
||||
public record PerNodeNotificationKpiRequest(
|
||||
string CorrelationId);
|
||||
|
||||
/// <summary>
|
||||
/// Central -> Outbox UI: per-node KPI breakdown for the Notification KPIs page.
|
||||
/// On a repository fault <see cref="Success"/> is <c>false</c>, <see cref="ErrorMessage"/>
|
||||
/// carries the cause, and <see cref="Nodes"/> is empty. Nodes with a <c>NULL</c>
|
||||
/// <c>SourceNode</c> are omitted.
|
||||
/// </summary>
|
||||
public record PerNodeNotificationKpiResponse(
|
||||
string CorrelationId,
|
||||
bool Success,
|
||||
string? ErrorMessage,
|
||||
IReadOnlyList<NodeNotificationKpiSnapshot> Nodes);
|
||||
|
||||
@@ -0,0 +1,37 @@
|
||||
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;

/// <summary>
/// Point-in-time <c>SiteCalls</c> metrics scoped to a single originating node. The
/// per-node counterpart of <see cref="SiteCallSiteKpiSnapshot"/>; surfaced in the
/// per-node breakdown table on the Site Calls KPIs page. Mirrors
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.Notifications.NodeNotificationKpiSnapshot"/>.
/// </summary>
/// <param name="SourceNode">
/// The node identifier these metrics are scoped to (e.g. <c>node-a</c>,
/// <c>node-b</c>). Rows with a <c>NULL</c> <c>SourceNode</c> are omitted.
/// </param>
/// <param name="BufferedCount">Count of this node's non-terminal rows (<c>TerminalAtUtc IS NULL</c>).</param>
/// <param name="ParkedCount">Count of this node's rows in the <c>Parked</c> status.</param>
/// <param name="FailedLastInterval">
/// Count of this node's <c>Failed</c> rows whose <c>TerminalAtUtc</c> is at or
/// after the "since" timestamp.
/// </param>
/// <param name="DeliveredLastInterval">
/// Count of this node's <c>Delivered</c> rows whose <c>TerminalAtUtc</c> is at
/// or after the "since" timestamp.
/// </param>
/// <param name="OldestPendingAge">
/// Age of this node's oldest non-terminal row, or <c>null</c> when it has none.
/// </param>
/// <param name="StuckCount">
/// Count of this node's non-terminal rows whose <c>CreatedAtUtc</c> is older
/// than the stuck cutoff.
/// </param>
public sealed record SiteCallNodeKpiSnapshot(
    string SourceNode,
    int BufferedCount,
    int ParkedCount,
    int FailedLastInterval,
    int DeliveredLastInterval,
    TimeSpan? OldestPendingAge,
    int StuckCount);
|
||||
@@ -0,0 +1,30 @@
|
||||
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.Notifications;

/// <summary>
/// Point-in-time notification-outbox metrics scoped to a single originating node.
/// The per-node counterpart of <see cref="SiteNotificationKpiSnapshot"/>; surfaced
/// in the per-node breakdown table on the Notification KPIs page.
/// </summary>
/// <param name="SourceNode">
/// The node identifier these metrics are scoped to (e.g. <c>node-a</c>,
/// <c>node-b</c>). Rows with a <c>NULL</c> <c>SourceNode</c> are omitted.
/// </param>
/// <param name="QueueDepth">Count of this node's non-terminal rows (Pending + Retrying).</param>
/// <param name="StuckCount">
/// Count of this node's non-terminal rows whose <c>CreatedAt</c> is older than the stuck cutoff.
/// </param>
/// <param name="ParkedCount">Count of this node's rows in the Parked status.</param>
/// <param name="DeliveredLastInterval">
/// Count of this node's Delivered rows whose <c>DeliveredAt</c> is at or after the
/// "delivered since" timestamp.
/// </param>
/// <param name="OldestPendingAge">
/// Age of this node's oldest non-terminal row, or <c>null</c> when it has none.
/// </param>
public record NodeNotificationKpiSnapshot(
    string SourceNode,
    int QueueDepth,
    int StuckCount,
    int ParkedCount,
    int DeliveredLastInterval,
    TimeSpan? OldestPendingAge);
|
||||
@@ -0,0 +1,21 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Types;

/// <summary>
/// Rich result of an <c>Attributes.WaitForAsync</c> wait (spec §3) — the full
/// outcome of waiting for an attribute to reach a value / satisfy a predicate /
/// change at all, bounded by a timeout. The <c>Attributes.WaitAsync</c> helpers
/// surface only <see cref="Matched"/>; <c>WaitForAsync</c> returns this struct so
/// a script can also read the matched <see cref="Value"/>, its <see cref="Quality"/>,
/// and distinguish a genuine timeout (<see cref="TimedOut"/>) from a non-match.
/// </summary>
/// <param name="Matched">
/// <see langword="true"/> when the attribute reached the target / satisfied the
/// predicate within the timeout (and, in quality-gated mode, at "Good" quality).
/// </param>
/// <param name="Value">The matched value; <see langword="null"/> on timeout / error.</param>
/// <param name="Quality">
/// The attribute quality at match time; <see langword="null"/> on the non-match
/// paths (timeout / error / cap-exceeded).
/// </param>
/// <param name="TimedOut"><see langword="true"/> when the timeout fired before a match.</param>
public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);
|
||||
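Reviewer's sketch, not part of the commit: one way a caller might branch on the three outcomes that the `WaitResult` doc comment distinguishes. The record is repeated locally only so the sketch compiles on its own; `WaitResultSketch` and `Describe` are hypothetical names, and treating "neither matched nor timed out" as the error / cap-exceeded path is an assumption drawn from the `Quality` parameter comment.

```csharp
#nullable enable
using System;

// Local copy of the record from the hunk above, repeated so this sketch is self-contained.
public readonly record struct WaitResult(bool Matched, object? Value, string? Quality, bool TimedOut);

public static class WaitResultSketch
{
    // Hypothetical helper: maps a WaitResult onto a human-readable outcome.
    public static string Describe(WaitResult r)
    {
        if (r.Matched)
        {
            return $"matched: value={r.Value}, quality={r.Quality}";
        }

        if (r.TimedOut)
        {
            return "timed out before a match";
        }

        // Assumption: a non-match without a timeout is the error / cap-exceeded path.
        return "no match without a timeout (error or cap exceeded)";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new WaitResult(true, 42, "Good", false)));
        Console.WriteLine(Describe(new WaitResult(false, null, null, true)));
        Console.WriteLine(Describe(new WaitResult(false, null, null, false)));
    }
}
```

Checking `Matched` first keeps the happy path cheap to read; `TimedOut` only matters once a match has been ruled out.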
@@ -144,6 +144,7 @@ public class SiteCommunicationActor : ReceiveActor, IWithTimers
        Receive<RouteToCallRequest>(msg => _deploymentManagerProxy.Forward(msg));
        Receive<RouteToGetAttributesRequest>(msg => _deploymentManagerProxy.Forward(msg));
        Receive<RouteToSetAttributesRequest>(msg => _deploymentManagerProxy.Forward(msg));
        Receive<RouteToWaitForAttributeRequest>(msg => _deploymentManagerProxy.Forward(msg));

        // OPC UA Tag Browser (interactive design-time query) — forward to the
        // Deployment Manager singleton, which always lands on the active site
@@ -445,6 +445,25 @@ public class CommunicationService
            envelope, _options.IntegrationTimeout, cancellationToken);
    }

    /// <summary>
    /// Routes an inbound API wait-for-attribute request to a site (spec §6).
    /// </summary>
    /// <param name="siteId">The target site identifier.</param>
    /// <param name="request">The wait-for-attribute route request.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The wait-for-attribute route response.</returns>
    public async Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
        string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken = default)
    {
        var envelope = new SiteEnvelope(siteId, request);
        // A wait legitimately blocks up to request.Timeout on the site, so the cluster
        // Ask must be bounded by the WAIT deadline (plus integration-timeout slack for
        // the round trip), not the generic IntegrationTimeout used by the other routes.
        var askTimeout = request.Timeout + _options.IntegrationTimeout;
        return await GetActor().Ask<RouteToWaitForAttributeResponse>(
            envelope, askTimeout, cancellationToken);
    }

    // ── Notification Outbox (central-local actor — Asked directly, no SiteEnvelope) ──

    /// <summary>
|
||||
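Reviewer's sketch, not part of the commit: the Ask bound in `RouteToWaitForAttributeAsync` is the wait deadline plus the integration timeout as round-trip slack. The arithmetic below uses assumed values (a 30-second integration timeout and a 2-minute wait) to show why the generic timeout alone would cut a legitimate wait short; `AskTimeoutSketch` and `ComputeAskTimeout` are hypothetical names.

```csharp
using System;

public static class AskTimeoutSketch
{
    // Bound for the cluster Ask: the wait deadline plus round-trip slack.
    public static TimeSpan ComputeAskTimeout(TimeSpan waitTimeout, TimeSpan integrationTimeout)
        => waitTimeout + integrationTimeout;

    public static void Main()
    {
        var integrationTimeout = TimeSpan.FromSeconds(30); // assumed configuration value
        var waitTimeout = TimeSpan.FromMinutes(2);         // assumed request.Timeout

        var askTimeout = ComputeAskTimeout(waitTimeout, integrationTimeout);

        Console.WriteLine(askTimeout);                       // 00:02:30
        Console.WriteLine(integrationTimeout < waitTimeout); // True: the generic bound would expire first
    }
}
```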
@@ -525,6 +544,22 @@ public class CommunicationService
            request, _options.QueryTimeout, cancellationToken);
    }

    /// <summary>
    /// Gets per-node KPI metrics for the notification outbox.
    /// Groups by <c>SourceNode</c> (e.g. <c>node-a</c>/<c>node-b</c>); rows with
    /// a <c>NULL</c> node are omitted. Additive alongside
    /// <see cref="GetPerSiteNotificationKpisAsync"/>.
    /// </summary>
    /// <param name="request">The per-node notification KPI request.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The per-node notification KPI response.</returns>
    public async Task<PerNodeNotificationKpiResponse> GetPerNodeNotificationKpisAsync(
        PerNodeNotificationKpiRequest request, CancellationToken cancellationToken = default)
    {
        return await GetNotificationOutbox().Ask<PerNodeNotificationKpiResponse>(
            request, _options.QueryTimeout, cancellationToken);
    }

    // ── Site Call Audit (central-local actor — Asked directly, no SiteEnvelope) ──

    /// <summary>
@@ -579,6 +614,21 @@ public class CommunicationService
            request, _options.QueryTimeout, cancellationToken);
    }

    /// <summary>
    /// Gets per-node KPI metrics for site calls. Groups by <c>SourceNode</c>
    /// (e.g. <c>node-a</c>/<c>node-b</c>); rows with a <c>NULL</c> node are
    /// omitted. Additive alongside <see cref="GetPerSiteSiteCallKpisAsync"/>.
    /// </summary>
    /// <param name="request">The per-node site call KPI request.</param>
    /// <param name="cancellationToken">Cancellation token.</param>
    /// <returns>The per-node site call KPI response.</returns>
    public async Task<PerNodeSiteCallKpiResponse> GetPerNodeSiteCallKpisAsync(
        PerNodeSiteCallKpiRequest request, CancellationToken cancellationToken = default)
    {
        return await GetSiteCallAudit().Ask<PerNodeSiteCallKpiResponse>(
            request, _options.QueryTimeout, cancellationToken);
    }

    /// <summary>
    /// Task 5 (#22): relays an operator Retry of a parked cached call to its
    /// owning site. The <c>SiteCallAuditActor</c> is Asked directly (it is
@@ -370,6 +370,99 @@ VALUES
        return rowsDeleted;
    }

    /// <inheritdoc />
    public async Task<long> PurgeChannelOlderThanAsync(
        string channel,
        DateTime threshold,
        int batchSize,
        CancellationToken ct = default)
    {
        if (string.IsNullOrWhiteSpace(channel))
        {
            throw new ArgumentException("Channel must be a non-empty channel name.", nameof(channel));
        }

        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize), batchSize, "Batch size must be > 0.");
        }

        var thresholdUtc = DateTime.SpecifyKind(threshold.ToUniversalTime(), DateTimeKind.Utc);

        // M5.5 (T3) per-channel retention override purge. This is the ONLY DELETE
        // against dbo.AuditLog in the codebase and it runs on the purge/maintenance
        // path, NOT the append-only writer role (which has INSERT + SELECT only — see
        // the DENY UPDATE/DENY DELETE grants in CollapseAuditLogToCanonical). The
        // AuditLog append-only CI guard (AuditLogAppendOnlyGuardTests) is intentionally
        // widened to allow ONLY the single marked DELETE below; any other UPDATE/DELETE
        // targeting AuditLog still trips the guard.
        //
        // Bounded + idempotent: DELETE TOP (@batch) caps the log/lock footprint per
        // statement; the loop repeats until a batch deletes zero rows, so re-running
        // after a crash mid-loop simply resumes. Category is the canonical
        // channel-name column (e.g. 'ApiOutbound'); Action holds "{channel}.{kind}" so
        // it is NOT the right column to match a bare channel name against.
        //
        // The trailing AUDIT-PURGE-ALLOWED marker on the DELETE line below is the
        // single narrow exemption the append-only CI guard (AuditLogAppendOnlyGuardTests)
        // recognizes; any other UPDATE/DELETE targeting AuditLog still trips the guard.
        const string deleteBatchSql =
            "DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;"; // AUDIT-PURGE-ALLOWED: per-channel retention override (M5.5 T3), maintenance path

        long totalDeleted = 0;

        var conn = _context.Database.GetDbConnection();
        var openedHere = false;
        if (conn.State != System.Data.ConnectionState.Open)
        {
            await conn.OpenAsync(ct).ConfigureAwait(false);
            openedHere = true;
        }

        try
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();

                await using var cmd = conn.CreateCommand();
                cmd.CommandText = deleteBatchSql;

                var pBatch = cmd.CreateParameter();
                pBatch.ParameterName = "@batch";
                pBatch.Value = batchSize;
                cmd.Parameters.Add(pBatch);

                var pChannel = cmd.CreateParameter();
                pChannel.ParameterName = "@channel";
                pChannel.Value = channel;
                cmd.Parameters.Add(pChannel);

                var pThreshold = cmd.CreateParameter();
                pThreshold.ParameterName = "@threshold";
                pThreshold.Value = thresholdUtc;
                cmd.Parameters.Add(pThreshold);

                var rows = await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
                if (rows <= 0)
                {
                    break;
                }

                totalDeleted += rows;
            }
        }
        finally
        {
            if (openedHere)
            {
                await conn.CloseAsync().ConfigureAwait(false);
            }
        }

        return totalDeleted;
    }

    /// <inheritdoc />
    public async Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
        DateTime threshold,
|
||||
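Reviewer's sketch, not part of the commit: the "bounded + idempotent" loop in `PurgeChannelOlderThanAsync` repeats a capped batch until a batch affects zero rows, so a second run after completion (or after a crash) does no extra work. The version below replaces the SQL `DELETE TOP (@batch)` with an in-memory list so it runs without a database; `BatchedPurgeSketch`, `RunUntilEmpty`, and `DeleteTop` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedPurgeSketch
{
    // Repeats a bounded batch operation until a batch affects zero rows,
    // returning the total number of rows affected.
    public static long RunUntilEmpty(Func<int, int> runBatch, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize), batchSize, "Batch size must be > 0.");
        }

        long total = 0;
        while (true)
        {
            var rows = runBatch(batchSize);
            if (rows <= 0)
            {
                break;
            }

            total += rows;
        }

        return total;
    }

    public static void Main()
    {
        // In-memory stand-in for the rows older than the threshold (12 of them).
        var rows = new List<int>(new int[12]);

        // Stand-in for DELETE TOP (@batch): removes at most n rows, returns how many.
        int DeleteTop(int n)
        {
            var count = Math.Min(n, rows.Count);
            rows.RemoveRange(0, count);
            return count;
        }

        Console.WriteLine(RunUntilEmpty(DeleteTop, 5)); // 12 (batches of 5, 5, 2, then 0)
        Console.WriteLine(RunUntilEmpty(DeleteTop, 5)); // 0 (re-running is a no-op)
    }
}
```

The same shape applies to the `UPDATE TOP (@batch)` backfill later in this file: once no row matches the predicate, the loop exits on its first pass.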
@@ -716,6 +809,102 @@ VALUES
            .ToListAsync(ct);
    }

    /// <inheritdoc />
    public async Task<long> BackfillSourceNodeAsync(
        string sentinel,
        DateTime before,
        int batchSize,
        CancellationToken ct = default)
    {
        if (string.IsNullOrWhiteSpace(sentinel))
        {
            throw new ArgumentException("Sentinel must be a non-empty value.", nameof(sentinel));
        }

        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize), batchSize, "Batch size must be > 0.");
        }

        var beforeUtc = DateTime.SpecifyKind(before.ToUniversalTime(), DateTimeKind.Utc);

        // M5.6 (T5) SourceNode sentinel backfill. This is the ONE sanctioned UPDATE
        // against dbo.AuditLog in the codebase. It touches ONLY rows where
        // SourceNode IS NULL AND OccurredAtUtc < @before — rows that pre-date the
        // M5.6 feature and whose node-of-origin is UNKNOWABLE. The sentinel (default
        // "unknown") makes that explicit. ExecutionId/ParentExecutionId are PERSISTED
        // COMPUTED columns derived from DetailsJson — mutating DetailsJson is forbidden
        // under the append-only invariant, so those stay NULL on pre-feature rows.
        //
        // Maintenance path (NOT the writer role): runs on the same connection used for
        // SwitchOutPartitionAsync (partition-switch DDL), which requires a role that
        // holds UPDATE — the append-only scadabridge_audit_writer role has only
        // INSERT + SELECT.
        //
        // Bounded + idempotent: UPDATE TOP (@batch) caps the log/lock footprint per
        // statement; the loop exits when a batch updates 0 rows. Re-running after a
        // crash simply resumes where it left off.
        //
        // The trailing AUDIT-PURGE-ALLOWED marker on the UPDATE line below is the
        // single narrow exemption the append-only CI guard (AuditLogAppendOnlyGuardTests)
        // recognizes for an UPDATE; any other UPDATE targeting AuditLog still trips the guard.
        const string updateBatchSql =
            "UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;"; // AUDIT-PURGE-ALLOWED: SourceNode sentinel backfill (M5.6 T5), maintenance path

        long totalUpdated = 0;

        var conn = _context.Database.GetDbConnection();
        var openedHere = false;
        if (conn.State != System.Data.ConnectionState.Open)
        {
            await conn.OpenAsync(ct).ConfigureAwait(false);
            openedHere = true;
        }

        try
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();

                await using var cmd = conn.CreateCommand();
                cmd.CommandText = updateBatchSql;

                var pBatch = cmd.CreateParameter();
                pBatch.ParameterName = "@batch";
                pBatch.Value = batchSize;
                cmd.Parameters.Add(pBatch);

                var pSentinel = cmd.CreateParameter();
                pSentinel.ParameterName = "@sentinel";
                pSentinel.Value = sentinel;
                cmd.Parameters.Add(pSentinel);

                var pBefore = cmd.CreateParameter();
                pBefore.ParameterName = "@before";
                pBefore.Value = beforeUtc;
                cmd.Parameters.Add(pBefore);

                var rows = await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
                if (rows <= 0)
                {
                    break;
                }

                totalUpdated += rows;
            }
        }
        finally
        {
            if (openedHere)
            {
                await conn.CloseAsync().ConfigureAwait(false);
            }
        }

        return totalUpdated;
    }

    /// <summary>
    /// Splits a <c>STRING_AGG</c> comma-joined value into a distinct, ordered
    /// list. A null/empty aggregate (a stub node with no rows) yields an empty
@@ -300,6 +300,63 @@ VALUES
                : null)).ToList();
    }

    /// <inheritdoc />
    public async Task<IReadOnlyList<NodeNotificationKpiSnapshot>> ComputePerNodeKpisAsync(
        DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince, CancellationToken cancellationToken = default)
    {
        var now = DateTimeOffset.UtcNow;

        // Exclude rows with NULL SourceNode (legacy / unstamped) — per-node KPIs
        // are only meaningful when the node identity is known.
        var queueDepth = await CountByNodeAsync(
            n => (n.Status == NotificationStatus.Pending || n.Status == NotificationStatus.Retrying)
                && n.SourceNode != null,
            cancellationToken);

        var stuck = await CountByNodeAsync(
            n => (n.Status == NotificationStatus.Pending || n.Status == NotificationStatus.Retrying)
                && n.CreatedAt < stuckCutoff
                && n.SourceNode != null,
            cancellationToken);

        var parked = await CountByNodeAsync(
            n => n.Status == NotificationStatus.Parked && n.SourceNode != null,
            cancellationToken);

        var delivered = await CountByNodeAsync(
            n => n.Status == NotificationStatus.Delivered
                && n.DeliveredAt != null && n.DeliveredAt >= deliveredSince
                && n.SourceNode != null,
            cancellationToken);

        // Oldest non-terminal CreatedAt per node — same in-memory reduction
        // pattern as ComputePerSiteKpisAsync (DateTimeOffset converter makes
        // a SQL Min awkward).
        var oldest = (await _context.Notifications
                .Where(n => (n.Status == NotificationStatus.Pending
                        || n.Status == NotificationStatus.Retrying)
                    && n.SourceNode != null)
                .Select(n => new { n.SourceNode, n.CreatedAt })
                .ToListAsync(cancellationToken))
            .GroupBy(x => x.SourceNode!)
            .ToDictionary(g => g.Key, g => g.Min(x => x.CreatedAt));

        var nodeNames = queueDepth.Keys
            .Concat(stuck.Keys).Concat(parked.Keys).Concat(delivered.Keys)
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal);

        return nodeNames.Select(node => new NodeNotificationKpiSnapshot(
            SourceNode: node,
            QueueDepth: queueDepth.GetValueOrDefault(node),
            StuckCount: stuck.GetValueOrDefault(node),
            ParkedCount: parked.GetValueOrDefault(node),
            DeliveredLastInterval: delivered.GetValueOrDefault(node),
            OldestPendingAge: oldest.TryGetValue(node, out var createdAt)
                ? now - createdAt
                : null)).ToList();
    }

    /// <summary>Counts notification rows matching <paramref name="predicate"/>, grouped by source site.</summary>
    private async Task<Dictionary<string, int>> CountBySiteAsync(
        System.Linq.Expressions.Expression<Func<Notification, bool>> predicate,
|
||||
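Reviewer's sketch, not part of the commit: `ComputePerNodeKpisAsync` runs one grouped count per metric and then merges the resulting dictionaries over the union of their keys, so a node that appears in only one metric still gets a row with zeros elsewhere. The reduced version below shows that merge with two metrics and plain in-memory dictionaries; `PerNodeMergeSketch`, `NodeRow`, and `Merge` are hypothetical names, and the real snapshot carries more fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PerNodeMergeSketch
{
    // Hypothetical two-metric row standing in for the full KPI snapshot.
    public sealed record NodeRow(string Node, int QueueDepth, int ParkedCount);

    // Unions the node names of both metrics and fills missing counts with 0.
    public static List<NodeRow> Merge(
        Dictionary<string, int> queueDepth,
        Dictionary<string, int> parked)
    {
        return queueDepth.Keys
            .Concat(parked.Keys)
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal)
            .Select(n => new NodeRow(n, queueDepth.GetValueOrDefault(n), parked.GetValueOrDefault(n)))
            .ToList();
    }

    public static void Main()
    {
        var queueDepth = new Dictionary<string, int> { ["node-b"] = 3 };
        var parked = new Dictionary<string, int> { ["node-a"] = 1, ["node-b"] = 2 };

        foreach (var row in Merge(queueDepth, parked))
        {
            Console.WriteLine($"{row.Node}: queue={row.QueueDepth}, parked={row.ParkedCount}");
        }
        // node-a: queue=0, parked=1
        // node-b: queue=3, parked=2
    }
}
```

Ordering with `StringComparer.Ordinal` keeps the row order stable regardless of the server's culture settings.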
@@ -312,6 +369,22 @@ VALUES
            .ToDictionaryAsync(x => x.Site, x => x.Count, cancellationToken);
    }

    /// <summary>
    /// Counts notification rows matching <paramref name="predicate"/>, grouped by source node.
    /// Only rows with a non-null <c>SourceNode</c> should be included; the predicate is
    /// responsible for enforcing that guard.
    /// </summary>
    private async Task<Dictionary<string, int>> CountByNodeAsync(
        System.Linq.Expressions.Expression<Func<Notification, bool>> predicate,
        CancellationToken cancellationToken)
    {
        return await _context.Notifications
            .Where(predicate)
            .GroupBy(n => n.SourceNode!)
            .Select(g => new { Node = g.Key, Count = g.Count() })
            .ToDictionaryAsync(x => x.Node, x => x.Count, cancellationToken);
    }

    /// <inheritdoc />
    public async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
        => await _context.SaveChangesAsync(cancellationToken);
@@ -324,6 +324,61 @@ ORDER BY CreatedAtUtc DESC, TrackedOperationId DESC;";
            StuckCount: stuck.GetValueOrDefault(site))).ToList();
    }

    /// <inheritdoc />
    public async Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
        DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default)
    {
        var now = DateTime.UtcNow;

        // Exclude rows with NULL SourceNode — per-node KPIs are only meaningful
        // when the node identity is known. Each predicate guards n.SourceNode != null
        // so the GROUP BY key is always non-null.
        var buffered = await CountByNodeAsync(
            s => s.TerminalAtUtc == null && s.SourceNode != null, ct);

        var parked = await CountByNodeAsync(
            s => s.Status == StatusParked && s.SourceNode != null, ct);

        var failed = await CountByNodeAsync(
            s => s.Status == StatusFailed
                && s.TerminalAtUtc != null && s.TerminalAtUtc >= intervalSince
                && s.SourceNode != null, ct);

        var delivered = await CountByNodeAsync(
            s => s.Status == StatusDelivered
                && s.TerminalAtUtc != null && s.TerminalAtUtc >= intervalSince
                && s.SourceNode != null, ct);

        var stuck = await CountByNodeAsync(
            s => s.TerminalAtUtc == null && s.CreatedAtUtc < stuckCutoff
                && s.SourceNode != null, ct);

        // Oldest non-terminal CreatedAtUtc per node — server-side GROUP BY MIN.
        var oldest = (await _context.SiteCalls
                .Where(s => s.TerminalAtUtc == null && s.SourceNode != null)
                .GroupBy(s => s.SourceNode!)
                .Select(g => new { Node = g.Key, Oldest = g.Min(s => s.CreatedAtUtc) })
                .ToListAsync(ct))
            .ToDictionary(x => x.Node, x => x.Oldest);

        var nodeNames = buffered.Keys
            .Concat(parked.Keys).Concat(failed.Keys)
            .Concat(delivered.Keys).Concat(stuck.Keys)
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal);

        return nodeNames.Select(node => new SiteCallNodeKpiSnapshot(
            SourceNode: node,
            BufferedCount: buffered.GetValueOrDefault(node),
            ParkedCount: parked.GetValueOrDefault(node),
            FailedLastInterval: failed.GetValueOrDefault(node),
            DeliveredLastInterval: delivered.GetValueOrDefault(node),
            OldestPendingAge: oldest.TryGetValue(node, out var createdAt)
                ? now - createdAt
                : null,
            StuckCount: stuck.GetValueOrDefault(node))).ToList();
    }

    /// <summary>Counts <c>SiteCalls</c> rows matching <paramref name="predicate"/>, grouped by source site.</summary>
    private async Task<Dictionary<string, int>> CountBySiteAsync(
        System.Linq.Expressions.Expression<Func<SiteCall, bool>> predicate,
@@ -336,6 +391,22 @@ ORDER BY CreatedAtUtc DESC, TrackedOperationId DESC;";
            .ToDictionaryAsync(x => x.Site, x => x.Count, ct);
    }

    /// <summary>
    /// Counts <c>SiteCalls</c> rows matching <paramref name="predicate"/>, grouped by source node.
    /// Only rows with a non-null <c>SourceNode</c> should be included; the predicate is
    /// responsible for enforcing that guard.
    /// </summary>
    private async Task<Dictionary<string, int>> CountByNodeAsync(
        System.Linq.Expressions.Expression<Func<SiteCall, bool>> predicate,
        CancellationToken ct)
    {
        return await _context.SiteCalls
            .Where(predicate)
            .GroupBy(s => s.SourceNode!)
            .Select(g => new { Node = g.Key, Count = g.Count() })
            .ToDictionaryAsync(x => x.Node, x => x.Count, ct);
    }

    private static int GetRankOrThrow(string status)
    {
        if (!StatusRank.TryGetValue(status, out var rank))
@@ -35,4 +35,9 @@ public sealed class CommunicationServiceInstanceRouter : IInstanceRouter
    public Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
        string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken) =>
        _communicationService.RouteToSetAttributesAsync(siteId, request, cancellationToken);

    /// <inheritdoc />
    public Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
        string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken) =>
        _communicationService.RouteToWaitForAttributeAsync(siteId, request, cancellationToken);
}
@@ -34,4 +34,12 @@ public interface IInstanceRouter
    /// <returns>A task that resolves to the set-attributes response from the target site.</returns>
    Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
        string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken);

    /// <summary>Routes a wait-for-attribute request to the specified site (spec §6).</summary>
    /// <param name="siteId">Target site identifier.</param>
    /// <param name="request">The wait-for-attribute request to route (value-equality only).</param>
    /// <param name="cancellationToken">Cancellation token for the routed call.</param>
    /// <returns>A task that resolves to the wait-for-attribute response from the target site.</returns>
    Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
        string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken);
}
@@ -6,6 +6,7 @@ using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.ScadaBridge.AuditLog.Central;
using ZB.MOM.WW.ScadaBridge.AuditLog.Configuration;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
@@ -95,6 +96,7 @@ public sealed class AuditWriteMiddleware
    private readonly ILogger<AuditWriteMiddleware> _logger;
    private readonly IOptionsMonitor<AuditLogOptions> _options;
    private readonly IAuditActorAccessor? _actorAccessor;
    private readonly IAuditInboundCeilingHitsCounter _ceilingHitsCounter;

    /// <summary>
    /// Initializes the middleware with its required dependencies.
@@ -110,18 +112,26 @@ public sealed class AuditWriteMiddleware
    /// construct the middleware; when absent, actor resolution falls back to the
    /// stashed API-key name only.
    /// </param>
    /// <param name="ceilingHitsCounter">
    /// M5.3 (T7, optional): incremented whenever an inbound request or response
    /// body is truncated at <see cref="AuditLogOptions.InboundMaxBytes"/>. Optional
    /// so existing tests and composition roots without the central health snapshot
    /// wired still construct without the counter; a NoOp is used when absent.
    /// </param>
    public AuditWriteMiddleware(
        RequestDelegate next,
        ICentralAuditWriter auditWriter,
        ILogger<AuditWriteMiddleware> logger,
        IOptionsMonitor<AuditLogOptions> options,
        IAuditActorAccessor? actorAccessor = null)
        IAuditActorAccessor? actorAccessor = null,
        IAuditInboundCeilingHitsCounter? ceilingHitsCounter = null)
    {
        _next = next ?? throw new ArgumentNullException(nameof(next));
        _auditWriter = auditWriter ?? throw new ArgumentNullException(nameof(auditWriter));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        _options = options ?? throw new ArgumentNullException(nameof(options));
        _actorAccessor = actorAccessor;
        _ceilingHitsCounter = ceilingHitsCounter ?? new NoOpAuditInboundCeilingHitsCounter();
    }

    /// <summary>
@@ -133,9 +143,11 @@ public sealed class AuditWriteMiddleware
    {
        var sw = Stopwatch.StartNew();

        // Per-request hot read of the inbound cap so a live config change
        // Per-request hot read of the options snapshot so a live config change
        // picks up on the next request without re-resolving the singleton.
        var cap = _options.CurrentValue.InboundMaxBytes;
        // InboundMaxBytes is read once here and passed to the capture helpers.
        var opts = _options.CurrentValue;
        var cap = opts.InboundMaxBytes;

        // Audit Log #23 (ParentExecutionId): mint the inbound request's per-request
        // ExecutionId ONCE, here at the start of the request, and stash it on
@@ -163,9 +175,20 @@ public sealed class AuditWriteMiddleware
        // ReadBufferedRequestBodyAsync's own ContentLength is 0 short-circuit
        // returns (null, false) for the bodyless case anyway, so the audit row
        // is unchanged.
        //
        // M5.3 (T7): check if the matched method/target has SkipBodyCapture set.
        // The route value is resolved BEFORE the pipeline runs (route matching
        // has already bound {methodName} at this point), so we can skip the
        // EnableBuffering allocation and body read up front.
        var methodNameForOverride = ctx.Request.RouteValues.TryGetValue("methodName", out var rv)
            && rv is string mn && !string.IsNullOrWhiteSpace(mn) ? mn : null;
        var skipBody = methodNameForOverride != null
            && opts.PerTargetOverrides.TryGetValue(methodNameForOverride, out var perTarget)
            && perTarget.SkipBodyCapture;

        var requestBody = (string?)null;
        var requestTruncated = false;
        if (RequestHasBody(ctx.Request))
        if (!skipBody && RequestHasBody(ctx.Request))
        {
            ctx.Request.EnableBuffering();
            (requestBody, requestTruncated) =
@@ -200,15 +223,25 @@ public sealed class AuditWriteMiddleware
            // The forwarding wrapper has already written every byte to the
            // original sink; this just pulls back the bounded UTF-8 string.
            ctx.Response.Body = originalResponseBody;
            var (responseBody, responseTruncated) = captureStream.GetCapturedBody();
            var (capturedResponseBody, capturedResponseTruncated) = captureStream.GetCapturedBody();
            // M5.3 (T7): if SkipBodyCapture is set, discard the captured response
            // body (the request body was never captured above). The row + headers
            // still emit with null RequestSummary / ResponseSummary.
            // Truncation flags are also cleared so the ceiling-hit counter is not
            // bumped for methods that deliberately opt out of body capture.
            var responseBody = skipBody ? null : capturedResponseBody;
            var responseTruncated = skipBody ? false : capturedResponseTruncated;

            EmitInboundAudit(
                ctx,
                opts,
                sw.ElapsedMilliseconds,
                thrown,
                requestBody,
                responseBody,
                requestTruncated || responseTruncated);
                requestTruncated || responseTruncated,
                requestTruncated,
                responseTruncated);
        }
    }
@@ -219,11 +252,14 @@ public sealed class AuditWriteMiddleware
    /// </summary>
    private void EmitInboundAudit(
        HttpContext ctx,
        AuditLogOptions opts,
        long durationMs,
        Exception? thrown,
        string? requestBody,
        string? responseBody,
        bool payloadTruncated)
        bool payloadTruncated,
        bool requestTruncated = false,
        bool responseTruncated = false)
    {
        try
        {
@@ -243,10 +279,43 @@ public sealed class AuditWriteMiddleware
            var actor = isAuthFailure ? null : ResolveActor(ctx);
            var methodName = ResolveMethodName(ctx);

            // M5.3 (T7): increment the ceiling-hits counter once per request
            // that hit the cap on EITHER the request or response body.
            if (requestTruncated || responseTruncated)
            {
                try { _ceilingHitsCounter.Increment(); } catch { /* swallow per §7 */ }
            }

            // M5.3 (T7): capture request headers into Extra JSON alongside the
            // existing remoteIp / userAgent provenance fields. The header
            // collection is run through the SAME header-redaction list
            // (AuditLogOptions.HeaderRedactList) that the ScadaBridgeAuditRedactor
            // applies to RequestSummary / ResponseSummary — auth/sensitive
            // headers are redacted before they land in the row. Uses the SAME
            // options snapshot captured at request start (passed in as opts) as
            // the SkipBodyCapture / PerTargetOverrides decisions, so a mid-request
            // live-reload can't split the body-capture and header-redaction
            // verdicts across two different snapshots.
            var redactSet = new HashSet<string>(
                opts.HeaderRedactList,
                StringComparer.OrdinalIgnoreCase);

            var headerDict = new Dictionary<string, string>(StringComparer.Ordinal);
            foreach (var header in ctx.Request.Headers)
            {
                // Redact headers whose name appears in the HeaderRedactList —
                // the same "<redacted>" marker used by ScadaBridgeAuditRedactor.
                var value = redactSet.Contains(header.Key)
                    ? "<redacted>"
                    : header.Value.ToString();
                headerDict[header.Key] = value;
            }

            var extra = JsonSerializer.Serialize(new
            {
                remoteIp = ctx.Connection.RemoteIpAddress?.ToString(),
                userAgent = ctx.Request.Headers.UserAgent.ToString(),
                requestHeaders = headerDict,
            });

            var evt = ScadaBridgeAuditEventFactory.Create(
|
||||
|
||||
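Reviewer's sketch, not part of the commit: the header capture above looks each header name up in a case-insensitive set built from the redact list and substitutes the `<redacted>` marker on a hit. The standalone version below mirrors that lookup with plain string pairs instead of ASP.NET Core's header collection; `HeaderRedactionSketch` and `Redact` are hypothetical names, and the sample header names and values are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderRedactionSketch
{
    // Copies headers into a new dictionary, replacing the value of any header
    // whose name appears in redactList (compared case-insensitively).
    public static Dictionary<string, string> Redact(
        IEnumerable<KeyValuePair<string, string>> headers,
        IEnumerable<string> redactList)
    {
        var redactSet = new HashSet<string>(redactList, StringComparer.OrdinalIgnoreCase);
        var result = new Dictionary<string, string>(StringComparer.Ordinal);

        foreach (var header in headers)
        {
            result[header.Key] = redactSet.Contains(header.Key) ? "<redacted>" : header.Value;
        }

        return result;
    }

    public static void Main()
    {
        var headers = new Dictionary<string, string>
        {
            ["authorization"] = "Bearer abc123",
            ["Accept"] = "application/json",
        };

        var redacted = Redact(headers, new[] { "Authorization" });

        Console.WriteLine(redacted["authorization"]); // <redacted>
        Console.WriteLine(redacted["Accept"]);        // application/json
    }
}
```

Using `OrdinalIgnoreCase` for the lookup matters because HTTP header names are case-insensitive, so a client sending `authorization` in lower case is still redacted.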
@@ -205,6 +205,47 @@ public class RouteTarget
|
||||
return response.Values;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Blocks until a remote instance attribute reaches <paramref name="targetValue"/>
|
||||
/// or <paramref name="timeout"/> elapses (spec §6). Value-equality ONLY across the
|
||||
/// wire: the target is canonically encoded via <see cref="AttributeValueCodec"/> and
|
||||
/// the site evaluates equality — there is no predicate and no quality flag in the
|
||||
/// comparison.
|
||||
/// </summary>
|
||||
/// <param name="attributeName">Name of the attribute to wait on.</param>
|
||||
/// <param name="targetValue">Target value the attribute must equal for the wait to match.</param>
|
||||
/// <param name="timeout">Maximum time to wait for the attribute to reach the target value.</param>
|
||||
/// <param name="cancellationToken">Optional cancellation token; defaults to the method deadline.</param>
|
||||
/// <returns>A task that resolves to <c>true</c> if the attribute reached the target value, <c>false</c> if the wait timed out.</returns>
|
||||
public async Task<bool> WaitForAttribute(
|
||||
string attributeName,
|
||||
object? targetValue,
|
||||
TimeSpan timeout,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var token = Effective(cancellationToken);
|
||||
var siteId = await ResolveSiteAsync(token);
|
||||
|
||||
// Audit Log #23 (ParentExecutionId): mirrors the Call path — stamp the
|
||||
// spawning inbound request's ExecutionId so future site-side audit
|
||||
// emission for routed waits can record this wait's parent. CorrelationId
|
||||
// is the per-operation lifecycle id, freshly minted per routed wait.
|
||||
var request = new RouteToWaitForAttributeRequest(
|
||||
Guid.NewGuid().ToString(), _instanceCode, attributeName,
|
||||
AttributeValueCodec.Encode(targetValue), timeout, DateTimeOffset.UtcNow,
|
||||
_parentExecutionId);
|
||||
|
||||
var response = await _instanceRouter.RouteToWaitForAttributeAsync(siteId, request, token);
|
||||
|
||||
if (!response.Success)
|
||||
{
|
||||
throw new InvalidOperationException(
|
||||
response.ErrorMessage ?? "Remote attribute wait failed");
|
||||
}
|
||||
|
||||
return response.Matched;
|
||||
}
|
||||
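The value-equality contract in the summary above (encode the target once, then compare encodings) can be sketched roughly as follows. This is an illustration only: `CanonicalCodec` is a hypothetical stand-in for `AttributeValueCodec`, whose real wire format is not shown in this diff, and JSON is used purely as a convenient canonical form. The sketch also does not model the site-side rule that a null encoded target means "any change".

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for AttributeValueCodec. The real codec's encoding is
// not visible in this diff; JSON is an assumption made for illustration.
public static class CanonicalCodec
{
    public static string? Encode(object? value) =>
        value is null ? null : JsonSerializer.Serialize(value, value.GetType());
}

public static class WaitEquality
{
    // Equality is decided on the encoded strings, so the comparing side does
    // not need the attribute's declared data type to decode anything.
    public static bool Matches(object? current, string? encodedTarget) =>
        string.Equals(CanonicalCodec.Encode(current), encodedTarget, StringComparison.Ordinal);
}
```

Comparing encodings rather than decoded objects keeps the comparison independent of CLR numeric widths in this sketch: an `int` and a `long` holding the same number encode identically, while the string `"42"` does not equal the number `42`.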
|
||||
/// <summary>
|
||||
/// Sets a single attribute value on the remote instance.
|
||||
/// </summary>
|
||||
|
||||
@@ -18,13 +18,17 @@ namespace ZB.MOM.WW.ScadaBridge.ManagementService;
|
||||
|
||||
/// <summary>
|
||||
/// Minimal-API endpoints exposing the central Audit Log (#23) over HTTP for the
|
||||
/// ScadaBridge CLI (M8). Two routes:
|
||||
/// ScadaBridge CLI (M8). Three routes:
|
||||
/// <list type="bullet">
|
||||
/// <item><c>GET /api/audit/query</c> — keyset-paged JSON page, gated on the
|
||||
/// <see cref="AuthorizationPolicies.OperationalAudit"/> permission.</item>
|
||||
/// <item><c>GET /api/audit/export</c> — streamed bulk export (csv / jsonl;
|
||||
/// parquet returns HTTP 501), gated on the
|
||||
/// <see cref="AuthorizationPolicies.AuditExport"/> permission.</item>
|
||||
/// <item><c>GET /api/audit/tree</c> — execution-chain tree rooted at the
|
||||
/// topmost ancestor of a given <c>executionId</c>, returned as a JSON array
|
||||
/// of <see cref="ExecutionTreeNode"/>; gated on
|
||||
/// <see cref="AuthorizationPolicies.OperationalAudit"/>.</item>
|
||||
/// </list>
|
||||
///
|
||||
/// <para>
|
||||
@@ -85,8 +89,16 @@ public static class AuditEndpoints
|
||||
Converters = { new JsonStringEnumConverter() },
|
||||
};
|
||||
|
||||
/// <summary>Default sentinel written by the backfill endpoint when the caller omits <c>sentinel</c>.</summary>
|
||||
public const string DefaultBackfillSentinel = "unknown";
|
||||
|
||||
/// <summary>Default batch size for the backfill endpoint when the caller omits <c>batchSize</c>.</summary>
|
||||
public const int DefaultBackfillBatchSize = 5000;
|
||||
|
||||
/// <summary>
|
||||
/// Registers the <c>/api/audit/query</c> and <c>/api/audit/export</c> minimal-API endpoints.
|
||||
/// Registers the <c>/api/audit/query</c>, <c>/api/audit/export</c>,
|
||||
/// <c>/api/audit/tree</c>, and <c>POST /api/audit/backfill-source-node</c>
|
||||
/// minimal-API endpoints.
|
||||
/// </summary>
|
||||
/// <param name="endpoints">The endpoint route builder to register routes on.</param>
|
||||
/// <returns>The same <paramref name="endpoints"/> builder, for chaining.</returns>
|
||||
@@ -94,6 +106,8 @@ public static class AuditEndpoints
|
||||
{
|
||||
endpoints.MapGet("/api/audit/query", (Delegate)HandleQuery);
|
||||
endpoints.MapGet("/api/audit/export", (Delegate)HandleExport);
|
||||
endpoints.MapGet("/api/audit/tree", (Delegate)HandleTree);
|
||||
endpoints.MapPost("/api/audit/backfill-source-node", (Delegate)HandleBackfillSourceNode);
|
||||
return endpoints;
|
||||
}
|
||||
|
||||
@@ -232,6 +246,177 @@ public static class AuditEndpoints
|
||||
return Results.Empty;
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// GET /api/audit/tree
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Handles <c>GET /api/audit/tree?executionId=...</c>: authenticates, checks the
|
||||
/// OperationalAudit permission, and returns the full execution-chain tree rooted at
|
||||
/// the topmost ancestor of the supplied <c>executionId</c>. The response is a JSON
|
||||
/// array of <see cref="ExecutionTreeNode"/> objects (empty array when the id is
|
||||
/// not found). Returns HTTP 400 when <c>executionId</c> is absent or not a valid
|
||||
/// GUID.
|
||||
/// </summary>
|
||||
/// <param name="context">The HTTP context for the current request.</param>
|
||||
/// <returns>A task that resolves to the HTTP result (200 JSON array, 400, 401, or 403).</returns>
|
||||
internal static async Task<IResult> HandleTree(HttpContext context)
|
||||
{
|
||||
var auth = await AuthenticateAsync(context);
|
||||
if (auth.Failure is not null)
|
||||
{
|
||||
return auth.Failure;
|
||||
}
|
||||
|
||||
if (!HasAnyRole(auth.User!, AuthorizationPolicies.OperationalAuditRoles))
|
||||
{
|
||||
return Forbidden("OperationalAudit");
|
||||
}
|
||||
|
||||
var raw = context.Request.Query["executionId"].ToString();
|
||||
if (string.IsNullOrWhiteSpace(raw) || !Guid.TryParse(raw, out var executionId))
|
||||
{
|
||||
return Results.Json(
|
||||
new { error = "Missing or invalid 'executionId' query parameter (expected a GUID).", code = "BAD_REQUEST" },
|
||||
statusCode: 400);
|
||||
}
|
||||
|
||||
var repo = context.RequestServices.GetRequiredService<IAuditLogRepository>();
|
||||
var nodes = await repo.GetExecutionTreeAsync(executionId, context.RequestAborted);
|
||||
|
||||
return Results.Json(nodes, JsonOptions);
|
||||
}
|
||||
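The "tree rooted at the topmost ancestor" behaviour that `HandleTree` delegates to the repository can be pictured with an in-memory sketch. Everything here is hypothetical: `ExecutionRow` is an invented shape with one row per execution (the real `AuditLog` holds many rows per `ExecutionId`), and the real `GetExecutionTreeAsync` presumably runs a recursive query against the database rather than walking a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: one entry per execution, carrying only the two ids.
public sealed record ExecutionRow(Guid ExecutionId, Guid? ParentExecutionId);

public static class ExecutionTree
{
    // Climb ParentExecutionId links to the topmost ancestor, then collect
    // every execution reachable from that root in breadth-first order.
    public static IReadOnlyList<Guid> Walk(IReadOnlyCollection<ExecutionRow> rows, Guid executionId)
    {
        var byId = rows.ToDictionary(r => r.ExecutionId);
        if (!byId.TryGetValue(executionId, out var node))
            return Array.Empty<Guid>(); // unknown id: empty result, not an error

        var climbed = new HashSet<Guid> { node.ExecutionId };
        while (node.ParentExecutionId is Guid parentId
               && byId.TryGetValue(parentId, out var parent)
               && climbed.Add(parent.ExecutionId)) // guards against a cyclic chain
        {
            node = parent;
        }

        var children = rows
            .Where(r => r.ParentExecutionId.HasValue)
            .ToLookup(r => r.ParentExecutionId!.Value, r => r.ExecutionId);

        var result = new List<Guid>();
        var visited = new HashSet<Guid>();
        var queue = new Queue<Guid>();
        queue.Enqueue(node.ExecutionId);
        while (queue.Count > 0)
        {
            var id = queue.Dequeue();
            if (!visited.Add(id)) continue;
            result.Add(id);
            foreach (var child in children[id]) queue.Enqueue(child);
        }
        return result;
    }
}
```

Asking for any node in a chain returns the same list, because the walk always starts from the root it climbed to; an id that is not present yields an empty list, matching the empty-array response documented above.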
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// POST /api/audit/backfill-source-node
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Handles <c>POST /api/audit/backfill-source-node</c>: authenticates (Admin role
|
||||
/// required), reads the JSON body for <c>sentinel</c> / <c>before</c> /
|
||||
/// <c>batchSize</c>, and calls
|
||||
/// <see cref="IAuditLogRepository.BackfillSourceNodeAsync"/> on the maintenance
|
||||
/// path.
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Auth.</b> Admin-only — backfilling the SourceNode column is a one-time ops
|
||||
/// procedure that mutates the AuditLog table via the maintenance path (NOT the
|
||||
/// append-only writer role). Restricted to <see cref="AuthorizationPolicies.AuditExportRoles"/>
|
||||
/// (Administrator) so it is never accessible to Viewer-role users.
|
||||
/// </para>
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Request body.</b>
|
||||
/// <code>
|
||||
/// {
|
||||
/// "sentinel": "unknown", // optional; default "unknown"
|
||||
/// "before": "2026-01-01T00:00:00Z", // required ISO-8601 UTC
|
||||
/// "batchSize": 5000 // optional; default 5000
|
||||
/// }
|
||||
/// </code>
|
||||
/// </para>
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Response (200).</b>
|
||||
/// <code>{ "rowsUpdated": 12345, "sentinel": "unknown", "before": "2026-01-01T00:00:00Z" }</code>
|
||||
/// </para>
|
||||
/// </summary>
|
||||
/// <param name="context">The HTTP context for the current request.</param>
|
||||
/// <returns>A task that resolves to the HTTP result (200 JSON, 400, 401, 403, or the non-standard 499 when the client cancels while the body is being read).</returns>
|
||||
internal static async Task<IResult> HandleBackfillSourceNode(HttpContext context)
|
||||
{
|
||||
var auth = await AuthenticateAsync(context);
|
||||
if (auth.Failure is not null)
|
||||
{
|
||||
return auth.Failure;
|
||||
}
|
||||
|
||||
// Admin-only: backfilling is a one-time ops procedure on the maintenance path.
|
||||
if (!HasAnyRole(auth.User!, AuthorizationPolicies.AuditExportRoles))
|
||||
{
|
||||
return Forbidden("Administrator");
|
||||
}
|
||||
|
||||
string bodyText;
|
||||
try
|
||||
{
|
||||
using var reader = new System.IO.StreamReader(context.Request.Body);
|
||||
bodyText = await reader.ReadToEndAsync(context.RequestAborted);
|
||||
}
|
||||
catch (OperationCanceledException)
|
||||
{
|
||||
return Results.Json(new { error = "Request cancelled.", code = "CANCELLED" }, statusCode: 499);
|
||||
}
|
||||
|
||||
string sentinel = DefaultBackfillSentinel;
|
||||
DateTime? beforeUtc = null;
|
||||
int batchSize = DefaultBackfillBatchSize;
|
||||
|
||||
if (!string.IsNullOrWhiteSpace(bodyText))
|
||||
{
|
||||
try
|
||||
{
|
||||
using var doc = System.Text.Json.JsonDocument.Parse(bodyText);
|
||||
var root = doc.RootElement;
|
||||
|
||||
if (root.TryGetProperty("sentinel", out var sentinelEl))
|
||||
{
|
||||
var s = sentinelEl.GetString();
|
||||
if (!string.IsNullOrWhiteSpace(s))
|
||||
{
|
||||
sentinel = s.Trim();
|
||||
}
|
||||
}
|
||||
|
||||
if (root.TryGetProperty("before", out var beforeEl))
|
||||
{
|
||||
if (DateTime.TryParse(
|
||||
beforeEl.GetString(),
|
||||
System.Globalization.CultureInfo.InvariantCulture,
|
||||
System.Globalization.DateTimeStyles.AssumeUniversal | System.Globalization.DateTimeStyles.AdjustToUniversal,
|
||||
out var parsed))
|
||||
{
|
||||
beforeUtc = DateTime.SpecifyKind(parsed, DateTimeKind.Utc);
|
||||
}
|
||||
else
|
||||
{
|
||||
return Results.Json(
|
||||
new { error = "Invalid 'before' value; expected ISO-8601 UTC datetime.", code = "BAD_REQUEST" },
|
||||
statusCode: 400);
|
||||
}
|
||||
}
|
||||
|
||||
if (root.TryGetProperty("batchSize", out var batchEl) && batchEl.TryGetInt32(out var b) && b > 0)
|
||||
{
|
||||
batchSize = b;
|
||||
}
|
||||
}
|
||||
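catch (Exception ex) when (ex is System.Text.Json.JsonException or InvalidOperationException)
{
// JsonElement throws InvalidOperationException (not JsonException) when the root
// is not an object or a field has the wrong JSON type; report both as a 400.
return Results.Json(
new { error = "Request body must be a JSON object with correctly typed fields.", code = "BAD_REQUEST" },
statusCode: 400);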
|
||||
}
|
||||
}
|
||||
|
||||
if (beforeUtc is null)
|
||||
{
|
||||
return Results.Json(
|
||||
new { error = "Required field 'before' (ISO-8601 UTC datetime) is missing.", code = "BAD_REQUEST" },
|
||||
statusCode: 400);
|
||||
}
|
||||
|
||||
var repo = context.RequestServices.GetRequiredService<IAuditLogRepository>();
|
||||
var rowsUpdated = await repo.BackfillSourceNodeAsync(sentinel, beforeUtc.Value, batchSize, context.RequestAborted);
|
||||
|
||||
return Results.Json(new
|
||||
{
|
||||
rowsUpdated,
|
||||
sentinel,
|
||||
before = beforeUtc.Value.ToString("O", System.Globalization.CultureInfo.InvariantCulture),
|
||||
}, JsonOptions);
|
||||
}
|
||||
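The `before` parsing in the handler above combines `AssumeUniversal` with `AdjustToUniversal`. A small sketch of what that flag pair does, isolated from the endpoint (the `BeforeParser` class and `ParseUtc` name are hypothetical helpers introduced for illustration):

```csharp
using System;
using System.Globalization;

public static class BeforeParser
{
    // Same flags as the handler: input without a zone is assumed to be UTC,
    // and input carrying an explicit offset is converted to UTC.
    public static DateTime? ParseUtc(string? text) =>
        DateTime.TryParse(
            text,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out var parsed)
            ? DateTime.SpecifyKind(parsed, DateTimeKind.Utc)
            : null;
}
```

With these flags, `2026-01-01T00:00:00Z` and `2026-01-01T02:00:00+02:00` both resolve to the same UTC instant, so a caller who sends a local offset still gets the cutoff they meant. Note that `DateTime.TryParse` is lenient and accepts more than strict ISO-8601; `DateTime.TryParseExact` would be one option if stricter validation were wanted.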
|
||||
/// <summary>
|
||||
/// Streams every matching row as RFC 4180 CSV, paging the repository with its
|
||||
/// keyset cursor and flushing after each page so a large export starts
|
||||
|
||||
@@ -122,6 +122,7 @@ public class NotificationOutboxActor : ReceiveActor, IWithTimers
|
||||
Receive<DiscardNotificationRequest>(HandleDiscard);
|
||||
Receive<NotificationKpiRequest>(HandleKpiRequest);
|
||||
Receive<PerSiteNotificationKpiRequest>(HandlePerSiteKpiRequest);
|
||||
Receive<PerNodeNotificationKpiRequest>(HandlePerNodeKpiRequest);
|
||||
}
|
||||
|
||||
/// <inheritdoc />
|
||||
@@ -1081,6 +1082,38 @@ public class NotificationOutboxActor : ReceiveActor, IWithTimers
|
||||
return new PerSiteNotificationKpiResponse(correlationId, Success: true, ErrorMessage: null, sites);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Handles a per-node KPI request, computing the per-source-node outbox metrics with the
|
||||
/// same stuck cutoff and delivered window as <see cref="HandleKpiRequest"/>. Additive
|
||||
/// alongside <see cref="HandlePerSiteKpiRequest"/> — does not change per-site behaviour.
|
||||
/// </summary>
|
||||
private void HandlePerNodeKpiRequest(PerNodeNotificationKpiRequest request)
|
||||
{
|
||||
var sender = Sender;
|
||||
var now = DateTimeOffset.UtcNow;
|
||||
var stuckCutoff = StuckCutoff(now);
|
||||
var deliveredSince = now - _options.DeliveredKpiWindow;
|
||||
|
||||
ComputePerNodeKpisAsync(request.CorrelationId, stuckCutoff, deliveredSince).PipeTo(
|
||||
sender,
|
||||
success: response => response,
|
||||
failure: ex => new PerNodeNotificationKpiResponse(
|
||||
request.CorrelationId,
|
||||
Success: false,
|
||||
ErrorMessage: ex.GetBaseException().Message,
|
||||
Nodes: Array.Empty<NodeNotificationKpiSnapshot>()));
|
||||
}
|
||||
|
||||
private async Task<PerNodeNotificationKpiResponse> ComputePerNodeKpisAsync(
|
||||
string correlationId, DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince)
|
||||
{
|
||||
using var scope = _serviceProvider.CreateScope();
|
||||
var repository = scope.ServiceProvider.GetRequiredService<INotificationOutboxRepository>();
|
||||
var nodes = await repository.ComputePerNodeKpisAsync(stuckCutoff, deliveredSince);
|
||||
|
||||
return new PerNodeNotificationKpiResponse(correlationId, Success: true, ErrorMessage: null, nodes);
|
||||
}
|
||||
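The two cut-offs that the per-node KPI handler computes, and the way a failure is flattened into a response message, can be sketched without the actor machinery. `KpiWindow` and `KpiWindows` are hypothetical names used only for this illustration; the real handler pipes its result to the sender via Akka.NET's `PipeTo`.

```csharp
using System;

// Hypothetical container for the two cut-offs passed to the repository.
public sealed record KpiWindow(DateTimeOffset StuckCutoff, DateTimeOffset DeliveredSince);

public static class KpiWindows
{
    // A notification still pending from before StuckCutoff counts as stuck;
    // deliveries are only counted from DeliveredSince onward.
    public static KpiWindow Compute(DateTimeOffset now, TimeSpan stuckAgeThreshold, TimeSpan deliveredWindow) =>
        new(now - stuckAgeThreshold, now - deliveredWindow);

    // Task failures often arrive wrapped in AggregateException;
    // GetBaseException unwraps to the root cause for the error message.
    public static string FailureMessage(Exception ex) => ex.GetBaseException().Message;
}
```

Capturing `now` once and deriving both cut-offs from it keeps the two windows consistent with each other within a single request.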
|
||||
/// <summary>
|
||||
/// The instant before which a still-pending notification counts as stuck — <paramref name="now"/>
|
||||
/// offset back by <see cref="NotificationOutboxOptions.StuckAgeThreshold"/>.
|
||||
|
||||
@@ -239,6 +239,7 @@ public class SiteCallAuditActor : ReceiveActor
|
||||
Receive<SiteCallDetailRequest>(HandleDetail);
|
||||
Receive<SiteCallKpiRequest>(HandleKpi);
|
||||
Receive<PerSiteSiteCallKpiRequest>(HandlePerSiteKpi);
|
||||
Receive<PerNodeSiteCallKpiRequest>(HandlePerNodeKpi);
|
||||
|
||||
// Task 5 (#22): central→site Retry/Discard relay for parked cached calls.
|
||||
Receive<RegisterCentralCommunication>(msg =>
|
||||
@@ -817,6 +818,47 @@ public class SiteCallAuditActor : ReceiveActor
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Handles a per-node KPI request, using the same stuck cutoff and
|
||||
/// interval bound as <see cref="HandleKpi"/>. Additive alongside
|
||||
/// <see cref="HandlePerSiteKpi"/> — does not change per-site behaviour.
|
||||
/// </summary>
|
||||
private void HandlePerNodeKpi(PerNodeSiteCallKpiRequest request)
|
||||
{
|
||||
var sender = Sender;
|
||||
var now = DateTime.UtcNow;
|
||||
var stuckCutoff = now - _options.StuckAgeThreshold;
|
||||
var intervalSince = now - _options.KpiInterval;
|
||||
|
||||
PerNodeKpiAsync(request.CorrelationId, stuckCutoff, intervalSince).PipeTo(
|
||||
sender,
|
||||
success: response => response,
|
||||
failure: ex => new PerNodeSiteCallKpiResponse(
|
||||
request.CorrelationId,
|
||||
Success: false,
|
||||
ErrorMessage: ex.GetBaseException().Message,
|
||||
Nodes: Array.Empty<SiteCallNodeKpiSnapshot>()));
|
||||
}
|
||||
|
||||
private async Task<PerNodeSiteCallKpiResponse> PerNodeKpiAsync(
|
||||
string correlationId, DateTime stuckCutoff, DateTime intervalSince)
|
||||
{
|
||||
var (scope, repository) = ResolveRepository();
|
||||
try
|
||||
{
|
||||
var nodes = await repository
|
||||
.ComputePerNodeKpisAsync(stuckCutoff, intervalSince)
|
||||
.ConfigureAwait(false);
|
||||
|
||||
return new PerNodeSiteCallKpiResponse(
|
||||
correlationId, Success: true, ErrorMessage: null, nodes);
|
||||
}
|
||||
finally
|
||||
{
|
||||
scope?.Dispose();
|
||||
}
|
||||
}
|
||||
|
||||
// ── Task 5: central→site Retry/Discard relay ──
|
||||
|
||||
/// <summary>
|
||||
|
||||
@@ -571,7 +571,20 @@ public class AlarmActor : ReceiveActor
|
||||
/// Passes the firing alarm's level/priority/message so the script can
|
||||
/// branch on severity via the <c>Alarm</c> global.
|
||||
/// </summary>
|
||||
private void SpawnAlarmExecution(AlarmLevel level, int priority, string message)
|
||||
/// <param name="level">The firing alarm severity level.</param>
|
||||
/// <param name="priority">The firing alarm priority.</param>
|
||||
/// <param name="message">The firing alarm message.</param>
|
||||
/// <param name="parentExecutionId">
|
||||
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the execution id of
|
||||
/// the context that fired this alarm, recorded as the on-trigger script run's
|
||||
/// <c>ParentExecutionId</c> so the alarm-triggered run chains under its firing
|
||||
/// context in the audit tree. The alarm subsystem currently has no Guid-typed
|
||||
/// firing id, so the only call sites pass <c>null</c> (the on-trigger run is a
|
||||
/// root). The parameter exists so a future firing-id can flow without
|
||||
/// touching the actor wiring.
|
||||
/// </param>
|
||||
private void SpawnAlarmExecution(
|
||||
AlarmLevel level, int priority, string message, Guid? parentExecutionId = null)
|
||||
{
|
||||
if (_onTriggerCompiledScript == null) return;
|
||||
|
||||
@@ -591,7 +604,9 @@ public class AlarmActor : ReceiveActor
|
||||
_options,
|
||||
_logger,
|
||||
// M2.5 (#9): per-script timeout from the on-trigger script (null = global).
|
||||
_onTriggerExecutionTimeoutSeconds));
|
||||
_onTriggerExecutionTimeoutSeconds,
|
||||
// Audit Log #23 (M5.4): the firing context's execution id (null today).
|
||||
parentExecutionId));
|
||||
|
||||
Context.ActorOf(props, executionId);
|
||||
}
|
||||
|
||||
@@ -29,6 +29,14 @@ public class AlarmExecutionActor : ReceiveActor
|
||||
/// <param name="options">Site runtime configuration options, including the execution timeout.</param>
|
||||
/// <param name="logger">Logger for execution diagnostics.</param>
|
||||
/// <param name="executionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
|
||||
/// <param name="parentExecutionId">
|
||||
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the execution id of
|
||||
/// the context that fired this alarm, threaded into the on-trigger script's
|
||||
/// <see cref="ScriptRuntimeContext"/> as its <c>ParentExecutionId</c> so the
|
||||
/// alarm-triggered run chains under its firing context. Null today (no
|
||||
/// Guid-typed firing id exists yet) — the run is a root, but the plumbing
|
||||
/// is in place for a future firing id.
|
||||
/// </param>
|
||||
public AlarmExecutionActor(
|
||||
string alarmName,
|
||||
string instanceName,
|
||||
@@ -42,7 +50,9 @@ public class AlarmExecutionActor : ReceiveActor
|
||||
ILogger logger,
|
||||
// M2.5 (#9): per-script execution timeout override (seconds) for the
|
||||
// alarm on-trigger script. Null or non-positive falls back to the global.
|
||||
int? executionTimeoutSeconds = null)
|
||||
int? executionTimeoutSeconds = null,
|
||||
// Audit Log #23 (M5.4): the firing context's execution id (null today).
|
||||
Guid? parentExecutionId = null)
|
||||
{
|
||||
var self = Self;
|
||||
var parent = Context.Parent;
|
||||
@@ -51,7 +61,7 @@ public class AlarmExecutionActor : ReceiveActor
|
||||
alarmName, instanceName, level, priority, message,
|
||||
compiledScript, instanceActor,
|
||||
sharedScriptLibrary, options, self, parent, logger,
|
||||
executionTimeoutSeconds);
|
||||
executionTimeoutSeconds, parentExecutionId);
|
||||
}
|
||||
|
||||
private static void ExecuteAlarmScript(
|
||||
@@ -67,7 +77,8 @@ public class AlarmExecutionActor : ReceiveActor
|
||||
IActorRef self,
|
||||
IActorRef parent,
|
||||
ILogger logger,
|
||||
int? executionTimeoutSeconds)
|
||||
int? executionTimeoutSeconds,
|
||||
Guid? parentExecutionId)
|
||||
{
|
||||
// M2.5 (#9): per-script timeout overrides the global default. A null or
|
||||
// non-positive per-script value (≤ 0) falls back to the global.
|
||||
@@ -95,7 +106,19 @@ public class AlarmExecutionActor : ReceiveActor
|
||||
options.MaxScriptCallDepth,
|
||||
timeout,
|
||||
instanceName,
|
||||
logger);
|
||||
logger,
|
||||
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the
|
||||
// alarm on-trigger run mints its own fresh ExecutionId (the
|
||||
// ctor's `?? NewGuid()` fallback) and records the firing
|
||||
// context's id as its ParentExecutionId — null today, so the
|
||||
// run is a root, but the plumbing exists for a future
|
||||
// firing id.
|
||||
parentExecutionId: parentExecutionId,
|
||||
// WaitForAttribute (spec §4.4): thread the alarm on-trigger
|
||||
// script's per-script execution-timeout token so a
|
||||
// Attributes.WaitAsync inside an on-trigger script is bounded
|
||||
// by the same script deadline.
|
||||
scriptTimeoutToken: cts.Token);
|
||||
|
||||
var globals = new ScriptGlobals
|
||||
{
|
||||
|
||||
@@ -149,6 +149,7 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
|
||||
Receive<RouteToCallRequest>(RouteInboundApiCall);
|
||||
Receive<RouteToGetAttributesRequest>(RouteInboundApiGetAttributes);
|
||||
Receive<RouteToSetAttributesRequest>(RouteInboundApiSetAttributes);
|
||||
Receive<RouteToWaitForAttributeRequest>(RouteInboundApiWaitForAttribute);
|
||||
|
||||
// OPC UA Tag Browser — singleton-only re-forward to local /user/dcl-manager.
|
||||
// BrowseNodeCommand is routed to this singleton (active node) by
|
||||
@@ -1078,6 +1079,45 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
|
||||
}).PipeTo(sender);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Spec §6 (WD-2b): unpacks a routed <see cref="RouteToWaitForAttributeRequest"/>
|
||||
/// (inbound-API <c>Route.To().WaitForAttribute()</c>) into the deployed
|
||||
/// Instance Actor's site-local <see cref="WaitForAttributeRequest"/> and relays
|
||||
/// the result back. Value-equality only across the wire — the predicate is null
|
||||
/// and <c>RequireGoodQuality</c> is left at its default. The Ask is bounded by the
|
||||
/// wait timeout plus slack (NOT a fixed 30s), since the wait legitimately blocks
|
||||
/// for up to <see cref="RouteToWaitForAttributeRequest.Timeout"/>.
|
||||
/// </summary>
|
||||
private void RouteInboundApiWaitForAttribute(RouteToWaitForAttributeRequest request)
|
||||
{
|
||||
if (!_instanceActors.TryGetValue(request.InstanceUniqueName, out var instanceActor))
|
||||
{
|
||||
Sender.Tell(new RouteToWaitForAttributeResponse(
|
||||
request.CorrelationId, false, null, null, false,
|
||||
false, $"Instance '{request.InstanceUniqueName}' not found on this site.",
|
||||
DateTimeOffset.UtcNow));
|
||||
return;
|
||||
}
|
||||
|
||||
var sender = Sender;
|
||||
// Routed waits are value-equality only (predicate null); RequireGoodQuality left at default.
|
||||
var inner = new WaitForAttributeRequest(
|
||||
request.CorrelationId, request.InstanceUniqueName, request.AttributeName,
|
||||
request.TargetValueEncoded, null, request.Timeout, DateTimeOffset.UtcNow);
|
||||
|
||||
// Ask bounded by the WAIT timeout + slack — NOT a fixed 30s (the wait legitimately blocks up to request.Timeout).
|
||||
instanceActor.Ask<WaitForAttributeResponse>(inner, request.Timeout + TimeSpan.FromSeconds(5))
|
||||
.ContinueWith(t => t.IsCompletedSuccessfully
|
||||
? new RouteToWaitForAttributeResponse(
|
||||
request.CorrelationId, t.Result.Matched, t.Result.Value, t.Result.Quality, t.Result.TimedOut,
|
||||
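// Surface a site-side refusal (e.g. the waiter cap) instead of reporting it as success.
t.Result.ErrorMessage is null, t.Result.ErrorMessage, DateTimeOffset.UtcNow)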
|
||||
: new RouteToWaitForAttributeResponse(
|
||||
request.CorrelationId, false, null, null, false,
|
||||
false, t.Exception?.GetBaseException().Message ?? "Attribute wait timed out",
|
||||
DateTimeOffset.UtcNow))
|
||||
.PipeTo(sender);
|
||||
}
|
||||
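The reason the Ask is bounded by the wait timeout plus slack, rather than by a fixed value, can be shown with plain tasks. This is a sketch under assumptions: `BoundedAsk` and `RelayAsync` are hypothetical names, and `Task.WaitAsync(TimeSpan)` (available from .NET 6) stands in for the actor Ask timeout.

```csharp
using System;
using System.Threading.Tasks;

public static class BoundedAsk
{
    // Slack added on top of the caller's wait timeout (5 s, as in the diff).
    public static readonly TimeSpan Slack = TimeSpan.FromSeconds(5);

    public static TimeSpan OuterBound(TimeSpan waitTimeout) => waitTimeout + Slack;

    // The inner wait reports its own timeout as a normal "false" result. The
    // outer bound only fires if the inner side never answers at all.
    public static async Task<string> RelayAsync(Task<bool> innerWait, TimeSpan waitTimeout)
    {
        try
        {
            var matched = await innerWait.WaitAsync(OuterBound(waitTimeout));
            return matched ? "matched" : "not-matched";
        }
        catch (TimeoutException)
        {
            return "no-reply";
        }
    }
}
```

If the outer bound were shorter than the wait timeout, a legitimately long wait would be reported as a transport failure before the site had a chance to answer; the slack gives the site-side timeout reply time to travel back.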
|
||||
/// <summary>
|
||||
/// Writes attribute values on a deployed instance for a Route.To().SetAttribute(s)
|
||||
/// call (or a central Test Run bound to the instance). Each write is Ask'd to the
|
||||
|
||||
@@ -68,6 +68,18 @@ public class InstanceActor : ReceiveActor
|
||||
// mirroring the rest of the actor's by-name dictionaries).
|
||||
private readonly Dictionary<string, ResolvedAttribute> _resolvedAttributeByName = new();
|
||||
|
||||
// WaitForAttribute (spec §4.2): one-shot waiter registry keyed by the
|
||||
// request CorrelationId. Each entry holds the watched attribute name, the
|
||||
// match test (decoded target equality OR a site-local predicate), the
|
||||
// original Sender to reply to, and the scheduled-timeout handle so a match
|
||||
// can cancel it. Single-threaded actor access — no locking needed.
|
||||
private readonly Dictionary<string, PendingWait> _attributeWaiters = new();
|
||||
|
||||
// WaitForAttribute: defensive per-instance cap so a script leaking waiters
|
||||
// in a loop cannot grow the registry without bound. Exceeding it refuses the
|
||||
// wait with an error reply rather than registering.
|
||||
private const int MaxAttributeWaiters = 100;
|
||||
|
||||
// DCL manager actor reference for subscribing to tag values
|
||||
private readonly IActorRef? _dclManager;
|
||||
// Maps each tag path to every attribute canonical name that references it.
|
||||
@@ -170,6 +182,12 @@ public class InstanceActor : ReceiveActor
|
||||
// WP-22/23: Handle attribute value changes from DCL (Tell pattern)
|
||||
Receive<AttributeValueChanged>(HandleAttributeValueChanged);
|
||||
|
||||
// WaitForAttribute (spec §4.2): event-driven "wait for value" waiter
|
||||
// registration + its scheduled-timeout self-message. Both flow only
|
||||
// site-locally (the predicate variant carries a non-serializable delegate).
|
||||
Receive<WaitForAttributeRequest>(HandleWaitForAttribute);
|
||||
Receive<WaitForAttributeTimeout>(HandleWaitForAttributeTimeout);
|
||||
|
||||
// Handle tag value updates from DCL — convert to AttributeValueChanged
|
||||
Receive<TagValueUpdate>(HandleTagValueUpdate);
|
||||
Receive<SubscribeTagsResponse>(_ => { }); // Ack from DCL subscribe — no action needed
|
||||
@@ -519,6 +537,114 @@ public class InstanceActor : ReceiveActor
|
||||
PublishAndNotifyChildren(changed);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §4.2): registers a one-shot event-driven waiter for
|
||||
/// an attribute to reach a value (encoded-equality), satisfy a site-local
|
||||
/// predicate, or change at all. The current-value fast-path and the
|
||||
/// change-handling in <see cref="HandleAttributeValueChanged"/> both run on
|
||||
/// this single-threaded actor, so a value that flips between "read current"
|
||||
/// and "register" cannot be missed (spec §5).
|
||||
/// </summary>
|
||||
private void HandleWaitForAttribute(WaitForAttributeRequest req)
|
||||
{
|
||||
// Capture the sender immediately — Sender is invalid once we schedule /
|
||||
// return and a later message arrives.
|
||||
var replyer = Sender;
|
||||
|
||||
// Build the match test: explicit predicate wins; else null encoded target
|
||||
// means "any change"; else compare the codec-encoded current value to the
|
||||
// encoded target (avoids needing the attribute's DataType to decode).
|
||||
Func<object?, bool> test;
|
||||
if (req.Predicate is not null)
|
||||
{
|
||||
test = req.Predicate;
|
||||
}
|
||||
else if (req.TargetValueEncoded is null)
|
||||
{
|
||||
test = _ => true;
|
||||
}
|
||||
else
|
||||
{
|
||||
var target = req.TargetValueEncoded;
|
||||
test = v => string.Equals(
|
||||
AttributeValueCodec.Encode(v), target, StringComparison.Ordinal);
|
||||
}
|
||||
|
||||
// Fast path: the current value already satisfies the test → reply now.
|
||||
// A script-supplied predicate (or the codec-equality lambda) runs on the
|
||||
// actor thread; guard it so a throwing predicate cannot crash the actor or
|
||||
// leak a never-resolved waiter. On throw: reply non-matched + ErrorMessage
|
||||
// and return WITHOUT registering (no timeout scheduled).
|
||||
if (_attributes.TryGetValue(req.AttributeName, out var current))
|
||||
{
|
||||
// Effective quality used for BOTH the §4.2 quality gate and the match
|
||||
// reply — the same `?? "Good"` default the reply has always used.
|
||||
_attributeQualities.TryGetValue(req.AttributeName, out var fastQuality);
|
||||
var effectiveQuality = fastQuality ?? "Good";
|
||||
|
||||
bool fastMatch;
|
||||
try
|
||||
{
|
||||
// §4.2 quality gate ANDed with the value test, both INSIDE the guard:
|
||||
// in quality-gated mode a value already at target but at Bad/Uncertain
|
||||
// quality is NOT a fast match — it falls through to register + schedule
|
||||
// the timeout like any other pending waiter (do NOT fast-reply matched).
|
||||
fastMatch =
|
||||
(!req.RequireGoodQuality
|
||||
|| string.Equals(effectiveQuality, "Good", StringComparison.Ordinal))
|
||||
&& test(current);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_logger.LogWarning(ex,
|
||||
"WaitForAttribute predicate threw on the fast-path for {Instance}.{Attribute}; refusing the wait",
|
||||
_instanceUniqueName, req.AttributeName);
|
||||
replyer.Tell(new WaitForAttributeResponse(
|
||||
req.CorrelationId, Matched: false, null, null, TimedOut: false,
|
||||
ErrorMessage: "Wait predicate threw: " + ex.Message));
|
||||
return;
|
||||
}
|
||||
|
||||
if (fastMatch)
|
||||
{
|
||||
replyer.Tell(new WaitForAttributeResponse(
|
||||
req.CorrelationId, Matched: true, current, effectiveQuality, TimedOut: false));
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
// Defensive cap: refuse rather than register if the instance already has
|
||||
// too many concurrent waiters (guards against a script leaking waiters).
|
||||
if (_attributeWaiters.Count >= MaxAttributeWaiters)
|
||||
{
|
||||
replyer.Tell(new WaitForAttributeResponse(
|
||||
req.CorrelationId, Matched: false, null, null, TimedOut: false,
|
||||
ErrorMessage: "Too many concurrent attribute waiters on this instance"));
|
||||
return;
|
||||
}
|
||||
|
||||
// Register and schedule the self-evicting timeout (NativeAlarmActor idiom).
|
||||
var handle = Context.System.Scheduler.ScheduleTellOnceCancelable(
|
||||
req.Timeout, Self, new WaitForAttributeTimeout(req.CorrelationId), Self);
|
||||
|
||||
_attributeWaiters[req.CorrelationId] =
|
||||
new PendingWait(req.AttributeName, test, replyer, handle, req.RequireGoodQuality);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §4.2): the scheduled timeout fired for a waiter that
|
||||
/// never matched. If still registered (a match would have removed + canceled
|
||||
/// it), reply TimedOut and evict it.
|
||||
/// </summary>
|
||||
private void HandleWaitForAttributeTimeout(WaitForAttributeTimeout msg)
|
||||
{
|
||||
if (_attributeWaiters.Remove(msg.CorrelationId, out var pending))
|
||||
{
|
||||
pending.Replyer.Tell(new WaitForAttributeResponse(
|
||||
msg.CorrelationId, Matched: false, null, null, TimedOut: true));
|
||||
}
|
||||
}
|
||||
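The waiter lifecycle described above (fast path, cap, registration, match, timeout eviction) can be condensed into a single-threaded sketch without Akka.NET. All names here (`WaiterRegistry`, `WaitOutcome`, the callback-based reply) are hypothetical; the sketch deliberately omits the quality gate, the predicate exception guard, and the scheduler, and the caller is assumed to invoke `OnTimeout` when a timer fires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum WaitOutcome { Matched, TimedOut, Refused }

public sealed class WaiterRegistry
{
    private sealed record Pending(string Attribute, Func<object?, bool> Test, Action<WaitOutcome> Reply);

    private readonly Dictionary<string, Pending> _waiters = new();
    private readonly Dictionary<string, object?> _values = new();
    private readonly int _cap;

    public WaiterRegistry(int cap) => _cap = cap;
    public int Count => _waiters.Count;

    public void Register(string id, string attribute, Func<object?, bool> test, Action<WaitOutcome> reply)
    {
        // Fast path: the current value already satisfies the test.
        if (_values.TryGetValue(attribute, out var current) && test(current))
        {
            reply(WaitOutcome.Matched);
            return;
        }
        // Defensive cap: refuse instead of growing without bound.
        if (_waiters.Count >= _cap)
        {
            reply(WaitOutcome.Refused);
            return;
        }
        _waiters[id] = new Pending(attribute, test, reply);
    }

    public void OnValueChanged(string attribute, object? value)
    {
        _values[attribute] = value;
        // Iterate a snapshot so entries can be removed during the loop.
        foreach (var (id, pending) in _waiters.Where(kv => kv.Value.Attribute == attribute).ToList())
        {
            if (pending.Test(value) && _waiters.Remove(id))
                pending.Reply(WaitOutcome.Matched);
        }
    }

    public void OnTimeout(string id)
    {
        // Only replies if the waiter is still registered: a match removes it first.
        if (_waiters.Remove(id, out var pending))
            pending.Reply(WaitOutcome.TimedOut);
    }
}
```

Because registration, value changes, and timeouts all run on one thread in this sketch (as they do on the actor), each waiter is answered exactly once: whichever of match or timeout removes the entry first wins, and the other finds nothing to reply to.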
|
||||
/// <summary>
|
||||
/// Handles tag value updates from DCL. Maps the tag path back to the attribute
|
||||
/// canonical name and converts to an AttributeValueChanged for unified processing.
|
||||
@@ -556,9 +682,14 @@ public class InstanceActor : ReceiveActor
|
||||
_attributeQualities[attrName] = "Bad";
|
||||
_attributeTimestamps[attrName] = update.Timestamp;
|
||||
var currentValue = _attributes.GetValueOrDefault(attrName);
|
||||
// WaitForAttribute (spec §4.2): quality-only republish — the
|
||||
// stored value is UNCHANGED (we publish the OLD currentValue, only
|
||||
// the quality flips to Bad). Do NOT evaluate waiters, or an
|
||||
// "any-change" / unchanged-value-equality waiter would fire on a
|
||||
// non-change.
|
||||
PublishAndNotifyChildren(new AttributeValueChanged(
|
||||
_instanceUniqueName, update.TagPath, attrName,
|
||||
currentValue, "Bad", update.Timestamp));
|
||||
currentValue, "Bad", update.Timestamp), evaluateWaiters: false);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
@@ -908,7 +1039,17 @@ public class InstanceActor : ReceiveActor
|
||||
/// Publishes attribute change to stream and notifies child Script/Alarm actors.
|
||||
/// WP-22: Tell for attribute notifications (fire-and-forget, never blocks).
|
||||
/// </summary>
|
||||
private void PublishAndNotifyChildren(AttributeValueChanged changed)
|
||||
/// <param name="changed">The attribute change to publish.</param>
|
||||
/// <param name="evaluateWaiters">
|
||||
/// WaitForAttribute (spec §4.2): when <c>true</c> (the default), registered
|
||||
/// <c>Attributes.WaitAsync</c> waiters on this attribute are re-evaluated against
|
||||
/// <paramref name="changed"/>'s value. Pass <c>false</c> on republish/quality-only
|
||||
/// paths that do NOT assign a new value to <c>_attributes[name]</c> (e.g. the
|
||||
/// List-coerce-failure Bad-quality republish, which publishes the OLD value) —
|
||||
/// otherwise an "any-change" waiter (or a waiter whose target equals the unchanged
|
||||
/// value) would spuriously fire even though nothing actually changed.
|
||||
/// </param>
|
||||
private void PublishAndNotifyChildren(AttributeValueChanged changed, bool evaluateWaiters = true)
|
||||
{
|
||||
// WP-23: Publish to site-wide stream
|
||||
_streamManager?.PublishAttributeValueChanged(changed);
|
||||
@@ -924,6 +1065,83 @@ public class InstanceActor : ReceiveActor
|
||||
{
|
||||
alarmActor.Tell(changed);
|
||||
}
|
||||
|
||||
// WaitForAttribute (spec §4.2): re-evaluate any waiters on THIS attribute —
|
||||
// but ONLY when this publish reflects a real value change (evaluateWaiters).
|
||||
// The genuine value-change paths (HandleAttributeValueChanged, the scalar
|
||||
// DCL update path, HandleSetStaticAttributeCore) call it AFTER assigning
|
||||
// _attributes[name], so changed.Value is the just-applied current value.
|
||||
// Republish/quality-only paths (List-coerce-failure Bad-quality, which
|
||||
// publishes the OLD value) pass evaluateWaiters:false so an "any-change" or
|
||||
// unchanged-value-equality waiter does not spuriously fire (spec §4.2).
|
||||
// Iterate a snapshot so satisfied waiters can be removed during the loop;
|
||||
// each match cancels its scheduled timeout (so no stray WaitForAttributeTimeout
|
||||
// follows) and replies Matched=true.
|
||||
if (evaluateWaiters)
|
||||
ResolveMatchedWaiters(changed);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §4.2): fires every registered waiter on
|
||||
/// <paramref name="changed"/>'s attribute whose test now passes against the
|
||||
/// just-applied value — cancelling its timeout, replying Matched, and removing
|
||||
/// it from the registry. A no-op when there are no waiters.
|
||||
///
|
||||
/// <para>
|
||||
/// Each waiter's match test runs inside a per-waiter try/catch: a throwing
|
||||
/// script-supplied predicate (or codec lambda) must NOT abort the loop and
|
||||
/// strand sibling waiters on the same attribute, nor leave the throwing waiter
|
||||
/// registered with a live scheduled timeout. On throw we cancel that waiter's
|
||||
/// timeout, reply non-matched + ErrorMessage, remove it, and continue.
|
||||
/// </para>
|
||||
/// </summary>
|
||||
private void ResolveMatchedWaiters(AttributeValueChanged changed)
|
||||
{
|
||||
if (_attributeWaiters.Count == 0)
|
||||
return;
|
||||
|
||||
// Snapshot the candidate waiters on THIS attribute. Iterating a snapshot
|
||||
// (and NOT evaluating the test inside the LINQ filter) keeps removal mid-loop
|
||||
// safe and ensures one throwing test cannot abort materialization for siblings.
|
||||
var candidates = _attributeWaiters
|
||||
.Where(kvp => kvp.Value.AttributeName == changed.AttributeName)
|
||||
.ToList();
|
||||
|
||||
foreach (var (cid, pending) in candidates)
|
||||
{
|
||||
bool matched;
|
||||
try
|
||||
{
|
||||
// §4.2 quality gate ANDed with the value test, both INSIDE the guard:
|
||||
// in quality-gated mode a value reaching the target at Bad/Uncertain
|
||||
// quality is NOT a match — the waiter stays pending until it satisfies
|
||||
// the test at Good quality (or times out).
|
||||
matched =
|
||||
(!pending.RequireGoodQuality
|
||||
|| string.Equals(changed.Quality, "Good", StringComparison.Ordinal))
|
||||
&& pending.Test(changed.Value);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_logger.LogWarning(ex,
|
||||
"WaitForAttribute predicate threw while resolving waiter {CorrelationId} on {Instance}.{Attribute}; evicting it",
|
||||
cid, _instanceUniqueName, changed.AttributeName);
|
||||
pending.Timeout.Cancel();
|
||||
pending.Replyer.Tell(new WaitForAttributeResponse(
|
||||
cid, Matched: false, null, null, TimedOut: false,
|
||||
ErrorMessage: "Wait predicate threw: " + ex.Message));
|
||||
_attributeWaiters.Remove(cid);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (!matched)
|
||||
continue;
|
||||
|
||||
pending.Timeout.Cancel();
|
||||
pending.Replyer.Tell(new WaitForAttributeResponse(
|
||||
cid, Matched: true, changed.Value, changed.Quality, TimedOut: false));
|
||||
_attributeWaiters.Remove(cid);
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -1202,4 +1420,23 @@ public class InstanceActor : ReceiveActor
|
||||
/// Internal message for async override loading result.
|
||||
/// </summary>
|
||||
internal record LoadOverridesResult(Dictionary<string, string> Overrides, string? Error);
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §4.2): one registered, not-yet-satisfied waiter.
|
||||
/// </summary>
|
||||
/// <param name="AttributeName">The attribute this waiter watches (scope-resolved).</param>
|
||||
/// <param name="Test">The match test (decoded-target equality OR site-local predicate OR any-change).</param>
|
||||
/// <param name="Replyer">The original sender to reply to on match / timeout.</param>
|
||||
/// <param name="Timeout">The scheduled timeout handle, canceled on match.</param>
|
||||
/// <param name="RequireGoodQuality">
|
||||
/// Quality-gated ("Good"-only) mode (spec §4.2): when <c>true</c>, the resolve
|
||||
/// loop additionally requires <c>changed.Quality == "Good"</c> before the test
|
||||
/// can match.
|
||||
/// </param>
|
||||
private sealed record PendingWait(
|
||||
string AttributeName,
|
||||
Func<object?, bool> Test,
|
||||
IActorRef Replyer,
|
||||
ICancelable Timeout,
|
||||
bool RequireGoodQuality);
|
||||
}
|
||||
|
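The waiter-resolution loop in the `InstanceActor` hunks above can be sketched without Akka. This is a minimal illustration, not the actor's code: `SketchWaiter` and `WaiterSketch.Resolve` are hypothetical names, the timeout handle and reply actor are replaced by a returned list of outcomes, and only the snapshot, quality gate, and per-waiter guard are modelled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PendingWait: no actor refs or timeout handles.
public sealed record SketchWaiter(
    string AttributeName,
    Func<object?, bool> Test,
    bool RequireGoodQuality);

public static class WaiterSketch
{
    // Returns one (id, outcome) pair for every waiter that left the registry:
    // "matched" or "error: ...". Waiters that did not match stay registered.
    public static List<(string Id, string Outcome)> Resolve(
        Dictionary<string, SketchWaiter> registry,
        string attributeName, object? value, string quality)
    {
        var outcomes = new List<(string Id, string Outcome)>();

        // Snapshot first. The filter only compares names, so a throwing
        // script-supplied test cannot abort materialization.
        var candidates = registry
            .Where(kvp => kvp.Value.AttributeName == attributeName)
            .ToList();

        foreach (var (id, waiter) in candidates)
        {
            bool matched;
            try
            {
                // Quality gate ANDed with the value test, both inside the guard.
                matched =
                    (!waiter.RequireGoodQuality
                        || string.Equals(quality, "Good", StringComparison.Ordinal))
                    && waiter.Test(value);
            }
            catch (Exception ex)
            {
                // Evict the throwing waiter and keep going for its siblings.
                outcomes.Add((id, "error: " + ex.Message));
                registry.Remove(id);
                continue;
            }

            if (!matched)
                continue;

            outcomes.Add((id, "matched"));
            registry.Remove(id);
        }

        return outcomes;
    }
}
```

Removing from `registry` inside the loop is safe only because the loop walks the `candidates` snapshot rather than the dictionary itself.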
||||
@@ -221,7 +221,12 @@ public class ScriptExecutionActor : ReceiveActor
|
||||
// M2.12 (#25): thread the singleton site event logger so
|
||||
// recursion-limit violations at CallScript/CallShared emit a
|
||||
// script Error site event in addition to ILogger.LogError.
|
||||
siteEventLogger: siteEventLogger);
|
||||
siteEventLogger: siteEventLogger,
|
||||
// WaitForAttribute (spec §4.3/§4.4): thread the per-script
|
||||
// execution-timeout token so Attributes.WaitAsync's Ask is
|
||||
// bounded by the script's own ExecutionTimeoutSeconds — a
|
||||
// shorter script deadline wins over the wait's own timeout.
|
||||
scriptTimeoutToken: cts.Token);
|
||||
|
||||
var globals = new ScriptGlobals
|
||||
{
|
||||
|
||||
@@ -73,6 +73,107 @@ public class AttributeAccessor
|
||||
/// <returns>A task that represents the asynchronous operation.</returns>
|
||||
public Task SetAsync(string key, object? value)
|
||||
=> _ctx.SetAttribute(Resolve(key), AttributeValueCodec.Encode(value) ?? string.Empty);
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3-§5): waits event-driven until the attribute equals
|
||||
/// <paramref name="targetValue"/> (value-equality, codec-normalized), bounded by
|
||||
/// <paramref name="timeout"/>. Returns <c>true</c> if matched within the timeout,
|
||||
/// <c>false</c> on timeout (no throw). Honors the script's execution-timeout token.
|
||||
/// Scope/composition path resolution (<see cref="Resolve"/>) is applied just like
|
||||
/// <see cref="GetAsync"/> / <see cref="SetAsync"/>.
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Quality-agnostic by default (spec §4.2):</b> matching tests the VALUE, not
|
||||
/// the quality — a value arriving at Bad quality still satisfies the wait. Pass
|
||||
/// <paramref name="requireGoodQuality"/><c>:true</c> for quality-gated ("Good"-only)
|
||||
/// matching: a value reaching the target at Bad/Uncertain quality is ignored and
|
||||
/// the wait holds until the target is reached at "Good" quality (or times out).
|
||||
/// </para>
|
||||
///
|
||||
/// <para>
|
||||
/// Passing a <b>null</b> <paramref name="targetValue"/> means "match on any change":
|
||||
/// the wait then matches the next value the attribute receives — and matches
|
||||
/// IMMEDIATELY (fast-path) if the attribute already holds any value at registration.
|
||||
/// </para>
|
||||
/// </summary>
|
||||
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
|
||||
/// <param name="targetValue">
|
||||
/// The value to wait for (codec-encoded for comparison); <c>null</c> means
|
||||
/// "match on any change" (matches immediately if the attribute already has a value).
|
||||
/// </param>
|
||||
/// <param name="timeout">How long to wait before returning false.</param>
|
||||
/// <param name="requireGoodQuality">
|
||||
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to
|
||||
/// <c>false</c> (quality-agnostic — Bad/Uncertain-quality transients still match).
|
||||
/// </param>
|
||||
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
|
||||
public Task<bool> WaitAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttribute(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3-§5): predicate form — waits event-driven until
|
||||
/// <paramref name="predicate"/> returns <c>true</c> for the attribute's current
|
||||
/// value, bounded by <paramref name="timeout"/>. Site-local only (the predicate
|
||||
/// is an in-process delegate). Returns <c>true</c> if matched within the timeout,
|
||||
/// <c>false</c> on timeout (no throw). Scope/composition path resolution applies.
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Quality-agnostic by default (spec §4.2):</b> the predicate is tested against
|
||||
/// the VALUE, regardless of quality — a value arriving at Bad quality still
|
||||
/// satisfies the wait if the predicate passes. Pass <paramref name="requireGoodQuality"/>
|
||||
/// <c>:true</c> for quality-gated ("Good"-only) matching: a value satisfying the
|
||||
/// predicate at Bad/Uncertain quality is ignored until it does so at "Good" quality.
|
||||
/// </para>
|
||||
/// </summary>
|
||||
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
|
||||
/// <param name="predicate">The site-local predicate tested against the current value.</param>
|
||||
/// <param name="timeout">How long to wait before returning false.</param>
|
||||
/// <param name="requireGoodQuality">
|
||||
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to
|
||||
/// <c>false</c> (quality-agnostic).
|
||||
/// </param>
|
||||
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
|
||||
public Task<bool> WaitAsync(string key, Func<object?, bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttribute(Resolve(key), null, predicate, timeout, requireGoodQuality);
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3): richer value-equality form — like
|
||||
/// <see cref="WaitAsync(string, object?, TimeSpan, bool)"/> but returns the full
|
||||
/// <see cref="WaitResult"/> (matched flag + matched value + quality + timed-out
|
||||
/// flag) instead of a bare bool. Scope/composition path resolution
|
||||
/// (<see cref="Resolve"/>) is applied to <paramref name="key"/> just like the
|
||||
/// other accessors. Never throws on timeout — a timeout yields
|
||||
/// <c>WaitResult { Matched = false, TimedOut = true }</c>.
|
||||
/// </summary>
|
||||
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
|
||||
/// <param name="targetValue">
|
||||
/// The value to wait for (codec-encoded for comparison); <c>null</c> means
|
||||
/// "match on any change".
|
||||
/// </param>
|
||||
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
|
||||
/// <param name="requireGoodQuality">
|
||||
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to <c>false</c>.
|
||||
/// </param>
|
||||
/// <returns>The full <see cref="WaitResult"/> for the wait.</returns>
|
||||
public Task<WaitResult> WaitForAsync(string key, object? targetValue, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttributeFull(Resolve(key), AttributeValueCodec.Encode(targetValue), null, timeout, requireGoodQuality);
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3): richer predicate form — like
|
||||
/// <see cref="WaitAsync(string, Func{object?, bool}, TimeSpan, bool)"/> but returns
|
||||
/// the full <see cref="WaitResult"/>. Site-local only (the predicate is an
|
||||
/// in-process delegate). Scope/composition path resolution applies. Never throws
|
||||
/// on timeout (<c>WaitResult { Matched = false, TimedOut = true }</c>).
|
||||
/// </summary>
|
||||
/// <param name="key">The attribute key (scope-resolved before the wait is registered).</param>
|
||||
/// <param name="predicate">The site-local predicate tested against the current value.</param>
|
||||
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
|
||||
/// <param name="requireGoodQuality">
|
||||
/// <c>true</c> for quality-gated ("Good"-only) matching (spec §4.2); defaults to <c>false</c>.
|
||||
/// </param>
|
||||
/// <returns>The full <see cref="WaitResult"/> for the wait.</returns>
|
||||
public Task<WaitResult> WaitForAsync(string key, Func<object?, bool> predicate, TimeSpan timeout, bool requireGoodQuality = false)
|
||||
=> _ctx.WaitAttributeFull(Resolve(key), null, predicate, timeout, requireGoodQuality);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
|
||||
@@ -46,6 +46,16 @@ public class ScriptRuntimeContext
|
||||
private readonly ILogger _logger;
|
||||
private readonly string _instanceName;
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §4.3): the per-script execution-timeout token from
|
||||
/// the owning <c>ScriptExecutionActor</c>/<c>AlarmExecutionActor</c>
|
||||
/// (<c>cts.Token</c>). Bounds the <c>Attributes.WaitAsync</c> Ask so a script
|
||||
/// that hits its own <c>ExecutionTimeoutSeconds</c> abandons the wait. Defaults
|
||||
/// to <see cref="CancellationToken.None"/> for contexts that do not thread one
|
||||
/// (legacy callers / tests / the alarm path when it has no CTS).
|
||||
/// </summary>
|
||||
private readonly CancellationToken _scriptTimeoutToken;
|
||||
|
||||
/// <summary>
|
||||
/// WP-13: External system client for ExternalSystem.Call/CachedCall.
|
||||
/// </summary>
|
||||
@@ -194,6 +204,13 @@ public class ScriptRuntimeContext
|
||||
/// <c>ILogger.LogError</c> + throw. When null the existing behaviour is
|
||||
/// unchanged; all existing callers and tests remain source-compatible.
|
||||
/// </param>
|
||||
/// <param name="scriptTimeoutToken">
|
||||
/// WaitForAttribute (spec §4.3): the per-script execution-timeout token
|
||||
/// (<c>cts.Token</c> on the owning execution actor) used to bound
|
||||
/// <c>Attributes.WaitAsync</c>. Defaults to
|
||||
/// <see cref="CancellationToken.None"/> for callers / tests that do not
|
||||
/// thread one — those waits are bounded only by their own timeout.
|
||||
/// </param>
|
||||
public ScriptRuntimeContext(
|
||||
IActorRef instanceActor,
|
||||
IActorRef self,
|
||||
@@ -215,7 +232,8 @@ public class ScriptRuntimeContext
|
||||
Guid? executionId = null,
|
||||
Guid? parentExecutionId = null,
|
||||
string? sourceNode = null,
|
||||
ISiteEventLogger? siteEventLogger = null)
|
||||
ISiteEventLogger? siteEventLogger = null,
|
||||
CancellationToken scriptTimeoutToken = default)
|
||||
{
|
||||
_instanceActor = instanceActor;
|
||||
_self = self;
|
||||
@@ -245,6 +263,66 @@ public class ScriptRuntimeContext
|
||||
_parentExecutionId = parentExecutionId;
|
||||
// M2.12 (#25): optional — null when not wired (tests / AlarmExecutionActor).
|
||||
_siteEventLogger = siteEventLogger;
|
||||
// WaitForAttribute (spec §4.3): default(CancellationToken) == None when
|
||||
// not threaded in — the WaitAsync Ask is then bounded only by its own timeout.
|
||||
_scriptTimeoutToken = scriptTimeoutToken;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Audit Log #23 (M5.4): this run's own per-execution id. Exposed so a
|
||||
/// nested <c>Scripts.CallShared</c> can record it as the spawned shared
|
||||
/// script's <c>ParentExecutionId</c>, forming a true execution tree.
|
||||
/// </summary>
|
||||
internal Guid ExecutionId => _executionId;
|
||||
|
||||
/// <summary>
|
||||
/// Audit Log #23 (M5.4): the spawning execution's id for this run (null for
|
||||
/// a root run). Exposed for test assertions on the execution tree.
|
||||
/// </summary>
|
||||
internal Guid? ParentExecutionId => _parentExecutionId;
|
||||
|
||||
/// <summary>
|
||||
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): builds a child
|
||||
/// <see cref="ScriptRuntimeContext"/> for an inline <c>Scripts.CallShared</c>
|
||||
/// invocation. The shared script runs inline (no actor hop) but is modelled
|
||||
/// as its OWN execution node in the audit tree: it mints a fresh
|
||||
/// <see cref="_executionId"/> and records THIS run's <see cref="_executionId"/>
|
||||
/// as its <c>ParentExecutionId</c>, so <c>B → CallShared(C)</c> yields
|
||||
/// <c>C.ParentExecutionId == B.ExecutionId</c>. Every other dependency
|
||||
/// (actors, gateways, audit writer, site id, source node, call-depth) is
|
||||
/// carried over verbatim from this context.
|
||||
/// </summary>
|
||||
/// <param name="childCallDepth">The recursion depth of the shared-script call.</param>
|
||||
internal ScriptRuntimeContext CreateChildContextForSharedScript(int childCallDepth)
|
||||
{
|
||||
return new ScriptRuntimeContext(
|
||||
_instanceActor,
|
||||
_self,
|
||||
_sharedScriptLibrary,
|
||||
childCallDepth,
|
||||
_maxCallDepth,
|
||||
_askTimeout,
|
||||
_instanceName,
|
||||
_logger,
|
||||
_externalSystemClient,
|
||||
_databaseGateway,
|
||||
_storeAndForward,
|
||||
_siteCommunicationActor,
|
||||
_siteId,
|
||||
_sourceScript,
|
||||
_auditWriter,
|
||||
_operationTrackingStore,
|
||||
_cachedForwarder,
|
||||
// Fresh execution id for the shared-script run (pass null so the ctor mints one)…
|
||||
executionId: null,
|
||||
// …parented to THIS run's execution id (the spawner).
|
||||
parentExecutionId: _executionId,
|
||||
sourceNode: _sourceNode,
|
||||
siteEventLogger: _siteEventLogger,
|
||||
// WaitForAttribute (spec §4.3): an inline shared-script call shares the
|
||||
// parent run's execution-timeout token so a WaitAsync inside the shared
|
||||
// script is bounded by the SAME script deadline.
|
||||
scriptTimeoutToken: _scriptTimeoutToken);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -307,6 +385,115 @@ public class ScriptRuntimeContext
|
||||
return response.Value;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3-§5): waits event-driven for an attribute to reach
|
||||
/// a value (encoded-equality), satisfy a site-local predicate, or change at all,
|
||||
/// bounded by <paramref name="timeout"/>. Returns <c>true</c> if matched within
|
||||
/// the timeout, <c>false</c> on timeout; NEVER throws on timeout. Backs the
|
||||
/// accessor's <c>Attributes.WaitAsync</c>.
|
||||
///
|
||||
/// <para>
|
||||
/// The Ask is bounded by the script's own execution-timeout token (§4.3): a
|
||||
/// script that hits its <c>ExecutionTimeoutSeconds</c> abandons the wait. The
|
||||
/// Ask timeout is the wait timeout plus a small <see cref="_askTimeout"/> slack
|
||||
/// so the InstanceActor's own scheduled timeout reply is the authoritative path
|
||||
/// for the false/timed-out outcome, not the Ask deadline.
|
||||
/// </para>
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Quality-agnostic by default (spec §4.2):</b> a value arriving at Bad
|
||||
/// quality still satisfies the wait — the match tests the value, not the quality.
|
||||
/// Pass <paramref name="requireGoodQuality"/> as <c>true</c> for quality-gated ("Good"-only) matching.
|
||||
/// </para>
|
||||
///
|
||||
/// <para>
|
||||
/// <b>Never throws on timeout.</b> An <see cref="Akka.Actor.AskTimeoutException"/>
|
||||
/// (the pathological case where the InstanceActor's authoritative timeout reply
|
||||
/// never arrives — actor stopped/restarted) is caught and surfaced as <c>false</c>,
|
||||
/// matching the timeout contract. An <see cref="OperationCanceledException"/> /
|
||||
/// <see cref="TaskCanceledException"/> from the script-deadline token is NOT caught
|
||||
/// — it propagates to abort the script (intended §4.3 behaviour).
|
||||
/// </para>
|
||||
/// </summary>
|
||||
/// <param name="name">The scope-resolved attribute name to wait on.</param>
|
||||
/// <param name="targetValueEncoded">
|
||||
/// The codec-encoded target value; null (with null <paramref name="predicate"/>)
|
||||
/// means "any change".
|
||||
/// </param>
|
||||
/// <param name="predicate">Site-local predicate; null when the encoded target is used.</param>
|
||||
/// <param name="timeout">How long to wait before returning false.</param>
|
||||
/// <param name="requireGoodQuality">
|
||||
/// Quality-gated ("Good"-only) mode (spec §4.2): when <see langword="true"/>, a
|
||||
/// value reaching the target / satisfying the predicate at Bad/Uncertain quality
|
||||
/// is NOT a match — the wait holds until the value satisfies the test at Good
|
||||
/// quality (or times out). Defaults to <see langword="false"/> (quality-agnostic).
|
||||
/// </param>
|
||||
/// <returns><c>true</c> on match within the timeout; <c>false</c> on timeout.</returns>
|
||||
public async Task<bool> WaitAttribute(
|
||||
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
|
||||
bool requireGoodQuality = false)
|
||||
=> (await WaitInternal(name, targetValueEncoded, predicate, timeout, requireGoodQuality)).Matched;
|
||||
|
||||
/// <summary>
|
||||
/// WaitForAttribute (spec §3): the richer overload backing <c>Attributes.WaitForAsync</c>
|
||||
/// — identical semantics to <see cref="WaitAttribute"/> but surfaces the full
|
||||
/// <see cref="WaitResult"/> (matched flag + matched value + quality + timed-out
|
||||
/// flag) instead of a bare bool. Never throws on timeout (see <see cref="WaitInternal"/>).
|
||||
/// </summary>
|
||||
/// <param name="name">The scope-resolved attribute name to wait on.</param>
|
||||
/// <param name="targetValueEncoded">The codec-encoded target value; null (with null predicate) means "any change".</param>
|
||||
/// <param name="predicate">Site-local predicate; null when the encoded target is used.</param>
|
||||
/// <param name="timeout">How long to wait before returning a timed-out result.</param>
|
||||
/// <param name="requireGoodQuality">Quality-gated ("Good"-only) mode (spec §4.2); defaults to <see langword="false"/>.</param>
|
||||
/// <returns>The full <see cref="WaitResult"/> — on timeout: <c>Matched:false, TimedOut:true</c>.</returns>
|
||||
public async Task<WaitResult> WaitAttributeFull(
|
||||
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
|
||||
bool requireGoodQuality = false)
|
||||
{
|
||||
var r = await WaitInternal(name, targetValueEncoded, predicate, timeout, requireGoodQuality);
|
||||
return new WaitResult(r.Matched, r.Value, r.Quality, r.TimedOut);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Shared core for <see cref="WaitAttribute"/> / <see cref="WaitAttributeFull"/>:
|
||||
/// builds the <see cref="WaitForAttributeRequest"/> (incl. the §4.2
|
||||
/// <paramref name="requireGoodQuality"/> flag), Asks the InstanceActor bounded by
|
||||
/// the script's execution-timeout token, and returns the full response. An
|
||||
/// <see cref="AskTimeoutException"/> (the pathological case where the actor's own
|
||||
/// authoritative timeout reply never arrives — actor stopped/restarted) is caught
|
||||
/// and surfaced as a synthetic non-matched/timed-out response, preserving the
|
||||
/// "never throw on timeout" contract. An <see cref="OperationCanceledException"/> /
|
||||
/// <see cref="TaskCanceledException"/> from the script-deadline token is NOT caught
|
||||
/// — it propagates to abort the script (§4.3).
|
||||
/// </summary>
|
||||
private async Task<WaitForAttributeResponse> WaitInternal(
|
||||
string name, string? targetValueEncoded, Func<object?, bool>? predicate, TimeSpan timeout,
|
||||
bool requireGoodQuality)
|
||||
{
|
||||
var cid = Guid.NewGuid().ToString();
|
||||
var req = new WaitForAttributeRequest(
|
||||
cid, _instanceName, name, targetValueEncoded, predicate, timeout, DateTimeOffset.UtcNow,
|
||||
requireGoodQuality);
|
||||
|
||||
try
|
||||
{
|
||||
return await _instanceActor.Ask<WaitForAttributeResponse>(
|
||||
req, timeout + _askTimeout, _scriptTimeoutToken);
|
||||
}
|
||||
catch (AskTimeoutException)
|
||||
{
|
||||
// Pathological: the InstanceActor's own scheduled timeout reply never
|
||||
// arrived (e.g. the actor stopped/restarted under us). The helper's
|
||||
// contract is "false on timeout, never throw" — so synthesize a
|
||||
// non-matched/timed-out response rather than leaking the Ask exception.
|
||||
// OperationCanceledException / TaskCanceledException from the
|
||||
// script-deadline token are deliberately NOT caught here: they must
|
||||
// propagate to abort the script (§4.3).
|
||||
return new WaitForAttributeResponse(
|
||||
cid, Matched: false, null, null, TimedOut: true);
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Sets an attribute value. For data-connected attributes the Instance Actor
|
||||
/// forwards the write to the DCL, which writes the physical device; the
|
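The timeout contract documented for `WaitInternal` earlier in this hunk (timeout yields `false`, script-deadline cancellation propagates) can be sketched with plain tasks. This is an illustration under stated assumptions: it swaps Akka's `Ask` and `AskTimeoutException` for the BCL's `Task.WaitAsync` (.NET 6+) and `TimeoutException`, and `WaitContractSketch.WaitOrFalseAsync` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitContractSketch
{
    // Hypothetical helper: 'reply' stands in for the task returned by the Ask.
    public static async Task<bool> WaitOrFalseAsync(
        Task<bool> reply, TimeSpan timeout, CancellationToken scriptDeadline)
    {
        try
        {
            // Task.WaitAsync throws TimeoutException when the timeout elapses
            // and OperationCanceledException when the token is cancelled.
            return await reply.WaitAsync(timeout, scriptDeadline);
        }
        catch (TimeoutException)
        {
            // A timeout is an ordinary outcome: report "not matched".
            return false;
        }
        // OperationCanceledException is deliberately not caught, so a script
        // that hits its own deadline is aborted rather than told "false".
    }
}
```

Keeping the two exception types apart is what lets a caller tell "the value never arrived" from "the script ran out of time".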
||||
@@ -366,7 +553,14 @@ public class ScriptRuntimeContext
|
||||
scriptName,
|
||||
ScriptArgs.Normalize(parameters),
|
||||
nextDepth,
|
||||
correlationId);
|
||||
correlationId,
|
||||
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the child
|
||||
// script run is a NEW execution spawned BY this run. Its parent is
|
||||
// THIS run's own ExecutionId — NOT the inherited _parentExecutionId.
|
||||
// So A → CallScript(B) yields B.ParentExecutionId == A.ExecutionId,
|
||||
// building a true multi-level execution tree rather than flattening
|
||||
// every nested call under the original inbound spawner.
|
||||
ParentExecutionId: _executionId);
|
||||
|
||||
// Ask the Instance Actor, which routes to the appropriate Script Actor
|
||||
var result = await _instanceActor.Ask<ScriptCallResult>(request, _askTimeout);
|
||||
@@ -526,8 +720,14 @@ public class ScriptRuntimeContext
|
||||
throw new InvalidOperationException(msg);
|
||||
}
|
||||
|
||||
// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): the shared
|
||||
// script runs inline, but is modelled as its OWN execution node — a
|
||||
// child context mints a fresh ExecutionId parented to the caller's
|
||||
// ExecutionId, so its audit rows chain under the calling run.
|
||||
var childContext = _context.CreateChildContextForSharedScript(nextDepth);
|
||||
|
||||
return await _library.ExecuteAsync(
|
||||
scriptName, _context, ScriptArgs.Normalize(parameters), cancellationToken);
|
||||
scriptName, childContext, ScriptArgs.Normalize(parameters), cancellationToken);
|
||||
}
|
||||
}
|
||||
|
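The `ParentExecutionId` chaining described in the `ScriptRuntimeContext` hunks above (a child is parented to the spawner's own `ExecutionId`, not to the spawner's inherited parent) can be sketched in isolation. `ExecutionNodeSketch` and `SpawnChild` are hypothetical names, and the null-means-mint behaviour of the constructor is assumed from the diff's comments rather than shown in it.

```csharp
#nullable enable
using System;

// Hypothetical, stripped-down context: only the two ids are modelled.
public sealed class ExecutionNodeSketch
{
    public Guid ExecutionId { get; }
    public Guid? ParentExecutionId { get; }

    public ExecutionNodeSketch(Guid? executionId = null, Guid? parentExecutionId = null)
    {
        // Assumption: a null id means "mint a fresh one".
        ExecutionId = executionId ?? Guid.NewGuid();
        ParentExecutionId = parentExecutionId;
    }

    // The child is parented to THIS node's own id, so nested calls form a
    // multi-level tree instead of flattening under the original root.
    public ExecutionNodeSketch SpawnChild() =>
        new ExecutionNodeSketch(executionId: null, parentExecutionId: ExecutionId);
}
```

With a root `a`, `a.SpawnChild()` as `b`, and `b.SpawnChild()` as `c`, the chain is `c.ParentExecutionId == b.ExecutionId` and `b.ParentExecutionId == a.ExecutionId`, while `a.ParentExecutionId` stays null.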
||||
|
||||
@@ -362,6 +362,9 @@ public class AuditLogIngestActorCombinedTelemetryTests : TestKit, IClassFixture<
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -399,5 +402,8 @@ public class AuditLogIngestActorCombinedTelemetryTests : TestKit, IClassFixture<
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -216,6 +216,14 @@ public class AuditLogIngestActorTests : TestKit, IClassFixture<MsSqlMigrationFix
|
||||
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
|
||||
_inner.SwitchOutPartitionAsync(monthBoundary, ct);
|
||||
|
||||
public Task<long> PurgeChannelOlderThanAsync(
|
||||
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
|
||||
_inner.PurgeChannelOlderThanAsync(channel, threshold, batchSize, ct);
|
||||
|
||||
public Task<long> BackfillSourceNodeAsync(
|
||||
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
|
||||
_inner.BackfillSourceNodeAsync(sentinel, before, batchSize, ct);
|
||||
|
||||
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
|
||||
DateTime threshold, CancellationToken ct = default) =>
|
||||
_inner.GetPartitionBoundariesOlderThanAsync(threshold, ct);
|
||||
|
||||
@@ -51,6 +51,12 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
public DateTime? ThrowOnBoundary { get; set; }
|
||||
public Exception? BoundaryException { get; set; }
|
||||
|
||||
// M5.5 (T3): records every per-channel purge call as
|
||||
// (channel, threshold, batchSize) so tests can assert which channels the
|
||||
// actor chose to purge and with what window.
|
||||
public List<(string Channel, DateTime Threshold, int BatchSize)> ChannelPurges { get; } = new();
|
||||
public Func<string, long> RowsPerChannel { get; set; } = _ => 0L;
|
||||
|
||||
// The actor enumerator returns whichever list is configured here.
|
||||
// Mutating this between ticks lets tests simulate "no longer
|
||||
// eligible" boundaries on the second tick.
|
||||
@@ -80,6 +86,17 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
return Task.FromResult<IReadOnlyList<DateTime>>(Boundaries.ToArray());
|
||||
}
|
||||
|
||||
public Task<long> PurgeChannelOlderThanAsync(
|
||||
string channel, DateTime threshold, int batchSize, CancellationToken ct = default)
|
||||
{
|
||||
ChannelPurges.Add((channel, threshold, batchSize));
|
||||
return Task.FromResult(RowsPerChannel(channel));
|
||||
}
|
||||
|
||||
public Task<long> BackfillSourceNodeAsync(
|
||||
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
|
||||
public Task<ZB.MOM.WW.ScadaBridge.Commons.Types.AuditLogKpiSnapshot> GetKpiSnapshotAsync(
|
||||
TimeSpan window, DateTime? nowUtc = null, CancellationToken ct = default) =>
|
||||
Task.FromResult(new ZB.MOM.WW.ScadaBridge.Commons.Types.AuditLogKpiSnapshot(0L, 0L, 0L, nowUtc ?? DateTime.UtcNow));
|
||||
@@ -268,21 +285,32 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
// Today is ~2026-05-20 per the test environment. With RetentionDays =
|
||||
// 60 the actor computes threshold ≈ 2026-03-21:
|
||||
// * Jan partition (MAX = Jan 15) → older than threshold → PURGED
|
||||
// * Apr partition (MAX = Apr 15) → newer than threshold → KEPT
|
||||
// Seeds two rows within the defined pf_AuditLog_Month partition range (Jan 2026 –
|
||||
// Dec 2027). RetentionDays is computed dynamically so the purge threshold always
|
||||
// anchors near 2026-01-20, keeping the test date-independent:
|
||||
// old row = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → partition PURGED
|
||||
// kept row = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → partition KEPT
|
||||
//
|
||||
// Using a fixed thresholdAnchor rather than "N months ago" avoids the problem
|
||||
// of relative seeds landing before 2026-01-01 (the catch-all partition that
|
||||
// GetPartitionBoundariesOlderThanAsync never returns).
|
||||
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
|
||||
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
|
||||
|
||||
var oldOccurred = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
|
||||
var keptOccurred = new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc);
|
||||
|
||||
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var janEvt = ScadaBridgeAuditEventFactory.Create(
|
||||
var oldEvt = ScadaBridgeAuditEventFactory.Create(
|
||||
eventId: Guid.NewGuid(),
|
||||
occurredAtUtc: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
|
||||
occurredAtUtc: oldOccurred,
|
||||
channel: AuditChannel.ApiOutbound,
|
||||
kind: AuditKind.ApiCall,
|
||||
status: AuditStatus.Delivered,
|
||||
sourceSiteId: siteId);
|
||||
var aprEvt = ScadaBridgeAuditEventFactory.Create(
|
||||
var keptEvt = ScadaBridgeAuditEventFactory.Create(
|
||||
eventId: Guid.NewGuid(),
|
||||
occurredAtUtc: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc),
|
||||
occurredAtUtc: keptOccurred,
|
||||
channel: AuditChannel.ApiOutbound,
|
||||
kind: AuditKind.ApiCall,
|
||||
status: AuditStatus.Delivered,
|
||||
@@ -291,8 +319,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
await using (var seedContext = CreateMsSqlContext())
|
||||
{
|
||||
var seedRepo = new AuditLogRepository(seedContext);
|
||||
await seedRepo.InsertIfNotExistsAsync(janEvt);
|
||||
await seedRepo.InsertIfNotExistsAsync(aprEvt);
|
||||
await seedRepo.InsertIfNotExistsAsync(oldEvt);
|
||||
await seedRepo.InsertIfNotExistsAsync(keptEvt);
|
||||
}
|
||||
|
||||
// Wire the actor's DI scope to the real repository against the
|
||||
@@ -306,7 +334,7 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
services.AddScoped<IAuditLogRepository, AuditLogRepository>();
|
||||
var sp = services.BuildServiceProvider();
|
||||
|
||||
var auditOptions = new AuditLogOptions { RetentionDays = 60 };
|
||||
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
|
||||
var purgeOptions = new AuditLogPurgeOptions
|
||||
{
|
||||
IntervalHours = 24,
|
||||
@@ -320,13 +348,9 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
Options.Create(auditOptions),
|
||||
NullLogger<AuditLogPurgeActor>.Instance)));
|
||||
|
||||
// The probe receives one AuditLogPurgedEvent per partition the actor
|
||||
// purges per tick — other test runs that share the fixture DB may
|
||||
// also leave behind eligible partitions, but this test creates its
|
||||
// own fixture DB so the Jan-2026 partition is the only eligible one.
|
||||
// Use FishForMessage to filter just in case, with a generous timeout
|
||||
// because the real drop-and-rebuild dance against MSSQL routinely
|
||||
// takes a couple of seconds on a busy dev container.
|
||||
// Fish for the Jan-2026 partition boundary — the only eligible one in this
|
||||
// fixture DB. The generous timeout covers the real drop-and-rebuild dance
|
||||
// against MSSQL, which routinely takes a couple of seconds on a busy dev container.
|
||||
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
|
||||
isMessage: m => m.MonthBoundary == janBoundary,
|
||||
@@ -342,8 +366,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.ToListAsync();
|
||||
|
||||
Assert.DoesNotContain(rows, r => r.EventId == janEvt.EventId);
|
||||
Assert.Contains(rows, r => r.EventId == aprEvt.EventId);
|
||||
Assert.DoesNotContain(rows, r => r.EventId == oldEvt.EventId);
|
||||
Assert.Contains(rows, r => r.EventId == keptEvt.EventId);
|
||||
}
|
||||
|
||||
private ScadaBridgeDbContext CreateMsSqlContext() =>
|
||||
@@ -381,4 +405,90 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
Math.Abs((threshold - expected).TotalMinutes) < 1.0,
|
||||
$"threshold {threshold:o} should be within 1 minute of {expected:o}");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// 8. PerChannelOverride_ShorterThanGlobal_TriggersChannelPurge (M5.5 T3)
|
||||
// ---------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void PerChannelOverride_ShorterThanGlobal_TriggersChannelPurge()
|
||||
{
|
||||
// ApiOutbound has a 30-day override under a 365-day global window — strictly
|
||||
// shorter, so the actor must run a per-channel purge with a threshold of
|
||||
// ~today-30d and the configured batch size.
|
||||
var repo = new RecordingRepo { Boundaries = new List<DateTime>() };
|
||||
var purgeOptions = FastTickOptions();
|
||||
purgeOptions.ChannelPurgeBatchSizeConfigured = 1234;
|
||||
|
||||
// Build the options OUTSIDE the Props expression tree — a dictionary index
|
||||
// initializer (["key"] = value) is not legal inside an expression-tree lambda (CS8074).
|
||||
var auditOptions = Options.Create(new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 365,
|
||||
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiOutbound"] = 30 },
|
||||
});
|
||||
var purgeOptionsWrapped = Options.Create(purgeOptions);
|
||||
|
||||
var sp = BuildScopedProvider(repo);
|
||||
Sys.ActorOf(Props.Create(() => new AuditLogPurgeActor(
|
||||
sp,
|
||||
purgeOptionsWrapped,
|
||||
auditOptions,
|
||||
NullLogger<AuditLogPurgeActor>.Instance)));
|
||||
|
||||
AwaitAssert(
|
||||
() => Assert.Contains(repo.ChannelPurges, p => p.Channel == "ApiOutbound"),
|
||||
duration: TimeSpan.FromSeconds(3),
|
||||
interval: TimeSpan.FromMilliseconds(50));
|
||||
|
||||
var purge = repo.ChannelPurges.First(p => p.Channel == "ApiOutbound");
|
||||
Assert.Equal(1234, purge.BatchSize);
|
||||
|
||||
var expected = DateTime.UtcNow - TimeSpan.FromDays(30);
|
||||
Assert.True(
|
||||
Math.Abs((purge.Threshold - expected).TotalMinutes) < 1.0,
|
||||
$"channel threshold {purge.Threshold:o} should be within 1 minute of {expected:o}");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// 9. PerChannelOverride_EqualOrLongerThanGlobal_SkipsChannelPurge (M5.5 T3)
|
||||
// ---------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void PerChannelOverride_EqualOrLongerThanGlobal_SkipsChannelPurge()
|
||||
{
|
||||
// DbOutbound = 365 (== global) and Notification = 400 (> global, validator would
|
||||
// normally reject this but the actor must defensively skip it too). Neither is
|
||||
// SHORTER than the global window, so the actor must NOT issue a channel purge —
|
||||
// the global partition switch-out already governs those rows.
|
||||
var repo = new RecordingRepo { Boundaries = new List<DateTime>() };
|
||||
|
||||
// Build the options OUTSIDE the Props expression tree (CS8074).
|
||||
var auditOptions = Options.Create(new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 365,
|
||||
PerChannelRetentionDays = new Dictionary<string, int>
|
||||
{
|
||||
["DbOutbound"] = 365,
|
||||
["Notification"] = 400,
|
||||
},
|
||||
});
|
||||
var purgeOptions = Options.Create(FastTickOptions());
|
||||
|
||||
var sp = BuildScopedProvider(repo);
|
||||
Sys.ActorOf(Props.Create(() => new AuditLogPurgeActor(
|
||||
sp,
|
||||
purgeOptions,
|
||||
auditOptions,
|
||||
NullLogger<AuditLogPurgeActor>.Instance)));
|
||||
|
||||
// Wait for at least one tick (visible via the enumerator call), then assert no
|
||||
// channel purge was issued.
|
||||
AwaitAssert(
|
||||
() => Assert.True(repo.ThresholdQueries.Count >= 1),
|
||||
duration: TimeSpan.FromSeconds(3),
|
||||
interval: TimeSpan.FromMilliseconds(50));
|
||||
|
||||
Assert.Empty(repo.ChannelPurges);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -8,6 +8,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
|
||||
using IAuditInboundCeilingHitsCounter = ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.AuditLog.Tests.Central;
|
||||
|
||||
@@ -43,6 +44,12 @@ public class CentralAuditWriteFailuresTests : TestKit
|
||||
Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>());
|
||||
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
public Task<long> PurgeChannelOlderThanAsync(
|
||||
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
public Task<long> BackfillSourceNodeAsync(
|
||||
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
|
||||
DateTime threshold, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<DateTime>>(Array.Empty<DateTime>());
|
||||
@@ -163,6 +170,69 @@ public class CentralAuditWriteFailuresTests : TestKit
|
||||
var snapshot = new AuditCentralHealthSnapshot();
|
||||
Assert.Equal(0, snapshot.CentralAuditWriteFailures);
|
||||
Assert.Equal(0, snapshot.AuditRedactionFailure);
|
||||
Assert.Equal(0, snapshot.AuditInboundCeilingHits);
|
||||
Assert.Empty(snapshot.SiteAuditTelemetryStalled);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// M5.3 (T7) AuditInboundCeilingHits counter
|
||||
// AuditCentralHealthSnapshot implements IAuditInboundCeilingHitsCounter.
|
||||
// Incrementing through the interface surface is reflected on the snapshot.
|
||||
// ---------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void AuditInboundCeilingHits_StartsAtZero()
|
||||
{
|
||||
var snapshot = new AuditCentralHealthSnapshot();
|
||||
Assert.Equal(0, snapshot.AuditInboundCeilingHits);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AuditInboundCeilingHits_IncrementedThroughInterface_ReflectedOnSnapshot()
|
||||
{
|
||||
var snapshot = new AuditCentralHealthSnapshot();
|
||||
var counter = (IAuditInboundCeilingHitsCounter)snapshot;
|
||||
|
||||
counter.Increment();
|
||||
counter.Increment();
|
||||
counter.Increment();
|
||||
|
||||
Assert.Equal(3, snapshot.AuditInboundCeilingHits);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AuditInboundCeilingHits_IsThreadSafe()
|
||||
{
|
||||
// Interlocked increment must produce the correct count under concurrent
|
||||
// increments — same shape as the existing counter tests.
|
||||
var snapshot = new AuditCentralHealthSnapshot();
|
||||
var counter = (IAuditInboundCeilingHitsCounter)snapshot;
|
||||
const int incrementCount = 1000;
|
||||
|
||||
Parallel.For(0, incrementCount, _ => counter.Increment());
|
||||
|
||||
Assert.Equal(incrementCount, snapshot.AuditInboundCeilingHits);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AuditInboundCeilingHits_IsIndependentOfOtherCounters()
|
||||
{
|
||||
// Ceiling-hits increments must not cross-contaminate the other counters
|
||||
// and vice versa — each Interlocked field is independent.
|
||||
var snapshot = new AuditCentralHealthSnapshot();
|
||||
var ceilingCounter = (IAuditInboundCeilingHitsCounter)snapshot;
|
||||
var writeCounter = (ICentralAuditWriteFailureCounter)snapshot;
|
||||
var redactCounter = (ZB.MOM.WW.ScadaBridge.AuditLog.Payload.IAuditRedactionFailureCounter)snapshot;
|
||||
|
||||
ceilingCounter.Increment();
|
||||
ceilingCounter.Increment();
|
||||
writeCounter.Increment();
|
||||
redactCounter.Increment();
|
||||
redactCounter.Increment();
|
||||
redactCounter.Increment();
|
||||
|
||||
Assert.Equal(2, snapshot.AuditInboundCeilingHits);
|
||||
Assert.Equal(1, snapshot.CentralAuditWriteFailures);
|
||||
Assert.Equal(3, snapshot.AuditRedactionFailure);
|
||||
}
|
||||
}
|
||||
|
||||
+8
@@ -89,6 +89,14 @@ public class SiteAuditReconciliationActorTests : TestKit, IClassFixture<MsSqlMig
|
||||
public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
|
||||
public Task<long> PurgeChannelOlderThanAsync(
|
||||
string channel, DateTime threshold, int batchSize, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
|
||||
public Task<long> BackfillSourceNodeAsync(
|
||||
string sentinel, DateTime before, int batchSize, CancellationToken ct = default) =>
|
||||
Task.FromResult(0L);
|
||||
|
||||
public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
|
||||
DateTime threshold, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<DateTime>>(Array.Empty<DateTime>());
|
||||
|
||||
+103
@@ -50,4 +50,107 @@ public class AuditLogOptionsValidatorTests
|
||||
result.Failures!,
|
||||
f => f.Contains(nameof(AuditLogOptions.InboundMaxBytes), StringComparison.Ordinal));
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// M5.5 (T3) per-channel retention overrides
|
||||
// ---------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_ShorterThanGlobal_Passes()
|
||||
{
|
||||
// A per-channel window strictly shorter than the global window is the
|
||||
// sanctioned case — the purge actor expires those rows earlier via the
|
||||
// maintenance-path row DELETE.
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
var opts = new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 365,
|
||||
PerChannelRetentionDays = new Dictionary<string, int>
|
||||
{
|
||||
["ApiOutbound"] = 90,
|
||||
["Notification"] = 30, // floor (MinRetentionDays)
|
||||
},
|
||||
};
|
||||
|
||||
Assert.True(validator.Validate(null, opts).Succeeded);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_EqualToGlobal_Passes()
|
||||
{
|
||||
// Equal to global is allowed (the bound is [Min, RetentionDays] inclusive);
|
||||
// the purge actor simply treats it as a no-op since it is not SHORTER.
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
var opts = new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 200,
|
||||
PerChannelRetentionDays = new Dictionary<string, int> { ["DbOutbound"] = 200 },
|
||||
};
|
||||
|
||||
Assert.True(validator.Validate(null, opts).Succeeded);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_LongerThanGlobal_Fails()
|
||||
{
|
||||
// A per-channel window LONGER than the global window is meaningless under
|
||||
// month-partition switch-out (governed by the global window) and is rejected.
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
var opts = new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 100,
|
||||
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiInbound"] = 200 },
|
||||
};
|
||||
|
||||
var result = validator.Validate(null, opts);
|
||||
Assert.False(result.Succeeded);
|
||||
Assert.Contains(
|
||||
result.Failures!,
|
||||
f => f.Contains(nameof(AuditLogOptions.PerChannelRetentionDays), StringComparison.Ordinal)
|
||||
&& f.Contains("ApiInbound", StringComparison.Ordinal));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_BelowMinimum_Fails()
|
||||
{
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
var opts = new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 365,
|
||||
PerChannelRetentionDays = new Dictionary<string, int> { ["ApiOutbound"] = 29 },
|
||||
};
|
||||
|
||||
var result = validator.Validate(null, opts);
|
||||
Assert.False(result.Succeeded);
|
||||
Assert.Contains(
|
||||
result.Failures!,
|
||||
f => f.Contains(nameof(AuditLogOptions.PerChannelRetentionDays), StringComparison.Ordinal));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_UnknownChannelKey_Fails()
|
||||
{
|
||||
// Keys must be recognized AuditChannel names; a typo / unknown key is rejected
|
||||
// rather than silently ignored so a misconfiguration surfaces at boot.
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
var opts = new AuditLogOptions
|
||||
{
|
||||
RetentionDays = 365,
|
||||
PerChannelRetentionDays = new Dictionary<string, int> { ["NotAChannel"] = 90 },
|
||||
};
|
||||
|
||||
var result = validator.Validate(null, opts);
|
||||
Assert.False(result.Succeeded);
|
||||
Assert.Contains(
|
||||
result.Failures!,
|
||||
f => f.Contains("NotAChannel", StringComparison.Ordinal));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Validate_PerChannelRetention_DefaultEmpty_Passes()
|
||||
{
|
||||
// The default (no overrides) must pass — this is the common case.
|
||||
var validator = new AuditLogOptionsValidator();
|
||||
Assert.True(validator.Validate(null, new AuditLogOptions()).Succeeded);
|
||||
}
|
||||
}
|
||||
|
||||
+6
@@ -623,5 +623,11 @@ public class ParentExecutionIdCorrelationTests : TestKit, IClassFixture<MsSqlMig
|
||||
public Task<RouteToSetAttributesResponse> RouteToSetAttributesAsync(
|
||||
string siteId, RouteToSetAttributesRequest request, CancellationToken cancellationToken)
|
||||
=> throw new NotSupportedException();
|
||||
|
||||
// WaitForAttribute is not part of this fixture's routed-Call audit scenario;
|
||||
// mirror the other non-Call methods (unexercised here).
|
||||
public Task<RouteToWaitForAttributeResponse> RouteToWaitForAttributeAsync(
|
||||
string siteId, RouteToWaitForAttributeRequest request, CancellationToken cancellationToken)
|
||||
=> throw new NotSupportedException();
|
||||
}
|
||||
}
|
||||
|
||||
@@ -67,19 +67,25 @@ public class PartitionPurgeTests : TestKit, IClassFixture<MsSqlMigrationFixture>
|
||||
SqlConnection conn,
|
||||
Guid eventId,
|
||||
DateTime occurredAtUtc,
|
||||
string siteId)
|
||||
string siteId,
|
||||
string channel = "ApiOutbound",
|
||||
string kind = "ApiCall")
|
||||
{
|
||||
await using var cmd = conn.CreateCommand();
|
||||
// C5 (Task 2.5): dbo.AuditLog is now the 10 canonical columns + DetailsJson;
|
||||
// the ScadaBridge domain fields (channel/kind/status/sourceSiteId) ride in
|
||||
// DetailsJson and the SourceSiteId/Kind/Status computed columns auto-derive.
|
||||
// Action = "{channel}.{kind}", Category = channel name, Outcome = Success.
|
||||
// The channel/kind are parameterized so the M5.5 per-channel purge test can
|
||||
// seed multiple channels into the same partition.
|
||||
cmd.CommandText = @"
|
||||
INSERT INTO dbo.AuditLog
|
||||
(EventId, OccurredAtUtc, Actor, Action, Outcome, Category, Target, SourceNode, CorrelationId, DetailsJson)
|
||||
VALUES
|
||||
(@EventId, @OccurredAtUtc, NULL, 'ApiOutbound.ApiCall', 'Success', 'ApiOutbound', NULL, NULL, NULL,
|
||||
(@EventId, @OccurredAtUtc, NULL, @Action, 'Success', @Category, NULL, NULL, NULL,
|
||||
@DetailsJson);";
|
||||
cmd.Parameters.Add("@Action", System.Data.SqlDbType.VarChar, 64).Value = $"{channel}.{kind}";
|
||||
cmd.Parameters.Add("@Category", System.Data.SqlDbType.VarChar, 32).Value = channel;
|
||||
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
|
||||
// SqlDbType.DateTime2 with explicit Scale 7 matches the
|
||||
// OccurredAtUtc column shape (datetime2(7)) and avoids the implicit
|
||||
@@ -97,7 +103,7 @@ VALUES
|
||||
// the computed SourceSiteId column the verify queries scope on. payloadTruncated
|
||||
// is always present (the codec always writes the bool).
|
||||
var detailsJson =
|
||||
"{\"channel\":\"ApiOutbound\",\"kind\":\"ApiCall\",\"status\":\"Delivered\"," +
|
||||
"{\"channel\":\"" + channel + "\",\"kind\":\"" + kind + "\",\"status\":\"Delivered\"," +
|
||||
"\"sourceSiteId\":\"" + siteId + "\",\"payloadTruncated\":false}";
|
||||
cmd.Parameters.Add("@DetailsJson", System.Data.SqlDbType.NVarChar, -1).Value = detailsJson;
|
||||
await cmd.ExecuteNonQueryAsync();
|
||||
@@ -134,10 +140,49 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
NullLogger<AuditLogPurgeActor>.Instance)));
|
||||
}
|
||||
|
||||
private static (DateTime Jan, DateTime Feb, DateTime Mar) SeedOccurredAt() => (
|
||||
new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
|
||||
new DateTime(2026, 2, 15, 0, 0, 0, DateTimeKind.Utc),
|
||||
new DateTime(2026, 3, 15, 0, 0, 0, DateTimeKind.Utc));
|
||||
/// <summary>
|
||||
/// Returns three seed timestamps and a computed <c>RetentionDays</c> value that
|
||||
/// keep the purge outcome the same regardless of when the test runs.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// The partition function <c>pf_AuditLog_Month</c> has explicit boundaries only
|
||||
/// for 2026-01-01 through 2027-12-01. Rows outside that range land in the
|
||||
/// catch-all partitions which have no <c>partition_range_values</c> entry and are
|
||||
/// therefore never returned by
|
||||
/// <see cref="IAuditLogRepository.GetPartitionBoundariesOlderThanAsync"/>.
|
||||
/// All three seeds must therefore fall inside the defined boundary range.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// To remain date-independent, the helper computes <c>RetentionDays</c> dynamically
|
||||
/// so the purge threshold always lands near <b>2026-01-20</b>:
|
||||
/// <code>
|
||||
/// RetentionDays = (int)(DateTime.UtcNow - new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc)).TotalDays + 1
|
||||
/// </code>
|
||||
/// This gives:
|
||||
/// <list type="bullet">
|
||||
/// <item>Jan 15 2026 row → Jan 15 &lt; Jan 20 threshold → <b>PURGED</b>.</item>
|
||||
/// <item>Apr 15 / Jun 15 2026 rows → both after Jan 20 → <b>KEPT</b>.</item>
|
||||
/// </list>
|
||||
/// The threshold anchors to a fixed calendar point (~Jan 20 2026), so the
|
||||
/// relationship holds for any future run date as long as the explicit partition
|
||||
/// boundaries remain.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
private static (DateTime Old, DateTime Mid, DateTime Recent, int RetentionDays) SeedOccurredAt()
|
||||
{
|
||||
// Anchor the threshold midway through January 2026 — strictly after the
|
||||
// "old" seed (Jan 15) and strictly before the "mid" seed (Apr 15).
|
||||
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
|
||||
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
|
||||
|
||||
return (
|
||||
Old: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc), // in Jan-2026 partition → PURGED
|
||||
Mid: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc), // in Apr-2026 partition → KEPT
|
||||
Recent: new DateTime(2026, 6, 15, 0, 0, 0, DateTimeKind.Utc), // in Jun-2026 partition → KEPT
|
||||
RetentionDays: retentionDays
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// 1. EndToEnd_OldestPartition_PurgedViaActor_NewerKept
|
||||
@@ -148,24 +193,23 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
// Test date is ~2026-05-20 per environment. We want a threshold that
|
||||
// sits strictly between Jan 15 (the Jan partition's MAX) and Feb 15
|
||||
// (the Feb partition's MAX) so only the Jan-2026 partition is
|
||||
// eligible for purge. RetentionDays = 100 gives a threshold of
|
||||
// ~2026-02-09 — Jan 15 is older (purged), Feb 15 and Mar 15 are
|
||||
// newer (kept). The window between Jan 15 and Feb 15 is wide enough
|
||||
// (~30 days) to tolerate any plausible test-clock drift in CI.
|
||||
// Seeds three rows in distinct calendar months. RetentionDays is computed
|
||||
// dynamically so the purge threshold always lands near 2026-01-20 (see
|
||||
// SeedOccurredAt() for the full rationale):
|
||||
// Old = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → PURGED
|
||||
// Mid = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → KEPT
|
||||
// Recent = Jun 15 2026 → Jun 15 > threshold ~Jan 20 → KEPT
|
||||
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var janEventId = Guid.NewGuid();
|
||||
var febEventId = Guid.NewGuid();
|
||||
var marEventId = Guid.NewGuid();
|
||||
var (janOccurred, febOccurred, marOccurred) = SeedOccurredAt();
|
||||
var oldEventId = Guid.NewGuid();
|
||||
var midEventId = Guid.NewGuid();
|
||||
var recentEventId = Guid.NewGuid();
|
||||
var (oldOccurred, midOccurred, recentOccurred, retentionDays) = SeedOccurredAt();
|
||||
|
||||
await using (var seedConn = _fixture.OpenConnection())
|
||||
{
|
||||
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, febEventId, febOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, marEventId, marOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, midEventId, midOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, recentEventId, recentOccurred, siteId);
|
||||
}
|
||||
|
||||
// Wire the actor with a real EF context against the fixture DB.
|
||||
@@ -184,15 +228,11 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
IntervalHours = 24,
|
||||
IntervalOverride = TimeSpan.FromMilliseconds(100),
|
||||
};
|
||||
var auditOptions = new AuditLogOptions { RetentionDays = 100 };
|
||||
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
|
||||
|
||||
CreateActor(sp, purgeOptions, auditOptions);
|
||||
|
||||
// Wait for the actor's tick to purge the Jan-2026 partition.
|
||||
// Concurrent test runs against the same fixture might also create
|
||||
// eligible partitions, but each test class owns its own fixture DB
|
||||
// (MsSqlMigrationFixture seeds a guid-named DB per class), so the
|
||||
// Jan-2026 boundary is the only one this test can have produced.
|
||||
// The Jan-2026 partition boundary is the only eligible one in this fixture DB.
|
||||
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
|
||||
isMessage: m => m.MonthBoundary == janBoundary,
|
||||
@@ -200,9 +240,7 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
Assert.True(matched.RowsDeleted >= 1,
|
||||
$"Expected RowsDeleted >= 1 for Jan-2026 boundary; got {matched.RowsDeleted}.");
|
||||
|
||||
// Allow a brief settle in case the actor is mid-tick on Feb/Mar
|
||||
// (it shouldn't be, since RetentionDays = 90 means only Jan is
|
||||
// eligible, but the actor MAY re-enumerate quickly while we read).
|
||||
// Allow a brief settle in case the actor re-enumerates quickly.
|
||||
await Task.Delay(TimeSpan.FromMilliseconds(500));
|
||||
|
||||
await using var verify = CreateContext();
|
||||
@@ -210,11 +248,10 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.ToListAsync();
|
||||
|
||||
// Jan removed; Feb + Mar untouched. Because the test owns the site
|
||||
// id and the fixture DB, exact set membership is observable.
|
||||
Assert.DoesNotContain(rows, r => r.EventId == janEventId);
|
||||
Assert.Contains(rows, r => r.EventId == febEventId);
|
||||
Assert.Contains(rows, r => r.EventId == marEventId);
|
||||
// Old (Jan) removed; Mid (Apr) + Recent (Jun) untouched.
|
||||
Assert.DoesNotContain(rows, r => r.EventId == oldEventId);
|
||||
Assert.Contains(rows, r => r.EventId == midEventId);
|
||||
Assert.Contains(rows, r => r.EventId == recentEventId);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
@@ -226,20 +263,19 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
// Same shape as test 1 — purge the Jan-2026 partition and then
|
||||
// assert the UX_AuditLog_EventId index is still present. The
|
||||
// drop-and-rebuild dance briefly removes it inside its transaction
|
||||
// (the SWITCH PARTITION step requires the non-aligned unique index
|
||||
// to be absent), but step 5 rebuilds it before committing. Sanity-
|
||||
// checking the post-COMMIT shape here documents the invariant in an
|
||||
// assertable way.
|
||||
// Same shape as test 1 — purge the Jan-2026 partition and then assert the
|
||||
// UX_AuditLog_EventId index is still present. RetentionDays is computed
|
||||
// dynamically so the threshold always lands near 2026-01-20 (see SeedOccurredAt()).
|
||||
// The drop-and-rebuild dance briefly removes the index inside its transaction
|
||||
// (the SWITCH PARTITION step requires the non-aligned unique index to be absent),
|
||||
// but step 5 rebuilds it before committing.
|
||||
var siteId = "purge-uxidx-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var janEventId = Guid.NewGuid();
|
||||
var (janOccurred, _, _) = SeedOccurredAt();
|
||||
var oldEventId = Guid.NewGuid();
|
||||
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
|
||||
|
||||
await using (var seedConn = _fixture.OpenConnection())
|
||||
{
|
||||
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
|
||||
}
|
||||
|
||||
var services = new ServiceCollection();
|
||||
@@ -259,7 +295,7 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
IntervalHours = 24,
|
||||
IntervalOverride = TimeSpan.FromMilliseconds(100),
|
||||
},
|
||||
new AuditLogOptions { RetentionDays = 90 });
|
||||
new AuditLogOptions { RetentionDays = retentionDays });
|
||||
|
||||
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
probe.FishForMessage<AuditLogPurgedEvent>(
|
||||
@@ -281,18 +317,19 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
// Seed + purge a Jan-2026 row, THEN exercise InsertIfNotExistsAsync
|
||||
// twice for a fresh (May-2026) EventId. The second call must be a
|
||||
// no-op (duplicate-key collision swallowed by the repository, per
|
||||
// M2 Bundle A's race-fix) — which means the rebuilt
|
||||
// UX_AuditLog_EventId unique index is functioning as intended.
|
||||
// Seed + purge the Jan-2026 row, THEN exercise InsertIfNotExistsAsync twice for
|
||||
// a fresh recent EventId. The second call must be a no-op (duplicate-key collision
|
||||
// swallowed by the repository, per M2 Bundle A's race-fix) — which means the
|
||||
// rebuilt UX_AuditLog_EventId unique index is functioning as intended.
|
||||
// RetentionDays is computed dynamically so the threshold always lands near
|
||||
// 2026-01-20 (see SeedOccurredAt()).
|
||||
var siteId = "purge-idem-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var janEventId = Guid.NewGuid();
|
||||
var (janOccurred, _, _) = SeedOccurredAt();
|
||||
var oldEventId = Guid.NewGuid();
|
||||
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
|
||||
|
||||
await using (var seedConn = _fixture.OpenConnection())
|
||||
{
|
||||
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
|
||||
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
|
||||
}
|
||||
|
||||
var services = new ServiceCollection();
|
||||
@@ -312,7 +349,7 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
IntervalHours = 24,
|
||||
IntervalOverride = TimeSpan.FromMilliseconds(100),
|
||||
},
|
||||
new AuditLogOptions { RetentionDays = 90 });
|
||||
new AuditLogOptions { RetentionDays = retentionDays });
|
||||
|
||||
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
probe.FishForMessage<AuditLogPurgedEvent>(
|
||||
@@ -328,7 +365,7 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
await Task.Delay(TimeSpan.FromMilliseconds(500));
|
||||
|
||||
var freshEventId = Guid.NewGuid();
|
||||
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc);
|
||||
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc); // within partition range, well inside retention window
|
||||
var freshSite = "purge-idem-fresh-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var freshEvt = ScadaBridgeAuditEventFactory.Create(
|
||||
eventId: freshEventId,
|
||||
@@ -354,4 +391,87 @@ WHERE name = 'UX_AuditLog_EventId'
|
||||
Assert.Single(rows);
|
||||
Assert.Equal(freshEventId, rows[0].EventId);
|
||||
}
|
||||
|
||||
    // ---------------------------------------------------------------------
    // 4. PerChannelOverride_DeletesOnlyOverriddenChannelsOldRows (M5.5 T3)
    // ---------------------------------------------------------------------

    /// <summary>
    /// M5.5 (T3): exercises <see cref="IAuditLogRepository.PurgeChannelOlderThanAsync"/>
    /// directly against the real repository + fixture DB. Seeds, in the SAME partition,
    /// old + recent rows for an OVERRIDDEN channel (<c>ApiOutbound</c>) and old + recent
    /// rows for an UN-overridden channel (<c>DbOutbound</c>), then runs the per-channel
    /// purge for <c>ApiOutbound</c> only. Asserts:
    /// <list type="number">
    /// <item>The overridden channel's OLD rows are deleted.</item>
    /// <item>The overridden channel's RECENT rows (newer than the channel threshold) survive.</item>
    /// <item>The un-overridden channel's rows (old AND recent) are completely untouched;
    /// they follow the global window, which the channel purge never applies to them.</item>
    /// </list>
    /// This is the maintenance-path row DELETE; the fixture connects as <c>sa</c>, which
    /// the append-only writer-role DENYs do not bind (the role granularity is exercised
    /// in the repository/migration tests).
    /// </summary>
    [SkippableFact]
    public async Task PerChannelOverride_DeletesOnlyOverriddenChannelsOldRows()
    {
        Skip.IfNot(_fixture.Available, _fixture.SkipReason);

        var siteId = "perchannel-" + Guid.NewGuid().ToString("N").Substring(0, 8);

        // Two timestamps: one OLD (older than the channel threshold we will purge with)
        // and one RECENT (newer than it). Both sit comfortably inside the retention
        // window so the global partition purge would NOT touch either, isolating the
        // per-channel DELETE as the only force acting here.
        var oldOccurred = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
        var recentOccurred = new DateTime(2026, 5, 15, 0, 0, 0, DateTimeKind.Utc);

        var apiOldId = Guid.NewGuid();    // ApiOutbound, old    → SHOULD be deleted
        var apiRecentId = Guid.NewGuid(); // ApiOutbound, recent → SHOULD survive
        var dbOldId = Guid.NewGuid();     // DbOutbound, old     → SHOULD survive (un-overridden)
        var dbRecentId = Guid.NewGuid();  // DbOutbound, recent  → SHOULD survive

        await using (var seedConn = _fixture.OpenConnection())
        {
            await DirectInsertAsync(seedConn, apiOldId, oldOccurred, siteId, channel: "ApiOutbound", kind: "ApiCall");
            await DirectInsertAsync(seedConn, apiRecentId, recentOccurred, siteId, channel: "ApiOutbound", kind: "ApiCall");
            await DirectInsertAsync(seedConn, dbOldId, oldOccurred, siteId, channel: "DbOutbound", kind: "DbWrite");
            await DirectInsertAsync(seedConn, dbRecentId, recentOccurred, siteId, channel: "DbOutbound", kind: "DbWrite");
        }

        // Purge ApiOutbound rows older than a threshold that sits strictly between the
        // old (Jan 15) and recent (May 15) seeds, e.g. Mar 1. Only apiOldId qualifies.
        var channelThreshold = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);

        await using (var ctx = CreateContext())
        {
            var repo = new AuditLogRepository(ctx);
            var deleted = await repo.PurgeChannelOlderThanAsync(
                channel: "ApiOutbound",
                threshold: channelThreshold,
                batchSize: 2);

            Assert.Equal(1L, deleted);

            // Idempotent: a second run deletes nothing (the eligible row is gone).
            var deletedAgain = await repo.PurgeChannelOlderThanAsync(
                channel: "ApiOutbound",
                threshold: channelThreshold,
                batchSize: 2);
            Assert.Equal(0L, deletedAgain);
        }

        await using var verify = CreateContext();
        var rows = await verify.Set<AuditLogRow>()
            .Where(e => e.SourceSiteId == siteId)
            .ToListAsync();

        // Overridden channel: old gone, recent kept.
        Assert.DoesNotContain(rows, r => r.EventId == apiOldId);
        Assert.Contains(rows, r => r.EventId == apiRecentId);

        // Un-overridden channel: BOTH rows untouched (follow the global window).
        Assert.Contains(rows, r => r.EventId == dbOldId);
        Assert.Contains(rows, r => r.EventId == dbRecentId);
    }
|
||||
}
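The per-channel purge semantics pinned by the test above can be modelled in memory. This is only a sketch under stated assumptions: `PurgeChannelOlderThan` is a hypothetical stand-in, not the repository method, and the strict `<` comparison against the threshold is an assumption (the test seeds no row exactly on the threshold, so it does not pin that edge). The real method runs batched SQL against `dbo.AuditLog`; here a list of tuples plays the table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of a batched per-channel purge.
// Assumption: "older than" means strictly before the threshold.
static long PurgeChannelOlderThan(
    List<(string Id, string Channel, DateTime OccurredAtUtc)> table,
    string channel, DateTime threshold, int batchSize)
{
    long total = 0;
    while (true)
    {
        // Pick at most batchSize eligible rows, mirroring a DELETE TOP (@batch) loop.
        var batch = table
            .Where(r => r.Channel == channel && r.OccurredAtUtc < threshold)
            .Take(batchSize)
            .ToList();
        if (batch.Count == 0) break;
        foreach (var row in batch) table.Remove(row);
        total += batch.Count;
    }
    return total;
}

var oldTs = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
var recentTs = new DateTime(2026, 5, 15, 0, 0, 0, DateTimeKind.Utc);
var rows = new List<(string Id, string Channel, DateTime OccurredAtUtc)>
{
    ("apiOld", "ApiOutbound", oldTs),
    ("apiRecent", "ApiOutbound", recentTs),
    ("dbOld", "DbOutbound", oldTs),
    ("dbRecent", "DbOutbound", recentTs),
};

var threshold = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
var deleted = PurgeChannelOlderThan(rows, "ApiOutbound", threshold, batchSize: 2);
var deletedAgain = PurgeChannelOlderThan(rows, "ApiOutbound", threshold, batchSize: 2);

Console.WriteLine($"deleted={deleted}, deletedAgain={deletedAgain}");
Console.WriteLine(string.Join(", ", rows.Select(r => r.Id)));
```

The second call returning zero is what makes the purge safe to re-run on a timer: eligibility is a pure function of channel and timestamp, so rows already removed cannot be counted twice.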
|
||||
|
||||
@@ -0,0 +1,244 @@
|
||||
using System.CommandLine;
|
||||
using System.Net;
|
||||
using System.Text;
|
||||
using System.Text.Json;
|
||||
using ZB.MOM.WW.ScadaBridge.CLI;
|
||||
using ZB.MOM.WW.ScadaBridge.CLI.Commands;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
|
||||
|
||||
/// <summary>
|
||||
/// Tests for the <c>scadabridge audit backfill-source-node</c> subcommand
|
||||
/// (Audit Log #23 M5.6 T5): argument parsing, request-body construction,
|
||||
/// HTTP wiring, and CLI scaffold.
|
||||
/// </summary>
|
||||
[Collection("Console")]
|
||||
public class AuditBackfillCommandTests
|
||||
{
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// BuildRequestBody
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void BuildRequestBody_DefaultArgs_ContainsExpectedFields()
|
||||
{
|
||||
var args = new AuditBackfillSourceNodeArgs
|
||||
{
|
||||
Sentinel = "unknown",
|
||||
Before = "2026-01-01T00:00:00Z",
|
||||
BatchSize = 5000,
|
||||
};
|
||||
|
||||
var body = AuditBackfillHelpers.BuildRequestBody(args);
|
||||
using var doc = JsonDocument.Parse(body);
|
||||
var root = doc.RootElement;
|
||||
|
||||
Assert.Equal("unknown", root.GetProperty("sentinel").GetString());
|
||||
Assert.Equal("2026-01-01T00:00:00Z", root.GetProperty("before").GetString());
|
||||
Assert.Equal(5000, root.GetProperty("batchSize").GetInt32());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BuildRequestBody_CustomSentinelAndBatch_ReflectedInJson()
|
||||
{
|
||||
var args = new AuditBackfillSourceNodeArgs
|
||||
{
|
||||
Sentinel = "pre-feature",
|
||||
Before = "2026-06-01T00:00:00Z",
|
||||
BatchSize = 1000,
|
||||
};
|
||||
|
||||
var body = AuditBackfillHelpers.BuildRequestBody(args);
|
||||
using var doc = JsonDocument.Parse(body);
|
||||
var root = doc.RootElement;
|
||||
|
||||
Assert.Equal("pre-feature", root.GetProperty("sentinel").GetString());
|
||||
Assert.Equal("2026-06-01T00:00:00Z", root.GetProperty("before").GetString());
|
||||
Assert.Equal(1000, root.GetProperty("batchSize").GetInt32());
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// RunBackfillAsync — HTTP execution
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
private sealed class CapturingHandler : HttpMessageHandler
|
||||
{
|
||||
private readonly HttpStatusCode _status;
|
||||
private readonly string _responseBody;
|
||||
|
||||
public CapturingHandler(HttpStatusCode status, string responseBody)
|
||||
{
|
||||
_status = status;
|
||||
_responseBody = responseBody;
|
||||
}
|
||||
|
||||
public string? LastRequestUri { get; private set; }
|
||||
public string? LastRequestBody { get; private set; }
|
||||
public string? LastMethod { get; private set; }
|
||||
|
||||
protected override async Task<HttpResponseMessage> SendAsync(
|
||||
HttpRequestMessage request, CancellationToken cancellationToken)
|
||||
{
|
||||
LastRequestUri = request.RequestUri!.PathAndQuery;
|
||||
LastMethod = request.Method.Method;
|
||||
if (request.Content != null)
|
||||
{
|
||||
LastRequestBody = await request.Content.ReadAsStringAsync(cancellationToken);
|
||||
}
|
||||
return new HttpResponseMessage(_status)
|
||||
{
|
||||
Content = new StringContent(_responseBody, Encoding.UTF8, "application/json"),
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
private static string SuccessBody(long rowsUpdated = 42, string sentinel = "unknown", string before = "2026-01-01T00:00:00.0000000Z")
|
||||
=> JsonSerializer.Serialize(new { rowsUpdated, sentinel, before });
|
||||
|
||||
[Fact]
|
||||
public async Task RunBackfill_Success_ReturnsZeroAndWritesOutput()
|
||||
{
|
||||
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody(rowsUpdated: 42));
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var args = new AuditBackfillSourceNodeArgs
|
||||
{
|
||||
Sentinel = "unknown",
|
||||
Before = "2026-01-01T00:00:00Z",
|
||||
BatchSize = 5000,
|
||||
};
|
||||
|
||||
var exit = await AuditBackfillHelpers.RunBackfillAsync(client, args, output);
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
var text = output.ToString();
|
||||
Assert.Contains("42", text);
|
||||
Assert.Contains("backfill complete", text, StringComparison.OrdinalIgnoreCase);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunBackfill_RequestUri_ContainsBackfillPath()
|
||||
{
|
||||
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody());
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
await AuditBackfillHelpers.RunBackfillAsync(
|
||||
client,
|
||||
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
|
||||
output);
|
||||
|
||||
Assert.Contains("backfill-source-node", handler.LastRequestUri);
|
||||
Assert.Equal("POST", handler.LastMethod);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunBackfill_RequestBody_ContainsSentinelAndBefore()
|
||||
{
|
||||
var handler = new CapturingHandler(HttpStatusCode.OK, SuccessBody());
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
await AuditBackfillHelpers.RunBackfillAsync(
|
||||
client,
|
||||
new AuditBackfillSourceNodeArgs
|
||||
{
|
||||
Sentinel = "pre-feature",
|
||||
Before = "2026-01-01T00:00:00Z",
|
||||
BatchSize = 2000,
|
||||
},
|
||||
output);
|
||||
|
||||
Assert.NotNull(handler.LastRequestBody);
|
||||
using var doc = JsonDocument.Parse(handler.LastRequestBody!);
|
||||
Assert.Equal("pre-feature", doc.RootElement.GetProperty("sentinel").GetString());
|
||||
Assert.Equal("2026-01-01T00:00:00Z", doc.RootElement.GetProperty("before").GetString());
|
||||
Assert.Equal(2000, doc.RootElement.GetProperty("batchSize").GetInt32());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunBackfill_Http403_ReturnsExitCode2()
|
||||
{
|
||||
var handler = new CapturingHandler(HttpStatusCode.Forbidden,
|
||||
"{\"error\":\"Permission required.\",\"code\":\"UNAUTHORIZED\"}");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditBackfillHelpers.RunBackfillAsync(
|
||||
client,
|
||||
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
|
||||
output);
|
||||
|
||||
Assert.Equal(2, exit);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunBackfill_Http500_ReturnsExitCode1()
|
||||
{
|
||||
var handler = new CapturingHandler(HttpStatusCode.InternalServerError,
|
||||
"{\"error\":\"boom\",\"code\":\"INTERNAL\"}");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditBackfillHelpers.RunBackfillAsync(
|
||||
client,
|
||||
new AuditBackfillSourceNodeArgs { Sentinel = "unknown", Before = "2026-01-01T00:00:00Z" },
|
||||
output);
|
||||
|
||||
Assert.Equal(1, exit);
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// CLI parsing
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void BackfillSourceNode_Subcommand_ExistsInAuditCommandGroup()
|
||||
{
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var parse = root.Parse(new[] { "audit", "backfill-source-node", "--help" });
|
||||
Assert.Empty(parse.Errors);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BackfillSourceNode_BeforeOption_IsRequired()
|
||||
{
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var (exit, _, err) = AuditCommandTestHarness.Invoke(root, "audit", "backfill-source-node");
|
||||
Assert.NotEqual(0, exit);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BackfillSourceNode_HelpText_DescribesSentinelAndBefore()
|
||||
{
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var output = new StringWriter();
|
||||
var exit = root.Parse(new[] { "audit", "backfill-source-node", "--help" })
|
||||
.Invoke(new InvocationConfiguration { Output = output });
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
var text = output.ToString();
|
||||
Assert.Contains("sentinel", text, StringComparison.OrdinalIgnoreCase);
|
||||
Assert.Contains("before", text, StringComparison.OrdinalIgnoreCase);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BackfillSourceNode_DefaultSentinel_IsUnknown()
|
||||
{
|
||||
// Named for the documented default sentinel ("unknown"), but the assertions below
// only check that the subcommand is registered and has a description.
|
||||
var url = new Option<string>("--url") { Recursive = true };
|
||||
var username = new Option<string>("--username") { Recursive = true };
|
||||
var password = new Option<string>("--password") { Recursive = true };
|
||||
var format = CliOptions.CreateFormatOption();
|
||||
|
||||
var auditGroup = AuditCommands.Build(url, format, username, password);
|
||||
var backfillCmd = auditGroup.Subcommands
|
||||
.FirstOrDefault(c => c.Name == "backfill-source-node");
|
||||
|
||||
Assert.NotNull(backfillCmd);
|
||||
|
||||
// The subcommand exists and its description is non-empty.
|
||||
Assert.False(string.IsNullOrWhiteSpace(backfillCmd!.Description));
|
||||
}
|
||||
}
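The contract these tests pin down is small: a JSON body with `sentinel`, `before`, and `batchSize` keys, and an exit code of 0 on success, 2 on HTTP 403, and 1 on HTTP 500. A minimal sketch of that contract follows. `BuildBody` and `MapExitCode` are hypothetical names, not the `AuditBackfillHelpers` members, and treating every other non-2xx status as exit code 1 is an assumption the tests do not cover.

```csharp
using System;
using System.Net;
using System.Text.Json;

// Hypothetical helper: anonymous-type property names become the JSON keys as written.
static string BuildBody(string sentinel, string before, int batchSize)
    => JsonSerializer.Serialize(new { sentinel, before, batchSize });

// Hypothetical helper: 403 -> 2, any 2xx -> 0, everything else -> 1 (assumed).
static int MapExitCode(HttpStatusCode status)
{
    if (status == HttpStatusCode.Forbidden) return 2;
    var code = (int)status;
    return code >= 200 && code < 300 ? 0 : 1;
}

var body = BuildBody("unknown", "2026-01-01T00:00:00Z", 5000);
Console.WriteLine(body);
Console.WriteLine(MapExitCode(HttpStatusCode.OK));
Console.WriteLine(MapExitCode(HttpStatusCode.Forbidden));
Console.WriteLine(MapExitCode(HttpStatusCode.InternalServerError));
```

Keeping the permission failure on its own exit code lets a calling script tell "fix your credentials" apart from "retry later" without parsing the error text.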
|
||||
@@ -5,8 +5,8 @@ namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
|
||||
|
||||
/// <summary>
|
||||
/// Scaffold tests for the <c>scadabridge audit</c> command group (Audit Log #23 M8-T1).
|
||||
/// Verifies the parent command exists with its three subcommands and that every leaf
|
||||
/// has an action wired.
|
||||
/// Verifies the parent command exists with its subcommands and that every leaf
|
||||
/// has an action wired. Updated for M5.6 T5 to cover <c>backfill-source-node</c>.
|
||||
/// </summary>
|
||||
public class AuditCommandsScaffoldTests
|
||||
{
|
||||
@@ -27,11 +27,13 @@ public class AuditCommandsScaffoldTests
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Audit_HasThreeSubcommands_QueryExportVerifyChain()
|
||||
public void Audit_HasFiveSubcommands_QueryExportTreeVerifyChainBackfillSourceNode()
|
||||
{
|
||||
var audit = BuildAudit();
|
||||
var names = audit.Subcommands.Select(c => c.Name).OrderBy(n => n).ToArray();
|
||||
Assert.Equal(new[] { "export", "query", "verify-chain" }, names);
|
||||
Assert.Equal(
|
||||
new[] { "backfill-source-node", "export", "query", "tree", "verify-chain" },
|
||||
names);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
@@ -48,7 +50,9 @@ public class AuditCommandsScaffoldTests
|
||||
var text = output.ToString();
|
||||
Assert.Contains("query", text);
|
||||
Assert.Contains("export", text);
|
||||
Assert.Contains("tree", text);
|
||||
Assert.Contains("verify-chain", text);
|
||||
Assert.Contains("backfill-source-node", text);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
|
||||
@@ -0,0 +1,346 @@
|
||||
using System.CommandLine;
|
||||
using System.Net;
|
||||
using System.Text;
|
||||
using System.Text.Json;
|
||||
using ZB.MOM.WW.ScadaBridge.CLI;
|
||||
using ZB.MOM.WW.ScadaBridge.CLI.Commands;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.CLI.Tests.Commands;
|
||||
|
||||
/// <summary>
|
||||
/// Tests for the <c>scadabridge audit tree</c> subcommand (Audit Log #23 M5.1-T8):
|
||||
/// tree rendering (table format), JSON output, error handling, and CLI parsing.
|
||||
/// </summary>
|
||||
[Collection("Console")]
|
||||
public class AuditTreeCommandTests
|
||||
{
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// JSON parsing helpers
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
private static string NodeJson(
|
||||
string executionId,
|
||||
string? parentId = null,
|
||||
int rowCount = 3,
|
||||
string[]? channels = null,
|
||||
string[]? statuses = null,
|
||||
string? siteId = "plant-a",
|
||||
string? instanceId = "inst-1",
|
||||
string? first = "2026-05-20T10:00:00Z",
|
||||
string? last = "2026-05-20T10:01:00Z")
|
||||
{
|
||||
var parentStr = parentId != null ? $"\"{parentId}\"" : "null";
|
||||
var channelArr = channels is { Length: > 0 }
|
||||
? "[" + string.Join(",", channels.Select(c => $"\"{c}\"")) + "]"
|
||||
: "[\"ApiOutbound\"]";
|
||||
var statusArr = statuses is { Length: > 0 }
|
||||
? "[" + string.Join(",", statuses.Select(s => $"\"{s}\"")) + "]"
|
||||
: "[\"Delivered\"]";
|
||||
var siteStr = siteId != null ? $"\"{siteId}\"" : "null";
|
||||
var instanceStr = instanceId != null ? $"\"{instanceId}\"" : "null";
|
||||
var firstStr = first != null ? $"\"{first}\"" : "null";
|
||||
var lastStr = last != null ? $"\"{last}\"" : "null";
|
||||
|
||||
return $@"{{
|
||||
""executionId"":""{executionId}"",
|
||||
""parentExecutionId"":{parentStr},
|
||||
""rowCount"":{rowCount},
|
||||
""channels"":{channelArr},
|
||||
""statuses"":{statusArr},
|
||||
""sourceSiteId"":{siteStr},
|
||||
""sourceInstanceId"":{instanceStr},
|
||||
""firstOccurredAtUtc"":{firstStr},
|
||||
""lastOccurredAtUtc"":{lastStr}
|
||||
}}";
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// ParseNodes
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void ParseNodes_ValidArray_ReturnsDtos()
|
||||
{
|
||||
var root = "11111111-1111-1111-1111-111111111111";
|
||||
var child = "22222222-2222-2222-2222-222222222222";
|
||||
var json = $"[{NodeJson(root)},{NodeJson(child, parentId: root)}]";
|
||||
|
||||
var nodes = AuditTreeHelpers.ParseNodes(json);
|
||||
|
||||
Assert.Equal(2, nodes.Length);
|
||||
Assert.Equal(Guid.Parse(root), nodes[0].ExecutionId);
|
||||
Assert.Null(nodes[0].ParentExecutionId);
|
||||
Assert.Equal(Guid.Parse(child), nodes[1].ExecutionId);
|
||||
Assert.Equal(Guid.Parse(root), nodes[1].ParentExecutionId);
|
||||
Assert.Equal(3, nodes[0].RowCount);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void ParseNodes_EmptyArray_ReturnsEmpty()
|
||||
{
|
||||
var nodes = AuditTreeHelpers.ParseNodes("[]");
|
||||
Assert.Empty(nodes);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void ParseNodes_InvalidJson_ReturnsEmpty()
|
||||
{
|
||||
var nodes = AuditTreeHelpers.ParseNodes("not-json");
|
||||
Assert.Empty(nodes);
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// WriteTable — ASCII tree rendering
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void WriteTable_EmptyNodes_PrintsFallbackMessage()
|
||||
{
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteTable(Array.Empty<AuditTreeNodeDto>(), Guid.NewGuid(), output);
|
||||
Assert.Contains("no execution tree found", output.ToString());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WriteTable_SingleRootNode_PrintsWithNoIndent()
|
||||
{
|
||||
var rootId = Guid.Parse("11111111-1111-1111-1111-111111111111");
|
||||
var nodes = AuditTreeHelpers.ParseNodes($"[{NodeJson(rootId.ToString())}]");
|
||||
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteTable(nodes, rootId, output);
|
||||
var text = output.ToString();
|
||||
|
||||
// Root node printed at column 0 (no leading spaces).
|
||||
var line = text.Split('\n', StringSplitOptions.RemoveEmptyEntries).First();
|
||||
Assert.StartsWith(rootId.ToString("D"), line);
|
||||
Assert.Contains("[*]", line); // queried node marked
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WriteTable_MultiLevelTree_IndentsChildrenCorrectly()
|
||||
{
|
||||
var rootId = "11111111-1111-1111-1111-111111111111";
|
||||
var childId = "22222222-2222-2222-2222-222222222222";
|
||||
var grandChildId = "33333333-3333-3333-3333-333333333333";
|
||||
var json = $"[{NodeJson(rootId)},{NodeJson(childId, parentId: rootId)},{NodeJson(grandChildId, parentId: childId)}]";
|
||||
var nodes = AuditTreeHelpers.ParseNodes(json);
|
||||
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteTable(nodes, Guid.Parse(rootId), output);
|
||||
var lines = output.ToString().Split('\n', StringSplitOptions.RemoveEmptyEntries);
|
||||
|
||||
        // Root: no indent.
        Assert.True(lines[0].StartsWith(rootId, StringComparison.OrdinalIgnoreCase) ||
                    lines[0].StartsWith(rootId.ToUpper(), StringComparison.OrdinalIgnoreCase));

        // Child: 2-space indent (exactly 2, not 4+).
        var childLine = lines.First(l => l.Contains(childId));
        Assert.StartsWith("  ", childLine);
        Assert.False(childLine.StartsWith("    ", StringComparison.Ordinal), "child should be indented exactly 2, not 4+");

        // Grandchild: 4-space indent.
        var grandLine = lines.First(l => l.Contains(grandChildId));
        Assert.StartsWith("    ", grandLine);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WriteTable_QueriedNodeIsMarked_OthersAreNot()
|
||||
{
|
||||
var rootId = Guid.Parse("11111111-1111-1111-1111-111111111111");
|
||||
var childId = Guid.Parse("22222222-2222-2222-2222-222222222222");
|
||||
var json = $"[{NodeJson(rootId.ToString())},{NodeJson(childId.ToString(), parentId: rootId.ToString())}]";
|
||||
var nodes = AuditTreeHelpers.ParseNodes(json);
|
||||
|
||||
// Query via child ID — child should be marked, root should not.
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteTable(nodes, childId, output);
|
||||
var lines = output.ToString().Split('\n', StringSplitOptions.RemoveEmptyEntries);
|
||||
|
||||
var childLine = lines.First(l => l.Contains(childId.ToString("D")));
|
||||
var rootLine = lines.First(l => l.Contains(rootId.ToString("D")));
|
||||
Assert.Contains("[*]", childLine);
|
||||
Assert.DoesNotContain("[*]", rootLine);
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// WriteJson
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void WriteJson_ValidNodes_EmitsValidJsonArray()
|
||||
{
|
||||
var rootId = "11111111-1111-1111-1111-111111111111";
|
||||
var childId = "22222222-2222-2222-2222-222222222222";
|
||||
var nodes = AuditTreeHelpers.ParseNodes($"[{NodeJson(rootId)},{NodeJson(childId, parentId: rootId)}]");
|
||||
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteJson(nodes, output);
|
||||
var text = output.ToString();
|
||||
|
||||
using var doc = JsonDocument.Parse(text);
|
||||
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
|
||||
Assert.Equal(2, doc.RootElement.GetArrayLength());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void WriteJson_EmptyNodes_EmitsEmptyArray()
|
||||
{
|
||||
var output = new StringWriter();
|
||||
AuditTreeHelpers.WriteJson(Array.Empty<AuditTreeNodeDto>(), output);
|
||||
var text = output.ToString().Trim();
|
||||
|
||||
using var doc = JsonDocument.Parse(text);
|
||||
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
|
||||
Assert.Equal(0, doc.RootElement.GetArrayLength());
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// RunTreeAsync — HTTP execution
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
private sealed class FixedHandler : HttpMessageHandler
|
||||
{
|
||||
private readonly HttpStatusCode _status;
|
||||
private readonly string _body;
|
||||
|
||||
public FixedHandler(HttpStatusCode status, string body)
|
||||
{
|
||||
_status = status;
|
||||
_body = body;
|
||||
}
|
||||
|
||||
public string? LastRequestUri { get; private set; }
|
||||
|
||||
protected override Task<HttpResponseMessage> SendAsync(
|
||||
HttpRequestMessage request, CancellationToken cancellationToken)
|
||||
{
|
||||
LastRequestUri = request.RequestUri!.PathAndQuery;
|
||||
return Task.FromResult(new HttpResponseMessage(_status)
|
||||
{
|
||||
Content = new StringContent(_body, Encoding.UTF8, "application/json"),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_Success_ReturnsZeroAndWritesOutput()
|
||||
{
|
||||
var rootId = "11111111-1111-1111-1111-111111111111";
|
||||
var json = $"[{NodeJson(rootId)}]";
|
||||
var handler = new FixedHandler(HttpStatusCode.OK, json);
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditTreeHelpers.RunTreeAsync(
|
||||
client, Guid.Parse(rootId), "table", output);
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
Assert.Contains(rootId, output.ToString());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_EmptyResponse_ReturnsZeroWithFallbackMessage()
|
||||
{
|
||||
var handler = new FixedHandler(HttpStatusCode.OK, "[]");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditTreeHelpers.RunTreeAsync(
|
||||
client, Guid.NewGuid(), "table", output);
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
Assert.Contains("no execution tree found", output.ToString());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_JsonFormat_EmitsValidJson()
|
||||
{
|
||||
var rootId = "11111111-1111-1111-1111-111111111111";
|
||||
var handler = new FixedHandler(HttpStatusCode.OK, $"[{NodeJson(rootId)}]");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditTreeHelpers.RunTreeAsync(
|
||||
client, Guid.Parse(rootId), "json", output);
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
using var doc = JsonDocument.Parse(output.ToString());
|
||||
Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_Http403_ReturnsExitCode2()
|
||||
{
|
||||
var handler = new FixedHandler(HttpStatusCode.Forbidden, "{\"error\":\"nope\",\"code\":\"UNAUTHORIZED\"}");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditTreeHelpers.RunTreeAsync(
|
||||
client, Guid.NewGuid(), "table", output);
|
||||
|
||||
Assert.Equal(2, exit);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_Http500_ReturnsExitCode1()
|
||||
{
|
||||
var handler = new FixedHandler(HttpStatusCode.InternalServerError, "{\"error\":\"boom\",\"code\":\"INTERNAL\"}");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
var exit = await AuditTreeHelpers.RunTreeAsync(
|
||||
client, Guid.NewGuid(), "table", output);
|
||||
|
||||
Assert.Equal(1, exit);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RunTree_RequestUrlContainsExecutionId()
|
||||
{
|
||||
var id = Guid.Parse("11111111-1111-1111-1111-111111111111");
|
||||
var handler = new FixedHandler(HttpStatusCode.OK, "[]");
|
||||
var client = new ManagementHttpClient(new HttpClient(handler), "http://localhost:9001", "u", "p");
|
||||
var output = new StringWriter();
|
||||
|
||||
await AuditTreeHelpers.RunTreeAsync(client, id, "table", output);
|
||||
|
||||
Assert.Contains("11111111-1111-1111-1111-111111111111", handler.LastRequestUri);
|
||||
Assert.Contains("executionId", handler.LastRequestUri);
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
// CLI parsing — audit tree subcommand
|
||||
// ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
[Fact]
|
||||
public void Tree_Subcommand_ExistsInAuditCommandGroup()
|
||||
{
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var parse = root.Parse(new[] { "audit", "tree", "--help" });
|
||||
// --help is never an error, exit 0.
|
||||
Assert.Empty(parse.Errors);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Tree_ExecutionIdOption_IsRequired()
|
||||
{
|
||||
// Invoking without --execution-id must produce an error (the option is Required).
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var (exit, _, err) = AuditCommandTestHarness.Invoke(root, "audit", "tree");
|
||||
// System.CommandLine returns non-zero for a missing required option.
|
||||
Assert.NotEqual(0, exit);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Tree_HelpText_DescribesExecutionId()
|
||||
{
|
||||
var root = AuditCommandTestHarness.BuildRoot();
|
||||
var output = new StringWriter();
|
||||
var exit = root.Parse(new[] { "audit", "tree", "--help" })
|
||||
.Invoke(new InvocationConfiguration { Output = output });
|
||||
|
||||
Assert.Equal(0, exit);
|
||||
Assert.Contains("execution-id", output.ToString());
|
||||
}
|
||||
}
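The rendering rules asserted by the `WriteTable` tests (root at column 0, two spaces of indent per level, `[*]` on the queried node) can be sketched as a depth-first walk over parent pointers. `RenderTree` is a hypothetical function, not `AuditTreeHelpers.WriteTable`; the real output also carries columns such as row count and channels, which are omitted here, and treating a node whose parent is absent from the result set as a root is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical renderer: one line per node, indented 2 spaces per tree depth.
static string RenderTree(IReadOnlyList<(Guid Id, Guid? Parent)> nodes, Guid queried)
{
    var sb = new StringBuilder();
    var known = new HashSet<Guid>(nodes.Select(n => n.Id));

    // Group children under parents that are actually present in the result set.
    var byParent = nodes
        .Where(n => n.Parent.HasValue && known.Contains(n.Parent.Value))
        .ToLookup(n => n.Parent.GetValueOrDefault());

    void Walk((Guid Id, Guid? Parent) node, int depth)
    {
        sb.Append(' ', depth * 2).Append(node.Id.ToString("D"));
        if (node.Id == queried) sb.Append(" [*]"); // mark the node the user asked about
        sb.Append('\n');
        foreach (var child in byParent[node.Id]) Walk(child, depth + 1);
    }

    // Assumption: a node with no parent, or an unknown parent, is a root.
    foreach (var root in nodes.Where(n => !n.Parent.HasValue || !known.Contains(n.Parent.Value)))
        Walk(root, 0);

    return sb.ToString();
}

var rootId = Guid.Parse("11111111-1111-1111-1111-111111111111");
var childId = Guid.Parse("22222222-2222-2222-2222-222222222222");
var grandId = Guid.Parse("33333333-3333-3333-3333-333333333333");

var text = RenderTree(
    new[] { (rootId, (Guid?)null), (childId, (Guid?)rootId), (grandId, (Guid?)childId) },
    queried: childId);
Console.Write(text);
```

Because the marker follows the queried id and not the root, the same response can be rendered from any node in the tree and still show where the user entered it.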
|
||||
@@ -13,6 +13,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Audit;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Notification;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
|
||||
using ZB.MOM.WW.ScadaBridge.Communication;
|
||||
using ZB.MOM.WW.ScadaBridge.HealthMonitoring;
|
||||
using HealthPage = ZB.MOM.WW.ScadaBridge.CentralUI.Components.Pages.Monitoring.Health;
|
||||
@@ -232,13 +233,18 @@ public class HealthPageTests : BunitContext
|
||||
|
||||
/// <summary>
|
||||
/// Stand-in for the Site Call Audit actor. Replies to the KPI request with
|
||||
/// the test's currently-scripted response.
|
||||
/// the test's currently-scripted response. Also handles the per-node KPI
|
||||
/// request (T6: M5.2) with an empty-nodes success reply so the Health page
|
||||
/// can complete initialization without a 30-second Ask timeout.
|
||||
/// </summary>
|
||||
private sealed class ScriptedSiteCallAuditActor : ReceiveActor
|
||||
{
|
||||
public ScriptedSiteCallAuditActor(HealthPageTests test)
|
||||
{
|
||||
Receive<SiteCallKpiRequest>(_ => Sender.Tell(test._siteCallKpiReply));
|
||||
Receive<PerNodeSiteCallKpiRequest>(req => Sender.Tell(
|
||||
new PerNodeSiteCallKpiResponse(req.CorrelationId, Success: true, ErrorMessage: null,
|
||||
Nodes: Array.Empty<SiteCallNodeKpiSnapshot>())));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -153,7 +153,9 @@ public class NotificationKpisPageTests : BunitContext
|
||||
|
||||
/// <summary>
|
||||
/// Stand-in for the notification-outbox actor. Replies to each KPI message
|
||||
/// type with the test's currently-scripted response.
|
||||
/// type with the test's currently-scripted response. Also handles the per-node
|
||||
/// KPI request (T6: M5.2) with an empty-nodes success reply so the page can
|
||||
/// complete initialization without a 30-second Ask timeout.
|
||||
/// </summary>
|
||||
private sealed class ScriptedOutboxActor : ReceiveActor
|
||||
{
|
||||
@@ -161,6 +163,9 @@ public class NotificationKpisPageTests : BunitContext
|
||||
{
|
||||
Receive<NotificationKpiRequest>(_ => Sender.Tell(test._kpiReply));
|
||||
Receive<PerSiteNotificationKpiRequest>(_ => Sender.Tell(test._perSiteReply));
|
||||
Receive<PerNodeNotificationKpiRequest>(req => Sender.Tell(
|
||||
new PerNodeNotificationKpiResponse(req.CorrelationId, Success: true, ErrorMessage: null,
|
||||
Nodes: Array.Empty<NodeNotificationKpiSnapshot>())));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
+154
-4
@@ -31,9 +31,40 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
|
||||
/// targeting the AuditLog entity are NOT covered and must never be introduced.
|
||||
/// Additionally, the scan is line-oriented: DML where the keyword and table name appear
|
||||
/// on separate lines is an accepted, undetected edge case.
|
||||
///
/// <b>Allow-list.</b> Two narrow maintenance-path exemptions carry the exact
/// <see cref="AuditPurgeAllowedMarker"/> trailing comment:
/// <list type="bullet">
/// <item><description>
/// M5.5 (T3): <c>AuditLogRepository.PurgeChannelOlderThanAsync</c>, the
/// one sanctioned batched <c>DELETE TOP (@batch) FROM dbo.AuditLog</c>,
/// running on the purge/maintenance connection.
/// </description></item>
/// <item><description>
/// M5.6 (T5): <c>AuditLogRepository.BackfillSourceNodeAsync</c>, the
/// one sanctioned batched <c>UPDATE TOP (@batch) dbo.AuditLog SET SourceNode</c>,
/// running on the maintenance connection. The sentinel backfill is a
/// one-time ops procedure; the append-only invariant still applies to all
/// other columns and all other UPDATE forms.
/// </description></item>
/// </list>
/// The allow-list is applied in the file-scan test only
/// (<see cref="ConfigurationDatabase_ShouldNotContainAuditLogMutations"/>). The
/// raw mutation matcher (<see cref="ContainsAuditLogMutation"/>) is marker-blind,
/// so the matcher's self-tests remain honest and any OTHER UPDATE/DELETE against
/// AuditLog (or any DML lacking the marker) still fails the build.
/// </summary>
|
||||
public class AuditLogAppendOnlyGuardTests
|
||||
{
|
||||
/// <summary>
|
||||
/// The exact trailing-comment marker that exempts a single sanctioned
|
||||
/// maintenance-path DML line from the append-only guard. Carried at the END of
|
||||
/// the SQL constant string in both <c>AuditLogRepository.PurgeChannelOlderThanAsync</c>
|
||||
/// (M5.5 T3 batched DELETE) and <c>AuditLogRepository.BackfillSourceNodeAsync</c>
|
||||
/// (M5.6 T5 batched UPDATE). Kept deliberately specific so it cannot be pasted
|
||||
/// onto an unrelated mutation without a reviewer noticing.
|
||||
/// </summary>
|
||||
internal const string AuditPurgeAllowedMarker = "AUDIT-PURGE-ALLOWED";
|
||||
// ---------------------------------------------------------------------------
|
||||
// Source root location — same walk-up pattern used by ArchitecturalConstraintTests
|
||||
// in the Commons.Tests project.
|
||||
@@ -133,11 +164,38 @@ public class AuditLogAppendOnlyGuardTests
|
||||
return AuditLogMutationPattern.IsMatch(text);
|
||||
}
|
||||
|
||||
// The DELETE branch tolerates an optional TOP (...) batch-size clause between
|
||||
// DELETE and the (optional) FROM — e.g. "DELETE TOP (@batch) FROM dbo.AuditLog"
|
||||
// (the M5.5 T3 batched purge shape). Without this the guard would silently miss a
|
||||
// batched row DELETE against AuditLog, which is exactly the kind of mutation it
|
||||
// must catch. The TOP sub-pattern is (?:TOP\s*\(.*?\)\s+)?: optional, and lazy inside
|
||||
// the parens, so it stops at the first ')' that lets the rest of the pattern match.
|
||||
//
|
||||
// The UPDATE branch similarly tolerates an optional TOP (...) clause between
|
||||
// UPDATE and (optional schema.) AuditLog — e.g.
|
||||
// "UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel …"
|
||||
// (the M5.6 T5 batched backfill shape).
|
||||
private static readonly Regex AuditLogMutationPattern = new(
|
||||
@"\bUPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
|
||||
@"|\bDELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
|
||||
@"\bUPDATE\s+(?:TOP\s*\(.*?\)\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
|
||||
@"|\bDELETE\s+(?:TOP\s*\(.*?\)\s+)?(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
|
||||
RegexOptions.IgnoreCase | RegexOptions.Compiled);
|
||||
|
||||
/// <summary>
|
||||
/// Returns <see langword="true"/> when <paramref name="line"/> carries the narrow
|
||||
/// <see cref="AuditPurgeAllowedMarker"/> exemption. Sanctioned uses are:
|
||||
/// <list type="bullet">
|
||||
/// <item><description>M5.5 T3 — the per-channel maintenance-path batched DELETE.</description></item>
|
||||
/// <item><description>M5.6 T5 — the SourceNode sentinel batched UPDATE.</description></item>
|
||||
/// </list>
|
||||
/// A flagged line that lacks the marker is NOT allow-listed. The mutation matcher
|
||||
/// itself stays marker-blind; the allow-list is applied only by the file-scan test,
|
||||
/// so the matcher's self-tests still observe the raw mutation.
|
||||
/// </summary>
|
||||
/// <param name="line">A single source line already known to contain a mutation.</param>
|
||||
/// <returns><see langword="true"/> if the line is a sanctioned maintenance-path exemption.</returns>
|
||||
internal static bool IsAllowListed(string line) =>
|
||||
line.Contains(AuditPurgeAllowedMarker, StringComparison.Ordinal);
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Guard test: scan every *.cs file in ConfigurationDatabase (excluding
|
||||
// Designer/Snapshot EF artefacts and the obj/ directory).
|
||||
@@ -168,7 +226,7 @@ public class AuditLogAppendOnlyGuardTests
|
||||
var lines = content.Split('\n');
|
||||
for (var i = 0; i < lines.Length; i++)
|
||||
{
|
||||
if (ContainsAuditLogMutation(lines[i]))
|
||||
if (ContainsAuditLogMutation(lines[i]) && !IsAllowListed(lines[i]))
|
||||
{
|
||||
var relativePath = Path.GetRelativePath(sourceDir, file);
|
||||
violations.Add($"{relativePath}:{i + 1}: {lines[i].Trim()}");
|
||||
@@ -179,7 +237,7 @@ public class AuditLogAppendOnlyGuardTests
|
||||
Assert.True(violations.Count == 0,
|
||||
"AuditLog append-only guard: found UPDATE/DELETE targeting dbo.AuditLog " +
|
||||
"in ConfigurationDatabase source. AuditLog is APPEND-ONLY (retention uses " +
|
||||
"partition-switch DDL, not row DELETE). Violation(s):\n" +
|
||||
"partition-switch DDL, not row DELETE/UPDATE). Violation(s):\n" +
|
||||
string.Join("\n", violations));
|
||||
}
|
||||
|
||||
@@ -285,6 +343,27 @@ public class AuditLogAppendOnlyGuardTests
|
||||
// DELETE FROM [AuditLog] — bracketed table, no schema prefix.
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"DELETE FROM [AuditLog] WHERE OccurredAtUtc < @threshold;"));
|
||||
|
||||
// ---- Batched DELETE TOP (...) forms (M5.5 T3 purge shape) ----
|
||||
// The matcher must catch a batched DELETE against AuditLog regardless of the
|
||||
// marker — the allow-list (IsAllowListed) is what forgives the ONE sanctioned
|
||||
// line, not the matcher.
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;"));
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"DELETE TOP (5000) FROM dbo.AuditLog WHERE OccurredAtUtc < @threshold;"));
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"DELETE TOP(100) FROM [dbo].[AuditLog] WHERE Status = 'Parked';"));
|
||||
|
||||
// ---- Batched UPDATE TOP (...) forms (M5.6 T5 backfill shape) ----
|
||||
// The matcher must also catch a batched UPDATE against AuditLog, regardless of
|
||||
// the marker — the allow-list is what forgives the ONE sanctioned backfill line.
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;"));
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"UPDATE TOP (500) dbo.AuditLog SET SourceNode = 'unknown' WHERE SourceNode IS NULL;"));
|
||||
Assert.True(ContainsAuditLogMutation(
|
||||
"UPDATE TOP(100) [dbo].[AuditLog] SET SourceNode = @s WHERE SourceNode IS NULL;"));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
@@ -315,4 +394,75 @@ public class AuditLogAppendOnlyGuardTests
|
||||
Assert.False(ContainsAuditLogMutation(
|
||||
"DELETE FROM dbo.SiteCalls WHERE TerminalAtUtc < @cutoff;"));
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Allow-list self-tests (M5.5 T3 / M5.6 T5) — prove the narrow exemption only
|
||||
// forgives the marked maintenance-path DML and still blocks everything else.
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void AllowList_ForgivesMarkedPurgeDelete_ButMatcherStillTrips()
|
||||
{
|
||||
// The sanctioned per-channel purge DELETE — verbatim shape from
|
||||
// AuditLogRepository.PurgeChannelOlderThanAsync, carrying the trailing marker.
|
||||
const string sanctioned =
|
||||
"\"DELETE TOP (@batch) FROM dbo.AuditLog WHERE Category = @channel AND OccurredAtUtc < @threshold;\"; " +
|
||||
"// AUDIT-PURGE-ALLOWED: per-channel retention override (M5.5 T3), maintenance path";
|
||||
|
||||
// The raw matcher STILL sees the mutation (the matcher is marker-blind) ...
|
||||
Assert.True(ContainsAuditLogMutation(sanctioned));
|
||||
// ... but the allow-list forgives it because of the trailing marker.
|
||||
Assert.True(IsAllowListed(sanctioned));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AllowList_ForgivesMarkedBackfillUpdate_ButMatcherStillTrips()
|
||||
{
|
||||
// The sanctioned SourceNode sentinel backfill UPDATE — verbatim shape from
|
||||
// AuditLogRepository.BackfillSourceNodeAsync, carrying the trailing marker.
|
||||
const string sanctioned =
|
||||
"\"UPDATE TOP (@batch) dbo.AuditLog SET SourceNode = @sentinel WHERE SourceNode IS NULL AND OccurredAtUtc < @before;\"; " +
|
||||
"// AUDIT-PURGE-ALLOWED: SourceNode sentinel backfill (M5.6 T5), maintenance path";
|
||||
|
||||
// The raw matcher STILL sees the mutation (the matcher is marker-blind) ...
|
||||
Assert.True(ContainsAuditLogMutation(sanctioned));
|
||||
// ... but the allow-list forgives it because of the trailing marker.
|
||||
Assert.True(IsAllowListed(sanctioned));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AllowList_DoesNotForgive_UnmarkedStrayDelete()
|
||||
{
|
||||
// A stray DELETE against AuditLog WITHOUT the marker — exactly the kind of
|
||||
// regression the guard exists to catch. It must be flagged (matcher) AND not
|
||||
// forgiven (allow-list), so the file-scan test would record it as a violation.
|
||||
const string stray = "DELETE FROM dbo.AuditLog WHERE Status = 'Parked';";
|
||||
|
||||
Assert.True(ContainsAuditLogMutation(stray));
|
||||
Assert.False(IsAllowListed(stray),
|
||||
"A DELETE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AllowList_DoesNotForgive_UnmarkedStrayUpdate()
|
||||
{
|
||||
// A stray UPDATE against AuditLog WITHOUT the marker — must still trip the guard.
|
||||
const string stray = "UPDATE dbo.AuditLog SET Status = 'Corrected' WHERE EventId = @id;";
|
||||
|
||||
Assert.True(ContainsAuditLogMutation(stray));
|
||||
Assert.False(IsAllowListed(stray),
|
||||
"An UPDATE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AllowList_DoesNotForgive_BatchedUpdateWithoutMarker()
|
||||
{
|
||||
// A batched UPDATE TOP ... AuditLog without the marker — the TOP clause variant
|
||||
// must also be caught and not forgiven without the explicit marker.
|
||||
const string stray = "UPDATE TOP (500) dbo.AuditLog SET SourceNode = 'unknown' WHERE SourceNode IS NULL;";
|
||||
|
||||
Assert.True(ContainsAuditLogMutation(stray));
|
||||
Assert.False(IsAllowListed(stray),
|
||||
"A batched UPDATE against AuditLog without the AUDIT-PURGE-ALLOWED marker must NOT be allow-listed.");
|
||||
}
|
||||
}
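
The guard in the file above combines a marker-blind matcher with a separate allow-list check. A minimal standalone sketch of that composition follows. The regex and marker string are copied from the diff; the `AuditGuardSketch` type, its method names, and the sample SQL lines are hypothetical and only illustrate the idea, not the project's actual test class.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch only: the type and method names here are hypothetical.
public static class AuditGuardSketch
{
    // Same pattern shape as the guard above: UPDATE or DELETE, an optional
    // TOP (...) clause, an optional FROM (DELETE only), then [dbo.]AuditLog.
    private static readonly Regex Mutation = new Regex(
        @"\bUPDATE\s+(?:TOP\s*\(.*?\)\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
        @"|\bDELETE\s+(?:TOP\s*\(.*?\)\s+)?(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
        RegexOptions.IgnoreCase);

    private const string Marker = "AUDIT-PURGE-ALLOWED";

    // Marker-blind: reports any UPDATE/DELETE aimed at AuditLog.
    public static bool ContainsMutation(string line) => Mutation.IsMatch(line);

    // A line is a violation only if it mutates AuditLog AND lacks the marker.
    public static bool IsViolation(string line) =>
        ContainsMutation(line) && !line.Contains(Marker, StringComparison.Ordinal);

    public static void Main()
    {
        var stray = "DELETE TOP (500) FROM dbo.AuditLog WHERE OccurredAtUtc < @t;";
        var sanctioned = stray + " // AUDIT-PURGE-ALLOWED: maintenance path";
        var unrelated = "DELETE FROM dbo.SiteCalls WHERE TerminalAtUtc < @cutoff;";

        Console.WriteLine(IsViolation(stray));      // True
        Console.WriteLine(IsViolation(sanctioned)); // False
        Console.WriteLine(IsViolation(unrelated));  // False
    }
}
```

Keeping the matcher unaware of the marker means its self-tests still see every mutation, and only the file scan decides what is forgiven.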
|
||||
|
||||
+237
@@ -0,0 +1,237 @@
|
||||
using Microsoft.Data.SqlClient;
|
||||
using Microsoft.EntityFrameworkCore;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
|
||||
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Repositories;
|
||||
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests.Migrations;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests.Maintenance;
|
||||
|
||||
/// <summary>
|
||||
/// Integration tests for <see cref="AuditLogRepository.BackfillSourceNodeAsync"/>
|
||||
/// (M5.6 T5 — SourceNode sentinel backfill).
|
||||
///
|
||||
/// <para>
|
||||
/// These tests exercise the real <see cref="AuditLogRepository"/> against a
|
||||
/// per-class <see cref="MsSqlMigrationFixture"/> database, mirroring the
|
||||
/// style of <c>PartitionPurgeTests</c>. All tests are guarded with
|
||||
/// <c>[SkippableFact]</c> and skipped when the MSSQL container is absent.
|
||||
/// </para>
|
||||
/// </summary>
|
||||
public class BackfillSourceNodeTests : IClassFixture<MsSqlMigrationFixture>
|
||||
{
|
||||
private readonly MsSqlMigrationFixture _fixture;
|
||||
|
||||
public BackfillSourceNodeTests(MsSqlMigrationFixture fixture)
|
||||
{
|
||||
_fixture = fixture;
|
||||
}
|
||||
|
||||
private ScadaBridgeDbContext CreateContext() =>
|
||||
new(new DbContextOptionsBuilder<ScadaBridgeDbContext>()
|
||||
.UseSqlServer(_fixture.ConnectionString).Options);
|
||||
|
||||
private AuditLogRepository CreateRepo(ScadaBridgeDbContext ctx) => new(ctx);
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// Seed helper: direct INSERT bypassing the writer role, same pattern
|
||||
// as PartitionPurgeTests.DirectInsertAsync.
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
private async Task SeedRowAsync(
|
||||
SqlConnection conn,
|
||||
Guid eventId,
|
||||
DateTime occurredAtUtc,
|
||||
string? sourceNode)
|
||||
{
|
||||
await using var cmd = conn.CreateCommand();
|
||||
// Supply SourceNode explicitly (NULL or a value) so the test controls
|
||||
// which rows are eligible for backfill.
|
||||
cmd.CommandText = @"
|
||||
INSERT INTO dbo.AuditLog
|
||||
(EventId, OccurredAtUtc, Actor, Action, Outcome, Category, Target, SourceNode, CorrelationId, DetailsJson)
|
||||
VALUES
|
||||
(@EventId, @OccurredAtUtc, NULL, 'ApiOutbound.ApiCall', 'Success', 'ApiOutbound', NULL, @SourceNode, NULL,
|
||||
@DetailsJson);";
|
||||
|
||||
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
|
||||
|
||||
var occurredParam = cmd.Parameters.Add("@OccurredAtUtc", System.Data.SqlDbType.DateTime2);
|
||||
occurredParam.Scale = 7;
|
||||
occurredParam.Value = occurredAtUtc;
|
||||
|
||||
var sourceNodeParam = cmd.Parameters.Add("@SourceNode", System.Data.SqlDbType.VarChar, 64);
|
||||
sourceNodeParam.Value = (object?)sourceNode ?? DBNull.Value;
|
||||
|
||||
var detailsJson =
|
||||
"{\"channel\":\"ApiOutbound\",\"kind\":\"ApiCall\",\"status\":\"Delivered\"," +
|
||||
"\"payloadTruncated\":false}";
|
||||
cmd.Parameters.Add("@DetailsJson", System.Data.SqlDbType.NVarChar, -1).Value = detailsJson;
|
||||
|
||||
await cmd.ExecuteNonQueryAsync();
|
||||
}
|
||||
|
||||
private async Task<string?> ReadSourceNodeAsync(SqlConnection conn, Guid eventId)
|
||||
{
|
||||
await using var cmd = conn.CreateCommand();
|
||||
cmd.CommandText = "SELECT SourceNode FROM dbo.AuditLog WHERE EventId = @EventId;";
|
||||
cmd.Parameters.Add("@EventId", System.Data.SqlDbType.UniqueIdentifier).Value = eventId;
|
||||
var raw = await cmd.ExecuteScalarAsync();
|
||||
return raw == DBNull.Value ? null : (string?)raw;
|
||||
}
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 1. SetsNullRowsBeforeThreshold
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
[SkippableFact]
|
||||
public async Task BackfillSourceNode_SetsNullRowsBeforeThreshold()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var eligibleId = Guid.NewGuid(); // NULL, occurred before threshold
|
||||
var tooNewId = Guid.NewGuid(); // NULL, occurred after threshold
|
||||
|
||||
await using var seedConn = _fixture.OpenConnection();
|
||||
await SeedRowAsync(seedConn, eligibleId,
|
||||
new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
|
||||
await SeedRowAsync(seedConn, tooNewId,
|
||||
new DateTime(2026, 4, 1, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
|
||||
|
||||
await using var ctx = CreateContext();
|
||||
var repo = CreateRepo(ctx);
|
||||
|
||||
var rows = await repo.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
|
||||
|
||||
Assert.True(rows >= 1, $"Expected at least 1 row updated; got {rows}.");
|
||||
|
||||
// eligible row: must now have the sentinel
|
||||
var eligibleNode = await ReadSourceNodeAsync(seedConn, eligibleId);
|
||||
Assert.Equal("unknown", eligibleNode);
|
||||
|
||||
// too-new row: must still be NULL
|
||||
var tooNewNode = await ReadSourceNodeAsync(seedConn, tooNewId);
|
||||
Assert.Null(tooNewNode);
|
||||
}
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 2. LeavesNonNullRowsUntouched
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
[SkippableFact]
|
||||
public async Task BackfillSourceNode_LeavesNonNullRowsUntouched()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var alreadySetId = Guid.NewGuid(); // already has a SourceNode value
|
||||
|
||||
await using var seedConn = _fixture.OpenConnection();
|
||||
await SeedRowAsync(seedConn, alreadySetId,
|
||||
new DateTime(2026, 1, 10, 0, 0, 0, DateTimeKind.Utc), sourceNode: "node-a");
|
||||
|
||||
await using var ctx = CreateContext();
|
||||
var repo = CreateRepo(ctx);
|
||||
|
||||
await repo.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
|
||||
|
||||
// "node-a" must still be "node-a", not overwritten
|
||||
var node = await ReadSourceNodeAsync(seedConn, alreadySetId);
|
||||
Assert.Equal("node-a", node);
|
||||
}
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 3. Idempotent_SecondRunUpdatesZeroRows
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
[SkippableFact]
|
||||
public async Task BackfillSourceNode_Idempotent_SecondRunUpdatesZeroRows()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var before = new DateTime(2026, 3, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var idempotentId = Guid.NewGuid();
|
||||
|
||||
await using var seedConn = _fixture.OpenConnection();
|
||||
await SeedRowAsync(seedConn, idempotentId,
|
||||
new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
|
||||
|
||||
await using var ctx1 = CreateContext();
|
||||
var repo1 = CreateRepo(ctx1);
|
||||
var firstRun = await repo1.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
|
||||
Assert.True(firstRun >= 1, "First run should update at least 1 row.");
|
||||
|
||||
// Second run: the seeded row already carries the sentinel, so it is no longer eligible.
|
||||
await using var ctx2 = CreateContext();
|
||||
var repo2 = CreateRepo(ctx2);
|
||||
var secondRun = await repo2.BackfillSourceNodeAsync("unknown", before, batchSize: 1000);
|
||||
// The second run must not update the already-sentinel row again.
|
||||
// We cannot assert exactly 0 because other tests share the same fixture DB
|
||||
// and may have left unrelated NULL rows; but the idempotentId row must not
|
||||
// have been touched (it already has "unknown", so the WHERE SourceNode IS NULL
|
||||
// filter excludes it).
|
||||
var node = await ReadSourceNodeAsync(seedConn, idempotentId);
|
||||
Assert.Equal("unknown", node);
|
||||
// A return value of 0 is only guaranteed when no other NULL rows exist, so the
|
||||
// contract is asserted on the seeded row rather than on the count.
|
||||
_ = secondRun; // intentionally unasserted; see the note above
|
||||
}
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 4. CustomSentinelIsWritten
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
[SkippableFact]
|
||||
public async Task BackfillSourceNode_CustomSentinel_IsWritten()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var before = new DateTime(2026, 6, 1, 0, 0, 0, DateTimeKind.Utc);
|
||||
var customId = Guid.NewGuid();
|
||||
|
||||
await using var seedConn = _fixture.OpenConnection();
|
||||
await SeedRowAsync(seedConn, customId,
|
||||
new DateTime(2026, 2, 5, 0, 0, 0, DateTimeKind.Utc), sourceNode: null);
|
||||
|
||||
await using var ctx = CreateContext();
|
||||
var repo = CreateRepo(ctx);
|
||||
|
||||
await repo.BackfillSourceNodeAsync("pre-feature", before, batchSize: 1000);
|
||||
|
||||
var node = await ReadSourceNodeAsync(seedConn, customId);
|
||||
Assert.Equal("pre-feature", node);
|
||||
}
|
||||
|
||||
// ------------------------------------------------------------------
|
||||
// 5. ArgumentValidation
|
||||
// ------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public async Task BackfillSourceNode_EmptySentinel_Throws()
|
||||
{
|
||||
// The argument guard fires before any connection is opened, so no Skip is needed.
|
||||
// The context is built on a dummy connection string that is never used;
|
||||
// the argument check runs before any SQL.
|
||||
await using var ctx = new ScadaBridgeDbContext(
|
||||
new DbContextOptionsBuilder<ScadaBridgeDbContext>()
|
||||
.UseSqlServer("Server=.;Database=dummy;Connect Timeout=0;")
|
||||
.Options);
|
||||
var repo = new AuditLogRepository(ctx);
|
||||
|
||||
await Assert.ThrowsAsync<ArgumentException>(
|
||||
() => repo.BackfillSourceNodeAsync("", DateTime.UtcNow, 1000));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task BackfillSourceNode_ZeroBatchSize_Throws()
|
||||
{
|
||||
await using var ctx = new ScadaBridgeDbContext(
|
||||
new DbContextOptionsBuilder<ScadaBridgeDbContext>()
|
||||
.UseSqlServer("Server=.;Database=dummy;Connect Timeout=0;")
|
||||
.Options);
|
||||
var repo = new AuditLogRepository(ctx);
|
||||
|
||||
await Assert.ThrowsAsync<ArgumentOutOfRangeException>(
|
||||
() => repo.BackfillSourceNodeAsync("unknown", DateTime.UtcNow, 0));
|
||||
}
|
||||
}
|
||||
+128
@@ -0,0 +1,128 @@
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Notifications;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
|
||||
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Repositories;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
|
||||
|
||||
// Coverage for per-node KPI aggregation in the Notification Outbox repository
|
||||
// (T6: M5.2 per-node stuck-count KPIs).
|
||||
public class NotificationOutboxRepositoryPerNodeKpiTests
|
||||
{
|
||||
private static ScadaBridgeDbContext NewContext() => SqliteTestHelper.CreateInMemoryContext();
|
||||
|
||||
private static Notification NewNotification(
|
||||
string sourceSiteId,
|
||||
NotificationStatus status,
|
||||
DateTimeOffset createdAt,
|
||||
DateTimeOffset? deliveredAt = null,
|
||||
string? sourceNode = null)
|
||||
{
|
||||
return new Notification(
|
||||
Guid.NewGuid().ToString(), NotificationType.Email, "Ops List", "Subject", "Body", sourceSiteId)
|
||||
{
|
||||
Status = status,
|
||||
CreatedAt = createdAt,
|
||||
DeliveredAt = deliveredAt,
|
||||
SourceNode = sourceNode,
|
||||
};
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ComputePerNodeKpisAsync_AggregatesMetricsPerNode()
|
||||
{
|
||||
await using var ctx = NewContext();
|
||||
var now = DateTimeOffset.UtcNow;
|
||||
|
||||
// node-a: 1 pending (stuck, created 20m ago), 1 parked
|
||||
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
|
||||
createdAt: now.AddMinutes(-20), sourceNode: "node-a"));
|
||||
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Parked,
|
||||
createdAt: now.AddMinutes(-5), sourceNode: "node-a"));
|
||||
// node-b: 1 delivered in-window, 1 pending (fresh)
|
||||
ctx.Notifications.Add(NewNotification("plant-b", NotificationStatus.Delivered,
|
||||
createdAt: now.AddHours(-2), deliveredAt: now.AddMinutes(-2), sourceNode: "node-b"));
|
||||
ctx.Notifications.Add(NewNotification("plant-b", NotificationStatus.Pending,
|
||||
createdAt: now.AddMinutes(-1), sourceNode: "node-b"));
|
||||
// NULL SourceNode — must be excluded from per-node results
|
||||
ctx.Notifications.Add(NewNotification("plant-c", NotificationStatus.Pending,
|
||||
createdAt: now.AddMinutes(-5), sourceNode: null));
|
||||
await ctx.SaveChangesAsync();
|
||||
|
||||
var repo = new NotificationOutboxRepository(ctx);
|
||||
var result = await repo.ComputePerNodeKpisAsync(
|
||||
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
|
||||
|
||||
// Only node-a and node-b — the null-node row is excluded.
|
||||
Assert.Equal(2, result.Count);
|
||||
|
||||
var a = result.Single(n => n.SourceNode == "node-a");
|
||||
Assert.Equal(1, a.QueueDepth);
|
||||
Assert.Equal(1, a.StuckCount);
|
||||
Assert.Equal(1, a.ParkedCount);
|
||||
Assert.Equal(0, a.DeliveredLastInterval);
|
||||
Assert.NotNull(a.OldestPendingAge);
|
||||
|
||||
var b = result.Single(n => n.SourceNode == "node-b");
|
||||
Assert.Equal(1, b.QueueDepth);
|
||||
Assert.Equal(0, b.StuckCount);
|
||||
Assert.Equal(0, b.ParkedCount);
|
||||
Assert.Equal(1, b.DeliveredLastInterval);
|
||||
Assert.NotNull(b.OldestPendingAge);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ComputePerNodeKpisAsync_ExcludesNullSourceNode()
|
||||
{
|
||||
await using var ctx = NewContext();
|
||||
var now = DateTimeOffset.UtcNow;
|
||||
|
||||
// Only null-node rows — result must be empty.
|
||||
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
|
||||
createdAt: now.AddMinutes(-5), sourceNode: null));
|
||||
await ctx.SaveChangesAsync();
|
||||
|
||||
var repo = new NotificationOutboxRepository(ctx);
|
||||
var result = await repo.ComputePerNodeKpisAsync(
|
||||
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
|
||||
|
||||
Assert.Empty(result);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ComputePerNodeKpisAsync_ReturnsEmpty_WhenNoNotifications()
|
||||
{
|
||||
await using var ctx = NewContext();
|
||||
var repo = new NotificationOutboxRepository(ctx);
|
||||
var result = await repo.ComputePerNodeKpisAsync(
|
||||
DateTimeOffset.UtcNow, DateTimeOffset.UtcNow.AddMinutes(-30));
|
||||
Assert.Empty(result);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ComputePerNodeKpisAsync_OldestPendingAge_ReflectsOlderRow()
|
||||
{
|
||||
await using var ctx = NewContext();
|
||||
var now = DateTimeOffset.UtcNow;
|
||||
|
||||
// node-a: one Pending row created 90m ago and one Retrying row created 40m ago.
|
||||
// OldestPendingAge must reflect the 90m row.
|
||||
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Pending,
|
||||
createdAt: now.AddMinutes(-90), sourceNode: "node-a"));
|
||||
ctx.Notifications.Add(NewNotification("plant-a", NotificationStatus.Retrying,
|
||||
createdAt: now.AddMinutes(-40), sourceNode: "node-a"));
|
||||
await ctx.SaveChangesAsync();
|
||||
|
||||
var repo = new NotificationOutboxRepository(ctx);
|
||||
var result = await repo.ComputePerNodeKpisAsync(
|
||||
stuckCutoff: now.AddMinutes(-10), deliveredSince: now.AddMinutes(-30));
|
||||
|
||||
var a = result.Single(n => n.SourceNode == "node-a");
|
||||
Assert.Equal(2, a.QueueDepth);
|
||||
Assert.Equal(2, a.StuckCount);
|
||||
Assert.NotNull(a.OldestPendingAge);
|
||||
Assert.True(a.OldestPendingAge >= TimeSpan.FromMinutes(85),
|
||||
$"expected OldestPendingAge >= 85m, got {a.OldestPendingAge}");
|
||||
Assert.True(a.OldestPendingAge < TimeSpan.FromMinutes(95),
|
||||
$"expected OldestPendingAge < 95m, got {a.OldestPendingAge}");
|
||||
}
|
||||
}
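
The tests above pin down the per-node aggregation contract without showing the repository code. The sketch below is one in-memory reading of that contract, inferred only from the assertions: rows with a NULL node are dropped, Pending and Retrying rows count toward queue depth, and a queued row is stuck when it was created before the cutoff. Every type and member here (`Row`, `NodeKpi`, `PerNodeKpiSketch`) is hypothetical, and the strict `<` comparison at the cutoff is an assumption; the real `ComputePerNodeKpisAsync` may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory shapes; the real entity and snapshot types are not shown in this diff.
public enum Status { Pending, Retrying, Parked, Delivered }

public record Row(string? SourceNode, Status Status, DateTimeOffset CreatedAt,
    DateTimeOffset? DeliveredAt = null);

public record NodeKpi(string SourceNode, int QueueDepth, int StuckCount, int ParkedCount,
    int DeliveredLastInterval, TimeSpan? OldestPendingAge);

public static class PerNodeKpiSketch
{
    public static List<NodeKpi> Compute(IEnumerable<Row> rows, DateTimeOffset now,
        DateTimeOffset stuckCutoff, DateTimeOffset deliveredSince)
    {
        // Assumption drawn from the tests: Pending and Retrying both count as queued.
        static bool Queued(Row r) => r.Status is Status.Pending or Status.Retrying;

        return rows
            .Where(r => r.SourceNode is not null)   // NULL-node rows are excluded
            .GroupBy(r => r.SourceNode!)
            .Select(g =>
            {
                var queued = g.Where(Queued).ToList();
                return new NodeKpi(
                    g.Key,
                    QueueDepth: queued.Count,
                    // Assumed boundary: strictly older than the cutoff means stuck.
                    StuckCount: queued.Count(r => r.CreatedAt < stuckCutoff),
                    ParkedCount: g.Count(r => r.Status == Status.Parked),
                    DeliveredLastInterval: g.Count(r =>
                        r.Status == Status.Delivered && r.DeliveredAt >= deliveredSince),
                    OldestPendingAge: queued.Count == 0
                        ? (TimeSpan?)null
                        : now - queued.Min(r => r.CreatedAt));
            })
            .ToList();
    }

    public static void Main()
    {
        var now = DateTimeOffset.UtcNow;
        var rows = new List<Row>
        {
            new("node-a", Status.Pending, now.AddMinutes(-20)),
            new("node-a", Status.Parked, now.AddMinutes(-5)),
            new("node-b", Status.Delivered, now.AddHours(-2), now.AddMinutes(-2)),
            new("node-b", Status.Pending, now.AddMinutes(-1)),
            new(null, Status.Pending, now.AddMinutes(-5)),
        };

        foreach (var k in Compute(rows, now, now.AddMinutes(-10), now.AddMinutes(-30)))
            Console.WriteLine($"{k.SourceNode}: depth={k.QueueDepth} stuck={k.StuckCount} " +
                              $"parked={k.ParkedCount} delivered={k.DeliveredLastInterval}");
        // node-a: depth=1 stuck=1 parked=1 delivered=0
        // node-b: depth=1 stuck=0 parked=0 delivered=1
    }
}
```

The sample data in `Main` mirrors the first test's seed rows, so the printed counts line up with its assertions.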
|
||||
+48
@@ -497,6 +497,54 @@ public class SiteCallAuditRepositoryTests : IClassFixture<MsSqlMigrationFixture>
|
||||
Assert.Null(b.OldestPendingAge);
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task ComputePerNodeKpisAsync_ScopesCountsToEachNode()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
// Use unique site + node combos to isolate from other tests running
|
||||
// concurrently on the shared MsSql fixture.
|
||||
var nodeId = "node-b3-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
var nodeB = nodeId + "-b";
|
||||
await using var context = CreateContext();
|
||||
var repo = new SiteCallAuditRepository(context);
|
||||
|
||||
var now = DateTime.UtcNow;
|
||||
var stuckCutoff = now.AddMinutes(-10);
|
||||
var intervalSince = now.AddHours(-1);
|
||||
|
||||
// nodeId: 2 buffered (one stuck), 1 parked.
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
|
||||
createdAtUtc: now.AddMinutes(-30), sourceNode: nodeId));
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
|
||||
createdAtUtc: now.AddMinutes(-2), sourceNode: nodeId));
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Parked",
|
||||
createdAtUtc: now.AddMinutes(-5), terminal: true, sourceNode: nodeId));
|
||||
// nodeB: 1 delivered within interval only.
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Delivered",
|
||||
createdAtUtc: now.AddMinutes(-4), updatedAtUtc: now.AddMinutes(-1),
|
||||
terminal: true, terminalAtUtc: now.AddMinutes(-1), sourceNode: nodeB));
|
||||
// Null SourceNode row — must NOT appear in per-node results.
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), status: "Attempted",
|
||||
createdAtUtc: now.AddMinutes(-3), sourceNode: null));
|
||||
|
||||
var perNode = await repo.ComputePerNodeKpisAsync(stuckCutoff, intervalSince);
|
||||
|
||||
var na = Assert.Single(perNode, n => n.SourceNode == nodeId);
|
||||
Assert.Equal(2, na.BufferedCount);
|
||||
Assert.Equal(1, na.ParkedCount);
|
||||
Assert.Equal(1, na.StuckCount);
|
||||
Assert.NotNull(na.OldestPendingAge);
|
||||
|
||||
var nb = Assert.Single(perNode, n => n.SourceNode == nodeB);
|
||||
Assert.Equal(0, nb.BufferedCount);
|
||||
Assert.Equal(1, nb.DeliveredLastInterval);
|
||||
Assert.Null(nb.OldestPendingAge);
|
||||
|
||||
// Null-node row must be absent.
|
||||
Assert.DoesNotContain(perNode, n => n.SourceNode is null);
|
||||
}
|
||||
|
||||
// --- helpers ------------------------------------------------------------
|
||||
|
||||
private ScadaBridgeDbContext CreateContext()
|
||||
|
||||
@@ -1022,4 +1022,429 @@ public class AuditWriteMiddlewareTests
|
||||
var evt = Assert.Single(writer.Events);
|
||||
Assert.Equal(requestJson, evt.RequestSummary);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------
|
||||
// M5.3 (T7) Increment 1: Request headers in Extra JSON
|
||||
// Request headers are captured into the Extra JSON object alongside the
|
||||
// existing remoteIp / userAgent fields. Sensitive headers (e.g.
|
||||
// Authorization, X-Api-Key) are redacted to "<redacted>" using the same
|
||||
// HeaderRedactList as ScadaBridgeAuditRedactor.
|
||||
// ---------------------------------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public async Task RequestHeaders_AppearInExtra_UnderRequestHeadersKey()
|
||||
{
|
||||
var writer = new RecordingAuditWriter();
|
||||
var ctx = BuildContext();
|
||||
ctx.Request.Headers["X-Custom-Header"] = "custom-value";
|
||||
|
||||
var mw = CreateMiddleware(_ =>
|
||||
{
|
||||
ctx.Response.StatusCode = 200;
|
||||
return Task.CompletedTask;
|
||||
}, writer);
|
||||
|
||||
await mw.InvokeAsync(ctx);
|
||||
|
||||
var evt = Assert.Single(writer.Events);
|
||||
Assert.NotNull(evt.Extra);
|
||||
using var doc = JsonDocument.Parse(evt.Extra!);
|
||||
var root = doc.RootElement;
|
||||
// Extra must carry a requestHeaders object.
|
||||
Assert.True(root.TryGetProperty("requestHeaders", out var headers),
|
||||
"Extra JSON must contain a 'requestHeaders' property");
|
||||
Assert.Equal(JsonValueKind.Object, headers.ValueKind);
|
||||
// The non-sensitive custom header must appear unredacted.
|
||||
Assert.True(headers.TryGetProperty("X-Custom-Header", out var customVal),
|
||||
"requestHeaders must contain 'X-Custom-Header'");
|
||||
Assert.Equal("custom-value", customVal.GetString());
|
||||
}
|
||||
|
||||
    [Fact]
    public async Task RequestHeaders_AuthorizationHeader_IsRedacted()
    {
        // Authorization is in the default HeaderRedactList and must appear as
        // "<redacted>" rather than the real token value.
        var writer = new RecordingAuditWriter();
        var ctx = BuildContext();
        ctx.Request.Headers["Authorization"] = "Bearer secret-token-abc";

        var mw = CreateMiddleware(_ =>
        {
            ctx.Response.StatusCode = 200;
            return Task.CompletedTask;
        }, writer);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        Assert.NotNull(evt.Extra);
        using var doc = JsonDocument.Parse(evt.Extra!);
        var root = doc.RootElement;
        var headers = root.GetProperty("requestHeaders");
        Assert.True(headers.TryGetProperty("Authorization", out var authVal),
            "requestHeaders must contain 'Authorization'");
        Assert.Equal("<redacted>", authVal.GetString());
    }

    [Fact]
    public async Task RequestHeaders_XApiKeyHeader_IsRedacted()
    {
        // X-Api-Key is in the default HeaderRedactList and must be redacted.
        var writer = new RecordingAuditWriter();
        var ctx = BuildContext();
        ctx.Request.Headers["X-Api-Key"] = "sbk_12345_secretkey";

        var mw = CreateMiddleware(_ =>
        {
            ctx.Response.StatusCode = 200;
            return Task.CompletedTask;
        }, writer);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        Assert.NotNull(evt.Extra);
        using var doc = JsonDocument.Parse(evt.Extra!);
        var root = doc.RootElement;
        var headers = root.GetProperty("requestHeaders");
        Assert.True(headers.TryGetProperty("X-Api-Key", out var keyVal));
        Assert.Equal("<redacted>", keyVal.GetString());
    }

    [Fact]
    public async Task RequestHeaders_CustomRedactListEntry_IsRedacted()
    {
        // A non-default entry added to HeaderRedactList must also be redacted.
        var opts = new AuditLogOptions
        {
            HeaderRedactList = new List<string>
            {
                "Authorization", "X-Api-Key", "Cookie", "Set-Cookie",
                "X-Internal-Secret", // custom addition
            },
        };
        var writer = new RecordingAuditWriter();
        var ctx = BuildContext();
        ctx.Request.Headers["X-Internal-Secret"] = "my-secret-value";
        ctx.Request.Headers["X-Safe-Header"] = "safe-value";

        var mw = CreateMiddleware(
            _ =>
            {
                ctx.Response.StatusCode = 200;
                return Task.CompletedTask;
            },
            writer,
            options: opts);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        using var doc = JsonDocument.Parse(evt.Extra!);
        var headers = doc.RootElement.GetProperty("requestHeaders");
        Assert.Equal("<redacted>", headers.GetProperty("X-Internal-Secret").GetString());
        Assert.Equal("safe-value", headers.GetProperty("X-Safe-Header").GetString());
    }

    [Fact]
    public async Task RequestHeaders_Redaction_IsCaseInsensitive()
    {
        // HeaderRedactList matching must be case-insensitive (mirrors the
        // ScadaBridgeAuditRedactor behaviour: the redact set uses
        // OrdinalIgnoreCase).
        var writer = new RecordingAuditWriter();
        var ctx = BuildContext();
        // Vary the casing from the list entry ("Authorization").
        ctx.Request.Headers["authorization"] = "Bearer lower-case-token";

        var mw = CreateMiddleware(_ =>
        {
            ctx.Response.StatusCode = 200;
            return Task.CompletedTask;
        }, writer);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        using var doc = JsonDocument.Parse(evt.Extra!);
        var headers = doc.RootElement.GetProperty("requestHeaders");
        // The header was set with lower-case casing ("authorization") and is read
        // back under that name; the redact set (OrdinalIgnoreCase) must still match it.
        Assert.Equal("<redacted>", headers.GetProperty("authorization").GetString());
    }

    // ---------------------------------------------------------------------
    // M5.3 (T7) Increment 2: AuditInboundCeilingHits counter
    // When request OR response exceeds InboundMaxBytes, the middleware
    // increments IAuditInboundCeilingHitsCounter once per request.
    // ---------------------------------------------------------------------

    /// <summary>
    /// In-memory <see cref="IAuditInboundCeilingHitsCounter"/> that records
    /// every <see cref="Increment"/> call.
    /// </summary>
    private sealed class RecordingCeilingHitsCounter : ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter
    {
        private int _count;
        public int Count => Volatile.Read(ref _count);
        public void Increment() => Interlocked.Increment(ref _count);
    }

    private static AuditWriteMiddleware CreateMiddlewareWithCounter(
        RequestDelegate next,
        ICentralAuditWriter writer,
        AuditLogOptions? options,
        ZB.MOM.WW.ScadaBridge.AuditLog.Central.IAuditInboundCeilingHitsCounter counter) =>
        new(
            next,
            writer,
            NullLogger<AuditWriteMiddleware>.Instance,
            new StaticAuditLogOptionsMonitor(options ?? new AuditLogOptions()),
            actorAccessor: null,
            ceilingHitsCounter: counter);

    [Fact]
    public async Task RequestBody_AboveInboundMaxBytes_IncrementsCeilingHitsCounter()
    {
        const int cap = 1024;
        var bigBody = new string('x', cap + 100);
        var writer = new RecordingAuditWriter();
        var counter = new RecordingCeilingHitsCounter();
        var ctx = BuildContext(body: bigBody);
        var mw = CreateMiddlewareWithCounter(
            hc =>
            {
                hc.Response.StatusCode = 200;
                return Task.CompletedTask;
            },
            writer,
            options: new AuditLogOptions { InboundMaxBytes = cap },
            counter: counter);

        await mw.InvokeAsync(ctx);

        Assert.Equal(1, counter.Count);
        // Verify the truncation did happen to confirm ceiling was hit.
        var evt = Assert.Single(writer.Events);
        Assert.True(evt.PayloadTruncated);
    }

    [Fact]
    public async Task ResponseBody_AboveInboundMaxBytes_IncrementsCeilingHitsCounter()
    {
        const int cap = 1024;
        var bigResponse = new string('y', cap + 100);
        var writer = new RecordingAuditWriter();
        var counter = new RecordingCeilingHitsCounter();
        var ctx = BuildContext();
        ctx.Response.Body = new MemoryStream();

        var mw = CreateMiddlewareWithCounter(
            async hc =>
            {
                hc.Response.StatusCode = 200;
                await hc.Response.WriteAsync(bigResponse);
            },
            writer,
            options: new AuditLogOptions { InboundMaxBytes = cap },
            counter: counter);

        await mw.InvokeAsync(ctx);

        Assert.Equal(1, counter.Count);
        var evt = Assert.Single(writer.Events);
        Assert.True(evt.PayloadTruncated);
    }

    [Fact]
    public async Task NormalRequest_WithinCap_DoesNotIncrementCeilingHitsCounter()
    {
        var writer = new RecordingAuditWriter();
        var counter = new RecordingCeilingHitsCounter();
        var smallBody = "{\"ok\":true}";
        var ctx = BuildContext(body: smallBody);
        // Cap is well above the body size.
        var mw = CreateMiddlewareWithCounter(
            hc =>
            {
                hc.Response.StatusCode = 200;
                return Task.CompletedTask;
            },
            writer,
            options: new AuditLogOptions { InboundMaxBytes = 8192 },
            counter: counter);

        await mw.InvokeAsync(ctx);

        Assert.Equal(0, counter.Count);
    }

    // ---------------------------------------------------------------------
    // M5.3 (T7) Increment 3: SkipBodyCapture per-method opt-out
    // A target with SkipBodyCapture=true produces an audit row with
    // headers/metadata but empty/omitted body. A normal target still captures.
    // ---------------------------------------------------------------------

    private static DefaultHttpContext BuildContextWithRoute(
        string methodName,
        string? body = null)
    {
        var ctx = new DefaultHttpContext();
        ctx.Request.Method = "POST";
        ctx.Request.Path = $"/api/{methodName}";
        ctx.Request.RouteValues["methodName"] = methodName;
        ctx.Connection.RemoteIpAddress = System.Net.IPAddress.Parse("10.0.0.1");

        if (body is not null)
        {
            var bytes = Encoding.UTF8.GetBytes(body);
            ctx.Request.Body = new MemoryStream(bytes);
            ctx.Request.ContentLength = bytes.Length;
            ctx.Request.ContentType = "application/json";
        }

        return ctx;
    }

    [Fact]
    public async Task SkipBodyCapture_True_AuditRowEmitted_ButBodyIsNull()
    {
        // A target with SkipBodyCapture=true must produce an audit row (the
        // row must not be suppressed entirely) but RequestSummary and
        // ResponseSummary must both be null — only the body is omitted.
        var writer = new RecordingAuditWriter();
        var opts = new AuditLogOptions
        {
            PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
            {
                ["secret-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
                {
                    SkipBodyCapture = true,
                },
            },
        };
        var ctx = BuildContextWithRoute("secret-method", body: "{\"sensitive\":\"data\"}");

        var mw = CreateMiddleware(
            async hc =>
            {
                hc.Response.StatusCode = 200;
                await hc.Response.WriteAsync("{\"result\":\"secret\"}");
            },
            writer,
            options: opts);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        // Row IS emitted — only the body content is suppressed.
        Assert.Equal("secret-method", evt.Target);
        Assert.Equal(AuditStatus.Delivered, evt.Status);
        // Bodies are null — SkipBodyCapture stripped them.
        Assert.Null(evt.RequestSummary);
        Assert.Null(evt.ResponseSummary);
        // Headers / metadata are still present.
        Assert.NotNull(evt.Extra);
        using var doc = JsonDocument.Parse(evt.Extra!);
        Assert.True(doc.RootElement.TryGetProperty("requestHeaders", out _),
            "Headers must be present even when body capture is skipped");
        Assert.Equal(200, evt.HttpStatus);
    }

    [Fact]
    public async Task SkipBodyCapture_True_CeilingHitsCounter_NotIncremented()
    {
        // When SkipBodyCapture=true the body is never measured against the cap;
        // the counter must NOT be bumped even if the body would have exceeded it.
        var writer = new RecordingAuditWriter();
        var counter = new RecordingCeilingHitsCounter();
        const int cap = 64;
        var bigBody = new string('z', cap + 1000);
        var opts = new AuditLogOptions
        {
            InboundMaxBytes = cap,
            PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
            {
                ["large-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
                {
                    SkipBodyCapture = true,
                },
            },
        };
        var ctx = BuildContextWithRoute("large-method", body: bigBody);

        var mw = CreateMiddlewareWithCounter(
            hc =>
            {
                hc.Response.StatusCode = 200;
                return Task.CompletedTask;
            },
            writer,
            options: opts,
            counter: counter);

        await mw.InvokeAsync(ctx);

        Assert.Equal(0, counter.Count);
    }

    [Fact]
    public async Task SkipBodyCapture_False_NormalTarget_StillCapturesBody()
    {
        // Regression: a target WITHOUT SkipBodyCapture (or with SkipBodyCapture=false)
        // must still capture the body normally.
        var writer = new RecordingAuditWriter();
        var opts = new AuditLogOptions
        {
            PerTargetOverrides = new Dictionary<string, ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride>
            {
                ["normal-method"] = new ZB.MOM.WW.ScadaBridge.AuditLog.Configuration.PerTargetRedactionOverride
                {
                    SkipBodyCapture = false,
                },
            },
        };
        var requestJson = "{\"a\":1}";
        var ctx = BuildContextWithRoute("normal-method", body: requestJson);

        var mw = CreateMiddleware(
            async hc =>
            {
                hc.Response.StatusCode = 200;
                await hc.Response.WriteAsync("{\"result\":1}");
            },
            writer,
            options: opts);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        Assert.Equal(requestJson, evt.RequestSummary);
        Assert.Equal("{\"result\":1}", evt.ResponseSummary);
    }

    [Fact]
    public async Task SkipBodyCapture_NoOverride_DefaultTarget_StillCapturesBody()
    {
        // A target with no per-target override at all must still capture the body —
        // SkipBodyCapture defaults to false and must not suppress capture.
        var writer = new RecordingAuditWriter();
        var requestJson = "{\"x\":99}";
        var ctx = BuildContext(body: requestJson);

        var mw = CreateMiddleware(
            async hc =>
            {
                hc.Response.StatusCode = 200;
                await hc.Response.WriteAsync("{\"y\":99}");
            },
            writer);

        await mw.InvokeAsync(ctx);

        var evt = Assert.Single(writer.Events);
        Assert.Equal(requestJson, evt.RequestSummary);
        Assert.Equal("{\"y\":99}", evt.ResponseSummary);
    }
}

@@ -1,6 +1,7 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.InboundApi;
using ZB.MOM.WW.ScadaBridge.Commons.Types;

namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;

@@ -139,6 +140,116 @@ public class RouteHelperTests
        Assert.Equal("read failed", ex.Message);
    }

    // --- WaitForAttribute (spec §6) ---

    [Fact]
    public async Task WaitForAttribute_Matched_ReturnsTrue()
    {
        SiteResolves("inst-1", "SiteA");
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: true, Value: true, Quality: "Good", TimedOut: false,
                Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));

        var matched = await CreateHelper().To("inst-1")
            .WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));

        Assert.True(matched);
    }

    [Fact]
    public async Task WaitForAttribute_TimedOut_ReturnsFalse()
    {
        SiteResolves("inst-1", "SiteA");
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: false, Value: null, Quality: null, TimedOut: true,
                Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));

        var matched = await CreateHelper().To("inst-1")
            .WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));

        Assert.False(matched);
    }

    [Fact]
    public async Task WaitForAttribute_RoutingFailure_ThrowsInvalidOperationException()
    {
        // Success=false is a routing-level outcome (e.g. instance not found on the
        // site), distinct from the wait outcome (Matched/TimedOut).
        SiteResolves("inst-1", "SiteA");
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Any<CancellationToken>())
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: false, Value: null, Quality: null, TimedOut: false,
                Success: false, ErrorMessage: "instance not found", DateTimeOffset.UtcNow));

        var ex = await Assert.ThrowsAsync<InvalidOperationException>(
            () => CreateHelper().To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30)));
        Assert.Equal("instance not found", ex.Message);
    }

    [Fact]
    public async Task WaitForAttribute_EncodesTargetValue_OnRequest()
    {
        // Value-equality only across the wire: the target value is encoded via the
        // canonical AttributeValueCodec, identical to how attribute values travel.
        SiteResolves("inst-1", "SiteA");
        RouteToWaitForAttributeRequest? captured = null;
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Do<RouteToWaitForAttributeRequest>(r => captured = r), Arg.Any<CancellationToken>())
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: true, Value: true, Quality: "Good", TimedOut: false,
                Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));

        await CreateHelper().To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));

        Assert.NotNull(captured);
        Assert.Equal("Flag", captured!.AttributeName);
        Assert.Equal(TimeSpan.FromSeconds(30), captured.Timeout);
        Assert.Equal(AttributeValueCodec.Encode(true), captured.TargetValueEncoded);
        Assert.True(Guid.TryParse(captured.CorrelationId, out _));
    }

    [Fact]
    public async Task WaitForAttribute_WithNoExplicitToken_InheritsMethodDeadlineToken()
    {
        SiteResolves("inst-1", "SiteA");
        using var deadline = new CancellationTokenSource();
        CancellationToken seen = default;
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Any<RouteToWaitForAttributeRequest>(), Arg.Do<CancellationToken>(t => seen = t))
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: false, Value: null, Quality: null, TimedOut: true,
                Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));

        var bound = CreateHelper().WithDeadline(deadline.Token);
        await bound.To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));

        Assert.Equal(deadline.Token, seen);
    }

    [Fact]
    public async Task WaitForAttribute_WithParentExecutionId_CarriesItOnRequest()
    {
        SiteResolves("inst-1", "SiteA");
        var inboundExecutionId = Guid.NewGuid();
        RouteToWaitForAttributeRequest? captured = null;
        _router.RouteToWaitForAttributeAsync("SiteA", Arg.Do<RouteToWaitForAttributeRequest>(r => captured = r), Arg.Any<CancellationToken>())
            .Returns(ci => new RouteToWaitForAttributeResponse(
                ((RouteToWaitForAttributeRequest)ci[1]).CorrelationId,
                Matched: true, Value: true, Quality: "Good", TimedOut: false,
                Success: true, ErrorMessage: null, DateTimeOffset.UtcNow));

        var bound = CreateHelper().WithParentExecutionId(inboundExecutionId);
        await bound.To("inst-1").WaitForAttribute("Flag", true, TimeSpan.FromSeconds(30));

        Assert.NotNull(captured);
        Assert.Equal(inboundExecutionId, captured!.ParentExecutionId);
    }

    // --- SetAttribute(s) ---

    [Fact]

@@ -89,6 +89,14 @@ public class SiteAuditPushFlowTests : TestKit
    public Task<long> SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default)
        => throw new NotSupportedException();

    public Task<long> PurgeChannelOlderThanAsync(
        string channel, DateTime threshold, int batchSize, CancellationToken ct = default)
        => throw new NotSupportedException();

    public Task<long> BackfillSourceNodeAsync(
        string sentinel, DateTime before, int batchSize, CancellationToken ct = default)
        => throw new NotSupportedException();

    public Task<IReadOnlyList<DateTime>> GetPartitionBoundariesOlderThanAsync(
        DateTime threshold, CancellationToken ct = default)
        => throw new NotSupportedException();

@@ -610,4 +610,366 @@ public class AuditEndpointsTests
        Assert.NotNull(result);
        Assert.Equal(new[] { "plant-a" }, result!.SourceSiteIds);
    }

    // ─────────────────────────────────────────────────────────────────────
    // /api/audit/tree
    // ─────────────────────────────────────────────────────────────────────

    /// <summary>
    /// Builds a TestServer with the audit-log endpoints wired up and the repository
    /// stub returning the supplied <paramref name="treeNodes"/> for
    /// <c>GetExecutionTreeAsync</c>.
    /// </summary>
    private static async Task<(HttpClient Client, IAuditLogRepository Repo, IHost Host)> BuildHostWithTreeAsync(
        string[] roles,
        IReadOnlyList<ExecutionTreeNode>? treeNodes = null)
    {
        var repo = Substitute.For<IAuditLogRepository>();

        // Default QueryAsync stub so the shared host initialisation does not fail.
        repo.QueryAsync(Arg.Any<AuditLogQueryFilter>(), Arg.Any<AuditLogPaging>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));

        var returnNodes = treeNodes ?? Array.Empty<ExecutionTreeNode>();
        repo.GetExecutionTreeAsync(Arg.Any<Guid>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<ExecutionTreeNode>>(returnNodes));

        var ldap = Substitute.For<ILdapAuthService>();
        ldap.AuthenticateAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>())
            .Returns(LdapAuthResult.Success("auditor", "Auditor", new[] { "audit" }));

        var roleMapper = Substitute.For<RoleMapper>(Substitute.For<ISecurityRepository>());
        roleMapper.MapGroupsToRolesAsync(Arg.Any<IReadOnlyList<string>>(), Arg.Any<CancellationToken>())
            .Returns(new RoleMappingResult(roles, Array.Empty<string>(), IsSystemWideDeployment: true));

        var hostBuilder = new HostBuilder()
            .ConfigureWebHost(web =>
            {
                web.UseTestServer();
                web.ConfigureServices(services =>
                {
                    services.AddRouting();
                    services.AddSingleton(repo);
                    services.AddSingleton(ldap);
                    services.AddSingleton(roleMapper);
                });
                web.Configure(app =>
                {
                    app.UseRouting();
                    app.UseEndpoints(endpoints => endpoints.MapAuditAPI());
                });
            });

        var host = await hostBuilder.StartAsync();
        return (host.GetTestClient(), repo, host);
    }

    private static ExecutionTreeNode MakeNode(Guid id, Guid? parentId = null, int rowCount = 2) =>
        new ExecutionTreeNode(
            ExecutionId: id,
            ParentExecutionId: parentId,
            RowCount: rowCount,
            Channels: new[] { "ApiOutbound" },
            Statuses: new[] { "Delivered" },
            SourceSiteId: "plant-a",
            SourceInstanceId: "inst-1",
            FirstOccurredAtUtc: new DateTime(2026, 5, 20, 10, 0, 0, DateTimeKind.Utc),
            LastOccurredAtUtc: new DateTime(2026, 5, 20, 10, 1, 0, DateTimeKind.Utc));

    [Fact]
    public async Task Tree_ValidExecutionId_ReturnsJsonArray()
    {
        var root = Guid.Parse("aaaaaaaa-0000-0000-0000-000000000001");
        var child = Guid.Parse("aaaaaaaa-0000-0000-0000-000000000002");
        var nodes = new[]
        {
            MakeNode(root),
            MakeNode(child, parentId: root),
        };

        var (client, repo, host) = await BuildHostWithTreeAsync(
            roles: new[] { "Administrator" },
            treeNodes: nodes);
        using (host)
        {
            var response = await client.SendAsync(Get($"/api/audit/tree?executionId={root:D}"));

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);
            Assert.Equal("application/json", response.Content.Headers.ContentType!.MediaType);

            using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
            Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
            Assert.Equal(2, doc.RootElement.GetArrayLength());

            await repo.Received(1).GetExecutionTreeAsync(root, Arg.Any<CancellationToken>());
        }
    }

    [Fact]
    public async Task Tree_RepoReturnsEmpty_ReturnsEmptyArray()
    {
        var id = Guid.NewGuid();
        var (client, _, host) = await BuildHostWithTreeAsync(
            roles: new[] { "Administrator" },
            treeNodes: Array.Empty<ExecutionTreeNode>());
        using (host)
        {
            var response = await client.SendAsync(Get($"/api/audit/tree?executionId={id:D}"));

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);

            using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
            Assert.Equal(JsonValueKind.Array, doc.RootElement.ValueKind);
            Assert.Equal(0, doc.RootElement.GetArrayLength());
        }
    }

    [Fact]
    public async Task Tree_MissingExecutionId_Returns400()
    {
        var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
        using (host)
        {
            var response = await client.SendAsync(Get("/api/audit/tree"));

            Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
        }
    }

    [Fact]
    public async Task Tree_InvalidExecutionId_Returns400()
    {
        var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
        using (host)
        {
            var response = await client.SendAsync(Get("/api/audit/tree?executionId=not-a-guid"));

            Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
            var body = await response.Content.ReadAsStringAsync();
            Assert.Contains("BAD_REQUEST", body);
        }
    }

    [Fact]
    public async Task Tree_WithoutOperationalAudit_Returns403()
    {
        var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Designer" });
        using (host)
        {
            var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}"));

            Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
        }
    }

    [Fact]
    public async Task Tree_WithoutCredentials_Returns401()
    {
        var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Administrator" });
        using (host)
        {
            var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}", credential: ""));

            Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);
        }
    }

    [Fact]
    public async Task Tree_ViewerRole_IsAllowed()
    {
        var (client, _, host) = await BuildHostWithTreeAsync(roles: new[] { "Viewer" });
        using (host)
        {
            var response = await client.SendAsync(Get($"/api/audit/tree?executionId={Guid.NewGuid():D}"));

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);
        }
    }

    // ─────────────────────────────────────────────────────────────────────
    // POST /api/audit/backfill-source-node (M5.6 T5)
    // ─────────────────────────────────────────────────────────────────────

    private static async Task<(HttpClient Client, IAuditLogRepository Repo, IHost Host)> BuildHostWithBackfillAsync(
        string[] roles,
        long backfillResult = 42L,
        bool ldapSucceeds = true)
    {
        var repo = Substitute.For<IAuditLogRepository>();
        repo.QueryAsync(Arg.Any<AuditLogQueryFilter>(), Arg.Any<AuditLogPaging>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));
        repo.BackfillSourceNodeAsync(
                Arg.Any<string>(), Arg.Any<DateTime>(), Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult(backfillResult));
        repo.GetExecutionTreeAsync(Arg.Any<Guid>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<ZB.MOM.WW.ScadaBridge.Commons.Types.Audit.ExecutionTreeNode>>(
                Array.Empty<ZB.MOM.WW.ScadaBridge.Commons.Types.Audit.ExecutionTreeNode>()));

        var ldap = Substitute.For<ILdapAuthService>();
        ldap.AuthenticateAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<CancellationToken>())
            .Returns(ldapSucceeds
                ? LdapAuthResult.Success("auditor", "Auditor", new[] { "audit" })
                : LdapAuthResult.Fail(LdapAuthFailure.BadCredentials));

        var roleMapper = Substitute.For<RoleMapper>(Substitute.For<ISecurityRepository>());
        roleMapper.MapGroupsToRolesAsync(Arg.Any<IReadOnlyList<string>>(), Arg.Any<CancellationToken>())
            .Returns(new RoleMappingResult(roles, Array.Empty<string>(), IsSystemWideDeployment: true));

        var hostBuilder = new HostBuilder()
            .ConfigureWebHost(web =>
            {
                web.UseTestServer();
                web.ConfigureServices(services =>
                {
                    services.AddRouting();
                    services.AddSingleton(repo);
                    services.AddSingleton(ldap);
                    services.AddSingleton(roleMapper);
                });
                web.Configure(app =>
                {
                    app.UseRouting();
                    app.UseEndpoints(endpoints => endpoints.MapAuditAPI());
                });
            });

        var host = await hostBuilder.StartAsync();
        return (host.GetTestClient(), repo, host);
    }

    private static HttpRequestMessage Post(string url, string body, string credential = BasicCredential)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json"),
        };
        if (credential.Length > 0)
        {
            request.Headers.Authorization = new AuthenticationHeaderValue(
                "Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes(credential)));
        }
        return request;
    }

    [Fact]
    public async Task BackfillSourceNode_AdminRole_Returns200WithRowCount()
    {
        var (client, _, host) = await BuildHostWithBackfillAsync(
            roles: new[] { "Administrator" }, backfillResult: 12345L);
        using (host)
        {
            var response = await client.SendAsync(Post(
                "/api/audit/backfill-source-node",
                "{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}"));

            Assert.Equal(HttpStatusCode.OK, response.StatusCode);

            using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
            var root = doc.RootElement;
            Assert.Equal(12345L, root.GetProperty("rowsUpdated").GetInt64());
            Assert.Equal("unknown", root.GetProperty("sentinel").GetString());
        }
    }

    [Fact]
    public async Task BackfillSourceNode_ViewerRole_Returns403()
    {
        // Viewer has OperationalAudit but NOT the Admin-only backfill permission.
        var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Viewer" });
        using (host)
        {
            var response = await client.SendAsync(Post(
                "/api/audit/backfill-source-node",
                "{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}"));

            Assert.Equal(HttpStatusCode.Forbidden, response.StatusCode);
        }
    }

    [Fact]
    public async Task BackfillSourceNode_NoCredentials_Returns401()
    {
        var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
        using (host)
        {
            var response = await client.SendAsync(Post(
                "/api/audit/backfill-source-node",
                "{\"sentinel\":\"unknown\",\"before\":\"2026-01-01T00:00:00Z\"}",
                credential: ""));

            Assert.Equal(HttpStatusCode.Unauthorized, response.StatusCode);
        }
    }

[Fact]
|
||||
public async Task BackfillSourceNode_MissingBefore_Returns400()
|
||||
{
|
||||
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
|
||||
using (host)
|
||||
{
|
||||
// No "before" field — required.
|
||||
var response = await client.SendAsync(Post(
|
||||
"/api/audit/backfill-source-node",
|
||||
"{\"sentinel\":\"unknown\"}"));
|
||||
|
||||
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task BackfillSourceNode_InvalidBeforeDate_Returns400()
|
||||
{
|
||||
var (client, _, host) = await BuildHostWithBackfillAsync(roles: new[] { "Administrator" });
|
||||
using (host)
|
||||
{
|
||||
var response = await client.SendAsync(Post(
|
||||
"/api/audit/backfill-source-node",
|
||||
"{\"sentinel\":\"unknown\",\"before\":\"not-a-date\"}"));
|
||||
|
||||
Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task BackfillSourceNode_CustomSentinelAndBatch_PassedToRepo()
|
||||
{
|
||||
var (client, repo, host) = await BuildHostWithBackfillAsync(
|
||||
roles: new[] { "Administrator" }, backfillResult: 7L);
|
||||
using (host)
|
||||
{
|
||||
var response = await client.SendAsync(Post(
|
||||
"/api/audit/backfill-source-node",
|
||||
"{\"sentinel\":\"pre-feature\",\"before\":\"2026-01-01T00:00:00Z\",\"batchSize\":2000}"));
|
||||
|
||||
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
|
||||
|
||||
await repo.Received(1).BackfillSourceNodeAsync(
|
||||
"pre-feature",
|
||||
Arg.Is<DateTime>(d => d.Year == 2026 && d.Month == 1 && d.Day == 1),
|
||||
2000,
|
||||
Arg.Any<CancellationToken>());
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task BackfillSourceNode_DefaultSentinel_IsUnknown_WhenOmitted()
|
||||
{
|
||||
var (client, repo, host) = await BuildHostWithBackfillAsync(
|
||||
roles: new[] { "Administrator" }, backfillResult: 0L);
|
||||
using (host)
|
||||
{
|
||||
// Omit "sentinel" — endpoint defaults to "unknown".
|
||||
var response = await client.SendAsync(Post(
|
||||
"/api/audit/backfill-source-node",
|
||||
"{\"before\":\"2026-01-01T00:00:00Z\"}"));
|
||||
|
||||
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
|
||||
|
||||
await repo.Received(1).BackfillSourceNodeAsync(
|
||||
"unknown",
|
||||
Arg.Any<DateTime>(),
|
||||
Arg.Any<int>(),
|
||||
Arg.Any<CancellationToken>());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
+46
@@ -495,4 +495,50 @@ public class NotificationOutboxActorQueryTests : TestKit
|
||||
Assert.Contains("db down", response.ErrorMessage);
|
||||
Assert.Empty(response.Sites);
|
||||
}
|
||||
|
||||
// ── Per-node KPI (T6: M5.2 per-node stuck-count KPIs) ──────────────────
|
||||
|
||||
[Fact]
|
||||
public void PerNodeKpiRequest_RepliesWithPerNodeSnapshots()
|
||||
{
|
||||
_repository.ComputePerNodeKpisAsync(
|
||||
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>())
|
||||
.Returns(new List<NodeNotificationKpiSnapshot>
|
||||
{
|
||||
new("node-a", QueueDepth: 3, StuckCount: 1, ParkedCount: 0,
|
||||
DeliveredLastInterval: 5, OldestPendingAge: TimeSpan.FromMinutes(12)),
|
||||
});
|
||||
var actor = CreateActor();
|
||||
|
||||
actor.Tell(new PerNodeNotificationKpiRequest("corr-pn"), TestActor);
|
||||
|
||||
var response = ExpectMsg<PerNodeNotificationKpiResponse>();
|
||||
Assert.True(response.Success);
|
||||
Assert.Null(response.ErrorMessage);
|
||||
Assert.Equal("corr-pn", response.CorrelationId);
|
||||
Assert.Single(response.Nodes);
|
||||
Assert.Equal("node-a", response.Nodes[0].SourceNode);
|
||||
Assert.Equal(1, response.Nodes[0].StuckCount);
|
||||
|
||||
_repository.Received(1).ComputePerNodeKpisAsync(
|
||||
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void PerNodeKpiRequest_RepositoryFault_RepliesUnsuccessful()
|
||||
{
|
||||
_repository.ComputePerNodeKpisAsync(
|
||||
Arg.Any<DateTimeOffset>(), Arg.Any<DateTimeOffset>(), Arg.Any<CancellationToken>())
|
||||
.ThrowsAsync(new InvalidOperationException("node-kpi db down"));
|
||||
var actor = CreateActor();
|
||||
|
||||
actor.Tell(new PerNodeNotificationKpiRequest("corr-pn"), TestActor);
|
||||
|
||||
var response = ExpectMsg<PerNodeNotificationKpiResponse>();
|
||||
Assert.False(response.Success);
|
||||
Assert.Equal("corr-pn", response.CorrelationId);
|
||||
Assert.NotNull(response.ErrorMessage);
|
||||
Assert.Contains("node-kpi db down", response.ErrorMessage);
|
||||
Assert.Empty(response.Nodes);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -594,6 +594,43 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
Assert.NotNull(response.OldestPendingAge);
|
||||
}
|
||||
|
||||
// ── Per-node KPI (T6: M5.2 per-node stuck-count KPIs) ──────────────────
|
||||
|
||||
[SkippableFact]
|
||||
public async Task PerNodeSiteCallKpiRequest_ScopesCountsToEachNode()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var nodeId = "node-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
await using var context = CreateContext();
|
||||
var repo = new SiteCallAuditRepository(context);
|
||||
var actor = CreateActor(repo, new SiteCallAuditOptions
|
||||
{
|
||||
StuckAgeThreshold = TimeSpan.FromMinutes(10),
|
||||
KpiInterval = TimeSpan.FromHours(1),
|
||||
});
|
||||
|
||||
var now = DateTime.UtcNow;
|
||||
var siteId = NewSiteId();
|
||||
// Non-terminal Attempted, created 30 min ago — buffered + stuck.
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), siteId, status: "Attempted",
|
||||
createdAtUtc: now.AddMinutes(-30), sourceNode: nodeId));
|
||||
// Terminal Parked.
|
||||
await repo.UpsertAsync(NewRow(TrackedOperationId.New(), siteId, status: "Parked",
|
||||
createdAtUtc: now.AddMinutes(-5), terminal: true, sourceNode: nodeId));
|
||||
|
||||
actor.Tell(new PerNodeSiteCallKpiRequest("corr-pnk"), TestActor);
|
||||
|
||||
var response = ExpectMsg<PerNodeSiteCallKpiResponse>(TimeSpan.FromSeconds(10));
|
||||
Assert.True(response.Success);
|
||||
|
||||
var myNode = Assert.Single(response.Nodes, n => n.SourceNode == nodeId);
|
||||
Assert.Equal(1, myNode.BufferedCount);
|
||||
Assert.Equal(1, myNode.ParkedCount);
|
||||
Assert.Equal(1, myNode.StuckCount);
|
||||
Assert.NotNull(myNode.OldestPendingAge);
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task PerSiteSiteCallKpiRequest_ScopesCountsToEachSite()
|
||||
{
|
||||
@@ -745,6 +782,10 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -790,5 +831,9 @@ public class SiteCallAuditActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerSiteKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
_inner.ComputePerNodeKpisAsync(stuckCutoff, intervalSince, ct);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -76,6 +76,10 @@ public class SiteCallAuditPurgeTests : TestKit
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
|
||||
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
|
||||
}
|
||||
|
||||
/// <summary>Repository whose purge always throws — to prove continue-on-error keeps the singleton alive.</summary>
|
||||
@@ -94,6 +98,7 @@ public class SiteCallAuditPurgeTests : TestKit
|
||||
public Task<IReadOnlyList<SiteCall>> QueryAsync(SiteCallQueryFilter f, SiteCallPaging p, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCall>>(Array.Empty<SiteCall>());
|
||||
public Task<SiteCallKpiSnapshot> ComputeKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult(new SiteCallKpiSnapshot(0, 0, 0, 0, null, 0));
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(DateTime a, DateTime b, CancellationToken ct = default) => Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
|
||||
}
|
||||
|
||||
private IActorRef CreateActor(ISiteCallAuditRepository repo, SiteCallAuditOptions options) =>
|
||||
|
||||
@@ -142,6 +142,10 @@ public class SiteCallAuditReconciliationTests : TestKit
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<SiteCallSiteKpiSnapshot>>(Array.Empty<SiteCallSiteKpiSnapshot>());
|
||||
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
Task.FromResult<IReadOnlyList<SiteCallNodeKpiSnapshot>>(Array.Empty<SiteCallNodeKpiSnapshot>());
|
||||
}
|
||||
|
||||
private IActorRef CreateActor(
|
||||
|
||||
@@ -50,6 +50,10 @@ public class SiteCallRelayTests : TestKit
|
||||
public Task<IReadOnlyList<SiteCallSiteKpiSnapshot>> ComputePerSiteKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
throw new InvalidOperationException("relay must not compute per-site KPIs");
|
||||
|
||||
public Task<IReadOnlyList<SiteCallNodeKpiSnapshot>> ComputePerNodeKpisAsync(
|
||||
DateTime stuckCutoff, DateTime intervalSince, CancellationToken ct = default) =>
|
||||
throw new InvalidOperationException("relay must not compute per-node KPIs");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
|
||||
@@ -6,6 +6,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Messages.Deployment;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.DebugView;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.InboundApi;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Lifecycle;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
|
||||
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
|
||||
@@ -389,6 +390,61 @@ public class DeploymentManagerActorTests : TestKit, IDisposable
|
||||
Assert.True(response.Success, $"Routed call failed: {response.ErrorMessage}");
|
||||
}
|
||||
|
||||
// ── Spec §6 (WD-2b): routed RouteToWaitForAttributeRequest → InstanceActor ──
|
||||
|
||||
[Fact]
|
||||
public async Task RouteInboundApiWaitForAttribute_AttributeAlreadyAtTarget_RepliesMatched()
|
||||
{
|
||||
// A routed wait whose target equals the instance's current (static)
|
||||
// attribute value must satisfy the InstanceActor fast-path and come back
|
||||
// Success:true, Matched:true with the matched value/quality.
|
||||
var actor = CreateDeploymentManager();
|
||||
await Task.Delay(500); // empty startup
|
||||
|
||||
// MakeConfigJson seeds a scalar static attribute "TestAttr" = "42" (Good).
|
||||
actor.Tell(new DeployInstanceCommand(
|
||||
"dep-wait", "WaitPump", "sha256:wait",
|
||||
MakeConfigJson("WaitPump"), "admin", DateTimeOffset.UtcNow));
|
||||
ExpectMsg<DeploymentStatusResponse>(TimeSpan.FromSeconds(5));
|
||||
await Task.Delay(1000); // let the InstanceActor spin up + load static attrs
|
||||
|
||||
// Encode the target the same way the InstanceActor encodes the current
|
||||
// value for its codec-equality match (value-equality only across the wire).
|
||||
var encodedTarget = AttributeValueCodec.Encode("42");
|
||||
actor.Tell(new RouteToWaitForAttributeRequest(
|
||||
"wait-corr-1", "WaitPump", "TestAttr", encodedTarget,
|
||||
TimeSpan.FromSeconds(5), DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<RouteToWaitForAttributeResponse>(TimeSpan.FromSeconds(10));
|
||||
Assert.Equal("wait-corr-1", response.CorrelationId);
|
||||
Assert.True(response.Success, $"Routed wait failed: {response.ErrorMessage}");
|
||||
Assert.True(response.Matched, "Expected fast-path match (attribute already at target).");
|
||||
Assert.False(response.TimedOut);
|
||||
Assert.Equal("42", response.Value);
|
||||
Assert.Equal("Good", response.Quality);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RouteInboundApiWaitForAttribute_UnknownInstance_RepliesNotFound()
|
||||
{
|
||||
// A routed wait for an instance that was never deployed to this site must
|
||||
// come back Success:false with a not-found message (routing-level outcome),
|
||||
// mirroring the other RouteTo* unknown-instance paths.
|
||||
var actor = CreateDeploymentManager();
|
||||
await Task.Delay(500);
|
||||
|
||||
actor.Tell(new RouteToWaitForAttributeRequest(
|
||||
"wait-corr-2", "NeverDeployedWait", "TestAttr",
|
||||
AttributeValueCodec.Encode("42"), TimeSpan.FromSeconds(5), DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<RouteToWaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.Equal("wait-corr-2", response.CorrelationId);
|
||||
Assert.False(response.Success);
|
||||
Assert.False(response.Matched);
|
||||
Assert.NotNull(response.ErrorMessage);
|
||||
Assert.Contains("not found", response.ErrorMessage!, StringComparison.OrdinalIgnoreCase);
|
||||
}
|
||||
|
||||
// ── M2.11: Debug-view routing — unknown-instance not-found signal ──
|
||||
|
||||
[Fact]
|
||||
|
||||
+853
@@ -0,0 +1,853 @@
|
||||
using Akka.Actor;
|
||||
using Akka.TestKit;
|
||||
using Akka.TestKit.Xunit2;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.DataConnection;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
|
||||
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
|
||||
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
|
||||
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Persistence;
|
||||
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
|
||||
using System.Text.Json;
|
||||
|
||||
namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests.Actors;
|
||||
|
||||
/// <summary>
|
||||
/// Tests for the event-driven <c>WaitForAttribute</c> one-shot waiter registry in
|
||||
/// <see cref="InstanceActor"/> (Attributes.WaitAsync spec §3-§5). Covers the
|
||||
/// fast-path, change-match, timeout, no-leak (timeout-canceled-on-match), and
|
||||
/// predicate-overload acceptance criteria.
|
||||
/// </summary>
|
||||
public class InstanceActorWaitForAttributeTests : TestKit, IDisposable
|
||||
{
|
||||
private readonly SiteStorageService _storage;
|
||||
private readonly ScriptCompilationService _compilationService;
|
||||
private readonly SharedScriptLibrary _sharedScriptLibrary;
|
||||
private readonly SiteRuntimeOptions _options;
|
||||
private readonly string _dbFile;
|
||||
|
||||
public InstanceActorWaitForAttributeTests()
|
||||
{
|
||||
_dbFile = Path.Combine(Path.GetTempPath(), $"instance-waitfor-test-{Guid.NewGuid():N}.db");
|
||||
_storage = new SiteStorageService(
|
||||
$"Data Source={_dbFile}",
|
||||
NullLogger<SiteStorageService>.Instance);
|
||||
_storage.InitializeAsync().GetAwaiter().GetResult();
|
||||
_compilationService = new ScriptCompilationService(
|
||||
NullLogger<ScriptCompilationService>.Instance);
|
||||
_sharedScriptLibrary = new SharedScriptLibrary(
|
||||
_compilationService, NullLogger<SharedScriptLibrary>.Instance);
|
||||
_options = new SiteRuntimeOptions();
|
||||
}
|
||||
|
||||
private IActorRef CreateInstanceActor(string instanceName, FlattenedConfiguration config)
|
||||
{
|
||||
return ActorOf(Props.Create(() => new InstanceActor(
|
||||
instanceName,
|
||||
JsonSerializer.Serialize(config),
|
||||
_storage,
|
||||
_compilationService,
|
||||
_sharedScriptLibrary,
|
||||
null, // no stream manager in tests
|
||||
_options,
|
||||
NullLogger<InstanceActor>.Instance)));
|
||||
}
|
||||
|
||||
void IDisposable.Dispose()
|
||||
{
|
||||
Shutdown();
|
||||
try { File.Delete(_dbFile); } catch { /* best-effort cleanup; ignore failures */ }
|
||||
}
|
||||
|
||||
// ── 1. Fast-path: attribute already at target ────────────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Acceptance §7.1: when the attribute already equals the target at the time
|
||||
/// the waiter registers, the actor must reply immediately with Matched=true
|
||||
/// (carrying the current value), without scheduling a timeout.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_FastPath_AlreadyAtTarget_RepliesMatchedImmediately()
|
||||
{
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute { CanonicalName = "Flag", Value = "true", DataType = "Boolean" }
|
||||
]
|
||||
};
|
||||
|
||||
var actor = CreateInstanceActor("Pump1", config);
|
||||
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-fast", "Pump1", "Flag",
|
||||
"true", null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(response.Matched);
|
||||
Assert.False(response.TimedOut);
|
||||
Assert.Equal("wfa-fast", response.CorrelationId);
|
||||
Assert.Equal("true", response.Value?.ToString());
|
||||
}
|
||||
|
||||
// ── 2. Change-match: register first, then drive a value change ───────────
|
||||
|
||||
/// <summary>
|
||||
/// Acceptance §7.1/§7.4: registering when the value does NOT match, then
|
||||
/// driving the attribute to the target value (via a DCL TagValueUpdate) must
|
||||
/// produce a single Matched=true reply carrying the new value.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_ChangeMatch_RepliesMatchedWithNewValue()
|
||||
{
|
||||
const string tag = "ns=3;s=Recipe.Processed";
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute
|
||||
{
|
||||
CanonicalName = "Processed", Value = "false", DataType = "Boolean",
|
||||
DataSourceReference = tag, BoundDataConnectionName = "PLC"
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
var dcl = CreateTestProbe();
|
||||
var actor = ActorOf(Props.Create(() => new InstanceActor(
|
||||
"Pump1",
|
||||
JsonSerializer.Serialize(config),
|
||||
_storage,
|
||||
_compilationService,
|
||||
_sharedScriptLibrary,
|
||||
null,
|
||||
_options,
|
||||
NullLogger<InstanceActor>.Instance,
|
||||
dcl.Ref)));
|
||||
|
||||
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
|
||||
|
||||
// Register: current value "false" does not match the target. The value
|
||||
// arrives from the DCL as a boolean true, whose codec-encoded form is
|
||||
// "True" — so the target must be encoded the same way the accessor would
|
||||
// (AttributeValueCodec.Encode(true)), NOT the literal string "true".
|
||||
var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode(true);
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-change", "Pump1", "Processed",
|
||||
target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
|
||||
|
||||
// No reply yet — the value has not changed to the target.
|
||||
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
|
||||
|
||||
// Drive the value to the target through the DCL ingest path.
|
||||
actor.Tell(new TagValueUpdate("PLC", tag, true, QualityCode.Good, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(response.Matched);
|
||||
Assert.False(response.TimedOut);
|
||||
Assert.Equal("wfa-change", response.CorrelationId);
|
||||
Assert.Equal(true, response.Value);
|
||||
Assert.Equal("Good", response.Quality);
|
||||
}
|
||||
|
||||
// ── 3. Timeout: value never matches ──────────────────────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Acceptance §7.2: when the attribute never reaches the target within the
|
||||
/// timeout, the actor replies Matched=false, TimedOut=true (no throw).
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_Timeout_RepliesNotMatchedTimedOut()
|
||||
{
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute { CanonicalName = "Flag", Value = "false", DataType = "Boolean" }
|
||||
]
|
||||
};
|
||||
|
||||
var actor = CreateInstanceActor("Pump1", config);
|
||||
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-timeout", "Pump1", "Flag",
|
||||
"true", null, TimeSpan.FromMilliseconds(300), DateTimeOffset.UtcNow));
|
||||
|
||||
// The scheduled timeout fires; allow a tolerant deadline.
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
|
||||
Assert.False(response.Matched);
|
||||
Assert.True(response.TimedOut);
|
||||
Assert.Equal("wfa-timeout", response.CorrelationId);
|
||||
}
|
||||
|
||||
// ── 4. No-leak: timeout canceled on match (no second reply) ──────────────
|
||||
|
||||
/// <summary>
|
||||
/// Acceptance §7.5: after a successful change-match, the scheduled timeout
|
||||
/// must have been canceled and the waiter removed — so NO second (timeout)
|
||||
/// response arrives after the match.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_Match_CancelsTimeout_NoSecondReply()
|
||||
{
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute { CanonicalName = "Flag", Value = "false", DataType = "Boolean" }
|
||||
]
|
||||
};
|
||||
|
||||
var actor = CreateInstanceActor("Pump1", config);
|
||||
|
||||
// Register with a short timeout, then match BEFORE it would fire.
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-noleak", "Pump1", "Flag",
|
||||
"true", null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow));
|
||||
|
||||
// Drive the static value to the target; the actor publishes via
|
||||
// HandleAttributeValueChanged, satisfying the waiter.
|
||||
actor.Tell(new SetStaticAttributeCommand(
|
||||
"set-flag", "Pump1", "Flag", "true", DateTimeOffset.UtcNow));
|
||||
|
||||
// First reply: the match. The set command's SetStaticAttributeResponse arrives
// after it; ExpectMsg does not filter by type, so this relies on that order.
|
||||
var matched = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(matched.Matched);
|
||||
Assert.False(matched.TimedOut);
|
||||
|
||||
// The set command's own ack — drain it so the no-msg assert below is clean.
|
||||
ExpectMsg<SetStaticAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
|
||||
// No second WaitForAttributeResponse (the timeout was canceled) for longer
|
||||
// than the original 500ms timeout window.
|
||||
ExpectNoMsg(TimeSpan.FromSeconds(1));
|
||||
}
|
||||
|
||||
// ── 5. Predicate overload ────────────────────────────────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Acceptance §7 (predicate form): registering with a site-local predicate and
|
||||
/// then flipping the value so the predicate passes must produce Matched=true.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_PredicateOverload_MatchesOnPredicatePass()
|
||||
{
|
||||
const string tag = "ns=3;s=Level";
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute
|
||||
{
|
||||
CanonicalName = "Level", Value = "0", DataType = "Int32",
|
||||
DataSourceReference = tag, BoundDataConnectionName = "PLC"
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
var dcl = CreateTestProbe();
|
||||
var actor = ActorOf(Props.Create(() => new InstanceActor(
|
||||
"Pump1",
|
||||
JsonSerializer.Serialize(config),
|
||||
_storage,
|
||||
_compilationService,
|
||||
_sharedScriptLibrary,
|
||||
null,
|
||||
_options,
|
||||
NullLogger<InstanceActor>.Instance,
|
||||
dcl.Ref)));
|
||||
|
||||
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
|
||||
|
||||
// Predicate: value > 50 (current is 0, so no immediate match).
|
||||
Func<object?, bool> predicate = v =>
|
||||
v is not null && int.TryParse(v.ToString(), out var n) && n > 50;
|
||||
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-pred", "Pump1", "Level",
|
||||
null, predicate, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
|
||||
|
||||
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
|
||||
|
||||
// A value below the threshold must NOT satisfy the predicate.
|
||||
actor.Tell(new TagValueUpdate("PLC", tag, 25, QualityCode.Good, DateTimeOffset.UtcNow));
|
||||
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
|
||||
|
||||
// A value above the threshold satisfies it.
|
||||
actor.Tell(new TagValueUpdate("PLC", tag, 75, QualityCode.Good, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(response.Matched);
|
||||
Assert.False(response.TimedOut);
|
||||
Assert.Equal(75, response.Value);
|
||||
}
|
||||
|
||||
// ── 6. "any change" (null target + null predicate) ───────────────────────
|
||||
|
||||
/// <summary>
|
||||
/// Spec §4.1: a null TargetValueEncoded + null Predicate means "wait for any
|
||||
/// change" (test <c>_ => true</c>). When the attribute ALREADY holds a value at
|
||||
/// registration, the fast-path matches IMMEDIATELY — there is no need to wait for
|
||||
/// a subsequent update. (A separate test covers the absent-at-registration case.)
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_AnyChange_MatchesImmediatelyWhenAttributePresent()
|
||||
{
|
||||
const string tag = "ns=3;s=Speed";
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute
|
||||
{
|
||||
CanonicalName = "Speed", Value = "0", DataType = "Int32",
|
||||
DataSourceReference = tag, BoundDataConnectionName = "PLC"
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
var dcl = CreateTestProbe();
|
||||
var actor = ActorOf(Props.Create(() => new InstanceActor(
|
||||
"Pump1",
|
||||
JsonSerializer.Serialize(config),
|
||||
_storage,
|
||||
_compilationService,
|
||||
_sharedScriptLibrary,
|
||||
null,
|
||||
_options,
|
||||
NullLogger<InstanceActor>.Instance,
|
||||
dcl.Ref)));
|
||||
|
||||
dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
|
||||
|
||||
// "any change" registers with a non-trivial timeout. The fast-path uses
|
||||
// `_ => true`, so a currently-present attribute matches immediately.
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-any", "Pump1", "Speed",
|
||||
null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
|
||||
|
||||
// Speed=0 is already present, so the "any change" test (_ => true) matches
|
||||
// immediately on the fast path.
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(response.Matched);
|
||||
Assert.False(response.TimedOut);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Spec §4.1 (companion to the immediate-match case): when the attribute is
|
||||
/// ABSENT at registration (no entry in <c>_attributes</c>), the "any change"
|
||||
/// waiter does NOT fast-path — it registers, and a later value update on that
|
||||
/// attribute is the first thing that satisfies it.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void WaitForAttribute_AnyChange_AttributeAbsent_MatchesOnLaterSet()
|
||||
{
|
||||
var config = new FlattenedConfiguration
|
||||
{
|
||||
InstanceUniqueName = "Pump1",
|
||||
Attributes =
|
||||
[
|
||||
new ResolvedAttribute { CanonicalName = "Known", Value = "x", DataType = "String" }
|
||||
]
|
||||
};
|
||||
|
||||
var actor = CreateInstanceActor("Pump1", config);
|
||||
|
||||
// "Ghost" is not a configured attribute, so _attributes has no entry — the
|
||||
// fast-path TryGetValue misses and the waiter registers rather than matching.
|
||||
actor.Tell(new WaitForAttributeRequest(
|
||||
"wfa-absent", "Pump1", "Ghost",
|
||||
null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
|
||||
|
||||
ExpectNoMsg(TimeSpan.FromMilliseconds(300));
|
||||
|
||||
// A direct AttributeValueChanged for "Ghost" populates _attributes and
|
||||
// re-evaluates the waiter; the any-change test now matches the new value.
|
||||
actor.Tell(new AttributeValueChanged(
|
||||
"Pump1", "Ghost", "Ghost", "appeared", "Good", DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
|
||||
Assert.True(response.Matched);
|
||||
Assert.False(response.TimedOut);
|
||||
Assert.Equal("wfa-absent", response.CorrelationId);
|
||||
Assert.Equal("appeared", response.Value);
|
||||
}
|
||||
|
||||
// ── 7. CRITICAL 1: no spurious match on a quality-only republish ─────────
|
||||
|
||||
/// <summary>
|
||||
/// CRITICAL 1 regression: the List-coerce-failure Bad-quality path republishes
|
||||
/// the OLD value (quality flipped to Bad) WITHOUT changing <c>_attributes</c>, so
|
||||
/// it passes <c>evaluateWaiters:false</c> — registered waiters are NOT re-evaluated
|
||||
/// on this non-change republish, must NOT spuriously fire, and must STILL resolve
|
||||
/// on the next genuine value change.
|
||||
///
|
||||
/// <para>
|
||||
/// We register an "any-change" waiter (which correctly fast-path matches the
|
||||
/// present value and is drained) plus a pending predicate waiter that does not yet
|
||||
/// match, then drive the Bad-quality republish and assert NO match is delivered for
|
||||
/// the pending waiter, and that a subsequent REAL change resolves it. (Note: the
|
||||
/// purest "any-change fires on a non-change republish" symptom is not directly
|
||||
/// reproducible — an any-change waiter against a present attribute always fast-path
|
||||
/// matches and so never stays pending across a republish; this test guards the
|
||||
/// republish path against double-firing / stranding waiters and against the
|
||||
/// predicate being re-evaluated on the non-change republish.)
|
||||
/// </para>
|
||||
/// </summary>
|
||||
    [Fact]
    public void WaitForAttribute_BadQualityRepublish_NoValueChange_DoesNotMatch()
    {
        const string tag = "ns=3;s=Items";
        var config = new FlattenedConfiguration
        {
            InstanceUniqueName = "Pump1",
            Attributes =
            [
                new ResolvedAttribute
                {
                    // Static default {1,2}: a real list value is present from
                    // construction so the Bad-quality republish has an OLD value to
                    // republish. The waiter below targets a DIFFERENT value so it is
                    // genuinely pending (no fast-path match) when the republish fires.
                    CanonicalName = "Items", Value = "[1,2]", DataType = "List",
                    ElementDataType = "Int32",
                    DataSourceReference = tag, BoundDataConnectionName = "PLC"
                }
            ]
        };

        var dcl = CreateTestProbe();
        var actor = ActorOf(Props.Create(() => new InstanceActor(
            "Pump1",
            JsonSerializer.Serialize(config),
            _storage,
            _compilationService,
            _sharedScriptLibrary,
            null,
            _options,
            NullLogger<InstanceActor>.Instance,
            dcl.Ref)));

        dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));

        // A predicate waiter that matches a list of length >= 3. Current value is
        // {1,2} (length 2) so it does NOT fast-path match — it registers and stays
        // pending. Crucially, the Bad-quality republish below carries the SAME OLD
        // value {1,2} (length 2); with the bug (evaluateWaiters always true) the
        // predicate would be re-evaluated against {1,2} → still false, so this probe
        // also guards the predicate-isolation contract on the republish path.
        Func<object?, bool> lenAtLeast3 = v =>
            v is System.Collections.IList list && list.Count >= 3;
        actor.Tell(new WaitForAttributeRequest(
            "wfa-len3", "Pump1", "Items",
            null, lenAtLeast3, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));

        // Also register an "any-change" waiter while the attribute is present — it
        // fast-path matches the current {1,2} immediately. Drain that correct match;
        // it is the documented immediate-match behaviour, not the bug under test.
        actor.Tell(new WaitForAttributeRequest(
            "wfa-any", "Pump1", "Items",
            null, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));
        var immediate = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.Equal("wfa-any", immediate.CorrelationId);
        Assert.True(immediate.Matched);

        // Drive the List-coerce-FAILURE Bad-quality republish: a scalar int cannot
        // coerce to List<Int32>, so the actor sets quality Bad and republishes the
        // OLD value {1,2} WITHOUT changing _attributes (evaluateWaiters:false).
        actor.Tell(new TagValueUpdate("PLC", tag, 999, QualityCode.Good, DateTimeOffset.UtcNow));

        // The pending length>=3 waiter must NOT fire on this non-change republish.
        ExpectNoMsg(TimeSpan.FromMilliseconds(500));

        // A REAL change to a length-3 list resolves the still-pending waiter.
        actor.Tell(new TagValueUpdate("PLC", tag, new[] { 7, 8, 9 }, QualityCode.Good, DateTimeOffset.UtcNow));
        var realChange = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.Equal("wfa-len3", realChange.CorrelationId);
        Assert.True(realChange.Matched);
        Assert.False(realChange.TimedOut);
    }

    // ── 8. CRITICAL 2: throwing predicate is isolated ────────────────────────

    /// <summary>
    /// CRITICAL 2 regression: two waiters on the SAME attribute — one with a
    /// predicate that throws, one a normal value-equality. A single value change
    /// must (a) NOT crash the actor, (b) evict the throwing waiter with a
    /// non-matched error reply, and (c) STILL resolve the normal sibling. Finally
    /// the actor must remain responsive to a subsequent request.
    /// </summary>
    [Fact]
    public void WaitForAttribute_ThrowingPredicate_IsIsolated_SiblingStillMatches()
    {
        const string tag = "ns=3;s=State";
        var config = new FlattenedConfiguration
        {
            InstanceUniqueName = "Pump1",
            Attributes =
            [
                new ResolvedAttribute
                {
                    CanonicalName = "State", Value = "init", DataType = "String",
                    DataSourceReference = tag, BoundDataConnectionName = "PLC"
                }
            ]
        };

        var dcl = CreateTestProbe();
        var actor = ActorOf(Props.Create(() => new InstanceActor(
            "Pump1",
            JsonSerializer.Serialize(config),
            _storage,
            _compilationService,
            _sharedScriptLibrary,
            null,
            _options,
            NullLogger<InstanceActor>.Instance,
            dcl.Ref)));

        dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));

        // Waiter A: predicate that returns false for the CURRENT value ("init") so
        // it clears the fast-path and registers, but THROWS once the value becomes
        // "ready" — exercising the resolve-loop guard (not the fast-path guard).
        Func<object?, bool> boom = v =>
            v?.ToString() == "ready" ? throw new InvalidOperationException("kaboom") : false;
        actor.Tell(new WaitForAttributeRequest(
            "wfa-throw", "Pump1", "State",
            null, boom, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));

        // Waiter B: normal value-equality waiting for "ready".
        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-normal", "Pump1", "State",
            target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));

        ExpectNoMsg(TimeSpan.FromMilliseconds(200));

        // One change to "ready": evaluates BOTH waiters on this attribute. The
        // throwing one must be evicted (error reply); the normal one must match.
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));

        // Collect the two replies (order is registry-iteration dependent).
        var r1 = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        var r2 = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        var byId = new[] { r1, r2 }.ToDictionary(r => r.CorrelationId);

        var thrown = byId["wfa-throw"];
        Assert.False(thrown.Matched);
        Assert.False(thrown.TimedOut);
        Assert.NotNull(thrown.ErrorMessage);
        Assert.Contains("Wait predicate threw", thrown.ErrorMessage);

        var normal = byId["wfa-normal"];
        Assert.True(normal.Matched);
        Assert.False(normal.TimedOut);
        Assert.Equal("ready", normal.Value);

        // The actor stayed alive and responsive: a follow-up request resolves.
        actor.Tell(new GetAttributeRequest("get-after", "Pump1", "State", DateTimeOffset.UtcNow));
        var get = ExpectMsg<GetAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.Equal("ready", get.Value);

        // And the throwing waiter was REMOVED (no longer in the registry): driving
        // another change produces NO further reply for it.
        actor.Tell(new TagValueUpdate("PLC", tag, "again", QualityCode.Good, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(500));
    }

    // ── 8b. CRITICAL 2 (fast-path): throwing predicate on already-held value ──

    /// <summary>
    /// CRITICAL 2 regression (fast-path analogue of
    /// <see cref="WaitForAttribute_ThrowingPredicate_IsIsolated_SiblingStillMatches"/>):
    /// a predicate that THROWS is registered against an attribute that ALREADY holds a
    /// value, so the fast-path <c>test(current)</c> runs and throws. The actor must
    /// (a) reply a non-matched <c>WaitForAttributeResponse</c> with a non-null
    /// <c>ErrorMessage</c> (predicate-threw), (b) stay alive/responsive (it answers a
    /// subsequent <c>GetAttributeRequest</c>), and (c) NOT register the waiter — there
    /// is no later/second reply even after a value change on that attribute (the
    /// fast-path guard returns WITHOUT scheduling a timeout or storing the waiter).
    /// </summary>
    [Fact]
    public void WaitForAttribute_ThrowingPredicate_FastPath_RepliesError_NoRegistration_ActorStaysAlive()
    {
        const string tag = "ns=3;s=State";
        var config = new FlattenedConfiguration
        {
            InstanceUniqueName = "Pump1",
            Attributes =
            [
                new ResolvedAttribute
                {
                    // Present from construction so the fast-path TryGetValue HITS and
                    // the predicate runs on the current value (and throws).
                    CanonicalName = "State", Value = "init", DataType = "String",
                    DataSourceReference = tag, BoundDataConnectionName = "PLC"
                }
            ]
        };

        var dcl = CreateTestProbe();
        var actor = ActorOf(Props.Create(() => new InstanceActor(
            "Pump1",
            JsonSerializer.Serialize(config),
            _storage,
            _compilationService,
            _sharedScriptLibrary,
            null,
            _options,
            NullLogger<InstanceActor>.Instance,
            dcl.Ref)));

        dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));

        // Predicate THROWS unconditionally — the current value "init" is already
        // present, so the fast-path test(current) executes it and throws.
        Func<object?, bool> boom = _ => throw new InvalidOperationException("kaboom");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-fp-throw", "Pump1", "State",
            null, boom, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow));

        // (a) Non-matched error reply (predicate-threw), guarded on the fast-path.
        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.Equal("wfa-fp-throw", response.CorrelationId);
        Assert.False(response.Matched);
        Assert.False(response.TimedOut);
        Assert.NotNull(response.ErrorMessage);
        Assert.Contains("Wait predicate threw", response.ErrorMessage);

        // (b) The actor stayed alive and responsive: a follow-up request resolves.
        actor.Tell(new GetAttributeRequest("get-after-fp", "Pump1", "State", DateTimeOffset.UtcNow));
        var get = ExpectMsg<GetAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.Equal("init", get.Value);

        // (c) The waiter was NOT registered (no timeout scheduled): driving a value
        // change on "State" produces NO further WaitForAttributeResponse.
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(500));
    }

    // ── 9. Quality-gated ("Good"-only) matching (spec §4.2) ──────────────────

    /// <summary>
    /// Builds a data-connected instance actor with a single attribute backed by a
    /// DCL probe, draining the initial <c>SubscribeTagsRequest</c>. Used by the
    /// quality-gate tests, which drive value+quality through the DCL ingest path.
    /// </summary>
    private IActorRef CreateDataConnectedActor(
        string instanceName, string attribute, string tag, string dataType, TestProbe dcl)
    {
        var config = new FlattenedConfiguration
        {
            InstanceUniqueName = instanceName,
            Attributes =
            [
                new ResolvedAttribute
                {
                    CanonicalName = attribute, Value = "init", DataType = dataType,
                    DataSourceReference = tag, BoundDataConnectionName = "PLC"
                }
            ]
        };

        var actor = ActorOf(Props.Create(() => new InstanceActor(
            instanceName,
            JsonSerializer.Serialize(config),
            _storage,
            _compilationService,
            _sharedScriptLibrary,
            null,
            _options,
            NullLogger<InstanceActor>.Instance,
            dcl.Ref)));

        dcl.ExpectMsg<SubscribeTagsRequest>(TimeSpan.FromSeconds(5));
        return actor;
    }

    /// <summary>
    /// Spec §4.2 (change-match): with <c>RequireGoodQuality:true</c>, a value that
    /// reaches the target but arrives at <b>Bad</b> quality is NOT a match — the
    /// waiter stays pending and times out.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityGated_ChangeMatch_BadQuality_DoesNotMatch_TimesOut()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qg-bad", "Pump1", "State",
            target, null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow,
            RequireGoodQuality: true));

        // Value reaches the target but at Bad quality → must NOT match.
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));

        // The only reply must be the timeout (no spurious Bad-quality match).
        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
        Assert.False(response.Matched);
        Assert.True(response.TimedOut);
        Assert.Equal("wfa-qg-bad", response.CorrelationId);
    }

    /// <summary>
    /// Spec §4.2 (change-match, quality-agnostic baseline): the SAME Bad-quality
    /// value-reaches-target scenario DOES match when <c>RequireGoodQuality:false</c>.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityAgnostic_ChangeMatch_BadQuality_Matches()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qa-bad", "Pump1", "State",
            target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
            RequireGoodQuality: false));

        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));

        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.True(response.Matched);
        Assert.False(response.TimedOut);
        Assert.Equal("wfa-qa-bad", response.CorrelationId);
        Assert.Equal("ready", response.Value);
        Assert.Equal("Bad", response.Quality);
    }

    /// <summary>
    /// Spec §4.2 (change-match): with <c>RequireGoodQuality:true</c>, a value that
    /// reaches the target at <b>Good</b> quality matches normally. Also proves the
    /// gate is per-quality not per-value: a Bad-quality arrival at the target is
    /// skipped, then a Good-quality arrival at the target resolves the waiter.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityGated_ChangeMatch_GoodQuality_Matches()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qg-good", "Pump1", "State",
            target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
            RequireGoodQuality: true));

        // First arrival at target but Bad quality is skipped (gate holds it pending).
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(400));

        // Then a Good-quality arrival at the target resolves it.
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));

        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.True(response.Matched);
        Assert.False(response.TimedOut);
        Assert.Equal("wfa-qg-good", response.CorrelationId);
        Assert.Equal("ready", response.Value);
        Assert.Equal("Good", response.Quality);
    }

    /// <summary>
    /// Spec §4.2 (fast-path): the attribute ALREADY holds the target value at
    /// <b>Bad</b> quality when the quality-gated waiter registers. The fast-path must
    /// NOT reply matched — it registers + schedules the timeout like any pending
    /// waiter, and (here) times out because the value never reaches target at Good.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityGated_FastPath_AlreadyAtTargetButBad_DoesNotMatch_TimesOut()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        // Seed the attribute to the target value at Bad quality BEFORE registering.
        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(200)); // no waiter yet → no reply

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qg-fp-bad", "Pump1", "State",
            target, null, TimeSpan.FromMilliseconds(500), DateTimeOffset.UtcNow,
            RequireGoodQuality: true));

        // Fast-path quality-fail → registers, then times out (no fast matched reply).
        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(3));
        Assert.False(response.Matched);
        Assert.True(response.TimedOut);
        Assert.Equal("wfa-qg-fp-bad", response.CorrelationId);
    }

    /// <summary>
    /// Spec §4.2 (fast-path, quality-agnostic baseline): the SAME already-at-target-
    /// but-Bad attribute fast-path MATCHES when <c>RequireGoodQuality:false</c>.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityAgnostic_FastPath_AlreadyAtTargetButBad_Matches()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Bad, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(200));

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qa-fp-bad", "Pump1", "State",
            target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
            RequireGoodQuality: false));

        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.True(response.Matched);
        Assert.False(response.TimedOut);
        Assert.Equal("wfa-qa-fp-bad", response.CorrelationId);
        Assert.Equal("ready", response.Value);
        Assert.Equal("Bad", response.Quality);
    }

    /// <summary>
    /// Spec §4.2 (fast-path): the attribute ALREADY holds the target value at
    /// <b>Good</b> quality when the quality-gated waiter registers → the fast-path
    /// matches immediately.
    /// </summary>
    [Fact]
    public void WaitForAttribute_QualityGated_FastPath_AlreadyAtTargetGood_MatchesImmediately()
    {
        const string tag = "ns=3;s=State";
        var dcl = CreateTestProbe();
        var actor = CreateDataConnectedActor("Pump1", "State", tag, "String", dcl);

        actor.Tell(new TagValueUpdate("PLC", tag, "ready", QualityCode.Good, DateTimeOffset.UtcNow));
        ExpectNoMsg(TimeSpan.FromMilliseconds(200));

        var target = ZB.MOM.WW.ScadaBridge.Commons.Types.AttributeValueCodec.Encode("ready");
        actor.Tell(new WaitForAttributeRequest(
            "wfa-qg-fp-good", "Pump1", "State",
            target, null, TimeSpan.FromSeconds(30), DateTimeOffset.UtcNow,
            RequireGoodQuality: true));

        var response = ExpectMsg<WaitForAttributeResponse>(TimeSpan.FromSeconds(5));
        Assert.True(response.Matched);
        Assert.False(response.TimedOut);
        Assert.Equal("wfa-qg-fp-good", response.CorrelationId);
        Assert.Equal("ready", response.Value);
        Assert.Equal("Good", response.Quality);
    }
}
@@ -0,0 +1,291 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.Logging.Abstractions;
using Moq;
using ZB.MOM.WW.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.ScriptExecution;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using IAuditWriter = ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services.IAuditWriter;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Actors;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;

namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests.Scripts;

/// <summary>
/// Audit Log #23 (M5.4 — ParentExecutionId tag-cascade): nested
/// <c>CallScript</c> / <c>CallShared</c> invocations and alarm on-trigger runs
/// must form a true execution tree, where each spawned run records its
/// immediate spawner's <c>ExecutionId</c> as its <c>ParentExecutionId</c>.
///
/// <list type="bullet">
/// <item><description>
/// A nested <c>CallScript</c> (actor-routed) emits a
/// <see cref="ScriptCallRequest"/> whose <c>ParentExecutionId</c> is the
/// CALLING run's OWN <c>ExecutionId</c> — NOT the inherited grandparent — so
/// <c>A → CallScript(B)</c> yields <c>B.Parent == A.ExecutionId</c>.
/// </description></item>
/// <item><description>
/// A nested <c>CallShared</c> (inline) runs in a child context that mints a
/// fresh <c>ExecutionId</c> and records the caller's <c>ExecutionId</c> as its
/// parent — so <c>B → CallShared(C)</c> yields <c>C.Parent == B.ExecutionId</c>
/// (and NOT B's inherited parent A), proving a multi-level tree.
/// </description></item>
/// <item><description>
/// The alarm on-trigger plumbing carries a <c>parentExecutionId</c> into the
/// script context — null today (the run is a root) but threaded so a future
/// firing id can flow.
/// </description></item>
/// </list>
/// </summary>
public class ParentExecutionTreeTests : TestKit
{
    private const string InstanceName = "Plant.Pump42";

    /// <summary>
    /// In-memory <see cref="IAuditWriter"/> capturing every emitted event
    /// (mirrors <c>ExecutionCorrelationContextTests.CapturingAuditWriter</c>).
    /// </summary>
    private sealed class CapturingAuditWriter : IAuditWriter
    {
        public List<AuditRowProjection.AuditRowValues> Events { get; } = new();

        public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
        {
            Events.Add(evt.AsRow());
            return Task.CompletedTask;
        }
    }

    private static SharedScriptLibrary NewLibrary()
    {
        var compilationService = new ScriptCompilationService(
            NullLogger<ScriptCompilationService>.Instance);
        return new SharedScriptLibrary(
            compilationService, NullLogger<SharedScriptLibrary>.Instance);
    }

    /// <summary>
    /// Builds a context whose <c>CallScript</c> Ask targets <paramref name="instanceActor"/>
    /// (a probe), so the forwarded <see cref="ScriptCallRequest"/> can be captured.
    /// </summary>
    private static ScriptRuntimeContext CreateContext(
        IActorRef instanceActor,
        SharedScriptLibrary library,
        IExternalSystemClient? externalSystemClient = null,
        IAuditWriter? auditWriter = null,
        Guid? executionId = null,
        Guid? parentExecutionId = null)
    {
        return new ScriptRuntimeContext(
            instanceActor,
            ActorRefs.Nobody,
            library,
            currentCallDepth: 0,
            maxCallDepth: 10,
            askTimeout: TimeSpan.FromSeconds(5),
            instanceName: InstanceName,
            logger: NullLogger.Instance,
            externalSystemClient: externalSystemClient,
            siteId: "site-77",
            sourceScript: "ScriptActor:A",
            auditWriter: auditWriter,
            executionId: executionId,
            parentExecutionId: parentExecutionId);
    }

    // -------------------------------------------------------------------------
    // Nested CallScript (actor-routed) — A → CallScript(B)
    // -------------------------------------------------------------------------

    [Fact]
    public async Task CallScript_StampsCallingRunsOwnExecutionId_AsChildParent()
    {
        // A → CallScript(B): the child request's ParentExecutionId must be A's
        // OWN ExecutionId, forming the A→B tree edge.
        var probe = CreateTestProbe();
        var aExecutionId = Guid.NewGuid();
        var context = CreateContext(probe.Ref, NewLibrary(), executionId: aExecutionId);

        var call = context.CallScript("B");

        var request = probe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("B", request.ScriptName);
        // B's parent is A's own execution id — the A→B tree edge.
        Assert.Equal(aExecutionId, request.ParentExecutionId);

        // Unblock the Ask so the test completes cleanly.
        probe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
        await call;
    }

    [Fact]
    public async Task CallScript_FromRoutedRun_UsesOwnExecutionId_NotInheritedParent()
    {
        // A 2-level tree edge: B was itself spawned (it carries a parent = A).
        // When B does CallScript(C), C.Parent must be B's OWN ExecutionId — NOT
        // the inherited A. This is the regression that distinguishes a true tree
        // from a flattened "everything under the original spawner" model.
        var probe = CreateTestProbe();
        var bExecutionId = Guid.NewGuid();
        var aExecutionId = Guid.NewGuid(); // B's inherited parent
        var context = CreateContext(
            probe.Ref, NewLibrary(),
            executionId: bExecutionId,
            parentExecutionId: aExecutionId);

        var call = context.CallScript("C");

        var request = probe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal(bExecutionId, request.ParentExecutionId);
        Assert.NotEqual(aExecutionId, request.ParentExecutionId);

        probe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
        await call;
    }

    // -------------------------------------------------------------------------
    // Nested CallShared (inline) — B → CallShared(C)
    // -------------------------------------------------------------------------

    [Fact]
    public async Task CallShared_ChildRun_ParentIsCallersExecutionId_FreshOwnExecutionId()
    {
        // B → CallShared(C): the shared script C runs inline but is modelled as
        // its OWN execution node — a fresh ExecutionId parented to B's
        // ExecutionId. Asserted via the audit row C emits through
        // Instance.ExternalSystem.Call.
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(true, "{}", null));
        var writer = new CapturingAuditWriter();

        var library = NewLibrary();
        Assert.True(library.CompileAndRegister(
            "C", "await Instance.ExternalSystem.Call(\"ERP\", \"GetOrder\"); return null;"));

        var bExecutionId = Guid.NewGuid();
        var context = CreateContext(
            ActorRefs.Nobody, library,
            externalSystemClient: client.Object,
            auditWriter: writer,
            executionId: bExecutionId);

        await context.Scripts.CallShared("C");

        var evt = Assert.Single(writer.Events);
        // C's parent is B's execution id — the B→C tree edge.
        Assert.Equal(bExecutionId, evt.ParentExecutionId);
        // C minted its OWN fresh, non-empty execution id, distinct from B.
        Assert.NotNull(evt.ExecutionId);
        Assert.NotEqual(Guid.Empty, evt.ExecutionId!.Value);
        Assert.NotEqual(bExecutionId, evt.ExecutionId!.Value);
    }

    [Fact]
    public async Task CallShared_FromRoutedRun_ChildParentIsCaller_NotInheritedGrandparent()
    {
        // Regression / multi-level: B itself carries a parent A. When B does
        // CallShared(C), C.Parent must be B's OWN ExecutionId — NOT A. This is
        // the A→B→C chain proving each level points at its immediate spawner.
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(true, "{}", null));
        var writer = new CapturingAuditWriter();

        var library = NewLibrary();
        Assert.True(library.CompileAndRegister(
            "C", "await Instance.ExternalSystem.Call(\"ERP\", \"GetOrder\"); return null;"));

        var bExecutionId = Guid.NewGuid();
        var aExecutionId = Guid.NewGuid(); // B's inherited parent
        var context = CreateContext(
            ActorRefs.Nobody, library,
            externalSystemClient: client.Object,
            auditWriter: writer,
            executionId: bExecutionId,
            parentExecutionId: aExecutionId);

        await context.Scripts.CallShared("C");

        var evt = Assert.Single(writer.Events);
        Assert.Equal(bExecutionId, evt.ParentExecutionId);
        Assert.NotEqual(aExecutionId, evt.ParentExecutionId);
    }

    // -------------------------------------------------------------------------
    // Child-context contract (CallShared) and alarm on-trigger plumbing
    // -------------------------------------------------------------------------

    [Fact]
    public void CreateChildContextForSharedScript_ParentIsCallerExecution_FreshOwnId()
    {
        // Unit-level proof of the child-context contract the CallShared path uses.
        var bExecutionId = Guid.NewGuid();
        var context = CreateContext(
            ActorRefs.Nobody, NewLibrary(), executionId: bExecutionId);

        var child = context.CreateChildContextForSharedScript(childCallDepth: 1);

        Assert.Equal(bExecutionId, child.ParentExecutionId);
        Assert.NotEqual(Guid.Empty, child.ExecutionId);
        Assert.NotEqual(bExecutionId, child.ExecutionId);
    }

[Fact]
|
||||
public void AlarmOnTrigger_NestedCallScript_CarriesAlarmRunsOwnExecutionId_AsParent()
|
||||
{
|
||||
// End-to-end alarm plumbing: when an alarm fires, its on-trigger script
|
||||
// runs in a ScriptRuntimeContext built by AlarmExecutionActor. With no
|
||||
// Guid firing id today the alarm run is a ROOT (its own ParentExecutionId
|
||||
// is null), but it still mints its OWN fresh ExecutionId. A nested
|
||||
// CallScript from that on-trigger script must therefore carry the alarm
|
||||
// run's OWN (non-null) ExecutionId as the child's ParentExecutionId —
|
||||
// proving the alarm context is a proper execution node feeding the
|
||||
// cascade and the parentExecutionId parameter is plumbed end-to-end.
|
||||
var compilationService = new ScriptCompilationService(
|
||||
NullLogger<ScriptCompilationService>.Instance);
|
||||
var sharedLibrary = new SharedScriptLibrary(
|
||||
compilationService, NullLogger<SharedScriptLibrary>.Instance);
|
||||
var options = new SiteRuntimeOptions();
|
||||
|
||||
var onTrigger = compilationService.Compile(
|
||||
"OnTrigger", "await Instance.CallScript(\"Child\"); return null;");
|
||||
Assert.NotNull(onTrigger.CompiledScript);
|
||||
|
||||
var alarmConfig = new ResolvedAlarm
|
||||
{
|
||||
CanonicalName = "HighTemp",
|
||||
TriggerType = "ValueMatch",
|
||||
TriggerConfiguration = "{\"attributeName\":\"Status\",\"matchValue\":\"Critical\"}",
|
||||
PriorityLevel = 1
|
||||
};
|
||||
|
||||
var instanceProbe = CreateTestProbe();
|
||||
var alarm = ActorOf(Props.Create(() => new AlarmActor(
|
||||
"HighTemp", "Pump1", instanceProbe.Ref, alarmConfig,
|
||||
onTrigger.CompiledScript, sharedLibrary, options,
|
||||
NullLogger<AlarmActor>.Instance)));
|
||||
|
||||
alarm.Tell(new AttributeValueChanged(
|
||||
"Pump1", "Status", "Status", "Critical", "Good", DateTimeOffset.UtcNow));
|
||||
|
||||
// The alarm raises (instance gets AlarmStateChanged) AND the on-trigger
|
||||
// script fires its nested CallScript at the instance.
|
||||
instanceProbe.ExpectMsg<AlarmStateChanged>(TimeSpan.FromSeconds(5));
|
||||
var request = instanceProbe.ExpectMsg<ScriptCallRequest>(TimeSpan.FromSeconds(5));
|
||||
|
||||
Assert.Equal("Child", request.ScriptName);
|
||||
// The alarm run is a root today (its own parent is null), but its OWN
|
||||
// freshly-minted ExecutionId cascades to the child — so the child's
|
||||
// ParentExecutionId is a real, non-empty value, NOT null.
|
||||
Assert.NotNull(request.ParentExecutionId);
|
||||
Assert.NotEqual(Guid.Empty, request.ParentExecutionId!.Value);
|
||||
|
||||
instanceProbe.Reply(new ScriptCallResult(request.CorrelationId, true, null, null));
|
||||
}
|
||||
}
|
@@ -1,3 +1,8 @@
using Akka.Actor;
using Akka.TestKit;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.Commons.Messages.Instance;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Scripts;
using ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
@@ -137,3 +142,157 @@ public class ScopeAccessorTests
        Assert.Equal("[1,2,3]", encoded);
    }
}

/// <summary>
/// WaitAsync (spec §3-§5, acceptance §7.6) scope-resolution tests. Unlike the
/// path-arithmetic tests above, these route a real <see cref="ScriptRuntimeContext"/>
/// against a TestProbe standing in for the Instance Actor, so they need a live
/// ActorSystem — hence a TestKit-derived class. They assert that
/// <c>Attributes.WaitAsync</c> applies <see cref="AttributeAccessor.Resolve"/>
/// (the composition prefix) to the key BEFORE the request is sent to the actor —
/// the same contract Get/Set obey.
/// </summary>
public class AttributeAccessorWaitAsyncTests : TestKit, IDisposable
{
    private ScriptRuntimeContext MakeContext(IActorRef instanceActor) =>
        new(
            instanceActor,
            instanceActor,
            sharedScriptLibrary: null!,
            currentCallDepth: 0,
            maxCallDepth: 10,
            askTimeout: TimeSpan.FromSeconds(2),
            instanceName: "Pump1",
            logger: NullLogger<ScriptRuntimeContext>.Instance);

    void IDisposable.Dispose() => Shutdown();

    [Fact]
    public void WaitAsync_Value_AppliesScopeResolution_BeforeSendingRequest()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        // Composed scope "TempSensor" — Resolve("Flag") => "TempSensor.Flag".
        var acc = new AttributeAccessor(ctx, "TempSensor");

        // Fire-and-forget; the assertion is on the message the actor receives.
        _ = acc.WaitAsync("Flag", true, TimeSpan.FromSeconds(30));

        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("TempSensor.Flag", req.AttributeName);
        // The value overload encodes the target via AttributeValueCodec.Encode and
        // sends a null predicate. bool true encodes to "True" (capital T).
        Assert.Equal(AttributeValueCodec.Encode(true), req.TargetValueEncoded);
        Assert.Equal("True", req.TargetValueEncoded);
        Assert.Null(req.Predicate);
        Assert.Equal("Pump1", req.InstanceName);
    }

    [Fact]
    public void WaitAsync_Predicate_AppliesScopeResolution_AndSendsPredicate()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        var acc = new AttributeAccessor(ctx, "Motor.TempSensor");

        Func<object?, bool> predicate = _ => true;
        _ = acc.WaitAsync("Level", predicate, TimeSpan.FromSeconds(30));

        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("Motor.TempSensor.Level", req.AttributeName);
        // The predicate overload sends the delegate and a null encoded target.
        Assert.Null(req.TargetValueEncoded);
        Assert.NotNull(req.Predicate);
    }

    [Fact]
    public void WaitAsync_RootScope_LeavesKeyBare()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        var acc = new AttributeAccessor(ctx, "");
        _ = acc.WaitAsync("Flag", true, TimeSpan.FromSeconds(30));

        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("Flag", req.AttributeName);
    }

    // ── WaitForAsync (spec §3): scope resolution + populated WaitResult ───────

    [Fact]
    public async Task WaitForAsync_Value_AppliesScopeResolution_AndSurfacesPopulatedWaitResult()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        // Composed scope "TempSensor" — Resolve("Flag") => "TempSensor.Flag".
        var acc = new AttributeAccessor(ctx, "TempSensor");

        var task = acc.WaitForAsync("Flag", true, TimeSpan.FromSeconds(30));

        // The actor receives the scope-resolved, codec-encoded request.
        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("TempSensor.Flag", req.AttributeName);
        Assert.Equal(AttributeValueCodec.Encode(true), req.TargetValueEncoded);
        Assert.Null(req.Predicate);
        Assert.False(req.RequireGoodQuality);

        // Reply with a matched response — the accessor must surface the full WaitResult.
        probe.Reply(new WaitForAttributeResponse(
            req.CorrelationId, Matched: true, Value: true, Quality: "Good", TimedOut: false));

        var result = await task;
        Assert.True(result.Matched);
        Assert.Equal(true, result.Value);
        Assert.Equal("Good", result.Quality);
        Assert.False(result.TimedOut);
    }

    [Fact]
    public async Task WaitForAsync_Predicate_AppliesScopeResolution_AndSurfacesWaitResult()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        var acc = new AttributeAccessor(ctx, "Motor.TempSensor");

        Func<object?, bool> predicate = _ => true;
        var task = acc.WaitForAsync("Level", predicate, TimeSpan.FromSeconds(30));

        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.Equal("Motor.TempSensor.Level", req.AttributeName);
        Assert.Null(req.TargetValueEncoded);
        Assert.NotNull(req.Predicate);

        probe.Reply(new WaitForAttributeResponse(
            req.CorrelationId, Matched: true, Value: 42, Quality: "Good", TimedOut: false));

        var result = await task;
        Assert.True(result.Matched);
        Assert.Equal(42, result.Value);
    }

    [Fact]
    public async Task WaitForAsync_RequireGoodQuality_ThreadsFlagIntoRequest()
    {
        var probe = CreateTestProbe();
        var ctx = MakeContext(probe.Ref);

        var acc = new AttributeAccessor(ctx, "");
        var task = acc.WaitForAsync("Flag", true, TimeSpan.FromSeconds(30), requireGoodQuality: true);

        var req = probe.ExpectMsg<WaitForAttributeRequest>(TimeSpan.FromSeconds(5));
        Assert.True(req.RequireGoodQuality);

        probe.Reply(new WaitForAttributeResponse(
            req.CorrelationId, Matched: false, Value: null, Quality: null, TimedOut: true));

        var result = await task;
        Assert.False(result.Matched);
        Assert.True(result.TimedOut);
        Assert.Null(result.Value);
    }
}
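
Reviewer note: the three scope cases these tests pin down (a composed scope, a nested scope, and the root scope) can be summarized in a small standalone sketch. `ScopeSketch.Resolve` below is a hypothetical stand-in written for illustration; it is not the real `AttributeAccessor.Resolve`, whose implementation this diff does not show. It assumes only the behaviour the assertions imply: a non-empty scope is joined to the key with a dot, and an empty scope leaves the key bare.

```csharp
using System;

// Hypothetical illustration only: mirrors the attribute names the tests assert,
// not the actual AttributeAccessor.Resolve implementation.
public static class ScopeSketch
{
    // Prefix the key with the composition scope; the root scope ("") adds nothing.
    public static string Resolve(string scope, string key) =>
        string.IsNullOrEmpty(scope) ? key : scope + "." + key;

    public static void Main()
    {
        Console.WriteLine(Resolve("TempSensor", "Flag"));        // TempSensor.Flag
        Console.WriteLine(Resolve("Motor.TempSensor", "Level")); // Motor.TempSensor.Level
        Console.WriteLine(Resolve("", "Flag"));                  // Flag
    }
}
```

If the real accessor works along these lines, keeping the prefixing in one place is what would let Get, Set, and the wait calls agree on the attribute name the Instance Actor receives.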