Merge M2: stillpending.md Tier-2 correctness & behavioral gaps (#7,#8,#9,#10,#13,#15,#17,#18,#20,#21,#22,#23,#24,#25,#26,#27,#28,#29,#30,#31,#32)

20 tasks (M2.0-M2.19), each through its classification-driven review chain.
Full-solution build green (0 warnings, TreatWarningsAsErrors). Per-task targeted
suites all passed. Known pre-existing: 2 partition-purge E2E failures (follow-up #52).
Joseph Doherty
2026-06-16 08:27:59 -04:00
110 changed files with 13232 additions and 495 deletions
+13 -13
@@ -36,28 +36,28 @@ public class ApiMethod
     public int Id { get; set; }
     public string Name { get; set; } // route segment
     public string Script { get; set; } // Roslyn C# script body
-    public string? ParameterDefinitions { get; set; } // JSON: List<ParameterDefinition>
-    public string? ReturnDefinition { get; set; } // JSON: List<ReturnFieldDefinition>
+    public string? ParameterDefinitions { get; set; } // JSON Schema (object) describing parameters
+    public string? ReturnDefinition { get; set; } // JSON Schema describing the return value
     public int TimeoutSeconds { get; set; }
 }
``` ```
-`ParameterDefinitions` and `ReturnDefinition` are stored as JSON strings to keep the schema simple; both are deserialized on every request by `ParameterValidator` and `ReturnValueValidator`.
+`ParameterDefinitions` and `ReturnDefinition` are stored as JSON Schema strings (canonical form: `{"type":"object","properties":{…},"required":[…]}`, arrays via `"items"`); both are parsed on every request by `ParameterValidator` and `ReturnValueValidator` into a shared recursive `InboundApiSchema` (Commons). The legacy flat-array form (`[{name,type,required,itemType?}]`) is still accepted on read.
 ### Extended type system
-Parameter and return field definitions share the same six-type vocabulary:
+Parameter and return definitions share the same six-type vocabulary (JSON Schema type tokens in parentheses):
-| Type | JSON shape | C# value after coercion |
-|-----------|----------------------|-------------------------------------|
-| `Boolean` | `true` / `false` | `bool` |
-| `Integer` | number (whole) | `long` |
-| `Float` | number | `double` |
-| `String` | string | `string` |
-| `Object` | JSON object | `Dictionary<string, object?>` |
-| `List` | JSON array | `List<object?>` |
+| Type | JSON Schema token | JSON shape | C# value after coercion |
+|-----------|-------------------|------------------|-------------------------------|
+| `Boolean` | `boolean` | `true` / `false` | `bool` |
+| `Integer` | `integer` | number (whole) | `long` |
+| `Float` | `number` | number | `double` |
+| `String` | `string` | string | `string` |
+| `Object` | `object` | JSON object | `Dictionary<string, object?>` |
+| `List` | `array` | JSON array | `List<object?>` |
-`Object` and `List` are validated for JSON shape only — field-level or element-level type constraints are the script's responsibility. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
+`Object` and `List` are validated **recursively**: a declared object validates each field against its declared (nested) type and rejects undeclared fields; a list validates every element against the declared `items` type. Scalars are checked at any depth and errors are path-qualified (e.g. `order.items[2].quantity`). A bare `{"type":"object"}` / `{"type":"array"}` (no `properties` / `items`) stays shape-only. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
 ## Architecture
@@ -0,0 +1,203 @@
# M2 — Correctness & Behavioral Gaps (Tier 2) Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: superpowers-extended-cc:subagent-driven-development. Execute task-by-task on branch `feature/stillpending-m2`, in-place (NOT a worktree — docker tooling builds from the repo path; implementers run **serially** to avoid racing the shared git index). Honor each task's `Classification` for the review chain.
**Goal:** Close the Tier-2 correctness/behavioral divergences from `stillpending.md` — make narrow/inert behaviors match the spec, and where the spec was the divergence, update it in the same slice.
**Architecture:** Touches the central Config DB (EF migrations), Site Runtime actors, the DCL alarm pipeline, the template validation/flattening pipeline, the deployment diff, Host startup validation, the Security cookie-auth pipeline, and Site Event Logging. Each item is independently shippable.
**Tech Stack:** C#/.NET 10, EF Core 10 (MS SQL central + SQLite site), Akka.NET 1.5, OPC UA (`OPCFoundation.NetStandard.Opc.Ua.Client`), ASP.NET Core cookie auth, xUnit/FluentAssertions/NSubstitute.
**Build/verify:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (TreatWarningsAsErrors ON). Redeploy: `bash docker/deploy.sh`. Test user `--username multi-role --password password`.
---
## Scope decisions (recorded; per "use recommendations")
- **#19 (script started/completed events)** — already shipped in M1.8 (`e74c3ae`). **Excluded.**
- **#16 (Transport stale-instance enumeration)** — genuine Tier-2 gap but NOT in the approved M2 list, and the fix needs a non-trivial shared-script-hash staleness compute across instances. **Deferred to the Transport milestone (M8).** Tracked, not dropped.
- **#17 (MachineDataDb)** — a deliberate prior decision ("Host-008") removed this validation with a regression test asserting absence *passes*. The approved design doc says to add the option + startup validation, and both REQ-HOST-3/4 and the shipped docker `appsettings.Central.json` carry the key. **Resolution: implement per design doc (add option + central startup validation, no DbContext since nothing consumes it), reverting the Host-008 regression test and noting the reversal in the commit.**
- **#31 (StateTransitionValidator delete-from-NotDeployed)** — the audit claimed a "deliberate per code comment"; investigation found **no such comment**. **Reconcile by intent (git blame); default = align code to the spec matrix (remove `NotDeployed` from `CanDelete`) unless blame shows deliberate orphan-cleanup intent, in which case update the doc matrix.**
- **#8 (conditionFilter) semantics** — the filter is currently an undefined nullable string. **Define it as a comma-separated, case-insensitive list of alarm/condition *type names*; null/blank = mirror all.** Authoritative enforcement is **client-side in `DataConnectionActor` routing** (uniform across OPC UA + MxGateway, since MxGateway has no server-side filter); OPC UA additionally gets a server-side `WhereClause` as a bandwidth optimization where the type maps cleanly. Implementer confirms the discriminator field on `NativeAlarmTransition`.
- **#15 (LDAP re-query)** — highest risk; passwordless group re-query depends on a shared-lib capability that may not exist. **Spike first**, then ship the always-achievable layers (idle-timeout enforcement + DB role-mapping refresh on stored group claims) and the LDAP group re-query only if the lib supports a service-account search; document any residual limitation.
---
## Execution order & dependencies
Risk-first, migration-safe ordering. M2.0 (`#32`) runs first because it unblocks DB-backed verification. The two migration-touching tasks (M2.0, M2.5) are serialized so the snapshot stays clean.
| # | Task | Class | Migration? |
|---|------|-------|-----------|
| M2.0 | #32 EF model/snapshot drift | high-risk | snapshot only |
| M2.1 | #22 native-alarm capability validation wired into deploy pipeline | standard | no |
| M2.2 | #10 connection-level diff surfaced | standard | no |
| M2.3 | #7 `Database.CachedWrite` transient/permanent classification | high-risk | no |
| M2.4 | #8 alarm `conditionFilter` applied | high-risk | no |
| M2.5 | #9 per-script execution timeout | standard | **yes** (new column) |
| M2.6 | #13 nested `Object`/`List` validation | standard | no |
| M2.7 | #20 + #21 return-type + argument-type compatibility | standard | no |
| M2.8 | #23 binding-completeness Error + "name exists at site" | standard | no |
| M2.9 | #17 MachineDataDb fail-fast | small | no |
| M2.10 | #18 CI grep-guard (data-layer scan test) | small | no |
| M2.11 | #24 debug snapshot unknown-instance → error | small | no |
| M2.12 | #25 recursion-limit → site event log | small | no |
| M2.13 | #27 OPC UA / MxGateway transition field population | small | no |
| M2.14 | #28 readiness "required singletons running" probe | standard | no |
| M2.15 | #29 site active-node purge-gate DI registration | small | no |
| M2.16 | #30 `FailedWriteCount` consumed by Health Monitoring | small | no |
| M2.17 | #31 StateTransitionValidator reconcile | small | no |
| M2.18 | #26 debug-stream ordering + replay/dedup | high-risk | no |
| M2.19 | #15 LDAP periodic re-query (spike + impl) | high-risk | no |
---
## Tasks
### M2.0 — #32: EF model/snapshot drift (PendingModelChangesWarning)
**Classification:** high-risk · **Files:** `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs:68-69`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Migrations/ScadaBridgeDbContextModelSnapshot.cs`, possibly a new empty migration.
**Root cause:** `OccurredAtUtc` has `.HasConversion(UtcConverter)` in config; the model snapshot omits the converter annotation → EF throws `PendingModelChangesWarning` in `MsSqlMigrationFixture.MigrateAsync` (~57 AuditLog MSSQL tests fail in fixture ctor).
**Fix:** Run `dotnet ef migrations has-pending-model-changes` (or `migrations add`) against `ScadaBridgeDbContext` to surface the FULL drift (there may be more than `OccurredAtUtc`). Prefer the EF-canonical path: run `dotnet ef migrations add ResyncAuditLogModelSnapshot`, then **verify the generated migration's `Up`/`Down` are empty (no DDL)**; a value-converter-only change produces no DDL but realigns the snapshot. If non-empty/unexpected DDL appears, stop and report. Auto-apply is dev-only per CLAUDE.md, so an empty migration is harmless to prod.
**Tests:** Re-run `dotnet test tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` (requires MSSQL via `cd infra && docker compose up -d`); the ~57 fixture-ctor failures must clear. If MSSQL is unreachable in this environment, confirm the build + the snapshot diff is empty-DDL and note the test gating.
**DoD:** No `PendingModelChangesWarning`; AuditLog MSSQL suite green (or gated-with-note if no DB). Adversarial: confirm no real schema change was smuggled in.
### M2.1 — #22: native-alarm-source capability validation wired into deploy pipeline
**Classification:** standard · **Files:** `src/.../DeploymentManager/FlatteningPipeline.cs:93,115`, `SemanticValidator.cs:30-33,239-245`, validation service call site.
**Gap (M1-era regression):** `FlatteningPipeline` loads `dataConnections` but never extracts the alarm-capable subset, so `SemanticValidator.Validate(...)` is always called with `alarmCapableConnectionNames = null` → native-alarm-source capability check never runs; a source can reference a non-alarm-capable connection and deploy.
**Fix:** In `FlatteningPipeline`, compute the alarm-capable connection-name set from the loaded connections (filter by the protocol/capability that maps to `IAlarmSubscribableConnection` — OPC UA + MxGateway), pass it into the validator. Confirm the capability predicate (protocol enum / adapter capability) is the same one DCL uses to decide `IAlarmSubscribableConnection`.
**Tests:** `tests/.../TemplateEngine.Tests` SemanticValidator/flattening — add: native-alarm source on a non-alarm-capable connection → validation Error; on a capable one → passes.
**DoD:** Deploy gate rejects native-alarm sources bound to non-capable connections.
### M2.2 — #10: connection-level diff surfaced in deployment diff
**Classification:** standard · **Files:** `src/.../Commons/Types/Flattening/ConfigurationDiff.cs:7-24`, `src/.../TemplateEngine/Flattening/DiffService.cs:19-54,174-204`, Central UI diff render (`CentralUI/Components/Shared/DiffDialog.razor` caller / deployment preview page).
**Gap:** `ComputeConnectionsDiff` exists **with tests** but is dead (no callers); `ConfigurationDiff` has no `ConnectionChanges` slot; `HasChanges` ignores connections.
**Fix:** Add `ConnectionChanges` slot (`IReadOnlyList<DiffEntry<ConnectionConfig>>` — the element type already exists) to `ConfigurationDiff`; include it in `HasChanges`. Call `ComputeConnectionsDiff` from `ComputeDiff` and populate the slot. Surface in the deployment-diff UI alongside attribute/alarm/script changes (connection name + old/new protocol + endpoint config). Wire the existing `ComputeConnectionsDiff` tests' expectations through `ComputeDiff` too.
**Tests:** `tests/.../TemplateEngine.Tests/Flattening/DiffServiceTests.cs` — add a `ComputeDiff` integration assertion that `ConnectionChanges` populates and `HasChanges` is true when only a connection differs.
**DoD:** Standalone connection endpoint/protocol/failover drift appears in the deployment diff.
### M2.3 — #7: `Database.CachedWrite` classifies transient vs permanent SQL errors
**Classification:** high-risk · **Files:** `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:78-204`, new `SqlErrorClassifier.cs` + `PermanentDatabaseException`, reference `ExternalSystemClient.cs:80-162` + `ErrorClassifier.cs`.
**Gap:** `CachedWriteAsync` buffers ALL writes without an immediate attempt; `DeliverBufferedAsync` throws on any `SqlException` → S&F retries permanent errors forever; the script never gets a synchronous `Failed`. The API path (`ExternalSystemClient`) does it right.
**Fix (mirror API path):** Add `SqlErrorClassifier.IsTransient(SqlException)` — transient = connection/timeout/deadlock/throttle error numbers (e.g. `-2, 64, 53, 233, 1205, 40197, 40501, 40613, 49918-49920`); permanent = constraint/syntax/permission/etc. Create `PermanentDatabaseException` (parallel to `PermanentExternalSystemException`). In `CachedWriteAsync`: attempt immediately; on success done; on permanent → return `Failed` synchronously (set the tracking row terminal `Failed`) and do NOT buffer; on transient → buffer to S&F. In `DeliverBufferedAsync`: classify on `SqlException`, return `false` (park) for permanent, rethrow for transient (S&F retries). Keep behavior unified with `TrackedOperationId`/`OperationTrackingStore` and the `Pending → Retrying → Delivered/Parked/Failed/Discarded` lifecycle.
**Tests:** `tests/.../ExternalSystemGateway.Tests/DatabaseGatewayTests.cs` — transient SQL (deadlock 1205, timeout -2) → buffers/retries; permanent SQL (constraint 2627, syntax 102, permission 229) → synchronous `Failed`, not buffered; `DeliverBuffered` parks on permanent. Adversarial: ambiguous error numbers default to the safer classification (document which).
**DoD:** Permanent SQL errors fail fast to the script as `Failed`; only transient errors buffer.
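The classification rule in the Fix above can be sketched as follows. This is a minimal illustration, not the shipped `SqlErrorClassifier`: it keys on the raw SQL Server error number instead of a `SqlException` so that it stays self-contained, and the `CachedWriteOutcome` enum and `CachedWriteDecision.Decide` helper are hypothetical names. Treating unknown error numbers as permanent is an assumption made here; the plan only asks that the default be documented.

```csharp
using System;
using System.Collections.Generic;

public static class SqlErrorClassifierSketch
{
    // Transient error numbers copied from the plan's list; the 49918-49920 range is expanded.
    private static readonly HashSet<int> TransientNumbers = new()
    {
        -2, 53, 64, 233, 1205, 40197, 40501, 40613, 49918, 49919, 49920,
    };

    // Assumption: any number not listed above is treated as permanent.
    public static bool IsTransient(int sqlErrorNumber) => TransientNumbers.Contains(sqlErrorNumber);
}

// Hypothetical outcome of the immediate attempt made by CachedWriteAsync.
public enum CachedWriteOutcome { Delivered, Failed, Buffered }

public static class CachedWriteDecision
{
    // errorNumber == null means the immediate attempt succeeded.
    public static CachedWriteOutcome Decide(int? errorNumber) => errorNumber switch
    {
        null => CachedWriteOutcome.Delivered,
        int n when SqlErrorClassifierSketch.IsTransient(n) => CachedWriteOutcome.Buffered,
        _ => CachedWriteOutcome.Failed,
    };
}
```

With this shape, a deadlock (1205) is buffered for store-and-forward, while a constraint violation (2627) is returned to the script as `Failed` without being buffered.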
### M2.4 — #8: alarm `conditionFilter` applied (OPC UA WhereClause + client-side routing)
**Classification:** high-risk · **Files:** `src/.../DataConnectionLayer/Actors/DataConnectionActor.cs:1482,1540-1554`, `Adapters/RealOpcUaClient.cs:242,295-310`, `Adapters/MxGatewayDataConnection.cs:154-167`, `IAlarmSubscribableConnection.cs`.
**Decision (semantics):** filter = comma-separated, case-insensitive list of alarm/condition **type names**; null/blank = mirror all. **Authoritative gate = client-side in `DataConnectionActor.HandleAlarmTransitionReceived`** (after source-ref match, drop transitions whose type name isn't in the source's filter set). Store the per-source filter set correctly (the current `_alarmSourceFilter[...]` keying is wrong — key by source reference). OPC UA additionally builds a server-side `WhereClause` in `RealOpcUaClient` as an optimization where the condition type maps cleanly; MxGateway relies solely on the client-side gate.
**Fix:** (1) Parse the filter string into a normalized set at subscribe time, keyed by source ref. (2) In routing, consult the set and skip non-matching transitions. (3) In `RealOpcUaClient.BuildAlarmEventFilter`, attach a `WhereClause` (ContentFilter on the condition/event type) built from the filter when present. Confirm `NativeAlarmTransition` exposes a usable type-name discriminator; if not, filter on the available field and note it.
**Tests:** `tests/.../DataConnectionLayer.Tests/DataConnectionActorAlarmTests.cs` — filter set → only matching-type transitions delivered; null → all delivered; MxGateway path filters client-side; OPC UA builds a non-empty WhereClause. Adversarial: case/whitespace variations in the filter list.
**DoD:** Setting a conditionFilter actually restricts mirrored conditions across both adapters.
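The filter semantics decided above (comma-separated, case-insensitive type names; null/blank mirrors everything) could be parsed and applied roughly as below. The class and method names are hypothetical, and two behaviors are assumptions of this sketch, not decisions recorded in the plan: a filter that contains only separators is treated like a blank filter, and a transition with no type name is dropped when a filter is set.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ConditionFilterSketch
{
    // Returns null for "mirror all"; otherwise a case-insensitive set of type names.
    public static HashSet<string>? Parse(string? conditionFilter)
    {
        if (string.IsNullOrWhiteSpace(conditionFilter)) return null;

        var names = conditionFilter.Split(
            ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        // Assumption: a filter made only of commas/whitespace behaves like a blank filter.
        return names.Length == 0
            ? null
            : new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    // Client-side gate: a null filter delivers everything.
    // Assumption: a transition without a type name is dropped when a filter is set.
    public static bool ShouldDeliver(HashSet<string>? filter, string? typeName) =>
        filter is null || (typeName is not null && filter.Contains(typeName.Trim()));
}
```

Parsing once at subscribe time and storing the resulting set per source reference keeps the routing check to a single hash lookup per transition.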
### M2.5 — #9: per-script execution timeout
**Classification:** standard · **Migration: yes.** · **Files:** `Commons/Entities/Templates/TemplateScript.cs`, `ConfigurationDatabase/Configurations/TemplateConfiguration.cs` (`TemplateScriptConfiguration`), **new EF migration**, `Commons/Types/Flattening/FlattenedConfiguration.cs` (`ResolvedScript`), `TemplateEngine/Flattening/FlatteningService.cs` (`ResolveInheritedScripts`), `SiteRuntime/Actors/ScriptActor.cs`, `ScriptExecutionActor.cs:100`, `AlarmExecutionActor.cs:66`, `SiteRuntimeOptions.cs:31` (global fallback unchanged).
**Gap:** Only a global `ScriptExecutionTimeoutSeconds`; no per-script field. Mirror the existing nullable `MinTimeBetweenRuns` pattern end-to-end.
**Fix:** Add `int? ExecutionTimeoutSeconds` to `TemplateScript` + EF config (nullable) + **migration** (runs after M2.0 so the snapshot is clean) + `ResolvedScript` + flattening map + `ScriptActor` field; pass it into `ScriptExecutionActor`/`AlarmExecutionActor`, which compute `effective = perScript ?? options.ScriptExecutionTimeoutSeconds`. Validate non-negative.
**Tests:** flattening test (field threads through), actor test (per-script override vs global default both enforce the CTS timeout), EF round-trip test.
**DoD:** A per-script timeout overrides the global; absent → global default.
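The fallback rule `effective = perScript ?? options.ScriptExecutionTimeoutSeconds` is small enough to show in full. The helper below is a hypothetical sketch; where the actors actually compute the value, and how a timeout of zero should behave, are not specified by the plan.

```csharp
using System;

public static class ScriptTimeoutSketch
{
    // perScriptSeconds mirrors the nullable TemplateScript.ExecutionTimeoutSeconds column;
    // globalSeconds stands in for SiteRuntimeOptions.ScriptExecutionTimeoutSeconds.
    public static TimeSpan Effective(int? perScriptSeconds, int globalSeconds)
    {
        // The comparison is false when perScriptSeconds is null, so only real negatives throw.
        if (perScriptSeconds < 0)
            throw new ArgumentOutOfRangeException(nameof(perScriptSeconds), "Timeout must be non-negative.");

        return TimeSpan.FromSeconds(perScriptSeconds ?? globalSeconds);
    }
}
```

The resulting `TimeSpan` can be handed to the `CancellationTokenSource(TimeSpan)` constructor so that the per-script and global paths share one enforcement mechanism.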
### M2.6 — #13: nested `Object`/`List` extended-type validation
**Classification:** standard · **Files:** `src/.../InboundAPI/.../ParameterValidator.cs:109-145`, `ReturnValueValidator.cs:18`.
**Gap:** `Object`/`List` are shape-validated only (object-vs-array); no nested/field-level type validation.
**Fix:** Recursive descent through the declared `Object` field schema / `List` element type, type-checking each level (scalars by extended-type, nested Object/List recursively). Reuse the existing extended-type system; keep error messages path-qualified (`field.sub[2].x`). Apply symmetrically in both validators.
**Tests:** `tests/.../InboundAPI.Tests` — valid nested payload passes; wrong scalar type at depth, wrong list element type, missing required nested field → rejected with path.
**DoD:** Nested type mismatches are caught at inbound validation, not at script runtime. (Satisfies the M4 cross-reference to this item.)
**Status: complete.** A shared recursive engine, `Commons.Types.InboundApi.InboundApiSchema` (parse + path-qualified `Validate`), backs both validators so they cannot drift. Key finding: the canonical persisted/authored format is **JSON Schema** (object `properties` + `required`, array `items`) — produced by the Central UI schema builder and the `MigrateParametersToJsonSchema` migration — but the validators still parsed the *legacy flat array* `[{name,type}]` and only shape-checked `Object`/`List`. They could not even consume a migrated JSON-Schema-object definition (the `Deserialize<List<…>>` would fail). Rewriting both to read `InboundApiSchema` fixes that latent format mismatch *and* delivers true nested validation; the legacy flat array is still accepted on read (case-insensitive keys) for transition safety. **Undeclared-field policy: reject at every level** (a declared object rejects any field not in its `properties`, consistent with the existing top-level `InboundAPI-010` "unexpected parameter" rejection); a bare `{"type":"object"}` with no declared fields stays shape-only. A present-but-null value satisfies any type; only the *absence* of a required field is an error.
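The recursive descent described above can be illustrated with `System.Text.Json`. This is a simplified sketch and not the actual `InboundApiSchema`: the class name is hypothetical, it handles only the canonical JSON Schema form (`type`, `properties`, `required`, `items`) and not the legacy flat array, it assumes a well-formed schema, and it uses `TryGetInt64` as its definition of a whole number.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class InboundSchemaSketch
{
    public static List<string> Validate(JsonElement schema, JsonElement value)
    {
        var errors = new List<string>();
        Walk(schema, value, "", errors);
        return errors;
    }

    private static string Join(string path, string name) =>
        path.Length == 0 ? name : path + "." + name;

    private static void Walk(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // A present-but-null value satisfies any declared type.
        if (value.ValueKind == JsonValueKind.Null) return;

        var type = schema.TryGetProperty("type", out var typeToken) ? typeToken.GetString() : null;
        var kind = value.ValueKind;
        switch (type)
        {
            case "boolean":
                if (kind != JsonValueKind.True && kind != JsonValueKind.False) errors.Add($"{path}: expected boolean");
                break;
            case "integer":
                if (kind != JsonValueKind.Number || !value.TryGetInt64(out _)) errors.Add($"{path}: expected integer");
                break;
            case "number":
                if (kind != JsonValueKind.Number) errors.Add($"{path}: expected number");
                break;
            case "string":
                if (kind != JsonValueKind.String) errors.Add($"{path}: expected string");
                break;
            case "object":
            {
                if (kind != JsonValueKind.Object) { errors.Add($"{path}: expected object"); break; }
                // A bare {"type":"object"} stays shape-only.
                if (!schema.TryGetProperty("properties", out var props)) break;
                foreach (var field in value.EnumerateObject())
                {
                    var child = Join(path, field.Name);
                    if (props.TryGetProperty(field.Name, out var fieldSchema))
                        Walk(fieldSchema, field.Value, child, errors);
                    else
                        errors.Add($"{child}: undeclared field");
                }
                // Only the absence of a required field is an error.
                if (schema.TryGetProperty("required", out var required))
                    foreach (var name in required.EnumerateArray())
                        if (!value.TryGetProperty(name.GetString() ?? "", out _))
                            errors.Add($"{Join(path, name.GetString() ?? "")}: required field missing");
                break;
            }
            case "array":
            {
                if (kind != JsonValueKind.Array) { errors.Add($"{path}: expected array"); break; }
                // A bare {"type":"array"} stays shape-only.
                if (!schema.TryGetProperty("items", out var items)) break;
                var index = 0;
                foreach (var element in value.EnumerateArray())
                    Walk(items, element, $"{path}[{index++}]", errors);
                break;
            }
        }
    }
}
```

Because the sketch collects errors instead of throwing on the first one, a single request can report every mismatch with its path.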
### M2.7 — #20 + #21: return-type + argument-type compatibility checks
**Classification:** standard · **Files:** `src/.../TemplateEngine/Validation/SemanticValidator.cs:62-63,251-266,279-287,390-425`.
**Gap:** `BuildReturnMap` builds maps never read (no return-type comparison); call validation checks arg *count* only (comma counting), not arg *types*.
**Fix:** #20 — compare a call site's used-return against the target script's declared `ReturnDefinition`; flag incompatible use. #21 — extract/infer argument types at the call site and check each against the parameter definition (count + type). These share `SemanticValidator` — implement together. Be conservative: only flag clear mismatches (avoid false positives on dynamically-typed expressions); document the inference limits.
**Tests:** `tests/.../TemplateEngine.Tests` SemanticValidator — return-type mismatch flagged; arg type mismatch flagged; correct calls pass; dynamic/unknown types don't false-positive.
**DoD:** Type-incompatible script calls fail validation, not just count-mismatched ones.
### M2.8 — #23: connection-binding completeness as deploy-gating Error + "name exists at site"
**Classification:** standard · **Files:** `src/.../TemplateEngine/Validation/ValidationService.cs:504-519`, `ValidationResult.cs:9`.
**Gap:** Missing-binding for a data-sourced attribute is a non-blocking Warning (so `IsValid` stays true); the "connection name exists at the target site" half is missing.
**Fix:** Elevate binding-completeness to Error (or add a parallel Error-level check) so a deployment with unresolved bindings fails the gate; add the "binding references a connection that exists on the target site" check (resolve by site connection, not just name presence). Confirm this doesn't break legitimately-unbound attributes (static/non-data-sourced) — only data-sourced attributes require a binding.
**Tests:** `tests/.../TemplateEngine.Tests` ValidationService — data-sourced attribute with no binding → Error + `IsValid` false; binding to a non-existent site connection → Error; static attribute without binding → OK.
**DoD:** Incomplete/invalid connection bindings block deploy.
### M2.9 — #17: MachineDataDb fail-fast (per design doc; reverts Host-008)
**Classification:** small · **Files:** `src/ZB.MOM.WW.ScadaBridge.Host/DatabaseOptions.cs:6-12`, `StartupValidator.cs:59-62`, `tests/.../Host.Tests/StartupValidatorTests.cs` (the `Central_MissingMachineDataDb_PassesValidation` regression).
**Fix:** Add `string? MachineDataDb` to `DatabaseOptions`; add a Central-only `Require("ScadaBridge:Database:MachineDataDb", non-empty, ...)` in `StartupValidator`. **No DbContext** (nothing consumes it). Revert the Host-008 regression test to expect failure when missing; add `MachineDataDb` to `ValidCentralConfig()`. Commit message must note the deliberate Host-008 reversal and cite REQ-HOST-3/4 + shipped docker appsettings as justification.
**Tests:** `StartupValidatorTests` — Central missing MachineDataDb → fails; present → passes; Site role unaffected.
**DoD:** Central nodes fail fast on empty MachineDataDb; spec REQ-HOST-4 satisfied.
### M2.10 — #18: CI grep-guard against UPDATE/DELETE on AuditLog
**Classification:** small · **Files:** new guard test in `tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/` (the only thing that actually runs — no CI service exists; build is Docker-only).
**Fix:** Add a test that scans the ConfigurationDatabase source tree (and migration SQL) for `UPDATE`/`DELETE` statements targeting `AuditLog`, failing if any are found in C# data-access code. Scope strictly to the `AuditLog` table (allow purge/delete on Notifications/SiteCalls and partition-switch DDL). This backstops the existing DB-role `DENY UPDATE/DELETE` (migration `20260602174346`). Optionally add an MSBuild target mirroring it, but the test is the enforced control.
**Tests:** the guard test itself; verify it passes on current clean source and would fail on a planted violation (assert via a unit test on the scanner helper).
**DoD:** A code-level guard fails the test run on AuditLog mutations.
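One possible shape for the scanner helper behind the guard test is sketched below. The class name and regular expression are hypothetical. The pattern only catches raw SQL text (`UPDATE AuditLog`, `DELETE FROM [dbo].[AuditLog]`); it does not see mutations expressed through EF APIs, and it deliberately ignores other tables and `DENY UPDATE, DELETE ON ...` statements.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuditLogMutationScanner
{
    // UPDATE or DELETE [FROM], an optional schema qualifier, then exactly the AuditLog table.
    // The trailing (?!\w) keeps names such as AuditLogArchive from matching.
    private static readonly Regex Mutation = new(
        @"\b(?:UPDATE|DELETE(?:\s+FROM)?)\s+(?:\[?\w+\]?\.)?\[?AuditLog\]?(?!\w)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns one "line N: <matched text>" entry per violation found in the given source text.
    public static List<string> FindViolations(string sourceText)
    {
        var hits = new List<string>();
        foreach (Match match in Mutation.Matches(sourceText))
        {
            var line = 1;
            for (var i = 0; i < match.Index; i++)
                if (sourceText[i] == '\n') line++;
            hits.Add($"line {line}: {match.Value}");
        }
        return hits;
    }
}
```

The guard test would read each file under the source tree, call `FindViolations`, and fail with the collected entries; the planted-violation check can call the helper directly on an in-memory string.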
### M2.11 — #24: debug snapshot/subscribe for unknown instance returns an error
**Classification:** small · **Files:** `src/.../DeploymentManager/.../DeploymentManagerActor.cs:845-866`.
**Gap:** Unknown-instance snapshot/subscribe returns an empty snapshot — caller can't distinguish "not deployed" from "deployed-but-empty".
**Fix:** Check instance registration first; return an explicit "instance not found"/not-deployed error response (matching the existing debug response contract) instead of an empty snapshot.
**Tests:** `tests/.../DeploymentManager` (or SiteRuntime) — unknown instance → error response; known empty instance → empty snapshot (unchanged).
**DoD:** Unknown-instance debug requests are distinguishable from empty ones.
### M2.12 — #25: recursion-limit error → site event log
**Classification:** small · **Files:** `src/.../SiteRuntime/.../ScriptRuntimeContext.cs:302-305,464-466` (thread `ISiteEventLogger` in, mirroring M1.8's `ScriptExecutionActor` wiring).
**Fix:** Inject `ISiteEventLogger` into `ScriptRuntimeContext`; on recursion-limit violation, emit a `script` Error site event (fire-and-forget `_ = logger?.LogEventAsync(...)`) in addition to the existing `ILogger` log, at both check sites.
**Tests:** `tests/.../SiteRuntime.Tests` — recursion-limit hit emits a site event with category `script`, severity Error.
**DoD:** Recursion-limit violations appear in the site event log per spec.
### M2.13 — #27: populate obtainable OPC UA / MxGateway transition fields
**Classification:** small · **Files:** `src/.../DataConnectionLayer/Adapters/RealOpcUaClient.cs:395-403`, `MxGatewayAlarmMapper.cs:79-113`.
**Fix:** Populate fields that are genuinely obtainable: for OPC UA A&C, add SelectClauses + map Category, Description, OriginalRaiseTime where the server exposes them (extend `BuildAlarmEventFilter`'s SelectClauses); for MxGateway, extract `OperatorUser` (present in the event but dropped) and any available Current/Limit values. Leave truly-unavailable fields empty and document which are unavailable-by-protocol vs left-empty.
**Tests:** `tests/.../DataConnectionLayer.Tests` mapper tests — obtainable fields populate from a representative event; unavailable fields documented.
**DoD:** Display fields populate where the source provides them.
### M2.14 — #28: readiness gate checks required cluster singletons
**Classification:** standard · **Files:** `src/.../Host/Program.cs:188-201,314-317`, new health check (peer to `AkkaClusterHealthCheck.cs`).
**Gap:** Readiness covers membership + DB connectivity only; spec wants "required singletons running".
**Fix:** Add a `Ready`-tagged health check that, on the active central node, verifies each required singleton proxy is reachable (e.g. `NotificationOutboxActor`, `AuditLogIngestActor`, `SiteCallAuditActor`, `AuditLogPurgeActor`, `SiteAuditReconciliationActor`) via a short `Ask`/Identify with timeout; degrade to Unhealthy if a required singleton is unreachable. Respect the "(if applicable)" softening — only gate on singletons that should be running for this node's role. Keep the probe cheap (cache/identify, short timeout) so readiness polling stays fast.
**Tests:** `tests/.../Host.Tests` or IntegrationTests — health check reports Unhealthy when a required singleton proxy is absent; Healthy when present. Avoid flakiness (use Identify with a bounded timeout).
**DoD:** `/health/ready` reflects singleton health.
### M2.15 — #29: register the site active-node purge gate
**Classification:** small · **Files:** `src/.../SiteEventLogging/ServiceCollectionExtensions.cs:33-37`, site service registration / cluster setup.
**Gap:** `SiteEventLogActiveNodeCheck` is consulted by `EventLogPurgeService` but no implementation is registered on the site node → purge runs on standby too (defaults to `() => true`).
**Fix:** Register a `SiteEventLogActiveNodeCheck` delegate on the site node that returns true only when this node is the cluster leader/active (mirror how central gates active-node work). Keep the null-default behavior for non-clustered test hosts.
**Tests:** `tests/.../SiteEventLogging.Tests` — purge gated off on standby, on for active; default-true preserved when unregistered.
**DoD:** Site event-log purge runs only on the active node.
### M2.16 — #30: Health Monitoring consumes `FailedWriteCount`
**Classification:** small · **Files:** `src/.../SiteEventLogging/ISiteEventLogger.cs:32-40`, Health Monitoring metric path.
**Fix:** Wire `FailedWriteCount` into the site health metrics the same way other site metrics are collected/reported (find the existing site metric collection path), so the dangling metric is consumed (surface as a health metric / threshold). Keep it raw-count per the health-reporting conventions.
**Tests:** `tests/.../HealthMonitoring`/SiteEventLogging — failed writes increment the reported metric.
**DoD:** `FailedWriteCount` reaches Health Monitoring.
### M2.17 — #31: reconcile StateTransitionValidator delete-from-NotDeployed
**Classification:** small · **Files:** `src/.../DeploymentManager/.../StateTransitionValidator.cs:38-41`, possibly `docs/requirements/Component-DeploymentManager.md` (spec matrix).
**Fix:** `git blame`/log the `CanDelete` line to recover intent. Default: **align code to the spec matrix** — remove `NotDeployed` from the allowed delete states, add a clarifying comment — UNLESS history shows deliberate orphan-cleanup intent, in which case update the spec matrix (Delete from NotDeployed = Yes, with a no-op-cleanup note) instead. Whichever direction, code and doc must agree at the end.
**Tests:** `tests/.../DeploymentManager` StateTransitionValidator — the chosen rule is asserted.
**DoD:** Code and spec matrix agree on delete-from-NotDeployed.
### M2.18 — #26: debug-stream stream-first ordering + replay/dedup
**Classification:** high-risk · **Files:** `src/.../DebugStreamBridgeActor.cs:89-103,163-166`.
**Gap:** `PreStart` sends the snapshot first, then opens the gRPC stream → events in the gap window are lost. Spec wants stream-first + replay with timestamp dedup.
**Fix:** Open the gRPC subscription FIRST (buffer incoming events), then fetch+send the snapshot, then flush buffered events, deduping by timestamp/identity against the snapshot so no gap-window event is lost or double-delivered. Preserve ordering. This is a re-arch of the actor's PreStart lifecycle — keep the existing message contract.
**Tests:** `tests/.../` DebugStreamBridgeActor — an event arriving during the snapshot window is delivered exactly once after the snapshot; ordering preserved; dedup drops the snapshot-overlapping event.
**DoD:** No gap-window events lost; no duplicates.
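The stream-first ordering can be reduced to a small buffer that the actor consults from its message handlers. The sketch below is illustrative only: `DebugEvent`, `StreamFirstReplayBuffer`, and the per-key timestamp map are hypothetical, and the dedup rule (drop a buffered event whose timestamp is not newer than the snapshot's value for the same key) is an assumption about what "timestamp dedup" means here. No locking is included because an actor processes one message at a time.

```csharp
using System;
using System.Collections.Generic;

public sealed record DebugEvent(string Key, DateTimeOffset Timestamp, string Payload);

public sealed class StreamFirstReplayBuffer
{
    private readonly List<DebugEvent> _buffered = new();
    private bool _snapshotSent;

    // Called for every event from the already-open stream.
    // Returns the events to forward immediately (none while the snapshot is still pending).
    public IReadOnlyList<DebugEvent> OnStreamEvent(DebugEvent e)
    {
        if (_snapshotSent) return new[] { e };
        _buffered.Add(e);
        return Array.Empty<DebugEvent>();
    }

    // Called once, right after the snapshot has been sent.
    // snapshotTimestamps holds the latest timestamp per key that the snapshot already reflects.
    public IReadOnlyList<DebugEvent> OnSnapshotSent(IReadOnlyDictionary<string, DateTimeOffset> snapshotTimestamps)
    {
        _snapshotSent = true;
        var replay = new List<DebugEvent>();
        foreach (var e in _buffered) // arrival order is preserved
        {
            var covered = snapshotTimestamps.TryGetValue(e.Key, out var seen) && e.Timestamp <= seen;
            if (!covered) replay.Add(e);
        }
        _buffered.Clear();
        return replay;
    }
}
```

An event that arrives during the snapshot window is therefore either already reflected in the snapshot and dropped, or replayed exactly once after it.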
### M2.19 — #15: LDAP periodic re-query for interactive sessions (SECURITY)
**Classification:** high-risk · **Files:** `src/.../Security/ServiceCollectionExtensions.cs:86-148` (cookie events), `JwtTokenService.cs` (wire the unused `IsIdleTimedOut`/`ShouldRefresh`/`RecordActivity`/`RefreshToken`), `RoleMapper.cs`, LDAP service interface, `CentralUI/Auth/AuthEndpoints.cs` (claims-build parity).
**Spike first:** Determine whether the shared `ZB.MOM.WW.Auth.Ldap` lib exposes a **passwordless service-account group search** for an already-authenticated username. Report the answer before building the LDAP leg.
**Fix (layered):**
1. **Always achievable** — add `CookieAuthenticationEvents.OnValidatePrincipal` that: enforces idle-timeout (reject/sign-out past 30-min idle, advance last-activity on use), and refreshes role claims by **re-running `RoleMapper` on the stored group claims** (picks up central role-mapping changes without LDAP). Stamp a `LastLdapCheck` claim.
2. **If the lib supports passwordless group search** — when `LastLdapCheck` is >15 min old, re-query LDAP groups via the service-account search, re-map roles, update role/site claims. **On LDAP failure: keep existing roles, do NOT sign out** (per "LDAP failure: new logins fail; active sessions continue with current roles"). If the lib does NOT support it, ship layer 1 and document the residual limitation (group-membership changes picked up only at next login) in the security doc.
Rebuild claims identically to `/auth/login` (same claim types). Use the cookie-only model (embedded-JWT is dispositioned doc-only in M4).
**Tests (incl. adversarial):** idle-timeout enforced; role-mapping change reflected without LDAP; LDAP-down on re-query keeps existing roles (no sign-out); >15-min triggers re-query, <15-min skips (TTL respected); a revoked-group user loses roles after re-query (if LDAP leg shipped).
**DoD:** Interactive sessions enforce idle-timeout and refresh roles per the documented policy; any residual LDAP-dependency limitation is documented.
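
The layered policy above reduces to a small decision function plus a fail-soft refresh. The sketch below is an assumption-laden illustration, not the shipped handler: `SessionPolicy`, `SessionAction`, and `RefreshOrKeep` are hypothetical names, and the 30/15-minute values are the defaults quoted in this plan. In a real `OnValidatePrincipal` handler the three outcomes would presumably map onto `CookieValidatePrincipalContext.RejectPrincipal()`, `ReplacePrincipal(...)` with `ShouldRenew = true`, and doing nothing.

```csharp
using System;

public enum SessionAction { Continue, Reject, RefreshRoles }

public static class SessionPolicy
{
    // Defaults taken from the plan text; both are assumed to be configurable in practice.
    public static readonly TimeSpan IdleTimeout = TimeSpan.FromMinutes(30);
    public static readonly TimeSpan RefreshThreshold = TimeSpan.FromMinutes(15);

    // Idle timeout is checked first: an idle session is rejected even if a refresh is due.
    public static SessionAction Decide(
        DateTimeOffset lastActivity, DateTimeOffset lastRefresh, DateTimeOffset now)
    {
        if (now - lastActivity > IdleTimeout) return SessionAction.Reject;
        if (now - lastRefresh > RefreshThreshold) return SessionAction.RefreshRoles;
        return SessionAction.Continue;
    }

    // Fail-soft refresh: any error keeps the current roles instead of signing the user out.
    public static string[] RefreshOrKeep(string[] currentRoles, Func<string[]> refresh)
    {
        try { return refresh(); }
        catch (Exception) { return currentRoles; }
    }
}
```

Keeping the decision pure (timestamps in, action out) makes the TTL edge cases in the test list above straightforward to cover without a cookie pipeline.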
---
## Cross-cutting
- `dotnet build ZB.MOM.WW.ScadaBridge.slnx` green (TreatWarningsAsErrors); relevant unit/integration tests pass per task.
- MSSQL-backed tests need `cd infra && docker compose up -d`; if unavailable, gate-with-note (M2.0 especially).
- Migration tasks (M2.0, M2.5) serialized; M2.0 first.
- `git diff` review before each commit; design-summary commit messages; one logical slice per commit.
- After all tasks: final integration code review, build, and `bash docker/deploy.sh` smoke (`curl localhost:9000/health/ready`).
@@ -0,0 +1,35 @@
{
"planPath": "docs/plans/2026-06-15-stillpending-m2-implementation.md",
"tasks": [
{"id": 32, "ref": "M2.0", "subject": "M2.0 #32: EF model/snapshot drift (PendingModelChangesWarning)", "class": "high-risk", "status": "completed", "commits": ["2fb608f"]},
{"id": 33, "ref": "M2.1", "subject": "M2.1 #22: native-alarm capability validation wired into deploy pipeline", "class": "standard", "status": "completed", "commits": ["d690920", "41d828e"]},
{"id": 34, "ref": "M2.2", "subject": "M2.2 #10: connection-level diff surfaced in deployment diff", "class": "standard", "status": "completed", "commits": ["e9a84ba", "198770f"]},
{"id": 35, "ref": "M2.3", "subject": "M2.3 #7: Database.CachedWrite transient/permanent SQL classification", "class": "high-risk", "status": "completed", "commits": ["d052706", "de375ff"]},
{"id": 36, "ref": "M2.4", "subject": "M2.4 #8: alarm conditionFilter applied (OPC UA WhereClause + client routing)", "class": "high-risk", "status": "completed", "commits": ["8825df5", "00304a2"]},
{"id": 37, "ref": "M2.5", "subject": "M2.5 #9: per-script execution timeout (entity+migration+flatten+actor)", "class": "standard", "status": "completed", "blockedBy": [32], "commits": ["3edef09", "3032faa"]},
{"id": 38, "ref": "M2.6", "subject": "M2.6 #13: nested Object/List extended-type validation", "class": "standard", "status": "completed", "commits": ["4b6187c", "411d0c0"]},
{"id": 39, "ref": "M2.7", "subject": "M2.7 #20+#21: return-type + argument-type compatibility checks", "class": "standard", "status": "completed", "commits": ["958229e", "a8e9e99"]},
{"id": 40, "ref": "M2.8", "subject": "M2.8 #23: binding-completeness Error + name-exists-at-site", "class": "standard", "status": "completed", "commits": ["7c14a69", "21b801b"]},
{"id": 41, "ref": "M2.9", "subject": "M2.9 #17: MachineDataDb fail-fast (reverts Host-008)", "class": "small", "status": "completed", "commits": ["76198b3"]},
{"id": 42, "ref": "M2.10", "subject": "M2.10 #18: CI grep-guard against UPDATE/DELETE on AuditLog", "class": "small", "status": "completed", "commits": ["e7b6fe3", "9cd62aa"]},
{"id": 43, "ref": "M2.11", "subject": "M2.11 #24: debug snapshot unknown-instance returns error", "class": "small", "status": "completed", "commits": ["dbf44b9", "d160c7f"]},
{"id": 44, "ref": "M2.12", "subject": "M2.12 #25: recursion-limit error to site event log", "class": "small", "status": "completed", "commits": ["f08038d", "e2b31a9"]},
{"id": 45, "ref": "M2.13", "subject": "M2.13 #27: populate obtainable OPC UA/MxGateway transition fields", "class": "small", "status": "completed", "commits": ["722b866", "3945789"]},
{"id": 46, "ref": "M2.14", "subject": "M2.14 #28: readiness gate checks required cluster singletons", "class": "standard", "status": "completed", "commits": ["253bec5", "6b1cb9e"]},
{"id": 47, "ref": "M2.15", "subject": "M2.15 #29: register site active-node purge gate (DI)", "class": "small", "status": "completed", "commits": ["e1ee37e"]},
{"id": 48, "ref": "M2.16", "subject": "M2.16 #30: Health Monitoring consumes FailedWriteCount", "class": "small", "status": "completed", "commits": ["d81f747", "c9244d8"]},
{"id": 49, "ref": "M2.17", "subject": "M2.17 #31: reconcile StateTransitionValidator delete-from-NotDeployed", "class": "small", "status": "completed", "commits": ["c104356"]},
{"id": 50, "ref": "M2.18", "subject": "M2.18 #26: debug-stream stream-first ordering + replay/dedup", "class": "high-risk", "status": "completed", "commits": ["d8519cb", "a0d9379"]},
{"id": 51, "ref": "M2.19", "subject": "M2.19 #15: LDAP periodic re-query for interactive sessions (spike+impl)", "class": "high-risk", "status": "completed", "note": "Spike outcome: shared ILdapAuthService exposes only AuthenticateAsync (no passwordless group-search) -> live LDAP group re-query out of scope (external pkg, tracked follow-up). Implemented always-achievable layers: stored zb:group + zb:lastrolerefresh claims at login, shared SessionClaimBuilder (DRY login+refresh), CookieSessionValidator + OnValidatePrincipal (idle-timeout reject@30m, DB-only role-mapping refresh@15m, fail-soft keep-session on refresh error). Residual limitation documented in Component-Security.md.", "commits": ["8fe7f46", "fddc695"]}
],
"deferred": [
{"ref": "#16", "subject": "Transport stale-instance enumeration", "to": "M8 (Transport)"},
{"ref": "#19", "subject": "script started/completed events", "status": "done in M1.8"}
],
"followups": [
{"id": 52, "subject": "Investigate 2 partition-purge E2E test failures (AuditLogPurgeActor/PartitionPurge)", "from": "M2.0", "status": "pending"},
{"id": 53, "subject": "Dedup alarm-capable protocol predicate (3 copies → AlarmCapableProtocols)", "from": "M2.1", "status": "pending"},
{"id": 54, "subject": "Expose ExecutionTimeoutSeconds (+ MinTimeBetweenRuns) in CLI + UI script authoring", "from": "M2.5", "status": "pending"}
],
"lastUpdated": "2026-06-15"
}
@@ -84,7 +84,14 @@ All mutating operations on a single instance (deploy, disable, enable, delete) s
 |---------------|--------|---------|--------|--------|
 | Enabled | Yes | Yes | No (already enabled) | Yes |
 | Disabled | Yes (enables on apply) | No (already disabled) | Yes | Yes |
-| Not deployed | Yes (initial deploy) | No | No | No |
+| Not deployed | Yes (initial deploy) | No | No | Yes (removes the orphan record) |
+> **Delete from Not deployed:** permitted so an instance that was previously
+> undeployed (state `NotDeployed`) can have its record fully removed —
+> deployment history, snapshot, attribute/alarm overrides, and connection
+> bindings — rather than lingering as an unremovable orphan. There is no live
+> site configuration to tear down in this state, so the delete is a
+> central-side record cleanup (no site round-trip required).
 ## System-Wide Artifact Deployment Failure Handling
@@ -95,6 +95,8 @@ On central nodes, the ASP.NET Core web endpoints (Central UI, Inbound API) must
- Database connectivity (MS SQL) is verified.
- Required cluster singletons are running (if applicable).
These are implemented as three `Ready`-tagged health checks registered in the Central-role branch of `Program.cs` (so they are naturally role-scoped — site nodes do not run them): `database` (`DatabaseHealthCheck<ScadaBridgeDbContext>`), `akka-cluster` (`AkkaClusterHealthCheck`), and `required-singletons` (`RequiredSingletonsHealthCheck`). The last verifies each *required-always* central singleton is reachable by Asking its local `ClusterSingletonProxy` an `Identify` with a short bounded timeout (~2s, probes run concurrently) and treating a non-null `ActorIdentity.Subject` as reachable; any unreachable required singleton degrades the check to **Unhealthy**, naming it. The required-always set is the five unconditional central singletons: notification-outbox, audit-log-ingest, site-call-audit, audit-log-purge, and site-audit-reconciliation. Feature-gated singletons are the "if applicable" case and are not probed when their feature is off. The check is leadership-agnostic — the proxy reaches the singleton from either central node, so a ready standby still reports ready (readiness must NOT require cluster leadership; that is the `Active` tier's job). During a brief singleton handover the probe may momentarily time out and the node may flap to not-ready, which is correct: a node mid-handover is legitimately not fully ready (no retries are used, to keep readiness polling fast).
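
The probing pattern described above (concurrent probes, one bounded timeout each, no retries) can be sketched without Akka. This is a hedged illustration only: `SingletonReadiness` and the `Func<Task<bool>>` probe shape are hypothetical, standing in for the `Identify` Ask against each `ClusterSingletonProxy`, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SingletonReadiness
{
    // Runs all probes concurrently. A probe that returns false, throws, or does not
    // finish within the timeout marks its singleton as unreachable.
    public static async Task<IReadOnlyList<string>> FindUnreachableAsync(
        IReadOnlyDictionary<string, Func<Task<bool>>> probes, TimeSpan timeout)
    {
        var checks = probes.Select(async p =>
        {
            try
            {
                var probe = p.Value();
                var winner = await Task.WhenAny(probe, Task.Delay(timeout));
                var ok = winner == probe && await probe;
                return (Name: p.Key, Ok: ok);
            }
            catch (Exception)
            {
                return (Name: p.Key, Ok: false);
            }
        });

        var results = await Task.WhenAll(checks);
        return results.Where(r => !r.Ok).Select(r => r.Name).ToList();
    }
}
```

An empty result corresponds to a healthy check; a non-empty list gives the names to report in the Unhealthy description.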
A standard ASP.NET Core health check endpoint (`/health/ready`) reports readiness status. The load balancer uses this endpoint to determine when to route traffic to the node. During startup or failover, the node returns `503 Service Unavailable` until ready.
### REQ-HOST-5: Windows Service Hosting
@@ -40,9 +40,10 @@ Each API method definition includes:
 - **Approved API Keys**: List of API keys authorized to invoke this method. Requests from non-approved keys are rejected.
 - **Parameter Definitions**: Ordered list of input parameters, each with:
   - Parameter name.
-  - Data type (Boolean, Integer, Float, String — same fixed set as template attributes).
+  - Data type — the **extended type system** (Boolean, Integer, Float, String, plus the nestable Object and List; see [Extended Type System](#extended-type-system)).
+  - Whether the parameter is required.
 - **Return Value Definition**: Structure of the response, with:
-  - Field names and data types. Supports returning **lists of objects**.
+  - Field names and (extended-system) data types. Supports returning **lists of objects** and arbitrarily nested structures.
 - **Implementation Script**: C# script that executes when the method is called. Stored **inline** in the method definition. Follows standard C# authoring patterns but has no template inheritance — it is a standalone script tied to this method.
 - **Timeout**: Configurable per method. Defines the maximum time the method is allowed to execute (including any routed calls to sites) before returning a timeout error to the caller.
@@ -99,6 +100,17 @@ Each API method definition includes:
- This allows complex request/response structures (e.g., an object containing properties and a list of nested objects).
- Template attributes retain the simpler four-type system. The extended types apply only to Inbound API method definitions and External System Gateway method definitions.
#### Type Definition Format & Nested Validation
- Parameter and return type definitions are persisted as **JSON Schema** (the canonical format produced by the Central UI schema builder; see the `MigrateParametersToJsonSchema` migration). An object declares its fields via `properties` (+ a `required` array); a list declares its element type via `items`. The legacy flat-array form (`[{name,type,required,itemType?}]`) is still accepted on read for transition safety.
- Validation is **recursive and type-aware** for the extended types (request parameters and script return values alike, via a single shared engine so the two cannot drift):
- **Object**: each declared field's value is validated against its declared (possibly nested) type; a missing required field and a present-but-wrong type are both reported.
- **List**: every element is validated against the declared element type (recursing into nested objects/lists). A list whose element type is left undeclared (`array` without `items`) is shape-checked only.
- **Scalars at any depth** are checked against the extended type.
- Errors are **path-qualified** (e.g. `order.items[2].quantity`) so the caller can locate the offending field.
- **Undeclared fields are rejected** at every level (consistent with the top-level "unexpected parameter" rejection): an object that declares its fields rejects any field not in its `properties`, so a typo'd field name surfaces as a `400`/error rather than being silently ignored. A bare object schema with no declared fields (`{"type":"object"}`) stays shape-only and accepts any fields.
- A JSON `null` value satisfies any declared type (a present-but-null field is allowed); only the **absence** of a required field is an error.
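
The rules above can be illustrated with a compact recursive walker over `System.Text.Json` elements. This is a sketch under stated assumptions, not the shared `InboundApiSchema` engine: `NestedSchemaCheck`, its error wording, and the handled type tokens are hypothetical, and only the canonical `properties`/`required`/`items` form is covered (not the legacy flat-array form).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class NestedSchemaCheck
{
    // Appends path-qualified errors for `value` checked against `schema`.
    public static void Validate(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        if (value.ValueKind == JsonValueKind.Null) return; // null satisfies any declared type
        var type = schema.TryGetProperty("type", out var t) ? t.GetString() : null;
        switch (type)
        {
            case "object":
                if (value.ValueKind != JsonValueKind.Object) { errors.Add($"{path}: expected object"); return; }
                if (!schema.TryGetProperty("properties", out var props)) return; // bare object: shape-only
                if (schema.TryGetProperty("required", out var req))
                {
                    foreach (var r in req.EnumerateArray())
                    {
                        var name = r.GetString()!;
                        if (!value.TryGetProperty(name, out _))
                            errors.Add($"{Join(path, name)}: required field missing");
                    }
                }
                foreach (var field in value.EnumerateObject())
                {
                    if (props.TryGetProperty(field.Name, out var fieldSchema))
                        Validate(fieldSchema, field.Value, Join(path, field.Name), errors);
                    else
                        errors.Add($"{Join(path, field.Name)}: undeclared field");
                }
                break;
            case "array":
                if (value.ValueKind != JsonValueKind.Array) { errors.Add($"{path}: expected array"); return; }
                if (!schema.TryGetProperty("items", out var items)) return; // no items: shape-only
                var index = 0;
                foreach (var element in value.EnumerateArray())
                    Validate(items, element, $"{path}[{index++}]", errors);
                break;
            case "string":
                if (value.ValueKind != JsonValueKind.String) errors.Add($"{path}: expected string");
                break;
            case "boolean":
                if (value.ValueKind is not (JsonValueKind.True or JsonValueKind.False))
                    errors.Add($"{path}: expected boolean");
                break;
            case "integer":
                if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
                    errors.Add($"{path}: expected integer");
                break;
            case "number":
                if (value.ValueKind != JsonValueKind.Number) errors.Add($"{path}: expected number");
                break;
        }
    }

    private static string Join(string path, string name) => path.Length == 0 ? name : $"{path}.{name}";
}
```

Passing the path down the recursion is what yields locations such as `order.items[2].quantity`; running request parameters and return values through one walker of this kind is one way to keep the two validators from drifting.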
## Script Compilation & Hot-Reload
API method scripts are compiled at central startup — all method definitions are loaded from the configuration database and compiled into in-memory delegates.
@@ -32,9 +32,31 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
 - **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs.
 ### Token Lifecycle
-- **JWT expiry**: 15 minutes. On each request, if the cookie-embedded JWT is near expiry, the app re-queries LDAP for current group memberships and issues a fresh JWT, writing an updated cookie. Roles are never more than 15 minutes stale.
-- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the token is not refreshed and the user must re-login. Tracked via a last-activity timestamp in the token.
-- **Sliding refresh**: Active users stay logged in indefinitely — the token refreshes every 15 minutes as long as requests are made within the 30-minute idle window.
+> **Implementation note (M2.19, #15).** The interactive Central UI login path signs in
+> with **bare cookie claims**, not a cookie-embedded JWT. The session lifecycle below is
+> therefore enforced by the cookie middleware (`ExpireTimeSpan` + `SlidingExpiration`) plus
+> a `CookieAuthenticationEvents.OnValidatePrincipal` handler — see **Session Validation
+> (`OnValidatePrincipal`)** below. The embedded-JWT model remains the documented design
+> intent and is the mechanism for any non-cookie bearer surface (e.g. `/auth/token`), but
+> it is **not** the transport for the cookie principal.
+- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the session is rejected and the user must re-login. Tracked via a `LastActivity` last-activity timestamp claim. The cookie's `ExpireTimeSpan` is set to the idle timeout and `SlidingExpiration` renews it on activity, so the cookie window and the explicit `OnValidatePrincipal` idle check use the **same** value and cannot contradict each other.
+- **Role-mapping refresh (LDAP-free)**: Configurable, default **15 minutes** (`SecurityOptions.RoleRefreshThresholdMinutes`). At login the session stores the user's raw LDAP groups (one `zb:group` claim each) plus a `zb:lastrolerefresh` anchor. Once the anchor is older than the threshold, `OnValidatePrincipal` re-runs the **DB-backed** `RoleMapper` on the stored groups — **with no LDAP call** — rebuilds the role/scope claims via the shared claim-builder, advances the anchor, and re-issues the cookie. Central role-mapping (DB) changes — including a **revoked** mapping that drops the user's roles, and changed site-scope rules — take effect within this window. Roles derived from central mappings are never more than ~15 minutes stale.
+#### Session Validation (`OnValidatePrincipal`)
+- The cookie principal is built at login by a **single shared claim-builder** (`SessionClaimBuilder`). The `OnValidatePrincipal` role-refresh path rebuilds the principal through the **same** builder, so the login and refresh claim shapes cannot drift.
+- **Failure policy**: the refresh is best-effort. Any error during the refresh (e.g. the configuration database is unreachable) **keeps the existing principal with its current roles** — it never signs the user out and never throws out of the request pipeline. This mirrors the **Active sessions** stance under *LDAP Connection Failure* below. Only the explicit idle-timeout path rejects the principal.
+> **Residual limitation — live LDAP group-membership changes (follow-up).** The
+> mid-session refresh re-maps the **stored** groups against the central database; it does
+> **not** re-query LDAP, so a change to the user's actual **group membership** in the
+> directory is picked up only at **next login**. A live group re-query for an active
+> session would require a new passwordless service-account group-search method on the
+> shared `ZB.MOM.WW.Auth.Ldap` library, which is an **external NuGet package** and exposes
+> only `AuthenticateAsync(username, password, ct)` (no standalone group search). Adding
+> that method is tracked as a follow-up. Until then: central role-mapping/scope changes are
+> reflected within ~15 minutes; directory group-membership changes require re-login.
 ### Load Balancer Compatibility
 - The authentication cookie carries a self-contained JWT — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store.
@@ -43,8 +65,8 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
 ## LDAP Connection Failure
 - **New logins**: If the LDAP/AD server is unreachable, login attempts **fail**. Users cannot be authenticated without LDAP.
-- **Active sessions**: Users with valid (not-yet-expired) JWTs can **continue operating** with their current roles. The token refresh is skipped until LDAP is available again. This avoids disrupting engineers mid-work during a brief LDAP outage.
-- **Recovery**: When LDAP becomes reachable again, the next token refresh cycle re-queries group memberships and issues a fresh token with current roles.
+- **Active sessions**: Users with a valid (not-idle-timed-out) session can **continue operating** with their current roles during an LDAP outage. Interactive cookie sessions never re-query LDAP mid-session (the mid-session role-mapping refresh is DB-only — see *Session Validation* above), so a brief LDAP outage does not disrupt engineers mid-work; central role-mapping changes still apply within the refresh window regardless of LDAP availability.
+- **Recovery (group-membership changes)**: Because the mid-session refresh is LDAP-free, a change to a user's **directory group membership** is picked up at the user's **next login** (when LDAP is queried again), not mid-session — see the *Residual limitation* note above.
 ## Roles
@@ -1,4 +1,3 @@
-using System.Security.Claims;
 using Microsoft.AspNetCore.Authentication;
 using Microsoft.AspNetCore.Authentication.Cookies;
 using Microsoft.AspNetCore.Builder;
@@ -35,7 +34,6 @@ public static class AuthEndpoints
 }
 var ldapAuth = context.RequestServices.GetRequiredService<ILdapAuthService>();
-var jwtService = context.RequestServices.GetRequiredService<JwtTokenService>();
 var roleMapper = context.RequestServices.GetRequiredService<IGroupRoleMapper<string>>();
 var authResult = await ldapAuth.AuthenticateAsync(username, password, context.RequestAborted);
@@ -72,39 +70,23 @@ public static class AuthEndpoints
 // the documented sliding-refresh policy.
 var displayName = string.IsNullOrEmpty(authResult.DisplayName) ? username : authResult.DisplayName;
 var resolvedUsername = string.IsNullOrEmpty(authResult.Username) ? username : authResult.Username;
-var claims = new List<Claim>
-{
-    new(ClaimTypes.Name, resolvedUsername),
-    new(JwtTokenService.DisplayNameClaimType, displayName),
-    new(JwtTokenService.UsernameClaimType, resolvedUsername),
-};
-foreach (var role in roleMapping.Roles)
-{
-    claims.Add(new Claim(JwtTokenService.RoleClaimType, role));
-}
-
-if (!scope.IsSystemWideDeployment)
-{
-    foreach (var siteId in scope.PermittedSiteIds)
-    {
-        claims.Add(new Claim(JwtTokenService.SiteIdClaimType, siteId));
-    }
-}
-
-// Task 1.5: name the role/name claim types explicitly so the cookie
-// principal's IsInRole / [Authorize(Roles=…)] resolve against the same
-// canonical types we mint (JwtTokenService.RoleClaimType = ZbClaimTypes.Role,
-// ClaimTypes.Name = ZbClaimTypes.Name). The policies use
-// RequireClaim(RoleClaimType, …) which checks type+value directly, but
-// pinning roleType keeps IsInRole-style checks consistent and survives the
-// cookie serialize/round-trip.
-var identity = new ClaimsIdentity(
-    claims,
-    authenticationType: CookieAuthenticationDefaults.AuthenticationScheme,
-    nameType: ClaimTypes.Name,
-    roleType: JwtTokenService.RoleClaimType);
-var principal = new ClaimsPrincipal(identity);
+// M2.19 (#15): build the cookie principal through the shared
+// SessionClaimBuilder — the SINGLE source of truth that the mid-session
+// OnValidatePrincipal role-refresh path ALSO uses, so login and refresh can
+// never drift. It stamps the canonical identity/role/scope claims (with
+// roleType/nameType pinned for IsInRole), PLUS the M2.19 additions: one
+// zb:group claim per raw LDAP group (the durable input the mid-session
+// RoleMapper re-run consumes) and a zb:lastrolerefresh anchor (login time,
+// UTC) that also seeds the LastActivity idle anchor. The refresh timestamp is
+// the login instant, so the first role refresh is due RoleRefreshThresholdMinutes
+// later — not immediately.
+var principal = SessionClaimBuilder.Build(
+    resolvedUsername,
+    displayName,
+    authResult.Groups,
+    scope,
+    DateTimeOffset.UtcNow);
 await context.SignInAsync(
     CookieAuthenticationDefaults.AuthenticationScheme,
@@ -445,6 +445,17 @@
});
});
// M2.11: the site returns InstanceNotFound=true when the instance is
// not deployed there (e.g. deployment not yet pushed, or wrong site).
if (session.InitialSnapshot.InstanceNotFound)
{
DebugStreamService.StopStream(session.SessionId);
_toast.ShowError(
"Instance not found on the selected site — check the deployment target.");
_connecting = false;
return;
}
_session = session;
// Populate initial state from snapshot
@@ -864,12 +864,144 @@
? "The deployed revision hash differs from the current template-derived hash. Redeploy to apply changes."
: "No differences between deployed and current configuration.");
builder.CloseElement();
// DeploymentManager-018: render the structured diff sections so
// the operator sees WHAT changed, not just that the hash moved.
// Each section uses the same compact change-table idiom; the
// connection section surfaces standalone endpoint/protocol/
// failover drift that no per-attribute row would show (#10).
var d = diffResult.Diff;
if (d != null)
{
RenderChangeSection(builder, 100_000, "Attributes", d.AttributeChanges,
a => a.Value ?? "—");
RenderChangeSection(builder, 200_000, "Alarms", d.AlarmChanges,
a => $"P{a.PriorityLevel} · {a.TriggerType}");
RenderChangeSection(builder, 300_000, "Scripts", d.ScriptChanges,
s => s.TriggerType ?? "—");
RenderChangeSection(builder, 400_000, "Connections", d.ConnectionChanges,
c => FormatConnection(c));
}
}
};
await _diffDialog.ShowAsync($"Deployment Diff — {inst.UniqueName}", body);
}
// Compact summary of a connection's deployment-relevant fields for the diff
// table's Before/After cells. Surfaces all four fields ConnectionsEqual
// compares — protocol, primary endpoint config, failover retry count, and
// the backup endpoint — so a backup-only change doesn't show identical
// Before/After cells. The backup segment is omitted when there is no backup.
private static string FormatConnection(
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.ConnectionConfig c)
{
var endpoint = string.IsNullOrWhiteSpace(c.ConfigurationJson) ? "—" : c.ConfigurationJson;
var summary = $"{c.Protocol} · {endpoint} · failover ×{c.FailoverRetryCount}";
if (!string.IsNullOrWhiteSpace(c.BackupConfigurationJson))
{
summary += $" · backup {c.BackupConfigurationJson}";
}
return summary;
}
// Renders one change section (a heading plus a Bootstrap change-table) for a
// set of diff entries, matching the deployment-diff idiom used elsewhere in
// the UI: table-sm/table-striped, a colored change badge, and Before/After
// text columns. Nothing is rendered when the section has no entries, so the
// four sections (attributes, alarms, scripts, connections) all read the same
// and only appear when they actually changed. seqBase values are spaced
// 100k apart so each section's per-row sequence numbers (13 per row) stay in
// a disjoint, ascending range no matter how many entries a section has.
private static void RenderChangeSection<T>(
Microsoft.AspNetCore.Components.Rendering.RenderTreeBuilder builder,
int seqBase,
string heading,
IReadOnlyList<ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffEntry<T>> entries,
Func<T, string> summarize)
{
if (entries.Count == 0)
return;
builder.OpenElement(seqBase, "div");
builder.AddAttribute(seqBase + 1, "class", "mt-3");
builder.OpenElement(seqBase + 2, "div");
builder.AddAttribute(seqBase + 3, "class", "fw-semibold small mb-1");
builder.AddContent(seqBase + 4, $"{heading} ({entries.Count})");
builder.CloseElement();
builder.OpenElement(seqBase + 5, "table");
builder.AddAttribute(seqBase + 6, "class", "table table-sm table-striped align-middle mb-0");
// Header row.
builder.OpenElement(seqBase + 7, "thead");
builder.OpenElement(seqBase + 8, "tr");
AppendHeaderCell(builder, seqBase + 9, "Name");
AppendHeaderCell(builder, seqBase + 12, "Change");
AppendHeaderCell(builder, seqBase + 15, "Before");
AppendHeaderCell(builder, seqBase + 18, "After");
builder.CloseElement(); // tr
builder.CloseElement(); // thead
builder.OpenElement(seqBase + 21, "tbody");
var rowSeq = seqBase + 22;
foreach (var entry in entries)
{
builder.OpenElement(rowSeq, "tr");
builder.OpenElement(rowSeq + 1, "td");
builder.AddContent(rowSeq + 2, entry.CanonicalName);
builder.CloseElement();
builder.OpenElement(rowSeq + 3, "td");
builder.OpenElement(rowSeq + 4, "span");
builder.AddAttribute(rowSeq + 5, "class", ChangeBadgeClass(entry.ChangeType));
builder.AddContent(rowSeq + 6, entry.ChangeType.ToString());
builder.CloseElement();
builder.CloseElement();
builder.OpenElement(rowSeq + 7, "td");
builder.AddAttribute(rowSeq + 8, "class", "small text-muted");
builder.AddContent(rowSeq + 9,
entry.OldValue is null ? "—" : summarize(entry.OldValue));
builder.CloseElement();
builder.OpenElement(rowSeq + 10, "td");
builder.AddAttribute(rowSeq + 11, "class", "small");
builder.AddContent(rowSeq + 12,
entry.NewValue is null ? "—" : summarize(entry.NewValue));
builder.CloseElement();
builder.CloseElement(); // tr
rowSeq += 13;
}
builder.CloseElement(); // tbody
builder.CloseElement(); // table
builder.CloseElement(); // div.mt-3
}
private static void AppendHeaderCell(
Microsoft.AspNetCore.Components.Rendering.RenderTreeBuilder builder, int seq, string text)
{
builder.OpenElement(seq, "th");
builder.AddAttribute(seq + 1, "scope", "col");
builder.AddContent(seq + 2, text);
builder.CloseElement();
}
private static string ChangeBadgeClass(
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType changeType) => changeType switch
{
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType.Added => "badge bg-success",
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType.Removed => "badge bg-danger",
_ => "badge bg-warning text-dark",
};
// ---- Dropdown option helpers ----
private IEnumerable<(int Id, string Label)> EnumerateSiteOptions()
{
@@ -117,6 +117,9 @@
private string? _scriptParameters;
private string? _scriptReturn;
private bool _scriptIsLocked;
// Round-tripped from the loaded script so UI edits preserve a timeout set
// via Transport import (no authoring control in the UI — scoped out).
private int? _scriptExecutionTimeoutSeconds;
private string? _scriptFormError;
private string _scriptModalTab = "trigger"; // "trigger" | "code" | "parameters" | "return"
private MonacoEditor? _scriptEditor;
@@ -1797,6 +1800,7 @@
_scriptParameters = null;
_scriptReturn = null;
_scriptIsLocked = false;
_scriptExecutionTimeoutSeconds = null;
_scriptModalTab = "trigger";
ResetScriptTestRun();
}
@@ -1814,6 +1818,9 @@
_scriptParameters = script.ParameterDefinitions;
_scriptReturn = script.ReturnDefinition;
_scriptIsLocked = script.IsLocked;
// Preserve any timeout set via Transport import — the UI has no authoring
// control for this field, so we round-trip the loaded value unchanged.
_scriptExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds;
_scriptModalTab = "trigger"; _scriptModalTab = "trigger";
ResetScriptTestRun(); ResetScriptTestRun();
} }
@@ -1907,6 +1914,9 @@
 ReturnDefinition = _scriptReturn,
 IsLocked = _scriptIsLocked,
 MinTimeBetweenRuns = DurationInput.Compose(_scriptMinTimeValue, _scriptMinTimeUnit),
+// Round-trip the loaded value — no UI control, so preserve
+// any timeout set via Transport import unchanged.
+ExecutionTimeoutSeconds = _scriptExecutionTimeoutSeconds,
 IsInherited = existing.IsInherited,
 LockedInDerived = existing.LockedInDerived,
 };
@@ -52,6 +52,15 @@ public class TemplateScript
 /// </summary>
 public TimeSpan? MinTimeBetweenRuns { get; set; }
+/// <summary>
+/// Per-script execution timeout in seconds, or null to use the site's global
+/// default (<c>SiteRuntimeOptions.ScriptExecutionTimeoutSeconds</c>). A
+/// non-positive value (≤ 0) is treated the same as null — i.e. fall back to
+/// the global default — by the Site Runtime. Seconds (not a TimeSpan) to keep
+/// the unit consistent with the global option it overrides.
+/// </summary>
+public int? ExecutionTimeoutSeconds { get; set; }
 /// <summary>
 /// True when this row was copied from the base template and has not been
 /// overridden on the derived template. Changes to the base flow downward
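The fallback rule documented on `ExecutionTimeoutSeconds` (null or a non-positive value means "use the site's global default") can be sketched as below. This is a minimal illustration only: `ResolveExecutionTimeout` and its parameter names are hypothetical and are not taken from the Site Runtime code.

```csharp
using System;

// Hypothetical helper (not from the source): picks the effective timeout for one script.
// A null or non-positive per-script value falls back to the global default.
TimeSpan ResolveExecutionTimeout(int? perScriptSeconds, int globalDefaultSeconds)
{
    // The relational pattern "> 0" never matches null, so null and <= 0 share one branch.
    var seconds = perScriptSeconds is > 0 ? perScriptSeconds.Value : globalDefaultSeconds;
    return TimeSpan.FromSeconds(seconds);
}
```

Keeping the per-script value in seconds means the override and the global option can be compared and substituted without a unit conversion.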
@@ -0,0 +1,34 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
/// <summary>
/// Single source of truth for which data-connection protocol strings produce an
/// adapter that implements <see cref="IAlarmSubscribableConnection"/> (i.e. can
/// mirror native alarms).
///
/// The set MUST stay in sync with the protocols registered against an
/// alarm-subscribable adapter in
/// <c>DataConnectionLayer/DataConnectionFactory.cs</c>: today the "OpcUa" adapter
/// (<c>OpcUaDataConnection</c>) and the "MxGateway" adapter
/// (<c>MxGatewayDataConnection</c>) both implement
/// <see cref="IAlarmSubscribableConnection"/>. The runtime decision is made in
/// <c>DataConnectionActor</c> via <c>_adapter is IAlarmSubscribableConnection</c>;
/// this central-side helper lets the deploy pipeline and Central UI gate
/// native-alarm-source bindings against the same notion without instantiating an
/// adapter. Adding a new alarm-capable protocol = register the adapter in the
/// factory AND add its protocol string here.
/// </summary>
public static class AlarmCapableProtocols
{
/// <summary>
/// Determines whether a data connection's protocol string resolves to an
/// alarm-capable adapter (one implementing <see cref="IAlarmSubscribableConnection"/>).
/// Case-insensitive to match <c>DataConnectionFactory</c>'s own
/// <c>OrdinalIgnoreCase</c> protocol-key lookup; <c>null</c>/blank is not
/// alarm-capable.
/// </summary>
/// <param name="protocol">The data connection protocol string (e.g. "OpcUa").</param>
/// <returns><c>true</c> when the protocol's adapter can subscribe native alarms; otherwise <c>false</c>.</returns>
public static bool IsAlarmCapable(string? protocol) =>
string.Equals(protocol, "OpcUa", StringComparison.OrdinalIgnoreCase)
|| string.Equals(protocol, "MxGateway", StringComparison.OrdinalIgnoreCase);
}
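A deploy-time gate built on this helper might look like the sketch below. To stay self-contained it repeats the `IsAlarmCapable` logic as a local function; `ValidateNativeAlarmBindings`, the tuple shape, and the error wording are assumptions made for illustration, not part of the commit.

```csharp
using System;
using System.Collections.Generic;

// Same comparison as AlarmCapableProtocols.IsAlarmCapable, repeated here so the sketch compiles alone.
bool IsAlarmCapable(string protocol) =>
    string.Equals(protocol, "OpcUa", StringComparison.OrdinalIgnoreCase)
    || string.Equals(protocol, "MxGateway", StringComparison.OrdinalIgnoreCase);

// Hypothetical check: returns one message per binding whose connection cannot mirror native alarms.
List<string> ValidateNativeAlarmBindings(IEnumerable<(string ConnectionName, string Protocol)> bindings)
{
    var problems = new List<string>();
    foreach (var (name, protocol) in bindings)
    {
        if (!IsAlarmCapable(protocol))
        {
            problems.Add($"Connection '{name}' (protocol '{protocol ?? "<none>"}') cannot source native alarms.");
        }
    }
    return problems;
}
```

Because the lookup is case-insensitive and treats null or blank as not capable, the gate agrees with the factory's own protocol-key matching without instantiating an adapter.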
@@ -56,8 +56,17 @@ public interface IDatabaseGateway
 /// <param name="parameters">Optional SQL parameters for the statement.</param>
 /// <param name="originInstanceName">Optional name of the instance that originated the write.</param>
 /// <param name="cancellationToken">Cancellation token for the buffering operation.</param>
-/// <returns>A task that represents the asynchronous operation.</returns>
-Task CachedWriteAsync(
+/// <returns>
+/// M2.3 (#7): an <see cref="ExternalCallResult"/> mirroring the External-System
+/// API path (<c>IExternalSystemClient.CachedCallAsync</c>). The write is
+/// attempted immediately:
+/// <list type="bullet">
+/// <item>immediate success → <c>Success=true, WasBuffered=false</c> (not buffered);</item>
+/// <item>permanent SQL error (constraint / syntax / permission) → <c>Success=false, WasBuffered=false</c> with an error message, returned synchronously and NOT buffered;</item>
+/// <item>transient SQL error (connection / timeout / deadlock / throttle) → buffered to store-and-forward, <c>Success=true, WasBuffered=true</c>.</item>
+/// </list>
+/// </returns>
+Task<ExternalCallResult> CachedWriteAsync(
 string connectionName,
 string sql,
 IReadOnlyDictionary<string, object?>? parameters = null,
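The three outcomes listed in the new `<returns>` block map onto the two flags as sketched below. The tuple is a stand-in for `ExternalCallResult`: only `Success` and `WasBuffered` are named in the doc comment, so the `Error` element and the `DescribeWriteOutcome` helper are assumptions for illustration.

```csharp
using System;

// Stand-in for ExternalCallResult: (Success, WasBuffered, Error). "Error" is an assumed name.
string DescribeWriteOutcome((bool Success, bool WasBuffered, string Error) result) => result switch
{
    (true, false, _) => "written immediately",
    (true, true, _) => "buffered to store-and-forward",
    // Permanent SQL errors are returned synchronously and are never buffered.
    (false, _, var error) => "rejected, not buffered: " + error,
};
```

A caller that previously awaited a plain `Task` now has to inspect the result: `Success=true` no longer implies the row is already in the database, only that it was written or safely queued.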
@@ -2,8 +2,38 @@ using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
 namespace ZB.MOM.WW.ScadaBridge.Commons.Messages.DebugView;
+/// <summary>
+/// Snapshot of an instance's debug state returned in response to a
+/// <see cref="DebugSnapshotRequest"/> or <see cref="SubscribeDebugViewRequest"/>.
+/// </summary>
+/// <remarks>
+/// <para>
+/// <b>Additive-only contract (M2.11):</b> <see cref="InstanceNotFound"/> is an
+/// optional trailing parameter with a default of <see langword="false"/> so every
+/// existing positional constructor call and every existing serialized wire frame
+/// remains valid. Callers that receive a snapshot with
+/// <c>InstanceNotFound = true</c> know the instance was unknown on the site and
+/// should distinguish that from a deployed-but-empty instance
+/// (<c>InstanceNotFound = false</c>, empty <see cref="AttributeValues"/> and
+/// <see cref="AlarmStates"/>).
+/// </para>
+/// <para>
+/// A new dedicated message type (<c>DebugViewInstanceNotFound</c>) was
+/// considered but rejected: the ClusterClient / ClusterClientReceptionist
+/// channel is typed on the request side and the bridge actor is already
+/// pattern-matching on <c>DebugViewSnapshot</c> for the initial-snapshot TCS
+/// in <c>DebugStreamService</c>. Introducing a second reply type would require
+/// every consumer to handle an additional <c>Ask</c> result union — more change
+/// for no additive-safety gain. The defaulted field is strictly additive and
+/// keeps all call sites untouched.
+/// </para>
+/// </remarks>
 public record DebugViewSnapshot(
 string InstanceUniqueName,
 IReadOnlyList<AttributeValueChanged> AttributeValues,
 IReadOnlyList<AlarmStateChanged> AlarmStates,
-DateTimeOffset SnapshotTimestamp);
+DateTimeOffset SnapshotTimestamp,
+// M2.11 — additive field: true when the requested instance is not registered
+// on this site. Defaults to false so all existing call sites and wire
+// frames are unaffected.
+bool InstanceNotFound = false);
@@ -40,7 +40,14 @@ public record SiteHealthReport(
 // hosted service every 30 s. Defaults to null so existing producers /
 // tests that don't refresh the snapshot stay valid; the central health
 // surface treats null as "no data yet" rather than a zeroed queue.
-SiteAuditBacklogSnapshot? SiteAuditBacklog = null);
+SiteAuditBacklogSnapshot? SiteAuditBacklog = null,
+// Site Event Logging (#12) M2.16 (#30): cumulative count of event-log write
+// failures (SQLite error, disk full, bounded-queue overflow drop) since the
+// logger was created. Populated by the site-side SiteEventLogFailureCountReporter
+// hosted service. Point-in-time (not reset on collect) — mirrors the
+// SiteAuditBacklog pattern. Defaults to 0 so existing producers / tests that
+// don't wire the poller stay valid.
+long SiteEventLogWriteFailures = 0);
 /// <summary>
 /// Broadcast wrapper used between central nodes to keep per-node
@@ -12,8 +12,8 @@ public sealed record ConfigurationDiff
 public string? OldRevisionHash { get; init; }
 /// <summary>Revision hash of the new configuration being compared.</summary>
 public string? NewRevisionHash { get; init; }
-/// <summary>True when any attribute, alarm, or script changes are present.</summary>
-public bool HasChanges => AttributeChanges.Count > 0 || AlarmChanges.Count > 0 || ScriptChanges.Count > 0;
+/// <summary>True when any attribute, alarm, script, or connection changes are present.</summary>
+public bool HasChanges => AttributeChanges.Count > 0 || AlarmChanges.Count > 0 || ScriptChanges.Count > 0 || ConnectionChanges.Count > 0;
 /// <summary>Diff entries for resolved attributes.</summary>
 public IReadOnlyList<DiffEntry<ResolvedAttribute>> AttributeChanges { get; init; } = [];
@@ -21,6 +21,13 @@ public sealed record ConfigurationDiff
 public IReadOnlyList<DiffEntry<ResolvedAlarm>> AlarmChanges { get; init; } = [];
 /// <summary>Diff entries for resolved scripts.</summary>
 public IReadOnlyList<DiffEntry<ResolvedScript>> ScriptChanges { get; init; } = [];
+/// <summary>
+/// Diff entries for connection configurations, keyed by connection name.
+/// Surfaces standalone endpoint/protocol/failover drift that does not show
+/// up as a per-attribute binding change (TemplateEngine-018).
+/// </summary>
+public IReadOnlyList<DiffEntry<ConnectionConfig>> ConnectionChanges { get; init; } = [];
 }
 /// <summary>
@@ -174,6 +174,14 @@ public sealed record ResolvedScript
 /// <summary>Gets the minimum time between script executions.</summary>
 public TimeSpan? MinTimeBetweenRuns { get; init; }
+/// <summary>
+/// Per-script execution timeout in seconds, or null to use the site's global
+/// default. A non-positive value is treated as null (use global) by the Site
+/// Runtime. Seconds (not TimeSpan) to match the global option it overrides.
+/// </summary>
+public int? ExecutionTimeoutSeconds { get; init; }
 /// <summary>Gets the source of this script.</summary>
 public string Source { get; init; } = "Template";
@@ -0,0 +1,393 @@
using System.Text.Json;
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
/// <summary>
/// Recursive, persistence-ignorant model of an inbound-API parameter or
/// return-value type definition. This is the deserialized form of the JSON
/// Schema stored in <c>ApiMethod.ParameterDefinitions</c> / <c>ReturnDefinition</c>
/// (and the equivalent TemplateScript / SharedScript columns), the canonical
/// format produced by the Central UI schema builder and the
/// <c>MigrateParametersToJsonSchema</c> migration.
///
/// <para>
/// Unlike the flat <see cref="ParameterDefinition"/> (name → scalar type, no
/// nesting), an <see cref="InboundApiSchema"/> carries the FULL nested type:
/// an <c>object</c> node carries its declared field schemas (and which fields
/// are required); an <c>array</c> node carries its element schema. This lets
/// callers validate complex request/response structures field-by-field and
/// element-by-element to any depth, with path-qualified errors
/// (e.g. <c>order.items[2].quantity</c>).
/// </para>
///
/// <para>
/// The extended type vocabulary (after normalization) is the JSON Schema set:
/// <c>boolean · integer · number · string · object · array</c>. Legacy aliases
/// (<c>bool</c>, <c>int</c>, <c>float</c>, <c>double</c>, <c>list</c>, …) are
/// accepted on parse for transition safety, mirroring the Central UI
/// <c>SchemaBuilderModel</c> / <c>JsonSchemaShapeParser</c> conventions.
/// </para>
/// </summary>
public sealed class InboundApiSchema
{
/// <summary>Normalized JSON Schema type: one of <c>boolean · integer · number · string · object · array</c>.</summary>
public string Type { get; init; } = "string";
/// <summary>For <see cref="Type"/> = <c>object</c>: the declared fields, in declaration order.</summary>
public IReadOnlyList<InboundApiSchemaField> Fields { get; init; } = [];
/// <summary>For <see cref="Type"/> = <c>array</c>: the schema every element must satisfy; null means element type was not declared (shape-only).</summary>
public InboundApiSchema? Items { get; init; }
/// <summary>Maximum allowed schema nesting depth for both Parse and Validate recursion.</summary>
private const int MaxDepth = 32;
// Allow the JSON reader to parse schemas up to ~3× our structural ceiling so
// the application-level ParseSchema depth guard (MaxDepth = 32) fires before
// the System.Text.Json reader ceiling. Each structural level contributes
// roughly 3 JSON-reader nesting levels (object → properties-object → value),
// so 128 reader levels comfortably accommodates 32+ structural levels.
private static readonly JsonDocumentOptions DocOptions = new() { MaxDepth = 128 };
/// <summary>
/// Parses a stored definition string into an <see cref="InboundApiSchema"/>.
/// Accepts the canonical JSON Schema object form
/// (<c>{"type":"object","properties":{…},"required":[…]}</c>) and, for
/// transition safety, the legacy flat-array parameter form
/// (<c>[{name,type,required,itemType?}]</c>) which it treats as an object
/// schema whose properties are the array entries.
/// </summary>
/// <param name="json">The definition JSON; null/whitespace yields <c>null</c>.</param>
/// <returns>The parsed schema, or <c>null</c> when the input is empty.</returns>
/// <exception cref="JsonException">The input is non-empty but not valid JSON, is a JSON scalar/null at the root, or the schema nesting exceeds <see cref="MaxDepth"/>.</exception>
public static InboundApiSchema? Parse(string? json)
{
if (string.IsNullOrWhiteSpace(json))
{
return null;
}
using var doc = JsonDocument.Parse(json, DocOptions);
return doc.RootElement.ValueKind switch
{
JsonValueKind.Object => ParseSchema(doc.RootElement, depth: 0),
JsonValueKind.Array => ParseLegacyArray(doc.RootElement),
_ => throw new JsonException("Type definition must be a JSON object (JSON Schema) or legacy parameter array."),
};
}
private static InboundApiSchema ParseSchema(JsonElement el, int depth)
{
if (depth > MaxDepth)
{
throw new JsonException($"Schema nesting exceeds the maximum allowed depth of {MaxDepth}.");
}
var type = el.TryGetProperty("type", out var t) && t.ValueKind == JsonValueKind.String
? NormalizeType(t.GetString())
: "string";
if (type == "array")
{
InboundApiSchema? items = null;
if (el.TryGetProperty("items", out var itemsEl) && itemsEl.ValueKind == JsonValueKind.Object)
{
items = ParseSchema(itemsEl, depth + 1);
}
return new InboundApiSchema { Type = "array", Items = items };
}
if (type == "object")
{
var requiredSet = new HashSet<string>(StringComparer.Ordinal);
if (el.TryGetProperty("required", out var req) && req.ValueKind == JsonValueKind.Array)
{
foreach (var r in req.EnumerateArray())
{
if (r.ValueKind == JsonValueKind.String)
{
var s = r.GetString();
if (!string.IsNullOrEmpty(s))
{
requiredSet.Add(s);
}
}
}
}
var fields = new List<InboundApiSchemaField>();
if (el.TryGetProperty("properties", out var props) && props.ValueKind == JsonValueKind.Object)
{
foreach (var prop in props.EnumerateObject())
{
var schema = prop.Value.ValueKind == JsonValueKind.Object
? ParseSchema(prop.Value, depth + 1)
: new InboundApiSchema { Type = "string" };
fields.Add(new InboundApiSchemaField(prop.Name, requiredSet.Contains(prop.Name), schema));
}
}
return new InboundApiSchema { Type = "object", Fields = fields };
}
return new InboundApiSchema { Type = type };
}
private static InboundApiSchema ParseLegacyArray(JsonElement arr)
{
var fields = new List<InboundApiSchemaField>();
foreach (var item in arr.EnumerateArray())
{
if (item.ValueKind != JsonValueKind.Object)
{
continue;
}
// The legacy flat shape historically appeared with both PascalCase
// (CLI / anonymous-object serialization read back with
// PropertyNameCaseInsensitive) and lowercase (DB) keys, so the
// property lookups here are case-insensitive for compatibility.
var name = TryGetMember(item, "name", out var n) ? n.GetString() : null;
if (string.IsNullOrEmpty(name))
{
continue;
}
var rawType = TryGetMember(item, "type", out var t) ? t.GetString() : "string";
// A field is optional only when "required" is explicitly false.
// The SQL migration uses a string comparison (LOWER(...) <> 'false'),
// so we must also accept the string "false" (case-insensitive) here —
// not only the JSON boolean false — to stay consistent with legacy rows
// that stored "required":"false" as a string.
var required = !TryGetMember(item, "required", out var rq)
|| (rq.ValueKind != JsonValueKind.False
&& !string.Equals(
rq.ValueKind == JsonValueKind.String ? rq.GetString() : null,
"false",
StringComparison.OrdinalIgnoreCase));
var normalized = NormalizeType(rawType);
InboundApiSchema schema;
if (normalized == "array")
{
var inner = TryGetMember(item, "itemType", out var it) ? it.GetString() : null;
schema = new InboundApiSchema
{
Type = "array",
Items = string.IsNullOrEmpty(inner) ? null : new InboundApiSchema { Type = NormalizeType(inner) },
};
}
else
{
schema = new InboundApiSchema { Type = normalized };
}
fields.Add(new InboundApiSchemaField(name!, required, schema));
}
return new InboundApiSchema { Type = "object", Fields = fields };
}
/// <summary>
/// Case-insensitive object-member lookup, used only on the legacy flat-array
/// path so both PascalCase and lowercase legacy keys resolve.
/// </summary>
private static bool TryGetMember(JsonElement obj, string name, out JsonElement value)
{
foreach (var prop in obj.EnumerateObject())
{
if (string.Equals(prop.Name, name, StringComparison.OrdinalIgnoreCase))
{
value = prop.Value;
return true;
}
}
value = default;
return false;
}
/// <summary>
/// Normalizes a raw type token to the canonical JSON Schema vocabulary,
/// tolerating legacy aliases. Unknown tokens are returned lowercased so the
/// validator can surface an explicit "unknown type" error.
/// </summary>
/// <param name="raw">The raw type token (may be null).</param>
/// <returns>The normalized type token.</returns>
public static string NormalizeType(string? raw) => raw?.ToLowerInvariant() switch
{
null or "" => "string",
"boolean" or "bool" => "boolean",
"integer" or "int" or "int32" or "int64" => "integer",
"number" or "float" or "double" or "decimal" => "number",
// datetime→string is intentional: the legacy migration's SQL
// normalization function maps "datetime" to "string" (no separate
// datetime wire type in the extended type system), so C# must match.
"string" or "datetime" => "string",
"object" => "object",
"array" or "list" => "array",
var other => other,
};
/// <summary>
/// Recursively validates a JSON value against this schema. A JSON <c>null</c>
/// satisfies any type (a present-but-null field is allowed; absence of a
/// required field is reported by the parent object). Errors are accumulated
/// with a path prefix (e.g. <c>order.items[2].quantity</c>) so the caller can
/// pinpoint the offending field.
/// </summary>
/// <param name="value">The JSON value to validate.</param>
/// <param name="path">The path prefix for the value being validated (empty for the root).</param>
/// <param name="errors">Accumulator the validator appends path-qualified messages to.</param>
public void Validate(JsonElement value, string path, List<string> errors)
=> ValidateCore(value, path, errors, depth: 0);
private void ValidateCore(JsonElement value, string path, List<string> errors, int depth)
{
ArgumentNullException.ThrowIfNull(errors);
if (depth > MaxDepth)
{
errors.Add($"{Describe(path)}: schema nesting too deep (max {MaxDepth})");
return;
}
// A null value satisfies any declared type — a present-but-null field is
// allowed; a MISSING required field is reported by the enclosing object.
if (value.ValueKind == JsonValueKind.Null)
{
return;
}
switch (Type)
{
case "boolean":
if (value.ValueKind is not (JsonValueKind.True or JsonValueKind.False))
{
errors.Add(Mismatch(path, "Boolean"));
}
break;
case "integer":
if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
{
errors.Add(Mismatch(path, "Integer"));
}
break;
case "number":
if (value.ValueKind != JsonValueKind.Number)
{
errors.Add(Mismatch(path, "Float"));
}
break;
case "string":
if (value.ValueKind != JsonValueKind.String)
{
errors.Add(Mismatch(path, "String"));
}
break;
case "object":
ValidateObject(value, path, errors, depth);
break;
case "array":
ValidateArray(value, path, errors, depth);
break;
default:
errors.Add($"{Describe(path)} has unknown declared type '{Type}'");
break;
}
}
private void ValidateObject(JsonElement value, string path, List<string> errors, int depth)
{
if (value.ValueKind != JsonValueKind.Object)
{
errors.Add(Mismatch(path, "Object"));
return;
}
// Reject undeclared fields (defensive, consistent with InboundAPI-010's
// top-level "unexpected parameter" rejection) — a typo'd nested field is
// surfaced instead of silently ignored. Skipped when no fields are
// declared (a bare {"type":"object"} stays shape-only, like the legacy
// behaviour and the array-without-items case).
if (Fields.Count > 0)
{
var declared = new HashSet<string>(Fields.Select(f => f.Name), StringComparer.Ordinal);
foreach (var prop in value.EnumerateObject())
{
if (!declared.Contains(prop.Name))
{
errors.Add($"{Describe(JoinField(path, prop.Name))} is not a declared field");
}
}
}
foreach (var field in Fields)
{
var fieldPath = JoinField(path, field.Name);
if (value.TryGetProperty(field.Name, out var fieldValue))
{
field.Schema.ValidateCore(fieldValue, fieldPath, errors, depth + 1);
}
else if (field.Required)
{
errors.Add($"missing required field {Describe(fieldPath)}");
}
}
}
private void ValidateArray(JsonElement value, string path, List<string> errors, int depth)
{
if (value.ValueKind != JsonValueKind.Array)
{
errors.Add(Mismatch(path, "List"));
return;
}
// No declared element type → shape-only (any elements accepted).
if (Items is null)
{
return;
}
var index = 0;
foreach (var element in value.EnumerateArray())
{
Items.ValidateCore(element, $"{path}[{index}]", errors, depth + 1);
index++;
}
}
private static string Mismatch(string path, string expectedDisplayType) =>
$"{Describe(path)} must be {Article(expectedDisplayType)} {expectedDisplayType}";
private static string Describe(string path) =>
string.IsNullOrEmpty(path) ? "value" : $"'{path}'";
private static string JoinField(string path, string field) =>
string.IsNullOrEmpty(path) ? field : $"{path}.{field}";
private static string Article(string word) =>
word.Length > 0 && "AEIOU".IndexOf(char.ToUpperInvariant(word[0])) >= 0 ? "an" : "a";
}
/// <summary>
/// One declared field of an <see cref="InboundApiSchema"/> object node: the
/// field name, whether it is required, and its (recursive) type schema.
/// </summary>
/// <param name="Name">The field name as it appears in the JSON.</param>
/// <param name="Required">Whether the field must be present.</param>
/// <param name="Schema">The recursive type schema the field's value must satisfy.</param>
public sealed record InboundApiSchemaField(string Name, bool Required, InboundApiSchema Schema);
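The path-qualified recursion in `InboundApiSchema.Validate` can be illustrated with a much smaller stand-in. The sketch below is not the class above: `ValidateSketch` is a hypothetical function that walks the schema JSON directly, covers only `integer`, `string`, `object` and `array`, and leaves out the depth guard, alias normalization and undeclared-field rejection. It reuses the same message shapes so the output is comparable.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical, reduced validator: schema and value are both raw JsonElements.
void ValidateSketch(JsonElement schema, JsonElement value, string path, List<string> errors)
{
    if (value.ValueKind == JsonValueKind.Null) return; // present-but-null satisfies any type
    var type = schema.TryGetProperty("type", out var t) ? t.GetString() : "string";
    var where = path.Length == 0 ? "value" : $"'{path}'";
    switch (type)
    {
        case "integer":
            if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
                errors.Add($"{where} must be an Integer");
            break;
        case "string":
            if (value.ValueKind != JsonValueKind.String) errors.Add($"{where} must be a String");
            break;
        case "array":
            if (value.ValueKind != JsonValueKind.Array) { errors.Add($"{where} must be a List"); break; }
            if (!schema.TryGetProperty("items", out var items)) break; // no element schema: shape-only
            var index = 0;
            foreach (var element in value.EnumerateArray())
                ValidateSketch(items, element, $"{path}[{index++}]", errors);
            break;
        case "object":
            if (value.ValueKind != JsonValueKind.Object) { errors.Add($"{where} must be an Object"); break; }
            var required = new HashSet<string>();
            if (schema.TryGetProperty("required", out var req))
                foreach (var r in req.EnumerateArray()) required.Add(r.GetString());
            if (!schema.TryGetProperty("properties", out var props)) break;
            foreach (var prop in props.EnumerateObject())
            {
                var fieldPath = path.Length == 0 ? prop.Name : $"{path}.{prop.Name}";
                if (value.TryGetProperty(prop.Name, out var fieldValue))
                    ValidateSketch(prop.Value, fieldValue, fieldPath, errors);
                else if (required.Contains(prop.Name))
                    errors.Add($"missing required field '{fieldPath}'");
            }
            break;
    }
}
```

Each recursive call extends the path with either `.field` or `[index]`, which is how an error deep in a payload ends up labelled `order.items[2].quantity` rather than just `quantity`.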
@@ -10,10 +10,24 @@ namespace ZB.MOM.WW.ScadaBridge.Communication.Actors;
 /// Long-lived (one per active debug session) actor on the central side. Debug sessions
 /// are session-based and temporary — this actor holds no persisted state and does not
 /// derive from an Akka.Persistence base class; its state does not survive a restart.
-/// Sends SubscribeDebugViewRequest to the site via CentralCommunicationActor (with THIS actor
-/// as the Sender) to get the initial snapshot. After receiving the snapshot, opens a gRPC
-/// server-streaming subscription via SiteStreamGrpcClient for ongoing events.
-/// Stream events are marshalled back to the actor via Self.Tell for thread safety.
+/// <para>
+/// <b>Stream-first lifecycle (M2.18, #26).</b> To avoid losing any
+/// <see cref="AttributeValueChanged"/>/<see cref="AlarmStateChanged"/> that occurs on
+/// the site during the snapshot-build + network-transit window, the gRPC server-streaming
+/// subscription is opened FIRST (in <see cref="PreStart"/>), alongside the
+/// <c>SubscribeDebugViewRequest</c> sent to the site via CentralCommunicationActor (with
+/// THIS actor as the Sender). Live events that arrive before the
+/// <see cref="DebugViewSnapshot"/> is delivered are <em>buffered in arrival order</em>.
+/// When the snapshot arrives it is delivered to the consumer, then the buffer is flushed
+/// in order, <em>deduped</em> against the snapshot (an event whose per-entity timestamp is
+/// &lt;= the snapshot's timestamp for the same entity is already reflected → dropped; a
+/// strictly-newer event is delivered; an event for an entity absent from the snapshot is
+/// delivered). After the flush the actor switches to pass-through: subsequent events go
+/// straight to the consumer. A mid-session reconnect (after the snapshot) resumes
+/// pass-through — the snapshot is a one-time thing.
+/// </para>
+/// Stream events are marshalled back to the actor via Self.Tell for thread safety; all
+/// state (phase flag + buffer) is mutated only on the actor thread.
 /// </summary>
 public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
 {
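The dedup rule stated in the class summary (drop a buffered event whose timestamp is at or before the snapshot's timestamp for the same entity, deliver everything else in arrival order) can be sketched as below. This is a simplified model, not the actor's `FlushBuffer`: events are reduced to an entity key plus timestamp, and `FlushDeduped` and its parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced flush: returns the buffered events that should still reach the consumer.
List<(string Key, DateTimeOffset Timestamp)> FlushDeduped(
    IReadOnlyDictionary<string, DateTimeOffset> snapshotTimestamps,
    IEnumerable<(string Key, DateTimeOffset Timestamp)> buffered)
{
    var delivered = new List<(string Key, DateTimeOffset Timestamp)>();
    foreach (var evt in buffered)
    {
        // Same entity and not newer than the snapshot: already reflected, so drop it.
        if (snapshotTimestamps.TryGetValue(evt.Key, out var snapshotTime) && evt.Timestamp <= snapshotTime)
        {
            continue;
        }
        // Strictly newer, or the entity is absent from the snapshot: deliver in arrival order.
        delivered.Add(evt);
    }
    return delivered;
}
```

Comparing per entity rather than against a single snapshot-wide timestamp matters here: an event for one attribute can predate the snapshot's overall timestamp and still be newer than that attribute's own snapshot value.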
@@ -49,6 +63,31 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
 private bool _stopped;
 private CancellationTokenSource? _grpcCts;
+/// <summary>
+/// Phase flag (M2.18). <see langword="false"/> until the initial
+/// <see cref="DebugViewSnapshot"/> has been delivered and the pre-snapshot buffer
+/// flushed; <see langword="true"/> thereafter (pass-through). Mutated only on the
+/// actor thread. A reconnect does NOT touch this flag — a mid-session reconnect
+/// (after the snapshot) therefore stays in pass-through, and a reconnect during the
+/// buffering phase (before the snapshot) stays buffering.
+/// </summary>
+private bool _snapshotDelivered;
+/// <summary>
+/// Ordered buffer of live gRPC events (<see cref="AttributeValueChanged"/>/
+/// <see cref="AlarmStateChanged"/>) that arrived before the snapshot was delivered.
+/// Flushed (with per-entity dedup against the snapshot) when the snapshot arrives,
+/// then never used again. Mutated only on the actor thread.
+/// </summary>
+private readonly List<object> _preSnapshotBuffer = new();
+/// <summary>
+/// Defensive log threshold: if the pre-snapshot buffer grows past this many events
+/// during a slow snapshot we log once (events are NOT dropped — the window is short).
+/// </summary>
+private const int BufferWarnThreshold = 10_000;
+private bool _bufferWarned;
 /// <summary>Timer scheduler for reconnect and stability window timers.</summary>
 public ITimerScheduler Timers { get; set; } = null!;
@@ -85,13 +124,55 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
 _grpcNodeAAddress = grpcNodeAAddress;
 _grpcNodeBAddress = grpcNodeBAddress;
-// Initial snapshot response from the site (via ClusterClient)
+// Initial snapshot response from the site (via ClusterClient).
+// M2.11: if the site reports InstanceNotFound=true the instance is not
+// deployed there. M2.18: under the stream-first lifecycle the gRPC stream
+// was already opened in PreStart, so the not-found path must tear it down
+// (CleanupGrpc) rather than enter pass-through. Forward the snapshot (with
+// InstanceNotFound=true) to _onEvent so DebugStreamService's TCS resolves and
+// the caller can inspect the flag; then stop cleanly.
 Receive<DebugViewSnapshot>(snapshot =>
 {
-_log.Info("Received initial snapshot for {0} ({1} attrs, {2} alarms)",
-_instanceUniqueName, snapshot.AttributeValues.Count, snapshot.AlarmStates.Count);
+if (_snapshotDelivered)
+{
+// Defensive: a duplicate / late snapshot after we have already moved to
+// pass-through. The snapshot is a one-time thing — ignore replays so we
+// never re-buffer or double-deliver.
+_log.Debug("Ignoring duplicate DebugViewSnapshot for {0} (already delivered)",
+_instanceUniqueName);
+return;
+}
+if (snapshot.InstanceNotFound)
+{
+_log.Warning("Instance {0} is not deployed on site; terminating debug stream",
+_instanceUniqueName);
+// M2.18: the stream-first subscription opened in PreStart is for a
+// non-deployed instance — cancel it (and any buffered gap events are
+// discarded with the actor). No pass-through.
+// _stopped is set AFTER CleanupGrpc() to match the ordering in the
+// DebugStreamTerminated and ReceiveTimeout handlers (cosmetic consistency).
+CleanupGrpc();
+_stopped = true;
+_preSnapshotBuffer.Clear();
+_onEvent(snapshot); // resolves the snapshot TCS with InstanceNotFound=true
+// Note: after Context.Stop(Self) below the actor is dead. DebugStreamService
+// inspects InitialSnapshot.InstanceNotFound and calls StopStream, which sends
+// a StopDebugStream message. That Tell arrives after the actor has already
+// stopped, producing a benign Akka dead-letter — expected and harmless.
+Context.Stop(Self);
+return;
+}
+_log.Info("Received initial snapshot for {0} ({1} attrs, {2} alarms); flushing {3} buffered event(s)",
+_instanceUniqueName, snapshot.AttributeValues.Count, snapshot.AlarmStates.Count,
+_preSnapshotBuffer.Count);
+// Deliver the snapshot, then flush the gap-window buffer (deduped), then
+// switch to pass-through. Order matters: snapshot first, buffered events next.
 _onEvent(snapshot);
-OpenGrpcStream();
+FlushBuffer(snapshot);
+_snapshotDelivered = true;
 });
 // Domain events arriving via Self.Tell from gRPC callback.
@@ -99,8 +180,11 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
// flapping stream that delivers a single event between failures would
// otherwise never trip MaxRetries. The retry budget is recovered only by
// GrpcStreamStable (a stream that has stayed up for StabilityWindow).
- Receive<AttributeValueChanged>(changed => _onEvent(changed));
- Receive<AlarmStateChanged>(changed => _onEvent(changed));
+ // M2.18: before the snapshot has been delivered, BUFFER (in arrival order)
+ // rather than deliver, since these may be gap-window events. After the snapshot has
+ // been flushed, pass through directly (same handler, phase-dependent behavior).
+ Receive<AttributeValueChanged>(changed => HandleStreamEvent(changed));
+ Receive<AlarmStateChanged>(changed => HandleStreamEvent(changed));
// Stream has been stably connected for StabilityWindow; recover the
// retry budget so a future transient fault gets a fresh set of retries.
@@ -155,11 +239,161 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
});
}
/// <summary>
/// Handles a live gRPC stream event (<see cref="AttributeValueChanged"/> or
/// <see cref="AlarmStateChanged"/>). Before the snapshot has been delivered the
/// event is appended to the ordered pre-snapshot buffer (gap-window capture); after
/// the snapshot+flush it is passed straight through to the consumer. Always runs on
/// the actor thread (events are marshalled in via Self.Tell), so the phase flag and
/// buffer are accessed without locking.
/// </summary>
private void HandleStreamEvent(object evt)
{
if (_snapshotDelivered)
{
_onEvent(evt);
return;
}
_preSnapshotBuffer.Add(evt);
if (!_bufferWarned && _preSnapshotBuffer.Count > BufferWarnThreshold)
{
_bufferWarned = true;
_log.Warning(
"Pre-snapshot debug-event buffer for {0} exceeded {1} events while awaiting the snapshot; " +
"events are still retained (not dropped).",
_instanceUniqueName, BufferWarnThreshold);
}
}
/// <summary>
/// Flushes the pre-snapshot buffer in arrival order, deduping each event against the
/// just-delivered snapshot (M2.18).
/// <para>
/// <b>Dedup rule.</b> Identity is per-entity:
/// attributes by (InstanceUniqueName, AttributePath, AttributeName); alarms by
/// (InstanceUniqueName, AlarmName, SourceReference). For a buffered event whose entity
/// is present in the snapshot, the comparison is against that entity's snapshot
/// timestamp: a buffered timestamp &lt;= the snapshot timestamp means the event is
/// already reflected in the snapshot → DROP; a strictly-newer (&gt;) timestamp means
/// the event happened after the snapshot was built → DELIVER. The boundary is inclusive
/// on the snapshot side (equal timestamps are treated as duplicates) — the snapshot is
/// the authoritative point-in-time value, so an event at the exact same instant carries
/// no new information. A buffered event whose entity is NOT in the snapshot is a genuine
/// gap-window event → DELIVER.
/// </para>
/// </summary>
private void FlushBuffer(DebugViewSnapshot snapshot)
{
if (_preSnapshotBuffer.Count == 0) return;
// Build per-entity "as-of" timestamps from the snapshot. If (defensively) the
// snapshot lists the same entity twice, keep the newest timestamp.
var attrAsOf = new Dictionary<string, DateTimeOffset>();
foreach (var a in snapshot.AttributeValues)
{
var key = AttributeKey(a);
if (!attrAsOf.TryGetValue(key, out var existing) || a.Timestamp > existing)
attrAsOf[key] = a.Timestamp;
}
var alarmAsOf = new Dictionary<string, DateTimeOffset>();
foreach (var al in snapshot.AlarmStates)
{
var key = AlarmKey(al);
if (!alarmAsOf.TryGetValue(key, out var existing) || al.Timestamp > existing)
alarmAsOf[key] = al.Timestamp;
}
var flushed = 0;
var dropped = 0;
foreach (var evt in _preSnapshotBuffer)
{
if (IsReflectedInSnapshot(evt, attrAsOf, alarmAsOf))
{
dropped++;
continue;
}
_onEvent(evt);
flushed++;
}
if (dropped > 0 || flushed > 0)
{
_log.Debug("Flushed {0} buffered debug event(s) for {1}, dropped {2} as already-in-snapshot",
flushed, _instanceUniqueName, dropped);
}
_preSnapshotBuffer.Clear();
}
/// <summary>
/// Returns <see langword="true"/> when a buffered event is already reflected in the
/// snapshot (same entity, buffered timestamp &lt;= snapshot timestamp) and must be
/// dropped; otherwise <see langword="false"/> (deliver).
/// </summary>
private static bool IsReflectedInSnapshot(
object evt,
IReadOnlyDictionary<string, DateTimeOffset> attrAsOf,
IReadOnlyDictionary<string, DateTimeOffset> alarmAsOf)
{
switch (evt)
{
case AttributeValueChanged a:
return attrAsOf.TryGetValue(AttributeKey(a), out var attrTs) && a.Timestamp <= attrTs;
case AlarmStateChanged al:
return alarmAsOf.TryGetValue(AlarmKey(al), out var alarmTs) && al.Timestamp <= alarmTs;
default:
// Unknown buffered type (should not happen — only attr/alarm are buffered):
// never treat as a duplicate.
return false;
}
}
/// <summary>
/// Delimiter used to join identity components into a single dedup key. A NUL
/// control character cannot appear in an instance/attribute/alarm name, so
/// distinct identities never collide on a shared boundary (unlike a space, which
/// may legitimately occur within a name). Declared as an escaped char so the
/// source carries no raw NUL byte.
/// </summary>
private const char KeyDelimiter = '\u0000';
/// <summary>
/// Per-entity dedup key for an attribute change. Each nullable component is guarded
/// with <c>?? string.Empty</c> so a null can never silently collide with another
/// key via <see cref="string.Concat"/> (e.g. two entries with null AttributePath
/// would otherwise share a key with any entry whose AttributePath is the empty string).
/// </summary>
private static string AttributeKey(AttributeValueChanged a) =>
string.Concat(
a.InstanceUniqueName ?? string.Empty, KeyDelimiter,
a.AttributePath ?? string.Empty, KeyDelimiter,
a.AttributeName ?? string.Empty);
/// <summary>
/// Per-entity dedup key for an alarm change. Includes <see cref="AlarmStateChanged.SourceReference"/>
/// so native per-condition alarms (which share an AlarmName but differ by source
/// reference) are not conflated; empty for computed alarms. Each nullable component is
/// guarded with <c>?? string.Empty</c> to prevent silent key collisions.
/// </summary>
private static string AlarmKey(AlarmStateChanged al) =>
string.Concat(
al.InstanceUniqueName ?? string.Empty, KeyDelimiter,
al.AlarmName ?? string.Empty, KeyDelimiter,
al.SourceReference ?? string.Empty);
/// <inheritdoc />
protected override void PreStart()
{
_log.Info("Starting debug stream bridge for {0} on site {1}", _instanceUniqueName, _siteIdentifier);
// M2.18 stream-first: open the gRPC live-event subscription BEFORE (and
// alongside) requesting the snapshot, so events occurring during the
// snapshot-build + network-transit window are captured (buffered) and not lost.
OpenGrpcStream();
// Send subscribe request via CentralCommunicationActor for the initial snapshot.
var request = new SubscribeDebugViewRequest(_instanceUniqueName, _correlationId);
var envelope = new SiteEnvelope(_siteIdentifier, request);
@@ -178,6 +178,11 @@ public class TemplateScriptConfiguration : IEntityTypeConfiguration<TemplateScri
builder.Property(s => s.ReturnDefinition)
.HasMaxLength(4000);
// M2.5 (#9): nullable per-script execution timeout (seconds). Null = use
// the site's global ScriptExecutionTimeoutSeconds default.
builder.Property(s => s.ExecutionTimeoutSeconds)
.IsRequired(false);
builder.HasIndex(s => new { s.TemplateId, s.Name }).IsUnique();
}
}
@@ -0,0 +1,28 @@
using Microsoft.EntityFrameworkCore.Migrations;
#nullable disable
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
{
/// <inheritdoc />
public partial class ResyncLdapGroupMappingSeed : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.InsertData(
table: "LdapGroupMappings",
columns: new[] { "Id", "LdapGroupName", "Role" },
values: new object[] { 5, "SCADA-Viewers", "Viewer" });
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DeleteData(
table: "LdapGroupMappings",
keyColumn: "Id",
keyValue: 5);
}
}
}
@@ -0,0 +1,28 @@
using Microsoft.EntityFrameworkCore.Migrations;
#nullable disable
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
{
/// <inheritdoc />
public partial class AddTemplateScriptExecutionTimeout : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.AddColumn<int>(
name: "ExecutionTimeoutSeconds",
table: "TemplateScripts",
type: "int",
nullable: true);
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DropColumn(
name: "ExecutionTimeoutSeconds",
table: "TemplateScripts");
}
}
}
@@ -925,6 +925,12 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
Id = 4,
LdapGroupName = "SCADA-Deploy-SiteA",
Role = "Deployer"
},
new
{
Id = 5,
LdapGroupName = "SCADA-Viewers",
Role = "Viewer"
});
});
@@ -1307,6 +1313,9 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
.IsRequired()
.HasColumnType("nvarchar(max)");
b.Property<int?>("ExecutionTimeoutSeconds")
.HasColumnType("int");
b.Property<bool>("IsInherited")
.HasColumnType("bit");
@@ -99,8 +99,14 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
// routed to subscribers (NativeAlarmActors) by source-object reference.
/// <summary>sourceReference → set of subscriber actor refs (NativeAlarmActors), for routing + ref-count.</summary>
private readonly Dictionary<string, HashSet<IActorRef>> _alarmSourceSubscribers = new();
- /// <summary>sourceReference → optional condition filter (first subscriber wins).</summary>
+ /// <summary>sourceReference → raw condition filter string passed to the adapter (first subscriber wins).</summary>
private readonly Dictionary<string, string?> _alarmSourceFilter = new();
/// <summary>
/// sourceReference → parsed condition-type predicate (M2.4 / #8). The authoritative
/// client-side gate in <see cref="HandleAlarmTransitionReceived"/>; applies uniformly
/// across OPC UA and the gateway-wide MxGateway feed.
/// </summary>
private readonly Dictionary<string, AlarmConditionFilter> _alarmSourceFilterPredicate = new();
/// <summary>sourceReference → adapter alarm subscription id.</summary>
private readonly Dictionary<string, string> _alarmSubscriptionIds = new();
/// <summary>sourceReferences whose adapter SubscribeAlarmsAsync is currently in flight.</summary>
@@ -1480,6 +1486,9 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
}
subs.Add(subscriber);
_alarmSourceFilter[request.SourceReference] = request.ConditionFilter;
// Parse the type-name filter once; this is the authoritative client-side
// gate consulted on every routed transition (M2.4 / #8).
_alarmSourceFilterPredicate[request.SourceReference] = AlarmConditionFilter.Parse(request.ConditionFilter);
// If the adapter feed for this source is already (being) established, the
// existing subscription serves the new subscriber too.
@@ -1546,6 +1555,14 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
if (!match)
continue;
// M2.4 (#8): authoritative client-side condition-type gate. Applied
// per matched source because two sources may share a prefix yet carry
// different filters. Empty filter = allow all (historical behaviour);
// framing sentinels (SnapshotComplete) are never dropped.
if (_alarmSourceFilterPredicate.TryGetValue(sourceRef, out var predicate) &&
!predicate.IsAllowed(transition))
continue;
foreach (var sub in subs)
{
if (notified.Add(sub))
@@ -1566,6 +1583,7 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
// No subscribers remain for this source; tear down the adapter feed.
_alarmSourceSubscribers.Remove(request.SourceReference);
_alarmSourceFilter.Remove(request.SourceReference);
_alarmSourceFilterPredicate.Remove(request.SourceReference);
if (_alarmSubscriptionIds.Remove(request.SourceReference, out var subId) &&
_adapter is IAlarmSubscribableConnection alarmable)
{
@@ -1,3 +1,5 @@
using System.Globalization;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ProtoConditionState = ZB.MOM.WW.MxGateway.Contracts.Proto.AlarmConditionState;
using ProtoTransitionKind = ZB.MOM.WW.MxGateway.Contracts.Proto.AlarmTransitionKind;
@@ -67,6 +69,19 @@ public static class MxGatewayAlarmMapper
Shelve: AlarmShelveState.Unshelved, Suppressed: false, Severity: NormalizeSeverity(severity));
}
/// <summary>
/// Converts an <see cref="MxValue"/> union to a display-only string using
/// <see cref="MxValueExtensions.ToClrValue"/> and invariant culture formatting,
/// so numeric values always use '.' as the decimal separator. Null or unset
/// values produce an empty string.
/// </summary>
internal static string MxValueToString(MxValue? mxVal)
{
if (mxVal is null) return "";
var clr = mxVal.ToClrValue();
return clr is null ? "" : Convert.ToString(clr, CultureInfo.InvariantCulture) ?? "";
}
/// <summary>Maps a live <see cref="OnAlarmTransitionEvent"/> to a transition.</summary>
/// <param name="body">The gateway alarm transition event proto message to map.</param>
/// <returns>The protocol-neutral <see cref="NativeAlarmTransition"/>.</returns>
@@ -83,8 +98,8 @@ public static class MxGatewayAlarmMapper
OperatorComment: body.OperatorComment,
OriginalRaiseTime: body.OriginalRaiseTimestamp?.ToDateTimeOffset(),
TransitionTime: body.TransitionTimestamp?.ToDateTimeOffset() ?? DateTimeOffset.UtcNow,
- CurrentValue: "",
- LimitValue: "");
+ CurrentValue: MxValueToString(body.CurrentValue),
+ LimitValue: MxValueToString(body.LimitValue));
/// <summary>The end-of-snapshot sentinel transition (no condition payload).</summary>
/// <returns>A <see cref="NativeAlarmTransition"/> with <c>AlarmTransitionKind.SnapshotComplete</c>.</returns>
@@ -109,6 +124,6 @@ public static class MxGatewayAlarmMapper
OperatorComment: snapshot.OperatorComment,
OriginalRaiseTime: snapshot.OriginalRaiseTimestamp?.ToDateTimeOffset(),
TransitionTime: snapshot.LastTransitionTimestamp?.ToDateTimeOffset() ?? DateTimeOffset.UtcNow,
- CurrentValue: "",
- LimitValue: "");
+ CurrentValue: MxValueToString(snapshot.CurrentValue),
+ LimitValue: MxValueToString(snapshot.LimitValue));
}
@@ -163,7 +163,11 @@ public class MxGatewayDataConnection : IDataConnection, IBrowsableDataConnection
_alarmCts = new CancellationTokenSource();
var token = _alarmCts.Token;
var client = _client!;
- // Gateway-wide feed (null prefix); the actor filters per source reference.
+ // Gateway-wide feed (null prefix). The MxGateway has no server-side
+ // condition filter, so conditionFilter is intentionally NOT forwarded
+ // here: the DataConnectionActor applies it as the authoritative
+ // client-side gate per source reference AND per condition type
+ // (M2.4 / #8, AlarmConditionFilter), uniform with the OPC UA path.
_ = Task.Run(() => client.RunAlarmStreamAsync(null, t => callback(t), token), token);
}
}
@@ -65,4 +65,40 @@ public static class OpcUaAlarmMapper
null or "Unshelved" => AlarmShelveState.Unshelved,
_ => AlarmShelveState.OneShotShelved
};
/// <summary>
/// Picks a representative display-only limit value from the four standard
/// <c>LimitAlarmType</c> set-point fields (HighHighLimit, HighLimit, LowLimit,
/// LowLowLimit) returned by the OPC UA event SelectClause.
///
/// <para>
/// The fields are absent (null raw value) on non-limit alarm types (discrete,
/// off-normal, etc.). When present, the first non-null value is returned in
/// priority order: HighHigh → High → Low → LowLow. The caller may use
/// <c>AlarmTypeName</c> or <c>ConditionName</c> to determine which specific
/// limit is active; this method intentionally returns the coarsest useful value
/// for the common single-limit case without requiring callers to understand the
/// OPC UA limit hierarchy.
/// </para>
/// </summary>
/// <param name="highHighRaw">Raw HighHighLimit field value (null when absent).</param>
/// <param name="highRaw">Raw HighLimit field value (null when absent).</param>
/// <param name="lowRaw">Raw LowLimit field value (null when absent).</param>
/// <param name="lowLowRaw">Raw LowLowLimit field value (null when absent).</param>
/// <returns>
/// A formatted string representation of the first non-null limit value, or an
/// empty string when all four fields are absent (non-limit alarm type).
/// </returns>
public static string PickLimitValue(object? highHighRaw, object? highRaw, object? lowRaw, object? lowLowRaw)
{
// Standard OPC UA LimitAlarmType limit values are numeric (Double/Float/Int).
// Convert with InvariantCulture so the decimal separator is always '.' regardless
// of the server's locale.
foreach (var raw in new[] { highHighRaw, highRaw, lowRaw, lowLowRaw })
{
if (raw is not null)
return Convert.ToString(raw, System.Globalization.CultureInfo.InvariantCulture) ?? "";
}
return "";
}
}
@@ -258,7 +258,9 @@ public class RealOpcUaClient : IOpcUaClient
MonitoringMode = MonitoringMode.Reporting,
SamplingInterval = 0,
QueueSize = 1000,
- Filter = BuildAlarmEventFilter()
+ // Server-side WhereClause is a bandwidth optimisation only; the
+ // authoritative condition-type gate lives in DataConnectionActor (M2.4 / #8).
+ Filter = BuildAlarmEventFilter(AlarmConditionFilter.Parse(conditionFilter))
};
item.Notification += (_, e) =>
@@ -289,10 +291,94 @@ public class RealOpcUaClient : IOpcUaClient
}
/// <summary>
- /// Builds the event filter selecting the base event fields plus the
- /// AlarmConditionType / AcknowledgeableConditionType state sub-variables we mirror.
+ /// Maps the standard OPC UA Alarms &amp; Conditions type names (case-insensitive)
+ /// to their well-known <see cref="ObjectTypeIds"/> NodeIds, for building the
+ /// optional server-side WhereClause (M2.4 / #8). Only standard types appear
+ /// here; vendor/custom type names cannot be mapped without browsing the server
+ /// type tree, so they are handled by the client-side gate alone.
+ /// <para>
+ /// Single source of truth for both directions: <see cref="ConditionTypeNamesById"/>
+ /// is derived from this map, so the friendly-name and NodeId sides cannot drift.
+ /// </para>
/// </summary>
- private static EventFilter BuildAlarmEventFilter()
+ internal static readonly IReadOnlyDictionary<string, NodeId> KnownConditionTypeIds =
new Dictionary<string, NodeId>(StringComparer.OrdinalIgnoreCase)
{
["ConditionType"] = ObjectTypeIds.ConditionType,
["AcknowledgeableConditionType"] = ObjectTypeIds.AcknowledgeableConditionType,
["AlarmConditionType"] = ObjectTypeIds.AlarmConditionType,
["LimitAlarmType"] = ObjectTypeIds.LimitAlarmType,
["ExclusiveLimitAlarmType"] = ObjectTypeIds.ExclusiveLimitAlarmType,
["NonExclusiveLimitAlarmType"] = ObjectTypeIds.NonExclusiveLimitAlarmType,
["ExclusiveLevelAlarmType"] = ObjectTypeIds.ExclusiveLevelAlarmType,
["NonExclusiveLevelAlarmType"] = ObjectTypeIds.NonExclusiveLevelAlarmType,
["ExclusiveDeviationAlarmType"] = ObjectTypeIds.ExclusiveDeviationAlarmType,
["NonExclusiveDeviationAlarmType"] = ObjectTypeIds.NonExclusiveDeviationAlarmType,
["ExclusiveRateOfChangeAlarmType"] = ObjectTypeIds.ExclusiveRateOfChangeAlarmType,
["NonExclusiveRateOfChangeAlarmType"] = ObjectTypeIds.NonExclusiveRateOfChangeAlarmType,
["DiscreteAlarmType"] = ObjectTypeIds.DiscreteAlarmType,
["OffNormalAlarmType"] = ObjectTypeIds.OffNormalAlarmType,
["SystemOffNormalAlarmType"] = ObjectTypeIds.SystemOffNormalAlarmType,
["TripAlarmType"] = ObjectTypeIds.TripAlarmType,
["DiscrepancyAlarmType"] = ObjectTypeIds.DiscrepancyAlarmType,
["InstrumentDiagnosticAlarmType"] = ObjectTypeIds.InstrumentDiagnosticAlarmType,
["SystemDiagnosticAlarmType"] = ObjectTypeIds.SystemDiagnosticAlarmType,
["CertificateExpirationAlarmType"] = ObjectTypeIds.CertificateExpirationAlarmType,
};
/// <summary>
/// Inverse of <see cref="KnownConditionTypeIds"/> (NodeId → friendly name), derived
/// from it so the two cannot drift (M2.4 / #8). Used by <see cref="ResolveAlarmTypeName"/>
/// to translate the event-type NodeId an OPC UA server sends back into the friendly
/// type name the conditionFilter gate and server-side WhereClause both key off.
/// </summary>
private static readonly IReadOnlyDictionary<NodeId, string> ConditionTypeNamesById =
KnownConditionTypeIds.ToDictionary(kv => kv.Value, kv => kv.Key);
/// <summary>
/// Resolves an event-type <see cref="NodeId"/> to the friendly condition-type name the
/// <c>conditionFilter</c> gate (and the server-side WhereClause) use (M2.4 / #8).
///
/// <para>
/// Standard A&amp;C types are returned as their friendly name (e.g. <c>i=9341</c> →
/// <c>"ExclusiveLevelAlarmType"</c>) so the client-side gate — which compares against
/// the friendly names in <see cref="KnownConditionTypeIds"/> — actually matches the
/// events the server delivers. Vendor/custom subtypes that are not in the map fall back
/// to the NodeId string; that is consistent because the WhereClause is likewise omitted
/// for unmapped names, so such a filter can only be expressed (and matched) as the NodeId
/// string. A <c>null</c> event type yields the empty string.
/// </para>
/// </summary>
/// <param name="eventType">The event-type NodeId from the A&amp;C notification, or <c>null</c>.</param>
/// <returns>The friendly type name when known; otherwise the NodeId string (or "" when null).</returns>
internal static string ResolveAlarmTypeName(NodeId? eventType)
{
if (eventType is null)
return "";
return ConditionTypeNamesById.TryGetValue(eventType, out var friendly)
? friendly
: eventType.ToString();
}
/// <summary>
/// Builds the event filter selecting the base event fields plus the
/// AlarmConditionType / AcknowledgeableConditionType state sub-variables we mirror,
/// and — when <paramref name="conditionFilter"/> is non-empty and every requested
/// type maps to a standard A&amp;C type — a server-side <see cref="ContentFilter"/>
/// WhereClause (OfType, OR'd) as a bandwidth optimisation (M2.4 / #8).
///
/// <para>
/// Conservative by design: if <em>any</em> requested type name cannot be mapped to
/// a standard <see cref="ObjectTypeIds"/> NodeId, the WhereClause is omitted entirely
/// rather than partially applied — a partial server-side filter would silently drop
/// the unmapped types' events, and the server cannot send what it filtered out. The
/// client-side gate in DataConnectionActor enforces the full filter regardless, so
/// omitting the WhereClause only forgoes the bandwidth saving, never correctness.
/// </para>
/// </summary>
/// <param name="conditionFilter">The parsed condition-type filter (allow-all when empty).</param>
/// <returns>The configured <see cref="EventFilter"/>.</returns>
internal static EventFilter BuildAlarmEventFilter(AlarmConditionFilter conditionFilter)
{
var filter = new EventFilter();
foreach (var name in AlarmStateFields)
@@ -306,9 +392,81 @@ public class RealOpcUaClient : IOpcUaClient
filter.SelectClauses.Add(SelectField(ObjectTypeIds.AlarmConditionType, "ShelvingState", "CurrentState")); // 10
filter.SelectClauses.Add(SelectField(ObjectTypeIds.ConditionType, "ConditionName")); // 11
filter.SelectClauses.Add(SelectField(ObjectTypeIds.ConditionType, "Comment")); // 12
// APPENDED fields (indices 13+): optional — only present on specific derived types.
// Guard all reads with fields.Count > N so base-ConditionType events still process.
// 13: AlarmConditionType/ActiveState/TransitionTime — the UTC instant the active-state
// last flipped to TRUE. Mapped to OriginalRaiseTime; absent on non-AlarmCondition
// events (ConditionType base events rarely carry it). CAVEAT: during a
// ConditionRefresh replay the server MAY re-stamp this to the current/restart time
// rather than the historical raise instant (OPC UA Part 9 §5.5.2 makes it advisory),
// so a snapshot-derived OriginalRaiseTime can look like the refresh time — it is
// display-only and not treated as authoritative.
filter.SelectClauses.Add(SelectField(ObjectTypeIds.AlarmConditionType, "ActiveState", "TransitionTime")); // 13
// 14-17: LimitAlarmType limit thresholds, i.e. configuration-time set-points exposed as
// event fields by LimitAlarmType and all its subtypes (Exclusive/NonExclusive
// Level/Deviation/RateOfChange). Absent on non-limit alarm types (e.g. discrete,
// off-normal) — guarded by fields.Count > N below.
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "HighHighLimit")); // 14
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "HighLimit")); // 15
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "LowLimit")); // 16
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "LowLowLimit")); // 17
// UNAVAILABLE via standard OPC UA A&C event fields (documented here so future
// maintainers know these were considered, not overlooked):
// Category — not a standard event field; server-specific extensions only.
// Description — NativeAlarmTransition.Description is a static template description;
// OPC UA events carry dynamic Message text (index 4, mapped) but no
// static template description in the notification, so this stays empty.
// OperatorUser — not available on the standard ConditionRefresh replay stream;
// present on Acknowledge/Confirm method call results, but those do
// not flow through the monitored-item subscription.
// CurrentValue — the live process variable value is NOT a standard A&C event field;
// it would require a separate data subscription on the source node.
ApplyServerSideTypeWhereClause(filter, conditionFilter);
return filter;
}
/// <summary>
/// Attaches an OfType(-OR'd) WhereClause to <paramref name="filter"/> when every
/// requested condition type maps to a standard A&amp;C type NodeId; otherwise leaves
/// the WhereClause empty (see <see cref="BuildAlarmEventFilter"/> rationale).
/// </summary>
private static void ApplyServerSideTypeWhereClause(EventFilter filter, AlarmConditionFilter conditionFilter)
{
if (conditionFilter.IsEmpty)
return;
var typeIds = new List<NodeId>();
foreach (var name in conditionFilter.Names)
{
if (!KnownConditionTypeIds.TryGetValue(name, out var id))
return; // unmapped type → omit the WhereClause entirely (client gate covers it)
typeIds.Add(id);
}
if (typeIds.Count == 0)
return;
var where = filter.WhereClause;
if (typeIds.Count == 1)
{
where.Push(FilterOperator.OfType, typeIds[0]);
return;
}
// OR together each OfType element so an event of ANY listed type passes.
var element = where.Push(FilterOperator.OfType, typeIds[0]);
for (var i = 1; i < typeIds.Count; i++)
{
var next = where.Push(FilterOperator.OfType, typeIds[i]);
element = where.Push(FilterOperator.Or, element, next);
}
}
private static SimpleAttributeOperand SelectField(NodeId typeDefinitionId, params string[] browse)
{
var path = new QualifiedNameCollection();
@@ -359,7 +517,12 @@ public class RealOpcUaClient : IOpcUaClient
return;
}
- var sourceName = fields[1].Value is NodeId ? (fields[2].Value as string ?? "") : (fields[2].Value as string ?? "");
+ // Field layout (AlarmStateFields): [1]=SourceNode (NodeId), [2]=SourceName (string).
+ // Prefer the human-readable SourceName; fall back to the SourceNode NodeId string
+ // only when SourceName is absent/empty, so the condition still has a stable key.
+ var sourceName = fields[2].Value as string;
+ if (string.IsNullOrEmpty(sourceName))
+ sourceName = (fields[1].Value as NodeId)?.ToString() ?? "";
var conditionName = fields.Count > 11 ? fields[11].Value as string : null;
var sourceObjectRef = sourceName;
var sourceRef = string.IsNullOrEmpty(conditionName) ? sourceName : $"{sourceName}.{conditionName}";
@@ -377,6 +540,25 @@ public class RealOpcUaClient : IOpcUaClient
var shelve = OpcUaAlarmMapper.MapShelve(fields.Count > 10 ? (fields[10].Value as LocalizedText)?.Text : null);
var comment = fields.Count > 12 ? (fields[12].Value as LocalizedText)?.Text ?? "" : "";
// Index 13: ActiveState/TransitionTime → OriginalRaiseTime (when active-state last
// transitioned to TRUE). Absent on non-AlarmCondition events → guard + null fallback.
DateTimeOffset? originalRaiseTime = null;
if (fields.Count > 13 && fields[13].Value is DateTime activeTransitionTime)
// OPC UA mandates UTC for DateTime fields; a TimeSpan.Zero offset treats an
// Unspecified Kind as UTC (consistent with the Time→TransitionTime mapping above).
originalRaiseTime = new DateTimeOffset(activeTransitionTime, TimeSpan.Zero);
// Indices 14-17: LimitAlarmType set-point thresholds (HighHighLimit/HighLimit/
// LowLimit/LowLowLimit). Absent on non-limit alarm types → null when missing.
// Pick the first non-null value in priority order (HiHi > Hi > Lo > LoLo) as a
// display-only representative limit; the caller is responsible for interpreting
// which limit is active using AlarmTypeName or ConditionName.
var limitValue = OpcUaAlarmMapper.PickLimitValue(
fields.Count > 14 ? fields[14].Value : null,
fields.Count > 15 ? fields[15].Value : null,
fields.Count > 16 ? fields[16].Value : null,
fields.Count > 17 ? fields[17].Value : null);
var inRefresh = _alarmInRefresh.GetValueOrDefault(handle); var inRefresh = _alarmInRefresh.GetValueOrDefault(handle);
var lastState = _alarmLastState.GetValueOrDefault(handle); var lastState = _alarmLastState.GetValueOrDefault(handle);
var (prevActive, prevAcked) = lastState != null && lastState.TryGetValue(sourceRef, out var prev) ? prev : (false, true); var (prevActive, prevAcked) = lastState != null && lastState.TryGetValue(sourceRef, out var prev) ? prev : (false, true);
@@ -389,18 +571,23 @@ public class RealOpcUaClient : IOpcUaClient
onTransition(new NativeAlarmTransition( onTransition(new NativeAlarmTransition(
SourceReference: sourceRef, SourceReference: sourceRef,
SourceObjectReference: sourceObjectRef, SourceObjectReference: sourceObjectRef,
AlarmTypeName: eventType?.ToString() ?? "", // Resolve the event-type NodeId (e.g. "i=9341") to the friendly type name
// the conditionFilter gate keys off (M2.4 / #8); NodeId-string for custom types.
AlarmTypeName: ResolveAlarmTypeName(eventType),
Kind: kind, Kind: kind,
Condition: OpcUaAlarmMapper.BuildCondition(active, acked, confirmed, shelve, suppressed, severity), Condition: OpcUaAlarmMapper.BuildCondition(active, acked, confirmed, shelve, suppressed, severity),
// UNAVAILABLE via standard OPC UA A&C event fields — see BuildAlarmEventFilter comments.
Category: "", Category: "",
Description: "", Description: "",
Message: message, Message: message,
// UNAVAILABLE: OperatorUser not on refresh stream — see BuildAlarmEventFilter comments.
OperatorUser: "", OperatorUser: "",
OperatorComment: comment, OperatorComment: comment,
OriginalRaiseTime: null, OriginalRaiseTime: originalRaiseTime,
TransitionTime: time, TransitionTime: time,
// UNAVAILABLE: CurrentValue not a standard A&C event field — see BuildAlarmEventFilter.
CurrentValue: "", CurrentValue: "",
LimitValue: "")); LimitValue: limitValue));
} }
private static NativeAlarmTransition SnapshotComplete() => new( private static NativeAlarmTransition SnapshotComplete() => new(
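The OriginalRaiseTime mapping in the hunk above depends on how the `DateTimeOffset(DateTime, TimeSpan)` constructor treats `DateTime.Kind`. The standalone sketch below uses invented sample timestamps (not values from any real server) and suggests why a zero offset is accepted for both `Unspecified` and `Utc` kinds. A `Local`-kind value would throw unless the machine's local offset is also zero, so the sketch leaves that case out.

```csharp
using System;

// Hypothetical sample timestamps; a real event field would come from the server.
var unspecified = new DateTime(2026, 6, 16, 8, 0, 0, DateTimeKind.Unspecified);
var utc = new DateTime(2026, 6, 16, 8, 0, 0, DateTimeKind.Utc);

// Both kinds accept a zero offset and describe the same instant.
var a = new DateTimeOffset(unspecified, TimeSpan.Zero);
var b = new DateTimeOffset(utc, TimeSpan.Zero);

Console.WriteLine(a.Offset == TimeSpan.Zero); // True
Console.WriteLine(a == b);                    // True
Console.WriteLine(a.UtcDateTime.Kind);        // Utc
```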
@@ -0,0 +1,78 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.Alarms;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
/// <summary>
/// Parsed native-alarm condition filter (M2.4 / #8).
///
/// <para>
/// A source's <c>conditionFilter</c> is a comma-separated, case-insensitive list
/// of alarm/condition <em>type names</em>, matched against
/// <see cref="NativeAlarmTransition.AlarmTypeName"/>. A <c>null</c>, blank, or
/// all-empty list means "mirror every condition" (the historical behaviour),
/// represented here by <see cref="IsEmpty"/>.
/// </para>
///
/// <para>
/// This is the authoritative <em>client-side</em> gate consulted in the
/// <c>DataConnectionActor</c> routing path, so it applies uniformly across OPC UA
/// (whose server-side <c>WhereClause</c> is only a bandwidth optimisation) and the
/// MxGateway (whose single gateway-wide feed has no server-side filter at all).
/// Parse once at subscribe time; <see cref="IsAllowed"/> is the hot-path check.
/// </para>
/// </summary>
public sealed class AlarmConditionFilter
{
/// <summary>The shared allow-all instance (empty filter set).</summary>
public static readonly AlarmConditionFilter AllowAll = new(new HashSet<string>(StringComparer.OrdinalIgnoreCase));
private readonly HashSet<string> _names;
private AlarmConditionFilter(HashSet<string> names) => _names = names;
/// <summary><c>true</c> when no type names are configured — every condition is allowed.</summary>
public bool IsEmpty => _names.Count == 0;
/// <summary>The normalized (trimmed) type names, for the OPC UA server-side WhereClause optimisation.</summary>
public IReadOnlyCollection<string> Names => _names;
/// <summary>
/// Parses a raw <c>conditionFilter</c> string into a normalized, case-insensitive
/// type-name set. <c>null</c>/blank/all-empty input yields an empty (allow-all) filter.
/// </summary>
/// <param name="conditionFilter">The raw comma-separated filter string, or <c>null</c>.</param>
/// <returns>A parsed <see cref="AlarmConditionFilter"/>; never <c>null</c>.</returns>
public static AlarmConditionFilter Parse(string? conditionFilter)
{
if (string.IsNullOrWhiteSpace(conditionFilter))
return AllowAll;
var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (var raw in conditionFilter.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
names.Add(raw);
return names.Count == 0 ? AllowAll : new AlarmConditionFilter(names);
}
/// <summary>
/// Returns <c>true</c> when <paramref name="transition"/> should be delivered:
/// the filter is empty (allow all), the transition is a framing sentinel
/// (<see cref="AlarmTransitionKind.SnapshotComplete"/>, which carries no condition
/// type and must never be swallowed or the snapshot swap never completes), or its
/// <see cref="NativeAlarmTransition.AlarmTypeName"/> is in the configured set.
/// </summary>
/// <param name="transition">The protocol-neutral transition to test.</param>
/// <returns><c>true</c> to deliver the transition; <c>false</c> to drop it.</returns>
public bool IsAllowed(NativeAlarmTransition transition)
{
if (_names.Count == 0)
return true;
// SnapshotComplete is pure framing (no condition payload) — never filter it.
if (transition.Kind == AlarmTransitionKind.SnapshotComplete)
return true;
return _names.Contains(transition.AlarmTypeName);
}
}
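The parse and gate behaviour of `AlarmConditionFilter` can be approximated in a few lines. This is a sketch under stated assumptions, not the class itself: `ParseNames` and `IsAllowed` are hypothetical stand-ins, the `isFraming` flag models the `SnapshotComplete` sentinel, and the type names in the sample filter string are illustrative.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AlarmConditionFilter.Parse: split, trim, drop blanks,
// and collect into a case-insensitive set. An empty set means "allow all".
static HashSet<string> ParseNames(string? conditionFilter)
{
    var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    if (string.IsNullOrWhiteSpace(conditionFilter))
        return names;
    foreach (var raw in conditionFilter.Split(
                 ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        names.Add(raw);
    return names;
}

// Hypothetical stand-in for IsAllowed; isFraming models the SnapshotComplete sentinel.
static bool IsAllowed(HashSet<string> names, string alarmTypeName, bool isFraming) =>
    names.Count == 0 || isFraming || names.Contains(alarmTypeName);

var names = ParseNames(" LimitAlarmType , , offnormalalarmtype,");
Console.WriteLine(names.Count);                                   // 2
Console.WriteLine(IsAllowed(names, "OFFNORMALALARMTYPE", false)); // True
Console.WriteLine(IsAllowed(names, "DiscreteAlarmType", false));  // False
Console.WriteLine(IsAllowed(names, "", true));                    // True
Console.WriteLine(ParseNames("  ,  ").Count);                     // 0
```

Combining `RemoveEmptyEntries` with `TrimEntries` also drops whitespace-only entries, which is why a filter such as `"  ,  "` collapses to the allow-all case.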
@@ -19,6 +19,13 @@
<PackageReference Include="ZB.MOM.WW.MxGateway.Client" /> <PackageReference Include="ZB.MOM.WW.MxGateway.Client" />
</ItemGroup> </ItemGroup>
<ItemGroup>
<!-- Exposes internal alarm-filter shaping (RealOpcUaClient.BuildAlarmEventFilter)
to the test assembly so the server-side WhereClause can be unit-tested
without a live OPC UA server (M2.4 / #8). -->
<InternalsVisibleTo Include="ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests" />
</ItemGroup>
<ItemGroup> <ItemGroup>
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.Commons/ZB.MOM.WW.ScadaBridge.Commons.csproj" /> <ProjectReference Include="../ZB.MOM.WW.ScadaBridge.Commons/ZB.MOM.WW.ScadaBridge.Commons.csproj" />
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.HealthMonitoring/ZB.MOM.WW.ScadaBridge.HealthMonitoring.csproj" /> <ProjectReference Include="../ZB.MOM.WW.ScadaBridge.HealthMonitoring/ZB.MOM.WW.ScadaBridge.HealthMonitoring.csproj" />
@@ -1,4 +1,5 @@
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites; using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories; using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types; using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening; using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
@@ -111,8 +112,41 @@ public class FlatteningPipeline : IFlatteningPipeline
ReturnDefinition = s.ReturnDefinition ReturnDefinition = s.ReturnDefinition
}).ToList(); }).ToList();
// Validate // Compute the alarm-capable connection-name set so the semantic validator
var validation = _validationService.Validate(config, resolvedSharedScripts); // can gate native-alarm-source bindings. "Alarm-capable" matches the DCL
// runtime decision (DataConnectionActor: _adapter is IAlarmSubscribableConnection);
// here we filter connections by alarm-capable protocol, then collect their names.
//
// StringComparer.Ordinal is intentional: connection names are stored and
// matched as authored throughout the pipeline (all other name-keyed
// dictionaries in FlatteningService and SemanticValidator use the same
// case-sensitive semantics). OrdinalIgnoreCase would be inconsistent with
// the rest of the binding-resolution path.
var alarmCapableConnectionNames = dataConnections.Values
.Where(c => AlarmCapableProtocols.IsAlarmCapable(c.Protocol))
.Select(c => c.Name)
.ToHashSet(StringComparer.Ordinal);
// M2.8 (#23): the set of data-connection names that actually exist on the
// target site, used to verify each bound connection resolves to a real site
// connection. Same StringComparer.Ordinal as the rest of the binding-resolution
// path (connection names are matched as-authored throughout the pipeline).
var siteConnectionNames = dataConnections.Values
.Select(c => c.Name)
.ToHashSet(StringComparer.Ordinal);
// Validate. This is the deploy-gating path, so connection-binding completeness
// is enforced as an Error (enforceConnectionBindings: true): a data-sourced
// attribute with no binding — or one bound to a connection that no longer exists
// on the site — blocks the deployment. (The template DESIGN-TIME validate path in
// ManagementActor leaves this non-blocking by NOT enforcing, since bindings are
// set later at instance/deploy time.)
var validation = _validationService.Validate(
config,
resolvedSharedScripts,
alarmCapableConnectionNames,
enforceConnectionBindings: true,
siteConnectionNames: siteConnectionNames);
// Compute revision hash // Compute revision hash
var hash = _revisionHashService.ComputeHash(config); var hash = _revisionHashService.ComputeHash(config);
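The case-sensitivity choice discussed in this hunk can be seen in isolation. The sketch below is illustrative only: the connection names, the protocol strings, and the `IsAlarmCapable` rule are assumptions made for the example and are not taken from `AlarmCapableProtocols`.

```csharp
using System;
using System.Linq;

// Hypothetical connection list; names and protocols are invented for illustration.
var connections = new (string Name, string Protocol)[]
{
    ("Line1-OpcUa", "OpcUa"),
    ("Line1-Sql", "Sql"),
    ("Plant-Mx", "MxGateway"),
};

// Assumed rule for this sketch only: OPC UA and MxGateway count as alarm-capable.
static bool IsAlarmCapable(string protocol) => protocol is "OpcUa" or "MxGateway";

var alarmCapable = connections
    .Where(c => IsAlarmCapable(c.Protocol))
    .Select(c => c.Name)
    .ToHashSet(StringComparer.Ordinal);

Console.WriteLine(alarmCapable.Contains("Line1-OpcUa")); // True
Console.WriteLine(alarmCapable.Contains("line1-opcua")); // False
Console.WriteLine(alarmCapable.Contains("Line1-Sql"));   // False
```

With `StringComparer.Ordinal`, a binding authored as `line1-opcua` would not resolve to `Line1-OpcUa`, which is consistent with the as-authored matching the comments describe.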
@@ -37,6 +37,14 @@ public static class StateTransitionValidator
/// <summary>Returns true when a delete operation is allowed from the given state.</summary> /// <summary>Returns true when a delete operation is allowed from the given state.</summary>
/// <param name="currentState">The current instance state.</param> /// <param name="currentState">The current instance state.</param>
/// <returns><see langword="true"/> if delete is permitted; otherwise <see langword="false"/>.</returns> /// <returns><see langword="true"/> if delete is permitted; otherwise <see langword="false"/>.</returns>
/// <remarks>
/// Delete is allowed from <see cref="InstanceState.NotDeployed"/> by design: an
/// undeployed instance would otherwise linger as an unremovable orphan record.
/// Delete from <c>NotDeployed</c> is a central-side record cleanup (no live site
/// config to tear down). This matches the state-transition matrix in
/// Component-DeploymentManager.md ("Delete from Not deployed = Yes") — reconciled
/// in M2.17 (#31); the deliberate behaviour was introduced in commit 1d5465f3.
/// </remarks>
public static bool CanDelete(InstanceState currentState) => public static bool CanDelete(InstanceState currentState) =>
currentState is InstanceState.NotDeployed or InstanceState.Enabled or InstanceState.Disabled; currentState is InstanceState.NotDeployed or InstanceState.Enabled or InstanceState.Disabled;
@@ -75,7 +75,7 @@ public class DatabaseGateway : IDatabaseGateway
new SqlConnection(connectionString); new SqlConnection(connectionString);
/// <inheritdoc /> /// <inheritdoc />
public async Task CachedWriteAsync( public async Task<ExternalCallResult> CachedWriteAsync(
string connectionName, string connectionName,
string sql, string sql,
IReadOnlyDictionary<string, object?>? parameters = null, IReadOnlyDictionary<string, object?>? parameters = null,
@@ -97,6 +97,44 @@ public class DatabaseGateway : IDatabaseGateway
throw new InvalidOperationException("Store-and-forward service not available for cached writes"); throw new InvalidOperationException("Store-and-forward service not available for cached writes");
} }
// M2.3 (#7): attempt the write IMMEDIATELY and classify the outcome,
// mirroring ExternalSystemClient.CachedCallAsync. The pre-M2.3 behaviour
// enqueued every write unconditionally and the S&F retry sweep then
// retried ALL failures forever — a permanent SQL error (constraint,
// syntax, permission) was never returned to the script and spun in the
// buffer indefinitely. Now:
// * success -> Delivered, NOT buffered;
// * PermanentDatabaseException -> Failed synchronously, NOT buffered;
// * TransientDatabaseException -> buffered to S&F for retry.
try
{
await ExecuteWriteAsync(
connectionName, definition.ConnectionString, sql, parameters ?? EmptyParameters, cancellationToken)
.ConfigureAwait(false);
// Immediate success — the write is done; do not buffer.
return new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null, WasBuffered: false);
}
catch (PermanentDatabaseException ex)
{
// Permanent failures are returned to the script and never buffered —
// mirrors the PermanentExternalSystemException branch on the API path.
_logger.LogWarning(
ex,
"CachedWrite to '{Connection}' failed permanently (SQL error {Number}); returning Failed without buffering.",
connectionName, ex.SqlErrorNumber);
return new ExternalCallResult(
Success: false, ResponseJson: null, ErrorMessage: $"Permanent database error: {ex.Message}", WasBuffered: false);
}
catch (TransientDatabaseException ex)
{
// Transient failure — hand to S&F so the retry sweep delivers it.
_logger.LogDebug(
ex,
"CachedWrite to '{Connection}' failed transiently (SQL error {Number}); buffering for retry.",
connectionName, ex.SqlErrorNumber);
}
var payload = JsonSerializer.Serialize(new var payload = JsonSerializer.Serialize(new
{ {
ConnectionName = connectionName, ConnectionName = connectionName,
@@ -119,6 +157,12 @@ public class DatabaseGateway : IDatabaseGateway
originInstanceName, originInstanceName,
definition.MaxRetries > 0 ? definition.MaxRetries : null, definition.MaxRetries > 0 ? definition.MaxRetries : null,
definition.RetryDelay > TimeSpan.Zero ? definition.RetryDelay : null, definition.RetryDelay > TimeSpan.Zero ? definition.RetryDelay : null,
// M2.3 (#7): attemptImmediateDelivery: false — this method already
// made the write attempt above (the transient-classified failure is
// exactly why we are buffering). Letting EnqueueAsync re-invoke the
// delivery handler would execute the same write a second time —
// mirrors ExternalSystemClient.CachedCallAsync.
attemptImmediateDelivery: false,
// Audit Log #23 (M3): pin the S&F message id to the // Audit Log #23 (M3): pin the S&F message id to the
// TrackedOperationId so the retry loop (Bundle E Tasks E4/E5) can // TrackedOperationId so the retry loop (Bundle E Tasks E4/E5) can
// read it back via StoreAndForwardMessage.Id and emit per-attempt + // read it back via StoreAndForwardMessage.Id and emit per-attempt +
@@ -136,17 +180,29 @@ public class DatabaseGateway : IDatabaseGateway
// retry-loop cached-write audit rows correlate back to the // retry-loop cached-write audit rows correlate back to the
// cross-execution chain. Null for a non-routed run. // cross-execution chain. Null for a non-routed run.
parentExecutionId: parentExecutionId); parentExecutionId: parentExecutionId);
// Buffered for retry — mirrors the API path's WasBuffered=true result.
return new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null, WasBuffered: true);
} }
/// <summary> /// <summary>
/// WP-9/10: Delivers a buffered CachedDbWrite during a store-and-forward retry /// WP-9/10: Delivers a buffered CachedDbWrite during a store-and-forward retry
/// sweep — executes the SQL against the named connection. Returns true on /// sweep — executes the SQL against the named connection.
/// success, false if the connection no longer exists (the message is parked);
/// throws on any execution error so the engine retries.
/// </summary> /// </summary>
/// <remarks>
/// M2.3 (#7): the outcome is classified, mirroring
/// <see cref="ExternalSystemClient.DeliverBufferedAsync"/>. Returns
/// <c>false</c> — so the S&amp;F engine PARKS the message — when the
/// connection no longer exists, the payload is unreadable, or the SQL fails
/// with a PERMANENT error (constraint / syntax / permission). A TRANSIENT SQL
/// error (<see cref="TransientDatabaseException"/>) propagates so the engine
/// retries. The pre-M2.3 code rethrew on ANY SQL error, so a permanent
/// failure on the retry path looped forever.
/// </remarks>
/// <param name="message">The buffered store-and-forward message to deliver.</param> /// <param name="message">The buffered store-and-forward message to deliver.</param>
/// <param name="cancellationToken">Cancellation token for the delivery operation.</param> /// <param name="cancellationToken">Cancellation token for the delivery operation.</param>
/// <returns>A task that resolves to <c>true</c> on success, or <c>false</c> if the connection no longer exists.</returns> /// <returns>A task that resolves to <c>true</c> on success, or <c>false</c> when the message must be parked.</returns>
/// <exception cref="TransientDatabaseException">Thrown on a transient SQL failure so the engine retries.</exception>
public async Task<bool> DeliverBufferedAsync( public async Task<bool> DeliverBufferedAsync(
StoreAndForwardMessage message, CancellationToken cancellationToken = default) StoreAndForwardMessage message, CancellationToken cancellationToken = default)
{ {
@@ -185,22 +241,152 @@ public class DatabaseGateway : IDatabaseGateway
return false; return false;
} }
await using var connection = new SqlConnection(definition.ConnectionString); // Materialise the buffered JsonElement parameters into CLR values once,
await connection.OpenAsync(cancellationToken); // then run through the shared ExecuteWriteAsync seam so both the
using var command = connection.CreateCommand(); // immediate-attempt path and this retry path classify SqlException the
command.CommandText = payload.Sql; // same way.
if (payload.Parameters != null) IReadOnlyDictionary<string, object?> materialisedParameters =
payload.Parameters == null
? EmptyParameters
: payload.Parameters.ToDictionary(
kv => kv.Key, kv => (object?)JsonElementToParameterValue(kv.Value));
try
{ {
foreach (var (key, value) in payload.Parameters) await ExecuteWriteAsync(
{ payload.ConnectionName, definition.ConnectionString, payload.Sql, materialisedParameters, cancellationToken)
var parameter = command.CreateParameter(); .ConfigureAwait(false);
parameter.ParameterName = key.StartsWith('@') ? key : "@" + key; return true;
parameter.Value = JsonElementToParameterValue(value);
command.Parameters.Add(parameter);
}
} }
await command.ExecuteNonQueryAsync(cancellationToken); catch (PermanentDatabaseException ex)
return true; {
// Permanent — parking is correct; retrying the identical statement
// cannot succeed. Mirrors ExternalSystemClient.DeliverBufferedAsync
// returning false on PermanentExternalSystemException.
_logger.LogError(
ex,
"Buffered DB write to '{Connection}' failed permanently (SQL error {Number}); parking.",
payload.ConnectionName, ex.SqlErrorNumber);
return false;
}
// TransientDatabaseException propagates — the S&F engine retries.
}
/// <summary>
/// Reusable empty parameter map so the no-parameter paths do not allocate a
/// fresh dictionary each call.
/// </summary>
private static readonly IReadOnlyDictionary<string, object?> EmptyParameters =
new Dictionary<string, object?>();
/// <summary>
/// M2.3 (#7): executes a parameterised SQL write against the given connection
/// string and classifies the outcome into
/// <see cref="TransientDatabaseException"/> / <see cref="PermanentDatabaseException"/>,
/// mirroring the ordered catches of
/// <see cref="ExternalSystemClient.InvokeHttpAsync"/> on the API path:
/// caller-requested cancellation propagates unchanged; a <see cref="SqlException"/>
/// is classified by error number via <see cref="SqlErrorClassifier"/>; a
/// non-<see cref="SqlException"/> transport/connection outage is classified
/// transient via <see cref="SqlErrorClassifier.IsTransient(System.Exception)"/>;
/// genuinely-unexpected exceptions propagate. This is the single classification
/// seam shared by the immediate <see cref="CachedWriteAsync"/> attempt and the
/// <see cref="DeliverBufferedAsync"/> retry path. Marked <c>internal virtual</c>
/// so tests can substitute already-classified outcomes; the raw I/O lives in
/// the inner <see cref="RunSqlAsync"/> seam so tests can also drive raw outage
/// exceptions through this classification (without fabricating a
/// <see cref="SqlException"/>, which has no public constructor).
/// </summary>
/// <param name="connectionName">The human-readable connection name, used only for the classified error message (never the connection string — that would leak credentials into logs / script-visible errors).</param>
/// <param name="connectionString">The ADO.NET connection string to write through.</param>
/// <param name="sql">The SQL statement to execute.</param>
/// <param name="parameters">Materialised CLR parameter values (may be empty).</param>
/// <param name="cancellationToken">Cancellation token for the write.</param>
/// <returns>A task that completes when the write succeeds.</returns>
/// <exception cref="OperationCanceledException">Rethrown unchanged when the caller's <paramref name="cancellationToken"/> requested cancellation.</exception>
/// <exception cref="TransientDatabaseException">Thrown for a transient SQL error number or a non-Sql transport/connection outage.</exception>
/// <exception cref="PermanentDatabaseException">Thrown for a permanent (or unknown) SQL error number.</exception>
internal virtual async Task ExecuteWriteAsync(
string connectionName,
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
// M2.3 (#7) code-review fix: the catch ordering MIRRORS
// ExternalSystemClient.InvokeHttpAsync exactly so the SQL path classifies
// a live outage the same way the HTTP path does:
// 1. caller-requested cancellation propagates UNCHANGED (never a "DB error");
// 2. a SqlException is classified by error number (transient/permanent);
// 3. a NON-SqlException transport/connection failure (InvalidOperationException
// "connection not open", IOException, SocketException, TimeoutException,
// a non-Sql DbException, …) is TRANSIENT — buffered + retried, because a
// retry can succeed once the server is reachable. The pre-fix code only
// caught SqlException, so these escaped unclassified and crashed the
// Script Execution Actor instead of buffering;
// 4. genuinely-unexpected exceptions (e.g. an authoring ArgumentException)
// propagate — same as the HTTP path lets unexpected exceptions escape.
try
{
await RunSqlAsync(connectionString, sql, parameters, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// [2] The caller asked to abandon the work — propagate the cancellation
// unchanged; it must never be reclassified as a transient DB error.
throw;
}
catch (SqlException ex)
{
// Classify by SqlException.Number and rethrow as the strongly-typed
// transient / permanent failure the callers branch on. The context
// is the connection NAME, never the connection string.
throw SqlErrorClassifier.Throw(connectionName, ex);
}
catch (Exception ex) when (SqlErrorClassifier.IsTransient(ex))
{
// [1] A live outage that did not surface as a SqlException — treat as
// transient so the caller buffers + retries. The message uses the
// connection NAME, never the connection string (credential safety).
throw new TransientDatabaseException(
$"Transient database error on {connectionName}: {ex.Message}",
errorNumber: null,
ex);
}
}
/// <summary>
/// M2.3 (#7): the raw ADO.NET write — opens the connection, builds the
/// command, and executes it. Marked <c>internal virtual</c> so tests can throw
/// RAW outage-shaped exceptions (e.g. <see cref="InvalidOperationException"/>,
/// <see cref="System.Net.Sockets.SocketException"/>) through the PRODUCTION
/// classification in <see cref="ExecuteWriteAsync"/>. This is the SQL parallel
/// of <c>client.SendAsync</c> inside <see cref="ExternalSystemClient.InvokeHttpAsync"/>:
/// the actual I/O, wrapped by the ordered classification catches in the caller.
/// </summary>
/// <param name="connectionString">The ADO.NET connection string to write through.</param>
/// <param name="sql">The SQL statement to execute.</param>
/// <param name="parameters">Materialised CLR parameter values (may be empty).</param>
/// <param name="cancellationToken">Cancellation token for the write.</param>
/// <returns>A task that completes when the write succeeds.</returns>
internal virtual async Task RunSqlAsync(
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync(cancellationToken).ConfigureAwait(false);
using var command = connection.CreateCommand();
command.CommandText = sql;
foreach (var (key, value) in parameters)
{
var parameter = command.CreateParameter();
parameter.ParameterName = key.StartsWith('@') ? key : "@" + key;
parameter.Value = value ?? DBNull.Value;
command.Parameters.Add(parameter);
}
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
} }
// ExternalSystemGateway-020: a JSON number that does not fit in Int64 must // ExternalSystemGateway-020: a JSON number that does not fit in Int64 must
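The three outcomes of the immediate-attempt path (delivered, failed, buffered) can be sketched without a database. Everything here is a stand-in chosen for the example: `CachedWriteSketchAsync` is hypothetical, a `List<string>` plays the store-and-forward buffer, `TimeoutException` plays the transient failure, and `NotSupportedException` plays the permanent one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical buffer standing in for the store-and-forward queue.
var buffer = new List<string>();

// Attempt the write once, then classify: only a transient failure is buffered.
async Task<string> CachedWriteSketchAsync(string sql, Func<Task> attempt)
{
    try
    {
        await attempt();
        return "Delivered";   // success: nothing is buffered
    }
    catch (NotSupportedException)
    {
        return "Failed";      // permanent: surfaced to the caller, not buffered
    }
    catch (TimeoutException)
    {
        buffer.Add(sql);      // transient: queued for the retry sweep
        return "Buffered";
    }
}

var first = await CachedWriteSketchAsync("INSERT 1", () => Task.CompletedTask);
var second = await CachedWriteSketchAsync("INSERT 2", () => throw new NotSupportedException());
var third = await CachedWriteSketchAsync("INSERT 3", () => throw new TimeoutException());

Console.WriteLine($"{first}, {second}, {third}, buffered={buffer.Count}");
```

Because the write has already been attempted once before buffering, the enqueue step should not attempt delivery again, which is the reason the diff passes `attemptImmediateDelivery: false`.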
@@ -0,0 +1,217 @@
using System.Data.Common;
using System.IO;
using System.Net.Sockets;
using Microsoft.Data.SqlClient;
namespace ZB.MOM.WW.ScadaBridge.ExternalSystemGateway;
/// <summary>
/// M2.3 (#7): classifies a SQL Server failure as transient (a brief wait /
/// retry may succeed — buffer to store-and-forward) or permanent (the identical
/// statement cannot succeed — return to the script / park the buffered message).
/// </summary>
/// <remarks>
/// <para>
/// This is the database-side parallel of <see cref="ErrorClassifier"/> (the
/// HTTP path). The two are kept separate because the inputs differ: HTTP keys
/// off status codes / exception types, SQL keys off
/// <see cref="SqlException.Number"/>.
/// </para>
/// <para>
/// <b>Transient set.</b> Only connection-loss, timeout, deadlock, and Azure SQL
/// throttle/availability error numbers are transient — failures whose cause is
/// external to the statement and may clear on its own:
/// <list type="bullet">
/// <item><c>-2</c> — query / command timeout expired.</item>
/// <item><c>-1</c> — a connection-level error (general SqlClient connection failure).</item>
/// <item><c>2</c> — SQL Server / network instance not found or not accessible.</item>
/// <item><c>53</c> — network path to the server was not found.</item>
/// <item><c>64</c> — connection terminated mid-session (transport error).</item>
/// <item><c>233</c> — no process on the other end of the named pipe.</item>
/// <item><c>1205</c> — the session was chosen as a deadlock victim.</item>
/// <item><c>10053</c> — transport-level abort (software caused connection abort).</item>
/// <item><c>10054</c> — connection reset by peer.</item>
/// <item><c>10060</c> — connection attempt timed out.</item>
/// <item><c>40197</c> — Azure SQL service error processing the request; retry.</item>
/// <item><c>40501</c> — Azure SQL service is busy.</item>
/// <item><c>40613</c> — Azure SQL database is currently unavailable.</item>
/// <item><c>49918</c> / <c>49919</c> / <c>49920</c> — Azure SQL throttling (too many requests / operations).</item>
/// </list>
/// </para>
/// <para>
/// <b>Everything else is permanent.</b> Constraint violations (547, 2627, 2601),
/// syntax errors (102, 156, 207, 208), and permission errors (229, 230, 262) are
/// the obvious permanent cases, but the policy is broader: <b>any error number not
/// in the transient set — including unknown / undocumented / ambiguous numbers —
/// is treated as permanent.</b> Fail-fast is the safer default: silently
/// retrying an unrecognised error forever (the pre-M2.3 behaviour) hides
/// authoring bugs and can replay duplicate side effects. A genuinely transient
/// number we have not enumerated will, at worst, surface to the script as a
/// permanent failure — a loud, fixable outcome — rather than spin in an
/// unbounded retry loop.
/// </para>
/// </remarks>
public static class SqlErrorClassifier
{
/// <summary>
/// The complete set of SQL Server error numbers treated as transient. See the
/// type-level remarks for the per-number rationale. Anything outside this set
/// is permanent.
/// </summary>
private static readonly HashSet<int> TransientErrorNumbers = new()
{
-2, -1, 2, 53, 64, 233, 1205,
10053, 10054, 10060,
40197, 40501, 40613,
49918, 49919, 49920,
};
/// <summary>
/// Determines whether a SQL Server error number represents a transient
/// failure. Unknown / undocumented numbers default to permanent
/// (<see langword="false"/>) — see the type-level remarks.
/// </summary>
/// <param name="errorNumber">The SQL Server error number (e.g. <see cref="SqlException.Number"/>).</param>
/// <returns><see langword="true"/> if the number is in the transient set; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(int errorNumber) => TransientErrorNumbers.Contains(errorNumber);
/// <summary>
/// Determines whether a <see cref="SqlException"/> represents a transient
/// failure by classifying its top-level <see cref="SqlException.Number"/>.
/// </summary>
/// <param name="exception">The SQL exception to classify.</param>
/// <returns><see langword="true"/> if the exception's error number is transient; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(SqlException exception)
{
ArgumentNullException.ThrowIfNull(exception);
return IsTransient(exception.Number);
}
/// <summary>
/// Determines whether an arbitrary <see cref="Exception"/> represents a
/// transient database failure — the SQL-path parallel of
/// <see cref="ErrorClassifier.IsTransient(System.Exception)"/> on the HTTP path.
/// </summary>
/// <remarks>
/// <para>
/// A live DB outage does not always surface as a <see cref="SqlException"/>:
/// once the underlying connection / socket is torn down, the driver raises
/// transport-level exceptions instead. These are <b>retryable</b> — a retry
/// can succeed once the server is reachable again — so they are classified
/// transient (buffered to store-and-forward) rather than escaping unclassified
/// to crash the calling Script Execution Actor. The transient set:
/// </para>
/// <list type="bullet">
/// <item><see cref="InvalidOperationException"/> — connection-state error (e.g. "the connection is not open" / pooled connection broken).</item>
/// <item><see cref="IOException"/> — transport read/write failure mid-session.</item>
/// <item><see cref="SocketException"/> — TCP-level failure (connection refused/reset/timed out).</item>
/// <item><see cref="TimeoutException"/> — command / connection timeout surfaced as a CLR <see cref="TimeoutException"/>.</item>
/// <item><see cref="TaskCanceledException"/> — driver-level cancellation/timeout NOT tied to a caller token (the caller-token case is handled before classification — see the gateway's ordered catches).</item>
/// <item>Any <see cref="DbException"/> that is NOT a <see cref="SqlException"/> — a provider/driver transport error (a real <see cref="SqlException"/> is classified by error number via the overloads above, never here).</item>
/// </list>
/// <para>
/// <b>Everything else is NOT transient</b> and must propagate, exactly as the
/// HTTP path lets genuinely-unexpected exceptions escape past its
/// <c>catch (Exception ex) when (ErrorClassifier.IsTransient(ex))</c> filter.
/// Authoring bugs (<see cref="ArgumentException"/>, <see cref="NullReferenceException"/>,
/// etc.) are loud, fixable failures — silently buffering and retrying them
/// forever would hide the bug.
/// </para>
/// </remarks>
/// <param name="exception">The exception to classify.</param>
/// <returns><see langword="true"/> for a transport/connection/timeout/driver exception; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(Exception exception)
{
ArgumentNullException.ThrowIfNull(exception);
// A real SqlException is classified by its error number (the overloads
// above), never by type — fall back to the number-based policy so an
// unknown SqlException stays permanent (fail-fast) rather than being
// swept up as transient by the DbException catch-all below.
if (exception is SqlException sql)
{
return IsTransient(sql);
}
return exception is InvalidOperationException
or IOException
or SocketException
or TimeoutException
or TaskCanceledException
or DbException; // any non-SqlException DbException (SqlException handled above)
}
/// <summary>
/// Classifies a <see cref="SqlException"/> and rethrows it as the matching
/// strongly-typed failure: <see cref="TransientDatabaseException"/> for a
/// transient error number, <see cref="PermanentDatabaseException"/> otherwise.
/// Mirrors <see cref="ErrorClassifier.AsTransient(string, System.Exception?)"/>
/// + the throw of <see cref="PermanentExternalSystemException"/> on the HTTP
/// path — the callers then branch on the typed exception rather than on the
/// raw <see cref="SqlException"/>.
/// </summary>
/// <param name="context">A short human-readable description of the failing operation (e.g. the connection name).</param>
/// <param name="exception">The SQL exception to classify and wrap.</param>
/// <returns>This method never returns normally — it always throws.</returns>
/// <exception cref="TransientDatabaseException">Thrown when the error number is transient.</exception>
/// <exception cref="PermanentDatabaseException">Thrown when the error number is permanent (the default).</exception>
public static Exception Throw(string context, SqlException exception)
{
ArgumentNullException.ThrowIfNull(exception);
if (IsTransient(exception))
{
throw new TransientDatabaseException(
$"Transient SQL error {exception.Number} on {context}: {exception.Message}",
exception.Number,
exception);
}
throw new PermanentDatabaseException(
$"Permanent SQL error {exception.Number} on {context}: {exception.Message}",
exception.Number,
exception);
}
}
/// <summary>
/// Signals a transient database failure suitable for store-and-forward retry —
/// the SQL-path parallel of <see cref="TransientExternalSystemException"/>.
/// </summary>
public class TransientDatabaseException : Exception
{
/// <summary>Gets the SQL Server error number that caused the failure, if known.</summary>
public int? SqlErrorNumber { get; }
/// <summary>Initializes a new <see cref="TransientDatabaseException"/>.</summary>
/// <param name="message">The error message.</param>
/// <param name="errorNumber">The SQL Server error number, if available.</param>
/// <param name="innerException">Optional inner exception (typically the original <see cref="SqlException"/>).</param>
public TransientDatabaseException(string message, int? errorNumber = null, Exception? innerException = null)
: base(message, innerException)
{
SqlErrorNumber = errorNumber;
}
}
/// <summary>
/// Signals a permanent database failure that must not be retried — the SQL-path
/// parallel of <see cref="PermanentExternalSystemException"/>. On the immediate
/// attempt it is returned synchronously to the calling script; on the
/// store-and-forward retry path it parks the message.
/// </summary>
public class PermanentDatabaseException : Exception
{
/// <summary>Gets the SQL Server error number that caused the failure, if known.</summary>
public int? SqlErrorNumber { get; }
/// <summary>Initializes a new <see cref="PermanentDatabaseException"/>.</summary>
/// <param name="message">The error message.</param>
/// <param name="errorNumber">The SQL Server error number, if available.</param>
/// <param name="innerException">Optional inner exception (typically the original <see cref="SqlException"/>).</param>
public PermanentDatabaseException(string message, int? errorNumber = null, Exception? innerException = null)
: base(message, innerException)
{
SqlErrorNumber = errorNumber;
}
}
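The ordered-catch pattern described in the remarks above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's gateway code: `TransientSketch`, `Execute`, and the `"delivered"` / `"buffered"` results are hypothetical names, and the `SqlException` error-number branch is left out so the sketch depends only on the base class library.

```csharp
using System;
using System.Data.Common;
using System.IO;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the type-based branch of the classifier.
public static class TransientSketch
{
    public static bool IsTransient(Exception exception)
    {
        ArgumentNullException.ThrowIfNull(exception);
        return exception is InvalidOperationException
            or IOException
            or SocketException
            or TimeoutException
            or TaskCanceledException
            or DbException;
    }

    // Hypothetical wrapper: transient failures are reported as buffered,
    // everything else propagates to the caller.
    public static string Execute(Action operation, CancellationToken callerToken = default)
    {
        try
        {
            operation();
            return "delivered";
        }
        catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
        {
            // Caller-requested cancellation is handled before classification.
            throw;
        }
        catch (Exception ex) when (IsTransient(ex))
        {
            return "buffered";
        }
    }
}
```

With this shape an `ArgumentException` thrown by `operation` escapes `Execute` unchanged, while a `TimeoutException` is absorbed and reported as buffered. Because the classification runs in an exception filter, non-matching exceptions are never caught and rethrown.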
@@ -111,6 +111,23 @@ public interface ISiteHealthCollector
/// <param name="count">The number of parked messages.</param>
void SetParkedMessageCount(int count);
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — replace the latest cumulative
/// site-event-log write-failure count (SQLite error, disk full,
/// bounded-queue overflow drop) used by the next <see cref="CollectReport"/>
/// call. Refreshed periodically by the <c>SiteEventLogFailureCountReporter</c>
/// hosted service. Point-in-time: the value is NOT reset on
/// <see cref="CollectReport"/>; it carries forward until the next poller
/// refresh. Default interface implementation is a no-op so existing test
/// fakes continue to compile without per-fake updates.
/// </summary>
/// <param name="count">The cumulative failed-write count from <c>ISiteEventLogger.FailedWriteCount</c>.</param>
void SetSiteEventLogWriteFailures(long count)
{
// Default no-op so test fakes do not need to be updated. The real
// SiteHealthCollector overrides this with the Interlocked.Exchange store.
}
/// <summary>
/// Sets the hostname of this node.
/// </summary>
@@ -1,11 +1,25 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring;
public static class ServiceCollectionExtensions
{
/// <summary>
/// Sentinel marker used by <see cref="AddSiteEventLogHealthMetricsBridge"/> to
/// implement an idempotency guard. Because the reporter is registered via a
/// factory-lambda overload of <c>AddHostedService</c>, its
/// <see cref="Microsoft.Extensions.DependencyInjection.ServiceDescriptor.ImplementationType"/>
/// is <see langword="null"/> — checking it would be a silent no-op. Registering
/// this marker as a singleton and guarding on its <c>ServiceType</c> gives a
/// reliable, allocation-free sentinel that works regardless of how the hosted
/// service was wired.
/// </summary>
private sealed class SiteEventLogHealthMetricsBridgeMarker { }
/// <summary>
/// Register site-side health monitoring services (metric collection + periodic reporting).
/// Call this on site nodes only. For central, call AddCentralHealthAggregation() instead.
@@ -50,6 +64,77 @@ public static class ServiceCollectionExtensions
return services;
}
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — register the
/// <see cref="SiteEventLogFailureCountReporter"/> hosted service that
/// periodically reads the cumulative event-log write-failure count and
/// pushes it into <see cref="ISiteHealthCollector"/> as a point-in-time
/// snapshot (<c>SiteEventLogWriteFailures</c> on the site health report).
/// </summary>
/// <remarks>
/// <para>
/// Must be called AFTER <see cref="AddSiteHealthMonitoring"/> (or
/// <see cref="AddHealthMonitoring"/>) which registers the
/// <see cref="ISiteHealthCollector"/> the reporter depends on.
/// </para>
/// <para>
/// <b>Why a Func&lt;long&gt; delegate instead of ISiteEventLogger.</b>
/// A direct <c>HealthMonitoring → SiteEventLogging</c> reference is avoided to
/// prevent an undesirable low-level coupling: <c>SiteEventLogging</c> is a
/// leaf component that should not pull in higher-level infrastructure. The
/// <see cref="Func{TResult}"/> delegate seam keeps the reference one-way and
/// loose: the caller (Host site wiring) captures
/// <c>ISiteEventLogger.FailedWriteCount</c> as a lambda and passes it here.
/// Note: <c>HealthMonitoring → StoreAndForward → SiteEventLogging</c> already
/// exists as a transitive path, so a direct reference would not introduce a
/// cycle — the delegate is purely a coupling-avoidance measure.
/// </para>
/// <para>
/// Idempotent — a <see cref="SiteEventLogHealthMetricsBridgeMarker"/> singleton
/// is used as the sentinel. Because the reporter is registered via a factory-lambda
/// overload of <c>AddHostedService</c>, its
/// <see cref="Microsoft.Extensions.DependencyInjection.ServiceDescriptor.ImplementationType"/>
/// is <see langword="null"/>; checking it would be a silent no-op and a second
/// call would spin up a second polling timer. Guarding on the marker's
/// <c>ServiceType</c> is always reliable regardless of how the hosted service
/// was wired (AddHostedService has no TryAdd variant).
/// </para>
/// </remarks>
/// <param name="services">The service collection to register into.</param>
/// <param name="failedWriteCountProvider">
/// A factory delegate that, given the root <see cref="IServiceProvider"/>,
/// returns a <see cref="Func{TResult}"/> that reads the current cumulative
/// event-log write-failure count. Typically:
/// <c>sp => () => sp.GetRequiredService&lt;ISiteEventLogger&gt;().FailedWriteCount</c>.
/// The factory is evaluated once at hosted-service resolution time; the inner
/// <see cref="Func{TResult}"/> is called on every poll tick.
/// </param>
/// <returns>The same <see cref="IServiceCollection"/> for chaining.</returns>
public static IServiceCollection AddSiteEventLogHealthMetricsBridge(
this IServiceCollection services,
Func<IServiceProvider, Func<long>> failedWriteCountProvider)
{
ArgumentNullException.ThrowIfNull(services);
ArgumentNullException.ThrowIfNull(failedWriteCountProvider);
// Idempotent guard — uses the marker type rather than ImplementationType because
// AddHostedService(factory-lambda) sets only ImplementationFactory and leaves
// ImplementationType null; an ImplementationType == check is a silent no-op for
// factory-registered services. The marker singleton's ServiceType is always set.
if (services.Any(d => d.ServiceType == typeof(SiteEventLogHealthMetricsBridgeMarker)))
{
return services;
}
services.AddSingleton<SiteEventLogHealthMetricsBridgeMarker>();
services.AddHostedService(sp => new SiteEventLogFailureCountReporter(
failedWriteCountProvider(sp),
sp.GetRequiredService<ISiteHealthCollector>(),
sp.GetRequiredService<ILogger<SiteEventLogFailureCountReporter>>()));
return services;
}
/// <summary>
/// HealthMonitoring-014: register the <see cref="HealthMonitoringOptionsValidator"/>
/// so a misconfigured <c>ScadaBridge:HealthMonitoring</c> section (zero/negative
@@ -0,0 +1,146 @@
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring;
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — site-side hosted service that
/// periodically reads the cumulative event-log write-failure count and pushes
/// it into <see cref="ISiteHealthCollector"/> so the next
/// <see cref="ISiteHealthCollector.CollectReport"/> emits a fresh
/// <c>SiteEventLogWriteFailures</c> field on the site health report.
/// </summary>
/// <remarks>
/// <para>
/// <b>Why a Func&lt;long&gt; and not ISiteEventLogger directly.</b>
/// A direct <c>HealthMonitoring → SiteEventLogging</c> reference is avoided
/// to prevent an undesirable low-level coupling: <c>SiteEventLogging</c> is a
/// leaf component that should not pull in higher-level infrastructure. Note that
/// <c>HealthMonitoring → StoreAndForward → SiteEventLogging</c> already
/// exists as a transitive path (confirmed: <c>StoreAndForward.csproj</c> references
/// <c>SiteEventLogging.csproj</c>), so a direct reference would NOT introduce a
/// cycle — the delegate is purely a coupling-avoidance measure. The
/// <see cref="Func{TResult}"/> seam lets the caller (Host site wiring) capture
/// <c>ISiteEventLogger.FailedWriteCount</c> as a lambda at registration time; this
/// service reads only the numeric result. The delegate approach is a standard
/// pattern for counter bridges and keeps the registration path self-documenting.
/// </para>
/// <para>
/// <b>Cadence.</b> 30 s by default — the same cadence as
/// <c>SiteAuditBacklogReporter</c>, which is coarse enough to stay within
/// the health-report interval budget while keeping the central dashboard
/// current.
/// </para>
/// <para>
/// <b>Failure containment.</b> Any unexpected exception during the probe is
/// caught and logged; the next tick retries. Mirrors
/// <c>SiteAuditBacklogReporter</c>'s "exception logged, not propagated"
/// contract.
/// </para>
/// </remarks>
public sealed class SiteEventLogFailureCountReporter : IHostedService, IDisposable
{
/// <summary>
/// Default poll cadence. Matches <c>SiteAuditBacklogReporter.DefaultRefreshInterval</c>
/// (30 s) — coarse enough to amortise the read across many reports, fine
/// enough that the central dashboard never lags by more than one
/// health-report interval.
/// </summary>
internal static readonly TimeSpan DefaultRefreshInterval = TimeSpan.FromSeconds(30);
private readonly Func<long> _failedWriteCountProvider;
private readonly ISiteHealthCollector _collector;
private readonly ILogger<SiteEventLogFailureCountReporter> _logger;
private readonly TimeSpan _refreshInterval;
private CancellationTokenSource? _cts;
private Task? _loop;
/// <summary>Initializes a new instance of <see cref="SiteEventLogFailureCountReporter"/>.</summary>
/// <param name="failedWriteCountProvider">
/// A delegate that returns the current cumulative event-log write-failure count.
/// Typically wired as <c>() => sp.GetRequiredService&lt;ISiteEventLogger&gt;().FailedWriteCount</c>
/// in the Host site composition root.
/// </param>
/// <param name="collector">The site health collector that receives the failure-count snapshot.</param>
/// <param name="logger">Logger instance.</param>
/// <param name="refreshInterval">Poll interval override; defaults to <see cref="DefaultRefreshInterval"/> (30 s).</param>
public SiteEventLogFailureCountReporter(
Func<long> failedWriteCountProvider,
ISiteHealthCollector collector,
ILogger<SiteEventLogFailureCountReporter> logger,
TimeSpan? refreshInterval = null)
{
_failedWriteCountProvider = failedWriteCountProvider
?? throw new ArgumentNullException(nameof(failedWriteCountProvider));
_collector = collector ?? throw new ArgumentNullException(nameof(collector));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
_refreshInterval = refreshInterval ?? DefaultRefreshInterval;
}
/// <summary>Starts the background polling loop, running an immediate first probe before entering the timed cycle.</summary>
/// <param name="ct">Cancellation token signalling host shutdown.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
public Task StartAsync(CancellationToken ct)
{
// Linked CTS lets StopAsync's cancellation AND the host's shutdown
// token both terminate the loop; either side firing aborts the
// pending Task.Delay.
_cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
_loop = Task.Run(() => RunLoopAsync(_cts.Token));
return Task.CompletedTask;
}
private async Task RunLoopAsync(CancellationToken ct)
{
// First tick runs immediately so the very first health report after
// process start carries a real failure-count snapshot — without this
// the dashboard would show 0 for the first 30 s after a deploy even
// if failures had already accumulated.
SafeProbe();
while (!ct.IsCancellationRequested)
{
try
{
await Task.Delay(_refreshInterval, ct).ConfigureAwait(false);
}
catch (OperationCanceledException)
{
break;
}
SafeProbe();
}
}
private void SafeProbe()
{
try
{
var count = _failedWriteCountProvider();
_collector.SetSiteEventLogWriteFailures(count);
}
catch (Exception ex)
{
// Catch-all is deliberate: the hosted service must survive every
// class of probe failure so the next tick gets a chance. Mirrors
// SiteAuditBacklogReporter's "exception logged, not propagated" contract.
_logger.LogWarning(ex, "SiteEventLogFailureCountReporter probe failed; next tick will retry.");
}
}
/// <summary>Signals the polling loop to stop and waits for it to complete.</summary>
/// <param name="ct">Cancellation token (not used; the internal CTS governs shutdown).</param>
/// <returns>A task that represents the asynchronous operation.</returns>
public Task StopAsync(CancellationToken ct)
{
_cts?.Cancel();
return _loop ?? Task.CompletedTask;
}
/// <summary>Releases the internal <see cref="CancellationTokenSource"/> used to stop the polling loop.</summary>
public void Dispose()
{
_cts?.Dispose();
}
}
@@ -17,6 +17,7 @@ public class SiteHealthCollector : ISiteHealthCollector
private int _siteAuditWriteFailures;
private int _auditRedactionFailures;
private volatile SiteAuditBacklogSnapshot? _siteAuditBacklog;
private long _siteEventLogWriteFailures;
private readonly ConcurrentDictionary<string, ConnectionHealth> _connectionStatuses = new();
private readonly ConcurrentDictionary<string, TagResolutionStatus> _tagResolutionCounts = new();
private readonly ConcurrentDictionary<string, string> _connectionEndpoints = new();
@@ -77,6 +78,12 @@ public class SiteHealthCollector : ISiteHealthCollector
_siteAuditBacklog = snapshot ?? throw new ArgumentNullException(nameof(snapshot));
}
/// <inheritdoc />
public void SetSiteEventLogWriteFailures(long count)
{
Interlocked.Exchange(ref _siteEventLogWriteFailures, count);
}
/// <inheritdoc />
public void UpdateConnectionHealth(string connectionName, ConnectionHealth health)
{
@@ -206,6 +213,7 @@ public class SiteHealthCollector : ISiteHealthCollector
ClusterNodes: _clusterNodes?.ToList(),
SiteAuditWriteFailures: siteAuditWriteFailures,
AuditRedactionFailure: auditRedactionFailures,
SiteAuditBacklog: _siteAuditBacklog,
SiteEventLogWriteFailures: Interlocked.Read(ref _siteEventLogWriteFailures));
}
}
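The `long` counter above is written with `Interlocked.Exchange` and read with `Interlocked.Read` because plain 64-bit reads and writes are not guaranteed to be atomic on 32-bit platforms. A minimal sketch of that point-in-time snapshot pattern, assuming a hypothetical `FailureCountSnapshot` holder that is not part of the project:

```csharp
using System.Threading;

// Hypothetical holder illustrating the store/read pair used by the collector.
public sealed class FailureCountSnapshot
{
    private long _value;

    // Replaces the stored value atomically; the previous value is discarded.
    public void Set(long count) => Interlocked.Exchange(ref _value, count);

    // Reads the stored value atomically without resetting it, so repeated
    // reads return the same number until the next Set call.
    public long Get() => Interlocked.Read(ref _value);
}
```

Reading does not reset the value, which matches the "carries forward until the next poller refresh" semantics documented on `SetSiteEventLogWriteFailures`.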
@@ -7,6 +7,8 @@ public class DatabaseOptions
{
/// <summary>Connection string for the central configuration SQL Server database.</summary>
public string? ConfigurationDb { get; set; }
/// <summary>Connection string for the central machine-data SQL Server database.</summary>
public string? MachineDataDb { get; set; }
/// <summary>File system path to the site-local SQLite database directory.</summary>
public string? SiteDbPath { get; set; }
}
@@ -0,0 +1,175 @@
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging;
namespace ZB.MOM.WW.ScadaBridge.Host.Health;
/// <summary>
/// M2.14 (#28): readiness check that verifies every <b>required central cluster
/// singleton</b> is reachable from this node, satisfying the "required cluster
/// singletons running (if applicable)" clause of REQ-HOST-4a. Register it
/// <see cref="ZB.MOM.WW.Health.ZbHealthTags.Ready"/>-tagged in the Central-role
/// <c>AddHealthChecks()</c> chain only, so it is naturally role-scoped (site nodes
/// never register it).
/// </summary>
/// <remarks>
/// <para>
/// <b>Probe strategy.</b> Each central singleton has a local
/// <c>ClusterSingletonProxy</c> actor (created unconditionally in
/// <c>AkkaHostedService.RegisterCentralActors</c>). The proxy actor exists locally
/// as soon as it is created, so merely resolving its path proves nothing about the
/// singleton itself. Instead we <see cref="ActorRefImplicitSenderExtensions.Ask{T}(ICanTell, object, TimeSpan?)"/>
/// the proxy an <see cref="Identify"/> with a short bounded per-singleton timeout and
/// expect an <see cref="ActorIdentity"/> whose <see cref="ActorIdentity.Subject"/> is
/// non-null. The proxy buffers and forwards to the live singleton, so a non-null
/// Subject within the timeout means the singleton is running and reachable; a null
/// Subject or a timeout means it is unreachable. Probes run concurrently
/// (<see cref="Task.WhenAll(System.Collections.Generic.IEnumerable{Task})"/>) so the
/// whole check stays cheap and readiness polling stays fast.
/// </para>
/// <para>
/// <b>Required-always vs if-applicable.</b> All five central singleton proxies are
/// created unconditionally on a central node (there is no feature/config gate around
/// any of them), so all five are treated as required-always here. If a future
/// singleton is created behind a feature flag, it should NOT be added to
/// <see cref="RequiredSingletonProxyNames"/> — "if applicable" means skip when its
/// feature is off.
/// </para>
/// <para>
/// <b>Failover flakiness.</b> During a brief singleton handover the singleton may be
/// momentarily unreachable through the proxy. The bounded per-singleton timeout maps
/// that to Unhealthy (we never throw and never retry — retries would make the probe
/// slow). Readiness flapping briefly during a failover is acceptable and correct: a
/// node mid-handover is legitimately not fully ready. We deliberately accept that
/// tradeoff rather than masking it with retries.
/// </para>
/// <para>
/// <b>No leadership requirement.</b> The proxy reaches the singleton from either node
/// (active or standby), so a ready standby still reports Healthy here — readiness must
/// NOT require cluster leadership (that is the Active tier's job).
/// </para>
/// <para>
/// The <see cref="ActorSystem"/> is resolved lazily from DI per probe, mirroring
/// <c>AkkaClusterHealthCheck</c>; if it is not yet available (startup race) the check
/// returns Unhealthy rather than throwing.
/// </para>
/// </remarks>
public sealed class RequiredSingletonsHealthCheck : IHealthCheck
{
/// <summary>
/// Local actor names (under <c>/user</c>) of the <c>ClusterSingletonProxy</c>
/// actors for the singletons that must always be running on a central node.
/// Matches the unconditional proxy registrations in
/// <c>AkkaHostedService.RegisterCentralActors</c>.
/// </summary>
public static readonly IReadOnlyList<string> RequiredSingletonProxyNames = new[]
{
"notification-outbox-proxy",
"audit-log-ingest-proxy",
"site-call-audit-proxy",
"audit-log-purge-proxy",
"site-audit-reconciliation-proxy",
};
// Short, bounded per-singleton timeout. Kept small so readiness polling stays
// fast; a singleton in mid-handover that does not answer within this window is
// (correctly) treated as momentarily unreachable. Do NOT add retries here.
private static readonly TimeSpan ProbeTimeout = TimeSpan.FromSeconds(2);
private readonly IServiceProvider _serviceProvider;
private readonly ILogger<RequiredSingletonsHealthCheck> _logger;
/// <summary>Initializes a new <see cref="RequiredSingletonsHealthCheck"/>.</summary>
/// <param name="serviceProvider">
/// Application service provider; the <see cref="ActorSystem"/> is resolved lazily so the
/// check is startup-safe (Unhealthy, never throwing, if Akka is not yet up).
/// </param>
/// <param name="logger">Logger for diagnostic detail on unreachable singletons.</param>
public RequiredSingletonsHealthCheck(
IServiceProvider serviceProvider,
ILogger<RequiredSingletonsHealthCheck> logger)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <inheritdoc />
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
// CheckHealthAsync must NEVER throw — catch everything and map to Unhealthy
// with a descriptive message. An escaping exception would be recorded as
// Unhealthy anyway, but a thrown exception loses the descriptive message.
try
{
var system = _serviceProvider.GetService<ActorSystem>();
if (system is null)
return HealthCheckResult.Unhealthy("ActorSystem not yet available.");
// Probe each required singleton concurrently so the whole check is bounded
// by ~ProbeTimeout, not the sum of the per-singleton timeouts.
var probes = RequiredSingletonProxyNames
.Select(name => ProbeAsync(system, name, cancellationToken))
.ToArray();
var results = await Task.WhenAll(probes).ConfigureAwait(false);
var unreachable = results
.Where(r => !r.Reachable)
.Select(r => r.Name)
.ToList();
if (unreachable.Count == 0)
return HealthCheckResult.Healthy(
$"All {RequiredSingletonProxyNames.Count} required cluster singletons are reachable.");
var joined = string.Join(", ", unreachable);
_logger.LogWarning(
"Readiness degraded: required cluster singleton(s) unreachable: {Unreachable}",
joined);
return HealthCheckResult.Unhealthy(
$"Required cluster singleton(s) unreachable: {joined}.");
}
catch (Exception ex)
{
// Defensive: any unexpected failure (including OperationCanceledException
// on shutdown) degrades readiness rather than escaping the check.
return HealthCheckResult.Unhealthy(
"Failed to probe required cluster singletons.", ex);
}
}
/// <summary>
/// Asks the named local proxy an <see cref="Identify"/> with a bounded timeout.
/// Reachable iff a non-null <see cref="ActorIdentity.Subject"/> comes back in time.
/// A null Subject (path not present) or a timeout/exception → not reachable. This
/// method itself never throws.
/// </summary>
private async Task<(string Name, bool Reachable)> ProbeAsync(
ActorSystem system,
string proxyName,
CancellationToken cancellationToken)
{
try
{
// ActorSelection so a missing path resolves an ActorIdentity with a null
// Subject (rather than throwing) within the bounded timeout.
var selection = system.ActorSelection($"/user/{proxyName}");
var identity = await selection
.Ask<ActorIdentity>(new Identify(proxyName), ProbeTimeout, cancellationToken)
.ConfigureAwait(false);
return (proxyName, identity.Subject is not null);
}
catch (Exception)
{
// Timeout / cancellation / any failure → momentarily unreachable. Bounded,
// no retry — readiness may briefly flap during a singleton handover, which
// is the correct signal for a node mid-handover.
return (proxyName, false);
}
}
}
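The "probe concurrently, bound each probe by a timeout, never throw" shape of the check above can be sketched without Akka. This is an illustration under assumptions: `ConcurrentProbeSketch` and `FindUnreachableAsync` are hypothetical names, each probe is modelled as a plain `Func<Task<bool>>`, and `Task.WaitAsync` (.NET 6 or later) stands in for the bounded `Ask`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentProbeSketch
{
    // Runs every probe concurrently. A probe counts as unreachable when it
    // returns false, throws, or does not finish within the timeout.
    public static async Task<IReadOnlyList<string>> FindUnreachableAsync(
        IReadOnlyDictionary<string, Func<Task<bool>>> probes,
        TimeSpan timeout)
    {
        var tasks = probes.Select(async p =>
        {
            try
            {
                var ok = await p.Value().WaitAsync(timeout).ConfigureAwait(false);
                return (Name: p.Key, Reachable: ok);
            }
            catch (Exception)
            {
                // Timeout or probe failure: treated as unreachable, no retry.
                return (Name: p.Key, Reachable: false);
            }
        }).ToArray();

        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.Where(r => !r.Reachable).Select(r => r.Name).ToList();
    }
}
```

Because the probes start together and each is bounded individually, the whole call takes roughly one timeout in the worst case rather than the sum of the per-probe timeouts.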
@@ -202,6 +202,18 @@ try
failureStatus: null,
tags: new[] { ZbHealthTags.Ready },
args: AkkaClusterStatusPolicy.Default)
// M2.14 (#28): readiness ALSO reflects "required cluster singletons running"
// (REQ-HOST-4a). Probes each central singleton's local ClusterSingletonProxy
// with a bounded Identify and degrades to Unhealthy if any required singleton
// is unreachable. Registered inside the Central-role branch (this is it) so the
// check is naturally role-scoped — site nodes never run it. It resolves
// ActorSystem from DI per probe, like the akka-cluster check above, and is
// leadership-agnostic so a ready standby still reports ready (the proxy reaches
// the singleton from either node).
.AddTypeActivatedCheck<RequiredSingletonsHealthCheck>(
"required-singletons",
failureStatus: null,
tags: new[] { ZbHealthTags.Ready })
.AddTypeActivatedCheck<ActiveNodeHealthCheck>(
"active-node",
failureStatus: null,
@@ -58,6 +58,16 @@ public static class SiteServiceRegistration
services.AddStoreAndForward();
services.AddSiteEventLogging();
// Site Event Logging (#12) M2.16 (#30) — bridge ISiteEventLogger.FailedWriteCount
// into the site health report as a point-in-time SiteEventLogWriteFailures field.
// Must come AFTER both AddSiteHealthMonitoring (registers ISiteHealthCollector) and
// AddSiteEventLogging (registers ISiteEventLogger). The outer Func<IServiceProvider, …>
// is evaluated once at hosted-service resolution time (root IServiceProvider is available);
// the inner Func<long> is called on every poll tick and reads FailedWriteCount from the
// already-resolved ISiteEventLogger singleton.
services.AddSiteEventLogHealthMetricsBridge(
sp => () => sp.GetRequiredService<ISiteEventLogger>().FailedWriteCount);
// Audit Log (#23) — site-side hot-path writer + telemetry collaborators.
// The SiteAuditTelemetryActor itself is registered by AkkaHostedService
// in the site-role block; this call wires every DI dependency it (and
@@ -96,6 +106,19 @@ public static class SiteServiceRegistration
return new AkkaClusterNodeProvider(akkaService, siteRole);
});
// SiteEventLogging-019 / #29 (M2.15): the EventLogPurgeService runs on every
// site host node but consults this optional gate each tick and early-exits on
// the standby. Register it to delegate to IClusterNodeProvider.SelfIsPrimary
// (the canonical "this node is Up AND cluster leader" check) so purge runs ONLY
// on the active node — no duplicated cluster logic. Non-clustered test hosts that
// never call SiteServiceRegistration leave it unregistered, so the purge defaults
// to always-run (the pre-fix behaviour, preserved).
services.AddSingleton<SiteEventLogActiveNodeCheck>(sp =>
{
var nodeProvider = sp.GetRequiredService<IClusterNodeProvider>();
return () => nodeProvider.SelfIsPrimary;
});
// Options binding
BindSharedOptions(services, config);
services.Configure<SiteRuntimeOptions>(config.GetSection("ScadaBridge:SiteRuntime"));
@@ -60,6 +60,9 @@ public static class StartupValidator
.Require("ScadaBridge:Database:ConfigurationDb",
_ => !string.IsNullOrEmpty(configuration.GetSection("ScadaBridge:Database")["ConfigurationDb"]),
"connection string required for Central")
.Require("ScadaBridge:Database:MachineDataDb",
_ => !string.IsNullOrEmpty(configuration.GetSection("ScadaBridge:Database")["MachineDataDb"]),
"connection string required for Central")
// Task 1.4: the LDAP server key moved into the nested Security:Ldap
// sub-section (bound to the shared LdapOptions). Validate the nested key so
// the pre-host preflight still fails fast on a missing LDAP server for
@@ -4,8 +4,23 @@ using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI; namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
/// <summary> /// <summary>
/// WP-2: Validates and deserializes JSON request body against method parameter definitions. /// WP-2: Validates and deserializes a JSON request body against a method's
/// Extended type system: Boolean, Integer, Float, String, Object, List. /// parameter definitions. Extended type system: Boolean, Integer, Float,
/// String, Object, List.
///
/// <para>
/// InboundAPI-M2.6: validation is now RECURSIVE and type-aware for the
/// extended <c>Object</c> / <c>List</c> types. Declared object fields are
/// validated against their declared (nested) types, list elements against the
/// declared element type, and scalars at any depth against the extended type —
/// with path-qualified errors (e.g. <c>order.items[2].quantity</c>). The
/// definition is read as JSON Schema (the canonical persisted format produced
/// by the Central UI / migration); the legacy flat-array form is still
/// accepted for transition safety. See
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.InboundApiSchema"/>
/// for the shared recursive engine that <see cref="ReturnValueValidator"/>
/// also uses.
/// </para>
/// </summary> /// </summary>
public static class ParameterValidator public static class ParameterValidator
{ {
@@ -14,40 +29,34 @@ public static class ParameterValidator
/// Returns deserialized parameters or an error message. /// Returns deserialized parameters or an error message.
/// </summary> /// </summary>
/// <param name="body">The parsed JSON request body; null or undefined if no body was supplied.</param> /// <param name="body">The parsed JSON request body; null or undefined if no body was supplied.</param>
/// <param name="parameterDefinitions">JSON-serialized list of <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.ParameterDefinition"/>; null or empty means no parameters are defined.</param> /// <param name="parameterDefinitions">JSON Schema describing the method's parameters (an object schema), or null/empty when no parameters are defined. The legacy flat-array form is also accepted.</param>
/// <returns>A <see cref="ParameterValidationResult"/> with coerced parameter values on success, or an error message on failure.</returns> /// <returns>A <see cref="ParameterValidationResult"/> with coerced parameter values on success, or an error message on failure.</returns>
public static ParameterValidationResult Validate( public static ParameterValidationResult Validate(
JsonElement? body, JsonElement? body,
string? parameterDefinitions) string? parameterDefinitions)
{ {
if (string.IsNullOrEmpty(parameterDefinitions)) InboundApiSchema? schema;
{
// No parameters defined — body should be empty or null
return ParameterValidationResult.Valid(new Dictionary<string, object?>());
}
List<ParameterDefinition> definitions;
try try
{ {
definitions = JsonSerializer.Deserialize<List<ParameterDefinition>>( schema = InboundApiSchema.Parse(parameterDefinitions);
parameterDefinitions,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
?? [];
} }
catch (JsonException) catch (JsonException)
{ {
return ParameterValidationResult.Invalid("Invalid parameter definitions in method configuration"); return ParameterValidationResult.Invalid("Invalid parameter definitions in method configuration");
} }
if (definitions.Count == 0) // No parameters defined (null schema, non-object schema, or an object schema
// with no declared fields): the body is unconstrained and yields an empty parameter set.
if (schema is null || schema.Type != "object" || schema.Fields.Count == 0)
{ {
return ParameterValidationResult.Valid(new Dictionary<string, object?>()); return ParameterValidationResult.Valid(new Dictionary<string, object?>());
} }
if (body == null || body.Value.ValueKind == JsonValueKind.Null || body.Value.ValueKind == JsonValueKind.Undefined) if (body == null
|| body.Value.ValueKind == JsonValueKind.Null
|| body.Value.ValueKind == JsonValueKind.Undefined)
{ {
// Check if all parameters are optional var required = schema.Fields.Where(f => f.Required).ToList();
var required = definitions.Where(d => d.Required).ToList();
if (required.Count > 0) if (required.Count > 0)
{ {
return ParameterValidationResult.Invalid( return ParameterValidationResult.Invalid(
@@ -62,86 +71,51 @@ public static class ParameterValidator
return ParameterValidationResult.Invalid("Request body must be a JSON object"); return ParameterValidationResult.Invalid("Request body must be a JSON object");
} }
var result = new Dictionary<string, object?>(); // Recursively type-check the whole body against the declared object
// schema (nested Object fields, List element types, scalars at any
// depth, undeclared-field rejection) with path-qualified errors.
var errors = new List<string>(); var errors = new List<string>();
schema.Validate(body.Value, string.Empty, errors);
// InboundAPI-010: report top-level body fields that do not match any defined
// parameter, so a caller learns about a typo'd parameter name instead of
// having the field silently ignored.
var defined = new HashSet<string>(definitions.Select(d => d.Name), StringComparer.Ordinal);
var unexpected = body.Value.EnumerateObject()
.Select(p => p.Name)
.Where(name => !defined.Contains(name))
.ToList();
if (unexpected.Count > 0)
{
errors.Add($"Unexpected parameter(s): {string.Join(", ", unexpected)}");
}
foreach (var def in definitions)
{
if (body.Value.TryGetProperty(def.Name, out var prop))
{
var (value, error) = CoerceValue(prop, def.Type, def.Name);
if (error != null)
{
errors.Add(error);
}
else
{
result[def.Name] = value;
}
}
else if (def.Required)
{
errors.Add($"Missing required parameter: {def.Name}");
}
}
if (errors.Count > 0) if (errors.Count > 0)
{ {
return ParameterValidationResult.Invalid(string.Join("; ", errors)); return ParameterValidationResult.Invalid(string.Join("; ", errors));
} }
// Materialize the coerced top-level parameter values for the script.
var result = new Dictionary<string, object?>();
foreach (var field in schema.Fields)
{
if (body.Value.TryGetProperty(field.Name, out var prop))
{
result[field.Name] = Materialize(prop, field.Schema);
}
}
return ParameterValidationResult.Valid(result); return ParameterValidationResult.Valid(result);
} }
/// <summary> /// <summary>
/// Coerces a JSON element to the declared parameter type. InboundAPI-010: the /// Converts a validated JSON element to the CLR value handed to the script.
/// <c>Object</c> and <c>List</c> extended types are validated for JSON <em>shape</em> /// Validation has already passed, so this only shapes the value: scalars to
/// only (object vs. array) — there is no field-level or element-level type /// their primitive type, objects to <see cref="Dictionary{TKey,TValue}"/>,
/// validation. A method script that needs a specific nested structure must /// arrays to <see cref="List{T}"/>.
/// validate it itself; invalid nested data surfaces as a runtime script error.
/// </summary> /// </summary>
private static (object? value, string? error) CoerceValue(JsonElement element, string expectedType, string paramName) private static object? Materialize(JsonElement element, InboundApiSchema schema)
{ {
return expectedType.ToLowerInvariant() switch if (element.ValueKind == JsonValueKind.Null)
{ {
"boolean" => element.ValueKind == JsonValueKind.True || element.ValueKind == JsonValueKind.False return null;
? (element.GetBoolean(), null) }
: (null, $"Parameter '{paramName}' must be a Boolean"),
"integer" => element.ValueKind == JsonValueKind.Number && element.TryGetInt64(out var intVal) return schema.Type switch
? (intVal, null) {
: (null, $"Parameter '{paramName}' must be an Integer"), "boolean" => element.GetBoolean(),
"integer" => element.GetInt64(),
"float" => element.ValueKind == JsonValueKind.Number "number" => element.GetDouble(),
? (element.GetDouble(), null) "string" => element.GetString(),
: (null, $"Parameter '{paramName}' must be a Float"), "object" => JsonSerializer.Deserialize<Dictionary<string, object?>>(element.GetRawText()),
"array" => JsonSerializer.Deserialize<List<object?>>(element.GetRawText()),
"string" => element.ValueKind == JsonValueKind.String _ => JsonSerializer.Deserialize<object?>(element.GetRawText()),
? (element.GetString(), null)
: (null, $"Parameter '{paramName}' must be a String"),
"object" => element.ValueKind == JsonValueKind.Object
? (JsonSerializer.Deserialize<Dictionary<string, object?>>(element.GetRawText()), null)
: (null, $"Parameter '{paramName}' must be an Object"),
"list" => element.ValueKind == JsonValueKind.Array
? (JsonSerializer.Deserialize<List<object?>>(element.GetRawText()), null)
: (null, $"Parameter '{paramName}' must be a List"),
_ => (null, $"Unknown parameter type '{expectedType}' for parameter '{paramName}'")
}; };
} }
} }
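The recursive, path-qualified validation described for `ParameterValidator` can be sketched as follows. This is a minimal illustration, not the actual `InboundApiSchema` (which is not shown in this diff): `SketchSchema`, `SketchField`, the error wording, and the treatment of JSON null as a type mismatch are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Usage: order.items[2].quantity is a string where an integer is declared.
var quantity = new SketchSchema("integer");
var item = new SketchSchema("object", new[] { new SketchField("quantity", true, quantity) });
var items = new SketchSchema("array", Items: item);
var order = new SketchSchema("object", new[] { new SketchField("items", true, items) });
var root = new SketchSchema("object", new[] { new SketchField("order", true, order) });

using var doc = JsonDocument.Parse(
    "{\"order\":{\"items\":[{\"quantity\":1},{\"quantity\":2},{\"quantity\":\"two\"}]}}");
var errors = new List<string>();
root.Validate(doc.RootElement, string.Empty, errors);
Console.WriteLine(string.Join("; ", errors)); // order.items[2].quantity must be an integer

// Hypothetical stand-ins for the shared schema engine (names are illustrative only).
public sealed record SketchField(string Name, bool Required, SketchSchema Schema);

public sealed record SketchSchema(
    string Type,
    IReadOnlyList<SketchField>? Fields = null,
    SketchSchema? Items = null)
{
    // Appends one path-qualified message per violation; never throws on bad input.
    public void Validate(JsonElement value, string path, List<string> errors)
    {
        switch (Type)
        {
            case "object":
                if (value.ValueKind != JsonValueKind.Object)
                {
                    errors.Add($"{Label(path)} must be an object");
                    return;
                }
                var declared = new HashSet<string>(StringComparer.Ordinal);
                foreach (var field in Fields ?? Array.Empty<SketchField>())
                {
                    declared.Add(field.Name);
                    var childPath = Child(path, field.Name);
                    if (value.TryGetProperty(field.Name, out var child))
                        field.Schema.Validate(child, childPath, errors); // recurse into the field
                    else if (field.Required)
                        errors.Add($"{childPath} is required");
                }
                foreach (var property in value.EnumerateObject())
                {
                    if (!declared.Contains(property.Name))
                        errors.Add($"{Child(path, property.Name)} is not a declared field");
                }
                break;

            case "array":
                if (value.ValueKind != JsonValueKind.Array)
                {
                    errors.Add($"{Label(path)} must be an array");
                    return;
                }
                var index = 0;
                foreach (var element in value.EnumerateArray())
                {
                    Items?.Validate(element, $"{path}[{index}]", errors); // recurse per element
                    index++;
                }
                break;

            case "integer":
                if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
                    errors.Add($"{Label(path)} must be an integer");
                break;

            case "number":
                if (value.ValueKind != JsonValueKind.Number)
                    errors.Add($"{Label(path)} must be a number");
                break;

            case "boolean":
                if (value.ValueKind != JsonValueKind.True && value.ValueKind != JsonValueKind.False)
                    errors.Add($"{Label(path)} must be a boolean");
                break;

            case "string":
                if (value.ValueKind != JsonValueKind.String)
                    errors.Add($"{Label(path)} must be a string");
                break;
        }
    }

    private static string Child(string path, string name) =>
        path.Length == 0 ? name : $"{path}.{name}";

    private static string Label(string path) => path.Length == 0 ? "(root)" : path;
}
```

Collecting messages into a caller-supplied list, rather than returning on the first failure, is what lets a single pass report every violation joined with `"; "`, as both validators do.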
@@ -1,4 +1,5 @@
using System.Text.Json; using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI; namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
@@ -10,13 +11,20 @@ namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
/// <see cref="ParameterValidator"/>. /// <see cref="ParameterValidator"/>.
/// ///
/// <para> /// <para>
/// The return definition is a JSON array of <see cref="ReturnFieldDefinition"/> /// The return definition is JSON Schema (the canonical persisted format; the
/// (the same <c>{name,type}</c> shape as a parameter definition). A method whose /// legacy flat <c>[{name,type}]</c> array is still accepted for transition
/// <c>ReturnDefinition</c> is null/empty is unconstrained — its return value is /// safety). A method whose <c>ReturnDefinition</c> is null/empty is
/// serialized as-is (backward compatible). Primitive fields (Boolean / Integer / /// unconstrained — its return value is serialized as-is (backward compatible).
/// Float / String) are type-checked; the extended <c>Object</c>/<c>List</c> types /// </para>
/// are shape-checked only (object vs. array), consistent with how ///
/// <see cref="ParameterValidator"/> treats inbound extended types. /// <para>
/// InboundAPI-M2.6: validation is RECURSIVE and type-aware — declared object
/// fields are validated against their declared (nested) types, list elements
/// against the declared element type, and scalars at any depth — with
/// path-qualified errors. The recursion is shared with
/// <see cref="ParameterValidator"/> via
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.InboundApiSchema"/>,
/// so the inbound and outbound type checks cannot drift apart.
/// </para> /// </para>
/// </summary> /// </summary>
public static class ReturnValueValidator public static class ReturnValueValidator
@@ -27,8 +35,8 @@ public static class ReturnValueValidator
/// definition is configured or the result conforms to it. /// definition is configured or the result conforms to it.
/// </summary> /// </summary>
/// <param name="resultJson">The JSON-serialized script return value to validate.</param> /// <param name="resultJson">The JSON-serialized script return value to validate.</param>
/// <param name="returnDefinition">JSON-serialized list of <see cref="ReturnFieldDefinition"/> entries, or null/empty to skip validation.</param> /// <param name="returnDefinition">JSON Schema describing the method's return value, or null/empty to skip validation. The legacy flat-array form is also accepted.</param>
/// <returns>A <see cref="ReturnValidationResult"/> indicating success or describing the first validation failure.</returns> /// <returns>A <see cref="ReturnValidationResult"/> indicating success or describing the validation failures.</returns>
public static ReturnValidationResult Validate(string? resultJson, string? returnDefinition) public static ReturnValidationResult Validate(string? resultJson, string? returnDefinition)
{ {
if (string.IsNullOrWhiteSpace(returnDefinition)) if (string.IsNullOrWhiteSpace(returnDefinition))
@@ -37,13 +45,10 @@ public static class ReturnValueValidator
return ReturnValidationResult.Valid(); return ReturnValidationResult.Valid();
} }
List<ReturnFieldDefinition> fields; InboundApiSchema? schema;
try try
{ {
fields = JsonSerializer.Deserialize<List<ReturnFieldDefinition>>( schema = InboundApiSchema.Parse(returnDefinition);
returnDefinition,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
?? [];
} }
catch (JsonException) catch (JsonException)
{ {
@@ -51,11 +56,25 @@ public static class ReturnValueValidator
"Invalid return definition in method configuration"); "Invalid return definition in method configuration");
} }
if (fields.Count == 0) // A schema that declares no constraints (e.g. an object schema with no
// fields) leaves the return value unconstrained.
if (schema is null || (schema.Type == "object" && schema.Fields.Count == 0))
{ {
return ReturnValidationResult.Valid(); return ReturnValidationResult.Valid();
} }
// INTENTIONAL asymmetry with ParameterValidator:
//
// ParameterValidator has an early-return guard for "schema.Type != object"
// because method parameters are ALWAYS a top-level JSON object (flat map of
// name→value); a non-object parameter schema is treated as unconstrained.
//
// ReturnValueValidator does NOT guard on schema.Type here. A method may
// declare a scalar return type (e.g. {"type":"string"} or {"type":"integer"})
// and the script is expected to return exactly that scalar JSON value.
// Guarding on type == "object" would silently bypass validation for scalar
// and array return schemas — do NOT add that guard here.
if (string.IsNullOrWhiteSpace(resultJson)) if (string.IsNullOrWhiteSpace(resultJson))
{ {
return ReturnValidationResult.Invalid( return ReturnValidationResult.Invalid(
@@ -63,75 +82,37 @@ public static class ReturnValueValidator
} }
JsonElement root; JsonElement root;
JsonDocument doc;
try try
{ {
using var doc = JsonDocument.Parse(resultJson); doc = JsonDocument.Parse(resultJson);
root = doc.RootElement.Clone();
} }
catch (JsonException) catch (JsonException)
{ {
return ReturnValidationResult.Invalid("Script return value is not valid JSON"); return ReturnValidationResult.Invalid("Script return value is not valid JSON");
} }
if (root.ValueKind != JsonValueKind.Object) using (doc)
{ {
return ReturnValidationResult.Invalid( root = doc.RootElement;
"Method declares a return structure but the script did not return an object");
}
var errors = new List<string>(); // A JSON null result against a declared structure is treated as
foreach (var field in fields) // "no value returned" (preserves the prior contract).
{ if (root.ValueKind == JsonValueKind.Null)
if (!root.TryGetProperty(field.Name, out var value))
{ {
errors.Add($"missing return field '{field.Name}'"); return ReturnValidationResult.Invalid(
continue; "Method declares a return structure but the script returned no value");
} }
var typeError = CheckFieldType(value, field.Type, field.Name); var errors = new List<string>();
if (typeError != null) schema.Validate(root, string.Empty, errors);
errors.Add(typeError);
return errors.Count > 0
? ReturnValidationResult.Invalid(
$"Return value does not match the declared return definition: {string.Join("; ", errors)}")
: ReturnValidationResult.Valid();
} }
return errors.Count > 0
? ReturnValidationResult.Invalid(
$"Return value does not match the declared return definition: {string.Join("; ", errors)}")
: ReturnValidationResult.Valid();
} }
private static string? CheckFieldType(JsonElement value, string declaredType, string fieldName)
{
// A null value satisfies any field type — the script may legitimately omit
// optional data; only a missing field (handled by the caller) is an error.
if (value.ValueKind == JsonValueKind.Null)
return null;
var ok = declaredType.ToLowerInvariant() switch
{
"boolean" => value.ValueKind is JsonValueKind.True or JsonValueKind.False,
"integer" => value.ValueKind == JsonValueKind.Number && value.TryGetInt64(out _),
"float" => value.ValueKind == JsonValueKind.Number,
"string" => value.ValueKind == JsonValueKind.String,
"object" => value.ValueKind == JsonValueKind.Object,
"list" => value.ValueKind == JsonValueKind.Array,
_ => true, // unknown declared type — do not block the response
};
return ok ? null : $"return field '{fieldName}' must be {declaredType}";
}
}
/// <summary>
/// InboundAPI-014: one field of a method's declared return structure — the
/// deserialized form of an entry in <c>ApiMethod.ReturnDefinition</c>. Defined in
/// this module (not Commons) because the inbound API is currently its only consumer.
/// </summary>
public sealed class ReturnFieldDefinition
{
/// <summary>Field name as it must appear in the script return object.</summary>
public string Name { get; set; } = string.Empty;
/// <summary>Expected JSON type of this field (e.g., "string", "integer", "boolean", "object", "list").</summary>
public string Type { get; set; } = "String";
} }
/// <summary> /// <summary>
@@ -0,0 +1,231 @@
using System.Security.Claims;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary>
/// The outcome of a single cookie <c>OnValidatePrincipal</c> evaluation. The thin
/// <c>OnValidatePrincipal</c> lambda translates this into the matching
/// <c>CookieValidatePrincipalContext</c> calls (<c>RejectPrincipal</c> /
/// <c>ReplacePrincipal</c> + <c>ShouldRenew</c>); the decision itself is computed by
/// <see cref="CookieSessionValidator"/> so it is unit-testable in isolation.
/// </summary>
/// <param name="Action">What the caller must do with the principal.</param>
/// <param name="Principal">The replacement principal when <paramref name="Action"/> is <see cref="SessionValidationAction.Replace"/>; otherwise <c>null</c>.</param>
public readonly record struct SessionValidationResult(
SessionValidationAction Action,
ClaimsPrincipal? Principal)
{
/// <summary>Keep the existing principal unchanged.</summary>
public static SessionValidationResult Keep { get; } = new(SessionValidationAction.Keep, null);
/// <summary>Reject the principal (idle-timed-out) — the caller signs the user out.</summary>
public static SessionValidationResult Reject { get; } = new(SessionValidationAction.Reject, null);
/// <summary>Replace the principal with a refreshed one and renew the cookie.</summary>
/// <param name="principal">The rebuilt principal.</param>
/// <returns>A replace result carrying <paramref name="principal"/>.</returns>
public static SessionValidationResult Replace(ClaimsPrincipal principal) =>
new(SessionValidationAction.Replace, principal);
}
/// <summary>The action a cookie session validation requires of the caller.</summary>
public enum SessionValidationAction
{
/// <summary>Leave the principal as-is (not idle-timed-out, no refresh due, or a refresh error we swallow).</summary>
Keep,
/// <summary>The session is idle-timed-out; reject + sign out.</summary>
Reject,
/// <summary>The role mapping was refreshed; replace the principal and renew the cookie.</summary>
Replace,
}
/// <summary>
/// M2.19 (#15): the unit-testable core of the cookie <c>OnValidatePrincipal</c> event.
/// Enforces the idle timeout and refreshes the session's role/scope claims from the
/// STORED LDAP group claims via the DB-backed <see cref="RoleMapper"/> — <b>without any
/// LDAP call</b> — picking up central role-mapping (and scope-rule) changes mid-session.
/// </summary>
/// <remarks>
/// <para>
/// <b>Idle timeout</b> (default <see cref="SecurityOptions.IdleTimeoutMinutes"/> = 30):
/// computed from the <see cref="JwtTokenService.LastActivityClaimType"/> anchor. This is
/// the explicit, deterministic counterpart to the cookie middleware's
/// <c>ExpireTimeSpan</c> + <c>SlidingExpiration</c> window — both use the SAME idle
/// timeout value, so the explicit check never contradicts the cookie window. The
/// last-activity anchor is advanced to "now" only when the principal is rebuilt on
/// a role refresh; between refreshes the cookie's sliding renew tracks activity.
/// </para>
/// <para>
/// <b>Role refresh</b> (default <see cref="SecurityOptions.RoleRefreshThresholdMinutes"/>
/// = 15): when the elapsed time since <see cref="JwtTokenService.LastRoleRefreshClaimType"/>
/// exceeds the threshold, the stored groups are re-mapped and the principal is rebuilt via
/// <see cref="SessionClaimBuilder"/> (identical shape to <c>/auth/login</c>). If the DB
/// mapping revoked the user's roles, the rebuilt principal reflects the loss.
/// </para>
/// <para>
/// <b>Failure policy</b>: a refresh error (e.g. the mapper throws because the DB is
/// unreachable) NEVER signs the user out and NEVER throws out of validation — it returns
/// <see cref="SessionValidationResult.Keep"/>, mirroring the documented "LDAP failure:
/// active sessions continue with current roles" stance. Only the explicit idle-timeout
/// path rejects.
/// </para>
/// </remarks>
public sealed class CookieSessionValidator
{
private readonly IGroupRoleMapper<string> _roleMapper;
private readonly SecurityOptions _options;
private readonly TimeProvider _timeProvider;
private readonly ILogger<CookieSessionValidator> _logger;
/// <summary>Initializes the validator.</summary>
/// <param name="roleMapper">The DB-backed group→role mapping seam (no LDAP) used for the mid-session refresh.</param>
/// <param name="options">Security options carrying the idle and role-refresh thresholds.</param>
/// <param name="timeProvider">Clock source; injected so tests can advance time deterministically.</param>
/// <param name="logger">Logger instance.</param>
public CookieSessionValidator(
IGroupRoleMapper<string> roleMapper,
IOptions<SecurityOptions> options,
TimeProvider timeProvider,
ILogger<CookieSessionValidator> logger)
{
_roleMapper = roleMapper ?? throw new ArgumentNullException(nameof(roleMapper));
_options = (options ?? throw new ArgumentNullException(nameof(options))).Value;
_timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <summary>
/// Evaluates a cookie principal: enforces the idle timeout, then refreshes the
/// role/scope claims from the stored LDAP groups when the role-refresh interval has
/// elapsed. Never throws.
/// </summary>
/// <param name="principal">The current cookie principal under validation.</param>
/// <param name="ct">Cancellation token (the request-aborted token in the pipeline).</param>
/// <returns>The action the caller must take and any replacement principal.</returns>
public async Task<SessionValidationResult> ValidateAsync(ClaimsPrincipal? principal, CancellationToken ct = default)
{
// An unauthenticated / null principal is left to the rest of the pipeline.
if (principal?.Identity is not { IsAuthenticated: true })
{
return SessionValidationResult.Keep;
}
var now = _timeProvider.GetUtcNow();
// 1) Idle-timeout enforcement — the only path that rejects. A missing/unparsable
// last-activity anchor is treated as timed-out (fail-closed): a session we
// cannot age must not be kept alive forever.
if (IsIdleTimedOut(principal, now))
{
_logger.LogInformation(
"Cookie session for {Username} rejected: past the {IdleTimeout}-minute idle timeout.",
principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value ?? "(unknown)",
_options.IdleTimeoutMinutes);
return SessionValidationResult.Reject;
}
// 2) Role-mapping refresh — best-effort. Any failure keeps the existing session.
try
{
var refreshed = await TryRefreshAsync(principal, now, ct).ConfigureAwait(false);
if (refreshed is not null)
{
return SessionValidationResult.Replace(refreshed);
}
}
catch (Exception ex)
{
// SECURITY: never broaden access and never sign the user out on a transient
// refresh fault — keep the existing principal (current roles) and swallow.
_logger.LogWarning(
ex,
"Mid-session role refresh failed for {Username}; keeping existing session and roles.",
principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value ?? "(unknown)");
return SessionValidationResult.Keep;
}
return SessionValidationResult.Keep;
}
/// <summary>
/// Returns true when the session's last-activity anchor is older than
/// <see cref="SecurityOptions.IdleTimeoutMinutes"/>. A missing/unparsable anchor is
/// treated as timed-out (fail-closed).
/// </summary>
/// <param name="principal">The cookie principal.</param>
/// <param name="now">The current instant.</param>
/// <returns><c>true</c> if the session has exceeded the idle window.</returns>
public bool IsIdleTimedOut(ClaimsPrincipal principal, DateTimeOffset now)
{
var claim = principal.FindFirst(JwtTokenService.LastActivityClaimType);
if (claim is null || !DateTimeOffset.TryParse(claim.Value, out var lastActivity))
{
return true;
}
return (now - lastActivity).TotalMinutes > _options.IdleTimeoutMinutes;
}
// Returns a rebuilt principal when the role-refresh interval has elapsed; null when
// nothing changed. The principal is rebuilt via SessionClaimBuilder so its shape is
// identical to /auth/login.
private async Task<ClaimsPrincipal?> TryRefreshAsync(ClaimsPrincipal principal, DateTimeOffset now, CancellationToken ct)
{
var roleRefreshDue = IsRoleRefreshDue(principal, now);
if (!roleRefreshDue)
{
// No mapping refresh due. We deliberately do NOT mint a new principal just to
// advance LastActivity: the cookie middleware's SlidingExpiration already
// renews the cookie window on activity, so the idle anchor only needs
// advancing when we are rebuilding the principal anyway (on a role refresh).
// This keeps the no-op request path allocation-free and avoids a cookie
// re-issue on every request.
return null;
}
var username = principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value;
var displayName = principal.FindFirst(JwtTokenService.DisplayNameClaimType)?.Value;
if (string.IsNullOrEmpty(username) || string.IsNullOrEmpty(displayName))
{
// Malformed principal — cannot rebuild faithfully. Keep it (do not reject).
_logger.LogWarning("Cannot refresh role mapping: principal is missing username/display-name claims.");
return null;
}
var groups = SessionClaimBuilder.ReadGroups(principal);
// Re-run the DB-backed mapping on the STORED groups — NO LDAP call.
var mapping = await _roleMapper.MapAsync(groups, ct).ConfigureAwait(false);
var scope = mapping.Scope is RoleMappingResult mapped
? mapped
: new RoleMappingResult(mapping.Roles, [], IsSystemWideDeployment: false);
// Rebuild identically to /auth/login, advancing BOTH anchors: the role-refresh
// anchor (we just refreshed) and the idle anchor (this is a genuine request).
return SessionClaimBuilder.Build(username, displayName, groups, scope, now);
}
/// <summary>
/// Returns true when the elapsed time since the last role refresh exceeds
/// <see cref="SecurityOptions.RoleRefreshThresholdMinutes"/>. A missing/unparsable
/// anchor is treated as due (refresh now and re-stamp the anchor).
/// </summary>
/// <param name="principal">The cookie principal.</param>
/// <param name="now">The current instant.</param>
/// <returns><c>true</c> if a role-mapping refresh is due.</returns>
public bool IsRoleRefreshDue(ClaimsPrincipal principal, DateTimeOffset now)
{
var claim = principal.FindFirst(JwtTokenService.LastRoleRefreshClaimType);
if (claim is null || !DateTimeOffset.TryParse(claim.Value, out var lastRefresh))
{
return true;
}
return (now - lastRefresh).TotalMinutes > _options.RoleRefreshThresholdMinutes;
}
}
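The two-clock decision made by `CookieSessionValidator.ValidateAsync` can be sketched with plain timestamps. This is a simplified illustration under stated assumptions: `SessionClock` and `Decision` are hypothetical names, the thresholds are the documented defaults (30 and 15 minutes), and the sketch omits the real validator's failure policy, where a refresh error or a malformed principal results in Keep rather than Replace.

```csharp
using System;

var login = new DateTimeOffset(2026, 6, 16, 8, 0, 0, TimeSpan.Zero);

// Both anchors are stamped at login in this sketch.
Console.WriteLine(SessionClock.Decide(login, login, login.AddMinutes(10))); // Keep
Console.WriteLine(SessionClock.Decide(login, login, login.AddMinutes(20))); // Replace
Console.WriteLine(SessionClock.Decide(login, login, login.AddMinutes(31))); // Reject
Console.WriteLine(SessionClock.Decide(null, login, login.AddMinutes(1)));   // Reject

public enum Decision { Keep, Reject, Replace }

public static class SessionClock
{
    // Mirrors the order of checks: idle timeout first (fail-closed on a missing
    // anchor), then the role-refresh interval (a missing anchor means "due").
    public static Decision Decide(
        DateTimeOffset? lastActivity,
        DateTimeOffset? lastRoleRefresh,
        DateTimeOffset now,
        int idleTimeoutMinutes = 30,
        int roleRefreshThresholdMinutes = 15)
    {
        if (lastActivity is null
            || (now - lastActivity.Value).TotalMinutes > idleTimeoutMinutes)
        {
            return Decision.Reject;
        }

        if (lastRoleRefresh is null
            || (now - lastRoleRefresh.Value).TotalMinutes > roleRefreshThresholdMinutes)
        {
            return Decision.Replace;
        }

        return Decision.Keep;
    }
}
```

Passing `now` in as a parameter is the same design choice the validator makes by injecting `TimeProvider`: the decision can be exercised at arbitrary instants without waiting on a real clock.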
@@ -29,6 +29,22 @@ public class JwtTokenService
public const string SiteIdClaimType = ZbClaimTypes.ScopeId; public const string SiteIdClaimType = ZbClaimTypes.ScopeId;
public const string LastActivityClaimType = "LastActivity"; public const string LastActivityClaimType = "LastActivity";
// M2.19 (#15): the cookie session now stores the user's raw LDAP groups and a
// role-mapping refresh anchor so an active interactive session can re-run the
// DB-backed RoleMapper (NOT LDAP) mid-session and pick up central role-mapping
// changes. These two have no canonical ZbClaimTypes equivalent (the shared
// vocabulary covers identity/role/scope, not the ScadaBridge-internal refresh
// machinery), so they keep "zb:"-prefixed ScadaBridge-local literals:
// - GroupClaimType ("zb:group", one per LDAP group) is the input the
// mid-session RoleMapper re-run consumes — the groups are the durable
// fact; the roles are the derived projection that can go stale.
// - LastRoleRefreshClaimType ("zb:lastrolerefresh", ISO-8601 "o") anchors
// the role-mapping refresh interval (SecurityOptions.RoleRefreshThresholdMinutes).
// LastActivityClaimType (above) remains the idle-timeout anchor — a separate
// clock from the role-refresh anchor.
public const string GroupClaimType = "zb:group";
public const string LastRoleRefreshClaimType = "zb:lastrolerefresh";
/// <summary> /// <summary>
/// Fixed issuer bound into every token and required on validation. Binding /// Fixed issuer bound into every token and required on validation. Binding
/// issuer/audience is defence-in-depth: even though the HMAC key is shared only /// issuer/audience is defence-in-depth: even though the HMAC key is shared only
@@ -1,10 +1,21 @@
namespace ZB.MOM.WW.ScadaBridge.Security; namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary> /// <summary>
/// Non-LDAP security configuration: the cookie-embedded JWT signing/lifetime /// Non-LDAP security configuration for the ScadaBridge Central UI.
/// settings and the session idle-timeout / cookie-security policy.
/// </summary> /// </summary>
/// <remarks> /// <remarks>
/// <para>
/// <b>JWT Bearer path (<c>/auth/token</c>)</b>: <see cref="JwtSigningKey"/> and
/// <see cref="JwtExpiryMinutes"/> govern the short-lived Bearer token issued to
/// the CLI / Inbound API. They have no effect on the Blazor cookie session.
/// </para>
/// <para>
/// <b>Blazor cookie session</b>: <see cref="IdleTimeoutMinutes"/> and
/// <see cref="RoleRefreshThresholdMinutes"/> govern the cookie-only session used by
/// the Blazor Server UI. There is no embedded JWT in this path — the cookie is
/// HttpOnly/Secure and managed entirely by ASP.NET Core cookie authentication.
/// </para>
/// <para>
/// Task 1.2/1.4 cutover: the LDAP connection settings that used to live here as /// Task 1.2/1.4 cutover: the LDAP connection settings that used to live here as
/// flat <c>Ldap*</c> keys (server, port, transport, search base, service account, /// flat <c>Ldap*</c> keys (server, port, transport, search base, service account,
/// attributes, timeout) moved into a nested <c>ScadaBridge:Security:Ldap</c> /// attributes, timeout) moved into a nested <c>ScadaBridge:Security:Ldap</c>
@@ -12,6 +23,7 @@ namespace ZB.MOM.WW.ScadaBridge.Security;
/// and registered via <c>AddZbLdapAuth</c>. This is a BREAKING config-key change — /// and registered via <c>AddZbLdapAuth</c>. This is a BREAKING config-key change —
/// see CHANGELOG. The non-LDAP fields below are unchanged and still bound from /// see CHANGELOG. The non-LDAP fields below are unchanged and still bound from
/// <c>ScadaBridge:Security</c>. /// <c>ScadaBridge:Security</c>.
/// </para>
/// </remarks> /// </remarks>
public class SecurityOptions public class SecurityOptions
{ {
@@ -27,7 +39,19 @@ public class SecurityOptions
public const int MinJwtSigningKeyBytes = 32; public const int MinJwtSigningKeyBytes = 32;
/// <summary>Cookie-embedded JWT lifetime in minutes before it must be refreshed.</summary> /// <summary>Cookie-embedded JWT lifetime in minutes before it must be refreshed.</summary>
public int JwtExpiryMinutes { get; set; } = 15; public int JwtExpiryMinutes { get; set; } = 15;
/// <summary>Session idle timeout in minutes; sessions inactive beyond this are expired.</summary> /// <summary>
/// Session idle timeout in minutes for the Blazor cookie session; sessions inactive
/// beyond this are expired and the user is redirected to <c>/login</c>. Default: <b>30</b>.
/// </summary>
/// <remarks>
/// Because a role refresh (governed by <see cref="RoleRefreshThresholdMinutes"/>) is the
/// only operation that advances the <c>LastActivity</c> anchor, the effective maximum idle
/// window before a session is guaranteed to be rejected is approximately
/// <c>IdleTimeoutMinutes + RoleRefreshThresholdMinutes</c> (~45 minutes with defaults).
/// This is intentional and mirrors the cookie middleware's own <c>SlidingExpiration</c>
/// fuzziness. Must be strictly greater than <see cref="RoleRefreshThresholdMinutes"/>
/// (enforced at startup by <see cref="SecurityOptionsValidator"/>).
/// </remarks>
public int IdleTimeoutMinutes { get; set; } = 30;
/// <summary>
@@ -35,6 +59,28 @@ public class SecurityOptions
/// </summary>
public int JwtRefreshThresholdMinutes { get; set; } = 5;
/// <summary>
/// M2.19 (#15): how long a cookie session's role-mapping projection may be stale
/// before <c>OnValidatePrincipal</c> re-runs the DB-backed <c>RoleMapper</c> on the
/// session's stored LDAP group claims and rebuilds the role/scope claims. Default:
/// <b>15 minutes</b>, matching the documented sliding-refresh cadence.
/// </summary>
/// <remarks>
/// This is a purely central (database) refresh — it picks up LDAP-group→role mapping
/// changes and scope-rule changes WITHOUT contacting LDAP, so revoked roles take effect
/// within this window. It does NOT pick up live LDAP group-membership changes (the
/// shared LDAP library exposes no passwordless group-search; that remains a
/// next-login refresh — see Component-Security.md).
/// <para>
/// Because a role-refresh is also the only operation that advances the
/// <c>LastActivity</c> anchor, the effective maximum idle window is approximately
/// <c><see cref="IdleTimeoutMinutes"/> + RoleRefreshThresholdMinutes</c> (~45 minutes
/// with defaults). Must be strictly less than <see cref="IdleTimeoutMinutes"/>
/// (enforced at startup by <see cref="SecurityOptionsValidator"/>).
/// </para>
/// </remarks>
public int RoleRefreshThresholdMinutes { get; set; } = 15;
/// <summary>
/// When true (default) the authentication cookie is always marked
/// <c>Secure</c> (sent only over HTTPS) — the correct production setting,
@@ -59,3 +105,38 @@ public class SecurityOptions
/// </summary>
public string CookieName { get; set; } = DefaultCookieName;
}
/// <summary>
/// M2.19 (#15): startup validator for <see cref="SecurityOptions"/>. Fails fast at boot
/// on any configuration that would defeat idle-timeout enforcement.
/// </summary>
/// <remarks>
/// Registered with <c>ValidateOnStart()</c> by
/// <see cref="ServiceCollectionExtensions.AddSecurity"/> so a misconfigured appsettings
/// section is caught at application startup rather than silently misapplied at runtime.
/// </remarks>
public sealed class SecurityOptionsValidator : Microsoft.Extensions.Options.IValidateOptions<SecurityOptions>
{
/// <inheritdoc/>
public Microsoft.Extensions.Options.ValidateOptionsResult Validate(string? name, SecurityOptions options)
{
// SECURITY: RoleRefreshThresholdMinutes must be strictly less than IdleTimeoutMinutes.
// The role-refresh cycle is the ONLY operation that advances the LastActivity anchor,
// so a single un-refreshed cycle must not be able to exhaust the entire idle window.
// If threshold >= idle, the first refresh after the anchor was set at t=0 is not due
// until t=threshold, which is already at or past the idle expiry (t=idle). The anchor
// can then never be advanced inside the idle window, defeating enforcement.
if (options.RoleRefreshThresholdMinutes >= options.IdleTimeoutMinutes)
{
return Microsoft.Extensions.Options.ValidateOptionsResult.Fail(
$"{nameof(SecurityOptions.RoleRefreshThresholdMinutes)} " +
$"({options.RoleRefreshThresholdMinutes}) must be strictly less than " +
$"{nameof(SecurityOptions.IdleTimeoutMinutes)} " +
$"({options.IdleTimeoutMinutes}). " +
$"A single refresh cycle must not equal or exceed the idle window or idle " +
$"enforcement is defeated.");
}
return Microsoft.Extensions.Options.ValidateOptionsResult.Success;
}
}
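The rule enforced by `SecurityOptionsValidator` and the "~45 minutes with defaults" figure from the remarks can be sketched as plain arithmetic. This is a minimal illustration only: `IsValidSessionConfig` and `ApproxMaxIdleWindowMinutes` are hypothetical helper names that do not exist in the codebase, and the sketch assumes the worst-case window is simply the sum of the two settings, as the doc comments state.

```csharp
using System;

// Hypothetical helper (not part of the codebase): mirrors the validator's rule that
// the refresh threshold must be strictly less than the idle timeout.
static bool IsValidSessionConfig(int idleTimeoutMinutes, int roleRefreshThresholdMinutes) =>
    roleRefreshThresholdMinutes < idleTimeoutMinutes;

// Hypothetical helper: the approximate worst-case idle window described in the
// remarks, i.e. IdleTimeoutMinutes + RoleRefreshThresholdMinutes.
static int ApproxMaxIdleWindowMinutes(int idleTimeoutMinutes, int roleRefreshThresholdMinutes) =>
    idleTimeoutMinutes + roleRefreshThresholdMinutes;

// Defaults from SecurityOptions: idle = 30, threshold = 15.
Console.WriteLine(IsValidSessionConfig(30, 15));       // accepted by the rule
Console.WriteLine(IsValidSessionConfig(15, 15));       // equal values are rejected
Console.WriteLine(ApproxMaxIdleWindowMinutes(30, 15)); // 30 + 15 = 45
```

Equal values fail the check because the comparison is strict, which matches the `>=` test in `Validate`.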
@@ -1,5 +1,8 @@
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
@@ -51,6 +54,14 @@ public static class ServiceCollectionExtensions
services.AddScoped<JwtTokenService>();
services.AddScoped<RoleMapper>();
// M2.19 (#15): the cookie OnValidatePrincipal core. Scoped to match the
// IGroupRoleMapper<string> it depends on (which depends on the Scoped
// ISecurityRepository). The clock is injected (TimeProvider) so the idle/refresh
// thresholds can be exercised deterministically in tests; the production default
// is the wall clock. TryAddSingleton keeps the Host free to register its own.
services.TryAddSingleton(TimeProvider.System);
services.AddScoped<CookieSessionValidator>();
// Audit Actor wiring (Phase 3): the user-facing inbound API audit path
// sources AuditEvent.Actor from the authenticated principal via this
// seam. HttpAuditActorAccessor reads IHttpContextAccessor.HttpContext?.User
@@ -71,6 +82,16 @@ public static class ServiceCollectionExtensions
// to consume this seam in a later task.
services.AddScoped<IGroupRoleMapper<string>, ScadaBridgeGroupRoleMapper>();
// M2.19 (#15): fail-fast config guard — RoleRefreshThresholdMinutes must be strictly
// less than IdleTimeoutMinutes. If they are equal or inverted, a single un-refreshed
// cycle can exhaust the entire idle window and idle enforcement is silently defeated.
// SecurityOptionsValidator is registered with ValidateOnStart so a misconfigured
// appsettings section fails at boot with a clear message rather than behaving subtly
// incorrectly at runtime. Config-binding stays with the Host (component library must
// not take IConfiguration), so we only register the validator + ValidateOnStart here.
services.AddOptions<SecurityOptions>().ValidateOnStart();
services.AddSingleton<IValidateOptions<SecurityOptions>, SecurityOptionsValidator>();
// Note: the old SecurityOptionsValidator (which fail-fast-validated LdapServer +
// LdapSearchBase) is gone — those keys moved into the shared LdapOptions, whose
// LdapOptionsValidator (registered with ValidateOnStart by AddZbLdapAuth above)
@@ -94,6 +115,16 @@ public static class ServiceCollectionExtensions
// environments sharing a hostname can be given distinct names. HttpOnly /
// SameSite / SecurePolicy / SlidingExpiration / ExpireTimeSpan are likewise
// applied there via ZbCookieDefaults.Apply.
// M2.19 (#15): OnValidatePrincipal enforces the idle timeout and refreshes
// the role/scope claims from the session's STORED LDAP groups (DB-backed
// RoleMapper, NO LDAP) so central role-mapping changes take effect
// mid-session. The lambda is a THIN adapter: it resolves the request-scoped
// CookieSessionValidator (which holds all the testable idle/refresh logic)
// and translates its decision into the cookie context calls. It NEVER
// throws — CookieSessionValidator.ValidateAsync swallows refresh faults and
// keeps the session (mirrors "LDAP failure: active sessions continue").
options.Events.OnValidatePrincipal = OnValidatePrincipalAsync;
});
// CentralUI-005: configure the cookie session as a sliding window so the
@@ -152,6 +183,70 @@ public static class ServiceCollectionExtensions
return services;
}
/// <summary>
/// M2.19 (#15): the thin <see cref="CookieAuthenticationEvents.OnValidatePrincipal"/>
/// adapter. It resolves the request-scoped <see cref="CookieSessionValidator"/>,
/// asks it for a decision, and applies it to the cookie context:
/// <list type="bullet">
/// <item><see cref="SessionValidationAction.Reject"/> → <see cref="CookieValidatePrincipalContext.RejectPrincipal"/> + sign out (idle-timeout — the only sign-out path).</item>
/// <item><see cref="SessionValidationAction.Replace"/> → <see cref="CookieValidatePrincipalContext.ReplacePrincipal"/> + <c>ShouldRenew = true</c> (role mapping refreshed).</item>
/// <item><see cref="SessionValidationAction.Keep"/> → no-op (no refresh due, or a swallowed refresh fault).</item>
/// </list>
/// All logic lives in <see cref="CookieSessionValidator.ValidateAsync"/>, which never
/// throws, so this adapter cannot bubble an exception out into the request pipeline.
/// </summary>
/// <param name="context">The cookie validation context supplied by the middleware.</param>
/// <returns>A task that completes when the decision has been applied.</returns>
internal static async Task OnValidatePrincipalAsync(CookieValidatePrincipalContext context)
{
var validator = context.HttpContext.RequestServices.GetRequiredService<CookieSessionValidator>();
var result = await validator
.ValidateAsync(context.Principal, context.HttpContext.RequestAborted)
.ConfigureAwait(false);
await ApplyValidationResultAsync(context, result).ConfigureAwait(false);
}
/// <summary>
/// Applies a <see cref="SessionValidationResult"/> to a
/// <see cref="CookieValidatePrincipalContext"/>: the pure decision-application
/// step extracted from <see cref="OnValidatePrincipalAsync"/> so it can be
/// exercised in unit tests without a live DI container resolving
/// <see cref="CookieSessionValidator"/>.
/// </summary>
/// <param name="context">The cookie validation context to mutate.</param>
/// <param name="result">The decision produced by <see cref="CookieSessionValidator.ValidateAsync"/>.</param>
/// <returns>A task that completes when the result has been applied.</returns>
internal static async Task ApplyValidationResultAsync(
CookieValidatePrincipalContext context,
SessionValidationResult result)
{
switch (result.Action)
{
case SessionValidationAction.Reject:
// Idle-timeout: drop the principal AND clear the cookie so the next
// request is treated as anonymous and redirected to /login.
context.RejectPrincipal();
await context.HttpContext
.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme)
.ConfigureAwait(false);
break;
case SessionValidationAction.Replace when result.Principal is not null:
// Role mapping refreshed from stored groups — swap in the rebuilt
// principal and re-issue the cookie so the new claims persist.
context.ReplacePrincipal(result.Principal);
context.ShouldRenew = true;
break;
case SessionValidationAction.Keep:
default:
// Leave the principal untouched.
break;
}
}
/// <summary> /// <summary>
/// Registers security-related Akka actors (placeholder for future actor registrations). /// Registers security-related Akka actors (placeholder for future actor registrations).
/// </summary> /// </summary>
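The adapter above only applies a decision; the decision itself is made by `CookieSessionValidator`, whose source is not part of this hunk. The following is a hedged sketch of how such a decision could be derived from the two anchors. The function name `Decide`, the string results, the order of the checks, and the exact comparison operators are all assumptions for illustration, not the actual implementation.

```csharp
using System;

// Hypothetical decision function; the real CookieSessionValidator is not shown here.
// Returns "Reject", "Replace", or "Keep", mirroring SessionValidationAction.
static string Decide(
    DateTimeOffset now,
    DateTimeOffset lastActivity,
    DateTimeOffset lastRoleRefresh,
    TimeSpan idleTimeout,
    TimeSpan refreshThreshold)
{
    // Assumed order: the idle check runs first, so an expired session is
    // rejected rather than refreshed.
    if (now - lastActivity > idleTimeout) return "Reject";

    // Role mapping is stale: rebuild the claims from the stored groups.
    if (now - lastRoleRefresh >= refreshThreshold) return "Replace";

    // Nothing due: leave the principal untouched.
    return "Keep";
}

var t0 = new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero);
var idle = TimeSpan.FromMinutes(30);       // IdleTimeoutMinutes default
var threshold = TimeSpan.FromMinutes(15);  // RoleRefreshThresholdMinutes default

Console.WriteLine(Decide(t0.AddMinutes(10), t0, t0, idle, threshold));
Console.WriteLine(Decide(t0.AddMinutes(20), t0, t0, idle, threshold));
Console.WriteLine(Decide(t0.AddMinutes(31), t0, t0, idle, threshold));
```

Injecting the current time as a parameter is what makes the thresholds testable without waiting, which is the same reason the registration above injects `TimeProvider`.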
@@ -0,0 +1,116 @@
using System.Security.Claims;
using Microsoft.AspNetCore.Authentication.Cookies;
namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary>
/// M2.19 (#15): the single, shared source of truth for the FULL set of claims that
/// back an interactive cookie session. BOTH the <c>/auth/login</c> endpoint and the
/// <c>OnValidatePrincipal</c> mid-session role-refresh path build their principal
/// through <see cref="Build"/>, so the two can never drift — the spec requires the
/// refresh to "rebuild claims identically to /auth/login".
/// </summary>
/// <remarks>
/// The claim shape is exactly what the login endpoint historically minted, plus the
/// two M2.19 additions:
/// <list type="bullet">
/// <item><see cref="ClaimTypes.Name"/> — resolves <c>Identity.Name</c>.</item>
/// <item><see cref="JwtTokenService.DisplayNameClaimType"/> — human display name.</item>
/// <item><see cref="JwtTokenService.UsernameClaimType"/> — canonical username.</item>
/// <item><see cref="JwtTokenService.RoleClaimType"/> — one per mapped role.</item>
/// <item><see cref="JwtTokenService.SiteIdClaimType"/> — one per permitted site,
/// ONLY when the mapping is not system-wide (deny-by-omission preserved).</item>
/// <item><see cref="JwtTokenService.GroupClaimType"/> — one per raw LDAP group
/// (M2.19): the durable input the mid-session RoleMapper re-run consumes.</item>
/// <item><see cref="JwtTokenService.LastRoleRefreshClaimType"/> — the role-mapping
/// refresh anchor (M2.19), ISO-8601 round-trippable.</item>
/// <item><see cref="JwtTokenService.LastActivityClaimType"/> — the idle-timeout
/// anchor; seeded to the refresh timestamp at login so idle-timeout can be
/// enforced consistently from the very first request.</item>
/// </list>
/// The <see cref="ClaimsIdentity"/> is built with <c>nameType = ClaimTypes.Name</c>
/// and <c>roleType = RoleClaimType</c> so <c>Identity.Name</c> / <c>IsInRole</c> /
/// <c>[Authorize(Roles=…)]</c> resolve against exactly the canonical types minted here.
/// </remarks>
public static class SessionClaimBuilder
{
/// <summary>
/// Builds the full cookie-session <see cref="ClaimsPrincipal"/> from the resolved
/// identity, the raw LDAP groups, the DB-backed role mapping, and the refresh
/// timestamp. Used identically by <c>/auth/login</c> and the
/// <c>OnValidatePrincipal</c> refresh path so the two cannot diverge.
/// </summary>
/// <param name="username">The canonical authenticated username (becomes <see cref="ClaimTypes.Name"/> + <see cref="JwtTokenService.UsernameClaimType"/>).</param>
/// <param name="displayName">The human-readable display name.</param>
/// <param name="groups">The user's raw LDAP groups, stored one per <see cref="JwtTokenService.GroupClaimType"/> claim.</param>
/// <param name="mapping">The DB-backed role mapping (roles + permitted sites + system-wide flag).</param>
/// <param name="refreshTimestamp">The role-mapping refresh anchor; also seeds the last-activity anchor.</param>
/// <param name="authenticationType">The authentication type stamped on the identity (defaults to the cookie scheme).</param>
/// <returns>A fully populated cookie <see cref="ClaimsPrincipal"/>.</returns>
public static ClaimsPrincipal Build(
string username,
string displayName,
IReadOnlyList<string> groups,
RoleMappingResult mapping,
DateTimeOffset refreshTimestamp,
string authenticationType = CookieAuthenticationDefaults.AuthenticationScheme)
{
ArgumentNullException.ThrowIfNull(username);
ArgumentNullException.ThrowIfNull(displayName);
ArgumentNullException.ThrowIfNull(groups);
ArgumentNullException.ThrowIfNull(mapping);
var refreshStamp = refreshTimestamp.ToString("o");
var claims = new List<Claim>
{
new(ClaimTypes.Name, username),
new(JwtTokenService.DisplayNameClaimType, displayName),
new(JwtTokenService.UsernameClaimType, username),
// Role-refresh anchor AND idle anchor are seeded from the same instant at
// build time. Because the OnValidatePrincipal refresh path rebuilds the
// principal through this method, a role refresh advances both anchors
// together; a Keep decision leaves both untouched.
new(JwtTokenService.LastRoleRefreshClaimType, refreshStamp),
new(JwtTokenService.LastActivityClaimType, refreshStamp),
};
foreach (var role in mapping.Roles)
{
claims.Add(new Claim(JwtTokenService.RoleClaimType, role));
}
// Deny-by-omission: only stamp SiteId claims for a non-system-wide mapping.
if (!mapping.IsSystemWideDeployment)
{
foreach (var siteId in mapping.PermittedSiteIds)
{
claims.Add(new Claim(JwtTokenService.SiteIdClaimType, siteId));
}
}
// Store the raw LDAP groups so the mid-session refresh can re-run the
// DB-backed RoleMapper without any LDAP round-trip.
foreach (var group in groups)
{
claims.Add(new Claim(JwtTokenService.GroupClaimType, group));
}
var identity = new ClaimsIdentity(
claims,
authenticationType: authenticationType,
nameType: ClaimTypes.Name,
roleType: JwtTokenService.RoleClaimType);
return new ClaimsPrincipal(identity);
}
/// <summary>Reads the stored LDAP group claims (<see cref="JwtTokenService.GroupClaimType"/>) off a principal.</summary>
/// <param name="principal">The cookie principal to read from.</param>
/// <returns>The stored LDAP group names; empty if none were stored.</returns>
public static IReadOnlyList<string> ReadGroups(ClaimsPrincipal principal)
{
ArgumentNullException.ThrowIfNull(principal);
return principal.FindAll(JwtTokenService.GroupClaimType).Select(c => c.Value).ToList();
}
}
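The deny-by-omission rule and the `nameType` / `roleType` wiring in `Build` can be exercised in isolation with `System.Security.Claims`. This is a reduced sketch, not the real builder: `BuildSketch` and the three claim-type strings are hypothetical stand-ins for `SessionClaimBuilder.Build` and the `JwtTokenService` constants, and the timestamp anchors are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Hypothetical claim-type names; the real constants live on JwtTokenService.
const string RoleType = "role";
const string SiteType = "site_id";
const string GroupType = "group";

ClaimsPrincipal BuildSketch(
    string username,
    IReadOnlyList<string> groups,
    IReadOnlyList<string> roles,
    IReadOnlyList<string> siteIds,
    bool isSystemWide)
{
    var claims = new List<Claim> { new(ClaimTypes.Name, username) };
    claims.AddRange(roles.Select(r => new Claim(RoleType, r)));

    // Deny-by-omission: site claims are stamped only for a scoped mapping.
    if (!isSystemWide)
    {
        claims.AddRange(siteIds.Select(s => new Claim(SiteType, s)));
    }

    // Raw groups are stored so a later refresh can re-run the mapping.
    claims.AddRange(groups.Select(g => new Claim(GroupType, g)));

    // nameType / roleType make Identity.Name and IsInRole resolve against
    // exactly the claim types minted above.
    var identity = new ClaimsIdentity(claims, "Cookies", ClaimTypes.Name, RoleType);
    return new ClaimsPrincipal(identity);
}

var scoped = BuildSketch("alice", new[] { "CN=Ops" }, new[] { "Operator" }, new[] { "site-1" }, isSystemWide: false);
var systemWide = BuildSketch("bob", new[] { "CN=Admins" }, new[] { "Admin" }, new[] { "site-1" }, isSystemWide: true);

Console.WriteLine(scoped.IsInRole("Operator"));
Console.WriteLine(systemWide.FindAll(SiteType).Any());
```

Passing the role claim type into the `ClaimsIdentity` constructor is what lets `IsInRole` work with a custom claim name instead of `ClaimTypes.Role`.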
@@ -35,4 +35,10 @@
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.Commons/ZB.MOM.WW.ScadaBridge.Commons.csproj" />
</ItemGroup>
<ItemGroup>
<!-- M2.19 (#15): expose internal members (OnValidatePrincipalAsync adapter) to the
Security test project so the adapter translation can be exercised in isolation. -->
<InternalsVisibleTo Include="ZB.MOM.WW.ScadaBridge.Security.Tests" />
</ItemGroup>
</Project>
@@ -32,10 +32,9 @@ public interface ISiteEventLogger
/// <summary> /// <summary>
/// SiteEventLogging-018: total number of event writes that have failed /// SiteEventLogging-018: total number of event writes that have failed
/// (SQLite error, disk full, bounded-queue overflow drop, etc.) since this /// (SQLite error, disk full, bounded-queue overflow drop, etc.) since this
/// logger was created. Available for future Health Monitoring integration — /// logger was created. Polled by <c>SiteEventLogFailureCountReporter</c>
/// promoted onto the interface so a Health consumer can read it without a /// (HealthMonitoring — M2.16 / #30) every 30 s and surfaced on the site
/// concrete-type downcast. Not yet polled by Health Monitoring; the wiring /// health report as <c>SiteHealthReport.SiteEventLogWriteFailures</c>.
/// is tracked separately.
/// </summary> /// </summary>
long FailedWriteCount { get; } long FailedWriteCount { get; }
} }
@@ -72,6 +72,15 @@ public class AlarmActor : ReceiveActor
private readonly string? _onTriggerScriptName; private readonly string? _onTriggerScriptName;
private readonly Script<object?>? _onTriggerCompiledScript; private readonly Script<object?>? _onTriggerCompiledScript;
/// <summary>
/// M2.5 (#9): the on-trigger script's per-script execution timeout in seconds,
/// or null to use the global default. Forwarded to each spawned
/// <see cref="AlarmExecutionActor"/>, which applies <c>perScript ?? global</c>
/// (treating ≤ 0 as "use global"). The value comes from the referenced
/// on-trigger script's <see cref="ResolvedScript.ExecutionTimeoutSeconds"/>.
/// </summary>
private readonly int? _onTriggerExecutionTimeoutSeconds;
// Expression trigger: compiled expression + the attribute snapshot it
// evaluates against. This field is the single home for the compiled
// expression on the hot path.
@@ -107,6 +116,9 @@ public class AlarmActor : ReceiveActor
/// <param name="serviceProvider">Optional DI service provider used to resolve the optional
/// <see cref="ISiteEventLogger"/> for M1.5 <c>alarm</c> operational events. Fire-and-forget;
/// a logging failure never affects alarm evaluation.</param>
/// <param name="onTriggerExecutionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script
/// execution timeout in seconds (from its <see cref="ResolvedScript.ExecutionTimeoutSeconds"/>),
/// or null/non-positive to use the global default.</param>
public AlarmActor( public AlarmActor(
string alarmName, string alarmName,
string instanceName, string instanceName,
@@ -119,7 +131,9 @@ public class AlarmActor : ReceiveActor
Script<object?>? compiledTriggerExpression = null, Script<object?>? compiledTriggerExpression = null,
IReadOnlyDictionary<string, object?>? initialAttributes = null, IReadOnlyDictionary<string, object?>? initialAttributes = null,
ISiteHealthCollector? healthCollector = null, ISiteHealthCollector? healthCollector = null,
IServiceProvider? serviceProvider = null) IServiceProvider? serviceProvider = null,
// M2.5 (#9): per-script timeout for the on-trigger script (null = global).
int? onTriggerExecutionTimeoutSeconds = null)
{
_alarmName = alarmName;
_instanceName = instanceName;
@@ -135,6 +149,7 @@ public class AlarmActor : ReceiveActor
_priority = alarmConfig.PriorityLevel;
_onTriggerScriptName = alarmConfig.OnTriggerScriptCanonicalName;
_onTriggerCompiledScript = onTriggerCompiledScript;
_onTriggerExecutionTimeoutSeconds = onTriggerExecutionTimeoutSeconds;
_compiledTriggerExpression = compiledTriggerExpression;
// Seed the trigger-expression attribute snapshot from the instance's
@@ -574,7 +589,9 @@ public class AlarmActor : ReceiveActor
_instanceActor, _instanceActor,
_sharedScriptLibrary, _sharedScriptLibrary,
_options, _options,
_logger)); _logger,
// M2.5 (#9): per-script timeout from the on-trigger script (null = global).
_onTriggerExecutionTimeoutSeconds));
Context.ActorOf(props, executionId); Context.ActorOf(props, executionId);
} }
@@ -28,6 +28,7 @@ public class AlarmExecutionActor : ReceiveActor
/// <param name="sharedScriptLibrary">Shared script library providing common utilities.</param> /// <param name="sharedScriptLibrary">Shared script library providing common utilities.</param>
/// <param name="options">Site runtime configuration options, including the execution timeout.</param> /// <param name="options">Site runtime configuration options, including the execution timeout.</param>
/// <param name="logger">Logger for execution diagnostics.</param> /// <param name="logger">Logger for execution diagnostics.</param>
/// <param name="executionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
public AlarmExecutionActor( public AlarmExecutionActor(
string alarmName, string alarmName,
string instanceName, string instanceName,
@@ -38,7 +39,10 @@ public class AlarmExecutionActor : ReceiveActor
IActorRef instanceActor, IActorRef instanceActor,
SharedScriptLibrary sharedScriptLibrary, SharedScriptLibrary sharedScriptLibrary,
SiteRuntimeOptions options, SiteRuntimeOptions options,
ILogger logger) ILogger logger,
// M2.5 (#9): per-script execution timeout override (seconds) for the
// alarm on-trigger script. Null or non-positive falls back to the global.
int? executionTimeoutSeconds = null)
{ {
var self = Self; var self = Self;
var parent = Context.Parent; var parent = Context.Parent;
@@ -46,7 +50,8 @@ public class AlarmExecutionActor : ReceiveActor
ExecuteAlarmScript( ExecuteAlarmScript(
alarmName, instanceName, level, priority, message, alarmName, instanceName, level, priority, message,
compiledScript, instanceActor, compiledScript, instanceActor,
sharedScriptLibrary, options, self, parent, logger); sharedScriptLibrary, options, self, parent, logger,
executionTimeoutSeconds);
} }
private static void ExecuteAlarmScript( private static void ExecuteAlarmScript(
@@ -61,9 +66,15 @@ public class AlarmExecutionActor : ReceiveActor
SiteRuntimeOptions options, SiteRuntimeOptions options,
IActorRef self, IActorRef self,
IActorRef parent, IActorRef parent,
ILogger logger) ILogger logger,
int? executionTimeoutSeconds)
{ {
var timeout = TimeSpan.FromSeconds(options.ScriptExecutionTimeoutSeconds); // M2.5 (#9): per-script timeout overrides the global default. A null or
// non-positive per-script value (≤ 0) falls back to the global.
var timeout = TimeSpan.FromSeconds(
executionTimeoutSeconds is { } perScript && perScript > 0
? perScript
: options.ScriptExecutionTimeoutSeconds);
// SiteRuntime-009: run the alarm on-trigger body on the dedicated // SiteRuntime-009: run the alarm on-trigger body on the dedicated
// script-execution scheduler, not the shared .NET thread pool. // script-execution scheduler, not the shared .NET thread pool.
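The per-script timeout fallback used by `ExecuteAlarmScript` (and, per the comments, by `ScriptExecutionActor`) reduces to one expression. The sketch below isolates it so the null and non-positive cases are easy to check; `ResolveTimeout` is a hypothetical helper name, and the expression is copied from the hunk above rather than from any shared utility.

```csharp
using System;

// Hypothetical helper: a per-script value wins only when it is present and
// positive; null, zero, and negative values fall back to the global setting.
static TimeSpan ResolveTimeout(int? perScriptSeconds, int globalSeconds) =>
    TimeSpan.FromSeconds(
        perScriptSeconds is { } perScript && perScript > 0
            ? perScript
            : globalSeconds);

Console.WriteLine(ResolveTimeout(null, 30)); // no override: global applies
Console.WriteLine(ResolveTimeout(0, 30));    // non-positive: global applies
Console.WriteLine(ResolveTimeout(5, 30));    // positive override wins
```

Note that this is stricter than a plain `perScript ?? global`, which would let a zero or negative override through.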
@@ -895,11 +895,14 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
} }
else else
{ {
// M2.11: set InstanceNotFound=true so the caller can distinguish
// "not deployed on this site" from a deployed-but-empty instance.
_logger.LogWarning( _logger.LogWarning(
"Debug view subscribe for unknown instance {Instance}", request.InstanceUniqueName); "Debug view subscribe for unknown instance {Instance}", request.InstanceUniqueName);
Sender.Tell(new DebugViewSnapshot( Sender.Tell(new DebugViewSnapshot(
request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(), request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(),
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow)); Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow,
InstanceNotFound: true));
} }
} }
@@ -919,11 +922,14 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
} }
else else
{ {
// M2.11: set InstanceNotFound=true so the caller can distinguish
// "not deployed on this site" from a deployed-but-empty instance.
_logger.LogWarning( _logger.LogWarning(
"Debug snapshot for unknown instance {Instance}", request.InstanceUniqueName); "Debug snapshot for unknown instance {Instance}", request.InstanceUniqueName);
Sender.Tell(new DebugViewSnapshot( Sender.Tell(new DebugViewSnapshot(
request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(), request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(),
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow)); Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow,
InstanceNotFound: true));
} }
} }
@@ -754,6 +754,10 @@ public class InstanceActor : ReceiveActor
foreach (var alarm in _configuration.Alarms) foreach (var alarm in _configuration.Alarms)
{ {
Script<object?>? onTriggerScript = null; Script<object?>? onTriggerScript = null;
// M2.5 (#9): the on-trigger script's per-script execution timeout,
// captured from its ResolvedScript so the AlarmExecutionActor can
// apply perScript ?? global. Null when there is no on-trigger script.
int? onTriggerTimeoutSeconds = null;
// Compile on-trigger script if defined // Compile on-trigger script if defined
if (!string.IsNullOrEmpty(alarm.OnTriggerScriptCanonicalName)) if (!string.IsNullOrEmpty(alarm.OnTriggerScriptCanonicalName))
@@ -763,6 +767,7 @@ public class InstanceActor : ReceiveActor
if (triggerScriptDef != null) if (triggerScriptDef != null)
{ {
onTriggerTimeoutSeconds = triggerScriptDef.ExecutionTimeoutSeconds;
var result = _compilationService.Compile( var result = _compilationService.Compile(
$"alarm-trigger-{alarm.CanonicalName}", triggerScriptDef.Code); $"alarm-trigger-{alarm.CanonicalName}", triggerScriptDef.Code);
if (result.IsSuccess) if (result.IsSuccess)
@@ -794,7 +799,9 @@ public class InstanceActor : ReceiveActor
triggerExpression, triggerExpression,
attributeSnapshot, attributeSnapshot,
_healthCollector, _healthCollector,
_serviceProvider)); _serviceProvider,
// M2.5 (#9): per-script timeout for the alarm on-trigger script.
onTriggerTimeoutSeconds));
var actorRef = Context.ActorOf(props, $"alarm-{alarm.CanonicalName}"); var actorRef = Context.ActorOf(props, $"alarm-{alarm.CanonicalName}");
_alarmActors[alarm.CanonicalName] = actorRef; _alarmActors[alarm.CanonicalName] = actorRef;
@@ -43,6 +43,13 @@ public class ScriptActor : ReceiveActor, IWithTimers
private Script<object?>? _compiledScript; private Script<object?>? _compiledScript;
private ScriptTriggerConfig? _triggerConfig; private ScriptTriggerConfig? _triggerConfig;
private TimeSpan? _minTimeBetweenRuns; private TimeSpan? _minTimeBetweenRuns;
/// <summary>
/// M2.5 (#9): the per-script execution timeout in seconds, or null to use the
/// global default. Threaded down to each spawned <see cref="ScriptExecutionActor"/>,
/// which applies <c>perScript ?? global</c> (and treats ≤ 0 as "use global").
/// </summary>
private readonly int? _executionTimeoutSeconds;
private DateTimeOffset _lastExecutionTime = DateTimeOffset.MinValue; private DateTimeOffset _lastExecutionTime = DateTimeOffset.MinValue;
private int _executionCounter; private int _executionCounter;
private readonly Commons.Types.Scripts.ScriptScope _scope; private readonly Commons.Types.Scripts.ScriptScope _scope;
@@ -112,6 +119,7 @@ public class ScriptActor : ReceiveActor, IWithTimers
_healthCollector = healthCollector; _healthCollector = healthCollector;
_serviceProvider = serviceProvider; _serviceProvider = serviceProvider;
_minTimeBetweenRuns = scriptConfig.MinTimeBetweenRuns; _minTimeBetweenRuns = scriptConfig.MinTimeBetweenRuns;
_executionTimeoutSeconds = scriptConfig.ExecutionTimeoutSeconds;
_scope = scriptConfig.Scope; _scope = scriptConfig.Scope;
_compiledTriggerExpression = compiledTriggerExpression; _compiledTriggerExpression = compiledTriggerExpression;
@@ -426,7 +434,9 @@ public class ScriptActor : ReceiveActor, IWithTimers
_serviceProvider, _serviceProvider,
// Audit Log #23 (ParentExecutionId): null for trigger-driven runs; // Audit Log #23 (ParentExecutionId): null for trigger-driven runs;
// an inbound-API-routed call supplies the inbound request's id. // an inbound-API-routed call supplies the inbound request's id.
parentExecutionId)); parentExecutionId,
// M2.5 (#9): per-script timeout override (null = use global).
_executionTimeoutSeconds));
Context.ActorOf(props, executionId); Context.ActorOf(props, executionId);
} }
@@ -47,6 +47,7 @@ public class ScriptExecutionActor : ReceiveActor
/// <param name="healthCollector">Optional health collector for recording execution metrics.</param> /// <param name="healthCollector">Optional health collector for recording execution metrics.</param>
/// <param name="serviceProvider">Optional DI service provider for script execution services.</param> /// <param name="serviceProvider">Optional DI service provider for script execution services.</param>
/// <param name="parentExecutionId">ExecutionId of the spawning inbound-API execution for audit correlation; null for normal runs.</param> /// <param name="parentExecutionId">ExecutionId of the spawning inbound-API execution for audit correlation; null for normal runs.</param>
/// <param name="executionTimeoutSeconds">M2.5 (#9): per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
public ScriptExecutionActor( public ScriptExecutionActor(
string scriptName, string scriptName,
string instanceName, string instanceName,
@@ -65,7 +66,10 @@ public class ScriptExecutionActor : ReceiveActor
// Audit Log #23 (ParentExecutionId): the spawning execution's // Audit Log #23 (ParentExecutionId): the spawning execution's
// ExecutionId for an inbound-API-routed call. Null for normal // ExecutionId for an inbound-API-routed call. Null for normal
// (tag-change / timer) runs and nested Script.Call invocations. // (tag-change / timer) runs and nested Script.Call invocations.
Guid? parentExecutionId = null) Guid? parentExecutionId = null,
// M2.5 (#9): per-script execution timeout override (seconds). Null or
// non-positive falls back to the global ScriptExecutionTimeoutSeconds.
int? executionTimeoutSeconds = null)
{ {
// Immediately begin execution // Immediately begin execution
var self = Self; var self = Self;
@@ -75,7 +79,7 @@ public class ScriptExecutionActor : ReceiveActor
scriptName, instanceName, compiledScript, parameters, callDepth, scriptName, instanceName, compiledScript, parameters, callDepth,
instanceActor, sharedScriptLibrary, options, replyTo, correlationId, instanceActor, sharedScriptLibrary, options, replyTo, correlationId,
self, parent, logger, scope, healthCollector, serviceProvider, self, parent, logger, scope, healthCollector, serviceProvider,
parentExecutionId); parentExecutionId, executionTimeoutSeconds);
} }
private static void ExecuteScript( private static void ExecuteScript(
@@ -95,9 +99,15 @@ public class ScriptExecutionActor : ReceiveActor
Commons.Types.Scripts.ScriptScope scope, Commons.Types.Scripts.ScriptScope scope,
ISiteHealthCollector? healthCollector, ISiteHealthCollector? healthCollector,
IServiceProvider? serviceProvider, IServiceProvider? serviceProvider,
Guid? parentExecutionId) Guid? parentExecutionId,
int? executionTimeoutSeconds)
{ {
var timeout = TimeSpan.FromSeconds(options.ScriptExecutionTimeoutSeconds); // M2.5 (#9): per-script timeout overrides the global default. A null or
// non-positive per-script value (≤ 0) falls back to the global.
var timeout = TimeSpan.FromSeconds(
executionTimeoutSeconds is { } perScript && perScript > 0
? perScript
: options.ScriptExecutionTimeoutSeconds);
// SiteRuntime-009: run the script body on the dedicated script-execution // SiteRuntime-009: run the script body on the dedicated script-execution
// scheduler, not the shared .NET thread pool, so blocking script I/O cannot // scheduler, not the shared .NET thread pool, so blocking script I/O cannot
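Review note (illustrative, not part of this commit): the fallback rule introduced in the hunk above can be restated in isolation. `ResolveTimeout` is a hypothetical helper name used only for this sketch, and it assumes the global setting is a plain integer number of seconds.

```csharp
using System;

// Hypothetical helper mirroring the conditional in ExecuteScript: the
// per-script value is used only when it is non-null and strictly positive;
// otherwise the global default applies.
TimeSpan ResolveTimeout(int? perScriptSeconds, int globalSeconds) =>
    TimeSpan.FromSeconds(perScriptSeconds is { } s && s > 0 ? s : globalSeconds);

Console.WriteLine(ResolveTimeout(null, 30)); // no override: global default applies
Console.WriteLine(ResolveTimeout(0, 30));    // non-positive override: falls back to global
Console.WriteLine(ResolveTimeout(5, 30));    // positive override wins
```

Treating zero and negative values the same as null means a mis-entered override cannot produce a zero-length timeout.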
@@ -207,7 +217,11 @@ public class ScriptExecutionActor : ReceiveActor
// and the four cached-call telemetry constructors can stamp // and the four cached-call telemetry constructors can stamp
// it onto NotificationSubmit.SourceNode and // it onto NotificationSubmit.SourceNode and
// SiteCallOperational.SourceNode respectively. // SiteCallOperational.SourceNode respectively.
sourceNode: sourceNode); sourceNode: sourceNode,
// M2.12 (#25): thread the singleton site event logger so
// recursion-limit violations at CallScript/CallShared emit a
// script Error site event in addition to ILogger.LogError.
siteEventLogger: siteEventLogger);
var globals = new ScriptGlobals var globals = new ScriptGlobals
{ {
@@ -13,6 +13,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit; using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums; using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using AuditEvent = ZB.MOM.WW.Audit.AuditEvent; using AuditEvent = ZB.MOM.WW.Audit.AuditEvent;
using ZB.MOM.WW.ScadaBridge.SiteEventLogging;
using ZB.MOM.WW.ScadaBridge.StoreAndForward; using ZB.MOM.WW.ScadaBridge.StoreAndForward;
namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts; namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
@@ -94,6 +95,13 @@ public class ScriptRuntimeContext
/// </summary> /// </summary>
private readonly string? _sourceScript; private readonly string? _sourceScript;
/// <summary>
/// M2.12 (#25): site event logger for recording recursion-limit violations
/// to the local SQLite event log. Optional — when null the emission is
/// skipped; the existing <c>_logger.LogError</c> + throw path is unchanged.
/// </summary>
private readonly ISiteEventLogger? _siteEventLogger;
/// <summary> /// <summary>
/// Audit Log #23: best-effort emitter for boundary-crossing actions executed /// Audit Log #23: best-effort emitter for boundary-crossing actions executed
/// by the script. Optional — when null the helpers degrade to a no-op audit /// by the script. Optional — when null the helpers degrade to a no-op audit
@@ -179,6 +187,13 @@ public class ScriptRuntimeContext
/// <paramref name="executionId"/>; this only records the spawner. /// <paramref name="executionId"/>; this only records the spawner.
/// </param> /// </param>
/// <param name="sourceNode">Optional cluster node identifier (node-a/node-b) for audit trail stamping.</param> /// <param name="sourceNode">Optional cluster node identifier (node-a/node-b) for audit trail stamping.</param>
/// <param name="siteEventLogger">
/// M2.12 (#25): optional site event logger. When supplied, recursion-limit
/// violations at <c>CallScript</c> and <c>CallShared</c> emit a
/// <c>script</c> Error event in addition to the existing
/// <c>ILogger.LogError</c> + throw. When null the existing behaviour is
/// unchanged; all existing callers and tests remain source-compatible.
/// </param>
public ScriptRuntimeContext( public ScriptRuntimeContext(
IActorRef instanceActor, IActorRef instanceActor,
IActorRef self, IActorRef self,
@@ -199,7 +214,8 @@ public class ScriptRuntimeContext
ICachedCallTelemetryForwarder? cachedForwarder = null, ICachedCallTelemetryForwarder? cachedForwarder = null,
Guid? executionId = null, Guid? executionId = null,
Guid? parentExecutionId = null, Guid? parentExecutionId = null,
string? sourceNode = null) string? sourceNode = null,
ISiteEventLogger? siteEventLogger = null)
{ {
_instanceActor = instanceActor; _instanceActor = instanceActor;
_self = self; _self = self;
@@ -227,6 +243,44 @@ public class ScriptRuntimeContext
// Audit Log #23 (ParentExecutionId): stored verbatim — no `?? NewGuid()` // Audit Log #23 (ParentExecutionId): stored verbatim — no `?? NewGuid()`
// fallback. A non-routed run legitimately has no parent and stays null. // fallback. A non-routed run legitimately has no parent and stays null.
_parentExecutionId = parentExecutionId; _parentExecutionId = parentExecutionId;
// M2.12 (#25): optional — null when not wired (tests / AlarmExecutionActor).
_siteEventLogger = siteEventLogger;
}
/// <summary>
/// M2.12 (#25): fire-and-forget emission of a <c>script</c> Error site event
/// for a recursion-limit violation. Mirrors the call shape used by
/// <c>ScriptExecutionActor</c>'s catch blocks (WP-32 / M1.8). A fault from
/// the site-event logger is observed and logged as a warning (best-effort),
/// via <c>ContinueWith(OnlyOnFaulted)</c> or inline if the task has already
/// faulted; it never blocks or faults the surrounding
/// <c>_logger.LogError</c> + throw path. A null logger is a no-op.
/// </summary>
private void EmitRecursionLimitEventAsync(string msg)
{
if (_siteEventLogger == null)
return;
var source = string.IsNullOrEmpty(_instanceName)
? "recursion-guard"
: $"InstanceScript:{_instanceName}";
var logTask = _siteEventLogger.LogEventAsync("script", "Error", _instanceName, source, msg);
if (!logTask.IsCompleted)
{
logTask.ContinueWith(
t => _logger.LogWarning(t.Exception,
"Site event log write failed for recursion-limit violation on instance '{Instance}'",
_instanceName),
CancellationToken.None,
TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
}
else if (logTask.IsFaulted)
{
_logger.LogWarning(logTask.Exception,
"Site event log write failed for recursion-limit violation on instance '{Instance}'",
_instanceName);
}
} }
/// <summary> /// <summary>
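Review note (illustrative, not part of this commit): the best-effort pattern used by `EmitRecursionLimitEventAsync` can be sketched on its own. `ObserveFault` and the `onFault` callback are hypothetical names standing in for the helper and its `_logger.LogWarning` call; the sketch assumes the caller never awaits the task.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in: observe a fault on a fire-and-forget task without
// awaiting it and without letting the fault propagate to the caller.
void ObserveFault(Task task, Action<Exception> onFault)
{
    if (task.IsFaulted)
    {
        // The task had already faulted when it was returned: report inline.
        onFault(task.Exception!);
    }
    else if (!task.IsCompleted)
    {
        // Still running: attach a continuation that fires only on fault.
        task.ContinueWith(
            t => onFault(t.Exception!),
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
    // A task that completed successfully needs no handling.
}

var faults = new List<string>();
ObserveFault(Task.FromException(new InvalidOperationException("disk full")),
    ex => faults.Add(ex.GetBaseException().Message));
ObserveFault(Task.CompletedTask, ex => faults.Add("unexpected"));
Console.WriteLine(string.Join(", ", faults));
```

Reading `Task.Exception` marks the exception as observed, so a failed write does not later surface as an unobserved task exception.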
@@ -302,6 +356,8 @@ public class ScriptRuntimeContext
var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " + var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " +
$"CallScript('{scriptName}') rejected at depth {nextDepth}."; $"CallScript('{scriptName}') rejected at depth {nextDepth}.";
_logger.LogError(msg); _logger.LogError(msg);
// M2.12 (#25): emit to site event log in addition to ILogger; fire-and-forget.
EmitRecursionLimitEventAsync(msg);
throw new InvalidOperationException(msg); throw new InvalidOperationException(msg);
} }
@@ -464,6 +520,9 @@ public class ScriptRuntimeContext
var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " + var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " +
$"CallShared('{scriptName}') rejected at depth {nextDepth}."; $"CallShared('{scriptName}') rejected at depth {nextDepth}.";
_logger.LogError(msg); _logger.LogError(msg);
// M2.12 (#25): emit to site event log via the parent context's
// helper — single emission path, fire-and-forget.
_context.EmitRecursionLimitEventAsync(msg);
throw new InvalidOperationException(msg); throw new InvalidOperationException(msg);
} }
@@ -1326,9 +1385,20 @@ public class ScriptRuntimeContext
name, trackedId, target, occurredAtUtc, cancellationToken) name, trackedId, target, occurredAtUtc, cancellationToken)
.ConfigureAwait(false); .ConfigureAwait(false);
// M2.3 (#7): the gateway now attempts the write immediately and
// classifies the outcome (mirroring ExternalSystem.CachedCall). The
// result is retained because the immediate paths (WasBuffered=false —
// immediate success OR a synchronous permanent failure) bypass the
// S&F retry loop entirely, so no retry-loop telemetry ever fires.
// This helper must emit the Attempted + CachedResolve terminal rows
// itself, otherwise Tracking.Status(id) would stay Submitted forever
// and the audit log would be missing the terminal lifecycle. The
// WasBuffered=true path is unaffected — the S&F retry loop owns the
// Attempted + Resolve emissions there.
ExternalCallResult? result;
try try
{ {
await _gateway.CachedWriteAsync( result = await _gateway.CachedWriteAsync(
name, sql, parameters, _instanceName, cancellationToken, trackedId, name, sql, parameters, _instanceName, cancellationToken, trackedId,
// Audit Log #23 (ExecutionId Task 4): thread the script // Audit Log #23 (ExecutionId Task 4): thread the script
// execution's ExecutionId + SourceScript so a buffered // execution's ExecutionId + SourceScript so a buffered
@@ -1350,9 +1420,148 @@ public class ScriptRuntimeContext
throw; throw;
} }
// M2.3 (#7): immediate-completion lifecycle — emit the missing
// Attempted + CachedResolve rows when the underlying write resolved
// without engaging the store-and-forward retry loop (immediate
// success or a synchronous permanent failure).
if (result is { WasBuffered: false })
{
await EmitImmediateDbTerminalTelemetryAsync(
name, target, trackedId, result, cancellationToken)
.ConfigureAwait(false);
}
return trackedId; return trackedId;
} }
/// <summary>
/// M2.3 (#7): best-effort emission of the immediate-completion lifecycle
/// for a <c>Database.CachedWrite</c> that resolved without the S&amp;F
/// retry loop — emits an <c>Attempted</c> row then a terminal
/// <c>CachedResolve</c> row (<c>Delivered</c> on success, <c>Failed</c> on
/// a synchronous permanent SQL error). The DB parallel of
/// <see cref="EmitImmediateTerminalTelemetryAsync"/>. Any forwarder
/// failure is logged and swallowed (alog.md §7).
/// </summary>
private async Task EmitImmediateDbTerminalTelemetryAsync(
string connectionName,
string target,
TrackedOperationId trackedId,
ExternalCallResult result,
CancellationToken cancellationToken)
{
if (_cachedForwarder == null)
{
return;
}
var occurredAtUtc = DateTime.UtcNow;
// Status mapping mirrors the API path: success -> Delivered, a
// synchronous permanent failure -> Failed. A transient failure never
// reaches here (WasBuffered=true), so "the immediate attempt failed
// and the operation is done" always means a permanent failure.
var auditTerminalStatus = result.Success ? AuditStatus.Delivered : AuditStatus.Failed;
var operationalTerminalStatus = result.Success ? "Delivered" : "Failed";
// --- Attempted row -------------------------------------------------
CachedCallTelemetry? attempted = TryBuildDbTerminalTelemetry(
connectionName, target, trackedId, occurredAtUtc,
AuditKind.DbWriteCached, AuditStatus.Attempted, "Attempted",
result, isTerminal: false);
if (attempted is not null)
{
try
{
await _cachedForwarder.ForwardAsync(attempted, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-Attempted telemetry forward failed for Database.CachedWrite {Connection} (TrackedOperationId {Id})",
connectionName, trackedId);
}
}
// --- CachedResolve row --------------------------------------------
CachedCallTelemetry? resolve = TryBuildDbTerminalTelemetry(
connectionName, target, trackedId, occurredAtUtc,
AuditKind.CachedResolve, auditTerminalStatus, operationalTerminalStatus,
result, isTerminal: true);
if (resolve is not null)
{
try
{
await _cachedForwarder.ForwardAsync(resolve, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-CachedResolve telemetry forward failed for Database.CachedWrite {Connection} (TrackedOperationId {Id})",
connectionName, trackedId);
}
}
}
/// <summary>
/// Builds one immediate-completion <c>DbOutbound</c> telemetry packet, or
/// returns <c>null</c> (and logs) when construction throws — so a build
/// failure skips emission rather than aborting the script.
/// </summary>
private CachedCallTelemetry? TryBuildDbTerminalTelemetry(
string connectionName,
string target,
TrackedOperationId trackedId,
DateTime occurredAtUtc,
AuditKind kind,
AuditStatus auditStatus,
string operationalStatus,
ExternalCallResult result,
bool isTerminal)
{
try
{
return new CachedCallTelemetry(
Audit: ScadaBridgeAuditEventFactory.Create(
channel: AuditChannel.DbOutbound,
kind: kind,
status: auditStatus,
occurredAtUtc: DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc),
target: target,
correlationId: trackedId.Value,
executionId: _executionId,
parentExecutionId: _parentExecutionId,
sourceSiteId: string.IsNullOrEmpty(_siteId) ? null : _siteId,
sourceInstanceId: _instanceName,
sourceScript: _sourceScript,
errorMessage: result.Success ? null : result.ErrorMessage),
Operational: new SiteCallOperational(
TrackedOperationId: trackedId,
Channel: "DbOutbound",
Target: target,
SourceSite: _siteId,
SourceNode: _sourceNode,
Status: operationalStatus,
RetryCount: 0,
LastError: result.Success ? null : result.ErrorMessage,
HttpStatus: null,
CreatedAtUtc: occurredAtUtc,
UpdatedAtUtc: occurredAtUtc,
TerminalAtUtc: isTerminal ? occurredAtUtc : null));
}
catch (Exception buildEx)
{
_logger.LogWarning(buildEx,
"Failed to build immediate-{Kind} telemetry for Database.CachedWrite {Connection} (TrackedOperationId {Id}) — skipping emission",
kind, connectionName, trackedId);
return null;
}
}
private async Task EmitCachedDbSubmitTelemetryAsync( private async Task EmitCachedDbSubmitTelemetryAsync(
string connectionName, string connectionName,
TrackedOperationId trackedId, TrackedOperationId trackedId,
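Review note (illustrative, not part of this commit): the immediate-completion lifecycle described above reduces to a small decision table. `ImmediateRows` is a hypothetical function, and the string labels stand in for the `AuditKind` / `AuditStatus` values used by the real code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: which telemetry rows the script-side helper emits for a
// cached write, given the outcome flags reported by the gateway.
List<(string Kind, string Status)> ImmediateRows(bool wasBuffered, bool success)
{
    var rows = new List<(string Kind, string Status)>();
    if (wasBuffered)
    {
        // Buffered writes are owned by the store-and-forward retry loop,
        // which emits its own Attempted and resolve rows.
        return rows;
    }

    // Immediate paths bypass the retry loop, so both rows are emitted here.
    rows.Add(("DbWriteCached", "Attempted"));
    rows.Add(("CachedResolve", success ? "Delivered" : "Failed"));
    return rows;
}

foreach (var (kind, status) in ImmediateRows(wasBuffered: false, success: false))
{
    Console.WriteLine($"{kind}: {status}");
}
```

Because a transient failure is always buffered, an unbuffered failure can only be a permanent one, which is why the mapping needs no third terminal status.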
@@ -42,6 +42,13 @@ public class DiffService
s => s.CanonicalName, s => s.CanonicalName,
ScriptsEqual); ScriptsEqual);
// TemplateEngine-018: surface standalone connection endpoint/protocol/
// failover drift. Per-attribute binding changes already show up under
// AttributeChanges, but a connection's own ConfigurationJson /
// BackupConfigurationJson / Protocol / FailoverRetryCount edits do not —
// those only appear here.
var connectionChanges = ComputeConnectionsDiff(oldConfig, newConfig);
return new ConfigurationDiff return new ConfigurationDiff
{ {
InstanceUniqueName = newConfig.InstanceUniqueName, InstanceUniqueName = newConfig.InstanceUniqueName,
@@ -49,7 +56,8 @@ public class DiffService
NewRevisionHash = newRevisionHash, NewRevisionHash = newRevisionHash,
AttributeChanges = attributeChanges, AttributeChanges = attributeChanges,
AlarmChanges = alarmChanges, AlarmChanges = alarmChanges,
ScriptChanges = scriptChanges ScriptChanges = scriptChanges,
ConnectionChanges = connectionChanges
}; };
} }
@@ -133,7 +141,8 @@ public class DiffService
a.TriggerConfiguration == b.TriggerConfiguration && a.TriggerConfiguration == b.TriggerConfiguration &&
a.ParameterDefinitions == b.ParameterDefinitions && a.ParameterDefinitions == b.ParameterDefinitions &&
a.ReturnDefinition == b.ReturnDefinition && a.ReturnDefinition == b.ReturnDefinition &&
a.MinTimeBetweenRuns == b.MinTimeBetweenRuns; a.MinTimeBetweenRuns == b.MinTimeBetweenRuns &&
a.ExecutionTimeoutSeconds == b.ExecutionTimeoutSeconds;
/// <summary> /// <summary>
/// Compares two <see cref="ConnectionConfig"/> instances for equality across /// Compares two <see cref="ConnectionConfig"/> instances for equality across
@@ -159,11 +168,10 @@ public class DiffService
/// TemplateEngine-018: produces a per-connection diff between two flattened /// TemplateEngine-018: produces a per-connection diff between two flattened
/// configurations, emitting Added / Removed / Changed entries keyed by the /// configurations, emitting Added / Removed / Changed entries keyed by the
/// connection name. Mirrors the existing <see cref="ComputeEntityDiff{T}"/> /// connection name. Mirrors the existing <see cref="ComputeEntityDiff{T}"/>
/// shape used for attributes / alarms / scripts but is exposed as a separate /// shape used for attributes / alarms / scripts. Called by
/// method because <see cref="ConfigurationDiff"/> in /// <see cref="ComputeDiff"/> to populate
/// <c>ZB.MOM.WW.ScadaBridge.Commons</c> does not yet carry a <c>ConnectionChanges</c> /// <see cref="ConfigurationDiff.ConnectionChanges"/>, and exposed publicly so
/// slot — the public diff record will be extended in a paired Commons change /// callers can compute connection drift in isolation. A null
/// (this file is the only one in this fix's scope). A null
/// <c>Connections</c> dictionary on either side is treated as the empty map. /// <c>Connections</c> dictionary on either side is treated as the empty map.
/// </summary> /// </summary>
/// <param name="oldConfig">The previously deployed configuration, or null /// <param name="oldConfig">The previously deployed configuration, or null
@@ -830,6 +830,10 @@ public class FlatteningService
ParameterDefinitions = script.ParameterDefinitions, ParameterDefinitions = script.ParameterDefinitions,
ReturnDefinition = script.ReturnDefinition, ReturnDefinition = script.ReturnDefinition,
MinTimeBetweenRuns = script.MinTimeBetweenRuns, MinTimeBetweenRuns = script.MinTimeBetweenRuns,
// M2.5 (#9): per-script timeout rides along on the winning row.
// Scripts inherit/override at whole-row granularity (no per-field
// merge), so this follows the same rule as the script body/MinTime.
ExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds,
Source = source Source = source
}; };
idByName[script.Name] = script.Id; idByName[script.Name] = script.Id;
@@ -83,7 +83,10 @@ public class RevisionHashService
TriggerConfiguration = s.TriggerConfiguration, TriggerConfiguration = s.TriggerConfiguration,
ParameterDefinitions = s.ParameterDefinitions, ParameterDefinitions = s.ParameterDefinitions,
ReturnDefinition = s.ReturnDefinition, ReturnDefinition = s.ReturnDefinition,
MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks,
// M2.5 (#9): include the per-script timeout so a change to it
// is detected as a configuration change (staleness/redeploy).
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds
}) })
.ToList(), .ToList(),
Connections = configuration.Connections is { Count: > 0 } Connections = configuration.Connections is { Count: > 0 }
@@ -244,6 +247,10 @@ public class RevisionHashService
/// </summary> /// </summary>
public string Code { get; init; } = string.Empty; public string Code { get; init; } = string.Empty;
/// <summary> /// <summary>
/// M2.5 (#9): the per-script execution timeout in seconds (null = global).
/// </summary>
public int? ExecutionTimeoutSeconds { get; init; }
/// <summary>
/// Whether the script is locked. /// Whether the script is locked.
/// </summary> /// </summary>
public bool IsLocked { get; init; } public bool IsLocked { get; init; }
@@ -17,7 +17,7 @@ namespace ZB.MOM.WW.ScadaBridge.TemplateEngine;
/// Override granularity: /// Override granularity:
/// - Attributes: Value and Description overridable; DataType and DataSourceReference fixed. /// - Attributes: Value and Description overridable; DataType and DataSourceReference fixed.
/// - Alarms: Priority, TriggerConfiguration, Description, OnTriggerScript overridable; Name and TriggerType fixed. /// - Alarms: Priority, TriggerConfiguration, Description, OnTriggerScript overridable; Name and TriggerType fixed.
/// - Scripts: Code, TriggerConfiguration, MinTimeBetweenRuns, params/return overridable; Name fixed. /// - Scripts: Code, TriggerConfiguration, MinTimeBetweenRuns, ExecutionTimeoutSeconds, params/return overridable; Name fixed.
/// - Lock flag applies to the entire member (attribute/alarm/script). /// - Lock flag applies to the entire member (attribute/alarm/script).
/// </summary> /// </summary>
public static class LockEnforcer public static class LockEnforcer
@@ -687,6 +687,8 @@ public class TemplateService
existing.TriggerType = proposed.TriggerType; existing.TriggerType = proposed.TriggerType;
existing.TriggerConfiguration = proposed.TriggerConfiguration; existing.TriggerConfiguration = proposed.TriggerConfiguration;
existing.MinTimeBetweenRuns = proposed.MinTimeBetweenRuns; existing.MinTimeBetweenRuns = proposed.MinTimeBetweenRuns;
// M2.5 (#9): per-script execution timeout is an overridable field.
existing.ExecutionTimeoutSeconds = proposed.ExecutionTimeoutSeconds;
existing.ParameterDefinitions = proposed.ParameterDefinitions; existing.ParameterDefinitions = proposed.ParameterDefinitions;
existing.ReturnDefinition = proposed.ReturnDefinition; existing.ReturnDefinition = proposed.ReturnDefinition;
existing.IsLocked = proposed.IsLocked; existing.IsLocked = proposed.IsLocked;
@@ -1013,6 +1015,7 @@ public class TemplateService
ParameterDefinitions = script.ParameterDefinitions, ParameterDefinitions = script.ParameterDefinitions,
ReturnDefinition = script.ReturnDefinition, ReturnDefinition = script.ReturnDefinition,
MinTimeBetweenRuns = script.MinTimeBetweenRuns, MinTimeBetweenRuns = script.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds,
IsInherited = true, IsInherited = true,
LockedInDerived = false, LockedInDerived = false,
}); });
@@ -80,6 +80,7 @@ public class SemanticValidator
else else
{ {
ValidateCallParameters(script.CanonicalName, call, sharedParamMap, errors); ValidateCallParameters(script.CanonicalName, call, sharedParamMap, errors);
ValidateCallReturnType(script.CanonicalName, call, sharedReturnMap, errors);
} }
} }
else else
@@ -94,6 +95,7 @@ public class SemanticValidator
else else
{ {
ValidateCallParameters(script.CanonicalName, call, scriptParamMap, errors); ValidateCallParameters(script.CanonicalName, call, scriptParamMap, errors);
ValidateCallReturnType(script.CanonicalName, call, scriptReturnMap, errors);
// Instance scripts cannot call alarm on-trigger scripts // Instance scripts cannot call alarm on-trigger scripts
if (alarmOnTriggerScripts.Contains(call.TargetName)) if (alarmOnTriggerScripts.Contains(call.TargetName))
@@ -262,6 +264,109 @@ public class SemanticValidator
errors.Add(ValidationEntry.Error(ValidationCategory.ParameterMismatch, errors.Add(ValidationEntry.Error(ValidationCategory.ParameterMismatch,
$"Script '{callerName}' calls '{call.TargetName}' with {call.ArgumentCount} arguments but {expectedParams.Count} are expected.", $"Script '{callerName}' calls '{call.TargetName}' with {call.ArgumentCount} arguments but {expectedParams.Count} are expected.",
callerName)); callerName));
// Count mismatch already reported — positional type matching below
// would be misaligned, so don't compound the noise.
return;
}
ValidateArgumentTypes(callerName, call, expectedParams, errors);
}
/// <summary>
/// #21 — Argument-type validation. Compares each positionally-matched call
/// argument expression against the target's declared parameter type and
/// flags only CLEAR cross-category mismatches.
///
/// Conservatism (false-positive avoidance) — a parameter is checked only
/// when BOTH sides are confidently known:
/// <list type="bullet">
/// <item>Declared type must normalize to a known primitive (String, Integer,
/// Float, Boolean). <c>Object</c>/<c>List</c>/unknown declarations accept
/// anything — never flagged.</item>
/// <item>Argument expression type must be inferable from a literal
/// (string/char, integer, decimal, <c>true</c>/<c>false</c>). Variables,
/// member access, method/await chains, <c>null</c>, casts, object/array
/// initializers, and anything else infer to Unknown and are never flagged.</item>
/// <item>Integer⇄Float is treated as compatible (numeric widening) — never
/// flagged.</item>
/// </list>
/// </summary>
private static void ValidateArgumentTypes(
string callerName,
CallTarget call,
List<string> expectedParams,
List<ValidationEntry> errors)
{
// Argument expressions are aligned 1:1 with parameters here (count was
// verified equal by the caller). If the argument text couldn't be split
// (e.g. it wasn't captured), skip silently.
if (call.ArgumentExpressions.Count != expectedParams.Count)
return;
for (var i = 0; i < expectedParams.Count; i++)
{
var declared = NormalizeType(expectedParams[i]);
if (declared is null)
continue; // Object/List/unknown declaration accepts anything.
var actual = InferLiteralType(call.ArgumentExpressions[i]);
if (actual is null)
continue; // Can't confidently infer the argument's type.
if (!IsAssignable(actual.Value, declared.Value))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ParameterMismatch,
$"Script '{callerName}' calls '{call.TargetName}' argument {i + 1} with type '{actual}' but parameter '{expectedParams[i]}' expects '{declared}'.",
callerName));
}
}
}
/// <summary>
/// #20 — Return-type validation. When a call result is assigned directly
/// into a typed local declaration (<c>int x = CallScript(...)</c>,
/// <c>bool b = await CallShared(...)</c>), compares the LHS declared type
/// against the target's declared return type and flags clear mismatches.
///
/// Conservatism (false-positive avoidance) — flagged only when ALL hold:
/// <list type="bullet">
/// <item>The call result is captured by a typed local whose type is a known
/// primitive (so <c>var</c>, <c>object</c>, <c>dynamic</c>, and untyped
/// reuse are never flagged).</item>
/// <item>The call is the WHOLE initializer (optionally preceded by
/// <c>await</c>). If the result feeds an expression / method chain
/// (e.g. <c>(int)(await CallScript(...))</c>, <c>CallScript(...).X</c>)
/// the assigned-type is not captured and nothing is flagged.</item>
/// <item>The target declares a known-primitive return type. Missing/Object/
/// List/unknown returns are never flagged.</item>
/// <item>Integer⇄Float is compatible (numeric widening) — never flagged.</item>
/// </list>
/// </summary>
private static void ValidateCallReturnType(
string callerName,
CallTarget call,
Dictionary<string, string?> returnMap,
List<ValidationEntry> errors)
{
if (call.AssignedToType is null)
return; // Result not captured by a typed local (var/untyped/unused).
var expected = NormalizeType(call.AssignedToType);
if (expected is null)
return; // LHS isn't a known primitive — don't guess.
if (!returnMap.TryGetValue(call.TargetName, out var returnDef))
return;
var actual = NormalizeType(ParseReturnDefinitionType(returnDef));
if (actual is null)
return; // Target's return type unknown/non-primitive.
if (!IsAssignable(actual.Value, expected.Value))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ReturnTypeMismatch,
$"Script '{callerName}' assigns the '{actual}' return value of '{call.TargetName}' to a '{expected}' variable.",
callerName));
} }
} }
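Review note (illustrative, not part of this commit): the conservatism rules shared by the argument-type and return-type checks can be sketched as below. The `Normalize` mapping is an assumption made for this example (the real `NormalizeType` may recognise a different set of raw type strings), and `ShouldFlag` is a hypothetical name; an empty string stands in for "unknown".

```csharp
using System;

// Hypothetical normalisation: map raw type strings onto the four primitive
// categories. Anything unrecognised yields "" meaning "unknown, never flag".
string Normalize(string raw) => raw.Trim().ToLowerInvariant() switch
{
    "string" => "String",
    "integer" or "int" or "int32" or "int64" or "long" => "Integer",
    "float" or "double" or "decimal" or "number" => "Float",
    "boolean" or "bool" => "Boolean",
    _ => ""
};

// Integer and Float are mutually compatible (numeric widening); all other
// categories must match exactly.
bool IsAssignable(string actual, string expected) =>
    actual == expected
    || ((actual is "Integer" or "Float") && (expected is "Integer" or "Float"));

// A mismatch is reported only when BOTH sides are confidently known.
bool ShouldFlag(string actualRaw, string expectedRaw)
{
    var actual = Normalize(actualRaw);
    var expected = Normalize(expectedRaw);
    if (actual == "" || expected == "") return false;
    return !IsAssignable(actual, expected);
}

Console.WriteLine(ShouldFlag("string", "Int32"));  // clear cross-category mismatch
Console.WriteLine(ShouldFlag("integer", "double")); // numeric widening is tolerated
Console.WriteLine(ShouldFlag("List", "boolean"));   // unknown side is never flagged
```

Failing open on unknown types trades some missed mismatches for the absence of false positives, which matches the intent stated in the doc comments above.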
@@ -270,12 +375,90 @@ public class SemanticValidator
var result = new Dictionary<string, List<string>>(StringComparer.Ordinal); var result = new Dictionary<string, List<string>>(StringComparer.Ordinal);
foreach (var script in scripts) foreach (var script in scripts)
{ {
var parameters = ParseParameterDefinitions(script.ParameterDefinitions); // Per-parameter declared TYPE in declared order (raw type strings).
// One entry per parameter, so the existing count check is preserved
// while #21 also has the types it needs for positional matching.
var parameters = ParseParameterTypes(script.ParameterDefinitions);
result[script.CanonicalName] = parameters; result[script.CanonicalName] = parameters;
} }
return result; return result;
} }
/// <summary>
/// Parses a parameter definitions JSON string (JSON Schema or legacy flat
/// array) and returns the declared parameter TYPE for each parameter, in
/// declared order. Names are not needed for positional call validation; the
/// returned count equals the parameter count (preserving the count check).
/// </summary>
/// <param name="parameterDefinitionsJson">JSON Schema or legacy flat-array string; null/empty returns an empty list.</param>
/// <returns>The per-parameter raw type strings (e.g. "Int32", "string", "List").</returns>
internal static List<string> ParseParameterTypes(string? parameterDefinitionsJson)
{
if (string.IsNullOrWhiteSpace(parameterDefinitionsJson))
return [];
try
{
using var doc = JsonDocument.Parse(parameterDefinitionsJson);
// JSON Schema: { type:"object", properties:{ name:{ type:"integer" }, ... } }
if (doc.RootElement.ValueKind == JsonValueKind.Object)
{
if (doc.RootElement.TryGetProperty("properties", out var props)
&& props.ValueKind == JsonValueKind.Object)
{
return props.EnumerateObject()
.Select(p => p.Value.ValueKind == JsonValueKind.Object
&& p.Value.TryGetProperty("type", out var t)
&& t.ValueKind == JsonValueKind.String
? t.GetString() ?? "unknown"
: "unknown")
.ToList();
}
}
// Legacy flat form: [{ name, type, required? }]
else if (doc.RootElement.ValueKind == JsonValueKind.Array)
{
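return doc.RootElement.EnumerateArray()
// Guard non-object elements and non-string types so malformed legacy
// entries map to "unknown" instead of throwing InvalidOperationException.
.Select(e => e.ValueKind == JsonValueKind.Object
&& e.TryGetProperty("type", out var t) && t.ValueKind == JsonValueKind.String
? t.GetString() ?? "unknown" : "unknown")
.ToList();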
}
}
catch (JsonException)
{
}
return [];
}
/// <summary>
/// Extracts the declared return type from a ReturnDefinition JSON string
/// (JSON Schema or legacy form; both carry a top-level <c>type</c> string). Returns
/// null when absent or unparseable.
/// </summary>
/// <param name="returnDefinitionJson">JSON return definition; null/empty returns null.</param>
/// <returns>The raw return type string (e.g. "boolean", "Int32"), or null.</returns>
internal static string? ParseReturnDefinitionType(string? returnDefinitionJson)
{
if (string.IsNullOrWhiteSpace(returnDefinitionJson))
return null;
try
{
using var doc = JsonDocument.Parse(returnDefinitionJson);
if (doc.RootElement.ValueKind == JsonValueKind.Object
&& doc.RootElement.TryGetProperty("type", out var t)
&& t.ValueKind == JsonValueKind.String)
{
return t.GetString();
}
}
catch (JsonException)
{
}
return null;
}
private static Dictionary<string, string?> BuildReturnMap(IReadOnlyList<ResolvedScript> scripts) private static Dictionary<string, string?> BuildReturnMap(IReadOnlyList<ResolvedScript> scripts)
{ {
var result = new Dictionary<string, string?>(StringComparer.Ordinal); var result = new Dictionary<string, string?>(StringComparer.Ordinal);
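Review note (illustrative, not part of this commit): the two accepted shapes of a parameter-definitions string can be exercised with a standalone sketch. `ParameterTypes` and `TypeOf` are hypothetical names; unlike `ParseParameterTypes`, this version does not catch `JsonException`, so it assumes well-formed JSON input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Read a single "type" string from a JSON object, or "unknown" if absent.
string TypeOf(JsonElement e) =>
    e.ValueKind == JsonValueKind.Object
    && e.TryGetProperty("type", out var t)
    && t.ValueKind == JsonValueKind.String
        ? t.GetString() ?? "unknown"
        : "unknown";

// Hypothetical sketch: one raw type string per parameter, in declared order.
List<string> ParameterTypes(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;

    // JSON Schema form: { "type": "object", "properties": { name: { "type": ... } } }
    if (root.ValueKind == JsonValueKind.Object
        && root.TryGetProperty("properties", out var props)
        && props.ValueKind == JsonValueKind.Object)
    {
        return props.EnumerateObject().Select(p => TypeOf(p.Value)).ToList();
    }

    // Legacy flat form: [ { "name": ..., "type": ... }, ... ]
    if (root.ValueKind == JsonValueKind.Array)
    {
        return root.EnumerateArray().Select(TypeOf).ToList();
    }

    return new List<string>();
}

var schema = "{\"type\":\"object\",\"properties\":{\"count\":{\"type\":\"integer\"},\"label\":{\"type\":\"string\"}}}";
var legacy = "[{\"name\":\"count\",\"type\":\"Int32\"},{\"name\":\"items\",\"type\":\"List\"}]";
Console.WriteLine(string.Join(", ", ParameterTypes(schema))); // integer, string
Console.WriteLine(string.Join(", ", ParameterTypes(legacy))); // Int32, List
```

Returning one entry per parameter, even when the type is unknown, keeps the list length equal to the parameter count, so the argument-count check continues to work.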
@@ -353,12 +536,22 @@ public class SemanticValidator
var target = ExtractStringArgument(code, argsStart); var target = ExtractStringArgument(code, argsStart);
if (target != null) if (target != null)
{ {
var argCount = CountArguments(code, argsStart); // First argument is the script name; the rest are the call's
// positional arguments.
var args = SplitCallArguments(code, argsStart);
var argExpressions = args.Count > 1
? args.GetRange(1, args.Count - 1)
: new List<string>();
results.Add(new CallTarget results.Add(new CallTarget
{ {
TargetName = target, TargetName = target,
IsShared = isShared, IsShared = isShared,
ArgumentCount = Math.Max(0, argCount - 1) // First arg is the name, rest are parameters ArgumentCount = argExpressions.Count,
ArgumentExpressions = argExpressions,
// #20: the declared type the result is assigned into, if the
// call is the whole initializer of a typed local declaration.
AssignedToType = ExtractAssignedToType(code, idx)
}); });
} }
@@ -366,6 +559,372 @@ public class SemanticValidator
} }
} }
/// <summary>
/// Splits a call's argument list (starting just after the opening paren)
/// into top-level argument expressions, trimmed. Tracks parenthesis, brace,
/// and bracket nesting plus string/char literals so object initializers,
/// nested calls, collection expressions, and commas inside literals don't
/// produce spurious splits. Element 0 is the script-name argument.
/// </summary>
private static List<string> SplitCallArguments(string code, int startPos)
{
var args = new List<string>();
var depthParen = 1; // we start inside the call's own '('
var depthBraceBracket = 0;
var pos = startPos;
var argStart = startPos;
while (pos < code.Length)
{
var c = code[pos];
switch (c)
{
case '(':
depthParen++;
break;
case ')':
depthParen--;
if (depthParen == 0)
{
AddArg(code, argStart, pos, args);
return args;
}
break;
case '{':
case '[':
depthBraceBracket++;
break;
case '}':
case ']':
if (depthBraceBracket > 0) depthBraceBracket--;
break;
case ',' when depthParen == 1 && depthBraceBracket == 0:
AddArg(code, argStart, pos, args);
argStart = pos + 1;
break;
case '"':
case '\'':
// Skip the literal body so its delimiters/commas are ignored.
pos++;
while (pos < code.Length && code[pos] != c)
{
if (code[pos] == '\\') pos++; // skip escaped char
pos++;
}
break;
case '/':
// Skip C# line and block comments so commas inside them are ignored.
// A `/` inside a string literal is already consumed above, so we only
// reach here for real `/` tokens in code.
if (pos + 1 < code.Length)
{
if (code[pos + 1] == '/')
{
// Line comment: skip to end-of-line.
pos += 2;
while (pos < code.Length && code[pos] != '\n') pos++;
}
else if (code[pos + 1] == '*')
{
// Block comment: skip to closing `*/`.
pos += 2;
while (pos + 1 < code.Length && !(code[pos] == '*' && code[pos + 1] == '/'))
pos++;
if (pos + 1 < code.Length) pos++; // step over the `/`
}
}
break;
}
pos++;
}
// Unterminated call (shouldn't happen for compilable code) — best effort.
AddArg(code, argStart, code.Length, args);
return args;
static void AddArg(string code, int start, int end, List<string> acc)
{
var text = code[start..end].Trim();
// Empty slices after the first argument (e.g. the trailing one in "foo",)
// are dropped. The first slice is always kept, so a name-only call ("foo")
// still yields just the name.
if (text.Length > 0 || acc.Count == 0)
acc.Add(text);
}
}
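The nesting-aware split can be tried in isolation with a simplified sketch. It is not the commit's implementation: `SplitDemo.SplitTopLevel` is a hypothetical helper that takes only the text between the call's parentheses, uses a single depth counter, and does not handle comments or verbatim strings.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class SplitDemo
{
    // Simplified sketch: split on commas at nesting depth 0, skipping
    // ordinary string and char literals. Comments are not handled here.
    public static List<string> SplitTopLevel(string args)
    {
        var parts = new List<string>();
        var depth = 0;
        var start = 0;
        for (var i = 0; i < args.Length; i++)
        {
            var c = args[i];
            if (c == '"' || c == '\'')
            {
                // Advance to the matching closing quote, honouring backslash escapes.
                i++;
                while (i < args.Length && args[i] != c)
                {
                    if (args[i] == '\\') i++;
                    i++;
                }
            }
            else if (c is '(' or '{' or '[') depth++;
            else if (c is ')' or '}' or ']') depth--;
            else if (c == ',' && depth == 0)
            {
                parts.Add(args[start..i].Trim());
                start = i + 1;
            }
        }
        parts.Add(args[Math.Min(start, args.Length)..].Trim());
        return parts;
    }

    public static void Main()
    {
        foreach (var p in SplitTopLevel("\"Calc\", Foo(1, 2), new { A = 1, B = \"x,y\" }"))
            Console.WriteLine(p);
    }
}
```

The sample input yields three arguments: the script name, the nested call, and the object initializer. The commas inside `Foo(1, 2)`, inside the braces, and inside the `"x,y"` literal do not split.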
/// <summary>
/// #20 inference — looks backwards from the call's start index for a typed
/// local declaration whose initializer is exactly this call (optionally
/// preceded by <c>await</c>). The call may be qualified by a simple receiver
/// (<c>Instance.</c>, <c>Scripts.</c>, <c>Parent.</c>,
/// <c>Children["x"].</c>) which is skipped. Returns the declared LHS type
/// token, or null when the result isn't captured by a simple typed local
/// (e.g. <c>var</c>, no assignment, reassignment to an existing variable, or
/// the call is part of a larger expression such as a cast or longer
/// member-access chain).
/// </summary>
private static string? ExtractAssignedToType(string code, int callIndex)
{
// Walk back over a simple dotted receiver immediately before the call —
// e.g. the "Instance." / "Scripts." / "Children[\"x\"]." prefix on a
// qualified call. Only identifier chars, '.', and bracketed indexers
// (with string/identifier contents) are skipped; anything else (a ')',
// an operator, another call's '(') means the call is embedded in a
// larger expression and we must not infer.
var receiverStart = SkipReceiverBackwards(code, callIndex);
// Walk back over whitespace immediately before the receiver/call.
var i = receiverStart - 1;
while (i >= 0 && char.IsWhiteSpace(code[i])) i--;
if (i < 0) return null;
// The call must be the entire RHS: the char before it (after optional
// 'await') must be '='. Anything else (')', '.', '(', operators) means
// the result is consumed by a larger expression — don't infer.
var beforeCall = code[..(i + 1)];
// Strip a trailing 'await' so "= await CallScript(...)" is handled.
var awaitTrimmed = beforeCall.TrimEnd();
if (awaitTrimmed.EndsWith("await", StringComparison.Ordinal)
&& (awaitTrimmed.Length == 5 || !IsIdentifierChar(awaitTrimmed[^6])))
{
beforeCall = awaitTrimmed[..^5];
}
beforeCall = beforeCall.TrimEnd();
if (!beforeCall.EndsWith('=')) return null;
// Exclude comparisons ('==', '!=', '<=', '>=') and compound assignments
// ('+=', '-=', '*=', ...): neither is a plain '=' initializer.
if (beforeCall.Length >= 2)
{
var prev = beforeCall[^2];
if (prev is '=' or '!' or '<' or '>' or '+' or '-' or '*' or '/' or '%' or '&' or '|' or '^')
return null;
}
// Now parse the "<type> <name>" declaration that precedes the '='.
var decl = beforeCall[..^1].TrimEnd();
// Identifier (the variable name).
var end = decl.Length;
var nameEnd = end;
while (nameEnd > 0 && IsIdentifierChar(decl[nameEnd - 1])) nameEnd--;
if (nameEnd == end) return null; // no identifier
var nameStart = nameEnd;
// Whitespace between type and name.
var ws = nameStart;
while (ws > 0 && char.IsWhiteSpace(decl[ws - 1])) ws--;
if (ws == nameStart) return null; // need separating whitespace → "type name"
// The type token (single identifier/keyword — no generics/arrays here;
// those normalize to unknown anyway and stay unflagged).
var typeEnd = ws;
var typeStart = typeEnd;
while (typeStart > 0 && IsIdentifierChar(decl[typeStart - 1])) typeStart--;
if (typeStart == typeEnd) return null;
// Guard against a type token that is itself preceded by another word
// (e.g. a modifier, as in "static int x = ..."). A clean local declaration's
// type token is preceded by a statement boundary or open brace, not by
// another identifier.
if (typeStart > 0)
{
var b = typeStart - 1;
while (b >= 0 && char.IsWhiteSpace(decl[b])) b--;
if (b >= 0 && IsIdentifierChar(decl[b]))
return null; // preceded by another word → not a clean declaration
}
return decl[typeStart..typeEnd];
}
private static bool IsIdentifierChar(char c) => char.IsLetterOrDigit(c) || c == '_';
/// <summary>
/// Given the index of a <c>CallScript</c>/<c>CallShared</c> token, walks
/// backwards over a leading receiver expression composed only of identifier
/// chars, '.', and bracketed indexers (<c>["x"]</c>), and returns the index
/// where that receiver begins. If there is no '.' immediately before the
/// token (an unqualified call) the original index is returned unchanged.
/// Stops at the first character that can't be part of such a simple
/// receiver, so casts/parenthesised/chained-method receivers aren't
/// mistaken for a clean assignment target.
/// </summary>
private static int SkipReceiverBackwards(string code, int callIndex)
{
var i = callIndex - 1;
// Optional whitespace then must be a '.' for there to be a receiver.
while (i >= 0 && char.IsWhiteSpace(code[i])) i--;
if (i < 0 || code[i] != '.') return callIndex;
var start = callIndex;
while (i >= 0)
{
var c = code[i];
if (c == '.' || IsIdentifierChar(c) || char.IsWhiteSpace(c))
{
start = i;
i--;
continue;
}
if (c == ']')
{
// Skip a single (non-nested) indexer "[ ... ]" with string or
// identifier contents — e.g. Children["pump"].
var j = i - 1;
while (j >= 0 && code[j] != '[' && code[j] != '(' && code[j] != ')')
j--;
if (j < 0 || code[j] != '[') return start;
start = j;
i = j - 1;
continue;
}
break;
}
return start;
}
// ── Script-level type vocabulary (#20/#21) ──────────────────────────────
//
// The template scripting "type system" exposed in ParameterDefinitions /
// ReturnDefinition is a small set: String, Integer, Float, Boolean, plus
// Object / List (and arbitrary unrecognised names). Only the four scalar
// primitives below are matched; everything else maps to null ("unknown"),
// which the validators treat as "accept anything / don't flag".
private enum ScriptType { String, Integer, Float, Boolean }
/// <summary>
/// Maps a declared type token (JSON-Schema name, legacy name, or a C# type
/// keyword used on a call-site LHS) onto a <see cref="ScriptType"/>, or null
/// when the type isn't one of the confidently-checkable primitives.
/// </summary>
private static ScriptType? NormalizeType(string? raw)
{
if (string.IsNullOrWhiteSpace(raw)) return null;
return raw.Trim().ToLowerInvariant() switch
{
"string" or "datetime" => ScriptType.String,
"integer" or "int" or "int32" or "int64" or "long" or "short" or "byte" => ScriptType.Integer,
"float" or "double" or "decimal" or "number" or "single" => ScriptType.Float,
"boolean" or "bool" => ScriptType.Boolean,
// Object, List, array, var, dynamic, and anything else → unknown.
_ => null,
};
}
/// <summary>
/// Infers the <see cref="ScriptType"/> of a call-site argument expression,
/// but ONLY for unambiguous literals. Returns null for variables, member
/// access, method/await chains, <c>null</c>, casts, parenthesised/compound
/// expressions, and object/array/collection initializers — those can't be
/// statically typed here and must never be flagged.
/// </summary>
private static ScriptType? InferLiteralType(string expr)
{
expr = expr.Trim();
if (expr.Length == 0) return null;
// String / char literal — but only if the WHOLE expression is the
// literal (so "a" + x or x + "b" stays unknown).
if ((expr[0] == '"' || expr[0] == '\'') && IsWholeStringLiteral(expr))
return ScriptType.String;
if (expr.StartsWith('@') && expr.Length > 1 && expr[1] == '"' && IsWholeStringLiteral(expr[1..]))
return ScriptType.String;
if (expr.StartsWith('$'))
return null; // interpolated string — string-ish, but be conservative.
if (expr is "true" or "false")
return ScriptType.Boolean;
// Numeric literal (optionally signed). Float if it has a '.', 'e'/'E'
// exponent, or a float/double/decimal suffix; otherwise Integer.
if (IsNumericLiteral(expr, out var isFloat))
return isFloat ? ScriptType.Float : ScriptType.Integer;
return null; // Not a literal we can confidently classify.
}
private static bool IsWholeStringLiteral(string expr)
{
if (expr.Length < 2) return false;
var quote = expr[0];
if (quote != '"' && quote != '\'') return false;
var i = 1;
while (i < expr.Length)
{
if (expr[i] == '\\') { i += 2; continue; }
if (expr[i] == quote) return i == expr.Length - 1; // closing quote must be last char
i++;
}
return false;
}
private static bool IsNumericLiteral(string expr, out bool isFloat)
{
isFloat = false;
var i = 0;
if (expr.Length == 0) return false;
if (expr[0] == '+' || expr[0] == '-') i++;
// A genuine numeric literal must start with a digit or a `.` followed by a
// digit. Identifiers that start with `_` or a letter (e.g. `_2`, `count`)
// are explicitly rejected here so they are inferred as Unknown, not Integer.
if (i >= expr.Length) return false;
var first = expr[i];
if (first == '.')
{
if (i + 1 >= expr.Length || !char.IsDigit(expr[i + 1])) return false;
}
else if (!char.IsDigit(first))
{
return false; // starts with `_`, letter, or anything else → not a literal
}
var sawDigit = false;
var sawDot = false;
var sawExp = false;
for (; i < expr.Length; i++)
{
var c = expr[i];
if (char.IsDigit(c)) { sawDigit = true; continue; }
if (c == '_' && sawDigit) continue; // digit separator — only valid between digits
if (c == '.' && !sawDot && !sawExp) { sawDot = true; isFloat = true; continue; }
if ((c == 'e' || c == 'E') && !sawExp && sawDigit)
{
sawExp = true; isFloat = true;
if (i + 1 < expr.Length && (expr[i + 1] == '+' || expr[i + 1] == '-')) i++;
continue;
}
// Numeric suffix terminates the literal.
if (i >= expr.Length - 2) // a one- or two-character suffix must end the literal
{
var suffix = expr[i..].ToLowerInvariant();
switch (suffix)
{
case "f": case "d": case "m": isFloat = true; return sawDigit;
case "l": case "u": case "ul": case "lu": return sawDigit; // integer suffixes
}
}
return false; // any other char → not a plain numeric literal
}
return sawDigit;
}
/// <summary>
/// Whether an argument/return of <paramref name="actual"/> type is
/// acceptable where <paramref name="expected"/> is declared. Exact match, or
/// Integer⇄Float numeric widening. All other cross-category pairings
/// (String↔number, String↔Boolean, Boolean↔number) are mismatches.
/// </summary>
private static bool IsAssignable(ScriptType actual, ScriptType expected)
{
if (actual == expected) return true;
// Numeric widening / narrowing between Integer and Float is tolerated —
// the scripting runtime coerces these and flagging them is noisy.
return (actual == ScriptType.Integer && expected == ScriptType.Float)
|| (actual == ScriptType.Float && expected == ScriptType.Integer);
}
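To make the "unknown never flags" rule concrete, the sketch below combines a trimmed-down normaliser with the assignability check. `TypeCheckDemo`, `Kind`, `Normalize`, and `IsMismatch` are hypothetical names for this illustration, and the type-name table is deliberately shorter than the one in the commit.

```csharp
#nullable enable
using System;

public static class TypeCheckDemo
{
    public enum Kind { String, Integer, Float, Boolean }

    // Hypothetical, shortened normaliser: unrecognised names map to null.
    public static Kind? Normalize(string? raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;
        return raw.Trim().ToLowerInvariant() switch
        {
            "string" => Kind.String,
            "integer" or "int" => Kind.Integer,
            "number" or "float" or "double" => Kind.Float,
            "boolean" or "bool" => Kind.Boolean,
            _ => null,
        };
    }

    // A mismatch is reported only when both sides are known and incompatible;
    // Integer and Float are treated as mutually compatible.
    public static bool IsMismatch(string? declared, string? actual)
    {
        var d = Normalize(declared);
        var a = Normalize(actual);
        if (d is null || a is null) return false;
        if (d == a) return false;
        var bothNumeric = (d is Kind.Integer or Kind.Float) && (a is Kind.Integer or Kind.Float);
        return !bothNumeric;
    }

    public static void Main()
    {
        Console.WriteLine(IsMismatch("integer", "double")); // numeric pair
        Console.WriteLine(IsMismatch("string", "bool"));    // cross-category pair
        Console.WriteLine(IsMismatch("object", "int"));     // unknown declared type
    }
}
```

Only the middle call reports a mismatch. The first is tolerated as numeric coercion, and the third is skipped because `object` is outside the checkable vocabulary.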
private static string? ExtractStringArgument(string code, int startPos) private static string? ExtractStringArgument(string code, int startPos)
{ {
// Skip whitespace // Skip whitespace
@@ -387,43 +946,6 @@ public class SemanticValidator
return code[nameStart..pos]; return code[nameStart..pos];
} }
private static int CountArguments(string code, int startPos)
{
var depth = 1;
var count = 1; // At least one argument (the name)
var pos = startPos;
while (pos < code.Length && depth > 0)
{
switch (code[pos])
{
case '(':
depth++;
break;
case ')':
depth--;
break;
case ',' when depth == 1:
count++;
break;
case '"':
case '\'':
// Skip string literals
var quote = code[pos];
pos++;
while (pos < code.Length && code[pos] != quote)
{
if (code[pos] == '\\') pos++; // Skip escaped chars
pos++;
}
break;
}
pos++;
}
return count;
}
internal record CallTarget internal record CallTarget
{ {
/// <summary>Name of the script being called.</summary> /// <summary>Name of the script being called.</summary>
@@ -432,5 +954,13 @@ public class SemanticValidator
public bool IsShared { get; init; } public bool IsShared { get; init; }
/// <summary>Number of non-name arguments passed to the call.</summary> /// <summary>Number of non-name arguments passed to the call.</summary>
public int ArgumentCount { get; init; } public int ArgumentCount { get; init; }
/// <summary>The trimmed text of each non-name positional argument expression, in order.</summary>
public IReadOnlyList<string> ArgumentExpressions { get; init; } = [];
/// <summary>
/// The declared type token the call result is assigned into, when the
/// call is the whole initializer of a typed local declaration; otherwise
/// null (var/untyped/unused/expression-embedded). Used by #20.
/// </summary>
public string? AssignedToType { get; init; }
} }
} }
@@ -14,7 +14,10 @@ namespace ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
/// 4. Alarm trigger references exist (referenced attributes must be in the flattened config) /// 4. Alarm trigger references exist (referenced attributes must be in the flattened config)
/// 5. Script trigger references exist (referenced attributes must be in the flattened config) /// 5. Script trigger references exist (referenced attributes must be in the flattened config)
/// 6. Expression triggers — blank check, syntax check, and attribute-reference scan /// 6. Expression triggers — blank check, syntax check, and attribute-reference scan
/// 7. Connection binding completeness (all data-sourced attributes must have a binding) /// 7. Connection binding completeness — every data-sourced attribute must have a binding,
/// and (on the deploy path) the bound connection must exist on the target site.
/// Severity is context-dependent: a non-blocking Warning at template design time
/// (bindings are set later) and a deploy-gating Error when enforced (M2.8 / #23).
/// 8. Does NOT verify tag path resolution on devices /// 8. Does NOT verify tag path resolution on devices
/// </summary> /// </summary>
public class ValidationService public class ValidationService
@@ -45,8 +48,44 @@ public class ValidationService
/// </summary> /// </summary>
/// <param name="configuration">The flattened configuration to validate.</param> /// <param name="configuration">The flattened configuration to validate.</param>
/// <param name="sharedScripts">Optional list of shared scripts for validation context.</param> /// <param name="sharedScripts">Optional list of shared scripts for validation context.</param>
/// <param name="alarmCapableConnectionNames">
/// Optional set of site data-connection names whose protocol resolves to an
/// alarm-capable adapter (see
/// <see cref="Commons.Interfaces.Protocol.AlarmCapableProtocols"/>). When supplied,
/// the semantic validator gates every native-alarm-source binding against it.
/// <c>null</c> skips the capability check (its absence makes the check inert).
/// </param>
/// <param name="enforceConnectionBindings">
/// M2.8 (#23): controls the severity of the connection-binding-completeness check.
/// <para>
/// <c>false</c> (default) — template DESIGN-TIME: a data-sourced attribute that is
/// not yet bound produces only a non-blocking <c>Warning</c>. Bindings are set later,
/// at instance/deploy time, so an unbound data-sourced template attribute is legitimate
/// here (see <see cref="ManagementService"/>'s ValidateTemplate path, which builds a
/// config straight from raw template members with no bindings).
/// </para>
/// <para>
/// <c>true</c> — DEPLOY path (<see cref="DeploymentManager"/>'s FlatteningPipeline):
/// an unbound data-sourced attribute becomes a deploy-gating <c>Error</c> (IsValid false),
/// and — when <paramref name="siteConnectionNames"/> is supplied — a binding pointing at a
/// connection that does not exist on the target site is also an <c>Error</c>.
/// </para>
/// </param>
/// <param name="siteConnectionNames">
/// M2.8 (#23): optional set of the data-connection names that actually exist on the
/// target site (computed by the deploy pipeline from the site's loaded connections,
/// mirroring <paramref name="alarmCapableConnectionNames"/>). When supplied (and
/// <paramref name="enforceConnectionBindings"/> is <c>true</c>), every bound
/// connection is checked against this set so a binding to a phantom/stale connection
/// is caught. <c>null</c> skips the "exists at site" half (it stays inert).
/// </param>
/// <returns>A merged <see cref="ValidationResult"/> aggregating all pipeline stage outcomes.</returns> /// <returns>A merged <see cref="ValidationResult"/> aggregating all pipeline stage outcomes.</returns>
public ValidationResult Validate(FlattenedConfiguration configuration, IReadOnlyList<ResolvedScript>? sharedScripts = null) public ValidationResult Validate(
FlattenedConfiguration configuration,
IReadOnlyList<ResolvedScript>? sharedScripts = null,
IReadOnlySet<string>? alarmCapableConnectionNames = null,
bool enforceConnectionBindings = false,
IReadOnlySet<string>? siteConnectionNames = null)
{ {
ArgumentNullException.ThrowIfNull(configuration); ArgumentNullException.ThrowIfNull(configuration);
@@ -58,8 +97,8 @@ public class ValidationService
ValidateAlarmTriggerReferences(configuration), ValidateAlarmTriggerReferences(configuration),
ValidateScriptTriggerReferences(configuration), ValidateScriptTriggerReferences(configuration),
ValidateExpressionTriggers(configuration), ValidateExpressionTriggers(configuration),
ValidateConnectionBindingCompleteness(configuration), ValidateConnectionBindingCompleteness(configuration, enforceConnectionBindings, siteConnectionNames),
_semanticValidator.Validate(configuration, sharedScripts) _semanticValidator.Validate(configuration, sharedScripts, alarmCapableConnectionNames)
}; };
return ValidationResult.Merge(results.ToArray()); return ValidationResult.Merge(results.ToArray());
@@ -497,21 +536,88 @@ public class ValidationService
} }
/// <summary> /// <summary>
/// Validates that all data-sourced attributes have connection bindings. /// Validates connection bindings on data-sourced attributes. Only DATA-SOURCED
/// attributes (<see cref="ResolvedAttribute.DataSourceReference"/> != <c>null</c>)
/// require a binding; static attributes are never flagged.
///
/// M2.8 (#23): the severity is context-dependent (see <paramref name="enforce"/>).
/// At template design time (<c>enforce == false</c>) an unbound data-sourced
/// attribute is legitimate (bindings are set later) so it is only a non-blocking
/// <c>Warning</c>. On the deploy path (<c>enforce == true</c>) an unbound
/// data-sourced attribute is a deploy-gating <c>Error</c>, and — when
/// <paramref name="siteConnectionNames"/> is supplied — a binding to a connection
/// that does not exist on the target site is also an <c>Error</c>.
/// </summary> /// </summary>
/// <param name="configuration">The flattened configuration to validate.</param> /// <param name="configuration">The flattened configuration to validate.</param>
/// <returns>A <see cref="ValidationResult"/> with warnings for each data-sourced attribute that lacks a connection binding.</returns> /// <param name="enforce">
public static ValidationResult ValidateConnectionBindingCompleteness(FlattenedConfiguration configuration) /// <c>true</c> on the deploy path (unbound → Error + "exists at site" check);
/// <c>false</c> at design time (unbound → Warning only). Defaults to <c>false</c>
/// so design-time validation stays non-blocking.
/// </param>
/// <param name="siteConnectionNames">
/// Optional set of data-connection names that actually exist on the target site.
/// When non-<c>null</c> and <paramref name="enforce"/> is <c>true</c>, every bound
/// connection name is checked against this set. <c>null</c> skips the "exists at
/// site" check.
/// </param>
/// <returns>A <see cref="ValidationResult"/> with the binding findings at the appropriate severity.</returns>
public static ValidationResult ValidateConnectionBindingCompleteness(
FlattenedConfiguration configuration,
bool enforce = false,
IReadOnlySet<string>? siteConnectionNames = null)
{ {
var errors = new List<ValidationEntry>(); var errors = new List<ValidationEntry>();
var warnings = new List<ValidationEntry>(); var warnings = new List<ValidationEntry>();
foreach (var attr in configuration.Attributes) foreach (var attr in configuration.Attributes)
{ {
if (attr.DataSourceReference != null && attr.BoundDataConnectionId == null) // Only data-sourced attributes participate in binding validation.
if (attr.DataSourceReference == null)
continue;
if (attr.BoundDataConnectionId == null)
{ {
warnings.Add(ValidationEntry.Warning(ValidationCategory.ConnectionBinding, // Unbound data-sourced attribute. At deploy time this gates the
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.", // deployment; at design time the binding is set later, so it is
// only advisory.
//
// NOTE: this branch fires for TWO distinct cases that are
// indistinguishable post-flattening:
// 1. The user genuinely never set a binding.
// 2. The user set a binding, but FlatteningService.ApplyConnectionBindings
// silently dropped it because the stored DataConnectionId no longer
// resolves to any loaded site DataConnection (i.e. the connection was
// deleted after the binding was created). In that case the flattener
// leaves BoundDataConnectionId == null, and the attribute falls into
// this same "unbound → Error" path.
// The error message covers both cases; no behavioral change is needed.
if (enforce)
{
errors.Add(ValidationEntry.Error(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.",
attr.CanonicalName));
}
else
{
warnings.Add(ValidationEntry.Warning(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.",
attr.CanonicalName));
}
// Skip the "exists at site" check below — it only applies to bound attributes.
continue;
}
// The attribute IS bound. On the deploy path, verify the bound connection
// actually exists on the target site (resolve against the site's connection
// set, not just name presence in the config). A binding pointing at a
// non-existent/stale site connection is a deploy-gating Error.
if (enforce && siteConnectionNames != null &&
attr.BoundDataConnectionName != null &&
!siteConnectionNames.Contains(attr.BoundDataConnectionName))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' is bound to data connection '{attr.BoundDataConnectionName}' " +
"which does not exist on the target site.",
attr.CanonicalName)); attr.CanonicalName));
} }
} }
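The design-time versus deploy-time severity split described in the loop above can be sketched in isolation. This is a simplified model under stated assumptions: `BindingSeverityDemo`, its `Attr` record, and the string findings are hypothetical stand-ins for `ResolvedAttribute` and `ValidationEntry`, not the real types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class BindingSeverityDemo
{
    // Hypothetical, simplified attribute shape (not the real ResolvedAttribute).
    public record Attr(string Name, string? DataSource, string? BoundConnection);

    public static List<string> Check(
        IEnumerable<Attr> attrs, bool enforce, ISet<string>? siteConnections)
    {
        var findings = new List<string>();
        var severity = enforce ? "Error" : "Warning";
        foreach (var a in attrs)
        {
            if (a.DataSource is null) continue; // static attribute: never flagged
            if (a.BoundConnection is null)
            {
                findings.Add($"{severity}: '{a.Name}' has no connection binding");
                continue; // the existence check only applies to bound attributes
            }
            if (enforce && siteConnections is not null
                && !siteConnections.Contains(a.BoundConnection))
            {
                findings.Add($"Error: '{a.Name}' is bound to missing connection '{a.BoundConnection}'");
            }
        }
        return findings;
    }

    public static void Main()
    {
        var attrs = new[]
        {
            new Attr("Speed", "ns=2;s=Speed", null),
            new Attr("Level", "ns=2;s=Level", "plc9"),
            new Attr("Label", null, null),
        };
        var site = new HashSet<string> { "plc1" };
        foreach (var f in Check(attrs, enforce: false, site)) Console.WriteLine(f);
        foreach (var f in Check(attrs, enforce: true, site)) Console.WriteLine(f);
    }
}
```

With `enforce: false` the sample produces a single warning for the unbound attribute. With `enforce: true` the same attribute becomes an error and the binding to a connection absent from the site set is reported as a second error.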
@@ -2339,6 +2339,7 @@ public sealed class BundleImporter : IBundleImporter
ParameterDefinitions = s.ParameterDefinitions, ParameterDefinitions = s.ParameterDefinitions,
ReturnDefinition = s.ReturnDefinition, ReturnDefinition = s.ReturnDefinition,
MinTimeBetweenRuns = s.MinTimeBetweenRuns, MinTimeBetweenRuns = s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds,
Source = "Template", Source = "Template",
}); });
} }
@@ -99,7 +99,10 @@ public sealed record TemplateScriptDto(
string? ParameterDefinitions, string? ParameterDefinitions,
string? ReturnDefinition, string? ReturnDefinition,
bool IsLocked, bool IsLocked,
TimeSpan? MinTimeBetweenRuns); TimeSpan? MinTimeBetweenRuns,
// M2.5 (#9): per-script execution timeout (seconds). Additive trailing field;
// null on bundles written before this field existed.
int? ExecutionTimeoutSeconds = null);
public sealed record TemplateCompositionDto( public sealed record TemplateCompositionDto(
string InstanceName, string InstanceName,
@@ -74,7 +74,8 @@ public sealed class EntitySerializer
ParameterDefinitions: s.ParameterDefinitions, ParameterDefinitions: s.ParameterDefinitions,
ReturnDefinition: s.ReturnDefinition, ReturnDefinition: s.ReturnDefinition,
IsLocked: s.IsLocked, IsLocked: s.IsLocked,
MinTimeBetweenRuns: s.MinTimeBetweenRuns)).ToList(), MinTimeBetweenRuns: s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds: s.ExecutionTimeoutSeconds)).ToList(),
Compositions: t.Compositions.Select(c => new TemplateCompositionDto( Compositions: t.Compositions.Select(c => new TemplateCompositionDto(
InstanceName: c.InstanceName, InstanceName: c.InstanceName,
ComposedTemplateName: templateNameById.TryGetValue(c.ComposedTemplateId, out var cn) ? cn : string.Empty)).ToList()); ComposedTemplateName: templateNameById.TryGetValue(c.ComposedTemplateId, out var cn) ? cn : string.Empty)).ToList());
@@ -227,6 +228,7 @@ public sealed class EntitySerializer
ReturnDefinition = s.ReturnDefinition, ReturnDefinition = s.ReturnDefinition,
IsLocked = s.IsLocked, IsLocked = s.IsLocked,
MinTimeBetweenRuns = s.MinTimeBetweenRuns, MinTimeBetweenRuns = s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds,
}); });
} }
return t; return t;
@@ -1,4 +1,5 @@
using System.Security.Claims; using System.Security.Claims;
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Security; using ZB.MOM.WW.ScadaBridge.Security;
using Bunit; using Bunit;
using Microsoft.AspNetCore.Components.Authorization; using Microsoft.AspNetCore.Components.Authorization;
@@ -12,7 +13,10 @@ using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates; using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories; using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services; using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums; using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Deployment;
using ZB.MOM.WW.ScadaBridge.Communication; using ZB.MOM.WW.ScadaBridge.Communication;
using ZB.MOM.WW.ScadaBridge.DeploymentManager; using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.CentralUI.Components.Shared; using ZB.MOM.WW.ScadaBridge.CentralUI.Components.Shared;
@@ -292,6 +296,90 @@ public class TopologyPageTests : BunitContext
Assert.Throws<Bunit.MissingEventHandlerException>(() => instanceLabel.DoubleClick()); Assert.Throws<Bunit.MissingEventHandlerException>(() => instanceLabel.DoubleClick());
} }
[Fact]
public void Diff_ConnectionEndpointChange_RendersConnectionSection()
{
// TemplateEngine-018 / DeploymentManager-018: a standalone connection
// endpoint edit (no per-attribute binding change) must surface in the
// deployment-diff modal. Before ConnectionChanges was wired through
// ComputeDiff + the UI, this redeploy showed only the stale-hash badge
// with no indication that the connection endpoint had moved.
// The DiffDialog body-scroll lock + focus call out to JS interop on
// open; loose mode no-ops the handlers we don't explicitly set up.
JSInterop.Mode = JSRuntimeMode.Loose;
var areasBySite = new Dictionary<int, IReadOnlyList<Area>>
{
[1] = new List<Area> { new("Line-1") { Id = 10, SiteId = 1 } }
};
SeedRepos(
sites: new[] { new Site("Plant-A", "plant-a") { Id = 1 } },
instances: new[]
{
new Instance("Pump-001") { Id = 100, SiteId = 1, AreaId = 10, State = InstanceState.Enabled }
},
areasBySite: areasBySite);
// Deployed snapshot: connection "plc1" points at host-a.
var deployedConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Pump-001",
Connections = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-a:4840\"}",
FailoverRetryCount = 3,
}
}
};
_deployRepo.GetDeployedSnapshotByInstanceIdAsync(100, Arg.Any<CancellationToken>())
.Returns(Task.FromResult<DeployedConfigSnapshot?>(
new DeployedConfigSnapshot("dep-1", "hash-old",
JsonSerializer.Serialize(deployedConfig))));
// Current template-derived config: same connection now points at host-b.
var currentConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Pump-001",
Connections = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-b:4840\"}",
FailoverRetryCount = 3,
}
}
};
_pipeline.FlattenAndValidateAsync(100, Arg.Any<CancellationToken>())
.Returns(Task.FromResult(Result<FlatteningPipelineResult>.Success(
new FlatteningPipelineResult(currentConfig, "hash-new", ValidationResult.Success()))));
var cut = Render<TopologyPage>();
FindToggleForLabel(cut, "Plant-A")!.Click();
FindToggleForLabel(cut, "Line-1")!.Click();
// The per-node action menu only renders after a context-menu (right
// click) on the instance row, so open it first, then click "Diff".
var instanceRow = cut.FindAll(".tv-row")
.First(row => row.QuerySelector(".tv-label")?.TextContent == "Pump-001");
instanceRow.ContextMenu();
var diffButton = cut.FindAll("button.dropdown-item")
.First(b => b.TextContent.Trim() == "Diff");
diffButton.Click();
var markup = cut.Markup;
Assert.Contains("Connections", markup);
Assert.Contains("plc1", markup);
Assert.Contains("host-a", markup);
Assert.Contains("host-b", markup);
// The change is a modification, so the row carries the "Changed" badge.
Assert.Contains("Changed", markup);
}
[Fact] [Fact]
public void LegacyInstancesRoute_IsDeclaredOnTopologyPage() public void LegacyInstancesRoute_IsDeclaredOnTopologyPage()
{ {
@@ -60,6 +60,50 @@ public class DebugStreamBridgeActorTests : TestKit
return new TestContext(actor, commProbe, mockClient, events, terminated); return new TestContext(actor, commProbe, mockClient, events, terminated);
} }
[Fact]
public void On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Tears_Down_Stream_And_Terminates()
{
// M2.11 (revised for M2.18 stream-first): the gRPC subscription is now opened
// up-front in PreStart, so when the site reports InstanceNotFound=true the
// bridge actor must
// (a) forward the not-found snapshot to _onEvent so DebugStreamService's TCS
// resolves and the caller can inspect the flag,
// (b) tear DOWN the already-opened gRPC stream (Unsubscribe the just-opened
// correlation) rather than enter pass-through, and
// (c) stop itself cleanly.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>(); // initial subscribe envelope
// Stream-first: the gRPC subscription is opened before the snapshot arrives.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var notFoundSnapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow,
InstanceNotFound: true);
Watch(ctx.BridgeActor);
ctx.BridgeActor.Tell(notFoundSnapshot);
// (a) _onEvent must receive the not-found snapshot
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
var received = Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.True(received.InstanceNotFound);
}
// (b) the just-opened gRPC stream is torn down (not left running / no pass-through)
AwaitCondition(() => ctx.MockGrpcClient.UnsubscribedCorrelationIds.Contains("corr-1"),
TimeSpan.FromSeconds(3));
// (c) actor terminates cleanly
ExpectTerminated(ctx.BridgeActor, TimeSpan.FromSeconds(3));
}
[Fact]
public void PreStart_Sends_SubscribeDebugViewRequest_Via_ClusterClient()
{
@@ -94,11 +138,18 @@ public class DebugStreamBridgeActorTests : TestKit
}
[Fact]
- public void On_Snapshot_Opens_GrpcStream()
+ public void On_Snapshot_Does_Not_Open_Additional_GrpcStream()
{
// M2.18 stream-first: the gRPC subscription is opened in PreStart, BEFORE the
// snapshot arrives. After the snapshot is delivered the actor switches to
// pass-through — it must NOT open a second subscription. Exactly ONE subscribe
// call should have been made (the PreStart one).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
// Verify the stream is already open before the snapshot.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
@@ -107,11 +158,12 @@ public class DebugStreamBridgeActorTests : TestKit
ctx.BridgeActor.Tell(snapshot);
- AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
- var call = ctx.MockGrpcClient.SubscribeCalls[0];
- Assert.Equal("corr-1", call.CorrelationId);
- Assert.Equal(InstanceName, call.InstanceUniqueName);
+ // After snapshot delivery, still exactly ONE subscribe — no additional stream opened.
+ AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
+ TimeSpan.FromSeconds(3));
+ var singleCall = Assert.Single(ctx.MockGrpcClient.SubscribeCalls);
+ Assert.Equal("corr-1", singleCall.CorrelationId);
+ Assert.Equal(InstanceName, singleCall.InstanceUniqueName);
}
[Fact] [Fact]
@@ -348,6 +400,369 @@ public class DebugStreamBridgeActorTests : TestKit
Assert.Equal("corr-1", factory.ClientFor(GrpcNodeB).SubscribeCalls[0].CorrelationId);
}
// ---------------------------------------------------------------------
// M2.18 (#26) — stream-first + replay/dedup
// ---------------------------------------------------------------------
[Fact]
public void PreStart_Opens_GrpcStream_Before_Snapshot_Arrives()
{
// M2.18: the gRPC subscription must be opened in PreStart (stream-first),
// BEFORE the snapshot is delivered, so live events start flowing during the
// snapshot-build + network-transit window. The old lifecycle opened the
// stream only after the snapshot arrived, losing gap-window events.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>(); // initial subscribe envelope
// No snapshot sent yet — the stream must already be open.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
Assert.Equal("corr-1", ctx.MockGrpcClient.SubscribeCalls[0].CorrelationId);
Assert.Equal(InstanceName, ctx.MockGrpcClient.SubscribeCalls[0].InstanceUniqueName);
// _onEvent must NOT have fired — buffering, not delivering.
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
}
[Fact]
public void GapWindow_Event_Buffered_Before_Snapshot_Is_Delivered_Exactly_Once_After_Snapshot()
{
// M2.18: an event arriving DURING the snapshot window (before the snapshot
// is delivered) is buffered, then flushed exactly once AFTER the snapshot.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
// Live event arrives BEFORE the snapshot — its entity is NOT in the snapshot,
// so it is a genuine gap-window event that must survive.
var gapEvent = new AttributeValueChanged(InstanceName, "IO", "Pressure", 99.9, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(gapEvent);
// While buffering, _onEvent has not fired.
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
// snapshot then the buffered gap-window event, exactly once, in that order.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
Assert.Equal("Pressure", flushed.AttributeName);
}
}
[Fact]
public void Buffered_Event_Already_Reflected_In_Snapshot_Is_Dropped()
{
// M2.18 dedup: a buffered event whose entity is in the snapshot with an equal
// or newer snapshot timestamp (buffered.Timestamp <= snapshot.Timestamp) is
// already reflected and must be DROPPED.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var t0 = DateTimeOffset.UtcNow;
// Buffered event for "Temp" at t0.
var buffered = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", t0);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(buffered);
// Snapshot already contains "Temp" at the SAME timestamp t0 → buffered is a dup.
var snapAttr = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", t0);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged> { snapAttr },
new List<AlarmStateChanged>(),
t0);
ctx.BridgeActor.Tell(snapshot);
// Only the snapshot is delivered; the buffered duplicate is dropped.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Give a beat to ensure no extra (dropped) event sneaks through.
Thread.Sleep(200);
lock (ctx.ReceivedEvents)
{
Assert.Single(ctx.ReceivedEvents);
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
}
}
[Fact]
public void Buffered_Event_Strictly_Newer_Than_Snapshot_Entity_Is_Delivered()
{
// M2.18 dedup: a buffered event strictly newer than the snapshot's entry for
// the same entity (buffered.Timestamp > snapshot.Timestamp) is NOT a dup and
// must be DELIVERED after the snapshot.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapTime = DateTimeOffset.UtcNow;
var newerTime = snapTime.AddMilliseconds(1);
// Buffered event for "Temp" strictly NEWER than the snapshot's "Temp".
var buffered = new AttributeValueChanged(InstanceName, "IO", "Temp", 50.0, "Good", newerTime);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(buffered);
var snapAttr = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", snapTime);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged> { snapAttr },
new List<AlarmStateChanged>(),
snapTime);
ctx.BridgeActor.Tell(snapshot);
// snapshot then the strictly-newer buffered event.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
Assert.Equal(50.0, flushed.Value);
Assert.Equal(newerTime, flushed.Timestamp);
}
}
[Fact]
public void Buffered_Alarm_Dedup_Uses_AlarmIdentity_And_Timestamp()
{
// M2.18 dedup for alarms: identity = (instance, alarm name, source reference).
// A buffered alarm older-or-equal to the snapshot's same-identity alarm is
// dropped; a strictly-newer one is delivered.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var t0 = DateTimeOffset.UtcNow;
// Buffered: "PumpFault" at t0 (dup) and "Overheat" at t0+1ms (newer, deliver).
var dupAlarm = new AlarmStateChanged(InstanceName, "PumpFault",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 500, t0);
var newerAlarm = new AlarmStateChanged(InstanceName, "Overheat",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 700, t0.AddMilliseconds(1));
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(dupAlarm);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(newerAlarm);
// Snapshot contains BOTH "PumpFault" and "Overheat" at t0.
var snapPumpFault = new AlarmStateChanged(InstanceName, "PumpFault",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 500, t0);
var snapOverheat = new AlarmStateChanged(InstanceName, "Overheat",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Normal, 0, t0);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged> { snapPumpFault, snapOverheat },
t0);
ctx.BridgeActor.Tell(snapshot);
// snapshot + only the strictly-newer "Overheat" alarm (PumpFault dropped).
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
Thread.Sleep(200);
lock (ctx.ReceivedEvents)
{
Assert.Equal(2, ctx.ReceivedEvents.Count);
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AlarmStateChanged>(ctx.ReceivedEvents[1]);
Assert.Equal("Overheat", flushed.AlarmName);
Assert.Equal(700, flushed.Priority);
}
}
[Fact]
public void Buffered_Events_Flushed_In_Arrival_Order()
{
// M2.18: ordering preserved across multiple buffered events (none are dups —
// their entities are absent from the snapshot).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var baseTime = DateTimeOffset.UtcNow;
var sub = ctx.MockGrpcClient.SubscribeCalls[0];
sub.OnEvent(new AttributeValueChanged(InstanceName, "IO", "A", 1, "Good", baseTime));
sub.OnEvent(new AlarmStateChanged(InstanceName, "AlarmX",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 100, baseTime));
sub.OnEvent(new AttributeValueChanged(InstanceName, "IO", "B", 2, "Good", baseTime));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
baseTime);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 4; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.Equal("A", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
Assert.Equal("AlarmX", Assert.IsType<AlarmStateChanged>(ctx.ReceivedEvents[2]).AlarmName);
Assert.Equal("B", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[3]).AttributeName);
}
}
[Fact]
public void PassThrough_After_Flush_Delivers_Subsequent_Events_Immediately()
{
// M2.18: after the snapshot+flush the actor switches to pass-through — later
// events go straight to _onEvent (no buffering, no dup).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Post-snapshot event — must be delivered immediately, exactly once.
var postEvent = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(postEvent);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
}
}
[Fact]
public void InstanceNotFound_After_StreamFirst_Tears_Down_Stream_And_Does_Not_PassThrough()
{
// M2.18 + M2.11: stream-first means the gRPC subscription is already open
// when an InstanceNotFound snapshot arrives. The bridge must tear that stream
// down (Unsubscribe the just-opened correlation), deliver the not-found
// snapshot, NOT enter pass-through, and stop cleanly.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
// Stream opened up-front (stream-first).
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var notFoundSnapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow,
InstanceNotFound: true);
Watch(ctx.BridgeActor);
ctx.BridgeActor.Tell(notFoundSnapshot);
// Not-found snapshot delivered.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.True(Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]).InstanceNotFound);
}
// The just-opened stream must be torn down.
AwaitCondition(() => ctx.MockGrpcClient.UnsubscribedCorrelationIds.Contains("corr-1"),
TimeSpan.FromSeconds(3));
// Stops cleanly.
ExpectTerminated(ctx.BridgeActor, TimeSpan.FromSeconds(3));
// No pass-through: an event arriving after the stop is not delivered.
var late = new AttributeValueChanged(InstanceName, "IO", "Temp", 1, "Good", DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(late);
Thread.Sleep(200);
lock (ctx.ReceivedEvents) { Assert.Single(ctx.ReceivedEvents); }
}
[Fact]
public void Reconnect_During_Buffering_Phase_Keeps_Buffering_Until_Snapshot()
{
// M2.18: a gRPC error/reconnect BEFORE the snapshot arrives must remain in the
// buffering phase — events on the new stream are still buffered, then flushed
// when the snapshot finally arrives.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
// Error before snapshot → reconnect (still buffering).
ctx.MockGrpcClient.SubscribeCalls[0].OnError(new Exception("pre-snapshot blip"));
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 2, TimeSpan.FromSeconds(5));
// Event on the reconnected stream — still buffered (snapshot not yet delivered).
var gapEvent = new AttributeValueChanged(InstanceName, "IO", "Late", 7, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[1].OnEvent(gapEvent);
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
// snapshot + the event buffered across the reconnect.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.Equal("Late", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
}
}
[Fact]
public void Reconnect_After_Snapshot_Resumes_PassThrough_Not_Buffering()
{
// M2.18: a mid-session reconnect (after the snapshot was already delivered)
// must resume pass-through — the snapshot is a one-time thing and events on
// the reconnected stream are delivered immediately, not re-buffered.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Mid-session reconnect.
ctx.MockGrpcClient.SubscribeCalls[0].OnError(new Exception("mid-session blip"));
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 2, TimeSpan.FromSeconds(5));
// Event on the reconnected stream — delivered immediately (pass-through).
var postEvent = new AttributeValueChanged(InstanceName, "IO", "Temp", 9, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[1].OnEvent(postEvent);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.Equal("Temp", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
}
}
[Fact]
public void RetryCount_RecoveredOnlyAfterStreamStaysStableForStabilityWindow()
{
@@ -394,11 +809,25 @@ public class DebugStreamBridgeActorTests : TestKit
/// <summary>
/// Mock gRPC client that records SubscribeAsync and Unsubscribe calls.
/// <para>
/// <b>Thread safety:</b> <see cref="SubscribeCalls"/> and
/// <see cref="UnsubscribedCorrelationIds"/> are written from the actor/background thread
/// (via <see cref="SubscribeAsync"/> and <see cref="Unsubscribe"/>) and read from the test
/// thread (via <c>AwaitCondition</c> / assertions). All access goes through a shared lock
/// to match the <c>lock (events)</c> pattern used for <c>ctx.ReceivedEvents</c>.
/// </para>
/// </summary>
internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
{
- public List<MockSubscription> SubscribeCalls { get; } = new();
- public List<string> UnsubscribedCorrelationIds { get; } = new();
+ private readonly object _lock = new();
+ private readonly List<MockSubscription> _subscribeCalls = new();
+ private readonly List<string> _unsubscribedCorrelationIds = new();
/// <summary>Returns a snapshot of subscribe calls, taken under the internal lock.</summary>
public List<MockSubscription> SubscribeCalls { get { lock (_lock) { return _subscribeCalls.ToList(); } } }
/// <summary>Returns a snapshot of unsubscribed correlation IDs, taken under the internal lock.</summary>
public List<string> UnsubscribedCorrelationIds { get { lock (_lock) { return _unsubscribedCorrelationIds.ToList(); } } }
private MockSiteStreamGrpcClient(bool _) : base() { }
@@ -414,7 +843,7 @@ internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
CancellationToken ct)
{
var subscription = new MockSubscription(correlationId, instanceUniqueName, onEvent, onError, ct);
- SubscribeCalls.Add(subscription);
+ lock (_lock) { _subscribeCalls.Add(subscription); }
// Return a task that completes when cancelled (simulates long-running stream)
var tcs = new TaskCompletionSource();
@@ -424,7 +853,7 @@ internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
public override void Unsubscribe(string correlationId)
{
- UnsubscribedCorrelationIds.Add(correlationId);
+ lock (_lock) { _unsubscribedCorrelationIds.Add(correlationId); }
}
}
@@ -0,0 +1,318 @@
using System.Text.RegularExpressions;
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
/// <summary>
/// Code-level guard for the AuditLog append-only invariant (task M2.10, #18).
///
/// The DB-role control (DENY UPDATE / DENY DELETE on dbo.AuditLog in migration
/// 20260602174346_CollapseAuditLogToCanonical) is the runtime enforcement layer.
/// This test is the source-level, test-time backstop: it fails the test run if
/// any C# source file in the ConfigurationDatabase project contains an UPDATE or
/// DELETE statement that targets the AuditLog table.
///
/// <b>Matching rule (see <c>ContainsAuditLogMutation</c> for full detail)</b>
/// A line is flagged as a violation iff it matches either DML-syntax pattern:
/// • <c>UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b</c> — UPDATE targeting AuditLog
/// • <c>DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b</c> — DELETE targeting AuditLog
///
/// These tight DML-syntax patterns naturally exclude false positives:
/// - DENY UPDATE ON dbo.AuditLog … → UPDATE is followed by ON, not by the table
/// name; the regex requires UPDATE to be immediately followed by (optional
/// schema.) AuditLog, so "UPDATE ON dbo.AuditLog" does NOT match.
/// - ALTER TABLE dbo.AuditLog SWITCH … → ALTER TABLE precedes the table name;
/// no UPDATE/DELETE keyword present.
/// - Comments like "// AuditLog … UPDATE …" → UPDATE is not immediately followed
/// by AuditLog (there are intervening words).
/// - DELETE FROM Notifications … → AuditLog not present.
///
/// <b>Known limitations:</b> This guard scans only raw SQL strings — EF Core methods
/// such as <c>ExecuteDeleteAsync</c>, <c>ExecuteUpdateAsync</c>, and <c>RemoveRange</c>
/// targeting the AuditLog entity are NOT covered and must never be introduced.
/// Additionally, the scan is line-oriented: DML where the keyword and table name appear
/// on separate lines is an accepted, undetected edge case.
/// </summary>
public class AuditLogAppendOnlyGuardTests
{
// ---------------------------------------------------------------------------
// Source root location — same walk-up pattern used by ArchitecturalConstraintTests
// in the Commons.Tests project.
// ---------------------------------------------------------------------------
private static string GetConfigurationDatabaseSourceDirectory()
{
// Walk up from the test binary output directory until we find the
// ConfigurationDatabase csproj (a known anchor in the repo tree).
var dir = new DirectoryInfo(AppContext.BaseDirectory);
while (dir != null)
{
var candidate = Path.Combine(
dir.FullName,
"src",
"ZB.MOM.WW.ScadaBridge.ConfigurationDatabase",
"ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.csproj");
if (File.Exists(candidate))
{
return Path.GetDirectoryName(candidate)!;
}
dir = dir.Parent;
}
throw new InvalidOperationException(
"Could not locate ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.csproj " +
"by walking up from the test output directory. " +
"Ensure the test is run from inside the repo clone.");
}
// ---------------------------------------------------------------------------
// Detection helper — kept as a static method so it can be unit-tested in
// isolation below without requiring any file I/O.
// ---------------------------------------------------------------------------
/// <summary>
/// Returns <see langword="true"/> when the supplied text (typically a single
/// source line) contains a SQL UPDATE or DELETE DML statement that directly
/// targets the <c>AuditLog</c> table.
///
/// <b>Matching rule.</b> The regex requires the DML keyword to be
/// immediately followed (possibly via FROM) by the optional schema prefix
/// (<c>dbo.</c> or <c>[dbo].</c>) and then the table name <c>AuditLog</c>
/// or <c>[AuditLog]</c> as a whole word:
/// <code>
/// UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
/// DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
/// </code>
/// This tight DML-syntax pattern naturally excludes false positives without
/// any additional keyword checks:
/// <list type="bullet">
/// <item><description>
/// <c>DENY UPDATE ON dbo.AuditLog …</c> — "UPDATE ON" is never immediately
/// followed by AuditLog; the pattern requires UPDATE → optional schema → AuditLog.
/// </description></item>
/// <item><description>
/// <c>ALTER TABLE dbo.AuditLog SWITCH …</c> — no UPDATE/DELETE keyword present.
/// </description></item>
/// <item><description>
/// <c>// AuditLog is append-only; never issue an UPDATE against it.</c> —
/// UPDATE is not followed by AuditLog here.
/// </description></item>
/// <item><description>
/// <c>DELETE FROM dbo.Notifications …</c> — AuditLog not present.
/// </description></item>
/// </list>
/// </summary>
/// <param name="text">A single source line (or any string to probe).</param>
/// <returns><see langword="true"/> if a mutation against AuditLog is detected.</returns>
internal static bool ContainsAuditLogMutation(string text)
{
if (string.IsNullOrEmpty(text))
{
return false;
}
// DML-syntax pattern: the UPDATE or DELETE keyword must be directly followed
// (optionally via FROM) by the optional schema qualifier and then the table name.
//
// Schema sub-pattern : (?:\[?dbo\]?\.)?
// matches: nothing, "dbo.", "[dbo]."
//
// Table sub-pattern : \[?AuditLog\]?
// matches: "AuditLog", "[AuditLog]"
//
// UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
// matches: "UPDATE AuditLog", "UPDATE dbo.AuditLog",
// "UPDATE [AuditLog]", "UPDATE [dbo].[AuditLog]"
// does NOT match: "DENY UPDATE ON dbo.AuditLog" (UPDATE is followed by ON)
//
// DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
// matches: "DELETE FROM AuditLog", "DELETE FROM dbo.AuditLog",
// "DELETE FROM [AuditLog]", "DELETE FROM [dbo].[AuditLog]"
// does NOT match: "DENY DELETE ON dbo.AuditLog" (DELETE is followed by ON)
return AuditLogMutationPattern.IsMatch(text);
}
private static readonly Regex AuditLogMutationPattern = new(
@"\bUPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
@"|\bDELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
// ---------------------------------------------------------------------------
// Guard test: scan every *.cs file in ConfigurationDatabase (excluding
// Designer/Snapshot EF artefacts and the obj/ directory).
// ---------------------------------------------------------------------------
[Fact]
public void ConfigurationDatabase_ShouldNotContainAuditLogMutations()
{
var sourceDir = GetConfigurationDatabaseSourceDirectory();
// Enumerate all .cs files; exclude EF scaffolding and build output.
var csFiles = Directory.GetFiles(sourceDir, "*.cs", SearchOption.AllDirectories)
.Where(f => !f.Contains(Path.DirectorySeparatorChar + "obj" + Path.DirectorySeparatorChar))
.Where(f => !f.EndsWith(".Designer.cs", StringComparison.OrdinalIgnoreCase))
.Where(f => !f.EndsWith("ModelSnapshot.cs", StringComparison.OrdinalIgnoreCase))
.ToList();
Assert.True(csFiles.Count > 0,
$"Expected to find .cs files under {sourceDir} but found none — source directory location may be wrong.");
var violations = new List<string>();
foreach (var file in csFiles)
{
var content = File.ReadAllText(file);
// Scan line-by-line so violation messages cite the exact line number.
var lines = content.Split('\n');
for (var i = 0; i < lines.Length; i++)
{
if (ContainsAuditLogMutation(lines[i]))
{
var relativePath = Path.GetRelativePath(sourceDir, file);
violations.Add($"{relativePath}:{i + 1}: {lines[i].Trim()}");
}
}
}
Assert.True(violations.Count == 0,
"AuditLog append-only guard: found UPDATE/DELETE targeting dbo.AuditLog " +
"in ConfigurationDatabase source. AuditLog is APPEND-ONLY (retention uses " +
"partition-switch DDL, not row DELETE). Violation(s):\n" +
string.Join("\n", violations));
}
// ---------------------------------------------------------------------------
// Self-verifying matcher unit tests — prove the helper does what it claims.
// ---------------------------------------------------------------------------
[Fact]
public void ContainsAuditLogMutation_ReturnsFalse_ForCleanSource()
{
// The guard scan over real source PASSES (no violations) — this fact is
// already asserted by ConfigurationDatabase_ShouldNotContainAuditLogMutations.
// Here we verify the helper directly on a representative set of CLEAN lines
// that appear in the production source tree.
// INSERT is not a mutation (append-only operations are fine).
Assert.False(ContainsAuditLogMutation(
"INSERT INTO dbo.AuditLog (EventId, OccurredAtUtc) VALUES (@id, @ts);"));
// SELECT is not a mutation.
Assert.False(ContainsAuditLogMutation(
"SELECT COUNT(*) FROM dbo.AuditLog WHERE OccurredAtUtc >= @threshold;"));
// ALTER TABLE SWITCH is the retention purge — not a row-level mutation.
Assert.False(ContainsAuditLogMutation(
"ALTER TABLE dbo.AuditLog SWITCH PARTITION 3 TO dbo.AuditLog_Staging;"));
// DENY DDL from the role-grant migration — must not be flagged.
Assert.False(ContainsAuditLogMutation(
"DENY UPDATE ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"DENY DELETE ON dbo.AuditLog TO scadabridge_audit_writer;"));
// GRANT DDL — also must not be flagged.
Assert.False(ContainsAuditLogMutation(
"GRANT INSERT ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"GRANT SELECT ON dbo.AuditLog TO scadabridge_audit_writer;"));
// DELETE on a different table — AuditLog not on the same line.
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.Notifications WHERE Status = 'Delivered';"));
// DELETE on a different table with AuditLog mentioned later on the same line
// (here after 130 spaces of padding): AuditLog does not directly follow
// DELETE [FROM], so the pattern must not match.
var longSeparator = new string(' ', 130);
Assert.False(ContainsAuditLogMutation(
$"DELETE FROM dbo.Notifications WHERE Id = @id;{longSeparator}-- see also AuditLog"));
// Comment-only mention of AuditLog with UPDATE elsewhere in a comment.
Assert.False(ContainsAuditLogMutation(
"// AuditLog is append-only; never issue an UPDATE against it."));
// TRUNCATE on the staging table (not AuditLog directly); staging name only.
Assert.False(ContainsAuditLogMutation(
"TRUNCATE TABLE dbo.AuditLog_Staging_abc123;"));
}
[Fact]
public void ContainsAuditLogMutation_ReturnsTrue_ForPlantedViolations()
{
// Planted positive cases — the guard MUST catch these.
// Classic UPDATE targeting AuditLog.
Assert.True(ContainsAuditLogMutation(
"UPDATE AuditLog SET Status = 'Corrected' WHERE EventId = @id;"));
// UPDATE with schema prefix.
Assert.True(ContainsAuditLogMutation(
"UPDATE dbo.AuditLog SET DetailsJson = @json WHERE EventId = @id;"));
// DELETE FROM AuditLog.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM AuditLog WHERE OccurredAtUtc < @threshold;"));
// DELETE with schema prefix.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM dbo.AuditLog WHERE Status = 'Parked';"));
// Lowercase keywords (T-SQL keywords are case-insensitive; the pattern uses IgnoreCase).
Assert.True(ContainsAuditLogMutation(
"update dbo.AuditLog set Actor = 'system' where Actor is null;"));
// AuditLog mentioned earlier in the line (e.g. in a comment prefix), with a real
// UPDATE dbo.AuditLog DML following — the DML occurrence must still be caught.
Assert.True(ContainsAuditLogMutation(
"-- AuditLog: UPDATE dbo.AuditLog SET x = 1"));
// ---- Bracketed identifier forms (SSMS-generated SQL) ----
// UPDATE [dbo].[AuditLog] — bracketed schema and bracketed table.
Assert.True(ContainsAuditLogMutation(
"UPDATE [dbo].[AuditLog] SET DetailsJson = @json WHERE EventId = @id;"));
// UPDATE [AuditLog] — bracketed table, no schema prefix.
Assert.True(ContainsAuditLogMutation(
"UPDATE [AuditLog] SET Status = 'Corrected' WHERE EventId = @id;"));
// DELETE FROM [dbo].[AuditLog] — bracketed schema and bracketed table.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM [dbo].[AuditLog] WHERE OccurredAtUtc < @threshold;"));
// DELETE FROM [AuditLog] — bracketed table, no schema prefix.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM [AuditLog] WHERE OccurredAtUtc < @threshold;"));
}
[Fact]
public void ContainsAuditLogMutation_ReturnsFalse_ForDenyGrantAndPartitionSwitchSamples()
{
// Extra explicit coverage for the four concrete exclusion patterns
// that appear in the real migration files.
// From 20260602174346_CollapseAuditLogToCanonical.cs and 20260520142214_AddAuditLogTable.cs:
Assert.False(ContainsAuditLogMutation(
"DENY UPDATE ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"DENY DELETE ON dbo.AuditLog TO scadabridge_audit_writer;"));
// From AuditLogRepository.cs SwitchOutPartitionAsync:
Assert.False(ContainsAuditLogMutation(
"ALTER TABLE dbo.AuditLog SWITCH PARTITION ' + CAST(@partitionNumber AS nvarchar(10)) + ' TO dbo.[' + @stagingName + '];"));
// Notifications DELETE (legitimate; AuditLog not present on the line):
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.Notifications WHERE CompletedAtUtc < @cutoff;"));
// Notifications DELETE using bracketed identifiers — AuditLog not present:
Assert.False(ContainsAuditLogMutation(
"DELETE FROM [dbo].[Notifications] WHERE CompletedAtUtc < @cutoff;"));
// SiteCalls DELETE (legitimate; AuditLog not present on the line):
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.SiteCalls WHERE TerminalAtUtc < @cutoff;"));
}
}
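// Illustrative sketch only (not part of this commit): one way a guard with the
// behaviour pinned by the tests above could be written. It assumes a stricter rule
// than the proximity window the tests mention: the UPDATE / DELETE FROM keyword must
// be directly followed by the (optionally schema-qualified, optionally bracketed)
// AuditLog table name. The class name AuditLogGuardSketch is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AuditLogGuardSketch
{
    // UPDATE or DELETE FROM, then an optional "schema." prefix, then AuditLog.
    // The trailing \b keeps AuditLog_Staging_* from matching ('_' is a word char).
    private static readonly Regex Mutation = new Regex(
        @"\b(?:UPDATE|DELETE\s+FROM)\s+(?:\[?\w+\]?\s*\.\s*)?\[?AuditLog\b\]?",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // True when the line contains DML that mutates the AuditLog table.
    // DENY/GRANT statements do not match because "ON" follows the verb there.
    public static bool ContainsAuditLogMutation(string line) =>
        Mutation.IsMatch(line);
}
```

// Design note: anchoring the table name to the verb avoids tuning a window size,
// at the cost of missing DML split across lines or built by string concatenation.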
@@ -61,6 +61,34 @@ public class TemplateEngineRepositoryTests : IDisposable
Assert.Equal("Slot1", loaded.Compositions.First().InstanceName);
}
[Fact]
public async Task TemplateScript_ExecutionTimeoutSeconds_RoundTripsThroughEf()
{
// M2.5 (#9): the nullable per-script execution timeout must persist and
// reload through EF — both an explicit value and a null (use-global).
var template = new Template("TimeoutTemplate");
template.Scripts.Add(new TemplateScript("WithTimeout", "return 1;")
{
ExecutionTimeoutSeconds = 45
});
template.Scripts.Add(new TemplateScript("NoTimeout", "return 2;")); // null
_context.Templates.Add(template);
await _context.SaveChangesAsync();
// Detach so the reload comes from the store, not the change tracker.
_context.ChangeTracker.Clear();
var loaded = await _context.Templates
.Include(t => t.Scripts)
.SingleAsync(t => t.Name == "TimeoutTemplate");
var withTimeout = loaded.Scripts.Single(s => s.Name == "WithTimeout");
Assert.Equal(45, withTimeout.ExecutionTimeoutSeconds);
var noTimeout = loaded.Scripts.Single(s => s.Name == "NoTimeout");
Assert.Null(noTimeout.ExecutionTimeoutSeconds);
}
[Fact]
public async Task GetTemplateWithChildrenAsync_ReturnsNull_WhenTemplateDoesNotExist()
{
@@ -0,0 +1,166 @@
using Opc.Ua;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests.Adapters;
/// <summary>
/// M2.4 (#8): the OPC UA EventFilter gains a server-side <see cref="ContentFilter"/>
/// WhereClause as a bandwidth optimisation when a condition-type filter is present.
/// The client-side gate in DataConnectionActor remains authoritative; these tests
/// only pin the filter-shaping. No live server required — pure SDK object building.
/// </summary>
public class RealOpcUaClientAlarmFilterTests
{
[Fact]
public void BuildAlarmEventFilter_NoFilter_HasNoWhereClause()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
Assert.NotEmpty(filter.SelectClauses);
Assert.Empty(filter.WhereClause.Elements);
}
[Fact]
public void BuildAlarmEventFilter_WithKnownTypes_BuildsNonEmptyWhereClause()
{
var parsed = AlarmConditionFilter.Parse("LimitAlarmType,DiscreteAlarmType");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.NotEmpty(filter.WhereClause.Elements);
// Two known types → two OfType operands (OR'd when more than one).
var ofTypeCount = filter.WhereClause.Elements.Count(e => e.FilterOperator == FilterOperator.OfType);
Assert.Equal(2, ofTypeCount);
Assert.Contains(filter.WhereClause.Elements, e => e.FilterOperator == FilterOperator.Or);
}
[Fact]
public void BuildAlarmEventFilter_SingleKnownType_BuildsSingleOfType_NoOr()
{
var parsed = AlarmConditionFilter.Parse("AlarmConditionType");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Single(filter.WhereClause.Elements);
Assert.Equal(FilterOperator.OfType, filter.WhereClause.Elements[0].FilterOperator);
}
[Fact]
public void BuildAlarmEventFilter_TypeMatchingIsCaseInsensitive()
{
var parsed = AlarmConditionFilter.Parse("limitalarmtype");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Single(filter.WhereClause.Elements, e => e.FilterOperator == FilterOperator.OfType);
}
[Fact]
public void BuildAlarmEventFilter_AllUnknownTypes_OmitsWhereClause()
{
// Custom/vendor type names we cannot map to standard NodeIds are skipped
// server-side; the client-side gate still enforces them. Omitting the
// WhereClause is the safe choice — a partial WhereClause would drop the
// unmapped types at the server and break correctness.
var parsed = AlarmConditionFilter.Parse("MyVendorCustomAlarm,AnotherCustomThing");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Empty(filter.WhereClause.Elements);
}
[Fact]
public void BuildAlarmEventFilter_MixedKnownAndUnknown_OmitsWhereClause()
{
// If ANY requested type can't be mapped, a server-side WhereClause would
// silently drop that type's events — so we omit the optimisation entirely
// and let the (authoritative) client gate do the filtering.
var parsed = AlarmConditionFilter.Parse("LimitAlarmType,MyVendorCustomAlarm");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Empty(filter.WhereClause.Elements);
}
// ── SelectClause index alignment (M2.13 / #27) ───────────────────────────
// CRITICAL: HandleAlarmEvent reads fields[N] by position. Verify new clauses
// are APPENDED at indices 13-17 so existing mappings (0-12) are undisturbed.
[Fact]
public void BuildAlarmEventFilter_HasExactly18SelectClauses()
{
// Baseline: 6 base fields + 7 A&C sub-state fields + 5 new appended fields = 18.
// If this count changes, review HandleAlarmEvent index mappings immediately.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
Assert.Equal(18, filter.SelectClauses.Count);
}
[Fact]
public void BuildAlarmEventFilter_Index13_IsAlarmConditionType_ActiveState_TransitionTime()
{
// Index 13 must be AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[13];
Assert.Equal(ObjectTypeIds.AlarmConditionType, clause.TypeDefinitionId);
Assert.Equal(2, clause.BrowsePath.Count);
Assert.Equal("ActiveState", clause.BrowsePath[0].Name);
Assert.Equal("TransitionTime", clause.BrowsePath[1].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index14_IsLimitAlarmType_HighHighLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[14];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("HighHighLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index15_IsLimitAlarmType_HighLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[15];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("HighLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index16_IsLimitAlarmType_LowLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[16];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("LowLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index17_IsLimitAlarmType_LowLowLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[17];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("LowLowLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_ExistingIndices0To12_Unchanged()
{
// Guard: the first 13 SelectClauses (indices 0-12) must remain unchanged so
// that existing HandleAlarmEvent logic is not silently broken by future edits.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
// Indices 0-5: base event fields (EventType…Severity) from BaseEventType.
for (var i = 0; i <= 5; i++)
Assert.Equal(ObjectTypeIds.BaseEventType, filter.SelectClauses[i].TypeDefinitionId);
// Index 6: AlarmConditionType/ActiveState/Id
Assert.Equal(ObjectTypeIds.AlarmConditionType, filter.SelectClauses[6].TypeDefinitionId);
Assert.Equal("ActiveState", filter.SelectClauses[6].BrowsePath[0].Name);
Assert.Equal("Id", filter.SelectClauses[6].BrowsePath[1].Name);
// Index 7: AcknowledgeableConditionType/AckedState/Id
Assert.Equal(ObjectTypeIds.AcknowledgeableConditionType, filter.SelectClauses[7].TypeDefinitionId);
Assert.Equal("AckedState", filter.SelectClauses[7].BrowsePath[0].Name);
// Index 11: ConditionType/ConditionName
Assert.Equal(ObjectTypeIds.ConditionType, filter.SelectClauses[11].TypeDefinitionId);
Assert.Equal("ConditionName", filter.SelectClauses[11].BrowsePath[0].Name);
// Index 12: ConditionType/Comment
Assert.Equal(ObjectTypeIds.ConditionType, filter.SelectClauses[12].TypeDefinitionId);
Assert.Equal("Comment", filter.SelectClauses[12].BrowsePath[0].Name);
}
}
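// Illustrative sketch only (not part of this commit): the all-or-nothing rule the
// WhereClause tests above pin, without the OPC UA SDK. The map contents, the
// placeholder id strings and the name WhereClausePlanSketch are assumptions made
// for illustration; the real code maps friendly names to NodeIds.

```csharp
using System;
using System.Collections.Generic;

public static class WhereClausePlanSketch
{
    // Hypothetical stand-in for the friendly-name -> type-id map (case-insensitive keys).
    private static readonly Dictionary<string, string> Known =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["AlarmConditionType"] = "id:AlarmConditionType",
            ["LimitAlarmType"] = "id:LimitAlarmType",
            ["DiscreteAlarmType"] = "id:DiscreteAlarmType",
        };

    // Returns one operand per requested type, or an empty list when ANY name is
    // unmapped: a partial server-side filter would silently drop that type's events,
    // so the optimisation is skipped and the client-side gate does all the filtering.
    public static IReadOnlyList<string> PlanOfTypeOperands(IEnumerable<string> requested)
    {
        var operands = new List<string>();
        foreach (var name in requested)
        {
            if (!Known.TryGetValue(name, out var id))
                return Array.Empty<string>();
            operands.Add(id);
        }
        return operands;
    }
}
```

// A caller would OR the operands together only when more than one is returned,
// and leave the WhereClause empty when the list is empty.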
@@ -0,0 +1,113 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.Alarms;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests;
/// <summary>
/// M2.4 (#8): the alarm conditionFilter is a comma-separated, case-insensitive
/// list of condition type names. Blank = allow all. These tests pin the
/// parse-once / IsAllowed predicate that the DataConnectionActor uses as the
/// authoritative client-side gate.
/// </summary>
public class AlarmConditionFilterTests
{
private static NativeAlarmTransition Tx(string typeName,
AlarmTransitionKind kind = AlarmTransitionKind.Raise) =>
new("ref", "obj", typeName, kind,
new AlarmConditionState(true, false, null, AlarmShelveState.Unshelved, false, 500),
"cat", "desc", "msg", "", "", null, DateTimeOffset.UtcNow, "1", "0");
[Theory]
[InlineData(null)]
[InlineData("")]
[InlineData(" ")]
[InlineData(",")]
[InlineData(" , , ")]
public void NullOrBlankFilter_IsEmpty_AllowsEverything(string? filter)
{
var f = AlarmConditionFilter.Parse(filter);
Assert.True(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("anything-at-all")));
}
[Fact]
public void Parse_SplitsCommaSeparatedList()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi,DiscreteAlarm,AnalogLimit.Lo");
Assert.False(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
Assert.True(f.IsAllowed(Tx("AnalogLimit.Lo")));
Assert.False(f.IsAllowed(Tx("AnalogLimit.HiHi")));
}
[Fact]
public void IsAllowed_IsCaseInsensitive()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("analoglimit.hi")));
Assert.True(f.IsAllowed(Tx("ANALOGLIMIT.HI")));
Assert.False(f.IsAllowed(Tx("DiscreteAlarm")));
}
[Fact]
public void Parse_TrimsWhitespaceAroundEachName()
{
var f = AlarmConditionFilter.Parse(" AnalogLimit.Hi ,\tDiscreteAlarm ");
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
}
[Fact]
public void Parse_DropsEmptyEntries_KeepsNonEmpty()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi,, ,DiscreteAlarm");
Assert.False(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
Assert.False(f.IsAllowed(Tx("")));
}
[Fact]
public void IsAllowed_NeverDropsSnapshotCompleteFramingSentinel()
{
// SnapshotComplete is a pure framing sentinel (empty AlarmTypeName) that
// drives the NativeAlarmActor's atomic snapshot swap. A type filter must
// never swallow it or the snapshot replay never completes.
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("", AlarmTransitionKind.SnapshotComplete)));
}
[Fact]
public void IsAllowed_FiltersReplayedSnapshotConditionsByType()
{
// Snapshot-kind transitions carry real conditions and ARE filtered.
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi", AlarmTransitionKind.Snapshot)));
Assert.False(f.IsAllowed(Tx("DiscreteAlarm", AlarmTransitionKind.Snapshot)));
}
[Fact]
public void Names_ExposesNormalizedSet_ForServerSideOptimization()
{
var f = AlarmConditionFilter.Parse(" AnalogLimit.Hi , DiscreteAlarm ");
Assert.Equal(new[] { "AnalogLimit.Hi", "DiscreteAlarm" }, f.Names.OrderBy(n => n).ToArray());
Assert.Empty(AlarmConditionFilter.Parse(null).Names);
}
[Fact]
public void IsAllowed_OpcUaResolvedFriendlyName_MatchesFriendlyNameFilter()
{
// M2.4 (#8) regression: OPC UA delivers events whose AlarmTypeName, after
// RealOpcUaClient.ResolveAlarmTypeName, is a standard friendly type name
// (e.g. "ExclusiveLevelAlarmType"). A friendly-name filter on that source
// built a correct server WhereClause; the client gate must agree and deliver,
// not drop every event (which the prior NodeId-string AlarmTypeName caused).
var f = AlarmConditionFilter.Parse("ExclusiveLevelAlarmType,DiscreteAlarmType");
Assert.True(f.IsAllowed(Tx("ExclusiveLevelAlarmType")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarmType")));
Assert.False(f.IsAllowed(Tx("OffNormalAlarmType")));
}
}
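// Illustrative sketch only (not part of this commit): a minimal parse-once filter
// with the semantics the tests above describe (comma-separated, trimmed,
// case-insensitive, blank means allow-all, framing sentinel always passes).
// ConditionFilterSketch and its bool sentinel parameter are hypothetical; the real
// AlarmConditionFilter.IsAllowed takes a NativeAlarmTransition.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class ConditionFilterSketch
{
    private readonly HashSet<string> _names;

    private ConditionFilterSketch(HashSet<string> names) => _names = names;

    // True when no type names were supplied, i.e. the filter allows everything.
    public bool IsEmpty => _names.Count == 0;

    public static ConditionFilterSketch Parse(string? raw)
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        // TrimEntries + RemoveEmptyEntries also discards whitespace-only segments.
        var parts = (raw ?? string.Empty).Split(
            ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        foreach (var part in parts)
            names.Add(part);
        return new ConditionFilterSketch(names);
    }

    // The SnapshotComplete sentinel is never filtered; otherwise an empty filter
    // allows all types and a non-empty filter requires a case-insensitive name match.
    public bool IsAllowed(string typeName, bool isSnapshotCompleteSentinel = false) =>
        isSnapshotCompleteSentinel || IsEmpty || _names.Contains(typeName);
}
```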
@@ -23,10 +23,27 @@ public class DataConnectionActorAlarmTests : TestKit
};
private static NativeAlarmTransition Raise(string sourceRef, string sourceObj) =>
new(sourceRef, sourceObj, "AnalogLimit.Hi", AlarmTransitionKind.Raise, Raise(sourceRef, sourceObj, "AnalogLimit.Hi");
private static NativeAlarmTransition Raise(string sourceRef, string sourceObj, string typeName,
AlarmTransitionKind kind = AlarmTransitionKind.Raise) =>
new(sourceRef, sourceObj, typeName, kind,
new AlarmConditionState(true, false, null, AlarmShelveState.Unshelved, false, 500),
"Process", "hi", "hi", "", "", null, DateTimeOffset.UtcNow, "92", "90");
private static (IDataConnection Adapter, Func<AlarmTransitionCallback?> Cb) BuildAlarmAdapter()
{
AlarmTransitionCallback? cb = null;
var adapter = Substitute.For<IDataConnection, IAlarmSubscribableConnection>();
adapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
.Returns(Task.CompletedTask);
((IAlarmSubscribableConnection)adapter)
.SubscribeAlarmsAsync(Arg.Any<string>(), Arg.Any<string?>(),
Arg.Do<AlarmTransitionCallback>(c => cb = c), Arg.Any<CancellationToken>())
.Returns(Task.FromResult("alarm-sub-1"));
return (adapter, () => cb);
}
[Fact]
public void SubscribeAlarms_RoutesTransitionToInstanceSubscriber()
{
@@ -63,4 +80,119 @@ public class DataConnectionActorAlarmTests : TestKit
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01", null, DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => !m.Success && m.ErrorMessage != null);
}
// ── M2.4 (#8): conditionFilter is now applied client-side in the actor ──
[Fact]
public void SubscribeAlarms_WithTypeFilter_DeliversOnlyMatchingTypes()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
"AnalogLimit.Hi,AnalogLimit.Lo", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Non-matching type is dropped (no message delivered).
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi"));
ExpectNoMsg(TimeSpan.FromMilliseconds(250));
// Matching type is delivered.
cb!(Raise("Tank01.Hi", "Tank01", "AnalogLimit.Hi"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.Hi");
}
[Fact]
public void SubscribeAlarms_WithNullFilter_DeliversAllTypes()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01", null, DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.HiHi");
cb!(Raise("Tank01.Lo", "Tank01", "DiscreteAlarm"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "DiscreteAlarm");
}
[Fact]
public void SubscribeAlarms_FilterMatch_IgnoresCaseAndWhitespace()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
" analoglimit.hi ,\tDISCRETEALARM ", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
cb!(Raise("Tank01.Hi", "Tank01", "AnalogLimit.Hi")); // case differs from filter
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.Hi");
cb!(Raise("Tank01.Disc", "Tank01", "DiscreteAlarm"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "DiscreteAlarm");
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi")); // not listed
ExpectNoMsg(TimeSpan.FromMilliseconds(250));
}
[Fact]
public void SubscribeAlarms_GatewayWideFeed_IsFilteredClientSide()
{
// MxGateway has no server-side filter: its adapter opens ONE gateway-wide
// feed and the actor is the authoritative gate. A filtered source must
// only see its own matching types even though the feed carries everything.
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "MxGateway")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Reactor",
"HighTemp", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Gateway-wide feed delivers a transition for a different source object —
// dropped by source routing.
cb!(Raise("Pump.Fault", "Pump", "HighTemp"));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
// Right source, wrong type — dropped by the client-side type gate.
cb!(Raise("Reactor.LowTemp", "Reactor", "LowTemp"));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
// Right source, right type — delivered.
cb!(Raise("Reactor.HighTemp", "Reactor", "HighTemp"));
ExpectMsg<NativeAlarmTransitionUpdate>(u =>
u.Transition.SourceObjectReference == "Reactor" && u.Transition.AlarmTypeName == "HighTemp");
}
[Fact]
public void SubscribeAlarms_WithFilter_StillForwardsSnapshotCompleteSentinel()
{
// The SnapshotComplete framing sentinel (empty AlarmTypeName) must survive
// the type gate so the NativeAlarmActor's snapshot swap can complete.
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
"AnalogLimit.Hi", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Snapshot-complete sentinel: empty AlarmTypeName (the framing marker). It is
// routed to the Tank01 subscriber like any transition, but never type-filtered.
cb!(new NativeAlarmTransition("Tank01", "Tank01", "", AlarmTransitionKind.SnapshotComplete,
new AlarmConditionState(false, true, null, AlarmShelveState.Unshelved, false, 0),
"", "", "", "", "", null, DateTimeOffset.UtcNow, "", ""));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.Kind == AlarmTransitionKind.SnapshotComplete);
}
}
@@ -1,3 +1,4 @@
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
using CommonsTransitionKind = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmTransitionKind;
@@ -63,4 +64,91 @@ public class MxGatewayAlarmMapperTests
Assert.False(t.Condition.Acknowledged);
Assert.Equal(1000, t.Condition.Severity);
}
// ── CurrentValue / LimitValue (M2.13 / #27) ──────────────────────────────
[Fact]
public void MapTransition_CurrentAndLimitValue_PopulatedFromProto()
{
// The gateway proto OnAlarmTransitionEvent carries current_value and
// limit_value as MxValue union fields. Verify both are mapped through
// MxValueToString into the neutral NativeAlarmTransition strings.
var ev = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
TransitionKind = ProtoTransitionKind.Raise,
Severity = 800,
CurrentValue = 95.3.ToMxValue(),
LimitValue = 90.0.ToMxValue()
};
var t = MxGatewayAlarmMapper.MapTransition(ev);
Assert.Equal("95.3", t.CurrentValue);
Assert.Equal("90", t.LimitValue);
}
[Fact]
public void MapTransition_AbsentCurrentAndLimitValue_YieldsEmpty()
{
// When the gateway sends events without current/limit value fields (optional),
// the resulting transition must have empty strings — never null.
var ev = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.Hi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.Hi",
TransitionKind = ProtoTransitionKind.Raise,
Severity = 600
// CurrentValue and LimitValue not set → proto default (null reference)
};
var t = MxGatewayAlarmMapper.MapTransition(ev);
Assert.Equal("", t.CurrentValue);
Assert.Equal("", t.LimitValue);
}
[Fact]
public void MapSnapshot_CurrentAndLimitValue_PopulatedFromProto()
{
// ActiveAlarmSnapshot also carries current_value and limit_value.
var snap = new ActiveAlarmSnapshot
{
AlarmFullReference = "Pump01.Vibration.HiHi",
SourceObjectReference = "Pump01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
CurrentState = ProtoConditionState.Active,
Severity = 900,
CurrentValue = 12.7.ToMxValue(),
LimitValue = 10.0.ToMxValue()
};
var t = MxGatewayAlarmMapper.MapSnapshot(snap);
Assert.Equal("12.7", t.CurrentValue);
Assert.Equal("10", t.LimitValue);
}
[Fact]
public void MapSnapshot_StringMxValue_ProducesStringCurrentValue()
{
// MxValue can carry string values (e.g. for discrete/string-type tags).
var snap = new ActiveAlarmSnapshot
{
AlarmFullReference = "Mode.Alarm",
SourceObjectReference = "Mode",
AlarmTypeName = "DiscreteAlarm",
CurrentState = ProtoConditionState.Active,
Severity = 500,
CurrentValue = "FAULT".ToMxValue()
};
var t = MxGatewayAlarmMapper.MapSnapshot(snap);
Assert.Equal("FAULT", t.CurrentValue);
Assert.Equal("", t.LimitValue); // not set
}
}
@@ -55,4 +55,54 @@ public class OpcUaAlarmMapperTests
{
Assert.Equal(expected, OpcUaAlarmMapper.MapShelve(name));
}
// ── PickLimitValue (M2.13 / #27) ─────────────────────────────────────────
[Fact]
public void PickLimitValue_AllNull_ReturnsEmpty()
{
// All four limit fields absent (non-limit alarm type) → empty string.
Assert.Equal("", OpcUaAlarmMapper.PickLimitValue(null, null, null, null));
}
[Fact]
public void PickLimitValue_HighHighLimitPresent_ReturnsIt()
{
// HighHighLimit takes top priority; other fields are null (absent).
var result = OpcUaAlarmMapper.PickLimitValue(100.5, null, null, null);
Assert.Equal("100.5", result);
}
[Fact]
public void PickLimitValue_OnlyHighLimit_ReturnsHighLimit()
{
// Only HighLimit present (HighHighLimit absent on this alarm type).
var result = OpcUaAlarmMapper.PickLimitValue(null, 80.0, null, null);
Assert.Equal("80", result);
}
[Fact]
public void PickLimitValue_PriorityOrder_HighHighWinsOverHigh()
{
// When multiple limits are present, HighHighLimit takes precedence.
var result = OpcUaAlarmMapper.PickLimitValue(95.0, 80.0, 20.0, 5.0);
Assert.Equal("95", result);
}
[Fact]
public void PickLimitValue_OnlyLowLow_ReturnsLowLow()
{
// LowLowLimit only — last in priority, but should still be returned.
var result = OpcUaAlarmMapper.PickLimitValue(null, null, null, -10.5);
Assert.Equal("-10.5", result);
}
[Fact]
public void PickLimitValue_UsesInvariantCulture()
{
// Decimal separator must always be '.' regardless of thread culture.
var result = OpcUaAlarmMapper.PickLimitValue(1.5, null, null, null);
Assert.Contains('.', result); // invariant culture: '.' not ','
Assert.Equal("1.5", result);
}
}
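// Illustrative sketch only (not part of this commit): the priority pick the
// PickLimitValue tests above describe, assuming nullable doubles in the order
// HighHigh, High, Low, LowLow. The real OpcUaAlarmMapper.PickLimitValue signature
// is not shown in this diff, so the parameter types and the name LimitValueSketch
// are assumptions.

```csharp
using System.Globalization;

public static class LimitValueSketch
{
    // Returns the first non-null limit in priority order, formatted with the
    // invariant culture so the decimal separator is always '.'; "" when all are null.
    public static string PickLimitValue(
        double? highHigh, double? high, double? low, double? lowLow)
    {
        double? picked = highHigh ?? high ?? low ?? lowLow;
        return picked?.ToString(CultureInfo.InvariantCulture) ?? string.Empty;
    }
}
```

// Formatting through CultureInfo.InvariantCulture rather than the thread culture is
// what keeps the value stable on hosts whose locale uses ',' as the separator.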
@@ -0,0 +1,63 @@
using Opc.Ua;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests;
/// <summary>
/// M2.4 (#8) regression: standard OPC UA A&amp;C events carry an event-type
/// <see cref="NodeId"/> (e.g. <c>i=9341</c> for ExclusiveLevelAlarmType), but the
/// client-side conditionFilter gate — and the server-side WhereClause — both key off
/// the friendly type names in <see cref="RealOpcUaClient.KnownConditionTypeIds"/>.
/// <see cref="RealOpcUaClient.ResolveAlarmTypeName"/> bridges the two by resolving the
/// event-type NodeId back to its friendly name (NodeId-string fallback for custom
/// types), so a friendly-name filter actually matches the events the server delivers.
/// </summary>
public class RealOpcUaClientAlarmFilterTests
{
[Fact]
public void ResolveAlarmTypeName_KnownStandardNodeId_ReturnsFriendlyName()
{
// The well-known NodeId for ExclusiveLevelAlarmType (i=9341) must resolve to
// the friendly name the conditionFilter/WhereClause use.
var resolved = RealOpcUaClient.ResolveAlarmTypeName(ObjectTypeIds.ExclusiveLevelAlarmType);
Assert.Equal("ExclusiveLevelAlarmType", resolved);
}
[Fact]
public void ResolveAlarmTypeName_DiscreteAlarmNodeId_ReturnsFriendlyName()
{
var resolved = RealOpcUaClient.ResolveAlarmTypeName(ObjectTypeIds.DiscreteAlarmType);
Assert.Equal("DiscreteAlarmType", resolved);
}
[Fact]
public void ResolveAlarmTypeName_UnknownCustomNodeId_ReturnsNodeIdString()
{
// A vendor/custom subtype not in KnownConditionTypeIds: we cannot map it to a
// friendly name, so we fall back to its NodeId string. This is consistent —
// the WhereClause is also omitted for unknown names, so the client gate matches
// the NodeId string, which is the only thing such a filter could carry.
var custom = new NodeId(987654u, 7);
var resolved = RealOpcUaClient.ResolveAlarmTypeName(custom);
Assert.Equal(custom.ToString(), resolved);
}
[Fact]
public void ResolveAlarmTypeName_Null_ReturnsEmptyString()
{
Assert.Equal("", RealOpcUaClient.ResolveAlarmTypeName(null));
}
[Fact]
public void InverseMap_RoundTrips_EveryKnownConditionType()
{
// The friendly→NodeId map (KnownConditionTypeIds) and the NodeId→friendly map
// are derived from a single source of truth, so they must round-trip for every
// entry — guards against the two maps drifting apart.
foreach (var (friendlyName, nodeId) in RealOpcUaClient.KnownConditionTypeIds)
{
var resolved = RealOpcUaClient.ResolveAlarmTypeName(nodeId);
Assert.Equal(friendlyName, resolved);
}
}
}
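// Illustrative sketch only (not part of this commit): deriving the id -> friendly
// name map from the single friendly -> id map, so the two cannot drift apart, with
// the id-string fallback for unknown types. Ids are plain strings here instead of
// Opc.Ua NodeId; "i=9341" comes from the doc comment above, the other id is a
// placeholder, and the name TypeNameResolverSketch is hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeNameResolverSketch
{
    // Single source of truth: friendly name -> type id.
    public static readonly IReadOnlyDictionary<string, string> KnownIds =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["ExclusiveLevelAlarmType"] = "i=9341",
            ["DiscreteAlarmType"] = "placeholder:DiscreteAlarmType",
        };

    // Inverse map built from KnownIds (declared after it so it initialises second).
    private static readonly Dictionary<string, string> ById =
        KnownIds.ToDictionary(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal);

    // Known id -> friendly name; unknown id -> the id string itself; null -> "".
    public static string Resolve(string? typeId)
    {
        if (typeId is null) return string.Empty;
        return ById.TryGetValue(typeId, out var name) ? name : typeId;
    }
}
```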
@@ -22,6 +22,8 @@
uses a plain [Fact] — it never needs the server.
-->
<PackageReference Include="Xunit.SkippableFact" />
<!-- MxGateway.Client brings MxValueExtensions (ToClrValue) used by MxGatewayAlarmMapper tests. -->
<PackageReference Include="ZB.MOM.WW.MxGateway.Client" />
</ItemGroup>
<ItemGroup>
@@ -0,0 +1,122 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Instances;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Flattening;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
namespace ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests;
/// <summary>
/// M2.8 (#23): proves the deploy path (FlatteningPipeline.FlattenAndValidateAsync)
/// opts into connection-binding enforcement, so a data-sourced attribute with no
/// binding gates the deployment as an ERROR (not just a warning), and that a binding
/// resolving to a connection that actually exists at the target site passes.
/// </summary>
public class FlatteningPipelineConnectionBindingTests
{
private const int InstanceId = 1;
private const int TemplateId = 10;
private const int SiteId = 100;
private const int ConnectionId = 7;
private readonly ITemplateEngineRepository _templateRepo = Substitute.For<ITemplateEngineRepository>();
private readonly ISiteRepository _siteRepo = Substitute.For<ISiteRepository>();
private readonly FlatteningPipeline _sut;
public FlatteningPipelineConnectionBindingTests()
{
_sut = new FlatteningPipeline(
_templateRepo,
_siteRepo,
new FlatteningService(),
new ValidationService(),
new RevisionHashService());
}
/// <summary>
/// Seeds a single-template chain with one data-sourced attribute ("Temp") and a
/// site that owns a single "PlantBus" data connection. The instance optionally
/// binds "Temp" to <paramref name="boundConnectionId"/>.
/// </summary>
private void Arrange(int? boundConnectionId)
{
var template = new Template("Tank") { Id = TemplateId };
template.Attributes.Add(new TemplateAttribute("Temp")
{
DataType = DataType.Double,
DataSourceReference = "ns=2;s=Temp"
});
var instance = new Instance("Tank-01") { Id = InstanceId, TemplateId = TemplateId, SiteId = SiteId };
if (boundConnectionId.HasValue)
{
instance.ConnectionBindings.Add(new InstanceConnectionBinding("Temp")
{
InstanceId = InstanceId,
DataConnectionId = boundConnectionId.Value
});
}
_templateRepo.GetInstanceByIdAsync(InstanceId, Arg.Any<CancellationToken>()).Returns(instance);
_templateRepo.GetTemplateWithChildrenAsync(TemplateId, Arg.Any<CancellationToken>()).Returns(template);
_templateRepo.GetCompositionsByTemplateIdAsync(TemplateId, Arg.Any<CancellationToken>()).Returns([]);
_templateRepo.GetAllSharedScriptsAsync(Arg.Any<CancellationToken>()).Returns([]);
var connection = new DataConnection("PlantBus", "OpcUa", SiteId) { Id = ConnectionId };
_siteRepo.GetDataConnectionsBySiteIdAsync(SiteId, Arg.Any<CancellationToken>())
.Returns([connection]);
}
[Fact]
public async Task FlattenAndValidate_DataSourcedAttributeWithNoBinding_ReportsBindingError()
{
Arrange(boundConnectionId: null);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.False(result.Value.Validation.IsValid);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
[Fact]
public async Task FlattenAndValidate_BindingToExistingSiteConnection_NoBindingError()
{
Arrange(boundConnectionId: ConnectionId);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.DoesNotContain(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
[Fact]
public async Task FlattenAndValidate_BindingToStaleDeletedConnection_ReportsBindingError()
{
// M2.8 (#23): FlatteningService.ApplyConnectionBindings silently drops a
// binding whose DataConnectionId doesn't resolve to any loaded site
// DataConnection (stale / deleted connection). The flattener leaves
// BoundDataConnectionId == null, so the validator treats the attribute as
// unbound and gates the deployment with a ConnectionBinding Error.
//
// Arrange: the instance binding points at id 999, but the site only has
// the connection with id=ConnectionId (7). The flattener can't resolve 999
// and drops the binding silently; the validator then flags it.
const int StaleConnectionId = 999;
Arrange(boundConnectionId: StaleConnectionId);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.False(result.Value.Validation.IsValid);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
}
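// Illustrative sketch only, not part of this commit. It shows one way the rule the
// tests above exercise could be expressed: a data-sourced attribute whose
// BoundDataConnectionId is still null after flattening is reported as unbound. That
// covers both "no binding at all" and "binding dropped because the connection id no
// longer resolves", as the M2.8 comment describes. FlatAttributeSketch and
// ConnectionBindingRuleSketch are hypothetical names; the real check lives behind
// ValidationService and may be shaped differently.
internal sealed record FlatAttributeSketch(string Name, string? DataSourceReference, int? BoundDataConnectionId);

internal static class ConnectionBindingRuleSketch
{
    // Returns the names of data-sourced attributes that ended up without a resolved connection.
    public static System.Collections.Generic.IReadOnlyList<string> FindUnbound(
        System.Collections.Generic.IEnumerable<FlatAttributeSketch> attributes)
    {
        var unbound = new System.Collections.Generic.List<string>();
        foreach (var attribute in attributes)
        {
            // Static attributes (no data source reference) never need a binding.
            var isDataSourced = !string.IsNullOrEmpty(attribute.DataSourceReference);
            if (isDataSourced && attribute.BoundDataConnectionId is null)
                unbound.Add(attribute.Name);
        }
        return unbound;
    }
}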
@@ -0,0 +1,102 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Instances;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Flattening;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
namespace ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests;
/// <summary>
/// M2.1 (#22): proves the FlatteningPipeline actually computes the alarm-capable
/// connection set from the loaded site data connections and threads it through
/// ValidationService → SemanticValidator. Before the fix the pipeline loaded the
/// connections but never passed the capable set, so the native-alarm-source
/// capability check (built but inert) never ran in production — a source bound to
/// a non-alarm-capable connection deployed silently.
/// </summary>
public class FlatteningPipelineNativeAlarmCapabilityTests
{
private const int InstanceId = 1;
private const int TemplateId = 10;
private const int SiteId = 100;
private readonly ITemplateEngineRepository _templateRepo = Substitute.For<ITemplateEngineRepository>();
private readonly ISiteRepository _siteRepo = Substitute.For<ISiteRepository>();
private readonly FlatteningPipeline _sut;
public FlatteningPipelineNativeAlarmCapabilityTests()
{
_sut = new FlatteningPipeline(
_templateRepo,
_siteRepo,
new FlatteningService(),
new ValidationService(),
new RevisionHashService());
}
/// <summary>
/// Seeds a single-template chain whose only template carries one native alarm
/// source bound to <paramref name="boundConnectionName"/>, and a site that owns a
/// single data connection named <paramref name="connectionName"/> that uses
/// <paramref name="connectionProtocol"/>.
/// </summary>
private void Arrange(string connectionName, string connectionProtocol, string boundConnectionName)
{
var template = new Template("Tank") { Id = TemplateId };
template.NativeAlarmSources.Add(new TemplateNativeAlarmSource("BoilerAlarms")
{
ConnectionName = boundConnectionName,
SourceReference = "ns=2;s=Boiler",
});
var instance = new Instance("Tank-01") { Id = InstanceId, TemplateId = TemplateId, SiteId = SiteId };
_templateRepo.GetInstanceByIdAsync(InstanceId, Arg.Any<CancellationToken>()).Returns(instance);
_templateRepo.GetTemplateWithChildrenAsync(TemplateId, Arg.Any<CancellationToken>()).Returns(template);
_templateRepo.GetCompositionsByTemplateIdAsync(TemplateId, Arg.Any<CancellationToken>())
.Returns([]);
_templateRepo.GetAllSharedScriptsAsync(Arg.Any<CancellationToken>())
.Returns([]);
var connection = new DataConnection(connectionName, connectionProtocol, SiteId) { Id = 7 };
_siteRepo.GetDataConnectionsBySiteIdAsync(SiteId, Arg.Any<CancellationToken>())
.Returns([connection]);
}
[Fact]
public async Task FlattenAndValidate_NativeAlarmSourceOnNonAlarmCapableConnection_ReportsCapabilityError()
{
// A "Modbus" connection is NOT alarm-capable (no IAlarmSubscribableConnection adapter).
Arrange(connectionName: "PlantBus", connectionProtocol: "Modbus", boundConnectionName: "PlantBus");
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.NativeAlarmSourceInvalid
&& e.Message.Contains("alarm-capable"));
}
[Theory]
[InlineData("OpcUa")]
[InlineData("MxGateway")]
// Case variants: IsAlarmCapable uses OrdinalIgnoreCase, matching DataConnectionFactory's
// own OrdinalIgnoreCase protocol-key lookup; lock the contract with non-canonical casing.
[InlineData("OPCUA")]
[InlineData("opcua")]
[InlineData("mxgateway")]
[InlineData("MXGATEWAY")]
public async Task FlattenAndValidate_NativeAlarmSourceOnAlarmCapableConnection_NoCapabilityError(string protocol)
{
Arrange(connectionName: "Boiler", connectionProtocol: protocol, boundConnectionName: "Boiler");
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.DoesNotContain(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.NativeAlarmSourceInvalid);
}
}
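// Illustrative sketch only, not part of this commit. It shows how an alarm-capable
// connection set could be derived from the loaded site connections, assuming (as the
// theory data above suggests) that "OpcUa" and "MxGateway" are the alarm-capable
// protocols and that the protocol match is OrdinalIgnoreCase. AlarmCapabilitySketch
// is a hypothetical name, and comparing connection names with Ordinal is an
// assumption; the production IsAlarmCapable and pipeline wiring are not shown in
// this diff.
internal static class AlarmCapabilitySketch
{
    private static readonly System.Collections.Generic.HashSet<string> AlarmCapableProtocols =
        new(System.StringComparer.OrdinalIgnoreCase) { "OpcUa", "MxGateway" };

    public static bool IsAlarmCapable(string? protocol) =>
        protocol is not null && AlarmCapableProtocols.Contains(protocol);

    // Names of the site connections a native alarm source may bind to.
    public static System.Collections.Generic.HashSet<string> CapableConnectionNames(
        System.Collections.Generic.IEnumerable<(string Name, string Protocol)> connections)
    {
        var names = new System.Collections.Generic.HashSet<string>(System.StringComparer.Ordinal);
        foreach (var (name, protocol) in connections)
        {
            if (IsAlarmCapable(protocol))
                names.Add(name);
        }
        return names;
    }
}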
@@ -100,7 +100,14 @@ public class DatabaseGatewayTests
 var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
 storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
-var gateway = new DatabaseGateway(_repository, NullLogger<DatabaseGateway>.Instance, storeAndForward: sf);
+// M2.3 (#7): CachedWriteAsync now attempts the write immediately and
+// only buffers on a TRANSIENT failure. The stub forces a transient
+// outcome so this test exercises the buffering path deterministically
+// without a real SQL Server.
+var gateway = new ExecuteStubGateway(
+_repository,
+sf,
+onExecute: () => throw new TransientDatabaseException("deadlock", errorNumber: 1205));
 // Audit Log #23 (ExecutionId Task 4): a known execution id / source
 // script so the gateway -> EnqueueAsync hop can be asserted below.
@@ -157,7 +164,11 @@ public class DatabaseGatewayTests
 var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
 storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
-var gateway = new DatabaseGateway(_repository, NullLogger<DatabaseGateway>.Instance, storeAndForward: sf);
+// M2.3 (#7): force a transient outcome so the write reaches S&F.
+var gateway = new ExecuteStubGateway(
+_repository,
+sf,
+onExecute: () => throw new TransientDatabaseException("deadlock", errorNumber: 1205));
 await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
@@ -167,6 +178,377 @@ public class DatabaseGatewayTests
Assert.NotEqual(0, maxRetries);
}
// ── M2.3 (#7): transient-vs-permanent SQL classification on the immediate
// cached-write attempt + the buffered retry path ──
/// <summary>
/// Builds a real, initialised in-memory store-and-forward service plus a
/// keep-alive connection (the SQLite shared-cache DB lives only while a
/// connection is open). The caller disposes <paramref name="keepAlive"/>.
/// </summary>
private static (ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService Sf, string ConnStr, Microsoft.Data.Sqlite.SqliteConnection KeepAlive)
NewStoreAndForward()
{
var dbName = $"EsgCachedWriteClassify_{Guid.NewGuid():N}";
var connStr = $"Data Source={dbName};Mode=Memory;Cache=Shared";
var keepAlive = new Microsoft.Data.Sqlite.SqliteConnection(connStr);
keepAlive.Open();
var storage = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardStorage(
connStr, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardStorage>.Instance);
storage.InitializeAsync().GetAwaiter().GetResult();
var sfOptions = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardOptions
{
DefaultMaxRetries = 99,
DefaultRetryInterval = TimeSpan.FromMinutes(10),
RetryTimerInterval = TimeSpan.FromMinutes(10),
};
var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
return (sf, connStr, keepAlive);
}
[Fact]
public async Task CachedWrite_PermanentSqlError_ReturnsFailedSynchronously_NotBuffered()
{
// A constraint/syntax/permission failure on the IMMEDIATE attempt must
// be returned to the script as Failed and must NOT be buffered — mirrors
// ExternalSystemClient.CachedCallAsync's PermanentExternalSystemException
// path.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new PermanentDatabaseException(
"Violation of PRIMARY KEY constraint", errorNumber: 2627));
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.False(result.Success);
Assert.False(result.WasBuffered);
Assert.NotNull(result.ErrorMessage);
// Nothing buffered — the permanent failure short-circuited S&F.
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_TransientSqlError_BuffersToStoreAndForward()
{
// A deadlock / timeout on the IMMEDIATE attempt is transient — the write
// is handed to S&F (WasBuffered=true), not returned as Failed.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test")
{
Id = 1,
MaxRetries = 5,
RetryDelay = TimeSpan.FromSeconds(12),
};
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new TransientDatabaseException(
"Transaction was deadlocked", errorNumber: 1205));
var result = await gateway.CachedWriteAsync(
"testDb", "UPDATE t SET v = 1", new Dictionary<string, object?> { ["x"] = 1 });
Assert.True(result.Success); // accepted for delivery
Assert.True(result.WasBuffered); // handed to S&F, not synchronously failed
Assert.Null(result.ErrorMessage);
Assert.Equal(1, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_ImmediateSuccess_NotBuffered_ReturnsDelivered()
{
// A write that succeeds immediately is done — it must NOT be buffered,
// and the result reports success (WasBuffered=false), mirroring the API
// path's immediate-success behaviour.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(_repository, sf, onExecute: () => { /* succeeds */ });
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.True(result.Success);
Assert.False(result.WasBuffered);
Assert.Null(result.ErrorMessage);
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task DeliverBuffered_TransientSqlError_RethrowsSoEngineRetries()
{
// On the retry path a transient failure must propagate so the S&F engine
// schedules another retry — mirrors ExternalSystemClient.DeliverBuffered
// letting TransientExternalSystemException escape.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new ExecuteStubGateway(
_repository,
storeAndForward: null,
onExecute: () => throw new TransientDatabaseException("timeout", errorNumber: -2));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
await Assert.ThrowsAsync<TransientDatabaseException>(
() => gateway.DeliverBufferedAsync(message));
}
[Fact]
public async Task DeliverBuffered_PermanentSqlError_ReturnsFalseSoMessageParks()
{
// On the retry path a permanent failure must park the message (return
// false) rather than retry forever — mirrors ExternalSystemClient.
// DeliverBuffered returning false on PermanentExternalSystemException.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new ExecuteStubGateway(
_repository,
storeAndForward: null,
onExecute: () => throw new PermanentDatabaseException(
"Invalid column name", errorNumber: 207));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
var delivered = await gateway.DeliverBufferedAsync(message);
Assert.False(delivered); // permanent — the S&F engine parks the message
}
// ── M2.3 (#7) code-review fix: ExecuteWriteAsync must classify NON-SqlException
// DB outages as transient (buffer+retry) and propagate cancellation —
// mirroring the HTTP path's ordered catches in InvokeHttpAsync. The pre-fix
// code only caught SqlException, so a live outage surfacing as
// InvalidOperationException / SocketException / IOException / TimeoutException
// escaped unclassified and crashed the Script Execution Actor instead of
// buffering. These tests drive the RAW execution seam (RunSqlAsync) so the
// PRODUCTION classification in ExecuteWriteAsync runs end-to-end. ──
public static IEnumerable<object[]> TransientNonSqlOutages()
{
// A live DB outage that surfaces as a non-SqlException: connection-state,
// socket, IO, and timeout failures are all retryable transport errors.
yield return new object[] { new InvalidOperationException("The connection is not open.") };
yield return new object[] { new System.Net.Sockets.SocketException(10061 /* connection refused */) };
yield return new object[] { new System.IO.IOException("Unable to read data from the transport connection.") };
yield return new object[] { new TimeoutException("The operation has timed out.") };
}
[Theory]
[MemberData(nameof(TransientNonSqlOutages))]
public async Task CachedWrite_NonSqlOutage_ClassifiedTransient_BuffersNotCrash(Exception outage)
{
// [1] A live outage that is NOT a SqlException must be classified TRANSIENT
// (buffered for retry), NOT escape unclassified to crash the script actor,
// and NOT be returned as a permanent Failed result.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test")
{
Id = 1,
MaxRetries = 5,
RetryDelay = TimeSpan.FromSeconds(12),
};
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
// RawExecuteStubGateway routes the raw throw through the PRODUCTION
// ExecuteWriteAsync classification (the seam under test), unlike
// ExecuteStubGateway which throws an already-classified exception.
var gateway = new RawExecuteStubGateway(_repository, sf, onRunSql: () => throw outage);
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.True(result.Success); // accepted for delivery, not a crash
Assert.True(result.WasBuffered); // handed to S&F as transient
Assert.Null(result.ErrorMessage); // not a permanent Failed result
Assert.Equal(1, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_CancellationRequested_PropagatesOperationCanceled_NotReclassified()
{
// [2] OperationCanceledException raised while the caller's token is
// cancelled must propagate UNCHANGED — never reclassified as a transient
// DB error and never buffered. Mirrors the HTTP path's first catch:
// `catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) throw;`
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
using var cts = new CancellationTokenSource();
cts.Cancel();
var gateway = new RawExecuteStubGateway(
_repository, sf, onRunSql: () => throw new OperationCanceledException(cts.Token));
await Assert.ThrowsAsync<OperationCanceledException>(
() => gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)", cancellationToken: cts.Token));
// Cancellation is not a transient failure — nothing must have been buffered.
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_UnexpectedException_Propagates_NotClassifiedTransient()
{
// An exception type outside the transient transport set (e.g.
// ArgumentException) is NOT a DB outage — it must propagate, exactly as
// the HTTP path lets genuinely-unexpected exceptions escape past
// `catch (Exception ex) when (ErrorClassifier.IsTransient(ex))`.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new RawExecuteStubGateway(
_repository, sf, onRunSql: () => throw new ArgumentException("authoring bug"));
await Assert.ThrowsAsync<ArgumentException>(
() => gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)"));
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task DeliverBuffered_NonSqlOutage_RethrowsAsTransient_SoEngineRetries()
{
// [1] on the RETRY path: a non-SqlException outage during delivery must be
// classified transient and propagate (as TransientDatabaseException) so
// the S&F engine schedules another retry — it must NOT crash/park.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new RawExecuteStubGateway(
_repository,
storeAndForward: null,
onRunSql: () => throw new InvalidOperationException("The connection is not open."));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
await Assert.ThrowsAsync<TransientDatabaseException>(
() => gateway.DeliverBufferedAsync(message));
}
/// <summary>
/// Reads the current buffered-message count off the S&amp;F SQLite DB by
/// counting <c>sf_messages</c> rows (the engine's persistence table).
/// </summary>
private static int ReadBufferDepth(string connStr)
{
using var conn = new Microsoft.Data.Sqlite.SqliteConnection(connStr);
conn.Open();
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM sf_messages";
return Convert.ToInt32(cmd.ExecuteScalar());
}
/// <summary>
/// Test gateway that substitutes the SQL-execution seam so a test can drive
/// success / transient / permanent outcomes without a real SQL Server (and
/// without fabricating a <see cref="Microsoft.Data.SqlClient.SqlException"/>,
/// which has no public constructor). Production classifies a real
/// <c>SqlException</c> into <see cref="TransientDatabaseException"/> /
/// <see cref="PermanentDatabaseException"/> at this same seam.
/// </summary>
private sealed class ExecuteStubGateway : DatabaseGateway
{
private readonly Action _onExecute;
public ExecuteStubGateway(
IExternalSystemRepository repository,
ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService? storeAndForward,
Action onExecute)
: base(repository, NullLogger<DatabaseGateway>.Instance, storeAndForward)
=> _onExecute = onExecute;
internal override Task ExecuteWriteAsync(
string connectionName,
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
_onExecute();
return Task.CompletedTask;
}
}
/// <summary>
/// Test gateway that substitutes the INNER SQL-execution seam
/// (<c>RunSqlAsync</c>) so a test can throw RAW exceptions (a real outage
/// shape: <see cref="InvalidOperationException"/>, <see cref="System.Net.Sockets.SocketException"/>,
/// etc.) and have them flow through the PRODUCTION
/// <c>ExecuteWriteAsync</c> classification (the catch ordering under test) —
/// unlike <see cref="ExecuteStubGateway"/>, which throws an
/// already-classified <see cref="TransientDatabaseException"/> /
/// <see cref="PermanentDatabaseException"/> and so bypasses the catches.
/// </summary>
private sealed class RawExecuteStubGateway : DatabaseGateway
{
private readonly Action _onRunSql;
public RawExecuteStubGateway(
IExternalSystemRepository repository,
ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService? storeAndForward,
Action onRunSql)
: base(repository, NullLogger<DatabaseGateway>.Instance, storeAndForward)
=> _onRunSql = onRunSql;
internal override Task RunSqlAsync(
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
_onRunSql();
return Task.CompletedTask;
}
}
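// Illustrative sketch only, not the production ExecuteWriteAsync. It shows the catch
// ordering the M2.3 code-review tests above pin down: caller cancellation propagates
// unchanged, transport-shaped outages are rewrapped as transient, and anything else
// escapes unclassified. CatchOrderingSketch and TransientSketchException are
// hypothetical names. The transport set is the one listed in TransientNonSqlOutages;
// the SqlException branch (classified by error number) is left out because
// SqlException has no public constructor.
internal static class CatchOrderingSketch
{
    public sealed class TransientSketchException : System.Exception
    {
        public TransientSketchException(string message, System.Exception inner) : base(message, inner) { }
    }

    private static bool IsTransportOutage(System.Exception ex) =>
        ex is System.InvalidOperationException
            or System.Net.Sockets.SocketException
            or System.IO.IOException
            or System.TimeoutException;

    public static async System.Threading.Tasks.Task ExecuteAsync(
        System.Func<System.Threading.Tasks.Task> runSql,
        System.Threading.CancellationToken cancellationToken)
    {
        try
        {
            await runSql();
        }
        catch (System.OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            throw; // caller cancelled: propagate unchanged, never buffer
        }
        catch (System.Exception ex) when (IsTransportOutage(ex))
        {
            // Retryable outage: surface it as transient so the caller can buffer it.
            throw new TransientSketchException("transient database outage", ex);
        }
        // Any other exception type (for example ArgumentException) is not caught here.
    }
}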
private static (int MaxRetries, long RetryIntervalMs, Guid? ExecutionId, string? SourceScript)
ReadBufferedRetrySettings(string connStr)
{
@@ -0,0 +1,105 @@
using System.Data.Common;
namespace ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests;
/// <summary>
/// M2.3 (#7): unit tests for the transient-vs-permanent SQL error-number
/// classifier that <c>DatabaseGateway</c> uses to decide whether a failed
/// cached write should be buffered (transient) or returned to the script
/// synchronously / parked (permanent).
/// </summary>
public class SqlErrorClassifierTests
{
// The full transient set documented on SqlErrorClassifier — connection,
// timeout, deadlock, and Azure throttle error numbers. A retry can plausibly
// succeed for any of these, so they are buffered to store-and-forward.
[Theory]
[InlineData(-2)] // timeout expired
[InlineData(-1)] // connection error
[InlineData(2)] // network / instance not found
[InlineData(53)] // network path not found
[InlineData(64)] // connection terminated mid-session
[InlineData(233)] // no process on the other end of the pipe
[InlineData(1205)] // deadlock victim
[InlineData(10053)] // transport-level abort
[InlineData(10054)] // connection reset by peer
[InlineData(10060)] // connection timed out
[InlineData(40197)] // Azure SQL service error, retry
[InlineData(40501)] // Azure SQL service busy
[InlineData(40613)] // Azure SQL database unavailable
[InlineData(49918)] // Azure SQL cannot process request (throttle)
[InlineData(49919)] // Azure SQL too many create/update operations
[InlineData(49920)] // Azure SQL too many operations (throttle)
public void IsTransient_KnownTransientNumber_ReturnsTrue(int errorNumber)
{
Assert.True(SqlErrorClassifier.IsTransient(errorNumber));
}
// Constraint, syntax, and permission errors are permanent — retrying the
// identical statement cannot succeed and may cause duplicate side effects.
[Theory]
[InlineData(547)] // constraint violation (FK/CHECK)
[InlineData(2627)] // primary-key / unique constraint violation
[InlineData(2601)] // duplicate key in a unique index
[InlineData(102)] // incorrect syntax
[InlineData(156)] // incorrect syntax near a keyword
[InlineData(207)] // invalid column name
[InlineData(208)] // invalid object name
[InlineData(229)] // permission denied on object
[InlineData(230)] // permission denied on column
[InlineData(262)] // permission denied (CREATE etc.)
public void IsTransient_KnownPermanentNumber_ReturnsFalse(int errorNumber)
{
Assert.False(SqlErrorClassifier.IsTransient(errorNumber));
}
[Theory]
[InlineData(0)] // no error number captured
[InlineData(99999)] // unknown / undocumented number
[InlineData(12345)]
[InlineData(int.MaxValue)]
public void IsTransient_UnknownNumber_DefaultsToPermanent(int errorNumber)
{
// Fail-fast is the safer default: an unrecognised error number must NOT
// be silently retried forever. Unknown => permanent => false.
Assert.False(SqlErrorClassifier.IsTransient(errorNumber));
}
// ── M2.3 (#7) code-review fix: IsTransient(Exception) — a live DB outage does
// not always surface as a SqlException. Transport/connection/timeout/driver
// exception types are transient (buffer+retry), mirroring the HTTP path's
// ErrorClassifier.IsTransient(Exception). ──
public static IEnumerable<object[]> TransientExceptionTypes()
{
yield return new object[] { new InvalidOperationException("connection not open") };
yield return new object[] { new System.IO.IOException("transport reset") };
yield return new object[] { new System.Net.Sockets.SocketException(10060) };
yield return new object[] { new TimeoutException("timed out") };
yield return new object[] { new TaskCanceledException("driver-level cancellation") };
// Any DbException that is NOT a SqlException is a driver/transport error.
yield return new object[] { new NonSqlDbException("provider transport error") };
}
[Theory]
[MemberData(nameof(TransientExceptionTypes))]
public void IsTransient_Exception_TrueForTransportTypes(Exception ex)
{
Assert.True(SqlErrorClassifier.IsTransient(ex));
}
[Fact]
public void IsTransient_Exception_FalseForUnexpectedType()
{
// Authoring bugs are NOT a DB outage — they must propagate, exactly as the
// HTTP path lets genuinely-unexpected exceptions escape its IsTransient filter.
Assert.False(SqlErrorClassifier.IsTransient(new ArgumentException("authoring bug")));
Assert.False(SqlErrorClassifier.IsTransient(new NullReferenceException()));
}
/// <summary>A concrete <see cref="DbException"/> that is not a SqlException, for the classifier unit test.</summary>
private sealed class NonSqlDbException : DbException
{
public NonSqlDbException(string message) : base(message) { }
}
}
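// Illustrative sketch only, not the production SqlErrorClassifier (which this diff
// does not show). It is a minimal classifier that would satisfy the tests above: an
// allow-list of transient error numbers with everything else treated as permanent,
// plus an exception-type overload for outages that do not surface as SqlException.
// SqlErrorClassifierSketch is a hypothetical name. Treating every DbException as
// transient assumes the real code handles SqlException by error number before this
// check is reached.
internal static class SqlErrorClassifierSketch
{
    // The transient numbers exercised by IsTransient_KnownTransientNumber_ReturnsTrue.
    private static readonly System.Collections.Generic.HashSet<int> TransientNumbers = new()
    {
        -2, -1, 2, 53, 64, 233, 1205, 10053, 10054, 10060,
        40197, 40501, 40613, 49918, 49919, 49920,
    };

    // Unknown numbers fall through to false (permanent), the fail-fast default.
    public static bool IsTransient(int errorNumber) => TransientNumbers.Contains(errorNumber);

    public static bool IsTransient(System.Exception ex) => ex switch
    {
        System.InvalidOperationException => true,
        System.IO.IOException => true,
        System.Net.Sockets.SocketException => true,
        System.TimeoutException => true,
        System.Threading.Tasks.TaskCanceledException => true,
        System.Data.Common.DbException => true,
        _ => false,
    };
}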
@@ -0,0 +1,48 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) idempotency regression — code-review finding on commit d81f747.
/// <para>
/// <see cref="ServiceCollectionExtensions.AddSiteEventLogHealthMetricsBridge"/> uses a
/// factory-lambda overload of <c>AddHostedService</c>, which sets only
/// <c>ImplementationFactory</c> and leaves <c>ImplementationType</c> null. The original
/// <c>ImplementationType ==</c> guard was therefore a silent no-op: a second call would spin
/// up a second <see cref="SiteEventLogFailureCountReporter"/> (two timers both polling).
/// The fix uses a private marker singleton whose <c>ServiceType</c> is always set.
/// </para>
/// </summary>
public class AddSiteEventLogHealthMetricsBridgeTests
{
[Fact]
public void AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService()
{
// M2.16 (#30): calling the bridge method twice must register exactly one
// SiteEventLogFailureCountReporter. Without the marker-type guard the
// ImplementationType == check was a no-op for factory-lambda registrations,
// so the second call would have added a second hosted service (two timers).
var services = new ServiceCollection();
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
services.AddHealthMonitoring();
Func<IServiceProvider, Func<long>> factory = _ => () => 0L;
services.AddSiteEventLogHealthMetricsBridge(factory);
services.AddSiteEventLogHealthMetricsBridge(factory);
// Count IHostedService descriptors whose factory produces a
// SiteEventLogFailureCountReporter. Because it is factory-registered,
// ImplementationType is null — we count by resolving and checking type.
using var provider = services.BuildServiceProvider();
var reporters = provider.GetServices<IHostedService>()
.OfType<SiteEventLogFailureCountReporter>()
.ToList();
Assert.Single(reporters);
}
}
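// Illustrative sketch only, not the production AddSiteEventLogHealthMetricsBridge. It
// shows the marker-singleton guard described in the summary above: a factory-lambda
// AddHostedService registration leaves ImplementationType null, so the guard checks
// for a private marker type whose ServiceType is always set. The names
// IdempotentHostedServiceSketch, AddHostedServiceOnce and RegistrationMarker are
// hypothetical. The sketch relies on this file's existing using directives and on
// implicit usings for System and System.Linq.
internal static class IdempotentHostedServiceSketch
{
    // One marker per hosted-service type so unrelated registrations do not block each other.
    private sealed class RegistrationMarker<T> { }

    public static IServiceCollection AddHostedServiceOnce<THostedService>(
        this IServiceCollection services,
        Func<IServiceProvider, THostedService> factory)
        where THostedService : class, IHostedService
    {
        if (services.Any(d => d.ServiceType == typeof(RegistrationMarker<THostedService>)))
            return services; // already registered: the second call is a no-op

        services.AddSingleton<RegistrationMarker<THostedService>>();
        services.AddHostedService(factory);
        return services;
    }
}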
@@ -0,0 +1,77 @@
using Microsoft.Extensions.Logging.Abstractions;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) — unit tests for <see cref="SiteEventLogFailureCountReporter"/>.
/// Verifies that the poller reads the count provided by the
/// <see cref="Func{TResult}"/> delegate and pushes it into
/// <see cref="ISiteHealthCollector.SetSiteEventLogWriteFailures"/>.
/// </summary>
public class SiteEventLogFailureCountReporterTests
{
[Fact]
public async Task StartAsync_ImmediatelyProbes_FailedWriteCount()
{
// Arrange
var count = 99L;
var collector = new SiteHealthCollector();
using var reporter = new SiteEventLogFailureCountReporter(
failedWriteCountProvider: () => count,
collector: collector,
logger: NullLogger<SiteEventLogFailureCountReporter>.Instance,
refreshInterval: TimeSpan.FromHours(1)); // long interval — only immediate tick matters
// Act
await reporter.StartAsync(CancellationToken.None);
// Give the background Task a moment to execute its synchronous immediate probe.
var deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures == 0L
&& DateTime.UtcNow < deadline)
{
await Task.Delay(10);
}
// Assert — the immediate probe before the first Delay must have fired.
var report = collector.CollectReport("site-1");
Assert.Equal(99L, report.SiteEventLogWriteFailures);
await reporter.StopAsync(CancellationToken.None);
}
[Fact]
public async Task StartAsync_PushesLatestCount_OnEachTick()
{
// Arrange — start with count 5; advance to 12 after the first tick.
var count = 5L;
var collector = new SiteHealthCollector();
using var reporter = new SiteEventLogFailureCountReporter(
failedWriteCountProvider: () => count,
collector: collector,
logger: NullLogger<SiteEventLogFailureCountReporter>.Instance,
refreshInterval: TimeSpan.FromMilliseconds(50));
await reporter.StartAsync(CancellationToken.None);
// Wait for immediate probe.
var deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures != 5L
&& DateTime.UtcNow < deadline)
await Task.Delay(10);
Assert.Equal(5L, collector.CollectReport("site-1").SiteEventLogWriteFailures);
// Advance the counter and wait for the next tick to push the new value.
count = 12L;
deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures != 12L
&& DateTime.UtcNow < deadline)
await Task.Delay(10);
Assert.Equal(12L, collector.CollectReport("site-1").SiteEventLogWriteFailures);
await reporter.StopAsync(CancellationToken.None);
}
}
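// Illustrative sketch only, not the production SiteEventLogFailureCountReporter. It
// shows the loop shape the two tests above depend on: probe first, then wait, so a
// value is pushed immediately after start even with a very long refresh interval.
// ImmediateProbePollerSketch and its parameter names are hypothetical.
internal static class ImmediateProbePollerSketch
{
    public static async System.Threading.Tasks.Task RunAsync(
        System.Func<long> readCount,
        System.Action<long> push,
        System.TimeSpan interval,
        System.Threading.CancellationToken stoppingToken)
    {
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                push(readCount()); // immediate probe on the first pass, then once per tick
                await System.Threading.Tasks.Task.Delay(interval, stoppingToken);
            }
        }
        catch (System.OperationCanceledException)
        {
            // Normal shutdown: the delay was cancelled by the stopping token.
        }
    }
}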
@@ -0,0 +1,62 @@
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) regression coverage. <see cref="ISiteEventLogger.FailedWriteCount"/>
/// is a cumulative (point-in-time) counter. A periodic
/// <c>SiteEventLogFailureCountReporter</c> hosted service polls the count and
/// pushes it into the collector via
/// <see cref="ISiteHealthCollector.SetSiteEventLogWriteFailures"/> so the next
/// <see cref="ISiteHealthCollector.CollectReport"/> includes it in the report
/// payload as <c>SiteEventLogWriteFailures</c>. Unlike the per-interval
/// SiteAuditWriteFailures counter, this value is NOT reset on collect — it
/// carries forward whatever the most recent poller push delivered.
/// </summary>
public class SiteEventLogWriteFailuresMetricTests
{
private readonly SiteHealthCollector _collector = new();
[Fact]
public void Set_Then_CollectReport_IncludesCount()
{
_collector.SetSiteEventLogWriteFailures(17L);
var report = _collector.CollectReport("site-1");
Assert.Equal(17L, report.SiteEventLogWriteFailures);
}
[Fact]
public void Report_Payload_Includes_SiteEventLogWriteFailures_AsZeroByDefault()
{
var report = _collector.CollectReport("site-1");
Assert.Equal(0L, report.SiteEventLogWriteFailures);
}
[Fact]
public void CollectReport_DoesNotReset_SiteEventLogWriteFailures()
{
// This is a point-in-time cumulative count — successive CollectReport
// calls before the next poller tick MUST carry forward the same value
// rather than resetting to zero (which would falsely indicate no failures
// between the two reports).
_collector.SetSiteEventLogWriteFailures(42L);
var first = _collector.CollectReport("site-1");
var second = _collector.CollectReport("site-1");
Assert.Equal(42L, first.SiteEventLogWriteFailures);
Assert.Equal(42L, second.SiteEventLogWriteFailures);
}
[Fact]
public void Set_Overwrites_Previous_Value()
{
_collector.SetSiteEventLogWriteFailures(5L);
_collector.SetSiteEventLogWriteFailures(9L);
var report = _collector.CollectReport("site-1");
Assert.Equal(9L, report.SiteEventLogWriteFailures);
}
}
@@ -11,6 +11,7 @@
<ItemGroup> <ItemGroup>
<PackageReference Include="coverlet.collector" /> <PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.Data.Sqlite" /> <PackageReference Include="Microsoft.Data.Sqlite" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" /> <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
<PackageReference Include="Microsoft.Extensions.Options" /> <PackageReference Include="Microsoft.Extensions.Options" />
<PackageReference Include="Microsoft.NET.Test.Sdk" /> <PackageReference Include="Microsoft.NET.Test.Sdk" />
@@ -35,6 +35,11 @@ public class CentralActorPathTests : IAsyncLifetime
// env var is visible to StartupValidator.Validate() at Program.cs line 42. // env var is visible to StartupValidator.Validate() at Program.cs line 42.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper); CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the Require restored by reverting Host-008 (REQ-HOST-3/4,
// M2.9 #17) passes for the Central-role StartupValidator. A non-empty placeholder
// satisfies the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>() _factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder => .WithWebHostBuilder(builder =>
@@ -94,6 +99,7 @@ public class CentralActorPathTests : IAsyncLifetime
_factory?.Dispose(); _factory?.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null); Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
await Task.CompletedTask; await Task.CompletedTask;
} }
@@ -101,6 +101,11 @@ public class CentralAuditWiringTests : IDisposable
// runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config. // runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper); CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the Require restored by reverting Host-008 (REQ-HOST-3/4,
// M2.9 #17) passes for the Central-role StartupValidator. A non-empty placeholder
// satisfies the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>() _factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder => .WithWebHostBuilder(builder =>
@@ -156,6 +161,7 @@ public class CentralAuditWiringTests : IDisposable
_factory.Dispose(); _factory.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null); Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
} }
[Fact] [Fact]
@@ -10,8 +10,12 @@ namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
/// ///
/// Also supplies <c>ScadaBridge__InboundApi__ApiKeyPepper</c> so the Central-role /// Also supplies <c>ScadaBridge__InboundApi__ApiKeyPepper</c> so the Central-role
/// StartupValidator preflight (added in 1fcc4f5) does not fail for tests that set /// StartupValidator preflight (added in 1fcc4f5) does not fail for tests that set
/// <c>DOTNET_ENVIRONMENT=Central</c> without an explicit pepper env var. Both vars /// <c>DOTNET_ENVIRONMENT=Central</c> without an explicit pepper env var.
/// are restored on Dispose so tests stay isolated. ///
/// Also supplies <c>ScadaBridge__Database__MachineDataDb</c> so the Central-role
/// StartupValidator preflight (reverts Host-008, REQ-HOST-3/4, M2.9 #17) does not
/// fail for tests that set <c>DOTNET_ENVIRONMENT=Central</c> without an explicit
/// MachineDataDb env var. All vars are restored on Dispose so tests stay isolated.
/// </summary> /// </summary>
internal sealed class CentralDbTestEnvironment : IDisposable internal sealed class CentralDbTestEnvironment : IDisposable
{ {
@@ -22,6 +26,11 @@ internal sealed class CentralDbTestEnvironment : IDisposable
private const string ConfigKey = "ScadaBridge__Database__ConfigurationDb"; private const string ConfigKey = "ScadaBridge__Database__ConfigurationDb";
private const string MachineDataDb =
"Server=localhost,1433;Database=ScadaBridgeMachineData;User Id=scadabridge_app;Password=ScadaBridge_Dev1#;TrustServerCertificate=true";
private const string MachineDataKey = "ScadaBridge__Database__MachineDataDb";
// Test-only pepper — satisfies the ≥16-char StartupValidator requirement without // Test-only pepper — satisfies the ≥16-char StartupValidator requirement without
// committing a real secret. The env-var name uses the double-underscore delimiter // committing a real secret. The env-var name uses the double-underscore delimiter
// so AddEnvironmentVariables() maps it to ScadaBridge:InboundApi:ApiKeyPepper. // so AddEnvironmentVariables() maps it to ScadaBridge:InboundApi:ApiKeyPepper.
@@ -29,6 +38,7 @@ internal sealed class CentralDbTestEnvironment : IDisposable
private const string PepperKey = "ScadaBridge__InboundApi__ApiKeyPepper"; private const string PepperKey = "ScadaBridge__InboundApi__ApiKeyPepper";
private readonly string? _previousConfig; private readonly string? _previousConfig;
private readonly string? _previousMachineData;
private readonly string? _previousPepper; private readonly string? _previousPepper;
public CentralDbTestEnvironment() public CentralDbTestEnvironment()
@@ -36,6 +46,9 @@ internal sealed class CentralDbTestEnvironment : IDisposable
_previousConfig = Environment.GetEnvironmentVariable(ConfigKey); _previousConfig = Environment.GetEnvironmentVariable(ConfigKey);
Environment.SetEnvironmentVariable(ConfigKey, ConfigurationDb); Environment.SetEnvironmentVariable(ConfigKey, ConfigurationDb);
_previousMachineData = Environment.GetEnvironmentVariable(MachineDataKey);
Environment.SetEnvironmentVariable(MachineDataKey, MachineDataDb);
_previousPepper = Environment.GetEnvironmentVariable(PepperKey); _previousPepper = Environment.GetEnvironmentVariable(PepperKey);
Environment.SetEnvironmentVariable(PepperKey, TestPepper); Environment.SetEnvironmentVariable(PepperKey, TestPepper);
} }
@@ -43,6 +56,7 @@ internal sealed class CentralDbTestEnvironment : IDisposable
public void Dispose() public void Dispose()
{ {
Environment.SetEnvironmentVariable(ConfigKey, _previousConfig); Environment.SetEnvironmentVariable(ConfigKey, _previousConfig);
Environment.SetEnvironmentVariable(MachineDataKey, _previousMachineData);
Environment.SetEnvironmentVariable(PepperKey, _previousPepper); Environment.SetEnvironmentVariable(PepperKey, _previousPepper);
} }
} }
@@ -95,6 +95,11 @@ public class CentralCompositionRootTests : IDisposable
// runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config. // runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper); CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the Require restored by reverting Host-008 (REQ-HOST-3/4,
// M2.9 #17) passes for the Central-role StartupValidator. A non-empty placeholder
// satisfies the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>() _factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder => .WithWebHostBuilder(builder =>
@@ -159,6 +164,7 @@ public class CentralCompositionRootTests : IDisposable
_factory.Dispose(); _factory.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null); Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
} }
// --- Singletons --- // --- Singletons ---
@@ -399,6 +405,9 @@ public class SiteCompositionRootTests : IDisposable
new object[] { typeof(IEventLogQueryService) }, new object[] { typeof(IEventLogQueryService) },
new object[] { typeof(ISiteIdentityProvider) }, new object[] { typeof(ISiteIdentityProvider) },
new object[] { typeof(IHealthReportTransport) }, new object[] { typeof(IHealthReportTransport) },
// M2.15 (#29): the active-node purge gate must be registered on site nodes
// so EventLogPurge only runs on the active node.
new object[] { typeof(SiteEventLogActiveNodeCheck) },
}; };
// --- Scoped services --- // --- Scoped services ---
@@ -158,6 +158,15 @@ public class HealthCheckTests : IDisposable
Assert.Contains(ZbHealthTags.Ready, registrations["database"].Tags); Assert.Contains(ZbHealthTags.Ready, registrations["database"].Tags);
Assert.Contains(ZbHealthTags.Ready, registrations["akka-cluster"].Tags); Assert.Contains(ZbHealthTags.Ready, registrations["akka-cluster"].Tags);
// M2.14 (#28): readiness ALSO reflects "required cluster singletons running"
// (REQ-HOST-4a). The Central-only required-singletons check is Ready-tagged so
// it gates /health/ready alongside database + akka-cluster, but is leadership-
// agnostic (it does NOT carry the Active tag), so a ready standby stays ready.
Assert.True(registrations.ContainsKey("required-singletons"),
"Expected a 'required-singletons' health check.");
Assert.Contains(ZbHealthTags.Ready, registrations["required-singletons"].Tags);
Assert.DoesNotContain(ZbHealthTags.Active, registrations["required-singletons"].Tags);
// The leader-only active-node check must NOT be on the readiness tier. // The leader-only active-node check must NOT be on the readiness tier.
Assert.DoesNotContain(ZbHealthTags.Ready, registrations["active-node"].Tags); Assert.DoesNotContain(ZbHealthTags.Ready, registrations["active-node"].Tags);
} }
@@ -0,0 +1,143 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.Host.Health;
namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
/// <summary>
/// M2.14 (#28): unit tests for <see cref="RequiredSingletonsHealthCheck"/>.
///
/// The check probes each required central singleton through its local
/// <c>ClusterSingletonProxy</c> by Asking an <see cref="Identify"/> with a short
/// bounded timeout and treating a non-null <see cref="ActorIdentity.Subject"/> as
/// "reachable". These tests exercise that probe logic directly against a TestKit
/// <see cref="ActorSystem"/>:
/// <list type="bullet">
/// <item>present + reachable proxy paths (live echo actors) → Healthy;</item>
/// <item>a missing proxy path (ActorSelection resolves a null Subject) → Unhealthy
/// naming the unreachable singleton.</item>
/// </list>
/// No WebApplicationFactory / DB / formed cluster is needed — the probe is just an
/// in-process Identify round-trip, so the tests are deterministic and fast.
/// </summary>
public class RequiredSingletonsHealthCheckTests : TestKit
{
/// <summary>A minimal live actor that does nothing — its mere existence makes
/// an <see cref="Identify"/> resolve a non-null Subject (i.e. "reachable").</summary>
/// <remarks>No <c>Receive&lt;Identify&gt;</c> handler is needed: Akka treats
/// <see cref="Identify"/> as an auto-received message and replies with an
/// <see cref="ActorIdentity"/> before user code runs, so an empty actor at the proxy
/// path is sufficient to simulate a reachable singleton.</remarks>
private sealed class EchoActor : ReceiveActor
{
}
private IServiceProvider ProviderReturning(ActorSystem system)
{
var services = new ServiceCollection();
services.AddSingleton(system);
return services.BuildServiceProvider();
}
private static async Task<HealthCheckResult> RunAsync(RequiredSingletonsHealthCheck check)
{
var context = new HealthCheckContext
{
Registration = new HealthCheckRegistration(
"required-singletons", check, failureStatus: null, tags: null),
};
return await check.CheckHealthAsync(context, CancellationToken.None);
}
[Fact]
public async Task AllRequiredSingletonProxiesReachable_ReportsHealthy()
{
// Create a live actor at every required proxy path so each Identify resolves
// a non-null Subject.
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Healthy, result.Status);
}
[Fact]
public async Task OneRequiredSingletonUnreachable_ReportsUnhealthyNamingIt()
{
// Create all but one proxy. The missing one's ActorSelection resolves an
// ActorIdentity with a null Subject within the bounded timeout → unreachable.
var missing = RequiredSingletonsHealthCheck.RequiredSingletonProxyNames[0];
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
if (name == missing)
continue;
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
Assert.NotNull(result.Description);
Assert.Contains(missing, result.Description!);
}
[Fact]
public async Task ActorSystemNotYetAvailable_ReportsUnhealthy_DoesNotThrow()
{
// Startup race: ActorSystem not yet bridged into DI. The check must map this
// to Unhealthy (the node is not ready to serve) rather than throwing.
var emptyProvider = new ServiceCollection().BuildServiceProvider();
var check = new RequiredSingletonsHealthCheck(
emptyProvider,
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task PreCancelledToken_ReportsUnhealthy_DoesNotThrow()
{
// Shutdown-race path: CheckHealthAsync is called with an already-cancelled
// token (e.g. host is tearing down). The check must never throw — any
// OperationCanceledException from Ask must be caught and mapped to Unhealthy.
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
using var cts = new CancellationTokenSource();
cts.Cancel(); // already cancelled before the check runs
var context = new HealthCheckContext
{
Registration = new HealthCheckRegistration(
"required-singletons", check, failureStatus: null, tags: null),
};
// Must not throw; an already-cancelled token → all probes fail → Unhealthy.
var result = await check.CheckHealthAsync(context, cts.Token);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
}
@@ -20,6 +20,7 @@ public class StartupValidatorTests
["ScadaBridge:Node:NodeHostname"] = "central-node1", ["ScadaBridge:Node:NodeHostname"] = "central-node1",
["ScadaBridge:Node:RemotingPort"] = "8081", ["ScadaBridge:Node:RemotingPort"] = "8081",
["ScadaBridge:Database:ConfigurationDb"] = "Server=localhost;Database=Config;", ["ScadaBridge:Database:ConfigurationDb"] = "Server=localhost;Database=Config;",
["ScadaBridge:Database:MachineDataDb"] = "Server=localhost;Database=MachineData;",
["ScadaBridge:Security:Ldap:Server"] = "ldap.example.com", ["ScadaBridge:Security:Ldap:Server"] = "ldap.example.com",
["ScadaBridge:Security:JwtSigningKey"] = "test-signing-key-at-least-32-chars-long", ["ScadaBridge:Security:JwtSigningKey"] = "test-signing-key-at-least-32-chars-long",
["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@central-node1:8081", ["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@central-node1:8081",
@@ -152,17 +153,19 @@ public class StartupValidatorTests
} }
[Fact] [Fact]
public void Central_MissingMachineDataDb_PassesValidation() public void Central_MissingMachineDataDb_FailsValidation()
{ {
// Host-008 regression: MachineDataDb is never consumed anywhere in the // Reverts Host-008. REQ-HOST-3/REQ-HOST-4 require MachineDataDb to be
// system (only ConfigurationDb is wired into AddConfigurationDatabase). // validated at startup for Central nodes, and the shipped docker appsettings
// It is no longer a required key, so its absence must not fail startup. // (docker/central-node-a/appsettings.Central.json and central-node-b) carry
// the key. The prior Host-008 decision (which removed the Require) is reversed
// here (#17, M2.9): a missing MachineDataDb must fail fast with a clear error.
var values = ValidCentralConfig(); var values = ValidCentralConfig();
values.Remove("ScadaBridge:Database:MachineDataDb"); values.Remove("ScadaBridge:Database:MachineDataDb");
var config = BuildConfig(values); var config = BuildConfig(values);
var ex = Record.Exception(() => StartupValidator.Validate(config)); var ex = Assert.Throws<InvalidOperationException>(() => StartupValidator.Validate(config));
Assert.Null(ex); Assert.Contains("MachineDataDb connection string required for Central", ex.Message);
} }
[Fact] [Fact]
@@ -1,12 +1,30 @@
using System.Text.Json; using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests; namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;
/// <summary> /// <summary>
/// WP-2: Tests for parameter validation — type checking, required fields, extended type system. /// WP-2 / InboundAPI-M2.6: tests for parameter validation — type checking,
/// required fields, the extended type system, and RECURSIVE (nested Object /
/// List element) type validation with path-qualified errors.
///
/// <para>
/// Definitions are expressed as JSON Schema (the canonical persisted format
/// produced by the Central UI / migration). The validator also accepts the
/// legacy flat-array form; that backward-compat path is covered by the final
/// region.
/// </para>
/// </summary> /// </summary>
public class ParameterValidatorTests public class ParameterValidatorTests
{ {
private static JsonElement Body(string json)
{
using var doc = JsonDocument.Parse(json);
return doc.RootElement.Clone();
}
// ── No / empty definitions ────────────────────────────────────────────────
[Fact] [Fact]
public void NoDefinitions_NoBody_ReturnsValid() public void NoDefinitions_NoBody_ReturnsValid()
{ {
@@ -16,21 +34,27 @@ public class ParameterValidatorTests
} }
[Fact] [Fact]
public void EmptyDefinitions_ReturnsValid() public void EmptyObjectSchema_ReturnsValid()
{
var result = ParameterValidator.Validate(null, """{"type":"object","properties":{}}""");
Assert.True(result.IsValid);
}
[Fact]
public void EmptyLegacyArray_ReturnsValid()
{ {
var result = ParameterValidator.Validate(null, "[]"); var result = ParameterValidator.Validate(null, "[]");
Assert.True(result.IsValid); Assert.True(result.IsValid);
} }
// ── Required / body shape ──────────────────────────────────────────────────
[Fact] [Fact]
public void RequiredParameterMissing_ReturnsInvalid() public void RequiredParameterMissing_ReturnsInvalid()
{ {
var definitions = JsonSerializer.Serialize(new[] const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
{
new { Name = "value", Type = "Integer", Required = true }
});
var result = ParameterValidator.Validate(null, definitions); var result = ParameterValidator.Validate(null, def);
Assert.False(result.IsValid); Assert.False(result.IsValid);
Assert.Contains("Missing required parameter", result.ErrorMessage); Assert.Contains("Missing required parameter", result.ErrorMessage);
} }
@@ -38,136 +62,379 @@ public class ParameterValidatorTests
[Fact] [Fact]
public void BodyNotObject_ReturnsInvalid() public void BodyNotObject_ReturnsInvalid()
{ {
var definitions = JsonSerializer.Serialize(new[] const string def = """{"type":"object","properties":{"value":{"type":"string"}},"required":["value"]}""";
{
new { Name = "value", Type = "String", Required = true }
});
using var doc = JsonDocument.Parse("\"just a string\""); var result = ParameterValidator.Validate(Body("\"just a string\""), def);
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.False(result.IsValid); Assert.False(result.IsValid);
Assert.Contains("must be a JSON object", result.ErrorMessage); Assert.Contains("must be a JSON object", result.ErrorMessage);
} }
[Theory]
[InlineData("Boolean", "true", true)]
[InlineData("Integer", "42", (long)42)]
[InlineData("Float", "3.14", 3.14)]
[InlineData("String", "\"hello\"", "hello")]
public void ValidTypeCoercion_Succeeds(string type, string jsonValue, object expected)
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "val", Type = type, Required = true }
});
using var doc = JsonDocument.Parse($"{{\"val\": {jsonValue}}}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.Equal(expected, result.Parameters["val"]);
}
[Fact]
public void WrongType_ReturnsInvalid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "count", Type = "Integer", Required = true }
});
using var doc = JsonDocument.Parse("{\"count\": \"not a number\"}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.False(result.IsValid);
Assert.Contains("must be an Integer", result.ErrorMessage);
}
[Fact]
public void ObjectType_Parsed()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "data", Type = "Object", Required = true }
});
using var doc = JsonDocument.Parse("{\"data\": {\"key\": \"value\"}}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.IsType<Dictionary<string, object?>>(result.Parameters["data"]);
}
[Fact]
public void ListType_Parsed()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "items", Type = "List", Required = true }
});
using var doc = JsonDocument.Parse("{\"items\": [1, 2, 3]}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.IsType<List<object?>>(result.Parameters["items"]);
}
[Fact] [Fact]
public void OptionalParameter_MissingBody_ReturnsValid() public void OptionalParameter_MissingBody_ReturnsValid()
{ {
var definitions = JsonSerializer.Serialize(new[] const string def = """{"type":"object","properties":{"optional":{"type":"string"}}}""";
{
new { Name = "optional", Type = "String", Required = false }
});
var result = ParameterValidator.Validate(null, definitions); var result = ParameterValidator.Validate(null, def);
Assert.True(result.IsValid); Assert.True(result.IsValid);
} }
// ── Scalar coercion ────────────────────────────────────────────────────────
[Theory]
[InlineData("boolean", "true", true)]
[InlineData("integer", "42", (long)42)]
[InlineData("number", "3.14", 3.14)]
[InlineData("string", "\"hello\"", "hello")]
public void ValidTypeCoercion_Succeeds(string type, string jsonValue, object expected)
{
var def = "{\"type\":\"object\",\"properties\":{\"val\":{\"type\":\"" + type + "\"}},\"required\":[\"val\"]}";
var result = ParameterValidator.Validate(Body($"{{\"val\": {jsonValue}}}"), def);
Assert.True(result.IsValid);
Assert.Equal(expected, result.Parameters["val"]);
}
[Fact]
public void WrongScalarType_ReturnsInvalid()
{
const string def = """{"type":"object","properties":{"count":{"type":"integer"}},"required":["count"]}""";
var result = ParameterValidator.Validate(Body("{\"count\": \"not a number\"}"), def);
Assert.False(result.IsValid);
Assert.Contains("'count'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact] [Fact]
public void UnknownType_ReturnsInvalid() public void UnknownType_ReturnsInvalid()
{ {
var definitions = JsonSerializer.Serialize(new[] const string def = """{"type":"object","properties":{"val":{"type":"customtype"}},"required":["val"]}""";
{
new { Name = "val", Type = "CustomType", Required = true }
});
using var doc = JsonDocument.Parse("{\"val\": \"test\"}"); var result = ParameterValidator.Validate(Body("{\"val\": \"test\"}"), def);
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.False(result.IsValid); Assert.False(result.IsValid);
Assert.Contains("Unknown parameter type", result.ErrorMessage); Assert.Contains("unknown declared type", result.ErrorMessage);
} }
// --- InboundAPI-010: unexpected top-level body fields must be reported so // ── Object / List shape + materialization ──────────────────────────────────
// callers get feedback on typo'd parameter names instead of silent ignore. ---
[Fact] [Fact]
public void UnexpectedBodyField_ReturnsInvalid() public void ObjectType_NoDeclaredFields_ShapeOnly_Materialized()
{ {
var definitions = JsonSerializer.Serialize(new[] const string def = """{"type":"object","properties":{"data":{"type":"object"}},"required":["data"]}""";
{
new { Name = "value", Type = "Integer", Required = true } var result = ParameterValidator.Validate(Body("{\"data\": {\"key\": \"value\"}}"), def);
}); Assert.True(result.IsValid);
Assert.IsType<Dictionary<string, object?>>(result.Parameters["data"]);
}
[Fact]
public void ListType_NoDeclaredElement_ShapeOnly_Materialized()
{
const string def = """{"type":"object","properties":{"items":{"type":"array"}},"required":["items"]}""";
var result = ParameterValidator.Validate(Body("{\"items\": [1, 2, 3]}"), def);
Assert.True(result.IsValid);
Assert.IsType<List<object?>>(result.Parameters["items"]);
}
// ── Undeclared / unexpected fields (rejected, recursively) ─────────────────
[Fact]
public void UnexpectedTopLevelField_ReturnsInvalid()
{
const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
// "valeu" is a typo for "value"; the caller must be told, not ignored. // "valeu" is a typo for "value"; the caller must be told, not ignored.
using var doc = JsonDocument.Parse("{\"value\": 1, \"valeu\": 2}"); var result = ParameterValidator.Validate(Body("{\"value\": 1, \"valeu\": 2}"), def);
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.False(result.IsValid); Assert.False(result.IsValid);
Assert.Contains("valeu", result.ErrorMessage); Assert.Contains("valeu", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
} }
[Fact] [Fact]
public void OnlyDefinedFields_StillValid() public void OnlyDeclaredFields_StillValid()
{ {
// Regression guard: a body containing exactly the defined parameters const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
// must continue to validate.
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "value", Type = "Integer", Required = true }
});
using var doc = JsonDocument.Parse("{\"value\": 1}"); var result = ParameterValidator.Validate(Body("{\"value\": 1}"), def);
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid); Assert.True(result.IsValid);
Assert.Equal((long)1, result.Parameters["value"]); Assert.Equal((long)1, result.Parameters["value"]);
} }
[Fact]
public void UndeclaredNestedField_ReturnsInvalid_PathQualified()
{
const string def = """
{"type":"object","properties":{
"order":{"type":"object","properties":{"id":{"type":"integer"}},"required":["id"]}
},"required":["order"]}
""";
var result = ParameterValidator.Validate(
Body("""{"order":{"id":1,"bogus":2}}"""), def);
Assert.False(result.IsValid);
Assert.Contains("order.bogus", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
}
// ── Nested validation: the M2.6 core ───────────────────────────────────────
private const string NestedDef = """
{
"type":"object",
"properties":{
"order":{
"type":"object",
"properties":{
"id":{"type":"integer"},
"customer":{
"type":"object",
"properties":{"name":{"type":"string"},"vip":{"type":"boolean"}},
"required":["name"]
},
"items":{
"type":"array",
"items":{
"type":"object",
"properties":{"sku":{"type":"string"},"quantity":{"type":"integer"}},
"required":["sku","quantity"]
}
}
},
"required":["id","customer","items"]
}
},
"required":["order"]
}
""";
[Fact]
public void ValidNestedPayload_Passes()
{
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":true},
"items":[
{"sku":"A1","quantity":3},
{"sku":"B2","quantity":1}
]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void WrongScalarTwoLevelsDeep_ReturnsInvalid_WithExactPath()
{
// order.customer.vip declared boolean, given a string.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":"yes"},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.customer.vip'", result.ErrorMessage);
Assert.Contains("Boolean", result.ErrorMessage);
}
[Fact]
public void WrongScalarInsideListElement_ReturnsInvalid_WithElementIndexInPath()
{
// order.items[1].quantity declared integer, given a string.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme"},
"items":[
{"sku":"A1","quantity":3},
{"sku":"B2","quantity":"lots"}
]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.items[1].quantity'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact]
public void ListElementWrongShape_ReturnsInvalid_WithElementIndexInPath()
{
// order.items[0] declared object, given a scalar.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme"},
"items":[ 42 ]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.items[0]'", result.ErrorMessage);
Assert.Contains("Object", result.ErrorMessage);
}
[Fact]
public void MissingRequiredNestedField_ReturnsInvalid_PathQualified()
{
// order.customer.name is required but absent.
const string body = """
{"order":{
"id":7,
"customer":{"vip":false},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("missing required field", result.ErrorMessage);
Assert.Contains("'order.customer.name'", result.ErrorMessage);
}
// ── Empty / null edge cases ────────────────────────────────────────────────
[Fact]
public void EmptyList_AgainstTypedElement_Passes()
{
const string body = """
{"order":{"id":7,"customer":{"name":"Acme"},"items":[]}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void NullForOptionalNestedScalar_Passes()
{
// order.customer.vip is optional; explicit null is accepted.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":null},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void NullForRequiredNestedScalar_Passes()
{
// A PRESENT-but-null required field satisfies the type — only ABSENCE
// of a required field is an error (consistent with return-side policy).
const string body = """
{"order":{
"id":null,
"customer":{"name":"Acme"},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
// ── Legacy flat-array backward-compat ──────────────────────────────────────
[Fact]
public void LegacyFlatArrayDefinition_StillAccepted()
{
const string def = """[{"name":"count","type":"Integer","required":true}]""";
var ok = ParameterValidator.Validate(Body("{\"count\":5}"), def);
Assert.True(ok.IsValid);
Assert.Equal((long)5, ok.Parameters["count"]);
var bad = ParameterValidator.Validate(Body("{\"count\":\"nope\"}"), def);
Assert.False(bad.IsValid);
Assert.Contains("'count'", bad.ErrorMessage);
}
// FIX 1: legacy "required":"false" string → field is optional ─────────────
[Theory]
[InlineData("""[{"name":"opt","type":"String","required":"false"}]""")]
[InlineData("""[{"name":"opt","type":"String","required":"False"}]""")]
[InlineData("""[{"name":"opt","type":"String","required":"FALSE"}]""")]
public void LegacyFlatArray_RequiredStringFalse_FieldIsOptional(string def)
{
// An absent field whose "required" is the string "false" (any case)
// must be treated as optional — consistent with the SQL migration's
// LOWER(...) <> 'false' comparison that produced these rows.
var result = ParameterValidator.Validate(null, def);
Assert.True(result.IsValid, $"Expected optional field to be valid when absent; error: {result.ErrorMessage}");
}
[Fact]
public void LegacyFlatArray_RequiredStringFalse_FieldPresentAndTypedCorrectly_Passes()
{
const string def = """[{"name":"opt","type":"String","required":"false"}]""";
var result = ParameterValidator.Validate(Body("{\"opt\":\"hello\"}"), def);
Assert.True(result.IsValid);
}
// FIX 2: recursion depth guard on Parse ───────────────────────────────────
/// <summary>
/// Builds a JSON Schema string with <paramref name="depth"/> levels of nested
/// object-in-properties nesting. Each level wraps the previous in an object
/// with a single property "a". The result exceeds the Parse ceiling when
/// depth &gt; 32.
/// </summary>
private static string BuildDeeplyNestedSchema(int depth)
{
// Inner-most: a scalar
var schema = "{\"type\":\"string\"}";
for (var i = 0; i < depth; i++)
{
schema = "{\"type\":\"object\",\"properties\":{\"a\":" + schema + "}}";
}
return schema;
}
[Fact]
public void SchemaAtDepthCeiling_ParsesSuccessfully()
{
// Exactly 32 levels of nesting should parse without throwing.
var def = BuildDeeplyNestedSchema(32);
var schema = InboundApiSchema.Parse(def);
Assert.NotNull(schema);
}
[Fact]
public void SchemaExceedingDepthCeiling_ThrowsJsonException_NotStackOverflow()
{
// 33 levels exceeds the ceiling → JsonException (clean 400 via the
// caller's try/catch), NOT a StackOverflowException.
var def = BuildDeeplyNestedSchema(33);
Assert.Throws<System.Text.Json.JsonException>(() => InboundApiSchema.Parse(def));
}
[Fact]
public void SchemaExceedingDepthCeiling_ParameterValidator_ReturnsInvalid()
{
// End-to-end: ParameterValidator wraps Parse in try/catch(JsonException)
// → the caller gets Invalid rather than an unhandled exception.
var def = BuildDeeplyNestedSchema(33);
var result = ParameterValidator.Validate(Body("{\"a\":\"x\"}"), def);
Assert.False(result.IsValid);
Assert.Contains("Invalid parameter definitions", result.ErrorMessage);
}
} }
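The depth-ceiling tests above (32 levels parse, 33 levels throw `JsonException`) can be illustrated with a minimal sketch. This is not the `InboundApiSchema.Parse` implementation, which is not shown in this diff. `DepthGuardSketch`, `MeasureDepth`, and `Walk` are hypothetical names, and counting depth from 0 at the root is an assumption chosen so that the 32/33 boundary matches the tests.

```csharp
using System;
using System.Text.Json;

// Hypothetical sketch of a recursion depth guard over a JSON Schema string.
// Assumption: depth counts schema nesting levels ("properties" / "items"),
// with the root schema at depth 0 and a ceiling of 32.
public static class DepthGuardSketch
{
    private const int MaxDepth = 32;

    // Returns the deepest schema level reached; throws JsonException past the ceiling.
    public static int MeasureDepth(string schemaJson)
    {
        // Each schema level adds two JSON levels (the schema object plus its
        // "properties" object), so raise the reader limit above its default
        // of 64 to let this guard fire first.
        var options = new JsonDocumentOptions { MaxDepth = 256 };
        using var doc = JsonDocument.Parse(schemaJson, options);
        return Walk(doc.RootElement, 0);
    }

    private static int Walk(JsonElement node, int depth)
    {
        if (depth > MaxDepth)
        {
            // A JsonException lets the caller map the failure to a clean
            // validation error instead of risking a stack overflow.
            throw new JsonException($"Schema nesting too deep (limit {MaxDepth}).");
        }

        var deepest = depth;
        if (node.ValueKind != JsonValueKind.Object)
        {
            return deepest;
        }

        // Recurse into each declared property of an object schema.
        if (node.TryGetProperty("properties", out var props)
            && props.ValueKind == JsonValueKind.Object)
        {
            foreach (var property in props.EnumerateObject())
            {
                deepest = Math.Max(deepest, Walk(property.Value, depth + 1));
            }
        }

        // Recurse into the element schema of an array schema.
        if (node.TryGetProperty("items", out var items)
            && items.ValueKind == JsonValueKind.Object)
        {
            deepest = Math.Max(deepest, Walk(items, depth + 1));
        }

        return deepest;
    }
}
```

Raising `JsonDocumentOptions.MaxDepth` matters in this sketch because a schema built like `BuildDeeplyNestedSchema(32)` is already 65 JSON levels deep, which is past the reader's default limit of 64.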
@@ -1,13 +1,21 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests; namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;
/// <summary> /// <summary>
/// InboundAPI-014: tests for return-value validation against a method's /// InboundAPI-014 / InboundAPI-M2.6: tests for return-value validation against a
/// <c>ReturnDefinition</c>. Previously the script's return value was serialized /// method's <c>ReturnDefinition</c>. Mirrors <see cref="ParameterValidatorTests"/>
/// verbatim with no checking against the declared return structure. /// (shared recursive engine) — RECURSIVE nested Object / List-element type
/// validation with path-qualified errors.
///
/// <para>
/// Definitions are expressed as JSON Schema (the canonical persisted format);
/// the legacy flat-array form is still accepted (final region).
/// </para>
/// </summary> /// </summary>
public class ReturnValueValidatorTests public class ReturnValueValidatorTests
{ {
// --- No definition → no validation (backward compatible) --- // ── No definition → no validation (backward compatible) ───────────────────
[Theory] [Theory]
[InlineData(null)] [InlineData(null)]
@@ -26,12 +34,17 @@ public class ReturnValueValidatorTests
Assert.True(result.IsValid); Assert.True(result.IsValid);
} }
// --- Happy path: result matches the declared field shape --- // ── Happy path: result matches the declared object shape ──────────────────
[Fact] [Fact]
public void ResultMatchingDefinition_IsValid() public void ResultMatchingDefinition_IsValid()
{ {
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]"""; const string def = """
{"type":"object","properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"}
},"required":["siteName","totalUnits"]}
""";
const string json = """{"siteName":"Site Alpha","totalUnits":14250}"""; const string json = """{"siteName":"Site Alpha","totalUnits":14250}""";
var result = ReturnValueValidator.Validate(json, def); var result = ReturnValueValidator.Validate(json, def);
@@ -40,22 +53,31 @@ public class ReturnValueValidatorTests
} }
[Fact] [Fact]
public void ResultWithListField_ShapeChecked_IsValid() public void ResultWithListOfScalars_TypeChecked_IsValid()
{ {
const string def = """[{"name":"lines","type":"List"}]"""; const string def = """
const string json = """{"lines":[{"lineName":"Line-1","units":8200}]}"""; {"type":"object","properties":{
"codes":{"type":"array","items":{"type":"integer"}}
}}
""";
const string json = """{"codes":[1,2,3]}""";
var result = ReturnValueValidator.Validate(json, def); var result = ReturnValueValidator.Validate(json, def);
Assert.True(result.IsValid); Assert.True(result.IsValid);
} }
// --- Mismatches must be reported --- // ── Scalar / shape mismatches must be reported ────────────────────────────
[Fact] [Fact]
public void ResultMissingDeclaredField_IsInvalid() public void ResultMissingDeclaredField_IsInvalid()
{ {
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]"""; const string def = """
{"type":"object","properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"}
},"required":["siteName","totalUnits"]}
""";
const string json = """{"siteName":"Site Alpha"}"""; const string json = """{"siteName":"Site Alpha"}""";
var result = ReturnValueValidator.Validate(json, def); var result = ReturnValueValidator.Validate(json, def);
@@ -67,7 +89,7 @@ public class ReturnValueValidatorTests
[Fact] [Fact]
public void ResultFieldWrongType_IsInvalid() public void ResultFieldWrongType_IsInvalid()
{ {
const string def = """[{"name":"totalUnits","type":"Integer"}]"""; const string def = """{"type":"object","properties":{"totalUnits":{"type":"integer"}},"required":["totalUnits"]}""";
const string json = """{"totalUnits":"not-a-number"}"""; const string json = """{"totalUnits":"not-a-number"}""";
var result = ReturnValueValidator.Validate(json, def); var result = ReturnValueValidator.Validate(json, def);
@@ -79,7 +101,7 @@ public class ReturnValueValidatorTests
[Fact] [Fact]
public void NullResultWhenStructureRequired_IsInvalid() public void NullResultWhenStructureRequired_IsInvalid()
{ {
const string def = """[{"name":"siteName","type":"String"}]"""; const string def = """{"type":"object","properties":{"siteName":{"type":"string"}},"required":["siteName"]}""";
var result = ReturnValueValidator.Validate(null, def); var result = ReturnValueValidator.Validate(null, def);
@@ -89,7 +111,7 @@ public class ReturnValueValidatorTests
[Fact] [Fact]
public void NonObjectResultWhenStructureRequired_IsInvalid() public void NonObjectResultWhenStructureRequired_IsInvalid()
{ {
const string def = """[{"name":"siteName","type":"String"}]"""; const string def = """{"type":"object","properties":{"siteName":{"type":"string"}},"required":["siteName"]}""";
var result = ReturnValueValidator.Validate("42", def); var result = ReturnValueValidator.Validate("42", def);
@@ -99,7 +121,7 @@ public class ReturnValueValidatorTests
[Fact] [Fact]
public void ListFieldGivenNonArray_IsInvalid() public void ListFieldGivenNonArray_IsInvalid()
{ {
const string def = """[{"name":"lines","type":"List"}]"""; const string def = """{"type":"object","properties":{"lines":{"type":"array","items":{"type":"object"}}}}""";
const string json = """{"lines":"not-a-list"}"""; const string json = """{"lines":"not-a-list"}""";
var result = ReturnValueValidator.Validate(json, def); var result = ReturnValueValidator.Validate(json, def);
@@ -115,4 +137,261 @@ public class ReturnValueValidatorTests
Assert.False(result.IsValid); Assert.False(result.IsValid);
} }
// ── Nested validation: the M2.6 core (production-report shape) ─────────────
private const string ReportDef = """
{
"type":"object",
"properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"},
"lines":{
"type":"array",
"items":{
"type":"object",
"properties":{
"lineName":{"type":"string"},
"units":{"type":"integer"},
"efficiency":{"type":"number"}
},
"required":["lineName","units"]
}
}
},
"required":["siteName","totalUnits","lines"]
}
""";
[Fact]
public void ValidNestedReturn_Passes()
{
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[
{"lineName":"Line-1","units":8200,"efficiency":92.5},
{"lineName":"Line-2","units":6050,"efficiency":88.1}
]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
[Fact]
public void WrongScalarInsideListElement_IsInvalid_WithElementIndexInPath()
{
// lines[1].units declared integer, given a string.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[
{"lineName":"Line-1","units":8200},
{"lineName":"Line-2","units":"lots"}
]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[1].units'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact]
public void WrongListElementType_IsInvalid_WithElementIndexInPath()
{
// lines[0] declared object, given a scalar.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ 7 ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[0]'", result.ErrorMessage);
Assert.Contains("Object", result.ErrorMessage);
}
[Fact]
public void MissingRequiredNestedField_IsInvalid_PathQualified()
{
// lines[0].units is required but absent.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ {"lineName":"Line-1"} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("missing required field", result.ErrorMessage);
Assert.Contains("'lines[0].units'", result.ErrorMessage);
}
[Fact]
public void UndeclaredNestedField_IsInvalid_PathQualified()
{
// lines[0].bogus is not declared on the line-item schema.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ {"lineName":"Line-1","units":1,"bogus":true} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[0].bogus'", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
}
// ── Empty / null edge cases ────────────────────────────────────────────────
[Fact]
public void EmptyListAgainstTypedElement_Passes()
{
const string json = """{"siteName":"S","totalUnits":0,"lines":[]}""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
[Fact]
public void EmptyObjectSchema_AnythingIsValid()
{
const string def = """{"type":"object","properties":{}}""";
var result = ReturnValueValidator.Validate("""{"whatever":1}""", def);
Assert.True(result.IsValid);
}
[Fact]
public void NullOptionalNestedScalar_Passes()
{
// lines[0].efficiency is optional; explicit null is accepted.
const string json = """
{
"siteName":"S",
"totalUnits":1,
"lines":[ {"lineName":"L1","units":1,"efficiency":null} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
// ── Legacy flat-array backward-compat ──────────────────────────────────────
[Fact]
public void LegacyFlatArrayDefinition_StillAccepted()
{
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]""";
var ok = ReturnValueValidator.Validate("""{"siteName":"A","totalUnits":1}""", def);
Assert.True(ok.IsValid);
var bad = ReturnValueValidator.Validate("""{"siteName":"A","totalUnits":"x"}""", def);
Assert.False(bad.IsValid);
Assert.Contains("totalUnits", bad.ErrorMessage);
}
// FIX 3: scalar return schema validates scalar return values ──────────────
// (Guards the intentional ParameterValidator/ReturnValueValidator asymmetry:
// ReturnValueValidator must NOT short-circuit on non-object schema types.)
[Fact]
public void ScalarStringReturnSchema_ValidatesScalarStringReturn()
{
// A {"type":"string"} return schema must accept a bare JSON string.
var result = ReturnValueValidator.Validate("\"hello\"", """{"type":"string"}""");
Assert.True(result.IsValid);
}
[Fact]
public void ScalarIntegerReturnSchema_ValidatesScalarIntegerReturn()
{
var result = ReturnValueValidator.Validate("42", """{"type":"integer"}""");
Assert.True(result.IsValid);
}
[Fact]
public void ScalarStringReturnSchema_RejectsIntegerReturn()
{
var result = ReturnValueValidator.Validate("42", """{"type":"string"}""");
Assert.False(result.IsValid);
Assert.Contains("String", result.ErrorMessage);
}
[Fact]
public void ScalarBooleanReturnSchema_ValidatesBooleanReturn()
{
var result = ReturnValueValidator.Validate("true", """{"type":"boolean"}""");
Assert.True(result.IsValid);
}
// FIX 2: recursion depth guard on Validate ─────────────────────────────────
[Fact]
public void ValidateExceedingDepthCeiling_AddsDepthError_DoesNotThrow()
{
// Build a schema programmatically (bypassing Parse) with 34 levels of
// nesting to exceed the ceiling of 32. Validate must add an error and
// return, NOT stack overflow.
//
// Parse prevents creating a >32-level schema from stored JSON, but
// InboundApiSchema is a public type constructable in code, so Validate
// must guard independently.
var deepSchema = BuildProgrammaticSchema(34);
var json = BuildDeeplyNestedValue(34);
using var doc = System.Text.Json.JsonDocument.Parse(json);
var errors = new System.Collections.Generic.List<string>();
// Must not throw — adds a depth error to the list instead.
deepSchema.Validate(doc.RootElement, string.Empty, errors);
Assert.NotEmpty(errors);
Assert.Contains("nesting too deep", errors[0], StringComparison.OrdinalIgnoreCase);
}
/// <summary>
/// Constructs an <see cref="InboundApiSchema"/> with <paramref name="depth"/>
/// levels of object-nesting programmatically (bypassing <c>Parse</c>) to
/// exercise the Validate depth ceiling independently of the Parse ceiling.
/// </summary>
private static InboundApiSchema BuildProgrammaticSchema(int depth)
{
InboundApiSchema inner = new() { Type = "string" };
for (var i = 0; i < depth; i++)
{
inner = new InboundApiSchema
{
Type = "object",
Fields = [new InboundApiSchemaField("a", Required: false, inner)],
};
}
return inner;
}
private static string BuildDeeplyNestedValue(int depth)
{
var value = "\"leaf\"";
for (var i = 0; i < depth; i++)
{
value = "{\"a\":" + value + "}";
}
return value;
}
} }
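The path-qualified errors asserted in these tests (for example `'lines[1].units'`) suggest a recursive walk that extends a path string as it descends. The sketch below shows one way such a walk could build those paths. It is not the shared `InboundApiSchema` engine: `PathSketch`, `Check`, `Walk`, and `Join` are hypothetical names, the message wording is only modeled on the assertions above, and undeclared-field checks are left out.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch: recursive validation that reports path-qualified errors.
public static class PathSketch
{
    public static List<string> Check(string schemaJson, string valueJson)
    {
        using var schema = JsonDocument.Parse(schemaJson);
        using var value = JsonDocument.Parse(valueJson);
        var errors = new List<string>();
        Walk(schema.RootElement, value.RootElement, string.Empty, errors);
        return errors;
    }

    private static void Walk(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // Assumption mirrored from the tests: a present-but-null value passes.
        if (value.ValueKind == JsonValueKind.Null) return;

        switch (schema.GetProperty("type").GetString())
        {
            case "object":
                if (value.ValueKind != JsonValueKind.Object)
                {
                    errors.Add($"'{path}' must be Object");
                    return;
                }
                // Only absence of a required field is an error.
                if (schema.TryGetProperty("required", out var required))
                {
                    foreach (var entry in required.EnumerateArray())
                    {
                        var name = entry.GetString() ?? string.Empty;
                        if (!value.TryGetProperty(name, out _))
                            errors.Add($"missing required field '{Join(path, name)}'");
                    }
                }
                // Descend into each declared property, extending the path with ".name".
                if (schema.TryGetProperty("properties", out var props))
                {
                    foreach (var prop in props.EnumerateObject())
                    {
                        if (value.TryGetProperty(prop.Name, out var child))
                            Walk(prop.Value, child, Join(path, prop.Name), errors);
                    }
                }
                break;

            case "array":
                if (value.ValueKind != JsonValueKind.Array)
                {
                    errors.Add($"'{path}' must be List");
                    return;
                }
                // Descend into each element, extending the path with "[index]".
                if (schema.TryGetProperty("items", out var items))
                {
                    var index = 0;
                    foreach (var element in value.EnumerateArray())
                        Walk(items, element, $"{path}[{index++}]", errors);
                }
                break;

            case "integer":
                if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
                    errors.Add($"'{path}' must be Integer");
                break;

            case "string":
                if (value.ValueKind != JsonValueKind.String)
                    errors.Add($"'{path}' must be String");
                break;

            case "boolean":
                if (value.ValueKind != JsonValueKind.True && value.ValueKind != JsonValueKind.False)
                    errors.Add($"'{path}' must be Boolean");
                break;
        }
    }

    // The root has an empty path, so the first segment gets no leading dot.
    private static string Join(string path, string name) =>
        path.Length == 0 ? name : $"{path}.{name}";
}
```

Passing the path down as a parameter, rather than building it after the fact, keeps each error self-describing even when several elements of the same list fail.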

Some files were not shown because too many files have changed in this diff