Merge M2: stillpending.md Tier-2 correctness & behavioral gaps (#7,#8,#9,#10,#13,#15,#17,#18,#20,#21,#22,#23,#24,#25,#26,#27,#28,#29,#30,#31,#32)

20 tasks (M2.0-M2.19), each through its classification-driven review chain.
Full-solution build green (0 warnings, TreatWarningsAsErrors). Per-task targeted
suites all passed. Known pre-existing: 2 partition-purge E2E failures (follow-up #52).
Joseph Doherty
2026-06-16 08:27:59 -04:00
110 changed files with 13232 additions and 495 deletions
+13 -13
@@ -36,28 +36,28 @@ public class ApiMethod
public int Id { get; set; }
public string Name { get; set; } // route segment
public string Script { get; set; } // Roslyn C# script body
public string? ParameterDefinitions { get; set; } // JSON: List<ParameterDefinition>
public string? ReturnDefinition { get; set; } // JSON: List<ReturnFieldDefinition>
public string? ParameterDefinitions { get; set; } // JSON Schema (object) describing parameters
public string? ReturnDefinition { get; set; } // JSON Schema describing the return value
public int TimeoutSeconds { get; set; }
}
```
`ParameterDefinitions` and `ReturnDefinition` are stored as JSON strings to keep the schema simple; both are deserialized on every request by `ParameterValidator` and `ReturnValueValidator`.
`ParameterDefinitions` and `ReturnDefinition` are stored as JSON Schema strings (canonical form: `{"type":"object","properties":{…},"required":[…]}`, arrays via `"items"`); both are parsed on every request by `ParameterValidator` and `ReturnValueValidator` into a shared recursive `InboundApiSchema` (Commons). The legacy flat-array form (`[{name,type,required,itemType?}]`) is still accepted on read.
### Extended type system
Parameter and return field definitions share the same six-type vocabulary:
Parameter and return definitions share the same six-type vocabulary (JSON Schema type tokens in parentheses):
| Type | JSON shape | C# value after coercion |
|-----------|----------------------|-------------------------------------|
| `Boolean` | `true` / `false` | `bool` |
| `Integer` | number (whole) | `long` |
| `Float` | number | `double` |
| `String` | string | `string` |
| `Object` | JSON object | `Dictionary<string, object?>` |
| `List` | JSON array | `List<object?>` |
| Type | JSON Schema token | JSON shape | C# value after coercion |
|-----------|-------------------|------------------|-------------------------------|
| `Boolean` | `boolean` | `true` / `false` | `bool` |
| `Integer` | `integer` | number (whole) | `long` |
| `Float` | `number` | number | `double` |
| `String` | `string` | string | `string` |
| `Object` | `object` | JSON object | `Dictionary<string, object?>` |
| `List` | `array` | JSON array | `List<object?>` |
`Object` and `List` are validated for JSON shape only — field-level or element-level type constraints are the script's responsibility. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
`Object` and `List` are validated **recursively**: a declared object validates each field against its declared (nested) type and rejects undeclared fields; a list validates every element against the declared `items` type. Scalars are checked at any depth and errors are path-qualified (e.g. `order.items[2].quantity`). A bare `{"type":"object"}` / `{"type":"array"}` (no `properties` / `items`) stays shape-only. Template attributes use only the four primitive types; the extended types apply here and in the External System Gateway.
## Architecture
@@ -0,0 +1,203 @@
# M2 — Correctness & Behavioral Gaps (Tier 2) Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: superpowers-extended-cc:subagent-driven-development. Execute task-by-task on branch `feature/stillpending-m2`, in-place (NOT a worktree — docker tooling builds from the repo path; implementers run **serially** to avoid racing the shared git index). Honor each task's `Classification` for the review chain.
**Goal:** Close the Tier-2 correctness/behavioral divergences from `stillpending.md` — make narrow/inert behaviors match the spec, and where the spec was the divergence, update it in the same slice.
**Architecture:** Touches the central Config DB (EF migrations), Site Runtime actors, the DCL alarm pipeline, the template validation/flattening pipeline, the deployment diff, Host startup validation, the Security cookie-auth pipeline, and Site Event Logging. Each item is independently shippable.
**Tech Stack:** C#/.NET 10, EF Core 10 (MS SQL central + SQLite site), Akka.NET 1.5, OPC UA (`OPCFoundation.NetStandard.Opc.Ua.Client`), ASP.NET Core cookie auth, xUnit/FluentAssertions/NSubstitute.
**Build/verify:** `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (TreatWarningsAsErrors ON). Redeploy: `bash docker/deploy.sh`. Test user `--username multi-role --password password`.
---
## Scope decisions (recorded; per "use recommendations")
- **#19 (script started/completed events)** — already shipped in M1.8 (`e74c3ae`). **Excluded.**
- **#16 (Transport stale-instance enumeration)** — genuine Tier-2 gap but NOT in the approved M2 list, and the fix needs a non-trivial shared-script-hash staleness compute across instances. **Deferred to the Transport milestone (M8).** Tracked, not dropped.
- **#17 (MachineDataDb)** — a deliberate prior decision ("Host-008") removed this validation with a regression test asserting absence *passes*. The approved design doc says to add the option + startup validation, and both REQ-HOST-3/4 and the shipped docker `appsettings.Central.json` carry the key. **Resolution: implement per design doc (add option + central startup validation, no DbContext since nothing consumes it), reverting the Host-008 regression test and noting the reversal in the commit.**
- **#31 (StateTransitionValidator delete-from-NotDeployed)** — the audit claimed a "deliberate per code comment"; investigation found **no such comment**. **Reconcile by intent (git blame); default = align code to the spec matrix (remove `NotDeployed` from `CanDelete`) unless blame shows deliberate orphan-cleanup intent, in which case update the doc matrix.**
- **#8 (conditionFilter) semantics** — the filter is currently an undefined nullable string. **Define it as a comma-separated, case-insensitive list of alarm/condition *type names*; null/blank = mirror all.** Authoritative enforcement is **client-side in `DataConnectionActor` routing** (uniform across OPC UA + MxGateway, since MxGateway has no server-side filter); OPC UA additionally gets a server-side `WhereClause` as a bandwidth optimization where the type maps cleanly. Implementer confirms the discriminator field on `NativeAlarmTransition`.
- **#15 (LDAP re-query)** — highest risk; passwordless group re-query depends on a shared-lib capability that may not exist. **Spike first**, then ship the always-achievable layers (idle-timeout enforcement + DB role-mapping refresh on stored group claims) and the LDAP group re-query only if the lib supports a service-account search; document any residual limitation.
---
## Execution order & dependencies
Risk-first, migration-safe ordering. `#32` first (unblocks DB-backed verification). The two migration-touching tasks (`#32`, M2.5) are serialized so the snapshot stays clean.
| # | Task | Class | Migration? |
|---|------|-------|-----------|
| M2.0 | #32 EF model/snapshot drift | high-risk | snapshot only |
| M2.1 | #22 native-alarm capability validation wired into deploy pipeline | standard | no |
| M2.2 | #10 connection-level diff surfaced | standard | no |
| M2.3 | #7 `Database.CachedWrite` transient/permanent classification | high-risk | no |
| M2.4 | #8 alarm `conditionFilter` applied | high-risk | no |
| M2.5 | #9 per-script execution timeout | standard | **yes** (new column) |
| M2.6 | #13 nested `Object`/`List` validation | standard | no |
| M2.7 | #20 + #21 return-type + argument-type compatibility | standard | no |
| M2.8 | #23 binding-completeness Error + "name exists at site" | standard | no |
| M2.9 | #17 MachineDataDb fail-fast | small | no |
| M2.10 | #18 CI grep-guard (data-layer scan test) | small | no |
| M2.11 | #24 debug snapshot unknown-instance → error | small | no |
| M2.12 | #25 recursion-limit → site event log | small | no |
| M2.13 | #27 OPC UA / MxGateway transition field population | small | no |
| M2.14 | #28 readiness "required singletons running" probe | standard | no |
| M2.15 | #29 site active-node purge-gate DI registration | small | no |
| M2.16 | #30 `FailedWriteCount` consumed by Health Monitoring | small | no |
| M2.17 | #31 StateTransitionValidator reconcile | small | no |
| M2.18 | #26 debug-stream ordering + replay/dedup | high-risk | no |
| M2.19 | #15 LDAP periodic re-query (spike + impl) | high-risk | no |
---
## Tasks
### M2.0 — #32: EF model/snapshot drift (PendingModelChangesWarning)
**Classification:** high-risk · **Files:** `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Configurations/AuditLogEntityTypeConfiguration.cs:68-69`, `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Migrations/ScadaBridgeDbContextModelSnapshot.cs`, possibly a new empty migration.
**Root cause:** `OccurredAtUtc` has `.HasConversion(UtcConverter)` in config; the model snapshot omits the converter annotation → EF throws `PendingModelChangesWarning` in `MsSqlMigrationFixture.MigrateAsync` (~57 AuditLog MSSQL tests fail in fixture ctor).
**Fix:** Run `dotnet ef migrations has-pending-model-changes` (or `migrations add`) against `ScadaBridgeDbContext` to surface the FULL drift (there may be more than `OccurredAtUtc`). Prefer the EF-canonical path: run `dotnet ef migrations add ResyncAuditLogModelSnapshot`, then **verify the generated migration's `Up`/`Down` are empty (no DDL)**; a value-converter-only change produces no DDL but realigns the snapshot. If non-empty/unexpected DDL appears, stop and report. Auto-apply is dev-only per CLAUDE.md, so an empty migration is harmless to prod.
**Tests:** Re-run `dotnet test tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests` (requires MSSQL via `cd infra && docker compose up -d`); the ~57 fixture-ctor failures must clear. If MSSQL is unreachable in this environment, confirm the build + the snapshot diff is empty-DDL and note the test gating.
**DoD:** No `PendingModelChangesWarning`; AuditLog MSSQL suite green (or gated-with-note if no DB). Adversarial: confirm no real schema change was smuggled in.
### M2.1 — #22: native-alarm-source capability validation wired into deploy pipeline
**Classification:** standard · **Files:** `src/.../DeploymentManager/FlatteningPipeline.cs:93,115`, `SemanticValidator.cs:30-33,239-245`, validation service call site.
**Gap (M1-era regression):** `FlatteningPipeline` loads `dataConnections` but never extracts the alarm-capable subset, so `SemanticValidator.Validate(...)` is always called with `alarmCapableConnectionNames = null` → native-alarm-source capability check never runs; a source can reference a non-alarm-capable connection and deploy.
**Fix:** In `FlatteningPipeline`, compute the alarm-capable connection-name set from the loaded connections (filter by the protocol/capability that maps to `IAlarmSubscribableConnection` — OPC UA + MxGateway), pass it into the validator. Confirm the capability predicate (protocol enum / adapter capability) is the same one DCL uses to decide `IAlarmSubscribableConnection`.
**Tests:** `tests/.../TemplateEngine.Tests` SemanticValidator/flattening — add: native-alarm source on a non-alarm-capable connection → validation Error; on a capable one → passes.
**DoD:** Deploy gate rejects native-alarm sources bound to non-capable connections.
### M2.2 — #10: connection-level diff surfaced in deployment diff
**Classification:** standard · **Files:** `src/.../Commons/Types/Flattening/ConfigurationDiff.cs:7-24`, `src/.../TemplateEngine/Flattening/DiffService.cs:19-54,174-204`, Central UI diff render (`CentralUI/Components/Shared/DiffDialog.razor` caller / deployment preview page).
**Gap:** `ComputeConnectionsDiff` exists **with tests** but is dead (no callers); `ConfigurationDiff` has no `ConnectionChanges` slot; `HasChanges` ignores connections.
**Fix:** Add `ConnectionChanges` slot (`IReadOnlyList<DiffEntry<ConnectionConfig>>` — the element type already exists) to `ConfigurationDiff`; include it in `HasChanges`. Call `ComputeConnectionsDiff` from `ComputeDiff` and populate the slot. Surface in the deployment-diff UI alongside attribute/alarm/script changes (connection name + old/new protocol + endpoint config). Wire the existing `ComputeConnectionsDiff` tests' expectations through `ComputeDiff` too.
**Tests:** `tests/.../TemplateEngine.Tests/Flattening/DiffServiceTests.cs` — add a `ComputeDiff` integration assertion that `ConnectionChanges` populates and `HasChanges` is true when only a connection differs.
**DoD:** Standalone connection endpoint/protocol/failover drift appears in the deployment diff.
### M2.3 — #7: `Database.CachedWrite` classifies transient vs permanent SQL errors
**Classification:** high-risk · **Files:** `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:78-204`, new `SqlErrorClassifier.cs` + `PermanentDatabaseException`, reference `ExternalSystemClient.cs:80-162` + `ErrorClassifier.cs`.
**Gap:** `CachedWriteAsync` buffers ALL writes without an immediate attempt; `DeliverBufferedAsync` throws on any `SqlException` → S&F retries permanent errors forever; the script never gets a synchronous `Failed`. The API path (`ExternalSystemClient`) does it right.
**Fix (mirror API path):** Add `SqlErrorClassifier.IsTransient(SqlException)` — transient = connection/timeout/deadlock/throttle error numbers (e.g. `-2, 64, 53, 233, 1205, 40197, 40501, 40613, 49918-49920`); permanent = constraint/syntax/permission/etc. Create `PermanentDatabaseException` (parallel to `PermanentExternalSystemException`). In `CachedWriteAsync`: attempt immediately; on success done; on permanent → return `Failed` synchronously (set the tracking row terminal `Failed`) and do NOT buffer; on transient → buffer to S&F. In `DeliverBufferedAsync`: classify on `SqlException`, return `false` (park) for permanent, rethrow for transient (S&F retries). Keep behavior unified with `TrackedOperationId`/`OperationTrackingStore` and the `Pending → Retrying → Delivered/Parked/Failed/Discarded` lifecycle.
**Tests:** `tests/.../ExternalSystemGateway.Tests/DatabaseGatewayTests.cs` — transient SQL (deadlock 1205, timeout -2) → buffers/retries; permanent SQL (constraint 2627, syntax 102, permission 229) → synchronous `Failed`, not buffered; `DeliverBuffered` parks on permanent. Adversarial: ambiguous error numbers default to the safer classification (document which).
**DoD:** Permanent SQL errors fail fast to the script as `Failed`; only transient errors buffer.
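The classification rule above can be sketched as follows. This is a minimal illustration, not the shipped `SqlErrorClassifier`: it classifies by raw error number so it runs without a SqlClient reference, the names `SqlErrorClassifierSketch`, `CachedWriteSketch`, and `WriteOutcome` are hypothetical, and treating unlisted numbers as permanent is an assumption of the sketch rather than a decision recorded in the plan.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: the plan's classifier takes a SqlException; this one
// takes the raw SQL Server error number so the example is self-contained.
public static class SqlErrorClassifierSketch
{
    // Transient numbers copied from the plan's example list
    // (connection, timeout, deadlock, throttle).
    private static readonly HashSet<int> Transient = new()
    {
        -2, 53, 64, 233, 1205, 40197, 40501, 40613, 49918, 49919, 49920
    };

    // Assumption: any number not listed is treated as permanent.
    public static bool IsTransient(int errorNumber) => Transient.Contains(errorNumber);
}

public enum WriteOutcome { Delivered, Failed, Buffered }

public static class CachedWriteSketch
{
    // errorNumber == null models a successful immediate attempt.
    public static WriteOutcome Decide(int? errorNumber)
    {
        if (errorNumber is null) return WriteOutcome.Delivered;

        // Transient errors go to store-and-forward; permanent errors fail
        // synchronously and are never buffered.
        return SqlErrorClassifierSketch.IsTransient(errorNumber.Value)
            ? WriteOutcome.Buffered
            : WriteOutcome.Failed;
    }
}
```

Keeping the decision in one pure function would let `CachedWriteAsync` and `DeliverBufferedAsync` share a single rule, which is the kind of unification the fix asks for.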
### M2.4 — #8: alarm `conditionFilter` applied (OPC UA WhereClause + client-side routing)
**Classification:** high-risk · **Files:** `src/.../DataConnectionLayer/Actors/DataConnectionActor.cs:1482,1540-1554`, `Adapters/RealOpcUaClient.cs:242,295-310`, `Adapters/MxGatewayDataConnection.cs:154-167`, `IAlarmSubscribableConnection.cs`.
**Decision (semantics):** filter = comma-separated, case-insensitive list of alarm/condition **type names**; null/blank = mirror all. **Authoritative gate = client-side in `DataConnectionActor.HandleAlarmTransitionReceived`** (after source-ref match, drop transitions whose type name isn't in the source's filter set). Store the per-source filter set correctly (the current `_alarmSourceFilter[...]` keying is wrong — key by source reference). OPC UA additionally builds a server-side `WhereClause` in `RealOpcUaClient` as an optimization where the condition type maps cleanly; MxGateway relies solely on the client-side gate.
**Fix:** (1) Parse the filter string into a normalized set at subscribe time, keyed by source ref. (2) In routing, consult the set and skip non-matching transitions. (3) In `RealOpcUaClient.BuildAlarmEventFilter`, attach a `WhereClause` (ContentFilter on the condition/event type) built from the filter when present. Confirm `NativeAlarmTransition` exposes a usable type-name discriminator; if not, filter on the available field and note it.
**Tests:** `tests/.../DataConnectionLayer.Tests/DataConnectionActorAlarmTests.cs` — filter set → only matching-type transitions delivered; null → all delivered; MxGateway path filters client-side; OPC UA builds a non-empty WhereClause. Adversarial: case/whitespace variations in the filter list.
**DoD:** Setting a conditionFilter actually restricts mirrored conditions across both adapters.
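A rough sketch of the parse-and-gate step described above, assuming the filter is compared against a plain type-name string. `ConditionFilterSketch` and both method names are hypothetical; the real discriminator field on `NativeAlarmTransition` is still to be confirmed by the implementer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ConditionFilterSketch
{
    // Returns null for "mirror all" (null, blank, or only separators);
    // otherwise a case-insensitive set of trimmed type names.
    public static HashSet<string>? Parse(string? filter)
    {
        if (string.IsNullOrWhiteSpace(filter)) return null;

        var names = filter.Split(
            ',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        return names.Length == 0
            ? null
            : new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    // Client-side gate: a null set delivers everything; otherwise the
    // transition's type name must be in the set.
    public static bool ShouldDeliver(HashSet<string>? filterSet, string? typeName) =>
        filterSet is null
        || (typeName is not null && filterSet.Contains(typeName.Trim()));
}
```

Parsing once at subscribe time and storing the resulting set per source reference keeps the routing check to a single set lookup per transition.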
### M2.5 — #9: per-script execution timeout
**Classification:** standard · **Migration: yes.** · **Files:** `Commons/Entities/Templates/TemplateScript.cs`, `ConfigurationDatabase/Configurations/TemplateConfiguration.cs` (`TemplateScriptConfiguration`), **new EF migration**, `Commons/Types/Flattening/FlattenedConfiguration.cs` (`ResolvedScript`), `TemplateEngine/Flattening/FlatteningService.cs` (`ResolveInheritedScripts`), `SiteRuntime/Actors/ScriptActor.cs`, `ScriptExecutionActor.cs:100`, `AlarmExecutionActor.cs:66`, `SiteRuntimeOptions.cs:31` (global fallback unchanged).
**Gap:** Only a global `ScriptExecutionTimeoutSeconds`; no per-script field. Mirror the existing nullable `MinTimeBetweenRuns` pattern end-to-end.
**Fix:** Add `int? ExecutionTimeoutSeconds` to `TemplateScript` + EF config (nullable) + **migration** (runs after M2.0 so the snapshot is clean) + `ResolvedScript` + flattening map + `ScriptActor` field; pass it into `ScriptExecutionActor`/`AlarmExecutionActor`, which compute `effective = perScript ?? options.ScriptExecutionTimeoutSeconds`. Validate non-negative.
**Tests:** flattening test (field threads through), actor test (per-script override vs global default both enforce the CTS timeout), EF round-trip test.
**DoD:** A per-script timeout overrides the global; absent → global default.
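The fallback rule (`perScript ?? global`) is small enough to show in full. The helper name `ScriptTimeoutSketch.Effective` is hypothetical, and rejecting negative values at this point (rather than at save time) is an assumption of the sketch.

```csharp
using System;

public static class ScriptTimeoutSketch
{
    // perScriptSeconds mirrors the nullable ExecutionTimeoutSeconds field;
    // globalSeconds mirrors options.ScriptExecutionTimeoutSeconds.
    public static TimeSpan Effective(int? perScriptSeconds, int globalSeconds)
    {
        // The plan asks for non-negative validation; null skips this check.
        if (perScriptSeconds is < 0)
            throw new ArgumentOutOfRangeException(nameof(perScriptSeconds));

        return TimeSpan.FromSeconds(perScriptSeconds ?? globalSeconds);
    }
}
```

The resulting `TimeSpan` could be handed to `new CancellationTokenSource(timeout)` in the execution actors, so the override and the default go through the same cancellation path.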
### M2.6 — #13: nested `Object`/`List` extended-type validation
**Classification:** standard · **Files:** `src/.../InboundAPI/.../ParameterValidator.cs:109-145`, `ReturnValueValidator.cs:18`.
**Gap:** `Object`/`List` are shape-validated only (object-vs-array); no nested/field-level type validation.
**Fix:** Recursive descent through the declared `Object` field schema / `List` element type, type-checking each level (scalars by extended-type, nested Object/List recursively). Reuse the existing extended-type system; keep error messages path-qualified (`field.sub[2].x`). Apply symmetrically in both validators.
**Tests:** `tests/.../InboundAPI.Tests` — valid nested payload passes; wrong scalar type at depth, wrong list element type, missing required nested field → rejected with path.
**DoD:** Nested type mismatches are caught at inbound validation, not at script runtime. (Satisfies the M4 cross-reference to this item.)
**Status: complete.** A shared recursive engine, `Commons.Types.InboundApi.InboundApiSchema` (parse + path-qualified `Validate`), backs both validators so they cannot drift. Key finding: the canonical persisted/authored format is **JSON Schema** (object `properties` + `required`, array `items`) — produced by the Central UI schema builder and the `MigrateParametersToJsonSchema` migration — but the validators still parsed the *legacy flat array* `[{name,type}]` and only shape-checked `Object`/`List`. They could not even consume a migrated JSON-Schema-object definition (the `Deserialize<List<…>>` would fail). Rewriting both to read `InboundApiSchema` fixes that latent format mismatch *and* delivers true nested validation; the legacy flat array is still accepted on read (case-insensitive keys) for transition safety. **Undeclared-field policy: reject at every level** (a declared object rejects any field not in its `properties`, consistent with the existing top-level `InboundAPI-010` "unexpected parameter" rejection); a bare `{"type":"object"}` with no declared fields stays shape-only. A present-but-null value satisfies any type; only the *absence* of a required field is an error.
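To make the recursive, path-qualified behaviour concrete, here is a minimal sketch over `System.Text.Json`. It is not the real `InboundApiSchema` API: `NestedSchemaSketch` and its members are hypothetical, only the canonical JSON Schema form is handled (no legacy flat array), and the error message wording is invented for illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch of recursive validation with path-qualified errors.
public static class NestedSchemaSketch
{
    public static List<string> Validate(JsonElement schema, JsonElement value)
    {
        var errors = new List<string>();
        Walk(schema, value, "", errors);
        return errors;
    }

    private static void Walk(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // A present-but-null value satisfies any declared type.
        if (value.ValueKind == JsonValueKind.Null) return;

        var type = schema.TryGetProperty("type", out var t) ? t.GetString() : null;
        var ok = type switch
        {
            "boolean" => value.ValueKind is JsonValueKind.True or JsonValueKind.False,
            "integer" => value.ValueKind == JsonValueKind.Number && value.TryGetInt64(out _),
            "number" => value.ValueKind == JsonValueKind.Number,
            "string" => value.ValueKind == JsonValueKind.String,
            "object" => value.ValueKind == JsonValueKind.Object,
            "array" => value.ValueKind == JsonValueKind.Array,
            _ => true // unknown or missing type token: not checked in this sketch
        };
        if (!ok) { errors.Add($"{path}: expected {type}"); return; }

        // Bare object/array schemas (no properties/items) stay shape-only.
        if (type == "object" && schema.TryGetProperty("properties", out var props))
        {
            WalkObject(schema, props, value, path, errors);
        }
        else if (type == "array" && schema.TryGetProperty("items", out var items))
        {
            var i = 0;
            foreach (var element in value.EnumerateArray())
                Walk(items, element, $"{path}[{i++}]", errors);
        }
    }

    private static void WalkObject(JsonElement schema, JsonElement props, JsonElement value,
        string path, List<string> errors)
    {
        // Only the absence of a required field is an error.
        if (schema.TryGetProperty("required", out var required))
            foreach (var name in required.EnumerateArray())
                if (name.GetString() is { } n && !value.TryGetProperty(n, out _))
                    errors.Add($"{Join(path, n)}: required field missing");

        foreach (var field in value.EnumerateObject())
        {
            if (props.TryGetProperty(field.Name, out var fieldSchema))
                Walk(fieldSchema, field.Value, Join(path, field.Name), errors);
            else
                errors.Add($"{Join(path, field.Name)}: undeclared field"); // reject at every level
        }
    }

    private static string Join(string path, string name) =>
        path.Length == 0 ? name : $"{path}.{name}";
}
```

Because both validators would call the same `Validate` entry point, a rule change (for example the undeclared-field policy) lands in one place.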
### M2.7 — #20 + #21: return-type + argument-type compatibility checks
**Classification:** standard · **Files:** `src/.../TemplateEngine/Validation/SemanticValidator.cs:62-63,251-266,279-287,390-425`.
**Gap:** `BuildReturnMap` builds maps never read (no return-type comparison); call validation checks arg *count* only (comma counting), not arg *types*.
**Fix:** #20 — compare a call site's used-return against the target script's declared `ReturnDefinition`; flag incompatible use. #21 — extract/infer argument types at the call site and check each against the parameter definition (count + type). These share `SemanticValidator` — implement together. Be conservative: only flag clear mismatches (avoid false positives on dynamically-typed expressions); document the inference limits.
**Tests:** `tests/.../TemplateEngine.Tests` SemanticValidator — return-type mismatch flagged; arg type mismatch flagged; correct calls pass; dynamic/unknown types don't false-positive.
**DoD:** Type-incompatible script calls fail validation, not just count-mismatched ones.
### M2.8 — #23: connection-binding completeness as deploy-gating Error + "name exists at site"
**Classification:** standard · **Files:** `src/.../TemplateEngine/Validation/ValidationService.cs:504-519`, `ValidationResult.cs:9`.
**Gap:** Missing-binding for a data-sourced attribute is a non-blocking Warning (so `IsValid` stays true); the "connection name exists at the target site" half is missing.
**Fix:** Elevate binding-completeness to Error (or add a parallel Error-level check) so a deployment with unresolved bindings fails the gate; add the "binding references a connection that exists on the target site" check (resolve by site connection, not just name presence). Confirm this doesn't break legitimately-unbound attributes (static/non-data-sourced) — only data-sourced attributes require a binding.
**Tests:** `tests/.../TemplateEngine.Tests` ValidationService — data-sourced attribute with no binding → Error + `IsValid` false; binding to a non-existent site connection → Error; static attribute without binding → OK.
**DoD:** Incomplete/invalid connection bindings block deploy.
### M2.9 — #17: MachineDataDb fail-fast (per design doc; reverts Host-008)
**Classification:** small · **Files:** `src/ZB.MOM.WW.ScadaBridge.Host/DatabaseOptions.cs:6-12`, `StartupValidator.cs:59-62`, `tests/.../Host.Tests/StartupValidatorTests.cs` (the `Central_MissingMachineDataDb_PassesValidation` regression).
**Fix:** Add `string? MachineDataDb` to `DatabaseOptions`; add a Central-only `Require("ScadaBridge:Database:MachineDataDb", non-empty, ...)` in `StartupValidator`. **No DbContext** (nothing consumes it). Revert the Host-008 regression test to expect failure when missing; add `MachineDataDb` to `ValidCentralConfig()`. Commit message must note the deliberate Host-008 reversal and cite REQ-HOST-3/4 + shipped docker appsettings as justification.
**Tests:** `StartupValidatorTests` — Central missing MachineDataDb → fails; present → passes; Site role unaffected.
**DoD:** Central nodes fail fast on empty MachineDataDb; spec REQ-HOST-4 satisfied.
### M2.10 — #18: CI grep-guard against UPDATE/DELETE on AuditLog
**Classification:** small · **Files:** new guard test in `tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/` (the only thing that actually runs — no CI service exists; build is Docker-only).
**Fix:** Add a test that scans the ConfigurationDatabase source tree (and migration SQL) for `UPDATE`/`DELETE` statements targeting `AuditLog`, failing if any are found in C# data-access code. Scope strictly to the `AuditLog` table (allow purge/delete on Notifications/SiteCalls and partition-switch DDL). This backstops the existing DB-role `DENY UPDATE/DELETE` (migration `20260602174346`). Optionally add an MSBuild target mirroring it, but the test is the enforced control.
**Tests:** the guard test itself; verify it passes on current clean source and would fail on a planted violation (assert via a unit test on the scanner helper).
**DoD:** A code-level guard fails the test run on AuditLog mutations.
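One possible shape for the scanner helper is a single regular expression applied to each source file's text. This is a sketch under assumptions: the table is taken to be named `AuditLog` (optionally `dbo`-qualified and bracketed), `AuditLogMutationScannerSketch` is a hypothetical name, and a text match cannot see SQL assembled dynamically at runtime.

```csharp
using System.Text.RegularExpressions;

public static class AuditLogMutationScannerSketch
{
    // Matches "UPDATE AuditLog", "DELETE FROM AuditLog", "DELETE AuditLog",
    // with optional [dbo]. qualification and brackets. The negative lookahead
    // keeps longer identifiers such as AuditLogArchive from matching.
    private static readonly Regex Mutation = new(
        @"\b(UPDATE|DELETE(\s+FROM)?)\s+(\[?dbo\]?\.)?\[?AuditLog\]?(?![A-Za-z0-9_])",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool ContainsAuditLogMutation(string source) =>
        Mutation.IsMatch(source);
}
```

A guard test could enumerate the `.cs` and migration files under the ConfigurationDatabase tree, call `ContainsAuditLogMutation` on each, and fail with the offending file names. Statements such as `DENY UPDATE, DELETE ON AuditLog` do not match, since the table name does not directly follow the verb.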
### M2.11 — #24: debug snapshot/subscribe for unknown instance returns an error
**Classification:** small · **Files:** `src/.../DeploymentManager/.../DeploymentManagerActor.cs:845-866`.
**Gap:** Unknown-instance snapshot/subscribe returns an empty snapshot — caller can't distinguish "not deployed" from "deployed-but-empty".
**Fix:** Check instance registration first; return an explicit "instance not found"/not-deployed error response (matching the existing debug response contract) instead of an empty snapshot.
**Tests:** `tests/.../DeploymentManager` (or SiteRuntime) — unknown instance → error response; known empty instance → empty snapshot (unchanged).
**DoD:** Unknown-instance debug requests are distinguishable from empty ones.
### M2.12 — #25: recursion-limit error → site event log
**Classification:** small · **Files:** `src/.../SiteRuntime/.../ScriptRuntimeContext.cs:302-305,464-466` (thread `ISiteEventLogger` in, mirroring M1.8's `ScriptExecutionActor` wiring).
**Fix:** Inject `ISiteEventLogger` into `ScriptRuntimeContext`; on recursion-limit violation, emit a `script` Error site event (fire-and-forget `_ = logger?.LogEventAsync(...)`) in addition to the existing `ILogger` log, at both check sites.
**Tests:** `tests/.../SiteRuntime.Tests` — recursion-limit hit emits a site event with category `script`, severity Error.
**DoD:** Recursion-limit violations appear in the site event log per spec.
### M2.13 — #27: populate obtainable OPC UA / MxGateway transition fields
**Classification:** small · **Files:** `src/.../DataConnectionLayer/Adapters/RealOpcUaClient.cs:395-403`, `MxGatewayAlarmMapper.cs:79-113`.
**Fix:** Populate fields that are genuinely obtainable: for OPC UA A&C, add SelectClauses + map Category, Description, OriginalRaiseTime where the server exposes them (extend `BuildAlarmEventFilter`'s SelectClauses); for MxGateway, extract `OperatorUser` (present in the event but dropped) and any available Current/Limit values. Leave truly-unavailable fields empty and document which are unavailable-by-protocol vs left-empty.
**Tests:** `tests/.../DataConnectionLayer.Tests` mapper tests — obtainable fields populate from a representative event; unavailable fields documented.
**DoD:** Display fields populate where the source provides them.
### M2.14 — #28: readiness gate checks required cluster singletons
**Classification:** standard · **Files:** `src/.../Host/Program.cs:188-201,314-317`, new health check (peer to `AkkaClusterHealthCheck.cs`).
**Gap:** Readiness covers membership + DB connectivity only; spec wants "required singletons running".
**Fix:** Add a `Ready`-tagged health check that, on the active central node, verifies each required singleton proxy is reachable (e.g. `NotificationOutboxActor`, `AuditLogIngestActor`, `SiteCallAuditActor`, `AuditLogPurgeActor`, `SiteAuditReconciliationActor`) via a short `Ask`/Identify with timeout; degrade to Unhealthy if a required singleton is unreachable. Respect the "(if applicable)" softening — only gate on singletons that should be running for this node's role. Keep the probe cheap (cache/identify, short timeout) so readiness polling stays fast.
**Tests:** `tests/.../Host.Tests` or IntegrationTests — health check reports Unhealthy when a required singleton proxy is absent; Healthy when present. Avoid flakiness (use Identify with a bounded timeout).
**DoD:** `/health/ready` reflects singleton health.
### M2.15 — #29: register the site active-node purge gate
**Classification:** small · **Files:** `src/.../SiteEventLogging/ServiceCollectionExtensions.cs:33-37`, site service registration / cluster setup.
**Gap:** `SiteEventLogActiveNodeCheck` is consulted by `EventLogPurgeService` but no implementation is registered on the site node → purge runs on standby too (defaults to `() => true`).
**Fix:** Register a `SiteEventLogActiveNodeCheck` delegate on the site node that returns true only when this node is the cluster leader/active (mirror how central gates active-node work). Keep the null-default behavior for non-clustered test hosts.
**Tests:** `tests/.../SiteEventLogging.Tests` — purge gated off on standby, on for active; default-true preserved when unregistered.
**DoD:** Site event-log purge runs only on the active node.
### M2.16 — #30: Health Monitoring consumes `FailedWriteCount`
**Classification:** small · **Files:** `src/.../SiteEventLogging/ISiteEventLogger.cs:32-40`, Health Monitoring metric path.
**Fix:** Wire `FailedWriteCount` into the site health metrics the same way other site metrics are collected/reported (find the existing site metric collection path), so the dangling metric is consumed (surface as a health metric / threshold). Keep it raw-count per the health-reporting conventions.
**Tests:** `tests/.../HealthMonitoring`/SiteEventLogging — failed writes increment the reported metric.
**DoD:** `FailedWriteCount` reaches Health Monitoring.
### M2.17 — #31: reconcile StateTransitionValidator delete-from-NotDeployed
**Classification:** small · **Files:** `src/.../DeploymentManager/.../StateTransitionValidator.cs:38-41`, possibly `docs/requirements/Component-DeploymentManager.md` (spec matrix).
**Fix:** `git blame`/log the `CanDelete` line to recover intent. Default: **align code to the spec matrix** — remove `NotDeployed` from the allowed delete states, add a clarifying comment — UNLESS history shows deliberate orphan-cleanup intent, in which case update the spec matrix (Delete from NotDeployed = Yes, with a no-op-cleanup note) instead. Whichever direction, code and doc must agree at the end.
**Tests:** `tests/.../DeploymentManager` StateTransitionValidator — the chosen rule is asserted.
**DoD:** Code and spec matrix agree on delete-from-NotDeployed.
### M2.18 — #26: debug-stream stream-first ordering + replay/dedup
**Classification:** high-risk · **Files:** `src/.../DebugStreamBridgeActor.cs:89-103,163-166`.
**Gap:** `PreStart` sends the snapshot first, then opens the gRPC stream → events in the gap window are lost. Spec wants stream-first + replay with timestamp dedup.
**Fix:** Open the gRPC subscription FIRST (buffer incoming events), then fetch+send the snapshot, then flush buffered events, deduping by timestamp/identity against the snapshot so no gap-window event is lost or double-delivered. Preserve ordering. This is a re-arch of the actor's PreStart lifecycle — keep the existing message contract.
**Tests:** `tests/.../` DebugStreamBridgeActor — an event arriving during the snapshot window is delivered exactly once after the snapshot; ordering preserved; dedup drops the snapshot-overlapping event.
**DoD:** No gap-window events lost; no duplicates.
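The stream-first ordering can be sketched independently of Akka and gRPC as a small buffer with two entry points. Everything here is illustrative: `DebugEvent`, `StreamFirstBufferSketch`, and the dedup rule (drop a buffered event when the snapshot already holds the same key with an equal or newer timestamp) are assumptions, not the actor's actual contract.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical event shape: Key identifies the attribute/alarm, Timestamp orders it.
public sealed record DebugEvent(string Key, DateTimeOffset Timestamp, string Payload);

public sealed class StreamFirstBufferSketch
{
    private readonly List<DebugEvent> _buffer = new();
    private readonly Action<DebugEvent> _deliver;
    private bool _snapshotSent;

    public StreamFirstBufferSketch(Action<DebugEvent> deliver) => _deliver = deliver;

    // Called for every event from the stream, which is opened BEFORE the snapshot.
    public void OnStreamEvent(DebugEvent e)
    {
        if (_snapshotSent) _deliver(e);
        else _buffer.Add(e); // gap window: hold until the snapshot has gone out
    }

    // Called once the snapshot has been fetched and sent downstream.
    // snapshotTimestamps maps each key to the timestamp the snapshot reflects.
    public void OnSnapshotSent(IReadOnlyDictionary<string, DateTimeOffset> snapshotTimestamps)
    {
        _snapshotSent = true;
        foreach (var e in _buffer)
        {
            // Drop events the snapshot already covers; deliver the rest in arrival order.
            if (snapshotTimestamps.TryGetValue(e.Key, out var seen) && e.Timestamp <= seen)
                continue;
            _deliver(e);
        }
        _buffer.Clear();
    }
}
```

No locking is shown because an actor processes one message at a time; outside an actor the two methods would need synchronization.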
### M2.19 — #15: LDAP periodic re-query for interactive sessions (SECURITY)
**Classification:** high-risk · **Files:** `src/.../Security/ServiceCollectionExtensions.cs:86-148` (cookie events), `JwtTokenService.cs` (wire the unused `IsIdleTimedOut`/`ShouldRefresh`/`RecordActivity`/`RefreshToken`), `RoleMapper.cs`, LDAP service interface, `CentralUI/Auth/AuthEndpoints.cs` (claims-build parity).
**Spike first:** Determine whether the shared `ZB.MOM.WW.Auth.Ldap` lib exposes a **passwordless service-account group search** for an already-authenticated username. Report the answer before building the LDAP leg.
**Fix (layered):**
1. **Always achievable** — add `CookieAuthenticationEvents.OnValidatePrincipal` that: enforces idle-timeout (reject/sign-out past 30-min idle, advance last-activity on use), and refreshes role claims by **re-running `RoleMapper` on the stored group claims** (picks up central role-mapping changes without LDAP). Stamp a `LastLdapCheck` claim.
2. **If the lib supports passwordless group search** — when `LastLdapCheck` is >15 min old, re-query LDAP groups via the service-account search, re-map roles, update role/site claims. **On LDAP failure: keep existing roles, do NOT sign out** (per "LDAP failure: new logins fail; active sessions continue with current roles"). If the lib does NOT support it, ship layer 1 and document the residual limitation (group-membership changes picked up only at next login) in the security doc.
Rebuild claims identically to `/auth/login` (same claim types). Use the cookie-only model; the embedded-JWT model is dispositioned as documentation-only in M4.
**Tests (incl. adversarial):** idle-timeout enforced; role-mapping change reflected without LDAP; LDAP-down on re-query keeps existing roles (no sign-out); >15-min triggers re-query, <15-min skips (TTL respected); a revoked-group user loses roles after re-query (if LDAP leg shipped).
**DoD:** Interactive sessions enforce idle-timeout and refresh roles per the documented policy; any residual LDAP-dependency limitation is documented.
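The layer-1 decision (idle check first, then the refresh TTL) can be expressed as a pure function, which keeps the adversarial tests cheap. This is a sketch only: `SessionDecision` and `SessionPolicy.Evaluate` are hypothetical names, and the thresholds are passed in rather than read from any real options type.

```csharp
using System;

public enum SessionDecision { Reject, RefreshRoles, Keep }

public static class SessionPolicy
{
    // Pure decision: no cookie or LDAP types involved, so it is easy to unit test.
    public static SessionDecision Evaluate(
        DateTimeOffset now,
        DateTimeOffset lastActivity,
        DateTimeOffset lastRoleRefresh,
        TimeSpan idleTimeout,
        TimeSpan refreshThreshold)
    {
        // Idle check comes first: an idle session is rejected even if a refresh is due.
        if (now - lastActivity > idleTimeout)
            return SessionDecision.Reject;

        // Refresh only once the anchor is older than the threshold (TTL respected).
        if (now - lastRoleRefresh > refreshThreshold)
            return SessionDecision.RefreshRoles;

        return SessionDecision.Keep;
    }
}
```

A caller would map `Reject` to rejecting the principal and signing out, `RefreshRoles` to re-running the role mapping and re-issuing the cookie, and `Keep` to only advancing the last-activity anchor.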
---
## Cross-cutting
- `dotnet build ZB.MOM.WW.ScadaBridge.slnx` green (TreatWarningsAsErrors); relevant unit/integration tests pass per task.
- MSSQL-backed tests need `cd infra && docker compose up -d`; if unavailable, gate-with-note (M2.0 especially).
- Migration tasks (M2.0, M2.5) serialized; M2.0 first.
- `git diff` review before each commit; design-summary commit messages; one logical slice per commit.
- After all tasks: final integration code review, build, and a `bash docker/deploy.sh` smoke test (`curl localhost:9000/health/ready`).
@@ -0,0 +1,35 @@
{
"planPath": "docs/plans/2026-06-15-stillpending-m2-implementation.md",
"tasks": [
{"id": 32, "ref": "M2.0", "subject": "M2.0 #32: EF model/snapshot drift (PendingModelChangesWarning)", "class": "high-risk", "status": "completed", "commits": ["2fb608f"]},
{"id": 33, "ref": "M2.1", "subject": "M2.1 #22: native-alarm capability validation wired into deploy pipeline", "class": "standard", "status": "completed", "commits": ["d690920", "41d828e"]},
{"id": 34, "ref": "M2.2", "subject": "M2.2 #10: connection-level diff surfaced in deployment diff", "class": "standard", "status": "completed", "commits": ["e9a84ba", "198770f"]},
{"id": 35, "ref": "M2.3", "subject": "M2.3 #7: Database.CachedWrite transient/permanent SQL classification", "class": "high-risk", "status": "completed", "commits": ["d052706", "de375ff"]},
{"id": 36, "ref": "M2.4", "subject": "M2.4 #8: alarm conditionFilter applied (OPC UA WhereClause + client routing)", "class": "high-risk", "status": "completed", "commits": ["8825df5", "00304a2"]},
{"id": 37, "ref": "M2.5", "subject": "M2.5 #9: per-script execution timeout (entity+migration+flatten+actor)", "class": "standard", "status": "completed", "blockedBy": [32], "commits": ["3edef09", "3032faa"]},
{"id": 38, "ref": "M2.6", "subject": "M2.6 #13: nested Object/List extended-type validation", "class": "standard", "status": "completed", "commits": ["4b6187c", "411d0c0"]},
{"id": 39, "ref": "M2.7", "subject": "M2.7 #20+#21: return-type + argument-type compatibility checks", "class": "standard", "status": "completed", "commits": ["958229e", "a8e9e99"]},
{"id": 40, "ref": "M2.8", "subject": "M2.8 #23: binding-completeness Error + name-exists-at-site", "class": "standard", "status": "completed", "commits": ["7c14a69", "21b801b"]},
{"id": 41, "ref": "M2.9", "subject": "M2.9 #17: MachineDataDb fail-fast (reverts Host-008)", "class": "small", "status": "completed", "commits": ["76198b3"]},
{"id": 42, "ref": "M2.10", "subject": "M2.10 #18: CI grep-guard against UPDATE/DELETE on AuditLog", "class": "small", "status": "completed", "commits": ["e7b6fe3", "9cd62aa"]},
{"id": 43, "ref": "M2.11", "subject": "M2.11 #24: debug snapshot unknown-instance returns error", "class": "small", "status": "completed", "commits": ["dbf44b9", "d160c7f"]},
{"id": 44, "ref": "M2.12", "subject": "M2.12 #25: recursion-limit error to site event log", "class": "small", "status": "completed", "commits": ["f08038d", "e2b31a9"]},
{"id": 45, "ref": "M2.13", "subject": "M2.13 #27: populate obtainable OPC UA/MxGateway transition fields", "class": "small", "status": "completed", "commits": ["722b866", "3945789"]},
{"id": 46, "ref": "M2.14", "subject": "M2.14 #28: readiness gate checks required cluster singletons", "class": "standard", "status": "completed", "commits": ["253bec5", "6b1cb9e"]},
{"id": 47, "ref": "M2.15", "subject": "M2.15 #29: register site active-node purge gate (DI)", "class": "small", "status": "completed", "commits": ["e1ee37e"]},
{"id": 48, "ref": "M2.16", "subject": "M2.16 #30: Health Monitoring consumes FailedWriteCount", "class": "small", "status": "completed", "commits": ["d81f747", "c9244d8"]},
{"id": 49, "ref": "M2.17", "subject": "M2.17 #31: reconcile StateTransitionValidator delete-from-NotDeployed", "class": "small", "status": "completed", "commits": ["c104356"]},
{"id": 50, "ref": "M2.18", "subject": "M2.18 #26: debug-stream stream-first ordering + replay/dedup", "class": "high-risk", "status": "completed", "commits": ["d8519cb", "a0d9379"]},
{"id": 51, "ref": "M2.19", "subject": "M2.19 #15: LDAP periodic re-query for interactive sessions (spike+impl)", "class": "high-risk", "status": "completed", "note": "Spike outcome: shared ILdapAuthService exposes only AuthenticateAsync (no passwordless group-search) -> live LDAP group re-query out of scope (external pkg, tracked follow-up). Implemented always-achievable layers: stored zb:group + zb:lastrolerefresh claims at login, shared SessionClaimBuilder (DRY login+refresh), CookieSessionValidator + OnValidatePrincipal (idle-timeout reject@30m, DB-only role-mapping refresh@15m, fail-soft keep-session on refresh error). Residual limitation documented in Component-Security.md.", "commits": ["8fe7f46", "fddc695"]}
],
"deferred": [
{"ref": "#16", "subject": "Transport stale-instance enumeration", "to": "M8 (Transport)"},
{"ref": "#19", "subject": "script started/completed events", "status": "done in M1.8"}
],
"followups": [
{"id": 52, "subject": "Investigate 2 partition-purge E2E test failures (AuditLogPurgeActor/PartitionPurge)", "from": "M2.0", "status": "pending"},
{"id": 53, "subject": "Dedup alarm-capable protocol predicate (3 copies → AlarmCapableProtocols)", "from": "M2.1", "status": "pending"},
{"id": 54, "subject": "Expose ExecutionTimeoutSeconds (+ MinTimeBetweenRuns) in CLI + UI script authoring", "from": "M2.5", "status": "pending"}
],
"lastUpdated": "2026-06-15"
}
@@ -84,7 +84,14 @@ All mutating operations on a single instance (deploy, disable, enable, delete) s
|---------------|--------|---------|--------|--------|
| Enabled | Yes | Yes | No (already enabled) | Yes |
| Disabled | Yes (enables on apply) | No (already disabled) | Yes | Yes |
| Not deployed | Yes (initial deploy) | No | No | No |
| Not deployed | Yes (initial deploy) | No | No | Yes (removes the orphan record) |
> **Delete from Not deployed:** permitted so an instance that was previously
> undeployed (state `NotDeployed`) can have its record fully removed —
> deployment history, snapshot, attribute/alarm overrides, and connection
> bindings — rather than lingering as an unremovable orphan. There is no live
> site configuration to tear down in this state, so the delete is a
> central-side record cleanup (no site round-trip required).
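The updated matrix can be sketched as a single switch expression. This is an illustration, not the real `StateTransitionValidator`: the enum and method names are hypothetical, and the column order (Deploy, Disable, Enable, Delete) is assumed from the surrounding text.

```csharp
using System;

public enum InstanceState { Enabled, Disabled, NotDeployed }
public enum InstanceOperation { Deploy, Disable, Enable, Delete }

public static class TransitionMatrix
{
    // Mirrors the table: Deploy and Delete are allowed from every state,
    // Disable only from Enabled, Enable only from Disabled.
    public static bool IsAllowed(InstanceState state, InstanceOperation op) => (state, op) switch
    {
        (_, InstanceOperation.Deploy) => true,
        (_, InstanceOperation.Delete) => true, // now includes NotDeployed
        (InstanceState.Enabled, InstanceOperation.Disable) => true,
        (InstanceState.Disabled, InstanceOperation.Enable) => true,
        _ => false,
    };
}
```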
## System-Wide Artifact Deployment Failure Handling
@@ -95,6 +95,8 @@ On central nodes, the ASP.NET Core web endpoints (Central UI, Inbound API) must
- Database connectivity (MS SQL) is verified.
- Required cluster singletons are running (if applicable).
These are implemented as three `Ready`-tagged health checks registered in the Central-role branch of `Program.cs` (so they are naturally role-scoped — site nodes do not run them): `database` (`DatabaseHealthCheck<ScadaBridgeDbContext>`), `akka-cluster` (`AkkaClusterHealthCheck`), and `required-singletons` (`RequiredSingletonsHealthCheck`). The last verifies each *required-always* central singleton is reachable by Asking its local `ClusterSingletonProxy` an `Identify` with a short bounded timeout (~2s, probes run concurrently) and treating a non-null `ActorIdentity.Subject` as reachable; any unreachable required singleton degrades the check to **Unhealthy**, naming it. The required-always set is the five unconditional central singletons: notification-outbox, audit-log-ingest, site-call-audit, audit-log-purge, and site-audit-reconciliation. Feature-gated singletons are the "if applicable" case and are not probed when their feature is off. The check is leadership-agnostic — the proxy reaches the singleton from either central node, so a ready standby still reports ready (readiness must NOT require cluster leadership; that is the `Active` tier's job). During a brief singleton handover the probe may momentarily time out and the node may flap to not-ready, which is correct: a node mid-handover is legitimately not fully ready (no retries are used, to keep readiness polling fast).
A standard ASP.NET Core health check endpoint (`/health/ready`) reports readiness status. The load balancer uses this endpoint to determine when to route traffic to the node. During startup or failover, the node returns `503 Service Unavailable` until ready.
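The concurrent, bounded, no-retry probing used by the `required-singletons` check can be sketched independently of Akka. This is a minimal sketch, assuming each probe is a delegate that reports reachability; `SingletonReadiness` and its methods are hypothetical names, and `Task.WaitAsync` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SingletonReadiness
{
    // Runs every probe concurrently and returns the names that were unreachable.
    public static async Task<IReadOnlyList<string>> FindUnreachableAsync(
        IReadOnlyDictionary<string, Func<CancellationToken, Task<bool>>> probes,
        TimeSpan timeout)
    {
        var results = await Task.WhenAll(
            probes.Select(p => ProbeAsync(p.Key, p.Value, timeout)));

        return results
            .Where(r => !r.Reachable)
            .Select(r => r.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }

    // A probe that throws, returns false, or exceeds the timeout counts as unreachable.
    private static async Task<(string Name, bool Reachable)> ProbeAsync(
        string name, Func<CancellationToken, Task<bool>> probe, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // WaitAsync bounds the wait even if the probe ignores the token.
            return (name, await probe(cts.Token).WaitAsync(timeout));
        }
        catch (Exception)
        {
            return (name, false);
        }
    }
}
```

A health check built on this would report Unhealthy and list the returned names whenever the result is non-empty.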
### REQ-HOST-5: Windows Service Hosting
@@ -40,9 +40,10 @@ Each API method definition includes:
- **Approved API Keys**: List of API keys authorized to invoke this method. Requests from non-approved keys are rejected.
- **Parameter Definitions**: Ordered list of input parameters, each with:
- Parameter name.
- Data type (Boolean, Integer, Float, String — same fixed set as template attributes).
- Data type — the **extended type system** (Boolean, Integer, Float, String, plus the nestable Object and List; see [Extended Type System](#extended-type-system)).
- Whether the parameter is required.
- **Return Value Definition**: Structure of the response, with:
- Field names and data types. Supports returning **lists of objects**.
- Field names and (extended-system) data types. Supports returning **lists of objects** and arbitrarily nested structures.
- **Implementation Script**: C# script that executes when the method is called. Stored **inline** in the method definition. Follows standard C# authoring patterns but has no template inheritance — it is a standalone script tied to this method.
- **Timeout**: Configurable per method. Defines the maximum time the method is allowed to execute (including any routed calls to sites) before returning a timeout error to the caller.
@@ -99,6 +100,17 @@ Each API method definition includes:
- This allows complex request/response structures (e.g., an object containing properties and a list of nested objects).
- Template attributes retain the simpler four-type system. The extended types apply only to Inbound API method definitions and External System Gateway method definitions.
#### Type Definition Format & Nested Validation
- Parameter and return type definitions are persisted as **JSON Schema** (the canonical format produced by the Central UI schema builder; see the `MigrateParametersToJsonSchema` migration). An object declares its fields via `properties` (+ a `required` array); a list declares its element type via `items`. The legacy flat-array form (`[{name,type,required,itemType?}]`) is still accepted on read for transition safety.
- Validation is **recursive and type-aware** for the extended types (request parameters and script return values alike, via a single shared engine so the two cannot drift):
- **Object**: each declared field's value is validated against its declared (possibly nested) type; a missing required field and a present-but-wrong type are both reported.
- **List**: every element is validated against the declared element type (recursing into nested objects/lists). A list whose element type is left undeclared (`array` without `items`) is shape-checked only.
- **Scalars at any depth** are checked against the extended type.
- Errors are **path-qualified** (e.g. `order.items[2].quantity`) so the caller can locate the offending field.
- **Undeclared fields are rejected** at every level (consistent with the top-level "unexpected parameter" rejection): an object that declares its fields rejects any field not in its `properties`, so a typo'd field name surfaces as a `400`/error rather than being silently ignored. A bare object schema with no declared fields (`{"type":"object"}`) stays shape-only and accepts any fields.
- A JSON `null` value satisfies any declared type (a present-but-null field is allowed); only the **absence** of a required field is an error.
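The recursive rules above can be illustrated with `System.Text.Json`. This is a sketch under stated assumptions, not the shared engine itself: `NestedValidationSketch` is a hypothetical name, it handles only the keywords named in this section (`type`, `properties`, `required`, `items`), and the error wording is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class NestedValidationSketch
{
    public static List<string> Validate(JsonElement schema, JsonElement value, string path = "")
    {
        var errors = new List<string>();
        Walk(schema, value, path, errors);
        return errors;
    }

    private static void Walk(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // A JSON null satisfies any declared type.
        if (value.ValueKind == JsonValueKind.Null) return;

        var type = schema.TryGetProperty("type", out var t) ? t.GetString() : null;
        var ok = type switch
        {
            "object" => value.ValueKind == JsonValueKind.Object,
            "array" => value.ValueKind == JsonValueKind.Array,
            "string" => value.ValueKind == JsonValueKind.String,
            "boolean" => value.ValueKind is JsonValueKind.True or JsonValueKind.False,
            "number" => value.ValueKind == JsonValueKind.Number,
            "integer" => value.ValueKind == JsonValueKind.Number && value.TryGetInt64(out _),
            _ => true, // missing or unknown type: not checked in this sketch
        };
        if (!ok) { errors.Add($"{path}: expected {type}"); return; }

        if (type == "object") WalkObject(schema, value, path, errors);
        else if (type == "array") WalkArray(schema, value, path, errors);
    }

    private static void WalkObject(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // A bare {"type":"object"} declares no fields and stays shape-only.
        if (!schema.TryGetProperty("properties", out var props)) return;

        if (schema.TryGetProperty("required", out var required))
        {
            foreach (var name in required.EnumerateArray())
            {
                var requiredName = name.GetString()!;
                if (!value.TryGetProperty(requiredName, out _))
                    errors.Add($"{Join(path, requiredName)}: required field missing");
            }
        }

        foreach (var member in value.EnumerateObject())
        {
            if (props.TryGetProperty(member.Name, out var memberSchema))
                Walk(memberSchema, member.Value, Join(path, member.Name), errors);
            else
                errors.Add($"{Join(path, member.Name)}: undeclared field");
        }
    }

    private static void WalkArray(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        // An array without "items" is shape-checked only.
        if (!schema.TryGetProperty("items", out var items)) return;

        for (var i = 0; i < value.GetArrayLength(); i++)
            Walk(items, value[i], $"{path}[{i}]", errors);
    }

    private static string Join(string path, string name) =>
        path.Length == 0 ? name : $"{path}.{name}";
}
```

Running the same `Validate` entry point for both request parameters and script return values is what keeps the two paths from drifting.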
## Script Compilation & Hot-Reload
API method scripts are compiled at central startup — all method definitions are loaded from the configuration database and compiled into in-memory delegates.
@@ -32,9 +32,31 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
- **JWT claims**: User display name, username, list of roles (Admin, Design, Deployment), and for site-scoped Deployment, the list of permitted site IDs.
### Token Lifecycle
- **JWT expiry**: 15 minutes. On each request, if the cookie-embedded JWT is near expiry, the app re-queries LDAP for current group memberships and issues a fresh JWT, writing an updated cookie. Roles are never more than 15 minutes stale.
- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the token is not refreshed and the user must re-login. Tracked via a last-activity timestamp in the token.
- **Sliding refresh**: Active users stay logged in indefinitely — the token refreshes every 15 minutes as long as requests are made within the 30-minute idle window.
> **Implementation note (M2.19, #15).** The interactive Central UI login path signs in
> with **bare cookie claims**, not a cookie-embedded JWT. The session lifecycle below is
> therefore enforced by the cookie middleware (`ExpireTimeSpan` + `SlidingExpiration`) plus
> a `CookieAuthenticationEvents.OnValidatePrincipal` handler — see **Session Validation
> (`OnValidatePrincipal`)** below. The embedded-JWT model remains the documented design
> intent and is the mechanism for any non-cookie bearer surface (e.g. `/auth/token`), but
> it is **not** the transport for the cookie principal.
- **Idle timeout**: Configurable, default **30 minutes**. If no requests are made within the idle window, the session is rejected and the user must log in again. Tracked via a `LastActivity` timestamp claim. The cookie's `ExpireTimeSpan` is set to the idle timeout and `SlidingExpiration` renews it on activity, so the cookie window and the explicit `OnValidatePrincipal` idle check use the **same** value and cannot contradict each other.
- **Role-mapping refresh (LDAP-free)**: Configurable, default **15 minutes** (`SecurityOptions.RoleRefreshThresholdMinutes`). At login the session stores the user's raw LDAP groups (one `zb:group` claim each) plus a `zb:lastrolerefresh` anchor. Once the anchor is older than the threshold, `OnValidatePrincipal` re-runs the **DB-backed** `RoleMapper` on the stored groups — **with no LDAP call** — rebuilds the role/scope claims via the shared claim-builder, advances the anchor, and re-issues the cookie. Central role-mapping (DB) changes — including a **revoked** mapping that drops the user's roles, and changed site-scope rules — take effect within this window. Roles derived from central mappings are never more than ~15 minutes stale.
#### Session Validation (`OnValidatePrincipal`)
- The cookie principal is built at login by a **single shared claim-builder** (`SessionClaimBuilder`). The `OnValidatePrincipal` role-refresh path rebuilds the principal through the **same** builder, so the login and refresh claim shapes cannot drift.
- **Failure policy**: the refresh is best-effort. Any error during the refresh (e.g. the configuration database is unreachable) **keeps the existing principal with its current roles** — it never signs the user out and never throws out of the request pipeline. This mirrors the **Active sessions** stance under *LDAP Connection Failure* below. Only the explicit idle-timeout path rejects the principal.
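The failure policy reduces to a small fail-soft wrapper. This is a hedged sketch: `FailSoftRefresh.RefreshOrKeepAsync` is a hypothetical helper, not a member of the real `CookieSessionValidator`, and it treats every exception as "keep the current value".

```csharp
using System;
using System.Threading.Tasks;

public static class FailSoftRefresh
{
    // Returns the refreshed value, or the current one if the refresh throws.
    // Nothing propagates to the caller, so the request pipeline is never broken
    // by a failed refresh.
    public static async Task<T> RefreshOrKeepAsync<T>(T current, Func<Task<T>> refresh)
    {
        try
        {
            return await refresh();
        }
        catch (Exception)
        {
            return current;
        }
    }
}
```

In a session validator, `current` would be the existing principal and `refresh` the rebuild through the shared claim-builder; the idle-timeout rejection stays outside this wrapper because it is the one path that is allowed to end the session.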
> **Residual limitation — live LDAP group-membership changes (follow-up).** The
> mid-session refresh re-maps the **stored** groups against the central database; it does
> **not** re-query LDAP, so a change to the user's actual **group membership** in the
> directory is picked up only at **next login**. A live group re-query for an active
> session would require a new passwordless service-account group-search method on the
> shared `ZB.MOM.WW.Auth.Ldap` library, which is an **external NuGet package** and exposes
> only `AuthenticateAsync(username, password, ct)` (no standalone group search). Adding
> that method is tracked as a follow-up. Until then: central role-mapping/scope changes are
> reflected within ~15 minutes; directory group-membership changes require re-login.
### Load Balancer Compatibility
- The authentication cookie carries a self-contained JWT — no server-side session state. A load balancer in front of the central cluster can route requests to either node without sticky sessions or a shared session store.
@@ -43,8 +65,8 @@ Central cluster. Sites do not have user-facing interfaces and do not perform ind
## LDAP Connection Failure
- **New logins**: If the LDAP/AD server is unreachable, login attempts **fail**. Users cannot be authenticated without LDAP.
- **Active sessions**: Users with valid (not-yet-expired) JWTs can **continue operating** with their current roles. The token refresh is skipped until LDAP is available again. This avoids disrupting engineers mid-work during a brief LDAP outage.
- **Recovery**: When LDAP becomes reachable again, the next token refresh cycle re-queries group memberships and issues a fresh token with current roles.
- **Active sessions**: Users with a valid (not-idle-timed-out) session can **continue operating** with their current roles during an LDAP outage. Interactive cookie sessions never re-query LDAP mid-session (the mid-session role-mapping refresh is DB-only — see *Session Validation* above), so a brief LDAP outage does not disrupt engineers mid-work; central role-mapping changes still apply within the refresh window regardless of LDAP availability.
- **Recovery (group-membership changes)**: Because the mid-session refresh is LDAP-free, a change to a user's **directory group membership** is picked up at the user's **next login** (when LDAP is queried again), not mid-session — see the *Residual limitation* note above.
## Roles
@@ -1,4 +1,3 @@
using System.Security.Claims;
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Builder;
@@ -35,7 +34,6 @@ public static class AuthEndpoints
}
var ldapAuth = context.RequestServices.GetRequiredService<ILdapAuthService>();
var jwtService = context.RequestServices.GetRequiredService<JwtTokenService>();
var roleMapper = context.RequestServices.GetRequiredService<IGroupRoleMapper<string>>();
var authResult = await ldapAuth.AuthenticateAsync(username, password, context.RequestAborted);
@@ -72,39 +70,23 @@ public static class AuthEndpoints
// the documented sliding-refresh policy.
var displayName = string.IsNullOrEmpty(authResult.DisplayName) ? username : authResult.DisplayName;
var resolvedUsername = string.IsNullOrEmpty(authResult.Username) ? username : authResult.Username;
var claims = new List<Claim>
{
new(ClaimTypes.Name, resolvedUsername),
new(JwtTokenService.DisplayNameClaimType, displayName),
new(JwtTokenService.UsernameClaimType, resolvedUsername),
};
foreach (var role in roleMapping.Roles)
{
claims.Add(new Claim(JwtTokenService.RoleClaimType, role));
}
if (!scope.IsSystemWideDeployment)
{
foreach (var siteId in scope.PermittedSiteIds)
{
claims.Add(new Claim(JwtTokenService.SiteIdClaimType, siteId));
}
}
// Task 1.5: name the role/name claim types explicitly so the cookie
// principal's IsInRole / [Authorize(Roles=…)] resolve against the same
// canonical types we mint (JwtTokenService.RoleClaimType = ZbClaimTypes.Role,
// ClaimTypes.Name = ZbClaimTypes.Name). The policies use
// RequireClaim(RoleClaimType, …) which checks type+value directly, but
// pinning roleType keeps IsInRole-style checks consistent and survives the
// cookie serialize/round-trip.
var identity = new ClaimsIdentity(
claims,
authenticationType: CookieAuthenticationDefaults.AuthenticationScheme,
nameType: ClaimTypes.Name,
roleType: JwtTokenService.RoleClaimType);
var principal = new ClaimsPrincipal(identity);
// M2.19 (#15): build the cookie principal through the shared
// SessionClaimBuilder — the SINGLE source of truth that the mid-session
// OnValidatePrincipal role-refresh path ALSO uses, so login and refresh can
// never drift. It stamps the canonical identity/role/scope claims (with
// roleType/nameType pinned for IsInRole), PLUS the M2.19 additions: one
// zb:group claim per raw LDAP group (the durable input the mid-session
// RoleMapper re-run consumes) and a zb:lastrolerefresh anchor (login time,
// UTC) that also seeds the LastActivity idle anchor. The refresh timestamp is
// the login instant, so the first role refresh is due RoleRefreshThresholdMinutes
// later — not immediately.
var principal = SessionClaimBuilder.Build(
resolvedUsername,
displayName,
authResult.Groups,
scope,
DateTimeOffset.UtcNow);
await context.SignInAsync(
CookieAuthenticationDefaults.AuthenticationScheme,
@@ -445,6 +445,17 @@
});
});
// M2.11: the site returns InstanceNotFound=true when the instance is
// not deployed there (e.g. deployment not yet pushed, or wrong site).
if (session.InitialSnapshot.InstanceNotFound)
{
DebugStreamService.StopStream(session.SessionId);
_toast.ShowError(
"Instance not found on the selected site — check the deployment target.");
_connecting = false;
return;
}
_session = session;
// Populate initial state from snapshot
@@ -864,12 +864,144 @@
? "The deployed revision hash differs from the current template-derived hash. Redeploy to apply changes."
: "No differences between deployed and current configuration.");
builder.CloseElement();
// DeploymentManager-018: render the structured diff sections so
// the operator sees WHAT changed, not just that the hash moved.
// Each section uses the same compact change-table idiom; the
// connection section surfaces standalone endpoint/protocol/
// failover drift that no per-attribute row would show (#10).
var d = diffResult.Diff;
if (d != null)
{
RenderChangeSection(builder, 100_000, "Attributes", d.AttributeChanges,
a => a.Value ?? "—");
RenderChangeSection(builder, 200_000, "Alarms", d.AlarmChanges,
a => $"P{a.PriorityLevel} · {a.TriggerType}");
RenderChangeSection(builder, 300_000, "Scripts", d.ScriptChanges,
s => s.TriggerType ?? "—");
RenderChangeSection(builder, 400_000, "Connections", d.ConnectionChanges,
c => FormatConnection(c));
}
}
};
await _diffDialog.ShowAsync($"Deployment Diff — {inst.UniqueName}", body);
}
// Compact summary of a connection's deployment-relevant fields for the diff
// table's Before/After cells. Surfaces all four fields ConnectionsEqual
// compares — protocol, primary endpoint config, failover retry count, and
// the backup endpoint — so a backup-only change doesn't show identical
// Before/After cells. The backup segment is omitted when there is no backup.
private static string FormatConnection(
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.ConnectionConfig c)
{
var endpoint = string.IsNullOrWhiteSpace(c.ConfigurationJson) ? "—" : c.ConfigurationJson;
var summary = $"{c.Protocol} · {endpoint} · failover ×{c.FailoverRetryCount}";
if (!string.IsNullOrWhiteSpace(c.BackupConfigurationJson))
{
summary += $" · backup {c.BackupConfigurationJson}";
}
return summary;
}
// Renders one change section (a heading plus a Bootstrap change-table) for a
// set of diff entries, matching the deployment-diff idiom used elsewhere in
// the UI: table-sm/table-striped, a colored change badge, and Before/After
// text columns. Nothing is rendered when the section has no entries, so the
// four sections (attributes, alarms, scripts, connections) all read the same
// and only appear when they actually changed. seqBase values are spaced
// 100k apart so each section's per-row sequence numbers (13 per row, starting
// at seqBase + 22) stay in a disjoint, ascending range as long as a section
// has no more than 7,690 rows, far beyond any realistic diff.
private static void RenderChangeSection<T>(
Microsoft.AspNetCore.Components.Rendering.RenderTreeBuilder builder,
int seqBase,
string heading,
IReadOnlyList<ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffEntry<T>> entries,
Func<T, string> summarize)
{
if (entries.Count == 0)
return;
builder.OpenElement(seqBase, "div");
builder.AddAttribute(seqBase + 1, "class", "mt-3");
builder.OpenElement(seqBase + 2, "div");
builder.AddAttribute(seqBase + 3, "class", "fw-semibold small mb-1");
builder.AddContent(seqBase + 4, $"{heading} ({entries.Count})");
builder.CloseElement();
builder.OpenElement(seqBase + 5, "table");
builder.AddAttribute(seqBase + 6, "class", "table table-sm table-striped align-middle mb-0");
// Header row.
builder.OpenElement(seqBase + 7, "thead");
builder.OpenElement(seqBase + 8, "tr");
AppendHeaderCell(builder, seqBase + 9, "Name");
AppendHeaderCell(builder, seqBase + 12, "Change");
AppendHeaderCell(builder, seqBase + 15, "Before");
AppendHeaderCell(builder, seqBase + 18, "After");
builder.CloseElement(); // tr
builder.CloseElement(); // thead
builder.OpenElement(seqBase + 21, "tbody");
var rowSeq = seqBase + 22;
foreach (var entry in entries)
{
builder.OpenElement(rowSeq, "tr");
builder.OpenElement(rowSeq + 1, "td");
builder.AddContent(rowSeq + 2, entry.CanonicalName);
builder.CloseElement();
builder.OpenElement(rowSeq + 3, "td");
builder.OpenElement(rowSeq + 4, "span");
builder.AddAttribute(rowSeq + 5, "class", ChangeBadgeClass(entry.ChangeType));
builder.AddContent(rowSeq + 6, entry.ChangeType.ToString());
builder.CloseElement();
builder.CloseElement();
builder.OpenElement(rowSeq + 7, "td");
builder.AddAttribute(rowSeq + 8, "class", "small text-muted");
builder.AddContent(rowSeq + 9,
entry.OldValue is null ? "—" : summarize(entry.OldValue));
builder.CloseElement();
builder.OpenElement(rowSeq + 10, "td");
builder.AddAttribute(rowSeq + 11, "class", "small");
builder.AddContent(rowSeq + 12,
entry.NewValue is null ? "—" : summarize(entry.NewValue));
builder.CloseElement();
builder.CloseElement(); // tr
rowSeq += 13;
}
builder.CloseElement(); // tbody
builder.CloseElement(); // table
builder.CloseElement(); // div.mt-3
}
private static void AppendHeaderCell(
Microsoft.AspNetCore.Components.Rendering.RenderTreeBuilder builder, int seq, string text)
{
builder.OpenElement(seq, "th");
builder.AddAttribute(seq + 1, "scope", "col");
builder.AddContent(seq + 2, text);
builder.CloseElement();
}
private static string ChangeBadgeClass(
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType changeType) => changeType switch
{
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType.Added => "badge bg-success",
ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening.DiffChangeType.Removed => "badge bg-danger",
_ => "badge bg-warning text-dark",
};
// ---- Dropdown option helpers ----
private IEnumerable<(int Id, string Label)> EnumerateSiteOptions()
{
@@ -117,6 +117,9 @@
private string? _scriptParameters;
private string? _scriptReturn;
private bool _scriptIsLocked;
// Round-tripped from the loaded script so UI edits preserve a timeout set
// via Transport import (no authoring control in the UI — scoped out).
private int? _scriptExecutionTimeoutSeconds;
private string? _scriptFormError;
private string _scriptModalTab = "trigger"; // "trigger" | "code" | "parameters" | "return"
private MonacoEditor? _scriptEditor;
@@ -1797,6 +1800,7 @@
_scriptParameters = null;
_scriptReturn = null;
_scriptIsLocked = false;
_scriptExecutionTimeoutSeconds = null;
_scriptModalTab = "trigger";
ResetScriptTestRun();
}
@@ -1814,6 +1818,9 @@
_scriptParameters = script.ParameterDefinitions;
_scriptReturn = script.ReturnDefinition;
_scriptIsLocked = script.IsLocked;
// Preserve any timeout set via Transport import — the UI has no authoring
// control for this field, so we round-trip the loaded value unchanged.
_scriptExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds;
_scriptModalTab = "trigger";
ResetScriptTestRun();
}
@@ -1907,6 +1914,9 @@
ReturnDefinition = _scriptReturn,
IsLocked = _scriptIsLocked,
MinTimeBetweenRuns = DurationInput.Compose(_scriptMinTimeValue, _scriptMinTimeUnit),
// Round-trip the loaded value — no UI control, so preserve
// any timeout set via Transport import unchanged.
ExecutionTimeoutSeconds = _scriptExecutionTimeoutSeconds,
IsInherited = existing.IsInherited,
LockedInDerived = existing.LockedInDerived,
};
@@ -52,6 +52,15 @@ public class TemplateScript
/// </summary>
public TimeSpan? MinTimeBetweenRuns { get; set; }
/// <summary>
/// Per-script execution timeout in seconds, or null to use the site's global
/// default (<c>SiteRuntimeOptions.ScriptExecutionTimeoutSeconds</c>). A
/// non-positive value (≤ 0) is treated the same as null — i.e. fall back to
/// the global default — by the Site Runtime. Seconds (not a TimeSpan) to keep
/// the unit consistent with the global option it overrides.
/// </summary>
public int? ExecutionTimeoutSeconds { get; set; }
/// <summary>
/// True when this row was copied from the base template and has not been
/// overridden on the derived template. Changes to the base flow downward
@@ -0,0 +1,34 @@
namespace ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
/// <summary>
/// Single source of truth for which data-connection protocol strings produce an
/// adapter that implements <see cref="IAlarmSubscribableConnection"/> (i.e. can
/// mirror native alarms).
///
/// The set MUST stay in sync with the protocols registered against an
/// alarm-subscribable adapter in
/// <c>DataConnectionLayer/DataConnectionFactory.cs</c>: today the "OpcUa" adapter
/// (<c>OpcUaDataConnection</c>) and the "MxGateway" adapter
/// (<c>MxGatewayDataConnection</c>) both implement
/// <see cref="IAlarmSubscribableConnection"/>. The runtime decision is made in
/// <c>DataConnectionActor</c> via <c>_adapter is IAlarmSubscribableConnection</c>;
/// this central-side helper lets the deploy pipeline and Central UI gate
/// native-alarm-source bindings against the same notion without instantiating an
/// adapter. Adding a new alarm-capable protocol = register the adapter in the
/// factory AND add its protocol string here.
/// </summary>
public static class AlarmCapableProtocols
{
/// <summary>
/// Determines whether a data connection's protocol string resolves to an
/// alarm-capable adapter (one implementing <see cref="IAlarmSubscribableConnection"/>).
/// Case-insensitive to match <c>DataConnectionFactory</c>'s own
/// <c>OrdinalIgnoreCase</c> protocol-key lookup; <c>null</c>/blank is not
/// alarm-capable.
/// </summary>
/// <param name="protocol">The data connection protocol string (e.g. "OpcUa").</param>
/// <returns><c>true</c> when the protocol's adapter can subscribe native alarms; otherwise <c>false</c>.</returns>
public static bool IsAlarmCapable(string? protocol) =>
string.Equals(protocol, "OpcUa", StringComparison.OrdinalIgnoreCase)
|| string.Equals(protocol, "MxGateway", StringComparison.OrdinalIgnoreCase);
}
@@ -56,8 +56,17 @@ public interface IDatabaseGateway
/// <param name="parameters">Optional SQL parameters for the statement.</param>
/// <param name="originInstanceName">Optional name of the instance that originated the write.</param>
/// <param name="cancellationToken">Cancellation token for the buffering operation.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
Task CachedWriteAsync(
/// <returns>
/// M2.3 (#7): an <see cref="ExternalCallResult"/> mirroring the External-System
/// API path (<c>IExternalSystemClient.CachedCallAsync</c>). The write is
/// attempted immediately:
/// <list type="bullet">
/// <item>immediate success → <c>Success=true, WasBuffered=false</c> (not buffered);</item>
/// <item>permanent SQL error (constraint / syntax / permission) → <c>Success=false, WasBuffered=false</c> with an error message, returned synchronously and NOT buffered;</item>
/// <item>transient SQL error (connection / timeout / deadlock / throttle) → buffered to store-and-forward, <c>Success=true, WasBuffered=true</c>.</item>
/// </list>
/// </returns>
Task<ExternalCallResult> CachedWriteAsync(
string connectionName,
string sql,
IReadOnlyDictionary<string, object?>? parameters = null,
@@ -2,8 +2,38 @@ using ZB.MOM.WW.ScadaBridge.Commons.Messages.Streaming;
namespace ZB.MOM.WW.ScadaBridge.Commons.Messages.DebugView;
/// <summary>
/// Snapshot of an instance's debug state returned in response to a
/// <see cref="DebugSnapshotRequest"/> or <see cref="SubscribeDebugViewRequest"/>.
/// </summary>
/// <remarks>
/// <para>
/// <b>Additive-only contract (M2.11):</b> <see cref="InstanceNotFound"/> is an
/// optional trailing parameter with a default of <see langword="false"/> so every
/// existing positional constructor call and every existing serialized wire frame
/// remains valid. Callers that receive a snapshot with
/// <c>InstanceNotFound = true</c> know the instance was unknown on the site and
/// should distinguish that from a deployed-but-empty instance
/// (<c>InstanceNotFound = false</c>, empty <see cref="AttributeValues"/> and
/// <see cref="AlarmStates"/>).
/// </para>
/// <para>
/// A new dedicated message type (<c>DebugViewInstanceNotFound</c>) was
/// considered but rejected: the ClusterClient / ClusterClientReceptionist
/// channel is typed on the request side and the bridge actor is already
/// pattern-matching on <c>DebugViewSnapshot</c> for the initial-snapshot TCS
/// in <c>DebugStreamService</c>. Introducing a second reply type would require
/// every consumer to handle an additional <c>Ask</c> result union — more change
/// for no additive-safety gain. The defaulted field is strictly additive and
/// keeps all call sites untouched.
/// </para>
/// </remarks>
public record DebugViewSnapshot(
string InstanceUniqueName,
IReadOnlyList<AttributeValueChanged> AttributeValues,
IReadOnlyList<AlarmStateChanged> AlarmStates,
DateTimeOffset SnapshotTimestamp);
DateTimeOffset SnapshotTimestamp,
// M2.11 — additive field: true when the requested instance is not registered
// on this site. Defaults to false so all existing call sites and wire
// frames are unaffected.
bool InstanceNotFound = false);
@@ -40,7 +40,14 @@ public record SiteHealthReport(
// hosted service every 30 s. Defaults to null so existing producers /
// tests that don't refresh the snapshot stay valid; the central health
// surface treats null as "no data yet" rather than a zeroed queue.
SiteAuditBacklogSnapshot? SiteAuditBacklog = null);
SiteAuditBacklogSnapshot? SiteAuditBacklog = null,
// Site Event Logging (#12) M2.16 (#30): cumulative count of event-log write
// failures (SQLite error, disk full, bounded-queue overflow drop) since the
// logger was created. Populated by the site-side SiteEventLogFailureCountReporter
// hosted service. A point-in-time reading of the running total (not reset on
// collect), mirroring the SiteAuditBacklog pattern. Defaults to 0 so existing
// producers / tests that don't wire the poller stay valid.
long SiteEventLogWriteFailures = 0);
/// <summary>
/// Broadcast wrapper used between central nodes to keep per-node
@@ -12,8 +12,8 @@ public sealed record ConfigurationDiff
public string? OldRevisionHash { get; init; }
/// <summary>Revision hash of the new configuration being compared.</summary>
public string? NewRevisionHash { get; init; }
/// <summary>True when any attribute, alarm, or script changes are present.</summary>
public bool HasChanges => AttributeChanges.Count > 0 || AlarmChanges.Count > 0 || ScriptChanges.Count > 0;
/// <summary>True when any attribute, alarm, script, or connection changes are present.</summary>
public bool HasChanges => AttributeChanges.Count > 0 || AlarmChanges.Count > 0 || ScriptChanges.Count > 0 || ConnectionChanges.Count > 0;
/// <summary>Diff entries for resolved attributes.</summary>
public IReadOnlyList<DiffEntry<ResolvedAttribute>> AttributeChanges { get; init; } = [];
@@ -21,6 +21,13 @@ public sealed record ConfigurationDiff
public IReadOnlyList<DiffEntry<ResolvedAlarm>> AlarmChanges { get; init; } = [];
/// <summary>Diff entries for resolved scripts.</summary>
public IReadOnlyList<DiffEntry<ResolvedScript>> ScriptChanges { get; init; } = [];
/// <summary>
/// Diff entries for connection configurations, keyed by connection name.
/// Surfaces standalone endpoint/protocol/failover drift that does not show
/// up as a per-attribute binding change (TemplateEngine-018).
/// </summary>
public IReadOnlyList<DiffEntry<ConnectionConfig>> ConnectionChanges { get; init; } = [];
}
/// <summary>
@@ -174,6 +174,14 @@ public sealed record ResolvedScript
/// <summary>Gets the minimum time between script executions.</summary>
public TimeSpan? MinTimeBetweenRuns { get; init; }
/// <summary>
/// Per-script execution timeout in seconds, or null to use the site's global
/// default. A non-positive value is treated as null (use global) by the Site
/// Runtime. Seconds (not TimeSpan) to match the global option it overrides.
/// </summary>
public int? ExecutionTimeoutSeconds { get; init; }
/// <summary>Gets the source of this script.</summary>
public string Source { get; init; } = "Template";
@@ -0,0 +1,393 @@
using System.Text.Json;
namespace ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
/// <summary>
/// Recursive, persistence-ignorant model of an inbound-API parameter or
/// return-value type definition. This is the deserialized form of the JSON
/// Schema stored in <c>ApiMethod.ParameterDefinitions</c> / <c>ReturnDefinition</c>
/// (and the equivalent TemplateScript / SharedScript columns), the canonical
/// format produced by the Central UI schema builder and the
/// <c>MigrateParametersToJsonSchema</c> migration.
///
/// <para>
/// Unlike the flat <see cref="ParameterDefinition"/> (name → scalar type, no
/// nesting), an <see cref="InboundApiSchema"/> carries the FULL nested type:
/// an <c>object</c> node carries its declared field schemas (and which fields
/// are required); an <c>array</c> node carries its element schema. This lets
/// callers validate complex request/response structures field-by-field and
/// element-by-element to any depth, with path-qualified errors
/// (e.g. <c>order.items[2].quantity</c>).
/// </para>
///
/// <para>
/// The extended type vocabulary (after normalization) is the JSON Schema set:
/// <c>boolean · integer · number · string · object · array</c>. Legacy aliases
/// (<c>bool</c>, <c>int</c>, <c>float</c>, <c>double</c>, <c>list</c>, …) are
/// accepted on parse for transition safety, mirroring the Central UI
/// <c>SchemaBuilderModel</c> / <c>JsonSchemaShapeParser</c> conventions.
/// </para>
/// </summary>
public sealed class InboundApiSchema
{
/// <summary>Normalized JSON Schema type: one of <c>boolean · integer · number · string · object · array</c>.</summary>
public string Type { get; init; } = "string";
/// <summary>For <see cref="Type"/> = <c>object</c>: the declared fields, in declaration order.</summary>
public IReadOnlyList<InboundApiSchemaField> Fields { get; init; } = [];
/// <summary>For <see cref="Type"/> = <c>array</c>: the schema every element must satisfy; null means element type was not declared (shape-only).</summary>
public InboundApiSchema? Items { get; init; }
/// <summary>Maximum allowed schema nesting depth for both Parse and Validate recursion.</summary>
private const int MaxDepth = 32;
// Allow the JSON reader to parse schemas up to ~3× our structural ceiling so
// the application-level ParseSchema depth guard (MaxDepth = 32) fires before
// the System.Text.Json reader ceiling. Each structural level contributes
// roughly 3 JSON-reader nesting levels (object → properties-object → value),
// so 128 reader levels comfortably accommodate 32+ structural levels.
private static readonly JsonDocumentOptions DocOptions = new() { MaxDepth = 128 };
/// <summary>
/// Parses a stored definition string into an <see cref="InboundApiSchema"/>.
/// Accepts the canonical JSON Schema object form
/// (<c>{"type":"object","properties":{…},"required":[…]}</c>) and, for
/// transition safety, the legacy flat-array parameter form
/// (<c>[{name,type,required,itemType?}]</c>) which it treats as an object
/// schema whose properties are the array entries.
/// </summary>
/// <param name="json">The definition JSON; null/whitespace yields <c>null</c>.</param>
/// <returns>The parsed schema, or <c>null</c> when the input is empty.</returns>
/// <exception cref="JsonException">The input is non-empty but not valid JSON, is a JSON scalar/null at the root, or the schema nesting exceeds <see cref="MaxDepth"/>.</exception>
public static InboundApiSchema? Parse(string? json)
{
if (string.IsNullOrWhiteSpace(json))
{
return null;
}
using var doc = JsonDocument.Parse(json, DocOptions);
return doc.RootElement.ValueKind switch
{
JsonValueKind.Object => ParseSchema(doc.RootElement, depth: 0),
JsonValueKind.Array => ParseLegacyArray(doc.RootElement),
_ => throw new JsonException("Type definition must be a JSON object (JSON Schema) or legacy parameter array."),
};
}
private static InboundApiSchema ParseSchema(JsonElement el, int depth)
{
if (depth > MaxDepth)
{
throw new JsonException($"Schema nesting exceeds the maximum allowed depth of {MaxDepth}.");
}
var type = el.TryGetProperty("type", out var t) && t.ValueKind == JsonValueKind.String
? NormalizeType(t.GetString())
: "string";
if (type == "array")
{
InboundApiSchema? items = null;
if (el.TryGetProperty("items", out var itemsEl) && itemsEl.ValueKind == JsonValueKind.Object)
{
items = ParseSchema(itemsEl, depth + 1);
}
return new InboundApiSchema { Type = "array", Items = items };
}
if (type == "object")
{
var requiredSet = new HashSet<string>(StringComparer.Ordinal);
if (el.TryGetProperty("required", out var req) && req.ValueKind == JsonValueKind.Array)
{
foreach (var r in req.EnumerateArray())
{
if (r.ValueKind == JsonValueKind.String)
{
var s = r.GetString();
if (!string.IsNullOrEmpty(s))
{
requiredSet.Add(s);
}
}
}
}
var fields = new List<InboundApiSchemaField>();
if (el.TryGetProperty("properties", out var props) && props.ValueKind == JsonValueKind.Object)
{
foreach (var prop in props.EnumerateObject())
{
var schema = prop.Value.ValueKind == JsonValueKind.Object
? ParseSchema(prop.Value, depth + 1)
: new InboundApiSchema { Type = "string" };
fields.Add(new InboundApiSchemaField(prop.Name, requiredSet.Contains(prop.Name), schema));
}
}
return new InboundApiSchema { Type = "object", Fields = fields };
}
return new InboundApiSchema { Type = type };
}
private static InboundApiSchema ParseLegacyArray(JsonElement arr)
{
var fields = new List<InboundApiSchemaField>();
foreach (var item in arr.EnumerateArray())
{
if (item.ValueKind != JsonValueKind.Object)
{
continue;
}
// The legacy flat shape historically appeared with both PascalCase
// (CLI / anonymous-object serialization read back with
// PropertyNameCaseInsensitive) and lowercase (DB) keys, so the
// property lookups here are case-insensitive for compatibility.
var name = TryGetMember(item, "name", out var n) ? n.GetString() : null;
if (string.IsNullOrEmpty(name))
{
continue;
}
var rawType = TryGetMember(item, "type", out var t) ? t.GetString() : "string";
// A field is optional only when "required" is explicitly false.
// The SQL migration uses a string comparison (LOWER(...) <> 'false'),
// so we must also accept the string "false" (case-insensitive) here —
// not only the JSON boolean false — to stay consistent with legacy rows
// that stored "required":"false" as a string.
var required = !TryGetMember(item, "required", out var rq)
|| (rq.ValueKind != JsonValueKind.False
&& !string.Equals(
rq.ValueKind == JsonValueKind.String ? rq.GetString() : null,
"false",
StringComparison.OrdinalIgnoreCase));
var normalized = NormalizeType(rawType);
InboundApiSchema schema;
if (normalized == "array")
{
var inner = TryGetMember(item, "itemType", out var it) ? it.GetString() : null;
schema = new InboundApiSchema
{
Type = "array",
Items = string.IsNullOrEmpty(inner) ? null : new InboundApiSchema { Type = NormalizeType(inner) },
};
}
else
{
schema = new InboundApiSchema { Type = normalized };
}
fields.Add(new InboundApiSchemaField(name!, required, schema));
}
return new InboundApiSchema { Type = "object", Fields = fields };
}
/// <summary>
/// Case-insensitive object-member lookup, used only on the legacy flat-array
/// path so both PascalCase and lowercase legacy keys resolve.
/// </summary>
private static bool TryGetMember(JsonElement obj, string name, out JsonElement value)
{
foreach (var prop in obj.EnumerateObject())
{
if (string.Equals(prop.Name, name, StringComparison.OrdinalIgnoreCase))
{
value = prop.Value;
return true;
}
}
value = default;
return false;
}
/// <summary>
/// Normalizes a raw type token to the canonical JSON Schema vocabulary,
/// tolerating legacy aliases. Unknown tokens are returned lowercased so the
/// validator can surface an explicit "unknown type" error.
/// </summary>
/// <param name="raw">The raw type token (may be null).</param>
/// <returns>The normalized type token.</returns>
public static string NormalizeType(string? raw) => raw?.ToLowerInvariant() switch
{
null or "" => "string",
"boolean" or "bool" => "boolean",
"integer" or "int" or "int32" or "int64" => "integer",
"number" or "float" or "double" or "decimal" => "number",
// datetime→string is intentional: the legacy migration's SQL
// normalization function maps "datetime" to "string" (no separate
// datetime wire type in the extended type system), so C# must match.
"string" or "datetime" => "string",
"object" => "object",
"array" or "list" => "array",
var other => other,
};
/// <summary>
/// Recursively validates a JSON value against this schema. A JSON <c>null</c>
/// satisfies any type (a present-but-null field is allowed; absence of a
/// required field is reported by the parent object). Errors are accumulated
/// with a path prefix (e.g. <c>order.items[2].quantity</c>) so the caller can
/// pinpoint the offending field.
/// </summary>
/// <param name="value">The JSON value to validate.</param>
/// <param name="path">The path prefix for the value being validated (empty for the root).</param>
/// <param name="errors">Accumulator the validator appends path-qualified messages to.</param>
public void Validate(JsonElement value, string path, List<string> errors)
=> ValidateCore(value, path, errors, depth: 0);
private void ValidateCore(JsonElement value, string path, List<string> errors, int depth)
{
ArgumentNullException.ThrowIfNull(errors);
if (depth > MaxDepth)
{
errors.Add($"{Describe(path)}: schema nesting too deep (max {MaxDepth})");
return;
}
// A null value satisfies any declared type — a present-but-null field is
// allowed; a MISSING required field is reported by the enclosing object.
if (value.ValueKind == JsonValueKind.Null)
{
return;
}
switch (Type)
{
case "boolean":
if (value.ValueKind is not (JsonValueKind.True or JsonValueKind.False))
{
errors.Add(Mismatch(path, "Boolean"));
}
break;
case "integer":
if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
{
errors.Add(Mismatch(path, "Integer"));
}
break;
case "number":
if (value.ValueKind != JsonValueKind.Number)
{
errors.Add(Mismatch(path, "Float"));
}
break;
case "string":
if (value.ValueKind != JsonValueKind.String)
{
errors.Add(Mismatch(path, "String"));
}
break;
case "object":
ValidateObject(value, path, errors, depth);
break;
case "array":
ValidateArray(value, path, errors, depth);
break;
default:
errors.Add($"{Describe(path)} has unknown declared type '{Type}'");
break;
}
}
private void ValidateObject(JsonElement value, string path, List<string> errors, int depth)
{
if (value.ValueKind != JsonValueKind.Object)
{
errors.Add(Mismatch(path, "Object"));
return;
}
// Reject undeclared fields (defensive, consistent with InboundAPI-010's
// top-level "unexpected parameter" rejection) — a typo'd nested field is
// surfaced instead of silently ignored. Skipped when no fields are
// declared (a bare {"type":"object"} stays shape-only, like the legacy
// behaviour and the array-without-items case).
if (Fields.Count > 0)
{
var declared = new HashSet<string>(Fields.Select(f => f.Name), StringComparer.Ordinal);
foreach (var prop in value.EnumerateObject())
{
if (!declared.Contains(prop.Name))
{
errors.Add($"{Describe(JoinField(path, prop.Name))} is not a declared field");
}
}
}
foreach (var field in Fields)
{
var fieldPath = JoinField(path, field.Name);
if (value.TryGetProperty(field.Name, out var fieldValue))
{
field.Schema.ValidateCore(fieldValue, fieldPath, errors, depth + 1);
}
else if (field.Required)
{
errors.Add($"missing required field {Describe(fieldPath)}");
}
}
}
private void ValidateArray(JsonElement value, string path, List<string> errors, int depth)
{
if (value.ValueKind != JsonValueKind.Array)
{
errors.Add(Mismatch(path, "List"));
return;
}
// No declared element type → shape-only (any elements accepted).
if (Items is null)
{
return;
}
var index = 0;
foreach (var element in value.EnumerateArray())
{
Items.ValidateCore(element, $"{path}[{index}]", errors, depth + 1);
index++;
}
}
private static string Mismatch(string path, string expectedDisplayType) =>
$"{Describe(path)} must be {Article(expectedDisplayType)} {expectedDisplayType}";
private static string Describe(string path) =>
string.IsNullOrEmpty(path) ? "value" : $"'{path}'";
private static string JoinField(string path, string field) =>
string.IsNullOrEmpty(path) ? field : $"{path}.{field}";
private static string Article(string word) =>
word.Length > 0 && "AEIOU".IndexOf(char.ToUpperInvariant(word[0])) >= 0 ? "an" : "a";
}
/// <summary>
/// One declared field of an <see cref="InboundApiSchema"/> object node: the
/// field name, whether it is required, and its (recursive) type schema.
/// </summary>
/// <param name="Name">The field name as it appears in the JSON.</param>
/// <param name="Required">Whether the field must be present.</param>
/// <param name="Schema">The recursive type schema the field's value must satisfy.</param>
public sealed record InboundApiSchemaField(string Name, bool Required, InboundApiSchema Schema);
@@ -10,10 +10,24 @@ namespace ZB.MOM.WW.ScadaBridge.Communication.Actors;
/// Long-lived (one per active debug session) actor on the central side. Debug sessions
/// are session-based and temporary — this actor holds no persisted state and does not
/// derive from an Akka.Persistence base class; its state does not survive a restart.
/// Sends SubscribeDebugViewRequest to the site via CentralCommunicationActor (with THIS actor
/// as the Sender) to get the initial snapshot. After receiving the snapshot, opens a gRPC
/// server-streaming subscription via SiteStreamGrpcClient for ongoing events.
/// Stream events are marshalled back to the actor via Self.Tell for thread safety.
/// <para>
/// <b>Stream-first lifecycle (M2.18, #26).</b> To avoid losing any
/// <see cref="AttributeValueChanged"/>/<see cref="AlarmStateChanged"/> that occurs on
/// the site during the snapshot-build + network-transit window, the gRPC server-streaming
/// subscription is opened FIRST (in <see cref="PreStart"/>), alongside the
/// <c>SubscribeDebugViewRequest</c> sent to the site via CentralCommunicationActor (with
/// THIS actor as the Sender). Live events that arrive before the
/// <see cref="DebugViewSnapshot"/> is delivered are <em>buffered in arrival order</em>.
/// When the snapshot arrives it is delivered to the consumer, then the buffer is flushed
/// in order, <em>deduped</em> against the snapshot (an event whose per-entity timestamp is
/// &lt;= the snapshot's timestamp for the same entity is already reflected → dropped; a
/// strictly-newer event is delivered; an event for an entity absent from the snapshot is
/// delivered). After the flush the actor switches to pass-through: subsequent events go
/// straight to the consumer. A mid-session reconnect (after the snapshot) resumes
/// pass-through — the snapshot is a one-time thing.
/// </para>
/// Stream events are marshalled back to the actor via Self.Tell for thread safety; all
/// state (phase flag + buffer) is mutated only on the actor thread.
/// </summary>
public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
{
@@ -49,6 +63,31 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
private bool _stopped;
private CancellationTokenSource? _grpcCts;
/// <summary>
/// Phase flag (M2.18). <see langword="false"/> until the initial
/// <see cref="DebugViewSnapshot"/> has been delivered and the pre-snapshot buffer
/// flushed; <see langword="true"/> thereafter (pass-through). Mutated only on the
/// actor thread. A reconnect does NOT touch this flag — a mid-session reconnect
/// (after the snapshot) therefore stays in pass-through, and a reconnect during the
/// buffering phase (before the snapshot) stays buffering.
/// </summary>
private bool _snapshotDelivered;
/// <summary>
/// Ordered buffer of live gRPC events (<see cref="AttributeValueChanged"/>/
/// <see cref="AlarmStateChanged"/>) that arrived before the snapshot was delivered.
/// Flushed (with per-entity dedup against the snapshot) when the snapshot arrives,
/// then never used again. Mutated only on the actor thread.
/// </summary>
private readonly List<object> _preSnapshotBuffer = new();
/// <summary>
/// Defensive log threshold: if the pre-snapshot buffer grows past this many events
/// during a slow snapshot we log once (events are NOT dropped — the window is short).
/// </summary>
private const int BufferWarnThreshold = 10_000;
private bool _bufferWarned;
/// <summary>Timer scheduler for reconnect and stability window timers.</summary>
public ITimerScheduler Timers { get; set; } = null!;
@@ -85,13 +124,55 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
_grpcNodeAAddress = grpcNodeAAddress;
_grpcNodeBAddress = grpcNodeBAddress;
// Initial snapshot response from the site (via ClusterClient)
// Initial snapshot response from the site (via ClusterClient).
// M2.11: if the site reports InstanceNotFound=true the instance is not
// deployed there. M2.18: under the stream-first lifecycle the gRPC stream
// was already opened in PreStart, so the not-found path must tear it down
// (CleanupGrpc) rather than enter pass-through. Forward the snapshot (with
// InstanceNotFound=true) to _onEvent so DebugStreamService's TCS resolves and
// the caller can inspect the flag; then stop cleanly.
Receive<DebugViewSnapshot>(snapshot =>
{
_log.Info("Received initial snapshot for {0} ({1} attrs, {2} alarms)",
_instanceUniqueName, snapshot.AttributeValues.Count, snapshot.AlarmStates.Count);
if (_snapshotDelivered)
{
// Defensive: a duplicate / late snapshot after we have already moved to
// pass-through. The snapshot is a one-time thing — ignore replays so we
// never re-buffer or double-deliver.
_log.Debug("Ignoring duplicate DebugViewSnapshot for {0} (already delivered)",
_instanceUniqueName);
return;
}
if (snapshot.InstanceNotFound)
{
_log.Warning("Instance {0} is not deployed on site; terminating debug stream",
_instanceUniqueName);
// M2.18: the stream-first subscription opened in PreStart is for a
// non-deployed instance, so cancel it; any buffered gap events are
// discarded with the actor. No pass-through.
// _stopped is set AFTER CleanupGrpc() to match the ordering in the
// DebugStreamTerminated and ReceiveTimeout handlers (cosmetic consistency).
CleanupGrpc();
_stopped = true;
_preSnapshotBuffer.Clear();
_onEvent(snapshot); // resolves the snapshot TCS with InstanceNotFound=true
// Note: after Context.Stop(Self) below the actor is dead. DebugStreamService
// inspects InitialSnapshot.InstanceNotFound and calls StopStream, which sends
// a StopDebugStream message. That Tell arrives after the actor has already
// stopped, producing a benign Akka dead-letter — expected and harmless.
Context.Stop(Self);
return;
}
_log.Info("Received initial snapshot for {0} ({1} attrs, {2} alarms); flushing {3} buffered event(s)",
_instanceUniqueName, snapshot.AttributeValues.Count, snapshot.AlarmStates.Count,
_preSnapshotBuffer.Count);
// Deliver the snapshot, then flush the gap-window buffer (deduped), then
// switch to pass-through. Order matters: snapshot first, buffered events next.
_onEvent(snapshot);
OpenGrpcStream();
FlushBuffer(snapshot);
_snapshotDelivered = true;
});
// Domain events arriving via Self.Tell from gRPC callback.
@@ -99,8 +180,11 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
// flapping stream that delivers a single event between failures would
// otherwise never trip MaxRetries. The retry budget is recovered only by
// GrpcStreamStable (a stream that has stayed up for StabilityWindow).
Receive<AttributeValueChanged>(changed => _onEvent(changed));
Receive<AlarmStateChanged>(changed => _onEvent(changed));
// M2.18: before the snapshot has been delivered, BUFFER (in arrival order)
// rather than deliver — these may be gap-window events. After the snapshot has
// been flushed, pass through directly (same handler, phase-dependent behavior).
Receive<AttributeValueChanged>(changed => HandleStreamEvent(changed));
Receive<AlarmStateChanged>(changed => HandleStreamEvent(changed));
// Stream has been stably connected for StabilityWindow — recover the
// retry budget so a future transient fault gets a fresh set of retries.
@@ -155,11 +239,161 @@ public class DebugStreamBridgeActor : ReceiveActor, IWithTimers
});
}
/// <summary>
/// Handles a live gRPC stream event (<see cref="AttributeValueChanged"/> or
/// <see cref="AlarmStateChanged"/>). Before the snapshot has been delivered the
/// event is appended to the ordered pre-snapshot buffer (gap-window capture); after
/// the snapshot+flush it is passed straight through to the consumer. Always runs on
/// the actor thread (events are marshalled in via Self.Tell), so the phase flag and
/// buffer are accessed without locking.
/// </summary>
private void HandleStreamEvent(object evt)
{
if (_snapshotDelivered)
{
_onEvent(evt);
return;
}
_preSnapshotBuffer.Add(evt);
if (!_bufferWarned && _preSnapshotBuffer.Count > BufferWarnThreshold)
{
_bufferWarned = true;
_log.Warning(
"Pre-snapshot debug-event buffer for {0} exceeded {1} events while awaiting the snapshot; " +
"events are still retained (not dropped).",
_instanceUniqueName, BufferWarnThreshold);
}
}
/// <summary>
/// Flushes the pre-snapshot buffer in arrival order, deduping each event against the
/// just-delivered snapshot (M2.18).
/// <para>
/// <b>Dedup rule.</b> Identity is per-entity:
/// attributes by (InstanceUniqueName, AttributePath, AttributeName); alarms by
/// (InstanceUniqueName, AlarmName, SourceReference). For a buffered event whose entity
/// is present in the snapshot, the comparison is against that entity's snapshot
/// timestamp: a buffered timestamp &lt;= the snapshot timestamp means the event is
/// already reflected in the snapshot → DROP; a strictly-newer (&gt;) timestamp means
/// the event happened after the snapshot was built → DELIVER. The boundary is inclusive
/// on the snapshot side (equal timestamps are treated as duplicates) — the snapshot is
/// the authoritative point-in-time value, so an event at the exact same instant carries
/// no new information. A buffered event whose entity is NOT in the snapshot is a genuine
/// gap-window event → DELIVER.
/// </para>
/// </summary>
private void FlushBuffer(DebugViewSnapshot snapshot)
{
if (_preSnapshotBuffer.Count == 0) return;
// Build per-entity "as-of" timestamps from the snapshot. If (defensively) the
// snapshot lists the same entity twice, keep the newest timestamp.
var attrAsOf = new Dictionary<string, DateTimeOffset>();
foreach (var a in snapshot.AttributeValues)
{
var key = AttributeKey(a);
if (!attrAsOf.TryGetValue(key, out var existing) || a.Timestamp > existing)
attrAsOf[key] = a.Timestamp;
}
var alarmAsOf = new Dictionary<string, DateTimeOffset>();
foreach (var al in snapshot.AlarmStates)
{
var key = AlarmKey(al);
if (!alarmAsOf.TryGetValue(key, out var existing) || al.Timestamp > existing)
alarmAsOf[key] = al.Timestamp;
}
var flushed = 0;
var dropped = 0;
foreach (var evt in _preSnapshotBuffer)
{
if (IsReflectedInSnapshot(evt, attrAsOf, alarmAsOf))
{
dropped++;
continue;
}
_onEvent(evt);
flushed++;
}
if (dropped > 0 || flushed > 0)
{
_log.Debug("Flushed {0} buffered debug event(s) for {1}, dropped {2} as already-in-snapshot",
flushed, _instanceUniqueName, dropped);
}
_preSnapshotBuffer.Clear();
}
/// <summary>
/// Returns <see langword="true"/> when a buffered event is already reflected in the
/// snapshot (same entity, buffered timestamp &lt;= snapshot timestamp) and must be
/// dropped; otherwise <see langword="false"/> (deliver).
/// </summary>
private static bool IsReflectedInSnapshot(
object evt,
IReadOnlyDictionary<string, DateTimeOffset> attrAsOf,
IReadOnlyDictionary<string, DateTimeOffset> alarmAsOf)
{
switch (evt)
{
case AttributeValueChanged a:
return attrAsOf.TryGetValue(AttributeKey(a), out var attrTs) && a.Timestamp <= attrTs;
case AlarmStateChanged al:
return alarmAsOf.TryGetValue(AlarmKey(al), out var alarmTs) && al.Timestamp <= alarmTs;
default:
// Unknown buffered type (should not happen — only attr/alarm are buffered):
// never treat as a duplicate.
return false;
}
}
/// <summary>
/// Delimiter used to join identity components into a single dedup key. A NUL
/// control character cannot appear in an instance/attribute/alarm name, so
/// distinct identities never collide on a shared boundary (unlike a space, which
/// may legitimately occur within a name). Declared as an escaped char so the
/// source carries no raw NUL byte.
/// </summary>
private const char KeyDelimiter = '\u0000';
/// <summary>
/// Per-entity dedup key for an attribute change. Each nullable component is coalesced
/// with <c>?? string.Empty</c> so the null handling is explicit rather than left to
/// <see cref="string.Concat"/>: a null component and an empty one deliberately map to
/// the same key, while the NUL delimiter keeps distinct components from running together.
/// </summary>
private static string AttributeKey(AttributeValueChanged a) =>
string.Concat(
a.InstanceUniqueName ?? string.Empty, KeyDelimiter,
a.AttributePath ?? string.Empty, KeyDelimiter,
a.AttributeName ?? string.Empty);
/// <summary>
/// Per-entity dedup key for an alarm change. Includes <see cref="AlarmStateChanged.SourceReference"/>
/// so native per-condition alarms (which share an AlarmName but differ by source
/// reference) are not conflated; empty for computed alarms. Each nullable component is
/// coalesced with <c>?? string.Empty</c> so a null is handled explicitly as an empty component.
/// </summary>
private static string AlarmKey(AlarmStateChanged al) =>
string.Concat(
al.InstanceUniqueName ?? string.Empty, KeyDelimiter,
al.AlarmName ?? string.Empty, KeyDelimiter,
al.SourceReference ?? string.Empty);
/// <inheritdoc />
protected override void PreStart()
{
_log.Info("Starting debug stream bridge for {0} on site {1}", _instanceUniqueName, _siteIdentifier);
// M2.18 stream-first: open the gRPC live-event subscription BEFORE (and
// alongside) requesting the snapshot, so events occurring during the
// snapshot-build + network-transit window are captured (buffered) and not lost.
OpenGrpcStream();
// Send subscribe request via CentralCommunicationActor for the initial snapshot.
var request = new SubscribeDebugViewRequest(_instanceUniqueName, _correlationId);
var envelope = new SiteEnvelope(_siteIdentifier, request);
@@ -178,6 +178,11 @@ public class TemplateScriptConfiguration : IEntityTypeConfiguration<TemplateScri
builder.Property(s => s.ReturnDefinition)
.HasMaxLength(4000);
// M2.5 (#9): nullable per-script execution timeout (seconds). Null = use
// the site's global ScriptExecutionTimeoutSeconds default.
builder.Property(s => s.ExecutionTimeoutSeconds)
.IsRequired(false);
builder.HasIndex(s => new { s.TemplateId, s.Name }).IsUnique();
}
}
@@ -0,0 +1,28 @@
using Microsoft.EntityFrameworkCore.Migrations;
#nullable disable
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
{
/// <inheritdoc />
public partial class ResyncLdapGroupMappingSeed : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.InsertData(
table: "LdapGroupMappings",
columns: new[] { "Id", "LdapGroupName", "Role" },
values: new object[] { 5, "SCADA-Viewers", "Viewer" });
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DeleteData(
table: "LdapGroupMappings",
keyColumn: "Id",
keyValue: 5);
}
}
}
@@ -0,0 +1,28 @@
using Microsoft.EntityFrameworkCore.Migrations;
#nullable disable
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
{
/// <inheritdoc />
public partial class AddTemplateScriptExecutionTimeout : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.AddColumn<int>(
name: "ExecutionTimeoutSeconds",
table: "TemplateScripts",
type: "int",
nullable: true);
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DropColumn(
name: "ExecutionTimeoutSeconds",
table: "TemplateScripts");
}
}
}
@@ -925,6 +925,12 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
Id = 4,
LdapGroupName = "SCADA-Deploy-SiteA",
Role = "Deployer"
},
new
{
Id = 5,
LdapGroupName = "SCADA-Viewers",
Role = "Viewer"
});
});
@@ -1307,6 +1313,9 @@ namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Migrations
.IsRequired()
.HasColumnType("nvarchar(max)");
b.Property<int?>("ExecutionTimeoutSeconds")
.HasColumnType("int");
b.Property<bool>("IsInherited")
.HasColumnType("bit");
@@ -99,8 +99,14 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
// routed to subscribers (NativeAlarmActors) by source-object reference.
/// <summary>sourceReference → set of subscriber actor refs (NativeAlarmActors), for routing + ref-count.</summary>
private readonly Dictionary<string, HashSet<IActorRef>> _alarmSourceSubscribers = new();
/// <summary>sourceReference → optional condition filter (first subscriber wins).</summary>
/// <summary>sourceReference → raw condition filter string passed to the adapter (first subscriber wins).</summary>
private readonly Dictionary<string, string?> _alarmSourceFilter = new();
/// <summary>
/// sourceReference → parsed condition-type predicate (M2.4 / #8). The authoritative
/// client-side gate in <see cref="HandleAlarmTransitionReceived"/>; applies uniformly
/// across OPC UA and the gateway-wide MxGateway feed.
/// </summary>
private readonly Dictionary<string, AlarmConditionFilter> _alarmSourceFilterPredicate = new();
/// <summary>sourceReference → adapter alarm subscription id.</summary>
private readonly Dictionary<string, string> _alarmSubscriptionIds = new();
/// <summary>sourceReferences whose adapter SubscribeAlarmsAsync is currently in flight.</summary>
@@ -1480,6 +1486,9 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
}
subs.Add(subscriber);
_alarmSourceFilter[request.SourceReference] = request.ConditionFilter;
// Parse the type-name filter once; this is the authoritative client-side
// gate consulted on every routed transition (M2.4 / #8).
_alarmSourceFilterPredicate[request.SourceReference] = AlarmConditionFilter.Parse(request.ConditionFilter);
// If the adapter feed for this source is already (being) established, the
// existing subscription serves the new subscriber too.
@@ -1546,6 +1555,14 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
if (!match)
continue;
// M2.4 (#8): authoritative client-side condition-type gate. Applied
// per matched source because two sources may share a prefix yet carry
// different filters. Empty filter = allow all (historical behaviour);
// framing sentinels (SnapshotComplete) are never dropped.
if (_alarmSourceFilterPredicate.TryGetValue(sourceRef, out var predicate) &&
!predicate.IsAllowed(transition))
continue;
foreach (var sub in subs)
{
if (notified.Add(sub))
@@ -1566,6 +1583,7 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers
// No subscribers remain for this source — tear down the adapter feed.
_alarmSourceSubscribers.Remove(request.SourceReference);
_alarmSourceFilter.Remove(request.SourceReference);
_alarmSourceFilterPredicate.Remove(request.SourceReference);
if (_alarmSubscriptionIds.Remove(request.SourceReference, out var subId) &&
_adapter is IAlarmSubscribableConnection alarmable)
{
@@ -1,3 +1,5 @@
using System.Globalization;
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ProtoConditionState = ZB.MOM.WW.MxGateway.Contracts.Proto.AlarmConditionState;
using ProtoTransitionKind = ZB.MOM.WW.MxGateway.Contracts.Proto.AlarmTransitionKind;
@@ -67,6 +69,19 @@ public static class MxGatewayAlarmMapper
Shelve: AlarmShelveState.Unshelved, Suppressed: false, Severity: NormalizeSeverity(severity));
}
/// <summary>
/// Converts an <see cref="MxValue"/> union to a display-only string using
/// <see cref="MxValueExtensions.ToClrValue"/> and invariant culture formatting,
/// so numeric values always use '.' as the decimal separator. Null or unset
/// values produce an empty string.
/// </summary>
internal static string MxValueToString(MxValue? mxVal)
{
if (mxVal is null) return "";
var clr = mxVal.ToClrValue();
return clr is null ? "" : Convert.ToString(clr, CultureInfo.InvariantCulture) ?? "";
}
/// <summary>Maps a live <see cref="OnAlarmTransitionEvent"/> to a transition.</summary>
/// <param name="body">The gateway alarm transition event proto message to map.</param>
/// <returns>The protocol-neutral <see cref="NativeAlarmTransition"/>.</returns>
@@ -83,8 +98,8 @@ public static class MxGatewayAlarmMapper
OperatorComment: body.OperatorComment,
OriginalRaiseTime: body.OriginalRaiseTimestamp?.ToDateTimeOffset(),
TransitionTime: body.TransitionTimestamp?.ToDateTimeOffset() ?? DateTimeOffset.UtcNow,
CurrentValue: "",
LimitValue: "");
CurrentValue: MxValueToString(body.CurrentValue),
LimitValue: MxValueToString(body.LimitValue));
/// <summary>The end-of-snapshot sentinel transition (no condition payload).</summary>
/// <returns>A <see cref="NativeAlarmTransition"/> with <c>AlarmTransitionKind.SnapshotComplete</c>.</returns>
@@ -109,6 +124,6 @@ public static class MxGatewayAlarmMapper
OperatorComment: snapshot.OperatorComment,
OriginalRaiseTime: snapshot.OriginalRaiseTimestamp?.ToDateTimeOffset(),
TransitionTime: snapshot.LastTransitionTimestamp?.ToDateTimeOffset() ?? DateTimeOffset.UtcNow,
CurrentValue: "",
LimitValue: "");
CurrentValue: MxValueToString(snapshot.CurrentValue),
LimitValue: MxValueToString(snapshot.LimitValue));
}
@@ -163,7 +163,11 @@ public class MxGatewayDataConnection : IDataConnection, IBrowsableDataConnection
_alarmCts = new CancellationTokenSource();
var token = _alarmCts.Token;
var client = _client!;
// Gateway-wide feed (null prefix); the actor filters per source reference.
// Gateway-wide feed (null prefix). The MxGateway has no server-side
// condition filter, so conditionFilter is intentionally NOT forwarded
// here: the DataConnectionActor applies it as the authoritative
// client-side gate per source reference AND per condition type
// (M2.4 / #8 — AlarmConditionFilter), uniform with the OPC UA path.
_ = Task.Run(() => client.RunAlarmStreamAsync(null, t => callback(t), token), token);
}
}
@@ -65,4 +65,40 @@ public static class OpcUaAlarmMapper
null or "Unshelved" => AlarmShelveState.Unshelved,
_ => AlarmShelveState.OneShotShelved
};
/// <summary>
/// Picks a representative display-only limit value from the four standard
/// <c>LimitAlarmType</c> set-point fields (HighHighLimit, HighLimit, LowLimit,
/// LowLowLimit) returned by the OPC UA event SelectClause.
///
/// <para>
/// The fields are absent (null raw value) on non-limit alarm types (discrete,
/// off-normal, etc.). When present, the first non-null value is returned in
/// priority order: HighHigh → High → Low → LowLow. The caller may use
/// <c>AlarmTypeName</c> or <c>ConditionName</c> to determine which specific
/// limit is active; this method intentionally returns the coarsest useful value
/// for the common single-limit case without requiring callers to understand the
/// OPC UA limit hierarchy.
/// </para>
/// </summary>
/// <param name="highHighRaw">Raw HighHighLimit field value (null when absent).</param>
/// <param name="highRaw">Raw HighLimit field value (null when absent).</param>
/// <param name="lowRaw">Raw LowLimit field value (null when absent).</param>
/// <param name="lowLowRaw">Raw LowLowLimit field value (null when absent).</param>
/// <returns>
/// A formatted string representation of the first non-null limit value, or an
/// empty string when all four fields are absent (non-limit alarm type).
/// </returns>
public static string PickLimitValue(object? highHighRaw, object? highRaw, object? lowRaw, object? lowLowRaw)
{
// Standard OPC UA LimitAlarmType limit values are numeric (Double/Float/Int).
// Convert with InvariantCulture so the decimal separator is always '.' regardless
// of the server's locale.
foreach (var raw in new[] { highHighRaw, highRaw, lowRaw, lowLowRaw })
{
if (raw is not null)
return Convert.ToString(raw, System.Globalization.CultureInfo.InvariantCulture) ?? "";
}
return "";
}
}
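The priority order documented for `PickLimitValue` can be seen in a small standalone sketch. `LimitValueSketch` is a hypothetical stand-in that mirrors the logic in the diff so the example compiles without the ScadaBridge assemblies; it is not the project's own class.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in mirroring OpcUaAlarmMapper.PickLimitValue from the diff above.
public static class LimitValueSketch
{
    // Returns the first non-null limit in priority order HighHigh -> High -> Low -> LowLow,
    // formatted with the invariant culture so '.' is always the decimal separator.
    public static string PickLimitValue(object? highHigh, object? high, object? low, object? lowLow)
    {
        foreach (var raw in new[] { highHigh, high, low, lowLow })
        {
            if (raw is not null)
                return Convert.ToString(raw, CultureInfo.InvariantCulture) ?? "";
        }

        // All four fields absent: a non-limit alarm type (discrete, off-normal, ...).
        return "";
    }
}
```

With these assumptions, an event carrying only `HighLimit = 85.5` and `LowLimit = 10.0` yields `"85.5"`, because HighLimit precedes LowLimit in the priority order, and an event with no limit fields yields the empty string.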
@@ -258,7 +258,9 @@ public class RealOpcUaClient : IOpcUaClient
MonitoringMode = MonitoringMode.Reporting,
SamplingInterval = 0,
QueueSize = 1000,
Filter = BuildAlarmEventFilter()
// Server-side WhereClause is a bandwidth optimisation only — the
// authoritative condition-type gate lives in DataConnectionActor (M2.4 / #8).
Filter = BuildAlarmEventFilter(AlarmConditionFilter.Parse(conditionFilter))
};
item.Notification += (_, e) =>
@@ -289,10 +291,94 @@ public class RealOpcUaClient : IOpcUaClient
}
/// <summary>
/// Builds the event filter selecting the base event fields plus the
/// AlarmConditionType / AcknowledgeableConditionType state sub-variables we mirror.
/// Maps the standard OPC UA Alarms &amp; Conditions type names (case-insensitive)
/// to their well-known <see cref="ObjectTypeIds"/> NodeIds, for building the
/// optional server-side WhereClause (M2.4 / #8). Only standard types appear
/// here; vendor/custom type names cannot be mapped without browsing the server
/// type tree, so they are handled by the client-side gate alone.
/// <para>
/// Single source of truth for both directions: <see cref="ConditionTypeNamesById"/>
/// is derived from this map, so the friendly-name and NodeId sides cannot drift.
/// </para>
/// </summary>
private static EventFilter BuildAlarmEventFilter()
internal static readonly IReadOnlyDictionary<string, NodeId> KnownConditionTypeIds =
new Dictionary<string, NodeId>(StringComparer.OrdinalIgnoreCase)
{
["ConditionType"] = ObjectTypeIds.ConditionType,
["AcknowledgeableConditionType"] = ObjectTypeIds.AcknowledgeableConditionType,
["AlarmConditionType"] = ObjectTypeIds.AlarmConditionType,
["LimitAlarmType"] = ObjectTypeIds.LimitAlarmType,
["ExclusiveLimitAlarmType"] = ObjectTypeIds.ExclusiveLimitAlarmType,
["NonExclusiveLimitAlarmType"] = ObjectTypeIds.NonExclusiveLimitAlarmType,
["ExclusiveLevelAlarmType"] = ObjectTypeIds.ExclusiveLevelAlarmType,
["NonExclusiveLevelAlarmType"] = ObjectTypeIds.NonExclusiveLevelAlarmType,
["ExclusiveDeviationAlarmType"] = ObjectTypeIds.ExclusiveDeviationAlarmType,
["NonExclusiveDeviationAlarmType"] = ObjectTypeIds.NonExclusiveDeviationAlarmType,
["ExclusiveRateOfChangeAlarmType"] = ObjectTypeIds.ExclusiveRateOfChangeAlarmType,
["NonExclusiveRateOfChangeAlarmType"] = ObjectTypeIds.NonExclusiveRateOfChangeAlarmType,
["DiscreteAlarmType"] = ObjectTypeIds.DiscreteAlarmType,
["OffNormalAlarmType"] = ObjectTypeIds.OffNormalAlarmType,
["SystemOffNormalAlarmType"] = ObjectTypeIds.SystemOffNormalAlarmType,
["TripAlarmType"] = ObjectTypeIds.TripAlarmType,
["DiscrepancyAlarmType"] = ObjectTypeIds.DiscrepancyAlarmType,
["InstrumentDiagnosticAlarmType"] = ObjectTypeIds.InstrumentDiagnosticAlarmType,
["SystemDiagnosticAlarmType"] = ObjectTypeIds.SystemDiagnosticAlarmType,
["CertificateExpirationAlarmType"] = ObjectTypeIds.CertificateExpirationAlarmType,
};
/// <summary>
/// Inverse of <see cref="KnownConditionTypeIds"/> (NodeId → friendly name), derived
/// from it so the two cannot drift (M2.4 / #8). Used by <see cref="ResolveAlarmTypeName"/>
/// to translate the event-type NodeId an OPC UA server sends back into the friendly
/// type name the conditionFilter gate and server-side WhereClause both key off.
/// </summary>
private static readonly IReadOnlyDictionary<NodeId, string> ConditionTypeNamesById =
KnownConditionTypeIds.ToDictionary(kv => kv.Value, kv => kv.Key);
/// <summary>
/// Resolves an event-type <see cref="NodeId"/> to the friendly condition-type name the
/// <c>conditionFilter</c> gate (and the server-side WhereClause) use (M2.4 / #8).
///
/// <para>
/// Standard A&amp;C types are returned as their friendly name (e.g. <c>i=9341</c> →
/// <c>"ExclusiveLevelAlarmType"</c>) so the client-side gate — which compares against
/// the friendly names in <see cref="KnownConditionTypeIds"/> — actually matches the
/// events the server delivers. Vendor/custom subtypes that are not in the map fall back
/// to the NodeId string; that is consistent because the WhereClause is likewise omitted
/// for unmapped names, so such a filter can only be expressed (and matched) as the NodeId
/// string. A <c>null</c> event type yields the empty string.
/// </para>
/// </summary>
/// <param name="eventType">The event-type NodeId from the A&amp;C notification, or <c>null</c>.</param>
/// <returns>The friendly type name when known; otherwise the NodeId string (or "" when null).</returns>
internal static string ResolveAlarmTypeName(NodeId? eventType)
{
if (eventType is null)
return "";
return ConditionTypeNamesById.TryGetValue(eventType, out var friendly)
? friendly
: eventType.ToString();
}
/// <summary>
/// Builds the event filter selecting the base event fields plus the
/// AlarmConditionType / AcknowledgeableConditionType state sub-variables we mirror,
/// and — when <paramref name="conditionFilter"/> is non-empty and every requested
/// type maps to a standard A&amp;C type — a server-side <see cref="ContentFilter"/>
/// WhereClause (OfType, OR'd) as a bandwidth optimisation (M2.4 / #8).
///
/// <para>
/// Conservative by design: if <em>any</em> requested type name cannot be mapped to
/// a standard <see cref="ObjectTypeIds"/> NodeId, the WhereClause is omitted entirely
/// rather than partially applied — a partial server-side filter would silently drop
/// the unmapped types' events, and the server cannot send what it filtered out. The
/// client-side gate in DataConnectionActor enforces the full filter regardless, so
/// omitting the WhereClause only forgoes the bandwidth saving, never correctness.
/// </para>
/// </summary>
/// <param name="conditionFilter">The parsed condition-type filter (allow-all when empty).</param>
/// <returns>The configured <see cref="EventFilter"/>.</returns>
internal static EventFilter BuildAlarmEventFilter(AlarmConditionFilter conditionFilter)
{
var filter = new EventFilter();
foreach (var name in AlarmStateFields)
@@ -306,9 +392,81 @@ public class RealOpcUaClient : IOpcUaClient
filter.SelectClauses.Add(SelectField(ObjectTypeIds.AlarmConditionType, "ShelvingState", "CurrentState"));// 10
filter.SelectClauses.Add(SelectField(ObjectTypeIds.ConditionType, "ConditionName")); // 11
filter.SelectClauses.Add(SelectField(ObjectTypeIds.ConditionType, "Comment")); // 12
// APPENDED fields (indices 13+): optional — only present on specific derived types.
// Guard all reads with fields.Count > N so base-ConditionType events still process.
// 13: AlarmConditionType/ActiveState/TransitionTime — the UTC instant the active-state
// last flipped to TRUE. Mapped to OriginalRaiseTime; absent on non-AlarmCondition
// events (ConditionType base events rarely carry it). CAVEAT: during a
// ConditionRefresh replay the server MAY re-stamp this to the current/restart time
// rather than the historical raise instant (OPC UA Part 9 §5.5.2 makes it advisory),
// so a snapshot-derived OriginalRaiseTime can look like the refresh time — it is
// display-only and not treated as authoritative.
filter.SelectClauses.Add(SelectField(ObjectTypeIds.AlarmConditionType, "ActiveState", "TransitionTime")); // 13
// 14-17: LimitAlarmType limit thresholds — configuration-time set-points exposed as
// event fields by LimitAlarmType and all its subtypes (Exclusive/NonExclusive
// Level/Deviation/RateOfChange). Absent on non-limit alarm types (e.g. discrete,
// off-normal) — guarded by fields.Count > N below.
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "HighHighLimit")); // 14
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "HighLimit")); // 15
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "LowLimit")); // 16
filter.SelectClauses.Add(SelectField(ObjectTypeIds.LimitAlarmType, "LowLowLimit")); // 17
// UNAVAILABLE via standard OPC UA A&C event fields (documented here so future
// maintainers know these were considered, not overlooked):
// Category — not a standard event field; server-specific extensions only.
// Description — NativeAlarmTransition.Description is a static template description;
// OPC UA events carry dynamic Message text (index 4, mapped) but no
// static template description in the notification, so this stays empty.
// OperatorUser — not available on the standard ConditionRefresh replay stream;
// present on Acknowledge/Confirm method call results, but those do
// not flow through the monitored-item subscription.
// CurrentValue — the live process variable value is NOT a standard A&C event field;
// it would require a separate data subscription on the source node.
ApplyServerSideTypeWhereClause(filter, conditionFilter);
return filter;
}
/// <summary>
/// Attaches an OfType(-OR'd) WhereClause to <paramref name="filter"/> when every
/// requested condition type maps to a standard A&amp;C type NodeId; otherwise leaves
/// the WhereClause empty (see <see cref="BuildAlarmEventFilter"/> rationale).
/// </summary>
private static void ApplyServerSideTypeWhereClause(EventFilter filter, AlarmConditionFilter conditionFilter)
{
if (conditionFilter.IsEmpty)
return;
var typeIds = new List<NodeId>();
foreach (var name in conditionFilter.Names)
{
if (!KnownConditionTypeIds.TryGetValue(name, out var id))
return; // unmapped type → omit the WhereClause entirely (client gate covers it)
typeIds.Add(id);
}
if (typeIds.Count == 0)
return;
var where = filter.WhereClause;
if (typeIds.Count == 1)
{
where.Push(FilterOperator.OfType, typeIds[0]);
return;
}
// OR together each OfType element so an event of ANY listed type passes.
var element = where.Push(FilterOperator.OfType, typeIds[0]);
for (var i = 1; i < typeIds.Count; i++)
{
var next = where.Push(FilterOperator.OfType, typeIds[i]);
element = where.Push(FilterOperator.Or, element, next);
}
}
private static SimpleAttributeOperand SelectField(NodeId typeDefinitionId, params string[] browse)
{
var path = new QualifiedNameCollection();
@@ -359,7 +517,12 @@ public class RealOpcUaClient : IOpcUaClient
return;
}
var sourceName = fields[1].Value is NodeId ? (fields[2].Value as string ?? "") : (fields[2].Value as string ?? "");
// Field layout (AlarmStateFields): [1]=SourceNode (NodeId), [2]=SourceName (string).
// Prefer the human-readable SourceName; fall back to the SourceNode NodeId string
// only when SourceName is absent/empty, so the condition still has a stable key.
var sourceName = fields[2].Value as string;
if (string.IsNullOrEmpty(sourceName))
sourceName = (fields[1].Value as NodeId)?.ToString() ?? "";
var conditionName = fields.Count > 11 ? fields[11].Value as string : null;
var sourceObjectRef = sourceName;
var sourceRef = string.IsNullOrEmpty(conditionName) ? sourceName : $"{sourceName}.{conditionName}";
@@ -377,6 +540,25 @@ public class RealOpcUaClient : IOpcUaClient
var shelve = OpcUaAlarmMapper.MapShelve(fields.Count > 10 ? (fields[10].Value as LocalizedText)?.Text : null);
var comment = fields.Count > 12 ? (fields[12].Value as LocalizedText)?.Text ?? "" : "";
// Index 13: ActiveState/TransitionTime → OriginalRaiseTime (when active-state last
// transitioned to TRUE). Absent on non-AlarmCondition events → guard + null fallback.
DateTimeOffset? originalRaiseTime = null;
if (fields.Count > 13 && fields[13].Value is DateTime activeTransitionTime)
// OPC UA mandates UTC for DateTime fields; a TimeSpan.Zero offset treats an
// Unspecified Kind as UTC (consistent with the Time→TransitionTime mapping above).
originalRaiseTime = new DateTimeOffset(activeTransitionTime, TimeSpan.Zero);
// Indices 14-17: LimitAlarmType set-point thresholds (HighHighLimit/HighLimit/
// LowLimit/LowLowLimit). Absent on non-limit alarm types → null when missing.
// Pick the first non-null value in priority order (HiHi > Hi > Lo > LoLo) as a
// display-only representative limit; the caller is responsible for interpreting
// which limit is active using AlarmTypeName or ConditionName.
var limitValue = OpcUaAlarmMapper.PickLimitValue(
fields.Count > 14 ? fields[14].Value : null,
fields.Count > 15 ? fields[15].Value : null,
fields.Count > 16 ? fields[16].Value : null,
fields.Count > 17 ? fields[17].Value : null);
var inRefresh = _alarmInRefresh.GetValueOrDefault(handle);
var lastState = _alarmLastState.GetValueOrDefault(handle);
var (prevActive, prevAcked) = lastState != null && lastState.TryGetValue(sourceRef, out var prev) ? prev : (false, true);
@@ -389,18 +571,23 @@ public class RealOpcUaClient : IOpcUaClient
onTransition(new NativeAlarmTransition(
SourceReference: sourceRef,
SourceObjectReference: sourceObjectRef,
AlarmTypeName: eventType?.ToString() ?? "",
// Resolve the event-type NodeId (e.g. "i=9341") to the friendly type name
// the conditionFilter gate keys off (M2.4 / #8); NodeId-string for custom types.
AlarmTypeName: ResolveAlarmTypeName(eventType),
Kind: kind,
Condition: OpcUaAlarmMapper.BuildCondition(active, acked, confirmed, shelve, suppressed, severity),
// UNAVAILABLE via standard OPC UA A&C event fields — see BuildAlarmEventFilter comments.
Category: "",
Description: "",
Message: message,
// UNAVAILABLE: OperatorUser not on refresh stream — see BuildAlarmEventFilter comments.
OperatorUser: "",
OperatorComment: comment,
OriginalRaiseTime: null,
OriginalRaiseTime: originalRaiseTime,
TransitionTime: time,
// UNAVAILABLE: CurrentValue not a standard A&C event field — see BuildAlarmEventFilter.
CurrentValue: "",
LimitValue: ""));
LimitValue: limitValue));
}
private static NativeAlarmTransition SnapshotComplete() => new(
@@ -0,0 +1,78 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.Alarms;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
/// <summary>
/// Parsed native-alarm condition filter (M2.4 / #8).
///
/// <para>
/// A source's <c>conditionFilter</c> is a comma-separated, case-insensitive list
/// of alarm/condition <em>type names</em>, matched against
/// <see cref="NativeAlarmTransition.AlarmTypeName"/>. A <c>null</c>, blank, or
/// all-empty list means "mirror every condition" (the historical behaviour),
/// represented here by <see cref="IsEmpty"/>.
/// </para>
///
/// <para>
/// This is the authoritative <em>client-side</em> gate consulted in the
/// <c>DataConnectionActor</c> routing path, so it applies uniformly across OPC UA
/// (whose server-side <c>WhereClause</c> is only a bandwidth optimisation) and the
/// MxGateway (whose single gateway-wide feed has no server-side filter at all).
/// Parse once at subscribe time; <see cref="IsAllowed"/> is the hot-path check.
/// </para>
/// </summary>
public sealed class AlarmConditionFilter
{
/// <summary>The shared allow-all instance (empty filter set).</summary>
public static readonly AlarmConditionFilter AllowAll = new(new HashSet<string>(StringComparer.OrdinalIgnoreCase));
private readonly HashSet<string> _names;
private AlarmConditionFilter(HashSet<string> names) => _names = names;
/// <summary><c>true</c> when no type names are configured — every condition is allowed.</summary>
public bool IsEmpty => _names.Count == 0;
/// <summary>The normalized (trimmed) type names, for the OPC UA server-side WhereClause optimisation.</summary>
public IReadOnlyCollection<string> Names => _names;
/// <summary>
/// Parses a raw <c>conditionFilter</c> string into a normalized, case-insensitive
/// type-name set. <c>null</c>/blank/all-empty input yields an empty (allow-all) filter.
/// </summary>
/// <param name="conditionFilter">The raw comma-separated filter string, or <c>null</c>.</param>
/// <returns>A parsed <see cref="AlarmConditionFilter"/>; never <c>null</c>.</returns>
public static AlarmConditionFilter Parse(string? conditionFilter)
{
if (string.IsNullOrWhiteSpace(conditionFilter))
return AllowAll;
var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (var raw in conditionFilter.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
names.Add(raw);
return names.Count == 0 ? AllowAll : new AlarmConditionFilter(names);
}
/// <summary>
/// Returns <c>true</c> when <paramref name="transition"/> should be delivered:
/// the filter is empty (allow all), the transition is a framing sentinel
/// (<see cref="AlarmTransitionKind.SnapshotComplete"/>, which carries no condition
/// type and must never be swallowed or the snapshot swap never completes), or its
/// <see cref="NativeAlarmTransition.AlarmTypeName"/> is in the configured set.
/// </summary>
/// <param name="transition">The protocol-neutral transition to test.</param>
/// <returns><c>true</c> to deliver the transition; <c>false</c> to drop it.</returns>
public bool IsAllowed(NativeAlarmTransition transition)
{
if (_names.Count == 0)
return true;
// SnapshotComplete is pure framing (no condition payload) — never filter it.
if (transition.Kind == AlarmTransitionKind.SnapshotComplete)
return true;
return _names.Contains(transition.AlarmTypeName);
}
}
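A minimal sketch of the parse-and-gate semantics described above, assuming nothing beyond the BCL. `ConditionFilterSketch` and its `isSnapshotComplete` flag are hypothetical simplifications of `AlarmConditionFilter` and `NativeAlarmTransition.Kind`, used only so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplification of AlarmConditionFilter for illustration only.
public static class ConditionFilterSketch
{
    // Comma-separated, trimmed, case-insensitive type names; null/blank/all-empty -> empty set (allow all).
    public static HashSet<string> ParseNames(string? conditionFilter)
    {
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrWhiteSpace(conditionFilter))
            return names;

        foreach (var raw in conditionFilter.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
            names.Add(raw);
        return names;
    }

    // Deliver when the filter is empty, the transition is the SnapshotComplete sentinel,
    // or the alarm type name is in the configured set.
    public static bool IsAllowed(HashSet<string> names, string alarmTypeName, bool isSnapshotComplete) =>
        names.Count == 0 || isSnapshotComplete || names.Contains(alarmTypeName);
}
```

Under these assumptions, the filter string `" LimitAlarmType , ,tripalarmtype "` parses to two names, admits a `TripAlarmType` transition regardless of casing, drops a `DiscreteAlarmType` transition, and still passes the snapshot sentinel.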
@@ -19,6 +19,13 @@
<PackageReference Include="ZB.MOM.WW.MxGateway.Client" />
</ItemGroup>
<ItemGroup>
<!-- Exposes internal alarm-filter shaping (RealOpcUaClient.BuildAlarmEventFilter)
to the test assembly so the server-side WhereClause can be unit-tested
without a live OPC UA server (M2.4 / #8). -->
<InternalsVisibleTo Include="ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests" />
</ItemGroup>
<ItemGroup>
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.Commons/ZB.MOM.WW.ScadaBridge.Commons.csproj" />
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.HealthMonitoring/ZB.MOM.WW.ScadaBridge.HealthMonitoring.csproj" />
@@ -1,4 +1,5 @@
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Protocol;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
@@ -111,8 +112,41 @@ public class FlatteningPipeline : IFlatteningPipeline
ReturnDefinition = s.ReturnDefinition
}).ToList();
// Validate
var validation = _validationService.Validate(config, resolvedSharedScripts);
// Compute the alarm-capable connection-name set so the semantic validator
// can gate native-alarm-source bindings. "Alarm-capable" matches the DCL
// runtime decision (DataConnectionActor: _adapter is IAlarmSubscribableConnection);
// here we filter connections by alarm-capable protocol, then collect their names.
//
// StringComparer.Ordinal is intentional: connection names are stored and
// matched as authored throughout the pipeline (all other name-keyed
// dictionaries in FlatteningService and SemanticValidator use the same
// case-sensitive semantics). OrdinalIgnoreCase would be inconsistent with
// the rest of the binding-resolution path.
var alarmCapableConnectionNames = dataConnections.Values
.Where(c => AlarmCapableProtocols.IsAlarmCapable(c.Protocol))
.Select(c => c.Name)
.ToHashSet(StringComparer.Ordinal);
// M2.8 (#23): the set of data-connection names that actually exist on the
// target site, used to verify each bound connection resolves to a real site
// connection. Same StringComparer.Ordinal as the rest of the binding-resolution
// path (connection names are matched as-authored throughout the pipeline).
var siteConnectionNames = dataConnections.Values
.Select(c => c.Name)
.ToHashSet(StringComparer.Ordinal);
// Validate. This is the deploy-gating path, so connection-binding completeness
// is enforced as an Error (enforceConnectionBindings: true): a data-sourced
// attribute with no binding — or one bound to a connection that no longer exists
// on the site — blocks the deployment. (The template DESIGN-TIME validate path in
// ManagementActor leaves this non-blocking by NOT enforcing, since bindings are
// set later at instance/deploy time.)
var validation = _validationService.Validate(
config,
resolvedSharedScripts,
alarmCapableConnectionNames,
enforceConnectionBindings: true,
siteConnectionNames: siteConnectionNames);
// Compute revision hash
var hash = _revisionHashService.ComputeHash(config);
@@ -37,6 +37,14 @@ public static class StateTransitionValidator
/// <summary>Returns true when a delete operation is allowed from the given state.</summary>
/// <param name="currentState">The current instance state.</param>
/// <returns><see langword="true"/> if delete is permitted; otherwise <see langword="false"/>.</returns>
/// <remarks>
/// Delete is allowed from <see cref="InstanceState.NotDeployed"/> by design: an
/// undeployed instance would otherwise linger as an unremovable orphan record.
/// Delete from <c>NotDeployed</c> is a central-side record cleanup (no live site
/// config to tear down). This matches the state-transition matrix in
/// Component-DeploymentManager.md ("Delete from Not deployed = Yes") — reconciled
/// in M2.17 (#31); the deliberate behaviour was introduced in commit 1d5465f3.
/// </remarks>
public static bool CanDelete(InstanceState currentState) =>
currentState is InstanceState.NotDeployed or InstanceState.Enabled or InstanceState.Disabled;
@@ -75,7 +75,7 @@ public class DatabaseGateway : IDatabaseGateway
new SqlConnection(connectionString);
/// <inheritdoc />
public async Task CachedWriteAsync(
public async Task<ExternalCallResult> CachedWriteAsync(
string connectionName,
string sql,
IReadOnlyDictionary<string, object?>? parameters = null,
@@ -97,6 +97,44 @@ public class DatabaseGateway : IDatabaseGateway
throw new InvalidOperationException("Store-and-forward service not available for cached writes");
}
// M2.3 (#7): attempt the write IMMEDIATELY and classify the outcome,
// mirroring ExternalSystemClient.CachedCallAsync. The pre-M2.3 behaviour
// enqueued every write unconditionally and the S&F retry sweep then
// retried ALL failures forever — a permanent SQL error (constraint,
// syntax, permission) was never returned to the script and spun in the
// buffer indefinitely. Now:
// * success -> Delivered, NOT buffered;
// * PermanentDatabaseException -> Failed synchronously, NOT buffered;
// * TransientDatabaseException -> buffered to S&F for retry.
try
{
await ExecuteWriteAsync(
connectionName, definition.ConnectionString, sql, parameters ?? EmptyParameters, cancellationToken)
.ConfigureAwait(false);
// Immediate success — the write is done; do not buffer.
return new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null, WasBuffered: false);
}
catch (PermanentDatabaseException ex)
{
// Permanent failures are returned to the script and never buffered —
// mirrors the PermanentExternalSystemException branch on the API path.
_logger.LogWarning(
ex,
"CachedWrite to '{Connection}' failed permanently (SQL error {Number}); returning Failed without buffering.",
connectionName, ex.SqlErrorNumber);
return new ExternalCallResult(
Success: false, ResponseJson: null, ErrorMessage: $"Permanent database error: {ex.Message}", WasBuffered: false);
}
catch (TransientDatabaseException ex)
{
// Transient failure — hand to S&F so the retry sweep delivers it.
_logger.LogDebug(
ex,
"CachedWrite to '{Connection}' failed transiently (SQL error {Number}); buffering for retry.",
connectionName, ex.SqlErrorNumber);
}
var payload = JsonSerializer.Serialize(new
{
ConnectionName = connectionName,
@@ -119,6 +157,12 @@ public class DatabaseGateway : IDatabaseGateway
originInstanceName,
definition.MaxRetries > 0 ? definition.MaxRetries : null,
definition.RetryDelay > TimeSpan.Zero ? definition.RetryDelay : null,
// M2.3 (#7): attemptImmediateDelivery: false — this method already
// made the write attempt above (the transient-classified failure is
// exactly why we are buffering). Letting EnqueueAsync re-invoke the
// delivery handler would execute the same write a second time —
// mirrors ExternalSystemClient.CachedCallAsync.
attemptImmediateDelivery: false,
// Audit Log #23 (M3): pin the S&F message id to the
// TrackedOperationId so the retry loop (Bundle E Tasks E4/E5) can
// read it back via StoreAndForwardMessage.Id and emit per-attempt +
@@ -136,17 +180,29 @@ public class DatabaseGateway : IDatabaseGateway
// retry-loop cached-write audit rows correlate back to the
// cross-execution chain. Null for a non-routed run.
parentExecutionId: parentExecutionId);
// Buffered for retry — mirrors the API path's WasBuffered=true result.
return new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null, WasBuffered: true);
}
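The three outcomes listed in the M2.3 comment (delivered, failed permanently, buffered) can be told apart from the returned result alone. A minimal sketch, assuming `ExternalCallResult` has the four-member shape implied by the named arguments in this diff; `CachedWriteOutcome` and `Describe` are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;

// Assumed shape, inferred from the named arguments used in this diff;
// the real ExternalCallResult may carry additional members.
public sealed record ExternalCallResult(
    bool Success, string? ResponseJson, string? ErrorMessage, bool WasBuffered);

// Hypothetical helper: maps a CachedWriteAsync result to one of the three outcomes.
public static class CachedWriteOutcome
{
    public static string Describe(ExternalCallResult result)
    {
        // Immediate success: the write ran and nothing was queued.
        if (result.Success && !result.WasBuffered) return "delivered";

        // Transient failure: the write was handed to store-and-forward.
        if (result.Success && result.WasBuffered) return "buffered";

        // Permanent failure: returned synchronously, never buffered.
        return "failed: " + (result.ErrorMessage ?? "unknown error");
    }
}
```

A calling script would likely treat "buffered" as accepted-but-pending rather than as completed work, since the retry sweep may still park the message later.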
/// <summary>
/// WP-9/10: Delivers a buffered CachedDbWrite during a store-and-forward retry
/// sweep — executes the SQL against the named connection. Returns true on
/// success, false if the connection no longer exists (the message is parked);
/// throws on any execution error so the engine retries.
/// sweep — executes the SQL against the named connection.
/// </summary>
/// <remarks>
/// M2.3 (#7): the outcome is classified, mirroring
/// <see cref="ExternalSystemClient.DeliverBufferedAsync"/>. Returns
/// <c>false</c> — so the S&amp;F engine PARKS the message — when the
/// connection no longer exists, the payload is unreadable, or the SQL fails
/// with a PERMANENT error (constraint / syntax / permission). A TRANSIENT SQL
/// error (<see cref="TransientDatabaseException"/>) propagates so the engine
/// retries. The pre-M2.3 code rethrew on ANY SQL error, so a permanent
/// failure on the retry path looped forever.
/// </remarks>
/// <param name="message">The buffered store-and-forward message to deliver.</param>
/// <param name="cancellationToken">Cancellation token for the delivery operation.</param>
/// <returns>A task that resolves to <c>true</c> on success, or <c>false</c> if the connection no longer exists.</returns>
/// <returns>A task that resolves to <c>true</c> on success, or <c>false</c> when the message must be parked.</returns>
/// <exception cref="TransientDatabaseException">Thrown on a transient SQL failure so the engine retries.</exception>
public async Task<bool> DeliverBufferedAsync(
StoreAndForwardMessage message, CancellationToken cancellationToken = default)
{
@@ -185,22 +241,152 @@ public class DatabaseGateway : IDatabaseGateway
return false;
}
await using var connection = new SqlConnection(definition.ConnectionString);
await connection.OpenAsync(cancellationToken);
using var command = connection.CreateCommand();
command.CommandText = payload.Sql;
if (payload.Parameters != null)
// Materialise the buffered JsonElement parameters into CLR values once,
// then run through the shared ExecuteWriteAsync seam so both the
// immediate-attempt path and this retry path classify SqlException the
// same way.
IReadOnlyDictionary<string, object?> materialisedParameters =
payload.Parameters == null
? EmptyParameters
: payload.Parameters.ToDictionary(
kv => kv.Key, kv => (object?)JsonElementToParameterValue(kv.Value));
try
{
foreach (var (key, value) in payload.Parameters)
{
var parameter = command.CreateParameter();
parameter.ParameterName = key.StartsWith('@') ? key : "@" + key;
parameter.Value = JsonElementToParameterValue(value);
command.Parameters.Add(parameter);
}
await ExecuteWriteAsync(
payload.ConnectionName, definition.ConnectionString, payload.Sql, materialisedParameters, cancellationToken)
.ConfigureAwait(false);
return true;
}
await command.ExecuteNonQueryAsync(cancellationToken);
return true;
catch (PermanentDatabaseException ex)
{
// Permanent — parking is correct; retrying the identical statement
// cannot succeed. Mirrors ExternalSystemClient.DeliverBufferedAsync
// returning false on PermanentExternalSystemException.
_logger.LogError(
ex,
"Buffered DB write to '{Connection}' failed permanently (SQL error {Number}); parking.",
payload.ConnectionName, ex.SqlErrorNumber);
return false;
}
// TransientDatabaseException propagates — the S&F engine retries.
}
/// <summary>
/// Reusable empty parameter map so the no-parameter paths do not allocate a
/// fresh dictionary each call.
/// </summary>
private static readonly IReadOnlyDictionary<string, object?> EmptyParameters =
new Dictionary<string, object?>();
/// <summary>
/// M2.3 (#7): executes a parameterised SQL write against the given connection
/// string and classifies the outcome into
/// <see cref="TransientDatabaseException"/> / <see cref="PermanentDatabaseException"/>,
/// mirroring the ordered catches of
/// <see cref="ExternalSystemClient.InvokeHttpAsync"/> on the API path:
/// caller-requested cancellation propagates unchanged; a <see cref="SqlException"/>
/// is classified by error number via <see cref="SqlErrorClassifier"/>; a
/// non-<see cref="SqlException"/> transport/connection outage is classified
/// transient via <see cref="SqlErrorClassifier.IsTransient(System.Exception)"/>;
/// genuinely-unexpected exceptions propagate. This is the single classification
/// seam shared by the immediate <see cref="CachedWriteAsync"/> attempt and the
/// <see cref="DeliverBufferedAsync"/> retry path. Marked <c>internal virtual</c>
/// so tests can substitute already-classified outcomes; the raw I/O lives in
/// the inner <see cref="RunSqlAsync"/> seam so tests can also drive raw outage
/// exceptions through this classification (without fabricating a
/// <see cref="SqlException"/>, which has no public constructor).
/// </summary>
/// <param name="connectionName">The human-readable connection name, used only for the classified error message (never the connection string — that would leak credentials into logs / script-visible errors).</param>
/// <param name="connectionString">The ADO.NET connection string to write through.</param>
/// <param name="sql">The SQL statement to execute.</param>
/// <param name="parameters">Materialised CLR parameter values (may be empty).</param>
/// <param name="cancellationToken">Cancellation token for the write.</param>
/// <returns>A task that completes when the write succeeds.</returns>
/// <exception cref="OperationCanceledException">Rethrown unchanged when the caller's <paramref name="cancellationToken"/> requested cancellation.</exception>
/// <exception cref="TransientDatabaseException">Thrown for a transient SQL error number or a non-Sql transport/connection outage.</exception>
/// <exception cref="PermanentDatabaseException">Thrown for a permanent (or unknown) SQL error number.</exception>
internal virtual async Task ExecuteWriteAsync(
string connectionName,
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
// M2.3 (#7) code-review fix: the catch ordering MIRRORS
// ExternalSystemClient.InvokeHttpAsync exactly so the SQL path classifies
// a live outage the same way the HTTP path does:
// 1. caller-requested cancellation propagates UNCHANGED (never a "DB error");
// 2. a SqlException is classified by error number (transient/permanent);
// 3. a NON-SqlException transport/connection failure (InvalidOperationException
// "connection not open", IOException, SocketException, TimeoutException,
// a non-Sql DbException, …) is TRANSIENT — buffered + retried, because a
// retry can succeed once the server is reachable. The pre-fix code only
// caught SqlException, so these escaped unclassified and crashed the
// Script Execution Actor instead of buffering;
// 4. genuinely-unexpected exceptions (e.g. an authoring ArgumentException)
// propagate — same as the HTTP path lets unexpected exceptions escape.
try
{
await RunSqlAsync(connectionString, sql, parameters, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// [1] The caller asked to abandon the work: propagate the cancellation
// unchanged; it must never be reclassified as a transient DB error.
throw;
}
catch (SqlException ex)
{
// Classify by SqlException.Number and rethrow as the strongly-typed
// transient / permanent failure the callers branch on. The context
// is the connection NAME, never the connection string.
throw SqlErrorClassifier.Throw(connectionName, ex);
}
catch (Exception ex) when (SqlErrorClassifier.IsTransient(ex))
{
// [3] A live outage that did not surface as a SqlException: treat as
// transient so the caller buffers + retries. The message uses the
// connection NAME, never the connection string (credential safety).
throw new TransientDatabaseException(
$"Transient database error on {connectionName}: {ex.Message}",
errorNumber: null,
ex);
}
}
/// <summary>
/// M2.3 (#7): the raw ADO.NET write — opens the connection, builds the
/// command, and executes it. Marked <c>internal virtual</c> so tests can throw
/// RAW outage-shaped exceptions (e.g. <see cref="InvalidOperationException"/>,
/// <see cref="System.Net.Sockets.SocketException"/>) through the PRODUCTION
/// classification in <see cref="ExecuteWriteAsync"/>. This is the SQL parallel
/// of <c>client.SendAsync</c> inside <see cref="ExternalSystemClient.InvokeHttpAsync"/>:
/// the actual I/O, wrapped by the ordered classification catches in the caller.
/// </summary>
/// <param name="connectionString">The ADO.NET connection string to write through.</param>
/// <param name="sql">The SQL statement to execute.</param>
/// <param name="parameters">Materialised CLR parameter values (may be empty).</param>
/// <param name="cancellationToken">Cancellation token for the write.</param>
/// <returns>A task that completes when the write succeeds.</returns>
internal virtual async Task RunSqlAsync(
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync(cancellationToken).ConfigureAwait(false);
using var command = connection.CreateCommand();
command.CommandText = sql;
foreach (var (key, value) in parameters)
{
var parameter = command.CreateParameter();
parameter.ParameterName = key.StartsWith('@') ? key : "@" + key;
parameter.Value = value ?? DBNull.Value;
command.Parameters.Add(parameter);
}
await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
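The ordered catches in `ExecuteWriteAsync` can be exercised without a database by substituting the raw I/O with a delegate. The following is a simplified sketch, not the gateway's code: `ClassifyingRunner`, `TransientFailure`, and `LooksTransient` are hypothetical stand-ins, and the `SqlException` branch is omitted because that type cannot be constructed in a standalone example.

```csharp
#nullable enable
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for TransientDatabaseException.
public sealed class TransientFailure : Exception
{
    public TransientFailure(string message, Exception inner) : base(message, inner) { }
}

public static class ClassifyingRunner
{
    // Simplified stand-in for SqlErrorClassifier.IsTransient(Exception).
    private static bool LooksTransient(Exception ex) =>
        ex is InvalidOperationException or IOException or SocketException or TimeoutException;

    public static async Task RunAsync(
        string connectionName, Func<CancellationToken, Task> rawWrite, CancellationToken ct)
    {
        try
        {
            await rawWrite(ct).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Caller-requested cancellation propagates unchanged.
            throw;
        }
        catch (Exception ex) when (LooksTransient(ex))
        {
            // Outage-shaped failure: wrap it so the caller can buffer and retry.
            // Only the connection name goes into the message, never credentials.
            throw new TransientFailure(
                $"Transient database error on {connectionName}: {ex.Message}", ex);
        }
        // Anything else (for example ArgumentException) escapes unclassified.
    }
}
```

Because exception filters are evaluated in declaration order, the cancellation catch has to come first; otherwise a filter that also matched cancellation types could reclassify a caller's cancellation as a retryable outage.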
// ExternalSystemGateway-020: a JSON number that does not fit in Int64 must
@@ -0,0 +1,217 @@
using System.Data.Common;
using System.IO;
using System.Net.Sockets;
using Microsoft.Data.SqlClient;
namespace ZB.MOM.WW.ScadaBridge.ExternalSystemGateway;
/// <summary>
/// M2.3 (#7): classifies a SQL Server failure as transient (a brief wait /
/// retry may succeed — buffer to store-and-forward) or permanent (the identical
/// statement cannot succeed — return to the script / park the buffered message).
/// </summary>
/// <remarks>
/// <para>
/// This is the database-side parallel of <see cref="ErrorClassifier"/> (the
/// HTTP path). The two are kept separate because the inputs differ: HTTP keys
/// off status codes / exception types, SQL keys off
/// <see cref="SqlException.Number"/>.
/// </para>
/// <para>
/// <b>Transient set.</b> Only connection-loss, timeout, deadlock, and Azure SQL
/// throttle/availability error numbers are transient — failures whose cause is
/// external to the statement and may clear on its own:
/// <list type="bullet">
/// <item><c>-2</c> — query / command timeout expired.</item>
/// <item><c>-1</c> — a connection-level error (general SqlClient connection failure).</item>
/// <item><c>2</c> — SQL Server / network instance not found or not accessible.</item>
/// <item><c>53</c> — network path to the server was not found.</item>
/// <item><c>64</c> — connection terminated mid-session (transport error).</item>
/// <item><c>233</c> — no process on the other end of the named pipe.</item>
/// <item><c>1205</c> — the session was chosen as a deadlock victim.</item>
/// <item><c>10053</c> — transport-level abort (software caused connection abort).</item>
/// <item><c>10054</c> — connection reset by peer.</item>
/// <item><c>10060</c> — connection attempt timed out.</item>
/// <item><c>40197</c> — Azure SQL service error processing the request; retry.</item>
/// <item><c>40501</c> — Azure SQL service is busy.</item>
/// <item><c>40613</c> — Azure SQL database is currently unavailable.</item>
/// <item><c>49918</c> / <c>49919</c> / <c>49920</c> — Azure SQL throttling (too many requests / operations).</item>
/// </list>
/// </para>
/// <para>
/// <b>Everything else is permanent.</b> Constraint violations (547, 2627, 2601),
/// syntax errors (102, 156, 207, 208), and permission errors (229, 230, 262) are
/// the obvious permanent cases, but the policy is broader: <b>any error number not
/// in the transient set — including unknown / undocumented / ambiguous numbers —
/// is treated as permanent.</b> Fail-fast is the safer default: silently
/// retrying an unrecognised error forever (the pre-M2.3 behaviour) hides
/// authoring bugs and can replay duplicate side effects. A genuinely transient
/// number we have not enumerated will, at worst, surface to the script as a
/// permanent failure — a loud, fixable outcome — rather than spin in an
/// unbounded retry loop.
/// </para>
/// </remarks>
public static class SqlErrorClassifier
{
/// <summary>
/// The complete set of SQL Server error numbers treated as transient. See the
/// type-level remarks for the per-number rationale. Anything outside this set
/// is permanent.
/// </summary>
private static readonly HashSet<int> TransientErrorNumbers = new()
{
-2, -1, 2, 53, 64, 233, 1205,
10053, 10054, 10060,
40197, 40501, 40613,
49918, 49919, 49920,
};
/// <summary>
/// Determines whether a SQL Server error number represents a transient
/// failure. Unknown / undocumented numbers default to permanent
/// (<see langword="false"/>) — see the type-level remarks.
/// </summary>
/// <param name="errorNumber">The SQL Server error number (e.g. <see cref="SqlException.Number"/>).</param>
/// <returns><see langword="true"/> if the number is in the transient set; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(int errorNumber) => TransientErrorNumbers.Contains(errorNumber);
/// <summary>
/// Determines whether a <see cref="SqlException"/> represents a transient
/// failure by classifying its top-level <see cref="SqlException.Number"/>.
/// </summary>
/// <param name="exception">The SQL exception to classify.</param>
/// <returns><see langword="true"/> if the exception's error number is transient; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(SqlException exception)
{
ArgumentNullException.ThrowIfNull(exception);
return IsTransient(exception.Number);
}
/// <summary>
/// Determines whether an arbitrary <see cref="Exception"/> represents a
/// transient database failure — the SQL-path parallel of
/// <see cref="ErrorClassifier.IsTransient(System.Exception)"/> on the HTTP path.
/// </summary>
/// <remarks>
/// <para>
/// A live DB outage does not always surface as a <see cref="SqlException"/>:
/// once the underlying connection / socket is torn down, the driver raises
/// transport-level exceptions instead. These are <b>retryable</b> — a retry
/// can succeed once the server is reachable again — so they are classified
/// transient (buffered to store-and-forward) rather than escaping unclassified
/// to crash the calling Script Execution Actor. The transient set:
/// </para>
/// <list type="bullet">
/// <item><see cref="InvalidOperationException"/> — connection-state error (e.g. "the connection is not open" / pooled connection broken).</item>
/// <item><see cref="IOException"/> — transport read/write failure mid-session.</item>
/// <item><see cref="SocketException"/> — TCP-level failure (connection refused/reset/timed out).</item>
/// <item><see cref="TimeoutException"/> — command / connection timeout surfaced as a CLR <see cref="TimeoutException"/>.</item>
/// <item><see cref="TaskCanceledException"/> — driver-level cancellation/timeout NOT tied to a caller token (the caller-token case is handled before classification — see the gateway's ordered catches).</item>
/// <item>Any <see cref="DbException"/> that is NOT a <see cref="SqlException"/> — a provider/driver transport error (a real <see cref="SqlException"/> is classified by error number via the overloads above, never here).</item>
/// </list>
/// <para>
/// <b>Everything else is NOT transient</b> and must propagate, exactly as the
/// HTTP path lets genuinely-unexpected exceptions escape past its
/// <c>catch (Exception ex) when (ErrorClassifier.IsTransient(ex))</c> filter.
/// Authoring bugs (<see cref="ArgumentException"/>, <see cref="NullReferenceException"/>,
/// etc.) are loud, fixable failures — silently buffering and retrying them
/// forever would hide the bug.
/// </para>
/// </remarks>
/// <param name="exception">The exception to classify.</param>
/// <returns><see langword="true"/> for a transport/connection/timeout/driver exception; otherwise <see langword="false"/>.</returns>
public static bool IsTransient(Exception exception)
{
ArgumentNullException.ThrowIfNull(exception);
// A real SqlException is classified by its error number (the overloads
// above), never by type — fall back to the number-based policy so an
// unknown SqlException stays permanent (fail-fast) rather than being
// swept up as transient by the DbException catch-all below.
if (exception is SqlException sql)
{
return IsTransient(sql);
}
return exception is InvalidOperationException
or IOException
or SocketException
or TimeoutException
or TaskCanceledException
or DbException; // any non-SqlException DbException (SqlException handled above)
}
/// <summary>
/// Classifies a <see cref="SqlException"/> and rethrows it as the matching
/// strongly-typed failure: <see cref="TransientDatabaseException"/> for a
/// transient error number, <see cref="PermanentDatabaseException"/> otherwise.
/// Mirrors <see cref="ErrorClassifier.AsTransient(string, System.Exception?)"/>
/// + the throw of <see cref="PermanentExternalSystemException"/> on the HTTP
/// path — the callers then branch on the typed exception rather than on the
/// raw <see cref="SqlException"/>.
/// </summary>
/// <param name="context">A short human-readable description of the failing operation (e.g. the connection name).</param>
/// <param name="exception">The SQL exception to classify and wrap.</param>
/// <returns>This method never returns normally — it always throws.</returns>
/// <exception cref="TransientDatabaseException">Thrown when the error number is transient.</exception>
/// <exception cref="PermanentDatabaseException">Thrown when the error number is permanent (the default).</exception>
public static Exception Throw(string context, SqlException exception)
{
ArgumentNullException.ThrowIfNull(exception);
if (IsTransient(exception))
{
throw new TransientDatabaseException(
$"Transient SQL error {exception.Number} on {context}: {exception.Message}",
exception.Number,
exception);
}
throw new PermanentDatabaseException(
$"Permanent SQL error {exception.Number} on {context}: {exception.Message}",
exception.Number,
exception);
}
}
/// <summary>
/// Signals a transient database failure suitable for store-and-forward retry —
/// the SQL-path parallel of <see cref="TransientExternalSystemException"/>.
/// </summary>
public class TransientDatabaseException : Exception
{
/// <summary>Gets the SQL Server error number that caused the failure, if known.</summary>
public int? SqlErrorNumber { get; }
/// <summary>Initializes a new <see cref="TransientDatabaseException"/>.</summary>
/// <param name="message">The error message.</param>
/// <param name="errorNumber">The SQL Server error number, if available.</param>
/// <param name="innerException">Optional inner exception (typically the original <see cref="SqlException"/>).</param>
public TransientDatabaseException(string message, int? errorNumber = null, Exception? innerException = null)
: base(message, innerException)
{
SqlErrorNumber = errorNumber;
}
}
/// <summary>
/// Signals a permanent database failure that must not be retried — the SQL-path
/// parallel of <see cref="PermanentExternalSystemException"/>. Returned
/// synchronously to the calling script on the immediate attempt and parks the
/// message on the store-and-forward retry path.
/// </summary>
public class PermanentDatabaseException : Exception
{
/// <summary>Gets the SQL Server error number that caused the failure, if known.</summary>
public int? SqlErrorNumber { get; }
/// <summary>Initializes a new <see cref="PermanentDatabaseException"/>.</summary>
/// <param name="message">The error message.</param>
/// <param name="errorNumber">The SQL Server error number, if available.</param>
/// <param name="innerException">Optional inner exception (typically the original <see cref="SqlException"/>).</param>
public PermanentDatabaseException(string message, int? errorNumber = null, Exception? innerException = null)
: base(message, innerException)
{
SqlErrorNumber = errorNumber;
}
}
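The fail-fast default is easiest to see with a few sample error numbers. This sketch copies the transient set listed in the `SqlErrorClassifier` remarks into a hypothetical `TransientNumbers` class so that it compiles without `Microsoft.Data.SqlClient`; it illustrates the policy and is not the shipped classifier.

```csharp
using System.Collections.Generic;

// Hypothetical copy of the transient set, for illustration only.
public static class TransientNumbers
{
    private static readonly HashSet<int> Set = new()
    {
        -2, -1, 2, 53, 64, 233, 1205,
        10053, 10054, 10060,
        40197, 40501, 40613,
        49918, 49919, 49920,
    };

    // Anything not listed, including unknown numbers, is treated as permanent.
    public static bool IsTransient(int errorNumber) => Set.Contains(errorNumber);
}
```

Under this policy `IsTransient(1205)` (deadlock victim) is true, while `IsTransient(547)` (constraint violation) and an unlisted number such as `IsTransient(99999)` are both false, so the unlisted case surfaces to the script instead of being retried.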
@@ -111,6 +111,23 @@ public interface ISiteHealthCollector
/// <param name="count">The number of parked messages.</param>
void SetParkedMessageCount(int count);
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — replace the latest cumulative
/// site-event-log write-failure count (SQLite error, disk full,
/// bounded-queue overflow drop) used by the next <see cref="CollectReport"/>
/// call. Refreshed periodically by the <c>SiteEventLogFailureCountReporter</c>
/// hosted service. Point-in-time: the value is NOT reset on
/// <see cref="CollectReport"/>; it carries forward until the next poller
/// refresh. Default interface implementation is a no-op so existing test
/// fakes continue to compile without per-fake updates.
/// </summary>
/// <param name="count">The cumulative failed-write count from <c>ISiteEventLogger.FailedWriteCount</c>.</param>
void SetSiteEventLogWriteFailures(long count)
{
// Default no-op so test fakes do not need to be updated. The real
// SiteHealthCollector implements this with the Interlocked.Exchange store.
}
/// <summary>
/// Sets the hostname of this node.
/// </summary>
@@ -1,11 +1,25 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring;
public static class ServiceCollectionExtensions
{
/// <summary>
/// Sentinel marker used by <see cref="AddSiteEventLogHealthMetricsBridge"/> to
/// implement an idempotency guard. Because the reporter is registered via a
/// factory-lambda overload of <c>AddHostedService</c>, its
/// <see cref="Microsoft.Extensions.DependencyInjection.ServiceDescriptor.ImplementationType"/>
/// is <see langword="null"/> — checking it would be a silent no-op. Registering
/// this marker as a singleton and guarding on its <c>ServiceType</c> gives a
/// reliable, allocation-free sentinel that works regardless of how the hosted
/// service was wired.
/// </summary>
private sealed class SiteEventLogHealthMetricsBridgeMarker { }
/// <summary>
/// Register site-side health monitoring services (metric collection + periodic reporting).
/// Call this on site nodes only. For central, call AddCentralHealthAggregation() instead.
@@ -50,6 +64,77 @@ public static class ServiceCollectionExtensions
return services;
}
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — register the
/// <see cref="SiteEventLogFailureCountReporter"/> hosted service that
/// periodically reads the cumulative event-log write-failure count and
/// pushes it into <see cref="ISiteHealthCollector"/> as a point-in-time
/// snapshot (<c>SiteEventLogWriteFailures</c> on the site health report).
/// </summary>
/// <remarks>
/// <para>
/// Must be called AFTER <see cref="AddSiteHealthMonitoring"/> (or
/// <see cref="AddHealthMonitoring"/>) which registers the
/// <see cref="ISiteHealthCollector"/> the reporter depends on.
/// </para>
/// <para>
/// <b>Why a Func&lt;long&gt; delegate instead of ISiteEventLogger.</b>
/// A direct <c>HealthMonitoring → SiteEventLogging</c> reference is avoided to
/// prevent an undesirable low-level coupling: <c>SiteEventLogging</c> is a
/// leaf component that should not pull in higher-level infrastructure. The
/// <see cref="Func{TResult}"/> delegate seam keeps the reference one-way and
/// loose: the caller (Host site wiring) captures
/// <c>ISiteEventLogger.FailedWriteCount</c> as a lambda and passes it here.
/// Note: <c>HealthMonitoring → StoreAndForward → SiteEventLogging</c> already
/// exists as a transitive path, so a direct reference would not introduce a
/// cycle — the delegate is purely a coupling-avoidance measure.
/// </para>
/// <para>
/// Idempotent — a <see cref="SiteEventLogHealthMetricsBridgeMarker"/> singleton
/// is used as the sentinel. Because the reporter is registered via a factory-lambda
/// overload of <c>AddHostedService</c>, its
/// <see cref="Microsoft.Extensions.DependencyInjection.ServiceDescriptor.ImplementationType"/>
/// is <see langword="null"/>; checking it would be a silent no-op and a second
/// call would spin up a second polling timer. Guarding on the marker's
/// <c>ServiceType</c> is always reliable regardless of how the hosted service
/// was wired (AddHostedService has no TryAdd variant).
/// </para>
/// </remarks>
/// <param name="services">The service collection to register into.</param>
/// <param name="failedWriteCountProvider">
/// A factory delegate that, given the root <see cref="IServiceProvider"/>,
/// returns a <see cref="Func{TResult}"/> that reads the current cumulative
/// event-log write-failure count. Typically:
/// <c>sp => () => sp.GetRequiredService&lt;ISiteEventLogger&gt;().FailedWriteCount</c>.
/// The factory is evaluated once at hosted-service resolution time; the inner
/// <see cref="Func{TResult}"/> is called on every poll tick.
/// </param>
/// <returns>The same <see cref="IServiceCollection"/> for chaining.</returns>
public static IServiceCollection AddSiteEventLogHealthMetricsBridge(
this IServiceCollection services,
Func<IServiceProvider, Func<long>> failedWriteCountProvider)
{
ArgumentNullException.ThrowIfNull(services);
ArgumentNullException.ThrowIfNull(failedWriteCountProvider);
// Idempotent guard — uses the marker type rather than ImplementationType because
// AddHostedService(factory-lambda) sets only ImplementationFactory and leaves
// ImplementationType null; an ImplementationType == check is a silent no-op for
// factory-registered services. The marker singleton's ServiceType is always set.
if (services.Any(d => d.ServiceType == typeof(SiteEventLogHealthMetricsBridgeMarker)))
{
return services;
}
services.AddSingleton<SiteEventLogHealthMetricsBridgeMarker>();
services.AddHostedService(sp => new SiteEventLogFailureCountReporter(
failedWriteCountProvider(sp),
sp.GetRequiredService<ISiteHealthCollector>(),
sp.GetRequiredService<ILogger<SiteEventLogFailureCountReporter>>()));
return services;
}
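The reason for the marker type can be checked directly against `ServiceDescriptor`. A small sketch, assuming the `Microsoft.Extensions.DependencyInjection` package is referenced; `IProbe`, `Probe`, `Marker`, and `RegistrationShapes` are hypothetical names, and a plain singleton factory stands in for the hosted-service registration.

```csharp
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

public interface IProbe { }
public sealed class Probe : IProbe { }
public sealed class Marker { }

public static class RegistrationShapes
{
    // Returns true only if a guard based on ImplementationType can see a
    // service that was registered through a factory lambda.
    public static bool TypeGuardSeesFactory()
    {
        var services = new ServiceCollection();
        services.AddSingleton<IProbe>(sp => new Probe()); // factory overload
        return services.Any(d => d.ImplementationType == typeof(Probe));
    }

    // Returns how many Marker descriptors exist after two guarded registrations.
    public static int GuardedMarkerCount()
    {
        var services = new ServiceCollection();
        for (var i = 0; i < 2; i++)
        {
            // Guard on ServiceType, which is set for every descriptor.
            if (services.Any(d => d.ServiceType == typeof(Marker))) continue;
            services.AddSingleton<Marker>();
        }
        return services.Count(d => d.ServiceType == typeof(Marker));
    }
}
```

Here `TypeGuardSeesFactory()` returns false because a factory registration populates `ImplementationFactory` and leaves `ImplementationType` null, while `GuardedMarkerCount()` returns 1 because the second pass is skipped by the `ServiceType` check.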
/// <summary>
/// HealthMonitoring-014: register the <see cref="HealthMonitoringOptionsValidator"/>
/// so a misconfigured <c>ScadaBridge:HealthMonitoring</c> section (zero/negative
@@ -0,0 +1,146 @@
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring;
/// <summary>
/// Site Event Logging (#12) M2.16 (#30) — site-side hosted service that
/// periodically reads the cumulative event-log write-failure count and pushes
/// it into <see cref="ISiteHealthCollector"/> so the next
/// <see cref="ISiteHealthCollector.CollectReport"/> emits a fresh
/// <c>SiteEventLogWriteFailures</c> field on the site health report.
/// </summary>
/// <remarks>
/// <para>
/// <b>Why a Func&lt;long&gt; and not ISiteEventLogger directly.</b>
/// A direct <c>HealthMonitoring → SiteEventLogging</c> reference is avoided
/// to prevent an undesirable low-level coupling: <c>SiteEventLogging</c> is a
/// leaf component that should not pull in higher-level infrastructure. Note that
/// <c>HealthMonitoring → StoreAndForward → SiteEventLogging</c> already
/// exists as a transitive path (confirmed: <c>StoreAndForward.csproj</c> references
/// <c>SiteEventLogging.csproj</c>), so a direct reference would NOT introduce a
/// cycle — the delegate is purely a coupling-avoidance measure. The
/// <see cref="Func{TResult}"/> seam lets the caller (Host site wiring) capture
/// <c>ISiteEventLogger.FailedWriteCount</c> as a lambda at registration time; this
/// service reads only the numeric result. The delegate approach is a standard
/// pattern for counter bridges and keeps the registration path self-documenting.
/// </para>
/// <para>
/// <b>Cadence.</b> 30 s by default — the same cadence as
/// <c>SiteAuditBacklogReporter</c>, which is coarse enough to stay within
/// the health-report interval budget while keeping the central dashboard
/// current.
/// </para>
/// <para>
/// <b>Failure containment.</b> Any unexpected exception during the probe is
/// caught and logged; the next tick retries. Mirrors
/// <c>SiteAuditBacklogReporter</c>'s "exception logged, not propagated"
/// contract.
/// </para>
/// </remarks>
public sealed class SiteEventLogFailureCountReporter : IHostedService, IDisposable
{
/// <summary>
/// Default poll cadence. Matches <c>SiteAuditBacklogReporter.DefaultRefreshInterval</c>
/// (30 s) — coarse enough to amortise the read across many reports, fine
/// enough that the central dashboard never lags by more than one
/// health-report interval.
/// </summary>
internal static readonly TimeSpan DefaultRefreshInterval = TimeSpan.FromSeconds(30);
private readonly Func<long> _failedWriteCountProvider;
private readonly ISiteHealthCollector _collector;
private readonly ILogger<SiteEventLogFailureCountReporter> _logger;
private readonly TimeSpan _refreshInterval;
private CancellationTokenSource? _cts;
private Task? _loop;
/// <summary>Initializes a new instance of <see cref="SiteEventLogFailureCountReporter"/>.</summary>
/// <param name="failedWriteCountProvider">
/// A delegate that returns the current cumulative event-log write-failure count.
/// Typically wired as <c>() => sp.GetRequiredService&lt;ISiteEventLogger&gt;().FailedWriteCount</c>
/// in the Host site composition root.
/// </param>
/// <param name="collector">The site health collector that receives the failure-count snapshot.</param>
/// <param name="logger">Logger instance.</param>
/// <param name="refreshInterval">Poll interval override; defaults to <see cref="DefaultRefreshInterval"/> (30 s).</param>
public SiteEventLogFailureCountReporter(
Func<long> failedWriteCountProvider,
ISiteHealthCollector collector,
ILogger<SiteEventLogFailureCountReporter> logger,
TimeSpan? refreshInterval = null)
{
_failedWriteCountProvider = failedWriteCountProvider
?? throw new ArgumentNullException(nameof(failedWriteCountProvider));
_collector = collector ?? throw new ArgumentNullException(nameof(collector));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
_refreshInterval = refreshInterval ?? DefaultRefreshInterval;
}
/// <summary>Starts the background polling loop, running an immediate first probe before entering the timed cycle.</summary>
/// <param name="ct">Cancellation token signalling host shutdown.</param>
/// <returns>A task that represents the asynchronous operation.</returns>
public Task StartAsync(CancellationToken ct)
{
// Linked CTS lets StopAsync's cancellation AND the host's shutdown
// token both terminate the loop; either side firing aborts the
// pending Task.Delay.
_cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
_loop = Task.Run(() => RunLoopAsync(_cts.Token));
return Task.CompletedTask;
}
private async Task RunLoopAsync(CancellationToken ct)
{
// First tick runs immediately so the very first health report after
// process start carries a real failure-count snapshot — without this
// the dashboard would show 0 for the first 30 s after a deploy even
// if failures had already accumulated.
SafeProbe();
while (!ct.IsCancellationRequested)
{
try
{
await Task.Delay(_refreshInterval, ct).ConfigureAwait(false);
}
catch (OperationCanceledException)
{
break;
}
SafeProbe();
}
}
private void SafeProbe()
{
try
{
var count = _failedWriteCountProvider();
_collector.SetSiteEventLogWriteFailures(count);
}
catch (Exception ex)
{
// Catch-all is deliberate: the hosted service must survive every
// class of probe failure so the next tick gets a chance. Mirrors
// SiteAuditBacklogReporter's "exception logged, not propagated" contract.
_logger.LogWarning(ex, "SiteEventLogFailureCountReporter probe failed; next tick will retry.");
}
}
/// <summary>Signals the polling loop to stop and waits for it to complete.</summary>
/// <param name="ct">Cancellation token (not used; the internal CTS governs shutdown).</param>
/// <returns>A task that completes once the polling loop has exited.</returns>
public Task StopAsync(CancellationToken ct)
{
_cts?.Cancel();
return _loop ?? Task.CompletedTask;
}
/// <summary>Releases the internal <see cref="CancellationTokenSource"/> used to stop the polling loop.</summary>
public void Dispose()
{
_cts?.Dispose();
}
}
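The reporter above follows a "probe once immediately, then once per interval" shape, with a linked token so either the host or `StopAsync` can end the loop. The following is a minimal sketch of that shape only, assuming a plain `Action<long>` sink in place of `ISiteHealthCollector`; `PollingReporterSketch` and its members are hypothetical names, not types from this commit.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the reporter: same loop shape, simplified sink.
public sealed class PollingReporterSketch : IDisposable
{
    private readonly Func<long> _provider;
    private readonly Action<long> _sink;
    private readonly TimeSpan _interval;
    private CancellationTokenSource? _cts;
    private Task? _loop;

    public PollingReporterSketch(Func<long> provider, Action<long> sink, TimeSpan interval)
    {
        _provider = provider ?? throw new ArgumentNullException(nameof(provider));
        _sink = sink ?? throw new ArgumentNullException(nameof(sink));
        _interval = interval;
    }

    public Task StartAsync(CancellationToken hostToken)
    {
        // Linked source: cancelling either hostToken or _cts ends the loop.
        _cts = CancellationTokenSource.CreateLinkedTokenSource(hostToken);
        var token = _cts.Token;
        _loop = Task.Run(() => RunAsync(token));
        return Task.CompletedTask;
    }

    public Task StopAsync()
    {
        _cts?.Cancel();
        return _loop ?? Task.CompletedTask;
    }

    private async Task RunAsync(CancellationToken ct)
    {
        Probe(); // immediate first tick, before any delay
        while (!ct.IsCancellationRequested)
        {
            try { await Task.Delay(_interval, ct).ConfigureAwait(false); }
            catch (OperationCanceledException) { break; }
            Probe();
        }
    }

    private void Probe()
    {
        // Swallow probe failures so the next tick still runs.
        try { _sink(_provider()); }
        catch (Exception) { }
    }

    public void Dispose() => _cts?.Dispose();
}
```

In a test, an interval far longer than the test run (for example one hour) isolates the immediate first probe: the sink should receive a value shortly after `StartAsync` even though no interval has elapsed.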
@@ -17,6 +17,7 @@ public class SiteHealthCollector : ISiteHealthCollector
private int _siteAuditWriteFailures;
private int _auditRedactionFailures;
private volatile SiteAuditBacklogSnapshot? _siteAuditBacklog;
private long _siteEventLogWriteFailures;
private readonly ConcurrentDictionary<string, ConnectionHealth> _connectionStatuses = new();
private readonly ConcurrentDictionary<string, TagResolutionStatus> _tagResolutionCounts = new();
private readonly ConcurrentDictionary<string, string> _connectionEndpoints = new();
@@ -77,6 +78,12 @@ public class SiteHealthCollector : ISiteHealthCollector
_siteAuditBacklog = snapshot ?? throw new ArgumentNullException(nameof(snapshot));
}
/// <inheritdoc />
public void SetSiteEventLogWriteFailures(long count)
{
Interlocked.Exchange(ref _siteEventLogWriteFailures, count);
}
/// <inheritdoc />
public void UpdateConnectionHealth(string connectionName, ConnectionHealth health)
{
@@ -206,6 +213,7 @@ public class SiteHealthCollector : ISiteHealthCollector
ClusterNodes: _clusterNodes?.ToList(),
SiteAuditWriteFailures: siteAuditWriteFailures,
AuditRedactionFailure: auditRedactionFailures,
SiteAuditBacklog: _siteAuditBacklog);
SiteAuditBacklog: _siteAuditBacklog,
SiteEventLogWriteFailures: Interlocked.Read(ref _siteEventLogWriteFailures));
}
}
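The collector change above stores the count with `Interlocked.Exchange` and reads it with `Interlocked.Read`. C# does not guarantee that plain reads and writes of a `long` are atomic on 32-bit runtimes, which is presumably why the field is not simply assigned. A small sketch of the same pattern, using a hypothetical `FailureCountCell` type rather than the real `SiteHealthCollector`:

```csharp
using System.Threading;

// Hypothetical holder illustrating the pattern; not the real collector.
public sealed class FailureCountCell
{
    private long _value;

    // Atomic 64-bit store. The value is replaced, not added to, because the
    // provider already reports a cumulative count.
    public void Set(long count) => Interlocked.Exchange(ref _value, count);

    // Atomic 64-bit load, safe against torn reads on 32-bit runtimes.
    public long Get() => Interlocked.Read(ref _value);
}
```

Because `Set` overwrites, the reported number is a point-in-time snapshot of whatever the provider returned on the latest tick.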
@@ -7,6 +7,8 @@ public class DatabaseOptions
{
/// <summary>Connection string for the central configuration SQL Server database.</summary>
public string? ConfigurationDb { get; set; }
/// <summary>Connection string for the central machine-data SQL Server database.</summary>
public string? MachineDataDb { get; set; }
/// <summary>File system path to the site-local SQLite database directory.</summary>
public string? SiteDbPath { get; set; }
}
@@ -0,0 +1,175 @@
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging;
namespace ZB.MOM.WW.ScadaBridge.Host.Health;
/// <summary>
/// M2.14 (#28): readiness check that verifies every <b>required central cluster
/// singleton</b> is reachable from this node, satisfying the "required cluster
/// singletons running (if applicable)" clause of REQ-HOST-4a. Register it
/// <see cref="ZB.MOM.WW.Health.ZbHealthTags.Ready"/>-tagged in the Central-role
/// <c>AddHealthChecks()</c> chain only, so it is naturally role-scoped (site nodes
/// never register it).
/// </summary>
/// <remarks>
/// <para>
/// <b>Probe strategy.</b> Each central singleton has a local
/// <c>ClusterSingletonProxy</c> actor (created unconditionally in
/// <c>AkkaHostedService.RegisterCentralActors</c>). The proxy actor exists locally
/// as soon as it is created, so merely resolving its path proves nothing about the
/// singleton itself. Instead we <see cref="Futures.Ask{T}(ICanTell, object, TimeSpan?)"/>
/// the proxy an <see cref="Identify"/> with a short bounded per-singleton timeout and
/// expect an <see cref="ActorIdentity"/> whose <see cref="ActorIdentity.Subject"/> is
/// non-null. The proxy buffers and forwards to the live singleton, so a non-null
/// Subject within the timeout means the singleton is running and reachable; a null
/// Subject or a timeout means it is unreachable. Probes run concurrently
/// (<see cref="Task.WhenAll(System.Collections.Generic.IEnumerable{Task})"/>) so the
/// whole check stays cheap and readiness polling stays fast.
/// </para>
/// <para>
/// <b>Required-always vs if-applicable.</b> All five central singleton proxies are
/// created unconditionally on a central node (there is no feature/config gate around
/// any of them), so all five are treated as required-always here. If a future
/// singleton is created behind a feature flag, it should NOT be added to
/// <see cref="RequiredSingletonProxyNames"/> — "if applicable" means skip when its
/// feature is off.
/// </para>
/// <para>
/// <b>Failover flakiness.</b> During a brief singleton handover the singleton may be
/// momentarily unreachable through the proxy. The bounded per-singleton timeout maps
/// that to Unhealthy (we never throw and never retry — retries would make the probe
/// slow). Readiness flapping briefly during a failover is acceptable and correct: a
/// node mid-handover is legitimately not fully ready. We deliberately accept that
/// tradeoff rather than masking it with retries.
/// </para>
/// <para>
/// <b>No leadership requirement.</b> The proxy reaches the singleton from either node
/// (active or standby), so a ready standby still reports Healthy here — readiness must
/// NOT require cluster leadership (that is the Active tier's job).
/// </para>
/// <para>
/// The <see cref="ActorSystem"/> is resolved lazily from DI per probe, mirroring
/// <c>AkkaClusterHealthCheck</c>; if it is not yet available (startup race) the check
/// returns Unhealthy rather than throwing.
/// </para>
/// </remarks>
public sealed class RequiredSingletonsHealthCheck : IHealthCheck
{
/// <summary>
/// Local actor names (under <c>/user</c>) of the <c>ClusterSingletonProxy</c>
/// actors for the singletons that must always be running on a central node.
/// Matches the unconditional proxy registrations in
/// <c>AkkaHostedService.RegisterCentralActors</c>.
/// </summary>
public static readonly IReadOnlyList<string> RequiredSingletonProxyNames = new[]
{
"notification-outbox-proxy",
"audit-log-ingest-proxy",
"site-call-audit-proxy",
"audit-log-purge-proxy",
"site-audit-reconciliation-proxy",
};
// Short, bounded per-singleton timeout. Kept small so readiness polling stays
// fast; a singleton in mid-handover that does not answer within this window is
// (correctly) treated as momentarily unreachable. Do NOT add retries here.
private static readonly TimeSpan ProbeTimeout = TimeSpan.FromSeconds(2);
private readonly IServiceProvider _serviceProvider;
private readonly ILogger<RequiredSingletonsHealthCheck> _logger;
/// <summary>Initializes a new <see cref="RequiredSingletonsHealthCheck"/>.</summary>
/// <param name="serviceProvider">
/// Application service provider; the <see cref="ActorSystem"/> is resolved lazily so the
/// check is startup-safe (Unhealthy, never throwing, if Akka is not yet up).
/// </param>
/// <param name="logger">Logger for diagnostic detail on unreachable singletons.</param>
public RequiredSingletonsHealthCheck(
IServiceProvider serviceProvider,
ILogger<RequiredSingletonsHealthCheck> logger)
{
_serviceProvider = serviceProvider ?? throw new ArgumentNullException(nameof(serviceProvider));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <inheritdoc />
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
// CheckHealthAsync must NEVER throw — catch everything and map to Unhealthy
// with a descriptive message. An escaping exception would be recorded as
// Unhealthy anyway, but a thrown exception loses the descriptive message.
try
{
var system = _serviceProvider.GetService<ActorSystem>();
if (system is null)
return HealthCheckResult.Unhealthy("ActorSystem not yet available.");
// Probe each required singleton concurrently so the whole check is bounded
// by ~ProbeTimeout, not the sum of the per-singleton timeouts.
var probes = RequiredSingletonProxyNames
.Select(name => ProbeAsync(system, name, cancellationToken))
.ToArray();
var results = await Task.WhenAll(probes).ConfigureAwait(false);
var unreachable = results
.Where(r => !r.Reachable)
.Select(r => r.Name)
.ToList();
if (unreachable.Count == 0)
return HealthCheckResult.Healthy(
$"All {RequiredSingletonProxyNames.Count} required cluster singletons are reachable.");
var joined = string.Join(", ", unreachable);
_logger.LogWarning(
"Readiness degraded: required cluster singleton(s) unreachable: {Unreachable}",
joined);
return HealthCheckResult.Unhealthy(
$"Required cluster singleton(s) unreachable: {joined}.");
}
catch (Exception ex)
{
// Defensive: any unexpected failure (including OperationCanceledException
// on shutdown) degrades readiness rather than escaping the check.
return HealthCheckResult.Unhealthy(
"Failed to probe required cluster singletons.", ex);
}
}
/// <summary>
/// Asks the named local proxy an <see cref="Identify"/> with a bounded timeout.
/// Reachable iff a non-null <see cref="ActorIdentity.Subject"/> comes back in time.
/// A null Subject (path not present) or a timeout/exception → not reachable. This
/// method itself never throws.
/// </summary>
private async Task<(string Name, bool Reachable)> ProbeAsync(
ActorSystem system,
string proxyName,
CancellationToken cancellationToken)
{
try
{
// ActorSelection so a missing path resolves an ActorIdentity with a null
// Subject (rather than throwing) within the bounded timeout.
var selection = system.ActorSelection($"/user/{proxyName}");
var identity = await selection
.Ask<ActorIdentity>(new Identify(proxyName), ProbeTimeout, cancellationToken)
.ConfigureAwait(false);
return (proxyName, identity.Subject is not null);
}
catch (Exception)
{
// Timeout / cancellation / any failure → momentarily unreachable. Bounded,
// no retry — readiness may briefly flap during a singleton handover, which
// is the correct signal for a node mid-handover.
return (proxyName, false);
}
}
}
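The health check above runs all probes concurrently, bounds each one with a timeout, and maps any timeout or exception to "unreachable" without retrying. The sketch below shows that structure without Akka, assuming each probe is just a `Func<CancellationToken, Task<bool>>`; `BoundedProbeSketch` and `FindUnreachableAsync` are hypothetical names, and `Task.WaitAsync` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedProbeSketch
{
    // Runs every probe concurrently. Each probe is bounded by `timeout`, so the
    // whole call takes roughly one timeout rather than the sum of all of them.
    public static async Task<IReadOnlyList<string>> FindUnreachableAsync(
        IReadOnlyDictionary<string, Func<CancellationToken, Task<bool>>> probes,
        TimeSpan timeout,
        CancellationToken ct = default)
    {
        var tasks = probes
            .Select(p => ProbeOneAsync(p.Key, p.Value, timeout, ct))
            .ToArray();
        var results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.Where(r => !r.Reachable).Select(r => r.Name).ToList();
    }

    // Never throws: a timeout, cancellation, or fault all count as unreachable.
    private static async Task<(string Name, bool Reachable)> ProbeOneAsync(
        string name,
        Func<CancellationToken, Task<bool>> probe,
        TimeSpan timeout,
        CancellationToken ct)
    {
        try
        {
            // WaitAsync throws TimeoutException if the probe is too slow.
            var ok = await probe(ct).WaitAsync(timeout, ct).ConfigureAwait(false);
            return (name, ok);
        }
        catch (Exception)
        {
            return (name, false);
        }
    }
}
```

Since the per-probe method cannot throw, `Task.WhenAll` never faults here, and the caller only has to decide how to report the names that came back unreachable.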
@@ -202,6 +202,18 @@ try
failureStatus: null,
tags: new[] { ZbHealthTags.Ready },
args: AkkaClusterStatusPolicy.Default)
// M2.14 (#28): readiness ALSO reflects "required cluster singletons running"
// (REQ-HOST-4a). Probes each central singleton's local ClusterSingletonProxy
// with a bounded Identify and degrades to Unhealthy if any required singleton
// is unreachable. Registered here, inside the Central-role branch, so the
// check is naturally role-scoped — site nodes never run it. It resolves
// ActorSystem from DI per probe, like the akka-cluster check above, and is
// leadership-agnostic so a ready standby still reports ready (the proxy reaches
// the singleton from either node).
.AddTypeActivatedCheck<RequiredSingletonsHealthCheck>(
"required-singletons",
failureStatus: null,
tags: new[] { ZbHealthTags.Ready })
.AddTypeActivatedCheck<ActiveNodeHealthCheck>(
"active-node",
failureStatus: null,
@@ -58,6 +58,16 @@ public static class SiteServiceRegistration
services.AddStoreAndForward();
services.AddSiteEventLogging();
// Site Event Logging (#12) M2.16 (#30) — bridge ISiteEventLogger.FailedWriteCount
// into the site health report as a point-in-time SiteEventLogWriteFailures field.
// Must come AFTER both AddSiteHealthMonitoring (registers ISiteHealthCollector) and
// AddSiteEventLogging (registers ISiteEventLogger). The outer Func<IServiceProvider, …>
// is evaluated once at hosted-service resolution time (root IServiceProvider is available);
// the inner Func<long> is called on every poll tick; it looks up the ISiteEventLogger
// singleton through the captured provider and reads its FailedWriteCount.
services.AddSiteEventLogHealthMetricsBridge(
sp => () => sp.GetRequiredService<ISiteEventLogger>().FailedWriteCount);
// Audit Log (#23) — site-side hot-path writer + telemetry collaborators.
// The SiteAuditTelemetryActor itself is registered by AkkaHostedService
// in the site-role block; this call wires every DI dependency it (and
@@ -96,6 +106,19 @@ public static class SiteServiceRegistration
return new AkkaClusterNodeProvider(akkaService, siteRole);
});
// SiteEventLogging-019 / #29 (M2.15): the EventLogPurgeService runs on every
// site host node but consults this optional gate each tick and early-exits on
// the standby. Register it to delegate to IClusterNodeProvider.SelfIsPrimary
// (the canonical "this node is Up AND cluster leader" check) so purge runs ONLY
// on the active node — no duplicated cluster logic. Non-clustered test hosts that
// never call SiteServiceRegistration leave it unregistered, so the purge defaults
// to always-run (the pre-fix behaviour, preserved).
services.AddSingleton<SiteEventLogActiveNodeCheck>(sp =>
{
var nodeProvider = sp.GetRequiredService<IClusterNodeProvider>();
return () => nodeProvider.SelfIsPrimary;
});
// Options binding
BindSharedOptions(services, config);
services.Configure<SiteRuntimeOptions>(config.GetSection("ScadaBridge:SiteRuntime"));
@@ -60,6 +60,9 @@ public static class StartupValidator
.Require("ScadaBridge:Database:ConfigurationDb",
_ => !string.IsNullOrEmpty(configuration.GetSection("ScadaBridge:Database")["ConfigurationDb"]),
"connection string required for Central")
.Require("ScadaBridge:Database:MachineDataDb",
_ => !string.IsNullOrEmpty(configuration.GetSection("ScadaBridge:Database")["MachineDataDb"]),
"connection string required for Central")
// Task 1.4: the LDAP server key moved into the nested Security:Ldap
// sub-section (bound to the shared LdapOptions). Validate the nested key so
// the pre-host preflight still fails fast on a missing LDAP server for
@@ -4,8 +4,23 @@ using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
/// <summary>
/// WP-2: Validates and deserializes JSON request body against method parameter definitions.
/// Extended type system: Boolean, Integer, Float, String, Object, List.
/// WP-2: Validates and deserializes a JSON request body against a method's
/// parameter definitions. Extended type system: Boolean, Integer, Float,
/// String, Object, List.
///
/// <para>
/// InboundAPI-M2.6: validation is now RECURSIVE and type-aware for the
/// extended <c>Object</c> / <c>List</c> types. Declared object fields are
/// validated against their declared (nested) types, list elements against the
/// declared element type, and scalars at any depth against the extended type —
/// with path-qualified errors (e.g. <c>order.items[2].quantity</c>). The
/// definition is read as JSON Schema (the canonical persisted format produced
/// by the Central UI / migration); the legacy flat-array form is still
/// accepted for transition safety. See
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.InboundApiSchema"/>
/// for the shared recursive engine that <see cref="ReturnValueValidator"/>
/// also uses.
/// </para>
/// </summary>
public static class ParameterValidator
{
@@ -14,40 +29,34 @@ public static class ParameterValidator
/// Returns deserialized parameters or an error message.
/// </summary>
/// <param name="body">The parsed JSON request body; null or undefined if no body was supplied.</param>
/// <param name="parameterDefinitions">JSON-serialized list of <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.ParameterDefinition"/>; null or empty means no parameters are defined.</param>
/// <param name="parameterDefinitions">JSON Schema describing the method's parameters (an object schema), or null/empty when no parameters are defined. The legacy flat-array form is also accepted.</param>
/// <returns>A <see cref="ParameterValidationResult"/> with coerced parameter values on success, or an error message on failure.</returns>
public static ParameterValidationResult Validate(
JsonElement? body,
string? parameterDefinitions)
{
if (string.IsNullOrEmpty(parameterDefinitions))
{
// No parameters defined — body should be empty or null
return ParameterValidationResult.Valid(new Dictionary<string, object?>());
}
List<ParameterDefinition> definitions;
InboundApiSchema? schema;
try
{
definitions = JsonSerializer.Deserialize<List<ParameterDefinition>>(
parameterDefinitions,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
?? [];
schema = InboundApiSchema.Parse(parameterDefinitions);
}
catch (JsonException)
{
return ParameterValidationResult.Invalid("Invalid parameter definitions in method configuration");
}
if (definitions.Count == 0)
// No parameters defined, a non-object schema, or an object schema with no
// declared fields: the body is unconstrained and yields an empty parameter set.
if (schema is null || schema.Type != "object" || schema.Fields.Count == 0)
{
return ParameterValidationResult.Valid(new Dictionary<string, object?>());
}
if (body == null || body.Value.ValueKind == JsonValueKind.Null || body.Value.ValueKind == JsonValueKind.Undefined)
if (body == null
|| body.Value.ValueKind == JsonValueKind.Null
|| body.Value.ValueKind == JsonValueKind.Undefined)
{
// Check if all parameters are optional
var required = definitions.Where(d => d.Required).ToList();
var required = schema.Fields.Where(f => f.Required).ToList();
if (required.Count > 0)
{
return ParameterValidationResult.Invalid(
@@ -62,86 +71,51 @@ public static class ParameterValidator
return ParameterValidationResult.Invalid("Request body must be a JSON object");
}
var result = new Dictionary<string, object?>();
// Recursively type-check the whole body against the declared object
// schema (nested Object fields, List element types, scalars at any
// depth, undeclared-field rejection) with path-qualified errors.
var errors = new List<string>();
// InboundAPI-010: report top-level body fields that do not match any defined
// parameter, so a caller learns about a typo'd parameter name instead of
// having the field silently ignored.
var defined = new HashSet<string>(definitions.Select(d => d.Name), StringComparer.Ordinal);
var unexpected = body.Value.EnumerateObject()
.Select(p => p.Name)
.Where(name => !defined.Contains(name))
.ToList();
if (unexpected.Count > 0)
{
errors.Add($"Unexpected parameter(s): {string.Join(", ", unexpected)}");
}
foreach (var def in definitions)
{
if (body.Value.TryGetProperty(def.Name, out var prop))
{
var (value, error) = CoerceValue(prop, def.Type, def.Name);
if (error != null)
{
errors.Add(error);
}
else
{
result[def.Name] = value;
}
}
else if (def.Required)
{
errors.Add($"Missing required parameter: {def.Name}");
}
}
schema.Validate(body.Value, string.Empty, errors);
if (errors.Count > 0)
{
return ParameterValidationResult.Invalid(string.Join("; ", errors));
}
// Materialize the coerced top-level parameter values for the script.
var result = new Dictionary<string, object?>();
foreach (var field in schema.Fields)
{
if (body.Value.TryGetProperty(field.Name, out var prop))
{
result[field.Name] = Materialize(prop, field.Schema);
}
}
return ParameterValidationResult.Valid(result);
}
/// <summary>
/// Coerces a JSON element to the declared parameter type. InboundAPI-010: the
/// <c>Object</c> and <c>List</c> extended types are validated for JSON <em>shape</em>
/// only (object vs. array) — there is no field-level or element-level type
/// validation. A method script that needs a specific nested structure must
/// validate it itself; invalid nested data surfaces as a runtime script error.
/// Converts a validated JSON element to the CLR value handed to the script.
/// Validation has already passed, so this only shapes the value: scalars to
/// their primitive type, objects to <see cref="Dictionary{TKey,TValue}"/>,
/// arrays to <see cref="List{T}"/>.
/// </summary>
private static (object? value, string? error) CoerceValue(JsonElement element, string expectedType, string paramName)
private static object? Materialize(JsonElement element, InboundApiSchema schema)
{
return expectedType.ToLowerInvariant() switch
if (element.ValueKind == JsonValueKind.Null)
{
"boolean" => element.ValueKind == JsonValueKind.True || element.ValueKind == JsonValueKind.False
? (element.GetBoolean(), null)
: (null, $"Parameter '{paramName}' must be a Boolean"),
return null;
}
"integer" => element.ValueKind == JsonValueKind.Number && element.TryGetInt64(out var intVal)
? (intVal, null)
: (null, $"Parameter '{paramName}' must be an Integer"),
"float" => element.ValueKind == JsonValueKind.Number
? (element.GetDouble(), null)
: (null, $"Parameter '{paramName}' must be a Float"),
"string" => element.ValueKind == JsonValueKind.String
? (element.GetString(), null)
: (null, $"Parameter '{paramName}' must be a String"),
"object" => element.ValueKind == JsonValueKind.Object
? (JsonSerializer.Deserialize<Dictionary<string, object?>>(element.GetRawText()), null)
: (null, $"Parameter '{paramName}' must be an Object"),
"list" => element.ValueKind == JsonValueKind.Array
? (JsonSerializer.Deserialize<List<object?>>(element.GetRawText()), null)
: (null, $"Parameter '{paramName}' must be a List"),
_ => (null, $"Unknown parameter type '{expectedType}' for parameter '{paramName}'")
return schema.Type switch
{
"boolean" => element.GetBoolean(),
"integer" => element.GetInt64(),
"number" => element.GetDouble(),
"string" => element.GetString(),
"object" => JsonSerializer.Deserialize<Dictionary<string, object?>>(element.GetRawText()),
"array" => JsonSerializer.Deserialize<List<object?>>(element.GetRawText()),
_ => JsonSerializer.Deserialize<object?>(element.GetRawText()),
};
}
}
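The validator above delegates to `InboundApiSchema`, whose source is not part of this excerpt. To illustrate what "recursive, with path-qualified errors" means in practice, here is a deliberately small sketch that walks a JSON Schema document directly with `System.Text.Json`. `MiniSchemaValidator` is a hypothetical type, not the real engine; it omits undeclared-field rejection and the legacy flat-array form, and its error wording is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical miniature of a recursive validator; NOT the real InboundApiSchema.
public static class MiniSchemaValidator
{
    public static List<string> Validate(string schemaJson, string valueJson)
    {
        using var schema = JsonDocument.Parse(schemaJson);
        using var value = JsonDocument.Parse(valueJson);
        var errors = new List<string>();
        Check(schema.RootElement, value.RootElement, "", errors);
        return errors;
    }

    private static void Check(JsonElement schema, JsonElement value, string path, List<string> errors)
    {
        var label = path.Length == 0 ? "(root)" : path;
        switch (schema.GetProperty("type").GetString())
        {
            case "object":
                if (value.ValueKind != JsonValueKind.Object) { errors.Add($"{label} must be an object"); return; }
                var required = schema.TryGetProperty("required", out var req)
                    ? req.EnumerateArray().Select(r => r.GetString()).ToHashSet()
                    : new HashSet<string?>();
                if (schema.TryGetProperty("properties", out var props))
                {
                    foreach (var prop in props.EnumerateObject())
                    {
                        // Extend the path with the field name before recursing.
                        var childPath = path.Length == 0 ? prop.Name : $"{path}.{prop.Name}";
                        if (value.TryGetProperty(prop.Name, out var child))
                            Check(prop.Value, child, childPath, errors);
                        else if (required.Contains(prop.Name))
                            errors.Add($"{childPath} is required");
                    }
                }
                break;
            case "array":
                if (value.ValueKind != JsonValueKind.Array) { errors.Add($"{label} must be an array"); return; }
                if (schema.TryGetProperty("items", out var items))
                {
                    var index = 0;
                    foreach (var element in value.EnumerateArray())
                        Check(items, element, $"{path}[{index++}]", errors);
                }
                break;
            case "integer":
                if (value.ValueKind != JsonValueKind.Number || !value.TryGetInt64(out _))
                    errors.Add($"{label} must be an integer");
                break;
            case "number":
                if (value.ValueKind != JsonValueKind.Number) errors.Add($"{label} must be a number");
                break;
            case "string":
                if (value.ValueKind != JsonValueKind.String) errors.Add($"{label} must be a string");
                break;
            case "boolean":
                if (value.ValueKind is not (JsonValueKind.True or JsonValueKind.False))
                    errors.Add($"{label} must be a boolean");
                break;
        }
    }
}
```

With a schema declaring `order.items` as an array of objects that each require an integer `quantity`, a body whose third item carries `"quantity": "three"` yields the single error `order.items[2].quantity must be an integer`, which is the kind of path the doc comment on `ParameterValidator` describes.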
@@ -1,4 +1,5 @@
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
@@ -10,13 +11,20 @@ namespace ZB.MOM.WW.ScadaBridge.InboundAPI;
/// <see cref="ParameterValidator"/>.
///
/// <para>
/// The return definition is a JSON array of <see cref="ReturnFieldDefinition"/>
/// (the same <c>{name,type}</c> shape as a parameter definition). A method whose
/// <c>ReturnDefinition</c> is null/empty is unconstrained — its return value is
/// serialized as-is (backward compatible). Primitive fields (Boolean / Integer /
/// Float / String) are type-checked; the extended <c>Object</c>/<c>List</c> types
/// are shape-checked only (object vs. array), consistent with how
/// <see cref="ParameterValidator"/> treats inbound extended types.
/// The return definition is JSON Schema (the canonical persisted format; the
/// legacy flat <c>[{name,type}]</c> array is still accepted for transition
/// safety). A method whose <c>ReturnDefinition</c> is null/empty is
/// unconstrained — its return value is serialized as-is (backward compatible).
/// </para>
///
/// <para>
/// InboundAPI-M2.6: validation is RECURSIVE and type-aware — declared object
/// fields are validated against their declared (nested) types, list elements
/// against the declared element type, and scalars at any depth — with
/// path-qualified errors. The recursion is shared with
/// <see cref="ParameterValidator"/> via
/// <see cref="ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi.InboundApiSchema"/>,
/// so the inbound and outbound type checks cannot drift apart.
/// </para>
/// </summary>
public static class ReturnValueValidator
@@ -27,8 +35,8 @@ public static class ReturnValueValidator
/// definition is configured or the result conforms to it.
/// </summary>
/// <param name="resultJson">The JSON-serialized script return value to validate.</param>
/// <param name="returnDefinition">JSON-serialized list of <see cref="ReturnFieldDefinition"/> entries, or null/empty to skip validation.</param>
/// <returns>A <see cref="ReturnValidationResult"/> indicating success or describing the first validation failure.</returns>
/// <param name="returnDefinition">JSON Schema describing the method's return value, or null/empty to skip validation. The legacy flat-array form is also accepted.</param>
/// <returns>A <see cref="ReturnValidationResult"/> indicating success or describing the validation failures.</returns>
public static ReturnValidationResult Validate(string? resultJson, string? returnDefinition)
{
if (string.IsNullOrWhiteSpace(returnDefinition))
@@ -37,13 +45,10 @@ public static class ReturnValueValidator
return ReturnValidationResult.Valid();
}
List<ReturnFieldDefinition> fields;
InboundApiSchema? schema;
try
{
fields = JsonSerializer.Deserialize<List<ReturnFieldDefinition>>(
returnDefinition,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })
?? [];
schema = InboundApiSchema.Parse(returnDefinition);
}
catch (JsonException)
{
@@ -51,11 +56,25 @@ public static class ReturnValueValidator
"Invalid return definition in method configuration");
}
if (fields.Count == 0)
// A schema that declares no constraints (e.g. an object schema with no
// fields) leaves the return value unconstrained.
if (schema is null || (schema.Type == "object" && schema.Fields.Count == 0))
{
return ReturnValidationResult.Valid();
}
// INTENTIONAL asymmetry with ParameterValidator:
//
// ParameterValidator has an early-return guard for "schema.Type != object"
// because method parameters are ALWAYS a top-level JSON object (flat map of
// name→value); a non-object parameter schema is treated as unconstrained.
//
// ReturnValueValidator does NOT guard on schema.Type here. A method may
// declare a scalar return type (e.g. {"type":"string"} or {"type":"integer"})
// and the script is expected to return exactly that scalar JSON value.
// Guarding on type == "object" would silently bypass validation for scalar
// and array return schemas — do NOT add that guard here.
if (string.IsNullOrWhiteSpace(resultJson))
{
return ReturnValidationResult.Invalid(
@@ -63,75 +82,37 @@ public static class ReturnValueValidator
}
JsonElement root;
JsonDocument doc;
try
{
using var doc = JsonDocument.Parse(resultJson);
root = doc.RootElement.Clone();
doc = JsonDocument.Parse(resultJson);
}
catch (JsonException)
{
return ReturnValidationResult.Invalid("Script return value is not valid JSON");
}
if (root.ValueKind != JsonValueKind.Object)
using (doc)
{
return ReturnValidationResult.Invalid(
"Method declares a return structure but the script did not return an object");
}
root = doc.RootElement;
var errors = new List<string>();
foreach (var field in fields)
{
if (!root.TryGetProperty(field.Name, out var value))
// A JSON null result against a declared structure is treated as
// "no value returned" (preserves the prior contract).
if (root.ValueKind == JsonValueKind.Null)
{
errors.Add($"missing return field '{field.Name}'");
continue;
return ReturnValidationResult.Invalid(
"Method declares a return structure but the script returned no value");
}
var typeError = CheckFieldType(value, field.Type, field.Name);
if (typeError != null)
errors.Add(typeError);
var errors = new List<string>();
schema.Validate(root, string.Empty, errors);
return errors.Count > 0
? ReturnValidationResult.Invalid(
$"Return value does not match the declared return definition: {string.Join("; ", errors)}")
: ReturnValidationResult.Valid();
}
return errors.Count > 0
? ReturnValidationResult.Invalid(
$"Return value does not match the declared return definition: {string.Join("; ", errors)}")
: ReturnValidationResult.Valid();
}
private static string? CheckFieldType(JsonElement value, string declaredType, string fieldName)
{
// A null value satisfies any field type — the script may legitimately omit
// optional data; only a missing field (handled by the caller) is an error.
if (value.ValueKind == JsonValueKind.Null)
return null;
var ok = declaredType.ToLowerInvariant() switch
{
"boolean" => value.ValueKind is JsonValueKind.True or JsonValueKind.False,
"integer" => value.ValueKind == JsonValueKind.Number && value.TryGetInt64(out _),
"float" => value.ValueKind == JsonValueKind.Number,
"string" => value.ValueKind == JsonValueKind.String,
"object" => value.ValueKind == JsonValueKind.Object,
"list" => value.ValueKind == JsonValueKind.Array,
_ => true, // unknown declared type — do not block the response
};
return ok ? null : $"return field '{fieldName}' must be {declaredType}";
}
}
/// <summary>
/// InboundAPI-014: one field of a method's declared return structure — the
/// deserialized form of an entry in <c>ApiMethod.ReturnDefinition</c>. Defined in
/// this module (not Commons) because the inbound API is currently its only consumer.
/// </summary>
public sealed class ReturnFieldDefinition
{
/// <summary>Field name as it must appear in the script return object.</summary>
public string Name { get; set; } = string.Empty;
/// <summary>Expected JSON type of this field (e.g., "string", "integer", "boolean", "object", "list").</summary>
public string Type { get; set; } = "String";
}
/// <summary>
@@ -0,0 +1,231 @@
using System.Security.Claims;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary>
/// The outcome of a single cookie <c>OnValidatePrincipal</c> evaluation. The thin
/// <c>OnValidatePrincipal</c> lambda translates this into the matching
/// <c>CookieValidatePrincipalContext</c> calls (<c>RejectPrincipal</c> /
/// <c>ReplacePrincipal</c> + <c>ShouldRenew</c>); the decision itself is computed by
/// <see cref="CookieSessionValidator"/> so it is unit-testable in isolation.
/// </summary>
/// <param name="Action">What the caller must do with the principal.</param>
/// <param name="Principal">The replacement principal when <paramref name="Action"/> is <see cref="SessionValidationAction.Replace"/>; otherwise <c>null</c>.</param>
public readonly record struct SessionValidationResult(
SessionValidationAction Action,
ClaimsPrincipal? Principal)
{
/// <summary>Keep the existing principal unchanged.</summary>
public static SessionValidationResult Keep { get; } = new(SessionValidationAction.Keep, null);
/// <summary>Reject the principal (idle-timed-out) — the caller signs the user out.</summary>
public static SessionValidationResult Reject { get; } = new(SessionValidationAction.Reject, null);
/// <summary>Replace the principal with a refreshed one and renew the cookie.</summary>
/// <param name="principal">The rebuilt principal.</param>
/// <returns>A replace result carrying <paramref name="principal"/>.</returns>
public static SessionValidationResult Replace(ClaimsPrincipal principal) =>
new(SessionValidationAction.Replace, principal);
}
/// <summary>The action a cookie session validation requires of the caller.</summary>
public enum SessionValidationAction
{
/// <summary>Leave the principal as-is (no idle timeout, no refresh due, or a refresh error we swallow).</summary>
Keep,
/// <summary>The session is idle-timed-out; reject + sign out.</summary>
Reject,
/// <summary>The role mapping was refreshed; replace the principal and renew the cookie.</summary>
Replace,
}
/// <summary>
/// M2.19 (#15): the unit-testable core of the cookie <c>OnValidatePrincipal</c> event.
/// Enforces the idle timeout and refreshes the session's role/scope claims from the
/// STORED LDAP group claims via the DB-backed <see cref="RoleMapper"/> — <b>without any
/// LDAP call</b> — picking up central role-mapping (and scope-rule) changes mid-session.
/// </summary>
/// <remarks>
/// <para>
/// <b>Idle timeout</b> (default <see cref="SecurityOptions.IdleTimeoutMinutes"/> = 30):
/// computed from the <see cref="JwtTokenService.LastActivityClaimType"/> anchor. This is
/// the explicit, deterministic counterpart to the cookie middleware's
/// <c>ExpireTimeSpan</c> + <c>SlidingExpiration</c> window — both use the SAME idle
/// timeout value, so the explicit check never contradicts the cookie window. The
/// last-activity anchor is NOT advanced on every request: it moves to "now" only when
/// the principal is rebuilt on a role refresh, and the cookie middleware's sliding
/// renewal covers ordinary activity in between.
/// </para>
/// <para>
/// <b>Role refresh</b> (default <see cref="SecurityOptions.RoleRefreshThresholdMinutes"/>
/// = 15): when the elapsed time since <see cref="JwtTokenService.LastRoleRefreshClaimType"/>
/// exceeds the threshold, the stored groups are re-mapped and the principal is rebuilt via
/// <see cref="SessionClaimBuilder"/> (identical shape to <c>/auth/login</c>). If the DB
/// mapping revoked the user's roles, the rebuilt principal reflects the loss.
/// </para>
/// <para>
/// <b>Failure policy</b>: a refresh error (e.g. the mapper throws because the DB is
/// unreachable) NEVER signs the user out and NEVER throws out of validation — it returns
/// <see cref="SessionValidationResult.Keep"/>, mirroring the documented "LDAP failure:
/// active sessions continue with current roles" stance. Only the explicit idle-timeout
/// path rejects.
/// </para>
/// </remarks>
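/// <example>
/// A minimal sketch of the decision order under the default thresholds (idle 30 minutes,
/// role refresh 15 minutes). It assumes <c>validator</c> is an already constructed
/// instance and <c>principal</c> was built by <see cref="SessionClaimBuilder.Build"/>,
/// so both anchors carry the same instant; both variable names are illustrative.
/// <code>
/// var t0 = DateTimeOffset.Parse(
///     principal.FindFirst(JwtTokenService.LastActivityClaimType)!.Value);
///
/// validator.IsIdleTimedOut(principal, t0.AddMinutes(10));   // false
/// validator.IsRoleRefreshDue(principal, t0.AddMinutes(10)); // false: ValidateAsync keeps the principal
/// validator.IsRoleRefreshDue(principal, t0.AddMinutes(20)); // true: ValidateAsync attempts a refresh
/// validator.IsIdleTimedOut(principal, t0.AddMinutes(31));   // true: ValidateAsync rejects before any refresh
/// </code>
/// </example>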
public sealed class CookieSessionValidator
{
private readonly IGroupRoleMapper<string> _roleMapper;
private readonly SecurityOptions _options;
private readonly TimeProvider _timeProvider;
private readonly ILogger<CookieSessionValidator> _logger;
/// <summary>Initializes the validator.</summary>
/// <param name="roleMapper">The DB-backed group→role mapping seam (no LDAP) used for the mid-session refresh.</param>
/// <param name="options">Security options carrying the idle and role-refresh thresholds.</param>
/// <param name="timeProvider">Clock source; injected so tests can advance time deterministically.</param>
/// <param name="logger">Logger instance.</param>
public CookieSessionValidator(
IGroupRoleMapper<string> roleMapper,
IOptions<SecurityOptions> options,
TimeProvider timeProvider,
ILogger<CookieSessionValidator> logger)
{
_roleMapper = roleMapper ?? throw new ArgumentNullException(nameof(roleMapper));
_options = (options ?? throw new ArgumentNullException(nameof(options))).Value;
_timeProvider = timeProvider ?? throw new ArgumentNullException(nameof(timeProvider));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <summary>
/// Evaluates a cookie principal: enforces the idle timeout, then refreshes the
/// role/scope claims from the stored LDAP groups when the role-refresh interval has
/// elapsed. Never throws.
/// </summary>
/// <param name="principal">The current cookie principal under validation.</param>
/// <param name="ct">Cancellation token (the request-aborted token in the pipeline).</param>
/// <returns>The action the caller must take and any replacement principal.</returns>
public async Task<SessionValidationResult> ValidateAsync(ClaimsPrincipal? principal, CancellationToken ct = default)
{
// An unauthenticated / null principal is left to the rest of the pipeline.
if (principal?.Identity is not { IsAuthenticated: true })
{
return SessionValidationResult.Keep;
}
var now = _timeProvider.GetUtcNow();
// 1) Idle-timeout enforcement — the only path that rejects. A missing/unparsable
// last-activity anchor is treated as timed-out (fail-closed): a session we
// cannot age must not be kept alive forever.
if (IsIdleTimedOut(principal, now))
{
_logger.LogInformation(
"Cookie session for {Username} rejected: past the {IdleTimeout}-minute idle timeout.",
principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value ?? "(unknown)",
_options.IdleTimeoutMinutes);
return SessionValidationResult.Reject;
}
// 2) Role-mapping refresh — best-effort. Any failure keeps the existing session.
try
{
var refreshed = await TryRefreshAsync(principal, now, ct).ConfigureAwait(false);
if (refreshed is not null)
{
return SessionValidationResult.Replace(refreshed);
}
}
catch (Exception ex)
{
// SECURITY: never broaden access and never sign the user out on a transient
// refresh fault — keep the existing principal (current roles) and swallow.
_logger.LogWarning(
ex,
"Mid-session role refresh failed for {Username}; keeping existing session and roles.",
principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value ?? "(unknown)");
return SessionValidationResult.Keep;
}
return SessionValidationResult.Keep;
}
/// <summary>
/// Returns true when the session's last-activity anchor is older than
/// <see cref="SecurityOptions.IdleTimeoutMinutes"/>. A missing/unparsable anchor is
/// treated as timed-out (fail-closed).
/// </summary>
/// <param name="principal">The cookie principal.</param>
/// <param name="now">The current instant.</param>
/// <returns><c>true</c> if the session has exceeded the idle window.</returns>
public bool IsIdleTimedOut(ClaimsPrincipal principal, DateTimeOffset now)
{
var claim = principal.FindFirst(JwtTokenService.LastActivityClaimType);
if (claim is null || !DateTimeOffset.TryParse(claim.Value, out var lastActivity))
{
return true;
}
return (now - lastActivity).TotalMinutes > _options.IdleTimeoutMinutes;
}
// Returns a rebuilt principal when the role-refresh interval has elapsed; null when
// nothing changed. The principal is rebuilt via SessionClaimBuilder so its shape is
// identical to /auth/login.
private async Task<ClaimsPrincipal?> TryRefreshAsync(ClaimsPrincipal principal, DateTimeOffset now, CancellationToken ct)
{
var roleRefreshDue = IsRoleRefreshDue(principal, now);
if (!roleRefreshDue)
{
// No mapping refresh due. We deliberately do NOT mint a new principal just to
// advance LastActivity: the cookie middleware's SlidingExpiration already
// renews the cookie window on activity, so the idle anchor only needs
// advancing when we are rebuilding the principal anyway (on a role refresh).
// This keeps the no-op request path allocation-free and avoids a cookie
// re-issue on every request.
return null;
}
var username = principal.FindFirst(JwtTokenService.UsernameClaimType)?.Value;
var displayName = principal.FindFirst(JwtTokenService.DisplayNameClaimType)?.Value;
if (string.IsNullOrEmpty(username) || string.IsNullOrEmpty(displayName))
{
// Malformed principal — cannot rebuild faithfully. Keep it (do not reject).
_logger.LogWarning("Cannot refresh role mapping: principal is missing username/display-name claims.");
return null;
}
var groups = SessionClaimBuilder.ReadGroups(principal);
// Re-run the DB-backed mapping on the STORED groups — NO LDAP call.
var mapping = await _roleMapper.MapAsync(groups, ct).ConfigureAwait(false);
var scope = mapping.Scope is RoleMappingResult mapped
? mapped
: new RoleMappingResult(mapping.Roles, [], IsSystemWideDeployment: false);
// Rebuild identically to /auth/login, advancing BOTH anchors: the role-refresh
// anchor (we just refreshed) and the idle anchor (this is a genuine request).
return SessionClaimBuilder.Build(username, displayName, groups, scope, now);
}
/// <summary>
/// Returns true when the elapsed time since the last role refresh exceeds
/// <see cref="SecurityOptions.RoleRefreshThresholdMinutes"/>. A missing/unparsable
/// anchor is treated as due (refresh now and re-stamp the anchor).
/// </summary>
/// <param name="principal">The cookie principal.</param>
/// <param name="now">The current instant.</param>
/// <returns><c>true</c> if a role-mapping refresh is due.</returns>
public bool IsRoleRefreshDue(ClaimsPrincipal principal, DateTimeOffset now)
{
var claim = principal.FindFirst(JwtTokenService.LastRoleRefreshClaimType);
if (claim is null || !DateTimeOffset.TryParse(claim.Value, out var lastRefresh))
{
return true;
}
return (now - lastRefresh).TotalMinutes > _options.RoleRefreshThresholdMinutes;
}
}
@@ -29,6 +29,22 @@ public class JwtTokenService
public const string SiteIdClaimType = ZbClaimTypes.ScopeId;
public const string LastActivityClaimType = "LastActivity";
// M2.19 (#15): the cookie session now stores the user's raw LDAP groups and a
// role-mapping refresh anchor so an active interactive session can re-run the
// DB-backed RoleMapper (NOT LDAP) mid-session and pick up central role-mapping
// changes. These two have no canonical ZbClaimTypes equivalent (the shared
// vocabulary covers identity/role/scope, not the ScadaBridge-internal refresh
// machinery), so they keep "zb:"-prefixed ScadaBridge-local literals:
// - GroupClaimType ("zb:group", one per LDAP group) is the input the
// mid-session RoleMapper re-run consumes — the groups are the durable
// fact; the roles are the derived projection that can go stale.
// - LastRoleRefreshClaimType ("zb:lastrolerefresh", ISO-8601 "o") anchors
// the role-mapping refresh interval (SecurityOptions.RoleRefreshThresholdMinutes).
// LastActivityClaimType (above) remains the idle-timeout anchor — a separate
// clock from the role-refresh anchor.
public const string GroupClaimType = "zb:group";
public const string LastRoleRefreshClaimType = "zb:lastrolerefresh";
/// <summary>
/// Fixed issuer bound into every token and required on validation. Binding
/// issuer/audience is defence-in-depth: even though the HMAC key is shared only
@@ -1,10 +1,21 @@
namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary>
/// Non-LDAP security configuration: the cookie-embedded JWT signing/lifetime
/// settings and the session idle-timeout / cookie-security policy.
/// Non-LDAP security configuration for the ScadaBridge Central UI.
/// </summary>
/// <remarks>
/// <para>
/// <b>JWT Bearer path (<c>/auth/token</c>)</b>: <see cref="JwtSigningKey"/> and
/// <see cref="JwtExpiryMinutes"/> govern the short-lived Bearer token issued to
/// the CLI / Inbound API. They have no effect on the Blazor cookie session.
/// </para>
/// <para>
/// <b>Blazor cookie session</b>: <see cref="IdleTimeoutMinutes"/> and
/// <see cref="RoleRefreshThresholdMinutes"/> govern the cookie-only session used by
/// the Blazor Server UI. There is no embedded JWT in this path — the cookie is
/// HttpOnly/Secure and managed entirely by ASP.NET Core cookie authentication.
/// </para>
/// <para>
/// Task 1.2/1.4 cutover: the LDAP connection settings that used to live here as
/// flat <c>Ldap*</c> keys (server, port, transport, search base, service account,
/// attributes, timeout) moved into a nested <c>ScadaBridge:Security:Ldap</c>
@@ -12,6 +23,7 @@ namespace ZB.MOM.WW.ScadaBridge.Security;
/// and registered via <c>AddZbLdapAuth</c>. This is a BREAKING config-key change —
/// see CHANGELOG. The non-LDAP fields below are unchanged and still bound from
/// <c>ScadaBridge:Security</c>.
/// </para>
/// </remarks>
public class SecurityOptions
{
@@ -27,7 +39,19 @@ public class SecurityOptions
public const int MinJwtSigningKeyBytes = 32;
/// <summary>Lifetime in minutes of the Bearer JWT issued by <c>/auth/token</c> before it must be refreshed.</summary>
public int JwtExpiryMinutes { get; set; } = 15;
/// <summary>Session idle timeout in minutes; sessions inactive beyond this are expired.</summary>
/// <summary>
/// Session idle timeout in minutes for the Blazor cookie session; sessions inactive
/// beyond this are expired and the user is redirected to <c>/login</c>. Default: <b>30</b>.
/// </summary>
/// <remarks>
/// Because a role refresh (see <see cref="RoleRefreshThresholdMinutes"/>) is the only
/// operation that advances the <c>LastActivity</c> anchor, the anchor can lag the user's
/// real last request by up to <c>RoleRefreshThresholdMinutes</c>. A session is therefore
/// always rejected on the first request after <c>IdleTimeoutMinutes</c> of inactivity,
/// and may be rejected after as little as
/// <c>IdleTimeoutMinutes - RoleRefreshThresholdMinutes</c> of inactivity (15 to 30
/// minutes with defaults).
/// This is intentional and mirrors the cookie middleware's own <c>SlidingExpiration</c>
/// fuzziness. Must be strictly greater than <see cref="RoleRefreshThresholdMinutes"/>
/// (enforced at startup by <see cref="SecurityOptionsValidator"/>).
/// </remarks>
public int IdleTimeoutMinutes { get; set; } = 30;
/// <summary>
@@ -35,6 +59,28 @@ public class SecurityOptions
/// </summary>
public int JwtRefreshThresholdMinutes { get; set; } = 5;
/// <summary>
/// M2.19 (#15): how long a cookie session's role-mapping projection may be stale
/// before <c>OnValidatePrincipal</c> re-runs the DB-backed <c>RoleMapper</c> on the
/// session's stored LDAP group claims and rebuilds the role/scope claims. Default:
/// <b>15 minutes</b>, matching the documented sliding-refresh cadence.
/// </summary>
/// <remarks>
/// This is a purely central (database) refresh — it picks up LDAP-group→role mapping
/// changes and scope-rule changes WITHOUT contacting LDAP, so revoked roles take effect
/// within this window. It does NOT pick up live LDAP group-membership changes (the
/// shared LDAP library exposes no passwordless group-search; that remains a
/// next-login refresh — see Component-Security.md).
/// <para>
/// Because a role refresh is also the only operation that advances the
/// <c>LastActivity</c> anchor, that anchor can lag the user's real last request by up
/// to <c>RoleRefreshThresholdMinutes</c>, so the effective idle window lies between
/// <c><see cref="IdleTimeoutMinutes"/> - RoleRefreshThresholdMinutes</c> and
/// <c>IdleTimeoutMinutes</c> (15 to 30 minutes with defaults). Must be strictly less
/// than <see cref="IdleTimeoutMinutes"/>
/// (enforced at startup by <see cref="SecurityOptionsValidator"/>).
/// </para>
/// </remarks>
public int RoleRefreshThresholdMinutes { get; set; } = 15;
/// <summary>
/// When true (default) the authentication cookie is always marked
/// <c>Secure</c> (sent only over HTTPS) — the correct production setting,
@@ -59,3 +105,38 @@ public class SecurityOptions
/// </summary>
public string CookieName { get; set; } = DefaultCookieName;
}
/// <summary>
/// M2.19 (#15): startup validator for <see cref="SecurityOptions"/>. Fails fast at boot
/// on any configuration that would defeat idle-timeout enforcement.
/// </summary>
/// <remarks>
/// Registered with <c>ValidateOnStart()</c> by
/// <see cref="ServiceCollectionExtensions.AddSecurity"/> so a misconfigured appsettings
/// section is caught at application startup rather than silently misapplied at runtime.
/// </remarks>
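/// <example>
/// A minimal sketch of the guard rejecting an invalid configuration. The option values
/// are illustrative, and the validator is called directly here rather than through the
/// options pipeline.
/// <code>
/// var options = new SecurityOptions { IdleTimeoutMinutes = 15, RoleRefreshThresholdMinutes = 15 };
/// var result = new SecurityOptionsValidator().Validate(null, options);
/// // result.Failed is true because 15 >= 15; the defaults (30 / 15) pass.
/// </code>
/// </example>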
public sealed class SecurityOptionsValidator : Microsoft.Extensions.Options.IValidateOptions<SecurityOptions>
{
/// <inheritdoc/>
public Microsoft.Extensions.Options.ValidateOptionsResult Validate(string? name, SecurityOptions options)
{
// SECURITY: RoleRefreshThresholdMinutes must be strictly less than IdleTimeoutMinutes.
// The role-refresh cycle is the ONLY operation that advances the LastActivity anchor,
// and the idle check runs before the refresh check. If threshold >= idle, every request
// old enough to trigger a refresh (elapsed > threshold) is already past the idle timeout
// (elapsed > idle) and is rejected first, so the anchor is never advanced and even an
// active session is signed out IdleTimeoutMinutes after login.
if (options.RoleRefreshThresholdMinutes >= options.IdleTimeoutMinutes)
{
return Microsoft.Extensions.Options.ValidateOptionsResult.Fail(
$"{nameof(SecurityOptions.RoleRefreshThresholdMinutes)} " +
$"({options.RoleRefreshThresholdMinutes}) must be strictly less than " +
$"{nameof(SecurityOptions.IdleTimeoutMinutes)} " +
$"({options.IdleTimeoutMinutes}). " +
$"A single refresh cycle must not equal or exceed the idle window or idle " +
$"enforcement is defeated.");
}
return Microsoft.Extensions.Options.ValidateOptionsResult.Success;
}
}
@@ -1,5 +1,8 @@
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Auth.Abstractions.Roles;
@@ -51,6 +54,14 @@ public static class ServiceCollectionExtensions
services.AddScoped<JwtTokenService>();
services.AddScoped<RoleMapper>();
// M2.19 (#15): the cookie OnValidatePrincipal core. Scoped to match the
// IGroupRoleMapper<string> it depends on (which depends on the Scoped
// ISecurityRepository). The clock is injected (TimeProvider) so the idle/refresh
// thresholds can be exercised deterministically in tests; the production default
// is the wall clock. TryAddSingleton keeps the Host free to register its own.
services.TryAddSingleton(TimeProvider.System);
services.AddScoped<CookieSessionValidator>();
// Audit Actor wiring (Phase 3): the user-facing inbound API audit path
// sources AuditEvent.Actor from the authenticated principal via this
// seam. HttpAuditActorAccessor reads IHttpContextAccessor.HttpContext?.User
@@ -71,6 +82,16 @@ public static class ServiceCollectionExtensions
// to consume this seam in a later task.
services.AddScoped<IGroupRoleMapper<string>, ScadaBridgeGroupRoleMapper>();
// M2.19 (#15): fail-fast config guard. RoleRefreshThresholdMinutes must be strictly
// less than IdleTimeoutMinutes. If they are equal or inverted, no request can reach a
// role refresh before the idle check rejects it, so the LastActivity anchor never advances.
// SecurityOptionsValidator is registered with ValidateOnStart so a misconfigured
// appsettings section fails at boot with a clear message rather than behaving subtly
// incorrectly at runtime. Config-binding stays with the Host (component library must
// not take IConfiguration), so we only register the validator + ValidateOnStart here.
services.AddOptions<SecurityOptions>().ValidateOnStart();
services.AddSingleton<IValidateOptions<SecurityOptions>, SecurityOptionsValidator>();
// Note: the old SecurityOptionsValidator (which fail-fast-validated LdapServer +
// LdapSearchBase) is gone — those keys moved into the shared LdapOptions, whose
// LdapOptionsValidator (registered with ValidateOnStart by AddZbLdapAuth above)
@@ -94,6 +115,16 @@ public static class ServiceCollectionExtensions
// environments sharing a hostname can be given distinct names. HttpOnly /
// SameSite / SecurePolicy / SlidingExpiration / ExpireTimeSpan are likewise
// applied there via ZbCookieDefaults.Apply.
// M2.19 (#15): OnValidatePrincipal enforces the idle timeout and refreshes
// the role/scope claims from the session's STORED LDAP groups (DB-backed
// RoleMapper, NO LDAP) so central role-mapping changes take effect
// mid-session. The lambda is a THIN adapter: it resolves the request-scoped
// CookieSessionValidator (which holds all the testable idle/refresh logic)
// and translates its decision into the cookie context calls. It NEVER
// throws — CookieSessionValidator.ValidateAsync swallows refresh faults and
// keeps the session (mirrors "LDAP failure: active sessions continue").
options.Events.OnValidatePrincipal = OnValidatePrincipalAsync;
});
// CentralUI-005: configure the cookie session as a sliding window so the
@@ -152,6 +183,70 @@ public static class ServiceCollectionExtensions
return services;
}
/// <summary>
/// M2.19 (#15): the thin <see cref="CookieAuthenticationEvents.OnValidatePrincipal"/>
/// adapter. It resolves the request-scoped <see cref="CookieSessionValidator"/>,
/// asks it for a decision, and applies it to the cookie context:
/// <list type="bullet">
/// <item><see cref="SessionValidationAction.Reject"/> → <see cref="CookieValidatePrincipalContext.RejectPrincipal"/> + sign out (idle-timeout — the only sign-out path).</item>
/// <item><see cref="SessionValidationAction.Replace"/> → <see cref="CookieValidatePrincipalContext.ReplacePrincipal"/> + <c>ShouldRenew = true</c> (role mapping refreshed).</item>
/// <item><see cref="SessionValidationAction.Keep"/> → no-op (no refresh due, or a swallowed refresh fault).</item>
/// </list>
/// All logic lives in <see cref="CookieSessionValidator.ValidateAsync"/>, which never
/// throws, so this adapter cannot bubble an exception out into the request pipeline.
/// </summary>
/// <param name="context">The cookie validation context supplied by the middleware.</param>
/// <returns>A task that completes when the decision has been applied.</returns>
internal static async Task OnValidatePrincipalAsync(CookieValidatePrincipalContext context)
{
var validator = context.HttpContext.RequestServices.GetRequiredService<CookieSessionValidator>();
var result = await validator
.ValidateAsync(context.Principal, context.HttpContext.RequestAborted)
.ConfigureAwait(false);
await ApplyValidationResultAsync(context, result).ConfigureAwait(false);
}
/// <summary>
/// Applies a <see cref="SessionValidationResult"/> to a
/// <see cref="CookieValidatePrincipalContext"/>: the pure decision-application
/// step extracted from <see cref="OnValidatePrincipalAsync"/> so it can be
/// exercised in unit tests without a live DI container resolving
/// <see cref="CookieSessionValidator"/>.
/// </summary>
/// <param name="context">The cookie validation context to mutate.</param>
/// <param name="result">The decision produced by <see cref="CookieSessionValidator.ValidateAsync"/>.</param>
/// <returns>A task that completes when the result has been applied.</returns>
internal static async Task ApplyValidationResultAsync(
CookieValidatePrincipalContext context,
SessionValidationResult result)
{
switch (result.Action)
{
case SessionValidationAction.Reject:
// Idle-timeout: drop the principal AND clear the cookie so the next
// request is treated as anonymous and redirected to /login.
context.RejectPrincipal();
await context.HttpContext
.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme)
.ConfigureAwait(false);
break;
case SessionValidationAction.Replace when result.Principal is not null:
// Role mapping refreshed from stored groups — swap in the rebuilt
// principal and re-issue the cookie so the new claims persist.
context.ReplacePrincipal(result.Principal);
context.ShouldRenew = true;
break;
case SessionValidationAction.Keep:
default:
// Leave the principal untouched.
break;
}
}
/// <summary>
/// Registers security-related Akka actors (placeholder for future actor registrations).
/// </summary>
@@ -0,0 +1,116 @@
using System.Security.Claims;
using Microsoft.AspNetCore.Authentication.Cookies;
namespace ZB.MOM.WW.ScadaBridge.Security;
/// <summary>
/// M2.19 (#15): the single, shared source of truth for the FULL set of claims that
/// back an interactive cookie session. BOTH the <c>/auth/login</c> endpoint and the
/// <c>OnValidatePrincipal</c> mid-session role-refresh path build their principal
/// through <see cref="Build"/>, so the two can never drift — the spec requires the
/// refresh to "rebuild claims identically to /auth/login".
/// </summary>
/// <remarks>
/// The claim shape is exactly what the login endpoint historically minted, plus the
/// two M2.19 additions:
/// <list type="bullet">
/// <item><see cref="ClaimTypes.Name"/> — resolves <c>Identity.Name</c>.</item>
/// <item><see cref="JwtTokenService.DisplayNameClaimType"/> — human display name.</item>
/// <item><see cref="JwtTokenService.UsernameClaimType"/> — canonical username.</item>
/// <item><see cref="JwtTokenService.RoleClaimType"/> — one per mapped role.</item>
/// <item><see cref="JwtTokenService.SiteIdClaimType"/> — one per permitted site,
/// ONLY when the mapping is not system-wide (deny-by-omission preserved).</item>
/// <item><see cref="JwtTokenService.GroupClaimType"/> — one per raw LDAP group
/// (M2.19): the durable input the mid-session RoleMapper re-run consumes.</item>
/// <item><see cref="JwtTokenService.LastRoleRefreshClaimType"/> — the role-mapping
/// refresh anchor (M2.19), ISO-8601 round-trippable.</item>
/// <item><see cref="JwtTokenService.LastActivityClaimType"/> — the idle-timeout
/// anchor; seeded to the refresh timestamp at login so idle-timeout can be
/// enforced consistently from the very first request.</item>
/// </list>
/// The <see cref="ClaimsIdentity"/> is built with <c>nameType = ClaimTypes.Name</c>
/// and <c>roleType = RoleClaimType</c> so <c>Identity.Name</c> / <c>IsInRole</c> /
/// <c>[Authorize(Roles=…)]</c> resolve against exactly the canonical types minted here.
/// </remarks>
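/// <example>
/// A minimal sketch of building a session principal and reading it back. The user,
/// role, site and group values are made up, and the <see cref="RoleMappingResult"/>
/// constructor shape (roles, permitted site ids, system-wide flag) is assumed from its
/// use in <see cref="CookieSessionValidator"/>.
/// <code>
/// var mapping = new RoleMappingResult(["Operator"], ["site-a"], IsSystemWideDeployment: false);
/// var principal = SessionClaimBuilder.Build(
///     "jdoe", "Jane Doe", ["CN=Operators,OU=Groups"], mapping, DateTimeOffset.UtcNow);
///
/// var isOperator = principal.IsInRole("Operator");         // resolved through RoleClaimType
/// var groups = SessionClaimBuilder.ReadGroups(principal);  // the single stored LDAP group
/// </code>
/// </example>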
public static class SessionClaimBuilder
{
/// <summary>
/// Builds the full cookie-session <see cref="ClaimsPrincipal"/> from the resolved
/// identity, the raw LDAP groups, the DB-backed role mapping, and the refresh
/// timestamp. Used identically by <c>/auth/login</c> and the
/// <c>OnValidatePrincipal</c> refresh path so the two cannot diverge.
/// </summary>
/// <param name="username">The canonical authenticated username (becomes <see cref="ClaimTypes.Name"/> + <see cref="JwtTokenService.UsernameClaimType"/>).</param>
/// <param name="displayName">The human-readable display name.</param>
/// <param name="groups">The user's raw LDAP groups, stored one per <see cref="JwtTokenService.GroupClaimType"/> claim.</param>
/// <param name="mapping">The DB-backed role mapping (roles + permitted sites + system-wide flag).</param>
/// <param name="refreshTimestamp">The role-mapping refresh anchor; also seeds the last-activity anchor.</param>
/// <param name="authenticationType">The authentication type stamped on the identity (defaults to the cookie scheme).</param>
/// <returns>A fully populated cookie <see cref="ClaimsPrincipal"/>.</returns>
public static ClaimsPrincipal Build(
string username,
string displayName,
IReadOnlyList<string> groups,
RoleMappingResult mapping,
DateTimeOffset refreshTimestamp,
string authenticationType = CookieAuthenticationDefaults.AuthenticationScheme)
{
ArgumentNullException.ThrowIfNull(username);
ArgumentNullException.ThrowIfNull(displayName);
ArgumentNullException.ThrowIfNull(groups);
ArgumentNullException.ThrowIfNull(mapping);
var refreshStamp = refreshTimestamp.ToString("o");
var claims = new List<Claim>
{
new(ClaimTypes.Name, username),
new(JwtTokenService.DisplayNameClaimType, displayName),
new(JwtTokenService.UsernameClaimType, username),
// Role-refresh anchor AND idle anchor are seeded from the same instant at
// build time. They stay in step afterwards: the OnValidatePrincipal refresh
// path rebuilds the principal through this method, so both anchors advance
// together, and only when the mapping is actually re-run.
new(JwtTokenService.LastRoleRefreshClaimType, refreshStamp),
new(JwtTokenService.LastActivityClaimType, refreshStamp),
};
foreach (var role in mapping.Roles)
{
claims.Add(new Claim(JwtTokenService.RoleClaimType, role));
}
// Deny-by-omission: only stamp SiteId claims for a non-system-wide mapping.
if (!mapping.IsSystemWideDeployment)
{
foreach (var siteId in mapping.PermittedSiteIds)
{
claims.Add(new Claim(JwtTokenService.SiteIdClaimType, siteId));
}
}
// Store the raw LDAP groups so the mid-session refresh can re-run the
// DB-backed RoleMapper without any LDAP round-trip.
foreach (var group in groups)
{
claims.Add(new Claim(JwtTokenService.GroupClaimType, group));
}
var identity = new ClaimsIdentity(
claims,
authenticationType: authenticationType,
nameType: ClaimTypes.Name,
roleType: JwtTokenService.RoleClaimType);
return new ClaimsPrincipal(identity);
}
/// <summary>Reads the stored LDAP group claims (<see cref="JwtTokenService.GroupClaimType"/>) off a principal.</summary>
/// <param name="principal">The cookie principal to read from.</param>
/// <returns>The stored LDAP group names; empty if none were stored.</returns>
public static IReadOnlyList<string> ReadGroups(ClaimsPrincipal principal)
{
ArgumentNullException.ThrowIfNull(principal);
return principal.FindAll(JwtTokenService.GroupClaimType).Select(c => c.Value).ToList();
}
}
@@ -35,4 +35,10 @@
<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.Commons/ZB.MOM.WW.ScadaBridge.Commons.csproj" />
</ItemGroup>
<ItemGroup>
<!-- M2.19 (#15): expose internal members (OnValidatePrincipalAsync adapter) to the
Security test project so the adapter translation can be exercised in isolation. -->
<InternalsVisibleTo Include="ZB.MOM.WW.ScadaBridge.Security.Tests" />
</ItemGroup>
</Project>
@@ -32,10 +32,9 @@ public interface ISiteEventLogger
/// <summary>
/// SiteEventLogging-018: total number of event writes that have failed
/// (SQLite error, disk full, bounded-queue overflow drop, etc.) since this
/// logger was created. Available for future Health Monitoring integration —
/// promoted onto the interface so a Health consumer can read it without a
/// concrete-type downcast. Not yet polled by Health Monitoring; the wiring
/// is tracked separately.
/// logger was created. Polled by <c>SiteEventLogFailureCountReporter</c>
/// (HealthMonitoring — M2.16 / #30) every 30 s and surfaced on the site
/// health report as <c>SiteHealthReport.SiteEventLogWriteFailures</c>.
/// </summary>
long FailedWriteCount { get; }
}
@@ -72,6 +72,15 @@ public class AlarmActor : ReceiveActor
private readonly string? _onTriggerScriptName;
private readonly Script<object?>? _onTriggerCompiledScript;
/// <summary>
/// M2.5 (#9): the on-trigger script's per-script execution timeout in seconds,
/// or null to use the global default. Forwarded to each spawned
/// <see cref="AlarmExecutionActor"/>, which applies <c>perScript ?? global</c>
/// (treating ≤ 0 as "use global"). The value comes from the referenced
/// on-trigger script's <see cref="ResolvedScript.ExecutionTimeoutSeconds"/>.
/// </summary>
private readonly int? _onTriggerExecutionTimeoutSeconds;
// Expression trigger: compiled expression + the attribute snapshot it
// evaluates against. This field is the single home for the compiled
// expression on the hot path.
@@ -107,6 +116,9 @@ public class AlarmActor : ReceiveActor
/// <param name="serviceProvider">Optional DI service provider used to resolve the optional
/// <see cref="ISiteEventLogger"/> for M1.5 <c>alarm</c> operational events. Fire-and-forget;
/// a logging failure never affects alarm evaluation.</param>
/// <param name="onTriggerExecutionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script
/// execution timeout in seconds (from its <see cref="ResolvedScript.ExecutionTimeoutSeconds"/>),
/// or null/non-positive to use the global default.</param>
public AlarmActor(
string alarmName,
string instanceName,
@@ -119,7 +131,9 @@ public class AlarmActor : ReceiveActor
Script<object?>? compiledTriggerExpression = null,
IReadOnlyDictionary<string, object?>? initialAttributes = null,
ISiteHealthCollector? healthCollector = null,
IServiceProvider? serviceProvider = null)
IServiceProvider? serviceProvider = null,
// M2.5 (#9): per-script timeout for the on-trigger script (null = global).
int? onTriggerExecutionTimeoutSeconds = null)
{
_alarmName = alarmName;
_instanceName = instanceName;
@@ -135,6 +149,7 @@ public class AlarmActor : ReceiveActor
_priority = alarmConfig.PriorityLevel;
_onTriggerScriptName = alarmConfig.OnTriggerScriptCanonicalName;
_onTriggerCompiledScript = onTriggerCompiledScript;
_onTriggerExecutionTimeoutSeconds = onTriggerExecutionTimeoutSeconds;
_compiledTriggerExpression = compiledTriggerExpression;
// Seed the trigger-expression attribute snapshot from the instance's
@@ -574,7 +589,9 @@ public class AlarmActor : ReceiveActor
_instanceActor,
_sharedScriptLibrary,
_options,
_logger));
_logger,
// M2.5 (#9): per-script timeout from the on-trigger script (null = global).
_onTriggerExecutionTimeoutSeconds));
Context.ActorOf(props, executionId);
}
@@ -28,6 +28,7 @@ public class AlarmExecutionActor : ReceiveActor
/// <param name="sharedScriptLibrary">Shared script library providing common utilities.</param>
/// <param name="options">Site runtime configuration options, including the execution timeout.</param>
/// <param name="logger">Logger for execution diagnostics.</param>
/// <param name="executionTimeoutSeconds">M2.5 (#9): the on-trigger script's per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
public AlarmExecutionActor(
string alarmName,
string instanceName,
@@ -38,7 +39,10 @@ public class AlarmExecutionActor : ReceiveActor
IActorRef instanceActor,
SharedScriptLibrary sharedScriptLibrary,
SiteRuntimeOptions options,
ILogger logger)
ILogger logger,
// M2.5 (#9): per-script execution timeout override (seconds) for the
// alarm on-trigger script. Null or non-positive falls back to the global.
int? executionTimeoutSeconds = null)
{
var self = Self;
var parent = Context.Parent;
@@ -46,7 +50,8 @@ public class AlarmExecutionActor : ReceiveActor
ExecuteAlarmScript(
alarmName, instanceName, level, priority, message,
compiledScript, instanceActor,
sharedScriptLibrary, options, self, parent, logger);
sharedScriptLibrary, options, self, parent, logger,
executionTimeoutSeconds);
}
private static void ExecuteAlarmScript(
@@ -61,9 +66,15 @@ public class AlarmExecutionActor : ReceiveActor
SiteRuntimeOptions options,
IActorRef self,
IActorRef parent,
ILogger logger)
ILogger logger,
int? executionTimeoutSeconds)
{
var timeout = TimeSpan.FromSeconds(options.ScriptExecutionTimeoutSeconds);
// M2.5 (#9): per-script timeout overrides the global default. A null or
// non-positive per-script value (≤ 0) falls back to the global.
var timeout = TimeSpan.FromSeconds(
executionTimeoutSeconds is { } perScript && perScript > 0
? perScript
: options.ScriptExecutionTimeoutSeconds);
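// Illustrative resolution (values are hypothetical): with a global
// ScriptExecutionTimeoutSeconds of 30, executionTimeoutSeconds = 5 yields
// a 5 s timeout; null, 0, or -1 all yield the 30 s global.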
// SiteRuntime-009: run the alarm on-trigger body on the dedicated
// script-execution scheduler, not the shared .NET thread pool.
@@ -895,11 +895,14 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
}
else
{
// M2.11: set InstanceNotFound=true so the caller can distinguish
// "not deployed on this site" from a deployed-but-empty instance.
_logger.LogWarning(
"Debug view subscribe for unknown instance {Instance}", request.InstanceUniqueName);
Sender.Tell(new DebugViewSnapshot(
request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(),
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow));
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow,
InstanceNotFound: true));
}
}
@@ -919,11 +922,14 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
}
else
{
// M2.11: set InstanceNotFound=true so the caller can distinguish
// "not deployed on this site" from a deployed-but-empty instance.
_logger.LogWarning(
"Debug snapshot for unknown instance {Instance}", request.InstanceUniqueName);
Sender.Tell(new DebugViewSnapshot(
request.InstanceUniqueName, Array.Empty<Commons.Messages.Streaming.AttributeValueChanged>(),
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow));
Array.Empty<Commons.Messages.Streaming.AlarmStateChanged>(), DateTimeOffset.UtcNow,
InstanceNotFound: true));
}
}
@@ -754,6 +754,10 @@ public class InstanceActor : ReceiveActor
foreach (var alarm in _configuration.Alarms)
{
Script<object?>? onTriggerScript = null;
// M2.5 (#9): the on-trigger script's per-script execution timeout,
// captured from its ResolvedScript so the AlarmExecutionActor can
// apply it in place of the global default (null or ≤ 0 = use global).
// Null when there is no on-trigger script.
int? onTriggerTimeoutSeconds = null;
// Compile on-trigger script if defined
if (!string.IsNullOrEmpty(alarm.OnTriggerScriptCanonicalName))
@@ -763,6 +767,7 @@ public class InstanceActor : ReceiveActor
if (triggerScriptDef != null)
{
onTriggerTimeoutSeconds = triggerScriptDef.ExecutionTimeoutSeconds;
var result = _compilationService.Compile(
$"alarm-trigger-{alarm.CanonicalName}", triggerScriptDef.Code);
if (result.IsSuccess)
@@ -794,7 +799,9 @@ public class InstanceActor : ReceiveActor
triggerExpression,
attributeSnapshot,
_healthCollector,
_serviceProvider));
_serviceProvider,
// M2.5 (#9): per-script timeout for the alarm on-trigger script.
onTriggerTimeoutSeconds));
var actorRef = Context.ActorOf(props, $"alarm-{alarm.CanonicalName}");
_alarmActors[alarm.CanonicalName] = actorRef;
@@ -43,6 +43,13 @@ public class ScriptActor : ReceiveActor, IWithTimers
private Script<object?>? _compiledScript;
private ScriptTriggerConfig? _triggerConfig;
private TimeSpan? _minTimeBetweenRuns;
/// <summary>
/// M2.5 (#9): the per-script execution timeout in seconds, or null to use the
/// global default. Threaded down to each spawned <see cref="ScriptExecutionActor"/>,
/// which applies <c>perScript ?? global</c> (and treats ≤ 0 as "use global").
/// </summary>
private readonly int? _executionTimeoutSeconds;
private DateTimeOffset _lastExecutionTime = DateTimeOffset.MinValue;
private int _executionCounter;
private readonly Commons.Types.Scripts.ScriptScope _scope;
@@ -112,6 +119,7 @@ public class ScriptActor : ReceiveActor, IWithTimers
_healthCollector = healthCollector;
_serviceProvider = serviceProvider;
_minTimeBetweenRuns = scriptConfig.MinTimeBetweenRuns;
_executionTimeoutSeconds = scriptConfig.ExecutionTimeoutSeconds;
_scope = scriptConfig.Scope;
_compiledTriggerExpression = compiledTriggerExpression;
@@ -426,7 +434,9 @@ public class ScriptActor : ReceiveActor, IWithTimers
_serviceProvider,
// Audit Log #23 (ParentExecutionId): null for trigger-driven runs;
// an inbound-API-routed call supplies the inbound request's id.
parentExecutionId));
parentExecutionId,
// M2.5 (#9): per-script timeout override (null = use global).
_executionTimeoutSeconds));
Context.ActorOf(props, executionId);
}
@@ -47,6 +47,7 @@ public class ScriptExecutionActor : ReceiveActor
/// <param name="healthCollector">Optional health collector for recording execution metrics.</param>
/// <param name="serviceProvider">Optional DI service provider for script execution services.</param>
/// <param name="parentExecutionId">ExecutionId of the spawning inbound-API execution for audit correlation; null for normal runs.</param>
/// <param name="executionTimeoutSeconds">M2.5 (#9): per-script execution timeout in seconds. Null or non-positive falls back to the global <see cref="SiteRuntimeOptions.ScriptExecutionTimeoutSeconds"/>.</param>
public ScriptExecutionActor(
string scriptName,
string instanceName,
@@ -65,7 +66,10 @@ public class ScriptExecutionActor : ReceiveActor
// Audit Log #23 (ParentExecutionId): the spawning execution's
// ExecutionId for an inbound-API-routed call. Null for normal
// (tag-change / timer) runs and nested Script.Call invocations.
Guid? parentExecutionId = null)
Guid? parentExecutionId = null,
// M2.5 (#9): per-script execution timeout override (seconds). Null or
// non-positive falls back to the global ScriptExecutionTimeoutSeconds.
int? executionTimeoutSeconds = null)
{
// Immediately begin execution
var self = Self;
@@ -75,7 +79,7 @@ public class ScriptExecutionActor : ReceiveActor
scriptName, instanceName, compiledScript, parameters, callDepth,
instanceActor, sharedScriptLibrary, options, replyTo, correlationId,
self, parent, logger, scope, healthCollector, serviceProvider,
parentExecutionId);
parentExecutionId, executionTimeoutSeconds);
}
private static void ExecuteScript(
@@ -95,9 +99,15 @@ public class ScriptExecutionActor : ReceiveActor
Commons.Types.Scripts.ScriptScope scope,
ISiteHealthCollector? healthCollector,
IServiceProvider? serviceProvider,
Guid? parentExecutionId)
Guid? parentExecutionId,
int? executionTimeoutSeconds)
{
var timeout = TimeSpan.FromSeconds(options.ScriptExecutionTimeoutSeconds);
// M2.5 (#9): per-script timeout overrides the global default. A null or
// non-positive per-script value (≤ 0) falls back to the global.
var timeout = TimeSpan.FromSeconds(
executionTimeoutSeconds is { } perScript && perScript > 0
? perScript
: options.ScriptExecutionTimeoutSeconds);
// SiteRuntime-009: run the script body on the dedicated script-execution
// scheduler, not the shared .NET thread pool, so blocking script I/O cannot
@@ -207,7 +217,11 @@ public class ScriptExecutionActor : ReceiveActor
// and the four cached-call telemetry constructors can stamp
// it onto NotificationSubmit.SourceNode and
// SiteCallOperational.SourceNode respectively.
sourceNode: sourceNode);
sourceNode: sourceNode,
// M2.12 (#25): thread the singleton site event logger so
// recursion-limit violations at CallScript/CallShared emit a
// script Error site event in addition to ILogger.LogError.
siteEventLogger: siteEventLogger);
var globals = new ScriptGlobals
{
@@ -13,6 +13,7 @@ using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Audit;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using AuditEvent = ZB.MOM.WW.Audit.AuditEvent;
using ZB.MOM.WW.ScadaBridge.SiteEventLogging;
using ZB.MOM.WW.ScadaBridge.StoreAndForward;
namespace ZB.MOM.WW.ScadaBridge.SiteRuntime.Scripts;
@@ -94,6 +95,13 @@ public class ScriptRuntimeContext
/// </summary>
private readonly string? _sourceScript;
/// <summary>
/// M2.12 (#25): site event logger for recording recursion-limit violations
/// to the local SQLite event log. Optional — when null the emission is
/// skipped; the existing <c>_logger.LogError</c> + throw path is unchanged.
/// </summary>
private readonly ISiteEventLogger? _siteEventLogger;
/// <summary>
/// Audit Log #23: best-effort emitter for boundary-crossing actions executed
/// by the script. Optional — when null the helpers degrade to a no-op audit
@@ -179,6 +187,13 @@ public class ScriptRuntimeContext
/// <paramref name="executionId"/>; this only records the spawner.
/// </param>
/// <param name="sourceNode">Optional cluster node identifier (node-a/node-b) for audit trail stamping.</param>
/// <param name="siteEventLogger">
/// M2.12 (#25): optional site event logger. When supplied, recursion-limit
/// violations at <c>CallScript</c> and <c>CallShared</c> emit a
/// <c>script</c> Error event in addition to the existing
/// <c>ILogger.LogError</c> + throw. When null the existing behaviour is
/// unchanged; all existing callers and tests remain source-compatible.
/// </param>
public ScriptRuntimeContext(
IActorRef instanceActor,
IActorRef self,
@@ -199,7 +214,8 @@ public class ScriptRuntimeContext
ICachedCallTelemetryForwarder? cachedForwarder = null,
Guid? executionId = null,
Guid? parentExecutionId = null,
string? sourceNode = null)
string? sourceNode = null,
ISiteEventLogger? siteEventLogger = null)
{
_instanceActor = instanceActor;
_self = self;
@@ -227,6 +243,44 @@ public class ScriptRuntimeContext
// Audit Log #23 (ParentExecutionId): stored verbatim — no `?? NewGuid()`
// fallback. A non-routed run legitimately has no parent and stays null.
_parentExecutionId = parentExecutionId;
// M2.12 (#25): optional — null when not wired (tests / AlarmExecutionActor).
_siteEventLogger = siteEventLogger;
}
/// <summary>
/// M2.12 (#25): fire-and-forget emission of a <c>script</c> Error site event
/// for a recursion-limit violation. Mirrors the call shape used by
/// <c>ScriptExecutionActor</c>'s catch blocks (WP-32 / M1.8). A fault from
/// the site-event logger is observed-and-dropped (best-effort) via
/// <c>ContinueWith(OnlyOnFaulted)</c> — it never blocks or faults the
/// <c>_logger.LogError</c> + throw path that follows. A null logger is a no-op.
/// </summary>
private void EmitRecursionLimitEventAsync(string msg)
{
if (_siteEventLogger == null)
return;
var source = string.IsNullOrEmpty(_instanceName)
? "recursion-guard"
: $"InstanceScript:{_instanceName}";
var logTask = _siteEventLogger.LogEventAsync("script", "Error", _instanceName, source, msg);
if (!logTask.IsCompleted)
{
logTask.ContinueWith(
t => _logger.LogWarning(t.Exception,
"Site event log write failed for recursion-limit violation on instance '{Instance}'",
_instanceName),
CancellationToken.None,
TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default);
}
else if (logTask.IsFaulted)
{
_logger.LogWarning(logTask.Exception,
"Site event log write failed for recursion-limit violation on instance '{Instance}'",
_instanceName);
}
}
/// <summary>
@@ -302,6 +356,8 @@ public class ScriptRuntimeContext
var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " +
$"CallScript('{scriptName}') rejected at depth {nextDepth}.";
_logger.LogError(msg);
// M2.12 (#25): emit to site event log in addition to ILogger; fire-and-forget.
EmitRecursionLimitEventAsync(msg);
throw new InvalidOperationException(msg);
}
@@ -464,6 +520,9 @@ public class ScriptRuntimeContext
var msg = $"Script call depth exceeded maximum of {_maxCallDepth}. " +
$"CallShared('{scriptName}') rejected at depth {nextDepth}.";
_logger.LogError(msg);
// M2.12 (#25): emit to site event log via the parent context's
// helper — single emission path, fire-and-forget.
_context.EmitRecursionLimitEventAsync(msg);
throw new InvalidOperationException(msg);
}
@@ -1326,9 +1385,20 @@ public class ScriptRuntimeContext
name, trackedId, target, occurredAtUtc, cancellationToken)
.ConfigureAwait(false);
// M2.3 (#7): the gateway now attempts the write immediately and
// classifies the outcome (mirroring ExternalSystem.CachedCall). The
// result is retained because the immediate paths (WasBuffered=false —
// immediate success OR a synchronous permanent failure) bypass the
// S&F retry loop entirely, so no retry-loop telemetry ever fires.
// This helper must emit the Attempted + CachedResolve terminal rows
// itself, otherwise Tracking.Status(id) would stay Submitted forever
// and the audit log would be missing the terminal lifecycle. The
// WasBuffered=true path is unaffected — the S&F retry loop owns the
// Attempted + Resolve emissions there.
ExternalCallResult? result;
try
{
await _gateway.CachedWriteAsync(
result = await _gateway.CachedWriteAsync(
name, sql, parameters, _instanceName, cancellationToken, trackedId,
// Audit Log #23 (ExecutionId Task 4): thread the script
// execution's ExecutionId + SourceScript so a buffered
@@ -1350,9 +1420,148 @@ public class ScriptRuntimeContext
throw;
}
// M2.3 (#7): immediate-completion lifecycle — emit the missing
// Attempted + CachedResolve rows when the underlying write resolved
// without engaging the store-and-forward retry loop (immediate
// success or a synchronous permanent failure).
if (result is { WasBuffered: false })
{
await EmitImmediateDbTerminalTelemetryAsync(
name, target, trackedId, result, cancellationToken)
.ConfigureAwait(false);
}
return trackedId;
}
/// <summary>
/// M2.3 (#7): best-effort emission of the immediate-completion lifecycle
/// for a <c>Database.CachedWrite</c> that resolved without the S&amp;F
/// retry loop — emits an <c>Attempted</c> row then a terminal
/// <c>CachedResolve</c> row (<c>Delivered</c> on success, <c>Failed</c> on
/// a synchronous permanent SQL error). The DB parallel of
/// <see cref="EmitImmediateTerminalTelemetryAsync"/>. Any forwarder
/// failure is logged and swallowed (alog.md §7).
/// </summary>
private async Task EmitImmediateDbTerminalTelemetryAsync(
string connectionName,
string target,
TrackedOperationId trackedId,
ExternalCallResult result,
CancellationToken cancellationToken)
{
if (_cachedForwarder == null)
{
return;
}
var occurredAtUtc = DateTime.UtcNow;
// Status mapping mirrors the API path: success -> Delivered, a
// synchronous permanent failure -> Failed. A transient failure never
// reaches here (WasBuffered=true), so "the immediate attempt failed
// and the operation is done" always means a permanent failure.
var auditTerminalStatus = result.Success ? AuditStatus.Delivered : AuditStatus.Failed;
var operationalTerminalStatus = result.Success ? "Delivered" : "Failed";
// --- Attempted row -------------------------------------------------
CachedCallTelemetry? attempted = TryBuildDbTerminalTelemetry(
connectionName, target, trackedId, occurredAtUtc,
AuditKind.DbWriteCached, AuditStatus.Attempted, "Attempted",
result, isTerminal: false);
if (attempted is not null)
{
try
{
await _cachedForwarder.ForwardAsync(attempted, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-Attempted telemetry forward failed for Database.CachedWrite {Connection} (TrackedOperationId {Id})",
connectionName, trackedId);
}
}
// --- CachedResolve row --------------------------------------------
CachedCallTelemetry? resolve = TryBuildDbTerminalTelemetry(
connectionName, target, trackedId, occurredAtUtc,
AuditKind.CachedResolve, auditTerminalStatus, operationalTerminalStatus,
result, isTerminal: true);
if (resolve is not null)
{
try
{
await _cachedForwarder.ForwardAsync(resolve, cancellationToken)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Immediate-CachedResolve telemetry forward failed for Database.CachedWrite {Connection} (TrackedOperationId {Id})",
connectionName, trackedId);
}
}
}
/// <summary>
/// Builds one immediate-completion <c>DbOutbound</c> telemetry packet, or
/// returns <c>null</c> (and logs) when construction throws — so a build
/// failure skips emission rather than aborting the script.
/// </summary>
private CachedCallTelemetry? TryBuildDbTerminalTelemetry(
string connectionName,
string target,
TrackedOperationId trackedId,
DateTime occurredAtUtc,
AuditKind kind,
AuditStatus auditStatus,
string operationalStatus,
ExternalCallResult result,
bool isTerminal)
{
try
{
return new CachedCallTelemetry(
Audit: ScadaBridgeAuditEventFactory.Create(
channel: AuditChannel.DbOutbound,
kind: kind,
status: auditStatus,
occurredAtUtc: DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc),
target: target,
correlationId: trackedId.Value,
executionId: _executionId,
parentExecutionId: _parentExecutionId,
sourceSiteId: string.IsNullOrEmpty(_siteId) ? null : _siteId,
sourceInstanceId: _instanceName,
sourceScript: _sourceScript,
errorMessage: result.Success ? null : result.ErrorMessage),
Operational: new SiteCallOperational(
TrackedOperationId: trackedId,
Channel: "DbOutbound",
Target: target,
SourceSite: _siteId,
SourceNode: _sourceNode,
Status: operationalStatus,
RetryCount: 0,
LastError: result.Success ? null : result.ErrorMessage,
HttpStatus: null,
CreatedAtUtc: occurredAtUtc,
UpdatedAtUtc: occurredAtUtc,
TerminalAtUtc: isTerminal ? occurredAtUtc : null));
}
catch (Exception buildEx)
{
_logger.LogWarning(buildEx,
"Failed to build immediate-{Kind} telemetry for Database.CachedWrite {Connection} (TrackedOperationId {Id}) — skipping emission",
kind, connectionName, trackedId);
return null;
}
}
private async Task EmitCachedDbSubmitTelemetryAsync(
string connectionName,
TrackedOperationId trackedId,
@@ -42,6 +42,13 @@ public class DiffService
s => s.CanonicalName,
ScriptsEqual);
// TemplateEngine-018: surface standalone connection endpoint/protocol/
// failover drift. Per-attribute binding changes already show up under
// AttributeChanges, but a connection's own ConfigurationJson /
// BackupConfigurationJson / Protocol / FailoverRetryCount edits do not —
// those only appear here.
var connectionChanges = ComputeConnectionsDiff(oldConfig, newConfig);
return new ConfigurationDiff
{
InstanceUniqueName = newConfig.InstanceUniqueName,
@@ -49,7 +56,8 @@ public class DiffService
NewRevisionHash = newRevisionHash,
AttributeChanges = attributeChanges,
AlarmChanges = alarmChanges,
ScriptChanges = scriptChanges
ScriptChanges = scriptChanges,
ConnectionChanges = connectionChanges
};
}
@@ -133,7 +141,8 @@ public class DiffService
a.TriggerConfiguration == b.TriggerConfiguration &&
a.ParameterDefinitions == b.ParameterDefinitions &&
a.ReturnDefinition == b.ReturnDefinition &&
a.MinTimeBetweenRuns == b.MinTimeBetweenRuns;
a.MinTimeBetweenRuns == b.MinTimeBetweenRuns &&
a.ExecutionTimeoutSeconds == b.ExecutionTimeoutSeconds;
/// <summary>
/// Compares two <see cref="ConnectionConfig"/> instances for equality across
@@ -159,11 +168,10 @@ public class DiffService
/// TemplateEngine-018: produces a per-connection diff between two flattened
/// configurations, emitting Added / Removed / Changed entries keyed by the
/// connection name. Mirrors the existing <see cref="ComputeEntityDiff{T}"/>
/// shape used for attributes / alarms / scripts but is exposed as a separate
/// method because <see cref="ConfigurationDiff"/> in
/// <c>ZB.MOM.WW.ScadaBridge.Commons</c> does not yet carry a <c>ConnectionChanges</c>
/// slot — the public diff record will be extended in a paired Commons change
/// (this file is the only one in this fix's scope). A null
/// shape used for attributes / alarms / scripts. Called by
/// <see cref="ComputeDiff"/> to populate
/// <see cref="ConfigurationDiff.ConnectionChanges"/>, and exposed publicly so
/// callers can compute connection drift in isolation. A null
/// <c>Connections</c> dictionary on either side is treated as the empty map.
/// </summary>
/// <param name="oldConfig">The previously deployed configuration, or null
@@ -830,6 +830,10 @@ public class FlatteningService
ParameterDefinitions = script.ParameterDefinitions,
ReturnDefinition = script.ReturnDefinition,
MinTimeBetweenRuns = script.MinTimeBetweenRuns,
// M2.5 (#9): per-script timeout rides along on the winning row.
// Scripts inherit/override at whole-row granularity (no per-field
// merge), so this follows the same rule as the script body/MinTime.
ExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds,
Source = source
};
idByName[script.Name] = script.Id;
@@ -83,7 +83,10 @@ public class RevisionHashService
TriggerConfiguration = s.TriggerConfiguration,
ParameterDefinitions = s.ParameterDefinitions,
ReturnDefinition = s.ReturnDefinition,
MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks
MinTimeBetweenRunsTicks = s.MinTimeBetweenRuns?.Ticks,
// M2.5 (#9): include the per-script timeout so a change to it
// is detected as a configuration change (staleness/redeploy).
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds
})
.ToList(),
Connections = configuration.Connections is { Count: > 0 }
@@ -244,6 +247,10 @@ public class RevisionHashService
/// </summary>
public string Code { get; init; } = string.Empty;
/// <summary>
/// M2.5 (#9): the per-script execution timeout in seconds (null = global).
/// </summary>
public int? ExecutionTimeoutSeconds { get; init; }
/// <summary>
/// Whether the script is locked.
/// </summary>
public bool IsLocked { get; init; }
@@ -17,7 +17,7 @@ namespace ZB.MOM.WW.ScadaBridge.TemplateEngine;
/// Override granularity:
/// - Attributes: Value and Description overridable; DataType and DataSourceReference fixed.
/// - Alarms: Priority, TriggerConfiguration, Description, OnTriggerScript overridable; Name and TriggerType fixed.
/// - Scripts: Code, TriggerConfiguration, MinTimeBetweenRuns, params/return overridable; Name fixed.
/// - Scripts: Code, TriggerConfiguration, MinTimeBetweenRuns, ExecutionTimeoutSeconds, params/return overridable; Name fixed.
/// - Lock flag applies to the entire member (attribute/alarm/script).
/// </summary>
public static class LockEnforcer
@@ -687,6 +687,8 @@ public class TemplateService
existing.TriggerType = proposed.TriggerType;
existing.TriggerConfiguration = proposed.TriggerConfiguration;
existing.MinTimeBetweenRuns = proposed.MinTimeBetweenRuns;
// M2.5 (#9): per-script execution timeout is an overridable field.
existing.ExecutionTimeoutSeconds = proposed.ExecutionTimeoutSeconds;
existing.ParameterDefinitions = proposed.ParameterDefinitions;
existing.ReturnDefinition = proposed.ReturnDefinition;
existing.IsLocked = proposed.IsLocked;
@@ -1013,6 +1015,7 @@ public class TemplateService
ParameterDefinitions = script.ParameterDefinitions,
ReturnDefinition = script.ReturnDefinition,
MinTimeBetweenRuns = script.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = script.ExecutionTimeoutSeconds,
IsInherited = true,
LockedInDerived = false,
});
@@ -80,6 +80,7 @@ public class SemanticValidator
else
{
ValidateCallParameters(script.CanonicalName, call, sharedParamMap, errors);
ValidateCallReturnType(script.CanonicalName, call, sharedReturnMap, errors);
}
}
else
@@ -94,6 +95,7 @@ public class SemanticValidator
else
{
ValidateCallParameters(script.CanonicalName, call, scriptParamMap, errors);
ValidateCallReturnType(script.CanonicalName, call, scriptReturnMap, errors);
// Instance scripts cannot call alarm on-trigger scripts
if (alarmOnTriggerScripts.Contains(call.TargetName))
@@ -262,6 +264,109 @@ public class SemanticValidator
errors.Add(ValidationEntry.Error(ValidationCategory.ParameterMismatch,
$"Script '{callerName}' calls '{call.TargetName}' with {call.ArgumentCount} arguments but {expectedParams.Count} are expected.",
callerName));
// Count mismatch already reported — positional type matching below
// would be misaligned, so don't compound the noise.
return;
}
ValidateArgumentTypes(callerName, call, expectedParams, errors);
}
/// <summary>
/// #21 — Argument-type validation. Compares each positionally-matched call
/// argument expression against the target's declared parameter type and
/// flags only CLEAR cross-category mismatches.
///
/// Conservatism (false-positive avoidance) — a parameter is checked only
/// when BOTH sides are confidently known:
/// <list type="bullet">
/// <item>Declared type must normalize to a known primitive (String, Integer,
/// Float, Boolean). <c>Object</c>/<c>List</c>/unknown declarations accept
/// anything — never flagged.</item>
/// <item>Argument expression type must be inferable from a literal
/// (string/char, integer, decimal, <c>true</c>/<c>false</c>). Variables,
/// member access, method/await chains, <c>null</c>, casts, object/array
/// initializers, and anything else infer to Unknown and are never flagged.</item>
/// <item>Integer⇄Float is treated as compatible (numeric widening) — never
/// flagged.</item>
/// </list>
/// </summary>
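/// <example>
/// Illustrative only, assuming a hypothetical target <c>Foo</c> that declares
/// a single <c>Integer</c> parameter: <c>CallScript("Foo", "abc")</c> is
/// flagged (String literal vs. Integer), while <c>CallScript("Foo", 1.5)</c>
/// (numeric widening) and <c>CallScript("Foo", someVariable)</c> (type not
/// inferable) are not.
/// </example>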
private static void ValidateArgumentTypes(
string callerName,
CallTarget call,
List<string> expectedParams,
List<ValidationEntry> errors)
{
// Argument expressions are aligned 1:1 with parameters here (count was
// verified equal by the caller). If the argument text couldn't be split
// (e.g. it wasn't captured), skip silently.
if (call.ArgumentExpressions.Count != expectedParams.Count)
return;
for (var i = 0; i < expectedParams.Count; i++)
{
var declared = NormalizeType(expectedParams[i]);
if (declared is null)
continue; // Object/List/unknown declaration accepts anything.
var actual = InferLiteralType(call.ArgumentExpressions[i]);
if (actual is null)
continue; // Can't confidently infer the argument's type.
if (!IsAssignable(actual.Value, declared.Value))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ParameterMismatch,
$"Script '{callerName}' calls '{call.TargetName}' argument {i + 1} with type '{actual}' but parameter '{expectedParams[i]}' expects '{declared}'.",
callerName));
}
}
}
/// <summary>
/// #20 — Return-type validation. When a call result is assigned directly
/// into a typed local declaration (<c>int x = CallScript(...)</c>,
/// <c>bool b = await CallShared(...)</c>), compares the LHS declared type
/// against the target's declared return type and flags clear mismatches.
///
/// Conservatism (false-positive avoidance) — flagged only when ALL hold:
/// <list type="bullet">
/// <item>The call result is captured by a typed local whose type is a known
/// primitive (so <c>var</c>, <c>object</c>, <c>dynamic</c>, and untyped
/// reuse are never flagged).</item>
/// <item>The call is the WHOLE initializer (optionally preceded by
/// <c>await</c>). If the result feeds an expression / method chain
/// (e.g. <c>(int)(await CallScript(...))</c>, <c>CallScript(...).X</c>)
/// the assigned-type is not captured and nothing is flagged.</item>
/// <item>The target declares a known-primitive return type. Missing/Object/
/// List/unknown returns are never flagged.</item>
/// <item>Integer⇄Float is compatible (numeric widening) — never flagged.</item>
/// </list>
/// </summary>
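/// <example>
/// Illustrative only, assuming a hypothetical target <c>Foo</c> whose
/// ReturnDefinition declares <c>boolean</c>: <c>int x = CallScript("Foo");</c>
/// is flagged, while <c>var x = CallScript("Foo");</c> and
/// <c>int x = (int)(await CallScript("Foo"));</c> are not.
/// </example>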
private static void ValidateCallReturnType(
string callerName,
CallTarget call,
Dictionary<string, string?> returnMap,
List<ValidationEntry> errors)
{
if (call.AssignedToType is null)
return; // Result not captured by a typed local (var/untyped/unused).
var expected = NormalizeType(call.AssignedToType);
if (expected is null)
return; // LHS isn't a known primitive — don't guess.
if (!returnMap.TryGetValue(call.TargetName, out var returnDef))
return;
var actual = NormalizeType(ParseReturnDefinitionType(returnDef));
if (actual is null)
return; // Target's return type unknown/non-primitive.
if (!IsAssignable(actual.Value, expected.Value))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ReturnTypeMismatch,
$"Script '{callerName}' assigns the '{actual}' return value of '{call.TargetName}' to a '{expected}' variable.",
callerName));
}
}
@@ -270,12 +375,90 @@ public class SemanticValidator
var result = new Dictionary<string, List<string>>(StringComparer.Ordinal);
foreach (var script in scripts)
{
var parameters = ParseParameterDefinitions(script.ParameterDefinitions);
// Per-parameter declared TYPE in declared order (raw type strings).
// One entry per parameter, so the existing count check is preserved
// while #21 also has the types it needs for positional matching.
var parameters = ParseParameterTypes(script.ParameterDefinitions);
result[script.CanonicalName] = parameters;
}
return result;
}
/// <summary>
/// Parses a parameter definitions JSON string (JSON Schema or legacy flat
/// array) and returns the declared parameter TYPE for each parameter, in
/// declared order. Names are not needed for positional call validation; the
/// returned count equals the parameter count (preserving the count check).
/// </summary>
/// <param name="parameterDefinitionsJson">JSON Schema or legacy flat-array string; null/empty returns an empty list.</param>
/// <returns>The per-parameter raw type strings (e.g. "Int32", "string", "List").</returns>
internal static List<string> ParseParameterTypes(string? parameterDefinitionsJson)
{
if (string.IsNullOrWhiteSpace(parameterDefinitionsJson))
return [];
try
{
using var doc = JsonDocument.Parse(parameterDefinitionsJson);
// JSON Schema: { type:"object", properties:{ name:{ type:"integer" }, ... } }
if (doc.RootElement.ValueKind == JsonValueKind.Object)
{
if (doc.RootElement.TryGetProperty("properties", out var props)
&& props.ValueKind == JsonValueKind.Object)
{
return props.EnumerateObject()
.Select(p => p.Value.ValueKind == JsonValueKind.Object
&& p.Value.TryGetProperty("type", out var t)
&& t.ValueKind == JsonValueKind.String
? t.GetString() ?? "unknown"
: "unknown")
.ToList();
}
}
// Legacy flat form: [{ name, type, required? }]
else if (doc.RootElement.ValueKind == JsonValueKind.Array)
{
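return doc.RootElement.EnumerateArray()
// Guard the element and property kinds: TryGetProperty/GetString throw
// InvalidOperationException (not JsonException) on a non-object element
// or a non-string "type" value.
.Select(e => e.ValueKind == JsonValueKind.Object
&& e.TryGetProperty("type", out var t)
&& t.ValueKind == JsonValueKind.String
? t.GetString() ?? "unknown"
: "unknown")
.ToList();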
}
}
catch (JsonException)
{
// Malformed JSON: fall through and treat as no declared parameters.
}
return [];
}
/// <summary>
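/// Extracts the declared return type from a ReturnDefinition JSON string by
/// reading the top-level <c>type</c> property of a JSON object (the shape
/// <c>{type:"..."}</c>, shared by the JSON Schema and legacy forms). Returns
/// null when the input is empty, is not valid JSON, is not an object, or has
/// no string-valued <c>type</c> property.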
/// </summary>
/// <param name="returnDefinitionJson">JSON return definition; null/empty returns null.</param>
/// <returns>The raw return type string (e.g. "boolean", "Int32"), or null.</returns>
internal static string? ParseReturnDefinitionType(string? returnDefinitionJson)
{
if (string.IsNullOrWhiteSpace(returnDefinitionJson))
return null;
try
{
using var doc = JsonDocument.Parse(returnDefinitionJson);
if (doc.RootElement.ValueKind == JsonValueKind.Object
&& doc.RootElement.TryGetProperty("type", out var t)
&& t.ValueKind == JsonValueKind.String)
{
return t.GetString();
}
}
catch (JsonException)
{
// Malformed JSON: fall through and report the return type as unknown.
}
return null;
}
private static Dictionary<string, string?> BuildReturnMap(IReadOnlyList<ResolvedScript> scripts)
{
var result = new Dictionary<string, string?>(StringComparer.Ordinal);
@@ -353,12 +536,22 @@ public class SemanticValidator
var target = ExtractStringArgument(code, argsStart);
if (target != null)
{
var argCount = CountArguments(code, argsStart);
// First argument is the script name; the rest are the call's
// positional arguments.
var args = SplitCallArguments(code, argsStart);
var argExpressions = args.Count > 1
? args.GetRange(1, args.Count - 1)
: new List<string>();
results.Add(new CallTarget
{
TargetName = target,
IsShared = isShared,
ArgumentCount = Math.Max(0, argCount - 1) // First arg is the name, rest are parameters
ArgumentCount = argExpressions.Count,
ArgumentExpressions = argExpressions,
// #20: the declared type the result is assigned into, if the
// call is the whole initializer of a typed local declaration.
AssignedToType = ExtractAssignedToType(code, idx)
});
}
@@ -366,6 +559,372 @@ public class SemanticValidator
}
}
/// <summary>
/// Splits a call's argument list (starting just after the opening paren)
/// into top-level argument expressions, trimmed. Tracks parenthesis, brace,
/// and bracket nesting plus string/char literals so object initializers,
/// nested calls, collection expressions, and commas inside literals don't
/// produce spurious splits. Element 0 is the script-name argument.
/// </summary>
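/// <example>
/// Illustrative trace, assuming a hypothetical call site and a
/// <c>startPos</c> just after the opening paren of
/// <c>CallScript("Foo", 1, new { A = 1, B = 2 }, Bar(3, 4))</c>:
/// the result is the four slices <c>"Foo"</c>, <c>1</c>,
/// <c>new { A = 1, B = 2 }</c>, and <c>Bar(3, 4)</c>. The commas inside the
/// braces and inside the nested call are not split points.
/// </example>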
private static List<string> SplitCallArguments(string code, int startPos)
{
var args = new List<string>();
var depthParen = 1; // we start inside the call's own '('
var depthBraceBracket = 0;
var pos = startPos;
var argStart = startPos;
while (pos < code.Length)
{
var c = code[pos];
switch (c)
{
case '(':
depthParen++;
break;
case ')':
depthParen--;
if (depthParen == 0)
{
AddArg(code, argStart, pos, args);
return args;
}
break;
case '{':
case '[':
depthBraceBracket++;
break;
case '}':
case ']':
if (depthBraceBracket > 0) depthBraceBracket--;
break;
case ',' when depthParen == 1 && depthBraceBracket == 0:
AddArg(code, argStart, pos, args);
argStart = pos + 1;
break;
case '"':
case '\'':
// Skip the literal body so its delimiters/commas are ignored.
pos++;
while (pos < code.Length && code[pos] != c)
{
if (code[pos] == '\\') pos++; // skip escaped char
pos++;
}
break;
case '/':
// Skip C# line and block comments so commas inside them are ignored.
// A `/` inside a string literal is already consumed above, so we only
// reach here for real `/` tokens in code.
if (pos + 1 < code.Length)
{
if (code[pos + 1] == '/')
{
// Line comment: skip to end-of-line.
pos += 2;
while (pos < code.Length && code[pos] != '\n') pos++;
}
else if (code[pos + 1] == '*')
{
// Block comment: skip to closing `*/`.
pos += 2;
while (pos + 1 < code.Length && !(code[pos] == '*' && code[pos + 1] == '/'))
pos++;
if (pos + 1 < code.Length) pos++; // step over the `/`
}
}
break;
}
pos++;
}
// Unterminated call (shouldn't happen for compilable code) — best effort.
AddArg(code, argStart, code.Length, args);
return args;
static void AddArg(string code, int start, int end, List<string> acc)
{
var text = code[start..end].Trim();
// Empty slices are dropped unless they are element 0, so a trailing comma
// after the name (e.g. "foo",) adds nothing, and a call with only the name
// ("foo") still yields just the name.
if (text.Length > 0 || acc.Count == 0)
acc.Add(text);
}
}
/// <summary>
/// #20 inference — looks backwards from the call's start index for a typed
/// local declaration whose initializer is exactly this call (optionally
/// preceded by <c>await</c>). The call may be qualified by a simple receiver
/// (<c>Instance.</c>, <c>Scripts.</c>, <c>Parent.</c>,
/// <c>Children["x"].</c>) which is skipped. Returns the declared LHS type
/// token, or null when the result isn't captured by a simple typed local
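/// (no assignment, reassignment to an existing variable, compound assignment,
/// or the call is part of a larger expression such as a cast or longer
/// member-access chain). A <c>var</c> declaration returns the token
/// <c>var</c>, which <see cref="NormalizeType"/> maps to unknown.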
/// </summary>
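/// <example>
/// Illustrative outcomes for hypothetical call sites, assuming
/// <c>callIndex</c> points at the <c>CallScript</c> token:
/// <c>int n = CallScript("Foo");</c> yields <c>int</c>;
/// <c>bool ok = await Instance.CallScript("Foo");</c> yields <c>bool</c>;
/// <c>total += CallScript("Foo");</c> and
/// <c>if (x == CallScript("Foo"))</c> both yield null.
/// </example>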
private static string? ExtractAssignedToType(string code, int callIndex)
{
// Walk back over a simple dotted receiver immediately before the call —
// e.g. the "Instance." / "Scripts." / "Children[\"x\"]." prefix on a
// qualified call. Only identifier chars, '.', and bracketed indexers
// (with string/identifier contents) are skipped; anything else (a ')',
// an operator, another call's '(') means the call is embedded in a
// larger expression and we must not infer.
var receiverStart = SkipReceiverBackwards(code, callIndex);
// Walk back over whitespace immediately before the receiver/call.
var i = receiverStart - 1;
while (i >= 0 && char.IsWhiteSpace(code[i])) i--;
if (i < 0) return null;
// The call must be the entire RHS: the char before it (after optional
// 'await') must be '='. Anything else (')', '.', '(', operators) means
// the result is consumed by a larger expression — don't infer.
var beforeCall = code[..(i + 1)];
// Strip a trailing 'await' so "= await CallScript(...)" is handled.
var awaitTrimmed = beforeCall.TrimEnd();
if (awaitTrimmed.EndsWith("await", StringComparison.Ordinal)
&& (awaitTrimmed.Length == 5 || !IsIdentifierChar(awaitTrimmed[^6])))
{
beforeCall = awaitTrimmed[..^5];
}
beforeCall = beforeCall.TrimEnd();
if (!beforeCall.EndsWith('=')) return null;
// Exclude '==', '<=', '>=', '!=' (comparisons) and compound assignments such
// as '+=' or '|='; neither is a plain assignment into a declaration.
if (beforeCall.Length >= 2)
{
var prev = beforeCall[^2];
if (prev is '=' or '!' or '<' or '>' or '+' or '-' or '*' or '/' or '%' or '&' or '|' or '^')
return null;
}
// Now parse the "<type> <name>" declaration that precedes the '='.
var decl = beforeCall[..^1].TrimEnd();
// Identifier (the variable name).
var end = decl.Length;
var nameEnd = end;
while (nameEnd > 0 && IsIdentifierChar(decl[nameEnd - 1])) nameEnd--;
if (nameEnd == end) return null; // no identifier
var nameStart = nameEnd;
// Whitespace between type and name.
var ws = nameStart;
while (ws > 0 && char.IsWhiteSpace(decl[ws - 1])) ws--;
if (ws == nameStart) return null; // need separating whitespace → "type name"
// The type token (single identifier/keyword — no generics/arrays here;
// those normalize to unknown anyway and stay unflagged).
var typeEnd = ws;
var typeStart = typeEnd;
while (typeStart > 0 && IsIdentifierChar(decl[typeStart - 1])) typeStart--;
if (typeStart == typeEnd) return null;
// Guard against picking up a keyword that isn't a type in this position
// (e.g. "return x = ..."). A real declaration's type token is preceded
// by a statement boundary or open brace, not by another identifier.
if (typeStart > 0)
{
var b = typeStart - 1;
while (b >= 0 && char.IsWhiteSpace(decl[b])) b--;
if (b >= 0 && IsIdentifierChar(decl[b]))
return null; // preceded by another word → not a clean declaration
}
return decl[typeStart..typeEnd];
}
private static bool IsIdentifierChar(char c) => char.IsLetterOrDigit(c) || c == '_';
/// <summary>
/// Given the index of a <c>CallScript</c>/<c>CallShared</c> token, walks
/// backwards over a leading receiver expression composed only of identifier
/// chars, '.', and bracketed indexers (<c>["x"]</c>), and returns the index
/// where that receiver begins. If there is no '.' immediately before the
/// token (an unqualified call) the original index is returned unchanged.
/// Stops at the first character that can't be part of such a simple
/// receiver, so casts/parenthesised/chained-method receivers aren't
/// mistaken for a clean assignment target.
/// </summary>
private static int SkipReceiverBackwards(string code, int callIndex)
{
var i = callIndex - 1;
// Optional whitespace then must be a '.' for there to be a receiver.
while (i >= 0 && char.IsWhiteSpace(code[i])) i--;
if (i < 0 || code[i] != '.') return callIndex;
var start = callIndex;
while (i >= 0)
{
var c = code[i];
if (c == '.' || IsIdentifierChar(c) || char.IsWhiteSpace(c))
{
start = i;
i--;
continue;
}
if (c == ']')
{
// Skip a single (non-nested) indexer "[ ... ]" with string or
// identifier contents — e.g. Children["pump"].
var j = i - 1;
while (j >= 0 && code[j] != '[' && code[j] != '(' && code[j] != ')')
j--;
if (j < 0 || code[j] != '[') return start;
start = j;
i = j - 1;
continue;
}
break;
}
return start;
}
// ── Script-level type vocabulary (#20/#21) ──────────────────────────────
//
// The template scripting "type system" exposed in ParameterDefinitions /
// ReturnDefinition is a small set: String, Integer, Float, Boolean, plus
// Object / List (and arbitrary unrecognised names). Only the four scalar
// primitives below are matched; everything else maps to null ("unknown"),
// which the validators treat as "accept anything / don't flag".
private enum ScriptType { String, Integer, Float, Boolean }
/// <summary>
/// Maps a declared type token (JSON-Schema name, legacy name, or a C# type
/// keyword used on a call-site LHS) onto a <see cref="ScriptType"/>, or null
/// when the type isn't one of the confidently-checkable primitives.
/// </summary>
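/// <example>
/// Illustrative mappings (matching is case-insensitive after trimming):
/// <c>"Int32"</c> and <c>"long"</c> map to Integer, <c>"number"</c> and
/// <c>"decimal"</c> to Float, <c>"DateTime"</c> to String, and <c>"bool"</c>
/// to Boolean; <c>"Object"</c>, <c>"List"</c>, and <c>"var"</c> map to null.
/// </example>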
private static ScriptType? NormalizeType(string? raw)
{
if (string.IsNullOrWhiteSpace(raw)) return null;
return raw.Trim().ToLowerInvariant() switch
{
"string" or "datetime" => ScriptType.String,
"integer" or "int" or "int32" or "int64" or "long" or "short" or "byte" => ScriptType.Integer,
"float" or "double" or "decimal" or "number" or "single" => ScriptType.Float,
"boolean" or "bool" => ScriptType.Boolean,
// Object, List, array, var, dynamic, and anything else → unknown.
_ => null,
};
}
/// <summary>
/// Infers the <see cref="ScriptType"/> of a call-site argument expression,
/// but ONLY for unambiguous literals. Returns null for variables, member
/// access, method/await chains, <c>null</c>, casts, parenthesised/compound
/// expressions, and object/array/collection initializers — those can't be
/// statically typed here and must never be flagged.
/// </summary>
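/// <example>
/// Illustrative classifications for hypothetical argument expressions:
/// <c>"abc"</c> and <c>'c'</c> are String; <c>true</c> is Boolean;
/// <c>42</c>, <c>1_000</c>, and <c>10UL</c> are Integer; <c>-1.5</c> and
/// <c>2f</c> are Float. <c>count</c>, <c>"a" + b</c>, <c>$"{x}"</c>,
/// <c>0x1F</c>, and <c>null</c> all return null and are never flagged.
/// </example>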
private static ScriptType? InferLiteralType(string expr)
{
expr = expr.Trim();
if (expr.Length == 0) return null;
// String / char literal — but only if the WHOLE expression is the
// literal (so "a" + x or x + "b" stays unknown).
if ((expr[0] == '"' || expr[0] == '\'') && IsWholeStringLiteral(expr))
return ScriptType.String;
if (expr.StartsWith('@') && expr.Length > 1 && expr[1] == '"' && IsWholeStringLiteral(expr[1..]))
return ScriptType.String;
if (expr.StartsWith('$'))
return null; // interpolated string — string-ish, but be conservative.
if (expr is "true" or "false")
return ScriptType.Boolean;
// Numeric literal (optionally signed). Float if it has a '.', 'e'/'E'
// exponent, or a float/double/decimal suffix; otherwise Integer.
if (IsNumericLiteral(expr, out var isFloat))
return isFloat ? ScriptType.Float : ScriptType.Integer;
return null; // Not a literal we can confidently classify.
}
private static bool IsWholeStringLiteral(string expr)
{
if (expr.Length < 2) return false;
var quote = expr[0];
if (quote != '"' && quote != '\'') return false;
var i = 1;
while (i < expr.Length)
{
if (expr[i] == '\\') { i += 2; continue; }
if (expr[i] == quote) return i == expr.Length - 1; // closing quote must be last char
i++;
}
return false;
}
private static bool IsNumericLiteral(string expr, out bool isFloat)
{
isFloat = false;
var i = 0;
if (expr.Length == 0) return false;
if (expr[0] == '+' || expr[0] == '-') i++;
// A genuine numeric literal must start with a digit or a `.` followed by a
// digit. Identifiers that start with `_` or a letter (e.g. `_2`, `count`)
// are explicitly rejected here so they are inferred as Unknown, not Integer.
if (i >= expr.Length) return false;
var first = expr[i];
if (first == '.')
{
if (i + 1 >= expr.Length || !char.IsDigit(expr[i + 1])) return false;
}
else if (!char.IsDigit(first))
{
return false; // starts with `_`, letter, or anything else → not a literal
}
var sawDigit = false;
var sawDot = false;
var sawExp = false;
for (; i < expr.Length; i++)
{
var c = expr[i];
if (char.IsDigit(c)) { sawDigit = true; continue; }
if (c == '_' && sawDigit) continue; // digit separator; accepted once any digit has been seen
if (c == '.' && !sawDot && !sawExp) { sawDot = true; isFloat = true; continue; }
if ((c == 'e' || c == 'E') && !sawExp && sawDigit)
{
sawExp = true; isFloat = true;
if (i + 1 < expr.Length && (expr[i + 1] == '+' || expr[i + 1] == '-')) i++;
continue;
}
// Numeric suffix terminates the literal.
if (i == expr.Length - 1 || (i == expr.Length - 2))
{
var suffix = expr[i..].ToLowerInvariant();
switch (suffix)
{
case "f": case "d": case "m": isFloat = true; return sawDigit;
case "l": case "u": case "ul": case "lu": return sawDigit; // integer suffixes
}
}
return false; // any other char → not a plain numeric literal
}
return sawDigit;
}
/// <summary>
/// Whether an argument/return of <paramref name="actual"/> type is
/// acceptable where <paramref name="expected"/> is declared. Exact match, or
/// Integer⇄Float numeric widening. All other cross-category pairings
/// (String↔number, String↔Boolean, Boolean↔number) are mismatches.
/// </summary>
private static bool IsAssignable(ScriptType actual, ScriptType expected)
{
if (actual == expected) return true;
// Numeric widening / narrowing between Integer and Float is tolerated —
// the scripting runtime coerces these and flagging them is noisy.
return (actual == ScriptType.Integer && expected == ScriptType.Float)
|| (actual == ScriptType.Float && expected == ScriptType.Integer);
}
private static string? ExtractStringArgument(string code, int startPos)
{
// Skip whitespace
@@ -387,43 +946,6 @@ public class SemanticValidator
return code[nameStart..pos];
}
private static int CountArguments(string code, int startPos)
{
var depth = 1;
var count = 1; // At least one argument (the name)
var pos = startPos;
while (pos < code.Length && depth > 0)
{
switch (code[pos])
{
case '(':
depth++;
break;
case ')':
depth--;
break;
case ',' when depth == 1:
count++;
break;
case '"':
case '\'':
// Skip string literals
var quote = code[pos];
pos++;
while (pos < code.Length && code[pos] != quote)
{
if (code[pos] == '\\') pos++; // Skip escaped chars
pos++;
}
break;
}
pos++;
}
return count;
}
internal record CallTarget
{
/// <summary>Name of the script being called.</summary>
@@ -432,5 +954,13 @@ public class SemanticValidator
public bool IsShared { get; init; }
/// <summary>Number of non-name arguments passed to the call.</summary>
public int ArgumentCount { get; init; }
/// <summary>The trimmed text of each non-name positional argument expression, in order.</summary>
public IReadOnlyList<string> ArgumentExpressions { get; init; } = [];
/// <summary>
/// The declared type token the call result is assigned into, when the
/// call is the whole initializer of a typed local declaration; otherwise
/// null (untyped/unused/expression-embedded); a <c>var</c> declaration yields
/// the token <c>var</c>, which is treated as unknown. Used by #20.
/// </summary>
public string? AssignedToType { get; init; }
}
}
@@ -14,7 +14,10 @@ namespace ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
/// 4. Alarm trigger references exist (referenced attributes must be in the flattened config)
/// 5. Script trigger references exist (referenced attributes must be in the flattened config)
/// 6. Expression triggers — blank check, syntax check, and attribute-reference scan
/// 7. Connection binding completeness (all data-sourced attributes must have a binding)
/// 7. Connection binding completeness — every data-sourced attribute must have a binding,
/// and (on the deploy path) the bound connection must exist on the target site.
/// Severity is context-dependent: a non-blocking Warning at template design time
/// (bindings are set later) and a deploy-gating Error when enforced (M2.8 / #23).
/// 8. Does NOT verify tag path resolution on devices
/// </summary>
public class ValidationService
@@ -45,8 +48,44 @@ public class ValidationService
/// </summary>
/// <param name="configuration">The flattened configuration to validate.</param>
/// <param name="sharedScripts">Optional list of shared scripts for validation context.</param>
/// <param name="alarmCapableConnectionNames">
/// Optional set of site data-connection names whose protocol resolves to an
/// alarm-capable adapter (see
/// <see cref="Commons.Interfaces.Protocol.AlarmCapableProtocols"/>). When supplied,
/// the semantic validator gates every native-alarm-source binding against it.
/// <c>null</c> skips the capability check entirely.
/// </param>
/// <param name="enforceConnectionBindings">
/// M2.8 (#23): controls the severity of the connection-binding-completeness check.
/// <para>
/// <c>false</c> (default) — template DESIGN-TIME: a data-sourced attribute that is
/// not yet bound produces only a non-blocking <c>Warning</c>. Bindings are set later,
/// at instance/deploy time, so an unbound data-sourced template attribute is legitimate
/// here (see <see cref="ManagementService"/>'s ValidateTemplate path, which builds a
/// config straight from raw template members with no bindings).
/// </para>
/// <para>
/// <c>true</c> — DEPLOY path (<see cref="DeploymentManager"/>'s FlatteningPipeline):
/// an unbound data-sourced attribute becomes a deploy-gating <c>Error</c> (IsValid false),
/// and — when <paramref name="siteConnectionNames"/> is supplied — a binding pointing at a
/// connection that does not exist on the target site is also an <c>Error</c>.
/// </para>
/// </param>
/// <param name="siteConnectionNames">
/// M2.8 (#23): optional set of the data-connection names that actually exist on the
/// target site (computed by the deploy pipeline from the site's loaded connections,
/// mirroring <paramref name="alarmCapableConnectionNames"/>). When supplied (and
/// <paramref name="enforceConnectionBindings"/> is <c>true</c>), every bound
/// connection is checked against this set so a binding to a phantom/stale connection
/// is caught. <c>null</c> skips the "exists at site" half (it stays inert).
/// </param>
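/// <example>
/// A sketch of the two call shapes described above. The variable names
/// (<c>validationService</c>, <c>config</c>, <c>alarmCapable</c>,
/// <c>siteConnections</c>) are hypothetical.
/// <code>
/// // Design time: unbound data-sourced attributes are Warnings only.
/// var designResult = validationService.Validate(config);
///
/// // Deploy path: unbound or phantom bindings become Errors.
/// var deployResult = validationService.Validate(
///     config,
///     sharedScripts,
///     alarmCapableConnectionNames: alarmCapable,
///     enforceConnectionBindings: true,
///     siteConnectionNames: siteConnections);
/// </code>
/// </example>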
/// <returns>A merged <see cref="ValidationResult"/> aggregating all pipeline stage outcomes.</returns>
public ValidationResult Validate(FlattenedConfiguration configuration, IReadOnlyList<ResolvedScript>? sharedScripts = null)
public ValidationResult Validate(
FlattenedConfiguration configuration,
IReadOnlyList<ResolvedScript>? sharedScripts = null,
IReadOnlySet<string>? alarmCapableConnectionNames = null,
bool enforceConnectionBindings = false,
IReadOnlySet<string>? siteConnectionNames = null)
{
ArgumentNullException.ThrowIfNull(configuration);
@@ -58,8 +97,8 @@ public class ValidationService
ValidateAlarmTriggerReferences(configuration),
ValidateScriptTriggerReferences(configuration),
ValidateExpressionTriggers(configuration),
ValidateConnectionBindingCompleteness(configuration),
_semanticValidator.Validate(configuration, sharedScripts)
ValidateConnectionBindingCompleteness(configuration, enforceConnectionBindings, siteConnectionNames),
_semanticValidator.Validate(configuration, sharedScripts, alarmCapableConnectionNames)
};
return ValidationResult.Merge(results.ToArray());
@@ -497,21 +536,88 @@ public class ValidationService
}
/// <summary>
/// Validates that all data-sourced attributes have connection bindings.
/// Validates connection bindings on data-sourced attributes. Only DATA-SOURCED
/// attributes (<see cref="ResolvedAttribute.DataSourceReference"/> != <c>null</c>)
/// require a binding; static attributes are never flagged.
///
/// M2.8 (#23): the severity is context-dependent (see <paramref name="enforce"/>).
/// At template design time (<c>enforce == false</c>) an unbound data-sourced
/// attribute is legitimate (bindings are set later) so it is only a non-blocking
/// <c>Warning</c>. On the deploy path (<c>enforce == true</c>) an unbound
/// data-sourced attribute is a deploy-gating <c>Error</c>, and — when
/// <paramref name="siteConnectionNames"/> is supplied — a binding to a connection
/// that does not exist on the target site is also an <c>Error</c>.
/// </summary>
/// <param name="configuration">The flattened configuration to validate.</param>
/// <returns>A <see cref="ValidationResult"/> with warnings for each data-sourced attribute that lacks a connection binding.</returns>
public static ValidationResult ValidateConnectionBindingCompleteness(FlattenedConfiguration configuration)
/// <param name="enforce">
/// <c>true</c> on the deploy path (unbound → Error + "exists at site" check);
/// <c>false</c> at design time (unbound → Warning only). Defaults to <c>false</c>
/// so design-time validation stays non-blocking.
/// </param>
/// <param name="siteConnectionNames">
/// Optional set of data-connection names that actually exist on the target site.
/// When non-<c>null</c> and <paramref name="enforce"/> is <c>true</c>, every bound
/// connection name is checked against this set. <c>null</c> skips the "exists at
/// site" check.
/// </param>
/// <returns>A <see cref="ValidationResult"/> with the binding findings at the appropriate severity.</returns>
public static ValidationResult ValidateConnectionBindingCompleteness(
FlattenedConfiguration configuration,
bool enforce = false,
IReadOnlySet<string>? siteConnectionNames = null)
{
var errors = new List<ValidationEntry>();
var warnings = new List<ValidationEntry>();
foreach (var attr in configuration.Attributes)
{
if (attr.DataSourceReference != null && attr.BoundDataConnectionId == null)
// Only data-sourced attributes participate in binding validation.
if (attr.DataSourceReference == null)
continue;
if (attr.BoundDataConnectionId == null)
{
warnings.Add(ValidationEntry.Warning(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.",
// Unbound data-sourced attribute. At deploy time this gates the
// deployment; at design time the binding is set later, so it is
// only advisory.
//
// NOTE: this branch fires for TWO distinct cases that are
// indistinguishable post-flattening:
// 1. The user genuinely never set a binding.
// 2. The user set a binding, but FlatteningService.ApplyConnectionBindings
// silently dropped it because the stored DataConnectionId no longer
// resolves to any loaded site DataConnection (i.e. the connection was
// deleted after the binding was created). In that case the flattener
// leaves BoundDataConnectionId == null, and the attribute falls into
// this same "unbound → Error" path.
// The error message covers both cases; no behavioral change is needed.
if (enforce)
{
errors.Add(ValidationEntry.Error(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.",
attr.CanonicalName));
}
else
{
warnings.Add(ValidationEntry.Warning(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' has a data source reference but no connection binding.",
attr.CanonicalName));
}
// Skip the "exists at site" check below — it only applies to bound attributes.
continue;
}
// The attribute IS bound. On the deploy path, verify the bound connection
// actually exists on the target site (resolve against the site's connection
// set, not just name presence in the config). A binding pointing at a
// non-existent/stale site connection is a deploy-gating Error.
if (enforce && siteConnectionNames != null &&
attr.BoundDataConnectionName != null &&
!siteConnectionNames.Contains(attr.BoundDataConnectionName))
{
errors.Add(ValidationEntry.Error(ValidationCategory.ConnectionBinding,
$"Attribute '{attr.CanonicalName}' is bound to data connection '{attr.BoundDataConnectionName}' " +
"which does not exist on the target site.",
attr.CanonicalName));
}
}
@@ -2339,6 +2339,7 @@ public sealed class BundleImporter : IBundleImporter
ParameterDefinitions = s.ParameterDefinitions,
ReturnDefinition = s.ReturnDefinition,
MinTimeBetweenRuns = s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds,
Source = "Template",
});
}
@@ -99,7 +99,10 @@ public sealed record TemplateScriptDto(
string? ParameterDefinitions,
string? ReturnDefinition,
bool IsLocked,
TimeSpan? MinTimeBetweenRuns);
TimeSpan? MinTimeBetweenRuns,
// M2.5 (#9): per-script execution timeout (seconds). Additive trailing field;
// null on bundles written before this field existed.
int? ExecutionTimeoutSeconds = null);
public sealed record TemplateCompositionDto(
string InstanceName,
@@ -74,7 +74,8 @@ public sealed class EntitySerializer
ParameterDefinitions: s.ParameterDefinitions,
ReturnDefinition: s.ReturnDefinition,
IsLocked: s.IsLocked,
MinTimeBetweenRuns: s.MinTimeBetweenRuns)).ToList(),
MinTimeBetweenRuns: s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds: s.ExecutionTimeoutSeconds)).ToList(),
Compositions: t.Compositions.Select(c => new TemplateCompositionDto(
InstanceName: c.InstanceName,
ComposedTemplateName: templateNameById.TryGetValue(c.ComposedTemplateId, out var cn) ? cn : string.Empty)).ToList());
@@ -227,6 +228,7 @@ public sealed class EntitySerializer
ReturnDefinition = s.ReturnDefinition,
IsLocked = s.IsLocked,
MinTimeBetweenRuns = s.MinTimeBetweenRuns,
ExecutionTimeoutSeconds = s.ExecutionTimeoutSeconds,
});
}
return t;
@@ -1,4 +1,5 @@
using System.Security.Claims;
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Security;
using Bunit;
using Microsoft.AspNetCore.Components.Authorization;
@@ -12,7 +13,10 @@ using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Services;
using ZB.MOM.WW.ScadaBridge.Commons.Types;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Deployment;
using ZB.MOM.WW.ScadaBridge.Communication;
using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.CentralUI.Components.Shared;
@@ -292,6 +296,90 @@ public class TopologyPageTests : BunitContext
Assert.Throws<Bunit.MissingEventHandlerException>(() => instanceLabel.DoubleClick());
}
[Fact]
public void Diff_ConnectionEndpointChange_RendersConnectionSection()
{
// TemplateEngine-018 / DeploymentManager-018: a standalone connection
// endpoint edit (no per-attribute binding change) must surface in the
// deployment-diff modal. Before ConnectionChanges was wired through
// ComputeDiff + the UI, this redeploy showed only the stale-hash badge
// with no indication that the connection endpoint had moved.
// The DiffDialog body-scroll lock + focus call out to JS interop on
// open; loose mode no-ops the handlers we don't explicitly set up.
JSInterop.Mode = JSRuntimeMode.Loose;
var areasBySite = new Dictionary<int, IReadOnlyList<Area>>
{
[1] = new List<Area> { new("Line-1") { Id = 10, SiteId = 1 } }
};
SeedRepos(
sites: new[] { new Site("Plant-A", "plant-a") { Id = 1 } },
instances: new[]
{
new Instance("Pump-001") { Id = 100, SiteId = 1, AreaId = 10, State = InstanceState.Enabled }
},
areasBySite: areasBySite);
// Deployed snapshot: connection "plc1" points at host-a.
var deployedConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Pump-001",
Connections = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-a:4840\"}",
FailoverRetryCount = 3,
}
}
};
_deployRepo.GetDeployedSnapshotByInstanceIdAsync(100, Arg.Any<CancellationToken>())
.Returns(Task.FromResult<DeployedConfigSnapshot?>(
new DeployedConfigSnapshot("dep-1", "hash-old",
JsonSerializer.Serialize(deployedConfig))));
// Current template-derived config: same connection now points at host-b.
var currentConfig = new FlattenedConfiguration
{
InstanceUniqueName = "Pump-001",
Connections = new Dictionary<string, ConnectionConfig>
{
["plc1"] = new ConnectionConfig
{
Protocol = "OpcUa",
ConfigurationJson = "{\"endpoint\":\"opc.tcp://host-b:4840\"}",
FailoverRetryCount = 3,
}
}
};
_pipeline.FlattenAndValidateAsync(100, Arg.Any<CancellationToken>())
.Returns(Task.FromResult(Result<FlatteningPipelineResult>.Success(
new FlatteningPipelineResult(currentConfig, "hash-new", ValidationResult.Success()))));
var cut = Render<TopologyPage>();
FindToggleForLabel(cut, "Plant-A")!.Click();
FindToggleForLabel(cut, "Line-1")!.Click();
// The per-node action menu only renders after a context-menu (right
// click) on the instance row, so open it first, then click "Diff".
var instanceRow = cut.FindAll(".tv-row")
.First(row => row.QuerySelector(".tv-label")?.TextContent == "Pump-001");
instanceRow.ContextMenu();
var diffButton = cut.FindAll("button.dropdown-item")
.First(b => b.TextContent.Trim() == "Diff");
diffButton.Click();
var markup = cut.Markup;
Assert.Contains("Connections", markup);
Assert.Contains("plc1", markup);
Assert.Contains("host-a", markup);
Assert.Contains("host-b", markup);
// The change is a modification, so the row carries the "Changed" badge.
Assert.Contains("Changed", markup);
}
[Fact]
public void LegacyInstancesRoute_IsDeclaredOnTopologyPage()
{
@@ -60,6 +60,50 @@ public class DebugStreamBridgeActorTests : TestKit
return new TestContext(actor, commProbe, mockClient, events, terminated);
}
[Fact]
public void On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Tears_Down_Stream_And_Terminates()
{
// M2.11 (revised for M2.18 stream-first): the gRPC subscription is now opened
// up-front in PreStart, so when the site reports InstanceNotFound=true the
// bridge actor must
// (a) forward the not-found snapshot to _onEvent so DebugStreamService's TCS
// resolves and the caller can inspect the flag,
// (b) tear DOWN the already-opened gRPC stream (Unsubscribe the just-opened
// correlation) rather than enter pass-through, and
// (c) stop itself cleanly.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>(); // initial subscribe envelope
// Stream-first: the gRPC subscription is opened before the snapshot arrives.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var notFoundSnapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow,
InstanceNotFound: true);
Watch(ctx.BridgeActor);
ctx.BridgeActor.Tell(notFoundSnapshot);
// (a) _onEvent must receive the not-found snapshot
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
var received = Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.True(received.InstanceNotFound);
}
// (b) the just-opened gRPC stream is torn down (not left running / no pass-through)
AwaitCondition(() => ctx.MockGrpcClient.UnsubscribedCorrelationIds.Contains("corr-1"),
TimeSpan.FromSeconds(3));
// (c) actor terminates cleanly
ExpectTerminated(ctx.BridgeActor, TimeSpan.FromSeconds(3));
}
[Fact]
public void PreStart_Sends_SubscribeDebugViewRequest_Via_ClusterClient()
{
@@ -94,11 +138,18 @@ public class DebugStreamBridgeActorTests : TestKit
}
[Fact]
public void On_Snapshot_Opens_GrpcStream()
public void On_Snapshot_Does_Not_Open_Additional_GrpcStream()
{
// M2.18 stream-first: the gRPC subscription is opened in PreStart, BEFORE the
// snapshot arrives. After the snapshot is delivered the actor switches to
// pass-through — it must NOT open a second subscription. Exactly ONE subscribe
// call should have been made (the PreStart one).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
// Verify the stream is already open before the snapshot.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
@@ -107,11 +158,12 @@ public class DebugStreamBridgeActorTests : TestKit
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var call = ctx.MockGrpcClient.SubscribeCalls[0];
Assert.Equal("corr-1", call.CorrelationId);
Assert.Equal(InstanceName, call.InstanceUniqueName);
// After snapshot delivery, still exactly ONE subscribe — no additional stream opened.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
var singleCall = Assert.Single(ctx.MockGrpcClient.SubscribeCalls);
Assert.Equal("corr-1", singleCall.CorrelationId);
Assert.Equal(InstanceName, singleCall.InstanceUniqueName);
}
[Fact]
@@ -348,6 +400,369 @@ public class DebugStreamBridgeActorTests : TestKit
Assert.Equal("corr-1", factory.ClientFor(GrpcNodeB).SubscribeCalls[0].CorrelationId);
}
// ---------------------------------------------------------------------
// M2.18 (#26) — stream-first + replay/dedup
// ---------------------------------------------------------------------
[Fact]
public void PreStart_Opens_GrpcStream_Before_Snapshot_Arrives()
{
// M2.18: the gRPC subscription must be opened in PreStart (stream-first),
// BEFORE the snapshot is delivered, so live events start flowing during the
// snapshot-build + network-transit window. The old lifecycle opened the
// stream only after the snapshot arrived, losing gap-window events.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>(); // initial subscribe envelope
// No snapshot sent yet — the stream must already be open.
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
Assert.Equal("corr-1", ctx.MockGrpcClient.SubscribeCalls[0].CorrelationId);
Assert.Equal(InstanceName, ctx.MockGrpcClient.SubscribeCalls[0].InstanceUniqueName);
// _onEvent must NOT have fired — buffering, not delivering.
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
}
[Fact]
public void GapWindow_Event_Buffered_Before_Snapshot_Is_Delivered_Exactly_Once_After_Snapshot()
{
// M2.18: an event arriving DURING the snapshot window (before the snapshot
// is delivered) is buffered, then flushed exactly once AFTER the snapshot.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
// Live event arrives BEFORE the snapshot — its entity is NOT in the snapshot,
// so it is a genuine gap-window event that must survive.
var gapEvent = new AttributeValueChanged(InstanceName, "IO", "Pressure", 99.9, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(gapEvent);
// While buffering, _onEvent has not fired.
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
// snapshot then the buffered gap-window event, exactly once, in that order.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
Assert.Equal("Pressure", flushed.AttributeName);
}
}
[Fact]
public void Buffered_Event_Already_Reflected_In_Snapshot_Is_Dropped()
{
// M2.18 dedup: a buffered event whose entity is in the snapshot with an equal
// or newer snapshot timestamp (buffered.Timestamp <= snapshot.Timestamp) is
// already reflected and must be DROPPED.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var t0 = DateTimeOffset.UtcNow;
// Buffered event for "Temp" at t0.
var buffered = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", t0);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(buffered);
// Snapshot already contains "Temp" at the SAME timestamp t0 → buffered is a dup.
var snapAttr = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", t0);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged> { snapAttr },
new List<AlarmStateChanged>(),
t0);
ctx.BridgeActor.Tell(snapshot);
// Only the snapshot is delivered; the buffered duplicate is dropped.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Give a beat to ensure no extra (dropped) event sneaks through.
Thread.Sleep(200);
lock (ctx.ReceivedEvents)
{
Assert.Single(ctx.ReceivedEvents);
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
}
}
[Fact]
public void Buffered_Event_Strictly_Newer_Than_Snapshot_Entity_Is_Delivered()
{
// M2.18 dedup: a buffered event strictly newer than the snapshot's entry for
// the same entity (buffered.Timestamp > snapshot.Timestamp) is NOT a dup and
// must be DELIVERED after the snapshot.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapTime = DateTimeOffset.UtcNow;
var newerTime = snapTime.AddMilliseconds(1);
// Buffered event for "Temp" strictly NEWER than the snapshot's "Temp".
var buffered = new AttributeValueChanged(InstanceName, "IO", "Temp", 50.0, "Good", newerTime);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(buffered);
var snapAttr = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good", snapTime);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged> { snapAttr },
new List<AlarmStateChanged>(),
snapTime);
ctx.BridgeActor.Tell(snapshot);
// snapshot then the strictly-newer buffered event.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
Assert.Equal(50.0, flushed.Value);
Assert.Equal(newerTime, flushed.Timestamp);
}
}
[Fact]
public void Buffered_Alarm_Dedup_Uses_AlarmIdentity_And_Timestamp()
{
// M2.18 dedup for alarms: identity = (instance, alarm name, source reference).
// A buffered alarm older-or-equal to the snapshot's same-identity alarm is
// dropped; a strictly-newer one is delivered.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var t0 = DateTimeOffset.UtcNow;
// Buffered: "PumpFault" at t0 (dup) and "Overheat" at t0+1ms (newer, deliver).
var dupAlarm = new AlarmStateChanged(InstanceName, "PumpFault",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 500, t0);
var newerAlarm = new AlarmStateChanged(InstanceName, "Overheat",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 700, t0.AddMilliseconds(1));
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(dupAlarm);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(newerAlarm);
// Snapshot contains BOTH "PumpFault" and "Overheat" at t0.
var snapPumpFault = new AlarmStateChanged(InstanceName, "PumpFault",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 500, t0);
var snapOverheat = new AlarmStateChanged(InstanceName, "Overheat",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Normal, 0, t0);
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged> { snapPumpFault, snapOverheat },
t0);
ctx.BridgeActor.Tell(snapshot);
// snapshot + only the strictly-newer "Overheat" alarm (PumpFault dropped).
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
Thread.Sleep(200);
lock (ctx.ReceivedEvents)
{
Assert.Equal(2, ctx.ReceivedEvents.Count);
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
var flushed = Assert.IsType<AlarmStateChanged>(ctx.ReceivedEvents[1]);
Assert.Equal("Overheat", flushed.AlarmName);
Assert.Equal(700, flushed.Priority);
}
}
[Fact]
public void Buffered_Events_Flushed_In_Arrival_Order()
{
// M2.18: ordering preserved across multiple buffered events (none are dups —
// their entities are absent from the snapshot).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var baseTime = DateTimeOffset.UtcNow;
var sub = ctx.MockGrpcClient.SubscribeCalls[0];
sub.OnEvent(new AttributeValueChanged(InstanceName, "IO", "A", 1, "Good", baseTime));
sub.OnEvent(new AlarmStateChanged(InstanceName, "AlarmX",
ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmState.Active, 100, baseTime));
sub.OnEvent(new AttributeValueChanged(InstanceName, "IO", "B", 2, "Good", baseTime));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
baseTime);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 4; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.Equal("A", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
Assert.Equal("AlarmX", Assert.IsType<AlarmStateChanged>(ctx.ReceivedEvents[2]).AlarmName);
Assert.Equal("B", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[3]).AttributeName);
}
}
[Fact]
public void PassThrough_After_Flush_Delivers_Subsequent_Events_Immediately()
{
// M2.18: after the snapshot+flush the actor switches to pass-through — later
// events go straight to _onEvent (no buffering, no dup).
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Post-snapshot event — must be delivered immediately, exactly once.
var postEvent = new AttributeValueChanged(InstanceName, "IO", "Temp", 42.5, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(postEvent);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]);
}
}
[Fact]
public void InstanceNotFound_After_StreamFirst_Tears_Down_Stream_And_Does_Not_PassThrough()
{
// M2.18 + M2.11: stream-first means the gRPC subscription is already open
// when an InstanceNotFound snapshot arrives. The bridge must tear that stream
// down (Unsubscribe the just-opened correlation), deliver the not-found
// snapshot, NOT enter pass-through, and stop cleanly.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
// Stream opened up-front (stream-first).
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var notFoundSnapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow,
InstanceNotFound: true);
Watch(ctx.BridgeActor);
ctx.BridgeActor.Tell(notFoundSnapshot);
// Not-found snapshot delivered.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.True(Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]).InstanceNotFound);
}
// The just-opened stream must be torn down.
AwaitCondition(() => ctx.MockGrpcClient.UnsubscribedCorrelationIds.Contains("corr-1"),
TimeSpan.FromSeconds(3));
// Stops cleanly.
ExpectTerminated(ctx.BridgeActor, TimeSpan.FromSeconds(3));
// No pass-through: an event arriving after the stop is not delivered.
var late = new AttributeValueChanged(InstanceName, "IO", "Temp", 1, "Good", DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[0].OnEvent(late);
Thread.Sleep(200);
lock (ctx.ReceivedEvents) { Assert.Single(ctx.ReceivedEvents); }
}
[Fact]
public void Reconnect_During_Buffering_Phase_Keeps_Buffering_Until_Snapshot()
{
// M2.18: a gRPC error/reconnect BEFORE the snapshot arrives must remain in the
// buffering phase — events on the new stream are still buffered, then flushed
// when the snapshot finally arrives.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
// Error before snapshot → reconnect (still buffering).
ctx.MockGrpcClient.SubscribeCalls[0].OnError(new Exception("pre-snapshot blip"));
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 2, TimeSpan.FromSeconds(5));
// Event on the reconnected stream — still buffered (snapshot not yet delivered).
var gapEvent = new AttributeValueChanged(InstanceName, "IO", "Late", 7, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[1].OnEvent(gapEvent);
lock (ctx.ReceivedEvents) { Assert.Empty(ctx.ReceivedEvents); }
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
// snapshot + the event buffered across the reconnect.
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.IsType<DebugViewSnapshot>(ctx.ReceivedEvents[0]);
Assert.Equal("Late", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
}
}
[Fact]
public void Reconnect_After_Snapshot_Resumes_PassThrough_Not_Buffering()
{
// M2.18: a mid-session reconnect (after the snapshot was already delivered)
// must resume pass-through — the snapshot is a one-time thing and events on
// the reconnected stream are delivered immediately, not re-buffered.
var ctx = CreateBridgeActor();
ctx.CommProbe.ExpectMsg<SiteEnvelope>();
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 1, TimeSpan.FromSeconds(3));
var snapshot = new DebugViewSnapshot(
InstanceName,
new List<AttributeValueChanged>(),
new List<AlarmStateChanged>(),
DateTimeOffset.UtcNow);
ctx.BridgeActor.Tell(snapshot);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 1; } },
TimeSpan.FromSeconds(3));
// Mid-session reconnect.
ctx.MockGrpcClient.SubscribeCalls[0].OnError(new Exception("mid-session blip"));
AwaitCondition(() => ctx.MockGrpcClient.SubscribeCalls.Count == 2, TimeSpan.FromSeconds(5));
// Event on the reconnected stream — delivered immediately (pass-through).
var postEvent = new AttributeValueChanged(InstanceName, "IO", "Temp", 9, "Good",
DateTimeOffset.UtcNow);
ctx.MockGrpcClient.SubscribeCalls[1].OnEvent(postEvent);
AwaitCondition(() => { lock (ctx.ReceivedEvents) { return ctx.ReceivedEvents.Count == 2; } },
TimeSpan.FromSeconds(3));
lock (ctx.ReceivedEvents)
{
Assert.Equal("Temp", Assert.IsType<AttributeValueChanged>(ctx.ReceivedEvents[1]).AttributeName);
}
}
[Fact]
public void RetryCount_RecoveredOnlyAfterStreamStaysStableForStabilityWindow()
{
@@ -394,11 +809,25 @@ public class DebugStreamBridgeActorTests : TestKit
/// <summary>
/// Mock gRPC client that records SubscribeAsync and Unsubscribe calls.
/// <para>
/// <b>Thread safety:</b> <see cref="SubscribeCalls"/> and
/// <see cref="UnsubscribedCorrelationIds"/> are written from the actor/background thread
/// (via <see cref="SubscribeAsync"/> and <see cref="Unsubscribe"/>) and read from the test
/// thread (via <c>AwaitCondition</c> / assertions). All access goes through a shared lock
/// to match the <c>lock (events)</c> pattern used for <c>ctx.ReceivedEvents</c>.
/// </para>
/// </summary>
internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
{
public List<MockSubscription> SubscribeCalls { get; } = new();
public List<string> UnsubscribedCorrelationIds { get; } = new();
private readonly object _lock = new();
private readonly List<MockSubscription> _subscribeCalls = new();
private readonly List<string> _unsubscribedCorrelationIds = new();
/// <summary>Returns a snapshot of subscribe calls, taken under the internal lock.</summary>
public List<MockSubscription> SubscribeCalls { get { lock (_lock) { return _subscribeCalls.ToList(); } } }
/// <summary>Returns a snapshot of unsubscribed correlation IDs, taken under the internal lock.</summary>
public List<string> UnsubscribedCorrelationIds { get { lock (_lock) { return _unsubscribedCorrelationIds.ToList(); } } }
private MockSiteStreamGrpcClient(bool _) : base() { }
@@ -414,7 +843,7 @@ internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
CancellationToken ct)
{
var subscription = new MockSubscription(correlationId, instanceUniqueName, onEvent, onError, ct);
SubscribeCalls.Add(subscription);
lock (_lock) { _subscribeCalls.Add(subscription); }
// Return a task that completes when cancelled (simulates long-running stream)
var tcs = new TaskCompletionSource();
@@ -424,7 +853,7 @@ internal class MockSiteStreamGrpcClient : SiteStreamGrpcClient
public override void Unsubscribe(string correlationId)
{
UnsubscribedCorrelationIds.Add(correlationId);
lock (_lock) { _unsubscribedCorrelationIds.Add(correlationId); }
}
}
@@ -0,0 +1,318 @@
using System.Text.RegularExpressions;
namespace ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests;
/// <summary>
/// Code-level guard for the AuditLog append-only invariant (task M2.10, #18).
///
/// The DB-role control (DENY UPDATE / DENY DELETE on dbo.AuditLog in migration
/// 20260602174346_CollapseAuditLogToCanonical) is the runtime enforcement layer.
/// This test is the compile-time / test-time backstop: it fails the test run if
/// any C# source file in the ConfigurationDatabase project contains an UPDATE or
/// DELETE statement that targets the AuditLog table.
///
/// <b>Matching rule (see <c>ContainsAuditLogMutation</c> for full detail)</b>
/// A line is flagged as a violation iff it matches the DML-syntax pattern:
/// • <c>UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b</c> — UPDATE targeting AuditLog
/// • <c>DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b</c> — DELETE targeting AuditLog
///
/// These tight DML-syntax patterns naturally exclude false positives:
/// - DENY UPDATE ON dbo.AuditLog … → "DENY" comes before UPDATE; the regex
/// requires UPDATE to be immediately followed by (optional schema.) AuditLog,
/// so "UPDATE ON" does NOT match "UPDATE AuditLog".
/// - ALTER TABLE dbo.AuditLog SWITCH … → ALTER TABLE precedes the table name;
/// no UPDATE/DELETE keyword present.
/// - Comments like "// AuditLog … UPDATE …" → UPDATE is not immediately followed
/// by AuditLog (there are intervening words).
/// - DELETE FROM Notifications … → AuditLog not present.
///
/// <b>Known limitations:</b> This guard scans only raw SQL strings — EF Core methods
/// such as <c>ExecuteDeleteAsync</c>, <c>ExecuteUpdateAsync</c>, and <c>RemoveRange</c>
/// targeting the AuditLog entity are NOT covered and must never be introduced.
/// Additionally, the scan is line-oriented: DML where the keyword and table name appear
/// on separate lines is an accepted, undetected edge case.
/// </summary>
public class AuditLogAppendOnlyGuardTests
{
// ---------------------------------------------------------------------------
// Source root location — same walk-up pattern used by ArchitecturalConstraintTests
// in the Commons.Tests project.
// ---------------------------------------------------------------------------
private static string GetConfigurationDatabaseSourceDirectory()
{
// Walk up from the test binary output directory until we find the
// ConfigurationDatabase csproj (a known anchor in the repo tree).
var dir = new DirectoryInfo(AppContext.BaseDirectory);
while (dir != null)
{
var candidate = Path.Combine(
dir.FullName,
"src",
"ZB.MOM.WW.ScadaBridge.ConfigurationDatabase",
"ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.csproj");
if (File.Exists(candidate))
{
return Path.GetDirectoryName(candidate)!;
}
dir = dir.Parent;
}
throw new InvalidOperationException(
"Could not locate ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.csproj " +
"by walking up from the test output directory. " +
"Ensure the test is run from inside the repo clone.");
}
// ---------------------------------------------------------------------------
// Detection helper — kept as a static method so it can be unit-tested in
// isolation below without requiring any file I/O.
// ---------------------------------------------------------------------------
/// <summary>
/// Returns <see langword="true"/> when the supplied text (typically a single
/// source line) contains a SQL UPDATE or DELETE DML statement that directly
/// targets the <c>AuditLog</c> table.
///
/// <b>Matching rule.</b> The regex requires the DML keyword to be
/// immediately followed (possibly via FROM) by the optional schema prefix
/// (<c>dbo.</c> or <c>[dbo].</c>) and then the table name <c>AuditLog</c>
/// or <c>[AuditLog]</c> as a whole word:
/// <code>
/// UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
/// DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
/// </code>
/// This tight DML-syntax pattern naturally excludes false positives without
/// any additional keyword checks:
/// <list type="bullet">
/// <item><description>
/// <c>DENY UPDATE ON dbo.AuditLog …</c> — "UPDATE ON" is never immediately
/// followed by AuditLog; the pattern requires UPDATE → optional schema → AuditLog.
/// </description></item>
/// <item><description>
/// <c>ALTER TABLE dbo.AuditLog SWITCH …</c> — no UPDATE/DELETE keyword present.
/// </description></item>
/// <item><description>
/// <c>// AuditLog is append-only; never issue an UPDATE against it.</c> —
/// UPDATE is not followed by AuditLog here.
/// </description></item>
/// <item><description>
/// <c>DELETE FROM dbo.Notifications …</c> — AuditLog not present.
/// </description></item>
/// </list>
/// </summary>
/// <param name="text">A single source line (or any string to probe).</param>
/// <returns><see langword="true"/> if a mutation against AuditLog is detected.</returns>
internal static bool ContainsAuditLogMutation(string text)
{
if (string.IsNullOrEmpty(text))
{
return false;
}
// DML-syntax pattern: the UPDATE or DELETE keyword must be directly followed
// (optionally via FROM) by the optional schema qualifier and then the table name.
//
// Schema sub-pattern : (?:\[?dbo\]?\.)?
// matches: nothing, "dbo.", "[dbo]."
//
// Table sub-pattern : \[?AuditLog\]?
// matches: "AuditLog", "[AuditLog]"
//
// UPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
// matches: "UPDATE AuditLog", "UPDATE dbo.AuditLog",
// "UPDATE [AuditLog]", "UPDATE [dbo].[AuditLog]"
// does NOT match: "DENY UPDATE ON dbo.AuditLog" (UPDATE is followed by ON)
//
// DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b
// matches: "DELETE FROM AuditLog", "DELETE FROM dbo.AuditLog",
// "DELETE FROM [AuditLog]", "DELETE FROM [dbo].[AuditLog]"
// does NOT match: "DENY DELETE ON dbo.AuditLog" (DELETE is followed by ON)
return AuditLogMutationPattern.IsMatch(text);
}
private static readonly Regex AuditLogMutationPattern = new(
@"\bUPDATE\s+(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b" +
@"|\bDELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?(?:\[?AuditLog\]?)\b",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
// ---------------------------------------------------------------------------
// Guard test: scan every *.cs file in ConfigurationDatabase (excluding
// Designer/Snapshot EF artefacts and the obj/ directory).
// ---------------------------------------------------------------------------
[Fact]
public void ConfigurationDatabase_ShouldNotContainAuditLogMutations()
{
var sourceDir = GetConfigurationDatabaseSourceDirectory();
// Enumerate all .cs files; exclude EF scaffolding and build output.
var csFiles = Directory.GetFiles(sourceDir, "*.cs", SearchOption.AllDirectories)
.Where(f => !f.Contains(Path.DirectorySeparatorChar + "obj" + Path.DirectorySeparatorChar))
.Where(f => !f.EndsWith(".Designer.cs", StringComparison.OrdinalIgnoreCase))
.Where(f => !f.EndsWith("ModelSnapshot.cs", StringComparison.OrdinalIgnoreCase))
.ToList();
Assert.True(csFiles.Count > 0,
$"Expected to find .cs files under {sourceDir} but found none — source directory location may be wrong.");
var violations = new List<string>();
foreach (var file in csFiles)
{
var content = File.ReadAllText(file);
// Scan line-by-line so violation messages cite the exact line number.
var lines = content.Split('\n');
for (var i = 0; i < lines.Length; i++)
{
if (ContainsAuditLogMutation(lines[i]))
{
var relativePath = Path.GetRelativePath(sourceDir, file);
violations.Add($"{relativePath}:{i + 1}: {lines[i].Trim()}");
}
}
}
Assert.True(violations.Count == 0,
"AuditLog append-only guard: found UPDATE/DELETE targeting dbo.AuditLog " +
"in ConfigurationDatabase source. AuditLog is APPEND-ONLY (retention uses " +
"partition-switch DDL, not row DELETE). Violation(s):\n" +
string.Join("\n", violations));
}
// ---------------------------------------------------------------------------
// Self-verifying matcher unit tests — prove the helper does what it claims.
// ---------------------------------------------------------------------------
[Fact]
public void ContainsAuditLogMutation_ReturnsFalse_ForCleanSource()
{
// The guard scan over real source PASSES (no violations) — this fact is
// already asserted by ConfigurationDatabase_ShouldNotContainAuditLogMutations.
// Here we verify the helper directly on a representative set of CLEAN lines
// that appear in the production source tree.
// INSERT is not a mutation (append-only operations are fine).
Assert.False(ContainsAuditLogMutation(
"INSERT INTO dbo.AuditLog (EventId, OccurredAtUtc) VALUES (@id, @ts);"));
// SELECT is not a mutation.
Assert.False(ContainsAuditLogMutation(
"SELECT COUNT(*) FROM dbo.AuditLog WHERE OccurredAtUtc >= @threshold;"));
// ALTER TABLE SWITCH is the retention purge — not a row-level mutation.
Assert.False(ContainsAuditLogMutation(
"ALTER TABLE dbo.AuditLog SWITCH PARTITION 3 TO dbo.AuditLog_Staging;"));
// DENY DDL from the role-grant migration — must not be flagged.
Assert.False(ContainsAuditLogMutation(
"DENY UPDATE ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"DENY DELETE ON dbo.AuditLog TO scadabridge_audit_writer;"));
// GRANT DDL — also must not be flagged.
Assert.False(ContainsAuditLogMutation(
"GRANT INSERT ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"GRANT SELECT ON dbo.AuditLog TO scadabridge_audit_writer;"));
// DELETE on a different table — AuditLog not on the same line.
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.Notifications WHERE Status = 'Delivered';"));
// DELETE on a different table with AuditLog mentioned much later on the same
// line (130 spaces away): the table name does not directly follow the DELETE keyword.
var longSeparator = new string(' ', 130);
Assert.False(ContainsAuditLogMutation(
$"DELETE FROM dbo.Notifications WHERE Id = @id;{longSeparator}-- see also AuditLog"));
// Comment-only mention of AuditLog with UPDATE elsewhere in a comment.
Assert.False(ContainsAuditLogMutation(
"// AuditLog is append-only; never issue an UPDATE against it."));
// TRUNCATE on the staging table (not AuditLog directly); staging name only.
Assert.False(ContainsAuditLogMutation(
"TRUNCATE TABLE dbo.AuditLog_Staging_abc123;"));
}
[Fact]
public void ContainsAuditLogMutation_ReturnsTrue_ForPlantedViolations()
{
// Planted positive cases — the guard MUST catch these.
// Classic UPDATE targeting AuditLog.
Assert.True(ContainsAuditLogMutation(
"UPDATE AuditLog SET Status = 'Corrected' WHERE EventId = @id;"));
// UPDATE with schema prefix.
Assert.True(ContainsAuditLogMutation(
"UPDATE dbo.AuditLog SET DetailsJson = @json WHERE EventId = @id;"));
// DELETE FROM AuditLog.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM AuditLog WHERE OccurredAtUtc < @threshold;"));
// DELETE with schema prefix.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM dbo.AuditLog WHERE Status = 'Parked';"));
// Mixed case (SQL is case-insensitive in practice).
Assert.True(ContainsAuditLogMutation(
"update dbo.AuditLog set Actor = 'system' where Actor is null;"));
// AuditLog mentioned earlier in the line (e.g. in a comment prefix), with a real
// UPDATE dbo.AuditLog DML following — the DML occurrence must still be caught.
Assert.True(ContainsAuditLogMutation(
"-- AuditLog: UPDATE dbo.AuditLog SET x = 1"));
// ---- Bracketed identifier forms (SSMS-generated SQL) ----
// UPDATE [dbo].[AuditLog] — bracketed schema and bracketed table.
Assert.True(ContainsAuditLogMutation(
"UPDATE [dbo].[AuditLog] SET DetailsJson = @json WHERE EventId = @id;"));
// UPDATE [AuditLog] — bracketed table, no schema prefix.
Assert.True(ContainsAuditLogMutation(
"UPDATE [AuditLog] SET Status = 'Corrected' WHERE EventId = @id;"));
// DELETE FROM [dbo].[AuditLog] — bracketed schema and bracketed table.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM [dbo].[AuditLog] WHERE OccurredAtUtc < @threshold;"));
// DELETE FROM [AuditLog] — bracketed table, no schema prefix.
Assert.True(ContainsAuditLogMutation(
"DELETE FROM [AuditLog] WHERE OccurredAtUtc < @threshold;"));
}
[Fact]
public void ContainsAuditLogMutation_ReturnsFalse_ForDenyGrantAndPartitionSwitchSamples()
{
// Extra explicit coverage for the four concrete exclusion patterns
// that appear in the real migration files.
// From 20260602174346_CollapseAuditLogToCanonical.cs and 20260520142214_AddAuditLogTable.cs:
Assert.False(ContainsAuditLogMutation(
"DENY UPDATE ON dbo.AuditLog TO scadabridge_audit_writer;"));
Assert.False(ContainsAuditLogMutation(
"DENY DELETE ON dbo.AuditLog TO scadabridge_audit_writer;"));
// From AuditLogRepository.cs SwitchOutPartitionAsync:
Assert.False(ContainsAuditLogMutation(
"ALTER TABLE dbo.AuditLog SWITCH PARTITION ' + CAST(@partitionNumber AS nvarchar(10)) + ' TO dbo.[' + @stagingName + '];"));
// Notifications DELETE (legitimate; AuditLog not present on the line):
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.Notifications WHERE CompletedAtUtc < @cutoff;"));
// Notifications DELETE using bracketed identifiers — AuditLog not present:
Assert.False(ContainsAuditLogMutation(
"DELETE FROM [dbo].[Notifications] WHERE CompletedAtUtc < @cutoff;"));
// SiteCalls DELETE (legitimate; AuditLog not present on the line):
Assert.False(ContainsAuditLogMutation(
"DELETE FROM dbo.SiteCalls WHERE TerminalAtUtc < @cutoff;"));
}
}
@@ -61,6 +61,34 @@ public class TemplateEngineRepositoryTests : IDisposable
Assert.Equal("Slot1", loaded.Compositions.First().InstanceName);
}
[Fact]
public async Task TemplateScript_ExecutionTimeoutSeconds_RoundTripsThroughEf()
{
// M2.5 (#9): the nullable per-script execution timeout must persist and
// reload through EF — both an explicit value and a null (use-global).
var template = new Template("TimeoutTemplate");
template.Scripts.Add(new TemplateScript("WithTimeout", "return 1;")
{
ExecutionTimeoutSeconds = 45
});
template.Scripts.Add(new TemplateScript("NoTimeout", "return 2;")); // null
_context.Templates.Add(template);
await _context.SaveChangesAsync();
// Detach so the reload comes from the store, not the change tracker.
_context.ChangeTracker.Clear();
var loaded = await _context.Templates
.Include(t => t.Scripts)
.SingleAsync(t => t.Name == "TimeoutTemplate");
var withTimeout = loaded.Scripts.Single(s => s.Name == "WithTimeout");
Assert.Equal(45, withTimeout.ExecutionTimeoutSeconds);
var noTimeout = loaded.Scripts.Single(s => s.Name == "NoTimeout");
Assert.Null(noTimeout.ExecutionTimeoutSeconds);
}
[Fact]
public async Task GetTemplateWithChildrenAsync_ReturnsNull_WhenTemplateDoesNotExist()
{
@@ -0,0 +1,166 @@
using Opc.Ua;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests.Adapters;
/// <summary>
/// M2.4 (#8): the OPC UA EventFilter gains a server-side <see cref="ContentFilter"/>
/// WhereClause as a bandwidth optimisation when a condition-type filter is present.
/// The client-side gate in DataConnectionActor remains authoritative; these tests
/// only pin the filter-shaping. No live server required — pure SDK object building.
/// </summary>
public class RealOpcUaClientAlarmFilterTests
{
[Fact]
public void BuildAlarmEventFilter_NoFilter_HasNoWhereClause()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
Assert.NotEmpty(filter.SelectClauses);
Assert.Empty(filter.WhereClause.Elements);
}
[Fact]
public void BuildAlarmEventFilter_WithKnownTypes_BuildsNonEmptyWhereClause()
{
var parsed = AlarmConditionFilter.Parse("LimitAlarmType,DiscreteAlarmType");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.NotEmpty(filter.WhereClause.Elements);
// Two known types → two OfType operands (OR'd when more than one).
var ofTypeCount = filter.WhereClause.Elements.Count(e => e.FilterOperator == FilterOperator.OfType);
Assert.Equal(2, ofTypeCount);
Assert.Contains(filter.WhereClause.Elements, e => e.FilterOperator == FilterOperator.Or);
}
[Fact]
public void BuildAlarmEventFilter_SingleKnownType_BuildsSingleOfType_NoOr()
{
var parsed = AlarmConditionFilter.Parse("AlarmConditionType");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Single(filter.WhereClause.Elements);
Assert.Equal(FilterOperator.OfType, filter.WhereClause.Elements[0].FilterOperator);
}
[Fact]
public void BuildAlarmEventFilter_TypeMatchingIsCaseInsensitive()
{
var parsed = AlarmConditionFilter.Parse("limitalarmtype");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Single(filter.WhereClause.Elements, e => e.FilterOperator == FilterOperator.OfType);
}
[Fact]
public void BuildAlarmEventFilter_AllUnknownTypes_OmitsWhereClause()
{
// Custom/vendor type names we cannot map to standard NodeIds are skipped
// server-side; the client-side gate still enforces them. Omitting the
// WhereClause is the safe choice — a partial WhereClause would drop the
// unmapped types at the server and break correctness.
var parsed = AlarmConditionFilter.Parse("MyVendorCustomAlarm,AnotherCustomThing");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Empty(filter.WhereClause.Elements);
}
[Fact]
public void BuildAlarmEventFilter_MixedKnownAndUnknown_OmitsWhereClause()
{
// If ANY requested type can't be mapped, a server-side WhereClause would
// silently drop that type's events — so we omit the optimisation entirely
// and let the (authoritative) client gate do the filtering.
var parsed = AlarmConditionFilter.Parse("LimitAlarmType,MyVendorCustomAlarm");
var filter = RealOpcUaClient.BuildAlarmEventFilter(parsed);
Assert.Empty(filter.WhereClause.Elements);
}
// ── SelectClause index alignment (M2.13 / #27) ───────────────────────────
// CRITICAL: HandleAlarmEvent reads fields[N] by position. Verify new clauses
// are APPENDED at indices 13-17 so existing mappings (0-12) are undisturbed.
[Fact]
public void BuildAlarmEventFilter_HasExactly18SelectClauses()
{
// Baseline: 6 base fields + 7 A&C sub-state fields + 5 new appended fields = 18.
// If this count changes, review HandleAlarmEvent index mappings immediately.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
Assert.Equal(18, filter.SelectClauses.Count);
}
[Fact]
public void BuildAlarmEventFilter_Index13_IsAlarmConditionType_ActiveState_TransitionTime()
{
// Index 13 must be AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[13];
Assert.Equal(ObjectTypeIds.AlarmConditionType, clause.TypeDefinitionId);
Assert.Equal(2, clause.BrowsePath.Count);
Assert.Equal("ActiveState", clause.BrowsePath[0].Name);
Assert.Equal("TransitionTime", clause.BrowsePath[1].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index14_IsLimitAlarmType_HighHighLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[14];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("HighHighLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index15_IsLimitAlarmType_HighLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[15];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("HighLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index16_IsLimitAlarmType_LowLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[16];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("LowLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_Index17_IsLimitAlarmType_LowLowLimit()
{
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
var clause = filter.SelectClauses[17];
Assert.Equal(ObjectTypeIds.LimitAlarmType, clause.TypeDefinitionId);
Assert.Equal("LowLowLimit", clause.BrowsePath[0].Name);
}
[Fact]
public void BuildAlarmEventFilter_ExistingIndices0To12_Unchanged()
{
// Guard: the first 13 SelectClauses (indices 0-12) must remain unchanged so
// that existing HandleAlarmEvent logic is not silently broken by future edits.
var filter = RealOpcUaClient.BuildAlarmEventFilter(AlarmConditionFilter.AllowAll);
// Indices 0-5: base event fields (EventType…Severity) from BaseEventType.
for (var i = 0; i <= 5; i++)
Assert.Equal(ObjectTypeIds.BaseEventType, filter.SelectClauses[i].TypeDefinitionId);
// Index 6: AlarmConditionType/ActiveState/Id
Assert.Equal(ObjectTypeIds.AlarmConditionType, filter.SelectClauses[6].TypeDefinitionId);
Assert.Equal("ActiveState", filter.SelectClauses[6].BrowsePath[0].Name);
Assert.Equal("Id", filter.SelectClauses[6].BrowsePath[1].Name);
// Index 7: AcknowledgeableConditionType/AckedState/Id
Assert.Equal(ObjectTypeIds.AcknowledgeableConditionType, filter.SelectClauses[7].TypeDefinitionId);
Assert.Equal("AckedState", filter.SelectClauses[7].BrowsePath[0].Name);
// Index 11: ConditionType/ConditionName
Assert.Equal(ObjectTypeIds.ConditionType, filter.SelectClauses[11].TypeDefinitionId);
Assert.Equal("ConditionName", filter.SelectClauses[11].BrowsePath[0].Name);
// Index 12: ConditionType/Comment
Assert.Equal(ObjectTypeIds.ConditionType, filter.SelectClauses[12].TypeDefinitionId);
Assert.Equal("Comment", filter.SelectClauses[12].BrowsePath[0].Name);
}
}
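
The all-or-nothing rule pinned by the `OmitsWhereClause` tests above can be sketched without the OPC UA SDK. This is a minimal illustration, not the actual `BuildAlarmEventFilter` code: `ServerFilterSketch`, `ResolveServerSideTypes`, and the string ids are hypothetical placeholders (the real table maps friendly names to NodeIds).

```csharp
using System;
using System.Collections.Generic;

static class ServerFilterSketch
{
    // Placeholder ids (assumption); the real map holds OPC UA NodeIds.
    private static readonly Dictionary<string, string> KnownTypes =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["AlarmConditionType"] = "id:alarm-condition",
            ["LimitAlarmType"] = "id:limit",
            ["DiscreteAlarmType"] = "id:discrete",
        };

    // Returns the ids to filter on server-side, or an empty list when the
    // optimisation must be skipped (nothing requested, or any name unmapped).
    public static IReadOnlyList<string> ResolveServerSideTypes(IEnumerable<string> requested)
    {
        var resolved = new List<string>();
        foreach (var name in requested)
        {
            // One unknown name disables the whole clause: a partial clause
            // would make the server drop that type's events.
            if (!KnownTypes.TryGetValue(name, out var id))
                return Array.Empty<string>();
            resolved.Add(id);
        }
        return resolved;
    }
}
```

Under these assumptions, a request for `LimitAlarmType` plus an unmapped vendor type resolves to an empty list, so no server-side clause would be built and the client-side gate does all the filtering.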
@@ -0,0 +1,113 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.Alarms;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests;
/// <summary>
/// M2.4 (#8): the alarm conditionFilter is a comma-separated, case-insensitive
/// list of condition type names. Blank = allow all. These tests pin the
/// parse-once / IsAllowed predicate that the DataConnectionActor uses as the
/// authoritative client-side gate.
/// </summary>
public class AlarmConditionFilterTests
{
private static NativeAlarmTransition Tx(string typeName,
AlarmTransitionKind kind = AlarmTransitionKind.Raise) =>
new("ref", "obj", typeName, kind,
new AlarmConditionState(true, false, null, AlarmShelveState.Unshelved, false, 500),
"cat", "desc", "msg", "", "", null, DateTimeOffset.UtcNow, "1", "0");
[Theory]
[InlineData(null)]
[InlineData("")]
[InlineData(" ")]
[InlineData(",")]
[InlineData(" , , ")]
public void NullOrBlankFilter_IsEmpty_AllowsEverything(string? filter)
{
var f = AlarmConditionFilter.Parse(filter);
Assert.True(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("anything-at-all")));
}
[Fact]
public void Parse_SplitsCommaSeparatedList()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi,DiscreteAlarm,AnalogLimit.Lo");
Assert.False(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
Assert.True(f.IsAllowed(Tx("AnalogLimit.Lo")));
Assert.False(f.IsAllowed(Tx("AnalogLimit.HiHi")));
}
[Fact]
public void IsAllowed_IsCaseInsensitive()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("analoglimit.hi")));
Assert.True(f.IsAllowed(Tx("ANALOGLIMIT.HI")));
Assert.False(f.IsAllowed(Tx("DiscreteAlarm")));
}
[Fact]
public void Parse_TrimsWhitespaceAroundEachName()
{
var f = AlarmConditionFilter.Parse(" AnalogLimit.Hi ,\tDiscreteAlarm ");
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
}
[Fact]
public void Parse_DropsEmptyEntries_KeepsNonEmpty()
{
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi,, ,DiscreteAlarm");
Assert.False(f.IsEmpty);
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarm")));
Assert.False(f.IsAllowed(Tx("")));
}
[Fact]
public void IsAllowed_NeverDropsSnapshotCompleteFramingSentinel()
{
// SnapshotComplete is a pure framing sentinel (empty AlarmTypeName) that
// drives the NativeAlarmActor's atomic snapshot swap. A type filter must
// never swallow it or the snapshot replay never completes.
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("", AlarmTransitionKind.SnapshotComplete)));
}
[Fact]
public void IsAllowed_FiltersReplayedSnapshotConditionsByType()
{
// Snapshot-kind transitions carry real conditions and ARE filtered.
var f = AlarmConditionFilter.Parse("AnalogLimit.Hi");
Assert.True(f.IsAllowed(Tx("AnalogLimit.Hi", AlarmTransitionKind.Snapshot)));
Assert.False(f.IsAllowed(Tx("DiscreteAlarm", AlarmTransitionKind.Snapshot)));
}
[Fact]
public void Names_ExposesNormalizedSet_ForServerSideOptimization()
{
var f = AlarmConditionFilter.Parse(" AnalogLimit.Hi , DiscreteAlarm ");
Assert.Equal(new[] { "AnalogLimit.Hi", "DiscreteAlarm" }, f.Names.OrderBy(n => n).ToArray());
Assert.Empty(AlarmConditionFilter.Parse(null).Names);
}
[Fact]
public void IsAllowed_OpcUaResolvedFriendlyName_MatchesFriendlyNameFilter()
{
// M2.4 (#8) regression: OPC UA delivers events whose AlarmTypeName, after
// RealOpcUaClient.ResolveAlarmTypeName, is a standard friendly type name
// (e.g. "ExclusiveLevelAlarmType"). A friendly-name filter on that source
// built a correct server WhereClause; the client gate must agree and deliver,
// not drop every event (which the prior NodeId-string AlarmTypeName caused).
var f = AlarmConditionFilter.Parse("ExclusiveLevelAlarmType,DiscreteAlarmType");
Assert.True(f.IsAllowed(Tx("ExclusiveLevelAlarmType")));
Assert.True(f.IsAllowed(Tx("DiscreteAlarmType")));
Assert.False(f.IsAllowed(Tx("OffNormalAlarmType")));
}
}
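
The parse-once / `IsAllowed` contract these tests pin can be approximated as follows. This is a hedged sketch, not the real `AlarmConditionFilter`: the class name `ConditionFilterSketch` and the boolean sentinel parameter are assumptions made to keep the example self-contained (the real predicate takes a `NativeAlarmTransition`).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

sealed class ConditionFilterSketch
{
    private readonly HashSet<string> _names;

    private ConditionFilterSketch(HashSet<string> names) => _names = names;

    // An empty filter allows every condition type.
    public bool IsEmpty => _names.Count == 0;

    public static ConditionFilterSketch Parse(string? filter)
    {
        // TrimEntries + RemoveEmptyEntries drops blank and whitespace-only
        // entries, so "", " ", "," and " , , " all parse to an empty filter.
        var names = (filter ?? string.Empty).Split(
            ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        return new ConditionFilterSketch(
            new HashSet<string>(names, StringComparer.OrdinalIgnoreCase));
    }

    // The framing sentinel always passes; everything else is matched by
    // type name, case-insensitively.
    public bool IsAllowed(string alarmTypeName, bool isSnapshotCompleteSentinel) =>
        isSnapshotCompleteSentinel || IsEmpty || _names.Contains(alarmTypeName);
}
```

Parsing once into a case-insensitive set keeps the per-event check to a single hash lookup, which matters because the gate runs on every transition the adapter delivers.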
@@ -23,10 +23,27 @@ public class DataConnectionActorAlarmTests : TestKit
};
private static NativeAlarmTransition Raise(string sourceRef, string sourceObj) =>
new(sourceRef, sourceObj, "AnalogLimit.Hi", AlarmTransitionKind.Raise,
Raise(sourceRef, sourceObj, "AnalogLimit.Hi");
private static NativeAlarmTransition Raise(string sourceRef, string sourceObj, string typeName,
AlarmTransitionKind kind = AlarmTransitionKind.Raise) =>
new(sourceRef, sourceObj, typeName, kind,
new AlarmConditionState(true, false, null, AlarmShelveState.Unshelved, false, 500),
"Process", "hi", "hi", "", "", null, DateTimeOffset.UtcNow, "92", "90");
private static (IDataConnection Adapter, Func<AlarmTransitionCallback?> Cb) BuildAlarmAdapter()
{
AlarmTransitionCallback? cb = null;
var adapter = Substitute.For<IDataConnection, IAlarmSubscribableConnection>();
adapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
.Returns(Task.CompletedTask);
((IAlarmSubscribableConnection)adapter)
.SubscribeAlarmsAsync(Arg.Any<string>(), Arg.Any<string?>(),
Arg.Do<AlarmTransitionCallback>(c => cb = c), Arg.Any<CancellationToken>())
.Returns(Task.FromResult("alarm-sub-1"));
return (adapter, () => cb);
}
[Fact]
public void SubscribeAlarms_RoutesTransitionToInstanceSubscriber()
{
@@ -63,4 +80,119 @@ public class DataConnectionActorAlarmTests : TestKit
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01", null, DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => !m.Success && m.ErrorMessage != null);
}
// ── M2.4 (#8): conditionFilter is now applied client-side in the actor ──
[Fact]
public void SubscribeAlarms_WithTypeFilter_DeliversOnlyMatchingTypes()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
"AnalogLimit.Hi,AnalogLimit.Lo", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Non-matching type is dropped (no message delivered).
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi"));
ExpectNoMsg(TimeSpan.FromMilliseconds(250));
// Matching type is delivered.
cb!(Raise("Tank01.Hi", "Tank01", "AnalogLimit.Hi"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.Hi");
}
[Fact]
public void SubscribeAlarms_WithNullFilter_DeliversAllTypes()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01", null, DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.HiHi");
cb!(Raise("Tank01.Lo", "Tank01", "DiscreteAlarm"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "DiscreteAlarm");
}
[Fact]
public void SubscribeAlarms_FilterMatch_IgnoresCaseAndWhitespace()
{
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
" analoglimit.hi ,\tDISCRETEALARM ", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
cb!(Raise("Tank01.Hi", "Tank01", "AnalogLimit.Hi")); // case differs from filter
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "AnalogLimit.Hi");
cb!(Raise("Tank01.Disc", "Tank01", "DiscreteAlarm"));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.AlarmTypeName == "DiscreteAlarm");
cb!(Raise("Tank01.HiHi", "Tank01", "AnalogLimit.HiHi")); // not listed
ExpectNoMsg(TimeSpan.FromMilliseconds(250));
}
[Fact]
public void SubscribeAlarms_GatewayWideFeed_IsFilteredClientSide()
{
// MxGateway has no server-side filter: its adapter opens ONE gateway-wide
// feed and the actor is the authoritative gate. A filtered source must
// only see its own matching types even though the feed carries everything.
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "MxGateway")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Reactor",
"HighTemp", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Gateway-wide feed delivers a transition for a different source object —
// dropped by source routing.
cb!(Raise("Pump.Fault", "Pump", "HighTemp"));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
// Right source, wrong type — dropped by the client-side type gate.
cb!(Raise("Reactor.LowTemp", "Reactor", "LowTemp"));
ExpectNoMsg(TimeSpan.FromMilliseconds(200));
// Right source, right type — delivered.
cb!(Raise("Reactor.HighTemp", "Reactor", "HighTemp"));
ExpectMsg<NativeAlarmTransitionUpdate>(u =>
u.Transition.SourceObjectReference == "Reactor" && u.Transition.AlarmTypeName == "HighTemp");
}
[Fact]
public void SubscribeAlarms_WithFilter_StillForwardsSnapshotCompleteSentinel()
{
// The SnapshotComplete framing sentinel (empty AlarmTypeName) must survive
// the type gate so the NativeAlarmActor's snapshot swap can complete.
var (adapter, getCb) = BuildAlarmAdapter();
var actor = Sys.ActorOf(Props.Create(() => new DataConnectionActor(
"conn", adapter, _options, _health, _factory, "OpcUa")));
actor.Tell(new SubscribeAlarmsRequest("c", "inst", "conn", "Tank01",
"AnalogLimit.Hi", DateTimeOffset.UtcNow));
ExpectMsg<SubscribeAlarmsResponse>(m => m.Success);
var cb = getCb();
Assert.NotNull(cb);
// Snapshot-complete sentinel: empty AlarmTypeName (the framing marker) but
// routed because every subscriber receives it; never type-filtered.
cb!(new NativeAlarmTransition("Tank01", "Tank01", "", AlarmTransitionKind.SnapshotComplete,
new AlarmConditionState(false, true, null, AlarmShelveState.Unshelved, false, 0),
"", "", "", "", "", null, DateTimeOffset.UtcNow, "", ""));
ExpectMsg<NativeAlarmTransitionUpdate>(u => u.Transition.Kind == AlarmTransitionKind.SnapshotComplete);
}
}
@@ -1,3 +1,4 @@
using ZB.MOM.WW.MxGateway.Client;
using ZB.MOM.WW.MxGateway.Contracts.Proto;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
using CommonsTransitionKind = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.AlarmTransitionKind;
@@ -63,4 +64,91 @@ public class MxGatewayAlarmMapperTests
Assert.False(t.Condition.Acknowledged);
Assert.Equal(1000, t.Condition.Severity);
}
// ── CurrentValue / LimitValue (M2.13 / #27) ──────────────────────────────
[Fact]
public void MapTransition_CurrentAndLimitValue_PopulatedFromProto()
{
// The gateway proto OnAlarmTransitionEvent carries current_value and
// limit_value as MxValue union fields. Verify both are mapped through
// MxValueToString into the neutral NativeAlarmTransition strings.
var ev = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
TransitionKind = ProtoTransitionKind.Raise,
Severity = 800,
CurrentValue = 95.3.ToMxValue(),
LimitValue = 90.0.ToMxValue()
};
var t = MxGatewayAlarmMapper.MapTransition(ev);
Assert.Equal("95.3", t.CurrentValue);
Assert.Equal("90", t.LimitValue);
}
[Fact]
public void MapTransition_AbsentCurrentAndLimitValue_YieldsEmpty()
{
// When the gateway sends events without current/limit value fields (optional),
// the resulting transition must have empty strings — never null.
var ev = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.Hi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.Hi",
TransitionKind = ProtoTransitionKind.Raise,
Severity = 600
// CurrentValue and LimitValue not set → proto default (null reference)
};
var t = MxGatewayAlarmMapper.MapTransition(ev);
Assert.Equal("", t.CurrentValue);
Assert.Equal("", t.LimitValue);
}
[Fact]
public void MapSnapshot_CurrentAndLimitValue_PopulatedFromProto()
{
// ActiveAlarmSnapshot also carries current_value and limit_value.
var snap = new ActiveAlarmSnapshot
{
AlarmFullReference = "Pump01.Vibration.HiHi",
SourceObjectReference = "Pump01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
CurrentState = ProtoConditionState.Active,
Severity = 900,
CurrentValue = 12.7.ToMxValue(),
LimitValue = 10.0.ToMxValue()
};
var t = MxGatewayAlarmMapper.MapSnapshot(snap);
Assert.Equal("12.7", t.CurrentValue);
Assert.Equal("10", t.LimitValue);
}
[Fact]
public void MapSnapshot_StringMxValue_ProducesStringCurrentValue()
{
// MxValue can carry string values (e.g. for discrete/string-type tags).
var snap = new ActiveAlarmSnapshot
{
AlarmFullReference = "Mode.Alarm",
SourceObjectReference = "Mode",
AlarmTypeName = "DiscreteAlarm",
CurrentState = ProtoConditionState.Active,
Severity = 500,
CurrentValue = "FAULT".ToMxValue()
};
var t = MxGatewayAlarmMapper.MapSnapshot(snap);
Assert.Equal("FAULT", t.CurrentValue);
Assert.Equal("", t.LimitValue); // not set
}
}
@@ -55,4 +55,54 @@ public class OpcUaAlarmMapperTests
{
Assert.Equal(expected, OpcUaAlarmMapper.MapShelve(name));
}
// ── PickLimitValue (M2.13 / #27) ─────────────────────────────────────────
[Fact]
public void PickLimitValue_AllNull_ReturnsEmpty()
{
// All four limit fields absent (non-limit alarm type) → empty string.
Assert.Equal("", OpcUaAlarmMapper.PickLimitValue(null, null, null, null));
}
[Fact]
public void PickLimitValue_HighHighLimitPresent_ReturnsIt()
{
// HighHighLimit takes top priority; other fields are null (absent).
var result = OpcUaAlarmMapper.PickLimitValue(100.5, null, null, null);
Assert.Equal("100.5", result);
}
[Fact]
public void PickLimitValue_OnlyHighLimit_ReturnsHighLimit()
{
// Only HighLimit present (HighHighLimit absent on this alarm type).
var result = OpcUaAlarmMapper.PickLimitValue(null, 80.0, null, null);
Assert.Equal("80", result);
}
[Fact]
public void PickLimitValue_PriorityOrder_HighHighWinsOverHigh()
{
// When multiple limits are present, HighHighLimit takes precedence.
var result = OpcUaAlarmMapper.PickLimitValue(95.0, 80.0, 20.0, 5.0);
Assert.Equal("95", result);
}
[Fact]
public void PickLimitValue_OnlyLowLow_ReturnsLowLow()
{
// LowLowLimit only — last in priority, but should still be returned.
var result = OpcUaAlarmMapper.PickLimitValue(null, null, null, -10.5);
Assert.Equal("-10.5", result);
}
[Fact]
public void PickLimitValue_UsesInvariantCulture()
{
// Decimal separator must always be '.' regardless of thread culture.
var result = OpcUaAlarmMapper.PickLimitValue(1.5, null, null, null);
Assert.Contains('.', result); // invariant culture: '.' not ','
Assert.Equal("1.5", result);
}
}
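
A priority pick like the one exercised above could look like this. It is a sketch under assumptions: the parameter types (`double?`) and the relative order of the High and Low limits are guesses, since the tests only pin HighHigh as first and LowLow as last; `LimitValueSketch` is a hypothetical name.

```csharp
using System.Globalization;

static class LimitValueSketch
{
    // Returns the first non-null limit in priority order, formatted with the
    // invariant culture so the decimal separator is always '.'.
    public static string PickLimitValue(
        double? highHigh, double? high, double? low, double? lowLow)
    {
        var first = highHigh ?? high ?? low ?? lowLow;
        return first?.ToString(CultureInfo.InvariantCulture) ?? string.Empty;
    }
}
```

Returning an empty string rather than null for non-limit alarm types matches the "never null" convention the mapper tests assert elsewhere in this change.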
@@ -0,0 +1,63 @@
using Opc.Ua;
using ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Adapters;
namespace ZB.MOM.WW.ScadaBridge.DataConnectionLayer.Tests;
/// <summary>
/// M2.4 (#8) regression: standard OPC UA A&amp;C events carry an event-type
/// <see cref="NodeId"/> (e.g. <c>i=9482</c> for ExclusiveLevelAlarmType), but the
/// client-side conditionFilter gate — and the server-side WhereClause — both key off
/// the friendly type names in <see cref="RealOpcUaClient.KnownConditionTypeIds"/>.
/// <see cref="RealOpcUaClient.ResolveAlarmTypeName"/> bridges the two by resolving the
/// event-type NodeId back to its friendly name (NodeId-string fallback for custom
/// types), so a friendly-name filter actually matches the events the server delivers.
/// </summary>
public class RealOpcUaClientAlarmFilterTests
{
[Fact]
public void ResolveAlarmTypeName_KnownStandardNodeId_ReturnsFriendlyName()
{
// The well-known NodeId for ExclusiveLevelAlarmType (i=9482) must resolve to
// the friendly name the conditionFilter/WhereClause use.
var resolved = RealOpcUaClient.ResolveAlarmTypeName(ObjectTypeIds.ExclusiveLevelAlarmType);
Assert.Equal("ExclusiveLevelAlarmType", resolved);
}
[Fact]
public void ResolveAlarmTypeName_DiscreteAlarmNodeId_ReturnsFriendlyName()
{
var resolved = RealOpcUaClient.ResolveAlarmTypeName(ObjectTypeIds.DiscreteAlarmType);
Assert.Equal("DiscreteAlarmType", resolved);
}
[Fact]
public void ResolveAlarmTypeName_UnknownCustomNodeId_ReturnsNodeIdString()
{
// A vendor/custom subtype not in KnownConditionTypeIds: we cannot map it to a
// friendly name, so we fall back to its NodeId string. This is consistent —
// the WhereClause is also omitted for unknown names, so the client gate matches
// the NodeId string, which is the only thing such a filter could carry.
var custom = new NodeId(987654u, 7);
var resolved = RealOpcUaClient.ResolveAlarmTypeName(custom);
Assert.Equal(custom.ToString(), resolved);
}
[Fact]
public void ResolveAlarmTypeName_Null_ReturnsEmptyString()
{
Assert.Equal("", RealOpcUaClient.ResolveAlarmTypeName(null));
}
[Fact]
public void InverseMap_RoundTrips_EveryKnownConditionType()
{
// The friendly→NodeId map (KnownConditionTypeIds) and the NodeId→friendly map
// are derived from a single source of truth, so they must round-trip for every
// entry — guards against the two maps drifting apart.
foreach (var (friendlyName, nodeId) in RealOpcUaClient.KnownConditionTypeIds)
{
var resolved = RealOpcUaClient.ResolveAlarmTypeName(nodeId);
Assert.Equal(friendlyName, resolved);
}
}
}
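
The round-trip guarantee follows naturally when the inverse map is derived from the forward map rather than maintained by hand. A minimal sketch, assuming integer placeholder ids in place of NodeIds; `TypeNameMapSketch` and `Resolve` are hypothetical names, not the real `RealOpcUaClient` members.

```csharp
using System.Collections.Generic;
using System.Linq;

static class TypeNameMapSketch
{
    // Single source of truth: friendly name -> id (placeholder ints, assumption).
    public static readonly IReadOnlyDictionary<string, int> NameToId =
        new Dictionary<string, int>
        {
            ["ExclusiveLevelAlarmType"] = 1,
            ["DiscreteAlarmType"] = 2,
        };

    // Derived inverse, so the two maps cannot drift apart.
    private static readonly Dictionary<int, string> IdToName =
        NameToId.ToDictionary(kv => kv.Value, kv => kv.Key);

    // Null -> "", known id -> friendly name, unknown id -> its string form.
    public static string Resolve(int? id) =>
        id is null ? string.Empty
        : IdToName.TryGetValue(id.Value, out var name) ? name
        : id.Value.ToString();
}
```

The derived dictionary depends on `NameToId` being declared first, because C# runs static field initializers in textual order.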
@@ -22,6 +22,8 @@
uses a plain [Fact] — it never needs the server.
-->
<PackageReference Include="Xunit.SkippableFact" />
<!-- MxGateway.Client brings MxValueExtensions (ToMxValue) used by MxGatewayAlarmMapper tests. -->
<PackageReference Include="ZB.MOM.WW.MxGateway.Client" />
</ItemGroup>
<ItemGroup>
@@ -0,0 +1,122 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Instances;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Enums;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Flattening;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
namespace ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests;
/// <summary>
/// M2.8 (#23): proves the deploy path (FlatteningPipeline.FlattenAndValidateAsync)
/// opts into connection-binding enforcement, so a data-sourced attribute with no
/// binding gates the deployment as an ERROR (not just a warning), and that a binding
/// resolving to a connection that actually exists at the target site passes.
/// </summary>
public class FlatteningPipelineConnectionBindingTests
{
private const int InstanceId = 1;
private const int TemplateId = 10;
private const int SiteId = 100;
private const int ConnectionId = 7;
private readonly ITemplateEngineRepository _templateRepo = Substitute.For<ITemplateEngineRepository>();
private readonly ISiteRepository _siteRepo = Substitute.For<ISiteRepository>();
private readonly FlatteningPipeline _sut;
public FlatteningPipelineConnectionBindingTests()
{
_sut = new FlatteningPipeline(
_templateRepo,
_siteRepo,
new FlatteningService(),
new ValidationService(),
new RevisionHashService());
}
/// <summary>
/// Seeds a single-template chain with one data-sourced attribute ("Temp") and a
/// site that owns a single "PlantBus" data connection. The instance optionally
/// binds "Temp" to <paramref name="boundConnectionId"/>.
/// </summary>
private void Arrange(int? boundConnectionId)
{
var template = new Template("Tank") { Id = TemplateId };
template.Attributes.Add(new TemplateAttribute("Temp")
{
DataType = DataType.Double,
DataSourceReference = "ns=2;s=Temp"
});
var instance = new Instance("Tank-01") { Id = InstanceId, TemplateId = TemplateId, SiteId = SiteId };
if (boundConnectionId.HasValue)
{
instance.ConnectionBindings.Add(new InstanceConnectionBinding("Temp")
{
InstanceId = InstanceId,
DataConnectionId = boundConnectionId.Value
});
}
_templateRepo.GetInstanceByIdAsync(InstanceId, Arg.Any<CancellationToken>()).Returns(instance);
_templateRepo.GetTemplateWithChildrenAsync(TemplateId, Arg.Any<CancellationToken>()).Returns(template);
_templateRepo.GetCompositionsByTemplateIdAsync(TemplateId, Arg.Any<CancellationToken>()).Returns([]);
_templateRepo.GetAllSharedScriptsAsync(Arg.Any<CancellationToken>()).Returns([]);
var connection = new DataConnection("PlantBus", "OpcUa", SiteId) { Id = ConnectionId };
_siteRepo.GetDataConnectionsBySiteIdAsync(SiteId, Arg.Any<CancellationToken>())
.Returns([connection]);
}
[Fact]
public async Task FlattenAndValidate_DataSourcedAttributeWithNoBinding_ReportsBindingError()
{
Arrange(boundConnectionId: null);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.False(result.Value.Validation.IsValid);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
[Fact]
public async Task FlattenAndValidate_BindingToExistingSiteConnection_NoBindingError()
{
Arrange(boundConnectionId: ConnectionId);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.DoesNotContain(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
[Fact]
public async Task FlattenAndValidate_BindingToStaleDeletedConnection_ReportsBindingError()
{
// M2.8 (#23): FlatteningService.ApplyConnectionBindings silently drops a
// binding whose DataConnectionId doesn't resolve to any loaded site
// DataConnection (stale / deleted connection). The flattener leaves
// BoundDataConnectionId == null, so the validator treats the attribute as
// unbound and gates the deployment with a ConnectionBinding Error.
//
// Arrange: the instance binding points at id 999, but the site only has
// the connection with id=ConnectionId (7). The flattener can't resolve 999
// and drops the binding silently; the validator then flags it.
const int StaleConnectionId = 999;
Arrange(boundConnectionId: StaleConnectionId);
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.False(result.Value.Validation.IsValid);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.ConnectionBinding);
}
}
@@ -0,0 +1,102 @@
using NSubstitute;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Instances;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Sites;
using ZB.MOM.WW.ScadaBridge.Commons.Entities.Templates;
using ZB.MOM.WW.ScadaBridge.Commons.Interfaces.Repositories;
using ZB.MOM.WW.ScadaBridge.Commons.Types.Flattening;
using ZB.MOM.WW.ScadaBridge.DeploymentManager;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Flattening;
using ZB.MOM.WW.ScadaBridge.TemplateEngine.Validation;
namespace ZB.MOM.WW.ScadaBridge.DeploymentManager.Tests;
/// <summary>
/// M2.1 (#22): proves the FlatteningPipeline actually computes the alarm-capable
/// connection set from the loaded site data connections and threads it through
/// ValidationService → SemanticValidator. Before the fix the pipeline loaded the
/// connections but never passed the capable set, so the native-alarm-source
/// capability check (built but inert) never ran in production — a source bound to
/// a non-alarm-capable connection deployed silently.
/// </summary>
public class FlatteningPipelineNativeAlarmCapabilityTests
{
private const int InstanceId = 1;
private const int TemplateId = 10;
private const int SiteId = 100;
private readonly ITemplateEngineRepository _templateRepo = Substitute.For<ITemplateEngineRepository>();
private readonly ISiteRepository _siteRepo = Substitute.For<ISiteRepository>();
private readonly FlatteningPipeline _sut;
public FlatteningPipelineNativeAlarmCapabilityTests()
{
_sut = new FlatteningPipeline(
_templateRepo,
_siteRepo,
new FlatteningService(),
new ValidationService(),
new RevisionHashService());
}
/// <summary>
/// Seeds a single-template chain whose only template carries one native alarm
/// source bound to <paramref name="connectionName"/>, and a site that owns a
/// single data connection of <paramref name="connectionProtocol"/>.
/// </summary>
private void Arrange(string connectionName, string connectionProtocol, string boundConnectionName)
{
var template = new Template("Tank") { Id = TemplateId };
template.NativeAlarmSources.Add(new TemplateNativeAlarmSource("BoilerAlarms")
{
ConnectionName = boundConnectionName,
SourceReference = "ns=2;s=Boiler",
});
var instance = new Instance("Tank-01") { Id = InstanceId, TemplateId = TemplateId, SiteId = SiteId };
_templateRepo.GetInstanceByIdAsync(InstanceId, Arg.Any<CancellationToken>()).Returns(instance);
_templateRepo.GetTemplateWithChildrenAsync(TemplateId, Arg.Any<CancellationToken>()).Returns(template);
_templateRepo.GetCompositionsByTemplateIdAsync(TemplateId, Arg.Any<CancellationToken>())
.Returns([]);
_templateRepo.GetAllSharedScriptsAsync(Arg.Any<CancellationToken>())
.Returns([]);
var connection = new DataConnection(connectionName, connectionProtocol, SiteId) { Id = 7 };
_siteRepo.GetDataConnectionsBySiteIdAsync(SiteId, Arg.Any<CancellationToken>())
.Returns([connection]);
}
[Fact]
public async Task FlattenAndValidate_NativeAlarmSourceOnNonAlarmCapableConnection_ReportsCapabilityError()
{
// A "Modbus" connection is NOT alarm-capable (no IAlarmSubscribableConnection adapter).
Arrange(connectionName: "PlantBus", connectionProtocol: "Modbus", boundConnectionName: "PlantBus");
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.Contains(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.NativeAlarmSourceInvalid
&& e.Message.Contains("alarm-capable"));
}
[Theory]
[InlineData("OpcUa")]
[InlineData("MxGateway")]
// Case variants: IsAlarmCapable uses OrdinalIgnoreCase, matching DataConnectionFactory's
// own OrdinalIgnoreCase protocol-key lookup; lock the contract with non-canonical casing.
[InlineData("OPCUA")]
[InlineData("opcua")]
[InlineData("mxgateway")]
[InlineData("MXGATEWAY")]
public async Task FlattenAndValidate_NativeAlarmSourceOnAlarmCapableConnection_NoCapabilityError(string protocol)
{
Arrange(connectionName: "Boiler", connectionProtocol: protocol, boundConnectionName: "Boiler");
var result = await _sut.FlattenAndValidateAsync(InstanceId);
Assert.True(result.IsSuccess);
Assert.DoesNotContain(result.Value.Validation.Errors,
e => e.Category == ValidationCategory.NativeAlarmSourceInvalid);
}
}
@@ -100,7 +100,14 @@ public class DatabaseGatewayTests
var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
var gateway = new DatabaseGateway(_repository, NullLogger<DatabaseGateway>.Instance, storeAndForward: sf);
// M2.3 (#7): CachedWriteAsync now attempts the write immediately and
// only buffers on a TRANSIENT failure. The stub forces a transient
// outcome so this test exercises the buffering path deterministically
// without a real SQL Server.
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new TransientDatabaseException("deadlock", errorNumber: 1205));
// Audit Log #23 (ExecutionId Task 4): a known execution id / source
// script so the gateway -> EnqueueAsync hop can be asserted below.
@@ -157,7 +164,11 @@ public class DatabaseGatewayTests
var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
var gateway = new DatabaseGateway(_repository, NullLogger<DatabaseGateway>.Instance, storeAndForward: sf);
// M2.3 (#7): force a transient outcome so the write reaches S&F.
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new TransientDatabaseException("deadlock", errorNumber: 1205));
await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
@@ -167,6 +178,377 @@ public class DatabaseGatewayTests
Assert.NotEqual(0, maxRetries);
}
// ── M2.3 (#7): transient-vs-permanent SQL classification on the immediate
// cached-write attempt + the buffered retry path ──
/// <summary>
/// Builds a real, initialised in-memory store-and-forward service plus a
/// keep-alive connection (the SQLite shared-cache DB lives only while a
/// connection is open). The caller disposes <paramref name="keepAlive"/>.
/// </summary>
private static (ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService Sf, string ConnStr, Microsoft.Data.Sqlite.SqliteConnection KeepAlive)
NewStoreAndForward()
{
var dbName = $"EsgCachedWriteClassify_{Guid.NewGuid():N}";
var connStr = $"Data Source={dbName};Mode=Memory;Cache=Shared";
var keepAlive = new Microsoft.Data.Sqlite.SqliteConnection(connStr);
keepAlive.Open();
var storage = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardStorage(
connStr, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardStorage>.Instance);
storage.InitializeAsync().GetAwaiter().GetResult();
var sfOptions = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardOptions
{
DefaultMaxRetries = 99,
DefaultRetryInterval = TimeSpan.FromMinutes(10),
RetryTimerInterval = TimeSpan.FromMinutes(10),
};
var sf = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService(
storage, sfOptions, NullLogger<ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService>.Instance);
return (sf, connStr, keepAlive);
}
[Fact]
public async Task CachedWrite_PermanentSqlError_ReturnsFailedSynchronously_NotBuffered()
{
// A constraint/syntax/permission failure on the IMMEDIATE attempt must
// be returned to the script as Failed and must NOT be buffered — mirrors
// ExternalSystemClient.CachedCallAsync's PermanentExternalSystemException
// path.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new PermanentDatabaseException(
"Violation of PRIMARY KEY constraint", errorNumber: 2627));
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.False(result.Success);
Assert.False(result.WasBuffered);
Assert.NotNull(result.ErrorMessage);
// Nothing buffered — the permanent failure short-circuited S&F.
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_TransientSqlError_BuffersToStoreAndForward()
{
// A deadlock / timeout on the IMMEDIATE attempt is transient — the write
// is handed to S&F (WasBuffered=true), not returned as Failed.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test")
{
Id = 1,
MaxRetries = 5,
RetryDelay = TimeSpan.FromSeconds(12),
};
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(
_repository,
sf,
onExecute: () => throw new TransientDatabaseException(
"Transaction was deadlocked", errorNumber: 1205));
var result = await gateway.CachedWriteAsync(
"testDb", "UPDATE t SET v = 1", new Dictionary<string, object?> { ["x"] = 1 });
Assert.True(result.Success); // accepted for delivery
Assert.True(result.WasBuffered); // handed to S&F, not synchronously failed
Assert.Null(result.ErrorMessage);
Assert.Equal(1, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_ImmediateSuccess_NotBuffered_ReturnsDelivered()
{
// A write that succeeds immediately is done — it must NOT be buffered,
// and the result reports success (WasBuffered=false), mirroring the API
// path's immediate-success behaviour.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new ExecuteStubGateway(_repository, sf, onExecute: () => { /* succeeds */ });
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.True(result.Success);
Assert.False(result.WasBuffered);
Assert.Null(result.ErrorMessage);
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task DeliverBuffered_TransientSqlError_RethrowsSoEngineRetries()
{
// On the retry path a transient failure must propagate so the S&F engine
// schedules another retry — mirrors ExternalSystemClient.DeliverBuffered
// letting TransientExternalSystemException escape.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new ExecuteStubGateway(
_repository,
storeAndForward: null,
onExecute: () => throw new TransientDatabaseException("timeout", errorNumber: -2));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
await Assert.ThrowsAsync<TransientDatabaseException>(
() => gateway.DeliverBufferedAsync(message));
}
[Fact]
public async Task DeliverBuffered_PermanentSqlError_ReturnsFalseSoMessageParks()
{
// On the retry path a permanent failure must park the message (return
// false) rather than retry forever — mirrors ExternalSystemClient.
// DeliverBuffered returning false on PermanentExternalSystemException.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new ExecuteStubGateway(
_repository,
storeAndForward: null,
onExecute: () => throw new PermanentDatabaseException(
"Invalid column name", errorNumber: 207));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
var delivered = await gateway.DeliverBufferedAsync(message);
Assert.False(delivered); // permanent — the S&F engine parks the message
}
// ── M2.3 (#7) code-review fix: ExecuteWriteAsync must classify NON-SqlException
// DB outages as transient (buffer+retry) and propagate cancellation —
// mirroring the HTTP path's ordered catches in InvokeHttpAsync. The pre-fix
// code only caught SqlException, so a live outage surfacing as
// InvalidOperationException / SocketException / IOException / TimeoutException
// escaped unclassified and crashed the Script Execution Actor instead of
// buffering. These tests drive the RAW execution seam (RunSqlAsync) so the
// PRODUCTION classification in ExecuteWriteAsync runs end-to-end. ──
public static IEnumerable<object[]> TransientNonSqlOutages()
{
// A live DB outage that surfaces as a non-SqlException: connection-state,
// socket, IO, and timeout failures are all retryable transport errors.
yield return new object[] { new InvalidOperationException("The connection is not open.") };
yield return new object[] { new System.Net.Sockets.SocketException(10061 /* connection refused */) };
yield return new object[] { new System.IO.IOException("Unable to read data from the transport connection.") };
yield return new object[] { new TimeoutException("The operation has timed out.") };
}
[Theory]
[MemberData(nameof(TransientNonSqlOutages))]
public async Task CachedWrite_NonSqlOutage_ClassifiedTransient_BuffersNotCrash(Exception outage)
{
// [1] A live outage that is NOT a SqlException must be classified TRANSIENT
// (buffered for retry), NOT escape unclassified to crash the script actor,
// and NOT be returned as a permanent Failed result.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test")
{
Id = 1,
MaxRetries = 5,
RetryDelay = TimeSpan.FromSeconds(12),
};
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
// RawExecuteStubGateway routes the raw throw through the PRODUCTION
// ExecuteWriteAsync classification (the seam under test), unlike
// ExecuteStubGateway which throws an already-classified exception.
var gateway = new RawExecuteStubGateway(_repository, sf, onRunSql: () => throw outage);
var result = await gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)");
Assert.True(result.Success); // accepted for delivery, not a crash
Assert.True(result.WasBuffered); // handed to S&F as transient
Assert.Null(result.ErrorMessage); // not a permanent Failed result
Assert.Equal(1, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_CancellationRequested_PropagatesOperationCanceled_NotReclassified()
{
// [2] OperationCanceledException raised while the caller's token is
// cancelled must propagate UNCHANGED — never reclassified as a transient
// DB error and never buffered. Mirrors the HTTP path's first catch:
// `catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) throw;`
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
using var cts = new CancellationTokenSource();
cts.Cancel();
var gateway = new RawExecuteStubGateway(
_repository, sf, onRunSql: () => throw new OperationCanceledException(cts.Token));
await Assert.ThrowsAsync<OperationCanceledException>(
() => gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)", cancellationToken: cts.Token));
// Cancellation is not a transient failure — nothing must have been buffered.
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task CachedWrite_UnexpectedException_Propagates_NotClassifiedTransient()
{
// An exception type outside the transient transport set (e.g.
// ArgumentException) is NOT a DB outage — it must propagate, exactly as
// the HTTP path lets genuinely-unexpected exceptions escape past
// `catch (Exception ex) when (ErrorClassifier.IsTransient(ex))`.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var (sf, connStr, keepAlive) = NewStoreAndForward();
using var _ = keepAlive;
var gateway = new RawExecuteStubGateway(
_repository, sf, onRunSql: () => throw new ArgumentException("authoring bug"));
await Assert.ThrowsAsync<ArgumentException>(
() => gateway.CachedWriteAsync("testDb", "INSERT INTO t VALUES (1)"));
Assert.Equal(0, ReadBufferDepth(connStr));
}
[Fact]
public async Task DeliverBuffered_NonSqlOutage_RethrowsAsTransient_SoEngineRetries()
{
// [1] on the RETRY path: a non-SqlException outage during delivery must be
// classified transient and propagate (as TransientDatabaseException) so
// the S&F engine schedules another retry — it must NOT crash/park.
var conn = new DatabaseConnectionDefinition("testDb", "Server=localhost;Database=test") { Id = 1 };
StubConnection(conn);
var gateway = new RawExecuteStubGateway(
_repository,
storeAndForward: null,
onRunSql: () => throw new InvalidOperationException("The connection is not open."));
var message = new ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardMessage
{
Id = Guid.NewGuid().ToString("N"),
Category = ZB.MOM.WW.ScadaBridge.Commons.Types.Enums.StoreAndForwardCategory.CachedDbWrite,
Target = "testDb",
PayloadJson =
"""{"ConnectionName":"testDb","Sql":"INSERT INTO t VALUES (1)","Parameters":null}""",
};
await Assert.ThrowsAsync<TransientDatabaseException>(
() => gateway.DeliverBufferedAsync(message));
}
/// <summary>
/// Reads the current buffered-message count off the S&amp;F SQLite DB by
/// counting <c>sf_messages</c> rows (the engine's persistence table).
/// </summary>
private static int ReadBufferDepth(string connStr)
{
using var conn = new Microsoft.Data.Sqlite.SqliteConnection(connStr);
conn.Open();
using var cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM sf_messages";
return Convert.ToInt32(cmd.ExecuteScalar());
}
/// <summary>
/// Test gateway that substitutes the SQL-execution seam so a test can drive
/// success / transient / permanent outcomes without a real SQL Server (and
/// without fabricating a <see cref="Microsoft.Data.SqlClient.SqlException"/>,
/// which has no public constructor). Production classifies a real
/// <c>SqlException</c> into <see cref="TransientDatabaseException"/> /
/// <see cref="PermanentDatabaseException"/> at this same seam.
/// </summary>
private sealed class ExecuteStubGateway : DatabaseGateway
{
private readonly Action _onExecute;
public ExecuteStubGateway(
IExternalSystemRepository repository,
ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService? storeAndForward,
Action onExecute)
: base(repository, NullLogger<DatabaseGateway>.Instance, storeAndForward)
=> _onExecute = onExecute;
internal override Task ExecuteWriteAsync(
string connectionName,
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
_onExecute();
return Task.CompletedTask;
}
}
/// <summary>
/// Test gateway that substitutes the INNER SQL-execution seam
/// (<c>RunSqlAsync</c>) so a test can throw RAW exceptions (a real outage
/// shape: <see cref="InvalidOperationException"/>, <see cref="System.Net.Sockets.SocketException"/>,
/// etc.) and have them flow through the PRODUCTION
/// <c>ExecuteWriteAsync</c> classification (the catch ordering under test) —
/// unlike <see cref="ExecuteStubGateway"/>, which throws an
/// already-classified <see cref="TransientDatabaseException"/> /
/// <see cref="PermanentDatabaseException"/> and so bypasses the catches.
/// </summary>
private sealed class RawExecuteStubGateway : DatabaseGateway
{
private readonly Action _onRunSql;
public RawExecuteStubGateway(
IExternalSystemRepository repository,
ZB.MOM.WW.ScadaBridge.StoreAndForward.StoreAndForwardService? storeAndForward,
Action onRunSql)
: base(repository, NullLogger<DatabaseGateway>.Instance, storeAndForward)
=> _onRunSql = onRunSql;
internal override Task RunSqlAsync(
string connectionString,
string sql,
IReadOnlyDictionary<string, object?> parameters,
CancellationToken cancellationToken)
{
_onRunSql();
return Task.CompletedTask;
}
}
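// Illustrative sketch only, not part of this commit: the catch ordering that the
// RawExecuteStubGateway tests above exercise, reduced to a small helper. The
// production ExecuteWriteAsync is not shown in this diff, so the method name, the
// runSql delegate, and the errorNumber: 0 placeholder are assumptions. The
// SqlException branch (classify by error number as transient or permanent) is
// omitted for brevity.
private static async Task ExecuteWithClassificationSketchAsync(
    Func<CancellationToken, Task> runSql,
    CancellationToken cancellationToken)
{
    try
    {
        await runSql(cancellationToken);
    }
    // 1. Caller-requested cancellation propagates unchanged. This filter has to come
    //    first, because IsTransient(Exception) also accepts TaskCanceledException.
    catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        throw;
    }
    // 2. Transport-shaped outages are wrapped so the caller can buffer and retry.
    catch (Exception ex) when (SqlErrorClassifier.IsTransient(ex))
    {
        throw new TransientDatabaseException(ex.Message, errorNumber: 0);
    }
    // 3. Anything else (for example ArgumentException) escapes unclassified.
}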
private static (int MaxRetries, long RetryIntervalMs, Guid? ExecutionId, string? SourceScript)
ReadBufferedRetrySettings(string connStr)
{
@@ -0,0 +1,105 @@
using System.Data.Common;
namespace ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests;
/// <summary>
/// M2.3 (#7): unit tests for the transient-vs-permanent SQL error-number
/// classifier that <c>DatabaseGateway</c> uses to decide whether a failed
/// cached write should be buffered (transient) or returned to the script
/// synchronously / parked (permanent).
/// </summary>
public class SqlErrorClassifierTests
{
// The full transient set documented on SqlErrorClassifier — connection,
// timeout, deadlock, and Azure throttle error numbers. A retry can plausibly
// succeed for any of these, so they are buffered to store-and-forward.
[Theory]
[InlineData(-2)] // timeout expired
[InlineData(-1)] // connection error
[InlineData(2)] // network / instance not found
[InlineData(53)] // network path not found
[InlineData(64)] // connection terminated mid-session
[InlineData(233)] // no process on the other end of the pipe
[InlineData(1205)] // deadlock victim
[InlineData(10053)] // transport-level abort
[InlineData(10054)] // connection reset by peer
[InlineData(10060)] // connection timed out
[InlineData(40197)] // Azure SQL service error, retry
[InlineData(40501)] // Azure SQL service busy
[InlineData(40613)] // Azure SQL database unavailable
[InlineData(49918)] // Azure SQL cannot process request (throttle)
[InlineData(49919)] // Azure SQL too many create/update operations
[InlineData(49920)] // Azure SQL too many operations (throttle)
public void IsTransient_KnownTransientNumber_ReturnsTrue(int errorNumber)
{
Assert.True(SqlErrorClassifier.IsTransient(errorNumber));
}
// Constraint, syntax, and permission errors are permanent — retrying the
// identical statement cannot succeed and may cause duplicate side effects.
[Theory]
[InlineData(547)] // constraint violation (FK/CHECK)
[InlineData(2627)] // primary-key / unique constraint violation
[InlineData(2601)] // duplicate key in a unique index
[InlineData(102)] // incorrect syntax
[InlineData(156)] // incorrect syntax near a keyword
[InlineData(207)] // invalid column name
[InlineData(208)] // invalid object name
[InlineData(229)] // permission denied on object
[InlineData(230)] // permission denied on column
[InlineData(262)] // permission denied (CREATE etc.)
public void IsTransient_KnownPermanentNumber_ReturnsFalse(int errorNumber)
{
Assert.False(SqlErrorClassifier.IsTransient(errorNumber));
}
[Theory]
[InlineData(0)] // no error number captured
[InlineData(99999)] // unknown / undocumented number
[InlineData(12345)]
[InlineData(int.MaxValue)]
public void IsTransient_UnknownNumber_DefaultsToPermanent(int errorNumber)
{
// Fail-fast is the safer default: an unrecognised error number must NOT
// be silently retried forever. Unknown => permanent => false.
Assert.False(SqlErrorClassifier.IsTransient(errorNumber));
}
// ── M2.3 (#7) code-review fix: IsTransient(Exception) — a live DB outage does
// not always surface as a SqlException. Transport/connection/timeout/driver
// exception types are transient (buffer+retry), mirroring the HTTP path's
// ErrorClassifier.IsTransient(Exception). ──
public static IEnumerable<object[]> TransientExceptionTypes()
{
yield return new object[] { new InvalidOperationException("connection not open") };
yield return new object[] { new System.IO.IOException("transport reset") };
yield return new object[] { new System.Net.Sockets.SocketException(10060) };
yield return new object[] { new TimeoutException("timed out") };
yield return new object[] { new TaskCanceledException("driver-level cancellation") };
// Any DbException that is NOT a SqlException is a driver/transport error.
yield return new object[] { new NonSqlDbException("provider transport error") };
}
[Theory]
[MemberData(nameof(TransientExceptionTypes))]
public void IsTransient_Exception_TrueForTransportTypes(Exception ex)
{
Assert.True(SqlErrorClassifier.IsTransient(ex));
}
[Fact]
public void IsTransient_Exception_FalseForUnexpectedType()
{
// Authoring bugs are NOT a DB outage — they must propagate, exactly as the
// HTTP path lets genuinely-unexpected exceptions escape its IsTransient filter.
Assert.False(SqlErrorClassifier.IsTransient(new ArgumentException("authoring bug")));
Assert.False(SqlErrorClassifier.IsTransient(new NullReferenceException()));
}
/// <summary>A concrete <see cref="DbException"/> that is not a SqlException, for the classifier unit test.</summary>
private sealed class NonSqlDbException : DbException
{
public NonSqlDbException(string message) : base(message) { }
}
}
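// Illustrative sketch only, not part of this commit: one shape a classifier that
// satisfies the tests above could take. The production SqlErrorClassifier is not
// shown in this diff; the name SqlErrorClassifierSketch and its internals are
// assumptions, as is the availability of Microsoft.Data.SqlClient in this project.
internal static class SqlErrorClassifierSketch
{
    // Error numbers copied from the IsTransient_KnownTransientNumber theory above.
    private static readonly HashSet<int> TransientNumbers = new()
    {
        -2, -1, 2, 53, 64, 233, 1205, 10053, 10054, 10060,
        40197, 40501, 40613, 49918, 49919, 49920,
    };

    // Unknown numbers fall through to "permanent" (false): the fail-fast default.
    public static bool IsTransient(int errorNumber) => TransientNumbers.Contains(errorNumber);

    public static bool IsTransient(Exception ex) => ex switch
    {
        // A SqlException is decided by its error number, so it is matched before
        // the broader DbException arm.
        Microsoft.Data.SqlClient.SqlException sql => IsTransient(sql.Number),
        DbException => true,                         // other driver or transport errors
        InvalidOperationException => true,           // e.g. "connection is not open"
        System.IO.IOException => true,
        System.Net.Sockets.SocketException => true,
        TimeoutException => true,
        TaskCanceledException => true,               // driver-level cancellation
        _ => false,                                  // authoring bugs propagate
    };
}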
@@ -0,0 +1,48 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) idempotency regression — code-review finding on commit d81f747.
/// <para>
/// <see cref="ServiceCollectionExtensions.AddSiteEventLogHealthMetricsBridge"/> uses a
/// factory-lambda overload of <c>AddHostedService</c>, which sets only
/// <c>ImplementationFactory</c> and leaves <c>ImplementationType</c> null. The original
/// <c>ImplementationType ==</c> guard was therefore a silent no-op: a second call would spin
/// up a second <see cref="SiteEventLogFailureCountReporter"/> (two timers both polling).
/// The fix uses a private marker singleton whose <c>ServiceType</c> is always set.
/// </para>
/// </summary>
public class AddSiteEventLogHealthMetricsBridgeTests
{
[Fact]
public void AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService()
{
// M2.16 (#30): calling the bridge method twice must register exactly one
// SiteEventLogFailureCountReporter. Without the marker-type guard the
// ImplementationType == check was a no-op for factory-lambda registrations,
// so the second call would have added a second hosted service (two timers).
var services = new ServiceCollection();
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
services.AddHealthMonitoring();
Func<IServiceProvider, Func<long>> factory = _ => () => 0L;
services.AddSiteEventLogHealthMetricsBridge(factory);
services.AddSiteEventLogHealthMetricsBridge(factory);
// Count the resolved IHostedService instances that are a
// SiteEventLogFailureCountReporter. The registration is factory-based, so
// ImplementationType is null on the descriptor; resolve and filter by type instead.
using var provider = services.BuildServiceProvider();
var reporters = provider.GetServices<IHostedService>()
.OfType<SiteEventLogFailureCountReporter>()
.ToList();
Assert.Single(reporters);
}
}
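// Illustrative sketch only, not part of this commit: the marker-singleton guard
// described in the summary above, in isolation. The production
// AddSiteEventLogHealthMetricsBridge is not shown in this diff; MarkerGuardSketch,
// RegisteredMarker, and AddOnceSketch are hypothetical names.
internal static class MarkerGuardSketch
{
    // Registered by type, so its descriptor always has ServiceType set, unlike a
    // factory-lambda hosted-service registration whose ImplementationType is null.
    private sealed class RegisteredMarker { }

    public static IServiceCollection AddOnceSketch(
        this IServiceCollection services,
        Action<IServiceCollection> register)
    {
        // A second call finds the marker and returns without registering again.
        if (services.Any(d => d.ServiceType == typeof(RegisteredMarker)))
        {
            return services;
        }

        services.AddSingleton<RegisteredMarker>();
        register(services);
        return services;
    }
}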
@@ -0,0 +1,77 @@
using Microsoft.Extensions.Logging.Abstractions;
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) — unit tests for <see cref="SiteEventLogFailureCountReporter"/>.
/// Verifies that the poller reads the count provided by the
/// <see cref="Func{TResult}"/> delegate and pushes it into
/// <see cref="ISiteHealthCollector.SetSiteEventLogWriteFailures"/>.
/// </summary>
public class SiteEventLogFailureCountReporterTests
{
[Fact]
public async Task StartAsync_ImmediatelyProbes_FailedWriteCount()
{
// Arrange
var count = 99L;
var collector = new SiteHealthCollector();
using var reporter = new SiteEventLogFailureCountReporter(
failedWriteCountProvider: () => count,
collector: collector,
logger: NullLogger<SiteEventLogFailureCountReporter>.Instance,
refreshInterval: TimeSpan.FromHours(1)); // long interval — only immediate tick matters
// Act
await reporter.StartAsync(CancellationToken.None);
// Give the background Task a moment to execute its synchronous immediate probe.
var deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures == 0L
&& DateTime.UtcNow < deadline)
{
await Task.Delay(10);
}
// Assert — the immediate probe before the first Delay must have fired.
var report = collector.CollectReport("site-1");
Assert.Equal(99L, report.SiteEventLogWriteFailures);
await reporter.StopAsync(CancellationToken.None);
}
[Fact]
public async Task StartAsync_PushesLatestCount_OnEachTick()
{
// Arrange — start with count 5; advance to 12 after the first tick.
var count = 5L;
var collector = new SiteHealthCollector();
using var reporter = new SiteEventLogFailureCountReporter(
failedWriteCountProvider: () => count,
collector: collector,
logger: NullLogger<SiteEventLogFailureCountReporter>.Instance,
refreshInterval: TimeSpan.FromMilliseconds(50));
await reporter.StartAsync(CancellationToken.None);
// Wait for immediate probe.
var deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures != 5L
&& DateTime.UtcNow < deadline)
await Task.Delay(10);
Assert.Equal(5L, collector.CollectReport("site-1").SiteEventLogWriteFailures);
// Advance the counter and wait for the next tick to push the new value.
count = 12L;
deadline = DateTime.UtcNow.AddSeconds(5);
while (collector.CollectReport("probe").SiteEventLogWriteFailures != 12L
&& DateTime.UtcNow < deadline)
await Task.Delay(10);
Assert.Equal(12L, collector.CollectReport("site-1").SiteEventLogWriteFailures);
await reporter.StopAsync(CancellationToken.None);
}
}
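// Illustrative sketch only, not part of this commit: the probe-then-delay loop that
// the two tests above rely on. The production SiteEventLogFailureCountReporter is
// not shown in this diff; PollLoopSketch and its parameter names are assumptions.
internal static class PollLoopSketch
{
    public static async Task RunAsync(
        Func<long> readCount,
        Action<long> push,
        TimeSpan interval,
        CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Probe before waiting, so the first value lands without a full interval delay.
            push(readCount());

            try
            {
                await Task.Delay(interval, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                return; // stop requested while waiting
            }
        }
    }
}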
@@ -0,0 +1,62 @@
namespace ZB.MOM.WW.ScadaBridge.HealthMonitoring.Tests;
/// <summary>
/// M2.16 (#30) regression coverage. <see cref="ISiteEventLogger.FailedWriteCount"/>
/// is a cumulative (point-in-time) counter. A periodic
/// <c>SiteEventLogFailureCountReporter</c> hosted service polls the count and
/// pushes it into the collector via
/// <see cref="ISiteHealthCollector.SetSiteEventLogWriteFailures"/> so the next
/// <see cref="ISiteHealthCollector.CollectReport"/> includes it in the report
/// payload as <c>SiteEventLogWriteFailures</c>. Unlike the per-interval
/// SiteAuditWriteFailures counter, this value is NOT reset on collect — it
/// carries forward whatever the most recent poller push delivered.
/// </summary>
public class SiteEventLogWriteFailuresMetricTests
{
private readonly SiteHealthCollector _collector = new();
[Fact]
public void Set_Then_CollectReport_IncludesCount()
{
_collector.SetSiteEventLogWriteFailures(17L);
var report = _collector.CollectReport("site-1");
Assert.Equal(17L, report.SiteEventLogWriteFailures);
}
[Fact]
public void Report_Payload_Includes_SiteEventLogWriteFailures_AsZeroByDefault()
{
var report = _collector.CollectReport("site-1");
Assert.Equal(0L, report.SiteEventLogWriteFailures);
}
[Fact]
public void CollectReport_DoesNotReset_SiteEventLogWriteFailures()
{
// This is a point-in-time cumulative count — successive CollectReport
// calls before the next poller tick MUST carry forward the same value
// rather than resetting to zero (which would falsely indicate no failures
// between the two reports).
_collector.SetSiteEventLogWriteFailures(42L);
var first = _collector.CollectReport("site-1");
var second = _collector.CollectReport("site-1");
Assert.Equal(42L, first.SiteEventLogWriteFailures);
Assert.Equal(42L, second.SiteEventLogWriteFailures);
}
[Fact]
public void Set_Overwrites_Previous_Value()
{
_collector.SetSiteEventLogWriteFailures(5L);
_collector.SetSiteEventLogWriteFailures(9L);
var report = _collector.CollectReport("site-1");
Assert.Equal(9L, report.SiteEventLogWriteFailures);
}
}
@@ -11,6 +11,7 @@
<ItemGroup>
<PackageReference Include="coverlet.collector" />
<PackageReference Include="Microsoft.Data.Sqlite" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
<PackageReference Include="Microsoft.Extensions.Options" />
<PackageReference Include="Microsoft.NET.Test.Sdk" />
@@ -35,6 +35,11 @@ public class CentralActorPathTests : IAsyncLifetime
// env var is visible to StartupValidator.Validate() at Program.cs line 42.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the reverted Host-008 Require (REQ-HOST-3/4, M2.9 #17)
// passes for Central-role StartupValidator. A non-empty placeholder satisfies
// the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder =>
@@ -94,6 +99,7 @@ public class CentralActorPathTests : IAsyncLifetime
_factory?.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
await Task.CompletedTask;
}
@@ -101,6 +101,11 @@ public class CentralAuditWiringTests : IDisposable
// runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the reverted Host-008 Require (REQ-HOST-3/4, M2.9 #17)
// passes for Central-role StartupValidator. A non-empty placeholder satisfies
// the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder =>
@@ -156,6 +161,7 @@ public class CentralAuditWiringTests : IDisposable
_factory.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
}
[Fact]
@@ -10,8 +10,12 @@ namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
///
/// Also supplies <c>ScadaBridge__InboundApi__ApiKeyPepper</c> so the Central-role
/// StartupValidator preflight (added in 1fcc4f5) does not fail for tests that set
/// <c>DOTNET_ENVIRONMENT=Central</c> without an explicit pepper env var. Both vars
/// are restored on Dispose so tests stay isolated.
/// <c>DOTNET_ENVIRONMENT=Central</c> without an explicit pepper env var.
///
/// Also supplies <c>ScadaBridge__Database__MachineDataDb</c> so the Central-role
/// StartupValidator preflight (reverts Host-008, REQ-HOST-3/4, M2.9 #17) does not
/// fail for tests that set <c>DOTNET_ENVIRONMENT=Central</c> without an explicit
/// MachineDataDb env var. All vars are restored on Dispose so tests stay isolated.
/// </summary>
internal sealed class CentralDbTestEnvironment : IDisposable
{
@@ -22,6 +26,11 @@ internal sealed class CentralDbTestEnvironment : IDisposable
private const string ConfigKey = "ScadaBridge__Database__ConfigurationDb";
private const string MachineDataDb =
"Server=localhost,1433;Database=ScadaBridgeMachineData;User Id=scadabridge_app;Password=ScadaBridge_Dev1#;TrustServerCertificate=true";
private const string MachineDataKey = "ScadaBridge__Database__MachineDataDb";
// Test-only pepper — satisfies the ≥16-char StartupValidator requirement without
// committing a real secret. The env-var name uses the double-underscore delimiter
// so AddEnvironmentVariables() maps it to ScadaBridge:InboundApi:ApiKeyPepper.
@@ -29,6 +38,7 @@ internal sealed class CentralDbTestEnvironment : IDisposable
private const string PepperKey = "ScadaBridge__InboundApi__ApiKeyPepper";
private readonly string? _previousConfig;
private readonly string? _previousMachineData;
private readonly string? _previousPepper;
public CentralDbTestEnvironment()
@@ -36,6 +46,9 @@ internal sealed class CentralDbTestEnvironment : IDisposable
_previousConfig = Environment.GetEnvironmentVariable(ConfigKey);
Environment.SetEnvironmentVariable(ConfigKey, ConfigurationDb);
_previousMachineData = Environment.GetEnvironmentVariable(MachineDataKey);
Environment.SetEnvironmentVariable(MachineDataKey, MachineDataDb);
_previousPepper = Environment.GetEnvironmentVariable(PepperKey);
Environment.SetEnvironmentVariable(PepperKey, TestPepper);
}
@@ -43,6 +56,7 @@ internal sealed class CentralDbTestEnvironment : IDisposable
public void Dispose()
{
Environment.SetEnvironmentVariable(ConfigKey, _previousConfig);
Environment.SetEnvironmentVariable(MachineDataKey, _previousMachineData);
Environment.SetEnvironmentVariable(PepperKey, _previousPepper);
}
}
@@ -95,6 +95,11 @@ public class CentralCompositionRootTests : IDisposable
// runs before WithWebHostBuilder.ConfigureAppConfiguration applies DI config.
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper",
CentralDbTestEnvironment.TestPepper);
// Supply MachineDataDb so the reverted Host-008 Require (REQ-HOST-3/4, M2.9 #17)
// passes for Central-role StartupValidator. A non-empty placeholder satisfies
// the preflight; the DI override below replaces the real DbContext anyway.
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb",
"Server=localhost;Database=MachineData;");
_factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder =>
@@ -159,6 +164,7 @@ public class CentralCompositionRootTests : IDisposable
_factory.Dispose();
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
Environment.SetEnvironmentVariable("ScadaBridge__InboundApi__ApiKeyPepper", null);
Environment.SetEnvironmentVariable("ScadaBridge__Database__MachineDataDb", null);
}
// --- Singletons ---
@@ -399,6 +405,9 @@ public class SiteCompositionRootTests : IDisposable
new object[] { typeof(IEventLogQueryService) },
new object[] { typeof(ISiteIdentityProvider) },
new object[] { typeof(IHealthReportTransport) },
// M2.15 (#29): the active-node purge gate must be registered on site nodes
// so EventLogPurge only runs on the active node.
new object[] { typeof(SiteEventLogActiveNodeCheck) },
};
// --- Scoped services ---
@@ -158,6 +158,15 @@ public class HealthCheckTests : IDisposable
Assert.Contains(ZbHealthTags.Ready, registrations["database"].Tags);
Assert.Contains(ZbHealthTags.Ready, registrations["akka-cluster"].Tags);
// M2.14 (#28): readiness ALSO reflects "required cluster singletons running"
// (REQ-HOST-4a). The Central-only required-singletons check is Ready-tagged so
// it gates /health/ready alongside database + akka-cluster, but is leadership-
// agnostic (it does NOT carry the Active tag), so a ready standby stays ready.
Assert.True(registrations.ContainsKey("required-singletons"),
"Expected a 'required-singletons' health check.");
Assert.Contains(ZbHealthTags.Ready, registrations["required-singletons"].Tags);
Assert.DoesNotContain(ZbHealthTags.Active, registrations["required-singletons"].Tags);
// The leader-only active-node check must NOT be on the readiness tier.
Assert.DoesNotContain(ZbHealthTags.Ready, registrations["active-node"].Tags);
}
@@ -0,0 +1,143 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging.Abstractions;
using ZB.MOM.WW.ScadaBridge.Host.Health;
namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
/// <summary>
/// M2.14 (#28): unit tests for <see cref="RequiredSingletonsHealthCheck"/>.
///
/// The check probes each required central singleton through its local
/// <c>ClusterSingletonProxy</c> by Asking an <see cref="Identify"/> with a short
/// bounded timeout and treating a non-null <see cref="ActorIdentity.Subject"/> as
/// "reachable". These tests exercise that probe logic directly against a TestKit
/// <see cref="ActorSystem"/>:
/// <list type="bullet">
/// <item>present + reachable proxy paths (live echo actors) → Healthy;</item>
/// <item>a missing proxy path (ActorSelection resolves a null Subject) → Unhealthy
/// naming the unreachable singleton.</item>
/// </list>
/// No WebApplicationFactory / DB / formed cluster is needed — the probe is just an
/// in-process Identify round-trip, so the tests are deterministic and fast.
/// </summary>
public class RequiredSingletonsHealthCheckTests : TestKit
{
/// <summary>A minimal live actor that does nothing — its mere existence makes
/// an <see cref="Identify"/> resolve a non-null Subject (i.e. "reachable").</summary>
/// <remarks>No <c>Receive&lt;Identify&gt;</c> handler is needed: Akka.NET treats
/// <see cref="Identify"/> as an auto-received message and replies with an
/// <see cref="ActorIdentity"/> without invoking the actor's own receive handlers,
/// so an empty actor at the proxy path is sufficient to simulate a reachable singleton.</remarks>
private sealed class EchoActor : ReceiveActor
{
}
private static IServiceProvider ProviderReturning(ActorSystem system)
{
var services = new ServiceCollection();
services.AddSingleton(system);
return services.BuildServiceProvider();
}
private static async Task<HealthCheckResult> RunAsync(RequiredSingletonsHealthCheck check)
{
var context = new HealthCheckContext
{
Registration = new HealthCheckRegistration(
"required-singletons", check, failureStatus: null, tags: null),
};
return await check.CheckHealthAsync(context, CancellationToken.None);
}
[Fact]
public async Task AllRequiredSingletonProxiesReachable_ReportsHealthy()
{
// Create a live actor at every required proxy path so each Identify resolves
// a non-null Subject.
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Healthy, result.Status);
}
[Fact]
public async Task OneRequiredSingletonUnreachable_ReportsUnhealthyNamingIt()
{
// Create all but one proxy. The missing one's ActorSelection resolves an
// ActorIdentity with a null Subject within the bounded timeout → unreachable.
var missing = RequiredSingletonsHealthCheck.RequiredSingletonProxyNames[0];
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
if (name == missing)
continue;
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
Assert.NotNull(result.Description);
Assert.Contains(missing, result.Description!);
}
[Fact]
public async Task ActorSystemNotYetAvailable_ReportsUnhealthy_DoesNotThrow()
{
// Startup race: ActorSystem not yet bridged into DI. The check must map this
// to Unhealthy (the node is not ready to serve) rather than throwing.
var emptyProvider = new ServiceCollection().BuildServiceProvider();
var check = new RequiredSingletonsHealthCheck(
emptyProvider,
NullLogger<RequiredSingletonsHealthCheck>.Instance);
var result = await RunAsync(check);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
[Fact]
public async Task PreCancelledToken_ReportsUnhealthy_DoesNotThrow()
{
// Shutdown-race path: CheckHealthAsync is called with an already-cancelled
// token (e.g. host is tearing down). The check must never throw — any
// OperationCanceledException from Ask must be caught and mapped to Unhealthy.
foreach (var name in RequiredSingletonsHealthCheck.RequiredSingletonProxyNames)
{
Sys.ActorOf(Props.Create(() => new EchoActor()), name);
}
var check = new RequiredSingletonsHealthCheck(
ProviderReturning(Sys),
NullLogger<RequiredSingletonsHealthCheck>.Instance);
using var cts = new CancellationTokenSource();
cts.Cancel(); // already cancelled before the check runs
var context = new HealthCheckContext
{
Registration = new HealthCheckRegistration(
"required-singletons", check, failureStatus: null, tags: null),
};
// Must not throw; an already-cancelled token → all probes fail → Unhealthy.
var result = await check.CheckHealthAsync(context, cts.Token);
Assert.Equal(HealthStatus.Unhealthy, result.Status);
}
}
@@ -20,6 +20,7 @@ public class StartupValidatorTests
["ScadaBridge:Node:NodeHostname"] = "central-node1",
["ScadaBridge:Node:RemotingPort"] = "8081",
["ScadaBridge:Database:ConfigurationDb"] = "Server=localhost;Database=Config;",
["ScadaBridge:Database:MachineDataDb"] = "Server=localhost;Database=MachineData;",
["ScadaBridge:Security:Ldap:Server"] = "ldap.example.com",
["ScadaBridge:Security:JwtSigningKey"] = "test-signing-key-at-least-32-chars-long",
["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@central-node1:8081",
@@ -152,17 +153,19 @@ public class StartupValidatorTests
}
[Fact]
public void Central_MissingMachineDataDb_PassesValidation()
public void Central_MissingMachineDataDb_FailsValidation()
{
// Host-008 regression: MachineDataDb is never consumed anywhere in the
// system (only ConfigurationDb is wired into AddConfigurationDatabase).
// It is no longer a required key, so its absence must not fail startup.
// Reverts Host-008. REQ-HOST-3/REQ-HOST-4 require MachineDataDb to be
// validated at startup for Central nodes, and the shipped docker appsettings
// (docker/central-node-a/appsettings.Central.json and central-node-b) carry
// the key. The prior Host-008 decision (which removed the Require) is reversed
// here (#17, M2.9): a missing MachineDataDb must fail fast with a clear error.
var values = ValidCentralConfig();
values.Remove("ScadaBridge:Database:MachineDataDb");
var config = BuildConfig(values);
var ex = Record.Exception(() => StartupValidator.Validate(config));
Assert.Null(ex);
var ex = Assert.Throws<InvalidOperationException>(() => StartupValidator.Validate(config));
Assert.Contains("MachineDataDb connection string required for Central", ex.Message);
}
[Fact]
@@ -1,12 +1,30 @@
using System.Text.Json;
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;
/// <summary>
/// WP-2: Tests for parameter validation — type checking, required fields, extended type system.
/// WP-2 / InboundAPI-M2.6: tests for parameter validation — type checking,
/// required fields, the extended type system, and RECURSIVE (nested Object /
/// List element) type validation with path-qualified errors.
///
/// <para>
/// Definitions are expressed as JSON Schema (the canonical persisted format
/// produced by the Central UI / migration). The validator also accepts the
/// legacy flat-array form; that backward-compat path is covered by the final
/// region.
/// </para>
/// </summary>
public class ParameterValidatorTests
{
private static JsonElement Body(string json)
{
using var doc = JsonDocument.Parse(json);
return doc.RootElement.Clone();
}
// ── No / empty definitions ────────────────────────────────────────────────
[Fact]
public void NoDefinitions_NoBody_ReturnsValid()
{
@@ -16,21 +34,27 @@ public class ParameterValidatorTests
}
[Fact]
public void EmptyDefinitions_ReturnsValid()
public void EmptyObjectSchema_ReturnsValid()
{
var result = ParameterValidator.Validate(null, """{"type":"object","properties":{}}""");
Assert.True(result.IsValid);
}
[Fact]
public void EmptyLegacyArray_ReturnsValid()
{
var result = ParameterValidator.Validate(null, "[]");
Assert.True(result.IsValid);
}
// ── Required / body shape ──────────────────────────────────────────────────
[Fact]
public void RequiredParameterMissing_ReturnsInvalid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "value", Type = "Integer", Required = true }
});
const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
var result = ParameterValidator.Validate(null, definitions);
var result = ParameterValidator.Validate(null, def);
Assert.False(result.IsValid);
Assert.Contains("Missing required parameter", result.ErrorMessage);
}
@@ -38,136 +62,379 @@ public class ParameterValidatorTests
[Fact]
public void BodyNotObject_ReturnsInvalid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "value", Type = "String", Required = true }
});
const string def = """{"type":"object","properties":{"value":{"type":"string"}},"required":["value"]}""";
using var doc = JsonDocument.Parse("\"just a string\"");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
var result = ParameterValidator.Validate(Body("\"just a string\""), def);
Assert.False(result.IsValid);
Assert.Contains("must be a JSON object", result.ErrorMessage);
}
[Theory]
[InlineData("Boolean", "true", true)]
[InlineData("Integer", "42", (long)42)]
[InlineData("Float", "3.14", 3.14)]
[InlineData("String", "\"hello\"", "hello")]
public void ValidTypeCoercion_Succeeds(string type, string jsonValue, object expected)
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "val", Type = type, Required = true }
});
using var doc = JsonDocument.Parse($"{{\"val\": {jsonValue}}}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.Equal(expected, result.Parameters["val"]);
}
[Fact]
public void WrongType_ReturnsInvalid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "count", Type = "Integer", Required = true }
});
using var doc = JsonDocument.Parse("{\"count\": \"not a number\"}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.False(result.IsValid);
Assert.Contains("must be an Integer", result.ErrorMessage);
}
[Fact]
public void ObjectType_Parsed()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "data", Type = "Object", Required = true }
});
using var doc = JsonDocument.Parse("{\"data\": {\"key\": \"value\"}}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.IsType<Dictionary<string, object?>>(result.Parameters["data"]);
}
[Fact]
public void ListType_Parsed()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "items", Type = "List", Required = true }
});
using var doc = JsonDocument.Parse("{\"items\": [1, 2, 3]}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
Assert.True(result.IsValid);
Assert.IsType<List<object?>>(result.Parameters["items"]);
}
[Fact]
public void OptionalParameter_MissingBody_ReturnsValid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "optional", Type = "String", Required = false }
});
const string def = """{"type":"object","properties":{"optional":{"type":"string"}}}""";
var result = ParameterValidator.Validate(null, definitions);
var result = ParameterValidator.Validate(null, def);
Assert.True(result.IsValid);
}
// ── Scalar coercion ────────────────────────────────────────────────────────
[Theory]
[InlineData("boolean", "true", true)]
[InlineData("integer", "42", (long)42)]
[InlineData("number", "3.14", 3.14)]
[InlineData("string", "\"hello\"", "hello")]
public void ValidTypeCoercion_Succeeds(string type, string jsonValue, object expected)
{
var def = "{\"type\":\"object\",\"properties\":{\"val\":{\"type\":\"" + type + "\"}},\"required\":[\"val\"]}";
var result = ParameterValidator.Validate(Body($"{{\"val\": {jsonValue}}}"), def);
Assert.True(result.IsValid);
Assert.Equal(expected, result.Parameters["val"]);
}
[Fact]
public void WrongScalarType_ReturnsInvalid()
{
const string def = """{"type":"object","properties":{"count":{"type":"integer"}},"required":["count"]}""";
var result = ParameterValidator.Validate(Body("{\"count\": \"not a number\"}"), def);
Assert.False(result.IsValid);
Assert.Contains("'count'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact]
public void UnknownType_ReturnsInvalid()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "val", Type = "CustomType", Required = true }
});
const string def = """{"type":"object","properties":{"val":{"type":"customtype"}},"required":["val"]}""";
using var doc = JsonDocument.Parse("{\"val\": \"test\"}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
var result = ParameterValidator.Validate(Body("{\"val\": \"test\"}"), def);
Assert.False(result.IsValid);
Assert.Contains("Unknown parameter type", result.ErrorMessage);
Assert.Contains("unknown declared type", result.ErrorMessage);
}
// --- InboundAPI-010: unexpected top-level body fields must be reported so
// callers get feedback on typo'd parameter names instead of silent ignore. ---
// ── Object / List shape + materialization ──────────────────────────────────
[Fact]
public void UnexpectedBodyField_ReturnsInvalid()
public void ObjectType_NoDeclaredFields_ShapeOnly_Materialized()
{
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "value", Type = "Integer", Required = true }
});
const string def = """{"type":"object","properties":{"data":{"type":"object"}},"required":["data"]}""";
var result = ParameterValidator.Validate(Body("{\"data\": {\"key\": \"value\"}}"), def);
Assert.True(result.IsValid);
Assert.IsType<Dictionary<string, object?>>(result.Parameters["data"]);
}
[Fact]
public void ListType_NoDeclaredElement_ShapeOnly_Materialized()
{
const string def = """{"type":"object","properties":{"items":{"type":"array"}},"required":["items"]}""";
var result = ParameterValidator.Validate(Body("{\"items\": [1, 2, 3]}"), def);
Assert.True(result.IsValid);
Assert.IsType<List<object?>>(result.Parameters["items"]);
}
// ── Undeclared / unexpected fields (rejected, recursively) ─────────────────
[Fact]
public void UnexpectedTopLevelField_ReturnsInvalid()
{
const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
// "valeu" is a typo for "value"; the caller must be told, not ignored.
using var doc = JsonDocument.Parse("{\"value\": 1, \"valeu\": 2}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
var result = ParameterValidator.Validate(Body("{\"value\": 1, \"valeu\": 2}"), def);
Assert.False(result.IsValid);
Assert.Contains("valeu", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
}
[Fact]
public void OnlyDefinedFields_StillValid()
public void OnlyDeclaredFields_StillValid()
{
// Regression guard: a body containing exactly the defined parameters
// must continue to validate.
var definitions = JsonSerializer.Serialize(new[]
{
new { Name = "value", Type = "Integer", Required = true }
});
const string def = """{"type":"object","properties":{"value":{"type":"integer"}},"required":["value"]}""";
using var doc = JsonDocument.Parse("{\"value\": 1}");
var result = ParameterValidator.Validate(doc.RootElement.Clone(), definitions);
var result = ParameterValidator.Validate(Body("{\"value\": 1}"), def);
Assert.True(result.IsValid);
Assert.Equal((long)1, result.Parameters["value"]);
}
[Fact]
public void UndeclaredNestedField_ReturnsInvalid_PathQualified()
{
const string def = """
{"type":"object","properties":{
"order":{"type":"object","properties":{"id":{"type":"integer"}},"required":["id"]}
},"required":["order"]}
""";
var result = ParameterValidator.Validate(
Body("""{"order":{"id":1,"bogus":2}}"""), def);
Assert.False(result.IsValid);
Assert.Contains("order.bogus", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
}
// ── Nested validation: the M2.6 core ───────────────────────────────────────
private const string NestedDef = """
{
"type":"object",
"properties":{
"order":{
"type":"object",
"properties":{
"id":{"type":"integer"},
"customer":{
"type":"object",
"properties":{"name":{"type":"string"},"vip":{"type":"boolean"}},
"required":["name"]
},
"items":{
"type":"array",
"items":{
"type":"object",
"properties":{"sku":{"type":"string"},"quantity":{"type":"integer"}},
"required":["sku","quantity"]
}
}
},
"required":["id","customer","items"]
}
},
"required":["order"]
}
""";
[Fact]
public void ValidNestedPayload_Passes()
{
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":true},
"items":[
{"sku":"A1","quantity":3},
{"sku":"B2","quantity":1}
]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void WrongScalarTwoLevelsDeep_ReturnsInvalid_WithExactPath()
{
// order.customer.vip declared boolean, given a string.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":"yes"},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.customer.vip'", result.ErrorMessage);
Assert.Contains("Boolean", result.ErrorMessage);
}
[Fact]
public void WrongScalarInsideListElement_ReturnsInvalid_WithElementIndexInPath()
{
// order.items[1].quantity declared integer, given a string.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme"},
"items":[
{"sku":"A1","quantity":3},
{"sku":"B2","quantity":"lots"}
]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.items[1].quantity'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact]
public void ListElementWrongShape_ReturnsInvalid_WithElementIndexInPath()
{
// order.items[0] declared object, given a scalar.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme"},
"items":[ 42 ]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("'order.items[0]'", result.ErrorMessage);
Assert.Contains("Object", result.ErrorMessage);
}
[Fact]
public void MissingRequiredNestedField_ReturnsInvalid_PathQualified()
{
// order.customer.name is required but absent.
const string body = """
{"order":{
"id":7,
"customer":{"vip":false},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.False(result.IsValid);
Assert.Contains("missing required field", result.ErrorMessage);
Assert.Contains("'order.customer.name'", result.ErrorMessage);
}
// ── Empty / null edge cases ────────────────────────────────────────────────
[Fact]
public void EmptyList_AgainstTypedElement_Passes()
{
const string body = """
{"order":{"id":7,"customer":{"name":"Acme"},"items":[]}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void NullForOptionalNestedScalar_Passes()
{
// order.customer.vip is optional; explicit null is accepted.
const string body = """
{"order":{
"id":7,
"customer":{"name":"Acme","vip":null},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
[Fact]
public void NullForRequiredNestedScalar_Passes()
{
// A PRESENT-but-null required field satisfies the type — only ABSENCE
// of a required field is an error (consistent with return-side policy).
const string body = """
{"order":{
"id":null,
"customer":{"name":"Acme"},
"items":[]
}}
""";
var result = ParameterValidator.Validate(Body(body), NestedDef);
Assert.True(result.IsValid);
}
// ── Legacy flat-array backward-compat ──────────────────────────────────────
[Fact]
public void LegacyFlatArrayDefinition_StillAccepted()
{
const string def = """[{"name":"count","type":"Integer","required":true}]""";
var ok = ParameterValidator.Validate(Body("{\"count\":5}"), def);
Assert.True(ok.IsValid);
Assert.Equal((long)5, ok.Parameters["count"]);
var bad = ParameterValidator.Validate(Body("{\"count\":\"nope\"}"), def);
Assert.False(bad.IsValid);
Assert.Contains("'count'", bad.ErrorMessage);
}
// FIX 1: legacy "required":"false" string → field is optional ─────────────
[Theory]
[InlineData("""[{"name":"opt","type":"String","required":"false"}]""")]
[InlineData("""[{"name":"opt","type":"String","required":"False"}]""")]
[InlineData("""[{"name":"opt","type":"String","required":"FALSE"}]""")]
public void LegacyFlatArray_RequiredStringFalse_FieldIsOptional(string def)
{
// An absent field whose "required" is the string "false" (any case)
// must be treated as optional — consistent with the SQL migration's
// LOWER(...) <> 'false' comparison that produced these rows.
var result = ParameterValidator.Validate(null, def);
Assert.True(result.IsValid, $"Expected optional field to be valid when absent; error: {result.ErrorMessage}");
}
[Fact]
public void LegacyFlatArray_RequiredStringFalse_FieldPresentAndTypedCorrectly_Passes()
{
const string def = """[{"name":"opt","type":"String","required":"false"}]""";
var result = ParameterValidator.Validate(Body("{\"opt\":\"hello\"}"), def);
Assert.True(result.IsValid);
}
// FIX 2: recursion depth guard on Parse ───────────────────────────────────
/// <summary>
/// Builds a JSON Schema string with <paramref name="depth"/> levels of
/// object-in-properties nesting. Each level wraps the previous schema in an
/// object with a single property "a". The result exceeds the Parse ceiling when
/// depth &gt; 32.
/// </summary>
private static string BuildDeeplyNestedSchema(int depth)
{
// Inner-most: a scalar
var schema = "{\"type\":\"string\"}";
for (var i = 0; i < depth; i++)
{
schema = "{\"type\":\"object\",\"properties\":{\"a\":" + schema + "}}";
}
return schema;
}
[Fact]
public void SchemaAtDepthCeiling_ParsesSuccessfully()
{
// Exactly 32 levels of nesting should parse without throwing.
var def = BuildDeeplyNestedSchema(32);
var schema = InboundApiSchema.Parse(def);
Assert.NotNull(schema);
}
[Fact]
public void SchemaExceedingDepthCeiling_ThrowsJsonException_NotStackOverflow()
{
// 33 levels exceeds the ceiling → JsonException (clean 400 via the
// caller's try/catch), NOT a StackOverflowException.
var def = BuildDeeplyNestedSchema(33);
Assert.Throws<System.Text.Json.JsonException>(() => InboundApiSchema.Parse(def));
}
[Fact]
public void SchemaExceedingDepthCeiling_ParameterValidator_ReturnsInvalid()
{
// End-to-end: ParameterValidator wraps Parse in try/catch(JsonException)
// → the caller gets Invalid rather than an unhandled exception.
var def = BuildDeeplyNestedSchema(33);
var result = ParameterValidator.Validate(Body("{\"a\":\"x\"}"), def);
Assert.False(result.IsValid);
Assert.Contains("Invalid parameter definitions", result.ErrorMessage);
}
}
@@ -1,13 +1,21 @@
using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi;
namespace ZB.MOM.WW.ScadaBridge.InboundAPI.Tests;
/// <summary>
/// InboundAPI-014: tests for return-value validation against a method's
/// <c>ReturnDefinition</c>. Previously the script's return value was serialized
/// verbatim with no checking against the declared return structure.
/// InboundAPI-014 / InboundAPI-M2.6: tests for return-value validation against a
/// method's <c>ReturnDefinition</c>. Mirrors <see cref="ParameterValidatorTests"/>
/// (shared recursive engine) — RECURSIVE nested Object / List-element type
/// validation with path-qualified errors.
///
/// <para>
/// Definitions are expressed as JSON Schema (the canonical persisted format);
/// the legacy flat-array form is still accepted (final region).
/// </para>
/// </summary>
public class ReturnValueValidatorTests
{
// --- No definition → no validation (backward compatible) ---
// ── No definition → no validation (backward compatible) ───────────────────
[Theory]
[InlineData(null)]
@@ -26,12 +34,17 @@ public class ReturnValueValidatorTests
Assert.True(result.IsValid);
}
// --- Happy path: result matches the declared field shape ---
// ── Happy path: result matches the declared object shape ──────────────────
[Fact]
public void ResultMatchingDefinition_IsValid()
{
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]""";
const string def = """
{"type":"object","properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"}
},"required":["siteName","totalUnits"]}
""";
const string json = """{"siteName":"Site Alpha","totalUnits":14250}""";
var result = ReturnValueValidator.Validate(json, def);
@@ -40,22 +53,31 @@ public class ReturnValueValidatorTests
}
[Fact]
public void ResultWithListField_ShapeChecked_IsValid()
public void ResultWithListOfScalars_TypeChecked_IsValid()
{
const string def = """[{"name":"lines","type":"List"}]""";
const string json = """{"lines":[{"lineName":"Line-1","units":8200}]}""";
const string def = """
{"type":"object","properties":{
"codes":{"type":"array","items":{"type":"integer"}}
}}
""";
const string json = """{"codes":[1,2,3]}""";
var result = ReturnValueValidator.Validate(json, def);
Assert.True(result.IsValid);
}
// --- Mismatches must be reported ---
// ── Scalar / shape mismatches must be reported ────────────────────────────
[Fact]
public void ResultMissingDeclaredField_IsInvalid()
{
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]""";
const string def = """
{"type":"object","properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"}
},"required":["siteName","totalUnits"]}
""";
const string json = """{"siteName":"Site Alpha"}""";
var result = ReturnValueValidator.Validate(json, def);
@@ -67,7 +89,7 @@ public class ReturnValueValidatorTests
[Fact]
public void ResultFieldWrongType_IsInvalid()
{
const string def = """[{"name":"totalUnits","type":"Integer"}]""";
const string def = """{"type":"object","properties":{"totalUnits":{"type":"integer"}},"required":["totalUnits"]}""";
const string json = """{"totalUnits":"not-a-number"}""";
var result = ReturnValueValidator.Validate(json, def);
@@ -79,7 +101,7 @@ public class ReturnValueValidatorTests
[Fact]
public void NullResultWhenStructureRequired_IsInvalid()
{
const string def = """[{"name":"siteName","type":"String"}]""";
const string def = """{"type":"object","properties":{"siteName":{"type":"string"}},"required":["siteName"]}""";
var result = ReturnValueValidator.Validate(null, def);
@@ -89,7 +111,7 @@ public class ReturnValueValidatorTests
[Fact]
public void NonObjectResultWhenStructureRequired_IsInvalid()
{
const string def = """[{"name":"siteName","type":"String"}]""";
const string def = """{"type":"object","properties":{"siteName":{"type":"string"}},"required":["siteName"]}""";
var result = ReturnValueValidator.Validate("42", def);
@@ -99,7 +121,7 @@ public class ReturnValueValidatorTests
[Fact]
public void ListFieldGivenNonArray_IsInvalid()
{
const string def = """[{"name":"lines","type":"List"}]""";
const string def = """{"type":"object","properties":{"lines":{"type":"array","items":{"type":"object"}}}}""";
const string json = """{"lines":"not-a-list"}""";
var result = ReturnValueValidator.Validate(json, def);
@@ -115,4 +137,261 @@ public class ReturnValueValidatorTests
Assert.False(result.IsValid);
}
// ── Nested validation: the M2.6 core (production-report shape) ─────────────
private const string ReportDef = """
{
"type":"object",
"properties":{
"siteName":{"type":"string"},
"totalUnits":{"type":"integer"},
"lines":{
"type":"array",
"items":{
"type":"object",
"properties":{
"lineName":{"type":"string"},
"units":{"type":"integer"},
"efficiency":{"type":"number"}
},
"required":["lineName","units"]
}
}
},
"required":["siteName","totalUnits","lines"]
}
""";
[Fact]
public void ValidNestedReturn_Passes()
{
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[
{"lineName":"Line-1","units":8200,"efficiency":92.5},
{"lineName":"Line-2","units":6050,"efficiency":88.1}
]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
[Fact]
public void WrongScalarInsideListElement_IsInvalid_WithElementIndexInPath()
{
// lines[1].units declared integer, given a string.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[
{"lineName":"Line-1","units":8200},
{"lineName":"Line-2","units":"lots"}
]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[1].units'", result.ErrorMessage);
Assert.Contains("Integer", result.ErrorMessage);
}
[Fact]
public void WrongListElementType_IsInvalid_WithElementIndexInPath()
{
// lines[0] declared object, given a scalar.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ 7 ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[0]'", result.ErrorMessage);
Assert.Contains("Object", result.ErrorMessage);
}
[Fact]
public void MissingRequiredNestedField_IsInvalid_PathQualified()
{
// lines[0].units is required but absent.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ {"lineName":"Line-1"} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("missing required field", result.ErrorMessage);
Assert.Contains("'lines[0].units'", result.ErrorMessage);
}
[Fact]
public void UndeclaredNestedField_IsInvalid_PathQualified()
{
// lines[0].bogus is not declared on the line-item schema.
const string json = """
{
"siteName":"Site Alpha",
"totalUnits":14250,
"lines":[ {"lineName":"Line-1","units":1,"bogus":true} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.False(result.IsValid);
Assert.Contains("'lines[0].bogus'", result.ErrorMessage);
Assert.Contains("not a declared field", result.ErrorMessage);
}
// ── Empty / null edge cases ────────────────────────────────────────────────
[Fact]
public void EmptyListAgainstTypedElement_Passes()
{
const string json = """{"siteName":"S","totalUnits":0,"lines":[]}""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
[Fact]
public void EmptyObjectSchema_AnythingIsValid()
{
const string def = """{"type":"object","properties":{}}""";
var result = ReturnValueValidator.Validate("""{"whatever":1}""", def);
Assert.True(result.IsValid);
}
[Fact]
public void NullOptionalNestedScalar_Passes()
{
// lines[0].efficiency is optional; explicit null is accepted.
const string json = """
{
"siteName":"S",
"totalUnits":1,
"lines":[ {"lineName":"L1","units":1,"efficiency":null} ]
}
""";
var result = ReturnValueValidator.Validate(json, ReportDef);
Assert.True(result.IsValid);
}
// ── Legacy flat-array backward-compat ──────────────────────────────────────
[Fact]
public void LegacyFlatArrayDefinition_StillAccepted()
{
const string def = """[{"name":"siteName","type":"String"},{"name":"totalUnits","type":"Integer"}]""";
var ok = ReturnValueValidator.Validate("""{"siteName":"A","totalUnits":1}""", def);
Assert.True(ok.IsValid);
var bad = ReturnValueValidator.Validate("""{"siteName":"A","totalUnits":"x"}""", def);
Assert.False(bad.IsValid);
Assert.Contains("totalUnits", bad.ErrorMessage);
}
// FIX 3: scalar return schema validates scalar return values ──────────────
// (Guards the intentional ParameterValidator/ReturnValueValidator asymmetry:
// ReturnValueValidator must NOT short-circuit on non-object schema types.)
[Fact]
public void ScalarStringReturnSchema_ValidatesScalarStringReturn()
{
// A {"type":"string"} return schema must accept a bare JSON string.
var result = ReturnValueValidator.Validate("\"hello\"", """{"type":"string"}""");
Assert.True(result.IsValid);
}
[Fact]
public void ScalarIntegerReturnSchema_ValidatesScalarIntegerReturn()
{
var result = ReturnValueValidator.Validate("42", """{"type":"integer"}""");
Assert.True(result.IsValid);
}
[Fact]
public void ScalarStringReturnSchema_RejectsIntegerReturn()
{
var result = ReturnValueValidator.Validate("42", """{"type":"string"}""");
Assert.False(result.IsValid);
Assert.Contains("String", result.ErrorMessage);
}
[Fact]
public void ScalarBooleanReturnSchema_ValidatesBooleanReturn()
{
var result = ReturnValueValidator.Validate("true", """{"type":"boolean"}""");
Assert.True(result.IsValid);
}
// FIX 2: recursion depth guard on Validate ─────────────────────────────────
[Fact]
public void ValidateExceedingDepthCeiling_AddsDepthError_DoesNotThrow()
{
// Build a schema programmatically (bypassing Parse) with 34 levels of
// nesting to exceed the ceiling of 32. Validate must add an error and
// return, NOT stack overflow.
//
// Parse prevents creating a >32-level schema from stored JSON, but
// InboundApiSchema is a public type constructable in code, so Validate
// must guard independently.
var deepSchema = BuildProgrammaticSchema(34);
var json = BuildDeeplyNestedValue(34);
using var doc = System.Text.Json.JsonDocument.Parse(json);
var errors = new System.Collections.Generic.List<string>();
// Must not throw — adds a depth error to the list instead.
deepSchema.Validate(doc.RootElement, string.Empty, errors);
Assert.NotEmpty(errors);
Assert.Contains("nesting too deep", errors[0], StringComparison.OrdinalIgnoreCase);
}
/// <summary>
/// Constructs an <see cref="InboundApiSchema"/> with <paramref name="depth"/>
/// levels of object-nesting programmatically (bypassing <c>Parse</c>) to
/// exercise the Validate depth ceiling independently of the Parse ceiling.
/// </summary>
private static InboundApiSchema BuildProgrammaticSchema(int depth)
{
InboundApiSchema inner = new() { Type = "string" };
for (var i = 0; i < depth; i++)
{
inner = new InboundApiSchema
{
Type = "object",
Fields = [new InboundApiSchemaField("a", Required: false, inner)],
};
}
return inner;
}
private static string BuildDeeplyNestedValue(int depth)
{
var value = "\"leaf\"";
for (var i = 0; i < depth; i++)
{
value = "{\"a\":" + value + "}";
}
return value;
}
}

Some files were not shown because too many files have changed in this diff