Compare commits
28 Commits
phase-6-pl
...
phase-6-2-
| Author | SHA1 | Date | |
|---|---|---|---|
| | 3b8280f08a | | |
| | 70f3ec0092 | | |
| | 8efb99b6be | | |
| | f74e141e64 | | |
| | 40fb459040 | | |
| | 13a231b7ad | | |
| | 0fcdfc7546 | | |
| | 1650c6c550 | | |
| | f29043c66a | | |
| | a7f34a4301 | | |
| | cbcaf6593a | | |
| | 8d81715079 | | |
| | 854c3bcfec | | |
| | ff4a74a81f | | |
| | 9dd5e4e745 | | |
| | 6b3a67fd9e | | |
| | 1d9008e354 | | |
| | ef6b0bb8fc | | |
| | a06fcb16a2 | | |
| | d2f3a243cd | | |
| | 29bcaf277b | | |
| | b6d2803ff6 | | |
| | f3850f8914 | | |
| | 90f7792c92 | | |
| | c04b13f436 | | |
| | 6a30f3dde7 | | |
| | ba31f200f6 | | |
| | 81a1f7f0f6 | | |
@@ -1,6 +1,8 @@
 # Phase 6.1 — Resilience & Observability Runtime
 
-> **Status**: DRAFT — implementation plan for a cross-cutting phase that was never formalised. The v2 `plan.md` specifies Polly, Tier A/B/C protections, structured logging, and local-cache fallback by decision; none are wired end-to-end.
+> **Status**: **SHIPPED** 2026-04-19 — Streams A/B/C/D + E data layer merged to `v2` across PRs #78-82. Final exit-gate PR #83 turns the compliance script into real checks (all pass) and records this status update. One deferred piece: Stream E.2/E.3 SignalR hub + Blazor `/hosts` column refresh lands in a visual-compliance follow-up PR on the Phase 6.4 Admin UI branch.
+>
+> Baseline: 906 solution tests → post-Phase-6.1: 1042 passing (+136 net). One pre-existing Client.CLI Subscribe flake unchanged.
 >
 > **Branch**: `v2/phase-6-1-resilience-observability`
 > **Estimated duration**: 3 weeks
@@ -23,14 +25,18 @@ Closes these gaps flagged in the 2026-04-19 audit:
 
 | Concern | Change |
 |---------|--------|
-| `Core` → new `Core.Resilience` sub-namespace | Shared Polly pipeline builder (`DriverResiliencePipelines`), per-capability policy (Read / Write / Subscribe / HistoryRead / Discover / Probe / Alarm). One pipeline per driver instance; driver-options decide tuning. |
-| Every `IDriver*` consumer in the server | Wrap capability calls in the shared pipeline. Policy composition order: timeout → retry (with jitter, bounded by capability-specific `MaxRetries`) → circuit breaker (per driver instance, opens on N consecutive failures) → bulkhead (ceiling on in-flight requests per driver). |
-| `Core` → new `Core.Stability` sub-namespace | Generalise `MemoryWatchdog` (`Driver.Galaxy.Host`) into `DriverMemoryWatchdog` consuming `IDriver.GetMemoryFootprint()`. Add `ScheduledRecycleScheduler` (decision #67) for weekly/time-of-day recycle. Add `WedgeDetector` that flips a driver to Faulted when no successful Read in N × PublishingInterval. |
-| `DriverTypeRegistry` | Each driver type registers its `DriverTier` {A, B, C}. Tier C drivers must also advertise their out-of-process topology; the registry enforces invariants (Tier C has a `Proxy` + `Host` pair). |
-| `OtOpcUa.Server` → new Minimal API endpoints | `/healthz` (liveness — process alive + config DB reachable or LiteDB cache warm), `/readyz` (readiness — every driver instance reports `DriverState.Healthy`). JSON bodies cite individual driver health per instance. |
-| Serilog configuration | Centralize enrichers in `OtOpcUa.Server/Observability/LogContextEnricher.cs`. Every driver call runs inside a `LogContext.PushProperty` scope with {DriverInstanceId, DriverType, CapabilityName, CorrelationId (UA RequestHandle or internal GUID)}. Sink config stays rolling-file per CLAUDE.md; JSON-formatted output added alongside plain-text so SIEM ingestion works. |
-| `Configuration` project | Add `LiteDbConfigCache` adapter. Wrap EF Core queries in a Polly pipeline: timeout (2 s) → retry (3×, jittered) → fallback-to-cache. Cache refresh on successful DB query + after `sp_PublishGeneration`. Cache lives at `%ProgramData%/OtOpcUa/config-cache/<cluster-id>.db` per node. |
-| `DriverHostStatus` entity | Extend to carry `LastCircuitBreakerOpenUtc`, `ConsecutiveFailures`, `CurrentBulkheadDepth`, `LastRecycleUtc`. Admin `/hosts` page reads these. |
+| `Core` → new `Core.Resilience` sub-namespace | Shared Polly pipeline builder (`DriverResiliencePipelines`). **Pipeline key = `(DriverInstanceId, HostName)`** so one dead PLC behind a multi-device driver doesn't open the breaker for healthy siblings (decision #35 per-device isolation). **Per-capability policy** — Read / HistoryRead / Discover / Probe / Alarm get retries; **Write does NOT** unless `[WriteIdempotent]` on the tag definition (decisions #44-45). |
+| Every capability-interface consumer in the server | Wrap `IReadable.ReadAsync`, `IWritable.WriteAsync`, `ITagDiscovery.DiscoverAsync`, `ISubscribable.SubscribeAsync/UnsubscribeAsync`, `IHostConnectivityProbe` probe loop, `IAlarmSource.SubscribeAlarmsAsync/AcknowledgeAsync`, `IHistoryProvider.ReadRawAsync/ReadProcessedAsync/ReadAtTimeAsync/ReadEventsAsync`. Composition: timeout → (retry when capability supports) → circuit breaker → bulkhead. |
+| `Core.Abstractions` → new `WriteIdempotentAttribute` | Marker on `ModbusTagDefinition` / `S7TagDefinition` / `OpcUaClientDriver` tag rows; opts that tag into auto-retry on Write. Absence = no retry, per spec. |
+| `Core` → new `Core.Stability` sub-namespace — **split** | Two separate subsystems: (a) **`MemoryTracking`** runs all tiers; captures baseline (median of first 5 min `GetMemoryFootprint` samples) + applies the hybrid rule `soft = max(multiplier × baseline, baseline + floor)`; soft breach logs + surfaces to Admin; never kills. (b) **`MemoryRecycle`** (Tier C only — requires out-of-process topology) handles hard-breach recycle via the Proxy-side supervisor. Tier A/B overrun escalates to Tier C promotion ticket, not auto-kill. |
+| `ScheduledRecycleScheduler` | Tier C only per decisions #73-74. Weekly/time-of-day recycle via Proxy supervisor. Tier A/B opt-in recycle lands in a future phase together with a Tier-C-escalation workflow. |
+| `WedgeDetector` | **Demand-aware**: flips a driver to Faulted only when `(hasPendingWork AND noProgressIn > threshold)`. `hasPendingWork` derives from non-zero Polly bulkhead depth OR ≥1 active MonitoredItem OR ≥1 queued historian read. Idle + subscription-only drivers stay Healthy. |
+| `DriverTypeRegistry` | Each driver type registers its `DriverTier` {A, B, C}. Tier C drivers must advertise their out-of-process topology; the registry enforces invariants (Tier C has a `Proxy` + `Host` pair). |
+| `Driver.Galaxy.Proxy/Supervisor/` | **Retains** existing `CircuitBreaker` + `Backoff` — they guard IPC respawn (decision #68), different concern from the per-call Polly layer. Only `HeartbeatMonitor` is referenced downstream (IPC liveness). |
+| `OtOpcUa.Server` → Minimal API endpoints on `http://+:4841` | `/healthz` = process alive + (config DB reachable OR `UsingStaleConfig=true`). `/readyz` = ANDed driver health; state-machine per `DriverState`: `Unknown`/`Initializing` → 503, `Healthy` → 200, `Degraded` → 200 + `{degradedDrivers: [...]}` in body, `Faulted` → 503. JSON body always reports per-instance detail. |
+| Serilog configuration | Centralize enrichers in `OtOpcUa.Server/Observability/LogContextEnricher.cs`. Every capability call runs inside a `LogContext.PushProperty` scope with {DriverInstanceId, DriverType, CapabilityName, CorrelationId (UA RequestHandle or internal GUID)}. Sink config stays rolling-file per CLAUDE.md; JSON sink added alongside plain-text (switchable via `Serilog:WriteJson` appsetting). |
+| `Configuration` project | Add `LiteDbConfigCache` adapter. **Generation-sealed snapshots**: `sp_PublishGeneration` writes `<cache-root>/<cluster>/<generationId>.db` as a read-only sealed file. Reads serve the last-known-sealed generation; mixed-generation reads are impossible. Write path bypasses cache + fails hard on DB outage. Pipeline: timeout (2 s) → retry (3×, jittered) → fallback-to-sealed-snapshot. |
+| `DriverHostStatus` vs. `DriverInstanceResilienceStatus` | New separate entity `DriverInstanceResilienceStatus { DriverInstanceId, HostName, LastCircuitBreakerOpenUtc, ConsecutiveFailures, CurrentBulkheadDepth, LastRecycleUtc, BaselineFootprintBytes }`. `DriverHostStatus` keeps per-host connectivity only; Admin `/hosts` joins both for display. |
 
 ## Scope — What Does NOT Change
 
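The two rules the revised scope table leans on hardest, the `(DriverInstanceId, HostName)` pipeline key and the "Write is not retried unless `[WriteIdempotent]`" policy, can be illustrated with a small sketch. It is written in Python purely to show the logic and is not the .NET/Polly code; `retry_allowed`, `Breaker`, and `PipelineRegistry` are hypothetical names, and treating Subscribe as non-retryable is an assumption (the table does not list it in the retry set).

```python
from dataclasses import dataclass, field

# Capabilities the scope table lists as retryable. Subscribe is not listed,
# so this sketch assumes it is not retried.
RETRYABLE = {"Read", "HistoryRead", "Discover", "Probe", "Alarm"}


def retry_allowed(capability: str, write_idempotent: bool = False) -> bool:
    """Return True when a failed call of this capability may be retried."""
    if capability == "Write":
        # Writes are retried only when the tag opted in via [WriteIdempotent].
        return write_idempotent
    return capability in RETRYABLE


@dataclass
class Breaker:
    """Toy consecutive-failure breaker (hypothetical stand-in for the real one)."""
    threshold: int = 5
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures = 0 if ok else self.failures + 1


@dataclass
class PipelineRegistry:
    """Hands out one breaker per (driver_instance_id, host_name) key."""
    breakers: dict = field(default_factory=dict)

    def get(self, driver_instance_id: str, host_name: str) -> Breaker:
        key = (driver_instance_id, host_name)
        if key not in self.breakers:
            self.breakers[key] = Breaker()
        return self.breakers[key]
```

Because the key includes the host name, five consecutive failures against one PLC open only that PLC's breaker; a sibling host under the same driver instance keeps its own counter.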
@@ -56,19 +62,21 @@ Closes these gaps flagged in the 2026-04-19 audit:
 
 ### Stream A — Resilience layer (1 week)
 
-1. **A.1** Add `Polly.Core` + `Microsoft.Extensions.Resilience` to `Core`. Build `DriverResiliencePipelineBuilder` that composes Timeout → Retry (exponential backoff + jitter, capability-specific max retries) → CircuitBreaker (consecutive-failure threshold; half-open probe) → Bulkhead (max in-flight per driver instance). Unit tests cover each policy in isolation + composed pipeline.
-2. **A.2** `DriverResilienceOptions` record bound from `DriverInstance.ResilienceConfig` JSON column (new nullable). Defaults encoded per-tier: Tier A (OPC UA Client, S7) — 3 retries, 2 s timeout, 5-failure breaker; Tier B (Modbus) — same except 4 s timeout; Tier C (Galaxy) — 1 retry (inner supervisor handles restart), 10 s timeout, circuit-breaker trips but doesn't kill the driver (the Proxy supervisor already handles that).
-3. **A.3** `DriverCapabilityInvoker<T>` wraps every `IDriver*` method call. Existing server-side dispatch (whatever currently calls `driver.ReadAsync`) routes through the invoker. Policy injection via DI.
-4. **A.4** Remove the hand-rolled `CircuitBreaker` + `Backoff` from `Driver.Galaxy.Proxy/Supervisor/` — replaced by the shared layer. Keep `HeartbeatMonitor` (different concern: IPC liveness, not data-path resilience).
-5. **A.5** Unit tests: per-policy, per-composition. Integration test: Modbus driver under a FlakeyTransport that fails 5×, succeeds on 6th; invoker surfaces the eventual success. Bench: no-op overhead < 1% under nominal load.
+1. **A.1** Add `Polly.Core` + `Microsoft.Extensions.Resilience` to `Core`. Build `DriverResiliencePipelineBuilder` — key on `(DriverInstanceId, HostName)`; composes Timeout → (Retry when the capability allows it; skipped for Write unless `[WriteIdempotent]`) → CircuitBreaker → Bulkhead. Per-capability policy map documented in `DriverResilienceOptions.CapabilityPolicies`.
+2. **A.2** `DriverResilienceOptions` record bound from `DriverInstance.ResilienceConfig` JSON column (new nullable). **Per-tier × per-capability** defaults: Tier A (OpcUaClient, S7) Read 3 retries/2 s/5-failure-breaker, Write 0 retries/2 s/5-failure-breaker; Tier B (Modbus) Read 3/4 s/5, Write 0/4 s/5; Tier C (Galaxy) Read 1 retry/10 s/no-kill, Write 0/10 s/no-kill. Idempotent writes can opt into Read-shaped retry via the attribute.
+3. **A.3** `CapabilityInvoker<TCapability, TResult>` wraps every method on the capability interfaces (`IReadable.ReadAsync`, `IWritable.WriteAsync`, `ITagDiscovery.DiscoverAsync`, `ISubscribable.SubscribeAsync/UnsubscribeAsync`, `IHostConnectivityProbe` probe loop, `IAlarmSource.SubscribeAlarmsAsync/AcknowledgeAsync`, `IHistoryProvider.ReadRawAsync/ReadProcessedAsync/ReadAtTimeAsync/ReadEventsAsync`). Existing server-side dispatch routes through it.
+4. **A.4** **Retain** `Driver.Galaxy.Proxy/Supervisor/CircuitBreaker.cs` + `Backoff.cs` — they guard IPC process respawn (decision #68), orthogonal to the per-call Polly layer. Only `HeartbeatMonitor` is consumed outside the supervisor.
+5. **A.5** Unit tests: per-policy, per-composition. Negative integration tests: (a) Modbus FlakeyTransport fails 5× on Read, succeeds 6th — invoker surfaces success; (b) Modbus FlakeyTransport fails 1× on Write with `[WriteIdempotent]=false` — invoker surfaces failure without retry (no duplicate pulse); (c) Modbus FlakeyTransport fails 1× on Write with `[WriteIdempotent]=true` — invoker retries. Bench: no-op overhead < 1%.
+6. **A.6** `WriteIdempotentAttribute` in `Core.Abstractions`. Modbus/S7/OpcUaClient tag-definition records pick it up; invoker reads via reflection once at driver init.
 
-### Stream B — Tier A/B/C stability runtime (1 week, can parallel with Stream A after A.1)
+### Stream B — Tier A/B/C stability runtime — split into MemoryTracking + MemoryRecycle (1 week)
 
-1. **B.1** `Core.Abstractions` → `DriverTier` enum {A, B, C}. Extend `DriverTypeRegistry` to require `DriverTier` at registration. Existing driver types get their tier stamped (Galaxy = C, Modbus = B, S7 = B, OpcUaClient = A).
-2. **B.2** Generalise `DriverMemoryWatchdog` (lift from `Driver.Galaxy.Host/MemoryWatchdog.cs`). Tier-specific thresholds: A = 256 MB RSS soft / 512 MB hard, B = 512 MB soft / 1 GB hard, C = 1 GB soft / 2 GB hard (decision #70 hybrid multiplier + floor). Soft threshold → log + metric; hard threshold → mark driver Faulted + trigger recycle.
-3. **B.3** `ScheduledRecycleScheduler` (decision #67): each driver instance can opt-in to a weekly recycle at a configured cron. Recycle = `ShutdownAsync` → `InitializeAsync`. Tier C drivers get the Proxy-side recycle; Tier A/B recycle in-process.
-4. **B.4** `WedgeDetector`: polling thread per driver instance; if `LastSuccessfulRead` older than `WedgeThreshold` (default 5 × PublishingInterval, minimum 60 s) AND driver state is `Healthy`, flag as wedged → force `ReinitializeAsync`. Prevents silent dead-subscriptions.
-5. **B.5** Tests: watchdog unit tests drive synthetic allocation; scheduler uses a virtual clock; wedge detector tests use a fake IClock + driver stub.
+1. **B.1** `Core.Abstractions` → `DriverTier` enum {A, B, C}. Extend `DriverTypeRegistry` to require `DriverTier` at registration. Existing driver types stamped (Galaxy = C, Modbus = B, S7 = B, OpcUaClient = A).
+2. **B.2** **`MemoryTracking`** (all tiers) lifted from `Driver.Galaxy.Host/MemoryWatchdog.cs`. Captures `BaselineFootprintBytes` as the median of first 5 min of `IDriver.GetMemoryFootprint()` samples post-`InitializeAsync`. Applies **decision #70 hybrid formula**: `soft = max(multiplier × baseline, baseline + floor)`; Tier A multiplier=3, floor=50 MB; Tier B multiplier=3, floor=100 MB; Tier C multiplier=2, floor=500 MB. Soft breach → log + `DriverInstanceResilienceStatus.CurrentFootprint` tick; never kills. Hard = 2 × soft.
+3. **B.3** **`MemoryRecycle`** (Tier C only per decisions #73-74). Hard-breach on a Tier C driver triggers `ScheduledRecycleScheduler.RequestRecycleNow(driverInstanceId)`; scheduler proxies to `Driver.Galaxy.Proxy/Supervisor/` which restarts the Host process. Tier A/B hard-breach logs a promotion-to-Tier-C recommendation; **never auto-kills** the in-process driver.
+4. **B.4** **`ScheduledRecycleScheduler`** per decision #67: Tier C driver instances opt-in to a weekly recycle at a configured cron. Tier A/B scheduled recycle deferred to a later phase paired with Tier-C escalation.
+5. **B.5** **`WedgeDetector`** demand-aware: `if (state==Healthy && hasPendingWork && noProgressIn > WedgeThreshold) → force ReinitializeAsync`. `hasPendingWork` = (bulkhead depth > 0) OR (active monitored items > 0) OR (queued historian-read count > 0). `WedgeThreshold` default 5 × PublishingInterval, min 60 s. Idle driver stays Healthy.
+6. **B.6** Tests: tracking unit tests drive synthetic allocation against a fake `GetMemoryFootprint`; recycle tests use a mock supervisor; wedge tests include the false-fault cases — idle subscriber, slow historian backfill, write-only burst.
 
 ### Stream C — Health endpoints + structured logging (4 days)
 
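The Stream B arithmetic is easy to get wrong when translating it into tests, so here is a worked sketch of the B.2 hybrid formula and the B.5 wedge predicate. It is an illustration in Python rather than the .NET implementation; `thresholds` and `is_wedged` are hypothetical helper names, and the sample lists stand in for the first five minutes of `GetMemoryFootprint()` readings.

```python
from statistics import median

MB = 1024 * 1024

# (multiplier, floor in bytes) per tier, taken from B.2.
TIER_RULES = {"A": (3, 50 * MB), "B": (3, 100 * MB), "C": (2, 500 * MB)}


def thresholds(tier: str, baseline_samples: list[int]) -> tuple[int, int]:
    """Return (soft, hard) byte thresholds for a tier and its baseline samples."""
    baseline = int(median(baseline_samples))
    multiplier, floor = TIER_RULES[tier]
    # Hybrid rule: whichever of the scaled baseline or baseline + floor is larger.
    soft = max(multiplier * baseline, baseline + floor)
    return soft, 2 * soft  # hard = 2 x soft


def is_wedged(
    healthy: bool,
    bulkhead_depth: int,
    monitored_items: int,
    queued_history_reads: int,
    seconds_since_progress: float,
    publishing_interval_s: float,
) -> bool:
    """Demand-aware wedge test: only a Healthy driver with pending work can be wedged."""
    wedge_threshold = max(5 * publishing_interval_s, 60.0)  # default 5 x interval, min 60 s
    has_pending_work = (
        bulkhead_depth > 0 or monitored_items > 0 or queued_history_reads > 0
    )
    return healthy and has_pending_work and seconds_since_progress > wedge_threshold
```

For a Tier A driver with a 10 MB baseline the floor dominates (`10 + 50 = 60 MB` beats `3 × 10 = 30 MB`), giving soft = 60 MB and hard = 120 MB; with a 100 MB baseline the multiplier dominates and soft becomes 300 MB.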
@@ -77,12 +85,12 @@ Closes these gaps flagged in the 2026-04-19 audit:
 3. **C.3** Add JSON-formatted Serilog sink alongside the existing rolling-file plain-text sink so SIEMs (Splunk, Datadog) can ingest without a regex parser. Sink switchable via `Serilog:WriteJson` appsetting.
 4. **C.4** Integration test: boot server, issue Modbus read, assert log line contains `DriverInstanceId` + `CorrelationId` structured fields.
 
-### Stream D — Config DB LiteDB fallback (1 week)
+### Stream D — Config DB LiteDB fallback — generation-sealed snapshots (1 week)
 
-1. **D.1** `LiteDbConfigCache` adapter. Wraps `ConfigurationDbContext` queries that are safe to serve stale (cluster membership, generation metadata, driver instance definitions, LDAP role mapping). Write-path queries (draft save, publish) bypass the cache and fail hard on DB outage.
-2. **D.2** Cache refresh strategy: refresh on every successful read (write-through-cache), full refresh after `sp_PublishGeneration` confirmation. Cache entries carry `CachedAtUtc`; served entries older than 24 h trigger a synthetic `Warning` log line so operators see stale data in effect.
-3. **D.3** Polly pipeline in `Configuration` project: EF Core query → retry 3× → fallback to cache. On fallback, driver state stays `Healthy` but a `UsingStaleConfig` flag on the cluster's health report flips true.
-4. **D.4** Tests: in-memory SQL Server failure injected via `TestContainers`-ish double; cache returns last-known values; Admin UI banners reflect `UsingStaleConfig`.
+1. **D.1** `LiteDbConfigCache` adapter backed by **sealed generation snapshots**: each successful `sp_PublishGeneration` writes `<cache-root>/<clusterId>/<generationId>.db` as read-only after commit. The adapter maintains a `CurrentSealedGenerationId` pointer updated atomically on successful publish. Mixed-generation reads are **impossible** — every read served from the cache serves one coherent sealed generation.
+2. **D.2** Write-path queries (draft save, publish) bypass the cache entirely and fail hard on DB outage. Read-path queries (DriverInstance enumeration, LdapGroupRoleMapping, cluster + namespace metadata) go through the pipeline: timeout 2 s → retry 3× jittered → fallback to the current sealed snapshot.
+3. **D.3** `UsingStaleConfig` flag flips true when a read fell back to the sealed snapshot; cleared on the next successful DB round-trip. Surfaced on `/healthz` body and Admin `/hosts`.
+4. **D.4** Tests: (a) SQL-container kill mid-operation — read returns sealed snapshot, `UsingStaleConfig=true`, driver stays Healthy; (b) mixed-generation guard — attempt to serve partial generation by corrupting a snapshot file mid-read → adapter fails closed rather than serving mixed data; (c) first-boot-no-snapshot case — adapter refuses to start, driver fails `InitializeAsync` with a clear config-DB-required error.
 
 ### Stream E — Admin `/hosts` page refresh (3 days)
 
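The read path described in D.1 to D.4 (retry, then fall back to one sealed generation, and refuse to start when no snapshot exists) can be sketched as follows. This is a simplified Python illustration, not the `LiteDbConfigCache` code: an in-memory mapping stands in for the sealed `.db` files, the 2 s timeout and jitter are omitted, and `SealedSnapshotCache`, `ConfigDbUnavailable`, and `query_db` are hypothetical names.

```python
from types import MappingProxyType


class ConfigDbUnavailable(Exception):
    """Raised when the config DB cannot be reached (hypothetical error type)."""


class SealedSnapshotCache:
    def __init__(self):
        self._snapshots = {}  # generation_id -> read-only snapshot
        self.current_sealed_generation_id = None
        self.using_stale_config = False

    def seal(self, generation_id, rows):
        """Called after a successful publish: store a read-only copy, then move the pointer."""
        self._snapshots[generation_id] = MappingProxyType(dict(rows))
        self.current_sealed_generation_id = generation_id

    def read(self, query_db, retries=3):
        """Try the DB (1 attempt + retries); on persistent failure serve the sealed snapshot."""
        for _ in range(1 + retries):
            try:
                rows = query_db()
            except ConfigDbUnavailable:
                continue
            self.using_stale_config = False  # cleared on a successful round-trip
            return rows
        if self.current_sealed_generation_id is None:
            # First boot with no snapshot: fail instead of inventing config.
            raise ConfigDbUnavailable("no sealed snapshot available; config DB required")
        self.using_stale_config = True
        # Always one whole generation, never a mix of rows from different generations.
        return self._snapshots[self.current_sealed_generation_id]
```

Serving only through the `current_sealed_generation_id` pointer is what rules out mixed-generation reads in this sketch: a reader sees either the live DB result or exactly one sealed generation.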
@@ -92,11 +100,17 @@ Closes these gaps flagged in the 2026-04-19 audit:
 
 ## Compliance Checks (run at exit gate)
 
-- [ ] **Polly coverage**: every `IDriver*` method call in the server dispatch layer routes through `DriverCapabilityInvoker`. Enforce via a Roslyn analyzer added to `Core.Abstractions` build (error on direct `IDriver.ReadAsync` calls outside the invoker).
-- [ ] **Tier registry**: every driver type registered in `DriverTypeRegistry` has a non-null `Tier`. Unit test walks the registry + asserts no gaps.
-- [ ] **Health contract**: `/healthz` + `/readyz` respond within 500 ms even with one driver Faulted.
-- [ ] **Structured log**: CI grep on `tests/` output asserts at least one log line contains `"DriverInstanceId"` + `"CorrelationId"` JSON fields.
-- [ ] **Cache fallback**: Integration test kills the SQL container mid-operation; driver health stays `Healthy`, `UsingStaleConfig` flips true.
+- [ ] **Invoker coverage**: every method on `IReadable` / `IWritable` / `ITagDiscovery` / `ISubscribable` / `IHostConnectivityProbe` / `IAlarmSource` / `IHistoryProvider` in the server dispatch layer routes through `CapabilityInvoker`. Enforce via a Roslyn analyzer (error-level; warning-first is rejected — the compliance check is the gate).
+- [ ] **Write-retry guard**: writes without `[WriteIdempotent]` never get retried. Unit-test the invoker path asserts zero retry attempts.
+- [ ] **Pipeline isolation**: pipeline key is `(DriverInstanceId, HostName)`. Integration test with two Modbus hosts under one instance — failing host A does not open the breaker for host B.
+- [ ] **Tier registry**: every driver type registered in `DriverTypeRegistry` has a non-null `Tier`. Unit test walks the registry + asserts no gaps. Tier C registrations must declare their out-of-process topology.
+- [ ] **MemoryTracking never kills**: soft/hard breach tests on a Tier A/B driver log + surface without terminating the process.
+- [ ] **MemoryRecycle Tier C only**: hard breach on a Tier A driver never invokes the supervisor; on Tier C it does.
+- [ ] **Wedge demand-aware**: test suite includes idle-subscription-only, slow-historian-backfill, and write-only-burst cases — driver stays Healthy.
+- [ ] **Galaxy supervisor preserved**: `Driver.Galaxy.Proxy/Supervisor/CircuitBreaker.cs` + `Backoff.cs` still present + still invoked on Host crash.
+- [ ] **Health state machine**: `/healthz` + `/readyz` respond within 500 ms for every `DriverState`; state-machine table in this doc drives the test matrix.
+- [ ] **Structured log**: CI grep asserts at least one log line per capability call has `"DriverInstanceId"` + `"CorrelationId"` JSON fields.
+- [ ] **Generation-sealed cache**: integration tests cover (a) SQL-kill mid-operation serves last-sealed snapshot; (b) mixed-generation corruption fails closed; (c) first-boot no-snapshot + DB-down → `InitializeAsync` fails with clear error.
 - [ ] No regression in existing test suites — `dotnet test ZB.MOM.WW.OtOpcUa.slnx` count equal-or-greater than pre-Phase-6.1 baseline.
 
 ## Risks and Mitigations
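The "Health state machine" check relies on the `/readyz` mapping from the scope table (`Unknown`/`Initializing` → 503, `Healthy` → 200, `Degraded` → 200 with `degradedDrivers`, `Faulted` → 503, ANDed across drivers). A minimal sketch of that mapping, useful as a reference when building the test matrix, might look like this. It is Python for illustration only; the `readyz` function and the `drivers` body key are assumptions, while `degradedDrivers` comes from the plan.

```python
from enum import Enum


class DriverState(Enum):
    UNKNOWN = "Unknown"
    INITIALIZING = "Initializing"
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    FAULTED = "Faulted"


# States that make the whole node not ready.
NOT_READY = {DriverState.UNKNOWN, DriverState.INITIALIZING, DriverState.FAULTED}


def readyz(drivers: dict[str, DriverState]) -> tuple[int, dict]:
    """Return (HTTP status, JSON-like body) for the ANDed driver health."""
    status = 503 if any(s in NOT_READY for s in drivers.values()) else 200
    body = {
        # Per-instance detail is always reported, whatever the status.
        "drivers": {name: state.value for name, state in drivers.items()},
        "degradedDrivers": sorted(
            name for name, state in drivers.items() if state is DriverState.DEGRADED
        ),
    }
    return status, body
```

With this shape, one `Faulted` driver turns the response into a 503 even when every other driver is `Healthy`, while a `Degraded` driver keeps the 200 and is named in the body.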
@@ -20,15 +20,22 @@ Closes these gaps:
 
 ## Scope — What Changes
 
+**Architectural separation** (critical for correctness): `LdapGroupRoleMapping` is **control-plane only** — it maps LDAP groups to Admin UI roles (`FleetAdmin` / `ConfigEditor` / `ReadOnly`) and cluster scopes for Admin access. **It is NOT consulted by the OPC UA data-path evaluator.** The data-path evaluator reads `NodeAcl` rows joined directly against the session's **resolved LDAP group memberships**. The two concerns share zero runtime code path.
+
 | Concern | Change |
 |---------|--------|
-| `Configuration` project | New entity `LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }`. Migration. Admin CRUD. |
-| `Core` → new `Core.Authorization` sub-namespace | `IPermissionEvaluator` interface; concrete `PermissionTrieEvaluator` implementation loads ACLs + LDAP mappings from Configuration, builds a trie keyed on the 6-level scope hierarchy, evaluates a `(UserClaim[], NodeId, NodePermissions)` → `bool` decision in O(depth × group-count). |
-| `Core.Authorization` cache | `PermissionTrieCache` — one trie per `(ClusterId, GenerationId)`. Rebuilt on `sp_PublishGeneration` confirmation; served from memory thereafter. Per-session evaluator keeps a reference to the current trie + user's LDAP groups. |
-| OPC UA server dispatch | `OtOpcUa.Server/OpcUa/DriverNodeManager.cs` Read/Write/HistoryRead/MonitoredItem-create paths call `PermissionEvaluator.Authorize(session.Identity, nodeId, NodePermissions.Read)` etc. before delegating to the driver. Unauthorized returns `BadUserAccessDenied` (0x80210000) — not a silent no-op per corrections-doc B1. |
-| `LdapAuthService` (existing) | On cookie-auth success, resolves the user's LDAP groups via `LdapGroupService.GetMemberships` + loads the matching `LdapGroupRoleMapping` rows → produces a role-claim list + cluster-scope claim list. Stored on the auth cookie. |
-| Admin UI `AclsTab.razor` | Repoint edits at the new `NodeAclService` API that writes through to the same table the evaluator reads. Add a "test this permission" probe that runs a dummy evaluator against a chosen `(user, nodeId, action)` so ops can sanity-check grants before publishing a draft. |
-| Admin UI new tab `RoleGrantsTab.razor` | CRUD over `LdapGroupRoleMapping`. Per-cluster + system-wide grants. FleetAdmin only. |
+| `Configuration` project | New entity `LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }`. **Consumed only by Admin UI role routing.** Migration. Admin CRUD. |
+| `Core` → new `Core.Authorization` sub-namespace | `IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, OpcUaOperation op, NodeId nodeId) → AuthorizationDecision`. `op` covers every OPC UA surface: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve. Result is tri-state (internal model distinguishes `Allow` / `NotGranted` / `Denied` + carries matched-grant provenance). Phase 6.2 only produces `Allow` + `NotGranted`; v2.1 Deny lands without API break. |
+| `PermissionTrieBuilder` | Builds trie from `NodeAcl` rows joined against **resolved LDAP group memberships**, keyed on 6-level scope hierarchy for Equipment namespaces. **SystemPlatform namespaces (Galaxy)** use a `FolderSegment` scope level between Namespace and Tag, populated from `Tag.FolderPath` segments, so folder subtree authorization works on Galaxy trees the same way UNS works on Equipment trees. Trie node carries `ScopeKind` enum. |
+| `PermissionTrieCache` + freshness | One trie per `(ClusterId, GenerationId)`. Invalidated on `sp_PublishGeneration` via in-process event bus AND generation-ID check on hot path — every authz call looks up `CurrentGenerationId` (Polly-wrapped, sub-second cache); a Backup that cached a stale generation detects the mismatch + forces re-load. **Redundancy-safe**. |
+| `UserAuthorizationState` freshness | Cached per session BUT bounded by `MembershipFreshnessInterval` (default **15 min**). Past that, the next hot-path authz call re-resolves LDAP group memberships via `LdapGroupService`. Failure to re-resolve (LDAP unreachable) → **fail-closed**: evaluator returns `NotGranted` for every call until memberships refresh successfully. Decoupled from Phase 6.1's availability-oriented 24h cache. |
+| `AuthCacheMaxStaleness` | Separate from Phase 6.1's `UsingStaleConfig` window. Default 5 min — beyond that, authz fails closed regardless of Phase 6.1 cache warmth. |
+| OPC UA server dispatch — all enforcement surfaces | `DriverNodeManager` wires evaluator on: **Browse + TranslateBrowsePathsToNodeIds** (ancestors implicitly visible if any descendant has a grant; denied ancestors filter from results), **Read** (per-attribute StatusCode `BadUserAccessDenied` in mixed-authorization batches; batch never poisons), **Write** (uses `NodePermissions.WriteOperate/Tune/Configure` based on driver `SecurityClassification`), **HistoryRead** (uses `NodePermissions.HistoryRead` — **distinct** flag, not Read), **HistoryUpdate** (`NodePermissions.HistoryUpdate`), **CreateMonitoredItems** (per-`MonitoredItemCreateResult` denial), **TransferSubscriptions** (re-evaluates items on transfer), **Call** (`NodePermissions.MethodCall`), **Acknowledge/Confirm/Shelve** (per-alarm flags). |
+| Subscription re-authorization | Each `MonitoredItem` is stamped with `(AuthGenerationId, MembershipVersion)` at create time. On every Publish, items with a stamp mismatching the session's current `(AuthGenerationId, MembershipVersion)` get re-evaluated; revoked items drop to `BadUserAccessDenied` within one publish cycle. Unchanged items stay fast-path. |
+| `LdapAuthService` | On cookie-auth success: resolves LDAP group memberships; loads matching `LdapGroupRoleMapping` rows → role claims + cluster-scope claims (control plane); stores `UserAuthorizationState.LdapGroups` on the session for the data-plane evaluator. |
+| `ValidatedNodeAclAuthoringService` | Replaces CRUD-only `NodeAclService` for authoring. Validates (LDAP group exists, scope exists in current or target draft, grant shape is valid, no duplicate `(LdapGroup, Scope)` pair). Admin UI writes only through it. |
+| Admin UI `AclsTab.razor` | Writes via `ValidatedNodeAclAuthoringService`. Adds Probe-This-Permission row that runs the real evaluator against a chosen `(LDAP group, node, operation)` and shows `Allow` / `NotGranted` + matched-grant provenance. |
+| Admin UI new tab `RoleGrantsTab.razor` | CRUD over `LdapGroupRoleMapping`. Per-cluster + system-wide grants. FleetAdmin only. **Documentation explicit** that this only affects Admin UI access, not OPC UA data plane. |
 | Audit log | Every Grant/Revoke/Publish on `LdapGroupRoleMapping` or `NodeAcl` writes an `AuditLog` row with old/new state + user. |
 
 ## Scope — What Does NOT Change
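The membership-freshness rule in the table above (re-resolve LDAP groups after `MembershipFreshnessInterval`, and fail closed with `NotGranted` while LDAP is unreachable) is the part most likely to be misread, so a small sketch may help. It is Python for illustration and deliberately replaces the permission trie with a flat `(group, node_id)` lookup; `authorize`, `grants`, and `resolve_groups` are hypothetical names, and `ConnectionError` stands in for whatever the LDAP client actually raises.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "Allow"
    NOT_GRANTED = "NotGranted"
    DENIED = "Denied"  # reserved; this phase produces only Allow and NotGranted


@dataclass
class UserAuthorizationState:
    ldap_groups: frozenset
    resolved_at: float  # seconds, same clock as `now`


def authorize(state, now, grants, node_id, op, resolve_groups, freshness_s=15 * 60):
    """Evaluate one operation, refreshing stale memberships first."""
    if now - state.resolved_at > freshness_s:
        try:
            state.ldap_groups = frozenset(resolve_groups())
            state.resolved_at = now
        except ConnectionError:
            # Fail closed: stale memberships are never used to grant access.
            return Decision.NOT_GRANTED
    # Union semantics: a grant held by any of the user's groups is enough.
    for group in state.ldap_groups:
        if op in grants.get((group, node_id), set()):
            return Decision.ALLOW
    return Decision.NOT_GRANTED
```

Because `resolved_at` is only advanced on a successful refresh, every call made while LDAP is down retries the resolution and keeps returning `NotGranted` until it succeeds.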
@@ -66,14 +73,19 @@ Closes these gaps:
 5. **B.5** Per-session cached evaluator: OPC UA Session authentication produces `UserAuthorizationState { ClusterId, LdapGroups[], Trie }`; cached on the session until session close or generation-apply.
 6. **B.6** Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → deny, (e) Galaxy nodes skip UnsArea/UnsLine levels.
 
-### Stream C — OPC UA server dispatch wiring (4 days)
+### Stream C — OPC UA server dispatch wiring (6 days, widened)
 
-1. **C.1** `DriverNodeManager.Read` — consult evaluator before delegating to `IReadable`. Unauthorized nodes get `BadUserAccessDenied` per-attribute, not on the whole batch.
-2. **C.2** `DriverNodeManager.Write` — same. Evaluator needs `NodePermissions.WriteOperate` / `WriteTune` / `WriteConfigure` depending on driver-reported `SecurityClassification` of the attribute.
-3. **C.3** `DriverNodeManager.HistoryRead` — ACL checks `NodePermissions.Read` (history uses the same Read flag per `acl-design.md`).
-4. **C.4** `DriverNodeManager.CreateMonitoredItem` — denies unauthorized nodes at subscription create time, not after the first publish. Cleaner than silently omitting notifications.
-5. **C.5** Alarm actions (acknowledge / confirm / shelve) — checks `AlarmAck` / `AlarmConfirm` / `AlarmShelve` flags.
-6. **C.6** Integration tests: boot server with a seed trie, auth as three distinct users with different group memberships, assert read of one tag allowed + read of another denied + write denied where Read allowed.
+1. **C.1** `DriverNodeManager.Read` — evaluator consulted per `ReadValueId` with `OpcUaOperation.Read`. Denied attributes get `BadUserAccessDenied` per-item; batch never poisons. Integration test covers mixed-authorization batch (3 authorized + 2 denied → 3 Good values + 2 Bad StatusCodes, request completes).
+2. **C.2** `DriverNodeManager.Write` — evaluator chooses `NodePermissions.WriteOperate` / `WriteTune` / `WriteConfigure` based on the driver-reported `SecurityClassification`.
+3. **C.3** `DriverNodeManager.HistoryRead` — **uses `NodePermissions.HistoryRead`**, which is a **distinct flag** from Read. Test: user with Read but not HistoryRead can read live values but gets `BadUserAccessDenied` on `HistoryRead`.
+4. **C.4** `DriverNodeManager.HistoryUpdate` — uses `NodePermissions.HistoryUpdate`.
+5. **C.5** `DriverNodeManager.CreateMonitoredItems` — per-`MonitoredItemCreateResult` denial in mixed-authorization batch; partial success path per OPC UA Part 4. Each created item stamped `(AuthGenerationId, MembershipVersion)`.
+6. **C.6** `DriverNodeManager.TransferSubscriptions` — on reconnect, re-evaluate every transferred `MonitoredItem` against the session's current auth state. Stale-stamp items drop to `BadUserAccessDenied`.
+7. **C.7** **Browse + TranslateBrowsePathsToNodeIds** — evaluator called with `OpcUaOperation.Browse`. Ancestor visibility implied when any descendant has a grant (per `acl-design.md` §Browse). Denied ancestors filter from browse results — the UA browser sees a hierarchy truncated at the denied ancestor rather than an inconsistent child-without-parent view.
||||||
|
8. **C.8** `DriverNodeManager.Call` — `NodePermissions.MethodCall`.
|
||||||
|
9. **C.9** Alarm actions (Acknowledge / Confirm / Shelve) — per-alarm `NodePermissions.AlarmAck` / `AlarmConfirm` / `AlarmShelve`.
|
||||||
|
10. **C.10** Publish path — for each `MonitoredItem` with a mismatched `(AuthGenerationId, MembershipVersion)` stamp, re-evaluate. Unchanged items stay fast-path; changes happen at next publish cycle.
|
||||||
|
11. **C.11** Integration tests: three-user seed with different memberships; matrix covers every operation in §Scope. Mixed-batch tests for Read + CreateMonitoredItems.
|
||||||
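The per-item denial described in C.1 can be sketched as follows. This is a minimal illustration in Python rather than the project's .NET code; `read_batch`, `is_allowed`, and `read_value` are hypothetical names, and the status codes are plain strings standing in for OPC UA `StatusCode` values.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

GOOD = "Good"
BAD_USER_ACCESS_DENIED = "BadUserAccessDenied"


@dataclass(frozen=True)
class ReadResult:
    node_id: str
    status: str
    value: Any = None


def read_batch(
    node_ids: List[str],
    is_allowed: Callable[[str], bool],   # hypothetical evaluator hook
    read_value: Callable[[str], Any],    # hypothetical driver read hook
) -> List[ReadResult]:
    """Authorize each item on its own so one denial never fails the batch."""
    results = []
    for node_id in node_ids:
        if is_allowed(node_id):
            results.append(ReadResult(node_id, GOOD, read_value(node_id)))
        else:
            # Denied items carry a per-item status; the driver is never asked.
            results.append(ReadResult(node_id, BAD_USER_ACCESS_DENIED))
    return results
```

Results come back in request order, which is what lets a client line up each status with its `ReadValueId`.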
|
|
||||||
### Stream D — Admin UI refresh (4 days)
|
### Stream D — Admin UI refresh (4 days)
|
||||||
|
|
||||||
@@ -84,11 +96,20 @@ Closes these gaps:
|
|||||||
|
|
||||||
## Compliance Checks (run at exit gate)
|
## Compliance Checks (run at exit gate)
|
||||||
|
|
||||||
- [ ] **Data-path enforcement**: OPC UA Read against a NodeId the current user has no grant for returns `BadUserAccessDenied` with a ServiceResult, not Good with stale data. Verified by an integration test with a Basic256Sha256-secured session + a read-only LDAP identity.
|
- [ ] **Control/data-plane separation**: `LdapGroupRoleMapping` consumed only by Admin UI; the data-path evaluator has zero references to it. Enforced via a project-reference audit (Admin project references the mapping service; `Core.Authorization` does not).
|
||||||
- [ ] **Trie invariants**: `PermissionTrieBuilder` is idempotent (building twice with identical inputs produces equal tries — override `Equals` to assert).
|
- [ ] **Every operation wired**: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve all consult the evaluator. Integration test matrix covers every operation × allow/deny.
|
||||||
- [ ] **Additive grants**: Cluster-level grant on User A means User A can read every tag in that cluster *without* needing any lower-level grant.
|
- [ ] **HistoryRead uses its own flag**: test "user with Read + no HistoryRead gets `BadUserAccessDenied` on HistoryRead".
|
||||||
- [ ] **Isolation between clusters**: a grant on Cluster 1 has zero effect on Cluster 2 for the same user.
|
- [ ] **Mixed-batch semantics**: Read of 5 nodes (3 allowed + 2 denied) returns 3 Good + 2 `BadUserAccessDenied` per-`ReadValueId`; CreateMonitoredItems equivalent.
|
||||||
- [ ] **Galaxy path coverage**: ACL checks work on `Galaxy` folder nodes + tag nodes where the UNS levels are absent (the trie treats them as shallow `Cluster → Namespace → Tag`).
|
- [ ] **Browse ancestor visibility**: user with a grant only on a deep equipment node can browse the path to it (ancestors implied); denied ancestors filter from browse results otherwise.
|
||||||
|
- [ ] **Galaxy FolderSegment coverage**: a grant on a Galaxy folder subtree cascades to its tags; sibling folders are unaffected. Trie test covers this.
|
||||||
|
- [ ] **Subscription re-authorization**: integration test — create item, revoke grant via draft+publish, next publish cycle the item returns `BadUserAccessDenied` (not silently still-notifying).
|
||||||
|
- [ ] **Membership freshness**: test — 15 min MembershipFreshnessInterval elapses on a long-lived session + LDAP now unreachable → authz fails closed on the next request until LDAP recovers.
|
||||||
|
- [ ] **Auth cache fail-closed**: test — Phase 6.1 cache serves stale config for 6 min; authz evaluator refuses all calls after 5 min regardless.
|
||||||
|
- [ ] **Trie invariants**: `PermissionTrieBuilder` is idempotent (build twice with identical inputs → equal tries).
|
||||||
|
- [ ] **Additive grants + cluster isolation**: cluster-grant cascades; cross-cluster leakage impossible.
|
||||||
|
- [ ] **Redundancy-safe invalidation**: integration test — two nodes, a publish on one, authorize a request on the other before in-process event propagates → generation-mismatch forces re-load, no stale decision.
|
||||||
|
- [ ] **Authoring validation**: `AclsTab` cannot save a `(LdapGroup, Scope)` pair that already exists in the draft; operator sees the validation error pre-save.
|
||||||
|
- [ ] **AuthorizationDecision shape stability**: API surface exposes `Allow` + `NotGranted` only; `Denied` variant exists in the type but is never produced; v2.1 can add Deny without API break.
|
||||||
- [ ] No regression in driver test counts.
|
- [ ] No regression in driver test counts.
|
||||||
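The subscription re-authorization check can be pictured as a stamp comparison on the publish path. The sketch below is an assumption-laden Python illustration, not the shipped code: `MonitoredItem`, `publish_status`, and `evaluate` are hypothetical names, and the stamp is modelled as a plain `(AuthGenerationId, MembershipVersion)` tuple.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Stamp = Tuple[int, int]  # (AuthGenerationId, MembershipVersion)


@dataclass
class MonitoredItem:
    node_id: str
    stamp: Stamp
    allowed: bool = True


def publish_status(
    item: MonitoredItem,
    current_stamp: Stamp,
    evaluate: Callable[[str], bool],  # hypothetical evaluator hook
) -> str:
    """Re-evaluate only when the stamp is stale; unchanged items stay fast-path."""
    if item.stamp != current_stamp:
        item.allowed = evaluate(item.node_id)
        item.stamp = current_stamp
    return "Good" if item.allowed else "BadUserAccessDenied"
```

An item whose grant was revoked by a publish therefore flips to `BadUserAccessDenied` on the next publish cycle instead of continuing to notify.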
|
|
||||||
## Risks and Mitigations
|
## Risks and Mitigations
|
||||||
|
|||||||
@@ -22,11 +22,13 @@ Closes these gaps:
|
|||||||
|
|
||||||
| Concern | Change |
|
| Concern | Change |
|
||||||
|---------|--------|
|
|---------|--------|
|
||||||
| `OtOpcUa.Server` → new `Server.Redundancy` sub-namespace | `RedundancyCoordinator` singleton. Resolves the current node's `ClusterNode` row at startup, loads its peers from `ServerCluster`, probes each peer's `/healthz` (Phase 6.1 endpoint) every `PeerProbeInterval` (default 2 s), maintains per-peer health state. |
|
| `OtOpcUa.Server` → new `Server.Redundancy` sub-namespace | `RedundancyCoordinator` singleton. Resolves the current node's `ClusterNode` row at startup, loads peers, runs **two-layer peer health probe**: (a) `/healthz` every 2 s as the fast-fail (inherits Phase 6.1 semantics — HTTP + DB/cache healthy); (b) `UaHealthProbe` every 10 s — opens a lightweight OPC UA client session to the peer + reads its `ServiceLevel` node + verifies endpoint serves data. Authority decisions use UaHealthProbe; `/healthz` is used only to avoid wasting UA probes when peer is obviously down. |
|
||||||
| OPC UA server root | `ServiceLevel` variable node becomes a `BaseDataVariable` whose value updates on `RedundancyCoordinator` state change. `ServerUriArray` array variable refreshes on cluster-topology change. `RedundancySupport` stays static (set from `RedundancyMode` at startup). |
|
| Publish-generation fencing | Topology + role decisions are stamped with a monotonic `ConfigGenerationId` from the shared config DB. Coordinator re-reads topology via CAS on `(ClusterId, ExpectedGeneration)` → new row; peers reject state propagated from a lower generation. Prevents split-publish races. |
|
||||||
| `RedundancyCoordinator` computation | `ServiceLevel` formula: 255 = Primary + fully healthy + no apply in progress; 200 = Primary + an apply in the middle (clients should prefer peer); 100 = Backup + fully healthy; 50 = Backup + mid-apply; 0 = Faulted or peer-unreachable-and-I'm-not-authoritative. Documented in `docs/Redundancy.md` update. |
|
| `InvalidTopology` runtime state | If both nodes detect >1 Primary AFTER startup (config-DB drift during a publish), both self-demote to ServiceLevel 2 until convergence. Neither node serves authoritatively; clients pick the healthier alternative or reconnect later. |
|
||||||
|
| OPC UA server root | `ServiceLevel` variable node becomes a `BaseDataVariable` whose value updates on `RedundancyCoordinator` state change. `ServerUriArray` array variable includes **self + peers** in stable deterministic ordering (decision per OPC UA Part 4 §6.6.2.2). `RedundancySupport` stays static (set from `RedundancyMode` at startup); `Transparent` mode validated pre-publish, not rejected at startup. |
|
||||||
|
| `RedundancyCoordinator` computation | **8-state ServiceLevel matrix** — avoids OPC UA Part 5 §6.3.34 collision (`0=Maintenance`, `1=NoData`). Operator-declared maintenance only = **0**. Unreachable / Faulted = **1**. In-range operational states occupy **2..255**: Authoritative-Primary = **255**; Isolated-Primary (peer unreachable, self serving) = **230**; Primary-Mid-Apply = **200**; Recovering-Primary (post-fault, dwell not met) = **180**; Authoritative-Backup = **100**; Isolated-Backup (primary unreachable, "take over if asked") = **80**; Backup-Mid-Apply = **50**; Recovering-Backup = **30**; `InvalidTopology` (runtime detects >1 Primary) = **2** (detected-inconsistency band — below normal operation). Full matrix documented in `docs/Redundancy.md` update. |
|
||||||
| Role transition | Split-brain avoidance: role is *declared* in the shared config DB (`ClusterNode.RedundancyRole`), not elected at runtime. An operator flips the row (or a failover script does). Coordinator only reads; never writes. |
|
| Role transition | Split-brain avoidance: role is *declared* in the shared config DB (`ClusterNode.RedundancyRole`), not elected at runtime. An operator flips the row (or a failover script does). Coordinator only reads; never writes. |
|
||||||
| `sp_PublishGeneration` hook | Before the apply starts, the coordinator sets `ApplyInProgress = true` in-memory → `ServiceLevel` drops to mid-apply band. Clears after `sp_PublishGeneration` returns. |
|
| `sp_PublishGeneration` hook | Uses named **apply leases** keyed to `(ConfigGenerationId, PublishRequestId)`. `await using var lease = coordinator.BeginApplyLease(...)`. Disposal on any exit path (success, exception, cancellation) decrements. Watchdog auto-closes any lease older than `ApplyMaxDuration` (default 10 min) → ServiceLevel can't stick at mid-apply. Pre-publish validator rejects unsupported `RedundancyMode` (e.g. `Transparent`) with a clear error so runtime never sees an invalid state. |
|
||||||
| Admin UI `/cluster/{id}` page | New `RedundancyTab.razor` — shows current node's role + ServiceLevel + peer reachability. FleetAdmin can trigger a role-swap by editing `ClusterNode.RedundancyRole` + publishing a draft. |
|
| Admin UI `/cluster/{id}` page | New `RedundancyTab.razor` — shows current node's role + ServiceLevel + peer reachability. FleetAdmin can trigger a role-swap by editing `ClusterNode.RedundancyRole` + publishing a draft. |
|
||||||
| Metrics | New OpenTelemetry metrics: `ot_opcua_service_level{cluster,node}`, `ot_opcua_peer_reachable{cluster,node,peer}`, `ot_opcua_apply_in_progress{cluster,node}`. Sink via Phase 6.1 observability layer. |
|
| Metrics | New OpenTelemetry metrics: `ot_opcua_service_level{cluster,node}`, `ot_opcua_peer_reachable{cluster,node,peer}`, `ot_opcua_apply_in_progress{cluster,node}`. Sink via Phase 6.1 observability layer. |
|
||||||
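The peer-side half of publish-generation fencing reduces to a monotonic comparison. The following Python sketch is illustrative only and covers just the "reject a lower generation" rule, not the CAS against the config DB; `TopologyView` and `accept` are hypothetical names, and treating an equal generation as an idempotent re-apply is an assumption.

```python
from typing import Dict


class TopologyView:
    """Holds the last accepted topology and the generation that produced it."""

    def __init__(self) -> None:
        self.generation = 0
        self.roles: Dict[str, str] = {}

    def accept(self, generation: int, roles: Dict[str, str]) -> bool:
        # State stamped with a lower generation is stale and is ignored.
        if generation < self.generation:
            return False
        self.generation = generation
        self.roles = dict(roles)
        return True
```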
|
|
||||||
@@ -57,41 +59,59 @@ Closes these gaps:
|
|||||||
2. **A.2** Topology subscription — coordinator re-reads on `sp_PublishGeneration` confirmation so an operator role-swap takes effect after publish (no process restart needed).
|
2. **A.2** Topology subscription — coordinator re-reads on `sp_PublishGeneration` confirmation so an operator role-swap takes effect after publish (no process restart needed).
|
||||||
3. **A.3** Tests: two-node cluster seed, one-node cluster seed (degenerate), duplicate-uri rejection.
|
3. **A.3** Tests: two-node cluster seed, one-node cluster seed (degenerate), duplicate-uri rejection.
|
||||||
|
|
||||||
### Stream B — Peer health probing + ServiceLevel computation (4 days)
|
### Stream B — Peer health probing + ServiceLevel computation (6 days, widened)
|
||||||
|
|
||||||
1. **B.1** `PeerProbeLoop` runs per peer at `PeerProbeInterval` (2 s default, configurable via `appsettings.json`). Calls peer's `/healthz` via `HttpClient`; timeout 1 s. Exponential backoff on sustained failure.
|
1. **B.1** `PeerHttpProbeLoop` per peer at 2 s — calls peer's `/healthz`, 1 s timeout, exponential backoff on sustained failure. Used as fast-fail.
|
||||||
2. **B.2** `ServiceLevelCalculator.Compute(current role, self health, peer reachable, apply in progress) → byte`. Matrix documented in §Scope.
|
2. **B.2** `PeerUaProbeLoop` per peer at 10 s — opens an OPC UA client session to the peer (reuses Phase 5 `Driver.OpcUaClient` stack), reads peer's `ServiceLevel` node + verifies endpoint serves data. Short-circuit: if HTTP probe is failing, skip UA probe (no wasted sessions).
|
||||||
3. **B.3** Calculator reacts to inputs via `IObserver` pattern so changes immediately push to the OPC UA `ServiceLevel` node.
|
3. **B.3** `ServiceLevelCalculator.Compute(role, selfHealth, peerHttpHealthy, peerUaHealthy, applyInProgress, recoveryDwellMet, topologyValid) → byte`. 8-state matrix per §Scope. `topologyValid=false` forces InvalidTopology = 2 regardless of other inputs.
|
||||||
4. **B.4** Tests: matrix coverage for all role × health × apply permutations (32 cases); injected `IClock` + fake `HttpClient` so tests are deterministic.
|
4. **B.4** `RecoveryStateManager`: after a `Faulted → Healthy` transition, hold driver in `Recovering` band (180 Primary / 30 Backup) for `RecoveryDwellTime` (default 60 s) AND require one positive publish witness (successful `Read` on a reference node) before entering Authoritative band.
|
||||||
|
5. **B.5** Calculator reacts to inputs via `IObserver` so changes immediately push to the OPC UA `ServiceLevel` node.
|
||||||
|
6. **B.6** Tests: **64-case matrix** covering role × self-health × peer-http × peer-ua × apply × recovery × topology. Specific cases flagged: Primary-with-unreachable-peer-serves-at-230 (authority retained); Backup-with-unreachable-primary-escalates-to-80 (not auto-promote); InvalidTopology demotes both nodes; Recovering dwell + publish-witness blocks premature return to 255.
|
||||||
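One way to read the B.3 matrix is as an ordered series of checks. The Python sketch below uses the band values stated in §Scope, but the precedence between the Recovering and Mid-Apply bands is an assumption (the plan only states that apply dominates peer-unreachable and that an invalid topology overrides everything). `compute_service_level` is a hypothetical name, `recovery_dwell_met` is assumed to fold in both the dwell timer and the publish witness, and operator-declared maintenance (0) is left out because it is not an input to `Compute`.

```python
BANDS = {
    "Primary": {"authoritative": 255, "isolated": 230, "mid_apply": 200, "recovering": 180},
    "Backup": {"authoritative": 100, "isolated": 80, "mid_apply": 50, "recovering": 30},
}
INVALID_TOPOLOGY = 2
FAULTED = 1


def compute_service_level(
    role: str,
    self_healthy: bool,
    peer_http_healthy: bool,
    peer_ua_healthy: bool,
    apply_in_progress: bool,
    recovery_dwell_met: bool,
    topology_valid: bool,
) -> int:
    if not topology_valid:
        return INVALID_TOPOLOGY          # overrides every other input
    if not self_healthy:
        return FAULTED
    band = BANDS[role]
    if not recovery_dwell_met:
        return band["recovering"]        # assumed to outrank mid-apply
    if apply_in_progress:
        return band["mid_apply"]         # apply dominates peer-unreachable
    if not (peer_http_healthy and peer_ua_healthy):
        return band["isolated"]          # a skipped UA probe counts as unhealthy
    return band["authoritative"]
```

Keeping the function pure makes the 64-case matrix in B.6 a straightforward table-driven test.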
|
|
||||||
### Stream C — OPC UA node wiring (3 days)
|
### Stream C — OPC UA node wiring (3 days)
|
||||||
|
|
||||||
1. **C.1** `ServiceLevel` variable node created under `ServerStatus` at server startup. Type `Byte`, AccessLevel = CurrentRead only. Subscribe to `ServiceLevelCalculator` observable; push updates via `DataChangeNotification`.
|
1. **C.1** `ServiceLevel` variable node created under `ServerStatus` at server startup. Type `Byte`, AccessLevel = CurrentRead only. Subscribe to `ServiceLevelCalculator` observable; push updates via `DataChangeNotification`.
|
||||||
2. **C.2** `ServerUriArray` variable node under `ServerCapabilities`. Array of `String`, length = peer count. Updates on topology change.
|
2. **C.2** `ServerUriArray` variable node under `ServerCapabilities`. Array of `String`, **includes self + peers** with deterministic ordering (self first). Updates on topology change. Compliance test asserts local-plus-peer membership.
|
||||||
3. **C.3** `RedundancySupport` variable — static at startup from `RedundancyMode`. Values: `None`, `Cold`, `Warm`, `WarmActive`, `Hot`. Phase 6.3 supports everything except `Transparent` + `HotAndMirrored`.
|
3. **C.3** `RedundancySupport` variable — static at startup from `RedundancyMode`. Values: `None`, `Cold`, `Warm`, `WarmActive`, `Hot`. Unsupported values (`Transparent`, `HotAndMirrored`) are rejected **pre-publish** by validator — runtime never sees them.
|
||||||
4. **C.4** Test against the Client.CLI: connect to primary, read `ServiceLevel` → expect 255; pause primary apply → expect 200; fail primary → client sees `Bad_ServerNotConnected` + reconnects to peer at 100.
|
4. **C.4** Client.CLI cutover test: connect to primary, read `ServiceLevel` → 255; pause primary apply → 200; unreachable peer while apply in progress → 200 (apply dominates peer-unreachable per matrix); client sees peer via `ServerUriArray`; fail primary → client reconnects to peer at 80 (isolated-backup band).
|
||||||
|
|
||||||
### Stream D — Apply-window integration (2 days)
|
### Stream D — Apply-window integration (3 days)
|
||||||
|
|
||||||
1. **D.1** `sp_PublishGeneration` caller wraps the apply in `using (coordinator.BeginApplyWindow())`. `BeginApplyWindow` increments an in-process counter; ServiceLevel drops on first increment. Dispose decrements.
|
1. **D.1** `sp_PublishGeneration` caller wraps the apply in `await using var lease = coordinator.BeginApplyLease(generationId, publishRequestId)`. Lease keyed to `(ConfigGenerationId, PublishRequestId)` so concurrent publishes stay isolated. Disposal decrements on every exit path.
|
||||||
2. **D.2** Nested applies handled by the counter (rarely happens but Ignition and Kepware clients have both been observed firing rapid-succession draft publishes).
|
2. **D.2** `ApplyLeaseWatchdog` auto-closes leases older than `ApplyMaxDuration` (default 10 min) so a crashed publisher can't pin the node at mid-apply.
|
||||||
3. **D.3** Test: mid-apply subscribe on primary; assert the subscribing client sees the ServiceLevel drop immediately after the apply starts, then restore after apply completes.
|
3. **D.3** Pre-publish validator in `sp_PublishGeneration` rejects unsupported `RedundancyMode` values (`Transparent`, `HotAndMirrored`) with a clear error message — runtime never sees an invalid mode.
|
||||||
|
4. **D.4** Tests: (a) mid-apply client subscribes → sees ServiceLevel drop → sees restore; (b) lease leak via `ThreadAbort` / cancellation → watchdog closes; (c) publish rejected for `Transparent` → operator-actionable error.
|
||||||
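The lease-plus-watchdog behaviour in D.1 and D.2 can be sketched with a context manager. This is a hedged Python stand-in for the `await using` pattern, not the project's implementation: `ApplyLeaseRegistry`, `begin_apply_lease`, and `sweep_expired` are hypothetical names, the clock is injectable for tests, and the class is not thread-safe as written.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, List, Tuple

LeaseKey = Tuple[int, str]  # (ConfigGenerationId, PublishRequestId)


class ApplyLeaseRegistry:
    def __init__(self, max_duration_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._leases: Dict[LeaseKey, float] = {}
        self._max_duration_s = max_duration_s
        self._clock = clock

    @contextmanager
    def begin_apply_lease(self, generation_id: int, publish_request_id: str):
        key = (generation_id, publish_request_id)
        self._leases[key] = self._clock()
        try:
            yield key
        finally:
            # Runs on success, exception, and cancellation alike.
            self._leases.pop(key, None)

    def sweep_expired(self) -> List[LeaseKey]:
        """Watchdog pass: close any lease older than the maximum duration."""
        now = self._clock()
        expired = [k for k, started in self._leases.items()
                   if now - started > self._max_duration_s]
        for key in expired:
            del self._leases[key]
        return expired

    @property
    def apply_in_progress(self) -> bool:
        return bool(self._leases)
```

Because `apply_in_progress` is derived from the set of open leases, a crashed publisher can hold the mid-apply band only until the next watchdog sweep after the maximum duration.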
|
|
||||||
### Stream E — Admin UI + metrics (3 days)
|
### Stream E — Admin UI + metrics (3 days)
|
||||||
|
|
||||||
1. **E.1** `RedundancyTab.razor` under `/cluster/{id}/redundancy`. Shows each node's role, current ServiceLevel, peer reachability, last apply timestamp. Role-swap button posts a draft edit on `ClusterNode.RedundancyRole`; publish applies.
|
1. **E.1** `RedundancyTab.razor` under `/cluster/{id}/redundancy`. Shows each node's role, current ServiceLevel (with band label per 8-state matrix), peer reachability (HTTP + UA probe separately), last apply timestamp. Role-swap button posts a draft edit on `ClusterNode.RedundancyRole`; publish applies.
|
||||||
2. **E.2** OpenTelemetry meter export: three gauges per the §Scope metrics. Sink via Phase 6.1 observability.
|
2. **E.2** OpenTelemetry meter export: `ot_opcua_service_level{cluster,node}` gauge + `ot_opcua_peer_reachable{cluster,node,peer,kind=http|ua}` + `ot_opcua_apply_in_progress{cluster,node}` + `ot_opcua_topology_valid{cluster}`. Sink via Phase 6.1 observability.
|
||||||
3. **E.3** SignalR push: `FleetStatusHub` broadcasts ServiceLevel changes so the Admin UI updates within ~1 s of the coordinator observing a peer flip.
|
3. **E.3** SignalR push: `FleetStatusHub` broadcasts ServiceLevel changes so the Admin UI updates within ~1 s of the coordinator observing a peer flip.
|
||||||
|
|
||||||
|
### Stream F — Client-interoperability matrix (3 days, new)
|
||||||
|
|
||||||
|
1. **F.1** Validate ServiceLevel-driven cutover against **Ignition 8.1 + 8.3**, **Kepware KEPServerEX 6.x**, **Aveva OI Gateway 2020R2 + 2023R1**. For each: configure the client with both endpoints, verify it honors `ServiceLevel` + `ServerUriArray` during primary failover.
|
||||||
|
2. **F.2** Clients that don't honour the standards (doc field — may include Kepware and OI Gateway per Codex review) get an explicit compatibility-matrix entry: "requires manual backup-endpoint config / vendor-specific redundancy primitives". Documented in `docs/Redundancy.md`.
|
||||||
|
3. **F.3** Galaxy MXAccess failover test — boot Galaxy.Proxy on both nodes, kill Primary, assert Galaxy consumer reconnects to Backup within `(SessionTimeout + KeepAliveInterval × 3)`. Document required session-timeout config in `docs/Redundancy.md`.
|
||||||
|
|
||||||
## Compliance Checks (run at exit gate)
|
## Compliance Checks (run at exit gate)
|
||||||
|
|
||||||
- [ ] **Primary-healthy** ServiceLevel = 255.
|
- [ ] **OPC UA band compliance**: `0=Maintenance` reserved, `1=NoData` reserved. Operational states in 2..255 per 8-state matrix.
|
||||||
- [ ] **Backup-healthy** ServiceLevel = 100.
|
- [ ] **Authoritative-Primary** ServiceLevel = 255.
|
||||||
- [ ] **Mid-apply Primary** ServiceLevel = 200 — verified via Client.CLI subscription polling ServiceLevel during a forced draft publish.
|
- [ ] **Isolated-Primary** (peer unreachable, self serving) = 230 — Primary retains authority.
|
||||||
- [ ] **Peer-unreachable** handling: when a Primary can't probe its Backup's `/healthz`, Primary still serves at 255 (peer is the one with the problem). When a Backup can't probe Primary, Backup flips to 200 (per decision #81 — a lonely Backup promotes its advertised level to signal "I'll take over if you ask" without auto-promoting).
|
- [ ] **Primary-Mid-Apply** = 200.
|
||||||
- [ ] **Role transition via operator publish**: FleetAdmin swaps `RedundancyRole` rows in a draft, publishes; both nodes re-read topology on publish confirmation and flip ServiceLevel accordingly — no restart needed.
|
- [ ] **Recovering-Primary** = 180 with dwell + publish witness enforced.
|
||||||
- [ ] **ServerUriArray** returns exactly the peer node's ApplicationUri.
|
- [ ] **Authoritative-Backup** = 100.
|
||||||
- [ ] **Client.CLI cutover**: with a primary deliberately halted, a client that was connected to primary reconnects to the backup within the ServiceLevel-polling interval.
|
- [ ] **Isolated-Backup** (primary unreachable) = 80 — does NOT auto-promote.
|
||||||
|
- [ ] **InvalidTopology** = 2: both nodes self-demote when >1 Primary is detected at runtime.
|
||||||
|
- [ ] **ServerUriArray** returns self + peer URIs, self first.
|
||||||
|
- [ ] **UaHealthProbe authority**: integration test — peer returns HTTP 200 but OPC UA endpoint unreachable → coordinator treats peer as UA-unhealthy; peer is not a valid authority source.
|
||||||
|
- [ ] **Apply-lease disposal**: leases close on exception, cancellation, and watchdog timeout; ServiceLevel never sticks at mid-apply band.
|
||||||
|
- [ ] **Transparent-mode rejection**: attempting to publish `RedundancyMode=Transparent` is blocked at `sp_PublishGeneration`; runtime never sees an invalid mode.
|
||||||
|
- [ ] **Role transition via operator publish**: FleetAdmin swaps `RedundancyRole` in a draft, publishes; both nodes re-read topology on publish confirmation + flip ServiceLevel — no restart.
|
||||||
|
- [ ] **Client.CLI cutover**: with primary halted, Client.CLI that was connected to primary sees primary drop + reconnects to backup via `ServerUriArray`.
|
||||||
|
- [ ] **Client interoperability matrix** (Stream F): Ignition 8.1 + 8.3 honour ServiceLevel; Kepware + Aveva OI Gateway findings documented.
|
||||||
|
- [ ] **Galaxy MXAccess failover**: end-to-end test — primary kill → Galaxy consumer reconnects to backup within session-timeout budget.
|
||||||
- [ ] No regression in existing driver test suites; no regression in `/healthz` reachability under redundancy load.
|
- [ ] No regression in existing driver test suites; no regression in `/healthz` reachability under redundancy load.
|
||||||
|
|
||||||
## Risks and Mitigations
|
## Risks and Mitigations
|
||||||
|
|||||||
@@ -22,13 +22,14 @@ Gaps to close:
|
|||||||
|
|
||||||
| Concern | Change |
|
| Concern | Change |
|
||||||
|---------|--------|
|
|---------|--------|
|
||||||
| `Admin/Pages/UnsTab.razor` | Rewrite as a tree component with drag-drop (Blazor-native HTML5 DnD; no third-party dep). Each drag fires a "Compute Impact" call against the draft-generation state + renders a modal preview ("Moving Line 'Oven-2' from 'Packaging' to 'Assembly' will re-home 14 equipment + re-parent 237 tags"). Confirmation commits the draft edit. |
|
| `Admin/Pages/UnsTab.razor` | Tree component with drag-drop using **`MudBlazor.TreeView` + `MudBlazor.DropTarget`** (existing transitive dep — no new third-party package). Native HTML5 DnD rejected because virtualization + DnD on 500+ nodes doesn't combine reliably. Each drag fires a "Compute Impact" call carrying a `DraftRevisionToken`; modal preview ("Moving Line 'Oven-2' from 'Packaging' to 'Assembly' will re-home 14 equipment + re-parent 237 tags"). **Confirm step re-checks the token** and rejects with a `409 Conflict / refresh-required` modal if the draft advanced between preview and commit. |
|
||||||
| `Admin/Services/UnsImpactAnalyzer.cs` | New service. Given a move-operation (line move, area rename, line merge), computes cascade counts by walking the draft-generation `Equipment` + `Tag` tables. Pure-function shape; testable in isolation. |
|
| `Admin/Services/UnsImpactAnalyzer.cs` | New service. Given a move-operation (line move, area rename, line merge), computes cascade counts + `DraftRevisionToken` at preview time. Pure-function shape; testable in isolation. |
|
||||||
| `Admin/Pages/EquipmentTab.razor` | Add CSV-import button → modal with file picker + dry-run preview. Add multi-identifier search bar (ZTag / SAPID / UniqueId / Alias1 / Alias2) per decision #95 — parses any of the five, shows matches across draft + published generations. |
|
| `Admin/Pages/EquipmentTab.razor` | Add CSV-import button → modal with file picker + dry-run preview. **Identifier search** uses the canonical decision #117 set: `ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid`. Typeahead probes each column with a ranking query (exact match score 100 → prefix 50 → opt-in LIKE 20; published > draft tie-break). Result row shows which field matched via trailing badge. |
|
||||||
| `Admin/Services/EquipmentCsvImporter.cs` | New service. Parses CSV with documented header row; validates each row against the `Equipment` schema (required fields + `ExternalIdReservation` freshness); returns `ImportPreview` DTO with per-row accept/reject + reason; commit step wraps in a single EF transaction. |
|
| `Admin/Services/EquipmentCsvImporter.cs` | New service. CSV header row must start with `# OtOpcUaCsv v1` (version marker — future shape changes bump the version). Columns: `ZTag, MachineCode, SAPID, EquipmentId, EquipmentUuid, Name, UnsAreaName, UnsLineName, Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. Parser rejects unknown columns + blank required fields + duplicate ZTags + missing UnsLines. |
|
||||||
| `Admin/Pages/DraftEditor.razor` + `DiffViewer.razor` | Diff viewer expanded: adds sections for ACL grants (from Phase 6.2 `LdapGroupRoleMapping` + `NodeAcl`), redundancy-role changes (from Phase 6.3), equipment-class `_base` Identification fields. Render each section collapsible. |
|
| **Staged-import table** `EquipmentImportBatch` | New entity `{ Id, CreatedAtUtc, CreatedBy, RowsStaged, RowsAccepted, RowsRejected, FinalisedAtUtc? }` + child `EquipmentImportRow` records. Import writes rows in chunks to the staging table (not to `Equipment`). `FinaliseImportBatch` is the atomic finalize step that applies all accepted rows to `Equipment` + `ExternalIdReservation` in one transaction — short + bounded regardless of input size. Rollback = drop the batch row; `Equipment` never partially mutates. |
|
||||||
| `Admin/Components/IdentificationFields.razor` | New component. Renders the OPC 40010 nullable columns (Manufacturer, Model, SerialNumber, ProductInstanceUri, HardwareRevision, SoftwareRevision, DeviceRevision, YearOfConstruction, MonthOfConstruction) as a labelled field group on the `EquipmentTab` detail view. |
|
| `Admin/Pages/DraftEditor.razor` + `DiffViewer.razor` | Diff viewer refactored into a base component + section plugins: `StructuralDiffSection`, `EquipmentDiffSection`, `TagDiffSection`, `AclDiffSection` (Phase 6.2), `RedundancyDiffSection` (Phase 6.3), `IdentificationDiffSection`. Each section has a **1000-row hard cap**; over-cap renders an aggregate summary + "Load full diff" button streaming 500-row pages via SignalR. Subtree-rename diffs (decision #115 bulk restructure) surface as summary only by default. |
|
||||||
| `OtOpcUa.Server/OpcUa/DriverNodeManager` — Equipment folder build | When an `Equipment` row has non-null Identification fields, the server adds an `Identification` sub-folder under the Equipment node containing one variable per non-null field. Matches OPC 40010 companion spec. |
|
| `Admin/Components/IdentificationFields.razor` | New component. Renders the OPC 40010 field set **per decision #139**: `Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. `ProductInstanceUri / DeviceRevision / MonthOfConstruction` dropped from this phase — they need a separate decision-log widening. |
|
||||||
|
| `OtOpcUa.Server/OpcUa/DriverNodeManager` — Equipment folder build | When an `Equipment` row has non-null Identification fields, the server adds an `Identification` sub-folder under the Equipment node containing one variable per non-null field. **ACL binding**: the sub-folder + variables inherit the `Equipment` scope's grants from Phase 6.2's trie — no new scope level added. Documented in `acl-design.md` cross-reference update. |
|
||||||
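The parser rules for `EquipmentCsvImporter` can be sketched as below. This Python illustration makes several assumptions: the version marker sits on its own first line with the column header on the line after it, `parse_equipment_csv` is a hypothetical name, and the "missing UnsLines" check is omitted because it needs the draft UNS tree.

```python
import csv
import io
from typing import Dict, List, Tuple

VERSION_MARKER = "# OtOpcUaCsv v1"
REQUIRED = ["ZTag", "MachineCode", "SAPID", "EquipmentId", "EquipmentUuid",
            "Name", "UnsAreaName", "UnsLineName"]
OPTIONAL = ["Manufacturer", "Model", "SerialNumber", "HardwareRevision",
            "SoftwareRevision", "YearOfConstruction", "AssetLocation",
            "ManufacturerUri", "DeviceManualUri"]


def parse_equipment_csv(text: str) -> Tuple[List[Dict[str, str]], List[Tuple[int, str]]]:
    """Return (accepted rows, rejected (row number, reason) pairs)."""
    first, _, rest = text.partition("\n")
    if first.strip() != VERSION_MARKER:
        raise ValueError(f"first line must be {VERSION_MARKER!r}")

    reader = csv.DictReader(io.StringIO(rest, newline=""))
    header = reader.fieldnames or []
    unknown = [c for c in header if c not in REQUIRED + OPTIONAL]
    missing = [c for c in REQUIRED if c not in header]
    if unknown or missing:
        raise ValueError(f"unknown columns {unknown}, missing columns {missing}")

    accepted, rejected, seen_ztags = [], [], set()
    for row_no, row in enumerate(reader, start=1):
        blank = [c for c in REQUIRED if not (row.get(c) or "").strip()]
        if blank:
            rejected.append((row_no, "blank required field(s): " + ", ".join(blank)))
        elif row["ZTag"] in seen_ztags:
            rejected.append((row_no, "duplicate ZTag"))
        else:
            seen_ztags.add(row["ZTag"])
            accepted.append(row)
    return accepted, rejected
```

Header problems abort the whole file, while row-level problems are collected so a dry-run preview can show per-row reasons.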
|
|
||||||
## Scope — What Does NOT Change
|
## Scope — What Does NOT Change
|
||||||
|
|
||||||
@@ -51,40 +52,53 @@ Gaps to close:
|
|||||||
|
|
||||||
## Task Breakdown
|
## Task Breakdown
|
||||||
|
|
||||||
### Stream A — UNS drag/reorder + impact preview (4 days)
|
### Stream A — UNS drag/reorder + impact preview (5 days)
|
||||||
|
|
||||||
1. **A.1** `UnsImpactAnalyzer` service. Inputs: `(DraftGenerationId, MoveOperation)`. Outputs: `ImpactPreview { AffectedEquipmentCount, AffectedTagCount, CascadeWarnings[] }`. Unit tests cover line move / area rename / line merge.
|
1. **A.1** 1000-node synthetic seed fixture. Drag-latency bench against `MudBlazor.TreeView` + `MudBlazor.DropTarget` — commit to the component if latency budget (100 ms drag-enter feedback) holds; fall back to flat-list reorder UI (Area/Line dropdowns) with loss of visual drag affordance otherwise.
|
||||||
2. **A.2** HTML5 DnD on a tree component. No JS interop beyond `ondragstart`/`ondragover`/`ondrop` — keeps build + testability simple.
|
2. **A.2** `UnsImpactAnalyzer` service. Inputs: `(DraftGenerationId, MoveOperation, DraftRevisionToken)`. Outputs: `ImpactPreview { AffectedEquipmentCount, AffectedTagCount, CascadeWarnings[], DraftRevisionToken }`. Pure-function shape; testable in isolation.
|
||||||
3. **A.3** Modal preview wired to `UnsImpactAnalyzer` output; "Confirm" commits a draft edit via `DraftService`.
|
3. **A.3** Modal preview wired to `UnsImpactAnalyzer`. **Confirm** re-reads the current draft revision + compares against the preview's token; if the draft advanced (another operator saved a different edit), show a `409 Conflict / refresh-required` modal rather than silently overwriting.
|
||||||
4. **A.4** Playwright smoke test (or equivalent): drag a line across areas, assert modal shows the right counts, assert draft row reflects the move.
|
4. **A.4** Cross-cluster drop attempts: target disabled + toast "Equipment is cluster-scoped (decision #82). To move across clusters, use Export → Import on the Cluster detail page." Plus help link.
|
||||||
|
5. **A.5** Playwright (or equivalent) smoke test: drag a line across areas, assert modal shows right counts, assert draft row reflects the move; concurrent-edit test runs two sessions + asserts the later Confirm hits the 409.
|
||||||
|
|
||||||
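The preview-token check that A.2 and A.3 describe can be sketched roughly as follows. This is an illustration only, written in Python rather than the project's .NET code; `Draft`, `preview_move`, `confirm_move`, and the integer token are hypothetical names and shapes, not taken from the repository.

```python
from dataclasses import dataclass


class DraftConflict(Exception):
    """Raised when the draft advanced between preview and Confirm (the 409 case)."""


@dataclass
class Draft:
    revision_token: int   # hypothetical: a counter bumped on every saved edit
    line_to_area: dict    # hypothetical draft state: line name -> area name


@dataclass(frozen=True)
class ImpactPreview:
    line: str
    target_area: str
    draft_revision_token: int  # token captured when the preview was computed


def preview_move(draft, line, target_area):
    # The preview records which draft revision it was computed against.
    return ImpactPreview(line, target_area, draft.revision_token)


def confirm_move(draft, preview):
    # Re-read the current revision and compare it with the preview's token.
    if draft.revision_token != preview.draft_revision_token:
        raise DraftConflict("409 Conflict / refresh-required")
    draft.line_to_area[preview.line] = preview.target_area
    draft.revision_token += 1
```

With two previews taken against the same revision, whichever Confirm arrives second raises `DraftConflict` and leaves the first operator's edit intact, which is the behaviour the A.5 concurrent-edit test asserts.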
-### Stream B — Equipment CSV import + 5-identifier search (4 days)
+### Stream B — Equipment CSV import + 5-identifier search (5 days)

-1. **B.1** `EquipmentCsvImporter` with a documented header row (`ZTag, SAPID, UniqueId, Alias1, Alias2, Name, UnsAreaName, UnsLineName, Manufacturer, Model, SerialNumber, …`). Parser rejects unknown columns + blank required fields + duplicate ZTags.
+1. **B.1** `EquipmentCsvImporter`. Strict RFC 4180 parser (per decision #95). Header row validation: first line must match `# OtOpcUaCsv v1` — future versions fork parser versions. Required columns: `ZTag, MachineCode, SAPID, EquipmentId, EquipmentUuid, Name, UnsAreaName, UnsLineName`. Optional: `Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. Parser rejects unknown columns + blank required fields + duplicate ZTags.
-2. **B.2** `ImportPreview` UI: per-row accept/reject table. Reject reasons: "ZTag already exists in draft", "ExternalIdReservation conflict with Cluster X", "UnsLineName not found in draft UNS tree", etc. Operator reviews then clicks "Commit" → single EF transaction.
+2. **B.2** `EquipmentImportBatch` + `EquipmentImportRow` staging tables (migration). Import writes preview rows to staging via chunked inserts; staging never blocks `Equipment` or `ExternalIdReservation`. Preview query reads staging + validates each row against the current `Equipment` state + `ExternalIdReservation` freshness.
-3. **B.3** Multi-identifier search — bar accepts any of the 5 identifiers, probes each column in parallel, returns first-match-wins + disambiguation list if multiple match.
+3. **B.3** `ImportPreview` UI — per-row accept/reject table. Reject reasons: "ZTag already exists in draft", "ExternalIdReservation conflict with Cluster X", "UnsLineName not found in draft UNS tree", etc. Operator reviews + clicks "Commit".
-4. **B.4** Smoke tests: 100-row CSV with 10 intentional conflicts (5 ZTag dupes, 3 reservation clashes, 2 missing UnsLines); assert preview flags each; assert commit rolls back cleanly when a conflict surfaces post-preview.
+4. **B.4** `FinaliseImportBatch` — atomic finalize. One EF transaction applies accepted rows to `Equipment` + `ExternalIdReservation`; duration bounded regardless of input size (the atomic step is a bulk-insert, not per-row row-by-row). Rollback = drop batch row via `DropImportBatch`; `Equipment` never partially mutates.
+5. **B.5** Five-identifier search. Rank SQL: exact match any identifier = score 100, prefix match = 50, LIKE-fuzzy (opt-in via `?fuzzy=true`) = 20; tie-break `published > draft` then `RowVersion DESC`. Typeahead shows which field matched via trailing badge.
+6. **B.6** Smoke tests: 100-row CSV with 10 conflicts (5 ZTag dupes, 3 reservation clashes, 2 missing UnsLines); 10k-row perf test asserting finalize txn < 30 s; concurrent import + external `ExternalIdReservation` insert test asserts retryable-conflict handling.

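The B.5 ranking rule (exact = 100, prefix = 50, opt-in fuzzy = 20, then `published > draft`, then `RowVersion DESC`) can be illustrated in memory as below. The plan specifies ranking in SQL; this Python sketch only mirrors the ordering. The row dictionaries, the `Published` key, the case-insensitive comparison, and treating "fuzzy" as a substring test are all assumptions made for the example.

```python
# Canonical identifier columns per decision #117.
IDENTIFIER_FIELDS = ("ZTag", "MachineCode", "SAPID", "EquipmentId", "EquipmentUuid")


def score_row(row, query, fuzzy=False):
    """Return (score, matched_field) for the best-matching identifier, or (0, None)."""
    q = query.lower()
    if not q:
        return (0, None)
    best = (0, None)
    for field in IDENTIFIER_FIELDS:
        value = str(row.get(field) or "").lower()
        if not value:
            continue
        if value == q:
            score = 100          # exact match on any identifier
        elif value.startswith(q):
            score = 50           # prefix match
        elif fuzzy and q in value:
            score = 20           # opt-in fuzzy match (substring stands in for LIKE)
        else:
            score = 0
        if score > best[0]:
            best = (score, field)
    return best


def search(rows, query, fuzzy=False):
    """Return (score, matched_field, row) tuples, best match first."""
    hits = []
    for row in rows:
        score, field = score_row(row, query, fuzzy)
        if score > 0:
            hits.append((score, field, row))
    # Higher score first; published before draft; then RowVersion descending.
    hits.sort(key=lambda h: (-h[0],
                             0 if h[2].get("Published") else 1,
                             -h[2].get("RowVersion", 0)))
    return hits
```

The `matched_field` element is what a typeahead could render as the trailing badge mentioned in B.5.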
-### Stream C — Diff viewer enhancements (3 days)
+### Stream C — Diff viewer enhancements (4 days)

-1. **C.1** Refactor `DiffViewer.razor` into a base component + section plugins. Section plugins: `StructuralDiffSection` (UNS tree), `EquipmentDiffSection` (Equipment rows), `TagDiffSection` (Tag rows), `AclDiffSection` (ACL grants — depends on Phase 6.2), `RedundancyDiffSection` (role changes — depends on Phase 6.3), `IdentificationDiffSection` (OPC 40010 fields).
+1. **C.1** Refactor `DiffViewer.razor` into a base component + section plugins. Plugins: `StructuralDiffSection` (UNS tree), `EquipmentDiffSection`, `TagDiffSection`, `AclDiffSection` (Phase 6.2), `RedundancyDiffSection` (Phase 6.3), `IdentificationDiffSection`.
-2. **C.2** Each section renders collapsed by default; counts + top-line summary always visible.
+2. **C.2** Each section renders collapsed by default; counts + top-line summary always visible. **1000-row hard cap** per section — over-cap sections render aggregate summary (e.g. "237 equipment re-parented from Packaging to Assembly") with a "Load full diff" button that streams 500-row pages via SignalR.
-3. **C.3** Tests: seed two generations with deliberate diffs, assert every section reports the right counts + top-line summary.
+3. **C.3** Subtree-rename diffs (decision #115 bulk restructure) surface as summary only by default regardless of row count.
+4. **C.4** Tests: seed two generations with deliberate diffs; assert every section reports the right counts + top-line summary + hard-cap behavior.

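A rough sketch of the C.2/C.3 capping rule, assuming a section is just a list of diff rows plus a precomputed summary string. `render_section` and `stream_pages` are hypothetical helper names; the real component is a Blazor section plugin that streams pages over SignalR, which this Python illustration does not model.

```python
def render_section(rows, summary, cap=1000, always_summary=False):
    """Decide what a diff section shows: the rows, or only the aggregate summary."""
    # Subtree-rename sections (C.3) pass always_summary=True regardless of size.
    if always_summary or len(rows) > cap:
        return {"mode": "summary", "summary": summary, "total": len(rows)}
    return {"mode": "rows", "rows": list(rows)}


def stream_pages(rows, page_size=500):
    """Yield the full diff in fixed-size pages, as a "Load full diff" action might."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]
```

A section with exactly 1000 rows still renders its rows in this sketch; only sections strictly over the cap fall back to the summary.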
 ### Stream D — OPC 40010 Identification exposure (3 days)

-1. **D.1** `IdentificationFields.razor` component — labelled inputs; nullable columns show empty input; required field validation only on commit.
+1. **D.1** `IdentificationFields.razor` component. Renders the **9 decision #139 fields**: `Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. Labelled inputs; nullable columns show empty input; required-field validation on commit only.
-2. **D.2** `DriverNodeManager` equipment-folder builder — after building the equipment node, inspect the Identification columns; if any non-null, add an `Identification` sub-folder with variable-per-field.
+2. **D.2** `DriverNodeManager` equipment-folder builder — after building the equipment node, inspect the 9 Identification columns; if any non-null, add an `Identification` sub-folder with variable-per-non-null-field. ACL binding: sub-folder + variables inherit the **same `ScopeId` as the Equipment node** (Phase 6.2's trie treats them as part of the Equipment scope — no new scope level).
-3. **D.3** Address-space smoke test via Client.CLI: browse an equipment node, assert `Identification` sub-folder present when columns are set, absent when all null.
+3. **D.3** Address-space smoke test via Client.CLI: browse an equipment node, assert `Identification` sub-folder present when columns are set, absent when all null, variables match the field values.
+4. **D.4** ACL integration test: a user with Equipment-level grant reads the `Identification` variables without needing a separate grant; a user without the Equipment grant gets `BadUserAccessDenied` on both the Equipment node + its Identification variables.

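The D.2 folder-building rule (one variable per non-null field, no folder when all nine are null, same `ScopeId` as the Equipment node) might look like this in outline. The dictionary shapes and the `build_identification_folder` name are assumptions made for this Python illustration; the actual builder lives in `DriverNodeManager`.

```python
# The nine fields listed in D.1 (decision #139).
IDENTIFICATION_FIELDS = (
    "Manufacturer", "Model", "SerialNumber", "HardwareRevision", "SoftwareRevision",
    "YearOfConstruction", "AssetLocation", "ManufacturerUri", "DeviceManualUri",
)


def build_identification_folder(equipment):
    """Return a folder description, or None when every Identification field is null."""
    variables = {
        name: equipment[name]
        for name in IDENTIFICATION_FIELDS
        if equipment.get(name) is not None
    }
    if not variables:
        return None  # folder is absent when all columns are null
    return {
        "name": "Identification",
        # Same scope as the Equipment node: no new ACL scope level is introduced.
        "scope_id": equipment["ScopeId"],
        "variables": variables,
    }
```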
 ## Compliance Checks (run at exit gate)

 - [ ] **UNS drag/move**: drag a line across areas; modal preview shows correct impacted-equipment + impacted-tag counts.
-- [ ] **Equipment CSV**: 100-row CSV with 10 conflicts imports cleanly (preview flags each, commit rolls back mid-conflict).
+- [ ] **Concurrent-edit safety**: two-session test — session B saves a draft edit after session A opened the preview; session A's Confirm returns `409 Conflict / refresh-required` instead of overwriting.
-- [ ] **5-identifier search**: querying any of the 5 IDs returns the matching row; ambiguous searches list options.
+- [ ] **Cross-cluster drop**: dropping equipment across cluster boundaries is disabled + shows actionable toast pointing to Export/Import workflow.
-- [ ] **Diff viewer**: every section renders for a 2-generation diff with deliberate changes in every category.
+- [ ] **1000-node tree**: drag operations on a 1000-node seed maintain < 100 ms drag-enter feedback.
-- [ ] **OPC 40010 exposure**: Client.CLI browse shows `Identification` sub-folder when equipment has non-null columns; folder absent when all null.
+- [ ] **CSV header version**: file missing `# OtOpcUaCsv v1` first line is rejected pre-parse.
-- [ ] **ScadaLink visual parity**: operator-equivalence reviewer signs off that the new tabs feel consistent with existing Admin UI pages.
+- [ ] **CSV canonical identifier set**: columns match decision #117 (ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid); drift from the earlier draft surfaces as a test failure.
+- [ ] **Staged-import atomicity**: `FinaliseImportBatch` transaction bounded < 30 s for a 10k-row import; pre-finalize stagings visible only to the importing user; rollback via `DropImportBatch`.
+- [ ] **Concurrent import + external reservation**: concurrent test — third party inserts to `ExternalIdReservation` mid-finalize; finalize retries with conflict handling; no corruption.
+- [ ] **5-identifier search ranking**: exact matches outrank prefix matches; published outranks draft for equal scores.
+- [ ] **Diff viewer section caps**: 2000-row subtree-rename diff renders as summary only; "Load full diff" streams in pages.
+- [ ] **OPC 40010 field list match**: rendered field group matches decision #139 exactly; no extra fields.
+- [ ] **OPC 40010 exposure**: Client.CLI browse shows `Identification` sub-folder when equipment has non-null columns; absent when all null.
+- [ ] **ACL inheritance for Identification**: integration test — Equipment-grant user reads Identification; no-grant user gets `BadUserAccessDenied` on both.
+- [ ] **Visual parity reviewer**: named role (`FleetAdmin` user, not the implementation lead) compares side-by-side against `admin-ui.md` §Visual-Design reference panels; signoff artefact is a checked-in screenshot set under `docs/v2/visual-compliance/phase-6-4/`.

 ## Risks and Mitigations

@@ -909,6 +909,26 @@ Each step leaves the system runnable. The generic extraction is effectively free
 | 140 | Enterprise shortname = `zb` (UNS level-1 segment) | Closes corrections-doc D4. Matches the existing `ZB.MOM.WW.*` namespace prefix used throughout the codebase; short by design since this segment appears in every equipment path (`zb/warsaw-west/bldg-3/line-2/cnc-mill-05/RunState`); operators already say "ZB" colloquially. Admin UI cluster-create form default-prefills `zb` for the Enterprise field. Production deployments use it directly from cluster-create | 2026-04-17 |
 | 141 | Tier 3 (AppServer IO) cutover is feasible — AVEVA's OI Gateway supports arbitrary upstream OPC UA servers as a documented pattern | Closes corrections-doc E2 with **GREEN-YELLOW** verdict. Multiple AVEVA partners (Software Toolbox, InSource) have published working integrations against four different non-AVEVA upstream servers (TOP Server, OPC Router, OmniServer, Cogent DataHub). No re-architecting of OtOpcUa required. Path: `OPC UA node → OI Gateway → SuiteLink → $DDESuiteLinkDIObject → AppServer attribute`. Recommended AppServer floor: System Platform 2023 R2 Patch 01. Two integrator-burden risks tracked: validation/GxP paperwork (no AVEVA blueprint exists for non-AVEVA upstream servers in Part 11 deployments) and unpublished scale benchmarks (in-house benchmark required before cutover scheduling). See `aveva-system-platform-io-research.md` | 2026-04-17 |
 | 142 | Phase 1 acceptance includes an end-to-end AppServer-via-OI-Gateway smoke test against OtOpcUa | Catches AppServer-specific quirks (cert exchange via reject-and-trust workflow, endpoint URL must NOT include `/discovery` suffix per Inductive Automation forum failure mode, service-account install required because OI Gateway under SYSTEM cannot connect to remote OPC servers, `Basic256Sha256` + `SignAndEncrypt` + LDAP-username token combination must work end-to-end) early — well before the Year 3 tier-3 cutover schedule. Adds one task to `phase-1-configuration-and-admin-scaffold.md` Stream E (Admin smoke test) | 2026-04-17 |
+| 143 | Polly per-capability policy — Read / HistoryRead / Discover / Probe / Alarm-subscribe auto-retry; Write does NOT auto-retry unless the tag metadata carries `[WriteIdempotent]` | Decisions #44-45 forbid auto-retry on Write because a timed-out write can succeed on the device + be replayed by the pipeline, duplicating pulses / alarm acks / counter increments / recipe-step advances. Per-capability policy in the shared Polly layer makes the retry safety story explicit; `WriteIdempotentAttribute` on tag definitions is the opt-in surface | 2026-04-19 |
+| 144 | Polly pipeline key = `(DriverInstanceId, HostName)`, not DriverInstanceId alone | Decision #35 requires per-device isolation. One dead PLC behind a multi-device Modbus driver must NOT open the circuit breaker for healthy sibling hosts. Per-instance pipelines would poison every device behind one bad endpoint | 2026-04-19 |
+| 145 | Tier A/B/C runtime enforcement splits into `MemoryTracking` (all tiers — soft/hard thresholds log + surface, NEVER kill) and `MemoryRecycle` (Tier C only — requires out-of-process topology). Tier A/B hard-breach logs a promotion-to-Tier-C recommendation; the runtime never auto-kills an in-process driver | Decisions #73-74 reserve process-kill protections for Tier C. An in-process Tier A/B "recycle" would kill every OPC UA session + every other in-proc driver for one leaky instance, blast-radius worse than the leak | 2026-04-19 |
+| 146 | Memory watchdog uses the hybrid formula `soft = max(multiplier × baseline, baseline + floor)`, with baseline captured as the median of the first 5 min of `GetMemoryFootprint()` samples post-InitializeAsync. Tier-specific constants: A multiplier=3 floor=50 MB, B multiplier=3 floor=100 MB, C multiplier=2 floor=500 MB. Hard = 2 × soft | Codex adversarial review on the Phase 6.1 plan flagged that hardcoded per-tier MB bands diverge from decision #70's specified formula. Static bands false-trigger on small-footprint drivers + miss meaningful growth on large ones. Observed-baseline + hybrid formula recovers the original intent | 2026-04-19 |
+| 147 | `WedgeDetector` uses demand-aware criteria `(state==Healthy AND hasPendingWork AND noProgressIn > threshold)`. `hasPendingWork` = (Polly bulkhead depth > 0) OR (active MonitoredItem count > 0) OR (queued historian read count > 0). Idle + subscription-only + write-only-burst drivers stay Healthy without false-fault | Previous "no successful Read in N intervals" formulation flipped legitimate idle subscribers, slow historian backfills, and write-heavy drivers to Faulted. The demand-aware check only fires when the driver claims work is outstanding | 2026-04-19 |
+| 148 | LiteDB config cache is **generation-sealed**: `sp_PublishGeneration` writes `<cache-root>/<cluster>/<generationId>.db` as a read-only sealed file; cache reads serve the last-known-sealed generation. Mixed-generation reads are impossible | Prior "refresh on every successful query" cache could serve LDAP role mapping from one generation alongside UNS topology from another, producing impossible states. Sealed-snapshot invariant keeps cache-served reads coherent with a real published state | 2026-04-19 |
+| 149 | `AuthorizationDecision { Allow \| NotGranted \| Denied, IReadOnlyList<MatchedGrant> Provenance }` — tri-state internal model. Phase 6.2 only produces `Allow` + `NotGranted` (grant-only semantics per decision #129); v2.1 Deny widens without API break | bool return would collapse `no-matching-grant` and `explicit-deny` into the same runtime state + UI explanation; provenance record is needed for the audit log anyway. Making the shape tri-state from Phase 6.2 avoids a breaking change in v2.1 | 2026-04-19 |
+| 150 | Data-plane ACL evaluator consumes `NodeAcl` rows joined against the session's resolved LDAP group memberships. `LdapGroupRoleMapping` (decision #105) is control-plane only — routes LDAP groups to Admin UI roles. Zero runtime overlap between the two | Codex adversarial review flagged that Phase 6.2 draft conflated the two — building the data-plane trie from `LdapGroupRoleMapping` would let a user inherit tag permissions from an admin-role claim path never intended as a data-path grant | 2026-04-19 |
+| 151 | `UserAuthorizationState` cached per session but bounded by `MembershipFreshnessInterval` (default 15 min). Past that interval the next hot-path authz call re-resolves LDAP group memberships; failure to re-resolve (LDAP unreachable) → fail-closed (evaluator returns `NotGranted` until memberships refresh successfully) | Previous design cached memberships until session close, so a user removed from a privileged LDAP group could keep authorized access for hours. Bounded freshness + fail-closed covers the revoke-takes-effect story | 2026-04-19 |
+| 152 | Auth cache has its own staleness budget `AuthCacheMaxStaleness` (default 5 min), independent of decision #36's availability-oriented config cache (24 h). Past 5 min on authorization data, evaluator fails closed regardless of whether the underlying config is still serving from cache | Availability-oriented caches trade correctness for uptime. Authorization data is correctness-sensitive — stale ACLs silently extend revoked access. Auth-specific budget keeps the two concerns from colliding | 2026-04-19 |
+| 153 | MonitoredItem carries `(AuthGenerationId, MembershipVersion)` stamp at create time. On every Publish, items with a mismatching stamp re-evaluate; unchanged items stay fast-path. Revoked items drop to `BadUserAccessDenied` within one publish cycle | Create-time-only authorization leaves revoked users receiving data forever; per-publish re-authorization at 100 ms cadence across 50 groups × 6 levels is too expensive. Stamp-then-reevaluate-on-change balances correctness with cost | 2026-04-19 |
+| 154 | ServiceLevel reserves `0` for operator-declared maintenance only; `1` = NoData (unreachable / Faulted); operational states occupy `2..255` in an 8-state matrix (Authoritative-Primary=255, Isolated-Primary=230, Primary-Mid-Apply=200, Recovering-Primary=180, Authoritative-Backup=100, Isolated-Backup=80, Backup-Mid-Apply=50, Recovering-Backup=30, InvalidTopology=2) | OPC UA Part 5 §6.3.34 defines `0=Maintenance` + `1=NoData`; using `0` for our Faulted case collides with spec + triggers spec-compliant clients to enter maintenance-mode cutover. Expanded 8-state matrix covers operational states the 5-state original collapsed together (e.g. Isolated-Primary vs Primary-Mid-Apply were both 200) | 2026-04-19 |
+| 155 | `ServerUriArray` includes self + peers (self first, deterministic ordering), per OPC UA Part 4 §6.6.2.2 | Previous design excluded self from the array — spec violation + clients lose the ability to map server identities consistently during failover | 2026-04-19 |
+| 156 | Redundancy peer health uses a two-layer probe: `/healthz` (2 s) as fast-fail + `UaHealthProbe` (10 s, opens OPC UA client session to peer + reads its `ServiceLevel` node) as the authority signal. HTTP-healthy ≠ UA-authoritative | `/healthz` returns 200 whenever HTTP + config DB/cache is healthy — but a peer can be HTTP-healthy with a broken OPC UA endpoint or a stuck subscription publisher. Using HTTP alone would advertise authority against servers that can't actually publish data | 2026-04-19 |
+| 157 | Publish-generation fencing — coordinator CAS on a monotonic `ConfigGenerationId`; every topology + role decision is generation-stamped; peers reject state propagated from a lower generation. Runtime `InvalidTopology` state (both self-demote to ServiceLevel 2) when >1 Primary detected post-startup | Operator race publishing two drafts with different roles can produce two locally-valid views; without fencing + runtime containment both nodes can serve as Primary until manual intervention | 2026-04-19 |
+| 158 | Apply-window uses named leases keyed to `(ConfigGenerationId, PublishRequestId)` via `await using`. `ApplyLeaseWatchdog` auto-closes any lease older than `ApplyMaxDuration` (default 10 min) | Simple `IDisposable`-counter design leaks on cancellation / async-ownership races; a stuck positive count leaves the node permanently mid-apply. Generation-keyed leases + watchdog bound worst case | 2026-04-19 |
+| 159 | CSV import header row must start with `# OtOpcUaCsv v1` (version marker). Future shape changes bump the version; parser forks per version. Canonical identifier columns follow decision #117: `ZTag, MachineCode, SAPID, EquipmentId, EquipmentUuid` | Without a version marker the CSV schema has no upgrade path — adding a required column breaks every old export silently. The version prefix makes parser dispatch explicit + future-compatible | 2026-04-19 |
+| 160 | Equipment CSV import uses a staged-import pattern: `EquipmentImportBatch` + `EquipmentImportRow` tables receive chunked inserts; `FinaliseImportBatch` is one atomic transaction that applies accepted rows to `Equipment` + `ExternalIdReservation`. Rollback = drop the batch row; `Equipment` never partially mutates | 10k-row single-transaction import holds locks too long; chunked direct writes lose all-or-nothing rollback. Staging + atomic finalize bounds transaction duration + preserves rollback semantics | 2026-04-19 |
+| 161 | UNS drag-reorder impact preview carries a `DraftRevisionToken`; Confirm re-checks against the current draft + returns `409 Conflict / refresh-required` if the draft advanced between preview and commit | Without concurrency control, two operators editing the same draft can overwrite each other's changes silently. Draft-revision token + 409 response makes the race visible + forces refresh | 2026-04-19 |
+| 162 | OPC 40010 Identification sub-folder exposed under each equipment node inherits the Equipment scope's ACL grants — the ACL trie does NOT add a new scope level for Identification | Adding a new scope level for Identification would require every grant to add a second grant for `Equipment/Identification`; inheriting the Equipment scope keeps the grant model flat + prevents operator-forgot-to-grant-Identification access surprises | 2026-04-19 |

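As a worked illustration of decision #146's hybrid formula, the sketch below computes the soft and hard thresholds from warm-up samples. It is a Python approximation of the arithmetic only; `TIER_CONSTANTS` and `thresholds` are hypothetical names, and the real watchdog samples `GetMemoryFootprint()` inside the .NET runtime.

```python
from statistics import median

# (multiplier, floor in MB) per tier, as listed in decision #146.
TIER_CONSTANTS = {"A": (3, 50), "B": (3, 100), "C": (2, 500)}


def thresholds(tier, warmup_samples_mb):
    """Return (baseline, soft, hard) in MB from the warm-up footprint samples."""
    multiplier, floor_mb = TIER_CONSTANTS[tier]
    baseline = median(warmup_samples_mb)  # median of the first 5 min of samples
    soft = max(multiplier * baseline, baseline + floor_mb)
    hard = 2 * soft
    return baseline, soft, hard
```

For a small Tier A driver with an 11 MB baseline the floor term dominates (soft = 61 MB, hard = 122 MB), while at a 100 MB baseline the multiplier term takes over (soft = 300 MB, hard = 600 MB). That crossover is the behaviour the decision says static per-tier bands could not provide.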
 ## Reference Documents

139
scripts/compliance/phase-6-1-compliance.ps1
Normal file
@@ -0,0 +1,139 @@
<#
.SYNOPSIS
    Phase 6.1 exit-gate compliance check. Each check either passes or records a
    failure; non-zero exit = fail.

.DESCRIPTION
    Validates Phase 6.1 (Resilience & Observability runtime) completion. Checks
    enumerated in `docs/v2/implementation/phase-6-1-resilience-and-observability.md`
    §"Compliance Checks (run at exit gate)".

    Runs a mix of file-presence checks, text-pattern sweeps over the committed
    codebase, and a full `dotnet test` pass to exercise the invariants each
    class encodes. Meant to be invoked from repo root.

.NOTES
    Usage: pwsh ./scripts/compliance/phase-6-1-compliance.ps1
    Exit: 0 = all checks passed; non-zero = one or more FAILs
#>
[CmdletBinding()]
param()

$ErrorActionPreference = 'Stop'
$script:failures = 0
$repoRoot = (Resolve-Path (Join-Path $PSScriptRoot '..\..')).Path

function Assert-Pass {
    param([string]$Check)
    Write-Host " [PASS] $Check" -ForegroundColor Green
}

function Assert-Fail {
    param([string]$Check, [string]$Reason)
    Write-Host " [FAIL] $Check - $Reason" -ForegroundColor Red
    $script:failures++
}

function Assert-Deferred {
    param([string]$Check, [string]$FollowupPr)
    Write-Host " [DEFERRED] $Check (follow-up: $FollowupPr)" -ForegroundColor Yellow
}

function Assert-FileExists {
    param([string]$Check, [string]$RelPath)
    $full = Join-Path $repoRoot $RelPath
    if (Test-Path $full) { Assert-Pass "$Check ($RelPath)" }
    else { Assert-Fail $Check "missing file: $RelPath" }
}

function Assert-TextFound {
    param([string]$Check, [string]$Pattern, [string[]]$RelPaths)
    foreach ($p in $RelPaths) {
        $full = Join-Path $repoRoot $p
        if (-not (Test-Path $full)) { continue }
        if (Select-String -Path $full -Pattern $Pattern -Quiet) {
            Assert-Pass "$Check (matched in $p)"
            return
        }
    }
    Assert-Fail $Check "pattern '$Pattern' not found in any of: $($RelPaths -join ', ')"
}

Write-Host ""
Write-Host "=== Phase 6.1 compliance - Resilience & Observability runtime ===" -ForegroundColor Cyan
Write-Host ""

Write-Host "Stream A - Resilience layer"
Assert-FileExists "Pipeline builder present" "src/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResiliencePipelineBuilder.cs"
Assert-FileExists "CapabilityInvoker present" "src/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs"
Assert-FileExists "WriteIdempotentAttribute present" "src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/WriteIdempotentAttribute.cs"
Assert-TextFound "Pipeline key includes HostName (per-device isolation)" "PipelineKey\(.+HostName" @("src/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResiliencePipelineBuilder.cs")
Assert-TextFound "OnReadValue routes through invoker" "DriverCapability\.Read," @("src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs")
Assert-TextFound "OnWriteValue routes through invoker" "ExecuteWriteAsync" @("src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs")
Assert-TextFound "HistoryRead routes through invoker" "DriverCapability\.HistoryRead" @("src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs")
Assert-FileExists "Galaxy supervisor CircuitBreaker preserved" "src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Supervisor/CircuitBreaker.cs"
Assert-FileExists "Galaxy supervisor Backoff preserved" "src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Supervisor/Backoff.cs"

Write-Host ""
Write-Host "Stream B - Tier A/B/C runtime"
Assert-FileExists "DriverTier enum present" "src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTier.cs"
Assert-TextFound "DriverTypeMetadata requires Tier" "DriverTier Tier" @("src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs")
Assert-FileExists "MemoryTracking present" "src/ZB.MOM.WW.OtOpcUa.Core/Stability/MemoryTracking.cs"
Assert-FileExists "MemoryRecycle present" "src/ZB.MOM.WW.OtOpcUa.Core/Stability/MemoryRecycle.cs"
Assert-TextFound "MemoryRecycle is Tier C gated" "_tier == DriverTier\.C" @("src/ZB.MOM.WW.OtOpcUa.Core/Stability/MemoryRecycle.cs")
Assert-FileExists "ScheduledRecycleScheduler present" "src/ZB.MOM.WW.OtOpcUa.Core/Stability/ScheduledRecycleScheduler.cs"
Assert-TextFound "Scheduler ctor rejects Tier A/B" "tier != DriverTier\.C" @("src/ZB.MOM.WW.OtOpcUa.Core/Stability/ScheduledRecycleScheduler.cs")
Assert-FileExists "WedgeDetector present" "src/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs"
Assert-TextFound "WedgeDetector is demand-aware" "HasPendingWork" @("src/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs")

Write-Host ""
Write-Host "Stream C - Health + logging"
Assert-FileExists "DriverHealthReport present" "src/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs"
Assert-FileExists "HealthEndpointsHost present" "src/ZB.MOM.WW.OtOpcUa.Server/Observability/HealthEndpointsHost.cs"
Assert-TextFound "State matrix: Healthy = 200" "ReadinessVerdict\.Healthy => 200" @("src/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs")
Assert-TextFound "State matrix: Faulted = 503" "ReadinessVerdict\.Faulted => 503" @("src/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs")
Assert-FileExists "LogContextEnricher present" "src/ZB.MOM.WW.OtOpcUa.Core/Observability/LogContextEnricher.cs"
Assert-TextFound "Enricher pushes DriverInstanceId property" "DriverInstanceId" @("src/ZB.MOM.WW.OtOpcUa.Core/Observability/LogContextEnricher.cs")
Assert-TextFound "JSON sink opt-in via Serilog:WriteJson" "Serilog:WriteJson" @("src/ZB.MOM.WW.OtOpcUa.Server/Program.cs")

Write-Host ""
Write-Host "Stream D - LiteDB generation-sealed cache"
Assert-FileExists "GenerationSealedCache present" "src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs"
Assert-TextFound "Sealed files marked ReadOnly" "FileAttributes\.ReadOnly" @("src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs")
Assert-TextFound "Corruption fails closed with GenerationCacheUnavailableException" "GenerationCacheUnavailableException" @("src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/GenerationSealedCache.cs")
Assert-FileExists "ResilientConfigReader present" "src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs"
Assert-FileExists "StaleConfigFlag present" "src/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/StaleConfigFlag.cs"

|
Write-Host ""
|
||||||
|
Write-Host "Stream E - Admin /hosts (data layer)"
|
||||||
|
Assert-FileExists "DriverInstanceResilienceStatus entity" "src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/DriverInstanceResilienceStatus.cs"
|
||||||
|
Assert-FileExists "DriverResilienceStatusTracker present" "src/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceStatusTracker.cs"
|
||||||
|
Assert-Deferred "FleetStatusHub SignalR push + Blazor /hosts column refresh" "Phase 6.1 Stream E.2/E.3 visual-compliance follow-up"
|
||||||
|
|
||||||
|
Write-Host ""
|
||||||
|
Write-Host "Cross-cutting"
|
||||||
|
Write-Host " Running full solution test suite..." -ForegroundColor DarkGray
|
||||||
|
$prevPref = $ErrorActionPreference
|
||||||
|
$ErrorActionPreference = 'Continue'
|
||||||
|
$testOutput = & dotnet test (Join-Path $repoRoot 'ZB.MOM.WW.OtOpcUa.slnx') --nologo 2>&1
|
||||||
|
$ErrorActionPreference = $prevPref
|
||||||
|
$passLine = $testOutput | Select-String 'Passed:\s+(\d+)' -AllMatches
|
||||||
|
$failLine = $testOutput | Select-String 'Failed:\s+(\d+)' -AllMatches
|
||||||
|
$passCount = 0; foreach ($m in $passLine.Matches) { $passCount += [int]$m.Groups[1].Value }
|
||||||
|
$failCount = 0; foreach ($m in $failLine.Matches) { $failCount += [int]$m.Groups[1].Value }
|
||||||
|
$baseline = 906
|
||||||
|
if ($passCount -ge $baseline) { Assert-Pass "No test-count regression ($passCount >= $baseline baseline)" }
|
||||||
|
else { Assert-Fail "Test-count regression" "passed $passCount < baseline $baseline" }
|
||||||
|
|
||||||
|
# Pre-existing Client.CLI Subscribe flake tracked separately; exit gate tolerates a single
|
||||||
|
# known flake but flags any NEW failures.
|
||||||
|
if ($failCount -le 1) { Assert-Pass "No new failing tests (pre-existing CLI flake tolerated)" }
|
||||||
|
else { Assert-Fail "New failing tests" "$failCount failures > 1 tolerated" }
|
||||||
|
|
||||||
|
Write-Host ""
|
||||||
|
if ($script:failures -eq 0) {
|
||||||
|
Write-Host "Phase 6.1 compliance: PASS" -ForegroundColor Green
|
||||||
|
exit 0
|
||||||
|
}
|
||||||
|
Write-Host "Phase 6.1 compliance: $script:failures FAIL(s)" -ForegroundColor Red
|
||||||
|
exit 1
|
||||||
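The two state-matrix checks above grep `DriverHealthReport.cs` for the literal arms `ReadinessVerdict.Healthy => 200` and `ReadinessVerdict.Faulted => 503`. A minimal sketch of a switch expression that would satisfy both patterns is shown here. The enum is reduced to the two members the script names, and `ReadinessSketch` and `HttpStatusFor` are hypothetical names, not the contents of the real file.

```csharp
using System;

// Hypothetical stand-in: only the two verdicts named by the compliance checks.
public enum ReadinessVerdict { Healthy, Faulted }

public static class ReadinessSketch
{
    // Maps a readiness verdict to the HTTP status a health endpoint would return.
    public static int HttpStatusFor(ReadinessVerdict verdict) => verdict switch
    {
        ReadinessVerdict.Healthy => 200,
        ReadinessVerdict.Faulted => 503,
        _ => throw new ArgumentOutOfRangeException(nameof(verdict)),
    };
}
```

Because the compliance script matches source text rather than behaviour, the arms have to be written in exactly this `Verdict => code` form for the check to pass.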
81
scripts/compliance/phase-6-2-compliance.ps1
Normal file
@@ -0,0 +1,81 @@
<#
.SYNOPSIS
    Phase 6.2 exit-gate compliance check (stub). Each `Assert-*` call prints its result;
    `Assert-Fail` also increments the failure count. Non-zero exit = fail.

.DESCRIPTION
    Validates Phase 6.2 (Authorization runtime) completion. Checks enumerated
    in `docs/v2/implementation/phase-6-2-authorization-runtime.md`
    §"Compliance Checks (run at exit gate)".

    Current status: SCAFFOLD. Every check writes a TODO line and does NOT throw.
    Each implementation task in Phase 6.2 is responsible for replacing its TODO
    with a real check before closing that task.

.NOTES
    Usage: pwsh ./scripts/compliance/phase-6-2-compliance.ps1
    Exit: 0 = all checks passed (or are still TODO); non-zero = explicit fail
#>
[CmdletBinding()]
param()

$ErrorActionPreference = 'Stop'
$script:failures = 0

function Assert-Todo {
    param([string]$Check, [string]$ImplementationTask)
    Write-Host " [TODO] $Check (implement during $ImplementationTask)" -ForegroundColor Yellow
}

function Assert-Pass {
    param([string]$Check)
    Write-Host " [PASS] $Check" -ForegroundColor Green
}

function Assert-Fail {
    param([string]$Check, [string]$Reason)
    Write-Host " [FAIL] $Check — $Reason" -ForegroundColor Red
    $script:failures++
}

Write-Host ""
Write-Host "=== Phase 6.2 compliance — Authorization runtime ===" -ForegroundColor Cyan
Write-Host ""

Write-Host "Stream A — LdapGroupRoleMapping (control plane)"
Assert-Todo "Control/data-plane separation — Core.Authorization has zero refs to LdapGroupRoleMapping" "Stream A.2"
Assert-Todo "Authoring validation — AclsTab rejects duplicate (LdapGroup, Scope) pre-save" "Stream A.3"

Write-Host ""
Write-Host "Stream B — Evaluator + trie + cache"
Assert-Todo "Trie invariants — PermissionTrieBuilder idempotent (build twice == equal)" "Stream B.1"
Assert-Todo "Additive grants + cluster isolation — cross-cluster leakage impossible" "Stream B.1"
Assert-Todo "Galaxy FolderSegment coverage — folder-subtree grant cascades; siblings unaffected" "Stream B.2"
Assert-Todo "Redundancy-safe invalidation — generation-mismatch forces trie re-load on peer" "Stream B.4"
Assert-Todo "Membership freshness — 15 min interval elapsed + LDAP down = fail-closed" "Stream B.5"
Assert-Todo "Auth cache fail-closed — 5 min AuthCacheMaxStaleness exceeded = NotGranted" "Stream B.5"
Assert-Todo "AuthorizationDecision shape — Allow + NotGranted only; Denied variant exists unused" "Stream B.6"

Write-Host ""
Write-Host "Stream C — OPC UA operation wiring"
Assert-Todo "Every operation wired — Browse/Read/Write/HistoryRead/HistoryUpdate/CreateMonitoredItems/TransferSubscriptions/Call/Ack/Confirm/Shelve" "Stream C.1-C.7"
Assert-Todo "HistoryRead uses its own flag — Read+no-HistoryRead denies HistoryRead" "Stream C.3"
Assert-Todo "Mixed-batch semantics — 3 allowed + 2 denied returns per-item status, no coarse failure" "Stream C.6"
Assert-Todo "Browse ancestor visibility — deep grant implies ancestor browse; denied ancestors filter" "Stream C.7"
Assert-Todo "Subscription re-authorization — revoked grant surfaces BadUserAccessDenied in one publish" "Stream C.5"

Write-Host ""
Write-Host "Stream D — Admin UI + SignalR invalidation"
Assert-Todo "SignalR invalidation — sp_PublishGeneration pushes PermissionTrieCache invalidate < 2 s" "Stream D.4"

Write-Host ""
Write-Host "Cross-cutting"
Assert-Todo "No test-count regression — dotnet test ZB.MOM.WW.OtOpcUa.slnx count ≥ pre-Phase-6.2 baseline" "Final exit-gate"

Write-Host ""
if ($script:failures -eq 0) {
    Write-Host "Phase 6.2 compliance: scaffold-mode PASS (all checks TODO)" -ForegroundColor Green
    exit 0
}
Write-Host "Phase 6.2 compliance: $script:failures FAIL(s)" -ForegroundColor Red
exit 1
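Two of the Stream B/C checks above ("Additive grants" and "HistoryRead uses its own flag") describe a flags-based permission model. The sketch below illustrates that idea under stated assumptions: `NodePermissionsSketch` and its members are hypothetical stand-ins (the real `NodePermissions` enum is not shown in this diff), and the evaluator here is a plain bitwise union, not the trie-based evaluator the phase plans.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flag set; the real NodePermissions enum may define other members or values.
[Flags]
public enum NodePermissionsSketch
{
    None = 0,
    Browse = 1 << 0,
    Read = 1 << 1,
    Write = 1 << 2,
    HistoryRead = 1 << 3,
}

public static class AdditiveGrantSketch
{
    // Additive model: the effective mask is the union of every matching grant.
    public static NodePermissionsSketch Effective(IEnumerable<NodePermissionsSketch> grants) =>
        grants.Aggregate(NodePermissionsSketch.None, (acc, g) => acc | g);

    // An operation is allowed only when its own flag is present in the effective mask.
    public static bool IsAllowed(NodePermissionsSketch effective, NodePermissionsSketch required) =>
        (effective & required) == required;
}
```

With grants of `Browse` and `Read`, `IsAllowed(..., Read)` holds while `IsAllowed(..., HistoryRead)` does not, which is the behaviour the Stream C.3 check asks for.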
85
scripts/compliance/phase-6-3-compliance.ps1
Normal file
@@ -0,0 +1,85 @@
<#
.SYNOPSIS
    Phase 6.3 exit-gate compliance check (stub). Each `Assert-*` call prints its result;
    `Assert-Fail` also increments the failure count. Non-zero exit = fail.

.DESCRIPTION
    Validates Phase 6.3 (Redundancy runtime) completion. Checks enumerated in
    `docs/v2/implementation/phase-6-3-redundancy-runtime.md`
    §"Compliance Checks (run at exit gate)".

    Current status: SCAFFOLD. Every check writes a TODO line and does NOT throw.
    Each implementation task in Phase 6.3 is responsible for replacing its TODO
    with a real check before closing that task.

.NOTES
    Usage: pwsh ./scripts/compliance/phase-6-3-compliance.ps1
    Exit: 0 = all checks passed (or are still TODO); non-zero = explicit fail
#>
[CmdletBinding()]
param()

$ErrorActionPreference = 'Stop'
$script:failures = 0

function Assert-Todo {
    param([string]$Check, [string]$ImplementationTask)
    Write-Host " [TODO] $Check (implement during $ImplementationTask)" -ForegroundColor Yellow
}

function Assert-Pass {
    param([string]$Check)
    Write-Host " [PASS] $Check" -ForegroundColor Green
}

function Assert-Fail {
    param([string]$Check, [string]$Reason)
    Write-Host " [FAIL] $Check — $Reason" -ForegroundColor Red
    $script:failures++
}

Write-Host ""
Write-Host "=== Phase 6.3 compliance — Redundancy runtime ===" -ForegroundColor Cyan
Write-Host ""

Write-Host "Stream A — Topology loader"
Assert-Todo "Transparent-mode rejection — sp_PublishGeneration blocks RedundancyMode=Transparent" "Stream A.3"

Write-Host ""
Write-Host "Stream B — Peer probe + ServiceLevel calculator"
Assert-Todo "OPC UA band compliance — 0=Maintenance / 1=NoData reserved; operational 2..255" "Stream B.2"
Assert-Todo "Authoritative-Primary ServiceLevel = 255" "Stream B.2"
Assert-Todo "Isolated-Primary (peer unreachable, self serving) = 230" "Stream B.2"
Assert-Todo "Primary-Mid-Apply = 200" "Stream B.2"
Assert-Todo "Recovering-Primary = 180 with dwell + publish witness enforced" "Stream B.2"
Assert-Todo "Authoritative-Backup = 100" "Stream B.2"
Assert-Todo "Isolated-Backup (primary unreachable) = 80 — no auto-promote" "Stream B.2"
Assert-Todo "InvalidTopology = 2 — >1 Primary self-demotes both nodes" "Stream B.2"
Assert-Todo "UaHealthProbe authority — HTTP-200 + UA-down peer treated as UA-unhealthy" "Stream B.1"

Write-Host ""
Write-Host "Stream C — OPC UA node wiring"
Assert-Todo "ServerUriArray — returns self + peer URIs, self first" "Stream C.2"
Assert-Todo "Client.CLI cutover — primary halt triggers reconnect to backup via ServerUriArray" "Stream C.4"

Write-Host ""
Write-Host "Stream D — Apply-lease + publish fencing"
Assert-Todo "Apply-lease disposal — leases close on exception, cancellation, watchdog timeout" "Stream D.2"
Assert-Todo "Role transition via operator publish — no restart; both nodes flip ServiceLevel on publish confirm" "Stream D.3"

Write-Host ""
Write-Host "Stream F — Interop matrix"
Assert-Todo "Client interoperability matrix — Ignition 8.1/8.3 / Kepware / Aveva OI Gateway findings documented" "Stream F.1-F.2"
Assert-Todo "Galaxy MXAccess failover — primary kill; Galaxy consumer reconnects within session-timeout budget" "Stream F.3"

Write-Host ""
Write-Host "Cross-cutting"
Assert-Todo "No regression in driver test suites; /healthz reachable under redundancy load" "Final exit-gate"

Write-Host ""
if ($script:failures -eq 0) {
    Write-Host "Phase 6.3 compliance: scaffold-mode PASS (all checks TODO)" -ForegroundColor Green
    exit 0
}
Write-Host "Phase 6.3 compliance: $script:failures FAIL(s)" -ForegroundColor Red
exit 1
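The Stream B.2 checks above fix one ServiceLevel value per redundancy state. As a rough illustration, those values can be collected in a single lookup. The enum and class names below are hypothetical, chosen from the check descriptions; only the numeric values (255, 230, 200, 180, 100, 80, 2) come from the script, and the real calculator presumably also handles dwell time and publish witnesses, which this sketch ignores.

```csharp
using System;

// Hypothetical state names derived from the check descriptions in the compliance script.
public enum RedundancyStateSketch
{
    AuthoritativePrimary,
    IsolatedPrimary,
    PrimaryMidApply,
    RecoveringPrimary,
    AuthoritativeBackup,
    IsolatedBackup,
    InvalidTopology,
}

public static class ServiceLevelSketch
{
    // Values 0 (Maintenance) and 1 (NoData) are reserved; every state here stays in 2..255.
    public static byte For(RedundancyStateSketch state) => state switch
    {
        RedundancyStateSketch.AuthoritativePrimary => 255,
        RedundancyStateSketch.IsolatedPrimary => 230,
        RedundancyStateSketch.PrimaryMidApply => 200,
        RedundancyStateSketch.RecoveringPrimary => 180,
        RedundancyStateSketch.AuthoritativeBackup => 100,
        RedundancyStateSketch.IsolatedBackup => 80,
        RedundancyStateSketch.InvalidTopology => 2,
        _ => throw new ArgumentOutOfRangeException(nameof(state)),
    };
}
```

Keeping every primary value above every backup value means a client that picks the highest ServiceLevel prefers any primary state over any backup state, and `InvalidTopology` sits at the bottom of the operational band.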
83
scripts/compliance/phase-6-4-compliance.ps1
Normal file
@@ -0,0 +1,83 @@
<#
.SYNOPSIS
    Phase 6.4 exit-gate compliance check (stub). Each `Assert-*` call prints its result;
    `Assert-Fail` also increments the failure count. Non-zero exit = fail.

.DESCRIPTION
    Validates that Phase 6.4 (Admin UI completion) is complete. Checks enumerated in
    `docs/v2/implementation/phase-6-4-admin-ui-completion.md`
    §"Compliance Checks (run at exit gate)".

    Current status: SCAFFOLD. Every check writes a TODO line and does NOT throw.
    Each implementation task in Phase 6.4 is responsible for replacing its TODO
    with a real check before closing that task.

.NOTES
    Usage: pwsh ./scripts/compliance/phase-6-4-compliance.ps1
    Exit: 0 = all checks passed (or are still TODO); non-zero = explicit fail
#>
[CmdletBinding()]
param()

$ErrorActionPreference = 'Stop'
$script:failures = 0

function Assert-Todo {
    param([string]$Check, [string]$ImplementationTask)
    Write-Host " [TODO] $Check (implement during $ImplementationTask)" -ForegroundColor Yellow
}

function Assert-Pass {
    param([string]$Check)
    Write-Host " [PASS] $Check" -ForegroundColor Green
}

function Assert-Fail {
    param([string]$Check, [string]$Reason)
    Write-Host " [FAIL] $Check — $Reason" -ForegroundColor Red
    $script:failures++
}

Write-Host ""
Write-Host "=== Phase 6.4 compliance — Admin UI completion ===" -ForegroundColor Cyan
Write-Host ""

Write-Host "Stream A — UNS drag/move + impact preview"
Assert-Todo "UNS drag/move — drag line across areas; modal shows correct impacted-equipment + tag counts" "Stream A.2"
Assert-Todo "Concurrent-edit safety — session B saves draft mid-preview; session A Confirm returns 409" "Stream A.3 (DraftRevisionToken)"
Assert-Todo "Cross-cluster drop disabled — actionable toast points to Export/Import" "Stream A.2"
Assert-Todo "1000-node tree — drag-enter feedback < 100 ms" "Stream A.4"

Write-Host ""
Write-Host "Stream B — CSV import + staged-import + 5-identifier search"
Assert-Todo "CSV header version — file missing '# OtOpcUaCsv v1' rejected pre-parse" "Stream B.1"
Assert-Todo "CSV canonical identifier set — columns match decision #117 exactly" "Stream B.1"
Assert-Todo "Staged-import atomicity — 10k-row FinaliseImportBatch < 30 s; user-scoped visibility; DropImportBatch rollback" "Stream B.3"
Assert-Todo "Concurrent import + external reservation — finalize retries with conflict handling; no corruption" "Stream B.3"
Assert-Todo "5-identifier search ranking — exact > prefix; published > draft for equal scores" "Stream B.4"

Write-Host ""
Write-Host "Stream C — DiffViewer sections"
Assert-Todo "Diff viewer section caps — 2000-row subtree-rename summary-only; 'Load full diff' paginates" "Stream C.2"

Write-Host ""
Write-Host "Stream D — Identification (OPC 40010)"
Assert-Todo "OPC 40010 field list match — rendered fields match decision #139 exactly; no extras" "Stream D.1"
Assert-Todo "OPC 40010 exposure — Identification sub-folder shows when non-null; absent when all null" "Stream D.3"
Assert-Todo "ACL inheritance for Identification — Equipment-grant reads; no-grant denies both" "Stream D.4"

Write-Host ""
Write-Host "Visual compliance"
Assert-Todo "Visual parity reviewer — FleetAdmin signoff vs admin-ui.md §Visual-Design; screenshot set checked in under docs/v2/visual-compliance/phase-6-4/" "Visual review"

Write-Host ""
Write-Host "Cross-cutting"
Assert-Todo "Full solution dotnet test passes; no test-count regression vs pre-Phase-6.4 baseline" "Final exit-gate"

Write-Host ""
if ($script:failures -eq 0) {
    Write-Host "Phase 6.4 compliance: scaffold-mode PASS (all checks TODO)" -ForegroundColor Green
    exit 0
}
Write-Host "Phase 6.4 compliance: $script:failures FAIL(s)" -ForegroundColor Red
exit 1
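The Stream B.1 check above requires that a CSV file missing the `# OtOpcUaCsv v1` marker is rejected before any parsing happens. One way this could look, as a minimal sketch: `CsvHeaderSketch` and `HasSupportedHeader` are hypothetical names, and the sketch assumes the marker must be the very first line and that the reader has already consumed any byte-order mark.

```csharp
using System;
using System.IO;

public static class CsvHeaderSketch
{
    // Version marker quoted in the compliance check.
    public const string RequiredHeader = "# OtOpcUaCsv v1";

    // Reads only the first line; returns true when it is exactly the version marker
    // (trailing whitespace ignored). Nothing after the first line is touched.
    public static bool HasSupportedHeader(TextReader reader)
    {
        var first = reader.ReadLine();
        return first is not null && first.TrimEnd() == RequiredHeader;
    }
}
```

A caller would run this on the upload stream first and return a validation error without handing the rest of the file to the CSV parser when it yields `false`.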
@@ -0,0 +1,117 @@
using Microsoft.EntityFrameworkCore;
using ZB.MOM.WW.OtOpcUa.Configuration;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;

namespace ZB.MOM.WW.OtOpcUa.Admin.Services;

/// <summary>
/// Draft-aware write surface over <see cref="NodeAcl"/>. Replaces direct
/// <see cref="NodeAclService"/> CRUD for Admin UI grant authoring; the raw service stays
/// as the read / delete surface. Enforces the invariants listed in Phase 6.2 Stream D.2:
/// scope-uniqueness per (LdapGroup, ScopeKind, ScopeId, GenerationId), grant shape
/// consistency, and no empty permission masks.
/// </summary>
/// <remarks>
/// <para>Per decision #129 grants are additive — <see cref="NodePermissions.None"/> is
/// rejected at write time. Explicit Deny is v2.1 and is not representable in the current
/// <c>NodeAcl</c> row; attempts to express it (e.g. empty permission set) surface as
/// <see cref="InvalidNodeAclGrantException"/>.</para>
///
/// <para>Draft scope: writes always target an unpublished (Draft-state) generation id.
/// Once a generation publishes, its rows are frozen.</para>
/// </remarks>
public sealed class ValidatedNodeAclAuthoringService(OtOpcUaConfigDbContext db)
{
    /// <summary>Add a new grant row to the given draft generation.</summary>
    public async Task<NodeAcl> GrantAsync(
        long draftGenerationId,
        string clusterId,
        string ldapGroup,
        NodeAclScopeKind scopeKind,
        string? scopeId,
        NodePermissions permissions,
        string? notes,
        CancellationToken cancellationToken)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
        ArgumentException.ThrowIfNullOrWhiteSpace(ldapGroup);

        ValidateGrantShape(scopeKind, scopeId, permissions);
        await EnsureNoDuplicate(draftGenerationId, clusterId, ldapGroup, scopeKind, scopeId, cancellationToken).ConfigureAwait(false);

        var row = new NodeAcl
        {
            GenerationId = draftGenerationId,
            NodeAclId = $"acl-{Guid.NewGuid():N}"[..20],
            ClusterId = clusterId,
            LdapGroup = ldapGroup,
            ScopeKind = scopeKind,
            ScopeId = scopeId,
            PermissionFlags = permissions,
            Notes = notes,
        };
        db.NodeAcls.Add(row);
        await db.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
        return row;
    }

    /// <summary>
    /// Replace an existing grant's permission set in place. Validates the new shape;
    /// rejects attempts to blank-out to None (that's a Revoke via <see cref="NodeAclService"/>).
    /// </summary>
    public async Task<NodeAcl> UpdatePermissionsAsync(
        Guid nodeAclRowId,
        NodePermissions newPermissions,
        string? notes,
        CancellationToken cancellationToken)
    {
        if (newPermissions == NodePermissions.None)
            throw new InvalidNodeAclGrantException(
                "Permission set cannot be None — revoke the row instead of writing an empty grant.");

        var row = await db.NodeAcls.FirstOrDefaultAsync(a => a.NodeAclRowId == nodeAclRowId, cancellationToken).ConfigureAwait(false)
            ?? throw new InvalidNodeAclGrantException($"NodeAcl row {nodeAclRowId} not found.");

        row.PermissionFlags = newPermissions;
        if (notes is not null) row.Notes = notes;
        await db.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
        return row;
    }

    private static void ValidateGrantShape(NodeAclScopeKind scopeKind, string? scopeId, NodePermissions permissions)
    {
        if (permissions == NodePermissions.None)
            throw new InvalidNodeAclGrantException(
                "Permission set cannot be None — grants must carry at least one flag (decision #129, additive only).");

        if (scopeKind == NodeAclScopeKind.Cluster && !string.IsNullOrEmpty(scopeId))
            throw new InvalidNodeAclGrantException(
                "Cluster-scope grants must have null ScopeId. ScopeId only applies to sub-cluster scopes.");

        if (scopeKind != NodeAclScopeKind.Cluster && string.IsNullOrEmpty(scopeId))
            throw new InvalidNodeAclGrantException(
                $"ScopeKind={scopeKind} requires a populated ScopeId.");
    }

    private async Task EnsureNoDuplicate(
        long generationId, string clusterId, string ldapGroup, NodeAclScopeKind scopeKind, string? scopeId,
        CancellationToken cancellationToken)
    {
        var exists = await db.NodeAcls.AsNoTracking()
            .AnyAsync(a => a.GenerationId == generationId
                && a.ClusterId == clusterId
                && a.LdapGroup == ldapGroup
                && a.ScopeKind == scopeKind
                && a.ScopeId == scopeId,
                cancellationToken).ConfigureAwait(false);

        if (exists)
            throw new InvalidNodeAclGrantException(
                $"A grant for (LdapGroup={ldapGroup}, ScopeKind={scopeKind}, ScopeId={scopeId}) already exists in generation {generationId}. " +
                "Update the existing row's permissions instead of inserting a duplicate.");
    }
}

/// <summary>Thrown when a <see cref="NodeAcl"/> grant authoring request violates an invariant.</summary>
public sealed class InvalidNodeAclGrantException(string message) : Exception(message);
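The three shape rules enforced by `ValidateGrantShape` above can be read as a small truth table. The sketch below restates them as a pure function that returns a rejection reason instead of throwing, which makes the cases easy to enumerate. Everything in it is a stand-in: `ScopeKindSketch.Equipment` is an assumed sub-cluster scope (the diff only confirms `Cluster`), and the permission mask is reduced to a plain `int`.

```csharp
#nullable enable
using System;

// Hypothetical scope kinds: only Cluster is confirmed by the source; Equipment is assumed.
public enum ScopeKindSketch { Cluster, Equipment }

public static class GrantShapeSketch
{
    // Returns null when the grant shape is valid, otherwise a short rejection reason.
    public static string? Check(ScopeKindSketch kind, string? scopeId, int permissionMask)
    {
        if (permissionMask == 0)
            return "empty permission mask";
        if (kind == ScopeKindSketch.Cluster && !string.IsNullOrEmpty(scopeId))
            return "cluster scope must not carry a ScopeId";
        if (kind != ScopeKindSketch.Cluster && string.IsNullOrEmpty(scopeId))
            return "sub-cluster scope requires a ScopeId";
        return null;
    }
}
```

Under these assumptions, `(Cluster, null, 1)` and `(Equipment, "eq-1", 1)` pass, while `(Cluster, "eq-1", 1)`, `(Equipment, null, 1)`, and any call with a zero mask are rejected.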
@@ -0,0 +1,44 @@
namespace ZB.MOM.WW.OtOpcUa.Configuration.Entities;

/// <summary>
/// Runtime resilience counters the CapabilityInvoker + MemoryTracking + MemoryRecycle
/// surfaces for each <c>(DriverInstanceId, HostName)</c> pair. Separate from
/// <see cref="DriverHostStatus"/> (which owns per-host <i>connectivity</i> state) so a
/// host that's Running but has tripped its breaker or is approaching its memory ceiling
/// shows up distinctly on Admin <c>/hosts</c>.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/implementation/phase-6-1-resilience-and-observability.md</c> §Stream E.1.
/// The Admin UI left-joins this table on DriverHostStatus for display; rows are written
/// by the runtime via a HostedService that samples the tracker at a configurable
/// interval (default 5 s) — writes are non-critical, a missed sample is tolerated.
/// </remarks>
public sealed class DriverInstanceResilienceStatus
{
    public required string DriverInstanceId { get; set; }
    public required string HostName { get; set; }

    /// <summary>Most recent time the circuit breaker for this (instance, host) opened; null if never.</summary>
    public DateTime? LastCircuitBreakerOpenUtc { get; set; }

    /// <summary>Rolling count of consecutive Polly pipeline failures for this (instance, host).</summary>
    public int ConsecutiveFailures { get; set; }

    /// <summary>Current Polly bulkhead depth (in-flight calls) for this (instance, host).</summary>
    public int CurrentBulkheadDepth { get; set; }

    /// <summary>Most recent process recycle time (Tier C only; null for in-process tiers).</summary>
    public DateTime? LastRecycleUtc { get; set; }

    /// <summary>
    /// Post-init memory baseline captured by <c>MemoryTracking</c> (median of first
    /// BaselineWindow samples). Zero while still warming up.
    /// </summary>
    public long BaselineFootprintBytes { get; set; }

    /// <summary>Most recent footprint sample the tracker saw (steady-state read).</summary>
    public long CurrentFootprintBytes { get; set; }

    /// <summary>Row last-write timestamp — advances on every sampling tick.</summary>
    public DateTime LastSampledUtc { get; set; }
}
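The remarks above say the Admin UI left-joins this table onto `DriverHostStatus` on the `(DriverInstanceId, HostName)` pair. A minimal in-memory sketch of that join with LINQ follows. The record types are simplified, hypothetical stand-ins for the two entities (the `State` column in particular is assumed), and the real query would presumably run through EF Core rather than over lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for DriverHostStatus and DriverInstanceResilienceStatus.
public sealed record HostRow(string DriverInstanceId, string HostName, string State);
public sealed record ResilienceRow(string DriverInstanceId, string HostName, int ConsecutiveFailures);
public sealed record HostView(string DriverInstanceId, string HostName, string State, int? ConsecutiveFailures);

public static class HostsJoinSketch
{
    // Left join: every host row appears once; resilience columns are null when no sample exists yet.
    public static List<HostView> LeftJoin(IEnumerable<HostRow> hosts, IEnumerable<ResilienceRow> resilience) =>
        (from h in hosts
         join r in resilience
             on new { h.DriverInstanceId, h.HostName } equals new { r.DriverInstanceId, r.HostName }
             into matches
         from m in matches.DefaultIfEmpty()
         select new HostView(h.DriverInstanceId, h.HostName, h.State, m?.ConsecutiveFailures)).ToList();
}
```

A left join fits the "missed sample is tolerated" note: a host whose resilience row has not been written yet still appears on the page, just with empty resilience columns.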
@@ -0,0 +1,56 @@
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;

namespace ZB.MOM.WW.OtOpcUa.Configuration.Entities;

/// <summary>
/// Maps an LDAP group to an <see cref="AdminRole"/> for Admin UI access. Optionally scoped
/// to one <see cref="ClusterId"/>; when <see cref="IsSystemWide"/> is true, the grant
/// applies fleet-wide.
/// </summary>
/// <remarks>
/// <para>Per <c>docs/v2/plan.md</c> decisions #105 and #150 — this entity is <b>control-plane
/// only</b>. The OPC UA data-path evaluator does not read these rows; it reads
/// <see cref="NodeAcl"/> joined directly against the session's resolved LDAP group
/// memberships. Collapsing the two would let a user inherit tag permissions via an
/// admin-role claim path never intended as a data-path grant.</para>
///
/// <para>Uniqueness: <c>(LdapGroup, ClusterId)</c> — the same LDAP group may hold
/// different roles on different clusters, but only one row per cluster. A system-wide row
/// (<c>IsSystemWide = true</c>, <c>ClusterId = null</c>) stacks additively with any
/// cluster-scoped rows for the same group.</para>
/// </remarks>
public sealed class LdapGroupRoleMapping
{
    /// <summary>Surrogate primary key.</summary>
    public Guid Id { get; set; }

    /// <summary>
    /// LDAP group DN the membership query returns (e.g. <c>cn=fleet-admin,ou=groups,dc=corp,dc=example</c>).
    /// Comparison is case-insensitive per LDAP conventions.
    /// </summary>
    public required string LdapGroup { get; set; }

    /// <summary>Admin role this group grants.</summary>
    public required AdminRole Role { get; set; }

    /// <summary>
    /// Cluster the grant applies to; <c>null</c> when <see cref="IsSystemWide"/> is true.
    /// Foreign key to <see cref="ServerCluster.ClusterId"/>.
    /// </summary>
    public string? ClusterId { get; set; }

    /// <summary>
    /// <c>true</c> = grant applies across every cluster in the fleet; <c>ClusterId</c> must be null.
    /// <c>false</c> = grant is cluster-scoped; <c>ClusterId</c> must be populated.
    /// </summary>
    public required bool IsSystemWide { get; set; }

    /// <summary>Row creation timestamp (UTC).</summary>
    public DateTime CreatedAtUtc { get; set; }

    /// <summary>Optional human-readable note (e.g. "added 2026-04-19 for Warsaw fleet admin handoff").</summary>
    public string? Notes { get; set; }

    /// <summary>Navigation for EF Core when the row is cluster-scoped.</summary>
    public ServerCluster? Cluster { get; set; }
}
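The remarks above describe two resolution rules: group comparison is case-insensitive, and a system-wide row stacks additively with cluster-scoped rows for the same group. A minimal sketch of how effective roles for one cluster might be resolved from such rows is given below. The types are hypothetical stand-ins that mirror the entity's fields, and treating cluster ids as ordinal (case-sensitive) strings is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins mirroring AdminRole and LdapGroupRoleMapping.
public enum AdminRoleSketch { ConfigViewer, ConfigEditor, FleetAdmin }
public sealed record MappingSketch(string LdapGroup, AdminRoleSketch Role, string? ClusterId, bool IsSystemWide);

public static class RoleResolutionSketch
{
    // Union of roles from system-wide rows and rows scoped to the requested cluster,
    // restricted to the LDAP groups the user belongs to (case-insensitive match).
    public static HashSet<AdminRoleSketch> RolesFor(
        IEnumerable<MappingSketch> mappings, IEnumerable<string> userGroups, string clusterId)
    {
        var groups = new HashSet<string>(userGroups, StringComparer.OrdinalIgnoreCase);
        return mappings
            .Where(m => groups.Contains(m.LdapGroup))
            .Where(m => m.IsSystemWide || m.ClusterId == clusterId)
            .Select(m => m.Role)
            .ToHashSet();
    }
}
```

A user in a group with a system-wide `ConfigViewer` row and a cluster-scoped `ConfigEditor` row gets both roles on that cluster and only `ConfigViewer` elsewhere.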
26
src/ZB.MOM.WW.OtOpcUa.Configuration/Enums/AdminRole.cs
Normal file
@@ -0,0 +1,26 @@
namespace ZB.MOM.WW.OtOpcUa.Configuration.Enums;

/// <summary>
/// Admin UI roles per <c>admin-ui.md</c> §"Admin Roles" and Phase 6.2 Stream A.
/// These govern Admin UI capabilities (cluster CRUD, draft → publish, fleet-wide admin
/// actions) — they do NOT govern OPC UA data-path authorization, which reads
/// <see cref="Entities.NodeAcl"/> joined against LDAP group memberships directly.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/plan.md</c> decision #150 the two concerns share zero runtime code path:
/// the control plane (Admin UI) consumes <see cref="Entities.LdapGroupRoleMapping"/>; the
/// data plane consumes <see cref="Entities.NodeAcl"/> rows directly. Having them in one
/// table would collapse the distinction + let a user inherit tag permissions via their
/// admin-role claim path.
/// </remarks>
public enum AdminRole
{
    /// <summary>Read-only Admin UI access — can view cluster state, drafts, publish history.</summary>
    ConfigViewer,

    /// <summary>Can author drafts + submit for publish.</summary>
    ConfigEditor,

    /// <summary>Full Admin UI privileges including publish + fleet-admin actions.</summary>
    FleetAdmin,
}
@@ -0,0 +1,170 @@
|
|||||||
|
using LiteDB;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Generation-sealed LiteDB cache per <c>docs/v2/plan.md</c> decision #148 and Phase 6.1
|
||||||
|
/// Stream D.1. Each published generation writes one <b>read-only</b> LiteDB file under
|
||||||
|
/// <c><cache-root>/<clusterId>/<generationId>.db</c>. A per-cluster
|
||||||
|
/// <c>CURRENT</c> text file holds the currently-active generation id; it is updated
|
||||||
|
/// atomically (temp file + <see cref="File.Replace(string, string, string?)"/>) only after
|
||||||
|
/// the sealed file is fully written.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// <para>Mixed-generation reads are impossible: any read opens the single file pointed to
|
||||||
|
/// by <c>CURRENT</c>, which is a coherent snapshot. Corruption of the CURRENT file or the
|
||||||
|
/// sealed file surfaces as <see cref="GenerationCacheUnavailableException"/> — the reader
|
||||||
|
/// fails closed rather than silently falling back to an older generation. Recovery path
|
||||||
|
/// is to re-fetch from the central DB (and the Phase 6.1 Stream C <c>UsingStaleConfig</c>
|
||||||
|
/// flag goes true until that succeeds).</para>
|
||||||
|
///
|
||||||
|
/// <para>This cache is the read-path fallback when the central DB is unreachable. The
|
||||||
|
/// write path (draft edits, publish) bypasses the cache and fails hard on DB outage per
|
||||||
|
/// Stream D.2 — inconsistent writes are worse than a temporary inability to edit.</para>
|
||||||
|
/// </remarks>
|
||||||
|
public sealed class GenerationSealedCache
|
||||||
|
{
|
||||||
|
private const string CollectionName = "generation";
|
||||||
|
private const string CurrentPointerFileName = "CURRENT";
|
||||||
|
private readonly string _cacheRoot;
|
||||||
|
|
||||||
|
/// <summary>Root directory for all clusters' sealed caches.</summary>
|
||||||
|
public string CacheRoot => _cacheRoot;
|
||||||
|
|
||||||
|
public GenerationSealedCache(string cacheRoot)
|
||||||
|
{
|
||||||
|
ArgumentException.ThrowIfNullOrWhiteSpace(cacheRoot);
|
||||||
|
_cacheRoot = cacheRoot;
|
||||||
|
Directory.CreateDirectory(_cacheRoot);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Seal a generation: write the snapshot to <c><cluster>/<generationId>.db</c>,
|
||||||
|
/// mark the file read-only, then atomically publish the <c>CURRENT</c> pointer. Existing
|
||||||
|
/// sealed files for prior generations are preserved (prune separately).
|
||||||
|
/// </summary>
|
||||||
|
public async Task SealAsync(GenerationSnapshot snapshot, CancellationToken ct = default)
|
||||||
|
{
|
||||||
|
ArgumentNullException.ThrowIfNull(snapshot);
|
||||||
|
ct.ThrowIfCancellationRequested();
|
||||||
|
|
||||||
|
var clusterDir = Path.Combine(_cacheRoot, snapshot.ClusterId);
|
||||||
|
Directory.CreateDirectory(clusterDir);
|
||||||
|
var sealedPath = Path.Combine(clusterDir, $"{snapshot.GenerationId}.db");
|
||||||
|
|
||||||
|
if (File.Exists(sealedPath))
|
||||||
|
{
|
||||||
|
// Already sealed — idempotent. Treat as no-op + update pointer in case an earlier
|
||||||
|
// seal succeeded but the pointer update failed (crash recovery).
|
||||||
|
WritePointerAtomically(clusterDir, snapshot.GenerationId);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
var tmpPath = sealedPath + ".tmp";
|
||||||
|
try
|
||||||
|
{
|
||||||
|
using (var db = new LiteDatabase(new ConnectionString { Filename = tmpPath, Upgrade = false }))
|
||||||
|
{
|
||||||
|
var col = db.GetCollection<GenerationSnapshot>(CollectionName);
|
||||||
|
col.Insert(snapshot);
|
||||||
|
}
|
||||||
|
|
||||||
|
File.Move(tmpPath, sealedPath);
|
||||||
|
File.SetAttributes(sealedPath, File.GetAttributes(sealedPath) | FileAttributes.ReadOnly);
|
||||||
|
WritePointerAtomically(clusterDir, snapshot.GenerationId);
|
||||||
|
}
|
||||||
|
catch
|
||||||
|
{
|
||||||
|
try { if (File.Exists(tmpPath)) File.Delete(tmpPath); } catch { /* best-effort */ }
|
||||||
|
throw;
|
||||||
|
}
|
||||||
|
|
||||||
|
await Task.CompletedTask;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Read the current sealed snapshot for <paramref name="clusterId"/>. Throws
/// <see cref="GenerationCacheUnavailableException"/> when the pointer is missing
/// (first-boot-no-snapshot case) or unreadable, or when the sealed file it names is
/// missing or corrupt. Never silently falls back to a prior generation.
|
||||||
|
/// </summary>
|
||||||
|
public Task<GenerationSnapshot> ReadCurrentAsync(string clusterId, CancellationToken ct = default)
|
||||||
|
{
|
||||||
|
ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
|
||||||
|
ct.ThrowIfCancellationRequested();
|
||||||
|
|
||||||
|
var clusterDir = Path.Combine(_cacheRoot, clusterId);
|
||||||
|
var pointerPath = Path.Combine(clusterDir, CurrentPointerFileName);
|
||||||
|
if (!File.Exists(pointerPath))
|
||||||
|
throw new GenerationCacheUnavailableException(
|
||||||
|
$"No sealed generation for cluster '{clusterId}' at '{clusterDir}'. First-boot case: the central DB must be reachable at least once before cache fallback is possible.");
|
||||||
|
|
||||||
|
long generationId;
|
||||||
|
try
|
||||||
|
{
|
||||||
|
var text = File.ReadAllText(pointerPath).Trim();
|
||||||
|
generationId = long.Parse(text, System.Globalization.CultureInfo.InvariantCulture);
|
||||||
|
}
|
||||||
|
catch (Exception ex)
|
||||||
|
{
|
||||||
|
throw new GenerationCacheUnavailableException(
|
||||||
|
$"CURRENT pointer at '{pointerPath}' is corrupt or unreadable.", ex);
|
||||||
|
}
|
||||||
|
|
||||||
|
var sealedPath = Path.Combine(clusterDir, $"{generationId}.db");
|
||||||
|
if (!File.Exists(sealedPath))
|
||||||
|
throw new GenerationCacheUnavailableException(
|
||||||
|
$"CURRENT points at generation {generationId} but '{sealedPath}' is missing — fails closed rather than serving an older generation.");
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
using var db = new LiteDatabase(new ConnectionString { Filename = sealedPath, ReadOnly = true });
|
||||||
|
var col = db.GetCollection<GenerationSnapshot>(CollectionName);
|
||||||
|
var snapshot = col.FindAll().FirstOrDefault()
|
||||||
|
?? throw new GenerationCacheUnavailableException(
|
||||||
|
$"Sealed file '{sealedPath}' contains no snapshot row — file is corrupt.");
|
||||||
|
return Task.FromResult(snapshot);
|
||||||
|
}
|
||||||
|
catch (GenerationCacheUnavailableException) { throw; }
|
||||||
|
catch (Exception ex) when (ex is LiteException or InvalidDataException or IOException
|
||||||
|
or NotSupportedException or FormatException)
|
||||||
|
{
|
||||||
|
throw new GenerationCacheUnavailableException(
|
||||||
|
$"Sealed file '{sealedPath}' is corrupt or unreadable — fails closed rather than falling back to an older generation.", ex);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Return the generation id the <c>CURRENT</c> pointer points at, or null if no pointer exists or its content cannot be parsed.</summary>
|
||||||
|
public long? TryGetCurrentGenerationId(string clusterId)
|
||||||
|
{
|
||||||
|
ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
|
||||||
|
var pointerPath = Path.Combine(_cacheRoot, clusterId, CurrentPointerFileName);
|
||||||
|
if (!File.Exists(pointerPath)) return null;
|
||||||
|
try
|
||||||
|
{
|
||||||
|
return long.Parse(File.ReadAllText(pointerPath).Trim(), System.Globalization.CultureInfo.InvariantCulture);
|
||||||
|
}
|
||||||
|
catch
|
||||||
|
{
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void WritePointerAtomically(string clusterDir, long generationId)
|
||||||
|
{
|
||||||
|
var pointerPath = Path.Combine(clusterDir, CurrentPointerFileName);
|
||||||
|
var tmpPath = pointerPath + ".tmp";
|
||||||
|
File.WriteAllText(tmpPath, generationId.ToString(System.Globalization.CultureInfo.InvariantCulture));
|
||||||
|
if (File.Exists(pointerPath))
|
||||||
|
File.Replace(tmpPath, pointerPath, destinationBackupFileName: null);
|
||||||
|
else
|
||||||
|
File.Move(tmpPath, pointerPath);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Sealed cache is unreachable — caller must fail closed.</summary>
|
||||||
|
public sealed class GenerationCacheUnavailableException : Exception
|
||||||
|
{
|
||||||
|
public GenerationCacheUnavailableException(string message) : base(message) { }
|
||||||
|
public GenerationCacheUnavailableException(string message, Exception inner) : base(message, inner) { }
|
||||||
|
}
|
||||||
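
The pointer protocol in `GenerationSealedCache` (write a temp file, then rename it over `CURRENT`) can be tried in isolation. The following is a minimal sketch using only the BCL, not the class from this diff: `PublishPointer` and `TryReadPointer` are hypothetical helper names, and `File.Move(..., overwrite: true)` is swapped in for the `File.Replace` / `File.Move` branch used above.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: publish the pointer via temp file + rename, so a reader
// sees either the old pointer or the new one, never a partially written file.
static void PublishPointer(string clusterDir, long generationId)
{
    var pointerPath = Path.Combine(clusterDir, "CURRENT");
    var tmpPath = pointerPath + ".tmp";
    File.WriteAllText(tmpPath, generationId.ToString(CultureInfo.InvariantCulture));
    // Assumption: overwrite-on-move stands in for the File.Replace branch in the diff.
    File.Move(tmpPath, pointerPath, overwrite: true);
}

// Hypothetical helper: returns null when the pointer is absent or unparseable.
static long? TryReadPointer(string clusterDir)
{
    var pointerPath = Path.Combine(clusterDir, "CURRENT");
    if (!File.Exists(pointerPath)) return null;
    var text = File.ReadAllText(pointerPath).Trim();
    return long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var id)
        ? id
        : (long?)null;
}

var demoDir = Path.Combine(Path.GetTempPath(), "sealed-cache-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(demoDir);

var beforeFirstSeal = TryReadPointer(demoDir);   // first boot: no pointer yet
PublishPointer(demoDir, 41);
PublishPointer(demoDir, 42);                      // replaces the existing pointer
var afterSecondSeal = TryReadPointer(demoDir);

Console.WriteLine($"before: {beforeFirstSeal?.ToString() ?? "none"}, after: {afterSecondSeal}");
Directory.Delete(demoDir, recursive: true);
```

Because the sealed `.db` file is written and renamed before the pointer is published, a crash between the two steps leaves the previous `CURRENT` intact, which is the case the idempotent early-return in `SealAsync` recovers from.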
@@ -0,0 +1,90 @@
|
|||||||
|
using Microsoft.Extensions.Logging;
|
||||||
|
using Polly;
|
||||||
|
using Polly.Retry;
|
||||||
|
using Polly.Timeout;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
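/// Wraps a central-DB fetch function with Phase 6.1 Stream D.2 resilience:
/// <b>timeout 2 s → retry 3× jittered → fallback to sealed cache</b>. The timeout is added
/// to the pipeline first, so it is the outermost strategy: the 2 s budget covers the initial
/// attempt plus all retries, not each attempt individually. Maintains the
/// <see cref="StaleConfigFlag"/>: fresh on central-DB success, stale on cache fallback.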
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// <para>Read-path only per plan. The write path (draft save, publish) bypasses this
|
||||||
|
/// wrapper entirely and fails hard on DB outage so inconsistent writes never land.</para>
|
||||||
|
///
|
||||||
|
/// <para>Fallback is triggered by <b>any exception</b> the fetch raises other than
/// <see cref="OperationCanceledException"/> (central-DB unreachable, SqlException, timeout);
/// caller-requested cancellation propagates without touching the cache. If the sealed cache
/// also fails (no pointer, corrupt file, etc.), <see cref="GenerationCacheUnavailableException"/>
/// surfaces and the caller must fail the current request (InitializeAsync for a driver, etc.).</para>
|
||||||
|
/// </remarks>
|
||||||
|
public sealed class ResilientConfigReader
|
||||||
|
{
|
||||||
|
private readonly GenerationSealedCache _cache;
|
||||||
|
private readonly StaleConfigFlag _staleFlag;
|
||||||
|
private readonly ResiliencePipeline _pipeline;
|
||||||
|
private readonly ILogger<ResilientConfigReader> _logger;
|
||||||
|
|
||||||
|
public ResilientConfigReader(
|
||||||
|
GenerationSealedCache cache,
|
||||||
|
StaleConfigFlag staleFlag,
|
||||||
|
ILogger<ResilientConfigReader> logger,
|
||||||
|
TimeSpan? timeout = null,
|
||||||
|
int retryCount = 3)
|
||||||
|
{
|
||||||
|
_cache = cache;
|
||||||
|
_staleFlag = staleFlag;
|
||||||
|
_logger = logger;
|
||||||
|
var builder = new ResiliencePipelineBuilder()
|
||||||
|
.AddTimeout(new TimeoutStrategyOptions { Timeout = timeout ?? TimeSpan.FromSeconds(2) });
|
||||||
|
|
||||||
|
if (retryCount > 0)
|
||||||
|
{
|
||||||
|
builder.AddRetry(new RetryStrategyOptions
|
||||||
|
{
|
||||||
|
MaxRetryAttempts = retryCount,
|
||||||
|
BackoffType = DelayBackoffType.Exponential,
|
||||||
|
UseJitter = true,
|
||||||
|
Delay = TimeSpan.FromMilliseconds(100),
|
||||||
|
MaxDelay = TimeSpan.FromSeconds(1),
|
||||||
|
ShouldHandle = new PredicateBuilder().Handle<Exception>(ex => ex is not OperationCanceledException),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
_pipeline = builder.Build();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Execute <paramref name="centralFetch"/> through the resilience pipeline. On full failure
|
||||||
|
/// (post-retry), reads the sealed cache for <paramref name="clusterId"/> and passes the
|
||||||
|
/// snapshot to <paramref name="fromSnapshot"/> to extract the requested shape.
|
||||||
|
/// </summary>
|
||||||
|
public async ValueTask<T> ReadAsync<T>(
|
||||||
|
string clusterId,
|
||||||
|
Func<CancellationToken, ValueTask<T>> centralFetch,
|
||||||
|
Func<GenerationSnapshot, T> fromSnapshot,
|
||||||
|
CancellationToken cancellationToken)
|
||||||
|
{
|
||||||
|
ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
|
||||||
|
ArgumentNullException.ThrowIfNull(centralFetch);
|
||||||
|
ArgumentNullException.ThrowIfNull(fromSnapshot);
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
var result = await _pipeline.ExecuteAsync(centralFetch, cancellationToken).ConfigureAwait(false);
|
||||||
|
_staleFlag.MarkFresh();
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
catch (Exception ex) when (ex is not OperationCanceledException)
|
||||||
|
{
|
||||||
|
_logger.LogWarning(ex, "Central-DB read failed after retries; falling back to sealed cache for cluster {ClusterId}", clusterId);
|
||||||
|
// GenerationCacheUnavailableException surfaces intentionally — fails the caller's
|
||||||
|
// operation. StaleConfigFlag stays unchanged; the flag only flips when we actually
|
||||||
|
// served a cache snapshot.
|
||||||
|
var snapshot = await _cache.ReadCurrentAsync(clusterId, cancellationToken).ConfigureAwait(false);
|
||||||
|
_staleFlag.MarkStale();
|
||||||
|
return fromSnapshot(snapshot);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
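
Polly v8 runs strategies in the order they are added, with the first one outermost, so where `AddTimeout` sits relative to `AddRetry` decides whether the timeout is a shared budget or a per-attempt limit. The sketch below contrasts the two orderings and assumes the Polly.Core package referenced by this change. The 200 ms timeout, the always-failing `failingFetch` delegate, and the helper names are illustrative assumptions, not part of the reviewed code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;
using Polly.Timeout;

// Same retry shape as the diff. The default ShouldHandle retries every exception
// except OperationCanceledException, so it is left unset here.
static RetryStrategyOptions DemoRetry() => new()
{
    MaxRetryAttempts = 3,
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true,
    Delay = TimeSpan.FromMilliseconds(100),
    MaxDelay = TimeSpan.FromSeconds(1),
};

// Shortened from 2 s so the demo finishes quickly (assumption for illustration).
static TimeoutStrategyOptions DemoTimeout() => new() { Timeout = TimeSpan.FromMilliseconds(200) };

// Timeout added first: outermost, one budget shared by all attempts (the order used in the diff).
var sharedBudget = new ResiliencePipelineBuilder()
    .AddTimeout(DemoTimeout())
    .AddRetry(DemoRetry())
    .Build();

// Retry added first: the timeout is innermost, so every attempt gets its own budget.
var perAttempt = new ResiliencePipelineBuilder()
    .AddRetry(DemoRetry())
    .AddTimeout(DemoTimeout())
    .Build();

var attempts = 0;

// Hypothetical central fetch that always fails after roughly 150 ms.
Func<CancellationToken, ValueTask<string>> failingFetch = async ct =>
{
    attempts++;
    await Task.Delay(150, ct);
    throw new InvalidOperationException("central DB unreachable (simulated)");
};

async Task<int> CountAttemptsAsync(ResiliencePipeline pipeline)
{
    attempts = 0;
    try { await pipeline.ExecuteAsync(failingFetch, CancellationToken.None); }
    catch (Exception) { /* TimeoutRejectedException or the simulated failure */ }
    return attempts;
}

var sharedCount = await CountAttemptsAsync(sharedBudget);
var perAttemptCount = await CountAttemptsAsync(perAttempt);
Console.WriteLine($"shared budget: {sharedCount} attempt(s); per-attempt timeout: {perAttemptCount} attempt(s)");
```

With the shared budget the pipeline gives up once the single timeout elapses, which bounds how long a caller waits before the sealed-cache fallback. The per-attempt ordering lets all retries run but can take several times longer in the worst case.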
@@ -0,0 +1,20 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Thread-safe <c>UsingStaleConfig</c> signal per Phase 6.1 Stream D.3. Flips true whenever
|
||||||
|
/// a read falls back to a sealed cache snapshot; flips false on the next successful central-DB
|
||||||
|
/// round-trip. Surfaced on <c>/healthz</c> body and on the Admin <c>/hosts</c> page.
|
||||||
|
/// </summary>
|
||||||
|
public sealed class StaleConfigFlag
|
||||||
|
{
|
||||||
|
private int _stale;
|
||||||
|
|
||||||
|
/// <summary>True when the last config read was served from the sealed cache, not the central DB.</summary>
|
||||||
|
public bool IsStale => Volatile.Read(ref _stale) != 0;
|
||||||
|
|
||||||
|
/// <summary>Mark the current config as stale (a read fell back to the cache).</summary>
|
||||||
|
public void MarkStale() => Volatile.Write(ref _stale, 1);
|
||||||
|
|
||||||
|
/// <summary>Mark the current config as fresh (a central-DB read succeeded).</summary>
|
||||||
|
public void MarkFresh() => Volatile.Write(ref _stale, 0);
|
||||||
|
}
|
||||||
1287
src/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260419124034_AddDriverInstanceResilienceStatus.Designer.cs
generated
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,46 @@
|
|||||||
|
using System;
|
||||||
|
using Microsoft.EntityFrameworkCore.Migrations;
|
||||||
|
|
||||||
|
#nullable disable
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
|
||||||
|
{
|
||||||
|
/// <inheritdoc />
|
||||||
|
public partial class AddDriverInstanceResilienceStatus : Migration
|
||||||
|
{
|
||||||
|
/// <inheritdoc />
|
||||||
|
protected override void Up(MigrationBuilder migrationBuilder)
|
||||||
|
{
|
||||||
|
migrationBuilder.CreateTable(
|
||||||
|
name: "DriverInstanceResilienceStatus",
|
||||||
|
columns: table => new
|
||||||
|
{
|
||||||
|
DriverInstanceId = table.Column<string>(type: "nvarchar(64)", maxLength: 64, nullable: false),
|
||||||
|
HostName = table.Column<string>(type: "nvarchar(256)", maxLength: 256, nullable: false),
|
||||||
|
LastCircuitBreakerOpenUtc = table.Column<DateTime>(type: "datetime2(3)", nullable: true),
|
||||||
|
ConsecutiveFailures = table.Column<int>(type: "int", nullable: false),
|
||||||
|
CurrentBulkheadDepth = table.Column<int>(type: "int", nullable: false),
|
||||||
|
LastRecycleUtc = table.Column<DateTime>(type: "datetime2(3)", nullable: true),
|
||||||
|
BaselineFootprintBytes = table.Column<long>(type: "bigint", nullable: false),
|
||||||
|
CurrentFootprintBytes = table.Column<long>(type: "bigint", nullable: false),
|
||||||
|
LastSampledUtc = table.Column<DateTime>(type: "datetime2(3)", nullable: false)
|
||||||
|
},
|
||||||
|
constraints: table =>
|
||||||
|
{
|
||||||
|
table.PrimaryKey("PK_DriverInstanceResilienceStatus", x => new { x.DriverInstanceId, x.HostName });
|
||||||
|
});
|
||||||
|
|
||||||
|
migrationBuilder.CreateIndex(
|
||||||
|
name: "IX_DriverResilience_LastSampled",
|
||||||
|
table: "DriverInstanceResilienceStatus",
|
||||||
|
column: "LastSampledUtc");
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <inheritdoc />
|
||||||
|
protected override void Down(MigrationBuilder migrationBuilder)
|
||||||
|
{
|
||||||
|
migrationBuilder.DropTable(
|
||||||
|
name: "DriverInstanceResilienceStatus");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
1342
src/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260419131444_AddLdapGroupRoleMapping.Designer.cs
generated
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,62 @@
|
|||||||
|
using System;
|
||||||
|
using Microsoft.EntityFrameworkCore.Migrations;
|
||||||
|
|
||||||
|
#nullable disable
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
|
||||||
|
{
|
||||||
|
/// <inheritdoc />
|
||||||
|
public partial class AddLdapGroupRoleMapping : Migration
|
||||||
|
{
|
||||||
|
/// <inheritdoc />
|
||||||
|
protected override void Up(MigrationBuilder migrationBuilder)
|
||||||
|
{
|
||||||
|
migrationBuilder.CreateTable(
|
||||||
|
name: "LdapGroupRoleMapping",
|
||||||
|
columns: table => new
|
||||||
|
{
|
||||||
|
Id = table.Column<Guid>(type: "uniqueidentifier", nullable: false),
|
||||||
|
LdapGroup = table.Column<string>(type: "nvarchar(512)", maxLength: 512, nullable: false),
|
||||||
|
Role = table.Column<string>(type: "nvarchar(32)", maxLength: 32, nullable: false),
|
||||||
|
ClusterId = table.Column<string>(type: "nvarchar(64)", maxLength: 64, nullable: true),
|
||||||
|
IsSystemWide = table.Column<bool>(type: "bit", nullable: false),
|
||||||
|
CreatedAtUtc = table.Column<DateTime>(type: "datetime2(3)", nullable: false),
|
||||||
|
Notes = table.Column<string>(type: "nvarchar(512)", maxLength: 512, nullable: true)
|
||||||
|
},
|
||||||
|
constraints: table =>
|
||||||
|
{
|
||||||
|
table.PrimaryKey("PK_LdapGroupRoleMapping", x => x.Id);
|
||||||
|
table.ForeignKey(
|
||||||
|
name: "FK_LdapGroupRoleMapping_ServerCluster_ClusterId",
|
||||||
|
column: x => x.ClusterId,
|
||||||
|
principalTable: "ServerCluster",
|
||||||
|
principalColumn: "ClusterId",
|
||||||
|
onDelete: ReferentialAction.Cascade);
|
||||||
|
});
|
||||||
|
|
||||||
|
migrationBuilder.CreateIndex(
|
||||||
|
name: "IX_LdapGroupRoleMapping_ClusterId",
|
||||||
|
table: "LdapGroupRoleMapping",
|
||||||
|
column: "ClusterId");
|
||||||
|
|
||||||
|
migrationBuilder.CreateIndex(
|
||||||
|
name: "IX_LdapGroupRoleMapping_Group",
|
||||||
|
table: "LdapGroupRoleMapping",
|
||||||
|
column: "LdapGroup");
|
||||||
|
|
||||||
|
migrationBuilder.CreateIndex(
|
||||||
|
name: "UX_LdapGroupRoleMapping_Group_Cluster",
|
||||||
|
table: "LdapGroupRoleMapping",
|
||||||
|
columns: new[] { "LdapGroup", "ClusterId" },
|
||||||
|
unique: true,
|
||||||
|
filter: "[ClusterId] IS NOT NULL");
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <inheritdoc />
|
||||||
|
protected override void Down(MigrationBuilder migrationBuilder)
|
||||||
|
{
|
||||||
|
migrationBuilder.DropTable(
|
||||||
|
name: "LdapGroupRoleMapping");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -434,6 +434,45 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
|
|||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.DriverInstanceResilienceStatus", b =>
|
||||||
|
{
|
||||||
|
b.Property<string>("DriverInstanceId")
|
||||||
|
.HasMaxLength(64)
|
||||||
|
.HasColumnType("nvarchar(64)");
|
||||||
|
|
||||||
|
b.Property<string>("HostName")
|
||||||
|
.HasMaxLength(256)
|
||||||
|
.HasColumnType("nvarchar(256)");
|
||||||
|
|
||||||
|
b.Property<long>("BaselineFootprintBytes")
|
||||||
|
.HasColumnType("bigint");
|
||||||
|
|
||||||
|
b.Property<int>("ConsecutiveFailures")
|
||||||
|
.HasColumnType("int");
|
||||||
|
|
||||||
|
b.Property<int>("CurrentBulkheadDepth")
|
||||||
|
.HasColumnType("int");
|
||||||
|
|
||||||
|
b.Property<long>("CurrentFootprintBytes")
|
||||||
|
.HasColumnType("bigint");
|
||||||
|
|
||||||
|
b.Property<DateTime?>("LastCircuitBreakerOpenUtc")
|
||||||
|
.HasColumnType("datetime2(3)");
|
||||||
|
|
||||||
|
b.Property<DateTime?>("LastRecycleUtc")
|
||||||
|
.HasColumnType("datetime2(3)");
|
||||||
|
|
||||||
|
b.Property<DateTime>("LastSampledUtc")
|
||||||
|
.HasColumnType("datetime2(3)");
|
||||||
|
|
||||||
|
b.HasKey("DriverInstanceId", "HostName");
|
||||||
|
|
||||||
|
b.HasIndex("LastSampledUtc")
|
||||||
|
.HasDatabaseName("IX_DriverResilience_LastSampled");
|
||||||
|
|
||||||
|
b.ToTable("DriverInstanceResilienceStatus", (string)null);
|
||||||
|
});
|
||||||
|
|
||||||
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Equipment", b =>
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Equipment", b =>
|
||||||
{
|
{
|
||||||
b.Property<Guid>("EquipmentRowId")
|
b.Property<Guid>("EquipmentRowId")
|
||||||
@@ -624,6 +663,51 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
|
|||||||
b.ToTable("ExternalIdReservation", (string)null);
|
b.ToTable("ExternalIdReservation", (string)null);
|
||||||
});
|
});
|
||||||
|
|
||||||
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.LdapGroupRoleMapping", b =>
|
||||||
|
{
|
||||||
|
b.Property<Guid>("Id")
|
||||||
|
.ValueGeneratedOnAdd()
|
||||||
|
.HasColumnType("uniqueidentifier");
|
||||||
|
|
||||||
|
b.Property<string>("ClusterId")
|
||||||
|
.HasMaxLength(64)
|
||||||
|
.HasColumnType("nvarchar(64)");
|
||||||
|
|
||||||
|
b.Property<DateTime>("CreatedAtUtc")
|
||||||
|
.HasColumnType("datetime2(3)");
|
||||||
|
|
||||||
|
b.Property<bool>("IsSystemWide")
|
||||||
|
.HasColumnType("bit");
|
||||||
|
|
||||||
|
b.Property<string>("LdapGroup")
|
||||||
|
.IsRequired()
|
||||||
|
.HasMaxLength(512)
|
||||||
|
.HasColumnType("nvarchar(512)");
|
||||||
|
|
||||||
|
b.Property<string>("Notes")
|
||||||
|
.HasMaxLength(512)
|
||||||
|
.HasColumnType("nvarchar(512)");
|
||||||
|
|
||||||
|
b.Property<string>("Role")
|
||||||
|
.IsRequired()
|
||||||
|
.HasMaxLength(32)
|
||||||
|
.HasColumnType("nvarchar(32)");
|
||||||
|
|
||||||
|
b.HasKey("Id");
|
||||||
|
|
||||||
|
b.HasIndex("ClusterId");
|
||||||
|
|
||||||
|
b.HasIndex("LdapGroup")
|
||||||
|
.HasDatabaseName("IX_LdapGroupRoleMapping_Group");
|
||||||
|
|
||||||
|
b.HasIndex("LdapGroup", "ClusterId")
|
||||||
|
.IsUnique()
|
||||||
|
.HasDatabaseName("UX_LdapGroupRoleMapping_Group_Cluster")
|
||||||
|
.HasFilter("[ClusterId] IS NOT NULL");
|
||||||
|
|
||||||
|
b.ToTable("LdapGroupRoleMapping", (string)null);
|
||||||
|
});
|
||||||
|
|
||||||
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Namespace", b =>
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Namespace", b =>
|
||||||
{
|
{
|
||||||
b.Property<Guid>("NamespaceRowId")
|
b.Property<Guid>("NamespaceRowId")
|
||||||
@@ -1142,6 +1226,16 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
|
|||||||
b.Navigation("Generation");
|
b.Navigation("Generation");
|
||||||
});
|
});
|
||||||
|
|
||||||
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.LdapGroupRoleMapping", b =>
|
||||||
|
{
|
||||||
|
b.HasOne("ZB.MOM.WW.OtOpcUa.Configuration.Entities.ServerCluster", "Cluster")
|
||||||
|
.WithMany()
|
||||||
|
.HasForeignKey("ClusterId")
|
||||||
|
.OnDelete(DeleteBehavior.Cascade);
|
||||||
|
|
||||||
|
b.Navigation("Cluster");
|
||||||
|
});
|
||||||
|
|
||||||
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Namespace", b =>
|
modelBuilder.Entity("ZB.MOM.WW.OtOpcUa.Configuration.Entities.Namespace", b =>
|
||||||
{
|
{
|
||||||
b.HasOne("ZB.MOM.WW.OtOpcUa.Configuration.Entities.ServerCluster", "Cluster")
|
b.HasOne("ZB.MOM.WW.OtOpcUa.Configuration.Entities.ServerCluster", "Cluster")
|
||||||
|
|||||||
@@ -28,6 +28,8 @@ public sealed class OtOpcUaConfigDbContext(DbContextOptions<OtOpcUaConfigDbConte
|
|||||||
public DbSet<ConfigAuditLog> ConfigAuditLogs => Set<ConfigAuditLog>();
|
public DbSet<ConfigAuditLog> ConfigAuditLogs => Set<ConfigAuditLog>();
|
||||||
public DbSet<ExternalIdReservation> ExternalIdReservations => Set<ExternalIdReservation>();
|
public DbSet<ExternalIdReservation> ExternalIdReservations => Set<ExternalIdReservation>();
|
||||||
public DbSet<DriverHostStatus> DriverHostStatuses => Set<DriverHostStatus>();
|
public DbSet<DriverHostStatus> DriverHostStatuses => Set<DriverHostStatus>();
|
||||||
|
public DbSet<DriverInstanceResilienceStatus> DriverInstanceResilienceStatuses => Set<DriverInstanceResilienceStatus>();
|
||||||
|
public DbSet<LdapGroupRoleMapping> LdapGroupRoleMappings => Set<LdapGroupRoleMapping>();
|
||||||
|
|
||||||
protected override void OnModelCreating(ModelBuilder modelBuilder)
|
protected override void OnModelCreating(ModelBuilder modelBuilder)
|
||||||
{
|
{
|
||||||
@@ -49,6 +51,8 @@ public sealed class OtOpcUaConfigDbContext(DbContextOptions<OtOpcUaConfigDbConte
|
|||||||
ConfigureConfigAuditLog(modelBuilder);
|
ConfigureConfigAuditLog(modelBuilder);
|
||||||
ConfigureExternalIdReservation(modelBuilder);
|
ConfigureExternalIdReservation(modelBuilder);
|
||||||
ConfigureDriverHostStatus(modelBuilder);
|
ConfigureDriverHostStatus(modelBuilder);
|
||||||
|
ConfigureDriverInstanceResilienceStatus(modelBuilder);
|
||||||
|
ConfigureLdapGroupRoleMapping(modelBuilder);
|
||||||
}
|
}
|
||||||
|
|
||||||
private static void ConfigureServerCluster(ModelBuilder modelBuilder)
|
private static void ConfigureServerCluster(ModelBuilder modelBuilder)
|
||||||
@@ -512,4 +516,53 @@ public sealed class OtOpcUaConfigDbContext(DbContextOptions<OtOpcUaConfigDbConte
|
|||||||
e.HasIndex(x => x.LastSeenUtc).HasDatabaseName("IX_DriverHostStatus_LastSeen");
|
e.HasIndex(x => x.LastSeenUtc).HasDatabaseName("IX_DriverHostStatus_LastSeen");
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private static void ConfigureDriverInstanceResilienceStatus(ModelBuilder modelBuilder)
|
||||||
|
{
|
||||||
|
modelBuilder.Entity<DriverInstanceResilienceStatus>(e =>
|
||||||
|
{
|
||||||
|
e.ToTable("DriverInstanceResilienceStatus");
|
||||||
|
e.HasKey(x => new { x.DriverInstanceId, x.HostName });
|
||||||
|
e.Property(x => x.DriverInstanceId).HasMaxLength(64);
|
||||||
|
e.Property(x => x.HostName).HasMaxLength(256);
|
||||||
|
e.Property(x => x.LastCircuitBreakerOpenUtc).HasColumnType("datetime2(3)");
|
||||||
|
e.Property(x => x.LastRecycleUtc).HasColumnType("datetime2(3)");
|
||||||
|
e.Property(x => x.LastSampledUtc).HasColumnType("datetime2(3)");
|
||||||
|
// LastSampledUtc drives the Admin UI's stale-sample filter same way DriverHostStatus's
|
||||||
|
// LastSeenUtc index does for connectivity rows.
|
||||||
|
e.HasIndex(x => x.LastSampledUtc).HasDatabaseName("IX_DriverResilience_LastSampled");
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void ConfigureLdapGroupRoleMapping(ModelBuilder modelBuilder)
|
||||||
|
{
|
||||||
|
modelBuilder.Entity<LdapGroupRoleMapping>(e =>
|
||||||
|
{
|
||||||
|
e.ToTable("LdapGroupRoleMapping");
|
||||||
|
e.HasKey(x => x.Id);
|
||||||
|
e.Property(x => x.LdapGroup).HasMaxLength(512).IsRequired();
|
||||||
|
e.Property(x => x.Role).HasConversion<string>().HasMaxLength(32);
|
||||||
|
e.Property(x => x.ClusterId).HasMaxLength(64);
|
||||||
|
e.Property(x => x.CreatedAtUtc).HasColumnType("datetime2(3)");
|
||||||
|
e.Property(x => x.Notes).HasMaxLength(512);
|
||||||
|
|
||||||
|
// FK to ServerCluster when cluster-scoped; null for system-wide grants.
|
||||||
|
e.HasOne(x => x.Cluster)
|
||||||
|
.WithMany()
|
||||||
|
.HasForeignKey(x => x.ClusterId)
|
||||||
|
.OnDelete(DeleteBehavior.Cascade);
|
||||||
|
|
||||||
|
// Uniqueness: at most one row per (LdapGroup, ClusterId) among cluster-scoped rows.
// SQL Server treats NULLs as equal in a unique index, so the EF Core SQL Server provider
// emits this index with the filter [ClusterId] IS NOT NULL (see the generated migration).
// System-wide rows (null ClusterId) therefore sit outside the index: they coexist with
// cluster-scoped rows for the same group, but the index does not deduplicate them.
// The resulting behaviour is tested in SchemaCompliance.
|
||||||
|
e.HasIndex(x => new { x.LdapGroup, x.ClusterId })
|
||||||
|
.IsUnique()
|
||||||
|
.HasDatabaseName("UX_LdapGroupRoleMapping_Group_Cluster");
|
||||||
|
|
||||||
|
// Hot-path lookup during cookie auth: "what grants does this user's set of LDAP
|
||||||
|
// groups carry?". Fires on every sign-in so the index earns its keep.
|
||||||
|
e.HasIndex(x => x.LdapGroup).HasDatabaseName("IX_LdapGroupRoleMapping_Group");
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -0,0 +1,47 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.Services;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// CRUD surface for <see cref="LdapGroupRoleMapping"/> — the control-plane mapping from
|
||||||
|
/// LDAP groups to Admin UI roles. Consumed only by Admin UI code paths; the OPC UA
|
||||||
|
/// data-path evaluator MUST NOT depend on this interface (see decision #150 and the
|
||||||
|
/// Phase 6.2 compliance check on control/data-plane separation).
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per Phase 6.2 Stream A.2 this service is expected to run behind the Phase 6.1
|
||||||
|
/// <c>ResilientConfigReader</c> pipeline (timeout → retry → fallback-to-cache) so a
|
||||||
|
/// transient DB outage during sign-in falls back to the sealed snapshot rather than
|
||||||
|
/// denying every login.
|
||||||
|
/// </remarks>
|
||||||
|
public interface ILdapGroupRoleMappingService
|
||||||
|
{
|
||||||
|
/// <summary>List every mapping whose LDAP group matches one of <paramref name="ldapGroups"/>.</summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Hot path — fires on every sign-in. The default EF implementation relies on the
|
||||||
|
/// <c>IX_LdapGroupRoleMapping_Group</c> index. Case-insensitive per LDAP conventions.
|
||||||
|
/// </remarks>
|
||||||
|
Task<IReadOnlyList<LdapGroupRoleMapping>> GetByGroupsAsync(
|
||||||
|
IEnumerable<string> ldapGroups, CancellationToken cancellationToken);
|
||||||
|
|
||||||
|
/// <summary>Enumerate every mapping; Admin UI listing only.</summary>
|
||||||
|
Task<IReadOnlyList<LdapGroupRoleMapping>> ListAllAsync(CancellationToken cancellationToken);
|
||||||
|
|
||||||
|
/// <summary>Create a new grant.</summary>
|
||||||
|
/// <exception cref="InvalidLdapGroupRoleMappingException">
|
||||||
|
/// Thrown when the proposed row violates an invariant (IsSystemWide inconsistent with
|
||||||
|
/// ClusterId, duplicate (group, cluster) pair, etc.) — ValidatedLdapGroupRoleMappingService
|
||||||
|
/// is the write surface that enforces these; the raw service here surfaces DB-level violations.
|
||||||
|
/// </exception>
|
||||||
|
Task<LdapGroupRoleMapping> CreateAsync(LdapGroupRoleMapping row, CancellationToken cancellationToken);
|
||||||
|
|
||||||
|
/// <summary>Delete a mapping by its surrogate key.</summary>
|
||||||
|
Task DeleteAsync(Guid id, CancellationToken cancellationToken);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Thrown when <see cref="LdapGroupRoleMapping"/> authoring violates an invariant.</summary>
|
||||||
|
public sealed class InvalidLdapGroupRoleMappingException : Exception
|
||||||
|
{
|
||||||
|
public InvalidLdapGroupRoleMappingException(string message) : base(message) { }
|
||||||
|
}
|
||||||
@@ -0,0 +1,69 @@
|
|||||||
|
using Microsoft.EntityFrameworkCore;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.Services;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// EF Core implementation of <see cref="ILdapGroupRoleMappingService"/>. Enforces the
|
||||||
|
/// "exactly one of (ClusterId, IsSystemWide)" invariant at the write surface so a
|
||||||
|
/// malformed row can't land in the DB.
|
||||||
|
/// </summary>
|
||||||
|
public sealed class LdapGroupRoleMappingService(OtOpcUaConfigDbContext db) : ILdapGroupRoleMappingService
|
||||||
|
{
|
||||||
|
public async Task<IReadOnlyList<LdapGroupRoleMapping>> GetByGroupsAsync(
|
||||||
|
IEnumerable<string> ldapGroups, CancellationToken cancellationToken)
|
||||||
|
{
|
||||||
|
ArgumentNullException.ThrowIfNull(ldapGroups);
|
||||||
|
var groupSet = ldapGroups.ToList();
|
||||||
|
if (groupSet.Count == 0) return [];
|
||||||
|
|
||||||
|
return await db.LdapGroupRoleMappings
|
||||||
|
.AsNoTracking()
|
||||||
|
.Where(m => groupSet.Contains(m.LdapGroup))
|
||||||
|
.ToListAsync(cancellationToken)
|
||||||
|
.ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
public async Task<IReadOnlyList<LdapGroupRoleMapping>> ListAllAsync(CancellationToken cancellationToken)
|
||||||
|
=> await db.LdapGroupRoleMappings
|
||||||
|
.AsNoTracking()
|
||||||
|
.OrderBy(m => m.LdapGroup)
|
||||||
|
.ThenBy(m => m.ClusterId)
|
||||||
|
.ToListAsync(cancellationToken)
|
||||||
|
.ConfigureAwait(false);
|
||||||
|
|
||||||
|
public async Task<LdapGroupRoleMapping> CreateAsync(LdapGroupRoleMapping row, CancellationToken cancellationToken)
|
||||||
|
{
|
||||||
|
ArgumentNullException.ThrowIfNull(row);
|
||||||
|
ValidateInvariants(row);
|
||||||
|
|
||||||
|
if (row.Id == Guid.Empty) row.Id = Guid.NewGuid();
|
||||||
|
if (row.CreatedAtUtc == default) row.CreatedAtUtc = DateTime.UtcNow;
|
||||||
|
|
||||||
|
db.LdapGroupRoleMappings.Add(row);
|
||||||
|
await db.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
|
||||||
|
return row;
|
||||||
|
}
|
||||||
|
|
||||||
|
public async Task DeleteAsync(Guid id, CancellationToken cancellationToken)
|
||||||
|
{
|
||||||
|
var existing = await db.LdapGroupRoleMappings.FindAsync([id], cancellationToken).ConfigureAwait(false);
|
||||||
|
if (existing is null) return;
|
||||||
|
db.LdapGroupRoleMappings.Remove(existing);
|
||||||
|
await db.SaveChangesAsync(cancellationToken).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void ValidateInvariants(LdapGroupRoleMapping row)
|
||||||
|
{
|
||||||
|
if (string.IsNullOrWhiteSpace(row.LdapGroup))
|
||||||
|
throw new InvalidLdapGroupRoleMappingException("LdapGroup must not be empty.");
|
||||||
|
|
||||||
|
if (row.IsSystemWide && !string.IsNullOrEmpty(row.ClusterId))
|
||||||
|
throw new InvalidLdapGroupRoleMappingException(
|
||||||
|
"IsSystemWide=true requires ClusterId to be null. A fleet-wide grant cannot also be cluster-scoped.");
|
||||||
|
|
||||||
|
if (!row.IsSystemWide && string.IsNullOrEmpty(row.ClusterId))
|
||||||
|
throw new InvalidLdapGroupRoleMappingException(
|
||||||
|
"IsSystemWide=false requires a populated ClusterId. A cluster-scoped grant needs its target cluster.");
|
||||||
|
}
|
||||||
|
}
|
||||||
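Reviewer note: the two scope checks in `ValidateInvariants` amount to "exactly one of system-wide or cluster-scoped must hold". The sketch below restates that rule in isolation. It assumes a hypothetical trimmed-down `MappingRow` record in place of the real `LdapGroupRoleMapping` entity and returns a message instead of throwing `InvalidLdapGroupRoleMappingException`; neither name is part of this change.

```csharp
using System;

// Hypothetical stand-in for LdapGroupRoleMapping; only the fields the invariants touch.
public sealed record MappingRow(string LdapGroup, bool IsSystemWide, string? ClusterId);

public static class MappingInvariants
{
    // Returns null when the row is valid, otherwise the reason it is rejected.
    public static string? Check(MappingRow row)
    {
        if (string.IsNullOrWhiteSpace(row.LdapGroup))
            return "LdapGroup must not be empty.";

        bool hasCluster = !string.IsNullOrEmpty(row.ClusterId);

        // Exactly one of (system-wide, cluster-scoped) must hold.
        if (row.IsSystemWide && hasCluster)
            return "IsSystemWide=true requires ClusterId to be null.";
        if (!row.IsSystemWide && !hasCluster)
            return "IsSystemWide=false requires a populated ClusterId.";

        return null;
    }
}
```

A fleet-wide row (`IsSystemWide = true`, `ClusterId = null`) and a cluster-scoped row (`IsSystemWide = false`, `ClusterId = "c1"`) both pass; the other two combinations are rejected.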
@@ -19,7 +19,9 @@
       <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
     </PackageReference>
     <PackageReference Include="Microsoft.Extensions.Configuration.Abstractions" Version="10.0.0"/>
+    <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.0"/>
     <PackageReference Include="LiteDB" Version="5.0.21"/>
+    <PackageReference Include="Polly.Core" Version="8.6.6"/>
   </ItemGroup>
 
   <ItemGroup>
@@ -25,6 +25,14 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
|||||||
/// OPC UA <c>AlarmConditionState</c> when true. Defaults to false so existing non-Galaxy
|
/// OPC UA <c>AlarmConditionState</c> when true. Defaults to false so existing non-Galaxy
|
||||||
/// drivers aren't forced to flow a flag they don't produce.
|
/// drivers aren't forced to flow a flag they don't produce.
|
||||||
/// </param>
|
/// </param>
|
||||||
|
/// <param name="WriteIdempotent">
|
||||||
|
/// True when a timed-out or failed write to this attribute is safe to replay. Per
|
||||||
|
/// <c>docs/v2/plan.md</c> decisions #44, #45, #143 — writes are NOT auto-retried by default
|
||||||
|
/// because replaying a pulse / alarm-ack / counter-increment / recipe-step advance can
|
||||||
|
/// duplicate field actions. Drivers flag only tags whose semantics make retry safe
|
||||||
|
/// (holding registers with level-set values, set-point writes to analog tags) — the
|
||||||
|
/// capability invoker respects this flag when deciding whether to apply Polly retry.
|
||||||
|
/// </param>
|
||||||
public sealed record DriverAttributeInfo(
|
public sealed record DriverAttributeInfo(
|
||||||
string FullName,
|
string FullName,
|
||||||
DriverDataType DriverDataType,
|
DriverDataType DriverDataType,
|
||||||
@@ -32,4 +40,5 @@ public sealed record DriverAttributeInfo(
     uint? ArrayDim,
     SecurityClassification SecurityClass,
     bool IsHistorized,
-    bool IsAlarm = false);
+    bool IsAlarm = false,
+    bool WriteIdempotent = false);
42
src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverCapability.cs
Normal file
@@ -0,0 +1,42 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Enumerates the driver-capability surface points guarded by Phase 6.1 resilience pipelines.
|
||||||
|
/// Each value corresponds to one method (or tightly-related method group) on the
|
||||||
|
/// <c>Core.Abstractions</c> capability interfaces (<see cref="IReadable"/>, <see cref="IWritable"/>,
|
||||||
|
/// <see cref="ITagDiscovery"/>, <see cref="ISubscribable"/>, <see cref="IHostConnectivityProbe"/>,
|
||||||
|
/// <see cref="IAlarmSource"/>, <see cref="IHistoryProvider"/>).
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decision #143 (per-capability retry policy): Read / HistoryRead /
|
||||||
|
/// Discover / Probe / AlarmSubscribe auto-retry; <see cref="Write"/> does NOT retry unless the
|
||||||
|
/// tag-definition carries <see cref="WriteIdempotentAttribute"/>. Alarm-acknowledge is treated
|
||||||
|
/// as a write for retry semantics (an alarm-ack is not idempotent at the plant-floor acknowledgement
|
||||||
|
/// level even if the OPC UA spec permits re-issue).
|
||||||
|
/// </remarks>
|
||||||
|
public enum DriverCapability
|
||||||
|
{
|
||||||
|
/// <summary>Batch <see cref="IReadable.ReadAsync"/>. Retries by default.</summary>
|
||||||
|
Read,
|
||||||
|
|
||||||
|
/// <summary>Batch <see cref="IWritable.WriteAsync"/>. Does not retry unless tag is <see cref="WriteIdempotentAttribute">idempotent</see>.</summary>
|
||||||
|
Write,
|
||||||
|
|
||||||
|
/// <summary><see cref="ITagDiscovery.DiscoverAsync"/>. Retries by default.</summary>
|
||||||
|
Discover,
|
||||||
|
|
||||||
|
/// <summary><see cref="ISubscribable.SubscribeAsync"/> and unsubscribe. Retries by default.</summary>
|
||||||
|
Subscribe,
|
||||||
|
|
||||||
|
/// <summary><see cref="IHostConnectivityProbe"/> probe loop. Retries by default.</summary>
|
||||||
|
Probe,
|
||||||
|
|
||||||
|
/// <summary><see cref="IAlarmSource.SubscribeAlarmsAsync"/>. Retries by default.</summary>
|
||||||
|
AlarmSubscribe,
|
||||||
|
|
||||||
|
/// <summary><see cref="IAlarmSource.AcknowledgeAsync"/>. Does NOT retry — ack is a write-shaped operation (decision #143).</summary>
|
||||||
|
AlarmAcknowledge,
|
||||||
|
|
||||||
|
/// <summary><see cref="IHistoryProvider"/> reads (Raw/Processed/AtTime/Events). Retries by default.</summary>
|
||||||
|
HistoryRead,
|
||||||
|
}
|
||||||
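Reviewer note: the retry rule described in the `DriverCapability` remarks can be read as a single decision function. This is a minimal sketch under stated assumptions: `Capability` is a hypothetical local copy of the enum so the snippet compiles on its own, and `RetryRules.MayRetry` is an illustrative helper, not the `CapabilityInvoker` from this PR.

```csharp
using System;

// Hypothetical local copy of DriverCapability so the sketch is self-contained.
public enum Capability
{
    Read, Write, Discover, Subscribe, Probe, AlarmSubscribe, AlarmAcknowledge, HistoryRead
}

public static class RetryRules
{
    // True when a failed call for this capability may be replayed.
    public static bool MayRetry(Capability capability, bool writeIdempotent = false)
    {
        switch (capability)
        {
            case Capability.Write:
                // Writes replay only when the tag was flagged as safe to replay.
                return writeIdempotent;
            case Capability.AlarmAcknowledge:
                // Ack is write-shaped and is never replayed.
                return false;
            default:
                // Read, Discover, Subscribe, Probe, AlarmSubscribe, HistoryRead.
                return true;
        }
    }
}
```

Keeping the rule in one function means a new capability has to be classified explicitly if it should not fall into the retry-by-default branch.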
34
src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTier.cs
Normal file
@@ -0,0 +1,34 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Stability tier of a driver type. Determines which cross-cutting runtime protections
|
||||||
|
/// apply — per-tier retry defaults, memory-tracking thresholds, and whether out-of-process
|
||||||
|
/// supervision with process-level recycle is in play.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per <c>docs/v2/driver-stability.md</c> §2-4 and <c>docs/v2/plan.md</c> decisions #63-74.
|
||||||
|
///
|
||||||
|
/// <list type="bullet">
|
||||||
|
/// <item><b>A</b> — managed, known-good SDK; low blast radius. In-process. Fast retries.
|
||||||
|
/// Examples: OPC UA Client (OPCFoundation stack), S7 (S7NetPlus).</item>
|
||||||
|
/// <item><b>B</b> — native or semi-trusted SDK with an in-process footprint. Examples: Modbus.</item>
|
||||||
|
/// <item><b>C</b> — unmanaged SDK with COM/STA constraints, leak risk, or other out-of-process
|
||||||
|
/// requirements. Must run as a separate Host process behind a Proxy with a supervisor that
|
||||||
|
/// can recycle the process on hard-breach. Example: Galaxy (MXAccess COM).</item>
|
||||||
|
/// </list>
|
||||||
|
///
|
||||||
|
/// <para>Process-kill protections (<c>MemoryRecycle</c>, <c>ScheduledRecycleScheduler</c>) are
|
||||||
|
/// Tier C only per decisions #73-74 and #145 — killing an in-process Tier A/B driver also kills
|
||||||
|
/// every OPC UA session and every co-hosted driver, a blast radius worse than the leak itself.</para>
|
||||||
|
/// </remarks>
|
||||||
|
public enum DriverTier
|
||||||
|
{
|
||||||
|
/// <summary>Managed SDK, in-process, low blast radius.</summary>
|
||||||
|
A,
|
||||||
|
|
||||||
|
/// <summary>Native or semi-trusted SDK, in-process.</summary>
|
||||||
|
B,
|
||||||
|
|
||||||
|
/// <summary>Unmanaged SDK, out-of-process required with Proxy+Host+Supervisor.</summary>
|
||||||
|
C,
|
||||||
|
}
|
||||||
@@ -69,12 +69,20 @@ public sealed class DriverTypeRegistry
 /// <param name="DriverConfigJsonSchema">JSON Schema (Draft 2020-12) the driver's <c>DriverConfig</c> column must validate against.</param>
 /// <param name="DeviceConfigJsonSchema">JSON Schema for <c>DeviceConfig</c> (multi-device drivers); null if the driver has no device layer.</param>
 /// <param name="TagConfigJsonSchema">JSON Schema for <c>TagConfig</c>; required for every driver since every driver has tags.</param>
+/// <param name="Tier">
+/// Stability tier per <c>docs/v2/driver-stability.md</c> §2-4 and <c>docs/v2/plan.md</c>
+/// decisions #63-74. Drives the shared resilience pipeline defaults
+/// (<see cref="Tier"/> × capability → <c>CapabilityPolicy</c>), the <c>MemoryTracking</c>
+/// hybrid-formula constants, and whether process-level <c>MemoryRecycle</c> / scheduled-
+/// recycle protections apply (Tier C only). Every registered driver type must declare one.
+/// </param>
 public sealed record DriverTypeMetadata(
     string TypeName,
     NamespaceKindCompatibility AllowedNamespaceKinds,
     string DriverConfigJsonSchema,
     string? DeviceConfigJsonSchema,
-    string TagConfigJsonSchema);
+    string TagConfigJsonSchema,
+    DriverTier Tier);
 
 /// <summary>Bitmask of namespace kinds a driver type may populate. Per decision #111.</summary>
 [Flags]
26
src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverSupervisor.cs
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Process-level supervisor contract a Tier C driver's out-of-process topology provides
|
||||||
|
/// (e.g. <c>Driver.Galaxy.Proxy/Supervisor/</c>). Concerns: restart the Host process when a
|
||||||
|
/// hard fault is detected (memory breach, wedge, scheduled recycle window).
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decisions #68, #73-74, and #145. Tier A/B drivers do NOT have
|
||||||
|
/// a supervisor because they run in-process — recycling would kill every OPC UA session and
|
||||||
|
/// every co-hosted driver. The Core.Stability layer only invokes this interface for Tier C
|
||||||
|
/// instances after asserting the tier via <see cref="DriverTypeMetadata.Tier"/>.
|
||||||
|
/// </remarks>
|
||||||
|
public interface IDriverSupervisor
|
||||||
|
{
|
||||||
|
/// <summary>Driver instance this supervisor governs.</summary>
|
||||||
|
string DriverInstanceId { get; }
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Request the supervisor to recycle (terminate + restart) the Host process. Implementations
|
||||||
|
/// are expected to be idempotent under repeat calls during an in-flight recycle.
|
||||||
|
/// </summary>
|
||||||
|
/// <param name="reason">Human-readable reason — flows into the supervisor's logs.</param>
|
||||||
|
/// <param name="cancellationToken">Cancels the recycle request; an in-flight restart is not interrupted.</param>
|
||||||
|
Task RecycleAsync(string reason, CancellationToken cancellationToken);
|
||||||
|
}
|
||||||
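Reviewer note: `RecycleAsync` is documented as idempotent under repeat calls, with cancellation ending only the caller's wait. One way an implementation might satisfy both is to coalesce callers onto a single in-flight task. The sketch below is an assumption-laden illustration: `CoalescingSupervisor` and its `restartHost` delegate are hypothetical, and the class does not implement the real `IDriverSupervisor` so that the snippet stays self-contained. It relies on `Task.WaitAsync(CancellationToken)`, available from .NET 6.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical supervisor; restartHost stands in for the real terminate + restart logic.
public sealed class CoalescingSupervisor
{
    private readonly object _gate = new object();
    private readonly Func<string, Task> _restartHost;
    private Task? _inFlight;

    public CoalescingSupervisor(string driverInstanceId, Func<string, Task> restartHost)
    {
        DriverInstanceId = driverInstanceId;
        _restartHost = restartHost;
    }

    public string DriverInstanceId { get; }

    public Task RecycleAsync(string reason, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        Task restart;
        lock (_gate)
        {
            // Reuse the running restart if there is one; otherwise start a new one.
            if (_inFlight == null || _inFlight.IsCompleted)
                _inFlight = _restartHost(reason);
            restart = _inFlight;
        }

        // Cancellation ends this caller's wait only; the restart itself keeps running.
        return restart.WaitAsync(cancellationToken);
    }
}
```

Two back-to-back calls during one restart invoke `restartHost` once, and both callers complete when that restart finishes.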
59
src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/OpcUaOperation.cs
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Every OPC UA operation surface the Phase 6.2 authorization evaluator gates, per
|
||||||
|
/// <c>docs/v2/implementation/phase-6-2-authorization-runtime.md</c> §Stream C and
|
||||||
|
/// decision #143. The evaluator maps each operation onto the corresponding
|
||||||
|
/// <c>NodePermissions</c> bit(s) to decide whether the calling session is allowed.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Write is split out into <see cref="WriteOperate"/> / <see cref="WriteTune"/> /
|
||||||
|
/// <see cref="WriteConfigure"/> because the underlying driver-reported
|
||||||
|
/// <see cref="SecurityClassification"/> already carries that distinction — the
|
||||||
|
/// evaluator maps the requested tag's security class to the matching operation value
|
||||||
|
/// before checking the permission bit.
|
||||||
|
/// </remarks>
|
||||||
|
public enum OpcUaOperation
|
||||||
|
{
|
||||||
|
/// <summary>
|
||||||
|
/// <c>Browse</c> + <c>TranslateBrowsePathsToNodeIds</c>. Ancestor visibility implied
|
||||||
|
/// when any descendant has a grant; denied ancestors filter from browse results.
|
||||||
|
/// </summary>
|
||||||
|
Browse,
|
||||||
|
|
||||||
|
/// <summary><c>Read</c> on a variable node.</summary>
|
||||||
|
Read,
|
||||||
|
|
||||||
|
/// <summary><c>Write</c> when the target has <see cref="SecurityClassification.Operate"/> / <see cref="SecurityClassification.FreeAccess"/>.</summary>
|
||||||
|
WriteOperate,
|
||||||
|
|
||||||
|
/// <summary><c>Write</c> when the target has <see cref="SecurityClassification.Tune"/>.</summary>
|
||||||
|
WriteTune,
|
||||||
|
|
||||||
|
/// <summary><c>Write</c> when the target has <see cref="SecurityClassification.Configure"/>.</summary>
|
||||||
|
WriteConfigure,
|
||||||
|
|
||||||
|
/// <summary><c>HistoryRead</c> — uses its own <c>NodePermissions.HistoryRead</c> bit; Read alone is NOT sufficient (decision in Phase 6.2 Compliance).</summary>
|
||||||
|
HistoryRead,
|
||||||
|
|
||||||
|
/// <summary><c>HistoryUpdate</c> — annotation / insert / delete on historian.</summary>
|
||||||
|
HistoryUpdate,
|
||||||
|
|
||||||
|
/// <summary><c>CreateMonitoredItems</c>. Per-item denial in mixed-authorization batches.</summary>
|
||||||
|
CreateMonitoredItems,
|
||||||
|
|
||||||
|
/// <summary><c>TransferSubscriptions</c>. Re-evaluates transferred items against current auth state.</summary>
|
||||||
|
TransferSubscriptions,
|
||||||
|
|
||||||
|
/// <summary><c>Call</c> on a Method node.</summary>
|
||||||
|
Call,
|
||||||
|
|
||||||
|
/// <summary>Alarm <c>Acknowledge</c>.</summary>
|
||||||
|
AlarmAcknowledge,
|
||||||
|
|
||||||
|
/// <summary>Alarm <c>Confirm</c>.</summary>
|
||||||
|
AlarmConfirm,
|
||||||
|
|
||||||
|
/// <summary>Alarm <c>Shelve</c> / <c>Unshelve</c>.</summary>
|
||||||
|
AlarmShelve,
|
||||||
|
}
|
||||||
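Reviewer note: the remarks on `OpcUaOperation` say the evaluator maps a tag's security class to one of the three write operations before checking the permission bit. A minimal sketch of that mapping follows. `SecurityClass` and `WriteOp` are hypothetical trimmed copies containing only the members named in the doc comments above; the real `SecurityClassification` enum may carry more values, which is why the default branch throws rather than guessing.

```csharp
using System;

// Hypothetical trimmed copies; only the members named in the doc comments.
public enum SecurityClass { FreeAccess, Operate, Tune, Configure }
public enum WriteOp { WriteOperate, WriteTune, WriteConfigure }

public static class WriteOperationMapper
{
    // Picks the write operation whose permission bit must be checked for this tag.
    public static WriteOp ForWrite(SecurityClass securityClass)
    {
        switch (securityClass)
        {
            case SecurityClass.FreeAccess:
            case SecurityClass.Operate:
                return WriteOp.WriteOperate;
            case SecurityClass.Tune:
                return WriteOp.WriteTune;
            case SecurityClass.Configure:
                return WriteOp.WriteConfigure;
            default:
                // Unknown classes are rejected instead of silently downgraded.
                throw new ArgumentOutOfRangeException(nameof(securityClass));
        }
    }
}
```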
@@ -0,0 +1,19 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Opts a tag-definition record into auto-retry on <see cref="IWritable.WriteAsync"/> failures.
|
||||||
|
/// Absence of this attribute means writes are <b>not</b> retried — a timed-out write may have
|
||||||
|
/// already succeeded at the device, and replaying pulses, alarm acks, counter increments, or
|
||||||
|
/// recipe-step advances can duplicate irreversible field actions.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decisions #44, #45, and #143. Applied to tag-definition POCOs
|
||||||
|
/// (e.g. <c>ModbusTagDefinition</c>, <c>S7TagDefinition</c>, OPC UA client tag rows) at the
|
||||||
|
/// property or record level. The <c>CapabilityInvoker</c> in <c>ZB.MOM.WW.OtOpcUa.Core.Resilience</c>
|
||||||
|
/// reads this attribute via reflection once at driver-init time and caches the result; no
|
||||||
|
/// per-write reflection cost.
|
||||||
|
/// </remarks>
|
||||||
|
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Class | AttributeTargets.Struct, AllowMultiple = false, Inherited = true)]
|
||||||
|
public sealed class WriteIdempotentAttribute : Attribute
|
||||||
|
{
|
||||||
|
}
|
||||||
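Reviewer note: the remarks say the attribute is read via reflection once and cached, so there is no per-write cost. The sketch below shows one way that could look. It is an assumption, not the `CapabilityInvoker` code: `IdempotencyCache`, `SetpointTag`, and `PulseTag` are hypothetical names, and the attribute is redeclared locally only so the snippet compiles standalone.

```csharp
using System;
using System.Collections.Concurrent;

// Local copy of the marker attribute so the sketch is self-contained.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Class | AttributeTargets.Struct, AllowMultiple = false, Inherited = true)]
public sealed class WriteIdempotentAttribute : Attribute { }

// Hypothetical tag definitions used only for illustration.
[WriteIdempotent]
public sealed class SetpointTag { public string Address { get; set; } = ""; }

public sealed class PulseTag { public string Address { get; set; } = ""; }

public static class IdempotencyCache
{
    private static readonly ConcurrentDictionary<Type, bool> Cache =
        new ConcurrentDictionary<Type, bool>();

    // Reflection runs on the first lookup for a type; later calls hit the dictionary.
    public static bool IsWriteIdempotent(Type tagDefinitionType)
        => Cache.GetOrAdd(
            tagDefinitionType,
            t => t.IsDefined(typeof(WriteIdempotentAttribute), inherit: true));
}
```

This covers the record/class-level placement only; a property-level lookup would need a second cache keyed on `PropertyInfo`.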
@@ -0,0 +1,48 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Tri-state result of an <see cref="IPermissionEvaluator.Authorize"/> call, per decision
|
||||||
|
/// #149. Phase 6.2 only produces <see cref="AuthorizationVerdict.Allow"/> and
|
||||||
|
/// <see cref="AuthorizationVerdict.NotGranted"/>; the <see cref="AuthorizationVerdict.Denied"/>
|
||||||
|
/// variant exists in the model so v2.1 Explicit Deny lands without an API break. Provenance
|
||||||
|
/// carries the matched grants (or empty when not granted) for audit + the Admin UI "Probe
|
||||||
|
/// this permission" diagnostic.
|
||||||
|
/// </summary>
|
||||||
|
public sealed record AuthorizationDecision(
|
||||||
|
AuthorizationVerdict Verdict,
|
||||||
|
IReadOnlyList<MatchedGrant> Provenance)
|
||||||
|
{
|
||||||
|
public bool IsAllowed => Verdict == AuthorizationVerdict.Allow;
|
||||||
|
|
||||||
|
/// <summary>Convenience constructor for the common "no grants matched" outcome.</summary>
|
||||||
|
public static AuthorizationDecision NotGranted() => new(AuthorizationVerdict.NotGranted, []);
|
||||||
|
|
||||||
|
/// <summary>Allow with the list of grants that matched.</summary>
|
||||||
|
public static AuthorizationDecision Allowed(IReadOnlyList<MatchedGrant> provenance)
|
||||||
|
=> new(AuthorizationVerdict.Allow, provenance);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Three-valued authorization outcome.</summary>
|
||||||
|
public enum AuthorizationVerdict
|
||||||
|
{
|
||||||
|
/// <summary>At least one grant matches the requested (operation, scope) pair.</summary>
|
||||||
|
Allow,
|
||||||
|
|
||||||
|
/// <summary>No grant matches. Phase 6.2 default — treated as deny at the OPC UA surface.</summary>
|
||||||
|
NotGranted,
|
||||||
|
|
||||||
|
/// <summary>Explicit deny grant matched. Reserved for v2.1; never produced by Phase 6.2.</summary>
|
||||||
|
Denied,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>One grant that contributed to an Allow verdict — for audit / UI diagnostics.</summary>
|
||||||
|
/// <param name="LdapGroup">LDAP group the matched grant belongs to.</param>
|
||||||
|
/// <param name="Scope">Where in the hierarchy the grant was anchored.</param>
|
||||||
|
/// <param name="PermissionFlags">The bitmask the grant contributed.</param>
|
||||||
|
public sealed record MatchedGrant(
|
||||||
|
string LdapGroup,
|
||||||
|
NodeAclScopeKind Scope,
|
||||||
|
NodePermissions PermissionFlags);
|
||||||
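Reviewer note: since grants are additive, turning a `Provenance` list into an Allow / NotGranted verdict comes down to OR-ing the matched flags and testing the required bits. A minimal sketch, assuming a hypothetical `Perm` bit layout (the real `NodePermissions` values live in `Configuration.Enums` and are not shown in this diff) and a hypothetical `Grant` record in place of `MatchedGrant`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bit layout for illustration only.
[Flags]
public enum Perm { None = 0, Browse = 1, Read = 2, WriteOperate = 4, HistoryRead = 8 }

// Hypothetical stand-in for MatchedGrant.
public sealed record Grant(string LdapGroup, Perm Flags);

public static class VerdictSketch
{
    // Allowed when the union of all matched grants carries every required bit.
    public static bool IsAllowed(IReadOnlyList<Grant> matches, Perm required)
    {
        Perm granted = Perm.None;
        foreach (var match in matches)
            granted |= match.Flags;

        return (granted & required) == required;
    }
}
```

With this shape, a session holding only `Read` is not allowed `HistoryRead`, which matches the separate-bit rule stated on `OpcUaOperation.HistoryRead`.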
@@ -0,0 +1,23 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Evaluates whether a session is authorized to perform an OPC UA <see cref="OpcUaOperation"/>
|
||||||
|
/// on the node addressed by a <see cref="NodeScope"/>. Phase 6.2 Stream B central surface.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Data-plane only. Reads <c>NodeAcl</c> rows joined against the session's resolved LDAP
|
||||||
|
/// groups (via <see cref="UserAuthorizationState"/>). Must not depend on
|
||||||
|
/// <c>LdapGroupRoleMapping</c> (control-plane) per decision #150.
|
||||||
|
/// </remarks>
|
||||||
|
public interface IPermissionEvaluator
|
||||||
|
{
|
||||||
|
/// <summary>
|
||||||
|
/// Authorize the requested operation for the session. Callers (<c>DriverNodeManager</c>
|
||||||
|
/// Read / Write / HistoryRead / Subscribe / Browse / Call dispatch) map their native
|
||||||
|
/// failure to <c>BadUserAccessDenied</c> per OPC UA Part 4 when the result is not
|
||||||
|
/// <see cref="AuthorizationVerdict.Allow"/>.
|
||||||
|
/// </summary>
|
||||||
|
AuthorizationDecision Authorize(UserAuthorizationState session, OpcUaOperation operation, NodeScope scope);
|
||||||
|
}
|
||||||
58
src/ZB.MOM.WW.OtOpcUa.Core/Authorization/NodeScope.cs
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Address of a node in the 6-level scope hierarchy the Phase 6.2 evaluator walks.
|
||||||
|
/// Assembled by the dispatch layer from the node's namespace + UNS path + tag; passed
|
||||||
|
/// to <see cref="IPermissionEvaluator"/> which walks the matching trie path.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// <para>Per decision #129 and the Phase 6.2 Stream B plan the hierarchy is
|
||||||
|
/// <c>Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag</c> for UNS
|
||||||
|
/// (Equipment-kind) namespaces. Galaxy (SystemPlatform-kind) namespaces instead use
|
||||||
|
/// <c>Cluster → Namespace → FolderSegment(s) → Tag</c>, and each folder segment takes
|
||||||
|
/// one trie level — so a deeply-nested Galaxy folder implicitly reaches the same
|
||||||
|
/// depth as a full UNS path.</para>
|
||||||
|
///
|
||||||
|
/// <para>Unset mid-path levels (e.g. a Cluster-scoped request with no UnsArea) leave
|
||||||
|
/// the corresponding id <c>null</c>. The evaluator walks as far as the scope goes and
|
||||||
|
/// stops at the first null.</para>
|
||||||
|
/// </remarks>
|
||||||
|
public sealed record NodeScope
|
||||||
|
{
|
||||||
|
/// <summary>Cluster the node belongs to. Required.</summary>
|
||||||
|
public required string ClusterId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>Namespace within the cluster. Null is not allowed for a request against a real node.</summary>
|
||||||
|
public string? NamespaceId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>For Equipment-kind namespaces: UNS area (e.g. "warsaw-west"). Null on Galaxy.</summary>
|
||||||
|
public string? UnsAreaId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>For Equipment-kind namespaces: UNS line below the area. Null on Galaxy.</summary>
|
||||||
|
public string? UnsLineId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>For Equipment-kind namespaces: equipment row below the line. Null on Galaxy.</summary>
|
||||||
|
public string? EquipmentId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// For Galaxy (SystemPlatform-kind) namespaces only: the folder path segments from
|
||||||
|
/// namespace root to the target tag, in order. Empty on Equipment namespaces.
|
||||||
|
/// </summary>
|
||||||
|
public IReadOnlyList<string> FolderSegments { get; init; } = [];
|
||||||
|
|
||||||
|
/// <summary>Target tag id when the scope addresses a specific tag; null for folder / equipment-level scopes.</summary>
|
||||||
|
public string? TagId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>Which hierarchy applies — Equipment-kind (UNS) or SystemPlatform-kind (Galaxy).</summary>
|
||||||
|
public required NodeHierarchyKind Kind { get; init; }
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Selector between the two scope-hierarchy shapes.</summary>
|
||||||
|
public enum NodeHierarchyKind
|
||||||
|
{
|
||||||
|
/// <summary><c>Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag</c> — UNS / Equipment kind.</summary>
|
||||||
|
Equipment,
|
||||||
|
|
||||||
|
/// <summary><c>Cluster → Namespace → FolderSegment(s) → Tag</c> — Galaxy / SystemPlatform kind.</summary>
|
||||||
|
SystemPlatform,
|
||||||
|
}
|
||||||
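Reviewer note: the two hierarchy shapes in `NodeScope` both flatten to an ordered list of trie keys below the cluster root. The sketch below illustrates that flattening, including the "stop at the first null" rule for Equipment-kind scopes. `ScopeSketch`, `HierarchyKind`, and `ScopePath.Keys` are hypothetical names introduced for this example; the real evaluator walks the trie directly rather than building a key list.

```csharp
using System;
using System.Collections.Generic;

public enum HierarchyKind { Equipment, SystemPlatform }

// Hypothetical trimmed copy of NodeScope.
public sealed class ScopeSketch
{
    public string ClusterId = "";
    public string? NamespaceId, UnsAreaId, UnsLineId, EquipmentId, TagId;
    public List<string> FolderSegments = new List<string>();
    public HierarchyKind Kind;
}

public static class ScopePath
{
    // Ordered trie keys below the cluster root for the given scope.
    public static List<string> Keys(ScopeSketch s)
    {
        var keys = new List<string>();
        if (s.NamespaceId == null) return keys;
        keys.Add(s.NamespaceId);

        if (s.Kind == HierarchyKind.Equipment)
        {
            foreach (var id in new[] { s.UnsAreaId, s.UnsLineId, s.EquipmentId, s.TagId })
            {
                if (id == null) return keys; // stop at the first unset level
                keys.Add(id);
            }
        }
        else
        {
            // Galaxy: every folder segment is its own level, then the tag.
            keys.AddRange(s.FolderSegments);
            if (s.TagId != null) keys.Add(s.TagId);
        }
        return keys;
    }
}
```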
125
src/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// In-memory permission trie for one <c>(ClusterId, GenerationId)</c>. Walk from the cluster
|
||||||
|
/// root down through namespace → UNS levels (or folder segments) → tag, OR-ing the
|
||||||
|
/// <see cref="TrieGrant.PermissionFlags"/> granted at each visited level for each of the session's
|
||||||
|
/// LDAP groups. The accumulated bitmask is compared to the permission required by the
|
||||||
|
/// requested <see cref="Abstractions.OpcUaOperation"/>.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Per decision #129 (additive grants, no explicit Deny in v2.0) the walk is pure union:
|
||||||
|
/// encountering a grant at any level contributes its flags, never revokes them. A grant at
|
||||||
|
/// the Cluster root therefore cascades to every tag below it; a grant at a deep equipment
|
||||||
|
/// leaf is visible only on that equipment subtree.
|
||||||
|
/// </remarks>
|
||||||
|
public sealed class PermissionTrie
|
||||||
|
{
|
||||||
|
/// <summary>Cluster this trie belongs to.</summary>
|
||||||
|
public required string ClusterId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>Config generation the trie was built from — used by the cache for invalidation.</summary>
|
||||||
|
public required long GenerationId { get; init; }
|
||||||
|
|
||||||
|
/// <summary>Root of the trie. Level 0 (cluster-level) grants live directly here.</summary>
|
||||||
|
public PermissionTrieNode Root { get; init; } = new();
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Walk the trie collecting grants that apply to <paramref name="scope"/> for any of the
|
||||||
|
/// session's <paramref name="ldapGroups"/>. Returns the matched-grant list; the caller
|
||||||
|
/// OR-s the flag bits to decide whether the requested permission is carried.
|
||||||
|
/// </summary>
|
||||||
|
public IReadOnlyList<MatchedGrant> CollectMatches(NodeScope scope, IEnumerable<string> ldapGroups)
|
||||||
|
{
|
||||||
|
ArgumentNullException.ThrowIfNull(scope);
|
||||||
|
ArgumentNullException.ThrowIfNull(ldapGroups);
|
||||||
|
|
||||||
|
var groups = ldapGroups.ToHashSet(StringComparer.OrdinalIgnoreCase);
|
||||||
|
if (groups.Count == 0) return [];
|
||||||
|
|
||||||
|
var matches = new List<MatchedGrant>();
|
||||||
|
|
||||||
|
// Level 0 — cluster-scoped grants.
|
||||||
|
CollectAtLevel(Root, NodeAclScopeKind.Cluster, groups, matches);
|
||||||
|
|
||||||
|
// Level 1 — namespace.
|
||||||
|
if (scope.NamespaceId is null) return matches;
|
||||||
|
if (!Root.Children.TryGetValue(scope.NamespaceId, out var ns)) return matches;
|
||||||
|
CollectAtLevel(ns, NodeAclScopeKind.Namespace, groups, matches);
|
||||||
|
|
||||||
|
// Two hierarchies diverge below the namespace.
|
||||||
|
if (scope.Kind == NodeHierarchyKind.Equipment)
|
||||||
|
WalkEquipment(ns, scope, groups, matches);
|
||||||
|
else
|
||||||
|
WalkSystemPlatform(ns, scope, groups, matches);
|
||||||
|
|
||||||
|
return matches;
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void WalkEquipment(PermissionTrieNode ns, NodeScope scope, HashSet<string> groups, List<MatchedGrant> matches)
|
||||||
|
{
|
||||||
|
if (scope.UnsAreaId is null) return;
|
||||||
|
if (!ns.Children.TryGetValue(scope.UnsAreaId, out var area)) return;
|
||||||
|
CollectAtLevel(area, NodeAclScopeKind.UnsArea, groups, matches);
|
||||||
|
|
||||||
|
if (scope.UnsLineId is null) return;
|
||||||
|
if (!area.Children.TryGetValue(scope.UnsLineId, out var line)) return;
|
||||||
|
CollectAtLevel(line, NodeAclScopeKind.UnsLine, groups, matches);
|
||||||
|
|
||||||
|
if (scope.EquipmentId is null) return;
|
||||||
|
if (!line.Children.TryGetValue(scope.EquipmentId, out var eq)) return;
|
||||||
|
CollectAtLevel(eq, NodeAclScopeKind.Equipment, groups, matches);
|
||||||
|
|
||||||
|
if (scope.TagId is null) return;
|
||||||
|
if (!eq.Children.TryGetValue(scope.TagId, out var tag)) return;
|
||||||
|
CollectAtLevel(tag, NodeAclScopeKind.Tag, groups, matches);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void WalkSystemPlatform(PermissionTrieNode ns, NodeScope scope, HashSet<string> groups, List<MatchedGrant> matches)
|
||||||
|
{
|
||||||
|
// FolderSegments are nested under the namespace; each is its own trie level. Reuse the
|
||||||
|
// Equipment scope kind for the reported level; NodeAcl rows for Galaxy tags carry ScopeKind.Tag
|
||||||
|
// for leaf grants and ScopeKind.Namespace for folder-root grants; deeper folder grants
|
||||||
|
// are modeled as Equipment-level rows today since NodeAclScopeKind doesn't enumerate
|
||||||
|
// a dedicated FolderSegment kind. Future-proof TODO tracked in Stream B follow-up.
|
||||||
|
var current = ns;
|
||||||
|
foreach (var segment in scope.FolderSegments)
|
||||||
|
{
|
||||||
|
if (!current.Children.TryGetValue(segment, out var child)) return;
|
||||||
|
CollectAtLevel(child, NodeAclScopeKind.Equipment, groups, matches);
|
||||||
|
current = child;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (scope.TagId is null) return;
|
||||||
|
if (!current.Children.TryGetValue(scope.TagId, out var tag)) return;
|
||||||
|
CollectAtLevel(tag, NodeAclScopeKind.Tag, groups, matches);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static void CollectAtLevel(PermissionTrieNode node, NodeAclScopeKind level, HashSet<string> groups, List<MatchedGrant> matches)
|
||||||
|
{
|
||||||
|
foreach (var grant in node.Grants)
|
||||||
|
{
|
||||||
|
if (groups.Contains(grant.LdapGroup))
|
||||||
|
matches.Add(new MatchedGrant(grant.LdapGroup, level, grant.PermissionFlags));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>One node in a <see cref="PermissionTrie"/>.</summary>
|
||||||
|
public sealed class PermissionTrieNode
|
||||||
|
{
|
||||||
|
/// <summary>Grants anchored at this trie level.</summary>
|
||||||
|
public List<TrieGrant> Grants { get; } = [];
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Children keyed by the next level's id — namespace id under cluster; UnsAreaId or
|
||||||
|
/// folder-segment name under namespace; etc. Comparer is OrdinalIgnoreCase so the walk
|
||||||
|
/// tolerates case drift between the NodeAcl row and the requested scope.
|
||||||
|
/// </summary>
|
||||||
|
public Dictionary<string, PermissionTrieNode> Children { get; } = new(StringComparer.OrdinalIgnoreCase);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Projection of a <see cref="Configuration.Entities.NodeAcl"/> row into the trie.</summary>
|
||||||
|
public sealed record TrieGrant(string LdapGroup, NodePermissions PermissionFlags);
|
||||||
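Reviewer note: the "pure union" walk in `PermissionTrie` means a grant at the root cascades to everything below it, while a deep grant only affects its own subtree. The sketch below demonstrates that behaviour on a stripped-down trie. `Node` and `TrieWalk.Collect` are hypothetical simplifications: flags are plain `int` bits and the result is the OR-ed mask rather than the `MatchedGrant` list the real `CollectMatches` returns.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified trie node: grants are (group, flag bits) pairs.
public sealed class Node
{
    public List<(string Group, int Flags)> Grants = new List<(string, int)>();
    public Dictionary<string, Node> Children =
        new Dictionary<string, Node>(StringComparer.OrdinalIgnoreCase);
}

public static class TrieWalk
{
    // OR together the flags of every grant on the path that belongs to one of the groups.
    public static int Collect(Node root, IEnumerable<string> path, ISet<string> groups)
    {
        int flags = 0;
        var node = root;
        Accumulate(node); // level 0: cluster-level grants

        foreach (var key in path)
        {
            if (!node.Children.TryGetValue(key, out var child)) break; // nothing anchored deeper
            node = child;
            Accumulate(node);
        }
        return flags;

        void Accumulate(Node n)
        {
            foreach (var grant in n.Grants)
                if (groups.Contains(grant.Group)) flags |= grant.Flags;
        }
    }
}
```

A missing child ends the walk without discarding what was already collected, mirroring the early `return matches` statements in the real walk.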
@@ -0,0 +1,97 @@
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;

namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;

/// <summary>
/// Builds a <see cref="PermissionTrie"/> from a set of <see cref="NodeAcl"/> rows anchored
/// in one generation. The trie is keyed on the rows' scope hierarchy — rows with
/// <see cref="NodeAclScopeKind.Cluster"/> land at the trie root, rows with
/// <see cref="NodeAclScopeKind.Tag"/> land at a leaf, etc.
/// </summary>
/// <remarks>
/// <para>Intended to be called by <see cref="PermissionTrieCache"/> once per published
/// generation; the resulting trie is immutable for the life of the cache entry. Idempotent —
/// two builds from the same rows produce equal tries (grant lists may be in insertion order;
/// evaluators don't depend on order).</para>
///
/// <para>The builder deliberately does not know about the node-row metadata the trie path
/// will be walked with. The caller assembles <see cref="NodeScope"/> values from the live
/// config (UnsArea parent of UnsLine, etc.); this class only honors the <c>ScopeId</c>
/// each row carries.</para>
/// </remarks>
public static class PermissionTrieBuilder
{
    /// <summary>
    /// Build a trie for one cluster/generation from the supplied rows. The caller is
    /// responsible for pre-filtering rows to the target generation + cluster.
    /// </summary>
    public static PermissionTrie Build(
        string clusterId,
        long generationId,
        IReadOnlyList<NodeAcl> rows,
        IReadOnlyDictionary<string, NodeAclPath>? scopePaths = null)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
        ArgumentNullException.ThrowIfNull(rows);

        var trie = new PermissionTrie { ClusterId = clusterId, GenerationId = generationId };

        foreach (var row in rows)
        {
            if (!string.Equals(row.ClusterId, clusterId, StringComparison.OrdinalIgnoreCase)) continue;
            var grant = new TrieGrant(row.LdapGroup, row.PermissionFlags);

            var node = row.ScopeKind switch
            {
                NodeAclScopeKind.Cluster => trie.Root,
                _ => Descend(trie.Root, row, scopePaths),
            };

            if (node is not null)
                node.Grants.Add(grant);
        }

        return trie;
    }

    private static PermissionTrieNode? Descend(PermissionTrieNode root, NodeAcl row, IReadOnlyDictionary<string, NodeAclPath>? scopePaths)
    {
        if (string.IsNullOrEmpty(row.ScopeId)) return null;

        // For sub-cluster scopes the caller supplies a path lookup so we know the containing
        // namespace / UnsArea / UnsLine ids. Without a path lookup we fall back to putting the
        // row directly under the root using its ScopeId — works for deterministic tests, not
        // for production where the hierarchy must be honored.
        if (scopePaths is null || !scopePaths.TryGetValue(row.ScopeId, out var path))
        {
            return EnsureChild(root, row.ScopeId);
        }

        var node = root;
        foreach (var segment in path.Segments)
            node = EnsureChild(node, segment);
        return node;
    }

    private static PermissionTrieNode EnsureChild(PermissionTrieNode parent, string key)
    {
        if (!parent.Children.TryGetValue(key, out var child))
        {
            child = new PermissionTrieNode();
            parent.Children[key] = child;
        }
        return child;
    }
}

/// <summary>
/// Ordered list of trie-path segments from root to the target node. Supplied to
/// <see cref="PermissionTrieBuilder.Build"/> so the builder knows where a
/// <see cref="NodeAclScopeKind.UnsLine"/>-scoped row sits in the hierarchy.
/// </summary>
/// <param name="Segments">
/// Namespace id, then (for Equipment kind) UnsAreaId / UnsLineId / EquipmentId / TagId as
/// applicable; or (for SystemPlatform kind) NamespaceId / FolderSegment / .../TagId.
/// </param>
public sealed record NodeAclPath(IReadOnlyList<string> Segments);
@@ -0,0 +1,88 @@
using System.Collections.Concurrent;

namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;

/// <summary>
/// Process-singleton cache of <see cref="PermissionTrie"/> instances keyed on
/// <c>(ClusterId, GenerationId)</c>. Hot-path evaluation reads
/// <see cref="GetTrie(string)"/> without awaiting DB access; the cache is populated
/// out-of-band on publish + on first reference via
/// <see cref="Install(PermissionTrie)"/>.
/// </summary>
/// <remarks>
/// Per decision #148 and Phase 6.2 Stream B.4 the cache is generation-sealed: once a
/// trie is installed for <c>(ClusterId, GenerationId)</c> the entry is immutable. When a
/// new generation publishes, the caller calls <see cref="Install"/> with the new trie
/// + the cache atomically updates its "current generation" pointer for that cluster.
/// Older generations are retained so an in-flight request evaluating the prior generation
/// still succeeds — GC via <see cref="Prune(string, int)"/>.
/// </remarks>
public sealed class PermissionTrieCache
{
    private readonly ConcurrentDictionary<string, ClusterEntry> _byCluster =
        new(StringComparer.OrdinalIgnoreCase);

    /// <summary>Install a trie for a cluster + make it the current generation.</summary>
    public void Install(PermissionTrie trie)
    {
        ArgumentNullException.ThrowIfNull(trie);
        _byCluster.AddOrUpdate(trie.ClusterId,
            _ => ClusterEntry.FromSingle(trie),
            (_, existing) => existing.WithAdditional(trie));
    }

    /// <summary>Get the current-generation trie for a cluster; null when nothing installed.</summary>
    public PermissionTrie? GetTrie(string clusterId)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
        return _byCluster.TryGetValue(clusterId, out var entry) ? entry.Current : null;
    }

    /// <summary>Get a specific (cluster, generation) trie; null if that pair isn't cached.</summary>
    public PermissionTrie? GetTrie(string clusterId, long generationId)
    {
        if (!_byCluster.TryGetValue(clusterId, out var entry)) return null;
        return entry.Tries.TryGetValue(generationId, out var trie) ? trie : null;
    }

    /// <summary>The generation id the <see cref="GetTrie(string)"/> shortcut currently serves for a cluster.</summary>
    public long? CurrentGenerationId(string clusterId)
        => _byCluster.TryGetValue(clusterId, out var entry) ? entry.Current.GenerationId : null;

    /// <summary>Drop every cached trie for one cluster.</summary>
    public void Invalidate(string clusterId) => _byCluster.TryRemove(clusterId, out _);

    /// <summary>
    /// Retain only the most-recent <paramref name="keepLatest"/> generations for a cluster.
    /// No-op when there's nothing to drop.
    /// </summary>
    public void Prune(string clusterId, int keepLatest = 3)
    {
        if (keepLatest < 1) throw new ArgumentOutOfRangeException(nameof(keepLatest), keepLatest, "keepLatest must be >= 1");
        if (!_byCluster.TryGetValue(clusterId, out var entry)) return;

        if (entry.Tries.Count <= keepLatest) return;
        var keep = entry.Tries
            .OrderByDescending(kvp => kvp.Key)
            .Take(keepLatest)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
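        // Compare-and-swap against the entry read above: if a concurrent Install replaced it,
        // the prune is skipped instead of overwriting (and losing) the newly installed trie.
        _byCluster.TryUpdate(clusterId, new ClusterEntry(entry.Current, keep), entry);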
    }

    /// <summary>Diagnostics counter: number of cached (cluster, generation) tries.</summary>
    public int CachedTrieCount => _byCluster.Values.Sum(e => e.Tries.Count);

    private sealed record ClusterEntry(PermissionTrie Current, IReadOnlyDictionary<long, PermissionTrie> Tries)
    {
        public static ClusterEntry FromSingle(PermissionTrie trie) =>
            new(trie, new Dictionary<long, PermissionTrie> { [trie.GenerationId] = trie });

        public ClusterEntry WithAdditional(PermissionTrie trie)
        {
            var next = new Dictionary<long, PermissionTrie>(Tries) { [trie.GenerationId] = trie };
            // The highest generation wins as "current" — handles out-of-order installs.
            var current = trie.GenerationId >= Current.GenerationId ? trie : Current;
            return new ClusterEntry(current, next);
        }
    }
}
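The install and prune rules of `PermissionTrieCache` can be followed in isolation with a small sketch. It is not the cache itself: a trie is reduced to a string label, the cluster entry is a plain tuple, and the local functions `Install` and `Prune` are hypothetical stand-ins that only mirror the logic of `WithAdditional` and `Prune` above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ClusterEntry: the current generation id plus every cached
// generation. A trie is represented by a plain string label.
var entry = (Current: 5L, Tries: new Dictionary<long, string> { [5L] = "trie-gen-5" });

// Mirrors WithAdditional: copy the map, add the new generation, highest id stays current.
(long Current, Dictionary<long, string> Tries) Install(
    (long Current, Dictionary<long, string> Tries) existing, long generationId, string trie)
{
    var next = new Dictionary<long, string>(existing.Tries) { [generationId] = trie };
    var current = generationId >= existing.Current ? generationId : existing.Current;
    return (current, next);
}

// Mirrors Prune: keep only the keepLatest highest generation ids.
(long Current, Dictionary<long, string> Tries) Prune(
    (long Current, Dictionary<long, string> Tries) existing, int keepLatest)
{
    if (existing.Tries.Count <= keepLatest) return existing;
    var keep = existing.Tries
        .OrderByDescending(kvp => kvp.Key)
        .Take(keepLatest)
        .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    return (existing.Current, keep);
}

entry = Install(entry, 7L, "trie-gen-7");
entry = Install(entry, 6L, "trie-gen-6"); // late arrival of an older generation
var pruned = Prune(entry, keepLatest: 2);

Console.WriteLine($"current={entry.Current}, cached={entry.Tries.Count}, afterPrune={pruned.Tries.Count}");
```

Because the current generation is always the highest id, it is always among the generations that survive a prune, so a pruned entry never points at a dropped trie.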
@@ -0,0 +1,70 @@
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;

/// <summary>
/// Default <see cref="IPermissionEvaluator"/> implementation. Resolves the
/// <see cref="PermissionTrie"/> for the session's cluster (via
/// <see cref="PermissionTrieCache"/>), walks it collecting matched grants, OR-s the
/// permission flags, and maps against the operation-specific required permission.
/// </summary>
public sealed class TriePermissionEvaluator : IPermissionEvaluator
{
    private readonly PermissionTrieCache _cache;
    private readonly TimeProvider _timeProvider;

    public TriePermissionEvaluator(PermissionTrieCache cache, TimeProvider? timeProvider = null)
    {
        ArgumentNullException.ThrowIfNull(cache);
        _cache = cache;
        _timeProvider = timeProvider ?? TimeProvider.System;
    }

    public AuthorizationDecision Authorize(UserAuthorizationState session, OpcUaOperation operation, NodeScope scope)
    {
        ArgumentNullException.ThrowIfNull(session);
        ArgumentNullException.ThrowIfNull(scope);

        // Decision #152 — beyond the staleness ceiling every call fails closed regardless of
        // cache warmth elsewhere in the process.
        if (session.IsStale(_timeProvider.GetUtcNow().UtcDateTime))
            return AuthorizationDecision.NotGranted();

        if (!string.Equals(session.ClusterId, scope.ClusterId, StringComparison.OrdinalIgnoreCase))
            return AuthorizationDecision.NotGranted();

        var trie = _cache.GetTrie(scope.ClusterId);
        if (trie is null) return AuthorizationDecision.NotGranted();

        var matches = trie.CollectMatches(scope, session.LdapGroups);
        if (matches.Count == 0) return AuthorizationDecision.NotGranted();

        var required = MapOperationToPermission(operation);
        var granted = NodePermissions.None;
        foreach (var m in matches) granted |= m.PermissionFlags;

        return (granted & required) == required
            ? AuthorizationDecision.Allowed(matches)
            : AuthorizationDecision.NotGranted();
    }

    /// <summary>Maps each <see cref="OpcUaOperation"/> to the <see cref="NodePermissions"/> bit required to grant it.</summary>
    public static NodePermissions MapOperationToPermission(OpcUaOperation op) => op switch
    {
        OpcUaOperation.Browse => NodePermissions.Browse,
        OpcUaOperation.Read => NodePermissions.Read,
        OpcUaOperation.WriteOperate => NodePermissions.WriteOperate,
        OpcUaOperation.WriteTune => NodePermissions.WriteTune,
        OpcUaOperation.WriteConfigure => NodePermissions.WriteConfigure,
        OpcUaOperation.HistoryRead => NodePermissions.HistoryRead,
        OpcUaOperation.HistoryUpdate => NodePermissions.HistoryRead, // HistoryUpdate bit not yet in NodePermissions; TODO Stream C follow-up
        OpcUaOperation.CreateMonitoredItems => NodePermissions.Subscribe,
        OpcUaOperation.TransferSubscriptions => NodePermissions.Subscribe,
        OpcUaOperation.Call => NodePermissions.MethodCall,
        OpcUaOperation.AlarmAcknowledge => NodePermissions.AlarmAcknowledge,
        OpcUaOperation.AlarmConfirm => NodePermissions.AlarmConfirm,
        OpcUaOperation.AlarmShelve => NodePermissions.AlarmShelve,
        _ => throw new ArgumentOutOfRangeException(nameof(op), op, $"No permission mapping defined for operation {op}."),
    };
}
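The final step of `Authorize` is a plain bitmask test: the flags of every matched grant are OR-ed together and the operation is allowed only if all required bits are present in that union. A minimal sketch of that step follows. The bit positions are assumptions made for the example (the real `NodePermissions` values are not part of this diff), and `IsAllowed` is a hypothetical helper.

```csharp
using System;

// Hypothetical bit assignments; the real NodePermissions values are not shown in this diff.
const uint Browse = 1u << 0;
const uint Read = 1u << 1;
const uint WriteOperate = 1u << 2;
const uint Subscribe = 1u << 3;

// Flags carried by the grants that matched the user's LDAP groups along the scope path.
uint[] matchedGrants = { Browse | Read, Subscribe };

// Union of everything the matched grants allow.
uint granted = 0;
foreach (var flags in matchedGrants) granted |= flags;

// An operation is allowed only when every required bit is present in the union.
bool IsAllowed(uint required) => (granted & required) == required;

Console.WriteLine($"Read: {IsAllowed(Read)}, WriteOperate: {IsAllowed(WriteOperate)}");
```

Grants are purely additive in this scheme: a grant can add bits to the union but nothing in the evaluation removes them, which is why the order of the matched grants does not matter.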
@@ -0,0 +1,69 @@
namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;

/// <summary>
/// Per-session authorization state cached on the OPC UA session object + keyed on the
/// session id. Captures the LDAP group memberships resolved at sign-in, the generation
/// the membership was resolved against, and the bounded freshness window.
/// </summary>
/// <remarks>
/// Per decision #151 the membership is bounded by <see cref="MembershipFreshnessInterval"/>
/// (default 15 min). After that, the next hot-path authz call re-resolves LDAP group
/// memberships; failure to re-resolve (LDAP unreachable) flips the session to fail-closed
/// until a refresh succeeds.
///
/// Per decision #152 <see cref="AuthCacheMaxStaleness"/> (default 5 min) is separate from
/// Phase 6.1's availability-oriented 24h cache — beyond this window the evaluator returns
/// <see cref="AuthorizationVerdict.NotGranted"/> regardless of config-cache warmth.
/// </remarks>
public sealed record UserAuthorizationState
{
    /// <summary>Opaque session id (reuse OPC UA session handle when possible).</summary>
    public required string SessionId { get; init; }

    /// <summary>Cluster the session is scoped to — every request targets nodes in this cluster.</summary>
    public required string ClusterId { get; init; }

    /// <summary>
    /// LDAP groups the user is a member of as resolved at sign-in / last membership refresh.
    /// Case comparison is handled downstream by the evaluator (OrdinalIgnoreCase).
    /// </summary>
    public required IReadOnlyList<string> LdapGroups { get; init; }

    /// <summary>Timestamp when <see cref="LdapGroups"/> was last resolved from the directory.</summary>
    public required DateTime MembershipResolvedUtc { get; init; }

    /// <summary>
    /// Trie generation the session is currently bound to. When
    /// <see cref="PermissionTrieCache"/> moves to a new generation, the session's
    /// <c>(AuthGenerationId, MembershipVersion)</c> stamp no longer matches its
    /// MonitoredItems and they re-evaluate on next publish (decision #153).
    /// </summary>
    public required long AuthGenerationId { get; init; }

    /// <summary>
    /// Monotonic counter incremented every time membership is re-resolved. Combined with
    /// <see cref="AuthGenerationId"/> into the subscription stamp per decision #153.
    /// </summary>
    public required long MembershipVersion { get; init; }

    /// <summary>Bounded membership freshness window; past this the next authz call refreshes.</summary>
    public TimeSpan MembershipFreshnessInterval { get; init; } = TimeSpan.FromMinutes(15);

    /// <summary>Hard staleness ceiling — beyond this, the evaluator fails closed.</summary>
    public TimeSpan AuthCacheMaxStaleness { get; init; } = TimeSpan.FromMinutes(5);

    /// <summary>
    /// True when <paramref name="utcNow"/> - <see cref="MembershipResolvedUtc"/> exceeds
    /// <see cref="AuthCacheMaxStaleness"/>. The evaluator short-circuits to NotGranted
    /// whenever this is true.
    /// </summary>
    public bool IsStale(DateTime utcNow) => utcNow - MembershipResolvedUtc > AuthCacheMaxStaleness;

    /// <summary>
    /// True when membership is past its freshness interval but still within the staleness
    /// ceiling — a signal to the caller to kick off an async refresh, while the current
    /// call still evaluates against the cached memberships. Can only return true when
    /// <see cref="MembershipFreshnessInterval"/> is shorter than
    /// <see cref="AuthCacheMaxStaleness"/>.
    /// </summary>
    public bool NeedsRefresh(DateTime utcNow) =>
        !IsStale(utcNow) && utcNow - MembershipResolvedUtc > MembershipFreshnessInterval;
}
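The interaction of the two windows is easier to see with concrete numbers. The sketch below repeats the `IsStale` / `NeedsRefresh` comparisons as free-standing local functions (hypothetical helpers that take the windows as parameters) and evaluates a membership that is 10 minutes old, first with the defaults declared in the record and then with an assumed override in which the freshness interval is the shorter window.

```csharp
using System;

// Same comparisons as IsStale / NeedsRefresh, with the two windows passed in explicitly.
bool IsStale(TimeSpan age, TimeSpan maxStaleness) => age > maxStaleness;
bool NeedsRefresh(TimeSpan age, TimeSpan freshness, TimeSpan maxStaleness) =>
    !IsStale(age, maxStaleness) && age > freshness;

// Membership resolved 10 minutes ago.
var age = TimeSpan.FromMinutes(10);

// Defaults from the record: freshness 15 min, staleness ceiling 5 min.
var staleWithDefaults = IsStale(age, TimeSpan.FromMinutes(5));
var refreshWithDefaults = NeedsRefresh(age, TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(5));

// Hypothetical override: freshness 5 min, staleness ceiling 15 min.
var staleWithOverride = IsStale(age, TimeSpan.FromMinutes(15));
var refreshWithOverride = NeedsRefresh(age, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(15));

Console.WriteLine($"defaults: stale={staleWithDefaults}, needsRefresh={refreshWithDefaults}");
Console.WriteLine($"override: stale={staleWithOverride}, needsRefresh={refreshWithOverride}");
```

With the declared defaults the staleness ceiling is reached before the freshness interval, so a session goes straight from fresh to fail-closed. A refresh signal before that point requires a freshness interval below the ceiling, as in the assumed override.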
@@ -0,0 +1,86 @@
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Observability;

/// <summary>
/// Domain-layer health aggregation for Phase 6.1 Stream C. Pure functions over the driver
/// fleet — given each driver's <see cref="DriverState"/>, produce a <see cref="ReadinessVerdict"/>
/// that maps to HTTP status codes at the endpoint layer.
/// </summary>
/// <remarks>
/// State matrix per <c>docs/v2/implementation/phase-6-1-resilience-and-observability.md</c>
/// §Stream C.1:
/// <list type="bullet">
/// <item><see cref="DriverState.Unknown"/> / <see cref="DriverState.Initializing"/>
/// → /readyz 503 (not yet ready).</item>
/// <item><see cref="DriverState.Healthy"/> → /readyz 200.</item>
/// <item><see cref="DriverState.Degraded"/> / <see cref="DriverState.Reconnecting"/>
/// → /readyz 200 with flagged driver IDs.</item>
/// <item><see cref="DriverState.Faulted"/> → /readyz 503.</item>
/// </list>
/// The overall verdict is computed across the fleet: any Faulted → Faulted; any
/// Unknown/Initializing → NotReady; any Degraded/Reconnecting → Degraded; else Healthy.
/// An empty fleet is Healthy (nothing to degrade).
/// </remarks>
public static class DriverHealthReport
{
    /// <summary>Compute the fleet-wide readiness verdict from per-driver states.</summary>
    public static ReadinessVerdict Aggregate(IReadOnlyList<DriverHealthSnapshot> drivers)
    {
        ArgumentNullException.ThrowIfNull(drivers);
        if (drivers.Count == 0) return ReadinessVerdict.Healthy;

        var anyFaulted = drivers.Any(d => d.State == DriverState.Faulted);
        if (anyFaulted) return ReadinessVerdict.Faulted;

        var anyInitializing = drivers.Any(d =>
            d.State == DriverState.Unknown || d.State == DriverState.Initializing);
        if (anyInitializing) return ReadinessVerdict.NotReady;

        // Reconnecting = driver alive but not serving live data; report as Degraded so /readyz
        // stays 200 (the fleet can still serve cached / last-good data) while operators see the
        // affected driver in the body.
        var anyDegraded = drivers.Any(d =>
            d.State == DriverState.Degraded || d.State == DriverState.Reconnecting);
        if (anyDegraded) return ReadinessVerdict.Degraded;

        return ReadinessVerdict.Healthy;
    }

    /// <summary>
    /// Map a <see cref="ReadinessVerdict"/> to the HTTP status the /readyz endpoint should
    /// return per the Stream C.1 state matrix.
    /// </summary>
    public static int HttpStatus(ReadinessVerdict verdict) => verdict switch
    {
        ReadinessVerdict.Healthy => 200,
        ReadinessVerdict.Degraded => 200,
        ReadinessVerdict.NotReady => 503,
        ReadinessVerdict.Faulted => 503,
        _ => 500,
    };
}

/// <summary>Per-driver snapshot fed into <see cref="DriverHealthReport.Aggregate"/>.</summary>
/// <param name="DriverInstanceId">Driver instance identifier (from <c>IDriver.DriverInstanceId</c>).</param>
/// <param name="State">Current <see cref="DriverState"/> from <c>IDriver.GetHealth</c>.</param>
/// <param name="DetailMessage">Optional driver-supplied detail (e.g. "primary PLC unreachable").</param>
public sealed record DriverHealthSnapshot(
    string DriverInstanceId,
    DriverState State,
    string? DetailMessage = null);

/// <summary>Overall fleet readiness — derived from driver states by <see cref="DriverHealthReport.Aggregate"/>.</summary>
public enum ReadinessVerdict
{
    /// <summary>All drivers Healthy (or fleet is empty).</summary>
    Healthy,

    /// <summary>At least one driver Degraded or Reconnecting; none Faulted / Unknown / Initializing.</summary>
    Degraded,

    /// <summary>At least one driver Unknown / Initializing; none Faulted.</summary>
    NotReady,

    /// <summary>At least one driver Faulted.</summary>
    Faulted,
}
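The precedence used by `Aggregate` (Faulted, then Unknown/Initializing, then Degraded/Reconnecting, then Healthy) can be exercised with a stripped-down sketch. Driver states and verdicts are plain strings here purely to keep the example self-contained; the code above uses the `DriverState` and `ReadinessVerdict` enums, and the local functions are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Driver states are plain strings here; the real code uses the DriverState enum.
string Aggregate(IReadOnlyList<string> states)
{
    if (states.Count == 0) return "Healthy";
    if (states.Contains("Faulted")) return "Faulted";
    if (states.Contains("Unknown") || states.Contains("Initializing")) return "NotReady";
    if (states.Contains("Degraded") || states.Contains("Reconnecting")) return "Degraded";
    return "Healthy";
}

// Same mapping as HttpStatus: only Healthy and Degraded keep /readyz at 200.
int HttpStatus(string verdict) => verdict switch
{
    "Healthy" or "Degraded" => 200,
    "NotReady" or "Faulted" => 503,
    _ => 500,
};

var fleet = new[] { "Healthy", "Reconnecting", "Initializing" };
var verdict = Aggregate(fleet);
Console.WriteLine($"{verdict} -> {HttpStatus(verdict)}");
```

In this fleet the Initializing driver outranks the Reconnecting one, so the endpoint reports not-ready even though another driver is merely degraded.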
@@ -0,0 +1,53 @@
using Serilog.Context;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Observability;

/// <summary>
/// Convenience wrapper around Serilog <see cref="LogContext"/> — attaches the set of
/// structured properties a capability call should carry (DriverInstanceId, DriverType,
/// CapabilityName, CorrelationId). Callers wrap their call-site body in a <c>using</c>
/// block; inner <c>Log.Information</c> / <c>Log.Warning</c> calls emit the context
/// automatically via the Serilog enricher chain.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/implementation/phase-6-1-resilience-and-observability.md</c> §Stream C.2.
/// The correlation ID should be the OPC UA <c>RequestHeader.RequestHandle</c> when in-flight;
/// otherwise a short random GUID. Callers supply whichever is available.
/// </remarks>
public static class LogContextEnricher
{
    /// <summary>Attach the capability-call property set. Dispose the returned scope to pop.</summary>
    public static IDisposable Push(string driverInstanceId, string driverType, DriverCapability capability, string correlationId)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(driverInstanceId);
        ArgumentException.ThrowIfNullOrWhiteSpace(driverType);
        ArgumentException.ThrowIfNullOrWhiteSpace(correlationId);

        var a = LogContext.PushProperty("DriverInstanceId", driverInstanceId);
        var b = LogContext.PushProperty("DriverType", driverType);
        var c = LogContext.PushProperty("CapabilityName", capability.ToString());
        var d = LogContext.PushProperty("CorrelationId", correlationId);
        return new CompositeScope(a, b, c, d);
    }

    /// <summary>
    /// Generate a short correlation ID when no OPC UA RequestHandle is available.
    /// 12-hex-char slice of a GUID — long enough for log correlation, short enough to
    /// scan visually.
    /// </summary>
    public static string NewCorrelationId() => Guid.NewGuid().ToString("N")[..12];

    private sealed class CompositeScope : IDisposable
    {
        private readonly IDisposable[] _inner;
        public CompositeScope(params IDisposable[] inner) => _inner = inner;

        public void Dispose()
        {
            // Reverse-order disposal matches Serilog's stack semantics.
            for (var i = _inner.Length - 1; i >= 0; i--)
                _inner[i].Dispose();
        }
    }
}
120
src/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs
Normal file
@@ -0,0 +1,120 @@
using Polly;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Observability;

namespace ZB.MOM.WW.OtOpcUa.Core.Resilience;

/// <summary>
/// Executes driver-capability calls through a shared Polly pipeline. One invoker per
/// <c>(DriverInstance, IDriver)</c> pair; the underlying <see cref="DriverResiliencePipelineBuilder"/>
/// is process-singleton so all invokers share its cache.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/plan.md</c> decisions #143-144 and Phase 6.1 Stream A.3. The server's dispatch
/// layer routes every capability call (<c>IReadable.ReadAsync</c>, <c>IWritable.WriteAsync</c>,
/// <c>ITagDiscovery.DiscoverAsync</c>, <c>ISubscribable.SubscribeAsync/UnsubscribeAsync</c>,
/// <c>IHostConnectivityProbe</c> probe loop, <c>IAlarmSource.SubscribeAlarmsAsync/AcknowledgeAsync</c>,
/// and all four <c>IHistoryProvider</c> reads) through this invoker.
/// </remarks>
public sealed class CapabilityInvoker
{
    private readonly DriverResiliencePipelineBuilder _builder;
    private readonly string _driverInstanceId;
    private readonly string _driverType;
    private readonly Func<DriverResilienceOptions> _optionsAccessor;

    /// <summary>
    /// Construct an invoker for one driver instance.
    /// </summary>
    /// <param name="builder">Shared, process-singleton pipeline builder.</param>
    /// <param name="driverInstanceId">The <c>DriverInstance.Id</c> column value.</param>
    /// <param name="optionsAccessor">
    /// Snapshot accessor for the current resilience options. Invoked per call so Admin-edit +
    /// pipeline-invalidate can take effect without restarting the invoker.
    /// </param>
    /// <param name="driverType">Driver type name for structured-log enrichment (e.g. <c>"Modbus"</c>).</param>
    public CapabilityInvoker(
        DriverResiliencePipelineBuilder builder,
        string driverInstanceId,
        Func<DriverResilienceOptions> optionsAccessor,
        string driverType = "Unknown")
    {
        ArgumentNullException.ThrowIfNull(builder);
        ArgumentNullException.ThrowIfNull(optionsAccessor);

        _builder = builder;
        _driverInstanceId = driverInstanceId;
        _driverType = driverType;
        _optionsAccessor = optionsAccessor;
    }

    /// <summary>Execute a capability call returning a value, honoring the per-capability pipeline.</summary>
    /// <typeparam name="TResult">Return type of the underlying driver call.</typeparam>
    public async ValueTask<TResult> ExecuteAsync<TResult>(
        DriverCapability capability,
        string hostName,
        Func<CancellationToken, ValueTask<TResult>> callSite,
        CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(callSite);

        var pipeline = ResolvePipeline(capability, hostName);
        using (LogContextEnricher.Push(_driverInstanceId, _driverType, capability, LogContextEnricher.NewCorrelationId()))
        {
            return await pipeline.ExecuteAsync(callSite, cancellationToken).ConfigureAwait(false);
        }
    }

    /// <summary>Execute a void-returning capability call, honoring the per-capability pipeline.</summary>
    public async ValueTask ExecuteAsync(
        DriverCapability capability,
        string hostName,
        Func<CancellationToken, ValueTask> callSite,
        CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(callSite);

        var pipeline = ResolvePipeline(capability, hostName);
        using (LogContextEnricher.Push(_driverInstanceId, _driverType, capability, LogContextEnricher.NewCorrelationId()))
        {
            await pipeline.ExecuteAsync(callSite, cancellationToken).ConfigureAwait(false);
        }
    }

    /// <summary>
    /// Execute a <see cref="DriverCapability.Write"/> call honoring <see cref="WriteIdempotentAttribute"/>
    /// semantics — if <paramref name="isIdempotent"/> is <c>false</c>, retries are disabled regardless
    /// of the tag-level configuration (the pipeline for a non-idempotent write never retries per
    /// decisions #44-45). If <c>true</c>, the call runs through the capability's pipeline which may
    /// retry when the tier configuration permits.
    /// </summary>
    public async ValueTask<TResult> ExecuteWriteAsync<TResult>(
        string hostName,
        bool isIdempotent,
        Func<CancellationToken, ValueTask<TResult>> callSite,
        CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(callSite);

        if (!isIdempotent)
        {
            // Take one options snapshot so the override is derived from the same values it replaces.
            var options = _optionsAccessor();
            var noRetryOptions = options with
            {
                CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
                {
                    [DriverCapability.Write] = options.Resolve(DriverCapability.Write) with { RetryCount = 0 },
                },
            };
            var pipeline = _builder.GetOrCreate(_driverInstanceId, $"{hostName}::non-idempotent", DriverCapability.Write, noRetryOptions);
            using (LogContextEnricher.Push(_driverInstanceId, _driverType, DriverCapability.Write, LogContextEnricher.NewCorrelationId()))
            {
                return await pipeline.ExecuteAsync(callSite, cancellationToken).ConfigureAwait(false);
            }
        }

        return await ExecuteAsync(DriverCapability.Write, hostName, callSite, cancellationToken).ConfigureAwait(false);
    }

    private ResiliencePipeline ResolvePipeline(DriverCapability capability, string hostName) =>
        _builder.GetOrCreate(_driverInstanceId, hostName, capability, _optionsAccessor());
}
@@ -0,0 +1,96 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Resilience;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Per-tier × per-capability resilience policy configuration for a driver instance.
|
||||||
|
/// Bound from <c>DriverInstance.ResilienceConfig</c> JSON (nullable column; null = tier defaults).
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decisions #143 and #144.
|
||||||
|
/// </summary>
|
||||||
|
public sealed record DriverResilienceOptions
|
||||||
|
{
|
||||||
|
/// <summary>Tier the owning driver type is registered as; drives the default map.</summary>
|
||||||
|
public required DriverTier Tier { get; init; }
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Per-capability policy overrides. Capabilities absent from this map fall back to
|
||||||
|
/// <see cref="GetTierDefaults(DriverTier)"/> for the configured <see cref="Tier"/>.
|
||||||
|
/// </summary>
|
||||||
|
public IReadOnlyDictionary<DriverCapability, CapabilityPolicy> CapabilityPolicies { get; init; }
|
||||||
|
= new Dictionary<DriverCapability, CapabilityPolicy>();
|
||||||
|
|
||||||
|
/// <summary>Bulkhead (max concurrent in-flight calls) for every capability. Default 32.</summary>
|
||||||
|
public int BulkheadMaxConcurrent { get; init; } = 32;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Bulkhead queue depth. Zero = no queueing; overflow fails fast with
|
||||||
|
/// <c>BulkheadRejectedException</c>. Default 64.
|
||||||
|
/// </summary>
|
||||||
|
public int BulkheadMaxQueue { get; init; } = 64;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Look up the effective policy for a capability, falling back to tier defaults when no
|
||||||
|
/// override is configured. Never returns null.
|
||||||
|
/// </summary>
|
||||||
|
public CapabilityPolicy Resolve(DriverCapability capability)
|
||||||
|
{
|
||||||
|
if (CapabilityPolicies.TryGetValue(capability, out var policy))
|
||||||
|
return policy;
|
||||||
|
|
||||||
|
var defaults = GetTierDefaults(Tier);
|
||||||
|
return defaults[capability];
|
||||||
|
}
|
||||||
|
|
||||||
|
    /// <summary>
    /// Per-tier per-capability default policy table, per decisions #143-144 and the Phase 6.1
    /// Stream A.2 specification. Retries skipped on <see cref="DriverCapability.Write"/> and
    /// <see cref="DriverCapability.AlarmAcknowledge"/> regardless of tier.
    /// </summary>
    public static IReadOnlyDictionary<DriverCapability, CapabilityPolicy> GetTierDefaults(DriverTier tier) =>
        tier switch
        {
            DriverTier.A => new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Read] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 0, BreakerFailureThreshold: 5),
                [DriverCapability.Discover] = new(TimeoutSeconds: 30, RetryCount: 2, BreakerFailureThreshold: 3),
                [DriverCapability.Subscribe] = new(TimeoutSeconds: 5, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.Probe] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.AlarmSubscribe] = new(TimeoutSeconds: 5, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.AlarmAcknowledge] = new(TimeoutSeconds: 5, RetryCount: 0, BreakerFailureThreshold: 5),
                [DriverCapability.HistoryRead] = new(TimeoutSeconds: 30, RetryCount: 2, BreakerFailureThreshold: 5),
            },
            DriverTier.B => new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Read] = new(TimeoutSeconds: 4, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.Write] = new(TimeoutSeconds: 4, RetryCount: 0, BreakerFailureThreshold: 5),
                [DriverCapability.Discover] = new(TimeoutSeconds: 60, RetryCount: 2, BreakerFailureThreshold: 3),
                [DriverCapability.Subscribe] = new(TimeoutSeconds: 8, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.Probe] = new(TimeoutSeconds: 4, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.AlarmSubscribe] = new(TimeoutSeconds: 8, RetryCount: 3, BreakerFailureThreshold: 5),
                [DriverCapability.AlarmAcknowledge] = new(TimeoutSeconds: 8, RetryCount: 0, BreakerFailureThreshold: 5),
                [DriverCapability.HistoryRead] = new(TimeoutSeconds: 60, RetryCount: 2, BreakerFailureThreshold: 5),
            },
            DriverTier.C => new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Read] = new(TimeoutSeconds: 10, RetryCount: 1, BreakerFailureThreshold: 0),
                [DriverCapability.Write] = new(TimeoutSeconds: 10, RetryCount: 0, BreakerFailureThreshold: 0),
                [DriverCapability.Discover] = new(TimeoutSeconds: 120, RetryCount: 1, BreakerFailureThreshold: 0),
                [DriverCapability.Subscribe] = new(TimeoutSeconds: 15, RetryCount: 1, BreakerFailureThreshold: 0),
                [DriverCapability.Probe] = new(TimeoutSeconds: 10, RetryCount: 1, BreakerFailureThreshold: 0),
                [DriverCapability.AlarmSubscribe] = new(TimeoutSeconds: 15, RetryCount: 1, BreakerFailureThreshold: 0),
                [DriverCapability.AlarmAcknowledge] = new(TimeoutSeconds: 15, RetryCount: 0, BreakerFailureThreshold: 0),
                [DriverCapability.HistoryRead] = new(TimeoutSeconds: 120, RetryCount: 1, BreakerFailureThreshold: 0),
            },
            _ => throw new ArgumentOutOfRangeException(nameof(tier), tier, $"No default policy table defined for tier {tier}."),
        };
}

/// <summary>Policy for one capability on one driver instance.</summary>
/// <param name="TimeoutSeconds">Per-call timeout (wraps the inner Polly execution).</param>
/// <param name="RetryCount">Number of retry attempts after the first failure; zero = no retry.</param>
/// <param name="BreakerFailureThreshold">
/// Consecutive-failure count that opens the circuit breaker; zero = no breaker
/// (Tier C uses the supervisor's process-level breaker instead, per decision #68).
/// </param>
public sealed record CapabilityPolicy(int TimeoutSeconds, int RetryCount, int BreakerFailureThreshold);
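The override-then-default lookup in `Resolve` can be sketched in isolation. This is a minimal illustration, not the repository's code: `SketchCapability`, `SketchPolicy`, and `SketchOptions` are hypothetical stand-ins, and only the Tier A `Read` and `Write` rows from the table above are reproduced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for DriverCapability / CapabilityPolicy; not the real types.
public enum SketchCapability { Read, Write }

public sealed record SketchPolicy(int TimeoutSeconds, int RetryCount, int BreakerFailureThreshold);

public sealed class SketchOptions
{
    // Tier A defaults for Read and Write, copied from the table above.
    private static readonly IReadOnlyDictionary<SketchCapability, SketchPolicy> TierADefaults =
        new Dictionary<SketchCapability, SketchPolicy>
        {
            [SketchCapability.Read] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
            [SketchCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 0, BreakerFailureThreshold: 5),
        };

    // Per-capability overrides; empty by default.
    public IReadOnlyDictionary<SketchCapability, SketchPolicy> Overrides { get; init; }
        = new Dictionary<SketchCapability, SketchPolicy>();

    // An override wins when present; otherwise the tier default applies.
    public SketchPolicy Resolve(SketchCapability capability) =>
        Overrides.TryGetValue(capability, out var policy) ? policy : TierADefaults[capability];
}
```

With a single `Read` override (say a 10 s timeout), `Resolve(SketchCapability.Read)` returns the override while `Resolve(SketchCapability.Write)` still returns the Tier A default with `RetryCount` 0.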
@@ -0,0 +1,118 @@
using System.Collections.Concurrent;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Resilience;

/// <summary>
/// Builds and caches Polly resilience pipelines keyed on
/// <c>(DriverInstanceId, HostName, DriverCapability)</c>. One dead PLC behind a multi-device
/// driver cannot open the circuit breaker for healthy sibling hosts.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/plan.md</c> decision #144 (per-device isolation). Composition from outside-in:
/// <b>Timeout → Retry (when capability permits) → Circuit Breaker (when tier permits) → Bulkhead</b>.
///
/// <para>Pipeline resolution is lock-free on the hot path: the inner
/// <see cref="ConcurrentDictionary{TKey,TValue}"/> caches a <see cref="ResiliencePipeline"/> per key;
/// first-call cost is one <see cref="ResiliencePipelineBuilder"/>.Build. Thereafter reads are O(1).</para>
/// </remarks>
public sealed class DriverResiliencePipelineBuilder
{
    private readonly ConcurrentDictionary<PipelineKey, ResiliencePipeline> _pipelines = new();
    private readonly TimeProvider _timeProvider;

    /// <summary>Construct with the ambient clock (use <see cref="TimeProvider.System"/> in prod).</summary>
    public DriverResiliencePipelineBuilder(TimeProvider? timeProvider = null)
    {
        _timeProvider = timeProvider ?? TimeProvider.System;
    }

    /// <summary>
    /// Get or build the pipeline for a given <c>(driver instance, host, capability)</c> triple.
    /// Calls with the same key reuse the cached pipeline instance. <paramref name="options"/> is
    /// not part of the key: it is read only when the pipeline is first built, so call
    /// <see cref="Invalidate"/> after a configuration change. If two callers race on a new key,
    /// the first pipeline stored wins (given the same options, both would be behaviourally identical).
    /// </summary>
    /// <param name="driverInstanceId">DriverInstance primary key; opaque to this layer.</param>
    /// <param name="hostName">
    /// Host the call targets. For single-host drivers (Galaxy, some OPC UA Client configs) pass the
    /// driver's canonical host string. For multi-host drivers (Modbus with N PLCs), pass the
    /// specific PLC so one dead PLC doesn't poison healthy siblings.
    /// </param>
    /// <param name="capability">Which capability surface is being called.</param>
    /// <param name="options">Per-driver-instance options (tier + per-capability overrides).</param>
    public ResiliencePipeline GetOrCreate(
        string driverInstanceId,
        string hostName,
        DriverCapability capability,
        DriverResilienceOptions options)
    {
        ArgumentNullException.ThrowIfNull(options);
        ArgumentException.ThrowIfNullOrWhiteSpace(hostName);

        var key = new PipelineKey(driverInstanceId, hostName, capability);
        return _pipelines.GetOrAdd(key, static (_, state) => Build(state.capability, state.options, state.timeProvider),
            (capability, options, timeProvider: _timeProvider));
    }

    /// <summary>Drop cached pipelines for one driver instance (e.g. on ResilienceConfig change). Test + Admin-reload use.</summary>
    public int Invalidate(string driverInstanceId)
    {
        var removed = 0;
        foreach (var key in _pipelines.Keys)
        {
            if (key.DriverInstanceId == driverInstanceId && _pipelines.TryRemove(key, out _))
                removed++;
        }
        return removed;
    }

    /// <summary>Snapshot of the current number of cached pipelines. For diagnostics only.</summary>
    public int CachedPipelineCount => _pipelines.Count;

    private static ResiliencePipeline Build(
        DriverCapability capability,
        DriverResilienceOptions options,
        TimeProvider timeProvider)
    {
        var policy = options.Resolve(capability);
        var builder = new ResiliencePipelineBuilder { TimeProvider = timeProvider };

        // Strategies run in the order they are added, so the timeout added first is the
        // outermost layer and bounds the whole execution, retries included.
        // No concurrency limiter is added here; BulkheadMaxConcurrent and BulkheadMaxQueue
        // are not read by this method.
        builder.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(policy.TimeoutSeconds),
        });

        if (policy.RetryCount > 0)
        {
            builder.AddRetry(new RetryStrategyOptions
            {
                MaxRetryAttempts = policy.RetryCount,
                BackoffType = DelayBackoffType.Exponential,
                UseJitter = true,
                Delay = TimeSpan.FromMilliseconds(100),
                MaxDelay = TimeSpan.FromSeconds(5),
                ShouldHandle = new PredicateBuilder().Handle<Exception>(ex => ex is not OperationCanceledException),
            });
        }

        if (policy.BreakerFailureThreshold > 0)
        {
            // Approximates "N consecutive failures": the breaker opens when at least N calls
            // were sampled within the 30 s window and every one of them failed.
            builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions
            {
                FailureRatio = 1.0,
                MinimumThroughput = policy.BreakerFailureThreshold,
                SamplingDuration = TimeSpan.FromSeconds(30),
                BreakDuration = TimeSpan.FromSeconds(15),
                ShouldHandle = new PredicateBuilder().Handle<Exception>(ex => ex is not OperationCanceledException),
            });
        }

        return builder.Build();
    }

    private readonly record struct PipelineKey(string DriverInstanceId, string HostName, DriverCapability Capability);
}
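To get a feel for the retry settings in `Build` (100 ms base delay, exponential backoff, 5 s cap), the nominal wait schedule can be worked out by hand. The sketch below assumes a jitter-free schedule of `baseDelay × 2^attempt` with attempts numbered from zero; since the real pipeline sets `UseJitter = true`, actual waits are randomised around these values. `BackoffSketch` and `NominalDelays` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffSketch
{
    // Nominal (jitter-free) exponential schedule: baseDelay * 2^attempt, capped at maxDelay.
    // Assumption: attempt numbering starts at 0 for the first retry.
    public static IReadOnlyList<TimeSpan> NominalDelays(int retryCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        var delays = new List<TimeSpan>(retryCount);
        for (var attempt = 0; attempt < retryCount; attempt++)
        {
            var ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
            delays.Add(TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds)));
        }
        return delays;
    }
}
```

For Tier A `Read` (`RetryCount: 3`) the nominal waits come out as 100 ms, 200 ms and 400 ms, about 700 ms in total. Because the timeout is the outermost strategy, that waiting time appears to count against the same 2 s budget as the attempts themselves.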
@@ -0,0 +1,104 @@
using System.Collections.Concurrent;

namespace ZB.MOM.WW.OtOpcUa.Core.Resilience;

/// <summary>
/// Process-singleton tracker of live resilience counters per
/// <c>(DriverInstanceId, HostName)</c>. Populated by the CapabilityInvoker and the
/// MemoryTracking layer; consumed by a HostedService that periodically persists a
/// snapshot to the <c>DriverInstanceResilienceStatus</c> table for Admin <c>/hosts</c>.
/// </summary>
/// <remarks>
/// Per Phase 6.1 Stream E. No DB dependency here: the tracker is pure in-memory so
/// tests can exercise it without EF Core or SQL Server. The HostedService that writes
/// snapshots lives in the Server project (Stream E.2); the actual SignalR push + Blazor
/// page refresh (E.3) lands in a follow-up visual-review PR.
/// </remarks>
public sealed class DriverResilienceStatusTracker
{
    private readonly ConcurrentDictionary<StatusKey, ResilienceStatusSnapshot> _status = new();

    /// <summary>Record a Polly pipeline failure for <paramref name="hostName"/>.</summary>
    public void RecordFailure(string driverInstanceId, string hostName, DateTime utcNow)
    {
        var key = new StatusKey(driverInstanceId, hostName);
        _status.AddOrUpdate(key,
            _ => new ResilienceStatusSnapshot { ConsecutiveFailures = 1, LastSampledUtc = utcNow },
            (_, existing) => existing with
            {
                ConsecutiveFailures = existing.ConsecutiveFailures + 1,
                LastSampledUtc = utcNow,
            });
    }

    /// <summary>Reset the consecutive-failure count on a successful pipeline execution.</summary>
    public void RecordSuccess(string driverInstanceId, string hostName, DateTime utcNow)
    {
        var key = new StatusKey(driverInstanceId, hostName);
        _status.AddOrUpdate(key,
            _ => new ResilienceStatusSnapshot { ConsecutiveFailures = 0, LastSampledUtc = utcNow },
            (_, existing) => existing with
            {
                ConsecutiveFailures = 0,
                LastSampledUtc = utcNow,
            });
    }

    /// <summary>Record a circuit-breaker open event.</summary>
    public void RecordBreakerOpen(string driverInstanceId, string hostName, DateTime utcNow)
    {
        var key = new StatusKey(driverInstanceId, hostName);
        _status.AddOrUpdate(key,
            _ => new ResilienceStatusSnapshot { LastBreakerOpenUtc = utcNow, LastSampledUtc = utcNow },
            (_, existing) => existing with { LastBreakerOpenUtc = utcNow, LastSampledUtc = utcNow });
    }

    /// <summary>Record a process recycle event (Tier C only).</summary>
    public void RecordRecycle(string driverInstanceId, string hostName, DateTime utcNow)
    {
        var key = new StatusKey(driverInstanceId, hostName);
        _status.AddOrUpdate(key,
            _ => new ResilienceStatusSnapshot { LastRecycleUtc = utcNow, LastSampledUtc = utcNow },
            (_, existing) => existing with { LastRecycleUtc = utcNow, LastSampledUtc = utcNow });
    }

    /// <summary>Capture / update the MemoryTracking-supplied baseline + current footprint.</summary>
    public void RecordFootprint(string driverInstanceId, string hostName, long baselineBytes, long currentBytes, DateTime utcNow)
    {
        var key = new StatusKey(driverInstanceId, hostName);
        _status.AddOrUpdate(key,
            _ => new ResilienceStatusSnapshot
            {
                BaselineFootprintBytes = baselineBytes,
                CurrentFootprintBytes = currentBytes,
                LastSampledUtc = utcNow,
            },
            (_, existing) => existing with
            {
                BaselineFootprintBytes = baselineBytes,
                CurrentFootprintBytes = currentBytes,
                LastSampledUtc = utcNow,
            });
    }

    /// <summary>Snapshot of a specific (instance, host) pair; null if no counters recorded yet.</summary>
    public ResilienceStatusSnapshot? TryGet(string driverInstanceId, string hostName) =>
        _status.TryGetValue(new StatusKey(driverInstanceId, hostName), out var snapshot) ? snapshot : null;

    /// <summary>Copy of every currently-tracked (instance, host, snapshot) triple. Safe under concurrent writes.</summary>
    public IReadOnlyList<(string DriverInstanceId, string HostName, ResilienceStatusSnapshot Snapshot)> Snapshot() =>
        _status.Select(kvp => (kvp.Key.DriverInstanceId, kvp.Key.HostName, kvp.Value)).ToList();

    private readonly record struct StatusKey(string DriverInstanceId, string HostName);
}

/// <summary>Snapshot of the resilience counters for one <c>(DriverInstanceId, HostName)</c> pair.</summary>
public sealed record ResilienceStatusSnapshot
{
    public int ConsecutiveFailures { get; init; }
    public DateTime? LastBreakerOpenUtc { get; init; }
    public DateTime? LastRecycleUtc { get; init; }
    public long BaselineFootprintBytes { get; init; }
    public long CurrentFootprintBytes { get; init; }
    public DateTime LastSampledUtc { get; init; }
}
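The counting semantics of the tracker above (failures accumulate per host, a success resets them, hosts are independent) can be traced with a reduced stand-in. `CounterSnapshot` and `CounterSketch` are hypothetical names; only the failure and success paths are reproduced, using the same `AddOrUpdate` plus record `with` pattern.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical, reduced version of ResilienceStatusSnapshot.
public sealed record CounterSnapshot
{
    public int ConsecutiveFailures { get; init; }
    public DateTime LastSampledUtc { get; init; }
}

public sealed class CounterSketch
{
    private readonly ConcurrentDictionary<(string Instance, string Host), CounterSnapshot> _status = new();

    // First failure creates the entry at 1; later failures copy the record with count + 1.
    public void RecordFailure(string instance, string host, DateTime utcNow) =>
        _status.AddOrUpdate((instance, host),
            _ => new CounterSnapshot { ConsecutiveFailures = 1, LastSampledUtc = utcNow },
            (_, existing) => existing with
            {
                ConsecutiveFailures = existing.ConsecutiveFailures + 1,
                LastSampledUtc = utcNow,
            });

    // A success resets the consecutive-failure count to zero.
    public void RecordSuccess(string instance, string host, DateTime utcNow) =>
        _status.AddOrUpdate((instance, host),
            _ => new CounterSnapshot { ConsecutiveFailures = 0, LastSampledUtc = utcNow },
            (_, existing) => existing with { ConsecutiveFailures = 0, LastSampledUtc = utcNow });

    public CounterSnapshot? TryGet(string instance, string host) =>
        _status.TryGetValue((instance, host), out var snapshot) ? snapshot : null;
}
```

Two failures against one host followed by a success leave that host at zero, while a second host under the same driver instance keeps its own count. Because snapshots are immutable records, a reader never observes a half-updated entry.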
65
src/ZB.MOM.WW.OtOpcUa.Core/Stability/MemoryRecycle.cs
Normal file
@@ -0,0 +1,65 @@
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Stability;

/// <summary>
/// Tier C only process-recycle companion to <see cref="MemoryTracking"/>. On a
/// <see cref="MemoryTrackingAction.HardBreach"/> signal, invokes the supplied
/// <see cref="IDriverSupervisor"/> to restart the out-of-process Host.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/plan.md</c> decisions #74 and #145. Tier A/B hard-breach on an in-process
/// driver would kill every OPC UA session and every co-hosted driver, so for Tier A/B this
/// class logs a <b>promotion-to-Tier-C recommendation</b> and does NOT invoke any supervisor.
/// A future tier-migration workflow acts on the recommendation.
/// </remarks>
public sealed class MemoryRecycle
{
    private readonly DriverTier _tier;
    private readonly IDriverSupervisor? _supervisor;
    private readonly ILogger<MemoryRecycle> _logger;

    public MemoryRecycle(DriverTier tier, IDriverSupervisor? supervisor, ILogger<MemoryRecycle> logger)
    {
        _tier = tier;
        _supervisor = supervisor;
        _logger = logger;
    }

    /// <summary>
    /// Handle a <see cref="MemoryTracking"/> classification for the driver. For Tier C with a
    /// wired supervisor, <c>HardBreach</c> triggers <see cref="IDriverSupervisor.RecycleAsync"/>.
    /// All other combinations are no-ops with respect to process state (soft breaches + Tier A/B
    /// hard breaches just log).
    /// </summary>
    /// <returns>True when a recycle was requested; false otherwise.</returns>
    public async Task<bool> HandleAsync(MemoryTrackingAction action, long footprintBytes, CancellationToken cancellationToken)
    {
        switch (action)
        {
            case MemoryTrackingAction.SoftBreach:
                _logger.LogWarning(
                    "Memory soft-breach on driver {DriverId}: footprint={Footprint:N0} bytes, tier={Tier}. Surfaced to Admin; no action.",
                    _supervisor?.DriverInstanceId ?? "(unknown)", footprintBytes, _tier);
                return false;

            case MemoryTrackingAction.HardBreach when _tier == DriverTier.C && _supervisor is not null:
                _logger.LogError(
                    "Memory hard-breach on Tier C driver {DriverId}: footprint={Footprint:N0} bytes. Requesting supervisor recycle.",
                    _supervisor.DriverInstanceId, footprintBytes);
                await _supervisor.RecycleAsync($"Memory hard-breach: {footprintBytes} bytes", cancellationToken).ConfigureAwait(false);
                return true;

            case MemoryTrackingAction.HardBreach:
                _logger.LogError(
                    "Memory hard-breach on Tier {Tier} in-process driver {DriverId}: footprint={Footprint:N0} bytes. " +
                    "Recommending promotion to Tier C; NOT auto-killing (decisions #74, #145).",
                    _tier, _supervisor?.DriverInstanceId ?? "(unknown)", footprintBytes);
                return false;

            default:
                return false;
        }
    }
}
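The branching in `HandleAsync` reduces to a single condition for when a recycle is actually requested. The sketch below restates it as a pure function so the combinations can be checked without a logger or supervisor; `SketchTier`, `SketchAction`, and `RecycleDecisionSketch` are hypothetical names, not types from the repository.

```csharp
using System;

// Hypothetical stand-ins for DriverTier and MemoryTrackingAction.
public enum SketchTier { A, B, C }
public enum SketchAction { Warming, None, SoftBreach, HardBreach }

public static class RecycleDecisionSketch
{
    // True only for a hard breach on a Tier C driver that has a supervisor wired in.
    // Every other combination logs at most and leaves process state untouched.
    public static bool ShouldRecycle(SketchTier tier, SketchAction action, bool hasSupervisor) =>
        action == SketchAction.HardBreach && tier == SketchTier.C && hasSupervisor;
}
```

Note that a Tier C hard breach without a supervisor falls through to the same "recommend promotion" log path as Tier A/B in the original switch, which this condition mirrors by returning false.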
136
src/ZB.MOM.WW.OtOpcUa.Core/Stability/MemoryTracking.cs
Normal file
@@ -0,0 +1,136 @@
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Stability;

/// <summary>
/// Tier-agnostic memory-footprint tracker. Captures the post-initialize <b>baseline</b>
/// from the first samples after <c>IDriver.InitializeAsync</c>, then classifies each
/// subsequent sample against a hybrid soft/hard threshold per
/// <c>docs/v2/plan.md</c> decision #146: <c>soft = max(multiplier × baseline, baseline + floor)</c>,
/// <c>hard = 2 × soft</c>.
/// </summary>
/// <remarks>
/// <para>Per decision #145, this tracker <b>never kills a process</b>. Soft and hard breaches
/// log + surface to the Admin UI via <c>DriverInstanceResilienceStatus</c>. The matching
/// process-level recycle protection lives in a separate <c>MemoryRecycle</c> that activates
/// for Tier C drivers only (where the driver runs out-of-process behind a supervisor that
/// can safely restart it without tearing down the OPC UA session or co-hosted in-proc
/// drivers).</para>
///
/// <para>Baseline capture: the tracker starts in <see cref="TrackingPhase.WarmingUp"/> for
/// <see cref="BaselineWindow"/> (default 5 min). During that window samples are collected;
/// the baseline is computed as the median once the window elapses. Before that point every
/// classification returns <see cref="MemoryTrackingAction.Warming"/>.</para>
/// </remarks>
public sealed class MemoryTracking
{
    private readonly DriverTier _tier;
    private readonly TimeSpan _baselineWindow;
    private readonly List<long> _warmupSamples = [];
    private long _baselineBytes;
    private TrackingPhase _phase = TrackingPhase.WarmingUp;
    private DateTime? _warmupStartUtc;

    /// <summary>Tier-default multiplier/floor constants per decision #146.</summary>
    public static (int Multiplier, long FloorBytes) GetTierConstants(DriverTier tier) => tier switch
    {
        DriverTier.A => (Multiplier: 3, FloorBytes: 50L * 1024 * 1024),
        DriverTier.B => (Multiplier: 3, FloorBytes: 100L * 1024 * 1024),
        DriverTier.C => (Multiplier: 2, FloorBytes: 500L * 1024 * 1024),
        _ => throw new ArgumentOutOfRangeException(nameof(tier), tier, $"No memory-tracking constants defined for tier {tier}."),
    };

    /// <summary>Window over which post-init samples are collected to compute the baseline.</summary>
    public TimeSpan BaselineWindow => _baselineWindow;

    /// <summary>Current phase: <see cref="TrackingPhase.WarmingUp"/> or <see cref="TrackingPhase.Steady"/>.</summary>
    public TrackingPhase Phase => _phase;

    /// <summary>Captured baseline; 0 until warmup completes.</summary>
    public long BaselineBytes => _baselineBytes;

    /// <summary>Effective soft threshold (zero while warming up).</summary>
    // Keyed on the phase rather than on a zero baseline: a captured baseline of 0 bytes would
    // otherwise leave both thresholds at 0 and classify every steady-phase sample as a hard breach.
    public long SoftThresholdBytes => _phase == TrackingPhase.WarmingUp ? 0 : ComputeSoft(_tier, _baselineBytes);

    /// <summary>Effective hard threshold = 2 × soft (zero while warming up).</summary>
    public long HardThresholdBytes => _phase == TrackingPhase.WarmingUp ? 0 : ComputeSoft(_tier, _baselineBytes) * 2;

    public MemoryTracking(DriverTier tier, TimeSpan? baselineWindow = null)
    {
        _tier = tier;
        _baselineWindow = baselineWindow ?? TimeSpan.FromMinutes(5);
    }

    /// <summary>
    /// Submit a memory-footprint sample. Returns the action the caller should surface.
    /// During warmup, accumulates samples and returns <see cref="MemoryTrackingAction.Warming"/>.
    /// The first sample at or after the end of the baseline window is included in the warmup
    /// set, fixes the baseline (median of warmup samples), switches the tracker to
    /// <see cref="TrackingPhase.Steady"/>, and is itself classified against the thresholds.
    /// </summary>
    public MemoryTrackingAction Sample(long footprintBytes, DateTime utcNow)
    {
        if (_phase == TrackingPhase.WarmingUp)
        {
            _warmupStartUtc ??= utcNow;
            _warmupSamples.Add(footprintBytes);
            if (utcNow - _warmupStartUtc.Value >= _baselineWindow && _warmupSamples.Count > 0)
            {
                _baselineBytes = ComputeMedian(_warmupSamples);
                _phase = TrackingPhase.Steady;
            }
            else
            {
                return MemoryTrackingAction.Warming;
            }
        }

        if (footprintBytes >= HardThresholdBytes) return MemoryTrackingAction.HardBreach;
        if (footprintBytes >= SoftThresholdBytes) return MemoryTrackingAction.SoftBreach;
        return MemoryTrackingAction.None;
    }

    private static long ComputeSoft(DriverTier tier, long baseline)
    {
        var (multiplier, floor) = GetTierConstants(tier);
        return Math.Max(multiplier * baseline, baseline + floor);
    }

    private static long ComputeMedian(List<long> samples)
    {
        var sorted = samples.Order().ToArray();
        var mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}

/// <summary>Phase of a <see cref="MemoryTracking"/> lifecycle.</summary>
public enum TrackingPhase
{
    /// <summary>Collecting post-init samples; baseline not yet computed.</summary>
    WarmingUp,

    /// <summary>Baseline captured; every sample classified against soft/hard thresholds.</summary>
    Steady,
}

/// <summary>Classification the tracker returns per sample.</summary>
public enum MemoryTrackingAction
{
    /// <summary>Baseline not yet captured; sample collected, no threshold check.</summary>
    Warming,

    /// <summary>Below soft threshold.</summary>
    None,

    /// <summary>Between soft and hard thresholds: log + surface, no action.</summary>
    SoftBreach,

    /// <summary>
    /// ≥ hard threshold. Log + surface + (Tier C only, via <c>MemoryRecycle</c>) request
    /// process recycle via the driver supervisor. Tier A/B breach never invokes any
    /// kill path per decisions #145 and #74.
    /// </summary>
    HardBreach,
}
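The hybrid threshold formula from decision #146 is easiest to read with numbers plugged in. The sketch below is a standalone restatement of `soft = max(multiplier × baseline, baseline + floor)` and `hard = 2 × soft` using the Tier A and Tier C constants from `GetTierConstants`; `ThresholdSketch` and its method names are hypothetical.

```csharp
using System;

public static class ThresholdSketch
{
    private const long MiB = 1024 * 1024;

    // soft = max(multiplier * baseline, baseline + floor); hard = 2 * soft.
    public static (long Soft, long Hard) Compute(int multiplier, long floorBytes, long baselineBytes)
    {
        var soft = Math.Max(multiplier * baselineBytes, baselineBytes + floorBytes);
        return (soft, soft * 2);
    }

    // Tier A constants as listed above: multiplier 3, floor 50 MiB.
    public static (long Soft, long Hard) TierA(long baselineBytes) => Compute(3, 50 * MiB, baselineBytes);

    // Tier C constants as listed above: multiplier 2, floor 500 MiB.
    public static (long Soft, long Hard) TierC(long baselineBytes) => Compute(2, 500 * MiB, baselineBytes);
}
```

For a Tier A driver with a 10 MiB baseline the floor term dominates (10 + 50 = 60 MiB versus 3 × 10 = 30 MiB), giving soft = 60 MiB and hard = 120 MiB. At a 100 MiB baseline the multiplier term takes over (300 MiB versus 150 MiB), giving soft = 300 MiB and hard = 600 MiB. The crossover sits at `floor / (multiplier - 1)`, which is 25 MiB for Tier A and 500 MiB for Tier C.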
@@ -0,0 +1,86 @@
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Stability;

/// <summary>
/// Tier C opt-in periodic-recycle driver per <c>docs/v2/plan.md</c> decision #67.
/// A tick method advanced by the caller (fed by a background timer in prod; by test clock
/// in unit tests) decides whether the configured interval has elapsed and, if so, drives the
/// supplied <see cref="IDriverSupervisor"/> to recycle the Host.
/// </summary>
/// <remarks>
/// Tier A/B drivers MUST NOT use this class: scheduled recycle for in-process drivers would
/// kill every OPC UA session and every co-hosted driver. The ctor throws when constructed
/// with any tier other than C to make the misuse structurally impossible.
///
/// <para>Keeps no background thread of its own; callers invoke <see cref="TickAsync"/> on
/// their ambient scheduler tick (Phase 6.1 Stream C's health-endpoint host runs one). That
/// decouples the unit under test from wall-clock time and thread-pool scheduling.</para>
/// </remarks>
public sealed class ScheduledRecycleScheduler
{
    private readonly TimeSpan _recycleInterval;
    private readonly IDriverSupervisor _supervisor;
    private readonly ILogger<ScheduledRecycleScheduler> _logger;
    private DateTime _nextRecycleUtc;

    /// <summary>
    /// Construct the scheduler for a Tier C driver. Throws if <paramref name="tier"/> isn't C.
    /// </summary>
    /// <param name="tier">Driver tier; must be <see cref="DriverTier.C"/>.</param>
    /// <param name="recycleInterval">Interval between recycles (e.g. 7 days).</param>
    /// <param name="startUtc">Anchor time; next recycle fires at <paramref name="startUtc"/> + <paramref name="recycleInterval"/>.</param>
    /// <param name="supervisor">Supervisor that performs the actual recycle.</param>
    /// <param name="logger">Diagnostic sink.</param>
    public ScheduledRecycleScheduler(
        DriverTier tier,
        TimeSpan recycleInterval,
        DateTime startUtc,
        IDriverSupervisor supervisor,
        ILogger<ScheduledRecycleScheduler> logger)
    {
        if (tier != DriverTier.C)
            throw new ArgumentException(
                $"ScheduledRecycleScheduler is Tier C only (got {tier}). " +
                "In-process drivers must not use scheduled recycle; see decisions #74 and #145.",
                nameof(tier));

        if (recycleInterval <= TimeSpan.Zero)
            throw new ArgumentException("RecycleInterval must be positive.", nameof(recycleInterval));

        _recycleInterval = recycleInterval;
        _supervisor = supervisor;
        _logger = logger;
        _nextRecycleUtc = startUtc + recycleInterval;
    }

    /// <summary>Next scheduled recycle UTC. Advances by <see cref="RecycleInterval"/> on each fire.</summary>
    public DateTime NextRecycleUtc => _nextRecycleUtc;

    /// <summary>Recycle interval this scheduler was constructed with.</summary>
    public TimeSpan RecycleInterval => _recycleInterval;

    /// <summary>
    /// Tick the scheduler forward. If <paramref name="utcNow"/> is at or past
    /// <see cref="NextRecycleUtc"/>, requests a recycle from the supervisor and advances
    /// <see cref="NextRecycleUtc"/> by exactly one interval; after a gap longer than one
    /// interval the schedule therefore catches up one interval per tick. If the supervisor
    /// throws, the schedule is not advanced. Returns true when a recycle fired.
    /// </summary>
    public async Task<bool> TickAsync(DateTime utcNow, CancellationToken cancellationToken)
    {
        if (utcNow < _nextRecycleUtc)
            return false;

        _logger.LogInformation(
            "Scheduled recycle due for Tier C driver {DriverId} at {Now:o}; advancing next to {Next:o}.",
            _supervisor.DriverInstanceId, utcNow, _nextRecycleUtc + _recycleInterval);

        await _supervisor.RecycleAsync("Scheduled periodic recycle", cancellationToken).ConfigureAwait(false);
        _nextRecycleUtc += _recycleInterval;
        return true;
    }

    /// <summary>Request an immediate recycle outside the schedule (e.g. MemoryRecycle hard-breach escalation).</summary>
    public Task RequestRecycleNowAsync(string reason, CancellationToken cancellationToken) =>
        _supervisor.RecycleAsync(reason, cancellationToken);
}
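The tick arithmetic above can be traced without a supervisor or logger. The sketch below keeps only the schedule state; `TickSketch` is a hypothetical, synchronous stand-in for `TickAsync`, with the recycle call itself omitted.

```csharp
using System;

public sealed class TickSketch
{
    private readonly TimeSpan _interval;

    // Next time at which Tick will report that a recycle is due.
    public DateTime NextUtc { get; private set; }

    public TickSketch(DateTime startUtc, TimeSpan interval)
    {
        _interval = interval;
        NextUtc = startUtc + interval;
    }

    // Fires when utcNow is at or past NextUtc, then advances by exactly one interval.
    public bool Tick(DateTime utcNow)
    {
        if (utcNow < NextUtc) return false;
        NextUtc += _interval;
        return true;
    }
}
```

With a start of 1 January and a 7-day interval, a tick on 7 January does nothing and a tick on 8 January fires and moves the schedule to 15 January. If the next tick only arrives on 30 January, it fires and moves the schedule to 22 January, so the following two ticks fire as well before the schedule reaches 5 February. Whether that back-to-back catch-up is intended is not stated in the source; a caller that ticks frequently would rarely hit it.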
81
src/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs
Normal file
@@ -0,0 +1,81 @@
|
|||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Stability;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Demand-aware driver-wedge detector per <c>docs/v2/plan.md</c> decision #147.
|
||||||
|
/// Flips a driver to <see cref="WedgeVerdict.Faulted"/> only when BOTH of the following hold:
|
||||||
|
/// (a) there is pending work outstanding, AND (b) no progress has been observed for longer
|
||||||
|
/// than <see cref="Threshold"/>. Idle drivers, write-only burst drivers, and subscription-only
|
||||||
|
/// drivers whose signals don't arrive regularly all stay Healthy.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// <para>Pending work signal is supplied by the caller via <see cref="DemandSignal"/>:
|
||||||
|
/// non-zero Polly bulkhead depth, ≥1 active MonitoredItem, or ≥1 queued historian read
|
||||||
|
/// each qualifies. The detector itself is state-light: all it remembers is the last
|
||||||
|
/// <c>LastProgressUtc</c> it saw and the last wedge verdict. No history buffer.</para>
|
||||||
|
///
|
||||||
|
/// <para>Default threshold per plan: <c>5 × PublishingInterval</c>, with a minimum of 60 s.
|
||||||
|
/// Concrete values are driver-agnostic and configured per-instance by the caller.</para>
|
||||||
|
/// </remarks>
|
||||||
|
public sealed class WedgeDetector
|
||||||
|
{
|
||||||
|
/// <summary>Wedge-detection threshold; pass < 60 s and the detector clamps to 60 s.</summary>
|
||||||
|
public TimeSpan Threshold { get; }
|
||||||
|
|
||||||
|
/// <summary>Creates a detector with the given threshold, clamped to a minimum of 60 s.</summary>
|
||||||
|
public WedgeDetector(TimeSpan threshold)
|
||||||
|
{
|
||||||
|
Threshold = threshold < TimeSpan.FromSeconds(60) ? TimeSpan.FromSeconds(60) : threshold;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Classify the current state against the demand signal. Does not retain state across
|
||||||
|
/// calls — each call is self-contained; the caller owns the <c>LastProgressUtc</c> clock.
|
||||||
|
/// </summary>
|
||||||
|
public WedgeVerdict Classify(DriverState state, DemandSignal demand, DateTime utcNow)
|
||||||
|
{
|
||||||
|
if (state != DriverState.Healthy)
|
||||||
|
return WedgeVerdict.NotApplicable;
|
||||||
|
|
||||||
|
if (!demand.HasPendingWork)
|
||||||
|
return WedgeVerdict.Idle;
|
||||||
|
|
||||||
|
var sinceProgress = utcNow - demand.LastProgressUtc;
|
||||||
|
return sinceProgress > Threshold ? WedgeVerdict.Faulted : WedgeVerdict.Healthy;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Caller-supplied demand snapshot. All three counters are OR'd — any non-zero means work
|
||||||
|
/// is outstanding, which is the trigger for checking the <see cref="LastProgressUtc"/> clock.
|
||||||
|
/// </summary>
|
||||||
|
/// <param name="BulkheadDepth">Polly bulkhead depth (in-flight capability calls).</param>
|
||||||
|
/// <param name="ActiveMonitoredItems">Number of live OPC UA MonitoredItems bound to this driver.</param>
|
||||||
|
/// <param name="QueuedHistoryReads">Pending historian-read requests the driver owes the server.</param>
|
||||||
|
/// <param name="LastProgressUtc">Last time the driver reported a successful unit of work (read, subscribe-ack, publish).</param>
|
||||||
|
public readonly record struct DemandSignal(
|
||||||
|
int BulkheadDepth,
|
||||||
|
int ActiveMonitoredItems,
|
||||||
|
int QueuedHistoryReads,
|
||||||
|
DateTime LastProgressUtc)
|
||||||
|
{
|
||||||
|
/// <summary>True when any of the three counters is > 0.</summary>
|
||||||
|
public bool HasPendingWork => BulkheadDepth > 0 || ActiveMonitoredItems > 0 || QueuedHistoryReads > 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Outcome of a single <see cref="WedgeDetector.Classify"/> call.</summary>
|
||||||
|
public enum WedgeVerdict
|
||||||
|
{
|
||||||
|
/// <summary>Driver wasn't Healthy to begin with — wedge detection doesn't apply.</summary>
|
||||||
|
NotApplicable,
|
||||||
|
|
||||||
|
/// <summary>Driver claims Healthy + no pending work → stays Healthy.</summary>
|
||||||
|
Idle,
|
||||||
|
|
||||||
|
/// <summary>Driver claims Healthy + has pending work + has made progress within the threshold → stays Healthy.</summary>
|
||||||
|
Healthy,
|
||||||
|
|
||||||
|
/// <summary>Driver claims Healthy + has pending work + has NOT made progress within the threshold → wedged.</summary>
|
||||||
|
Faulted,
|
||||||
|
}
|
||||||
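The decision table in `WedgeDetector.Classify` above can be sketched as a standalone snippet. This is a minimal illustration, not the shipped code: `ClassifySketch` is a hypothetical stand-in that takes plain values instead of `DriverState` and `DemandSignal` and returns the verdict name as a string, so it runs without the `ZB.MOM.WW.OtOpcUa.Core` assemblies.

```csharp
using System;

// Hypothetical stand-in for WedgeDetector.Classify (assumption: plain values replace
// DriverState / DemandSignal so the snippet is self-contained).
static string ClassifySketch(bool driverHealthy, int bulkheadDepth, int activeMonitoredItems,
    int queuedHistoryReads, DateTime lastProgressUtc, DateTime utcNow, TimeSpan requestedThreshold)
{
    // Same 60 s floor the WedgeDetector constructor applies.
    var floor = TimeSpan.FromSeconds(60);
    var effective = requestedThreshold < floor ? floor : requestedThreshold;

    if (!driverHealthy) return "NotApplicable";

    // Any non-zero counter means work is outstanding.
    var hasPendingWork = bulkheadDepth > 0 || activeMonitoredItems > 0 || queuedHistoryReads > 0;
    if (!hasPendingWork) return "Idle";

    return utcNow - lastProgressUtc > effective ? "Faulted" : "Healthy";
}

var now = new DateTime(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);
var configured = TimeSpan.FromSeconds(10); // below the floor, so 60 s is used

// No pending work: idle no matter how old the last progress is.
Console.WriteLine(ClassifySketch(true, 0, 0, 0, now.AddMinutes(-10), now, configured));
// Pending work, progress 30 s ago: inside the 60 s window.
Console.WriteLine(ClassifySketch(true, 1, 0, 0, now.AddSeconds(-30), now, configured));
// Pending work, progress 61 s ago: wedged.
Console.WriteLine(ClassifySketch(true, 0, 5, 0, now.AddSeconds(-61), now, configured));
// Driver not Healthy to begin with: wedge detection does not apply.
Console.WriteLine(ClassifySketch(false, 1, 0, 0, now.AddMinutes(-10), now, configured));
```

Passing `utcNow` in explicitly, as the real method does, keeps the classification deterministic and easy to test without a clock abstraction.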
@@ -16,6 +16,11 @@
|
|||||||
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Configuration\ZB.MOM.WW.OtOpcUa.Configuration.csproj"/>
|
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Configuration\ZB.MOM.WW.OtOpcUa.Configuration.csproj"/>
|
||||||
</ItemGroup>
|
</ItemGroup>
|
||||||
|
|
||||||
|
<ItemGroup>
|
||||||
|
<PackageReference Include="Polly.Core" Version="8.6.6"/>
|
||||||
|
<PackageReference Include="Serilog" Version="4.3.0"/>
|
||||||
|
</ItemGroup>
|
||||||
|
|
||||||
<ItemGroup>
|
<ItemGroup>
|
||||||
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Core.Tests"/>
|
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Core.Tests"/>
|
||||||
</ItemGroup>
|
</ItemGroup>
|
||||||
|
|||||||
@@ -115,7 +115,8 @@ public sealed class ModbusDriver(ModbusDriverOptions options, string driverInsta
|
|||||||
ArrayDim: null,
|
ArrayDim: null,
|
||||||
SecurityClass: t.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly,
|
SecurityClass: t.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly,
|
||||||
IsHistorized: false,
|
IsHistorized: false,
|
||||||
IsAlarm: false));
|
IsAlarm: false,
|
||||||
|
WriteIdempotent: t.WriteIdempotent));
|
||||||
}
|
}
|
||||||
return Task.CompletedTask;
|
return Task.CompletedTask;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -92,6 +92,14 @@ public sealed class ModbusProbeOptions
|
|||||||
/// AutomationDirect DirectLOGIC (DL205/DL260) and a few legacy families pack the first
|
/// AutomationDirect DirectLOGIC (DL205/DL260) and a few legacy families pack the first
|
||||||
/// character in the low byte instead — see <c>docs/v2/dl205.md</c> §strings.
|
/// character in the low byte instead — see <c>docs/v2/dl205.md</c> §strings.
|
||||||
/// </param>
|
/// </param>
|
||||||
|
/// <param name="WriteIdempotent">
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decisions #44, #45, #143 — flag a tag as safe to replay on
|
||||||
|
/// write timeout / failure. Default <c>false</c>; writes do not auto-retry. Safe candidates:
|
||||||
|
/// holding-register set-points for analog values and configuration registers where the same
|
||||||
|
/// value can be written again without side-effects. Unsafe: coils that drive edge-triggered
|
||||||
|
/// actions (pulse outputs), counter-increment addresses on PLCs that treat writes as deltas,
|
||||||
|
/// any BCD / counter register where repeat-writes advance state.
|
||||||
|
/// </param>
|
||||||
public sealed record ModbusTagDefinition(
|
public sealed record ModbusTagDefinition(
|
||||||
string Name,
|
string Name,
|
||||||
ModbusRegion Region,
|
ModbusRegion Region,
|
||||||
@@ -101,7 +109,8 @@ public sealed record ModbusTagDefinition(
|
|||||||
ModbusByteOrder ByteOrder = ModbusByteOrder.BigEndian,
|
ModbusByteOrder ByteOrder = ModbusByteOrder.BigEndian,
|
||||||
byte BitIndex = 0,
|
byte BitIndex = 0,
|
||||||
ushort StringLength = 0,
|
ushort StringLength = 0,
|
||||||
ModbusStringByteOrder StringByteOrder = ModbusStringByteOrder.HighByteFirst);
|
ModbusStringByteOrder StringByteOrder = ModbusStringByteOrder.HighByteFirst,
|
||||||
|
bool WriteIdempotent = false);
|
||||||
|
|
||||||
public enum ModbusRegion { Coils, DiscreteInputs, InputRegisters, HoldingRegisters }
|
public enum ModbusRegion { Coils, DiscreteInputs, InputRegisters, HoldingRegisters }
|
||||||
|
|
||||||
|
|||||||
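The `WriteIdempotent` flag documented above gates whether a failed write may be replayed. The snippet below is a rough sketch of that gating only. `ExecuteWriteSketchAsync`, its `maxRetries` parameter, and the choice of `TimeoutException` as the retryable failure are all assumptions made for illustration; the actual `CapabilityInvoker.ExecuteWriteAsync` is built on Polly and is not shown in this diff.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries a write only when the tag is flagged idempotent.
// Not the project's CapabilityInvoker; a plain loop stands in for the Polly pipeline.
static async Task<T> ExecuteWriteSketchAsync<T>(bool isIdempotent, Func<Task<T>> write, int maxRetries = 3)
{
    // Non-idempotent writes get exactly one attempt.
    var attemptsAllowed = isIdempotent ? 1 + maxRetries : 1;
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await write().ConfigureAwait(false);
        }
        catch (TimeoutException) when (attempt < attemptsAllowed)
        {
            // Safe to replay: loop and try again.
        }
    }
}

var calls = 0;
// Simulated driver write that times out twice, then succeeds with status 0.
Func<Task<int>> flakyWrite = () =>
{
    calls++;
    return calls < 3
        ? Task.FromException<int>(new TimeoutException("simulated write timeout"))
        : Task.FromResult(0);
};

var status = await ExecuteWriteSketchAsync(isIdempotent: true, flakyWrite);
Console.WriteLine($"idempotent tag: status={status}, attempts={calls}");

calls = 0;
try
{
    await ExecuteWriteSketchAsync(isIdempotent: false, flakyWrite);
}
catch (TimeoutException)
{
    Console.WriteLine($"non-idempotent tag: gave up after attempts={calls}");
}
```

With the default `WriteIdempotent = false`, the first timeout surfaces to the caller, which matches the "writes do not auto-retry" wording in the parameter documentation.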
@@ -341,7 +341,8 @@ public sealed class S7Driver(S7DriverOptions options, string driverInstanceId)
|
|||||||
ArrayDim: null,
|
ArrayDim: null,
|
||||||
SecurityClass: t.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly,
|
SecurityClass: t.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly,
|
||||||
IsHistorized: false,
|
IsHistorized: false,
|
||||||
IsAlarm: false));
|
IsAlarm: false,
|
||||||
|
WriteIdempotent: t.WriteIdempotent));
|
||||||
}
|
}
|
||||||
return Task.CompletedTask;
|
return Task.CompletedTask;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -88,12 +88,20 @@ public sealed class S7ProbeOptions
|
|||||||
/// <param name="DataType">Logical data type — drives the underlying S7.Net read/write width.</param>
|
/// <param name="DataType">Logical data type — drives the underlying S7.Net read/write width.</param>
|
||||||
/// <param name="Writable">When true the driver accepts writes for this tag.</param>
|
/// <param name="Writable">When true the driver accepts writes for this tag.</param>
|
||||||
/// <param name="StringLength">For <c>DataType = String</c>: S7-string max length. Default 254 (S7 max).</param>
|
/// <param name="StringLength">For <c>DataType = String</c>: S7-string max length. Default 254 (S7 max).</param>
|
||||||
|
/// <param name="WriteIdempotent">
|
||||||
|
/// Per <c>docs/v2/plan.md</c> decisions #44, #45, #143 — flag a tag as safe to replay on
|
||||||
|
/// write timeout / failure. Default <c>false</c>; writes do not auto-retry. Safe candidates
|
||||||
|
/// on S7: DB word/dword set-points holding analog values, configuration DBs where the same
|
||||||
|
/// value can be written again without side-effects. Unsafe: M (merker) bits or Q (output)
|
||||||
|
/// coils that drive edge-triggered routines in the PLC program.
|
||||||
|
/// </param>
|
||||||
public sealed record S7TagDefinition(
|
public sealed record S7TagDefinition(
|
||||||
string Name,
|
string Name,
|
||||||
string Address,
|
string Address,
|
||||||
S7DataType DataType,
|
S7DataType DataType,
|
||||||
bool Writable = true,
|
bool Writable = true,
|
||||||
int StringLength = 254);
|
int StringLength = 254,
|
||||||
|
bool WriteIdempotent = false);
|
||||||
|
|
||||||
public enum S7DataType
|
public enum S7DataType
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -0,0 +1,181 @@
|
|||||||
|
using System.Net;
|
||||||
|
using System.Text;
|
||||||
|
using System.Text.Json;
|
||||||
|
using Microsoft.Extensions.Logging;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Observability;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Server.Observability;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Standalone <see cref="HttpListener"/> host for <c>/healthz</c> and <c>/readyz</c>
|
||||||
|
/// separate from the OPC UA binding. Per
|
||||||
|
/// <c>docs/v2/implementation/phase-6-1-resilience-and-observability.md</c> §Stream C.1.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// Binds to <c>http://localhost:4841</c> by default — loopback avoids the Windows URL-ACL
|
||||||
|
/// elevation requirement that binding to <c>http://+:4841</c> (wildcard) would impose.
|
||||||
|
/// When a deployment needs remote probing, a reverse proxy or explicit netsh urlacl grant
|
||||||
|
/// is the expected path; documented in <c>docs/v2/Server-Deployment.md</c> in a follow-up.
|
||||||
|
/// </remarks>
|
||||||
|
public sealed class HealthEndpointsHost : IAsyncDisposable
|
||||||
|
{
|
||||||
|
private readonly string _prefix;
|
||||||
|
private readonly DriverHost _driverHost;
|
||||||
|
private readonly Func<bool> _configDbHealthy;
|
||||||
|
private readonly Func<bool> _usingStaleConfig;
|
||||||
|
private readonly ILogger<HealthEndpointsHost> _logger;
|
||||||
|
private readonly HttpListener _listener = new();
|
||||||
|
private readonly DateTime _startedUtc = DateTime.UtcNow;
|
||||||
|
private CancellationTokenSource? _cts;
|
||||||
|
private Task? _acceptLoop;
|
||||||
|
private bool _disposed;
|
||||||
|
|
||||||
|
public HealthEndpointsHost(
|
||||||
|
DriverHost driverHost,
|
||||||
|
ILogger<HealthEndpointsHost> logger,
|
||||||
|
Func<bool>? configDbHealthy = null,
|
||||||
|
Func<bool>? usingStaleConfig = null,
|
||||||
|
string prefix = "http://localhost:4841/")
|
||||||
|
{
|
||||||
|
_driverHost = driverHost;
|
||||||
|
_logger = logger;
|
||||||
|
_configDbHealthy = configDbHealthy ?? (() => true);
|
||||||
|
_usingStaleConfig = usingStaleConfig ?? (() => false);
|
||||||
|
_prefix = prefix.EndsWith('/') ? prefix : prefix + "/";
|
||||||
|
_listener.Prefixes.Add(_prefix);
|
||||||
|
}
|
||||||
|
|
||||||
|
public void Start()
|
||||||
|
{
|
||||||
|
_listener.Start();
|
||||||
|
_cts = new CancellationTokenSource();
|
||||||
|
_acceptLoop = Task.Run(() => AcceptLoopAsync(_cts.Token));
|
||||||
|
_logger.LogInformation("Health endpoints listening on {Prefix}", _prefix);
|
||||||
|
}
|
||||||
|
|
||||||
|
private async Task AcceptLoopAsync(CancellationToken ct)
|
||||||
|
{
|
||||||
|
while (!ct.IsCancellationRequested)
|
||||||
|
{
|
||||||
|
HttpListenerContext ctx;
|
||||||
|
try
|
||||||
|
{
|
||||||
|
ctx = await _listener.GetContextAsync().ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
catch (HttpListenerException) when (ct.IsCancellationRequested) { break; }
|
||||||
|
catch (ObjectDisposedException) { break; }
|
||||||
|
|
||||||
|
_ = Task.Run(() => HandleAsync(ctx), ct);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private async Task HandleAsync(HttpListenerContext ctx)
|
||||||
|
{
|
||||||
|
try
|
||||||
|
{
|
||||||
|
var path = ctx.Request.Url?.AbsolutePath ?? "/";
|
||||||
|
switch (path)
|
||||||
|
{
|
||||||
|
case "/healthz":
|
||||||
|
await WriteHealthzAsync(ctx).ConfigureAwait(false);
|
||||||
|
break;
|
||||||
|
case "/readyz":
|
||||||
|
await WriteReadyzAsync(ctx).ConfigureAwait(false);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
ctx.Response.StatusCode = 404;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
catch (Exception ex)
|
||||||
|
{
|
||||||
|
_logger.LogWarning(ex, "Health endpoint handler failure");
|
||||||
|
try { ctx.Response.StatusCode = 500; } catch { /* ignore */ }
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
try { ctx.Response.Close(); } catch { /* ignore */ }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private async Task WriteHealthzAsync(HttpListenerContext ctx)
|
||||||
|
{
|
||||||
|
var configHealthy = _configDbHealthy();
|
||||||
|
var staleConfig = _usingStaleConfig();
|
||||||
|
// /healthz is 200 when process alive + (config DB reachable OR cache-warm).
|
||||||
|
// Stale-config still serves 200 so the process isn't flagged dead when the DB
|
||||||
|
// blips; the body surfaces the stale flag for operators.
|
||||||
|
var healthy = configHealthy || staleConfig;
|
||||||
|
ctx.Response.StatusCode = healthy ? 200 : 503;
|
||||||
|
|
||||||
|
var body = JsonSerializer.Serialize(new
|
||||||
|
{
|
||||||
|
status = healthy ? "healthy" : "unhealthy",
|
||||||
|
uptimeSeconds = (int)(DateTime.UtcNow - _startedUtc).TotalSeconds,
|
||||||
|
configDbReachable = configHealthy,
|
||||||
|
usingStaleConfig = staleConfig,
|
||||||
|
});
|
||||||
|
await WriteBodyAsync(ctx, body).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
private async Task WriteReadyzAsync(HttpListenerContext ctx)
|
||||||
|
{
|
||||||
|
var snapshots = BuildSnapshots();
|
||||||
|
var verdict = DriverHealthReport.Aggregate(snapshots);
|
||||||
|
ctx.Response.StatusCode = DriverHealthReport.HttpStatus(verdict);
|
||||||
|
|
||||||
|
var body = JsonSerializer.Serialize(new
|
||||||
|
{
|
||||||
|
verdict = verdict.ToString(),
|
||||||
|
uptimeSeconds = (int)(DateTime.UtcNow - _startedUtc).TotalSeconds,
|
||||||
|
drivers = snapshots.Select(d => new
|
||||||
|
{
|
||||||
|
id = d.DriverInstanceId,
|
||||||
|
state = d.State.ToString(),
|
||||||
|
detail = d.DetailMessage,
|
||||||
|
}).ToArray(),
|
||||||
|
degradedDrivers = snapshots
|
||||||
|
.Where(d => d.State == DriverState.Degraded || d.State == DriverState.Reconnecting)
|
||||||
|
.Select(d => d.DriverInstanceId)
|
||||||
|
.ToArray(),
|
||||||
|
});
|
||||||
|
await WriteBodyAsync(ctx, body).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
private IReadOnlyList<DriverHealthSnapshot> BuildSnapshots()
|
||||||
|
{
|
||||||
|
var list = new List<DriverHealthSnapshot>();
|
||||||
|
foreach (var id in _driverHost.RegisteredDriverIds)
|
||||||
|
{
|
||||||
|
var driver = _driverHost.GetDriver(id);
|
||||||
|
if (driver is null) continue;
|
||||||
|
var health = driver.GetHealth();
|
||||||
|
list.Add(new DriverHealthSnapshot(driver.DriverInstanceId, health.State, health.LastError));
|
||||||
|
}
|
||||||
|
return list;
|
||||||
|
}
|
||||||
|
|
||||||
|
private static async Task WriteBodyAsync(HttpListenerContext ctx, string body)
|
||||||
|
{
|
||||||
|
var bytes = Encoding.UTF8.GetBytes(body);
|
||||||
|
ctx.Response.ContentType = "application/json; charset=utf-8";
|
||||||
|
ctx.Response.ContentLength64 = bytes.LongLength;
|
||||||
|
await ctx.Response.OutputStream.WriteAsync(bytes).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
public async ValueTask DisposeAsync()
|
||||||
|
{
|
||||||
|
if (_disposed) return;
|
||||||
|
_disposed = true;
|
||||||
|
_cts?.Cancel();
|
||||||
|
try { _listener.Stop(); } catch { /* ignore */ }
|
||||||
|
if (_acceptLoop is not null)
|
||||||
|
{
|
||||||
|
try { await _acceptLoop.ConfigureAwait(false); } catch { /* ignore */ }
|
||||||
|
}
|
||||||
|
_listener.Close();
|
||||||
|
_cts?.Dispose();
|
||||||
|
}
|
||||||
|
}
|
||||||
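The `/healthz` rule in `WriteHealthzAsync` above (200 when the config DB is reachable or the process is running on cached config, otherwise 503) can be checked in isolation. The sketch below assumes a hypothetical `HealthzSketch` helper that reproduces only the status decision and a reduced body; it omits `uptimeSeconds` and does not open an `HttpListener`.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper mirroring the /healthz status decision; not part of the diff.
static (int Status, string Body) HealthzSketch(bool configDbReachable, bool usingStaleConfig)
{
    // Stale config still counts as healthy so a config-DB blip does not flag the process dead.
    var healthy = configDbReachable || usingStaleConfig;
    var body = JsonSerializer.Serialize(new
    {
        status = healthy ? "healthy" : "unhealthy",
        configDbReachable,
        usingStaleConfig,
    });
    return (healthy ? 200 : 503, body);
}

foreach (var (dbUp, stale) in new[] { (true, false), (false, true), (false, false) })
{
    var (code, json) = HealthzSketch(dbUp, stale);
    Console.WriteLine($"{code} {json}");
}
```

Only the last combination, with the DB unreachable and no warm cache, yields 503; the body carries the stale flag so operators can tell the two 200 cases apart.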
@@ -3,6 +3,7 @@ using Microsoft.Extensions.Logging;
|
|||||||
using Opc.Ua;
|
using Opc.Ua;
|
||||||
using Opc.Ua.Server;
|
using Opc.Ua.Server;
|
||||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Resilience;
|
||||||
using ZB.MOM.WW.OtOpcUa.Server.Security;
|
using ZB.MOM.WW.OtOpcUa.Server.Security;
|
||||||
using DriverWriteRequest = ZB.MOM.WW.OtOpcUa.Core.Abstractions.WriteRequest;
|
using DriverWriteRequest = ZB.MOM.WW.OtOpcUa.Core.Abstractions.WriteRequest;
|
||||||
// Core.Abstractions defines a type-named HistoryReadResult (driver-side samples + continuation
|
// Core.Abstractions defines a type-named HistoryReadResult (driver-side samples + continuation
|
||||||
@@ -33,8 +34,14 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
private readonly IDriver _driver;
|
private readonly IDriver _driver;
|
||||||
private readonly IReadable? _readable;
|
private readonly IReadable? _readable;
|
||||||
private readonly IWritable? _writable;
|
private readonly IWritable? _writable;
|
||||||
|
private readonly CapabilityInvoker _invoker;
|
||||||
private readonly ILogger<DriverNodeManager> _logger;
|
private readonly ILogger<DriverNodeManager> _logger;
|
||||||
|
|
||||||
|
// Per-variable idempotency flag populated during Variable() registration from
|
||||||
|
// DriverAttributeInfo.WriteIdempotent. Drives ExecuteWriteAsync's retry gating in
|
||||||
|
// OnWriteValue; absent entries default to false (decisions #44, #45, #143).
|
||||||
|
private readonly Dictionary<string, bool> _writeIdempotentByFullRef = new(StringComparer.OrdinalIgnoreCase);
|
||||||
|
|
||||||
/// <summary>The driver whose address space this node manager exposes.</summary>
|
/// <summary>The driver whose address space this node manager exposes.</summary>
|
||||||
public IDriver Driver => _driver;
|
public IDriver Driver => _driver;
|
||||||
|
|
||||||
@@ -53,12 +60,13 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
private FolderState _currentFolder = null!;
|
private FolderState _currentFolder = null!;
|
||||||
|
|
||||||
public DriverNodeManager(IServerInternal server, ApplicationConfiguration configuration,
|
public DriverNodeManager(IServerInternal server, ApplicationConfiguration configuration,
|
||||||
IDriver driver, ILogger<DriverNodeManager> logger)
|
IDriver driver, CapabilityInvoker invoker, ILogger<DriverNodeManager> logger)
|
||||||
: base(server, configuration, namespaceUris: $"urn:OtOpcUa:{driver.DriverInstanceId}")
|
: base(server, configuration, namespaceUris: $"urn:OtOpcUa:{driver.DriverInstanceId}")
|
||||||
{
|
{
|
||||||
_driver = driver;
|
_driver = driver;
|
||||||
_readable = driver as IReadable;
|
_readable = driver as IReadable;
|
||||||
_writable = driver as IWritable;
|
_writable = driver as IWritable;
|
||||||
|
_invoker = invoker;
|
||||||
_logger = logger;
|
_logger = logger;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -148,6 +156,7 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
AddPredefinedNode(SystemContext, v);
|
AddPredefinedNode(SystemContext, v);
|
||||||
_variablesByFullRef[attributeInfo.FullName] = v;
|
_variablesByFullRef[attributeInfo.FullName] = v;
|
||||||
_securityByFullRef[attributeInfo.FullName] = attributeInfo.SecurityClass;
|
_securityByFullRef[attributeInfo.FullName] = attributeInfo.SecurityClass;
|
||||||
|
_writeIdempotentByFullRef[attributeInfo.FullName] = attributeInfo.WriteIdempotent;
|
||||||
|
|
||||||
v.OnReadValue = OnReadValue;
|
v.OnReadValue = OnReadValue;
|
||||||
v.OnWriteValue = OnWriteValue;
|
v.OnWriteValue = OnWriteValue;
|
||||||
@@ -188,7 +197,11 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
try
|
try
|
||||||
{
|
{
|
||||||
var fullRef = node.NodeId.Identifier as string ?? "";
|
var fullRef = node.NodeId.Identifier as string ?? "";
|
||||||
var result = _readable.ReadAsync([fullRef], CancellationToken.None).GetAwaiter().GetResult();
|
var result = _invoker.ExecuteAsync(
|
||||||
|
DriverCapability.Read,
|
||||||
|
_driver.DriverInstanceId,
|
||||||
|
async ct => (IReadOnlyList<DataValueSnapshot>)await _readable.ReadAsync([fullRef], ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
if (result.Count == 0)
|
if (result.Count == 0)
|
||||||
{
|
{
|
||||||
statusCode = StatusCodes.BadNoData;
|
statusCode = StatusCodes.BadNoData;
|
||||||
@@ -381,9 +394,15 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var results = _writable.WriteAsync(
|
var isIdempotent = _writeIdempotentByFullRef.GetValueOrDefault(fullRef!, false);
|
||||||
[new DriverWriteRequest(fullRef!, value)],
|
var capturedValue = value;
|
||||||
CancellationToken.None).GetAwaiter().GetResult();
|
var results = _invoker.ExecuteWriteAsync(
|
||||||
|
_driver.DriverInstanceId,
|
||||||
|
isIdempotent,
|
||||||
|
async ct => (IReadOnlyList<WriteResult>)await _writable.WriteAsync(
|
||||||
|
[new DriverWriteRequest(fullRef!, capturedValue)],
|
||||||
|
ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
if (results.Count > 0 && results[0].StatusCode != 0)
|
if (results.Count > 0 && results[0].StatusCode != 0)
|
||||||
{
|
{
|
||||||
statusCode = results[0].StatusCode;
|
statusCode = results[0].StatusCode;
|
||||||
@@ -465,12 +484,16 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var driverResult = History.ReadRawAsync(
|
var driverResult = _invoker.ExecuteAsync(
|
||||||
fullRef,
|
DriverCapability.HistoryRead,
|
||||||
details.StartTime,
|
_driver.DriverInstanceId,
|
||||||
details.EndTime,
|
async ct => await History.ReadRawAsync(
|
||||||
details.NumValuesPerNode,
|
fullRef,
|
||||||
CancellationToken.None).GetAwaiter().GetResult();
|
details.StartTime,
|
||||||
|
details.EndTime,
|
||||||
|
details.NumValuesPerNode,
|
||||||
|
ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
|
|
||||||
WriteResult(results, errors, i, StatusCodes.Good,
|
WriteResult(results, errors, i, StatusCodes.Good,
|
||||||
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
||||||
@@ -525,13 +548,17 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var driverResult = History.ReadProcessedAsync(
|
var driverResult = _invoker.ExecuteAsync(
|
||||||
fullRef,
|
DriverCapability.HistoryRead,
|
||||||
details.StartTime,
|
_driver.DriverInstanceId,
|
||||||
details.EndTime,
|
async ct => await History.ReadProcessedAsync(
|
||||||
interval,
|
fullRef,
|
||||||
aggregate.Value,
|
details.StartTime,
|
||||||
CancellationToken.None).GetAwaiter().GetResult();
|
details.EndTime,
|
||||||
|
interval,
|
||||||
|
aggregate.Value,
|
||||||
|
ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
|
|
||||||
WriteResult(results, errors, i, StatusCodes.Good,
|
WriteResult(results, errors, i, StatusCodes.Good,
|
||||||
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
||||||
@@ -578,8 +605,11 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var driverResult = History.ReadAtTimeAsync(
|
var driverResult = _invoker.ExecuteAsync(
|
||||||
fullRef, requestedTimes, CancellationToken.None).GetAwaiter().GetResult();
|
DriverCapability.HistoryRead,
|
||||||
|
_driver.DriverInstanceId,
|
||||||
|
async ct => await History.ReadAtTimeAsync(fullRef, requestedTimes, ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
|
|
||||||
WriteResult(results, errors, i, StatusCodes.Good,
|
WriteResult(results, errors, i, StatusCodes.Good,
|
||||||
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
BuildHistoryData(driverResult.Samples), driverResult.ContinuationPoint);
|
||||||
@@ -632,12 +662,16 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var driverResult = History.ReadEventsAsync(
|
var driverResult = _invoker.ExecuteAsync(
|
||||||
sourceName: fullRef,
|
DriverCapability.HistoryRead,
|
||||||
startUtc: details.StartTime,
|
_driver.DriverInstanceId,
|
||||||
endUtc: details.EndTime,
|
async ct => await History.ReadEventsAsync(
|
||||||
maxEvents: maxEvents,
|
sourceName: fullRef,
|
||||||
cancellationToken: CancellationToken.None).GetAwaiter().GetResult();
|
startUtc: details.StartTime,
|
||||||
|
endUtc: details.EndTime,
|
||||||
|
maxEvents: maxEvents,
|
||||||
|
cancellationToken: ct).ConfigureAwait(false),
|
||||||
|
CancellationToken.None).AsTask().GetAwaiter().GetResult();
|
||||||
|
|
||||||
WriteResult(results, errors, i, StatusCodes.Good,
|
WriteResult(results, errors, i, StatusCodes.Good,
|
||||||
BuildHistoryEvent(driverResult.Events), driverResult.ContinuationPoint);
|
BuildHistoryEvent(driverResult.Events), driverResult.ContinuationPoint);
|
||||||
|
|||||||
@@ -3,6 +3,8 @@ using Opc.Ua;
|
|||||||
using Opc.Ua.Configuration;
|
using Opc.Ua.Configuration;
|
||||||
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
|
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
|
||||||
using ZB.MOM.WW.OtOpcUa.Core.OpcUa;
|
using ZB.MOM.WW.OtOpcUa.Core.OpcUa;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Resilience;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Server.Observability;
|
||||||
using ZB.MOM.WW.OtOpcUa.Server.Security;
|
using ZB.MOM.WW.OtOpcUa.Server.Security;
|
||||||
|
|
||||||
namespace ZB.MOM.WW.OtOpcUa.Server.OpcUa;
|
namespace ZB.MOM.WW.OtOpcUa.Server.OpcUa;
|
||||||
@@ -20,18 +22,22 @@ public sealed class OpcUaApplicationHost : IAsyncDisposable
|
|||||||
private readonly OpcUaServerOptions _options;
|
private readonly OpcUaServerOptions _options;
|
||||||
private readonly DriverHost _driverHost;
|
private readonly DriverHost _driverHost;
|
||||||
private readonly IUserAuthenticator _authenticator;
|
private readonly IUserAuthenticator _authenticator;
|
||||||
|
private readonly DriverResiliencePipelineBuilder _pipelineBuilder;
|
||||||
private readonly ILoggerFactory _loggerFactory;
|
private readonly ILoggerFactory _loggerFactory;
|
||||||
private readonly ILogger<OpcUaApplicationHost> _logger;
|
private readonly ILogger<OpcUaApplicationHost> _logger;
|
||||||
private ApplicationInstance? _application;
|
private ApplicationInstance? _application;
|
||||||
private OtOpcUaServer? _server;
|
private OtOpcUaServer? _server;
|
||||||
|
private HealthEndpointsHost? _healthHost;
|
||||||
private bool _disposed;
|
private bool _disposed;
|
||||||
|
|
||||||
public OpcUaApplicationHost(OpcUaServerOptions options, DriverHost driverHost,
|
public OpcUaApplicationHost(OpcUaServerOptions options, DriverHost driverHost,
|
||||||
IUserAuthenticator authenticator, ILoggerFactory loggerFactory, ILogger<OpcUaApplicationHost> logger)
|
IUserAuthenticator authenticator, ILoggerFactory loggerFactory, ILogger<OpcUaApplicationHost> logger,
|
||||||
|
DriverResiliencePipelineBuilder? pipelineBuilder = null)
|
||||||
{
|
{
|
||||||
_options = options;
|
_options = options;
|
||||||
_driverHost = driverHost;
|
_driverHost = driverHost;
|
||||||
_authenticator = authenticator;
|
_authenticator = authenticator;
|
||||||
|
_pipelineBuilder = pipelineBuilder ?? new DriverResiliencePipelineBuilder();
|
||||||
_loggerFactory = loggerFactory;
|
_loggerFactory = loggerFactory;
|
||||||
_logger = logger;
|
_logger = logger;
|
||||||
}
|
}
|
||||||
@@ -58,12 +64,23 @@ public sealed class OpcUaApplicationHost : IAsyncDisposable
|
|||||||
throw new InvalidOperationException(
|
throw new InvalidOperationException(
|
||||||
$"OPC UA application certificate could not be validated or created in {_options.PkiStoreRoot}");
|
$"OPC UA application certificate could not be validated or created in {_options.PkiStoreRoot}");
|
||||||
|
|
||||||
_server = new OtOpcUaServer(_driverHost, _authenticator, _loggerFactory);
|
_server = new OtOpcUaServer(_driverHost, _authenticator, _pipelineBuilder, _loggerFactory);
|
||||||
await _application.Start(_server).ConfigureAwait(false);
|
await _application.Start(_server).ConfigureAwait(false);
|
||||||
|
|
||||||
_logger.LogInformation("OPC UA server started — endpoint={Endpoint} driverCount={Count}",
|
_logger.LogInformation("OPC UA server started — endpoint={Endpoint} driverCount={Count}",
|
||||||
_options.EndpointUrl, _server.DriverNodeManagers.Count);
|
_options.EndpointUrl, _server.DriverNodeManagers.Count);
|
||||||
|
|
||||||
|
// Phase 6.1 Stream C: health endpoints on :4841 (loopback by default — see
|
||||||
|
// HealthEndpointsHost remarks for the Windows URL-ACL tradeoff).
|
||||||
|
if (_options.HealthEndpointsEnabled)
|
||||||
|
{
|
||||||
|
_healthHost = new HealthEndpointsHost(
|
||||||
|
_driverHost,
|
||||||
|
_loggerFactory.CreateLogger<HealthEndpointsHost>(),
|
||||||
|
prefix: _options.HealthEndpointsPrefix);
|
||||||
|
_healthHost.Start();
|
||||||
|
}
|
||||||
|
|
||||||
// Drive each driver's discovery through its node manager. The node manager IS the
|
// Drive each driver's discovery through its node manager. The node manager IS the
|
||||||
// IAddressSpaceBuilder; GenericDriverNodeManager captures alarm-condition sinks into
|
// IAddressSpaceBuilder; GenericDriverNodeManager captures alarm-condition sinks into
|
||||||
// its internal map and wires OnAlarmEvent → sink routing.
|
// its internal map and wires OnAlarmEvent → sink routing.
|
||||||
@@ -217,6 +234,12 @@ public sealed class OpcUaApplicationHost : IAsyncDisposable
|
|||||||
{
|
{
|
||||||
_logger.LogWarning(ex, "OPC UA server stop threw during dispose");
|
_logger.LogWarning(ex, "OPC UA server stop threw during dispose");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (_healthHost is not null)
|
||||||
|
{
|
||||||
|
try { await _healthHost.DisposeAsync().ConfigureAwait(false); }
|
||||||
|
catch (Exception ex) { _logger.LogWarning(ex, "Health endpoints host dispose threw"); }
|
||||||
|
}
|
||||||
await Task.CompletedTask;
|
await Task.CompletedTask;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -58,6 +58,20 @@ public sealed class OpcUaServerOptions
     /// </summary>
     public bool AutoAcceptUntrustedClientCertificates { get; init; } = true;
 
+    /// <summary>
+    /// Whether to start the Phase 6.1 Stream C <c>/healthz</c> + <c>/readyz</c> HTTP listener.
+    /// Defaults to <c>true</c>; set false in embedded deployments that don't need HTTP
+    /// (e.g. tests that only exercise the OPC UA surface).
+    /// </summary>
+    public bool HealthEndpointsEnabled { get; init; } = true;
+
+    /// <summary>
+    /// URL prefix the health endpoints bind to. Default <c>http://localhost:4841/</c> — loopback
+    /// avoids Windows URL-ACL elevation. Production deployments that need remote probing should
+    /// either reverse-proxy or use <c>http://+:4841/</c> with netsh urlacl granted.
+    /// </summary>
+    public string HealthEndpointsPrefix { get; init; } = "http://localhost:4841/";
+
     /// <summary>
     /// Security profile advertised on the endpoint. Default <see cref="OpcUaSecurityProfile.None"/>
     /// preserves the PR 17 endpoint shape; set to <see cref="OpcUaSecurityProfile.Basic256Sha256SignAndEncrypt"/>
@@ -5,6 +5,7 @@ using Opc.Ua.Server;
 using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
 using ZB.MOM.WW.OtOpcUa.Core.Hosting;
 using ZB.MOM.WW.OtOpcUa.Core.OpcUa;
+using ZB.MOM.WW.OtOpcUa.Core.Resilience;
 using ZB.MOM.WW.OtOpcUa.Server.Security;
 
 namespace ZB.MOM.WW.OtOpcUa.Server.OpcUa;
@@ -19,13 +20,19 @@ public sealed class OtOpcUaServer : StandardServer
 {
     private readonly DriverHost _driverHost;
     private readonly IUserAuthenticator _authenticator;
+    private readonly DriverResiliencePipelineBuilder _pipelineBuilder;
     private readonly ILoggerFactory _loggerFactory;
     private readonly List<DriverNodeManager> _driverNodeManagers = new();
 
-    public OtOpcUaServer(DriverHost driverHost, IUserAuthenticator authenticator, ILoggerFactory loggerFactory)
+    public OtOpcUaServer(
+        DriverHost driverHost,
+        IUserAuthenticator authenticator,
+        DriverResiliencePipelineBuilder pipelineBuilder,
+        ILoggerFactory loggerFactory)
     {
         _driverHost = driverHost;
         _authenticator = authenticator;
+        _pipelineBuilder = pipelineBuilder;
         _loggerFactory = loggerFactory;
     }
 
@@ -46,7 +53,12 @@ public sealed class OtOpcUaServer : StandardServer
             if (driver is null) continue;
 
             var logger = _loggerFactory.CreateLogger<DriverNodeManager>();
-            var manager = new DriverNodeManager(server, configuration, driver, logger);
+            // Per-driver resilience options: default Tier A pending Stream B.1 which wires
+            // per-type tiers into DriverTypeRegistry. Read ResilienceConfig JSON from the
+            // DriverInstance row in a follow-up PR; for now every driver gets Tier A defaults.
+            var options = new DriverResilienceOptions { Tier = DriverTier.A };
+            var invoker = new CapabilityInvoker(_pipelineBuilder, driver.DriverInstanceId, () => options, driver.DriverType);
+            var manager = new DriverNodeManager(server, configuration, driver, invoker, logger);
             _driverNodeManagers.Add(manager);
         }
 
@@ -4,6 +4,7 @@ using Microsoft.Extensions.DependencyInjection;
 using Microsoft.Extensions.Hosting;
 using Microsoft.Extensions.Logging;
 using Serilog;
+using Serilog.Formatting.Compact;
 using ZB.MOM.WW.OtOpcUa.Configuration;
 using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
 using ZB.MOM.WW.OtOpcUa.Core.Hosting;
@@ -13,11 +14,25 @@ using ZB.MOM.WW.OtOpcUa.Server.Security;
 
 var builder = Host.CreateApplicationBuilder(args);
 
-Log.Logger = new LoggerConfiguration()
+// Per Phase 6.1 Stream C.3: SIEMs (Splunk, Datadog) ingest the JSON file without a
+// regex parser. Plain-text rolling file stays on by default for human readability;
+// JSON file is opt-in via appsetting `Serilog:WriteJson = true`.
+var writeJson = builder.Configuration.GetValue<bool>("Serilog:WriteJson");
+var loggerBuilder = new LoggerConfiguration()
     .ReadFrom.Configuration(builder.Configuration)
+    .Enrich.FromLogContext()
     .WriteTo.Console()
-    .WriteTo.File("logs/otopcua-.log", rollingInterval: RollingInterval.Day)
-    .CreateLogger();
+    .WriteTo.File("logs/otopcua-.log", rollingInterval: RollingInterval.Day);
+
+if (writeJson)
+{
+    loggerBuilder = loggerBuilder.WriteTo.File(
+        new CompactJsonFormatter(),
+        "logs/otopcua-.json.log",
+        rollingInterval: RollingInterval.Day);
+}
+
+Log.Logger = loggerBuilder.CreateLogger();
 
 builder.Services.AddSerilog();
 builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");
86
src/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs
Normal file
@@ -0,0 +1,86 @@
+using Opc.Ua;
+using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
+using ZB.MOM.WW.OtOpcUa.Core.Authorization;
+
+namespace ZB.MOM.WW.OtOpcUa.Server.Security;
+
+/// <summary>
+/// Bridges the OPC UA stack's <see cref="ISystemContext.UserIdentity"/> to the
+/// <see cref="IPermissionEvaluator"/> evaluator. Resolves the session's
+/// <see cref="UserAuthorizationState"/> from whatever the identity claims + the stack's
+/// session handle, then delegates to the evaluator and returns a single bool the
+/// dispatch paths can use to short-circuit with <c>BadUserAccessDenied</c>.
+/// </summary>
+/// <remarks>
+/// <para>This class is deliberately the single integration seam between the Server
+/// project and the <c>Core.Authorization</c> evaluator. DriverNodeManager holds one
+/// reference and calls <see cref="IsAllowed"/> on every Read / Write / HistoryRead /
+/// Browse / Call / CreateMonitoredItems / etc. The evaluator itself stays pure — it
+/// doesn't know about the OPC UA stack types.</para>
+///
+/// <para>Fail-open-during-transition: when the evaluator is configured with
+/// <c>StrictMode = false</c>, missing cluster tries OR sessions without resolved
+/// LDAP groups get <c>true</c> so existing deployments keep working while ACLs are
+/// populated. Flip to strict via <c>Authorization:StrictMode = true</c> in production.</para>
+/// </remarks>
+public sealed class AuthorizationGate
+{
+    private readonly IPermissionEvaluator _evaluator;
+    private readonly bool _strictMode;
+    private readonly TimeProvider _timeProvider;
+
+    public AuthorizationGate(IPermissionEvaluator evaluator, bool strictMode = false, TimeProvider? timeProvider = null)
+    {
+        _evaluator = evaluator ?? throw new ArgumentNullException(nameof(evaluator));
+        _strictMode = strictMode;
+        _timeProvider = timeProvider ?? TimeProvider.System;
+    }
+
+    /// <summary>True when strict authorization is enabled — no-grant = denied.</summary>
+    public bool StrictMode => _strictMode;
+
+    /// <summary>
+    /// Authorize an OPC UA operation against the session identity + scope. Returns true to
+    /// allow the dispatch to continue; false to surface <c>BadUserAccessDenied</c>.
+    /// </summary>
+    public bool IsAllowed(IUserIdentity? identity, OpcUaOperation operation, NodeScope scope)
+    {
+        // Anonymous / unknown identity — strict mode denies, lax mode allows so the fallback
+        // auth layers (WriteAuthzPolicy) still see the call.
+        if (identity is null) return !_strictMode;
+
+        var session = BuildSessionState(identity, scope.ClusterId);
+        if (session is null)
+        {
+            // Identity doesn't carry LDAP groups. In lax mode let the dispatch proceed so
+            // older deployments keep working; strict mode denies.
+            return !_strictMode;
+        }
+
+        var decision = _evaluator.Authorize(session, operation, scope);
+        if (decision.IsAllowed) return true;
+
+        return !_strictMode;
+    }
+
+    /// <summary>
+    /// Materialize a <see cref="UserAuthorizationState"/> from the session identity.
+    /// Returns null when the identity doesn't carry LDAP group metadata.
+    /// </summary>
+    public UserAuthorizationState? BuildSessionState(IUserIdentity identity, string clusterId)
+    {
+        if (identity is not ILdapGroupsBearer bearer || bearer.LdapGroups.Count == 0)
+            return null;
+
+        var sessionId = identity.DisplayName ?? Guid.NewGuid().ToString("N");
+        return new UserAuthorizationState
+        {
+            SessionId = sessionId,
+            ClusterId = clusterId,
+            LdapGroups = bearer.LdapGroups,
+            MembershipResolvedUtc = _timeProvider.GetUtcNow().UtcDateTime,
+            AuthGenerationId = 0,
+            MembershipVersion = 0,
+        };
+    }
+}
20
src/ZB.MOM.WW.OtOpcUa.Server/Security/ILdapGroupsBearer.cs
Normal file
@@ -0,0 +1,20 @@
+namespace ZB.MOM.WW.OtOpcUa.Server.Security;
+
+/// <summary>
+/// Minimal interface an <see cref="Opc.Ua.IUserIdentity"/> exposes so the Phase 6.2
+/// authorization evaluator can read the session's resolved LDAP group DNs without a
+/// hard dependency on any specific identity subtype. Implemented by OtOpcUaServer's
+/// role-based identity; tests stub it to drive the evaluator under different group
+/// memberships.
+/// </summary>
+/// <remarks>
+/// Control/data-plane separation (decision #150): Admin UI role routing consumes
+/// <see cref="IRoleBearer.Roles"/> via <c>LdapGroupRoleMapping</c>; the OPC UA data-path
+/// evaluator consumes <see cref="LdapGroups"/> directly against <c>NodeAcl</c>. The two
+/// are sourced from the same directory query at sign-in but never cross.
+/// </remarks>
+public interface ILdapGroupsBearer
+{
+    /// <summary>Fully-qualified LDAP group DNs the user is a member of.</summary>
+    IReadOnlyList<string> LdapGroups { get; }
+}
@@ -21,6 +21,7 @@
     <PackageReference Include="Serilog.Settings.Configuration" Version="9.0.0"/>
     <PackageReference Include="Serilog.Sinks.Console" Version="6.0.0"/>
     <PackageReference Include="Serilog.Sinks.File" Version="7.0.0"/>
+    <PackageReference Include="Serilog.Formatting.Compact" Version="3.0.0"/>
     <PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Server" Version="1.5.374.126"/>
     <PackageReference Include="OPCFoundation.NetStandard.Opc.Ua.Configuration" Version="1.5.374.126"/>
     <PackageReference Include="Novell.Directory.Ldap.NETStandard" Version="3.6.0"/>
@@ -0,0 +1,146 @@
+using Microsoft.EntityFrameworkCore;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Admin.Services;
+using ZB.MOM.WW.OtOpcUa.Configuration;
+using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
+
+namespace ZB.MOM.WW.OtOpcUa.Admin.Tests;
+
+[Trait("Category", "Unit")]
+public sealed class ValidatedNodeAclAuthoringServiceTests : IDisposable
+{
+    private readonly OtOpcUaConfigDbContext _db;
+
+    public ValidatedNodeAclAuthoringServiceTests()
+    {
+        var options = new DbContextOptionsBuilder<OtOpcUaConfigDbContext>()
+            .UseInMemoryDatabase($"val-nodeacl-{Guid.NewGuid():N}")
+            .Options;
+        _db = new OtOpcUaConfigDbContext(options);
+    }
+
+    public void Dispose() => _db.Dispose();
+
+    [Fact]
+    public async Task Grant_Rejects_NonePermissions()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(() => svc.GrantAsync(
+            draftGenerationId: 1, clusterId: "c1", ldapGroup: "cn=ops",
+            scopeKind: NodeAclScopeKind.Cluster, scopeId: null,
+            permissions: NodePermissions.None, notes: null, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Grant_Rejects_ClusterScope_With_ScopeId()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(() => svc.GrantAsync(
+            1, "c1", "cn=ops",
+            NodeAclScopeKind.Cluster, scopeId: "not-null-wrong",
+            NodePermissions.Read, null, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Grant_Rejects_SubClusterScope_Without_ScopeId()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(() => svc.GrantAsync(
+            1, "c1", "cn=ops",
+            NodeAclScopeKind.Equipment, scopeId: null,
+            NodePermissions.Read, null, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Grant_Succeeds_When_Valid()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+
+        var row = await svc.GrantAsync(
+            1, "c1", "cn=ops",
+            NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read | NodePermissions.Browse, "fleet reader", CancellationToken.None);
+
+        row.LdapGroup.ShouldBe("cn=ops");
+        row.PermissionFlags.ShouldBe(NodePermissions.Read | NodePermissions.Browse);
+        row.NodeAclId.ShouldNotBeNullOrWhiteSpace();
+    }
+
+    [Fact]
+    public async Task Grant_Rejects_DuplicateScopeGroup_Pair()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+        await svc.GrantAsync(1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(() => svc.GrantAsync(
+            1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.WriteOperate, null, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Grant_SameGroup_DifferentScope_IsAllowed()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+        await svc.GrantAsync(1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        var tagRow = await svc.GrantAsync(1, "c1", "cn=ops",
+            NodeAclScopeKind.Tag, scopeId: "tag-xyz",
+            NodePermissions.WriteOperate, null, CancellationToken.None);
+
+        tagRow.ScopeKind.ShouldBe(NodeAclScopeKind.Tag);
+    }
+
+    [Fact]
+    public async Task Grant_SameGroupScope_DifferentDraft_IsAllowed()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+        await svc.GrantAsync(1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        var draft2Row = await svc.GrantAsync(2, "c1", "cn=ops",
+            NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        draft2Row.GenerationId.ShouldBe(2);
+    }
+
+    [Fact]
+    public async Task UpdatePermissions_Rejects_None()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+        var row = await svc.GrantAsync(1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(
+            () => svc.UpdatePermissionsAsync(row.NodeAclRowId, NodePermissions.None, null, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task UpdatePermissions_RoundTrips_NewFlags()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+        var row = await svc.GrantAsync(1, "c1", "cn=ops", NodeAclScopeKind.Cluster, null,
+            NodePermissions.Read, null, CancellationToken.None);
+
+        var updated = await svc.UpdatePermissionsAsync(row.NodeAclRowId,
+            NodePermissions.Read | NodePermissions.WriteOperate, "bumped", CancellationToken.None);
+
+        updated.PermissionFlags.ShouldBe(NodePermissions.Read | NodePermissions.WriteOperate);
+        updated.Notes.ShouldBe("bumped");
+    }
+
+    [Fact]
+    public async Task UpdatePermissions_MissingRow_Throws()
+    {
+        var svc = new ValidatedNodeAclAuthoringService(_db);
+
+        await Should.ThrowAsync<InvalidNodeAclGrantException>(
+            () => svc.UpdatePermissionsAsync(Guid.NewGuid(), NodePermissions.Read, null, CancellationToken.None));
+    }
+}
@@ -22,6 +22,7 @@
   <ItemGroup>
     <ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Admin\ZB.MOM.WW.OtOpcUa.Admin.csproj"/>
     <PackageReference Include="Microsoft.Data.SqlClient" Version="6.1.1"/>
+    <PackageReference Include="Microsoft.EntityFrameworkCore.InMemory" Version="10.0.0"/>
   </ItemGroup>
 
   <ItemGroup>
@@ -0,0 +1,157 @@
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
+
+namespace ZB.MOM.WW.OtOpcUa.Configuration.Tests;
+
+[Trait("Category", "Unit")]
+public sealed class GenerationSealedCacheTests : IDisposable
+{
+    private readonly string _root = Path.Combine(Path.GetTempPath(), $"otopcua-sealed-{Guid.NewGuid():N}");
+
+    public void Dispose()
+    {
+        try
+        {
+            if (!Directory.Exists(_root)) return;
+            // Remove ReadOnly attribute first so Directory.Delete can clean sealed files.
+            foreach (var f in Directory.EnumerateFiles(_root, "*", SearchOption.AllDirectories))
+                File.SetAttributes(f, FileAttributes.Normal);
+            Directory.Delete(_root, recursive: true);
+        }
+        catch { /* best-effort cleanup */ }
+    }
+
+    private GenerationSnapshot MakeSnapshot(string clusterId, long generationId, string payload = "{\"sample\":true}") =>
+        new()
+        {
+            ClusterId = clusterId,
+            GenerationId = generationId,
+            CachedAt = DateTime.UtcNow,
+            PayloadJson = payload,
+        };
+
+    [Fact]
+    public async Task FirstBoot_NoSnapshot_ReadThrows()
+    {
+        var cache = new GenerationSealedCache(_root);
+
+        await Should.ThrowAsync<GenerationCacheUnavailableException>(
+            () => cache.ReadCurrentAsync("cluster-a"));
+    }
+
+    [Fact]
+    public async Task SealThenRead_RoundTrips()
+    {
+        var cache = new GenerationSealedCache(_root);
+        var snapshot = MakeSnapshot("cluster-a", 42, "{\"hello\":\"world\"}");
+
+        await cache.SealAsync(snapshot);
+
+        var read = await cache.ReadCurrentAsync("cluster-a");
+        read.GenerationId.ShouldBe(42);
+        read.ClusterId.ShouldBe("cluster-a");
+        read.PayloadJson.ShouldBe("{\"hello\":\"world\"}");
+    }
+
+    [Fact]
+    public async Task SealedFile_IsReadOnly_OnDisk()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 5));
+
+        var sealedPath = Path.Combine(_root, "cluster-a", "5.db");
+        File.Exists(sealedPath).ShouldBeTrue();
+        var attrs = File.GetAttributes(sealedPath);
+        attrs.HasFlag(FileAttributes.ReadOnly).ShouldBeTrue("sealed file must be read-only");
+    }
+
+    [Fact]
+    public async Task SealingTwoGenerations_PointerAdvances_ToLatest()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 1));
+        await cache.SealAsync(MakeSnapshot("cluster-a", 2));
+
+        cache.TryGetCurrentGenerationId("cluster-a").ShouldBe(2);
+        var read = await cache.ReadCurrentAsync("cluster-a");
+        read.GenerationId.ShouldBe(2);
+    }
+
+    [Fact]
+    public async Task PriorGenerationFile_Survives_AfterNewSeal()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 1));
+        await cache.SealAsync(MakeSnapshot("cluster-a", 2));
+
+        File.Exists(Path.Combine(_root, "cluster-a", "1.db")).ShouldBeTrue(
+            "prior generations preserved for audit; pruning is separate");
+        File.Exists(Path.Combine(_root, "cluster-a", "2.db")).ShouldBeTrue();
+    }
+
+    [Fact]
+    public async Task CorruptSealedFile_ReadFailsClosed()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 7));
+
+        // Corrupt the sealed file: clear read-only, truncate, leave pointer intact.
+        var sealedPath = Path.Combine(_root, "cluster-a", "7.db");
+        File.SetAttributes(sealedPath, FileAttributes.Normal);
+        File.WriteAllBytes(sealedPath, [0x00, 0x01, 0x02]);
+
+        await Should.ThrowAsync<GenerationCacheUnavailableException>(
+            () => cache.ReadCurrentAsync("cluster-a"));
+    }
+
+    [Fact]
+    public async Task MissingSealedFile_ReadFailsClosed()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 3));
+
+        // Delete the sealed file but leave the pointer — corruption scenario.
+        var sealedPath = Path.Combine(_root, "cluster-a", "3.db");
+        File.SetAttributes(sealedPath, FileAttributes.Normal);
+        File.Delete(sealedPath);
+
+        await Should.ThrowAsync<GenerationCacheUnavailableException>(
+            () => cache.ReadCurrentAsync("cluster-a"));
+    }
+
+    [Fact]
+    public async Task CorruptPointerFile_ReadFailsClosed()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 9));
+
+        var pointerPath = Path.Combine(_root, "cluster-a", "CURRENT");
+        File.WriteAllText(pointerPath, "not-a-number");
+
+        await Should.ThrowAsync<GenerationCacheUnavailableException>(
+            () => cache.ReadCurrentAsync("cluster-a"));
+    }
+
+    [Fact]
+    public async Task SealSameGenerationTwice_IsIdempotent()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 11));
+        await cache.SealAsync(MakeSnapshot("cluster-a", 11, "{\"v\":2}"));
+
+        var read = await cache.ReadCurrentAsync("cluster-a");
+        read.PayloadJson.ShouldBe("{\"sample\":true}", "sealed file is immutable; second seal no-ops");
+    }
+
+    [Fact]
+    public async Task IndependentClusters_DoNotInterfere()
+    {
+        var cache = new GenerationSealedCache(_root);
+        await cache.SealAsync(MakeSnapshot("cluster-a", 1));
+        await cache.SealAsync(MakeSnapshot("cluster-b", 10));
+
+        (await cache.ReadCurrentAsync("cluster-a")).GenerationId.ShouldBe(1);
+        (await cache.ReadCurrentAsync("cluster-b")).GenerationId.ShouldBe(10);
+    }
+}
@@ -0,0 +1,138 @@
+using Microsoft.EntityFrameworkCore;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
+using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
+using ZB.MOM.WW.OtOpcUa.Configuration.Services;
+
+namespace ZB.MOM.WW.OtOpcUa.Configuration.Tests;
+
+[Trait("Category", "Unit")]
+public sealed class LdapGroupRoleMappingServiceTests : IDisposable
+{
+    private readonly OtOpcUaConfigDbContext _db;
+
+    public LdapGroupRoleMappingServiceTests()
+    {
+        var options = new DbContextOptionsBuilder<OtOpcUaConfigDbContext>()
+            .UseInMemoryDatabase($"ldap-grm-{Guid.NewGuid():N}")
+            .Options;
+        _db = new OtOpcUaConfigDbContext(options);
+    }
+
+    public void Dispose() => _db.Dispose();
+
+    private LdapGroupRoleMapping Make(string group, AdminRole role, string? clusterId = null, bool? isSystemWide = null) =>
+        new()
+        {
+            LdapGroup = group,
+            Role = role,
+            ClusterId = clusterId,
+            IsSystemWide = isSystemWide ?? (clusterId is null),
+        };
+
+    [Fact]
+    public async Task Create_SetsId_AndCreatedAtUtc()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        var row = Make("cn=fleet,dc=x", AdminRole.FleetAdmin);
+
+        var saved = await svc.CreateAsync(row, CancellationToken.None);
+
+        saved.Id.ShouldNotBe(Guid.Empty);
+        saved.CreatedAtUtc.ShouldBeGreaterThan(DateTime.UtcNow.AddMinutes(-1));
+    }
+
+    [Fact]
+    public async Task Create_Rejects_EmptyLdapGroup()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        var row = Make("", AdminRole.FleetAdmin);
+
+        await Should.ThrowAsync<InvalidLdapGroupRoleMappingException>(
+            () => svc.CreateAsync(row, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Create_Rejects_SystemWide_With_ClusterId()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        var row = Make("cn=g", AdminRole.ConfigViewer, clusterId: "c1", isSystemWide: true);
+
+        await Should.ThrowAsync<InvalidLdapGroupRoleMappingException>(
+            () => svc.CreateAsync(row, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task Create_Rejects_NonSystemWide_WithoutClusterId()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        var row = Make("cn=g", AdminRole.ConfigViewer, clusterId: null, isSystemWide: false);
+
+        await Should.ThrowAsync<InvalidLdapGroupRoleMappingException>(
+            () => svc.CreateAsync(row, CancellationToken.None));
+    }
+
+    [Fact]
+    public async Task GetByGroups_Returns_MatchingGrants_Only()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        await svc.CreateAsync(Make("cn=fleet,dc=x", AdminRole.FleetAdmin), CancellationToken.None);
+        await svc.CreateAsync(Make("cn=editor,dc=x", AdminRole.ConfigEditor), CancellationToken.None);
+        await svc.CreateAsync(Make("cn=viewer,dc=x", AdminRole.ConfigViewer), CancellationToken.None);
+
+        var results = await svc.GetByGroupsAsync(
+            ["cn=fleet,dc=x", "cn=viewer,dc=x"], CancellationToken.None);
+
+        results.Count.ShouldBe(2);
+        results.Select(r => r.Role).ShouldBe([AdminRole.FleetAdmin, AdminRole.ConfigViewer], ignoreOrder: true);
+    }
+
+    [Fact]
+    public async Task GetByGroups_Empty_Input_ReturnsEmpty()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        await svc.CreateAsync(Make("cn=fleet,dc=x", AdminRole.FleetAdmin), CancellationToken.None);
+
+        var results = await svc.GetByGroupsAsync([], CancellationToken.None);
+
+        results.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public async Task ListAll_Orders_ByGroupThenCluster()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        await svc.CreateAsync(Make("cn=b,dc=x", AdminRole.FleetAdmin), CancellationToken.None);
+        await svc.CreateAsync(Make("cn=a,dc=x", AdminRole.ConfigEditor, clusterId: "c2", isSystemWide: false), CancellationToken.None);
+        await svc.CreateAsync(Make("cn=a,dc=x", AdminRole.ConfigEditor, clusterId: "c1", isSystemWide: false), CancellationToken.None);
+
+        var results = await svc.ListAllAsync(CancellationToken.None);
+
+        results[0].LdapGroup.ShouldBe("cn=a,dc=x");
+        results[0].ClusterId.ShouldBe("c1");
+        results[1].ClusterId.ShouldBe("c2");
+        results[2].LdapGroup.ShouldBe("cn=b,dc=x");
+    }
+
+    [Fact]
+    public async Task Delete_Removes_Matching_Row()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+        var saved = await svc.CreateAsync(Make("cn=fleet,dc=x", AdminRole.FleetAdmin), CancellationToken.None);
+
+        await svc.DeleteAsync(saved.Id, CancellationToken.None);
+
+        var after = await svc.ListAllAsync(CancellationToken.None);
+        after.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public async Task Delete_Unknown_Id_IsNoOp()
+    {
+        var svc = new LdapGroupRoleMappingService(_db);
+
+        await svc.DeleteAsync(Guid.NewGuid(), CancellationToken.None);
+        // no exception
+    }
+}
@@ -0,0 +1,154 @@
|
|||||||
|
using Microsoft.Extensions.Logging.Abstractions;
|
||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Configuration.Tests;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class ResilientConfigReaderTests : IDisposable
|
||||||
|
{
|
||||||
|
private readonly string _root = Path.Combine(Path.GetTempPath(), $"otopcua-reader-{Guid.NewGuid():N}");
|
||||||
|
|
||||||
|
public void Dispose()
|
||||||
|
{
|
||||||
|
try
|
||||||
|
{
|
||||||
|
if (!Directory.Exists(_root)) return;
|
||||||
|
foreach (var f in Directory.EnumerateFiles(_root, "*", SearchOption.AllDirectories))
|
||||||
|
File.SetAttributes(f, FileAttributes.Normal);
|
||||||
|
Directory.Delete(_root, recursive: true);
|
||||||
|
}
|
||||||
|
catch { /* best-effort */ }
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task CentralDbSucceeds_ReturnsValue_MarksFresh()
|
||||||
|
{
|
||||||
|
var cache = new GenerationSealedCache(_root);
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
flag.MarkStale(); // pre-existing stale state
|
||||||
|
var reader = new ResilientConfigReader(cache, flag, NullLogger<ResilientConfigReader>.Instance);
|
||||||
|
|
||||||
|
var result = await reader.ReadAsync(
|
||||||
|
"cluster-a",
|
||||||
|
_ => ValueTask.FromResult("fresh-from-db"),
|
||||||
|
_ => "from-cache",
|
||||||
|
CancellationToken.None);
|
||||||
|
|
||||||
|
result.ShouldBe("fresh-from-db");
|
||||||
|
flag.IsStale.ShouldBeFalse("successful central-DB read clears stale flag");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task CentralDbFails_ExhaustsRetries_FallsBackToCache_MarksStale()
|
||||||
|
{
|
||||||
|
var cache = new GenerationSealedCache(_root);
|
||||||
|
await cache.SealAsync(new GenerationSnapshot
|
||||||
|
{
|
||||||
|
ClusterId = "cluster-a", GenerationId = 99, CachedAt = DateTime.UtcNow,
|
||||||
|
PayloadJson = "{\"cached\":true}",
|
||||||
|
});
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
var reader = new ResilientConfigReader(cache, flag, NullLogger<ResilientConfigReader>.Instance,
|
||||||
|
timeout: TimeSpan.FromSeconds(10), retryCount: 2);
|
||||||
|
var attempts = 0;
|
||||||
|
|
||||||
|
var result = await reader.ReadAsync(
|
||||||
|
"cluster-a",
|
||||||
|
_ =>
|
||||||
|
{
|
||||||
|
attempts++;
|
||||||
|
throw new InvalidOperationException("SQL dead");
|
||||||
|
#pragma warning disable CS0162
|
||||||
|
return ValueTask.FromResult("never");
|
||||||
|
#pragma warning restore CS0162
|
||||||
|
},
|
||||||
|
snap => snap.PayloadJson,
|
||||||
|
CancellationToken.None);
|
||||||
|
|
||||||
|
attempts.ShouldBe(3, "1 initial + 2 retries = 3 attempts");
|
||||||
|
result.ShouldBe("{\"cached\":true}");
|
||||||
|
flag.IsStale.ShouldBeTrue("cache fallback flips stale flag true");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task CentralDbFails_AndCacheAlsoUnavailable_Throws()
|
||||||
|
{
|
||||||
|
var cache = new GenerationSealedCache(_root);
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
var reader = new ResilientConfigReader(cache, flag, NullLogger<ResilientConfigReader>.Instance,
|
||||||
|
timeout: TimeSpan.FromSeconds(10), retryCount: 0);
|
||||||
|
|
||||||
|
await Should.ThrowAsync<GenerationCacheUnavailableException>(async () =>
|
||||||
|
{
|
||||||
|
await reader.ReadAsync<string>(
|
||||||
|
"cluster-a",
|
||||||
|
_ => throw new InvalidOperationException("SQL dead"),
|
||||||
|
_ => "never",
|
||||||
|
CancellationToken.None);
|
||||||
|
});
|
||||||
|
|
||||||
|
flag.IsStale.ShouldBeFalse("no snapshot was ever served, so the flag keeps its initial (fresh) state");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Cancellation_NotRetried()
|
||||||
|
{
|
||||||
|
var cache = new GenerationSealedCache(_root);
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
var reader = new ResilientConfigReader(cache, flag, NullLogger<ResilientConfigReader>.Instance,
|
||||||
|
timeout: TimeSpan.FromSeconds(10), retryCount: 5);
|
||||||
|
using var cts = new CancellationTokenSource();
|
||||||
|
cts.Cancel();
|
||||||
|
var attempts = 0;
|
||||||
|
|
||||||
|
await Should.ThrowAsync<OperationCanceledException>(async () =>
|
||||||
|
{
|
||||||
|
await reader.ReadAsync<string>(
|
||||||
|
"cluster-a",
|
||||||
|
ct =>
|
||||||
|
{
|
||||||
|
attempts++;
|
||||||
|
ct.ThrowIfCancellationRequested();
|
||||||
|
return ValueTask.FromResult("ok");
|
||||||
|
},
|
||||||
|
_ => "cache",
|
||||||
|
cts.Token);
|
||||||
|
});
|
||||||
|
|
||||||
|
attempts.ShouldBeLessThanOrEqualTo(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class StaleConfigFlagTests
|
||||||
|
{
|
||||||
|
[Fact]
|
||||||
|
public void Default_IsFresh()
|
||||||
|
{
|
||||||
|
new StaleConfigFlag().IsStale.ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void MarkStale_ThenFresh_Toggles()
|
||||||
|
{
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
flag.MarkStale();
|
||||||
|
flag.IsStale.ShouldBeTrue();
|
||||||
|
flag.MarkFresh();
|
||||||
|
flag.IsStale.ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void ConcurrentWrites_Converge()
|
||||||
|
{
|
||||||
|
var flag = new StaleConfigFlag();
|
||||||
|
Parallel.For(0, 1000, i =>
|
||||||
|
{
|
||||||
|
if (i % 2 == 0) flag.MarkStale(); else flag.MarkFresh();
|
||||||
|
});
|
||||||
|
flag.MarkFresh();
|
||||||
|
flag.IsStale.ShouldBeFalse();
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -29,6 +29,8 @@ public sealed class SchemaComplianceTests
 "DriverInstance", "Device", "Equipment", "Tag", "PollGroup",
 "NodeAcl", "ExternalIdReservation",
 "DriverHostStatus",
+"DriverInstanceResilienceStatus",
+"LdapGroupRoleMapping",
 };
 
 var actual = QueryStrings(@"
@@ -14,6 +14,7 @@
 <PackageReference Include="Shouldly" Version="4.3.0"/>
 <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0"/>
 <PackageReference Include="Microsoft.Data.SqlClient" Version="6.1.1"/>
+<PackageReference Include="Microsoft.EntityFrameworkCore.InMemory" Version="10.0.0"/>
 <PackageReference Include="xunit.runner.visualstudio" Version="3.0.2">
 <PrivateAssets>all</PrivateAssets>
 <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
@@ -7,11 +7,13 @@ public sealed class DriverTypeRegistryTests
 {
 private static DriverTypeMetadata SampleMetadata(
 string typeName = "Modbus",
-NamespaceKindCompatibility allowed = NamespaceKindCompatibility.Equipment) =>
+NamespaceKindCompatibility allowed = NamespaceKindCompatibility.Equipment,
+DriverTier tier = DriverTier.B) =>
 new(typeName, allowed,
 DriverConfigJsonSchema: "{\"type\": \"object\"}",
 DeviceConfigJsonSchema: "{\"type\": \"object\"}",
-TagConfigJsonSchema: "{\"type\": \"object\"}");
+TagConfigJsonSchema: "{\"type\": \"object\"}",
+Tier: tier);
 
 [Fact]
 public void Register_ThenGet_RoundTrips()
@@ -24,6 +26,20 @@ public sealed class DriverTypeRegistryTests
 registry.Get("Modbus").ShouldBe(metadata);
 }
 
+[Theory]
+[InlineData(DriverTier.A)]
+[InlineData(DriverTier.B)]
+[InlineData(DriverTier.C)]
+public void Register_Requires_NonNullTier(DriverTier tier)
+{
+var registry = new DriverTypeRegistry();
+var metadata = SampleMetadata(typeName: $"Driver-{tier}", tier: tier);
+
+registry.Register(metadata);
+
+registry.Get(metadata.TypeName).Tier.ShouldBe(tier);
+}
+
 [Fact]
 public void Get_IsCaseInsensitive()
 {
@@ -0,0 +1,104 @@
|
|||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class PermissionTrieCacheTests
|
||||||
|
{
|
||||||
|
private static PermissionTrie Trie(string cluster, long generation) => new()
|
||||||
|
{
|
||||||
|
ClusterId = cluster,
|
||||||
|
GenerationId = generation,
|
||||||
|
};
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void GetTrie_Empty_ReturnsNull()
|
||||||
|
{
|
||||||
|
new PermissionTrieCache().GetTrie("c1").ShouldBeNull();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Install_ThenGet_RoundTrips()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 5));
|
||||||
|
|
||||||
|
cache.GetTrie("c1")!.GenerationId.ShouldBe(5);
|
||||||
|
cache.CurrentGenerationId("c1").ShouldBe(5);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NewGeneration_BecomesCurrent()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 1));
|
||||||
|
cache.Install(Trie("c1", 2));
|
||||||
|
|
||||||
|
cache.CurrentGenerationId("c1").ShouldBe(2);
|
||||||
|
cache.GetTrie("c1", 1).ShouldNotBeNull("prior generation retained for in-flight requests");
|
||||||
|
cache.GetTrie("c1", 2).ShouldNotBeNull();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void OutOfOrder_Install_DoesNotDowngrade_Current()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 3));
|
||||||
|
cache.Install(Trie("c1", 1)); // late-arriving older generation
|
||||||
|
|
||||||
|
cache.CurrentGenerationId("c1").ShouldBe(3, "older generation must not become current");
|
||||||
|
cache.GetTrie("c1", 1).ShouldNotBeNull("but older is still retrievable by explicit lookup");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Invalidate_DropsCluster()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 1));
|
||||||
|
cache.Install(Trie("c2", 1));
|
||||||
|
|
||||||
|
cache.Invalidate("c1");
|
||||||
|
|
||||||
|
cache.GetTrie("c1").ShouldBeNull();
|
||||||
|
cache.GetTrie("c2").ShouldNotBeNull("sibling cluster unaffected");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Prune_RetainsMostRecent()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
for (var g = 1L; g <= 5; g++) cache.Install(Trie("c1", g));
|
||||||
|
|
||||||
|
cache.Prune("c1", keepLatest: 2);
|
||||||
|
|
||||||
|
cache.GetTrie("c1", 5).ShouldNotBeNull();
|
||||||
|
cache.GetTrie("c1", 4).ShouldNotBeNull();
|
||||||
|
cache.GetTrie("c1", 3).ShouldBeNull();
|
||||||
|
cache.GetTrie("c1", 1).ShouldBeNull();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Prune_LessThanKeep_IsNoOp()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 1));
|
||||||
|
cache.Install(Trie("c1", 2));
|
||||||
|
|
||||||
|
cache.Prune("c1", keepLatest: 10);
|
||||||
|
|
||||||
|
cache.CachedTrieCount.ShouldBe(2);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void ClusterIsolation()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(Trie("c1", 1));
|
||||||
|
cache.Install(Trie("c2", 9));
|
||||||
|
|
||||||
|
cache.CurrentGenerationId("c1").ShouldBe(1);
|
||||||
|
cache.CurrentGenerationId("c2").ShouldBe(9);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,157 @@
|
|||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class PermissionTrieTests
|
||||||
|
{
|
||||||
|
private static NodeAcl Row(string group, NodeAclScopeKind scope, string? scopeId, NodePermissions flags, string clusterId = "c1") =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
NodeAclRowId = Guid.NewGuid(),
|
||||||
|
NodeAclId = $"acl-{Guid.NewGuid():N}",
|
||||||
|
GenerationId = 1,
|
||||||
|
ClusterId = clusterId,
|
||||||
|
LdapGroup = group,
|
||||||
|
ScopeKind = scope,
|
||||||
|
ScopeId = scopeId,
|
||||||
|
PermissionFlags = flags,
|
||||||
|
};
|
||||||
|
|
||||||
|
private static NodeScope EquipmentTag(string cluster, string ns, string area, string line, string equip, string tag) =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
ClusterId = cluster,
|
||||||
|
NamespaceId = ns,
|
||||||
|
UnsAreaId = area,
|
||||||
|
UnsLineId = line,
|
||||||
|
EquipmentId = equip,
|
||||||
|
TagId = tag,
|
||||||
|
Kind = NodeHierarchyKind.Equipment,
|
||||||
|
};
|
||||||
|
|
||||||
|
private static NodeScope GalaxyTag(string cluster, string ns, string[] folders, string tag) =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
ClusterId = cluster,
|
||||||
|
NamespaceId = ns,
|
||||||
|
FolderSegments = folders,
|
||||||
|
TagId = tag,
|
||||||
|
Kind = NodeHierarchyKind.SystemPlatform,
|
||||||
|
};
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void ClusterLevelGrant_Cascades_ToEveryTag()
|
||||||
|
{
|
||||||
|
var rows = new[] { Row("cn=ops", NodeAclScopeKind.Cluster, scopeId: null, NodePermissions.Read) };
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
|
||||||
|
var matches = trie.CollectMatches(
|
||||||
|
EquipmentTag("c1", "ns", "area1", "line1", "eq1", "tag1"),
|
||||||
|
["cn=ops"]);
|
||||||
|
|
||||||
|
matches.Count.ShouldBe(1);
|
||||||
|
matches[0].PermissionFlags.ShouldBe(NodePermissions.Read);
|
||||||
|
matches[0].Scope.ShouldBe(NodeAclScopeKind.Cluster);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void EquipmentScope_DoesNotLeak_ToSibling()
|
||||||
|
{
|
||||||
|
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
|
||||||
|
{
|
||||||
|
["eq-A"] = new(new[] { "ns", "area1", "line1", "eq-A" }),
|
||||||
|
};
|
||||||
|
var rows = new[] { Row("cn=ops", NodeAclScopeKind.Equipment, "eq-A", NodePermissions.Read) };
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths);
|
||||||
|
|
||||||
|
var matchA = trie.CollectMatches(EquipmentTag("c1", "ns", "area1", "line1", "eq-A", "tag1"), ["cn=ops"]);
|
||||||
|
var matchB = trie.CollectMatches(EquipmentTag("c1", "ns", "area1", "line1", "eq-B", "tag1"), ["cn=ops"]);
|
||||||
|
|
||||||
|
matchA.Count.ShouldBe(1);
|
||||||
|
matchB.ShouldBeEmpty("grant at eq-A must not apply to sibling eq-B");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void MultiGroup_Union_OrsPermissionFlags()
|
||||||
|
{
|
||||||
|
var rows = new[]
|
||||||
|
{
|
||||||
|
Row("cn=readers", NodeAclScopeKind.Cluster, null, NodePermissions.Read),
|
||||||
|
Row("cn=writers", NodeAclScopeKind.Cluster, null, NodePermissions.WriteOperate),
|
||||||
|
};
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
|
||||||
|
var matches = trie.CollectMatches(
|
||||||
|
EquipmentTag("c1", "ns", "area1", "line1", "eq1", "tag1"),
|
||||||
|
["cn=readers", "cn=writers"]);
|
||||||
|
|
||||||
|
matches.Count.ShouldBe(2);
|
||||||
|
var combined = matches.Aggregate(NodePermissions.None, (acc, m) => acc | m.PermissionFlags);
|
||||||
|
combined.ShouldBe(NodePermissions.Read | NodePermissions.WriteOperate);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NoMatchingGroup_ReturnsEmpty()
|
||||||
|
{
|
||||||
|
var rows = new[] { Row("cn=different", NodeAclScopeKind.Cluster, null, NodePermissions.Read) };
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
|
||||||
|
var matches = trie.CollectMatches(
|
||||||
|
EquipmentTag("c1", "ns", "area1", "line1", "eq1", "tag1"),
|
||||||
|
["cn=ops"]);
|
||||||
|
|
||||||
|
matches.ShouldBeEmpty();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Galaxy_FolderSegment_Grant_DoesNotLeak_To_Sibling_Folder()
|
||||||
|
{
|
||||||
|
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
|
||||||
|
{
|
||||||
|
["folder-A"] = new(new[] { "ns-gal", "folder-A" }),
|
||||||
|
};
|
||||||
|
var rows = new[] { Row("cn=ops", NodeAclScopeKind.Equipment, "folder-A", NodePermissions.Read) };
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths);
|
||||||
|
|
||||||
|
var matchA = trie.CollectMatches(GalaxyTag("c1", "ns-gal", ["folder-A"], "tag1"), ["cn=ops"]);
|
||||||
|
var matchB = trie.CollectMatches(GalaxyTag("c1", "ns-gal", ["folder-B"], "tag1"), ["cn=ops"]);
|
||||||
|
|
||||||
|
matchA.Count.ShouldBe(1);
|
||||||
|
matchB.ShouldBeEmpty();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void CrossCluster_Grant_DoesNotLeak()
|
||||||
|
{
|
||||||
|
var rows = new[] { Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read, clusterId: "c-other") };
|
||||||
|
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
|
||||||
|
var matches = trie.CollectMatches(
|
||||||
|
EquipmentTag("c1", "ns", "area1", "line1", "eq1", "tag1"),
|
||||||
|
["cn=ops"]);
|
||||||
|
|
||||||
|
matches.ShouldBeEmpty("rows for cluster c-other must not land in c1's trie");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Build_IsIdempotent()
|
||||||
|
{
|
||||||
|
var rows = new[]
|
||||||
|
{
|
||||||
|
Row("cn=a", NodeAclScopeKind.Cluster, null, NodePermissions.Read),
|
||||||
|
Row("cn=b", NodeAclScopeKind.Cluster, null, NodePermissions.WriteOperate),
|
||||||
|
};
|
||||||
|
|
||||||
|
var trie1 = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
var trie2 = PermissionTrieBuilder.Build("c1", 1, rows);
|
||||||
|
|
||||||
|
trie1.Root.Grants.Count.ShouldBe(trie2.Root.Grants.Count);
|
||||||
|
trie1.ClusterId.ShouldBe(trie2.ClusterId);
|
||||||
|
trie1.GenerationId.ShouldBe(trie2.GenerationId);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,154 @@
|
|||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class TriePermissionEvaluatorTests
|
||||||
|
{
|
||||||
|
private static readonly DateTime Now = new(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);
|
||||||
|
private readonly FakeTimeProvider _time = new();
|
||||||
|
|
||||||
|
private sealed class FakeTimeProvider : TimeProvider
|
||||||
|
{
|
||||||
|
public DateTime Utc { get; set; } = Now;
|
||||||
|
public override DateTimeOffset GetUtcNow() => new(Utc, TimeSpan.Zero);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static NodeAcl Row(string group, NodeAclScopeKind scope, string? scopeId, NodePermissions flags) =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
NodeAclRowId = Guid.NewGuid(),
|
||||||
|
NodeAclId = $"acl-{Guid.NewGuid():N}",
|
||||||
|
GenerationId = 1,
|
||||||
|
ClusterId = "c1",
|
||||||
|
LdapGroup = group,
|
||||||
|
ScopeKind = scope,
|
||||||
|
ScopeId = scopeId,
|
||||||
|
PermissionFlags = flags,
|
||||||
|
};
|
||||||
|
|
||||||
|
private static UserAuthorizationState Session(string[] groups, DateTime? resolvedUtc = null, string clusterId = "c1") =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
SessionId = "sess",
|
||||||
|
ClusterId = clusterId,
|
||||||
|
LdapGroups = groups,
|
||||||
|
MembershipResolvedUtc = resolvedUtc ?? Now,
|
||||||
|
AuthGenerationId = 1,
|
||||||
|
MembershipVersion = 1,
|
||||||
|
};
|
||||||
|
|
||||||
|
private static NodeScope Scope(string cluster = "c1") =>
|
||||||
|
new()
|
||||||
|
{
|
||||||
|
ClusterId = cluster,
|
||||||
|
NamespaceId = "ns",
|
||||||
|
UnsAreaId = "area",
|
||||||
|
UnsLineId = "line",
|
||||||
|
EquipmentId = "eq",
|
||||||
|
TagId = "tag",
|
||||||
|
Kind = NodeHierarchyKind.Equipment,
|
||||||
|
};
|
||||||
|
|
||||||
|
private TriePermissionEvaluator MakeEvaluator(NodeAcl[] rows)
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache();
|
||||||
|
cache.Install(PermissionTrieBuilder.Build("c1", 1, rows));
|
||||||
|
return new TriePermissionEvaluator(cache, _time);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Allow_When_RequiredFlag_Matched()
|
||||||
|
{
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(Session(["cn=ops"]), OpcUaOperation.Read, Scope());
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.Allow);
|
||||||
|
decision.Provenance.Count.ShouldBe(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NotGranted_When_NoMatchingGroup()
|
||||||
|
{
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(Session(["cn=unrelated"]), OpcUaOperation.Read, Scope());
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.NotGranted);
|
||||||
|
decision.Provenance.ShouldBeEmpty();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NotGranted_When_FlagsInsufficient()
|
||||||
|
{
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(Session(["cn=ops"]), OpcUaOperation.WriteOperate, Scope());
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.NotGranted);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void HistoryRead_Requires_Its_Own_Bit()
|
||||||
|
{
|
||||||
|
// User has Read but not HistoryRead
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
|
||||||
|
var liveRead = evaluator.Authorize(Session(["cn=ops"]), OpcUaOperation.Read, Scope());
|
||||||
|
var historyRead = evaluator.Authorize(Session(["cn=ops"]), OpcUaOperation.HistoryRead, Scope());
|
||||||
|
|
||||||
|
liveRead.IsAllowed.ShouldBeTrue();
|
||||||
|
historyRead.IsAllowed.ShouldBeFalse("HistoryRead uses its own NodePermissions flag, not Read");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void CrossCluster_Session_Denied()
|
||||||
|
{
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
var otherSession = Session(["cn=ops"], clusterId: "c-other");
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(otherSession, OpcUaOperation.Read, Scope(cluster: "c1"));
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.NotGranted);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void StaleSession_FailsClosed()
|
||||||
|
{
|
||||||
|
var evaluator = MakeEvaluator([Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read)]);
|
||||||
|
var session = Session(["cn=ops"], resolvedUtc: Now);
|
||||||
|
_time.Utc = Now.AddMinutes(10); // well past the 5-min AuthCacheMaxStaleness default
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(session, OpcUaOperation.Read, Scope());
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.NotGranted);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NoCachedTrie_ForCluster_Denied()
|
||||||
|
{
|
||||||
|
var cache = new PermissionTrieCache(); // empty cache
|
||||||
|
var evaluator = new TriePermissionEvaluator(cache, _time);
|
||||||
|
|
||||||
|
var decision = evaluator.Authorize(Session(["cn=ops"]), OpcUaOperation.Read, Scope());
|
||||||
|
|
||||||
|
decision.Verdict.ShouldBe(AuthorizationVerdict.NotGranted);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void OperationToPermission_Mapping_IsTotal()
|
||||||
|
{
|
||||||
|
foreach (var op in Enum.GetValues<OpcUaOperation>())
|
||||||
|
{
|
||||||
|
// Must not throw: every OpcUaOperation needs a mapping, otherwise the
|
||||||
|
// "every operation wired" compliance check fails.
|
||||||
|
TriePermissionEvaluator.MapOperationToPermission(op);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,60 @@
|
|||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class UserAuthorizationStateTests
|
||||||
|
{
|
||||||
|
private static readonly DateTime Now = new(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);
|
||||||
|
|
||||||
|
private static UserAuthorizationState Fresh(DateTime resolved) => new()
|
||||||
|
{
|
||||||
|
SessionId = "s",
|
||||||
|
ClusterId = "c1",
|
||||||
|
LdapGroups = ["cn=ops"],
|
||||||
|
MembershipResolvedUtc = resolved,
|
||||||
|
AuthGenerationId = 1,
|
||||||
|
MembershipVersion = 1,
|
||||||
|
};
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void FreshlyResolved_Is_NotStale_NorNeedsRefresh()
|
||||||
|
{
|
||||||
|
var session = Fresh(Now);
|
||||||
|
|
||||||
|
session.IsStale(Now.AddMinutes(1)).ShouldBeFalse();
|
||||||
|
session.NeedsRefresh(Now.AddMinutes(1)).ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NeedsRefresh_FiresAfter_FreshnessInterval()
|
||||||
|
{
|
||||||
|
var session = Fresh(Now);
|
||||||
|
|
||||||
|
session.NeedsRefresh(Now.AddMinutes(16)).ShouldBeFalse("16 min is past the freshness interval but also past the 5-min staleness ceiling, so the session is Stale rather than NeedsRefresh");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void NeedsRefresh_TrueBetween_Freshness_And_Staleness_Windows()
|
||||||
|
{
|
||||||
|
// Custom: freshness=2 min, staleness=10 min → between 2 and 10 min NeedsRefresh fires.
|
||||||
|
var session = Fresh(Now) with
|
||||||
|
{
|
||||||
|
MembershipFreshnessInterval = TimeSpan.FromMinutes(2),
|
||||||
|
AuthCacheMaxStaleness = TimeSpan.FromMinutes(10),
|
||||||
|
};
|
||||||
|
|
||||||
|
session.NeedsRefresh(Now.AddMinutes(5)).ShouldBeTrue();
|
||||||
|
session.IsStale(Now.AddMinutes(5)).ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void IsStale_TrueAfter_StalenessWindow()
|
||||||
|
{
|
||||||
|
var session = Fresh(Now);
|
||||||
|
|
||||||
|
session.IsStale(Now.AddMinutes(6)).ShouldBeTrue("default AuthCacheMaxStaleness is 5 min");
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,72 @@
|
|||||||
|
using Serilog;
|
||||||
|
using Serilog.Core;
|
||||||
|
using Serilog.Events;
|
||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Resilience;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Observability;
|
||||||
|
|
||||||
|
[Trait("Category", "Integration")]
|
||||||
|
public sealed class CapabilityInvokerEnrichmentTests
|
||||||
|
{
|
||||||
|
[Fact]
|
||||||
|
public async Task InvokerExecute_LogsInsideCallSite_CarryStructuredProperties()
|
||||||
|
{
|
||||||
|
var sink = new InMemorySink();
|
||||||
|
var logger = new LoggerConfiguration()
|
||||||
|
.Enrich.FromLogContext()
|
||||||
|
.WriteTo.Sink(sink)
|
||||||
|
.CreateLogger();
|
||||||
|
|
||||||
|
var invoker = new CapabilityInvoker(
|
||||||
|
new DriverResiliencePipelineBuilder(),
|
||||||
|
driverInstanceId: "drv-live",
|
||||||
|
optionsAccessor: () => new DriverResilienceOptions { Tier = DriverTier.A },
|
||||||
|
driverType: "Modbus");
|
||||||
|
|
||||||
|
await invoker.ExecuteAsync(
|
||||||
|
DriverCapability.Read,
|
||||||
|
"plc-1",
|
||||||
|
ct =>
|
||||||
|
{
|
||||||
|
logger.Information("inside call site");
|
||||||
|
return ValueTask.FromResult(42);
|
||||||
|
},
|
||||||
|
CancellationToken.None);
|
||||||
|
|
||||||
|
var evt = sink.Events.ShouldHaveSingleItem();
|
||||||
|
evt.Properties["DriverInstanceId"].ToString().ShouldBe("\"drv-live\"");
|
||||||
|
evt.Properties["DriverType"].ToString().ShouldBe("\"Modbus\"");
|
||||||
|
evt.Properties["CapabilityName"].ToString().ShouldBe("\"Read\"");
|
||||||
|
evt.Properties.ShouldContainKey("CorrelationId");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task InvokerExecute_DoesNotLeak_ContextOutsideCallSite()
|
||||||
|
{
|
||||||
|
var sink = new InMemorySink();
|
||||||
|
var logger = new LoggerConfiguration()
|
||||||
|
.Enrich.FromLogContext()
|
||||||
|
.WriteTo.Sink(sink)
|
||||||
|
.CreateLogger();
|
||||||
|
|
||||||
|
var invoker = new CapabilityInvoker(
|
||||||
|
new DriverResiliencePipelineBuilder(),
|
||||||
|
driverInstanceId: "drv-a",
|
||||||
|
optionsAccessor: () => new DriverResilienceOptions { Tier = DriverTier.A });
|
||||||
|
|
||||||
|
await invoker.ExecuteAsync(DriverCapability.Read, "host", _ => ValueTask.FromResult(1), CancellationToken.None);
|
||||||
|
logger.Information("outside");
|
||||||
|
|
||||||
|
var outside = sink.Events.ShouldHaveSingleItem();
|
||||||
|
outside.Properties.ContainsKey("DriverInstanceId").ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
private sealed class InMemorySink : ILogEventSink
|
||||||
|
{
|
||||||
|
public List<LogEvent> Events { get; } = [];
|
||||||
|
public void Emit(LogEvent logEvent) => Events.Add(logEvent);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,70 @@
|
|||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Observability;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Observability;
|
||||||
|
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class DriverHealthReportTests
|
||||||
|
{
|
||||||
|
[Fact]
|
||||||
|
public void EmptyFleet_IsHealthy()
|
||||||
|
{
|
||||||
|
DriverHealthReport.Aggregate([]).ShouldBe(ReadinessVerdict.Healthy);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void AllHealthy_Fleet_IsHealthy()
|
||||||
|
{
|
||||||
|
var verdict = DriverHealthReport.Aggregate([
|
||||||
|
new DriverHealthSnapshot("a", DriverState.Healthy),
|
||||||
|
new DriverHealthSnapshot("b", DriverState.Healthy),
|
||||||
|
]);
|
||||||
|
verdict.ShouldBe(ReadinessVerdict.Healthy);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void AnyFaulted_TrumpsEverything()
|
||||||
|
{
|
||||||
|
var verdict = DriverHealthReport.Aggregate([
|
||||||
|
new DriverHealthSnapshot("a", DriverState.Healthy),
|
||||||
|
new DriverHealthSnapshot("b", DriverState.Degraded),
|
||||||
|
new DriverHealthSnapshot("c", DriverState.Faulted),
|
||||||
|
new DriverHealthSnapshot("d", DriverState.Initializing),
|
||||||
|
]);
|
||||||
|
verdict.ShouldBe(ReadinessVerdict.Faulted);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Theory]
|
||||||
|
[InlineData(DriverState.Unknown)]
|
||||||
|
[InlineData(DriverState.Initializing)]
|
||||||
|
public void Any_NotReady_WithoutFaulted_IsNotReady(DriverState notReadyState)
|
||||||
|
{
|
||||||
|
var verdict = DriverHealthReport.Aggregate([
|
||||||
|
new DriverHealthSnapshot("a", DriverState.Healthy),
|
||||||
|
new DriverHealthSnapshot("b", notReadyState),
|
||||||
|
]);
|
||||||
|
verdict.ShouldBe(ReadinessVerdict.NotReady);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Any_Degraded_WithoutFaultedOrNotReady_IsDegraded()
|
||||||
|
{
|
||||||
|
var verdict = DriverHealthReport.Aggregate([
|
||||||
|
new DriverHealthSnapshot("a", DriverState.Healthy),
|
||||||
|
new DriverHealthSnapshot("b", DriverState.Degraded),
|
||||||
|
]);
|
||||||
|
verdict.ShouldBe(ReadinessVerdict.Degraded);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Theory]
|
||||||
|
[InlineData(ReadinessVerdict.Healthy, 200)]
|
||||||
|
[InlineData(ReadinessVerdict.Degraded, 200)]
|
||||||
|
[InlineData(ReadinessVerdict.NotReady, 503)]
|
||||||
|
[InlineData(ReadinessVerdict.Faulted, 503)]
|
||||||
|
public void HttpStatus_MatchesStateMatrix(ReadinessVerdict verdict, int expected)
|
||||||
|
{
|
||||||
|
DriverHealthReport.HttpStatus(verdict).ShouldBe(expected);
|
||||||
|
}
|
||||||
|
}
|
||||||
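Reviewer sketch: the tests in `DriverHealthReportTests` pin down a precedence order (Faulted, then NotReady, then Degraded, then Healthy) and a verdict-to-HTTP-status matrix. The block below is a minimal, self-contained illustration of logic that would satisfy those tests. It is not the implementation under review: the enum members, the record shape, and the `HealthSketch` class are assumptions modelled only on the names the tests use.

```csharp
// Hypothetical stand-ins; the real DriverState / ReadinessVerdict may have more members.
using System;
using System.Collections.Generic;
using System.Linq;

var fleet = new List<DriverHealthSnapshot>
{
    new("a", DriverState.Healthy),
    new("b", DriverState.Degraded),
    new("c", DriverState.Initializing),
};
var verdict = HealthSketch.Aggregate(fleet);
Console.WriteLine($"{verdict} -> HTTP {HealthSketch.HttpStatus(verdict)}");

public enum DriverState { Unknown, Initializing, Healthy, Degraded, Faulted }
public enum ReadinessVerdict { Healthy, Degraded, NotReady, Faulted }
public sealed record DriverHealthSnapshot(string DriverInstanceId, DriverState State);

public static class HealthSketch
{
    // Worst state wins: Faulted > NotReady (Unknown or Initializing) > Degraded > Healthy.
    // An empty fleet falls through every check and is reported Healthy.
    public static ReadinessVerdict Aggregate(IReadOnlyList<DriverHealthSnapshot> drivers)
    {
        if (drivers.Any(d => d.State == DriverState.Faulted))
            return ReadinessVerdict.Faulted;
        if (drivers.Any(d => d.State is DriverState.Unknown or DriverState.Initializing))
            return ReadinessVerdict.NotReady;
        if (drivers.Any(d => d.State == DriverState.Degraded))
            return ReadinessVerdict.Degraded;
        return ReadinessVerdict.Healthy;
    }

    // Healthy and Degraded still serve traffic (200); NotReady and Faulted do not (503).
    public static int HttpStatus(ReadinessVerdict verdict) =>
        verdict is ReadinessVerdict.Healthy or ReadinessVerdict.Degraded ? 200 : 503;
}
```

Ordering the checks from worst to best keeps the precedence rule readable in one place, which is the property `AnyFaulted_TrumpsEverything` guards.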
@@ -0,0 +1,78 @@
using Serilog;
using Serilog.Core;
using Serilog.Events;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Observability;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Observability;

[Trait("Category", "Unit")]
public sealed class LogContextEnricherTests
{
    [Fact]
    public void Scope_Attaches_AllFour_Properties()
    {
        var captured = new InMemorySink();
        var logger = new LoggerConfiguration()
            .Enrich.FromLogContext()
            .WriteTo.Sink(captured)
            .CreateLogger();

        using (LogContextEnricher.Push("drv-1", "Modbus", DriverCapability.Read, "abc123"))
        {
            logger.Information("test message");
        }

        var evt = captured.Events.ShouldHaveSingleItem();
        evt.Properties["DriverInstanceId"].ToString().ShouldBe("\"drv-1\"");
        evt.Properties["DriverType"].ToString().ShouldBe("\"Modbus\"");
        evt.Properties["CapabilityName"].ToString().ShouldBe("\"Read\"");
        evt.Properties["CorrelationId"].ToString().ShouldBe("\"abc123\"");
    }

    [Fact]
    public void Scope_Dispose_Pops_Properties()
    {
        var captured = new InMemorySink();
        var logger = new LoggerConfiguration()
            .Enrich.FromLogContext()
            .WriteTo.Sink(captured)
            .CreateLogger();

        using (LogContextEnricher.Push("drv-1", "Modbus", DriverCapability.Read, "abc123"))
        {
            logger.Information("inside");
        }
        logger.Information("outside");

        captured.Events.Count.ShouldBe(2);
        captured.Events[0].Properties.ContainsKey("DriverInstanceId").ShouldBeTrue();
        captured.Events[1].Properties.ContainsKey("DriverInstanceId").ShouldBeFalse();
    }

    [Fact]
    public void NewCorrelationId_Returns_12_Hex_Chars()
    {
        var id = LogContextEnricher.NewCorrelationId();
        id.Length.ShouldBe(12);
        id.ShouldMatch("^[0-9a-f]{12}$");
    }

    [Theory]
    [InlineData(null)]
    [InlineData("")]
    [InlineData(" ")]
    public void Push_Throws_OnMissingDriverInstanceId(string? id)
    {
        Should.Throw<ArgumentException>(() =>
            LogContextEnricher.Push(id!, "Modbus", DriverCapability.Read, "c"));
    }

    private sealed class InMemorySink : ILogEventSink
    {
        public List<LogEvent> Events { get; } = [];
        public void Emit(LogEvent logEvent) => Events.Add(logEvent);
    }
}
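Reviewer sketch: `NewCorrelationId_Returns_12_Hex_Chars` only fixes the output shape (12 lowercase hex characters). One way to get that shape, shown here purely as an assumption about how such an ID could be produced, is to truncate a GUID formatted with the `"N"` specifier. `CorrelationIdSketch` is a hypothetical name and not the project's `LogContextEnricher`.

```csharp
using System;
using System.Text.RegularExpressions;

var id = CorrelationIdSketch.New();
// Same shape check as the test's regular expression.
Console.WriteLine(Regex.IsMatch(id, "^[0-9a-f]{12}$"));

public static class CorrelationIdSketch
{
    // Guid "N" format is 32 lowercase hex digits with no dashes; keep the first 12.
    public static string New() => Guid.NewGuid().ToString("N")[..12];
}
```

Twelve hex characters carry 48 bits, which is short enough to read in a log line; whether that is enough uniqueness for a given deployment is a separate judgment the tests do not cover.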
@@ -0,0 +1,151 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Resilience;

[Trait("Category", "Unit")]
public sealed class CapabilityInvokerTests
{
    private static CapabilityInvoker MakeInvoker(
        DriverResiliencePipelineBuilder builder,
        DriverResilienceOptions options) =>
        new(builder, "drv-test", () => options);

    [Fact]
    public async Task Read_ReturnsValue_FromCallSite()
    {
        var invoker = MakeInvoker(new DriverResiliencePipelineBuilder(), new DriverResilienceOptions { Tier = DriverTier.A });

        var result = await invoker.ExecuteAsync(
            DriverCapability.Read,
            "host-1",
            _ => ValueTask.FromResult(42),
            CancellationToken.None);

        result.ShouldBe(42);
    }

    [Fact]
    public async Task Read_Retries_OnTransientFailure()
    {
        var invoker = MakeInvoker(new DriverResiliencePipelineBuilder(), new DriverResilienceOptions { Tier = DriverTier.A });
        var attempts = 0;

        var result = await invoker.ExecuteAsync(
            DriverCapability.Read,
            "host-1",
            async _ =>
            {
                attempts++;
                if (attempts < 2) throw new InvalidOperationException("transient");
                await Task.Yield();
                return "ok";
            },
            CancellationToken.None);

        result.ShouldBe("ok");
        attempts.ShouldBe(2);
    }

    [Fact]
    public async Task Write_NonIdempotent_DoesNotRetry_EvenWhenPolicyHasRetries()
    {
        var options = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
            },
        };
        var invoker = MakeInvoker(new DriverResiliencePipelineBuilder(), options);
        var attempts = 0;

        await Should.ThrowAsync<InvalidOperationException>(async () =>
            await invoker.ExecuteWriteAsync(
                "host-1",
                isIdempotent: false,
                async _ =>
                {
                    attempts++;
                    await Task.Yield();
                    throw new InvalidOperationException("boom");
#pragma warning disable CS0162
                    return 0;
#pragma warning restore CS0162
                },
                CancellationToken.None));

        attempts.ShouldBe(1, "non-idempotent write must never replay");
    }

    [Fact]
    public async Task Write_Idempotent_Retries_WhenPolicyHasRetries()
    {
        var options = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
            },
        };
        var invoker = MakeInvoker(new DriverResiliencePipelineBuilder(), options);
        var attempts = 0;

        var result = await invoker.ExecuteWriteAsync(
            "host-1",
            isIdempotent: true,
            async _ =>
            {
                attempts++;
                if (attempts < 2) throw new InvalidOperationException("transient");
                await Task.Yield();
                return "ok";
            },
            CancellationToken.None);

        result.ShouldBe("ok");
        attempts.ShouldBe(2);
    }

    [Fact]
    public async Task Write_Default_DoesNotRetry_WhenPolicyHasZeroRetries()
    {
        // Tier A Write default is RetryCount=0. Even isIdempotent=true shouldn't retry
        // because the policy says not to.
        var invoker = MakeInvoker(new DriverResiliencePipelineBuilder(), new DriverResilienceOptions { Tier = DriverTier.A });
        var attempts = 0;

        await Should.ThrowAsync<InvalidOperationException>(async () =>
            await invoker.ExecuteWriteAsync(
                "host-1",
                isIdempotent: true,
                async _ =>
                {
                    attempts++;
                    await Task.Yield();
                    throw new InvalidOperationException("boom");
#pragma warning disable CS0162
                    return 0;
#pragma warning restore CS0162
                },
                CancellationToken.None));

        attempts.ShouldBe(1, "tier-A default for Write is RetryCount=0");
    }

    [Fact]
    public async Task Execute_HonorsDifferentHosts_Independently()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var invoker = MakeInvoker(builder, new DriverResilienceOptions { Tier = DriverTier.A });

        await invoker.ExecuteAsync(DriverCapability.Read, "host-a", _ => ValueTask.FromResult(1), CancellationToken.None);
        await invoker.ExecuteAsync(DriverCapability.Read, "host-b", _ => ValueTask.FromResult(2), CancellationToken.None);

        builder.CachedPipelineCount.ShouldBe(2);
    }
}
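Reviewer sketch: the three write tests in `CapabilityInvokerTests` describe a gate where retries happen only when the call is flagged idempotent and the policy allows them. The block below illustrates that rule with a hand-rolled retry loop. It deliberately does not use Polly, and `RetryGateSketch` and its parameters are hypothetical names, so treat it as a model of the observable behaviour and not of how `CapabilityInvoker` is built.

```csharp
using System;
using System.Threading.Tasks;

var attempts = 0;
try
{
    // Policy allows 3 retries, but the write is not idempotent, so only one attempt runs.
    await RetryGateSketch.ExecuteWriteAsync<int>(
        isIdempotent: false,
        retryCount: 3,
        call: () =>
        {
            attempts++;
            return Task.FromException<int>(new InvalidOperationException("boom"));
        });
}
catch (InvalidOperationException)
{
    // Expected: the single failure surfaces to the caller.
}
Console.WriteLine(attempts);

public static class RetryGateSketch
{
    public static async Task<T> ExecuteWriteAsync<T>(bool isIdempotent, int retryCount, Func<Task<T>> call)
    {
        // Effective retries are zero unless the caller vouches for idempotency.
        var maxAttempts = 1 + (isIdempotent ? retryCount : 0);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await call();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Attempts remain: swallow and loop. The final failure is not caught here.
            }
        }
    }
}
```

With `isIdempotent: true` and `retryCount: 0` the same function also makes exactly one attempt, which is the case `Write_Default_DoesNotRetry_WhenPolicyHasZeroRetries` covers.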
@@ -0,0 +1,102 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Resilience;

[Trait("Category", "Unit")]
public sealed class DriverResilienceOptionsTests
{
    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    [InlineData(DriverTier.C)]
    public void TierDefaults_Cover_EveryCapability(DriverTier tier)
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(tier);

        foreach (var capability in Enum.GetValues<DriverCapability>())
            defaults.ShouldContainKey(capability);
    }

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    [InlineData(DriverTier.C)]
    public void Write_NeverRetries_ByDefault(DriverTier tier)
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(tier);
        defaults[DriverCapability.Write].RetryCount.ShouldBe(0);
    }

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    [InlineData(DriverTier.C)]
    public void AlarmAcknowledge_NeverRetries_ByDefault(DriverTier tier)
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(tier);
        defaults[DriverCapability.AlarmAcknowledge].RetryCount.ShouldBe(0);
    }

    [Theory]
    [InlineData(DriverTier.A, DriverCapability.Read)]
    [InlineData(DriverTier.A, DriverCapability.HistoryRead)]
    [InlineData(DriverTier.B, DriverCapability.Discover)]
    [InlineData(DriverTier.B, DriverCapability.Probe)]
    [InlineData(DriverTier.C, DriverCapability.AlarmSubscribe)]
    public void IdempotentCapabilities_Retry_ByDefault(DriverTier tier, DriverCapability capability)
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(tier);
        defaults[capability].RetryCount.ShouldBeGreaterThan(0);
    }

    [Fact]
    public void TierC_DisablesCircuitBreaker_DeferringToSupervisor()
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(DriverTier.C);

        foreach (var (_, policy) in defaults)
            policy.BreakerFailureThreshold.ShouldBe(0, "Tier C breaker is handled by the Proxy supervisor (decision #68)");
    }

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    public void TierAAndB_EnableCircuitBreaker(DriverTier tier)
    {
        var defaults = DriverResilienceOptions.GetTierDefaults(tier);

        foreach (var (_, policy) in defaults)
            policy.BreakerFailureThreshold.ShouldBeGreaterThan(0);
    }

    [Fact]
    public void Resolve_Uses_TierDefaults_When_NoOverride()
    {
        var options = new DriverResilienceOptions { Tier = DriverTier.A };

        var resolved = options.Resolve(DriverCapability.Read);

        resolved.ShouldBe(DriverResilienceOptions.GetTierDefaults(DriverTier.A)[DriverCapability.Read]);
    }

    [Fact]
    public void Resolve_Uses_Override_When_Configured()
    {
        var custom = new CapabilityPolicy(TimeoutSeconds: 42, RetryCount: 7, BreakerFailureThreshold: 9);
        var options = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Read] = custom,
            },
        };

        options.Resolve(DriverCapability.Read).ShouldBe(custom);
        options.Resolve(DriverCapability.Write).ShouldBe(
            DriverResilienceOptions.GetTierDefaults(DriverTier.A)[DriverCapability.Write]);
    }
}
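Reviewer sketch: the two `Resolve_*` tests describe an override-else-default lookup per capability. A minimal model of that lookup is below. The type names carry a `Sketch` suffix because they are hypothetical, and the default numbers are invented for illustration; the project's real tier tables are not visible in this diff.

```csharp
using System;
using System.Collections.Generic;

var custom = new PolicySketch(TimeoutSeconds: 42, RetryCount: 7, BreakerFailureThreshold: 9);
var options = new OptionsSketch
{
    Overrides = new Dictionary<CapabilitySketch, PolicySketch> { [CapabilitySketch.Read] = custom },
};
Console.WriteLine(options.Resolve(CapabilitySketch.Read) == custom);   // override is used for Read
Console.WriteLine(options.Resolve(CapabilitySketch.Write).RetryCount); // Write falls back to the default

public enum CapabilitySketch { Read, Write }
public sealed record PolicySketch(int TimeoutSeconds, int RetryCount, int BreakerFailureThreshold);

public sealed class OptionsSketch
{
    // Invented defaults: only "Write never retries by default" is taken from the tests.
    private static readonly Dictionary<CapabilitySketch, PolicySketch> Defaults = new()
    {
        [CapabilitySketch.Read] = new(2, 3, 5),
        [CapabilitySketch.Write] = new(2, 0, 5),
    };

    public IReadOnlyDictionary<CapabilitySketch, PolicySketch> Overrides { get; init; }
        = new Dictionary<CapabilitySketch, PolicySketch>();

    // A configured override wins for that capability only; every other capability keeps its default.
    public PolicySketch Resolve(CapabilitySketch capability) =>
        Overrides.TryGetValue(capability, out var policy) ? policy : Defaults[capability];
}
```

Modelling the policy as a record gives value equality, which is what lets a test compare a resolved policy against an expected one with a single assertion.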
@@ -0,0 +1,222 @@
using Polly.CircuitBreaker;
using Polly.Timeout;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Resilience;

[Trait("Category", "Unit")]
public sealed class DriverResiliencePipelineBuilderTests
{
    private static readonly DriverResilienceOptions TierAOptions = new() { Tier = DriverTier.A };

    [Fact]
    public async Task Read_Retries_Transient_Failures()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.Read, TierAOptions);
        var attempts = 0;

        await pipeline.ExecuteAsync(async _ =>
        {
            attempts++;
            if (attempts < 3) throw new InvalidOperationException("transient");
            await Task.Yield();
        });

        attempts.ShouldBe(3);
    }

    [Fact]
    public async Task Write_DoesNotRetry_OnFailure()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.Write, TierAOptions);
        var attempts = 0;

        var ex = await Should.ThrowAsync<InvalidOperationException>(async () =>
        {
            await pipeline.ExecuteAsync(async _ =>
            {
                attempts++;
                await Task.Yield();
                throw new InvalidOperationException("boom");
            });
        });

        attempts.ShouldBe(1);
        ex.Message.ShouldBe("boom");
    }

    [Fact]
    public async Task AlarmAcknowledge_DoesNotRetry_OnFailure()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.AlarmAcknowledge, TierAOptions);
        var attempts = 0;

        await Should.ThrowAsync<InvalidOperationException>(async () =>
        {
            await pipeline.ExecuteAsync(async _ =>
            {
                attempts++;
                await Task.Yield();
                throw new InvalidOperationException("boom");
            });
        });

        attempts.ShouldBe(1);
    }

    [Fact]
    public void Pipeline_IsIsolated_PerHost()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var driverId = "drv-test";

        var hostA = builder.GetOrCreate(driverId, "host-a", DriverCapability.Read, TierAOptions);
        var hostB = builder.GetOrCreate(driverId, "host-b", DriverCapability.Read, TierAOptions);

        hostA.ShouldNotBeSameAs(hostB);
        builder.CachedPipelineCount.ShouldBe(2);
    }

    [Fact]
    public void Pipeline_IsReused_ForSameTriple()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var driverId = "drv-test";

        var first = builder.GetOrCreate(driverId, "host-a", DriverCapability.Read, TierAOptions);
        var second = builder.GetOrCreate(driverId, "host-a", DriverCapability.Read, TierAOptions);

        first.ShouldBeSameAs(second);
        builder.CachedPipelineCount.ShouldBe(1);
    }

    [Fact]
    public void Pipeline_IsIsolated_PerCapability()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var driverId = "drv-test";

        var read = builder.GetOrCreate(driverId, "host-a", DriverCapability.Read, TierAOptions);
        var write = builder.GetOrCreate(driverId, "host-a", DriverCapability.Write, TierAOptions);

        read.ShouldNotBeSameAs(write);
    }

    [Fact]
    public async Task DeadHost_DoesNotOpenBreaker_ForSiblingHost()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var driverId = "drv-test";

        var deadHost = builder.GetOrCreate(driverId, "dead-plc", DriverCapability.Read, TierAOptions);
        var liveHost = builder.GetOrCreate(driverId, "live-plc", DriverCapability.Read, TierAOptions);

        var threshold = TierAOptions.Resolve(DriverCapability.Read).BreakerFailureThreshold;
        for (var i = 0; i < threshold + 5; i++)
        {
            await Should.ThrowAsync<Exception>(async () =>
                await deadHost.ExecuteAsync(async _ =>
                {
                    await Task.Yield();
                    throw new InvalidOperationException("dead plc");
                }));
        }

        var liveAttempts = 0;
        await liveHost.ExecuteAsync(async _ =>
        {
            liveAttempts++;
            await Task.Yield();
        });

        liveAttempts.ShouldBe(1, "healthy sibling host must not be affected by dead peer");
    }

    [Fact]
    public async Task CircuitBreaker_Opens_AfterFailureThreshold_OnTierA()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.Write, TierAOptions);

        var threshold = TierAOptions.Resolve(DriverCapability.Write).BreakerFailureThreshold;
        for (var i = 0; i < threshold; i++)
        {
            await Should.ThrowAsync<InvalidOperationException>(async () =>
                await pipeline.ExecuteAsync(async _ =>
                {
                    await Task.Yield();
                    throw new InvalidOperationException("boom");
                }));
        }

        await Should.ThrowAsync<BrokenCircuitException>(async () =>
            await pipeline.ExecuteAsync(async _ =>
            {
                await Task.Yield();
            }));
    }

    [Fact]
    public async Task Timeout_Cancels_SlowOperation()
    {
        var tierAWithShortTimeout = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Read] = new(TimeoutSeconds: 1, RetryCount: 0, BreakerFailureThreshold: 5),
            },
        };
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.Read, tierAWithShortTimeout);

        await Should.ThrowAsync<TimeoutRejectedException>(async () =>
            await pipeline.ExecuteAsync(async ct =>
            {
                await Task.Delay(TimeSpan.FromSeconds(5), ct);
            }));
    }

    [Fact]
    public void Invalidate_Removes_OnlyMatchingInstance()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var keepId = "drv-keep";
        var dropId = "drv-drop";

        builder.GetOrCreate(keepId, "h", DriverCapability.Read, TierAOptions);
        builder.GetOrCreate(keepId, "h", DriverCapability.Write, TierAOptions);
        builder.GetOrCreate(dropId, "h", DriverCapability.Read, TierAOptions);

        var removed = builder.Invalidate(dropId);

        removed.ShouldBe(1);
        builder.CachedPipelineCount.ShouldBe(2);
    }

    [Fact]
    public async Task Cancellation_IsNot_Retried()
    {
        var builder = new DriverResiliencePipelineBuilder();
        var pipeline = builder.GetOrCreate("drv-test", "host-1", DriverCapability.Read, TierAOptions);
        var attempts = 0;
        using var cts = new CancellationTokenSource();
        cts.Cancel();

        await Should.ThrowAsync<OperationCanceledException>(async () =>
            await pipeline.ExecuteAsync(async ct =>
            {
                attempts++;
                ct.ThrowIfCancellationRequested();
                await Task.Yield();
            }, cts.Token));

        attempts.ShouldBeLessThanOrEqualTo(1);
    }
}
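Reviewer sketch: the isolation, reuse, and invalidation tests above all follow from caching one pipeline per (driver instance, host, capability) triple. The block below models that cache with a `ConcurrentDictionary` keyed by a tuple. `PipelineCacheSketch` is a hypothetical stand-in that stores arbitrary objects in place of Polly pipelines; how `DriverResiliencePipelineBuilder` actually stores its entries is not shown in this diff.

```csharp
using System;
using System.Collections.Concurrent;

var cache = new PipelineCacheSketch<object>();
var first = cache.GetOrCreate("drv-keep", "h", "Read", () => new object());
var second = cache.GetOrCreate("drv-keep", "h", "Read", () => new object());
cache.GetOrCreate("drv-drop", "h", "Read", () => new object());

Console.WriteLine(ReferenceEquals(first, second)); // same triple, same instance
Console.WriteLine(cache.Invalidate("drv-drop"));   // entries removed for that driver
Console.WriteLine(cache.Count);                    // entries left for other drivers

public sealed class PipelineCacheSketch<TPipeline>
{
    private readonly ConcurrentDictionary<(string DriverId, string Host, string Capability), TPipeline> _cache = new();

    public int Count => _cache.Count;

    // One entry per triple, so a breaker tripped for one host never affects a sibling host.
    public TPipeline GetOrCreate(string driverId, string host, string capability, Func<TPipeline> factory) =>
        _cache.GetOrAdd((driverId, host, capability), _ => factory());

    // Drops every entry belonging to one driver instance and reports how many were removed.
    public int Invalidate(string driverId)
    {
        var removed = 0;
        foreach (var key in _cache.Keys)
        {
            if (key.DriverId == driverId && _cache.TryRemove(key, out _))
                removed++;
        }
        return removed;
    }
}
```

Including the host in the key is the design choice `DeadHost_DoesNotOpenBreaker_ForSiblingHost` protects: failure state accumulates per entry, so it cannot leak between hosts.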
@@ -0,0 +1,110 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Resilience;

[Trait("Category", "Unit")]
public sealed class DriverResilienceStatusTrackerTests
{
    private static readonly DateTime Now = new(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);

    [Fact]
    public void TryGet_Returns_Null_Before_AnyWrite()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.TryGet("drv", "host").ShouldBeNull();
    }

    [Fact]
    public void RecordFailure_Accumulates_ConsecutiveFailures()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.RecordFailure("drv", "host", Now);
        tracker.RecordFailure("drv", "host", Now.AddSeconds(1));
        tracker.RecordFailure("drv", "host", Now.AddSeconds(2));

        tracker.TryGet("drv", "host")!.ConsecutiveFailures.ShouldBe(3);
    }

    [Fact]
    public void RecordSuccess_Resets_ConsecutiveFailures()
    {
        var tracker = new DriverResilienceStatusTracker();
        tracker.RecordFailure("drv", "host", Now);
        tracker.RecordFailure("drv", "host", Now.AddSeconds(1));

        tracker.RecordSuccess("drv", "host", Now.AddSeconds(2));

        tracker.TryGet("drv", "host")!.ConsecutiveFailures.ShouldBe(0);
    }

    [Fact]
    public void RecordBreakerOpen_Populates_LastBreakerOpenUtc()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.RecordBreakerOpen("drv", "host", Now);

        tracker.TryGet("drv", "host")!.LastBreakerOpenUtc.ShouldBe(Now);
    }

    [Fact]
    public void RecordRecycle_Populates_LastRecycleUtc()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.RecordRecycle("drv", "host", Now);

        tracker.TryGet("drv", "host")!.LastRecycleUtc.ShouldBe(Now);
    }

    [Fact]
    public void RecordFootprint_CapturesBaselineAndCurrent()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.RecordFootprint("drv", "host", baselineBytes: 100_000_000, currentBytes: 150_000_000, Now);

        var snap = tracker.TryGet("drv", "host")!;
        snap.BaselineFootprintBytes.ShouldBe(100_000_000);
        snap.CurrentFootprintBytes.ShouldBe(150_000_000);
    }

    [Fact]
    public void DifferentHosts_AreIndependent()
    {
        var tracker = new DriverResilienceStatusTracker();

        tracker.RecordFailure("drv", "host-a", Now);
        tracker.RecordFailure("drv", "host-b", Now);
        tracker.RecordSuccess("drv", "host-a", Now.AddSeconds(1));

        tracker.TryGet("drv", "host-a")!.ConsecutiveFailures.ShouldBe(0);
        tracker.TryGet("drv", "host-b")!.ConsecutiveFailures.ShouldBe(1);
    }

    [Fact]
    public void Snapshot_ReturnsAll_TrackedPairs()
    {
        var tracker = new DriverResilienceStatusTracker();
        tracker.RecordFailure("drv-1", "host-a", Now);
        tracker.RecordFailure("drv-1", "host-b", Now);
        tracker.RecordFailure("drv-2", "host-a", Now);

        var snapshot = tracker.Snapshot();

        snapshot.Count.ShouldBe(3);
    }

    [Fact]
    public void ConcurrentWrites_DoNotLose_Failures()
    {
        var tracker = new DriverResilienceStatusTracker();
        Parallel.For(0, 500, _ => tracker.RecordFailure("drv", "host", Now));

        tracker.TryGet("drv", "host")!.ConsecutiveFailures.ShouldBe(500);
    }
}
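Reviewer sketch: `ConcurrentWrites_DoNotLose_Failures` requires that 500 parallel increments all land. One way to get that guarantee without an explicit lock is `ConcurrentDictionary.AddOrUpdate`, which retries its update delegate until a compare-and-swap succeeds. The block below shows only the failure counter; `StatusTrackerSketch` is a hypothetical reduction, and the real tracker also records timestamps and memory footprint.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var tracker = new StatusTrackerSketch();
Parallel.For(0, 500, _ => tracker.RecordFailure("drv", "host"));
Console.WriteLine(tracker.ConsecutiveFailures("drv", "host"));

tracker.RecordSuccess("drv", "host");
Console.WriteLine(tracker.ConsecutiveFailures("drv", "host"));

public sealed class StatusTrackerSketch
{
    private readonly ConcurrentDictionary<(string DriverId, string Host), int> _failures = new();

    // AddOrUpdate may invoke the update delegate more than once under contention,
    // but each committed update increments the stored value exactly once.
    public void RecordFailure(string driverId, string host) =>
        _failures.AddOrUpdate((driverId, host), 1, (_, current) => current + 1);

    // A success resets the streak for that (driver, host) pair only.
    public void RecordSuccess(string driverId, string host) =>
        _failures[(driverId, host)] = 0;

    // Null means the pair has never been recorded.
    public int? ConsecutiveFailures(string driverId, string host) =>
        _failures.TryGetValue((driverId, host), out var count) ? count : null;
}
```

Because the update delegate can run more than once, it has to stay free of side effects; a richer snapshot type would need the same discipline.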
@@ -0,0 +1,160 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Resilience;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Resilience;

/// <summary>
/// Integration tests for the Phase 6.1 Stream A.5 contract: wrapping a flaky
/// <see cref="IReadable"/> / <see cref="IWritable"/> through the <see cref="CapabilityInvoker"/>.
/// Exercises the three scenarios the plan enumerates: transient read succeeds after N
/// retries; non-idempotent write fails after one attempt; idempotent write retries through.
/// </summary>
[Trait("Category", "Integration")]
public sealed class FlakeyDriverIntegrationTests
{
    [Fact]
    public async Task Read_SurfacesSuccess_AfterTransientFailures()
    {
        var flaky = new FlakeyDriver(failReadsBeforeIndex: 5);
        var options = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                // TimeoutSeconds=30 gives slack for the 5 exponential-backoff retries this test
                // needs under parallel-test-execution CPU pressure. With RetryCount=10 and the
                // default Delay=100ms exponential backoff, a 2-second budget can otherwise be
                // exceeded intermittently.
                [DriverCapability.Read] = new(TimeoutSeconds: 30, RetryCount: 10, BreakerFailureThreshold: 50),
            },
        };
        var invoker = new CapabilityInvoker(new DriverResiliencePipelineBuilder(), "drv-test", () => options);

        var result = await invoker.ExecuteAsync(
            DriverCapability.Read,
            "host-1",
            async ct => await flaky.ReadAsync(["tag-a"], ct),
            CancellationToken.None);

        flaky.ReadAttempts.ShouldBe(6);
        result[0].StatusCode.ShouldBe(0u);
    }

    [Fact]
    public async Task Write_NonIdempotent_FailsOnFirstFailure_NoReplay()
    {
        var flaky = new FlakeyDriver(failWritesBeforeIndex: 3);
        var optionsWithAggressiveRetry = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 5, BreakerFailureThreshold: 50),
            },
        };
        var invoker = new CapabilityInvoker(new DriverResiliencePipelineBuilder(), "drv-test", () => optionsWithAggressiveRetry);

        await Should.ThrowAsync<InvalidOperationException>(async () =>
            await invoker.ExecuteWriteAsync(
                "host-1",
                isIdempotent: false,
                async ct => await flaky.WriteAsync([new WriteRequest("pulse-coil", true)], ct),
                CancellationToken.None));

        flaky.WriteAttempts.ShouldBe(1, "non-idempotent write must never replay (decision #44)");
    }

    [Fact]
    public async Task Write_Idempotent_RetriesUntilSuccess()
    {
        var flaky = new FlakeyDriver(failWritesBeforeIndex: 2);
        var optionsWithRetry = new DriverResilienceOptions
        {
            Tier = DriverTier.A,
            CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
            {
                [DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 5, BreakerFailureThreshold: 50),
            },
        };
        var invoker = new CapabilityInvoker(new DriverResiliencePipelineBuilder(), "drv-test", () => optionsWithRetry);

        var results = await invoker.ExecuteWriteAsync(
            "host-1",
            isIdempotent: true,
            async ct => await flaky.WriteAsync([new WriteRequest("set-point", 42.0f)], ct),
            CancellationToken.None);

        flaky.WriteAttempts.ShouldBe(3);
        results[0].StatusCode.ShouldBe(0u);
    }

    [Fact]
    public async Task MultipleHosts_OnOneDriver_HaveIndependentFailureCounts()
    {
        var flaky = new FlakeyDriver(failReadsBeforeIndex: 0);
        var options = new DriverResilienceOptions { Tier = DriverTier.A };
        var builder = new DriverResiliencePipelineBuilder();
        var invoker = new CapabilityInvoker(builder, "drv-test", () => options);

        // host-dead: force many failures to exhaust retries + trip breaker
        var threshold = options.Resolve(DriverCapability.Read).BreakerFailureThreshold;
        for (var i = 0; i < threshold + 5; i++)
        {
            await Should.ThrowAsync<Exception>(async () =>
                await invoker.ExecuteAsync(DriverCapability.Read, "host-dead",
                    _ => throw new InvalidOperationException("dead"),
                    CancellationToken.None));
        }

        // host-live: succeeds on first call, unaffected by the dead-host breaker
        var liveAttempts = 0;
        await invoker.ExecuteAsync(DriverCapability.Read, "host-live",
            _ => { liveAttempts++; return ValueTask.FromResult("ok"); },
            CancellationToken.None);

        liveAttempts.ShouldBe(1);
    }

    private sealed class FlakeyDriver : IReadable, IWritable
    {
        private readonly int _failReadsBeforeIndex;
        private readonly int _failWritesBeforeIndex;

        public int ReadAttempts { get; private set; }
        public int WriteAttempts { get; private set; }

        public FlakeyDriver(int failReadsBeforeIndex = 0, int failWritesBeforeIndex = 0)
        {
            _failReadsBeforeIndex = failReadsBeforeIndex;
            _failWritesBeforeIndex = failWritesBeforeIndex;
        }

        public Task<IReadOnlyList<DataValueSnapshot>> ReadAsync(
            IReadOnlyList<string> fullReferences,
            CancellationToken cancellationToken)
        {
            var attempt = ++ReadAttempts;
            if (attempt <= _failReadsBeforeIndex)
                throw new InvalidOperationException($"transient read failure #{attempt}");

            var now = DateTime.UtcNow;
            IReadOnlyList<DataValueSnapshot> result = fullReferences
                .Select(_ => new DataValueSnapshot(Value: 0, StatusCode: 0u, SourceTimestampUtc: now, ServerTimestampUtc: now))
                .ToList();
            return Task.FromResult(result);
        }

        public Task<IReadOnlyList<WriteResult>> WriteAsync(
            IReadOnlyList<WriteRequest> writes,
            CancellationToken cancellationToken)
        {
            var attempt = ++WriteAttempts;
            if (attempt <= _failWritesBeforeIndex)
                throw new InvalidOperationException($"transient write failure #{attempt}");

            IReadOnlyList<WriteResult> result = writes.Select(_ => new WriteResult(0u)).ToList();
            return Task.FromResult(result);
        }
    }
}
@@ -0,0 +1,91 @@
using Microsoft.Extensions.Logging.Abstractions;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Stability;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Stability;

[Trait("Category", "Unit")]
public sealed class MemoryRecycleTests
{
    [Fact]
    public async Task TierC_HardBreach_RequestsSupervisorRecycle()
    {
        var supervisor = new FakeSupervisor();
        var recycle = new MemoryRecycle(DriverTier.C, supervisor, NullLogger<MemoryRecycle>.Instance);

        var requested = await recycle.HandleAsync(MemoryTrackingAction.HardBreach, 2_000_000_000, CancellationToken.None);

        requested.ShouldBeTrue();
        supervisor.RecycleCount.ShouldBe(1);
        supervisor.LastReason.ShouldContain("hard-breach");
    }

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    public async Task InProcessTier_HardBreach_NeverRequestsRecycle(DriverTier tier)
    {
        var supervisor = new FakeSupervisor();
        var recycle = new MemoryRecycle(tier, supervisor, NullLogger<MemoryRecycle>.Instance);

        var requested = await recycle.HandleAsync(MemoryTrackingAction.HardBreach, 2_000_000_000, CancellationToken.None);

        requested.ShouldBeFalse("Tier A/B hard-breach logs a promotion recommendation only (decisions #74, #145)");
        supervisor.RecycleCount.ShouldBe(0);
    }

    [Fact]
    public async Task TierC_WithoutSupervisor_HardBreach_NoOp()
    {
        var recycle = new MemoryRecycle(DriverTier.C, supervisor: null, NullLogger<MemoryRecycle>.Instance);

        var requested = await recycle.HandleAsync(MemoryTrackingAction.HardBreach, 2_000_000_000, CancellationToken.None);

        requested.ShouldBeFalse("no supervisor → no recycle path; action logged only");
    }

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    [InlineData(DriverTier.C)]
    public async Task SoftBreach_NeverRequestsRecycle(DriverTier tier)
    {
        var supervisor = new FakeSupervisor();
        var recycle = new MemoryRecycle(tier, supervisor, NullLogger<MemoryRecycle>.Instance);

        var requested = await recycle.HandleAsync(MemoryTrackingAction.SoftBreach, 1_000_000_000, CancellationToken.None);

        requested.ShouldBeFalse("soft-breach is surface-only at every tier");
        supervisor.RecycleCount.ShouldBe(0);
    }

    [Theory]
    [InlineData(MemoryTrackingAction.None)]
    [InlineData(MemoryTrackingAction.Warming)]
    public async Task NonBreachActions_NoOp(MemoryTrackingAction action)
    {
        var supervisor = new FakeSupervisor();
        var recycle = new MemoryRecycle(DriverTier.C, supervisor, NullLogger<MemoryRecycle>.Instance);

        var requested = await recycle.HandleAsync(action, 100_000_000, CancellationToken.None);

        requested.ShouldBeFalse();
        supervisor.RecycleCount.ShouldBe(0);
    }

    private sealed class FakeSupervisor : IDriverSupervisor
    {
        public string DriverInstanceId => "fake-tier-c";
        public int RecycleCount { get; private set; }
        public string? LastReason { get; private set; }

        public Task RecycleAsync(string reason, CancellationToken cancellationToken)
        {
            RecycleCount++;
            LastReason = reason;
            return Task.CompletedTask;
        }
    }
}
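Taken together, the MemoryRecycleTests cases above suggest one decision rule: a recycle is requested only for a hard breach, on a Tier C driver, when a supervisor is present. The following is a minimal sketch of that rule as inferred from the assertions alone. `RecycleDecisionSketch`, `Tier`, and `BreachAction` are hypothetical names used for illustration; the real `MemoryRecycle` class is not shown in this diff, and it presumably also logs and calls the supervisor, which the sketch leaves out.

```csharp
// Hypothetical sketch: these enums mirror DriverTier / MemoryTrackingAction by name only.
public enum Tier { A, B, C }
public enum BreachAction { None, Warming, SoftBreach, HardBreach }

public static class RecycleDecisionSketch
{
    // True only for the single combination the tests expect to reach the supervisor:
    // hard breach + out-of-process tier (C) + a supervisor to perform the recycle.
    public static bool ShouldRecycle(Tier tier, BreachAction action, bool hasSupervisor) =>
        action == BreachAction.HardBreach && tier == Tier.C && hasSupervisor;
}
```

Every other combination (soft breach, warm-up, Tier A/B, or a missing supervisor) returns false, which matches the `ShouldBeFalse` assertions in the tests.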
@@ -0,0 +1,119 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Stability;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Stability;

[Trait("Category", "Unit")]
public sealed class MemoryTrackingTests
{
    private static readonly DateTime T0 = new(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);

    [Fact]
    public void WarmingUp_Returns_Warming_UntilWindowElapses()
    {
        var tracker = new MemoryTracking(DriverTier.A, TimeSpan.FromMinutes(5));

        tracker.Sample(100_000_000, T0).ShouldBe(MemoryTrackingAction.Warming);
        tracker.Sample(105_000_000, T0.AddMinutes(1)).ShouldBe(MemoryTrackingAction.Warming);
        tracker.Sample(102_000_000, T0.AddMinutes(4.9)).ShouldBe(MemoryTrackingAction.Warming);

        tracker.Phase.ShouldBe(TrackingPhase.WarmingUp);
        tracker.BaselineBytes.ShouldBe(0);
    }

    [Fact]
    public void WindowElapsed_CapturesBaselineAsMedian_AndTransitionsToSteady()
    {
        var tracker = new MemoryTracking(DriverTier.A, TimeSpan.FromMinutes(5));

        tracker.Sample(100_000_000, T0);
        tracker.Sample(200_000_000, T0.AddMinutes(1));
        tracker.Sample(150_000_000, T0.AddMinutes(2));
        var first = tracker.Sample(150_000_000, T0.AddMinutes(5));

        tracker.Phase.ShouldBe(TrackingPhase.Steady);
        tracker.BaselineBytes.ShouldBe(150_000_000L, "median of 4 samples [100, 200, 150, 150] = (150+150)/2 = 150");
        first.ShouldBe(MemoryTrackingAction.None, "150 MB is the baseline itself, well under soft threshold");
    }

    [Theory]
    [InlineData(DriverTier.A, 3, 50)]
    [InlineData(DriverTier.B, 3, 100)]
    [InlineData(DriverTier.C, 2, 500)]
    public void GetTierConstants_MatchesDecision146(DriverTier tier, int expectedMultiplier, long expectedFloorMB)
    {
        var (multiplier, floor) = MemoryTracking.GetTierConstants(tier);
        multiplier.ShouldBe(expectedMultiplier);
        floor.ShouldBe(expectedFloorMB * 1024 * 1024);
    }

    [Fact]
    public void SoftThreshold_UsesMax_OfMultiplierAndFloor_SmallBaseline()
    {
        // Tier A: mult=3, floor=50 MB. Baseline 10 MB → 3×10=30 MB < 10+50=60 MB → floor wins.
        var tracker = WarmupWithBaseline(DriverTier.A, 10L * 1024 * 1024);
        tracker.SoftThresholdBytes.ShouldBe(60L * 1024 * 1024);
    }

    [Fact]
    public void SoftThreshold_UsesMax_OfMultiplierAndFloor_LargeBaseline()
    {
        // Tier A: mult=3, floor=50 MB. Baseline 200 MB → 3×200=600 MB > 200+50=250 MB → multiplier wins.
        var tracker = WarmupWithBaseline(DriverTier.A, 200L * 1024 * 1024);
        tracker.SoftThresholdBytes.ShouldBe(600L * 1024 * 1024);
    }

    [Fact]
    public void HardThreshold_IsTwiceSoft()
    {
        var tracker = WarmupWithBaseline(DriverTier.B, 200L * 1024 * 1024);
        tracker.HardThresholdBytes.ShouldBe(tracker.SoftThresholdBytes * 2);
    }

    [Fact]
    public void Sample_Below_Soft_Returns_None()
    {
        var tracker = WarmupWithBaseline(DriverTier.A, 100L * 1024 * 1024);

        tracker.Sample(200L * 1024 * 1024, T0.AddMinutes(10)).ShouldBe(MemoryTrackingAction.None);
    }

    [Fact]
    public void Sample_AtSoft_Returns_SoftBreach()
    {
        // Tier A, baseline 200 MB → soft = 600 MB. Sample exactly at soft.
        var tracker = WarmupWithBaseline(DriverTier.A, 200L * 1024 * 1024);

        tracker.Sample(tracker.SoftThresholdBytes, T0.AddMinutes(10))
            .ShouldBe(MemoryTrackingAction.SoftBreach);
    }

    [Fact]
    public void Sample_AtHard_Returns_HardBreach()
    {
        var tracker = WarmupWithBaseline(DriverTier.A, 200L * 1024 * 1024);

        tracker.Sample(tracker.HardThresholdBytes, T0.AddMinutes(10))
            .ShouldBe(MemoryTrackingAction.HardBreach);
    }

    [Fact]
    public void Sample_AboveHard_Returns_HardBreach()
    {
        var tracker = WarmupWithBaseline(DriverTier.A, 200L * 1024 * 1024);

        tracker.Sample(tracker.HardThresholdBytes + 100_000_000, T0.AddMinutes(10))
            .ShouldBe(MemoryTrackingAction.HardBreach);
    }

    private static MemoryTracking WarmupWithBaseline(DriverTier tier, long baseline)
    {
        var tracker = new MemoryTracking(tier, TimeSpan.FromMinutes(5));
        tracker.Sample(baseline, T0);
        tracker.Sample(baseline, T0.AddMinutes(5));
        tracker.BaselineBytes.ShouldBe(baseline);
        return tracker;
    }
}
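The threshold arithmetic that the MemoryTrackingTests comments walk through (baseline as a median, soft threshold as the larger of multiplier × baseline and baseline + floor, hard threshold as twice soft) can be sketched as below. This is reconstructed from the test expectations only and is an assumption about the production code: `MemoryThresholdSketch` and its method names are hypothetical, tiers are passed as plain characters rather than `DriverTier`, and `Median` assumes at least one sample.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: names and signatures are illustrative, not the real MemoryTracking API.
public static class MemoryThresholdSketch
{
    private const long MB = 1024L * 1024L;

    // (multiplier, floor) pairs as asserted by GetTierConstants_MatchesDecision146.
    public static (int Multiplier, long FloorBytes) TierConstants(char tier) => tier switch
    {
        'A' => (3, 50 * MB),
        'B' => (3, 100 * MB),
        'C' => (2, 500 * MB),
        _ => throw new ArgumentOutOfRangeException(nameof(tier)),
    };

    // Baseline = median of the warm-up samples; an even count averages the two middle values.
    // Assumes a non-empty array.
    public static long Median(long[] samples)
    {
        var sorted = samples.OrderBy(s => s).ToArray();
        var mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Soft threshold: whichever is larger of multiplier x baseline and baseline + floor.
    public static long Soft(char tier, long baselineBytes)
    {
        var (multiplier, floor) = TierConstants(tier);
        return Math.Max(multiplier * baselineBytes, baselineBytes + floor);
    }

    // Hard threshold is twice the soft threshold (HardThreshold_IsTwiceSoft).
    public static long Hard(char tier, long baselineBytes) => 2 * Soft(tier, baselineBytes);
}
```

For a Tier A driver with a 10 MB baseline the floor term dominates (60 MB), while at 200 MB the multiplier term dominates (600 MB), which are the two cases the small-baseline and large-baseline tests pin down.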
@@ -0,0 +1,101 @@
using Microsoft.Extensions.Logging.Abstractions;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Stability;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Stability;

[Trait("Category", "Unit")]
public sealed class ScheduledRecycleSchedulerTests
{
    private static readonly DateTime T0 = new(2026, 4, 19, 0, 0, 0, DateTimeKind.Utc);
    private static readonly TimeSpan Weekly = TimeSpan.FromDays(7);

    [Theory]
    [InlineData(DriverTier.A)]
    [InlineData(DriverTier.B)]
    public void TierAOrB_Ctor_Throws(DriverTier tier)
    {
        var supervisor = new FakeSupervisor();
        Should.Throw<ArgumentException>(() => new ScheduledRecycleScheduler(
            tier, Weekly, T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance));
    }

    [Fact]
    public void ZeroOrNegativeInterval_Throws()
    {
        var supervisor = new FakeSupervisor();
        Should.Throw<ArgumentException>(() => new ScheduledRecycleScheduler(
            DriverTier.C, TimeSpan.Zero, T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance));
        Should.Throw<ArgumentException>(() => new ScheduledRecycleScheduler(
            DriverTier.C, TimeSpan.FromSeconds(-1), T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance));
    }

    [Fact]
    public async Task Tick_BeforeNextRecycle_NoOp()
    {
        var supervisor = new FakeSupervisor();
        var sch = new ScheduledRecycleScheduler(DriverTier.C, Weekly, T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance);

        var fired = await sch.TickAsync(T0 + TimeSpan.FromDays(6), CancellationToken.None);

        fired.ShouldBeFalse();
        supervisor.RecycleCount.ShouldBe(0);
    }

    [Fact]
    public async Task Tick_AtOrAfterNextRecycle_FiresOnce_AndAdvances()
    {
        var supervisor = new FakeSupervisor();
        var sch = new ScheduledRecycleScheduler(DriverTier.C, Weekly, T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance);

        var fired = await sch.TickAsync(T0 + Weekly + TimeSpan.FromMinutes(1), CancellationToken.None);

        fired.ShouldBeTrue();
        supervisor.RecycleCount.ShouldBe(1);
        sch.NextRecycleUtc.ShouldBe(T0 + Weekly + Weekly);
    }

    [Fact]
    public async Task RequestRecycleNow_Fires_Immediately_WithoutAdvancingSchedule()
    {
        var supervisor = new FakeSupervisor();
        var sch = new ScheduledRecycleScheduler(DriverTier.C, Weekly, T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance);
        var nextBefore = sch.NextRecycleUtc;

        await sch.RequestRecycleNowAsync("memory hard-breach", CancellationToken.None);

        supervisor.RecycleCount.ShouldBe(1);
        supervisor.LastReason.ShouldBe("memory hard-breach");
        sch.NextRecycleUtc.ShouldBe(nextBefore, "ad-hoc recycle doesn't shift the cron schedule");
    }

    [Fact]
    public async Task MultipleFires_AcrossTicks_AdvanceOneIntervalEach()
    {
        var supervisor = new FakeSupervisor();
        var sch = new ScheduledRecycleScheduler(DriverTier.C, TimeSpan.FromDays(1), T0, supervisor, NullLogger<ScheduledRecycleScheduler>.Instance);

        await sch.TickAsync(T0 + TimeSpan.FromDays(1) + TimeSpan.FromHours(1), CancellationToken.None);
        await sch.TickAsync(T0 + TimeSpan.FromDays(2) + TimeSpan.FromHours(1), CancellationToken.None);
        await sch.TickAsync(T0 + TimeSpan.FromDays(3) + TimeSpan.FromHours(1), CancellationToken.None);

        supervisor.RecycleCount.ShouldBe(3);
        sch.NextRecycleUtc.ShouldBe(T0 + TimeSpan.FromDays(4));
    }

    private sealed class FakeSupervisor : IDriverSupervisor
    {
        public string DriverInstanceId => "tier-c-fake";
        public int RecycleCount { get; private set; }
        public string? LastReason { get; private set; }

        public Task RecycleAsync(string reason, CancellationToken cancellationToken)
        {
            RecycleCount++;
            LastReason = reason;
            return Task.CompletedTask;
        }
    }
}
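The scheduler tests above imply a simple clock: the first recycle is due one interval after the start time, a tick at or after the due time fires once and moves the due time forward by exactly one interval (not to "now + interval"), and an ad-hoc recycle leaves the schedule untouched. Below is a minimal synchronous sketch of that behaviour under those assumptions. `RecycleClockSketch` is a hypothetical name; the real `ScheduledRecycleScheduler` additionally takes a tier, a supervisor, and a logger, and exposes async methods.

```csharp
using System;

// Hypothetical sketch of the tick logic only, inferred from the test assertions.
public sealed class RecycleClockSketch
{
    private readonly TimeSpan _interval;

    public DateTime NextRecycleUtc { get; private set; }
    public int Fired { get; private set; }

    public RecycleClockSketch(TimeSpan interval, DateTime startUtc)
    {
        // Mirrors ZeroOrNegativeInterval_Throws.
        if (interval <= TimeSpan.Zero)
            throw new ArgumentException("Interval must be positive.", nameof(interval));
        _interval = interval;
        NextRecycleUtc = startUtc + interval;
    }

    // Fires at most once per call and advances the schedule by exactly one interval.
    public bool Tick(DateTime utcNow)
    {
        if (utcNow < NextRecycleUtc) return false;
        Fired++;
        NextRecycleUtc += _interval;
        return true;
    }

    // Ad-hoc recycle: counts as a fire but leaves NextRecycleUtc untouched.
    public void RecycleNow() => Fired++;
}
```

Advancing from the previous due time rather than from the tick time is what keeps the schedule from drifting: a tick that arrives one minute late still leaves the next recycle exactly two intervals after the start.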
@@ -0,0 +1,112 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Stability;

namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Stability;

[Trait("Category", "Unit")]
public sealed class WedgeDetectorTests
{
    private static readonly DateTime Now = new(2026, 4, 19, 12, 0, 0, DateTimeKind.Utc);
    private static readonly TimeSpan Threshold = TimeSpan.FromSeconds(120);

    [Fact]
    public void SubSixtySecondThreshold_ClampsToSixty()
    {
        var detector = new WedgeDetector(TimeSpan.FromSeconds(10));
        detector.Threshold.ShouldBe(TimeSpan.FromSeconds(60));
    }

    [Fact]
    public void Unhealthy_Driver_AlwaysNotApplicable()
    {
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 5, ActiveMonitoredItems: 10, QueuedHistoryReads: 0, LastProgressUtc: Now.AddMinutes(-10));

        detector.Classify(DriverState.Faulted, demand, Now).ShouldBe(WedgeVerdict.NotApplicable);
        detector.Classify(DriverState.Degraded, demand, Now).ShouldBe(WedgeVerdict.NotApplicable);
        detector.Classify(DriverState.Initializing, demand, Now).ShouldBe(WedgeVerdict.NotApplicable);
    }

    [Fact]
    public void Idle_Subscription_Only_StaysIdle()
    {
        // Idle driver: bulkhead 0, monitored items 0, no history reads queued.
        // Even if LastProgressUtc is ancient, the verdict is Idle, not Faulted.
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(0, 0, 0, Now.AddHours(-12));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Idle);
    }

    [Fact]
    public void PendingWork_WithRecentProgress_StaysHealthy()
    {
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 2, ActiveMonitoredItems: 0, QueuedHistoryReads: 0, LastProgressUtc: Now.AddSeconds(-30));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Healthy);
    }

    [Fact]
    public void PendingWork_WithStaleProgress_IsFaulted()
    {
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 2, ActiveMonitoredItems: 0, QueuedHistoryReads: 0, LastProgressUtc: Now.AddMinutes(-5));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Faulted);
    }

    [Fact]
    public void MonitoredItems_Active_ButNoRecentPublish_IsFaulted()
    {
        // Subscription-only driver with live MonitoredItems but no publish progress within threshold
        // is a real wedge — this is the case the previous "no successful Read" formulation used
        // to miss (no reads ever happen).
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 0, ActiveMonitoredItems: 5, QueuedHistoryReads: 0, LastProgressUtc: Now.AddMinutes(-10));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Faulted);
    }

    [Fact]
    public void MonitoredItems_Active_WithFreshPublish_StaysHealthy()
    {
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 0, ActiveMonitoredItems: 5, QueuedHistoryReads: 0, LastProgressUtc: Now.AddSeconds(-10));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Healthy);
    }

    [Fact]
    public void HistoryBackfill_SlowButMakingProgress_StaysHealthy()
    {
        // Slow historian backfill — QueuedHistoryReads > 0 but progress advances within threshold.
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(BulkheadDepth: 0, ActiveMonitoredItems: 0, QueuedHistoryReads: 50, LastProgressUtc: Now.AddSeconds(-60));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Healthy);
    }

    [Fact]
    public void WriteOnlyBurst_StaysIdle_WhenBulkheadEmpty()
    {
        // A write-only driver that just finished a burst: bulkhead drained, no subscriptions, no
        // history reads. Idle — the previous formulation would have faulted here because no
        // reads were succeeding even though the driver is perfectly healthy.
        var detector = new WedgeDetector(Threshold);
        var demand = new DemandSignal(0, 0, 0, Now.AddMinutes(-30));

        detector.Classify(DriverState.Healthy, demand, Now).ShouldBe(WedgeVerdict.Idle);
    }

    [Fact]
    public void DemandSignal_HasPendingWork_TrueForAnyNonZeroCounter()
    {
        new DemandSignal(1, 0, 0, Now).HasPendingWork.ShouldBeTrue();
        new DemandSignal(0, 1, 0, Now).HasPendingWork.ShouldBeTrue();
        new DemandSignal(0, 0, 1, Now).HasPendingWork.ShouldBeTrue();
        new DemandSignal(0, 0, 0, Now).HasPendingWork.ShouldBeFalse();
    }
}
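The WedgeDetectorTests comments describe a demand-aware classification: a non-healthy driver is out of scope, a driver with no pending work is idle however old its last progress is, and only a driver with pending work and stale progress counts as wedged. A minimal sketch of that ordering follows. `Verdict`, `Demand`, and `WedgeSketch` are hypothetical stand-ins for `WedgeVerdict`, `DemandSignal`, and `WedgeDetector`; the driver state is reduced to a boolean, and the behaviour exactly at the threshold is an assumption because the tests do not cover that boundary.

```csharp
using System;

// Hypothetical mirror of the types the tests use; the implementation is assumed.
public enum Verdict { NotApplicable, Idle, Healthy, Faulted }

public readonly record struct Demand(
    int BulkheadDepth, int ActiveMonitoredItems, int QueuedHistoryReads, DateTime LastProgressUtc)
{
    // Any non-zero counter means the driver has something it should be making progress on.
    public bool HasPendingWork =>
        BulkheadDepth > 0 || ActiveMonitoredItems > 0 || QueuedHistoryReads > 0;
}

public static class WedgeSketch
{
    private static readonly TimeSpan MinThreshold = TimeSpan.FromSeconds(60);

    // Thresholds below 60 s are raised to 60 s (SubSixtySecondThreshold_ClampsToSixty).
    public static TimeSpan Clamp(TimeSpan requested) =>
        requested < MinThreshold ? MinThreshold : requested;

    public static Verdict Classify(bool driverHealthy, Demand demand, DateTime utcNow, TimeSpan threshold)
    {
        if (!driverHealthy) return Verdict.NotApplicable;
        if (!demand.HasPendingWork) return Verdict.Idle;
        // Assumption: strictly greater than the threshold counts as stalled.
        return utcNow - demand.LastProgressUtc > Clamp(threshold) ? Verdict.Faulted : Verdict.Healthy;
    }
}
```

Checking for pending work before looking at the progress timestamp is what lets write-only and subscription-only drivers sit idle without being faulted.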
@@ -220,6 +220,23 @@ public sealed class ModbusDriverTests
         builder.Variables.ShouldContain(v => v.BrowseName == "Run" && v.Info.DriverDataType == DriverDataType.Boolean);
     }
 
+    [Fact]
+    public async Task Discover_propagates_WriteIdempotent_from_tag_to_attribute_info()
+    {
+        var (drv, _) = NewDriver(
+            new ModbusTagDefinition("SetPoint", ModbusRegion.HoldingRegisters, 0, ModbusDataType.Float32, WriteIdempotent: true),
+            new ModbusTagDefinition("PulseCoil", ModbusRegion.Coils, 0, ModbusDataType.Bool));
+        await drv.InitializeAsync("{}", CancellationToken.None);
+
+        var builder = new RecordingBuilder();
+        await drv.DiscoverAsync(builder, CancellationToken.None);
+
+        var setPoint = builder.Variables.Single(v => v.BrowseName == "SetPoint");
+        var pulse = builder.Variables.Single(v => v.BrowseName == "PulseCoil");
+        setPoint.Info.WriteIdempotent.ShouldBeTrue();
+        pulse.Info.WriteIdempotent.ShouldBeFalse("default is opt-in per decision #44");
+    }
+
     // --- helpers ---
 
     private sealed class RecordingBuilder : IAddressSpaceBuilder
@@ -65,6 +65,27 @@ public sealed class S7DiscoveryAndSubscribeTests
         builder.Variables[2].Attr.DriverDataType.ShouldBe(DriverDataType.Float32);
     }
 
+    [Fact]
+    public async Task DiscoverAsync_propagates_WriteIdempotent_from_tag_to_attribute_info()
+    {
+        var opts = new S7DriverOptions
+        {
+            Host = "192.0.2.1",
+            Tags =
+            [
+                new("SetPoint", "DB1.DBW0", S7DataType.Int16, WriteIdempotent: true),
+                new("StartBit", "M0.0", S7DataType.Bool),
+            ],
+        };
+        using var drv = new S7Driver(opts, "s7-idem");
+
+        var builder = new RecordingAddressSpaceBuilder();
+        await drv.DiscoverAsync(builder, TestContext.Current.CancellationToken);
+
+        builder.Variables.Single(v => v.Name == "SetPoint").Attr.WriteIdempotent.ShouldBeTrue();
+        builder.Variables.Single(v => v.Name == "StartBit").Attr.WriteIdempotent.ShouldBeFalse("default is opt-in per decision #44");
+    }
+
     [Fact]
     public void GetHostStatuses_returns_one_row_with_host_port_identity_pre_init()
     {
136
tests/ZB.MOM.WW.OtOpcUa.Server.Tests/AuthorizationGateTests.cs
Normal file
@@ -0,0 +1,136 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
using ZB.MOM.WW.OtOpcUa.Server.Security;

namespace ZB.MOM.WW.OtOpcUa.Server.Tests;

[Trait("Category", "Unit")]
public sealed class AuthorizationGateTests
{
    private static NodeScope Scope(string cluster = "c1", string? tag = "tag1") => new()
    {
        ClusterId = cluster,
        NamespaceId = "ns",
        UnsAreaId = "area",
        UnsLineId = "line",
        EquipmentId = "eq",
        TagId = tag,
        Kind = NodeHierarchyKind.Equipment,
    };

    private static NodeAcl Row(string group, NodePermissions flags) => new()
    {
        NodeAclRowId = Guid.NewGuid(),
        NodeAclId = Guid.NewGuid().ToString(),
        GenerationId = 1,
        ClusterId = "c1",
        LdapGroup = group,
        ScopeKind = NodeAclScopeKind.Cluster,
        ScopeId = null,
        PermissionFlags = flags,
    };

    private static AuthorizationGate MakeGate(bool strict, NodeAcl[] rows)
    {
        var cache = new PermissionTrieCache();
        cache.Install(PermissionTrieBuilder.Build("c1", 1, rows));
        var evaluator = new TriePermissionEvaluator(cache);
        return new AuthorizationGate(evaluator, strictMode: strict);
    }

    private sealed class FakeIdentity : UserIdentity, ILdapGroupsBearer
    {
        public FakeIdentity(string name, IReadOnlyList<string> groups)
        {
            DisplayName = name;
            LdapGroups = groups;
        }
        public new string DisplayName { get; }
        public IReadOnlyList<string> LdapGroups { get; }
    }

    [Fact]
    public void NullIdentity_StrictMode_Denies()
    {
        var gate = MakeGate(strict: true, rows: []);
        gate.IsAllowed(null, OpcUaOperation.Read, Scope()).ShouldBeFalse();
    }

    [Fact]
    public void NullIdentity_LaxMode_Allows()
    {
        var gate = MakeGate(strict: false, rows: []);
        gate.IsAllowed(null, OpcUaOperation.Read, Scope()).ShouldBeTrue();
    }

    [Fact]
    public void IdentityWithoutLdapGroups_StrictMode_Denies()
    {
        var gate = MakeGate(strict: true, rows: []);
        var identity = new UserIdentity(); // anonymous, no LDAP groups

        gate.IsAllowed(identity, OpcUaOperation.Read, Scope()).ShouldBeFalse();
    }

    [Fact]
    public void IdentityWithoutLdapGroups_LaxMode_Allows()
    {
        var gate = MakeGate(strict: false, rows: []);
        var identity = new UserIdentity();

        gate.IsAllowed(identity, OpcUaOperation.Read, Scope()).ShouldBeTrue();
    }

    [Fact]
    public void LdapGroupWithGrant_Allows()
    {
        var gate = MakeGate(strict: true, rows: [Row("cn=ops", NodePermissions.Read)]);
        var identity = new FakeIdentity("ops-user", ["cn=ops"]);

        gate.IsAllowed(identity, OpcUaOperation.Read, Scope()).ShouldBeTrue();
    }

    [Fact]
    public void LdapGroupWithoutGrant_StrictMode_Denies()
    {
        var gate = MakeGate(strict: true, rows: [Row("cn=ops", NodePermissions.Read)]);
        var identity = new FakeIdentity("other-user", ["cn=other"]);

        gate.IsAllowed(identity, OpcUaOperation.Read, Scope()).ShouldBeFalse();
    }

    [Fact]
    public void WrongOperation_Denied()
    {
        var gate = MakeGate(strict: true, rows: [Row("cn=ops", NodePermissions.Read)]);
        var identity = new FakeIdentity("ops-user", ["cn=ops"]);

        gate.IsAllowed(identity, OpcUaOperation.WriteOperate, Scope()).ShouldBeFalse();
    }

    [Fact]
    public void BuildSessionState_IncludesLdapGroups()
    {
        var gate = MakeGate(strict: true, rows: []);
        var identity = new FakeIdentity("u", ["cn=a", "cn=b"]);

        var state = gate.BuildSessionState(identity, "c1");

        state.ShouldNotBeNull();
        state!.LdapGroups.Count.ShouldBe(2);
        state.ClusterId.ShouldBe("c1");
    }

    [Fact]
    public void BuildSessionState_ReturnsNull_ForIdentityWithoutLdapGroups()
    {
        var gate = MakeGate(strict: true, rows: []);

        gate.BuildSessionState(new UserIdentity(), "c1").ShouldBeNull();
    }
}
177
tests/ZB.MOM.WW.OtOpcUa.Server.Tests/HealthEndpointsHostTests.cs
Normal file
@@ -0,0 +1,177 @@
|
|||||||
|
using System.Net.Http;
|
||||||
|
using System.Text.Json;
|
||||||
|
using Microsoft.Extensions.Logging.Abstractions;
|
||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Server.Observability;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Server.Tests;
|
||||||
|
|
||||||
|
[Trait("Category", "Integration")]
|
||||||
|
public sealed class HealthEndpointsHostTests : IAsyncLifetime
|
||||||
|
{
|
||||||
|
private static int _portCounter = 48500 + Random.Shared.Next(0, 99);
|
||||||
|
private readonly int _port = Interlocked.Increment(ref _portCounter);
|
||||||
|
private string Prefix => $"http://localhost:{_port}/";
|
||||||
|
private readonly DriverHost _driverHost = new();
|
||||||
|
private HealthEndpointsHost _host = null!;
|
||||||
|
private HttpClient _client = null!;
|
||||||
|
|
||||||
|
public ValueTask InitializeAsync()
|
||||||
|
{
|
||||||
|
_client = new HttpClient { BaseAddress = new Uri(Prefix) };
|
||||||
|
return ValueTask.CompletedTask;
|
||||||
|
}
|
||||||
|
|
||||||
|
public async ValueTask DisposeAsync()
|
||||||
|
{
|
||||||
|
_client.Dispose();
|
||||||
|
if (_host is not null) await _host.DisposeAsync();
|
||||||
|
}
|
||||||
|
|
||||||
|
private HealthEndpointsHost Start(Func<bool>? configDbHealthy = null, Func<bool>? usingStaleConfig = null)
|
||||||
|
{
|
||||||
|
_host = new HealthEndpointsHost(
|
||||||
|
_driverHost,
|
||||||
|
NullLogger<HealthEndpointsHost>.Instance,
|
||||||
|
configDbHealthy,
|
||||||
|
usingStaleConfig,
|
||||||
|
prefix: Prefix);
|
||||||
|
_host.Start();
|
||||||
|
return _host;
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Healthz_ReturnsHealthy_EmptyFleet()
|
||||||
|
{
|
||||||
|
Start();
|
||||||
|
|
||||||
|
var response = await _client.GetAsync("/healthz");
|
||||||
|
|
||||||
|
response.IsSuccessStatusCode.ShouldBeTrue();
|
||||||
|
var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
|
||||||
|
body.GetProperty("status").GetString().ShouldBe("healthy");
|
||||||
|
body.GetProperty("configDbReachable").GetBoolean().ShouldBeTrue();
|
||||||
|
body.GetProperty("usingStaleConfig").GetBoolean().ShouldBeFalse();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Healthz_StaleConfig_Returns200_WithFlag()
|
||||||
|
{
|
||||||
|
Start(configDbHealthy: () => false, usingStaleConfig: () => true);
|
||||||
|
|
||||||
|
var response = await _client.GetAsync("/healthz");
|
||||||
|
|
||||||
|
response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
|
||||||
|
var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
|
||||||
|
body.GetProperty("configDbReachable").GetBoolean().ShouldBeFalse();
|
||||||
|
body.GetProperty("usingStaleConfig").GetBoolean().ShouldBeTrue();
|
||||||
|
}
|
||||||
|
|
||||||
|
    [Fact]
    public async Task Healthz_UnreachableConfig_And_NoCache_Returns503()
    {
        Start(configDbHealthy: () => false, usingStaleConfig: () => false);

        var response = await _client.GetAsync("/healthz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.ServiceUnavailable);
    }

    [Fact]
    public async Task Readyz_EmptyFleet_Is200_Healthy()
    {
        Start();

        var response = await _client.GetAsync("/readyz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
        var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
        body.GetProperty("verdict").GetString().ShouldBe("Healthy");
    }

    [Fact]
    public async Task Readyz_WithHealthyDriver_Is200()
    {
        await _driverHost.RegisterAsync(new StubDriver("drv-1", DriverState.Healthy), "{}", CancellationToken.None);
        Start();

        var response = await _client.GetAsync("/readyz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
        var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
        body.GetProperty("verdict").GetString().ShouldBe("Healthy");
        body.GetProperty("drivers").GetArrayLength().ShouldBe(1);
    }

    [Fact]
    public async Task Readyz_WithFaultedDriver_Is503()
    {
        await _driverHost.RegisterAsync(new StubDriver("dead", DriverState.Faulted), "{}", CancellationToken.None);
        await _driverHost.RegisterAsync(new StubDriver("alive", DriverState.Healthy), "{}", CancellationToken.None);
        Start();

        var response = await _client.GetAsync("/readyz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.ServiceUnavailable);
        var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
        body.GetProperty("verdict").GetString().ShouldBe("Faulted");
    }

    [Fact]
    public async Task Readyz_WithDegradedDriver_Is200_WithDegradedList()
    {
        await _driverHost.RegisterAsync(new StubDriver("drv-ok", DriverState.Healthy), "{}", CancellationToken.None);
        await _driverHost.RegisterAsync(new StubDriver("drv-deg", DriverState.Degraded), "{}", CancellationToken.None);
        Start();

        var response = await _client.GetAsync("/readyz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
        var body = JsonDocument.Parse(await response.Content.ReadAsStringAsync()).RootElement;
        body.GetProperty("verdict").GetString().ShouldBe("Degraded");
        body.GetProperty("degradedDrivers").GetArrayLength().ShouldBe(1);
        body.GetProperty("degradedDrivers")[0].GetString().ShouldBe("drv-deg");
    }

    [Fact]
    public async Task Readyz_WithInitializingDriver_Is503()
    {
        await _driverHost.RegisterAsync(new StubDriver("init", DriverState.Initializing), "{}", CancellationToken.None);
        Start();

        var response = await _client.GetAsync("/readyz");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.ServiceUnavailable);
    }

    [Fact]
    public async Task Unknown_Path_Returns404()
    {
        Start();

        var response = await _client.GetAsync("/foo");

        response.StatusCode.ShouldBe(System.Net.HttpStatusCode.NotFound);
    }

    private sealed class StubDriver : IDriver
    {
        private readonly DriverState _state;
        public StubDriver(string id, DriverState state)
        {
            DriverInstanceId = id;
            _state = state;
        }
        public string DriverInstanceId { get; }
        public string DriverType => "Stub";
        public Task InitializeAsync(string _, CancellationToken ct) => Task.CompletedTask;
        public Task ReinitializeAsync(string _, CancellationToken ct) => Task.CompletedTask;
        public Task ShutdownAsync(CancellationToken ct) => Task.CompletedTask;
        public DriverHealth GetHealth() => new(_state, null, null);
        public long GetMemoryFootprint() => 0;
        public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
    }
}
@@ -46,7 +46,7 @@ public sealed class HistoryReadIntegrationTests : IAsyncLifetime
             ApplicationName = "OtOpcUaHistoryTest",
             ApplicationUri = "urn:OtOpcUa:Server:HistoryTest",
             PkiStoreRoot = _pkiRoot,
-            AutoAcceptUntrustedClientCertificates = true,
+            AutoAcceptUntrustedClientCertificates = true, HealthEndpointsEnabled = false,
         };
 
         _server = new OpcUaApplicationHost(options, _driverHost, new DenyAllUserAuthenticator(),
@@ -49,7 +49,7 @@ public sealed class MultipleDriverInstancesIntegrationTests : IAsyncLifetime
             ApplicationName = "OtOpcUaMultiDriverTest",
             ApplicationUri = "urn:OtOpcUa:Server:MultiDriverTest",
             PkiStoreRoot = _pkiRoot,
-            AutoAcceptUntrustedClientCertificates = true,
+            AutoAcceptUntrustedClientCertificates = true, HealthEndpointsEnabled = false,
         };
 
         _server = new OpcUaApplicationHost(options, _driverHost, new DenyAllUserAuthenticator(),
@@ -36,7 +36,7 @@ public sealed class OpcUaServerIntegrationTests : IAsyncLifetime
             ApplicationName = "OtOpcUaTest",
             ApplicationUri = "urn:OtOpcUa:Server:Test",
             PkiStoreRoot = _pkiRoot,
-            AutoAcceptUntrustedClientCertificates = true,
+            AutoAcceptUntrustedClientCertificates = true, HealthEndpointsEnabled = false,
         };
 
         _server = new OpcUaApplicationHost(options, _driverHost, new DenyAllUserAuthenticator(),