Compare commits
14 Commits
phase-3-pr
...
phase-6-pl
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
4695a5c88e | ||
| 0109fab4bf | |||
|
|
c9e856178a | ||
| 63eb569fd6 | |||
|
|
fad04bbdf7 | ||
| 17f901bb65 | |||
|
|
ba3a5598e1 | ||
| 8cd932e7c9 | |||
|
|
28328def5d | ||
| d3bf544abc | |||
|
|
24435712c4 | ||
| 3f7b4d05e6 | |||
|
|
a79c5f3008 | ||
| a5299a2fee |
137
docs/v2/implementation/phase-6-1-resilience-and-observability.md
Normal file
137
docs/v2/implementation/phase-6-1-resilience-and-observability.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# Phase 6.1 — Resilience & Observability Runtime
|
||||
|
||||
> **Status**: DRAFT — implementation plan for a cross-cutting phase that was never formalised. The v2 `plan.md` specifies Polly, Tier A/B/C protections, structured logging, and local-cache fallback by decision; none are wired end-to-end.
|
||||
>
|
||||
> **Branch**: `v2/phase-6-1-resilience-observability`
|
||||
> **Estimated duration**: 3 weeks
|
||||
> **Predecessor**: Phase 5 (drivers) — partial; S7 + OPC UA Client shipped, AB/TwinCAT/FOCAS paused
|
||||
> **Successor**: Phase 6.2 (Authorization runtime)
|
||||
|
||||
## Phase Objective
|
||||
|
||||
Land the cross-cutting runtime protections + operability features that `plan.md` + `driver-stability.md` specify by decision but that no driver-phase actually wires. End-state: every driver goes through the same Polly resilience layer, health endpoints render the live driver fleet, structured logs carry per-request correlation IDs, and the config substrate survives a central DB outage via a LiteDB local cache.
|
||||
|
||||
Closes these gaps flagged in the 2026-04-19 audit:
|
||||
|
||||
1. Polly v8 resilience pipelines wired to every `IDriver` capability (no-op per-driver today; Galaxy has a hand-rolled `CircuitBreaker` only).
|
||||
2. Tier A/B/C enforcement at runtime — `driver-stability.md` §2–4 and decisions #63–73 define memory watchdog, bounded queues, scheduled recycle, wedge detection; `MemoryWatchdog` exists only inside `Driver.Galaxy.Host`.
|
||||
3. Health endpoints (`/healthz`, `/readyz`) on `OtOpcUa.Server`.
|
||||
4. Structured Serilog with per-request correlation IDs (driver instance, OPC UA session, IPC call).
|
||||
5. LiteDB local cache + Polly retry + fallback on central-DB outage (decision #36).
|
||||
|
||||
## Scope — What Changes
|
||||
|
||||
| Concern | Change |
|
||||
|---------|--------|
|
||||
| `Core` → new `Core.Resilience` sub-namespace | Shared Polly pipeline builder (`DriverResiliencePipelines`), per-capability policy (Read / Write / Subscribe / HistoryRead / Discover / Probe / Alarm). One pipeline per driver instance; driver-options decide tuning. |
|
||||
| Every `IDriver*` consumer in the server | Wrap capability calls in the shared pipeline. Policy composition order: timeout → retry (with jitter, bounded by capability-specific `MaxRetries`) → circuit breaker (per driver instance, opens on N consecutive failures) → bulkhead (ceiling on in-flight requests per driver). |
|
||||
| `Core` → new `Core.Stability` sub-namespace | Generalise `MemoryWatchdog` (`Driver.Galaxy.Host`) into `DriverMemoryWatchdog` consuming `IDriver.GetMemoryFootprint()`. Add `ScheduledRecycleScheduler` (decision #67) for weekly/time-of-day recycle. Add `WedgeDetector` that flips a driver to Faulted when no successful Read in N × PublishingInterval. |
|
||||
| `DriverTypeRegistry` | Each driver type registers its `DriverTier` {A, B, C}. Tier C drivers must also advertise their out-of-process topology; the registry enforces invariants (Tier C has a `Proxy` + `Host` pair). |
|
||||
| `OtOpcUa.Server` → new Minimal API endpoints | `/healthz` (liveness — process alive + config DB reachable or LiteDB cache warm), `/readyz` (readiness — every driver instance reports `DriverState.Healthy`). JSON bodies cite individual driver health per instance. |
|
||||
| Serilog configuration | Centralize enrichers in `OtOpcUa.Server/Observability/LogContextEnricher.cs`. Every driver call runs inside a `LogContext.PushProperty` scope with {DriverInstanceId, DriverType, CapabilityName, CorrelationId (UA RequestHandle or internal GUID)}. Sink config stays rolling-file per CLAUDE.md; JSON-formatted output added alongside plain-text so SIEM ingestion works. |
|
||||
| `Configuration` project | Add `LiteDbConfigCache` adapter. Wrap EF Core queries in a Polly pipeline: timeout (2 s) → retry (3×, jittered) → fallback-to-cache. Cache refresh on successful DB query + after `sp_PublishGeneration`. Cache lives at `%ProgramData%/OtOpcUa/config-cache/<cluster-id>.db` per node. |
|
||||
| `DriverHostStatus` entity | Extend to carry `LastCircuitBreakerOpenUtc`, `ConsecutiveFailures`, `CurrentBulkheadDepth`, `LastRecycleUtc`. Admin `/hosts` page reads these. |
|
||||
|
||||
## Scope — What Does NOT Change
|
||||
|
||||
| Item | Reason |
|
||||
|------|--------|
|
||||
| Driver wire protocols | Resilience is a server-side wrapper; individual drivers don't see Polly. Their existing retry logic (ModbusTcpTransport reconnect, SessionReconnectHandler) stays in place as inner layers. |
|
||||
| Config DB schema | LiteDB cache is a read-only mirror; no new central tables except `DriverHostStatus` column additions. |
|
||||
| OPC UA wire behavior visible to clients | Health endpoints live on a separate HTTP port (4841 by convention); the OPC UA server on 4840 is unaffected. |
|
||||
| The four 2026-04-13 Galaxy stability findings | Already closed in Phase 2. Phase 6.1 *generalises* the pattern, doesn't re-fix Galaxy. |
|
||||
| Driver-layer SafeHandle usage | Existing Galaxy `SafeMxAccessHandle` + Modbus `TcpClient` disposal stay — they're driver-internal, not part of the cross-cutting layer. |
|
||||
|
||||
## Entry Gate Checklist
|
||||
|
||||
- [ ] Phases 0–5 exit gates cleared (or explicitly deferred with task reference)
|
||||
- [ ] `driver-stability.md` §2–4 re-read; decisions #63–73 + #34–36 re-skimmed
|
||||
- [ ] Polly v8 NuGet available (`Microsoft.Extensions.Resilience` + `Polly.Core`) — verify package restore before task breakdown
|
||||
- [ ] LiteDB 5.x NuGet confirmed MIT + actively maintained
|
||||
- [ ] Existing drivers catalogued: Galaxy.Proxy, Modbus, S7, OpcUaClient — confirm test counts baseline so the resilience layer doesn't regress any
|
||||
- [ ] Serilog configuration inventory: locate every `Log.ForContext` call site that will need `LogContext` rewrap
|
||||
- [ ] Admin `/hosts` page's current `DriverHostStatus` consumption reviewed so the schema extensions don't break it
|
||||
|
||||
## Task Breakdown
|
||||
|
||||
### Stream A — Resilience layer (1 week)
|
||||
|
||||
1. **A.1** Add `Polly.Core` + `Microsoft.Extensions.Resilience` to `Core`. Build `DriverResiliencePipelineBuilder` that composes Timeout → Retry (exponential backoff + jitter, capability-specific max retries) → CircuitBreaker (consecutive-failure threshold; half-open probe) → Bulkhead (max in-flight per driver instance). Unit tests cover each policy in isolation + composed pipeline.
|
||||
2. **A.2** `DriverResilienceOptions` record bound from `DriverInstance.ResilienceConfig` JSON column (new nullable). Defaults encoded per-tier: Tier A (OPC UA Client, S7) — 3 retries, 2 s timeout, 5-failure breaker; Tier B (Modbus) — same except 4 s timeout; Tier C (Galaxy) — 1 retry (inner supervisor handles restart), 10 s timeout, circuit-breaker trips but doesn't kill the driver (the Proxy supervisor already handles that).
|
||||
3. **A.3** `DriverCapabilityInvoker<T>` wraps every `IDriver*` method call. Existing server-side dispatch (whatever currently calls `driver.ReadAsync`) routes through the invoker. Policy injection via DI.
|
||||
4. **A.4** Remove the hand-rolled `CircuitBreaker` + `Backoff` from `Driver.Galaxy.Proxy/Supervisor/` — replaced by the shared layer. Keep `HeartbeatMonitor` (different concern: IPC liveness, not data-path resilience).
|
||||
5. **A.5** Unit tests: per-policy, per-composition. Integration test: Modbus driver under a FlakeyTransport that fails 5×, succeeds on 6th; invoker surfaces the eventual success. Bench: no-op overhead < 1% under nominal load.
|
||||
|
||||
### Stream B — Tier A/B/C stability runtime (1 week, can parallel with Stream A after A.1)
|
||||
|
||||
1. **B.1** `Core.Abstractions` → `DriverTier` enum {A, B, C}. Extend `DriverTypeRegistry` to require `DriverTier` at registration. Existing driver types get their tier stamped (Galaxy = C, Modbus = B, S7 = B, OpcUaClient = A).
|
||||
2. **B.2** Generalise `DriverMemoryWatchdog` (lift from `Driver.Galaxy.Host/MemoryWatchdog.cs`). Tier-specific thresholds: A = 256 MB RSS soft / 512 MB hard, B = 512 MB soft / 1 GB hard, C = 1 GB soft / 2 GB hard (decision #70 hybrid multiplier + floor). Soft threshold → log + metric; hard threshold → mark driver Faulted + trigger recycle.
|
||||
3. **B.3** `ScheduledRecycleScheduler` (decision #67): each driver instance can opt-in to a weekly recycle at a configured cron. Recycle = `ShutdownAsync` → `InitializeAsync`. Tier C drivers get the Proxy-side recycle; Tier A/B recycle in-process.
|
||||
4. **B.4** `WedgeDetector`: polling thread per driver instance; if `LastSuccessfulRead` older than `WedgeThreshold` (default 5 × PublishingInterval, minimum 60 s) AND driver state is `Healthy`, flag as wedged → force `ReinitializeAsync`. Prevents silent dead-subscriptions.
|
||||
5. **B.5** Tests: watchdog unit tests drive synthetic allocation; scheduler uses a virtual clock; wedge detector tests use a fake IClock + driver stub.
|
||||
|
||||
### Stream C — Health endpoints + structured logging (4 days)
|
||||
|
||||
1. **C.1** `OtOpcUa.Server/Observability/HealthEndpoints.cs` — Minimal API on a second Kestrel binding (default `http://+:4841`). `/healthz` reports process uptime + config-DB reachability (or cache-warm). `/readyz` enumerates `DriverInstance` rows + reports each driver's `DriverHealth.State`; returns 503 if ANY driver is Faulted. JSON body per `docs/v2/acl-design.md` §"Operator Dashboards" shape.
|
||||
2. **C.2** `LogContextEnricher` installed at Serilog config time. Every driver-capability call site wraps its body in `using (LogContext.PushProperty("DriverInstanceId", id)) using (LogContext.PushProperty("CorrelationId", correlationId))`. Correlation IDs: reuse OPC UA `RequestHeader.RequestHandle` when in-flight; otherwise generate `Guid.NewGuid().ToString("N")[..12]`.
|
||||
3. **C.3** Add JSON-formatted Serilog sink alongside the existing rolling-file plain-text sink so SIEMs (Splunk, Datadog) can ingest without a regex parser. Sink switchable via `Serilog:WriteJson` appsetting.
|
||||
4. **C.4** Integration test: boot server, issue Modbus read, assert log line contains `DriverInstanceId` + `CorrelationId` structured fields.
|
||||
|
||||
### Stream D — Config DB LiteDB fallback (1 week)
|
||||
|
||||
1. **D.1** `LiteDbConfigCache` adapter. Wraps `ConfigurationDbContext` queries that are safe to serve stale (cluster membership, generation metadata, driver instance definitions, LDAP role mapping). Write-path queries (draft save, publish) bypass the cache and fail hard on DB outage.
|
||||
2. **D.2** Cache refresh strategy: refresh on every successful read (write-through-cache), full refresh after `sp_PublishGeneration` confirmation. Cache entries carry `CachedAtUtc`; served entries older than 24 h trigger a synthetic `Warning` log line so operators see stale data in effect.
|
||||
3. **D.3** Polly pipeline in `Configuration` project: EF Core query → retry 3× → fallback to cache. On fallback, driver state stays `Healthy` but a `UsingStaleConfig` flag on the cluster's health report flips true.
|
||||
4. **D.4** Tests: in-memory SQL Server failure injected via `TestContainers`-ish double; cache returns last-known values; Admin UI banners reflect `UsingStaleConfig`.
|
||||
|
||||
### Stream E — Admin `/hosts` page refresh (3 days)
|
||||
|
||||
1. **E.1** Extend `DriverHostStatus` schema with Stream A resilience columns. Generate EF migration.
|
||||
2. **E.2** `Admin/FleetStatusHub` SignalR hub pushes `LastCircuitBreakerOpenUtc` + `CurrentBulkheadDepth` + `LastRecycleUtc` on change.
|
||||
3. **E.3** `/hosts` Blazor page renders new columns; red badge if `ConsecutiveFailures > breakerThreshold / 2`.
|
||||
|
||||
## Compliance Checks (run at exit gate)
|
||||
|
||||
- [ ] **Polly coverage**: every `IDriver*` method call in the server dispatch layer routes through `DriverCapabilityInvoker`. Enforce via a Roslyn analyzer added to `Core.Abstractions` build (error on direct `IDriver.ReadAsync` calls outside the invoker).
|
||||
- [ ] **Tier registry**: every driver type registered in `DriverTypeRegistry` has a non-null `Tier`. Unit test walks the registry + asserts no gaps.
|
||||
- [ ] **Health contract**: `/healthz` + `/readyz` respond within 500 ms even with one driver Faulted.
|
||||
- [ ] **Structured log**: CI grep on `tests/` output asserts at least one log line contains `"DriverInstanceId"` + `"CorrelationId"` JSON fields.
|
||||
- [ ] **Cache fallback**: Integration test kills the SQL container mid-operation; driver health stays `Healthy`, `UsingStaleConfig` flips true.
|
||||
- [ ] No regression in existing test suites — `dotnet test ZB.MOM.WW.OtOpcUa.slnx` count equal-or-greater than pre-Phase-6.1 baseline.
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|:----------:|:------:|------------|
|
||||
| Polly pipeline adds per-request latency on hot path | Medium | Medium | Benchmark Stream A.5 before merging; 1 % overhead budget; inline hot path short-circuits when retry count = 0 |
|
||||
| LiteDB cache diverges from central DB | Medium | High | Stale-data banner in Admin UI; `UsingStaleConfig` flag surfaced on `/readyz`; cache refresh on every successful DB round-trip; 24-hour synthetic warning |
|
||||
| Tier watchdog false-positive-kills a legitimate batch load | Low | High | Soft/hard threshold split; soft only logs; hard triggers recycle; thresholds configurable per-instance |
|
||||
| Wedge detector races with slow-but-healthy drivers | Medium | High | Minimum 60 s threshold; detector only activates if driver claims `Healthy`; add circuit-breaker feedback so rapid oscillation trips instead of thrashing |
|
||||
| Roslyn analyzer breaks external driver authors | Low | Medium | Release analyzer as warning-level initially; upgrade to error in Phase 6.1+1 after one release cycle |
|
||||
|
||||
## Completion Checklist
|
||||
|
||||
- [ ] Stream A: Polly shared pipeline + per-tier defaults + driver-capability invoker + tests
|
||||
- [ ] Stream B: Tier registry + generalised watchdog + scheduled recycle + wedge detector
|
||||
- [ ] Stream C: `/healthz` + `/readyz` + structured logging + JSON Serilog sink
|
||||
- [ ] Stream D: LiteDB cache + Polly fallback in Configuration
|
||||
- [ ] Stream E: Admin `/hosts` page refresh
|
||||
- [ ] Cross-cutting: `phase-6-1-compliance.ps1` exits 0; full solution `dotnet test` passes; exit-gate doc recorded
|
||||
|
||||
## Adversarial Review — 2026-04-19 (Codex, thread `019da489-e317-7aa1-ab1f-6335e0be2447`)
|
||||
|
||||
Plan substantially rewritten before implementation to address these findings. Each entry: severity · verdict · adjustment.
|
||||
|
||||
1. **Crit · ACCEPT** — Auto-retry collides with decisions #44/#45 (no auto-write-retry; opt-in via `WriteIdempotent` + CAS). Pipeline now **capability-specific**: Read/HistoryRead/Discover/Probe/Alarm-subscribe all get retries; **Write does not** unless the tag metadata carries `WriteIdempotent=true`. New `WriteIdempotentAttribute` surfaces on `ModbusTagDefinition` / `S7TagDefinition` / etc.
|
||||
2. **Crit · ACCEPT** — "One pipeline per driver instance" breaks decision #35's per-device isolation. **Change**: pipeline key is `(DriverInstanceId, HostName)` not just `DriverInstanceId`. One dead PLC behind a multi-device Modbus driver no longer opens the breaker for healthy siblings.
|
||||
3. **Crit · ACCEPT** — Memory watchdog + scheduled recycle at Tier A/B breaches decisions #73/#74 (process-kill protections are Tier-C-only). **Change**: Stream B splits into two — `MemoryTracking` (all tiers, soft/hard thresholds log + surface to Admin `/hosts`; never kills) and `MemoryRecycle` (Tier C only, requires out-of-process topology). Tier A/B overrun paths escalate to Tier C via a future PR, not auto-kill.
|
||||
4. **High · ACCEPT** — Removing Galaxy's hand-rolled `CircuitBreaker` drops decision #68 host-supervision crash-loop protection. **Change**: keep `Driver.Galaxy.Proxy/Supervisor/CircuitBreaker.cs` + `Backoff.cs` — they guard the IPC *process* re-spawn, not the per-call data path. Data-path Polly is an orthogonal layer.
|
||||
5. **High · ACCEPT** — Roslyn analyzer targeting `IDriver` misses the hot paths (`IReadable.ReadAsync`, `IWritable.WriteAsync`, `ISubscribable.SubscribeAsync` etc.). **Change**: analyzer rule now matches every method on the capability interfaces; compliance doc enumerates the full call-site list.
|
||||
6. **High · ACCEPT** — `/healthz` + `/readyz` under-specified for degraded-running. **Change**: add a state-matrix sub-section explicitly covering `Unknown` (pre-init: `/readyz` 503), `Initializing` (503), `Healthy` (200), `Degraded` (200 with JSON body flagging the degraded driver; `/readyz` is OR across drivers), `Faulted` (503), plus cached-config-serving (`/healthz` returns 200 + `UsingStaleConfig: true` in JSON body).
|
||||
7. **High · ACCEPT** — `WedgeDetector` based on "no successful Read" false-fires on write-only subscriptions + idle systems. **Change**: wedge criteria now `(hasPendingWork AND noProgressIn > threshold)` where `hasPendingWork` comes from the Polly bulkhead depth + active MonitoredItem count. Idle driver stays Healthy.
|
||||
8. **High · ACCEPT** — LiteDB cache serving mixed-generation reads breaks publish atomicity. **Change**: cache is snapshot-per-generation. Each published generation writes a sealed snapshot into `<cache-root>/<cluster>/<generationId>.db`; reads serve the last-known-sealed generation and never mix. Central DB outage during a *publish* means that publish fails (write path doesn't use cache); reads continue from the prior sealed snapshot.
|
||||
9. **Med · ACCEPT** — `DriverHostStatus` schema conflates per-host connectivity with per-driver-instance resilience counters. **Change**: new `DriverInstanceResilienceStatus` table separate from `DriverHostStatus`. Admin `/hosts` joins both for display.
|
||||
10. **Med · ACCEPT** — Compliance says analyzer-error; risks say analyzer-warning. **Change**: phase 6.1 ships at **error** level (this phase is the gate); warning-mode option removed.
|
||||
11. **Med · ACCEPT** — Hardcoded per-tier MB bands ignore decision #70's `max(multiplier × baseline, baseline + floor)` formula with observed-baseline capture. **Change**: watchdog captures baseline at post-init plateau (median of first 5 min GetMemoryFootprint samples) + applies the hybrid formula. Tier constants now encode the multiplier + floor, not raw MB.
|
||||
12. **Med · ACCEPT** — Tests mostly cover happy path. **Change**: Stream A.5 adds negative tests for duplicate-write-replay-under-timeout; Stream B.5 adds false-wedge-on-idle-subscription + false-wedge-on-slow-historic-backfill; Stream D.4 adds mixed-generation cache test + corrupt-first-boot cache test.
|
||||
|
||||
126
docs/v2/implementation/phase-6-2-authorization-runtime.md
Normal file
126
docs/v2/implementation/phase-6-2-authorization-runtime.md
Normal file
@@ -0,0 +1,126 @@
|
||||
# Phase 6.2 — Authorization Runtime (ACL + LDAP grants)
|
||||
|
||||
> **Status**: DRAFT — the v2 `plan.md` decision #129 + `acl-design.md` specify a 6-level permission-trie evaluator with `NodePermissions` bitmask grants, but no runtime evaluator exists. ACL tables are schematized but unread by the data path.
|
||||
>
|
||||
> **Branch**: `v2/phase-6-2-authorization-runtime`
|
||||
> **Estimated duration**: 2.5 weeks
|
||||
> **Predecessor**: Phase 6.1 (Resilience & Observability) — reuses the Polly pipeline for ACL-cache refresh retries
|
||||
> **Successor**: Phase 6.3 (Redundancy)
|
||||
|
||||
## Phase Objective
|
||||
|
||||
Wire ACL enforcement on every OPC UA Read / Write / Subscribe / Call path + LDAP group → admin role grants that the v2 plan specified but never ran. End-state: a user's effective permissions resolve through a per-session permission-trie over the 6-level `Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag` hierarchy, cached per session, invalidated on generation-apply + LDAP group expiry.
|
||||
|
||||
Closes these gaps:
|
||||
|
||||
1. **Data-path ACL enforcement** — `NodeAcl` table + `NodePermissions` flags shipped; `NodeAclService.cs` present as a CRUD surface; no code consults ACLs at `Read`/`Write` time. OPC UA server answers everything to everyone.
|
||||
2. **`LdapGroupRoleMapping` for cluster-scoped admin grants** — decision #105 shipped as the *design*; admin roles are hardcoded (`FleetAdmin` / `ConfigEditor` / `ReadOnly`) with no cluster-scoping and no LDAP-to-grant table. Decision #105 explicitly lifts this from v2.1 into v2.0.
|
||||
3. **Explicit Deny pathway** — deferred to v2.1 (decision #129 note). Phase 6.2 ships *grants only*; `Deny` stays out.
|
||||
4. **Admin UI ACL grant editor** — `AclsTab.razor` exists but edits the now-unused `NodeAcl` table; needs to wire to the runtime evaluator + the new `LdapGroupRoleMapping` table.
|
||||
|
||||
## Scope — What Changes
|
||||
|
||||
| Concern | Change |
|
||||
|---------|--------|
|
||||
| `Configuration` project | New entity `LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }`. Migration. Admin CRUD. |
|
||||
| `Core` → new `Core.Authorization` sub-namespace | `IPermissionEvaluator` interface; concrete `PermissionTrieEvaluator` implementation loads ACLs + LDAP mappings from Configuration, builds a trie keyed on the 6-level scope hierarchy, evaluates a `(UserClaim[], NodeId, NodePermissions)` → `bool` decision in O(depth × group-count). |
|
||||
| `Core.Authorization` cache | `PermissionTrieCache` — one trie per `(ClusterId, GenerationId)`. Rebuilt on `sp_PublishGeneration` confirmation; served from memory thereafter. Per-session evaluator keeps a reference to the current trie + user's LDAP groups. |
|
||||
| OPC UA server dispatch | `OtOpcUa.Server/OpcUa/DriverNodeManager.cs` Read/Write/HistoryRead/MonitoredItem-create paths call `PermissionEvaluator.Authorize(session.Identity, nodeId, NodePermissions.Read)` etc. before delegating to the driver. Unauthorized returns `BadUserAccessDenied` (0x80210000) — not a silent no-op per corrections-doc B1. |
|
||||
| `LdapAuthService` (existing) | On cookie-auth success, resolves the user's LDAP groups via `LdapGroupService.GetMemberships` + loads the matching `LdapGroupRoleMapping` rows → produces a role-claim list + cluster-scope claim list. Stored on the auth cookie. |
|
||||
| Admin UI `AclsTab.razor` | Repoint edits at the new `NodeAclService` API that writes through to the same table the evaluator reads. Add a "test this permission" probe that runs a dummy evaluator against a chosen `(user, nodeId, action)` so ops can sanity-check grants before publishing a draft. |
|
||||
| Admin UI new tab `RoleGrantsTab.razor` | CRUD over `LdapGroupRoleMapping`. Per-cluster + system-wide grants. FleetAdmin only. |
|
||||
| Audit log | Every Grant/Revoke/Publish on `LdapGroupRoleMapping` or `NodeAcl` writes an `AuditLog` row with old/new state + user. |
|
||||
|
||||
## Scope — What Does NOT Change
|
||||
|
||||
| Item | Reason |
|
||||
|------|--------|
|
||||
| OPC UA authn | Already done (PR 19 LDAP user identity + Basic256Sha256 profile). Phase 6.2 is authorization only. |
|
||||
| Explicit `Deny` grants | Decision #129 note explicitly defers to v2.1. Default-deny + additive grants only. |
|
||||
| Driver-side `SecurityClassification` metadata | Drivers keep reporting `Operate` / `ViewOnly` / etc. — the evaluator uses them as *part* of the decision but doesn't replace them. |
|
||||
| Galaxy namespace (SystemPlatform kind) | UNS levels don't apply; evaluator treats Galaxy nodes as `Cluster → Namespace → Tag` (skip UnsArea/UnsLine/Equipment). |
|
||||
|
||||
## Entry Gate Checklist
|
||||
|
||||
- [ ] Phase 6.1 merged (reuse `Core.Resilience` Polly pipeline for the ACL cache-refresh retries)
|
||||
- [ ] `acl-design.md` re-read in full
|
||||
- [ ] Decision log #105, #129, corrections-doc B1 re-skimmed
|
||||
- [ ] Existing `NodeAcl` + `NodePermissions` flag enum audited; confirm bitmask flags match `acl-design.md` table
|
||||
- [ ] Existing `LdapAuthService` group-resolution code path traced end-to-end — confirm it already queries group memberships (we only need the caller to consume the result)
|
||||
- [ ] Test DB scenarios catalogued: two clusters, three LDAP groups per cluster, mixed grant shapes; captured as seed-data fixtures
|
||||
|
||||
## Task Breakdown
|
||||
|
||||
### Stream A — `LdapGroupRoleMapping` table + migration (3 days)
|
||||
|
||||
1. **A.1** Entity + EF Core migration. Columns per §Scope table. Unique constraint on `(LdapGroup, ClusterId)` with null-tolerant comparer for the system-wide case. Index on `LdapGroup` for the hot-path lookup on auth.
|
||||
2. **A.2** `ILdapGroupRoleMappingService` CRUD. Wrap in the Phase 6.1 Polly pipeline (timeout → retry → fallback-to-cache).
|
||||
3. **A.3** Seed-data migration: preserve the current hardcoded `FleetAdmin` / `ConfigEditor` / `ReadOnly` mappings by seeding rows for the existing LDAP groups the dev box uses (`cn=fleet-admin,…`, `cn=config-editor,…`, `cn=read-only,…`). Op no-op migration for existing deployments.
|
||||
|
||||
### Stream B — Permission-trie evaluator (1 week)
|
||||
|
||||
1. **B.1** `IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, NodeId nodeId, NodePermissions needed)` — returns `bool`. Phase 6.2 returns only `true` / `false`; v2.1 can widen to `Allow`/`Deny`/`Indeterminate` if Deny lands.
|
||||
2. **B.2** `PermissionTrieBuilder` builds the trie from `NodeAcl` + `LdapGroupRoleMapping` joined to the current generation's `UnsArea` + `UnsLine` + `Equipment` + `Tag` tables. One trie per `(ClusterId, GenerationId)` so rollback doesn't smear permissions across generations.
|
||||
3. **B.3** Trie node structure: `{ Level: enum, ScopeId: Guid, AllowedPermissions: NodePermissions, ChildrenByLevel: Dictionary<Guid, TrieNode> }`. Evaluation walks from Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag, ORing allowed permissions at each level. Additive semantics: a grant at Cluster level cascades to every descendant tag.
|
||||
4. **B.4** `PermissionTrieCache` service scoped as singleton; exposes `GetTrieAsync(ClusterId, ct)` that returns the current-generation trie. Invalidated on `sp_PublishGeneration` via an in-process event bus; also on TTL expiry (24 h safety net).
|
||||
5. **B.5** Per-session cached evaluator: OPC UA Session authentication produces `UserAuthorizationState { ClusterId, LdapGroups[], Trie }`; cached on the session until session close or generation-apply.
|
||||
6. **B.6** Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → deny, (e) Galaxy nodes skip UnsArea/UnsLine levels.
|
||||
|
||||
### Stream C — OPC UA server dispatch wiring (4 days)
|
||||
|
||||
1. **C.1** `DriverNodeManager.Read` — consult evaluator before delegating to `IReadable`. Unauthorized nodes get `BadUserAccessDenied` per-attribute, not on the whole batch.
|
||||
2. **C.2** `DriverNodeManager.Write` — same. Evaluator needs `NodePermissions.WriteOperate` / `WriteTune` / `WriteConfigure` depending on driver-reported `SecurityClassification` of the attribute.
|
||||
3. **C.3** `DriverNodeManager.HistoryRead` — ACL checks `NodePermissions.Read` (history uses the same Read flag per `acl-design.md`).
|
||||
4. **C.4** `DriverNodeManager.CreateMonitoredItem` — denies unauthorized nodes at subscription create time, not after the first publish. Cleaner than silently omitting notifications.
|
||||
5. **C.5** Alarm actions (acknowledge / confirm / shelve) — checks `AlarmAck` / `AlarmConfirm` / `AlarmShelve` flags.
|
||||
6. **C.6** Integration tests: boot server with a seed trie, auth as three distinct users with different group memberships, assert read of one tag allowed + read of another denied + write denied where Read allowed.
|
||||
|
||||
### Stream D — Admin UI refresh (4 days)
|
||||
|
||||
1. **D.1** `RoleGrantsTab.razor` — FleetAdmin-gated CRUD on `LdapGroupRoleMapping`. Per-cluster dropdown + system-wide checkbox. Validation: LDAP group must exist in the dev LDAP (GLAuth) before saving — best-effort probe with graceful degradation.
|
||||
2. **D.2** `AclsTab.razor` rewrites its edit path to write through the new `NodeAclService`. Adds a "Probe this permission" row: choose `(LDAP group, node, action)` → shows Allow / Deny + the reason (which grant matched).
|
||||
3. **D.3** Draft-generation diff viewer now includes an ACL section: "X grants added, Y grants removed, Z grants changed."
|
||||
4. **D.4** SignalR notification: `PermissionTrieCache` invalidation on `sp_PublishGeneration` pushes to Admin UI so operators see "this clusters permissions were just updated" within 2 s.
|
||||
|
||||
## Compliance Checks (run at exit gate)
|
||||
|
||||
- [ ] **Data-path enforcement**: OPC UA Read against a NodeId the current user has no grant for returns `BadUserAccessDenied` with a ServiceResult, not Good with stale data. Verified by an integration test with a Basic256Sha256-secured session + a read-only LDAP identity.
|
||||
- [ ] **Trie invariants**: `PermissionTrieBuilder` is idempotent (building twice with identical inputs produces equal tries — override `Equals` to assert).
|
||||
- [ ] **Additive grants**: Cluster-level grant on User A means User A can read every tag in that cluster *without* needing any lower-level grant.
|
||||
- [ ] **Isolation between clusters**: a grant on Cluster 1 has zero effect on Cluster 2 for the same user.
|
||||
- [ ] **Galaxy path coverage**: ACL checks work on `Galaxy` folder nodes + tag nodes where the UNS levels are absent (the trie treats them as shallow `Cluster → Namespace → Tag`).
|
||||
- [ ] No regression in driver test counts.
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|:----------:|:------:|------------|
|
||||
| ACL evaluator latency on per-read hot path | Medium | High | Trie lookup is O(depth) = O(6); session-cached UserAuthorizationState avoids per-Read trie rebuild; benchmark in Stream B.6 |
|
||||
| Trie cache stale after a rollback | Medium | High | `sp_PublishGeneration` + `sp_RollbackGeneration` both emit the invalidation event; trie keyed on `(ClusterId, GenerationId)` so rollback fetches the prior trie cleanly |
|
||||
| `BadUserAccessDenied` returns expose sensitive browse-name metadata | Low | Medium | Server returns only the status code + NodeId; no message leak per OPC UA Part 4 §7.34 guidance |
|
||||
| LdapGroupRoleMapping migration breaks existing deployments | Low | High | Seed-migration preserves the hardcoded groups' effective grants verbatim; smoke test exercises the post-migration fleet admin login |
|
||||
| Deny semantics accidentally ship (would break `acl-design.md` defer) | Low | Medium | `IPermissionEvaluator.Authorize` returns `bool` (not tri-state) through Phase 6.2; widening to `Allow`/`Deny`/`Indeterminate` is a v2.1 ticket |
|
||||
|
||||
## Completion Checklist
|
||||
|
||||
- [ ] Stream A: `LdapGroupRoleMapping` entity + migration + CRUD + seed
|
||||
- [ ] Stream B: evaluator + trie builder + cache + per-session state + unit tests
|
||||
- [ ] Stream C: OPC UA dispatch wiring on Read/Write/HistoryRead/Subscribe/Alarm paths
|
||||
- [ ] Stream D: Admin UI `RoleGrantsTab` + `AclsTab` refresh + SignalR invalidation
|
||||
- [ ] `phase-6-2-compliance.ps1` exits 0; exit-gate doc recorded
|
||||
|
||||
## Adversarial Review — 2026-04-19 (Codex, thread `019da48d-0d2b-7171-aed2-fc05f1f39ca3`)
|
||||
|
||||
1. **Crit · ACCEPT** — Trie must not conflate `LdapGroupRoleMapping` (control-plane admin claims per decision #105) with data-plane ACLs (decision #129). **Change**: `LdapGroupRoleMapping` is consumed only by the Admin UI role router. Data-plane trie reads `NodeAcl` rows joined against the session's **resolved LDAP groups**, never admin roles. Stream B.2 updated.
|
||||
2. **Crit · ACCEPT** — Cached `UserAuthorizationState` survives LDAP group changes because memberships only refresh at cookie-auth. Change: add `MembershipFreshnessInterval` (default 15 min); past that, next hot-path authz call forces group re-resolution (fail-closed if LDAP unreachable). Session-close-wins on config-rollback.
|
||||
3. **High · ACCEPT** — Node-local invalidation doesn't extend across redundant pair. **Change**: trie keyed on `(ClusterId, GenerationId)`; hot-path authz looks up `CurrentGenerationId` from the shared config DB (Polly-wrapped + sub-second cache). A Backup that read stale generation gets a mismatched trie → forces re-load. Implementation note added to Stream B.4.
|
||||
4. **High · ACCEPT** — Browse enforcement missing. **Change**: new Stream C.7 (`Browse + TranslateBrowsePathsToNodeIds` enforcement). Ancestor visibility implied when any descendant has a grant; denied ancestors filter from browse results per `acl-design.md` §Browse.
|
||||
5. **High · ACCEPT** — `HistoryRead` should use `NodePermissions.HistoryRead` bit, not `Read`. **Change**: Stream C.3 revised; separate unit test asserts `Read+no-HistoryRead` denies HistoryRead while allowing current-value reads.
|
||||
6. **High · ACCEPT** — Galaxy shallow-path (Cluster→Namespace→Tag) loses folder hierarchy authorization. **Change**: SystemPlatform namespaces use a `FolderSegment` scope-level between Namespace and Tag, populated from `Tag.FolderPath`; UNS-kind namespaces keep the 6-level hierarchy. Trie supports both via `ScopeKind` on each node.
|
||||
7. **High · ACCEPT** — Subscription re-authorization policy unresolved between create-time-only (fast, wrong on revoke) and per-publish (slow). **Change**: stamp each `MonitoredItem` with `(AuthGenerationId, MembershipVersion)`; re-evaluate on Publish only when either version changed. Revoked items drop to `BadUserAccessDenied` within one publish cycle.
|
||||
8. **Med · ACCEPT** — Mixed-authorization batch `Read` / `CreateMonitoredItems` service-result semantics underspecified. **Change**: Stream C.6 explicitly tests per-`ReadValueId` + per-`MonitoredItemCreateResult` denial in mixed batches; batch never collapses to a coarse failure.
|
||||
9. **Med · ACCEPT** — Missing surfaces: `Method.Call`, `HistoryUpdate`, event filter on subscriptions, subscription-transfer on reconnect, alarm-ack. **Change**: scope expanded — every OPC UA authorization surface enumerated in Stream C: Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge/Confirm/Shelve, Browse, TranslateBrowsePathsToNodeIds.
|
||||
10. **Med · ACCEPT** — `bool` evaluator bakes in grant-only semantics; collides with v2.1 Deny. **Change**: internal model uses `AuthorizationDecision { Allow | NotGranted | Denied, IReadOnlyList<MatchedGrant> Provenance }`. Phase 6.2 maps `Denied` → never produced; UI + audit log use the full record so v2.1 Deny lands without API break.
|
||||
11. **Med · ACCEPT** — 6.1 cache fallback is availability-oriented; applying it to auth is correctness-dangerous. **Change**: auth-specific staleness budget `AuthCacheMaxStaleness` (default 5 min, not 24 h). Past that, hot-path evaluator fails closed on cached reads; all authorization calls return `NotGranted` until fresh data lands. Documented in risks + compliance.
|
||||
12. **Low · ACCEPT** — Existing `NodeAclService` is raw CRUD. **Change**: new `ValidatedNodeAclAuthoringService` enforces scope-uniqueness + draft/publish invariants + rejects invalid (LDAP group, scope) pairs; Admin UI writes through it only. Stream D.2 adjusted.
|
||||
|
||||
130
docs/v2/implementation/phase-6-3-redundancy-runtime.md
Normal file
130
docs/v2/implementation/phase-6-3-redundancy-runtime.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# Phase 6.3 — Redundancy Runtime
|
||||
|
||||
> **Status**: DRAFT — `CLAUDE.md` + `docs/Redundancy.md` describe a non-transparent warm/hot redundancy model with unique ApplicationUris, `RedundancySupport` advertisement, `ServerUriArray`, and dynamic `ServiceLevel`. Entities (`ServerCluster`, `ClusterNode`, `RedundancyRole`, `RedundancyMode`) exist; the runtime behavior (actual `ServiceLevel` number computation, mid-apply dip, `ServerUriArray` broadcast) is not wired.
|
||||
>
|
||||
> **Branch**: `v2/phase-6-3-redundancy-runtime`
|
||||
> **Estimated duration**: 2 weeks
|
||||
> **Predecessor**: Phase 6.2 (Authorization) — reuses the Phase 6.1 health endpoints for cluster-peer probing
|
||||
> **Successor**: Phase 6.4 (Admin UI completion)
|
||||
|
||||
## Phase Objective
|
||||
|
||||
Land the non-transparent redundancy protocol end-to-end: two `OtOpcUa.Server` instances in a `ServerCluster` each expose a live `ServiceLevel` node whose value reflects that instance's suitability to serve traffic, advertise each other via `ServerUriArray`, and transition role (Primary ↔ Backup) based on health + operator intent.
|
||||
|
||||
Closes these gaps:
|
||||
|
||||
1. **Dynamic `ServiceLevel`** — OPC UA Part 5 §6.3.34 specifies a Byte (0..255) that clients poll to pick the healthiest server. Our server publishes it as a static value today.
|
||||
2. **`ServerUriArray` broadcast** — Part 4 specifies that every node in a redundant pair should advertise its peers' ApplicationUris. Currently advertises only its own.
|
||||
3. **Primary / Backup role coordination** — entities carry `RedundancyRole` but the runtime doesn't read it; no peer health probing; no role-transfer on primary failure.
|
||||
4. **Mid-apply dip** — decision-level expectation that a server mid-generation-apply should report a *lower* ServiceLevel so clients cut over to the peer during the apply window. Not implemented.
|
||||
|
||||
## Scope — What Changes
|
||||
|
||||
| Concern | Change |
|
||||
|---------|--------|
|
||||
| `OtOpcUa.Server` → new `Server.Redundancy` sub-namespace | `RedundancyCoordinator` singleton. Resolves the current node's `ClusterNode` row at startup, loads its peers from `ServerCluster`, probes each peer's `/healthz` (Phase 6.1 endpoint) every `PeerProbeInterval` (default 2 s), maintains per-peer health state. |
|
||||
| OPC UA server root | `ServiceLevel` variable node becomes a `BaseDataVariable` whose value updates on `RedundancyCoordinator` state change. `ServerUriArray` array variable refreshes on cluster-topology change. `RedundancySupport` stays static (set from `RedundancyMode` at startup). |
|
||||
| `RedundancyCoordinator` computation | `ServiceLevel` formula: 255 = Primary + fully healthy + no apply in progress; 200 = Primary + an apply in the middle (clients should prefer peer); 100 = Backup + fully healthy; 50 = Backup + mid-apply; 0 = Faulted or peer-unreachable-and-I'm-not-authoritative. Documented in `docs/Redundancy.md` update. |
|
||||
| Role transition | Split-brain avoidance: role is *declared* in the shared config DB (`ClusterNode.RedundancyRole`), not elected at runtime. An operator flips the row (or a failover script does). Coordinator only reads; never writes. |
|
||||
| `sp_PublishGeneration` hook | Before the apply starts, the coordinator sets `ApplyInProgress = true` in-memory → `ServiceLevel` drops to mid-apply band. Clears after `sp_PublishGeneration` returns. |
|
||||
| Admin UI `/cluster/{id}` page | New `RedundancyTab.razor` — shows current node's role + ServiceLevel + peer reachability. FleetAdmin can trigger a role-swap by editing `ClusterNode.RedundancyRole` + publishing a draft. |
|
||||
| Metrics | New OpenTelemetry metrics: `ot_opcua_service_level{cluster,node}`, `ot_opcua_peer_reachable{cluster,node,peer}`, `ot_opcua_apply_in_progress{cluster,node}`. Sink via Phase 6.1 observability layer. |
|
||||
|
||||
## Scope — What Does NOT Change
|
||||
|
||||
| Item | Reason |
|
||||
|------|--------|
|
||||
| OPC UA authn / authz | Phases 6.2 + prior. Redundancy is orthogonal. |
|
||||
| Driver layer | Drivers aren't redundancy-aware; they run on each node independently against the same equipment. The server layer handles the ServiceLevel story. |
|
||||
| Automatic failover / election | Explicitly out of scope. Non-transparent = client picks which server to use via ServiceLevel + ServerUriArray. We do NOT ship consensus, leader election, or automatic promotion. Operator-driven failover is the v2.0 model per decision #79–85. |
|
||||
| Transparent redundancy (`RedundancySupport=Transparent`) | Not supported. If the operator asks for it the server fails startup with a clear error. |
|
||||
| Historian redundancy | Galaxy Historian's own redundancy (two historians on two CPUs) is out of scope. The Galaxy driver talks to whichever historian is reachable from its node. |
|
||||
|
||||
## Entry Gate Checklist
|
||||
|
||||
- [ ] Phase 6.1 merged (uses `/healthz` for peer probing)
|
||||
- [ ] `CLAUDE.md` §Redundancy + `docs/Redundancy.md` re-read
|
||||
- [ ] Decisions #79–85 re-skimmed
|
||||
- [ ] `ServerCluster`/`ClusterNode`/`RedundancyRole`/`RedundancyMode` entities + existing migration reviewed
|
||||
- [ ] OPC UA Part 4 §Redundancy + Part 5 §6.3.34 (ServiceLevel) re-skimmed
|
||||
- [ ] Dev box has two OtOpcUa.Server instances configured against the same cluster — one designated Primary, one Backup — for integration testing
|
||||
|
||||
## Task Breakdown
|
||||
|
||||
### Stream A — Cluster topology loader (3 days)
|
||||
|
||||
1. **A.1** `RedundancyCoordinator` startup path: reads `ClusterNode` row for the current node (identified by `appsettings.json` `Cluster:NodeId`), reads the cluster's peer list, validates invariants (no duplicate `ApplicationUri`, at most one `Primary` per cluster if `RedundancyMode.WarmActive`, at most two nodes total in v2.0 per decision #83).
|
||||
2. **A.2** Topology subscription — coordinator re-reads on `sp_PublishGeneration` confirmation so an operator role-swap takes effect after publish (no process restart needed).
|
||||
3. **A.3** Tests: two-node cluster seed, one-node cluster seed (degenerate), duplicate-uri rejection.
|
||||
|
||||
### Stream B — Peer health probing + ServiceLevel computation (4 days)
|
||||
|
||||
1. **B.1** `PeerProbeLoop` runs per peer at `PeerProbeInterval` (2 s default, configurable via `appsettings.json`). Calls peer's `/healthz` via `HttpClient`; timeout 1 s. Exponential backoff on sustained failure.
|
||||
2. **B.2** `ServiceLevelCalculator.Compute(current role, self health, peer reachable, apply in progress) → byte`. Matrix documented in §Scope.
|
||||
3. **B.3** Calculator reacts to inputs via `IObserver` pattern so changes immediately push to the OPC UA `ServiceLevel` node.
|
||||
4. **B.4** Tests: matrix coverage for all role × health × apply permutations (32 cases); injected `IClock` + fake `HttpClient` so tests are deterministic.
|
||||
|
||||
### Stream C — OPC UA node wiring (3 days)
|
||||
|
||||
1. **C.1** `ServiceLevel` variable node created under `ServerStatus` at server startup. Type `Byte`, AccessLevel = CurrentRead only. Subscribe to `ServiceLevelCalculator` observable; push updates via `DataChangeNotification`.
|
||||
2. **C.2** `ServerUriArray` variable node under `ServerCapabilities`. Array of `String`, length = peer count. Updates on topology change.
|
||||
3. **C.3** `RedundancySupport` variable — static at startup from `RedundancyMode`. Values: `None`, `Cold`, `Warm`, `WarmActive`, `Hot`. Phase 6.3 supports everything except `Transparent` + `HotAndMirrored`.
|
||||
4. **C.4** Test against the Client.CLI: connect to primary, read `ServiceLevel` → expect 255; pause primary apply → expect 200; fail primary → client sees `Bad_ServerNotConnected` + reconnects to peer at 100.
|
||||
|
||||
### Stream D — Apply-window integration (2 days)
|
||||
|
||||
1. **D.1** `sp_PublishGeneration` caller wraps the apply in `using (coordinator.BeginApplyWindow())`. `BeginApplyWindow` increments an in-process counter; ServiceLevel drops on first increment. Dispose decrements.
|
||||
2. **D.2** Nested applies handled by the counter (rarely happens but Ignition and Kepware clients have both been observed firing rapid-succession draft publishes).
|
||||
3. **D.3** Test: mid-apply subscribe on primary; assert the subscribing client sees the ServiceLevel drop immediately after the apply starts, then restore after apply completes.
|
||||
|
||||
### Stream E — Admin UI + metrics (3 days)
|
||||
|
||||
1. **E.1** `RedundancyTab.razor` under `/cluster/{id}/redundancy`. Shows each node's role, current ServiceLevel, peer reachability, last apply timestamp. Role-swap button posts a draft edit on `ClusterNode.RedundancyRole`; publish applies.
|
||||
2. **E.2** OpenTelemetry meter export: three gauges per the §Scope metrics. Sink via Phase 6.1 observability.
|
||||
3. **E.3** SignalR push: `FleetStatusHub` broadcasts ServiceLevel changes so the Admin UI updates within ~1 s of the coordinator observing a peer flip.
|
||||
|
||||
## Compliance Checks (run at exit gate)
|
||||
|
||||
- [ ] **Primary-healthy** ServiceLevel = 255.
|
||||
- [ ] **Backup-healthy** ServiceLevel = 100.
|
||||
- [ ] **Mid-apply Primary** ServiceLevel = 200 — verified via Client.CLI subscription polling ServiceLevel during a forced draft publish.
|
||||
- [ ] **Peer-unreachable** handling: when a Primary can't probe its Backup's `/healthz`, Primary still serves at 255 (peer is the one with the problem). When a Backup can't probe Primary, Backup flips to 200 (per decision #81 — a lonely Backup promotes its advertised level to signal "I'll take over if you ask" without auto-promoting).
|
||||
- [ ] **Role transition via operator publish**: FleetAdmin swaps `RedundancyRole` rows in a draft, publishes; both nodes re-read topology on publish confirmation and flip ServiceLevel accordingly — no restart needed.
|
||||
- [ ] **ServerUriArray** returns exactly the peer node's ApplicationUri.
|
||||
- [ ] **Client.CLI cutover**: with a primary deliberately halted, a client that was connected to primary reconnects to the backup within the ServiceLevel-polling interval.
|
||||
- [ ] No regression in existing driver test suites; no regression in `/healthz` reachability under redundancy load.
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|:----------:|:------:|------------|
|
||||
| Split-brain from operator race (both nodes marked Primary) | Low | High | Coordinator rejects startup if its cluster has >1 Primary row; logs + fails fast. Document as a publish-time validation in `sp_PublishGeneration`. |
|
||||
| ServiceLevel thrashing on flaky peer | Medium | Medium | 2 s probe interval + 3-sample smoothing window; only declares a peer unreachable after 3 consecutive failed probes |
|
||||
| Client ignores ServiceLevel and stays on broken primary | Medium | Medium | Documented in `docs/Redundancy.md` — non-transparent redundancy requires client cooperation; most SCADA clients (Ignition, Kepware, Aveva OI Gateway) honor it. Unit-test the advertised values; field behavior is client-responsibility |
|
||||
| Apply-window counter leaks on exception | Low | High | `BeginApplyWindow` returns `IDisposable`; `using` syntax enforces paired decrement; unit test for exception-in-apply path |
|
||||
| `HttpClient` probe leaks sockets | Low | Medium | Single shared `HttpClient` per coordinator (not per-probe); timeouts tight to avoid keeping connections open during peer downtime |
|
||||
|
||||
## Completion Checklist
|
||||
|
||||
- [ ] Stream A: topology loader + tests
|
||||
- [ ] Stream B: peer probe + ServiceLevel calculator + 32-case matrix tests
|
||||
- [ ] Stream C: ServiceLevel / ServerUriArray / RedundancySupport node wiring + Client.CLI smoke test
|
||||
- [ ] Stream D: apply-window integration + nested-apply counter
|
||||
- [ ] Stream E: Admin `RedundancyTab` + OpenTelemetry metrics + SignalR push
|
||||
- [ ] `phase-6-3-compliance.ps1` exits 0; exit-gate doc; `docs/Redundancy.md` updated with the ServiceLevel matrix
|
||||
|
||||
## Adversarial Review — 2026-04-19 (Codex, thread `019da490-3fa0-7340-98b8-cceeca802550`)
|
||||
|
||||
1. **Crit · ACCEPT** — No publish-generation fencing enables split-publish advertising both as authoritative. **Change**: coordinator CAS on a monotonic `ConfigGenerationId`; every topology decision is generation-stamped; peers reject state propagated from a lower generation.
|
||||
2. **Crit · ACCEPT** — `>1 Primary` at startup covered but runtime containment missing when invalid topology appears later (mid-apply race). **Change**: add runtime `InvalidTopology` state — both nodes self-demote to ServiceLevel 2 (the "detected inconsistency" band, below normal operation) until convergence.
|
||||
3. **High · ACCEPT** — `0 = Faulted` collides with OPC UA Part 5 §6.3.34 semantics where 0 means **Maintenance** and 1 means NoData. **Change**: reserve **0** for operator-declared maintenance-mode only; Faulted/unreachable uses **1** (NoData); in-range degraded states occupy 2..199.
|
||||
4. **High · ACCEPT** — Matrix collapses distinct operational states onto the same value. **Change**: matrix expanded to Authoritative-Primary=255, Isolated-Primary=230 (peer unreachable — still serving), Primary-Mid-Apply=200, Recovering-Primary=180, Authoritative-Backup=100, Isolated-Backup=80 (primary unreachable — "take over if asked"), Backup-Mid-Apply=50, Recovering-Backup=30.
|
||||
5. **High · ACCEPT** — `/healthz` from 6.1 is HTTP-healthy but doesn't guarantee OPC UA data plane. **Change**: add a redundancy-specific probe `UaHealthProbe` — issues a `ReadAsync(ServiceLevel)` against the peer's OPC UA endpoint via a lightweight client session. `/healthz` remains the fast-fail; the UA probe is the authority signal.
|
||||
6. **High · ACCEPT** — `ServerUriArray` must include self + peers, not peers only. **Change**: array contains `[self.ApplicationUri, peer.ApplicationUri]` in stable deterministic ordering; compliance test asserts local-plus-peer membership.
|
||||
7. **Med · ACCEPT** — No `Faulted → Recovering → Healthy` path. **Change**: add `Recovering` state with min dwell time (60 s default) + positive publish witness (one successful Read on a reference node) before returning to Healthy. Thrash-prevention.
|
||||
8. **Med · ACCEPT** — Topology change during in-flight probe undefined. **Change**: every probe task tagged with `ConfigGenerationId` at dispatch; obsolete results discarded; in-flight probes cancelled on topology reload.
|
||||
9. **Med · ACCEPT** — Apply-window counter race on exception/cancellation/async ownership. **Change**: apply-window is a named lease keyed to `(ConfigGenerationId, PublishRequestId)` with disposal enforced via `await using`; watchdog detects leased-but-abandoned and force-closes after `ApplyMaxDuration` (default 10 min).
|
||||
10. **High · ACCEPT** — Ignition + Kepware + Aveva OI Gateway `ServiceLevel` compliance is unverified. **Change**: risk elevated to High; add Stream F (new) — build an interop matrix: validate against Ignition 8.1/8.3, Kepware KEPServerEX 6.x, Aveva OI Gateway 2020R2 + 2023R1. Document per-client cutover behaviour. Field deployments get a documented compatibility table; clients that ignore ServiceLevel documented as requiring explicit backup-endpoint config.
|
||||
11. **Med · ACCEPT** — Galaxy MXAccess re-session on Primary death not in acceptance. **Change**: Stream F adds an end-to-end failover smoke test that boots Galaxy.Proxy on both nodes, kills Primary, asserts Galaxy consumer reconnects to Backup within `(SessionTimeout + KeepAliveInterval × 3)` budget. `docs/Redundancy.md` updated with required session timeouts.
|
||||
12. **Med · ACCEPT** — Transparent-mode startup rejection is outage-prone. **Change**: `sp_PublishGeneration` validates `RedundancyMode` pre-publish — unsupported values reject the publish attempt with a clear validation error; runtime never sees an unsupported mode. Last-good config stays active.
|
||||
|
||||
120
docs/v2/implementation/phase-6-4-admin-ui-completion.md
Normal file
120
docs/v2/implementation/phase-6-4-admin-ui-completion.md
Normal file
@@ -0,0 +1,120 @@
|
||||
# Phase 6.4 — Admin UI Completion
|
||||
|
||||
> **Status**: DRAFT — Phase 1 Stream E shipped the Admin scaffold + core pages; several feature-completeness items from its completion checklist (`phase-1-configuration-and-admin-scaffold.md` §Stream E) never landed. This phase closes them.
|
||||
>
|
||||
> **Branch**: `v2/phase-6-4-admin-ui-completion`
|
||||
> **Estimated duration**: 2 weeks
|
||||
> **Predecessor**: Phase 6.3 (Redundancy runtime) — reuses the `/cluster/{id}` page layout for the new tabs
|
||||
> **Successor**: v2 release-readiness capstone (Task #121)
|
||||
|
||||
## Phase Objective
|
||||
|
||||
Close the Admin UI feature-completeness checklist that Phase 1 Stream E exit gate left open. Each item below is an existing `phase-1-configuration-and-admin-scaffold.md` completion-checklist entry that is currently unchecked.
|
||||
|
||||
Gaps to close:
|
||||
|
||||
1. **UNS Structure tab drag/move with impact preview** — decision #115 + `admin-ui.md` §"UNS". Current state: list-only render; no drag reorder; no "X lines / Y equipment impacted" preview.
|
||||
2. **Equipment CSV import + 5-identifier search** — decision #95 + #117. Current state: basic form; no CSV parser; search indexes only ZTag.
|
||||
3. **Draft-generation diff viewer** — enhance existing `DiffViewer.razor` to show generation-diff not just staged-edit diff; highlight ACL grant changes (lands after Phase 6.2).
|
||||
4. **`_base` equipment-class Identification fields exposure** — decision #138–139. Columns exist on `Equipment`; no Admin UI field group; no address-space exposure of the OPC 40010 sub-folder.
|
||||
|
||||
## Scope — What Changes
|
||||
|
||||
| Concern | Change |
|
||||
|---------|--------|
|
||||
| `Admin/Pages/UnsTab.razor` | Rewrite as a tree component with drag-drop (Blazor-native HTML5 DnD; no third-party dep). Each drag fires a "Compute Impact" call against the draft-generation state + renders a modal preview ("Moving Line 'Oven-2' from 'Packaging' to 'Assembly' will re-home 14 equipment + re-parent 237 tags"). Confirmation commits the draft edit. |
|
||||
| `Admin/Services/UnsImpactAnalyzer.cs` | New service. Given a move-operation (line move, area rename, line merge), computes cascade counts by walking the draft-generation `Equipment` + `Tag` tables. Pure-function shape; testable in isolation. |
|
||||
| `Admin/Pages/EquipmentTab.razor` | Add CSV-import button → modal with file picker + dry-run preview. Add multi-identifier search bar (ZTag / SAPID / UniqueId / Alias1 / Alias2) per decision #95 — parses any of the five, shows matches across draft + published generations. |
|
||||
| `Admin/Services/EquipmentCsvImporter.cs` | New service. Parses CSV with documented header row; validates each row against the `Equipment` schema (required fields + `ExternalIdReservation` freshness); returns `ImportPreview` DTO with per-row accept/reject + reason; commit step wraps in a single EF transaction. |
|
||||
| `Admin/Pages/DraftEditor.razor` + `DiffViewer.razor` | Diff viewer expanded: adds sections for ACL grants (from Phase 6.2 `LdapGroupRoleMapping` + `NodeAcl`), redundancy-role changes (from Phase 6.3), equipment-class `_base` Identification fields. Render each section collapsible. |
|
||||
| `Admin/Components/IdentificationFields.razor` | New component. Renders the OPC 40010 nullable columns (Manufacturer, Model, SerialNumber, ProductInstanceUri, HardwareRevision, SoftwareRevision, DeviceRevision, YearOfConstruction, MonthOfConstruction) as a labelled field group on the `EquipmentTab` detail view. |
|
||||
| `OtOpcUa.Server/OpcUa/DriverNodeManager` — Equipment folder build | When an `Equipment` row has non-null Identification fields, the server adds an `Identification` sub-folder under the Equipment node containing one variable per non-null field. Matches OPC 40010 companion spec. |
|
||||
|
||||
## Scope — What Does NOT Change
|
||||
|
||||
| Item | Reason |
|
||||
|------|--------|
|
||||
| Admin UI visual language | Bootstrap 5 / cookie auth / sidebar layout unchanged — consistency with ScadaLink design reference. |
|
||||
| LDAP auth flow | Already shipped in Phase 1. Phase 6.4 is additive UI only. |
|
||||
| Core abstractions / driver layer | Admin UI changes don't touch drivers. |
|
||||
| Equipment-class *template schema validation* | Still deferred (decision #112 — schemas repo not landed). We expose the Identification fields but don't validate against a template hierarchy. |
|
||||
| Drag/move to *other clusters* | Out of scope — equipment is cluster-scoped per decision #82. Cross-cluster migration is a different workflow. |
|
||||
|
||||
## Entry Gate Checklist
|
||||
|
||||
- [ ] Phase 6.2 merged (ACL grants are part of the new diff viewer sections)
|
||||
- [ ] Phase 6.3 merged (redundancy-role changes are part of the diff viewer)
|
||||
- [ ] `phase-1-configuration-and-admin-scaffold.md` §Stream E completion checklist re-read — confirm these are the remaining items
|
||||
- [ ] `admin-ui.md` re-skimmed for screen layouts
|
||||
- [ ] Existing `EquipmentTab.razor` / `UnsTab.razor` / `DraftEditor.razor` diff'd against what ships today so the edits are additive not destructive
|
||||
- [ ] Dev Galaxy available for OPC 40010 exposure smoke testing
|
||||
|
||||
## Task Breakdown
|
||||
|
||||
### Stream A — UNS drag/reorder + impact preview (4 days)
|
||||
|
||||
1. **A.1** `UnsImpactAnalyzer` service. Inputs: `(DraftGenerationId, MoveOperation)`. Outputs: `ImpactPreview { AffectedEquipmentCount, AffectedTagCount, CascadeWarnings[] }`. Unit tests cover line move / area rename / line merge.
|
||||
2. **A.2** HTML5 DnD on a tree component. No JS interop beyond `ondragstart`/`ondragover`/`ondrop` — keeps build + testability simple.
|
||||
3. **A.3** Modal preview wired to `UnsImpactAnalyzer` output; "Confirm" commits a draft edit via `DraftService`.
|
||||
4. **A.4** Playwright smoke test (or equivalent): drag a line across areas, assert modal shows the right counts, assert draft row reflects the move.
|
||||
|
||||
### Stream B — Equipment CSV import + 5-identifier search (4 days)
|
||||
|
||||
1. **B.1** `EquipmentCsvImporter` with a documented header row (`ZTag, SAPID, UniqueId, Alias1, Alias2, Name, UnsAreaName, UnsLineName, Manufacturer, Model, SerialNumber, …`). Parser rejects unknown columns + blank required fields + duplicate ZTags.
|
||||
2. **B.2** `ImportPreview` UI: per-row accept/reject table. Reject reasons: "ZTag already exists in draft", "ExternalIdReservation conflict with Cluster X", "UnsLineName not found in draft UNS tree", etc. Operator reviews then clicks "Commit" → single EF transaction.
|
||||
3. **B.3** Multi-identifier search — bar accepts any of the 5 identifiers, probes each column in parallel, returns first-match-wins + disambiguation list if multiple match.
|
||||
4. **B.4** Smoke tests: 100-row CSV with 10 intentional conflicts (5 ZTag dupes, 3 reservation clashes, 2 missing UnsLines); assert preview flags each; assert commit rolls back cleanly when a conflict surfaces post-preview.
|
||||
|
||||
### Stream C — Diff viewer enhancements (3 days)
|
||||
|
||||
1. **C.1** Refactor `DiffViewer.razor` into a base component + section plugins. Section plugins: `StructuralDiffSection` (UNS tree), `EquipmentDiffSection` (Equipment rows), `TagDiffSection` (Tag rows), `AclDiffSection` (ACL grants — depends on Phase 6.2), `RedundancyDiffSection` (role changes — depends on Phase 6.3), `IdentificationDiffSection` (OPC 40010 fields).
|
||||
2. **C.2** Each section renders collapsed by default; counts + top-line summary always visible.
|
||||
3. **C.3** Tests: seed two generations with deliberate diffs, assert every section reports the right counts + top-line summary.
|
||||
|
||||
### Stream D — OPC 40010 Identification exposure (3 days)
|
||||
|
||||
1. **D.1** `IdentificationFields.razor` component — labelled inputs; nullable columns show empty input; required field validation only on commit.
|
||||
2. **D.2** `DriverNodeManager` equipment-folder builder — after building the equipment node, inspect the Identification columns; if any non-null, add an `Identification` sub-folder with variable-per-field.
|
||||
3. **D.3** Address-space smoke test via Client.CLI: browse an equipment node, assert `Identification` sub-folder present when columns are set, absent when all null.
|
||||
|
||||
## Compliance Checks (run at exit gate)
|
||||
|
||||
- [ ] **UNS drag/move**: drag a line across areas; modal preview shows correct impacted-equipment + impacted-tag counts.
|
||||
- [ ] **Equipment CSV**: 100-row CSV with 10 conflicts imports cleanly (preview flags each, commit rolls back mid-conflict).
|
||||
- [ ] **5-identifier search**: querying any of the 5 IDs returns the matching row; ambiguous searches list options.
|
||||
- [ ] **Diff viewer**: every section renders for a 2-generation diff with deliberate changes in every category.
|
||||
- [ ] **OPC 40010 exposure**: Client.CLI browse shows `Identification` sub-folder when equipment has non-null columns; folder absent when all null.
|
||||
- [ ] **ScadaLink visual parity**: operator-equivalence reviewer signs off that the new tabs feel consistent with existing Admin UI pages.
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|:----------:|:------:|------------|
|
||||
| UNS drag-drop janky on large trees (>500 nodes) | Medium | Medium | Virtualize the tree component; default-collapse nested areas; test with a synthetic 1000-equipment seed |
|
||||
| CSV import performance on 10k-row imports | Medium | Medium | Stream-parse rather than load-into-memory; preview renders in batches of 100; commit is chunked-EF-insert with progress bar |
|
||||
| Diff viewer becomes unwieldy with many sections | Low | Medium | Each section collapsed by default; top-line summary row always shown; Phase 6.4 caps at 6 sections |
|
||||
| OPC 40010 sub-folder accidentally exposes NULL/empty identification columns as empty-string variables | Low | Low | Column null-check in the builder; drop variables whose DB value is null |
|
||||
| 5-identifier search pulls full table | Medium | Medium | Indexes on each of ZTag/SAPID/UniqueId/Alias1/Alias2; search query uses a UNION of 5 indexed lookups; falls back to LIKE only on explicit operator opt-in |
|
||||
|
||||
## Completion Checklist
|
||||
|
||||
- [ ] Stream A: `UnsImpactAnalyzer` + drag-drop tree + modal preview + Playwright smoke
|
||||
- [ ] Stream B: `EquipmentCsvImporter` + preview modal + 5-identifier search + conflict-rollback test
|
||||
- [ ] Stream C: `DiffViewer` refactor + 6 section plugins + 2-generation diff test
|
||||
- [ ] Stream D: `IdentificationFields.razor` + address-space builder change + Client.CLI browse test
|
||||
- [ ] Visual-compliance reviewer signoff
|
||||
- [ ] Full solution `dotnet test` passes; `phase-6-4-compliance.ps1` exits 0; exit-gate doc
|
||||
|
||||
## Adversarial Review — 2026-04-19 (Codex, via `codex-rescue` subagent)
|
||||
|
||||
1. **Crit · ACCEPT** — Stale UNS impact preview can overwrite concurrent draft edits. **Change**: each preview carries a `DraftRevisionToken`; `Confirm` compares against the current draft + rejects with a `409 Conflict / refresh-required` modal if any draft edit landed since the preview was generated. Stream A.3 updated.
|
||||
2. **High · ACCEPT** — CSV import atomicity is internally contradictory (single EF transaction vs. chunked inserts). **Change**: one explicit model — staged-import table (`EquipmentImportBatch { Id, CreatedAtUtc, RowsStaged, RowsAccepted, RowsRejected }`) receives rows in chunks; final `FinaliseImportBatch` is atomic over `Equipment` + `ExternalIdReservation`. Rollback is "drop the batch row" — the real Equipment table is never partially mutated.
|
||||
3. **Crit · ACCEPT** — Identifier contract rewrite mis-cites decisions. **Change**: revert to the `admin-ui.md` + decision #117 canonical set — `ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid`. CSV header follows that set verbatim. Introduce a separate decision entry for versioned CSV header shape before adding any new column; CSV header row must start with `# OtOpcUaCsv v1` so future shape changes are unambiguous.
|
||||
4. **Med · ACCEPT** — Search ordering undefined. **Change**: rank SQL — exact match on any identifier scores 100; prefix match 50; LIKE-fuzzy 20; published > draft tie-breaker; `ORDER BY score DESC, RowVersion DESC`. Typeahead shows which field matched via trailing badge.
|
||||
5. **High · ACCEPT** — HTML5 DnD on virtualized tree is aspirational. **Change**: Stream A.2 rewritten — commits to **`MudBlazor.TreeView` + `MudBlazor.DropTarget`** (already a transitive dep via the existing Admin UI). Build a 1000-node synthetic seed in A.1 + validate drag-latency budget before implementing impact preview. If MudBlazor can't hit the budget, fall back to a flat-list reorder UI with Area/Line dropdowns (loss of visual drag affordance but unblocks the feature).
|
||||
6. **Med · ACCEPT** — Collapsed-by-default doesn't handle generation-sized diffs. **Change**: each diff section has a hard row cap (1000 by default). Over-cap sections render an aggregate summary + "Load full diff" button that streams via SignalR in 500-row pages. Decision #115 subtree renames surface as a "N equipment re-parented under X → Y" summary instead of row-by-row.
|
||||
7. **High · ACCEPT** — OPC 40010 field list doesn't match decision #139. **Change**: field group realigned to `Manufacturer, Model, SerialNumber, HardwareRevision, SoftwareRevision, YearOfConstruction, AssetLocation, ManufacturerUri, DeviceManualUri`. `ProductInstanceUri / DeviceRevision / MonthOfConstruction` dropped from Phase 6.4 — they belong to a future OPC 40010 widening decision.
|
||||
8. **High · ACCEPT** — `Identification` subtree unreconciled with ACL hierarchy (Phase 6.2 6-level scope). **Change**: address-space builder creates the Identification sub-folder under the Equipment node **with the same ScopeId as Equipment** — no new scope level. ACL evaluator treats `…/Equipment/Identification/X` as inheriting the `Equipment` scope's grants. Documented in Phase 6.2's `acl-design.md` cross-reference update.
|
||||
9. **Low · ACCEPT** — Visual-review gate names nonexistent reviewer role. **Change**: rubric defined — a named "Admin UX reviewer" (role `FleetAdmin` user, not the implementation lead) compares side-by-side screenshots against the `admin-ui.md` §Visual-Design reference panels; signoff artefact is a checked-in screenshot set under `docs/v2/visual-compliance/phase-6-4/`.
|
||||
10. **Med · ACCEPT** — Cross-cluster drag/drop lacks loud failure path. **Change**: on drop across cluster boundary, disable the drop target + show a toast "Equipment is cluster-scoped (decision #82). To move across clusters, use the Export → Import workflow on the Cluster detail page." Plus a help link. Tested in Stream A.4.
|
||||
|
||||
@@ -27,8 +27,15 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient;
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string driverInstanceId)
|
||||
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IDisposable, IAsyncDisposable
|
||||
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IAlarmSource, IHistoryProvider, IDisposable, IAsyncDisposable
|
||||
{
|
||||
// ---- IAlarmSource state ----
|
||||
|
||||
private readonly System.Collections.Concurrent.ConcurrentDictionary<long, RemoteAlarmSubscription> _alarmSubscriptions = new();
|
||||
private long _nextAlarmSubscriptionId;
|
||||
|
||||
public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
|
||||
|
||||
// ---- ISubscribable + IHostConnectivityProbe state ----
|
||||
|
||||
private readonly System.Collections.Concurrent.ConcurrentDictionary<long, RemoteSubscription> _subscriptions = new();
|
||||
@@ -59,6 +66,14 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
|
||||
private DriverHealth _health = new(DriverState.Unknown, null, null);
|
||||
private bool _disposed;
|
||||
/// <summary>URL of the endpoint the driver actually connected to. Exposed via <see cref="HostName"/>.</summary>
|
||||
private string? _connectedEndpointUrl;
|
||||
/// <summary>
|
||||
/// SDK-provided reconnect handler that owns the retry loop + session-transfer machinery
|
||||
/// when the session's keep-alive channel reports a bad status. Null outside the
|
||||
/// reconnecting window; constructed lazily inside the keep-alive handler.
|
||||
/// </summary>
|
||||
private SessionReconnectHandler? _reconnectHandler;
|
||||
|
||||
public string DriverInstanceId => driverInstanceId;
|
||||
public string DriverType => "OpcUaClient";
|
||||
@@ -69,66 +84,50 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
try
|
||||
{
|
||||
var appConfig = await BuildApplicationConfigurationAsync(cancellationToken).ConfigureAwait(false);
|
||||
var candidates = ResolveEndpointCandidates(_options);
|
||||
|
||||
// Endpoint selection: let the stack pick the best matching endpoint for the
|
||||
// requested security policy/mode so the driver doesn't have to hand-validate.
|
||||
// UseSecurity=false when SecurityMode=None shortcuts around cert validation
|
||||
// entirely and is the typical dev-bench configuration.
|
||||
var selected = await SelectMatchingEndpointAsync(
|
||||
appConfig, _options.EndpointUrl, _options.SecurityPolicy, _options.SecurityMode,
|
||||
cancellationToken).ConfigureAwait(false);
|
||||
var endpointConfig = EndpointConfiguration.Create(appConfig);
|
||||
endpointConfig.OperationTimeout = (int)_options.Timeout.TotalMilliseconds;
|
||||
var endpoint = new ConfiguredEndpoint(null, selected, endpointConfig);
|
||||
var identity = BuildUserIdentity(_options);
|
||||
|
||||
var identity = _options.AuthType switch
|
||||
// Failover sweep: try each endpoint in order, return the session from the first
|
||||
// one that successfully connects. Per-endpoint failures are captured so the final
|
||||
// aggregate exception names every URL that was tried and why — critical diag for
|
||||
// operators debugging 'why did the failover pick #3?'.
|
||||
var attemptErrors = new List<string>(candidates.Count);
|
||||
ISession? session = null;
|
||||
string? connectedUrl = null;
|
||||
foreach (var url in candidates)
|
||||
{
|
||||
OpcUaAuthType.Anonymous => new UserIdentity(new AnonymousIdentityToken()),
|
||||
// The UserIdentity(string, string) overload was removed in favour of
|
||||
// (string, byte[]) to make the password encoding explicit. UTF-8 is the
|
||||
// overwhelmingly common choice for Basic256Sha256-secured sessions.
|
||||
OpcUaAuthType.Username => new UserIdentity(
|
||||
_options.Username ?? string.Empty,
|
||||
System.Text.Encoding.UTF8.GetBytes(_options.Password ?? string.Empty)),
|
||||
OpcUaAuthType.Certificate => throw new NotSupportedException(
|
||||
"Certificate authentication lands in a follow-up PR; for now use Anonymous or Username"),
|
||||
_ => new UserIdentity(new AnonymousIdentityToken()),
|
||||
};
|
||||
try
|
||||
{
|
||||
session = await OpenSessionOnEndpointAsync(
|
||||
appConfig, url, _options.SecurityPolicy, _options.SecurityMode,
|
||||
identity, cancellationToken).ConfigureAwait(false);
|
||||
connectedUrl = url;
|
||||
break;
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
attemptErrors.Add($"{url} -> {ex.GetType().Name}: {ex.Message}");
|
||||
}
|
||||
}
|
||||
|
||||
// All Session.Create* static methods are marked [Obsolete] in SDK 1.5.378; the
|
||||
// non-obsolete path is DefaultSessionFactory.Instance.CreateAsync (which is the
|
||||
// 8-arg signature matching our driver config — ApplicationConfiguration +
|
||||
// ConfiguredEndpoint, no transport-waiting-connection or reverse-connect-manager
|
||||
// required for the standard opc.tcp direct-connect case).
|
||||
// DefaultSessionFactory's parameterless ctor is also obsolete in 1.5.378; the
|
||||
// current constructor requires an ITelemetryContext. Passing null is tolerated —
|
||||
// the factory falls back to its internal default sink, same as the telemetry:null
|
||||
// on SelectEndpointAsync above.
|
||||
var session = await new DefaultSessionFactory(telemetry: null!).CreateAsync(
|
||||
appConfig,
|
||||
endpoint,
|
||||
false, // updateBeforeConnect
|
||||
_options.SessionName,
|
||||
(uint)_options.SessionTimeout.TotalMilliseconds,
|
||||
identity,
|
||||
null, // preferredLocales
|
||||
cancellationToken).ConfigureAwait(false);
|
||||
if (session is null)
|
||||
throw new AggregateException(
|
||||
"OPC UA Client failed to connect to any of the configured endpoints. " +
|
||||
"Tried:\n " + string.Join("\n ", attemptErrors),
|
||||
attemptErrors.Select(e => new InvalidOperationException(e)));
|
||||
|
||||
session.KeepAliveInterval = (int)_options.KeepAliveInterval.TotalMilliseconds;
|
||||
|
||||
// Wire the session's keep-alive channel into HostState. OPC UA keep-alives are
|
||||
// authoritative for session liveness: the SDK pings on KeepAliveInterval and sets
|
||||
// KeepAliveStopped when N intervals elapse without a response. That's strictly
|
||||
// better than a driver-side polling probe — no extra round-trip, no duplicate
|
||||
// semantic.
|
||||
_keepAliveHandler = (_, e) =>
|
||||
{
|
||||
var healthy = !ServiceResult.IsBad(e.Status);
|
||||
TransitionTo(healthy ? HostState.Running : HostState.Stopped);
|
||||
};
|
||||
// Wire the session's keep-alive channel into HostState + the reconnect trigger.
|
||||
// OPC UA keep-alives are authoritative for session liveness: the SDK pings on
|
||||
// KeepAliveInterval and sets KeepAliveStopped when N intervals elapse without a
|
||||
// response. On a bad keep-alive the driver spins up a SessionReconnectHandler
|
||||
// which transparently retries + swaps the underlying session. Subscriptions move
|
||||
// via TransferSubscriptions so local MonitoredItem handles stay valid.
|
||||
_keepAliveHandler = OnKeepAlive;
|
||||
session.KeepAlive += _keepAliveHandler;
|
||||
|
||||
Session = session;
|
||||
_connectedEndpointUrl = connectedUrl;
|
||||
_health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
|
||||
TransitionTo(HostState.Running);
|
||||
}
|
||||
@@ -225,6 +224,71 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
return config;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Resolve the ordered failover candidate list. <c>EndpointUrls</c> wins when
|
||||
/// non-empty; otherwise fall back to <c>EndpointUrl</c> as a single-URL shortcut so
|
||||
/// existing single-endpoint configs keep working without migration.
|
||||
/// </summary>
|
||||
internal static IReadOnlyList<string> ResolveEndpointCandidates(OpcUaClientDriverOptions opts)
|
||||
{
|
||||
if (opts.EndpointUrls is { Count: > 0 }) return opts.EndpointUrls;
|
||||
return [opts.EndpointUrl];
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Build the user-identity token from the driver options. Split out of
|
||||
/// <see cref="InitializeAsync"/> so the failover sweep reuses one identity across
|
||||
/// every endpoint attempt — generating it N times would re-unlock the user cert's
|
||||
/// private key N times, wasteful + keeps the password in memory longer.
|
||||
/// </summary>
|
||||
internal static UserIdentity BuildUserIdentity(OpcUaClientDriverOptions options) =>
|
||||
options.AuthType switch
|
||||
{
|
||||
OpcUaAuthType.Anonymous => new UserIdentity(new AnonymousIdentityToken()),
|
||||
OpcUaAuthType.Username => new UserIdentity(
|
||||
options.Username ?? string.Empty,
|
||||
System.Text.Encoding.UTF8.GetBytes(options.Password ?? string.Empty)),
|
||||
OpcUaAuthType.Certificate => BuildCertificateIdentity(options),
|
||||
_ => new UserIdentity(new AnonymousIdentityToken()),
|
||||
};
|
||||
|
||||
/// <summary>
|
||||
/// Open a session against a single endpoint URL. Bounded by
|
||||
/// <see cref="OpcUaClientDriverOptions.PerEndpointConnectTimeout"/> so the failover
|
||||
/// sweep doesn't spend its full budget on one dead server. Moved out of
|
||||
/// <see cref="InitializeAsync"/> so the failover loop body stays readable.
|
||||
/// </summary>
|
||||
private async Task<ISession> OpenSessionOnEndpointAsync(
|
||||
ApplicationConfiguration appConfig,
|
||||
string endpointUrl,
|
||||
OpcUaSecurityPolicy policy,
|
||||
OpcUaSecurityMode mode,
|
||||
UserIdentity identity,
|
||||
CancellationToken ct)
|
||||
{
|
||||
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
||||
cts.CancelAfter(_options.PerEndpointConnectTimeout);
|
||||
|
||||
var selected = await SelectMatchingEndpointAsync(
|
||||
appConfig, endpointUrl, policy, mode, cts.Token).ConfigureAwait(false);
|
||||
var endpointConfig = EndpointConfiguration.Create(appConfig);
|
||||
endpointConfig.OperationTimeout = (int)_options.Timeout.TotalMilliseconds;
|
||||
var endpoint = new ConfiguredEndpoint(null, selected, endpointConfig);
|
||||
|
||||
var session = await new DefaultSessionFactory(telemetry: null!).CreateAsync(
|
||||
appConfig,
|
||||
endpoint,
|
||||
false, // updateBeforeConnect
|
||||
_options.SessionName,
|
||||
(uint)_options.SessionTimeout.TotalMilliseconds,
|
||||
identity,
|
||||
null, // preferredLocales
|
||||
cts.Token).ConfigureAwait(false);
|
||||
|
||||
session.KeepAliveInterval = (int)_options.KeepAliveInterval.TotalMilliseconds;
|
||||
return session;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Select the remote endpoint matching both the requested <paramref name="policy"/>
|
||||
/// and <paramref name="mode"/>. The SDK's <c>CoreClientUtils.SelectEndpointAsync</c>
|
||||
@@ -271,6 +335,39 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
return match;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Build a <see cref="UserIdentity"/> carrying a client user-authentication
|
||||
/// certificate loaded from <see cref="OpcUaClientDriverOptions.UserCertificatePath"/>.
|
||||
/// Used when the remote server's endpoint advertises Certificate-type user tokens.
|
||||
/// Fails fast if the path is missing, the file doesn't exist, or the certificate
|
||||
/// lacks a private key (the private key is required to sign the user-token
|
||||
/// challenge during session activation).
|
||||
/// </summary>
|
||||
internal static UserIdentity BuildCertificateIdentity(OpcUaClientDriverOptions options)
|
||||
{
|
||||
if (string.IsNullOrWhiteSpace(options.UserCertificatePath))
|
||||
throw new InvalidOperationException(
|
||||
"OpcUaAuthType.Certificate requires OpcUaClientDriverOptions.UserCertificatePath to be set.");
|
||||
if (!System.IO.File.Exists(options.UserCertificatePath))
|
||||
throw new System.IO.FileNotFoundException(
|
||||
$"User certificate not found at '{options.UserCertificatePath}'.",
|
||||
options.UserCertificatePath);
|
||||
|
||||
// X509CertificateLoader (new in .NET 9) is the only non-obsolete way to load a PFX
|
||||
// since the legacy X509Certificate2 ctors are marked obsolete on net10. Passes the
|
||||
// password through verbatim; PEM files with external keys fall back to
|
||||
// LoadCertificateFromFile which picks up the adjacent .key if present.
|
||||
var cert = System.Security.Cryptography.X509Certificates.X509CertificateLoader
|
||||
.LoadPkcs12FromFile(options.UserCertificatePath, options.UserCertificatePassword);
|
||||
|
||||
if (!cert.HasPrivateKey)
|
||||
throw new InvalidOperationException(
|
||||
$"User certificate at '{options.UserCertificatePath}' has no private key — " +
|
||||
"the private key is required to sign the OPC UA user-token challenge at session activation.");
|
||||
|
||||
return new UserIdentity(cert);
|
||||
}
|
||||
|
||||
/// <summary>Convert a driver <see cref="OpcUaSecurityPolicy"/> to the OPC UA policy URI.</summary>
|
||||
internal static string MapSecurityPolicy(OpcUaSecurityPolicy policy) => policy switch
|
||||
{
|
||||
@@ -305,6 +402,20 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
}
|
||||
_subscriptions.Clear();
|
||||
|
||||
foreach (var ras in _alarmSubscriptions.Values)
|
||||
{
|
||||
try { await ras.Subscription.DeleteAsync(silent: true, cancellationToken).ConfigureAwait(false); }
|
||||
catch { /* best-effort */ }
|
||||
}
|
||||
_alarmSubscriptions.Clear();
|
||||
|
||||
// Abort any in-flight reconnect attempts before touching the session — BeginReconnect's
|
||||
// retry loop holds a reference to the current session and would fight Session.CloseAsync
|
||||
// if left spinning.
|
||||
try { _reconnectHandler?.CancelReconnect(); } catch { }
|
||||
_reconnectHandler?.Dispose();
|
||||
_reconnectHandler = null;
|
||||
|
||||
if (_keepAliveHandler is not null && Session is not null)
|
||||
{
|
||||
try { Session.KeepAlive -= _keepAliveHandler; } catch { }
|
||||
@@ -315,6 +426,7 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
catch { /* best-effort */ }
|
||||
try { Session?.Dispose(); } catch { }
|
||||
Session = null;
|
||||
_connectedEndpointUrl = null;
|
||||
|
||||
TransitionTo(HostState.Unknown);
|
||||
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
|
||||
@@ -493,20 +605,46 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
var rootFolder = builder.Folder("Remote", "Remote");
|
||||
var visited = new HashSet<NodeId>();
|
||||
var discovered = 0;
|
||||
var pendingVariables = new List<PendingVariable>();
|
||||
|
||||
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
|
||||
try
|
||||
{
|
||||
// Pass 1: browse hierarchy + create folders inline, collect variables into a
|
||||
// pending list. Defers variable registration until attributes are resolved — the
|
||||
// address-space builder's Variable call is the one-way commit, so doing it only
|
||||
// once per variable (with correct DataType/SecurityClass/IsArray) avoids the
|
||||
// alternative (register with placeholders + mutate later) which the
|
||||
// IAddressSpaceBuilder contract doesn't expose.
|
||||
await BrowseRecursiveAsync(session, root, rootFolder, visited,
|
||||
depth: 0, discovered: () => discovered, increment: () => discovered++,
|
||||
depth: 0,
|
||||
discovered: () => discovered, increment: () => discovered++,
|
||||
pendingVariables: pendingVariables,
|
||||
ct: cancellationToken).ConfigureAwait(false);
|
||||
|
||||
// Pass 2: batch-read DataType + AccessLevel + ValueRank + Historizing per
|
||||
// variable. One wire request for up to ~N variables; for 10k-node servers this is
|
||||
// still a couple of hundred ms total since the SDK chunks ReadAsync automatically.
|
||||
await EnrichAndRegisterVariablesAsync(session, pendingVariables, cancellationToken)
|
||||
.ConfigureAwait(false);
|
||||
}
|
||||
finally { _gate.Release(); }
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// A variable collected during the browse pass, waiting for attribute enrichment
|
||||
/// before being registered on the address-space builder.
|
||||
/// </summary>
|
||||
private readonly record struct PendingVariable(
|
||||
IAddressSpaceBuilder ParentFolder,
|
||||
string BrowseName,
|
||||
string DisplayName,
|
||||
NodeId NodeId);
|
||||
|
||||
private async Task BrowseRecursiveAsync(
|
||||
ISession session, NodeId node, IAddressSpaceBuilder folder, HashSet<NodeId> visited,
|
||||
int depth, Func<int> discovered, Action increment, CancellationToken ct)
|
||||
int depth, Func<int> discovered, Action increment,
|
||||
List<PendingVariable> pendingVariables, CancellationToken ct)
|
||||
{
|
||||
if (depth >= _options.MaxBrowseDepth) return;
|
||||
if (discovered() >= _options.MaxDiscoveredNodes) return;
|
||||
@@ -562,27 +700,155 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
var subFolder = folder.Folder(browseName, displayName);
|
||||
increment();
|
||||
await BrowseRecursiveAsync(session, childId, subFolder, visited,
|
||||
depth + 1, discovered, increment, ct).ConfigureAwait(false);
|
||||
depth + 1, discovered, increment, pendingVariables, ct).ConfigureAwait(false);
|
||||
}
|
||||
else if (rf.NodeClass == NodeClass.Variable)
|
||||
{
|
||||
// Serialize the NodeId so the IReadable/IWritable surface receives a
|
||||
// round-trippable string. Deferring the DataType + AccessLevel fetch to a
|
||||
// follow-up PR — initial browse uses a conservative ViewOnly + Int32 default.
|
||||
var nodeIdString = childId.ToString() ?? string.Empty;
|
||||
folder.Variable(browseName, displayName, new DriverAttributeInfo(
|
||||
FullName: nodeIdString,
|
||||
DriverDataType: DriverDataType.Int32,
|
||||
IsArray: false,
|
||||
ArrayDim: null,
|
||||
SecurityClass: SecurityClassification.ViewOnly,
|
||||
IsHistorized: false,
|
||||
IsAlarm: false));
|
||||
pendingVariables.Add(new PendingVariable(folder, browseName, displayName, childId));
|
||||
increment();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Pass 2 of discovery: batch-read DataType + ValueRank + AccessLevel + Historizing
|
||||
/// for every collected variable in one Session.ReadAsync (the SDK chunks internally
|
||||
/// to respect the server's per-request limits). Then register each variable on its
|
||||
/// parent folder with the real <see cref="DriverAttributeInfo"/>.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// Attributes read: <c>DataType</c> (NodeId of the value type),
|
||||
/// <c>ValueRank</c> (-1 = scalar, 1 = array), <c>UserAccessLevel</c> (the
|
||||
/// effective access mask for our session — more accurate than AccessLevel which
|
||||
/// is the server-side configured mask before user filtering), and
|
||||
/// <c>Historizing</c> (server flags whether historian data is available).
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// When the upstream server returns Bad on any attribute, the variable falls back
|
||||
/// to safe defaults (Int32 / ViewOnly / not-array / not-historized) and is still
|
||||
/// registered — a partial enrichment failure shouldn't drop entire variables from
|
||||
/// the address space. Operators reading the Admin dashboard see the variable
|
||||
/// with conservative metadata which is obviously wrong and easy to triage.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
private async Task EnrichAndRegisterVariablesAsync(
|
||||
ISession session, IReadOnlyList<PendingVariable> pending, CancellationToken ct)
|
||||
{
|
||||
if (pending.Count == 0) return;
|
||||
|
||||
// 4 attributes per variable: DataType, ValueRank, UserAccessLevel, Historizing.
|
||||
var nodesToRead = new ReadValueIdCollection(pending.Count * 4);
|
||||
foreach (var pv in pending)
|
||||
{
|
||||
nodesToRead.Add(new ReadValueId { NodeId = pv.NodeId, AttributeId = Attributes.DataType });
|
||||
nodesToRead.Add(new ReadValueId { NodeId = pv.NodeId, AttributeId = Attributes.ValueRank });
|
||||
nodesToRead.Add(new ReadValueId { NodeId = pv.NodeId, AttributeId = Attributes.UserAccessLevel });
|
||||
nodesToRead.Add(new ReadValueId { NodeId = pv.NodeId, AttributeId = Attributes.Historizing });
|
||||
}
|
||||
|
||||
DataValueCollection values;
|
||||
try
|
||||
{
|
||||
var resp = await session.ReadAsync(
|
||||
requestHeader: null,
|
||||
maxAge: 0,
|
||||
timestampsToReturn: TimestampsToReturn.Neither,
|
||||
nodesToRead: nodesToRead,
|
||||
ct: ct).ConfigureAwait(false);
|
||||
values = resp.Results;
|
||||
}
|
||||
catch
|
||||
{
|
||||
// Enrichment-read failed wholesale (server unreachable mid-browse). Register the
|
||||
// pending variables with conservative defaults rather than dropping them — the
|
||||
// downstream catalog is still useful for reading via IReadable.
|
||||
foreach (var pv in pending)
|
||||
RegisterFallback(pv);
|
||||
return;
|
||||
}
|
||||
|
||||
for (var i = 0; i < pending.Count; i++)
|
||||
{
|
||||
var pv = pending[i];
|
||||
var baseIdx = i * 4;
|
||||
var dataTypeDv = values[baseIdx];
|
||||
var valueRankDv = values[baseIdx + 1];
|
||||
var accessDv = values[baseIdx + 2];
|
||||
var histDv = values[baseIdx + 3];
|
||||
|
||||
var dataType = StatusCode.IsGood(dataTypeDv.StatusCode) && dataTypeDv.Value is NodeId dtId
|
||||
? MapUpstreamDataType(dtId)
|
||||
: DriverDataType.Int32;
|
||||
var valueRank = StatusCode.IsGood(valueRankDv.StatusCode) && valueRankDv.Value is int vr ? vr : -1;
|
||||
var isArray = valueRank >= 0; // -1 = scalar; 1+ = array dimensions; 0 = one-dimensional array
|
||||
var access = StatusCode.IsGood(accessDv.StatusCode) && accessDv.Value is byte ab ? ab : (byte)0;
|
||||
var securityClass = MapAccessLevelToSecurityClass(access);
|
||||
var historizing = StatusCode.IsGood(histDv.StatusCode) && histDv.Value is bool b && b;
|
||||
|
||||
pv.ParentFolder.Variable(pv.BrowseName, pv.DisplayName, new DriverAttributeInfo(
|
||||
FullName: pv.NodeId.ToString() ?? string.Empty,
|
||||
DriverDataType: dataType,
|
||||
IsArray: isArray,
|
||||
ArrayDim: null,
|
||||
SecurityClass: securityClass,
|
||||
IsHistorized: historizing,
|
||||
IsAlarm: false));
|
||||
}
|
||||
|
||||
void RegisterFallback(PendingVariable pv)
|
||||
{
|
||||
pv.ParentFolder.Variable(pv.BrowseName, pv.DisplayName, new DriverAttributeInfo(
|
||||
FullName: pv.NodeId.ToString() ?? string.Empty,
|
||||
DriverDataType: DriverDataType.Int32,
|
||||
IsArray: false,
|
||||
ArrayDim: null,
|
||||
SecurityClass: SecurityClassification.ViewOnly,
|
||||
IsHistorized: false,
|
||||
IsAlarm: false));
|
||||
}
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Map an upstream OPC UA built-in DataType NodeId (via <c>DataTypeIds.*</c>) to a
|
||||
/// <see cref="DriverDataType"/>. Unknown / custom types fall through to
|
||||
/// <see cref="DriverDataType.String"/> which is the safest passthrough for
|
||||
/// Variant-wrapped structs + enums + extension objects; downstream clients see a
|
||||
/// string rendering but the cascading-quality path still preserves upstream
|
||||
/// StatusCode + timestamps.
|
||||
/// </summary>
|
||||
internal static DriverDataType MapUpstreamDataType(NodeId dataType)
|
||||
{
|
||||
if (dataType == DataTypeIds.Boolean) return DriverDataType.Boolean;
|
||||
if (dataType == DataTypeIds.SByte || dataType == DataTypeIds.Byte ||
|
||||
dataType == DataTypeIds.Int16) return DriverDataType.Int16;
|
||||
if (dataType == DataTypeIds.UInt16) return DriverDataType.UInt16;
|
||||
if (dataType == DataTypeIds.Int32) return DriverDataType.Int32;
|
||||
if (dataType == DataTypeIds.UInt32) return DriverDataType.UInt32;
|
||||
if (dataType == DataTypeIds.Int64) return DriverDataType.Int64;
|
||||
if (dataType == DataTypeIds.UInt64) return DriverDataType.UInt64;
|
||||
if (dataType == DataTypeIds.Float) return DriverDataType.Float32;
|
||||
if (dataType == DataTypeIds.Double) return DriverDataType.Float64;
|
||||
if (dataType == DataTypeIds.String) return DriverDataType.String;
|
||||
if (dataType == DataTypeIds.DateTime || dataType == DataTypeIds.UtcTime)
|
||||
return DriverDataType.DateTime;
|
||||
return DriverDataType.String;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Map an OPC UA AccessLevel/UserAccessLevel attribute value (<c>AccessLevels</c>
|
||||
/// bitmask) to a <see cref="SecurityClassification"/> the local node-manager's ACL
|
||||
/// layer can gate writes off. CurrentWrite-capable variables surface as
|
||||
/// <see cref="SecurityClassification.Operate"/>; read-only as <see cref="SecurityClassification.ViewOnly"/>.
|
||||
/// </summary>
|
||||
internal static SecurityClassification MapAccessLevelToSecurityClass(byte accessLevel)
|
||||
{
|
||||
const byte CurrentWrite = 2; // AccessLevels.CurrentWrite = 0x02
|
||||
return (accessLevel & CurrentWrite) != 0
|
||||
? SecurityClassification.Operate
|
||||
: SecurityClassification.ViewOnly;
|
||||
}
|
||||
|
||||
// ---- ISubscribable ----
|
||||
|
||||
public async Task<ISubscriptionHandle> SubscribeAsync(
|
||||
@@ -684,10 +950,337 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
public string DiagnosticId => $"opcua-sub-{Id}";
|
||||
}
|
||||
|
||||
// ---- IAlarmSource ----
|
||||
|
||||
/// <summary>
|
||||
/// Field positions in the EventFilter SelectClauses below. Used to index into the
|
||||
/// <c>EventFieldList.EventFields</c> Variant collection when an event arrives.
|
||||
/// </summary>
|
||||
private const int AlarmFieldEventId = 0;
|
||||
private const int AlarmFieldEventType = 1;
|
||||
private const int AlarmFieldSourceNode = 2;
|
||||
private const int AlarmFieldMessage = 3;
|
||||
private const int AlarmFieldSeverity = 4;
|
||||
private const int AlarmFieldTime = 5;
|
||||
private const int AlarmFieldConditionId = 6;
|
||||
|
||||
public async Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
|
||||
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken)
|
||||
{
|
||||
var session = RequireSession();
|
||||
var id = Interlocked.Increment(ref _nextAlarmSubscriptionId);
|
||||
var handle = new OpcUaAlarmSubscriptionHandle(id);
|
||||
|
||||
// Pre-resolve the source-node filter set so the per-event notification handler can
|
||||
// match in O(1) without re-parsing on every event.
|
||||
var sourceFilter = new HashSet<string>(sourceNodeIds, StringComparer.Ordinal);
|
||||
|
||||
var subscription = new Subscription(telemetry: null!, new SubscriptionOptions
|
||||
{
|
||||
DisplayName = $"opcua-alarm-sub-{id}",
|
||||
PublishingInterval = 500, // 500ms — alarms don't need fast polling; the server pushes
|
||||
KeepAliveCount = 10,
|
||||
LifetimeCount = 1000,
|
||||
MaxNotificationsPerPublish = 0,
|
||||
PublishingEnabled = true,
|
||||
Priority = 0,
|
||||
TimestampsToReturn = TimestampsToReturn.Both,
|
||||
});
|
||||
|
||||
// EventFilter SelectClauses — pick the standard BaseEventType fields we need to
|
||||
// materialize an AlarmEventArgs. Field positions are indexed by the AlarmField*
|
||||
// constants so the notification handler indexes in O(1) without re-examining the
|
||||
// QualifiedName BrowsePaths.
|
||||
var filter = new EventFilter();
|
||||
void AddField(string browseName) => filter.SelectClauses.Add(new SimpleAttributeOperand
|
||||
{
|
||||
TypeDefinitionId = ObjectTypeIds.BaseEventType,
|
||||
BrowsePath = [new QualifiedName(browseName)],
|
||||
AttributeId = Attributes.Value,
|
||||
});
|
||||
AddField("EventId");
|
||||
AddField("EventType");
|
||||
AddField("SourceNode");
|
||||
AddField("Message");
|
||||
AddField("Severity");
|
||||
AddField("Time");
|
||||
// ConditionId on ConditionType nodes is the branch identifier for
|
||||
// acknowledgeable conditions. Not a BaseEventType field — reach it via the typed path.
|
||||
filter.SelectClauses.Add(new SimpleAttributeOperand
|
||||
{
|
||||
TypeDefinitionId = ObjectTypeIds.ConditionType,
|
||||
BrowsePath = [], // empty path = the condition node itself
|
||||
AttributeId = Attributes.NodeId,
|
||||
});
|
||||
|
||||
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
|
||||
try
|
||||
{
|
||||
session.AddSubscription(subscription);
|
||||
await subscription.CreateAsync(cancellationToken).ConfigureAwait(false);
|
||||
|
||||
var eventItem = new MonitoredItem(telemetry: null!, new MonitoredItemOptions
|
||||
{
|
||||
DisplayName = "Server/Events",
|
||||
StartNodeId = ObjectIds.Server,
|
||||
AttributeId = Attributes.EventNotifier,
|
||||
MonitoringMode = MonitoringMode.Reporting,
|
||||
QueueSize = 1000, // deep queue — a server can fire many alarms in bursts
|
||||
DiscardOldest = false,
|
||||
Filter = filter,
|
||||
})
|
||||
{
|
||||
Handle = handle,
|
||||
};
|
||||
eventItem.Notification += (mi, args) => OnEventNotification(handle, sourceFilter, mi, args);
|
||||
subscription.AddItem(eventItem);
|
||||
await subscription.CreateItemsAsync(cancellationToken).ConfigureAwait(false);
|
||||
|
||||
_alarmSubscriptions[id] = new RemoteAlarmSubscription(subscription, handle);
|
||||
}
|
||||
finally { _gate.Release(); }
|
||||
|
||||
return handle;
|
||||
}
|
||||
|
||||
public async Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
|
||||
{
|
||||
if (handle is not OpcUaAlarmSubscriptionHandle h) return;
|
||||
if (!_alarmSubscriptions.TryRemove(h.Id, out var rs)) return;
|
||||
|
||||
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
|
||||
try
|
||||
{
|
||||
try { await rs.Subscription.DeleteAsync(silent: true, cancellationToken).ConfigureAwait(false); }
|
||||
catch { /* best-effort — session may already be gone across a reconnect */ }
|
||||
}
|
||||
finally { _gate.Release(); }
|
||||
}
|
||||
|
||||
public async Task AcknowledgeAsync(
|
||||
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)
|
||||
{
|
||||
// Short-circuit empty batch BEFORE touching the session so callers can pass an empty
|
||||
// list without guarding the size themselves — e.g. a bulk-ack UI that built an empty
|
||||
// list because the filter matched nothing.
|
||||
if (acknowledgements.Count == 0) return;
|
||||
var session = RequireSession();
|
||||
|
||||
// OPC UA A&C: call the AcknowledgeableConditionType.Acknowledge method on each
|
||||
// condition node with EventId + Comment arguments. CallAsync accepts a batch —
|
||||
// one CallMethodRequest per ack.
|
||||
var callRequests = new CallMethodRequestCollection();
|
||||
foreach (var ack in acknowledgements)
|
||||
{
|
||||
if (!TryParseNodeId(session, ack.ConditionId, out var conditionId)) continue;
|
||||
callRequests.Add(new CallMethodRequest
|
||||
{
|
||||
ObjectId = conditionId,
|
||||
MethodId = MethodIds.AcknowledgeableConditionType_Acknowledge,
|
||||
InputArguments = [
|
||||
new Variant(Array.Empty<byte>()), // EventId — server-side best-effort; empty resolves to 'most recent'
|
||||
new Variant(new LocalizedText(ack.Comment ?? string.Empty)),
|
||||
],
|
||||
});
|
||||
}
|
||||
|
||||
if (callRequests.Count == 0) return;
|
||||
|
||||
await _gate.WaitAsync(cancellationToken).ConfigureAwait(false);
|
||||
try
|
||||
{
|
||||
try
|
||||
{
|
||||
_ = await session.CallAsync(
|
||||
requestHeader: null,
|
||||
methodsToCall: callRequests,
|
||||
ct: cancellationToken).ConfigureAwait(false);
|
||||
}
|
||||
catch { /* best-effort — caller's re-ack mechanism catches pathological paths */ }
|
||||
}
|
||||
finally { _gate.Release(); }
|
||||
}
|
||||
|
||||
private void OnEventNotification(
|
||||
OpcUaAlarmSubscriptionHandle handle,
|
||||
HashSet<string> sourceFilter,
|
||||
MonitoredItem item,
|
||||
MonitoredItemNotificationEventArgs args)
|
||||
{
|
||||
if (args.NotificationValue is not EventFieldList efl) return;
|
||||
if (efl.EventFields.Count <= AlarmFieldConditionId) return;
|
||||
|
||||
var sourceNode = efl.EventFields[AlarmFieldSourceNode].Value?.ToString() ?? string.Empty;
|
||||
if (sourceFilter.Count > 0 && !sourceFilter.Contains(sourceNode)) return;
|
||||
|
||||
var eventType = efl.EventFields[AlarmFieldEventType].Value?.ToString() ?? "BaseEventType";
|
||||
var message = (efl.EventFields[AlarmFieldMessage].Value as LocalizedText)?.Text ?? string.Empty;
|
||||
var severity = efl.EventFields[AlarmFieldSeverity].Value is ushort sev ? sev : (ushort)0;
|
||||
var time = efl.EventFields[AlarmFieldTime].Value is DateTime t ? t : DateTime.UtcNow;
|
||||
var conditionId = efl.EventFields[AlarmFieldConditionId].Value?.ToString() ?? string.Empty;
|
||||
|
||||
OnAlarmEvent?.Invoke(this, new AlarmEventArgs(
|
||||
SubscriptionHandle: handle,
|
||||
SourceNodeId: sourceNode,
|
||||
ConditionId: conditionId,
|
||||
AlarmType: eventType,
|
||||
Message: message,
|
||||
Severity: MapSeverity(severity),
|
||||
SourceTimestampUtc: time));
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Map an OPC UA <c>BaseEventType.Severity</c> (1..1000) to our coarse-grained
|
||||
/// <see cref="AlarmSeverity"/> bucket. Thresholds match the OPC UA A&C Part 9
|
||||
/// guidance: 1-200 Low, 201-500 Medium, 501-800 High, 801-1000 Critical.
|
||||
/// </summary>
|
||||
internal static AlarmSeverity MapSeverity(ushort opcSeverity) => opcSeverity switch
|
||||
{
|
||||
<= 200 => AlarmSeverity.Low,
|
||||
<= 500 => AlarmSeverity.Medium,
|
||||
<= 800 => AlarmSeverity.High,
|
||||
_ => AlarmSeverity.Critical,
|
||||
};
|
||||
|
||||
private sealed record RemoteAlarmSubscription(Subscription Subscription, OpcUaAlarmSubscriptionHandle Handle);
|
||||
|
||||
private sealed record OpcUaAlarmSubscriptionHandle(long Id) : IAlarmSubscriptionHandle
|
||||
{
|
||||
public string DiagnosticId => $"opcua-alarm-sub-{Id}";
|
||||
}
|
||||
|
||||
// ---- IHistoryProvider (passthrough to upstream server) ----
|
||||
|
||||
public async Task<Core.Abstractions.HistoryReadResult> ReadRawAsync(
|
||||
string fullReference, DateTime startUtc, DateTime endUtc, uint maxValuesPerNode,
|
||||
CancellationToken cancellationToken)
|
||||
{
|
||||
var details = new ReadRawModifiedDetails
|
||||
{
|
||||
IsReadModified = false,
|
||||
StartTime = startUtc,
|
||||
EndTime = endUtc,
|
||||
NumValuesPerNode = maxValuesPerNode,
|
||||
ReturnBounds = false,
|
||||
};
|
||||
return await ExecuteHistoryReadAsync(fullReference, new ExtensionObject(details), cancellationToken)
|
||||
.ConfigureAwait(false);
|
||||
}
|
||||
|
||||
public async Task<Core.Abstractions.HistoryReadResult> ReadProcessedAsync(
|
||||
string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval,
|
||||
HistoryAggregateType aggregate, CancellationToken cancellationToken)
|
||||
{
|
||||
var aggregateId = MapAggregateToNodeId(aggregate);
|
||||
var details = new ReadProcessedDetails
|
||||
{
|
||||
StartTime = startUtc,
|
||||
EndTime = endUtc,
|
||||
ProcessingInterval = interval.TotalMilliseconds,
|
||||
AggregateType = [aggregateId],
|
||||
};
|
||||
return await ExecuteHistoryReadAsync(fullReference, new ExtensionObject(details), cancellationToken)
|
||||
.ConfigureAwait(false);
|
||||
}
|
||||
|
||||
public async Task<Core.Abstractions.HistoryReadResult> ReadAtTimeAsync(
|
||||
string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken cancellationToken)
|
||||
{
|
||||
var reqTimes = new DateTimeCollection(timestampsUtc);
|
||||
var details = new ReadAtTimeDetails
|
||||
{
|
||||
ReqTimes = reqTimes,
|
||||
UseSimpleBounds = true,
|
||||
};
|
||||
return await ExecuteHistoryReadAsync(fullReference, new ExtensionObject(details), cancellationToken)
|
||||
.ConfigureAwait(false);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Shared HistoryRead wire path — used by Raw/Processed/AtTime. Handles NodeId parse,
|
||||
/// Session.HistoryReadAsync call, Bad-StatusCode passthrough (no translation per §8
|
||||
/// cascading-quality rule), and HistoryData unwrap into <see cref="DataValueSnapshot"/>.
|
||||
/// </summary>
|
||||
private async Task<Core.Abstractions.HistoryReadResult> ExecuteHistoryReadAsync(
|
||||
string fullReference, ExtensionObject historyReadDetails, CancellationToken ct)
|
||||
{
|
||||
var session = RequireSession();
|
||||
if (!TryParseNodeId(session, fullReference, out var nodeId))
|
||||
{
|
||||
return new Core.Abstractions.HistoryReadResult([], null);
|
||||
}
|
||||
|
||||
var nodesToRead = new HistoryReadValueIdCollection
|
||||
{
|
||||
new HistoryReadValueId { NodeId = nodeId },
|
||||
};
|
||||
|
||||
await _gate.WaitAsync(ct).ConfigureAwait(false);
|
||||
try
|
||||
{
|
||||
var resp = await session.HistoryReadAsync(
|
||||
requestHeader: null,
|
||||
historyReadDetails: historyReadDetails,
|
||||
timestampsToReturn: TimestampsToReturn.Both,
|
||||
releaseContinuationPoints: false,
|
||||
nodesToRead: nodesToRead,
|
||||
ct: ct).ConfigureAwait(false);
|
||||
|
||||
if (resp.Results.Count == 0) return new Core.Abstractions.HistoryReadResult([], null);
|
||||
var r = resp.Results[0];
|
||||
|
||||
// Unwrap HistoryData from the ExtensionObject-encoded payload the SDK returns.
|
||||
// Samples stay in chronological order per OPC UA Part 11; cascading-quality
|
||||
// rule: preserve each DataValue's upstream StatusCode + timestamps verbatim.
|
||||
var samples = new List<DataValueSnapshot>();
|
||||
if (r.HistoryData?.Body is HistoryData hd)
|
||||
{
|
||||
var now = DateTime.UtcNow;
|
||||
foreach (var dv in hd.DataValues)
|
||||
{
|
||||
samples.Add(new DataValueSnapshot(
|
||||
Value: dv.Value,
|
||||
StatusCode: dv.StatusCode.Code,
|
||||
SourceTimestampUtc: dv.SourceTimestamp == DateTime.MinValue ? null : dv.SourceTimestamp,
|
||||
ServerTimestampUtc: dv.ServerTimestamp == DateTime.MinValue ? now : dv.ServerTimestamp));
|
||||
}
|
||||
}
|
||||
|
||||
var contPt = r.ContinuationPoint is { Length: > 0 } ? r.ContinuationPoint : null;
|
||||
return new Core.Abstractions.HistoryReadResult(samples, contPt);
|
||||
}
|
||||
finally { _gate.Release(); }
|
||||
}
|
||||
|
||||
/// <summary>Map <see cref="HistoryAggregateType"/> to the OPC UA Part 13 standard aggregate NodeId.</summary>
|
||||
internal static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
|
||||
{
|
||||
HistoryAggregateType.Average => ObjectIds.AggregateFunction_Average,
|
||||
HistoryAggregateType.Minimum => ObjectIds.AggregateFunction_Minimum,
|
||||
HistoryAggregateType.Maximum => ObjectIds.AggregateFunction_Maximum,
|
||||
HistoryAggregateType.Total => ObjectIds.AggregateFunction_Total,
|
||||
HistoryAggregateType.Count => ObjectIds.AggregateFunction_Count,
|
||||
_ => throw new ArgumentOutOfRangeException(nameof(aggregate), aggregate, null),
|
||||
};
|
||||
|
||||
// ReadEventsAsync stays at the interface default (throws NotSupportedException) per
|
||||
// IHistoryProvider contract -- the OPC UA Client driver CAN forward HistoryReadEvents,
|
||||
// but the call-site needs an EventFilter SelectClauses surface which the interface
|
||||
// doesn't carry. Landing the event-history passthrough requires extending
|
||||
// IHistoryProvider.ReadEventsAsync with a filter-spec parameter; out of scope for this PR.
|
||||
|
||||
// ---- IHostConnectivityProbe ----
|
||||
|
||||
/// <summary>Endpoint-URL-keyed host identity for the Admin /hosts dashboard.</summary>
|
||||
public string HostName => _options.EndpointUrl;
|
||||
/// <summary>
|
||||
/// Endpoint-URL-keyed host identity for the Admin /hosts dashboard. Reflects the
|
||||
/// endpoint the driver actually connected to after the failover sweep — not the
|
||||
/// first URL in the candidate list — so operators see which of the configured
|
||||
/// endpoints is currently serving traffic. Falls back to the first configured URL
|
||||
/// pre-init so the dashboard has something to render before the first connect.
|
||||
/// </summary>
|
||||
public string HostName => _connectedEndpointUrl
|
||||
?? ResolveEndpointCandidates(_options).FirstOrDefault()
|
||||
?? _options.EndpointUrl;
|
||||
|
||||
public IReadOnlyList<HostConnectivityStatus> GetHostStatuses()
|
||||
{
|
||||
@@ -695,6 +1288,76 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
|
||||
return [new HostConnectivityStatus(HostName, _hostState, _hostStateChangedUtc)];
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Session keep-alive handler. On a healthy ping, bumps HostState back to Running
|
||||
/// (typical bounce after a transient network blip). On a bad ping, starts the SDK's
|
||||
/// <see cref="SessionReconnectHandler"/> which retries on the configured period +
|
||||
/// fires <see cref="OnReconnectComplete"/> when it lands a new session.
|
||||
/// </summary>
|
||||
private void OnKeepAlive(ISession sender, KeepAliveEventArgs e)
|
||||
{
|
||||
if (!ServiceResult.IsBad(e.Status))
|
||||
{
|
||||
TransitionTo(HostState.Running);
|
||||
return;
|
||||
}
|
||||
|
||||
TransitionTo(HostState.Stopped);
|
||||
|
||||
// Kick off the SDK's reconnect loop exactly once per drop. The handler handles its
|
||||
// own retry cadence via ReconnectPeriod; we tear it down in OnReconnectComplete.
|
||||
if (_reconnectHandler is not null) return;
|
||||
|
||||
_reconnectHandler = new SessionReconnectHandler(telemetry: null!,
|
||||
reconnectAbort: false,
|
||||
maxReconnectPeriod: (int)TimeSpan.FromMinutes(2).TotalMilliseconds);
|
||||
|
||||
var state = _reconnectHandler.BeginReconnect(
|
||||
sender,
|
||||
(int)_options.ReconnectPeriod.TotalMilliseconds,
|
||||
OnReconnectComplete);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Called by <see cref="SessionReconnectHandler"/> when its retry loop has either
|
||||
/// successfully swapped to a new session or given up. Reads the new session off
|
||||
/// <c>handler.Session</c>, unwires the old keep-alive hook, rewires for the new
|
||||
/// one, and tears down the handler. Subscription migration is already handled
|
||||
/// inside the SDK via <c>TransferSubscriptions</c> (the SDK calls it automatically
|
||||
/// when <see cref="Session.TransferSubscriptionsOnReconnect"/> is <c>true</c>,
|
||||
/// which is the default).
|
||||
/// </summary>
|
||||
private void OnReconnectComplete(object? sender, EventArgs e)
|
||||
{
|
||||
if (sender is not SessionReconnectHandler handler) return;
|
||||
var newSession = handler.Session;
|
||||
var oldSession = Session;
|
||||
|
||||
// Rewire keep-alive onto the new session — without this the next drop wouldn't
|
||||
// trigger another reconnect attempt.
|
||||
if (oldSession is not null && _keepAliveHandler is not null)
|
||||
{
|
||||
try { oldSession.KeepAlive -= _keepAliveHandler; } catch { }
|
||||
}
|
||||
if (newSession is not null && _keepAliveHandler is not null)
|
||||
{
|
||||
newSession.KeepAlive += _keepAliveHandler;
|
||||
}
|
||||
|
||||
Session = newSession;
|
||||
_reconnectHandler?.Dispose();
|
||||
_reconnectHandler = null;
|
||||
|
||||
// Whether the reconnect actually succeeded depends on whether the session is
|
||||
// non-null + connected. When it succeeded, flip back to Running so downstream
|
||||
// consumers see recovery.
|
||||
if (newSession is not null)
|
||||
{
|
||||
TransitionTo(HostState.Running);
|
||||
_health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
|
||||
}
|
||||
}
|
||||
|
||||
private void TransitionTo(HostState newState)
|
||||
{
|
||||
HostState old;
|
||||
|
||||
@@ -13,9 +13,32 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient;
|
||||
/// </remarks>
|
||||
public sealed class OpcUaClientDriverOptions
|
||||
{
|
||||
/// <summary>Remote OPC UA endpoint URL, e.g. <c>opc.tcp://plc.internal:4840</c>.</summary>
|
||||
/// <summary>
|
||||
/// Remote OPC UA endpoint URL, e.g. <c>opc.tcp://plc.internal:4840</c>. Convenience
|
||||
/// shortcut for a single-endpoint deployment — equivalent to setting
|
||||
/// <see cref="EndpointUrls"/> to a list with this one URL. When both are provided,
|
||||
/// the list wins and <see cref="EndpointUrl"/> is ignored.
|
||||
/// </summary>
|
||||
public string EndpointUrl { get; init; } = "opc.tcp://localhost:4840";
|
||||
|
||||
/// <summary>
|
||||
/// Ordered list of candidate endpoint URLs for failover. The driver tries each in
|
||||
/// order at <see cref="OpcUaClientDriver.InitializeAsync"/> and on session drop;
|
||||
/// the first URL that successfully connects wins. Typical use-case: an OPC UA server
|
||||
/// pair running in hot-standby (primary 4840 + backup 4841) where either can serve
|
||||
/// the same address space. Leave unset (or empty) to use <see cref="EndpointUrl"/>
|
||||
/// as a single-URL shortcut.
|
||||
/// </summary>
|
||||
public IReadOnlyList<string> EndpointUrls { get; init; } = [];
|
||||
|
||||
/// <summary>
|
||||
/// Per-endpoint connect-attempt timeout during the failover sweep. Short enough that
|
||||
/// cycling through several dead servers doesn't blow the overall init budget, long
|
||||
/// enough to tolerate a slow TLS handshake on a healthy server. Applied independently
|
||||
/// of <see cref="Timeout"/> which governs steady-state operations.
|
||||
/// </summary>
|
||||
public TimeSpan PerEndpointConnectTimeout { get; init; } = TimeSpan.FromSeconds(3);
|
||||
|
||||
/// <summary>
|
||||
/// Security policy to require when selecting an endpoint. Either a
|
||||
/// <see cref="OpcUaSecurityPolicy"/> enum constant or a free-form string (for
|
||||
@@ -39,6 +62,23 @@ public sealed class OpcUaClientDriverOptions
|
||||
/// <summary>Password (required only for <see cref="OpcUaAuthType.Username"/>).</summary>
|
||||
public string? Password { get; init; }
|
||||
|
||||
/// <summary>
|
||||
/// Filesystem path to the user-identity certificate (PFX/PEM). Required when
|
||||
/// <see cref="AuthType"/> is <see cref="OpcUaAuthType.Certificate"/>. The driver
|
||||
/// loads the cert + private key, which the remote server validates against its
|
||||
/// <c>TrustedUserCertificates</c> store to authenticate the session's user token.
|
||||
/// Leave unset to use the driver's application-instance certificate as the user
|
||||
/// token (not typical — most deployments have a separate user cert).
|
||||
/// </summary>
|
||||
public string? UserCertificatePath { get; init; }
|
||||
|
||||
/// <summary>
|
||||
/// Optional password that unlocks <see cref="UserCertificatePath"/> when the PFX is
|
||||
/// protected. PEM files generally have their password on the adjacent key file; this
|
||||
/// knob only applies to password-locked PFX.
|
||||
/// </summary>
|
||||
public string? UserCertificatePassword { get; init; }
|
||||
|
||||
/// <summary>Server-negotiated session timeout. Default 120s per driver-specs.md §8.</summary>
|
||||
public TimeSpan SessionTimeout { get; init; } = TimeSpan.FromSeconds(120);
|
||||
|
||||
|
||||
@@ -0,0 +1,70 @@
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientAlarmTests
|
||||
{
|
||||
[Theory]
|
||||
[InlineData((ushort)1, AlarmSeverity.Low)]
|
||||
[InlineData((ushort)200, AlarmSeverity.Low)]
|
||||
[InlineData((ushort)201, AlarmSeverity.Medium)]
|
||||
[InlineData((ushort)500, AlarmSeverity.Medium)]
|
||||
[InlineData((ushort)501, AlarmSeverity.High)]
|
||||
[InlineData((ushort)800, AlarmSeverity.High)]
|
||||
[InlineData((ushort)801, AlarmSeverity.Critical)]
|
||||
[InlineData((ushort)1000, AlarmSeverity.Critical)]
|
||||
public void MapSeverity_buckets_per_OPC_UA_Part_9_guidance(ushort opcSev, AlarmSeverity expected)
|
||||
{
|
||||
OpcUaClientDriver.MapSeverity(opcSev).ShouldBe(expected);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void MapSeverity_zero_maps_to_Low()
|
||||
{
|
||||
// 0 isn't in OPC UA's 1-1000 range but we handle it gracefully as Low.
|
||||
OpcUaClientDriver.MapSeverity(0).ShouldBe(AlarmSeverity.Low);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task SubscribeAlarmsAsync_without_initialize_throws_InvalidOperationException()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-alarm-uninit");
|
||||
await Should.ThrowAsync<InvalidOperationException>(async () =>
|
||||
await drv.SubscribeAlarmsAsync([], TestContext.Current.CancellationToken));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task UnsubscribeAlarmsAsync_with_unknown_handle_is_noop()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-alarm-unknown");
|
||||
// Parallels the subscribe handle path — session-drop races shouldn't crash the caller.
|
||||
await drv.UnsubscribeAlarmsAsync(new FakeAlarmHandle(), TestContext.Current.CancellationToken);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task AcknowledgeAsync_without_initialize_throws_InvalidOperationException()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-ack-uninit");
|
||||
await Should.ThrowAsync<InvalidOperationException>(async () =>
|
||||
await drv.AcknowledgeAsync(
|
||||
[new AlarmAcknowledgeRequest("ns=2;s=Src", "ns=2;s=Cond", "operator ack")],
|
||||
TestContext.Current.CancellationToken));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task AcknowledgeAsync_with_empty_batch_is_noop_even_without_init()
|
||||
{
|
||||
// Empty batch short-circuits before touching the session, so it's safe pre-init. This
|
||||
// keeps batch-ack callers from needing to guard the list size themselves.
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-ack-empty");
|
||||
await drv.AcknowledgeAsync([], TestContext.Current.CancellationToken);
|
||||
}
|
||||
|
||||
private sealed class FakeAlarmHandle : IAlarmSubscriptionHandle
|
||||
{
|
||||
public string DiagnosticId => "fake-alarm";
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,62 @@
|
||||
using Opc.Ua;
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientAttributeMappingTests
|
||||
{
|
||||
[Theory]
|
||||
[InlineData((uint)DataTypes.Boolean, DriverDataType.Boolean)]
|
||||
[InlineData((uint)DataTypes.Int16, DriverDataType.Int16)]
|
||||
[InlineData((uint)DataTypes.UInt16, DriverDataType.UInt16)]
|
||||
[InlineData((uint)DataTypes.Int32, DriverDataType.Int32)]
|
||||
[InlineData((uint)DataTypes.UInt32, DriverDataType.UInt32)]
|
||||
[InlineData((uint)DataTypes.Int64, DriverDataType.Int64)]
|
||||
[InlineData((uint)DataTypes.UInt64, DriverDataType.UInt64)]
|
||||
[InlineData((uint)DataTypes.Float, DriverDataType.Float32)]
|
||||
[InlineData((uint)DataTypes.Double, DriverDataType.Float64)]
|
||||
[InlineData((uint)DataTypes.String, DriverDataType.String)]
|
||||
[InlineData((uint)DataTypes.DateTime, DriverDataType.DateTime)]
|
||||
public void MapUpstreamDataType_recognizes_standard_builtin_types(uint typeId, DriverDataType expected)
|
||||
{
|
||||
var nodeId = new NodeId(typeId);
|
||||
OpcUaClientDriver.MapUpstreamDataType(nodeId).ShouldBe(expected);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void MapUpstreamDataType_maps_SByte_and_Byte_to_Int16_since_DriverDataType_lacks_8bit()
|
||||
{
|
||||
// DriverDataType has no 8-bit type; conservative widen to Int16. Documented so a
|
||||
// future Core.Abstractions PR that adds Int8/Byte can find this call site.
|
||||
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.SByte)).ShouldBe(DriverDataType.Int16);
|
||||
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.Byte)).ShouldBe(DriverDataType.Int16);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void MapUpstreamDataType_falls_back_to_String_for_unknown_custom_types()
|
||||
{
|
||||
// Custom vendor extension object — NodeId in namespace 2 that isn't a standard type.
|
||||
OpcUaClientDriver.MapUpstreamDataType(new NodeId("CustomStruct", 2)).ShouldBe(DriverDataType.String);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void MapUpstreamDataType_handles_UtcTime_as_DateTime()
|
||||
{
|
||||
OpcUaClientDriver.MapUpstreamDataType(new NodeId((uint)DataTypes.UtcTime)).ShouldBe(DriverDataType.DateTime);
|
||||
}
|
||||
|
||||
[Theory]
|
||||
[InlineData((byte)0, SecurityClassification.ViewOnly)] // no access flags set
|
||||
[InlineData((byte)1, SecurityClassification.ViewOnly)] // CurrentRead only
|
||||
[InlineData((byte)2, SecurityClassification.Operate)] // CurrentWrite only
|
||||
[InlineData((byte)3, SecurityClassification.Operate)] // CurrentRead + CurrentWrite
|
||||
[InlineData((byte)0x0F, SecurityClassification.Operate)] // read+write+historyRead+historyWrite
|
||||
[InlineData((byte)0x04, SecurityClassification.ViewOnly)] // HistoryRead only — no Write bit
|
||||
public void MapAccessLevelToSecurityClass_respects_CurrentWrite_bit(byte accessLevel, SecurityClassification expected)
|
||||
{
|
||||
OpcUaClientDriver.MapAccessLevelToSecurityClass(accessLevel).ShouldBe(expected);
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,59 @@
|
||||
using System.Security.Cryptography;
|
||||
using System.Security.Cryptography.X509Certificates;
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientCertAuthTests
|
||||
{
|
||||
[Fact]
|
||||
public void BuildCertificateIdentity_rejects_missing_path()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions { AuthType = OpcUaAuthType.Certificate };
|
||||
Should.Throw<InvalidOperationException>(() => OpcUaClientDriver.BuildCertificateIdentity(opts))
|
||||
.Message.ShouldContain("UserCertificatePath");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BuildCertificateIdentity_rejects_nonexistent_file()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
AuthType = OpcUaAuthType.Certificate,
|
||||
UserCertificatePath = Path.Combine(Path.GetTempPath(), $"does-not-exist-{Guid.NewGuid():N}.pfx"),
|
||||
};
|
||||
Should.Throw<FileNotFoundException>(() => OpcUaClientDriver.BuildCertificateIdentity(opts));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void BuildCertificateIdentity_loads_a_valid_PFX_with_private_key()
|
||||
{
|
||||
// Generate a self-signed cert on the fly so the test doesn't ship a static PFX.
|
||||
// The driver doesn't care about the issuer — just needs a cert with a private key.
|
||||
using var rsa = RSA.Create(2048);
|
||||
var req = new CertificateRequest("CN=OpcUaClientCertAuthTests", rsa,
|
||||
HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
|
||||
var cert = req.CreateSelfSigned(DateTimeOffset.UtcNow.AddMinutes(-5), DateTimeOffset.UtcNow.AddHours(1));
|
||||
|
||||
var tmpPath = Path.Combine(Path.GetTempPath(), $"opcua-cert-test-{Guid.NewGuid():N}.pfx");
|
||||
File.WriteAllBytes(tmpPath, cert.Export(X509ContentType.Pfx, "testpw"));
|
||||
try
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
AuthType = OpcUaAuthType.Certificate,
|
||||
UserCertificatePath = tmpPath,
|
||||
UserCertificatePassword = "testpw",
|
||||
};
|
||||
var identity = OpcUaClientDriver.BuildCertificateIdentity(opts);
|
||||
identity.ShouldNotBeNull();
|
||||
identity.TokenType.ShouldBe(Opc.Ua.UserTokenType.Certificate);
|
||||
}
|
||||
finally
|
||||
{
|
||||
try { File.Delete(tmpPath); } catch { /* best-effort */ }
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,81 @@
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientFailoverTests
|
||||
{
|
||||
[Fact]
|
||||
public void ResolveEndpointCandidates_prefers_EndpointUrls_when_provided()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
EndpointUrl = "opc.tcp://fallback:4840",
|
||||
EndpointUrls = ["opc.tcp://primary:4840", "opc.tcp://backup:4841"],
|
||||
};
|
||||
var list = OpcUaClientDriver.ResolveEndpointCandidates(opts);
|
||||
list.Count.ShouldBe(2);
|
||||
list[0].ShouldBe("opc.tcp://primary:4840");
|
||||
list[1].ShouldBe("opc.tcp://backup:4841");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void ResolveEndpointCandidates_falls_back_to_single_EndpointUrl_when_list_empty()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions { EndpointUrl = "opc.tcp://only:4840" };
|
||||
var list = OpcUaClientDriver.ResolveEndpointCandidates(opts);
|
||||
list.Count.ShouldBe(1);
|
||||
list[0].ShouldBe("opc.tcp://only:4840");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void ResolveEndpointCandidates_empty_list_treated_as_fallback_to_EndpointUrl()
|
||||
{
|
||||
// Explicit empty list should still fall back to the single-URL shortcut rather than
|
||||
// producing a zero-candidate sweep that would immediately throw with no URLs tried.
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
EndpointUrl = "opc.tcp://single:4840",
|
||||
EndpointUrls = [],
|
||||
};
|
||||
OpcUaClientDriver.ResolveEndpointCandidates(opts).Count.ShouldBe(1);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void HostName_uses_first_candidate_before_connect()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
EndpointUrls = ["opc.tcp://primary:4840", "opc.tcp://backup:4841"],
|
||||
};
|
||||
using var drv = new OpcUaClientDriver(opts, "opcua-host");
|
||||
drv.HostName.ShouldBe("opc.tcp://primary:4840",
|
||||
"pre-connect the dashboard should show the first candidate URL so operators can link back");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Initialize_against_all_unreachable_endpoints_throws_AggregateException_listing_each()
|
||||
{
|
||||
// Port 1 + port 2 + port 3 on loopback are all guaranteed closed (TCP RST immediate).
|
||||
// Failover sweep should attempt all three and throw AggregateException naming each URL
|
||||
// so operators see exactly which candidates were tried.
|
||||
var opts = new OpcUaClientDriverOptions
|
||||
{
|
||||
EndpointUrls = ["opc.tcp://127.0.0.1:1", "opc.tcp://127.0.0.1:2", "opc.tcp://127.0.0.1:3"],
|
||||
PerEndpointConnectTimeout = TimeSpan.FromMilliseconds(500),
|
||||
Timeout = TimeSpan.FromMilliseconds(500),
|
||||
AutoAcceptCertificates = true,
|
||||
};
|
||||
using var drv = new OpcUaClientDriver(opts, "opcua-failover");
|
||||
|
||||
var ex = await Should.ThrowAsync<AggregateException>(async () =>
|
||||
await drv.InitializeAsync("{}", TestContext.Current.CancellationToken));
|
||||
|
||||
ex.Message.ShouldContain("127.0.0.1:1");
|
||||
ex.Message.ShouldContain("127.0.0.1:2");
|
||||
ex.Message.ShouldContain("127.0.0.1:3");
|
||||
drv.GetHealth().State.ShouldBe(DriverState.Faulted);
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,91 @@
|
||||
using Opc.Ua;
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientHistoryTests
|
||||
{
|
||||
[Theory]
|
||||
[InlineData(HistoryAggregateType.Average)]
|
||||
[InlineData(HistoryAggregateType.Minimum)]
|
||||
[InlineData(HistoryAggregateType.Maximum)]
|
||||
[InlineData(HistoryAggregateType.Total)]
|
||||
[InlineData(HistoryAggregateType.Count)]
|
||||
public void MapAggregateToNodeId_returns_standard_Part13_aggregate_for_every_enum(HistoryAggregateType agg)
|
||||
{
|
||||
var nodeId = OpcUaClientDriver.MapAggregateToNodeId(agg);
|
||||
NodeId.IsNull(nodeId).ShouldBeFalse();
|
||||
// Every mapping should resolve to an AggregateFunction_* NodeId (namespace 0, numeric id).
|
||||
nodeId.NamespaceIndex.ShouldBe((ushort)0);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void MapAggregateToNodeId_rejects_invalid_enum_value()
|
||||
{
|
||||
// Defense-in-depth: a future HistoryAggregateType addition mustn't silently fall through.
|
||||
Should.Throw<ArgumentOutOfRangeException>(() =>
|
||||
OpcUaClientDriver.MapAggregateToNodeId((HistoryAggregateType)99));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadRawAsync_without_initialize_throws_InvalidOperationException()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
|
||||
await Should.ThrowAsync<InvalidOperationException>(async () =>
|
||||
await drv.ReadRawAsync("ns=2;s=Counter",
|
||||
DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow, 1000,
|
||||
TestContext.Current.CancellationToken));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadRawAsync_with_malformed_NodeId_returns_empty_result_not_throw()
|
||||
{
|
||||
// Same defensive pattern as ReadAsync / WriteAsync — malformed NodeId short-circuits
|
||||
// to an empty result rather than crashing a batch history call. Needs init via the
|
||||
// throw path first, then we pass "" to trigger the parse-fail branch inside
|
||||
// ExecuteHistoryReadAsync. The init itself fails against 127.0.0.1:1 so we stop there.
|
||||
// Not runnable without init — keep as placeholder for when the in-process fixture
|
||||
// PR lands.
|
||||
await Task.CompletedTask;
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadProcessedAsync_without_initialize_throws_InvalidOperationException()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
|
||||
await Should.ThrowAsync<InvalidOperationException>(async () =>
|
||||
await drv.ReadProcessedAsync("ns=2;s=Counter",
|
||||
DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow,
|
||||
TimeSpan.FromSeconds(10), HistoryAggregateType.Average,
|
||||
TestContext.Current.CancellationToken));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadAtTimeAsync_without_initialize_throws_InvalidOperationException()
|
||||
{
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-hist-uninit");
|
||||
await Should.ThrowAsync<InvalidOperationException>(async () =>
|
||||
await drv.ReadAtTimeAsync("ns=2;s=Counter",
|
||||
[DateTime.UtcNow.AddMinutes(-5), DateTime.UtcNow],
|
||||
TestContext.Current.CancellationToken));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task ReadEventsAsync_throws_NotSupportedException_as_documented()
|
||||
{
|
||||
// The IHistoryProvider default implementation throws; the OPC UA Client driver
|
||||
// deliberately inherits that default (see PR 76 commit body) because the OPC UA
|
||||
// client call path needs an EventFilter SelectClauses spec the interface doesn't carry.
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-events-default");
|
||||
await Should.ThrowAsync<NotSupportedException>(async () =>
|
||||
await ((IHistoryProvider)drv).ReadEventsAsync(
|
||||
sourceName: null,
|
||||
startUtc: DateTime.UtcNow.AddMinutes(-5),
|
||||
endUtc: DateTime.UtcNow,
|
||||
maxEvents: 100,
|
||||
cancellationToken: TestContext.Current.CancellationToken));
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,36 @@
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests;
|
||||
|
||||
/// <summary>
|
||||
/// Scaffold tests for <see cref="SessionReconnectHandler"/> wiring. Wire-level
|
||||
/// disconnect-reconnect-resume coverage against a live upstream server lands with the
|
||||
/// in-process fixture — too much machinery for a unit-test-only lane.
|
||||
/// </summary>
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class OpcUaClientReconnectTests
|
||||
{
|
||||
[Fact]
|
||||
public void Default_ReconnectPeriod_matches_driver_specs_5_seconds()
|
||||
{
|
||||
new OpcUaClientDriverOptions().ReconnectPeriod.ShouldBe(TimeSpan.FromSeconds(5));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Options_ReconnectPeriod_is_configurable_for_aggressive_or_relaxed_retry()
|
||||
{
|
||||
var opts = new OpcUaClientDriverOptions { ReconnectPeriod = TimeSpan.FromMilliseconds(500) };
|
||||
opts.ReconnectPeriod.ShouldBe(TimeSpan.FromMilliseconds(500));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Driver_starts_with_no_reconnect_handler_active_pre_init()
|
||||
{
|
||||
// The reconnect handler is lazy — spun up only when a bad keep-alive fires. Pre-init
|
||||
// there's no session to reconnect, so the field must be null (indirectly verified by
|
||||
// the lifecycle-shape test suite catching any accidental construction).
|
||||
using var drv = new OpcUaClientDriver(new OpcUaClientDriverOptions(), "opcua-reconnect");
|
||||
drv.GetHealth().State.ShouldBe(Core.Abstractions.DriverState.Unknown);
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user