fix(core): resolve Low code-review findings (Core-004,008,009,010,011,012)

- Core-004: add ConfigureAwait(false) to DriverHost.RegisterAsync /
  UnregisterAsync / DisposeAsync.
- Core-008: rewrite the BuildAddressSpaceAsync XML doc to correctly name
  the caller (OpcUaApplicationHost.PopulateAddressSpaces) that owns the
  per-driver isolation.
- Core-009: snapshot DriverResilienceOptions once per non-idempotent write
  in CapabilityInvoker.ExecuteWriteAsync.
- Core-010: switch DriverResilienceOptions.Resolve to TryGetValue with a
  diagnostic error message when a tier table is missing a capability.
- Core-011: add an optional diagnostic callback to PermissionTrieBuilder
  so production callers can surface scope-path mismatches.
- Core-012: correct the stale WedgeDetector ctor summary and add the
  Reconnecting row to DriverHealthReport's state matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-23 05:38:09 -04:00
parent ff2e75ab98
commit 8be6afbda4
15 changed files with 656 additions and 28 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. The server host has no synchronization context so behaviour is currently correct, but the inconsistency is a maintenance hazard and a deviation from the established convention in `Core.Resilience`.
**Recommendation:** Add `.ConfigureAwait(false)` to the three awaited calls in `DriverHost.RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `.ConfigureAwait(false)` to the three awaited driver calls in `RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`; added three `RegisterAsync/UnregisterAsync/DisposeAsync_Does_Not_Capture_SynchronizationContext` regression tests that install a tracking `SynchronizationContext` on a dedicated thread and assert zero captured posts.
### Core-005
@@ -138,13 +138,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |
| Status | Resolved |
**Description:** The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception from `discovery.DiscoverAsync` propagates straight out unhandled, and nothing here marks a subtree Faulted. The isolation is presumably done by the server-layer caller, but the comment asserts behaviour this class does not implement.
**Recommendation:** Either implement the documented isolation in `GenericDriverNodeManager`, or correct the XML doc to state that exception isolation is the caller's responsibility and name the type that performs it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — corrected the `BuildAddressSpaceAsync` XML doc to (a) explicitly state exception isolation is the caller's responsibility, and (b) name the type that performs it (`Server.OpcUa.OpcUaApplicationHost.PopulateAddressSpaces`); added `BuildAddressSpaceAsync_Propagates_Discovery_Exceptions_To_Caller` regression test verifying the documented propagation behaviour.
### Core-009
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
| Status | Open |
| Status | Resolved |
**Description:** `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path it rebuilds a fresh `DriverResilienceOptions` and a one-entry dictionary on every non-idempotent write, and the redundant accessor calls could observe two different snapshots if an Admin edit lands between them. Phase 6.1 budgets a 1% pipeline overhead; this is unnecessary allocation plus a minor consistency hazard.
**Recommendation:** Capture `var options = _optionsAccessor();` once at the top of the non-idempotent branch and derive both the `with` and the `Resolve` call from that snapshot. Consider caching the no-retry pipeline keyed on `(hostName, non-idempotent)`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ExecuteWriteAsync` now captures `_optionsAccessor()` into a single `snapshot` local at the top of the non-idempotent branch; the `with` expression and the `Resolve(Write)` call both derive from that snapshot so the two values are guaranteed coherent and only one accessor invocation occurs per call. Added `ExecuteWriteAsync_NonIdempotent_Snapshots_Options_Once_Per_Call` (counts invocations) and `ExecuteWriteAsync_NonIdempotent_Uses_Consistent_Options_Snapshot` (alternating-accessor) regression tests.
### Core-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will make `Resolve` throw `KeyNotFoundException` at runtime on the capability hot path rather than failing at build time. The two are coupled by convention only.
**Recommendation:** Either add a `default` arm to `Resolve` returning a conservative policy (and logging), or add a unit-test invariant asserting every `DriverCapability` value is present in each tier's default table.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `Resolve` now uses `TryGetValue` and throws a diagnostic `KeyNotFoundException` whose message names the missing capability + tier and points to `GetTierDefaults` when a capability is missing from both the override map and the tier table; the existing `TierDefaults_Cover_EveryCapability` test invariant prevents this in shipped code, and added `Resolve_Returns_NonNull_Policy_For_Every_Capability` (per-tier exhaustive) + `Resolve_Throws_Diagnostic_When_Capability_Missing_From_Tier_Defaults` regression tests.
### Core-011
@@ -183,13 +183,13 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
| Status | Open |
| Status | Resolved |
**Description:** `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for deterministic tests, not for production"). The fallback silently produces a structurally incorrect trie when `scopePaths` is null or a row's `ScopeId` is missing — a UnsLine-scoped grant ends up as a direct child of the root, so `WalkEquipment` / `WalkSystemPlatform` never reach it and the grant is effectively dropped, with no diagnostic. There is no test asserting the production multi-level descent versus the fallback.
**Recommendation:** Add unit tests covering `Build` with `scopePaths` producing the correct multi-level trie and the missing-`ScopeId` fallback. Have `Descend` surface a diagnostic (or throw outside test configuration) when a sub-cluster row cannot be located in `scopePaths`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added optional `Action<PermissionTrieBuildDiagnostic>? diagnostic` parameter to `PermissionTrieBuilder.Build`; `Descend` now invokes the callback with a `MissingScopePath` diagnostic when a sub-cluster row's `ScopeId` is absent from a supplied (non-null) `scopePaths` lookup so production callers can log + surface orphan grants instead of silently dropping them. New `PermissionTrieBuilderTests` covers (a) production multi-level descent with sibling-line non-leakage, (b) the deterministic-test fallback, (c) the diagnostic firing on a missing scope-path entry, (d) no diagnostic when all rows resolve, and (e) no diagnostic when `scopePaths` is null (explicit test mode).
### Core-012
@@ -198,10 +198,10 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
| Status | Open |
| Status | Resolved |
**Description:** Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is documented as stateless; the comment describes nothing the constructor does. (2) `DriverHealthReport` — the `<remarks>` state matrix lists Unknown, Initializing, Healthy, Degraded, Faulted but `Aggregate` (lines 42-44) also folds `DriverState.Reconnecting` into the Degraded verdict. `Reconnecting` is a real `DriverState` member absent from the documented matrix.
**Recommendation:** Replace the `WedgeDetector` constructor `<summary>` with an accurate description (e.g. "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s"). Add `Reconnecting` to the `DriverHealthReport` `<remarks>` state matrix and state it maps to Degraded.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the `WedgeDetector(.ctor)` `<summary>` with an accurate "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s" description plus a `<param>` block; added the `Reconnecting` row to the `DriverHealthReport` `<remarks>` state matrix and updated the verdict-rule prose. Added `WedgeDetectorTests.Doc_Constructor_Summary_Describes_Threshold_Clamp` and `DriverHealthReportTests.Doc_State_Matrix_Includes_Reconnecting` regression tests that parse the generated `.xml` doc to assert the strings, plus `Any_Reconnecting_WithoutFaultedOrNotReady_IsDegraded` confirming the documented Reconnecting → Degraded behaviour.