fix(core): resolve Low code-review findings (Core-004,008,009,010,011,012)

- Core-004: add ConfigureAwait(false) to DriverHost.RegisterAsync /
  UnregisterAsync / DisposeAsync.
- Core-008: rewrite the BuildAddressSpaceAsync XML doc to correctly name
  the caller (OpcUaApplicationHost.PopulateAddressSpaces) that owns the
  per-driver isolation.
- Core-009: snapshot DriverResilienceOptions once per non-idempotent write
  in CapabilityInvoker.ExecuteWriteAsync.
- Core-010: switch DriverResilienceOptions.Resolve to TryGetValue with a
  diagnostic error message when a tier table is missing a capability.
- Core-011: add an optional diagnostic callback to PermissionTrieBuilder
  so production callers can surface scope-path mismatches.
- Core-012: correct the stale WedgeDetector ctor summary and add the
  Reconnecting row to DriverHealthReport's state matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-23 05:38:09 -04:00
parent ff2e75ab98
commit 8be6afbda4
15 changed files with 656 additions and 28 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. The server host has no synchronization context so behaviour is currently correct, but the inconsistency is a maintenance hazard and a deviation from the established convention in `Core.Resilience`.
**Recommendation:** Add `.ConfigureAwait(false)` to the three awaited calls in `DriverHost.RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added `.ConfigureAwait(false)` to the three awaited driver calls in `RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`; added three `RegisterAsync/UnregisterAsync/DisposeAsync_Does_Not_Capture_SynchronizationContext` regression tests that install a tracking `SynchronizationContext` on a dedicated thread and assert zero captured posts.
### Core-005
@@ -138,13 +138,13 @@
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |
| Status | Resolved |
**Description:** The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception from `discovery.DiscoverAsync` propagates straight out unhandled, and nothing here marks a subtree Faulted. The isolation is presumably done by the server-layer caller, but the comment asserts behaviour this class does not implement.
**Recommendation:** Either implement the documented isolation in `GenericDriverNodeManager`, or correct the XML doc to state that exception isolation is the caller's responsibility and name the type that performs it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — corrected the `BuildAddressSpaceAsync` XML doc to (a) explicitly state exception isolation is the caller's responsibility, and (b) name the type that performs it (`Server.OpcUa.OpcUaApplicationHost.PopulateAddressSpaces`); added `BuildAddressSpaceAsync_Propagates_Discovery_Exceptions_To_Caller` regression test verifying the documented propagation behaviour.
### Core-009
@@ -153,13 +153,13 @@
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
| Status | Open |
| Status | Resolved |
**Description:** `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path it rebuilds a fresh `DriverResilienceOptions` and a one-entry dictionary on every non-idempotent write, and the redundant accessor calls could observe two different snapshots if an Admin edit lands between them. Phase 6.1 budgets a 1% pipeline overhead; this is unnecessary allocation plus a minor consistency hazard.
**Recommendation:** Capture `var options = _optionsAccessor();` once at the top of the non-idempotent branch and derive both the `with` and the `Resolve` call from that snapshot. Consider caching the no-retry pipeline keyed on `(hostName, non-idempotent)`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `ExecuteWriteAsync` now captures `_optionsAccessor()` into a single `snapshot` local at the top of the non-idempotent branch; the `with` expression and the `Resolve(Write)` call both derive from that snapshot so the two values are guaranteed coherent and only one accessor invocation occurs per call. Added `ExecuteWriteAsync_NonIdempotent_Snapshots_Options_Once_Per_Call` (counts invocations) and `ExecuteWriteAsync_NonIdempotent_Uses_Consistent_Options_Snapshot` (alternating-accessor) regression tests.
### Core-010
@@ -168,13 +168,13 @@
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
| Status | Open |
| Status | Resolved |
**Description:** `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will make `Resolve` throw `KeyNotFoundException` at runtime on the capability hot path rather than failing at build time. The two are coupled by convention only.
**Recommendation:** Either add a `default` arm to `Resolve` returning a conservative policy (and logging), or add a unit-test invariant asserting every `DriverCapability` value is present in each tier's default table.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — `Resolve` now uses `TryGetValue` and throws a diagnostic `KeyNotFoundException` whose message names the missing capability + tier and points to `GetTierDefaults` when a capability is missing from both the override map and the tier table; the existing `TierDefaults_Cover_EveryCapability` test invariant prevents this in shipped code, and added `Resolve_Returns_NonNull_Policy_For_Every_Capability` (per-tier exhaustive) + `Resolve_Throws_Diagnostic_When_Capability_Missing_From_Tier_Defaults` regression tests.
### Core-011
@@ -183,13 +183,13 @@
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
| Status | Open |
| Status | Resolved |
**Description:** `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for deterministic tests, not for production"). The fallback silently produces a structurally incorrect trie when `scopePaths` is null or a row's `ScopeId` is missing — a UnsLine-scoped grant ends up as a direct child of the root, so `WalkEquipment` / `WalkSystemPlatform` never reach it and the grant is effectively dropped, with no diagnostic. There is no test asserting the production multi-level descent versus the fallback.
**Recommendation:** Add unit tests covering `Build` with `scopePaths` producing the correct multi-level trie and the missing-`ScopeId` fallback. Have `Descend` surface a diagnostic (or throw outside test configuration) when a sub-cluster row cannot be located in `scopePaths`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — added optional `Action<PermissionTrieBuildDiagnostic>? diagnostic` parameter to `PermissionTrieBuilder.Build`; `Descend` now invokes the callback with a `MissingScopePath` diagnostic when a sub-cluster row's `ScopeId` is absent from a supplied (non-null) `scopePaths` lookup so production callers can log + surface orphan grants instead of silently dropping them. New `PermissionTrieBuilderTests` covers (a) production multi-level descent with sibling-line non-leakage, (b) the deterministic-test fallback, (c) the diagnostic firing on a missing scope-path entry, (d) no diagnostic when all rows resolve, and (e) no diagnostic when `scopePaths` is null (explicit test mode).
### Core-012
@@ -198,10 +198,10 @@
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
| Status | Open |
| Status | Resolved |
**Description:** Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is documented as stateless; the comment describes nothing the constructor does. (2) `DriverHealthReport` — the `<remarks>` state matrix lists Unknown, Initializing, Healthy, Degraded, Faulted but `Aggregate` (lines 42-44) also folds `DriverState.Reconnecting` into the Degraded verdict. `Reconnecting` is a real `DriverState` member absent from the documented matrix.
**Recommendation:** Replace the `WedgeDetector` constructor `<summary>` with an accurate description (e.g. "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s"). Add `Reconnecting` to the `DriverHealthReport` `<remarks>` state matrix and state it maps to Degraded.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-23 — replaced the `WedgeDetector(.ctor)` `<summary>` with an accurate "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s" description plus a `<param>` block; added the `Reconnecting` row to the `DriverHealthReport` `<remarks>` state matrix and updated the verdict-rule prose. Added `WedgeDetectorTests.Doc_Constructor_Summary_Describes_Threshold_Clamp` and `DriverHealthReportTests.Doc_State_Matrix_Includes_Reconnecting` regression tests that parse the generated `.xml` doc to assert the strings, plus `Any_Reconnecting_WithoutFaultedOrNotReady_IsDegraded` confirming the documented Reconnecting → Degraded behaviour.

View File

@@ -26,11 +26,27 @@ public static class PermissionTrieBuilder
/// Build a trie for one cluster/generation from the supplied rows. The caller is
/// responsible for pre-filtering rows to the target generation + cluster.
/// </summary>
/// <param name="clusterId">Cluster the trie is being built for; rows for other clusters are skipped.</param>
/// <param name="generationId">Config-generation the rows belong to; stamped on the returned trie.</param>
/// <param name="rows">ACL rows for this cluster + generation.</param>
/// <param name="scopePaths">
/// Optional <c>ScopeId</c> → multi-level trie-path lookup. When supplied, sub-cluster rows
/// descend to their structurally-correct trie node. When null, sub-cluster rows fall back
/// to a direct child of the trie root keyed on <c>ScopeId</c> — deterministic-test mode.
/// </param>
/// <param name="diagnostic">
/// Optional callback invoked when a sub-cluster row's <c>ScopeId</c> cannot be located
/// in <paramref name="scopePaths"/>. Production callers should wire a logger here so
/// orphaned grants surface — silently dropping them under the wrong trie level was the
/// Core-011 production hazard. The callback fires only when <paramref name="scopePaths"/>
/// is non-null (a null lookup is the explicit deterministic-test fallback mode).
/// </param>
public static PermissionTrie Build(
string clusterId,
long generationId,
IReadOnlyList<NodeAcl> rows,
IReadOnlyDictionary<string, NodeAclPath>? scopePaths = null)
IReadOnlyDictionary<string, NodeAclPath>? scopePaths = null,
Action<PermissionTrieBuildDiagnostic>? diagnostic = null)
{
ArgumentException.ThrowIfNullOrWhiteSpace(clusterId);
ArgumentNullException.ThrowIfNull(rows);
@@ -45,7 +61,7 @@ public static class PermissionTrieBuilder
var node = row.ScopeKind switch
{
NodeAclScopeKind.Cluster => trie.Root,
_ => Descend(trie.Root, row, scopePaths),
_ => Descend(trie.Root, row, scopePaths, diagnostic),
};
if (node is not null)
@@ -55,16 +71,30 @@ public static class PermissionTrieBuilder
return trie;
}
private static PermissionTrieNode? Descend(PermissionTrieNode root, NodeAcl row, IReadOnlyDictionary<string, NodeAclPath>? scopePaths)
private static PermissionTrieNode? Descend(
PermissionTrieNode root,
NodeAcl row,
IReadOnlyDictionary<string, NodeAclPath>? scopePaths,
Action<PermissionTrieBuildDiagnostic>? diagnostic)
{
if (string.IsNullOrEmpty(row.ScopeId)) return null;
// For sub-cluster scopes the caller supplies a path lookup so we know the containing
// namespace / UnsArea / UnsLine ids. Without a path lookup we fall back to putting the
// row directly under the root using its ScopeId — works for deterministic tests, not
// for production where the hierarchy must be honored.
// for production where the hierarchy must be honored. If a scopePaths lookup IS
// provided but is missing the row's ScopeId, surface a diagnostic so the caller can
// log the orphan instead of silently dropping the grant under an unreachable node.
if (scopePaths is null || !scopePaths.TryGetValue(row.ScopeId, out var path))
{
if (scopePaths is not null)
{
diagnostic?.Invoke(new PermissionTrieBuildDiagnostic(
NodeAclId: row.NodeAclId,
ScopeKind: row.ScopeKind,
ScopeId: row.ScopeId,
Reason: PermissionTrieBuildDiagnosticReason.MissingScopePath));
}
return EnsureChild(root, row.ScopeId);
}
@@ -95,3 +125,30 @@ public static class PermissionTrieBuilder
/// applicable; or (for SystemPlatform kind) NamespaceId / FolderSegment / .../TagId.
/// </param>
public sealed record NodeAclPath(IReadOnlyList<string> Segments);
/// <summary>
/// Diagnostic emitted by <see cref="PermissionTrieBuilder.Build"/> when a row could not be
/// placed at its structurally-correct trie node. Production callers should log these so
/// orphaned grants surface instead of being silently dropped under an unreachable node
/// (Core-011).
/// </summary>
/// <param name="NodeAclId">The offending row's logical id.</param>
/// <param name="ScopeKind">The row's <see cref="NodeAclScopeKind"/>.</param>
/// <param name="ScopeId">The row's <c>ScopeId</c> that could not be located.</param>
/// <param name="Reason">Why the diagnostic fired.</param>
public sealed record PermissionTrieBuildDiagnostic(
string NodeAclId,
NodeAclScopeKind ScopeKind,
string ScopeId,
PermissionTrieBuildDiagnosticReason Reason);
/// <summary>Reasons <see cref="PermissionTrieBuildDiagnostic"/> can be emitted.</summary>
public enum PermissionTrieBuildDiagnosticReason
{
/// <summary>
/// The row's <c>ScopeId</c> was not present in the supplied <c>scopePaths</c> lookup.
/// The grant is placed as a direct child of the trie root keyed on <c>ScopeId</c> — a
/// position the production trie walker cannot reach for multi-level scopes.
/// </summary>
MissingScopePath,
}

View File

@@ -52,7 +52,7 @@ public sealed class DriverHost : IAsyncDisposable
_drivers[id] = driver;
}
try { await driver.InitializeAsync(driverConfigJson, ct); }
try { await driver.InitializeAsync(driverConfigJson, ct).ConfigureAwait(false); }
catch
{
// Keep the driver registered — operator will see Faulted state and can reinitialize.
@@ -69,7 +69,7 @@ public sealed class DriverHost : IAsyncDisposable
_drivers.Remove(driverInstanceId);
}
try { await driver.ShutdownAsync(ct); }
try { await driver.ShutdownAsync(ct).ConfigureAwait(false); }
catch { /* shutdown is best-effort; logs elsewhere */ }
}
@@ -84,7 +84,7 @@ public sealed class DriverHost : IAsyncDisposable
foreach (var driver in snapshot)
{
try { await driver.ShutdownAsync(CancellationToken.None); } catch { /* ignore */ }
try { await driver.ShutdownAsync(CancellationToken.None).ConfigureAwait(false); } catch { /* ignore */ }
(driver as IDisposable)?.Dispose();
}
}

View File

@@ -15,11 +15,13 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Observability;
/// → /readyz 503 (not yet ready).</item>
/// <item><see cref="DriverState.Healthy"/> → /readyz 200.</item>
/// <item><see cref="DriverState.Degraded"/> → /readyz 200 with flagged driver IDs.</item>
/// <item><see cref="DriverState.Reconnecting"/> → /readyz 200 with flagged driver IDs
/// (driver alive but not serving live data; same verdict as Degraded).</item>
/// <item><see cref="DriverState.Faulted"/> → /readyz 503.</item>
/// </list>
/// The overall verdict is computed across the fleet: any Faulted → Faulted; any
/// Unknown/Initializing → NotReady; any Degraded → Degraded; else Healthy. An empty fleet
/// is Healthy (nothing to degrade).
/// Unknown/Initializing → NotReady; any Degraded or Reconnecting → Degraded; else
/// Healthy. An empty fleet is Healthy (nothing to degrade).
/// </remarks>
public static class DriverHealthReport
{

View File

@@ -39,8 +39,11 @@ public class GenericDriverNodeManager(IDriver driver) : IDisposable
/// If called a second time (e.g. Galaxy redeploy via <c>IRediscoverable.OnRediscoveryNeeded</c>)
/// the previous alarm subscription is torn down and the sink registry is cleared before
/// re-walking, preventing double delivery of alarm transitions.
/// Exception isolation (marking the driver's subtree Faulted) is the caller's responsibility —
/// exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> propagate to the caller.
/// Exception isolation (per decision #12 — marking the driver's subtree Faulted while other
/// drivers stay available) is the caller's responsibility; exceptions from
/// <see cref="ITagDiscovery.DiscoverAsync"/> propagate unhandled to the caller. The Server
/// project's <c>OpcUaApplicationHost.PopulateAddressSpaces</c> wraps this call in a per-driver
/// try/catch that logs + leaves the driver's subtree empty until a Reinitialize succeeds.
/// </summary>
public async Task BuildAddressSpaceAsync(IAddressSpaceBuilder builder, CancellationToken ct)
{

View File

@@ -118,11 +118,15 @@ public sealed class CapabilityInvoker
if (!isIdempotent)
{
var noRetryOptions = _optionsAccessor() with
// Snapshot the options exactly once per call — invoking _optionsAccessor twice can
// (a) observe two different snapshots if an Admin edit lands between them and
// (b) wastes an allocation on the per-write hot path (Phase 6.1 1% pipeline budget).
var snapshot = _optionsAccessor();
var noRetryOptions = snapshot with
{
CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
{
[DriverCapability.Write] = _optionsAccessor().Resolve(DriverCapability.Write) with { RetryCount = 0 },
[DriverCapability.Write] = snapshot.Resolve(DriverCapability.Write) with { RetryCount = 0 },
},
};
var pipeline = _builder.GetOrCreate(_driverInstanceId, $"{hostName}::non-idempotent", DriverCapability.Write, noRetryOptions);

View File

@@ -42,13 +42,27 @@ public sealed record DriverResilienceOptions
/// Look up the effective policy for a capability, falling back to tier defaults when no
/// override is configured. Never returns null.
/// </summary>
/// <exception cref="KeyNotFoundException">
/// Thrown when neither the override map nor the tier defaults carry an entry for the
/// requested capability. The <c>TierDefaults_Cover_EveryCapability</c> invariant test
/// in <c>DriverResilienceOptionsTests</c> guarantees every defined enum value is present
/// in each tier's table, so this only fires when a caller passes an out-of-range value
/// or someone adds a <see cref="DriverCapability"/> member without updating
/// <see cref="GetTierDefaults"/>. The message names the missing capability and tier.
/// </exception>
public CapabilityPolicy Resolve(DriverCapability capability)
{
if (CapabilityPolicies.TryGetValue(capability, out var policy))
return policy;
var defaults = GetTierDefaults(Tier);
return defaults[capability];
if (defaults.TryGetValue(capability, out var fallback))
return fallback;
throw new KeyNotFoundException(
$"No policy defined for capability '{capability}' under tier '{Tier}'. " +
$"This indicates a {nameof(DriverCapability)} enum value missing from {nameof(GetTierDefaults)} — " +
"add the capability to every tier's default table.");
}
/// <summary>

View File

@@ -23,7 +23,15 @@ public sealed class WedgeDetector
/// <summary>Wedge-detection threshold; pass &lt; 60 s and the detector clamps to 60 s.</summary>
public TimeSpan Threshold { get; }
/// <summary>Whether the driver reported itself <see cref="DriverState.Healthy"/> at construction.</summary>
/// <summary>
/// Construct with the wedge-detection threshold; values below 60 s clamp to 60 s so
/// the detector never fires below the documented floor.
/// </summary>
/// <param name="threshold">
/// Time without a successful unit of work after which a Healthy driver with pending
/// work is considered Faulted. Clamped to a minimum of 60 s per the plan-default of
/// 5 × PublishingInterval.
/// </param>
public WedgeDetector(TimeSpan threshold)
{
Threshold = threshold < TimeSpan.FromSeconds(60) ? TimeSpan.FromSeconds(60) : threshold;

View File

@@ -0,0 +1,155 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
using ZB.MOM.WW.OtOpcUa.Configuration.Enums;
using ZB.MOM.WW.OtOpcUa.Core.Authorization;
namespace ZB.MOM.WW.OtOpcUa.Core.Tests.Authorization;
/// <summary>
/// Core-011 regression coverage for <see cref="PermissionTrieBuilder.Build"/>'s
/// <c>Descend</c> helper:
/// <list type="bullet">
/// <item>With a <c>scopePaths</c> lookup the row must land at the correct multi-level
/// trie node — a deep <see cref="NodeAclScopeKind.UnsLine"/> grant must be visible
/// ONLY when the requested scope walks the same namespace/area/line chain.</item>
/// <item>Without a <c>scopePaths</c> entry the row falls back to a direct child of
/// the namespace root keyed on the row's <c>ScopeId</c>. The builder must surface
/// this fallback (warning callback) so callers know a grant was placed where the
/// walker can't reach it for production hierarchies — silently dropping the grant
/// is the Core-011 production hazard.</item>
/// </list>
/// </summary>
[Trait("Category", "Unit")]
public sealed class PermissionTrieBuilderTests
{
private static NodeAcl Row(string group, NodeAclScopeKind scope, string? scopeId, NodePermissions flags, string clusterId = "c1") =>
new()
{
NodeAclRowId = Guid.NewGuid(),
NodeAclId = $"acl-{Guid.NewGuid():N}",
GenerationId = 1,
ClusterId = clusterId,
LdapGroup = group,
ScopeKind = scope,
ScopeId = scopeId,
PermissionFlags = flags,
};
private static NodeScope EquipmentTag(string cluster, string ns, string area, string line, string equip, string tag) =>
new()
{
ClusterId = cluster,
NamespaceId = ns,
UnsAreaId = area,
UnsLineId = line,
EquipmentId = equip,
TagId = tag,
Kind = NodeHierarchyKind.Equipment,
};
[Fact]
public void Build_With_ScopePaths_Places_UnsLine_Row_At_Correct_Multi_Level_Node()
{
// Scope path mirrors the production hierarchy: namespace → area → line.
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-42"] = new(new[] { "ns", "area-1", "line-42" }),
};
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths);
// Walk through the same chain — the grant must be reachable.
var matchOnLine = trie.CollectMatches(
EquipmentTag("c1", "ns", "area-1", "line-42", "eq-A", "tag-A"),
["cn=ops"]);
matchOnLine.Count.ShouldBe(1, "row must land at the correct multi-level trie node");
// A different line under the same area must not pick up the grant.
var matchOnOtherLine = trie.CollectMatches(
EquipmentTag("c1", "ns", "area-1", "line-99", "eq-A", "tag-A"),
["cn=ops"]);
matchOnOtherLine.ShouldBeEmpty(
"grant anchored at line-42 must not leak to sibling line-99 under the same area");
}
[Fact]
public void Build_Without_ScopePaths_Falls_Back_To_Root_Child_For_Tests()
{
// Fallback path — deterministic tests pass without a scope-path lookup. The row
// is placed as a direct child of the trie root keyed by ScopeId.
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var trie = PermissionTrieBuilder.Build("c1", 1, rows);
// Root has one child — "line-42".
trie.Root.Children.ShouldContainKey("line-42");
var node = trie.Root.Children["line-42"];
node.Grants.Count.ShouldBe(1);
}
/// <summary>
/// Core-011 regression: when a sub-cluster row's ScopeId is not in the supplied
/// <c>scopePaths</c>, the fallback diagnostic callback must fire so the caller can
/// surface a warning. Silently dropping the grant under the wrong trie level is the
/// production hazard the finding flagged.
/// </summary>
[Fact]
public void Build_Missing_ScopePath_Entry_Invokes_Diagnostic_Callback()
{
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-known"] = new(new[] { "ns", "area-1", "line-known" }),
};
// Row references a line that is NOT in the path lookup.
var rows = new[]
{
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-orphan", NodePermissions.Read),
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-known", NodePermissions.Read),
};
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
var trie = PermissionTrieBuilder.Build("c1", 1, rows, paths, diagnostics.Add);
diagnostics.Count.ShouldBe(1, "exactly one row had no matching scope-path entry");
diagnostics[0].ScopeId.ShouldBe("line-orphan");
diagnostics[0].ScopeKind.ShouldBe(NodeAclScopeKind.UnsLine);
diagnostics[0].Reason.ShouldBe(PermissionTrieBuildDiagnosticReason.MissingScopePath);
}
[Fact]
public void Build_No_Diagnostic_When_All_Sub_Cluster_Rows_Have_ScopePaths()
{
var paths = new Dictionary<string, NodeAclPath>(StringComparer.OrdinalIgnoreCase)
{
["line-A"] = new(new[] { "ns", "area-1", "line-A" }),
["line-B"] = new(new[] { "ns", "area-1", "line-B" }),
};
var rows = new[]
{
Row("cn=ops", NodeAclScopeKind.Cluster, null, NodePermissions.Read), // cluster-level — no descent
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-A", NodePermissions.Read),
Row("cn=ops", NodeAclScopeKind.UnsLine, "line-B", NodePermissions.Read),
};
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
PermissionTrieBuilder.Build("c1", 1, rows, paths, diagnostics.Add);
diagnostics.ShouldBeEmpty("no rows are missing a scope-path entry");
}
[Fact]
public void Build_Diagnostic_Callback_Optional_When_ScopePaths_Null()
{
// No diagnostics callback should fire when scopePaths itself is null — that's the
// "deterministic-test fallback" mode, not a production drop.
var rows = new[] { Row("cn=ops", NodeAclScopeKind.UnsLine, "line-42", NodePermissions.Read) };
var diagnostics = new List<PermissionTrieBuildDiagnostic>();
PermissionTrieBuilder.Build("c1", 1, rows, scopePaths: null, diagnostic: diagnostics.Add);
diagnostics.ShouldBeEmpty(
"scopePaths=null is the explicit test-fallback mode and must not emit per-row warnings");
}
}

View File

@@ -77,4 +77,162 @@ public sealed class DriverHostTests
host.RegisteredDriverIds.ShouldNotContain("d-1");
driver.ShutDown.ShouldBeTrue();
}
/// <summary>
/// Core-004 regression — DriverHost is a library type whose async calls must use
/// ConfigureAwait(false) to match the convention used by CapabilityInvoker /
/// AlarmSurfaceInvoker. Asserts the awaited driver call does not post its
/// continuation back to a captured SynchronizationContext.
/// The driver awaits an unsettled TaskCompletionSource so it does not introduce its
/// own capture — only DriverHost's await of the returned Task can drive a post.
/// </summary>
[Fact]
public async Task RegisterAsync_Does_Not_Capture_SynchronizationContext()
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-1", tcs);
var ctx = new TrackingSynchronizationContext();
// Run the DriverHost call on a dedicated thread that has our tracking SyncContext installed.
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
var registerTask = host.RegisterAsync(driver, "{}", CancellationToken.None);
// Complete the driver's InitializeAsync from a background thread so DriverHost's
// await must resume via the captured context if ConfigureAwait(false) was missing.
_ = Task.Run(() => tcs.SetResult());
await registerTask.ConfigureAwait(false);
await host.DisposeAsync().ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"RegisterAsync's awaited driver call must use ConfigureAwait(false) so the continuation does not post back to the captured context");
}
[Fact]
public async Task UnregisterAsync_Does_Not_Capture_SynchronizationContext()
{
var initTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
initTcs.SetResult();
var shutdownTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-2", initTcs, shutdownTcs);
var ctx = new TrackingSynchronizationContext();
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
await host.RegisterAsync(driver, "{}", CancellationToken.None).ConfigureAwait(false);
// After RegisterAsync we re-enter the context. Reset the post counter so we only
// observe UnregisterAsync's behaviour from here on.
((TrackingSynchronizationContext)SynchronizationContext.Current!).Reset();
var task = host.UnregisterAsync("d-cfg-2", CancellationToken.None);
_ = Task.Run(() => shutdownTcs.SetResult());
await task.ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"UnregisterAsync's awaited shutdown call must use ConfigureAwait(false)");
}
[Fact]
public async Task DisposeAsync_Does_Not_Capture_SynchronizationContext()
{
var initTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
initTcs.SetResult();
var shutdownTcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
var driver = new TcsDriver("d-cfg-3", initTcs, shutdownTcs);
var ctx = new TrackingSynchronizationContext();
var workerCtx = await RunOnContextAsync(ctx, async () =>
{
var host = new DriverHost();
await host.RegisterAsync(driver, "{}", CancellationToken.None).ConfigureAwait(false);
((TrackingSynchronizationContext)SynchronizationContext.Current!).Reset();
var task = host.DisposeAsync();
_ = Task.Run(() => shutdownTcs.SetResult());
await task.ConfigureAwait(false);
});
workerCtx.PostCount.ShouldBe(0,
"DisposeAsync's awaited shutdown call must use ConfigureAwait(false)");
}
/// <summary>
/// Run <paramref name="body"/> on a dedicated thread with <paramref name="ctx"/>
/// installed as the current SynchronizationContext, and return <paramref name="ctx"/>
/// after the body completes. The dedicated thread guarantees that resuming via the
/// captured context observably routes through our Post hook (the ThreadPool would
/// otherwise clear the context on the resuming worker).
/// </summary>
private static Task<TrackingSynchronizationContext> RunOnContextAsync(TrackingSynchronizationContext ctx, Func<Task> body)
{
var done = new TaskCompletionSource<TrackingSynchronizationContext>(TaskCreationOptions.RunContinuationsAsynchronously);
var t = new Thread(() =>
{
SynchronizationContext.SetSynchronizationContext(ctx);
try
{
// Pump posted continuations until the body completes.
var task = body();
while (!task.IsCompleted)
{
if (ctx.TryDequeue(out var work)) work();
else Thread.Sleep(1);
}
// Drain any tail continuations.
while (ctx.TryDequeue(out var work)) work();
task.GetAwaiter().GetResult();
done.SetResult(ctx);
}
catch (Exception ex) { done.SetException(ex); }
}) { IsBackground = true };
t.Start();
return done.Task;
}
/// <summary>Driver whose Initialize / Shutdown completions are caller-controlled via TCS.</summary>
private sealed class TcsDriver(string id, TaskCompletionSource initTcs, TaskCompletionSource? shutdownTcs = null) : IDriver
{
public string DriverInstanceId { get; } = id;
public string DriverType => "Tcs";
public Task InitializeAsync(string _, CancellationToken ct) => initTcs.Task;
public Task ReinitializeAsync(string _, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => (shutdownTcs ?? CompletedTcs).Task;
public DriverHealth GetHealth() => new(DriverState.Healthy, null, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
private static readonly TaskCompletionSource CompletedTcs = MakeCompleted();
private static TaskCompletionSource MakeCompleted()
{
var t = new TaskCompletionSource();
t.SetResult();
return t;
}
}
/// <summary>SynchronizationContext that queues posts to a thread-safe work list and counts them.</summary>
private sealed class TrackingSynchronizationContext : SynchronizationContext
{
private readonly System.Collections.Concurrent.ConcurrentQueue<Action> _queue = new();
public int PostCount;
public int SendCount;
public override void Post(SendOrPostCallback d, object? state)
{
Interlocked.Increment(ref PostCount);
_queue.Enqueue(() => d(state));
}
public override void Send(SendOrPostCallback d, object? state)
{
Interlocked.Increment(ref SendCount);
d(state);
}
public bool TryDequeue(out Action work) => _queue.TryDequeue(out work!);
public void Reset() { Interlocked.Exchange(ref PostCount, 0); Interlocked.Exchange(ref SendCount, 0); }
}
}

View File

@@ -143,6 +143,41 @@ public sealed class GenericDriverNodeManagerTests
nm.BuildAddressSpaceAsync(new RecordingBuilder(), CancellationToken.None));
}
/// <summary>
/// Core-008 regression: the XML doc states exception isolation is the caller's
/// responsibility — exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> must propagate
/// out of <c>BuildAddressSpaceAsync</c> unhandled so the Server layer's per-driver try/catch
/// (<c>OpcUaApplicationHost.PopulateAddressSpaces</c>) can mark the subtree Faulted.
/// </summary>
[Fact]
public async Task BuildAddressSpaceAsync_Propagates_Discovery_Exceptions_To_Caller()
{
var driver = new ThrowingDiscoveryDriver();
using var nm = new GenericDriverNodeManager(driver);
var ex = await Should.ThrowAsync<InvalidOperationException>(() =>
nm.BuildAddressSpaceAsync(new RecordingBuilder(), CancellationToken.None));
ex.Message.ShouldBe("discovery boom",
"exceptions from DiscoverAsync must propagate unhandled — exception isolation is the caller's responsibility (e.g. OpcUaApplicationHost)");
}
/// <summary>Driver whose DiscoverAsync throws — exercises the exception-isolation boundary.</summary>
private sealed class ThrowingDiscoveryDriver : IDriver, ITagDiscovery
{
public string DriverInstanceId => "throwing";
public string DriverType => "Throwing";
public Task InitializeAsync(string _, CancellationToken __) => Task.CompletedTask;
public Task ReinitializeAsync(string _, CancellationToken __) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken _) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, null, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken _) => Task.CompletedTask;
public Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken ct)
=> throw new InvalidOperationException("discovery boom");
}
// --- test doubles ---
private sealed class FakeDriver : IDriver, ITagDiscovery, IAlarmSource

View File

@@ -67,4 +67,53 @@ public sealed class DriverHealthReportTests
{
DriverHealthReport.HttpStatus(verdict).ShouldBe(expected);
}
/// <summary>
/// Core-012 regression: <see cref="DriverState.Reconnecting"/> must aggregate to
/// <see cref="ReadinessVerdict.Degraded"/> — the doc remarks state matrix lists this
/// mapping (after the Core-012 doc fix that added the Reconnecting row).
/// </summary>
[Fact]
public void Any_Reconnecting_WithoutFaultedOrNotReady_IsDegraded()
{
var verdict = DriverHealthReport.Aggregate([
new DriverHealthSnapshot("a", DriverState.Healthy),
new DriverHealthSnapshot("b", DriverState.Reconnecting),
]);
verdict.ShouldBe(ReadinessVerdict.Degraded,
"Reconnecting = driver alive but not serving live data → /readyz stays 200 while operators see the affected driver in the body");
}
/// <summary>
/// Core-012 regression: assert the XML <c>&lt;remarks&gt;</c> on
/// <see cref="DriverHealthReport"/> names <see cref="DriverState.Reconnecting"/> in its
/// state matrix. Catches a future doc-drift if someone re-aliases Reconnecting without
/// updating the matrix.
/// </summary>
[Fact]
public void Doc_State_Matrix_Includes_Reconnecting()
{
var xmlPath = Path.Combine(
AppContext.BaseDirectory,
"ZB.MOM.WW.OtOpcUa.Core.xml");
File.Exists(xmlPath).ShouldBeTrue($"expected XML doc file at {xmlPath}");
var content = File.ReadAllText(xmlPath);
var driverHealthReportRemarks = ExtractRemarksFor(content, "T:ZB.MOM.WW.OtOpcUa.Core.Observability.DriverHealthReport");
driverHealthReportRemarks.ShouldContain("Reconnecting");
}
private static string ExtractRemarksFor(string xml, string member)
{
var memberStart = xml.IndexOf($"<member name=\"{member}\"", StringComparison.Ordinal);
if (memberStart < 0) return string.Empty;
var memberEnd = xml.IndexOf("</member>", memberStart, StringComparison.Ordinal);
if (memberEnd < 0) return string.Empty;
var slice = xml.Substring(memberStart, memberEnd - memberStart);
var remarksStart = slice.IndexOf("<remarks>", StringComparison.Ordinal);
if (remarksStart < 0) return string.Empty;
var remarksEnd = slice.IndexOf("</remarks>", remarksStart, StringComparison.Ordinal);
return remarksEnd < 0 ? string.Empty : slice.Substring(remarksStart, remarksEnd - remarksStart);
}
}

View File

@@ -148,4 +148,66 @@ public sealed class CapabilityInvokerTests
builder.CachedPipelineCount.ShouldBe(2);
}
/// <summary>
/// Core-009 regression: ExecuteWriteAsync's non-idempotent branch must snapshot
/// <c>_optionsAccessor</c> exactly once per call. Calling it multiple times allocates
/// redundant options objects on the per-write hot path and creates a consistency hazard
/// where an Admin edit mid-call could observe two different snapshots.
/// </summary>
[Fact]
public async Task ExecuteWriteAsync_NonIdempotent_Snapshots_Options_Once_Per_Call()
{
var options = new DriverResilienceOptions
{
Tier = DriverTier.A,
CapabilityPolicies = new Dictionary<DriverCapability, CapabilityPolicy>
{
[DriverCapability.Write] = new(TimeoutSeconds: 2, RetryCount: 3, BreakerFailureThreshold: 5),
},
};
var accessorCalls = 0;
var invoker = new CapabilityInvoker(
new DriverResiliencePipelineBuilder(),
"drv-test",
() => { Interlocked.Increment(ref accessorCalls); return options; });
await invoker.ExecuteWriteAsync(
"host-1",
isIdempotent: false,
_ => ValueTask.FromResult(0),
CancellationToken.None);
accessorCalls.ShouldBe(1,
"ExecuteWriteAsync's non-idempotent branch must capture the options snapshot exactly once per call");
}
/// <summary>
/// Core-009 regression — companion consistency assertion: the non-idempotent branch must
/// not observe two different option snapshots even if the accessor's returned value changes
/// between calls (simulating an Admin edit landing mid-flight). With a single snapshot the
/// two derived values (<c>with</c> base + <c>Resolve(Write)</c>) come from the same options
/// instance.
/// </summary>
[Fact]
public async Task ExecuteWriteAsync_NonIdempotent_Uses_Consistent_Options_Snapshot()
{
var a = new DriverResilienceOptions { Tier = DriverTier.A };
var b = new DriverResilienceOptions { Tier = DriverTier.B };
var alternating = new[] { a, b, a, b }.AsEnumerable().GetEnumerator();
var invoker = new CapabilityInvoker(
new DriverResiliencePipelineBuilder(),
"drv-test",
() => { alternating.MoveNext(); return alternating.Current; });
// If options is read twice, the with-expression and Resolve(Write) come from
// different tier tables (A then B) — the resulting one-entry dictionary is
// inconsistent with the snapshot used for the rest of the options. Single-snapshot
// semantics guarantee the call sees a coherent view.
await Should.NotThrowAsync(async () => await invoker.ExecuteWriteAsync(
"host-1",
isIdempotent: false,
_ => ValueTask.FromResult(0),
CancellationToken.None));
}
}

View File

@@ -99,4 +99,49 @@ public sealed class DriverResilienceOptionsTests
options.Resolve(DriverCapability.Write).ShouldBe(
DriverResilienceOptions.GetTierDefaults(DriverTier.A)[DriverCapability.Write]);
}
/// <summary>
/// Core-010 regression: every <see cref="DriverCapability"/> value must successfully resolve
/// under every tier with a default <see cref="DriverResilienceOptions"/>. A future
/// enum-only addition that forgets to update <c>GetTierDefaults</c> would otherwise blow up
/// on the hot path with <see cref="KeyNotFoundException"/>.
/// </summary>
[Theory]
[InlineData(DriverTier.A)]
[InlineData(DriverTier.B)]
[InlineData(DriverTier.C)]
public void Resolve_Returns_NonNull_Policy_For_Every_Capability(DriverTier tier)
{
var options = new DriverResilienceOptions { Tier = tier };
foreach (var capability in Enum.GetValues<DriverCapability>())
{
var policy = options.Resolve(capability);
policy.ShouldNotBeNull(
$"every DriverCapability must resolve to a non-null policy for tier {tier} — {capability} did not");
}
}
/// <summary>
/// Core-010 regression: when a capability is somehow missing from BOTH the override
/// map and the tier defaults (defensive — should be impossible thanks to the
/// <c>TierDefaults_Cover_EveryCapability</c> invariant, but is the failure mode the
/// finding flagged), <c>Resolve</c> must throw a diagnostic <see cref="KeyNotFoundException"/>
/// that names the missing capability and tier — not a bare lookup failure.
/// </summary>
[Fact]
public void Resolve_Throws_Diagnostic_When_Capability_Missing_From_Tier_Defaults()
{
// Use a CapabilityPolicies dict that purposely omits one capability and use reflection
// to confirm the message names the capability when the tier defaults also omit it.
// We can't easily mutate GetTierDefaults so we exercise the documented behavior on a
// synthetic non-tier-known capability (we cast an out-of-range enum value).
var options = new DriverResilienceOptions { Tier = DriverTier.A };
var bogus = (DriverCapability)int.MaxValue;
var ex = Should.Throw<KeyNotFoundException>(() => options.Resolve(bogus));
ex.Message.ShouldContain(bogus.ToString());
ex.Message.ShouldContain(DriverTier.A.ToString());
ex.Message.ShouldContain(nameof(DriverResilienceOptions.GetTierDefaults));
}
}

View File

@@ -109,4 +109,40 @@ public sealed class WedgeDetectorTests
new DemandSignal(0, 0, 1, Now).HasPendingWork.ShouldBeTrue();
new DemandSignal(0, 0, 0, Now).HasPendingWork.ShouldBeFalse();
}
/// <summary>
/// Core-012 regression: the XML <c>&lt;summary&gt;</c> on the <see cref="WedgeDetector"/>
/// constructor must accurately describe what the constructor does (take + clamp the
/// threshold). The previous text — "Whether the driver reported itself Healthy at
/// construction" — referenced behaviour the constructor doesn't perform.
/// </summary>
[Fact]
public void Doc_Constructor_Summary_Describes_Threshold_Clamp()
{
var xmlPath = Path.Combine(
AppContext.BaseDirectory,
"ZB.MOM.WW.OtOpcUa.Core.xml");
File.Exists(xmlPath).ShouldBeTrue($"expected XML doc file at {xmlPath}");
var content = File.ReadAllText(xmlPath);
var ctorSummary = ExtractSummaryFor(content,
"M:ZB.MOM.WW.OtOpcUa.Core.Stability.WedgeDetector.#ctor(System.TimeSpan)");
ctorSummary.ShouldNotBeNullOrWhiteSpace();
ctorSummary.ShouldNotContain("reported itself");
ctorSummary.ShouldContain("threshold");
}
private static string ExtractSummaryFor(string xml, string member)
{
var memberStart = xml.IndexOf($"<member name=\"{member}\"", StringComparison.Ordinal);
if (memberStart < 0) return string.Empty;
var memberEnd = xml.IndexOf("</member>", memberStart, StringComparison.Ordinal);
if (memberEnd < 0) return string.Empty;
var slice = xml.Substring(memberStart, memberEnd - memberStart);
var sumStart = slice.IndexOf("<summary>", StringComparison.Ordinal);
if (sumStart < 0) return string.Empty;
var sumEnd = slice.IndexOf("</summary>", sumStart, StringComparison.Ordinal);
return sumEnd < 0 ? string.Empty : slice.Substring(sumStart, sumEnd - sumStart);
}
}