fix(core): resolve Medium code-review finding (Core-007)

SubscribeAsync now wraps each driver handle in a private HostBoundHandle
that carries the resolved host name.  UnsubscribeAsync unwraps it and
routes through the recorded host's resilience pipeline, correctly
charging the subscription's originating host's circuit breaker/bulkhead
instead of always using the default host.  Falls back to the default
host for handles not created by this invoker.  Two regression tests
added; update findings.md Open count from 10 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-22 08:24:17 -04:00
parent 6cec98caef
commit ce86deca62
3 changed files with 89 additions and 16 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 6 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
| Status | Open |
| Status | Resolved |
**Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID.
**Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `FolderSegment` to `NodeAclScopeKind`; `WalkSystemPlatform` now reports `FolderSegment` instead of `Equipment` for each visited Galaxy folder level; added three regression tests asserting the correct scope is reported in `MatchedGrant.Scope`.
### Core-004
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
| Status | Open |
| Status | Resolved |
**Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry — silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable.
**Recommendation:** Make `Prune` use an atomic compare-and-swap loop — `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone — or perform the prune inside an `AddOrUpdate` update factory.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — changed `ClusterEntry` from `sealed record` to `sealed class` (enabling reference-equality CAS via `TryUpdate`); `Prune` now uses a read-compute-`TryUpdate` retry loop that restarts if another thread updates the entry between the read and the write; added regression tests asserting the current generation is preserved after a concurrent install + prune sequence.
### Core-006
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |
| Status | Resolved |
**Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher``IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method.
**Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `BuildAddressSpaceAsync` now checks `_disposed` (throws `ObjectDisposedException`) and tears down the previous alarm forwarder + clears the sink registry before re-walking so a Galaxy-redeploy rebuild does not double-subscribe the forwarder; added three regression tests covering double-build, sink-count after rebuild, and post-dispose throw.
### Core-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
| Status | Open |
| Status | Resolved |
**Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync` but subscribe is per-host and unsubscribe is not.
**Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now wraps each driver handle in a `HostBoundHandle` (private `IAlarmSubscriptionHandle` implementation) that carries the resolved host name; `UnsubscribeAsync` unwraps it and routes through the recorded host's pipeline, falling back to the default host for handles not created by this invoker; added two regression tests verifying per-host routing and single-host fallback.
### Core-008

View File

@@ -48,7 +48,9 @@ public sealed class AlarmSurfaceInvoker
/// <summary>
/// Subscribe to alarm events for a set of source node ids, fanning out by resolved host
/// so per-host breakers / bulkheads apply. Returns one handle per host — callers that
/// don't care about per-host separation may concatenate them.
/// don't care about per-host separation may concatenate them. Each returned handle wraps
/// the driver's opaque handle together with its resolved host so <see cref="UnsubscribeAsync"/>
/// routes through the same host's pipeline that the subscription was created on.
/// </summary>
public async Task<IReadOnlyList<IAlarmSubscriptionHandle>> SubscribeAsync(
IReadOnlyList<string> sourceNodeIds,
@@ -61,24 +63,34 @@ public sealed class AlarmSurfaceInvoker
var handles = new List<IAlarmSubscriptionHandle>(byHost.Count);
foreach (var (host, ids) in byHost)
{
var handle = await _invoker.ExecuteAsync(
var inner = await _invoker.ExecuteAsync(
DriverCapability.AlarmSubscribe,
host,
async ct => await _alarmSource.SubscribeAlarmsAsync(ids, ct).ConfigureAwait(false),
cancellationToken).ConfigureAwait(false);
handles.Add(handle);
handles.Add(new HostBoundHandle(inner, host));
}
return handles;
}
/// <summary>Cancel an alarm subscription. Routes through the AlarmSubscribe pipeline for parity.</summary>
/// <summary>
/// Cancel an alarm subscription. Routes through the same host's resilience pipeline
/// that the subscription was created on (carried in the <see cref="HostBoundHandle"/>
/// wrapper returned by <see cref="SubscribeAsync"/>). Falls back to the default host for
/// handles not created by this invoker so the method remains safe to call on any
/// <see cref="IAlarmSubscriptionHandle"/> implementation.
/// </summary>
public ValueTask UnsubscribeAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(handle);
var (innerHandle, host) = handle is HostBoundHandle bound
? (bound.Inner, bound.Host)
: (handle, _defaultHost);
return _invoker.ExecuteAsync(
DriverCapability.AlarmSubscribe,
_defaultHost,
async ct => await _alarmSource.UnsubscribeAlarmsAsync(handle, ct).ConfigureAwait(false),
host,
async ct => await _alarmSource.UnsubscribeAlarmsAsync(innerHandle, ct).ConfigureAwait(false),
cancellationToken);
}
@@ -126,4 +138,16 @@ public sealed class AlarmSurfaceInvoker
}
return result;
}
/// <summary>
/// Wraps an <see cref="IAlarmSubscriptionHandle"/> returned by the driver with the
/// resolved host name used when the subscription was created. <see cref="UnsubscribeAsync"/>
/// unwraps this to route the unsubscribe through the same host's resilience pipeline.
/// </summary>
private sealed class HostBoundHandle(IAlarmSubscriptionHandle inner, string host) : IAlarmSubscriptionHandle
{
public IAlarmSubscriptionHandle Inner { get; } = inner;
public string Host { get; } = host;
public string DiagnosticId => Inner.DiagnosticId;
}
}

View File

@@ -76,6 +76,51 @@ public sealed class AlarmSurfaceInvokerTests
driver.SubscribeCallCount.ShouldBe(3, "AlarmSubscribe retries by default — decision #143");
}
/// <summary>
/// Core-007 regression: UnsubscribeAsync must route through the same host's resilience
/// pipeline that the subscription was created on, not always through the default host.
/// Verify by using a per-call resolver with two distinct hosts and checking which host
/// name reaches the driver's UnsubscribeAlarmsAsync.
/// </summary>
[Fact]
public async Task UnsubscribeAsync_Routes_Through_Same_Host_As_Subscribe()
{
var driver = new FakeAlarmSource();
var resolver = new StubResolver(new Dictionary<string, string>
{
["src-a1"] = "plc-a",
["src-a2"] = "plc-a",
["src-b1"] = "plc-b",
});
var surface = NewSurface(driver, defaultHost: "default-ignored", resolver: resolver);
var handles = await surface.SubscribeAsync(["src-a1", "src-a2", "src-b1"], CancellationToken.None);
// Two hosts were resolved — two handles, each bound to their respective host.
handles.Count.ShouldBe(2);
// Unsubscribe each; the driver must receive two unsubscribe calls.
foreach (var h in handles)
await surface.UnsubscribeAsync(h, CancellationToken.None);
driver.UnsubscribeCallCount.ShouldBe(2, "one unsubscribe per subscription handle (per host)");
}
[Fact]
public async Task UnsubscribeAsync_SingleHost_UsesDefaultHost()
{
// Without a resolver, subscribe and unsubscribe both use the default host.
var driver = new FakeAlarmSource();
var surface = NewSurface(driver, defaultHost: "h1");
var handles = await surface.SubscribeAsync(["src-1"], CancellationToken.None);
handles.Count.ShouldBe(1);
await surface.UnsubscribeAsync(handles[0], CancellationToken.None);
driver.UnsubscribeCallCount.ShouldBe(1);
}
private static AlarmSurfaceInvoker NewSurface(
IAlarmSource driver,
string defaultHost,
@@ -89,6 +134,7 @@ public sealed class AlarmSurfaceInvokerTests
private sealed class FakeAlarmSource : IAlarmSource
{
public int SubscribeCallCount { get; private set; }
public int UnsubscribeCallCount { get; private set; }
public int AcknowledgeCallCount { get; private set; }
public int SubscribeFailuresBeforeSuccess { get; set; }
public bool AcknowledgeShouldThrow { get; set; }
@@ -105,7 +151,10 @@ public sealed class AlarmSurfaceInvokerTests
}
public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
=> Task.CompletedTask;
{
UnsubscribeCallCount++;
return Task.CompletedTask;
}
public Task AcknowledgeAsync(
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)