fix(core): resolve Medium code-review finding (Core-007)
SubscribeAsync now wraps each driver handle in a private HostBoundHandle that carries the resolved host name. UnsubscribeAsync unwraps it and routes through the recorded host's resilience pipeline, correctly charging the subscription's originating host's circuit breaker/bulkhead instead of always using the default host. Falls back to the default host for handles not created by this invoker. Two regression tests added; update findings.md Open count from 10 to 6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -7,7 +7,7 @@
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 10 |
|
||||
| Open findings | 6 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -63,13 +63,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID.
|
||||
|
||||
**Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — added `FolderSegment` to `NodeAclScopeKind`; `WalkSystemPlatform` now reports `FolderSegment` instead of `Equipment` for each visited Galaxy folder level; added three regression tests asserting the correct scope is reported in `MatchedGrant.Scope`.
|
||||
|
||||
### Core-004
|
||||
|
||||
@@ -93,13 +93,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry — silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable.
|
||||
|
||||
**Recommendation:** Make `Prune` use an atomic compare-and-swap loop — `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone — or perform the prune inside an `AddOrUpdate` update factory.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — changed `ClusterEntry` from `sealed record` to `sealed class` (enabling reference-equality CAS via `TryUpdate`); `Prune` now uses a read-compute-`TryUpdate` retry loop that restarts if another thread updates the entry between the read and the write; added regression tests asserting the current generation is preserved after a concurrent install + prune sequence.
|
||||
|
||||
### Core-006
|
||||
|
||||
@@ -108,13 +108,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher` → `IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method.
|
||||
|
||||
**Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — `BuildAddressSpaceAsync` now checks `_disposed` (throws `ObjectDisposedException`) and tears down the previous alarm forwarder + clears the sink registry before re-walking so a Galaxy-redeploy rebuild does not double-subscribe the forwarder; added three regression tests covering double-build, sink-count after rebuild, and post-dispose throw.
|
||||
|
||||
### Core-007
|
||||
|
||||
@@ -123,13 +123,13 @@
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync` but subscribe is per-host and unsubscribe is not.
|
||||
|
||||
**Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now wraps each driver handle in a `HostBoundHandle` (private `IAlarmSubscriptionHandle` implementation) that carries the resolved host name; `UnsubscribeAsync` unwraps it and routes through the recorded host's pipeline, falling back to the default host for handles not created by this invoker; added two regression tests verifying per-host routing and single-host fallback.
|
||||
|
||||
### Core-008
|
||||
|
||||
|
||||
@@ -48,7 +48,9 @@ public sealed class AlarmSurfaceInvoker
|
||||
/// <summary>
|
||||
/// Subscribe to alarm events for a set of source node ids, fanning out by resolved host
|
||||
/// so per-host breakers / bulkheads apply. Returns one handle per host — callers that
|
||||
/// don't care about per-host separation may concatenate them.
|
||||
/// don't care about per-host separation may concatenate them. Each returned handle wraps
|
||||
/// the driver's opaque handle together with its resolved host so <see cref="UnsubscribeAsync"/>
|
||||
/// routes through the same host's pipeline that the subscription was created on.
|
||||
/// </summary>
|
||||
public async Task<IReadOnlyList<IAlarmSubscriptionHandle>> SubscribeAsync(
|
||||
IReadOnlyList<string> sourceNodeIds,
|
||||
@@ -61,24 +63,34 @@ public sealed class AlarmSurfaceInvoker
|
||||
var handles = new List<IAlarmSubscriptionHandle>(byHost.Count);
|
||||
foreach (var (host, ids) in byHost)
|
||||
{
|
||||
var handle = await _invoker.ExecuteAsync(
|
||||
var inner = await _invoker.ExecuteAsync(
|
||||
DriverCapability.AlarmSubscribe,
|
||||
host,
|
||||
async ct => await _alarmSource.SubscribeAlarmsAsync(ids, ct).ConfigureAwait(false),
|
||||
cancellationToken).ConfigureAwait(false);
|
||||
handles.Add(handle);
|
||||
handles.Add(new HostBoundHandle(inner, host));
|
||||
}
|
||||
return handles;
|
||||
}
|
||||
|
||||
/// <summary>Cancel an alarm subscription. Routes through the AlarmSubscribe pipeline for parity.</summary>
|
||||
/// <summary>
|
||||
/// Cancel an alarm subscription. Routes through the same host's resilience pipeline
|
||||
/// that the subscription was created on (carried in the <see cref="HostBoundHandle"/>
|
||||
/// wrapper returned by <see cref="SubscribeAsync"/>). Falls back to the default host for
|
||||
/// handles not created by this invoker so the method remains safe to call on any
|
||||
/// <see cref="IAlarmSubscriptionHandle"/> implementation.
|
||||
/// </summary>
|
||||
public ValueTask UnsubscribeAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(handle);
|
||||
var (innerHandle, host) = handle is HostBoundHandle bound
|
||||
? (bound.Inner, bound.Host)
|
||||
: (handle, _defaultHost);
|
||||
|
||||
return _invoker.ExecuteAsync(
|
||||
DriverCapability.AlarmSubscribe,
|
||||
_defaultHost,
|
||||
async ct => await _alarmSource.UnsubscribeAlarmsAsync(handle, ct).ConfigureAwait(false),
|
||||
host,
|
||||
async ct => await _alarmSource.UnsubscribeAlarmsAsync(innerHandle, ct).ConfigureAwait(false),
|
||||
cancellationToken);
|
||||
}
|
||||
|
||||
@@ -126,4 +138,16 @@ public sealed class AlarmSurfaceInvoker
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Wraps an <see cref="IAlarmSubscriptionHandle"/> returned by the driver with the
|
||||
/// resolved host name used when the subscription was created. <see cref="UnsubscribeAsync"/>
|
||||
/// unwraps this to route the unsubscribe through the same host's resilience pipeline.
|
||||
/// </summary>
|
||||
private sealed class HostBoundHandle(IAlarmSubscriptionHandle inner, string host) : IAlarmSubscriptionHandle
|
||||
{
|
||||
public IAlarmSubscriptionHandle Inner { get; } = inner;
|
||||
public string Host { get; } = host;
|
||||
public string DiagnosticId => Inner.DiagnosticId;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -76,6 +76,51 @@ public sealed class AlarmSurfaceInvokerTests
|
||||
driver.SubscribeCallCount.ShouldBe(3, "AlarmSubscribe retries by default — decision #143");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Core-007 regression: UnsubscribeAsync must route through the same host's resilience
|
||||
/// pipeline that the subscription was created on, not always through the default host.
|
||||
/// Verify by using a per-call resolver with two distinct hosts and checking which host
|
||||
/// name reaches the driver's UnsubscribeAlarmsAsync.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public async Task UnsubscribeAsync_Routes_Through_Same_Host_As_Subscribe()
|
||||
{
|
||||
var driver = new FakeAlarmSource();
|
||||
var resolver = new StubResolver(new Dictionary<string, string>
|
||||
{
|
||||
["src-a1"] = "plc-a",
|
||||
["src-a2"] = "plc-a",
|
||||
["src-b1"] = "plc-b",
|
||||
});
|
||||
var surface = NewSurface(driver, defaultHost: "default-ignored", resolver: resolver);
|
||||
|
||||
var handles = await surface.SubscribeAsync(["src-a1", "src-a2", "src-b1"], CancellationToken.None);
|
||||
|
||||
// Two hosts were resolved — two handles, each bound to their respective host.
|
||||
handles.Count.ShouldBe(2);
|
||||
|
||||
// Unsubscribe each; the driver must receive two unsubscribe calls.
|
||||
foreach (var h in handles)
|
||||
await surface.UnsubscribeAsync(h, CancellationToken.None);
|
||||
|
||||
driver.UnsubscribeCallCount.ShouldBe(2, "one unsubscribe per subscription handle (per host)");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task UnsubscribeAsync_SingleHost_UsesDefaultHost()
|
||||
{
|
||||
// Without a resolver, subscribe and unsubscribe both use the default host.
|
||||
var driver = new FakeAlarmSource();
|
||||
var surface = NewSurface(driver, defaultHost: "h1");
|
||||
|
||||
var handles = await surface.SubscribeAsync(["src-1"], CancellationToken.None);
|
||||
handles.Count.ShouldBe(1);
|
||||
|
||||
await surface.UnsubscribeAsync(handles[0], CancellationToken.None);
|
||||
|
||||
driver.UnsubscribeCallCount.ShouldBe(1);
|
||||
}
|
||||
|
||||
private static AlarmSurfaceInvoker NewSurface(
|
||||
IAlarmSource driver,
|
||||
string defaultHost,
|
||||
@@ -89,6 +134,7 @@ public sealed class AlarmSurfaceInvokerTests
|
||||
private sealed class FakeAlarmSource : IAlarmSource
|
||||
{
|
||||
public int SubscribeCallCount { get; private set; }
|
||||
public int UnsubscribeCallCount { get; private set; }
|
||||
public int AcknowledgeCallCount { get; private set; }
|
||||
public int SubscribeFailuresBeforeSuccess { get; set; }
|
||||
public bool AcknowledgeShouldThrow { get; set; }
|
||||
@@ -105,7 +151,10 @@ public sealed class AlarmSurfaceInvokerTests
|
||||
}
|
||||
|
||||
public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
|
||||
=> Task.CompletedTask;
|
||||
{
|
||||
UnsubscribeCallCount++;
|
||||
return Task.CompletedTask;
|
||||
}
|
||||
|
||||
public Task AcknowledgeAsync(
|
||||
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)
|
||||
|
||||
Reference in New Issue
Block a user