diff --git a/code-reviews/Core/findings.md b/code-reviews/Core/findings.md index a6a126f..91c3fb8 100644 --- a/code-reviews/Core/findings.md +++ b/code-reviews/Core/findings.md @@ -7,7 +7,7 @@ | Review date | 2026-05-22 | | Commit reviewed | `76d35d1` | | Status | Reviewed | -| Open findings | 10 | +| Open findings | 6 | ## Checklist coverage @@ -63,13 +63,13 @@ | Severity | Medium | | Category | Correctness & logic bugs | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` | -| Status | Open | +| Status | Resolved | **Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID. **Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — added `FolderSegment` to `NodeAclScopeKind`; `WalkSystemPlatform` now reports `FolderSegment` instead of `Equipment` for each visited Galaxy folder level; added three regression tests asserting the correct scope is reported in `MatchedGrant.Scope`. ### Core-004 @@ -93,13 +93,13 @@ | Severity | Medium | | Category | Concurrency & thread safety | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` | -| Status | Open | +| Status | Resolved | **Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry — silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable. **Recommendation:** Make `Prune` use an atomic compare-and-swap loop — `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone — or perform the prune inside an `AddOrUpdate` update factory. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — changed `ClusterEntry` from `sealed record` to `sealed class` (enabling reference-equality CAS via `TryUpdate`); `Prune` now uses a read-compute-`TryUpdate` retry loop that restarts if another thread updates the entry between the read and the write; added regression tests asserting the current generation is preserved after a concurrent install + prune sequence. ### Core-006 @@ -108,13 +108,13 @@ | Severity | Medium | | Category | Concurrency & thread safety | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | -| Status | Open | +| Status | Resolved | **Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher` → `IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method. **Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — `BuildAddressSpaceAsync` now checks `_disposed` (throws `ObjectDisposedException`) and tears down the previous alarm forwarder + clears the sink registry before re-walking so a Galaxy-redeploy rebuild does not double-subscribe the forwarder; added three regression tests covering double-build, sink-count after rebuild, and post-dispose throw. ### Core-007 @@ -123,13 +123,13 @@ | Severity | Medium | | Category | Error handling & resilience | | Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` | -| Status | Open | +| Status | Resolved | **Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync` but subscribe is per-host and unsubscribe is not. **Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now wraps each driver handle in a `HostBoundHandle` (private `IAlarmSubscriptionHandle` implementation) that carries the resolved host name; `UnsubscribeAsync` unwraps it and routes through the recorded host's pipeline, falling back to the default host for handles not created by this invoker; added two regression tests verifying per-host routing and single-host fallback. ### Core-008 diff --git a/src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs b/src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs index 34ed86d..003d280 100644 --- a/src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs +++ b/src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs @@ -48,7 +48,9 @@ public sealed class AlarmSurfaceInvoker /// /// Subscribe to alarm events for a set of source node ids, fanning out by resolved host /// so per-host breakers / bulkheads apply. Returns one handle per host — callers that - /// don't care about per-host separation may concatenate them. + /// don't care about per-host separation may concatenate them. Each returned handle wraps + /// the driver's opaque handle together with its resolved host so + /// routes through the same host's pipeline that the subscription was created on. /// public async Task> SubscribeAsync( IReadOnlyList sourceNodeIds, @@ -61,24 +63,34 @@ public sealed class AlarmSurfaceInvoker var handles = new List(byHost.Count); foreach (var (host, ids) in byHost) { - var handle = await _invoker.ExecuteAsync( + var inner = await _invoker.ExecuteAsync( DriverCapability.AlarmSubscribe, host, async ct => await _alarmSource.SubscribeAlarmsAsync(ids, ct).ConfigureAwait(false), cancellationToken).ConfigureAwait(false); - handles.Add(handle); + handles.Add(new HostBoundHandle(inner, host)); } return handles; } - /// Cancel an alarm subscription. Routes through the AlarmSubscribe pipeline for parity. + /// + /// Cancel an alarm subscription. Routes through the same host's resilience pipeline + /// that the subscription was created on (carried in the + /// wrapper returned by ). Falls back to the default host for + /// handles not created by this invoker so the method remains safe to call on any + /// implementation. + /// public ValueTask UnsubscribeAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken) { ArgumentNullException.ThrowIfNull(handle); + var (innerHandle, host) = handle is HostBoundHandle bound + ? (bound.Inner, bound.Host) + : (handle, _defaultHost); + return _invoker.ExecuteAsync( DriverCapability.AlarmSubscribe, - _defaultHost, - async ct => await _alarmSource.UnsubscribeAlarmsAsync(handle, ct).ConfigureAwait(false), + host, + async ct => await _alarmSource.UnsubscribeAlarmsAsync(innerHandle, ct).ConfigureAwait(false), cancellationToken); } @@ -126,4 +138,16 @@ public sealed class AlarmSurfaceInvoker } return result; } + + /// + /// Wraps an returned by the driver with the + /// resolved host name used when the subscription was created. + /// unwraps this to route the unsubscribe through the same host's resilience pipeline. + /// + private sealed class HostBoundHandle(IAlarmSubscriptionHandle inner, string host) : IAlarmSubscriptionHandle + { + public IAlarmSubscriptionHandle Inner { get; } = inner; + public string Host { get; } = host; + public string DiagnosticId => Inner.DiagnosticId; + } } diff --git a/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Resilience/AlarmSurfaceInvokerTests.cs b/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Resilience/AlarmSurfaceInvokerTests.cs index 6921ceb..81ff364 100644 --- a/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Resilience/AlarmSurfaceInvokerTests.cs +++ b/tests/Core/ZB.MOM.WW.OtOpcUa.Core.Tests/Resilience/AlarmSurfaceInvokerTests.cs @@ -76,6 +76,51 @@ public sealed class AlarmSurfaceInvokerTests driver.SubscribeCallCount.ShouldBe(3, "AlarmSubscribe retries by default — decision #143"); } + /// + /// Core-007 regression: UnsubscribeAsync must route through the same host's resilience + /// pipeline that the subscription was created on, not always through the default host. + /// Verify by using a per-call resolver with two distinct hosts and checking which host + /// name reaches the driver's UnsubscribeAlarmsAsync. + /// + [Fact] + public async Task UnsubscribeAsync_Routes_Through_Same_Host_As_Subscribe() + { + var driver = new FakeAlarmSource(); + var resolver = new StubResolver(new Dictionary + { + ["src-a1"] = "plc-a", + ["src-a2"] = "plc-a", + ["src-b1"] = "plc-b", + }); + var surface = NewSurface(driver, defaultHost: "default-ignored", resolver: resolver); + + var handles = await surface.SubscribeAsync(["src-a1", "src-a2", "src-b1"], CancellationToken.None); + + // Two hosts were resolved — two handles, each bound to their respective host. + handles.Count.ShouldBe(2); + + // Unsubscribe each; the driver must receive two unsubscribe calls. + foreach (var h in handles) + await surface.UnsubscribeAsync(h, CancellationToken.None); + + driver.UnsubscribeCallCount.ShouldBe(2, "one unsubscribe per subscription handle (per host)"); + } + + [Fact] + public async Task UnsubscribeAsync_SingleHost_UsesDefaultHost() + { + // Without a resolver, subscribe and unsubscribe both use the default host. + var driver = new FakeAlarmSource(); + var surface = NewSurface(driver, defaultHost: "h1"); + + var handles = await surface.SubscribeAsync(["src-1"], CancellationToken.None); + handles.Count.ShouldBe(1); + + await surface.UnsubscribeAsync(handles[0], CancellationToken.None); + + driver.UnsubscribeCallCount.ShouldBe(1); + } + private static AlarmSurfaceInvoker NewSurface( IAlarmSource driver, string defaultHost, @@ -89,6 +134,7 @@ public sealed class AlarmSurfaceInvokerTests private sealed class FakeAlarmSource : IAlarmSource { public int SubscribeCallCount { get; private set; } + public int UnsubscribeCallCount { get; private set; } public int AcknowledgeCallCount { get; private set; } public int SubscribeFailuresBeforeSuccess { get; set; } public bool AcknowledgeShouldThrow { get; set; } @@ -105,7 +151,10 @@ public sealed class AlarmSurfaceInvokerTests } public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken) - => Task.CompletedTask; + { + UnsubscribeCallCount++; + return Task.CompletedTask; + } public Task AcknowledgeAsync( IReadOnlyList acknowledgements, CancellationToken cancellationToken)