review(OpcUaServer): fix silent auto-unshelve failure (empty User -> 'system')

Cross-module fix from the review sweep. -007 (Medium): OnTimedUnshelve built its AlarmCommand
with User=string.Empty, so Part9StateMachine.ApplyUnshelve rejected it (ArgumentException,
swallowed) and a TimedShelve never auto-expired. Pass the canonical 'system' user; the
AlarmAck-gate bypass is preserved. Repurposed the test that had encoded the bug.
Joseph Doherty
2026-06-19 12:29:40 -04:00
parent 298bd4bfe5
commit 40749d3f67
3 changed files with 63 additions and 5 deletions
+43 -1
@@ -16,7 +16,7 @@ a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003 |
| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003, -007 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) |
| 4 | Error handling & resilience | OpcUaServer-004 |
@@ -209,3 +209,45 @@ claim and the retired `EquipmentNodeWalker`/F14b framing), and the two `OpcUaApp
doc blocks (class summary + `BuildUserTokenPolicies`) to describe the shipped impersonation/auth
wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build
re-verified green.

### OpcUaServer-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OtOpcUaNodeManager.cs:674` (`MaterialiseAlarmCondition` → `alarm.OnTimedUnshelve`) |
| Status | Resolved |

**Description:** The system-timer auto-unshelve callback (`alarm.OnTimedUnshelve`, wired in
`MaterialiseAlarmCondition`) — fired by the SDK when a `TimedShelve` duration expires — built its
`AlarmCommand` with `User: string.Empty`:
`AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, "Unshelve", string.Empty, null, null))`. That
command is routed to the scripted-alarm engine, whose `ScriptedAlarmEngine.UnshelveAsync` calls
`Part9StateMachine.ApplyUnshelve(cur, user, _clock())`
(`src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/Part9StateMachine.cs:211-213`), and that method
opens with `if (string.IsNullOrWhiteSpace(user)) throw new ArgumentException("User required.", …)`.
The exception is swallowed downstream (caught by `ScriptedAlarmHostActor`), so this is NOT a crash —
but the auto-unshelve **silently no-ops**: a `TimedShelve` never auto-expires, leaving the alarm
permanently shelved with no operator-visible error. The bug was operationally invisible and even had
a green node-manager test that asserted `cmd.User == string.Empty`, encoding the defect as expected
behaviour. The separate design rule that `OnTimedUnshelve` BYPASSES the `AlarmAck` role gate is
intentional and was preserved: the callback is a session-less system timer with no client
principal, so routing it through the gated `HandleAlarmCommand` would return
`BadUserAccessDenied`. Only the empty `User` string was the defect.
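The failure mode described above can be sketched in isolation. This is a minimal illustration, not the shipped code: `UnshelveSketch`, `HandleUnshelve`, and the string-typed state are hypothetical stand-ins, and only the `IsNullOrWhiteSpace` guard and the "caught downstream" behaviour are taken from the finding.

```csharp
using System;

// Simplified stand-ins for illustration; names and types here are assumptions.
public static class UnshelveSketch
{
    // Mirrors the user-required guard described for Part9StateMachine.ApplyUnshelve.
    public static string ApplyUnshelve(string state, string user)
    {
        if (string.IsNullOrWhiteSpace(user))
            throw new ArgumentException("User required.", nameof(user));
        return "Unshelved";
    }

    // Hypothetical host wrapper: it catches the exception and keeps the old state,
    // which is what turns a rejected command into a silent no-op.
    public static string HandleUnshelve(string state, string user)
    {
        try
        {
            return ApplyUnshelve(state, user);
        }
        catch (ArgumentException)
        {
            return state; // nothing is surfaced to the operator
        }
    }
}
```

With an empty user the state stays `"TimedShelved"`; with `"system"` the same call moves it to `"Unshelved"`, which is the behaviour the fix restores.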

**Recommendation:** Pass a non-empty system user at the `OnTimedUnshelve` call site so `ApplyUnshelve`
accepts it, while keeping the gate bypass. Use the codebase-wide canonical `"system"` system-actor
label (matching `Part9StateMachine`'s own `AutoUnshelve` audit user, `AlarmConditionState`'s
"engine-internal events ⇒ `system`" doc, and `AuditActor.SystemFallback = "system"`).

**Resolution:** Resolved 2026-06-19 (SHA pending): changed the `OnTimedUnshelve` delegate in
`MaterialiseAlarmCondition` (`OtOpcUaNodeManager.cs:674`) to build the `AlarmCommand` with
`User: "system"` instead of `string.Empty`, so the engine's `ApplyUnshelve` user-required guard
accepts the system-initiated unshelve and the timed auto-unshelve actually applies. The `AlarmAck`
gate bypass for the session-less system timer is unchanged. TDD: repurposed the existing
`AlarmCommandRouterTests.OnTimedUnshelve_with_system_context_returns_good_and_routes_unshelve`
(which previously asserted `cmd.User == string.Empty`) to assert a non-empty `"system"` user —
verified it failed against the unfixed source (`cmd.User should not be null or white space`) and
passes after the fix. Surgical one-line src change; no public-contract/wire change. Full
`OpcUaServer.Tests` suite green (275/275).
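
The shape of the fix and of the repurposed assertion can be sketched roughly as follows. This is an illustration under stated assumptions: the `AlarmCommand` record's parameter names other than `User`, the static router field, and `CaptureTimedUnshelve` are hypothetical, and the real delegate lives inside `MaterialiseAlarmCondition`.

```csharp
#nullable enable
using System;

// Hypothetical command shape: only the User field and the "Unshelve" operation
// come from the finding; the other parameter names and types are assumptions.
public sealed record AlarmCommand(
    string AlarmId, string Operation, string User, string? Comment, TimeSpan? Duration);

public static class TimedUnshelveSketch
{
    // Canonical system-actor label named in the recommendation.
    public const string SystemUser = "system";

    // Stand-in for the node manager's router delegate.
    public static Action<AlarmCommand>? AlarmCommandRouter;

    // Stand-in for the timer callback body: no role-gate check (the bypass is kept),
    // but the command now carries a non-empty system user.
    public static void OnTimedUnshelve(string alarmId) =>
        AlarmCommandRouter?.Invoke(
            new AlarmCommand(alarmId, "Unshelve", SystemUser, null, null));

    // Helper for a regression check: capture whatever the callback routes.
    public static AlarmCommand? CaptureTimedUnshelve(string alarmId)
    {
        AlarmCommand? captured = null;
        AlarmCommandRouter = cmd => captured = cmd;
        OnTimedUnshelve(alarmId);
        return captured;
    }
}
```

A regression check in this style asserts that the captured command is an `Unshelve` whose `User` is non-empty, so it fails against a call site that still passes `string.Empty`.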