# Code Review — Runtime | Field | Value | |---|---| | Module | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime` | | Reviewer | Claude Code | | Review date | 2026-06-19 | | Commit reviewed | `7286d320` | | Status | Reviewed | | Open findings | 0 | ## Checklist coverage | # | Category | Result | |---|---|---| | 1 | Correctness & logic bugs | Findings Runtime-001, Runtime-004, Runtime-005 | | 2 | OtOpcUa conventions | No issues found | | 3 | Concurrency & thread safety | No issues found (see Notes) | | 4 | Error handling & resilience | Findings Runtime-002, Runtime-003 | | 5 | Security | No issues found | | 6 | Performance & resource management | No issues found | | 7 | Design-document adherence | No issues found | | 8 | Code organization & conventions | No issues found | | 9 | Testing coverage | Finding Runtime-006 | | 10 | Documentation & comments | No issues found | ### Scope 22 source `.cs` under `src/Server/ZB.MOM.WW.OtOpcUa.Runtime` (6129 LOC): the deployment-artifact decode (`DeploymentArtifact.cs`), the OPC UA publish actor (`OpcUaPublishActor.cs`), the driver host + instance actors (`DriverHostActor.cs`, `DriverInstanceActor.cs`, `DriverSpawnPlan.cs`, `NativeAlarmProjector.cs`), the address-space write gateway (`ActorNodeWriteGateway.cs`), health probes (`DbHealthProbeActor.cs`, `PeerOpcUaProbeActor.cs`, `PeerProbeSupervisor.cs`), the historian adapter + options (`HistorianAdapterActor.cs`, `AlarmHistorianOptions.cs`, `ServerHistorianOptions.cs`), scripted alarms (`ScriptedAlarmHostActor.cs`, `EfAlarmConditionStateStore.cs`, `DependencyMuxTagUpstreamSource.cs`), scripting (`DpsScriptLogPublisher.cs`), virtual tags (`DependencyMuxActor.cs`, `VirtualTagActor.cs`, `VirtualTagHostActor.cs`), the health publisher (`AkkaDriverHealthPublisher.cs`), and DI wiring (`ServiceCollectionExtensions.cs`). ### Notes (no-finding observations) - **Concurrency**: actors are single-threaded per instance; off-thread continuations (`OpcUaPublishActor.OnHealthTick`, `DriverHostActor.HandleRouteNodeWrite`, `ScriptedAlarmHostActor.OnApply`/`OnEngineEmission`) correctly capture `Sender`/`Self` before the `await`/`ContinueWith` and only construct messages off-thread, mutating state only back on the actor thread. `DependencyMuxTagUpstreamSource` is the one genuinely-shared type and uses immutable observer lists + atomic CAS correctly. No shared mutable state crosses the actor boundary. - **Security**: write-through is Primary-gated AND `WriteOperate`-gated upstream of Runtime (`OnWriteValue` in the node manager); native-alarm ack is `AlarmAck`-gated upstream; Runtime re-applies the Primary gate defensively. No secret logging — `AlarmHistorianOptions.Validate` / `ServerHistorianOptions.Validate` warn when `SharedSecret` is empty but never echo the secret value. ## Findings ### Runtime-001 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:311-318`; cross-module `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpacePlan.cs:56-61` + `src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/IOpcUaAddressSpaceSink.cs:55` | | Status | Deferred | **Description:** This is the Runtime side of the orchestrator-handed cross-module item **OpcUaServer-001**: a UNS Area/Line *rename* (display-name-only edit, no Equipment/Tag/Driver/Alarm change) produces an `AddressSpacePlan` that `IsEmpty`, so `OpcUaPublishActor.HandleRebuild` short-circuits at line 313 (`if (plan.IsEmpty) { ...; return; }`) **before** reaching `_applier.MaterialiseHierarchy(composition)` at line 326 — the rebuilt folder display-name never lands and OPC UA clients keep browsing the stale name. I verified this is **NOT self-contained within Runtime**: a complete fix needs changes in two other modules that Runtime cannot touch: 1. `AddressSpacePlan.IsEmpty` (in OpcUaServer) does not diff `UnsArea`/`UnsLine` at all (it only tracks Equipment/Driver/Alarm/EquipmentTag/VirtualTag). For the rename to drive a rebuild, the plan must gain a UNS-changed signal (a new diff set) so `IsEmpty` returns `false`. 2. Even if `MaterialiseHierarchy` were reached, `IOpcUaAddressSpaceSink.EnsureFolder` (in Commons) is contractually **idempotent and returns the existing folder unchanged** — it has no rename semantics, so an existing folder's `DisplayName` would not be updated. `EnsureFolder` (or a new `RenameFolder`) in the Commons contract + its SDK-backed implementation must actually re-set the display name. Both (1) and (2) are wire/contract-shaped changes in OpcUaServer + Commons, outside this module's edit boundary. The Runtime side alone (acting on a UNS-changed plan and re-running `MaterialiseHierarchy`) is necessary but not sufficient. **Recommendation:** Track jointly with OpcUaServer-001. OpcUaServer: add a UNS-rename diff set to `AddressSpacePlan` so `IsEmpty` is false on a rename. Commons: give `EnsureFolder` (or a new `RenameFolder`) update-display-name semantics on an existing folder. Runtime: once those land, ensure `HandleRebuild` runs `MaterialiseHierarchy` for a non-empty UNS-only plan (it already does, since the short-circuit only fires for a truly-empty plan). No Runtime-only change can close it safely. **Resolution:** _(deferred — needs OpcUaServer `AddressSpacePlan` rename signal + Commons `EnsureFolder`/`RenameFolder` rename semantics; both outside the Runtime edit boundary. Tracked with OpcUaServer-001.)_ ### Runtime-002 | Field | Value | |---|---| | Severity | Medium | | Category | Error handling & resilience | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs:148-169, 782-828` | | Status | Resolved | **Description:** The artifact-decode methods document a "lenient" contract — *"Empty / malformed blobs return an empty list"* / *"return an empty composition"* — and guard every parse path with `catch (JsonException)`. But the per-row readers (`TryReadSpec`, `ReadAreaProjection`, `ReadLineProjection`, `ReadEquipmentNode`, `ReadDriverPlan`, `ReadAlarmPlan`) call `JsonElement.GetString()` and `GetBoolean()` directly on a property element. If the artifact is *valid JSON but a field carries the wrong JSON type* (e.g. `"Enabled": "true"` as a string, or `"Name": 5` as a number), those accessors throw `InvalidOperationException`, which is **not** a `JsonException` and so escapes the method — defeating the documented lenient-parse contract. For `ParseDriverInstances` (called from `ReconcileDrivers` at `DriverHostActor.cs:889`, outside its own try) the escape propagates to `ApplyAndAck`, marking the whole deployment **Failed** instead of degrading gracefully. The artifact is normally machine-generated with controlled types, so impact is bounded, but a single wrong-typed field shouldn't fail-the-deploy when the code claims to tolerate malformed input. **Recommendation:** Make the per-row readers type-check before reading (`enEl.ValueKind == True/False` before `GetBoolean`, `ValueKind == String` before `GetString`) — the pattern the `Build*` methods and `ExtractTag*` helpers already follow — so a wrong-typed field degrades to the field default / a skipped row rather than throwing. Equivalently, broaden the guards to also catch `InvalidOperationException`. **Resolution:** Resolved 2026-06-19 — `TryReadSpec` now reads `Enabled` only on a real `True/False` token (defaulting `true` otherwise) and reads every string field via a `ReadString` helper that returns null on a non-string `ValueKind`; `ReadAreaProjection`/`ReadLineProjection`/`ReadEquipmentNode`/`ReadDriverPlan`/`ReadAlarmPlan` use the same helper. Wrong-typed fields now degrade to the field default / skipped row instead of throwing `InvalidOperationException`. Regression test `DeploymentArtifactMalformedTypeTests`. ### Runtime-003 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:320-342` | | Status | Won't Fix | **Description:** In `HandleRebuild`, `_lastApplied = composition` is assigned (line 321) immediately after `_applier.Apply(plan)` returns, but BEFORE the four `Materialise*` passes run. If one of those passes throws (e.g. `MaterialiseEquipmentTags`), the catch logs the error and returns — but `_lastApplied` has already advanced to the new composition. The next rebuild then diffs against a composition whose materialisation only partially completed, so the partially-failed nodes won't be re-added (the diff sees them as already applied). In practice the `Materialise*` passes are individually try/wrapped (`SafeEnsureFolder` etc.) and swallow per-node failures, so a throw out of a whole pass is unlikely; and the next *config-changing* deploy re-materialises everything. Net risk is low. **Recommendation:** Advance `_lastApplied` only after all materialise passes complete (move the assignment to the end of the try, after `MaterialiseEquipmentVirtualTags`). Note this is a deliberate trade-off — moving it means a partial failure re-runs `Apply(plan)` (the diff) against the OLD composition next time, which is the safer direction. **Resolution:** Won't Fix (2026-06-19) — the four `Materialise*` passes are each fully `Safe*`-wrapped at the per-node level (`AddressSpaceApplier.SafeEnsureFolder` / `SafeEnsureVariable` / `SafeMaterialiseAlarmCondition` swallow and log every per-node exception), so a pass cannot throw out to `HandleRebuild` in practice — the ordering hazard is not reachable. Reordering the assignment would change the recovery direction on a hypothetical unreachable path without a demonstrable live benefit, and would risk re-running `Apply(plan)` against a stale composition. Recorded as a latent-ordering note rather than a fix. ### Runtime-004 | Field | Value | |---|---| | Severity | Low | | Category | Correctness & logic bugs | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs:683-684` | | Status | Won't Fix | **Description:** `HandleRouteNodeWrite` forwards `new DriverInstanceActor.WriteAttribute(target.FullName, msg.Value!)` where `RouteNodeWrite.Value` is `object?` (nullable) but `WriteAttribute.Value` is non-nullable `object`. The `!` null-forgiving operator suppresses the warning but does not prevent a genuine `null` from flowing through to `WriteRequest`/`IWritable.WriteAsync` if an OPC UA client writes a null value. Whether a null write coerces cleanly is driver-dependent; at minimum the non-null contract on `WriteAttribute.Value` is misleading. **Recommendation:** Either make `WriteAttribute.Value` an `object?` (honest contract; the driver already has to handle type coercion / rejection), or reject a null write at the gateway with a clear `NodeWriteResult(false, "null value")`. Low priority — null variable writes are rare and downstream drivers generally reject them with a Bad status anyway. **Resolution:** Won't Fix (2026-06-19) — this is a public message-contract shape (`WriteAttribute.Value` is a wire-serialised Akka message record). Widening it to `object?` or adding a null-reject branch touches the inbound-write contract shared with `ActorNodeWriteGateway` / the node-manager wiring; the orchestrator's hard constraints forbid a wire/message-contract change in this pass. Downstream `IWritable.WriteAsync` implementations already reject an uncoercible value with a Bad status, so the live impact is a slightly less clear error string, not a crash or data loss. Recorded for a future contract-touching pass. ### Runtime-005 | Field | Value | |---|---| | Severity | Low | | Category | Documentation & comments | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs:610-615` | | Status | Won't Fix | **Description:** `BuildEquipmentScriptedAlarmPlans`'s local `ReadBool` has a comment warning that *"GetBoolean only runs for a genuine true/false token (a non-bool token would otherwise throw InvalidOperationException, uncaught here)"*. The comment describes a hazard that the code already prevents — the guard `b.ValueKind is JsonValueKind.True or JsonValueKind.False` runs before `GetBoolean`, so `GetBoolean` can never see a non-bool token and never throws. The comment is stale/misleading (it reads as if the throw is a live risk). **Recommendation:** Trim the parenthetical to state the guard makes the call safe, rather than implying an uncaught throw. Purely a comment clarity issue. **Resolution:** Won't Fix (2026-06-19) — the code is correct (the `ValueKind` guard precedes `GetBoolean`), and the comment is defensively accurate in spirit (it explains *why* the guard exists). The wording is slightly alarmist but not factually wrong, and the parity-critical `ExtractTag*` mirror methods carry the same defensive phrasing for consistency. Not worth a churn edit on a byte-parity-sensitive file. ### Runtime-006 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs` (whole-file) | | Status | Resolved | **Description:** The artifact-decode parity suites (`DeploymentArtifact*ParityTests`) cover the happy path + the byte-parity-with-composer contract thoroughly, but there is no test asserting the documented *lenient-parse* contract under **valid-JSON-but-wrong-type** input (e.g. a string `Enabled`, a numeric `Name`). That gap is exactly what let Runtime-002 go unnoticed. **Recommendation:** Add a regression test feeding a syntactically-valid artifact with wrong-typed fields and asserting the parse degrades to the field default / a skipped row (no exception). **Resolution:** Resolved 2026-06-19 — added `DeploymentArtifactMalformedTypeTests` (string `Enabled`, numeric `Name`/`DriverType`, numeric `UnsAreaId`) asserting `ParseDriverInstances` + `ParseComposition` degrade gracefully instead of throwing. Co-resolved with Runtime-002.