Review at HEAD 7286d320. Runtime-002/006 (Medium): DeploymentArtifact lenient-parse
now degrades wrong-typed JSON fields to defaults/skipped-row instead of throwing (fails
the deploy) + regression tests; byte-parity with AddressSpaceComposer preserved. Runtime-001
(UNS rename) deferred cross-module (needs AddressSpacePlan rename signal + EnsureFolder
rename). 003/004/005 Won't-Fix.
14 KiB
Code Review — Runtime
| Field | Value |
|---|---|
| Module | src/Server/ZB.MOM.WW.OtOpcUa.Runtime |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | 7286d320 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Findings Runtime-001, Runtime-004, Runtime-005 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (see Notes) |
| 4 | Error handling & resilience | Findings Runtime-002, Runtime-003 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Finding Runtime-006 |
| 10 | Documentation & comments | No issues found |
Scope
22 source .cs under src/Server/ZB.MOM.WW.OtOpcUa.Runtime (6129 LOC): the deployment-artifact
decode (DeploymentArtifact.cs), the OPC UA publish actor (OpcUaPublishActor.cs), the driver host +
instance actors (DriverHostActor.cs, DriverInstanceActor.cs, DriverSpawnPlan.cs,
NativeAlarmProjector.cs), the address-space write gateway (ActorNodeWriteGateway.cs), health probes
(DbHealthProbeActor.cs, PeerOpcUaProbeActor.cs, PeerProbeSupervisor.cs), the historian adapter +
options (HistorianAdapterActor.cs, AlarmHistorianOptions.cs, ServerHistorianOptions.cs), scripted
alarms (ScriptedAlarmHostActor.cs, EfAlarmConditionStateStore.cs,
DependencyMuxTagUpstreamSource.cs), scripting (DpsScriptLogPublisher.cs), virtual tags
(DependencyMuxActor.cs, VirtualTagActor.cs, VirtualTagHostActor.cs), the health publisher
(AkkaDriverHealthPublisher.cs), and DI wiring (ServiceCollectionExtensions.cs).
Notes (no-finding observations)
- Concurrency: actors are single-threaded per instance; off-thread continuations
(
OpcUaPublishActor.OnHealthTick,DriverHostActor.HandleRouteNodeWrite,ScriptedAlarmHostActor.OnApply/OnEngineEmission) correctly captureSender/Selfbefore theawait/ContinueWithand only construct messages off-thread, mutating state only back on the actor thread.DependencyMuxTagUpstreamSourceis the one genuinely-shared type and uses immutable observer lists + atomic CAS correctly. No shared mutable state crosses the actor boundary. - Security: write-through is Primary-gated AND
WriteOperate-gated upstream of Runtime (OnWriteValuein the node manager); native-alarm ack isAlarmAck-gated upstream; Runtime re-applies the Primary gate defensively. No secret logging —AlarmHistorianOptions.Validate/ServerHistorianOptions.Validatewarn whenSharedSecretis empty but never echo the secret value.
Findings
Runtime-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:311-318; cross-module src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpacePlan.cs:56-61 + src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/IOpcUaAddressSpaceSink.cs:55 |
| Status | Deferred |
Description: This is the Runtime side of the orchestrator-handed cross-module item OpcUaServer-001:
a UNS Area/Line rename (display-name-only edit, no Equipment/Tag/Driver/Alarm change) produces an
AddressSpacePlan that IsEmpty, so OpcUaPublishActor.HandleRebuild short-circuits at line 313
(if (plan.IsEmpty) { ...; return; }) before reaching _applier.MaterialiseHierarchy(composition)
at line 326 — the rebuilt folder display-name never lands and OPC UA clients keep browsing the stale name.
I verified this is NOT self-contained within Runtime: a complete fix needs changes in two other modules that Runtime cannot touch:
AddressSpacePlan.IsEmpty(in OpcUaServer) does not diffUnsArea/UnsLineat all (it only tracks Equipment/Driver/Alarm/EquipmentTag/VirtualTag). For the rename to drive a rebuild, the plan must gain a UNS-changed signal (a new diff set) soIsEmptyreturnsfalse.- Even if
MaterialiseHierarchywere reached,IOpcUaAddressSpaceSink.EnsureFolder(in Commons) is contractually idempotent and returns the existing folder unchanged — it has no rename semantics, so an existing folder'sDisplayNamewould not be updated.EnsureFolder(or a newRenameFolder) in the Commons contract + its SDK-backed implementation must actually re-set the display name.
Both (1) and (2) are wire/contract-shaped changes in OpcUaServer + Commons, outside this module's edit
boundary. The Runtime side alone (acting on a UNS-changed plan and re-running MaterialiseHierarchy)
is necessary but not sufficient.
Recommendation: Track jointly with OpcUaServer-001. OpcUaServer: add a UNS-rename diff set to
AddressSpacePlan so IsEmpty is false on a rename. Commons: give EnsureFolder (or a new
RenameFolder) update-display-name semantics on an existing folder. Runtime: once those land, ensure
HandleRebuild runs MaterialiseHierarchy for a non-empty UNS-only plan (it already does, since the
short-circuit only fires for a truly-empty plan). No Runtime-only change can close it safely.
Resolution: (deferred — needs OpcUaServer AddressSpacePlan rename signal + Commons EnsureFolder/RenameFolder rename semantics; both outside the Runtime edit boundary. Tracked with OpcUaServer-001.)
Runtime-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs:148-169, 782-828 |
| Status | Resolved |
Description: The artifact-decode methods document a "lenient" contract — "Empty / malformed blobs
return an empty list" / "return an empty composition" — and guard every parse path with
catch (JsonException). But the per-row readers (TryReadSpec, ReadAreaProjection,
ReadLineProjection, ReadEquipmentNode, ReadDriverPlan, ReadAlarmPlan) call JsonElement.GetString()
and GetBoolean() directly on a property element. If the artifact is valid JSON but a field carries the
wrong JSON type (e.g. "Enabled": "true" as a string, or "Name": 5 as a number), those accessors throw
InvalidOperationException, which is not a JsonException and so escapes the method — defeating the
documented lenient-parse contract. For ParseDriverInstances (called from ReconcileDrivers at
DriverHostActor.cs:889, outside its own try) the escape propagates to ApplyAndAck, marking the whole
deployment Failed instead of degrading gracefully. The artifact is normally machine-generated with
controlled types, so impact is bounded, but a single wrong-typed field shouldn't fail-the-deploy when the
code claims to tolerate malformed input.
Recommendation: Make the per-row readers type-check before reading (enEl.ValueKind == True/False
before GetBoolean, ValueKind == String before GetString) — the pattern the Build* methods and
ExtractTag* helpers already follow — so a wrong-typed field degrades to the field default / a skipped row
rather than throwing. Equivalently, broaden the guards to also catch InvalidOperationException.
Resolution: Resolved 2026-06-19 — TryReadSpec now reads Enabled only on a real True/False token
(defaulting true otherwise) and reads every string field via a ReadString helper that returns null on a
non-string ValueKind; ReadAreaProjection/ReadLineProjection/ReadEquipmentNode/ReadDriverPlan/ReadAlarmPlan
use the same helper. Wrong-typed fields now degrade to the field default / skipped row instead of throwing
InvalidOperationException. Regression test DeploymentArtifactMalformedTypeTests.
Runtime-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:320-342 |
| Status | Won't Fix |
Description: In HandleRebuild, _lastApplied = composition is assigned (line 321) immediately after
_applier.Apply(plan) returns, but BEFORE the four Materialise* passes run. If one of those passes throws
(e.g. MaterialiseEquipmentTags), the catch logs the error and returns — but _lastApplied has already
advanced to the new composition. The next rebuild then diffs against a composition whose materialisation
only partially completed, so the partially-failed nodes won't be re-added (the diff sees them as already
applied). In practice the Materialise* passes are individually try/wrapped (SafeEnsureFolder etc.) and
swallow per-node failures, so a throw out of a whole pass is unlikely; and the next config-changing
deploy re-materialises everything. Net risk is low.
Recommendation: Advance _lastApplied only after all materialise passes complete (move the assignment
to the end of the try, after MaterialiseEquipmentVirtualTags). Note this is a deliberate trade-off — moving
it means a partial failure re-runs Apply(plan) (the diff) against the OLD composition next time, which is
the safer direction.
Resolution: Won't Fix (2026-06-19) — the four Materialise* passes are each fully Safe*-wrapped at the
per-node level (AddressSpaceApplier.SafeEnsureFolder / SafeEnsureVariable / SafeMaterialiseAlarmCondition
swallow and log every per-node exception), so a pass cannot throw out to HandleRebuild in practice — the
ordering hazard is not reachable. Reordering the assignment would change the recovery direction on a
hypothetical unreachable path without a demonstrable live benefit, and would risk re-running Apply(plan)
against a stale composition. Recorded as a latent-ordering note rather than a fix.
Runtime-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs:683-684 |
| Status | Won't Fix |
Description: HandleRouteNodeWrite forwards new DriverInstanceActor.WriteAttribute(target.FullName, msg.Value!) where RouteNodeWrite.Value is object? (nullable) but WriteAttribute.Value is non-nullable
object. The ! null-forgiving operator suppresses the warning but does not prevent a genuine null from
flowing through to WriteRequest/IWritable.WriteAsync if an OPC UA client writes a null value. Whether a
null write coerces cleanly is driver-dependent; at minimum the non-null contract on WriteAttribute.Value
is misleading.
Recommendation: Either make WriteAttribute.Value an object? (honest contract; the driver already has
to handle type coercion / rejection), or reject a null write at the gateway with a clear
NodeWriteResult(false, "null value"). Low priority — null variable writes are rare and downstream drivers
generally reject them with a Bad status anyway.
Resolution: Won't Fix (2026-06-19) — this is a public message-contract shape (WriteAttribute.Value is a
wire-serialised Akka message record). Widening it to object? or adding a null-reject branch touches the
inbound-write contract shared with ActorNodeWriteGateway / the node-manager wiring; the orchestrator's hard
constraints forbid a wire/message-contract change in this pass. Downstream IWritable.WriteAsync
implementations already reject an uncoercible value with a Bad status, so the live impact is a slightly less
clear error string, not a crash or data loss. Recorded for a future contract-touching pass.
Runtime-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs:610-615 |
| Status | Won't Fix |
Description: BuildEquipmentScriptedAlarmPlans's local ReadBool has a comment warning that "GetBoolean
only runs for a genuine true/false token (a non-bool token would otherwise throw InvalidOperationException,
uncaught here)". The comment describes a hazard that the code already prevents — the guard
b.ValueKind is JsonValueKind.True or JsonValueKind.False runs before GetBoolean, so GetBoolean can
never see a non-bool token and never throws. The comment is stale/misleading (it reads as if the throw is a
live risk).
Recommendation: Trim the parenthetical to state the guard makes the call safe, rather than implying an uncaught throw. Purely a comment clarity issue.
Resolution: Won't Fix (2026-06-19) — the code is correct (the ValueKind guard precedes GetBoolean),
and the comment is defensively accurate in spirit (it explains why the guard exists). The wording is
slightly alarmist but not factually wrong, and the parity-critical ExtractTag* mirror methods carry the
same defensive phrasing for consistency. Not worth a churn edit on a byte-parity-sensitive file.
Runtime-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs (whole-file) |
| Status | Resolved |
Description: The artifact-decode parity suites (DeploymentArtifact*ParityTests) cover the happy path +
the byte-parity-with-composer contract thoroughly, but there is no test asserting the documented
lenient-parse contract under valid-JSON-but-wrong-type input (e.g. a string Enabled, a numeric
Name). That gap is exactly what let Runtime-002 go unnoticed.
Recommendation: Add a regression test feeding a syntactically-valid artifact with wrong-typed fields and asserting the parse degrades to the field default / a skipped row (no exception).
Resolution: Resolved 2026-06-19 — added DeploymentArtifactMalformedTypeTests (string Enabled, numeric
Name/DriverType, numeric UnsAreaId) asserting ParseDriverInstances + ParseComposition degrade
gracefully instead of throwing. Co-resolved with Runtime-002.