OtOpcUa — FixedTree-injection follow-ups (design)
Date: 2026-06-26
Status: ✅ Implemented (2026-06-26) — all five follow-ups (A–E) built via subagent-driven development
(16 commits c2c368dc..0074f37a on feat/focas-fixedtree-equipment-injection; every task spec+code
reviewed, high-risk tasks with serial Opus reviews). Offline suites green: Runtime.Tests 331, OpcUaServer.Tests 319,
FOCAS 248 + AbLegacy/composer additions; dotnet build 0 errors, production src/ 0 warnings (TreatWarningsAsErrors).
No DB migration and no deployment-artifact wire-format change were needed (E is projection-only — the columns +
Devices array were already serialized). Live wonder re-validation of the single-device FOCAS path is optional/user-gated
(the base feature's live path is unchanged by these follow-ups).
Companion to: 2026-06-26-otopcua-fixedtree-equipment-injection-design.md
(the base feature — ✅ built + live-validated on wonder-app-vd03). This design works through the
follow-ups that feature's review chain surfaced.
Branch: continue on feat/focas-fixedtree-equipment-injection (stacked on fix/focas-poll-io-serialization,
local/unpushed — standing rule is "commit/push only when asked").
Scope (locked with the user 2026-06-26)
The user selected all five items below. The base feature's explicit non-goals — discovered-alarm injection and writable discovered nodes — remain out of scope (locked design decisions, untouched).
| # | Follow-up | Size | Notes |
|---|---|---|---|
| A | Hardcoded 30 s discovery timeout → injectable | trivial | behavior-preserving |
| B | Re-discovery opt-in/policy gate per driver | moderate | back-compat default |
| C | Config-unchanged driver→equipment rebind re-triggers discovery | moderate | reverses a deliberate won't-fix |
| D | De-dup the double SetDesiredSubscriptions during redeploy | small | one extra unsub/resub blip today |
| E | Lift the ≥1-authored-tag requirement + multi-device-per-driver | largest | projection-only, no DB migration |
Key discovery that shapes E
The Config-DB already models the equipment→driver(→device) association as first-class data — no schema change is needed for E:
- `Equipment.DriverInstanceId` (`string?`, made nullable by migration `20260608104706_NullableEquipmentDriverInstanceId`)
- `Equipment.DeviceId` (`string?`): FK to a multi-device driver's device
- `Device` is a first-class entity (`DeviceId`, `DriverInstanceId`, schemaless `DeviceConfig` JSON with host)
- The AdminUI equipment editor already exposes an optional driver pick (`EquipmentInput.DriverInstanceId`)
The only gap is the runtime projection: EquipmentNode is (EquipmentId, DisplayName, UnsLineId) and drops
DriverInstanceId/DeviceId, so the injector (DriverHostActor.HandleDiscoveredNodes) can only resolve the
equipment by inferring it from authored EquipmentTags — hence the ≥1-tag requirement. E closes that gap in the
projection + resolver, not the schema.
A. Discovery timeout → injectable
DriverInstanceActor.HandleRediscoverAsync hardcodes new CancellationTokenSource(TimeSpan.FromSeconds(30))
while the rediscover interval + attempt-cap are already constructor parameters. Add a
rediscoverDiscoverTimeout (TimeSpan, default TimeSpan.FromSeconds(30)) to the ctor and the Props
factory; use the field instead of the literal. Pure consistency fix; default preserves behavior.
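A minimal sketch of the shape of this change, under stated assumptions: `RediscoverTimeoutSketch` and `CreateDiscoveryCts` are hypothetical stand-ins, not the real `DriverInstanceActor` members, and the real ctor takes more parameters than shown. Because a `TimeSpan` value is not a compile-time constant, it cannot be written directly as a C# optional-parameter default, so the sketch uses a nullable parameter that falls back to 30 s.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the actor's timeout handling; not the real DriverInstanceActor.
public sealed class RediscoverTimeoutSketch
{
    public static readonly TimeSpan DefaultDiscoverTimeout = TimeSpan.FromSeconds(30);

    private readonly TimeSpan _rediscoverDiscoverTimeout;

    // Omitting the argument (or passing null) keeps the 30 s default,
    // so existing callers see no behavior change.
    public RediscoverTimeoutSketch(TimeSpan? rediscoverDiscoverTimeout = null)
    {
        _rediscoverDiscoverTimeout = rediscoverDiscoverTimeout ?? DefaultDiscoverTimeout;
    }

    public TimeSpan RediscoverDiscoverTimeout => _rediscoverDiscoverTimeout;

    // Takes the place of the literal `new CancellationTokenSource(TimeSpan.FromSeconds(30))`.
    public CancellationTokenSource CreateDiscoveryCts() =>
        new CancellationTokenSource(_rediscoverDiscoverTimeout);
}
```

A test can then pass a short timeout (for example 50 ms) to exercise the timeout path without waiting 30 s.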
B. Re-discovery opt-in / policy gate
Problem: StartDiscovery() runs the bounded retry loop for every ITagDiscovery driver on every
(re)connect. FOCAS needs it (its FixedTreeCache fills ~0–2 s after connect, so a single early pass would
capture an empty/partial tree). A driver that browses its full shape synchronously inside DiscoverAsync
(OpcUaClient, TwinCAT, AB) needs at most one pass — the 15×2 s retry is wasted (potentially heavy) network I/O.
Decision: a per-driver policy, declared in code (driver "heaviness" is a property of the driver type,
needs no DB/AdminUI plumbing, lowest risk). Add a default-implemented member to ITagDiscovery:
```csharp
public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never }

public interface ITagDiscovery
{
    DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable; // default = today's behavior

    Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken);
}
```
- `UntilStable` (default, unchanged): today's loop, which retries every `_rediscoverInterval` up to `_rediscoverMaxAttempts` or until the captured signature is non-empty and stable.
- `Once`: kick exactly one discovery pass on connect, emit one `DiscoveredNodesReady`, then stop.
- `Never`: no post-connect discovery kick at all.
Driver assignments:
- FOCAS → `UntilStable` (explicit; it genuinely needs the retry).
- OpcUaClient, TwinCAT, AbCip, AbLegacy → `Once` (they discover synchronously in `DiscoverAsync`; one pass on connect injects their tree, the retry loop only added cost).

Any driver not overriding the default keeps `UntilStable`, so this is a strict no-regression change.
Mechanism: DriverInstanceActor.StartDiscovery reads ((ITagDiscovery)_driver).RediscoverPolicy. Never
→ return without scheduling. Once/UntilStable → schedule the first RediscoverTick; HandleRediscoverAsync
stops after the first pass when the policy is Once (instead of evaluating stop-on-stable).
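The gate can be modeled as two pure decisions, sketched below. `RediscoverGate` and its method and parameter names are hypothetical; in the real code the decisions live inside the actor's `StartDiscovery` and `HandleRediscoverAsync` handlers, and the stop-on-stable check is richer than the single boolean assumed here.

```csharp
using System;

public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never }

// Hypothetical pure-function model of the policy gate.
public static class RediscoverGate
{
    // StartDiscovery: Never returns without scheduling; the other two schedule the first tick.
    public static bool ShouldScheduleFirstTick(DiscoveryRediscoverPolicy policy) =>
        policy != DiscoveryRediscoverPolicy.Never;

    // HandleRediscoverAsync: decide whether another tick follows the pass that just finished.
    public static bool ShouldScheduleNextTick(
        DiscoveryRediscoverPolicy policy,
        int attemptsSoFar,
        int maxAttempts,
        bool signatureNonEmptyAndStable)
    {
        switch (policy)
        {
            case DiscoveryRediscoverPolicy.Never:
            case DiscoveryRediscoverPolicy.Once:
                return false; // Once stops after its first pass, regardless of stability
            default:
                // UntilStable: keep retrying until the attempt cap or a stable signature.
                return attemptsSoFar < maxAttempts && !signatureNonEmptyAndStable;
        }
    }
}
```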
Alternative considered (rejected): a per-instance JSON flag parsed by the host + AdminUI. More flexible but adds artifact/AdminUI plumbing for a knob whose correct value is type-uniform.
C. Config-unchanged rebind re-triggers discovery
Problem: when a redeploy rebinds a driver to a new equipment without a DriverConfig change,
PushDesiredSubscriptions' re-inject tail correctly drops the stale cached plan (a stale EQ-1-scoped graft
under EQ-2 would be worse), but ReconcileDrivers only restarts a child on a DriverConfig change — so a
config-unchanged child is never reconnected and the FixedTree stays absent under the new equipment until the
driver's next natural reconnect/restart.
The base feature deliberately did not add a re-trigger here, to avoid coupling the subscription pass to driver-lifecycle control. This follow-up reverses that — but cleanly, because the trigger is a discovery action, not lifecycle control (no stop/restart), and it is idempotent.
Decision: add a DriverInstanceActor.TriggerRediscovery message. In the re-inject tail, the two branches
that Remove a cached plan because of a rebind/loss also Tell that driver's child TriggerRediscovery. The
child kicks a fresh RediscoverTick (current _initGeneration) iff it is in the Connected state;
otherwise it no-ops (its eventual reconnect re-discovers anyway). The discovery pass re-emits
DiscoveredNodesReady, which resolves against the new composition (_lastComposition) and grafts under the
new equipment. The re-trigger honors B's policy (Never drivers do not re-discover; Once/UntilStable run
their normal pass(es)).
Update the inline comment at the drop site and the follow-up note in the base design doc to record the new behavior.
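The child-side rule for this section (kick a tick only when `Connected`, tagged with the current generation, and only if the policy allows discovery) can be sketched as follows. `ChildSketch`, `ChildState`, and the shape of `RediscoverTick` are assumptions made for illustration; the real actor's state names and message types may differ.

```csharp
#nullable enable
using System;

public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never }
public enum ChildState { Connecting, Connected, Disconnected } // hypothetical state names

// Hypothetical message shape: a tick carries the init generation it was issued for.
public sealed record RediscoverTick(int Generation);

public sealed class ChildSketch
{
    public ChildState State { get; set; } = ChildState.Connecting;
    public DiscoveryRediscoverPolicy Policy { get; set; } = DiscoveryRediscoverPolicy.UntilStable;
    public int InitGeneration { get; set; }

    // Returns the tick to schedule, or null when the trigger is a no-op.
    // Idempotent: it never stops or restarts the driver.
    public RediscoverTick? OnTriggerRediscovery() =>
        State == ChildState.Connected && Policy != DiscoveryRediscoverPolicy.Never
            ? new RediscoverTick(InitGeneration)
            : null;
}
```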
D. De-dup the double SetDesiredSubscriptions
Problem: during an in-process redeploy, a cached driver receives two SetDesiredSubscriptions:
the bulk authored-only send in PushDesiredSubscriptions, then the authored∪discovered union from
ApplyDiscoveredPlan (the re-inject tail). The first send forces the child to unsubscribe the whole handle
(authored tags included), and the second then re-subscribes it: one extra blip per cached driver per redeploy.
Decision: in the bulk loop, skip the send for any driver that has a _discoveredByDriver entry — the
re-inject tail sends their complete union. Critical fallback: the re-inject tail can still drop a cached
plan (rebind/loss, see C); when it does, it must send the authored-only set for that driver so its authored
subscriptions are not lost. Net invariant: every driver receives exactly one SetDesiredSubscriptions per
redeploy.
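The single-send invariant can be illustrated with a small model of the two phases. This is a sketch under assumptions: `RedeploySendSketch`, its parameters, and the use of plain string tag names are hypothetical, and `rebound` stands in for whatever condition makes the re-inject tail drop a cached plan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of one redeploy; each list entry represents one SetDesiredSubscriptions send.
public static class RedeploySendSketch
{
    public static List<(string DriverId, string[] Tags)> Redeploy(
        IReadOnlyDictionary<string, string[]> authoredByDriver,
        IDictionary<string, string[]> discoveredByDriver, // cached plans (the _discoveredByDriver role)
        ISet<string> rebound)                             // drivers whose cached plan must be dropped
    {
        var sends = new List<(string DriverId, string[] Tags)>();

        // Bulk loop: skip any driver with a cached plan; the tail sends for it instead.
        foreach (var (driverId, authored) in authoredByDriver)
        {
            if (!discoveredByDriver.ContainsKey(driverId))
                sends.Add((driverId, authored));
        }

        // Re-inject tail: send the union, or the authored-only fallback when the plan is dropped.
        foreach (var driverId in discoveredByDriver.Keys.ToList())
        {
            var authored = authoredByDriver.TryGetValue(driverId, out var a) ? a : Array.Empty<string>();
            if (rebound.Contains(driverId))
            {
                discoveredByDriver.Remove(driverId);
                sends.Add((driverId, authored));
            }
            else
            {
                sends.Add((driverId, authored.Union(discoveredByDriver[driverId]).ToArray()));
            }
        }

        return sends;
    }
}
```

In this model each driver id appears exactly once in the returned list, whichever branch handled it.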
E. Lift the ≥1-authored-tag requirement + multi-device
No DB migration — projection + resolver only.
E1 — projection (AddressSpaceComposer). Extend EquipmentNode:
```csharp
public sealed record EquipmentNode(
    string EquipmentId,
    string DisplayName,
    string UnsLineId,
    string? DriverInstanceId = null, // from Equipment.DriverInstanceId
    string? DeviceId = null,         // from Equipment.DeviceId
    string? DeviceHost = null);      // resolved at projection time: Equipment.DeviceId -> Device.DeviceConfig host
```
DriverInstanceId/DeviceId are copied straight off the Equipment row. DeviceHost is resolved by joining
Equipment.DeviceId → Device and parsing the host out of that Device's schemaless DeviceConfig JSON, so the
resolver can match it against a discovered device-host folder without re-reading the DB. All three are nullable;
existing single-equipment behavior is unaffected when they're null.
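A sketch of the host parse at projection time, using `System.Text.Json`. The property name is an assumption: the source only says the schemaless `DeviceConfig` JSON carries a host, so `"HostAddress"` below is a placeholder default, and `DeviceHostProjection.TryParseHost` is a hypothetical helper name. Anything unparseable yields null, which matches the "nulls when unset" behavior.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class DeviceHostProjection
{
    // Assumption: the host is a top-level string property; its real name may differ.
    public static string? TryParseHost(string? deviceConfigJson, string hostProperty = "HostAddress")
    {
        if (string.IsNullOrWhiteSpace(deviceConfigJson)) return null;
        try
        {
            using var doc = JsonDocument.Parse(deviceConfigJson);
            if (doc.RootElement.ValueKind != JsonValueKind.Object) return null;

            foreach (var prop in doc.RootElement.EnumerateObject())
            {
                // Case-insensitive name match, since the JSON is schemaless.
                if (string.Equals(prop.Name, hostProperty, StringComparison.OrdinalIgnoreCase)
                    && prop.Value.ValueKind == JsonValueKind.String)
                {
                    return prop.Value.GetString();
                }
            }
            return null;
        }
        catch (JsonException)
        {
            // Malformed JSON: project no host rather than failing the whole composition.
            return null;
        }
    }
}
```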
E2 — resolver (DriverHostActor.HandleDiscoveredNodes, and the redeploy re-inject tail). Replace tag-only
equipment resolution with:
candidates = { equipments where `EquipmentNode.DriverInstanceId == driverId` } ∪ { equipments inferred from authored `EquipmentTags` for `driverId` } (keeps today's path working)

- 0 candidates → log Info, skip (unchanged).
- 1 candidate → graft all discovered nodes under it (today's single-device behavior, now also works with zero authored tags because the equipment-level `DriverInstanceId` resolves it). The device-host folder is still collapsed (single device).
- More than 1 candidate (multi-device) → partition discovered nodes by their device-host folder segment (`DiscoveredNode.FolderPathSegments[1]`) and graft each device's subtree under the equipment whose `DeviceHost` matches that segment. A device-host with no matching equipment is warn-skipped (its subtree is not grafted) rather than mis-grafted. The mapper's existing device-host collapse already disables itself when ≥2 distinct device-host segments are present, so multi-device paths retain the device-host level and don't collide.
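The three resolver cases can be sketched as one function. The record shapes are trimmed to the fields the resolver needs, and `GraftResolverSketch`, `Resolve`, and `skippedHosts` are hypothetical names. The host comparison here is a plain case-insensitive match, which is an assumption standing in for whatever normalization the real resolver applies.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed, hypothetical shapes; the real records carry more fields.
public sealed record EquipmentNode(string EquipmentId, string? DriverInstanceId = null, string? DeviceHost = null);
public sealed record DiscoveredNode(string[] FolderPathSegments, string Name);

public static class GraftResolverSketch
{
    // Returns equipmentId -> nodes to graft; unmatched device-hosts are reported in skippedHosts.
    public static Dictionary<string, List<DiscoveredNode>> Resolve(
        string driverId,
        IEnumerable<EquipmentNode> equipments,
        IEnumerable<string> equipmentIdsFromAuthoredTags,
        IReadOnlyList<DiscoveredNode> discovered,
        out List<string> skippedHosts)
    {
        skippedHosts = new List<string>();
        var result = new Dictionary<string, List<DiscoveredNode>>();

        // Union of the equipment-level link and the tag-inferred link (no duplicates).
        var tagLinked = new HashSet<string>(equipmentIdsFromAuthoredTags);
        var candidates = equipments
            .Where(e => e.DriverInstanceId == driverId || tagLinked.Contains(e.EquipmentId))
            .ToList();

        if (candidates.Count == 0) return result;            // caller logs Info and skips
        if (candidates.Count == 1)                           // single device: graft everything
        {
            result[candidates[0].EquipmentId] = discovered.ToList();
            return result;
        }

        // Multi-device: partition by the device-host folder segment (index 1).
        var groups = discovered.GroupBy(
            n => n.FolderPathSegments.Length > 1 ? n.FolderPathSegments[1] : "",
            StringComparer.OrdinalIgnoreCase);
        foreach (var group in groups)
        {
            var match = candidates.FirstOrDefault(
                e => string.Equals(e.DeviceHost, group.Key, StringComparison.OrdinalIgnoreCase));
            if (match is null) { skippedHosts.Add(group.Key); continue; } // warn-skip, never mis-graft
            result[match.EquipmentId] = group.ToList();
        }
        return result;
    }
}
```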
⚠️ Implementation risk (E2 multi-device only): the partition join is a host string match — the driver's
emitted device-host folder segment (FOCAS uses device.HostAddress) must equal the equipment's projected
DeviceHost (parsed from DeviceConfig). Both ultimately derive from the same device configuration, but the
string forms must be normalized to match (e.g. host:port). The warn-skip fallback makes a mismatch safe (no
mis-graft, authored tags + single-device paths unaffected); a normalization helper + a unit test pin the formats
together. Single-device deployments (the validated FOCAS z-34184 case) take the "1 candidate" path and are
not exposed to this risk.
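One possible shape for the normalization helper is sketched below. The canonical form (trimmed, lower-cased, always `host:port`) is an assumption based on the `host:port` example above, and `DeviceHostNormalizer` and `defaultPort` are hypothetical names. The sketch does not handle IPv6 literals or malformed input such as a trailing colon.

```csharp
using System;
using System.Linq;

public static class DeviceHostNormalizer
{
    // Assumed canonical form: lower-case "host:port"; a missing port is filled from defaultPort.
    public static string Normalize(string raw, int defaultPort)
    {
        var s = raw.Trim().ToLowerInvariant();
        int first = s.IndexOf(':');
        int last = s.LastIndexOf(':');

        // Exactly one colon, followed only by digits, counts as an explicit port.
        bool hasPort = first > 0 && first == last && last < s.Length - 1
                       && s.Substring(last + 1).All(char.IsDigit);

        return hasPort ? s : $"{s}:{defaultPort}";
    }
}
```

Applying the same helper to both the driver's emitted folder segment and the projected `DeviceHost` lets a unit test pin the two forms together, for example by asserting that `" CNC-01 "` and `"cnc-01:8193"` normalize to the same string when the default port is 8193 (an example value).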
Data flow (unchanged)
E changes only which equipment a discovered node is grafted under and whether an equipment with no authored tags participates. Once the NodeId is assigned, the materialize → subscribe → poll → push value path is exactly the base feature's path; B/C/D change when/how often discovery runs and how many subscription pushes occur, not the value path.
Error handling
- B `Never` driver → no discovery, authored tags unaffected.
- C re-trigger on a non-`Connected` child → no-op (safe; reconnect re-discovers).
- D dropped-plan fallback → authored-only send, so a rebind/loss never strands a driver's authored subscriptions.
- E multi-device unmatched device-host → Warning + skip that device's subtree; other devices + authored tags unaffected. >1 candidate with no `DeviceHost` data anywhere → falls back to the base feature's warn+skip (no regression).
Testing
- A: ctor/`Props` wires the timeout; default is 30 s (assert via a short injected timeout in an existing rediscover test).
- B: `Never` → no `DiscoveredNodesReady`; `Once` → exactly one even when the captured set would keep growing; `UntilStable` → today's loop (regression). FOCAS reports `UntilStable`; the four network drivers report `Once`.
- C: rebind drop branch `Tell`s `TriggerRediscovery`; `Connected` child re-discovers and re-emits; non-`Connected` child no-ops; re-trigger respects a `Never` policy.
- D: single-send invariant: a cached driver gets exactly one `SetDesiredSubscriptions` on redeploy (union when applied; authored-only when the plan is dropped).
- E1: `EquipmentNode` projection carries `DriverInstanceId`/`DeviceId`/`DeviceHost`; `DeviceHost` resolves via the `Device` join + `DeviceConfig` host parse; nulls when unset.
- E2: tag-less graft (driver-level link, 0 authored tags); single-candidate unchanged (collapse retained); multi-device partition maps each device-host to the right equipment; unmatched device-host → warn-skip; host-string normalization.
- Regression: Runtime.Tests, OpcUaServer.Tests, and the FOCAS suite stay green; the validated single-device FOCAS injection path is unchanged.
- Live wonder re-validation of the single-device FOCAS path is optional and user-gated (the base feature is already live-validated; these follow-ups don't alter that path's runtime behavior).
Scope / non-goals
- In: A–E above.
- Out (still locked): discovered-alarm injection; writable discovered nodes.
Touched code (anticipated)
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs`: `DiscoveryRediscoverPolicy` enum + default member.
- Driver classes (`FocasDriver`, `OpcUaClientDriver`, `TwinCATDriver`, `AbCipDriver`, `AbLegacyDriver`): override `RediscoverPolicy`.
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs`: injectable timeout (A); policy-gated `StartDiscovery`/`HandleRediscoverAsync` (B); `TriggerRediscovery` message (C).
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs`: resolver union + multi-device partition (E2); re-trigger on rebind drop (C); bulk-send skip + dropped-plan fallback (D).
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpaceComposer.cs`: `EquipmentNode` projection (E1).
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNodeMapper.cs`: multi-device partition support (E2).
- Tests under `tests/.../Runtime.Tests` and `tests/.../OpcUaServer.Tests`.
Task tracking
Implementation tasks to be generated by writing-plans from this design.