docs(otopcua): record FixedTree follow-ups A-E as implemented (design, plan, RESUME)
This commit is contained in:
@@ -0,0 +1,234 @@
|
||||
# OtOpcUa — FixedTree-injection follow-ups (design)
|
||||
|
||||
**Date:** 2026-06-26
|
||||
**Status:** ✅ Implemented (2026-06-26) — all five follow-ups (A–E) built via subagent-driven development
|
||||
(16 commits `c2c368dc`..`0074f37a` on `feat/focas-fixedtree-equipment-injection`; every task spec+code
|
||||
reviewed, high-risk tasks with serial Opus reviews). Offline suites green: Runtime.Tests 331, OpcUaServer.Tests 319,
|
||||
FOCAS 248 + AbLegacy/composer additions; `dotnet build` 0 errors, production `src/` 0 warnings (TreatWarningsAsErrors).
|
||||
No DB migration and no deployment-artifact wire-format change were needed (E is projection-only — the columns +
|
||||
`Devices` array were already serialized). Live wonder re-validation of the single-device FOCAS path is optional/user-gated
|
||||
(the base feature's live path is unchanged by these follow-ups).
|
||||
**Companion to:** [`2026-06-26-otopcua-fixedtree-equipment-injection-design.md`](2026-06-26-otopcua-fixedtree-equipment-injection-design.md)
|
||||
(the base feature — ✅ built + live-validated on `wonder-app-vd03`). This design works through the
|
||||
follow-ups that feature's review chain surfaced.
|
||||
**Branch:** continue on `feat/focas-fixedtree-equipment-injection` (stacked on `fix/focas-poll-io-serialization`,
|
||||
local/unpushed — standing rule is "commit/push only when asked").
|
||||
|
||||
---
|
||||
|
||||
## Scope (locked with the user 2026-06-26)
|
||||
|
||||
The user selected **all five** items below. The base feature's explicit non-goals — discovered-**alarm**
|
||||
injection and **writable** discovered nodes — remain out of scope (locked design decisions, untouched).
|
||||
|
||||
| # | Follow-up | Size | Notes |
|
||||
|---|---|---|---|
|
||||
| A | Hardcoded 30 s discovery timeout → injectable | trivial | behavior-preserving |
|
||||
| B | Re-discovery opt-in/policy gate per driver | moderate | back-compat default |
|
||||
| C | Config-unchanged driver→equipment rebind re-triggers discovery | moderate | reverses a deliberate `won't-fix` |
|
||||
| D | De-dup the double `SetDesiredSubscriptions` during redeploy | small | one extra unsub/resub blip today |
|
||||
| E | Lift the ≥1-authored-tag requirement + multi-device-per-driver | largest | **projection-only, no DB migration** |
|
||||
|
||||
## Key discovery that shapes E
|
||||
|
||||
The Config-DB **already** models the equipment→driver(→device) association as first-class data — no schema
|
||||
change is needed for E:
|
||||
|
||||
- `Equipment.DriverInstanceId` (`string?`, made nullable by migration `20260608104706_NullableEquipmentDriverInstanceId`)
|
||||
- `Equipment.DeviceId` (`string?`) — FK to a multi-device driver's device
|
||||
- `Device` is a first-class entity (`DeviceId`, `DriverInstanceId`, schemaless `DeviceConfig` JSON with host)
|
||||
- The AdminUI equipment editor already exposes an optional driver pick (`EquipmentInput.DriverInstanceId`)
|
||||
|
||||
The **only** gap is the runtime projection: `EquipmentNode` is `(EquipmentId, DisplayName, UnsLineId)` and drops
|
||||
`DriverInstanceId`/`DeviceId`, so the injector (`DriverHostActor.HandleDiscoveredNodes`) can only resolve the
|
||||
equipment by inferring it from authored `EquipmentTags` — hence the ≥1-tag requirement. E closes that gap in the
|
||||
projection + resolver, not the schema.
|
||||
|
||||
---
|
||||
|
||||
## A. Discovery timeout → injectable
|
||||
|
||||
`DriverInstanceActor.HandleRediscoverAsync` hardcodes `new CancellationTokenSource(TimeSpan.FromSeconds(30))`
|
||||
while the rediscover interval + attempt-cap are already constructor parameters. Add a
|
||||
`rediscoverDiscoverTimeout` (`TimeSpan`, default `TimeSpan.FromSeconds(30)`) to the ctor and the `Props`
|
||||
factory; use the field instead of the literal. Pure consistency fix; default preserves behavior.
|
||||
|
||||
## B. Re-discovery opt-in / policy gate
|
||||
|
||||
**Problem:** `StartDiscovery()` runs the bounded retry loop for **every** `ITagDiscovery` driver on every
|
||||
(re)connect. FOCAS needs it (its `FixedTreeCache` fills ~0–2 s *after* connect, so a single early pass would
|
||||
capture an empty/partial tree). A driver that browses its full shape **synchronously** inside `DiscoverAsync`
|
||||
(OpcUaClient, TwinCAT, AB) needs at most **one** pass — the 15×2 s retry is wasted (potentially heavy) network I/O.
|
||||
|
||||
**Decision:** a per-driver **policy**, declared in code (driver "heaviness" is a property of the driver *type*,
|
||||
needs no DB/AdminUI plumbing, lowest risk). Add a default-implemented member to `ITagDiscovery`:
|
||||
|
||||
```csharp
|
||||
public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never }
|
||||
|
||||
public interface ITagDiscovery
|
||||
{
|
||||
DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable; // default = today's behavior
|
||||
Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken);
|
||||
}
|
||||
```
|
||||
|
||||
- **`UntilStable`** (default, unchanged) — today's loop: retry every `_rediscoverInterval` up to
|
||||
`_rediscoverMaxAttempts` or until the captured signature is non-empty and stable.
|
||||
- **`Once`** — kick exactly one discovery pass on connect, emit one `DiscoveredNodesReady`, then stop.
|
||||
- **`Never`** — no post-connect discovery kick at all.
|
||||
|
||||
**Driver assignments:**
|
||||
- **FOCAS** → `UntilStable` (explicit; it genuinely needs the retry).
|
||||
- **OpcUaClient, TwinCAT, AbCip, AbLegacy** → `Once` (they discover synchronously in `DiscoverAsync`; one pass
|
||||
on connect injects their tree, the retry loop only added cost). Any driver not overriding the default keeps
|
||||
`UntilStable`, so this is a strict no-regression change.
|
||||
|
||||
**Mechanism:** `DriverInstanceActor.StartDiscovery` reads `((ITagDiscovery)_driver).RediscoverPolicy`. `Never`
|
||||
→ return without scheduling. `Once`/`UntilStable` → schedule the first `RediscoverTick`; `HandleRediscoverAsync`
|
||||
stops after the first pass when the policy is `Once` (instead of evaluating stop-on-stable).
|
||||
|
||||
**Alternative considered (rejected):** a per-instance JSON flag parsed by the host + AdminUI. More flexible but
|
||||
adds artifact/AdminUI plumbing for a knob whose correct value is type-uniform.
|
||||
|
||||
## C. Config-unchanged rebind re-triggers discovery
|
||||
|
||||
**Problem:** when a redeploy rebinds a driver to a new equipment **without** a `DriverConfig` change,
|
||||
`PushDesiredSubscriptions`' re-inject tail correctly **drops** the stale cached plan (a stale `EQ-1`-scoped graft
|
||||
under `EQ-2` would be worse), but `ReconcileDrivers` only restarts a child on a `DriverConfig` change — so a
|
||||
config-unchanged child is never reconnected and the FixedTree stays absent under the new equipment until the
|
||||
driver's next natural reconnect/restart.
|
||||
|
||||
The base feature deliberately did **not** add a re-trigger here, to avoid coupling the subscription pass to
|
||||
driver-lifecycle control. This follow-up reverses that — but cleanly, because the trigger is a **discovery**
|
||||
action, not lifecycle control (no stop/restart), and it is idempotent.
|
||||
|
||||
**Decision:** add a `DriverInstanceActor.TriggerRediscovery` message. In the re-inject tail, the two branches
|
||||
that `Remove` a cached plan because of a rebind/loss also `Tell` that driver's child `TriggerRediscovery`. The
|
||||
child kicks a fresh `RediscoverTick` (current `_initGeneration`) **iff it is in the `Connected` state**;
|
||||
otherwise it no-ops (its eventual reconnect re-discovers anyway). The discovery pass re-emits
|
||||
`DiscoveredNodesReady`, which resolves against the **new** composition (`_lastComposition`) and grafts under the
|
||||
new equipment. The re-trigger honors B's policy (`Never` drivers do not re-discover; `Once`/`UntilStable` run
|
||||
their normal pass(es)).
|
||||
|
||||
Update the inline comment at the drop site and the follow-up note in the base design doc to record the new
|
||||
behavior.
|
||||
|
||||
## D. De-dup the double `SetDesiredSubscriptions`
|
||||
|
||||
**Problem:** during an in-process redeploy, a cached driver receives two `SetDesiredSubscriptions`:
|
||||
the bulk authored-only send in `PushDesiredSubscriptions`, then the authored∪discovered union from
|
||||
`ApplyDiscoveredPlan` (the re-inject tail). The first send forces the child to unsubscribe the whole handle
|
||||
(authored tags included) then the second re-subscribes — one extra blip per cached driver per redeploy.
|
||||
|
||||
**Decision:** in the bulk loop, **skip** the send for any driver that has a `_discoveredByDriver` entry — the
|
||||
re-inject tail sends their complete union. **Critical fallback:** the re-inject tail can still *drop* a cached
|
||||
plan (rebind/loss, see C); when it does, it must send the **authored-only** set for that driver so its authored
|
||||
subscriptions are not lost. Net invariant: every driver receives exactly **one** `SetDesiredSubscriptions` per
|
||||
redeploy.
|
||||
|
||||
## E. Lift the ≥1-authored-tag requirement + multi-device
|
||||
|
||||
No DB migration — projection + resolver only.
|
||||
|
||||
**E1 — projection (`AddressSpaceComposer`).** Extend `EquipmentNode`:
|
||||
|
||||
```csharp
|
||||
public sealed record EquipmentNode(
|
||||
string EquipmentId,
|
||||
string DisplayName,
|
||||
string UnsLineId,
|
||||
string? DriverInstanceId = null, // from Equipment.DriverInstanceId
|
||||
string? DeviceId = null, // from Equipment.DeviceId
|
||||
string? DeviceHost = null); // resolved at projection time: Equipment.DeviceId -> Device.DeviceConfig host
|
||||
```
|
||||
|
||||
`DriverInstanceId`/`DeviceId` are copied straight off the `Equipment` row. `DeviceHost` is resolved by joining
|
||||
`Equipment.DeviceId → Device` and parsing the host out of that `Device`'s schemaless `DeviceConfig` JSON, so the
|
||||
resolver can match it against a discovered device-host folder without re-reading the DB. All three are nullable;
|
||||
existing single-equipment behavior is unaffected when they're null.
|
||||
|
||||
**E2 — resolver (`DriverHostActor.HandleDiscoveredNodes`, and the redeploy re-inject tail).** Replace tag-only
|
||||
equipment resolution with:
|
||||
|
||||
> **candidates** = { equipments where `EquipmentNode.DriverInstanceId == driverId` } **∪**
|
||||
> { equipments inferred from authored `EquipmentTags` for `driverId` } (keeps today's path working)
|
||||
|
||||
- **0 candidates** → log Info, skip (unchanged).
|
||||
- **1 candidate** → graft all discovered nodes under it (today's single-device behavior, now also works with
|
||||
**zero** authored tags because the equipment-level `DriverInstanceId` resolves it). The device-host folder is
|
||||
still collapsed (single device).
|
||||
- **>1 candidates (multi-device)** → partition discovered nodes by their device-host folder segment
|
||||
(`DiscoveredNode.FolderPathSegments[1]`) and graft each device's subtree under the equipment whose
|
||||
`DeviceHost` matches that segment. A device-host with no matching equipment is **warn-skipped** (its subtree is
|
||||
not grafted) rather than mis-grafted. The mapper's existing device-host collapse already disables itself when
|
||||
≥2 distinct device-host segments are present, so multi-device paths retain the device-host level and don't
|
||||
collide.
|
||||
|
||||
**⚠️ Implementation risk (E2 multi-device only):** the partition join is a **host string** match — the driver's
|
||||
emitted device-host folder segment (FOCAS uses `device.HostAddress`) must equal the equipment's projected
|
||||
`DeviceHost` (parsed from `DeviceConfig`). Both ultimately derive from the same device configuration, but the
|
||||
string forms must be normalized to match (e.g. `host:port`). The warn-skip fallback makes a mismatch *safe* (no
|
||||
mis-graft, authored tags + single-device paths unaffected); a normalization helper + a unit test pin the formats
|
||||
together. Single-device deployments (the validated FOCAS `z-34184` case) take the "1 candidate" path and are
|
||||
**not** exposed to this risk.
|
||||
|
||||
---
|
||||
|
||||
## Data flow (unchanged)
|
||||
|
||||
E changes only *which equipment* a discovered node is grafted under and *whether* an equipment with no authored
|
||||
tags participates. Once the NodeId is assigned, the materialize → subscribe → poll → push value path is exactly
|
||||
the base feature's path; B/C/D change *when/how often* discovery runs and *how many* subscription pushes occur,
|
||||
not the value path.
|
||||
|
||||
## Error handling
|
||||
|
||||
- B `Never` driver → no discovery, authored tags unaffected.
|
||||
- C re-trigger on a non-`Connected` child → no-op (safe; reconnect re-discovers).
|
||||
- D dropped-plan fallback → authored-only send, so a rebind/loss never strands a driver's authored subscriptions.
|
||||
- E multi-device unmatched device-host → Warning + skip that device's subtree; other devices + authored tags
|
||||
unaffected. >1 candidate with no `DeviceHost` data anywhere → falls back to the base feature's warn+skip
|
||||
(no regression).
|
||||
|
||||
## Testing
|
||||
|
||||
- **A:** ctor/`Props` wires the timeout; default is 30 s (assert via a short injected timeout in an existing
|
||||
rediscover test).
|
||||
- **B:** `Never` → no `DiscoveredNodesReady`; `Once` → exactly one even when the captured set would keep growing;
|
||||
`UntilStable` → today's loop (regression). FOCAS reports `UntilStable`; the four network drivers report `Once`.
|
||||
- **C:** rebind drop branch `Tell`s `TriggerRediscovery`; `Connected` child re-discovers and re-emits;
|
||||
non-`Connected` child no-ops; re-trigger respects a `Never` policy.
|
||||
- **D:** single-send invariant — a cached driver gets exactly one `SetDesiredSubscriptions` on redeploy
|
||||
(union when applied; authored-only when the plan is dropped).
|
||||
- **E1:** `EquipmentNode` projection carries `DriverInstanceId`/`DeviceId`/`DeviceHost`; `DeviceHost` resolves via
|
||||
the `Device` join + `DeviceConfig` host parse; nulls when unset.
|
||||
- **E2:** tag-less graft (driver-level link, 0 authored tags); single-candidate unchanged (collapse retained);
|
||||
multi-device partition maps each device-host to the right equipment; unmatched device-host → warn-skip;
|
||||
host-string normalization.
|
||||
- **Regression:** Runtime.Tests, OpcUaServer.Tests, and the FOCAS suite stay green; the validated single-device
|
||||
FOCAS injection path is unchanged.
|
||||
- Live wonder re-validation of the single-device FOCAS path is **optional** and user-gated (the base feature is
|
||||
already live-validated; these follow-ups don't alter that path's runtime behavior).
|
||||
|
||||
## Scope / non-goals
|
||||
|
||||
- **In:** A–E above.
|
||||
- **Out (still locked):** discovered-**alarm** injection; **writable** discovered nodes.
|
||||
|
||||
## Touched code (anticipated)
|
||||
|
||||
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs` — `DiscoveryRediscoverPolicy` enum + default member.
|
||||
- Driver classes (`FocasDriver`, `OpcUaClientDriver`, `TwinCATDriver`, `AbCipDriver`, `AbLegacyDriver`) — override
|
||||
`RediscoverPolicy`.
|
||||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` — injectable timeout (A); policy-gated
|
||||
`StartDiscovery`/`HandleRediscoverAsync` (B); `TriggerRediscovery` message (C).
|
||||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` — resolver union + multi-device partition (E2);
|
||||
re-trigger on rebind drop (C); bulk-send skip + dropped-plan fallback (D).
|
||||
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpaceComposer.cs` — `EquipmentNode` projection (E1).
|
||||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNodeMapper.cs` — multi-device partition support (E2).
|
||||
- Tests under `tests/.../Runtime.Tests` and `tests/.../OpcUaServer.Tests`.
|
||||
|
||||
## Task tracking
|
||||
|
||||
Implementation tasks to be generated by writing-plans from this design.
|
||||
Reference in New Issue
Block a user