235 lines
14 KiB
Markdown
235 lines
14 KiB
Markdown
# OtOpcUa — FixedTree-injection follow-ups (design)
|
||
|
||
**Date:** 2026-06-26
|
||
**Status:** ✅ Implemented (2026-06-26) — all five follow-ups (A–E) built via subagent-driven development
|
||
(16 commits `c2c368dc`..`0074f37a` on `feat/focas-fixedtree-equipment-injection`; every task spec+code
|
||
reviewed, high-risk tasks with serial Opus reviews). Offline suites green: Runtime.Tests 331, OpcUaServer.Tests 319,
|
||
FOCAS 248 + AbLegacy/composer additions; `dotnet build` 0 errors, production `src/` 0 warnings (TreatWarningsAsErrors).
|
||
No DB migration and no deployment-artifact wire-format change were needed (E is projection-only — the columns +
|
||
`Devices` array were already serialized). Live wonder re-validation of the single-device FOCAS path is optional/user-gated
|
||
(the base feature's live path is unchanged by these follow-ups).
|
||
**Companion to:** [`2026-06-26-otopcua-fixedtree-equipment-injection-design.md`](2026-06-26-otopcua-fixedtree-equipment-injection-design.md)
|
||
(the base feature — ✅ built + live-validated on `wonder-app-vd03`). This design works through the
|
||
follow-ups that feature's review chain surfaced.
|
||
**Branch:** continue on `feat/focas-fixedtree-equipment-injection` (stacked on `fix/focas-poll-io-serialization`,
|
||
local/unpushed — standing rule is "commit/push only when asked").
|
||
|
||
---
|
||
|
||
## Scope (locked with the user 2026-06-26)
|
||
|
||
The user selected **all five** items below. The base feature's explicit non-goals — discovered-**alarm**
|
||
injection and **writable** discovered nodes — remain out of scope (locked design decisions, untouched).
|
||
|
||
| # | Follow-up | Size | Notes |
|
||
|---|---|---|---|
|
||
| A | Hardcoded 30 s discovery timeout → injectable | trivial | behavior-preserving |
|
||
| B | Re-discovery opt-in/policy gate per driver | moderate | back-compat default |
|
||
| C | Config-unchanged driver→equipment rebind re-triggers discovery | moderate | reverses a deliberate `won't-fix` |
|
||
| D | De-dup the double `SetDesiredSubscriptions` during redeploy | small | one extra unsub/resub blip today |
|
||
| E | Lift the ≥1-authored-tag requirement + multi-device-per-driver | largest | **projection-only, no DB migration** |
|
||
|
||
## Key discovery that shapes E
|
||
|
||
The Config-DB **already** models the equipment→driver(→device) association as first-class data — no schema
|
||
change is needed for E:
|
||
|
||
- `Equipment.DriverInstanceId` (`string?`, made nullable by migration `20260608104706_NullableEquipmentDriverInstanceId`)
|
||
- `Equipment.DeviceId` (`string?`) — FK to a multi-device driver's device
|
||
- `Device` is a first-class entity (`DeviceId`, `DriverInstanceId`, schemaless `DeviceConfig` JSON with host)
|
||
- The AdminUI equipment editor already exposes an optional driver pick (`EquipmentInput.DriverInstanceId`)
|
||
|
||
The **only** gap is the runtime projection: `EquipmentNode` is `(EquipmentId, DisplayName, UnsLineId)` and drops
|
||
`DriverInstanceId`/`DeviceId`, so the injector (`DriverHostActor.HandleDiscoveredNodes`) can only resolve the
|
||
equipment by inferring it from authored `EquipmentTags` — hence the ≥1-tag requirement. E closes that gap in the
|
||
projection + resolver, not the schema.
|
||
|
||
---
|
||
|
||
## A. Discovery timeout → injectable
|
||
|
||
`DriverInstanceActor.HandleRediscoverAsync` hardcodes `new CancellationTokenSource(TimeSpan.FromSeconds(30))`
|
||
while the rediscover interval + attempt-cap are already constructor parameters. Add a
|
||
`rediscoverDiscoverTimeout` (`TimeSpan`, default `TimeSpan.FromSeconds(30)`) to the ctor and the `Props`
|
||
factory; use the field instead of the literal. Pure consistency fix; default preserves behavior.
|
||
|
||
## B. Re-discovery opt-in / policy gate
|
||
|
||
**Problem:** `StartDiscovery()` runs the bounded retry loop for **every** `ITagDiscovery` driver on every
|
||
(re)connect. FOCAS needs it (its `FixedTreeCache` fills ~0–2 s *after* connect, so a single early pass would
|
||
capture an empty/partial tree). A driver that browses its full shape **synchronously** inside `DiscoverAsync`
|
||
(OpcUaClient, TwinCAT, AB) needs at most **one** pass — the 15×2 s retry is wasted (potentially heavy) network I/O.
|
||
|
||
**Decision:** a per-driver **policy**, declared in code (driver "heaviness" is a property of the driver *type*,
|
||
needs no DB/AdminUI plumbing, lowest risk). Add a default-implemented member to `ITagDiscovery`:
|
||
|
||
```csharp
|
||
public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never }
|
||
|
||
public interface ITagDiscovery
|
||
{
|
||
DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable; // default = today's behavior
|
||
Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken);
|
||
}
|
||
```
|
||
|
||
- **`UntilStable`** (default, unchanged) — today's loop: retry every `_rediscoverInterval` up to
|
||
`_rediscoverMaxAttempts` or until the captured signature is non-empty and stable.
|
||
- **`Once`** — kick exactly one discovery pass on connect, emit one `DiscoveredNodesReady`, then stop.
|
||
- **`Never`** — no post-connect discovery kick at all.
|
||
|
||
**Driver assignments:**
|
||
- **FOCAS** → `UntilStable` (explicit; it genuinely needs the retry).
|
||
- **OpcUaClient, TwinCAT, AbCip, AbLegacy** → `Once` (they discover synchronously in `DiscoverAsync`; one pass
|
||
on connect injects their tree, the retry loop only added cost). Any driver not overriding the default keeps
|
||
`UntilStable`, so this is a strict no-regression change.
|
||
|
||
**Mechanism:** `DriverInstanceActor.StartDiscovery` reads `((ITagDiscovery)_driver).RediscoverPolicy`. `Never`
|
||
→ return without scheduling. `Once`/`UntilStable` → schedule the first `RediscoverTick`; `HandleRediscoverAsync`
|
||
stops after the first pass when the policy is `Once` (instead of evaluating stop-on-stable).
|
||
|
||
**Alternative considered (rejected):** a per-instance JSON flag parsed by the host + AdminUI. More flexible but
|
||
adds artifact/AdminUI plumbing for a knob whose correct value is type-uniform.
|
||
|
||
## C. Config-unchanged rebind re-triggers discovery
|
||
|
||
**Problem:** when a redeploy rebinds a driver to a new equipment **without** a `DriverConfig` change,
|
||
`PushDesiredSubscriptions`' re-inject tail correctly **drops** the stale cached plan (a stale `EQ-1`-scoped graft
|
||
under `EQ-2` would be worse), but `ReconcileDrivers` only restarts a child on a `DriverConfig` change — so a
|
||
config-unchanged child is never reconnected and the FixedTree stays absent under the new equipment until the
|
||
driver's next natural reconnect/restart.
|
||
|
||
The base feature deliberately did **not** add a re-trigger here, to avoid coupling the subscription pass to
|
||
driver-lifecycle control. This follow-up reverses that — but cleanly, because the trigger is a **discovery**
|
||
action, not lifecycle control (no stop/restart), and it is idempotent.
|
||
|
||
**Decision:** add a `DriverInstanceActor.TriggerRediscovery` message. In the re-inject tail, the two branches
|
||
that `Remove` a cached plan because of a rebind/loss also `Tell` that driver's child `TriggerRediscovery`. The
|
||
child kicks a fresh `RediscoverTick` (current `_initGeneration`) **iff it is in the `Connected` state**;
|
||
otherwise it no-ops (its eventual reconnect re-discovers anyway). The discovery pass re-emits
|
||
`DiscoveredNodesReady`, which resolves against the **new** composition (`_lastComposition`) and grafts under the
|
||
new equipment. The re-trigger honors B's policy (`Never` drivers do not re-discover; `Once`/`UntilStable` run
|
||
their normal pass(es)).
|
||
|
||
Update the inline comment at the drop site and the follow-up note in the base design doc to record the new
|
||
behavior.
|
||
|
||
## D. De-dup the double `SetDesiredSubscriptions`
|
||
|
||
**Problem:** during an in-process redeploy, a cached driver receives two `SetDesiredSubscriptions`:
|
||
the bulk authored-only send in `PushDesiredSubscriptions`, then the authored∪discovered union from
|
||
`ApplyDiscoveredPlan` (the re-inject tail). The first send forces the child to unsubscribe the whole handle
|
||
(authored tags included) then the second re-subscribes — one extra blip per cached driver per redeploy.
|
||
|
||
**Decision:** in the bulk loop, **skip** the send for any driver that has a `_discoveredByDriver` entry — the
|
||
re-inject tail sends their complete union. **Critical fallback:** the re-inject tail can still *drop* a cached
|
||
plan (rebind/loss, see C); when it does, it must send the **authored-only** set for that driver so its authored
|
||
subscriptions are not lost. Net invariant: every driver receives exactly **one** `SetDesiredSubscriptions` per
|
||
redeploy.
|
||
|
||
## E. Lift the ≥1-authored-tag requirement + multi-device
|
||
|
||
No DB migration — projection + resolver only.
|
||
|
||
**E1 — projection (`AddressSpaceComposer`).** Extend `EquipmentNode`:
|
||
|
||
```csharp
|
||
public sealed record EquipmentNode(
|
||
string EquipmentId,
|
||
string DisplayName,
|
||
string UnsLineId,
|
||
string? DriverInstanceId = null, // from Equipment.DriverInstanceId
|
||
string? DeviceId = null, // from Equipment.DeviceId
|
||
string? DeviceHost = null); // resolved at projection time: Equipment.DeviceId -> Device.DeviceConfig host
|
||
```
|
||
|
||
`DriverInstanceId`/`DeviceId` are copied straight off the `Equipment` row. `DeviceHost` is resolved by joining
|
||
`Equipment.DeviceId → Device` and parsing the host out of that `Device`'s schemaless `DeviceConfig` JSON, so the
|
||
resolver can match it against a discovered device-host folder without re-reading the DB. All three are nullable;
|
||
existing single-equipment behavior is unaffected when they're null.
|
||
|
||
**E2 — resolver (`DriverHostActor.HandleDiscoveredNodes`, and the redeploy re-inject tail).** Replace tag-only
|
||
equipment resolution with:
|
||
|
||
> **candidates** = { equipments where `EquipmentNode.DriverInstanceId == driverId` } **∪**
|
||
> { equipments inferred from authored `EquipmentTags` for `driverId` } (keeps today's path working)
|
||
|
||
- **0 candidates** → log Info, skip (unchanged).
|
||
- **1 candidate** → graft all discovered nodes under it (today's single-device behavior, now also works with
|
||
**zero** authored tags because the equipment-level `DriverInstanceId` resolves it). The device-host folder is
|
||
still collapsed (single device).
|
||
- **>1 candidates (multi-device)** → partition discovered nodes by their device-host folder segment
|
||
(`DiscoveredNode.FolderPathSegments[1]`) and graft each device's subtree under the equipment whose
|
||
`DeviceHost` matches that segment. A device-host with no matching equipment is **warn-skipped** (its subtree is
|
||
not grafted) rather than mis-grafted. The mapper's existing device-host collapse already disables itself when
|
||
≥2 distinct device-host segments are present, so multi-device paths retain the device-host level and don't
|
||
collide.
|
||
|
||
**⚠️ Implementation risk (E2 multi-device only):** the partition join is a **host string** match — the driver's
|
||
emitted device-host folder segment (FOCAS uses `device.HostAddress`) must equal the equipment's projected
|
||
`DeviceHost` (parsed from `DeviceConfig`). Both ultimately derive from the same device configuration, but the
|
||
string forms must be normalized to match (e.g. `host:port`). The warn-skip fallback makes a mismatch *safe* (no
|
||
mis-graft, authored tags + single-device paths unaffected); a normalization helper + a unit test pin the formats
|
||
together. Single-device deployments (the validated FOCAS `z-34184` case) take the "1 candidate" path and are
|
||
**not** exposed to this risk.
|
||
|
||
---
|
||
|
||
## Data flow (unchanged)
|
||
|
||
E changes only *which equipment* a discovered node is grafted under and *whether* an equipment with no authored
|
||
tags participates. Once the NodeId is assigned, the materialize → subscribe → poll → push value path is exactly
|
||
the base feature's path; B/C/D change *when/how often* discovery runs and *how many* subscription pushes occur,
|
||
not the value path.
|
||
|
||
## Error handling
|
||
|
||
- B `Never` driver → no discovery, authored tags unaffected.
|
||
- C re-trigger on a non-`Connected` child → no-op (safe; reconnect re-discovers).
|
||
- D dropped-plan fallback → authored-only send, so a rebind/loss never strands a driver's authored subscriptions.
|
||
- E multi-device unmatched device-host → Warning + skip that device's subtree; other devices + authored tags
|
||
unaffected. >1 candidate with no `DeviceHost` data anywhere → falls back to the base feature's warn+skip
|
||
(no regression).
|
||
|
||
## Testing
|
||
|
||
- **A:** ctor/`Props` wires the timeout; default is 30 s (assert via a short injected timeout in an existing
|
||
rediscover test).
|
||
- **B:** `Never` → no `DiscoveredNodesReady`; `Once` → exactly one even when the captured set would keep growing;
|
||
`UntilStable` → today's loop (regression). FOCAS reports `UntilStable`; the four network drivers report `Once`.
|
||
- **C:** rebind drop branch `Tell`s `TriggerRediscovery`; `Connected` child re-discovers and re-emits;
|
||
non-`Connected` child no-ops; re-trigger respects a `Never` policy.
|
||
- **D:** single-send invariant — a cached driver gets exactly one `SetDesiredSubscriptions` on redeploy
|
||
(union when applied; authored-only when the plan is dropped).
|
||
- **E1:** `EquipmentNode` projection carries `DriverInstanceId`/`DeviceId`/`DeviceHost`; `DeviceHost` resolves via
|
||
the `Device` join + `DeviceConfig` host parse; nulls when unset.
|
||
- **E2:** tag-less graft (driver-level link, 0 authored tags); single-candidate unchanged (collapse retained);
|
||
multi-device partition maps each device-host to the right equipment; unmatched device-host → warn-skip;
|
||
host-string normalization.
|
||
- **Regression:** Runtime.Tests, OpcUaServer.Tests, and the FOCAS suite stay green; the validated single-device
|
||
FOCAS injection path is unchanged.
|
||
- Live wonder re-validation of the single-device FOCAS path is **optional** and user-gated (the base feature is
|
||
already live-validated; these follow-ups don't alter that path's runtime behavior).
|
||
|
||
## Scope / non-goals
|
||
|
||
- **In:** A–E above.
|
||
- **Out (still locked):** discovered-**alarm** injection; **writable** discovered nodes.
|
||
|
||
## Touched code (anticipated)
|
||
|
||
- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs` — `DiscoveryRediscoverPolicy` enum + default member.
|
||
- Driver classes (`FocasDriver`, `OpcUaClientDriver`, `TwinCATDriver`, `AbCipDriver`, `AbLegacyDriver`) — override
|
||
`RediscoverPolicy`.
|
||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` — injectable timeout (A); policy-gated
|
||
`StartDiscovery`/`HandleRediscoverAsync` (B); `TriggerRediscovery` message (C).
|
||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` — resolver union + multi-device partition (E2);
|
||
re-trigger on rebind drop (C); bulk-send skip + dropped-plan fallback (D).
|
||
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpaceComposer.cs` — `EquipmentNode` projection (E1).
|
||
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNodeMapper.cs` — multi-device partition support (E2).
|
||
- Tests under `tests/.../Runtime.Tests` and `tests/.../OpcUaServer.Tests`.
|
||
|
||
## Task tracking
|
||
|
||
Implementation tasks to be generated by writing-plans from this design.
|