3 Commits

Author SHA1 Message Date
Joseph Doherty 74e07225c9 docs(otopcua): record the equipment-tag data-plane investigation + fix chain 2026-06-26 06:00:05 -04:00
Joseph Doherty 6600ce9940 feat(otopcua): bridge Akka actor logs into Serilog
DriverHostActor/DriverInstanceActor and cluster events log via Akka's ILoggingAdapter, which had no Serilog bridge — under the Windows service host the default StandardOutLogger output is discarded, making the driver-role actor graph invisible (this masked the data-plane diagnosis).

- Add Akka.Logger.Serilog 1.5.60 (deps satisfied by Akka 1.5.62 / Serilog 4.3.1).
- WithOtOpcUaClusterBootstrap: ConfigureLoggers(DebugLevel; ClearLoggers(); AddLogger<SerilogLogger>()) — Akka.Hosting owns logger setup, so HOCON akka.loggers alone is not honored.
- Program.cs: set static Serilog.Log.Logger from the DI root logger after Build() (AddZbSerilog registers the MEL provider but not the static logger, which the Akka SerilogLogger and the startup banner both need).
2026-06-26 06:00:05 -04:00
Joseph Doherty 235b8b8e6d fix(focas): serialize per-device wire I/O + bound reads; tolerate AdminUI config formats
Equipment tags were stuck at Bad_WaitingForInitialData on the deployed driver: the equipment poll, fixed-tree loop, probe and recycle shared one FOCAS/2 socket with no serialization, and the steady-state read had no timeout — concurrent reads collided and a stalled read hung forever, never overwriting the node's initial-data seed.

- SynchronizedFocasClient: per-device SemaphoreSlim gate + per-call timeout around every wire op (Connect/Probe gated, not double-bounded); wired in EnsureConnectedAsync. ReadAsync/WriteAsync map a per-call timeout to BadCommunicationError instead of rethrowing.
- FlexibleStringConverter on FOCAS config Series: the AdminUI persists the enum as a number ("series":6); accept number-or-string instead of throwing -> stub.
- FocasHostAddress.TryParse tolerates a scheme-less {ip}[:{port}] (AdminUI hostAddress form); canonical focas:// unchanged, malformed schemes still rejected.

247 FOCAS tests green; each fix has a regression test. Live-validated on wonder-app-vd03 (tags read Good).
2026-06-26 05:59:54 -04:00
15 changed files with 693 additions and 11 deletions
+1
View File
@@ -8,6 +8,7 @@
<PackageVersion Include="Akka.Cluster.Hosting" Version="1.5.62" /> <PackageVersion Include="Akka.Cluster.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Cluster.Tools" Version="1.5.62" /> <PackageVersion Include="Akka.Cluster.Tools" Version="1.5.62" />
<PackageVersion Include="Akka.Hosting" Version="1.5.62" /> <PackageVersion Include="Akka.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Logger.Serilog" Version="1.5.60" />
<PackageVersion Include="Akka.Remote" Version="1.5.62" /> <PackageVersion Include="Akka.Remote" Version="1.5.62" />
<PackageVersion Include="Akka.Remote.Hosting" Version="1.5.62" /> <PackageVersion Include="Akka.Remote.Hosting" Version="1.5.62" />
<PackageVersion Include="Akka.Streams" Version="1.5.62" /> <PackageVersion Include="Akka.Streams" Version="1.5.62" />
@@ -0,0 +1,178 @@
# OtOpcUa equipment-tag data-plane — why live FOCAS values don't surface (investigation plan)
**Date:** 2026-06-25
**Companion to:** [`2026-06-25-focas-pdu-v3-implementation-plan.md`](2026-06-25-focas-pdu-v3-implementation-plan.md) (the FOCAS PDU-v3 driver work — DONE + deployed) and [`../deployments/wonder-app-vd03-makino-z-34184.md`](../deployments/wonder-app-vd03-makino-z-34184.md).
**Goal:** make the deployed Makino tags (`parts-count`/`parts-required` = `MACRO:3901/3902`) actually carry live values over OPC UA, and make the FOCAS FixedTree (Identity/Axes/Timers/…) appear in the served address space.
## The symptom (precise, observed 2026-06-25 after the v3 deploy)
Against `opc.tcp://wonder-app-vd03.zmr.zimmer.com:4840/OtOpcUa` (Security None, anonymous, DisableLogin):
- `read` AND a 30 s `subscribe` of `ns=2;s=EQ-3686c0272279/parts-count` (and `/parts-required`) return **`0x80320000` Bad_WaitingForInitialData** — never a value.
- A recursive browse of the whole served tree shows ONLY `machining → makino → z-34184 → {parts-count, parts-required}`**no FixedTree nodes** (Identity/Axes/Timers/Program/…).
- Unchanged by host-restart, a full AdminUI re-deploy (`12e0d528`, Sealed/In-sync), and a driver `Restart`.
- A ~15 s box-side watch saw **no 250 ms-cadence TCP connection** to `10.201.31.5:8193` — only the occasional probe connect.
## What is RULED OUT (do not re-investigate)
- **The FOCAS PDU-v3 driver.** Proven working: the rewritten `FocasDriverProbe` does a real initiate + `cnc_statinfo` and the AdminUI shows **DRIVER STATUS: HEALTHY, last success Ns ago**; and on the dev box every surface (sysinfo / axes / dynamic / macros / timer / PMC / servo / alarms) reads correctly through the same `WireFocasClient`. The wire path talks v3 to the Makino.
- **Deployment / Akka roles.** Re-deploy sealed with no "task canceled"; the node has admin+driver roles; the address space (the 2 UNS tags) is served.
- **Reachability.** TCP 8193 reachable from the box; OPC 4840 reachable from the dev box.
So the problem is the OtOpcUa **data plane**: the driver's *control/probe plane* runs, but its *data plane* (the FixedTree poll loop output + the equipment-tag value source) is not reaching the served address space. This was present last session too — it was masked by the v3 reject; with v3 now working it stands alone.
## Leading hypotheses (to confirm/refute, not assume)
- **H1 — FixedTree bootstrap is stuck/throwing.** `FocasDriver.FixedTreeLoopAsync` is started at `FocasDriver.cs:160` inside `InitializeAsync` (which demonstrably ran — the probe loop at `:137` is alive). Its bootstrap `while (state.FixedTreeCache is null)` loop (~`:650`) calls `GetSysInfoAsync`/`GetAxisNamesAsync`/optional probes; if one throws it retries every 2 s and the cache never populates → no FixedTree nodes, no poll. The brief 2 s-spaced connects could explain "no steady 250 ms connection."
- **H2 — FixedTree nodes aren't projected under the Equipment model.** Galaxy/FOCAS are now *standard Equipment-kind drivers*; the served tree is the UNS/Equipment projection (configured tags only). Driver `DiscoverAsync` FixedTree auto-nodes may only live in a raw-driver namespace that the UNS browse doesn't show — i.e. FixedTree-under-Equipment may be unsupported-by-design, not a bug.
- **H3 — equipment-tag value source never primed.** A point-read returns the server's cached value; with no active poll/subscription seeding it, that's `WaitingForInitialData`. But a 30 s `subscribe` ALSO stayed bad — so either the monitored-item → `FocasDriver.ReadAsync` wiring isn't firing, or `ReadAsync(MACRO:3901)` returns no value on the box. (Analogy: the ScadaBridge DCL seed-ordering / static-tag false-bad class of bug — see that memory.)
- **H4 — a data-plane gate.** A role/active-node gate (cf. "data-plane roles need `Security:Ldap:GroupToRole` mapped or write/ack is inert") could suppress the read/poll on this single fused node under DisableLogin.
## ⚠️ Observability is the bottleneck
There are currently **no OtOpcUa app logs on the box** (Windows eventlog shows only NSSM start/stop; Serilog has no file sink in this deployment — cf. the MxGateway windev note). **Getting runtime logs is the single highest-leverage step** and gates H1/H3/H4.
---
## Phase 1 — Offline code trace of the data plane (do FIRST; free, no box/CNC)
Trace, in the repo on the dev box, the two paths end-to-end. This may reveal the root cause with zero box access.
1. **Equipment-tag read path:** from an OPC UA `Read`/`CreateMonitoredItem` on `ns=2;s=EQ-…/parts-count` → the server's address-space node → the "FullName→NodeId router" → `FocasDriver.ReadAsync`. Find: who invokes the driver read, whether point-reads vs subscriptions differ, and where/whether a value is cached vs `WaitingForInitialData` is returned. Entry points: `FocasDriver.cs:247` (resolver-produced equipment tags), `FocasDriver.ReadAsync`, and the server-side poll-group / monitored-item sampler (grep `PollGroup`, `MonitoredItem`, the Equipment namespace value provider).
2. **FixedTree path:** `FocasDriver.FixedTreeLoopAsync` (`:644`) → `state.FixedTreeCache``TryReadFixedTree` → how those nodes are added to the address space. Determine whether FixedTree nodes are emitted into the **Equipment/UNS projection** at all, or only a raw-driver namespace (resolves H2). Grep `DiscoverAsync`, `FixedTreeReference`, and the address-space builder for the Equipment kind.
3. **Driver lifecycle/gates:** confirm what conditions start/suppress the data poll vs the probe loop (active-node, role, enabled flags) — `FocasDriver.cs:87181`, the cluster/active-node gate.
- **Output:** a written root-cause hypothesis (which of H1H4) + the specific code site, OR a precise list of what only box logs can disambiguate.
### ✅ Phase 1 FINDINGS (2026-06-25, offline subagent code-trace — DONE)
| Hypothesis | Verdict | Confidence |
|---|---|---|
| **H2** — FixedTree nodes not projected under Equipment model | **CONFIRMED root cause of symptom #2 — by-design** | proven from code |
| **H3** — equipment-tag value source never primed | **CONFIRMED mechanism of symptom #1**; routing code is self-consistent → break is **runtime**, not static | mechanism proven; exact break needs box logs |
| **H4** — role/active-node data-plane gate | **REFUTED** for the read/subscribe path | proven from code |
| **H1** — FixedTree bootstrap stuck/throwing | best fit for symptom #3 (no 250 ms connect), but **moot** for #2 and **not** the cause of #1 | strongly-suggested; needs logs |
| **H5 (new)** — FOCAS tag with blank `deviceHostAddress``BadNodeIdUnknown` | **latent trap** — yields a *Bad* value, not WaitingForInitialData, so it's the *next* failure, not the current symptom | proven from code |
**Why symptom #2 (no FixedTree nodes):** FixedTree nodes are emitted **only** by `FocasDriver.DiscoverAsync` (`FocasDriver.cs:391-494`, gated on `FixedTreeCache`), and `DiscoverAsync` is **never invoked** by the deployed equipment-kind address-space build. Production materializes the served tree **purely from Config-DB Tag rows** (`EquipmentNodeWalker.cs:24-25,161-165`; `OpcUaPublishActor.RebuildAddressSpace``MaterialiseEquipmentTags(composition)` at `OpcUaPublishActor.cs:326-337`). No runtime caller of `IDriver.DiscoverAsync` exists. ⇒ **By-design gap, not a bug.** Surfacing FixedTree under the Equipment tree is a *feature* (author explicit Tag rows per signal, or extend the composition pipeline to merge `ITagDiscovery` output — a substantial `AddressSpaceComposer`/`AddressSpaceApplier` change).
**Why symptom #1 (`0x80320000` even under a 30 s subscribe):** equipment-tag node value is **push-only**. `0x80320000` is the materialization **seed** (`OtOpcUaNodeManager.EnsureVariable:1400-1402`); there is **no `OnReadValue`/server-side sampler** for equipment tags, so both Read and MonitoredItem serve the cached node value. The seed is overwritten **only** by `ForwardToMux → WriteValue` (`OtOpcUaNodeManager.cs:261-281`), which writes even Bad/null. `StatusFromQuality` maps Bad → `0x80000000`, **not** `0x80320000`. ⇒ A persistent `0x80320000` **proves no `AttributeValueUpdate` ever landed** for those two NodeIds. The routing chain (`DriverHostActor.PushDesiredSubscriptions``_nodeIdByDriverRef``DriverInstanceActor.ResubscribeDesired``PollGroupEngine` forceRaise → `OnDataChangeForward``ForwardToMux`) is **provably self-consistent for FOCAS** (`FocasTagConfigModel` writes no top-level `FullName``ExtractTagFullName` returns the raw JSON blob → `FocasEquipmentTagParser.TryParse` resolves it; subscribed ref == published ref == routing-map key == NodeId, all from the same `t.FullName`). So the static mapping is **refuted**; the break is **runtime**: either the FOCAS child was never subscribed those refs, or `_nodeIdByDriverRef` lacked the `(DriverInstanceId, FullName)` key at publish time — most likely a **`DriverInstanceId` attribution mismatch** between the equipment Tag rows and the spawned `DriverInstance`.
**What only box logs (or a ConfigDb snapshot read) can disambiguate for symptom #1 — capture these on the driver-role node after a deploy:**
1. `DriverHostActor` info *"SubscribeBulk pushed {Refs} references across {Drivers} driver(s)"* (`DriverHostActor.cs:1058`) — `Refs<2` ⇒ equipment Tag `DriverInstanceId` ≠ spawned FOCAS child id (attribution mismatch).
2. `DriverInstanceActor` info *"subscribed to {Count} refs"* (`DriverInstanceActor.cs:571`) — absent/`0``_desiredRefs` empty or never reached `Connected`.
3. `DriverHostActor` debug *"no equipment-tag NodeId for ({Driver},{Ref}) — value dropped"* (`DriverHostActor.cs:550`) — present ⇒ `_nodeIdByDriverRef` key miss.
4. `FocasDriver` debug *"FOCAS fixed-tree bootstrap failed … retrying"* every ~2 s (`FocasDriver.cs:686`) ⇒ confirms H1 (`cnc_sysinfo`/`cnc_rdaxisname` failing on the Makino). Needs Debug level + logger-constructed wire client.
5. A driver-side read of `MACRO:3901` (live Makino): `BadNodeIdUnknown` ⇒ H5 blank-`deviceHostAddress` trap; Good value ⇒ wire path fine, gap is purely the OPC-UA equipment subscription wiring.
**Cheaper-than-Phase-2 lead:** the prime suspect (Tag-row `DriverInstanceId` vs `DriverInstance` row mismatch) is checkable by reading the **deployed ConfigDb snapshot** (`OtOpcUaConfig` on `wonder-sql-vd03:1433`) — no service restart, no log-sink change. Worth doing before/instead of the Phase 2 log-sink change.
**Suggested minimal-fix shapes (NOT implemented):**
- Symptom #1: depends on which log line/row fires — if attribution miss, it's a **config/snapshot re-bind** (no code). If a real wiring gap, localize first.
- H5 latent: in `FocasDriver.ReadAsync`/`WriteAsync` device lookup (`:289`,`:346`), when `DeviceHostAddress` is blank and `_devices.Count==1`, resolve to the sole device instead of `BadNodeIdUnknown` (or make the AdminUI field mandatory).
- Symptom #2 (H2): no fix — by-design; surfacing FixedTree is a feature decision.
### ✅ ConfigDb snapshot read (2026-06-25, symptom #1 path — chosen over box logs)
Read live `OtOpcUaConfig` on `wonder-sql-vd03` (query run on-box so the SQL password never left the box). Ground truth:
- **Equipment** `EQ-3686c0272279` (`z-34184`): `DriverInstanceId=focas-z-34184`, `DeviceId=NULL`, `Enabled=True`, `UnsLineId=makino`.
- **Tags** `parts-count`/`parts-required`: both `DriverInstanceId=focas-z-34184`, `DeviceId=NULL`, `DataType=Double`, `AccessLevel=Read`, `FolderPath=NULL`, **`PollGroupId=NULL`**, `TagConfig={"deviceHostAddress":"10.201.31.5:8193","address":"MACRO:390{1,2}","dataType":"Float64"}`.
- **DriverInstance**: one row `focas-z-34184` (`DriverType=Focas`, `Enabled=True`, ns `dev-equipment`, cluster `DEV`).
- **Device** table: **0 rows** — NORMAL: `FocasDriver` builds `_devices` from `_options.Devices` (DriverConfig JSON) not the Device table (`FocasDriver.cs:97`).
**Refuted by this read:** prime-suspect `DriverInstanceId` attribution mismatch (matches exactly) and H5 blank-`deviceHostAddress` (present). The deployed config is **clean**. ⇒ symptom #1 is a pure value-flow-plumbing break. New live leads: **`PollGroupId=NULL`** on both tags (is a poll group required to subscribe/poll?) and the **resolver-registration** path (equipment-tag refs are "resolver-produced, not seeded at `InitializeAsync`" per `FocasDriver.cs:247` — does poll-time `TryResolve` of the JSON-blob ref ever succeed?). A second offline subagent trace of DriverHostActor↔DriverInstanceActor↔PollGroupEngine↔resolver is running to pin the exact broken link.
### ⚠️ FixedTree feature (symptom #2 — user chose "build the feature") — ARCHITECTURE REALITY
Mapped the composition pipeline. Two address-space paths exist: (1) **Equipment/UNS projection** `AddressSpaceComposer.Compose` (config entities only) → `AddressSpaceApplier.MaterialiseEquipmentTags` → the served `ns=2` tree where `EQ-…` lives; (2) **raw-driver namespace** `GenericDriverNodeManager.BuildAddressSpaceAsync``driver.DiscoverAsync(IAddressSpaceBuilder)`. **Path 2 is DEAD: `BuildAddressSpaceAsync` has no runtime caller and `OpcUaApplicationHost.PopulateAddressSpaces` (its referenced caller) no longer exists.** Even `GalaxyDriver.DiscoverAsync` (`:588`) is reachable only via that dead path — Galaxy surfaces its hierarchy by being **authored as config equipment/tags**, not via discovery. ⇒ In the current Equipment-kind model **every served node is config-driven; `ITagDiscovery`/`DiscoverAsync` is legacy/dead for serving.** So "build the FixedTree feature" is NOT re-wiring an existing path — it's a **new dynamic-node-injection capability** into the Equipment projection, and it must solve a **timing problem**: composition runs at deploy/apply (before the driver connects), but FixedTree data only exists after the driver's async `FixedTreeCache` bootstrap. The far cheaper alternative that yields the same visible result is to **author FixedTree signals as config Tag rows** (each bound to a FOCAS fixed-tree reference) — same mechanism every other equipment tag uses. **Recommend re-confirming scope with the user given this cost delta before building.**
### 🎯 ROOT CAUSE — symptom #1 (CONFIRMED, 2026-06-25, 2nd subagent trace + code verify)
**The FOCAS poll read hangs forever because (1) all wire I/O for a device shares one socket with NO serialization, and (2) the steady-state read has NO timeout.**
- **Unsynchronized shared socket:** `FocasDriver.EnsureConnectedAsync` (`:1101-1128`) returns the single `device.Client` with no I/O mutex. `DeviceState.ProbeLock` (`:1172`) only synchronizes probes with each other. Four independent loops issue wire ops on that same socket concurrently: the equipment poll (`PollGroupEngine`, 1 s — `SubscriptionPublishingInterval` const `DriverHostActor.cs:58`), the FixedTree loop (250 ms / 2 s bootstrap), the probe (5 s), the recycle loop. FOCAS/2 is strict request-response on one socket → concurrent unsynchronized reads interleave; one reader consumes another's response PDU and the victim read blocks waiting for bytes that never come.
- **No read timeout:** the poll calls `FocasDriver.ReadAsync` (`:295-308`) → `WireFocasClient.ReadAsync` (`:417`) → `ReadMacroAsync` **without** the optional `timeout``FocasWireClient.CreateCallTimeout` (`:843-848`) skips `CancelAfter` when `timeout` is null → `ReadExpectedPduAsync` awaits the socket under a token that never fires. The blocked first `forceRaise` poll (`PollGroupEngine.cs:119`) never completes → `onChange` never fires → the `BadWaitingForInitialData` materialization seed (`OtOpcUaNodeManager.cs:1400`) is never overwritten → permanent `0x80320000`. (Connect `:1119` and probe `:604-610` ARE bounded — only the steady-state read isn't.)
**Why it works on the dev box but not deployed:** the dev-box harness/CLI does ONE read at a time with no FixedTree loop running → no socket collision → macros read fine (proven in the v3 work). Deployed, the FixedTree loop races the equipment poll on one socket → collision → hung read.
**Why both tags fail together:** they're in one driver's single 1 s subscription batch; `PollOnceAsync` reads them sequentially and hangs on the first, so neither reaches `onChange`.
**Ruled out:** `PollGroupId=NULL` is a red herring — verified at `ConfigComposer.cs:38` (snapshots all tags, no poll-group filter), `AddressSpaceComposer.Compose` (never reads `PollGroupId`), and `DriverHostActor.PushDesiredSubscriptions:965-973` (groups by `DriverInstanceId`, constant 1 s interval, no poll-group keying). Routing-key mismatch ruled out: subscribe ref == routing key == `onChange` ref (all from the same `FullName`). Not-subscribed ruled out by elimination: a *served* node implies it's in the same `ParseComposition` artifact used for subscribe; and every *deterministic* `ReadAsync` outcome pushes a specific Bad code (`BadNodeIdUnknown`/`BadCommunicationError`/`BadNotSupported`), none of which is `0x80320000` — only a hung (never-returning) read leaves the seed intact.
**PROPOSED FIX (two parts, FOCAS-driver-only, single managed DLL, no migration/proto change — same low-blast-radius deploy path as the v3 work):**
1. **Serialize per-device wire I/O** — an async gate (`SemaphoreSlim`) on `DeviceState` held around each `EnsureConnected + wire op` so the equipment poll, FixedTree loop, probe, and recycle never collide on the one socket. *This is what makes values actually read Good.* MUST be paired with #2 (a lock around an unbounded read would deadlock all I/O).
2. **Bound every steady-state wire read/write** with `_options.Timeout` (mirror the probe's linked CTS at `:604-610`, or thread `timeout` through `WireFocasClient.ReadAsync``ReadMacroAsync`). Converts any stall into a recoverable `BadCommunicationError` push (overwrites the seed, downgrades health → observable + self-healing) instead of permanent silence.
**Confidence:** the two structural gaps are *proven from code*; that their interaction is the active trigger is *strongly-suggested* (works single-threaded on dev box, hangs deployed; fits every observation). Definitive live confirmation = the fix makes the tags leave `0x80320000` (Good, or recoverable BadComm if the macro read genuinely fails). **This is a debugging fix headed for a production CNC node — implement + unit-test (serialization + read-timeout) locally, then deploy to wonder and verify, per the diagnose-before-deploy discipline.**
### Implementation + deploy status (2026-06-26)
- **Fix implemented** on branch `fix/focas-poll-io-serialization` (off `feat/focas-pdu-v3`): new `SynchronizedFocasClient` decorator (per-device `SemaphoreSlim` gate + per-call `_options.Timeout`) wired into `FocasDriver.EnsureConnectedAsync`; `ReadAsync`/`WriteAsync` now map a per-call timeout (OCE while caller token live) → `BadCommunicationError` instead of rethrowing. **243 FOCAS tests green** (8 new in `FocasIoSerializationTests.cs` + 1 sibling in `FocasReadWriteTests`), full solution builds 0 errors.
- **Live baseline re-confirmed** via OtOpcUa CLI from the Mac: both tags `[80320000]`.
- **Deployed** the Release DLL to wonder (single managed-DLL swap; v3 DLL 222208 B → fix DLL 228864 B; backup `E:\ApiInstall\OtOpcUa\_focasbak-pre-iofix-20260626T041913.dll`; service Running, OPC 4840 listening).
- **BLOCKER for verification:** after the swap+restart the FOCAS driver is **not connecting at all** (zero TCP to `10.201.31.5:8193` at 150 s uptime — not even a probe connect), so tags still `[80320000]`.
- **Deploy API enabled** (user-approved): added `Security__DeployApiKey` (64-char generated, value on-box only) to the `OtOpcUaHost` service `Environment` REG_MULTI_SZ + restart; env backed up to `_envbak-pre-deploykey-20260626T043001.txt`. Headless `POST http://localhost:9000/api/deployments` (key read from registry on-box) returned **HTTP 202 Accepted**, sealed deployment `ef384b04…`.
- **Tags STILL `[80320000]`** after the sealed deploy. **Root of the live blocker (from box logs):** the service writes Serilog to `C:\Windows\System32\logs\otopcua-<date>.log` (relative-path-from-service-CWD bug — pending.md D2). Today's log shows the **admin/publish side only**: `Phase7Applier: hierarchy materialised (areas=1,lines=1,equipment=1)` + `equipment tags materialised (tags=2)` at each restart (04:19, 04:30) — which is why the 2 tags are served with the `0x80320000` seed. But across the **entire** day's log there are **ZERO** driver-side lines: `DriverHost`=0, `SubscribeBulk`=0, `subscribed to`=0, `Focas`=0, `GenericDriver`=0 (the 13 `DriverInstance` hits are all SQL `FROM [DriverInstance]`). **⇒ the DriverHostActor / driver-role side is not spawning the FOCAS driver at all on this node today**, so there is no poll, no value push, permanent seed. Health: `/healthz` Healthy, Akka member **Up**, `admin-leader` **Active**; roles env intact (`Cluster__Roles__0=admin`, `__1=driver`, seed=self:4053); `DOTNET_ENVIRONMENT=Production`. The admin half is fully alive; only the driver half is silent.
- **This is a SEPARATE blocker from the I/O fix** (which is correct, unit-tested, and deployed — but cannot be exercised until the driver actually polls).
### Driver-host activation diagnosis (2026-06-26, Debug-logging pass + code reads)
- **Debug Serilog enabled** (env `Serilog__MinimumLevel__Default=Debug` + EF/Microsoft/System→Warning; env bak `_envbak-pre-debug-*.txt`). Revealed: **NO exceptions / load errors anywhere** (my fix DLL is NOT the cause), but also **ZERO Akka lines** in Serilog (`Akka`/`Member`/`akka.tcp`/`singleton` = 0).
- **`DriverHostActor` logs via `Context.GetLogger()` (Akka `ILoggingAdapter`), and there is NO `Akka.Logger.Serilog` bridge anywhere in the app** (verified across `*.cs`/`*.csproj`/HOCON). So Akka actor logs go to the default console logger → **discarded for a Windows service → invisible in BOTH Production AND Development.** `appsettings.Development.json` only changes Serilog min-levels (+ `Ldap:DevStubMode=true`); it adds no Akka bridge. ⇒ **`DOTNET_ENVIRONMENT=Development` would NOT surface the driver host** (held off — futile + flips DevStubMode on prod).
- **The `DriverHostActor` IS instantiated** (unconditionally, same `system.ActorOf` block as the working `OpcUaPublishActor` at `Runtime/ServiceCollectionExtensions.cs:~232`). So it runs — it just isn't producing a connected FOCAS driver, and logs via Akka so the reason is invisible.
- **STRONG HYPOTHESIS (explains everything):** on bootstrap the `DriverHostActor` *recovers* its persisted state (its own logs name `entering Steady` / `recovered Applied state at rev {Rev}`), but a process restart has already killed the live `DriverInstanceActor`s + their FOCAS sockets — and **recovery does NOT re-instantiate them**. Drivers re-spawn only on a deployment dispatch whose **revision changes**. This config is now **stable** (`revisionHash 924b59097eba…` is identical across `0c2db588`/`ef384b04`/`a2a84646` — every `POST /api/deployments` returns 202 but with the SAME rev), so repeated deploys are no-ops for the driver host → after any restart there are **no live drivers** and nothing re-spawns them → zero Makino TCP → tags frozen at the `0x80320000` materialization seed. The **prior session got HEALTHY because the config was being *authored*** (each deploy had a NEW rev → forced a spawn); once authoring stopped, a restart leaves the driver side dark. ⇒ likely the real reason "host-restart didn't change it" in the original symptom.
- **Implication:** to exercise the I/O fix live, force a driver re-spawn by bumping the revision (e.g. toggle the FOCAS DriverInstance `Enabled` off→on, or any benign config edit, then deploy) — reversible. Separately, "restart should re-spawn applied drivers" + "Akka logs should bridge to Serilog" are real OtOpcUa robustness/observability gaps (the latter = the observability bottleneck this plan flagged).
- **Box env mutations still in place (to revert when done):** `Security__DeployApiKey` (deploy API), `Serilog__MinimumLevel__*` (Debug). Backups: `_envbak-pre-deploykey-*`, `_envbak-pre-debug-*`.
### Akka→Serilog observability attempt (2026-06-26) — code done, prod deploy BLOCKED by self-contained layout
- **Code change (correct, builds 0 errors):** added `Akka.Logger.Serilog` 1.5.60 (deps all satisfied by existing pins — Akka 1.5.62 / Serilog 4.3.1) to `Directory.Packages.props` + the Cluster csproj, and `loggers=["Akka.Logger.Serilog.SerilogLogger, …"]` + `loglevel=DEBUG` + `logger-startup-timeout=30s` to the embedded `Resources/akka.conf`. This bridges the DriverHostActor's Akka `ILoggingAdapter` into the Serilog file sink.
- **Prod deploy FAILED TWICE (both auto-rolled-back; service is UP + 4840 listening the whole time after each):**
1. Cluster.dll + Akka.Logger.Serilog.dll swap → crash `FileNotFoundException: Could not load 'Akka.Logger.Serilog 1.5.60'` from `Serilog.Settings.Configuration.ConfigurationReader.LoadConfigurationAssemblies` (the deployed `OtOpcUa.Host.deps.json` didn't list the new assembly; Serilog's `*.Serilog` config-assembly scan then fails fatally).
2. Added the updated `OtOpcUa.Host.deps.json` → crash **"Could not resolve CoreCLR path."** ⇒ the box deployment is **SELF-CONTAINED** (bundles its own runtime), so a `dotnet build` (framework-dependent) deps.json breaks the apphost's runtime resolution.
- **Conclusion:** adding a NEW assembly to this box requires a **full self-contained publish-overlay** (match the box layout; preserve `appsettings*`/`data\`), NOT a DLL/deps.json swap. That's a heavy/risky prod op for what is *diagnostic* observability of the (separate, pre-existing) driver-host re-spawn issue.
- **Prod state now (verified healthy):** FOCAS **I/O-fix DLL still deployed** (228864 B — untouched by these rollbacks), Cluster.dll + deps.json restored to baseline, `Akka.Logger.Serilog.dll` renamed `.disabled`, service Running + OPC 4840 listening. Deploy-API key + Debug-Serilog env still set.
- **RECOMMENDED PIVOT:** the cheap, no-prod-change way to validate the I/O fix is to **force a driver re-spawn via a benign config rev-bump** (the leading hypothesis); and the *proper* driver-host re-spawn fix + the Akka-observability are best developed in a **local docker-dev repro** (safe, full logging, one clean publish-overlay deploy at the end) rather than iterated on the production CNC node. **(User chose: move proper fix to local docker-dev.)**
### 🎯 ROOT CAUSE of the driver-not-spawning — STALE DEPLOYED BINARY (CONFIRMED 2026-06-26)
The DriverHostActor's bootstrap-recovery **already re-spawns drivers in the current source**: `BootstrapRecover` `case Applied:` logs `"recovered Applied state at rev"` and calls `RestoreApplied()` (`DriverHostActor.cs:449-457`), which does `ReconcileDrivers` (re-spawn) + `RebuildAddressSpace` + `PushDesiredSubscriptions` (`:910-918`) — comment: *"the in-memory driver children + OPC UA address space were lost on restart … re-spawn … instead of waiting for a config change (whose identical-config revision would no-op)."* Added by `b1b3f3ff` (2026-06-06) + `397f9b78` (2026-06-07); both ancestors of `feat/focas-pdu-v3`; covered by existing `Restore_on_bootstrap_*` tests.
- **The DEPLOYED wonder `Runtime.dll` is the June-16 base install** (`mtime 2026-06-16 13:05:46`; only the FOCAS *driver* DLL was ever swapped). Binary string scan: `RestoreApplied`/`ReconcileDrivers` method names PRESENT, but the bootstrap-restore log strings `"recovered Applied state at rev"` + `"restored served state for applied deployment"` are **ABSENT** ⇒ the deployed binary **predates b1b3f3ff** and does **NOT** restore served state / re-spawn drivers on bootstrap. **That is exactly why, on this stable-config box, a restart leaves the driver side dark** (and why only config-authoring re-deploys ever lit it up).
- **⇒ No new driver-host fix is needed — it already exists in source.** The proper remedy = **deploy the current Host** (full self-contained publish-overlay onto `E:\ApiInstall\OtOpcUa`, preserving `appsettings*`/`data\`), which in one shot delivers: (1) `RestoreApplied`-on-bootstrap (drivers re-spawn after restart), (2) the FOCAS I/O serialization fix (symptom #1), (3) the Akka→Serilog bridge (observability). The earlier DLL-swap crashes were purely the self-contained-layout mismatch — a full publish-overlay is internally consistent and avoids them.
- **PLAN:** confirm the current Host build boots cleanly + re-spawns drivers on restart in **docker-dev** (safe), then do ONE self-contained publish-overlay to wonder + verify tags leave `0x80320000`.
### ✅ docker-dev confirmation (2026-06-26) — current build re-spawns on bootstrap + Akka bridge works
Built the current source into the docker-dev image (`otopcua-host:dev`), booted `central-1` (fused admin+driver, like wonder). Results:
- **Boots cleanly** with `Akka.Logger.Serilog` (the prod DLL-swap crashes were purely the self-contained-layout mismatch, now confirmed).
- **Akka→Serilog bridge works** — but needed TWO fixes beyond the package: (1) Akka.Hosting owns logger setup so HOCON `akka.loggers` is ignored → wire via `ConfigureLoggers(setup => { setup.LogLevel=DebugLevel; setup.ClearLoggers(); setup.AddLogger<SerilogLogger>(); })` in `WithOtOpcUaClusterBootstrap`; (2) `AddZbSerilog` registers Serilog as the MEL provider but does NOT set the static `Serilog.Log.Logger` (which `Akka.Logger.Serilog` writes to, AND which the Program.cs startup banner uses) → set `Serilog.Log.Logger = app.Services.GetRequiredService<Serilog.ILogger>()` in Program.cs right after `Build()`. With both, the startup banner + full Akka cluster/DriverHost logs now emit. *(The `Log.Logger`-unset gap is a latent bug in the shared `AddZbSerilog` lib affecting all 3 apps' static-`Log` calls — worth a follow-up there.)*
- **`RestoreApplied` re-spawns drivers on bootstrap — CONFIRMED live:** `DriverHost central-1: recovered Applied state at rev …``spawned GalaxyMxGateway/Modbus/OpcUaClient driver` (×3) → `SubscribeBulk pushed 5 references across 3 driver(s)``restored served state for applied deployment … on bootstrap``DriverInstance …: subscribed to N refs`. These are the exact strings ABSENT from the stale wonder binary. ⇒ deploying the current Host fixes the wonder driver-not-spawning blocker.
### Deploy plan (self-contained publish-overlay → wonder)
- Local: `dotnet publish …Host.csproj -c Release -r win-x64 --self-contained true -p:PublishSingleFile=false` (self-contained = brings its own runtime; no version-match concern with the box).
- Zip + SFTP to `win64`. Box overlay (self-healing): full app-dir backup → robocopy publish over `E:\ApiInstall\OtOpcUa` **EXCLUDING `pki\` (OPC server cert) + `appsettings*.json`** → start → verify (4840 + tags) → AUTO-ROLLBACK from backup if unhealthy. Env (deploy key, Debug logging) is registry-side, untouched by the file overlay.
- Then revert the Debug-Serilog env (now superseded; verbose) and verify `parts-count`/`parts-required` read Good (or recoverable BadComm), and that the FixedTree/driver values flow.
### ✅✅ SYMPTOM #1 FIXED + LIVE-VALIDATED ON WONDER (2026-06-26)
After the self-contained overlay (current Host) + two light single-DLL FOCAS swaps, the validation revealed — and each fix peeled back — a **cascade of latent FOCAS-config-vs-driver mismatches** the stale binary had masked. Final state, all live on wonder:
- `DriverHost …: recovered Applied state … → spawned Focas driver focas-z-34184 (stub=false) → DriverInstance focas-z-34184: connected → subscribed to 2 refs`**two ESTABLISHED TCP sessions to `10.201.31.5:8193`**.
- **`read ns=2;s=EQ-3686c0272279/parts-count` → Value 0, Status `0x00000000` (Good)**; `parts-required` → Good. (Value 0 is correct on the idle machine — status, not magnitude.) The original `0x80320000` is gone.
**The complete fix chain (all on `fix/focas-poll-io-serialization`, deployed):**
1. **FOCAS I/O serialization + read timeout** (`SynchronizedFocasClient`) — the original diagnosed root cause: prevents the poll read hanging on the shared socket.
2. **`RestoreApplied`-on-bootstrap** — already in source since `b1b3f3ff`; the wonder fix was deploying the current Host over the stale June-16 binary so the driver re-spawns on restart.
3. **Akka→Serilog bridge** (`ConfigureLoggers().AddLogger<SerilogLogger>()` + set static `Serilog.Log.Logger` in Program.cs) — made the driver-host actor observable; this is what surfaced the next two issues.
4. **`FlexibleStringConverter`** on the FOCAS config `Series` — the AdminUI persists the enum as a number (`"series":6`); the factory now tolerates number-or-string instead of throwing → stub.
5. **Scheme-less host tolerance** in `FocasHostAddress.TryParse` — the AdminUI persists `hostAddress` as a bare `ip:port`; `TryParse` now accepts it (canonical `focas://` unchanged) instead of failing init.
- FOCAS test suite **247 green**; each fix carries a regression test.
- **Follow-up (product quality):** the AdminUI authors FOCAS configs (`series` as number, `hostAddress` without `focas://`) that the driver only now tolerates — the AdminUI↔driver config-format mismatch is worth reconciling at the source. Also: the shared `AddZbSerilog` not setting static `Serilog.Log.Logger` is a latent gap across all 3 apps. And the FixedTree-under-Equipment feature (task #14) is still pending.
## Phase 2 — Get OtOpcUa runtime logs on wonder
Make the Host emit driver-level logs so the data plane is observable. Options (least invasive first): point the service at a Serilog file sink via config/env, or temporarily run with `DOTNET_ENVIRONMENT=Development` (file sink + dev errors — cf. MxGateway note), or add a console capture. Preserve `appsettings*`/`data\`; restore the env after. Then read: did `InitializeAsync` start the FixedTree loop, does the bootstrap throw (and on which call), is `ReadAsync` invoked for the equipment tags, what does it return.
- **Access:** servecli `:2222`, key `~/.ssh/servecli_wonder`, `scratchpad/wonder-ps.sh` (base64 PS over the cmd PTY), SFTP root `C:\Users\dohertj2\Desktop\win64`.
- **Output:** the actual runtime behavior of the data poll + equipment read on the box.
## Phase 3 — Local reproduction in docker-dev (isolate FOCAS-specific vs general)
Reproduce off the production box: in the docker-dev OtOpcUa, configure an equipment tag bound to a driver and check whether values flow at all. Use a driver with an easy local source (e.g. Modbus against a local sim, or the FOCAS wire client against a v1 responder if one can be stood up). If equipment-tag values flow for another driver locally but not FOCAS → FOCAS-specific; if they don't flow for any → a general equipment-projection/data-plane gap. A local repro gives a full-logging debug loop.
- **Output:** scope (FOCAS-only vs general) + a local failing case to fix against.
## Phase 4 — Root cause + minimal fix
From Phases 13, fix the smallest thing that makes the equipment tag carry a value and (if H2 is not by-design) the FixedTree surface. Likely shapes: a swallowed bootstrap exception; a seed/poll-group wiring gap; an Equipment-projection that should include FixedTree/driver auto-nodes; or a data-plane gate that needs config on the wonder node.
## Phase 5 — Validate
- Local (docker-dev) green where reproduced; unit/integration tests for the fixed path.
- Live: re-deploy to wonder, then via the OtOpcUa CLI confirm `parts-count`/`parts-required` read **Good** (value 0 is correct on this idle machine — assert status, not magnitude) and, if applicable, FixedTree Identity/Axes nodes appear with live values. The live Makino helps for FixedTree but is NOT blocking for the value-flow plumbing (macro values are 0 regardless), so this is lower time-pressure than the v3 capture was.
## Phase 6 — Docs + commit
Update this plan's status + the deployment doc; commit on a branch (separate from `feat/focas-pdu-v3` if the fix is OtOpcUa-core rather than FOCAS-driver). Push per the repo flow when asked.
---
## Notes
- **Execution approach:** this is a *debugging* investigation (unknown root cause) — diagnose before fixing; reproduce before claiming a fix; change the smallest thing. Don't deploy a guessed fix to the production CNC node.
- **Key node IDs / endpoints:** equipment `ns=2;s=EQ-3686c0272279`; tags `…/parts-count` (`MACRO:3901`), `…/parts-required` (`MACRO:3902`); OPC `opc.tcp://wonder-app-vd03.zmr.zimmer.com:4840/OtOpcUa`; AdminUI `http://wonder-app-vd03.zmr.zimmer.com:9000` (DisableLogin); deploy = AdminUI `/deployments` → "Deploy current configuration" (Blazo→Akka; no headless API).
- **Reusable tools:** `scratchpad/focas-status/` (live IFocasClient harness), `scratchpad/wonder-ps.sh`, `scratchpad/deploy-focas-v3.ps1`; the OtOpcUa CLI `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` (`read`/`subscribe`/`browse --recursive`).
@@ -7,6 +7,12 @@
# Any divergence from these defaults must be deliberate and recorded in docs/v2/Architecture.md. # Any divergence from these defaults must be deliberate and recorded in docs/v2/Architecture.md.
akka { akka {
# Akka logger wiring (route ILoggingAdapter → Serilog) is configured via Akka.Hosting's
# ConfigureLoggers in ServiceCollectionExtensions.WithOtOpcUaClusterBootstrap — HOCON
# `akka.loggers` alone is not honored by Akka.Hosting. logger-startup-timeout is kept here
# since the Serilog logger can be slow to initialize at startup.
logger-startup-timeout = 30s
extensions = [ extensions = [
"Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider, Akka.Cluster.Tools" "Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider, Akka.Cluster.Tools"
] ]
@@ -1,5 +1,7 @@
using Akka.Cluster.Hosting; using Akka.Cluster.Hosting;
using Akka.Event;
using Akka.Hosting; using Akka.Hosting;
using Akka.Logger.Serilog;
using Akka.Remote.Hosting; using Akka.Remote.Hosting;
using Microsoft.Extensions.Configuration; using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.DependencyInjection;
@@ -53,6 +55,19 @@ public static class ServiceCollectionExtensions
builder.AddHocon(HoconLoader.LoadBaseConfig(), HoconAddMode.Append); builder.AddHocon(HoconLoader.LoadBaseConfig(), HoconAddMode.Append);
// Route Akka's internal ILoggingAdapter (DriverHostActor, DriverInstanceActor, cluster
// events, …) into Serilog so those logs reach the same sinks as the MEL/Serilog application
// logs. Akka.Hosting owns logger setup, so HOCON `akka.loggers` alone is not honored — the
// logger must be registered through ConfigureLoggers. Without this the actor graph logs only
// to the default StandardOutLogger (discarded under the Windows service host), which is why
// the driver-role actors were invisible during the 2026-06 data-plane investigation.
builder.ConfigureLoggers(setup =>
{
setup.LogLevel = LogLevel.DebugLevel;
setup.ClearLoggers();
setup.AddLogger<SerilogLogger>();
});
builder.WithRemoting(new RemoteOptions builder.WithRemoting(new RemoteOptions
{ {
HostName = options.Hostname, HostName = options.Hostname,
@@ -10,6 +10,7 @@
<PackageReference Include="Akka.Cluster"/> <PackageReference Include="Akka.Cluster"/>
<PackageReference Include="Akka.Cluster.Hosting"/> <PackageReference Include="Akka.Cluster.Hosting"/>
<PackageReference Include="Akka.Cluster.Tools"/> <PackageReference Include="Akka.Cluster.Tools"/>
<PackageReference Include="Akka.Logger.Serilog"/>
<PackageReference Include="Akka.Remote.Hosting"/> <PackageReference Include="Akka.Remote.Hosting"/>
<PackageReference Include="Microsoft.Extensions.Hosting"/> <PackageReference Include="Microsoft.Extensions.Hosting"/>
<PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions"/> <PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions"/>
@@ -306,7 +306,16 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
Volatile.Read(ref _health).LastSuccessfulRead, Volatile.Read(ref _health).LastSuccessfulRead,
$"FOCAS status 0x{status:X8} reading {reference}")); $"FOCAS status 0x{status:X8} reading {reference}"));
} }
catch (OperationCanceledException) { throw; } catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) { throw; }
catch (OperationCanceledException)
{
// Per-call timeout (not external cancellation) — the read stalled past the device
// Timeout budget. Surface a recoverable comm error so the BadWaitingForInitialData
// seed is overwritten and health degrades, instead of the read hanging forever.
results[i] = new DataValueSnapshot(null, FocasStatusMapper.BadCommunicationError, null, now);
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead, $"FOCAS read timed out for {reference}"));
}
catch (Exception ex) catch (Exception ex)
{ {
results[i] = new DataValueSnapshot(null, FocasStatusMapper.BadCommunicationError, null, now); results[i] = new DataValueSnapshot(null, FocasStatusMapper.BadCommunicationError, null, now);
@@ -356,7 +365,15 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
var status = await client.WriteAsync(parsed, def.DataType, w.Value, cancellationToken).ConfigureAwait(false); var status = await client.WriteAsync(parsed, def.DataType, w.Value, cancellationToken).ConfigureAwait(false);
results[i] = new WriteResult(status); results[i] = new WriteResult(status);
} }
catch (OperationCanceledException) { throw; } catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) { throw; }
catch (OperationCanceledException)
{
// Per-call timeout (not external cancellation) — the write stalled past the device
// Timeout budget. Surface a recoverable comm error rather than aborting the batch.
results[i] = new WriteResult(FocasStatusMapper.BadCommunicationError);
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead, $"FOCAS write timed out for {w.FullReference}"));
}
catch (NotSupportedException nse) catch (NotSupportedException nse)
{ {
results[i] = new WriteResult(FocasStatusMapper.BadNotSupported); results[i] = new WriteResult(FocasStatusMapper.BadNotSupported);
@@ -1113,7 +1130,11 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
device.Client = null; device.Client = null;
} }
device.Client = _clientFactory.Create(); // Wrap the raw wire client so every operation on the device's single FOCAS/2 socket is
// serialized (request→response on one socket cannot interleave) and time-bounded. Without
// this, the equipment poll, fixed-tree loop, probe, and recycle loop collide on the shared
// socket and a stalled read blocks forever — leaving bound tags at BadWaitingForInitialData.
device.Client = new SynchronizedFocasClient(_clientFactory.Create(), _options.Timeout);
try try
{ {
await device.Client.ConnectAsync(device.ParsedAddress, _options.Timeout, ct).ConfigureAwait(false); await device.Client.ConnectAsync(device.ParsedAddress, _options.Timeout, ct).ConfigureAwait(false);
@@ -195,12 +195,41 @@ public static class FocasDriverFactoryExtensions
AllowTrailingCommas = true, AllowTrailingCommas = true,
}; };
/// <summary>
/// Reads a JSON property as a string, tolerating a JSON <b>number</b> token as well. The
/// AdminUI persists the FOCAS <c>Series</c> enum as its integer value (e.g. <c>"series":6</c>),
/// while this DTO models <c>Series</c> as a string handed to <see cref="ParseSeries"/>
/// (Enum.TryParse accepts the numeric form). Without this, System.Text.Json throws
/// "Cannot get the value of a token type 'Number' as a string" on the bare number and the
/// driver falls back to a stub. Accepts string / number / null and emits a string.
/// </summary>
internal sealed class FlexibleStringConverter : JsonConverter<string?>
{
public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
reader.TokenType switch
{
JsonTokenType.String => reader.GetString(),
JsonTokenType.Number => reader.TryGetInt64(out var n)
? n.ToString(System.Globalization.CultureInfo.InvariantCulture)
: reader.GetDouble().ToString(System.Globalization.CultureInfo.InvariantCulture),
JsonTokenType.Null => null,
_ => throw new JsonException($"Expected string, number, or null but got {reader.TokenType}."),
};
public override void Write(Utf8JsonWriter writer, string? value, JsonSerializerOptions options)
{
if (value is null) writer.WriteNullValue();
else writer.WriteStringValue(value);
}
}
internal sealed class FocasDriverConfigDto internal sealed class FocasDriverConfigDto
{ {
/// <summary>Gets or sets the FOCAS client factory backend name (e.g. "wire" or "stub").</summary> /// <summary>Gets or sets the FOCAS client factory backend name (e.g. "wire" or "stub").</summary>
public string? Backend { get; init; } public string? Backend { get; init; }
/// <summary>Gets or sets the CNC series for this driver.</summary> /// <summary>Gets or sets the CNC series for this driver.</summary>
[JsonConverter(typeof(FlexibleStringConverter))]
public string? Series { get; init; } public string? Series { get; init; }
/// <summary>Gets or sets the operation timeout in milliseconds.</summary> /// <summary>Gets or sets the operation timeout in milliseconds.</summary>
@@ -234,6 +263,7 @@ public static class FocasDriverFactoryExtensions
public string? DeviceName { get; init; } public string? DeviceName { get; init; }
/// <summary>Gets or sets the CNC series for this device (overrides top-level series if provided).</summary> /// <summary>Gets or sets the CNC series for this device (overrides top-level series if provided).</summary>
[JsonConverter(typeof(FlexibleStringConverter))]
public string? Series { get; init; } public string? Series { get; init; }
/// <summary> /// <summary>
@@ -21,9 +21,19 @@ public sealed record FocasHostAddress(string Host, int Port)
{ {
if (string.IsNullOrWhiteSpace(value)) return null; if (string.IsNullOrWhiteSpace(value)) return null;
const string prefix = "focas://"; const string prefix = "focas://";
if (!value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) return null;
var body = value[prefix.Length..]; // Canonical form is focas://{ip}[:{port}], but the AdminUI persists the device host as a
// scheme-less "{ip}[:{port}]" (e.g. "10.201.31.5:8193"). Accept that too: take the body
// after focas:// when present, else the whole value when it carries NO other URI scheme
// (a "://" that isn't ours — e.g. http:// — is still rejected). The host-contains-colon
// guard below then rejects malformed scheme typos like "focas:10.0.0.5:8193".
string body;
if (value.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
body = value[prefix.Length..];
else if (!value.Contains("://", StringComparison.Ordinal))
body = value;
else
return null;
if (string.IsNullOrEmpty(body)) return null; if (string.IsNullOrEmpty(body)) return null;
var colonIdx = body.LastIndexOf(':'); var colonIdx = body.LastIndexOf(':');
@@ -39,7 +49,9 @@ public sealed record FocasHostAddress(string Host, int Port)
{ {
host = body; host = body;
} }
if (string.IsNullOrEmpty(host)) return null; // Empty host, or a host still carrying a colon (e.g. the malformed "focas:10.0.0.5" left
// when someone wrote "focas:10.0.0.5:8193" without the //), is invalid.
if (string.IsNullOrEmpty(host) || host.Contains(':', StringComparison.Ordinal)) return null;
return new FocasHostAddress(host, port); return new FocasHostAddress(host, port);
} }
} }
@@ -0,0 +1,152 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.FOCAS;
/// <summary>
/// Decorates an <see cref="IFocasClient"/> so that every wire operation on the device's
/// single FOCAS/2 socket is (1) <b>serialized</b> against all other operations and
/// (2) <b>time-bounded</b>.
/// </summary>
/// <remarks>
/// <para>FOCAS/2 over TCP:8193 is a strict request→response protocol on ONE socket. The
/// driver holds a single <see cref="IFocasClient"/> per device, but several independent loops
/// read from it concurrently — the equipment poll (<see cref="FocasDriver.ReadAsync"/>), the
/// fixed-tree loop (<c>FixedTreeLoopAsync</c>), the connectivity probe, and the recycle loop.
/// Without serialization, two reads interleave their <c>send(request); read(response)</c> on the
/// same socket: one reader consumes the other's response PDU and the victim then blocks forever
/// waiting for bytes that never arrive — leaving the bound OPC UA node stuck at
/// <c>BadWaitingForInitialData</c>. This was the root cause of FOCAS equipment tags never
/// surfacing a value while the probe reported HEALTHY (the probe reads work single-threaded on a
/// dev box, but collide deployed once the fixed-tree loop runs concurrently).</para>
///
/// <para>The gate (<see cref="SemaphoreSlim"/> of count 1) makes each request→response atomic on
/// the socket. The per-call timeout ensures a stalled response can never hold the gate — and thus
/// the socket — indefinitely; a hung read surfaces as a recoverable error at the configured
/// <c>Timeout</c> budget instead of permanent silence. The gate and timeout are paired
/// deliberately: a lock around an <i>unbounded</i> read would deadlock all I/O for the device.</para>
///
/// <para><see cref="ConnectAsync"/> and <see cref="ProbeAsync"/> are serialized but NOT bounded by
/// this decorator's call timeout — they carry their own budgets (the connect timeout argument and
/// the probe's caller-supplied linked token respectively), and double-bounding would shrink them.</para>
/// </remarks>
public sealed class SynchronizedFocasClient : IFocasClient
{
private readonly IFocasClient _inner;
private readonly TimeSpan _callTimeout;
private readonly SemaphoreSlim _gate = new(1, 1);
/// <summary>Wraps <paramref name="inner"/> with per-device serialization + a per-call timeout.</summary>
/// <param name="inner">The underlying FOCAS client to serialize access to.</param>
/// <param name="callTimeout">
/// The budget applied to each data read/write. <see cref="TimeSpan.Zero"/> or negative disables
/// the per-call timeout (callers' own cancellation tokens still apply).
/// </param>
public SynchronizedFocasClient(IFocasClient inner, TimeSpan callTimeout)
{
_inner = inner ?? throw new ArgumentNullException(nameof(inner));
_callTimeout = callTimeout;
}
/// <inheritdoc />
public bool IsConnected => _inner.IsConnected;
/// <inheritdoc />
public Task ConnectAsync(FocasHostAddress address, TimeSpan timeout, CancellationToken cancellationToken) =>
RunGatedAsync(ct => _inner.ConnectAsync(address, timeout, ct), cancellationToken);
/// <inheritdoc />
public Task<bool> ProbeAsync(CancellationToken cancellationToken) =>
RunGatedAsync(ct => _inner.ProbeAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<(object? value, uint status)> ReadAsync(
FocasAddress address, FocasDataType type, CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.ReadAsync(address, type, ct), cancellationToken);
/// <inheritdoc />
public Task<uint> WriteAsync(
FocasAddress address, FocasDataType type, object? value, CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.WriteAsync(address, type, value, ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<FocasActiveAlarm>> ReadAlarmsAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.ReadAlarmsAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<FocasSysInfo> GetSysInfoAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetSysInfoAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<FocasAxisName>> GetAxisNamesAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetAxisNamesAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<FocasSpindleName>> GetSpindleNamesAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetSpindleNamesAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<FocasDynamicSnapshot> ReadDynamicAsync(int axisIndex, CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.ReadDynamicAsync(axisIndex, ct), cancellationToken);
/// <inheritdoc />
public Task<FocasProgramInfo> GetProgramInfoAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetProgramInfoAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<FocasTimer> GetTimerAsync(FocasTimerKind kind, CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetTimerAsync(kind, ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<FocasServoLoad>> GetServoLoadsAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetServoLoadsAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<int>> GetSpindleLoadsAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetSpindleLoadsAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<int>> GetSpindleMaxRpmsAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetSpindleMaxRpmsAsync(ct), cancellationToken);
/// <inheritdoc />
public Task<IReadOnlyList<int>> GetPositionFiguresAsync(CancellationToken cancellationToken) =>
RunBoundedAsync(ct => _inner.GetPositionFiguresAsync(ct), cancellationToken);
/// <inheritdoc />
public void Dispose()
{
_inner.Dispose();
_gate.Dispose();
}
// Gate only — the caller already governs the budget (connect timeout arg / probe linked token).
private async Task<T> RunGatedAsync<T>(Func<CancellationToken, Task<T>> op, CancellationToken ct)
{
await _gate.WaitAsync(ct).ConfigureAwait(false);
try { return await op(ct).ConfigureAwait(false); }
finally { _gate.Release(); }
}
private async Task RunGatedAsync(Func<CancellationToken, Task> op, CancellationToken ct)
{
await _gate.WaitAsync(ct).ConfigureAwait(false);
try { await op(ct).ConfigureAwait(false); }
finally { _gate.Release(); }
}
// Gate + per-call timeout. A fired timeout surfaces as OperationCanceledException whose token is
// the linked (not the caller's) token — callers distinguish it from real cancellation by testing
// their own token's IsCancellationRequested.
private async Task<T> RunBoundedAsync<T>(Func<CancellationToken, Task<T>> op, CancellationToken ct)
{
await _gate.WaitAsync(ct).ConfigureAwait(false);
try
{
if (_callTimeout <= TimeSpan.Zero)
return await op(ct).ConfigureAwait(false);
using var linked = CancellationTokenSource.CreateLinkedTokenSource(ct);
linked.CancelAfter(_callTimeout);
return await op(linked.Token).ConfigureAwait(false);
}
finally { _gate.Release(); }
}
}
@@ -228,6 +228,14 @@ builder.Services.AddOtOpcUaHealth();
builder.Services.AddOtOpcUaObservability(builder.Configuration); builder.Services.AddOtOpcUaObservability(builder.Configuration);
var app = builder.Build(); var app = builder.Build();
// AddZbSerilog registers Serilog as the MEL logging provider but does NOT assign the static
// Serilog.Log.Logger. Set it from the DI root logger so (1) static Log.* calls like the startup
// banner below emit, and (2) Akka.Logger.Serilog's SerilogLogger — which writes to Log.Logger —
// routes the actor graph's logs (DriverHostActor et al.) to the configured sinks. Must run before
// app.RunAsync() starts the ActorSystem (the Akka logger captures Log.Logger at system start).
Serilog.Log.Logger = app.Services.GetRequiredService<Serilog.ILogger>();
app.UseSerilogRequestLogging(); app.UseSerilogRequestLogging();
// Razor class library static assets (_content/<libname>/...) are served via endpoint // Razor class library static assets (_content/<libname>/...) are served via endpoint
@@ -70,9 +70,10 @@ public sealed class FocasDriverProbeTests
[Fact] [Fact]
public async Task MalformedHostAddress_Returns_OkFalse_WithNoHostPortMessage() public async Task MalformedHostAddress_Returns_OkFalse_WithNoHostPortMessage()
{ {
// "not-a-focas-url" is not a focas:// URL — TryParse returns null. // A foreign URI scheme ("http://…") is rejected by TryParse → null. (A bare
// "{ip}[:{port}]" without a scheme is now tolerated, so it can't be the malformed case.)
var result = await Probe.ProbeAsync( var result = await Probe.ProbeAsync(
"{\"devices\":[{\"hostAddress\":\"not-a-focas-url\"}]}", "{\"devices\":[{\"hostAddress\":\"http://10.0.0.5/\"}]}",
TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(3),
TestContext.Current.CancellationToken); TestContext.Current.CancellationToken);
@@ -38,6 +38,25 @@ public sealed class FocasFactoryConfigTests
drv.Options.FixedTree.TimerPollInterval.ShouldBe(TimeSpan.FromSeconds(30)); drv.Options.FixedTree.TimerPollInterval.ShouldBe(TimeSpan.FromSeconds(30));
} }
/// <summary>
/// The AdminUI persists FocasCncSeries as its integer value (e.g. <c>"series":6</c> = Thirty_i) —
/// a bare JSON number. The factory must tolerate it (via FlexibleStringConverter) and build the
/// real driver, not throw + fall back to a stub. Regression for the 2026-06-26 wonder data-plane
/// deploy where the driver stubbed on "Cannot get the value of a token type 'Number' as a string".
/// </summary>
[Fact]
public void CreateInstance_accepts_numeric_Series_from_AdminUI_serialization()
{
const string json = """
{"Backend":"wire","series":6,"devices":[{"hostAddress":"10.0.0.5:8193","deviceName":"Makino","series":6,"positionDecimalPlaces":0}]}
""";
var drv = FocasDriverFactoryExtensions.CreateInstance("drv-1", json);
drv.Options.Devices.ShouldHaveSingleItem();
drv.Options.Devices[0].Series.ShouldBe(FocasCncSeries.Thirty_i);
}
/// <summary>Verifies that the AlarmProjection configuration section is mapped to driver options.</summary> /// <summary>Verifies that the AlarmProjection configuration section is mapped to driver options.</summary>
[Fact] [Fact]
public void CreateInstance_maps_AlarmProjection_section_onto_options() public void CreateInstance_maps_AlarmProjection_section_onto_options()
@@ -0,0 +1,207 @@
using System.Diagnostics;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.FOCAS;
namespace ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests;
/// <summary>
/// Coverage for the FOCAS data-plane fix (2026-06-25 equipment-tag investigation): all wire I/O
/// on a device's single FOCAS/2 socket must be serialized (request→response cannot interleave)
/// and every steady-state read/write must be time-bounded so a stalled CNC read surfaces as a
/// recoverable error instead of hanging forever at BadWaitingForInitialData. See
/// <c>docs/plans/2026-06-25-otopcua-equipment-dataplane-investigation.md</c>.
/// </summary>
[Trait("Category", "Unit")]
public sealed class FocasIoSerializationTests
{
private static readonly FocasAddress Macro500 = new(FocasAreaKind.Macro, null, 500, null);
// ---- SynchronizedFocasClient: serialization ----
[Fact]
public async Task Concurrent_reads_are_serialized_onto_the_inner_client()
{
var inner = new RecordingClient { ReadDelay = TimeSpan.FromMilliseconds(20) };
await using var _ = NoopDispose(inner);
var client = new SynchronizedFocasClient(inner, TimeSpan.FromSeconds(5));
var reads = Enumerable.Range(0, 8)
.Select(_ => client.ReadAsync(Macro500, FocasDataType.Float64, CancellationToken.None));
await Task.WhenAll(reads);
inner.MaxConcurrency.ShouldBe(1); // never more than one wire op on the socket at a time
inner.ReadCount.ShouldBe(8);
}
// ---- SynchronizedFocasClient: per-call timeout ----
[Fact]
public async Task A_hung_read_is_bounded_by_the_call_timeout()
{
var inner = new RecordingClient { BlockReadUntilCancelled = true };
var client = new SynchronizedFocasClient(inner, TimeSpan.FromMilliseconds(100));
var sw = Stopwatch.StartNew();
await Should.ThrowAsync<OperationCanceledException>(
() => client.ReadAsync(Macro500, FocasDataType.Float64, CancellationToken.None));
sw.Stop();
sw.Elapsed.ShouldBeLessThan(TimeSpan.FromSeconds(2)); // bounded, not the indefinite OS TCP wait
}
[Fact]
public async Task A_hung_read_does_not_hold_the_socket_for_the_next_call()
{
// The gate must be released when a bounded call times out, otherwise one stall would wedge
// every subsequent op on the device. Read #1 hangs (times out); read #2 must still proceed.
var inner = new TimeoutThenServeClient { FirstCallBlocks = true };
var client = new SynchronizedFocasClient(inner, TimeSpan.FromMilliseconds(100));
await Should.ThrowAsync<OperationCanceledException>(
() => client.ReadAsync(Macro500, FocasDataType.Float64, CancellationToken.None));
var (value, status) = await client.ReadAsync(Macro500, FocasDataType.Float64, CancellationToken.None);
status.ShouldBe(FocasStatusMapper.Good);
value.ShouldBe(42);
}
[Fact]
public async Task Probe_is_not_bounded_by_the_call_timeout()
{
// Connect/Probe carry their own budgets; the decorator must not shrink them to its read budget.
var inner = new RecordingClient { ProbeDelay = TimeSpan.FromMilliseconds(200) };
var client = new SynchronizedFocasClient(inner, TimeSpan.FromMilliseconds(50));
var result = await client.ProbeAsync(CancellationToken.None);
result.ShouldBeTrue();
}
[Fact]
public async Task Zero_call_timeout_disables_the_per_call_bound()
{
var inner = new RecordingClient { ReadDelay = TimeSpan.FromMilliseconds(120) };
var client = new SynchronizedFocasClient(inner, TimeSpan.Zero);
var (value, status) = await client.ReadAsync(Macro500, FocasDataType.Float64, CancellationToken.None);
status.ShouldBe(FocasStatusMapper.Good);
value.ShouldBe(42);
}
[Fact]
public void Dispose_disposes_the_inner_client()
{
var inner = new RecordingClient();
var client = new SynchronizedFocasClient(inner, TimeSpan.FromSeconds(1));
client.Dispose();
inner.DisposeCount.ShouldBe(1);
}
// ---- Driver level: a timed-out read overwrites the seed with a recoverable status ----
[Fact]
public async Task Driver_read_that_times_out_returns_BadCommunicationError_not_a_hang()
{
var factory = new FakeFocasClientFactory { Customise = () => new RecordingClient { BlockReadUntilCancelled = true } };
var drv = new FocasDriver(new FocasDriverOptions
{
Devices = [new FocasDeviceOptions("focas://10.0.0.5:8193")],
Tags = [new FocasTagDefinition("CustomVar", "focas://10.0.0.5:8193", "MACRO:500", FocasDataType.Float64)],
Probe = new FocasProbeOptions { Enabled = false },
Timeout = TimeSpan.FromMilliseconds(150),
}, "drv-1", factory);
await drv.InitializeAsync("{}", CancellationToken.None);
var sw = Stopwatch.StartNew();
var snap = (await drv.ReadAsync(["CustomVar"], CancellationToken.None)).Single();
sw.Stop();
snap.StatusCode.ShouldBe(FocasStatusMapper.BadCommunicationError);
sw.Elapsed.ShouldBeLessThan(TimeSpan.FromSeconds(2)); // bounded by Timeout, not hung
}
[Fact]
public async Task Driver_read_does_not_propagate_a_call_timeout_as_cancellation()
{
// The per-call timeout must NOT bubble out of ReadAsync as OperationCanceledException — that
// would abort the whole poll batch. It must be caught and turned into a per-tag Bad status.
var factory = new FakeFocasClientFactory { Customise = () => new RecordingClient { BlockReadUntilCancelled = true } };
var drv = new FocasDriver(new FocasDriverOptions
{
Devices = [new FocasDeviceOptions("focas://10.0.0.5:8193")],
Tags = [new FocasTagDefinition("CustomVar", "focas://10.0.0.5:8193", "MACRO:500", FocasDataType.Float64)],
Probe = new FocasProbeOptions { Enabled = false },
Timeout = TimeSpan.FromMilliseconds(120),
}, "drv-1", factory);
await drv.InitializeAsync("{}", CancellationToken.None);
// Should complete (not throw) with a Bad snapshot, even though the caller's token is never cancelled.
var snaps = await drv.ReadAsync(["CustomVar"], CancellationToken.None);
snaps.Single().StatusCode.ShouldBe(FocasStatusMapper.BadCommunicationError);
}
private static DisposeGuard NoopDispose(IDisposable d) => new(d);
private sealed class DisposeGuard(IDisposable inner) : IAsyncDisposable
{
public ValueTask DisposeAsync() { inner.Dispose(); return ValueTask.CompletedTask; }
}
/// <summary>Fake that records concurrency + optionally delays/blocks reads and probes.</summary>
private class RecordingClient : FakeFocasClient
{
private int _current;
public int MaxConcurrency;
public int ReadCount;
public TimeSpan ReadDelay = TimeSpan.Zero;
public bool BlockReadUntilCancelled;
public TimeSpan ProbeDelay = TimeSpan.Zero;
public override async Task<(object? value, uint status)> ReadAsync(
FocasAddress address, FocasDataType type, CancellationToken ct)
{
Interlocked.Increment(ref ReadCount);
var observed = Interlocked.Increment(ref _current);
InterlockedMax(ref MaxConcurrency, observed);
try
{
if (BlockReadUntilCancelled) await Task.Delay(Timeout.Infinite, ct).ConfigureAwait(false);
else if (ReadDelay > TimeSpan.Zero) await Task.Delay(ReadDelay, ct).ConfigureAwait(false);
return ((object?)42, FocasStatusMapper.Good);
}
finally { Interlocked.Decrement(ref _current); }
}
public override async Task<bool> ProbeAsync(CancellationToken ct)
{
if (ProbeDelay > TimeSpan.Zero) await Task.Delay(ProbeDelay, ct).ConfigureAwait(false);
return true;
}
private static void InterlockedMax(ref int target, int value)
{
int seen;
do { seen = Volatile.Read(ref target); if (value <= seen) return; }
while (Interlocked.CompareExchange(ref target, value, seen) != seen);
}
}
/// <summary>First read blocks until cancelled; subsequent reads serve a Good value immediately.</summary>
private sealed class TimeoutThenServeClient : FakeFocasClient
{
public bool FirstCallBlocks;
private int _calls;
public override async Task<(object? value, uint status)> ReadAsync(
FocasAddress address, FocasDataType type, CancellationToken ct)
{
var n = Interlocked.Increment(ref _calls);
if (n == 1 && FirstCallBlocks) await Task.Delay(Timeout.Infinite, ct).ConfigureAwait(false);
return ((object?)42, FocasStatusMapper.Good);
}
}
}
@@ -245,6 +245,32 @@ public sealed class FocasReadWriteTests
/// <summary>Verifies that cancellation signals are propagated.</summary> /// <summary>Verifies that cancellation signals are propagated.</summary>
[Fact] [Fact]
public async Task Cancellation_propagates() public async Task Cancellation_propagates()
{
var (drv, factory) = NewDriver(
new FocasTagDefinition("X", "focas://10.0.0.5:8193", "R100", FocasDataType.Byte));
await drv.InitializeAsync("{}", CancellationToken.None);
using var cts = new CancellationTokenSource();
cts.Cancel();
factory.Customise = () => new FakeFocasClient
{
ThrowOnRead = true,
Exception = new OperationCanceledException(cts.Token),
};
// A CANCELLATION of the caller's token must propagate (abort the read). This is distinct
// from a per-call timeout — an OCE raised while the caller's token is still live is swallowed
// to a per-tag BadCommunicationError (see Swallows_a_spurious_read_OCE_when_caller_not_cancelled).
await Should.ThrowAsync<OperationCanceledException>(
() => drv.ReadAsync(["X"], cts.Token));
}
/// <summary>
/// An OperationCanceledException from the wire read while the CALLER'S token is NOT cancelled
/// (e.g. a per-call timeout firing) must be turned into a per-tag BadCommunicationError, not
/// propagated — otherwise one stalled tag would abort the whole poll batch.
/// </summary>
[Fact]
public async Task Swallows_a_spurious_read_OCE_when_caller_not_cancelled()
{ {
var (drv, factory) = NewDriver( var (drv, factory) = NewDriver(
new FocasTagDefinition("X", "focas://10.0.0.5:8193", "R100", FocasDataType.Byte)); new FocasTagDefinition("X", "focas://10.0.0.5:8193", "R100", FocasDataType.Byte));
@@ -255,8 +281,8 @@ public sealed class FocasReadWriteTests
Exception = new OperationCanceledException(), Exception = new OperationCanceledException(),
}; };
await Should.ThrowAsync<OperationCanceledException>( var snap = (await drv.ReadAsync(["X"], CancellationToken.None)).Single();
() => drv.ReadAsync(["X"], CancellationToken.None)); snap.StatusCode.ShouldBe(FocasStatusMapper.BadCommunicationError);
} }
/// <summary>Verifies that ShutdownAsync disposes the client.</summary> /// <summary>Verifies that ShutdownAsync disposes the client.</summary>
@@ -20,6 +20,9 @@ public sealed class FocasScaffoldingTests
[InlineData("focas://cnc-01.factory.internal:8193", "cnc-01.factory.internal", 8193)] [InlineData("focas://cnc-01.factory.internal:8193", "cnc-01.factory.internal", 8193)]
[InlineData("focas://10.0.0.5:12345", "10.0.0.5", 12345)] [InlineData("focas://10.0.0.5:12345", "10.0.0.5", 12345)]
[InlineData("FOCAS://10.0.0.5:8193", "10.0.0.5", 8193)] // case-insensitive scheme [InlineData("FOCAS://10.0.0.5:8193", "10.0.0.5", 8193)] // case-insensitive scheme
[InlineData("10.201.31.5:8193", "10.201.31.5", 8193)] // scheme-less (AdminUI-persisted form)
[InlineData("10.0.0.5", "10.0.0.5", 8193)] // scheme-less, default port
[InlineData("cnc-01.factory.internal:8193", "cnc-01.factory.internal", 8193)] // scheme-less hostname
public void HostAddress_parses_valid(string input, string host, int port) public void HostAddress_parses_valid(string input, string host, int port)
{ {
var parsed = FocasHostAddress.TryParse(input); var parsed = FocasHostAddress.TryParse(input);
@@ -224,9 +227,11 @@ public sealed class FocasScaffoldingTests
[Fact] [Fact]
public async Task InitializeAsync_malformed_address_faults() public async Task InitializeAsync_malformed_address_faults()
{ {
// A non-focas:// URI scheme is rejected by TryParse (a bare "{ip}[:{port}]" is now
// tolerated, so the malformed case must carry a foreign scheme).
var drv = new FocasDriver(new FocasDriverOptions var drv = new FocasDriver(new FocasDriverOptions
{ {
Devices = [new FocasDeviceOptions("not-an-address")], Devices = [new FocasDeviceOptions("http://10.0.0.5/")],
}, "drv-1"); }, "drv-1");
await Should.ThrowAsync<InvalidOperationException>( await Should.ThrowAsync<InvalidOperationException>(