docs(phase5): real Test-Connect handshakes per driver + degrade semantics
Create docs/drivers/TestConnectProbes.md: full reference for the Phase 5 protocol-handshake probes — result contract, per-driver handshake table, TwinCAT/FOCAS/Galaxy degrade semantics, live-verify scope, and the Historian.Wonderware already-done note. Annotate the Phase 7 step in docs/plans/2026-05-28-adminui-driver-pages-design.md with a shipped note pointing at the phase-5 design doc and TestConnectProbes.md.
@@ -0,0 +1,136 @@
# Test-Connect Probes: Protocol Handshakes

Each driver's **Test-Connect** button in the AdminUI runs a probe against the
form's current config (never the persisted row, never the live driver actor).
Before Phase 5 (shipped 2026-06-16) every probe was a bare TCP `ConnectAsync`:
a live-but-rejecting device showed a healthy green tick, and the operator
only discovered the truth when the driver faulted at deploy. Phase 5 replaced
each TCP-only probe with a **real protocol handshake** so a reachable-but-wrong
or actively-rejecting endpoint now reads RED.

The `IDriverProbe` / `DriverProbeResult` contract and DI registration are
unchanged. Probes run in a transient actor with a timeout clamp of 1–60 s
and must not mutate any state.
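
The timeout clamp can be pictured with a minimal Python sketch. It is not the project's C# code: the function name `clamp_probe_timeout` and the idea that the caller supplies a requested value in seconds are assumptions for illustration. Only the 1–60 s bounds come from the text above.

```python
MIN_TIMEOUT_S = 1.0   # lower bound stated in this document
MAX_TIMEOUT_S = 60.0  # upper bound stated in this document


def clamp_probe_timeout(requested_s: float) -> float:
    """Force a requested probe timeout into the 1-60 s window (hypothetical helper)."""
    return max(MIN_TIMEOUT_S, min(MAX_TIMEOUT_S, requested_s))
```

Under this sketch a request of 0 s becomes 1 s and a request of 600 s becomes 60 s, so a mistyped form value can neither fire the probe instantly nor leave it hanging.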

For the AdminUI probe flow (button → `AdminOperationsActor` → transient probe
actor), see
[`docs/plans/2026-05-28-adminui-driver-pages-design.md`](../plans/2026-05-28-adminui-driver-pages-design.md)
§4.

---

## Result contract

All probes return a consistent `DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency)`.
The message templates below are uniform across all 8 drivers:

| Outcome | `Ok` | Message template |
|---------|------|-----------------|
| TCP connect fails | `false` | `"Connect failed: {SocketErrorCode}"` |
| TCP ok + handshake ok | `true` | driver-specific descriptive string (see table below) |
| TCP ok but handshake rejected | `false` | `"Reachable at {host}:{port} but {proto} handshake failed: {detail}"` |
| Timeout | `false` | `"Probe timed out after {n}s."` |

The third row is the key new behavior: a reachable device that answers on the
port but rejects the protocol-level handshake now surfaces a `false` result
with a human-readable explanation rather than a false-green TCP-open tick.
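
The four outcomes in the table can be mirrored in a short Python sketch. The dataclass is only a stand-in for the C# `DriverProbeResult` record. The helper names, the `latency_s` field in seconds, and leaving latency unset on failures are assumptions, not the real implementation. The message templates are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriverProbeResult:
    """Stand-in for DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency)."""
    ok: bool
    message: Optional[str]
    latency_s: Optional[float] = None  # assumption: latency carried as seconds


def connect_failed(socket_error_code: str) -> DriverProbeResult:
    # Row 1: the TCP connect itself failed.
    return DriverProbeResult(False, f"Connect failed: {socket_error_code}")


def handshake_ok(driver_message: str, latency_s: float) -> DriverProbeResult:
    # Row 2: TCP and handshake both succeeded; the message is driver-specific.
    return DriverProbeResult(True, driver_message, latency_s)


def handshake_rejected(host: str, port: int, proto: str, detail: str) -> DriverProbeResult:
    # Row 3: the port answered but the protocol handshake was rejected.
    return DriverProbeResult(
        False, f"Reachable at {host}:{port} but {proto} handshake failed: {detail}")


def timed_out(timeout_s: int) -> DriverProbeResult:
    # Row 4: the probe exceeded its (clamped) timeout.
    return DriverProbeResult(False, f"Probe timed out after {timeout_s}s.")
```

For example, `handshake_rejected("10.100.0.35", 5020, "Modbus", "non-MBAP reply")` yields the row-3 message an operator would see for a non-Modbus server on the Modbus port.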

---

## Per-driver handshake

| Driver | Handshake | Ok message | Dev-rig target |
|--------|-----------|------------|----------------|
| **Modbus** | FC03 (Read Holding Registers, qty 1 @ addr 0) via `ModbusTcpTransport`. A Modbus exception PDU still proves a real Modbus device → `Ok`. A non-MBAP reply → handshake fail. | `"Modbus FC03 OK"` | `10.100.0.35:5020` (Modbus sim) |
| **OpcUaClient** | `DiscoveryClient.GetEndpointsAsync` (no session, no app-cert, no auth). ≥ 1 endpoint → `Ok`. A non-OPC-UA TCP server throws or times out → handshake fail. | `"OPC UA: N endpoint(s)"` | `opc.tcp://10.100.0.35:50000` (opc-plc) |
| **S7** | `Plc.OpenAsync` (COTP CR/CC + S7 setup-communication), check `IsConnected`, then `Close`. Wrong rack/slot or a non-S7 server causes `OpenAsync` to throw → handshake fail. | `"S7 connected (CPU …)"` | `10.100.0.35:1102` (python-snap7 sim) |
| **AbCip** | `libplctag` Tag `InitializeAsync` (EIP session + CIP Forward Open). A CIP-level error such as tag-not-found still proves the controller answered CIP → `Ok`. A session/ForwardOpen/connect error → handshake fail. | `"CIP session OK"` | `10.100.0.35:44818` (CIP sim) |
| **AbLegacy** | Same `libplctag` `InitializeAsync` handshake as AbCip, PCCC protocol family. | `"CIP session OK"` (PCCC family) | Deferred (no PLC5/SLC sim) |
| **TwinCAT** | `AdsClient.Connect` + `ReadStateAsync`. See [degrade semantics](#twincat-degrade) below. | `"ADS state: {state}"` | Deferred (no ADS target) |
| **FOCAS** | `cnc_allclibhndl3` via FWLIB P/Invoke (`Wire.WireFocasClient`). See [degrade semantics](#focas-degrade) below. | `"FOCAS handle OK"` | Deferred (no CNC + FWLIB) |
| **Galaxy** | gRPC unary call to `GalaxyRepository.TestConnection` on the configured mxaccessgw endpoint. See [auth-rejection rule](#galaxy-auth-rejection-rule) below. | `"gateway gRPC OK"` | `http://10.100.0.48:5120` (mxaccessgw) |

**Historian.Wonderware** already performed a real handshake (`Hello` → `HelloAck`)
before Phase 5 and was not changed by this work. See
[`Historian.Wonderware.md`](Historian.Wonderware.md) for details.
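
To make the Modbus row concrete, the sketch below builds the FC03 request (quantity 1 at address 0) as a Modbus TCP frame and decides whether a reply looks like Modbus. It is written in Python for brevity and does not reproduce `ModbusTcpTransport`. The function names, the default unit id of 1, and the transaction-id check are assumptions. The frame layout follows the standard MBAP header, and 0x83 is the exception response to function code 0x03.

```python
import struct


def build_fc03_request(transaction_id: int = 1, unit_id: int = 1) -> bytes:
    """Read Holding Registers, start address 0, quantity 1, as a Modbus TCP ADU."""
    # MBAP header: transaction id, protocol id (always 0), length of what
    # follows (unit id + 5-byte PDU = 6), unit id.
    # PDU: function code 0x03, start address 0, quantity 1.
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id, 0x03, 0, 1)


def is_modbus_reply(reply: bytes, transaction_id: int) -> bool:
    """True when the reply is MBAP-framed and answers FC03, normally or with an exception."""
    if len(reply) < 8:
        return False  # too short to hold an MBAP header plus a function code
    tid, proto, _length, _unit, fc = struct.unpack(">HHHBB", reply[:8])
    if proto != 0 or tid != transaction_id:
        return False  # not MBAP framing, or not the answer to our request
    # 0x03 is a normal response; 0x83 is an exception PDU, which still
    # proves that a real Modbus device answered.
    return fc in (0x03, 0x83)
```

With this sketch, an exception reply such as `00 07 00 00 00 03 01 83 02` counts as a successful handshake, while an HTTP banner on the same port does not.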

---

## Degrade semantics

Three drivers need special handling. TwinCAT and FOCAS have environmental
constraints that can prevent the handshake from running on certain hosts;
Galaxy treats an auth rejection as proof of a live gateway. The **degradation
principle** is: the probe must never produce a result *worse* than the
pre-Phase 5 TCP-only probe. A genuine protocol rejection from a reachable
device is a correct RED; an inability to *run* the handshake at all (no FWLIB,
no managed router) degrades to the existing TCP-reachability message, which is
still a green tick but annotated.
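
The distinction between "rejected" and "could not run" can be sketched as two exception paths. This is an illustrative Python model, not the driver code: the exception classes, `handshake_or_degrade`, and the stub handshakes are hypothetical names, and the sketch assumes the TCP connect has already succeeded. How the real probes combine the degrade annotation with the TCP message is not specified here, so the annotation is simply passed through.

```python
from typing import Callable, Tuple


class HandshakeUnavailable(Exception):
    """Hypothetical: this host cannot run the handshake at all."""


class HandshakeRejected(Exception):
    """Hypothetical: a reachable device refused the handshake."""


def handshake_or_degrade(run_handshake: Callable[[], str], host: str, port: int,
                         proto: str, degrade_message: str) -> Tuple[bool, str]:
    """Return (ok, message) for the handshake phase, after TCP has connected."""
    try:
        return True, run_handshake()  # handshake ran and succeeded
    except HandshakeUnavailable:
        return True, degrade_message  # cannot run here: stay green, annotated
    except HandshakeRejected as exc:
        # A genuine rejection from a reachable device is a correct RED.
        return False, f"Reachable at {host}:{port} but {proto} handshake failed: {exc}"


# Stub handshakes used only to exercise the three paths.
def stub_ok() -> str:
    return "FOCAS handle OK"


def stub_unavailable() -> str:
    raise HandshakeUnavailable("FWLIB absent")


def stub_rejected() -> str:
    raise HandshakeRejected("CNC refused the handle")
```

Only the second path degrades. The third path stays RED because the driver would fail against the same device at deploy time.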

### TwinCAT degrade

Where the handshake is available:

- `AdsClient.Connect(netId, port)` + `ReadStateAsync` → `Ok=true`,
  `"ADS state: {state}"` (Run / Config / Stop).
- An ADS **route-table rejection** from a reachable ADS router is a **true RED**:
  `"Reachable at {host}:{port} but ADS handshake failed: {detail} — check the target's ADS route table authorizes this host"`.
  This is the correct result: the driver would also be unable to function
  without an authorized route.

Where the handshake is unavailable (headless server, no TwinCAT runtime, or the
managed AMS router cannot start):

- Probe degrades to TCP-reachability: `Ok=true`,
  `"(ADS handshake unavailable on this host — TCP reachability only)"`.

### FOCAS degrade

On a Windows host with the FANUC FWLIB shared library present:

- `cnc_allclibhndl3` is called via the existing `Wire.WireFocasClient` P/Invoke.
  A successful handle allocation → `Ok=true`, `"FOCAS handle OK"`.
- A CNC-level rejection → handshake fail.

On dev, Linux, or macOS hosts (no native FWLIB; `UnimplementedFocasClientFactory`
gates the driver):

- `DllNotFoundException` / `NotSupportedException` is caught and the probe
  degrades to TCP-reachability: `Ok=true`,
  `"(FOCAS handshake unavailable on this host — FWLIB absent, TCP reachability only)"`.

### Galaxy auth-rejection rule

The probe builds the gRPC channel from the form's config and issues one
lightweight unary call. It does **not** resolve `secretref:` secrets; the
key string in the transient config (possibly empty or unresolved) is used as-is.

- `Unavailable` / transport failure → `Ok=false` (gateway is down or unreachable).
- `Unauthenticated` / `PermissionDenied` → **`Ok=true`**,
  `"gateway reachable & speaking gRPC; auth not checked"`. An auth rejection
  proves a live mxaccessgw gRPC server. This is the correct result: the driver's
  own session layer will handle auth; the probe is testing reachability only.
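
The rule reduces to a small mapping from gRPC status to probe outcome, sketched here in Python with the status passed as its canonical name so that no gRPC library is needed. The function name, the failure message text, and the treatment of any other non-OK status as a failure are assumptions. The two `Ok=true` messages are taken from this document.

```python
from typing import Tuple

# Auth-layer rejections still prove that a gRPC server answered.
AUTH_REJECTIONS = {"UNAUTHENTICATED", "PERMISSION_DENIED"}


def classify_gateway_status(status_code: str, detail: str = "") -> Tuple[bool, str]:
    """Map a gRPC status name to (ok, message) for the Galaxy probe (hypothetical helper)."""
    if status_code == "OK":
        return True, "gateway gRPC OK"
    if status_code in AUTH_REJECTIONS:
        return True, "gateway reachable & speaking gRPC; auth not checked"
    # UNAVAILABLE and transport failures are RED. Treating every other
    # status as RED too is an assumption of this sketch.
    return False, f"gateway call failed: {status_code} {detail}".strip()
```

Because the probe never resolves `secretref:` values, an `Unauthenticated` reply is the expected outcome for many valid configurations, which is why it maps to green here.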

---

## Live-verify scope

| Driver | Live-verify status | Notes |
|--------|-------------------|-------|
| Modbus | Verified | Dev-rig sim `10.100.0.35:5020`; green vs sim, RED vs wrong port / non-Modbus server, timeout vs black-hole IP |
| OpcUaClient | Verified | opc-plc `10.100.0.35:50000`; same three-scenario matrix |
| S7 | Verified | python-snap7 `10.100.0.35:1102` |
| AbCip | Verified | CIP sim `10.100.0.35:44818` |
| Galaxy | Verified | mxaccessgw `10.100.0.48:5120`; `Unauthenticated` reply counts as Ok |
| AbLegacy | Deferred | No PLC5/SLC sim; unit-proven + code path identical to AbCip |
| TwinCAT | Deferred | No ADS target; unit-proven + degrade guard tested |
| FOCAS | Deferred | No CNC + FWLIB on dev host; degrade guard is the CI-observable path |

---

## Implementation references

- Phase 5 design: `docs/plans/2026-06-16-stillpending-phase-5-probes-design.md`
- Parent roadmap: `docs/plans/2026-06-15-stillpending-backlog-design.md` §Phase 5
- AdminUI probe flow: `docs/plans/2026-05-28-adminui-driver-pages-design.md` §4
- Per-driver probe implementations: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.<Type>/<Type>DriverProbe.cs`
- `IDriverProbe` contract: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverProbe.cs`
- Probe dispatch + timeout clamp: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Actors/AdminOperationsActor.cs` (around line 284)
@@ -296,6 +296,11 @@ Incremental — driver-by-driver swap-over. Each step compile-clean and shippabl
5. Delete the generic `DriverEdit.razor` + its route once all 9 typed pages exist.
6. Land `DriverStatusHub` + bridge + `<DriverStatusPanel>` (read-only first).
7. Land `<DriverTestConnectButton>` + `IDriverProbe` impls + AdminOperationsActor handler.
   > **Shipped (TCP-only, 2026-05-28).** Real protocol handshakes (Modbus FC03, OPC UA GetEndpoints,
   > S7 COTP+setup-communication, CIP ForwardOpen, Galaxy gRPC ping) shipped 2026-06-16 via Phase 5
   > of the still-pending backlog (`docs/plans/2026-06-16-stillpending-phase-5-probes-design.md`).
   > TwinCAT ADS and FOCAS FWLIB handshakes degrade to TCP-reachability-only on hosts where the native
   > runtime is absent. See `docs/drivers/TestConnectProbes.md` for the full contract + degrade semantics.
8. Land Reconnect/Restart on the status panel with `DriverOperator` policy.
9. Land 9 static address builders inside `<DriverTagPicker>`.