# Test-Connect Probes — Protocol Handshakes

Each driver's **Test-Connect** button in the AdminUI runs a probe against the form's current config (never the persisted row, never the live driver actor). Before Phase 5 (shipped 2026-06-16) every probe was a bare TCP `ConnectAsync` — a live-but-rejecting device showed a healthy green tick, and the operator only discovered the truth when the driver faulted at deploy.

Phase 5 replaced each TCP-only probe with a **real protocol handshake** so a reachable-but-wrong or actively-rejecting endpoint now reads RED. The `IDriverProbe` / `DriverProbeResult` contract and DI registration are unchanged. Probes run in a transient actor with a timeout clamp of 1–60 s and must not mutate any state.

For the AdminUI probe flow (button → `AdminOperationsActor` → transient probe actor), see [`docs/plans/2026-05-28-adminui-driver-pages-design.md`](../plans/2026-05-28-adminui-driver-pages-design.md) §4.

---

## Result contract

All probes return a consistent `DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency)`. The message templates below are uniform across all 8 drivers:

| Outcome | `Ok` | Message template |
|---------|------|-----------------|
| TCP connect fails | `false` | `"Connect failed: {SocketErrorCode}"` |
| TCP ok + handshake ok | `true` | driver-specific descriptive string (see table below) |
| TCP ok but handshake rejected | `false` | `"Reachable at {host}:{port} but {proto} handshake failed: {detail}"` |
| Timeout | `false` | `"Probe timed out after {n}s."` |

The third row is the key new behavior: a reachable device that answers on the port but rejects the protocol-level handshake now surfaces a `false` result with a human-readable explanation rather than a false-green TCP-open tick.

---

## Per-driver handshake

| Driver | Handshake | Ok message | Dev-rig target |
|--------|-----------|------------|----------------|
| **Modbus** | FC03 (Read Holding Registers, qty 1 @ addr 0) via `ModbusTcpTransport`. A Modbus exception PDU still proves a real Modbus device → `Ok`. A non-MBAP reply → handshake fail. | `"Modbus FC03 OK"` | `10.100.0.35:5020` (Modbus sim) |
| **OpcUaClient** | `DiscoveryClient.GetEndpointsAsync` — no session, no app-cert, no auth. ≥ 1 endpoint → `Ok`. A non-OPC-UA TCP server throws or times out → handshake fail. | `"OPC UA: N endpoint(s)"` | `opc.tcp://10.100.0.35:50000` (opc-plc) |
| **S7** | `Plc.OpenAsync` (COTP CR/CC + S7 setup-communication), check `IsConnected`, then `Close`. Wrong rack/slot or a non-S7 server causes `OpenAsync` to throw → handshake fail. | `"S7 connected (CPU …)"` | `10.100.0.35:1102` (python-snap7 sim) |
| **AbCip** | `libplctag` Tag `InitializeAsync` (EIP session + CIP Forward Open). A CIP-level error such as tag-not-found still proves the controller answered CIP → `Ok`. A session/ForwardOpen/connect error → handshake fail. | `"CIP session OK"` | `10.100.0.35:44818` (CIP sim) |
| **AbLegacy** | Same `libplctag` `InitializeAsync` handshake as AbCip, PCCC protocol family. | `"CIP session OK"` (PCCC family) | Deferred — no PLC5/SLC sim |
| **TwinCAT** | `AdsClient.Connect` + `ReadStateAsync`. See [degrade semantics](#twincat-degrade) below. | `"ADS state: {state}"` | Deferred — no ADS target |
| **FOCAS** | `cnc_allclibhndl3` via a direct `DllImport("fwlib32")` in the probe. See [degrade semantics](#focas-degrade) below. | `"FOCAS handle OK"` | Deferred — no CNC + FWLIB |
| **Galaxy** | gRPC unary call to `GalaxyRepository.TestConnection` on the configured mxaccessgw endpoint. See [auth-rejection rule](#galaxy-auth-rejection-rule) below. | `"gateway gRPC OK"` | `http://10.100.0.48:5120` (mxaccessgw) |

**Historian.Wonderware** already performed a real handshake (`Hello` → `HelloAck`) before Phase 5 and was not changed by this work. See [`Historian.Wonderware.md`](Historian.Wonderware.md) for details.

---

## Degrade semantics

Three drivers have environmental constraints that can prevent the handshake from running on certain hosts.
The **degradation principle** is: the probe must never produce a result *worse* than today's TCP-only probe. A genuine protocol rejection from a reachable device is a correct RED; an inability to *run* the handshake at all (no FWLIB, no managed router) degrades to the existing TCP-reachability message — still a green tick but annotated.

### TwinCAT degrade

Where the handshake is available:

- `AdsClient.Connect(netId, port)` + `ReadStateAsync` → `Ok=true`, `"ADS state: {state}"` (Run / Config / Stop).
- An ADS **route-table rejection** from a reachable ADS router is a **true RED**: `"Reachable at {host}:{port} but ADS handshake failed: {detail} — check the target's ADS route table authorizes this host"`. This is the correct result: the driver would also be unable to function without an authorized route.

Where the handshake is unavailable (headless server, no TwinCAT runtime, the managed AMS router cannot start):

- Probe degrades to TCP-reachability: `Ok=true`, `"(ADS handshake unavailable on this host — TCP reachability only)"`.

### FOCAS degrade

On a Windows host with the FANUC FWLIB shared library present:

- `cnc_allclibhndl3` is called via a direct `DllImport("fwlib32")` declared in the probe (the production `Wire.WireFocasClient` is a pure-managed FOCAS/2 TCP client, not an FWLIB P/Invoke, so the probe carries its own native binding). A successful handle allocation → `Ok=true`, `"FOCAS handle OK"`.
- A CNC-level rejection → handshake fail.

On dev, Linux, or macOS (no native FWLIB — `UnimplementedFocasClientFactory` gates the driver):

- `DllNotFoundException` / `NotSupportedException` is caught and the probe degrades to TCP-reachability: `Ok=true`, `"(FOCAS handshake unavailable on this host — FWLIB absent, TCP reachability only)"`.

### Galaxy auth-rejection rule

The probe builds the gRPC channel from the form's config and issues one lightweight unary call.
It does **not** resolve `secretref:` secrets — the key string in the transient config (possibly empty or unresolved) is used as-is.

- `Unavailable` / transport failure → `Ok=false` (gateway is down or unreachable).
- `Unauthenticated` / `PermissionDenied` → **`Ok=true`**, `"gateway reachable & speaking gRPC (auth not checked)"` — an auth rejection proves a live mxaccessgw gRPC server. This is the correct result: the driver's own session-layer will handle auth; the probe is testing reachability only.

The mxaccessgw client surfaces a rejected key as a typed `MxGatewayAuthenticationException` / `MxGatewayAuthorizationException`, **not** a raw `RpcException` — the probe catches both and maps them to the reachable result above. (Live verification on `10.100.0.48:5120` with no key returns `MxGatewayAuthenticationException("Missing or invalid API key.")` → `Ok=true`.)

> **Config note:** `UseTls` must match the endpoint scheme — `UseTls:false` for an
> `http://` (h2c) gateway, `UseTls:true` for `https://`. A mismatch fails the
> client's own validation (the same constraint the Galaxy driver enforces).
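
The auth-rejection rule can be sketched as a small exception-to-result mapping. This is a minimal illustration, not the shipped probe: the two exception classes and the `DriverProbeResult` record are redeclared here as stand-ins so the snippet compiles on its own, `GalaxyProbeSketch.Classify` and its `testConnection` delegate are hypothetical names, and the wording of the failure message is an assumption. Raw `RpcException` handling is left out to avoid a gRPC package dependency.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Stand-ins so the sketch compiles alone; the real exception types come from
// the mxaccessgw client and the real record from Core.Abstractions.
public sealed class MxGatewayAuthenticationException : Exception
{
    public MxGatewayAuthenticationException(string message) : base(message) { }
}

public sealed class MxGatewayAuthorizationException : Exception
{
    public MxGatewayAuthorizationException(string message) : base(message) { }
}

public sealed record DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency);

public static class GalaxyProbeSketch
{
    // `testConnection` is a hypothetical delegate standing in for the single
    // unary GalaxyRepository.TestConnection call.
    public static DriverProbeResult Classify(Action testConnection)
    {
        var clock = Stopwatch.StartNew();
        try
        {
            testConnection();
            return new DriverProbeResult(true, "gateway gRPC OK", clock.Elapsed);
        }
        catch (Exception ex) when (ex is MxGatewayAuthenticationException
                                      or MxGatewayAuthorizationException)
        {
            // A rejected key still proves a live gateway answered over gRPC.
            return new DriverProbeResult(
                true, "gateway reachable & speaking gRPC (auth not checked)", clock.Elapsed);
        }
        catch (Exception ex)
        {
            // Anything else is treated as down or unreachable. The message
            // wording is an assumption, not the project's template.
            return new DriverProbeResult(false, $"Gateway call failed: {ex.Message}", clock.Elapsed);
        }
    }
}
```

Passing a delegate that throws `MxGatewayAuthenticationException("Missing or invalid API key.")` yields `Ok=true` with the "auth not checked" message, which matches the live verification described above. The exception filter keeps the two auth types in one branch so any other failure falls through to the `Ok=false` path.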

---

## Live-verify scope

| Driver | Live-verify status | Notes |
|--------|-------------------|-------|
| Modbus | Verified | Dev-rig sim `10.100.0.35:5020`; green vs sim, RED vs wrong port / non-Modbus server, timeout vs black-hole IP |
| OpcUaClient | Verified | opc-plc `10.100.0.35:50000`; same three-scenario matrix |
| S7 | Verified | python-snap7 `10.100.0.35:1102` |
| AbCip | Verified | CIP sim `10.100.0.35:44818` |
| Galaxy | Verified | mxaccessgw `10.100.0.48:5120`; `Unauthenticated` reply counts as Ok |
| AbLegacy | Deferred | No PLC5/SLC sim; unit-proven + code path identical to AbCip |
| TwinCAT | Deferred | No ADS target; unit-proven + degrade guard tested |
| FOCAS | Deferred | No CNC + FWLIB on dev host; degrade guard is the CI-observable path |

---

## Implementation references

- Phase 5 design: `docs/plans/2026-06-16-stillpending-phase-5-probes-design.md`
- Parent roadmap: `docs/plans/2026-06-15-stillpending-backlog-design.md` §Phase 5
- AdminUI probe flow: `docs/plans/2026-05-28-adminui-driver-pages-design.md` §4
- Per-driver probe implementations: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver./DriverProbe.cs`
- `IDriverProbe` contract: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverProbe.cs`
- Probe dispatch + timeout clamp: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Actors/AdminOperationsActor.cs` (around line 284)
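
As a compact illustration of the result contract and the 1 to 60 s timeout clamp described in this document, the sketch below builds each uniform message from its template. It is an assumption-laden example rather than the project's code: the `ProbeResults` helper class and its method names are hypothetical, the record is redeclared locally so the snippet compiles alone, and the real clamp lives in `AdminOperationsActor`, which may be implemented differently.

```csharp
#nullable enable
using System;
using System.Net.Sockets;

// Local redeclaration of the record shape quoted in the result contract.
public sealed record DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency);

// Hypothetical helper; the names below do not come from the code base.
public static class ProbeResults
{
    // Clamp a requested timeout into the 1..60 second window.
    public static TimeSpan ClampTimeout(TimeSpan requested) =>
        TimeSpan.FromSeconds(Math.Clamp(requested.TotalSeconds, 1.0, 60.0));

    // TCP connect failed before any protocol bytes were exchanged.
    public static DriverProbeResult ConnectFailed(SocketError code) =>
        new(false, $"Connect failed: {code}", null);

    // TCP connected, but the protocol-level handshake was rejected.
    public static DriverProbeResult HandshakeFailed(
        string host, int port, string proto, string detail) =>
        new(false, $"Reachable at {host}:{port} but {proto} handshake failed: {detail}", null);

    // The probe did not finish inside the clamped timeout.
    public static DriverProbeResult TimedOut(TimeSpan timeout) =>
        new(false, $"Probe timed out after {timeout.TotalSeconds:0}s.", null);

    // Handshake succeeded; the message is the driver-specific string.
    public static DriverProbeResult Success(string message, TimeSpan latency) =>
        new(true, message, latency);
}
```

For example, `ProbeResults.HandshakeFailed("10.100.0.35", 5020, "Modbus", "non-MBAP reply")` produces the third-row message for a reachable endpoint that is not a Modbus device, and `ProbeResults.ClampTimeout(TimeSpan.FromSeconds(300))` returns 60 seconds.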