Phase 5 — Test-Connect Protocol Handshakes — Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.
Goal: Replace the 8 bare-TCP Test-Connect probes with real protocol handshakes so a live-but-rejecting device reads RED, not green — reusing each driver's own client primitive, with graceful degradation for the three (TwinCAT/FOCAS/Galaxy) that can't run a real handshake on the dev rig.
Architecture: Per-probe, no shared scaffold (matches the existing self-contained probe style; keeps all 8 driver projects disjoint → parallelizable). Each probe keeps its TCP preflight and adds one handshake step. New result contract with four outcomes (TCP-fail / handshake-ok / TCP-ok-but-handshake-rejected / timeout). IDriverProbe/DriverProbeResult and DI are UNCHANGED. Design: docs/plans/2026-06-16-stillpending-phase-5-probes-design.md.
Tech Stack: C# / .NET 10, xUnit + Shouldly, in-process TcpListener for unit tests, skip-gated DriverTestConnectE2eTests for live verification. Per-driver client libs already referenced (S7netplus, libplctag, Beckhoff.TwinCAT.Ads, OPCFoundation.Opc.Ua.Client, MxGateway.Client gRPC, FOCAS wire P/Invoke).
Consistent result-message templates (apply in EVERY probe):
- TCP connect fails → Ok=false, "Connect failed: {SocketErrorCode}" (keep as-is)
- Handshake OK → Ok=true, Latency, e.g. "Modbus FC03 OK", "OPC UA: {n} endpoint(s)", "S7 connected (CPU {cpu})", "CIP session OK", "ADS state: {state}", "gateway gRPC OK"
- TCP OK but handshake rejected → Ok=false, "Reachable at {host}:{port} but {proto} handshake failed: {detail}"
- Timeout (OperationCanceledException) → Ok=false, "Probe timed out after {timeout.TotalSeconds:F0}s." (keep as-is)
- Degrade (TwinCAT/FOCAS only, env can't run handshake) → Ok=true, "Reachable at {host}:{port} ({proto} handshake unavailable on this host — TCP reachability only)"
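The templates above can be written down as small formatting functions. This is a minimal sketch only: the class and method names are hypothetical, and since the architecture calls for no shared scaffold, each probe would carry its own private copy rather than reference a common type.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical per-probe helper; names are illustrative, not from the codebase.
public static class ProbeMessages
{
    // TCP connect failed (message unchanged from the existing probes).
    public static string ConnectFailed(SocketError code)
        => $"Connect failed: {code}";

    // TCP connected, but the protocol handshake was rejected.
    public static string HandshakeRejected(string host, int port, string proto, string detail)
        => $"Reachable at {host}:{port} but {proto} handshake failed: {detail}";

    // The probe budget elapsed (message unchanged from the existing probes).
    public static string TimedOut(TimeSpan timeout)
        => $"Probe timed out after {timeout.TotalSeconds:F0}s.";

    // The handshake could not be attempted on this host (TwinCAT/FOCAS only).
    public static string Degraded(string host, int port, string proto)
        => $"Reachable at {host}:{port} ({proto} handshake unavailable on this host — TCP reachability only)";
}
```

Keeping the strings in one place per probe makes the "keep as-is" messages easy to compare against the pre-change behavior during review.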
Global rules (every task): TDD red→green. Probes MUST honour ct and MUST NOT mutate state. Stage by path — never git add .; never stage sql_login.txt, src/Server/.../Host/pki/, pending.md, current.md, docker-dev/docker-compose.yml, stillpending.md. Never echo/commit the gateway API key. No --no-verify, no force-push. No IDriverProbe/DriverProbeResult/DI change. No bUnit.
Task 0: Feature branch (done)
Branch feat/stillpending-phase-5-probes off master 050f5c4b already created; design committed 1f2d32ac. No action.
Task 1: Modbus handshake — FC03 read
Classification: small Estimated implement time: ~4 min Parallelizable with: Tasks 2, 3, 4, 5, 6, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ModbusDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/ModbusDriverProbeTests.cs (create)
Approach: After the existing TCP preflight succeeds, do the real handshake with the in-project transport:
// proto label for messages
const string Proto = "Modbus";
// ... existing deserialize + ExtractTarget + TCP preflight unchanged ...
// On TCP success, run a one-shot FC03 (Read Holding Registers, qty 1 @ addr 0):
await using var transport = new ModbusTcpTransport(host, port, /* keep-alive */ default, /* timeouts from opts/defaults */);
await transport.ConnectAsync(ct);
var pdu = new byte[] { 0x03, 0x00, 0x00, 0x00, 0x01 }; // FC03, addr 0, qty 1
try
{
_ = await transport.SendAsync(opts.UnitId /* or default 1 */, pdu, ct);
return new(true, "Modbus FC03 OK", sw.Elapsed);
}
catch (ModbusException) // exception PDU (e.g. illegal data address) STILL proves a real Modbus device
{
return new(true, "Modbus FC03 OK (device returned exception PDU)", sw.Elapsed);
}
- Inspect ModbusTcpTransport's real ctor signature (ModbusTcpTransport.cs:27-66) and ModbusDriverOptions for the unit-id field; mirror how ModbusDriver constructs the transport. Keep the SocketException/OperationCanceledException/Exception catches; a non-ModbusException failure after TCP success → Ok=false, "Reachable at {host}:{port} but Modbus FC03 handshake failed: {ex.Message}".
- Update the class XML-doc: it now performs a real FC03 handshake (drop the "Does NOT exchange any protocol bytes" sentence).
Steps: (1) Write failing tests. (2) Run → fail. (3) Implement handshake. (4) Run → pass. (5) dotnet build the Modbus project clean. (6) Commit.
Tests (ModbusDriverProbeTests, in-process TcpListener):
- ProbeAsync_invalid_json → Ok=false ("invalid").
- ProbeAsync_no_host → Ok=false ("no host/port").
- ProbeAsync_unreachable_port → Ok=false (Connect failed); target a closed loopback port.
- ProbeAsync_tcp_accepts_then_closes → Ok=false with "handshake failed"; a TcpListener that accepts and immediately closes (no MBAP reply).
- ProbeAsync_canned_MBAP_response → Ok=true "Modbus FC03 OK"; a TcpListener that reads the request frame and writes a valid MBAP FC03 response echoing the TxId.
- (optional) ProbeAsync_exception_PDU → Ok=true; listener replies 0x83 + exception code.
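For the canned-response and exception-PDU cases, the listener needs correctly framed bytes. The following is a sketch of how a test might build them, based on the standard Modbus TCP framing (7-byte MBAP header followed by the PDU). The CannedModbus class and its method names are hypothetical test helpers, not existing code.

```csharp
using System;

// Hypothetical test helper for building canned Modbus TCP replies.
public static class CannedModbus
{
    // The transaction id is the first two bytes of the request frame, big-endian.
    public static ushort ReadTxId(ReadOnlySpan<byte> requestFrame)
        => (ushort)((requestFrame[0] << 8) | requestFrame[1]);

    // MBAP-framed FC03 response carrying a single holding register.
    public static byte[] Fc03Response(ushort txId, byte unitId, ushort registerValue) => new byte[]
    {
        (byte)(txId >> 8), (byte)txId,  // transaction id, echoed from the request
        0x00, 0x00,                     // protocol id (always 0 for Modbus)
        0x00, 0x05,                     // length: unit id (1) + PDU (4)
        unitId,
        0x03,                           // function code: Read Holding Registers
        0x02,                           // byte count for one register
        (byte)(registerValue >> 8), (byte)registerValue,
    };

    // MBAP-framed exception response to FC03 (function code with the high bit set).
    public static byte[] Fc03Exception(ushort txId, byte unitId, byte exceptionCode) => new byte[]
    {
        (byte)(txId >> 8), (byte)txId,
        0x00, 0x00,
        0x00, 0x03,                     // length: unit id (1) + PDU (2)
        unitId,
        0x83,                           // 0x03 | 0x80
        exceptionCode,                  // e.g. 0x02 = illegal data address
    };
}
```

The listener would read the 12-byte request, extract the TxId with ReadTxId, and write one of these frames back.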
Commit: feat(probe): Modbus Test-Connect does a real FC03 handshake
Task 2: OpcUaClient handshake — GetEndpoints
Classification: small Estimated implement time: ~4 min Parallelizable with: Tasks 1, 3, 4, 5, 6, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientDriverProbeTests.cs (create)
Approach: After TCP preflight, do an unsecured discovery handshake (no session, no app-cert, no auth) — mirror OpcUaClientDriver.cs:417-424:
using var client = await DiscoveryClient.CreateAsync(new Uri(endpointUrl) /* + the SDK's default config as the driver does */);
var endpoints = await client.GetEndpointsAsync(null, ct);
return endpoints is { Count: > 0 }
? new(true, $"OPC UA: {endpoints.Count} endpoint(s)", sw.Elapsed)
: new(false, $"Reachable at {host}:{port} but OPC UA handshake failed: server published 0 endpoints", null);
- Reuse the EXACT DiscoveryClient.CreateAsync(...) overload the driver uses (read OpcUaClientDriver.cs:405-424 for the arg shape; it may pass an ApplicationConfiguration/EndpointConfiguration). Honour ct. A non-OPC-UA TCP server makes GetEndpointsAsync throw/timeout → catch → Ok=false "handshake failed: {ex.Message}". Keep the timeout/Connect-failed catches.
- Update the class XML-doc (drop "Does NOT open an OPC UA session" → now does a GetEndpoints discovery handshake).
Tests: invalid-json / no-endpoint / unreachable / tcp-accepts-then-closes→handshake-fail. The happy path (real endpoints) is covered live in Task 11 (a faithful in-process OPC UA server is heavy; the accept-then-close negative path is the unit-testable new behavior).
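The accept-then-close negative path needs only a loopback listener that completes the TCP handshake and then hangs up without replying. A minimal sketch of such a fixture follows; the AcceptThenClose name and its shape are assumptions, and the same idea would apply to the Modbus and S7 tests.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical test fixture: accepts one TCP connection, then closes it silently.
public static class AcceptThenClose
{
    public static (int Port, Task Served) Start()
    {
        // Port 0 asks the OS for a free port, which avoids clashes between parallel tests.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var served = Task.Run(async () =>
        {
            try
            {
                // Disposing the accepted client closes the socket without sending any bytes.
                using var client = await listener.AcceptTcpClientAsync();
            }
            finally
            {
                listener.Stop();
            }
        });

        return (port, served);
    }
}
```

A test would point the probe at the returned port and assert Ok=false with a "handshake failed" message, since TCP connects but no protocol reply ever arrives.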
Commit: feat(probe): OpcUaClient Test-Connect does a GetEndpoints discovery handshake
Task 3: S7 handshake — Plc.OpenAsync
Classification: small Estimated implement time: ~4 min Parallelizable with: Tasks 1, 2, 4, 5, 6, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7/S7DriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/S7DriverProbeTests.cs (create)
Approach: After TCP preflight, do the COTP+S7 setup handshake — mirror S7Driver.cs:162-171:
var plc = new Plc(S7CpuTypeMap.ToS7Net(opts.CpuType), host, port, opts.Rack, opts.Slot);
plc.ReadTimeout = (int)timeout.TotalMilliseconds; // set BEFORE OpenAsync (handshake honours it)
try
{
await plc.OpenAsync(ct);
if (plc.IsConnected) return new(true, $"S7 connected (CPU {opts.CpuType})", sw.Elapsed);
return new(false, $"Reachable at {host}:{port} but S7 handshake failed: not connected", null);
}
finally { plc.Close(); }
- Reuse S7CpuTypeMap.ToS7Net (S7CpuTypeMap.cs). Read S7DriverOptions for Rack/Slot/CpuType field names. Wrong rack/slot or non-S7 server → OpenAsync throws → catch → Ok=false "handshake failed: {ex.Message}". Keep Connect-failed / timeout catches.
- Update the class XML-doc.
Tests: invalid-json / no-host / unreachable / tcp-accepts-then-closes→handshake-fail (a listener that accepts then closes makes OpenAsync throw). Happy path is live (Task 11, python-snap7 sim).
Commit: feat(probe): S7 Test-Connect does a real ISO-on-TCP + S7 setup handshake
Task 4: AbCip handshake — libplctag init
Classification: small Estimated implement time: ~5 min Parallelizable with: Tasks 1, 2, 3, 5, 6, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipDriverProbeTests.cs (create)
Approach: After TCP preflight, open a CIP session via libplctag by initializing one Tag against the first device:
- Build an AbCipTagCreateParams from the first device's options (Gateway + CIP path + PlcType/libplctag-attr + a tag name) and new LibplctagTagRuntime(p).InitializeAsync(ct). Read LibplctagTagRuntime.cs + AbCipTagCreateParams + how AbCipDriver builds these (AbCipDriver.cs device-init path, ~:824/:856) for the exact param shape and where the family/PlcType comes from.
- For the tag name: prefer the first configured tag path if opts carries tags; else a benign placeholder.
- Interpret the outcome:
  - InitializeAsync succeeds → Ok "CIP session OK".
  - A CIP-level error (tag-not-found / bad-path; inspect GetStatus() / the libplctag Status enum) → STILL Ok "CIP session OK (controller reachable; probe tag not found)", because the controller answered CIP.
  - A session/ForwardOpen/connect/timeout error → Ok=false "Reachable at {host}:{port} but CIP handshake failed: {detail}".
- Dispose the runtime/tag. Update the class XML-doc.
Tests: invalid-json / no-host / unreachable. The CIP-status interpretation happy/CIP-error paths are covered live (Task 11, CIP sim). Keep unit tests to the offline-determinable paths; do NOT spin a fake CIP server.
Commit: feat(probe): AbCip Test-Connect opens a real CIP session (libplctag init)
Task 5: AbLegacy handshake — libplctag init (PCCC)
Classification: small Estimated implement time: ~4 min Parallelizable with: Tasks 1, 2, 3, 4, 6, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/AbLegacyDriverProbeTests.cs (create)
Approach: Same libplctag-init pattern as Task 4 but the AbLegacy project's own runtime/params types (PCCC protocol family). Read the AbLegacy driver's device-init/tag-runtime code for the exact param shape (it mirrors AbCip). Same outcome interpretation (session-open or CIP/PCCC-level error → Ok; connect/timeout → handshake-fail). Message: "PCCC session OK" / "Reachable … but PCCC handshake failed: {detail}". Update the class XML-doc.
Tests: invalid-json / no-host / unreachable. Happy path is deferred (no PLC5/SLC sim on the rig) — note this in the test file header; the handshake code path is the same library as AbCip (verified-by-proxy).
Commit: feat(probe): AbLegacy Test-Connect opens a real PCCC session (libplctag init)
Task 6: TwinCAT handshake — ADS ReadState (degrade-guarded)
Classification: standard Estimated implement time: ~5 min Parallelizable with: Tasks 1, 2, 3, 4, 5, 7, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/TwinCATDriverProbeTests.cs (create)
Approach: After TCP preflight, attempt an ADS state read — mirror AdsTwinCATClient.cs:87-90,194:
using var client = new AdsClient();
client.Connect(parsed.NetId, parsed.Port); // AmsNetId + ADS port from the parsed address
var state = await client.ReadStateAsync(ct); // AdsState
return new(true, $"ADS state: {state.AdsState}", sw.Elapsed);
- Degrade guard: wrap construction/connect in try/catch. Distinguish:
  - ADS connected + ReadState OK → Ok "ADS state: {state}".
  - ADS route/auth rejection from a reachable router (the AdsErrorCode indicates target-port/route) → Ok=false "Reachable at {host}:{port} but ADS handshake failed: {AdsErrorCode} — check the target's ADS route table authorizes this host" (a true RED; the driver also needs the route).
  - The managed AMS router can't construct/run headless (any other exception that means the handshake could not be ATTEMPTED, not that the device rejected it) → degrade: Ok=true "Reachable at {host}:{port} (ADS handshake unavailable on this host — TCP reachability only)".
- Use the existing TwinCATAmsAddress.TryParse for NetId+port (already in ExtractTarget). Honour ct/timeout. Read the Beckhoff.TwinCAT.Ads AdsClient API (Connect, ReadStateAsync, AdsErrorException/AdsErrorCode) to classify route-rejection vs construction-failure. Update the class XML-doc.
Tests: invalid-json / no-host / unreachable (black-hole → timeout or degrade). Assert the degrade path returns Ok=true with the "TCP reachability only" note when AdsClient cannot attempt the handshake. Happy/route-reject paths are deferred (no ADS target on the rig) — note in the test header.
Commit: feat(probe): TwinCAT Test-Connect does an ADS ReadState (degrade-guarded)
Task 7: FOCAS handshake — cnc_allclibhndl3 (degrade-guarded)
Classification: standard Estimated implement time: ~5 min Parallelizable with: Tasks 1, 2, 3, 4, 5, 6, 8
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/FocasDriverProbeTests.cs (create)
Approach: After TCP preflight, attempt the FOCAS library handshake via the existing wire P/Invoke (Wire.WireFocasClient / FocasWireClient — read those for the cnc_allclibhndl3/cnc_freelibhndl entry points). Build the wire client directly (do NOT route through UnimplementedFocasClientFactory, which throws by design). Allocate a handle to the first device's host/port and free it.
- Degrade guard: the FWLIB native lib is absent on the dev box / Linux containers → the P/Invoke throws DllNotFoundException/NotSupportedException/TypeInitializationException. Catch those specifically and degrade: Ok=true "Reachable at {host}:{port} (FOCAS handshake unavailable on this host — FWLIB absent, TCP reachability only)".
  - Handle allocated OK → Ok "FOCAS handle OK".
  - FWLIB present but cnc_allclibhndl3 returns an error code (e.g. EW_SOCKET) from a reachable-but-non-CNC host → Ok=false "Reachable at {host}:{port} but FOCAS handshake failed: {focasRc}".
- Honour ct/timeout (FWLIB connect can block, so run it on a worker/Task.Run bounded by the linked timeout CTS so the probe still returns within budget). Update the class XML-doc.
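One way to bound a blocking native call by the probe budget is sketched below, assuming .NET 6 or later for Task.WaitAsync. The BoundedCall helper is hypothetical; the blockingCall delegate stands in for whatever wraps the cnc_allclibhndl3 P/Invoke.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs a blocking call on the thread pool, bounded by timeout and ct.
public static class BoundedCall
{
    public static async Task<T> RunAsync<T>(Func<T> blockingCall, TimeSpan timeout, CancellationToken ct)
    {
        // Link the caller's token with the probe timeout so either one ends the wait.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);

        // WaitAsync stops waiting when the token fires and throws an
        // OperationCanceledException. It cannot interrupt the native call itself,
        // which may keep running on its worker thread until it returns on its own.
        return await Task.Run(blockingCall, cts.Token).WaitAsync(cts.Token);
    }
}
```

Because the cancellation surfaces as an OperationCanceledException, the existing "Probe timed out" catch would apply without change. Any handle that the abandoned call eventually allocates would still need freeing, which this sketch does not address.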
Tests: invalid-json / no-host / unreachable. Assert the degrade path — on the CI/dev box (no FWLIB) the probe against a reachable TCP listener returns Ok=true with the "FWLIB absent" note (this IS the dev-box behavior, so it's directly testable). Happy/CNC-error paths are deferred (no CNC + no FWLIB) — note in the test header.
Commit: feat(probe): FOCAS Test-Connect attempts a cnc_allclibhndl3 handshake (degrade-guarded)
Task 8: Galaxy handshake — gRPC ping (auth-rejection = reachable)
Classification: standard Estimated implement time: ~5 min Parallelizable with: Tasks 1, 2, 3, 4, 5, 6, 7
Files:
- Modify: src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriverProbe.cs
- Test: tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/GalaxyDriverProbeTests.cs (create)
Approach: After TCP preflight, build a Grpc.Net.Client channel to Gateway.Endpoint honouring the cleartext/TLS setting (read how GalaxyDriver / GatewayGalaxy* construct the channel — there's an http2-cleartext path for the dev gw), and issue ONE lightweight unary call from MxGateway.Client/MxGateway.Contracts (pick the cheapest — e.g. a status/echo/health, else the smallest query). Do NOT resolve secretref: secrets — send whatever key string is in the transient config.
- Interpret the gRPC StatusCode:
  - OK → Ok "gateway gRPC OK".
  - Unauthenticated/PermissionDenied → also Ok "gateway reachable & speaking gRPC (auth not checked)"; this proves a live mxgw server.
  - Unavailable / transport error / deadline → Ok=false "Reachable at {host}:{port} but gateway gRPC handshake failed: {StatusCode}".
- Honour ct/timeout (set the gRPC deadline from timeout). Dispose the channel. Update the class XML-doc.
Tests: invalid-json / no-endpoint / unreachable (black-hole → Unavailable/deadline → Ok=false). The Unauthenticated⇒Ok rule: if a tiny in-process gRPC server is disproportionate, cover it live (Task 11, gateway 10.100.0.48:5120) and unit-test the StatusCode→result classification by factoring the mapping into a small pure helper (e.g. static (bool ok, string msg) ClassifyRpc(StatusCode, host, port)) and testing that directly.
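The pure classification helper could look roughly like the sketch below. To keep the example self-contained it declares a local RpcStatus enum as a stand-in for Grpc.Core.StatusCode (the numeric values match the gRPC status codes); the real helper would presumably take Grpc.Core.StatusCode directly. The class name is also an assumption.

```csharp
// Stand-in for Grpc.Core.StatusCode so the sketch compiles without the gRPC package.
public enum RpcStatus
{
    OK = 0,
    DeadlineExceeded = 4,
    PermissionDenied = 7,
    Unavailable = 14,
    Unauthenticated = 16,
}

// Hypothetical pure helper: maps a gRPC status to the probe's (Ok, Message) pair.
public static class GalaxyProbeClassifier
{
    public static (bool Ok, string Message) ClassifyRpc(RpcStatus status, string host, int port) => status switch
    {
        RpcStatus.OK => (true, "gateway gRPC OK"),

        // An auth rejection still proves a live gateway is speaking gRPC.
        RpcStatus.Unauthenticated or RpcStatus.PermissionDenied
            => (true, "gateway reachable & speaking gRPC (auth not checked)"),

        // Unavailable, deadline, and anything else count as a failed handshake.
        _ => (false, $"Reachable at {host}:{port} but gateway gRPC handshake failed: {status}"),
    };
}
```

With the mapping isolated like this, the Unauthenticated⇒Ok rule can be unit-tested without an in-process gRPC server.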
Commit: feat(probe): Galaxy Test-Connect does a gRPC ping (auth-rejection counts as reachable)
Task 9: Docs + bookkeeping
Classification: small Estimated implement time: ~4 min Parallelizable with: none (blocked by 1–8)
Files:
- Modify: docs/plans/2026-05-28-adminui-driver-pages-design.md (mark Phase 7 real-probes done) OR add a "Test-Connect probes" section to the most appropriate driver doc.
- Modify: docs/Historian.md or a probes note; record that the Historian probe was already a real handshake.
- Create/Modify: a short docs/drivers/TestConnectProbes.md (or a section in an existing driver overview) documenting the per-driver handshake + the three degrade behaviors + the consistent message contract.
Steps: Document the 8 handshakes, the degrade semantics (TwinCAT route table; FOCAS FWLIB-absent), and the auth-rejection=reachable Galaxy rule. Note AbLegacy/TwinCAT/FOCAS live-verify deferred (no sim/target/FWLIB). Commit: docs(phase5): real Test-Connect handshakes per driver + degrade semantics.
Task 10: Full build + test + final integration review
Classification: high-risk (final integration gate — degrade guards + no-regression-vs-TCP across 8 disjoint probes) Estimated implement time: ~6 min Parallelizable with: none (blocked by 1–9)
Steps:
- dotnet build ZB.MOM.WW.OtOpcUa.slnx → 0 errors (production projects are TreatWarningsAsErrors).
- dotnet test for the 8 driver .Tests projects → all green.
- Final integration review focus: (a) every probe still returns the unchanged "Connect failed"/"timed out" messages on those paths (no regression for offline devices); (b) the TwinCAT + FOCAS degrade guards truly catch "cannot-attempt" vs "device-rejected" and never emit a worse-than-TCP result; (c) Galaxy's Unauthenticated⇒Ok; (d) no probe mutates state; (e) no IDriverProbe/DriverProbeResult/DI change leaked.
- Commit any review fixes.
Task 11: Live /run — extend E2E + run the 5 verifiable probes
Classification: high-risk (acceptance gate, agent-driven) Estimated implement time: ~8 min Parallelizable with: none (blocked by 10)
Files:
- Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverTestConnectE2eTests.cs (add OpcUaClient/S7/AbCip/Galaxy happy + wrong-port cases, skip-gated like the Modbus ones).
Steps:
- Extend DriverTestConnectE2eTests with skip-gated happy-path + wrong-port cases for OpcUaClient (opc.tcp://10.100.0.35:50000), S7 (10.100.0.35:1102), AbCip (10.100.0.35:44818), Galaxy (10.100.0.48:5120), mirroring the existing Modbus pattern (DockerFixtureAvailability.IsReachable skip).
- Run the integration suite from the dev Mac (sims reachable): assert each verifiable probe is GREEN vs its live sim and RED vs a wrong port. For Galaxy, source the key WITHOUT echoing if a call needs it (per the established recipe), but the probe should report reachable even without it.
- Record results. AbLegacy / TwinCAT / FOCAS happy-path live-verify is honestly deferred (no PLC5/SLC sim, no ADS target, no CNC+FWLIB); these are unit-proven + degrade-guarded.
- Commit the E2E additions: test(phase5): live Test-Connect E2E for OpcUaClient/S7/AbCip/Galaxy (skip-gated).
Done = Build clean + all driver .Tests green + final integration review SHIP + the 5 verifiable probes live-proven GREEN/RED on the rig + docs updated. Then finishing-a-development-branch → merge to master + push.