# Test-Connect Probes: Protocol Handshakes
Each driver's Test-Connect button in the AdminUI runs a probe against the
form's current config (never the persisted row, never the live driver actor).
Before Phase 5 (shipped 2026-06-16) every probe was a bare TCP ConnectAsync
— a live-but-rejecting device showed a healthy green tick, and the operator
only discovered the truth when the driver faulted at deploy. Phase 5 replaced
each TCP-only probe with a real protocol handshake so a reachable-but-wrong
or actively-rejecting endpoint now reads RED.
The IDriverProbe / DriverProbeResult contract and DI registration are
unchanged. Probes run in a transient actor with a timeout clamp of 1–60 s
and must not mutate any state.
For the AdminUI probe flow (button → `AdminOperationsActor` → transient probe
actor), see `docs/plans/2026-05-28-adminui-driver-pages-design.md` §4.
## Result contract

All probes return a consistent `DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency)`.
The message templates below are uniform across all 8 drivers:

| Outcome | Ok | Message template |
|---|---|---|
| TCP connect fails | `false` | `"Connect failed: {SocketErrorCode}"` |
| TCP ok + handshake ok | `true` | driver-specific descriptive string (see table below) |
| TCP ok but handshake rejected | `false` | `"Reachable at {host}:{port} but {proto} handshake failed: {detail}"` |
| Timeout | `false` | `"Probe timed out after {n}s."` |
The third row is the key new behavior: a reachable device that answers on the
port but rejects the protocol-level handshake now surfaces a false result
with a human-readable explanation rather than a false-green TCP-open tick.
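Below is a minimal Python sketch of the result contract and its four message templates, with the 1–60 s timeout clamp included. The real contract is the C# `DriverProbeResult` record. Every helper name here is hypothetical. One detail is an assumption: `{n}` in the timeout message is taken to be the clamped value.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Timeout window applied before a probe runs (1-60 s per this document).
MIN_TIMEOUT_S = 1
MAX_TIMEOUT_S = 60


@dataclass(frozen=True)
class DriverProbeResult:
    """Python mirror of DriverProbeResult(bool Ok, string? Message, TimeSpan? Latency)."""
    ok: bool
    message: Optional[str]
    latency: Optional[timedelta] = None


def clamp_timeout(seconds: float) -> float:
    """Clamp a requested probe timeout into the 1-60 s window."""
    return max(MIN_TIMEOUT_S, min(MAX_TIMEOUT_S, seconds))


def connect_failed(socket_error_code: str) -> DriverProbeResult:
    # Row 1: the TCP connect itself failed.
    return DriverProbeResult(False, f"Connect failed: {socket_error_code}")


def handshake_ok(driver_message: str, latency: timedelta) -> DriverProbeResult:
    # Row 2: TCP and the protocol handshake both succeeded; message is driver-specific.
    return DriverProbeResult(True, driver_message, latency)


def handshake_rejected(host: str, port: int, proto: str, detail: str) -> DriverProbeResult:
    # Row 3: the port answered but the protocol-level handshake was rejected.
    return DriverProbeResult(
        False, f"Reachable at {host}:{port} but {proto} handshake failed: {detail}")


def timed_out(requested_timeout_s: float) -> DriverProbeResult:
    # Row 4: assumes {n} is the clamped timeout, printed without a trailing ".0".
    return DriverProbeResult(
        False, f"Probe timed out after {clamp_timeout(requested_timeout_s):g}s.")
```

Keeping all four templates in one place is one way to make sure the wording stays identical across the eight probes.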
## Per-driver handshake

| Driver | Handshake | Ok message | Dev-rig target |
|---|---|---|---|
| Modbus | FC03 (Read Holding Registers, qty 1 @ addr 0) via `ModbusTcpTransport`. A Modbus exception PDU still proves a real Modbus device → Ok. A non-MBAP reply → handshake fail. | `"Modbus FC03 OK"` | `10.100.0.35:5020` (Modbus sim) |
| OpcUaClient | `DiscoveryClient.GetEndpointsAsync`: no session, no app-cert, no auth. ≥ 1 endpoint → Ok. A non-OPC-UA TCP server throws or times out → handshake fail. | `"OPC UA: N endpoint(s)"` | `opc.tcp://10.100.0.35:50000` (opc-plc) |
| S7 | `Plc.OpenAsync` (COTP CR/CC + S7 setup-communication), check `IsConnected`, then `Close`. Wrong rack/slot or a non-S7 server causes `OpenAsync` to throw → handshake fail. | `"S7 connected (CPU …)"` | `10.100.0.35:1102` (python-snap7 sim) |
| AbCip | libplctag `Tag.InitializeAsync` (EIP session + CIP Forward Open). A CIP-level error such as tag-not-found still proves the controller answered CIP → Ok. A session, Forward Open, or connect error → handshake fail. | `"CIP session OK"` | `10.100.0.35:44818` (CIP sim) |
| AbLegacy | Same libplctag `InitializeAsync` handshake as AbCip, PCCC protocol family. | `"CIP session OK"` (PCCC family) | Deferred (no PLC5/SLC sim) |
| TwinCAT | `AdsClient.Connect` + `ReadStateAsync`. See degrade semantics below. | `"ADS state: {state}"` | Deferred (no ADS target) |
| FOCAS | `cnc_allclibhndl3` via a direct `DllImport("fwlib32")` in the probe. See degrade semantics below. | `"FOCAS handle OK"` | Deferred (no CNC + FWLIB) |
| Galaxy | gRPC unary call to `GalaxyRepository.TestConnection` on the configured mxaccessgw endpoint. See the auth-rejection rule below. | `"gateway gRPC OK"` | `http://10.100.0.48:5120` (mxaccessgw) |
Historian.Wonderware had a TCP Hello→HelloAck handshake probe before Phase 5. The
Wonderware historian backend has since been retired, along with its driver type and probe. The
historian backend is now the external HistorianGateway, a gRPC client package rather than a probed
`IDriver`. See `Historian.Wonderware.md` (retired stub) and `../Historian.md`.
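In the Modbus row of the handshake table, the probe has to tell a genuine MBAP-framed reply apart from anything else that happens to answer on the port. The minimal Python sketch below builds the FC03 request frame and classifies the reply. The shipped probe goes through `ModbusTcpTransport` in C#, so the function names are hypothetical. The exact checks used to recognize a "non-MBAP reply" are also assumptions: the transaction-id echo, protocol id 0, and a length field that matches the bytes received.

```python
import struct
from typing import Tuple

FC03 = 0x03            # Read Holding Registers
EXCEPTION_BIT = 0x80   # set on the function code of a Modbus exception response


def build_fc03_request(transaction_id: int, unit_id: int = 1,
                       address: int = 0, quantity: int = 1) -> bytes:
    """MBAP header + FC03 PDU, all fields big-endian."""
    pdu = struct.pack(">BHH", FC03, address, quantity)
    # MBAP: transaction id, protocol id (0 = Modbus), length (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def classify_fc03_reply(request: bytes, reply: bytes) -> Tuple[bool, str]:
    """Return (ok, message-or-detail) for the reply to build_fc03_request()."""
    # Shortest valid reply is an exception: 7-byte MBAP + function code + exception code.
    if len(reply) < 9:
        return False, "non-MBAP reply (too short)"
    tid, protocol_id, length, _unit = struct.unpack(">HHHB", reply[:7])
    (expected_tid,) = struct.unpack(">H", request[:2])
    # Length counts the unit id plus the PDU, i.e. everything after the first 6 bytes.
    if protocol_id != 0 or tid != expected_tid or length != len(reply) - 6:
        return False, "non-MBAP reply"
    function_code = reply[7]
    if function_code == FC03:
        return True, "Modbus FC03 OK"
    if function_code == FC03 | EXCEPTION_BIT:
        # An exception PDU (e.g. illegal data address) still proves a real Modbus device.
        return True, "Modbus FC03 OK"
    return False, f"unexpected function code 0x{function_code:02X}"
```

Suppose an HTTP server is listening on the probed port. Its reply does not carry protocol id 0 in bytes 2–3, so the sketch reports a handshake failure rather than a green tick.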
Degrade semantics
Three drivers have environmental constraints that can prevent the handshake from running on certain hosts. The degradation principle is: the probe must never produce a result worse than today's TCP-only probe. A genuine protocol rejection from a reachable device is a correct RED; an inability to run the handshake at all (no FWLIB, no managed router) degrades to the existing TCP-reachability message — still a green tick but annotated.
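The degradation principle can be sketched as a wrapper that runs only after the TCP connect has succeeded. Below is a minimal Python sketch of it. The two exception classes and the function name are hypothetical. In the C# probes, the "unavailable" case corresponds to things like FOCAS catching `DllNotFoundException`/`NotSupportedException`, or the TwinCAT managed AMS router failing to start.

```python
from typing import Callable, NamedTuple


class HandshakeUnavailable(Exception):
    """Hypothetical: the handshake cannot run on this host (e.g. no FWLIB, no AMS router)."""


class HandshakeRejected(Exception):
    """Hypothetical: a reachable device actively rejected the protocol handshake."""


class ProbeResult(NamedTuple):
    ok: bool
    message: str


def run_handshake_with_degrade(host: str, port: int, proto: str,
                               handshake: Callable[[], str],
                               unavailable_note: str) -> ProbeResult:
    """Called only after TCP connect has already succeeded."""
    try:
        # Handshake ran and succeeded: driver-specific Ok message.
        return ProbeResult(True, handshake())
    except HandshakeRejected as exc:
        # Genuine protocol rejection from a reachable device: a correct RED.
        return ProbeResult(
            False, f"Reachable at {host}:{port} but {proto} handshake failed: {exc}")
    except HandshakeUnavailable:
        # Could not run the handshake at all: fall back to the TCP-only verdict,
        # annotated, so the result is never worse than the old probe.
        return ProbeResult(True, f"({unavailable_note})")
```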
### TwinCAT degrade

Where the handshake is available:

- `AdsClient.Connect(netId, port)` + `ReadStateAsync` → `Ok=true`, `"ADS state: {state}"` (Run / Config / Stop).
- An ADS route-table rejection from a reachable ADS router is a true RED:
  `"Reachable at {host}:{port} but ADS handshake failed: {detail} — check the target's ADS route table authorizes this host"`.
  This is the correct result: without an authorized route, the driver would not be able to function either.

Where the handshake is unavailable (headless server, no TwinCAT runtime, the managed AMS router cannot start):

- The probe degrades to TCP reachability:
  `Ok=true`, `"(ADS handshake unavailable on this host — TCP reachability only)"`.
### FOCAS degrade

On a Windows host with the FANUC FWLIB shared library present:

- `cnc_allclibhndl3` is called via a direct `DllImport("fwlib32")` declared in the probe. The production `Wire.WireFocasClient` is a pure-managed FOCAS/2 TCP client, not an FWLIB P/Invoke, so the probe carries its own native binding. A successful handle allocation → `Ok=true`, `"FOCAS handle OK"`.
- A CNC-level rejection → handshake fail.

On dev hosts, Linux, or macOS (no native FWLIB; `UnimplementedFocasClientFactory`
gates the driver):

- `DllNotFoundException` / `NotSupportedException` is caught and the probe degrades to TCP reachability: `Ok=true`, `"(FOCAS handshake unavailable on this host — FWLIB absent, TCP reachability only)"`.
### Galaxy auth-rejection rule

The probe builds the gRPC channel from the form's config and issues one
lightweight unary call. It does not resolve `secretref:` secrets. The
key string in the transient config is used as-is, even if it is empty or unresolved.

- `Unavailable` / transport failure → `Ok=false` (gateway is down or unreachable).
- `Unauthenticated` / `PermissionDenied` → `Ok=true`, `"gateway reachable & speaking gRPC (auth not checked)"`. An auth rejection proves a live mxaccessgw gRPC server. This is the correct result: the driver's own session layer handles auth, and the probe tests reachability only.
The mxaccessgw client surfaces a rejected key as a typed
`MxGatewayAuthenticationException` / `MxGatewayAuthorizationException`, not a
raw `RpcException`. The probe catches both and maps them to the reachable result
above. Live verification on `10.100.0.48:5120` with no key returns
`MxGatewayAuthenticationException("Missing or invalid API key.")` → `Ok=true`.
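The Galaxy mapping can be shown on raw gRPC status codes. Below is a minimal Python sketch, assuming a hypothetical `classify_gateway_status` helper. The enum mirrors the numeric values from the gRPC status-code spec. In the C# probe, the typed `MxGatewayAuthenticationException`/`MxGatewayAuthorizationException` land on the same Ok branch as `UNAUTHENTICATED`/`PERMISSION_DENIED`. The failure message text is hypothetical, and so is the choice to treat codes not covered by this document as failures.

```python
from enum import IntEnum
from typing import NamedTuple


class StatusCode(IntEnum):
    # Subset of the standard gRPC status codes, with their numeric values.
    OK = 0
    DEADLINE_EXCEEDED = 4
    PERMISSION_DENIED = 7
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


class ProbeResult(NamedTuple):
    ok: bool
    message: str


AUTH_NOT_CHECKED = "gateway reachable & speaking gRPC (auth not checked)"


def classify_gateway_status(status: StatusCode) -> ProbeResult:
    if status == StatusCode.OK:
        return ProbeResult(True, "gateway gRPC OK")
    if status in (StatusCode.UNAUTHENTICATED, StatusCode.PERMISSION_DENIED):
        # Only a live gRPC server can reject credentials; auth is the driver's job.
        return ProbeResult(True, AUTH_NOT_CHECKED)
    # UNAVAILABLE (and, as an assumption, any other code): gateway down or unreachable.
    # The message wording here is hypothetical.
    return ProbeResult(False, f"gateway unreachable ({status.name})")
```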
Config note: `UseTls` must match the endpoint scheme, i.e. `UseTls:false` for an `http://` (h2c) gateway and `UseTls:true` for `https://`. A mismatch fails the client's own validation (the same constraint the Galaxy driver enforces).
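A pre-flight check for the `UseTls`/scheme constraint could look like the Python sketch below. `use_tls_mismatch` is a hypothetical helper, not the mxaccessgw client's actual validation code, and its error wording is invented for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def use_tls_mismatch(endpoint: str, use_tls: bool) -> Optional[str]:
    """Return an error string if UseTls disagrees with the endpoint scheme, else None."""
    scheme = urlparse(endpoint).scheme.lower()
    if scheme not in ("http", "https"):
        return f"unsupported endpoint scheme {scheme!r}"
    expects_tls = scheme == "https"   # http:// means plaintext HTTP/2 (h2c)
    if use_tls != expects_tls:
        return f"UseTls:{str(use_tls).lower()} does not match {scheme}:// endpoint"
    return None
```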
## Live-verify scope
| Driver | Live-verify status | Notes |
|---|---|---|
| Modbus | Verified | Dev-rig sim 10.100.0.35:5020; green vs sim, RED vs wrong port / non-Modbus server, timeout vs black-hole IP |
| OpcUaClient | Verified | opc-plc 10.100.0.35:50000; same three-scenario matrix |
| S7 | Verified | python-snap7 10.100.0.35:1102 |
| AbCip | Verified | CIP sim 10.100.0.35:44818 |
| Galaxy | Verified | mxaccessgw 10.100.0.48:5120; Unauthenticated reply counts as Ok |
| AbLegacy | Deferred | No PLC5/SLC sim; unit-proven + code path identical to AbCip |
| TwinCAT | Deferred | No ADS target; unit-proven + degrade guard tested |
| FOCAS | Deferred | No CNC + FWLIB on dev host; degrade guard is the CI-observable path |
## Implementation references

- Phase 5 design: `docs/plans/2026-06-16-stillpending-phase-5-probes-design.md`
- Parent roadmap: `docs/plans/2026-06-15-stillpending-backlog-design.md` §Phase 5
- AdminUI probe flow: `docs/plans/2026-05-28-adminui-driver-pages-design.md` §4
- Per-driver probe implementations: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.<Type>/<Type>DriverProbe.cs`
- `IDriverProbe` contract: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverProbe.cs`
- Probe dispatch + timeout clamp: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Actors/AdminOperationsActor.cs` (around line 284)