Keepalive & Connection Monitoring
The DL205/DL260 ECOM does not emit TCP keepalives (see ../Reference/dl205.md → "Behavioural Oddities"). An idle socket is silently dropped by middleboxes — switches, firewalls, NAT — typically after 2–5 minutes. The proxy holds one persistent backend socket per PLC (./ConnectionModel.md) plus many accepted upstream client sockets, so it needs its own keepalive on both sides.
Keepalive is enabled by default and is governed by the Connection.Keepalive option block (see ../Operations/Configuration.md). Set Connection.Keepalive.Enabled = false to restore pre-keepalive behaviour exactly.
Two mechanisms
| Mechanism | Scope | Detects |
|---|---|---|
| OS TCP keepalive (SO_KEEPALIVE) | Backend socket and accepted upstream sockets | A peer whose TCP stack is gone (host down, cable pulled, half-open socket). |
| Application heartbeat (FC03 probe) | Backend socket only | The above plus a middlebox idle-drop and an ECOM that is connected-but-not-answering Modbus. |
The application heartbeat is the load-bearing mechanism; OS keepalive is a cheap belt-and-suspenders that also covers the window between heartbeat ticks.
Backend: OS TCP keepalive
SocketKeepalive.Apply sets SO_KEEPALIVE plus the idle-time / probe-interval / probe-count tunables on the backend Socket right after it is created in PlcMultiplexer.EnsureBackendConnectedAsync. The tunables come from Connection.Keepalive.Tcp*. Socket options are applied at connect time — a hot-reload of the Tcp* values only affects backend sockets opened after the change.
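As an illustration of what applying these options involves at the socket level, here is a minimal Python sketch. It is not the project's SocketKeepalive.Apply; the function name `apply_keepalive` and the example values (30 s idle, 10 s interval, 3 probes) are assumptions chosen for the example, and the per-socket tunables are set only where the platform exposes them.

```python
import socket

def apply_keepalive(sock, idle_s=30, interval_s=10, count=3):
    """Enable SO_KEEPALIVE and, where available, the idle/interval/count tunables.

    The default values are illustrative, not the proxy's configured defaults.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tunables are platform-specific constants; skip any that are missing.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Because the options are set on the socket object itself, changing the configured values later has no effect on a socket that is already open, which matches the connect-time behaviour described above.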
Backend: application heartbeat
A per-PlcMultiplexer background loop (RunBackendHeartbeatAsync) is started alongside the backend writer and reader on each successful connect, under the same _backendCts, and dies with them on teardown.
- The multiplexer tracks _lastBackendActivityUtc, updated by both the writer (on every send) and the reader (on every received frame). Real traffic in either direction therefore suppresses the heartbeat.
- Each tick (a quarter of BackendHeartbeatIdleMs, floored at 500 ms), if the socket has been idle longer than BackendHeartbeatIdleMs, the loop issues a synthetic FC03 qty=1 read at BackendHeartbeatProbeAddress (default 0 = V0, valid on DL205/DL260). FC08 (Diagnostics) is not supported by the DL260 ECOM, so the probe must be a real register read.
- The probe targets the unit ID of the most recent upstream request, so it reaches the same Modbus unit the real clients successfully use.
- The probe takes a real proxy TxId and a CorrelationMap entry flagged InFlightRequest.IsHeartbeat. It is enqueued straight onto the backend outbound channel, bypassing the read-coalescing and response-cache paths.
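The tick arithmetic and the probe frame can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proxy's RunBackendHeartbeatAsync: the function names are hypothetical, integer division for "a quarter" is assumed, and the frame layout is the standard Modbus TCP MBAP header followed by an FC03 request.

```python
import struct

MIN_TICK_MS = 500  # floor on the tick interval, as described above

def tick_interval_ms(idle_ms):
    """A quarter of the idle threshold, never below the 500 ms floor."""
    return max(idle_ms // 4, MIN_TICK_MS)

def should_probe(now_ms, last_activity_ms, idle_ms):
    """True only when the socket has been idle strictly longer than idle_ms."""
    return now_ms - last_activity_ms > idle_ms

def build_fc03_probe(tx_id, unit_id, address=0):
    """Build a Modbus TCP 'read holding registers' request for one register.

    MBAP header: transaction id, protocol id (0), length of the bytes that
    follow the length field (unit id + 5-byte PDU = 6), unit id.
    PDU: function code 0x03, start address, quantity 1.
    """
    return struct.pack(">HHHBBHH", tx_id, 0, 6, unit_id, 0x03, address, 1)
```

For example, `build_fc03_probe(0x0102, 1)` yields the 12-byte frame `01 02 00 00 00 06 01 03 00 00 00 01`. Any send or receive that advances the last-activity timestamp makes `should_probe` return false again, which is how real traffic suppresses the heartbeat.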
Heartbeat response
The backend reader recognises an IsHeartbeat correlation entry, refreshes the idle timer (already done on frame receipt), frees the TxId, and drops the payload — no rewriter, no cache write-through, no fan-out, and no round-trip-EWMA sample (the synthetic probe never pollutes the client-facing RTT metric).
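The reader-side branch described above might look like the following Python sketch. The `InFlight` class, the dict-based correlation map, and the `rtt_samples` list are hypothetical stand-ins for the proxy's own types; only the behaviour (free the TxId, drop the heartbeat payload, record no RTT sample) is taken from the text.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    is_heartbeat: bool = False  # mirrors the IsHeartbeat flag described above

def handle_response(correlation, tx_id, payload, rtt_samples, rtt_ms):
    """Return the payload to deliver upstream, or None when nothing is delivered."""
    entry = correlation.pop(tx_id, None)  # removing the entry frees the TxId
    if entry is None:
        return None  # unknown or already-expired TxId
    if entry.is_heartbeat:
        return None  # synthetic probe: no fan-out, no cache write, no RTT sample
    rtt_samples.append(rtt_ms)  # only real client traffic feeds the RTT metric
    return payload
```

Keeping the heartbeat check ahead of the RTT bookkeeping is what keeps the synthetic probe out of the client-facing metric.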
Heartbeat timeout
If a probe is not answered within BackendRequestTimeoutMs, the per-request timeout watchdog (./ConnectionModel.md → "Per-Request Timeout Watchdog") finds the stale IsHeartbeat entry and — instead of dispatching a 0x0B exception to a (non-existent) upstream party — calls TearDownBackendAsync, cascading every attached upstream pipe.
This is a proactive version of the existing backend-disconnect cascade: the dead path is found during idle instead of corrupting the next real client request. Reconnect stays lazy — the heartbeat keeps an existing backend warm, it never resurrects a dead one and adds no eager-reconnect spinner. Clients reconnect on their next request, exactly as for an organic cascade.
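One way to picture the watchdog's decision is the Python sketch below. It is a simplification under assumptions: the `Pending` record, the function name, and the `>=` comparison are hypothetical, and the real watchdog lives in the code described in ./ConnectionModel.md. The point it illustrates is that an expired heartbeat entry leads to a teardown, while expired client entries are the ones that would receive a 0x0B exception.

```python
from dataclasses import dataclass

@dataclass
class Pending:
    sent_at_ms: int
    is_heartbeat: bool = False

def sweep_timeouts(pending, now_ms, timeout_ms):
    """One watchdog pass: return (tear_down, expired_client_tx_ids).

    tear_down is True when any expired entry is a heartbeat probe;
    expired_client_tx_ids lists the real requests that timed out.
    """
    expired = [tx for tx, p in pending.items()
               if now_ms - p.sent_at_ms >= timeout_ms]
    tear_down = any(pending[tx].is_heartbeat for tx in expired)
    client_tx = [tx for tx in expired if not pending[tx].is_heartbeat]
    return tear_down, client_tx
```

The sketch does not reconnect after a teardown, in line with the lazy reconnect described above.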
BackendHeartbeatIdleMs must be greater than BackendRequestTimeoutMs (enforced by the reload validator) — a heartbeat interval at or below the request timeout would fire continuously.
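The constraint amounts to a single comparison. A minimal Python sketch follows; the function name and error message are hypothetical and are not the reload validator's actual output.

```python
def validate_keepalive(idle_ms, request_timeout_ms):
    """Reject an idle threshold that is not strictly greater than the timeout."""
    if idle_ms <= request_timeout_ms:
        raise ValueError(
            "BackendHeartbeatIdleMs (%d) must be greater than "
            "BackendRequestTimeoutMs (%d)" % (idle_ms, request_timeout_ms)
        )
```

With this check, an idle threshold of 60000 ms against a 3000 ms timeout passes, while 3000 against 3000 is rejected.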
Upstream: OS TCP keepalive
SocketKeepalive.Apply is also called on each accepted client Socket in the UpstreamPipe constructor. This is the only standard keepalive available on the upstream side: Modbus TCP is strictly client-initiated, so the proxy — a server to its clients — cannot send an unsolicited application heartbeat to a client. OS keepalive lets the proxy's TCP stack probe each client; a dead or half-open client then faults the pipe's read loop, the pipe is disposed, and its correlation / coalescing slots are freed instead of leaking until the proxy next tries to write.
Counters
Per-PLC, exposed on the status page (see ../Operations/StatusPage.md):
| Counter | Meaning |
|---|---|
| backendHeartbeatsSent | Heartbeat probes issued on idle backend sockets. |
| backendHeartbeatsFailed | Probes not answered within BackendRequestTimeoutMs. |
| backendIdleDisconnects | Backend teardowns triggered by a failed heartbeat (an event count, distinct from disconnectCascades, which counts cascaded pipes). |
Log events
mbproxy.keepalive.* — see ../Reference/LogEvents.md:
- mbproxy.keepalive.heartbeat.sent (Debug)
- mbproxy.keepalive.heartbeat.timeout (Warning)
- mbproxy.keepalive.backend.idle_disconnect (Information)
Hot reload
Connection.Keepalive is read through a live accessor (Func<KeepaliveOptions>), so a reload of appsettings.json propagates without a listener restart:
- The heartbeat interval and probe address are re-read on every loop tick.
- The TCP socket options are applied at connect/accept time, so a reload affects only sockets opened after the change.
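The live-accessor pattern can be illustrated in Python with a plain callable standing in for Func<KeepaliveOptions>. Everything here is a hypothetical sketch: the field names, the `OptionsHolder` class, and the 60000 ms example value are assumptions, not the project's actual option names or defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeepaliveOptions:
    enabled: bool = True
    backend_heartbeat_idle_ms: int = 60000  # example value only
    probe_address: int = 0

class OptionsHolder:
    """Holds the current options snapshot; reload() swaps it atomically."""
    def __init__(self, initial):
        self._current = initial

    def reload(self, new_options):
        self._current = new_options

    def get(self):
        return self._current

def next_tick_ms(get_options):
    """Called once per loop tick, so a reload is picked up on the next tick."""
    opts = get_options()
    return max(opts.backend_heartbeat_idle_ms // 4, 500)
```

Passing `holder.get` rather than a captured options object is the design choice that matters: the loop never holds a stale snapshot for longer than one tick, whereas socket options copied onto a socket at connect or accept time stay fixed for that socket's lifetime.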
Related documentation
- ./ConnectionModel.md: backend socket lifecycle, the timeout watchdog, and the disconnect cascade this feature hooks into
- ../Operations/Configuration.md: the Connection.Keepalive option block
- ../Operations/StatusPage.md: keepalive counters
- ../Reference/LogEvents.md: mbproxy.keepalive.* events
- ../Reference/dl205.md: the device "no keepalive" oddity and FC03/FC08 support