mbproxy: add keepalive / connection monitoring

The DL205/DL260 ECOM emits no TCP keepalives, so an idle backend socket
can be silently dropped by a middlebox (switch, firewall, NAT) after
2-5 minutes. Enable OS SO_KEEPALIVE on backend and accepted upstream
sockets, and drive a periodic synthetic FC03 heartbeat on each idle
backend socket so a dead path is detected before a real client request
hits it. Controlled by Connection.Keepalive (ON by default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-15 09:40:54 -04:00
parent 7466a46aa7
commit 0868613890
25 changed files with 1135 additions and 25 deletions
+28 -1
View File
@@ -99,7 +99,34 @@
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
// (sc.exe stop / Windows Service stop signal). After this deadline the coordinator
// cancels remaining work and proceeds. Keep at or below the SCM wait-hint (30 s).
"GracefulShutdownTimeoutMs": 10000
"GracefulShutdownTimeoutMs": 10000,
// ── Keepalive / connection monitoring ───────────────────────────────────
// The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend
// socket can be silently dropped by a middlebox (switch, firewall, NAT)
// after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both
// backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat
// on each idle backend socket so a dead path is detected before a real
// client request hits it. See docs/Architecture/Keepalive.md.
"Keepalive": {
// Master switch. false → no SO_KEEPALIVE and no heartbeat; the proxy
// behaves exactly as a pre-keepalive build.
"Enabled": true,
// SO_KEEPALIVE: idle time (ms) before the OS sends its first probe.
"TcpIdleTimeMs": 30000,
// SO_KEEPALIVE: interval (ms) between probes once the idle time elapses.
"TcpProbeIntervalMs": 5000,
// SO_KEEPALIVE: unanswered probes before the OS declares the socket dead.
"TcpProbeCount": 4,
// Backend heartbeat: after this much backend idle (ms) the proxy issues a
// synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is
// still answering Modbus. Must be greater than BackendRequestTimeoutMs.
"BackendHeartbeatIdleMs": 30000,
// FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260.
"BackendHeartbeatProbeAddress": 0
}
},
// ── Resilience policies ─────────────────────────────────────────────────────────────