# Keepalive & Connection Monitoring The DL205/DL260 ECOM does not emit TCP keepalives (see [`../Reference/dl205.md`](../Reference/dl205.md) → "Behavioural Oddities"). An idle socket is silently dropped by middleboxes — switches, firewalls, NAT — typically after 2–5 minutes. The proxy holds one **persistent backend socket per PLC** ([`./ConnectionModel.md`](./ConnectionModel.md)) plus many accepted upstream client sockets, so it needs its own keepalive on both sides. Keepalive is **enabled by default** and is governed by the `Connection.Keepalive` option block (see [`../Operations/Configuration.md`](../Operations/Configuration.md)). Set `Connection.Keepalive.Enabled = false` to restore pre-keepalive behaviour exactly. ## Two mechanisms | Mechanism | Scope | Detects | |-----------|-------|---------| | OS TCP keepalive (`SO_KEEPALIVE`) | Backend socket **and** accepted upstream sockets | A peer whose TCP stack is gone (host down, cable pulled, half-open socket). | | Application heartbeat (FC03 probe) | Backend socket only | The above **plus** a middlebox idle-drop and an ECOM that is connected-but-not-answering Modbus. | The application heartbeat is the load-bearing mechanism; OS keepalive is a cheap belt-and-suspenders that also covers the window between heartbeat ticks. ## Backend: OS TCP keepalive `SocketKeepalive.Apply` sets `SO_KEEPALIVE` plus the idle-time / probe-interval / probe-count tunables on the backend `Socket` right after it is created in `PlcMultiplexer.EnsureBackendConnectedAsync`. The tunables come from `Connection.Keepalive.Tcp*`. Socket options are applied **at connect time** — a hot-reload of the `Tcp*` values only affects backend sockets opened *after* the change. ## Backend: application heartbeat A per-`PlcMultiplexer` background loop (`RunBackendHeartbeatAsync`) is started alongside the backend writer and reader on each successful connect, under the same `_backendCts`, and dies with them on teardown. - The multiplexer tracks `_lastBackendActivityUtc`, updated by **both** the writer (on every send) and the reader (on every received frame). Real traffic in either direction therefore suppresses the heartbeat. - Each tick (a quarter of `BackendHeartbeatIdleMs`, floored at 500 ms), if the socket has been idle longer than `BackendHeartbeatIdleMs`, the loop issues a **synthetic FC03 qty=1 read** at `BackendHeartbeatProbeAddress` (default 0 = `V0`, valid on DL205/DL260). FC08 (Diagnostics) is **not** supported by the DL260 ECOM, so the probe must be a real register read. - The probe targets the unit ID of the most recent upstream request, so it reaches the same Modbus unit the real clients successfully use. - The probe takes a real proxy TxId and a `CorrelationMap` entry flagged `InFlightRequest.IsHeartbeat`. It is enqueued straight onto the backend outbound channel, **bypassing** the read-coalescing and response-cache paths. ### Heartbeat response The backend reader recognises an `IsHeartbeat` correlation entry, refreshes the idle timer (already done on frame receipt), frees the TxId, and **drops the payload** — no rewriter, no cache write-through, no fan-out, and no round-trip-EWMA sample (the synthetic probe never pollutes the client-facing RTT metric). ### Heartbeat timeout If a probe is not answered within `BackendRequestTimeoutMs`, the per-request timeout watchdog ([`./ConnectionModel.md`](./ConnectionModel.md) → "Per-Request Timeout Watchdog") finds the stale `IsHeartbeat` entry and — instead of dispatching a 0x0B exception to a (non-existent) upstream party — calls `TearDownBackendAsync`, cascading every attached upstream pipe. This is a **proactive** version of the existing backend-disconnect cascade: the dead path is found during idle instead of corrupting the next real client request. Reconnect stays lazy — the heartbeat keeps an *existing* backend warm, it never resurrects a dead one and adds no eager-reconnect spinner. Clients reconnect on their next request, exactly as for an organic cascade. `BackendHeartbeatIdleMs` must be greater than `BackendRequestTimeoutMs` (enforced by the reload validator) — a heartbeat interval at or below the request timeout would fire continuously. ## Upstream: OS TCP keepalive `SocketKeepalive.Apply` is also called on each accepted client `Socket` in the `UpstreamPipe` constructor. This is the **only** standard keepalive available on the upstream side: Modbus TCP is strictly client-initiated, so the proxy — a server to its clients — cannot send an unsolicited application heartbeat to a client. OS keepalive lets the proxy's TCP stack probe each client; a dead or half-open client then faults the pipe's read loop, the pipe is disposed, and its correlation / coalescing slots are freed instead of leaking until the proxy next tries to write. ## Counters Per-PLC, exposed on the status page (see [`../Operations/StatusPage.md`](../Operations/StatusPage.md)): | Counter | Meaning | |---------|---------| | `backendHeartbeatsSent` | Heartbeat probes issued on idle backend sockets. | | `backendHeartbeatsFailed` | Probes not answered within `BackendRequestTimeoutMs`. | | `backendIdleDisconnects` | Backend teardowns triggered by a failed heartbeat (event count — distinct from `disconnectCascades`, which counts cascaded pipes). | ## Log events `mbproxy.keepalive.*` — see [`../Reference/LogEvents.md`](../Reference/LogEvents.md): - `mbproxy.keepalive.heartbeat.sent` (Debug) - `mbproxy.keepalive.heartbeat.timeout` (Warning) - `mbproxy.keepalive.backend.idle_disconnect` (Information) ## Hot reload `Connection.Keepalive` is read through a live accessor (`Func`), so a reload of `appsettings.json` propagates without a listener restart: - The **heartbeat** interval and probe address are re-read on every loop tick. - The **TCP socket options** are applied at connect/accept time, so a reload affects only sockets opened after the change. ## Related documentation - [`./ConnectionModel.md`](./ConnectionModel.md) — backend socket lifecycle, the timeout watchdog, and the disconnect cascade this feature hooks into - [`../Operations/Configuration.md`](../Operations/Configuration.md) — the `Connection.Keepalive` option block - [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — keepalive counters - [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — `mbproxy.keepalive.*` events - [`../Reference/dl205.md`](../Reference/dl205.md) — the device "no keepalive" oddity and FC03/FC08 support