Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.
Architecture/
- Overview.md (150 lines) — listener topology, request flow, per-PLC isolation
- ConnectionModel.md (247 lines) — TxId multiplexer, watchdog, disconnect cascade
- ReadCoalescing.md (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
- ResponseCache.md (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation
Features/
- BcdRewriting.md (252 lines) — codec, CDAB, FC scope, partial-overlap policy
- HotReload.md (189 lines) — IOptionsMonitor + per-change-kind reconcile rules
Operations/
- Configuration.md (422 lines) — every Mbproxy:* option + validation rules
- StatusPage.md (334 lines) — admin endpoint surface, every JSON field
- Troubleshooting.md (364 lines) — diagnosis playbook keyed to log events
Reference/
- LogEvents.md (499 lines) — 28 events across 7 categories, grep-verified
Testing/
- Simulator.md (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk
Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.
Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Architecture Overview
mbproxy is a .NET 10 background service that sits inline between Modbus TCP clients and a fleet of AutomationDirect DL205/DL260 PLCs, rewriting BCD-encoded registers in both directions while multiplexing many upstream clients onto one persistent backend socket per PLC.
This document is the entry point for readers new to the codebase. It sketches the runtime shape, the listener topology, the per-PLC isolation model, and the path a single Modbus frame takes from accept to response, and then hands off to the per-feature documents under docs/Architecture/, docs/Features/, and docs/Operations/.
Runtime Shape
The process is a single .NET 10 Generic Host worker. Microsoft.Extensions.Hosting.WindowsServices registers the host as a Windows Service so the same binary runs interactively (for development) or under the SCM (in production). All configuration binds from appsettings.json through IOptionsMonitor<MbproxyOptions>, which makes the tag list and PLC roster hot-reloadable without process restart. ProxyWorker is the long-lived BackgroundService that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
There is no in-process database, no message broker, and no persistent cache file: state is per-PLC, in-memory, and ephemeral. Restarting the service drops every in-flight request and every cached response. Upstream clients are expected to reconnect and reissue; the proxy never replays a request on their behalf.
Listener Topology
The proxy opens one TcpListener per PLC on a distinct port. A client picks which PLC it is talking to by choosing which port to connect to. There is no protocol-level routing — port number is the PLC identity. This keeps the upstream surface trivial for Wonderware, Historian gateways, and generic Modbus clients that already know how to point at host:port, and it means no per-frame header inspection is needed to decide where a request is going.
Client A ──┐
Client B ──┼──→ proxy:5020 ──→ PLC #1 (10.0.1.1:502)
├──→ proxy:5021 ──→ PLC #2 (10.0.1.2:502)
│ ...
└──→ proxy:5073 ──→ PLC #54 (10.0.1.54:502)
Each listener runs under a PlcListenerSupervisor that owns its bind lifecycle. If a bind fails at startup or the listener faults at runtime, the supervisor reattempts under a Polly retry pipeline; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones. The supervisor's state (SupervisorState) is observable on the status page so an operator can tell at a glance whether a port is bound, recovering, or shut down.
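The bounded recovery behaviour can be illustrated with a small language-neutral sketch. The real supervisor uses a Polly pipeline in C#; the Python below is only an approximation, and the attempt count, delays, and exponential backoff shape are assumptions, not values taken from the codebase. `bind_with_retry` is a hypothetical name.

```python
import asyncio


async def bind_with_retry(bind, attempts=5, base_delay=0.5, max_delay=30.0,
                          sleep=asyncio.sleep):
    """Call bind() until it succeeds, backing off exponentially between tries.

    All defaults here are illustrative assumptions, not mbproxy settings.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await bind()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last bind error
            await sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
```

Passing `sleep` as a parameter keeps the sketch easy to exercise in a test without real waiting.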
Because port identity is PLC identity, adding a PLC is purely a configuration change — append an entry to Mbproxy.Plcs with a free ListenPort, save, and the supervisor reconciliation loop binds the new port without touching any other PLC. Removing a PLC follows the same path in reverse.
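As a rough illustration of "port identity is PLC identity", the reconciliation step can be thought of as a diff between two port tables. This is a Python sketch rather than the project's C# code; `PlcEntry`, `build_port_table`, `diff_ports`, and every field other than the listen port are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlcEntry:
    name: str          # hypothetical display name
    listen_port: int   # mirrors the ListenPort option named above
    backend: str       # hypothetical "host:port" of the PLC


def build_port_table(plcs):
    """Map each listen port to exactly one PLC; reject duplicate ports."""
    table = {}
    for plc in plcs:
        if plc.listen_port in table:
            other = table[plc.listen_port].name
            raise ValueError(
                f"port {plc.listen_port} claimed by both {other} and {plc.name}"
            )
        table[plc.listen_port] = plc
    return table


def diff_ports(old, new):
    """Return (ports to bind, ports to close) for a configuration change."""
    return sorted(new.keys() - old.keys()), sorted(old.keys() - new.keys())
```

Adding a PLC shows up as one entry in the first list and nothing in the second, which is why no other listener needs to be touched.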
Per-PLC Isolation
Every PLC gets its own PerPlcContext carrying that PLC's PlcMultiplexer, CorrelationMap, TxIdAllocator, InFlightByKeyMap, optional ResponseCache, CacheInvalidator, and BcdPduPipeline. There is no shared mutable state across PLCs at the request path.
The consequence is fault containment:
- A slow or dead backend on PLC #17 cannot block the request loop for PLC #18. Each multiplexer owns its own outbound channel and its own backend reader/writer task pair.
- A flood of in-flight requests on one PLC consumes only that PLC's TxId allocator (the 16-bit space is per-PLC, not global).
- A backend disconnect on one PLC cascades only to that PLC's attached upstream pipes; the rest of the fleet is unaffected.
- Hot-reload of one PLC's tag list rewrites only that PLC's BcdPduPipeline view of the tag map. Other PLCs do not observe the swap.
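The per-PLC TxId space mentioned in the list above can be sketched as follows. This is an illustrative Python model, not the C# `TxIdAllocator`; the free-list strategy and the behaviour on exhaustion (returning `None`) are assumptions made for the example.

```python
from collections import deque


class TxIdAllocator:
    """Hands out 16-bit transaction ids and recycles released ones."""

    def __init__(self, capacity=0x10000):
        self._free = deque(range(capacity))  # ids not currently in flight
        self._in_use = set()

    def allocate(self):
        """Return a free id, or None when every id is in flight."""
        if not self._free:
            return None
        tx_id = self._free.popleft()
        self._in_use.add(tx_id)
        return tx_id

    def release(self, tx_id):
        """Return an id to the pool; releasing an unknown id is a no-op."""
        if tx_id in self._in_use:
            self._in_use.remove(tx_id)
            self._free.append(tx_id)
```

Because each PLC would hold its own instance, exhausting one allocator leaves every other PLC's id space untouched.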
The listener topology and the per-PLC component graph are deliberately aligned: one port, one supervisor, one multiplexer, one backend socket, one cache instance.
Cross-PLC state exists only in three places, and each is read-mostly: the bound IOptionsMonitor<MbproxyOptions> snapshot, the global Serilog logger, and the service-wide counter set surfaced on the status page. Counters are written via lock-free Interlocked operations on disjoint per-PLC fields, then summed when the status page is rendered.
This isolation is what lets the service operate degraded without operator intervention. If three PLCs drop off the network, the supervisor for each enters recovering, their multiplexers tear down their backend sockets, attached upstream clients are disconnected, and the remaining 51 PLCs keep serving traffic with no measurable impact. When the dropped PLCs come back, their supervisors rebind their listeners and the next upstream request triggers a fresh backend connect through the Polly pipeline — no fleet-wide restart, no manual reconnect, no shared state to flush.
Request Flow
The diagram below traces an FC03 read from an upstream client through the proxy and back. The cache check, the coalescing check, and the BCD rewrite all sit between the upstream parse and the backend send, so the multiplexer can short-circuit the backend entirely when it does not need to be involved. Steps the upstream client never sees are indented.
Upstream client
│ TCP connect → proxy:5020
▼
PlcListener (PlcListener.cs) accepts the socket
│
▼
UpstreamPipe wraps the socket: read loop + bounded response channel
│ parses MBAP frames off the wire, hands each frame to:
▼
PlcMultiplexer.OnUpstreamFrameAsync(pipe, frame, ct)
│
│ 1. Parse MBAP header → originalTxId, unitId
│ 2. Parse PDU → fc, startAddr, qty
│ 3. (FC03/FC04 only) ResponseCache.TryGet(CacheKey)
│ ├─ HIT → splice cached payload onto a fresh MBAP header
│ │ with originalTxId, push to upstream channel, DONE.
│ └─ MISS → fall through.
│ 4. InFlightByKeyMap coalesce check
│ ├─ duplicate read in flight → attach as additional waiter,
│ │ share the eventual response, DONE for this frame.
│ └─ first-of-key → become the leader, fall through.
│ 5. BcdPduPipeline rewrites request payload (FC06/FC16) binary → BCD
│ 6. TxIdAllocator hands out a free proxyTxId
│ 7. CorrelationMap[proxyTxId] = InFlightRequest(pipe, originalTxId, ...)
│ 8. Overwrite MBAP TxId field with proxyTxId; enqueue to outbound channel
▼
Backend writer task drains the outbound channel
│ → single persistent socket → PLC :502
▼
PLC responds; backend reader task picks the frame off the socket
│
│ 9. Look up proxyTxId in CorrelationMap; recover original requester(s)
│ 10. BcdPduPipeline rewrites response payload (FC03/FC04) BCD → binary
│ 11. ResponseCache stores the rewritten payload (if TTL > 0)
│ 12. Fan out to every waiter on the InFlightByKey entry, restoring each
│ waiter's originalTxId before pushing into its UpstreamPipe channel
▼
UpstreamPipe writer task drains its response channel → upstream socket
│
▼
Upstream client sees a response with the TxId it originally sent.
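Step 4 of the flow above (the coalescing check) can be modelled with a shared future per key. The sketch below is Python with asyncio, not the C# `InFlightByKeyMap`; the class name `InFlightByKey`, the `backend_read` callable, and the error-handling choices are assumptions for illustration.

```python
import asyncio


class InFlightByKey:
    """Collapses concurrent identical reads onto one backend call."""

    def __init__(self, backend_read):
        self._backend_read = backend_read  # async callable taking the key
        self._pending = {}                 # key -> Future shared by waiters

    async def read(self, unit_id, fc, start_addr, qty):
        key = (unit_id, fc, start_addr, qty)
        shared = self._pending.get(key)
        if shared is not None:
            # Duplicate in flight: wait for the leader's result. shield()
            # keeps a cancelled follower from cancelling the shared future.
            return await asyncio.shield(shared)
        # First of key: become the leader and publish a future for followers.
        shared = asyncio.get_running_loop().create_future()
        self._pending[key] = shared
        try:
            result = await self._backend_read(key)
        except Exception as exc:
            shared.set_exception(exc)
            shared.exception()  # mark as retrieved so asyncio does not warn
            raise
        else:
            shared.set_result(result)
            return result
        finally:
            del self._pending[key]
            if not shared.done():
                shared.cancel()  # leader was cancelled; release followers
```

Five concurrent calls with the same key would produce one call to `backend_read` and five identical results.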
Writes (FC06, FC16) take a shorter path: no cache lookup, no coalescing, but the request payload is BCD-rewritten before forwarding, and the response triggers CacheInvalidator to evict any overlapping cached read ranges so the next read does not serve stale data.
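The interaction between the TTL cache and write invalidation can be sketched as an interval-overlap check. This Python model is illustrative only: `RangeCache` is a hypothetical name, the key shape is assumed to match the coalescing key, and the sketch evicts overlapping entries regardless of function code, which may differ from what `CacheInvalidator` actually does.

```python
import time


class RangeCache:
    """TTL cache of read responses keyed by (unit_id, fc, start, qty)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, payload)

    def try_get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return payload

    def store(self, key, payload):
        if self._ttl > 0:  # a TTL of zero disables caching
            self._entries[key] = (self._clock() + self._ttl, payload)

    def invalidate_write(self, unit_id, write_start, write_qty):
        """Evict every cached read range that overlaps the written range."""
        write_end = write_start + write_qty  # exclusive upper bound
        stale = [
            key for key in self._entries
            if key[0] == unit_id
            and key[2] < write_end and write_start < key[2] + key[3]
        ]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Two half-open ranges overlap exactly when each starts before the other ends, so a single-register write to address 103 evicts a cached read of 100..103 but leaves a cached read of 104..105 alone.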
A few invariants are worth flagging because they shape the design:
- Original TxId is preserved end-to-end. The multiplexer rewrites the wire TxId for routing, but every upstream client sees the exact 16-bit value it sent. InFlightRequest carries the original TxId alongside the upstream pipe reference.
- Single backend writer, single backend reader. No socket-level synchronisation is needed because exactly one task writes to the backend socket and exactly one task reads from it. The outbound channel funnels every request through that single writer.
- The cache check happens before backend connect. If every read in a request is cache-served and the backend is currently disconnected, the upstream client still gets a response. The cache survives backend transitions intentionally.
- No mid-request retries on writes. FC06 and FC16 are non-idempotent on BCD tags (a partial-applied multi-register write could leave a 32-bit BCD value mid-transition), so a backend failure during a write surfaces as Modbus exception 0x0B and the client decides how to recover.
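The first invariant in the list above (TxId preserved end-to-end) comes down to swapping the first two bytes of the MBAP header on the way out and restoring them on the way back. The Python below is a minimal sketch of that idea, not the C# multiplexer; `Correlator` and its naive counter are hypothetical stand-ins for the correlation map and allocator.

```python
import struct

# MBAP header: transaction id, protocol id, length, unit id (7 bytes, big-endian)
MBAP = struct.Struct(">HHHB")


def tx_id_of(frame):
    """Read the transaction id from the start of an MBAP frame."""
    return MBAP.unpack_from(frame)[0]


def with_tx_id(frame, tx_id):
    """Return a copy of the frame with only the MBAP TxId replaced."""
    return struct.pack(">H", tx_id) + frame[2:]


class Correlator:
    """Tracks proxy TxId -> (upstream pipe, original TxId)."""

    def __init__(self):
        self._in_flight = {}
        self._next = 0  # naive counter; a real allocator recycles ids

    def to_backend(self, pipe, frame):
        proxy_tx_id = self._next
        self._next = (self._next + 1) & 0xFFFF
        self._in_flight[proxy_tx_id] = (pipe, tx_id_of(frame))
        return with_tx_id(frame, proxy_tx_id)

    def to_upstream(self, response):
        pipe, original = self._in_flight.pop(tx_id_of(response))
        return pipe, with_tx_id(response, original)
```

Two clients that both happen to send TxId 0x1234 are given distinct proxy TxIds on the backend socket, and each still receives 0x1234 in its response.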
Component Map
The major components a reader will hit when tracing a request, with their file locations under src/Mbproxy/. The list is ordered by where each component sits in the request path — accept loop at the top, rewrite at the bottom.
- ProxyWorker (Proxy/ProxyWorker.cs). The BackgroundService host; reconciles the configured PLC list with the supervisor roster on startup and on IOptionsMonitor change events.
- PlcListenerSupervisor (Proxy/Supervision/PlcListenerSupervisor.cs). Owns one PLC's listener lifecycle (bind, run, recover, shut down). Uses Polly for bounded recovery.
- PlcListener (Proxy/PlcListener.cs). The actual TcpListener accept loop for one PLC; hands every accepted socket to that PLC's multiplexer as a new UpstreamPipe.
- UpstreamPipe (Proxy/Multiplexing/UpstreamPipe.cs). One per upstream socket. Frame-parses inbound bytes and pushes parsed MBAP frames into the multiplexer; drains outbound responses from a bounded channel back to the client.
- PlcMultiplexer (Proxy/Multiplexing/PlcMultiplexer.cs). The per-PLC fan-in/fan-out core. Owns the persistent backend socket, the outbound write loop, the backend read loop, the per-request watchdog, and the cascade-on-backend-disconnect contract. Entry point OnUpstreamFrameAsync is where every upstream frame enters the request path; it is the single function that ties cache, coalescing, BCD rewrite, TxId allocation, and correlation together.
- CorrelationMap (Proxy/Multiplexing/CorrelationMap.cs). Maps proxyTxId → InFlightRequest so backend responses can be routed back to the originating upstream pipe(s). Also the surface the watchdog scans for stale entries.
- TxIdAllocator (Proxy/Multiplexing/TxIdAllocator.cs). Allocates and recycles the per-PLC 16-bit proxy TxId space used by the multiplexer.
- InFlightByKeyMap (Proxy/Multiplexing/InFlightByKeyMap.cs). The read-coalescing seam: keys on (unitId, fc, startAddr, qty) so duplicate concurrent reads share one backend round-trip and one response.
- ResponseCache (Proxy/Cache/ResponseCache.cs). Opt-in per-tag-range TTL cache for FC03/FC04 responses. A cache hit short-circuits the backend entirely; cache lookup happens before the multiplexer even ensures the backend is connected.
- CacheInvalidator (Proxy/Cache/CacheInvalidator.cs). Invalidates cached read ranges that overlap with successful FC06/FC16 writes, so writes never leave stale reads behind.
- BcdPduPipeline (Proxy/BcdPduPipeline.cs). The actual BCD rewrite: walks request and response PDUs against the resolved tag map and re-encodes each configured register between BCD nibbles and binary integers. 32-bit BCD tags spanning the CDAB word pair are rewritten as a unit. Non-BCD registers pass through untouched, and any function code the pipeline does not own (diagnostics, exceptions, coil and discrete-input functions) is forwarded byte-for-byte.
PerPlcContext (Proxy/PerPlcContext.cs) is the container that binds these together for one PLC and is the handle the supervisor and multiplexer carry around.
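To make the BCD rewrite performed by BcdPduPipeline concrete, here is a small Python sketch of the register-level conversion. It is not the project's codec: the function names are hypothetical, raising on an invalid nibble is an assumed policy, and the CDAB handling assumes the first register of a pair carries the low four decimal digits.

```python
def bcd_to_int(word):
    """Decode one 16-bit BCD register (four decimal nibbles) to an integer."""
    value = 0
    for shift in (12, 8, 4, 0):
        digit = (word >> shift) & 0xF
        if digit > 9:
            raise ValueError(f"invalid BCD nibble in 0x{word:04X}")
        value = value * 10 + digit
    return value


def int_to_bcd(value):
    """Encode an integer in 0..9999 as one 16-bit BCD register."""
    if not 0 <= value <= 9999:
        raise ValueError(f"{value} does not fit in four BCD digits")
    word = 0
    for shift in (0, 4, 8, 12):
        value, digit = divmod(value, 10)
        word |= digit << shift
    return word


def bcd32_to_int(first, second):
    """Decode a CDAB pair, assuming the first register is the low word."""
    return bcd_to_int(second) * 10_000 + bcd_to_int(first)


def int_to_bcd32(value):
    """Encode 0..99999999 as (first, second) registers in CDAB order."""
    high, low = divmod(value, 10_000)
    return int_to_bcd(low), int_to_bcd(high)
```

For example, the register value 0x1234 decodes to the integer 1234, and the pair (0x5678, 0x1234) decodes to 12345678 under the word-order assumption stated above.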
Two supporting abstractions are worth knowing about even though they do not appear in the per-frame path:
- IPduPipeline: the rewrite-pipeline interface (Proxy/IPduPipeline.cs). BcdPduPipeline is the production implementation; NoopPduPipeline is the test/passthrough implementation used when no BCD tags are configured for a PLC.
- MbapFrame: the static helper (Proxy/MbapFrame.cs) that parses and serialises the 7-byte MBAP header. Every component that touches the wire goes through this helper rather than indexing raw byte arrays directly.
Counters and structured log event names emitted from these components are catalogued in ProxyCounters (Proxy/ProxyCounters.cs) and the various *LogEvents static classes (MultiplexerLogEvents, CoalescingLogEvents, CacheLogEvents, RewriterLogEvents). A reader following a runtime symptom back to its source should grep for the event-name constants in those files first.
Where to Read Next
For the wire-level details of how one backend socket fans out to many upstream clients — TxId rewriting, the correlation map, the per-request watchdog, the backend disconnect cascade — read ./ConnectionModel.md. It is the most load-bearing internal document; almost every failure-mode question routes through it.
For the read-coalescing seam (when duplicate concurrent reads collapse onto one backend request) read ./ReadCoalescing.md. For the opt-in TTL cache and how writes invalidate overlapping read ranges read ./ResponseCache.md. The BCD rewrite itself — what gets rewritten, what passes through, and how CDAB 32-bit values are handled — is in ../Features/BcdRewriting.md.
Operators looking for configuration shape, hot-reload semantics, and the status page should start at ../Operations/Configuration.md and ../Operations/StatusPage.md. When something is misbehaving in production, ../Operations/Troubleshooting.md and ../Reference/LogEvents.md are the two places to look first.
The simulator used by the end-to-end test suite — a pymodbus-based stand-in for a real DL205 — has its own document at ../Testing/Simulator.md. Test-only quirks of that simulator are called out there rather than in the production docs, because the real DL260 ECOM does not share them.
Related Documentation
- ./ConnectionModel.md: TxId multiplexing, correlation map, per-request watchdog.
- ./ReadCoalescing.md: how InFlightByKeyMap collapses duplicate concurrent reads.
- ./ResponseCache.md: ResponseCache and CacheInvalidator semantics.
- ../Features/BcdRewriting.md: the BcdPduPipeline rewrite rules.
- ../Features/HotReload.md: IOptionsMonitor propagation and supervisor reconciliation.
- ../Operations/Configuration.md: appsettings.json schema and tag list shape.
- ../Operations/StatusPage.md: the Kestrel admin endpoint and counter catalog.
- ../Reference/LogEvents.md: stable structured log event names.
- ../design.md: canonical design decisions and rationale.
- ../Testing/Simulator.md: pymodbus DL205 simulator used by the end-to-end suite.
- ../plan/README.md: phase plan with per-phase test inventory.