Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.
Architecture/
- Overview.md (150 lines) — listener topology, request flow, per-PLC isolation
- ConnectionModel.md (247 lines) — TxId multiplexer, watchdog, disconnect cascade
- ReadCoalescing.md (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
- ResponseCache.md (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation
Features/
- BcdRewriting.md (252 lines) — codec, CDAB, FC scope, partial-overlap policy
- HotReload.md (189 lines) — IOptionsMonitor + per-change-kind reconcile rules
Operations/
- Configuration.md (422 lines) — every Mbproxy:* option + validation rules
- StatusPage.md (334 lines) — admin endpoint surface, every JSON field
- Troubleshooting.md (364 lines) — diagnosis playbook keyed to log events
Reference/
- LogEvents.md (499 lines) — 28 events across 7 categories, grep-verified
Testing/
- Simulator.md (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk
Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.
Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simulator Harness
The pymodbus DL205 simulator stands in for real DL205/DL260 hardware in the E2E test suite. This document describes the launcher, the xUnit fixture, the skip policy, the per-test timeout discipline, and the pymodbus 3.13.0 framer quirk that the test strategy works around.
Why a Simulator
mbproxy targets a fleet of AutomationDirect DL205/DL260 controllers that test machines do not have. The pymodbus profile at ../../DL260/dl205.json already models the device-side quirks (BCD nibbles at known holding-register addresses, CDAB-ordered 32-bit values, C-relay/Y-output coil mappings) as concrete register seeds. The harness wraps that profile in an xUnit IAsyncLifetime fixture so every E2E test class opens against a fresh known-good DL-series target without manual setup.
The device-side rationale for each seed (why HR 1072 is 0x1234, why FC03 caps at 128, etc.) lives in ../../DL260/dl205.md. The harness exists to make that profile addressable from xUnit tests; it does not duplicate the device documentation.
Harness Layout
Three files form the harness contract:
| Path | Role |
|---|---|
| tests/sim/run-dl205-sim.ps1 | PowerShell launcher. Provisions a Python venv under tests/sim/.venv on first run (python -m venv + pip install pymodbus) and execs pymodbus.simulator against dl205.json on the requested port. Idempotent: re-runs reuse the venv. |
| tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs | IAsyncLifetime fixture that picks a free port, spawns the launcher, polls for TCP readiness, and tears the process tree down on dispose. |
| tests/Mbproxy.Tests/Sim/DL205SimulatorCollection.cs | [CollectionDefinition(nameof(DL205SimulatorCollection))] that exposes the fixture as an xUnit ICollectionFixture<DL205SimulatorFixture>. |
The launcher is a PowerShell 7+ script; the fixture invokes it via pwsh -NoProfile -File <script> -Port <picked>. The script's exit codes (1 = venv failure, 2 = pymodbus launch failure, 3 = profile missing) propagate up through the fixture's stdout/stderr capture for diagnosis.
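As a rough illustration of the invocation and exit-code contract described above, the following Python sketch builds the same argument vector and maps the three documented exit codes to their meanings. The function names `launcher_command` and `describe_exit` are hypothetical and exist only for this example; the real fixture is C# and may structure this differently.

```python
from typing import List

# Exit codes as documented for run-dl205-sim.ps1.
LAUNCHER_EXIT_CODES = {
    1: "venv failure",
    2: "pymodbus launch failure",
    3: "profile missing",
}


def launcher_command(script: str, port: int) -> List[str]:
    """Build the argv the fixture is described as using (hypothetical helper)."""
    return ["pwsh", "-NoProfile", "-File", script, "-Port", str(port)]


def describe_exit(code: int) -> str:
    """Turn a launcher exit code into a diagnostic string (hypothetical helper)."""
    meaning = LAUNCHER_EXIT_CODES.get(code, "unknown failure")
    return f"launcher exited with code {code}: {meaning}"
```

Keeping the mapping in one table means a new launcher exit code only needs one new entry to show up in diagnostics.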
Fixture Lifecycle
DL205SimulatorFixture is sealed and lives in Mbproxy.Tests.Sim. The lifecycle is bounded entirely by InitializeAsync and DisposeAsync.
InitializeAsync
InitializeAsync runs five steps:
- Pick a free local port. Bind a TcpListener on IPAddress.Loopback:0, capture the OS-assigned port into Port, and dispose the listener. The TOCTOU window between dispose and pymodbus binding is documented in the source and considered acceptable for tests; a port-steal would manifest as a connect failure inside the readiness poll, which then surfaces via SkipReason.
- Resolve the launcher script. Walks upward from the test assembly directory (tests/Mbproxy.Tests/bin/<config>/net10.0/) looking for tests/sim/run-dl205-sim.ps1. If not found, SkipReason is set with a "could not locate" message that points at the expected layout.
- Verify pwsh is on PATH. Spawns pwsh -NoProfile -Command exit 0 with a 3 s budget. If pwsh is missing or returns non-zero, SkipReason is set.
- Spawn the simulator subprocess. Process.Start invokes the launcher with the picked port; stdout and stderr are drained asynchronously into a 50-line ring buffer via BeginOutputReadLine/BeginErrorReadLine to avoid blocking the child on a full pipe.
- Poll for TCP readiness. Repeatedly opens TcpClient.ConnectAsync(Host, Port) at 100 ms intervals until either the connect succeeds, the process exits early, or ReadinessTimeout elapses.
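The upward search in the second step can be sketched in a few lines. This is a Python approximation of the idea, not the fixture's C# code; `find_launcher` is a hypothetical name, and the relative path is the one given above.

```python
from pathlib import Path
from typing import Optional

LAUNCHER_RELATIVE = Path("tests") / "sim" / "run-dl205-sim.ps1"


def find_launcher(start: Path, relative: Path = LAUNCHER_RELATIVE) -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for the launcher.

    Returns the first match, or None so the caller can set a skip reason.
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / relative
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` instead of raising mirrors the skip-not-fail behaviour described for SkipReason: a missing script is reported, and the test run continues.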
ReadinessTimeout is 120 s. The spec calls for "up to 10 s" of warm-run server startup; the fixture allows the longer budget because a cold run that has to pip install pymodbus can take 30 to 90 s depending on network speed. Warm runs (the common case) succeed in under 2 s. Cold-run provisioning happens inside the same launcher invocation, so its time adds to the startup time and the two cannot be budgeted separately without a dedicated pre-provision step.
If readiness never arrives, the fixture distinguishes two failure modes in SkipReason:
- The process exited prematurely. Likely cause: Python not found, pymodbus not installed, profile path wrong. The exit code and the log tail are included in the skip reason.
- The process is still running but never accepted a connection. Likely cause: port stolen, pymodbus stuck in its own startup, firewall blocking loopback. The log tail is included verbatim.
Either way, the process tree is killed before the skip reason is returned, so no orphan pymodbus survives a failed fixture initialisation.
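The port pick, the readiness poll, and the two failure modes can be sketched together. This is a minimal Python approximation under stated assumptions: `pick_free_port` and `wait_for_ready` are hypothetical names, `poll_exit` stands in for a process handle (for example `subprocess.Popen.poll`), and the returned strings only imitate the skip-reason prefixes listed in this document.

```python
import socket
import time
from typing import Callable, Optional


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0, read back the OS-assigned port, then release it.

    The port can be stolen between release and reuse (the TOCTOU window).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def wait_for_ready(
    host: str,
    port: int,
    poll_exit: Callable[[], Optional[int]],
    timeout_s: float = 120.0,
    interval_s: float = 0.1,
) -> Optional[str]:
    """Return None once a TCP connect succeeds, else a skip-reason string."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        exit_code = poll_exit()
        if exit_code is not None:
            # Failure mode 1: the child process died before it bound the port.
            return f"Simulator process exited prematurely (exit code {exit_code})"
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return None
        except OSError:
            time.sleep(interval_s)
    # Failure mode 2: the process is alive but never accepted a connection.
    return (
        f"Simulator did not accept a TCP connection on port {port} "
        f"within {timeout_s:g} s"
    )
```

Checking the exit status before each connect attempt keeps the two failure modes distinct: a dead process is reported immediately instead of waiting out the full timeout.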
DisposeAsync
DisposeAsync calls _process.Kill(entireProcessTree: true) and then WaitForExitAsync with a 5 s cap. .NET has no portable way to send a graceful SIGINT on Windows without P/Invoke into the console-attach APIs, so pymodbus's atexit handlers may be cut short. The trade-off is documented inline in the fixture and is acceptable for test cleanup because the simulator is stateless across runs.
Public Surface
The fixture exposes four members:
public sealed class DL205SimulatorFixture : IAsyncLifetime
{
public string Host { get; } = "127.0.0.1";
public int Port { get; private set; }
public string? SkipReason { get; private set; }
public string LogTail => BuildLogTail();
}
Host is always loopback. Port is the OS-picked free port. SkipReason is non-null when the simulator could not start. LogTail returns the last 50 lines of captured stdout/stderr for diagnosis when a test fails.
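The 50-line tail behaves like a bounded ring buffer: new lines push the oldest ones out. A minimal Python sketch of that idea follows, assuming a lock because stdout and stderr are drained concurrently. The class name `LogTailBuffer` is hypothetical and this is not the fixture's implementation.

```python
import threading
from collections import deque


class LogTailBuffer:
    """Keep only the most recent `capacity` lines of captured output."""

    def __init__(self, capacity: int = 50) -> None:
        self._lines = deque(maxlen=capacity)  # deque drops the oldest entry
        self._lock = threading.Lock()  # stdout and stderr readers may interleave

    def append(self, line: str) -> None:
        with self._lock:
            self._lines.append(line)

    def render(self) -> str:
        with self._lock:
            return "\n".join(self._lines)
```

A bounded buffer keeps memory flat during a long cold run while still preserving the lines closest to the failure, which are usually the ones needed for diagnosis.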
Skip Policy
If Python is missing, pwsh is not on PATH, or pip provisioning fails, InitializeAsync populates SkipReason with a human-readable explanation and the fixture proceeds without a live simulator. Every E2E test starts with the same guard:
if (_sim.SkipReason is not null)
Assert.Skip(_sim.SkipReason);
The unit-test suite (any test without [Trait("Category", "E2E")]) runs without any Python at all. CI machines must have Python 3.10+ and PowerShell 7+; local developers running only unit tests need nothing extra. The phase-01 gate (see ../plan/README.md) explicitly verifies that on a machine with Python and pymodbus installed, none of the smoke tests skip — a skip on a properly equipped CI machine is treated as an environment failure, not a test pass.
The skip reasons the fixture produces map cleanly onto the recovery action:
| Skip reason prefix | Cause | Recovery |
|---|---|---|
| Could not locate tests/sim/run-dl205-sim.ps1 | Test assembly is too deep, or the script was deleted | Restore the script; verify the upward search starts inside the repo |
| pwsh (PowerShell 7+) is not available on PATH | Windows PowerShell 5.1 is on PATH but not pwsh | Install PowerShell 7+ and ensure pwsh resolves |
| Failed to spawn pwsh: <message> | Process.Start itself failed | Inspect the inner message; usually a PATH or permissions issue |
| Simulator process exited prematurely (exit code N) | Launcher script returned non-zero (Python missing, pymodbus install failed, profile missing) | Read the log tail; exit codes 1/2/3 map to venv / launch / profile failures |
| Simulator did not accept a TCP connection on port N within 120 s | Process is alive but never bound the port | Read the log tail; usually a port-steal or a pymodbus internal hang |
Adding an E2E Test
An E2E test class declares the collection, the category trait, takes the fixture in its constructor, and guards every test with the skip check. SimulatorSmokeTests in tests/Mbproxy.Tests/Sim/ is the canonical minimal example:
[Collection(nameof(DL205SimulatorCollection))]
[Trait("Category", "E2E")]
public sealed class SimulatorSmokeTests
{
private readonly DL205SimulatorFixture _sim;
public SimulatorSmokeTests(DL205SimulatorFixture sim) => _sim = sim;
[Fact(Timeout = 5_000)]
public async Task Simulator_FC03_ReturnsBCD_RawValueAtHR1072_0x1234()
{
if (_sim.SkipReason is not null)
Assert.Skip(_sim.SkipReason);
using var client = new TcpClient();
await client.ConnectAsync(_sim.Host, _sim.Port,
TestContext.Current.CancellationToken);
var master = new ModbusFactory().CreateMaster(client);
ushort[] regs = master.ReadHoldingRegisters(
slaveAddress: 1, startAddress: 1072, numberOfPoints: 1);
Assert.Equal(0x1234, regs[0]); // raw BCD nibbles, NOT binary 1234
}
}
For a proxy-shaped test, configure an in-process host pointing at _sim.Host:_sim.Port as the backend PLC and drive NModbus against the proxy's listen port. MultiplexerE2ETests in tests/Mbproxy.Tests/Proxy/Multiplexing/ is the working example with the in-process Host.CreateApplicationBuilder() setup.
Per-Test Timeout Policy
[Fact(Timeout = 5_000)] is the default for every E2E test. Expand per-test only when the test genuinely needs longer — concurrent bursts above 100 ops, reload-propagation debounce, graceful-shutdown drain, Polly-paced backend reconnects. Add a one-line comment on the test explaining the reason whenever the timeout exceeds the 5 s default. MultiplexerE2ETests.E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers uses [Fact(Timeout = 8_000)] and documents the Polly backoff budget inline.
The reason a hard timeout matters: synchronous NModbus calls do not honor TestContext.Current.CancellationToken. Without [Fact(Timeout=…)], a deadlock anywhere in the proxy hangs the test runner indefinitely. The hang-diagnosis pattern for when this nonetheless happens lives in ../Operations/Troubleshooting.md.
The test-runner backstop is a process-level safety net:
dotnet test --filter Category=E2E --blame-hang-timeout 2m
The --blame-hang-timeout is mandatory for E2E runs. It catches the rare case where an individual test's Timeout somehow does not fire — for example, an unmanaged thread blocking finalization.
The Pymodbus 3.13.0 Framer Quirk
Pymodbus 3.13.0's ServerRequestHandler stores a single last_pdu field per TCP connection and schedules the deferred handler via asyncio.call_soon. If two MBAP frames arrive in the same recv buffer — which the multiplexer's shared backend connection produces under truly concurrent upstream reads — the later frame overwrites last_pdu before the first scheduled handler runs, and both responses then carry the later request's TxId. The real DL260 ECOM does not exhibit this bug; it echoes per-request TxIds correctly.
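The overwrite can be reproduced with a toy model. The Python sketch below is not pymodbus code: `SingleSlotHandler`, `mbap_frame`, `split_frames`, and `replay` are hypothetical names that only imitate the described behaviour (one `last_pdu` slot, handler deferred with `call_soon`). It shows that two frames delivered in one buffer both answer with the later TxId, while paced frames answer correctly.

```python
import asyncio
import struct
from typing import List, Tuple


def mbap_frame(tx_id: int, pdu: bytes, unit_id: int = 1) -> bytes:
    """Prefix a PDU with a Modbus/TCP MBAP header (length = unit id + PDU)."""
    return struct.pack(">HHHB", tx_id, 0, len(pdu) + 1, unit_id) + pdu


def split_frames(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a receive buffer into (tx_id, pdu) pairs."""
    frames, offset = [], 0
    while offset + 7 <= len(buf):
        tx_id, _proto, length, _unit = struct.unpack_from(">HHHB", buf, offset)
        end = offset + 6 + length
        if end > len(buf):
            break  # incomplete trailing frame
        frames.append((tx_id, buf[offset + 7:end]))
        offset = end
    return frames


class SingleSlotHandler:
    """Toy model: one shared slot, handler deferred to the event loop."""

    def __init__(self) -> None:
        self.last_pdu = None
        self.response_tx_ids: List[int] = []

    def data_received(self, buf: bytes, loop: asyncio.AbstractEventLoop) -> None:
        for frame in split_frames(buf):
            self.last_pdu = frame  # a later frame overwrites an earlier one
            loop.call_soon(self._handle)

    def _handle(self) -> None:
        tx_id, _pdu = self.last_pdu  # reads whatever is in the slot now
        self.response_tx_ids.append(tx_id)


async def replay(chunks: List[bytes]) -> List[int]:
    """Deliver each chunk as one recv, letting deferred handlers run between."""
    loop = asyncio.get_running_loop()
    handler = SingleSlotHandler()
    for chunk in chunks:
        handler.data_received(chunk, loop)
        await asyncio.sleep(0)  # yield so the scheduled callbacks execute
    return handler.response_tx_ids


read_hr_1072 = struct.pack(">BHH", 0x03, 1072, 1)
first, second = mbap_frame(1, read_hr_1072), mbap_frame(2, read_hr_1072)
print(asyncio.run(replay([first + second])))  # [2, 2]: both carry the later TxId
print(asyncio.run(replay([first, second])))   # [1, 2]: paced requests are fine
```

The second call is the condition the paced E2E tests maintain: one frame per receive, so the single slot is never overwritten before its handler runs.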
This forces a three-part test strategy:
- Multiplexer correctness under concurrent backend traffic is proven against a stub backend. PlcMultiplexerTests drives the multiplexer with a stub that properly echoes per-request TxIds. That test class is the load-bearing coverage for TxId rewriting; the simulator does not contribute here.
- Simulator-backed E2E tests pace requests to keep pymodbus in known-good single-PDU mode. MultiplexerE2ETests enforces serialisation explicitly: each upstream client's request is issued only after the previous client's response has returned. The <summary> block at the top of MultiplexerE2ETests.cs documents the trade-off and points readers to PlcMultiplexerTests for the concurrency proof.
- The per-request watchdog defends production. Configurable via Connection.BackendRequestTimeoutMs, the multiplexer cancels any in-flight request whose response does not arrive within the budget and surfaces Modbus exception 0x0B upstream. The mbproxy.multiplex.request.timeout log event (see ../Reference/LogEvents.md) is the operational signal. The same code path defends against any backend (pymodbus, a misbehaving ECOM, or a network middlebox) that mis-echoes or drops a TxId.
The connection-model rationale for why the multiplexer produces multi-frame recv buffers in the first place is in ../Architecture/ConnectionModel.md.
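The watchdog idea from the strategy above can be sketched as a timeout wrapper that converts a missing response into a Modbus exception PDU. This is a Python illustration only; `with_watchdog` is a hypothetical name and the multiplexer's real implementation is C#. The exception PDU layout (function code with the high bit set, followed by the exception code) is standard Modbus, and 0x0B is "gateway target device failed to respond".

```python
import asyncio
from typing import Awaitable

GATEWAY_TARGET_FAILED_TO_RESPOND = 0x0B


async def with_watchdog(
    pending: Awaitable[bytes], function_code: int, timeout_s: float
) -> bytes:
    """Return the backend's response PDU, or exception 0x0B on timeout."""
    try:
        return await asyncio.wait_for(pending, timeout_s)
    except asyncio.TimeoutError:
        # Exception response: request function code with bit 7 set, then the code.
        return bytes([function_code | 0x80, GATEWAY_TARGET_FAILED_TO_RESPOND])
```

Because the caller always receives a PDU, an upstream client sees a well-formed Modbus error and is not left waiting on a response that will never arrive.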
Simulator Profile
DL260/dl205.json is the pymodbus server config. It seeds the registers the E2E tests assert against:
| Address | Width | Seeded value | Used to prove |
|---|---|---|---|
| HR 0 | uint16 | 0xCAFE | Profile is loaded; register 0 is valid on DL205/DL260 |
| HR 200..209 | uint16 | scratch range, writable | FC06/FC16 round-trips for BCD-rewriter E2E tests |
| HR 1072 | uint16 | 0x1234 (raw BCD nibbles) | Single-register FC03 BCD decode through the proxy |
| HR 1080/1081 | uint16 pair | CDAB-ordered 32-bit BCD | 32-bit BCD decode across the word pair |
The full register map and the device-side rationale for each entry live in ../../DL260/dl205.md.
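To make the seeded values concrete, here is a small Python sketch of BCD decoding for a single register and for a CDAB-ordered pair. It assumes CDAB means the low-order word sits at the lower address; the function names are hypothetical and the 32-bit input values are illustrative, not the seeds in dl205.json.

```python
from typing import Sequence


def decode_bcd(raw: int, digits: int = 4) -> int:
    """Decode packed BCD: each 4-bit nibble holds one decimal digit."""
    value = 0
    for shift in range((digits - 1) * 4, -1, -4):
        nibble = (raw >> shift) & 0xF
        if nibble > 9:
            raise ValueError(f"0x{raw:X} is not valid BCD (nibble 0x{nibble:X})")
        value = value * 10 + nibble
    return value


def decode_bcd32_cdab(registers: Sequence[int]) -> int:
    """Decode a 32-bit BCD value from two registers, low word first (assumed)."""
    low_word, high_word = registers
    return decode_bcd((high_word << 16) | low_word, digits=8)


print(decode_bcd(0x1234))                   # 1234
print(decode_bcd32_cdab([0x5678, 0x1234]))  # 12345678
```

The validity check matters for the HR 0 seed: 0xCAFE contains nibbles above 9, so it cannot be mistaken for a BCD value.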
Two profile-level settings are load-bearing for the harness:
- "shared blocks": true matches the DL series memory model where holding registers and input registers share the same backing store. The proxy's tests assume this; switching it off would change which addresses appear via FC03 versus FC04.
- "type exception": false controls whether pymodbus raises an exception when an address is read via a function code that does not match its declared type. The default false is the lax behaviour the tests rely on. Flipping it to true is an alternate-profile scenario, not a default-profile change.
The write block in the JSON controls which ranges accept FC06/FC16. Writes outside the listed ranges return Modbus exception 02 (illegal data address), which is itself a useful condition the proxy must forward correctly. MultiplexerE2ETests.E2E_RewriterStillWorks_UnderMultiplexedThreeClients uses addresses 200, 202, and 204 from the writable scratch range for exactly this reason — read-only addresses would return exception 02 on the write step and break the round-trip assertion.
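The write-range rule can be modelled in a few lines. The Python sketch below is a toy stand-in for the simulator, not pymodbus code: `handle_fc06` is a hypothetical name, the writable range 200..209 comes from the table above, and treating every other address as read-only is an assumption made for the example.

```python
import struct
from typing import Dict

WRITABLE_RANGES = [(200, 209)]  # scratch range from the profile table
ILLEGAL_DATA_ADDRESS = 0x02


def handle_fc06(pdu: bytes, registers: Dict[int, int]) -> bytes:
    """Apply a Write Single Register request or return exception 02."""
    function_code, address, value = struct.unpack(">BHH", pdu)
    if not any(low <= address <= high for low, high in WRITABLE_RANGES):
        return bytes([function_code | 0x80, ILLEGAL_DATA_ADDRESS])
    registers[address] = value
    return pdu  # a successful FC06 response echoes the request


store: Dict[int, int] = {}
print(handle_fc06(struct.pack(">BHH", 0x06, 200, 1234), store).hex())  # 060000c804d2
print(handle_fc06(struct.pack(">BHH", 0x06, 1072, 1), store).hex())    # 8602
```

The second call shows why a round-trip test has to write inside the scratch range: the rejected write never reaches the register store, so a later read would not see the value.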
Alternate Profiles
The MODBUS_SIM_PROFILE environment variable selects an alternate profile alongside dl205.json. This is the seam for scenario-specific simulators — for example, a profile with "type exception": true to verify the proxy does not depend on the default lax pymodbus behaviour, or a profile that seeds a specific partial-overlap test case at a known address. The existing pattern is DL260/DL205BcdQuirkTests.cs, which already drives the simulator with profile-driven assertions. When a new scenario needs its own profile, drop the JSON alongside dl205.json and select it via the env var rather than swapping the default — the default profile is the contract for the smoke tests and MultiplexerE2ETests and should not be silently mutated.
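One way such a selection could work is sketched below in Python. It assumes the variable holds a file name that is resolved in the same directory as dl205.json; the document does not specify whether the launcher expects a name or a full path, so treat `resolve_profile` and that interpretation as hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_PROFILE = "dl205.json"


def resolve_profile(profile_dir: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Pick the profile named by MODBUS_SIM_PROFILE, else the default.

    Assumption: the variable is a bare file name next to dl205.json.
    """
    name = env.get("MODBUS_SIM_PROFILE") or DEFAULT_PROFILE
    return profile_dir / name
```

Falling back when the variable is unset or empty keeps the default profile as the contract for tests that never set it.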
Running the Simulator Standalone
The launcher is usable outside xUnit for ad-hoc debugging:
pwsh tests/sim/run-dl205-sim.ps1 -Port 5020
The script provisions the venv on first run and execs pymodbus.simulator. Output streams to the terminal; Ctrl-C exits cleanly because the pymodbus process is attached to the script's console group. Useful for poking at the profile with an external Modbus client (e.g. ModScan, mbpoll, NModbus from dotnet fsi) without running the test harness.
A typical debugging loop:
- Launch the simulator standalone on a fixed port.
- Point a manually built proxy host at it via appsettings.json with Host=127.0.0.1, Port=5020.
- Drive the proxy from a Modbus client and inspect log events at Verbose to see the rewriter in action.
The standalone launcher uses the same script the fixture invokes, so behaviour is identical between the test harness and ad-hoc runs.
End-to-End Test Shape
A full E2E test wires the simulator, an in-process proxy host, and an NModbus client into the same loopback stack:
// 1. Simulator runs at _sim.Host:_sim.Port (fixture-managed).
// 2. Build an in-process proxy host pointing at the simulator as its PLC backend.
int proxyPort = PickFreePort();
var config = new Dictionary<string, string?>
{
["Mbproxy:AdminPort"] = "0",
["Mbproxy:Plcs:0:Name"] = "TestPLC",
["Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
["Mbproxy:Plcs:0:Host"] = _sim.Host,
["Mbproxy:Plcs:0:Port"] = _sim.Port.ToString(),
["Mbproxy:BcdTags:Global:0:Address"] = "1072",
["Mbproxy:BcdTags:Global:0:Width"] = "16",
};
var host = BuildBcdHost(config);
await host.StartAsync(startCts.Token);
// 3. Drive NModbus against the proxy port.
using var client = new TcpClient();
await client.ConnectAsync("127.0.0.1", proxyPort,
TestContext.Current.CancellationToken);
var master = new ModbusFactory().CreateMaster(client);
// 4a. Read: simulator returns raw 0x1234, proxy rewrites to binary 1234.
ushort[] regs = master.ReadHoldingRegisters(1, 1072, 1);
regs[0].ShouldBe((ushort)1234);
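// 4b. Write (against a writable scratch address, e.g. HR 200):
// client writes binary 1234, proxy re-encodes to 0x1234 BCD nibbles,
// simulator stores the nibbles.
// Note: the rewrite applies only if HR 200 is also listed under
// Mbproxy:BcdTags; the config above lists only 1072.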
master.WriteSingleRegister(1, 200, 1234);
The read direction proves the proxy rewrote the response; the write direction proves the proxy rewrote the request. Exercising both directions against the same simulator instance is the smallest viable end-to-end signal that the BCD rewriter is correctly wired.
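The write-direction rewrite can be illustrated with a toy FC06 transformer. This Python sketch is not the proxy's implementation: `encode_bcd16` and `rewrite_fc06_request` are hypothetical names, and the set of BCD addresses is passed in explicitly to stand in for the configured tag list.

```python
import struct
from typing import Set


def encode_bcd16(value: int) -> int:
    """Pack a decimal value 0..9999 into four BCD nibbles."""
    if not 0 <= value <= 9999:
        raise ValueError(f"{value} does not fit in 16-bit BCD")
    raw = 0
    for shift in range(0, 16, 4):
        value, digit = divmod(value, 10)
        raw |= digit << shift
    return raw


def rewrite_fc06_request(pdu: bytes, bcd_addresses: Set[int]) -> bytes:
    """Re-encode the value of a Write Single Register PDU for BCD addresses."""
    function_code, address, value = struct.unpack(">BHH", pdu)
    if function_code != 0x06 or address not in bcd_addresses:
        return pdu  # not a BCD-tagged write: forward unchanged
    return struct.pack(">BHH", function_code, address, encode_bcd16(value))


request = struct.pack(">BHH", 0x06, 200, 1234)
print(rewrite_fc06_request(request, {200}).hex())   # 0600c81234
print(rewrite_fc06_request(request, {1072}).hex())  # 0600c804d2 (unchanged)
```

The second call shows the pass-through case: a write to an address outside the tag set reaches the backend as plain binary, which is why the tag configuration and the test address have to agree.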
Related Documentation
- Connection Model: why the multiplexer's shared backend connection produces the multi-frame condition that triggers pymodbus's framer quirk
- Troubleshooting: hang-diagnosis pattern for tests that exceed their [Fact(Timeout)]
- Log Events: mbproxy.multiplex.request.timeout is the production watchdog against TxId mis-echo
- DL205/DL260 device quirks: device-side rationale for every register the simulator profile seeds
- Phase plan README: Test discipline section that codifies the 5 000 ms default and the --blame-hang-timeout rule