# Simulator Harness

The pymodbus DL205 simulator stands in for real DL205/DL260 hardware in the E2E test suite. This document describes the launcher, the xUnit fixture, the skip policy, the per-test timeout discipline, and the pymodbus 3.13.0 framer quirk that the test strategy works around.

## Why a Simulator

`mbproxy` targets a fleet of AutomationDirect DL205/DL260 controllers that test machines do not have. The pymodbus profile at [`../../tests/sim/dl205.json`](../../tests/sim/dl205.json) already models the device-side quirks (BCD nibbles at known holding-register addresses, CDAB-ordered 32-bit values, C-relay/Y-output coil mappings) as concrete register seeds. The harness wraps that profile in an xUnit `IAsyncLifetime` fixture so every E2E test class opens against a fresh known-good DL-series target without manual setup.

The device-side rationale for each seed (why HR 1072 is `0x1234`, why FC03 caps at 128, etc.) lives in [`../Reference/dl205.md`](../Reference/dl205.md). The harness exists to make that profile addressable from xUnit tests; it does not duplicate the device documentation.

## Harness Layout

Three files form the harness contract:

| Path | Role |
|------|------|
| `tests/sim/run-dl205-sim.ps1` | PowerShell launcher. Provisions a Python venv under `tests/sim/.venv` on first run (`python -m venv` + `pip install pymodbus`) and execs `pymodbus.simulator` against `dl205.json` on the requested port. Idempotent: re-runs reuse the venv. |
| `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs` | `IAsyncLifetime` fixture that picks a free port, spawns the launcher, polls for TCP readiness, and tears the process tree down on dispose. |
| `tests/Mbproxy.Tests/Sim/DL205SimulatorCollection.cs` | `[CollectionDefinition(nameof(DL205SimulatorCollection))]` that exposes the fixture as an xUnit `ICollectionFixture<DL205SimulatorFixture>`. |

The launcher is a PowerShell 7+ script; the fixture invokes it via `pwsh -NoProfile -File <script> -Port <picked>`. The script's exit codes (1 = venv failure, 2 = pymodbus launch failure, 3 = profile missing) surface in the fixture's skip reason, alongside the captured stdout/stderr tail, for diagnosis.

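The invocation and the exit-code contract above can be sketched as follows. This is a minimal illustration, not the fixture's actual code: the helper names `BuildLauncherStartInfo` and `DescribeLauncherExitCode` are hypothetical, and only the command line and the three exit codes come from this document.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: builds the start info for
//   pwsh -NoProfile -File <script> -Port <port>
static ProcessStartInfo BuildLauncherStartInfo(string scriptPath, int port)
{
    var psi = new ProcessStartInfo("pwsh")
    {
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false, // required for stream redirection
        CreateNoWindow = true,
    };
    // ArgumentList quotes each argument separately, so paths with spaces survive.
    psi.ArgumentList.Add("-NoProfile");
    psi.ArgumentList.Add("-File");
    psi.ArgumentList.Add(scriptPath);
    psi.ArgumentList.Add("-Port");
    psi.ArgumentList.Add(port.ToString());
    return psi;
}

// Hypothetical helper: maps the launcher's documented exit codes to a diagnosis.
static string DescribeLauncherExitCode(int exitCode) => exitCode switch
{
    1 => "venv failure",
    2 => "pymodbus launch failure",
    3 => "profile missing",
    _ => $"unexpected exit code {exitCode}",
};
```

Using `ArgumentList` rather than a single argument string avoids hand-written quoting when the repository path contains spaces.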
## Fixture Lifecycle

`DL205SimulatorFixture` is sealed and lives in `Mbproxy.Tests.Sim`. The lifecycle is bounded entirely by `InitializeAsync` and `DisposeAsync`.

### InitializeAsync

`InitializeAsync` runs five steps:

1. **Pick a free local port.** Binds a `TcpListener` on `IPAddress.Loopback:0`, captures the OS-assigned port into `Port`, and disposes the listener. The TOCTOU window between dispose and pymodbus binding is documented in the source and considered acceptable for tests; a port-steal would manifest as a connect failure inside the readiness poll, which then surfaces via `SkipReason`.
2. **Resolve the launcher script.** Walks upward from the test assembly directory (`tests/Mbproxy.Tests/bin/<config>/net10.0/`) looking for `tests/sim/run-dl205-sim.ps1`. If the script is not found, `SkipReason` is set with a "could not locate" message that points at the expected layout.
3. **Verify `pwsh` is on `PATH`.** Spawns `pwsh -NoProfile -Command exit 0` with a 3 s budget. If `pwsh` is missing or returns non-zero, `SkipReason` is set.
4. **Spawn the simulator subprocess.** `Process.Start` invokes the launcher with the picked port; stdout and stderr are drained asynchronously into a 50-line ring buffer via `BeginOutputReadLine` / `BeginErrorReadLine` to avoid blocking the child on a full pipe.
5. **Poll for TCP readiness.** Repeatedly calls `TcpClient.ConnectAsync(Host, Port)` at 100 ms intervals until the connect succeeds, the process exits early, or `ReadinessTimeout` elapses.

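Steps 1 and 2 can be sketched as below. This is an illustration under assumptions rather than the fixture's source: `PickFreeLoopbackPort` and `FindUpward` are hypothetical names, and the real fixture may structure the search differently.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

// Step 1 (sketch): let the OS assign a free loopback port, then release it.
// Another process can grab the port before the simulator binds it (TOCTOU).
static int PickFreeLoopbackPort()
{
    var listener = new TcpListener(IPAddress.Loopback, 0);
    listener.Start();
    try
    {
        return ((IPEndPoint)listener.LocalEndpoint).Port;
    }
    finally
    {
        listener.Stop();
    }
}

// Step 2 (sketch): walk upward from startDir until relativePath exists
// beneath one of the ancestors; null means the caller should set SkipReason.
static string? FindUpward(string startDir, string relativePath)
{
    for (var dir = new DirectoryInfo(startDir); dir is not null; dir = dir.Parent)
    {
        string candidate = Path.Combine(dir.FullName, relativePath);
        if (File.Exists(candidate))
            return candidate;
    }
    return null;
}
```

Calling `FindUpward(AppContext.BaseDirectory, Path.Combine("tests", "sim", "run-dl205-sim.ps1"))` would mirror the search described in step 2.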
`ReadinessTimeout` is 120 s. The spec calls for "up to 10 s" of warm-run server startup; the fixture allows the longer budget because a cold run that has to `pip install pymodbus` can take 30 to 90 s depending on network speed. Warm runs (the common case) succeed in under 2 s. Cold-run provisioning time adds to server startup time, and the two cannot be budgeted separately without a dedicated pre-provision step.

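A readiness poll with this shape might look like the sketch below. It assumes a hypothetical `WaitForTcpReadyAsync` helper and a caller-supplied `processHasExited` callback; the 100 ms interval comes from step 5, and the individual connect attempts are not themselves bounded here.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading.Tasks;

// Sketch: returns true once something accepts a TCP connection on host:port,
// false if the budget elapses or the child process is reported as exited.
static async Task<bool> WaitForTcpReadyAsync(
    string host, int port, TimeSpan timeout, Func<bool> processHasExited)
{
    var elapsed = Stopwatch.StartNew();
    while (elapsed.Elapsed < timeout && !processHasExited())
    {
        try
        {
            using var client = new TcpClient();
            await client.ConnectAsync(host, port);
            return true; // the server is listening
        }
        catch (SocketException)
        {
            await Task.Delay(100); // not listening yet; retry after 100 ms
        }
    }
    return false;
}
```

A call such as `await WaitForTcpReadyAsync("127.0.0.1", port, TimeSpan.FromSeconds(120), () => process.HasExited)` would correspond to the budget described above.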
If readiness never arrives, the fixture distinguishes two failure modes in `SkipReason`:

- **The process exited prematurely.** Likely cause: Python not found, pymodbus not installed, profile path wrong. The exit code and the log tail are included in the skip reason.
- **The process is still running but never accepted a connection.** Likely cause: port stolen, pymodbus stuck in its own startup, firewall blocking loopback. The log tail is included verbatim.

Either way, the process tree is killed before the skip reason is returned, so no orphan pymodbus survives a failed fixture initialisation.

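The two failure modes can be told apart with a small pure function. The sketch below is hypothetical (`BuildSkipReason` is not taken from the fixture); the two headline strings follow the skip-reason prefixes this document lists in its Skip Policy table.

```csharp
using System;

// Sketch: choose the skip-reason headline from the process state,
// then append the captured log tail for diagnosis.
static string BuildSkipReason(
    bool hasExited, int? exitCode, int port, int timeoutSeconds, string logTail)
{
    string headline = hasExited
        ? $"Simulator process exited prematurely (exit code {exitCode})"
        : $"Simulator did not accept a TCP connection on port {port} within {timeoutSeconds} s";
    return headline + Environment.NewLine + logTail;
}
```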
### DisposeAsync

`DisposeAsync` calls `_process.Kill(entireProcessTree: true)` and then `WaitForExitAsync` with a 5 s cap. .NET offers no portable way to send a graceful `SIGINT` on Windows without P/Invoke into the console-attach APIs, so pymodbus's `atexit` handlers may be cut short. The trade-off is documented inline in the fixture and is acceptable for test cleanup because the simulator is stateless across runs.

### Public Surface

The fixture exposes four members:

```csharp
public sealed class DL205SimulatorFixture : IAsyncLifetime
{
    public string Host { get; } = "127.0.0.1";
    public int Port { get; private set; }
    public string? SkipReason { get; private set; }
    public string LogTail => BuildLogTail();
}
```

`Host` is always loopback. `Port` is the OS-picked free port. `SkipReason` is non-null when the simulator could not start. `LogTail` returns the last 50 lines of captured stdout/stderr for diagnosis when a test fails.

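A bounded log tail of this kind can be kept in a queue that drops its oldest entry once it holds 50 lines. The sketch below is an assumption about shape only: `AppendBounded` and `JoinTail` are hypothetical names, and the fixture's own `BuildLogTail` may differ.

```csharp
using System;
using System.Collections.Generic;

// Sketch: append one captured line, keeping at most `capacity` lines.
// Output events arrive on thread-pool threads, hence the lock.
static void AppendBounded(Queue<string> tail, string? line, int capacity = 50)
{
    if (line is null) return; // a null line signals end of stream
    lock (tail)
    {
        tail.Enqueue(line);
        while (tail.Count > capacity)
            tail.Dequeue(); // drop the oldest line
    }
}

// Sketch: render the retained lines, oldest first.
static string JoinTail(Queue<string> tail)
{
    lock (tail)
    {
        return string.Join(Environment.NewLine, tail);
    }
}
```

Both the stdout and stderr handlers would call `AppendBounded` with the same queue, so the tail interleaves the two streams in arrival order.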
## Skip Policy

If Python is missing, `pwsh` is not on `PATH`, or pip provisioning fails, `InitializeAsync` populates `SkipReason` with a human-readable explanation and the fixture proceeds without a live simulator. Every E2E test starts with the same guard:

```csharp
if (_sim.SkipReason is not null)
    Assert.Skip(_sim.SkipReason);
```

The unit-test suite (any test without `[Trait("Category", "E2E")]`) runs without any Python at all. CI machines must have Python 3.10+ and PowerShell 7+; local developers running only unit tests need nothing extra. The E2E suite's no-skip policy explicitly verifies that, on a machine with Python and pymodbus installed, none of the smoke tests skip: a skip on a properly equipped CI machine is treated as an environment failure, not a test pass.

The skip reasons the fixture produces map cleanly onto the recovery action:

| Skip reason prefix | Cause | Recovery |
|--------------------|-------|----------|
| `Could not locate tests/sim/run-dl205-sim.ps1` | Test assembly is too deep, or the script was deleted | Restore the script; verify the upward search starts inside the repo |
| `pwsh (PowerShell 7+) is not available on PATH` | Windows PowerShell 5.1 is on PATH but not pwsh | Install PowerShell 7+ and ensure `pwsh` resolves |
| `Failed to spawn pwsh: <message>` | `Process.Start` itself failed | Inspect the inner message; usually a `PATH` or permissions issue |
| `Simulator process exited prematurely (exit code N)` | Launcher script returned non-zero (Python missing, pymodbus install failed, profile missing) | Read the log tail; exit codes 1/2/3 map to venv / launch / profile failures |
| `Simulator did not accept a TCP connection on port N within 120 s` | Process is alive but never bound the port | Read the log tail; usually a port-steal or a pymodbus internal hang |

## Adding an E2E Test

An E2E test class declares the collection, the category trait, takes the fixture in its constructor, and guards every test with the skip check. `SimulatorSmokeTests` in `tests/Mbproxy.Tests/Sim/` is the canonical minimal example:

```csharp
using System.Net.Sockets;
using NModbus;
using Xunit;

[Collection(nameof(DL205SimulatorCollection))]
[Trait("Category", "E2E")]
public sealed class SimulatorSmokeTests
{
    private readonly DL205SimulatorFixture _sim;

    public SimulatorSmokeTests(DL205SimulatorFixture sim) => _sim = sim;

    [Fact(Timeout = 5_000)]
    public async Task Simulator_FC03_ReturnsBCD_RawValueAtHR1072_0x1234()
    {
        // Skip (rather than fail) when the simulator could not start.
        if (_sim.SkipReason is not null)
            Assert.Skip(_sim.SkipReason);

        using var client = new TcpClient();
        await client.ConnectAsync(_sim.Host, _sim.Port,
            TestContext.Current.CancellationToken);

        var master = new ModbusFactory().CreateMaster(client);
        ushort[] regs = master.ReadHoldingRegisters(
            slaveAddress: 1, startAddress: 1072, numberOfPoints: 1);

        Assert.Equal(0x1234, regs[0]); // raw BCD nibbles, NOT binary 1234
    }
}
```

For a proxy-shaped test, configure an in-process host pointing at `_sim.Host:_sim.Port` as the backend PLC and drive `NModbus` against the proxy's listen port. `MultiplexerE2ETests` in `tests/Mbproxy.Tests/Proxy/Multiplexing/` is the working example with the in-process `Host.CreateApplicationBuilder()` setup.

## Per-Test Timeout Policy

`[Fact(Timeout = 5_000)]` is the default for every E2E test. Expand per-test only when the test genuinely needs longer: concurrent bursts above 100 ops, reload-propagation debounce, graceful-shutdown drain, Polly-paced backend reconnects. Add a one-line comment on the test explaining the reason whenever the timeout exceeds the 5 s default. `MultiplexerE2ETests.E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers` uses `[Fact(Timeout = 8_000)]` and documents the Polly backoff budget inline.

The reason a hard timeout matters: synchronous `NModbus` calls do not honor `TestContext.Current.CancellationToken`. Without `[Fact(Timeout=…)]`, a deadlock anywhere in the proxy hangs the test runner indefinitely. The hang-diagnosis pattern for when this nonetheless happens lives in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).

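Why a blocking call needs an outside deadline can be shown with a stand-in. The sketch below is not how xUnit implements `Timeout`, and `BlockingRead` merely imitates a synchronous call that never looks at a cancellation token; both helper names are hypothetical. `Task.WaitAsync(TimeSpan)` is available from .NET 6 onward.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a synchronous client call: it blocks and observes no token.
static ushort BlockingRead(TimeSpan duration)
{
    Thread.Sleep(duration);
    return 0x1234;
}

// Sketch: bound the wait from the outside. On timeout the caller is released,
// but the blocked worker thread is abandoned rather than stopped.
static async Task<ushort> ReadWithDeadlineAsync(TimeSpan callDuration, TimeSpan deadline)
{
    Task<ushort> call = Task.Run(() => BlockingRead(callDuration));
    return await call.WaitAsync(deadline); // throws TimeoutException past the deadline
}
```

The point of the sketch is that only the waiting side is bounded, which is why a process-level backstop is still worth having.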
The test-runner backstop is a process-level safety net:

```powershell
dotnet test --filter Category=E2E --blame-hang-timeout 2m
```

The `--blame-hang-timeout` option is mandatory for E2E runs. It catches the rare case where an individual test's `Timeout` does not fire, for example when an unmanaged thread blocks finalization.

## The Pymodbus 3.13.0 Framer Quirk

Pymodbus 3.13.0's `ServerRequestHandler` stores a single `last_pdu` field per TCP connection and schedules the deferred handler via `asyncio.call_soon`. If two MBAP frames arrive in the same recv buffer, which the multiplexer's shared backend connection produces under truly concurrent upstream reads, the later frame overwrites `last_pdu` before the first scheduled handler runs, and both responses then carry the later request's TxId. The real DL260 ECOM does not exhibit this bug; it echoes per-request TxIds correctly.

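The overwrite can be reproduced in a toy model. The sketch below is not pymodbus code: `SplitFrames` and `RespondWithSharedSlot` are hypothetical helpers that imitate one shared slot per connection plus handlers that run only after the whole buffer has been parsed.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Splits one recv buffer into MBAP frames. The 16-bit length field at
// offset 4 counts the unit id plus the PDU, so a frame is 6 + length bytes.
static List<byte[]> SplitFrames(byte[] buffer)
{
    var frames = new List<byte[]>();
    int offset = 0;
    while (buffer.Length - offset >= 7)
    {
        int length = BinaryPrimitives.ReadUInt16BigEndian(buffer.AsSpan(offset + 4, 2));
        int frameSize = 6 + length;
        if (buffer.Length - offset < frameSize) break; // partial frame, wait for more
        frames.Add(buffer[offset..(offset + frameSize)]);
        offset += frameSize;
    }
    return frames;
}

// Toy model of the quirk: returns the TxId each deferred handler would echo.
static List<ushort> RespondWithSharedSlot(byte[] recvBuffer)
{
    byte[]? lastPdu = null;                  // single slot per connection
    var deferred = new List<Func<ushort>>(); // stands in for call_soon scheduling
    foreach (byte[] frame in SplitFrames(recvBuffer))
    {
        lastPdu = frame;                     // a later frame overwrites the slot
        deferred.Add(() => BinaryPrimitives.ReadUInt16BigEndian(lastPdu!.AsSpan(0, 2)));
    }
    var echoed = new List<ushort>();
    foreach (Func<ushort> handler in deferred)
        echoed.Add(handler());               // every handler reads the same slot
    return echoed;
}
```

Feeding the model two FC03 requests with TxIds 1 and 2 in one buffer yields two responses that both echo TxId 2, which is the symptom described above.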
The quirk forces a three-part test strategy:

- **Multiplexer correctness under concurrent backend traffic is proven against a stub backend.** `PlcMultiplexerTests` drives the multiplexer with a stub that properly echoes per-request TxIds. That test class is the load-bearing coverage for TxId rewriting; the simulator does not contribute here.
- **Simulator-backed E2E tests pace requests** to keep pymodbus in known-good single-PDU mode. `MultiplexerE2ETests` enforces serialisation explicitly: each upstream client's request is issued only after the previous client's response has returned. The `<summary>` block at the top of `MultiplexerE2ETests.cs` documents the trade-off and points readers to `PlcMultiplexerTests` for the concurrency proof.
- **The per-request watchdog defends production.** Configurable via `Connection.BackendRequestTimeoutMs`, the multiplexer cancels any in-flight request whose response does not arrive within the budget and surfaces Modbus exception `0x0B` upstream. The `mbproxy.multiplex.request.timeout` log event (see [`../Reference/LogEvents.md`](../Reference/LogEvents.md)) is the operational signal. The same code path defends against any backend (pymodbus, a misbehaving ECOM, or a network middlebox) that mis-echoes or drops a TxId.

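The watchdog idea in the last bullet can be sketched as a bounded wait that falls back to an exception frame. This is an illustration, not the multiplexer's implementation: `BuildExceptionResponse` and `AwaitBackendOrTimeoutAsync` are hypothetical names, and the frame layout follows the standard Modbus TCP exception response (function code with the high bit set, then the exception code).

```csharp
using System;
using System.Buffers.Binary;
using System.Threading.Tasks;

// Sketch: MBAP header (TxId, protocol 0, length 3), unit id,
// function code | 0x80, exception code.
static byte[] BuildExceptionResponse(
    ushort txId, byte unitId, byte functionCode, byte exceptionCode)
{
    var frame = new byte[9];
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), txId);
    // bytes 2..3 stay zero: protocol id 0 means Modbus
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3); // unit + fc + code
    frame[6] = unitId;
    frame[7] = (byte)(functionCode | 0x80);
    frame[8] = exceptionCode;
    return frame;
}

// Sketch: wait for the backend up to the budget; on timeout answer upstream
// with exception 0x0B (gateway target device failed to respond).
static async Task<byte[]> AwaitBackendOrTimeoutAsync(
    Task<byte[]> backendResponse, TimeSpan budget,
    ushort txId, byte unitId, byte functionCode)
{
    try
    {
        return await backendResponse.WaitAsync(budget);
    }
    catch (TimeoutException)
    {
        return BuildExceptionResponse(txId, unitId, functionCode, 0x0B);
    }
}
```

In this sketch the budget argument plays the role of `Connection.BackendRequestTimeoutMs`.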
The connection-model rationale for why the multiplexer produces multi-frame recv buffers in the first place is in [`../Architecture/ConnectionModel.md`](../Architecture/ConnectionModel.md).

## Simulator Profile

`tests/sim/dl205.json` is the pymodbus server config. It seeds the registers the E2E tests assert against:

| Address | Width | Seeded value | Used to prove |
|---------|-------|--------------|---------------|
| HR 0 | uint16 | `0xCAFE` | Profile is loaded; register 0 is valid on DL205/DL260 |
| HR 200..209 | uint16 | scratch range, writable | FC06/FC16 round-trips for BCD-rewriter E2E tests |
| HR 1072 | uint16 | `0x1234` (raw BCD nibbles) | Single-register FC03 BCD decode through the proxy |
| HR 1080/1081 | uint16 pair | CDAB-ordered 32-bit BCD | 32-bit BCD decode across the word pair |

The full register map and the device-side rationale for each entry live in [`../Reference/dl205.md`](../Reference/dl205.md).

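What "raw BCD nibbles" and "CDAB-ordered" mean for a client can be sketched with three small conversions. The helper names are hypothetical, the code is not the proxy's rewriter, and it assumes CDAB means the first register of the pair carries the low four digits. The 32-bit values in the usage note are illustrative, not the profile's actual seeds.

```csharp
using System;

// Each nibble is one decimal digit: 0x1234 decodes to 1234.
static ushort BcdToBinary(ushort bcd)
{
    int value = 0;
    for (int shift = 12; shift >= 0; shift -= 4)
    {
        int digit = (bcd >> shift) & 0xF;
        if (digit > 9) throw new ArgumentException("not a BCD digit", nameof(bcd));
        value = value * 10 + digit;
    }
    return (ushort)value;
}

// Inverse direction: 1234 encodes to 0x1234. Four nibbles hold at most 9999.
static ushort BinaryToBcd(ushort value)
{
    if (value > 9999) throw new ArgumentOutOfRangeException(nameof(value));
    int bcd = 0;
    for (int shift = 0; shift < 16; shift += 4)
    {
        bcd |= (value % 10) << shift;
        value /= 10;
    }
    return (ushort)bcd;
}

// Assumed CDAB word order: first register = low four digits,
// second register = high four digits.
static uint Bcd32CdabToBinary(ushort firstRegister, ushort secondRegister)
    => (uint)BcdToBinary(secondRegister) * 10000u + BcdToBinary(firstRegister);
```

For example, `BcdToBinary(0x1234)` yields 1234, and `Bcd32CdabToBinary(0x5678, 0x1234)` yields 12345678.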
Two profile-level settings are load-bearing for the harness:

- **`"shared blocks": true`** matches the DL-series memory model, where holding registers and input registers share the same backing store. The proxy's tests assume this; switching it off would change which addresses appear via FC03 versus FC04.
- **`"type exception": false`** controls whether pymodbus raises an exception when an address is read via a function code that does not match its declared type. The default `false` is the lax behaviour the tests rely on. Flipping it to `true` is an alternate-profile scenario, not a default-profile change.

The `write` block in the JSON controls which ranges accept FC06/FC16. Writes outside the listed ranges return Modbus exception 02 (illegal data address), which is itself a useful condition the proxy must forward correctly. `MultiplexerE2ETests.E2E_RewriterStillWorks_UnderMultiplexedThreeClients` uses addresses 200, 202, and 204 from the writable scratch range for exactly this reason: read-only addresses would return exception 02 on the write step and break the round-trip assertion.

## Alternate Profiles

The `MODBUS_SIM_PROFILE` environment variable selects an alternate profile alongside `dl205.json`. This is the seam for scenario-specific simulators: for example, a profile with `"type exception": true` to verify the proxy does not depend on the default lax pymodbus behaviour, or a profile that seeds a specific partial-overlap test case at a known address. When a new scenario needs its own profile, drop the JSON alongside `dl205.json` and select it via the env var rather than swapping the default. The default profile is the contract for the smoke tests and `MultiplexerE2ETests` and should not be silently mutated.

## Running the Simulator Standalone

The launcher is usable outside xUnit for ad-hoc debugging:

```powershell
pwsh tests/sim/run-dl205-sim.ps1 -Port 5020
```

The script provisions the venv on first run and execs `pymodbus.simulator`. Output streams to the terminal; Ctrl-C exits cleanly because the pymodbus process is attached to the script's console group. Useful for poking at the profile with an external Modbus client (e.g. ModScan, mbpoll, NModbus from `dotnet fsi`) without running the test harness.

A typical debugging loop:

1. Launch the simulator standalone on a fixed port.
2. Point a manually built proxy host at it via `appsettings.json` with `Host=127.0.0.1, Port=5020`.
3. Drive the proxy from a Modbus client and inspect log events at `Verbose` to see the rewriter in action.

The standalone launcher uses the same script the fixture invokes, so behaviour is identical between the test harness and ad-hoc runs.

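When no Modbus client is at hand, a single FC03 request is short enough to build by hand. The sketch below uses a hypothetical helper name and only the standard Modbus TCP frame layout; sending the bytes over a `TcpClient` connected to the standalone simulator is left to the reader.

```csharp
using System;
using System.Buffers.Binary;

// Sketch: MBAP header (TxId, protocol 0, length 6), unit id,
// function code 0x03, start address, register count. All fields big-endian.
static byte[] BuildReadHoldingRegistersRequest(
    ushort txId, byte unitId, ushort startAddress, ushort count)
{
    var frame = new byte[12];
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), txId);
    // bytes 2..3 stay zero: protocol id 0 means Modbus
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 6); // unit + fc + 4 data bytes
    frame[6] = unitId;
    frame[7] = 0x03; // read holding registers
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(8, 2), startAddress);
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(10, 2), count);
    return frame;
}
```

A request for one register at HR 1072 from unit 1 with TxId 1 comes out as `00 01 00 00 00 06 01 03 04 30 00 01`.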
## End-to-End Test Shape

A full E2E test wires the simulator, an in-process proxy host, and an `NModbus` client into the same loopback stack:

```csharp
// 1. Simulator runs at _sim.Host:_sim.Port (fixture-managed).
// 2. Build an in-process proxy host pointing at the simulator as its PLC backend.
int proxyPort = PickFreePort();
var config = new Dictionary<string, string?>
{
    ["Mbproxy:AdminPort"] = "0",
    ["Mbproxy:Plcs:0:Name"] = "TestPLC",
    ["Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
    ["Mbproxy:Plcs:0:Host"] = _sim.Host,
    ["Mbproxy:Plcs:0:Port"] = _sim.Port.ToString(),
    ["Mbproxy:BcdTags:Global:0:Address"] = "1072",
    ["Mbproxy:BcdTags:Global:0:Width"] = "16",
};

var host = BuildBcdHost(config);
await host.StartAsync(startCts.Token);

// 3. Drive NModbus against the proxy port.
using var client = new TcpClient();
await client.ConnectAsync("127.0.0.1", proxyPort,
    TestContext.Current.CancellationToken);
var master = new ModbusFactory().CreateMaster(client);

// 4a. Read: simulator returns raw 0x1234, proxy rewrites to binary 1234.
ushort[] regs = master.ReadHoldingRegisters(1, 1072, 1);
regs[0].ShouldBe((ushort)1234);

// 4b. Write (against a writable scratch address, e.g. HR 200):
//     client writes binary 1234, proxy re-encodes to 0x1234 BCD nibbles,
//     simulator stores the nibbles.
//     Note: the config above tags only HR 1072; this step presumes a
//     BcdTags entry that also covers the written address.
master.WriteSingleRegister(1, 200, 1234);
```

The read direction proves the proxy rewrote the response; the write direction proves the proxy rewrote the request. Exercising both directions against the same simulator instance is the smallest viable end-to-end signal that the BCD rewriter is correctly wired.

## Related Documentation

- [Connection Model](../Architecture/ConnectionModel.md): why the multiplexer's shared backend connection produces the multi-frame condition that triggers pymodbus's framer quirk
- [Troubleshooting](../Operations/Troubleshooting.md): hang-diagnosis pattern for tests that exceed their `[Fact(Timeout)]`
- [Log Events](../Reference/LogEvents.md): `mbproxy.multiplex.request.timeout` is the production watchdog against TxId mis-echo
- [DL205/DL260 device quirks](../Reference/dl205.md): device-side rationale for every register the simulator profile seeds

Test discipline: E2E tests default to a 5 000 ms `[Fact(Timeout)]`, and `dotnet test` is run with `--blame-hang-timeout` to capture a dump on any hang.