wwtools/mbproxy/docs/plan/01-simulator-harness.md
Joseph Doherty 56eee3c563 mbproxy: initial commit through Phase 9 (TxId multiplexing)
Adds the mbproxy service end-to-end. Phases 00-08 implement the
production-ready single-listener / 1:1-backend transparent Modbus TCP
proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260
fleet. Phase 9 replaces the connection layer with a single backend
socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's
4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing
  for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream
  on BackendRequestTimeoutMs, defending against lost responses,
  dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-
  request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps /
  disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly
concurrent backend traffic is proved against a stub backend in
PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus
3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:49:35 -04:00

Phase 01 — Simulator harness

Wrap the existing pymodbus profile at ../../DL260/dl205.json as a managed lifecycle for xUnit tests. After this phase, any test class that declares [Collection(nameof(DL205SimulatorCollection))] gets a running pymodbus server on a known port, with skip-safe behaviour when Python is unavailable.

Depends on: Phase 00 (test project exists). Parallel-safe with: Phase 02, Phase 03. (Touches only tests/sim/ and tests/Mbproxy.Tests/Sim/. Disjoint from codec and proxy work.)

Goal

Eliminate "did the simulator start?" as a source of flaky tests. Encode the launch / readiness-probe / shutdown / cleanup contract once, in a fixture, so phases 03 / 04 / 05 / 06 / 07 don't each reinvent it. Tests must be able to declare a dependency on the simulator and get a hot port back, OR get a clean skip if the environment can't provide one.

Outputs

tests/sim/run-dl205-sim.ps1                    # idempotent launcher; venv-provisioning
tests/sim/README.md                            # how to run the simulator standalone
tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs
tests/Mbproxy.Tests/Sim/DL205SimulatorCollection.cs
tests/Mbproxy.Tests/Sim/SimulatorSmokeTests.cs # connects, sends FC03, verifies a seeded BCD register

Modifications:

  • .gitignore already has tests/sim/.venv/ from phase 00 — verify it's present.
  • tests/Mbproxy.Tests/Mbproxy.Tests.csproj — add NModbus PackageReference (chosen for its small footprint and net10.0 compatibility; record the choice as a top-of-csproj comment). This is the Modbus TCP client used by tests against the simulator from this phase forward.

No other files.

Tasks

  1. tests/sim/run-dl205-sim.ps1 — pure PowerShell. Parameters: -Profile <path> (default ../DL260/dl205.json relative to script), -Port <int> (default 5020). Behaviour:
    • If tests/sim/.venv doesn't exist: python -m venv tests/sim/.venv, then tests/sim/.venv/Scripts/pip.exe install "pymodbus[server]" pinned to a known version (record version in the script + README).
    • Activate the venv (& tests/sim/.venv/Scripts/activate.ps1).
    • Exec pymodbus.server run --modbus-config-path <Profile> --modbus-server tcp --port <Port>. Output streams to stdout/stderr; on script termination, the child server dies with it.
    • Exit codes: 0 on clean exit, 1 on venv provisioning failure, 2 on pymodbus launch failure, 3 if the profile file is missing.
  2. DL205SimulatorFixture : IAsyncLifetime
    • InitializeAsync: pick a free local port (bind a TcpListener to IPAddress.Any on port 0, capture the assigned port, then stop the listener). Spawn pwsh -NoProfile -File <run-dl205-sim.ps1> -Port <picked> via System.Diagnostics.Process with RedirectStandardOutput/Error, and drain both redirected streams asynchronously so the child cannot block on a full pipe. Poll new TcpClient().ConnectAsync("127.0.0.1", port) at 100 ms intervals for up to 10 s. If the simulator never accepts a connection, capture the stderr tail, set SkipReason, then kill and dispose the process.
    • DisposeAsync: stop the simulator with Process.Kill(entireProcessTree: true), then wait up to 5 s for exit. A graceful Ctrl-C would be preferable, since pymodbus handles SIGTERM gracefully, but Windows lacks proper signals, so killing the whole tree (pwsh plus the Python server) is the pragmatic choice; document the tradeoff in a comment.
    • Public surface: string Host { get; } (always 127.0.0.1), int Port { get; }, string? SkipReason { get; }, string LogTail { get; } (last ~50 lines of stderr, for diagnosis).
  3. DL205SimulatorCollection
    [CollectionDefinition(nameof(DL205SimulatorCollection))]
    public sealed class DL205SimulatorCollection : ICollectionFixture<DL205SimulatorFixture> { }
    
    Tests that need the fixture declare [Collection(nameof(DL205SimulatorCollection))].
  4. SimulatorSmokeTests, annotated [Collection(nameof(DL205SimulatorCollection))] and [Trait("Category", "E2E")]. Three tests:
    • Simulator_AcceptsTcpConnection
    • Simulator_FC03_ReturnsSeededValue_AtHR0_0xCAFE — reads register 0, expects 0xCAFE (the seeded marker from dl205.json). Uses NModbus directly. This proves the dl205.json profile is in fact loaded.
    • Simulator_FC03_ReturnsBCD_RawValueAtHR1072_0x1234 — reads register 1072, expects raw 0x1234 (= 4660). This is the BCD register the proxy will rewrite later; phase 04's e2e test will read the SAME register through the proxy and assert 1234 instead.
  5. tests/sim/README.md — a few lines: "Run pwsh ./run-dl205-sim.ps1 -Port 5020 to launch the simulator standalone. Used by xUnit tests via DL205SimulatorFixture. Requires Python 3.10+; the script provisions a venv on first run."
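
The port-picking and readiness-probe steps in task 2 are language-neutral. The sketch below shows the same pattern in Python for brevity; the fixture itself is C#, and the function names here are hypothetical. It assumes the defaults stated above (127.0.0.1, 100 ms interval, 10 s budget).

```python
import socket
import time


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 so the OS assigns a free port, then release it.

    Racy by design: another process may claim the port before the
    simulator binds it (the TOCTOU window accepted for tests).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def wait_until_listening(host: str, port: int,
                         timeout_s: float = 10.0,
                         interval_s: float = 0.1) -> bool:
    """Poll with TCP connects until one succeeds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect is the only readiness signal used here.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

A False return corresponds to the fixture's skip path: it would record SkipReason rather than raise, so dependent tests skip instead of failing.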

Public surface declared in this phase

namespace Mbproxy.Tests.Sim;

public sealed class DL205SimulatorFixture : IAsyncLifetime {
    public string Host { get; }
    public int Port { get; }
    public string? SkipReason { get; }
    public string LogTail { get; }
    public Task InitializeAsync();
    public Task DisposeAsync();
}

[CollectionDefinition(nameof(DL205SimulatorCollection))]
public sealed class DL205SimulatorCollection : ICollectionFixture<DL205SimulatorFixture> { }

No production code is added in this phase.

Tests required

Unit (Category = Unit)

None in this phase. The fixture itself is a test-infrastructure component; its correctness is verified by the e2e smoke tests below.

E2E (Category = E2E)

  1. Simulator_AcceptsTcpConnection — open a TCP socket to fixture.Host:fixture.Port within the fixture lifetime.
  2. Simulator_FC03_ReturnsSeededValue_AtHR0_0xCAFE — NModbus FC03, asserts 0xCAFE.
  3. Simulator_FC03_ReturnsBCD_RawValueAtHR1072_0x1234 — NModbus FC03, asserts raw 0x1234 (4660).
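
To make the wire-level expectation behind tests 2 and 3 concrete, here is a minimal Python sketch of an FC03 exchange and of the raw-versus-BCD reading of register 1072. The real tests use NModbus from C#. The helper names, the transaction id, and unit id 1 are illustrative assumptions; the frame layout follows the standard Modbus TCP MBAP header.

```python
import struct


def build_fc03_request(tx_id: int, unit_id: int, address: int, count: int) -> bytes:
    """MBAP header (tx id, protocol 0, length, unit id) followed by the FC03 PDU."""
    pdu = struct.pack(">BHH", 0x03, address, count)
    # The MBAP length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", tx_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_fc03_response(frame: bytes) -> list[int]:
    """Return the register values carried by a normal FC03 response."""
    _tx, _proto, _length, _unit, fc, byte_count = struct.unpack(">HHHBBB", frame[:9])
    if fc != 0x03:
        raise ValueError(f"unexpected function code 0x{fc:02X}")
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))


def bcd_to_int(raw: int) -> int:
    """Read each nibble of a 16-bit word as one decimal digit."""
    digits = [(raw >> shift) & 0xF for shift in (12, 8, 4, 0)]
    if any(d > 9 for d in digits):
        raise ValueError(f"0x{raw:04X} is not valid BCD")
    return digits[0] * 1000 + digits[1] * 100 + digits[2] * 10 + digits[3]
```

Read directly from the simulator, register 1072 comes back as the raw word 0x1234, which is 4660 as a plain integer. Interpreted as BCD, the same word is 1234, the value phase 04 expects to see through the proxy.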

When SkipReason is set, all three skip with Assert.Skip(fixture.SkipReason). The phase gate explicitly verifies that on a machine WITH Python+pymodbus, none of them skip — skips are an environment failure, not a test pass.

Phase gate

  • pwsh tests/sim/run-dl205-sim.ps1 -Port 5020 standalone — script provisions a venv on first run, server logs "Modbus TCP server listening" within 10 s, Ctrl-C exits cleanly.
  • On second run: venv exists, script skips provisioning, server starts in < 2 s.
  • On a machine WITHOUT Python: SkipReason is non-null and tests skip rather than fail.
  • On a machine WITH Python: SkipReason is null, all three e2e smoke tests pass.
  • dotnet test --filter Category=E2E is green on the dev machine.
  • dotnet test --filter Category!=E2E still green (no regression to phase 00's tests).
  • Build zero-warnings.
  • tests/sim/README.md documents the manual launch path.

Out of scope

  • Multiple simultaneous simulators (one fixture instance is enough for all e2e tests via ICollectionFixture).
  • Alternate profiles selected via MODBUS_SIM_PROFILE env var — defer until phase 04 actually needs a partial-overlap scenario; add the env-var support then.
  • A C# pymodbus replacement / in-process Modbus mock. The pymodbus profile is the source of truth for DL-series quirks and we're not duplicating it.
  • pip-mirror or offline-install support. CI is expected to have network or a pre-warmed venv; if a customer site needs offline install, that's a deployment concern (phase 08).

Notes for the subagent

  • Capture the chosen pymodbus version pin in both run-dl205-sim.ps1 and tests/sim/README.md so the version isn't lost across re-provisioning.
  • The free-port-picker pattern (bind on :0, capture port, dispose, then hand the port to the child process) has an inherent TOCTOU race — another process could grab the port between dispose and pymodbus binding. In practice this is rare; acceptable for tests. Note the trade-off in a comment.
  • Pymodbus log output is verbose. Pipe it through a line buffer; only the last ~50 lines need to be available via LogTail for diagnosis.
  • Do not commit the .venv/ directory.
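
The line-buffer note above amounts to a bounded ring buffer. A minimal Python sketch of the idea follows; the fixture would do the equivalent in C#, and the class name and the 50-line default mirror LogTail only by assumption.

```python
from collections import deque
from typing import Iterable


class LogTail:
    """Keep only the most recent lines of a noisy log stream."""

    def __init__(self, max_lines: int = 50) -> None:
        # A deque with maxlen silently drops the oldest entry on overflow.
        self._lines: deque[str] = deque(maxlen=max_lines)

    def feed(self, lines: Iterable[str]) -> None:
        for line in lines:
            self._lines.append(line.rstrip("\r\n"))

    def text(self) -> str:
        return "\n".join(self._lines)
```

Because the buffer is bounded, memory stays constant however long the simulator runs, and the tail is always available for the skip message.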