56eee3c563

Adds the mbproxy service end-to-end. Phases 00-08 implement the production-ready single-listener / 1:1-backend transparent Modbus TCP proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260 fleet. Phase 9 replaces the connection layer with a single backend socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's 4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream on BackendRequestTimeoutMs, defending against lost responses, dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps / disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly concurrent backend traffic is proved against a stub backend in PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus 3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

342 lines
31 KiB
Markdown
# Phase 09 — MBAP TxId multiplexing (single backend connection per PLC)

Replace the 1:1 upstream-client ↔ backend-socket model with a **single backend connection per PLC**, multiplexed across all upstream clients via MBAP transaction-ID rewriting and a correlation map. After this phase the H2-ECOM100's 4-simultaneous-TCP-client cap is no longer an operational ceiling — the proxy holds exactly one slot per PLC regardless of how many upstream clients are connected.

**Status:** shipped 2026-05-14. Phases 00-08 shipped the production-ready 1:1 model; this phase swapped connection management without changing the transparent-rewrite contract.

## Implementation clarifications discovered during 2026-05-14 ship

These notes capture decisions and surprises that surfaced during the actual implementation. They supplement (not replace) the Tasks section below.

1. **A per-request timeout watchdog is part of Phase 9, not deferred.** The 1:1 model collapsed missing-response handling onto the dedicated backend socket dying. The multiplexed model needs an explicit timer because a single lost or mis-routed response would otherwise leak a correlation entry forever and hang the upstream pipe indefinitely. The watchdog ticks every quarter of `BackendRequestTimeoutMs` (minimum 100 ms), scans the correlation map, and times out stale requests with **Modbus exception 0x0B (Gateway Target Device Failed To Respond)**, delivered to the upstream party with the original TxId restored. Log event: `mbproxy.multiplex.request.timeout` (Warning).

2. **`PlcListener` constructs a multiplexer unconditionally.** The Phase-9 draft had `PlcListener` construct the multiplexer only when a `PerPlcContext` was supplied; the no-context fallback dropped accepted upstream sockets. Tests (and any pre-Phase-6 startup path that lacked a context) hit a regression. The fix: if the caller doesn't supply a `PerPlcContext`, build a minimal default one from the `PlcOptions`, and require `_multiplexer` to be non-null when `RunAsync` runs.

3. **Backend connect is now lazy, so `BackendConnectFailure_ClosesUpstreamCleanly` had to change.** The 1:1 model attempted a backend connect at upstream-accept time, so simply opening a TCP connection to a proxy with a bad backend triggered the close. The multiplexed model connects to the backend on the *first upstream frame*, so the test now sends a Modbus request to make the proxy attempt the (failing) backend connect that closes the upstream. Updated in place.

4. **pymodbus 3.13.0's simulator is broken under multiplexed concurrent requests.** Its `ServerRequestHandler` keeps a single `last_pdu` per connection and schedules `handle_later` via `asyncio.call_soon`; when two MBAP frames arrive in one recv buffer, the second overwrites `last_pdu` before the first handler runs, and both responses carry the later TxId. The real DL260 ECOM correctly echoes per-request TxIds. Consequences for tests:
   - **Mux correctness under truly concurrent backend traffic is proven against the stub backend in `PlcMultiplexerTests`**, which models the DL260's correct TxId-echo behaviour.
   - **`MultiplexerE2ETests` paces requests** so pymodbus only ever sees one MBAP frame at a time on the shared backend connection. The headline test (`E2E_FiveSimultaneousClients_AllReadHR1072_AllGetDecoded_1234`) verifies the connection-ceiling lift (5 simultaneous upstream connections, where Phase 08's 1:1 model would have refused the 5th), *not* multiplexing behaviour under concurrency.
   - **The watchdog is the production defence** if any real backend (or future simulator) ever mis-echoes a TxId: stale entries time out cleanly with exception 0x0B rather than hanging upstream clients.

5. **E2E timeouts.** Per the Test discipline in `docs/plan/README.md`, all E2E tests default to 5 s. Hot-reload tests that genuinely need 5 s + 3 s propagation windows carry a 10 s timeout with a one-line comment; `E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers` carries 8 s for its sequential connects and Polly-paced reconnect path.

6. **`AsyncHostDispose` deadlock note.** Test fixtures that hold `IHost` via `await using` were originally written with a 5 s shutdown timeout; under Phase 9's drained-channel cleanup that occasionally exceeded the test's own `Timeout = 5000`. Reduced to 2-3 s where it doesn't materially affect the test's drain semantics.

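A minimal sketch of the watchdog from clarification 1, written as standalone top-level C# so it runs as one file. The quarter-interval tick with a 100 ms floor and the 0x0B reply with the original TxId restored come from the note above; everything else is an assumption for illustration: the value tuple stands in for `InFlightRequest`, a plain `ConcurrentDictionary` stands in for `CorrelationMap`, and the helper names `WatchdogTick`, `BuildExceptionFrame`, and `Sweep` are hypothetical. The sweep claims each stale entry with `TryRemove`, so a late backend response and the timeout cannot both answer the same request.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Tick period: a quarter of BackendRequestTimeoutMs, never below 100 ms.
TimeSpan WatchdogTick(int backendRequestTimeoutMs) =>
    TimeSpan.FromMilliseconds(Math.Max(100, backendRequestTimeoutMs / 4));

// Modbus TCP exception response: 7-byte MBAP header + (FC | 0x80) + exception code.
byte[] BuildExceptionFrame(ushort originalTxId, byte unitId, byte fc, byte exceptionCode)
{
    var frame = new byte[9];
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), originalTxId); // client's own TxId restored
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(2, 2), 0);            // protocol id: 0 = Modbus
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3);            // length: unit id + FC + code
    frame[6] = unitId;
    frame[7] = (byte)(fc | 0x80);   // high bit marks an exception response
    frame[8] = exceptionCode;       // 0x0B for a timeout; 0x04 is the allocator-saturation reply
    return frame;
}

// One sweep: claim every entry older than the timeout and build its 0x0B reply.
List<byte[]> Sweep(
    ConcurrentDictionary<ushort, (ushort OriginalTxId, byte UnitId, byte Fc, DateTimeOffset SentAtUtc)> inFlight,
    DateTimeOffset nowUtc,
    int backendRequestTimeoutMs)
{
    var timeout = TimeSpan.FromMilliseconds(backendRequestTimeoutMs);
    var replies = new List<byte[]>();
    foreach (var entry in inFlight)   // enumerating a ConcurrentDictionary tolerates concurrent removes
    {
        if (nowUtc - entry.Value.SentAtUtc < timeout) continue;     // not stale yet
        if (!inFlight.TryRemove(entry.Key, out var req)) continue;   // the backend reader claimed it first
        replies.Add(BuildExceptionFrame(req.OriginalTxId, req.UnitId, req.Fc, 0x0B));
        // Hypothetical: the real code would also release entry.Key back to the TxIdAllocator here.
    }
    return replies;
}
```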
**Depends on:** Phase 04 (rewriter), Phase 05 (supervisor + Polly), Phase 07 (status page DTO surface).

**Parallel-safe with:** nothing within itself. **Hard rule.** This phase deletes `PlcConnectionPair` and rewires the supervisor + rewriter correlation path simultaneously; the cross-cut is too broad for safe parallel work. The optional intra-phase slicing (below) is the closest thing to parallel.

## Goal

The H2-ECOM100 accepts 4 concurrent TCP clients per PLC; today's 1:1 model means the 5th upstream client to the same proxy port fails at backend connect. This phase eliminates that ceiling by making **one persistent backend socket per PLC**, with the proxy serving as a connection multiplexer that rewrites MBAP transaction IDs to keep concurrent in-flight requests from different upstream clients distinguishable on the single wire.

The wire-rate ceiling does not change — the H2-ECOM100 internally serializes requests (one per PLC scan, ~2-10 ms scan time) regardless of how many TCP connections it has. We're shifting where serialization happens (proxy outbound queue vs PLC accept queue), not adding throughput. The dashboard pay-off is that "PLC clients connected" can rise into the dozens without the proxy degrading.

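A rough bound implied by the scan-time figure above, assuming the ECOM serves exactly one request per scan. The ceiling is per PLC and is shared by all $N$ upstream clients multiplexed onto that PLC's single backend socket:

$$
r_{\max} = \frac{1}{t_{\text{scan}}}, \qquad t_{\text{scan}} \in [2, 10]\ \text{ms} \;\Longrightarrow\; r_{\max} \in [100, 500]\ \text{requests/s}, \qquad r_{\text{per client}} \le \frac{r_{\max}}{N}
$$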
## Intra-phase slicing (the closest thing to parallel-safe within this phase)

The phase is one merge but can be implemented as five small commits in this order:

| Slice | Output | Files touched | Hours | Parallelizable? |
|-------|--------|---------------|-------|-----------------|
| 9.1 | Pure data types (TxIdAllocator, CorrelationMap, InFlightRequest) + their unit tests | new files under `src/Mbproxy/Proxy/Multiplexing/` and `tests/...` | ~5 | Yes — pure logic, disjoint from rest. A second agent can write the E2E test scaffolding (slice 9.5) in parallel. |
| 9.2 | `PlcMultiplexer` + `UpstreamPipe` skeleton with backend reader/writer loops | new files in `Multiplexing/` | ~10 | No — depends on 9.1's data types. |
| 9.3 | Refactor `PlcListener` to own the multiplexer; delete `PlcConnectionPair`; rewire supervisor | modifies existing Proxy + Supervision files | ~8 | No — depends on 9.2. |
| 9.4 | Update `BcdPduPipeline` to use correlation entries (drop `PerPlcContextWithRequest`); counter additions; status DTO + HTML updates | modifies pipeline + admin files | ~6 | No — depends on 9.3. |
| 9.5 | Full E2E test suite + design.md + CLAUDE.md doc updates | new test file + doc edits | ~6 | Test-writing yes (slice 9.5 skeleton can land in parallel with 9.1); the doc edits at the end are sequential after 9.3. |

**Total:** ~35 hours. With one parallel agent producing slice 9.1's data types and another sketching the E2E test fixtures during slice 9.5-prep, calendar time can compress to ~28 hours.

## Outputs (new files in this phase)

```
src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs # single backend conn owner; mux logic
src/Mbproxy/Proxy/Multiplexing/UpstreamPipe.cs # per-upstream-client reader/writer
src/Mbproxy/Proxy/Multiplexing/TxIdAllocator.cs # 16-bit allocator with wrap tracking
src/Mbproxy/Proxy/Multiplexing/CorrelationMap.cs # proxyTxId → InFlightRequest
src/Mbproxy/Proxy/Multiplexing/InFlightRequest.cs # the correlation record
src/Mbproxy/Proxy/Multiplexing/MultiplexerLogEvents.cs # [LoggerMessage] vocab for this phase

tests/Mbproxy.Tests/Proxy/Multiplexing/TxIdAllocatorTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/CorrelationMapTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/PlcMultiplexerTests.cs # integration, real sockets
tests/Mbproxy.Tests/Proxy/Multiplexing/RewriterCorrelationTests.cs # rewriter w/ multiplexed paths
tests/Mbproxy.Tests/Proxy/Multiplexing/MultiplexerE2ETests.cs # against pymodbus sim
```

## Files modified (existing files in this phase)

```
src/Mbproxy/Proxy/PlcListener.cs # owns PlcMultiplexer; accept loop hands sockets to it
src/Mbproxy/Proxy/PlcConnectionPair.cs # DELETED — replaced by UpstreamPipe + Multiplexer
src/Mbproxy/Proxy/IPduPipeline.cs # PduContext gains in-flight correlation entry
src/Mbproxy/Proxy/PerPlcContext.cs # delete PerPlcContextWithRequest; replaced by InFlightRequest passed per-call
src/Mbproxy/Proxy/BcdPduPipeline.cs # FC03/04 response decodes via InFlightRequest, not last-request slot
src/Mbproxy/Proxy/ProxyCounters.cs # new fields: InFlightCount, MaxInFlight, TxIdWraps, BackendDisconnectCascades, BackendQueueDepth
src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs # supervises mux lifecycle alongside listener
src/Mbproxy/Admin/StatusDto.cs # PlcBackendStatus gains the new mux fields
src/Mbproxy/Admin/StatusSnapshotBuilder.cs # populate mux fields from counters
src/Mbproxy/Admin/StatusHtmlRenderer.cs # show inFlight/max-in-flight in the per-PLC row

docs/design.md # rewrite Connection model + Failure modes for multiplexed reality
mbproxy/CLAUDE.md # flip Architecture summary's connection-model bullet
docs/kpi.md # update operational notes referring to 4-client cap
```

## Tasks

### 9.1 Data types (pure logic)

1. **`TxIdAllocator`** — `internal sealed class TxIdAllocator`. State: `_inUse` (`bool[65536]` for O(1) lookup; ~64 KB), `_next` (`ushort`), `_inFlightCount` (long), `_wrapCount` (long). Methods:
   - `bool TryAllocate(out ushort id)` — atomic via `lock` (the allocator is per-PLC, so contention is low). Scans forward from `_next` for the next free slot, sets `_inUse[id] = true`, and advances `_next` past it. Returns `false` if `_inFlightCount == 65536` (saturated; emit `mbproxy.multiplex.saturated` Error and let the caller decide whether to drop or queue).
   - `void Release(ushort id)` — clears `_inUse[id]` and decrements `_inFlightCount`. Releasing an id that isn't allocated is a no-op.
   - `int InFlightCount { get; }`, `long WrapCount { get; }` — for telemetry.
   - **Wrap counter:** increment whenever `_next` rolls over `0xFFFF → 0x0000`.

2. **`InFlightRequest` + `InterestedParty`** — `InterestedParty` is `internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId)`. `InFlightRequest` is `internal sealed record InFlightRequest(byte UnitId, byte Fc, ushort StartAddress, ushort Qty, IReadOnlyList<InterestedParty> InterestedParties, DateTimeOffset SentAtUtc)`. Carries enough state for: (a) restoring each party's original TxId on the way back, (b) the FC03/04 correlation the rewriter needs (start/qty), (c) routing the response to each interested upstream socket, (d) round-trip-time measurement.

   **In Phase 9 `InterestedParties` always contains exactly one element.** The list shape is forward-compat with [Phase 10 — read coalescing](10-read-coalescing.md), which extends the same record to fan out responses to multiple upstream clients without further refactor of the multiplexer's data model. Resist any reviewer suggestion to simplify it back to a single `UpstreamPipe Upstream` field — the list shape is the load-bearing foundation for Phase 10.

3. **`CorrelationMap`** — wraps a `ConcurrentDictionary<ushort, InFlightRequest>`. Methods: `bool TryAdd(ushort, InFlightRequest)`, `bool TryRemove(ushort, out InFlightRequest)`, `int Count { get; }`, `IReadOnlyCollection<InFlightRequest> Snapshot()` (for diagnostics; allocates a list). Adds come from every attached pipe's read task (via `OnFrame`), and removes come from the backend reader task and the timeout watchdog, so the map does see concurrent writers; `ConcurrentDictionary` makes that safe, and its atomic `TryRemove` lets exactly one remover claim a given entry. It also keeps the map safe if/when we add upstream-side cancellation.

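A minimal sketch of the allocator's scan-forward and wrap-counting logic from item 1. To keep it runnable as a single top-level C# file, the state lives in top-level locals and the methods are local functions; the real type wraps the same logic in `internal sealed class TxIdAllocator`. The starting cursor of 0 and the `Advance` helper name are assumptions. Wraps are counted whenever the cursor rolls over `0xFFFF → 0x0000`, including during a scan.

```csharp
using System;

// Allocator state for one PLC (mirrors _inUse / _next / _inFlightCount / _wrapCount).
var inUse = new bool[65536];   // one flag per 16-bit TxId, about 64 KB
ushort next = 0;               // assumption: allocation starts at TxId 0
int inFlight = 0;
long wraps = 0;
var gate = new object();       // per-PLC lock; contention is low

// Moves the cursor forward one slot, counting a wrap on 0xFFFF -> 0x0000.
ushort Advance(ushort cursor)
{
    if (cursor == ushort.MaxValue) { wraps++; return 0; }
    return (ushort)(cursor + 1);
}

bool TryAllocate(out ushort id)
{
    lock (gate)
    {
        id = 0;
        if (inFlight == inUse.Length) return false;   // saturated: caller answers with exception 04
        while (inUse[next]) next = Advance(next);      // a free slot exists, so this terminates
        id = next;
        inUse[id] = true;
        inFlight++;
        next = Advance(next);
        return true;
    }
}

void Release(ushort id)
{
    lock (gate)
    {
        if (!inUse[id]) return;   // releasing a free id is a no-op
        inUse[id] = false;
        inFlight--;
    }
}
```

Because the cursor always moves forward, a freed id is not handed out again until the cursor comes back around to it. That keeps a recently completed TxId off the wire for as long as possible, which helps if a late duplicate response for it ever arrives.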
### 9.2 Multiplexer + UpstreamPipe

4. **`UpstreamPipe`** — `internal sealed class UpstreamPipe : IAsyncDisposable`. One instance per accepted upstream socket. Fields: `Socket _upstream`, `Guid _id`, `IPEndPoint _remoteEp`, `DateTimeOffset _connectedAtUtc`, `volatile bool _alive`, `Channel<byte[]> _responseChannel` (capacity 16). Two tasks:
   - **Read task**: pumps inbound MBAP frames from `_upstream` to a per-pipe `OnFrame` callback (registered by the multiplexer).
   - **Write task**: drains `_responseChannel` and writes each frame back to `_upstream`.

   On fault, the pipe sets `_alive = false` and closes its socket; the multiplexer notices on its next correlation lookup and drops responses bound for that pipe.

5. **`PlcMultiplexer`** — `internal sealed class PlcMultiplexer : IAsyncDisposable`. One instance per PLC. Fields: backend `Socket`, `TxIdAllocator`, `CorrelationMap`, `Channel<byte[]> _outboundChannel` (cap 256), `PerPlcContext _ctx` (tag map + counters + logger), list of attached `UpstreamPipe`s. Two backend tasks plus a fan-in:
   - **Backend writer task**: drains `_outboundChannel` and writes to the backend socket. It is the only writer, so the socket needs no synchronization.
   - **Backend reader task**: reads MBAP frames from the backend → looks up `proxyTxId` in `CorrelationMap` → calls `pipeline.Process(ResponseToClient, header, pdu, ctx with InFlight)` → for each `InterestedParty` in `InFlightRequest.InterestedParties` (always exactly one in Phase 9; list-of-N once Phase 10 ships), writes a copy of the frame, with that party's `OriginalTxId` restored in the MBAP header, to the party's `UpstreamPipe._responseChannel` (or silently drops it for that party if the pipe is no longer alive) → `CorrelationMap.TryRemove(proxyTxId)` + `TxIdAllocator.Release(proxyTxId)`.
   - **Per-upstream `OnFrame`**: invoked by each `UpstreamPipe`'s read task. Steps:
     1. Parse MBAP: original TxId, length, unitId, PDU.
     2. `TryAllocate` a proxyTxId. If saturated, write a Modbus exception response (Slave Device Failure, code 04) back to the upstream and continue.
     3. Build the `InFlightRequest` (parse FC/start/qty from the PDU for FC03/04; FC06 would need the same if we later want symmetric correlation).
     4. `TryAdd` to the correlation map.
     5. Call `pipeline.Process(RequestToBackend, ...)` to apply BCD rewriting.
     6. Overwrite the MBAP TxId bytes with proxyTxId.
     7. Enqueue the modified frame into `_outboundChannel`.

6. **Backend disconnect handling** — when the backend reader or writer task throws (socket closed, network reset, etc.):
   - Stop both tasks; close the backend socket.
   - Walk the correlation map; for each entry, close the `UpstreamPipe` of every party in its `InterestedParties` (cascade). Increment `BackendDisconnectCascades` by the number of upstream pipes closed.
   - Clear the correlation map and the `TxIdAllocator`.
   - Backend reconnect is lazy: when the next upstream request arrives, the multiplexer attempts a fresh backend connection through its own Polly backend-connect pipeline (see item 9).

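The framing steps in the `OnFrame` list and the backend reader's per-party TxId restore come down to a few byte-level helpers. A minimal sketch follows as standalone top-level C#; the helper names (`ParseMbap`, `WithTxId`, `ParseReadRequest`, `FanOut`) are hypothetical, and the layout is the standard Modbus TCP one: a 7-byte big-endian MBAP header whose length field counts the unit id plus the PDU.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Step 1: MBAP header = TxId(2) | ProtocolId(2) | Length(2) | UnitId(1); the PDU follows.
// A well-formed frame is exactly 6 + Length bytes long.
(ushort TxId, byte UnitId, byte[] Pdu) ParseMbap(ReadOnlySpan<byte> frame)
{
    if (frame.Length < 8) throw new ArgumentException("shorter than MBAP header + function code");
    ushort length = BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(4, 2));
    if (frame.Length != 6 + length) throw new ArgumentException("MBAP length does not match frame");
    return (BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(0, 2)), frame[6], frame.Slice(7).ToArray());
}

// Step 6 (original -> proxy) and the response path (proxy -> each party's original):
// returns a copy so the caller's buffer is never mutated.
byte[] WithTxId(ReadOnlySpan<byte> frame, ushort txId)
{
    byte[] copy = frame.ToArray();
    BinaryPrimitives.WriteUInt16BigEndian(copy, txId);
    return copy;
}

// Step 3 for FC03/FC04: start address and quantity are the two big-endian words after the FC.
(byte Fc, ushort Start, ushort Qty) ParseReadRequest(byte[] pdu)
{
    if (pdu.Length < 5 || (pdu[0] != 0x03 && pdu[0] != 0x04))
        throw new ArgumentException("not an FC03/FC04 request PDU");
    return (pdu[0],
            BinaryPrimitives.ReadUInt16BigEndian(pdu.AsSpan(1, 2)),
            BinaryPrimitives.ReadUInt16BigEndian(pdu.AsSpan(3, 2)));
}

// Backend reader fan-out: one copy per interested party, each carrying that party's TxId.
// In Phase 9 the list always holds exactly one TxId.
List<byte[]> FanOut(ReadOnlySpan<byte> response, IReadOnlyList<ushort> originalTxIds)
{
    var copies = new List<byte[]>(originalTxIds.Count);
    foreach (ushort txId in originalTxIds) copies.Add(WithTxId(response, txId));
    return copies;
}
```

Copying per party rather than patching one buffer in place is a deliberate choice in this sketch: once Phase 10 fans a response out to several pipes, each pipe's write task needs its own bytes.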
### 9.3 Listener + supervisor refactor

7. **`PlcListener.RunAsync`** — accept loop changes:
   - One `PlcMultiplexer` per listener (constructed in `PlcListenerSupervisor` and handed in).
   - On accept: wrap the socket in `UpstreamPipe`, register with the multiplexer via `mux.Attach(pipe)`.
   - On listener stop: dispose the multiplexer (which closes the backend + all attached pipes).
   - `ActivePairs` property → renamed `ActiveUpstreams` returning the multiplexer's list of attached `UpstreamPipe`s. Status page consumes this.

8. **Delete `PlcConnectionPair.cs`** — entire file. The replacement is `UpstreamPipe` + `PlcMultiplexer`. No backwards-compat shims; we're moving cleanly.

9. **`PlcListenerSupervisor`** — gains ownership of `PlcMultiplexer` alongside the listener. The Polly listener-recovery pipeline is unchanged; the multiplexer has its own internal Polly backend-connect pipeline (same `ResilienceOptions.BackendConnect` shape as today, just owned by the mux instead of the pair).

### 9.4 Rewriter + counters + status page

10. **`BcdPduPipeline`** — the FC03/04 response path stops reading `PerPlcContextWithRequest.LastRequestStart/Qty`. Instead, the multiplexer attaches an `InFlightRequest` to the `PduContext` for each response call:

    ```csharp
    using Mbproxy.Proxy.Multiplexing;     // InFlightRequest
    using Microsoft.Extensions.Logging;   // ILogger

    // internal rather than public: CurrentRequest exposes the internal InFlightRequest type,
    // and a public class cannot expose a less-accessible property type (CS0053).
    internal sealed class PerPlcContext : PduContext {
        public BcdTagMap TagMap { get; init; }
        public ProxyCounters Counters { get; init; }
        public ILogger Logger { get; init; }
        public InFlightRequest? CurrentRequest { get; init; } // NEW — non-null on response, null on request
    }
    ```

    Concurrency: each backend response is handled on the backend reader task; the request path is handled by the per-upstream read task. Different `InFlightRequest` instances → no contention.

11. **Drop `PerPlcContextWithRequest`** entirely. The last-request-slot pattern was a 1:1-model workaround; the correlation map subsumes it.

12. **`ProxyCounters` additions:**
    - `InFlightCount` (`long` snapshot of `CorrelationMap.Count`)
    - `MaxInFlight` (`long`, peak observed; .NET has no `Interlocked.Max`, so update it with an `Interlocked.CompareExchange` loop)
    - `TxIdWraps` (`long` from `TxIdAllocator.WrapCount`)
    - `BackendDisconnectCascades` (`long`)
    - `BackendQueueDepth` (snapshot of `_outboundChannel.Reader.Count`)

13. **Status page** — `StatusDto.PlcBackendStatus` gains `InFlight`, `MaxInFlight`, `TxIdWraps`, `DisconnectCascades`, `QueueDepth`. `StatusSnapshotBuilder` populates them. `StatusHtmlRenderer` adds a column or compact `[3/256]` indicator per PLC row. The JSON field names land in camelCase per the existing source-gen convention.

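`MaxInFlight` needs a lock-free "raise to at least this value" update, because several upstream read tasks can observe a new in-flight count at the same moment. A minimal sketch of the usual `Interlocked.CompareExchange` loop follows, as standalone top-level C#; `UpdateMax` is a hypothetical helper name, and calling it right after a successful `CorrelationMap.TryAdd` is an assumption about where it would hook in.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Raises `target` to `candidate` if the candidate is larger; never lowers it.
void UpdateMax(ref long target, long candidate)
{
    long observed = Volatile.Read(ref target);
    while (candidate > observed)
    {
        long prior = Interlocked.CompareExchange(ref target, candidate, observed);
        if (prior == observed) return;   // our value was installed as the new peak
        observed = prior;                // another thread changed it; re-check against the newer value
    }
}

// Simulated in-flight samples from many threads at once; only the largest (256) should stick.
long maxInFlight = 0;
Parallel.For(1, 1001, i => UpdateMax(ref maxInFlight, i % 257));
```

The loop exits as soon as the stored value is at least the candidate, so the common case (no new peak) costs a single volatile read.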
### 9.5 Tests + docs

14. **Unit + integration test suites** (see Tests required below).

15. **`docs/design.md` updates:**
    - **Connection model** section: rewrite. The diagram changes from "many clients → many backend sockets" to "many clients → one backend socket per PLC, multiplexed by proxy TxId rewriting." The operational consequence warning flips: instead of "5th client fails," it becomes "if backend disconnects, all attached upstream clients are cascaded closed; they reconnect on their own next request."
    - **Failure modes** section: amend to describe the cascade behaviour.
    - **Rewriter** section: amend to note the rewriter consumes `InFlightRequest` for response correlation (no architectural change, just an update to the description of how correlation flows).

16. **`mbproxy/CLAUDE.md`** Architecture summary: first bullet flips from "1:1 upstream-client ↔ backend-socket" to "single backend socket per PLC, multiplexed via MBAP TxId rewriting."

17. **`docs/kpi.md`** — the "Tier 2 → Connection-cap saturation warning" KPI loses its meaning (4-client cap no longer relevant on the upstream side). Either remove it or repurpose to track in-flight saturation against the 16-bit TxId space (which never realistically saturates but is the new equivalent ceiling).

## Public surface declared in this phase

All `internal sealed` — the multiplexer types are not consumed outside the assembly.

```csharp
namespace Mbproxy.Proxy.Multiplexing;

internal sealed class TxIdAllocator {
    public bool TryAllocate(out ushort id);
    public void Release(ushort id);
    public int InFlightCount { get; }
    public long WrapCount { get; }
}

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);

internal sealed record InFlightRequest(
    byte UnitId, byte Fc,
    ushort StartAddress, ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc);
// Phase 9: InterestedParties.Count is always 1.
// Phase 10 (read coalescing): the same record fans out to N parties without further refactor.

internal sealed class CorrelationMap {
    public bool TryAdd(ushort proxyTxId, InFlightRequest req);
    public bool TryRemove(ushort proxyTxId, out InFlightRequest req);
    public int Count { get; }
    public IReadOnlyCollection<InFlightRequest> Snapshot();
}

internal sealed class UpstreamPipe : IAsyncDisposable {
    public Guid Id { get; }
    public IPEndPoint RemoteEp { get; }
    public DateTimeOffset ConnectedAtUtc { get; }
    public long PdusForwardedCount { get; }
    public bool IsAlive { get; }
    public Task RunReadLoopAsync(Func<byte[], Task> onFrame, CancellationToken ct);
    public ValueTask SendResponseAsync(byte[] frame, CancellationToken ct);
    public ValueTask DisposeAsync();
}

internal sealed class PlcMultiplexer : IAsyncDisposable {
    public void Attach(UpstreamPipe pipe);
    public IReadOnlyCollection<UpstreamPipe> AttachedPipes { get; }
    public Task RunAsync(CancellationToken ct);
    public ValueTask DisposeAsync();
}
```

`PerPlcContext` gains a nullable `CurrentRequest` property. `PerPlcContextWithRequest` is removed (along with its `LastRequest*` slots).

## Tests required

### Unit (`Category = Unit`)

**`TxIdAllocatorTests`** (≥ 8 tests):

1. `Allocate_FromEmpty_Returns_NextSequential`
2. `Allocate_AfterRelease_Reuses_FreedId`
3. `Allocate_AllocatesEveryUshort_BeforeWrapping`
4. `Allocate_WrapsCorrectly_After0xFFFF`
5. `Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow`
6. `Release_OfNonAllocated_IsNoOp`
7. `Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress` (100 tasks, 1000 ops each)
8. `WrapCount_IncrementsOnEachFullWrap`

**`CorrelationMapTests`** (≥ 5 tests):

1. `TryAdd_Then_TryRemove_RoundTrips`
2. `TryAdd_DuplicateKey_Fails`
3. `TryRemove_OfMissing_ReturnsFalse`
4. `Snapshot_ReflectsCurrentState`
5. `Concurrent_AddRemove_NoDataLoss_Under_Parallel_Stress`

**`PlcMultiplexerTests`** (≥ 7 tests, real sockets, no simulator):

1. `SingleUpstream_RoundTripsFC03_Through_Multiplexer`
2. `SingleUpstream_RoundTripsFC06_Through_Multiplexer`
3. `TwoUpstreams_ConcurrentFC03_BothGetCorrectResponses` — proves TxId rewriting works end-to-end against a stub backend
4. `TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire` — sniff the backend socket; verify per-request TxIds are unique even when upstream TxIds collide
5. `UpstreamDisconnect_DoesNotAffectOtherUpstreams` — drop one client mid-flight; other client's response still arrives
6. `BackendDisconnect_CascadesToAllUpstreams` — kill backend; verify all upstream sockets close within 500 ms, `BackendDisconnectCascades` increments by N
7. `BackendReconnect_AfterCascade_NextUpstreamRequest_Succeeds`

**`RewriterCorrelationTests`** (≥ 4 tests):

1. `FC03Response_DecodedViaInFlightRequest_NotPerPairSlot`
2. `ConcurrentFC03_FromTwoUpstreams_DecodeCorrectly_NoCrossTalk` — set up two `InFlightRequest`s with different start addresses, deliver responses out of order; verify each decodes against its own request
3. `ConcurrentFC06_FromTwoUpstreams_EncodeCorrectly`
4. `ResponseForDeadUpstream_IsDropped_NoExceptionPropagates`

### Integration (`Category = Unit`, no simulator)

These use real `TcpListener` + `Socket` against a stub backend (a `TcpListener` that echoes requests or returns canned responses). They live in `PlcMultiplexerTests`.

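The stub backend's job is to model the DL260's correct behaviour: every response echoes its own request's TxId, even when several requests share one connection. A minimal sketch of the canned FC03 reply builder such a stub might use, as standalone top-level C#; the name `BuildFc03Response` and the caller-supplied register array are illustrative, the real fixture in `PlcMultiplexerTests` may differ, and the socket accept/read plumbing is omitted.

```csharp
using System;
using System.Buffers.Binary;

// Canned FC03 reply: echo the request's TxId and unit id, then return the supplied raw register words.
byte[] BuildFc03Response(ReadOnlySpan<byte> request, ushort[] registers)
{
    if (request.Length < 12 || request[7] != 0x03) throw new ArgumentException("not an FC03 request frame");
    ushort qty = BinaryPrimitives.ReadUInt16BigEndian(request.Slice(10, 2));
    if (qty != registers.Length) throw new ArgumentException("register count must match requested quantity");

    int byteCount = registers.Length * 2;
    var frame = new byte[9 + byteCount];                       // protocol id bytes stay 0
    request.Slice(0, 2).CopyTo(frame);                         // TxId echoed per request
    BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), (ushort)(3 + byteCount)); // unit + FC + count + data
    frame[6] = request[6];                                     // unit id echoed
    frame[7] = 0x03;                                           // function code
    frame[8] = (byte)byteCount;                                // FC03 byte count
    for (int i = 0; i < registers.Length; i++)
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(9 + 2 * i, 2), registers[i]);
    return frame;
}
```

For a read of HR 1072 (0x0430) with TxId 0x002A, passing `0x1234` (the BCD encoding of decimal 1234, matching the fixture used throughout this plan) yields the 11-byte reply `00 2A 00 00 00 05 01 03 02 12 34`.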
### E2E (`Category = E2E`)

**`MultiplexerE2ETests`** (≥ 5 tests, against pymodbus simulator):

1. `E2E_FiveConcurrentClients_AllReadHR1072_AllGetDecoded_1234` — the headline test. Five NModbus clients connected to the proxy in parallel; pymodbus sim has the BCD register at 1072. All five get `1234`. With Phase 08's 1:1 model, the 5th client would fail at backend connect.
2. `E2E_TwentyConcurrent_FC03_Requests_AcrossThreeClients_AllSucceed`
3. `E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers` — kill the sim mid-flight (simulate by closing on its side); verify upstream clients see clean socket close; relaunch sim; new upstream connection succeeds.
4. `E2E_RewriterStillWorks_UnderMultiplexedThreeClients` — three clients each writing different decimal values to different BCD-configured addresses via FC06; verify sim's register state.
5. `E2E_StatusPage_Shows_InFlightAndMaxInFlight` — drive 4 concurrent reads, verify `/status.json` reports `inFlight >= 1` during the burst and `maxInFlight >= 4`.

## Phase gate

- [ ] `dotnet build Mbproxy.slnx -c Debug` — zero warnings, zero errors.
- [ ] All 271 prior tests still green. Specifically: `Forward_FC03_HR1072_Returns_Decoded_1234`, `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips`, `MbapTxId_IsPreservedEndToEnd`, and `MbapTxId_StillPreserved_AfterRewriting_20Consecutive` continue to pass against the multiplexed implementation. The MBAP-TxId-preserved tests are the **critical regression guard** — if multiplexing leaks proxy TxIds back to the client, these fail.
- [ ] All new unit and integration tests pass (≥ 24 new: 8 + 5 + 7 + 4 across `TxIdAllocatorTests`, `CorrelationMapTests`, `PlcMultiplexerTests`, and `RewriterCorrelationTests`).
- [ ] All new E2E tests pass (≥ 5).
- [ ] `Forward_FC03_HR1072_Returns_Decoded_1234` PASSES with 5 concurrent NModbus clients connected to the same proxy port. **This is THE phase test.**
- [ ] `PlcConnectionPair.cs` is gone. Grep for the type name across the solution returns zero hits.
- [ ] `PerPlcContextWithRequest` is gone. Grep returns zero hits.
- [ ] `docs/design.md` "Connection model" section is rewritten; the 1:1 model description is gone or moved into a "Historical: pre-Phase-09 model" footnote.
- [ ] `mbproxy/CLAUDE.md` Architecture summary's connection-model bullet is updated.
- [ ] Backend disconnect with N upstream clients in-flight: all N close within 500 ms; counter `BackendDisconnectCascades += N`.
- [ ] `mbproxy.multiplex.saturated` Error event fires if the TxId allocator hits 65,536 in-flight. (Stress-test acceptable; manufacture by holding 65,536 pending responses against a stub backend.)
- [ ] Shutdown semantics still work: `ShutdownCoordinator` drains in-flight requests (now visible via `InFlightCount`, not `IsProcessing`).
- [ ] Status page renders the new fields; HTML page weight remains under 50 KB for 54 PLCs.
- [ ] CounterSnapshot's existing field set is preserved — only **added** fields, no renames or removals. Backwards-compat per the policy in `docs/kpi.md`.

## Out of scope

- **Foundation for future caching, not caching itself.** This phase establishes the chokepoint where any future caching or coalescing layer plugs in, but implements no caching of any kind. `InFlightRequest.InterestedParties` is shaped as a list specifically to make [Phase 10 — read coalescing](10-read-coalescing.md) additive without refactor; do not infer caching behavior from the list shape alone. Tier C-2 (short-TTL response cache) and Tier C-3 (periodic poll + cache) remain explicitly out of scope until their own design discussions and `design.md` updates land.
- **Per-tag read coalescing** — if two clients read the same register at the same time, Phase 9's multiplexer sends both requests. Coalescing them into one backend round-trip is the explicit goal of [Phase 10](10-read-coalescing.md), which plugs into the `InterestedParties` seam created here.
- **Backend keepalive / heartbeat** — the design's current "no keepalive" position stands. An idle backend with no upstream activity will die after middlebox timeouts; the next upstream request triggers a fresh connect via Polly. Multiplexing doesn't change this.
- **TxId fairness scheduling** — FIFO order in the `_outboundChannel` is the contract. No round-robin per upstream, no priority. If a single upstream client floods the channel, others queue behind. This is a stated trade-off and matches the ECOM's internal serialization anyway.
- **Pipelined multi-PDU-in-flight per single upstream client** — still unsupported. One in-flight request per upstream pipe at a time. Multiplexing across DIFFERENT upstream clients works fully; multiplexing across multiple in-flight requests from the SAME upstream client does not. Document the constraint.
- **Linux / cross-platform packaging** — still Windows Service only.

## Subagent briefing

If you're the agent picking up this phase, here's the executive summary you need in your head:

1. **You are deleting `PlcConnectionPair`.** Everything that file did is now split between `UpstreamPipe` (the per-client half) and `PlcMultiplexer` (the per-PLC half). Read `PlcConnectionPair.cs` once before you delete it — every behavior in there has a destination in one of the two new classes.

2. **Single-writer / single-reader on the backend socket.** Two tasks share the backend socket: one writes (drained from `_outboundChannel`), one reads (decodes MBAP frames). No third task touches the socket. This invariant is what lets the channel + dictionary design stay correct without any lock around the socket.

3. **The rewriter doesn't know about MBAP framing or correlation.** It still receives `(direction, mbapHeader span, pdu span, PerPlcContext ctx)`. The only addition is `ctx.CurrentRequest` (nullable, non-null on response). The rewriter is otherwise unchanged. Resist refactoring it.

4. **`InFlightRequest.SentAtUtc` powers `lastRoundTripMs` correctly across multiplexed clients.** Today's EWMA is per-pair; under multiplexing, the timestamp moves to per-request. The status counter stays the same.

5. **Cascade-on-backend-disconnect is the most subtle behavior.** Get the test for it right early (`BackendDisconnect_CascadesToAllUpstreams`). It's the difference between "graceful failure" and "leaked upstream sockets that hold connections open until OS timeout."

6. **TxId allocator saturation is a real-world impossibility but a stress-test reality.** Hold 65,536 responses in a stub backend; the 65,537th request must be refused cleanly (the allocator returns `false` and the multiplexer answers with exception code 04), not crash.

7. **Update the docs in the SAME PR as the code.** `design.md` Connection model, `mbproxy/CLAUDE.md` Architecture summary, and `docs/kpi.md` connection-cap KPI either get rewritten or removed. Doc drift is a gate fail.

8. **Do NOT introduce parallel agents within this phase.** The cross-cut is too broad. If you have spare agent budget, slice 9.1 (data types + their unit tests) can run alongside slice 9.5 (E2E test scaffolding written against the unchanged outer-shape contract), but the middle slices are sequential.

9. **The 4 critical regression tests** that must stay green:
   - `Forward_FC03_HR1072_Returns_Decoded_1234`
   - `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips`
   - `Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips`
   - `MbapTxId_IsPreservedEndToEnd` ← THIS is the one that proves multiplexing is transparent.

10. **When in doubt, re-read `BcdPduPipeline.ProcessResponse`.** The FC03/04 correlation logic there is the most subtle existing code that you're touching. Walk through it with one upstream client in mind first, then mentally replay with two; both must work without code change to the pipeline (only the way `PerPlcContext.CurrentRequest` gets populated changes).

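Briefing item 4 in miniature: with requests from several clients interleaved on one backend socket, each round-trip time has to be measured from that request's own `SentAtUtc`, not from a per-connection "last sent" slot. A minimal sketch as standalone top-level C#; the helper names are hypothetical, and both the smoothing factor (alpha = 0.2) and folding every sample into one per-PLC average are illustrative assumptions, not the project's actual settings.

```csharp
using System;

// RTT for one response, measured from that request's own SentAtUtc stamp.
double RoundTripMs(DateTimeOffset sentAtUtc, DateTimeOffset receivedAtUtc) =>
    (receivedAtUtc - sentAtUtc).TotalMilliseconds;

// Exponentially weighted moving average; alpha = 0.2 is an illustrative value only.
double Ewma(double previous, double sample, double alpha = 0.2) =>
    alpha * sample + (1 - alpha) * previous;

// Two overlapping requests from different upstream clients on the same backend socket:
// A is sent at t=0 ms and answered at t=30 ms; B is sent at t=5 ms and answered at t=40 ms.
var t0 = DateTimeOffset.UnixEpoch;
double rttA = RoundTripMs(t0, t0.AddMilliseconds(30));                    // 30 ms
double rttB = RoundTripMs(t0.AddMilliseconds(5), t0.AddMilliseconds(40)); // 35 ms, not 40

// Both samples fold into one per-PLC average, starting from a previous value of 20 ms.
double smoothed = Ewma(Ewma(20, rttA), rttB);
```

A single per-connection timestamp would have charged B's response against A's send time (40 ms instead of 35 ms); per-request stamps avoid that cross-talk.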
## Cross-references

- Today's 1:1 model: [`../design.md`](../design.md) → "Connection model" (will be rewritten by this phase).
- DL260 4-client cap source: [`../../DL260/dl205.md`](../../DL260/dl205.md) → "Behavioral Oddities".
- Existing rewriter request→response correlation: `src/Mbproxy/Proxy/BcdPduPipeline.cs` `ProcessResponse` (lines reading `PerPlcContextWithRequest.LastRequest*`).
- Polly pipelines this phase reuses without modification: `src/Mbproxy/Proxy/Supervision/PolicyFactory.cs`.
- Counter-snapshot backwards-compat policy: [`../kpi.md`](../kpi.md) → "Backwards-compat policy".