mbproxy: initial commit through Phase 9 (TxId multiplexing)

Adds the mbproxy service end-to-end. Phases 00-08 implement the
production-ready single-listener / 1:1-backend transparent Modbus TCP
proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260
fleet. Phase 9 replaces the connection layer with a single backend
socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's
4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing
  for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream
  on BackendRequestTimeoutMs, defending against lost responses,
  dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-
  request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps /
  disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly
concurrent backend traffic is proved against a stub backend in
PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus
3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 01:49:35 -04:00
parent 2e937228a0
commit 56eee3c563
105 changed files with 18430 additions and 0 deletions
@@ -0,0 +1,155 @@
// mbproxy configuration template — copy to %ProgramData%\mbproxy\appsettings.json
// and edit before starting the service.
//
// The .NET configuration loader accepts // and /* */ comments in JSON files
// (JSONC semantics) when using the default Host.CreateApplicationBuilder path.
//
// IMPORTANT: This file is overwritten on each install ONLY if no appsettings.json
// already exists at the destination. An existing file is always preserved.
{
"Mbproxy": {
// ── Global BCD tag list ─────────────────────────────────────────────────────────────
// These tags apply to EVERY PLC by default.
// Each entry: Address (Modbus PDU address, decimal), Width (16 or 32 bits).
//
// Width 16 — one register holds 4 BCD digits (09999).
// Wire value 0x1234 decodes to decimal 1234.
//
// Width 32 — a CDAB-ordered register pair (Address = low word, Address+1 = high word).
// Decoded decimal = high * 10000 + low (DirectLOGIC CDAB word order).
//
// Per-PLC overrides (see Plcs[].BcdTags below):
// Add — appends extra tags beyond what Global defines, or overrides a
// Global entry's Width when the same Address appears in both.
// Remove — removes specific addresses from the effective set for that PLC.
// Effective set = (Global Add) Remove, resolved per PDU.
"BcdTags": {
"Global": [
// V2000 (octal) = decimal address 1024. 16-bit BCD counter.
{ "Address": 1024, "Width": 16 },
// V2040 (octal) = decimal address 1056. 32-bit BCD total at 1056/1057.
{ "Address": 1056, "Width": 32 },
// V2100 (octal) = decimal address 1088. 16-bit BCD setpoint.
{ "Address": 1088, "Width": 16 }
]
},
// ── PLC list ────────────────────────────────────────────────────────────────────────
// Each entry maps one upstream proxy port → one backend PLC.
// Upstream clients connect to ListenPort; the proxy forwards to Host:Port.
//
// IMPORTANT: H2-ECOM100 modules accept at most 4 simultaneous TCP connections.
// With the 1:1 upstream↔backend model, a fifth upstream client to the same proxy
// port will cause a backend connect failure and an immediate upstream disconnect.
"Plcs": [
{
"Name": "Line1-Mixer", // Human-readable name (shown on status page and in logs)
"ListenPort": 5020, // Port the proxy listens on (upstream clients connect here)
"Host": "10.0.1.1", // PLC IP address or hostname
"Port": 502, // PLC Modbus TCP port (almost always 502)
"BcdTags": {
// Additional 32-bit tag specific to this PLC only.
"Add": [
{ "Address": 1200, "Width": 32 }
],
// Remove address 1056 from the Global list for this PLC
// (this mixer doesn't use the 32-bit BCD total).
"Remove": [ 1056 ]
}
},
{
"Name": "Line1-Conveyor",
"ListenPort": 5021,
"Host": "10.0.1.2",
"Port": 502
// No BcdTags override — uses the Global set as-is.
}
// Add one entry per PLC. Ports must be unique per host. Typical fleet: 54 PLCs.
],
// ── Admin port ──────────────────────────────────────────────────────────────────────
// Read-only HTTP status page.
// GET / → self-contained HTML (auto-refreshes every 5 s)
// GET /status.json → same data as JSON for monitoring scrapers
//
// Authentication is assumed at the network layer (trusted internal segment).
// Set to 0 to disable the admin endpoint.
"AdminPort": 8080,
// ── Connection timeouts ─────────────────────────────────────────────────────────────
"Connection": {
// Max time (ms) to wait for a TCP connect to the PLC backend.
// Each Polly retry attempt gets its own copy of this timeout.
"BackendConnectTimeoutMs": 3000,
// Max time (ms) to wait for the PLC to respond to a forwarded PDU.
// Non-idempotent FC06/FC16 writes are one-shot — the upstream client
// is disconnected immediately on timeout (no retry).
"BackendRequestTimeoutMs": 3000,
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
// (sc.exe stop / Windows Service stop signal). After this deadline the coordinator
// cancels remaining work and proceeds. Keep at or below the SCM wait-hint (30 s).
"GracefulShutdownTimeoutMs": 10000
},
// ── Resilience policies ─────────────────────────────────────────────────────────────
"Resilience": {
// Polly retry policy for backend TCP connect attempts.
// MaxAttempts: total connect tries (including the first).
// BackoffMs: delay between each attempt (must have MaxAttempts1 entries).
"BackendConnect": {
"MaxAttempts": 3,
"BackoffMs": [ 100, 500, 2000 ]
},
// Polly recovery policy for listener bind failures.
// If a PLC's listen port can't be bound (in-use, bad IP, transient OS error),
// the supervisor retries according to this schedule.
// InitialBackoffMs: backoff per step (first N retries).
// SteadyStateMs: backoff for all subsequent retries (runs indefinitely).
"ListenerRecovery": {
"InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ],
"SteadyStateMs": 30000
}
}
},
// ── Serilog ─────────────────────────────────────────────────────────────────────────────
// Structured log output. Default: Information level, rolling-file under ProgramData.
// The EventLogBridge writes Error+ events to the Windows Application Event Log
// automatically when the service runs under the SCM (not under dotnet run).
"Serilog": {
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
"MinimumLevel": {
"Default": "Information",
"Override": {
"Microsoft": "Warning",
"System": "Warning"
}
},
"WriteTo": [
{
"Name": "Console",
"Args": {
"outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}"
}
},
{
"Name": "File",
"Args": {
// Rolling log: one file per day, kept for 30 days.
// Survives uninstall — logs are archived to %ProgramData%\mbproxy.archived-<ts>\.
"path": "C:\\ProgramData\\mbproxy\\logs\\mbproxy-.log",
"rollingInterval": "Day",
"retainedFileCountLimit": 30,
"outputTemplate": "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}"
}
}
]
}
}