mbproxy: remediate the 2026-05-16 code-review findings
Fixes every finding from the codereviews/2026-05-16 multi-agent review (2 Critical, 20 Major, 38 Minor) and adds that review to the repo.

Highlights: dashboard XSS escape; response cache invalidated on the write request (not just the response); ReloadValidator now runs at startup so port collisions / duplicate names / malformed Resilience profiles fail fast; AdminPort 0 genuinely disables the admin endpoint; PlcListener accept-loop faults propagate to the supervisor's faulted path; reconciler Restart builds before removing; Resilience pipelines are restart-only from a frozen snapshot; multiplexer connect-race leak, watchdog party-list snapshot, backend-response and FC16 framing validation; frontend reconnect retry and util.js load guard; plus the log-event/doc drift sweep and test-port hygiene.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -143,7 +143,7 @@ DL205 / DL260 BCD is non-negative in the default ladder pattern. `BcdCodec.Encod

 ## Exception Pass-Through

-Modbus exception responses pass through unchanged. The rewriter detects an exception response by the high bit of the function code (`fc & 0x80 != 0`), emits a `mbproxy.rewrite.exception_passthrough` event, increments the per-FC exception counter, and returns without touching the payload.
+Modbus exception responses pass through unchanged. The rewriter detects an exception response by the high bit of the function code (`fc & 0x80 != 0`), emits a `mbproxy.exception.passthrough` event, increments the per-FC exception counter, and returns without touching the payload.

 Covered exception codes:

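The high-bit test described in this hunk can be illustrated with a minimal Python sketch. The project itself is .NET, so this is not the rewriter's code; the function names are hypothetical and only the `0x80` mask comes from the text above. The parentheses make the grouping explicit, which matters in C-family languages where `!=` binds tighter than `&`.

```python
def is_exception_response(pdu: bytes) -> bool:
    """Return True when the function-code byte has its high bit set."""
    if not pdu:
        raise ValueError("empty PDU")
    # The first PDU byte is the function code; bit 7 marks an exception.
    return (pdu[0] & 0x80) != 0


def original_function_code(pdu: bytes) -> int:
    """Recover the request function code by clearing the exception bit."""
    return pdu[0] & 0x7F
```

For example, a PDU starting with `0x83` is an exception response to function code 3, and a pass-through implementation would forward it without modifying the remaining bytes.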
@@ -229,7 +229,7 @@ The rewriter feeds two counters that surface on the status page:

 An out-of-range value (`< 0` or `> 9999` for 16-bit; `< 0` or `> 99_999_999` for 32-bit) on a write, or a bad nibble (`>= 0xA`) on a read, increments an internal invalid-BCD counter and emits `mbproxy.rewrite.invalid_bcd` at warning. The PDU passes through raw in that case; the rewriter never substitutes a value the client did not send (writes) or the PLC did not return (reads).

-Both counters are exposed on the status page; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md). The corresponding log events (`mbproxy.rewrite.partial_bcd`, `mbproxy.rewrite.invalid_bcd`, `mbproxy.rewrite.exception_passthrough`) are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md). Partial-overlap troubleshooting is covered in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).
+Both counters are exposed on the status page; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md). The corresponding log events (`mbproxy.rewrite.partial_bcd`, `mbproxy.rewrite.invalid_bcd`, `mbproxy.exception.passthrough`) are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md). Partial-overlap troubleshooting is covered in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).

 The `dl205.json` pymodbus simulator profile encodes BCD test fixtures used by the integration test suite; see [`../Testing/Simulator.md`](../Testing/Simulator.md).

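The 16-bit range and nibble checks in this hunk can be sketched as follows. This is an illustrative Python version, not the project's `BcdCodec`; the function names are hypothetical, and returning `None` stands in for "count the value as invalid and pass the PDU through raw".

```python
from typing import Optional


def encode_bcd16(value: int) -> Optional[int]:
    """Pack 0..9999 into four BCD nibbles; None signals an out-of-range write."""
    if value < 0 or value > 9999:
        return None
    result = 0
    for shift in (0, 4, 8, 12):
        value, digit = divmod(value, 10)  # peel off the lowest decimal digit
        result |= digit << shift
    return result


def decode_bcd16(raw: int) -> Optional[int]:
    """Unpack four BCD nibbles; None signals a bad nibble (>= 0xA) on a read."""
    value = 0
    for shift in (12, 8, 4, 0):
        nibble = (raw >> shift) & 0xF
        if nibble >= 0xA:
            return None
        value = value * 10 + nibble
    return value
```

Under these assumptions `encode_bcd16(1234)` yields `0x1234`, while `encode_bcd16(10000)` and `decode_bcd16(0x12A4)` both yield `None`, leaving the caller to forward the original bytes.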
@@ -56,6 +56,7 @@ If a step throws, the exception is logged at Error and the loop continues with t
 | `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
 | `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
 | `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
+| `Resilience.BackendConnect.*` or `Resilience.ListenerRecovery.*` | **Restart-only.** The backend-connect and listener-recovery Polly pipelines are built from the `Resilience` snapshot taken at service startup; the reconciler builds add/restart supervisors from that same frozen snapshot, so a hot-reload of these values does not propagate to any PLC. Restart the service to change them. |
 | Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |

 The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It resolves the map for the active PLC at the start of every request frame, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.
@@ -78,21 +79,22 @@ The `ReloadPlan` distinguishes two kinds of "PLC is still here but changed":
 3. Merge in `Plcs[i].BcdTags.Add` entries — if an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
 4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.

-The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa. There is no second validator that could disagree with the first.
+The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa.

 ## Validation Rules

-`ReloadValidator.Validate` is the gate the hot-reload path consults directly. It runs the following checks in order:
+`ReloadValidator.Validate` is the configuration gate. It runs at **startup** (in `ProxyWorker.ExecuteAsync`, before any supervisor is built — a rejection logs `mbproxy.startup.config.rejected` and the service exits non-zero) **and** on every hot reload. It runs the following checks in order:

 1. PLC names are non-empty and unique under ordinal comparison.
-2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list.
-3. `AdminPort` is in `[1, 65535]` and does not collide with any `ListenPort`.
+2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list; every `Host` is non-empty and every backend `Port` is in `[1, 65535]`.
+3. `AdminPort` is in `[1, 65535]`, or `0` to disable the admin endpoint; a non-zero `AdminPort` must not collide with any `ListenPort`.
 4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks — duplicate addresses within a single resolved list, and 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry — to the single source of truth used at startup.
-5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` must be `>= 0`.
+5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` must be in `[0, 100000]` and `Cache.EvictionIntervalMs` must be `>= 0`.
+6. `AdminPushIntervalMs` is in `[1, 60000]`; connection timeouts are `> 0`; the keepalive cross-field rule holds; and the `Resilience` profiles are well-formed (`BackendConnect.MaxAttempts >= 1` with at least `MaxAttempts - 1` non-negative `BackoffMs` entries, `ListenerRecovery.SteadyStateMs > 0`, `ReadCoalescing.MaxParties >= 1`).

-A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates.
+A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates (at startup, the service refuses to start).

-Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording.
+Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two validators overlap deliberately; their error wording is similar but not guaranteed identical.

 ### Rejected-reload example

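The "collect every error, then reject as a whole" behaviour and the revised name, `ListenPort`, and `AdminPort` rules can be sketched in Python. This is an illustration under stated assumptions rather than the actual `ReloadValidator`: the config is modelled as a plain dict, the error wording is invented, and only the key names and numeric bounds come from the rules in this hunk.

```python
from typing import List


def in_port_range(port: int) -> bool:
    return 1 <= port <= 65535


def validate(config: dict) -> List[str]:
    """Run every check and return all failures; an empty list means accept."""
    errors: List[str] = []
    seen_names = set()
    seen_ports = set()
    for i, plc in enumerate(config.get("Plcs", [])):
        name = plc.get("Name", "")
        if not name:
            errors.append(f"Plcs[{i}].Name is empty")
        elif name in seen_names:  # Python str equality compares code points
            errors.append(f"Plcs[{i}].Name '{name}' is duplicated")
        seen_names.add(name)

        port = plc.get("ListenPort", 0)
        if not in_port_range(port):
            errors.append(f"Plcs[{i}].ListenPort {port} is out of range")
        elif port in seen_ports:
            errors.append(f"Plcs[{i}].ListenPort {port} is duplicated")
        seen_ports.add(port)

    admin = config.get("AdminPort", 0)
    if admin != 0:  # 0 disables the admin endpoint and skips both checks
        if not in_port_range(admin):
            errors.append(f"AdminPort {admin} is out of range")
        elif admin in seen_ports:
            errors.append(f"AdminPort {admin} collides with a ListenPort")
    return errors
```

A caller would apply the new configuration only when the returned list is empty. Because the function never returns early, one pass reports a duplicate name, a duplicate port, and an `AdminPort` collision together.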
@@ -281,21 +281,22 @@ The cache itself is described in detail in [`../Architecture/ResponseCache.md`](

 ## Validation Rules

-`ReloadValidator.Validate` runs on every config load (startup and hot reload) and rejects the entire snapshot if any rule fails. On rejection at startup, the service exits non-zero. On rejection at runtime, the current in-memory config stays in effect and `mbproxy.config.reload.rejected` is logged at `Error`.
+`ReloadValidator.Validate` runs on every config load (startup and hot reload) and rejects the entire snapshot if any rule fails. On rejection at startup, the service logs `mbproxy.startup.config.rejected` at `Error` and exits non-zero. On rejection at runtime, the current in-memory config stays in effect and `mbproxy.config.reload.rejected` is logged at `Error`.

 Rules (in order):

 1. **PLC names**: every `Plcs[i].Name` is non-empty and unique (ordinal comparison).
-2. **ListenPort**: every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the array.
-3. **AdminPort**: in `[1, 65535]` and does not collide with any `ListenPort`.
+2. **ListenPort / Host / Port**: every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the array; every `Host` is non-empty; every backend `Port` is in `[1, 65535]`.
+3. **AdminPort**: in `[1, 65535]`, or `0` to disable the admin endpoint; a non-zero value does not collide with any `ListenPort`.
 4. **BCD tag map** per PLC, delegated to `BcdTagMapBuilder.Build`:
 - duplicate addresses within a single PLC's resolved tag list
 - 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry at that address
 5. **Cache TTL bounds**:
 - any `CacheTtlMs` or `DefaultCacheTtlMs` less than 0 is rejected
 - any `CacheTtlMs` or `DefaultCacheTtlMs` greater than `60_000` is rejected unless `Cache.AllowLongTtl = true`
-6. **Cache size knobs**: `Cache.MaxEntriesPerPlc >= 0`, `Cache.EvictionIntervalMs >= 0`.
-7. **Width**: every `BcdTagOptions.Width` is `16` or `32` (enforced by `MbproxyOptionsValidator` at schema time).
+6. **Cache size knobs**: `Cache.MaxEntriesPerPlc` in `[0, 100000]`, `Cache.EvictionIntervalMs >= 0`.
+7. **AdminPushIntervalMs / timeouts / keepalive / Resilience**: `AdminPushIntervalMs` in `[1, 60000]`; connection timeouts `> 0`; the keepalive cross-field rule (`BackendHeartbeatIdleMs > BackendRequestTimeoutMs`); and well-formed `Resilience` profiles (`BackendConnect.MaxAttempts >= 1` with `>= MaxAttempts - 1` non-negative `BackoffMs` entries, `ListenerRecovery.SteadyStateMs > 0`, `ReadCoalescing.MaxParties >= 1`).
+8. **Width**: every `BcdTagOptions.Width` is `16` or `32` (also enforced by `MbproxyOptionsValidator` at schema time).

 Sample rejection messages (logged at `Error` with the structured property `errors` carrying the full list):

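The new rule 7 packs several cross-field checks into one sentence. The Python sketch below shows one possible reading of the keepalive and `Resilience` parts. It is an assumption-laden illustration: the dict shape mirrors the `Resilience.*` key paths named in the rule, the error strings are invented, and "`>= MaxAttempts - 1` non-negative `BackoffMs` entries" is interpreted as a length check plus a sign check on every entry.

```python
from typing import List


def keepalive_ok(heartbeat_idle_ms: int, request_timeout_ms: int) -> bool:
    """Cross-field rule: BackendHeartbeatIdleMs > BackendRequestTimeoutMs."""
    return heartbeat_idle_ms > request_timeout_ms


def validate_resilience(resilience: dict) -> List[str]:
    """Return every violated Resilience rule; an empty list means well-formed."""
    errors: List[str] = []
    connect = resilience.get("BackendConnect", {})
    max_attempts = connect.get("MaxAttempts", 0)
    backoff = connect.get("BackoffMs", [])
    if max_attempts < 1:
        errors.append("BackendConnect.MaxAttempts must be >= 1")
    elif len(backoff) < max_attempts - 1:
        errors.append("BackendConnect.BackoffMs needs MaxAttempts - 1 entries")
    if any(ms < 0 for ms in backoff):
        errors.append("BackendConnect.BackoffMs entries must be non-negative")
    if resilience.get("ListenerRecovery", {}).get("SteadyStateMs", 0) <= 0:
        errors.append("ListenerRecovery.SteadyStateMs must be > 0")
    if resilience.get("ReadCoalescing", {}).get("MaxParties", 0) < 1:
        errors.append("ReadCoalescing.MaxParties must be >= 1")
    return errors
```

With `MaxAttempts` set to 3, two backoff entries satisfy the length check because a delay is only needed between attempts, not after the last one. Treating a missing key as `0` is a choice made for this sketch, not documented behaviour.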
@@ -330,6 +330,8 @@ The detail page's debug view is fed by an **on-demand per-tag value capture** (`
 | `debug.tags[].updatedAtUtc` | `string?` | ISO-8601 time of the observation; `null` when no traffic yet. |
 | `debug.tags[].ageSeconds` | `double?` | Seconds since the observation; `null` when no traffic yet. |

+`PlcDetailResponse` is delivered **only** over the `/hub/status` SignalR feed (the `"plc"` message); there is no `GET` route for it, and it is serialized through the SignalR JSON protocol rather than `StatusJsonContext`. Scrapers that want per-PLC counters use the `plcs[]` array of `GET /status.json` instead — the debug-view capture has no JSON-twin endpoint.
+
 ## How to Scrape It

 The JSON twin is plain HTTP. Any monitoring system that can curl an endpoint can scrape it.
@@ -45,6 +45,18 @@ Fires once after `ProxyWorker.StartAsync` has spun up every per-PLC supervisor a

 **Operator action:** if the two counts disagree, search for `mbproxy.startup.bind.failed` entries to identify the missing PLCs.

+### mbproxy.startup.config.rejected
+
+**Level:** Error · **EventId:** 2 · **Source:** `src/Mbproxy/Proxy/ProxyWorker.cs`
+
+| Property | Type | Meaning |
+|----------|------|---------|
+| `Errors` | `string` | Concatenated validation failures (one per `;`). |
+
+Fires once at startup when `ReloadValidator.Validate` rejects the initial `appsettings.json` — duplicate listen ports, an `AdminPort` collision, duplicate PLC names, a malformed BCD tag list, a bad keepalive cross-field relationship, or an invalid `Resilience` profile. The service then exits non-zero; no listeners are started. This is the startup-time twin of `mbproxy.config.reload.rejected`.
+
+**Operator action:** fix the offending entry in `appsettings.json` and restart the service. The error text names every failed rule.
+
 ### mbproxy.startup.bind

 **Level:** Information · **EventId:** 20 (`PlcListener`) / 40 (`PlcListenerSupervisor`) · **Source:** `src/Mbproxy/Proxy/PlcListener.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
@@ -60,7 +72,7 @@ Fires when a per-PLC `TcpListener` successfully binds its configured port. Emitt

 ### mbproxy.startup.bind.failed

-**Level:** Error · **EventId:** 21 (`ProxyWorker`) / 41 (`PlcListenerSupervisor`) · **Source:** `src/Mbproxy/Proxy/ProxyWorker.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
+**Level:** Error · **EventId:** 41 · **Source:** `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`

 | Property | Type | Meaning |
 |----------|------|---------|
@@ -88,7 +100,7 @@ Fires after the supervisor's Polly recovery pipeline successfully rebinds a list

 ### mbproxy.listener.faulted

-**Level:** Error (`PlcListener`) / Warning (`PlcListenerSupervisor`) · **EventId:** 22 / 43 · **Source:** `src/Mbproxy/Proxy/PlcListener.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
+**Level:** Warning · **EventId:** 43 · **Source:** `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`

 | Property | Type | Meaning |
 |----------|------|---------|
@@ -96,7 +108,7 @@ Fires after the supervisor's Polly recovery pipeline successfully rebinds a list
 | `Port` | `int` | Port whose listener faulted. |
 | `Reason` | `string` | Top-level exception message. |

-Fires when a listener's accept loop throws. The two sources emit at different levels deliberately: the unsupervised `PlcListener` instance logs at `Error` (a terminal condition for that listener), while the supervised emission is `Warning` because Polly will retry. The supervised path attaches the exception object as the `LoggerMessage` exception parameter, so the stack trace is captured.
+Fires when a listener's accept loop throws. `PlcListener.RunAsync` propagates the fault to its `PlcListenerSupervisor`, which logs this event at `Warning` (Polly will retry) with the exception object attached as the `LoggerMessage` exception parameter, so the stack trace is captured.

 **Operator action:** if the same `Plc` produces repeated faults inside a few minutes, inspect the network path. A burst of faults paired with `mbproxy.multiplex.backend.disconnected` indicates the PLC itself is unhealthy rather than a proxy issue.
