mbproxy: remediate the 2026-05-16 code-review findings

Fixes every finding from the codereviews/2026-05-16 multi-agent review
(2 Critical, 20 Major, 38 Minor) and adds that review to the repo.

Highlights:
- dashboard XSS escape
- response cache invalidated on the write request (not just the response)
- ReloadValidator now runs at startup so port collisions / duplicate names / malformed Resilience profiles fail fast
- AdminPort 0 genuinely disables the admin endpoint
- PlcListener accept-loop faults propagate to the supervisor's faulted path
- reconciler Restart builds before removing
- Resilience pipelines are restart-only from a frozen snapshot
- multiplexer connect-race leak, watchdog party-list snapshot, backend-response and FC16 framing validation
- frontend reconnect retry and util.js load guard
- log-event/doc drift sweep and test-port hygiene

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Joseph Doherty
Date: 2026-05-16 18:08:06 -04:00
Parent: 0308490aef
Commit: b222362ce0
45 changed files with 1735 additions and 151 deletions
+9 -7
@@ -56,6 +56,7 @@ If a step throws, the exception is logged at Error and the loop continues with t
| `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
| `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
| `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
+| `Resilience.BackendConnect.*` or `Resilience.ListenerRecovery.*` | **Restart-only.** The backend-connect and listener-recovery Polly pipelines are built from the `Resilience` snapshot taken at service startup; the reconciler builds add/restart supervisors from that same frozen snapshot, so a hot-reload of these values does not propagate to any PLC. Restart the service to change them. |
| Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |
The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It resolves the map for the active PLC at the start of every request frame, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.
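The per-frame resolution described above can be illustrated with a small Python sketch. The service itself appears to be .NET, so this is only a model: `TagMapStore`, `Connection`, and `handle_frame` are hypothetical names, and each tag map is reduced to a register-address-to-width dict. The point it shows is that a connection keeps a reference to the store rather than a copy of the map, so a hot reload is visible on the very next frame of an existing connection.

```python
import threading
from typing import Dict, Optional


class TagMapStore:
    """Hypothetical holder for the live per-PLC tag maps.

    Each map is reduced to {register address: BCD width in bits}.
    """

    def __init__(self, maps: Dict[str, Dict[int, int]]) -> None:
        self._lock = threading.Lock()
        self._maps = maps

    def swap(self, new_maps: Dict[str, Dict[int, int]]) -> None:
        # A hot reload publishes a whole new set of maps in one step.
        with self._lock:
            self._maps = new_maps

    def current(self, plc: str) -> Dict[int, int]:
        with self._lock:
            return self._maps[plc]


class Connection:
    """One accepted client TCP connection proxied to a single PLC."""

    def __init__(self, store: TagMapStore, plc: str) -> None:
        self._store = store  # keep the store, not a snapshot of the map
        self._plc = plc

    def handle_frame(self, address: int) -> Optional[int]:
        # Resolve the map at the start of every request frame, so a reload
        # applies to the next request even on this existing connection.
        tag_map = self._store.current(self._plc)
        return tag_map.get(address)  # None: address is not a BCD tag


store = TagMapStore({"press1": {100: 16}})
conn = Connection(store, "press1")
print(conn.handle_frame(100))  # 16
store.swap({"press1": {100: 32}})  # simulated hot reload
print(conn.handle_frame(100))  # 32, on the same connection
```

Had `Connection.__init__` stored `store.current(plc)` instead, the second call would still return 16 until the client reconnected.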
@@ -78,21 +79,22 @@ The `ReloadPlan` distinguishes two kinds of "PLC is still here but changed":
3. Merge in `Plcs[i].BcdTags.Add` entries — if an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.
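Steps 3 and 4 above can be sketched in Python as follows. Steps 1 and 2 fall outside this excerpt, so the sketch simply assumes they have already produced `working`, a dict keyed by register address. `merge_and_fold` and this `BcdTag` dataclass are hypothetical stand-ins, not the project's `BcdTagMapBuilder` or options types.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BcdTag:
    # Hypothetical stand-in for a tag entry; not the project's type.
    address: int
    width: int = 16                      # 16- or 32-bit BCD value
    cache_ttl_ms: Optional[int] = None   # None means "not set explicitly"


def merge_and_fold(
    working: Dict[int, BcdTag],
    adds: List[BcdTag],
    default_ttl_ms: Optional[int],
) -> Dict[int, BcdTag]:
    # Step 3: per-PLC Add entries win over any entry already at that address
    # (e.g. a global 16-bit tag widened to 32 bits for one PLC).
    merged = dict(working)
    for tag in adds:
        merged[tag.address] = tag
    # Step 4: fold the PLC's default TTL into tags without an explicit TTL.
    return {
        address: tag if tag.cache_ttl_ms is not None
        else replace(tag, cache_ttl_ms=default_ttl_ms)
        for address, tag in merged.items()
    }


working = {100: BcdTag(100), 200: BcdTag(200, cache_ttl_ms=500)}
result = merge_and_fold(working, [BcdTag(100, width=32)], default_ttl_ms=1000)
print(result[100])  # BcdTag(address=100, width=32, cache_ttl_ms=1000)
print(result[200])  # BcdTag(address=200, width=16, cache_ttl_ms=500)
```

Copying `working` before merging keeps the function free of side effects, which matters if the same builder is run once for validation and again for the real build.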
-The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa. There is no second validator that could disagree with the first.
+The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa.
## Validation Rules
-`ReloadValidator.Validate` is the gate the hot-reload path consults directly. It runs the following checks in order:
+`ReloadValidator.Validate` is the configuration gate. It runs at **startup** (in `ProxyWorker.ExecuteAsync`, before any supervisor is built — a rejection logs `mbproxy.startup.config.rejected` and the service exits non-zero) **and** on every hot reload. It runs the following checks in order:
1. PLC names are non-empty and unique under ordinal comparison.
-2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list.
-3. `AdminPort` is in `[1, 65535]` and does not collide with any `ListenPort`.
+2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list; every `Host` is non-empty and every backend `Port` is in `[1, 65535]`.
+3. `AdminPort` is in `[1, 65535]`, or `0` to disable the admin endpoint; a non-zero `AdminPort` must not collide with any `ListenPort`.
4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks — duplicate addresses within a single resolved list, and 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry — to the single source of truth used at startup.
-5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` must be `>= 0`.
+5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` must be in `[0, 100000]` and `Cache.EvictionIntervalMs` must be `>= 0`.
+6. `AdminPushIntervalMs` is in `[1, 60000]`; connection timeouts are `> 0`; the keepalive cross-field rule holds; and the `Resilience` profiles are well-formed (`BackendConnect.MaxAttempts >= 1` with at least `MaxAttempts - 1` non-negative `BackoffMs` entries, `ListenerRecovery.SteadyStateMs > 0`, `ReadCoalescing.MaxParties >= 1`).
-A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates.
+A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates (at startup, the service refuses to start).
-Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording.
+Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two validators overlap deliberately; their error wording is similar but not guaranteed identical.
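The collect-all behavior of the validation rules can be sketched as a small Python model. It is not the project's `ReloadValidator.Validate` (which appears to be .NET code): the dataclasses and function names are hypothetical, tag lists are assumed to be already resolved by the builder, and only rules 1 to 3 plus part of rules 4 and 5 are modeled (duplicate addresses, the 32-bit high-register overlap, and per-tag TTL bounds). Every check runs and appends to one list, so a single pass reports every problem.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LONG_TTL_MS = 60_000  # above this, Cache.AllowLongTtl must be true


# Hypothetical model of the options; not the project's types.
@dataclass
class BcdTag:
    address: int
    width: int = 16
    cache_ttl_ms: Optional[int] = None


@dataclass
class Plc:
    name: str
    listen_port: int
    host: str
    port: int  # backend port on the PLC
    tags: List[BcdTag] = field(default_factory=list)  # assumed already resolved


@dataclass
class Config:
    admin_port: int  # 0 disables the admin endpoint
    plcs: List[Plc]
    allow_long_ttl: bool = False


def _port_ok(port: int) -> bool:
    return 1 <= port <= 65535


def _check_tags(plc: Plc, allow_long_ttl: bool) -> List[str]:
    errors: List[str] = []
    by_address = {}
    for tag in plc.tags:
        if tag.address in by_address:
            errors.append(f"{plc.name}: duplicate address {tag.address}")
        else:
            by_address[tag.address] = tag
    for tag in plc.tags:
        # A 32-bit tag also occupies address + 1 (its high register).
        high = by_address.get(tag.address + 1)
        if tag.width == 32 and high is not None and high.width == 16:
            errors.append(f"{plc.name}: 32-bit tag at {tag.address} overlaps "
                          f"16-bit tag at {tag.address + 1}")
        ttl = tag.cache_ttl_ms
        if ttl is not None and ttl < 0:
            errors.append(f"{plc.name}: negative CacheTtlMs at {tag.address}")
        elif ttl is not None and ttl > LONG_TTL_MS and not allow_long_ttl:
            errors.append(f"{plc.name}: CacheTtlMs {ttl} needs AllowLongTtl")
    return errors


def validate(cfg: Config) -> List[str]:
    """Run every check in order and collect every error; empty means accept."""
    errors: List[str] = []
    # Rule 1: names are non-empty and unique (str equality is ordinal).
    names = set()
    for plc in cfg.plcs:
        if not plc.name:
            errors.append("PLC name must be non-empty")
        elif plc.name in names:
            errors.append(f"duplicate PLC name {plc.name!r}")
        names.add(plc.name)
    # Rule 2: listen ports in range and unique; host and backend port valid.
    listen_ports = set()
    for plc in cfg.plcs:
        if not _port_ok(plc.listen_port):
            errors.append(f"{plc.name}: ListenPort {plc.listen_port} out of range")
        elif plc.listen_port in listen_ports:
            errors.append(f"{plc.name}: duplicate ListenPort {plc.listen_port}")
        listen_ports.add(plc.listen_port)
        if not plc.host:
            errors.append(f"{plc.name}: Host must be non-empty")
        if not _port_ok(plc.port):
            errors.append(f"{plc.name}: backend Port {plc.port} out of range")
    # Rule 3: AdminPort 0 disables the endpoint; otherwise range + collision.
    if cfg.admin_port != 0:
        if not _port_ok(cfg.admin_port):
            errors.append(f"AdminPort {cfg.admin_port} out of range")
        elif cfg.admin_port in listen_ports:
            errors.append(f"AdminPort {cfg.admin_port} collides with a ListenPort")
    # Rules 4-5 (subset): per-PLC tag well-formedness and tag TTL bounds.
    for plc in cfg.plcs:
        errors.extend(_check_tags(plc, cfg.allow_long_ttl))
    return errors
```

Because `validate` mutates nothing and only returns a list, one function can back both the startup path (refuse to start on a non-empty list) and the reload path (keep the current config).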
### Rejected-reload example