mbproxy: Wave 3 cleanups, docs, and test gaps from 2026-05-14 review

Closes the Wave 3 (cleanup) tier of codereviews/2026-05-14/RemediationPlan.md.
Tests: 378 pass / 0 fail (baseline 370 + 8 new W3 regression tests).

Code cleanups:
  * PlcMultiplexer: removed dead `elapsedMs` calculation (the actual EWMA
    conversion uses Stopwatch ticks two lines below).
  * UpstreamPipe.FillAsync: dropped the meaningless `firstRead && remaining
    == count ? false : false` ternary; both branches were `false`.
  * InFlightByKeyMap.TryAttachOrCreate (always returned `true`) renamed to
    `AttachOrCreate` and made `void`. Test sites updated to drop the dead
    `bool ok = ...; ok.ShouldBeTrue();` assertions.
  * BcdCodec.HasBadNibble promoted from private to internal; the duplicate
    copy in BcdPduPipeline removed and the call sites updated to
    `BcdCodec.HasBadNibble`.
  * PlcMultiplexer watchdog comment fixed: said "1-second floor", code uses
    100 ms. Now both agree.
  * StatusSnapshotBuilder: simplified the unreachable
    `RemoteEp?.ToString() ?? RemoteEp?.Address.ToString() ?? "?"` to
    `RemoteEp?.ToString() ?? "?"`.
  * Mbproxy.csproj: stale "deferred" Polly comment replaced with a real
    description of where Polly is used (BackendConnect + ListenerRecovery).

Doc updates:
  * README: added a callout about the unconventional 32-bit BCD wire format
    ("two base-10000 digits in CDAB", not standard binary CDAB Int32) so
    integrators using off-the-shelf clients learn about the silent-corruption
    hazard before configuring writes.
  * docs/design.md: clarified `cacheMissCount` and `coalescedMissCount`
    semantics — "miss" means "did not find a fresh entry / did not coalesce",
    NOT "produced a backend round-trip". Operators wanting actual backend
    traffic should compute `miss − coalescedHit − exception04`.
  * docs/Architecture/ResponseCache.md: documented the structural
    "skip invalidation while recovering" gating (no backend reader during
    recovery → no FC06/FC16 response → no invalidation).
  * docs/Operations/Configuration.md: noted that the Event Log sink is the
    custom EventLogBridge, not Serilog.Sinks.EventLog (W2.23 cached check).
  * docs/plan/README.md: added a Phase 12 row pointing at the remediation
    plan and linking out to codereviews/2026-05-14/.

Test additions (W3 high-value gaps):
  * BcdPduPipelineTests:
    - FC16_WriteStartsOnHighWord_Of32BitPair_PassesThroughRaw_WithPartialWarning
      (symmetric inverse of the existing low-side partial-overlap test).
    - FC03_Mixed_16Bit_32Bit_AndNonBcd_InOneRead_OnlyConfiguredSlotsRewritten
      (mixed-slot routing in a single FC03 read).
    - FC16_Response_PassesThroughUnchanged_RegardlessOfTagMap (FC16 response
      carries no register data; rewriter must pass through).
  * AdminEndpointTests:
    - NonGetMethod_AgainstAdminRoutes_Returns405 (Theory: POST/PUT/DELETE/
      PATCH against `/` and `/status.json` must return 405; guards against
      an accidental MapPost being added later).
  * HotReloadE2ETests:
    - E2E_TagListReload_OnCacheablePlc_EmitsCacheFlushedEvent (validates the
      W2.8 cache.flushed wiring end-to-end via the real FileSystemWatcher
      reload path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 06:06:52 -04:00
parent e66b17fe5f
commit 7ead3581ab
16 changed files with 236 additions and 49 deletions
@@ -303,6 +303,17 @@ cached read is still consistent with the device's actual state. Skipping
the invalidation matches reality — the write did not take effect, so the
read is not stale.
The skip is **structural**, not conditional. Cache invalidation only
fires inside the per-PLC backend reader task, after a non-exception
FC06/FC16 response arrives from the PLC. A `recovering` supervisor has
torn down its multiplexer and there is no backend reader, so no response
can land and the invalidation path is never entered. This is the
reasoning the code at `Proxy/Multiplexing/PlcMultiplexer.cs` documents
inline (W2.9). If a future change ever produced a write response off the
live backend (e.g. a mocked-response path), an explicit `Recovering`
check would need to be added at the invalidator call site to keep the
skip semantics correct.
## No Persistence
The cache is purely in-memory. Process restart wipes every entry. There
+2
View File
@@ -86,6 +86,8 @@ Every supported key under `Mbproxy:*`, populated to a representative default:
`Serilog` configuration is documented in [`./Troubleshooting.md`](./Troubleshooting.md) and lives outside the `Mbproxy` section.
> The Windows Event Log sink is **not** the standard `Serilog.Sinks.EventLog` package. It is a custom `EventLogBridge` (`src/Mbproxy/Diagnostics/EventLogBridge.cs`) that writes Error+ events to the `mbproxy` source under `Application` only when the service runs under the SCM. Event Log source registration is intentionally NOT attempted at runtime (the service account may not be admin); `install.ps1` registers the source at install time. Don't add `Serilog.Sinks.EventLog` — the bridge would duplicate every event. The bridge caches the source-exists check at construction (Phase 12 / W2.23), so a missing source produces no per-event registry traffic.
## `Mbproxy.AdminPort`
Port for the read-only HTTP status server. Binds to all interfaces on startup.
+2 -2
View File
@@ -136,7 +136,7 @@ Properties:
- **Hot-reloadable on/off.** `Mbproxy.Resilience.ReadCoalescing.Enabled` defaults to `true`. Flipping it to `false` at runtime leaves running coalesced entries to drain naturally; subsequent FC03/04 requests take the Phase-9 (one round-trip per upstream request) path.
- **Transparency contract preserved.** Each upstream client still sees its own original MBAP TxId on the response. The BCD rewriter runs once on the shared response buffer; per-party copies are only made when fan-out has more than one party.
Counter accounting balance (per snapshot): `coalescedHitCount + coalescedMissCount` equals the total FC03 + FC04 requests seen since the multiplexer was constructed. Both counters increment regardless of whether the coalescing feature is enabled — `coalescedHitCount` is 0 when disabled, but every read still increments `coalescedMissCount`.
Counter accounting balance (per snapshot): `coalescedHitCount + coalescedMissCount` equals the total FC03 + FC04 requests seen since the multiplexer was constructed. Both counters increment regardless of whether the coalescing feature is enabled — `coalescedHitCount` is 0 when disabled, but every read still increments `coalescedMissCount`. **Saturation paths (allocator full, duplicate-key race) also count as a miss** even though they produce no backend round-trip — the identity above is preserved by counting every entry into the coalescing path, not every backend send. Operators wanting "actual backend round-trips opened" should subtract the multiplexer's exception-04 frames produced from saturation.
## Response cache (Phase 11) — opt-in bounded-staleness cache
@@ -173,7 +173,7 @@ The BCD rewriter runs once on the cache-miss path (the backend reader task decod
### Counter accounting
- `cacheHitCount` — FC03/FC04 requests served from the cache.
- `cacheMissCount` — FC03/FC04 requests that fell through to the coalescing/backend path. (Cache hit + Cache miss = total FC03/FC04 requests that were cache-eligible, i.e. whose resolved TTL was > 0; reads whose effective TTL is 0 increment neither.)
- `cacheMissCount` — FC03/FC04 requests that fell through to the coalescing/backend path. (Cache hit + Cache miss = total FC03/FC04 requests that were cache-eligible, i.e. whose resolved TTL was > 0; reads whose effective TTL is 0 increment neither.) **A "miss" does NOT mean "produced a backend round-trip."** Two upstream peers issuing the same cache-eligible read both increment `cacheMissCount`; one then opens a backend round-trip and the other coalesces onto it via the InFlightByKey path (incrementing `coalescedHitCount`). Operators reading these counters as "backend reads opened" should use `cacheMissCount coalescedHitCount` as the lower bound on actual backend traffic.
- `cacheInvalidations` — count of cache entries invalidated by FC06/FC16 write responses.
- `cacheEntryCount` — point-in-time snapshot of `ResponseCache.Count` (Tier-2 memory-watch KPI).
- `cacheBytes` — point-in-time approximation of cached PDU bytes (Tier-2 memory-watch KPI).
+1
View File
@@ -20,6 +20,7 @@ Phase-by-phase implementation plan for the `mbproxy` service. Each phase is a se
| 09 | [TxId multiplexing](09-txid-multiplexing.md) — single backend connection per PLC (post-1.0 follow-on) | 04, 05, 07 | — |
| 10 | [Read coalescing](10-read-coalescing.md) — in-flight FC03/04 dedup (post-1.0 follow-on) | 09 | — |
| 11 | [Response cache](11-response-cache.md) — short-TTL post-response cache, bounded staleness (post-1.0; **design-contract pivot**) | 10 | — |
| 12 | Code-review remediation (2026-05-14) — Wave 1 critical, Wave 2 major, Wave 3 cleanup. Plan and findings in [`../../codereviews/2026-05-14/`](../../codereviews/2026-05-14/RemediationPlan.md). | 11 | — |
```
┌── 01 (sim) ──┐