f2c6669444
Eight area-focused reviews (BCD rewriter, multiplexer, response cache, supervisor + hot-reload, admin + diagnostics, hosting + options, test suite) plus an Overview that prioritises findings across areas, and a RemediationPlan that groups the work into three waves with per-item file:line citations and regression-test sketches. Findings call out: hot-reload tag-list/cache changes that don't reach the running multiplexer, a coalescing factory leak that hangs late attachers, backend-reader head-of-line block on a wedged upstream, stranded outbound frames after cascade, and ShutdownCoordinator double-stop ordering. Plus the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB, not standard binary), unreachable BcdValidationError.DuplicateAddress, mbproxy.cache.flushed event that's defined but never emitted, and missing test coverage for Cache.AllowLongTtl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
69 lines
12 KiB
Markdown
69 lines
12 KiB
Markdown
# BCD Rewriter Code Review
|
|
|
|
Scope: `src/Mbproxy/Bcd/*`, `src/Mbproxy/Proxy/{BcdPduPipeline,IPduPipeline,NoopPduPipeline,MbapFrame,RewriterLogEvents}.cs`, the corresponding `BcdTagOptions`/`BcdTagListOptions`, and the matching tests under `tests/Mbproxy.Tests/Bcd/*` and `tests/Mbproxy.Tests/Proxy/Bcd*Tests.cs`. Cross-checked against `docs/design.md` (BCD tag shape, Rewriter scope), `docs/Features/BcdRewriting.md`, and `DL260/dl205.md` (BCD encoding + CDAB word order).
|
|
|
|
## Summary
|
|
|
|
- The 16-bit and 32-bit BCD codec is mathematically correct and well-tested; bounds checks use the `(uint)x > Max` cast trick which correctly rejects negatives. Round-trip and CDAB ordering match the design contract.
|
|
- The 32-bit FC16 write/response handling on the wire is **NOT** standard CDAB binary int32 — it is a "two base-10000 digits in CDAB order" format (`value = high*10000 + low`). The codec and pipeline are internally consistent with the design contract, but this contract is unconventional and silently mis-translates any client that sends/expects a true binary int32 split CDAB. Worth a doc callout.
|
|
- `BcdTagMapBuilder` cannot detect duplicate addresses (the input collapses through a `Dictionary` first), so the `DuplicateAddress` validation path is effectively dead code; the test file even acknowledges this. The `OverlappingHighRegister` check, however, reports a single overlap as **two** errors (both directions of the pair).
|
|
- `MbapFrame` is intentionally a parser only — no bounds enforcement. Sound, but the proxy as a whole has no defence against a `length` field exceeding `MaxPduBodySize`; this needs to be enforced by the caller (out of scope for this review but worth flagging).
|
|
|
|
## Critical findings
|
|
|
|
None. No data-corruption or memory-safety bugs. No fence-post / off-by-one in the partial-overlap detection. The MBAP length invariant is preserved.
|
|
|
|
## Major findings
|
|
|
|
**M1. 32-bit FC16/FC03/FC04 wire format is not standard CDAB int32 (`BcdPduPipeline.cs:202-206, 373-379`).** The reconstruction `int binaryValue = clientHigh * 10_000 + clientLow;` and emission `decodedLow = decoded % 10_000; decodedHigh = decoded / 10_000;` together define a "two-base-10000-digits CDAB" format, **not** binary CDAB. This is **internally consistent** with `design.md:123` (`Decoded decimal = high * 10000 + low`), but it diverges from how every standard Modbus client (NModbus, FluentModbus, Wonderware DAServer) interprets a CDAB Int32. A client writing binary 12_345_678 with normal CDAB-int32 semantics sends `(low=0xBC4E=48206, high=0x00BC=188)`; the proxy then computes `188*10000 + 48206 = 2_128_206` and silently encodes that as BCD. This is a silent-corruption hazard that depends entirely on operator awareness of the contract. Recommend either documenting prominently in the *operator-facing* README (not just `design.md`/`BcdRewriting.md`) or reconsidering whether the wire format should be standard CDAB binary.
|
|
|
|
**M2. `BcdValidationError.DuplicateAddress` is unreachable (`BcdTagMapBuilder.cs:84-103`).** The working set is built by iterating into a `Dictionary<ushort, BcdTagOptions>` (lines 58-79), which deduplicates by key on the way in. The `seenAddresses.Add(addr)` call at line 98 then iterates *that* dictionary, where every key is already unique by construction. The test at `BcdTagMapBuilderTests.cs:101-137` documents this gap explicitly. The design's stated requirement ("reject duplicate addresses within a single PLC's resolved list" — `design.md:94`) is therefore not enforced; duplicate `Add` entries silently last-write-wins. Either drop the dead check, or detect duplicates before the dictionary collapse (iterate the lists, not the working dict).
|
|
|
|
**M3. `OverlappingHighRegister` double-reports each overlap (`BcdTagMapBuilder.cs:113-127`).** The collision check iterates every 32-bit tag and looks up the address-of-its-high-register. If two tags `(A=100, W=32)` and `(A=101, W=32)` collide on register 101, the loop reports the collision twice (once when processing the first 32-bit tag, once when processing the second). Worse, when a 32-bit pair `(A=100, W=32)` collides with a 16-bit `(A=101, W=16)`, the lookup hits the 16-bit tag and reports correctly — but if the *16-bit* tag's address happens to also be a 32-bit pair high register, it isn't separately checked from the 16-bit side. Recommend tracking reported pairs in a `HashSet<(ushort,ushort)>` to dedupe.
|
|
|
|
**M4. FC16 reconstruction can silently truncate/mutate values when `clientLow > 9999` (`BcdPduPipeline.cs:206-211`).** A client that writes `(clientLow=9999, clientHigh=9999)` (intending value 0x9999_9999 in binary CDAB) produces `binaryValue = 99_989_999`, which `Encode32` *accepts* and re-encodes as BCD `(low=0x9999, high=0x9998)`. The high word silently changed. The codec validates the value range, but the pipeline never validates that `clientLow < 10_000` and `clientHigh < 10_000` *before* running the formula. Recommend adding `if (clientLow > 9999 || clientHigh > 9999) → invalid_bcd, passthrough` before the multiplication.
|
|
|
|
**M5. FC16 `byteCount` field is not validated (`BcdPduPipeline.cs:159`).** `pdu[5]` is parsed but discarded. A malformed client claiming `qty=10` but sending only 4 bytes of register data is caught by the per-slot bounds check (line 191 / 234) and that slot is silently skipped. This is *safe* (no buffer overrun) but means non-BCD slots upstream of the bad slot still get processed and counted. Reasonable defensiveness, but a single explicit `if (pdu.Length < 6 + qty*2) return;` at the top would be cleaner and would prevent the misleading "successfully rewrote half a multi-write" outcome.
|
|
|
|
## Minor findings
|
|
|
|
**N1. `HasBadNibble` is duplicated** in `BcdCodec.cs:103-110` (private) and `BcdPduPipeline.cs:455-459`. They're identical. The pipeline's copy exists only to support the "which word was bad" reporting at `BcdPduPipeline.cs:364-365`. Either expose the codec's helper as `internal` or accept the duplication as a perf/locality choice and add a comment.
|
|
|
|
**N2. FC16 32-bit write `IncrementInvalidBcd` reports `clientLow` as the raw value (`BcdPduPipeline.cs:215-216`)**, not the reconstructed `binaryValue`. Operator looking at the log sees a 16-bit value when the actual problem is the 32-bit reconstruction. Use `binaryValue` (or both) in the message.
|
|
|
|
**N3. `BcdTagMap.s_emptyHits` is a singleton `Array.Empty<RangeHit>()` but `TryGetForRange` allocates a `List<RangeHit>(4)` on every hit (`BcdTagMap.cs:127`).** For a fleet doing 1000s of FC03/s on PLCs with even one BCD tag, that's a Gen-0 allocation per request. Consider stack-allocating a small `Span<RangeHit>` or a per-thread reusable list. *Probably* fine for current load; flag if memory profiling shows it.
|
|
|
|
**N4. `BcdTagMap.TryGetForRange` does a linear scan over `_map.Values` (`BcdTagMap.cs:107`)** — O(N) over all tags, not O(hits). For a PLC with ~10 BCD tags this is trivial; for a PLC with hundreds it could matter. The data structure is `FrozenDictionary<ushort, BcdTag>`; a sorted-by-address array with a binary search for the range start would be O(log N + hits). Fine for current scale.
|
|
|
|
**N5. `Decode32` decodes high then low (`BcdCodec.cs:95-96`)** — the comment at lines 92-94 acknowledges the choice but says it's "natural order". Consumers who care about which word's bad-nibble fires first will see *high*'s exception, but the rewriter then re-checks via `HasBadNibble(rawLow)` first. Slight inconsistency between codec error attribution and pipeline log attribution. Cosmetic.
|
|
|
|
**N6. `BcdTag.HighRegister` throws `InvalidOperationException` for 16-bit tags (`BcdTag.cs:44-48`)** — fine, but combined with the validator iterating `validated.Values` and only branching on `IsThirtyTwoBit`, the throw is currently unreachable. Future maintainers might call it carelessly — documented well.
|
|
|
|
**N7. FC16 response is not handled at all (`BcdPduPipeline.cs:289-291`)** — explicit `return`. Correct per the spec (response carries no register values), but the test suite doesn't cover an FC16 response, only the request. One xUnit `[Fact]` would close the gap.
|
|
|
|
**N8. Test gap: no test asserts that a `BcdCodec.Encode16(int.MinValue)` doesn't blow up unexpectedly.** Negative inputs are tested only with `-1`. The `(uint)value > Max16` cast handles `int.MinValue` correctly (it becomes `0x80000000`), but a direct test would lock the contract.
|
|
|
|
**N9. Test gap: no test exercises a write range that ends exactly mid-32-bit-pair on the high address (`qty=2, startAddress = tag.Address - 1`)** — i.e., the inverse of the `BcdPduPipelineTests.FC16_WritePartiallyOverlapping32BitPair_PassesThroughRaw_WithPartialWarning` test. Currently the test only covers partial-overlap-on-the-low-side.
|
|
|
|
**N10. Test gap: no test covers the **mixed 16/32/non-BCD in one FC03 read** scenario.** `FC03_Mixed_16BitBcd_And_NonBcd_InSameRead_OnlyBcdSlotRewritten` covers 16 + non-BCD only; no test mixes a 32-bit pair with non-BCD or with a 16-bit tag in one request.
|
|
|
|
## What looks good
|
|
|
|
- **Codec is genuinely allocation-free on the success path.** Pure value-type math, tuple return, and bounds checks via `(uint)cast` rather than two-sided comparisons. Clean.
|
|
- **MBAP transparency invariant** (length field never modified, always re-encoding ushort→ushort) is honored end-to-end. The dedicated `PduLength_IsNeverChangedByRewriting` test enforces it.
|
|
- **Exception-response detection** (`fc & 0x80`) at `BcdPduPipeline.cs:264` is correctly placed *before* the FC switch, so FC03/04 exception responses (`0x83`/`0x84`) are not accidentally treated as register-data responses. The exception-code counter routing is also correct.
|
|
- **Pass-through for unrelated FCs (01, 02, 05, 15)** is implicit and correct — the `switch` in `ProcessRequest` and `ProcessResponse` simply doesn't match them. Tests confirm.
|
|
- **Stateless pipeline + per-call `WithCurrentRequest` clone** (`PerPlcContext.cs:61`) is a clean way to pass per-PDU state without a thread-static or a lock. The single per-call clone allocation is pragmatic.
|
|
- **`FrozenDictionary` for the tag map** is the right .NET 8+ choice for a build-once / read-many lookup table.
|
|
- **`s_emptyHits` sentinel + early returns in `TryGetForRange`** mean the no-hit path (the overwhelming majority for non-BCD addresses on a 54-PLC fleet) is allocation-free.
|
|
- **Stable `EventName`s on `[LoggerMessage]` attributes** (`RewriterLogEvents.cs:14, 31, 47`) match the naming convention in `design.md` and survive a rename refactor of the C# method names.
|
|
- **FC06 echo decode rationale** (NModbus client validation) is documented inline at line 285-286 and the slot-counter is intentionally skipped to avoid double-counting (line 448-449). Good attention to detail.
|
|
- **Partial-overlap detection in `BcdPduPipeline.ProcessFc06Request`** correctly handles *both* the low-of-pair case (via `tag.IsThirtyTwoBit` after a hit) **and** the high-of-pair case (via `TryGetForRange(address, 1)` returning a negative `OffsetWords`). Subtle but right.
|
|
|
|
## Open questions
|
|
|
|
- **Is the "two-base-10000 digits in CDAB" wire format for 32-bit BCD an intentional contract or a bug?** The design doc, the codec, the pipeline, and the tests all agree. But every standard Modbus client treats CDAB as straight binary int32. If the upstream Wonderware/Historian configurations were built knowing this, fine; if they were configured assuming standard CDAB-int32, they're silently producing garbage on writes and reading garbage on reads any time the value exceeds `9_999`. (This is a question for the integrator, not the codebase.)
|
|
- **Should the `Cache.AllowLongTtl` validation reject `CacheTtlMs > 60_000` at the `BcdTagMapBuilder` layer or only at the options-validator layer?** The builder happily accepts any `CacheTtlMs >= 0` (line 107-110), so the long-TTL gate must live elsewhere. Couldn't tell from the in-scope files alone.
|
|
- **`BcdTagMapBuilder.Build` includes `OverlappingHighRegister` errors but still emits the conflicting tag in the map (lines 130-136)**. The XML doc says "Callers should treat any error as a fatal configuration problem at startup," but if a caller ignores errors, the map is *partially valid* with overlapping entries. Is the contract "reject loudly at startup, never run with errors" enforced by the host, or is there a code path where the service runs with `Errors.Count > 0`?
|
|
- **No FC23 (Read/Write Multiple Registers) handling.** The DL260 reference doesn't list FC23 as supported, so it's safely ignored, but if one ever shows up the proxy will pass it through unrewritten. Probably fine; worth confirming the DL fleet never advertises FC23.
|