Eight area-focused reviews (BCD rewriter, multiplexer, response cache, supervisor + hot-reload, admin + diagnostics, hosting + options, test suite) plus an Overview that prioritises findings across areas, and a RemediationPlan that groups the work into three waves with per-item file:line citations and regression-test sketches. Findings call out: hot-reload tag-list/cache changes that don't reach the running multiplexer, a coalescing factory leak that hangs late attachers, backend-reader head-of-line block on a wedged upstream, stranded outbound frames after cascade, and ShutdownCoordinator double-stop ordering. Plus the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB, not standard binary), unreachable BcdValidationError.DuplicateAddress, mbproxy.cache.flushed event that's defined but never emitted, and missing test coverage for Cache.AllowLongTtl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12 KiB
BCD Rewriter Code Review
Scope: src/Mbproxy/Bcd/*, src/Mbproxy/Proxy/{BcdPduPipeline,IPduPipeline,NoopPduPipeline,MbapFrame,RewriterLogEvents}.cs, the corresponding BcdTagOptions/BcdTagListOptions, and the matching tests under tests/Mbproxy.Tests/Bcd/* and tests/Mbproxy.Tests/Proxy/Bcd*Tests.cs. Cross-checked against docs/design.md (BCD tag shape, Rewriter scope), docs/Features/BcdRewriting.md, and DL260/dl205.md (BCD encoding + CDAB word order).
Summary
- The 16-bit and 32-bit BCD codec is mathematically correct and well-tested; bounds checks use the
(uint)x > Maxcast trick which correctly rejects negatives. Round-trip and CDAB ordering match the design contract. - The 32-bit FC16 write/response handling on the wire is NOT standard CDAB binary int32 — it is a "two base-10000 digits in CDAB order" format (
value = high*10000 + low). The codec and pipeline are internally consistent with the design contract, but this contract is unconventional and silently mis-translates any client that sends/expects a true binary int32 split CDAB. Worth a doc callout. BcdTagMapBuildercannot detect duplicate addresses (the input collapses through aDictionaryfirst), so theDuplicateAddressvalidation path is effectively dead code; the test file even acknowledges this. TheOverlappingHighRegistercheck, however, reports a single overlap as two errors (both directions of the pair).MbapFrameis intentionally a parser only — no bounds enforcement. Sound, but the proxy as a whole has no defence against alengthfield exceedingMaxPduBodySize; this needs to be enforced by the caller (out of scope for this review but worth flagging).
Critical findings
None. No data-corruption or memory-safety bugs. No fence-post / off-by-one in the partial-overlap detection. The MBAP length invariant is preserved.
Major findings
M1. 32-bit FC16/FC03/FC04 wire format is not standard CDAB int32 (BcdPduPipeline.cs:202-206, 373-379). The reconstruction int binaryValue = clientHigh * 10_000 + clientLow; and emission decodedLow = decoded % 10_000; decodedHigh = decoded / 10_000; together define a "two-base-10000-digits CDAB" format, not binary CDAB. This is internally consistent with design.md:123 (Decoded decimal = high * 10000 + low), but it diverges from how every standard Modbus client (NModbus, FluentModbus, Wonderware DAServer) interprets a CDAB Int32. A client writing binary 12_345_678 with normal CDAB-int32 semantics sends (low=0xBC4E=48206, high=0x00BC=188); the proxy then computes 188*10000 + 48206 = 2_128_206 and silently encodes that as BCD. This is a silent-corruption hazard that depends entirely on operator awareness of the contract. Recommend either documenting prominently in the operator-facing README (not just design.md/BcdRewriting.md) or reconsidering whether the wire format should be standard CDAB binary.
M2. BcdValidationError.DuplicateAddress is unreachable (BcdTagMapBuilder.cs:84-103). The working set is built by iterating into a Dictionary<ushort, BcdTagOptions> (lines 58-79), which deduplicates by key on the way in. The seenAddresses.Add(addr) call at line 98 then iterates that dictionary, where every key is already unique by construction. The test at BcdTagMapBuilderTests.cs:101-137 documents this gap explicitly. The design's stated requirement ("reject duplicate addresses within a single PLC's resolved list" — design.md:94) is therefore not enforced; duplicate Add entries silently last-write-wins. Either drop the dead check, or detect duplicates before the dictionary collapse (iterate the lists, not the working dict).
M3. OverlappingHighRegister double-reports each overlap (BcdTagMapBuilder.cs:113-127). The collision check iterates every 32-bit tag and looks up the address-of-its-high-register. If two tags (A=100, W=32) and (A=101, W=32) collide on register 101, the loop reports the collision twice (once when processing the first 32-bit tag, once when processing the second). Worse, when a 32-bit pair (A=100, W=32) collides with a 16-bit (A=101, W=16), the lookup hits the 16-bit tag and reports correctly — but if the 16-bit tag's address happens to also be a 32-bit pair high register, it isn't separately checked from the 16-bit side. Recommend tracking reported pairs in a HashSet<(ushort,ushort)> to dedupe.
M4. FC16 reconstruction can silently truncate/mutate values when clientLow > 9999 (BcdPduPipeline.cs:206-211). A client that writes (clientLow=9999, clientHigh=9999) (intending value 0x9999_9999 in binary CDAB) produces binaryValue = 99_989_999, which Encode32 accepts and re-encodes as BCD (low=0x9999, high=0x9998). The high word silently changed. The codec validates the value range, but the pipeline never validates that clientLow < 10_000 and clientHigh < 10_000 before running the formula. Recommend adding if (clientLow > 9999 || clientHigh > 9999) → invalid_bcd, passthrough before the multiplication.
M5. FC16 byteCount field is not validated (BcdPduPipeline.cs:159). pdu[5] is parsed but discarded. A malformed client claiming qty=10 but sending only 4 bytes of register data is caught by the per-slot bounds check (line 191 / 234) and that slot is silently skipped. This is safe (no buffer overrun) but means non-BCD slots upstream of the bad slot still get processed and counted. Reasonable defensiveness, but a single explicit if (pdu.Length < 6 + qty*2) return; at the top would be cleaner and would prevent the misleading "successfully rewrote half a multi-write" outcome.
Minor findings
N1. HasBadNibble is duplicated in BcdCodec.cs:103-110 (private) and BcdPduPipeline.cs:455-459. They're identical. The pipeline's copy exists only to support the "which word was bad" reporting at BcdPduPipeline.cs:364-365. Either expose the codec's helper as internal or accept the duplication as a perf/locality choice and add a comment.
N2. FC16 32-bit write IncrementInvalidBcd reports clientLow as the raw value (BcdPduPipeline.cs:215-216), not the reconstructed binaryValue. Operator looking at the log sees a 16-bit value when the actual problem is the 32-bit reconstruction. Use binaryValue (or both) in the message.
N3. BcdTagMap.s_emptyHits is a singleton Array.Empty<RangeHit>() but TryGetForRange allocates a List<RangeHit>(4) on every hit (BcdTagMap.cs:127). For a fleet doing 1000s of FC03/s on PLCs with even one BCD tag, that's a Gen-0 allocation per request. Consider stack-allocating a small Span<RangeHit> or a per-thread reusable list. Probably fine for current load; flag if memory profiling shows it.
N4. BcdTagMap.TryGetForRange does a linear scan over _map.Values (BcdTagMap.cs:107) — O(N) over all tags, not O(hits). For a PLC with ~10 BCD tags this is trivial; for a PLC with hundreds it could matter. The data structure is FrozenDictionary<ushort, BcdTag>; a sorted-by-address array with a binary search for the range start would be O(log N + hits). Fine for current scale.
N5. Decode32 decodes high then low (BcdCodec.cs:95-96) — the comment at lines 92-94 acknowledges the choice but says it's "natural order". Consumers who care about which word's bad-nibble fires first will see high's exception, but the rewriter then re-checks via HasBadNibble(rawLow) first. Slight inconsistency between codec error attribution and pipeline log attribution. Cosmetic.
N6. BcdTag.HighRegister throws InvalidOperationException for 16-bit tags (BcdTag.cs:44-48) — fine, but combined with the validator iterating validated.Values and only branching on IsThirtyTwoBit, the throw is currently unreachable. Future maintainers might call it carelessly — documented well.
N7. FC16 response is not handled at all (BcdPduPipeline.cs:289-291) — explicit return. Correct per the spec (response carries no register values), but the test suite doesn't cover an FC16 response, only the request. One xUnit [Fact] would close the gap.
N8. Test gap: no test asserts that a BcdCodec.Encode16(int.MinValue) doesn't blow up unexpectedly. Negative inputs are tested only with -1. The (uint)value > Max16 cast handles int.MinValue correctly (it becomes 0x80000000), but a direct test would lock the contract.
N9. Test gap: no test exercises a write range that ends exactly mid-32-bit-pair on the high address (qty=2, startAddress = tag.Address - 1) — i.e., the inverse of the BcdPduPipelineTests.FC16_WritePartiallyOverlapping32BitPair_PassesThroughRaw_WithPartialWarning test. Currently the test only covers partial-overlap-on-the-low-side.
N10. Test gap: no test covers the mixed 16/32/non-BCD in one FC03 read scenario. FC03_Mixed_16BitBcd_And_NonBcd_InSameRead_OnlyBcdSlotRewritten covers 16 + non-BCD only; no test mixes a 32-bit pair with non-BCD or with a 16-bit tag in one request.
What looks good
- Codec is genuinely allocation-free on the success path. Pure value-type math, tuple return, and bounds checks via
(uint)castrather than two-sided comparisons. Clean. - MBAP transparency invariant (length field never modified, always re-encoding ushort→ushort) is honored end-to-end. The dedicated
PduLength_IsNeverChangedByRewritingtest enforces it. - Exception-response detection (
fc & 0x80) atBcdPduPipeline.cs:264is correctly placed before the FC switch, so FC03/04 exception responses (0x83/0x84) are not accidentally treated as register-data responses. The exception-code counter routing is also correct. - Pass-through for unrelated FCs (01, 02, 05, 15) is implicit and correct — the
switchinProcessRequestandProcessResponsesimply doesn't match them. Tests confirm. - Stateless pipeline + per-call
WithCurrentRequestclone (PerPlcContext.cs:61) is a clean way to pass per-PDU state without a thread-static or a lock. The single per-call clone allocation is pragmatic. FrozenDictionaryfor the tag map is the right .NET 8+ choice for a build-once / read-many lookup table.s_emptyHitssentinel + early returns inTryGetForRangemean the no-hit path (the overwhelming majority for non-BCD addresses on a 54-PLC fleet) is allocation-free.- Stable
EventNames on[LoggerMessage]attributes (RewriterLogEvents.cs:14, 31, 47) match the naming convention indesign.mdand survive a rename refactor of the C# method names. - FC06 echo decode rationale (NModbus client validation) is documented inline at line 285-286 and the slot-counter is intentionally skipped to avoid double-counting (line 448-449). Good attention to detail.
- Partial-overlap detection in
BcdPduPipeline.ProcessFc06Requestcorrectly handles both the low-of-pair case (viatag.IsThirtyTwoBitafter a hit) and the high-of-pair case (viaTryGetForRange(address, 1)returning a negativeOffsetWords). Subtle but right.
Open questions
- Is the "two-base-10000 digits in CDAB" wire format for 32-bit BCD an intentional contract or a bug? The design doc, the codec, the pipeline, and the tests all agree. But every standard Modbus client treats CDAB as straight binary int32. If the upstream Wonderware/Historian configurations were built knowing this, fine; if they were configured assuming standard CDAB-int32, they're silently producing garbage on writes and reading garbage on reads any time the value exceeds
9_999. (This is a question for the integrator, not the codebase.) - Should the
Cache.AllowLongTtlvalidation rejectCacheTtlMs > 60_000at theBcdTagMapBuilderlayer or only at the options-validator layer? The builder happily accepts anyCacheTtlMs >= 0(line 107-110), so the long-TTL gate must live elsewhere. Couldn't tell from the in-scope files alone. BcdTagMapBuilder.BuildincludesOverlappingHighRegistererrors but still emits the conflicting tag in the map (lines 130-136). The XML doc says "Callers should treat any error as a fatal configuration problem at startup," but if a caller ignores errors, the map is partially valid with overlapping entries. Is the contract "reject loudly at startup, never run with errors" enforced by the host, or is there a code path where the service runs withErrors.Count > 0?- No FC23 (Read/Write Multiple Registers) handling. The DL260 reference doesn't list FC23 as supported, so it's safely ignored, but if one ever shows up the proxy will pass it through unrewritten. Probably fine; worth confirming the DL fleet never advertises FC23.