#148 introduced auto-prohibited coalesced ranges that persist for the
driver lifetime. Long-running deployments with transient PLC permission
changes (firmware update unlocking a previously-protected register,
operator reconfiguring the device) had no way to clear a stale prohibition
short of an operator restart.
Adds an opt-in background loop that re-probes each prohibition periodically:
- ModbusDriverOptions.AutoProhibitReprobeInterval (TimeSpan?, default null
= disabled). Set to e.g. TimeSpan.FromHours(1) to opt in.
- _autoProhibited refactored from HashSet<key> to Dictionary<key, DateTime>
so each entry tracks its last failure / last re-probe timestamp.
- ReprobeLoopAsync runs on the same Task.Run pattern as ProbeLoopAsync;
cancelled by ShutdownAsync. Each tick snapshots the prohibition set
and issues a one-shot coalesced read per range. Successful re-probes
drop the prohibition; failed ones bump the timestamp + leave the
prohibition in place.
- Communication failures during re-probe (transport-level) are treated
the same as PLC-exception failures: the prohibition stays in place, but it
remains eligible for the next re-probe rather than becoming "permanent",
since transports recover. The driver-instance health surface picks up the
failure separately.
- ShutdownAsync explicitly clears the prohibition set so a manual restart
via ReinitializeAsync starts with a clean slate (matches the old
"restart to clear" semantics).
- Factory DTO + JSON binding extended with AutoProhibitReprobeMs field.
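
Rough sketch of the re-probe pass described in the list above, for orientation only. It is not the shipped code: ProhibitedRange, ReprobeSketch, and the probe delegate are hypothetical stand-ins, and the real driver issues a coalesced Modbus read where this sketch calls the delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the driver's own types.
public enum ModbusRegion { HoldingRegisters, InputRegisters }
public readonly record struct ProhibitedRange(byte Unit, ModbusRegion Region, ushort Start, ushort End);

public sealed class ReprobeSketch
{
    private readonly object _lock = new();
    // Value = time of the last failure or last failed re-probe.
    private readonly Dictionary<ProhibitedRange, DateTime> _autoProhibited = new();
    // Assumed probe: returns true when the one-shot coalesced read succeeds.
    private readonly Func<ProhibitedRange, CancellationToken, Task<bool>> _probe;

    public ReprobeSketch(Func<ProhibitedRange, CancellationToken, Task<bool>> probe) => _probe = probe;

    public int Count { get { lock (_lock) return _autoProhibited.Count; } }

    public void Record(ProhibitedRange range)
    {
        lock (_lock) _autoProhibited[range] = DateTime.UtcNow;
    }

    // One pass: snapshot the set, probe each range outside the lock.
    public async Task RunReprobeOnceAsync(CancellationToken ct)
    {
        ProhibitedRange[] snapshot;
        lock (_lock) snapshot = _autoProhibited.Keys.ToArray();

        foreach (var range in snapshot)
        {
            ct.ThrowIfCancellationRequested();
            bool healthy;
            try { healthy = await _probe(range, ct); }
            catch (OperationCanceledException) { throw; }
            // PLC exception or transport failure: the prohibition stays.
            catch (Exception) { healthy = false; }

            lock (_lock)
            {
                if (healthy) _autoProhibited.Remove(range);
                else if (_autoProhibited.ContainsKey(range)) _autoProhibited[range] = DateTime.UtcNow;
            }
        }
    }

    // Background loop: wait one interval, run a pass, repeat until cancelled.
    public async Task ReprobeLoopAsync(TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, ct);
                await RunReprobeOnceAsync(ct);
            }
        }
        catch (OperationCanceledException) { /* shutdown requested */ }
    }
}
```

Snapshotting the keys before probing keeps the lock from being held across network I/O, at the cost of possibly probing a range that was cleared in the meantime; the ContainsKey check keeps such a range from being re-added.
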
Tests (2 new, additive to the 3 in ModbusCoalescingAutoRecoveryTests):
- Reprobe_Clears_Prohibition_When_Range_Becomes_Healthy — protected
register at 102 records prohibition; clearing the simulated protection
+ invoking the re-probe drops the prohibition.
- Reprobe_Leaves_Prohibition_When_Range_Is_Still_Bad — re-probe on a
still-failing range keeps the prohibition in place.
Tests use a new internal RunReprobeOnceForTestAsync helper to fire one
re-probe pass synchronously, so the suite doesn't have to wait on the
background timer. The timer itself is not tested directly; the helper
shares the actual re-probe code path with the loop, and the loop is
started through the InitializeAsync wire-up.
234 + 2 = 236 unit tests green.
Pre-#148 behaviour: a coalesced FC03/FC04 read that crossed a write-only or
PLC-fault register marked every member tag Bad until the operator manually
flagged the offending tag with CoalesceProhibited. Healthy tags around the
hole stayed broken indefinitely.
Post-#148: two-stage recovery, no operator intervention needed.
1. Same-scan fallback: when a coalesced read fails with a Modbus exception
(IllegalDataAddress, SlaveDeviceFailure, etc.), the planner does NOT
mark members handled. The per-tag fallback in the same scan reads each
member individually — non-protected members surface Good values
immediately, and only the actual protected register stays Bad.
2. Cross-scan prohibition: the failed range (Unit, Region, Start, End) is
recorded in a per-driver `_autoProhibited` set. On subsequent scans the
planner checks each candidate merge against the set and refuses to
re-form any block that overlaps a known-bad range. Net effect: after one
scan with a failure, the protected range stays in per-tag mode for the
rest of the driver lifetime while ranges around it keep coalescing normally.
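
Illustrative sketch of the cross-scan overlap check from step 2. The member names follow this message, but the signatures, the inclusive End bound, and the AutoProhibitSketch wrapper class are assumptions rather than the driver's actual code.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the driver's region enum.
public enum ModbusRegion { HoldingRegisters, InputRegisters }

public sealed class AutoProhibitSketch
{
    private readonly object _autoProhibitedLock = new();
    private readonly HashSet<(byte Unit, ModbusRegion Region, ushort Start, ushort End)> _autoProhibited = new();

    public int AutoProhibitedRangeCount
    {
        get { lock (_autoProhibitedLock) return _autoProhibited.Count; }
    }

    // Called when a coalesced read fails with a Modbus exception.
    public void RecordAutoProhibition(byte unit, ModbusRegion region, ushort start, ushort end)
    {
        lock (_autoProhibitedLock) _autoProhibited.Add((unit, region, start, end));
    }

    // True when [start, end] overlaps any recorded range on the same
    // unit and region. Both bounds are assumed inclusive.
    public bool RangeIsAutoProhibited(byte unit, ModbusRegion region, ushort start, ushort end)
    {
        lock (_autoProhibitedLock)
        {
            foreach (var p in _autoProhibited)
            {
                if (p.Unit == unit && p.Region == region && start <= p.End && end >= p.Start)
                    return true;
            }
            return false;
        }
    }
}
```

With 100..104 recorded, a candidate merge of 103..110 on the same unit and region is refused, while 200..202 is still allowed to coalesce.
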
Communication failures (timeouts, socket drops) are NOT auto-prohibited —
they're transport-level, not structural. The same coalesced read can succeed
once the transport recovers; recording it as "permanently bad" would defeat
coalescing for the whole driver instance.
Auto-prohibition state lives for the driver lifetime and clears on
ReinitializeAsync (operator restart). A periodic re-probe is a follow-up if
deployments need it without a restart.
Implementation:
- Added `_autoProhibited` HashSet<(byte, ModbusRegion, ushort, ushort)> +
`_autoProhibitedLock` on ModbusDriver.
- `RangeIsAutoProhibited(unit, region, start, end)` overlap check called
from the planner when forming blocks.
- `RecordAutoProhibition(...)` called from the catch (ModbusException)
branch.
- The catch (Exception) branch (non-Modbus failures) keeps the pre-#148
"mark all Bad in this scan, don't auto-prohibit" behaviour.
- Internal `AutoProhibitedRangeCount` accessor for tests.
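
Minimal sketch of how the two catch branches above divide the work. The ModbusException class here is a stand-in defined for the example, and ReadOutcome / CoalescedReadSketch are hypothetical names; only the branch ordering is taken from this message.

```csharp
using System;

// Stand-in for the driver's Modbus exception type (assumed shape).
public sealed class ModbusException : Exception
{
    public byte ExceptionCode { get; }
    public ModbusException(byte code) : base($"Modbus exception 0x{code:X2}") => ExceptionCode = code;
}

public enum ReadOutcome
{
    Good,
    FallBackPerTagAndProhibit, // structural: PLC rejected the range
    MarkAllBadThisScan         // transport-level: do not auto-prohibit
}

public static class CoalescedReadSketch
{
    public static ReadOutcome Classify(Action coalescedRead)
    {
        try
        {
            coalescedRead();
            return ReadOutcome.Good;
        }
        catch (ModbusException)
        {
            // Leave members unhandled so the per-tag fallback runs in the
            // same scan, and record the range as auto-prohibited.
            return ReadOutcome.FallBackPerTagAndProhibit;
        }
        catch (Exception)
        {
            // Timeouts, socket drops, etc.: pre-#148 behaviour.
            return ReadOutcome.MarkAllBadThisScan;
        }
    }
}
```

The more specific catch has to come first; C# rejects a catch (ModbusException) placed after catch (Exception) at compile time.
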
Tests (3 new ModbusCoalescingAutoRecoveryTests):
- First_Failure_Falls_Back_To_PerTag_Same_Scan — three tags around a
protected register at 102: T100 + T104 surface Good values via the
per-tag fallback in the SAME scan; T102 surfaces the exception.
- Second_Scan_Skips_Coalesced_Read_Of_Prohibited_Range — confirms scan 2
doesn't re-attempt the failed merge (no FC03 with quantity > 1 at the
prohibited start).
- Tags_Outside_Prohibited_Range_Still_Coalesce — separate cluster at HR
200..202 keeps coalescing normally even after the 100..104 cluster is
prohibited.
234/234 unit tests green.
Follow-ups intentionally NOT shipped (smaller, independent changes):
- Bisection-style range narrowing: currently the prohibition range is the
full failed block; the planner doesn't try to find the exact protected
register. The operator-visible diagnostic and the prohibition stay
correct, only coarser than a narrowed range would be.
- Periodic re-probe to clear stale prohibitions.
- Surface auto-prohibited ranges through GetHostStatuses or a new
diagnostic so the Admin UI can show what's been auto-isolated.