Commit Graph

2 Commits

Author SHA1 Message Date
Joseph Doherty
9e4aae350b Task #151 — Modbus coalescing: periodic re-probe of auto-prohibitions
#148 introduced auto-prohibited coalesced ranges that persist for the
driver lifetime. Long-running deployments with transient PLC permission
changes (firmware update unlocking a previously-protected register,
operator reconfiguring the device) had no recovery short of operator
restart.

Adds an opt-in background loop that re-probes each prohibition periodically:

- ModbusDriverOptions.AutoProhibitReprobeInterval (TimeSpan?, default null
  = disabled). Set to e.g. TimeSpan.FromHours(1) to opt in.
- _autoProhibited refactored from HashSet<key> to Dictionary<key, DateTime>
  so each entry tracks its last failure / last re-probe timestamp.
- ReprobeLoopAsync runs on the same Task.Run pattern as ProbeLoopAsync;
  cancelled by ShutdownAsync. Each tick snapshots the prohibition set
  and issues a one-shot coalesced read per range. Successful re-probes
  drop the prohibition; failed ones bump the timestamp + leave the
  prohibition in place.
- Communication failures during re-probe (transport-level) are treated
  the same as PLC-exception failures — the prohibition stays, but isn't
  upgraded to "permanent" since transports recover. The driver-instance
  health surface picks up the failure separately.
- ShutdownAsync explicitly clears the prohibition set so a manual restart
  via ReinitializeAsync starts with a clean slate (matches the old
  "restart to clear" semantics).
- Factory DTO + JSON binding extended with AutoProhibitReprobeMs field.

Tests (2 new, additive to the 3 in ModbusCoalescingAutoRecoveryTests):
- Reprobe_Clears_Prohibition_When_Range_Becomes_Healthy — protected
  register at 102 records prohibition; clearing the simulated protection
  + invoking the re-probe drops the prohibition.
- Reprobe_Leaves_Prohibition_When_Range_Is_Still_Bad — re-probe on a
  still-failing range keeps the prohibition in place.

Tests use a new internal RunReprobeOnceForTestAsync helper to fire one
re-probe pass synchronously, so the suite doesn't have to wait on the
background timer (the loop's timer behaviour is exercised implicitly via
the InitializeAsync wire-up + the synchronous helper sharing the actual
re-probe code path).

234 + 2 = 236 unit tests green.
2026-04-25 01:12:48 -04:00
Joseph Doherty
3b0e093002 Task #148 — Modbus block-coalescing: auto-recover from protected register holes
Pre-#148 behaviour: a coalesced FC03/FC04 read that crossed a write-only or
PLC-fault register marked every member tag Bad until the operator manually
flagged the offending tag with CoalesceProhibited. Healthy tags around the
hole stayed broken indefinitely.

Post-#148: two-stage recovery, no operator intervention needed.

1. Same-scan fallback: when a coalesced read fails with a Modbus exception
   (IllegalDataAddress, SlaveDeviceFailure, etc.), the planner does NOT
   mark members handled. The per-tag fallback in the same scan reads each
   member individually — non-protected members surface Good values
   immediately, and only the actual protected register stays Bad.

2. Cross-scan prohibition: the failed range (Unit, Region, Start, End) is
   recorded in a per-driver `_autoProhibited` set. On subsequent scans the
   planner checks each candidate merge against the set and refuses to
   re-form any block that overlaps a known-bad range. Net effect: after one
   scan with a failure, the protected range goes "per-tag mode" indefinitely
   while ranges around it keep coalescing normally.

Communication failures (timeouts, socket drops) are NOT auto-prohibited —
they're transport-level, not structural. The same coalesced read can succeed
once the transport recovers; recording it as "permanently bad" would defeat
coalescing for the whole driver instance.

Auto-prohibition state lives for the driver lifetime and clears on
ReinitializeAsync (operator restart). A periodic re-probe is a follow-up if
deployments need it without a restart.

Implementation:
- Added `_autoProhibited` HashSet<(byte, ModbusRegion, ushort, ushort)> +
  `_autoProhibitedLock` on ModbusDriver.
- `RangeIsAutoProhibited(unit, region, start, end)` overlap check called
  from the planner when forming blocks.
- `RecordAutoProhibition(...)` called from the catch (ModbusException)
  branch.
- The catch (Exception) branch (non-Modbus failures) keeps the pre-#148
  "mark all Bad in this scan, don't auto-prohibit" behaviour.
- Internal `AutoProhibitedRangeCount` accessor for tests.

Tests (3 new ModbusCoalescingAutoRecoveryTests):
- First_Failure_Falls_Back_To_PerTag_Same_Scan — three tags around a
  protected register at 102: T100 + T104 surface Good values via the
  per-tag fallback in the SAME scan; T102 surfaces the exception.
- Second_Scan_Skips_Coalesced_Read_Of_Prohibited_Range — confirms scan 2
  doesn't re-attempt the failed merge (no FC03 with quantity > 1 at the
  prohibited start).
- Tags_Outside_Prohibited_Range_Still_Coalesce — separate cluster at HR
  200..202 keeps coalescing normally even after the 100..104 cluster is
  prohibited.

234/234 unit tests green.

Follow-ups intentionally NOT shipped (smaller, independent changes):
- Bisection-style range narrowing — currently the prohibition range is the
  full failed block; the planner doesn't try to find the exact protected
  register. Operator-visible diagnostic + prohibition stays correct.
- Periodic re-probe to clear stale prohibitions.
- Surface auto-prohibited ranges through GetHostStatuses or a new
  diagnostic so the Admin UI can show what's been auto-isolated.
2026-04-25 01:01:42 -04:00