Files
lmxopcua/src/ZB.MOM.WW.OtOpcUa.Core/Stability/ScheduledRecycleScheduler.cs
Joseph Doherty 1d9008e354 Phase 6.1 Stream B.3/B.4/B.5 — MemoryRecycle + ScheduledRecycleScheduler + demand-aware WedgeDetector
Closes out Stream B per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Core.Abstractions:
- IDriverSupervisor — process-level supervisor contract a Tier C driver's
  out-of-process topology provides (Galaxy Proxy/Supervisor implements this in
  a follow-up Driver.Galaxy wiring PR). Concerns: DriverInstanceId + RecycleAsync.
  Tier A/B drivers don't implement this; Stream B code asserts tier == C before
  ever calling it.

Core.Stability:
- MemoryRecycle — companion to MemoryTracking. On HardBreach, invokes the
  supervisor IFF tier == C AND a supervisor is wired. Tier A/B HardBreach logs
  a promotion-to-Tier-C recommendation and returns false. Soft/None/Warming
  never triggers a recycle at any tier.
- ScheduledRecycleScheduler — Tier C opt-in periodic recycler per decision #67.
  Ctor throws for Tier A/B (structural guard — scheduled recycle on an
  in-process driver would kill every OPC UA session and every co-hosted
  driver). TickAsync(now) advances the schedule by one interval per fire;
  RequestRecycleNowAsync drives an ad-hoc recycle without shifting the cron.
- WedgeDetector — demand-aware per decision #147. Classify(state, demand, now)
  returns:
    * NotApplicable when driver state != Healthy
    * Idle when Healthy + no pending work (bulkhead=0 && monitored=0 && historic=0)
    * Healthy when Healthy + pending work + progress within threshold
    * Faulted when Healthy + pending work + no progress within threshold
  Threshold clamps to min 60 s. DemandSignal.HasPendingWork ORs the three counters.
  The three false-wedge cases the plan calls out all stay Healthy: idle
  subscription-only, slow historian backfill making progress, write-only burst
  with drained bulkhead.

Tests (22 new, all pass):
- MemoryRecycleTests (7): Tier C hard-breach requests recycle; Tier A/B
  hard-breach never requests; Tier C without supervisor no-ops; soft-breach
  at every tier never requests; None/Warming never request.
- ScheduledRecycleSchedulerTests (6): ctor throws for A/B; zero/negative
  interval throws; tick before due no-ops; tick at/after due fires once and
  advances; RequestRecycleNow fires immediately without shifting schedule;
  multiple fires across ticks advance one interval each.
- WedgeDetectorTests (9): threshold clamp to 60 s; unhealthy driver always
  NotApplicable; idle subscription stays Idle; pending+fresh progress stays
  Healthy; pending+stale progress is Faulted; MonitoredItems active but no
  publish is Faulted; MonitoredItems active with fresh publish stays Healthy;
  historian backfill with fresh progress stays Healthy; write-only burst with
  empty bulkhead is Idle; HasPendingWork theory for any non-zero counter.

Full solution dotnet test: 989 passing (baseline 906, +83 for Phase 6.1 so far).
Pre-existing Client.CLI Subscribe flake unchanged.

Stream B complete. Next up: Stream C (health endpoints + structured logging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:03:18 -04:00

87 lines
4.1 KiB
C#

using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Core.Stability;
/// <summary>
/// Tier C opt-in periodic-recycle driver per <c>docs/v2/plan.md</c> decision #67.
/// A tick method advanced by the caller (fed by a background timer in prod; by test clock
/// in unit tests) decides whether the configured interval has elapsed and, if so, drives the
/// supplied <see cref="IDriverSupervisor"/> to recycle the Host.
/// </summary>
/// <remarks>
/// Tier A/B drivers MUST NOT use this class — scheduled recycle for in-process drivers would
/// kill every OPC UA session and every co-hosted driver. The ctor throws when constructed
/// with any tier other than C to make the misuse structurally impossible.
///
/// <para>Keeps no background thread of its own — callers invoke <see cref="TickAsync"/> on
/// their ambient scheduler tick (Phase 6.1 Stream C's health-endpoint host runs one). That
/// decouples the unit under test from wall-clock time and thread-pool scheduling.</para>
/// </remarks>
public sealed class ScheduledRecycleScheduler
{
private readonly TimeSpan _recycleInterval;
private readonly IDriverSupervisor _supervisor;
private readonly ILogger<ScheduledRecycleScheduler> _logger;
private DateTime _nextRecycleUtc;
/// <summary>
/// Construct the scheduler for a Tier C driver. Throws if <paramref name="tier"/> isn't C.
/// </summary>
/// <param name="tier">Driver tier; must be <see cref="DriverTier.C"/>.</param>
/// <param name="recycleInterval">Interval between recycles (e.g. 7 days).</param>
/// <param name="startUtc">Anchor time; next recycle fires at <paramref name="startUtc"/> + <paramref name="recycleInterval"/>.</param>
/// <param name="supervisor">Supervisor that performs the actual recycle.</param>
/// <param name="logger">Diagnostic sink.</param>
public ScheduledRecycleScheduler(
DriverTier tier,
TimeSpan recycleInterval,
DateTime startUtc,
IDriverSupervisor supervisor,
ILogger<ScheduledRecycleScheduler> logger)
{
if (tier != DriverTier.C)
throw new ArgumentException(
$"ScheduledRecycleScheduler is Tier C only (got {tier}). " +
"In-process drivers must not use scheduled recycle; see decisions #74 and #145.",
nameof(tier));
if (recycleInterval <= TimeSpan.Zero)
throw new ArgumentException("RecycleInterval must be positive.", nameof(recycleInterval));
_recycleInterval = recycleInterval;
_supervisor = supervisor;
_logger = logger;
_nextRecycleUtc = startUtc + recycleInterval;
}
/// <summary>Next scheduled recycle UTC. Advances by <see cref="RecycleInterval"/> on each fire.</summary>
public DateTime NextRecycleUtc => _nextRecycleUtc;
/// <summary>Recycle interval this scheduler was constructed with.</summary>
public TimeSpan RecycleInterval => _recycleInterval;
/// <summary>
/// Tick the scheduler forward. If <paramref name="utcNow"/> is past
/// <see cref="NextRecycleUtc"/>, requests a recycle from the supervisor and advances
/// <see cref="NextRecycleUtc"/> by exactly one interval. Returns true when a recycle fired.
/// </summary>
public async Task<bool> TickAsync(DateTime utcNow, CancellationToken cancellationToken)
{
if (utcNow < _nextRecycleUtc)
return false;
_logger.LogInformation(
"Scheduled recycle due for Tier C driver {DriverId} at {Now:o}; advancing next to {Next:o}.",
_supervisor.DriverInstanceId, utcNow, _nextRecycleUtc + _recycleInterval);
await _supervisor.RecycleAsync("Scheduled periodic recycle", cancellationToken).ConfigureAwait(false);
_nextRecycleUtc += _recycleInterval;
return true;
}
/// <summary>Request an immediate recycle outside the schedule (e.g. MemoryRecycle hard-breach escalation).</summary>
public Task RequestRecycleNowAsync(string reason, CancellationToken cancellationToken) =>
_supervisor.RecycleAsync(reason, cancellationToken);
}