Closes out Stream B per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Core.Abstractions:
- IDriverSupervisor: process-level supervisor contract that a Tier C driver's
  out-of-process topology provides (Galaxy Proxy/Supervisor implements this in a
  follow-up Driver.Galaxy wiring PR). Concerns: DriverInstanceId + RecycleAsync.
  Tier A/B drivers don't implement this; Stream B code asserts tier == C before
  ever calling it.

Core.Stability:
- MemoryRecycle: companion to MemoryTracking. On HardBreach, invokes the
  supervisor IFF tier == C AND a supervisor is wired. Tier A/B HardBreach logs a
  promotion-to-Tier-C recommendation and returns false. Soft/None/Warming never
  triggers a recycle at any tier.
- ScheduledRecycleScheduler: Tier C opt-in periodic recycler per decision #67.
  Ctor throws for Tier A/B (structural guard: scheduled recycle on an in-process
  driver would kill every OPC UA session and every co-hosted driver).
  TickAsync(now) advances the schedule by one interval per fire;
  RequestRecycleNowAsync drives an ad-hoc recycle without shifting the schedule.
- WedgeDetector: demand-aware per decision #147. Classify(state, demand, now)
  returns:
  * NotApplicable when driver state != Healthy
  * Idle when Healthy + no pending work (bulkhead=0 && monitored=0 && historic=0)
  * Healthy when Healthy + pending work + progress within threshold
  * Faulted when Healthy + pending work + no progress within threshold
  Threshold clamps to min 60 s. DemandSignal.HasPendingWork ORs the three
  counters. None of the three false-wedge cases the plan calls out classifies as
  Faulted: idle subscription-only, slow historian backfill making progress,
  write-only burst with drained bulkhead.

Tests (22 new, all pass):
- MemoryRecycleTests (7): Tier C hard-breach requests recycle; Tier A/B
  hard-breach never requests; Tier C without supervisor no-ops; soft-breach at
  every tier never requests; None/Warming never request.
- ScheduledRecycleSchedulerTests (6): ctor throws for A/B; zero/negative interval
  throws; tick before due no-ops; tick at/after due fires once and advances;
  RequestRecycleNow fires immediately without shifting schedule; multiple fires
  across ticks advance one interval each.
- WedgeDetectorTests (9): threshold clamp to 60 s; unhealthy driver always
  NotApplicable; idle subscription stays Idle; pending+fresh progress stays
  Healthy; pending+stale progress is Faulted; MonitoredItems active but no
  publish is Faulted; MonitoredItems active with fresh publish stays Healthy;
  historian backfill with fresh progress stays Healthy; write-only burst with
  empty bulkhead is Idle; HasPendingWork theory for any non-zero counter.

Full solution dotnet test: 989 passing (baseline 906, +83 for Phase 6.1 so far).
Pre-existing Client.CLI Subscribe flake unchanged.

Stream B complete. Next up: Stream C (health endpoints + structured logging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
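The WedgeDetector rules in the commit message can be sketched roughly as follows. This is an illustration only, not the committed implementation: `DriverState`, `WedgeVerdict`, the `DemandSignal` field names, the `LastProgressUtc` timestamp, and the inclusive `<=` comparison are all assumptions made for the example.

```csharp
using System;

// Hypothetical stand-ins; the real types are not shown in this commit view.
public enum DriverState { Healthy, Degraded, Faulted }
public enum WedgeVerdict { NotApplicable, Idle, Healthy, Faulted }

// Assumed shape: the three demand counters plus the time of the last observed progress.
public readonly record struct DemandSignal(
    int BulkheadDepth,
    int ActiveMonitoredItems,
    int QueuedHistoryReads,
    DateTime LastProgressUtc)
{
    // ORs the three counters, as the commit message describes.
    public bool HasPendingWork =>
        BulkheadDepth > 0 || ActiveMonitoredItems > 0 || QueuedHistoryReads > 0;
}

public sealed class WedgeDetectorSketch
{
    private static readonly TimeSpan MinThreshold = TimeSpan.FromSeconds(60);

    public TimeSpan Threshold { get; }

    // Anything below 60 s is clamped up to 60 s.
    public WedgeDetectorSketch(TimeSpan threshold) =>
        Threshold = threshold < MinThreshold ? MinThreshold : threshold;

    public WedgeVerdict Classify(DriverState state, DemandSignal demand, DateTime nowUtc)
    {
        // Wedge detection only applies to a driver that reports Healthy.
        if (state != DriverState.Healthy) return WedgeVerdict.NotApplicable;

        // No demand means silence is expected, so it is not a wedge.
        if (!demand.HasPendingWork) return WedgeVerdict.Idle;

        // Assumption: "within threshold" is inclusive.
        return nowUtc - demand.LastProgressUtc <= Threshold
            ? WedgeVerdict.Healthy
            : WedgeVerdict.Faulted;
    }
}
```

Checking demand before progress is what keeps an idle driver from being flagged: a stale progress timestamp only matters once at least one counter is non-zero.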
66 lines
3.1 KiB
C#
using Microsoft.Extensions.Logging;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;

namespace ZB.MOM.WW.OtOpcUa.Core.Stability;

/// <summary>
/// Tier C only process-recycle companion to <see cref="MemoryTracking"/>. On a
/// <see cref="MemoryTrackingAction.HardBreach"/> signal, invokes the supplied
/// <see cref="IDriverSupervisor"/> to restart the out-of-process Host.
/// </summary>
/// <remarks>
/// Per <c>docs/v2/plan.md</c> decisions #74 and #145. Tier A/B hard-breach on an in-process
/// driver would kill every OPC UA session and every co-hosted driver, so for Tier A/B this
/// class logs a <b>promotion-to-Tier-C recommendation</b> and does NOT invoke any supervisor.
/// A future tier-migration workflow acts on the recommendation.
/// </remarks>
public sealed class MemoryRecycle
{
    private readonly DriverTier _tier;
    private readonly IDriverSupervisor? _supervisor;
    private readonly ILogger<MemoryRecycle> _logger;

    public MemoryRecycle(DriverTier tier, IDriverSupervisor? supervisor, ILogger<MemoryRecycle> logger)
    {
        _tier = tier;
        _supervisor = supervisor;
        _logger = logger;
    }

    /// <summary>
    /// Handle a <see cref="MemoryTracking"/> classification for the driver. For Tier C with a
    /// wired supervisor, <c>HardBreach</c> triggers <see cref="IDriverSupervisor.RecycleAsync"/>.
    /// All other combinations are no-ops with respect to process state (soft breaches + Tier A/B
    /// hard breaches just log).
    /// </summary>
    /// <returns>True when a recycle was requested; false otherwise.</returns>
    public async Task<bool> HandleAsync(MemoryTrackingAction action, long footprintBytes, CancellationToken cancellationToken)
    {
        switch (action)
        {
            case MemoryTrackingAction.SoftBreach:
                _logger.LogWarning(
                    "Memory soft-breach on driver {DriverId}: footprint={Footprint:N0} bytes, tier={Tier}. Surfaced to Admin; no action.",
                    _supervisor?.DriverInstanceId ?? "(unknown)", footprintBytes, _tier);
                return false;

            case MemoryTrackingAction.HardBreach when _tier == DriverTier.C && _supervisor is not null:
                _logger.LogError(
                    "Memory hard-breach on Tier C driver {DriverId}: footprint={Footprint:N0} bytes. Requesting supervisor recycle.",
                    _supervisor.DriverInstanceId, footprintBytes);
                await _supervisor.RecycleAsync($"Memory hard-breach: {footprintBytes} bytes", cancellationToken).ConfigureAwait(false);
                return true;

            case MemoryTrackingAction.HardBreach:
                _logger.LogError(
                    "Memory hard-breach on Tier {Tier} in-process driver {DriverId}: footprint={Footprint:N0} bytes. " +
                    "Recommending promotion to Tier C; NOT auto-killing (decisions #74, #145).",
                    _tier, _supervisor?.DriverInstanceId ?? "(unknown)", footprintBytes);
                return false;

            default:
                return false;
        }
    }
}
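
The recycle decision in `HandleAsync` can be exercised without a real out-of-process Host by substituting a recording supervisor. The sketch below is self-contained and hypothetical: the enums and the `IDriverSupervisor` interface are stand-ins that mirror only the members used above (their real definitions live in Core.Abstractions and may differ), and `RecordingSupervisor` and `MemoryRecycleSketch` are illustrative names, not part of the codebase. Logging is omitted.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins; member names are taken from the file above, everything else is assumed.
public enum DriverTier { A, B, C }
public enum MemoryTrackingAction { None, Warming, SoftBreach, HardBreach }

public interface IDriverSupervisor
{
    string DriverInstanceId { get; }
    Task RecycleAsync(string reason, CancellationToken cancellationToken);
}

// Test double: records each recycle request instead of restarting a process.
public sealed class RecordingSupervisor : IDriverSupervisor
{
    public RecordingSupervisor(string driverInstanceId) => DriverInstanceId = driverInstanceId;

    public string DriverInstanceId { get; }
    public List<string> Reasons { get; } = new();

    public Task RecycleAsync(string reason, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        Reasons.Add(reason);
        return Task.CompletedTask;
    }
}

public static class MemoryRecycleSketch
{
    // Same decision as the switch in HandleAsync, with the logging removed.
    public static async Task<bool> HandleAsync(
        DriverTier tier,
        IDriverSupervisor? supervisor,
        MemoryTrackingAction action,
        long footprintBytes,
        CancellationToken cancellationToken)
    {
        // Only a hard breach can ever lead to a recycle.
        if (action != MemoryTrackingAction.HardBreach) return false;

        // Tier A/B, or Tier C with no supervisor wired: never recycle.
        if (tier != DriverTier.C || supervisor is null) return false;

        await supervisor
            .RecycleAsync($"Memory hard-breach: {footprintBytes} bytes", cancellationToken)
            .ConfigureAwait(false);
        return true;
    }
}
```

With this double, a test can assert both the boolean result and that `Reasons` stays empty for every Tier A/B or non-hard-breach combination.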