Files
lmxopcua/src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/ApplyLeaseRegistry.cs
Joseph Doherty 483f55557c Phase 6.3 Stream B + Stream D (core) — ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry
Lands the pure-logic heart of Phase 6.3. OPC UA node wiring (Stream C),
RedundancyCoordinator topology loader (Stream A), Admin UI + metrics (Stream E),
and client interop tests (Stream F) are follow-up work — tracked as
tasks #145-150.

New Server.Redundancy sub-namespace:

- ServiceLevelCalculator — pure 8-state matrix per decision #154. Inputs:
  role, selfHealthy, peerUa/HttpHealthy, applyInProgress, recoveryDwellMet,
  topologyValid, operatorMaintenance. Output: OPC UA Part 5 §6.3.34 Byte.
  Reserved bands (0=Maintenance, 1=NoData, 2=InvalidTopology) override
  everything; operational bands occupy 30..255.
  Key invariants:
    * Authoritative-Primary = 255, Authoritative-Backup = 100.
    * Isolated-Primary = 230 (retains authority with peer down).
    * Isolated-Backup = 80 (does NOT auto-promote — non-transparent model).
    * Primary-Mid-Apply = 200, Backup-Mid-Apply = 50; apply dominates
      peer-unreachable per Stream C.4 integration expectation.
    * Recovering-Primary = 180, Recovering-Backup = 30.
    * Standalone treats healthy as Authoritative-Primary (no peer concept).
- ServiceLevelBand enum — labels every numeric band for logs + Admin UI.
  Values match the calculator table exactly; compliance script asserts
  drift detection.
- RecoveryStateManager — holds Recovering band until (dwell ≥ 60s default)
  AND (one publish witness observed). Re-fault resets both gates so a
  flapping node doesn't shortcut through recovery twice.
- ApplyLeaseRegistry — keyed on (ConfigGenerationId, PublishRequestId) per
  decision #162. BeginApplyLease returns an IAsyncDisposable so every exit
  path (success, exception, cancellation, dispose-twice) closes the lease.
  ApplyMaxDuration watchdog (10 min default) via PruneStale tick forces
  close after a crashed publisher so ServiceLevel can't stick at mid-apply.

Tests (40 new, all pass):
- ServiceLevelCalculatorTests (27): reserved bands override; self-unhealthy
  → NoData; invalid topology demotes both nodes to 2; authoritative primary
  255; backup 100; isolated primary 230 retains authority; isolated backup
  80 does not promote; http-only unreachable triggers isolated; mid-apply
  primary 200; mid-apply backup 50; apply dominates peer-unreachable; recovering
  primary 180; recovering backup 30; standalone treats healthy as 255;
  classify round-trips every band including Unknown sentinel.
- RecoveryStateManagerTests (6): never-faulted auto-meets dwell; faulted-only
  returns true (semantics-doc test — coordinator short-circuits on
  selfHealthy=false); recovered without witness never meets; witness without
  dwell never meets; witness + dwell-elapsed meets; re-fault resets.
- ApplyLeaseRegistryTests (7): empty registry not-in-progress; begin+dispose
  closes; dispose on exception still closes; dispose twice safe; concurrent
  leases isolated; watchdog closes stale; watchdog leaves recent alone.

Full solution dotnet test: 1137 passing (Phase 6.2 shipped at 1097, Phase 6.3
B + D core = +40 = 1137). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:56:34 -04:00

86 lines
3.2 KiB
C#

using System.Collections.Concurrent;
namespace ZB.MOM.WW.OtOpcUa.Server.Redundancy;
/// <summary>
/// Tracks in-progress publish-generation apply leases keyed on
/// <c>(ConfigGenerationId, PublishRequestId)</c>. Per decision #162 a sealed lease pattern
/// ensures <see cref="IsApplyInProgress"/> reflects every exit path (success / exception /
/// cancellation) because the IAsyncDisposable returned by <see cref="BeginApplyLease"/>
/// decrements unconditionally.
/// </summary>
/// <remarks>
/// A watchdog loop calls <see cref="PruneStale"/> periodically with the configured
/// <see cref="ApplyMaxDuration"/>; any lease older than that is force-closed so a crashed
/// publisher can't pin the node at <see cref="ServiceLevelBand.PrimaryMidApply"/>.
/// </remarks>
public sealed class ApplyLeaseRegistry
{
private readonly ConcurrentDictionary<LeaseKey, DateTime> _leases = new();
private readonly TimeProvider _timeProvider;
public TimeSpan ApplyMaxDuration { get; }
public ApplyLeaseRegistry(TimeSpan? applyMaxDuration = null, TimeProvider? timeProvider = null)
{
ApplyMaxDuration = applyMaxDuration ?? TimeSpan.FromMinutes(10);
_timeProvider = timeProvider ?? TimeProvider.System;
}
/// <summary>
/// Register a new lease. Returns an <see cref="IAsyncDisposable"/> whose disposal
/// decrements the registry; use <c>await using</c> in the caller so every exit path
/// closes the lease.
/// </summary>
public IAsyncDisposable BeginApplyLease(long generationId, Guid publishRequestId)
{
var key = new LeaseKey(generationId, publishRequestId);
_leases[key] = _timeProvider.GetUtcNow().UtcDateTime;
return new LeaseScope(this, key);
}
/// <summary>True when at least one apply lease is currently open.</summary>
public bool IsApplyInProgress => !_leases.IsEmpty;
/// <summary>Current open-lease count — diagnostics only.</summary>
public int OpenLeaseCount => _leases.Count;
/// <summary>Force-close any lease older than <see cref="ApplyMaxDuration"/>. Watchdog tick.</summary>
/// <returns>Number of leases the watchdog closed on this tick.</returns>
public int PruneStale()
{
var now = _timeProvider.GetUtcNow().UtcDateTime;
var closed = 0;
foreach (var kv in _leases)
{
if (now - kv.Value > ApplyMaxDuration && _leases.TryRemove(kv.Key, out _))
closed++;
}
return closed;
}
private void Release(LeaseKey key) => _leases.TryRemove(key, out _);
private readonly record struct LeaseKey(long GenerationId, Guid PublishRequestId);
private sealed class LeaseScope : IAsyncDisposable
{
private readonly ApplyLeaseRegistry _owner;
private readonly LeaseKey _key;
private int _disposed;
public LeaseScope(ApplyLeaseRegistry owner, LeaseKey key)
{
_owner = owner;
_key = key;
}
public ValueTask DisposeAsync()
{
if (Interlocked.Exchange(ref _disposed, 1) == 0)
_owner.Release(_key);
return ValueTask.CompletedTask;
}
}
}