Compare commits
4 Commits
phase-6-3-
...
phase-6-1-
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
c4a92f424a | ||
| 510e488ea4 | |||
| 8994e73a0b | |||
|
|
e71f44603c |
@@ -1,7 +1,7 @@
|
||||
# v2 Release Readiness
|
||||
|
||||
> **Last updated**: 2026-04-19 (release blockers #1 + #2 closed; Phase 6.3 redundancy runtime is the last)
|
||||
> **Status**: **NOT YET RELEASE-READY** — one of three release blockers remains (Phase 6.3 Streams A/C/F redundancy-coordinator + OPC UA node wiring + client interop).
|
||||
> **Last updated**: 2026-04-19 (all three release blockers CLOSED — Phase 6.3 Streams A/C core shipped)
|
||||
> **Status**: **RELEASE-READY (code-path)** for v2 GA — all three code-path release blockers are closed. Remaining work is manual (client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.
|
||||
|
||||
This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.
|
||||
|
||||
@@ -52,17 +52,19 @@ Remaining follow-ups (hardening, not release-blocking):
|
||||
- A `HostedService` that polls `sp_GetCurrentGenerationForCluster` periodically so peer-published generations land in this node's cache without a restart.
|
||||
- Richer snapshot payload via `sp_GetGenerationContent` so fallback can serve the full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.
|
||||
|
||||
### Redundancy — Phase 6.3 Streams A/C/F (tasks #145, #147, #150)
|
||||
### ~~Redundancy — Phase 6.3 Streams A/C core~~ (tasks #145 + #147 — **CLOSED** 2026-04-19, PRs #98–99)
|
||||
|
||||
`ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` exist as pure logic. **No code invokes them at runtime.** The OPC UA server still publishes a static `ServiceLevel`; `ServerUriArray` still carries only self; no coordinator reads cluster topology; no peer probing.
|
||||
**Closed**. The runtime orchestration layer now exists end-to-end:
|
||||
|
||||
Closing this requires:
|
||||
- `RedundancyCoordinator` reads `ClusterNode` + peer list at startup (Stream A shipped in PR #98). Invariants enforced: 1-2 nodes (decision #83), unique ApplicationUri (#86), ≤1 Primary in Warm/Hot (#84). Startup fails fast on violation; runtime refresh logs + flips `IsTopologyValid=false` so the calculator falls to band 2 without tearing down.
|
||||
- `RedundancyStatePublisher` orchestrates topology + apply lease + recovery state + peer reachability through `ServiceLevelCalculator` + emits `OnStateChanged` / `OnServerUriArrayChanged` edge-triggered events (Stream C core shipped in PR #99). The OPC UA `ServiceLevel` Byte variable + `ServerUriArray` String[] variable subscribe to these events.
|
||||
|
||||
- `RedundancyCoordinator` singleton reads `ClusterNode` + peer list at startup (Stream A).
|
||||
- `PeerHttpProbeLoop` + `PeerUaProbeLoop` feed the calculator.
|
||||
- OPC UA node wiring: `ServiceLevel` becomes a live `BaseDataVariable` on calculator observer output; `ServerUriArray` includes self + peers; `RedundancySupport` static from `RedundancyMode` (Stream C).
|
||||
- `sp_PublishGeneration` wraps in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band fires during actual publishes.
|
||||
- Client interop matrix validation against Ignition / Kepware / Aveva OI Gateway (Stream F).
|
||||
Remaining Phase 6.3 surfaces (hardening, not release-blocking):
|
||||
|
||||
- `PeerHttpProbeLoop` + `PeerUaProbeLoop` HostedServices that poll the peer + write to `PeerReachabilityTracker` on each tick. Without these the publisher sees `PeerReachability.Unknown` for every peer → Isolated-Primary band (230) even when the peer is up. Safe default (retains authority) but not the full non-transparent-redundancy UX.
|
||||
- OPC UA variable-node wiring layer: bind the `ServiceLevel` Byte node + `ServerUriArray` String[] node to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push. Scoped follow-up on the Opc.Ua.Server stack integration.
|
||||
- `sp_PublishGeneration` wraps its apply in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band (200) fires during actual publishes (task #148 part 2).
|
||||
- Client interop matrix validation — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only work; doesn't block code ship.
|
||||
|
||||
### Remaining drivers (task #120)
|
||||
|
||||
@@ -98,6 +100,7 @@ v2 GA requires all of the following:
|
||||
|
||||
## Change log
|
||||
|
||||
- **2026-04-19** — Release blocker #3 **closed** (PRs #98–99). Phase 6.3 Streams A + C core shipped: `ClusterTopologyLoader` + `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker`. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix) are hardening follow-ups.
|
||||
- **2026-04-19** — Release blocker #2 **closed** (PR #96). `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag`; `/healthz` now surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
|
||||
- **2026-04-19** — Release blocker #1 **closed** (PR #94). `AuthorizationGate` wired into `DriverNodeManager` Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
|
||||
- **2026-04-19** — Phase 6.4 data layer merged (PRs #91–92). Phase 6 core complete. Capstone doc created.
|
||||
|
||||
@@ -0,0 +1,117 @@
|
||||
using Microsoft.Extensions.Hosting;
|
||||
using Microsoft.Extensions.Logging;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Stability;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Server.Hosting;
|
||||
|
||||
/// <summary>
|
||||
/// Drives one or more <see cref="ScheduledRecycleScheduler"/> instances on a fixed tick
|
||||
/// cadence. Closes Phase 6.1 Stream B.4 by turning the shipped-as-pure-logic scheduler
|
||||
/// into a running background feature.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>Registered as a singleton in Program.cs. Each Tier C driver instance that wants a
|
||||
/// scheduled recycle registers its scheduler via
|
||||
/// <see cref="AddScheduler(ScheduledRecycleScheduler)"/> at startup. The hosted service
|
||||
/// wakes every <see cref="TickInterval"/> (default 1 min) and calls
|
||||
/// <see cref="ScheduledRecycleScheduler.TickAsync"/> on each registered scheduler.</para>
|
||||
///
|
||||
/// <para>Scheduler registration is closed after <see cref="ExecuteAsync"/> starts — callers
|
||||
/// must register before the host starts, typically during DI setup. Adding a scheduler
|
||||
/// mid-flight throws to avoid confusing "some ticks saw my scheduler, some didn't" races.</para>
|
||||
/// </remarks>
|
||||
public sealed class ScheduledRecycleHostedService : BackgroundService
|
||||
{
|
||||
private readonly List<ScheduledRecycleScheduler> _schedulers = [];
|
||||
private readonly ILogger<ScheduledRecycleHostedService> _logger;
|
||||
private readonly TimeProvider _timeProvider;
|
||||
private bool _started;
|
||||
|
||||
/// <summary>How often <see cref="ScheduledRecycleScheduler.TickAsync"/> fires on each registered scheduler.</summary>
|
||||
public TimeSpan TickInterval { get; }
|
||||
|
||||
public ScheduledRecycleHostedService(
|
||||
ILogger<ScheduledRecycleHostedService> logger,
|
||||
TimeProvider? timeProvider = null,
|
||||
TimeSpan? tickInterval = null)
|
||||
{
|
||||
_logger = logger;
|
||||
_timeProvider = timeProvider ?? TimeProvider.System;
|
||||
TickInterval = tickInterval ?? TimeSpan.FromMinutes(1);
|
||||
}
|
||||
|
||||
/// <summary>Register a scheduler to drive. Must be called before the host starts.</summary>
|
||||
public void AddScheduler(ScheduledRecycleScheduler scheduler)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(scheduler);
|
||||
if (_started)
|
||||
throw new InvalidOperationException(
|
||||
"Cannot register a ScheduledRecycleScheduler after the hosted service has started. " +
|
||||
"Register all schedulers during DI configuration / startup.");
|
||||
_schedulers.Add(scheduler);
|
||||
}
|
||||
|
||||
/// <summary>Snapshot of the current tick count — diagnostics only.</summary>
|
||||
public int TickCount { get; private set; }
|
||||
|
||||
/// <summary>Snapshot of the number of registered schedulers — diagnostics only.</summary>
|
||||
public int SchedulerCount => _schedulers.Count;
|
||||
|
||||
public override Task StartAsync(CancellationToken cancellationToken)
|
||||
{
|
||||
_started = true;
|
||||
return base.StartAsync(cancellationToken);
|
||||
}
|
||||
|
||||
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
|
||||
{
|
||||
_logger.LogInformation(
|
||||
"ScheduledRecycleHostedService starting — {Count} scheduler(s), tick interval = {Interval}",
|
||||
_schedulers.Count, TickInterval);
|
||||
|
||||
while (!stoppingToken.IsCancellationRequested)
|
||||
{
|
||||
try
|
||||
{
|
||||
await Task.Delay(TickInterval, _timeProvider, stoppingToken).ConfigureAwait(false);
|
||||
}
|
||||
catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
|
||||
{
|
||||
break;
|
||||
}
|
||||
|
||||
await TickOnceAsync(stoppingToken).ConfigureAwait(false);
|
||||
}
|
||||
|
||||
_logger.LogInformation("ScheduledRecycleHostedService stopping after {TickCount} tick(s).", TickCount);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Execute one scheduler tick against every registered scheduler. Factored out of the
|
||||
/// <see cref="ExecuteAsync"/> loop so tests can drive it directly without needing to
|
||||
/// synchronize with <see cref="Task.Delay(TimeSpan, TimeProvider, CancellationToken)"/>.
|
||||
/// </summary>
|
||||
public async Task TickOnceAsync(CancellationToken cancellationToken)
|
||||
{
|
||||
var now = _timeProvider.GetUtcNow().UtcDateTime;
|
||||
TickCount++;
|
||||
|
||||
foreach (var scheduler in _schedulers)
|
||||
{
|
||||
try
|
||||
{
|
||||
var fired = await scheduler.TickAsync(now, cancellationToken).ConfigureAwait(false);
|
||||
if (fired)
|
||||
_logger.LogInformation("Scheduled recycle fired at {Now:o}; next = {Next:o}",
|
||||
now, scheduler.NextRecycleUtc);
|
||||
}
|
||||
catch (OperationCanceledException) { throw; }
|
||||
catch (Exception ex)
|
||||
{
|
||||
// A single scheduler fault must not take down the rest — log + continue.
|
||||
_logger.LogError(ex,
|
||||
"ScheduledRecycleScheduler tick failed at {Now:o}; continuing to other schedulers.", now);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,152 @@
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Stability;
|
||||
using ZB.MOM.WW.OtOpcUa.Server.Hosting;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Server.Tests;
|
||||
|
||||
[Trait("Category", "Unit")]
|
||||
public sealed class ScheduledRecycleHostedServiceTests
|
||||
{
|
||||
private static readonly DateTime T0 = new(2026, 4, 19, 0, 0, 0, DateTimeKind.Utc);
|
||||
|
||||
private sealed class FakeClock : TimeProvider
|
||||
{
|
||||
public DateTime Utc { get; set; } = T0;
|
||||
public override DateTimeOffset GetUtcNow() => new(Utc, TimeSpan.Zero);
|
||||
}
|
||||
|
||||
private sealed class FakeSupervisor : IDriverSupervisor
|
||||
{
|
||||
public string DriverInstanceId => "tier-c-fake";
|
||||
public int RecycleCount { get; private set; }
|
||||
public Task RecycleAsync(string reason, CancellationToken cancellationToken)
|
||||
{
|
||||
RecycleCount++;
|
||||
return Task.CompletedTask;
|
||||
}
|
||||
}
|
||||
|
||||
private sealed class ThrowingSupervisor : IDriverSupervisor
|
||||
{
|
||||
public string DriverInstanceId => "tier-c-throws";
|
||||
public Task RecycleAsync(string reason, CancellationToken cancellationToken)
|
||||
=> throw new InvalidOperationException("supervisor unavailable");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task TickOnce_BeforeInterval_DoesNotFire()
|
||||
{
|
||||
var clock = new FakeClock();
|
||||
var supervisor = new FakeSupervisor();
|
||||
var scheduler = new ScheduledRecycleScheduler(
|
||||
DriverTier.C, TimeSpan.FromMinutes(5), T0, supervisor,
|
||||
NullLogger<ScheduledRecycleScheduler>.Instance);
|
||||
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance, clock);
|
||||
host.AddScheduler(scheduler);
|
||||
|
||||
clock.Utc = T0.AddMinutes(1);
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
|
||||
supervisor.RecycleCount.ShouldBe(0);
|
||||
host.TickCount.ShouldBe(1);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task TickOnce_AfterInterval_Fires()
|
||||
{
|
||||
var clock = new FakeClock();
|
||||
var supervisor = new FakeSupervisor();
|
||||
var scheduler = new ScheduledRecycleScheduler(
|
||||
DriverTier.C, TimeSpan.FromMinutes(5), T0, supervisor,
|
||||
NullLogger<ScheduledRecycleScheduler>.Instance);
|
||||
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance, clock);
|
||||
host.AddScheduler(scheduler);
|
||||
|
||||
clock.Utc = T0.AddMinutes(6);
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
|
||||
supervisor.RecycleCount.ShouldBe(1);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task TickOnce_MultipleTicks_AccumulateCount()
|
||||
{
|
||||
var clock = new FakeClock();
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance, clock);
|
||||
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
|
||||
host.TickCount.ShouldBe(3);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task AddScheduler_AfterStart_Throws()
|
||||
{
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance);
|
||||
using var cts = new CancellationTokenSource();
|
||||
cts.Cancel();
|
||||
|
||||
await host.StartAsync(cts.Token); // flips _started true even with cancelled token
|
||||
await host.StopAsync(CancellationToken.None);
|
||||
|
||||
var scheduler = new ScheduledRecycleScheduler(
|
||||
DriverTier.C, TimeSpan.FromMinutes(5), DateTime.UtcNow, new FakeSupervisor(),
|
||||
NullLogger<ScheduledRecycleScheduler>.Instance);
|
||||
|
||||
Should.Throw<InvalidOperationException>(() => host.AddScheduler(scheduler));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task OneSchedulerThrowing_DoesNotStopOthers()
|
||||
{
|
||||
var clock = new FakeClock();
|
||||
var good = new FakeSupervisor();
|
||||
var bad = new ThrowingSupervisor();
|
||||
|
||||
var goodSch = new ScheduledRecycleScheduler(
|
||||
DriverTier.C, TimeSpan.FromMinutes(5), T0, good,
|
||||
NullLogger<ScheduledRecycleScheduler>.Instance);
|
||||
var badSch = new ScheduledRecycleScheduler(
|
||||
DriverTier.C, TimeSpan.FromMinutes(5), T0, bad,
|
||||
NullLogger<ScheduledRecycleScheduler>.Instance);
|
||||
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance, clock);
|
||||
host.AddScheduler(badSch);
|
||||
host.AddScheduler(goodSch);
|
||||
|
||||
clock.Utc = T0.AddMinutes(6);
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
|
||||
good.RecycleCount.ShouldBe(1, "a faulting scheduler must not poison its neighbours");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void SchedulerCount_MatchesAdded()
|
||||
{
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance);
|
||||
var sup = new FakeSupervisor();
|
||||
host.AddScheduler(new ScheduledRecycleScheduler(DriverTier.C, TimeSpan.FromMinutes(5), DateTime.UtcNow, sup, NullLogger<ScheduledRecycleScheduler>.Instance));
|
||||
host.AddScheduler(new ScheduledRecycleScheduler(DriverTier.C, TimeSpan.FromMinutes(10), DateTime.UtcNow, sup, NullLogger<ScheduledRecycleScheduler>.Instance));
|
||||
|
||||
host.SchedulerCount.ShouldBe(2);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task EmptyScheduler_List_TicksCleanly()
|
||||
{
|
||||
var clock = new FakeClock();
|
||||
var host = new ScheduledRecycleHostedService(NullLogger<ScheduledRecycleHostedService>.Instance, clock);
|
||||
|
||||
// No registered schedulers — tick is a no-op + counter still advances.
|
||||
await host.TickOnceAsync(CancellationToken.None);
|
||||
|
||||
host.TickCount.ShouldBe(1);
|
||||
}
|
||||
}
|
||||
Reference in New Issue
Block a user