feat(redundancy): gate alarm historization on Primary (A2, defensive — actor currently unfed)

HistorianAdapterActor now subscribes to the redundancy-state DPS topic,
caches the local node's RedundancyRole, and SKIPS the durable-sink enqueue
when the local node is Secondary or Detached. Unknown/null role default-writes
so single-node deploys and the boot window never silently drop historization.
GetStatus stays ungated.

PREMISE: verified the actor is registered but FED BY NOTHING in production —
there is no AlarmHistorianEvent producer and nothing resolves its registry key
to Tell it. This is a FORWARD-LOOKING / DEFENSIVE guard, not a fix for a live
double-write: the moment a per-node feeder lands (engine -> historian, expected
as a per-node cluster broadcast like the alerts topic), only the Primary will
write to the durable sink (exactly-once across all alarm sources).

Mirrors the sibling A1 treatment of ScriptedAlarmHostActor (06c4155) and
OpcUaPublishActor's redundancy-state handler. localNode threaded through
HistorianAdapterActor.Props from ServiceCollectionExtensions (roleInfo.LocalNode).
Joseph Doherty
2026-06-11 08:57:41 -04:00
parent 9ac9f0b7a9
commit 0742946108
3 changed files with 206 additions and 4 deletions
@@ -0,0 +1,138 @@
using Akka.Actor;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Commons.Messages.Redundancy;
using ZB.MOM.WW.OtOpcUa.Commons.Types;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
using ZB.MOM.WW.OtOpcUa.Runtime.Historian;
using ZB.MOM.WW.OtOpcUa.Runtime.Tests.Harness;

namespace ZB.MOM.WW.OtOpcUa.Runtime.Tests.Historian;

/// <summary>
/// TestKit coverage for <see cref="HistorianAdapterActor"/>'s Primary-only historization gate.
/// The actor caches this node's <see cref="RedundancyRole"/> from the <c>redundancy-state</c>
/// topic and SKIPS the sink enqueue when the local node is Secondary/Detached so a future
/// per-node feeder writes exactly once across the warm-redundant pair. Unknown/null role
/// default-writes (single-node deploys + the boot window must never silently drop historization).
/// </summary>
public sealed class HistorianAdapterActorTests : RuntimeActorTestBase
{
    /// <summary>The local node id the gating tests construct the adapter with.</summary>
    private static readonly NodeId LocalNode = new("node-A");

    /// <summary>A short window we allow the fire-and-forget enqueue to land within.</summary>
    private static readonly TimeSpan Settle = TimeSpan.FromMilliseconds(500);

    /// <summary>Thread-safe fake sink that records every <see cref="EnqueueAsync"/> call.</summary>
    private sealed class RecordingSink : IAlarmHistorianSink
    {
        private int _count;

        /// <summary>The number of <see cref="EnqueueAsync"/> calls observed so far.</summary>
        public int EnqueueCount => Volatile.Read(ref _count);

        /// <inheritdoc />
        public Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken)
        {
            Interlocked.Increment(ref _count);
            return Task.CompletedTask;
        }

        /// <inheritdoc />
        public HistorianSinkStatus GetStatus() => new(
            QueueDepth: 0,
            DeadLetterDepth: 0,
            LastDrainUtc: null,
            LastSuccessUtc: null,
            LastError: null,
            DrainState: HistorianDrainState.Idle);
    }

    /// <summary>Builds a minimal <see cref="AlarmHistorianEvent"/> for the gate tests.</summary>
    private static AlarmHistorianEvent SampleEvent() => new(
        AlarmId: "alm-1",
        EquipmentPath: "Area/Line/Equip",
        AlarmName: "HiHi",
        AlarmTypeName: "LimitAlarm",
        Severity: AlarmSeverity.High,
        EventKind: "Activated",
        Message: "level high",
        User: "system",
        Comment: null,
        TimestampUtc: DateTime.UtcNow);

    /// <summary>Tell <paramref name="actor"/> a <see cref="RedundancyStateChanged"/> snapshot marking
    /// <see cref="LocalNode"/> with <paramref name="role"/> so the gate observes the local role.</summary>
    private static void TellRedundancyRole(IActorRef actor, RedundancyRole role) =>
        actor.Tell(new RedundancyStateChanged(
            new[]
            {
                new NodeRedundancyState(
                    NodeId: LocalNode,
                    Role: role,
                    IsClusterLeader: role == RedundancyRole.Primary,
                    IsRoleLeaderForDriver: role == RedundancyRole.Primary,
                    AsOfUtc: DateTime.UtcNow),
            },
            CorrelationId.NewId()));

    /// <summary>Default-write (T1): before any redundancy snapshot (the boot window, and the steady
    /// state for single-node deploys) the adapter MUST historize. Constructed WITH a localNode but
    /// no snapshot sent, so the cached role is unknown ⇒ default-write.</summary>
    [Fact]
    public void Default_before_redundancy_state_historizes()
    {
        var sink = new RecordingSink();
        var actor = Sys.ActorOf(HistorianAdapterActor.Props(sink, LocalNode));

        actor.Tell(SampleEvent());

        AwaitAssert(() => sink.EnqueueCount.ShouldBe(1), Settle);
    }

    /// <summary>Secondary suppression (T2): when the cached local role is Secondary, the adapter MUST
    /// NOT enqueue to the durable sink (the Primary writes the single copy).</summary>
    [Fact]
    public void Secondary_node_does_not_historize()
    {
        var sink = new RecordingSink();
        var actor = Sys.ActorOf(HistorianAdapterActor.Props(sink, LocalNode));

        TellRedundancyRole(actor, RedundancyRole.Secondary);
        actor.Tell(SampleEvent());

        // Give the (suppressed) fire-and-forget a stable window, then assert nothing landed.
        ExpectNoMsg(Settle);
        sink.EnqueueCount.ShouldBe(0);
    }

    /// <summary>Detached suppression (T3): a Detached node likewise MUST NOT historize.</summary>
    [Fact]
    public void Detached_node_does_not_historize()
    {
        var sink = new RecordingSink();
        var actor = Sys.ActorOf(HistorianAdapterActor.Props(sink, LocalNode));

        TellRedundancyRole(actor, RedundancyRole.Detached);
        actor.Tell(SampleEvent());

        ExpectNoMsg(Settle);
        sink.EnqueueCount.ShouldBe(0);
    }

    /// <summary>Primary writes (T4): when the cached local role is Primary, the adapter historizes as
    /// normal (this is the single copy the durable sink sees).</summary>
    [Fact]
    public void Primary_node_historizes()
    {
        var sink = new RecordingSink();
        var actor = Sys.ActorOf(HistorianAdapterActor.Props(sink, LocalNode));

        TellRedundancyRole(actor, RedundancyRole.Primary);
        actor.Tell(SampleEvent());

        AwaitAssert(() => sink.EnqueueCount.ShouldBe(1), Settle);
    }
}