Merge pull request '[abcip] AbCip — IPerCallHostResolver failover routing' (#398) from auto/abcip/5.2 into auto/driver-gaps

This commit was merged in pull request #398.
2026-04-26 08:16:21 -04:00
5 changed files with 1031 additions and 42 deletions


@@ -1,14 +1,18 @@
# AbCip — ControlLogix HSBY paired-IP support
PR abcip-5.1 adds **non-transparent** HSBY (Hot-Standby) awareness to the AB
CIP driver. Each device may declare a partner gateway; when both gateways are
up the driver concurrently probes a role tag on each chassis and reports
which one is currently Active.
PR abcip-5.1 + 5.2 ship **non-transparent** HSBY (Hot-Standby) awareness
to the AB CIP driver. Each device may declare a partner gateway; when both
gateways are up the driver concurrently probes a role tag on each chassis,
reports which one is currently Active, and routes reads / writes through
that chassis automatically.
PR abcip-5.1 only **gathers + reports** the role. PR abcip-5.2 is the
follow-up that wires the resolved active address into
`AbCipDriver.ResolveHost` so reads and writes route to whichever chassis is
Active without operator intervention.
- **PR abcip-5.1** — gathers + reports the role of each chassis through
driver diagnostics. See [Role-tag detection matrix](#role-tag-detection-matrix)
+ [Active-resolution rules](#active-resolution-rules).
- **PR abcip-5.2** — wires the resolved active address into
`AbCipDriver.ResolveHost` and the runtime-cache lifecycle. See
[Failover behaviour](#failover-behaviour-pr-52) +
[Failure-mode walkthrough](#failure-mode-walkthrough).
## When to use HSBY paired IPs
@@ -24,7 +28,8 @@ edited the config.
PR abcip-5.1 closes the visibility half of that gap by reading the role tag
on both chassis. PR abcip-5.2 closes the routing half by re-pointing
`ResolveHost` at the Active address each tick.
`ResolveHost` at the Active address each tick + invalidating the per-tag
runtime cache + write-coalescer state on every flip.
## Configuration
@@ -88,14 +93,17 @@ The driver surfaces three diagnostics counters per HSBY-enabled device
| `AbCip.HsbyActive` | `1` if primary is Active, `2` if partner is Active, `0` if neither (or HSBY off) |
| `AbCip.HsbyPrimaryRole` | `(int)HsbyRole`: `0` = Unknown, `1` = Active, `2` = Standby, `3` = Disqualified |
| `AbCip.HsbyPartnerRole` | Same encoding as `HsbyPrimaryRole`, observed on the partner chassis |
| `AbCip.HsbyFailoverCount` (PR 5.2) | Total number of `ActiveAddress` transitions the probe loop has observed across every HSBY-enabled device on this driver. Each increment maps to one runtime-cache invalidation + write-coalescer reset. |
When more than one HSBY pair is configured on the same driver instance the
flat keys are scoped per primary host: `AbCip.HsbyActive[ab://10.0.0.5/1,0]`,
etc.
The `DeviceState.ActiveAddress` field (internal; surfaced via
`HsbyActive` diagnostics) is the address PR 5.2 will route through
`ResolveHost`.
`HsbyActive` diagnostics) is the address PR 5.2 routes through
`ResolveHost` + uses to scope the per-host bulkhead / breaker key.
See [Failover behaviour](#failover-behaviour-pr-52) for the runtime
implications.
### Active-resolution rules
@@ -104,8 +112,8 @@ The `DeviceState.ActiveAddress` field (internal; surfaced via
| Active | Standby / Disqualified / Unknown | primary |
| Standby / Disqualified / Unknown | Active | partner |
| Active | Active (split-brain) | **primary wins**, warning logged |
| Standby + Standby | Standby + Standby | `null` (PR 5.2 will surface as `BadCommunicationError`) |
| Unknown + Unknown | Unknown + Unknown | `null` |
| Standby + Standby | Standby + Standby | `null`; PR 5.2's `ResolveHost` falls back to the configured primary, and the existing dial flow surfaces `BadCommunicationError` if the primary is also down. See [Both-stuck](#both-stuck-no-chassis-active). |
| Unknown + Unknown | Unknown + Unknown | `null` (same fallback as Standby + Standby) |
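The rules in the table can be sketched as a small pure function. This is a minimal illustration and not the driver's code: `HsbyActiveResolution` and `Resolve` are hypothetical names, the enum values follow the `HsbyPrimaryRole` encoding listed earlier, and the split-brain warning log is left out.

```csharp
#nullable enable

// Role encoding as listed for AbCip.HsbyPrimaryRole / AbCip.HsbyPartnerRole.
public enum HsbyRole { Unknown = 0, Active = 1, Standby = 2, Disqualified = 3 }

// Hypothetical helper for illustration only; not part of the driver.
public static class HsbyActiveResolution
{
    // Returns the address of the Active chassis, or null when neither claims Active.
    public static string? Resolve(
        HsbyRole primaryRole, HsbyRole partnerRole,
        string primaryAddress, string partnerAddress)
    {
        // The primary is checked first, so split-brain (both Active) resolves to the primary.
        if (primaryRole == HsbyRole.Active) return primaryAddress;
        if (partnerRole == HsbyRole.Active) return partnerAddress;
        return null;
    }
}
```

Because the primary is tested before the partner, the "primary wins" row needs no special case; a real implementation would presumably also log the warning at that point.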
Split-brain (both chassis claim Active simultaneously) is a real
production failure mode — typically a redundancy-module misconfiguration or
@@ -150,28 +158,167 @@ otopcua-abcip-cli subscribe -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0 \
RoleTagAddress, ProbeIntervalMs}` survive deserialise → driver →
`DeviceState`).
- `Hsby.Enabled = false` → no role probing.
- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/AbCipHsbyRoleProberTests.cs`):
- **Skipped by default** (`Assert.Skip`) — `ab_server` cannot emulate
a ControlLogix HSBY pair (no `WallClockTime.SyncStatus`, no second
chassis concept). The Docker `paired` profile (PR 5.1) brings up two
`ab_server` instances + a stub `hsby-mux` sidecar so the topology is
documented, but PR 5.2 follow-up needs a patched `ab_server` image
that actually serves the role tag before the integration test can
assert anything against the wire.
- Trait `Category=Hsby` so `dotnet test --filter Category=Hsby` finds
this test once it's promoted.
- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/`):
- `AbCipHsbyRoleProberTests.cs` (PR 5.1) and
`AbCipHsbyFailoverTests.cs` (PR 5.2) — both **skipped by default**
(`Assert.Skip`). `ab_server` cannot emulate a ControlLogix HSBY
pair (no `WallClockTime.SyncStatus`, no second chassis concept).
The Docker `paired` profile (PR 5.1) brings up two `ab_server`
instances + a stub `hsby-mux` sidecar so the topology is
documented, but a patched `ab_server` image that actually serves
the role tag is still on the follow-up list.
- Trait `Category=Hsby` so `dotnet test --filter Category=Hsby`
finds them once they're promoted.
- **End-to-end** (`scripts/e2e/test-abcip-hsby.ps1`, PR 5.2):
- Paired-fixture variant of `test-abcip.ps1`. Subscribes to a tag
through the OPC UA server, flips the active chassis mid-stream
via the `hsby-mux` sidecar's `POST /flip` endpoint, asserts the
stream survives + `AbCip.HsbyFailoverCount` increments. Gated
on operator-supplied `BridgeNodeId` + a running paired fixture;
ships unwired into `test-all.ps1` until the patched `ab_server`
lands.
## Follow-ups (PR 5.2 + beyond)
## Failover behaviour (PR 5.2)
PR 5.2 wires `DeviceState.ActiveAddress` into the read / write hot path
through `AbCipDriver.ResolveHost` and the runtime-cache lifecycle. After
the role-probe loop (PR 5.1) detects an active-address transition the
driver re-points every wire-level operation at the now-Active chassis
without operator intervention.
### What flips on a failover
| Aspect | Pre-flip | Post-flip |
|---|---|---|
| `ResolveHost(tag)` return | primary `HostAddress` | the partner address (when partner is now Active) |
| Per-tag libplctag handles in `DeviceState.Runtimes` | created against primary gateway | dropped on flip; lazily re-created against the partner gateway on next read / write |
| Parent-DINT RMW handles in `DeviceState.ParentRuntimes` | primary gateway | dropped on flip; same re-create-on-demand path |
| `AbCipWriteCoalescer` per-device cache | last-known-written values from the primary | reset; the first write of any value to the partner pays the full round-trip |
| `LogicalInstanceMap` (Logical-mode `@tags` walk) | populated for primary | cleared; the next read on a Logical-mode device re-walks `@tags` against the partner |
| Per-host bulkhead key (Polly bulkhead + breaker, plan decision #144) | keyed on primary `HostAddress` | keyed on the new active address — the partner gets its own fresh breaker state instead of inheriting a tripped breaker from the now-standby |
| `AbCip.HsbyFailoverCount` diagnostic | 0 | incremented by 1 on every transition observed by the probe loop |
### How the invalidation runs
PR 5.2 introduces an internal `OnActiveAddressChanged` event raised by
`HsbyProbeLoopAsync` on every `DeviceState.ActiveAddress` transition. The
driver subscribes to it from its own constructor; the handler
(`HandleActiveAddressChanged`) does the cache invalidation in one place:
1. Disposes every entry in `DeviceState.Runtimes` and
`DeviceState.ParentRuntimes`, then clears both dicts. Disposed
`IAbCipTagRuntime` instances release their underlying libplctag
handles so the native heap doesn't leak.
2. Clears `DeviceState.LogicalInstanceMap` and resets
`LogicalWalkComplete = false` so the next read on a Logical-mode
device re-fires the `@tags` symbol walk against the new chassis.
3. Calls `AbCipWriteCoalescer.Reset(deviceHostAddress)` so cached
"we already wrote 42" decisions don't stale-suppress the first
partner-side write.
4. Resets `DeviceState.RuntimesAddress = null` so subsequent
diagnostics observers see a fresh stamp on the next runtime
creation.
5. `Interlocked.Increment` on the driver-wide
`AbCip.HsbyFailoverCount` counter.
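The cache invalidation is idempotent: a second event for the same address
change finds the dicts already empty, and the coalescer reset is itself
idempotent. The counter increment in step 5 is the exception, because every
handler invocation adds one; the probe loop therefore raises the event only
when `ActiveAddress` actually differs from the previous tick.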
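Why step 3 matters can be shown with a toy coalescer. The sketch below is an assumption-laden stand-in, not `AbCipWriteCoalescer` itself: `SketchWriteCoalescer`, `ShouldWrite`, and the `int`-only value type are hypothetical, and only the `Reset(deviceHostAddress)` shape is taken from the text above.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a write-coalescer; illustration only.
public sealed class SketchWriteCoalescer
{
    private readonly Dictionary<(string Device, string Tag), int> _lastWritten = new();

    // True when the write must go to the wire; false when it repeats the cached value.
    public bool ShouldWrite(string device, string tag, int value)
    {
        var key = (device, tag);
        if (_lastWritten.TryGetValue(key, out var last) && last == value) return false;
        _lastWritten[key] = value;
        return true;
    }

    // Forget everything cached for one device, as the failover handler does on a flip.
    public void Reset(string device)
    {
        var stale = new List<(string Device, string Tag)>();
        foreach (var key in _lastWritten.Keys)
            if (key.Device == device) stale.Add(key);
        foreach (var key in stale) _lastWritten.Remove(key);
    }
}
```

Without the `Reset` call, a repeated write of `42` after a flip would be suppressed even though the newly Active chassis never received it; with the reset, the first post-flip write goes to the wire and rebuilds the cache.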
### Bulkhead key semantics
The per-host resilience pipeline (Polly bulkhead + circuit breaker, plan
decision #144) keys on whatever `IPerCallHostResolver.ResolveHost`
returns. PR 5.2 changes that resolver so an HSBY-failed-over device
returns the partner's address, which means:
- The **device-state lookup** (`_devices.TryGetValue`) keeps using the
configured primary `HostAddress` as the dictionary key — that key
never changes for the lifetime of a device, so multi-device
configurations stay routable.
- The **resilience pipeline** (Polly bulkhead, breaker, retry policies)
receives the active address as the host-name dimension. The standby
chassis's tripped breaker (if its primary went away) doesn't bleed
over to the partner; the partner gets fresh limits + a closed
breaker.
When HSBY is disabled (`Hsby.Enabled = false`) `ResolveHost` returns the
configured primary `HostAddress` exactly as it always has — pre-5.2
behaviour, no double-key risk.
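The effect of keying the resilience state on the active address can be sketched with a plain failure counter per host key. This does not use Polly's API: `SketchBreakerRegistry`, its methods, and the threshold of 3 are hypothetical and stand in for the real bulkhead and breaker.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for per-host breaker state; not Polly's API.
public sealed class SketchBreakerRegistry
{
    private const int Threshold = 3; // arbitrary value for the illustration
    private readonly Dictionary<string, int> _failures = new();

    public void RecordFailure(string hostKey)
    {
        _failures.TryGetValue(hostKey, out var count); // count is 0 when the key is absent
        _failures[hostKey] = count + 1;
    }

    // A key counts as "open" once it has accumulated Threshold failures.
    public bool IsOpen(string hostKey)
        => _failures.TryGetValue(hostKey, out var count) && count >= Threshold;
}
```

If three failures are recorded under `ab://10.0.0.5/1,0`, that key reads as open while `ab://10.0.0.6/1,0` stays closed, which is the isolation the resolver change is meant to provide once the partner becomes Active.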
## Failure-mode walkthrough
PR 5.2 introduces three failure modes to reason about. The table under
each heading below summarises the behaviour the driver reports and how an
operator can inspect it.
### Primary-stuck (primary unreachable, partner Active)
The primary chassis goes away (network partition, power loss, a stuck
Forward Open). The role-probe loop reads `HsbyRole.Unknown` for the
primary and `HsbyRole.Active` for the partner.
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | partner address |
| `DeviceState.PrimaryRole` | `Unknown` |
| `DeviceState.PartnerRole` | `Active` |
| `ResolveHost(tag)` | partner address |
| Reads / writes | route through partner gateway transparently |
| `AbCip.HsbyFailoverCount` | incremented when the address transitioned away from the primary |
| `AbCip.HsbyActive` | `2` (partner is the active chassis) |
| Operator action | none required for routing; investigate why the primary is unreachable through the connectivity-probe loop's `_System/_ConnectionStatus` for the device |
### Secondary-stuck (partner unreachable, primary Active)
The partner chassis goes away (its OPC UA server is down, its IP is
unreachable, the redundancy module unhitched it). The probe loop reads
`HsbyRole.Active` for the primary and `HsbyRole.Unknown` for the partner.
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | primary address (no transition; this is the steady state) |
| `DeviceState.PrimaryRole` | `Active` |
| `DeviceState.PartnerRole` | `Unknown` |
| `ResolveHost(tag)` | primary address |
| Reads / writes | route through primary gateway exactly as in a non-HSBY deployment |
| `AbCip.HsbyFailoverCount` | unchanged — no flip happened |
| `AbCip.HsbyActive` | `1` (primary is the active chassis) |
| Operator action | investigate why the partner is unreachable; the operational risk is that a future primary-side outage has no fall-back |
### Both-stuck (no chassis Active)
Both chassis report `Standby` / `Disqualified` / `Unknown` (a
redundancy-module misconfiguration, both controllers in Program mode,
or both unreachable).
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | `null` |
| `ResolveHost(tag)` | falls back to the configured primary `HostAddress` |
| Reads / writes | dispatched to the configured primary; a stuck primary surfaces `BadCommunicationError` per the existing dial flow |
| `AbCip.HsbyActive` | `0` (no chassis Active) |
| `AbCip.HsbyFailoverCount` | incremented when the transition `Active → null` happened |
| Operator action | investigate the redundancy module / mode keys; the SCADA layer sees stuck-or-bad-quality reads, not incorrect routing |
The "fall back to primary on null Active" choice is deliberate. Routing
all reads to a deterministic chassis (the configured primary) keeps the
breaker key + bulkhead state stable while the operator diagnoses the
double-down outage; the alternative (round-robin / partner) would just
trip both breakers in turn and obscure which chassis is the real
problem.
## Follow-ups (beyond PR 5.2)
- **PR 5.2** — wire `ActiveAddress` into `ResolveHost` so reads/writes
route to the live chassis automatically. Today's PR only **gathers** the
role.
- **Patched `ab_server` image** — add a writable `WallClockTime.SyncStatus`
tag (or a separate Python shim) so the Docker `paired` profile can
exercise the wire-level role probe.
exercise the wire-level role probe + the
`tests/.../IntegrationTests/AbCipHsbyFailoverTests.cs` scaffold can
flip its `Assert.Skip` for a real integration assertion.
- **`hsby-mux` REST endpoint** — `POST /flip {"active": "primary"}` writes
`1` to the chosen chassis + `0` to the other so integration tests can
drive switch-overs deterministically.
`1` to the chosen chassis + `0` to the other so integration tests +
`scripts/e2e/test-abcip-hsby.ps1` can drive switch-overs
deterministically.
- **GuardLogix HSBY** — same role-tag plumbing applies; verify against a
real 1756-L8xS pair when one is on-site.


@@ -0,0 +1,210 @@
#Requires -Version 7.0
<#
.SYNOPSIS
End-to-end CLI test for AB CIP HSBY failover routing (PR abcip-5.2). Subscribes to
a tag through the OtOpcUa OPC UA server, flips the active chassis mid-stream via
the paired-fixture's hsby-mux sidecar HTTP endpoint, and asserts the subscribe
stream survives the failover (no permanent loss of notifications + the post-flip
data carries the partner-side update).
.DESCRIPTION
Paired-fixture variant of test-abcip.ps1. Where test-abcip.ps1 runs against a
single ab_server instance, this script assumes a paired fixture with two
ab_server instances (primary + partner) and an hsby-mux sidecar exposing
/flip {"active": "primary" | "partner"} over HTTP.
Five assertions:
- HsbyInitialActive — primary is Active at start (hsby-mux primes it)
- HsbyResolveActive — driver-diagnostics surfaces AbCip.HsbyActive == 1
- HsbyFailoverFlip — POST {"active": "partner"} → AbCip.HsbyActive == 2
- HsbySubscribeSurvives — subscribe stream stays open across the flip + sees
an updated value from the partner side
- HsbyFailoverCount — AbCip.HsbyFailoverCount increments by ≥ 1
.PARAMETER PrimaryGateway
ab://host[:port]/cip-path of the primary chassis. Default ab://127.0.0.1/1,0.
.PARAMETER PartnerGateway
ab://host[:port]/cip-path of the partner chassis. Default ab://127.0.0.2/1,0.
.PARAMETER HsbyMuxUrl
Base URL of the paired-fixture's hsby-mux sidecar. Default http://localhost:7080.
Endpoints used:
GET /role → returns {"primary":"Active","partner":"Standby"}
POST /flip {"active":"primary"|"partner"} → flips role tag values on each chassis
.PARAMETER OpcUaUrl
OtOpcUa server endpoint. Default opc.tcp://localhost:4840.
.PARAMETER BridgeNodeId
NodeId at which the server publishes the tag exercised by the subscribe assertion.
Required.
.PARAMETER TagPath
Logix symbolic path the bridge tag points at. Default 'TestDINT'.
.PARAMETER DriverInstanceId
DriverInstance ID for the AB CIP driver under test. Used to scope the
driver-diagnostics RPC. Default 'abcip-hsby'.
.EXAMPLE
./test-abcip-hsby.ps1 -BridgeNodeId 'ns=2;s=AbCip/Bridge/TestDINT'
#>
param(
[string]$PrimaryGateway = "ab://127.0.0.1/1,0",
[string]$PartnerGateway = "ab://127.0.0.2/1,0",
[string]$HsbyMuxUrl = "http://localhost:7080",
[string]$OpcUaUrl = "opc.tcp://localhost:4840",
[Parameter(Mandatory)] [string]$BridgeNodeId,
[string]$TagPath = "TestDINT",
[string]$DriverInstanceId = "abcip-hsby"
)
$ErrorActionPreference = "Stop"
. "$PSScriptRoot/_common.ps1"
$abcipCli = Get-CliInvocation `
-ProjectFolder "src/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli" `
-ExeName "otopcua-abcip-cli"
$opcUaCli = Get-CliInvocation `
-ProjectFolder "src/ZB.MOM.WW.OtOpcUa.Client.CLI" `
-ExeName "otopcua-cli"
$results = @()
function Invoke-HsbyFlip {
param([string]$Active)
$body = @{ active = $Active } | ConvertTo-Json -Compress
try {
Invoke-RestMethod -Uri "$HsbyMuxUrl/flip" -Method Post -Body $body -ContentType 'application/json'
} catch {
throw "hsby-mux at $HsbyMuxUrl/flip rejected the request: $($_.Exception.Message)"
}
}
function Get-HsbyDiagnosticValue {
param([string]$Counter)
# Pull driver-diagnostics through the OPC UA Admin RPC surface. The CLI returns
# a raw JSON blob; we grep out the named counter so the assertion is robust to
# other counters the driver surfaces.
$diagArgs = @($opcUaCli.PrefixArgs) + @(
"driver-diagnostics", "-u", $OpcUaUrl, "-d", $DriverInstanceId)
$diagOut = & $opcUaCli.File @diagArgs 2>&1
$joined = ($diagOut -join "`n")
if ($joined -match "$([regex]::Escape($Counter)).*?:\s*([\d\.]+)") {
return [double]$matches[1]
}
return $null
}
# ---- HsbyInitialActive — hsby-mux primes primary as Active ----
Write-Header "HsbyInitialActive (POST $HsbyMuxUrl/flip {active=primary})"
try {
Invoke-HsbyFlip -Active "primary" | Out-Null
Start-Sleep -Seconds 3 # role-probe loop default tick is 2s
$active = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyActive"
$passed = ($active -eq 1.0)
$results += [PSCustomObject]@{
Name = "HsbyInitialActive"
Passed = $passed
Detail = if ($passed) { "AbCip.HsbyActive=1 after priming primary" } else { "AbCip.HsbyActive=$active (expected 1)" }
}
} catch {
$results += [PSCustomObject]@{
Name = "HsbyInitialActive"; Passed = $false; Detail = $_.Exception.Message
}
}
# ---- HsbyResolveActive — driver routing reads through the primary ----
Write-Header "HsbyResolveActive (read $TagPath via primary)"
$readArgs = @("read") + @("-g", $PrimaryGateway, "-f", "ControlLogix") + @("-t", $TagPath, "--type", "DInt")
$readOut = & $abcipCli.Exe @($abcipCli.Args + $readArgs) 2>&1
$readOk = ($readOut -join "`n") -notmatch "(error|fail)"
$results += [PSCustomObject]@{
Name = "HsbyResolveActive"
Passed = $readOk
Detail = if ($readOk) { "primary read completed without error" } else { "read failed: $($readOut -join ' ')" }
}
# ---- HsbySubscribeSurvives + HsbyFailoverFlip + HsbyFailoverCount ----
Write-Header "HsbyFailoverFlip + HsbySubscribeSurvives (subscribe across flip)"
$failoverBaseline = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyFailoverCount"
if ($null -eq $failoverBaseline) { $failoverBaseline = 0 }
$duration = 12
$subOut = New-TemporaryFile
$subErr = New-TemporaryFile
$subArgs = @($opcUaCli.PrefixArgs) + @(
"subscribe", "-u", $OpcUaUrl, "-n", $BridgeNodeId, "-i", "200", "--duration", "$duration")
$subProc = Start-Process -FilePath $opcUaCli.File -ArgumentList $subArgs `
-NoNewWindow -PassThru `
-RedirectStandardOutput $subOut.FullName `
-RedirectStandardError $subErr.FullName
# Let the subscribe settle + accumulate primary-side notifications.
Start-Sleep -Seconds 3
# Mid-stream flip — primary→Standby, partner→Active.
try {
Invoke-HsbyFlip -Active "partner" | Out-Null
} catch {
Stop-Process -Id $subProc.Id -Force -ErrorAction SilentlyContinue
$results += [PSCustomObject]@{
Name = "HsbyFailoverFlip"; Passed = $false; Detail = "hsby-mux flip rejected: $($_.Exception.Message)"
}
}
# Wait for the role-probe loop to catch up (default tick 2s + ProbeIntervalMs slack).
Start-Sleep -Seconds 4
# Drive a write through the partner so the subscribe sees a fresh value.
$flipValue = Get-Random -Minimum 70000 -Maximum 79999
$writeArgs = @("write") + @("-g", $PartnerGateway, "-f", "ControlLogix") + @("-t", $TagPath, "--type", "DInt", "-v", $flipValue)
& $abcipCli.Exe @($abcipCli.Args + $writeArgs) | Out-Null
$activeAfter = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyActive"
$flipPassed = ($activeAfter -eq 2.0)
$results += [PSCustomObject]@{
Name = "HsbyFailoverFlip"
Passed = $flipPassed
Detail = if ($flipPassed) { "AbCip.HsbyActive=2 after flip" } else { "AbCip.HsbyActive=$activeAfter (expected 2)" }
}
# Stop the subscribe + harvest the stream.
$subProc.WaitForExit(($duration + 5) * 1000) | Out-Null
if (-not $subProc.HasExited) { Stop-Process -Id $subProc.Id -Force }
$subText = (Get-Content $subOut.FullName -Raw) + (Get-Content $subErr.FullName -Raw)
Remove-Item $subOut.FullName, $subErr.FullName -ErrorAction SilentlyContinue
# Stream survival = at least one notification *after* the flip carries the new
# partner-side value. The post-flip write of $flipValue is the canary.
$saw = $subText -match "$flipValue"
$results += [PSCustomObject]@{
Name = "HsbySubscribeSurvives"
Passed = $saw
Detail = if ($saw) {
"subscribe stream surfaced post-flip value $flipValue from partner chassis"
} else {
"subscribe stream did not see the post-flip canary $flipValue — output: $subText"
}
}
# ---- HsbyFailoverCount — counter incremented by ≥ 1 ----
Write-Header "HsbyFailoverCount"
$failoverAfter = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyFailoverCount"
if ($null -eq $failoverAfter) { $failoverAfter = 0 }
$counterOk = ($failoverAfter - $failoverBaseline) -ge 1
$results += [PSCustomObject]@{
Name = "HsbyFailoverCount"
Passed = $counterOk
Detail = if ($counterOk) {
"AbCip.HsbyFailoverCount went from $failoverBaseline -> $failoverAfter"
} else {
"AbCip.HsbyFailoverCount unchanged ($failoverBaseline -> $failoverAfter); expected at least 1 increment"
}
}
Write-Summary -Title "AB CIP HSBY failover e2e" -Results $results
if ($results | Where-Object { -not $_.Passed }) { exit 1 }


@@ -44,6 +44,24 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
private IAddressSpaceBuilder? _cachedBuilder;
private DriverHealth _health = new(DriverState.Unknown, null, null);
// PR abcip-5.2 — failover bookkeeping. Counter is surfaced through driver-diagnostics
// as AbCip.HsbyFailoverCount; the event lets internal subscribers react to an
// ActiveAddress flip without HsbyProbeLoopAsync calling deep into the runtime cache
// directly. The driver subscribes itself in the constructor so cache invalidation +
// write-coalescer reset run inline with the address-change observation.
private long _hsbyFailoverCount;
/// <summary>
/// PR abcip-5.2 — raised by <see cref="HsbyProbeLoopAsync"/> whenever a device's
/// <see cref="DeviceState.ActiveAddress"/> transitions to a value different from
/// the one observed on the previous tick. Args carry the device + the
/// (oldAddress, newAddress) pair so subscribers can decide whether the change
/// matters for them. Internal seam — the driver wires its own runtime-cache /
/// write-coalescer invalidation through this event so the bookkeeping runs in
/// one place + tests can assert via the public diagnostics counter.
/// </summary>
internal event EventHandler<HsbyActiveAddressChangedEventArgs>? OnActiveAddressChanged;
public event EventHandler<DataChangeEventArgs>? OnDataChange;
public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
@@ -67,6 +85,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
onChange: (handle, tagRef, snapshot) =>
OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, tagRef, snapshot)));
_alarmProjection = new AbCipAlarmProjection(this, _options.AlarmPollInterval);
// PR abcip-5.2 — wire the failover-handling subscriber. Drops every cached per-tag
// / parent-DINT runtime against the now-standby gateway, resets the write-coalescer
// (the prior known-written values were against the standby chassis), clears the
// logical-walk state so the @tags walk reruns against the new active gateway, and
// bumps the diagnostics counter that BuildDiagnostics surfaces.
OnActiveAddressChanged += HandleActiveAddressChanged;
}
/// <summary>
@@ -258,6 +282,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
&& !string.IsNullOrWhiteSpace(state.Options.PartnerHostAddress))
{
state.PartnerAddress = state.Options.PartnerHostAddress;
// PR abcip-5.2 — pre-parse the partner address once so the runtime hot
// path can swap (Gateway, Port, CipPath) without re-parsing on every
// ResolveHost / EnsureTagRuntimeAsync call. A bad partner address is a
// hard config error already flagged by HsbyProbeLoopAsync's TryParse +
// OnWarning path, so a TryParse miss here is non-fatal — the runtime
// never resolves to it because PartnerParsedAddress stays null.
state.PartnerParsedAddress = AbCipHostAddress.TryParse(state.Options.PartnerHostAddress!);
state.HsbyCts = new CancellationTokenSource();
var ct = state.HsbyCts.Token;
_ = Task.Run(() => HsbyProbeLoopAsync(state, hsby, ct), ct);
@@ -784,7 +815,28 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
// No chassis Active: clear ActiveAddress so PR abcip-5.2's ResolveHost falls back
// to the configured primary.
newActive = null;
}
// PR abcip-5.2 — fire OnActiveAddressChanged on every transition so the
// runtime-cache invalidation handler runs exactly once per flip. We compare
// before assigning so a steady-state tick (Active didn't change) is a no-op.
var prevActive = state.ActiveAddress;
state.ActiveAddress = newActive;
if (!string.Equals(prevActive, newActive, StringComparison.OrdinalIgnoreCase))
{
try
{
OnActiveAddressChanged?.Invoke(this,
new HsbyActiveAddressChangedEventArgs(state, prevActive, newActive));
}
catch (Exception ex)
{
// A handler that throws must never tear the probe loop down. Surface
// the failure through the warning sink + keep ticking; the next flip
// gets another shot at invalidation.
_options.OnWarning?.Invoke(
$"AbCip HSBY active-address-changed handler threw on " +
$"primary='{state.Options.HostAddress}' partner='{partnerAddress}': {ex.Message}");
}
}
try { await Task.Delay(hsby.ProbeInterval, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { break; }
@@ -836,6 +888,46 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
}
/// <summary>
/// PR abcip-5.2 — invalidation hook for an HSBY failover. Disposes every cached
/// per-tag / parent-DINT runtime on the device so the next read / write re-creates
/// against the new Active gateway, resets the write-coalescer's per-device cache
/// (the prior known-written values were against the now-standby chassis), wipes
/// the Logical-mode @tags walk so the new chassis gets a fresh symbol-table
/// resolution, and bumps the AbCip.HsbyFailoverCount diagnostic. Idempotent — a
/// re-fire against the same address (e.g. an event handler that races the assign)
/// short-circuits on the RuntimesAddress equality check inside
/// <see cref="EnsureTagRuntimeAsync"/>.
/// </summary>
private void HandleActiveAddressChanged(object? sender, HsbyActiveAddressChangedEventArgs e)
{
var state = e.Device;
// Drop the runtime cache. The runtime creators repopulate against the new active
// gateway on next read/write; the disposed handles' libplctag pointers are
// released so the native heap doesn't leak.
foreach (var rt in state.Runtimes.Values)
{
try { rt.Dispose(); } catch { }
}
state.Runtimes.Clear();
foreach (var rt in state.ParentRuntimes.Values)
{
try { rt.Dispose(); } catch { }
}
state.ParentRuntimes.Clear();
// Reset the @tags symbol-table walk so the new chassis re-fires it on next read;
// the standby chassis's instance IDs don't transfer to the now-Active partner.
state.LogicalInstanceMap.Clear();
state.LogicalWalkComplete = false;
// Reset the write-coalescer so the first post-flip write of any value pays the
// full round-trip and the cache rebuilds from the new baseline.
_writeCoalescer.Reset(state.Options.HostAddress);
// Clear the per-device runtimes-address marker so the next runtime creator stamps
// it with whatever the new ActiveParsedAddress resolves to.
state.RuntimesAddress = null;
Interlocked.Increment(ref _hsbyFailoverCount);
}
private void TransitionDeviceState(DeviceState state, HostState newState)
{
HostState old;
@@ -911,11 +1003,34 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
if (AbCipSystemTagSource.IsSystemReference(fullReference))
{
var host = ExtractSystemDeviceHost(fullReference);
if (host is not null) return host;
if (host is not null) return ResolveActiveHostFor(host);
}
if (_tagsByName.TryGetValue(fullReference, out var def))
return def.DeviceHostAddress;
return _options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId;
return ResolveActiveHostFor(def.DeviceHostAddress);
return ResolveActiveHostFor(_options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId);
}
/// <summary>
/// PR abcip-5.2 — failover-aware bulkhead-key resolver. The configured primary
/// <c>HostAddress</c> stays the device-state lookup key (it never changes for a
/// given device), but the resilience pipeline (Polly bulkhead + breaker per plan
/// decision #144) keys on whatever this method returns. When HSBY is enabled and
/// <see cref="DeviceState.ActiveAddress"/> resolves to the partner, we route the
/// bulkhead through the partner's address so the new active partner gets its own
/// fresh breaker state instead of inheriting the now-standby's tripped breaker.
/// <para>
/// When HSBY isn't enabled or no chassis is Active, returns the original
/// primary host address — that's the legacy pre-5.2 behaviour and keeps the
/// bulkhead state stable for the dial flow's BadCommunicationError surface.
/// </para>
/// </summary>
internal string ResolveActiveHostFor(string deviceHostAddress)
{
if (!_devices.TryGetValue(deviceHostAddress, out var state)) return deviceHostAddress;
if (state.Options.Hsby is not { Enabled: true }) return deviceHostAddress;
var active = state.ActiveAddress;
if (string.IsNullOrEmpty(active)) return deviceHostAddress;
return active;
}
/// <summary>
@@ -1367,10 +1482,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
{
sliceLogicalId = sliceId;
}
// PR abcip-5.2 — slice handles also follow the active address.
var sliceActive = device.ActiveParsedAddress;
var baseParams = new AbCipTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.ParsedAddress.CipPath,
Gateway: sliceActive.Gateway,
Port: sliceActive.Port,
CipPath: sliceActive.CipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parsedPath.ToLibplctagName(),
Timeout: _options.Timeout,
@@ -1439,6 +1556,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
throw;
}
device.Runtimes[tagName] = runtime;
// PR abcip-5.2 — keep the slice path's runtime cache lifecycle in lockstep with
// the per-tag handles. The failover handler clears Runtimes wholesale, so the
// address stamp here matches whatever ActiveAddress resolved to when the slice
// params were built (the caller passed createParams pre-resolved).
device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
? device.ActiveAddress ?? device.Options.HostAddress
: device.Options.HostAddress;
return runtime;
}
@@ -1859,10 +1983,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
{
parentLogicalId = pid;
}
// PR abcip-5.2 — same active-address routing as EnsureTagRuntimeAsync so
// BOOL-in-DINT RMW handles follow the failover.
var active = device.ActiveParsedAddress;
var runtime = _tagFactory.Create(new AbCipTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.ParsedAddress.CipPath,
Gateway: active.Gateway,
Port: active.Port,
CipPath: active.CipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parentTagName,
Timeout: _options.Timeout,
@@ -1879,6 +2006,9 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
throw;
}
device.ParentRuntimes[parentTagName] = runtime;
// Parent-DINT runtimes share the per-device address stamp with the per-tag handles.
device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
? device.ActiveAddress ?? device.Options.HostAddress
: device.Options.HostAddress;
return runtime;
}
@@ -1906,10 +2036,15 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
logicalId = resolvedId;
}
// PR abcip-5.2 — route through the resolved active address so an HSBY pair that
// has failed over to the partner targets the partner's gateway / port / cip-path.
// When HSBY is off or no chassis is Active the getter returns ParsedAddress and
// behaviour is identical to pre-5.2 builds.
var active = device.ActiveParsedAddress;
var runtime = _tagFactory.Create(new AbCipTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.ParsedAddress.CipPath,
Gateway: active.Gateway,
Port: active.Port,
CipPath: active.CipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parsed.ToLibplctagName(),
Timeout: _options.Timeout,
@@ -1927,6 +2062,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
throw;
}
device.Runtimes[def.Name] = runtime;
// Stamp the per-device runtimes-address marker so the failover handler can detect
// a stale cache. Compared in DEBUG builds + diagnostics; production code routes
// invalidation through OnActiveAddressChanged.
device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
? device.ActiveAddress ?? device.Options.HostAddress
: device.Options.HostAddress;
return runtime;
}
@@ -1951,6 +2092,11 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
["AbCip.WritesPassedThrough"] = _writeCoalescer.TotalWritesPassedThrough,
// PR abcip-4.4 — total _RefreshTagDb truthy writes that dispatched to RebrowseAsync.
["AbCip.RefreshTriggers"] = _systemTagSource.TotalRefreshTriggers,
// PR abcip-5.2 — count of HSBY active-address transitions the probe loop has
// observed. Aggregated across every HSBY-enabled device on this driver
// instance; the per-device breakdown is observable via the per-pair role
// counters below.
["AbCip.HsbyFailoverCount"] = Interlocked.Read(ref _hsbyFailoverCount),
};
// PR abcip-5.1 — HSBY role surface. One <Counter> per HSBY-enabled device:
// AbCip.HsbyActive — 1 if ActiveAddress == primary, 2 if == partner, 0 otherwise.
@@ -2368,6 +2514,49 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
/// </summary>
public string? PartnerAddress { get; set; }
/// <summary>
/// PR abcip-5.2 — parsed form of <see cref="PartnerAddress"/>, populated at init
/// when HSBY is configured. Callers of <c>ResolveHost</c> keep using the
/// opaque <see cref="AbCipDeviceOptions.HostAddress"/>; the runtime hot path
/// consults <see cref="ActiveParsedAddress"/> so libplctag handles target the
/// currently Active gateway / port / cip-path.
/// </summary>
public AbCipHostAddress? PartnerParsedAddress { get; set; }
/// <summary>
/// PR abcip-5.2 — parsed wire address that per-tag / per-slice / parent-DINT
/// runtimes should be created against right now. Returns <see cref="ParsedAddress"/>
/// (the configured primary) when (a) HSBY isn't enabled, (b) <see cref="ActiveAddress"/>
/// is null (no chassis Active — fall through to the dial flow which will fault
/// with BadCommunicationError on the next wire op), or (c) the active address
/// equals the configured primary host. Returns <see cref="PartnerParsedAddress"/>
/// when the partner is the live chassis. Cheap getter — every tag-runtime
/// creation calls it.
/// </summary>
public AbCipHostAddress ActiveParsedAddress
{
get
{
if (Options.Hsby is not { Enabled: true } || ActiveAddress is null)
return ParsedAddress;
if (PartnerParsedAddress is not null
&& string.Equals(ActiveAddress, PartnerAddress, StringComparison.OrdinalIgnoreCase))
return PartnerParsedAddress;
return ParsedAddress;
}
}
/// <summary>
/// PR abcip-5.2 — address every entry in <see cref="Runtimes"/> +
/// <see cref="ParentRuntimes"/> was created against. <c>null</c> until the first
/// read / write materialises a runtime; set to the resolved active address each
/// time a runtime is created. <see cref="AbCipDriver.HsbyProbeLoopAsync"/>'s
/// active-address-changed callback compares this against the new active and
/// drops every cached handle on mismatch so the next read / write re-creates
/// against the new gateway.
/// </summary>
public string? RuntimesAddress { get; set; }
/// <summary>PR abcip-5.1 — most-recent role observed on the primary chassis.</summary>
public HsbyRole PrimaryRole { get; set; } = HsbyRole.Unknown;
@@ -2420,3 +2609,26 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
}
}
/// <summary>
/// PR abcip-5.2 — event payload raised by <see cref="AbCipDriver"/> when the HSBY
/// probe loop observes a transition in <see cref="AbCipDriver.DeviceState.ActiveAddress"/>.
/// Subscribers consume <see cref="OldAddress"/> / <see cref="NewAddress"/> to decide
/// whether to invalidate cached state. <see cref="OldAddress"/> is <c>null</c> on the
/// first transition (driver freshly initialised) and <see cref="NewAddress"/> is
/// <c>null</c> when neither chassis is Active (both Standby / Disqualified / Unknown).
/// </summary>
internal sealed class HsbyActiveAddressChangedEventArgs : EventArgs
{
public AbCipDriver.DeviceState Device { get; }
public string? OldAddress { get; }
public string? NewAddress { get; }
public HsbyActiveAddressChangedEventArgs(
AbCipDriver.DeviceState device, string? oldAddress, string? newAddress)
{
Device = device;
OldAddress = oldAddress;
NewAddress = newAddress;
}
}
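
The resolution rules documented on `ActiveParsedAddress` and the cache-drop contract documented on `RuntimesAddress` can be restated as a small standalone sketch. Everything below is hypothetical: `HostAddr`, `DeviceSketch`, `FakeHandle`, and `FailoverSketch` are stand-ins invented for illustration, not the driver's real types, and the handler body is only an assumption about how a subscriber might act on the old / new address. Only the branching is taken from the getter and doc comments above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for AbCipHostAddress.
public sealed record HostAddr(string Gateway, int Port, string CipPath);

// Hypothetical disposable handle so the sketch can show that a drop happened.
public sealed class FakeHandle : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

// Hypothetical, trimmed-down stand-in for the per-device state.
public sealed class DeviceSketch
{
    public DeviceSketch(bool hsbyEnabled, string hostAddress, string? partnerAddress,
        HostAddr parsed, HostAddr? partnerParsed)
    {
        HsbyEnabled = hsbyEnabled;
        HostAddress = hostAddress;
        PartnerAddress = partnerAddress;
        Parsed = parsed;
        PartnerParsed = partnerParsed;
    }

    public bool HsbyEnabled { get; }
    public string HostAddress { get; }
    public string? PartnerAddress { get; }
    public HostAddr Parsed { get; }
    public HostAddr? PartnerParsed { get; }
    public string? ActiveAddress { get; set; }
    public string? RuntimesAddress { get; set; }
    public Dictionary<string, IDisposable> Runtimes { get; } = new();

    // Partner only when HSBY is on, an address is Active, and it names the partner;
    // every other case falls back to the configured primary.
    public HostAddr ActiveParsed =>
        HsbyEnabled && ActiveAddress is not null && PartnerParsed is not null
            && string.Equals(ActiveAddress, PartnerAddress, StringComparison.OrdinalIgnoreCase)
            ? PartnerParsed
            : Parsed;
}

public static class FailoverSketch
{
    // Assumed handler shape: drop every cached handle when the stamp no longer matches
    // the address new runtimes would be created against. Returns the number dropped.
    public static int OnActiveAddressChanged(DeviceSketch device, string? newActive)
    {
        device.ActiveAddress = newActive;
        var target = device.HsbyEnabled ? newActive ?? device.HostAddress : device.HostAddress;
        if (device.RuntimesAddress is null
            || string.Equals(device.RuntimesAddress, target, StringComparison.OrdinalIgnoreCase))
            return 0;

        var dropped = device.Runtimes.Count;
        foreach (var runtime in device.Runtimes.Values) runtime.Dispose();
        device.Runtimes.Clear();
        device.RuntimesAddress = null; // the next created runtime re-stamps it
        return dropped;
    }
}
```

In this sketch a transition to "no chassis Active" resolves the target back to the configured primary, so handles created against the primary survive it; whether the real failover handler behaves the same way is not something this diff confirms.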


@@ -0,0 +1,47 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.AbCip;
namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests;
/// <summary>
/// PR abcip-5.2 — integration scaffold for HSBY failover routing through
/// <see cref="AbCipDriver.ResolveHost"/>. Skipped by default because the paired
/// fixture (controllogix-secondary <c>ab_server</c> instance + <c>hsby-mux</c>
/// sidecar that flips the role tag on demand) is not yet stable in the Docker
/// compose layout. The scaffold lives here so:
/// <list type="bullet">
/// <item>The trait is discoverable by <c>dotnet test --filter "Category=Hsby"</c>.</item>
/// <item>The companion E2E script (<c>scripts/e2e/test-abcip-hsby.ps1</c>) already has a
/// matching test surface for when an operator stands up the fixture
/// manually.</item>
/// <item>A future PR can flip the skip into a real assertion without restructuring
/// the test layout.</item>
/// </list>
/// The unit-level coverage in <c>AbCipHsbyFailoverTests</c> (in the unit tests
/// project) exercises the active-address-routing + cache-invalidation contract in
/// full against the FakeAbCipTagFactory; this scaffold is just the wire-level shape.
/// </summary>
[Trait("Category", "Hsby")]
[Trait("Requires", "AbServer")]
public sealed class AbCipHsbyFailoverTests
{
[AbServerFact]
public Task ResolveHost_routes_to_partner_after_role_flip_through_hsby_mux()
{
// The paired-fixture compose service (controllogix + controllogix-secondary +
// hsby-mux sidecar at http://localhost:7080) is not yet wired. When it ships,
// the test body will:
// 1. POST {"active": "primary"} to hsby-mux → assert ResolveHost = primary
// gateway via a CLI read.
// 2. POST {"active": "partner"} → wait for the probe loop to catch up →
// assert ResolveHost = partner gateway via a second CLI read.
// 3. Assert AbCip.HsbyFailoverCount on the driver's diagnostics
// ≥ 1 by reading the driver-diagnostics RPC through the OPC UA Admin
// surface.
Assert.Skip("HSBY paired fixture (controllogix-secondary + hsby-mux sidecar) " +
"not yet promoted out of scaffold. Run scripts/e2e/test-abcip-hsby.ps1 against a " +
"manually-stood-up paired fixture when verifying this PR end-to-end.");
return Task.CompletedTask;
}
}

View File

@@ -0,0 +1,373 @@
using System.Collections.Concurrent;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.AbCip;
namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests;
/// <summary>
/// PR abcip-5.2 — unit tests for HSBY failover routing in
/// <see cref="AbCipDriver.ResolveHost"/>. Drives a paired-IP HSBY device through
/// primary→partner role flips via the FakeAbCipTagFactory's <c>Customise</c> hook +
/// asserts:
/// <list type="bullet">
/// <item><see cref="AbCipDriver.ResolveHost"/> returns the address of the
/// currently-Active chassis (and the configured primary when HSBY is off /
/// both Standby).</item>
/// <item>The per-device runtime cache is invalidated on flip — disposed handles
/// prove the failover handler ran.</item>
/// <item><see cref="AbCipWriteCoalescer"/> drops cached values for the device so
/// the partner pays the full round-trip on next write.</item>
/// <item><c>AbCip.HsbyFailoverCount</c> in driver-diagnostics increments per flip.</item>
/// <item>Multiple flips count correctly.</item>
/// </list>
/// </summary>
[Trait("Category", "Unit")]
public sealed class AbCipHsbyFailoverTests
{
private const string Primary = "ab://10.0.0.5/1,0";
private const string Partner = "ab://10.0.0.6/1,0";
// ---- ResolveHost routing ----
[Fact]
public async Task ResolveHost_returns_partner_when_partner_active()
{
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 0, partnerRoleValue: 1);
try
{
await WaitForActiveAsync(drv, Partner);
var resolved = drv.ResolveHost("Motor01_Speed");
// The tag isn't registered; the resolver still routes through ResolveActiveHostFor
// for the first configured device, whose partner is Active.
resolved.ShouldBe(Partner);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task ResolveHost_returns_primary_when_primary_active()
{
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 1, partnerRoleValue: 0);
try
{
await WaitForActiveAsync(drv, Primary);
drv.ResolveHost("Motor01_Speed").ShouldBe(Primary);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task Toggling_role_flips_ResolveHost_output()
{
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
var drv = BuildDriver(factory);
await drv.InitializeAsync("{}", CancellationToken.None);
try
{
await WaitForActiveAsync(drv, Primary);
drv.ResolveHost("anything").ShouldBe(Primary);
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
await WaitForActiveAsync(drv, Partner);
drv.ResolveHost("anything").ShouldBe(Partner);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task ResolveHost_falls_back_to_primary_when_both_standby()
{
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 0, partnerRoleValue: 0);
try
{
// Wait for the role state to settle so we know the loop ticked at least once.
// Property pattern instead of ?. so a missing device state doesn't satisfy the wait.
await WaitForAsync(() => drv.GetDeviceState(Primary) is { PrimaryRole: not HsbyRole.Unknown });
drv.ResolveHost("anything").ShouldBe(Primary,
"neither chassis Active means ActiveAddress is null; ResolveHost falls back to the configured primary");
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task ResolveHost_ignores_ActiveAddress_when_Hsby_disabled()
{
var factory = new FakeAbCipTagFactory();
var drv = new AbCipDriver(new AbCipDriverOptions
{
Devices =
[
new AbCipDeviceOptions(
Primary,
PartnerHostAddress: Partner,
Hsby: new AbCipHsbyOptions { Enabled = false }),
],
Probe = new AbCipProbeOptions { Enabled = false },
}, "drv-hsby-off-resolve", factory);
try
{
await drv.InitializeAsync("{}", CancellationToken.None);
// Manually plant an ActiveAddress that conflicts with the primary; ResolveHost
// must still pick the primary because Hsby is disabled.
var state = drv.GetDeviceState(Primary).ShouldNotBeNull();
state.ActiveAddress = Partner;
drv.ResolveHost("anything").ShouldBe(Primary);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
// ---- Cache invalidation on flip ----
[Fact]
public async Task Failover_invalidates_runtime_cache_and_increments_counter()
{
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
var drv = BuildDriverWithTag(factory, "Motor01_Speed");
await drv.InitializeAsync("{}", CancellationToken.None);
try
{
await WaitForActiveAsync(drv, Primary);
// Force a per-tag runtime to be created against the primary.
var initialReads = await drv.ReadAsync(["Motor01_Speed"], CancellationToken.None);
initialReads.Count.ShouldBe(1);
var state = drv.GetDeviceState(Primary).ShouldNotBeNull();
state.Runtimes.ShouldContainKey("Motor01_Speed");
var runtimeBeforeFlip = (FakeAbCipTag)state.Runtimes["Motor01_Speed"];
runtimeBeforeFlip.CreationParams.Gateway.ShouldBe("10.0.0.5");
// Flip — primary→Standby, partner→Active.
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
await WaitForActiveAsync(drv, Partner);
// The pre-flip runtime should have been disposed by the failover handler.
runtimeBeforeFlip.Disposed.ShouldBeTrue();
// Cache should be empty until the next read repopulates it.
state.Runtimes.ShouldNotContainKey("Motor01_Speed");
// Diagnostics counter ticked.
var diag = drv.GetHealth().Diagnostics.ShouldNotBeNull();
diag.ShouldContainKey("AbCip.HsbyFailoverCount");
diag["AbCip.HsbyFailoverCount"].ShouldBeGreaterThanOrEqualTo(1);
// Next read recreates against the partner gateway.
var afterReads = await drv.ReadAsync(["Motor01_Speed"], CancellationToken.None);
afterReads.Count.ShouldBe(1);
state.Runtimes.ShouldContainKey("Motor01_Speed");
var runtimeAfterFlip = (FakeAbCipTag)state.Runtimes["Motor01_Speed"];
runtimeAfterFlip.CreationParams.Gateway.ShouldBe("10.0.0.6",
"post-flip runtime must target the partner's gateway");
runtimeAfterFlip.ShouldNotBeSameAs(runtimeBeforeFlip);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task Failover_resets_write_coalescer_for_device()
{
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
var drv = BuildDriverWithTag(factory, "Motor01_Speed");
await drv.InitializeAsync("{}", CancellationToken.None);
try
{
await WaitForActiveAsync(drv, Primary);
// Seed the coalescer cache for this device + tag. We poke it directly via
// the test seam so we don't depend on the multi-write planner accepting our
// synthetic Motor01_Speed definition.
var def = new AbCipTagDefinition(
Name: "Motor01_Speed",
DeviceHostAddress: Primary,
TagPath: "Motor01_Speed",
DataType: AbCipDataType.DInt,
Writable: true,
WriteOnChange: true);
drv.WriteCoalescer.Record(Primary, def, 42);
drv.WriteCoalescer.ShouldSuppress(Primary, def, 42).ShouldBeTrue(
"baseline: identical re-write must be suppressed pre-failover");
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
await WaitForActiveAsync(drv, Partner);
// The cache for this device was cleared so the same write is no longer suppressed.
drv.WriteCoalescer.ShouldSuppress(Primary, def, 42).ShouldBeFalse(
"failover must drop cached known-written values; partner needs the wire round-trip");
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
[Fact]
public async Task Multiple_flips_each_increment_HsbyFailoverCount()
{
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
var drv = BuildDriver(factory);
await drv.InitializeAsync("{}", CancellationToken.None);
try
{
await WaitForActiveAsync(drv, Primary);
var diagBaseline = drv.GetHealth().Diagnostics.ShouldNotBeNull();
var startCount = diagBaseline.TryGetValue("AbCip.HsbyFailoverCount", out var v) ? v : 0;
// Flip 1: primary→partner
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
await WaitForActiveAsync(drv, Partner);
// Flip 2: partner→primary
FlipRoles(tracker, newPrimary: 1, newPartner: 0);
await WaitForActiveAsync(drv, Primary);
// Flip 3: primary→partner again
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
await WaitForActiveAsync(drv, Partner);
var diag = drv.GetHealth().Diagnostics.ShouldNotBeNull();
diag["AbCip.HsbyFailoverCount"].ShouldBeGreaterThanOrEqualTo(startCount + 3);
}
finally
{
await drv.ShutdownAsync(CancellationToken.None);
}
}
// ---- Helpers ----
private static AbCipDriver BuildDriver(FakeAbCipTagFactory factory) =>
new AbCipDriver(new AbCipDriverOptions
{
Devices =
[
new AbCipDeviceOptions(
Primary,
PartnerHostAddress: Partner,
Hsby: new AbCipHsbyOptions
{
Enabled = true,
RoleTagAddress = "WallClockTime.SyncStatus",
ProbeInterval = TimeSpan.FromMilliseconds(40),
}),
],
Probe = new AbCipProbeOptions { Enabled = false },
}, "drv-hsby-failover", factory);
private static AbCipDriver BuildDriverWithTag(FakeAbCipTagFactory factory, string tagName) =>
new AbCipDriver(new AbCipDriverOptions
{
Devices =
[
new AbCipDeviceOptions(
Primary,
PartnerHostAddress: Partner,
Hsby: new AbCipHsbyOptions
{
Enabled = true,
RoleTagAddress = "WallClockTime.SyncStatus",
ProbeInterval = TimeSpan.FromMilliseconds(40),
}),
],
Tags =
[
new AbCipTagDefinition(
Name: tagName,
DeviceHostAddress: Primary,
TagPath: tagName,
DataType: AbCipDataType.DInt,
Writable: true),
],
Probe = new AbCipProbeOptions { Enabled = false },
}, "drv-hsby-failover-tag", factory);
private static async Task<(AbCipDriver Driver, FakeAbCipTagFactory Factory)>
BuildHsbyDriverAsync(int primaryRoleValue, int partnerRoleValue)
{
var factory = new FakeAbCipTagFactory
{
Customise = p => p.Gateway == "10.0.0.5"
? new FakeAbCipTag(p) { Value = primaryRoleValue }
: new FakeAbCipTag(p) { Value = partnerRoleValue },
};
var drv = BuildDriver(factory);
await drv.InitializeAsync("{}", CancellationToken.None);
return (drv, factory);
}
/// <summary>
/// Snapshot of the live primary + partner role-tag fakes the factory has handed
/// out, keyed by gateway. Populated by the <c>Customise</c> hook on the
/// <see cref="FakeAbCipTagFactory"/> via a side-effecting lambda; the
/// <see cref="FakeAbCipTagFactory.Tags"/> dict alone is insufficient because both
/// chassis use the same role-tag TagName + the dict overwrites on the second
/// create.
/// </summary>
private sealed class HsbyRoleTagTracker
{
public FakeAbCipTag? Primary { get; set; }
public FakeAbCipTag? Partner { get; set; }
}
private static (FakeAbCipTagFactory Factory, HsbyRoleTagTracker Tracker)
BuildTrackingFactory(int initialPrimary, int initialPartner)
{
var tracker = new HsbyRoleTagTracker();
var factory = new FakeAbCipTagFactory();
factory.Customise = p =>
{
if (p.TagName == "WallClockTime.SyncStatus")
{
var fake = new FakeAbCipTag(p)
{
Value = p.Gateway == "10.0.0.5" ? initialPrimary : initialPartner,
};
if (p.Gateway == "10.0.0.5") tracker.Primary = fake;
else tracker.Partner = fake;
return fake;
}
// Non-role-tag handles (e.g. per-tag runtimes) — return a default fake.
return new FakeAbCipTag(p) { Value = 0 };
};
return (factory, tracker);
}
/// <summary>
/// Mutate the live primary / partner role-tag fakes' <c>Value</c> so the next
/// probe-loop tick reads the new role. The probe loop reuses one runtime per chassis
/// once initialised, so direct mutation of <see cref="FakeAbCipTag.Value"/> is
/// sufficient — no re-create required.
/// </summary>
private static void FlipRoles(HsbyRoleTagTracker tracker, int newPrimary, int newPartner)
{
if (tracker.Primary is not null) tracker.Primary.Value = newPrimary;
if (tracker.Partner is not null) tracker.Partner.Value = newPartner;
}
private static Task WaitForActiveAsync(AbCipDriver drv, string expectedActive) =>
WaitForAsync(() => drv.GetDeviceState(Primary)?.ActiveAddress == expectedActive);
private static async Task WaitForAsync(Func<bool> condition, TimeSpan? timeout = null)
{
var deadline = DateTime.UtcNow + (timeout ?? TimeSpan.FromSeconds(2));
while (!condition() && DateTime.UtcNow < deadline)
await Task.Delay(20);
}
}
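
`WaitForAsync` above returns silently when its deadline passes, so a stalled probe loop only shows up later as a failure in whichever assertion happens to follow. One possible alternative, sketched here under the assumption that failing at the wait itself is acceptable for these tests, is a polling helper that throws on timeout. `PollSketch.WaitUntilAsync` is a hypothetical name and is not part of this PR.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollSketch
{
    // Polls the condition until it holds; throws TimeoutException once the timeout
    // has elapsed with the condition still false.
    public static async Task WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? pollInterval = null)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(20);
        var clock = Stopwatch.StartNew(); // monotonic, unaffected by wall-clock changes
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
                throw new TimeoutException(
                    $"Condition not met within {timeout.TotalMilliseconds} ms.");
            await Task.Delay(interval);
        }
    }
}
```

A call such as `await PollSketch.WaitUntilAsync(() => drv.GetDeviceState(Primary)?.ActiveAddress == Partner, TimeSpan.FromSeconds(2));` would then report the stall at the wait rather than in a later assertion.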