@@ -1,14 +1,18 @@
 # AbCip — ControlLogix HSBY paired-IP support
 
-PR abcip-5.1 adds **non-transparent** HSBY (Hot-Standby) awareness to the AB
-CIP driver. Each device may declare a partner gateway; when both gateways are
-up the driver concurrently probes a role tag on each chassis and reports
-which one is currently Active.
+PR abcip-5.1 + 5.2 ship **non-transparent** HSBY (Hot-Standby) awareness
+to the AB CIP driver. Each device may declare a partner gateway; when both
+gateways are up the driver concurrently probes a role tag on each chassis,
+reports which one is currently Active, and routes reads / writes through
+that chassis automatically.
 
-PR abcip-5.1 only **gathers + reports** the role. PR abcip-5.2 is the
-follow-up that wires the resolved active address into
-`AbCipDriver.ResolveHost` so reads and writes route to whichever chassis is
-Active without operator intervention.
+- **PR abcip-5.1** — gathers + reports the role of each chassis through
+  driver diagnostics. See [Role-tag detection matrix](#role-tag-detection-matrix)
+  and [Active-resolution rules](#active-resolution-rules).
+- **PR abcip-5.2** — wires the resolved active address into
+  `AbCipDriver.ResolveHost` and the runtime-cache lifecycle. See
+  [Failover behaviour](#failover-behaviour-pr-52) +
+  [Failure-mode walkthrough](#failure-mode-walkthrough).
 
 ## When to use HSBY paired IPs
 
@@ -24,7 +28,8 @@ edited the config.
 
 PR abcip-5.1 closes the visibility half of that gap by reading the role tag
 on both chassis. PR abcip-5.2 closes the routing half by re-pointing
-`ResolveHost` at the Active address each tick.
+`ResolveHost` at the Active address each tick + invalidating the per-tag
+runtime cache + write-coalescer state on every flip.
 
 ## Configuration
 
@@ -88,14 +93,17 @@ The driver surfaces three diagnostics counters per HSBY-enabled device
 | `AbCip.HsbyActive` | `1` if primary is Active, `2` if partner is Active, `0` if neither (or HSBY off) |
 | `AbCip.HsbyPrimaryRole` | `(int)HsbyRole` — `0` = Unknown, `1` = Active, `2` = Standby, `3` = Disqualified |
 | `AbCip.HsbyPartnerRole` | Same encoding as `HsbyPrimaryRole`, observed on the partner chassis |
+| `AbCip.HsbyFailoverCount` (PR 5.2) | Total number of `ActiveAddress` transitions the probe loop has observed across every HSBY-enabled device on this driver. Each increment maps to one runtime-cache invalidation + write-coalescer reset. |
 
 When more than one HSBY pair is configured on the same driver instance the
 flat keys are scoped per primary host: `AbCip.HsbyActive[ab://10.0.0.5/1,0]`,
 etc.
 
 The `DeviceState.ActiveAddress` field (internal; surfaced via
-`HsbyActive` diagnostics) is the address PR 5.2 will route through
-`ResolveHost`.
+`HsbyActive` diagnostics) is the address PR 5.2 routes through
+`ResolveHost` + uses to scope the per-host bulkhead / breaker key.
+See [Failover behaviour](#failover-behaviour-pr-52) for the runtime
+implications.
 
 ### Active-resolution rules
 
@@ -104,8 +112,8 @@ The `DeviceState.ActiveAddress` field (internal; surfaced via
 | Active | Standby / Disqualified / Unknown | primary |
 | Standby / Disqualified / Unknown | Active | partner |
 | Active | Active (split-brain) | **primary wins**, warning logged |
-| Standby + Standby | Standby + Standby | `null` (PR 5.2 will surface as `BadCommunicationError`) |
-| Unknown + Unknown | Unknown + Unknown | `null` |
+| Standby + Standby | Standby + Standby | `null` — PR 5.2's `ResolveHost` falls back to the configured primary; the existing dial flow surfaces `BadCommunicationError` if the primary is also down. See [Both-stuck](#both-stuck-no-chassis-active). |
+| Unknown + Unknown | Unknown + Unknown | `null` (same fallback as Standby + Standby) |
 
 Split-brain (both chassis claim Active simultaneously) is a real
 production failure mode — typically a redundancy-module misconfiguration or
@@ -150,28 +158,167 @@ otopcua-abcip-cli subscribe -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0 \
     RoleTagAddress, ProbeIntervalMs}` survive deserialise → driver →
     `DeviceState`).
   - `Hsby.Enabled = false` → no role probing.
-- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/AbCipHsbyRoleProberTests.cs`):
-  - **Skipped by default** (`Assert.Skip`) — `ab_server` cannot emulate
-    a ControlLogix HSBY pair (no `WallClockTime.SyncStatus`, no second
-    chassis concept). The Docker `paired` profile (PR 5.1) brings up two
-    `ab_server` instances + a stub `hsby-mux` sidecar so the topology is
-    documented, but PR 5.2 follow-up needs a patched `ab_server` image
-    that actually serves the role tag before the integration test can
-    assert anything against the wire.
-  - Trait `Category=Hsby` so `dotnet test --filter Category=Hsby` finds
-    this test once it's promoted.
+- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/`):
+  - `AbCipHsbyRoleProberTests.cs` (PR 5.1) and
+    `AbCipHsbyFailoverTests.cs` (PR 5.2) — both **skipped by default**
+    (`Assert.Skip`). `ab_server` cannot emulate a ControlLogix HSBY
+    pair (no `WallClockTime.SyncStatus`, no second chassis concept).
+    The Docker `paired` profile (PR 5.1) brings up two `ab_server`
+    instances + a stub `hsby-mux` sidecar so the topology is
+    documented, but a patched `ab_server` image that actually serves
+    the role tag is still on the follow-up list.
+  - Trait `Category=Hsby` so `dotnet test --filter Category=Hsby`
+    finds them once they're promoted.
+- **End-to-end** (`scripts/e2e/test-abcip-hsby.ps1`, PR 5.2):
+  - Paired-fixture variant of `test-abcip.ps1`. Subscribes to a tag
+    through the OPC UA server, flips the active chassis mid-stream
+    via the `hsby-mux` sidecar's `POST /flip` endpoint, asserts the
+    stream survives + `AbCip.HsbyFailoverCount` increments. Gated
+    on operator-supplied `BridgeNodeId` + a running paired fixture;
+    stays out of `test-all.ps1` until the patched `ab_server` image
+    lands.
 
-## Follow-ups (PR 5.2 + beyond)
+## Failover behaviour (PR 5.2)
 
+PR 5.2 wires `DeviceState.ActiveAddress` into the read / write hot path
+through `AbCipDriver.ResolveHost` and the runtime-cache lifecycle. After
+the role-probe loop (PR 5.1) detects an active-address transition, the
+driver re-points every wire-level operation at the now-Active chassis
+without operator intervention.
+
+### What flips on a failover
+
+| Aspect | Pre-flip | Post-flip |
+|---|---|---|
+| `ResolveHost(tag)` return | primary `HostAddress` | the partner address (when partner is now Active) |
+| Per-tag libplctag handles in `DeviceState.Runtimes` | created against primary gateway | dropped on flip; lazily re-created against the partner gateway on next read / write |
+| Parent-DINT RMW handles in `DeviceState.ParentRuntimes` | primary gateway | dropped on flip; same re-create-on-demand path |
+| `AbCipWriteCoalescer` per-device cache | last-known-written values from the primary | reset; the first write of any value to the partner pays the full round-trip |
+| `LogicalInstanceMap` (Logical-mode `@tags` walk) | populated for primary | cleared; the next read on a Logical-mode device re-walks `@tags` against the partner |
+| Per-host bulkhead key (Polly bulkhead + breaker, plan decision #144) | keyed on primary `HostAddress` | keyed on the new active address — the partner gets its own fresh breaker state instead of inheriting a tripped breaker from the now-standby chassis |
+| `AbCip.HsbyFailoverCount` diagnostic | previous count | previous count + 1 for every transition the probe loop observes |
+
+### How the invalidation runs
+
+PR 5.2 introduces an internal `OnActiveAddressChanged` event raised by
+`HsbyProbeLoopAsync` on every `DeviceState.ActiveAddress` transition. The
+driver subscribes to it from its own constructor; the handler
+(`HandleActiveAddressChanged`) does the cache invalidation in one place:
+
+1. Disposes every entry in `DeviceState.Runtimes` and
+   `DeviceState.ParentRuntimes`, then clears both dicts. Disposed
+   `IAbCipTagRuntime` instances release their underlying libplctag
+   handles so the native heap doesn't leak.
+2. Clears `DeviceState.LogicalInstanceMap` and resets
+   `LogicalWalkComplete = false` so the next read on a Logical-mode
+   device re-fires the `@tags` symbol walk against the new chassis.
+3. Calls `AbCipWriteCoalescer.Reset(deviceHostAddress)` so cached
+   "we already wrote 42" decisions don't suppress the first
+   partner-side write.
+4. Sets `DeviceState.RuntimesAddress = null` so subsequent
+   diagnostics observers see a fresh stamp on the next runtime
+   creation.
+5. `Interlocked.Increment` on the driver-wide
+   `AbCip.HsbyFailoverCount` counter.
+
+Steps 1-4 are idempotent: a repeat leaves the dicts empty and the
+coalescer reset is itself idempotent. Step 5 is not, which is why the
+probe loop raises the event only when the address actually changes.
+
+### Bulkhead key semantics
+
+The per-host resilience pipeline (Polly bulkhead + circuit breaker, plan
+decision #144) keys on whatever `IPerCallHostResolver.ResolveHost`
+returns. PR 5.2 changes that resolver so an HSBY-failed-over device
+returns the partner's address, which means:
+
+- The **device-state lookup** (`_devices.TryGetValue`) keeps using the
+  configured primary `HostAddress` as the dictionary key — that key
+  never changes for the lifetime of a device, so multi-device
+  configurations stay routable.
+- The **resilience pipeline** (Polly bulkhead, breaker, retry policies)
+  receives the active address as the host-name dimension. A breaker
+  tripped against the failed (now-standby) chassis doesn't bleed over
+  to the partner; the partner gets fresh limits + a closed
+  breaker.
+
+When HSBY is disabled (`Hsby.Enabled = false`) `ResolveHost` returns the
+configured primary `HostAddress` exactly as it always has — pre-5.2
+behaviour, no double-key risk.
+
+## Failure-mode walkthrough
+
+PR 5.2 adds three failure modes to reason about. The table under each
+one summarises the behaviour the driver reports + how an operator
+can inspect it.
+
+### Primary-stuck (primary unreachable, partner Active)
+
+The primary chassis goes away (network partition, power loss, a stuck
+Forward Open). The role-probe loop reads `HsbyRole.Unknown` for the
+primary and `HsbyRole.Active` for the partner.
+
+| Surface | Behaviour |
+|---|---|
+| `DeviceState.ActiveAddress` | partner address |
+| `DeviceState.PrimaryRole` | `Unknown` |
+| `DeviceState.PartnerRole` | `Active` |
+| `ResolveHost(tag)` | partner address |
+| Reads / writes | route through partner gateway transparently |
+| `AbCip.HsbyFailoverCount` | incremented when the address transitioned away from the primary |
+| `AbCip.HsbyActive` | `2` (partner is the active chassis) |
+| Operator action | none required for routing; investigate why the primary is unreachable using the device's `_System/_ConnectionStatus` (reported by the connectivity-probe loop) |
+
+### Secondary-stuck (partner unreachable, primary Active)
+
+The partner chassis goes away (its EtherNet/IP module is down, its IP is
+unreachable, or the redundancy module dropped it). The probe loop reads
+`HsbyRole.Active` for the primary and `HsbyRole.Unknown` for the partner.
+
+| Surface | Behaviour |
+|---|---|
+| `DeviceState.ActiveAddress` | primary address (no transition; this is the steady state) |
+| `DeviceState.PrimaryRole` | `Active` |
+| `DeviceState.PartnerRole` | `Unknown` |
+| `ResolveHost(tag)` | primary address |
+| Reads / writes | route through primary gateway exactly as in a non-HSBY deployment |
+| `AbCip.HsbyFailoverCount` | unchanged; no flip happened |
+| `AbCip.HsbyActive` | `1` (primary is the active chassis) |
+| Operator action | investigate why the partner is unreachable; the operational risk is that a future primary-side outage has no fall-back |
+
+### Both-stuck (no chassis Active)
+
+Both chassis report `Standby` / `Disqualified` / `Unknown` (a
+redundancy-module misconfiguration, both controllers in Program mode,
+or both unreachable).
+
+| Surface | Behaviour |
+|---|---|
+| `DeviceState.ActiveAddress` | `null` |
+| `ResolveHost(tag)` | falls back to the configured primary `HostAddress` |
+| Reads / writes | dispatched to the configured primary; a stuck primary surfaces `BadCommunicationError` per the existing dial flow |
+| `AbCip.HsbyActive` | `0` (no chassis Active) |
+| `AbCip.HsbyFailoverCount` | incremented once when `ActiveAddress` went from an address to `null` |
+| Operator action | investigate the redundancy module / mode keys; the SCADA layer sees stuck-or-bad-quality reads, not incorrect routing |
+
+The "fall back to primary on null Active" choice is deliberate. Routing
+all reads to a deterministic chassis (the configured primary) keeps the
+breaker key + bulkhead state stable while the operator diagnoses the
+dual-chassis outage; the alternative (round-robin / partner) would just
+trip both breakers in turn and obscure which chassis is the real
+problem.
+
+## Follow-ups (beyond PR 5.2)
+
-- **PR 5.2** — wire `ActiveAddress` into `ResolveHost` so reads/writes
-  route to the live chassis automatically. Today's PR only **gathers** the
-  role.
 - **Patched `ab_server` image** — add a writable `WallClockTime.SyncStatus`
   tag (or a separate Python shim) so the Docker `paired` profile can
-  exercise the wire-level role probe.
+  exercise the wire-level role probe + the
+  `tests/.../IntegrationTests/AbCipHsbyFailoverTests.cs` scaffold can
+  flip its `Assert.Skip` for a real integration assertion.
 - **`hsby-mux` REST endpoint** — `POST /flip {"active": "primary"}` writes
-  `1` to the chosen chassis + `0` to the other so integration tests can
-  drive switch-overs deterministically.
+  `1` to the chosen chassis + `0` to the other so integration tests +
+  `scripts/e2e/test-abcip-hsby.ps1` can drive switch-overs
+  deterministically.
 - **GuardLogix HSBY** — same role-tag plumbing applies; verify against a
   real 1756-L8xS pair when one is on-site.
 
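The Active-resolution rules table in the doc reduces to a small pure function. The C# sketch below makes a few assumptions. `HsbyRole` copies the 0-3 encoding from the diagnostics table. `HsbyResolution`, `ResolveActive` and `HsbyActiveCode` are hypothetical names, not the driver's API. The `onWarning` callback stands in for whatever sink logs the split-brain warning.

```csharp
#nullable enable
using System;

// Mirrors the 0-3 encoding of AbCip.HsbyPrimaryRole / AbCip.HsbyPartnerRole.
public enum HsbyRole { Unknown = 0, Active = 1, Standby = 2, Disqualified = 3 }

// Hypothetical helper illustrating the rules table; not the driver's API.
public static class HsbyResolution
{
    // Exactly one Active chassis wins; split-brain resolves to the primary
    // (with a warning); any other combination yields null (no Active chassis).
    public static string? ResolveActive(
        string primaryAddress, HsbyRole primaryRole,
        string partnerAddress, HsbyRole partnerRole,
        Action<string>? onWarning = null)
    {
        bool primaryActive = primaryRole == HsbyRole.Active;
        bool partnerActive = partnerRole == HsbyRole.Active;

        if (primaryActive && partnerActive)
        {
            onWarning?.Invoke(
                $"HSBY split-brain: primary='{primaryAddress}' and partner='{partnerAddress}' both report Active; using primary");
            return primaryAddress;
        }
        if (primaryActive) return primaryAddress;
        if (partnerActive) return partnerAddress;
        return null; // Standby / Disqualified / Unknown on both chassis
    }

    // AbCip.HsbyActive encoding: 1 = primary Active, 2 = partner Active, 0 = neither.
    public static int HsbyActiveCode(string? activeAddress, string primaryAddress, string partnerAddress)
    {
        if (activeAddress is null) return 0;
        if (string.Equals(activeAddress, primaryAddress, StringComparison.OrdinalIgnoreCase)) return 1;
        if (string.Equals(activeAddress, partnerAddress, StringComparison.OrdinalIgnoreCase)) return 2;
        return 0;
    }
}
```

Take the primary-stuck case: the primary reports `Unknown` and the partner reports `Active`. `ResolveActive` returns the partner address, and `HsbyActiveCode` maps that address to `2`, matching the walkthrough table.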
210
scripts/e2e/test-abcip-hsby.ps1
Normal file
@@ -0,0 +1,210 @@
+#Requires -Version 7.0
+<#
+.SYNOPSIS
+    End-to-end CLI test for AB CIP HSBY failover routing (PR abcip-5.2). Subscribes to
+    a tag through the OtOpcUa OPC UA server, flips the active chassis mid-stream via
+    the paired-fixture's hsby-mux sidecar HTTP endpoint, and asserts the subscribe
+    stream survives the failover (no permanent loss of notifications + the post-flip
+    data carries the partner-side update).
+
+.DESCRIPTION
+    Paired-fixture variant of test-abcip.ps1. Where test-abcip.ps1 runs against a
+    single ab_server instance, this script assumes a paired fixture with two
+    ab_server instances (primary + partner) and an hsby-mux sidecar exposing
+    /flip {"active": "primary" | "partner"} over HTTP.
+
+    Five assertions:
+      - HsbyInitialActive     — primary is Active at start (hsby-mux primes it)
+      - HsbyResolveActive     — a CLI read of TagPath through the primary gateway succeeds
+      - HsbyFailoverFlip      — POST {"active": "partner"} → AbCip.HsbyActive == 2
+      - HsbySubscribeSurvives — subscribe stream stays open across the flip + sees
+                                an updated value from the partner side
+      - HsbyFailoverCount     — AbCip.HsbyFailoverCount increments by ≥ 1
+
+.PARAMETER PrimaryGateway
+    ab://host[:port]/cip-path of the primary chassis. Default ab://127.0.0.1/1,0.
+
+.PARAMETER PartnerGateway
+    ab://host[:port]/cip-path of the partner chassis. Default ab://127.0.0.2/1,0.
+
+.PARAMETER HsbyMuxUrl
+    Base URL of the paired-fixture's hsby-mux sidecar. Default http://localhost:7080.
+    Endpoints (this script only calls POST /flip):
+      GET  /role → returns {"primary":"Active","partner":"Standby"}
+      POST /flip {"active":"primary"|"partner"} → flips role tag values on each chassis
+
+.PARAMETER OpcUaUrl
+    OtOpcUa server endpoint. Default opc.tcp://localhost:4840.
+
+.PARAMETER BridgeNodeId
+    NodeId at which the server publishes the tag exercised by the subscribe assertion.
+    Required.
+
+.PARAMETER TagPath
+    Logix symbolic path the bridge tag points at. Default 'TestDINT'.
+
+.PARAMETER DriverInstanceId
+    DriverInstance ID for the AB CIP driver under test. Used to scope the
+    driver-diagnostics RPC. Default 'abcip-hsby'.
+
+.EXAMPLE
+    ./test-abcip-hsby.ps1 -BridgeNodeId 'ns=2;s=AbCip/Bridge/TestDINT'
+#>
+
+param(
+    [string]$PrimaryGateway   = "ab://127.0.0.1/1,0",
+    [string]$PartnerGateway   = "ab://127.0.0.2/1,0",
+    [string]$HsbyMuxUrl       = "http://localhost:7080",
+    [string]$OpcUaUrl         = "opc.tcp://localhost:4840",
+    [Parameter(Mandatory)] [string]$BridgeNodeId,
+    [string]$TagPath          = "TestDINT",
+    [string]$DriverInstanceId = "abcip-hsby"
+)
+
+$ErrorActionPreference = "Stop"
+. "$PSScriptRoot/_common.ps1"
+
+$abcipCli = Get-CliInvocation `
+    -ProjectFolder "src/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli" `
+    -ExeName "otopcua-abcip-cli"
+$opcUaCli = Get-CliInvocation `
+    -ProjectFolder "src/ZB.MOM.WW.OtOpcUa.Client.CLI" `
+    -ExeName "otopcua-cli"
+
+$results = @()
+
+function Invoke-HsbyFlip {
+    param([string]$Active)
+    $body = @{ active = $Active } | ConvertTo-Json -Compress
+    try {
+        Invoke-RestMethod -Uri "$HsbyMuxUrl/flip" -Method Post -Body $body -ContentType 'application/json'
+    } catch {
+        throw "hsby-mux at $HsbyMuxUrl/flip rejected the request: $($_.Exception.Message)"
+    }
+}
+
+function Get-HsbyDiagnosticValue {
+    param([string]$Counter)
+    # Pull driver-diagnostics through the OPC UA Admin RPC surface. The CLI returns
+    # a raw JSON blob; we grep out the named counter so the assertion is robust to
+    # other counters the driver surfaces.
+    $diagArgs = @($opcUaCli.PrefixArgs) + @(
+        "driver-diagnostics", "-u", $OpcUaUrl, "-d", $DriverInstanceId)
+    $diagOut = & $opcUaCli.File @diagArgs 2>&1
+    $joined = ($diagOut -join "`n")
+    if ($joined -match "${Counter}.*?:\s*([\d\.]+)") {
+        return [double]$matches[1]
+    }
+    return $null
+}
+
+# ---- HsbyInitialActive — hsby-mux primes primary as Active ----
+Write-Header "HsbyInitialActive (POST $HsbyMuxUrl/flip {active=primary})"
+try {
+    Invoke-HsbyFlip -Active "primary" | Out-Null
+    Start-Sleep -Seconds 3  # role-probe loop default tick is 2s
+    $active = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyActive"
+    $passed = ($active -eq 1.0)
+    $results += [PSCustomObject]@{
+        Name   = "HsbyInitialActive"
+        Passed = $passed
+        Detail = if ($passed) { "AbCip.HsbyActive=1 after priming primary" } else { "AbCip.HsbyActive=$active (expected 1)" }
+    }
+} catch {
+    $results += [PSCustomObject]@{
+        Name = "HsbyInitialActive"; Passed = $false; Detail = $_.Exception.Message
+    }
+}
+
+# ---- HsbyResolveActive — driver routing reads through the primary ----
+Write-Header "HsbyResolveActive (read $TagPath via primary)"
+$readArgs = @("read") + @("-g", $PrimaryGateway, "-f", "ControlLogix") + @("-t", $TagPath, "--type", "DInt")
+$readOut = & $abcipCli.File @(@($abcipCli.PrefixArgs) + $readArgs) 2>&1
+$readOk = ($readOut -join "`n") -notmatch "(error|fail)"
+$results += [PSCustomObject]@{
+    Name   = "HsbyResolveActive"
+    Passed = $readOk
+    Detail = if ($readOk) { "primary read completed without error" } else { "read failed: $($readOut -join ' ')" }
+}
+
+# ---- HsbySubscribeSurvives + HsbyFailoverFlip + HsbyFailoverCount ----
+Write-Header "HsbyFailoverFlip + HsbySubscribeSurvives (subscribe across flip)"
+$failoverBaseline = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyFailoverCount"
+if ($null -eq $failoverBaseline) { $failoverBaseline = 0 }
+
+$duration = 12
+$subOut = New-TemporaryFile
+$subErr = New-TemporaryFile
+$subArgs = @($opcUaCli.PrefixArgs) + @(
+    "subscribe", "-u", $OpcUaUrl, "-n", $BridgeNodeId, "-i", "200", "--duration", "$duration")
+$subProc = Start-Process -FilePath $opcUaCli.File -ArgumentList $subArgs `
+    -NoNewWindow -PassThru `
+    -RedirectStandardOutput $subOut.FullName `
+    -RedirectStandardError $subErr.FullName
+
+# Let the subscribe settle + accumulate primary-side notifications.
+Start-Sleep -Seconds 3
+
+# Mid-stream flip — primary→Standby, partner→Active.
+try {
+    Invoke-HsbyFlip -Active "partner" | Out-Null
+} catch {
+    Stop-Process -Id $subProc.Id -Force -ErrorAction SilentlyContinue
+    $results += [PSCustomObject]@{
+        Name = "HsbyFailoverFlip"; Passed = $false; Detail = "hsby-mux flip rejected: $($_.Exception.Message)"
+    }
+}
+
+# Wait for the role-probe loop to catch up (default tick 2s + ProbeIntervalMs slack).
+Start-Sleep -Seconds 4
+
+# Drive a write through the partner so the subscribe sees a fresh value.
+$flipValue = Get-Random -Minimum 70000 -Maximum 79999
+$writeArgs = @("write") + @("-g", $PartnerGateway, "-f", "ControlLogix") + @("-t", $TagPath, "--type", "DInt", "-v", $flipValue)
+& $abcipCli.File @(@($abcipCli.PrefixArgs) + $writeArgs) | Out-Null
+
+$activeAfter = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyActive"
+$flipPassed = ($activeAfter -eq 2.0)
+$results += [PSCustomObject]@{
+    Name   = "HsbyFailoverFlip"
+    Passed = $flipPassed
+    Detail = if ($flipPassed) { "AbCip.HsbyActive=2 after flip" } else { "AbCip.HsbyActive=$activeAfter (expected 2)" }
+}
+
+# Stop the subscribe + harvest the stream.
+$subProc.WaitForExit(($duration + 5) * 1000) | Out-Null
+if (-not $subProc.HasExited) { Stop-Process -Id $subProc.Id -Force }
+
+$subText = (Get-Content $subOut.FullName -Raw) + (Get-Content $subErr.FullName -Raw)
+Remove-Item $subOut.FullName, $subErr.FullName -ErrorAction SilentlyContinue
+
+# Stream survival = at least one notification *after* the flip carries the new
+# partner-side value. The post-flip write of $flipValue is the canary.
+$saw = $subText -match "$flipValue"
+$results += [PSCustomObject]@{
+    Name   = "HsbySubscribeSurvives"
+    Passed = $saw
+    Detail = if ($saw) {
+        "subscribe stream surfaced post-flip value $flipValue from partner chassis"
+    } else {
+        "subscribe stream did not see the post-flip canary $flipValue — output: $subText"
+    }
+}
+
+# ---- HsbyFailoverCount — counter incremented by ≥ 1 ----
+Write-Header "HsbyFailoverCount"
+$failoverAfter = Get-HsbyDiagnosticValue -Counter "AbCip.HsbyFailoverCount"
+if ($null -eq $failoverAfter) { $failoverAfter = 0 }
+$counterOk = ($failoverAfter - $failoverBaseline) -ge 1
+$results += [PSCustomObject]@{
+    Name   = "HsbyFailoverCount"
+    Passed = $counterOk
+    Detail = if ($counterOk) {
+        "AbCip.HsbyFailoverCount went from $failoverBaseline → $failoverAfter"
+    } else {
+        "AbCip.HsbyFailoverCount unchanged ($failoverBaseline → $failoverAfter); expected at least 1 increment"
+    }
+}
+
+Write-Summary -Title "AB CIP HSBY failover e2e" -Results $results
+if ($results | Where-Object { -not $_.Passed }) { exit 1 }
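The driver changes below implement the routing contract from the doc's *Bulkhead key semantics* and *Both-stuck* sections. The device dictionary stays keyed on the configured primary. Only the host handed to the resilience pipeline follows the Active chassis. The C# sketch below illustrates that split under assumptions. `HsbyDeviceOptions`, `HsbyDeviceState` and `HsbyHostResolver` are hypothetical stand-ins. The driver's real `ResolveHost` takes a tag reference rather than the device's primary address.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the driver's per-device options and state.
public sealed record HsbyDeviceOptions(string HostAddress, string? PartnerHostAddress, bool HsbyEnabled);

public sealed class HsbyDeviceState
{
    public HsbyDeviceState(HsbyDeviceOptions options) => Options = options;
    public HsbyDeviceOptions Options { get; }
    // Written by the role-probe loop; null while no chassis is Active.
    public string? ActiveAddress { get; set; }
}

public sealed class HsbyHostResolver
{
    // Keyed on the configured primary HostAddress, which never changes for a device.
    private readonly Dictionary<string, HsbyDeviceState> _devices = new(StringComparer.OrdinalIgnoreCase);

    public void Add(HsbyDeviceState device) => _devices[device.Options.HostAddress] = device;

    // Host used as the bulkhead / breaker key and as the dial target.
    public string ResolveHost(string primaryHostAddress)
    {
        if (!_devices.TryGetValue(primaryHostAddress, out var device))
            return primaryHostAddress;                              // assumption: unknown devices pass through
        if (!device.Options.HsbyEnabled)
            return device.Options.HostAddress;                      // pre-5.2 behaviour
        return device.ActiveAddress ?? device.Options.HostAddress;  // null Active falls back to the primary
    }
}
```

Because the lookup key never moves, a failed-over device is still found under its primary address. Meanwhile, the breaker and bulkhead see the partner as a separate host.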
@@ -44,6 +44,24 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
     private IAddressSpaceBuilder? _cachedBuilder;
     private DriverHealth _health = new(DriverState.Unknown, null, null);
 
+    // PR abcip-5.2 — failover bookkeeping. Counter is surfaced through driver-diagnostics
+    // as AbCip.HsbyFailoverCount; the event lets internal subscribers react to an
+    // ActiveAddress flip without HsbyProbeLoopAsync calling deep into the runtime cache
+    // directly. The driver subscribes itself in the constructor so cache invalidation +
+    // write-coalescer reset run inline with the address-change observation.
+    private long _hsbyFailoverCount;
+
+    /// <summary>
+    /// PR abcip-5.2 — raised by <see cref="HsbyProbeLoopAsync"/> whenever a device's
+    /// <see cref="DeviceState.ActiveAddress"/> transitions to a value different from
+    /// the one observed on the previous tick. Args carry the device + the
+    /// (oldAddress, newAddress) pair so subscribers can decide whether the change
+    /// matters for them. Internal seam — the driver wires its own runtime-cache /
+    /// write-coalescer invalidation through this event so the bookkeeping runs in
+    /// one place + tests can assert via the public diagnostics counter.
+    /// </summary>
+    internal event EventHandler<HsbyActiveAddressChangedEventArgs>? OnActiveAddressChanged;
+
     public event EventHandler<DataChangeEventArgs>? OnDataChange;
     public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
     public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
@@ -67,6 +85,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
             onChange: (handle, tagRef, snapshot) =>
                 OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, tagRef, snapshot)));
         _alarmProjection = new AbCipAlarmProjection(this, _options.AlarmPollInterval);
+        // PR abcip-5.2 — wire the failover-handling subscriber. Drops every cached per-tag
+        // / parent-DINT runtime against the now-standby gateway, resets the write-coalescer
+        // (the prior known-written values were against the standby chassis), clears the
+        // logical-walk state so the @tags walk reruns against the new active gateway, and
+        // bumps the diagnostics counter that BuildDiagnostics surfaces.
+        OnActiveAddressChanged += HandleActiveAddressChanged;
     }
 
     /// <summary>
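The constructor wires `HandleActiveAddressChanged`, but the handler's body isn't part of this hunk. Following the five steps in the doc's *How the invalidation runs*, a handler could look roughly like the C# sketch below. Several pieces are assumptions:

- `FailoverDeviceState` and `FailoverInvalidator` are hypothetical types.
- An `Action<string>` stands in for `AbCipWriteCoalescer.Reset`.
- The runtimes are held in a `ConcurrentDictionary`.

The sketch also removes each runtime from its dictionary before disposing it, a small variation on "dispose all, then clear".

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical, trimmed-down device state; the driver's DeviceState holds much more.
public sealed class FailoverDeviceState
{
    public FailoverDeviceState(string hostAddress) => HostAddress = hostAddress;
    public string HostAddress { get; }
    public ConcurrentDictionary<string, IDisposable> Runtimes { get; } = new();
    public ConcurrentDictionary<string, IDisposable> ParentRuntimes { get; } = new();
    public ConcurrentDictionary<string, int> LogicalInstanceMap { get; } = new();
    public bool LogicalWalkComplete { get; set; }
    public string? RuntimesAddress { get; set; }
}

public sealed class FailoverInvalidator
{
    private readonly Action<string> _resetCoalescer; // stands in for AbCipWriteCoalescer.Reset
    private long _failoverCount;

    public FailoverInvalidator(Action<string> resetCoalescer) => _resetCoalescer = resetCoalescer;

    public long FailoverCount => Interlocked.Read(ref _failoverCount);

    public void HandleActiveAddressChanged(FailoverDeviceState state)
    {
        // 1. Release every per-tag and parent-DINT runtime (and their native handles).
        DisposeAll(state.Runtimes);
        DisposeAll(state.ParentRuntimes);
        // 2. Make the next Logical-mode read re-walk @tags against the new chassis.
        state.LogicalInstanceMap.Clear();
        state.LogicalWalkComplete = false;
        // 3. Forget "already wrote this value" decisions made against the old chassis.
        _resetCoalescer(state.HostAddress);
        // 4. Clear the stamp so the next runtime creation records a fresh address.
        state.RuntimesAddress = null;
        // 5. Count the flip; the only step that is not idempotent.
        Interlocked.Increment(ref _failoverCount);
    }

    private static void DisposeAll(ConcurrentDictionary<string, IDisposable> runtimes)
    {
        // ConcurrentDictionary.Keys returns a snapshot, so removing while iterating is safe.
        foreach (var key in runtimes.Keys)
        {
            if (runtimes.TryRemove(key, out var runtime))
                runtime.Dispose();
        }
    }
}
```

Calling the handler twice leaves the dictionaries empty both times but moves the counter from 1 to 2. That is why the probe loop has to raise the event only on a real change.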
@@ -258,6 +282,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 && !string.IsNullOrWhiteSpace(state.Options.PartnerHostAddress))
 {
 state.PartnerAddress = state.Options.PartnerHostAddress;
+// PR abcip-5.2 — pre-parse the partner address once so the runtime hot
+// path can swap (Gateway, Port, CipPath) without re-parsing on every
+// ResolveHost / EnsureTagRuntimeAsync call. A bad partner address is a
+// hard config error already flagged by HsbyProbeLoopAsync's TryParse +
+// OnWarning path, so a TryParse miss here is non-fatal — the runtime
+// never resolves to it because PartnerParsedAddress stays null.
+state.PartnerParsedAddress = AbCipHostAddress.TryParse(state.Options.PartnerHostAddress!);
 state.HsbyCts = new CancellationTokenSource();
 var ct = state.HsbyCts.Token;
 _ = Task.Run(() => HsbyProbeLoopAsync(state, hsby, ct), ct);
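Sketch, not part of the PR: the hunk relies on `AbCipHostAddress.TryParse` returning `null` for a bad partner address instead of throwing. The real parser is not shown, so the stand-in below (`HostAddress` is a hypothetical name) only assumes the `ab://gateway[:port]/cip-path` shape used by the test constants, an IPv4 or hostname gateway, and a default of TCP 44818, the registered EtherNet/IP port.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for AbCipHostAddress. The real type's parsing rules are not
// part of this PR; this only mirrors the (Gateway, Port, CipPath) shape the hunk uses.
public sealed record HostAddress(string Gateway, int Port, string CipPath)
{
    // Assumed default: 44818 is the registered EtherNet/IP TCP port.
    public const int DefaultPort = 44818;

    // Returns null instead of throwing, so a bad partner address simply never
    // becomes routable (mirrors PartnerParsedAddress staying null).
    // Assumes IPv4 or hostname gateways; an IPv6 literal would confuse the ':' split.
    public static HostAddress? TryParse(string? value)
    {
        const string scheme = "ab://";
        if (string.IsNullOrWhiteSpace(value) ||
            !value.StartsWith(scheme, StringComparison.OrdinalIgnoreCase))
            return null;

        var rest = value.Substring(scheme.Length);
        var slash = rest.IndexOf('/');
        var hostPart = slash < 0 ? rest : rest.Substring(0, slash);
        var cipPath = slash < 0 ? "" : rest.Substring(slash + 1);
        if (hostPart.Length == 0) return null;

        var port = DefaultPort;
        var colon = hostPart.LastIndexOf(':');
        if (colon >= 0)
        {
            // Explicit port: must be a valid TCP port number.
            if (!int.TryParse(hostPart.Substring(colon + 1), out port) || port is < 1 or > 65535)
                return null;
            hostPart = hostPart.Substring(0, colon);
            if (hostPart.Length == 0) return null;
        }
        return new HostAddress(hostPart, port, cipPath);
    }
}
```

Parsing once at init and keeping the result (or `null`) on the device state is what lets every later runtime creation swap gateway, port and path without touching strings again.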
@@ -784,7 +815,28 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 // No chassis Active — clear so PR abcip-5.2's ResolveHost can fault writes.
 newActive = null;
 }
+// PR abcip-5.2 — fire OnActiveAddressChanged on every transition so the
+// runtime-cache invalidation handler runs exactly once per flip. We compare
+// before assigning so a steady-state tick (Active didn't change) is a no-op.
+var prevActive = state.ActiveAddress;
 state.ActiveAddress = newActive;
+if (!string.Equals(prevActive, newActive, StringComparison.OrdinalIgnoreCase))
+{
+try
+{
+OnActiveAddressChanged?.Invoke(this,
+new HsbyActiveAddressChangedEventArgs(state, prevActive, newActive));
+}
+catch (Exception ex)
+{
+// A handler that throws must never tear the probe loop down. Surface
+// the failure through the warning sink + keep ticking; the next flip
+// gets another shot at invalidation.
+_options.OnWarning?.Invoke(
+$"AbCip HSBY active-address-changed handler threw on " +
+$"primary='{state.Options.HostAddress}' partner='{partnerAddress}': {ex.Message}");
+}
+}
 
 try { await Task.Delay(hsby.ProbeInterval, ct).ConfigureAwait(false); }
 catch (OperationCanceledException) { break; }
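Sketch of the compare-then-fire pattern the probe loop now uses, reduced to one class. `ActiveAddressTracker`, `Observe` and `ActiveChangedEventArgs` are illustrative names, not driver types. It shows three properties of the hunk: a steady-state tick raises nothing, the comparison is case-insensitive (`StringComparison.OrdinalIgnoreCase`), and a throwing subscriber is routed to a warning sink instead of unwinding the loop.

```csharp
#nullable enable
using System;

// Hypothetical payload; the PR's real one is HsbyActiveAddressChangedEventArgs.
public sealed class ActiveChangedEventArgs : EventArgs
{
    public ActiveChangedEventArgs(string? oldAddress, string? newAddress)
    {
        OldAddress = oldAddress;
        NewAddress = newAddress;
    }
    public string? OldAddress { get; }
    public string? NewAddress { get; }
}

// Hypothetical tracker mirroring the probe loop's "remember, assign, compare, fire".
public sealed class ActiveAddressTracker
{
    public string? ActiveAddress { get; private set; }
    public event EventHandler<ActiveChangedEventArgs>? ActiveAddressChanged;
    public Action<string>? OnWarning { get; set; }

    // Called once per probe tick. Returns true when a transition was raised.
    public bool Observe(string? newActive)
    {
        var prev = ActiveAddress;
        ActiveAddress = newActive;
        // "AB://..." and "ab://..." name the same chassis, so no event for a case-only change.
        if (string.Equals(prev, newActive, StringComparison.OrdinalIgnoreCase))
            return false;
        try
        {
            ActiveAddressChanged?.Invoke(this, new ActiveChangedEventArgs(prev, newActive));
        }
        catch (Exception ex)
        {
            // Report and keep going; the caller's loop is never torn down by a subscriber.
            OnWarning?.Invoke($"active-address-changed handler threw: {ex.Message}");
        }
        return true;
    }
}
```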
@@ -836,6 +888,46 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 }
 }
 
+/// <summary>
+/// PR abcip-5.2 — invalidation hook for an HSBY failover. Disposes every cached
+/// per-tag / parent-DINT runtime on the device so the next read / write re-creates
+/// against the new Active gateway, resets the write-coalescer's per-device cache
+/// (the prior known-written values were against the now-standby chassis), wipes
+/// the Logical-mode @tags walk so the new chassis gets a fresh symbol-table
+/// resolution, and bumps the AbCip.HsbyFailoverCount diagnostic. Idempotent — a
+/// re-fire against the same address (e.g. an event handler that races the assign)
+/// short-circuits on the RuntimesAddress equality check inside
+/// <see cref="EnsureTagRuntimeAsync"/>.
+/// </summary>
+private void HandleActiveAddressChanged(object? sender, HsbyActiveAddressChangedEventArgs e)
+{
+var state = e.Device;
+// Drop the runtime cache. The runtime creators repopulate against the new active
+// gateway on next read/write; the disposed handles' libplctag pointers are
+// released so the native heap doesn't leak.
+foreach (var rt in state.Runtimes.Values)
+{
+try { rt.Dispose(); } catch { }
+}
+state.Runtimes.Clear();
+foreach (var rt in state.ParentRuntimes.Values)
+{
+try { rt.Dispose(); } catch { }
+}
+state.ParentRuntimes.Clear();
+// Reset the @tags symbol-table walk so the new chassis re-fires it on next read;
+// the standby chassis's instance IDs don't transfer to the now-Active partner.
+state.LogicalInstanceMap.Clear();
+state.LogicalWalkComplete = false;
+// Reset the write-coalescer so the first post-flip write of any value pays the
+// full round-trip and the cache rebuilds from the new baseline.
+_writeCoalescer.Reset(state.Options.HostAddress);
+// Clear the per-device runtimes-address marker so the next runtime creator stamps
+// it with whatever the new ActiveParsedAddress resolves to.
+state.RuntimesAddress = null;
+Interlocked.Increment(ref _hsbyFailoverCount);
+}
+
 private void TransitionDeviceState(DeviceState state, HostState newState)
 {
 HostState old;
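A reduced model of `HandleActiveAddressChanged`, assuming a `ConcurrentDictionary` of disposable handles and a per-tag last-written map standing in for the write coalescer (`DeviceCache`, `KnownWritten` and `TrackingHandle` are hypothetical). It walks the same steps: dispose each handle in isolation, clear the map, forget known-written values so the next write reaches the new Active chassis, reset the address stamp, and count the event.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for the per-device runtime cache plus coalescer state.
public sealed class DeviceCache
{
    public ConcurrentDictionary<string, IDisposable> Runtimes { get; } = new();
    // Last value written per tag; an identical re-write is suppressed.
    public ConcurrentDictionary<string, object> KnownWritten { get; } = new();
    public string? RuntimesAddress { get; set; }

    private long _failoverCount;
    public long FailoverCount => Interlocked.Read(ref _failoverCount);

    public bool ShouldSuppress(string tag, object value) =>
        KnownWritten.TryGetValue(tag, out var last) && Equals(last, value);

    public void Invalidate()
    {
        foreach (var rt in Runtimes.Values)
        {
            // One faulty handle must not stop the others from being released.
            try { rt.Dispose(); } catch { }
        }
        Runtimes.Clear();
        KnownWritten.Clear();   // new Active chassis gets full round-trips
        RuntimesAddress = null; // next runtime creation re-stamps it
        Interlocked.Increment(ref _failoverCount);
    }
}

// Test double that records disposal, similar in spirit to FakeAbCipTag.Disposed.
public sealed class TrackingHandle : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}
```

Like the hunk, this assumes no runtime is created concurrently with `Invalidate`: a handle added between the `Values` snapshot and `Clear()` would be dropped without being disposed.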
@@ -911,11 +1003,34 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 if (AbCipSystemTagSource.IsSystemReference(fullReference))
 {
 var host = ExtractSystemDeviceHost(fullReference);
-if (host is not null) return host;
+if (host is not null) return ResolveActiveHostFor(host);
 }
 if (_tagsByName.TryGetValue(fullReference, out var def))
-return def.DeviceHostAddress;
-return _options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId;
+return ResolveActiveHostFor(def.DeviceHostAddress);
+return ResolveActiveHostFor(_options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId);
+}
+
+/// <summary>
+/// PR abcip-5.2 — failover-aware bulkhead-key resolver. The configured primary
+/// <c>HostAddress</c> stays the device-state lookup key (it never changes for a
+/// given device), but the resilience pipeline (Polly bulkhead + breaker per plan
+/// decision #144) keys on whatever this method returns. When HSBY is enabled and
+/// <see cref="DeviceState.ActiveAddress"/> resolves to the partner, we route the
+/// bulkhead through the partner's address so the new active partner gets its own
+/// fresh breaker state instead of inheriting the now-standby's tripped breaker.
+/// <para>
+/// When HSBY isn't enabled or no chassis is Active, returns the original
+/// primary host address — that's the legacy pre-5.2 behaviour and keeps the
+/// bulkhead state stable for the dial flow's BadCommunicationError surface.
+/// </para>
+/// </summary>
+internal string ResolveActiveHostFor(string deviceHostAddress)
+{
+if (!_devices.TryGetValue(deviceHostAddress, out var state)) return deviceHostAddress;
+if (state.Options.Hsby is not { Enabled: true }) return deviceHostAddress;
+var active = state.ActiveAddress;
+if (string.IsNullOrEmpty(active)) return deviceHostAddress;
+return active;
 }
 
 /// <summary>
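The resolver's fall-through order can be exercised as a pure function. `PairState` and `BulkheadKey` are hypothetical; the four branches mirror `ResolveActiveHostFor`: unknown device, HSBY off, no chassis Active, and an Active chassis whose address becomes the bulkhead/breaker key.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DeviceState, reduced to the fields the resolver reads.
public sealed record PairState(bool HsbyEnabled, string? ActiveAddress);

public static class BulkheadKey
{
    public static string ResolveActiveHostFor(
        IReadOnlyDictionary<string, PairState> devices, string deviceHostAddress)
    {
        // Unknown device: return the input untouched.
        if (!devices.TryGetValue(deviceHostAddress, out var state)) return deviceHostAddress;
        // HSBY off: the configured primary stays the key (pre-5.2 behaviour).
        if (!state.HsbyEnabled) return deviceHostAddress;
        // No chassis Active: keep the primary's key so the dial flow faults as before.
        var active = state.ActiveAddress;
        if (string.IsNullOrEmpty(active)) return deviceHostAddress;
        // Otherwise key on whichever chassis is Active.
        return active;
    }
}
```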
@@ -1367,10 +1482,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 {
 sliceLogicalId = sliceId;
 }
+// PR abcip-5.2 — slice handles also follow the active address.
+var sliceActive = device.ActiveParsedAddress;
 var baseParams = new AbCipTagCreateParams(
-Gateway: device.ParsedAddress.Gateway,
-Port: device.ParsedAddress.Port,
-CipPath: device.ParsedAddress.CipPath,
+Gateway: sliceActive.Gateway,
+Port: sliceActive.Port,
+CipPath: sliceActive.CipPath,
 LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
 TagName: parsedPath.ToLibplctagName(),
 Timeout: _options.Timeout,
@@ -1439,6 +1556,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 throw;
 }
 device.Runtimes[tagName] = runtime;
+// PR abcip-5.2 — keep the slice path's runtime cache lifecycle in lockstep with
+// the per-tag handles. The failover handler clears Runtimes wholesale, so the
+// address stamp here matches whatever ActiveAddress resolved to when the slice
+// params were built (the caller passed createParams pre-resolved).
+device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
+? device.ActiveAddress ?? device.Options.HostAddress
+: device.Options.HostAddress;
 return runtime;
 }
 
@@ -1859,10 +1983,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 {
 parentLogicalId = pid;
 }
+// PR abcip-5.2 — same active-address routing as EnsureTagRuntimeAsync so
+// BOOL-in-DINT RMW handles follow the failover.
+var active = device.ActiveParsedAddress;
 var runtime = _tagFactory.Create(new AbCipTagCreateParams(
-Gateway: device.ParsedAddress.Gateway,
-Port: device.ParsedAddress.Port,
-CipPath: device.ParsedAddress.CipPath,
+Gateway: active.Gateway,
+Port: active.Port,
+CipPath: active.CipPath,
 LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
 TagName: parentTagName,
 Timeout: _options.Timeout,
@@ -1879,6 +2006,9 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 throw;
 }
 device.ParentRuntimes[parentTagName] = runtime;
+device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
+? device.ActiveAddress ?? device.Options.HostAddress
+: device.Options.HostAddress;
 return runtime;
 }
 
@@ -1906,10 +2036,15 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 logicalId = resolvedId;
 }
 
+// PR abcip-5.2 — route through the resolved active address so an HSBY pair that
+// failed-over to the partner targets the partner's gateway / port / cip-path.
+// When HSBY is off or no chassis is Active the getter returns ParsedAddress and
+// behaviour is identical to pre-5.2 builds.
+var active = device.ActiveParsedAddress;
 var runtime = _tagFactory.Create(new AbCipTagCreateParams(
-Gateway: device.ParsedAddress.Gateway,
-Port: device.ParsedAddress.Port,
-CipPath: device.ParsedAddress.CipPath,
+Gateway: active.Gateway,
+Port: active.Port,
+CipPath: active.CipPath,
 LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
 TagName: parsed.ToLibplctagName(),
 Timeout: _options.Timeout,
@@ -1927,6 +2062,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 throw;
 }
 device.Runtimes[def.Name] = runtime;
+// Stamp the per-device runtimes-address marker so the failover handler can detect
+// a stale cache. Compared in DEBUG builds + diagnostics; production code routes
+// invalidation through OnActiveAddressChanged.
+device.RuntimesAddress = device.Options.Hsby is { Enabled: true }
+? device.ActiveAddress ?? device.Options.HostAddress
+: device.Options.HostAddress;
 return runtime;
 }
 
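The same three-line `RuntimesAddress` stamp now appears at three runtime-creation sites (slice, parent-DINT and per-tag). A possible follow-up, not something this PR does, is to fold it into a single helper; `RuntimeStamp.For` is a hypothetical name. Note that `?:` binds more loosely than `??`, so the expression reads as `enabled ? (active ?? primary) : primary`.

```csharp
#nullable enable
using System;

// Hypothetical helper consolidating the repeated RuntimesAddress expression.
public static class RuntimeStamp
{
    // Address that freshly created runtimes were built against:
    // the Active chassis when HSBY is on and one is Active, else the configured primary.
    public static string For(bool hsbyEnabled, string? activeAddress, string primaryHost) =>
        hsbyEnabled ? activeAddress ?? primaryHost : primaryHost;
}
```

One thing a reviewer might check: each stamp reads `ActiveAddress` after the handle was built from `ActiveParsedAddress`. Unless a lock not visible in these hunks serialises runtime creation with the failover handler, a flip landing between the two reads could leave a freshly created handle for the old gateway in a cache the handler has just cleared.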
@@ -1951,6 +2092,11 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 ["AbCip.WritesPassedThrough"] = _writeCoalescer.TotalWritesPassedThrough,
 // PR abcip-4.4 — total _RefreshTagDb truthy writes that dispatched to RebrowseAsync.
 ["AbCip.RefreshTriggers"] = _systemTagSource.TotalRefreshTriggers,
+// PR abcip-5.2 — count of HSBY active-address transitions the probe loop has
+// observed. Aggregated across every HSBY-enabled device on this driver
+// instance; the per-device breakdown is observable via the per-pair role
+// counters below.
+["AbCip.HsbyFailoverCount"] = Interlocked.Read(ref _hsbyFailoverCount),
 };
 // PR abcip-5.1 — HSBY role surface. One <Counter> per HSBY-enabled device:
 // AbCip.HsbyActive — 1 if ActiveAddress == primary, 2 if == partner, 0 otherwise.
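The new diagnostic is a 64-bit counter written with `Interlocked.Increment` from the failover handler and read with `Interlocked.Read` when diagnostics are built; the explicit read matters on 32-bit runtimes, where a plain `long` read is not atomic. A minimal sketch of the same pattern, assuming a `long`-valued snapshot dictionary (the driver's actual diagnostics value type is not shown in this hunk; `FailoverCounter` is hypothetical):

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper; the driver keeps a plain long field instead.
public sealed class FailoverCounter
{
    private long _count;

    // Writers (the failover handler) may run on any thread-pool thread.
    public void Increment() => Interlocked.Increment(ref _count);

    // Interlocked.Read keeps the 64-bit read atomic on 32-bit runtimes too.
    public IReadOnlyDictionary<string, long> Snapshot() =>
        new Dictionary<string, long> { ["AbCip.HsbyFailoverCount"] = Interlocked.Read(ref _count) };
}
```

Because `HandleActiveAddressChanged` increments on every raised event, the counter also ticks for the first null-to-Active assignment after startup and for Active-to-none transitions. That matches `Multiple_flips_each_increment_HsbyFailoverCount` reading a baseline before flipping, but it means a freshly started driver can report 1 without any failover having happened.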
@@ -2368,6 +2514,49 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 /// </summary>
 public string? PartnerAddress { get; set; }
 
+/// <summary>
+/// PR abcip-5.2 — parsed form of <see cref="PartnerAddress"/>, populated at init
+/// when HSBY is configured. <c>ResolveHost</c>'s caller side keeps using the
+/// opaque <see cref="AbCipDeviceOptions.HostAddress"/>; the **runtime hot path**
+/// consults <see cref="ActiveParsedAddress"/> so libplctag handles target the
+/// currently Active gateway / port / cip-path.
+/// </summary>
+public AbCipHostAddress? PartnerParsedAddress { get; set; }
+
+/// <summary>
+/// PR abcip-5.2 — parsed wire address that per-tag / per-slice / parent-DINT
+/// runtimes should be created against right now. Returns <see cref="ParsedAddress"/>
+/// (the configured primary) when (a) HSBY isn't enabled, (b) <see cref="ActiveAddress"/>
+/// is null (no chassis Active — fall through to the dial flow which will fault
+/// with BadCommunicationError on the next wire op), or (c) the active address
+/// equals the configured primary host. Returns <see cref="PartnerParsedAddress"/>
+/// when the partner is the live chassis. Cheap getter — every tag-runtime
+/// creation calls it.
+/// </summary>
+public AbCipHostAddress ActiveParsedAddress
+{
+get
+{
+if (Options.Hsby is not { Enabled: true } || ActiveAddress is null)
+return ParsedAddress;
+if (PartnerParsedAddress is not null
+&& string.Equals(ActiveAddress, PartnerAddress, StringComparison.OrdinalIgnoreCase))
+return PartnerParsedAddress;
+return ParsedAddress;
+}
+}
+
+/// <summary>
+/// PR abcip-5.2 — address every entry in <see cref="Runtimes"/> +
+/// <see cref="ParentRuntimes"/> was created against. <c>null</c> until the first
+/// read / write materialises a runtime; set to the resolved active address each
+/// time a runtime is created. <see cref="AbCipDriver.HsbyProbeLoopAsync"/>'s
+/// active-address-changed callback compares this against the new active and
+/// drops every cached handle on mismatch so the next read / write re-creates
+/// against the new gateway.
+/// </summary>
+public string? RuntimesAddress { get; set; }
+
 /// <summary>PR abcip-5.1 — most-recent role observed on the primary chassis.</summary>
 public HsbyRole PrimaryRole { get; set; } = HsbyRole.Unknown;
 
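The `ActiveParsedAddress` getter reduces to a selection over five inputs. `WireAddress` and `ActiveRouting.Select` are hypothetical stand-ins for `AbCipHostAddress` and the property. The last test case covers the branch the doc comment leaves implicit: partner Active but `PartnerParsedAddress` null (a partner address that failed to parse) falls back to the primary's parsed address.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for AbCipHostAddress.
public sealed record WireAddress(string Gateway, int Port, string CipPath);

public static class ActiveRouting
{
    public static WireAddress Select(
        bool hsbyEnabled,
        string? activeAddress,
        string? partnerAddress,
        WireAddress primaryParsed,
        WireAddress? partnerParsed)
    {
        // (a) HSBY off or (b) no chassis Active: stay on the configured primary.
        if (!hsbyEnabled || activeAddress is null) return primaryParsed;
        // Partner Active and parsed at init: route handles to it.
        if (partnerParsed is not null &&
            string.Equals(activeAddress, partnerAddress, StringComparison.OrdinalIgnoreCase))
            return partnerParsed;
        // (c) primary Active, or partner Active but never parsed.
        return primaryParsed;
    }
}
```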
@@ -2420,3 +2609,26 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
 }
 }
 }
+
+/// <summary>
+/// PR abcip-5.2 — event payload raised by <see cref="AbCipDriver"/> when the HSBY
+/// probe loop observes a transition in <see cref="AbCipDriver.DeviceState.ActiveAddress"/>.
+/// Subscribers consume <see cref="OldAddress"/> / <see cref="NewAddress"/> to decide
+/// whether to invalidate cached state. <see cref="OldAddress"/> is <c>null</c> on the
+/// first transition (driver freshly initialised) and <see cref="NewAddress"/> is
+/// <c>null</c> when neither chassis is Active (both Standby / Disqualified / Unknown).
+/// </summary>
+internal sealed class HsbyActiveAddressChangedEventArgs : EventArgs
+{
+public AbCipDriver.DeviceState Device { get; }
+public string? OldAddress { get; }
+public string? NewAddress { get; }
+
+public HsbyActiveAddressChangedEventArgs(
+AbCipDriver.DeviceState device, string? oldAddress, string? newAddress)
+{
+Device = device;
+OldAddress = oldAddress;
+NewAddress = newAddress;
+}
+}
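Subscribers cannot tell every transition apart from the payload alone. `OldAddress` is null on the first transition, but the probe loop passes the previous `ActiveAddress`, which is also null after any tick where neither chassis was Active, so a recovery looks the same as startup. A hypothetical subscriber-side classifier (`TransitionClassifier` and `ActiveTransition` are illustrative names) therefore keeps one bit of its own state, `seenActiveBefore`:

```csharp
#nullable enable
using System;

public enum ActiveTransition { InitialActive, Failover, ActiveLost, ActiveRegained }

public static class TransitionClassifier
{
    // Assumes the event is only raised when old != new (case-insensitive), as in the
    // probe loop, so "both null" and "same address" never reach this method.
    public static ActiveTransition Classify(string? oldAddress, string? newAddress, bool seenActiveBefore) =>
        (oldAddress, newAddress) switch
        {
            (_, null) => ActiveTransition.ActiveLost,
            (null, _) when !seenActiveBefore => ActiveTransition.InitialActive,
            (null, _) => ActiveTransition.ActiveRegained,
            _ => ActiveTransition.Failover,
        };
}
```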
@@ -0,0 +1,47 @@
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Driver.AbCip;
+
+namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests;
+
+/// <summary>
+/// PR abcip-5.2 — integration scaffold for HSBY failover routing through
+/// <see cref="AbCipDriver.ResolveHost"/>. Skipped by default because the paired
+/// fixture (controllogix-secondary <c>ab_server</c> instance + <c>hsby-mux</c>
+/// sidecar that flips the role tag on demand) is not yet stable in the Docker
+/// compose layout. The scaffold lives here so:
+/// <list type="bullet">
+/// <item>The trait is discoverable by <c>dotnet test --filter "Category=Hsby"</c>.</item>
+/// <item>The companion E2E script (<c>scripts/e2e/test-abcip-hsby.ps1</c>) has a
+/// paired surface already wired in tests when an operator stands up the fixture
+/// manually.</item>
+/// <item>A future PR can flip the skip into a real assertion without restructuring
+/// the test layout.</item>
+/// </list>
+/// The unit-level coverage in <c>AbCipHsbyFailoverTests</c> (in the unit tests
+/// project) exercises the active-address-routing + cache-invalidation contract in
+/// full against the FakeAbCipTagFactory; this scaffold is just the wire-level shape.
+/// </summary>
+[Trait("Category", "Hsby")]
+[Trait("Requires", "AbServer")]
+public sealed class AbCipHsbyFailoverTests
+{
+[AbServerFact]
+public Task ResolveHost_routes_to_partner_after_role_flip_through_hsby_mux()
+{
+// The paired-fixture compose service (controllogix + controllogix-secondary +
+// hsby-mux sidecar at http://localhost:7080) is not yet wired. When it ships,
+// the test body will:
+// 1. POST {"active": "primary"} to hsby-mux → assert ResolveHost = primary
+// gateway via a CLI read.
+// 2. POST {"active": "partner"} → wait for the probe loop to catch up →
+// assert ResolveHost = partner gateway via a second CLI read.
+// 3. Assert AbCip.HsbyFailoverCount on the driver's diagnostics
+// ≥ 1 by reading the driver-diagnostics RPC through the OPC UA Admin
+// surface.
+Assert.Skip("HSBY paired fixture (controllogix-secondary + hsby-mux sidecar) " +
+"not yet promoted out of scaffold. Run scripts/e2e/test-abcip-hsby.ps1 against a " +
+"manually-stood-up paired fixture when verifying this PR end-to-end.");
+return Task.CompletedTask;
+}
+}
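The scaffold's step 2 ("wait for the probe loop to catch up") and the unit tests' `WaitForActiveAsync` / `WaitForAsync` calls both need a bounded poll. Their real implementations are not in these hunks; a minimal sketch under that assumption (`Poll.UntilAsync` is a hypothetical name) that returns `false` on timeout instead of throwing, so the caller decides how to assert:

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical polling helper; not the test project's WaitForAsync.
public static class Poll
{
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var step = interval ?? TimeSpan.FromMilliseconds(20);
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed < timeout)
        {
            if (condition()) return true;
            await Task.Delay(step).ConfigureAwait(false);
        }
        // Final check so a condition that became true during the last delay still counts.
        return condition();
    }
}
```

With the unit tests' 40 ms `ProbeInterval`, a timeout of a few seconds leaves room for many probe ticks; a real paired fixture would likely need a longer bound.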
@@ -0,0 +1,373 @@
|
|||||||
|
using System.Collections.Concurrent;
|
||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||||
|
using ZB.MOM.WW.OtOpcUa.Driver.AbCip;
|
||||||
|
|
||||||
|
namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// PR abcip-5.2 — unit tests for HSBY failover routing in
|
||||||
|
/// <see cref="AbCipDriver.ResolveHost"/>. Drives a paired-IP HSBY device through
|
||||||
|
/// primary→partner role flips via the FakeAbCipTagFactory's <c>Customise</c> hook +
|
||||||
|
/// asserts:
|
||||||
|
/// <list type="bullet">
|
||||||
|
/// <item><see cref="AbCipDriver.ResolveHost"/> returns the address of the
|
||||||
|
/// currently-Active chassis (and the configured primary when HSBY is off /
|
||||||
|
/// both Standby).</item>
|
||||||
|
/// <item>The per-device runtime cache is invalidated on flip — disposed handles
|
||||||
|
/// prove the failover handler ran.</item>
|
||||||
|
/// <item><see cref="AbCipWriteCoalescer"/> drops cached values for the device so
|
||||||
|
/// the partner pays the full round-trip on next write.</item>
|
||||||
|
/// <item><c>AbCip.HsbyFailoverCount</c> in driver-diagnostics increments per flip.</item>
|
||||||
|
/// <item>Multiple flips count correctly.</item>
|
||||||
|
/// </list>
|
||||||
|
/// </summary>
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class AbCipHsbyFailoverTests
|
||||||
|
{
|
||||||
|
private const string Primary = "ab://10.0.0.5/1,0";
|
||||||
|
private const string Partner = "ab://10.0.0.6/1,0";
|
||||||
|
|
||||||
|
// ---- ResolveHost routing ----
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task ResolveHost_returns_partner_when_partner_active()
|
||||||
|
{
|
||||||
|
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 0, partnerRoleValue: 1);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
var resolved = drv.ResolveHost("Motor01_Speed");
|
||||||
|
// Tag isn't registered; resolver still falls through ResolveActiveHostFor on
|
||||||
|
// the first configured device, which has the partner active.
|
||||||
|
resolved.ShouldBe(Partner);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task ResolveHost_returns_primary_when_primary_active()
|
||||||
|
{
|
||||||
|
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 1, partnerRoleValue: 0);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
drv.ResolveHost("Motor01_Speed").ShouldBe(Primary);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Toggling_role_flips_ResolveHost_output()
|
||||||
|
{
|
||||||
|
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
|
||||||
|
var drv = BuildDriver(factory);
|
||||||
|
await drv.InitializeAsync("{}", CancellationToken.None);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
drv.ResolveHost("anything").ShouldBe(Primary);
|
||||||
|
|
||||||
|
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
|
||||||
|
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
drv.ResolveHost("anything").ShouldBe(Partner);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task ResolveHost_falls_back_to_primary_when_both_standby()
|
||||||
|
{
|
||||||
|
var (drv, _) = await BuildHsbyDriverAsync(primaryRoleValue: 0, partnerRoleValue: 0);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
// Wait for the role state to settle so we know the loop ticked at least once.
|
||||||
|
await WaitForAsync(() => drv.GetDeviceState(Primary)?.PrimaryRole != HsbyRole.Unknown);
|
||||||
|
drv.ResolveHost("anything").ShouldBe(Primary,
|
||||||
|
"neither chassis Active means ActiveAddress is null; ResolveHost falls back to the configured primary");
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task ResolveHost_ignores_ActiveAddress_when_Hsby_disabled()
|
||||||
|
{
|
||||||
|
var factory = new FakeAbCipTagFactory();
|
||||||
|
var drv = new AbCipDriver(new AbCipDriverOptions
|
||||||
|
{
|
||||||
|
Devices =
|
||||||
|
[
|
||||||
|
new AbCipDeviceOptions(
|
||||||
|
Primary,
|
||||||
|
PartnerHostAddress: Partner,
|
||||||
|
Hsby: new AbCipHsbyOptions { Enabled = false }),
|
||||||
|
],
|
||||||
|
Probe = new AbCipProbeOptions { Enabled = false },
|
||||||
|
}, "drv-hsby-off-resolve", factory);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await drv.InitializeAsync("{}", CancellationToken.None);
|
||||||
|
// Manually plant an ActiveAddress that conflicts with the primary; ResolveHost
|
||||||
|
// must still pick the primary because Hsby is disabled.
|
||||||
|
var state = drv.GetDeviceState(Primary).ShouldNotBeNull();
|
||||||
|
state.ActiveAddress = Partner;
|
||||||
|
drv.ResolveHost("anything").ShouldBe(Primary);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---- Cache invalidation on flip ----
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Failover_invalidates_runtime_cache_and_increments_counter()
|
||||||
|
{
|
||||||
|
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
|
||||||
|
var drv = BuildDriverWithTag(factory, "Motor01_Speed");
|
||||||
|
await drv.InitializeAsync("{}", CancellationToken.None);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
|
||||||
|
// Force a per-tag runtime to be created against the primary.
|
||||||
|
var initialReads = await drv.ReadAsync(["Motor01_Speed"], CancellationToken.None);
|
||||||
|
initialReads.Count.ShouldBe(1);
|
||||||
|
var state = drv.GetDeviceState(Primary).ShouldNotBeNull();
|
||||||
|
state.Runtimes.ShouldContainKey("Motor01_Speed");
|
||||||
|
var runtimeBeforeFlip = (FakeAbCipTag)state.Runtimes["Motor01_Speed"];
|
||||||
|
runtimeBeforeFlip.CreationParams.Gateway.ShouldBe("10.0.0.5");
|
||||||
|
|
||||||
|
// Flip — primary→Standby, partner→Active.
|
||||||
|
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
|
||||||
|
// The pre-flip runtime should have been disposed by the failover handler.
|
||||||
|
runtimeBeforeFlip.Disposed.ShouldBeTrue();
|
||||||
|
// Cache should be empty until the next read repopulates it.
|
||||||
|
state.Runtimes.ShouldNotContainKey("Motor01_Speed");
|
||||||
|
|
||||||
|
// Diagnostics counter ticked.
|
||||||
|
var diag = drv.GetHealth().Diagnostics.ShouldNotBeNull();
|
||||||
|
diag.ShouldContainKey("AbCip.HsbyFailoverCount");
|
||||||
|
diag["AbCip.HsbyFailoverCount"].ShouldBeGreaterThanOrEqualTo(1);
|
||||||
|
|
||||||
|
// Next read recreates against the partner gateway.
|
||||||
|
var afterReads = await drv.ReadAsync(["Motor01_Speed"], CancellationToken.None);
|
||||||
|
afterReads.Count.ShouldBe(1);
|
||||||
|
state.Runtimes.ShouldContainKey("Motor01_Speed");
|
||||||
|
var runtimeAfterFlip = (FakeAbCipTag)state.Runtimes["Motor01_Speed"];
|
||||||
|
runtimeAfterFlip.CreationParams.Gateway.ShouldBe("10.0.0.6",
|
||||||
|
"post-flip runtime must target the partner's gateway");
|
||||||
|
runtimeAfterFlip.ShouldNotBeSameAs(runtimeBeforeFlip);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Failover_resets_write_coalescer_for_device()
|
||||||
|
{
|
||||||
|
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
|
||||||
|
var drv = BuildDriverWithTag(factory, "Motor01_Speed");
|
||||||
|
await drv.InitializeAsync("{}", CancellationToken.None);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
// Seed the coalescer cache for this device + tag. We poke it directly via
|
||||||
|
// the test seam so we don't depend on the multi-write planner accepting our
|
||||||
|
// synthetic Motor01_Speed definition.
|
||||||
|
var def = new AbCipTagDefinition(
|
||||||
|
Name: "Motor01_Speed",
|
||||||
|
DeviceHostAddress: Primary,
|
||||||
|
TagPath: "Motor01_Speed",
|
||||||
|
DataType: AbCipDataType.DInt,
|
||||||
|
Writable: true,
|
||||||
|
WriteOnChange: true);
|
||||||
|
drv.WriteCoalescer.Record(Primary, def, 42);
|
||||||
|
drv.WriteCoalescer.ShouldSuppress(Primary, def, 42).ShouldBeTrue(
|
||||||
|
"baseline: identical re-write must be suppressed pre-failover");
|
||||||
|
|
||||||
|
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
|
||||||
|
// The cache for this device was cleared so the same write is no longer suppressed.
|
||||||
|
drv.WriteCoalescer.ShouldSuppress(Primary, def, 42).ShouldBeFalse(
|
||||||
|
"failover must drop cached known-written values; partner needs the wire round-trip");
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Multiple_flips_each_increment_HsbyFailoverCount()
|
||||||
|
{
|
||||||
|
var (factory, tracker) = BuildTrackingFactory(initialPrimary: 1, initialPartner: 0);
|
||||||
|
var drv = BuildDriver(factory);
|
||||||
|
await drv.InitializeAsync("{}", CancellationToken.None);
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
var diagBaseline = drv.GetHealth().Diagnostics.ShouldNotBeNull();
|
||||||
|
var startCount = diagBaseline.TryGetValue("AbCip.HsbyFailoverCount", out var v) ? v : 0;
|
||||||
|
|
||||||
|
// Flip 1: primary→partner
|
||||||
|
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
|
||||||
|
// Flip 2: partner→primary
|
||||||
|
FlipRoles(tracker, newPrimary: 1, newPartner: 0);
|
||||||
|
await WaitForActiveAsync(drv, Primary);
|
||||||
|
|
||||||
|
// Flip 3: primary→partner again
|
||||||
|
FlipRoles(tracker, newPrimary: 0, newPartner: 1);
|
||||||
|
await WaitForActiveAsync(drv, Partner);
|
||||||
|
|
||||||
|
var diag = drv.GetHealth().Diagnostics.ShouldNotBeNull();
|
||||||
|
diag["AbCip.HsbyFailoverCount"].ShouldBeGreaterThanOrEqualTo(startCount + 3);
|
||||||
|
}
|
||||||
|
finally
|
||||||
|
{
|
||||||
|
await drv.ShutdownAsync(CancellationToken.None);
|
||||||
|
}
|
||||||
|
}

    // ---- Helpers ----

    private static AbCipDriver BuildDriver(FakeAbCipTagFactory factory) =>
        new AbCipDriver(new AbCipDriverOptions
        {
            Devices =
            [
                new AbCipDeviceOptions(
                    Primary,
                    PartnerHostAddress: Partner,
                    Hsby: new AbCipHsbyOptions
                    {
                        Enabled = true,
                        RoleTagAddress = "WallClockTime.SyncStatus",
                        ProbeInterval = TimeSpan.FromMilliseconds(40),
                    }),
            ],
            Probe = new AbCipProbeOptions { Enabled = false },
        }, "drv-hsby-failover", factory);

    private static AbCipDriver BuildDriverWithTag(FakeAbCipTagFactory factory, string tagName) =>
        new AbCipDriver(new AbCipDriverOptions
        {
            Devices =
            [
                new AbCipDeviceOptions(
                    Primary,
                    PartnerHostAddress: Partner,
                    Hsby: new AbCipHsbyOptions
                    {
                        Enabled = true,
                        RoleTagAddress = "WallClockTime.SyncStatus",
                        ProbeInterval = TimeSpan.FromMilliseconds(40),
                    }),
            ],
            Tags =
            [
                new AbCipTagDefinition(
                    Name: tagName,
                    DeviceHostAddress: Primary,
                    TagPath: tagName,
                    DataType: AbCipDataType.DInt,
                    Writable: true),
            ],
            Probe = new AbCipProbeOptions { Enabled = false },
        }, "drv-hsby-failover-tag", factory);

    private static async Task<(AbCipDriver Driver, FakeAbCipTagFactory Factory)>
        BuildHsbyDriverAsync(int primaryRoleValue, int partnerRoleValue)
    {
        var factory = new FakeAbCipTagFactory
        {
            Customise = p => p.Gateway == "10.0.0.5"
                ? new FakeAbCipTag(p) { Value = primaryRoleValue }
                : new FakeAbCipTag(p) { Value = partnerRoleValue },
        };
        var drv = BuildDriver(factory);
        await drv.InitializeAsync("{}", CancellationToken.None);
        return (drv, factory);
    }

    /// <summary>
    /// Holds the live primary + partner role-tag fakes the factory has handed out,
    /// one slot per chassis, chosen by gateway. Populated by the <c>Customise</c> hook
    /// on the <see cref="FakeAbCipTagFactory"/> via a side-effecting lambda. The
    /// <see cref="FakeAbCipTagFactory.Tags"/> dict alone is insufficient: both chassis
    /// use the same role-tag TagName, so the second create overwrites the first
    /// chassis's entry.
    /// </summary>
    private sealed class HsbyRoleTagTracker
    {
        public FakeAbCipTag? Primary { get; set; }
        public FakeAbCipTag? Partner { get; set; }
    }

    private static (FakeAbCipTagFactory Factory, HsbyRoleTagTracker Tracker)
        BuildTrackingFactory(int initialPrimary, int initialPartner)
    {
        var tracker = new HsbyRoleTagTracker();
        var factory = new FakeAbCipTagFactory();
        factory.Customise = p =>
        {
            if (p.TagName == "WallClockTime.SyncStatus")
            {
                var fake = new FakeAbCipTag(p)
                {
                    Value = p.Gateway == "10.0.0.5" ? initialPrimary : initialPartner,
                };
                if (p.Gateway == "10.0.0.5") tracker.Primary = fake;
                else tracker.Partner = fake;
                return fake;
            }
            // Non-role-tag handles (e.g. per-tag runtimes) get a default fake.
            return new FakeAbCipTag(p) { Value = 0 };
        };
        return (factory, tracker);
    }

    /// <summary>
    /// Mutates the live primary / partner role-tag fakes' <c>Value</c> so the next
    /// probe-loop tick reads the new role. Once initialised, the probe loop reuses one
    /// runtime per chassis, so mutating <see cref="FakeAbCipTag.Value"/> directly is
    /// sufficient; no re-create is required.
    /// </summary>
    private static void FlipRoles(HsbyRoleTagTracker tracker, int newPrimary, int newPartner)
    {
        if (tracker.Primary is not null) tracker.Primary.Value = newPrimary;
        if (tracker.Partner is not null) tracker.Partner.Value = newPartner;
    }

    private static Task WaitForActiveAsync(AbCipDriver drv, string expectedActive) =>
        WaitForAsync(() => drv.GetDeviceState(Primary)?.ActiveAddress == expectedActive);
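
    // Hypothetical helper, a sketch that is not part of the original suite.
    // It polls the driver's health diagnostics until "AbCip.HsbyFailoverCount"
    // reaches at least `minimum`. Multiple_flips_each_increment_HsbyFailoverCount
    // waits only on ActiveAddress. If the counter were bumped slightly after the
    // address swap, a poll like this would tolerate that ordering.
    // Assumption: Diagnostics is a nullable string-keyed dictionary with numeric
    // values convertible to double, as the existing test's TryGetValue and
    // ShouldBeGreaterThanOrEqualTo calls suggest.
    private static Task WaitForFailoverCountAsync(AbCipDriver drv, double minimum) =>
        WaitForAsync(() =>
        {
            var diag = drv.GetHealth().Diagnostics;
            return diag is not null
                && diag.TryGetValue("AbCip.HsbyFailoverCount", out var count)
                && count >= minimum;
        });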

    // Polls every 20 ms until the condition holds or the timeout (default 2 s)
    // elapses. Returns silently on timeout, so the caller's next assertion is
    // what reports the failure.
    private static async Task WaitForAsync(Func<bool> condition, TimeSpan? timeout = null)
    {
        var deadline = DateTime.UtcNow + (timeout ?? TimeSpan.FromSeconds(2));
        while (!condition() && DateTime.UtcNow < deadline)
            await Task.Delay(20);
    }
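
    // Hypothetical fail-fast variant of WaitForAsync, a sketch that is not part
    // of the original suite. WaitForAsync returns silently when its deadline
    // passes, so a missed role flip only surfaces later as a less specific
    // assertion failure. This sketch throws a TimeoutException instead, which
    // names the step that stalled. It assumes the same 2 s default timeout and
    // 20 ms poll interval as WaitForAsync.
    private static async Task WaitForOrThrowAsync(Func<bool> condition, TimeSpan? timeout = null)
    {
        var limit = timeout ?? TimeSpan.FromSeconds(2);
        var deadline = DateTime.UtcNow + limit;
        while (!condition())
        {
            // Check the deadline only after a failed condition, so a condition
            // that is already true never throws.
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException($"Condition not met within {limit.TotalMilliseconds} ms.");
            await Task.Delay(20);
        }
    }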
}