Auto: abcip-5.1 — HSBY paired-IP role probing

Closes #242
Joseph Doherty
2026-04-26 07:51:44 -04:00
parent 349aa5c6f4
commit 561b0f9ea9
12 changed files with 1260 additions and 9 deletions

docs/drivers/AbCip-HSBY.md Normal file

@@ -0,0 +1,185 @@
# AbCip — ControlLogix HSBY paired-IP support
PR abcip-5.1 adds **non-transparent** HSBY (Hot-Standby) awareness to the AB
CIP driver. Each device may declare a partner gateway; when a partner is
declared and `Hsby.Enabled` is set, the driver concurrently probes a role tag
on each chassis and reports which one is currently Active.

PR abcip-5.1 only **gathers + reports** the role. PR abcip-5.2 is the
follow-up that wires the resolved active address into
`AbCipDriver.ResolveHost` so reads and writes route to whichever chassis is
Active without operator intervention.
## When to use HSBY paired IPs
You have a redundant **ControlLogix** chassis pair (1756-RM redundancy
module, two CPUs, one Active + one Standby) and the SCADA / OPC UA layer
needs to keep talking to *whichever chassis is currently Active* without an
operator manually re-pointing the connection.

Pre-5.1 the driver only knew about a single `HostAddress`. After a
hot-standby switch-over, the newly Active chassis (the former standby) sat
at a **different IP**, and the driver kept probing the previously Active,
now-dead address until someone edited the config.

PR abcip-5.1 closes the visibility half of that gap by reading the role tag
on both chassis. PR abcip-5.2 closes the routing half by re-pointing
`ResolveHost` at the Active address each tick.
## Configuration
```jsonc
{
  "Devices": [
    {
      "HostAddress": "ab://10.0.0.5/1,0",
      "PartnerHostAddress": "ab://10.0.0.6/1,0",
      "Hsby": {
        "Enabled": true,
        "RoleTagAddress": "WallClockTime.SyncStatus",
        "ProbeIntervalMs": 2000
      }
    }
  ]
}
```
| Field | Default | Notes |
|---|---|---|
| `PartnerHostAddress` | `null` | Canonical `ab://gateway[:port]/cip-path` of the partner chassis. `null` = no HSBY pair; the driver behaves exactly like every pre-5.1 build. |
| `Hsby.Enabled` | `false` | Master switch. When `false` (or `Hsby` omitted) no role probing happens, even if `PartnerHostAddress` is set. |
| `Hsby.RoleTagAddress` | `WallClockTime.SyncStatus` | Address of the role tag on each chassis. See [role-tag detection matrix](#role-tag-detection-matrix). |
| `Hsby.ProbeIntervalMs` | `2000` | How often each chassis is sampled. 2 s is a good default — tight enough to detect a switch-over within one Admin-UI refresh, loose enough to leave headroom for the regular probe loop. |
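
The defaults in the table can be modelled in a few lines. The sketch below is illustrative Python, not the driver's .NET code; `DeviceOptions`, `HsbyOptions`, `parse_device` and `role_probing_active` are hypothetical names. It assumes that role probing needs both a partner address and the master switch, as the table describes.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_ROLE_TAG = "WallClockTime.SyncStatus"


@dataclass(frozen=True)
class HsbyOptions:
    enabled: bool = False                 # master switch, off by default
    role_tag_address: str = DEFAULT_ROLE_TAG
    probe_interval_ms: int = 2000


@dataclass(frozen=True)
class DeviceOptions:
    host_address: str
    partner_host_address: Optional[str] = None   # None = no HSBY pair
    hsby: HsbyOptions = field(default_factory=HsbyOptions)

    @property
    def role_probing_active(self) -> bool:
        # Probing needs a partner address AND the master switch.
        return self.partner_host_address is not None and self.hsby.enabled


def parse_device(raw: dict) -> DeviceOptions:
    """Build options from one entry of the "Devices" array, applying defaults."""
    hsby_raw = raw.get("Hsby") or {}      # an omitted "Hsby" block means all defaults
    return DeviceOptions(
        host_address=raw["HostAddress"],
        partner_host_address=raw.get("PartnerHostAddress"),
        hsby=HsbyOptions(
            enabled=hsby_raw.get("Enabled", False),
            role_tag_address=hsby_raw.get("RoleTagAddress", DEFAULT_ROLE_TAG),
            probe_interval_ms=hsby_raw.get("ProbeIntervalMs", 2000),
        ),
    )
```

A device that sets `PartnerHostAddress` but omits `Hsby` parses cleanly and still reports `role_probing_active == False`, which matches the "behaves like a pre-5.1 build" guarantee.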
## Feature-flag gate (`Hsby.Enabled`)
`Hsby.Enabled = false` (the default) is the off-switch for the entire
feature. The role-probe loop never starts, the diagnostics keys are not
emitted, and the driver behaves identically to a pre-5.1 build. This is the
gate to flip when an operator wants to roll the feature out cautiously
across a fleet: set `Hsby.Enabled = true` per-device in driver config (no
build flag, no env var).

When the gate is on but the partner gateway is unreachable, the role-probe
loop reports `HsbyRole.Unknown` for the partner each tick. The primary's
role still drives the active-chassis resolution; the operator sees the
partner's role as Unknown in the Admin UI / driver diagnostics, which is the
correct surface for "we can't reach the standby chassis right now."
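
The unreachable-partner behaviour can be sketched as one probe tick. This is illustrative Python rather than the driver's .NET code: `probe_once`, `probe_pair`, the `ReadTag` signature, the 1 s timeout and the `fake_read` stand-in are all assumptions made for the example, and `None` stands in for `HsbyRole.Unknown`.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical reader signature: (host, tag) -> raw DINT value.
ReadTag = Callable[[str, str], Awaitable[int]]


async def probe_once(read_tag: ReadTag, host: str, tag: str,
                     timeout_s: float) -> Optional[int]:
    """Read the role tag once; None stands in for HsbyRole.Unknown."""
    try:
        return await asyncio.wait_for(read_tag(host, tag), timeout_s)
    except (OSError, asyncio.TimeoutError):
        # Unreachable or slow gateway: report Unknown rather than fail the tick.
        return None


async def probe_pair(read_tag: ReadTag, primary: str, partner: str,
                     tag: str, timeout_s: float = 1.0
                     ) -> Tuple[Optional[int], Optional[int]]:
    """Probe both chassis concurrently; one failure never blocks the other."""
    primary_raw, partner_raw = await asyncio.gather(
        probe_once(read_tag, primary, tag, timeout_s),
        probe_once(read_tag, partner, tag, timeout_s),
    )
    return primary_raw, partner_raw


async def fake_read(host: str, tag: str) -> int:
    """Stand-in reader: the partner gateway refuses connections."""
    if host.startswith("ab://10.0.0.6"):
        raise ConnectionRefusedError(host)
    return 1  # primary reports SyncStatus = 1 (Active)
```

Because each probe swallows its own connection error, the primary's reading survives a dead partner, so the primary can still be resolved as Active while the partner shows Unknown.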
## Role-tag detection matrix
| Firmware / platform | Address | Decode |
|---|---|---|
| **v20 / v24 / v32+ ControlLogix HSBY** | `WallClockTime.SyncStatus` (DINT) | `0` = Standby, `1` = Synchronized / Active, `2` = Disqualified, anything else = Unknown |
| **PLC-5 / SLC500 status-byte fallback** | `S:34` Module Status word | bit 0 = "this chassis is Active". Bit set → `Active`; clear → `Standby` |
| **Custom user role tag** | any DINT-typed CIP path | Same matrix as `WallClockTime.SyncStatus` (0 / 1 / 2). Out-of-range values → Unknown. |

`AbCipHsbyRoleProber.MapValueToRole` is the value-to-role mapper; unit tests
in `tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipHsbyTests.cs` pin every
row of the matrix.
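
The decode column of the matrix can be written out as two small functions. This is an illustrative Python sketch, not the signature of `MapValueToRole`; `map_sync_status` and `map_status_word` are hypothetical names, and how the real mapper chooses between the DINT decode and the `S:34` bit decode is not shown here. The enum values follow the `(int)HsbyRole` diagnostics encoding.

```python
from enum import IntEnum


class HsbyRole(IntEnum):
    # Values follow the (int)HsbyRole diagnostics encoding.
    UNKNOWN = 0
    ACTIVE = 1
    STANDBY = 2
    DISQUALIFIED = 3


# Raw WallClockTime.SyncStatus value -> role, per the detection matrix.
_SYNC_STATUS_ROLES = {
    0: HsbyRole.STANDBY,
    1: HsbyRole.ACTIVE,
    2: HsbyRole.DISQUALIFIED,
}


def map_sync_status(value: int) -> HsbyRole:
    """Decode a SyncStatus-style DINT (also used for custom user role tags)."""
    return _SYNC_STATUS_ROLES.get(value, HsbyRole.UNKNOWN)


def map_status_word(word: int) -> HsbyRole:
    """Decode the S:34 fallback: bit 0 set means this chassis is Active."""
    return HsbyRole.ACTIVE if word & 0x1 else HsbyRole.STANDBY
```

Note that the raw tag value and the role encoding do not line up: a raw `2` on the wire means Disqualified, while `2` in the diagnostics means Standby. Keeping the two in separate types avoids mixing them up.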
## What gets reported
The driver surfaces three diagnostics counters per HSBY-enabled device
(visible via `driver-diagnostics` RPC + the Admin UI):
| Counter | Value |
|---|---|
| `AbCip.HsbyActive` | `1` if primary is Active, `2` if partner is Active, `0` if neither (or HSBY off) |
| `AbCip.HsbyPrimaryRole` | `(int)HsbyRole`: `0` = Unknown, `1` = Active, `2` = Standby, `3` = Disqualified |
| `AbCip.HsbyPartnerRole` | Same encoding as `HsbyPrimaryRole`, observed on the partner chassis |

When more than one HSBY pair is configured on the same driver instance the
flat keys are scoped per primary host: `AbCip.HsbyActive[ab://10.0.0.5/1,0]`,
etc.

The `DeviceState.ActiveAddress` field (internal; surfaced via
`HsbyActive` diagnostics) is the address PR 5.2 will route through
`ResolveHost`.
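
As a rough model of how the three values relate to `ActiveAddress`, the sketch below builds the diagnostics dictionary for one pair. It is illustrative Python with a hypothetical `hsby_diagnostics` helper; the `scoped` switch and the assumption that all three keys take the same `[primary-host]` suffix are guesses based on the single scoped example above.

```python
from typing import Dict, Optional


def hsby_diagnostics(primary_host: str, partner_host: str,
                     primary_role: int, partner_role: int,
                     active_address: Optional[str],
                     scoped: bool = False) -> Dict[str, int]:
    """Build the three HSBY diagnostics values for one primary/partner pair."""
    if active_address == primary_host:
        active = 1          # primary is Active
    elif active_address == partner_host:
        active = 2          # partner is Active
    else:
        active = 0          # neither chassis resolved as Active
    # Assumed: with several pairs, every key carries the primary-host suffix.
    suffix = f"[{primary_host}]" if scoped else ""
    return {
        f"AbCip.HsbyActive{suffix}": active,
        f"AbCip.HsbyPrimaryRole{suffix}": primary_role,
        f"AbCip.HsbyPartnerRole{suffix}": partner_role,
    }
```

For example, a partner that has taken over (`primary_role=2`, `partner_role=1`, `active_address` equal to the partner host) yields `AbCip.HsbyActive = 2`.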
### Active-resolution rules
| Primary role | Partner role | `ActiveAddress` resolution |
|---|---|---|
| Active | Standby / Disqualified / Unknown | primary |
| Standby / Disqualified / Unknown | Active | partner |
| Active | Active (split-brain) | **primary wins**, warning logged |
| Standby | Standby | `null` (PR 5.2 will surface as `BadCommunicationError`) |
| Unknown | Unknown | `null` |

Split-brain (both chassis claim Active simultaneously) is a real
production failure mode, typically a redundancy-module misconfiguration or
a partial network split. The driver picks primary deterministically + emits
a warning through `AbCipDriverOptions.OnWarning` so operators see it in the
log.
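
The resolution table reduces to a short decision function. The following is an illustrative Python sketch, not the driver's .NET implementation; `resolve_active` and its `on_warning` callback are hypothetical stand-ins for the internal resolver and `AbCipDriverOptions.OnWarning`. It assumes that any pair with no Active chassis resolves to `null`, which generalises the two all-Standby and all-Unknown rows.

```python
from enum import IntEnum
from typing import Callable, Optional


class HsbyRole(IntEnum):
    UNKNOWN = 0
    ACTIVE = 1
    STANDBY = 2
    DISQUALIFIED = 3


def resolve_active(primary_role: HsbyRole, partner_role: HsbyRole,
                   primary_host: str, partner_host: str,
                   on_warning: Optional[Callable[[str], None]] = None
                   ) -> Optional[str]:
    """Return the address of the Active chassis, or None if neither is Active."""
    if primary_role == HsbyRole.ACTIVE:
        if partner_role == HsbyRole.ACTIVE and on_warning is not None:
            # Split-brain: both claim Active; primary wins deterministically.
            on_warning(f"HSBY split-brain: {primary_host} and {partner_host} "
                       "both report Active; using primary")
        return primary_host
    if partner_role == HsbyRole.ACTIVE:
        return partner_host
    return None  # no Active chassis (e.g. Standby/Standby, Unknown/Unknown)
```

Checking the primary first is what makes the split-brain outcome deterministic: the same pair of readings always resolves to the same address, regardless of which probe returned first.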
## CLI flags
The `otopcua-abcip-cli` tool exposes the HSBY plumbing through two surfaces
(see [Driver.AbCip.Cli.md](../Driver.AbCip.Cli.md) for the full CLI guide):
- `--partner <gateway>` — global flag on every command. Sets
`PartnerHostAddress` + auto-enables `Hsby.Enabled = true` so the role
probe runs alongside any read / write / subscribe.
- `hsby-status` — dedicated command that prints which chassis is
currently Active. Reads the role tag on both gateways for a few ticks +
prints the `(primary, partner, active)` tuple.
```powershell
# Print which chassis is Active right now.
# Addresses are quoted so PowerShell does not split them at the comma.
otopcua-abcip-cli hsby-status -g "ab://10.0.0.5/1,0" --partner "ab://10.0.0.6/1,0"

# Subscribe through the active chassis (PR 5.2 follow-up; today the
# subscribe stays pointed at the primary and the role probe runs alongside).
# PowerShell continues a line with a trailing backtick, not a backslash.
otopcua-abcip-cli subscribe -g "ab://10.0.0.5/1,0" --partner "ab://10.0.0.6/1,0" `
    -t Motor01_Speed --type Real -i 500
```
## Test coverage
- **Unit** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipHsbyTests.cs`):
  - Pure `MapValueToRole` matrix (WallClockTime.SyncStatus + S:34 bit
    mask + Unknown values).
  - End-to-end driver loop: primary Active / partner Standby resolves to
    primary; both Active resolves to primary with a warning; both
    Standby clears `ActiveAddress`; primary read failure routes to
    partner.
  - Diagnostics surface (`AbCip.HsbyActive` / `HsbyPrimaryRole` /
    `HsbyPartnerRole`).
  - DTO JSON round-trip (`PartnerHostAddress` + `Hsby.{Enabled,
    RoleTagAddress, ProbeIntervalMs}` survive deserialise → driver →
    `DeviceState`).
  - `Hsby.Enabled = false` → no role probing.
- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/AbCipHsbyRoleProberTests.cs`):
  - **Skipped by default** (`Assert.Skip`): `ab_server` cannot emulate
    a ControlLogix HSBY pair (no `WallClockTime.SyncStatus`, no second
    chassis concept). The Docker `paired` profile (PR 5.1) brings up two
    `ab_server` instances + a stub `hsby-mux` sidecar so the topology is
    documented, but a PR 5.2 follow-up needs a patched `ab_server` image
    that actually serves the role tag before the integration test can
    assert anything against the wire.
  - Trait `Category=Hsby` so `dotnet test --filter Category=Hsby` finds
    this test once it's promoted.
## Follow-ups (PR 5.2 + beyond)
- **PR 5.2** — wire `ActiveAddress` into `ResolveHost` so reads/writes
route to the live chassis automatically. Today's PR only **gathers** the
role.
- **Patched `ab_server` image** — add a writable `WallClockTime.SyncStatus`
tag (or a separate Python shim) so the Docker `paired` profile can
exercise the wire-level role probe.
- **`hsby-mux` REST endpoint** — `POST /flip {"active": "primary"}` writes
`1` to the chosen chassis + `0` to the other so integration tests can
drive switch-overs deterministically.
- **GuardLogix HSBY** — same role-tag plumbing applies; verify against a
real 1756-L8xS pair when one is on-site.
## See also
- [`docs/Driver.AbCip.Cli.md`](../Driver.AbCip.Cli.md) — `--partner` flag +
`hsby-status` command reference
- [`docs/drivers/AbServer-Test-Fixture.md`](AbServer-Test-Fixture.md) §"What
it does NOT cover" — HSBY entry
- [`docs/Redundancy.md`](../Redundancy.md) — server-level (OPC UA-stack)
redundancy; HSBY is the **driver-level** companion


@@ -160,6 +160,28 @@ The driver implements all of these + they have unit coverage, but the only
end-to-end paths `ab_server` validates today are atomic `ReadAsync` and
write-deadband / write-on-change suppression.
### 8. ControlLogix HSBY paired-IP role probing (PR abcip-5.1)
`ab_server` has no second-chassis concept and no `WallClockTime.SyncStatus`
tag. The HSBY paired-IP role-prober (PR abcip-5.1) is unit-tested only:
`AbCipHsbyTests` drives two fake runtimes (primary + partner), pins each
chassis's role-tag value, and asserts the active-resolution rules + DTO
round-trip + diagnostics surface.

The `paired` Docker compose profile spins up two `ab_server` instances +
a stub `hsby-mux` sidecar so the topology is documented, but a PR 5.2
follow-up needs a patched `ab_server` image (or a Python shim) that actually
serves the role tag before the integration test
(`AbCipHsbyRoleProberTests`) can flip its `Assert.Skip` into a real wire
assertion. Until then the test is gated on `Category=Hsby` + skipped by
default.

Lab-rig coverage is the authoritative path: a real 1756-RM redundant
chassis pair is the only place the live `WallClockTime.SyncStatus` matrix
+ split-brain handling can be exercised end-to-end. See
[`AbCip-HSBY.md`](AbCip-HSBY.md) for the full configuration + role-tag
detection matrix.
## Logix Emulate golden-box tier
Rockwell Studio 5000 Logix Emulate sits **above** ab_server in fidelity +