@@ -7,10 +7,12 @@ directly without going through a separate diagnostics RPC. Mirrors the AB CIP

Closes #253 (PR ablegacy-10).

## The nine counters

Each device managed by the `AbLegacyDriver` exposes nine read-only nodes under
`AbLegacy/<host>/_Diagnostics/<name>`. The first seven shipped in PR ablegacy-10;
`DemoteCount` + `LastDemotedUtc` arrived with PR ablegacy-12 / #255 (auto-demote
on comm failure).

| Name | Type | Semantics |
|---|---|---|
@@ -21,6 +23,8 @@ Each device managed by the `AbLegacyDriver` exposes seven read-only nodes under
| `LastErrorCode` | Int32 | Most recent libplctag status code on a failed read; `0` when no error has been seen since the last reset. |
| `LastErrorMessage` | String | Most recent libplctag error message on a failed read; empty when no error has been seen since the last reset. |
| `CommFailures` | Int64 | Count of read failures mapped to `BadCommunicationError`. Covers both transient libplctag throws and retried-out chains, so operators see a single "wire fell off" counter. |
| `DemoteCount` | Int64 | **PR ablegacy-12**: cumulative auto-demote events for this device. Increments each time the driver crosses the consecutive-failure threshold and arms a fresh cool-down window. Preserved across `ReinitializeAsync` (i.e. through redeploys), so a flapping link surfaces as a steadily climbing counter. |
| `LastDemotedUtc` | String | **PR ablegacy-12**: ISO-8601 UTC timestamp of the most recent auto-demotion. Empty string when this device has never been demoted. |

**Address shape**: `_Diagnostics/<deviceHostAddress>/<name>`,
e.g. `_Diagnostics/ab://10.0.0.5/1,0/RequestCount`.
@@ -34,10 +38,11 @@ user-config tag node, just under a reserved sibling folder.
| Trigger | Effect |
|---|---|
| `ReinitializeAsync` | Every counter for every device resets to zero, and `LastErrorMessage` clears to empty. **PR ablegacy-12 exception:** `DemoteCount` + `LastDemotedUtc` survive the reinit, so an operator redeploying mid-incident doesn't lose the flapping-link history. |
| `ShutdownAsync` | All counters drop with the device map (including `DemoteCount`). |
| Driver process restart | Counters start at zero. |
| Probe transition Stopped→Running | **No automatic reset**: counters are cumulative across reconnect events, so operators can spot intermittent links by watching `CommFailures` keep climbing. |
| Probe transition Demoted→Running | **PR ablegacy-12**: clears the active demote window early; the cumulative `DemoteCount` stays put. |

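
A minimal Python sketch of the reset rules in the table above. The driver itself is C#. `DeviceCounters`, `reinitialize`, and `shutdown` are hypothetical stand-ins, and only five of the nine counters are modeled. The sketch assumes `ReinitializeAsync` rebuilds the device map from the same set of hosts and carries the demote history forward by host address.

```python
from dataclasses import dataclass


@dataclass
class DeviceCounters:
    """Hypothetical per-device record (a subset of the nine diagnostic nodes)."""
    comm_failures: int = 0
    last_error_code: int = 0
    last_error_message: str = ""
    demote_count: int = 0
    last_demoted_utc: str = ""  # empty string means "never demoted"


def reinitialize(devices: dict[str, DeviceCounters]) -> dict[str, DeviceCounters]:
    """ReinitializeAsync row: zero everything except the demote history,
    which is carried forward per host address (assumption)."""
    return {
        host: DeviceCounters(
            demote_count=c.demote_count,
            last_demoted_utc=c.last_demoted_utc,
        )
        for host, c in devices.items()
    }


def shutdown(devices: dict[str, DeviceCounters]) -> None:
    """ShutdownAsync row: drop the whole device map, demote history included."""
    devices.clear()
```

The carried-over history is keyed by host address. That is what lets a mid-incident redeploy keep `DemoteCount` climbing instead of restarting it from zero.
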
There is no in-process "reset" RPC at the time of writing. If you need to
clear counters without a redeploy, kick a `ReinitializeAsync` from the Admin
@@ -99,14 +104,85 @@ overview dashboard, plus a faster rate (1 s) on `LastErrorMessage` /
|
||||
short-circuit makes every read O(1) — there's no penalty for fast polling
|
||||
of the counter itself, only the OPC UA subscription bookkeeping.
|
||||
|
||||
## Auto-demote on comm failure (PR ablegacy-12 / #255)
|
||||
|
||||
When a device fails N consecutive reads or probes (`FailureThreshold`), the
driver marks it **Demoted** for a configurable cool-down window. Reads against
a demoted device short-circuit with `BadCommunicationError` *without invoking
libplctag*. That is the whole point of the feature: one slow PLC sharing
the driver thread can't starve faster peers reading from healthy hosts on
the same `AbLegacyDriver` instance.

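
The short-circuit can be pictured as a guard in front of the wire read. This is a hypothetical Python model, not the driver's C# code. `read_wire` stands in for the libplctag call, and the clock is passed in as milliseconds so the behaviour is deterministic. Status codes are plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional

BAD_COMM = "BadCommunicationError"  # label only, not a real OPC UA StatusCode value


@dataclass
class DemoteGate:
    """Hypothetical per-device guard; demoted_until_ms is None when not demoted."""
    demoted_until_ms: Optional[int] = None


def guarded_read(gate: DemoteGate, now_ms: int, read_wire: Callable[[], str]) -> str:
    """Answer BAD_COMM without touching the wire while the window is active."""
    if gate.demoted_until_ms is not None:
        if now_ms < gate.demoted_until_ms:
            return BAD_COMM  # short-circuit: read_wire is never called
        gate.demoted_until_ms = None  # window elapsed: clear the marker, let this read through
    return read_wire()
```

The demoted branch returns before `read_wire` runs. A dead host therefore costs one comparison per read instead of a full wire attempt, which is what keeps it from starving healthy peers.
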
### Configuration

Per-device and optional. A `null` `Demote` block keeps the documented defaults
(auto-demote **enabled** with 3 failures / 30 s).

```jsonc
{
  "Devices": [
    {
      "HostAddress": "ab://10.0.0.5/1,0",
      "PlcFamily": "Slc500",
      "Demote": {
        "FailureThreshold": 3, // default 3
        "DemoteForMs": 30000,  // default 30 s
        "Enabled": true        // default true
      }
    }
  ]
}
```

| Knob | Default | Notes |
|---|---|---|
| `FailureThreshold` | `3` | Consecutive comm failures before the device is demoted. A successful read or probe resets the tally. Terminal failures (`BadNodeIdUnknown`, `BadTypeMismatch`, …) **do not count**: they indicate config or decoder mismatches, not field outages. |
| `DemoteForMs` | `30000` (30 s) | Cool-down window in milliseconds. Reads issued while it is active short-circuit; a successful probe clears it early. |
| `Enabled` | `true` | Set to `false` to keep the diagnostic counters but skip the auto-throttle. The failure tally still ticks but never arms the cool-down. |

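
The sketch below shows one way the per-device block might resolve to effective settings. It assumes two fallbacks to the documented defaults:

* a `null` block falls back entirely;
* a key missing from a present block falls back individually.

This page only states the first, so treat per-key fallback as an assumption. `DemoteOptions` and `resolve_demote` are hypothetical Python names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class DemoteOptions:
    """Hypothetical mirror of the per-device Demote block, with documented defaults."""
    failure_threshold: int = 3
    demote_for_ms: int = 30_000
    enabled: bool = True


def resolve_demote(block: Optional[Mapping[str, Any]]) -> DemoteOptions:
    """A null block keeps every default; a missing key falls back per key (assumption)."""
    defaults = DemoteOptions()
    if block is None:
        return defaults
    return DemoteOptions(
        failure_threshold=int(block.get("FailureThreshold", defaults.failure_threshold)),
        demote_for_ms=int(block.get("DemoteForMs", defaults.demote_for_ms)),
        enabled=bool(block.get("Enabled", defaults.enabled)),
    )
```

For example, `resolve_demote({"FailureThreshold": 5})` keeps the 30 s window and leaves auto-demote enabled.
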
### Recovery

There are three ways out of Demoted, in order of likelihood:

1. **Probe success**: the per-device probe loop (`Probe.Enabled = true`,
   default address `S:0`) is the fast path. The first probe iteration after
   demotion exercises the wire; on success it clears `DemotedUntilUtc`
   immediately and transitions the host to `Running`.
2. **Window expiry**: once `DemoteForMs` elapses, the demote marker
   clears on the next read attempt. That read goes through; if it fails,
   the failure tally resumes from where it left off. A
   permanently-down device therefore re-arms the window after one more
   consecutive failure instead of repeating the full threshold.
3. **`ReinitializeAsync`**: clears `ConsecutiveFailures` +
   `DemotedUntilUtc` outright. The cumulative `DemoteCount` survives.

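
The failure tally and the three recovery paths fit in a small state machine. The sketch below is hypothetical Python with the following assumptions:

* `DemoteTracker` and its methods are illustrative names; only `ConsecutiveFailures`, `DemotedUntilUtc`, and `DemoteCount` come from this page.
* The terminal-status set lists just the two examples named above.
* A failure recorded while a window is already active neither re-arms the window nor bumps `demote_count`.

```python
from dataclasses import dataclass
from typing import Optional

# Only the two terminal statuses named on this page; the real list is longer.
TERMINAL_STATUSES = {"BadNodeIdUnknown", "BadTypeMismatch"}


@dataclass
class DemoteTracker:
    """Hypothetical per-device bookkeeping; all times are milliseconds."""
    failure_threshold: int = 3
    demote_for_ms: int = 30_000
    enabled: bool = True
    consecutive_failures: int = 0           # ConsecutiveFailures
    demoted_until_ms: Optional[int] = None  # DemotedUntilUtc
    demote_count: int = 0                   # DemoteCount (cumulative)

    def _window_active(self, now_ms: int) -> bool:
        return self.demoted_until_ms is not None and now_ms < self.demoted_until_ms

    def record_failure(self, status: str, now_ms: int) -> None:
        """Count a comm failure; arm a fresh window when the threshold is crossed."""
        if status in TERMINAL_STATUSES:
            return  # config/decoder mismatch, not a field outage
        self.consecutive_failures += 1
        if (self.enabled
                and not self._window_active(now_ms)
                and self.consecutive_failures >= self.failure_threshold):
            self.demoted_until_ms = now_ms + self.demote_for_ms
            self.demote_count += 1

    def record_success(self) -> None:
        """Path 1 (or any good read): reset the tally and clear the window early."""
        self.consecutive_failures = 0
        self.demoted_until_ms = None

    def on_read_attempt(self, now_ms: int) -> None:
        """Path 2: an elapsed window clears on the next read; the tally is kept."""
        if self.demoted_until_ms is not None and now_ms >= self.demoted_until_ms:
            self.demoted_until_ms = None

    def reinitialize(self) -> None:
        """Path 3: clear tally and window outright; demote_count survives."""
        self.consecutive_failures = 0
        self.demoted_until_ms = None
```

Comparing the tally with `>=` is what makes path 2 behave as described. After expiry the tally is already at or above the threshold, so a single further failure re-arms the window.
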
### Observability

`DemoteCount` is the headline counter: it increments once per demotion event,
not once per short-circuited read. A device that flaps every hour for a week
ends the week with a `DemoteCount` of roughly 168 (24 × 7). That is the
operator signal you actually want.

`LastDemotedUtc` is the ISO-8601 UTC timestamp of the most recent
demotion. Bind it on a per-device tile alongside `DemoteCount` for
"flapping link" alerting.

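
On the consuming side, the two nodes combine into a simple alert. This is a hypothetical Python sketch with two helpers:

* `format_last_demoted` shows one ISO-8601 UTC rendering with the empty-string convention. The exact string the driver emits, such as its fractional-second precision, isn't specified here.
* `is_flapping` is an illustrative alert rule that compares two samples of the cumulative `DemoteCount`.

```python
from datetime import datetime, timezone
from typing import Optional


def format_last_demoted(ts: Optional[datetime]) -> str:
    """Render a LastDemotedUtc-style value: ISO-8601 in UTC, or "" if never demoted.
    Expects a timezone-aware datetime."""
    if ts is None:
        return ""
    return ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def is_flapping(earlier_count: int, later_count: int, max_demotes: int) -> bool:
    """Hypothetical alert rule: flag a device whose cumulative DemoteCount
    rose by more than max_demotes between two samples."""
    return later_count - earlier_count > max_demotes
```

`ShutdownAsync` and process restarts zero the counter, so a later sample can be smaller than an earlier one. This rule reports such a drop as not flapping.
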
### Host-state surface

A demoted device reports `HostState.Demoted` (new in PR ablegacy-12,
defined in `Core.Abstractions/IHostConnectivityProbe.cs`). Consumers that
predate the new value (the central `HostStatusPublisher`) safely treat
it as `Stopped`, so no schema migration is needed.

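
One plausible shape for that fallback is to treat any value a legacy consumer doesn't recognize as `Stopped`. This page doesn't describe how `HostStatusPublisher` actually does it. In this hypothetical Python stand-in, the enum carries only the three states mentioned here, and the real enum in `IHostConnectivityProbe.cs` may have more members.

```python
from enum import Enum


class HostState(Enum):
    """Stand-in for HostState; only the members this page mentions."""
    RUNNING = "Running"
    STOPPED = "Stopped"
    DEMOTED = "Demoted"  # new in PR ablegacy-12


# States a pre-ablegacy-12 consumer was written against (assumption).
LEGACY_STATES = {HostState.RUNNING, HostState.STOPPED}


def as_legacy(state: HostState) -> HostState:
    """Collapse any value a legacy consumer doesn't know (e.g. DEMOTED) to STOPPED."""
    return state if state in LEGACY_STATES else HostState.STOPPED
```
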
## Cross-references

- [`AbLegacyDiagnosticTags.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDiagnosticTags.cs):
  counter store + read short-circuit
- [`AbLegacyDriver.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs):
  increment sites in `ReadAsync`, discovery emission in `DiscoverAsync`,
  auto-demote bookkeeping in `RecordFailureAndMaybeDemote` + `ProbeLoopAsync`
- [`AbLegacy-Test-Fixture.md`](AbLegacy-Test-Fixture.md): `AbLegacyDiagnosticsTests`,
  `AbLegacyAutoDemoteTests`, and the collision-rejection contract
- [AB CIP `_System/` parallel](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipSystemTagSource.cs):
  same pattern with the CIP-specific six entries (incl. the writable
  `_RefreshTagDb` trigger)