AB Legacy diagnostic counters
Per-device diagnostic counters surface as auto-generated read-only OPC UA
variables under each device's synthetic _Diagnostics/ folder. HMIs can bind
directly without going through a separate diagnostics RPC. Mirrors the AB CIP
_System/ pattern from PR abcip-4.3.
Closes #253 (PR ablegacy-10).
The nine counters
Each device managed by the AbLegacyDriver exposes nine read-only nodes under
AbLegacy/<host>/_Diagnostics/<name>. The first seven shipped in PR ablegacy-10;
DemoteCount + LastDemotedUtc arrived with PR ablegacy-12 / #255 (auto-demote
on comm failure).
| Name | Type | Semantics |
|---|---|---|
| RequestCount | Int64 | Total ReadAsync requests issued against this device. One increment per non-diagnostic reference per call, success or failure. |
| ResponseCount | Int64 | Successful read responses. |
| ErrorCount | Int64 | Failed read responses (any non-Good status). |
| RetryCount | Int64 | Retry attempts beyond the first per the PR 9 retry loop. A single read with two retries adds two. |
| LastErrorCode | Int32 | Most recent libplctag status code on a failed read; 0 when no error has been seen since the last reset. |
| LastErrorMessage | String | Most recent libplctag error message on a failed read; empty when no error has been seen since the last reset. |
| CommFailures | Int64 | Count of read failures mapped to BadCommunicationError. Spans transient libplctag throws + retried-out chains so operators see a single "wire fell off" counter. |
| DemoteCount | Int64 | PR ablegacy-12: cumulative auto-demote events for this device. Bumps every time the driver crosses the consecutive-failure threshold and arms a fresh cool-down window. Cumulative across ReinitializeAsync (preserved through redeploys) so a flapping link surfaces as a steadily climbing counter. |
| LastDemotedUtc | String | PR ablegacy-12: ISO-8601 UTC timestamp of the most recent auto-demotion. Empty string when this device has never been demoted. |
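The increment rules for the first seven counters can be modelled in a few lines. This is a minimal Python sketch rather than the driver's C# code: the class name DeviceCounters, the method record_read and the sample error values are hypothetical and exist only to illustrate the semantics in the table.

```python
from dataclasses import dataclass


@dataclass
class DeviceCounters:
    """Hypothetical in-memory model of the first seven per-device counters."""
    request_count: int = 0
    response_count: int = 0
    error_count: int = 0
    retry_count: int = 0
    last_error_code: int = 0
    last_error_message: str = ""
    comm_failures: int = 0

    def record_read(self, ok: bool, retries: int = 0, error_code: int = 0,
                    error_message: str = "", comm_failure: bool = False) -> None:
        """Record the outcome of one non-diagnostic reference in a read call."""
        self.request_count += 1        # one per reference, success or failure
        self.retry_count += retries    # attempts beyond the first
        if ok:
            self.response_count += 1
            return
        self.error_count += 1
        self.last_error_code = error_code
        self.last_error_message = error_message
        if comm_failure:               # failure mapped to BadCommunicationError
            self.comm_failures += 1


counters = DeviceCounters()
counters.record_read(ok=True, retries=2)  # succeeded on the third attempt
counters.record_read(ok=False, error_code=-1, error_message="illustrative error",
                     comm_failure=True)
print(counters.request_count, counters.retry_count, counters.comm_failures)
```

Under these assumptions, one read that needed two retries plus one failed read leaves RequestCount at 2, RetryCount at 2 and CommFailures at 1.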
Address shape: _Diagnostics/<deviceHostAddress>/<name> —
e.g. _Diagnostics/ab://10.0.0.5/1,0/RequestCount.
The <deviceHostAddress> segment is the canonical ab://host[:port]/cip-path
string from AbLegacyDeviceOptions.HostAddress. The browse path looks like
AbLegacy/<deviceHostAddress>/_Diagnostics/<name> — the same shape as a
user-config tag node, just under a reserved sibling folder.
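For HMI templates that generate bindings per device, the browse path can be assembled from the host address and the counter name. The helper names below are hypothetical, and the namespace index 2 is taken from the CLI example in this document rather than from any guarantee about the server's namespace table.

```python
def diagnostic_browse_path(host_address: str, name: str) -> str:
    """Build AbLegacy/<deviceHostAddress>/_Diagnostics/<name>."""
    return f"AbLegacy/{host_address}/_Diagnostics/{name}"


def diagnostic_node_id(host_address: str, name: str, namespace_index: int = 2) -> str:
    """Wrap the browse path in a string NodeId (namespace index is an assumption)."""
    return f"ns={namespace_index};s={diagnostic_browse_path(host_address, name)}"


node_id = diagnostic_node_id("ab://10.0.0.5/1,0", "RequestCount")
print(node_id)
```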
Reset behaviour
| Trigger | Effect |
|---|---|
| ReinitializeAsync | Every counter for every device resets to zero, plus LastErrorMessage clears to empty. PR ablegacy-12 exception: DemoteCount + LastDemotedUtc survive the reinit so an operator redeploying mid-incident doesn't lose the flapping-link history. |
| ShutdownAsync | All counters drop with the device map (including DemoteCount). |
| Driver process restart | Counters start at zero. |
| Probe transition Stopped→Running | No automatic reset — counters are cumulative across reconnect events so operators can spot intermittent links by watching CommFailures keep climbing. |
| Probe transition Demoted→Running | PR ablegacy-12 — early-clear of the active demote window, but the cumulative DemoteCount stays put. |
There is no in-process "reset" RPC at the time of writing. If you need to
clear counters without a redeploy, kick a ReinitializeAsync from the Admin
RPC surface: the driver calls EnsureDevice again for each host, so the freshly
registered counters start at zero (DemoteCount and LastDemotedUtc excepted,
as noted in the table).
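The reset rules can be summarised as a small model. This is a hedged Python sketch with hypothetical function names; it keeps counters in plain dictionaries and only illustrates which values survive which trigger.

```python
PRESERVED_ON_REINIT = ("DemoteCount", "LastDemotedUtc")


def fresh_counters() -> dict:
    """Zeroed counter set for a newly registered device."""
    return {
        "RequestCount": 0, "ResponseCount": 0, "ErrorCount": 0,
        "RetryCount": 0, "LastErrorCode": 0, "LastErrorMessage": "",
        "CommFailures": 0, "DemoteCount": 0, "LastDemotedUtc": "",
    }


def reinitialize(device_map: dict) -> dict:
    """Model of ReinitializeAsync: reset everything except the demote history."""
    result = {}
    for host, old in device_map.items():
        new = fresh_counters()
        for key in PRESERVED_ON_REINIT:
            new[key] = old[key]
        result[host] = new
    return result


def shutdown(device_map: dict) -> None:
    """Model of ShutdownAsync: the device map, and every counter in it, is dropped."""
    device_map.clear()


devices = {"ab://10.0.0.5/1,0": fresh_counters()}
devices["ab://10.0.0.5/1,0"].update(
    RequestCount=120, CommFailures=9, DemoteCount=3,
    LastDemotedUtc="2024-01-01T00:00:00Z",  # illustrative timestamp
)
after_reinit = reinitialize(devices)
```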
What does not increment counters
Reads against _Diagnostics/<host>/<name> are driver-local observability,
not field traffic — they short-circuit before the libplctag dispatch and do
NOT increment RequestCount or any other counter. Otherwise a 1 Hz HMI poll
of RequestCount would make the counter chase its own tail.
Writes against _Diagnostics/* are rejected with BadNotWritable because
every diagnostic node is SecurityClassification.ViewOnly — a misbehaving
SCADA template can't accidentally clobber the diagnostic surface.
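A toy dispatcher makes the short-circuit concrete. This is a sketch under stated assumptions, not the driver's implementation: FakeDriver and its methods are hypothetical, the field read is a placeholder, and status codes are represented as plain strings.

```python
DIAGNOSTICS_PREFIX = "_Diagnostics/"


class FakeDriver:
    """Hypothetical stand-in that separates diagnostic reads from field reads."""

    def __init__(self) -> None:
        self.counters = {"RequestCount": 0}
        self.field_reads = 0

    def read(self, address: str) -> int:
        if address.startswith(DIAGNOSTICS_PREFIX):
            # Driver-local lookup: no field traffic, no counter increment.
            name = address.rsplit("/", 1)[-1]
            return self.counters[name]
        self.counters["RequestCount"] += 1
        self.field_reads += 1
        return 0  # placeholder for a value fetched from the PLC

    def write(self, address: str, value: int) -> str:
        if address.startswith(DIAGNOSTICS_PREFIX):
            return "BadNotWritable"  # diagnostic nodes are view-only
        return "Good"


driver = FakeDriver()
driver.read("N7:0")
for _ in range(10):                      # simulate an HMI polling the counter
    polled = driver.read("_Diagnostics/RequestCount")
print(polled, driver.field_reads)
```

Polling the counter ten times leaves RequestCount at 1, which is the behaviour the short-circuit is there to guarantee.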
Collision with user tags
User-config tags must not shadow the reserved diagnostic names and
must not live under the synthetic _Diagnostics/ folder. Both shapes are
rejected at InitializeAsync time with a clear InvalidOperationException:
- A tag named RequestCount (or any other reserved diagnostic name) is rejected because it would silently never resolve at read time: the diagnostics short-circuit wins.
- A tag whose Address starts with _Diagnostics/ is rejected because the whole prefix is owned by the auto-emitted counters.
Pick a different name (SiteRequestCount, MachineRequestCount) or a
different address path (real PCCC files like N7:0).
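The two rejection rules can be sketched as a validation pass over the tag list. Everything here is illustrative: TagConfigError stands in for the InvalidOperationException the driver throws, the Name and Address dictionary keys are assumed field names, and the reserved set assumes that every counter name from the table is reserved.

```python
DIAGNOSTICS_PREFIX = "_Diagnostics/"

# Assumption: all counter names from the table are treated as reserved.
RESERVED_NAMES = frozenset({
    "RequestCount", "ResponseCount", "ErrorCount", "RetryCount",
    "LastErrorCode", "LastErrorMessage", "CommFailures",
    "DemoteCount", "LastDemotedUtc",
})


class TagConfigError(ValueError):
    """Hypothetical stand-in for the driver's InvalidOperationException."""


def validate_tags(tags: list) -> None:
    """Reject tags that shadow a reserved name or sit under _Diagnostics/."""
    for tag in tags:
        if tag["Name"] in RESERVED_NAMES:
            raise TagConfigError(f"tag name '{tag['Name']}' is reserved")
        if tag["Address"].startswith(DIAGNOSTICS_PREFIX):
            raise TagConfigError(f"address '{tag['Address']}' uses a reserved prefix")


validate_tags([{"Name": "SiteRequestCount", "Address": "N7:0"}])  # accepted
```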
HMI binding examples
OPC UA Client CLI
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840 `
-n "ns=2;s=AbLegacy/ab://10.0.0.5/1,0/_Diagnostics/RequestCount"
AB Legacy CLI (driver-direct, no OPC UA layer)
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli -- read `
-g "ab://10.0.0.5/1,0" -P Slc500 `
--address "_Diagnostics/RequestCount"
The driver-direct path lets you sanity-check the counter without standing up an OPC UA server — useful when triaging a wire-level issue on the bench.
Subscription pattern
Subscribe to all the counters at a slow rate (e.g. 5–10 s) on a long-lived
overview dashboard, plus a faster rate (1 s) on LastErrorMessage /
LastErrorCode when actively debugging a flapping link. The diagnostics
short-circuit makes every read O(1), so fast polling of a counter costs
nothing on the wire; the only overhead is the OPC UA subscription bookkeeping.
Auto-demote on comm failure (PR ablegacy-12 / #255)
When a device fails N consecutive reads or probes, the driver marks it
Demoted for a configurable cool-down window. Reads against a demoted
device short-circuit with BadCommunicationError without invoking
libplctag. That is the whole point of the feature: one slow PLC sharing
the driver thread can't starve faster peers reading from healthy hosts on
the same AbLegacyDriver instance.
Configuration
Per-device and optional. Leaving Demote null keeps the documented defaults
(auto-demote enabled with 3 failures / 30 s).
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PlcFamily": "Slc500",
"Demote": {
"FailureThreshold": 3, // default 3
"DemoteForMs": 30000, // default 30s
"Enabled": true // default true
}
}
]
}
| Knob | Default | Notes |
|---|---|---|
| FailureThreshold | 3 | Consecutive comm failures before the device is demoted. A successful read or probe resets the tally. Terminal failures (BadNodeIdUnknown, BadTypeMismatch, …) do not count: they're config / decoder mismatches, not field outages. |
| DemoteForMs | 30000 (30s) | Cool-down window. Reads while this is active short-circuit; a successful probe clears it early. |
| Enabled | true | Set to false to keep the diagnostic counters but skip the auto-throttle. The failure tally still ticks but never arms the cool-down. |
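The defaulting behaviour can be illustrated with a small parser. This is a Python sketch, not the driver's options binding: DemoteOptions and parse_demote are hypothetical names, the second device address is made up, and filling individually omitted knobs from the table defaults is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoteOptions:
    """Defaults taken from the knob table."""
    failure_threshold: int = 3
    demote_for_ms: int = 30_000
    enabled: bool = True


def parse_demote(device: dict) -> DemoteOptions:
    """Return the effective demote settings for one device entry."""
    raw = device.get("Demote")
    if raw is None:                      # missing or null section: all defaults
        return DemoteOptions()
    defaults = DemoteOptions()
    return DemoteOptions(
        failure_threshold=raw.get("FailureThreshold", defaults.failure_threshold),
        demote_for_ms=raw.get("DemoteForMs", defaults.demote_for_ms),
        enabled=raw.get("Enabled", defaults.enabled),
    )


config = json.loads("""
{
  "Devices": [
    {"HostAddress": "ab://10.0.0.5/1,0", "PlcFamily": "Slc500"},
    {"HostAddress": "ab://10.0.0.6/1,0", "PlcFamily": "Slc500",
     "Demote": {"FailureThreshold": 5, "Enabled": false}}
  ]
}
""")
options = [parse_demote(device) for device in config["Devices"]]
print(options)
```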
Recovery
Three ways out of Demoted, in order of likelihood:
- Probe success: the per-device probe loop (Probe.Enabled = true, default address S:0) is the fast path. The next probe iteration after demotion will exercise the wire; on success it clears DemotedUntilUtc immediately and transitions the host to Running.
- Window expiry: once DemoteForMs elapses the demote marker clears on the next read attempt. The read goes through; if it fails, the failure tally keeps counting from where it left off (so a permanently-down device re-arms the window after one more consecutive failure rather than having to repeat the full threshold).
- ReinitializeAsync: clears ConsecutiveFailures + DemotedUntilUtc outright. Cumulative DemoteCount survives.
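The three recovery paths, together with the arming rule, amount to a small state machine. The sketch below is a Python model with hypothetical names (DemoteState and its methods) and an explicit millisecond clock passed in by the caller; it illustrates the described behaviour and is not the driver's RecordFailureAndMaybeDemote code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoteState:
    failure_threshold: int = 3
    demote_for_ms: int = 30_000
    enabled: bool = True
    consecutive_failures: int = 0
    demoted_until_ms: Optional[int] = None
    demote_count: int = 0

    def is_demoted(self, now_ms: int) -> bool:
        """True while the cool-down is active; an expired marker is cleared here."""
        if self.demoted_until_ms is not None and now_ms >= self.demoted_until_ms:
            self.demoted_until_ms = None   # window expiry: the tally is kept
        return self.demoted_until_ms is not None

    def record_failure(self, now_ms: int) -> None:
        """Count a comm failure and arm a fresh window at or past the threshold."""
        self.consecutive_failures += 1
        if self.enabled and self.consecutive_failures >= self.failure_threshold:
            self.demoted_until_ms = now_ms + self.demote_for_ms
            self.demote_count += 1

    def record_success(self) -> None:
        """Successful read or probe: reset the tally and clear any active window."""
        self.consecutive_failures = 0
        self.demoted_until_ms = None

    def reinitialize(self) -> None:
        """Clear the tally and the window; the cumulative demote_count survives."""
        self.consecutive_failures = 0
        self.demoted_until_ms = None


state = DemoteState()
for t in (0, 1_000, 2_000):
    state.record_failure(t)                      # third failure arms the window
demoted_during_window = state.is_demoted(10_000)
demoted_after_expiry = state.is_demoted(32_000)  # window armed at 2 000 ms has elapsed
state.record_failure(32_000)                     # one more failure re-arms it
print(demoted_during_window, demoted_after_expiry, state.demote_count)
```

Because the tally is not reset on window expiry, the fourth failure alone is enough to re-arm the window and bump the demote count to 2.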
Observability
DemoteCount is the headline counter — it bumps once per demotion event,
not per short-circuited read. A device that flaps every hour for a week
shows DemoteCount = ~168 on Friday afternoon, which is the operator
signal you actually want.
LastDemotedUtc is the ISO-8601 UTC timestamp of the most recent
demotion. Bind it on a per-device tile alongside DemoteCount for
"flapping link" alerting.
Host-state surface
A demoted device reports HostState.Demoted (new in PR ablegacy-12
on Core.Abstractions/IHostConnectivityProbe.cs). Consumers that
predate the new value (the central HostStatusPublisher) safely treat
it as Stopped — no schema migration needed.
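One way an older consumer could end up treating the new value as Stopped is a fallback on unknown state names. The sketch below is purely illustrative: the two-member enum and the to_legacy_state helper are hypothetical, and how HostStatusPublisher actually performs the mapping is not stated in this document.

```python
from enum import Enum


class LegacyHostState(Enum):
    """Hypothetical subset of states known to a consumer that predates Demoted."""
    RUNNING = "Running"
    STOPPED = "Stopped"


def to_legacy_state(reported: str) -> LegacyHostState:
    """Map a reported state name, falling back to STOPPED for unknown values."""
    try:
        return LegacyHostState(reported)
    except ValueError:
        return LegacyHostState.STOPPED


print(to_legacy_state("Demoted"))
```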
Cross-references
- AbLegacyDiagnosticTags.cs: counter store + read short-circuit
- AbLegacyDriver.cs: increment sites in ReadAsync, discovery emission in DiscoverAsync, auto-demote bookkeeping in RecordFailureAndMaybeDemote + ProbeLoopAsync
- AbLegacy-Test-Fixture.md: AbLegacyDiagnosticsTests, AbLegacyAutoDemoteTests + collision-rejection contract
- AB CIP _System/ parallel: same pattern with the CIP-specific six entries (incl. writeable _RefreshTagDb trigger)