lmxopcua/docs/drivers/AbLegacy-Diagnostics.md
2026-04-26 08:44:53 -04:00


AB Legacy diagnostic counters

Per-device diagnostic counters surface as auto-generated read-only OPC UA variables under each device's synthetic _Diagnostics/ folder. HMIs can bind directly without going through a separate diagnostics RPC. Mirrors the AB CIP _System/ pattern from PR abcip-4.3.

Closes #253 (PR ablegacy-10).

The nine counters

Each device managed by the AbLegacyDriver exposes nine read-only nodes under AbLegacy/<host>/_Diagnostics/<name>. The first seven shipped in PR ablegacy-10; DemoteCount + LastDemotedUtc arrived with PR ablegacy-12 / #255 (auto-demote on comm failure).

| Name | Type | Semantics |
| --- | --- | --- |
| RequestCount | Int64 | Total ReadAsync requests issued against this device. One increment per non-diagnostic reference per call, success or failure. |
| ResponseCount | Int64 | Successful read responses. |
| ErrorCount | Int64 | Failed read responses (any non-Good status). |
| RetryCount | Int64 | Retry attempts beyond the first per the PR 9 retry loop. A single read with two retries adds two. |
| LastErrorCode | Int32 | Most recent libplctag status code on a failed read; 0 when no error has been seen since the last reset. |
| LastErrorMessage | String | Most recent libplctag error message on a failed read; empty when no error has been seen since the last reset. |
| CommFailures | Int64 | Count of read failures mapped to BadCommunicationError. Spans transient libplctag throws + retried-out chains so operators see a single "wire fell off" counter. |
| DemoteCount | Int64 | PR ablegacy-12: cumulative auto-demote events for this device. Bumps every time the driver crosses the consecutive-failure threshold and arms a fresh cool-down window. Cumulative across ReinitializeAsync (preserved through redeploys) so a flapping link surfaces as a steadily climbing counter. |
| LastDemotedUtc | String | PR ablegacy-12: ISO-8601 UTC timestamp of the most recent auto-demotion. Empty string when this device has never been demoted. |
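
The accounting rules in the table can be illustrated with a small model. This is a sketch only, not the driver's code: the class, method, and parameter names are hypothetical, and the error code and message are placeholder values.

```python
from dataclasses import dataclass


@dataclass
class DeviceCounters:
    """Illustrative model of the first seven counters for one device."""
    request_count: int = 0
    response_count: int = 0
    error_count: int = 0
    retry_count: int = 0
    last_error_code: int = 0
    last_error_message: str = ""
    comm_failures: int = 0

    def record_read(self, ok: bool, retries: int = 0, error_code: int = 0,
                    error_message: str = "", comm_failure: bool = False) -> None:
        # One request per non-diagnostic reference, success or failure.
        self.request_count += 1
        # Only attempts beyond the first count as retries.
        self.retry_count += retries
        if ok:
            self.response_count += 1
            return
        self.error_count += 1
        self.last_error_code = error_code
        self.last_error_message = error_message
        # Only failures mapped to BadCommunicationError reach this counter.
        if comm_failure:
            self.comm_failures += 1


counters = DeviceCounters()
counters.record_read(ok=True)
counters.record_read(ok=True, retries=2)
# Placeholder error code and message, not real libplctag output.
counters.record_read(ok=False, retries=2, error_code=-1,
                     error_message="placeholder failure", comm_failure=True)
```

After these three reads the model holds three requests, two responses, one error, four retries, and one comm failure.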

Address shape: _Diagnostics/<deviceHostAddress>/<name> — e.g. _Diagnostics/ab://10.0.0.5/1,0/RequestCount.

The <deviceHostAddress> segment is the canonical ab://host[:port]/cip-path string from AbLegacyDeviceOptions.HostAddress. The browse path looks like AbLegacy/<deviceHostAddress>/_Diagnostics/<name> — the same shape as a user-config tag node, just under a reserved sibling folder.
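
As a quick illustration of the two shapes, the helpers below build both strings from a host address and a counter name. The function names are hypothetical; only the string layouts come from the text above.

```python
def diagnostics_address(device_host_address: str, name: str) -> str:
    """Driver-side address shape: _Diagnostics/<deviceHostAddress>/<name>."""
    return f"_Diagnostics/{device_host_address}/{name}"


def diagnostics_browse_path(device_host_address: str, name: str) -> str:
    """Browse path shape: AbLegacy/<deviceHostAddress>/_Diagnostics/<name>."""
    return f"AbLegacy/{device_host_address}/_Diagnostics/{name}"


host = "ab://10.0.0.5/1,0"
print(diagnostics_address(host, "RequestCount"))
print(diagnostics_browse_path(host, "RequestCount"))
```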

Reset behaviour

| Trigger | Effect |
| --- | --- |
| ReinitializeAsync | Every counter for every device resets to zero, plus LastErrorMessage clears to empty. PR ablegacy-12 exception: DemoteCount + LastDemotedUtc survive the reinit so an operator redeploying mid-incident doesn't lose the flapping-link history. |
| ShutdownAsync | All counters drop with the device map (including DemoteCount). |
| Driver process restart | Counters start at zero. |
| Probe transition Stopped→Running | No automatic reset: counters are cumulative across reconnect events so operators can spot intermittent links by watching CommFailures keep climbing. |
| Probe transition Demoted→Running | PR ablegacy-12: early-clear of the active demote window, but the cumulative DemoteCount stays put. |

There is no in-process "reset" RPC at the time of writing. To clear counters without a redeploy, trigger ReinitializeAsync from the Admin RPC surface: the driver re-runs EnsureDevice for each host, so the freshly registered counters start at zero (DemoteCount and LastDemotedUtc excepted, as noted in the table above).
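
The ReinitializeAsync row can be modelled as a reset that skips the two demote fields. This is a sketch under the assumption that counters live in a simple name-to-value map; the function name and the sample values are hypothetical.

```python
# Fields the text says survive ReinitializeAsync.
PRESERVED_ON_REINIT = frozenset({"DemoteCount", "LastDemotedUtc"})


def reinitialize(counters: dict) -> dict:
    """Return the counter map as it would look after a reinit (illustrative)."""
    fresh = {}
    for name, value in counters.items():
        if name in PRESERVED_ON_REINIT:
            fresh[name] = value          # flapping-link history is kept
        elif isinstance(value, str):
            fresh[name] = ""             # string counters clear to empty
        else:
            fresh[name] = 0              # numeric counters reset to zero
    return fresh


before = {
    "RequestCount": 120,
    "ErrorCount": 7,
    "LastErrorMessage": "placeholder failure",
    "DemoteCount": 2,
    "LastDemotedUtc": "2026-04-26T12:00:00Z",
}
after = reinitialize(before)
```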

What does not increment counters

Reads against _Diagnostics/<host>/<name> are driver-local observability, not field traffic — they short-circuit before the libplctag dispatch and do NOT increment RequestCount or any other counter. Otherwise a 1 Hz HMI poll of RequestCount would make the counter chase its own tail.

Writes against _Diagnostics/* are rejected with BadNotWritable because every diagnostic node is SecurityClassification.ViewOnly — a misbehaving SCADA template can't accidentally clobber the diagnostic surface.
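
A minimal model of the two rules above, assuming a prefix check on the address decides whether a request is diagnostic. The class is hypothetical and the status values are plain strings standing in for OPC UA status codes.

```python
DIAG_PREFIX = "_Diagnostics/"


class SketchDriver:
    """Illustrative stand-in for the read/write dispatch, not the real driver."""

    def __init__(self) -> None:
        self.counters = {"RequestCount": 0}
        self.field_reads = 0  # stands in for libplctag dispatches

    def read(self, address: str) -> int:
        if address.startswith(DIAG_PREFIX):
            # Short-circuit: answer from the counter store, touch no counter.
            name = address.rsplit("/", 1)[-1]
            return self.counters[name]
        self.counters["RequestCount"] += 1
        self.field_reads += 1
        return 0  # placeholder field value

    def write(self, address: str, value: int) -> str:
        if address.startswith(DIAG_PREFIX):
            return "BadNotWritable"
        return "Good"


driver = SketchDriver()
driver.read("N7:0")
polled = [driver.read("_Diagnostics/ab://10.0.0.5/1,0/RequestCount") for _ in range(5)]
```

Five diagnostic polls in a row all return 1, because only the single read of N7:0 counted as field traffic.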

Collision with user tags

User-config tags must not shadow the seven reserved diagnostic names and must not live under the synthetic _Diagnostics/ folder. Both shapes are rejected at InitializeAsync time with a clear InvalidOperationException:

  • A tag named RequestCount (or any of the other six reserved names) is rejected because it would silently never resolve at read time — the diagnostics short-circuit wins.
  • A tag whose Address starts with _Diagnostics/ is rejected because the whole prefix is owned by the auto-emitted counters.

Pick a different name (SiteRequestCount, MachineRequestCount) or a different address path (real PCCC files like N7:0).
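
The two rejection rules could be checked along these lines. This is a sketch, not the driver's validation: the function name and the Name/Address dictionary keys are assumptions, ValueError stands in for InvalidOperationException, and the reserved set lists only the seven original counters because the text does not say whether the two PR ablegacy-12 names are reserved as well.

```python
RESERVED_NAMES = frozenset({
    "RequestCount", "ResponseCount", "ErrorCount", "RetryCount",
    "LastErrorCode", "LastErrorMessage", "CommFailures",
})
DIAG_PREFIX = "_Diagnostics/"


def validate_user_tags(tags: list) -> None:
    """Raise ValueError for any tag that collides with the diagnostic surface."""
    for tag in tags:
        if tag["Name"] in RESERVED_NAMES:
            raise ValueError(f"tag name '{tag['Name']}' shadows a diagnostic counter")
        if tag["Address"].startswith(DIAG_PREFIX):
            raise ValueError(f"tag address '{tag['Address']}' is under {DIAG_PREFIX}")


def is_accepted(tags: list) -> bool:
    """Convenience wrapper: True when validation passes, False when it raises."""
    try:
        validate_user_tags(tags)
    except ValueError:
        return False
    return True
```

With this model, a tag named SiteRequestCount at N7:0 is accepted, while a tag named RequestCount or a tag addressed at _Diagnostics/Foo is rejected.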

HMI binding examples

OPC UA Client CLI

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
  -u opc.tcp://localhost:4840 `
  -n "ns=2;s=AbLegacy/ab://10.0.0.5/1,0/_Diagnostics/RequestCount"

AB Legacy CLI (driver-direct, no OPC UA layer)

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli -- read `
  -g "ab://10.0.0.5/1,0" -P Slc500 `
  --address "_Diagnostics/RequestCount"

The driver-direct path lets you sanity-check the counter without standing up an OPC UA server — useful when triaging a wire-level issue on the bench.

Subscription pattern

Subscribe to all nine counters at a slow rate (e.g. 5-10 s) on a long-lived overview dashboard, plus a faster rate (1 s) on LastErrorMessage / LastErrorCode when actively debugging a flapping link. The diagnostics short-circuit makes every read O(1), so fast polling of a counter costs nothing beyond the OPC UA subscription bookkeeping.

Auto-demote on comm failure (PR ablegacy-12 / #255)

When a device fails N consecutive reads or probes the driver marks it Demoted for a configurable cool-down window. Reads against a demoted device short-circuit with BadCommunicationError without invoking libplctag — that's the whole point of the feature: one slow PLC sharing the driver thread can't starve faster peers reading from healthy hosts on the same AbLegacyDriver instance.

Configuration

Set per device; optional. A null Demote block keeps the documented defaults (auto-demote enabled, 3 failures, 30 s).

{
  "Devices": [
    {
      "HostAddress": "ab://10.0.0.5/1,0",
      "PlcFamily": "Slc500",
      "Demote": {
        "FailureThreshold": 3,    // default 3
        "DemoteForMs": 30000,     // default 30s
        "Enabled": true           // default true
      }
    }
  ]
}

| Knob | Default | Notes |
| --- | --- | --- |
| FailureThreshold | 3 | Consecutive comm failures before the device is demoted. A successful read or probe resets the tally. Terminal failures (BadNodeIdUnknown, BadTypeMismatch, …) do not count: they're config / decoder mismatches, not field outages. |
| DemoteForMs | 30000 (30 s) | Cool-down window. Reads while this is active short-circuit; a successful probe clears it early. |
| Enabled | true | Set to false to keep the diagnostic counters but skip the auto-throttle. The failure tally still ticks but never arms the cool-down. |
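
To make the defaulting concrete, the sketch below resolves a device entry to effective demote options. The class and function names are hypothetical, and per-field fallback for a partially filled Demote block is an assumption; the text only states that a null block keeps the defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoteOptions:
    failure_threshold: int = 3
    demote_for_ms: int = 30_000
    enabled: bool = True


def parse_demote(device: dict) -> DemoteOptions:
    """Resolve the effective options for one entry of the Devices array."""
    raw = device.get("Demote") or {}  # a missing or null block means defaults
    return DemoteOptions(
        failure_threshold=raw.get("FailureThreshold", 3),
        demote_for_ms=raw.get("DemoteForMs", 30_000),
        enabled=raw.get("Enabled", True),
    )


defaults = parse_demote({"HostAddress": "ab://10.0.0.5/1,0", "Demote": None})
tuned = parse_demote({"HostAddress": "ab://10.0.0.5/1,0",
                      "Demote": {"FailureThreshold": 5, "Enabled": False}})
```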

Recovery

Three ways out of Demoted, in order of likelihood:

  1. Probe success — the per-device probe loop (Probe.Enabled = true, default address S:0) is the fast path. The next probe iteration after demotion will exercise the wire; on success it clears DemotedUntilUtc immediately and transitions the host to Running.
  2. Window expiry — once DemoteForMs elapses the demote marker clears on the next read attempt. The read goes through; if it fails, the failure tally keeps counting from where it left off (so a permanently-down device re-arms the window after one more consecutive failure rather than having to repeat the full threshold).
  3. ReinitializeAsync — clears ConsecutiveFailures + DemotedUntilUtc outright. Cumulative DemoteCount survives.
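
The three exits can be traced with a small state model. This is a sketch rather than the driver's bookkeeping: the class and method names are hypothetical, time is passed in as plain seconds, and not re-arming while a window is already active is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoteState:
    failure_threshold: int = 3
    demote_for_s: float = 30.0
    enabled: bool = True
    consecutive_failures: int = 0
    demoted_until: Optional[float] = None
    demote_count: int = 0

    def is_demoted(self, now: float) -> bool:
        # Exit 2: window expiry clears the marker but keeps the failure tally.
        if self.demoted_until is not None and now >= self.demoted_until:
            self.demoted_until = None
        return self.demoted_until is not None

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if (self.enabled and self.demoted_until is None
                and self.consecutive_failures >= self.failure_threshold):
            self.demoted_until = now + self.demote_for_s
            self.demote_count += 1  # once per demotion event

    def record_success(self) -> None:
        # Exit 1: a successful read or probe resets the tally and the window.
        self.consecutive_failures = 0
        self.demoted_until = None

    def reinitialize(self) -> None:
        # Exit 3: tally and window clear, cumulative demote_count survives.
        self.consecutive_failures = 0
        self.demoted_until = None


state = DemoteState()
for t in (0.0, 1.0, 2.0):
    state.record_failure(t)
demoted_mid_window = state.is_demoted(10.0)
demoted_after_expiry = state.is_demoted(32.0)
state.record_failure(32.0)  # one more failure re-arms the window
```

In this trace the third failure arms the window at t = 2 s, the window expires at t = 32 s, and a single further failure arms it again because the tally was not reset.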

Observability

DemoteCount is the headline counter — it bumps once per demotion event, not per short-circuited read. A device that flaps every hour for a week shows DemoteCount = ~168 on Friday afternoon, which is the operator signal you actually want.

LastDemotedUtc is the ISO-8601 UTC timestamp of the most recent demotion. Bind it on a per-device tile alongside DemoteCount for "flapping link" alerting.

Host-state surface

A demoted device reports HostState.Demoted (new in PR ablegacy-12 on Core.Abstractions/IHostConnectivityProbe.cs). Consumers that predate the new value (the central HostStatusPublisher) safely treat it as Stopped — no schema migration needed.
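
The fallback for older consumers amounts to mapping any state they do not recognise onto Stopped. The sketch below assumes only the three states named in this document; the enum values and function name are hypothetical, and the real HostState type may define more members.

```python
from enum import Enum


class HostState(Enum):
    RUNNING = "Running"
    STOPPED = "Stopped"
    DEMOTED = "Demoted"


# States a consumer written before PR ablegacy-12 would know about (assumed).
LEGACY_KNOWN = frozenset({HostState.RUNNING, HostState.STOPPED})


def legacy_view(state: HostState) -> HostState:
    """How a pre-Demoted consumer would interpret a reported state."""
    return state if state in LEGACY_KNOWN else HostState.STOPPED


seen_by_old_consumer = legacy_view(HostState.DEMOTED)
```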

Cross-references

  • AbLegacyDiagnosticTags.cs — counter store + read short-circuit
  • AbLegacyDriver.cs — increment sites in ReadAsync, discovery emission in DiscoverAsync, auto-demote bookkeeping in RecordFailureAndMaybeDemote + ProbeLoopAsync
  • AbLegacy-Test-Fixture.md: AbLegacyDiagnosticsTests + AbLegacyAutoDemoteTests + collision-rejection contract
  • AB CIP _System/ parallel — same pattern with the CIP-specific six entries (incl. writeable _RefreshTagDb trigger)