# AB Legacy diagnostic counters

Per-device diagnostic counters surface as auto-generated read-only OPC UA variables under each device's synthetic `_Diagnostics/` folder. HMIs can bind directly without going through a separate diagnostics RPC. Mirrors the AB CIP `_System/` pattern from PR abcip-4.3. Closes #253 (PR ablegacy-10).

## The nine counters

Each device managed by the `AbLegacyDriver` exposes nine read-only nodes under `AbLegacy/<host>/_Diagnostics/`. The first seven shipped in PR ablegacy-10; `DemoteCount` + `LastDemotedUtc` arrived with PR ablegacy-12 / #255 (auto-demote on comm failure).

| Name | Type | Semantics |
|---|---|---|
| `RequestCount` | Int64 | Total `ReadAsync` requests issued against this device. One increment per non-diagnostic reference per call, success or failure. |
| `ResponseCount` | Int64 | Successful read responses. |
| `ErrorCount` | Int64 | Failed read responses (any non-Good status). |
| `RetryCount` | Int64 | Retry attempts beyond the first per the PR 9 retry loop. A single read with two retries adds two. |
| `LastErrorCode` | Int32 | Most recent libplctag status code on a failed read; `0` when no error has been seen since the last reset. |
| `LastErrorMessage` | String | Most recent libplctag error message on a failed read; empty when no error has been seen since the last reset. |
| `CommFailures` | Int64 | Count of read failures mapped to `BadCommunicationError`. Spans transient libplctag throws + retried-out chains so operators see a single "wire fell off" counter. |
| `DemoteCount` | Int64 | **PR ablegacy-12**: cumulative auto-demote events for this device. Bumps every time the driver crosses the consecutive-failure threshold and arms a fresh cool-down window. Cumulative across `ReinitializeAsync` (preserved through redeploys) so a flapping link surfaces as a steadily climbing counter. |
| `LastDemotedUtc` | String | **PR ablegacy-12**: ISO-8601 UTC timestamp of the most recent auto-demotion. Empty string when this device has never been demoted. |

**Address shape**: `_Diagnostics/<host>/<name>`, e.g. `_Diagnostics/ab://10.0.0.5/1,0/RequestCount`. The `<host>` segment is the canonical `ab://host[:port]/cip-path` string from `AbLegacyDeviceOptions.HostAddress`. The browse path looks like `AbLegacy/<host>/_Diagnostics/<name>`, the same shape as a user-config tag node, just under a reserved sibling folder.

## Reset behaviour

| Trigger | Effect |
|---|---|
| `ReinitializeAsync` | Every counter for every device resets to zero, plus `LastErrorMessage` clears to empty. **PR ablegacy-12 exception:** `DemoteCount` + `LastDemotedUtc` survive the reinit so an operator redeploying mid-incident doesn't lose the flapping-link history. |
| `ShutdownAsync` | All counters drop with the device map (including `DemoteCount`). |
| Driver process restart | Counters start at zero. |
| Probe transition Stopped→Running | **No automatic reset**: counters are cumulative across reconnect events so operators can spot intermittent links by watching `CommFailures` keep climbing. |
| Probe transition Demoted→Running | **PR ablegacy-12**: early-clear of the active demote window, but the cumulative `DemoteCount` stays put. |

There is no in-process "reset" RPC at the time of writing. If you need to clear counters without a redeploy, kick a `ReinitializeAsync` from the Admin RPC surface. The driver re-runs `EnsureDevice` for each host, so the freshly registered counters start at zero (except `DemoteCount` and `LastDemotedUtc`, which survive as noted above).

## What does *not* increment counters

Reads against `_Diagnostics/<host>/<name>` are **driver-local observability**, not field traffic. They short-circuit before the libplctag dispatch and do NOT increment `RequestCount` or any other counter. Otherwise a 1 Hz HMI poll of `RequestCount` would make the counter chase its own tail.

Writes against `_Diagnostics/*` are rejected with `BadNotWritable` because every diagnostic node is `SecurityClassification.ViewOnly`, so a misbehaving SCADA template can't accidentally clobber the diagnostic surface.
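
The counting, reset, and short-circuit rules above can be modelled in a few lines. This is a minimal Python sketch for reasoning about the behaviour, not the driver's C# code: `DiagnosticsModel`, `FieldReadError`, and the `field_read` callable are hypothetical names introduced for illustration, and `RetryCount` / `CommFailures` bookkeeping is left out for brevity.

```python
DIAG_PREFIX = "_Diagnostics/"

# Defaults for the nine nodes listed in the counter table.
DEFAULTS = {
    "RequestCount": 0, "ResponseCount": 0, "ErrorCount": 0, "RetryCount": 0,
    "LastErrorCode": 0, "LastErrorMessage": "", "CommFailures": 0,
    "DemoteCount": 0, "LastDemotedUtc": "",
}
# The two values the reset table says survive ReinitializeAsync.
PRESERVED_ON_REINIT = ("DemoteCount", "LastDemotedUtc")


class FieldReadError(Exception):
    """Hypothetical stand-in for a failed libplctag read."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


class DiagnosticsModel:
    """Hypothetical per-device counter store (illustration only)."""

    def __init__(self):
        self.values = dict(DEFAULTS)

    def read(self, address, field_read):
        # Diagnostic reads are served locally and never touch a counter.
        if address.startswith(DIAG_PREFIX):
            return self.values[address.rsplit("/", 1)[-1]]
        # Field reads count once per reference, success or failure.
        self.values["RequestCount"] += 1
        try:
            value = field_read(address)
        except FieldReadError as err:
            self.values["ErrorCount"] += 1
            self.values["LastErrorCode"] = err.code
            self.values["LastErrorMessage"] = err.message
            return None
        self.values["ResponseCount"] += 1
        return value

    def write(self, address, value):
        # Every diagnostic node is view-only, so writes are refused.
        if address.startswith(DIAG_PREFIX):
            return "BadNotWritable"
        return "Good"  # field writes are out of scope for this sketch

    def reinitialize(self):
        # Reset everything except the demote history.
        kept = {name: self.values[name] for name in PRESERVED_ON_REINIT}
        self.values = {**DEFAULTS, **kept}
```

Polling `_Diagnostics/RequestCount` through `read` any number of times leaves `RequestCount` unchanged, which is the property that keeps a 1 Hz HMI poll from inflating the counter it is watching.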
## Collision with user tags

User-config tags must not shadow the seven reserved diagnostic names and must not live under the synthetic `_Diagnostics/` folder. Both shapes are rejected at `InitializeAsync` time with a clear `InvalidOperationException`:

- A tag named `RequestCount` (or any of the other six reserved names) is rejected because it would silently never resolve at read time: the diagnostics short-circuit wins.
- A tag whose `Address` starts with `_Diagnostics/` is rejected because the whole prefix is owned by the auto-emitted counters.

Pick a different name (`SiteRequestCount`, `MachineRequestCount`) or a different address path (real PCCC files like `N7:0`).

## HMI binding examples

### OPC UA Client CLI

```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
  -u opc.tcp://localhost:4840 `
  -n "ns=2;s=AbLegacy/ab://10.0.0.5/1,0/_Diagnostics/RequestCount"
```

### AB Legacy CLI (driver-direct, no OPC UA layer)

```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli -- read `
  -g "ab://10.0.0.5/1,0" -P Slc500 `
  --address "_Diagnostics/RequestCount"
```

The driver-direct path lets you sanity-check the counter without standing up an OPC UA server, which is useful when triaging a wire-level issue on the bench.

### Subscription pattern

Subscribe to all the counters at a slow rate (e.g. 5–10 s) on a long-lived overview dashboard, plus a faster rate (1 s) on `LastErrorMessage` / `LastErrorCode` when actively debugging a flapping link. The diagnostics short-circuit makes every read O(1), so fast polling of a counter costs nothing beyond the OPC UA subscription bookkeeping.

## Auto-demote on comm failure (PR ablegacy-12 / #255)

When a device fails N consecutive reads or probes, the driver marks it **Demoted** for a configurable cool-down window.
Reads against a demoted device short-circuit with `BadCommunicationError` *without invoking libplctag*. That's the whole point of the feature: one slow PLC sharing the driver thread can't starve faster peers reading from healthy hosts on the same `AbLegacyDriver` instance.

### Configuration

Per-device, optional. `null` keeps the documented defaults (auto-demote **enabled** with 3 failures / 30 s).

```jsonc
{
  "Devices": [
    {
      "HostAddress": "ab://10.0.0.5/1,0",
      "PlcFamily": "Slc500",
      "Demote": {
        "FailureThreshold": 3,   // default 3
        "DemoteForMs": 30000,    // default 30s
        "Enabled": true          // default true
      }
    }
  ]
}
```

| Knob | Default | Notes |
|---|---|---|
| `FailureThreshold` | `3` | Consecutive comm failures before the device is demoted. A successful read or probe resets the tally. Terminal failures (`BadNodeIdUnknown`, `BadTypeMismatch`, …) **do not count**; they're config / decoder mismatches, not field outages. |
| `DemoteForMs` | `30000` (30s) | Cool-down window. Reads while this is active short-circuit; a successful probe clears it early. |
| `Enabled` | `true` | Set to `false` to keep the diagnostic counters but skip the auto-throttle. The failure tally still ticks but never arms the cool-down. |

### Recovery

Three ways out of Demoted, in order of likelihood:

1. **Probe success**: the per-device probe loop (`Probe.Enabled = true`, default address `S:0`) is the fast path. The next probe iteration after demotion will exercise the wire; on success it clears `DemotedUntilUtc` immediately and transitions the host to `Running`.
2. **Window expiry**: once `DemoteForMs` elapses the demote marker clears on the next read attempt. The read goes through; if it fails, the failure tally keeps counting from where it left off (so a permanently-down device re-arms the window after one more consecutive failure rather than having to repeat the full threshold).
3. **`ReinitializeAsync`**: clears `ConsecutiveFailures` + `DemotedUntilUtc` outright. Cumulative `DemoteCount` survives.
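
The threshold, cool-down, and three recovery paths can be sketched as a small state tracker. This is a hedged Python model of the behaviour described above, not the driver's implementation: `DemoteTracker` and its method names are hypothetical, the clock is passed in explicitly so the example is deterministic, and the choice not to re-arm the window on failures recorded while a window is still active is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DemoteTracker:
    """Hypothetical per-device auto-demote state (illustration only)."""

    failure_threshold: int = 3      # mirrors Demote.FailureThreshold
    demote_for_ms: int = 30_000     # mirrors Demote.DemoteForMs
    enabled: bool = True            # mirrors Demote.Enabled
    consecutive_failures: int = 0
    demoted_until: Optional[datetime] = None
    demote_count: int = 0
    last_demoted_utc: str = ""

    def should_short_circuit(self, now: datetime) -> bool:
        """True while the cool-down is active; clears an expired window."""
        if self.demoted_until is None:
            return False
        if now >= self.demoted_until:
            # Window expiry: the read goes through, the tally is kept.
            self.demoted_until = None
            return False
        return True

    def record_failure(self, now: datetime) -> None:
        """Count a comm failure (terminal failures must not be passed in)."""
        self.consecutive_failures += 1
        over = self.consecutive_failures >= self.failure_threshold
        # Assumption: a window that is still active is not re-armed.
        if self.enabled and over and self.demoted_until is None:
            self.demoted_until = now + timedelta(milliseconds=self.demote_for_ms)
            self.demote_count += 1  # once per demotion, not per read
            self.last_demoted_utc = now.isoformat()

    def record_success(self) -> None:
        """A good read or probe resets the tally and clears the window."""
        self.consecutive_failures = 0
        self.demoted_until = None

    def reinitialize(self) -> None:
        """Clear the live state; the cumulative demote history survives."""
        self.consecutive_failures = 0
        self.demoted_until = None


# Usage: three failures demote the device, and one more failure after the
# window expires re-arms it because the tally was never reset.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
tracker = DemoteTracker()
for _ in range(3):
    tracker.record_failure(t0)
demoted_now = tracker.should_short_circuit(t0 + timedelta(seconds=1))
expired = not tracker.should_short_circuit(t0 + timedelta(seconds=31))
tracker.record_failure(t0 + timedelta(seconds=31))
```

Keeping the tally across window expiry is what lets a permanently-down device be re-demoted after a single failed read instead of blocking the driver thread for a full threshold's worth of timeouts each cycle.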
### Observability

`DemoteCount` is the headline counter: it bumps once per demotion event, not per short-circuited read. A device that flaps every hour for a week shows `DemoteCount = ~168` on Friday afternoon, which is the operator signal you actually want.

`LastDemotedUtc` is the ISO-8601 UTC timestamp of the most recent demotion. Bind it on a per-device tile alongside `DemoteCount` for "flapping link" alerting.

### Host-state surface

A demoted device reports `HostState.Demoted` (new in PR ablegacy-12 on `Core.Abstractions/IHostConnectivityProbe.cs`). Consumers that predate the new value (the central `HostStatusPublisher`) safely treat it as `Stopped`, so no schema migration is needed.

## Cross-references

- [`AbLegacyDiagnosticTags.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDiagnosticTags.cs): counter store + read short-circuit
- [`AbLegacyDriver.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs): increment sites in `ReadAsync`, discovery emission in `DiscoverAsync`, auto-demote bookkeeping in `RecordFailureAndMaybeDemote` + `ProbeLoopAsync`
- [`AbLegacy-Test-Fixture.md`](AbLegacy-Test-Fixture.md): `AbLegacyDiagnosticsTests` + `AbLegacyAutoDemoteTests` + collision-rejection contract
- [AB CIP `_System/` parallel](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipSystemTagSource.cs): same pattern with the CIP-specific six entries (incl. writeable `_RefreshTagDb` trigger)