Files
lmxopcua/docs/HistoricalDataAccess.md
Joseph Doherty 3b2defd94f Phase 0 — mechanical rename ZB.MOM.WW.LmxOpcUa.* → ZB.MOM.WW.OtOpcUa.*
Renames all 11 projects (5 src + 6 tests), the .slnx solution file, all source-file namespaces, all axaml namespace references, and all v1 documentation references in CLAUDE.md and docs/*.md (excluding docs/v2/ which is already in OtOpcUa form). Also updates the TopShelf service registration name from "LmxOpcUa" to "OtOpcUa" per Phase 0 Task 0.6.

Preserves runtime identifiers per Phase 0 Out-of-Scope rules to avoid breaking v1/v2 client trust during coexistence: OPC UA `ApplicationUri` defaults (`urn:{GalaxyName}:LmxOpcUa`), server `EndpointPath` (`/LmxOpcUa`), `ServerName` default (feeds cert subject CN), `MxAccessConfiguration.ClientName` default (defensive — stays "LmxOpcUa" for MxAccess audit-trail consistency), client OPC UA identifiers (`ApplicationName = "LmxOpcUaClient"`, `ApplicationUri = "urn:localhost:LmxOpcUaClient"`, cert directory `%LocalAppData%\LmxOpcUaClient\pki\`), and the `LmxOpcUaServer` class name (class rename out of Phase 0 scope per Task 0.5 sed pattern; happens in Phase 1 alongside `LmxNodeManager → GenericDriverNodeManager` Core extraction). 23 LmxOpcUa references retained, all enumerated and justified in `docs/v2/implementation/exit-gate-phase-0.md`.

Build clean: 0 errors, 30 warnings (lower than baseline 167). Tests at strict improvement over baseline: 821 passing / 1 failing vs baseline 820 / 2 (one flaky pre-existing failure passed this run; the other still fails — both pre-existing and unrelated to the rename). `Client.UI.Tests`, `Historian.Aveva.Tests`, `Client.Shared.Tests`, `IntegrationTests` all match baseline exactly. Exit gate compliance results recorded in `docs/v2/implementation/exit-gate-phase-0.md` with all 7 checks PASS or DEFERRED-to-PR-review (#7 service install verification needs Windows service permissions on the reviewer's box).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:57:47 -04:00

237 lines
18 KiB
Markdown

# Historical Data Access
`LmxNodeManager` exposes OPC UA historical data access (HDA) through an abstract `IHistorianDataSource` interface (`Historian/IHistorianDataSource.cs`). The Wonderware Historian implementation lives in a separate assembly, `ZB.MOM.WW.OtOpcUa.Historian.Aveva`, which is loaded at runtime only when `Historian.Enabled=true`. This keeps the `aahClientManaged` SDK out of the core Host so deployments that do not need history do not need the SDK installed.
## Plugin Architecture
The historian surface is split across two assemblies:
- **`ZB.MOM.WW.OtOpcUa.Host`** (core) owns only OPC UA / BCL types:
- `IHistorianDataSource` -- the interface `LmxNodeManager` depends on
- `HistorianEventDto` -- SDK-free representation of a historian event record
- `HistorianAggregateMap` -- maps OPC UA aggregate NodeIds to AnalogSummary column names
- `HistorianPluginLoader` -- loads the plugin via `Assembly.LoadFrom` at startup
- `HistoryContinuationPointManager` -- paginates HistoryRead results
- **`ZB.MOM.WW.OtOpcUa.Historian.Aveva`** (plugin) owns everything SDK-bound:
- `HistorianDataSource` -- implements `IHistorianDataSource`, wraps `aahClientManaged`
- `IHistorianConnectionFactory` / `SdkHistorianConnectionFactory` -- opens and polls `ArchestrA.HistorianAccess` connections
- `AvevaHistorianPluginEntry.Create(HistorianConfiguration)` -- the static factory invoked by the loader
The plugin assembly and its SDK dependencies (`aahClientManaged.dll`, `aahClient.dll`, `aahClientCommon.dll`, `Historian.CBE.dll`, `Historian.DPAPI.dll`, `ArchestrA.CloudHistorian.Contract.dll`) deploy to a `Historian/` subfolder next to `ZB.MOM.WW.OtOpcUa.Host.exe`. See [Service Hosting](ServiceHosting.md#required-runtime-assemblies) for the full layout and deployment matrix.
## Plugin Loading
When the service starts with `Historian.Enabled=true`, `OpcUaService` calls `HistorianPluginLoader.TryLoad(config)`. The loader:
1. Probes `AppDomain.CurrentDomain.BaseDirectory\Historian\ZB.MOM.WW.OtOpcUa.Historian.Aveva.dll`.
2. Installs a one-shot `AppDomain.AssemblyResolve` handler that redirects any `aahClientManaged`/`aahClientCommon`/`Historian.*` lookups to the same subfolder, so the CLR can resolve SDK dependencies when the plugin first JITs.
3. Calls the plugin's `AvevaHistorianPluginEntry.Create(HistorianConfiguration)` via reflection and returns the resulting `IHistorianDataSource`.
4. On any failure (plugin missing, entry type not found, SDK assembly unresolvable, bad image), logs a warning with the expected plugin path and returns `null`. The server starts normally and `LmxNodeManager` returns `BadHistoryOperationUnsupported` for every history call.
## Wonderware Historian SDK
The plugin uses the AVEVA Historian managed SDK (`aahClientManaged.dll`) to query historical data. The SDK provides a cursor-based query API through `ArchestrA.HistorianAccess`, replacing direct SQL queries against the Historian Runtime database. Two query types are used:
- **`HistoryQuery`** -- Raw historical samples with timestamp, value (numeric or string), and OPC quality.
- **`AnalogSummaryQuery`** -- Pre-computed aggregates with properties for Average, Minimum, Maximum, ValueCount, First, Last, StdDev, and more.
The SDK DLLs are located in `lib/` and originate from `C:\Program Files (x86)\Wonderware\Historian\`. Only the plugin project (`src/ZB.MOM.WW.OtOpcUa.Historian.Aveva/`) references them at build time; the core Host project does not.
## Configuration
`HistorianConfiguration` controls the SDK connection:
```csharp
public class HistorianConfiguration
{
public bool Enabled { get; set; } = false;
public string ServerName { get; set; } = "localhost";
public List<string> ServerNames { get; set; } = new();
public int FailureCooldownSeconds { get; set; } = 60;
public bool IntegratedSecurity { get; set; } = true;
public string? UserName { get; set; }
public string? Password { get; set; }
public int Port { get; set; } = 32568;
public int CommandTimeoutSeconds { get; set; } = 30;
public int MaxValuesPerRead { get; set; } = 10000;
public int RequestTimeoutSeconds { get; set; } = 60;
}
```
When `Enabled` is `false`, `HistorianPluginLoader.TryLoad` is not called, no plugin is loaded, and the node manager returns `BadHistoryOperationUnsupported` for history read requests. When `Enabled` is `true` but the plugin cannot be loaded (missing `Historian/` subfolder, SDK assembly resolve failure, etc.), the server still starts and returns the same `BadHistoryOperationUnsupported` status with a warning in the log.
### Connection Properties
| Property | Default | Description |
|---|---|---|
| `ServerName` | `localhost` | Single Historian server hostname used when `ServerNames` is empty. Preserved for backward compatibility with pre-cluster deployments |
| `ServerNames` | `[]` | Ordered list of Historian cluster nodes. When non-empty, supersedes `ServerName` and enables read-only cluster failover (see [Cluster Failover](#read-only-cluster-failover)) |
| `FailureCooldownSeconds` | `60` | How long a failed cluster node is skipped before being re-tried. Zero means no cooldown (retry on every request) |
| `IntegratedSecurity` | `true` | Use Windows authentication |
| `UserName` | `null` | Username when `IntegratedSecurity` is false |
| `Password` | `null` | Password when `IntegratedSecurity` is false |
| `Port` | `32568` | Historian TCP port |
| `CommandTimeoutSeconds` | `30` | SDK packet timeout in seconds (inner async bound) |
| `RequestTimeoutSeconds` | `60` | Outer safety timeout applied to sync-over-async history reads on the OPC UA stack thread. Backstop for `CommandTimeoutSeconds`; a timed-out read returns `BadTimeout`. Should be greater than `CommandTimeoutSeconds`. Stability review 2026-04-13 Finding 3 |
| `MaxValuesPerRead` | `10000` | Maximum values per history read request |
## Connection Lifecycle
`HistorianDataSource` (in the plugin assembly) maintains a persistent connection to the Historian server via `ArchestrA.HistorianAccess`:
1. **Lazy connect** -- The connection is established on the first query via `EnsureConnected()`. When a cluster is configured, the data source iterates `HistorianClusterEndpointPicker.GetHealthyNodes()` in order and returns the first node that successfully connects.
2. **Connection reuse** -- Subsequent queries reuse the same connection. The active node is tracked in `_activeProcessNode` / `_activeEventNode` and surfaced on the dashboard.
3. **Auto-reconnect** -- On connection failure, the connection is disposed, the active node is marked failed in the picker, and the next query re-enters the picker loop to try the next eligible candidate.
4. **Clean shutdown** -- `Dispose()` closes the connection when the service stops.
The connection is opened with `ReadOnly = true` and `ConnectionType = Process`. The event (alarm history) path uses a separate connection with `ConnectionType = Event`, but both silos share the same cluster picker so a node that fails on one silo is immediately skipped on the other.
## Read-Only Cluster Failover
When `HistorianConfiguration.ServerNames` is non-empty, the plugin picks from an ordered list of cluster nodes instead of a single `ServerName`. Each connection attempt tries candidates in configuration order until one succeeds. Failed nodes are placed into a timed cooldown and re-admitted when the cooldown elapses.
### HistorianClusterEndpointPicker
The picker (in the plugin assembly, internal) is pure logic with no SDK dependency — all cluster behavior is unit-testable with a fake clock and scripted factory. Key characteristics:
- **Ordered iteration**: nodes are tried in the exact order they appear in `ServerNames`. Operators can express a preference ("primary first, fallback second") by ordering the list.
- **Per-node cooldown**: `MarkFailed(node, error)` starts a `FailureCooldownSeconds` window during which the node is skipped from `GetHealthyNodes()`. `MarkHealthy(node)` clears the window immediately (used on successful connect).
- **Automatic re-admission**: when a node's cooldown elapses, the next call to `GetHealthyNodes()` includes it automatically — no background probe, no manual reset. The cumulative `FailureCount` and `LastError` are retained for operator diagnostics.
- **Thread-safe**: a single lock guards the per-node state. Operations are microsecond-scale so contention is a non-issue.
- **Shared across silos**: one picker instance is shared by the process-values connection and the event-history connection, so a node failure on one path immediately benches it for the other.
- **Zero cooldown mode**: `FailureCooldownSeconds = 0` disables the cooldown entirely — the node is never benched. Useful for tests or for operators who want the SDK's own retry semantics to be the sole gate.
### Connection attempt flow
`HistorianDataSource.ConnectToAnyHealthyNode(HistorianConnectionType)` performs the actual iteration:
1. Snapshot healthy nodes from the picker. If empty, throw `InvalidOperationException` with either "No historian nodes configured" or "All N historian nodes are in cooldown".
2. For each candidate, clone `HistorianConfiguration` with the candidate as `ServerName` and pass it to the factory. On success: `MarkHealthy(node)` and return the `(Connection, Node)` tuple. On exception: `MarkFailed(node, ex.Message)`, log a warning, continue.
3. If all candidates fail, wrap the last inner exception in an `InvalidOperationException` with the cumulative failure count so the existing read-method catch blocks surface a meaningful error through the health counters.
The wrapping exception intentionally includes the last inner error message in the outer `Message` so the health snapshot's `LastError` field is still human-readable when the cluster exhausts every candidate.
### Single-node backward compatibility
When `ServerNames` is empty, the picker is seeded with a single entry from `ServerName` and the iteration loop still runs — it just has one candidate. Legacy deployments see no behavior change: the picker marks the single node healthy on success, runs the same cooldown logic on failure, and the dashboard renders a compact `Node: <hostname>` line instead of the cluster table.
### Cluster health surface
Runtime cluster state is exposed on `HistorianHealthSnapshot`:
- `NodeCount` / `HealthyNodeCount` -- size of the configured cluster and how many are currently eligible.
- `ActiveProcessNode` / `ActiveEventNode` -- which nodes are currently serving the two connection silos, or `null` when a silo has no open connection.
- `Nodes: List<HistorianClusterNodeState>` -- per-node state with `Name`, `IsHealthy`, `CooldownUntil`, `FailureCount`, `LastError`, `LastFailureTime`.
The dashboard renders this as a cluster table when `NodeCount > 1`. See [Status Dashboard](StatusDashboard.md#historian). `HealthCheckService` flips the overall service health to `Degraded` when `HealthyNodeCount < NodeCount` so operators can alert on a partially-failed cluster even while queries are still succeeding via the remaining nodes.
## Runtime Health Counters
`HistorianDataSource` maintains runtime query counters updated on every read method exit — success or failure — so the dashboard can distinguish "plugin loaded but never queried" from "plugin loaded and queries are failing". The load-time `HistorianPluginLoader.LastOutcome` only reports whether the assembly resolved at startup; it cannot catch a connection that succeeds at boot and degrades later.
### Counters
- `TotalQueries` / `TotalSuccesses` / `TotalFailures` — cumulative since startup. Every call to `RecordSuccess` or `RecordFailure` in the read methods updates these under `_healthLock`. Empty result sets count as successes — the counter reflects "the SDK call returned" rather than "the SDK call returned data".
- `ConsecutiveFailures` — latches while queries are failing; reset to zero by the first success. Drives `HealthCheckService` degradation at threshold 3.
- `LastSuccessTime` / `LastFailureTime` — UTC timestamps of the most recent success or failure, or `null` when no query of that outcome has occurred yet.
- `LastError` — exception message from the most recent failure, prefixed with the read-path name (`raw:`, `aggregate:`, `at-time:`, `events:`) so operators can tell which SDK call is broken. Cleared on the next success.
- `ProcessConnectionOpen` / `EventConnectionOpen` — whether the plugin currently holds an open SDK connection on each silo. Read from the data source's `_connection` / `_eventConnection` fields via a `Volatile.Read`.
These fields are read once per dashboard refresh via `IHistorianDataSource.GetHealthSnapshot()` and serialized into `HistorianStatusInfo`. See [Status Dashboard](StatusDashboard.md#historian) for the HTML/JSON surface.
### Two SDK connection silos
The plugin maintains two independent `ArchestrA.HistorianAccess` connections, one per `HistorianConnectionType`:
- **Process connection** (`ConnectionType = Process`) — serves historical *value* queries: `ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`. This is the SDK's query channel for tags stored in the Historian runtime.
- **Event connection** (`ConnectionType = Event`) — serves historical *event/alarm* queries: `ReadEventsAsync`. The SDK requires a separately opened connection for its event store because the query API and wire schema are distinct from value queries.
Both connections are lazy: they open on the first query that needs them. Either can be open, closed, or open against a different cluster node than the other. The dashboard renders both independently in the Historian panel (`Process Conn: open (host-a) | Event Conn: closed`) so operators can tell which silos are active and which node is serving each. When cluster support is configured, both silos share the same `HistorianClusterEndpointPicker`, so a failure on one silo marks the node unhealthy for the other as well.
## Raw Reads
`IHistorianDataSource.ReadRawAsync` (plugin implementation) uses a `HistoryQuery` to retrieve individual samples within a time range:
1. Create a `HistoryQuery` via `_connection.CreateHistoryQuery()`
2. Configure `HistoryQueryArgs` with `TagNames`, `StartDateTime`, `EndDateTime`, and `RetrievalMode = Full`
3. Iterate: `StartQuery` -> `MoveNext` loop -> `EndQuery`
Each result row is converted to an OPC UA `DataValue`:
- `QueryResult.Value` (double) takes priority; `QueryResult.StringValue` is used as fallback for string-typed tags.
- `SourceTimestamp` and `ServerTimestamp` are both set to `QueryResult.StartDateTime`.
- `StatusCode` is mapped from the `QueryResult.OpcQuality` (UInt16) via `QualityMapper` (the same OPC DA quality byte mapping used for live MXAccess data).
## Aggregate Reads
`IHistorianDataSource.ReadAggregateAsync` (plugin implementation) uses an `AnalogSummaryQuery` to retrieve pre-computed aggregates:
1. Create an `AnalogSummaryQuery` via `_connection.CreateAnalogSummaryQuery()`
2. Configure `AnalogSummaryQueryArgs` with `TagNames`, `StartDateTime`, `EndDateTime`, and `Resolution` (milliseconds)
3. Iterate the same `StartQuery` -> `MoveNext` -> `EndQuery` pattern
4. Extract the requested aggregate from named properties on `AnalogSummaryQueryResult`
Null aggregate values return `BadNoData` status rather than `Good` with a null variant.
## Quality Mapping
The Historian SDK returns standard OPC DA quality values in `QueryResult.OpcQuality` (UInt16). The low byte is passed through the shared `QualityMapper` pipeline (`MapFromMxAccessQuality` -> `MapToOpcUaStatusCode`), which maps the OPC DA quality families to OPC UA status codes:
| OPC Quality Byte | OPC DA Family | OPC UA StatusCode |
|---|---|---|
| 0-63 | Bad | `Bad` (with sub-code when an exact enum match exists) |
| 64-191 | Uncertain | `Uncertain` (with sub-code when an exact enum match exists) |
| 192+ | Good | `Good` (with sub-code when an exact enum match exists) |
See `Domain/QualityMapper.cs` and `Domain/Quality.cs` for the full mapping table and sub-code definitions.
## Aggregate Function Mapping
`HistorianAggregateMap.MapAggregateToColumn` (in the core Host assembly, so the node manager can validate aggregate support without requiring the plugin to be loaded) translates OPC UA aggregate NodeIds to `AnalogSummaryQueryResult` property names:
| OPC UA Aggregate | Result Property |
|---|---|
| `AggregateFunction_Average` | `Average` |
| `AggregateFunction_Minimum` | `Minimum` |
| `AggregateFunction_Maximum` | `Maximum` |
| `AggregateFunction_Count` | `ValueCount` |
| `AggregateFunction_Start` | `First` |
| `AggregateFunction_End` | `Last` |
| `AggregateFunction_StandardDeviationPopulation` | `StdDev` |
Unsupported aggregates return `null`, which causes the node manager to return `BadAggregateNotSupported`.
## HistoryReadRawModified Override
`LmxNodeManager` overrides `HistoryReadRawModified` to handle raw history read requests:
1. Resolve the `NodeHandle` to a tag reference via `_nodeIdToTagReference`. Return `BadNodeIdUnknown` if not found.
2. Check that `_historianDataSource` is not null. Return `BadHistoryOperationUnsupported` if historian is disabled.
3. Call `ReadRawAsync` with the time range and `NumValuesPerNode` from the `ReadRawModifiedDetails`.
4. Pack the resulting `DataValue` list into a `HistoryData` object and wrap it in an `ExtensionObject` for the `HistoryReadResult`.
## HistoryReadProcessed Override
`HistoryReadProcessed` handles aggregate history requests with additional validation:
1. Resolve the node and check historian availability (same as raw).
2. Validate that `AggregateType` is present in the `ReadProcessedDetails`. Return `BadAggregateListMismatch` if empty.
3. Map the requested aggregate to a result property via `MapAggregateToColumn`. Return `BadAggregateNotSupported` if unmapped.
4. Call `ReadAggregateAsync` with the time range, `ProcessingInterval`, and property name.
5. Return results in the same `HistoryData` / `ExtensionObject` format.
## Historizing Flag and AccessLevel
During variable node creation in `CreateAttributeVariable`, attributes with `IsHistorized == true` receive two additional settings:
```csharp
if (attr.IsHistorized)
accessLevel |= AccessLevels.HistoryRead;
variable.Historizing = attr.IsHistorized;
```
- **`Historizing = true`** -- Tells OPC UA clients that this node has historical data available.
- **`AccessLevels.HistoryRead`** -- Enables the `HistoryRead` access bit on the node, which the OPC UA stack checks before routing history requests to the node manager override. Nodes without this bit set will be rejected by the framework before reaching `HistoryReadRawModified` or `HistoryReadProcessed`.
The `IsHistorized` flag originates from the Galaxy repository database query, which checks whether the attribute has Historian logging configured.