Files
lmxopcua/docs/plans/2026-06-17-historian-paging-udt-member-paths-design.md
T

15 KiB

Historian within-timestamp paging (#400) + AbCip/TwinCAT UDT member-paths — Design

Status: Approved 2026-06-17. Branch feat/stillpending-historian-paging-udt-members off master c402872c. Next step: writing-plans → subagent-driven-development → finish (merge to master + push).

Goal

Bundle two stillpending.md §2 backlog items into one phase. They touch disjoint projects (server/historian vs. two driver projects), so their implementers run concurrently.

  • WS-A — #400 within-timestamp tie-cluster paging. Retire the loud BadHistoryOperationUnsupported fail when a single SourceTimestamp carries more raw ties than NumValuesPerNode; page within the timestamp instead.
  • WS-B — AbCip + TwinCAT UDT member-paths. Make individual UDT/struct members individually addressable (discover + read) — what both §2 lines explicitly point to (stillpending.md:50 "Must address individual member paths", :51 "discover UDT members individually").

WS-A — #400: within-timestamp tie-cluster paging

Problem

HistoryRead-Raw is paged server-side over a single-shot backend that only accepts (start, end, cap) — it cannot skip/offset. The tie-safe resume cursor (HistoryContinuationState) is a (NextStartUtc=T, BoundarySkipCount=skip) pair. When more samples share one SourceTimestamp than NumValuesPerNode (an oversized tie cluster), the cursor cannot advance past T: every resume re-reads the same first cap ties, the boundary-tie trim empties the page, and ServeRawPaged (OtOpcUaNodeManager.cs:1881-1892) fails that node loudly with BadHistoryOperationUnsupported. The operator's only remedy today is to re-issue with a bigger NumValuesPerNode. For a single tag's raw history this is a data anomaly (raw samples normally carry strictly increasing distinct timestamps) — hence it was deferred as task #400.

Grounding facts (verified against the code)

  • The backend (HistorianDataSource.ReadRawAsync:405) queries StartDateTime = startTime / EndDateTime = endTime (inclusive both ends, RetrievalMode.Full) and honors an explicit maxValues cap. So a degenerate (T, T, bounded) read returns the whole cluster at exactly T.
  • cap == 0 is NOT truly unlimited at the backend — it falls back to _config.MaxValuesPerRead (HistorianDataSource.cs:431-432,441). So the over-fetch must pass an explicit large cap, never rely on 0 = unlimited.
  • The pure paging helpers already live in HistoryPaging.cs (IsFullPage, ComputeResumeCursor, TrimBoundaryDuplicates) — all static + SDK-free, unit-tested directly. The new decision logic joins them there.
  • Inclusive-start resume + boundary-tie trim are already load-bearing invariants of the current tie-safe cursor; the over-fetch inherits them and adds no new assumption beyond stable backend ordering within a single timestamp (which the existing skip-count cursor already relies on).
  • The config knobs live in ServerHistorianOptions.cs (AddServerHistorian DI mirror).

Approaches considered

  • A — within-timestamp over-fetch + cap-sliced serve (CHOSEN). On detecting the stuck cluster, issue a dedicated degenerate read (T, T, MaxTieClusterOverfetch + 1) to pull the whole cluster at T, then serve it cap samples per page by advancing the existing BoundarySkipCount within the cluster. When the cluster drains, advance the cursor to T.AddTicks(1) with skip 0 (safe: all T-ties emitted; no DateTime tick exists strictly between T and T+1). Re-reading (T,T,bounded) each page is idempotent over immutable history. MaxTieClusterOverfetch (new ServerHistorianOptions field) bounds the over-fetch; a cluster larger than the bound preserves today's loud BadHistoryOperationUnsupported as the ultimate backstop. No contract / backend / sidecar change. The slice + next-cursor decision goes in a new pure HistoryPaging.SliceTieCluster(...) (unit-testable without a server/session/SDK); the node-manager owns only the (T,T,bounded) I/O.
  • B — backend skip/offset parameter (REJECTED). Native within-timestamp offset via a (start, end, cap, skip) backend surface. Requires an IHistorianDataSource.ReadRawAsync contract change and a Wonderware sidecar wire-protocol change — infra-gated and off-limits (no wire/proto change).
  • C — buffer the cluster in the continuation store (REJECTED). Over-fetch once, stash the whole cluster in the continuation-point state, serve slices from the buffer. Avoids re-reads (O(N) vs A's O(N²/cap) transfer) but holds the whole cluster in the session-bound, capped, oldest-evicted store — more state, more memory under many concurrent pathological reads, more eviction edge-cases. A re-reads instead; the re-read cost only bites on a documented anomaly path, and steady-state memory stays one page (plus the transient bounded over-fetch).

Architecture (Approach A)

The change is localized to the existing "stuck" branch of ServeRawPaged (OtOpcUaNodeManager.cs:1881-1892). Today that branch fails the node; instead it:

// stuck: inboundCp present, backend page full, boundary-trim emptied the page
cluster = ReadRawAsync(tagname, T, T, MaxTieClusterOverfetch + 1)   // whole cluster at T
if cluster.Count > MaxTieClusterOverfetch:
    log + BadHistoryOperationUnsupported          // preserve today's loud backstop for absurd clusters
    return
(sliceStart, sliceCount, nextCursor) = HistoryPaging.SliceTieCluster(   // PURE
    clusterCount: cluster.Count, skip: boundarySkip, cap: numValuesPerNode, boundaryT: T, endUtc: endUtc)
serve cluster[sliceStart .. sliceStart+sliceCount)   // Good (or GoodNoData if empty + drained + no window left)
emit continuation point iff nextCursor is not null

SliceTieCluster decides:

  • more ties remain at T (skip + sliceCount < clusterCount) → next cursor (T, skip + sliceCount) (stay within the cluster).
  • cluster drained (skip + sliceCount >= clusterCount) → next cursor (T.AddTicks(1), 0) iff T.AddTicks(1) <= endUtc (there may be window left after T); else terminal (no continuation). A drained slice may be short (< cap) yet still emit a CP — the one legitimate exception to "short page ⇒ terminal", because we know there may be un-read window past T. Documented.

A fresh read that lands on an oversized cluster needs no change: it returns the first cap ties, ComputeResumeCursor stores (T, cap), and the next (resume) read hits the stuck branch — which now over-fetches. The (T, N)/skip >= clusterCount degenerate self-heals (slice empty → drained branch → advance past T).

Testing (WS-A)

  • Pure HistoryPaging.SliceTieCluster unit tests: mid-cluster slice, last (short) slice + advance-past-T, exact-cap-boundary, skip >= clusterCount self-heal, T == endUtc terminal, drained-with-window-left emits CP.
  • ServeRawPaged integration vs a fake IHistorianDataSource returning an oversized cluster (cluster bigger than cap at one timestamp): assert the full cluster is paged out across N resumes with no duplicates/drops, then the window past T is read, then terminal — and that > MaxTieClusterOverfetch still loud-fails.
  • Regression: the existing small-tie-cluster + normal Raw paging tests stay green.

WS-B — AbCip + TwinCAT UDT member-paths

Problem

Both §2 lines point to the same feature — make individual UDT members addressable:

  • AbCip (stillpending.md:50): bare-name read of a member-bearing UDT returns BadNotSupported (AbCipDriver.cs:515-522); UDT type maps to a String placeholder (AbCipDataTypeExtensions.cs:33). "Must address individual member paths."
  • TwinCAT (stillpending.md:51): pre-declared Structure tags are rejected at parse (TwinCATDriverFactoryExtensions.cs:89-100, by design — the error tells the operator to "discover UDT members individually"); discovered UDTs/FBs are silently skipped (MapSymbolTypenullDiscoverAsync drops, AdsTwinCATClient.cs:349/364).

Both path parsers already handle member paths. AbCipTagPath.TryParse / TwinCATSymbolPath.TryParse structurally parse Motor.Speed.Setpoint and reassemble the exact native name libplctag (ToLibplctagName) / ADS (ToAdsSymbolName) read — proven by the just-shipped bit-RMW work (which read/wrote parent.bit through these same surfaces). So the read path for a member reference already works; the gap is purely discovery — neither enumerator walks members:

  • LibplctagTagEnumerator.EnumerateAsync (:28-49) emits one AbCipDiscoveredTag per top-level @tags entry; a UDT instance surfaces as a bare Structure tag with Members metadata, never expanded into addressable member tags.
  • AdsTwinCATClient.BrowseSymbolsAsync (:319-339) emits one TwinCATDiscoveredSymbol per top-level symbol and drops Structure symbols (MapSymbolType → null).

Approaches considered

  • A — discovery-time member expansion, read (CHOSEN). When discovery hits a struct/UDT, walk its members and emit each atomic leaf as an individually-addressable discovered tag carrying its concrete member path + resolved atomic data type. The read path already serves it. Minimal, symmetric across both drivers, fully fake-testable.
    • AbCip: expand from the already-decoded AbCipUdtShape.Members (each carries name + offset + atomic type) the template cache (AbCipTemplateCache) holds. Emit AbCipDiscoveredTag per atomic member with reference Parent.Member and the member's DriverDataType. (Whole-UDT grouped reads via AbCipUdtReadPlanner are an existing efficiency path and stay; member tags are the addressable surface.)
    • TwinCAT: recurse ISymbol.SubSymbols in BrowseSymbolsAsync; for each atomic leaf yield a TwinCATDiscoveredSymbol(InstancePath, mappedAtomicType, readOnly, arrayLength). A struct sub-symbol recurses; an unsupported leaf (nested-UDT-array / pointer / FB internal) is dropped (null), never mis-reported — same discipline as the existing multi-dim-array drop.
  • B — runtime member resolution, no discovery change (REJECTED). Operator hand-types member tags; AbCip resolver walks the template at read time. No browse (worse UX) and more resolver complexity for AbCip; A is cleaner.
  • C — whole-UDT structured read (REJECTED). Return the container as one OPC UA structured value — needs a structured OPC UA DataType + encoder + the retirement of the String placeholder; large surface. Defer.

Architecture (Approach A)

  • AbCip: add member expansion to the discovery emit. Where a discovered tag is a Structure with decoded Members, additionally emit one AbCipDiscoveredTag per atomic member (reference Name.Member, atomic DriverDataType from AbCipDataTypeExtensions/the template member type). Bounded recursion depth for nested structs (config/const cap, e.g. 8). The bare Structure container keeps returning BadNotSupported on direct read (unchanged) — members are the addressable surface.
  • TwinCAT: in BrowseSymbolsAsync, when a symbol's IDataType.Category == Struct, recurse symbol.SubSymbols and yield atomic leaves (full InstancePath, MapSymbolType-resolved type). Bounded recursion depth. The pre-declared-Structure-tag reject at TwinCATDriverFactoryExtensions.cs:89 stays (its message already points to discovery) — but now discovery actually surfaces the members it points to.
  • No IDriver / wire / proto / EF change — both edits are driver-internal discovery.

Named deferrals (not silent)

  • Member writes — read-only members this slice (writes via the atomic path may "just work" but add round-trip risk; deferred + documented).
  • Whole-UDT (container) read/write — stays unsupported; address members.
  • Arrays-of-UDT, UDT-typed array members, pointers/references, FB internals beyond plain struct members — dropped in discovery (never mis-reported), deferred.
  • Nested-struct recursion beyond the bounded depth cap.

Testing (WS-B)

  • AbCip: fake template reader / enumerator yields a UDT with atomic + nested members → assert discovery emits per-member tags with the right reference + atomic type; assert a member-path read resolves + reads (existing fake runtime); assert the bare container still BadNotSupported; depth-cap honored.
  • TwinCAT: FakeTwinCATClient symbol tree with a struct + nested struct + an unsupported leaf → assert BrowseSymbolsAsync yields the atomic leaves (full paths, mapped types), drops the unsupported leaf, honors the depth cap; member read via ToAdsSymbolName works.
  • No bUnit.

Live /run

  • WS-A (#400): unit + integration-proven. A real oversized-tie anomaly needs the AVEVA Historian on 10.100.0.48 (infra-gated — same gate as all of Phase C) plus an actual same-timestamp tie cluster. The fake-source integration test is the canonical proof; a live /run is best-effort/operator-gated.
  • WS-B AbCip: best-effort against a local ab_server ControlLogix (--plc=ControlLogix) — UDT support in ab_server is partial, so likely unit-proven; attempt a member round-trip and record honestly.
  • WS-B TwinCAT: operator-gated (Windows-only ADS runtime; no local target) — unit-proven only.

Task slicing (subagent-driven)

Disjoint projects ⇒ WS-A and WS-B implementers run concurrently.

  • T1 (WS-A)HistoryPaging.SliceTieCluster pure helper + MaxTieClusterOverfetch option + ServeRawPaged stuck-branch rewrite + tests. standard. ∥ T2/T3.
  • T2 (WS-B AbCip) — discovery member expansion + member-read verification + tests. standard. ∥ T1/T3.
  • T3 (WS-B TwinCAT)BrowseSymbolsAsync sub-symbol recursion + tests. standard. ∥ T1/T2.
  • T4 — Docs (docs/Historian.md paging-limitation section → resolved; docs/drivers/AbCip.md + docs/drivers/TwinCAT.md UDT member-path sections) + clear the §2 lines via the plan record only (do not stage stillpending.md). small.
  • T5 — Full build + OpcUaServer + AbCip + TwinCAT tests + final integration review. standard.
  • T6 — Live /run best-effort (WS-A fake-integration; WS-B AbCip ab_server best-effort, TwinCAT operator-gated) + finish branch (merge to master + push) + memory. standard.

Dependency graph: {T1 ∥ T2 ∥ T3} → T4 → T5 → T6.

Hard rules (carried from prior phases)

Stage by explicit path, never git add .; never stage sql_login.txt / src/Server/.../pki/ / pending.md / current.md / docker-dev/docker-compose.yml / stillpending.md; never echo/commit secrets; no force-push; no --no-verify; NO EF migration; NO Commons wire/proto contract change (all edits are driver-internal or server-internal — HistoryPaging/OtOpcUaNodeManager/ServerHistorianOptions and the two drivers' discovery; no IDriver/IHistorianDataSource/WriteResult change); NO bUnit; dangerouslyDisableSandbox for all build/test/rig commands.