Files
histsdk/docs/reverse-engineering/wcf-string-handle-wall.md
T
Joseph Doherty 4de222c950 Merge re/r1.4-gethi-finding: R1.1 ExecuteSqlCommand + R1.4 GetHistorianInfo (bounded)
# Conflicts:
#	docs/plans/hcal-roadmap.md
#	src/AVEVA.Historian.Client/HistorianClient.cs
#	src/AVEVA.Historian.Client/Protocol/Historian2020ProtocolDialect.cs
#	tests/AVEVA.Historian.Client.Tests/HistorianClientIntegrationTests.cs
#	tools/AVEVA.Historian.NativeTraceHarness/Program.cs
2026-06-21 16:18:49 -04:00

139 lines
9.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# The 2020 WCF string-handle wall (2026-06-20)
> ## ✅✅ RESOLVED (2026-06-20): the "wall" was a handle-FORMAT bug, not a registration wall.
>
> The string-handle ops are reachable from the pure-managed client after all. The Open2
> storage-session GUID must be passed as the `string handle` **UPPERCASE, dash-separated,
> no braces** — `storageSessionId.ToString("D").ToUpperInvariant()`. The earlier probes that
> "proved" the wall passed the GUID in .NET's default **lowercase** `ToString("D")`, which the
> server's session table does not match. Live-verified end-to-end against the local 2020 server:
> - **GETRP** (R1.2) → returns the runtime `HistorianVersion` (shipped).
> - **GETHI** (R1.4) → `returned=True`, returns the version buffer (`0C000000` + UTF-16 "20,0,000,000").
> - **ExeC** (R1.1) → `returned=True`, `Retr.GetV` prime + `ExeC("SELECT 1 AS ProbeValue", option=0)`
> yields `queryHandle`, then `GetR(handle, queryHandle, sequence=0)` returns a 1232-byte result =
> a **BinaryFormatter-serialized .NET DataTable** (stream header `…System.Data, Version=4.0.0.0…`).
>
> Probes: gated `StringHandleProbeDiagnosticTests` (GETHI + ExeC). Captures:
> `scripts/Capture-RuntimeParam.ps1`, `scripts/Capture-ExecSql.ps1`. The handle for ExeC/GetR is the
> **same** Open2 storage-session GUID (confirmed = `outBuff[5..21]`). The original analysis below is
> retained for history; treat its "blocked" conclusions as **superseded** — the only missing piece
> was the uppercase format.
>
> **Update 2026-06-20 — R1.5 `GetTepByNm` shipped; QTB nuance.** `GetTagExtendedPropertiesFromName`
> (`GetTepByNm`) is now **shipped + live-verified** with the uppercase handle
> (`GetTagExtendedPropertiesAsync`; see `wcf-tag-extended-properties.md`). It confirms the
> string-handle Retrieval family is reachable (and `GetTgByNm`/GetTagInfosFromName was observed
> succeeding alongside it). **But not every string-handle op is just a format fix:** `QTB`
> (`StartTagQuery`) was captured being sent with a correctly-**uppercase** handle and still failed
> with `error 1` *server-side* (`CMdServer::StartTagQuery::StartActiveTagnamesQuery` over
> `\\.\pipe\aahMetadataServer\console`). So QTB/QTG (the active-tagnames query family) are blocked by
> the metadata server, not the handle format — distinct from the handle-format wall. **R1.6
> (localized properties) has no distinct op** and collapses into R1.5.
---
Live-probing the local **Historian 2020** (WCF, port 32568) for HCAL roadmap M1
surfaced a clean structural boundary on what the pure-managed client can call. It
explains why R1.1/R1.4/R1.5 all fail and identifies the single RE target that
unblocks the rest of the M1 read surface.
> ⚠️ **Superseded — see the RESOLVED banner above.** The boundary below is real *only* when the
> handle is sent lowercase. With the uppercased storage GUID the string-handle ops succeed.
## The dichotomy
Retrieval/Status/History ops split by the **type of their first (handle) parameter**:
| Handle type | Examples | Status on 2020 WCF |
|---|---|---|
| **`uint` client handle** (Open2 output) | `StartQuery2`, `GetNextQueryResultBuffer2`, `IsOriginalAllowed`, `GetTagInfosFromName`/`GetTagInfoFromName` (GetTgByNm), `GetSystemParameter`, `StartEventQuery`, `GetNextEventQueryResultBuffer`, `RegisterTags2`, `EnsureTags2`, `UpdateClientStatus3` | ✅ **work** — the proven read/browse/metadata/status-param/event/write surface |
| **`string` GUID handle** | `ExecuteSqlCommand` (ExeC), `StartTagQuery` (QTB), `QueryTag` (QTG), `GetHistorianInfo` (GETHI), `GetTagExtendedPropertiesFromName` (GetTepByNm), `GetTagInfosFromName2` (GetTgByNm2), `GetTagidsByTagnameAndSource` | ⛔ **blocked** — native error type 4, code **51 (InvalidParameter)** or **1 (Failure)** |
## Evidence (this probe + prior notes)
- **ExeC** → type 4 / code 51 for every handle variant (storageGuid, contextGuid).
Matches `implementation-status.md` ~982 / ~1404 ("StartTagQuery depends on earlier
native session/filter registration … do not wire through guessed calls").
- **GETHI** (`HistorianVersion` param query — the *exact* native request shape from
`BuildGetHistorianInfoRequest`, with `Stat.GetV ×2` priming) → type 4 / code **1**
for all five handle formats tried: storage-session GUID, context GUID, uint as
decimal, uint as `X8` hex, uint as `0x`-hex. In the only place GETHI is used (the
event-priming chain) its result is wrapped in `TryRun` and **discarded**, so there
was never evidence it actually returns data from the managed client.
- **GetTepByNm / QTB / QTG / GetTgByNm2** all take a `string handle` → same family.
## Why
The string-handle ops are keyed off a **native-side session/filter registration**
that the C++ client performs but the managed replay does not reproduce. The uint
client handle is the Open2 session token the server already trusts; the string GUID
handle indexes a *different* per-service registration table that stays empty unless
the native priming is replicated faithfully. `Stat.GetV ×2` alone is insufficient.
## Consequence for the roadmap
Every remaining **M1 read** item is a string-handle op:
- R1.1 `ExecuteSqlCommandAsync` (ExeC) — blocked
- R1.4 `GetHistorianInfoAsync` (GETHI) — blocked
- R1.5 extended-property read (GetTepByNm) — blocked (string handle, confirmed)
- R1.6 localized-property read — same family
So **M1 read-surface completion on 2020 WCF is gated entirely behind one RE target:
the native session/filter registration for string-handle ops.** Reverse-engineer it
once and the whole family unlocks. Until then, the alternatives are:
1. **RE the registration** — instrument the native `CRetrievalConnectionWCF` /
`CStatusConnectionWCF` priming between Open2 and the first successful string-handle
call (capture-tier; the highest-leverage single RE task for M1).
2. **2023 R2 gRPC server** — these ops are first-class on the gRPC front door, where
the handle/envelope differs and the registration wall may not apply.
Do **not** ship any string-handle op via guessed calls (project discipline:
"leave them throwing until evidence supports an implementation").
## ⚠️ Update (2026-06-20): GETRP punches through — the wall is not absolute
Roadmap **R1.2 `GetRuntimeParameterAsync`** turned out to be a **`string`-handle op**
(`aa/Stat/GETRP(string handle, byte[] pRequestBuff) → (bool, byte[] pResponseBuff,
byte[] errorBuffer)`) — the **same shape as GETHI**, and in the same native session it
uses the **same handle GUID** as GETHI (confirmed: the GUID equals the Open2 `outBuff`
storage-session id at `[5..21]`, the value the managed `ParseOpenConnectionResponse`
already extracts as `StorageSessionId`).
Yet GETRP **works from the pure-managed client** — live-verified, returns the runtime
`HistorianVersion` value `20,0,000,000`. The only material difference from the failed
GETHI probe is the **handle string format**: the native client sends the GUID
**UPPERCASE, dash-separated, no braces** (format example
`XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX`, all hex upper), i.e.
`storageSessionId.ToString("D").ToUpperInvariant()`. `.NET Guid.ToString("D")` is
lowercase, so a probe that passed the GUID without upcasing would not byte-match what
the server's session table is keyed on.
**Implication — CONFIRMED, the wall is largely a handle-format bug.** The follow-up was done:
GETHI and **ExeC both return data with the uppercased storage-session GUID**.
- **R1.1 `ExecuteSqlCommandAsync` (ExeC + GetR) — SHIPPED + live-verified (2026-06-20).**
`ExecuteSqlCommandAsync(sql)``HistorianSqlResult`. `Retr.GetV` prime → `ExeC(handle,
sql, option=0, ref queryHandle)``GetR` loop. Note: **`GetR` returns false even on
success** (the byte stream is in `pResultBuff` regardless; false = final page). `pResultBuff`
is an **NRBF `DataTable`** (`SerializationFormat.Xml`: `XmlSchema` + `XmlDiffGram`), decoded
read-only with `System.Formats.Nrbf` + `XDocument` (BinaryFormatter is gone from .NET 10).
Shipped: `HistorianSqlResultProtocol`, `HistorianWcfSqlClient`, golden `WcfSqlResultProtocolTests`,
gated live tests. See `docs/reverse-engineering/wcf-exec-sql.md`.
- **GETHI (R1.4)** returns data with the uppercase handle, **but only the named `HistorianVersion`
value** — over 2020 WCF GETHI is a named-value query (the only working key), *not* a full-struct
read. `EventStorageMode` (the 518-byte-struct `@514` field) is **not on the 2020 WCF wire**; it is
the 2023R2 HCAL-native/gRPC model. So R1.4 is **bounded out on WCF / gRPC-2023R2-only** and the
public API is intentionally not shipped. Full analysis: `docs/reverse-engineering/wcf-historian-info.md`.
So the "wall" collapses to the handle **format** for the Retrieval/Status string-handle ops.
**Exception — QTB/QTG:** `StartTagQuery` does *not* punch through; captured with a correctly
uppercase handle it still fails `error 1` **server-side** (`CMdServer::StartActiveTagnamesQuery`
over `\\.\pipe\aahMetadataServer\console`) — a metadata-server blocker, independent of handle
format. Name-based ops route around it.
See `HistorianRuntimeParameterProtocol`, `IStatusServiceContract2.GetRuntimeParameter`,
golden `WcfRuntimeParameterProtocolTests`, and capture tooling
`scripts/Capture-RuntimeParam.ps1` + `scripts/decode-runtime-param-capture.py`.