histsdk/docs/reverse-engineering/grpc-event-query-capture.md
Joseph Doherty 8f4a188f78 Event-row parser: verify against the provided 2023 R2 client; fix latent multi-row bug
Used the provided stock client as an oracle to verify the event read path. The
capture-event harness returns 50 real events, and the instrument-grpc-nonstream rewrite
captured the exact GetNextEventQueryResultBuffer.result buffer (63,192 bytes, version
0x0B=11, rowCount 50 = 25 Alarm.Set + 25 Alarm.Clear). Feeding that real buffer through
HistorianEventRowProtocol.Parse exposed a latent parser bug.

The real buffer layout is: version(2) + rowCount(4) + headerField(4, =0x1E) followed by
MARKERLESS rows (rowFormat(2)=7 + filetime(8) + 8x u16 slots + compact-ascii type +
propCount + props). The parser wrongly treated the one-time 0x1E field as a per-row
marker and re-consumed [marker+format] for every row, so it decoded only the FIRST row
of any multi-row buffer and stopped. This is not gRPC-specific: the captured WCF v9
buffer has the identical 0900 <rowCount> 1E000000 0700 header, so the shipped WCF event
read had the same latent multi-row truncation.

Fix: read a 10-byte buffer header (skip the 0x1E field once) and parse markerless rows;
accept container version 9 (WCF) and 11 (gRPC), mirroring the interface-version gate that
accepts History 11 and 12.

Verified: the real 50-row buffer now decodes to exactly 50 events, ending cleanly at
end-of-buffer (Parse_RealStockClientCapture_DecodesAllEvents, gated on
HISTORIAN_EVENT_CAPTURE_NDJSON so it skips without the gitignored capture), plus a
synthetic v11 golden test. 328 offline tests pass.

The parse path is now verified against the provided client's real event data on both
transports; the only remaining gap for gRPC events is the server delivering rows to our
connection (the documented retrieval-server-gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 14:05:35 -04:00

# gRPC event-query capture (2026-06-22) — the StartEventQuery request that returns rows
Captured the stock 2023 R2 client performing a **gRPC event read** that returns rows, to resolve
the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This
closes the capture-gate: the working request shape is now known.
## How it was captured
`tools/AVEVA.Historian.Grpc2023CaptureHarness` gained a `capture-event` scenario. It loads the
self-contained mixed-mode 2023 R2 `aahClientManaged.dll` and drives `HistorianAccess`:
```
OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
-> CreateEventQuery() // NON-null only on an Event connection
-> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
-> EventQuery.StartQuery(args) // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
-> loop EventQuery.MoveNext() / QueryResult // => GrpcRetrievalClient.GetNextEventQueryResultBuffer
-> EventQuery.EndQuery() -> CloseConnection
```
The existing wide-net `instrument-grpc-nonstream` IL rewrite (every `Grpc*Client` `byte[]` method)
already covers `GrpcRetrievalClient.StartEventQuery.requestBuffer` (entry) and
`GetNextEventQueryResultBuffer.result` (exit) — no new instrument command was needed. Run read-only
(non-destructive) against the live 2023 R2 server over the loopback tunnel; the rewrite + capture
NDJSON stay under `artifacts/reverse-engineering/grpc-event-capture/` (gitignored — the result
buffer carries event identity data).
Result: **50 events returned over gRPC** (Alarm.Set / Alarm.Clear rows), proving the path works when
driven through an Event connection.
## Two findings
### 1. The event read needs an **Event-type connection** (`ConnectionIndex 1`)
`HistorianAccess.CreateEventQuery()` returns `null` unless `IsEventConnectionRequested()` — i.e. the
connection was opened with `ConnectionType=Event`, which the native client routes to a *separate*
connection (ConnectionIndex 1) from the process/data path. The full captured pre-query sequence on
that connection: `OpenConnection` → `ExchangeKey` → `UpdateClientStatus` → `RegisterTags`(CM_EVENT) →
`EnsureTags`(CM_EVENT) → `GetHistorianInfo` + 7×`GetSystemParameter` (Stat priming) →
`StartEventQuery` → `GetNextEventQueryResultBuffer` (rows) → `EndEventQuery` → `CloseConnection`.
### 2. The working `StartEventQuery` request is **version 6**, not 5
Our SDK's `HistorianEventQueryProtocol.CreateNativeFilterAttempt` builds a **version-5** empty-filter
buffer; the stock 2023 R2 client sends **version 6**. Diffed byte-for-byte (same query window +
eventCount), the two buffers are **identical except**:
- **byte 0: version `06` vs `05`**
- **5 additional trailing zero bytes** (stock = 70 bytes, SDK v5 = 65 bytes)
The server returns rows for v6 and **zero rows for v5** (the v5 request is *accepted*:
`StartEventQuery` succeeds and yields a query handle, but `GetNextEventQueryResultBuffer` then
matches nothing). Everything else is shared: the two query-window FILETIMEs, `UInt32 eventCount`,
the `UInt32 65536` buffer hint, the `"UTC"` `HistorianString`, and the `01 01000001000001 0000`
metadata-namespace block.
Captured v6 request layout (70 bytes; the FILETIMEs below are just the harness query window — no
identity data):
```
[0..1] UInt16 version = 6 // SDK currently sends 5
[2..9] Int64 startUtc (FILETIME)
[10..17] Int64 endUtc (FILETIME)
[18..21] UInt32 eventCount
[22..25] UInt32 0
[26..27] UInt16 0
[28..29] UInt16 1
[30..36] 7 bytes 0 // empty-filter block
[37..40] UInt32 65536 // buffer-size hint
[41..50] HistorianString "UTC" (UInt32 len=3 + UTF-16LE)
[51..60] 01 01 00 00 01 00 00 01 00 00 // metadata-namespace block (marker + 3 empty)
[61..69] 9 bytes 0 // terminal (SDK v5 writes only 4 here)
```
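The layout above can be checked offline with a short Python sketch. It is not the SDK's C# serializer: `build_start_event_query` and `to_filetime` are hypothetical helper names, the fields are assumed little-endian (consistent with byte 0 carrying the version), and only the empty-filter request described here is reproduced.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def to_filetime(dt: datetime) -> int:
    """Convert an aware datetime to FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    return (dt - FILETIME_EPOCH) // timedelta(microseconds=1) * 10

def build_start_event_query(start: datetime, end: datetime,
                            event_count: int, version: int = 6) -> bytes:
    """Build the empty-filter StartEventQuery buffer (70 bytes for v6, 65 for v5)."""
    tz = "UTC"
    parts = [
        struct.pack("<H", version),                               # [0..1]
        struct.pack("<qq", to_filetime(start), to_filetime(end)), # [2..17]
        struct.pack("<I", event_count),                           # [18..21]
        struct.pack("<IHH", 0, 0, 1),                             # [22..29]
        bytes(7),                                                 # [30..36] empty-filter block
        struct.pack("<I", 65536),                                 # [37..40] buffer-size hint
        struct.pack("<I", len(tz)) + tz.encode("utf-16-le"),      # [41..50] HistorianString
        bytes.fromhex("01010000010000010000"),                    # [51..60] metadata-namespace block
        bytes(9 if version >= 6 else 4),                          # terminal: 9 bytes (v6), 4 bytes (v5)
    ]
    return b"".join(parts)
```

Building both versions for the same window and comparing them should reproduce the diff described above: a different byte 0 and five extra trailing zero bytes.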
## Fix part 1 — v6 request (DONE, necessary)
`HistorianEventQueryProtocol.CreateStartEventQueryAttempts` gained a `version` parameter (default 5 =
WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading `06` and the 5-byte trailing pad. The
WCF path is unchanged (v5). Golden test `Version6EmptyFilterMatchesCapturedGrpcEnvelope` pins the
envelope; 322/322 offline tests pass.
## Fix part 2 — EVENT connection (the remaining gate, NOT yet implemented)
Live validation 2026-06-22: with the orchestrator now sending v6 against the event-bearing live
server, `GetNextEventQueryResultBuffer` **still long-polls and returns zero rows** (the gated test
still throws). So **v6 is necessary but not sufficient** — the read also requires an **Event-type
connection**, which our SDK does not open.
Isolated by diffing the captured `OpenConnection.openParameters` (302 bytes, native format v8) for a
**Process** connection (`connect` scenario) vs the **Event** connection (`capture-event`): aside from
the per-session auth GUID/credential-hash regions ([22..37], [68..93], which vary between any two
sessions), the connection differs in **two clean structural bytes**:
| offset | Process | Event |
|--------|---------|-------|
| 95 | `02` | `01` |
| 96 | `00` | `01` |
These correspond to `HistorianConnectionType` (Process vs Event; the native event path runs on
`ConnectionIndex 1`). The problem: our SDK opens the session with the **2020 OpenConnection3 v6**
buffer (`HistorianNativeHandshake.BuildOpenConnection3Request`, `connectionMode 0x402`), which the
2023 R2 server accepts for reads but which carries no event-connection-type marker. `connectionMode`
is NOT the discriminator (2020 WCF event reads work with `0x402`); the native client distinguishes
event vs process via this separate `ConnectionType` field in its v8 `openParameters`.
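The comparison described above (diff two `openParameters` captures while ignoring the per-session regions) can be sketched in Python as follows. The buffers here are synthetic stand-ins, since the real captures are gitignored, and `structural_diff` is a hypothetical helper name.

```python
# Inclusive per-session regions that vary between any two sessions.
VOLATILE = [(22, 37), (68, 93)]

def structural_diff(a: bytes, b: bytes, volatile=VOLATILE):
    """Return (offset, a_byte, b_byte) for every differing byte outside the volatile regions."""
    if len(a) != len(b):
        raise ValueError("buffers differ in length")
    masked = {i for lo, hi in volatile for i in range(lo, hi + 1)}
    return [(i, a[i], b[i]) for i in range(len(a))
            if i not in masked and a[i] != b[i]]

# Synthetic 302-byte buffers: different session GUIDs plus the two structural bytes.
process = bytearray(302)
event = bytearray(302)
process[22:38] = bytes(range(16))
event[22:38] = bytes(range(16, 32))
process[95], process[96] = 0x02, 0x00
event[95], event[96] = 0x01, 0x01

for offset, p, e in structural_diff(process, event):
    print(f"{offset}: process={p:02x} event={e:02x}")
```

Masking by offset only works while both buffers share the same field lengths (same username, for example); with different lengths the regions would shift and the buffers would need to be parsed field by field instead.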
### Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection
Decoded the native `openParameters` (302 bytes): **byte 0 = `08` (format version 8)**, then a
context GUID, username, a 26-byte session-derived region ([68..93]), machine/client-node/datasource
strings, and at **[94] `ClientType=04`** immediately followed by **[95] `ConnectionType`
(`01`=Event / `02`=Process)** + **[96] a flag (`01`/`00`)**, then the rest.
Our SDK builds the **v6** buffer (`HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6`,
byte 0 = `06`): it writes `ClientType` (1 byte) **immediately followed by `ConnectionMode` (uint)**;
there is **no `ConnectionType` byte at all**. The v8 format *inserts* `ConnectionType` (+flag) between
`ClientType` and the rest. So the v6 buffer the SDK sends (accepted by the 2023 R2 server for *reads*)
structurally cannot mark the connection as Event, and the server returns event rows only for an Event
connection.
Two further obstacles to simply emitting v8:
- the native client authenticated via **`ExchangeKey`** (cert path; 72-byte `btInput`/`btOutput` in
the capture) whereas the SDK's gRPC handshake uses **`ValidateClientCredential`** (Negotiate). The
v8 `openParameters` [68..93] region is session-derived and tied to that auth flow.
- `ConnectionMode` is NOT the lever (2020 WCF event reads work at `0x402`); `ConnectionType` is a
distinct field that only exists from format v8.
Also confirmed a secondary format gap: the native gRPC `EnsureTags` CM_EVENT payload is **86 bytes**
vs the SDK's `SerializeCmEventCTagMetadata` **83 bytes** (a 3-byte 2023 R2 bump, parallel to the
event-query v5→v6). This is likely benign on its own (CM_EVENT pre-exists; 2020 EnsT2 returns
benign-false yet events flow) but should be matched if the event open is ever rebuilt.
**Conclusion — the event-connection gate is NOT a tweak.** Making event rows flow over gRPC requires
the SDK to emit the native **v8 `OpenConnection` format** with `ConnectionType=Event` (a 302-byte
buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and
likely to adopt the `ExchangeKey` cert auth path. That is a substantial RE+implementation effort
comparable to the original Open2 work — scoped as a follow-on, not a quick fix. Until then the gated
`ReadEventsAsync_OverGrpc_*` test correctly still pins the no-row throw, and **v6 (part 1) is retained
as the captured-correct request format** for when the open is rebuilt.
Capture artifacts (gitignored), under `artifacts/reverse-engineering/grpc-event-capture/`:
`event-capture.ndjson` (Event), `process-connect-2.ndjson` (Process).
## v8 `openParameters` fully decoded (2026-06-23) + the ECDH ExchangeKey finding
Full byte map of the native Event-connection `openParameters` (302 bytes; identity values
redacted — they are session-specific and sit in the gitignored capture):
```
[0] byte 0x08 format version = 8
[1] byte 0xf0 constant marker
[2..20] 19 × 0x00
[21] byte 0x01 constant marker
[22..37] 16B GUID per-session client key
[38..41] u32 username length (chars)
[42..N] UTF-16 username (HistorianString)
[..+1] u16 credential-token length (= 26 in the capture)
[..] 26B token ECDH-derived credential token <-- see below
[94] byte 0x04 ClientType (= our NativeClientType 4)
[95] byte ConnectionType 01 = Event / 02 = Process <-- THE GATE
[96] byte flag 01 (Event) / 00 (Process)
[97..] control bytes (0x03 ... small region, not fully named)
[~114..117] u32 FormatVersion=3
[..] HistorianString machine/server node name
[..] HistorianString client node name "(<ver>)"
[..] u32 session-variable (process-ish)
[..] u32 / zeros
[..] u32 datasource len
[..] UTF-16 datasource id e.g. "2023.1219.4004.5"
[270..285] 16 × 0xff ShardId (all-FF = unset; our v6 sends Empty)
[286..289] u32 client/hcal version int
[290..297] i64 FILETIME ClientTimestamp
[298..301] u32 0
```
The tail (`FormatVersion` → machine → clientNode → datasource → ShardId → version → timestamp)
is the **same `ClientCommonInfo` our v6 already emits**. The new/different parts are: version byte,
the `[1]`/`[21]` markers, the GUID position, the **26-byte credential token** (vs v6's fixed-size
block), the **`ConnectionType` byte**, and ShardId=FF.
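Because the username and credential token are length-prefixed, the `[94]`/`[95]`/`[96]` offsets in the map hold only for the captured username length. A small Python sketch of walking the prefix to locate those bytes, assuming the field order and widths shown in the map (the helper name `locate_connection_type` and the synthetic buffer are illustrative, not taken from the SDK):

```python
import struct

def locate_connection_type(buf: bytes):
    """Walk the v8 prefix; return (offset_of_ClientType, ClientType, ConnectionType, flag)."""
    if buf[0] != 0x08:
        raise ValueError("not a v8 openParameters buffer")
    pos = 38                                         # after version, markers and 16-byte GUID
    (name_chars,) = struct.unpack_from("<I", buf, pos)
    pos += 4 + name_chars * 2                        # u32 char count + UTF-16LE username
    (token_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2 + token_len                             # u16 length + credential token
    return pos, buf[pos], buf[pos + 1], buf[pos + 2]

# Synthetic prefix with a 12-character placeholder username and a zeroed 26-byte token.
prefix = (bytes([0x08, 0xF0]) + bytes(19) + b"\x01" + bytes(16)
          + struct.pack("<I", 12) + ("u" * 12).encode("utf-16-le")
          + struct.pack("<H", 26) + bytes(26)
          + bytes([0x04, 0x01, 0x01]))

print(locate_connection_type(prefix))
```

With a 12-character username the walk lands on offset 94, which agrees with the token occupying `[68..93]` in the capture.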
**The auth is ECDH, not Negotiate.** The capture's `ExchangeKey` buffers begin `45 43 4b 31` =
ASCII **`"ECK1"`** + a 64-byte EC public-key point — a Diffie-Hellman key exchange — and the 26-byte
`openParameters` token is derived from it. `HistorianSecurityMode` offers only `Disabled` / `None` /
`TransportCertificate`; the harness used `TransportCertificate`, which is what drives the ECDH
`ExchangeKey`. There is **no TLS+Negotiate mode** on the native client (it couples TLS with the cert
ECDH path), so a Negotiate-auth v8 capture cannot be produced from the native client.
**Key de-risking insight:** our SDK's v6 `OpenConnection` sends a **fully zeroed** 1026-byte
credential block (`credentialBlock: new byte[1026]`) and reads still work — because authentication is
actually carried by the separate `StorageService.ValidateClientCredential` (Negotiate) handshake, not
by the bytes inside `openParameters`. By analogy the v8 `[68..93]` token may likewise be **ignorable**
once `ValidateClientCredential` has run. So the first build hypothesis (cheapest, read-only to test):
> Reuse the SDK's existing `ValidateClientCredential` handshake, then send a **v8 `OpenConnection`
> with `ConnectionType=Event` and a zeroed credential token**, and see whether the 2023 R2 server
> returns event rows.
If that works, the ECDH ExchangeKey RE is unnecessary. If it fails, the fallback is full reproduction
of the ECDH `ExchangeKey` handshake (curve/KDF/cipher) — a much larger crypto-RE effort. Build path:
add `SerializeNativeOpenConnectionVersion8(connectionType)` to `HistorianOpen2Protocol`, wire the gRPC
event handshake to use it (events only; reads stay on v6), live-test (non-destructive). Full hex in
the gitignored capture.
### Path A built + live-tested 2026-06-23 — DISPROVEN (v8 is coupled to ExchangeKey)
Built `HistorianOpen2Protocol.SerializeNativeOpenConnectionVersion8` (golden-tested,
`Version8EventSerializerReproducesCapturedNativeStructure` — reproduces the captured 302-byte
structure exactly) + `HistorianNativeHandshake.BuildEventOpenConnectionVersion8Request` (zeroed
credential token) + an `eventConnection` switch on `HistorianGrpcHandshake.OpenSession`, and live-ran
the event read against the server. Result: the v8 `OpenConnection` was **parsed by the server** (got
past the byte format) but **rejected at the auth check** with native error
```
type=132 code=34 "aahHcapLib::HistoryService::EstablishConnection — Failed to get client key"
```
i.e. `EstablishConnection` could not find a server-side **client key** for our session. In the v6
path that key is established by `StorageService.ValidateClientCredential` (which is why v6 reads
work); the v8 path looks it up in the registry that **`HistoryService.ExchangeKey` (ECDH)** populates,
and there is **no `ValidateClientCredential` on `HistoryService`** in the gRPC contract. So the server
branches on the OpenConnection version: v6 accepts the Negotiate-established key, **v8 requires the
ExchangeKey-established key**. The zeroed-token hypothesis is therefore disproven — not because of the
token bytes, but because the whole v8 path is gated on `ExchangeKey` having run first.
**Status:** the v8 serializer/builder are correct and retained (golden-tested), plus the
`OpenConnection` failure now decodes the native error (type/code/ASCII). The event orchestrator is
reverted to the v6 session (gated test still pins the no-row throw). The remaining route is **Path B:
implement `HistoryService.ExchangeKey`** — `"ECK1"` + a 64-byte EC public-key point (P-256 X‖Y, by the
size) — using .NET `ECDiffieHellman`, establish the client key, then reissue the v8 `OpenConnection`.
Open question for Path B: whether merely *completing* the ECDH key agreement registers the client key
(so the zeroed openParameters token still rides through), or whether the token must also be derived
from the shared secret (full KDF/cipher RE).
### Path B started 2026-06-23 — ExchangeKey ECDH works; cleared 2 of 3 layers
Implemented `HistoryService.ExchangeKey` as a **pure-managed P-256 ECDH** key exchange
(`HistorianNativeHandshake.BuildExchangeKeyClientHello` / `DeriveExchangeKeySecret`, .NET
`ECDiffieHellman` over `nistP256`; wire format `"ECK1" + u32(32) + X(32) + Y(32)`) and wired it into
`HistorianGrpcHandshake.OpenSession(eventConnection: true)` ahead of the v8 `OpenConnection`,
on the same context-key handle. Live result against the server: the **`ExchangeKey` RPC succeeds**
(the server accepted our public key), and the v8 `OpenConnection` error **moved one layer deeper**:
```
Path A (no ExchangeKey): 132/34 "Failed to get client key"
Path B (ExchangeKey ECDH): 132/171 AuthenticationFailed "EstablishConnection — Authentication failed"
```
So the ECDH cleared the client-key check; the remaining blocker is **authentication**: the 26-byte
v8 credential token must be a *valid* value derived from the ECDH shared secret (not zeros).
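The `"ECK1" + u32(32) + X(32) + Y(32)` wire format mentioned above is small enough to sketch in Python. This only frames and unframes the 72-byte blob; it performs no curve arithmetic or point validation, and the function names are hypothetical rather than the SDK's.

```python
import struct

ECK1_MAGIC = b"ECK1"   # bytes 45 43 4b 31, as seen at the start of the ExchangeKey buffers
COORD_LEN = 32         # P-256 coordinate size in bytes

def build_eck1_blob(x: bytes, y: bytes) -> bytes:
    """Frame a P-256 public point as magic + u32 length + X + Y (72 bytes)."""
    if len(x) != COORD_LEN or len(y) != COORD_LEN:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return ECK1_MAGIC + struct.pack("<I", COORD_LEN) + x + y

def parse_eck1_blob(blob: bytes):
    """Split an ECK1 blob back into its (X, Y) coordinates."""
    if blob[:4] != ECK1_MAGIC:
        raise ValueError("missing ECK1 magic")
    (n,) = struct.unpack_from("<I", blob, 4)
    if len(blob) != 8 + 2 * n:
        raise ValueError("length field does not match blob size")
    return blob[8:8 + n], blob[8 + n:]
```

The resulting size, 4 + 4 + 64 = 72 bytes, matches the 72-byte `btInput`/`btOutput` buffers seen in the capture.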
### Token crypto traced 2026-06-23 (Frida → Windows CNG) — KDF found, token construction still open
Hooked Windows CNG (`bcrypt.dll`/`ncrypt.dll`) while the native harness ran a real ExchangeKey
(`scripts/frida/aahclientmanaged-cng-exchangekey.js` + `artifacts/.../cng-trace.py`). Findings:
- **The ECDH + KDF are standard CNG, driven by managed `System.Security.Cryptography.ECDiffieHellmanCng`**
(backtrace top frame = `System.Core.ni.dll`; the caller is aahClientManaged's C++/CLI `<Module>`):
`NCryptSecretAgreement` (P-256) → `NCryptDeriveKey(KDF=HASH, HASH_ALGORITHM=SHA256, 32 bytes)`. So the
derived key = **SHA256(ECDH shared secret)** — exactly `ECDiffieHellmanCng{ KeyDerivationFunction=Hash,
HashAlgorithm=SHA256 }.DeriveKeyMaterial(...)`. Our managed `DeriveExchangeKeySecret` should switch to
this (SHA256 of the raw agreement) to match.
- **`"ECK1"` is NOT AVEVA-custom** — it is the standard Windows CNG `BCRYPT_ECCPUBLIC_BLOB` magic for
P-256 (`NCryptExportKey`/`ImportKey` emit exactly `ECK1 + len(32) + X(32) + Y(32)`), confirming our
`BuildExchangeKeyClientHello` wire format is correct.
- **The 26-byte token is a custom construction that is not yet reproduced.** Correlated one run's
derived key (`SHA256(secret)`) with that run's token (from the IL openParameters capture): a
528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the derived key × request slices ×
creds) found **no match**, and the token matches **none** of the traced hash digests. The token
starts with a constant `0x8e` marker in both captured runs (so it is structured, not raw cipher
output). It is built in managed code between the `DeriveKeyMaterial` call and the openParameters
assembly.
**dnlib IL extraction 2026-06-23 — the token scheme is fully reverse-engineered.** ILSpy can't
decompile the mixed-mode assembly (crashes), but loading `dnlib` in PowerShell and scanning the IL
recovered the whole construction:
- **`<Module>::CHistoryConnectionGrpc.GetClientKey`** is the ECDH driver: `new ECDiffieHellmanCng()` →
`KeyDerivationFunction = Hash`, `HashAlgorithm = SHA256`, `KeySize = 256` →
`GrpcHistoryClient.ExchangeKey(strHandle, ourPubKey.ToByteArray(), out serverPub, out err)` →
`CngKey.Import(serverPub, CngKeyBlobFormat.EccPublicBlob)` → **`DeriveKeyMaterial`** = the 32-byte
client key = **`SHA256(ECDH shared secret)`**. (So our managed side should derive the key the same
way — `ECDiffieHellman` raw agreement then SHA256, or equivalently `DeriveKeyFromHash(..., SHA256)`.)
- **The 26-byte token is built by `aahClientCommon.CClientBase.ConfigureOpenConnection`** (the lone
caller of `GetClientKey`) using the **`HistorianCrypto.NRC4_V2.aahCryptV2`** scheme — a custom
**MD5-keyed RC4 stream cipher with a version prefix**:
- `aahCryptV2.body`/`HashData` = **MD5** (verified: the IL loads MD5 round constants `0xd76aa478`
and rotates 7/12/17/22).
- `aahCryptV2.prepare_key` = standard **RC4 KSA** seeding the 256-byte S-box from a **16-byte (MD5)**
key (`std.array<unsigned char,16>`).
- `aahCryptV2.enc_buffer` = `MD5(...)` → key, then **`rc4encrypt`** the body; `enc` prepends a
scheme **prefix** (`NRC4_V2.PrefixV2` / `InnerPrefixV2`) — the constant `0x8e` token marker.
- `from_GUID` keys the cipher from a GUID string.
So the token = `prefix + RC4(plaintext, key = MD5(keyMaterial))`, where the key material ties back to
the `SHA256(ECDH secret)` client key. **This is 100% reproducible in pure managed code** (RC4 + MD5
are ~40 lines; nothing AVEVA ships).
**Remaining to finish (next cycle):** read `ConfigureOpenConnection`'s exact wiring (which value is
MD5'd for the RC4 key, what plaintext is encrypted, the exact prefix bytes — a little more dnlib IL),
implement `aahCryptV2` (RC4+MD5+prefix) managed-side, set the v8 token = that, and live-test
(non-destructive). The offline correlation data (one run's derived key + token + openParameters) is
captured under `artifacts/.../` to validate the managed reproduction before going live.
### Token implemented + auth WORKS live (2026-06-23); row retrieval still 0 — proven NOT a payload issue
`token = RC4(password-UTF16LE, key = MD5(SHA256(ECDH secret)))` was implemented in pure managed C#
(`HistorianNativeHandshake.BuildExchangeKeyCredentialToken` + `Rc4`; client key via
`DeriveKeyFromHash(SHA256)`), golden-tested (RC4 standard vector + token construction), and
**live-verified**: the v8 `OpenConnection` now **authenticates** against the 2023 R2 server (past the
`132/171 AuthenticationFailed` wall). Auth is solved.
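For offline cross-checking, the token formula above can be sketched in Python with only `hashlib` and a hand-written RC4. This follows the formula exactly as written and does not model any prefix or framing around the token; `ecdh_secret` stands for the raw ECDH agreement, and the function names are illustrative, not the SDK's.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key-scheduling (KSA) followed by the keystream XOR (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def build_credential_token(ecdh_secret: bytes, password: str) -> bytes:
    """token = RC4(password-UTF16LE, key = MD5(SHA256(ECDH secret)))."""
    client_key = hashlib.sha256(ecdh_secret).digest()  # 32-byte client key
    rc4_key = hashlib.md5(client_key).digest()         # 16-byte RC4 key
    return rc4(rc4_key, password.encode("utf-16-le"))
```

RC4 is symmetric, so running `rc4` over a token with the same key recovers the UTF-16LE plaintext, which makes it straightforward to validate a managed reproduction against the offline correlation data before going live.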
The event **query** still returns `version-11 rowCount-0` while the native returns 50 for an
**identical** request. Exhaustively ruled out as the cause (all confirmed live, opt-in
`EventReadDiagnostic` test + the IL rewrite extended to log string/uint handle fields):
- `StartEventQuery` request: **byte-identical** to the native (v6 layout)
- v8 `OpenConnection` `openParameters`: **byte-identical** to the native (302 bytes) once ClientNodeName
is matched — every control byte, ConnectionType, token framing, ShardId, etc.
- Handle usage: identical — `ExchangeKey`→contextKey, registration→storage-session GUID (`strHandle`),
query→client uint (`uiHandle`); our parsed handles are valid (registration `RTag/EnsT=True`, valid
`queryHandle`)
- `queryRequestType = 3`, registration sequence/order, gzip metadata header — all match
- window (events exist; native returns 50 *now*), eventCount — not it
So **every observable client-side byte matches the native**, yet the server scopes 0 events to our
connection. The event RPCs succeed over our transport and return a valid *empty* result (not a
transport error), so it is **not a payload or transport-incompatibility issue** — it is a
connection/server-level difference (e.g. session affinity tied to the native `Grpc.Core` HTTP/2
connection or a connection-identity the server uses to scope events) that is **invisible to, and
unfixable by, client payload matching.** Closing it needs server-side insight or a different angle
(e.g. compare the full HTTP/2 connection setup / TLS identity), not more wire-payload RE.
**Shipped this effort:** the complete ExchangeKey crypto (ECDH + SHA256 + MD5-keyed RC4 token) — the
hard wall — pure managed, golden-tested, auth live-verified. Orchestrator stays on the no-row throw;
gated test unchanged.
### NEXT SESSION — the server-side / connection angle (row retrieval pickup)
Client payloads are exhausted (byte-identical to the native, proven above). The next investigation is
**connection-level**, not wire-payload. Pursue in roughly this order; each is concrete and testable.
**Already proven — do NOT redo:** auth works (ExchangeKey ECDH + RC4 token, live-verified); v8
`openParameters`, all handles (str/uint), `StartEventQuery` request, registration (`RTag/EnsT=True` +
order), `queryRequestType=3`, gzip header — all byte-match the native. Events exist (native returns 50
*now*). The event RPCs succeed over our transport and return a valid version-11 **rowCount-0** (not a
transport error). So the server scopes 0 events to *our* connection specifically.
**Tooling already in place:** opt-in diagnostic test `EventReadDiagnostic_OverGrpc_PrintsJourney`
(env `HISTORIAN_GRPC_EVENT_DIAG=1`, prints registration outcomes, handles, result hex, v8 buffer);
the `capture-event` harness scenario (native, returns rows); `instrument-grpc-nonstream` now logs
string/uint handle fields too; the CNG Frida hook. Live recipe: set `HISTORIAN_GRPC_HOST`/`_PORT
32565`/`_TLS true`/`_DNSID` to the 2023 R2 server + domain creds (strip quotes); reach the box per the
live-server access reference.
1. ~~**Transport: native `Grpc.Core` HTTP/2 vs our `Grpc.Net.Client` + `GrpcWebHandler` (gRPC-Web).**~~
**DISPROVEN 2026-06-23.** Built `HistorianGrpcChannelFactory.CreateHttp2` (plain HTTP/2 over a
`SocketsHttpHandler`, no `GrpcWebHandler` wrap, ALPN `h2` to the TLS server) and wired it into the
event orchestrator behind `HISTORIAN_GRPC_EVENT_HTTP2=1` (event path only; reads stay gRPC-Web). Live
side-by-side against the event-bearing server, **everything else held constant**:
| channel | auth | registration | queryHandle | result buffer |
|---------|------|--------------|-------------|---------------|
| `http2` (native HTTP/2) | ✓ | `RTag=True EnsT=True` | 1057 | `0B00000000001E000000` |
| `grpc-web` (default) | ✓ | `RTag=True EnsT=True` | 1058 | `0B00000000001E000000` |
The complete v8 chain — ExchangeKey ECDH auth, CM_EVENT `RegisterTags`/`EnsureTags`, `StartEventQuery`
(valid handle) — runs end-to-end over **plain native HTTP/2**, and the server returns the
**byte-identical** version-11 (`0x0B`) rowCount-0 terminal on both transports. So gRPC-Web vs native
HTTP/2 is **not** the discriminator — the zero-row scoping is identical regardless of transport. The
`CreateHttp2` factory + the `HISTORIAN_GRPC_EVENT_HTTP2` switch + the `EventChannelMode` diagnostic are
retained for future connection-level probing. This eliminates the leading hypothesis and tightens the
conclusion: the server scopes 0 events to our connection at a layer **above** the gRPC transport.
2. ~~**TLS client identity / certificate.**~~ **DISPROVEN 2026-06-23 (decompile + capture).** The stock
client's `GrpcClientBase.InitializeBase` creates a bare `HttpClientHandler` and sets only
`ServerCertificateCustomValidationCallback` — it **never adds a client certificate**. The TLS-tee
capture (below) confirms `clientCert=none` on every native connection. So the native presents no client
cert; this is not the gate.
3. ~~**HTTP/2-level / connection-frame capture.**~~ **DONE 2026-06-23 — topology difference found, tested,
NULL.** Built a TLS-terminating tee proxy (`artifacts/.../httpcap/`, gitignored: self-signed server
cert, forwards through the loopback tunnel, logs decrypted HTTP/1.1 + gRPC-Web both ways) and ran a
**native `capture-event` (returns 50 rows) and our SDK diagnostic (0 rows) through the same
proxy/upstream**. Note: the stock client is gRPC-Web/HTTP-1.1 (not HTTP/2 — `alpn` empty), so the
capture is HTTP/1.1 framing. Findings:
- **Connection topology differs.** The native opens **5 TLS connections, one per service**:
`HistoryService` (ExchangeKey/OpenConnection/Register/EnsureTags), `StatusService` (×2), and
**`RetrievalService` (the event query: GetRetrievalInterfaceVersion → StartEventQuery → GetNext →
EndEventQuery) on its own dedicated connection**. Our SDK collapses **every service onto one
connection**. (Matches the decompile: stock has a separate `GrpcClientBase` per service.)
- **Framing differs** (benign): native uses `content-length` + `Expect: 100-continue`; SDK uses
`transfer-encoding: chunked`. The server accepts both (our `StartEventQuery` returns a valid handle),
so framing is not the gate. No extra/hidden header on either side; `clientCert=none` throughout.
- **TESTED the topology hypothesis (`HISTORIAN_GRPC_EVENT_SPLIT_CHANNEL=1`):** ran
`StartEventQuery`/`GetNext`/`EndEventQuery` on a **dedicated RetrievalService connection** (no
re-handshake, reusing the session handle — exactly mirroring native conn4), registration staying on
the main connection. **Result: still `0B00000000001E000000` (0 rows), `QH=1063`.** Splitting the
event query onto its own connection — the one concrete structural difference the capture revealed —
**does not make rows flow.** So the server correlates by session handle, not by connection, and the
topology is **not** the row-scoping gate. The `CreateHttp2`/`SPLIT_CHANNEL` switches + the
`httpcap` proxy are retained as diagnostics.
4. ~~**Server-side ground truth.**~~ **ANSWERED 2026-06-23 (DISPROVES the data-scoping premise).** Via
the SOCKS→SQL relay (read-only; `artifacts/.../sqlschema/`, gitignored), dumped the full event schema
on the live `Runtime` DB. Findings:
- **No per-connection / per-client / per-session column exists anywhere in the event store.** The only
"scoping-like" columns on `Events`/`EventHistory`/snapshots are event *content*: `Source_*` (event
origin area/object/PV), `User_*` (who acknowledged), `Provider_NodeName` (alarm provider node),
`SourceServer`/`SourceTag` (cross-server replication). None is "which client connection requested
this."
- **The rich `Events` view is not a relational table — it is served live by the Historian engine via
the `INSQL` OLE DB provider** (`sys.servers` shows linked servers `INSQL` + `INSQLD`;
`OBJECT_DEFINITION('dbo.Events')` is `NULL` = encrypted remote view). The Historian's own
`EventHistory` base table holds just 168 rows / 1 tag (the internal event-tag detector log); the
alarm/event journal the gRPC query reads lives in the engine, surfaced through INSQL.
- **Decisive: same engine, same `-90d..now` window, two paths diverge.** The `Events` view (via INSQL)
returns **71,332 events** for that window — most recent `Alarm.Set` firing seconds before the probe
(live, every few seconds) — while gRPC `StartEventQuery` for **our** connection returns **0**. The
data is global, abundant, recent, and identical-window-addressable; the engine simply does not hand
it to our gRPC connection.
→ There is **nothing in the data to scope by**, so the zero-row gate is **not** data scoping. It is the
gRPC RetrievalService's **per-connection in-process execution state** — the same class of wall as
`DeleteTagExtendedProperties` (server-side native in-process working-set, not reconstructable from
byte-identical wire requests). Reproduce: `artifacts/.../sqlschema/` (Program.cs = SOCKS5 relay +
`Microsoft.Data.SqlClient`; authenticate with the server's SQL login, not the domain Historian acct —
creds in the gitignored creds file).
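The `0B00000000001E000000` terminal quoted in the tables above can be decoded by hand or with a few lines of Python. The sketch assumes the 10-byte header layout given in the row-parser commit notes (version u16, rowCount u32, then a one-time 4-byte field equal to `0x1E`) and accepts container versions 9 and 11; `parse_event_result_header` is a hypothetical name, and row decoding is deliberately left out.

```python
import struct

ACCEPTED_VERSIONS = {9, 11}        # 9 = WCF container, 11 = gRPC container
HEADER = struct.Struct("<HII")     # version(2) + rowCount(4) + headerField(4) = 10 bytes

def parse_event_result_header(buf: bytes):
    """Return (version, row_count, header_field, offset_of_first_row)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 10-byte header")
    version, row_count, header_field = HEADER.unpack_from(buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported container version {version}")
    return version, row_count, header_field, HEADER.size

print(parse_event_result_header(bytes.fromhex("0B00000000001E000000")))
# (11, 0, 30, 10)
```

So the empty result is a well-formed version-11 buffer with `rowCount = 0` and nothing after the header, which is why it reads as a valid empty answer rather than a transport error.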
### Stock managed client decompiled (2026-06-23) — confirms no hidden client-side difference
Closing the gap that prior cycles left: the zero-rows conclusion had leaned on **wire capture**
(`instrument-grpc-nonstream`, which only hooks `byte[]` params on `Grpc*Client` methods) — blind to gRPC
metadata/headers, interceptors, channel options, and any non-`byte[]` call. Read the **stock managed
client source directly** (`histsdk-2023r2-analysis/decompiled/Archestra.Historian.GrpcClient` +
`HistorianAccess`; the pure-managed assemblies decompile cleanly even though the mixed-mode
`aahClientManaged.dll` crashes ILSpy). Findings:
- **`GrpcClientBase.InitializeBase` builds the same channel we do.** `GrpcWebHandler((GrpcWebMode)0,
HttpClientHandler)` with `HttpVersion = 1.1` — i.e. **the stock client speaks gRPC-Web over HTTP/1.1,
the same transport as our SDK.** This *corrects the premise of hypothesis #1*: there was never a native
`Grpc.Core` HTTP/2 path to differ from — the stock client that returns 50 rows is itself gRPC-Web. The
HTTP/2 disproof's *conclusion* stands (and is reinforced: identical transport on both sides).
- **`m_metadata` passed to every RPC (incl. `StartEventQuery`/`GetNextEventQueryResultBuffer`) is only
`grpc-internal-encoding-request: gzip`** — exactly our header set. No connection-id, session token, or
auth header rides in gRPC metadata. The **`ClientInterceptor` is a no-op** (`LogCall` is empty; both
unary overloads just invoke the continuation). So the "invisible per-connection metadata/header" blind
spot is **confirmed empty** — there is no hidden client-side identity the `byte[]` capture missed.
- **The event-read query orchestration is genuinely not in managed code.** `CreateEventQuery` /
`EventQuery.StartQuery` / `MoveNext` are not in the managed `HistorianAccess`; the managed
`GrpcRetrievalClient.StartEventQuery` is a thin one-RPC stub. The query logic lives in the native
C++/CLI `HistorianClient` core (the mixed-mode part ILSpy can't decompile) — consistent with the
working-set being native/server-side, not a managed step we could read and replicate.
So **every client-controllable layer is now confirmed identical by reading the stock source**, not just
by wire match: request bytes, transport, channel options, gRPC metadata, interceptor. The remaining
difference is below the managed surface (native core) / server-side.
**Conclusion (after #1–#4 + stock client decompiled + TLS-tee capture).** Every angle is now exhausted:
- **client payload** — byte-identical (IL capture + decompile);
- **transport** — stock client is *also* gRPC-Web over HTTP/1.1; native HTTP/2 makes no difference, both 0 rows;
- **client metadata/interceptor/channel** — decompiled: identical gzip-only header, no-op interceptor, no
client cert; the TLS-tee capture confirms no hidden header and `clientCert=none`;
- **connection topology** — the native client splits services across 5 connections and queries on a dedicated
RetrievalService connection; replicating that (`SPLIT_CHANNEL`) still returns 0 rows → the server
correlates by session handle, not connection;
- **data store** — global, unscoped; 71,332 events the engine serves via INSQL but withholds from our
gRPC connection.
The gate is a **server-internal per-connection retrieval working-set** that a pure-managed client cannot
reconstruct by matching wire bytes, transport, metadata, topology, or data — and the establishing logic is
in the native `HistorianClient` C++ core, not in any decompilable managed step or observable on the wire.
**gRPC event-row retrieval stands documented as auth-solved / retrieval-server-gated**; `ReadEventsAsync`
over gRPC keeps the honest no-row throw, and event reads use the WCF transport. Diagnostics retained for
any future server-side investigation: the `httpcap` TLS-tee proxy, the `CreateHttp2` / `SPLIT_CHANNEL`
switches, the `EventReadDiagnostic` test, and the `capture-event` harness (native, returns rows).
### Verify the parse path against the provided client's real data (2026-06-23) — found + fixed a latent bug
Used the provided 2023 R2 client as an **oracle**: the `capture-event` harness returns 50 real events
(verified live + through the `httpcap` proxy), and the `instrument-grpc-nonstream` rewrite captured the
exact `GetNextEventQueryResultBuffer.result` buffer the stock client received — **63,192 bytes, version
`0x0B` (11), rowCount 50** (25 `Alarm.Set` + 25 `Alarm.Clear`). Fed that real buffer through our
`HistorianEventRowProtocol.Parse` to verify the read path decodes genuine gRPC event data, and it
**exposed a latent parser bug**:
- The real row buffer is `version(2) + rowCount(4) + headerField(4, =0x1E)` then **markerless rows**
(`rowFormat(2)=7 + filetime(8) + 8×u16 slots + compact-ascii type + propCount + props`). Our parser
wrongly treated the one-time `0x1E` field as a **per-row marker** and re-consumed `[marker+format]`
every row — so it parsed only the **first** row of any multi-row buffer and stopped. This is **not
gRPC-specific**: the captured **WCF v9** buffer has the identical `0900 <rowCount> 1E000000 0700 …`
header, so the shipped WCF event read had the same latent multi-row truncation.
- **Fix:** read a 10-byte buffer header (skip the `0x1E` field once) and parse markerless rows; accept
container version **9 (WCF) and 11 (gRPC)**. Verified: the real 50-row buffer now decodes to exactly 50
events, ending cleanly at end-of-buffer (`Parse_RealStockClientCapture_DecodesAllEvents`, gated on
`HISTORIAN_EVENT_CAPTURE_NDJSON`); plus a synthetic v11 golden test. 328 offline tests pass.
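The layout described above can be sketched in a few lines. This is a minimal Python illustration, not the SDK's `HistorianEventRowProtocol.Parse`: the function names `parse_buffer_header` and `parse_row_prefix` are hypothetical, little-endian byte order is inferred from the `0900 … 1E000000 0700` header bytes, and reading `filetime(8)` as a Windows FILETIME (100 ns ticks since 1601-01-01 UTC) is an assumption. The variable-length tail of each row (compact-ascii type, propCount, props) is left undecoded because its encoding is not specified in this section.

```python
import struct
from datetime import datetime, timedelta, timezone

# One-time buffer header: version(2) + rowCount(4) + headerField(4) = 10 bytes.
HEADER = struct.Struct("<HII")
# Fixed prefix of each markerless row: rowFormat(2) + filetime(8) + 8 x u16 slots = 26 bytes.
ROW_PREFIX = struct.Struct("<HQ8H")
# Container versions named in the fix: 9 (WCF) and 11 (gRPC).
ACCEPTED_VERSIONS = {9, 11}
# Assumption: filetime(8) is a Windows FILETIME.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_buffer_header(buf):
    """Read the header once; return (version, row_count, header_field, first_row_offset)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 10-byte header")
    version, row_count, header_field = HEADER.unpack_from(buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported container version {version}")
    # header_field (0x1E in the captures) is consumed here exactly once,
    # not re-read as a marker in front of every row.
    return version, row_count, header_field, HEADER.size


def parse_row_prefix(buf, offset):
    """Decode the fixed part of one row; return (timestamp, slots, offset_after_prefix)."""
    row_format, filetime, *slots = ROW_PREFIX.unpack_from(buf, offset)
    if row_format != 7:
        raise ValueError(f"unexpected rowFormat {row_format} at offset {offset}")
    # FILETIME counts 100 ns ticks; integer-divide by 10 to get microseconds.
    timestamp = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return timestamp, slots, offset + ROW_PREFIX.size
```

A caller would read the header once, then loop `row_count` times starting at the returned offset, decoding each row's variable tail after `parse_row_prefix` before moving to the next row. The pre-fix behaviour corresponds to consuming four extra bytes before every row, which misaligns every row after the first.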
So the **parse path is now verified against the provided client's real event data** — the one remaining
gap is strictly the server delivering rows to our gRPC connection (the working-set gate above). If that
were ever opened, the decoded events would now flow through correctly on both transports.
**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The
token-loop routing guardrail (`HistorianGrpcHandshakeRoutingTests`) was scoped to the closure so the
legitimate ExchangeKey call is allowed while still pinning that the Negotiate token loop never routes
there.