Used the provided stock client as an oracle to verify the event read path. The capture-event harness returns 50 real events, and the instrument-grpc-nonstream rewrite captured the exact GetNextEventQueryResultBuffer.result buffer (63,192 bytes, version 0x0B=11, rowCount 50 = 25 Alarm.Set + 25 Alarm.Clear). Feeding that real buffer through HistorianEventRowProtocol.Parse exposed a latent parser bug. The real buffer layout is: version(2) + rowCount(4) + headerField(4, =0x1E) followed by MARKERLESS rows (rowFormat(2)=7 + filetime(8) + 8x u16 slots + compact-ascii type + propCount + props). The parser wrongly treated the one-time 0x1E field as a per-row marker and re-consumed [marker+format] for every row, so it decoded only the FIRST row of any multi-row buffer and stopped. This is not gRPC-specific: the captured WCF v9 buffer has the identical 0900 <rowCount> 1E000000 0700 header, so the shipped WCF event read had the same latent multi-row truncation. Fix: read a 10-byte buffer header (skip the 0x1E field once) and parse markerless rows; accept container version 9 (WCF) and 11 (gRPC), mirroring the interface-version gate that accepts History 11 and 12. Verified: the real 50-row buffer now decodes to exactly 50 events, ending cleanly at end-of-buffer (Parse_RealStockClientCapture_DecodesAllEvents, gated on HISTORIAN_EVENT_CAPTURE_NDJSON so it skips without the gitignored capture), plus a synthetic v11 golden test. 328 offline tests pass. The parse path is now verified against the provided client's real event data on both transports; the only remaining gap for gRPC events is the server delivering rows to our connection (the documented retrieval-server-gate). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
# gRPC event-query capture (2026-06-22): the StartEventQuery request that returns rows

Captured the stock 2023 R2 client performing a **gRPC event read** that returns rows, to resolve
the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This
closes the capture-gate: the working request shape is now known.

## How it was captured

`tools/AVEVA.Historian.Grpc2023CaptureHarness` gained a `capture-event` scenario. It loads the
self-contained mixed-mode 2023 R2 `aahClientManaged.dll` and drives `HistorianAccess`:

```
OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
  -> CreateEventQuery()                        // NON-null only on an Event connection
  -> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
  -> EventQuery.StartQuery(args)               // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
  -> loop EventQuery.MoveNext() / QueryResult  // => GrpcRetrievalClient.GetNextEventQueryResultBuffer
  -> EventQuery.EndQuery() -> CloseConnection
```

The existing wide-net `instrument-grpc-nonstream` IL rewrite (every `Grpc*Client` `byte[]` method)
already covers `GrpcRetrievalClient.StartEventQuery.requestBuffer` (entry) and
`GetNextEventQueryResultBuffer.result` (exit), so no new instrument command was needed. The harness
was run read-only (non-destructive) against the live 2023 R2 server over the loopback tunnel; the
rewrite + capture NDJSON stay under `artifacts/reverse-engineering/grpc-event-capture/`
(gitignored, because the result buffer carries event identity data).

Result: **50 events returned over gRPC** (Alarm.Set / Alarm.Clear rows), proving the path works when
driven through an Event connection.

## Two findings

### 1. The event read needs an **Event-type connection** (`ConnectionIndex 1`)

`HistorianAccess.CreateEventQuery()` returns `null` unless `IsEventConnectionRequested()` is true, i.e.
the connection was opened with `ConnectionType=Event`, which the native client routes to a *separate*
connection (ConnectionIndex 1) from the process/data path. The full captured pre-query sequence on
that connection: `OpenConnection` → `ExchangeKey` → `UpdateClientStatus` → `RegisterTags`(CM_EVENT) →
`EnsureTags`(CM_EVENT) → `GetHistorianInfo` + 7×`GetSystemParameter` (Stat priming) →
`StartEventQuery` → `GetNextEventQueryResultBuffer` (rows) → `EndEventQuery` → `CloseConnection`.

### 2. The working `StartEventQuery` request is **version 6**, not 5

Our SDK's `HistorianEventQueryProtocol.CreateNativeFilterAttempt` builds a **version-5** empty-filter
buffer; the stock 2023 R2 client sends **version 6**. Diffed byte-for-byte (same query window +
eventCount), the two buffers are **identical except**:

- **byte 0: version `06` vs `05`**
- **5 additional trailing zero bytes** (stock = 70 bytes, SDK v5 = 65 bytes)

The server returns rows for v6 and **zero rows for v5**. The v5 request is *accepted*
(`StartEventQuery` succeeds and yields a query handle), but `GetNextEventQueryResultBuffer` then
matches nothing. Everything else is shared: the two query-window FILETIMEs, `UInt32 eventCount`,
the `UInt32 65536` buffer hint, the `"UTC"` `HistorianString`, and the `01 01000001000001 0000`
metadata-namespace block.

Captured v6 request layout (70 bytes; the FILETIMEs below are just the harness query window, so
there is no identity data):

```
[0..1]    UInt16   version = 6                  // SDK currently sends 5
[2..9]    Int64    startUtc (FILETIME)
[10..17]  Int64    endUtc   (FILETIME)
[18..21]  UInt32   eventCount
[22..25]  UInt32   0
[26..27]  UInt16   0
[28..29]  UInt16   1
[30..36]  7 bytes  0                            // empty-filter block
[37..40]  UInt32   65536                        // buffer-size hint
[41..50]  HistorianString "UTC" (UInt32 len=3 + UTF-16LE)
[51..60]  01 01 00 00 01 00 00 01 00 00         // metadata-namespace block (marker + 3 empty)
[61..69]  9 bytes  0                            // terminal (SDK v5 writes only 4 here)
```

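
The captured layout above can be reproduced offline as a sanity check. The following is a minimal Python sketch, not the SDK's C# serializer: `to_filetime` and `build_start_event_query` are hypothetical helper names, and little-endian field order is assumed from the captured hex.

```python
import struct
from datetime import datetime, timezone

# FILETIME counts 100 ns ticks since 1601-01-01 UTC.
_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a Windows FILETIME tick count."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt.astimezone(timezone.utc) - _FILETIME_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def build_start_event_query(start: datetime, end: datetime,
                            event_count: int, version: int = 6) -> bytes:
    """Build the empty-filter request following the captured byte layout."""
    tz = "UTC"
    buf = struct.pack("<HqqIIHH", version, to_filetime(start), to_filetime(end),
                      event_count, 0, 0, 1)                     # [0..29]
    buf += bytes(7)                                             # [30..36] empty-filter block
    buf += struct.pack("<I", 65536)                             # [37..40] buffer-size hint
    buf += struct.pack("<I", len(tz)) + tz.encode("utf-16-le")  # [41..50] HistorianString
    buf += bytes.fromhex("01010000010000010000")                # [51..60] metadata-namespace block
    buf += bytes(9 if version >= 6 else 4)                      # terminal: 9 bytes (v6) vs 4 (v5)
    return buf
```

Building the same window with `version=5` and `version=6` should give 65 and 70 bytes that differ only in byte 0 and the five extra trailing zeros, which is the diff described above.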
## Fix part 1: v6 request (DONE, necessary)

`HistorianEventQueryProtocol.CreateStartEventQueryAttempts` gained a `version` parameter (default 5 =
WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading `06` and the 5-byte trailing pad. The
WCF path is unchanged (v5). Golden test `Version6EmptyFilterMatchesCapturedGrpcEnvelope` pins the
envelope; 322/322 offline tests pass.

## Fix part 2: EVENT connection (the remaining gate, NOT yet implemented)

Live validation 2026-06-22: with the orchestrator now sending v6 against the event-bearing live
server, `GetNextEventQueryResultBuffer` **still long-polls and returns zero rows** (the gated test
still throws). So **v6 is necessary but not sufficient**: the read also requires an **Event-type
connection**, which our SDK does not open.

Isolated by diffing the captured `OpenConnection.openParameters` (302 bytes, native format v8) for a
**Process** connection (`connect` scenario) vs the **Event** connection (`capture-event`): aside from
the per-session auth GUID/credential-hash regions ([22..37], [68..93], which vary between any two
sessions), the connection differs in **two clean structural bytes**:

| offset | Process | Event |
|--------|---------|-------|
| 95     | `02`    | `01`  |
| 96     | `00`    | `01`  |

These correspond to `HistorianConnectionType` (Process vs Event; the native event path runs on
`ConnectionIndex 1`). The problem: our SDK opens the session with the **2020 OpenConnection3 v6**
buffer (`HistorianNativeHandshake.BuildOpenConnection3Request`, `connectionMode 0x402`), which the
2023 R2 server accepts for reads but which carries no event-connection-type marker. `connectionMode`
is NOT the discriminator (2020 WCF event reads work with `0x402`); the native client distinguishes
event vs process via this separate `ConnectionType` field in its v8 `openParameters`.

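
The Process-vs-Event comparison can be sketched as a byte diff that masks the session-varying regions. This is an illustrative Python sketch only: `structural_diff` is a hypothetical helper, and the default ignore ranges are the [22..37] and [68..93] regions named above.

```python
def structural_diff(a: bytes, b: bytes, ignore=((22, 37), (68, 93))):
    """Return (offset, byte_a, byte_b) for every differing byte outside the ignored ranges.

    Each ignore entry is an inclusive (first, last) offset pair.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    skipped = set()
    for first, last in ignore:
        skipped.update(range(first, last + 1))
    return [(i, a[i], b[i])
            for i in range(len(a))
            if i not in skipped and a[i] != b[i]]
```

Run against the two captured 302-byte buffers, a diff of this shape would be expected to report only offsets 95 and 96, matching the table.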
### Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection

Decoded the native `openParameters` (302 bytes): **byte 0 = `08` (format version 8)**, then a
context GUID, username, a 26-byte session-derived region ([68..93]), and at **[94] `ClientType=04`**
immediately followed by **[95] `ConnectionType` (`01`=Event / `02`=Process)** + **[96] a flag
(`01`/`00`)**, then the rest (including the machine/client-node/datasource strings).

Our SDK builds the **v6** buffer (`HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6`,
byte 0 = `06`): it writes `ClientType` (1 byte) **immediately followed by `ConnectionMode` (uint)**;
there is **no `ConnectionType` byte at all**. The v8 format *inserts* `ConnectionType` (+flag) between
`ClientType` and the rest. So the v6 buffer the SDK sends (accepted by the 2023 R2 server for *reads*)
structurally cannot mark the connection as Event, and the server returns event rows only for an Event
connection.

Two further obstacles to simply emitting v8:

- the native client authenticated via **`ExchangeKey`** (cert path; 72-byte `btInput`/`btOutput` in
  the capture) whereas the SDK's gRPC handshake uses **`ValidateClientCredential`** (Negotiate). The
  v8 `openParameters` [68..93] region is session-derived and tied to that auth flow.
- `ConnectionMode` is NOT the lever (2020 WCF event reads work at `0x402`); `ConnectionType` is a
  distinct field that only exists from format v8.

Also confirmed a secondary format gap: the native gRPC `EnsureTags` CM_EVENT payload is **86 bytes**
vs the SDK's `SerializeCmEventCTagMetadata` **83 bytes** (a 3-byte 2023 R2 bump, parallel to the
event-query v5→v6). This is likely benign on its own (CM_EVENT pre-exists; 2020 EnsT2 returns
benign-false yet events flow) but should be matched if the event open is ever rebuilt.

**Conclusion: the event-connection gate is NOT a tweak.** Making event rows flow over gRPC requires
the SDK to emit the native **v8 `OpenConnection` format** with `ConnectionType=Event` (a 302-byte
buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and
likely to adopt the `ExchangeKey` cert auth path. That is a substantial RE+implementation effort
comparable to the original Open2 work, scoped as a follow-on, not a quick fix. Until then the gated
`ReadEventsAsync_OverGrpc_*` test correctly still pins the no-row throw, and **v6 (part 1) is retained
as the captured-correct request format** for when the open is rebuilt.

Capture artifacts (gitignored) live under `artifacts/reverse-engineering/grpc-event-capture/`:
`event-capture.ndjson` (Event), `process-connect-2.ndjson` (Process).

## v8 `openParameters` fully decoded (2026-06-23) + the ECDH ExchangeKey finding

Full byte map of the native Event-connection `openParameters` (302 bytes; identity values
redacted, since they are session-specific and sit in the gitignored capture):

```
[0]          byte    0x08              format version = 8
[1]          byte    0xf0              constant marker
[2..20]      19 × 0x00
[21]         byte    0x01              constant marker
[22..37]     16B     GUID              per-session client key
[38..41]     u32     username length (chars)
[42..N]      UTF-16  username (HistorianString)
[..+1]       u16     credential-token length (= 26 in the capture)
[..]         26B     token             ECDH-derived credential token   <-- see below
[94]         byte    0x04              ClientType (= our NativeClientType 4)
[95]         byte    ConnectionType    01 = Event / 02 = Process       <-- THE GATE
[96]         byte    flag              01 (Event) / 00 (Process)
[97..]       control bytes (0x03 ... small region, not fully named)
[~114..117]  u32     FormatVersion=3
[..]         HistorianString           machine/server node name
[..]         HistorianString           client node name "(<ver>)"
[..]         u32     session-variable (process-ish)
[..]         u32     / zeros
[..]         u32     datasource len
[..]         UTF-16  datasource id     e.g. "2023.1219.4004.5"
[270..285]   16 × 0xff                 ShardId (all-FF = unset; our v6 sends Empty)
[286..289]   u32     client/hcal version int
[290..297]   i64     FILETIME          ClientTimestamp
[298..301]   u32     0
```

The tail (`FormatVersion` → machine → clientNode → datasource → ShardId → version → timestamp)
is the **same `ClientCommonInfo` our v6 already emits**. The new/different parts are: version byte,
the `[1]`/`[21]` markers, the GUID position, the **26-byte credential token** (vs v6's fixed-size
block), the **`ConnectionType` byte**, and ShardId=FF.

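
To locate the `ConnectionType` byte in a captured buffer without hard-coding offset 95, one can walk the variable-length fields that precede it. The sketch below is a hedged Python illustration: `read_connection_type` is a hypothetical helper, and it assumes the byte map above is complete up to the `ClientType` byte (username length counted in UTF-16 characters, then a `u16` token length and the token).

```python
import struct

CONNECTION_TYPES = {0x01: "Event", 0x02: "Process"}


def read_connection_type(open_params: bytes) -> dict:
    """Walk a v8 openParameters buffer up to ClientType/ConnectionType/flag."""
    if not open_params or open_params[0] != 0x08:
        raise ValueError("not a format-version-8 openParameters buffer")
    pos = 38                                          # after version, markers, GUID
    (name_chars,) = struct.unpack_from("<I", open_params, pos)
    pos += 4 + 2 * name_chars                         # skip UTF-16LE username
    (token_len,) = struct.unpack_from("<H", open_params, pos)
    pos += 2 + token_len                              # skip credential token
    if pos + 3 > len(open_params):
        raise ValueError("buffer too short for ClientType/ConnectionType/flag")
    client_type, conn_type, flag = open_params[pos:pos + 3]
    return {
        "offset": pos,                                # 94 in the capture
        "client_type": client_type,
        "connection_type": CONNECTION_TYPES.get(conn_type, f"unknown(0x{conn_type:02x})"),
        "flag": flag,
    }
```

With a 12-character username and a 26-byte token, the walk lands on offset 94, which agrees with the fixed offsets in the byte map.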
**The auth is ECDH, not Negotiate.** The capture's `ExchangeKey` buffers begin `45 43 4b 31` =
ASCII **`"ECK1"`** + a 64-byte EC public-key point (a Diffie-Hellman key exchange), and the 26-byte
`openParameters` token is derived from it. `HistorianSecurityMode` offers only `Disabled` / `None` /
`TransportCertificate`; the harness used `TransportCertificate`, which is what drives the ECDH
`ExchangeKey`. There is **no TLS+Negotiate mode** on the native client (it couples TLS with the cert
ECDH path), so a Negotiate-auth v8 capture cannot be produced from the native client.

**Key de-risking insight:** our SDK's v6 `OpenConnection` sends a **fully zeroed** 1026-byte
credential block (`credentialBlock: new byte[1026]`) and reads still work, because authentication is
actually carried by the separate `StorageService.ValidateClientCredential` (Negotiate) handshake, not
by the bytes inside `openParameters`. By analogy the v8 `[68..93]` token may likewise be **ignorable**
once `ValidateClientCredential` has run. So the first build hypothesis (cheapest, read-only to test):

> Reuse the SDK's existing `ValidateClientCredential` handshake, then send a **v8 `OpenConnection`
> with `ConnectionType=Event` and a zeroed credential token**, and see whether the 2023 R2 server
> returns event rows.

If that works, the ECDH ExchangeKey RE is unnecessary. If it fails, the fallback is full reproduction
of the ECDH `ExchangeKey` handshake (curve/KDF/cipher), a much larger crypto-RE effort. Build path:
add `SerializeNativeOpenConnectionVersion8(connectionType)` to `HistorianOpen2Protocol`, wire the gRPC
event handshake to use it (events only; reads stay on v6), live-test (non-destructive). Full hex in
the gitignored capture.

### Path A built + live-tested 2026-06-23: DISPROVEN (v8 is coupled to ExchangeKey)

Built `HistorianOpen2Protocol.SerializeNativeOpenConnectionVersion8` (golden-tested by
`Version8EventSerializerReproducesCapturedNativeStructure`, which reproduces the captured 302-byte
structure exactly) + `HistorianNativeHandshake.BuildEventOpenConnectionVersion8Request` (zeroed
credential token) + an `eventConnection` switch on `HistorianGrpcHandshake.OpenSession`, and live-ran
the event read against the server. Result: the v8 `OpenConnection` was **parsed by the server** (got
past the byte format) but **rejected at the auth check** with native error

```
type=132 code=34 "aahHcapLib::HistoryService::EstablishConnection — Failed to get client key"
```

i.e. `EstablishConnection` could not find a server-side **client key** for our session. In the v6
path that key is established by `StorageService.ValidateClientCredential` (which is why v6 reads
work); the v8 path looks it up in the registry that **`HistoryService.ExchangeKey` (ECDH)** populates,
and there is **no `ValidateClientCredential` on `HistoryService`** in the gRPC contract. So the server
branches on the OpenConnection version: v6 accepts the Negotiate-established key, **v8 requires the
ExchangeKey-established key**. The zeroed-token hypothesis is therefore disproven, not because of the
token bytes, but because the whole v8 path is gated on `ExchangeKey` having run first.

**Status:** the v8 serializer/builder are correct and retained (golden-tested), and the
`OpenConnection` failure now decodes the native error (type/code/ASCII). The event orchestrator is
reverted to the v6 session (gated test still pins the no-row throw). The remaining route is **Path B:
implement `HistoryService.ExchangeKey`** (`"ECK1"` + a 64-byte EC public-key point, P-256 X‖Y by the
size) using .NET `ECDiffieHellman`, establish the client key, then reissue the v8 `OpenConnection`.
Open question for Path B: whether merely *completing* the ECDH key agreement registers the client key
(so the zeroed openParameters token still rides through), or whether the token must also be derived
from the shared secret (full KDF/cipher RE).

### Path B started 2026-06-23: ExchangeKey ECDH works; cleared 2 of 3 layers

Implemented `HistoryService.ExchangeKey` as a **pure-managed P-256 ECDH** key exchange
(`HistorianNativeHandshake.BuildExchangeKeyClientHello` / `DeriveExchangeKeySecret`, .NET
`ECDiffieHellman` over `nistP256`; wire format `"ECK1" + u32(32) + X(32) + Y(32)`) and wired it into
`HistorianGrpcHandshake.OpenSession(eventConnection: true)` ahead of the v8 `OpenConnection`,
on the same context-key handle. Live result against the server: the **`ExchangeKey` RPC succeeds**
(the server accepted our public key), and the v8 `OpenConnection` error **moved one layer deeper**:

```
Path A (no ExchangeKey):    132/34   "Failed to get client key"
Path B (ExchangeKey ECDH):  132/171  AuthenticationFailed "EstablishConnection — Authentication failed"
```

So the ECDH cleared the client-key check; the remaining blocker is **authentication**: the 26-byte
v8 credential token must be a *valid* value derived from the ECDH shared secret (not zeros).

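
The `"ECK1" + u32(32) + X(32) + Y(32)` wire format is small enough to illustrate directly. This Python sketch only packs and unpacks the 72-byte blob; it does not generate a key pair or perform ECDH (that needs an elliptic-curve library), and both function names are hypothetical.

```python
import struct

ECK1_MAGIC = b"ECK1"   # bytes 45 43 4b 31, as seen at the start of the ExchangeKey buffers
P256_COORD_LEN = 32


def pack_p256_public_blob(x: bytes, y: bytes) -> bytes:
    """Pack big-endian P-256 affine coordinates as magic + u32 length + X + Y."""
    if len(x) != P256_COORD_LEN or len(y) != P256_COORD_LEN:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return ECK1_MAGIC + struct.pack("<I", P256_COORD_LEN) + x + y


def unpack_p256_public_blob(blob: bytes) -> tuple:
    """Validate the header and return (X, Y) from a 72-byte blob."""
    if len(blob) != 8 + 2 * P256_COORD_LEN or blob[:4] != ECK1_MAGIC:
        raise ValueError("not an ECK1 P-256 public blob")
    (coord_len,) = struct.unpack_from("<I", blob, 4)
    if coord_len != P256_COORD_LEN:
        raise ValueError(f"unexpected coordinate length {coord_len}")
    return blob[8:40], blob[40:72]
```

The packed length is 72 bytes, which lines up with the 72-byte `btInput`/`btOutput` buffers noted in the capture.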
### Token crypto traced 2026-06-23 (Frida → Windows CNG): KDF found, token construction still open

Hooked Windows CNG (`bcrypt.dll`/`ncrypt.dll`) while the native harness ran a real ExchangeKey
(`scripts/frida/aahclientmanaged-cng-exchangekey.js` + `artifacts/.../cng-trace.py`). Findings:

- **The ECDH + KDF are standard CNG, driven by managed `System.Security.Cryptography.ECDiffieHellmanCng`**
  (backtrace top frame = `System.Core.ni.dll`; the caller is aahClientManaged's C++/CLI `<Module>`):
  `NCryptSecretAgreement` (P-256) → `NCryptDeriveKey(KDF=HASH, HASH_ALGORITHM=SHA256, 32 bytes)`. So the
  derived key = **SHA256(ECDH shared secret)**, exactly `ECDiffieHellmanCng{ KeyDerivationFunction=Hash,
  HashAlgorithm=SHA256 }.DeriveKeyMaterial(...)`. Our managed `DeriveExchangeKeySecret` should switch to
  this (SHA256 of the raw agreement) to match.
- **`"ECK1"` is NOT AVEVA-custom**: it is the standard Windows CNG `BCRYPT_ECCPUBLIC_BLOB` magic for
  P-256 (`NCryptExportKey`/`ImportKey` emit exactly `ECK1 + len(32) + X(32) + Y(32)`), confirming our
  `BuildExchangeKeyClientHello` wire format is correct.
- **The 26-byte token is a custom construction that is not yet reproduced.** Correlated one run's
  derived key (`SHA256(secret)`) with that run's token (from the IL openParameters capture): a
  528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the derived key × request slices ×
  creds) found **no match**, and the token matches **none** of the traced hash digests. The token
  starts with a constant `0x8e` marker in both captured runs (so it is structured, not raw cipher
  output). It is built in managed code between the `DeriveKeyMaterial` call and the openParameters
  assembly.

**dnlib IL extraction 2026-06-23: the token scheme is fully reverse-engineered.** ILSpy can't
decompile the mixed-mode assembly (crashes), but loading `dnlib` in PowerShell and scanning the IL
recovered the whole construction:

- **`<Module>::CHistoryConnectionGrpc.GetClientKey`** is the ECDH driver: `new ECDiffieHellmanCng()`
  → `KeyDerivationFunction = Hash`, `HashAlgorithm = SHA256`, `KeySize = 256` →
  `GrpcHistoryClient.ExchangeKey(strHandle, ourPubKey.ToByteArray(), out serverPub, out err)` →
  `CngKey.Import(serverPub, CngKeyBlobFormat.EccPublicBlob)` → **`DeriveKeyMaterial`** = the 32-byte
  client key = **`SHA256(ECDH shared secret)`**. (So our managed side should derive the key the same
  way: `ECDiffieHellman` raw agreement then SHA256, or equivalently `DeriveKeyFromHash(..., SHA256)`.)
- **The 26-byte token is built by `aahClientCommon.CClientBase.ConfigureOpenConnection`** (the lone
  caller of `GetClientKey`) using the **`HistorianCrypto.NRC4_V2.aahCryptV2`** scheme, a custom
  **MD5-keyed RC4 stream cipher with a version prefix**:
  - `aahCryptV2.body`/`HashData` = **MD5** (verified: the IL loads MD5 round constants `0xd76aa478`…
    and rotates 7/12/17/22).
  - `aahCryptV2.prepare_key` = standard **RC4 KSA** seeding the 256-byte S-box from a **16-byte (MD5)**
    key (`std.array<unsigned char,16>`).
  - `aahCryptV2.enc_buffer` = `MD5(...)` → key, then **`rc4encrypt`** the body; `enc` prepends a
    scheme **prefix** (`NRC4_V2.PrefixV2` / `InnerPrefixV2`), which is the constant `0x8e` token marker.
  - `from_GUID` keys the cipher from a GUID string.

So the token = `prefix + RC4(plaintext, key = MD5(keyMaterial))`, where the key material ties back to
the `SHA256(ECDH secret)` client key. **This is 100% reproducible in pure managed code** (RC4 + MD5
are ~40 lines; nothing AVEVA ships).

**Remaining to finish (next cycle):** read `ConfigureOpenConnection`'s exact wiring (which value is
MD5'd for the RC4 key, what plaintext is encrypted, the exact prefix bytes; a little more dnlib IL),
implement `aahCryptV2` (RC4+MD5+prefix) managed-side, set the v8 token = that, and live-test
(non-destructive). The offline correlation data (one run's derived key + token + openParameters) is
captured under `artifacts/.../` to validate the managed reproduction before going live.

### Token implemented + auth WORKS live (2026-06-23); row retrieval still 0, proven NOT a payload issue

`token = RC4(password-UTF16LE, key = MD5(SHA256(ECDH secret)))` was implemented in pure managed C#
(`HistorianNativeHandshake.BuildExchangeKeyCredentialToken` + `Rc4`; client key via
`DeriveKeyFromHash(SHA256)`), golden-tested (RC4 standard vector + token construction), and
**live-verified**: the v8 `OpenConnection` now **authenticates** against the 2023 R2 server (past the
`132/171 AuthenticationFailed` wall). Auth is solved.

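
The formula above can be illustrated with a few lines of standard-library Python. This is a sketch of the stated construction only, not a port of `BuildExchangeKeyCredentialToken`: `rc4` and `credential_token_body` are hypothetical names, the raw ECDH shared secret is taken as an input rather than computed, and the `0x8e` prefix and length framing discussed earlier are deliberately left out because their exact bytes are not given here.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key-scheduling (KSA) followed by the keystream XOR (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # KSA over the 256-byte S-box
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                          # PRGA, one keystream byte per input byte
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def credential_token_body(shared_secret: bytes, password: str) -> bytes:
    """RC4(password as UTF-16LE, key = MD5(SHA256(shared secret))), without prefix/framing."""
    client_key = hashlib.sha256(shared_secret).digest()   # 32-byte client key
    rc4_key = hashlib.md5(client_key).digest()            # 16-byte RC4 key
    return rc4(rc4_key, password.encode("utf-16-le"))
```

Because RC4 is a stream cipher, the body is exactly as long as the UTF-16LE password, and applying `rc4` again with the same key recovers the plaintext, which is convenient for checking a captured token offline.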
The event **query** still returns `version-11 rowCount-0` while the native client returns 50 for an
**identical** request. Exhaustively ruled out as the cause (all confirmed live, via the opt-in
`EventReadDiagnostic` test + the IL rewrite extended to log string/uint handle fields):

- `StartEventQuery` request: **byte-identical** to the native (v6 layout)
- v8 `OpenConnection` `openParameters`: **byte-identical** to the native (302 bytes) once ClientNodeName
  is matched: every control byte, ConnectionType, token framing, ShardId, etc.
- Handle usage: identical. `ExchangeKey`→contextKey, registration→storage-session GUID (`strHandle`),
  query→client uint (`uiHandle`); our parsed handles are valid (registration `RTag/EnsT=True`, valid
  `queryHandle`)
- `queryRequestType = 3`, registration sequence/order, gzip metadata header: all match
- window (events exist; native returns 50 *now*), eventCount: not it

So **every observable client-side byte matches the native**, yet the server scopes 0 events to our
connection. The event RPCs succeed over our transport and return a valid *empty* result (not a
transport error), so it is **not a payload or transport-incompatibility issue**. It is a
connection/server-level difference (e.g. session affinity tied to the native `Grpc.Core` HTTP/2
connection or a connection-identity the server uses to scope events) that is **invisible to, and
unfixable by, client payload matching.** Closing it needs server-side insight or a different angle
(e.g. compare the full HTTP/2 connection setup / TLS identity), not more wire-payload RE.

**Shipped this effort:** the complete ExchangeKey crypto (ECDH + SHA256 + MD5-keyed RC4 token), which
was the hard wall: pure managed, golden-tested, auth live-verified. Orchestrator stays on the no-row
throw; gated test unchanged.

### NEXT SESSION: the server-side / connection angle (row retrieval pickup)

Client payloads are exhausted (byte-identical to the native, proven above). The next investigation is
**connection-level**, not wire-payload. Pursue in roughly this order; each is concrete and testable.

**Already proven, do NOT redo:** auth works (ExchangeKey ECDH + RC4 token, live-verified); v8
`openParameters`, all handles (str/uint), `StartEventQuery` request, registration (`RTag/EnsT=True` +
order), `queryRequestType=3`, and the gzip header all byte-match the native. Events exist (native
returns 50 *now*). The event RPCs succeed over our transport and return a valid version-11
**rowCount-0** (not a transport error). So the server scopes 0 events to *our* connection specifically.

**Tooling already in place:** opt-in diagnostic test `EventReadDiagnostic_OverGrpc_PrintsJourney`
(env `HISTORIAN_GRPC_EVENT_DIAG=1`, prints registration outcomes, handles, result hex, v8 buffer);
the `capture-event` harness scenario (native, returns rows); `instrument-grpc-nonstream` now logs
string/uint handle fields too; the CNG Frida hook. Live recipe: set `HISTORIAN_GRPC_HOST`/`_PORT
32565`/`_TLS true`/`_DNSID` to the 2023 R2 server + domain creds (strip quotes); reach the box per the
live-server access reference.

1. ~~**Transport: native `Grpc.Core` HTTP/2 vs our `Grpc.Net.Client` + `GrpcWebHandler` (gRPC-Web).**~~
   **DISPROVEN 2026-06-23.** Built `HistorianGrpcChannelFactory.CreateHttp2` (plain HTTP/2 over a
   `SocketsHttpHandler`, no `GrpcWebHandler` wrap, ALPN `h2` to the TLS server) and wired it into the
   event orchestrator behind `HISTORIAN_GRPC_EVENT_HTTP2=1` (event path only; reads stay gRPC-Web). Live
   side-by-side against the event-bearing server, **everything else held constant**:

   | channel | auth | registration | queryHandle | result buffer |
   |---------|------|--------------|-------------|---------------|
   | `http2` (native HTTP/2) | ✓ | `RTag=True EnsT=True` | 1057 | `0B00000000001E000000` |
   | `grpc-web` (default) | ✓ | `RTag=True EnsT=True` | 1058 | `0B00000000001E000000` |

   The complete v8 chain (ExchangeKey ECDH auth, CM_EVENT `RegisterTags`/`EnsureTags`, `StartEventQuery`
   with a valid handle) runs end-to-end over **plain native HTTP/2**, and the server returns the
   **byte-identical** version-11 (`0x0B`) rowCount-0 terminal on both transports. So gRPC-Web vs native
   HTTP/2 is **not** the discriminator: the zero-row scoping is identical regardless of transport. The
   `CreateHttp2` factory + the `HISTORIAN_GRPC_EVENT_HTTP2` switch + the `EventChannelMode` diagnostic are
   retained for future connection-level probing. This eliminates the leading hypothesis and tightens the
   conclusion: the server scopes 0 events to our connection at a layer **above** the gRPC transport.

2. ~~**TLS client identity / certificate.**~~ **DISPROVEN 2026-06-23 (decompile + capture).** The stock
   client's `GrpcClientBase.InitializeBase` creates a bare `HttpClientHandler` and sets only
   `ServerCertificateCustomValidationCallback`; it **never adds a client certificate**. The TLS-tee
   capture (below) confirms `clientCert=none` on every native connection. So the native presents no client
   cert; this is not the gate.

3. ~~**HTTP/2-level / connection-frame capture.**~~ **DONE 2026-06-23: topology difference found, tested,
   NULL.** Built a TLS-terminating tee proxy (`artifacts/.../httpcap/`, gitignored: self-signed server
   cert, forwards through the loopback tunnel, logs decrypted HTTP/1.1 + gRPC-Web both ways) and ran a
   **native `capture-event` (returns 50 rows) and our SDK diagnostic (0 rows) through the same
   proxy/upstream**. Note: the stock client is gRPC-Web/HTTP-1.1 (not HTTP/2; `alpn` is empty), so the
   capture is HTTP/1.1 framing. Findings:
   - **Connection topology differs.** The native opens **5 TLS connections, one per service**:
     `HistoryService` (ExchangeKey/OpenConnection/Register/EnsureTags), `StatusService` (×2), and
     **`RetrievalService` (the event query: GetRetrievalInterfaceVersion → StartEventQuery → GetNext →
     EndEventQuery) on its own dedicated connection**. Our SDK collapses **every service onto one
     connection**. (Matches the decompile: stock has a separate `GrpcClientBase` per service.)
   - **Framing differs** (benign): native uses `content-length` + `Expect: 100-continue`; SDK uses
     `transfer-encoding: chunked`. The server accepts both (our `StartEventQuery` returns a valid handle),
     so framing is not the gate. No extra/hidden header on either side; `clientCert=none` throughout.
   - **TESTED the topology hypothesis (`HISTORIAN_GRPC_EVENT_SPLIT_CHANNEL=1`):** ran
     `StartEventQuery`/`GetNext`/`EndEventQuery` on a **dedicated RetrievalService connection** (no
     re-handshake, reusing the session handle, exactly mirroring native conn4), registration staying on
     the main connection. **Result: still `0B00000000001E000000` (0 rows), `QH=1063`.** Splitting the
     event query onto its own connection, the one concrete structural difference the capture revealed,
     **does not make rows flow.** So the server correlates by session handle, not by connection, and the
     topology is **not** the row-scoping gate. The `CreateHttp2`/`SPLIT_CHANNEL` switches + the
     `httpcap` proxy are retained as diagnostics.

4. ~~**Server-side ground truth.**~~ **ANSWERED 2026-06-23 (DISPROVES the data-scoping premise).** Via
   the SOCKS→SQL relay (read-only; `artifacts/.../sqlschema/`, gitignored), dumped the full event schema
   on the live `Runtime` DB. Findings:
   - **No per-connection / per-client / per-session column exists anywhere in the event store.** The only
     "scoping-like" columns on `Events`/`EventHistory`/snapshots are event *content*: `Source_*` (event
     origin area/object/PV), `User_*` (who acknowledged), `Provider_NodeName` (alarm provider node),
     `SourceServer`/`SourceTag` (cross-server replication). None is "which client connection requested
     this."
   - **The rich `Events` view is not a relational table; it is served live by the Historian engine via
     the `INSQL` OLE DB provider** (`sys.servers` shows linked servers `INSQL` + `INSQLD`;
     `OBJECT_DEFINITION('dbo.Events')` is `NULL` = encrypted remote view). The Historian's own
     `EventHistory` base table holds just 168 rows / 1 tag (the internal event-tag detector log); the
     alarm/event journal the gRPC query reads lives in the engine, surfaced through INSQL.
   - **Decisive: same engine, same `-90d..now` window, two paths diverge.** The `Events` view (via INSQL)
     returns **71,332 events** for that window, with the most recent `Alarm.Set` firing seconds before
     the probe (live, every few seconds), while gRPC `StartEventQuery` for **our** connection returns
     **0**. The data is global, abundant, recent, and identical-window-addressable; the engine simply
     does not hand it to our gRPC connection.

   → There is **nothing in the data to scope by**, so the zero-row gate is **not** data scoping. It is the
   gRPC RetrievalService's **per-connection in-process execution state**, the same class of wall as
   `DeleteTagExtendedProperties` (server-side native in-process working-set, not reconstructable from
   byte-identical wire requests). Reproduce: `artifacts/.../sqlschema/` (Program.cs = SOCKS5 relay +
   `Microsoft.Data.SqlClient`; authenticate with the server's SQL login, not the domain Historian acct;
   creds in the gitignored creds file).

### Stock managed client decompiled (2026-06-23) — confirms no hidden client-side difference

Closing the gap that prior cycles left: the zero-rows conclusion had leaned on **wire capture**
(`instrument-grpc-nonstream`, which only hooks `byte[]` params on `Grpc*Client` methods) — blind to gRPC
metadata/headers, interceptors, channel options, and any non-`byte[]` call. Read the **stock managed
client source directly** (`histsdk-2023r2-analysis/decompiled/Archestra.Historian.GrpcClient` +
`HistorianAccess`; the pure-managed assemblies decompile cleanly even though the mixed-mode
`aahClientManaged.dll` crashes ILSpy). Findings:

- **`GrpcClientBase.InitializeBase` builds the same channel we do.** `GrpcWebHandler((GrpcWebMode)0,
HttpClientHandler)` with `HttpVersion = 1.1` — i.e. **the stock client speaks gRPC-Web over HTTP/1.1,
the same transport as our SDK.** This *corrects the premise of hypothesis #1*: there was never a native
`Grpc.Core` HTTP/2 path to differ from — the stock client that returns 50 rows is itself gRPC-Web. The
HTTP/2 disproof's *conclusion* stands (and is reinforced: identical transport on both sides).
- **`m_metadata` passed to every RPC (incl. `StartEventQuery`/`GetNextEventQueryResultBuffer`) is only
`grpc-internal-encoding-request: gzip`** — exactly our header set. No connection-id, session token, or
auth header rides in gRPC metadata. The **`ClientInterceptor` is a no-op** (`LogCall` is empty; both
unary overloads just invoke the continuation). So the "invisible per-connection metadata/header" blind
spot is **confirmed empty** — there is no hidden client-side identity the `byte[]` capture missed.
- **The event-read query orchestration is genuinely not in managed code.** `CreateEventQuery` /
`EventQuery.StartQuery` / `MoveNext` are not in the managed `HistorianAccess`; the managed
`GrpcRetrievalClient.StartEventQuery` is a thin one-RPC stub. The query logic lives in the native
C++/CLI `HistorianClient` core (the mixed-mode part ILSpy can't decompile) — consistent with the
working-set being native/server-side, not a managed step we could read and replicate.

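For anyone re-checking the transport claims against an `httpcap` capture, a minimal sketch of splitting a gRPC-Web body into frames may help. It is written in Python purely for illustration (the SDK itself is .NET) and assumes only the standard gRPC framing: a 1-byte flag plus a 4-byte big-endian length per frame, bit `0x01` marking a compressed message, and gRPC-Web marking the trailers frame with bit `0x80`. That compressed frames are gzip is an assumption taken from the `grpc-internal-encoding-request: gzip` header noted above. `split_grpc_web_frames` is a hypothetical helper, not part of the SDK or the stock client.

```python
import gzip
import struct


def split_grpc_web_frames(body: bytes):
    """Split a gRPC-Web body into (is_trailers, payload) tuples.

    Each frame is: flags(1) + length(4, big-endian) + payload(length).
    Payloads flagged as compressed are assumed to be gzip and are inflated.
    """
    frames = []
    offset = 0
    while offset < len(body):
        if len(body) - offset < 5:
            raise ValueError("truncated frame header")
        flags, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        payload = body[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated frame payload")
        offset += length
        is_trailers = bool(flags & 0x80)   # gRPC-Web trailers frame
        if flags & 0x01:                   # compressed-message bit
            payload = gzip.decompress(payload)
        frames.append((is_trailers, payload))
    return frames
```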
So **every client-controllable layer is now confirmed identical by reading the stock source**, not just
by wire match: request bytes, transport, channel options, gRPC metadata, interceptor. The remaining
difference is below the managed surface (native core) / server-side.

**Conclusion (after #1–#4 + stock client decompiled + TLS-tee capture).** Every angle is now exhausted:
- **client payload** — byte-identical (IL capture + decompile);
- **transport** — stock client is *also* gRPC-Web/HTTP-1.1; native HTTP/2 makes no difference, both 0 rows;
- **client metadata/interceptor/channel** — decompiled: identical gzip-only header, no-op interceptor, no
client cert; the TLS-tee capture confirms no hidden header and `clientCert=none`;
- **connection topology** — the native client splits services across 5 connections and queries on a dedicated
RetrievalService connection; replicating that (`SPLIT_CHANNEL`) still returns 0 rows → the server
correlates by session handle, not connection;
- **data store** — global, unscoped; 71,332 events the engine serves via INSQL but withholds from our
gRPC connection.

The gate is a **server-internal per-connection retrieval working-set** that a pure-managed client cannot
reconstruct by matching wire bytes, transport, metadata, topology, or data — and the establishing logic is
in the native `HistorianClient` C++ core, not in any decompilable managed step or observable on the wire.
**gRPC event-row retrieval stands documented as auth-solved / retrieval-server-gated**; `ReadEventsAsync`
over gRPC keeps the honest no-row throw, and event reads use the WCF transport. Diagnostics retained for
any future server-side investigation: the `httpcap` TLS-tee proxy, the `CreateHttp2` / `SPLIT_CHANNEL`
switches, the `EventReadDiagnostic` test, and the `capture-event` harness (native, returns rows).

### Verify the parse path against the provided client's real data (2026-06-23) — found + fixed a latent bug

Used the provided 2023 R2 client as an **oracle**: the `capture-event` harness returns 50 real events
(verified live + through the `httpcap` proxy), and the `instrument-grpc-nonstream` rewrite captured the
exact `GetNextEventQueryResultBuffer.result` buffer the stock client received — **63,192 bytes, version
`0x0B` (11), rowCount 50** (25 `Alarm.Set` + 25 `Alarm.Clear`). Fed that real buffer through our
`HistorianEventRowProtocol.Parse` to verify the read path decodes genuine gRPC event data, and it
**exposed a latent parser bug**:

- The real row buffer is `version(2) + rowCount(4) + headerField(4, =0x1E)` then **markerless rows**
(`rowFormat(2)=7 + filetime(8) + 8×u16 slots + compact-ascii type + propCount + props`). Our parser
wrongly treated the one-time `0x1E` field as a **per-row marker** and re-consumed `[marker+format]`
every row — so it parsed only the **first** row of any multi-row buffer and stopped. This is **not
gRPC-specific**: the captured **WCF v9** buffer has the identical `0900 <rowCount> 1E000000 0700 …`
header, so the shipped WCF event read had the same latent multi-row truncation.
- **Fix:** read a 10-byte buffer header (skip the `0x1E` field once) and parse markerless rows; accept
container version **9 (WCF) and 11 (gRPC)**. Verified: the real 50-row buffer now decodes to exactly 50
events, ending cleanly at end-of-buffer (`Parse_RealStockClientCapture_DecodesAllEvents`, gated on
`HISTORIAN_EVENT_CAPTURE_NDJSON`); plus a synthetic v11 golden test. 328 offline tests pass.

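The corrected header handling can be sketched as follows. This is an illustrative Python sketch, not the shipped `HistorianEventRowProtocol.Parse` (which is .NET). It assumes little-endian fields, as the `0900 … 1E000000 0700` bytes suggest, and that `filetime(8)` is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC). It decodes only the 10-byte buffer header and the fixed 26-byte prefix of a row; the compact-ascii type and property encodings are not described here, so they are left out. `parse_header` and `parse_row_prefix` are hypothetical names.

```python
import struct
from datetime import datetime, timedelta, timezone

ACCEPTED_VERSIONS = {9, 11}           # 9 = WCF, 11 = gRPC
HEADER = struct.Struct("<HII")        # version(2) + rowCount(4) + headerField(4) = 10 bytes
ROW_PREFIX = struct.Struct("<HQ8H")   # rowFormat(2) + filetime(8) + 8 x u16 slots = 26 bytes
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def parse_header(buf: bytes):
    """Read the one-time buffer header; return (version, row_count, header_field, next_offset)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than the 10-byte header")
    version, row_count, header_field = HEADER.unpack_from(buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported container version {version}")
    # header_field (0x1E in the captures) is consumed once here, never per row.
    return version, row_count, header_field, HEADER.size


def parse_row_prefix(buf: bytes, offset: int):
    """Read the fixed-size start of one markerless row; return (format, timestamp, slots, next_offset)."""
    row_format, filetime, *slots = ROW_PREFIX.unpack_from(buf, offset)
    # Assumption: FILETIME ticks are 100 ns; integer-divide to whole microseconds.
    timestamp = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return row_format, timestamp, slots, offset + ROW_PREFIX.size
```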
So the **parse path is now verified against the provided client's real event data** — the one remaining
gap is strictly the server delivering rows to our gRPC connection (the working-set gate above). If that
were ever opened, the decoded events would now flow through correctly on both transports.

**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The
token-loop routing guardrail (`HistorianGrpcHandshakeRoutingTests`) was scoped to the closure so the
legitimate ExchangeKey call is allowed while still pinning that the Negotiate token loop never routes
there.