histsdk/docs/reverse-engineering/grpc-event-query-capture.md
Joseph Doherty c45f1a957b docs(grpc-events): token scheme fully RE'd via dnlib — aahCryptV2 (MD5-keyed RC4 + prefix)
Loaded dnlib in PowerShell (ILSpy crashes on the mixed-mode assembly) and scanned
the IL to recover the entire v8 token construction:

- <Module>::CHistoryConnectionGrpc.GetClientKey drives the ECDH: ECDiffieHellmanCng
  {KeyDerivationFunction=Hash, HashAlgorithm=SHA256, KeySize=256} -> ExchangeKey ->
  CngKey.Import(serverPub, EccPublicBlob) -> DeriveKeyMaterial = SHA256(shared secret),
  the 32-byte client key.
- aahClientCommon.CClientBase.ConfigureOpenConnection (the lone GetClientKey caller)
  builds the 26-byte token via HistorianCrypto.NRC4_V2.aahCryptV2 = a custom MD5-keyed
  RC4 stream cipher with a version prefix:
    * body/HashData = MD5 (verified by the round constants 0xd76aa478... + shifts 7/12/17/22)
    * prepare_key = RC4 KSA from a 16-byte MD5 key
    * enc_buffer = MD5 -> key, then rc4encrypt; enc prepends PrefixV2/InnerPrefixV2
      (the constant 0x8e token marker)
  So token = prefix + RC4(plaintext, key=MD5(keyMaterial)), keyMaterial tied to the
  SHA256(ECDH secret) client key. 100% reproducible in pure managed code (RC4+MD5).

Remaining (next cycle): read ConfigureOpenConnection's exact key/plaintext/prefix bytes,
implement aahCryptV2 managed-side, set the v8 token, live-test. Frida CNG + dnlib are
the RE path; nothing AVEVA is shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 11:21:55 -04:00

# gRPC event-query capture (2026-06-22) — the StartEventQuery request that returns rows
Captured the stock 2023 R2 client performing a **gRPC event read** that returns rows, to resolve
the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This
closes the capture-gate: the working request shape is now known.
## How it was captured
`tools/AVEVA.Historian.Grpc2023CaptureHarness` gained a `capture-event` scenario. It loads the
self-contained mixed-mode 2023 R2 `aahClientManaged.dll` and drives `HistorianAccess`:
```
OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
-> CreateEventQuery() // NON-null only on an Event connection
-> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
-> EventQuery.StartQuery(args) // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
-> loop EventQuery.MoveNext() / QueryResult // => GrpcRetrievalClient.GetNextEventQueryResultBuffer
-> EventQuery.EndQuery() -> CloseConnection
```
The existing wide-net `instrument-grpc-nonstream` IL rewrite (every `Grpc*Client` `byte[]` method)
already covers `GrpcRetrievalClient.StartEventQuery.requestBuffer` (entry) and
`GetNextEventQueryResultBuffer.result` (exit) — no new instrument command was needed. Run read-only
(non-destructive) against the live 2023 R2 server over the loopback tunnel; the rewrite + capture
NDJSON stay under `artifacts/reverse-engineering/grpc-event-capture/` (gitignored — the result
buffer carries event identity data).
Result: **50 events returned over gRPC** (Alarm.Set / Alarm.Clear rows), proving the path works when
driven through an Event connection.
## Two findings
### 1. The event read needs an **Event-type connection** (`ConnectionIndex 1`)
`HistorianAccess.CreateEventQuery()` returns `null` unless `IsEventConnectionRequested()` — i.e. the
connection was opened with `ConnectionType=Event`, which the native client routes to a *separate*
connection (ConnectionIndex 1) from the process/data path. The full captured pre-query sequence on
that connection: `OpenConnection` → `ExchangeKey` → `UpdateClientStatus` → `RegisterTags`(CM_EVENT) →
`EnsureTags`(CM_EVENT) → `GetHistorianInfo` + 7×`GetSystemParameter` (Stat priming) →
`StartEventQuery` → `GetNextEventQueryResultBuffer` (rows) → `EndEventQuery` → `CloseConnection`.
### 2. The working `StartEventQuery` request is **version 6**, not 5
Our SDK's `HistorianEventQueryProtocol.CreateNativeFilterAttempt` builds a **version-5** empty-filter
buffer; the stock 2023 R2 client sends **version 6**. Diffed byte-for-byte (same query window +
eventCount), the two buffers are **identical except**:
- **byte 0: version `06` vs `05`**
- **5 additional trailing zero bytes** (stock = 70 bytes, SDK v5 = 65 bytes)
The server returns rows for v6 and **zero rows for v5** (the v5 request is *accepted*:
`StartEventQuery` succeeds and yields a query handle, but `GetNextEventQueryResultBuffer` then
matches nothing). Everything else is shared: the two query-window FILETIMEs, `UInt32 eventCount`,
the `UInt32 65536` buffer hint, the `"UTC"` `HistorianString`, and the `01 01000001000001 0000`
metadata-namespace block.
Captured v6 request layout (70 bytes; the FILETIMEs below are just the harness query window — no
identity data):
```
[0..1] UInt16 version = 6 // SDK currently sends 5
[2..9] Int64 startUtc (FILETIME)
[10..17] Int64 endUtc (FILETIME)
[18..21] UInt32 eventCount
[22..25] UInt32 0
[26..27] UInt16 0
[28..29] UInt16 1
[30..36] 7 bytes 0 // empty-filter block
[37..40] UInt32 65536 // buffer-size hint
[41..50] HistorianString "UTC" (UInt32 len=3 + UTF-16LE)
[51..60] 01 01 00 00 01 00 00 01 00 00 // metadata-namespace block (marker + 3 empty)
[61..69] 9 bytes 0 // terminal (SDK v5 writes only 4 here)
```
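The captured layout can be reproduced with a small builder. The following is a Python sketch for offline comparison only, not the SDK's C# `HistorianEventQueryProtocol` code; the names `to_filetime` and `build_start_event_query` are hypothetical, little-endian encoding is assumed from the capture, and the field meanings are only those named in the layout above.

```python
import struct
from datetime import datetime, timezone

# Seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
_EPOCH_DELTA_S = 11_644_473_600


def to_filetime(dt: datetime) -> int:
    """Convert an aware datetime to FILETIME ticks (100 ns units since 1601-01-01 UTC)."""
    delta = dt.astimezone(timezone.utc) - datetime(1970, 1, 1, tzinfo=timezone.utc)
    unix_ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return unix_ticks + _EPOCH_DELTA_S * 10_000_000


def build_start_event_query(start_ft: int, end_ft: int, event_count: int, version: int = 6) -> bytes:
    """Build the empty-filter StartEventQuery request (hypothetical helper)."""
    tz = "UTC"
    buf = bytearray()
    buf += struct.pack("<H", version)                # [0..1]   version (6 for gRPC, 5 for WCF)
    buf += struct.pack("<qq", start_ft, end_ft)      # [2..17]  query window as FILETIMEs
    buf += struct.pack("<I", event_count)            # [18..21] eventCount
    buf += struct.pack("<IHH", 0, 0, 1)              # [22..29] UInt32 0, UInt16 0, UInt16 1
    buf += bytes(7)                                  # [30..36] empty-filter block
    buf += struct.pack("<I", 65536)                  # [37..40] buffer-size hint
    buf += struct.pack("<I", len(tz)) + tz.encode("utf-16-le")  # [41..50] HistorianString
    buf += bytes.fromhex("01010000010000010000")     # [51..60] metadata-namespace block
    buf += bytes(9 if version >= 6 else 4)           # terminal: 9 zero bytes in v6, 4 in v5
    return bytes(buf)
```

Building the same window with `version=5` and `version=6` gives 65 and 70 bytes that differ only in byte 0 and the five extra trailing zeros, which is the diff described in finding 2.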
## Fix part 1 — v6 request (DONE, necessary)
`HistorianEventQueryProtocol.CreateStartEventQueryAttempts` gained a `version` parameter (default 5 =
WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading `06` and the 5-byte trailing pad. The
WCF path is unchanged (v5). Golden test `Version6EmptyFilterMatchesCapturedGrpcEnvelope` pins the
envelope; 322/322 offline tests pass.
## Fix part 2 — EVENT connection (the remaining gate, NOT yet implemented)
Live validation 2026-06-22: with the orchestrator now sending v6 against the event-bearing live
server, `GetNextEventQueryResultBuffer` **still long-polls and returns zero rows** (the gated test
still throws). So **v6 is necessary but not sufficient** — the read also requires an **Event-type
connection**, which our SDK does not open.
Isolated by diffing the captured `OpenConnection.openParameters` (302 bytes, native format v8) for a
**Process** connection (`connect` scenario) vs the **Event** connection (`capture-event`): aside from
the per-session auth GUID/credential-hash regions ([22..37], [68..93], which vary between any two
sessions), the connection differs in **two clean structural bytes**:
| offset | Process | Event |
|--------|---------|-------|
| 95 | `02` | `01` |
| 96 | `00` | `01` |
These correspond to `HistorianConnectionType` (Process vs Event; the native event path runs on
`ConnectionIndex 1`). The problem: our SDK opens the session with the **2020 OpenConnection3 v6**
buffer (`HistorianNativeHandshake.BuildOpenConnection3Request`, `connectionMode 0x402`), which the
2023 R2 server accepts for reads but which carries no event-connection-type marker. `connectionMode`
is NOT the discriminator (2020 WCF event reads work with `0x402`); the native client distinguishes
event vs process via this separate `ConnectionType` field in its v8 `openParameters`.
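The Process-vs-Event comparison can be repeated offline with a masked byte diff. This is a minimal Python sketch, assuming both `openParameters` buffers have already been extracted from the NDJSON captures as raw bytes; `structural_diff` is a hypothetical helper, not part of the harness.

```python
from typing import Iterable, List, Tuple


def structural_diff(
    a: bytes,
    b: bytes,
    ignore: Iterable[Tuple[int, int]] = (),
) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every differing byte outside `ignore`.

    `ignore` holds inclusive (first, last) offset ranges, e.g. the per-session
    regions [22..37] and [68..93] noted above.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    skipped = set()
    for first, last in ignore:
        skipped.update(range(first, last + 1))
    return [(i, a[i], b[i]) for i in range(len(a)) if i not in skipped and a[i] != b[i]]
```

Called as `structural_diff(process_buf, event_buf, ignore=[(22, 37), (68, 93)])`, it should report only offsets 95 and 96 if the table above holds. Any other session-variable field that differs between two particular runs would need its own ignore range.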
### Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection
Decoded the native `openParameters` (302 bytes): **byte 0 = `08` (format version 8)**, then a
context GUID, username, a 26-byte session-derived region ([68..93]), machine/client-node/datasource
strings, and at **[94] `ClientType=04`** immediately followed by **[95] `ConnectionType`
(`01`=Event / `02`=Process)** + **[96] a flag (`01`/`00`)**, then the rest.
Our SDK builds the **v6** buffer (`HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6`,
byte 0 = `06`): it writes `ClientType` (1 byte) **immediately followed by `ConnectionMode` (uint)**;
there is **no `ConnectionType` byte at all**. The v8 format *inserts* `ConnectionType` (+flag) between
`ClientType` and the rest. So the v6 buffer the SDK sends (accepted by the 2023 R2 server for *reads*)
structurally cannot mark the connection as Event, and the server returns event rows only for an Event
connection.
Two further obstacles to simply emitting v8:
- the native client authenticated via **`ExchangeKey`** (cert path; 72-byte `btInput`/`btOutput` in
the capture) whereas the SDK's gRPC handshake uses **`ValidateClientCredential`** (Negotiate). The
v8 `openParameters` [68..93] region is session-derived and tied to that auth flow.
- `ConnectionMode` is NOT the lever (2020 WCF event reads work at `0x402`); `ConnectionType` is a
distinct field that only exists from format v8.
Also confirmed a secondary format gap: the native gRPC `EnsureTags` CM_EVENT payload is **86 bytes**
vs the SDK's `SerializeCmEventCTagMetadata` **83 bytes** (a 3-byte 2023 R2 bump, parallel to the
event-query v5→v6). This is likely benign on its own (CM_EVENT pre-exists; 2020 EnsT2 returns
benign-false yet events flow) but should be matched if the event open is ever rebuilt.
**Conclusion — the event-connection gate is NOT a tweak.** Making event rows flow over gRPC requires
the SDK to emit the native **v8 `OpenConnection` format** with `ConnectionType=Event` (a 302-byte
buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and
likely to adopt the `ExchangeKey` cert auth path. That is a substantial RE+implementation effort
comparable to the original Open2 work — scoped as a follow-on, not a quick fix. Until then the gated
`ReadEventsAsync_OverGrpc_*` test correctly still pins the no-row throw, and **v6 (part 1) is retained
as the captured-correct request format** for when the open is rebuilt.
Capture artifacts (gitignored), under `artifacts/reverse-engineering/grpc-event-capture/`:
`event-capture.ndjson` (Event), `process-connect-2.ndjson` (Process).
## v8 `openParameters` fully decoded (2026-06-23) + the ECDH ExchangeKey finding
Full byte map of the native Event-connection `openParameters` (302 bytes; identity values
redacted — they are session-specific and sit in the gitignored capture):
```
[0] byte 0x08 format version = 8
[1] byte 0xf0 constant marker
[2..20] 19 × 0x00
[21] byte 0x01 constant marker
[22..37] 16B GUID per-session client key
[38..41] u32 username length (chars)
[42..N] UTF-16 username (HistorianString)
[..+1] u16 credential-token length (= 26 in the capture)
[..] 26B token ECDH-derived credential token <-- see below
[94] byte 0x04 ClientType (= our NativeClientType 4)
[95] byte ConnectionType 01 = Event / 02 = Process <-- THE GATE
[96] byte flag 01 (Event) / 00 (Process)
[97..] control bytes (0x03 ... small region, not fully named)
[~114..117] u32 FormatVersion=3
[..] HistorianString machine/server node name
[..] HistorianString client node name "(<ver>)"
[..] u32 session-variable (process-ish)
[..] u32 / zeros
[..] u32 datasource len
[..] UTF-16 datasource id e.g. "2023.1219.4004.5"
[270..285] 16 × 0xff ShardId (all-FF = unset; our v6 sends Empty)
[286..289] u32 client/hcal version int
[290..297] i64 FILETIME ClientTimestamp
[298..301] u32 0
```
The tail (`FormatVersion` → machine → clientNode → datasource → ShardId → version → timestamp)
is the **same `ClientCommonInfo` our v6 already emits**. The new/different parts are: version byte,
the `[1]`/`[21]` markers, the GUID position, the **26-byte credential token** (vs v6's fixed-size
block), the **`ConnectionType` byte**, and ShardId=FF.
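The leading part of the byte map, up to the `ConnectionType` gate, can be walked with a short parser. This is a Python sketch under stated assumptions: little-endian integers, the username length counted in UTF-16 code units, and the token immediately preceding `ClientType`, all as read from the map above. `OpenParamsHead` and `parse_open_parameters_head` are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass
class OpenParamsHead:
    version: int
    session_guid: bytes
    username: str
    token: bytes
    client_type: int
    connection_type: int          # 0x01 = Event, 0x02 = Process
    connection_type_offset: int   # 95 in the capture
    flag: int


def parse_open_parameters_head(buf: bytes) -> OpenParamsHead:
    """Parse the v8 openParameters head up to the ConnectionType flag (sketch)."""
    version = buf[0]
    if version != 8:
        raise ValueError(f"expected format version 8, got {version}")
    session_guid = buf[22:38]                        # [22..37] per-session GUID
    (name_chars,) = struct.unpack_from("<I", buf, 38)
    pos = 42
    username = buf[pos:pos + 2 * name_chars].decode("utf-16-le")
    pos += 2 * name_chars
    (token_len,) = struct.unpack_from("<H", buf, pos)  # 26 in the capture
    pos += 2
    token = buf[pos:pos + token_len]
    pos += token_len
    client_type = buf[pos]
    return OpenParamsHead(
        version=version,
        session_guid=session_guid,
        username=username,
        token=token,
        client_type=client_type,
        connection_type=buf[pos + 1],
        connection_type_offset=pos + 1,
        flag=buf[pos + 2],
    )
```

Because the username is variable-length, the token and `ConnectionType` offsets are computed rather than fixed. With a 12-character username they land at [68..93] and [95], matching the capture.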
**The auth is ECDH, not Negotiate.** The capture's `ExchangeKey` buffers begin `45 43 4b 31` =
ASCII **`"ECK1"`** + a 64-byte EC public-key point — a Diffie-Hellman key exchange — and the 26-byte
`openParameters` token is derived from it. `HistorianSecurityMode` offers only `Disabled` / `None` /
`TransportCertificate`; the harness used `TransportCertificate`, which is what drives the ECDH
`ExchangeKey`. There is **no TLS+Negotiate mode** on the native client (it couples TLS with the cert
ECDH path), so a Negotiate-auth v8 capture cannot be produced from the native client.
**Key de-risking insight:** our SDK's v6 `OpenConnection` sends a **fully zeroed** 1026-byte
credential block (`credentialBlock: new byte[1026]`) and reads still work — because authentication is
actually carried by the separate `StorageService.ValidateClientCredential` (Negotiate) handshake, not
by the bytes inside `openParameters`. By analogy the v8 `[68..93]` token may likewise be **ignorable**
once `ValidateClientCredential` has run. So the first build hypothesis (cheapest, read-only to test):
> Reuse the SDK's existing `ValidateClientCredential` handshake, then send a **v8 `OpenConnection`
> with `ConnectionType=Event` and a zeroed credential token**, and see whether the 2023 R2 server
> returns event rows.
If that works, the ECDH ExchangeKey RE is unnecessary. If it fails, the fallback is full reproduction
of the ECDH `ExchangeKey` handshake (curve/KDF/cipher) — a much larger crypto-RE effort. Build path:
add `SerializeNativeOpenConnectionVersion8(connectionType)` to `HistorianOpen2Protocol`, wire the gRPC
event handshake to use it (events only; reads stay on v6), live-test (non-destructive). Full hex in
the gitignored capture.
### Path A built + live-tested 2026-06-23 — DISPROVEN (v8 is coupled to ExchangeKey)
Built `HistorianOpen2Protocol.SerializeNativeOpenConnectionVersion8` (golden-tested,
`Version8EventSerializerReproducesCapturedNativeStructure` — reproduces the captured 302-byte
structure exactly) + `HistorianNativeHandshake.BuildEventOpenConnectionVersion8Request` (zeroed
credential token) + an `eventConnection` switch on `HistorianGrpcHandshake.OpenSession`, and live-ran
the event read against the server. Result: the v8 `OpenConnection` was **parsed by the server** (got
past the byte format) but **rejected at the auth check** with native error
```
type=132 code=34 "aahHcapLib::HistoryService::EstablishConnection — Failed to get client key"
```
i.e. `EstablishConnection` could not find a server-side **client key** for our session. In the v6
path that key is established by `StorageService.ValidateClientCredential` (which is why v6 reads
work); the v8 path looks it up in the registry that **`HistoryService.ExchangeKey` (ECDH)** populates,
and there is **no `ValidateClientCredential` on `HistoryService`** in the gRPC contract. So the server
branches on the OpenConnection version: v6 accepts the Negotiate-established key, **v8 requires the
ExchangeKey-established key**. The zeroed-token hypothesis is therefore disproven — not because of the
token bytes, but because the whole v8 path is gated on `ExchangeKey` having run first.
**Status:** the v8 serializer/builder are correct and retained (golden-tested), plus the
`OpenConnection` failure now decodes the native error (type/code/ASCII). The event orchestrator is
reverted to the v6 session (gated test still pins the no-row throw). The remaining route is **Path B:
implement `HistoryService.ExchangeKey`** — `"ECK1"` + a 64-byte EC public-key point (P-256 X‖Y, by the
size) — using .NET `ECDiffieHellman`, establish the client key, then reissue the v8 `OpenConnection`.
Open question for Path B: whether merely *completing* the ECDH key agreement registers the client key
(so the zeroed openParameters token still rides through), or whether the token must also be derived
from the shared secret (full KDF/cipher RE).
### Path B started 2026-06-23 — ExchangeKey ECDH works; cleared 2 of 3 layers
Implemented `HistoryService.ExchangeKey` as a **pure-managed P-256 ECDH** key exchange
(`HistorianNativeHandshake.BuildExchangeKeyClientHello` / `DeriveExchangeKeySecret`, .NET
`ECDiffieHellman` over `nistP256`; wire format `"ECK1" + u32(32) + X(32) + Y(32)`) and wired it into
`HistorianGrpcHandshake.OpenSession(eventConnection: true)` ahead of the v8 `OpenConnection`,
on the same context-key handle. Live result against the server: the **`ExchangeKey` RPC succeeds**
(the server accepted our public key), and the v8 `OpenConnection` error **moved one layer deeper**:
```
Path A (no ExchangeKey): 132/34 "Failed to get client key"
Path B (ExchangeKey ECDH): 132/171 AuthenticationFailed "EstablishConnection — Authentication failed"
```
So the ECDH cleared the client-key check; the remaining blocker is **authentication**: the 26-byte
v8 credential token must be a *valid* value derived from the ECDH shared secret (not zeros).
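The `"ECK1" + u32(32) + X(32) + Y(32)` wire format described above is small enough to encode and decode by hand. This is a Python sketch for inspecting captured `ExchangeKey` buffers, not the SDK's C# `BuildExchangeKeyClientHello`; `encode_eck1` and `decode_eck1` are hypothetical names, and a little-endian length field is assumed.

```python
import struct
from typing import Tuple

ECK1_MAGIC = b"ECK1"   # the 45 43 4b 31 bytes seen at the start of the capture
P256_COORD_LEN = 32    # bytes per P-256 coordinate


def encode_eck1(x: bytes, y: bytes) -> bytes:
    """Pack a P-256 public point as magic + u32 coordinate length + X + Y."""
    if len(x) != P256_COORD_LEN or len(y) != P256_COORD_LEN:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return ECK1_MAGIC + struct.pack("<I", P256_COORD_LEN) + x + y


def decode_eck1(blob: bytes) -> Tuple[bytes, bytes]:
    """Split an ECK1 blob back into its (X, Y) coordinates."""
    if blob[:4] != ECK1_MAGIC:
        raise ValueError("not an ECK1 blob")
    (coord_len,) = struct.unpack_from("<I", blob, 4)
    if coord_len != P256_COORD_LEN or len(blob) != 8 + 2 * coord_len:
        raise ValueError("unexpected coordinate length or blob size")
    return blob[8:8 + coord_len], blob[8 + coord_len:8 + 2 * coord_len]
```

The encoded blob is 4 + 4 + 32 + 32 = 72 bytes, which agrees with the 72-byte `btInput`/`btOutput` buffers in the capture.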
### Token crypto traced 2026-06-23 (Frida → Windows CNG) — KDF found, token construction still open
Hooked Windows CNG (`bcrypt.dll`/`ncrypt.dll`) while the native harness ran a real ExchangeKey
(`scripts/frida/aahclientmanaged-cng-exchangekey.js` + `artifacts/.../cng-trace.py`). Findings:
- **The ECDH + KDF are standard CNG, driven by managed `System.Security.Cryptography.ECDiffieHellmanCng`**
(backtrace top frame = `System.Core.ni.dll`; the caller is aahClientManaged's C++/CLI `<Module>`):
`NCryptSecretAgreement` (P-256) → `NCryptDeriveKey(KDF=HASH, HASH_ALGORITHM=SHA256, 32 bytes)`. So the
derived key = **SHA256(ECDH shared secret)** — exactly `ECDiffieHellmanCng{ KeyDerivationFunction=Hash,
HashAlgorithm=SHA256 }.DeriveKeyMaterial(...)`. Our managed `DeriveExchangeKeySecret` should switch to
this (SHA256 of the raw agreement) to match.
- **`"ECK1"` is NOT AVEVA-custom** — it is the standard Windows CNG `BCRYPT_ECCPUBLIC_BLOB` magic for
P-256 (`NCryptExportKey`/`ImportKey` emit exactly `ECK1 + len(32) + X(32) + Y(32)`), confirming our
`BuildExchangeKeyClientHello` wire format is correct.
- **The 26-byte token is a custom construction that is not yet reproduced.** Correlated one run's
derived key (`SHA256(secret)`) with that run's token (from the IL openParameters capture): a
528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the derived key × request slices ×
creds) found **no match**, and the token matches **none** of the traced hash digests. The token
starts with a constant `0x8e` marker in both captured runs (so it is structured, not raw cipher
output). It is built in managed code between the `DeriveKeyMaterial` call and the openParameters
assembly.
**dnlib IL extraction 2026-06-23 — the token scheme is fully reverse-engineered.** ILSpy can't
decompile the mixed-mode assembly (crashes), but loading `dnlib` in PowerShell and scanning the IL
recovered the whole construction:
- **`<Module>::CHistoryConnectionGrpc.GetClientKey`** is the ECDH driver: `new ECDiffieHellmanCng()` →
`KeyDerivationFunction = Hash`, `HashAlgorithm = SHA256`, `KeySize = 256` →
`GrpcHistoryClient.ExchangeKey(strHandle, ourPubKey.ToByteArray(), out serverPub, out err)` →
`CngKey.Import(serverPub, CngKeyBlobFormat.EccPublicBlob)` → **`DeriveKeyMaterial`** = the 32-byte
client key = **`SHA256(ECDH shared secret)`**. (So our managed side should derive the key the same
way: `ECDiffieHellman` raw agreement then SHA256, or equivalently `DeriveKeyFromHash(..., SHA256)`.)
- **The 26-byte token is built by `aahClientCommon.CClientBase.ConfigureOpenConnection`** (the lone
caller of `GetClientKey`) using the **`HistorianCrypto.NRC4_V2.aahCryptV2`** scheme — a custom
**MD5-keyed RC4 stream cipher with a version prefix**:
- `aahCryptV2.body`/`HashData` = **MD5** (verified: the IL loads MD5 round constants `0xd76aa478`
and rotates 7/12/17/22).
- `aahCryptV2.prepare_key` = standard **RC4 KSA** seeding the 256-byte S-box from a **16-byte (MD5)**
key (`std.array<unsigned char,16>`).
- `aahCryptV2.enc_buffer` = `MD5(...)` → key, then **`rc4encrypt`** the body; `enc` prepends a
scheme **prefix** (`NRC4_V2.PrefixV2` / `InnerPrefixV2`) — the constant `0x8e` token marker.
- `from_GUID` keys the cipher from a GUID string.
So the token = `prefix + RC4(plaintext, key = MD5(keyMaterial))`, where the key material ties back to
the `SHA256(ECDH secret)` client key. **This is 100% reproducible in pure managed code** (RC4 + MD5
are ~40 lines; no AVEVA code is shipped).
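The shape `prefix + RC4(plaintext, key = MD5(keyMaterial))` can be sketched as follows. This is a Python illustration of textbook RC4 plus MD5 keying, not a reproduction of `aahCryptV2`: which value is hashed for the key, what plaintext is encrypted, and the exact prefix bytes are all still unknown, so they are plain parameters here, and `aah_crypt_v2_sketch` is a hypothetical name.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key-scheduling (KSA) followed by keystream XOR (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # KSA: seed the 256-byte S-box from the key
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # PRGA: one keystream byte per input byte
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def aah_crypt_v2_sketch(key_material: bytes, plaintext: bytes, prefix: bytes) -> bytes:
    """prefix + RC4(plaintext, key=MD5(key_material)); all three inputs are assumptions."""
    key = hashlib.md5(key_material).digest()  # 16-byte RC4 key, as prepare_key expects
    return prefix + rc4(key, plaintext)
```

RC4 is symmetric, so once the real inputs are read out of `ConfigureOpenConnection`, the captured token can be checked offline by decrypting its body with the candidate key and comparing against the candidate plaintext.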
**Remaining to finish (next cycle):** read `ConfigureOpenConnection`'s exact wiring (which value is
MD5'd for the RC4 key, what plaintext is encrypted, the exact prefix bytes — a little more dnlib IL),
implement `aahCryptV2` (RC4+MD5+prefix) managed-side, set the v8 token = that, and live-test
(non-destructive). The offline correlation data (one run's derived key + token + openParameters) is
captured under `artifacts/.../` to validate the managed reproduction before going live.
**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method (`ConfigureOpenConnection`), pending the remaining dnlib extraction of its exact wiring. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The
token-loop routing guardrail (`HistorianGrpcHandshakeRoutingTests`) was scoped to the closure so the
legitimate ExchangeKey call is allowed while still pinning that the Negotiate token loop never routes
there.