histsdk/docs/reverse-engineering/grpc-event-query-capture.md
Joseph Doherty b2ac35b98e docs(grpc-events): trace the ExchangeKey token crypto — KDF=SHA256(secret); token construction localized
Frida-hooked Windows CNG (scripts/frida/aahclientmanaged-cng-exchangekey.js) during
a real native ExchangeKey to recover the token derivation:

- The ECDH + KDF are standard CNG driven by managed System.Security.Cryptography
  .ECDiffieHellmanCng: NCryptSecretAgreement (P-256) -> NCryptDeriveKey(KDF=HASH,
  SHA256, 32 bytes). So the derived key = SHA256(ECDH shared secret).
- "ECK1" is the standard CNG BCRYPT_ECCPUBLIC_BLOB magic (P-256), confirming our
  BuildExchangeKeyClientHello wire format.
- The 26-byte token (constant 0x8e marker) is a custom construction over the
  derived key: a 528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the
  derived key x request slices x creds) found no match, and it matches none of the
  traced hash digests. It is built in aahClientManaged's C++/CLI <Module> code
  between the DeriveKeyMaterial call and the openParameters assembly.

Next: ILSpy cannot decompile the mixed-mode assembly (crashes, exit 70); use dnlib
(IL-level) to dump the <Module> method referencing DeriveKeyMaterial and read the
post-derive token construction. 2 of 3 layers cleared (key exchange + client key);
the 3rd (token) is localized, pending dnlib extraction. Orchestrator stays on v6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 11:11:21 -04:00


gRPC event-query capture (2026-06-22) — the StartEventQuery request that returns rows

Captured the stock 2023 R2 client performing a gRPC event read that returns rows, to resolve the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This closes the capture-gate: the working request shape is now known.

How it was captured

tools/AVEVA.Historian.Grpc2023CaptureHarness gained a capture-event scenario. It loads the self-contained mixed-mode 2023 R2 aahClientManaged.dll and drives HistorianAccess:

OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
  -> CreateEventQuery()                      // NON-null only on an Event connection
  -> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
  -> EventQuery.StartQuery(args)             // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
  -> loop EventQuery.MoveNext() / QueryResult// => GrpcRetrievalClient.GetNextEventQueryResultBuffer
  -> EventQuery.EndQuery() -> CloseConnection

The existing wide-net instrument-grpc-nonstream IL rewrite (every Grpc*Client byte[] method) already covers GrpcRetrievalClient.StartEventQuery.requestBuffer (entry) and GetNextEventQueryResultBuffer.result (exit), so no new instrument command was needed. The capture ran read-only (non-destructive) against the live 2023 R2 server over the loopback tunnel; the rewrite and the capture NDJSON stay under artifacts/reverse-engineering/grpc-event-capture/ (gitignored, because the result buffer carries event identity data).

Result: 50 events returned over gRPC (Alarm.Set / Alarm.Clear rows), proving the path works when driven through an Event connection.

Two findings

1. The event read needs an Event-type connection (ConnectionIndex 1)

HistorianAccess.CreateEventQuery() returns null unless IsEventConnectionRequested() is true, i.e. the connection was opened with ConnectionType=Event, which the native client routes to a separate connection (ConnectionIndex 1) from the process/data path. The full captured pre-query sequence on that connection: OpenConnection → ExchangeKey → UpdateClientStatus → RegisterTags(CM_EVENT) → EnsureTags(CM_EVENT) → GetHistorianInfo + 7×GetSystemParameter (Stat priming) → StartEventQuery → GetNextEventQueryResultBuffer (rows) → EndEventQuery → CloseConnection.

2. The working StartEventQuery request is version 6, not 5

Our SDK's HistorianEventQueryProtocol.CreateNativeFilterAttempt builds a version-5 empty-filter buffer; the stock 2023 R2 client sends version 6. Diffed byte-for-byte (same query window + eventCount), the two buffers are identical except:

  • byte 0: version 06 vs 05
  • 5 additional trailing zero bytes (stock = 70 bytes, SDK v5 = 65 bytes)

The server returns rows for v6 and zero rows for v5 (the v5 request is accepted: StartEventQuery succeeds and yields a query handle, but GetNextEventQueryResultBuffer then matches nothing). Everything else is shared: the two query-window FILETIMEs, UInt32 eventCount, the UInt32 65536 buffer hint, the "UTC" HistorianString, and the 01 01 00 00 01 00 00 01 00 00 metadata-namespace block.

Captured v6 request layout (70 bytes; the FILETIMEs below are just the harness query window — no identity data):

[0..1]   UInt16  version = 6                 // SDK currently sends 5
[2..9]   Int64   startUtc (FILETIME)
[10..17] Int64   endUtc   (FILETIME)
[18..21] UInt32  eventCount
[22..25] UInt32  0
[26..27] UInt16  0
[28..29] UInt16  1
[30..36] 7 bytes 0                           // empty-filter block
[37..40] UInt32  65536                       // buffer-size hint
[41..50] HistorianString "UTC"  (UInt32 len=3 + UTF-16LE)
[51..60] 01 01 00 00 01 00 00 01 00 00       // metadata-namespace block (marker + 3 empty)
[61..69] 9 bytes 0                            // terminal (SDK v5 writes only 4 here)
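The layout above can be checked offline with a short Python sketch. This is not the SDK's serializer: the function names are hypothetical, and little-endian field encoding is an assumption inferred from byte 0 carrying the version (06 vs 05). The only thing that varies between the two versions here is the version field and the length of the terminal zero run.

```python
import struct
from datetime import datetime, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def to_filetime(dt: datetime) -> int:
    """Convert a timezone-aware datetime to 100 ns ticks since 1601-01-01 UTC."""
    delta = dt.astimezone(timezone.utc) - FILETIME_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10

def historian_string(text: str) -> bytes:
    """UInt32 character count followed by UTF-16LE text, as in the layout above."""
    return struct.pack("<I", len(text)) + text.encode("utf-16-le")

def build_start_event_query(start: datetime, end: datetime,
                            event_count: int, version: int = 6) -> bytes:
    """Hypothetical helper: empty-filter StartEventQuery buffer (v5 or v6)."""
    if version not in (5, 6):
        raise ValueError("only the v5 and v6 envelopes are described in this note")
    terminal = 9 if version == 6 else 4   # v6 adds 5 trailing zero bytes
    return b"".join([
        struct.pack("<H", version),                # [0..1]   version
        struct.pack("<q", to_filetime(start)),     # [2..9]   startUtc
        struct.pack("<q", to_filetime(end)),       # [10..17] endUtc
        struct.pack("<I", event_count),            # [18..21] eventCount
        struct.pack("<I", 0),                      # [22..25]
        struct.pack("<H", 0),                      # [26..27]
        struct.pack("<H", 1),                      # [28..29]
        bytes(7),                                  # [30..36] empty-filter block
        struct.pack("<I", 65536),                  # [37..40] buffer-size hint
        historian_string("UTC"),                   # [41..50]
        bytes.fromhex("01010000010000010000"),     # [51..60] metadata-namespace block
        bytes(terminal),                           # [61..]   terminal zeros
    ])
```

Building the same window with version=5 and version=6 should reproduce the two differences listed earlier: the leading version byte and the 65 vs 70 byte length.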

Fix part 1 — v6 request (DONE, necessary)

HistorianEventQueryProtocol.CreateStartEventQueryAttempts gained a version parameter (default 5 = WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading 06 and the 5-byte trailing pad. The WCF path is unchanged (v5). Golden test Version6EmptyFilterMatchesCapturedGrpcEnvelope pins the envelope; 322/322 offline tests pass.

Fix part 2 — EVENT connection (the remaining gate, NOT yet implemented)

Live validation 2026-06-22: with the orchestrator now sending v6 against the event-bearing live server, GetNextEventQueryResultBuffer still long-polls and returns zero rows (the gated test still throws). So v6 is necessary but not sufficient — the read also requires an Event-type connection, which our SDK does not open.

Isolated by diffing the captured OpenConnection.openParameters (302 bytes, native format v8) for a Process connection (connect scenario) vs the Event connection (capture-event): aside from the per-session auth GUID/credential-hash regions ([22..37], [68..93], which vary between any two sessions), the connection differs in two clean structural bytes:

offset   Process   Event
95       02        01
96       00        01

These correspond to HistorianConnectionType (Process vs Event; the native event path runs on ConnectionIndex 1). The problem: our SDK opens the session with the 2020 OpenConnection3 v6 buffer (HistorianNativeHandshake.BuildOpenConnection3Request, connectionMode 0x402), which the 2023 R2 server accepts for reads but which carries no event-connection-type marker. connectionMode is NOT the discriminator (2020 WCF event reads work with 0x402); the native client distinguishes event vs process via this separate ConnectionType field in its v8 openParameters.
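The masked diff that isolated those two bytes can be sketched in a few lines of Python. The helper name and the synthetic buffers are illustrative assumptions (the real captures are gitignored); the ignored ranges are the per-session regions named above.

```python
# Per-session regions that differ between any two sessions (inclusive offsets).
SESSION_REGIONS = [(22, 37), (68, 93)]

def structural_diff(a: bytes, b: bytes, ignore=()):
    """Return (offset, a_byte, b_byte) for every differing byte outside `ignore`."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length for an offset-wise diff")
    masked = set()
    for lo, hi in ignore:
        masked.update(range(lo, hi + 1))
    return [(i, a[i], b[i]) for i in range(len(a))
            if i not in masked and a[i] != b[i]]

# Synthetic stand-ins for the two 302-byte openParameters buffers (not capture data).
process = bytearray(302)
event = bytearray(302)
process[22:38], event[22:38] = b"\xaa" * 16, b"\xbb" * 16   # auth GUID region
process[68:94], event[68:94] = b"\x11" * 26, b"\x22" * 26   # session-derived region
process[95], process[96] = 0x02, 0x00
event[95], event[96] = 0x01, 0x01

diff = structural_diff(bytes(process), bytes(event), SESSION_REGIONS)
```

With the session regions masked, only offsets 95 and 96 remain in `diff`; without the mask the session noise hides them.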

Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection

Decoded the native openParameters (302 bytes): byte 0 = 08 (format version 8), then a context GUID, username, a 26-byte session-derived region ([68..93]), machine/client-node/datasource strings, and at [94] ClientType=04 immediately followed by [95] ConnectionType (01=Event / 02=Process) + [96] a flag (01/00), then the rest.

Our SDK builds the v6 buffer (HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6, byte 0 = 06): it writes ClientType (1 byte) immediately followed by ConnectionMode (uint) — there is no ConnectionType byte at all. The v8 format inserts ConnectionType (+flag) between ClientType and the rest. So the v6 buffer the SDK sends (accepted by the 2023 R2 server for reads) structurally cannot mark the connection as Event, and the server returns event rows only for an Event connection.

Two further obstacles to simply emitting v8:

  • the native client authenticated via ExchangeKey (cert path; 72-byte btInput/btOutput in the capture) whereas the SDK's gRPC handshake uses ValidateClientCredential (Negotiate). The v8 openParameters [68..93] region is session-derived and tied to that auth flow.
  • ConnectionMode is NOT the lever (2020 WCF event reads work at 0x402); ConnectionType is a distinct field that only exists from format v8.

Also confirmed a secondary format gap: the native gRPC EnsureTags CM_EVENT payload is 86 bytes vs the SDK's SerializeCmEventCTagMetadata 83 bytes (a 3-byte 2023 R2 bump, parallel to the event-query v5→v6). This is likely benign on its own (CM_EVENT pre-exists; 2020 EnsT2 returns benign-false yet events flow) but should be matched if the event open is ever rebuilt.

Conclusion — the event-connection gate is NOT a tweak. Making event rows flow over gRPC requires the SDK to emit the native v8 OpenConnection format with ConnectionType=Event (a 302-byte buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and likely to adopt the ExchangeKey cert auth path. That is a substantial RE+implementation effort comparable to the original Open2 work — scoped as a follow-on, not a quick fix. Until then the gated ReadEventsAsync_OverGrpc_* test correctly still pins the no-row throw, and v6 (part 1) is retained as the captured-correct request format for when the open is rebuilt.

Capture artifacts (gitignored): artifacts/reverse-engineering/grpc-event-capture/event-capture.ndjson (Event), process-connect-2.ndjson (Process).

v8 openParameters fully decoded (2026-06-23) + the ECDH ExchangeKey finding

Full byte map of the native Event-connection openParameters (302 bytes; identity values redacted — they are session-specific and sit in the gitignored capture):

[0]        byte   0x08            format version = 8
[1]        byte   0xf0            constant marker
[2..20]    19 ×   0x00
[21]       byte   0x01            constant marker
[22..37]   16B    GUID            per-session client key
[38..41]   u32                    username length (chars)
[42..N]    UTF-16 username        (HistorianString)
[..+1]     u16                    credential-token length (= 26 in the capture)
[..]       26B    token           ECDH-derived credential token  <-- see below
[94]       byte   0x04            ClientType (= our NativeClientType 4)
[95]       byte   ConnectionType  01 = Event / 02 = Process   <-- THE GATE
[96]       byte   flag            01 (Event) / 00 (Process)
[97..]     control bytes          (0x03 ... small region, not fully named)
[~114..117] u32   FormatVersion=3
[..]       HistorianString        machine/server node name
[..]       HistorianString        client node name "(<ver>)"
[..]       u32                    session-variable (process-ish)
[..]       u32 / zeros
[..]       u32    datasource len
[..]       UTF-16 datasource id   e.g. "2023.1219.4004.5"
[270..285] 16 ×   0xff            ShardId (all-FF = unset; our v6 sends Empty)
[286..289] u32                    client/hcal version int
[290..297] i64    FILETIME        ClientTimestamp
[298..301] u32    0

The tail (FormatVersion → machine → clientNode → datasource → ShardId → version → timestamp) is the same ClientCommonInfo our v6 already emits. The new/different parts are: version byte, the [1]/[21] markers, the GUID position, the 26-byte credential token (vs v6's fixed-size block), the ConnectionType byte, and ShardId=FF.
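Because the username is variable-length, the fixed offsets [68..93] and [94..96] hold only for the captured session. A minimal Python sketch of walking the head of the buffer instead is shown here; the type and function names are hypothetical, and it assumes little-endian integers and two bytes per UTF-16 character in the username.

```python
import struct
from dataclasses import dataclass

@dataclass
class OpenParamsHead:
    version: int
    client_key: bytes       # [22..37] per-session GUID
    username: str
    token: bytes            # credential token (26 bytes in the capture)
    client_type: int
    connection_type: int    # 01 = Event, 02 = Process
    flag: int

def parse_v8_head(buf: bytes) -> OpenParamsHead:
    """Walk the v8 openParameters head up to the ConnectionType flag."""
    if buf[0] != 0x08:
        raise ValueError("not a format-version-8 openParameters buffer")
    client_key = buf[22:38]
    (name_chars,) = struct.unpack_from("<I", buf, 38)      # [38..41]
    pos = 42
    username = buf[pos:pos + 2 * name_chars].decode("utf-16-le")
    pos += 2 * name_chars
    (token_len,) = struct.unpack_from("<H", buf, pos)      # u16 token length
    pos += 2
    token = buf[pos:pos + token_len]
    pos += token_len
    client_type, connection_type, flag = buf[pos], buf[pos + 1], buf[pos + 2]
    return OpenParamsHead(buf[0], client_key, username, token,
                          client_type, connection_type, flag)
```

For a 12-character username and a 26-byte token, this walk lands the token at [68..93] and ClientType at [94], matching the map above.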

The auth is ECDH, not Negotiate. The capture's ExchangeKey buffers begin 45 43 4b 31 = ASCII "ECK1" + a 64-byte EC public-key point — a Diffie-Hellman key exchange — and the 26-byte openParameters token is derived from it. HistorianSecurityMode offers only Disabled / None / TransportCertificate; the harness used TransportCertificate, which is what drives the ECDH ExchangeKey. There is no TLS+Negotiate mode on the native client (it couples TLS with the cert ECDH path), so a Negotiate-auth v8 capture cannot be produced from the native client.

Key de-risking insight: our SDK's v6 OpenConnection sends a fully zeroed 1026-byte credential block (credentialBlock: new byte[1026]) and reads still work — because authentication is actually carried by the separate StorageService.ValidateClientCredential (Negotiate) handshake, not by the bytes inside openParameters. By analogy the v8 [68..93] token may likewise be ignorable once ValidateClientCredential has run. So the first build hypothesis (cheapest, read-only to test):

Reuse the SDK's existing ValidateClientCredential handshake, then send a v8 OpenConnection with ConnectionType=Event and a zeroed credential token, and see whether the 2023 R2 server returns event rows.

If that works, the ECDH ExchangeKey RE is unnecessary. If it fails, the fallback is full reproduction of the ECDH ExchangeKey handshake (curve/KDF/cipher) — a much larger crypto-RE effort. Build path: add SerializeNativeOpenConnectionVersion8(connectionType) to HistorianOpen2Protocol, wire the gRPC event handshake to use it (events only; reads stay on v6), live-test (non-destructive). Full hex in the gitignored capture.

Path A built + live-tested 2026-06-23 — DISPROVEN (v8 is coupled to ExchangeKey)

Built HistorianOpen2Protocol.SerializeNativeOpenConnectionVersion8 (golden-tested, Version8EventSerializerReproducesCapturedNativeStructure — reproduces the captured 302-byte structure exactly) + HistorianNativeHandshake.BuildEventOpenConnectionVersion8Request (zeroed credential token) + an eventConnection switch on HistorianGrpcHandshake.OpenSession, and live-ran the event read against the server. Result: the v8 OpenConnection was parsed by the server (got past the byte format) but rejected at the auth check with native error

type=132 code=34   "aahHcapLib::HistoryService::EstablishConnection — Failed to get client key"

i.e. EstablishConnection could not find a server-side client key for our session. In the v6 path that key is established by StorageService.ValidateClientCredential (which is why v6 reads work); the v8 path looks it up in the registry that HistoryService.ExchangeKey (ECDH) populates, and there is no ValidateClientCredential on HistoryService in the gRPC contract. So the server branches on the OpenConnection version: v6 accepts the Negotiate-established key, v8 requires the ExchangeKey-established key. The zeroed-token hypothesis is therefore disproven — not because of the token bytes, but because the whole v8 path is gated on ExchangeKey having run first.

Status: the v8 serializer/builder are correct and retained (golden-tested), and an OpenConnection failure now decodes the native error (type/code/ASCII). The event orchestrator is reverted to the v6 session (gated test still pins the no-row throw). The remaining route is Path B: implement HistoryService.ExchangeKey ("ECK1" + a 64-byte EC public-key point; P-256 X‖Y, by the size) using .NET ECDiffieHellman, establish the client key, then reissue the v8 OpenConnection. Open question for Path B: whether merely completing the ECDH key agreement registers the client key (so the zeroed openParameters token still rides through), or whether the token must also be derived from the shared secret (full KDF/cipher RE).

Path B started 2026-06-23 — ExchangeKey ECDH works; cleared 2 of 3 layers

Implemented HistoryService.ExchangeKey as a pure-managed P-256 ECDH key exchange (HistorianNativeHandshake.BuildExchangeKeyClientHello / DeriveExchangeKeySecret, .NET ECDiffieHellman over nistP256; wire format "ECK1" + u32(32) + X(32) + Y(32)) and wired it into HistorianGrpcHandshake.OpenSession(eventConnection: true) ahead of the v8 OpenConnection, on the same context-key handle. Live result against the server: the ExchangeKey RPC succeeds (the server accepted our public key), and the v8 OpenConnection error moved one layer deeper:

Path A (no ExchangeKey):  132/34  "Failed to get client key"
Path B (ExchangeKey ECDH): 132/171 AuthenticationFailed  "EstablishConnection — Authentication failed"

So the ECDH cleared the client-key check; the remaining blocker is authentication: the 26-byte v8 credential token must be a valid value derived from the ECDH shared secret (not zeros).
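For reference, the 72-byte client-hello wire format described above ("ECK1" + u32(32) + X(32) + Y(32)) can be packed and unpacked as follows. This is a sketch of the byte layout only, not of BuildExchangeKeyClientHello itself: the helper names are hypothetical, the length field is assumed little-endian, and producing a real P-256 key pair is left to an ECDH library.

```python
import struct
from typing import Tuple

ECK1_MAGIC = b"ECK1"        # 45 43 4b 31
P256_COORD_LEN = 32         # bytes per coordinate on P-256

def build_eck1_blob(x: bytes, y: bytes) -> bytes:
    """Pack a P-256 public point as magic + u32 coordinate length + X + Y."""
    if len(x) != P256_COORD_LEN or len(y) != P256_COORD_LEN:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return ECK1_MAGIC + struct.pack("<I", P256_COORD_LEN) + x + y

def parse_eck1_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Split an ECK1 blob back into its (X, Y) coordinates."""
    if blob[:4] != ECK1_MAGIC:
        raise ValueError("missing ECK1 magic")
    (coord_len,) = struct.unpack_from("<I", blob, 4)
    if coord_len != P256_COORD_LEN or len(blob) != 8 + 2 * coord_len:
        raise ValueError("unexpected coordinate length for P-256")
    return blob[8:8 + coord_len], blob[8 + coord_len:]
```

The 4 + 4 + 32 + 32 total agrees with the 72-byte btInput/btOutput size seen in the capture.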

Token crypto traced 2026-06-23 (Frida → Windows CNG) — KDF found, token construction still open

Hooked Windows CNG (bcrypt.dll/ncrypt.dll) while the native harness ran a real ExchangeKey (scripts/frida/aahclientmanaged-cng-exchangekey.js + artifacts/.../cng-trace.py). Findings:

  • The ECDH + KDF are standard CNG, driven by managed System.Security.Cryptography.ECDiffieHellmanCng (backtrace top frame = System.Core.ni.dll; the caller is aahClientManaged's C++/CLI <Module>): NCryptSecretAgreement (P-256) → NCryptDeriveKey(KDF=HASH, HASH_ALGORITHM=SHA256, 32 bytes). So the derived key = SHA256(ECDH shared secret) — exactly ECDiffieHellmanCng{ KeyDerivationFunction=Hash, HashAlgorithm=SHA256 }.DeriveKeyMaterial(...). Our managed DeriveExchangeKeySecret should switch to this (SHA256 of the raw agreement) to match.
  • "ECK1" is NOT AVEVA-custom: it is the standard Windows CNG magic for a P-256 ECDH public key (BCRYPT_ECDH_PUBLIC_P256_MAGIC, the header of a BCRYPT_ECCPUBLIC_BLOB). NCryptExportKey/NCryptImportKey emit exactly ECK1 + len(32) + X(32) + Y(32), confirming our BuildExchangeKeyClientHello wire format is correct.
  • The 26-byte token is a custom construction that is not yet reproduced. Correlated one run's derived key (SHA256(secret)) with that run's token (from the IL openParameters capture): a 528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the derived key × request slices × creds) found no match, and the token matches none of the traced hash digests. The token starts with a constant 0x8e marker in both captured runs (so it is structured, not raw cipher output). It is built in managed code between the DeriveKeyMaterial call and the openParameters assembly.
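The traced KDF and the shape of the offline candidate search can be sketched in Python. This is not the 528-candidate cracker referenced above: the candidate set here is a small illustrative subset, the function names are hypothetical, and the KDF input is assumed to be the 32-byte big-endian X coordinate of the shared point with no prepend/append data.

```python
import hashlib
import hmac

TOKEN_MARKER = 0x8E   # constant first byte seen in both captured tokens

def derive_exchange_key(shared_x: bytes) -> bytes:
    """KDF=HASH with SHA-256: the derived key is SHA256(raw ECDH agreement)."""
    return hashlib.sha256(shared_x).digest()

def candidate_digests(key: bytes, messages: dict):
    """Yield (label, digest) pairs for a few simple constructions over the key."""
    for label, msg in messages.items():
        yield f"sha256(key||{label})", hashlib.sha256(key + msg).digest()
        yield f"sha256({label}||key)", hashlib.sha256(msg + key).digest()
        for algo in ("sha1", "sha256"):
            yield f"hmac-{algo}(key, {label})", hmac.new(key, msg, algo).digest()

def find_matches(token: bytes, candidates) -> list:
    """Labels whose digest agrees with the token body over their common prefix."""
    body = token[1:] if token and token[0] == TOKEN_MARKER else token
    matches = []
    for label, digest in candidates:
        n = min(len(body), len(digest))
        if n and body[:n] == digest[:n]:
            matches.append(label)
    return matches
```

Feeding one run's derived key, its captured token, and slices of the request or credentials as `messages` gives the same kind of negative result reported above when nothing matches; a hit would name the construction.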

Next step: ILSpy cannot decompile the mixed-mode assembly (full-assembly and <Module> both crash, exit 70). Use dnlib (IL-level, won't choke on the native parts) to dump the <Module> method that references ECDiffieHellmanCng.DeriveKeyMaterial and read the post-derive token construction, then implement it managed-side and re-test (non-destructive).

2 of 3 layers cleared (key exchange + client key); the 3rd (token construction) is localized to a specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the orchestrator stays on v6 (set eventConnection: true to re-arm once the token construction lands). The token-loop routing guardrail (HistorianGrpcHandshakeRoutingTests) was scoped to the closure so the legitimate ExchangeKey call is allowed while still pinning that the Negotiate token loop never routes there.