# gRPC event-query capture (2026-06-22): the StartEventQuery request that returns rows

Captured the stock 2023 R2 client performing a **gRPC event read** that returns rows, to resolve
the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This
closes the capture-gate: the working request shape is now known.

## How it was captured

`tools/AVEVA.Historian.Grpc2023CaptureHarness` gained a `capture-event` scenario. It loads the
self-contained mixed-mode 2023 R2 `aahClientManaged.dll` and drives `HistorianAccess`:

```
OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
  -> CreateEventQuery()                        // NON-null only on an Event connection
  -> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
  -> EventQuery.StartQuery(args)               // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
  -> loop EventQuery.MoveNext() / QueryResult  // => GrpcRetrievalClient.GetNextEventQueryResultBuffer
  -> EventQuery.EndQuery() -> CloseConnection
```

The existing wide-net `instrument-grpc-nonstream` IL rewrite (every `Grpc*Client` `byte[]` method)
already covers `GrpcRetrievalClient.StartEventQuery.requestBuffer` (entry) and
`GetNextEventQueryResultBuffer.result` (exit), so no new instrument command was needed. The capture
ran read-only (non-destructive) against the live 2023 R2 server over the loopback tunnel; the rewrite
and the capture NDJSON stay under `artifacts/reverse-engineering/grpc-event-capture/` (gitignored,
because the result buffer carries event identity data).

Result: **50 events returned over gRPC** (Alarm.Set / Alarm.Clear rows), proving the path works when
driven through an Event connection.

## Two findings

### 1. The event read needs an **Event-type connection** (`ConnectionIndex 1`)

`HistorianAccess.CreateEventQuery()` returns `null` unless `IsEventConnectionRequested()` is true, i.e.
the connection was opened with `ConnectionType=Event`, which the native client routes to a *separate*
connection (ConnectionIndex 1) from the process/data path. The full captured pre-query sequence on
that connection: `OpenConnection` → `ExchangeKey` → `UpdateClientStatus` → `RegisterTags`(CM_EVENT) →
`EnsureTags`(CM_EVENT) → `GetHistorianInfo` + 7×`GetSystemParameter` (Stat priming) →
`StartEventQuery` → `GetNextEventQueryResultBuffer` (rows) → `EndEventQuery` → `CloseConnection`.

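
A minimal offline sketch of how the captured call order could be used as a checklist against another trace. It assumes the caller has already extracted an ordered list of RPC method names from a capture; the NDJSON schema is not described in these notes, so no field names are assumed here. `EXPECTED_EVENT_SEQUENCE` and `first_missing_step` are hypothetical names, not SDK or harness APIs.

```python
# Expected pre-query call order on the Event connection, taken from the
# sequence listed above (repeated GetSystemParameter calls are tolerated).
EXPECTED_EVENT_SEQUENCE = [
    "OpenConnection", "ExchangeKey", "UpdateClientStatus", "RegisterTags",
    "EnsureTags", "GetHistorianInfo", "StartEventQuery",
    "GetNextEventQueryResultBuffer", "EndEventQuery", "CloseConnection",
]


def first_missing_step(observed, expected=EXPECTED_EVENT_SEQUENCE):
    """Return the first expected call that does not occur in order, else None."""
    remaining = iter(observed)
    for step in expected:
        # any() consumes the iterator up to and including the first match,
        # so each expected step must appear after the previous one.
        if not any(name == step for name in remaining):
            return step
    return None
```

Run against a trace that opens the session without `ExchangeKey`, this would report `ExchangeKey` as the first divergence from the stock client.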
### 2. The working `StartEventQuery` request is **version 6**, not 5

Our SDK's `HistorianEventQueryProtocol.CreateNativeFilterAttempt` builds a **version-5** empty-filter
buffer; the stock 2023 R2 client sends **version 6**. Diffed byte-for-byte (same query window +
eventCount), the two buffers are **identical except**:

- **byte 0: version `06` vs `05`**
- **5 additional trailing zero bytes** (stock = 70 bytes, SDK v5 = 65 bytes)

The server returns rows for v6 and **zero rows for v5** (the v5 request is *accepted*:
`StartEventQuery` succeeds and yields a query handle, but `GetNextEventQueryResultBuffer` then
matches nothing). Everything else is shared: the two query-window FILETIMEs, `UInt32 eventCount`,
the `UInt32 65536` buffer hint, the `"UTC"` `HistorianString`, and the `01 01000001000001 0000`
metadata-namespace block.

Captured v6 request layout (70 bytes; the FILETIMEs below are just the harness query window, no
identity data):

```
[0..1]    UInt16  version = 6             // SDK currently sends 5
[2..9]    Int64   startUtc (FILETIME)
[10..17]  Int64   endUtc (FILETIME)
[18..21]  UInt32  eventCount
[22..25]  UInt32  0
[26..27]  UInt16  0
[28..29]  UInt16  1
[30..36]  7 bytes 0                       // empty-filter block
[37..40]  UInt32  65536                   // buffer-size hint
[41..50]  HistorianString "UTC" (UInt32 len=3 + UTF-16LE)
[51..60]  01 01 00 00 01 00 00 01 00 00   // metadata-namespace block (marker + 3 empty)
[61..69]  9 bytes 0                       // terminal (SDK v5 writes only 4 here)
```

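
The layout above can be reproduced offline with a short Python sketch, useful for checking that the field widths really add up to 70 bytes (v6) and 65 bytes (v5). This is not the SDK's C# serializer: `build_start_event_query`, `historian_string` and `to_filetime` are hypothetical helper names, and little-endian field encoding is an assumption (consistent with byte 0 carrying the version).

```python
import struct
from datetime import datetime, timezone

# Seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
FILETIME_EPOCH_OFFSET_S = 11_644_473_600


def to_filetime(dt: datetime) -> int:
    """Convert an aware datetime to 100 ns ticks since 1601-01-01 UTC."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    seconds = delta.days * 86_400 + delta.seconds + FILETIME_EPOCH_OFFSET_S
    return seconds * 10_000_000 + delta.microseconds * 10


def historian_string(text: str) -> bytes:
    """UInt32 character count followed by UTF-16LE text (rows [41..50])."""
    return struct.pack("<I", len(text)) + text.encode("utf-16-le")


def build_start_event_query(version: int, start: datetime, end: datetime,
                            event_count: int) -> bytes:
    """Empty-filter StartEventQuery buffer following the captured layout."""
    terminal = 9 if version >= 6 else 4  # v6 adds 5 trailing zero bytes
    return b"".join([
        struct.pack("<HqqI", version, to_filetime(start), to_filetime(end),
                    event_count),                  # [0..21]
        struct.pack("<IHH", 0, 0, 1),              # [22..29]
        bytes(7),                                  # [30..36] empty-filter block
        struct.pack("<I", 65536),                  # [37..40] buffer-size hint
        historian_string("UTC"),                   # [41..50]
        bytes.fromhex("01010000010000010000"),     # [51..60] metadata block
        bytes(terminal),                           # [61..] terminal zeros
    ])
```

Building the same window with `version=5` and `version=6` and comparing the results gives exactly the two differences listed above: byte 0 and the five extra trailing zeros.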
## Fix part 1: v6 request (DONE, necessary)

`HistorianEventQueryProtocol.CreateStartEventQueryAttempts` gained a `version` parameter (default 5 =
WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading `06` and the 5-byte trailing pad. The
WCF path is unchanged (v5). Golden test `Version6EmptyFilterMatchesCapturedGrpcEnvelope` pins the
envelope; 322/322 offline tests pass.

## Fix part 2: EVENT connection (the remaining gate, NOT yet implemented)

Live validation 2026-06-22: with the orchestrator now sending v6 against the event-bearing live
server, `GetNextEventQueryResultBuffer` **still long-polls and returns zero rows** (the gated test
still throws). So **v6 is necessary but not sufficient**: the read also requires an **Event-type
connection**, which our SDK does not open.

Isolated by diffing the captured `OpenConnection.openParameters` (302 bytes, native format v8) for a
**Process** connection (`connect` scenario) vs the **Event** connection (`capture-event`): aside from
the per-session auth GUID/credential-hash regions ([22..37], [68..93], which vary between any two
sessions), the two buffers differ in **two clean structural bytes**:

| offset | Process | Event |
|--------|---------|-------|
| 95     | `02`    | `01`  |
| 96     | `00`    | `01`  |

These correspond to `HistorianConnectionType` (Process vs Event; the native event path runs on
`ConnectionIndex 1`). The problem: our SDK opens the session with the **2020 OpenConnection3 v6**
buffer (`HistorianNativeHandshake.BuildOpenConnection3Request`, `connectionMode 0x402`), which the
2023 R2 server accepts for reads but which carries no event-connection-type marker. `connectionMode`
is NOT the discriminator (2020 WCF event reads work with `0x402`); the native client distinguishes
event vs process via this separate `ConnectionType` field in its v8 `openParameters`.

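
The diff described above can be sketched as a small offline helper: compare two equal-length buffers and skip the regions known to vary per session. `diff_offsets` and `SESSION_REGIONS` are hypothetical names for illustration; the two ignore ranges are the ones named in the text.

```python
# Inclusive byte ranges that differ between any two sessions (GUID, token).
SESSION_REGIONS = [(22, 37), (68, 93)]


def diff_offsets(a: bytes, b: bytes, ignore=()):
    """List (offset, a_byte, b_byte) where the buffers differ outside `ignore`."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    skipped = {i for start, end in ignore for i in range(start, end + 1)}
    return [(i, a[i], b[i]) for i in range(len(a))
            if a[i] != b[i] and i not in skipped]
```

With the Process buffer as `a` and the Event buffer as `b`, the table above corresponds to a result of `[(95, 0x02, 0x01), (96, 0x00, 0x01)]`.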
### Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection

Decoded the native `openParameters` (302 bytes): **byte 0 = `08` (format version 8)**, then a
context GUID, username, a 26-byte session-derived region ([68..93]), then **[94] `ClientType=04`**
immediately followed by **[95] `ConnectionType` (`01`=Event / `02`=Process)** + **[96] a flag
(`01`/`00`)**, and after that the machine/client-node/datasource strings and the rest.

Our SDK builds the **v6** buffer (`HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6`,
byte 0 = `06`): it writes `ClientType` (1 byte) **immediately followed by `ConnectionMode` (uint)**, so
there is **no `ConnectionType` byte at all**. The v8 format *inserts* `ConnectionType` (+flag) between
`ClientType` and the rest. So the v6 buffer the SDK sends (accepted by the 2023 R2 server for *reads*)
structurally cannot mark the connection as Event, and the server returns event rows only for an Event
connection.

Two further obstacles to simply emitting v8:

- the native client authenticated via **`ExchangeKey`** (cert path; 72-byte `btInput`/`btOutput` in
  the capture) whereas the SDK's gRPC handshake uses **`ValidateClientCredential`** (Negotiate). The
  v8 `openParameters` [68..93] region is session-derived and tied to that auth flow.
- `ConnectionMode` is NOT the lever (2020 WCF event reads work at `0x402`); `ConnectionType` is a
  distinct field that only exists from format v8.

Also confirmed a secondary format gap: the native gRPC `EnsureTags` CM_EVENT payload is **86 bytes**
vs the SDK's `SerializeCmEventCTagMetadata` **83 bytes** (a 3-byte 2023 R2 bump, parallel to the
event-query v5→v6). This is likely benign on its own (CM_EVENT pre-exists; 2020 EnsT2 returns
benign-false yet events flow) but should be matched if the event open is ever rebuilt.

**Conclusion: the event-connection gate is NOT a tweak.** Making event rows flow over gRPC requires
the SDK to emit the native **v8 `OpenConnection` format** with `ConnectionType=Event` (a 302-byte
buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and
likely to adopt the `ExchangeKey` cert auth path. That is a substantial RE+implementation effort
comparable to the original Open2 work, so it is scoped as a follow-on, not a quick fix. Until then the
gated `ReadEventsAsync_OverGrpc_*` test correctly still pins the no-row throw, and **v6 (part 1) is
retained as the captured-correct request format** for when the open is rebuilt.

Capture artifacts (gitignored) live under `artifacts/reverse-engineering/grpc-event-capture/`:
`event-capture.ndjson` (Event), `process-connect-2.ndjson` (Process).

## v8 `openParameters` fully decoded (2026-06-23) + the ECDH ExchangeKey finding

Full byte map of the native Event-connection `openParameters` (302 bytes; identity values
redacted because they are session-specific and sit in the gitignored capture):

```
[0]          byte    0x08       format version = 8
[1]          byte    0xf0       constant marker
[2..20]      19 × 0x00
[21]         byte    0x01       constant marker
[22..37]     16B     GUID       per-session client key
[38..41]     u32                username length (chars)
[42..N]      UTF-16             username (HistorianString)
[..+1]       u16                credential-token length (= 26 in the capture)
[..]         26B     token      ECDH-derived credential token          <-- see below
[94]         byte    0x04       ClientType (= our NativeClientType 4)
[95]         byte               ConnectionType 01 = Event / 02 = Process   <-- THE GATE
[96]         byte               flag 01 (Event) / 00 (Process)
[97..]       control bytes (0x03 ... small region, not fully named)
[~114..117]  u32                FormatVersion=3
[..]         HistorianString    machine/server node name
[..]         HistorianString    client node name "(<ver>)"
[..]         u32                session-variable (process-ish)
[..]         u32 / zeros
[..]         u32                datasource len
[..]         UTF-16             datasource id e.g. "2023.1219.4004.5"
[270..285]   16 × 0xff          ShardId (all-FF = unset; our v6 sends Empty)
[286..289]   u32                client/hcal version int
[290..297]   i64 FILETIME       ClientTimestamp
[298..301]   u32                0
```

The tail (`FormatVersion` → machine → clientNode → datasource → ShardId → version → timestamp)
is the **same `ClientCommonInfo` our v6 already emits**. The new/different parts are: version byte,
the `[1]`/`[21]` markers, the GUID position, the **26-byte credential token** (vs v6's fixed-size
block), the **`ConnectionType` byte**, and ShardId=FF.

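
The head of the byte map (everything up to the flag byte) can be walked sequentially, which makes the `[94]`/`[95]`/`[96]` offsets fall out of the username and token lengths instead of being hard-coded. The sketch below is an offline reading aid, not the SDK serializer: `V8Head` and `parse_v8_head` are hypothetical names, and both little-endian integers and the Windows mixed-endian GUID byte order are assumptions.

```python
import struct
import uuid
from dataclasses import dataclass


@dataclass
class V8Head:
    version: int
    client_key: uuid.UUID
    username: str
    token: bytes
    client_type: int
    connection_type: int  # 1 = Event, 2 = Process
    flag: int


def parse_v8_head(buf: bytes) -> V8Head:
    """Walk the variable-length head of a v8 openParameters buffer."""
    if buf[0] != 8:
        raise ValueError(f"expected format version 8, got {buf[0]}")
    # Assumption: the GUID is stored in Windows/.NET mixed-endian byte order.
    client_key = uuid.UUID(bytes_le=bytes(buf[22:38]))
    (name_chars,) = struct.unpack_from("<I", buf, 38)
    pos = 42
    username = bytes(buf[pos:pos + 2 * name_chars]).decode("utf-16-le")
    pos += 2 * name_chars
    (token_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    token = bytes(buf[pos:pos + token_len])
    pos += token_len
    client_type, connection_type, flag = buf[pos], buf[pos + 1], buf[pos + 2]
    return V8Head(buf[0], client_key, username, token,
                  client_type, connection_type, flag)
```

As a consistency check on the map: a 12-character username puts the token-length field at `[66..67]`, the 26-byte token at `[68..93]`, and `ClientType` at `[94]`, matching the captured offsets.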

**The auth is ECDH, not Negotiate.** The capture's `ExchangeKey` buffers begin `45 43 4b 31` =
ASCII **`"ECK1"`** + a 64-byte EC public-key point (a Diffie-Hellman key exchange), and the 26-byte
`openParameters` token is derived from it. `HistorianSecurityMode` offers only `Disabled` / `None` /
`TransportCertificate`; the harness used `TransportCertificate`, which is what drives the ECDH
`ExchangeKey`. There is **no TLS+Negotiate mode** on the native client (it couples TLS with the cert
ECDH path), so a Negotiate-auth v8 capture cannot be produced from the native client.

**Key de-risking insight:** our SDK's v6 `OpenConnection` sends a **fully zeroed** 1026-byte
credential block (`credentialBlock: new byte[1026]`) and reads still work, because authentication is
actually carried by the separate `StorageService.ValidateClientCredential` (Negotiate) handshake, not
by the bytes inside `openParameters`. By analogy the v8 `[68..93]` token may likewise be **ignorable**
once `ValidateClientCredential` has run. So the first build hypothesis (cheapest, read-only to test):

> Reuse the SDK's existing `ValidateClientCredential` handshake, then send a **v8 `OpenConnection`
> with `ConnectionType=Event` and a zeroed credential token**, and see whether the 2023 R2 server
> returns event rows.

If that works, the ECDH ExchangeKey RE is unnecessary. If it fails, the fallback is full reproduction
of the ECDH `ExchangeKey` handshake (curve/KDF/cipher), a much larger crypto-RE effort. Build path:
add `SerializeNativeOpenConnectionVersion8(connectionType)` to `HistorianOpen2Protocol`, wire the gRPC
event handshake to use it (events only; reads stay on v6), then live-test (non-destructive). The full
hex is in the gitignored capture.

### Path A built + live-tested 2026-06-23: DISPROVEN (v8 is coupled to ExchangeKey)

Built `HistorianOpen2Protocol.SerializeNativeOpenConnectionVersion8` (golden-tested by
`Version8EventSerializerReproducesCapturedNativeStructure`, which reproduces the captured 302-byte
structure exactly) + `HistorianNativeHandshake.BuildEventOpenConnectionVersion8Request` (zeroed
credential token) + an `eventConnection` switch on `HistorianGrpcHandshake.OpenSession`, and live-ran
the event read against the server. Result: the v8 `OpenConnection` was **parsed by the server** (got
past the byte format) but **rejected at the auth check** with the native error:

```
type=132 code=34 "aahHcapLib::HistoryService::EstablishConnection — Failed to get client key"
```

i.e. `EstablishConnection` could not find a server-side **client key** for our session. In the v6
path that key is established by `StorageService.ValidateClientCredential` (which is why v6 reads
work); the v8 path looks it up in the registry that **`HistoryService.ExchangeKey` (ECDH)** populates,
and there is **no `ValidateClientCredential` on `HistoryService`** in the gRPC contract. So the server
branches on the OpenConnection version: v6 accepts the Negotiate-established key, **v8 requires the
ExchangeKey-established key**. The zeroed-token hypothesis is therefore disproven, not because of the
token bytes, but because the whole v8 path is gated on `ExchangeKey` having run first.

**Status:** the v8 serializer/builder are correct and retained (golden-tested), and the
`OpenConnection` failure path now decodes the native error (type/code/ASCII). The event orchestrator is
reverted to the v6 session (gated test still pins the no-row throw). The remaining route is **Path B:
implement `HistoryService.ExchangeKey`** (`"ECK1"` + a 64-byte EC public-key point, P-256 X‖Y by the
size) using .NET `ECDiffieHellman`, establish the client key, then reissue the v8 `OpenConnection`.
Open question for Path B: whether merely *completing* the ECDH key agreement registers the client key
(so the zeroed openParameters token still rides through), or whether the token must also be derived
from the shared secret (full KDF/cipher RE).

### Path B started 2026-06-23: ExchangeKey ECDH works; cleared 2 of 3 layers

Implemented `HistoryService.ExchangeKey` as a **pure-managed P-256 ECDH** key exchange
(`HistorianNativeHandshake.BuildExchangeKeyClientHello` / `DeriveExchangeKeySecret`, .NET
`ECDiffieHellman` over `nistP256`; wire format `"ECK1" + u32(32) + X(32) + Y(32)`) and wired it into
`HistorianGrpcHandshake.OpenSession(eventConnection: true)` ahead of the v8 `OpenConnection`,
on the same context-key handle. Live result against the server: the **`ExchangeKey` RPC succeeds**
(the server accepted our public key), and the v8 `OpenConnection` error **moved one layer deeper**:

```
Path A (no ExchangeKey):    132/34   "Failed to get client key"
Path B (ExchangeKey ECDH):  132/171  AuthenticationFailed "EstablishConnection — Authentication failed"
```

So the ECDH cleared the client-key check; the remaining blocker is **authentication**: the 26-byte
v8 credential token must be a *valid* value derived from the ECDH shared secret (not zeros).

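
The `"ECK1" + u32(32) + X(32) + Y(32)` wire format is small enough to sketch in a few lines of Python for offline inspection of captured `btInput`/`btOutput` buffers. This only packs and unpacks the blob; it does not perform the key agreement. `build_eck1_blob` and `parse_eck1_blob` are hypothetical names, and the little-endian length field is an assumption consistent with the format string above.

```python
import struct

ECK1_MAGIC = b"ECK1"  # bytes 45 43 4b 31, i.e. 0x314B4345 read little-endian
COORD_LEN = 32        # P-256 coordinate size in bytes


def build_eck1_blob(x: bytes, y: bytes) -> bytes:
    """Pack a P-256 public point as magic + u32 coordinate length + X + Y."""
    if len(x) != COORD_LEN or len(y) != COORD_LEN:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return ECK1_MAGIC + struct.pack("<I", COORD_LEN) + x + y


def parse_eck1_blob(blob: bytes):
    """Return (X, Y) from an ECK1 blob, validating magic and lengths."""
    if blob[:4] != ECK1_MAGIC:
        raise ValueError("not an ECK1 blob")
    (coord_len,) = struct.unpack_from("<I", blob, 4)
    if coord_len != COORD_LEN or len(blob) != 8 + 2 * coord_len:
        raise ValueError("unexpected coordinate length or blob size")
    return blob[8:8 + coord_len], blob[8 + coord_len:]
```

The packed size is 4 + 4 + 32 + 32 = 72 bytes, which matches the 72-byte `ExchangeKey` buffers seen in the capture.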
### Token crypto traced 2026-06-23 (Frida → Windows CNG): KDF found, token construction still open

Hooked Windows CNG (`bcrypt.dll`/`ncrypt.dll`) while the native harness ran a real ExchangeKey
(`scripts/frida/aahclientmanaged-cng-exchangekey.js` + `artifacts/.../cng-trace.py`). Findings:

- **The ECDH + KDF are standard CNG, driven by managed `System.Security.Cryptography.ECDiffieHellmanCng`**
  (backtrace top frame = `System.Core.ni.dll`; the caller is aahClientManaged's C++/CLI `<Module>`):
  `NCryptSecretAgreement` (P-256) → `NCryptDeriveKey(KDF=HASH, HASH_ALGORITHM=SHA256, 32 bytes)`. So the
  derived key = **SHA256(ECDH shared secret)**, exactly `ECDiffieHellmanCng{ KeyDerivationFunction=Hash,
  HashAlgorithm=SHA256 }.DeriveKeyMaterial(...)`. Our managed `DeriveExchangeKeySecret` should switch to
  this (SHA256 of the raw agreement) to match.
- **`"ECK1"` is NOT AVEVA-custom**: it is the standard Windows CNG `BCRYPT_ECCPUBLIC_BLOB` magic for
  an ECDH P-256 public key (`NCryptExportKey`/`NCryptImportKey` emit exactly
  `ECK1 + len(32) + X(32) + Y(32)`), confirming our `BuildExchangeKeyClientHello` wire format is correct.
- **The 26-byte token is a custom construction that is not yet reproduced.** Correlated one run's
  derived key (`SHA256(secret)`) with that run's token (from the IL openParameters capture): a
  528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the derived key × request slices ×
  creds) found **no match**, and the token matches **none** of the traced hash digests. The token
  starts with a constant `0x8e` marker in both captured runs (so it is structured, not raw cipher
  output). It is built in managed code between the `DeriveKeyMaterial` call and the openParameters
  assembly.

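
For orientation, the KDF finding and the style of offline search described above can be sketched with the Python standard library. This is a minimal illustration, not the 528-candidate cracker: the candidate set here covers only a few hash/HMAC shapes (no AES modes), `derive_key`, `candidate_tokens` and `find_match` are hypothetical names, and the assumption that the raw agreement is hashed as-is (no prepend/append, big-endian secret) should be checked against the CNG trace.

```python
import hashlib
import hmac


def derive_key(shared_secret: bytes) -> bytes:
    """KDF=HASH with SHA-256: the derived key is SHA256 over the raw agreement."""
    return hashlib.sha256(shared_secret).digest()


def candidate_tokens(key: bytes, inputs: dict):
    """Yield (label, bytes) guesses built from the derived key and known inputs."""
    yield "sha256(key)", hashlib.sha256(key).digest()
    for name, data in inputs.items():
        yield f"hmac-sha256(key, {name})", hmac.new(key, data, hashlib.sha256).digest()
        yield f"sha256(key || {name})", hashlib.sha256(key + data).digest()
        yield f"sha256({name} || key)", hashlib.sha256(data + key).digest()


def find_match(token: bytes, key: bytes, inputs: dict, marker_len: int = 1):
    """Labels whose output starts with the token, with or without its marker."""
    body = token[marker_len:]  # drop the leading 0x8e-style marker
    return [label for label, out in candidate_tokens(key, inputs)
            if out.startswith(body) or out.startswith(token)]
```

An empty result from `find_match` only rules out the shapes that were tried, which is why the notes treat the token as an unknown custom construction and not as proof of any particular primitive.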

**Next step:** ILSpy cannot decompile the mixed-mode assembly (full-assembly and `<Module>` both crash,
exit 70). Use **dnlib** (IL-level, won't choke on the native parts) to dump the `<Module>` method that
references `ECDiffieHellmanCng.DeriveKeyMaterial` and read the post-derive token construction, then
implement it managed-side and re-test (non-destructive).

**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The
token-loop routing guardrail (`HistorianGrpcHandshakeRoutingTests`) was scoped to the closure so the
legitimate ExchangeKey call is allowed while still pinning that the Negotiate token loop never routes
there.