diff --git a/docs/reverse-engineering/grpc-event-query-capture.md b/docs/reverse-engineering/grpc-event-query-capture.md
new file mode 100644
index 0000000..17de437
--- /dev/null
+++ b/docs/reverse-engineering/grpc-event-query-capture.md
@@ -0,0 +1,141 @@
# gRPC event-query capture (2026-06-22): the StartEventQuery request that returns rows

Captured the stock 2023 R2 client performing a **gRPC event read** that returns rows, to resolve
the open item "gRPC event ROW retrieval returns zero rows" (handoff §Current Status item 1). This
closes the capture gate: the working request shape is now known.

## How it was captured

`tools/AVEVA.Historian.Grpc2023CaptureHarness` gained a `capture-event` scenario. It loads the
self-contained mixed-mode 2023 R2 `aahClientManaged.dll` and drives `HistorianAccess`:

```
OpenConnection(ConnectionMode=Historian /*gRPC*/, ConnectionType=Event, ReadOnly=true)
  -> CreateEventQuery()                        // non-null only on an Event connection
  -> EventQueryArgs { StartDateTime, EndDateTime, EventCount }
  -> EventQuery.StartQuery(args)               // => GrpcRetrievalClient.StartEventQuery(requestBuffer)
  -> loop EventQuery.MoveNext() / QueryResult  // => GrpcRetrievalClient.GetNextEventQueryResultBuffer
  -> EventQuery.EndQuery() -> CloseConnection
```

The existing wide-net `instrument-grpc-nonstream` IL rewrite (which instruments every `Grpc*Client`
`byte[]` method) already covers `GrpcRetrievalClient.StartEventQuery.requestBuffer` (on entry) and
`GetNextEventQueryResultBuffer.result` (on exit), so no new instrument command was needed. The
scenario ran read-only (non-destructive) against the live 2023 R2 server over the loopback tunnel.
The IL rewrite output and the capture NDJSON stay under
`artifacts/reverse-engineering/grpc-event-capture/`, which is gitignored because the result buffer
carries event identity data.

Result: **50 events returned over gRPC** (Alarm.Set / Alarm.Clear rows), proving the path works when
driven through an Event connection.

## Two findings

### 1. The event read needs an **Event-type connection** (`ConnectionIndex 1`)

`HistorianAccess.CreateEventQuery()` returns `null` unless `IsEventConnectionRequested()` is true,
i.e. unless the connection was opened with `ConnectionType=Event`. The native client routes such a
connection to a *separate* connection (`ConnectionIndex 1`) from the process/data path. The full
captured call sequence on that connection: `OpenConnection` → `ExchangeKey` → `UpdateClientStatus` →
`RegisterTags`(CM_EVENT) → `EnsureTags`(CM_EVENT) → `GetHistorianInfo` + 7×`GetSystemParameter`
(Stat priming) → `StartEventQuery` → `GetNextEventQueryResultBuffer` (rows) → `EndEventQuery` →
`CloseConnection`.

### 2. The working `StartEventQuery` request is **version 6**, not 5

Our SDK's `HistorianEventQueryProtocol.CreateNativeFilterAttempt` builds a **version-5** empty-filter
buffer; the stock 2023 R2 client sends **version 6**. Diffed byte-for-byte (same query window and
eventCount), the two buffers are **identical except**:

- **byte 0: version `06` vs `05`**
- **5 additional trailing zero bytes** (stock = 70 bytes, SDK v5 = 65 bytes)

The stock client's v6 request returns rows; the SDK's v5 request returns **zero rows**. The v5 request
is *accepted* (`StartEventQuery` succeeds and yields a query handle), but
`GetNextEventQueryResultBuffer` then matches nothing. Everything else is shared: the two query-window
FILETIMEs, `UInt32 eventCount`, the `UInt32 65536` buffer hint, the `"UTC"` `HistorianString`, and
the `01 01 00 00 01 00 00 01 00 00` metadata-namespace block.
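Since the two versions differ only in the version word and the trailing padding, one serializer can produce both. The Python sketch below assembles the empty-filter request from the shared fields listed above, following the offsets of the captured 70-byte layout in this note. It is not the SDK's `HistorianEventQueryProtocol` code: `build_start_event_query` and `historian_string` are hypothetical names, and little-endian integers plus a character-count `HistorianString` prefix are assumptions (consistent with the `06` version byte and `len=3` for `"UTC"`). The FILETIME values in the demo are arbitrary, not captured.

```python
import struct

# Illustrative sketch only (hypothetical names, not the SDK's API). Offsets follow the
# captured 70-byte v6 layout; little-endian encoding is assumed throughout.

BUFFER_SIZE_HINT = 65536
# Metadata-namespace block as captured: a 01 marker followed by three 01 00 00 groups.
METADATA_NAMESPACE_BLOCK = bytes([0x01, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00])


def historian_string(value: str) -> bytes:
    """UInt32 character count followed by UTF-16LE code units (assumed encoding)."""
    encoded = value.encode("utf-16-le")
    return struct.pack("<I", len(encoded) // 2) + encoded


def build_start_event_query(start_filetime: int, end_filetime: int,
                            event_count: int, version: int = 6) -> bytes:
    """Empty-filter StartEventQuery request; v5 omits the 5 extra trailing zeros."""
    if version not in (5, 6):
        raise ValueError(f"unsupported request version {version}")
    buf = bytearray()
    buf += struct.pack("<H", version)           # [0..1]   version
    buf += struct.pack("<q", start_filetime)    # [2..9]   query window start (FILETIME)
    buf += struct.pack("<q", end_filetime)      # [10..17] query window end (FILETIME)
    buf += struct.pack("<I", event_count)       # [18..21] eventCount
    buf += struct.pack("<I", 0)                 # [22..25]
    buf += struct.pack("<H", 0)                 # [26..27]
    buf += struct.pack("<H", 1)                 # [28..29]
    buf += bytes(7)                             # [30..36] empty-filter block
    buf += struct.pack("<I", BUFFER_SIZE_HINT)  # [37..40] buffer-size hint
    buf += historian_string("UTC")              # [41..50] "UTC" HistorianString
    buf += METADATA_NAMESPACE_BLOCK             # [51..60] metadata-namespace block
    buf += bytes(9 if version == 6 else 4)      # terminal zeros: 9 for v6, 4 for v5
    return bytes(buf)


if __name__ == "__main__":
    v6 = build_start_event_query(133_000_000_000_000_000, 133_000_000_600_000_000, 50)
    v5 = build_start_event_query(133_000_000_000_000_000, 133_000_000_600_000_000, 50, version=5)
    print(len(v6), len(v5))
```

Parameterizing on `version` mirrors how the SDK keeps v5 for WCF and v6 for gRPC in `CreateStartEventQueryAttempts`, while the shared prefix lives in one place; the two outputs should differ only at byte 0 and in v6's last five bytes.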
Captured v6 request layout (70 bytes; the FILETIME fields hold only the harness query window, so
there is no identity data):

```
[0..1]   UInt16 version = 6            // v5 requests carry 5 here
[2..9]   Int64  startUtc (FILETIME)
[10..17] Int64  endUtc   (FILETIME)
[18..21] UInt32 eventCount
[22..25] UInt32 0
[26..27] UInt16 0
[28..29] UInt16 1
[30..36] 7 bytes 0                     // empty-filter block
[37..40] UInt32 65536                  // buffer-size hint
[41..50] HistorianString "UTC"         // UInt32 len=3 + UTF-16LE
[51..60] 01 01 00 00 01 00 00 01 00 00 // metadata-namespace block (marker + 3 empty)
[61..69] 9 bytes 0                     // terminal padding (SDK v5 writes only 4 here)
```

## Fix part 1: v6 request (DONE, necessary)

`HistorianEventQueryProtocol.CreateStartEventQueryAttempts` gained a `version` parameter (default 5
for WCF/2020; the gRPC orchestrator passes 6). v6 emits the leading `06` and the 5-byte trailing pad;
the WCF path is unchanged (v5). The golden test `Version6EmptyFilterMatchesCapturedGrpcEnvelope` pins
the envelope, and 322/322 offline tests pass.

## Fix part 2: EVENT connection (the remaining gate, NOT yet implemented)

Live validation (2026-06-22): with the orchestrator now sending v6 to the live server, which does hold
events, `GetNextEventQueryResultBuffer` **still long-polls and returns zero rows** (the gated test
still throws). So **v6 is necessary but not sufficient**: the read also requires an **Event-type
connection**, which our SDK does not open.
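Isolating what makes a connection Event-type comes down to a masked byte diff: compare two captured `openParameters` buffers while skipping the regions that vary between any two sessions. A minimal Python sketch, assuming the masked ranges are the per-session auth regions ([22..37] and [68..93]) identified in the capture comparison in this section; `diff_masked` and `SESSION_REGIONS` are hypothetical names, and the demo uses synthetic buffers rather than the gitignored captures.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical helper (not part of the capture harness): compare two captured buffers
# byte by byte, skipping inclusive [start, end] ranges that change with every session.
Diff = Tuple[int, Optional[int], Optional[int]]


def diff_masked(a: bytes, b: bytes, masked: Iterable[Tuple[int, int]]) -> List[Diff]:
    """Return (offset, byte_in_a, byte_in_b) for every unmasked difference.

    A byte past the end of the shorter buffer is reported as None.
    """
    ranges = list(masked)
    diffs: List[Diff] = []
    for offset in range(max(len(a), len(b))):
        if any(start <= offset <= end for start, end in ranges):
            continue  # session-derived region: expected to differ, so ignore it
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            diffs.append((offset, x, y))
    return diffs


# Per-session auth GUID / credential-hash regions observed in the openParameters captures.
SESSION_REGIONS = [(22, 37), (68, 93)]

if __name__ == "__main__":
    # Synthetic 302-byte buffers, not captured data.
    process = bytearray(302)
    event = bytearray(302)
    process[95], process[96] = 0x02, 0x00
    event[95], event[96] = 0x01, 0x01
    event[30] = 0xAA  # inside a masked region, so it is not reported
    print(diff_masked(bytes(process), bytes(event), SESSION_REGIONS))
```

Masking by explicit ranges keeps the noise filter auditable: anything outside the listed regions that differs is reported, so an unexpected session-dependent byte would surface rather than be silently ignored.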
Isolated by diffing the captured `OpenConnection.openParameters` (302 bytes, native format v8) of a
**Process** connection (`connect` scenario) against the **Event** connection (`capture-event`). Aside
from the per-session auth GUID/credential-hash regions ([22..37] and [68..93], which vary between any
two sessions), the buffers differ in only **two structural bytes**:

| Offset | Process (`connect`) | Event (`capture-event`) |
|--------|---------------------|-------------------------|
| 95     | `02`                | `01`                    |
| 96     | `00`                | `01`                    |

Byte 95 is the `HistorianConnectionType` (Process vs Event; the native event path runs on
`ConnectionIndex 1`), and byte 96 is a flag that is `01` on the Event connection and `00` on Process.
The problem: our SDK opens the session with the **2020 OpenConnection3 v6** buffer
(`HistorianNativeHandshake.BuildOpenConnection3Request`, `connectionMode 0x402`), which the 2023 R2
server accepts for reads but which carries no event-connection-type marker. `connectionMode` is NOT
the discriminator (2020 WCF event reads work with `0x402`); the native client distinguishes event from
process via this separate `ConnectionType` field in its v8 `openParameters`.

### Diagnosis (2026-06-22): the v6 Open2 format cannot express an event connection

Decoded the native `openParameters` (302 bytes): **byte 0 = `08` (format version 8)**. The buffer
carries a context GUID, the username, machine/client-node/datasource strings, and a 26-byte
session-derived region ([68..93]); **[94] `ClientType=04`** is immediately followed by **[95]
`ConnectionType` (`01`=Event / `02`=Process)** and **[96] a flag (`01` on Event / `00` on Process)**,
then the remaining fields.

Our SDK builds the **v6** buffer (`HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6`,
byte 0 = `06`): it writes `ClientType` (1 byte) **immediately followed by `ConnectionMode` (uint)**, so
there is **no `ConnectionType` byte at all**. The v8 format *inserts* `ConnectionType` and the flag
byte between `ClientType` and the fields that follow it.
So the v6 buffer the SDK sends (accepted by the 2023 R2 server for *reads*) structurally cannot mark
the connection as Event, and the evidence so far indicates the server returns event rows only on an
Event connection.

Two further obstacles to simply emitting v8:
- The native client authenticated via **`ExchangeKey`** (cert path; 72-byte `btInput`/`btOutput` in
  the capture), whereas the SDK's gRPC handshake uses **`ValidateClientCredential`** (Negotiate). The
  v8 `openParameters` [68..93] region is session-derived and tied to that auth flow.
- There is no shortcut through `ConnectionMode`: 2020 WCF event reads work at `0x402`, so it is not
  the lever. `ConnectionType` is a distinct field that only exists from format v8.

Also confirmed a secondary format gap: the native gRPC `EnsureTags` CM_EVENT payload is **86 bytes**
vs **83 bytes** from the SDK's `SerializeCmEventCTagMetadata` (a 3-byte 2023 R2 bump, analogous to
the event-query v5→v6 change). This is likely benign on its own (CM_EVENT already exists, and on 2020
EnsT2 returns a benign false yet events still flow), but it should be matched if the event open is
ever rebuilt.

**Conclusion: the event-connection gate is NOT a tweak.** Making event rows flow over gRPC requires
the SDK to emit the native **v8 `OpenConnection` format** with `ConnectionType=Event` (a 302-byte
buffer whose layout differs from the v6 buffer and includes a session-derived auth region), and
likely to adopt the `ExchangeKey` cert auth path. That is a substantial RE and implementation effort,
comparable to the original Open2 work, so it is scoped as a follow-on rather than a quick fix. Until
then, the gated `ReadEventsAsync_OverGrpc_*` test correctly still pins the no-row throw, and **v6
(part 1) is retained as the captured-correct request format** for when the open is rebuilt.

Capture artifacts (gitignored) live under `artifacts/reverse-engineering/grpc-event-capture/`:
`event-capture.ndjson` (Event) and `process-connect-2.ndjson` (Process).
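When the v8 open is eventually rebuilt, a check like the following could confirm that an emitted or captured `openParameters` buffer carries the Event marker. It is a Python sketch that only reads bytes; it assumes the offsets 94/95/96 hold for every v8 buffer, although they are derived from just the two captures here. `read_connection_type` and the `ConnectionType` enum are hypothetical names, not the native client's types. It deliberately does not build a connectable buffer, since the [68..93] region is tied to the `ExchangeKey` auth flow.

```python
from enum import IntEnum
from typing import Tuple

# Illustrative decoder for the discriminating bytes in a captured v8 openParameters
# buffer. Offsets and values come from the Event/Process captures; the names are
# hypothetical, and nothing here produces a valid session (the auth region is untouched).

OPEN_PARAMETERS_FORMAT_V8 = 0x08
CLIENT_TYPE_OFFSET = 94
CONNECTION_TYPE_OFFSET = 95
FLAG_OFFSET = 96


class ConnectionType(IntEnum):
    EVENT = 0x01    # capture-event scenario
    PROCESS = 0x02  # connect scenario


def read_connection_type(open_parameters: bytes) -> Tuple[int, ConnectionType, int]:
    """Return (client_type, connection_type, flag) from a v8 openParameters buffer."""
    if len(open_parameters) <= FLAG_OFFSET:
        raise ValueError("buffer too short to contain the ConnectionType bytes")
    version = open_parameters[0]
    if version != OPEN_PARAMETERS_FORMAT_V8:
        # The v6 buffer writes ClientType followed directly by ConnectionMode, so it has
        # no ConnectionType byte; other versions are simply not decoded here.
        raise ValueError(f"format version {version} is not the decoded v8 layout")
    client_type = open_parameters[CLIENT_TYPE_OFFSET]
    # ConnectionType(...) raises ValueError for byte values not seen in the captures.
    connection_type = ConnectionType(open_parameters[CONNECTION_TYPE_OFFSET])
    flag = open_parameters[FLAG_OFFSET]
    return client_type, connection_type, flag


if __name__ == "__main__":
    sample = bytearray(302)  # synthetic buffer, not captured data
    sample[0] = OPEN_PARAMETERS_FORMAT_V8
    sample[CLIENT_TYPE_OFFSET] = 0x04
    sample[CONNECTION_TYPE_OFFSET] = ConnectionType.EVENT
    sample[FLAG_OFFSET] = 0x01
    print(read_connection_type(bytes(sample)))
```

Rejecting non-v8 buffers outright reflects the diagnosis: a v6 buffer has no `ConnectionType` byte, so reading offset 95 from it would return an unrelated `ConnectionMode` byte rather than a connection type.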