diff --git a/docs/reverse-engineering/grpc-event-query-capture.md b/docs/reverse-engineering/grpc-event-query-capture.md
index 131cf7b..c08c1fb 100644
--- a/docs/reverse-engineering/grpc-event-query-capture.md
+++ b/docs/reverse-engineering/grpc-event-query-capture.md
@@ -333,6 +333,53 @@ unfixable by, client payload matching.** Closing it needs server-side insight or
 hard wall — pure managed, golden-tested, auth live-verified. Orchestrator stays on the no-row throw;
 gated test unchanged.
 
+### NEXT SESSION — the server-side / connection angle (row retrieval pickup)
+
+Client payloads are exhausted (byte-identical to the native, proven above). The next investigation is
+**connection-level**, not wire-payload. Pursue in roughly this order; each is concrete and testable.
+
+**Already proven — do NOT redo:** auth works (ExchangeKey ECDH + RC4 token, live-verified); v8
+`openParameters`, all handles (str/uint), `StartEventQuery` request, registration (`RTag/EnsT=True` +
+order), `queryRequestType=3`, gzip header — all byte-match the native. Events exist (native returns 50
+*now*). The event RPCs succeed over our transport and return a valid version-11 **rowCount-0** (not a
+transport error). So the server scopes 0 events to *our* connection specifically.
+
+**Tooling already in place:** opt-in diagnostic test `EventReadDiagnostic_OverGrpc_PrintsJourney`
+(env `HISTORIAN_GRPC_EVENT_DIAG=1`, prints registration outcomes, handles, result hex, v8 buffer);
+the `capture-event` harness scenario (native, returns rows); `instrument-grpc-nonstream` now logs
+string/uint handle fields too; the CNG Frida hook. Live recipe: set `HISTORIAN_GRPC_HOST`/`_PORT
+32565`/`_TLS true`/`_DNSID` to the 2023 R2 server + domain creds (strip quotes); reach the box per the
+live-server access reference.
+
+1. **Transport: native `Grpc.Core` HTTP/2 vs our `Grpc.Net.Client` + `GrpcWebHandler` (gRPC-Web).**
+   This is the leading hypothesis — the strongest remaining difference.
Reads work over gRPC-Web *and*
+   return rows, so gRPC-Web isn't broken in general; but events are connection-scoped and may require a
+   **native HTTP/2** connection. TEST: build the event channel WITHOUT the `GrpcWebHandler` wrap (plain
+   HTTP/2 `GrpcChannel`) in `HistorianGrpcChannelFactory` for the event path only, and re-run the
+   diagnostic. If rows flow → gate found. (Mind TLS/ALPN over the loopback tunnel — may need
+   `HttpVersion = 2.0`/`HttpVersionPolicy.RequestVersionExact`.)
+
+2. **TLS client identity / certificate.** The native used `SecurityMode=TransportCertificate`. Determine
+   whether it presents a **client certificate** the server uses to scope events (our SDK presents none —
+   `AllowUntrustedServerCertificate=true`, server cert only). TEST: capture the TLS handshake (e.g.
+   `SSLKEYLOGFILE` + Wireshark, or a decrypting proxy) for a native `capture-event` run and check the
+   Certificate message; if a client cert is presented, replicate it.
+
+3. **HTTP/2-level capture.** The byte[]/handle capture is RPC-payload only. Capture the actual HTTP/2
+   frames (HEADERS/SETTINGS/stream IDs, connection reuse) for the native run vs ours — via a
+   TLS-decrypting mitm on the loopback forward — to see any connection-level header/affinity our capture
+   can't see.
+
+4. **Server-side ground truth.** Via the SOCKS→SQL relay (user-authorized, read-only), inspect the
+   `Runtime.dbo.Events` schema for any per-connection / per-client / source-session column that would
+   explain why the server returns the rows to the native connection but not ours. Also check whether the
+   StorageService/event-store path has a connection-scoping notion the History-service event query
+   depends on.
+
+If 1–4 don't crack it, the realistic conclusion is that gRPC event-row retrieval has a server-side
+connection-identity dependency not reachable from a pure-managed client, and it stays documented as
+auth-solved / retrieval-connection-gated.
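Hypothesis 3 above calls for comparing HTTP/2 frames between the native run and ours. As a minimal sketch of that comparison, assuming the TLS-decrypted bytes of one direction of each connection have already been dumped, the 9-byte frame headers defined by the HTTP/2 specification can be walked in Python. The names `Frame`, `iter_frames`, and `summarize` are hypothetical helpers for illustration; they are not part of the SDK or the existing harness.

```python
import struct
from typing import Iterator, List, NamedTuple

# Frame type codes as defined by the HTTP/2 specification (RFC 9113, section 6).
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

# 24-byte client connection preface that precedes the first client frame.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


class Frame(NamedTuple):
    type_name: str
    flags: int
    stream_id: int
    length: int


def iter_frames(data: bytes) -> Iterator[Frame]:
    """Yield frame headers from one direction of a decrypted HTTP/2 stream."""
    # Skip the preface if this is the client-to-server direction.
    pos = len(CLIENT_PREFACE) if data.startswith(CLIENT_PREFACE) else 0
    while pos + 9 <= len(data):
        # Header layout: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id.
        len_hi, len_lo, ftype, flags, stream_id = struct.unpack_from(">BHBBI", data, pos)
        length = (len_hi << 16) | len_lo
        stream_id &= 0x7FFFFFFF  # the top bit is reserved
        name = FRAME_TYPES.get(ftype, f"UNKNOWN(0x{ftype:02x})")
        yield Frame(name, flags, stream_id, length)
        pos += 9 + length  # jump over the payload to the next header


def summarize(data: bytes) -> List[str]:
    """One line per frame, suitable for diffing the native capture against ours."""
    return [
        f"{f.type_name} stream={f.stream_id} len={f.length} flags=0x{f.flags:02x}"
        for f in iter_frames(data)
    ]


if __name__ == "__main__":
    # Synthetic sample: preface, an empty SETTINGS frame, then a 3-byte HEADERS frame.
    sample = (
        CLIENT_PREFACE
        + bytes([0, 0, 0, 0x4, 0x0, 0, 0, 0, 0])
        + bytes([0, 0, 3, 0x1, 0x4, 0, 0, 0, 1]) + b"\x82\x86\x84"
    )
    for line in summarize(sample):
        print(line)
```

Running `summarize` over both dumps and diffing the two lists would surface differences in SETTINGS, stream IDs, or connection reuse without decoding HPACK. Header contents would still need a full HTTP/2 dissector such as Wireshark.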
+
 **2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to
 a specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed;
 the orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The
diff --git a/docs/reverse-engineering/handoff.md b/docs/reverse-engineering/handoff.md
index 47237c7..9092cbb 100644
--- a/docs/reverse-engineering/handoff.md
+++ b/docs/reverse-engineering/handoff.md
@@ -38,26 +38,28 @@ reuses the proven 2020 WCF byte serializers/parsers unchanged inside protobuf
 
 **Everything still open is gated — none is a pure-code task:**
 
-1. **gRPC event ROW retrieval** (`ReadEventsAsync` #2) — **CAPTURED + DIAGNOSED
-   2026-06-22 (merged `8ad160b`); remaining gate is now a scoped RE+impl effort,
-   not a capture.** The `capture-event` harness scenario drove the stock 2023 R2
-   client over an Event-type gRPC connection and captured it reading **50 events**
-   (the live server is event-bearing — SQL ground truth via the INSQL linked
-   server: `Runtime.dbo.Events` = 19,356 rows/30d, 90,944/365d). Two gaps vs the
-   SDK: **(a)** the working `StartEventQuery` request is **version 6** (byte 0 =
-   `06` + a 5-byte trailing pad), the SDK sent v5 — **SHIPPED**
-   (`HistorianEventQueryProtocol` `version` param, default 5 = WCF; the gRPC
-   orchestrator passes 6; golden-tested). **(b) the real gate:** rows flow only on
-   an **Event-type connection**. The native `OpenConnection.openParameters` is
-   format **v8** with a `ConnectionType` byte (Event `01` / Process `02`) right
-   after `ClientType`; the SDK's **v6** Open2 buffer has no such field (it writes
-   `ClientType` then `ConnectionMode` back-to-back), so the read-capable v6
-   connection cannot be marked as Event. `ConnectionMode` is not the lever (2020
-   WCF events work at `0x402`).
Making event rows flow needs the SDK to emit the
-   native v8 `OpenConnection` with `ConnectionType=Event` (and likely the
-   `ExchangeKey` cert auth path) — a substantial follow-on. Full evidence +
-   byte-level diagnosis: `docs/reverse-engineering/grpc-event-query-capture.md`.
-   Gated test still pins the no-row throw; bounded ≤30s.
+1. **gRPC event ROW retrieval** (`ReadEventsAsync` #2) — **v8 EVENT-CONNECTION AUTH
+   SOLVED 2026-06-23 (merged `9a25fa4`); row retrieval is now CONNECTION-gated, not
+   a payload gap.** The v8 OpenConnection crypto wall is fully cracked + live-verified:
+   the event connection authenticates via **`HistoryService.ExchangeKey` (P-256 ECDH)
+   → client key = `SHA256(shared secret)` → credential token = `RC4(password-UTF16LE,
+   key=MD5(client key))`** (the native `HistorianCrypto.NRC4_V2.aahCryptV2` MD5-keyed
+   RC4 scheme). RE'd via Frida CNG hooks + dnlib IL extraction + an offline cracker;
+   implemented pure-managed, golden-tested, auth live-PASSES (past `132/171
+   AuthenticationFailed`). The `StartEventQuery` v6 request and the Event-type v8
+   `OpenConnection` (`ConnectionType=Event`) are also shipped. **BUT** the event query
+   still returns version-11 **rowCount-0** while the native returns 50 for a
+   BYTE-IDENTICAL request. Exhaustively proven NOT a client-payload issue: v8
+   `openParameters`, all str/uint handles, the request, registration (`RTag/EnsT=True` +
+   order), `queryRequestType=3`, gzip header — **all byte-match the native**. The
+   event RPCs succeed and return a valid EMPTY result (not a transport error), so it's
+   a **connection/server-level scoping difference** (session affinity tied to the
+   native `Grpc.Core` HTTP/2 connection or a connection identity).
**Next session: see
+   the "NEXT SESSION — the server-side / connection angle" section of
+   `docs/reverse-engineering/grpc-event-query-capture.md`** (ordered, testable
+   hypotheses: HTTP/2-vs-gRPC-Web transport, TLS client cert, HTTP/2 frame capture, SQL
+   event-store scoping). Orchestrator stays on the no-row throw; `eventConnection: true`
+   is wired; opt-in `EventReadDiagnostic` test (`HISTORIAN_GRPC_EVENT_DIAG=1`).
 2. **R4.3 active-SF magnitude** — needs an **SF-active server** (D2 storage-engine
    console handle).
 3. **SendEvent over gRPC** — **capture-gated**: no distinct RPC, framing uncaptured.
@@ -75,10 +77,11 @@ reuses the proven 2020 WCF byte serializers/parsers unchanged inside protobuf
 8. **Deferred-by-design** items (`write-commands` D1–D3, non-analog tag create, etc.) —
    bounded out until an explicit customer/user demand signal.
 
-To move any remaining item you need a **scoped RE+impl effort** (the v8
-Event-type `OpenConnection` — item 1, already captured + diagnosed), a **fresh
-native capture** (SendEvent gRPC framing — item 3), a **different server**
-(SF-active for item 2), or a **demand signal** to unlock a deferred item.
+To move any remaining item you need a **server-side / connection-level angle**
+(item 1 — v8 event auth is solved; row retrieval is connection-gated, see the
+NEXT SESSION section of `grpc-event-query-capture.md`), a **fresh native capture**
+(SendEvent gRPC framing — item 3), a **different server** (SF-active for item 2),
+or a **demand signal** to unlock a deferred item.
 
 Live-server gRPC probe recipe: set `HISTORIAN_GRPC_HOST`/`_PORT 32565`/`_TLS true`/`_DNSID` +
 domain creds (strip quotes — `reference_wonder_sql_vd03_credentials`) and run the gated