docs(grpc-events): document the server-side/connection angle for next session

Records the row-retrieval pickup now that the v8 ExchangeKey auth is solved and the
gap is proven connection-level (not client payload):
- grpc-event-query-capture.md: a "NEXT SESSION — the server-side / connection angle"
  section — what's already proven (don't redo), the in-place tooling, and ordered,
  testable hypotheses (HTTP/2 vs gRPC-Web transport [leading], TLS client cert,
  HTTP/2 frame capture, SQL event-store scoping).
- handoff.md item 1: updated to "v8 auth solved; row retrieval connection-gated",
  pointing at the NEXT SESSION section; the "to move any item" summary updated.

Doc-only; sanitization scan clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
Joseph Doherty
2026-06-23 12:37:22 -04:00
parent 9a25fa4ef7
commit 88287a8c66
2 changed files with 74 additions and 24 deletions
@@ -333,6 +333,53 @@ unfixable by, client payload matching.** Closing it needs server-side insight or
hard wall — pure managed, golden-tested, auth live-verified. Orchestrator stays on the no-row throw;
gated test unchanged.
### NEXT SESSION — the server-side / connection angle (row retrieval pickup)
Client payloads are exhausted (byte-identical to the native, proven above). The next investigation is
**connection-level**, not wire-payload. Pursue in roughly this order; each is concrete and testable.
**Already proven — do NOT redo:** auth works (ExchangeKey ECDH + RC4 token, live-verified); v8
`openParameters`, all handles (str/uint), `StartEventQuery` request, registration (`RTag/EnsT=True` +
order), `queryRequestType=3`, gzip header — all byte-match the native. Events exist (native returns 50
*now*). The event RPCs succeed over our transport and return a valid version-11 **rowCount-0** (not a
transport error). So the server scopes 0 events to *our* connection specifically.
**Tooling already in place:** opt-in diagnostic test `EventReadDiagnostic_OverGrpc_PrintsJourney`
(env `HISTORIAN_GRPC_EVENT_DIAG=1`, prints registration outcomes, handles, result hex, v8 buffer);
the `capture-event` harness scenario (native, returns rows); `instrument-grpc-nonstream` now logs
string/uint handle fields too; the CNG Frida hook. Live recipe: set `HISTORIAN_GRPC_HOST`/`_PORT
32565`/`_TLS true`/`_DNSID` to the 2023 R2 server + domain creds (strip quotes); reach the box per the
live-server access reference.
1. **Transport: native `Grpc.Core` HTTP/2 vs our `Grpc.Net.Client` + `GrpcWebHandler` (gRPC-Web).**
This is the leading hypothesis — the strongest remaining difference. Reads work over gRPC-Web *and*
return rows, so gRPC-Web isn't broken in general; but events are connection-scoped and may require a
**native HTTP/2** connection. TEST: build the event channel WITHOUT the `GrpcWebHandler` wrap (plain
HTTP/2 `GrpcChannel`) in `HistorianGrpcChannelFactory` for the event path only, and re-run the
diagnostic. If rows flow → gate found. (Mind TLS/ALPN over the loopback tunnel — may need
`HttpVersion = 2.0`/`HttpVersionPolicy.RequestVersionExact`.)
2. **TLS client identity / certificate.** The native used `SecurityMode=TransportCertificate`. Determine
whether it presents a **client certificate** the server uses to scope events (our SDK presents none —
`AllowUntrustedServerCertificate=true`, server cert only). TEST: capture the TLS handshake (e.g.
`SSLKEYLOGFILE` + Wireshark, or a decrypting proxy) for a native `capture-event` run and check the
Certificate message; if a client cert is presented, replicate it.
3. **HTTP/2-level capture.** The byte[]/handle capture is RPC-payload only. Capture the actual HTTP/2
frames (HEADERS/SETTINGS/stream IDs, connection reuse) for the native run vs ours — via a
TLS-decrypting mitm on the loopback forward — to see any connection-level header/affinity our capture
can't see.
4. **Server-side ground truth.** Via the SOCKS→SQL relay (user-authorized, read-only), inspect the
`Runtime.dbo.Events` schema for any per-connection / per-client / source-session column that would
explain why the server returns the rows to the native connection but not ours. Also check whether the
StorageService/event-store path has a connection-scoping notion the History-service event query
depends on.
If hypotheses 1-4 don't crack it, the realistic conclusion is that gRPC event-row retrieval has a
server-side connection-identity dependency not reachable from a pure-managed client, and it stays
documented as auth-solved / retrieval-connection-gated.
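A minimal sketch of the transport test in hypothesis 1, with an optional hook for hypothesis 2. It is illustrative only: `EventChannelSketch`, its parameters, and the `GrpcWebMode.GrpcWeb` choice are assumptions, since the real `HistorianGrpcChannelFactory` is not reproduced here. It also assumes a `Grpc.Net.Client` version recent enough to expose `GrpcChannelOptions.HttpVersion` and `HttpVersionPolicy`.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;
using Grpc.Net.Client;
using Grpc.Net.Client.Web;

// Hypothetical sketch: every name below is illustrative, not the SDK's actual factory.
public static class EventChannelSketch
{
    public static GrpcChannel Create(
        Uri address,
        bool useGrpcWeb,
        bool allowUntrustedServerCertificate,
        X509Certificate2? clientCertificate = null)
    {
        var sockets = new SocketsHttpHandler();

        if (allowUntrustedServerCertificate)
        {
            // Diagnostic use only: accept whatever certificate the server presents.
            sockets.SslOptions.RemoteCertificateValidationCallback = (_, _, _, _) => true;
        }

        if (clientCertificate is not null)
        {
            // Hypothesis 2: present a client certificate, if the native capture shows one.
            sockets.SslOptions.ClientCertificates =
                new X509CertificateCollection { clientCertificate };
        }

        var options = new GrpcChannelOptions();
        if (useGrpcWeb)
        {
            // Current path: gRPC-Web wrapping (the mode here is an assumption).
            options.HttpHandler = new GrpcWebHandler(GrpcWebMode.GrpcWeb, sockets);
        }
        else
        {
            // Test path: plain HTTP/2 with no downgrade, so a failed ALPN
            // negotiation surfaces as an error instead of a silent fallback.
            options.HttpHandler = sockets;
            options.HttpVersion = HttpVersion.Version20;
            options.HttpVersionPolicy = HttpVersionPolicy.RequestVersionExact;
        }

        return GrpcChannel.ForAddress(address, options);
    }
}
```

Building the event channel with `useGrpcWeb: false` and re-running the diagnostic keeps the read path untouched, so a change in row count can be attributed to the transport alone.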
**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The