Event-row parser: verify against the provided 2023 R2 client; fix latent multi-row bug

Used the provided stock client as an oracle to verify the event read path. The
capture-event harness returns 50 real events, and the instrument-grpc-nonstream rewrite
captured the exact GetNextEventQueryResultBuffer.result buffer (63,192 bytes, version
0x0B=11, rowCount 50 = 25 Alarm.Set + 25 Alarm.Clear). Feeding that real buffer through
HistorianEventRowProtocol.Parse exposed a latent parser bug.

The real buffer layout is: version(2) + rowCount(4) + headerField(4, =0x1E) followed by
MARKERLESS rows (rowFormat(2)=7 + filetime(8) + 8x u16 slots + compact-ascii type +
propCount + props). The parser wrongly treated the one-time 0x1E field as a per-row
marker and re-consumed [marker+format] for every row, so it decoded only the FIRST row
of any multi-row buffer and stopped. This is not gRPC-specific: the captured WCF v9
buffer has the identical 0900 <rowCount> 1E000000 0700 header, so the shipped WCF event
read had the same latent multi-row truncation.

Fix: read a 10-byte buffer header (skip the 0x1E field once) and parse markerless rows;
accept container version 9 (WCF) and 11 (gRPC), mirroring the interface-version gate that
accepts History 11 and 12.

Verified: the real 50-row buffer now decodes to exactly 50 events, ending cleanly at
end-of-buffer (Parse_RealStockClientCapture_DecodesAllEvents, gated on
HISTORIAN_EVENT_CAPTURE_NDJSON so it skips without the gitignored capture), plus a
synthetic v11 golden test. 328 offline tests pass.

The parse path is now verified against the provided client's real event data on both
transports; the only remaining gap for gRPC events is the server delivering rows to our
connection (the documented retrieval-server-gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
Joseph Doherty
2026-06-23 14:05:35 -04:00
parent 8199dde452
commit 8f4a188f78
3 changed files with 126 additions and 27 deletions
@@ -473,6 +473,30 @@ over gRPC keeps the honest no-row throw, and event reads use the WCF transport.
any future server-side investigation: the `httpcap` TLS-tee proxy, the `CreateHttp2` / `SPLIT_CHANNEL`
switches, the `EventReadDiagnostic` test, and the `capture-event` harness (native, returns rows).
### Verify the parse path against the provided client's real data (2026-06-23) — found + fixed a latent bug
Used the provided 2023 R2 client as an **oracle**: the `capture-event` harness returns 50 real events
(verified live + through the `httpcap` proxy), and the `instrument-grpc-nonstream` rewrite captured the
exact `GetNextEventQueryResultBuffer.result` buffer the stock client received — **63,192 bytes, version
`0x0B` (11), rowCount 50** (25 `Alarm.Set` + 25 `Alarm.Clear`). Feeding that real buffer through our
`HistorianEventRowProtocol.Parse`, to check that the read path decodes genuine gRPC event data,
**exposed a latent parser bug**:
- The real row buffer is `version(2) + rowCount(4) + headerField(4, =0x1E)` then **markerless rows**
(`rowFormat(2)=7 + filetime(8) + 8×u16 slots + compact-ascii type + propCount + props`). Our parser
wrongly treated the one-time `0x1E` field as a **per-row marker** and re-consumed `[marker+format]`
for every row. Because the second row starts directly with `rowFormat` and carries no marker, it
parsed only the **first** row of any multi-row buffer and stopped. This is **not
gRPC-specific**: the captured **WCF v9** buffer has the identical `0900 <rowCount> 1E000000 0700 …`
header, so the shipped WCF event read had the same latent multi-row truncation.
- **Fix:** read a 10-byte buffer header (skip the `0x1E` field once) and parse markerless rows; accept
container version **9 (WCF) and 11 (gRPC)**. Verified: the real 50-row buffer now decodes to exactly 50
events, ending cleanly at end-of-buffer (`Parse_RealStockClientCapture_DecodesAllEvents`, gated on
`HISTORIAN_EVENT_CAPTURE_NDJSON` so it skips when the gitignored capture is absent), plus a synthetic
v11 golden test. 328 offline tests pass.
So the **parse path is now verified against the provided client's real event data** — the one remaining
gap is strictly the server delivering rows to our gRPC connection (the working-set gate above). If that
gate were ever opened, the decoded events would now flow through correctly on both transports.
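
For readers who want to see the header-once, markerless-rows layout described above in executable form, here is a minimal Python sketch. It is not the project's `HistorianEventRowProtocol.Parse`; it only illustrates the byte offsets. The little-endian field order follows the `0900 <rowCount> 1E000000 0700` header quoted above. Everything else is an assumption made for illustration: the names `parse_event_buffer`, `demo_read_tail` and `build_demo_buffer` are hypothetical, the `rowFormat == 7` check is a choice of this sketch, and the tail encoding used by `demo_read_tail` (1-byte length plus ASCII text, no properties) is a stand-in, because the real compact-ascii type, `propCount` and property encodings are not specified in these notes.

```python
import struct

# Little-endian, no padding.
HEADER = struct.Struct("<HII")       # version(2) + rowCount(4) + headerField(4) = 10 bytes
ROW_PREFIX = struct.Struct("<HQ8H")  # rowFormat(2) + filetime(8) + 8 x u16 slots = 26 bytes
ACCEPTED_VERSIONS = {9, 11}          # 9 = WCF container, 11 = gRPC container
ROW_FORMAT = 7


def parse_event_buffer(buf, read_tail):
    """Read the 10-byte header once, then parse row_count markerless rows.

    read_tail(buf, offset) -> (value, new_offset) decodes the variable-length
    part of a row (type + properties). Its real encoding is outside this sketch.
    """
    version, row_count, _header_field = HEADER.unpack_from(buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported container version {version}")
    # The 0x1E field belongs to the buffer header: skip it once, not per row.
    offset = HEADER.size
    rows = []
    for _ in range(row_count):
        row_format, filetime, *slots = ROW_PREFIX.unpack_from(buf, offset)
        if row_format != ROW_FORMAT:  # sketch-only sanity check against misalignment
            raise ValueError(f"unexpected rowFormat {row_format} at offset {offset}")
        offset += ROW_PREFIX.size
        tail, offset = read_tail(buf, offset)
        rows.append({"filetime": filetime, "slots": slots, "tail": tail})
    if offset != len(buf):
        raise ValueError("buffer did not end cleanly after the last row")
    return version, rows


def demo_read_tail(buf, offset):
    # HYPOTHETICAL stand-in encoding: 1-byte length + ASCII type name, no properties.
    length = buf[offset]
    name = buf[offset + 1:offset + 1 + length].decode("ascii")
    return name, offset + 1 + length


def build_demo_buffer(version, names):
    # Builds a synthetic buffer using the same stand-in tail encoding.
    out = bytearray(HEADER.pack(version, len(names), 0x1E))
    for i, name in enumerate(names):
        out += ROW_PREFIX.pack(ROW_FORMAT, 1000 + i, *([0] * 8))
        out += bytes([len(name)]) + name.encode("ascii")
    return bytes(out)


buf = build_demo_buffer(11, ["Alarm.Set", "Alarm.Clear"])
version, rows = parse_event_buffer(buf, demo_read_tail)
print(version, [row["tail"] for row in rows])
```

The point of the sketch is the single `offset = HEADER.size` step: a parser that instead expects a 4-byte marker in front of every row lines up with the first row only, which matches the truncation described above.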
**2 of 3 layers cleared** (key exchange + client key); the 3rd (token construction) is localized to a
specific managed method, pending dnlib extraction. ExchangeKey + the v8 serializer are committed; the
orchestrator stays on v6 (set `eventConnection: true` to re-arm once the token construction lands). The