Commit Graph

12 Commits

Author SHA1 Message Date
Joseph Doherty 8f4a188f78 Event-row parser: verify against the provided 2023 R2 client; fix latent multi-row bug
Used the provided stock client as an oracle to verify the event read path. The
capture-event harness returns 50 real events, and the instrument-grpc-nonstream rewrite
captured the exact GetNextEventQueryResultBuffer.result buffer (63,192 bytes, version
0x0B=11, rowCount 50 = 25 Alarm.Set + 25 Alarm.Clear). Feeding that real buffer through
HistorianEventRowProtocol.Parse exposed a latent parser bug.

The real buffer layout is: version(2) + rowCount(4) + headerField(4, =0x1E) followed by
MARKERLESS rows (rowFormat(2)=7 + filetime(8) + 8x u16 slots + compact-ascii type +
propCount + props). The parser wrongly treated the one-time 0x1E field as a per-row
marker and re-consumed [marker+format] for every row, so it decoded only the FIRST row
of any multi-row buffer and stopped. This is not gRPC-specific: the captured WCF v9
buffer has the identical 0900 <rowCount> 1E000000 0700 header, so the shipped WCF event
read had the same latent multi-row truncation.

Fix: read a 10-byte buffer header (skip the 0x1E field once) and parse markerless rows;
accept container version 9 (WCF) and 11 (gRPC), mirroring the interface-version gate that
accepts History 11 and 12.

Verified: the real 50-row buffer now decodes to exactly 50 events, ending cleanly at
end-of-buffer (Parse_RealStockClientCapture_DecodesAllEvents, gated on
HISTORIAN_EVENT_CAPTURE_NDJSON so it skips without the gitignored capture), plus a
synthetic v11 golden test. 328 offline tests pass.

The parse path is now verified against the provided client's real event data on both
transports; the only remaining gap for gRPC events is the server delivering rows to our
connection (the documented retrieval-server-gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 14:05:35 -04:00
Joseph Doherty 8199dde452 gRPC events: capture decrypted HTTP/1.1 frames native vs ours — topology found + tested null
Pursued hypothesis #3 (connection-frame capture). Built a TLS-terminating tee proxy
(artifacts/.../httpcap/, gitignored: self-signed server cert, forwards through the
loopback tunnel, logs decrypted HTTP/1.1 + gRPC-Web both directions) and ran a native
capture-event (returns 50 rows) and our SDK diagnostic (0 rows) through the SAME
proxy/upstream for a clean A/B.

Findings:
- The stock client is gRPC-Web/HTTP-1.1 (alpn empty), and clientCert=none on every
  connection — confirming (with the decompile) that hypothesis #2 (TLS client cert) is
  moot: the native presents no client cert.
- Connection topology differs: the native opens 5 TLS connections, one per service, and
  runs the event query (StartEventQuery/GetNext/EndEventQuery) on a DEDICATED
  RetrievalService connection, separate from the HistoryService connection that opened and
  registered the session. Our SDK collapses every service onto one connection. (Matches the
  decompile: the stock client has a separate GrpcClientBase per service.)
- Framing differs benignly: native uses content-length + Expect: 100-continue; SDK uses
  transfer-encoding: chunked. The server accepts both (StartEventQuery returns a valid
  handle), so framing is not the gate. No hidden header on either side.

Tested the topology hypothesis with a new env-gated switch
(HISTORIAN_GRPC_EVENT_SPLIT_CHANNEL=1): run StartEventQuery/GetNext/EndEventQuery on a
dedicated RetrievalService connection (no re-handshake, reusing the session handle —
mirroring native conn4), registration staying on the main connection. Result: still
0B00000000001E000000 (0 rows), QH=1063. Splitting the event query onto its own connection
does not make rows flow — the server correlates by session handle, not connection, so
topology is not the row-scoping gate.

Every angle is now exhausted (payload, transport, metadata/interceptor/cert, topology,
data store). The gate is a server-internal per-connection retrieval working-set in the
native HistorianClient C++ core, unreachable from a pure-managed client. Conclusion
unchanged: auth-solved / retrieval-server-gated; ReadEventsAsync over gRPC keeps the
no-row throw, event reads use WCF. 56 offline gRPC/event tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 13:43:21 -04:00
Joseph Doherty 6cf4dd13fe gRPC events: decompile the stock managed client — confirms no hidden client-side difference
Closed the blind spot the zero-rows conclusion had leaned on: prior cycles used wire
capture (instrument-grpc-nonstream hooks only byte[] params), blind to gRPC metadata,
interceptors, channel options. Read the stock managed source directly
(histsdk-2023r2-analysis/decompiled/Archestra.Historian.GrpcClient + HistorianAccess;
the pure-managed assemblies decompile cleanly though mixed-mode aahClientManaged crashes
ILSpy).

Findings:
- GrpcClientBase.InitializeBase uses GrpcWebHandler (GrpcWebMode, HttpVersion 1.1) — the
  stock client speaks gRPC-Web over HTTP/1.1, the SAME transport as our SDK. This corrects
  the premise of hypothesis #1: there was never a native Grpc.Core HTTP/2 path to differ
  from; the stock client returning 50 rows is itself gRPC-Web. The HTTP/2 disproof's
  conclusion stands and is reinforced (identical transport on both sides).
- m_metadata on every RPC (incl. StartEventQuery/GetNextEventQueryResultBuffer) is only
  grpc-internal-encoding-request: gzip — exactly our header set. The ClientInterceptor is
  a no-op (empty LogCall). So the "invisible per-connection metadata/header" blind spot is
  confirmed empty — no hidden client-side identity the byte[] capture missed.
- CreateEventQuery/StartQuery/MoveNext are not in managed code; the managed
  GrpcRetrievalClient.StartEventQuery is a thin one-RPC stub. The query logic lives in the
  native C++/CLI HistorianClient core — consistent with the working-set being native/server-side.

Every client-controllable layer is now confirmed identical by reading the stock source,
not just by wire match: request bytes, transport, channel options, gRPC metadata,
interceptor. The remaining difference is below the managed surface / server-side.
Conclusion unchanged: gRPC event-row retrieval is auth-solved / retrieval-server-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 13:20:39 -04:00
Joseph Doherty f19eb3b821 gRPC events: answer hypothesis #4 (SQL ground truth) — event store is global, not connection-scoped
Pursued the server-side SQL angle for the gRPC event zero-rows. Built a read-only
SOCKS5-relay + Microsoft.Data.SqlClient probe (gitignored, artifacts/.../sqlschema/)
and dumped the live Runtime event schema. Findings:

- No per-connection / per-client / per-session column exists anywhere in the event
  store. The only scoping-like columns on Events/EventHistory/snapshots are event
  content (Source_* origin, User_* acker, Provider_NodeName, SourceServer replication).
- The rich Events view is not a relational table — it is served live by the Historian
  engine via the INSQL OLE DB provider (linked servers INSQL/INSQLD; encrypted remote
  view). The SQL EventHistory base table holds only 168 rows / 1 tag.
- Decisive: for the SAME -90d..now window the gRPC StartEventQuery diagnostic returned
  0 rows, the Events view via INSQL returns 71,332 events (most recent Alarm.Set firing
  seconds before the probe). Same engine, same window — INSQL serves the data, gRPC
  withholds it from our connection.

So there is nothing in the data to scope by: the zero-row gate is the gRPC
RetrievalService's per-connection in-process execution state, not data scoping or
transport (the same class of wall as DeleteTagExtendedProperties). Combined with the
transport disproof, three independent angles are now exhausted — client payload
(byte-identical), transport (HTTP/2 == gRPC-Web), and data store (global, unscoped).
gRPC event-row retrieval stands documented as auth-solved / retrieval-server-gated;
ReadEventsAsync over gRPC keeps the no-row throw and event reads use WCF.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 13:05:19 -04:00
Joseph Doherty b0388e7a40 gRPC events: disprove transport hypothesis (native HTTP/2 also returns zero rows)
Tested grpc-event-query-capture.md's leading next-session hypothesis — that the
native client's Grpc.Core HTTP/2 transport (vs our Grpc.Net.Client + GrpcWebHandler
gRPC-Web) is why event reads return zero rows. Added HistorianGrpcChannelFactory
.CreateHttp2 (plain HTTP/2 over SocketsHttpHandler, no gRPC-Web wrap) and an
HISTORIAN_GRPC_EVENT_HTTP2 switch on the event orchestrator (event path only; reads
stay gRPC-Web).

Live side-by-side against the event-bearing 2023 R2 server, everything else held
constant: the full v8 chain (ExchangeKey auth, CM_EVENT RegisterTags/EnsureTags=True,
StartEventQuery with a valid handle) runs end-to-end over BOTH native HTTP/2 and
gRPC-Web, and the server returns the byte-identical version-11 rowCount-0 terminal
(0B00000000001E000000) on both transports. Transport choice makes no difference —
the leading hypothesis is disproven and the zero-row scoping sits above the gRPC
transport layer.

Also confirmed the native capture-event harness queries a 30-day historical window
(returns 50 rows), so the native read is connection-scoped historical retrieval,
not a live subscription.

CreateHttp2 + the env switch + the EventChannelMode diagnostic are retained for
further connection-level probing. 44 offline tests pass; orchestrator stays on the
no-row throw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 12:55:09 -04:00
Joseph Doherty 88287a8c66 docs(grpc-events): document the server-side/connection angle for next session
Records the row-retrieval pickup now that the v8 ExchangeKey auth is solved and the
gap is proven connection-level (not client payload):
- grpc-event-query-capture.md: a "NEXT SESSION — the server-side / connection angle"
  section — what's already proven (don't redo), the in-place tooling, and ordered,
  testable hypotheses (HTTP/2 vs gRPC-Web transport [leading], TLS client cert,
  HTTP/2 frame capture, SQL event-store scoping).
- handoff.md item 1: updated to "v8 auth solved; row retrieval connection-gated",
  pointing at the NEXT SESSION section; the "to move any item" summary updated.

Doc-only; sanitization scan clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 12:37:22 -04:00
Joseph Doherty 0921e21bdb feat(grpc-events): handle-capture cycle — event-row gap proven NOT a client payload issue
Extends the instrument-grpc rewrite to log string (strHandle) + uint (uiHandle /
queryRequestType) params, not just byte[], and captures our SDK's live v8
openParameters for a byte-diff against the native.

Result of the exhaustive comparison (all live-confirmed via the opt-in
EventReadDiagnostic test):
- StartEventQuery request: byte-identical to the native (v6 layout)
- v8 OpenConnection openParameters: byte-identical to the native (302B) once
  ClientNodeName matches — every control byte/ConnectionType/token/ShardId
- handle usage identical: ExchangeKey->contextKey, registration->storage GUID
  (strHandle), query->client uint (uiHandle); handles valid (RTag/EnsT=True)
- queryRequestType=3, registration order, gzip metadata header — all match
- window has events (native returns 50 now); eventCount not it

Every observable client-side byte matches the native, yet the server scopes 0
events to our connection. The event RPCs succeed over our transport and return a
valid EMPTY result (not a transport error), so this is a connection/server-level
difference (session affinity tied to the native Grpc.Core HTTP/2 connection or a
connection identity used to scope events) — invisible to and unfixable by client
payload matching. Needs server-side insight, not more wire RE.

Added opt-in diagnostics (RegistrationDiag, LastResultBufferHex,
LastEventOpenRequestHex). 326/326 offline; gated test still pins the no-row throw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 12:31:33 -04:00
Joseph Doherty c45f1a957b docs(grpc-events): token scheme fully RE'd via dnlib — aahCryptV2 (MD5-keyed RC4 + prefix)
Loaded dnlib in PowerShell (ILSpy crashes on the mixed-mode assembly) and scanned
the IL to recover the entire v8 token construction:

- <Module>::CHistoryConnectionGrpc.GetClientKey drives the ECDH: ECDiffieHellmanCng
  {KeyDerivationFunction=Hash, HashAlgorithm=SHA256, KeySize=256} -> ExchangeKey ->
  CngKey.Import(serverPub, EccPublicBlob) -> DeriveKeyMaterial = SHA256(shared secret),
  the 32-byte client key.
- aahClientCommon.CClientBase.ConfigureOpenConnection (the lone GetClientKey caller)
  builds the 26-byte token via HistorianCrypto.NRC4_V2.aahCryptV2 = a custom MD5-keyed
  RC4 stream cipher with a version prefix:
    * body/HashData = MD5 (verified by the round constants 0xd76aa478... + shifts 7/12/17/22)
    * prepare_key = RC4 KSA from a 16-byte MD5 key
    * enc_buffer = MD5 -> key, then rc4encrypt; enc prepends PrefixV2/InnerPrefixV2
      (the constant 0x8e token marker)
  So token = prefix + RC4(plaintext, key=MD5(keyMaterial)), keyMaterial tied to the
  SHA256(ECDH secret) client key. 100% reproducible in pure managed code (RC4+MD5).

Remaining (next cycle): read ConfigureOpenConnection's exact key/plaintext/prefix bytes,
implement aahCryptV2 managed-side, set the v8 token, live-test. Frida CNG + dnlib are
the RE path; nothing AVEVA is shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 11:21:55 -04:00
Joseph Doherty b2ac35b98e docs(grpc-events): trace the ExchangeKey token crypto — KDF=SHA256(secret); token construction localized
Frida-hooked Windows CNG (scripts/frida/aahclientmanaged-cng-exchangekey.js) during
a real native ExchangeKey to recover the token derivation:

- The ECDH + KDF are standard CNG driven by managed System.Security.Cryptography
  .ECDiffieHellmanCng: NCryptSecretAgreement (P-256) -> NCryptDeriveKey(KDF=HASH,
  SHA256, 32 bytes). So the derived key = SHA256(ECDH shared secret).
- "ECK1" is the standard CNG BCRYPT_ECCPUBLIC_BLOB magic (P-256), confirming our
  BuildExchangeKeyClientHello wire format.
- The 26-byte token (constant 0x8e marker) is a custom construction over the
  derived key: a 528-candidate offline cracker (HMAC/SHA/AES-GCM/CBC/CTR over the
  derived key x request slices x creds) found no match, and it matches none of the
  traced hash digests. It is built in aahClientManaged's C++/CLI <Module> code
  between the DeriveKeyMaterial call and the openParameters assembly.

Next: ILSpy cannot decompile the mixed-mode assembly (crashes, exit 70); use dnlib
(IL-level) to dump the <Module> method referencing DeriveKeyMaterial and read the
post-derive token construction. 2 of 3 layers cleared (key exchange + client key);
the 3rd (token) is localized, pending dnlib extraction. Orchestrator stays on v6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 11:11:21 -04:00
Joseph Doherty 3fd522fa10 docs(grpc-events): Path B — ExchangeKey ECDH clears 2 of 3 layers
Records that the pure-managed P-256 ExchangeKey works (cleared the v8 client-key
check; error advanced to 132/171 AuthenticationFailed). The remaining layer is the
26-byte credential-token KDF, which requires recovering the native key derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 10:31:37 -04:00
Joseph Doherty 7284fdc976 docs(grpc-events): Path A disproven — v8 OpenConnection coupled to ExchangeKey
Records the full v8 openParameters byte map, the ECDH ExchangeKey finding, and
the Path A live result: the v8 OpenConnection on a ValidateClientCredential
session is rejected with native 132/34 "EstablishConnection Failed to get client
key". The v8 path requires the client key established by HistoryService.ExchangeKey
(ECDH), so the next route is Path B — implement ExchangeKey ("ECK1" + 64-byte
P-256 point) via .NET ECDiffieHellman, then reissue the v8 open.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 09:46:07 -04:00
Joseph Doherty c6752804ee docs(grpc-events): event-query capture finding + v8 connection-type gate
Records the 2026-06-22 capture of the stock 2023 R2 gRPC event read and the
diagnosis of why row retrieval is gated:

1. The working StartEventQuery request is version 6 (vs the SDK's v5) — shipped in
   the companion code commit.
2. Rows additionally require an EVENT-type connection. Decoding the captured
   OpenConnection.openParameters (native format v8) shows a ConnectionType byte
   (Event=01 / Process=02) right after ClientType — a field the SDK's v6 Open2
   format does not have (it writes ClientType then ConnectionMode back-to-back).
   So the v6 buffer the SDK sends (accepted for reads) cannot mark the connection
   as Event, and the 2023 R2 server returns event rows only on an Event
   connection. The native client also used the ExchangeKey cert auth path.

Conclusion: making event rows flow over gRPC requires the SDK to emit the native
v8 OpenConnection format with ConnectionType=Event (a larger RE+implementation
effort), scoped as a follow-on. v6 is retained as the captured-correct request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-22 10:41:29 -04:00