From 3849f177463245ae582dbf047a353303e3f8deb6 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Thu, 25 Jun 2026 01:19:50 -0400
Subject: [PATCH] docs: handshake-reuse spike results + verdict (GREEN, idle timeout ~20-25s)

---
 .../handshake-reuse-spike-results.md | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 docs/reverse-engineering/handshake-reuse-spike-results.md

diff --git a/docs/reverse-engineering/handshake-reuse-spike-results.md b/docs/reverse-engineering/handshake-reuse-spike-results.md
new file mode 100644
index 0000000..889ae42
--- /dev/null
+++ b/docs/reverse-engineering/handshake-reuse-spike-results.md
@@ -0,0 +1,100 @@
+# Handshake / session-reuse spike — live results
+
+> **Question:** does the 2023 R2 historian honor REUSING one authenticated session
+> (channel + `OpenConnection` client handle) across multiple operations, instead of
+> the per-operation Create+handshake the SDK does today? This is the precondition for
+> "handshake amortization" (HistorianGateway `pending.md` A1).
+>
+> **Verdict: GREEN — reuse works and the win is large — but the server idle-expires a
+> session in ~20–25 s, so a reuse pool must keep sessions warm.**
+
+**Date:** 2026-06-25
+**Branch:** `spike/handshake-reuse`
+**Server:** live 2023 R2 (`wonder-sql-vd03`), RemoteGrpc transport, read-only test tag.
+**Harness:** `tests/AVEVA.Historian.Client.Tests/HandshakeReuseSpikeTests.cs` driving the new
+internal seam `HistorianGrpcReadOrchestrator.RunRawQueryOnSession(connection, clientHandle, …)`
+(runs a raw query against an externally-supplied, already-authenticated connection + handle —
+no Create, no handshake).
+
+---
+
+## 1. Reuse validity — GREEN
+
+`ReusedSession_RunsManyReads_AllSucceed` **passed**: one `HistorianGrpcChannelFactory.Create`
++ one `HistorianGrpcHandshake.OpenSession`, then **5 consecutive `RunRawQueryOnSession` reads on
+the same `ClientHandle`** — all returned rows.
+
+```
+open-session (handshake) = 325 ms
+reused-read[0] = 96 ms, rows=8
+reused-read[1] = 101 ms, rows=8
+reused-read[2] = 179 ms, rows=8
+reused-read[3] = 92 ms, rows=8
+reused-read[4] = 95 ms, rows=8
+```
+
+The server accepts the same client handle across back-to-back `StartQuery`/`GetNextQueryResultBuffer`/
+`EndQuery` cycles. Per-query handles are opened/closed each op; the **session** handle is the reused
+artifact.
+
+## 2. Win magnitude — large (~4.7×)
+
+`ReusedSession_VsPerCallPath_LogsLatencyDelta` (logged, not asserted):
+
+```
+per-call (5 ops) = 2626 ms # fresh Create + full handshake + query, ×5
+amortized (5 ops) = 561 ms # one handshake + 5 reused reads
+saving over 5 ops = 2065 ms
+```
+
+The handshake (`GetInterfaceVersion` → `ValidateClientCredential` NTLM token loop →
+`OpenConnection`, ~325 ms) dominates per-call cost. Amortized, a read is ~110 ms vs ~525 ms
+per-call. **Amortization is clearly worth the refactor for any burst of activity.**
+
+## 3. Expiry — idle timeout ~20–25 s (NOT an absolute TTL)
+
+`ReusedSession_IdleSweep_SurfacesExpiryTier` rethrows at the first idle gap the server rejects.
+
+Coarse sweep `[0, 30]`: `idle 0s → OK`, `idle 30s → BROKE`.
+Fine sweep `[0,5,10,15,20,25,30]`:
+
+```
+idle 0s -> OK (rows=8)
+idle 5s -> OK
+idle 10s -> OK
+idle 15s -> OK
+idle 20s -> OK # session age here ≈ 50 s cumulative, still alive
+idle 25s -> BROKE (InvalidOperationException: gRPC StartQuery (raw) failed, errorLen=5)
+```
+
+**Key inference — it's an idle timeout, not a fixed session lifetime.** The reads at gaps of
+5/10/15/20 s kept succeeding even though the cumulative session age reached ~50 s by the 20 s-gap
+read. The session only died after a **≥25 s idle gap**. So a session should stay alive as long as
+operations are spaced ≤20 s apart (observed up to ~50 s of session age); a ≥25 s quiet gap kills it.
+
+Expired-session failure mode on the wire: `StartQuery` returns `BSuccess=false` with a 5-byte error
+buffer, surfaced by the SDK as `InvalidOperationException: gRPC StartQuery (raw) failed
+(errorLen=5)`.
+
+---
+
+## 4. Implications for Phase 1 (the full amortization refactor)
+
+A reuse pool is viable and high-value, with two requirements driven by §3:
+
+1. **Keep sessions warm.** Ping each pooled session well under the ~20 s idle floor (e.g. a
+   ~10–15 s keepalive — a cheap handle-using op such as `GetSystemParameter`) so a steady-state
+   session never crosses the idle timeout. Without a keepalive, amortization only helps within a
+   <~20 s activity burst.
+2. **Reactive re-auth on expiry.** Treat `StartQuery failed (errorLen=5)` (and the equivalent on
+   other handle ops) as an expired-session signal: evict the session and re-handshake on next use
+   (one handshake penalty). In HistorianGateway this maps onto the existing
+   `IHistorianConnectionPool.ReportFaulted` eviction seam.
+
+**Concurrency note (unchanged guidance):** lease a session exclusively per-op from a bounded pool —
+this validity test only exercised *sequential* reuse, so concurrent use of one handle (esp.
+streaming cursors) remains unproven and should be avoided by exclusive leasing.
+
+**Gate decision:** GREEN → HistorianGateway A1 Phase 1 (HistorianSession primitive + orchestrator
+acquire/execute split + re-vendor + leased-session pool with keepalive) is warranted and earns its
+own design + plan.
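
The two §4 requirements (keepalive under the idle floor, reactive re-auth on expiry) can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's C# and not the Phase 1 design. Every name in it (`SessionPool`, `SessionExpired`, `FakeServer`, `KEEPALIVE_S`) is hypothetical; it holds a single session, is single-threaded, and does not model exclusive leasing or `IHistorianConnectionPool.ReportFaulted`. The 12 s keepalive is an assumed value inside the suggested 10–15 s window, and the fake server simply rejects a handle after more than 20 s of idleness to mimic the sweep in §3.

```python
import time
from dataclasses import dataclass

IDLE_LIMIT_S = 20.0   # longest idle gap that still succeeded in the sweep (25 s failed)
KEEPALIVE_S = 12.0    # assumed ping interval, inside the suggested 10-15 s window


class SessionExpired(Exception):
    """Stand-in for the SDK's 'StartQuery failed (errorLen=5)' signal."""


@dataclass
class Session:
    handle: int
    last_used: float


class SessionPool:
    """Hypothetical single-session holder: keepalive plus reactive re-handshake."""

    def __init__(self, open_session, ping, clock=time.monotonic):
        self._open = open_session  # () -> handle; pays the full handshake cost
        self._ping = ping          # (handle) -> any; a cheap handle-using op
        self._clock = clock
        self._session = None
        self.handshakes = 0        # counts how often the handshake was paid

    def _acquire(self):
        if self._session is None:
            self._session = Session(self._open(), self._clock())
            self.handshakes += 1
        return self._session

    def keepalive_tick(self):
        """Call periodically; pings only if the session has been idle >= KEEPALIVE_S."""
        s = self._session
        if s is not None and self._clock() - s.last_used >= KEEPALIVE_S:
            try:
                self._ping(s.handle)
                s.last_used = self._clock()
            except SessionExpired:
                self._session = None  # evict; the next execute() re-handshakes

    def execute(self, op):
        """Run op(handle); on expiry evict, re-handshake, and retry exactly once."""
        for attempt in range(2):
            s = self._acquire()
            try:
                result = op(s.handle)
                s.last_used = self._clock()
                return result
            except SessionExpired:
                self._session = None
                if attempt == 1:
                    raise


class FakeServer:
    """Test double: rejects a handle that was idle for more than IDLE_LIMIT_S."""

    def __init__(self, clock):
        self._clock = clock
        self._last_seen = {}
        self._next_handle = 1

    def open_session(self):
        handle = self._next_handle
        self._next_handle += 1
        self._last_seen[handle] = self._clock()
        return handle

    def touch(self, handle):
        if self._clock() - self._last_seen[handle] > IDLE_LIMIT_S:
            raise SessionExpired(handle)
        self._last_seen[handle] = self._clock()
        return "rows=8"
```

With an injected clock, calling `keepalive_tick()` every 12 s keeps one handshake serving reads across a long quiet period, while a 30 s gap with no tick makes the next `execute()` pay exactly one extra handshake before succeeding. The single retry is only reasonable for read operations like the ones this spike exercised.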