# Handshake / session-reuse spike — live results
> **Question:** does the 2023 R2 historian honor REUSING one authenticated session
> (channel + `OpenConnection` client handle) across multiple operations, instead of
> the per-operation Create+handshake the SDK does today? This is the precondition for
> "handshake amortization" (HistorianGateway `pending.md` A1).
>
> **Verdict: GREEN. Reuse works and the win is large, but the server idle-expires a
> session after ~20–25 s without traffic, so a reuse pool must keep sessions warm.**

**Date:** 2026-06-25

**Branch:** `spike/handshake-reuse`

**Server:** live 2023 R2 (`wonder-sql-vd03`), RemoteGrpc transport, read-only test tag.

**Harness:** `tests/AVEVA.Historian.Client.Tests/HandshakeReuseSpikeTests.cs`, driving the new
internal seam `HistorianGrpcReadOrchestrator.RunRawQueryOnSession(connection, clientHandle, …)`,
which runs a raw query against an externally supplied, already-authenticated connection and handle
(no Create, no handshake).

---
## 1. Reuse validity — GREEN
`ReusedSession_RunsManyReads_AllSucceed` **passed**: one `HistorianGrpcChannelFactory.Create`
plus one `HistorianGrpcHandshake.OpenSession`, then **5 consecutive `RunRawQueryOnSession` reads on
the same `ClientHandle`**, all of which returned rows.
```
open-session (handshake) = 325 ms
reused-read[0] = 96 ms, rows=8
reused-read[1] = 101 ms, rows=8
reused-read[2] = 179 ms, rows=8
reused-read[3] = 92 ms, rows=8
reused-read[4] = 95 ms, rows=8
```
The server accepts the same client handle across back-to-back `StartQuery`/`GetNextQueryResultBuffer`/
`EndQuery` cycles. Per-query handles are opened/closed each op; the **session** handle is the reused
artifact.
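
The reuse pattern can be sketched as follows. This is a minimal Python sketch, not the SDK's C# code: `FakeHistorian` and `run_raw_query_on_session` are hypothetical stand-ins that only record call order, with method names echoing the RPCs named above. It shows the session handle being obtained once, while a per-query handle is opened and closed around each read.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHistorian:
    """Hypothetical in-memory stand-in for the historian's gRPC surface; it only records calls."""
    calls: list = field(default_factory=list)
    _next_handle: int = 0

    def _new_handle(self) -> int:
        self._next_handle += 1
        return self._next_handle

    def open_connection(self) -> int:
        # Stands in for the whole handshake; returns the reusable session (client) handle.
        self.calls.append("OpenConnection")
        return self._new_handle()

    def start_query(self, client_handle: int) -> int:
        self.calls.append("StartQuery")
        return self._new_handle()  # short-lived per-query handle

    def get_next_query_result_buffer(self, query_handle: int) -> list:
        self.calls.append("GetNextQueryResultBuffer")
        return [0.0] * 8  # pretend the server returned 8 rows

    def end_query(self, query_handle: int) -> None:
        self.calls.append("EndQuery")


def run_raw_query_on_session(server: FakeHistorian, client_handle: int) -> list:
    """Open and close a per-query handle while reusing the caller's session handle."""
    query_handle = server.start_query(client_handle)
    try:
        return server.get_next_query_result_buffer(query_handle)
    finally:
        server.end_query(query_handle)


server = FakeHistorian()
session = server.open_connection()  # one handshake...
results = [run_raw_query_on_session(server, session) for _ in range(5)]  # ...then 5 reused reads
print(server.calls.count("OpenConnection"), server.calls.count("StartQuery"))  # 1 5
```

The only structural change versus the per-call path is that `open_connection` sits outside the loop.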
## 2. Win magnitude — large (~4.7×)
`ReusedSession_VsPerCallPath_LogsLatencyDelta` (logged, not asserted):
```
per-call (5 ops) = 2626 ms # fresh Create + full handshake + query, ×5
amortized (5 ops) = 561 ms # one handshake + 5 reused reads
saving over 5 ops = 2065 ms
```
The handshake (`GetInterfaceVersion` → `ValidateClientCredential` NTLM token loop →
`OpenConnection`, ~325 ms) dominates the per-call cost. Amortized over the 5 ops, a read costs
~112 ms (561 / 5) versus ~525 ms (2626 / 5) on the per-call path. **Amortization is clearly worth
the refactor for any burst of activity.**
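
The per-op figures and the ~4.7× ratio follow directly from the two logged totals. A small Python check; the constant names are ours, the numbers are the logged ones:

```python
# Logged totals from ReusedSession_VsPerCallPath_LogsLatencyDelta (milliseconds).
OPS = 5
PER_CALL_TOTAL_MS = 2626   # fresh Create + full handshake + query, x5
AMORTIZED_TOTAL_MS = 561   # one handshake + 5 reused reads

per_call_avg = PER_CALL_TOTAL_MS / OPS         # average cost per op on the per-call path
amortized_avg = AMORTIZED_TOTAL_MS / OPS       # handshake cost spread over the 5 ops
saving = PER_CALL_TOTAL_MS - AMORTIZED_TOTAL_MS
speedup = PER_CALL_TOTAL_MS / AMORTIZED_TOTAL_MS

print(f"{per_call_avg:.1f} {amortized_avg:.1f} {saving} {speedup:.2f}")  # 525.2 112.2 2065 4.68
```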
## 3. Expiry: idle timeout of ~20–25 s (NOT an absolute TTL)
`ReusedSession_IdleSweep_SurfacesExpiryTier` rethrows at the first idle gap the server rejects.
Coarse sweep `[0, 30]`: `idle 0s → OK`, `idle 30s → BROKE`.

Fine sweep `[0,5,10,15,20,25,30]` (the test stops at the first failure, so the 30 s gap was never run):
```
idle 0s -> OK (rows=8)
idle 5s -> OK
idle 10s -> OK
idle 15s -> OK
idle 20s -> OK # session age here ≈ 50 s cumulative, still alive
idle 25s -> BROKE (InvalidOperationException: gRPC StartQuery (raw) failed, errorLen=5)
```
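**Key inference: it's an idle timeout, not a fixed session lifetime.** The reads after gaps of
5/10/15/20 s kept succeeding even though the cumulative session age had reached ~50 s by the
20 s-gap read, and the session only died after a **≥25 s idle gap**. The coarse sweep points the
same way: there the session died at a cumulative age of only ~30 s, younger than the fine-sweep
session that was still alive at ~50 s, so the gap length rather than the age decides expiry. A
session therefore appears to survive as long as operations are less than ~20 s apart, while a quiet
gap of ≥25 s invalidates it; the timeout itself lies between 20 and 25 s. The sweep only reached
~50 s of cumulative age, so it does not rule out a much longer absolute lifetime as well.

Expired-session failure mode on the wire: `StartQuery` returns `BSuccess=false` with a 5-byte error
buffer, which the SDK surfaces as `InvalidOperationException: gRPC StartQuery (raw) failed
(errorLen=5)`.
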
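
The inference above can be made mechanical. A short Python sketch that brackets the timeout from the logged outcomes and lines up the approximate session age at each read; the function names are hypothetical, and the age model is an assumption that ignores read latency and counts from each sweep's first read. At the same ~30 s age, the fine sweep's fourth read (after a 15 s gap) succeeded while the coarse sweep's second read (after a 30 s gap) failed.

```python
from itertools import accumulate
from typing import Optional

Sweep = list[tuple[int, bool]]  # (idle gap in seconds, read succeeded?)


def bracket_idle_timeout(sweep: Sweep) -> tuple[Optional[int], Optional[int]]:
    """(longest gap that still succeeded, shortest gap that failed); the timeout lies between."""
    ok = [gap for gap, succeeded in sweep if succeeded]
    broke = [gap for gap, succeeded in sweep if not succeeded]
    return max(ok, default=None), min(broke, default=None)


def approx_age_at_each_read(sweep: Sweep) -> list[int]:
    """Session age (s) at each read, measured from the first read; ignores ~100 ms read time."""
    return list(accumulate(gap for gap, _ in sweep))


# Outcomes as logged by ReusedSession_IdleSweep_SurfacesExpiryTier.
fine = [(0, True), (5, True), (10, True), (15, True), (20, True), (25, False)]
coarse = [(0, True), (30, False)]

fine_ages = approx_age_at_each_read(fine)      # [0, 5, 15, 30, 50, 75]
coarse_ages = approx_age_at_each_read(coarse)  # [0, 30]

print(bracket_idle_timeout(fine))  # (20, 25)
# Same ~30 s age, different outcome: fine read #4 (15 s gap) succeeded,
# coarse read #2 (30 s gap) failed, so age alone cannot be the trigger.
print(fine[3][1], coarse[1][1])    # True False
```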
---
## 4. Implications for Phase 1 (the full amortization refactor)
A reuse pool is viable and high-value, with two requirements driven by §3:
1. **Keep sessions warm.** Ping each pooled session at an interval well under the ~20 s lower bound
of the observed idle window (e.g. a ~10–15 s keepalive using a cheap handle-using op such as
`GetSystemParameter`, which this spike did not itself test as a keepalive) so that a steady-state
session never crosses the idle timeout. Without a keepalive, amortization only helps while
consecutive operations are less than ~20 s apart.
2. **Reactive re-auth on expiry.** Treat `StartQuery failed (errorLen=5)` (and the equivalent on
other handle ops) as an expired-session signal: evict the session and re-handshake on next use
(one handshake penalty). In HistorianGateway this maps onto the existing
`IHistorianConnectionPool.ReportFaulted` eviction seam.
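
The two requirements combine naturally in one per-session wrapper. A minimal Python sketch, assuming a handshake callable that returns a client handle and an op adapter that translates the SDK's expired-session error into an exception; `PooledSession`, `ExpiredSession`, `keep_warm` and the 12 s interval are hypothetical, not HistorianGateway APIs. On expiry it evicts and re-raises rather than retrying, so the caller decides whether an op is safe to repeat, and the next use pays the one handshake penalty. An injectable clock keeps the demo deterministic.

```python
import time
from typing import Callable, Optional

# Assumption: a ping interval inside the ~10-15 s band suggested above.
KEEPALIVE_INTERVAL_S = 12.0


class ExpiredSession(Exception):
    """Hypothetical: raised by an op adapter when the server rejects the session handle,
    e.g. after translating the SDK's `StartQuery (raw) failed (errorLen=5)` error."""


class PooledSession:
    """One reusable session: lazy handshake, keepalive bookkeeping, evict-on-expiry."""

    def __init__(self, handshake: Callable[[], int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._handshake = handshake  # performs Create + handshake, returns a client handle
        self._clock = clock
        self._handle: Optional[int] = None
        self._last_used = 0.0

    def run(self, op: Callable[[int], object]) -> object:
        if self._handle is None:
            self._handle = self._handshake()  # the one handshake penalty
            self._last_used = self._clock()
        try:
            result = op(self._handle)
        except ExpiredSession:
            self._handle = None  # evict; the next run() re-handshakes
            raise
        self._last_used = self._clock()
        return result

    def keepalive_due(self) -> bool:
        return (self._handle is not None
                and self._clock() - self._last_used >= KEEPALIVE_INTERVAL_S)

    def keep_warm(self, cheap_op: Callable[[int], object]) -> bool:
        """Call from a timer; pings with a cheap handle-using op only when due."""
        if not self.keepalive_due():
            return False
        self.run(cheap_op)
        return True


# Demo with a fake clock and a fake handshake.
now = [0.0]
handshakes = []


def fake_handshake() -> int:
    handshakes.append(now[0])
    return len(handshakes)  # client handle 1, 2, ...


def expired_op(handle: int) -> object:
    raise ExpiredSession()


s = PooledSession(fake_handshake, clock=lambda: now[0])
for _ in range(3):
    s.run(lambda h: h)                 # three reads, one handshake
now[0] += 13.0
pinged = s.keep_warm(lambda h: h)      # 13 s idle >= 12 s interval, so a ping is sent
try:
    s.run(expired_op)                  # the server reports the session as gone
except ExpiredSession:
    pass
handle_after = s.run(lambda h: h)      # the next use re-handshakes
print(len(handshakes), pinged, handle_after)  # 2 True 2
```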

**Concurrency note (unchanged guidance):** lease each session exclusively per op from a bounded
pool. This validity test only exercised *sequential* reuse, so concurrent use of one handle
(especially with streaming cursors) remains unproven and should be avoided by exclusive leasing.
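
Exclusive leasing from a bounded pool can be sketched with a bounded semaphore plus an idle list. This is a hypothetical Python sketch (`LeasedPool` is our name, and this is not the API of `IHistorianConnectionPool`): at most `size` sessions exist, each is handed to one caller at a time, sessions are created lazily, and callers block (or time out) while all sessions are leased. Eviction of expired sessions is assumed to happen inside the session object itself.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class LeasedPool:
    """Hypothetical bounded pool: each session is held by at most one caller at a time."""

    def __init__(self, factory: Callable[[], object], size: int) -> None:
        self._factory = factory                      # e.g. builds a session wrapper
        self._slots = threading.BoundedSemaphore(size)
        self._idle: list = []                        # sessions not currently leased
        self._lock = threading.Lock()

    @contextmanager
    def lease(self, timeout: Optional[float] = None) -> Iterator[object]:
        if not self._slots.acquire(timeout=timeout):  # blocks while all sessions are leased
            raise TimeoutError("no pooled session became free in time")
        try:
            with self._lock:
                have_idle = bool(self._idle)
                session = self._idle.pop() if have_idle else None
            if not have_idle:
                session = self._factory()            # grow lazily, never past `size`
            try:
                yield session                        # exclusive: no other lease sees it now
            finally:
                with self._lock:
                    self._idle.append(session)       # hand back for the next lease
        finally:
            self._slots.release()


pool = LeasedPool(factory=object, size=2)
with pool.lease() as first:
    pass
with pool.lease() as second:
    pass
print(first is second)  # True: sequential leases reuse the same session
```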

**Gate decision:** GREEN → HistorianGateway A1 Phase 1 (HistorianSession primitive + orchestrator
acquire/execute split + re-vendor + leased-session pool with keepalive) is warranted and earns its
own design + plan.