Handshake / session-reuse spike — live results

Question: does the 2023 R2 historian honor REUSING one authenticated session (channel + OpenConnection client handle) across multiple operations, instead of the per-operation Create+handshake the SDK does today? This is the precondition for "handshake amortization" (HistorianGateway pending.md A1).

Verdict: GREEN. Reuse works and the win is large, but the server idle-expires a session after ~20-25 s, so a reuse pool must keep sessions warm.

Date: 2026-06-25
Branch: spike/handshake-reuse
Server: live 2023 R2 (wonder-sql-vd03), RemoteGrpc transport, read-only test tag.
Harness: tests/AVEVA.Historian.Client.Tests/HandshakeReuseSpikeTests.cs driving the new internal seam HistorianGrpcReadOrchestrator.RunRawQueryOnSession(connection, clientHandle, …), which runs a raw query against an externally-supplied, already-authenticated connection + handle (no Create, no handshake).


1. Reuse validity — GREEN

ReusedSession_RunsManyReads_AllSucceed passed: one HistorianGrpcChannelFactory.Create + one HistorianGrpcHandshake.OpenSession, then 5 consecutive RunRawQueryOnSession reads on the same ClientHandle. All returned rows.

open-session (handshake) = 325 ms
reused-read[0] =  96 ms, rows=8
reused-read[1] = 101 ms, rows=8
reused-read[2] = 179 ms, rows=8
reused-read[3] =  92 ms, rows=8
reused-read[4] =  95 ms, rows=8

The server accepts the same client handle across back-to-back StartQuery/GetNextQueryResultBuffer/EndQuery cycles. Per-query handles are opened/closed each op; the session handle is the reused artifact.

2. Win magnitude — large (~4.7×)

ReusedSession_VsPerCallPath_LogsLatencyDelta (logged, not asserted):

per-call  (5 ops) = 2626 ms      # fresh Create + full handshake + query, ×5
amortized (5 ops) =  561 ms      # one handshake + 5 reused reads
saving over 5 ops = 2065 ms

The handshake (GetInterfaceVersion → ValidateClientCredential NTLM token loop → OpenConnection, ~325 ms) dominates per-call cost. Amortized, a read is ~110 ms vs ~525 ms per-call. Amortization is clearly worth the refactor for any burst of activity.
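The headline figures can be re-derived from the two logged totals. The short Python check below uses only the numbers in the log above; the variable names are illustrative.

```python
# Totals copied from the spike log above; everything else is derived.
OPS = 5
PER_CALL_TOTAL_MS = 2626   # fresh Create + full handshake + query, x5
AMORTIZED_TOTAL_MS = 561   # one handshake + 5 reused reads

saving_ms = PER_CALL_TOTAL_MS - AMORTIZED_TOTAL_MS
speedup = PER_CALL_TOTAL_MS / AMORTIZED_TOTAL_MS
per_call_ms = PER_CALL_TOTAL_MS / OPS
amortized_ms = AMORTIZED_TOTAL_MS / OPS

print(f"saving    = {saving_ms} ms")
print(f"speedup   = {speedup:.1f}x")
print(f"per-call  = {per_call_ms:.0f} ms/op")
print(f"amortized = {amortized_ms:.0f} ms/op")
```

This reproduces the 2065 ms saving and the ~4.7× ratio. The amortized figure comes out at 112 ms per op, which the text rounds to ~110 ms.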

3. Expiry — idle timeout ~20-25 s (NOT an absolute TTL)

ReusedSession_IdleSweep_SurfacesExpiryTier rethrows at the first idle gap the server rejects.

Coarse sweep [0, 30]: idle 0s → OK, idle 30s → BROKE. Fine sweep [0,5,10,15,20,25,30]:

idle  0s -> OK (rows=8)
idle  5s -> OK
idle 10s -> OK
idle 15s -> OK
idle 20s -> OK          # session age here ≈ 50 s cumulative, still alive
idle 25s -> BROKE (InvalidOperationException: gRPC StartQuery (raw) failed, errorLen=5)

Key inference: it's an idle timeout, not a fixed session lifetime. The reads at gaps of 5/10/15/20 s kept succeeding even though the cumulative session age reached ~50 s by the 20 s-gap read. The session only died after a ≥25 s idle gap. So a session should survive as long as operations are spaced under ~20 s apart (verified here up to ~50 s of session age), while a quiet gap of ≥25 s invalidates it.
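The inference can be checked mechanically against the two sweeps. The sketch below is a minimal Python check, not part of the harness. It assumes each sweep started on a freshly opened session (the log does not say so) and ignores the ~0.1 s each read takes. It asks which whole-second thresholds would reproduce every logged outcome under each hypothesis. The function names are hypothetical.

```python
from itertools import accumulate

# (idle gap in seconds, read succeeded?) copied from the two sweeps above.
COARSE = [(0, True), (30, False)]
FINE = [(0, True), (5, True), (10, True), (15, True), (20, True), (25, False)]


def ages(sweep):
    """Session age at each read, assuming a fresh session per sweep."""
    return list(accumulate(gap for gap, _ in sweep))


def explained_by_absolute_ttl(sweeps, ttl_s):
    """True if 'the session dies at age >= ttl_s' reproduces every outcome."""
    return all((age < ttl_s) == ok
               for sweep in sweeps
               for age, (_, ok) in zip(ages(sweep), sweep))


def explained_by_idle_timeout(sweeps, timeout_s):
    """True if 'the session dies after a gap >= timeout_s' reproduces every outcome."""
    return all((gap < timeout_s) == ok for sweep in sweeps for gap, ok in sweep)


candidates = range(1, 121)  # whole-second thresholds to try
ttl_fits = [t for t in candidates if explained_by_absolute_ttl([COARSE, FINE], t)]
idle_fits = [t for t in candidates if explained_by_idle_timeout([COARSE, FINE], t)]
print("absolute-TTL fits:", ttl_fits)
print("idle-timeout fits:", idle_fits)
```

Under these assumptions no absolute TTL fits both sweeps, while an idle timeout anywhere from 21 to 25 s fits all of the data. On the fine sweep alone an absolute TTL between 51 and 75 s would also fit, so it is the coarse sweep's failure at ~30 s of session age that rules the TTL hypothesis out.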

Expired-session failure mode on the wire: StartQuery returns BSuccess=false with a 5-byte error buffer, surfaced by the SDK as InvalidOperationException: gRPC StartQuery (raw) failed (errorLen=5).


4. Implications for Phase 1 (the full amortization refactor)

A reuse pool is viable and high-value, with two requirements driven by §3:

  1. Keep sessions warm. Ping each pooled session well under the ~20 s idle floor (e.g. a keepalive every ~10-15 s, using a cheap handle-using op such as GetSystemParameter) so a steady-state session never crosses the idle timeout. Without a keepalive, amortization only helps within an activity burst whose gaps stay under ~20 s.
  2. Reactive re-auth on expiry. Treat StartQuery failed (errorLen=5) (and the equivalent on other handle ops) as an expired-session signal: evict the session and re-handshake on next use (one handshake penalty). In HistorianGateway this maps onto the existing IHistorianConnectionPool.ReportFaulted eviction seam.
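The two requirements above can be sketched together against a toy server. This is an illustrative Python model, not the SDK or the gateway: FakeServer, WarmSession, FakeClock and SessionExpiredError are hypothetical names, and the 20 s timeout and 10 s keepalive interval are assumed values taken from the ranges discussed in §3.

```python
IDLE_TIMEOUT_S = 20.0      # assumed idle floor from the sweep in section 3
KEEPALIVE_EVERY_S = 10.0   # assumed keepalive interval, well under the floor


class SessionExpiredError(Exception):
    """Hypothetical stand-in for the 'StartQuery failed (errorLen=5)' signal."""


class FakeClock:
    """Manually advanced clock so the demo needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeServer:
    """Toy server that rejects a handle once it has idled past the timeout."""

    def __init__(self, clock):
        self.clock = clock
        self.handshakes = 0
        self._last_used = {}

    def open_session(self):
        self.handshakes += 1
        handle = self.handshakes
        self._last_used[handle] = self.clock()
        return handle

    def call(self, handle):
        now = self.clock()
        if now - self._last_used.get(handle, float("-inf")) >= IDLE_TIMEOUT_S:
            self._last_used.pop(handle, None)
            raise SessionExpiredError(handle)
        self._last_used[handle] = now
        return "ok"


class WarmSession:
    """One pooled session: keepalive when due, evict on expiry."""

    def __init__(self, server, clock):
        self.server, self.clock = server, clock
        self.handle = None
        self.last_used = 0.0

    def keepalive_if_due(self):
        # A real pool would call this from a periodic timer.
        if self.handle is not None and self.clock() - self.last_used >= KEEPALIVE_EVERY_S:
            self.execute()

    def execute(self):
        if self.handle is None:
            self.handle = self.server.open_session()  # one handshake penalty
        try:
            result = self.server.call(self.handle)
        except SessionExpiredError:
            self.handle = None  # evict; the next use re-handshakes
            raise
        self.last_used = self.clock()
        return result


clock = FakeClock()
server = FakeServer(clock)
session = WarmSession(server, clock)
session.execute()
for _ in range(6):            # 60 s of quiet time, kept warm every 10 s
    clock.advance(10)
    session.keepalive_if_due()
print("handshakes after 60 s with keepalive:", server.handshakes)
```

With the keepalive running, the session outlives many idle-timeout periods on a single handshake. If the keepalive is skipped and the clock advances 30 s, the next execute raises SessionExpiredError and clears the handle, and the call after that pays one fresh handshake, which mirrors the evict-then-re-handshake behaviour described in requirement 2.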

Concurrency note (unchanged guidance): lease a session exclusively per-op from a bounded pool — this validity test only exercised sequential reuse, so concurrent use of one handle (esp. streaming cursors) remains unproven and should be avoided by exclusive leasing.
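Exclusive leasing from a bounded pool can be sketched with a blocking queue. This is a minimal Python illustration under assumed names (LeasedSessionPool and lease are hypothetical, not the gateway's API); it shows only the leasing discipline and leaves out keepalive and eviction of faulted sessions.

```python
import queue
from contextlib import contextmanager


class LeasedSessionPool:
    """Bounded pool that hands each session to at most one caller at a time."""

    def __init__(self, sessions):
        self._idle = queue.Queue()
        for session in sessions:
            self._idle.put(session)

    @contextmanager
    def lease(self, timeout_s=None):
        # Blocks until a session is free; raises queue.Empty if timeout_s elapses.
        session = self._idle.get(timeout=timeout_s)
        try:
            yield session
        finally:
            # Always return the session, even if the operation raised.
            self._idle.put(session)


pool = LeasedSessionPool(["session-A", "session-B"])
with pool.lease() as first, pool.lease() as second:
    # Both sessions are leased out, so a third lease waits and then times out.
    try:
        with pool.lease(timeout_s=0.05):
            third_got_session = True
    except queue.Empty:
        third_got_session = False
print(first, second, third_got_session)
```

Because a session is removed from the queue for the whole duration of the lease, two callers can never share a handle, which sidesteps the unproven concurrent-use case. A real pool would drop a faulted session instead of returning it to the queue.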

Gate decision: GREEN → HistorianGateway A1 Phase 1 (HistorianSession primitive + orchestrator acquire/execute split + re-vendor + leased-session pool with keepalive) is warranted and earns its own design + plan.


5. Write-spike addendum (Phase 1 Stage 0) — 2026-06-25

Extends the harness to the write path via the RunWriteOnSession seam on HistorianGrpcHistoricalWriteOrchestrator. Read + bounded writes to HISTORIAN_WRITE_SANDBOX_TAG only.

reused-write[0] = 377 ms, ok=True
reused-write[1] =  76 ms, ok=True       # 2nd write reuses the same 0x401 session — no handshake
read-on-0x401 -> OK (rows=3)            # a WRITE-enabled session ALSO serves reads

Findings:

  • Write-session reuse — GREEN. Two historical writes on one reused 0x401 (write-enabled) session both succeed; the 2nd skips the Create+handshake.
  • One-kind pool — CONFIRMED. A 0x401 session served a StartQuery read (session.ClientHandle) successfully. So a single write-enabled session serves both reads and writes — the gateway pool needs one session kind, not two. (0x401 "unlocks write capability" and is a superset of the 0x402 read-only mode, as the vendored comment hinted.)

Decision for Phase 1 Stage 3: the gateway always opens WriteEnabled sessions; the HistorianSessionPool is a single warm pool (no per-kind keying). HistorianSessionKind still exists upstream for API clarity, but the gateway uses only WriteEnabled.