# Handshake / session-reuse spike — live results

> **Question:** does the 2023 R2 historian honor REUSING one authenticated session
> (channel + `OpenConnection` client handle) across multiple operations, instead of
> the per-operation Create+handshake the SDK does today? This is the precondition for
> "handshake amortization" (HistorianGateway `pending.md` A1).
>
> **Verdict: GREEN — reuse works and the win is large — but the server idle-expires a
> session in ~20–25 s, so a reuse pool must keep sessions warm.**

**Date:** 2026-06-25
**Branch:** `spike/handshake-reuse`
**Server:** live 2023 R2 (`wonder-sql-vd03`), RemoteGrpc transport, read-only test tag.
**Harness:** `tests/AVEVA.Historian.Client.Tests/HandshakeReuseSpikeTests.cs` driving the new
internal seam `HistorianGrpcReadOrchestrator.RunRawQueryOnSession(connection, clientHandle, …)`
(runs a raw query against an externally-supplied, already-authenticated connection + handle —
no Create, no handshake).

---

## 1. Reuse validity — GREEN

`ReusedSession_RunsManyReads_AllSucceed` **passed**: one `HistorianGrpcChannelFactory.Create` + one
`HistorianGrpcHandshake.OpenSession`, then **5 consecutive `RunRawQueryOnSession` reads on
the same `ClientHandle`** — all returned rows.

```
open-session (handshake) = 325 ms
reused-read[0] = 96 ms, rows=8
reused-read[1] = 101 ms, rows=8
reused-read[2] = 179 ms, rows=8
reused-read[3] = 92 ms, rows=8
reused-read[4] = 95 ms, rows=8
```

The server accepts the same client handle across back-to-back
`StartQuery`/`GetNextQueryResultBuffer`/`EndQuery` cycles. Per-query handles are opened/closed
each op; the **session** handle is the reused artifact.

## 2. Win magnitude — large (~4.7×)

`ReusedSession_VsPerCallPath_LogsLatencyDelta` (logged, not asserted):

```
per-call (5 ops) = 2626 ms # fresh Create + full handshake + query, ×5
amortized (5 ops) = 561 ms # one handshake + 5 reused reads
saving over 5 ops = 2065 ms
```

The handshake (`GetInterfaceVersion` → `ValidateClientCredential` NTLM token loop →
`OpenConnection`, ~325 ms) dominates per-call cost. Averaged over the 5 ops, an amortized read
costs ~110 ms (561 ms / 5, handshake included) vs ~525 ms (2626 ms / 5) per-call.
**Amortization is clearly worth the refactor for any burst of activity.**
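
The headline figures can be rechecked from the two logged totals alone. The snippet below is only a worked-arithmetic sketch; the variable names are illustrative and nothing in it comes from the SDK.

```python
# Totals logged by ReusedSession_VsPerCallPath_LogsLatencyDelta (5 ops each).
OPS = 5
PER_CALL_TOTAL_MS = 2626   # fresh Create + full handshake + query, x5
AMORTIZED_TOTAL_MS = 561   # one handshake + 5 reused reads

speedup = PER_CALL_TOTAL_MS / AMORTIZED_TOTAL_MS
saving_ms = PER_CALL_TOTAL_MS - AMORTIZED_TOTAL_MS
per_call_avg_ms = PER_CALL_TOTAL_MS / OPS
amortized_avg_ms = AMORTIZED_TOTAL_MS / OPS   # still includes the one handshake

print(f"speedup        = {speedup:.2f}x")
print(f"saving         = {saving_ms} ms over {OPS} ops")
print(f"per-call avg   = {per_call_avg_ms:.0f} ms/op")
print(f"amortized avg  = {amortized_avg_ms:.0f} ms/op")
```

The ratio comes out just under 4.7, and the per-op averages are the ~525 ms and ~110 ms quoted in the text. Because the single handshake is spread over only five reads here, longer bursts on one session should push the amortized average lower still.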

## 3. Expiry — idle timeout ~20–25 s (NOT an absolute TTL)

`ReusedSession_IdleSweep_SurfacesExpiryTier` rethrows at the first idle gap the server rejects.

Coarse sweep `[0, 30]`: `idle 0s → OK`, `idle 30s → BROKE`.
Fine sweep `[0,5,10,15,20,25,30]`:

```
idle 0s -> OK (rows=8)
idle 5s -> OK
idle 10s -> OK
idle 15s -> OK
idle 20s -> OK # session age here ≈ 50 s cumulative, still alive
idle 25s -> BROKE (InvalidOperationException: gRPC StartQuery (raw) failed, errorLen=5)
```

**Key inference — it's an idle timeout, not a fixed session lifetime.** The reads at gaps of
5/10/15/20 s kept succeeding even though the cumulative session age reached ~50 s by the 20 s-gap
read, whereas the coarse sweep had already failed at a session age of only ~30 s (assuming each
sweep starts on a fresh session). No single absolute TTL fits both results. The session only died
after a **≥25 s idle gap**, so the idle threshold lies between 20 s and 25 s. A session therefore
appears to survive as long as operations are spaced under ~20 s apart (observed here up to ~50 s
of session age); a quiet gap of ≥25 s invalidates it.
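
The idle-versus-TTL argument can be checked mechanically. The sketch below replays both sweeps, assuming each sweep starts on a fresh session and ignoring read latency when computing session age (both are assumptions, not logged facts). It then asks whether a single threshold on idle gap, or on session age, separates every passing read from every failing one. The helper names are hypothetical.

```python
from itertools import accumulate

def replay(gaps_s, outcomes):
    """Return (idle_gap_s, session_age_s, ok) per read; age ignores read latency."""
    return list(zip(gaps_s, accumulate(gaps_s), outcomes))

# Outcomes copied from the logged sweeps (the fine sweep stops at its first failure).
coarse = replay([0, 30], [True, False])
fine = replay([0, 5, 10, 15, 20, 25], [True] * 5 + [False])
reads = coarse + fine

GAP, AGE = 0, 1  # tuple positions

def bracket(reads, key):
    """Largest passing and smallest failing value of the chosen quantity."""
    max_ok = max(r[key] for r in reads if r[2])
    min_fail = min(r[key] for r in reads if not r[2])
    return max_ok, min_fail

def single_threshold_fits(reads, key):
    """One cutoff explains the data only if every pass lies below every failure."""
    max_ok, min_fail = bracket(reads, key)
    return max_ok < min_fail

idle_fits = single_threshold_fits(reads, GAP)
ttl_fits = single_threshold_fits(reads, AGE)
idle_bracket = bracket(reads, GAP)

print("idle-timeout hypothesis fits:", idle_fits, "bracket (s):", idle_bracket)
print("absolute-TTL hypothesis fits:", ttl_fits)
```

Under these assumptions the idle-gap hypothesis fits with the threshold bracketed between 20 s and 25 s, while the absolute-TTL hypothesis fails because a read passed at ~50 s of age although another failed at ~30 s.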

Expired-session failure mode on the wire: `StartQuery` returns `BSuccess=false` with a 5-byte error
buffer, surfaced by the SDK as
`InvalidOperationException: gRPC StartQuery (raw) failed (errorLen=5)`.

---

## 4. Implications for Phase 1 (the full amortization refactor)

A reuse pool is viable and high-value, with two requirements driven by §3:

1. **Keep sessions warm.** Ping each pooled session well under the ~20 s idle floor (e.g. a
   ~10–15 s keepalive — a cheap handle-using op such as `GetSystemParameter`) so a steady-state
   session never crosses the idle timeout. Without a keepalive, amortization only helps while
   operations arrive less than ~20 s apart.
2. **Reactive re-auth on expiry.** Treat `StartQuery failed (errorLen=5)` (and the equivalent on
   other handle ops) as an expired-session signal: evict the session and re-handshake on next use
   (one handshake penalty). In HistorianGateway this maps onto the existing
   `IHistorianConnectionPool.ReportFaulted` eviction seam.
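
The two requirements can be illustrated with a toy simulation. This is a minimal sketch, not the gateway design: `FakeHistorian`, `WarmSession` and `SessionExpired` are hypothetical stand-ins, the 22 s idle timeout is an assumed value inside the observed 20–25 s bracket, and the 12 s keepalive is picked from the suggested 10–15 s range. Time is a simulated 1 s clock, so nothing sleeps.

```python
class SessionExpired(Exception):
    """Stand-in for the SDK's 'StartQuery failed (errorLen=5)' signal."""

class FakeHistorian:
    """Toy server: a handle dies once it has been idle longer than idle_timeout_s."""
    def __init__(self, idle_timeout_s=22.0):   # assumed, within the 20-25 s bracket
        self.idle_timeout_s = idle_timeout_s
        self.last_used = {}                    # handle -> time of last operation
        self.handshakes = 0

    def open_session(self, now):
        self.handshakes += 1
        handle = self.handshakes
        self.last_used[handle] = now
        return handle

    def touch(self, handle, now):
        """Any handle-using op: a read or a keepalive ping."""
        if now - self.last_used[handle] > self.idle_timeout_s:
            raise SessionExpired(handle)
        self.last_used[handle] = now
        return "ok"

class WarmSession:
    def __init__(self, server, keepalive_s=12.0):
        self.server, self.keepalive_s = server, keepalive_s
        self.handle, self.last_op = None, None

    def tick(self, now):
        """Called periodically; pings only after keepalive_s of quiet."""
        if self.handle is not None and now - self.last_op >= self.keepalive_s:
            self._call(now)

    def read(self, now):
        return self._call(now)

    def _call(self, now):
        if self.handle is None:
            self.handle = self.server.open_session(now)
        try:
            result = self.server.touch(self.handle, now)
        except SessionExpired:
            # Reactive re-auth: evict, pay one handshake, retry once.
            self.handle = self.server.open_session(now)
            result = self.server.touch(self.handle, now)
        self.last_op = now
        return result

def run(read_times, keepalive):
    """Drive one session over a simulated clock; return handshakes paid."""
    server = FakeHistorian()
    session = WarmSession(server)
    for now in range(max(read_times) + 1):
        if now in read_times:
            session.read(now)
        elif keepalive:
            session.tick(now)
    return server.handshakes

warm_handshakes = run({0, 60}, keepalive=True)
cold_handshakes = run({0, 60}, keepalive=False)
print("handshakes with keepalive:", warm_handshakes)
print("handshakes without keepalive:", cold_handshakes)
```

With two reads 60 s apart, the warm session pays one handshake in total, while the unpinged session expires and pays a second one on the retry. In a real pool the `except` branch is where the eviction seam would be invoked.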

**Concurrency note (unchanged guidance):** lease a session exclusively per-op from a bounded pool —
this validity test only exercised *sequential* reuse, so concurrent use of one handle (esp.
streaming cursors) remains unproven and should be avoided by exclusive leasing.
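
Exclusive leasing from a bounded pool can be sketched with a blocking queue. The `LeasedPool` class and the string "sessions" below are hypothetical placeholders, and eviction of faulted sessions is deliberately left out. The demo only shows that no session is ever held by two operations at once.

```python
import queue
import threading
import time
from contextlib import contextmanager

class LeasedPool:
    """Bounded pool: each session is held by at most one operation at a time."""
    def __init__(self, sessions):
        self._idle = queue.Queue()
        for s in sessions:
            self._idle.put(s)

    @contextmanager
    def lease(self, timeout_s=5.0):
        # Blocks while every session is leased; raises queue.Empty on timeout.
        session = self._idle.get(timeout=timeout_s)
        try:
            yield session
        finally:
            self._idle.put(session)   # a real pool would evict a faulted session here

pool = LeasedPool(["session-A", "session-B"])
in_use = {"session-A": 0, "session-B": 0}
max_in_use = dict(in_use)
lock = threading.Lock()

def worker():
    with pool.lease() as s:
        with lock:
            in_use[s] += 1
            max_in_use[s] = max(max_in_use[s], in_use[s])
        time.sleep(0.01)              # pretend to run a query on the leased session
        with lock:
            in_use[s] -= 1

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max concurrent holders per session:", max_in_use)
```

Eight workers share two sessions, yet the peak number of holders per session stays at one because a session is only returned to the queue when its lease ends. The pool size is the concurrency bound: callers beyond it wait rather than share a handle.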

**Gate decision:** GREEN → HistorianGateway A1 Phase 1 (HistorianSession primitive + orchestrator
acquire/execute split + re-vendor + leased-session pool with keepalive) is warranted and earns its
own design + plan.

---

## 5. Write-spike addendum (Phase 1 Stage 0) — 2026-06-25

Extends the harness to the write path via the `RunWriteOnSession` seam on
`HistorianGrpcHistoricalWriteOrchestrator`. Read + bounded writes to `HISTORIAN_WRITE_SANDBOX_TAG`
only.

```
reused-write[0] = 377 ms, ok=True
reused-write[1] = 76 ms, ok=True # 2nd write reuses the same 0x401 session — no handshake
read-on-0x401 -> OK (rows=3) # a WRITE-enabled session ALSO serves reads
```

**Findings:**

- **Write-session reuse — GREEN.** Two historical writes on one reused `0x401` (write-enabled)
  session both succeed; the 2nd skips the Create+handshake.
- **One-kind pool — CONFIRMED.** A `0x401` session served a `StartQuery` read (`session.ClientHandle`)
  successfully. So a single **write-enabled** session serves both reads and writes — the gateway pool
  needs **one session kind**, not two. (`0x401` "unlocks write capability" and is a superset of the
  `0x402` read-only mode, as the vendored comment hinted.)

**Decision for Phase 1 Stage 3:** the gateway always opens `WriteEnabled` sessions; the
`HistorianSessionPool` is a **single warm pool** (no per-kind keying). `HistorianSessionKind` still
exists upstream for API clarity, but the gateway uses only `WriteEnabled`.