histsdk/docs/plans/tcp-connection-validation.md
dohertj2 6888b8c55a Wire SDK for remote-TCP end to end; live-verify RemoteTcpIntegrated
Executes docs/plans/tcp-connection-validation.md. Full read-only SDK
surface now works against a remote AVEVA Historian over Net.TCP with
Windows transport authentication. 124/124 tests pass; the +10 new live
integration tests in RemoteTcpIntegrationTests.cs are gated by
HISTORIAN_REMOTE_TCP_HOST + HISTORIAN_REMOTE_TCP_TAG.

Two SDK bugs found while executing the plan:

1. Historian2020ProtocolDialect.ReadRawAsync / ReadAggregateAsync /
   ReadAtTimeAsync / ReadEventsAsync had explicit
   `if (_options.Transport != HistorianTransport.LocalPipe) return Missing<T>`
   guards. These were a guardrail from before the orchestrators handled
   TCP; the orchestrators have always used CreateBindingPair(options)
   which dispatches on transport correctly. Gates removed.

2. HistorianWcfStatusClient and HistorianWcfEventOrchestrator hardcoded
   HistorianWcfBindingFactory.CreatePipeEndpointAddress for the auxiliary
   services (Stat, Trx, Retr). Worked for LocalPipe; for TCP it produced
   an EndpointAddress with scheme net.pipe attached to a TCP binding
   (channel factory rejected the URI). Worse, when only the endpoint was
   transport-aware, the binding still requested a Windows-transport-
   security upgrade that the Stat endpoint over TCP doesn't support
   (auxiliaries don't repeat the auth — the Hist session is already
   authenticated). Added two helpers:
   - HistorianWcfBindingFactory.CreateAuxiliaryEndpointAddress(options, name)
     -> net.pipe for LocalPipe, net.tcp for remote
   - HistorianWcfBindingFactory.CreateAuxiliaryBinding(options)
     -> NamedPipe for LocalPipe, plain MdasNetTcpBinding for remote
   Both call sites updated.

Live verification against the remote (probed previously in prior
sessions; reachability re-confirmed today):
- ProbeAsync over RemoteTcpIntegrated and RemoteTcpCertificate
- ReadRawAsync (8 samples returned for SysTimeSec)
- ReadAggregateAsync (TimeWeightedAverage, 1-min cycle, 10-min window)
- ReadAtTimeAsync (3 timestamps)
- BrowseTagNamesAsync (finds the test tag)
- GetTagMetadataAsync (full metadata populated)
- ReadEventsAsync (chain runs without throwing)
- GetConnectionStatusAsync (ConnectedToServer=true)
- GetSystemParameterAsync (HistorianVersion="20,0,000,000")

The default 'NT SERVICE\aahClientAccessPoint' SPN turned out to work
for the remote too — discovery workstream A (SPN-finding) was not
needed in practice.

README and the TCP plan doc updated to reflect the executed status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 07:33:50 -04:00


TCP Connection Validation Plan

Status: EXECUTED on 2026-05-04. RemoteTcpIntegrated transport is now live-verified end-to-end against 10.100.0.48 (Historian InterfaceVersion=11) for ProbeAsync, ReadRawAsync, ReadAggregateAsync, ReadAtTimeAsync, ReadEventsAsync, BrowseTagNamesAsync, GetTagMetadataAsync, GetConnectionStatusAsync, GetSystemParameterAsync. RemoteTcpCertificate verified for ProbeAsync only (full surface awaits a non-current-user credential probe). Test count 114 → 124 (+10) per success criteria.

Two SDK bugs were uncovered and fixed during execution:

  1. Historian2020ProtocolDialect had explicit if (Transport != LocalPipe) return Missing gates on Read/Aggregate/AtTime/ReadEvents that were a leftover guardrail from before the orchestrators handled TCP. Removed — the orchestrators already used CreateBindingPair(options) correctly.
  2. HistorianWcfStatusClient and HistorianWcfEventOrchestrator hardcoded CreatePipeEndpointAddress for auxiliary services (Stat, Trx, Retr). Added HistorianWcfBindingFactory.CreateAuxiliaryEndpointAddress and CreateAuxiliaryBinding helpers that dispatch on Transport; for TCP the auxiliaries use plain MdasNetTcpBinding (no transport upgrade — the Hist endpoint already authenticated the session).
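
The transport dispatch in fix 2 can be sketched as a pure function. This is a minimal sketch, not the SDK's code. The HistorianTransport enum below is a local stand-in, and AuxiliaryEndpointSketch and AuxiliaryBindingKind are hypothetical names. The URI layouts (net.pipe://localhost/<name> and net.tcp://<host>:<port>/<name>) are assumptions; they are not necessarily what CreateAuxiliaryEndpointAddress emits.

```csharp
using System;

// Local stand-in for the SDK's HistorianTransport enum (member names from this plan).
public enum HistorianTransport { LocalPipe, RemoteTcpIntegrated, RemoteTcpCertificate }

// Which binding family an auxiliary channel (Stat, Trx, Retr) should use.
public enum AuxiliaryBindingKind { NamedPipe, PlainMdasNetTcp }

public static class AuxiliaryEndpointSketch
{
    // Hypothetical: the URI layouts are assumptions, not copied from HistorianWcfBindingFactory.
    public static Uri Address(HistorianTransport transport, string host, int port, string serviceName) =>
        transport switch
        {
            HistorianTransport.LocalPipe => new Uri($"net.pipe://localhost/{serviceName}"),
            HistorianTransport.RemoteTcpIntegrated or HistorianTransport.RemoteTcpCertificate =>
                new Uri($"net.tcp://{host}:{port}/{serviceName}"),
            _ => throw new ArgumentOutOfRangeException(nameof(transport)),
        };

    // TCP auxiliaries get no transport-security upgrade: the Hist channel already authenticated the session.
    public static AuxiliaryBindingKind Binding(HistorianTransport transport) =>
        transport == HistorianTransport.LocalPipe
            ? AuxiliaryBindingKind.NamedPipe
            : AuxiliaryBindingKind.PlainMdasNetTcp;
}
```

Choosing the scheme and the binding from the same transport value makes the original failure (a net.pipe address attached to a TCP binding) hard to reintroduce.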

The original scope was live verification of the existing remote-TCP transport plumbing, not new wire-protocol reverse engineering. The wire format itself is the same MDAS-encoded SOAP already verified end-to-end over LocalPipe.

Read together with:

1. Goal

"TCP transport works" means the production SDK at src/AVEVA.Historian.Client/ performs every operation in the CLAUDE.md required surface end-to-end against a remote AVEVA Historian over Net.TCP on port 32568, with parsed responses, gated live integration tests, and explicit expectations about when RemoteTcpIntegrated vs RemoteTcpCertificate is the right transport choice.

In scope:

  1. RemoteTcpIntegrated — Net.TCP + SSPI Windows transport credentials, binding CreateMdasNetTcpWindowsBinding (HistorianWcfBindingFactory.cs:36), endpoint /Hist-Integrated. The SDK already wires this through HistorianClientOptions.Transport = HistorianTransport.RemoteTcpIntegrated.
  2. RemoteTcpCertificate — Net.TCP + transport security with no client credential (server cert only), binding CreateMdasNetTcpCertificateBinding (HistorianWcfBindingFactory.cs:66), endpoint /HistCert. SDK plumbing exists.
  3. Plain RemoteTcp (no transport security): CreateMdasNetTcpBinding (HistorianWcfBindingFactory.cs:13) is the binding the /Retr endpoint uses under both transports above. Verify it works end-to-end as the read-side channel.
  4. Verification of every public op over each transport: ProbeAsync, ReadRawAsync, ReadAggregateAsync, ReadAtTimeAsync, ReadEventsAsync, BrowseTagNamesAsync, GetTagMetadataAsync, GetConnectionStatusAsync, GetStoreForwardStatusAsync, GetSystemParameterAsync.
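
The in-scope remote transports map onto Hist/Retr channel pairs as follows. This is a minimal sketch that records only binding and endpoint names as strings. ChannelPlan and BindingPairSketch are hypothetical; the real CreateBindingPair returns WCF binding and endpoint objects. The net.tcp://<host>:<port><path> layout is an assumption. LocalPipe is deliberately left out because this plan does not describe its pipe addresses.

```csharp
using System;

// Local stand-in for the SDK's enum; member names come from this plan.
public enum HistorianTransport { LocalPipe, RemoteTcpIntegrated, RemoteTcpCertificate }

// Hypothetical descriptor: binding names and endpoint paths only, no WCF types.
public sealed record ChannelPlan(string HistBinding, string HistPath, string RetrBinding, string RetrPath)
{
    // Assumed URI layout: net.tcp://<host>:<port><path>.
    public Uri HistUri(string host, int port) => new($"net.tcp://{host}:{port}{HistPath}");
    public Uri RetrUri(string host, int port) => new($"net.tcp://{host}:{port}{RetrPath}");
}

public static class BindingPairSketch
{
    public static ChannelPlan ForRemote(HistorianTransport transport) => transport switch
    {
        // SSPI Windows transport security on Hist; plain Net.TCP on Retr.
        HistorianTransport.RemoteTcpIntegrated =>
            new ChannelPlan("MdasNetTcpWindows", "/Hist-Integrated", "MdasNetTcp", "/Retr"),
        // Server-certificate transport security on Hist; plain Net.TCP on Retr.
        HistorianTransport.RemoteTcpCertificate =>
            new ChannelPlan("MdasNetTcpCertificate", "/HistCert", "MdasNetTcp", "/Retr"),
        _ => throw new ArgumentException("LocalPipe uses named pipes and is not modeled here.", nameof(transport)),
    };
}
```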

Out of scope:

  • New wire-protocol reverse engineering — the binary protocol is identical across transports; only the WCF binding shape and credential negotiation differ.
  • Discovering or installing remote Historian instances — operator task, not SDK work.
  • Cert generation / CA bootstrap — operator task; the SDK consumes a cert, not provisions one.
  • RemoteTcp* for the explicit-credentials path (IntegratedSecurity = false with username + password) — that's a separate gap (HistorianSspiClient currently only handles current-user credentials).
  • Connection pooling, reconnection on idle disconnect, or load balancing across redundant Historians.

2. Preconditions

The work cannot start until all of these are true:

| Precondition | Why | Who |
|---|---|---|
| A reachable remote AVEVA Historian (not localhost) | TCP transport behavior cannot be exercised against a same-host install (the LocalPipe binding short-circuits the SSPI negotiation that Net.TCP + Windows transport actually exercises) | Operator |
| Network reachability on port 32568 (TCP) from the dev workstation | The Historian listens on this port for the /Hist-Integrated, /HistCert, /Retr endpoints | Operator + IT |
| A test account with at least read access | Used for RemoteTcpIntegrated SSPI negotiation | Operator |
| The Historian's SPN registered on the host account | SSPI auth fails without a valid SPN. The native client uses NT SERVICE\aahClientAccessPoint by default for LocalPipe; remote was expected to use something else (likely MSSQLSvc/host:port or a custom historian SPN), so DISCOVER FIRST | Discovery (§4.A) |
| For RemoteTcpCertificate: a server cert exposed at /HistCert, with the cert's CA chain trusted by the dev workstation OR an explicit thumbprint pinning hook | The TLS handshake aborts otherwise | Operator + Discovery |
| At least one tag with non-zero history rows on the remote Historian | Otherwise ReadRawAsync returns empty and we can't distinguish "transport works, no data" from "transport silently broken" | Operator |
| Time skew between dev workstation and remote Historian < 5 minutes | SSPI (Kerberos) negotiation rejects tickets outside the allowed clock skew | Operator |

If any precondition is missing, the plan stops at §4.A discovery and reports back; don't try to "guess" a workaround.
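
A scripted version of the port-reachability precondition can catch a closed firewall before any SSPI debugging starts. This is a minimal sketch built on System.Net.Sockets.TcpClient, and PreflightSketch is a hypothetical name. A successful result only shows that something accepts TCP connections on the port. It does not show that the listener is the Historian or that authentication will succeed.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class PreflightSketch
{
    // Roughly what Test-NetConnection -Port checks: can a TCP connection be opened within the timeout?
    public static async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (Exception ex) when (ex is SocketException or OperationCanceledException)
        {
            // Refused, unresolvable host, or timed out: treat all of them as "not reachable".
            return false;
        }
    }
}
```

For example, CanConnectAsync(host, 32568, TimeSpan.FromSeconds(5)) returning false stops the plan at the preconditions, as the rule above requires.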

3. Current state

Already wired and compiling, never live-verified:

| Path | Status |
|---|---|
| HistorianWcfBindingFactory.CreateBindingPair (:126): dispatch on HistorianTransport enum | all three transport branches exist |
| RemoteTcpIntegrated branch (:138): uses MdasNetTcpWindows for Hist + plain MdasNetTcp for Retr, both at Host:Port | wired |
| RemoteTcpCertificate branch (:143): uses MdasNetTcpCertificate for Hist + plain MdasNetTcp for Retr | wired |
| MdasMessageEncodingBindingElement shared across transports | same encoder used everywhere; not transport-specific |
| HistorianSspiClient (P/Invoke InitializeSecurityContextW with native flags 0x2081C / 0x81C) | ⚠️ only exercised over LocalPipe; need to verify SPN logic works for TCP host SPN |
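
The two native flag values in the HistorianSspiClient row decode into standard ISC_REQ_* request bits from sspi.h. The sketch below spells them out so a TCP capture can be compared flag by flag. The enum is a partial local copy of those constants and covers only the five bits these values use. SspiFlagSketch is a hypothetical name. This plan does not say which of the two values goes with which InitializeSecurityContextW call.

```csharp
using System;

// Partial copy of the ISC_REQ_* constants from sspi.h: only the bits used by 0x81C and 0x2081C.
[Flags]
public enum IscReq : uint
{
    ReplayDetect    = 0x00000004, // ISC_REQ_REPLAY_DETECT
    SequenceDetect  = 0x00000008, // ISC_REQ_SEQUENCE_DETECT
    Confidentiality = 0x00000010, // ISC_REQ_CONFIDENTIALITY
    Connection      = 0x00000800, // ISC_REQ_CONNECTION
    Identify        = 0x00020000, // ISC_REQ_IDENTIFY
}

public static class SspiFlagSketch
{
    private const uint KnownMask =
        (uint)(IscReq.ReplayDetect | IscReq.SequenceDetect | IscReq.Confidentiality |
               IscReq.Connection | IscReq.Identify);

    // Splits a raw fContextReq value into the named bits above and any bits this enum does not cover.
    public static (IscReq Known, uint Unknown) Split(uint raw) =>
        ((IscReq)(raw & KnownMask), raw & ~KnownMask);
}
```

Split(0x81C) gives REPLAY_DETECT | SEQUENCE_DETECT | CONFIDENTIALITY | CONNECTION, and Split(0x2081C) adds IDENTIFY. Neither value leaves unknown bits.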

Hard-coded LocalPipe in tests (these stay in place; TCP coverage is added alongside them):

EventChainDiagnosticTests.cs:30
HistorianClientIntegrationTests.cs:79, :114, :154, :184, :215, :237, :262, :320

There are ten instances of Transport = HistorianTransport.LocalPipe. The existing tests skip cleanly when HISTORIAN_HOST != "localhost"; they do NOT need to change to validate TCP — instead, add new parallel tests gated by a separate env var (e.g., HISTORIAN_REMOTE_TCP_HOST) so both test families run independently.

4. Discovery workstreams

A, B, and C can run in any order and in parallel, since each hits a different surface (binary inspection vs probe vs operator I/O). D must run after A, because it needs A's SPN.

A. SPN discovery (parallel-safe — read-only binary + WCF probe)

The native client's TCP SPN is currently unknown. Find it via:

  1. Static IL — search current/aahClientManaged.dll for the strings "NT SERVICE", "aahClient", and "MSSQLSvc" using tools/AVEVA.Historian.ReverseEngineering (the methods and dnlib-method --instructions commands handled the SSPI flag discovery the same way). The HistorianClientOptions.TargetSpn default (NT SERVICE\aahClientAccessPoint) is the LocalPipe SPN — TCP almost certainly differs.
  2. WCF probe: run tools/AVEVA.Historian.ReverseEngineering -- wcf-probe <remote-host> 32568 against the remote Historian. Capture the SOAP fault on failure; it may echo the expected SPN in the wsa:FaultDetail.
  3. Cross-reference with setspn -L <historian-svc-account> on the remote Historian (requires operator access).

Output: a documented TargetSpn value for RemoteTcpIntegrated use, plus how the SPN is computed (likely host-derived).
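
The string search in step 1 can also be done as a plain byte scan, without dnlib. .NET stores ldstr literals in the metadata #US heap as UTF-16LE, so the scan checks both UTF-16LE and ASCII encodings for candidate SPN strings. This is a minimal sketch, and StringScanSketch is a hypothetical helper. A hit only shows that the string exists in the binary, not how the client uses it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringScanSketch
{
    // Finds every offset where `needle` appears as UTF-16LE (how ldstr literals sit in the #US heap)
    // or as ASCII (how native strings and some metadata appear).
    public static List<(string Encoding, int Offset)> Find(byte[] image, string needle)
    {
        var hits = new List<(string Encoding, int Offset)>();
        var encodings = new (string Name, Encoding Enc)[]
        {
            ("utf-16le", Encoding.Unicode),
            ("ascii", Encoding.ASCII),
        };
        foreach (var (name, enc) in encodings)
        {
            byte[] pattern = enc.GetBytes(needle);
            int start = 0;
            while (start <= image.Length - pattern.Length)
            {
                ReadOnlySpan<byte> rest = image.AsSpan(start);
                int idx = rest.IndexOf(pattern);
                if (idx < 0) break;
                hits.Add((name, start + idx));
                start += idx + 1; // keep scanning after this hit
            }
        }
        return hits;
    }
}
```

To use it, run Find over File.ReadAllBytes("current/aahClientManaged.dll") once for each of "NT SERVICE", "aahClient", and "MSSQLSvc". The resulting offsets can then be inspected with the existing methods / dnlib-method tooling.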

B. Cert binding discovery (parallel-safe — read-only WCF probe)

For RemoteTcpCertificate:

  1. WCF cert probe: run tools/AVEVA.Historian.ReverseEngineering -- wcf-cert-probe <remote-host> 32568 <expected-cn>. The probe captures the cert chain and reports CN/SAN.
  2. Cert validation policy: the current binding (HistorianWcfBindingFactory.cs:74) sets ClientCredentialType = None, and the server cert is validated by the default WCF chain check. Document what's required: a trusted CA, thumbprint pinning, or X509ServiceCertificateAuthentication.CertificateValidationMode = PeerOrChainTrust.
  3. Verify endpoint identity: /HistCert may require an EndpointIdentity (DNS or RSA) on the EndpointAddress. Current code (HistorianWcfBindingFactory.cs:152) does not set one. Test whether identity verification fails without it.

Output: documented cert validation requirements + whether EndpointAddress(uri, identity) overload is needed.
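
The two validation choices in steps 2 and 3 are thumbprint pinning and a DNS identity check. The sketch below compares them over X509Certificate2. CertCheckSketch is hypothetical and is not wired into WCF; in the SDK it would sit behind whichever validator hook §4.B settles on. MatchesHost relies on GetNameInfo(X509NameType.DnsName, false). That call returns a single DNS name, typically the first SAN dNSName entry and otherwise the subject CN, so multiple SANs and wildcards are not handled.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

public static class CertCheckSketch
{
    // Pinning: compare the cert's SHA-1 thumbprint with an operator-supplied value,
    // tolerating spaces, colons, and case differences in how the value was copied.
    public static bool MatchesPinnedThumbprint(X509Certificate2 cert, string expectedThumbprint)
    {
        string normalized = expectedThumbprint.Replace(" ", "").Replace(":", "");
        return string.Equals(cert.Thumbprint, normalized, StringComparison.OrdinalIgnoreCase);
    }

    // DNS identity: does the cert's DNS name match the host the SDK dialed?
    // Only the single name returned by GetNameInfo is checked in this sketch.
    public static bool MatchesHost(X509Certificate2 cert, string host)
    {
        string dnsName = cert.GetNameInfo(X509NameType.DnsName, forIssuer: false);
        return string.Equals(dnsName, host, StringComparison.OrdinalIgnoreCase);
    }
}
```

Pinning still works when the SDK dials an IP address such as 10.100.0.48. DNS validation does not, unless the certificate's name matches the address the SDK dialed. This bears directly on open question 4.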

C. Operator setup checklist (parallel-safe — operator-side)

Produce a one-page checklist the operator runs against the remote Historian to confirm preconditions. Includes:

  • Test-NetConnection -ComputerName <host> -Port 32568 from dev workstation
  • Get-Service aahHistorian* on the remote (verify running)
  • setspn -L <svc-account> to capture registered SPNs
  • sqlcmd -E -S <host> -d Runtime -Q "SELECT TOP 1 TagName FROM Tag" to prove the Runtime DB is reachable with the operator's credentials
  • w32tm /query /status to confirm time sync vs the Historian

Output: docs/plans/tcp-validation-operator-checklist.md (or appendix in this doc) the operator can hand back filled in.

D. Auth-chain delta vs LocalPipe (sequential — needs A's SPN)

Once A returns an SPN, run tools/AVEVA.Historian.ReverseEngineering -- wcf-probe against /Hist-Integrated over TCP and confirm:

  1. The Hist.GetV → Hist.ValCl × N → Hist.Open2 chain runs the same number of ValCl rounds (LocalPipe was 2; TCP may take 2 or 3, depending on whether the Net.TCP Windows transport upgrade has already authenticated the connection).
  2. The OpenConnection2 request bytes are identical (the body is transport-agnostic — only the WCF wrapper differs).
  3. The Open2 response carries the same outParameters shape (42 bytes with version, session GUID, FILETIMEs, status). If TCP returns a different shape, the parser at HistorianWcfAuthChainHelper.cs needs a transport-aware path.

Output: byte-for-byte diff between LocalPipe and TCP capture, with any deltas noted.
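
The byte-for-byte comparison has to ignore fields that legitimately differ per session. This is a minimal sketch, and CaptureDiffSketch is hypothetical. The masked ranges are supplied by the caller because this plan does not give the offsets of the GUID and FILETIME fields inside the 42-byte outParameters.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDiffSketch
{
    // Compares two captures byte-for-byte over their common length, skipping caller-supplied
    // ranges (e.g. session GUID, FILETIMEs). Returns differing offsets plus a length-mismatch flag.
    public static (List<int> DiffOffsets, bool LengthMismatch) Diff(
        ReadOnlySpan<byte> pipeCapture,
        ReadOnlySpan<byte> tcpCapture,
        IReadOnlyList<(int Start, int Length)> masked)
    {
        var diffs = new List<int>();
        int common = Math.Min(pipeCapture.Length, tcpCapture.Length);
        for (int i = 0; i < common; i++)
        {
            if (IsMasked(i, masked)) continue;
            if (pipeCapture[i] != tcpCapture[i]) diffs.Add(i);
        }
        return (diffs, pipeCapture.Length != tcpCapture.Length);
    }

    private static bool IsMasked(int offset, IReadOnlyList<(int Start, int Length)> masked)
    {
        foreach (var (start, length) in masked)
        {
            if (offset >= start && offset < start + length) return true;
        }
        return false;
    }
}
```

A non-empty offset list or a length mismatch is a delta worth recording in handoff.md.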

5. Verification workstreams

Once §4.A and §4.B return actionable answers, each group of operations gets its own verification track. Tracks V2-V5 are parallelizable because each exercises different SDK methods and they share no state; V1 gates them, as noted below the table.

| Track | Op | Live test to author | Parallel-safe? |
|---|---|---|---|
| V1 | ProbeAsync | ProbeAsync_RemoteTcpIntegrated_ReturnsTrue + ProbeAsync_RemoteTcpCertificate_ReturnsTrue | Runs first; gates V2-V5 |
| V2 | Reads (ReadRawAsync, ReadAggregateAsync, ReadAtTimeAsync) | mirror of the existing 3 LocalPipe tests, env-var gated by HISTORIAN_REMOTE_TCP_HOST | Yes, after V1 |
| V3 | ReadEventsAsync | mirror of ReadEventsAsync_AgainstLocalHistorian_DoesNotThrow | Yes, after V1 |
| V4 | Tag ops (BrowseTagNamesAsync, GetTagMetadataAsync) | mirror of the two LocalPipe tests | Yes, after V1 |
| V5 | Status helpers (GetConnectionStatusAsync, GetStoreForwardStatusAsync, GetSystemParameterAsync) | mirror of the three LocalPipe tests | Yes, after V1 |

The only sequential dependency: V1 must pass before V2-V5 are meaningful (if ProbeAsync returns false, the other tracks will fail for the same transport reason).
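
The V1-first ordering can be encoded directly in a small harness. This is a minimal sketch, and TrackRunnerSketch is hypothetical. Each track is modeled as a delegate that returns pass/fail rather than as a real test method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TrackRunnerSketch
{
    // Runs V1 (the probe) first; the remaining tracks start only if it passed, and then run concurrently.
    public static async Task<IReadOnlyDictionary<string, bool>> RunAsync(
        Func<Task<bool>> probe,
        IReadOnlyDictionary<string, Func<Task<bool>>> tracks)
    {
        bool probeOk = await probe();
        var results = new Dictionary<string, bool> { ["V1"] = probeOk };
        if (!probeOk)
        {
            // A failed probe means the transport is broken; the other tracks would fail for the same reason.
            return results;
        }

        string[] names = tracks.Keys.ToArray();
        bool[] outcomes = await Task.WhenAll(names.Select(name => tracks[name]()));
        for (int i = 0; i < names.Length; i++)
        {
            results[names[i]] = outcomes[i];
        }
        return results;
    }
}
```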

For each track, the test pattern is:

```csharp
string? host = Environment.GetEnvironmentVariable("HISTORIAN_REMOTE_TCP_HOST");
if (string.IsNullOrWhiteSpace(host) || !OperatingSystem.IsWindows()) return;

HistorianClient client = new(new HistorianClientOptions
{
    Host = host,
    Port = 32568,
    IntegratedSecurity = true,
    Transport = HistorianTransport.RemoteTcpIntegrated,
    TargetSpn = Environment.GetEnvironmentVariable("HISTORIAN_REMOTE_TCP_SPN")
        ?? throw new InvalidOperationException("Set HISTORIAN_REMOTE_TCP_SPN per §4.A"),
});

// ... existing test body, unchanged ...
```

Add a parallel set for RemoteTcpCertificate gated by HISTORIAN_REMOTE_TCPCERT_HOST + HISTORIAN_REMOTE_TCPCERT_THUMBPRINT.
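
Both test families can share one gating helper, so a missing variable skips the family instead of failing it. This is a minimal sketch, and RemoteTcpGateSketch and RemoteTcpTarget are hypothetical. HISTORIAN_REMOTE_TCP_HOST and HISTORIAN_REMOTE_TCP_TAG are the variables the executed integration tests are gated by. The two HISTORIAN_REMOTE_TCPCERT_* names come from the sentence above. The lookup is injected so the helper can be tested without touching the process environment.

```csharp
using System;

// Hypothetical target description for the remote-TCP test families.
public sealed record RemoteTcpTarget(string Host, string? Tag, string? Thumbprint);

public static class RemoteTcpGateSketch
{
    // Integrated family: runs only when both HISTORIAN_REMOTE_TCP_HOST and HISTORIAN_REMOTE_TCP_TAG are set.
    public static RemoteTcpTarget? Integrated(Func<string, string?> env) =>
        Read(env, "HISTORIAN_REMOTE_TCP_HOST") is { } host && Read(env, "HISTORIAN_REMOTE_TCP_TAG") is { } tag
            ? new RemoteTcpTarget(host, tag, null)
            : null;

    // Certificate family: runs only when HISTORIAN_REMOTE_TCPCERT_HOST and ..._THUMBPRINT are both set.
    public static RemoteTcpTarget? Certificate(Func<string, string?> env) =>
        Read(env, "HISTORIAN_REMOTE_TCPCERT_HOST") is { } host && Read(env, "HISTORIAN_REMOTE_TCPCERT_THUMBPRINT") is { } thumb
            ? new RemoteTcpTarget(host, null, thumb)
            : null;

    // Treats unset and whitespace-only values the same way, and trims the rest.
    private static string? Read(Func<string, string?> env, string name) =>
        env(name) is { } value && !string.IsNullOrWhiteSpace(value) ? value.Trim() : null;
}
```

In a test, pass Environment.GetEnvironmentVariable as the lookup and return early when the result is null, alongside the existing OperatingSystem.IsWindows() check.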

6. Risks and mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| SPN mismatch: TCP SSPI negotiation rejects with SEC_E_TARGET_UNKNOWN | High | All TCP ops fail | §4.A discovery first; expose TargetSpn as already done in HistorianClientOptions |
| Cert chain validation rejects: server cert not trusted by dev workstation | High for Certificate transport | Cert transport unusable | §4.B: document required CA / pinning hook; consider a ServerCertificateValidator callback option |
| Endpoint identity required: /HistCert rejects without DNS identity in EndpointAddress | Medium | Cert transport unusable | §4.B step 3; if confirmed, add overload to CreateEndpointAddress |
| Wire-level idle disconnect: TCP connection dropped after N seconds idle, mid-test | Medium | Flaky tests | Set RequestTimeout low enough to fail fast; add reconnect logic if seen repeatedly |
| Open2 response differs over TCP: extra bytes in outParameters for TCP-specific session state | Low (reads/events use the same ConnectionMode 0x402 regardless of transport) | Auth chain breaks | §4.D byte-diff captures it; if found, transport-aware parser branch |
| Compression negotiation: HistorianClientOptions.Compression unset on LocalPipe; over TCP, the server might enable gzip and our MdasMessageEncoder doesn't unwrap it | Medium-Low | Requests succeed, responses garbled | Confirm compression off in initial probe; add gzip handling later if needed |
| Time skew: Kerberos ticket clock skew > 5 min rejects auth | Low | Total auth failure | Operator checklist (§4.C) catches this |
| Probe succeeds but reads silently empty: common when tag permissions don't grant the test account read access | Medium | False positive in V1, V2 fails | V2 asserts samples.Count > 0 for a tag known to have data |
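
The compression mitigation ("confirm compression off in initial probe") comes down to checking response bytes for a gzip header. This is a minimal sketch, and CompressionCheckSketch is hypothetical. It assumes any server-side compression would be plain gzip (RFC 1952). It does not show where inside the MdasMessageEncoder the raw response bytes would be tapped.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionCheckSketch
{
    // RFC 1952: a gzip member starts with ID1 = 0x1F, ID2 = 0x8B, then CM = 8 (deflate).
    public static bool LooksGzipped(ReadOnlySpan<byte> payload) =>
        payload.Length >= 3 && payload[0] == 0x1F && payload[1] == 0x8B && payload[2] == 0x08;

    // Fallback for the "add gzip handling later" branch: inflate the payload before MDAS decoding.
    public static byte[] Gunzip(byte[] payload)
    {
        using var input = new MemoryStream(payload);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}
```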

7. Success criteria

For RemoteTcpIntegrated:

  • ProbeAsync returns true against the remote host
  • All five Verification tracks (V1-V5) pass against the remote host
  • Captured wire bytes for Open2 and StartQuery2 match the LocalPipe captures (modulo session-specific GUIDs / FILETIMEs)
  • Test count goes from 114 → 124 (10 new live tests) when both env vars are set
  • All existing Transport = HistorianTransport.LocalPipe tests still pass when only the LocalPipe env var is set (no regression)

For RemoteTcpCertificate: same as above, gated by the cert-specific env vars. May skip ReadEvents if the cert account doesn't have AnE permission.

For documentation:

  • README.md operation status table updated: RemoteTcpIntegrated and RemoteTcpCertificate transports change from "wired but only LocalPipe has live verification" to "live-verified"
  • docs/reverse-engineering/handoff.md gets a new section documenting any LocalPipe vs TCP wire-byte deltas found in §4.D

8. Open questions

  1. Is there even a remote Historian available to test against? If not, this plan stalls at §2 preconditions until one is provisioned. (Note: handoff.md mentions a 10.100.0.x remote Historian and a Debian relay used in earlier sessions — verify whether that infrastructure is still live and reachable.)
  2. Does RemoteTcpCertificate use mutual-TLS or just server-cert? Current binding (HistorianWcfBindingFactory.cs:74) sets ClientCredentialType = None (server-cert only). Confirm against the actual /HistCert endpoint behavior.
  3. Does the /Retr channel need its own auth, or does it inherit from the /Hist-Integrated session? Current code uses plain MdasNetTcp (no transport security) for Retr in both RemoteTcpIntegrated and RemoteTcpCertificate configurations — is that actually how the native client does it, or does the native push security on Retr too?
  4. What happens if the cert presented by /HistCert has a SAN that doesn't match the host the SDK connected to? Decide pinning vs DNS validation.
  5. The HistorianClientOptions.Compression flag exists but is not consumed anywhere in the WCF layer. Is compression a transport concern or an application-payload concern? Need to know before TCP — the bandwidth savings only matter over WAN.

9. Parallelization summary

Within the discovery phase: A, B, C run in parallel. D blocks on A.

Within the verification phase: V1 must pass first, then V2-V5 parallel.

Both RemoteTcpIntegrated and RemoteTcpCertificate verification tracks can run independently from each other.

End-to-end estimated wall-clock if all preconditions are met:

  • §4 discovery: half a day if SPN is straightforward, longer if cert chain surprises bite.
  • §5 verification: 2-4 hours given the test scaffolding is largely a copy of the existing LocalPipe tests.

If only one developer works the plan: ~1 day. With two developers parallelizing across RemoteTcpIntegrated and RemoteTcpCertificate: ~half a day.

10. Out of scope (filed under separate plans)

  • Write commands over TCP: docs/plans/write-commands-reverse-engineering.md covers writes; once that lands, this doc adds a §11 "TCP write verification" track.
  • Store/Forward sidecar over TCP — covered by docs/plans/store-forward-cache-reverse-engineering.md. SF probably uses a separate IPC anyway, not Net.TCP.
  • Explicit-credentials TCP: IntegratedSecurity = false with username + password requires HistorianSspiClient to support explicit credentials, which is its own task. Net.TCP Windows transport security can authenticate with either the current user's credentials or explicit ones, but the SDK's SSPI client only uses the current user's credentials today.