From 1b31c24c8d2f1c2b7ab69ac596a62b5736087ac9 Mon Sep 17 00:00:00 2001
From: dohertj2
Date: Mon, 4 May 2026 07:20:39 -0400
Subject: [PATCH] Plan TCP connection validation (live verification of the existing remote-TCP plumbing)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

docs/plans/tcp-connection-validation.md (308 lines):

Plan to live-verify the RemoteTcpIntegrated and RemoteTcpCertificate
transports against an actual remote AVEVA Historian. The SDK's
HistorianWcfBindingFactory already builds all three bindings
(CreateMdasNetTcpBinding, CreateMdasNetTcpWindowsBinding,
CreateMdasNetTcpCertificateBinding) but only LocalPipe has been
exercised end-to-end. Wire format is identical across transports; only
WCF binding shape and credential negotiation differ.

Discovery workstreams A/B/C run in parallel (SPN discovery via static
IL + WCF probe; cert binding requirements via wcf-cert-probe; operator
preconditions checklist). D blocks on A. Verification tracks V2-V5 also
parallelize once V1 (ProbeAsync) confirms the transport is reachable.

Includes risks (SPN mismatch, cert chain validation, idle disconnect,
Open2 response delta, compression negotiation, time skew, false-positive
empty reads), success criteria, five open questions, and explicit
out-of-scope items filed under the existing write-commands and
store-forward plans.

No code changes; no preconditions assumed met. Implementer must satisfy
§2 preconditions (reachable remote Historian, port 32568 open, test
account, SPN registered, etc.) before §4 discovery starts.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/plans/tcp-connection-validation.md | 308 ++++++++++++++++++++++++
 1 file changed, 308 insertions(+)
 create mode 100644 docs/plans/tcp-connection-validation.md

diff --git a/docs/plans/tcp-connection-validation.md b/docs/plans/tcp-connection-validation.md
new file mode 100644
index 0000000..d1a12e5
--- /dev/null
+++ b/docs/plans/tcp-connection-validation.md
@@ -0,0 +1,308 @@
+# TCP Connection Validation Plan
+
+Status: PLAN ONLY (no implementation yet). Scope is **live verification of
+the existing remote-TCP transport plumbing**, not new wire-protocol
+reverse-engineering — the wire format itself is the same MDAS-encoded SOAP
+already verified end-to-end over `LocalPipe`.
+
+Read together with:
+
+- [`docs/reverse-engineering/handoff.md`](../reverse-engineering/handoff.md) — protocol decode state for the reads/events/status helpers
+- [`src/AVEVA.Historian.Client/Wcf/HistorianWcfBindingFactory.cs`](../../src/AVEVA.Historian.Client/Wcf/HistorianWcfBindingFactory.cs) — the three already-built bindings
+- [`tests/AVEVA.Historian.Client.Tests/HistorianClientIntegrationTests.cs`](../../tests/AVEVA.Historian.Client.Tests/HistorianClientIntegrationTests.cs) — the existing live test pattern (env-var gated, currently `LocalPipe`-only)
+
+## 1. Goal
+
+"TCP transport works" means the production SDK at `src/AVEVA.Historian.Client/`
+performs every operation in the CLAUDE.md required surface end-to-end against
+a **remote** AVEVA Historian over Net.TCP on port `32568`, with parsed
+responses, gated live integration tests, and explicit expectations about
+when `RemoteTcpIntegrated` vs `RemoteTcpCertificate` is the right transport
+choice.
+
+In scope:
+
+1. **`RemoteTcpIntegrated`** — Net.TCP + SSPI Windows transport credentials,
+   binding `CreateMdasNetTcpWindowsBinding` (`HistorianWcfBindingFactory.cs:36`),
+   endpoint `/Hist-Integrated`. The SDK already wires this through
+   `HistorianClientOptions.Transport = HistorianTransport.RemoteTcpIntegrated`.
+2. **`RemoteTcpCertificate`** — Net.TCP + transport security with no client
+   credential (server cert only), binding `CreateMdasNetTcpCertificateBinding`
+   (`HistorianWcfBindingFactory.cs:66`), endpoint `/HistCert`. SDK plumbing
+   exists.
+3. **Plain `RemoteTcp`** (no transport security) — `CreateMdasNetTcpBinding`
+   (`HistorianWcfBindingFactory.cs:13`) is what the `/Retr` endpoint uses
+   for both above. Verify it works end-to-end as the read-side channel.
+4. **Verification of every public op** over each transport: `ProbeAsync`,
+   `ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`, `ReadEventsAsync`,
+   `BrowseTagNamesAsync`, `GetTagMetadataAsync`, `GetConnectionStatusAsync`,
+   `GetStoreForwardStatusAsync`, `GetSystemParameterAsync`.
+
+Out of scope:
+
+- New wire-protocol reverse engineering — the binary protocol is identical
+  across transports; only the WCF binding shape and credential negotiation
+  differ.
+- Discovering or installing remote Historian instances — operator task,
+  not SDK work.
+- Cert generation / CA bootstrap — operator task; the SDK consumes a cert,
+  it does not provision one.
+- `RemoteTcp*` for the explicit-credentials path
+  (`IntegratedSecurity = false` with username + password) — that's a
+  separate gap (`HistorianSspiClient` currently only handles current-user
+  credentials).
+- Connection pooling, reconnection on idle disconnect, or load balancing
+  across redundant Historians.
+
+## 2. Preconditions
+
+The work cannot start until **all** of these are true:
+
+| Precondition | Why | Who |
+|---|---|---|
+| A reachable remote AVEVA Historian (not `localhost`) | TCP transport behavior cannot be exercised against a same-host install (the LocalPipe binding short-circuits the SSPI negotiation that `Net.TCP + Windows transport` actually exercises) | Operator |
+| Network reachability on port 32568 (TCP) from the dev workstation | The Historian listens on this port for the `/Hist-Integrated`, `/HistCert`, `/Retr` endpoints | Operator + IT |
+| A test account with at least read access | Used for `RemoteTcpIntegrated` SSPI negotiation | Operator |
+| The Historian's SPN registered on the host account | SSPI auth fails without a valid SPN; the native client defaults to `NT SERVICE\aahClientAccessPoint` for LocalPipe. Remote uses something else (likely `MSSQLSvc/host:port` or a custom historian SPN) — DISCOVER FIRST | Discovery (§4.A) |
+| For `RemoteTcpCertificate`: a server cert exposed at `/HistCert`, with the cert's CA chain trusted by the dev workstation OR an explicit thumbprint pinning hook | TLS handshake aborts otherwise | Operator + Discovery |
+| At least one tag with non-zero history rows on the remote Historian | Otherwise `ReadRawAsync` returns empty and we can't distinguish "transport works, no data" from "transport silently broken" | Operator |
+| Time skew between dev workstation and remote Historian < 5 minutes | SSPI negotiation rejects out-of-skew tickets | Operator |
+
+If any precondition is missing, the plan **stops** at §4.A discovery and
+reports back; don't try to "guess" a workaround.
+
+## 3. Current state
+
+**Already wired and compiling, never live-verified:**
+
+| Path | Status |
+|---|---|
+| `HistorianWcfBindingFactory.CreateBindingPair` (`:126`) — dispatch on `HistorianTransport` enum | ✅ all three transport branches exist |
+| `RemoteTcpIntegrated` branch (`:138`) — uses `MdasNetTcpWindows` for Hist + plain `MdasNetTcp` for Retr, both at `Host:Port` | ✅ wired |
+| `RemoteTcpCertificate` branch (`:143`) — uses `MdasNetTcpCertificate` for Hist + plain `MdasNetTcp` for Retr | ✅ wired |
+| `MdasMessageEncodingBindingElement` shared across transports | ✅ same encoder used everywhere; not transport-specific |
+| `HistorianSspiClient` (P/Invoke `InitializeSecurityContextW` with native flags `0x2081C` / `0x81C`) | ⚠️ only exercised over LocalPipe; need to verify SPN logic works for TCP host SPN |
+
+**Hard-coded LocalPipe in tests** (must not be left in place once TCP is verified):
+
+```text
+EventChainDiagnosticTests.cs:30
+HistorianClientIntegrationTests.cs:79, :114, :154, :184, :215, :237, :262, :320
+```
+
+There are ten instances of `Transport = HistorianTransport.LocalPipe`. The
+existing tests skip cleanly when `HISTORIAN_HOST != "localhost"`; they do
+NOT need to change to validate TCP — instead, add **new** parallel tests
+gated by a separate env var (e.g., `HISTORIAN_REMOTE_TCP_HOST`) so both
+test families run independently.
+
+## 4. Discovery workstreams
+
+Workstreams **A, B, and C are parallelizable** and can run in any order,
+since each hits a different surface (binary inspection vs probe vs operator
+I/O). D is sequential: it needs the SPN that A produces.
+
+### A. SPN discovery (parallel-safe — read-only binary + WCF probe)
+
+The native client's TCP SPN is currently unknown. Find it via:
+
+1. **Static IL** — search `current/aahClientManaged.dll` for the strings
+   `"NT SERVICE"`, `"aahClient"`, and `"MSSQLSvc"` using
+   `tools/AVEVA.Historian.ReverseEngineering` (the `methods` and
+   `dnlib-method --instructions` commands handled the SSPI flag discovery
+   the same way). The `HistorianClientOptions.TargetSpn` default
+   (`NT SERVICE\aahClientAccessPoint`) is the LocalPipe SPN — TCP almost
+   certainly differs.
+2. **WCF probe** — `tools/AVEVA.Historian.ReverseEngineering -- wcf-probe
+   32568` against the remote Historian. Capture the SOAP
+   fault on failure: it usually echoes the expected SPN in the
+   `wsa:FaultDetail`.
+3. **Cross-reference** with `setspn -L ` on the
+   remote Historian (requires operator access).
+
+Output: a documented `TargetSpn` value for `RemoteTcpIntegrated` use, plus
+how the SPN is computed (likely host-derived).
+
+### B. Cert binding discovery (parallel-safe — read-only WCF probe)
+
+For `RemoteTcpCertificate`:
+
+1. **WCF cert probe** — `tools/AVEVA.Historian.ReverseEngineering --
+   wcf-cert-probe 32568 `. The probe captures
+   the cert chain and reports CN/SAN.
+2. **Cert validation policy** — current binding (`HistorianWcfBindingFactory.cs:74`)
+   sets `ClientCredentialType = None`; the server cert is validated by the
+   default WCF chain check. Document what's required: trusted CA, or
+   thumbprint pinning, or
+   `X509ServiceCertificateAuthentication.CertificateValidationMode = PeerOrChainTrust`.
+3. **Verify endpoint identity** — `/HistCert` may require an `EndpointIdentity`
+   (DNS or RSA) on the `EndpointAddress`. Current code (`HistorianWcfBindingFactory.cs:152`)
+   does not set one. Test whether identity verification fails without it.
+
+Output: documented cert validation requirements + whether the
+`EndpointAddress(uri, identity)` overload is needed.
+
+### C. Operator setup checklist (parallel-safe — operator-side)
+
+Produce a one-page checklist the operator runs against the remote Historian
+to confirm preconditions. It includes:
+
+- `Test-NetConnection -ComputerName -Port 32568` from dev workstation
+- `Get-Service aahHistorian*` on the remote (verify running)
+- `setspn -L ` to capture registered SPNs
+- `sqlcmd -E -S -d Runtime -Q "SELECT TOP 1 TagName FROM Tag"` to
+  prove the Runtime DB is reachable with the operator's credentials
+- `w32tm /query /status` to confirm time sync vs the Historian
+
+Output: `docs/plans/tcp-validation-operator-checklist.md` (or appendix in
+this doc) the operator can hand back filled in.
+
+### D. Auth-chain delta vs LocalPipe (sequential — needs A's SPN)
+
+Once A returns an SPN, run `tools/AVEVA.Historian.ReverseEngineering --
+wcf-probe` against `/Hist-Integrated` over TCP and confirm:
+
+1. The `Hist.GetV → Hist.ValCl × N → Hist.Open2` chain runs the same number
+   of ValCl rounds (LocalPipe was 2; TCP may be 2 or 3 depending on whether
+   the underlying transport already negotiated something).
+2. The `OpenConnection2` request bytes are identical (the body is
+   transport-agnostic — only the WCF wrapper differs).
+3. The Open2 response carries the same `outParameters` shape (42 bytes
+   with version, session GUID, FILETIMEs, status). If TCP returns a
+   different shape, the parser at `HistorianWcfAuthChainHelper.cs` needs a
+   transport-aware path.
+
+Output: byte-for-byte diff between LocalPipe and TCP capture, with any
+deltas noted.
+
+## 5. Verification workstreams
+
+Once §4.A and §4.B return actionable answers, every operation gets its own
+verification track. **Tracks V2-V5 are parallelizable once V1 passes**
+because each exercises a different SDK method and they don't share state.
+
+| Track | Op | Live test to author | Parallel-safe? |
+|---|---|---|---|
+| V1 | `ProbeAsync` | `ProbeAsync_RemoteTcpIntegrated_ReturnsTrue` + `ProbeAsync_RemoteTcpCertificate_ReturnsTrue` | ✅ |
+| V2 | Reads (`ReadRawAsync`, `ReadAggregateAsync`, `ReadAtTimeAsync`) | mirror of the existing 3 LocalPipe tests, env-var gated by `HISTORIAN_REMOTE_TCP_HOST` | ✅ |
+| V3 | `ReadEventsAsync` | mirror of `ReadEventsAsync_AgainstLocalHistorian_DoesNotThrow` | ✅ |
+| V4 | Tag ops (`BrowseTagNamesAsync`, `GetTagMetadataAsync`) | mirror of the two LocalPipe tests | ✅ |
+| V5 | Status helpers (`GetConnectionStatusAsync`, `GetStoreForwardStatusAsync`, `GetSystemParameterAsync`) | mirror of the three LocalPipe tests | ✅ |
+
+The only sequential dependency: V1 must pass before V2-V5 are meaningful
+(if `ProbeAsync` returns false, the others will fail too, for transport reasons).
+
+For each track, the test pattern is:
+
+```csharp
+string? host = Environment.GetEnvironmentVariable("HISTORIAN_REMOTE_TCP_HOST");
+if (string.IsNullOrWhiteSpace(host) || !OperatingSystem.IsWindows()) return;
+
+HistorianClient client = new(new HistorianClientOptions
+{
+    Host = host,
+    Port = 32568,
+    IntegratedSecurity = true,
+    Transport = HistorianTransport.RemoteTcpIntegrated,
+    TargetSpn = Environment.GetEnvironmentVariable("HISTORIAN_REMOTE_TCP_SPN")
+        ?? throw new InvalidOperationException("Set HISTORIAN_REMOTE_TCP_SPN per §4.A"),
+});
+
+// ... existing test body, unchanged ...
+```
+
+Add a parallel set for `RemoteTcpCertificate` gated by
+`HISTORIAN_REMOTE_TCPCERT_HOST` + `HISTORIAN_REMOTE_TCPCERT_THUMBPRINT`.
+
+## 6. Risks and mitigations
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| **SPN mismatch** — TCP SSPI negotiation rejects with `SEC_E_TARGET_UNKNOWN` | High | All TCP ops fail | §4.A discovery first; expose `TargetSpn` as already done in `HistorianClientOptions` |
+| **Cert chain validation rejects** — server cert not trusted by dev workstation | High for `Certificate` transport | Cert transport unusable | §4.B: document required CA / pinning hook; consider a `ServerCertificateValidator` callback option |
+| **Endpoint identity required** — `/HistCert` rejects without DNS identity in `EndpointAddress` | Medium | Cert transport unusable | §4.B step 3; if confirmed, add overload to `CreateEndpointAddress` |
+| **Wire-level idle disconnect** — TCP connection dropped after N seconds idle, mid-test | Medium | Flaky tests | Set `RequestTimeout` low enough to fail fast; add reconnect logic if seen repeatedly |
+| **Open2 response differs over TCP** — extra bytes in `outParameters` for TCP-specific session state | Low (reads/events use the same ConnectionMode 0x402 regardless of transport) | Auth chain breaks | §4.D byte-diff captures it; if found, transport-aware parser branch |
+| **Compression negotiation** — `HistorianClientOptions.Compression` unset on LocalPipe; over TCP, the server might enable gzip and our `MdasMessageEncoder` doesn't unwrap it | Medium-Low | Requests succeed, responses garbled | Confirm compression off in initial probe; add gzip handling later if needed |
+| **Time skew** — Kerberos rejects auth when clock skew exceeds 5 min | Low | Total auth failure | Operator checklist (§4.C) catches this |
+| **`Probe` succeeds but reads silently empty** — common when tag-permissions don't grant the test account read access | Medium | False positive in V1, V2 fails | V2 asserts `samples.Count > 0` for a tag known to have data |
+
+## 7. Success criteria
+
+For `RemoteTcpIntegrated`:
+
+- [ ] `ProbeAsync` returns `true` against the remote host
+- [ ] All five verification tracks (V1-V5) pass against the remote host
+- [ ] Captured wire bytes for `Open2` and `StartQuery2` match the LocalPipe
+      captures (modulo session-specific GUIDs / FILETIMEs)
+- [ ] Test count goes from 114 to 124 (10 new live tests) when both env vars are set
+- [ ] All existing `Transport = HistorianTransport.LocalPipe` tests still
+      pass when only the LocalPipe env var is set (no regression)
+
+For `RemoteTcpCertificate`: same as above, gated by the cert-specific env
+vars. May skip ReadEvents if the cert account doesn't have AnE permission.
+
+For documentation:
+
+- [ ] `README.md` operation status table updated: `RemoteTcpIntegrated` and
+      `RemoteTcpCertificate` transports change from "wired but only
+      `LocalPipe` has live verification" to "live-verified"
+- [ ] `docs/reverse-engineering/handoff.md` gets a new section documenting
+      any LocalPipe vs TCP wire-byte deltas found in §4.D
+
+## 8. Open questions
+
+1. Is there even a remote Historian available to test against? If not, this
+   plan stalls at §2 preconditions until one is provisioned. (Note:
+   handoff.md mentions a `10.100.0.x` remote Historian and a Debian relay
+   used in earlier sessions — verify whether that infrastructure is still
+   live and reachable.)
+2. Does `RemoteTcpCertificate` use mutual-TLS or just server-cert? Current
+   binding (`HistorianWcfBindingFactory.cs:74`) sets
+   `ClientCredentialType = None` (server-cert only). Confirm against the
+   actual `/HistCert` endpoint behavior.
+3. Does the `/Retr` channel need its own auth, or does it inherit from the
+   `/Hist-Integrated` session? Current code uses plain `MdasNetTcp` (no
+   transport security) for Retr in both `RemoteTcpIntegrated` and
+   `RemoteTcpCertificate` configurations — is that actually how the native
+   client does it, or does it apply transport security to Retr too?
+4. What happens if the cert presented by `/HistCert` has a SAN that doesn't
+   match the host the SDK connected to? Decide pinning vs DNS validation.
+5. The `HistorianClientOptions.Compression` flag exists but is not consumed
+   anywhere in the WCF layer. Is compression a transport concern or an
+   application-payload concern? This needs an answer before TCP verification,
+   because the bandwidth savings only matter over a WAN.
+
+## 9. Parallelization summary
+
+Within the discovery phase: A, B, C run in parallel. D blocks on A.
+
+Within the verification phase: V1 must pass first, then V2-V5 run in parallel.
+
+Both `RemoteTcpIntegrated` and `RemoteTcpCertificate` verification tracks
+can run independently of each other.
+
+End-to-end estimated wall-clock if all preconditions are met:
+
+- §4 discovery: half a day if SPN is straightforward, longer if cert chain
+  surprises bite.
+- §5 verification: 2-4 hours given the test scaffolding is largely a copy
+  of the existing LocalPipe tests.
+
+If only one developer works the plan: ~1 day. With two developers
+parallelizing across `RemoteTcpIntegrated` and `RemoteTcpCertificate`:
+~half a day.
+
+## 10. Out of scope (filed under separate plans)
+
+- **Write commands over TCP** — `docs/plans/write-commands-reverse-engineering.md`
+  covers writes; once that lands, this doc adds a §11 "TCP write
+  verification" track.
+- **Store/Forward sidecar over TCP** — covered by
+  `docs/plans/store-forward-cache-reverse-engineering.md`. SF probably
+  uses a separate IPC anyway, not Net.TCP.
+- **Explicit-credentials TCP** — `IntegratedSecurity = false` with
+  username + password requires `HistorianSspiClient` to support explicit
+  credentials, which is its own task. Net.TCP can use either Kerberos or
+  the explicit creds, but the SDK's SSPI client only does current-user
+  Kerberos today.