diff --git a/docs/reverse-engineering/wcf-event-read-spike-results.md b/docs/reverse-engineering/wcf-event-read-spike-results.md index 16c610d..1f4ad98 100644 --- a/docs/reverse-engineering/wcf-event-read-spike-results.md +++ b/docs/reverse-engineering/wcf-event-read-spike-results.md @@ -1,63 +1,75 @@ -# WCF event-read spike — live result (2026-06-25): WCF transport not served on 2023 R2 +# WCF event-read spike — live result (2026-06-25/26): transport+auth viable, row-retrieval server-gated Settles the open question behind **C2** ("event reads over gRPC are gated; the only listed unblock is *route event reads via WCF*"). The gRPC event-read path is a proven server-side dead-end (`grpc-event-query-capture.md`: auth fully solved, every client-controllable layer byte-matched to the -stock client, yet the server scopes 0 rows to our connection). This spike tested the **WCF** leg. +stock client, yet the server scopes 0 rows to our connection). This spike resolved the **WCF** leg. + +> **Correction to an earlier draft of this doc.** A first pass concluded "the 2023 R2 historian does not +> serve the legacy WCF transport (connection reset at framing)." **That was a test error, not a server +> fact.** It connected to the historian's real WCF port `32568` *directly* and used the Windows-integrated +> transport. In this environment the historian is reached through a **reverse SSH tunnel** (local +> `42568` → historian `32568`), and integrated/Kerberos auth does not work through that tunnel. The +> socket-RST was the tunnel/transport mismatch, not an absent listener. Corrected below. ## What was run -A Windows-only, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`, -gated by `HISTORIAN_WCF_EVENT_HOST`) drove `HistorianWcfEventOrchestrator.ReadEventsAsync` directly -over `RemoteTcpIntegrated` (WCF `net.tcp`, port 32568) against the **live 2023 R2 historian**, with a -−90d window (the engine holds tens of thousands of events in that range), run from the native Windows -capture rig over VPN. Auth supplied as explicit domain credentials (consumed by the app-level -`ValidateClientCredential` SSPI rounds). +A Windows-only-by-default, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`) +drives `HistorianWcfEventOrchestrator.ReadEventsAsync` directly. The decisive run was **cross-platform, +direct** (no tunnel): from the VPN-holding host straight to the historian's real WCF endpoint +`net.tcp://:32568/HistCert`, using the **certificate transport** (`RemoteTcpCertificate`, +TLS, `AllowUntrustedServerCertificate`) and `NegotiateAuthentication` (cross-platform, explicit domain +credentials). The SDK's interface-version gate was bypassed (`VerifyServerInterfaceVersion=false`) — +the 2023 R2 WCF **History interface reports version 13** (this SDK's serializers target 11/12). -## Result — RED (transport not served), sanitized +## Result — transport+auth viable; row-retrieval server-gated (sanitized) -Event spike: +Progression of the live errors as the addressing/transport was corrected: -| field | value | +| attempt | error | |---|---| -| outcome | `THREW System.ServiceModel.CommunicationException` ("The socket connection was aborted") | -| inner | `System.Net.Sockets.SocketException` — "An existing connection was forcibly closed by the remote host" | -| events observed | 0 | -| LastUpdC3ReturnCode / LastRTag2ReturnCode / LastAddReturnCode(EnsT2) | 0 / 0 / 0 | -| LastEnsT2PayloadSha256 | empty | -| LastResultBufferLength | 0 | +| direct `:32568`, integrated | `SocketException` "forcibly closed" (wrong port + transport for the tunnel) | +| tunnel `:42568`, integrated | `ProtocolException` at the security UpgradeResponse (integrated can't negotiate through the tunnel) | +| tunnel `:42568`, certificate | reached the WCF dispatcher → `AddressFilter` mismatch (tunnel rewrites the port) | +| **direct `:32568`, certificate, cross-platform** | **past auth** → `ProtocolEvidenceMissingException`: History interface version **13** | +| + `VerifyServerInterfaceVersion=false` | **full chain runs**; query returns a 10-byte **0-row** header, then `GetNext` long-polls | -All native return codes are `0` and the EnsT2 payload sha256 is empty: the chain failed at the **first -WCF call** (`GetInterfaceVersion`), *before* any auth token round or CM_EVENT registration ran. +Connection-mode experiment (certificate transport, direct, version-bypassed, a 1-day window that holds +events), comparing the native OpenConnection mode used for the event-read chain: -Corroboration — a basic (non-event) `RemoteTcpIntegrated` `ProbeAsync` + `ReadRawAsync` (the committed -`RemoteTcpIntegrationTests`) throws the **identical** exception, with the stack landing in -`System.ServiceModel.Channels.SocketConnection.WriteAsync` — i.e. the failure is **transport-wide**, not -event-specific, and not auth-specific (it never reaches auth). - -Phase 0 (reachability) had confirmed TCP 32568 is **open** (the connect succeeds). So the port accepts a -socket, but the moment the SDK writes its `net.tcp` binary-SOAP framing the server **resets the -connection** (RST at the socket-write layer). +| connMode | RegisterTags (RTag2) | EnsureTags (EnsT2) | result buffer | events | +|---|---|---|---|---| +| `0x501` (event) | **0 — success** | 1 (benign-false, as in the 2020 flow) | 10 bytes (0-row header) | **0** | +| `0x401` (write) | 1 (fail) | 1 | 10 bytes | 0 | +| `0x402` (read-only, default) | 1 (fail) | 1 | 10 bytes | 0 | ## Conclusion -The **2023 R2 historian does not serve the legacy WCF NetTcp transport.** A raw RST at the first socket -write — before any security negotiation, SOAP fault, or auth exchange — is the signature of a listener -that does not speak `net.tcp` binary SOAP, not of an auth/SPN problem or event-row scoping. (The earlier -WCF event-chain native return codes 76/85 documented in `HistorianWcfEventOrchestrator` were only ever -observed against a **2020** historian; against 2023 R2 there is no WCF endpoint to reach at all.) +1. **WCF transport + auth ARE viable on 2023 R2.** The certificate (TLS) transport negotiates and the + `NegotiateAuthentication` app-level handshake authenticates — **cross-platform** (proven from a + non-Windows VPN host). The earlier "WCF not served" conclusion was wrong. (Integrated/Windows + transport security is not usable through the reverse tunnel — `net.tcp` Kerberos does not tunnel.) +2. **The event-read chain needs the `0x501` event connection mode.** With it, CM_EVENT `RegisterTags` + **succeeds** (it fails on `0x402`/`0x401`). `EnsureTags` returns false, but that is documented as + benign in the 2020 flow that *did* return rows. +3. **Row retrieval is server-gated — same as gRPC.** Even with auth solved and `RegisterTags` succeeding, + over a window that holds events, `StartEventQuery` succeeds but `GetNextEventQueryResultBuffer` returns + a **0-row** header (10 bytes) and long-polls. Registration and window are ruled out as the cause; the + server simply does not scope event rows to a managed connection. This is the **identical** server-side + per-connection retrieval working-set gate proven for gRPC in `grpc-event-query-capture.md`. -Therefore **C2's "route event reads via WCF" unblock is moot on 2023 R2** — there is no WCF endpoint to -route to. Event reads are unavailable on the 2023 R2 historian over **both** transports: +**Therefore event reads do not return rows on the 2023 R2 historian over either transport** — gRPC +(retrieval-server-gated) and WCF (transport+auth work, but the same server-side row gate). The only +remaining theoretical unblock is server-side (AVEVA exposing event-row retrieval to a managed +connection) — not client-fixable. **C2 stays closed won't-fix**, for this (corrected) reason. -- **gRPC** — auth-solved but retrieval-server-gated (server scopes 0 rows to our connection; - `grpc-event-query-capture.md`). -- **WCF (`net.tcp`)** — transport not served on 2023 R2 (connection reset at framing). +## SDK additions from this investigation (retained, build-clean, golden where applicable) -The WCF event-read managed path would only ever apply to a legacy **2020** historian, which the gateway -does not target (the gateway runs `RemoteGrpc` against 2023 R2). The only remaining theoretical unblock -is server-side (AVEVA exposing event-row retrieval to a managed gRPC connection) — not client-fixable. - -**C2 is closed won't-fix** for the gateway's target (2023 R2). `ReadEventsAsync` over gRPC keeps its -honest no-row throw; the gating messages are corrected so they no longer point operators at the WCF -transport as a live fallback on 2023 R2. +- `HistorianClientOptions.ConnectViaAddress` — WCF `Via` (connect to a tunnel/proxy while addressing the + SOAP `To` the real endpoint), so a port-forward whose local port differs from the server's real port + satisfies the server-side WCF AddressFilter. +- `HistorianClientOptions.EventReadConnectionModeOverride` — diagnostic override of the event-read + OpenConnection mode (the `0x501` finding above). +- The C2 spike is now transport-selectable (integrated|certificate), cross-platform for the cert + transport, bounded (per-call timeout + overall budget with a phase-diagnostic dump), and version-gate + bypassable. Output stays sanitized (counts, native return codes, buffer lengths, sha256). diff --git a/src/AVEVA.Historian.Client/Grpc/HistorianGrpcEventOrchestrator.cs b/src/AVEVA.Historian.Client/Grpc/HistorianGrpcEventOrchestrator.cs index 74e8568..c1464a6 100644 --- a/src/AVEVA.Historian.Client/Grpc/HistorianGrpcEventOrchestrator.cs +++ b/src/AVEVA.Historian.Client/Grpc/HistorianGrpcEventOrchestrator.cs @@ -104,11 +104,12 @@ internal sealed class HistorianGrpcEventOrchestrator { throw new ProtocolEvidenceMissingException( $"ReadEvents over gRPC did not return rows within {OverallBudget.TotalSeconds:0}s: StartEventQuery " + - "succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval " + - "over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " + - "(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " + - "on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " + - "docs/reverse-engineering/wcf-event-read-spike-results.md), so there is no event-read path on a 2023 R2 historian."); + "succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval is " + + "auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " + + "connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md). The WCF transport reaches the " + + "2023 R2 historian (certificate transport + auth work, CM_EVENT registration succeeds on the 0x501 event " + + "connection) but hits the SAME server-side row gate — 0-row buffer + long-poll (see " + + "docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport."); } foreach (HistorianEvent evt in events) @@ -175,18 +176,20 @@ internal sealed class HistorianGrpcEventOrchestrator // returning the WCF code-85 terminal), we cannot distinguish "genuinely no events in range" // from "the CM_EVENT registration replay didn't fully land over gRPC" — so we refuse to return // a possibly-false empty list and surface the gated state instead. Proven server-gated: the live - // 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed gRPC connection - // (grpc-event-query-capture.md); WCF is not a 2023 R2 fallback (wcf-event-read-spike-results.md). + // 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed connection + // (grpc-event-query-capture.md). WCF reaches the same historian (cert transport + auth work, + // CM_EVENT registers on the 0x501 event connection) but hits the SAME row gate — not a fallback + // (wcf-event-read-spike-results.md). if (events.Count == 0) { throw new ProtocolEvidenceMissingException( "ReadEvents over gRPC: the chain completes and StartEventQuery succeeds, but " + "GetNextEventQueryResultBuffer returns no rows (it long-polls to the no-data terminal " + - $"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval " + - "over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " + - "(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " + - "on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " + - "docs/reverse-engineering/wcf-event-read-spike-results.md)."); + $"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval is " + + "auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " + + "connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md; WCF reaches the historian and " + + "registers on the 0x501 event connection yet hits the same row gate: " + + "docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport."); } return events; diff --git a/src/AVEVA.Historian.Client/Wcf/HistorianWcfEventOrchestrator.cs b/src/AVEVA.Historian.Client/Wcf/HistorianWcfEventOrchestrator.cs index 7ff341e..f6ef4dc 100644 --- a/src/AVEVA.Historian.Client/Wcf/HistorianWcfEventOrchestrator.cs +++ b/src/AVEVA.Historian.Client/Wcf/HistorianWcfEventOrchestrator.cs @@ -8,13 +8,15 @@ using AVEVA.Historian.Client.Wcf.Contracts; namespace AVEVA.Historian.Client.Wcf; /// -/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow. -/// Applies to legacy 2020-era WCF (net.tcp) historians only. The event row-buffer layout is now -/// decoded (; verified against real captured rows). Note: a -/// 2023 R2 historian does NOT serve this WCF transport at all — net.tcp is reset at the framing -/// layer before any auth (live-disproven 2026-06-25; see -/// docs/reverse-engineering/wcf-event-read-spike-results.md), so this orchestrator is not a -/// fallback for 2023 R2 deployments. The native return codes 76/85 noted below were 2020-historian +/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow. The +/// event row-buffer layout is decoded (; verified against real +/// captured rows). A 2023 R2 historian does serve this transport via the certificate +/// (TLS) endpoint (the cert transport + NegotiateAuthentication auth work cross-platform; the +/// integrated/Windows transport does not tunnel). With the 0x501 event connection mode CM_EVENT +/// registration succeeds — but StartEventQuery still returns a 0-row buffer and long-polls: event +/// rows are server-gated per connection on 2023 R2, the same wall as the gRPC path, and not +/// client-fixable (see docs/reverse-engineering/wcf-event-read-spike-results.md and +/// grpc-event-query-capture.md). The native return codes 76/85 noted below were 2020-historian /// observations. /// internal sealed class HistorianWcfEventOrchestrator