feat(wcf): C2 spike + ConnectViaAddress/connmode — WCF transport viable, rows server-gated #1

Merged
dohertj2 merged 16 commits from feat/c2-wcf-event-spike into main 2026-06-26 06:48:03 -04:00
3 changed files with 80 additions and 63 deletions
Showing only changes of commit f2297315b9
@@ -1,63 +1,75 @@
# WCF event-read spike — live result (2026-06-25): WCF transport not served on 2023 R2
# WCF event-read spike — live result (2026-06-25/26): transport+auth viable, row-retrieval server-gated
Settles the open question behind **C2** ("event reads over gRPC are gated; the only listed unblock is
*route event reads via WCF*"). The gRPC event-read path is a proven server-side dead-end
(`grpc-event-query-capture.md`: auth fully solved, every client-controllable layer byte-matched to the
stock client, yet the server scopes 0 rows to our connection). This spike tested the **WCF** leg.
stock client, yet the server scopes 0 rows to our connection). This spike resolved the **WCF** leg.
> **Correction to an earlier draft of this doc.** A first pass concluded "the 2023 R2 historian does not
> serve the legacy WCF transport (connection reset at framing)." **That was a test error, not a server
> fact.** It connected to the historian's real WCF port `32568` *directly* and used the Windows-integrated
> transport. In this environment the historian is reached through a **reverse SSH tunnel** (local
> `42568` → historian `32568`), and integrated/Kerberos auth does not work through that tunnel. The
> socket-RST was the tunnel/transport mismatch, not an absent listener. Corrected below.
## What was run
A Windows-only, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`,
gated by `HISTORIAN_WCF_EVENT_HOST`) drove `HistorianWcfEventOrchestrator.ReadEventsAsync` directly
over `RemoteTcpIntegrated` (WCF `net.tcp`, port 32568) against the **live 2023 R2 historian**, with a
90d window (the engine holds tens of thousands of events in that range), run from the native Windows
capture rig over VPN. Auth supplied as explicit domain credentials (consumed by the app-level
`ValidateClientCredential` SSPI rounds).
A Windows-only-by-default, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`)
drives `HistorianWcfEventOrchestrator.ReadEventsAsync` directly. The decisive run was **cross-platform,
direct** (no tunnel): from the VPN-holding host straight to the historian's real WCF endpoint
`net.tcp://<historian>:32568/HistCert`, using the **certificate transport** (`RemoteTcpCertificate`,
TLS, `AllowUntrustedServerCertificate`) and `NegotiateAuthentication` (cross-platform, explicit domain
credentials). The SDK's interface-version gate was bypassed (`VerifyServerInterfaceVersion=false`) —
the 2023 R2 WCF **History interface reports version 13** (this SDK's serializers target 11/12).
## Result — RED (transport not served), sanitized
## Result — transport+auth viable; row-retrieval server-gated (sanitized)
Event spike:
Progression of the live errors as the addressing/transport was corrected:
| field | value |
| attempt | error |
|---|---|
| outcome | `THREW System.ServiceModel.CommunicationException` ("The socket connection was aborted") |
| inner | `System.Net.Sockets.SocketException` — "An existing connection was forcibly closed by the remote host" |
| events observed | 0 |
| LastUpdC3ReturnCode / LastRTag2ReturnCode / LastAddReturnCode(EnsT2) | 0 / 0 / 0 |
| LastEnsT2PayloadSha256 | empty |
| LastResultBufferLength | 0 |
| direct `:32568`, integrated | `SocketException` "forcibly closed" (wrong port + transport for the tunnel) |
| tunnel `:42568`, integrated | `ProtocolException` at the security UpgradeResponse (integrated can't negotiate through the tunnel) |
| tunnel `:42568`, certificate | reached the WCF dispatcher → `AddressFilter` mismatch (tunnel rewrites the port) |
| **direct `:32568`, certificate, cross-platform** | **past auth** → `ProtocolEvidenceMissingException`: History interface version **13** |
| + `VerifyServerInterfaceVersion=false` | **full chain runs**; query returns a 10-byte **0-row** header, then `GetNext` long-polls |
All native return codes are `0` and the EnsT2 payload sha256 is empty: the chain failed at the **first
WCF call** (`GetInterfaceVersion`), *before* any auth token round or CM_EVENT registration ran.
Connection-mode experiment (certificate transport, direct, version-bypassed, a 1-day window that holds
events), comparing the native OpenConnection mode used for the event-read chain:
Corroboration — a basic (non-event) `RemoteTcpIntegrated` `ProbeAsync` + `ReadRawAsync` (the committed
`RemoteTcpIntegrationTests`) throws the **identical** exception, with the stack landing in
`System.ServiceModel.Channels.SocketConnection.WriteAsync` — i.e. the failure is **transport-wide**, not
event-specific, and not auth-specific (it never reaches auth).
Phase 0 (reachability) had confirmed TCP 32568 is **open** (the connect succeeds). So the port accepts a
socket, but the moment the SDK writes its `net.tcp` binary-SOAP framing the server **resets the
connection** (RST at the socket-write layer).
| connMode | RegisterTags (RTag2) | EnsureTags (EnsT2) | result buffer | events |
|---|---|---|---|---|
| `0x501` (event) | **0 — success** | 1 (benign-false, as in the 2020 flow) | 10 bytes (0-row header) | **0** |
| `0x401` (write) | 1 (fail) | 1 | 10 bytes | 0 |
| `0x402` (read-only, default) | 1 (fail) | 1 | 10 bytes | 0 |
## Conclusion
The **2023 R2 historian does not serve the legacy WCF NetTcp transport.** A raw RST at the first socket
write — before any security negotiation, SOAP fault, or auth exchange — is the signature of a listener
that does not speak `net.tcp` binary SOAP, not of an auth/SPN problem or event-row scoping. (The earlier
WCF event-chain native return codes 76/85 documented in `HistorianWcfEventOrchestrator` were only ever
observed against a **2020** historian; against 2023 R2 there is no WCF endpoint to reach at all.)
1. **WCF transport + auth ARE viable on 2023 R2.** The certificate (TLS) transport negotiates and the
`NegotiateAuthentication` app-level handshake authenticates — **cross-platform** (proven from a
non-Windows VPN host). The earlier "WCF not served" conclusion was wrong. (Integrated/Windows
transport security is not usable through the reverse tunnel — `net.tcp` Kerberos does not tunnel.)
2. **The event-read chain needs the `0x501` event connection mode.** With it, CM_EVENT `RegisterTags`
**succeeds** (it fails on `0x402`/`0x401`). `EnsureTags` returns false, but that is documented as
benign in the 2020 flow that *did* return rows.
3. **Row retrieval is server-gated — same as gRPC.** Even with auth solved and `RegisterTags` succeeding,
over a window that holds events, `StartEventQuery` succeeds but `GetNextEventQueryResultBuffer` returns
a **0-row** header (10 bytes) and long-polls. Registration and window are ruled out as the cause; the
server simply does not scope event rows to a managed connection. This is the **identical** server-side
per-connection retrieval working-set gate proven for gRPC in `grpc-event-query-capture.md`.
Therefore **C2's "route event reads via WCF" unblock is moot on 2023 R2** — there is no WCF endpoint to
route to. Event reads are unavailable on the 2023 R2 historian over **both** transports:
**Therefore event reads do not return rows on the 2023 R2 historian over either transport** — gRPC
(retrieval-server-gated) and WCF (transport+auth work, but the same server-side row gate). The only
remaining theoretical unblock is server-side (AVEVA exposing event-row retrieval to a managed
connection) — not client-fixable. **C2 stays closed won't-fix**, for this (corrected) reason.
- **gRPC** — auth-solved but retrieval-server-gated (server scopes 0 rows to our connection;
`grpc-event-query-capture.md`).
- **WCF (`net.tcp`)** — transport not served on 2023 R2 (connection reset at framing).
## SDK additions from this investigation (retained, build-clean, golden where applicable)
The WCF event-read managed path would only ever apply to a legacy **2020** historian, which the gateway
does not target (the gateway runs `RemoteGrpc` against 2023 R2). The only remaining theoretical unblock
is server-side (AVEVA exposing event-row retrieval to a managed gRPC connection) — not client-fixable.
**C2 is closed won't-fix** for the gateway's target (2023 R2). `ReadEventsAsync` over gRPC keeps its
honest no-row throw; the gating messages are corrected so they no longer point operators at the WCF
transport as a live fallback on 2023 R2.
- `HistorianClientOptions.ConnectViaAddress` — WCF `Via` (connect to a tunnel/proxy while addressing the
SOAP `To` the real endpoint), so a port-forward whose local port differs from the server's real port
satisfies the server-side WCF AddressFilter.
- `HistorianClientOptions.EventReadConnectionModeOverride` — diagnostic override of the event-read
OpenConnection mode (the `0x501` finding above).
- The C2 spike is now transport-selectable (integrated|certificate), cross-platform for the cert
transport, bounded (per-call timeout + overall budget with a phase-diagnostic dump), and version-gate
bypassable. Output stays sanitized (counts, native return codes, buffer lengths, sha256).
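
A minimal sketch of the To/Via split that `ConnectViaAddress` relies on, under stated assumptions. The helper `ViaAddressSketch.BuildVia` and the host name `historian.example` are hypothetical and not part of the SDK; only the ports (`32568` real, `42568` tunnel-local) and the `/HistCert` path come from the runs above. The logical endpoint (the SOAP `To`) keeps the server's real port so the AddressFilter matches, while the socket dials the tunnel's local end.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the SDK.
public static class ViaAddressSketch
{
    // Derives the address the socket should dial (the tunnel's local end)
    // from the logical endpoint, keeping its scheme and path unchanged.
    public static Uri BuildVia(Uri to, string tunnelHost, int tunnelPort)
    {
        var builder = new UriBuilder(to) { Host = tunnelHost, Port = tunnelPort };
        return builder.Uri;
    }
}

// Usage (assumed values):
//   var to  = new Uri("net.tcp://historian.example:32568/HistCert"); // SOAP To, real port
//   var via = ViaAddressSketch.BuildVia(to, "localhost", 42568);     // physical connect target
//
// In stock WCF a Via is typically attached to a client endpoint with
// System.ServiceModel.Description.ClientViaBehavior(via). Whether
// ConnectViaAddress does this internally is an assumption, not something
// this document confirms.
```

Keeping `To` untouched and changing only the dial target is what separates this from the `tunnel :42568, certificate` attempt in the table, where the rewritten port reached the dispatcher and then failed the AddressFilter.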
@@ -104,11 +104,12 @@ internal sealed class HistorianGrpcEventOrchestrator
{
throw new ProtocolEvidenceMissingException(
$"ReadEvents over gRPC did not return rows within {OverallBudget.TotalSeconds:0}s: StartEventQuery " +
"succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval " +
"over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " +
"(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " +
"on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " +
"docs/reverse-engineering/wcf-event-read-spike-results.md), so there is no event-read path on a 2023 R2 historian.");
"succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval is " +
"auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " +
"connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md). The WCF transport reaches the " +
"2023 R2 historian (certificate transport + auth work, CM_EVENT registration succeeds on the 0x501 event " +
"connection) but hits the SAME server-side row gate — 0-row buffer + long-poll (see " +
"docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport.");
}
foreach (HistorianEvent evt in events)
@@ -175,18 +176,20 @@ internal sealed class HistorianGrpcEventOrchestrator
// returning the WCF code-85 terminal), we cannot distinguish "genuinely no events in range"
// from "the CM_EVENT registration replay didn't fully land over gRPC" — so we refuse to return
// a possibly-false empty list and surface the gated state instead. Proven server-gated: the live
// 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed gRPC connection
// (grpc-event-query-capture.md); WCF is not a 2023 R2 fallback (wcf-event-read-spike-results.md).
// 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed connection
// (grpc-event-query-capture.md). WCF reaches the same historian (cert transport + auth work,
// CM_EVENT registers on the 0x501 event connection) but hits the SAME row gate — not a fallback
// (wcf-event-read-spike-results.md).
if (events.Count == 0)
{
throw new ProtocolEvidenceMissingException(
"ReadEvents over gRPC: the chain completes and StartEventQuery succeeds, but " +
"GetNextEventQueryResultBuffer returns no rows (it long-polls to the no-data terminal " +
$"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval " +
"over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " +
"(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " +
"on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " +
"docs/reverse-engineering/wcf-event-read-spike-results.md).");
$"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval is " +
"auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " +
"connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md; WCF reaches the historian and " +
"registers on the 0x501 event connection yet hits the same row gate: " +
"docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport.");
}
return events;
@@ -8,13 +8,15 @@ using AVEVA.Historian.Client.Wcf.Contracts;
namespace AVEVA.Historian.Client.Wcf;
/// <remarks>
/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow.
/// Applies to <b>legacy 2020-era WCF (net.tcp) historians only</b>. The event row-buffer layout is now
/// decoded (<see cref="HistorianEventRowProtocol"/>; verified against real captured rows). Note: a
/// <b>2023 R2</b> historian does NOT serve this WCF transport at all — net.tcp is reset at the framing
/// layer before any auth (live-disproven 2026-06-25; see
/// <c>docs/reverse-engineering/wcf-event-read-spike-results.md</c>), so this orchestrator is not a
/// fallback for 2023 R2 deployments. The native return codes 76/85 noted below were 2020-historian
/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow. The
/// event row-buffer layout is decoded (<see cref="HistorianEventRowProtocol"/>; verified against real
/// captured rows). A <b>2023 R2</b> historian <i>does</i> serve this transport via the <b>certificate</b>
/// (TLS) endpoint (the cert transport + <c>NegotiateAuthentication</c> auth work cross-platform; the
/// integrated/Windows transport does not tunnel). With the <c>0x501</c> event connection mode CM_EVENT
/// registration succeeds — but <c>StartEventQuery</c> still returns a 0-row buffer and long-polls: event
/// rows are <b>server-gated</b> per connection on 2023 R2, the same wall as the gRPC path, and not
/// client-fixable (see <c>docs/reverse-engineering/wcf-event-read-spike-results.md</c> and
/// <c>grpc-event-query-capture.md</c>). The native return codes 76/85 noted below were 2020-historian
/// observations.
/// </remarks>
internal sealed class HistorianWcfEventOrchestrator