feat(wcf): C2 spike + ConnectViaAddress/connmode — WCF transport viable, rows server-gated #1
@@ -1,63 +1,75 @@
|
|||||||
# WCF event-read spike — live result (2026-06-25): WCF transport not served on 2023 R2
|
# WCF event-read spike — live result (2026-06-25/26): transport+auth viable, row-retrieval server-gated
|
||||||
|
|
||||||
Settles the open question behind **C2** ("event reads over gRPC are gated; the only listed unblock is
|
Settles the open question behind **C2** ("event reads over gRPC are gated; the only listed unblock is
|
||||||
*route event reads via WCF*"). The gRPC event-read path is a proven server-side dead-end
|
*route event reads via WCF*"). The gRPC event-read path is a proven server-side dead-end
|
||||||
(`grpc-event-query-capture.md`: auth fully solved, every client-controllable layer byte-matched to the
|
(`grpc-event-query-capture.md`: auth fully solved, every client-controllable layer byte-matched to the
|
||||||
stock client, yet the server scopes 0 rows to our connection). This spike tested the **WCF** leg.
|
stock client, yet the server scopes 0 rows to our connection). This spike resolved the **WCF** leg.
|
||||||
|
|
||||||
|
> **Correction to an earlier draft of this doc.** A first pass concluded "the 2023 R2 historian does not
|
||||||
|
> serve the legacy WCF transport (connection reset at framing)." **That was a test error, not a server
|
||||||
|
> fact.** It connected to the historian's real WCF port `32568` *directly* and used the Windows-integrated
|
||||||
|
> transport. In this environment the historian is reached through a **reverse SSH tunnel** (local
|
||||||
|
> `42568` → historian `32568`), and integrated/Kerberos auth does not work through that tunnel. The
|
||||||
|
> socket-RST was the tunnel/transport mismatch, not an absent listener. Corrected below.
|
||||||
|
|
||||||
## What was run
|
## What was run
|
||||||
|
|
||||||
A Windows-only, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`,
|
A Windows-only-by-default, env-gated diagnostic (`tests/AVEVA.Historian.Client.Tests/WcfEventReadSpikeTests.cs`)
|
||||||
gated by `HISTORIAN_WCF_EVENT_HOST`) drove `HistorianWcfEventOrchestrator.ReadEventsAsync` directly
|
drives `HistorianWcfEventOrchestrator.ReadEventsAsync` directly. The decisive run was **cross-platform,
|
||||||
over `RemoteTcpIntegrated` (WCF `net.tcp`, port 32568) against the **live 2023 R2 historian**, with a
|
direct** (no tunnel): from the VPN-holding host straight to the historian's real WCF endpoint
|
||||||
−90d window (the engine holds tens of thousands of events in that range), run from the native Windows
|
`net.tcp://<historian>:32568/HistCert`, using the **certificate transport** (`RemoteTcpCertificate`,
|
||||||
capture rig over VPN. Auth supplied as explicit domain credentials (consumed by the app-level
|
TLS, `AllowUntrustedServerCertificate`) and `NegotiateAuthentication` (cross-platform, explicit domain
|
||||||
`ValidateClientCredential` SSPI rounds).
|
credentials). The SDK's interface-version gate was bypassed (`VerifyServerInterfaceVersion=false`) —
|
||||||
|
the 2023 R2 WCF **History interface reports version 13** (this SDK's serializers target 11/12).
|
||||||
|
|
||||||
## Result — RED (transport not served), sanitized
|
## Result — transport+auth viable; row-retrieval server-gated (sanitized)
|
||||||
|
|
||||||
Event spike:
|
Progression of the live errors as the addressing/transport was corrected:
|
||||||
|
|
||||||
| field | value |
|
| attempt | error |
|
||||||
|---|---|
|
|---|---|
|
||||||
| outcome | `THREW System.ServiceModel.CommunicationException` ("The socket connection was aborted") |
|
| direct `:32568`, integrated | `SocketException` "forcibly closed" (wrong port + transport for the tunnel) |
|
||||||
| inner | `System.Net.Sockets.SocketException` — "An existing connection was forcibly closed by the remote host" |
|
| tunnel `:42568`, integrated | `ProtocolException` at the security UpgradeResponse (integrated can't negotiate through the tunnel) |
|
||||||
| events observed | 0 |
|
| tunnel `:42568`, certificate | reached the WCF dispatcher → `AddressFilter` mismatch (tunnel rewrites the port) |
|
||||||
| LastUpdC3ReturnCode / LastRTag2ReturnCode / LastAddReturnCode(EnsT2) | 0 / 0 / 0 |
|
| **direct `:32568`, certificate, cross-platform** | **past auth** → `ProtocolEvidenceMissingException`: History interface version **13** |
|
||||||
| LastEnsT2PayloadSha256 | empty |
|
| + `VerifyServerInterfaceVersion=false` | **full chain runs**; query returns a 10-byte **0-row** header, then `GetNext` long-polls |
|
||||||
| LastResultBufferLength | 0 |
|
|
||||||
|
|
||||||
All native return codes are `0` and the EnsT2 payload sha256 is empty: the chain failed at the **first
|
Connection-mode experiment (certificate transport, direct, version-bypassed, a 1-day window that holds
|
||||||
WCF call** (`GetInterfaceVersion`), *before* any auth token round or CM_EVENT registration ran.
|
events), comparing the native OpenConnection mode used for the event-read chain:
|
||||||
|
|
||||||
Corroboration — a basic (non-event) `RemoteTcpIntegrated` `ProbeAsync` + `ReadRawAsync` (the committed
|
| connMode | RegisterTags (RTag2) | EnsureTags (EnsT2) | result buffer | events |
|
||||||
`RemoteTcpIntegrationTests`) throws the **identical** exception, with the stack landing in
|
|---|---|---|---|---|
|
||||||
`System.ServiceModel.Channels.SocketConnection.WriteAsync` — i.e. the failure is **transport-wide**, not
|
| `0x501` (event) | **0 — success** | 1 (benign-false, as in the 2020 flow) | 10 bytes (0-row header) | **0** |
|
||||||
event-specific, and not auth-specific (it never reaches auth).
|
| `0x401` (write) | 1 (fail) | 1 | 10 bytes | 0 |
|
||||||
|
| `0x402` (read-only, default) | 1 (fail) | 1 | 10 bytes | 0 |
|
||||||
Phase 0 (reachability) had confirmed TCP 32568 is **open** (the connect succeeds). So the port accepts a
|
|
||||||
socket, but the moment the SDK writes its `net.tcp` binary-SOAP framing the server **resets the
|
|
||||||
connection** (RST at the socket-write layer).
|
|
||||||
|
|
||||||
## Conclusion
|
## Conclusion
|
||||||
|
|
||||||
The **2023 R2 historian does not serve the legacy WCF NetTcp transport.** A raw RST at the first socket
|
1. **WCF transport + auth ARE viable on 2023 R2.** The certificate (TLS) transport negotiates and the
|
||||||
write — before any security negotiation, SOAP fault, or auth exchange — is the signature of a listener
|
`NegotiateAuthentication` app-level handshake authenticates — **cross-platform** (proven from a
|
||||||
that does not speak `net.tcp` binary SOAP, not of an auth/SPN problem or event-row scoping. (The earlier
|
non-Windows VPN host). The earlier "WCF not served" conclusion was wrong. (Integrated/Windows
|
||||||
WCF event-chain native return codes 76/85 documented in `HistorianWcfEventOrchestrator` were only ever
|
transport security is not usable through the reverse tunnel — `net.tcp` Kerberos does not tunnel.)
|
||||||
observed against a **2020** historian; against 2023 R2 there is no WCF endpoint to reach at all.)
|
2. **The event-read chain needs the `0x501` event connection mode.** With it, CM_EVENT `RegisterTags`
|
||||||
|
**succeeds** (it fails on `0x402`/`0x401`). `EnsureTags` returns false, but that is documented as
|
||||||
|
benign in the 2020 flow that *did* return rows.
|
||||||
|
3. **Row retrieval is server-gated — same as gRPC.** Even with auth solved and `RegisterTags` succeeding,
|
||||||
|
over a window that holds events, `StartEventQuery` succeeds but `GetNextEventQueryResultBuffer` returns
|
||||||
|
a **0-row** header (10 bytes) and long-polls. Registration and window are ruled out as the cause; the
|
||||||
|
server simply does not scope event rows to a managed connection. This is the **identical** server-side
|
||||||
|
per-connection retrieval working-set gate proven for gRPC in `grpc-event-query-capture.md`.
|
||||||
|
|
||||||
Therefore **C2's "route event reads via WCF" unblock is moot on 2023 R2** — there is no WCF endpoint to
|
**Therefore event reads do not return rows on the 2023 R2 historian over either transport** — gRPC
|
||||||
route to. Event reads are unavailable on the 2023 R2 historian over **both** transports:
|
(retrieval-server-gated) and WCF (transport+auth work, but the same server-side row gate). The only
|
||||||
|
remaining theoretical unblock is server-side (AVEVA exposing event-row retrieval to a managed
|
||||||
|
connection) — not client-fixable. **C2 stays closed won't-fix**, for this (corrected) reason.
|
||||||
|
|
||||||
- **gRPC** — auth-solved but retrieval-server-gated (server scopes 0 rows to our connection;
|
## SDK additions from this investigation (retained, build-clean, golden where applicable)
|
||||||
`grpc-event-query-capture.md`).
|
|
||||||
- **WCF (`net.tcp`)** — transport not served on 2023 R2 (connection reset at framing).
|
|
||||||
|
|
||||||
The WCF event-read managed path would only ever apply to a legacy **2020** historian, which the gateway
|
- `HistorianClientOptions.ConnectViaAddress` — WCF `Via` (connect to a tunnel/proxy while addressing the
|
||||||
does not target (the gateway runs `RemoteGrpc` against 2023 R2). The only remaining theoretical unblock
|
SOAP `To` the real endpoint), so a port-forward whose local port differs from the server's real port
|
||||||
is server-side (AVEVA exposing event-row retrieval to a managed gRPC connection) — not client-fixable.
|
satisfies the server-side WCF AddressFilter.
|
||||||
|
- `HistorianClientOptions.EventReadConnectionModeOverride` — diagnostic override of the event-read
|
||||||
**C2 is closed won't-fix** for the gateway's target (2023 R2). `ReadEventsAsync` over gRPC keeps its
|
OpenConnection mode (the `0x501` finding above).
|
||||||
honest no-row throw; the gating messages are corrected so they no longer point operators at the WCF
|
- The C2 spike is now transport-selectable (integrated|certificate), cross-platform for the cert
|
||||||
transport as a live fallback on 2023 R2.
|
transport, bounded (per-call timeout + overall budget with a phase-diagnostic dump), and version-gate
|
||||||
|
bypassable. Output stays sanitized (counts, native return codes, buffer lengths, sha256).
|
||||||
|
|||||||
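Reviewer note on `ConnectViaAddress`: the logical/physical address split it relies on can be sketched with plain URIs. This is an illustration only. `ViaAddressing` and `Resolve` are hypothetical names, not the SDK's API; the `/HistCert` path and the `32568`/`42568` ports are the ones quoted in the spike notes above, and the host names in the usage line are placeholders.

```csharp
using System;

// Hypothetical illustration of WCF "Via" addressing; not the SDK's implementation.
public static class ViaAddressing
{
    // Returns two addresses:
    //   To  - the logical endpoint (carried in the SOAP To header and matched by the
    //         server-side AddressFilter), always built from the historian's real port.
    //   Via - the physical address the socket connects to; equals To when no
    //         port-forward is in play.
    public static (Uri To, Uri Via) Resolve(
        string historianHost, int historianPort, string path,
        string? forwardHost = null, int? forwardPort = null)
    {
        Uri to = new UriBuilder("net.tcp", historianHost, historianPort, path).Uri;

        if (forwardHost is null || forwardPort is null)
        {
            return (to, to); // direct connection: one address serves both roles
        }

        Uri via = new UriBuilder("net.tcp", forwardHost, forwardPort.Value, path).Uri;
        return (to, via);
    }
}
```

For example, `ViaAddressing.Resolve("historian.example", 32568, "/HistCert", "localhost", 42568)` keeps port `32568` in the logical address while the socket would dial `localhost:42568`. In stock WCF such a pair is typically handed to `ChannelFactory<TChannel>.CreateChannel(EndpointAddress, Uri)`; whether the SDK wires it exactly that way is not visible in this diff.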
@@ -104,11 +104,12 @@ internal sealed class HistorianGrpcEventOrchestrator
|
|||||||
{
|
{
|
||||||
throw new ProtocolEvidenceMissingException(
|
throw new ProtocolEvidenceMissingException(
|
||||||
$"ReadEvents over gRPC did not return rows within {OverallBudget.TotalSeconds:0}s: StartEventQuery " +
|
$"ReadEvents over gRPC did not return rows within {OverallBudget.TotalSeconds:0}s: StartEventQuery " +
|
||||||
"succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval " +
|
"succeeds but GetNextEventQueryResultBuffer long-polls to the no-data terminal. Event-row retrieval is " +
|
||||||
"over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " +
|
"auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " +
|
||||||
"(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " +
|
"connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md). The WCF transport reaches the " +
|
||||||
"on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " +
|
"2023 R2 historian (certificate transport + auth work, CM_EVENT registration succeeds on the 0x501 event " +
|
||||||
"docs/reverse-engineering/wcf-event-read-spike-results.md), so there is no event-read path on a 2023 R2 historian.");
|
"connection) but hits the SAME server-side row gate — 0-row buffer + long-poll (see " +
|
||||||
|
"docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport.");
|
||||||
}
|
}
|
||||||
|
|
||||||
foreach (HistorianEvent evt in events)
|
foreach (HistorianEvent evt in events)
|
||||||
@@ -175,18 +176,20 @@ internal sealed class HistorianGrpcEventOrchestrator
|
|||||||
// returning the WCF code-85 terminal), we cannot distinguish "genuinely no events in range"
|
// returning the WCF code-85 terminal), we cannot distinguish "genuinely no events in range"
|
||||||
// from "the CM_EVENT registration replay didn't fully land over gRPC" — so we refuse to return
|
// from "the CM_EVENT registration replay didn't fully land over gRPC" — so we refuse to return
|
||||||
// a possibly-false empty list and surface the gated state instead. Proven server-gated: the live
|
// a possibly-false empty list and surface the gated state instead. Proven server-gated: the live
|
||||||
// 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed gRPC connection
|
// 2023 R2 server holds tens of thousands of events yet scopes 0 to a managed connection
|
||||||
// (grpc-event-query-capture.md); WCF is not a 2023 R2 fallback (wcf-event-read-spike-results.md).
|
// (grpc-event-query-capture.md). WCF reaches the same historian (cert transport + auth work,
|
||||||
|
// CM_EVENT registers on the 0x501 event connection) but hits the SAME row gate — not a fallback
|
||||||
|
// (wcf-event-read-spike-results.md).
|
||||||
if (events.Count == 0)
|
if (events.Count == 0)
|
||||||
{
|
{
|
||||||
throw new ProtocolEvidenceMissingException(
|
throw new ProtocolEvidenceMissingException(
|
||||||
"ReadEvents over gRPC: the chain completes and StartEventQuery succeeds, but " +
|
"ReadEvents over gRPC: the chain completes and StartEventQuery succeeds, but " +
|
||||||
"GetNextEventQueryResultBuffer returns no rows (it long-polls to the no-data terminal " +
|
"GetNextEventQueryResultBuffer returns no rows (it long-polls to the no-data terminal " +
|
||||||
$"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval " +
|
$"after the CM_EVENT registration replay; last={LastErrorBufferDescription}). Event-row retrieval is " +
|
||||||
"over gRPC is auth-solved but server-gated — the 2023 R2 server scopes 0 rows to a managed connection " +
|
"auth-solved but SERVER-GATED on 2023 R2 over both transports — the server scopes 0 rows to a managed " +
|
||||||
"(see docs/reverse-engineering/grpc-event-query-capture.md). The legacy WCF transport is NOT a fallback " +
|
"connection (gRPC: docs/reverse-engineering/grpc-event-query-capture.md; WCF reaches the historian and " +
|
||||||
"on 2023 R2 (live-disproven 2026-06-25: net.tcp is reset at the framing layer — see " +
|
"registers on the 0x501 event connection yet hits the same row gate: " +
|
||||||
"docs/reverse-engineering/wcf-event-read-spike-results.md).");
|
"docs/reverse-engineering/wcf-event-read-spike-results.md). Not client-fixable on either transport.");
|
||||||
}
|
}
|
||||||
|
|
||||||
return events;
|
return events;
|
||||||
|
|||||||
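Reviewer note on the hunk above: the "refuse a possibly-false empty list" policy can be shown in isolation. This is a minimal sketch under stated assumptions. `GatedReadException` is a local stand-in for the SDK's `ProtocolEvidenceMissingException` so the snippet compiles on its own, and `EventReadGate.RequireRows` is a hypothetical helper, not a method in this PR.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the SDK's ProtocolEvidenceMissingException (assumption: used here only
// so the sketch is self-contained).
public sealed class GatedReadException : Exception
{
    public GatedReadException(string message) : base(message) { }
}

// Hypothetical helper illustrating the policy; not part of the SDK.
public static class EventReadGate
{
    // Passes rows through unchanged when there are any. Throws on an empty result,
    // because on a server-gated connection "no rows" cannot be told apart from
    // "no events in range", so returning an empty list could be a false negative.
    public static IReadOnlyList<T> RequireRows<T>(
        IReadOnlyList<T> rows, string lastBufferDescription)
    {
        if (rows.Count == 0)
        {
            throw new GatedReadException(
                $"Event read returned no rows (last={lastBufferDescription}); " +
                "the result is ambiguous, not provably empty.");
        }

        return rows;
    }
}
```

The design choice is that a caller gets either real rows or an explicit failure, never an empty list that silently hides the server-side gate.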
@@ -8,13 +8,15 @@ using AVEVA.Historian.Client.Wcf.Contracts;
|
|||||||
namespace AVEVA.Historian.Client.Wcf;
|
namespace AVEVA.Historian.Client.Wcf;
|
||||||
|
|
||||||
/// <remarks>
|
/// <remarks>
|
||||||
/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow.
|
/// Mirrors HistorianWcfReadOrchestrator but targets IRetrievalServiceContract4 for the event flow. The
|
||||||
/// Applies to <b>legacy 2020-era WCF (net.tcp) historians only</b>. The event row-buffer layout is now
|
/// event row-buffer layout is decoded (<see cref="HistorianEventRowProtocol"/>; verified against real
|
||||||
/// decoded (<see cref="HistorianEventRowProtocol"/>; verified against real captured rows). Note: a
|
/// captured rows). A <b>2023 R2</b> historian <i>does</i> serve this transport via the <b>certificate</b>
|
||||||
/// <b>2023 R2</b> historian does NOT serve this WCF transport at all — net.tcp is reset at the framing
|
/// (TLS) endpoint (the cert transport + <c>NegotiateAuthentication</c> auth work cross-platform; the
|
||||||
/// layer before any auth (live-disproven 2026-06-25; see
|
/// integrated/Windows transport does not tunnel). With the <c>0x501</c> event connection mode CM_EVENT
|
||||||
/// <c>docs/reverse-engineering/wcf-event-read-spike-results.md</c>), so this orchestrator is not a
|
/// registration succeeds, but <c>GetNextEventQueryResultBuffer</c> still returns a 0-row buffer and long-polls: event
|
||||||
/// fallback for 2023 R2 deployments. The native return codes 76/85 noted below were 2020-historian
|
/// rows are <b>server-gated</b> per connection on 2023 R2, the same wall as the gRPC path, and not
|
||||||
|
/// client-fixable (see <c>docs/reverse-engineering/wcf-event-read-spike-results.md</c> and
|
||||||
|
/// <c>grpc-event-query-capture.md</c>). The native return codes 76/85 noted below were 2020-historian
|
||||||
/// observations.
|
/// observations.
|
||||||
/// </remarks>
|
/// </remarks>
|
||||||
internal sealed class HistorianWcfEventOrchestrator
|
internal sealed class HistorianWcfEventOrchestrator
|
||||||
|
|||||||
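Reviewer note on the `0x501` event connection mode named in the remarks above: the three modes compared in the spike results, and the way a diagnostic override might fall back to the default, can be sketched as follows. The enum name, the `ushort` width, and `EventReadConnectionMode.Resolve` are assumptions made for illustration; only the numeric values and their labels (event, write, read-only default) come from the spike document.

```csharp
using System;

// Values and labels as listed in the spike results; the type name and underlying
// width are assumptions, not the SDK's declaration.
public enum ConnectionMode : ushort
{
    Write = 0x401,
    ReadOnly = 0x402, // listed as the default
    Event = 0x501     // the only mode on which CM_EVENT RegisterTags succeeded
}

// Hypothetical resolver showing how an optional diagnostic override could be applied.
public static class EventReadConnectionMode
{
    // Uses the override when one is supplied, otherwise the read-only default.
    public static ConnectionMode Resolve(ConnectionMode? overrideMode) =>
        overrideMode ?? ConnectionMode.ReadOnly;
}
```

With this shape, `EventReadConnectionMode.Resolve(ConnectionMode.Event)` selects the event mode for a diagnostic run, while passing `null` leaves the default untouched.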