histsdk/docs/reverse-engineering/handoff.md
Joseph Doherty afc7c4bf96 SendEvent over gRPC: implement + live-validate (was capture-gated)
Captured the native 2023 R2 client's gRPC event send (new capture-send-event
harness scenario): it rides HistoryService.AddStreamValues with the SAME "OS"
(0x534F) storage-sample buffer the WCF path already uses (HistorianEventWrite-
Protocol) — confirming "no distinct RPC", and that it is NOT the historical
write's "ON" buffer. Diffed the write-enabled vs read-only v8 Event open: byte-
identical apart from per-session crypto, so the existing OpenSession event path
is reused unchanged.

So SendEvent-over-gRPC was pure assembly of proven parts:
- HistorianGrpcEventWriteOrchestrator = v8 Event open + CM_EVENT registration
  (UpdC3/RegisterTags/EnsureTags) + AddStreamValues(OS buffer).
- HistorianClient.SendEventAsync now routes to it for RemoteGrpc (WCF otherwise).

Live-validated end-to-end against the 2023 R2 server: pure-managed SDK send →
AddStreamValues BSuccess=true → the event reads back from the server (markers
confirmed in returned event rows). The native gRPC RegisterTags(24B) +
EnsureTags(86B) byte-match our serializers (new GrpcEventSendProtocolTests
golden, closing the 83-vs-86 EnsureTags question). Gated live test
SendEventAsync_OverGrpc_AcceptsEvent (opt-in HISTORIAN_GRPC_EVENT_SEND=1).
331 offline tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-23 15:37:22 -04:00


AVEVA Historian Managed Driver Handoff

Last updated: 2026-06-23 (event-row parser fix merged; roadmap still exhausted — no actionable pure-code tasks remain)

Current status supersedes the historical blocker narrative below. The sections from "Active Blocker" onward are a preserved reverse-engineering record of how the 2020 WCF read/write/event paths were cracked (2026-05-04). They are kept for provenance; they are not the live state. Start with "Current Status" immediately below.

Current Status (2026-06-22) — roadmap exhausted

The reachable surface in docs/plans/hcal-roadmap.md is complete, and every plan under docs/plans/ is either DONE or has only gated items left. There are zero actionable pure-code tasks remaining. Memory anchor: project_roadmap_exhausted_2020wcf.

Shipped + live-verified across both transports:

  • Reads (WCF + gRPC): ProbeAsync, ReadRawAsync, ReadAggregateAsync, ReadAtTimeAsync, ReadEventsAsync, BrowseTagNamesAsync, GetTagMetadataAsync, status helpers (GetConnectionStatusAsync, GetStoreForwardStatusAsync, GetSystemParameterAsync).
  • Writes: EnsureTagAsync (analog Float/Double/Int2/Int4/UInt4, ApplyScaling), DeleteTagAsync, RenameTagsAsync, AddTagExtendedPropertiesAsync, and the M3 historical/backfill AddHistoricalValuesAsync (gRPC-only, all five analog types golden-tested + live write/read-back).
  • Config reads (mostly gRPC): GetRuntimeParameterAsync, GetTagExtendedPropertiesAsync, ExecuteSqlCommandAsync (WCF; gRPC server-walled), GetServerTimeZoneAsync (gRPC-only).
  • Client-side: M4 R4.1 store-and-forward outbox, R4.4 redundancy, R4.3 measured-idle SF status.

The 2023 R2 gRPC transport (HistorianTransport.RemoteGrpc, port 32565) reuses the proven 2020 WCF byte serializers/parsers unchanged inside protobuf bytes fields, keyed by the Open2 session handle. Live-verified against a real 2023 R2 server (History interface v12) — see reference_2023r2_live_server_access.

Everything still open is gated — none is a pure-code task:

  1. gRPC event ROW retrieval (ReadEventsAsync #2) — AUTH-SOLVED / PARSE-VERIFIED / RETRIEVAL-SERVER-GATED (every client-side angle exhausted 2026-06-23, merged 6faf8a5). The v8 OpenConnection crypto wall is fully cracked + live-verified: the event connection authenticates via HistoryService.ExchangeKey (P-256 ECDH) → client key = SHA256(shared secret) → credential token = RC4(password-UTF16LE, key=MD5(client key)) (the native HistorianCrypto.NRC4_V2.aahCryptV2 MD5-keyed RC4 scheme). RE'd via Frida CNG hooks + dnlib IL extraction + an offline cracker; implemented pure-managed, golden-tested, auth live-PASSES. The StartEventQuery v6 request and the Event-type v8 OpenConnection (ConnectionType=Event) are shipped. BUT the query still returns rowCount-0 while the native returns 50 for a byte-identical request — and all four next-session angles are now tested and ruled out (grpc-event-query-capture.md):

    • transport — the stock client is also gRPC-Web/HTTP-1.1 (decompiled); plain native HTTP/2 (CreateHttp2) returns the same 0 rows;
    • client metadata/cert — decompiled + TLS-tee captured: gzip-only metadata, no-op interceptor, no TLS client cert on either side;
    • connection topology — the native splits services across 5 connections and queries on a dedicated RetrievalService connection; replicating that (HISTORIAN_GRPC_EVENT_SPLIT_CHANNEL=1) still returns 0 rows → the server correlates by session handle, not connection;
    • data store — via SOCKS→SQL: the event store is global/unscoped (no per-connection column); the Events view (served by the engine via the INSQL provider) returns 71,332 events for the same window the gRPC query gets 0.

    So the gate is a server-internal per-connection retrieval working-set in the native HistorianClient C++ core — not reconstructable from a pure-managed client. PARSE PATH NOW VERIFIED + a latent bug FIXED: fed the provided stock client's real captured result buffer (63,192 B, 50 events) through HistorianEventRowProtocol.Parse — it exposed that the parser treated the one-time 0x1E buffer header field as a per-row marker, decoding only the first row of any multi-row buffer. This also hit the shipped WCF event read (identical 0900 <rowCount> 1E000000 0700 header). Fixed to a 10-byte buffer header + markerless rows, accepting container version 9 (WCF) and 11 (gRPC); the real 50-row buffer now decodes to exactly 50 events (Parse_RealStockClientCapture_DecodesAllEvents, gated on HISTORIAN_EVENT_CAPTURE_NDJSON). So if the server gate ever opens, decoded events flow through correctly on both transports; until then the orchestrator stays on the no-row throw (eventConnection: true wired; opt-in EventReadDiagnostic test, HISTORIAN_GRPC_EVENT_DIAG=1). Diagnostics retained: the httpcap TLS-tee proxy, CreateHttp2/SPLIT_CHANNEL switches, the sqlschema SOCKS→SQL probe, the capture-event harness (native, returns rows).

  2. R4.3 active-SF magnitude — needs an SF-active server (D2 storage-engine console handle).

  3. SendEvent over gRPC SHIPPED + LIVE-VALIDATED 2026-06-23. SendEventAsync now routes over RemoteGrpc (HistorianGrpcEventWriteOrchestrator). Captured the native client live (capture-send-event harness scenario): the send rides HistoryService.AddStreamValues with the same "OS" (0x534F) buffer the WCF path uses (HistorianEventWriteProtocol — "no distinct RPC" confirmed true), on a v8 Event session + CM_EVENT registration. The write-enabled Event open is byte-identical to the read-only one (diffed live — only per-session crypto differs), so the existing event-open path is reused unchanged. End-to-end: pure-managed SDK send → BSuccess=true → event read back from the live server (markers SdkSendProbe/SdkCaptureProbe confirmed in returned rows). Golden-tested (GrpcEventSendProtocolTests) + gated live test (SendEventAsync_OverGrpc_AcceptsEvent, opt-in HISTORIAN_GRPC_EVENT_SEND=1).

  4. ExecuteSqlCommand over gRPC: server-walled (CSrvDbConnection; RegisterTags prime doesn't help). Use WCF for SQL.

  5. R4.2 revision EDITS — storage-engine-pipe-only on BOTH transports (the D2 wall).

  6. ReadBlocks (StartBlockRetrievalQuery) — never captured on either transport.

  7. DeleteTagExtendedProperties — server-blocked on BOTH transports. The gRPC multiplexed-channel hypothesis was PROBED + DISPROVEN 2026-06-22 (merge c88260c): GetTgByNm + GetTepByNm primes succeed on one shared write-enabled gRPC channel, yet DelTep is still rejected (native code=1) and the property survives — the working set is native in-process registration state, not the wire session. Pinned by gated negative test DeleteTagExtendedProperties_OverGrpc_ProbeMultiplexedChannel.

  8. Deferred-by-design items (write-commands D1-D3, non-analog tag create, etc.): bounded out until an explicit customer/user demand signal.
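
The credential-token derivation described in item 1 can be sketched in Python (the SDK itself is C#; this is an illustration, not the shipped code). It assumes the P-256 ECDH shared secret is already available as raw bytes, since the exact encoding of that secret is not stated here, and that the password carries no terminator. The names rc4 and derive_credential_token are hypothetical.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA). Encrypting and decrypting are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # pseudo-random generation, XORed onto the data
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def derive_credential_token(shared_secret: bytes, password: str) -> bytes:
    """client key = SHA256(shared secret); token = RC4(password UTF-16LE, MD5(client key))."""
    # Assumption: shared_secret is the raw ECDH output handed over by the caller.
    client_key = hashlib.sha256(shared_secret).digest()
    rc4_key = hashlib.md5(client_key).digest()
    return rc4(rc4_key, password.encode("utf-16-le"))
```

Because RC4 is symmetric, running rc4 again with the same MD5-derived key recovers the UTF-16LE password, which is a convenient offline check when comparing against a captured token.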

To move any remaining item you need a server-side / connection-level angle (item 1 — v8 event auth is solved; row retrieval is connection-gated, see the NEXT SESSION section of grpc-event-query-capture.md), a fresh native capture (SendEvent gRPC framing — item 3), a different server (SF-active for item 2), or a demand signal to unlock a deferred item. Live-server gRPC probe recipe: set HISTORIAN_GRPC_HOST/_PORT 32565/_TLS true/_DNSID + domain creds (strip quotes — reference_wonder_sql_vd03_credentials) and run the gated HistorianGrpcIntegrationTests.
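
The event-row buffer header fix noted in item 1 (a one-time 10-byte header followed by markerless rows) can be sketched as a header-only parser in Python. The field widths are an inference from the quoted pattern 0900 <rowCount> 1E000000 0700 and the stated 10-byte length (2 + 2 + 4 + 2), with little-endian byte order assumed; neither is confirmed here, and the function name is hypothetical.

```python
import struct

HEADER_LEN = 10  # "10-byte buffer header" per the notes above
ACCEPTED_VERSIONS = (9, 11)  # container version 9 (WCF) and 11 (gRPC)


def parse_event_buffer_header(buf: bytes) -> dict:
    """Read the one-time buffer header; rows start at offset HEADER_LEN."""
    if len(buf) < HEADER_LEN:
        raise ValueError("buffer shorter than the 10-byte header")
    # Assumed layout: uint16 version, uint16 rowCount, uint32 (0x1E), uint16 (0x0007).
    version, row_count, field_1e, field_07 = struct.unpack_from("<HHIH", buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unexpected container version {version}")
    return {
        "version": version,
        "row_count": row_count,
        "header_field_1e": field_1e,  # header field, not a per-row marker
        "header_field_07": field_07,
        "rows_offset": HEADER_LEN,
    }
```

The sketch stops at the header on purpose: the per-row layout is not described in this section, so row decoding is left to HistorianEventRowProtocol.Parse.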

2023 R2 stock-client binary dive (2026-06-23) — sharpened verdicts

Re-read the full decompiled stock 2023 R2 managed client (histsdk-2023r2-analysis/decompiled/: Archestra.Historian.GrpcClient, ArchestrA.HistorianAccess, Archestra.Grpc.Contract, HistorianEvent, HistorianAccessUtil) as the oracle for every still-pending item. Governing fact: ArchestrA.HistorianAccess.dll is a C++/CLI mixed assembly — every data/config/write method is a thin shim into native <Module>.HistorianClient.*, and the managed Grpc*Client wrappers are instantiated by nothing in the decompiled set (new Grpc*Client( → zero call sites). So the buffer-building and RPC-dispatch sequencing for these items lives in native C++ not present in the binaries. That confirms the "gated" calls were not from missing managed steps — with these refinements:

  • Item 1 (gRPC event rows): confirmed native/server-side. Stock event call graph is provably identical to ours (transport, per-service channels, gzip-only metadata, CM_EVENT registration, v8 ECDH Event-open, StartEventQuery request bytes). EventQuery.StartQuery/MoveNext dispatch straight into native HistorianClient.StartEventQuery/GetNextRow; the query orchestration that would differ is native and not on the wire. One untested low-effort check remains: byte-diff a captured Event-connection EnsureTags/RegisterTags against our replay (the 83-vs-86-byte EnsT gap was never actually compared).
  • Item 3 (SendEvent over gRPC) SHIPPED + LIVE-VALIDATED 2026-06-23 (was "capturable"). RPC confirmed = HistoryService.AddStreamValues (the "no distinct RPC" note is TRUE). The btValues VTQ buffer turned out to be already-owned: our M2 HistorianEventWriteProtocol.SerializeAddStreamValuesBuffer ("OS" buffer, decoded from the WCF event-send) is the transport-independent PackToVtq equivalent and the gRPC send uses it verbatim (live capture: sig OS/0x534F, CM_EVENT GUID, identical framing — NOT the historical write's "ON" buffer). The write-enabled Event open is byte-identical to the read-only one (live diff). So SendEvent-over-gRPC was pure assembly: HistorianGrpcEventWriteOrchestrator = existing v8 Event open + existing CM_EVENT registration + AddStreamValues(OS buffer). End-to-end live-validated (send → BSuccess → read back from the live server). Golden-tested + gated live test.
  • Item 4 (ExecuteSql over gRPC): confirmed walled + explained. The stock client gates SQL out client-side: HistorianAccess.ExecuteSqlCommand returns OperationNotSupported when IsManagedHistorian(node) or !IsProcessConnectionRequested() (decompile ~:6198/:6214) and never sends the RPC. SQL-over-gRPC is unsupported by design on a managed/gRPC historian; our ProtocolEvidenceMissingException is correct.
  • Item 5 (R4.2 revision edits): confirmed HARD. There is no Revision RPC in the gRPC contract (zero "Revision" message types); the stock client reaches a revision edit only via the native HistorianClient.AddRevisionValuesBegin/AddRevisionValue/AddRevisionValuesEnd transaction trio over the storage-engine channel. NOTE: this is a distinct capability from AddNonStreamValues (non-streamed original insert); HistorianGrpcRevisionProbe probes the latter, and its doc comment was corrected to say so.
  • Item 6 (ReadBlocks/LoadBlocks): the LoadBlocks request is a trivial handle+sequence cursor, but the historyBlocks response is a native blob with no managed decoder, and it needs the D2-blocked OpenStorageConnection console handle. Walled.
  • Item 7 (DeleteTagExtendedProperties): reframed to a capturable lead. RPC + string handle are correct in our SDK; ADD and DELETE are structurally identical and neither routes through StartJob. The differentiator is the deleteFromServer flag carried inside the native-built BtInput plus the native HCAL cache-sync background worker that actually propagates the delete server-side (config writes hit the in-process cache first, then sync). Capturable: capture native DeleteTagExtendedPropertiesByName(deleteFromServer=true)'s BtInput to learn whether one well-formed RPC durably deletes (→ shippable) or whether it genuinely depends on the cache-sync worker (→ walled).
  • SF/snapshot/shard/ForwardSnapshot ops — only Get/SetSFParameter are managed-built (typed strings); all others carry opaque native buffers and need the storage console handle. Walled / tooling-internal.

Net: 3 items hard-confirmed walled with real explanations (4, 5, 6 + OpenStorageConnection), and 2 moved to a precise, local-box-capturable target: SendEvent (PackToVtq output) and DeleteTEP (BtInput with deleteFromServer=true). Both need native instrumentation of aahClientManaged.dll (Frida / IL-rewrite — repo tooling exists under tools/AVEVA.Historian.NativeTraceHarness + scripts/frida/), not a special server.
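
The "OS" (0x534F) notation used for the event-send buffer reads naturally as two ASCII characters interpreted as a little-endian 16-bit word. A small Python check of that reading follows; the little-endian interpretation is an inference from the notation, and the value printed for "ON" is derived under the same assumption rather than taken from a capture.

```python
import struct


def signature_word(tag: str) -> int:
    """Interpret a two-character ASCII tag as a little-endian uint16."""
    raw = tag.encode("ascii")
    if len(raw) != 2:
        raise ValueError("signature tag must be exactly two ASCII characters")
    return struct.unpack("<H", raw)[0]


def starts_with_signature(buf: bytes, tag: str) -> bool:
    """True when the buffer begins with the tag's two ASCII bytes."""
    return buf[:2] == tag.encode("ascii")


if __name__ == "__main__":
    print(hex(signature_word("OS")))  # storage-sample buffer used by event send
    print(hex(signature_word("ON")))  # historical-write buffer, derived value
```

A check like starts_with_signature is enough to tell the two buffer kinds apart in a capture before attempting any deeper decode.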

Project Direction

The project goal is still a fully managed .NET 10 C# AVEVA Historian client. The production SDK must not depend on aahClientManaged.dll, aahClient.dll, or any other AVEVA native runtime binary.

Do not pivot to REST or a P/Invoke production shim unless the project requirements change. Native and P/Invoke tools in this repo are reverse engineering aids only.

Required production surface (all live-verified):

  • ProbeAsync
  • ReadRawAsync
  • ReadAggregateAsync
  • ReadAtTimeAsync
  • ReadEventsAsync
  • BrowseTagNamesAsync
  • GetTagMetadataAsync
  • Status helpers: GetConnectionStatusAsync, GetStoreForwardStatusAsync, GetSystemParameterAsync

Write surface (added 2026-05-04 by explicit user request — see docs/plans/write-commands-reverse-engineering.md Status section):

  • EnsureTagAsync for analog Float / Double / Int2 / Int4 / UInt4 (with optional ApplyScaling=true for distinct MinRaw / MaxRaw persistence — server sets AnalogTag.Scaling=1 when the EnsT2 trailer's second byte is 0x01 instead of 0x00).
  • DeleteTagAsync.

AddS2 (write samples) is architecturally blocked — server runtime cache only ingests from configured IOServers / Application Server pipelines. Discrete / String / Int1 / Int8 / UInt8 EnsT2 fail at native AddTag and are unsupported. There is no UpdateTags operation on the WCF surface; the misnomer in earlier write-up drafts has been removed.

Repository Map

  • AGENTS.md - standing project instructions and constraints.
  • instructions.md - original plan and decision record.
  • current\ - deployed sidecar dependency DLL set; use this first for wrapper behavior.
  • aveva-install-x64\ and aveva-install-x86\ - full installed AVEVA DLL sets for comparison.
  • src\AVEVA.Historian.Client\ - production managed SDK.
  • tests\AVEVA.Historian.Client.Tests\ - unit and gated integration tests.
  • tools\AVEVA.Historian.ReverseEngineering\ - .NET 10 CLI for static inspection, WCF probes, and IL-rewrite generation.
  • tools\AVEVA.Historian.NativeTraceHarness\ - .NET Framework native-wrapper comparison harness. Reverse-engineering only.
  • tools\AVEVA.Historian.NetFxWcfProbe\ - .NET Framework WCF probe used to rule out .NET 10 WCF-only differences.
  • tools\AVEVA.Historian.ReverseInstrumentation\ - helper assembly injected into rewritten wrapper copies for sanitized logging.
  • tools\AVEVA.Historian.WcfCaptureServer\ - fake WCF capture server used for endpoint experiments.
  • scripts\ - PowerShell runners and Frida scripts.
  • docs\reverse-engineering\ - sanitized notes and small evidence summaries.
  • artifacts\reverse-engineering\ - ignored raw/sensitive runtime artifacts. Do not commit raw captures or identity-bearing logs.

Build And Test

From the repository root, normally %USERPROFILE%\Desktop\histsdk:

dotnet build .\Histsdk.slnx --no-restore
dotnet test .\Histsdk.slnx --no-build --logger "console;verbosity=minimal"

Current known-good result (2026-06-23):

  • Build succeeds (0 warnings / 0 errors).
  • Offline tests pass: 328/328 (live gRPC/integration tests skip cleanly without their env vars). Gated live tests add to this when HISTORIAN_* / HISTORIAN_GRPC_* are set. The +7 over the prior 321 are the event-row parser fix's golden + gated-capture coverage (HistorianEventRowProtocolTests: markerless multi-row, the v11 gRPC header, and the 50-event stock-client capture).

The workspace is a Git working tree (origin: gitea.dohertylan.com). Use normal git workflow for change tracking; the prior "no working tree, use timestamps" note is obsolete.

Environment Variables

Live integration tests and probes are gated by environment variables:

$env:HISTORIAN_HOST = "<host>"
$env:HISTORIAN_PORT = "32568"
$env:HISTORIAN_USER = "<DOMAIN\user or machine\user>"
$env:HISTORIAN_PASSWORD = "<password>"
$env:HISTORIAN_TEST_TAG = "<known historized tag>"
$env:HISTORIAN_TAG_FILTER = "<LIKE filter, optional>"

Do not write actual credentials into docs, scripts, captures, or command logs. The scripts read these values from the process environment.

Useful Commands

Probe managed WCF endpoints:

dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- wcf-probe $env:HISTORIAN_HOST 32568
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- wcf-cert-probe $env:HISTORIAN_HOST 32568 localhost

Test the positive managed tag-browse route:

dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- wcf-like-tag-browse $env:HISTORIAN_HOST 32568 $env:HISTORIAN_TAG_FILTER

Run a bounded negative StartQuery2 replay without burning the full matrix:

dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- wcf-start-query $env:HISTORIAN_HOST 32568 $env:HISTORIAN_TEST_TAG --max-attempts 1 --timeout-seconds 3

Run the native wrapper comparison harness:

dotnet run --project tools\AVEVA.Historian.NativeTraceHarness -- --scenario history --tag $env:HISTORIAN_TEST_TAG --lookback-minutes 1440
dotnet run --project tools\AVEVA.Historian.NativeTraceHarness -- --scenario event --lookback-minutes 10080

Search local Galaxy Repository for historized tags:

powershell.exe -NoProfile -ExecutionPolicy Bypass -File .\scripts\Find-GalaxyHistorizedTags.ps1

Prompt for Historian credentials in a PowerShell window:

powershell.exe -NoProfile -ExecutionPolicy Bypass -File .\scripts\Prompt-HistorianCredentialsAndOpen2.ps1

Script Locations

Credential/session helpers:

  • scripts\Prompt-HistorianCredentialsAndOpen2.ps1
  • scripts\Test-AahClientManagedOpen.ps1
  • scripts\Test-AahClientManagedReadIntegrated.ps1

Native/wrapper capture runners:

  • scripts\Run-AahClientManagedFridaCapture.ps1
  • scripts\Attach-AahClientManagedFridaCapture.ps1
  • scripts\Attach-NativeTraceHarnessRuntimePointerCapture.ps1
  • scripts\Attach-NativeTraceHarnessWinsockCapture.ps1
  • scripts\Attach-NativeTraceHarnessSystemBoundaryCapture.ps1
  • scripts\Attach-NativeTraceHarnessAahClientExportCapture.ps1

Server-side ValCl probe:

  • scripts\Capture-AahClientAccessPointValClContext.ps1
  • scripts\frida\aahclientaccesspoint-valcl-context.js

Network/relay experiments:

  • scripts\Attach-SystemBoundaryViaDebianRelay.ps1
  • scripts\Run-DebianHistorianRelayCapture.ps1
  • scripts\Run-PktmonDebianRelayCapture.ps1
  • scripts\Start-WcfOpen2CaptureServer.ps1

Frida hook implementations:

  • scripts\frida\aahclientmanaged-open-query.js
  • scripts\frida\aahclientmanaged-system-boundary.js
  • scripts\frida\aahclientmanaged-winsock.js
  • scripts\frida\aahclient-exports.js

Current Evidence Summary

Positive evidence:

  • Fully managed WCF/MDAS endpoint probing works.
  • /Hist, /Retr, /Stat, and /Trx GetV calls are reachable.
  • /HistCert is reachable with MDAS over transport security.
  • /Hist-Integrated accepts managed Windows integrated Open2.
  • The returned Open2 handle is accepted by Retr.IsOriginalAllowed.
  • Managed wildcard tag browse works through:
    • Retr.StartLikeTagNameSearch
    • Retr.GetLikeTagnames
  • Native wrapper history reads succeed in the direct/local path for known historized tags.
  • Native wrapper event query succeeds and returns sanitized local-dev rows.
  • DataQueryRequest serialization is byte-matched for:
    • full/raw request
    • time-weighted aggregate request
    • interpolated request
  • EventQueryRequest serialization is byte-matched for the current empty-filter event query fixture.
  • OpenConnection3 request/response layout is partially decoded:
    • request byte 0: version 6
    • request bytes 1..16: authenticated context GUID
    • request byte 17: content selector
    • response byte 0: version 3
    • response bytes 1..4: transient /Retr client handle
    • response includes storage session id, connect time, and server time
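
The partially decoded OpenConnection3 layout above can be expressed as a pair of prefix parsers. This Python sketch covers only the offsets listed; it assumes the 16 GUID bytes use the .NET Guid.ToByteArray ordering and that the 4-byte handle is little-endian, neither of which is stated above. Function names are hypothetical.

```python
import struct
import uuid


def parse_open_request_prefix(buf: bytes):
    """Return (version, context GUID, content selector) from request bytes 0..17."""
    if len(buf) < 18:
        raise ValueError("request shorter than the 18 decoded bytes")
    version = buf[0]
    # Assumption: mixed-endian layout as produced by .NET Guid.ToByteArray().
    context = uuid.UUID(bytes_le=bytes(buf[1:17]))
    selector = buf[17]
    return version, context, selector


def parse_open_response_prefix(buf: bytes):
    """Return (version, transient /Retr client handle) from response bytes 0..4."""
    if len(buf) < 5:
        raise ValueError("response shorter than the 5 decoded bytes")
    version = buf[0]
    # Assumption: the handle is a little-endian unsigned 32-bit value.
    (handle,) = struct.unpack_from("<I", buf, 1)
    return version, handle
```

The remaining response fields (storage session id, connect time, server time) are left out because their offsets are not given in this summary.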

Negative evidence:

  • Open2 by itself is not enough for history/event query starts.
  • Direct managed /Retr.StartQuery2 fails even with byte-matched DataQueryRequest bytes.
  • The bounded current replay shape is:
    • /Hist-Integrated Open2 succeeds
    • Retr.IsOriginalAllowed returns true
    • StartQuery2 returns false
    • response and error buffers are empty
    • legacy StartQuery may fault with a server null-reference
  • Query failure is not caused by:
    • wrong basic WCF service path
    • wrong MDAS content type
    • wrong DataQueryRequest serializer
    • wrong QueryType sweep
    • wrong common selector flag variants
    • missing IsOriginalAllowed
    • simple explicit username/password mismatch
  • Managed standalone ValCl replay reproduces the first native wrapped NTLM token but still fails at round 0.
  • Running the same managed ValCl path through .NET Framework also fails, so this is not just a .NET 10 WCF behavior difference.

Historical: Read-Path Blocker (resolved 2026-05-04)

Preserved RE record. This was the original active blocker; it is long resolved and is not the live state — see "Current Status" at the top.

Resolved on 2026-05-04. The previous blocker — managed ValCl rejected by the server — had two causes, both now fixed:

  1. WCF parameter-name mismatch. SDK and probe declared the ValidateClientCredential byte parameters as inputBuffer / outputBuffer; the actual server contract (per ildasm of aahClientAccessPoint.exe) uses inBuff / outBuff. WCF derives body element names from the C# parameter name, so the server's deserialiser was ignoring the unknown <inputBuffer> element and arg.2 was null, NRE-ing at IL 0x01AA. Fixed via [MessageParameter(Name = "inBuff")] / Name = "outBuff" in the probe and in src/AVEVA.Historian.Client/Wcf/Contracts/IHistoryServiceContract2.cs and IStorageServiceContract.cs.
  2. SSPI request-flag mismatch. Probe used ALLOCATE_MEMORY | CONFIDENTIALITY | INTEGRITY | CONNECTION = 0x10910; the native wrapper uses 0x2081C round 0 / 0x81C round 1+ (adds IDENTIFY round 0 and REPLAY_DETECT + SEQUENCE_DETECT always). The REPLAY/SEQUENCE pair gates NTLM MIC generation; without it, AcceptSecurityContext rejects round 1 with SEC_E_INVALID_TOKEN. Fixed in the probe's SspiClient.
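
The flag arithmetic in cause 2 can be cross-checked against the ISC_REQ_* constants from the Windows SDK header sspi.h. The Python sketch below only reproduces the bit math; the helper names are hypothetical and it does not call SSPI.

```python
# ISC_REQ_* values as defined in the Windows SDK header sspi.h.
ISC_REQ_REPLAY_DETECT = 0x00000004
ISC_REQ_SEQUENCE_DETECT = 0x00000008
ISC_REQ_CONFIDENTIALITY = 0x00000010
ISC_REQ_ALLOCATE_MEMORY = 0x00000100
ISC_REQ_CONNECTION = 0x00000800
ISC_REQ_INTEGRITY = 0x00010000
ISC_REQ_IDENTIFY = 0x00020000

# Flags the probe originally requested (0x10910).
PROBE_FLAGS = (ISC_REQ_ALLOCATE_MEMORY | ISC_REQ_CONFIDENTIALITY
               | ISC_REQ_INTEGRITY | ISC_REQ_CONNECTION)


def native_flags(round_index: int) -> int:
    """Flags the native wrapper requests: 0x2081C on round 0, 0x81C afterwards."""
    flags = (ISC_REQ_CONNECTION | ISC_REQ_CONFIDENTIALITY
             | ISC_REQ_SEQUENCE_DETECT | ISC_REQ_REPLAY_DETECT)
    if round_index == 0:
        flags |= ISC_REQ_IDENTIFY  # requested on the first round only
    return flags


def missing_from_probe(round_index: int) -> int:
    """Bits the native wrapper sets that the original probe did not."""
    return native_flags(round_index) & ~PROBE_FLAGS


if __name__ == "__main__":
    print(hex(PROBE_FLAGS), hex(native_flags(0)), hex(native_flags(1)))
    print(hex(missing_from_probe(0)))
```

The difference on round 0 is exactly IDENTIFY plus the REPLAY_DETECT and SEQUENCE_DETECT pair, which matches the explanation above.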

The full chain a successful native read uses is now reproducible from a fully managed client end-to-end:

  1. Hist-Integrated.GetV → version 11
  2. Hist-Integrated.ValCl round 0 (69 → 239 bytes) ✓
  3. Hist-Integrated.ValCl round 1 (93 → 1 byte terminal) ✓

The next evidence layers — OpenConnection3 (with the now-known context key), Retr.IsOriginalAllowed, and Retr.StartQuery2 — should now work, because the native context-map registration that ProcessServerToken performs has finally been completed by a managed client. Run the same managed sequence and observe whether OpenConnection3 returns the expected 42-byte response and whether StartQuery2 returns a non-empty result for OtOpcUaParityTest_001.Counter.

Next Pickup Steps

scripts\Capture-AahClientAccessPointValClContext.ps1 cannot get server-side helper visibility on this host. Both scenarios were re-run on 2026-05-03 from an elevated PowerShell session (Admin, High Mandatory Label, SeDebugPrivilege enabled) and Frida attach into aahClientAccessPoint.exe (running as NT SERVICE\aahClientAccessPoint) was rejected with Failed to attach: process with pid <pid> either refused to load frida-agent, or terminated during injection. The actual Frida Python exception is frida.ProcessNotRespondingError, which means the agent injection handshake did not complete in time, not a load-time refusal. The probes themselves still ran cleanly: NativeRead reproduced the canonical fixture row, and ManagedValCl reproduced the type-4/code-1 round-zero failure with the canonical wrapped-NTLM prefix.

Hypotheses already ruled out on this host:

  • Process mitigation policy. Get-ProcessMitigation -Id <pid> reports every category OFF for the service, including BinarySignature.MicrosoftSignedOnly, DynamicCode.BlockDynamicCode, Cfg.Enable, ImageLoad.BlockRemoteImageLoads, ExtensionPoint.DisableExtensionPoints, and UserShadowStack.*.
  • DACL / token. OpenProcess(PROCESS_ALL_ACCESS) from the elevated token succeeds, including PROCESS_VM_OPERATION, PROCESS_VM_WRITE, and PROCESS_CREATE_THREAD.
  • Bitness. Cross-bitness Frida (64-bit Python attaching to a fresh C:\Windows\SysWOW64\cmd.exe) works.
  • AV / EDR. Defender real-time protection, behavior monitoring, and on-access protection are OFF; no third-party AV/EDR is registered with SecurityCenter2; no EDR-style filter driver is active.
  • IFEO / AppInit. No IFEO debugger entry for aahClientAccessPoint.exe; AppInit_DLLs empty in 64-bit and WOW64 hives.
  • Frida realm / persist_timeout knobs. realm='native', realm='emulated', and persist_timeout=30 all fail identically.

Likely remaining cause: service-internal — aahClientAccessPoint.exe runs ~150 threads, many in EventPairLow ALPC/SCM waits, and Frida's manual mapper does not get a cooperative thread to complete its RPC bootstrap.

ETW SSPI tracing then produced the actionable evidence Frida could not. A logman session capturing LsaSrv, LSA, Microsoft-Windows-NTLM, NTLM Security Protocol, and Security: NTLM Authentication providers at level 0xFF and keywords 0xFFFFFFFFFFFFFFFF recorded 10 SSPI events from aahClientAccessPoint during a successful native read (Ids 30, 34, 35, 40, 84, 10, 12, 16, 17, 86 in a 47 ms burst) and zero from the same process during a failing managed ValCl run. lsass-side SSPI activity also drops 35x in the failing run (4330 → 121 events). The implication is that the long-standing HistoryService.ValidateClientCredential caught NullReferenceException at line 1593 fires before reaching CServerNode.ProcessServerToken at IL 0x01DC, i.e. between Guid.TryParse(handle) at IL 0x012A and the ProcessServerToken call site. Likely culprits: CServerBuffer vtable allocation at IL 0x0183, the byte-array pointer/length copy into buffer +72/+76, or a parameter pull from ServiceSecurityContext.Current whose WindowsIdentity is null on the plain Security.Mode = None pipe binding.

Static IL inspection of HistoryService.ValidateClientCredential (token 0x06000774, 779 instructions, in mixed-mode aahClientAccessPoint.exe) enumerates every NRE-capable instruction on the straight-line path before the ProcessServerToken call and narrows the failure to five candidates (full table in openconnection3-correlation-latest.json ValidateClientCredentialIlNreCandidates):

  • 0x00ED: LogHistorianMessage(... CServerClient*, ...) in the prologue. NREs if the CServerClient* is null on the failing binding.
  • 0x017E and 0x0182 — vtable derefs in the allocator chain at &g_ClientAccessPoint + 2328 → vtable → +40. NREs if the field is uninitialised; ruled out as the differentiator because g_ClientAccessPoint is a process-wide singleton.
  • 0x01AA (ldelema) and 0x01B2 (ldlen) on arg.2 = byte[] inputBuffer. NREs if WCF deserialises the buffer as null even though 69 bytes are on the wire.

The two custom-error paths in this method (code 28 for invalid GUID text at 0x012F, code 204 for allocator-null at 0x018A) are both explicitly handled, so neither would manifest as the logged NullReferenceException.

Differential analysis against the successful native local read (which uses the same Security.Mode = None pipe binding) rules out the prologue and the static-singleton vtable chain as differentiators. The byte-array deref at 0x01AA/0x01B2 is the most plausible remaining candidate because it depends on WCF body deserialisation which can silently differ between the managed probe and the native wrapper even when both sides claim the same operation contract.

SOAP-body comparison via WCF message logging in the .NET Framework probe resolved this. The wire body sent <inputBuffer>BASE64DATA</inputBuffer> but the response used <outBuff b:nil="true"/>. ildasm against aahClientAccessPoint.exe confirmed the actual server contract is

ValidateClientCredential(string handle, uint8[] inBuff,
                         [out] uint8[]& outBuff,
                         [out] uint8[]& errorBuffer)

WCF derives the request body element name from the C# parameter name, so the probe's inputBuffer parameter produced <inputBuffer> on the wire and the server's WCF deserialiser ignored that unknown element, leaving server-side arg.2 = inBuff = null. IL 0x01AA ldelema System.Byte then NREs and the C++/CLI catch handler converts it to native error type 4 / code 1.
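
The failure mode is easy to model outside WCF. The Python toy below is not WCF and ignores namespaces and the real message envelope; it only illustrates a name-based binder that silently skips an unknown element and leaves the expected parameter null. The function name and the sample XML are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def bind_byte_parameter(body_xml: str, expected_name: str):
    """Look up a byte[] parameter by element name, the way a name-based binder would."""
    root = ET.fromstring(body_xml)
    element = root.find(expected_name)
    if element is None or element.text is None:
        return None  # unknown or missing element: the parameter stays null
    return base64.b64decode(element.text)


# Body as the probe originally sent it (element named after the C# parameter).
BROKEN_BODY = (
    "<ValidateClientCredential><handle>h</handle>"
    "<inputBuffer>AQID</inputBuffer></ValidateClientCredential>"
)
# Body after renaming the parameter to the name the server contract expects.
FIXED_BODY = BROKEN_BODY.replace("inputBuffer", "inBuff")

if __name__ == "__main__":
    print(bind_byte_parameter(BROKEN_BODY, "inBuff"))  # None: bytes were on the wire but unbound
    print(bind_byte_parameter(FIXED_BODY, "inBuff"))   # the decoded payload
```

The point of the toy is that the payload bytes are present in both bodies; only the element name decides whether the server-side argument is populated.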

Adding [MessageParameter(Name = "inBuff")] and [MessageParameter(Name = "outBuff")] to the probe's ValidateClientCredential declaration unblocks the request:

  • Round 0: ServerSuccess=true, ServerOutputLength=239, ServerContinue=true, output prefix 01 4e 54 4c 4d 53 53 50 00 02 ... (continue byte + NTLMSSP type-2 challenge). Matches the documented native-success "69→239 byte" first round exactly.
  • Round 1: Type=129 Code=0x80090308 = SEC_E_INVALID_TOKEN with a 100-byte error buffer whose ASCII payload includes aahClientAccessPoint::CServerContext::ProcessClientToken and InitializeSecurityContext. The original parameter-binding NRE is gone; the next layer of failure is real SSPI rejection inside AcceptSecurityContext.
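
The round-0 output prefix quoted above (continue byte, then an NTLMSSP message) can be decoded with a few lines of Python. The NTLMSSP signature and the little-endian message-type field follow the public NTLM message layout; treating the leading byte as 0x01 = continue and 0x00 = terminal is taken from the observations in this document. The function name is hypothetical.

```python
import struct

NTLMSSP_SIGNATURE = b"NTLMSSP\x00"


def decode_valcl_output(buf: bytes):
    """Return (server_continue, ntlm_message_type or None) for a ValCl output buffer."""
    if not buf:
        raise ValueError("empty output buffer")
    server_continue = buf[0] == 0x01
    if len(buf) == 1:
        return server_continue, None  # bare continue/terminal byte, no NTLM payload
    if len(buf) < 13 or buf[1:9] != NTLMSSP_SIGNATURE:
        raise ValueError("payload is not an NTLMSSP message")
    # NTLM messages carry a 4-byte little-endian MessageType after the signature.
    (message_type,) = struct.unpack_from("<I", buf, 9)
    return server_continue, message_type
```

For the documented sequence this yields a continue flag with message type 2 (challenge) on round 0 and a terminal result with no payload on a one-byte 0x00 output.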

The same [MessageParameter] fix is now applied to the production SDK contracts IHistoryServiceContract2.ValidateClientCredential and IStorageServiceContract.ValidateClientCredential. ildasm also revealed the same parameter-naming mismatch on EnsT/EnsT2/RTag2/ExKey/StJb/GtJb with their current SDK declarations; those operations are not on the read-only SDK path so they are intentionally left alone for now (audit when those flows become required — see ServerContractAuditedOtherOperationsWithLikelySameMismatch in openconnection3-correlation-latest.json for the table).

Native SSPI flag replication on 2026-05-04 resolved SEC_E_INVALID_TOKEN. Decoded native flags:

  • 0x2081C round 0 = ISC_REQ_IDENTIFY | ISC_REQ_CONNECTION | ISC_REQ_CONFIDENTIALITY | ISC_REQ_SEQUENCE_DETECT | ISC_REQ_REPLAY_DETECT
  • 0x81C round 1+ = same minus ISC_REQ_IDENTIFY

The probe was missing ISC_REQ_REPLAY_DETECT, ISC_REQ_SEQUENCE_DETECT, and round-0 ISC_REQ_IDENTIFY. The REPLAY/SEQUENCE pair gates NTLM MIC generation in the type-3 response; without it the server's AcceptSecurityContext rejects with SEC_E_INVALID_TOKEN. Adding those flags (and tracking the round count internally in SspiClient, keeping ALLOCATE_MEMORY for buffer convenience) reproduces the documented native two-round sequence byte-for-byte from a managed client:

| Round | Wire | Server output | Continue | Error |
| --- | --- | --- | --- | --- |
| 0 | 69 wrapped | 239 (NTLM type-2 challenge) | true | none |
| 1 | 93 wrapped | 1 byte (0x00 terminal) | false | none |

FinalServerSuccess: true, FinalNativeError: null. The long-standing managed ValCl blocker is resolved. The chain a successful native read uses is now reproducible from a managed client end-to-end:

  1. Hist-Integrated.GetV → version 11
  2. Hist-Integrated.ValCl round 0 (69 → 239 bytes) ✓
  3. Hist-Integrated.ValCl round 1 (93 → 1 byte terminal) ✓

End-to-end chain verification on 2026-05-04. The .NET Framework probe was extended to chain Hist.Open2 (replaying the captured 1346-byte v6 request with the leading 16 context-key bytes spliced to match the managed ValCl GUID), then Retr.IsOriginalAllowed, then Retr.StartQuery2 (replaying the captured 251-byte OtOpcUaParityTest_001.Counter DataQueryRequest). Result:

| Step | Outcome |
| --- | --- |
| Hist.Open2 | 42 bytes, version 0x03, transient /Retr client handle decoded |
| Retr.GetV | version 4 |
| Retr.IsOriginalAllowed(handle) | return code 0, isAllowed = true |
| Retr.StartQuery2(handle, 1, 251 bytes, ...) | Success=true, response 31 bytes, QueryHandlePresent=true, no error |

The 31-byte StartQuery2 response SHA-256 4c062b5ce8181308f0f46bfd8c6088acb52e6ade94401651b7d3ccc8952edfb5 is byte-for-byte identical to the previously captured native success response. The full AVEVA Historian native wire protocol chain through StartQuery2 is now reproducible end-to-end from a fully managed client.

This required one additional contract fix: IRetrievalServiceContract2 had the same parameter-name mismatch class. Server uses pRequestBuff / pResponseBuff / errSize / err on StartQuery2 (and pResultBuff / errSize / err on GetNextQueryResultBuffer2, errSize / err on EndQuery2). [MessageParameter(Name = ...)] attributes added to src/AVEVA.Historian.Client/Wcf/Contracts/IRetrievalServiceContract2.cs.

Reproduce the chain with:

.\tools\AVEVA.Historian.NetFxWcfProbe\bin\Debug\net481\AVEVA.Historian.NetFxWcfProbe.exe `
  --endpoint "net.pipe://localhost/Hist" `
  --retr-endpoint "net.pipe://localhost/Retr" `
  --open2-replay  .\artifacts\reverse-engineering\openconnection3-request-replay.bin `
  --data-query-replay .\artifacts\reverse-engineering\startdataquery-request-replay.bin

The two *.bin inputs are extracted from artifacts/reverse-engineering/instrumented-openconnection3-correlation/capture.ndjson (OpenConnection3.Request and StartDataQuery.Request Base64 fields) and stay under artifacts/ (gitignored). The probe stdout JSON only echoes lengths, SHAs, version bytes, and prefix hex; it does not echo identity payloads or transient handle values.

Production SDK note: src/AVEVA.Historian.Client currently has no SSPI client (only wrap/unwrap helpers in HistorianWcfAuthenticationProtocol). When the SDK auth flow is wired for the production read path, it must use the same native-equivalent flags. .NET 10's System.Net.Security.NegotiateAuthentication does not expose ISC_REQ_* directly; P/Invoke InitializeSecurityContextW (or equivalent) to set IDENTIFY + REPLAY_DETECT + SEQUENCE_DETECT explicitly. Reference implementation in tools/AVEVA.Historian.NetFxWcfProbe/Program.cs SspiClient.

The protocol is now fully understood end-to-end for the read path; remaining work is plumbing — replace the captured-replay Open2 payload with HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6 (already in the SDK), then chain ValCl → Open2 → /Retr.StartQuery2 → /Retr.GetNextQueryResultBuffer2 for the canonical read fixture.

Production SDK plumbing landed on 2026-05-04. The fully managed .NET 10 SDK now reads history end-to-end against the live local Historian. New SDK pieces:

  • Wcf/HistorianSspiClient.cs — managed SSPI client, P/Invokes InitializeSecurityContextW with native flags 0x2081C round 0 / 0x81C later. [SupportedOSPlatform("windows")].
  • Wcf/HistorianWcfBindingFactory.CreateMdasNetNamedPipeBinding + CreatePipeEndpointAddress — Named Pipe transport for the local Historian. [SupportedOSPlatform("windows")].
  • Wcf/HistorianDataQueryProtocol.TryParseGetNextQueryResultBufferRows — parses UInt16 version=9 + UInt32 rowCount + N self-describing rows; recognises the 5-byte 04 1E 00 00 00 ("no more data") terminal.
  • Wcf/HistorianWcfReadOrchestrator.cs — chains Hist.GetV → Hist.ValCl × N → Hist.Open2 → /Retr.GetV → Retr.IsOriginalAllowed → Retr.StartQuery2 → loop Retr.GetNextQueryResultBuffer2. Builds the OpenConnection3 v6 request through HistorianOpen2Protocol.SerializeNativeOpenConnection3Version6 with documented native constants (ClientType=4, ConnectionMode=0x402, FormatVersion=4, HcalVersion=17, DataSourceId="2020.406.2652.2").
  • HistorianClientOptions.Transport (defaults to LocalPipe) and HistorianClientOptions.TargetSpn (defaults to NT SERVICE\aahClientAccessPoint).
  • Models/HistorianSample.PercentGood.
  • Protocol/Historian2020ProtocolDialect.ReadRawAsync now delegates to the orchestrator on Windows + LocalPipe.

ReadRawAsync against the live local Historian for the canonical OtOpcUaParityTest_001.Counter fixture returns parsed HistorianSample rows including Quality, OpcQuality, QualityDetail, NumericValue, PercentGood, and TimestampUtc.

Test coverage:

  • Without the integration env vars: 64/64 unit tests pass (golden-byte coverage of SSPI flag selection, Named Pipe binding shape, and the row-buffer parser for the captured 570-byte fixture).
  • With HISTORIAN_HOST=localhost + HISTORIAN_TEST_TAG=OtOpcUaParityTest_001.Counter: 69/69 pass, including HistorianClientIntegrationTests.ReadRawAsync_AgainstLocalHistorian_ReturnsAtLeastOneRow which exercises the full managed chain end-to-end.

Reverse-engineering for the read path is complete. Remaining follow-up work (not blocked by protocol discovery — only plumbing):

  • Aggregate row layouts (Interpolated, TimeWeightedAverage) and ReadAggregateAsync / ReadAtTimeAsync wiring (use the per-mode dnlib row captures already in docs/reverse-engineering/).
  • ReadEventsAsync wiring (StartEventQuery request bytes are already byte-matched; need event row layout + a similar orchestrator).
  • Remote TCP transports (RemoteTcpIntegrated, RemoteTcpCertificate).
  • Explicit username/password authentication (current orchestrator is integrated-only).
  • [MessageParameter] audit on the other contracts ildasm flagged with parameter-name mismatches: EnsT, EnsT2, RTag2, ExKey, StJb, GtJb (none on the read path so far).
  • Decode the trailing 34 bytes per row (likely string-value placeholder + aggregate end-timestamp slot).

All of the above landed on 2026-05-04. The SDK now exposes ReadRawAsync, ReadAggregateAsync, ReadAtTimeAsync, and ReadEventsAsync end-to-end; [MessageParameter] audits applied to ~30 parameter-name mismatches across IHistoryServiceContract, IHistoryServiceContract2, IRetrievalServiceContract, IRetrievalServiceContract3, and IRetrievalServiceContract4; HistorianWcfBindingFactory.CreateBindingPair(options) now selects the right Hist + Retr binding/endpoint pair for LocalPipe, RemoteTcpIntegrated, and RemoteTcpCertificate transports; HistorianSspiClient has an explicit-creds constructor overload that builds SEC_WINNT_AUTH_IDENTITY. 72/72 tests pass with HISTORIAN_HOST=localhost + HISTORIAN_TEST_TAG=... set, including seven live integration tests against the local Historian.

Surfaced new evidence target during event-flow verification: Retr.GetNextEventQueryResultBuffer returns native error type=4 code=85 (0x55) — a fresh server response we haven't seen before, likely caused by the missing RegisterTags2(CM_EVENT) prerequisite that the native wrapper's CreateDefaultEventTag performs before any event read. The orchestrator treats the 5-byte type=4 buffer as a soft terminal so the chain doesn't throw; LastErrorBufferDescription surfaces the full code for diagnostics.

Open items (each isolated, no protocol discovery required):

  1. Event default-tag registration (CM_EVENT prerequisite) — partially decoded, full chain incomplete. Built instrument-wcf-writemessage IL-rewrite tooling that hooks aahMDASEncoder.ClientMessageEncoder.WriteMessage (token 0x06005E65, MDAS encoder layer) to capture every outgoing WCF body via the existing CaptureLogger pattern. The captured event scenario flow has 27 outgoing WCF calls between session startup and the first event row:

    | # | Action | Notes |
    | --- | --- | --- |
    | 0 | Hist/GetV | version probe |
    | 1-2 | Hist/GetI | get-info |
    | 3-4 | Hist/ValCl ×2 | auth (handle = ValCl context key GUID) |
    | 5 | Hist/Open2 | 1472-byte v6 buffer (we replicate this) |
    | 6-7 | unknown 105-byte | session setup |
    | 8-9 | unknown 211-byte | first appearance of session GUID 6D332FCD-… (later used as EnsT2 handle) |
    | 10 | Hist/UpdC3 | status update; uses 6D332FCD |
    | 11-16 | unknown 183/185/188/192-byte | more setup |
    | 17 | Hist/RTag2 | uses 6D332FCD |
    | 18 | unknown 184-byte | |
    | 19 | Trx/GetV | transaction service version probe |
    | 20 | unknown 105-byte | |
    | 21 | Retr/GetV | retrieval version probe |
    | 22 | Hist/EnsT2 | CTagMetadata(CM_EVENT); uses 6D332FCD |
    | 23 | Retr/StartEventQuery | succeeds when 22 succeeds |
    | 24 | Retr/GetNextEventQueryResultBuffer | returns row buffer |
    | 25 | Retr/EndEventQuery | terminal |
    | 26 | Hist/Close2 | session close |

    CTagMetadata payload is now byte-for-byte verified. Captured 83-byte CM_EVENT payload from record 22 matches our SDK HistorianAddTagsProtocol.SerializeCmEventCTagMetadata exactly when the captured FILETIME is substituted in (verified via reflection unit dump: 83/83 bytes match). Layout corrections from the wire capture vs. the previously-documented format:

    • Action URI is aa/Hist/EnsT2, NOT aa/Hist/AddT.
    • 7-byte storage block ends with 0x01, not 0x00.
    • Layout is flags(7) + uint(0) + FILETIME(8) + GUID(16) + tail(5), NOT FILETIME + flags + uint(rate) + uint(deadband) + GUID.
    • Common Archestra event type GUID is 5f59ae42-3bb6-4760-91a5-ab0be01f9f02 (NOT …e01f2f27 as previously documented from IL inspection).
    • 5-byte tail 2F 27 01 01 01 (3 unknown bytes + 2 trailing 01s).

    Live event reads still return zero events because:

    • Records 6-9 (which establish the session GUID 6D332FCD-… used by every subsequent call) and records 11-16 (six unknown setup calls) have NOT been decoded yet.
    • Without those calls, our SDK's EnsT2 uses the storage session id from the Open2 response as the handle, but the server expects the session GUID established by records 8-9 — which it never received because we never made those calls. EnsT2 returns false and Retr.GetNextEventQueryResultBuffer returns native code 85.
    • SDK's EnsT2 attempt is wrapped in try/catch and surfaces the return code via HistorianWcfEventOrchestrator.LastAddReturnCode for diagnostics; the chain doesn't throw.

    Concrete remaining work for live event reads:

    • Identify and decode records 6-9 from artifacts/reverse-engineering/instrumented-wcf-writemessage/writemessage-capture-event-latest.ndjson. The action URI of each will be visible as ASCII in the body (e.g. aa/Hist/Foo). For each, decode the request body shape and identify which call returns the session GUID 6D332FCD-… that subsequent calls use as their handle.
    • Implement those calls in the orchestrator before EnsT2.
    • Same for records 11-16 (unknown 183/185/188/192-byte calls).
    • Then re-test: EnsT2 should return true and events should flow.
    • Once events flow, capture the GetNextEventQueryResultBuffer response bytes (would require also instrumenting ReadMessage — symmetric to WriteMessage) and write the event-row parser.

    The IL-rewrite tooling (tools/AVEVA.Historian.ReverseEngineering instrument-wcf-writemessage command) and corresponding LogByteArraySegment helper in CaptureLogger are now in place for any future capture work. Reproduce a fresh capture with:

    dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- instrument-wcf-writemessage
    # Then stage the modified DLL into a current-copy dir alongside
    # AVEVA.Historian.ReverseInstrumentation.dll, set AVEVA_HISTORIAN_RE_CAPTURE,
    # and run the native trace harness with --current-dir <copy> --managed-dll-path <copy>/aahClientManaged.dll
    
  2. Capture a Wcf.GetNextEventQueryResultBuffer.ResultBytes fixture (only possible AFTER the registration step above succeeds and rows actually flow), then write a parser using the same approach as TryParseGetNextQueryResultBufferRows.

  3. Verify RemoteTcpIntegrated and RemoteTcpCertificate against an actual remote Historian.

  4. Verify explicit-creds path with a non-current user account.

  5. Add RetrievalModeQueryType mappings for the modes beyond Full / Interpolated / TimeWeightedAverage / Cyclic.

  6. Decode the trailing ~24 bytes of each row body (vary across rows for the same tag — likely per-sample value/source/state metadata).

Diagnostic helper: EventChainDiagnosticTests.EventOrchestrator_DiagnosticDump_AgainstLocalHistorian calls the orchestrator directly via InternalsVisibleTo and prints LastResultBufferLength + LastErrorBufferDescription. Useful when iterating on the registration step. Run with:

$env:HISTORIAN_HOST = 'localhost'
dotnet test .\Histsdk.slnx --no-build --logger "console;verbosity=detailed" --filter "FullyQualifiedName~EventOrchestrator_DiagnosticDump"

SQL ground-truth check for events (verified against the live Historian on 2026-05-04):

sqlcmd -E -S . -d Runtime -W -Q "SELECT TOP 10 EventTimeUtc, Type, Source_Object FROM Events WHERE EventTimeUtc > DATEADD(DAY, -7, GETUTCDATE()) ORDER BY EventTimeUtc"

Returns event rows like System.OffScan, System.Stop, Alarm.Set that the managed ReadEventsAsync should also surface once the registration step is wired.

If runtime confirmation is later required (e.g., to capture the actual NRE stack frame), pick exactly one of these escalation paths and do not retry plain elevated Frida:

  1. SYSTEM-token injection (requires explicit user consent — spawns a SYSTEM shell). Whether or not this clears ProcessNotRespondingError is uncertain (the bottleneck looks like the agent RPC handshake, not the caller token). Cheapest test, but ETW already answered the immediate question.

    PsExec64.exe -accepteula -s -i frida -p <aahClientAccessPoint-pid> -l .\scripts\frida\aahclientaccesspoint-valcl-context.js -o .\artifacts\reverse-engineering\valcl-context-system.ndjson
    
  2. Signed Detours/EasyHook DLL. Slowest path, but does not depend on Frida's bootstrap handshake completing.

  3. WinDbg non-invasive attach (windbg -p <pid> -pv). Useful for one-shot stack/handle inspection rather than live hook coverage, and it confirms whether the process responds to a debugger at all.

To rerun the ETW capture (no service touch, only ETW providers and the existing harness/probe binaries):

$artifacts = "$PWD\artifacts\reverse-engineering"; New-Item -ItemType Directory -Force -Path $artifacts | Out-Null
$stamp = Get-Date -Format "yyyyMMdd-HHmmss"
$nativeEtl  = Join-Path $artifacts "etw-sspi-nativeread-$stamp.etl"
$managedEtl = Join-Path $artifacts "etw-sspi-managedvalcl-$stamp.etl"
$providers = @(
    '{199FE037-2B82-40A9-82AC-E1D46C792B99}',  # LsaSrv
    '{CC85922F-DB41-11D2-9244-006008269001}',  # LSA
    '{AC43300D-5FCC-4800-8E99-1BD3F85F0320}',  # Microsoft-Windows-NTLM
    '{C92CF544-91B3-4DC0-8E11-C580339A0BF8}',  # NTLM Security Protocol
    '{5BBB6C18-AA45-49B1-A15F-085F7ED0AA90}'   # Security: NTLM Authentication
)
function Start-Sspi($name, $etl) {
    logman create trace $name -ow -o $etl -p $providers[0] 0xFFFFFFFFFFFFFFFF 0xFF -ets | Out-Null
    foreach ($p in $providers[1..($providers.Count-1)]) { logman update trace $name -p $p 0xFFFFFFFFFFFFFFFF 0xFF -ets | Out-Null }
}
Start-Sspi 'histsdk-sspi-nativeread' $nativeEtl
.\tools\AVEVA.Historian.NativeTraceHarness\bin\Debug\net481\AVEVA.Historian.NativeTraceHarness.exe --scenario history --server-name localhost --tcp-port 32568 --tag OtOpcUaParityTest_001.Counter --lookback-minutes 1440 --max-rows 1 --connection-wait-seconds 15 | Out-Null
logman stop 'histsdk-sspi-nativeread' -ets | Out-Null
Start-Sspi 'histsdk-sspi-managedvalcl' $managedEtl
.\tools\AVEVA.Historian.NetFxWcfProbe\bin\Debug\net481\AVEVA.Historian.NetFxWcfProbe.exe --endpoint "net.pipe://localhost/Hist" | Out-Null
logman stop 'histsdk-sspi-managedvalcl' -ets | Out-Null

Decode with Get-WinEvent -Path <etl> -Oldest, then group by ProcessId. Only aahClientAccessPoint's event count + Id list belongs in committed docs; ETL files contain SSPI tokens and identity metadata and stay under artifacts\reverse-engineering\ (gitignored).

After the chosen path produces server-helper telemetry:

  1. Compare native vs managed runs for whether first-round setup helper 0x0050FFC0 runs, whether lookup helper 0x00517AB0 returns a context, whether AcquireCredentialsHandleW succeeds, whether AcceptSecurityContext is reached, and whether failures occur before or after native context-map insertion.

  2. Update:

    • docs\reverse-engineering\implementation-status.md
    • docs\reverse-engineering\openconnection3-correlation-latest.json
  3. Re-run:

    dotnet test .\Histsdk.slnx --no-build --logger "console;verbosity=minimal"
    
  4. Run a targeted secret scan after touching auth/capture docs:

    rg -n "(?i)(password|credential|secret|token|<known-sensitive-host>|<known-sensitive-machine>|<known-sensitive-user>)" docs\reverse-engineering scripts tools
    

Expected scan output includes generic words like token, credential, and environment variable names. It must not include real passwords, unsanitized server names, or customer tag data.

Primary Reference Docs

Read these first when resuming:

  • docs\reverse-engineering\implementation-status.md
  • docs\reverse-engineering\wcf-contract-evidence.md
  • docs\reverse-engineering\managed-wrapper-findings.md
  • docs\reverse-engineering\openconnection3-correlation-latest.json
  • docs\reverse-engineering\query-handle-correlation-latest.json
  • docs\reverse-engineering\cclientcommon-startquery-correlation-latest.json
  • docs\reverse-engineering\capture-workflow.md

Event-flow prereqs (2026-05-04)

HistorianWcfEventOrchestrator.AddCmEventTagViaAddT now replays the prerequisite calls captured via instrument-wcf-writemessage against the live native event read. Before invoking EnsT2(CM_EVENT), the orchestrator now calls:

  1. UpdC3 (UpdateClientStatus3) — handle = storage session id (string GUID), clientStatusSize=81, clientStatus = 02 01 00…00 1E 00 00 00 (81-byte blob: 2 leading bytes + 76 zero bytes + uint32 0x1E trailer).
  2. RTag2 (RegisterTags2) — handle = same GUID, ElementCount=1, pInBuff = 50 67 02 00 01 00 00 00 + 16-byte CmEventTagId (353b8145-5df0-4d46-a253-871aef49b321) = 24 bytes total.
  3. EnsT2 (EnsureTags2) — unchanged byte-for-byte CTagMetadata payload.
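As an illustration of step 2, the 24-byte RTag2 `pInBuff` can be assembled from the 8-byte prefix and the tag GUID quoted above. This is a minimal sketch, not the SDK serializer: the function name `build_rtag2_in_buff` is hypothetical, and it assumes the GUID is written in .NET `Guid.ToByteArray()` order (first three fields little-endian), which the text above does not state explicitly.

```python
import uuid

# 8-byte prefix quoted in the RTag2 step above.
RTAG2_PREFIX = bytes.fromhex("5067020001000000")

# CmEventTagId quoted in the RTag2 step above.
CM_EVENT_TAG_ID = uuid.UUID("353b8145-5df0-4d46-a253-871aef49b321")


def build_rtag2_in_buff(tag_id: uuid.UUID = CM_EVENT_TAG_ID) -> bytes:
    """Build the RegisterTags2 input buffer (hypothetical helper).

    ASSUMPTION: the GUID uses .NET Guid.ToByteArray() byte order,
    which is what uuid.UUID.bytes_le produces.
    """
    return RTAG2_PREFIX + tag_id.bytes_le


if __name__ == "__main__":
    buf = build_rtag2_in_buff()
    print(len(buf))   # 24
    print(buf.hex(" "))
```

If the byte-order assumption is wrong, the first eight GUID bytes would differ from the capture, so diffing this output against record 17 settles it.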

Live diagnostic against localhost:

| Stage | Result |
| --- | --- |
| UpdC3 | success (return = 0) |
| RTag2 | success (return = 0) |
| EnsT2 | returns false (likely benign; CM_EVENT exists with same metadata) |
| StartEventQuery | success, query handle returned |
| GetNextEventQueryResultBuffer | empty result + 5-byte error 04 55 00 00 00 (type=4 code=85) |

The Stat-service queries the native client also issues (Stat/GetV, Stat/GETHI for HistorianVersion, Stat/GetSystemParameter for AllowOriginals, HistorianPartner, HistorianVersion, MaxCyclicStorageTimeout, RealTimeWindow, FutureTimeThreshold, AllowRenameTags) appear informational and are skipped.

Decoded native aa/Retr/StartEventQuery pRequestBuff (63 bytes captured vs 65 bytes our SDK sends) — diff narrowed to the trailing 4 bytes of HistorianEventQueryProtocol.CreateNativeEmptyFilterAttempt. Reverting the trailer to a ushort 0 yielded code 46 (validation reject) instead of code 85, so the uint trailer is structurally correct against this server even though the captured native bytes appear to use 2 bytes there. Either the server tolerates both shapes or the metadata-namespace encoding is off; resolution requires a ReadMessage capture.

24,773 events exist in the last 7 days per SELECT COUNT(*) FROM Events WHERE EventTimeUtc >= DATEADD(DAY, -7, GETUTCDATE()), so code 85 is not "no events".

ReadMessage instrumentation + decoded event responses (2026-05-04)

instrument-wcf-readmessage CLI command added to tools/AVEVA.Historian.ReverseEngineering. Mirror of instrument-wcf-writemessage; targets aahMDASEncoder.ClientMessageEncoder.ReadMessage(ArraySegment<byte>, BufferManager, string) (token 0x06005E63). Injects at method entry (IL_0000) capturing arg.1 (the incoming ArraySegment<byte>) so both the compressed (post-DecompressBuffer V_1) and uncompressed (direct arg.1 at IL_009C) paths are recorded.

Capture obtained (28 records; artifacts/reverse-engineering/instrumented-wcf-readmessage/readmessage-capture-event-latest.ndjson, gitignored). Key responses:

| Record | Response | Length | Decoded |
| --- | --- | --- | --- |
| 5 | Open2Response | 1586 | encoded user identity + session state; must not commit |
| 18 | StartEventQueryResponse | 299 | responseSize=1, pResponseBuff=nil, queryHandle=0x3E (=62), errSize=1, err=nil |
| 23 | RTag2Response | 208 | outBuff 24 bytes (echoes input shape), errorBuffer=nil |
| 24 | GetNextEventQueryResultBufferResponse | 2783 | resultSize=2506, pResultBuff starts 09 00 02 00 00 00 1E 00 00 00 07 00…Alarm.Set… |
| 25 | EnsT2Response | 229 | EnsT2Result=true, OutBuff 45 bytes echoing CmEventTagId |

Critical finding: native EnsT2 returns true with a 45-byte OutBuff that echoes CmEventTagId. Our SDK's EnsT2 returns false. Since the request bytes are byte-identical (verified prior pass), the difference is server-side session state. Between UpdC3 (record 10) and RTag2 (record 17) the native flow issues 7 Stat/GetSystemParameter queries (AllowOriginals, HistorianPartner, HistorianVersion, MaxCyclicStorageTimeout, RealTimeWindow, FutureTimeThreshold, AllowRenameTags) plus 2 Stat/GETHI for HistorianVersion. These were previously assumed informational; the EnsT2 false vs true differential suggests at least one of them primes the session for tag operations.

Event-row wire shape (from record 24 pResultBuff):

UInt16 version = 9
UInt32 rowCount
N rows, each:
  UInt32 rowMarker = 0x1E
  UInt16 fieldCount = 7
  Int64  filetimeUtc
  UInt16[fieldCount-1] fieldOffsets   // running offsets into the trailing string blob
  variable-length UTF-16 strings (Alarm.Set, …)

The 2506-byte fixture contains exactly 2 event rows (matches --max-rows 2 passed to the harness). Once the EnsT2-priming gap is closed, this layout plugs directly into HistorianWcfEventOrchestrator.RunEventQuery.

Reproduce with:

$captureDir = "artifacts\reverse-engineering\instrumented-wcf-readmessage"
dotnet run --no-build --project tools\AVEVA.Historian.ReverseEngineering -- `
  instrument-wcf-readmessage current\aahClientManaged.dll "$captureDir\aahClientManaged.dll"
Copy-Item -Force "$captureDir\aahClientManaged.dll" "$captureDir\current-copy\aahClientManaged.dll"
$env:AVEVA_HISTORIAN_RE_CAPTURE = (Resolve-Path $captureDir).Path + "\readmessage-capture-event-latest.ndjson"
dotnet run --no-build --project tools\AVEVA.Historian.NativeTraceHarness -- `
  --scenario event --tag CM_EVENT --lookback-minutes 1440 --max-rows 2 `
  --current-dir (Resolve-Path "$captureDir\current-copy").Path `
  --managed-dll-path (Resolve-Path "$captureDir\current-copy\aahClientManaged.dll").Path
python scripts\decode-readmessage-capture.py

Stat-priming + event-row parser landed (2026-05-04)

HistorianWcfEventOrchestrator.AddCmEventTagViaAddT now replays the Stat-service priming sequence captured from native:

  1. Stat/GetV ×2 (records 6, 7)
  2. Stat/GETHI(HistorianVersion) ×2 (records 8, 9) — builds the 39-byte pRequestBuff via BuildGetHistorianInfoRequest("HistorianVersion")
  3. Hist/UpdC3 (record 10)
  4. Stat/GetSystemParameter ×6 for AllowOriginals, HistorianPartner, HistorianVersion, MaxCyclicStorageTimeout, RealTimeWindow, FutureTimeThreshold (records 11-16)
  5. Hist/RTag2(CmEventTagId) (record 17)
  6. Stat/GetSystemParameter("AllowRenameTags") (record 18)
  7. Stat/GetV (record 20)
  8. Hist/EnsT2(CTagMetadata) (record 22)

Each Stat call is wrapped in best-effort TryRun(...) so individual rejections don't abort the chain. Also fixed:

  • IStatusServiceContract2.GetHistorianInfo parameter naming — [MessageParameter(Name = "pRequestBuff")] and Name = "pResponseBuff" attributes added to match the wire (default would have been <requestBuffer> and the server would have ignored the body).
  • Event-flow ConnectionMode switched from 0x501 to 0x402 — decoded from the native Open2 request bytes (writemessage record 5 offset 0x26). The previous 0x501 was an unverified guess; native uses the same 0x402 read-only mode for both data and event scenarios.

Diagnostic against localhost:

| Stage | Result |
| --- | --- |
| UpdC3 | success (return = 0) |
| RTag2 | success (return = 0) |
| EnsT2 | still returns false |
| GetNextEventQueryResultBuffer | type=4 code=85 |

EnsT2 still doesn't match native (which returns true with a 45-byte OutBuff). Hypothesis under investigation: the StorageSessionId extracted at Open2 response offset 5-20 is the v3 layout; the v6 response (1345 bytes payload, contains user identity) likely has the session GUID at a different offset. Tested bytes 1-16 — UpdC3+RTag2 then both fail (return 1), so 5-20 is the acceptable handle for those ops. The right offset for EnsT2 may be elsewhere in the response. The Open2 v6 response decode requires bytes-level inspection of identity-bearing data (kept under artifacts/, never committed) — see record 5 of instrumented-wcf-readmessage/readmessage-capture-event-latest.ndjson.

Event-row parser

CORRECTED 2026-06-23 (merged 6faf8a5). The skeleton below mis-read the 0x1E as a per-row marker. Verifying the parser against the provided stock client's real 50-event buffer proved 0x1E is a one-time buffer-level header field, and the rows are markerless — so the original parser silently returned only the first row of any multi-row buffer (on WCF too). The corrected layout and behaviour are below.

Wcf/HistorianEventRowProtocol.Parse(ReadOnlySpan<byte>) parses the row buffer (container version 9 for WCF, 11 for 2023 R2 gRPC — both accepted):

UInt16 version = 9 (WCF) | 11 (gRPC)
UInt32 rowCount
UInt32 headerField = 0x1E      // ONE buffer-level field, NOT a per-row marker
N rows, each (MARKERLESS):
  UInt16 rowFormat = 7
  Int64  filetimeUtc           (event time)
  UInt16 × 8 fieldOffsets       (opaque — purpose not fully decoded)
  UInt16 propertyCount
  Property bag (propertyCount × name=value pairs; first field is the event type)

The parser reads the 10-byte buffer header (skipping the 0x1E field once), then walks each markerless row by length: rowFormat(2) + filetime(8) + 8×UInt16 slots + compact-ASCII type + propertyCount + propertyCount × (name + value). Value encoding is implemented (compact ASCII 09 LEN 00 …, Boolean 0x02, GUID 0x10, FILETIME 0x18, Int32 0x31, UTF-16 0x43; unknown markers preserve raw bytes). Verified against the provided client's real buffer: Parse_RealStockClientCapture_DecodesAllEvents decodes all 50 events (25 Alarm.Set + 25 Alarm.Clear) to end-of-buffer (gated on HISTORIAN_EVENT_CAPTURE_NDJSON), plus a synthetic v11 golden test.
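The 10-byte buffer header described above can be read as follows. This is a sketch of the header step only, not a port of `HistorianEventRowProtocol.Parse`; the function name is hypothetical, and little-endian field order is inferred from the captured prefix `09 00 02 00 00 00 1E 00 00 00` quoted earlier in this document.

```python
import struct

ACCEPTED_VERSIONS = (9, 11)  # 9 = WCF, 11 = 2023 R2 gRPC


def parse_event_buffer_header(buf: bytes) -> tuple[int, int, int, int]:
    """Return (version, row_count, header_field, first_row_offset).

    Hypothetical helper. Layout: UInt16 version + UInt32 rowCount +
    UInt32 headerField, all little-endian, 10 bytes in total.
    """
    if len(buf) < 10:
        raise ValueError("event buffer shorter than the 10-byte header")
    version, row_count, header_field = struct.unpack_from("<HII", buf, 0)
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unexpected container version {version}")
    # The header field is consumed once here; rows that follow are markerless.
    return version, row_count, header_field, 10


if __name__ == "__main__":
    prefix = bytes.fromhex("09 00 02 00 00 00 1E 00 00 00")
    print(parse_event_buffer_header(prefix))  # (9, 2, 30, 10)
```

Returning the offset of the first row makes it explicit that the `0x1E` field is skipped exactly once, which is the behaviour the corrected parser depends on.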

5 unit tests in HistorianEventRowProtocolTests.cs cover empty buffer, zero-row, wrong-version, two-row synthetic, and missing-marker. Test count went from 73 to 78. The orchestrator's RunEventQuery now calls the parser on each non-empty resultBuffer, so events will flow with timestamps + types once the EnsT2-priming gap is closed.

Open2 v6 response decoded + live events working (2026-05-04)

A combined Read+Write capture under artifacts/reverse-engineering/instrumented-wcf-both/ (gitignored) let us correlate the session GUID used as handle in the UpdC3/RTag2/EnsT2 REQUESTS with its location in the Open2 RESPONSE.

Open2Response decoded (~1586 bytes WCF body):

Open2Response wraps three byte[] outputs:
  inParameters  (echoed ref param — contains user identity; never commit)
  outParameters (the session blob)
  err           (empty on success)

outParameters payload (42 bytes):

byte 0     protocol version (server returns 3 even when we send Open3 v6 request)
bytes 1-4  UInt32 (purpose unknown — possibly a connect sequence/checksum)
bytes 5-20 16-byte session GUID — used as `handle` for UpdC3/RTag2/EnsT2/Close2
bytes 21-28 Int64 FILETIME (connect time)
bytes 29-36 Int64 FILETIME (server time)
bytes 37-41 5 trailing bytes (status flags?)

This matches HistorianNativeOpen3Output exactly — our existing offset 5-20 GUID extraction was always correct. The earlier hypothesis about a "v6 response layout" was wrong; the server returns the v3 layout regardless of the request version.
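The 42-byte `outParameters` layout can be sketched as a small decoder. This is illustrative only and is not the `HistorianNativeOpen3Output` implementation: the function and key names are hypothetical, little-endian integers and .NET GUID byte order are assumptions, and the FILETIME meanings are taken from the field table above.

```python
import struct
import uuid
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_open2_out_parameters(buf: bytes) -> dict:
    """Decode the 42-byte session blob (hypothetical helper)."""
    if len(buf) != 42:
        raise ValueError(f"expected 42 bytes, got {len(buf)}")
    (unknown,) = struct.unpack_from("<I", buf, 1)
    connect_ft, server_ft = struct.unpack_from("<qq", buf, 21)
    return {
        "version": buf[0],
        "unknown": unknown,  # bytes 1-4, purpose unknown
        # ASSUMPTION: .NET Guid.ToByteArray() order for bytes 5-20.
        "session_guid": uuid.UUID(bytes_le=bytes(buf[5:21])),
        "connect_time": filetime_to_datetime(connect_ft),
        "server_time": filetime_to_datetime(server_ft),
        "tail": bytes(buf[37:42]),  # 5 trailing bytes, meaning unconfirmed
    }
```

Never print or commit the decoded values from a real capture; the sketch is meant for synthetic buffers or local diagnostics only.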

Real blocker resolved. Native does three cross-service version probes between RTag2 and EnsT2 — Trx/GetV (record 19), Stat/GetV (record 20), Retr/GetV (record 21) — that register the client with each service's session table. Without them the server rejects EnsT2 (returns false) and GetNextEventQueryResultBuffer reports type=4 code=85.

HistorianWcfEventOrchestrator.AddCmEventTagViaAddT now opens ITransactionServiceContract and IRetrievalServiceContract4 channels inside the setup callback (in addition to the existing IStatusServiceContract2 channel) and calls GetInterfaceVersion on all three between RTag2 and EnsT2.

Final live-read diagnostic (localhost):

| Stage | Result |
| --- | --- |
| UpdC3 | success (return = 0) |
| RTag2 | success (return = 0) |
| Trx/GetV, Stat/GetV, Retr/GetV | success |
| EnsT2 | returns false (benign; "CM_EVENT exists with same metadata") |
| StartEventQuery | success |
| GetNextEventQueryResultBuffer | returns event-row buffer |
| Parser | Events observed: 1 |

LastErrorBufferDescription: type=4 code=85 reaches the orchestrator only on the terminal (no-more-data) call, after the first batch returned an event. The existing soft-terminal handling (if errorBuffer[0] == 4 return) is correct.
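The 5-byte error buffers quoted in this document (`04 55 00 00 00` for type=4 code=85, `04 1E 00 00 00` for "no more data") can be decoded with a sketch like the one below. The helper names are hypothetical, and reading bytes 1-4 as a little-endian UInt32 code is an assumption that happens to agree with both quoted examples.

```python
import struct
from typing import Optional, Tuple


def decode_error_buffer(buf: bytes) -> Optional[Tuple[int, int]]:
    """Return (type, code) for a 5-byte error buffer, else None.

    ASSUMPTION: byte 0 is the error type and bytes 1-4 are a
    little-endian UInt32 code.
    """
    if len(buf) != 5:
        return None
    (code,) = struct.unpack_from("<I", buf, 1)
    return buf[0], code


def is_soft_terminal(buf: bytes) -> bool:
    """Mirror the orchestrator's rule: a type=4 buffer ends the loop quietly."""
    decoded = decode_error_buffer(buf)
    return decoded is not None and decoded[0] == 4


if __name__ == "__main__":
    print(decode_error_buffer(bytes.fromhex("0455000000")))  # (4, 85)
    print(is_soft_terminal(bytes.fromhex("041E000000")))     # True
```

Keeping the decoded code available, rather than only the boolean, matches the document's practice of surfacing the full code for diagnostics while still ending the read loop without throwing.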

The full managed event-read chain is reproducible end-to-end from a pure .NET 10 SDK: GetV → ValCl × N → Open2 → UpdC3 → 6× GetSystemParameter → RTag2 → GetSystemParameter(AllowRenameTags) → Trx/GetV → Stat/GetV → Retr/GetV → EnsT2 → StartEventQuery → GetNextEventQueryResultBuffer loop → EndEventQuery → Close2.

Property-bag value-type parser landed (2026-05-04)

Decoded the row property-bag wire format. Unified value layout:

typeMarker  (UInt8)
length      (UInt8 — bytes of value following the status byte)
status      (UInt8 — observed 0x00 in successful captures)
value       (length × byte, encoding determined by typeMarker)

Typemarker dispatch:

| Marker | Type | Value bytes |
| --- | --- | --- |
| 0x02 | Boolean | 1 byte (0/1) |
| 0x10 | GUID | 16 bytes (.NET Guid byte order) |
| 0x18 | FILETIME UTC | Int64 LE |
| 0x31 | Int32 | 4 bytes LE |
| 0x43 | UTF-16 string | UInt16 charCount + (charCount × 2) UTF-16 LE bytes |

Unknown markers preserve the raw length value bytes as a byte[] in the property dictionary.
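A minimal sketch of the dispatch described above, assuming the unified `typeMarker, length, status, value` layout holds for every fixed-size type. The function name `decode_value` is hypothetical. The UTF-16 marker (0x43) is deliberately left to the raw-bytes fallback here, because this section does not spell out how the length byte relates to the embedded charCount; the compact ASCII marker (0x09) comes from the "Event-row parser" section.

```python
import struct
import uuid
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_value(buf: bytes, offset: int = 0):
    """Decode one typed value; return (value, next_offset).

    Hypothetical helper. Unknown or unhandled markers keep their raw bytes.
    """
    if len(buf) < offset + 3:
        raise ValueError("truncated value header")
    marker, length = buf[offset], buf[offset + 1]
    # buf[offset + 2] is the status byte (0x00 in successful captures).
    start = offset + 3
    raw = bytes(buf[start:start + length])
    if len(raw) != length:
        raise ValueError("truncated value body")

    if marker == 0x02 and length == 1:
        value = raw[0] != 0                      # Boolean
    elif marker == 0x09:
        value = raw.decode("ascii")              # compact ASCII string
    elif marker == 0x10 and length == 16:
        value = uuid.UUID(bytes_le=raw)          # GUID, .NET byte order
    elif marker == 0x18 and length == 8:
        (ticks,) = struct.unpack("<q", raw)      # FILETIME, 100 ns ticks
        value = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    elif marker == 0x31 and length == 4:
        (value,) = struct.unpack("<i", raw)      # Int32
    else:
        value = raw                              # fallback: raw bytes
    return value, start + length
```

Returning the next offset lets a caller walk a property bag as alternating name and value reads without tracking lengths separately.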

Each row layout (corrected 2026-06-23 — see the "Event-row parser" note above; the 0x1E is a one-time buffer header field, NOT a per-row marker, and rows are markerless):

buffer header: UInt16 version (9|11) + UInt32 rowCount + UInt32 headerField (0x1E)
each row (markerless):
UInt16  rowFormat = 7
Int64   eventTimeUtcFiletime
UInt16 × 8                          // purpose unclear
compact ASCII string                // event type ("Alarm.Set", …)
UInt16  propertyCount
propertyCount × Property {
  compact ASCII string              // property name
  Value (per the typed format above)
}

HistorianEventRowProtocol.Parse populates HistorianEvent fields by mapping known property names: alarm_id → Id, receivedtime → ReceivedTimeUtc, source_processvariable/source_object → SourceName, namespace/provider_system → Namespace, revisionversion → RevisionVersion. All decoded properties (typed, not raw bytes) are also exposed via the Properties dictionary.

Live verification (localhost): Events observed: 1, Properties.Count: 31, Has alarm_id: True, EventTimeUtc and ReceivedTimeUtc decoded as plausible timestamps.

Tests: 78 → 80. Added Parse_RowWithKnownProperties_PopulatesEventFields (verifies all known-name → HistorianEvent-field mappings using synthetic placeholder values) and Parse_UnknownTypeMarker_KeepsRawBytesInPropertyBag (verifies the unknown-type fallback).

The fully managed event read is now end-to-end: chain auth → Stat priming → EnsT2 → StartEventQuery → row buffer → typed event with property dictionary.

Safety Notes

  • Keep raw captures and identity-bearing logs under artifacts\reverse-engineering.
  • Do not commit credentials, hostnames, user names, customer tags, or raw packet captures.
  • Prefer sanitized JSON and Markdown summaries under docs\reverse-engineering.
  • Production code under src\AVEVA.Historian.Client must remain pure managed .NET 10.
  • Reverse-engineering harnesses may reference native AVEVA binaries only for analysis and parity comparison.