# HCAL modern-.NET client — implementation roadmap

Ordered, actionable plan to grow **histsdk** from "reads + basic config" into a broad
HCAL replacement, built on the **2023 R2 gRPC transport**. Derived from
[`hcal-capability-matrix.md`](hcal-capability-matrix.md); event details in
[`histevents.md`](histevents.md).

> Move to the repo's `docs/plans/` when execution starts. Each work item lands as: a
> protocol serializer/parser + golden-byte unit test + an env-gated live integration
> test against the local Historian.
## Status: roadmap exhausted (2026-06-22)

The reachable surface is **complete**. M0/M1/M2/M3 are done and live-verified; M4 R4.1 (store-forward
outbox), R4.3 (measured idle-state SF status), and R4.4 (redundancy) are shipped and **merged to
`main`**; and the follow-on `grpc-tooling-completion.md` plan is fully executed (writes, ReadEvents,
GetConnectionStatus, the extended-property read-parser fix — all done or bounded-out). The transport
matrix in the top-level `README.md` is the authoritative per-operation status.

**Nothing left is a pure code task** — every remaining item is gated:

- **Infra-gated** (needs a different live server): gRPC event *row* retrieval (`StartEventQuery`
  succeeds but `GetNextEventQueryResultBuffer` long-polls — needs an **event-bearing** 2023 R2 server);
  R4.3 active-SF *magnitude* (needs an **SF-active** server / the D2 storage-engine console handle).
- **Capture-gated** ("capture before guessing wire bytes"): `SendEvent` over gRPC (no distinct RPC;
  framing uncaptured).
- **Architecturally walled** (no client-side fix): `ExecuteSqlCommand` over gRPC (server-side
  `CSrvDbConnection` fault; a `RegisterTags` prime does not clear it); R4.2 revision *edits*
  (storage-engine-pipe-only on both transports).
- **Out of scope until a demand signal**: `ReadBlocks` (`StartBlockRetrievalQuery`, never captured);
  `DeleteTagExtendedProperties` (server-blocked on WCF).
## Progress (updated 2026-06-22)

- ✅ **R0.6 version gate** — `HistorianServerVersionGate` + `HistorianClientOptions.VerifyServerInterfaceVersion`;
  fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are
  evidence-based (Hist=11/12, Retr=4, Trx=2; Status reachability-only), captured from the
  live server. History 12 (2023 R2 gRPC) accepted alongside 11 (buffer-compatible).
- ✅ **CW-1 capture pipeline** — `ProtocolCaptureSanitizer` + `ProtocolFixtureWriter` +
  `capture-tag-info` CLI command; produces sanitized `fixtures/protocol/<op>/` golden files.
  11 unit tests. First fixture: `get-tag-info/analog-*.json`.
- ✅ **gRPC auth handshake (read chain)** — LIVE-VERIFIED 2026-06-21 against a real 2023 R2
  server: `ReadRawAsync` over `RemoteGrpc` returns rows. Token loop routes to
  `StorageService.ValidateClientCredential`. Shared handshake extracted to
  `Grpc/HistorianGrpcHandshake` for reuse by the status/browse/metadata paths.
- ✅ **R0.4 Probe over gRPC** — `Grpc/HistorianGrpcProbe` (History/Retrieval/Status
  `GetInterfaceVersion`); `ProbeAsync` routes over gRPC when `Transport==RemoteGrpc`.
  **LIVE-VERIFIED 2026-06-21** (no credentials required — runs before the auth loop).
- ✅ **R0.3 System parameter over gRPC** — `Grpc/HistorianGrpcStatusClient.GetSystemParameterAsync`
  (`StatusService.GetSystemParameter`); routed in the dialect. Built + unit-tested + **LIVE-VERIFIED
  2026-06-21** against a real 2023 R2 server (returned `HistorianVersion`). Code path is the proven
  handshake + a single string-in/string-out RPC.
- ✅ **R0.2 Tag metadata over gRPC** — `Grpc/HistorianGrpcTagClient.GetTagMetadataAsync`
  (`RetrievalService.GetTagInfosFromName`, the plural **string-handle** op). `GetTagMetadataAsync`
  routes over gRPC when `Transport==RemoteGrpc`. Request `btTagNames` = `uint count + per-name(uint
  charCount + UTF-16LE)` (golden-byte unit-tested); response `btTagInfos` = `uint count + CTagMetadata`
  records (reuses `ParseGetTagInfoResponse`); string handle = uppercase Open2 storage GUID. The 2020
  WCF string-handle wall does **not** apply on the gRPC front door (as predicted). **LIVE-VERIFIED
  2026-06-21** — `GetTagMetadataAsync` returned the requested tag + a valid data type.
- ✅ **R0.1 Browse over gRPC** — DONE, **LIVE-VERIFIED 2026-06-21**.
  `HistorianClient.BrowseTagNamesAsync` routes over gRPC via
  `Grpc/HistorianGrpcTagClient.BrowseTagNamesAsync`: StartTagQuery(**OData** filter) → paged
  **QueryTag** (`btRequest` = `u16 0x6752 + u16 1 + u16 queryType + u32 startIndex + u32 count`) →
  EndTagQuery; response = `u32 count + per-name(u32 charCount + UTF-16LE) + trailer`. The SDK glob
  filter is translated by `GlobToODataFilter` (`Pre*`→`startswith`, `*suf`→`endswith`, `*mid*`→
  `contains`, exact→`eq`). The QueryTag packet-id `0x6752` was recovered from a `.rdata`
  packet-descriptor table (`{0x6751,1}`=StartTagQuery, `{0x6752,1}`=QueryTag) — no Ghidra needed.
  Golden-byte + glob unit tests + gated live test. Full finding:
  `docs/reverse-engineering/grpc-tag-query-odata.md`.
> ✅ **Milestone 0 (gRPC parity) is COMPLETE** — probe, system-param, metadata, and browse all run
> over `RemoteGrpc` and are live-verified against a real 2023 R2 server, alongside the read chain.

> ℹ️ **Auth note (2026-06-21, resolved):** an apparent NTLM round-1 `SEC_E_LOGON_DENIED` blocker
> turned out to be a **test-harness credential-parsing bug**, not a server/account/SDK issue — the
> gitignored creds file stores **quoted** values (`"nam\user"`, `"pass"`), and the env-setup must
> **strip surrounding quotes** before exporting `HISTORIAN_USER`/`HISTORIAN_PASSWORD`. With quotes
> stripped, the domain account authenticates and the full read + system-param + probe chain passes
> live. The round-failure diagnostic added during the hunt is kept
> (`HistorianNativeHandshake.DescribeError` decodes the native error + hex/ASCII preview).

> ⚠️ **Live-verification constraint (local box only; the 2026-06-21 live runs above used a real
> 2023 R2 server):** the local Historian is **2020** (WCF, port 32568) — the
> 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1–R0.4) can be built and
> golden-byte/unit-tested here but **cannot be live-verified** without an actual 2023 R2 server.
> Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.

> 🔬 **M1a re-classification (2026-06-20).** Two "trivial" items were live-probed against the
> 2020 WCF server and found **not deliverable here**, both for evidence-backed reasons:
> - **R1.3 `GetServerTimeZoneAsync`** — `Status.GetSystemTimeZoneName` is a client-side *stub*
>   on 2020 (rc=0, empty value), same family as `GetServerTime`. gRPC/2023R2-only.
> - **R1.1 `ExecuteSqlCommandAsync`** — `ExeC` returns native error 51 (InvalidParameter);
>   the contract-3 string-handle ops require an unmapped native session/filter registration
>   step (the `StartTagQuery` wall).
>
> Takeaway: the M1a "cheap surface" is *cheap only on the 2023 R2 gRPC front door*. On 2020 WCF
> the boundary is the **handle type** (see the string-handle wall note under §1b and
> `docs/reverse-engineering/wcf-string-handle-wall.md`): **`uint`-handle ops work, `string`-handle
> ops are blocked.** GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed,
> reachable). The reachable **`uint`-handle** items are now **DONE**: ~~R1.8/R1.9 StartQuery
> summary/state modes~~ (resolved = existing `ReadAggregateAsync`) and ~~R1.7 event filters~~
> (✅ 2026-06-20 — `ReadEventsAsync(…, HistorianEventFilter)`, live-honored). M2 event send is
> also done (✅ WCF `AddS2`). **R1.2 `GetRuntimeParameterAsync` is also done** (✅ 2026-06-20,
> `aa/Stat/GETRP`, live-verified) — notably a *string-handle* op that punches through the wall
> using the Open2 storage-session GUID as an **uppercase** string handle, which proved the
> GETHI/ExeC failures were a handle-*format* issue rather than a missing native registration.
> **Follow-up done:** R1.1 `ExecuteSqlCommandAsync` shipped; R1.5 extended-property read shipped
> (R1.6 collapsed into it — no distinct localized op). **R1.4 `GetHistorianInfo` bounded out on
> 2020 WCF** — GETHI there is a named-value query (only `HistorianVersion`); `EventStorageMode` is
> 2023R2-gRPC-only (see `wcf-historian-info.md`). Net: the **reachable 2020-WCF M1 read surface is
> complete**; what remains is config *writes* (M1c — gated on an explicit user request) and the
> gRPC/2023R2-only items (R1.3 timezone, R1.4 EventStorageMode — need a live 2023 R2 server).
>
> **Update 2026-06-21 (live 2023 R2 gRPC probe — both closed):** **R1.3 SHIPPED on gRPC** —
> `GetServerTimeZoneAsync` returns a real zone ("Eastern Daylight Time") via
> `StatusService.GetSystemTimeZoneName`; non-gRPC path fails closed
> (`ProtocolEvidenceMissingException`). **R1.4 bounded out on gRPC too** — `GetHistorianInfo` is
> named-value-only on the gRPC wire as well, `EventStorageMode` resolves under no name on either
> `GetHistorianInfo` or `GetSystemParameter`, and the 518-byte struct is C++-HCAL-internal (filled
> via native vtable+648, not the gRPC op). So **no gRPC/2023R2-specific reads remain open** — the
> entire M1 read surface (2020 WCF + 2023 R2 gRPC) is now closed.
## Guiding principles

1. **gRPC-first.** New ops go on the `RemoteGrpc` transport (clean protobuf envelope);
   the inner `bytes` blob is the only thing to RE. Keep WCF as the legacy/Windows path.
2. **Two tests per op, always.** A golden-byte test (deterministic, no server) **and** a
   gated live test (`HISTORIAN_GRPC_HOST` / `HISTORIAN_HOST`). No op is "done" without both.
3. **Version-pin, fail closed.** Read server version at connect; gate every byte
   serializer on it; throw `ProtocolEvidenceMissingException` on mismatch — never
   best-effort parse.
4. **Capture once, encode forever.** For CAPTURE-tier items, instrument one native call,
   save a sanitized fixture under `fixtures/protocol/`, then implement against the fixture.
5. **Ship per milestone.** Each milestone is independently releasable.

Effort: **S** ≈ days · **M** ≈ 1 week · **L** ≈ weeks. Estimates are incremental on
histsdk's existing infra (auth chain, transport, frame primitives, test harness).

---
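
Principle 3 (version-pin, fail closed), combined with the supported-version set recorded for R0.6, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SDK's `HistorianServerVersionGate`: the function and exception names are hypothetical stand-ins, and the dictionary keys simply spell out the `Hist`/`Retr`/`Trx`/`Status` abbreviations used in the Progress notes.

```python
# Hypothetical sketch of a fail-closed interface-version gate (not SDK code).
# The supported versions are the evidence-based set from the R0.6 progress note.
SUPPORTED_VERSIONS = {
    "History": {11, 12},  # 12 = 2023 R2 gRPC, buffer-compatible with 11
    "Retrieval": {4},
    "Transaction": {2},
}
REACHABILITY_ONLY = {"Status"}  # must answer, but its version is not pinned


class ProtocolEvidenceMissingError(Exception):
    """Stand-in for the SDK's ProtocolEvidenceMissingException."""


def verify_interface_versions(reported):
    """Raise unless every service reports a version backed by a capture."""
    for service in REACHABILITY_ONLY:
        if service not in reported:
            raise ProtocolEvidenceMissingError(f"{service}: unreachable")
    for service, allowed in SUPPORTED_VERSIONS.items():
        version = reported.get(service)
        if version not in allowed:
            raise ProtocolEvidenceMissingError(
                f"{service}: version {version!r} has no captured evidence"
            )
```

An unknown or missing version raises instead of falling back to a best-effort parse, which is the behaviour the principle asks for.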
## Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)

*Goal: everything already working over WCF also works over `RemoteGrpc`, so the whole
read/browse/status surface is Windows-free and the gRPC stack is the default path.*

| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | `Retrieval.StartTagQuery`/`QueryTag` or `GetTagInfosFromName` | `Grpc/HistorianGrpcReadOrchestrator` (+ new `…GrpcBrowseClient`), `Historian2020ProtocolDialect` | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | `Retrieval.GetTagInfosFromName` | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | `Status.GetSystemParameter`, `Status.GetHistorianConsoleStatus` | new `Grpc/HistorianGrpcStatusClient` | system param + conn status live | S |
| R0.4 | Probe over gRPC | `*.GetInterfaceVersion` | grpc clients | `ProbeAsync` Windows-free | XS |
| R0.5 | **Capture harness for gRPC payloads** | n/a | reuse `instrument-wcf-*` tooling (same byte blobs) + add a `grpc-call-dump` helper | dump any request/response `bytes` to a fixture | S |
| R0.6 | **Version gate** | server version at connect | `HistorianClientOptions`, orchestrators | mismatched version → throws | S |

**Acceptance:** the entire Phase-0 capability set runs end-to-end over `RemoteGrpc`
(incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.

---
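
The R0.1 browse framing recorded under Progress (glob translation, the `QueryTag` request, and the tag-name response) can be sketched in a few lines. This is a language-neutral Python illustration, not the SDK's `GlobToODataFilter` or its serializer. Assumptions: integers are little-endian (consistent with the stated UTF-16LE strings), the OData property name `TagName` is a hypothetical placeholder, and the response trailer is returned undecoded because the roadmap does not describe it.

```python
import struct

QUERY_TAG_PACKET_ID = 0x6752  # from the .rdata packet-descriptor table


def glob_to_odata(pattern, field="TagName"):
    """Translate the four glob shapes listed for GlobToODataFilter.

    `field` is a hypothetical property name; the roadmap does not state it.
    """
    core = pattern.strip("*")
    if "*" in core or not core:
        raise ValueError(f"unsupported glob shape: {pattern!r}")
    literal = "'" + core.replace("'", "''") + "'"  # OData doubles single quotes
    starts, ends = pattern.startswith("*"), pattern.endswith("*")
    if starts and ends:
        return f"contains({field},{literal})"
    if ends:
        return f"startswith({field},{literal})"
    if starts:
        return f"endswith({field},{literal})"
    return f"{field} eq {literal}"


def pack_query_tag_request(query_type, start_index, count):
    """u16 packet id + u16 1 + u16 queryType + u32 startIndex + u32 count."""
    return struct.pack("<HHHII", QUERY_TAG_PACKET_ID, 1, query_type, start_index, count)


def parse_tag_names(payload):
    """Read u32 count + per-name(u32 charCount + UTF-16LE); return (names, trailer)."""
    (count,) = struct.unpack_from("<I", payload, 0)
    offset, names = 4, []
    for _ in range(count):
        (char_count,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        end = offset + 2 * char_count  # charCount is in UTF-16 code units
        if end > len(payload):
            raise ValueError("truncated tag-name record")
        names.append(payload[offset:end].decode("utf-16-le"))
        offset = end
    return names, payload[offset:]
```

Paging would repeat `pack_query_tag_request` with an advancing `start_index` until a page comes back short; that loop is left out because the roadmap does not state the stop condition.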
## Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (M–L total)

*Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.*

### 1a. Trivial (XS–S each, no new payload format)

| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| ~~R1.1~~ | ~~`ExecuteSqlCommandAsync`~~ | `Retrieval.ExecuteSqlCommand` (`ExeC`+`GetR`) | ✅ **DONE (2026-06-20), live-verified.** `ExecuteSqlCommandAsync(sql)` → `HistorianSqlResult` (columns + typed rows). String-handle op via the uppercase storage GUID. Chain: `Retr.GetV` prime → `ExeC(handle, sql, option=0, ref queryHandle)` → `GetR` loop (note: `GetR` returns **false even on success** — the stream is in `pResultBuff` regardless; false = final page). `GetR`'s `pResultBuff` is an **NRBF-serialized `DataTable`** (`SerializationFormat.Xml`: members `XmlSchema` + `XmlDiffGram`). BinaryFormatter is gone from .NET 10, so it's decoded read-only with `System.Formats.Nrbf` + `XDocument` (no BinaryFormatter). Shipped: `HistorianSqlResult`/`HistorianSqlColumn`/`HistorianSqlExecuteOption`, `HistorianSqlResultProtocol`, `HistorianWcfSqlClient`, golden `WcfSqlResultProtocolTests`, gated live tests. See `docs/reverse-engineering/wcf-exec-sql.md`. |
| ~~R1.2~~ | ~~`GetRuntimeParameterAsync`~~ | `Status.GetRuntimeParameter` (`aa/Stat/GETRP`) | ✅ **DONE (2026-06-20), live-verified.** Captured (`scripts/Capture-RuntimeParam.ps1`): GETRP is a **`string`-handle** op (GETHI's shape), but reachable from the managed client using the Open2 storage-session GUID as an **uppercase** string handle (`ToString("D").ToUpperInvariant()`). Returns `HistorianVersion` = `20,0,000,000` live. pRequestBuff = `54 67 01 00` + uint nameCount + per-name(uint charCount + UTF-16); pResponseBuff = version + uint resultCount + CRetVariant(`0x43` VT_BSTR + uint16 len + uint16 charCount + UTF-16). Single string-valued param only (multi-name framing inferred, not captured). Shipped: `HistorianClient.GetRuntimeParameterAsync(name)`; golden `WcfRuntimeParameterProtocolTests`. **Note:** GETRP punching through the string-handle wall with the uppercase storage GUID is a strong lead that GETHI/ExeC may be a handle-*format* issue — see `wcf-string-handle-wall.md` §Update. |
| ~~R1.3~~ | `GetServerTimeZoneAsync` | `Status.GetSystemTimeZoneName` | ✅ **DONE on gRPC (2026-06-21), LIVE-VERIFIED** against the real 2023 R2 server — returns `"Eastern Daylight Time"`. `HistorianClient.GetServerTimeZoneAsync` routes over `RemoteGrpc` (`HistorianGrpcStatusClient.GetSystemTimeZoneNameAsync`, `uiHandle`-in/string-out, no buffer). The 2020 WCF op stays a client-side stub (rc=0, empty), so the non-gRPC path **throws `ProtocolEvidenceMissingException`** (fail-closed) rather than return an empty string. Golden message-shape + non-gRPC guardrail unit tests + gated live test. (2020-only routes — per-block `HistoryBlock.TimeZoneOffset`, SQL via R1.1 — remain DST-specific and are not this op.) |
> ✅ **String-handle "wall" RESOLVED (2026-06-20) — it was a handle-FORMAT bug.** R1.4/R1.5/R1.6
> (and R1.1) take a **`string` GUID handle**; the earlier "code 1/51 blocked" verdict came from
> passing the Open2 storage GUID in .NET's default **lowercase**. Sent **uppercase**
> (`storageSessionId.ToString("D").ToUpperInvariant()`) the same handle works: **GETRP** (R1.2,
> shipped), **GETHI** (R1.4) and **ExeC** (R1.1) are all live-verified reachable, and **R1.5
> `GetTepByNm`** is now **shipped + live-verified** (`GetTagExtendedPropertiesAsync`). **R1.6 has no
> distinct op** (collapses into R1.5). Note: `QTB` (StartTagQuery) does **not** punch through — it
> fails *server-side* (`CMdServer::StartActiveTagnamesQuery` over the `aahMetadataServer` pipe),
> independent of handle format, so the index-based property/query paths stay blocked here. Full
> analysis: `docs/reverse-engineering/wcf-string-handle-wall.md` (RESOLVED banner) and
> `docs/reverse-engineering/wcf-tag-extended-properties.md`.
> R1.8/R1.9 (StartQuery summary/state modes) are `uint`-handle and were already reachable.

### 1b. Bounded (decode one `bytes` payload; S–M each)

| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| ~~R1.4~~ | `GetHistorianInfoAsync` | `Status.GetHistorianInfo` (`GETHI`) | ⛔ **BOUNDED OUT — now confirmed on the 2023 R2 gRPC front door too (2026-06-21, live-probed).** The motivating field `EventStorageMode` is **not on the wire on either transport.** Live gRPC probe against the real 2023 R2 server: `GetHistorianInfo` is a **named-value** query exactly like 2020 WCF — only `HistorianVersion` resolves (→ `"23,1,000,000"` + `02 00 01 00` trailer); `EventStorageMode` + 7 name variants fail (`success=false`) on **both** `GetHistorianInfo` **and** `GetSystemParameter`. The 518-byte `HISTORIAN_INFO` struct (mode@514) is the **C++ HCAL in-memory model** (managed `HistorianAccess.GetHistorianInfo` fills it via a native **vtable+648** call, not the gRPC op — verified in the 2023 R2 decompile), derived outside the wire. The only wire-reachable field (version) is already shipped (`ProbeAsync`/`GetSystemParameterAsync`/`GetRuntimeParameterAsync`), so a struct API would be hollow + misleading. **Closes the prior "build against a live 2023 R2 server" caveat — done, and there is nothing to ship.** See `docs/reverse-engineering/wcf-historian-info.md`. | uppercase string handle |
| ~~R1.5~~ | Extended-property **read** | `Retrieval.GetTagExtendedPropertiesFromName` (`GetTepByNm`) | ✅ **DONE (2026-06-20), live-verified.** `GetTagExtendedPropertiesAsync(tag)` → name/value pairs. String-handle op via the uppercase storage GUID; name-based path (`GetTagExtendedPropertiesByName`, not the QTB-gated TagQuery path). Request `tagNames` = `uint count` + per-name(`uint charCount`+UTF-16); response = `uint tagCount` + per-tag(marker + compact-ASCII name + `uint propCount` + per-prop(marker + compact-ASCII name + `0x43` VT_BSTR value) + trailer). Sequence-paged. Shipped: `HistorianTagExtendedPropertyProtocol`, golden `WcfTagExtendedPropertyProtocolTests`, gated live test. See `docs/reverse-engineering/wcf-tag-extended-properties.md`. | uppercase string handle |
| ~~R1.6~~ | Localized-property **read** | (no op) | ⛔ **No distinct op on 2020 — collapses into R1.5.** There is no `GetTagLocalizedPropertiesFromName`/`GetTlpByNm` or `GetTagLocalizedPropertiesByName` in `current/aahClientManaged.dll`; the only "localized" surfaces are error-message/UI-text localization. Extended properties (R1.5) are the user-defined tag-property read surface. Closed, not throwing. | — |
| ~~R1.7~~ | Event **filters** | filter bytes in `Retrieval.StartEventQuery` | ✅ **DONE (2026-06-20), live-honored.** `ReadEventsAsync(start, end, HistorianEventFilter)`. The filter rides `StartEventQuery`'s `pRequestBuff` (captured via `EventQuery.AddEventFilter` + instrument-wcf-writemessage; Equal vs Contains diffed to isolate the op). Filter block: `ushort 0 + uint filterCount + uint condCount + uint nameLen + name(UTF-16) + uint 1 + ushort op + uint 1 + value(0x09-len-0x00 compact-ASCII) + byte 0`. **REAL, not inert** (a non-matching predicate returns 0 events; matching returns the subset). Single string-valued predicate only; multi-filter (OR) / multi-condition (AND via `AddEventFilterCondition`) framing not yet fully captured. See `HistorianEventFilter`, golden `WcfEventQueryProtocolTests`. | — |
| ~~R1.8~~ | Analog-summary query | `Retrieval.StartQuery` (summary mode) | ✅ **RESOLVED (2026-06-21) — no new code; == existing `ReadAggregateAsync`.** Request + response both captured (`scripts/Capture-SummaryRequest.ps1 -WithResponse`): the `GetNextQueryResultBuffer2` response is the **ordinary version-9 row buffer** the raw/aggregate parser already handles (decoded 7 rows = SQL ground truth exactly). There is **no rich `CAnalogSummaryValue` struct on the wire** — each row carries a *single* value selected by `RetrievalMode`/QueryType (Integral→8, TimeWeightedAverage→5, …), not an all-aggregates-in-one row; `ValueSelector`/`AggregationType`/`MaxStates` are **inert** on the WCF retrieval path (they configure the SQL provider, not this query). The all-aggregates-at-once shape is the SQL/OLEDB provider's, or the gRPC front door — not 2020 WCF binary. Plan + capture evidence: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md). | — |
| ~~R1.9~~ | State-summary query | `Retrieval.StartQuery` (state mode) | ✅ **RESOLVED (2026-06-21) — same finding as R1.8.** State-summary is the **same `StartQuery2` request** (only `MaxStates`/defaults differ on the wire); the response carries no distinct `CStateSummaryStruct` on the 2020 WCF binary path. Covered by the existing aggregate read; no new `src/` code warranted. Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md). | — |
### 1c. Bounded config writes (S–M each)

| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | `RenameTagsAsync` | History rename op | rename request buffer | `AllowRenameTags` already probed |
| ~~R1.11~~ | Extended-property **write** | `History.AddTagExtendedProperties` (AddTEx) | ✅ **Add DONE (2026-06-21), live-verified.** `AddTagExtendedPropertiesAsync`/`AddTagExtendedPropertyAsync` (write mode, uppercase handle). inBuff = exact inverse of the R1.5 read framing (`uint32 groupCount + 0x01 + compact-ASCII tag + uint32 propCount + per prop[0x02 + compact-ASCII name + 0x43 VT_BSTR value] + 0x01 trailer + 0x00 terminator`); the trailing `0x00` is required or the server throws. Golden `WcfTagExtendedPropertyWriteProtocolTests` + gated live write/read-back test. **Delete (DelTep): wire format CAPTURED + serializer golden-proven (2026-06-21), but live delete is server-blocked and NOT shipped.** Captured via a two-session trick (add in Run A → fresh-session read-sync → delete in Run B, past the native err-229 client gate); inBuff = same group framing as Add but property-name-only and a `0x00` group trailer. A decisive experiment shows SDK-added properties ARE deletable (the native client deletes one), so SDK-add is complete; the SDK's own DelTep is rejected (`SErrorException` in `CHistStorage::DeleteTagExtendedProperties`) despite matching mode/handle/inBuff + GetTgByNm/GetTepByNm prime + open channel + 60s retries. Root cause: the native multiplexes services over ONE connection (per-connection working set), which the SDK's per-service WCF channels don't reproduce — needs transport-level multiplexing. See `docs/reverse-engineering/wcf-add-tag-extended-properties.md` §Delete. |
| ~~R1.12~~ | Localized-property **write** | (no op) | ⛔ **No distinct op on 2020 — closed (mirror of R1.6).** A symbol sweep of `current/*.dll` finds no `AddTagLocalizedProperties` / `DeleteTagLocalizedProperties` / any `*LocalizedPropert*` / `TagLocalized*`; only UI/error-text localization (`GetLocalizedText`/`GetLocalizedMessage`/`LocalizedResourcesDir`). Localized properties are a 2023 R2/gRPC concept. Closed, not throwing. See `docs/reverse-engineering/wcf-tag-extended-properties.md` §R1.12. | 2026-06-21 |
| ~~R1.13~~ | Non-analog tag create (string/discrete) | `History.EnsureTags` | distinct CTagMetadata variant | ⛔ **GATED — bounded out (2026-06-21, live-probed).** Native `AddTag` rejects every non-analog type **client-side** (`ErrorCode=ValidationFailed` / "Transaction validation failed", before any WCF op): SingleByteString, DoubleByteString, **and Int1** all fail; Float (control) succeeds. The native `HistorianDataType` enum has **no Discrete/Boolean** and no Int8/UInt8 (SDK-only extensions); `HistorianTag` has **no TagType setter** (type is data-type-derived). So no non-analog wire request is ever emitted → nothing to capture/implement. String/discrete create goes via a different subsystem (config editor / SQL), not this client's AddTag. `EnsureTagAsync` stays analog-only. See `docs/reverse-engineering/wcf-non-analog-tag-create.md`. |

**Acceptance:** read + browse + metadata + system/status + property R/W + summaries +
event-filtered reads + rename all live-verified over gRPC.

---
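
Two Milestone 1 findings lend themselves to a short sketch: the uppercase string handle that resolved the string-handle "wall", and the GETRP request buffer from R1.2 (`54 67 01 00` + name count + counted UTF-16 names). The Python below is an illustration only, not SDK code. Assumptions: `uint` means 32-bit little-endian, "UTF-16" means UTF-16LE as in the gRPC notes, and the function names are hypothetical. Note that the roadmap marks multi-name framing as inferred rather than captured, so only the single-name case is backed by evidence.

```python
import struct
import uuid

GETRP_HEADER = bytes.fromhex("54670100")  # constant prefix from the GETRP capture


def to_string_handle(storage_session_id):
    """Python equivalent of ToString("D").ToUpperInvariant() for a uuid.UUID."""
    return str(storage_session_id).upper()


def pack_name_list(names):
    """uint count + per-name(uint charCount + UTF-16LE characters)."""
    parts = [struct.pack("<I", len(names))]
    for name in names:
        encoded = name.encode("utf-16-le")
        parts.append(struct.pack("<I", len(encoded) // 2))  # count in code units
        parts.append(encoded)
    return b"".join(parts)


def pack_runtime_parameter_request(names):
    """GETRP pRequestBuff: fixed header followed by the counted name list."""
    return GETRP_HEADER + pack_name_list(names)
```

The same counted-name framing is what the roadmap describes for the R0.2 `btTagNames` and R1.5 `tagNames` requests, so `pack_name_list` would apply there as well under the same assumptions.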
## Milestone 2 — Event sending (CAPTURE) (S–M) ← headline gap

*Goal: `SendEventAsync(HistorianEvent)`. Path fully mapped in histevents.md; one capture away.*

> ✅ **DONE (2026-06-20) — `HistorianClient.SendEventAsync(HistorianEvent)` shipped and
> live-accepted over 2020 WCF.** The headline assumption — that event delivery would ride the
> non-WCF storage-engine pipe (and so be blocked like revision writes) — was **disproved by
> capture**: a native `AddStreamedValue(HistorianEvent)` leaves over WCF as **`AddS2`
> (`IHistoryServiceContract2.AddStreamValues2`)**. CM_EVENT is a built-in registered tag, so the
> `129 TagNotFoundInCache` gate that blocks `AddS2` for user tags does **not** apply to events.
> The full managed chain (Open2 event-mode **0x501** → CM_EVENT RTag2/EnsT2 → AddS2) is accepted
> by the server (`AddS2` returns success, empty error buffer). See the event-send field map under
> §"Event-send wire format" in `histevents.md` and `HistorianEventWriteProtocol`.
>
> ⚠️ **Persistence caveat (environment, not SDK):** on the local dev Historian, accepted events
> are **not persisted** to the queryable store (`v_AlarmEventHistory2` latest stays at the
> pre-test date; count only ages down). The **native** client exhibits the identical behaviour
> (its `AddS2` also returns success but nothing lands), so this is the box's event-ingestion
> pipeline not being active — not an SDK protocol gap. The SDK emits byte-equivalent `AddS2`
> (golden-tested). Full send→store→read-back round-trip awaits a Historian with an active event
> storage pipeline.

| ID | Work | Status |
|---|---|---|
| R2.1 | Capture the event value blob | ✅ `scripts/Capture-EventSend.ps1` (event-send harness scenario + instrument-wcf-{write,read}message); two captures diffed to separate constant framing from value fields. Decisive finding: event-send = WCF `AddS2`, not storage pipe. |
| R2.2 | `HistorianEventWriteProtocol` | ✅ Serializes the `AddS2` pBuf (storage sample buffer wrapping the event VTQ): "OS" sig + sampleCount + length fields + CM_EVENT tag id + EventTime FILETIME + OpcQuality + opaque descriptor + event Id + ReceivedTime FILETIME + Namespace + EventType + version + typed property bag (string props reuse the read parser's `0x43` encoding). Golden-byte test pins capture A. |
| R2.3 | Event write orchestrator | ✅ `HistorianWcfEventOrchestrator.SendEventAsync`: Open2 (0x501) → reuse CM_EVENT RTag2/EnsT2 registration → `AddStreamValues2(handle, pBuf, out err)` on the same /Hist channel + storage-session handle. |
| R2.4 | Public API | ✅ `HistorianClient.SendEventAsync(HistorianEvent)`. Original events only (RevisionVersion=0) with string-valued properties; other property types + revision/update/delete throw `ProtocolEvidenceMissingException` until captured. |
| R2.5 | Round-trip test | ✅ Golden-byte on R2.2 + gated live test `SendEventAsync_AgainstLocalHistorian_AcceptedByServer` (asserts server acceptance; SQL read-back best-effort given the persistence caveat). |

**Acceptance:** an event sent from histsdk is accepted by the historian over WCF with a
byte-correct `AddS2` (✅). Appears-and-reads-back is environment-gated on event persistence (see caveat).

---
## Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)

*Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.*

> ✅ **gRPC UNLOCK (2026-06-21, LIVE-VERIFIED): the transaction lifecycle is REACHABLE over the
> 2023 R2 gRPC front door.** The `grpc-revision-probe` opened a **write-enabled** (`0x401`) gRPC
> session and drove `TransactionService.AddNonStreamValuesBegin(storage-GUID **uppercase**)` →
> real `strTransactionId` → `AddNonStreamValuesEnd(bCommit=false)` (discarded, no data written).
> Where 2020 WCF returns `UnknownClient (51)`, the gRPC `TransactionService` is itself the gateway
> to the storage engine, so the Open2 session GUID is accepted directly — **no legacy pipe**. This
> answers the M3-over-gRPC question below: **yes**, the non-streamed *original* write transaction is
> reachable from the pure-managed SDK. **Not yet shipped:** the `AddNonStreamValues` `btInput` VTQ
> buffer must be captured before any value-commit (never guess wire bytes); revision *edits* (R4.2)
> remain pipe-only even on gRPC. Full detail + decompile basis:
> [`revision-write-path.md`](revision-write-path.md) §"2023 R2 gRPC — the wall is gone".
>
> ⛔ **BLOCKED on 2020 WCF — re-confirmed by the D2 probe (2026-05-05), see
> [`revision-write-path.md`](revision-write-path.md).** The premise above ("the path that is NOT
> the gated cache push") was **disproved** *on WCF*: R3.1's op
> (`Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End`) is the **same**
> `ITransactionServiceContract2.AddNonStreamValuesBegin2` D2 probed, and over WCF it returns
> `04 33 00 00 00` = `UnknownClient (51)` for every handle format **and** the full priming chain
> (Stat/Hist/Retr/Trx GetV + UpdC3 + 6× GetSystemParameter + RTag2). Root cause (IL-walk:
> `CClient.TransactionBegin` → `CHistStorageConnection.StartTransaction` →
> `CStorageEngineConsoleClient.StartTransaction`): the real transaction rides a **shared-memory +
> named-pipe** channel (`STransactPipeClient2` + `SCrtMemFile`) to `aaStorageEngine.exe`, separate
> from WCF. The WCF Trx op is a server-side **relay** that requires a pre-existing storage-engine
> pipe session, which no WCF op can establish. So **M3 over 2020 WCF is unimplementable as a
> pure-managed SDK** — same architectural wall as R4.2 (revisions) and the `AddS2` cache gate.
>
> **Only remaining lever:** the **2023 R2 gRPC front door** (HCAL-native, no legacy storage-engine
> pipe). Whether the gRPC services expose a non-streamed/revision write that bypasses the pipe is
> **untested** — it needs the live 2023 R2 server + a native gRPC capture of the write op, then
> decode/implement. Treat as on-demand (no current demand signal); the WCF path is closed.

| ID | Work | gRPC op | Status |
|---|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | `History.AddStreamValues` ("ON" buffer) + `EnsureTags` | ✅ **CAPTURED + VALIDATED 2026-06-21.** Drove the native 2023 R2 client through a committed historical write (sandbox tag) with the IL-rewritten gRPC client dumping every `byte[]`; the value **read back over gRPC**. The path is **NOT** `AddNonStreamValues`/TransactionService — it's **`HistoryService.AddStreamValues`** with an **"ON" storage-sample buffer** (AddS2 "OS" family) + `EnsureTags`. Buffer decoded: `"ON"(0x4E4F) + u16 count + u32 totalLen + u16 payloadLen + 16B tag GUID + FILETIME + u16 quality + u32 type + FILETIME + 8B double`. D2 cache gate does NOT block the primed 2023 R2 client. See [`revision-write-path.md`](revision-write-path.md) §"R3.1 CAPTURED". |
| R3.2 | `AddHistoricalValuesAsync` | `History.AddStreamValues` ("ON") + `EnsureTags` | ✅ **SHIPPED + LIVE-VALIDATED 2026-06-21.** `HistorianClient.AddHistoricalValuesAsync(tag, values)` over `RemoteGrpc`: write-enabled session → `GetTagInfosFromName` (resolves the per-tag GUID = tag-info `TypeId`) → `HistoryService.AddStreamValues` ("ON" buffer, golden-tested). The pure-managed SDK wrote a value and read it back live. All five analog types captured + golden-tested + live write/read-back validated — **Float/Double/Int2/Int4/UInt4** (value = `u32(0) + native-width value`, descriptor `C0 10 01 00` constant; width selected from the tag's declared type); other types throw. gRPC-only (non-gRPC throws). |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from `AddS2` cache wall) | ✅ **confirmed**: the D2/AddS2 cache gate (err 129) does NOT block the primed 2023 R2 client; the historical write commits and reads back |

**Acceptance:** historical points inserted and read back. **Met over gRPC** by R3.2 (live write +
read-back through `HistoryService.AddStreamValues`). **WCF path closed (D2).** The gRPC transaction
lifecycle was also proven live (Begin/End), but the follow-up once planned here (reproduce
`StorageService.OpenStorageConnection` + `RegisterTags`, then decode the `btInput` VTQ buffer) is
superseded: per R3.1, the committed write does not go through `AddNonStreamValues`/TransactionService.
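
The "ON" buffer layout quoted in the R3.1 row can be pictured as a fixed 56-byte record for a single Double sample. The Python sketch below only illustrates the field order and widths from that row; it is not the SDK's C# serializer. Little-endian byte order, the .NET-style mixed-endian GUID encoding, and the names `decode_on_sample`, `timestamp_a`, and `timestamp_b` are assumptions. The roadmap does not say what `totalLen` and `payloadLen` cover or which role each FILETIME plays, so the sketch leaves them uninterpreted.

```python
import struct
import uuid
from datetime import datetime, timedelta, timezone

# Field order and widths as listed in the R3.1 row. ASSUMPTION: little-endian, no padding.
# "ON" | u16 count | u32 totalLen | u16 payloadLen | 16B GUID | FILETIME | u16 quality
#      | u32 type | FILETIME | 8B double
ON_SAMPLE = struct.Struct("<2sHIH16sQHIQd")

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks: int) -> datetime:
    """A FILETIME counts 100 ns intervals since 1601-01-01 UTC."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def decode_on_sample(buf: bytes) -> dict:
    """Split one Double sample into named fields (names are hypothetical)."""
    (magic, count, total_len, payload_len, guid_raw,
     ft_a, quality, type_code, ft_b, value) = ON_SAMPLE.unpack_from(buf)
    if magic != b"ON":
        raise ValueError("not an 'ON' storage-sample buffer")
    return {
        "count": count,
        "total_len": total_len,      # left uninterpreted: coverage is not documented here
        "payload_len": payload_len,  # left uninterpreted: coverage is not documented here
        # ASSUMPTION: GUID stored the way .NET Guid.ToByteArray() lays it out.
        "tag_guid": uuid.UUID(bytes_le=guid_raw),
        "timestamp_a": filetime_to_datetime(ft_a),
        "quality": quality,
        "type": type_code,
        "timestamp_b": filetime_to_datetime(ft_b),
        "value": value,
    }
```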

---

## Milestone 4 — HARD subsystems (deferred / optional) (L each)

Only if the use case demands them. Each is a real subsystem, not an op.

| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | ✅ **SHIPPED (2026-06-21) — pragmatic durable outbox.** `AVEVA.Historian.Client.StoreForward`: `HistorianStoreForwardWriter` buffers historical-value + event writes to an `IHistorianOutboxStore` (`FileHistorianOutboxStore` = crash-durable atomic JSON-per-entry, FIFO by filename sequence, corrupt-file quarantine; `InMemoryHistorianOutboxStore` for tests) and replays them through an `IHistorianWriteSink` (default `HistorianClientWriteSink`). Background drain loop retries on reconnect; FIFO head-of-line blocking with optional `MaxDeliveryAttempts` dead-lettering; `DropOldest`/`Reject` overflow policy; `GetStatusAsync` snapshot (Pending/Storing/ErrorOccurred mirrors the server SF semantics). 12 unit tests (durability-across-restart, reconnect-drain, head-of-line order, dead-letter, overflow, background loop). **NOT** the bit-faithful native SF cache (`Forward*Snapshot` decode) — that stays deferred; pure client-side, no RE. | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | ⚠️ **PARTIAL — measured idle-state SHIPPED (2026-06-21, gRPC); active-SF magnitude D2-blocked.** Re-scoped against the recovered 2023 R2 gRPC contract (the old "duplex push vs pull" risk is gone — `StorageService` exposes SF state as plain *pull* RPCs). Idle-baseline probe (`grpc-sf-status-probe`) against the live 2023 R2 server resolved the open handle question: the direct SF pull RPCs (`GetSFParameter` / `GetRemainingSnapshotsSize`) require the `OpenStorageConnection` storage-engine **console handle** and are **D2-gated** (same wall as R4.2 revisions), so `Storing`/`Pending`/`DataStored` magnitude is unreachable from a pure managed client. But `StatusService.GetHistorianConsoleStatus` IS reachable on the session string handle, so `GetStoreForwardStatusAsync` over gRPC now returns a **measured** idle-state — it actually contacts the server and reports `ErrorOccurred` when unreachable (vs the old blind all-false synthesis), live-verified + gated test. Non-gRPC keeps the synthesized fallback. Active-SF magnitude (path b) stays deferred behind D2 + needs an invasive force-SF capture to decode the console-status enum. See `docs/plans/store-forward-cache-reverse-engineering.md` §9. | medium (idle done; magnitude D2-blocked) |
| R4.4 | Multi-historian / redundancy | ✅ **SHIPPED (2026-06-21) — client-side orchestration.** `AVEVA.Historian.Client.Redundancy`: `HistorianRedundantClient` fronts N `IHistorianMember`s (default `HistorianClientMember` over `HistorianClient`) as one logical client. Reads fail over to the next member in priority order — streaming reads only fail over *before the first row* (mid-stream failures propagate to avoid dup/gap); writes fan out (`AllMembers`/`PreferredOnly`) with `All`/`Any` ack policy returning a per-member `HistorianRedundantWriteResult`. Per-member health (`FailureThreshold` demotion) + background watchdog (`CheckHealthAsync`/`PeriodicTimer`) restores recovered members; `GetStatus()` snapshot. Composes with R4.1: back a member's writes with a `HistorianStoreForwardWriter` for the pragmatic ReSyncTags equivalent (down member buffers + replays). 14 unit tests (failover order, mid-stream no-failover, ack policies, fanout modes, watchdog recovery, all-fail aggregation). Pure client-side, no server-side redundancy protocol, no RE. | medium |
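
The R4.1 outbox semantics (FIFO replay, head-of-line blocking, optional dead-lettering after `MaxDeliveryAttempts`, `DropOldest`/`Reject` overflow) can be sketched in a few lines. This is an in-memory Python illustration with hypothetical names (`Outbox`, `enqueue`, `drain`), not the shipped `HistorianStoreForwardWriter`. Durability, the background drain loop, and exactly when the real writer counts a delivery attempt are not modeled; the attempt counting here is an assumption.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Outbox:
    """Toy FIFO outbox: entries are [payload, attempts] pairs."""

    def __init__(self, capacity: int, overflow: str = "drop_oldest",
                 max_attempts: Optional[int] = None) -> None:
        if overflow not in ("drop_oldest", "reject"):
            raise ValueError("overflow must be 'drop_oldest' or 'reject'")
        self.capacity = capacity
        self.overflow = overflow
        self.max_attempts = max_attempts
        self.pending: Deque[list] = deque()
        self.dead_letters: List[object] = []

    def enqueue(self, payload: object) -> bool:
        """Buffer a write; returns False only when the 'reject' policy refuses it."""
        if len(self.pending) >= self.capacity:
            if self.overflow == "reject":
                return False
            self.pending.popleft()  # drop_oldest: make room by discarding the head
        self.pending.append([payload, 0])
        return True

    def drain(self, send: Callable[[object], bool]) -> int:
        """Replay in FIFO order; stop at the first entry that cannot be delivered."""
        delivered = 0
        while self.pending:
            entry = self.pending[0]
            if send(entry[0]):
                self.pending.popleft()
                delivered += 1
                continue
            entry[1] += 1
            if self.max_attempts is not None and entry[1] >= self.max_attempts:
                # Dead-letter the poisoned head so later entries can proceed.
                self.dead_letters.append(self.pending.popleft()[0])
                continue
            break  # head-of-line blocking: keep order, retry on the next drain
        return delivered
```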
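
The R4.4 read and write rules can be illustrated the same way: reads walk the members in priority order and only fail over before the first row, while writes fan out and are judged by an `All`/`Any` ack policy. The Python sketch below uses hypothetical names (`read_with_failover`, `write_fanout`) and plain callables as members; it is not the `HistorianRedundantClient` API, and health tracking and the watchdog are left out.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def read_with_failover(
        members: Sequence[Callable[[], Iterable[object]]]) -> Iterator[object]:
    """Yield rows from the first member (in priority order) that can serve them."""
    errors: List[Exception] = []
    for open_stream in members:
        yielded = False
        try:
            for row in open_stream():
                yielded = True
                yield row
            return  # stream completed normally
        except Exception as exc:
            if yielded:
                raise  # mid-stream failure: propagate rather than risk dup/gap
            errors.append(exc)  # failed before the first row: try the next member
    raise RuntimeError(f"all {len(members)} members failed: {errors!r}")


def write_fanout(members: Sequence[Callable[[object], None]], value: object,
                 ack: str = "all") -> Tuple[bool, List[bool]]:
    """Send one write to every member; return (acknowledged, per-member results)."""
    if ack not in ("all", "any"):
        raise ValueError("ack must be 'all' or 'any'")
    results: List[bool] = []
    for write in members:
        try:
            write(value)
            results.append(True)
        except Exception:
            results.append(False)
    acked = bool(results) and (all(results) if ack == "all" else any(results))
    return acked, results
```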

---

## Won't-do from the client (GATED)

- **Streaming process-sample writes** (`AddStreamedValue(HistorianDataValue)` / `AddS2`):
  runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your
  ingestion architecture instead of pursuing this.

---

## Cross-cutting workstreams (run alongside all milestones)

- **CW-1 Capture tooling** (enables R0.5, R1.x, R2.1): one reusable "call op → dump
  request/response `bytes` → sanitized fixture" path. Highest leverage; do first.
- **CW-2 Version compatibility:** matrix of tested Historian versions; serializers keyed
  by version; CI gate.
- **CW-3 Cross-platform CI:** run the gRPC suite on Linux/macOS (transport is portable;
  explicit-cred auth path).
- **CW-4 Fixtures discipline:** every new op ships a `fixtures/protocol/<op>/` golden file;
  sanitize hostnames/tags/GUIDs before commit.
- **CW-5 Public API shape:** keep the modern surface (async, `IAsyncEnumerable`,
  cancellation, options record, DI-friendly) consistent as the surface grows.
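
As an illustration of CW-1 and CW-4 together, a captured payload can be scrubbed before it becomes a golden file, and a serializer's output can then be compared byte-for-byte against it. The Python sketch below is a minimal example under stated assumptions: `sanitize` and `assert_golden` are hypothetical helpers, not part of the repo's tooling, and the same-length replacement rule is an assumption made so that any length-prefixed fields in the capture stay valid. It only handles text identifiers (hostnames, tag names); binary GUIDs would need their own pass.

```python
from typing import Mapping


def sanitize(capture: bytes, replacements: Mapping[str, str]) -> bytes:
    """Swap sensitive identifiers for same-length placeholders in a raw capture."""
    out = capture
    for real, fake in replacements.items():
        # Captures may carry strings as UTF-8 or UTF-16LE, so scrub both encodings.
        for encoding in ("utf-8", "utf-16-le"):
            real_b, fake_b = real.encode(encoding), fake.encode(encoding)
            if len(real_b) != len(fake_b):
                raise ValueError(
                    f"{fake!r} must encode to the same length as the value it replaces")
            out = out.replace(real_b, fake_b)
    return out


def assert_golden(actual: bytes, golden: bytes) -> None:
    """Fail with the first differing offset when output drifts from the fixture."""
    if actual != golden:
        offset = next(
            (i for i, (a, g) in enumerate(zip(actual, golden)) if a != g),
            min(len(actual), len(golden)),
        )
        raise AssertionError(f"golden mismatch at byte {offset}")
```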

---

## Sequencing (critical path)

```
CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘
```

Recommended first sprint: **CW-1 + M0 (R0.1–R0.6)** → a fully Windows-free, version-safe
gRPC client at today's capability. Second sprint: **M1a + M2** (cheap wins + the headline
event-send). M3/M4 as demand dictates.

> **Status 2026-06-21:** sprints 1 + 2 are **complete** (M0 gRPC parity, the reachable M1 surface,
> and M2 event-send all shipped + live-verified; remaining M1 items are evidence-bounded-out). The
> reachable surface on the **available 2020 WCF infrastructure is exhausted**.
>
> **M3 update (2026-06-21):** with the live 2023 R2 server, the **M3 non-streamed write transaction
> was proven reachable over gRPC**: `TransactionService.AddNonStreamValuesBegin/End` round-trips live
> (the D2 storage-engine-pipe wall is WCF-only). The follow-up planned at that point (capture the
> `AddNonStreamValues` `btInput` VTQ buffer → golden-tested serializer → real commit+read-back →
> public `AddHistoricalValuesAsync`) was then closed out by R3.1/R3.2: the committed write rides
> `HistoryService.AddStreamValues` with the "ON" buffer, and `AddHistoricalValuesAsync` is shipped +
> live-validated. R4.2 revision *edits* stay pipe-only even on gRPC; M4 (SF / redundancy) was still
> deferred at that point (see the M4 update that follows).
>
> **M4 update (2026-06-21):** R4.1 store-and-forward, R4.4 redundancy, and R4.3 *measured idle-state*
> SF status are all SHIPPED (pragmatic, client-side). What remains deferred sits behind the **D2
> storage-engine console-pipe wall**: R4.2 revision edits and the R4.3 *active-SF magnitude*
> (`Storing`/`Pending`/`DataStored`), because the SF pull RPCs that carry it need the console handle
> the managed client can't obtain. Decoding the active-SF console-status enum additionally needs an
> invasive force-SF capture on a sacrificial Historian.

## One-glance status

| Milestone | Tier | Effort | Value | Status |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | ✅ **done** |
| M1 cheap surface | TRIVIAL/BOUNDED | M–L | most remaining read/config | ✅ **done** (reachable surface; rest bounded out) |
| M2 event send | CAPTURE | S–M | headline write capability | ✅ **done** |
| M3 historical writes | BOUNDED | M | backfill | ✅ **SHIPPED + LIVE-VALIDATED (2026-06-21)** — `AddHistoricalValuesAsync` over gRPC = `HistoryService.AddStreamValues` ("ON" buffer) + tag-GUID resolve. Pure-managed SDK write read back live. All 5 analog types (Float/Double/Int2/Int4/UInt4). WCF still blocked (D2) |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | **R4.1 store-and-forward + R4.4 redundancy + R4.3 measured idle-state SF status SHIPPED** (client-side, 2026-06-21); R4.2 revisions + R4.3 active-SF magnitude deferred behind the same D2 storage-engine-pipe wall (R4.3 magnitude also needs an SF-active server capture) |