Files
histsdk/docs/plans/hcal-roadmap.md
T
Joseph Doherty 60b3673f01 M4 R4.4: client-side multi-historian redundancy
Adds AVEVA.Historian.Client.Redundancy — HistorianRedundantClient orchestrates
N single-historian members (IHistorianMember; default HistorianClientMember
over HistorianClient) as one logical client. Pure client-side, no server-side
redundancy protocol, no RE.

- Reads fail over to the next member in priority order. Streaming reads only
  fail over BEFORE the first row is observed; a mid-stream failure propagates
  (failing over mid-stream would risk duplicated/skipped rows).
- Writes fan out: WriteFanout AllMembers | PreferredOnly, with All | Any ack
  policy, returning a per-member HistorianRedundantWriteResult.
- Per-member health: FailureThreshold demotes a failing member out of the
  preferred pool; a background watchdog (PeriodicTimer) + CheckHealthAsync
  re-probe and restore recovered members. GetStatus() snapshot + ActiveMember.
- Composes with R4.1: back a member's writes with a HistorianStoreForwardWriter
  so a down member buffers and replays on recovery — the pragmatic client-side
  equivalent of native ReSyncTags.

14 unit tests (no server): failover order, mid-stream no-failover, all-fail
aggregation, probe-any-up, fan-out ack policies, PreferredOnly, soft reject,
health demotion + CheckHealthAsync restore, watchdog recovery. Full suite 307
green. Roadmap R4.4 marked shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-21 22:46:10 -04:00

335 lines
36 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# HCAL modern-.NET client — implementation roadmap
Ordered, actionable plan to grow **histsdk** from "reads + basic config" into a broad
HCAL replacement, built on the **2023 R2 gRPC transport**. Derived from
[`hcal-capability-matrix.md`](hcal-capability-matrix.md); event details in
[`histevents.md`](histevents.md).
> Move to the repo's `docs/plans/` when execution starts. Each work item lands as: a
> protocol serializer/parser + golden-byte unit test + an env-gated live integration
> test against the local Historian.
## Progress (updated 2026-06-21)
-**R0.6 version gate**`HistorianServerVersionGate` + `HistorianClientOptions.VerifyServerInterfaceVersion`;
fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are
evidence-based (Hist=11/12, Retr=4, Trx=2; Status reachability-only), captured from the
live server. History 12 (2023 R2 gRPC) accepted alongside 11 (buffer-compatible).
-**CW-1 capture pipeline**`ProtocolCaptureSanitizer` + `ProtocolFixtureWriter` +
`capture-tag-info` CLI command; produces sanitized `fixtures/protocol/<op>/` golden files.
11 unit tests. First fixture: `get-tag-info/analog-*.json`.
-**gRPC auth handshake (read chain)** — LIVE-VERIFIED 2026-06-21 against a real 2023 R2
server: `ReadRawAsync` over `RemoteGrpc` returns rows. Token loop routes to
`StorageService.ValidateClientCredential`. Shared handshake extracted to
`Grpc/HistorianGrpcHandshake` for reuse by the status/browse/metadata paths.
-**R0.4 Probe over gRPC**`Grpc/HistorianGrpcProbe` (History/Retrieval/Status
`GetInterfaceVersion`); `ProbeAsync` routes over gRPC when `Transport==RemoteGrpc`.
**LIVE-VERIFIED 2026-06-21** (no credentials required — runs before the auth loop).
-**R0.3 System parameter over gRPC**`Grpc/HistorianGrpcStatusClient.GetSystemParameterAsync`
(`StatusService.GetSystemParameter`); routed in the dialect. Built + unit-tested + **LIVE-VERIFIED
2026-06-21** against a real 2023 R2 server (returned `HistorianVersion`). Code path is the proven
handshake + a single string-in/string-out RPC.
-**R0.2 Tag metadata over gRPC**`Grpc/HistorianGrpcTagClient.GetTagMetadataAsync`
(`RetrievalService.GetTagInfosFromName`, the plural **string-handle** op). `GetTagMetadataAsync`
routes over gRPC when `Transport==RemoteGrpc`. Request `btTagNames` = `uint count + per-name(uint
charCount + UTF-16LE)` (golden-byte unit-tested); response `btTagInfos` = `uint count + CTagMetadata`
records (reuses `ParseGetTagInfoResponse`); string handle = uppercase Open2 storage GUID. The 2020
WCF string-handle wall does **not** apply on the gRPC front door (as predicted). **LIVE-VERIFIED
2026-06-21** — `GetTagMetadataAsync` returned the requested tag + a valid data type.
-**R0.1 Browse over gRPC** — DONE, **LIVE-VERIFIED 2026-06-21**.
`HistorianClient.BrowseTagNamesAsync` routes over gRPC via
`Grpc/HistorianGrpcTagClient.BrowseTagNamesAsync`: StartTagQuery(**OData** filter) → paged
**QueryTag** (`btRequest` = `u16 0x6752 + u16 1 + u16 queryType + u32 startIndex + u32 count`) →
EndTagQuery; response = `u32 count + per-name(u32 charCount + UTF-16LE) + trailer`. The SDK glob
filter is translated by `GlobToODataFilter` (`Pre*``startswith`, `*suf``endswith`, `*mid*`
`contains`, exact→`eq`). The QueryTag packet-id `0x6752` was recovered from a `.rdata`
packet-descriptor table (`{0x6751,1}`=StartTagQuery, `{0x6752,1}`=QueryTag) — no Ghidra needed.
Golden-byte + glob unit tests + gated live test. Full finding:
`docs/reverse-engineering/grpc-tag-query-odata.md`.
> ✅ **Milestone 0 (gRPC parity) is COMPLETE** — probe, system-param, metadata, and browse all run
> over `RemoteGrpc` and are live-verified against a real 2023 R2 server, alongside the read chain.
> ️ **Auth note (2026-06-21, resolved):** an apparent NTLM round-1 `SEC_E_LOGON_DENIED` blocker
> turned out to be a **test-harness credential-parsing bug**, not a server/account/SDK issue — the
> gitignored creds file stores **quoted** values (`"nam\user"`, `"pass"`), and the env-setup must
> **strip surrounding quotes** before exporting `HISTORIAN_USER`/`HISTORIAN_PASSWORD`. With quotes
> stripped, the domain account authenticates and the full read + system-param + probe chain passes
> live. The round-failure diagnostic added during the hunt is kept
> (`HistorianNativeHandshake.DescribeError` decodes the native error + hex/ASCII preview).
> ⚠️ **Live-verification constraint:** the local Historian is **2020** (WCF, port 32568) — the
> 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1R0.4) can be built and
> golden-byte/unit-tested here but **cannot be live-verified** without an actual 2023 R2 server.
> Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.
> 🔬 **M1a re-classification (2026-06-20).** Two "trivial" items were live-probed against the
> 2020 WCF server and found **not deliverable here**, both for evidence-backed reasons:
> - **R1.3 `GetServerTimeZoneAsync`** — `Status.GetSystemTimeZoneName` is a client-side *stub*
> on 2020 (rc=0, empty value), same family as `GetServerTime`. gRPC/2023R2-only.
> - **R1.1 `ExecuteSqlCommandAsync`** — `ExeC` returns native error 51 (InvalidParameter);
> the contract-3 string-handle ops require an unmapped native session/filter registration
> step (the `StartTagQuery` wall).
>
> Takeaway: the M1a "cheap surface" is *cheap only on the 2023 R2 gRPC front door*. On 2020 WCF
> the boundary is the **handle type** (see the string-handle wall note under §1b and
> `docs/reverse-engineering/wcf-string-handle-wall.md`): **`uint`-handle ops work, `string`-handle
> ops are blocked.** GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed,
> reachable). The reachable **`uint`-handle** items are now **DONE**: ~~R1.8/R1.9 StartQuery
> summary/state modes~~ (resolved = existing `ReadAggregateAsync`) and ~~R1.7 event filters~~
> (✅ 2026-06-20 — `ReadEventsAsync(…, HistorianEventFilter)`, live-honored). M2 event send is
> also done (✅ WCF `AddS2`). **R1.2 `GetRuntimeParameterAsync` is also done** (✅ 2026-06-20,
> `aa/Stat/GETRP`, live-verified) — notably a *string-handle* op that punches through the wall
> using the Open2 storage-session GUID as an **uppercase** string handle, which proved the
> GETHI/ExeC failures were a handle-*format* issue rather than a missing native registration.
> **Follow-up done:** R1.1 `ExecuteSqlCommandAsync` shipped; R1.5 extended-property read shipped
> (R1.6 collapsed into it — no distinct localized op). **R1.4 `GetHistorianInfo` bounded out on
> 2020 WCF** — GETHI there is a named-value query (only `HistorianVersion`); `EventStorageMode` is
> 2023R2-gRPC-only (see `wcf-historian-info.md`). Net: the **reachable 2020-WCF M1 read surface is
> complete**; what remains is config *writes* (M1c — gated on an explicit user request) and the
> gRPC/2023R2-only items (R1.3 timezone, R1.4 EventStorageMode — need a live 2023 R2 server).
>
> **Update 2026-06-21 (live 2023 R2 gRPC probe — both closed):** **R1.3 SHIPPED on gRPC** —
> `GetServerTimeZoneAsync` returns a real zone ("Eastern Daylight Time") via
> `StatusService.GetSystemTimeZoneName`; non-gRPC path fails closed
> (`ProtocolEvidenceMissingException`). **R1.4 bounded out on gRPC too** — `GetHistorianInfo` is
> named-value-only on the gRPC wire as well, `EventStorageMode` resolves under no name on either
> `GetHistorianInfo` or `GetSystemParameter`, and the 518-byte struct is C++-HCAL-internal (filled
> via native vtable+648, not the gRPC op). So **no gRPC/2023R2-specific reads remain open** — the
> entire M1 read surface (2020 WCF + 2023 R2 gRPC) is now closed.
## Guiding principles
1. **gRPC-first.** New ops go on the `RemoteGrpc` transport (clean protobuf envelope);
the inner `bytes` blob is the only thing to RE. Keep WCF as the legacy/Windows path.
2. **Two tests per op, always.** A golden-byte test (deterministic, no server) **and** a
gated live test (`HISTORIAN_GRPC_HOST` / `HISTORIAN_HOST`). No op is "done" without both.
3. **Version-pin, fail closed.** Read server version at connect; gate every byte
serializer on it; throw `ProtocolEvidenceMissingException` on mismatch — never
best-effort parse.
4. **Capture once, encode forever.** For CAPTURE-tier items, instrument one native call,
save a sanitized fixture under `fixtures/protocol/`, then implement against the fixture.
5. **Ship per milestone.** Each milestone is independently releasable.
Effort: **S** ≈ days · **M** ≈ ~1 week · **L** ≈ weeks. Estimates are incremental on
histsdk's existing infra (auth chain, transport, frame primitives, test harness).
---
## Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)
*Goal: everything already working over WCF also works over `RemoteGrpc`, so the whole
read/browse/status surface is Windows-free and the gRPC stack is the default path.*
| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | `Retrieval.StartTagQuery`/`QueryTag` or `GetTagInfosFromName` | `Grpc/HistorianGrpcReadOrchestrator` (+ new `…GrpcBrowseClient`), `Historian2020ProtocolDialect` | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | `Retrieval.GetTagInfosFromName` | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | `Status.GetSystemParameter`, `Status.GetHistorianConsoleStatus` | new `Grpc/HistorianGrpcStatusClient` | system param + conn status live | S |
| R0.4 | Probe over gRPC | `*.GetInterfaceVersion` | grpc clients | `ProbeAsync` Windows-free | XS |
| R0.5 | **Capture harness for gRPC payloads** | n/a | reuse `instrument-wcf-*` tooling (same byte blobs) + add a `grpc-call-dump` helper | dump any request/response `bytes` to a fixture | S |
| R0.6 | **Version gate** | server version at connect | `HistorianClientOptions`, orchestrators | mismatched version → throws | S |
**Acceptance:** the entire Phase-0 capability set runs end-to-end over `RemoteGrpc`
(incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.
---
## Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (ML total)
*Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.*
### 1a. Trivial (XSS each, no new payload format)
| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| ~~R1.1~~ | ~~`ExecuteSqlCommandAsync`~~ | `Retrieval.ExecuteSqlCommand` (`ExeC`+`GetR`) | ✅ **DONE (2026-06-20), live-verified.** `ExecuteSqlCommandAsync(sql)``HistorianSqlResult` (columns + typed rows). String-handle op via the uppercase storage GUID. Chain: `Retr.GetV` prime → `ExeC(handle, sql, option=0, ref queryHandle)``GetR` loop (note: `GetR` returns **false even on success** — the stream is in `pResultBuff` regardless; false = final page). `GetR`'s `pResultBuff` is an **NRBF-serialized `DataTable`** (`SerializationFormat.Xml`: members `XmlSchema` + `XmlDiffGram`). BinaryFormatter is gone from .NET 10, so it's decoded read-only with `System.Formats.Nrbf` + `XDocument` (no BinaryFormatter). Shipped: `HistorianSqlResult`/`HistorianSqlColumn`/`HistorianSqlExecuteOption`, `HistorianSqlResultProtocol`, `HistorianWcfSqlClient`, golden `WcfSqlResultProtocolTests`, gated live tests. See `docs/reverse-engineering/wcf-exec-sql.md`. |
| ~~R1.2~~ | ~~`GetRuntimeParameterAsync`~~ | `Status.GetRuntimeParameter` (`aa/Stat/GETRP`) | ✅ **DONE (2026-06-20), live-verified.** Captured (`scripts/Capture-RuntimeParam.ps1`): GETRP is a **`string`-handle** op (GETHI's shape), but reachable from the managed client using the Open2 storage-session GUID as an **uppercase** string handle (`ToString("D").ToUpperInvariant()`). Returns `HistorianVersion` = `20,0,000,000` live. pRequestBuff = `54 67 01 00` + uint nameCount + per-name(uint charCount + UTF-16); pResponseBuff = version + uint resultCount + CRetVariant(`0x43` VT_BSTR + uint16 len + uint16 charCount + UTF-16). Single string-valued param only (multi-name framing inferred, not captured). Shipped: `HistorianClient.GetRuntimeParameterAsync(name)`; golden `WcfRuntimeParameterProtocolTests`. **Note:** GETRP punching through the string-handle wall with the uppercase storage GUID is a strong lead that GETHI/ExeC may be a handle-*format* issue — see `wcf-string-handle-wall.md` §Update. |
| ~~R1.3~~ | `GetServerTimeZoneAsync` | `Status.GetSystemTimeZoneName` | ✅ **DONE on gRPC (2026-06-21), LIVE-VERIFIED** against the real 2023 R2 server — returns `"Eastern Daylight Time"`. `HistorianClient.GetServerTimeZoneAsync` routes over `RemoteGrpc` (`HistorianGrpcStatusClient.GetSystemTimeZoneNameAsync`, `uiHandle`-in/string-out, no buffer). The 2020 WCF op stays a client-side stub (rc=0, empty), so the non-gRPC path **throws `ProtocolEvidenceMissingException`** (fail-closed) rather than return an empty string. Golden message-shape + non-gRPC guardrail unit tests + gated live test. (2020-only routes — per-block `HistoryBlock.TimeZoneOffset`, SQL via R1.1 — remain DST-specific and are not this op.) |
> ✅ **String-handle "wall" RESOLVED (2026-06-20) — it was a handle-FORMAT bug.** R1.4/R1.5/R1.6
> (and R1.1) take a **`string` GUID handle**; the earlier "code 1/51 blocked" verdict came from
> passing the Open2 storage GUID in .NET's default **lowercase**. Sent **uppercase**
> (`storageSessionId.ToString("D").ToUpperInvariant()`) the same handle works: **GETRP** (R1.2,
> shipped), **GETHI** (R1.4) and **ExeC** (R1.1) are all live-verified reachable, and **R1.5
> `GetTepByNm`** is now **shipped + live-verified** (`GetTagExtendedPropertiesAsync`). **R1.6 has no
> distinct op** (collapses into R1.5). Note: `QTB` (StartTagQuery) does **not** punch through — it
> fails *server-side* (`CMdServer::StartActiveTagnamesQuery` over the `aahMetadataServer` pipe),
> independent of handle format, so the index-based property/query paths stay blocked here. Full
> analysis: `docs/reverse-engineering/wcf-string-handle-wall.md` (RESOLVED banner) and
> `docs/reverse-engineering/wcf-tag-extended-properties.md`.
> R1.8/R1.9 (StartQuery summary/state modes) are `uint`-handle and were already reachable.
### 1b. Bounded (decode one `bytes` payload; SM each)
| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| ~~R1.4~~ | `GetHistorianInfoAsync` | `Status.GetHistorianInfo` (`GETHI`) | ⛔ **BOUNDED OUT — now confirmed on the 2023 R2 gRPC front door too (2026-06-21, live-probed).** The motivating field `EventStorageMode` is **not on the wire on either transport.** Live gRPC probe against the real 2023 R2 server: `GetHistorianInfo` is a **named-value** query exactly like 2020 WCF — only `HistorianVersion` resolves (→ `"23,1,000,000"` + `02 00 01 00` trailer); `EventStorageMode` + 7 name variants fail (`success=false`) on **both** `GetHistorianInfo` **and** `GetSystemParameter`. The 518-byte `HISTORIAN_INFO` struct (mode@514) is the **C++ HCAL in-memory model** (managed `HistorianAccess.GetHistorianInfo` fills it via a native **vtable+648** call, not the gRPC op — verified in the 2023 R2 decompile), derived outside the wire. The only wire-reachable field (version) is already shipped (`ProbeAsync`/`GetSystemParameterAsync`/`GetRuntimeParameterAsync`), so a struct API would be hollow + misleading. **Closes the prior "build against a live 2023 R2 server" caveat — done, and there is nothing to ship.** See `docs/reverse-engineering/wcf-historian-info.md`. | uppercase string handle |
| ~~R1.5~~ | Extended-property **read** | `Retrieval.GetTagExtendedPropertiesFromName` (`GetTepByNm`) | ✅ **DONE (2026-06-20), live-verified.** `GetTagExtendedPropertiesAsync(tag)` → name/value pairs. String-handle op via the uppercase storage GUID; name-based path (`GetTagExtendedPropertiesByName`, not the QTB-gated TagQuery path). Request `tagNames` = `uint count` + per-name(`uint charCount`+UTF-16); response = `uint tagCount` + per-tag(marker + compact-ASCII name + `uint propCount` + per-prop(marker + compact-ASCII name + `0x43` VT_BSTR value) + trailer). Sequence-paged. Shipped: `HistorianTagExtendedPropertyProtocol`, golden `WcfTagExtendedPropertyProtocolTests`, gated live test. See `docs/reverse-engineering/wcf-tag-extended-properties.md`. | uppercase string handle |
| ~~R1.6~~ | Localized-property **read** | (no op) | ⛔ **No distinct op on 2020 — collapses into R1.5.** There is no `GetTagLocalizedPropertiesFromName`/`GetTlpByNm` or `GetTagLocalizedPropertiesByName` in `current/aahClientManaged.dll`; the only "localized" surfaces are error-message/UI-text localization. Extended properties (R1.5) are the user-defined tag-property read surface. Closed, not throwing. | — |
| ~~R1.7~~ | Event **filters** | filter bytes in `Retrieval.StartEventQuery` | ✅ **DONE (2026-06-20), live-honored.** `ReadEventsAsync(start, end, HistorianEventFilter)`. The filter rides `StartEventQuery`'s `pRequestBuff` (captured via `EventQuery.AddEventFilter` + instrument-wcf-writemessage; Equal vs Contains diffed to isolate the op). Filter block: `ushort 0 + uint filterCount + uint condCount + uint nameLen + name(UTF-16) + uint 1 + ushort op + uint 1 + value(0x09-len-0x00 compact-ASCII) + byte 0`. **REAL, not inert** (a non-matching predicate returns 0 events; matching returns the subset). Single string-valued predicate only; multi-filter (OR) / multi-condition (AND via `AddEventFilterCondition`) framing not yet fully captured. See `HistorianEventFilter`, golden `WcfEventQueryProtocolTests`. | — |
| ~~R1.8~~ | Analog-summary query | `Retrieval.StartQuery` (summary mode) | ✅ **RESOLVED (2026-06-21) — no new code; == existing `ReadAggregateAsync`.** Request + response both captured (`scripts/Capture-SummaryRequest.ps1 -WithResponse`): the `GetNextQueryResultBuffer2` response is the **ordinary version-9 row buffer** the raw/aggregate parser already handles (decoded 7 rows = SQL ground truth exactly). There is **no rich `CAnalogSummaryValue` struct on the wire** — each row carries a *single* value selected by `RetrievalMode`/QueryType (Integral→8, TimeWeightedAverage→5, …), not an all-aggregates-in-one row; `ValueSelector`/`AggregationType`/`MaxStates` are **inert** on the WCF retrieval path (they configure the SQL provider, not this query). The all-aggregates-at-once shape is the SQL/OLEDB provider's, or the gRPC front door — not 2020 WCF binary. Plan + capture evidence: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md). | — |
| ~~R1.9~~ | State-summary query | `Retrieval.StartQuery` (state mode) | ✅ **RESOLVED (2026-06-21) — same finding as R1.8.** State-summary is the **same `StartQuery2` request** (only `MaxStates`/defaults differ on the wire); the response carries no distinct `CStateSummaryStruct` on the 2020 WCF binary path. Covered by the existing aggregate read; no new `src/` code warranted. Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md). | — |
### 1c. Bounded config writes (SM each)
| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | `RenameTagsAsync` | History rename op | rename request buffer | `AllowRenameTags` already probed |
| ~~R1.11~~ | Extended-property **write** | `History.AddTagExtendedProperties` (AddTEx) | ✅ **Add DONE (2026-06-21), live-verified.** `AddTagExtendedPropertiesAsync`/`AddTagExtendedPropertyAsync` (write mode, uppercase handle). inBuff = exact inverse of the R1.5 read framing (`uint32 groupCount + 0x01 + compact-ASCII tag + uint32 propCount + per prop[0x02 + compact-ASCII name + 0x43 VT_BSTR value] + 0x01 trailer + 0x00 terminator`); the trailing `0x00` is required or the server throws. Golden `WcfTagExtendedPropertyWriteProtocolTests` + gated live write/read-back test. **Delete (DelTep): wire format CAPTURED + serializer golden-proven (2026-06-21), but live delete is server-blocked and NOT shipped.** Captured via a two-session trick (add in Run A → fresh-session read-sync → delete in Run B, past the native err-229 client gate); inBuff = same group framing as Add but property-name-only and a `0x00` group trailer. A decisive experiment shows SDK-added properties ARE deletable (the native client deletes one), so SDK-add is complete; the SDK's own DelTep is rejected (`SErrorException` in `CHistStorage::DeleteTagExtendedProperties`) despite matching mode/handle/inBuff + GetTgByNm/GetTepByNm prime + open channel + 60s retries. Root cause: the native multiplexes services over ONE connection (per-connection working set), which the SDK's per-service WCF channels don't reproduce — needs transport-level multiplexing. See `docs/reverse-engineering/wcf-add-tag-extended-properties.md` §Delete. |
| ~~R1.12~~ | Localized-property **write** | (no op) | ⛔ **No distinct op on 2020 — closed (mirror of R1.6).** A symbol sweep of `current/*.dll` finds no `AddTagLocalizedProperties` / `DeleteTagLocalizedProperties` / any `*LocalizedPropert*` / `TagLocalized*`; only UI/error-text localization (`GetLocalizedText`/`GetLocalizedMessage`/`LocalizedResourcesDir`). Localized properties are a 2023 R2/gRPC concept. Closed, not throwing. See `docs/reverse-engineering/wcf-tag-extended-properties.md` §R1.12. | 2026-06-21 |
| ~~R1.13~~ | Non-analog tag create (string/discrete) | `History.EnsureTags` | distinct CTagMetadata variant | ⛔ **GATED — bounded out (2026-06-21, live-probed).** Native `AddTag` rejects every non-analog type **client-side** (`ErrorCode=ValidationFailed` / "Transaction validation failed", before any WCF op): SingleByteString, DoubleByteString, **and Int1** all fail; Float (control) succeeds. The native `HistorianDataType` enum has **no Discrete/Boolean** and no Int8/UInt8 (SDK-only extensions); `HistorianTag` has **no TagType setter** (type is data-type-derived). So no non-analog wire request is ever emitted → nothing to capture/implement. String/discrete create goes via a different subsystem (config editor / SQL), not this client's AddTag. `EnsureTagAsync` stays analog-only. See `docs/reverse-engineering/wcf-non-analog-tag-create.md`. |
**Acceptance:** read + browse + metadata + system/status + property R/W + summaries +
event-filtered reads + rename all live-verified over gRPC.
---
## Milestone 2 — Event sending (CAPTURE) (SM) ← headline gap
*Goal: `SendEventAsync(HistorianEvent)`. Path fully mapped in histevents.md; one capture away.*
> ✅ **DONE (2026-06-20) — `HistorianClient.SendEventAsync(HistorianEvent)` shipped and
> live-accepted over 2020 WCF.** The headline assumption — that event delivery would ride the
> non-WCF storage-engine pipe (and so be blocked like revision writes) — was **disproved by
> capture**: a native `AddStreamedValue(HistorianEvent)` leaves over WCF as **`AddS2`
> (`IHistoryServiceContract2.AddStreamValues2`)**. CM_EVENT is a built-in registered tag, so the
> `129 TagNotFoundInCache` gate that blocks `AddS2` for user tags does **not** apply to events.
> The full managed chain (Open2 event-mode **0x501** → CM_EVENT RTag2/EnsT2 → AddS2) is accepted
> by the server (`AddS2` returns success, empty error buffer). See the event-send field map under
> §"Event-send wire format" in `histevents.md` and `HistorianEventWriteProtocol`.
>
> ⚠️ **Persistence caveat (environment, not SDK):** on the local dev Historian, accepted events
> are **not persisted** to the queryable store (`v_AlarmEventHistory2` latest stays at the
> pre-test date; count only ages down). The **native** client exhibits the identical behaviour
> (its `AddS2` also returns success but nothing lands), so this is the box's event-ingestion
> pipeline not being active — not an SDK protocol gap. The SDK emits byte-equivalent `AddS2`
> (golden-tested). Full send→store→read-back round-trip awaits a Historian with an active event
> storage pipeline.
| ID | Work | Status |
|---|---|---|
| R2.1 | Capture the event value blob | ✅ `scripts/Capture-EventSend.ps1` (event-send harness scenario + instrument-wcf-{write,read}message); two captures diffed to separate constant framing from value fields. Decisive finding: event-send = WCF `AddS2`, not storage pipe. |
| R2.2 | `HistorianEventWriteProtocol` | ✅ Serializes the `AddS2` pBuf (storage sample buffer wrapping the event VTQ): "OS" sig + sampleCount + length fields + CM_EVENT tag id + EventTime FILETIME + OpcQuality + opaque descriptor + event Id + ReceivedTime FILETIME + Namespace + EventType + version + typed property bag (string props reuse the read parser's `0x43` encoding). Golden-byte test pins capture A. |
| R2.3 | Event write orchestrator | ✅ `HistorianWcfEventOrchestrator.SendEventAsync`: Open2 (0x501) → reuse CM_EVENT RTag2/EnsT2 registration → `AddStreamValues2(handle, pBuf, out err)` on the same /Hist channel + storage-session handle. |
| R2.4 | Public API | ✅ `HistorianClient.SendEventAsync(HistorianEvent)`. Original events only (RevisionVersion=0) with string-valued properties; other property types + revision/update/delete throw `ProtocolEvidenceMissingException` until captured. |
| R2.5 | Round-trip test | ✅ Golden-byte on R2.2 + gated live test `SendEventAsync_AgainstLocalHistorian_AcceptedByServer` (asserts server acceptance; SQL read-back best-effort given the persistence caveat). |
**Acceptance:** an event sent from histsdk is accepted by the historian over WCF with a
byte-correct `AddS2` (✅). Appears-and-reads-back is environment-gated on event persistence (see caveat).
---
## Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)
*Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.*
> ✅ **gRPC UNLOCK (2026-06-21, LIVE-VERIFIED): the transaction lifecycle is REACHABLE over the
> 2023 R2 gRPC front door.** The `grpc-revision-probe` opened a **write-enabled** (`0x401`) gRPC
> session and drove `TransactionService.AddNonStreamValuesBegin(storage-GUID **uppercase**)` →
> real `strTransactionId` → `AddNonStreamValuesEnd(bCommit=false)` (discarded, no data written).
> Where 2020 WCF returns `UnknownClient (51)`, the gRPC `TransactionService` is itself the gateway
> to the storage engine, so the Open2 session GUID is accepted directly — **no legacy pipe**. This
> answers the M3-over-gRPC question below: **yes**, the non-streamed *original* write transaction is
> reachable from the pure-managed SDK. **Not yet shipped:** the `AddNonStreamValues` `btInput` VTQ
> buffer must be captured before any value-commit (never guess wire bytes); revision *edits* (R4.2)
> remain pipe-only even on gRPC. Full detail + decompile basis:
> [`revision-write-path.md`](revision-write-path.md) §"2023 R2 gRPC — the wall is gone".
>
> ⛔ **BLOCKED on 2020 WCF — re-confirmed by the D2 probe (2026-05-05), see
> [`revision-write-path.md`](revision-write-path.md).** The premise above ("the path that is NOT
> the gated cache push") was **disproved** *on WCF*: R3.1's op
> (`Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End`) is the **same**
> `ITransactionServiceContract2.AddNonStreamValuesBegin2` D2 probed, and over WCF it returns
> `04 33 00 00 00` = `UnknownClient (51)` for every handle format **and** the full priming chain
> (Stat/Hist/Retr/Trx GetV + UpdC3 + 6× GetSystemParameter + RTag2). Root cause (IL-walk:
> `CClient.TransactionBegin` → `CHistStorageConnection.StartTransaction` →
> `CStorageEngineConsoleClient.StartTransaction`): the real transaction rides a **shared-memory +
> named-pipe** channel (`STransactPipeClient2` + `SCrtMemFile`) to `aaStorageEngine.exe`, separate
> from WCF. The WCF Trx op is a server-side **relay** that requires a pre-existing storage-engine
> pipe session, which no WCF op can establish. So **M3 over 2020 WCF is unimplementable as a
> pure-managed SDK** — same architectural wall as R4.2 (revisions) and the `AddS2` cache gate.
>
> **Only remaining lever:** the **2023 R2 gRPC front door** (HCAL-native, no legacy storage-engine
> pipe). Whether the gRPC services expose a non-streamed/revision write that bypasses the pipe is
> **untested** — it needs the live 2023 R2 server + a native gRPC capture of the write op, then
> decode/implement. Treat as on-demand (no current demand signal); the WCF path is closed.
| ID | Work | gRPC op | Status |
|---|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | `History.AddStreamValues` ("ON" buffer) + `EnsureTags` | ✅ **CAPTURED + VALIDATED 2026-06-21.** Drove the native 2023 R2 client through a committed historical write (sandbox tag) with the IL-rewritten gRPC client dumping every `byte[]`; the value **read back over gRPC**. The path is **NOT** `AddNonStreamValues`/TransactionService — it's **`HistoryService.AddStreamValues`** with an **"ON" storage-sample buffer** (AddS2 "OS" family) + `EnsureTags`. Buffer decoded: `"ON"(0x4E4F) + u16 count + u32 totalLen + u16 payloadLen + 16B tag GUID + FILETIME + u16 quality + u32 type + FILETIME + 8B double`. D2 cache gate does NOT block the primed 2023 R2 client. See [`revision-write-path.md`](revision-write-path.md) §"R3.1 CAPTURED". |
| R3.2 | `AddHistoricalValuesAsync` | `History.AddStreamValues` ("ON") + `EnsureTags` | ✅ **SHIPPED + LIVE-VALIDATED 2026-06-21.** `HistorianClient.AddHistoricalValuesAsync(tag, values)` over `RemoteGrpc`: write-enabled session → `GetTagInfosFromName` (resolves the per-tag GUID = tag-info `TypeId`) → `HistoryService.AddStreamValues` ("ON" buffer, golden-tested). The pure-managed SDK wrote a value and read it back live. All five analog types captured + golden-tested + live write/read-back validated — **Float/Double/Int2/Int4/UInt4** (value = `u32(0) + native-width value`, descriptor `C0 10 01 00` constant; width selected from the tag's declared type); other types throw. gRPC-only (non-gRPC throws). |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from `AddS2` cache wall) | ✅ **confirmed** — the D2/AddS2 cache gate (err 129) does NOT block the primed 2023 R2 client; the historical write commits and reads back |
**Acceptance:** historical points inserted and read back. **WCF path closed (D2).** gRPC path:
**transaction lifecycle proven (Begin/End live) + full sequence mapped**; the remaining insert is a
focused follow-up — reproduce `StorageService.OpenStorageConnection` (+ `RegisterTags`), then decode
the `btInput` VTQ buffer, each a live-production probe loop.
---
## Milestone 4 — HARD subsystems (deferred / optional) (L each)
Only if the use case demands them. Each is a real subsystem, not an op.
| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | ✅ **SHIPPED (2026-06-21) — pragmatic durable outbox.** `AVEVA.Historian.Client.StoreForward`: `HistorianStoreForwardWriter` buffers historical-value + event writes to an `IHistorianOutboxStore` (`FileHistorianOutboxStore` = crash-durable atomic JSON-per-entry, FIFO by filename sequence, corrupt-file quarantine; `InMemoryHistorianOutboxStore` for tests) and replays them through an `IHistorianWriteSink` (default `HistorianClientWriteSink`). Background drain loop retries on reconnect; FIFO head-of-line blocking with optional `MaxDeliveryAttempts` dead-lettering; `DropOldest`/`Reject` overflow policy; `GetStatusAsync` snapshot (Pending/Storing/ErrorOccurred mirrors the server SF semantics). 12 unit tests (durability-across-restart, reconnect-drain, head-of-line order, dead-letter, overflow, background loop). **NOT** the bit-faithful native SF cache (`Forward*Snapshot` decode) — that stays deferred; pure client-side, no RE. | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | duplex push (`SetStoreForwardEvent`) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | ✅ **SHIPPED (2026-06-21) — client-side orchestration.** `AVEVA.Historian.Client.Redundancy`: `HistorianRedundantClient` fronts N `IHistorianMember`s (default `HistorianClientMember` over `HistorianClient`) as one logical client. Reads fail over to the next member in priority order — streaming reads only fail over *before the first row* (mid-stream failures propagate to avoid dup/gap); writes fan out (`AllMembers`/`PreferredOnly`) with `All`/`Any` ack policy returning a per-member `HistorianRedundantWriteResult`. Per-member health (`FailureThreshold` demotion) + background watchdog (`CheckHealthAsync`/`PeriodicTimer`) restores recovered members; `GetStatus()` snapshot. Composes with R4.1: back a member's writes with a `HistorianStoreForwardWriter` for the pragmatic ReSyncTags equivalent (down member buffers + replays). 14 unit tests (failover order, mid-stream no-failover, ack policies, fanout modes, watchdog recovery, all-fail aggregation). Pure client-side, no server-side redundancy protocol, no RE. | medium |
---
## Won't-do from the client (GATED)
- **Streaming process-sample writes** (`AddStreamedValue(HistorianDataValue)` / `AddS2`):
runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your
ingestion architecture instead of pursuing this.
---
## Cross-cutting workstreams (run alongside all milestones)
- **CW-1 Capture tooling** (enables R0.5, R1.x, R2.1): one reusable "call op → dump
request/response `bytes` → sanitized fixture" path. Highest leverage — do first.
- **CW-2 Version compatibility:** matrix of tested Historian versions; serializers keyed
by version; CI gate.
- **CW-3 Cross-platform CI:** run the gRPC suite on Linux/macOS (transport is portable;
explicit-cred auth path).
- **CW-4 Fixtures discipline:** every new op ships a `fixtures/protocol/<op>/` golden file;
sanitize hostnames/tags/GUIDs before commit.
- **CW-5 Public API shape:** keep the modern surface (async, `IAsyncEnumerable`,
cancellation, options record, DI-friendly) consistent as the surface grows.
---
## Sequencing (critical path)
```
CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘
```
Recommended first sprint: **CW-1 + M0 (R0.1R0.6)** → a fully Windows-free, version-safe
gRPC client at today's capability. Second sprint: **M1a + M2** (cheap wins + the headline
event-send). M3/M4 as demand dictates.
> **Status 2026-06-21:** sprints 1 + 2 are **complete** (M0 gRPC parity, the reachable M1 surface,
> and M2 event-send all shipped + live-verified; remaining M1 items are evidence-bounded-out). The
> reachable surface on the **available 2020 WCF infrastructure is exhausted**. **M3 update
> (2026-06-21):** with the live 2023 R2 server, the **M3 non-streamed write transaction is now
> proven reachable over gRPC** — `TransactionService.AddNonStreamValuesBegin/End` round-trips live
> (the D2 storage-engine-pipe wall is WCF-only). The remaining M3 work is bounded and concrete:
> capture the `AddNonStreamValues` `btInput` VTQ buffer → golden-tested serializer → real
> commit+read-back → public `AddHistoricalValuesAsync`. The other levers are unchanged: R4.2 revision
> *edits* stay pipe-only even on gRPC, and M4 (SF / redundancy) is a HARD deferred subsystem.
## One-glance status
| Milestone | Tier | Effort | Value | When |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | ✅ **done** |
| M1 cheap surface | TRIVIAL/BOUNDED | ML | most remaining read/config | ✅ **done** (reachable surface; rest bounded out) |
| M2 event send | CAPTURE | SM | headline write capability | ✅ **done** |
| M3 historical writes | BOUNDED | M | backfill | ✅ **SHIPPED + LIVE-VALIDATED (2026-06-21)**`AddHistoricalValuesAsync` over gRPC = `HistoryService.AddStreamValues` ("ON" buffer) + tag-GUID resolve. Pure-managed SDK write read back live. All 5 analog types (Float/Double/Int2/Int4/UInt4). WCF still blocked (D2) |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | **R4.1 store-and-forward + R4.4 redundancy SHIPPED** (pragmatic, client-side, 2026-06-21); R4.2 revisions deferred (storage-engine-pipe wall), R4.3 real SF status deferred (RE-heavy, needs SF-active server) |