Files
histsdk/docs/plans/hcal-roadmap.md
T
Joseph Doherty 85ff1b48df R0.1 browse over gRPC SHIPPED — QueryTag cracked, M0 gRPC parity complete
Wires HistorianClient.BrowseTagNamesAsync over gRPC (Transport==RemoteGrpc) via
Grpc/HistorianGrpcTagClient.BrowseTagNamesAsync: StartTagQuery(OData) -> paged
QueryTag -> EndTagQuery. Live-verified against a real 2023 R2 server (returns Sys* tags).

QueryTag packet-id recovered WITHOUT native disassembly: a .rdata packet-descriptor
table in aahClientManaged.dll lists {0x6751,1}=StartTagQuery immediately followed by
{0x6752,1}=QueryTag (found via pefile byte-scan of .rdata), confirmed live.

Wire format (live-verified):
- request btRequest = u16 0x6752 + u16 version(1) + u16 queryType(1=names) + u32 startIndex + u32 count
- response btResonse = u32 count + per-name(u32 charCount + UTF-16LE) + trailer (NextIndex/metadata, ignored)
- new HistorianTagQueryProtocol.ParseTagNameQueryPage tolerates the trailer
- GlobToODataFilter translates the SDK glob filter to OData (Pre*->startswith, *suf->endswith,
  *mid*->contains, exact->eq); the 2023 R2 metadata-server parses filters as OData.

Replaces the earlier RE probe helpers with the shipped browse path. Adds golden-byte
(BuildQueryTagRequest) + 8 glob-translation unit tests + gated live browse test.
226 unit tests pass; 5/5 live gRPC tests pass (read, probe, system-param, metadata, browse).

Milestone 0 (full gRPC parity) is complete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-21 16:01:15 -04:00

252 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# HCAL modern-.NET client — implementation roadmap
Ordered, actionable plan to grow **histsdk** from "reads + basic config" into a broad
HCAL replacement, built on the **2023 R2 gRPC transport**. Derived from
[`hcal-capability-matrix.md`](hcal-capability-matrix.md); event details in
[`histevents.md`](histevents.md).
> Move to the repo's `docs/plans/` when execution starts. Each work item lands as: a
> protocol serializer/parser + golden-byte unit test + an env-gated live integration
> test against the local Historian.
## Progress (updated 2026-06-21)
-**R0.6 version gate**`HistorianServerVersionGate` + `HistorianClientOptions.VerifyServerInterfaceVersion`;
fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are
evidence-based (Hist=11/12, Retr=4, Trx=2; Status reachability-only), captured from the
live server. History 12 (2023 R2 gRPC) accepted alongside 11 (buffer-compatible).
-**CW-1 capture pipeline**`ProtocolCaptureSanitizer` + `ProtocolFixtureWriter` +
`capture-tag-info` CLI command; produces sanitized `fixtures/protocol/<op>/` golden files.
11 unit tests. First fixture: `get-tag-info/analog-*.json`.
-**gRPC auth handshake (read chain)** — LIVE-VERIFIED 2026-06-21 against a real 2023 R2
server: `ReadRawAsync` over `RemoteGrpc` returns rows. Token loop routes to
`StorageService.ValidateClientCredential`. Shared handshake extracted to
`Grpc/HistorianGrpcHandshake` for reuse by the status/browse/metadata paths.
-**R0.4 Probe over gRPC**`Grpc/HistorianGrpcProbe` (History/Retrieval/Status
`GetInterfaceVersion`); `ProbeAsync` routes over gRPC when `Transport==RemoteGrpc`.
**LIVE-VERIFIED 2026-06-21** (no credentials required — runs before the auth loop).
-**R0.3 System parameter over gRPC**`Grpc/HistorianGrpcStatusClient.GetSystemParameterAsync`
(`StatusService.GetSystemParameter`); routed in the dialect. Built + unit-tested + **LIVE-VERIFIED
2026-06-21** against a real 2023 R2 server (returned `HistorianVersion`). Code path is the proven
handshake + a single string-in/string-out RPC.
-**R0.2 Tag metadata over gRPC**`Grpc/HistorianGrpcTagClient.GetTagMetadataAsync`
(`RetrievalService.GetTagInfosFromName`, the plural **string-handle** op). `GetTagMetadataAsync`
routes over gRPC when `Transport==RemoteGrpc`. Request `btTagNames` = `uint count + per-name(uint
charCount + UTF-16LE)` (golden-byte unit-tested); response `btTagInfos` = `uint count + CTagMetadata`
records (reuses `ParseGetTagInfoResponse`); string handle = uppercase Open2 storage GUID. The 2020
WCF string-handle wall does **not** apply on the gRPC front door (as predicted). **LIVE-VERIFIED
2026-06-21** — `GetTagMetadataAsync` returned the requested tag + a valid data type.
-**R0.1 Browse over gRPC** — DONE, **LIVE-VERIFIED 2026-06-21**.
`HistorianClient.BrowseTagNamesAsync` routes over gRPC via
`Grpc/HistorianGrpcTagClient.BrowseTagNamesAsync`: StartTagQuery(**OData** filter) → paged
**QueryTag** (`btRequest` = `u16 0x6752 + u16 1 + u16 queryType + u32 startIndex + u32 count`) →
EndTagQuery; response = `u32 count + per-name(u32 charCount + UTF-16LE) + trailer`. The SDK glob
filter is translated by `GlobToODataFilter` (`Pre*``startswith`, `*suf``endswith`, `*mid*`
`contains`, exact→`eq`). The QueryTag packet-id `0x6752` was recovered from a `.rdata`
packet-descriptor table (`{0x6751,1}`=StartTagQuery, `{0x6752,1}`=QueryTag) — no Ghidra needed.
Golden-byte + glob unit tests + gated live test. Full finding:
`docs/reverse-engineering/grpc-tag-query-odata.md`.
> ✅ **Milestone 0 (gRPC parity) is COMPLETE** — probe, system-param, metadata, and browse all run
> over `RemoteGrpc` and are live-verified against a real 2023 R2 server, alongside the read chain.
> ️ **Auth note (2026-06-21, resolved):** an apparent NTLM round-1 `SEC_E_LOGON_DENIED` blocker
> turned out to be a **test-harness credential-parsing bug**, not a server/account/SDK issue — the
> gitignored creds file stores **quoted** values (`"nam\user"`, `"pass"`), and the env-setup must
> **strip surrounding quotes** before exporting `HISTORIAN_USER`/`HISTORIAN_PASSWORD`. With quotes
> stripped, the domain account authenticates and the full read + system-param + probe chain passes
> live. The round-failure diagnostic added during the hunt is kept
> (`HistorianNativeHandshake.DescribeError` decodes the native error + hex/ASCII preview).
> ⚠️ **Live-verification constraint:** the local Historian is **2020** (WCF, port 32568) — the
> 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1R0.4) can be built and
> golden-byte/unit-tested here but **cannot be live-verified** without an actual 2023 R2 server.
> Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.
> 🔬 **M1a re-classification (2026-06-20).** Two "trivial" items were live-probed against the
> 2020 WCF server and found **not deliverable here**, both for evidence-backed reasons:
> - **R1.3 `GetServerTimeZoneAsync`** — `Status.GetSystemTimeZoneName` is a client-side *stub*
> on 2020 (rc=0, empty value), same family as `GetServerTime`. gRPC/2023R2-only.
> - **R1.1 `ExecuteSqlCommandAsync`** — `ExeC` returns native error 51 (InvalidParameter);
> the contract-3 string-handle ops require an unmapped native session/filter registration
> step (the `StartTagQuery` wall).
>
> Takeaway: the M1a "cheap surface" is *cheap only on the 2023 R2 gRPC front door*. On 2020 WCF
> the boundary is the **handle type** (see the string-handle wall note under §1b and
> `docs/reverse-engineering/wcf-string-handle-wall.md`): **`uint`-handle ops work, `string`-handle
> ops are blocked.** GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed,
> reachable). The genuinely reachable next items on 2020 WCF are the remaining **`uint`-handle**
> ops: **R1.8/R1.9 StartQuery summary/state modes** and **R1.7 event filters** (filter bytes ride
> the proven `uint`-handle `StartEventQuery`). Everything string-handle waits on one RE target:
> the native session/filter registration.
## Guiding principles
1. **gRPC-first.** New ops go on the `RemoteGrpc` transport (clean protobuf envelope);
the inner `bytes` blob is the only thing to RE. Keep WCF as the legacy/Windows path.
2. **Two tests per op, always.** A golden-byte test (deterministic, no server) **and** a
gated live test (`HISTORIAN_GRPC_HOST` / `HISTORIAN_HOST`). No op is "done" without both.
3. **Version-pin, fail closed.** Read server version at connect; gate every byte
serializer on it; throw `ProtocolEvidenceMissingException` on mismatch — never
best-effort parse.
4. **Capture once, encode forever.** For CAPTURE-tier items, instrument one native call,
save a sanitized fixture under `fixtures/protocol/`, then implement against the fixture.
5. **Ship per milestone.** Each milestone is independently releasable.
Effort: **S** ≈ days · **M** ≈ ~1 week · **L** ≈ weeks. Estimates are incremental on
histsdk's existing infra (auth chain, transport, frame primitives, test harness).
---
## Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)
*Goal: everything already working over WCF also works over `RemoteGrpc`, so the whole
read/browse/status surface is Windows-free and the gRPC stack is the default path.*
| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | `Retrieval.StartTagQuery`/`QueryTag` or `GetTagInfosFromName` | `Grpc/HistorianGrpcReadOrchestrator` (+ new `…GrpcBrowseClient`), `Historian2020ProtocolDialect` | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | `Retrieval.GetTagInfosFromName` | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | `Status.GetSystemParameter`, `Status.GetHistorianConsoleStatus` | new `Grpc/HistorianGrpcStatusClient` | system param + conn status live | S |
| R0.4 | Probe over gRPC | `*.GetInterfaceVersion` | grpc clients | `ProbeAsync` Windows-free | XS |
| R0.5 | **Capture harness for gRPC payloads** | n/a | reuse `instrument-wcf-*` tooling (same byte blobs) + add a `grpc-call-dump` helper | dump any request/response `bytes` to a fixture | S |
| R0.6 | **Version gate** | server version at connect | `HistorianClientOptions`, orchestrators | mismatched version → throws | S |
**Acceptance:** the entire Phase-0 capability set runs end-to-end over `RemoteGrpc`
(incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.
---
## Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (ML total)
*Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.*
### 1a. Trivial (XSS each, no new payload format)
| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| ~~R1.1~~ | ~~`ExecuteSqlCommandAsync`~~ | `Retrieval.ExecuteSqlCommand` | ⚠ **Blocked on 2020 WCF.** Live-probed 2026-06-20: `ExeC` returns native error type 4 / code **51 (InvalidParameter)** for every handle variant — same unmapped *native session/filter registration* prerequisite that blocks `StartTagQuery`/`QueryTag` (see `implementation-status.md` lines ~982, ~1404). Needs that registration RE'd, or a 2023 R2 gRPC server. Do not wire via guessed calls. |
| R1.2 | `GetRuntimeParameterAsync` | `Status.GetRuntimeParameter` | mirror `GetSystemParameter` |
| ~~R1.3~~ | ~~`GetServerTimeZoneAsync`~~ | `Status.GetSystemTimeZoneName` | ⚠ **gRPC/2023R2-only.** Verified 2026-06-20: over **2020 WCF** this op is a stub (rc=0, empty value) in the `GetServerTime` family — not shippable here. Build+verify only against a live 2023 R2 server. See `docs/reverse-engineering/wcf-status-localhost.md`. |
> ⛔ **String-handle wall (2026-06-20).** R1.4/R1.5/R1.6 (and R1.1) are **all blocked on 2020
> WCF** for the *same* reason: their ops take a **`string` GUID handle** and require an unmapped
> native session/filter registration. Probed live — GETHI returns code 1 for the exact native
> request shape across 5 handle formats + Stat.GetV priming; ExeC returns code 51. The proven
> surface uses **`uint`-handle** ops only. **One RE target — the native string-handle session
> registration — unblocks this whole sub-milestone.** Full analysis:
> `docs/reverse-engineering/wcf-string-handle-wall.md`. R1.8/R1.9 (StartQuery summary/state modes)
> are `uint`-handle and remain reachable on 2020 WCF.
### 1b. Bounded (decode one `bytes` payload; SM each)
| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| ~~R1.4~~ | `GetHistorianInfoAsync` | `Status.GetHistorianInfo` | ⛔ **string-handle wall** — GETHI returns code 1 on 2020 WCF (all handle/priming variants). GETHI buffer incl. `EventStorageMode`@514. | string-handle RE |
| ~~R1.5~~ | Extended-property **read** | `Retrieval.GetTagExtendedPropertiesFromName` | ⛔ **string-handle wall** (GetTepByNm takes `string handle`). TEP result buffer. | string-handle RE |
| ~~R1.6~~ | Localized-property **read** | `Retrieval.GetTagLocalizedPropertiesFromName` | ⛔ **string-handle wall** (same family). | string-handle RE |
| R1.7 | Event **filters** | filter bytes in `Retrieval.StartEventQuery` | filter predicate encoding (name/op/value) — **`uint`-handle**, reachable | R0.5 |
| R1.8 | Analog-summary query | `Retrieval.StartQuery` (summary mode) | summary row layout — **`uint`-handle, reachable. Scoped + decode targets located** (`CAnalogSummaryValue.UnpackFromValueBuffer`, fields Min/Max/First/Last/ValueCount/Integral/…). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |
| R1.9 | State-summary query | `Retrieval.StartQuery` (state mode) | state-summary row layout — **`uint`-handle, reachable. Scoped** (`CStateSummaryStruct`: MinContained/MaxContained/TotalContained/PartialStart/PartialEnd/StateEntryCount). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |
### 1c. Bounded config writes (SM each)
| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | `RenameTagsAsync` | History rename op | rename request buffer | `AllowRenameTags` already probed |
| R1.11 | Extended-property **write** | `History.AddTagExtendedProperties` (+ groups) / `DeleteTagExtendedProperties` | TEP serialize | mirror analog CTagMetadata discipline |
| R1.12 | Localized-property **write** | `History.AddTagLocalizedProperties` / `DeleteTagLocalizedProperties` | localized serialize | |
| R1.13 | Non-analog tag create (string/discrete) | `History.EnsureTags` | distinct CTagMetadata variant | ⚠ native AddTag rejected some types — confirm server path first; may be GATED |
**Acceptance:** read + browse + metadata + system/status + property R/W + summaries +
event-filtered reads + rename all live-verified over gRPC.
---
## Milestone 2 — Event sending (CAPTURE) (SM) ← headline gap
*Goal: `SendEventAsync(HistorianEvent)`. Path fully mapped in histevents.md; one capture away.*
| ID | Work | Detail |
|---|---|---|
| R2.1 | Capture the event value blob | Instrument `CCommonArchestraEventValue::PackToVtq` (or dump the VTQ value bytes) on a live `AddStreamedValue(HistorianEvent)`; save sanitized fixture |
| R2.2 | `HistorianEventWriteProtocol` | Serialize header (`ReceivedTime, EventType, EventTime, Id, RevisionVersion, IsUpdate/IsDelete, Namespace`) + typed property bag — **inverse of `HistorianEventRowProtocol`** (reuse typemarkers `0x02/0x10/0x18/0x31/0x43/…`) |
| R2.3 | Event write orchestrator | Open **Event** connection (write mode) → register CM_EVENT (already have) → `Storage.AddStreamValues` with the event VTQ |
| R2.4 | Public API | `HistorianClient.SendEventAsync(HistorianEvent)` (+ `HistorianEvent` model: Type, EventTime, property bag) |
| R2.5 | Round-trip test | Send an event → read it back via `StartEventQuery` / `v_AlarmEventHistory2`; golden-byte on R2.2 |
**Acceptance:** an event sent from histsdk appears in the historian and is read back with
matching Type + properties. **Now practical** — Historian is installed locally.
---
## Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)
*Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.*
| ID | Work | gRPC op |
|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | `Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End` |
| R3.2 | `AddHistoricalValuesAsync` | batched begin→values→end |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from `AddS2` cache wall) |
**Acceptance:** historical points inserted and read back. Document clearly where this
differs from (gated) streaming sample writes.
---
## Milestone 4 — HARD subsystems (deferred / optional) (L each)
Only if the use case demands them. Each is a real subsystem, not an op.
| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | **Pragmatic local queue** (durable outbox + replay on reconnect) rather than bit-faithful SF cache + `Forward*Snapshot`. Faithful SF = decode SF cache format + snapshot framing + recovery log | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | duplex push (`SetStoreForwardEvent`) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | client-side orchestration over N single-historian sessions (failover, ReSyncTags, partner watchdog) — build last | medium |
---
## Won't-do from the client (GATED)
- **Streaming process-sample writes** (`AddStreamedValue(HistorianDataValue)` / `AddS2`):
runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your
ingestion architecture instead of pursuing this.
---
## Cross-cutting workstreams (run alongside all milestones)
- **CW-1 Capture tooling** (enables R0.5, R1.x, R2.1): one reusable "call op → dump
request/response `bytes` → sanitized fixture" path. Highest leverage — do first.
- **CW-2 Version compatibility:** matrix of tested Historian versions; serializers keyed
by version; CI gate.
- **CW-3 Cross-platform CI:** run the gRPC suite on Linux/macOS (transport is portable;
explicit-cred auth path).
- **CW-4 Fixtures discipline:** every new op ships a `fixtures/protocol/<op>/` golden file;
sanitize hostnames/tags/GUIDs before commit.
- **CW-5 Public API shape:** keep the modern surface (async, `IAsyncEnumerable`,
cancellation, options record, DI-friendly) consistent as the surface grows.
---
## Sequencing (critical path)
```
CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘
```
Recommended first sprint: **CW-1 + M0 (R0.1R0.6)** → a fully Windows-free, version-safe
gRPC client at today's capability. Second sprint: **M1a + M2** (cheap wins + the headline
event-send). M3/M4 as demand dictates.
## One-glance status
| Milestone | Tier | Effort | Value | When |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | **now** |
| M1 cheap surface | TRIVIAL/BOUNDED | ML | most remaining read/config | next |
| M2 event send | CAPTURE | SM | headline write capability | next |
| M3 historical writes | BOUNDED | M | backfill | on demand |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | defer |