Files
histsdk/docs/plans/hcal-roadmap.md
T
Joseph Doherty 4da5287d01 R1.2 GetRuntimeParameter + string-handle wall RESOLVED (handle-format bug)
Execute HCAL roadmap R1.2 (GetRuntimeParameterAsync) end-to-end, and in doing so
discover that the "string-handle wall" blocking R1.1/R1.4/R1.5/R1.6 was a handle
FORMAT bug, not a missing native session/filter registration.

R1.2 (shipped, live-verified):
- Captured native GetRuntimeParameter -> WCF op aa/Stat/GETRP (string-handle op,
  GETHI's shape), via scripts/Capture-RuntimeParam.ps1 + instrument-wcf-{write,read}message.
- HistorianRuntimeParameterProtocol serializes pRequestBuff (54 67 01 00 + uint
  nameCount + per-name uint charCount + UTF-16) and parses pResponseBuff (version +
  uint resultCount + CRetVariant 0x43 VT_BSTR + uint16 len + uint16 charCount + UTF-16).
- IStatusServiceContract2.GetRuntimeParameter (GETRP) op; HistorianWcfStatusClient
  passes the Open2 storage-session GUID as the string handle, UPPERCASE.
- Public HistorianClient.GetRuntimeParameterAsync(name) via the dialect.
- Golden WcfRuntimeParameterProtocolTests + gated live test; returns HistorianVersion.

String-handle wall RESOLVED (proven, public APIs deferred):
- The Open2 storage GUID works as the string handle when sent UPPERCASE
  (ToString("D").ToUpperInvariant()); earlier "blocked" probes used lowercase.
- Live-probed GETHI (R1.4) -> returns data; ExeC (R1.1) -> Retr.GetV prime -> ExeC ->
  GetR returns a BinaryFormatter-serialized .NET DataTable. Gated
  StringHandleProbeDiagnosticTests + scripts/Capture-ExecSql.ps1 + exec-sql harness scenario.
- Docs flipped: wcf-string-handle-wall.md RESOLVED banner; roadmap R1.1/R1.4 reachable,
  R1.5/R1.6 likely; wcf-status-localhost.md GETRP section.
- R1.1/R1.4 public APIs NOT shipped: ExeC needs a GetR paging loop + a BinaryFormatter-
  stream parser (BinaryFormatter is removed from .NET 10); GETHI full-info struct needs
  its own capture.

223 unit tests pass; gated live tests green against the local 2020 Historian.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-20 22:10:31 -04:00

236 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# HCAL modern-.NET client — implementation roadmap
Ordered, actionable plan to grow **histsdk** from "reads + basic config" into a broad
HCAL replacement, built on the **2023 R2 gRPC transport**. Derived from
[`hcal-capability-matrix.md`](hcal-capability-matrix.md); event details in
[`histevents.md`](histevents.md).
> Move to the repo's `docs/plans/` when execution starts. Each work item lands as: a
> protocol serializer/parser + golden-byte unit test + an env-gated live integration
> test against the local Historian.
## Progress (updated 2026-06-19)
-**R0.6 version gate**`HistorianServerVersionGate` + `HistorianClientOptions.VerifyServerInterfaceVersion`;
fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are
evidence-based (Hist=11, Retr=4, Trx=2; Status reachability-only), captured from the
live server. 10 unit tests.
-**CW-1 capture pipeline**`ProtocolCaptureSanitizer` + `ProtocolFixtureWriter` +
`capture-tag-info` CLI command; produces sanitized `fixtures/protocol/<op>/` golden files.
11 unit tests. First fixture: `get-tag-info/analog-*.json`.
> ⚠️ **Live-verification constraint:** the local Historian is **2020** (WCF, port 32568) — the
> 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1R0.4) can be built and
> golden-byte/unit-tested here but **cannot be live-verified** without an actual 2023 R2 server.
> Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.
> 🔬 **M1a re-classification (2026-06-20).** Two "trivial" items were live-probed against the
> 2020 WCF server and found **not deliverable here**, both for evidence-backed reasons:
> - **R1.3 `GetServerTimeZoneAsync`** — `Status.GetSystemTimeZoneName` is a client-side *stub*
> on 2020 (rc=0, empty value), same family as `GetServerTime`. gRPC/2023R2-only.
> - **R1.1 `ExecuteSqlCommandAsync`** — `ExeC` returns native error 51 (InvalidParameter);
> the contract-3 string-handle ops require an unmapped native session/filter registration
> step (the `StartTagQuery` wall).
>
> Takeaway: the M1a "cheap surface" is *cheap only on the 2023 R2 gRPC front door*. On 2020 WCF
> the boundary is the **handle type** (see the string-handle wall note under §1b and
> `docs/reverse-engineering/wcf-string-handle-wall.md`): **`uint`-handle ops work, `string`-handle
> ops are blocked.** GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed,
> reachable). The reachable **`uint`-handle** items are now **DONE**: ~~R1.8/R1.9 StartQuery
> summary/state modes~~ (resolved = existing `ReadAggregateAsync`) and ~~R1.7 event filters~~
> (✅ 2026-06-20 — `ReadEventsAsync(…, HistorianEventFilter)`, live-honored). M2 event send is
> also done (✅ WCF `AddS2`). **R1.2 `GetRuntimeParameterAsync` is also done** (✅ 2026-06-20,
> `aa/Stat/GETRP`, live-verified) — notably a *string-handle* op that punches through the wall
> using the Open2 storage-session GUID as an **uppercase** string handle, which is a strong lead
> that the GETHI/ExeC failures are (at least partly) a handle-*format* issue rather than only a
> missing native registration. **Cheap high-value follow-up: retry GETHI/ExeC with the uppercased
> storage GUID** before assuming the registration wall (see `wcf-string-handle-wall.md` §Update).
## Guiding principles
1. **gRPC-first.** New ops go on the `RemoteGrpc` transport (clean protobuf envelope);
the inner `bytes` blob is the only thing to RE. Keep WCF as the legacy/Windows path.
2. **Two tests per op, always.** A golden-byte test (deterministic, no server) **and** a
gated live test (`HISTORIAN_GRPC_HOST` / `HISTORIAN_HOST`). No op is "done" without both.
3. **Version-pin, fail closed.** Read server version at connect; gate every byte
serializer on it; throw `ProtocolEvidenceMissingException` on mismatch — never
best-effort parse.
4. **Capture once, encode forever.** For CAPTURE-tier items, instrument one native call,
save a sanitized fixture under `fixtures/protocol/`, then implement against the fixture.
5. **Ship per milestone.** Each milestone is independently releasable.
Effort: **S** ≈ days · **M** ≈ ~1 week · **L** ≈ weeks. Estimates are incremental on
histsdk's existing infra (auth chain, transport, frame primitives, test harness).
---
## Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)
*Goal: everything already working over WCF also works over `RemoteGrpc`, so the whole
read/browse/status surface is Windows-free and the gRPC stack is the default path.*
| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | `Retrieval.StartTagQuery`/`QueryTag` or `GetTagInfosFromName` | `Grpc/HistorianGrpcReadOrchestrator` (+ new `…GrpcBrowseClient`), `Historian2020ProtocolDialect` | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | `Retrieval.GetTagInfosFromName` | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | `Status.GetSystemParameter`, `Status.GetHistorianConsoleStatus` | new `Grpc/HistorianGrpcStatusClient` | system param + conn status live | S |
| R0.4 | Probe over gRPC | `*.GetInterfaceVersion` | grpc clients | `ProbeAsync` Windows-free | XS |
| R0.5 | **Capture harness for gRPC payloads** | n/a | reuse `instrument-wcf-*` tooling (same byte blobs) + add a `grpc-call-dump` helper | dump any request/response `bytes` to a fixture | S |
| R0.6 | **Version gate** | server version at connect | `HistorianClientOptions`, orchestrators | mismatched version → throws | S |
**Acceptance:** the entire Phase-0 capability set runs end-to-end over `RemoteGrpc`
(incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.
---
## Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (ML total)
*Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.*
### 1a. Trivial (XSS each, no new payload format)
| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| R1.1 | `ExecuteSqlCommandAsync` | `Retrieval.ExecuteSqlCommand` (`ExeC`+`GetR`) | ✅ **REACHABLE (2026-06-20, live-probed).** The earlier "code 51 blocked" verdict was a handle-**format** bug — `ExeC` succeeds with the Open2 storage GUID sent **uppercase** (`ToString("D").ToUpperInvariant()`). Chain: `Retr.GetV` prime → `ExeC(handle, sqlString, option=0, ref queryHandle)``GetR(handle, queryHandle, ref sequence)` returns the result as a **BinaryFormatter-serialized .NET DataTable**. Proven by `StringHandleProbeDiagnosticTests` + `scripts/Capture-ExecSql.ps1`. **Public API not yet shipped** — needs a `GetR` continuation loop + a custom BinaryFormatter-stream parser (BinaryFormatter is removed from .NET 10, so a DataTable can't just be deserialized). |
| ~~R1.2~~ | ~~`GetRuntimeParameterAsync`~~ | `Status.GetRuntimeParameter` (`aa/Stat/GETRP`) | ✅ **DONE (2026-06-20), live-verified.** Captured (`scripts/Capture-RuntimeParam.ps1`): GETRP is a **`string`-handle** op (GETHI's shape), but reachable from the managed client using the Open2 storage-session GUID as an **uppercase** string handle (`ToString("D").ToUpperInvariant()`). Returns `HistorianVersion` = `20,0,000,000` live. pRequestBuff = `54 67 01 00` + uint nameCount + per-name(uint charCount + UTF-16); pResponseBuff = version + uint resultCount + CRetVariant(`0x43` VT_BSTR + uint16 len + uint16 charCount + UTF-16). Single string-valued param only (multi-name framing inferred, not captured). Shipped: `HistorianClient.GetRuntimeParameterAsync(name)`; golden `WcfRuntimeParameterProtocolTests`. **Note:** GETRP punching through the string-handle wall with the uppercase storage GUID is a strong lead that GETHI/ExeC may be a handle-*format* issue — see `wcf-string-handle-wall.md` §Update. |
| ~~R1.3~~ | ~~`GetServerTimeZoneAsync`~~ | `Status.GetSystemTimeZoneName` | ⚠ **gRPC/2023R2-only.** Verified 2026-06-20: over **2020 WCF** this op is a stub (rc=0, empty value) in the `GetServerTime` family — not shippable here. Build+verify only against a live 2023 R2 server. See `docs/reverse-engineering/wcf-status-localhost.md`. |
> ✅ **String-handle "wall" RESOLVED (2026-06-20) — it was a handle-FORMAT bug.** R1.4/R1.5/R1.6
> (and R1.1) take a **`string` GUID handle**; the earlier "code 1/51 blocked" verdict came from
> passing the Open2 storage GUID in .NET's default **lowercase**. Sent **uppercase**
> (`storageSessionId.ToString("D").ToUpperInvariant()`) the same handle works: **GETRP** (R1.2,
> shipped), **GETHI** (R1.4) and **ExeC** (R1.1) are all live-verified reachable. R1.5/R1.6
> (GetTepByNm family) + QTB/QTG are very likely reachable the same way (not yet individually
> re-probed). Full analysis: `docs/reverse-engineering/wcf-string-handle-wall.md` (RESOLVED banner).
> R1.8/R1.9 (StartQuery summary/state modes) are `uint`-handle and were already reachable.
### 1b. Bounded (decode one `bytes` payload; SM each)
| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| R1.4 | `GetHistorianInfoAsync` | `Status.GetHistorianInfo` (`GETHI`) | ✅ **REACHABLE (2026-06-20, live-probed)** via the uppercase storage GUID — `GETHI` returns data (`StringHandleProbeDiagnosticTests`). The version-keyed request returns `uint charCount + UTF-16`; the full info struct (incl. `EventStorageMode`@514) needs its own request capture. **Public API not yet shipped.** | uppercase string handle |
| R1.5 | Extended-property **read** | `Retrieval.GetTagExtendedPropertiesFromName` | 🟡 **Likely reachable** via uppercase string handle (GetTepByNm family) — not yet individually re-probed. TEP result buffer. | uppercase string handle |
| R1.6 | Localized-property **read** | `Retrieval.GetTagLocalizedPropertiesFromName` | 🟡 **Likely reachable** (same family) — not yet re-probed. | uppercase string handle |
| ~~R1.7~~ | Event **filters** | filter bytes in `Retrieval.StartEventQuery` | ✅ **DONE (2026-06-20), live-honored.** `ReadEventsAsync(start, end, HistorianEventFilter)`. The filter rides `StartEventQuery`'s `pRequestBuff` (captured via `EventQuery.AddEventFilter` + instrument-wcf-writemessage; Equal vs Contains diffed to isolate the op). Filter block: `ushort 0 + uint filterCount + uint condCount + uint nameLen + name(UTF-16) + uint 1 + ushort op + uint 1 + value(0x09-len-0x00 compact-ASCII) + byte 0`. **REAL, not inert** (a non-matching predicate returns 0 events; matching returns the subset). Single string-valued predicate only; multi-filter (OR) / multi-condition (AND via `AddEventFilterCondition`) framing not yet fully captured. See `HistorianEventFilter`, golden `WcfEventQueryProtocolTests`. | — |
| R1.8 | Analog-summary query | `Retrieval.StartQuery` (summary mode) | summary row layout — **`uint`-handle, reachable. Scoped + decode targets located** (`CAnalogSummaryValue.UnpackFromValueBuffer`, fields Min/Max/First/Last/ValueCount/Integral/…). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |
| R1.9 | State-summary query | `Retrieval.StartQuery` (state mode) | state-summary row layout — **`uint`-handle, reachable. Scoped** (`CStateSummaryStruct`: MinContained/MaxContained/TotalContained/PartialStart/PartialEnd/StateEntryCount). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |
### 1c. Bounded config writes (SM each)
| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | `RenameTagsAsync` | History rename op | rename request buffer | `AllowRenameTags` already probed |
| R1.11 | Extended-property **write** | `History.AddTagExtendedProperties` (+ groups) / `DeleteTagExtendedProperties` | TEP serialize | mirror analog CTagMetadata discipline |
| R1.12 | Localized-property **write** | `History.AddTagLocalizedProperties` / `DeleteTagLocalizedProperties` | localized serialize | |
| R1.13 | Non-analog tag create (string/discrete) | `History.EnsureTags` | distinct CTagMetadata variant | ⚠ native AddTag rejected some types — confirm server path first; may be GATED |
**Acceptance:** read + browse + metadata + system/status + property R/W + summaries +
event-filtered reads + rename all live-verified over gRPC.
---
## Milestone 2 — Event sending (CAPTURE) (SM) ← headline gap
*Goal: `SendEventAsync(HistorianEvent)`. Path fully mapped in histevents.md; one capture away.*
> ✅ **DONE (2026-06-20) — `HistorianClient.SendEventAsync(HistorianEvent)` shipped and
> live-accepted over 2020 WCF.** The headline assumption — that event delivery would ride the
> non-WCF storage-engine pipe (and so be blocked like revision writes) — was **disproved by
> capture**: a native `AddStreamedValue(HistorianEvent)` leaves over WCF as **`AddS2`
> (`IHistoryServiceContract2.AddStreamValues2`)**. CM_EVENT is a built-in registered tag, so the
> `129 TagNotFoundInCache` gate that blocks `AddS2` for user tags does **not** apply to events.
> The full managed chain (Open2 event-mode **0x501** → CM_EVENT RTag2/EnsT2 → AddS2) is accepted
> by the server (`AddS2` returns success, empty error buffer). See the event-send field map under
> §"Event-send wire format" in `histevents.md` and `HistorianEventWriteProtocol`.
>
> ⚠️ **Persistence caveat (environment, not SDK):** on the local dev Historian, accepted events
> are **not persisted** to the queryable store (`v_AlarmEventHistory2` latest stays at the
> pre-test date; count only ages down). The **native** client exhibits the identical behaviour
> (its `AddS2` also returns success but nothing lands), so this is the box's event-ingestion
> pipeline not being active — not an SDK protocol gap. The SDK emits byte-equivalent `AddS2`
> (golden-tested). Full send→store→read-back round-trip awaits a Historian with an active event
> storage pipeline.
| ID | Work | Status |
|---|---|---|
| R2.1 | Capture the event value blob | ✅ `scripts/Capture-EventSend.ps1` (event-send harness scenario + instrument-wcf-{write,read}message); two captures diffed to separate constant framing from value fields. Decisive finding: event-send = WCF `AddS2`, not storage pipe. |
| R2.2 | `HistorianEventWriteProtocol` | ✅ Serializes the `AddS2` pBuf (storage sample buffer wrapping the event VTQ): "OS" sig + sampleCount + length fields + CM_EVENT tag id + EventTime FILETIME + OpcQuality + opaque descriptor + event Id + ReceivedTime FILETIME + Namespace + EventType + version + typed property bag (string props reuse the read parser's `0x43` encoding). Golden-byte test pins capture A. |
| R2.3 | Event write orchestrator | ✅ `HistorianWcfEventOrchestrator.SendEventAsync`: Open2 (0x501) → reuse CM_EVENT RTag2/EnsT2 registration → `AddStreamValues2(handle, pBuf, out err)` on the same /Hist channel + storage-session handle. |
| R2.4 | Public API | ✅ `HistorianClient.SendEventAsync(HistorianEvent)`. Original events only (RevisionVersion=0) with string-valued properties; other property types + revision/update/delete throw `ProtocolEvidenceMissingException` until captured. |
| R2.5 | Round-trip test | ✅ Golden-byte on R2.2 + gated live test `SendEventAsync_AgainstLocalHistorian_AcceptedByServer` (asserts server acceptance; SQL read-back best-effort given the persistence caveat). |
**Acceptance:** an event sent from histsdk is accepted by the historian over WCF with a
byte-correct `AddS2` (✅). Appears-and-reads-back is environment-gated on event persistence (see caveat).
---
## Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)
*Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.*
| ID | Work | gRPC op |
|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | `Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End` |
| R3.2 | `AddHistoricalValuesAsync` | batched begin→values→end |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from `AddS2` cache wall) |
**Acceptance:** historical points inserted and read back. Document clearly where this
differs from (gated) streaming sample writes.
---
## Milestone 4 — HARD subsystems (deferred / optional) (L each)
Only if the use case demands them. Each is a real subsystem, not an op.
| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | **Pragmatic local queue** (durable outbox + replay on reconnect) rather than bit-faithful SF cache + `Forward*Snapshot`. Faithful SF = decode SF cache format + snapshot framing + recovery log | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | duplex push (`SetStoreForwardEvent`) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | client-side orchestration over N single-historian sessions (failover, ReSyncTags, partner watchdog) — build last | medium |
---
## Won't-do from the client (GATED)
- **Streaming process-sample writes** (`AddStreamedValue(HistorianDataValue)` / `AddS2`):
runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your
ingestion architecture instead of pursuing this.
---
## Cross-cutting workstreams (run alongside all milestones)
- **CW-1 Capture tooling** (enables R0.5, R1.x, R2.1): one reusable "call op → dump
request/response `bytes` → sanitized fixture" path. Highest leverage — do first.
- **CW-2 Version compatibility:** matrix of tested Historian versions; serializers keyed
by version; CI gate.
- **CW-3 Cross-platform CI:** run the gRPC suite on Linux/macOS (transport is portable;
explicit-cred auth path).
- **CW-4 Fixtures discipline:** every new op ships a `fixtures/protocol/<op>/` golden file;
sanitize hostnames/tags/GUIDs before commit.
- **CW-5 Public API shape:** keep the modern surface (async, `IAsyncEnumerable`,
cancellation, options record, DI-friendly) consistent as the surface grows.
---
## Sequencing (critical path)
```
CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘
```
Recommended first sprint: **CW-1 + M0 (R0.1R0.6)** → a fully Windows-free, version-safe
gRPC client at today's capability. Second sprint: **M1a + M2** (cheap wins + the headline
event-send). M3/M4 as demand dictates.
## One-glance status
| Milestone | Tier | Effort | Value | When |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | **now** |
| M1 cheap surface | TRIVIAL/BOUNDED | ML | most remaining read/config | next |
| M2 event send | CAPTURE | SM | headline write capability | next |
| M3 historical writes | BOUNDED | M | backfill | on demand |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | defer |