# HCAL modern-.NET client — implementation roadmap

Ordered, actionable plan to grow **histsdk** from "reads + basic config" into a broad
HCAL replacement, built on the **2023 R2 gRPC transport**. Derived from
[`hcal-capability-matrix.md`](hcal-capability-matrix.md); event details in
[`histevents.md`](histevents.md).

> Move to the repo's `docs/plans/` when execution starts. Each work item lands as: a
> protocol serializer/parser + golden-byte unit test + an env-gated live integration
> test against the local Historian.

## Progress (updated 2026-06-19)

- ✅ **R0.6 version gate** — `HistorianServerVersionGate` + `HistorianClientOptions.VerifyServerInterfaceVersion`;
  fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are
  evidence-based (Hist=11, Retr=4, Trx=2; Status reachability-only), captured from the
  live server. 10 unit tests.
- ✅ **CW-1 capture pipeline** — `ProtocolCaptureSanitizer` + `ProtocolFixtureWriter` +
  `capture-tag-info` CLI command; produces sanitized `fixtures/protocol/<op>/` golden files.
  11 unit tests. First fixture: `get-tag-info/analog-*.json`.

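
The fail-closed behaviour of the R0.6 version gate can be illustrated with a short sketch. It is written in Python for brevity and is not the SDK's .NET code: `verify_interface_versions` and `ProtocolEvidenceMissingError` are hypothetical stand-ins, and only the three version numbers and the "Status is reachability-only" rule come from the note above.

```python
# Evidence-backed interface versions from the progress note (Hist=11, Retr=4, Trx=2).
SUPPORTED_VERSIONS = {"Hist": 11, "Retr": 4, "Trx": 2}


class ProtocolEvidenceMissingError(Exception):
    """Hypothetical Python stand-in for ProtocolEvidenceMissingException."""


def verify_interface_versions(reported: dict, status_reachable: bool) -> None:
    """Raise unless every service reports exactly the version we hold evidence for.

    Status has no pinned version here; it only has to be reachable.
    """
    if not status_reachable:
        raise ProtocolEvidenceMissingError("Status service is not reachable")
    for service, expected in SUPPORTED_VERSIONS.items():
        actual = reported.get(service)
        if actual != expected:
            # Fail closed: no best-effort parsing against an unknown version.
            raise ProtocolEvidenceMissingError(
                f"{service}: server reports {actual!r}, only {expected} is evidence-backed"
            )


# A matching server passes silently; anything else raises before any op runs.
verify_interface_versions({"Hist": 11, "Retr": 4, "Trx": 2}, status_reachable=True)
```
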

> ⚠️ **Live-verification constraint:** the local Historian is **2020** (WCF, port 32568) — the
> 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1–R0.4) can be built and
> golden-byte/unit-tested here but **cannot be live-verified** without an actual 2023 R2 server.
> Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.

> 🔬 **M1a re-classification (2026-06-20).** Two "trivial" items were live-probed against the
> 2020 WCF server and found **not deliverable here**, both for evidence-backed reasons:
> - **R1.3 `GetServerTimeZoneAsync`** — `Status.GetSystemTimeZoneName` is a client-side *stub*
>   on 2020 (rc=0, empty value), same family as `GetServerTime`. gRPC/2023R2-only.
> - **R1.1 `ExecuteSqlCommandAsync`** — `ExeC` returns native error 51 (InvalidParameter);
>   the contract-3 string-handle ops require an unmapped native session/filter registration
>   step (the `StartTagQuery` wall).
>
> Takeaway: the M1a "cheap surface" is *cheap only on the 2023 R2 gRPC front door*. On 2020 WCF
> the boundary is the **handle type** (see the string-handle wall note under §1b and
> `docs/reverse-engineering/wcf-string-handle-wall.md`): **`uint`-handle ops work, `string`-handle
> ops are blocked.** GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed,
> reachable). The genuinely reachable next items on 2020 WCF are the remaining **`uint`-handle**
> ops: **R1.8/R1.9 StartQuery summary/state modes** and **R1.7 event filters** (filter bytes ride
> the proven `uint`-handle `StartEventQuery`). Everything string-handle waits on one RE target:
> the native session/filter registration.

## Guiding principles

1. **gRPC-first.** New ops go on the `RemoteGrpc` transport (clean protobuf envelope);
   the inner `bytes` blob is the only thing to RE. Keep WCF as the legacy/Windows path.
2. **Two tests per op, always.** A golden-byte test (deterministic, no server) **and** a
   gated live test (`HISTORIAN_GRPC_HOST` / `HISTORIAN_HOST`). No op is "done" without both.
3. **Version-pin, fail closed.** Read server version at connect; gate every byte
   serializer on it; throw `ProtocolEvidenceMissingException` on mismatch — never
   best-effort parse.
4. **Capture once, encode forever.** For CAPTURE-tier items, instrument one native call,
   save a sanitized fixture under `fixtures/protocol/`, then implement against the fixture.
5. **Ship per milestone.** Each milestone is independently releasable.

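
Principle 2 pairs a deterministic golden-byte test with an environment-gated live test. The shape of that pairing is sketched below in Python's `unittest` (the SDK's own tests are .NET). The serializer `encode_example_request` and its six-byte layout are invented for illustration and are not a Historian wire format; only the `HISTORIAN_HOST` variable name comes from the list above. The live test body is a placeholder and is skipped unless the variable is set.

```python
import os
import struct
import unittest


def encode_example_request(tag_id: int, count: int) -> bytes:
    """Hypothetical serializer: uint32 tag id + uint16 count, little-endian."""
    return struct.pack("<IH", tag_id, count)


# In the real workflow this would be loaded from a file under fixtures/protocol/.
GOLDEN = bytes.fromhex("2a0000000300")


class GoldenByteTest(unittest.TestCase):
    """Deterministic: needs no server and pins the exact bytes."""

    def test_matches_fixture(self):
        self.assertEqual(encode_example_request(42, 3), GOLDEN)


@unittest.skipUnless(os.environ.get("HISTORIAN_HOST"), "set HISTORIAN_HOST to run live tests")
class LiveTest(unittest.TestCase):
    """Gated: runs only when a Historian host is configured."""

    def test_host_is_configured(self):
        # Placeholder assertion; a real test would send the request to this host.
        self.assertTrue(os.environ["HISTORIAN_HOST"])
```
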

Effort: **S** ≈ days · **M** ≈ 1 week · **L** ≈ weeks. Estimates are incremental on
histsdk's existing infra (auth chain, transport, frame primitives, test harness).

---

## Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)

*Goal: everything already working over WCF also works over `RemoteGrpc`, so the whole
read/browse/status surface is Windows-free and the gRPC stack is the default path.*

| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | `Retrieval.StartTagQuery`/`QueryTag` or `GetTagInfosFromName` | `Grpc/HistorianGrpcReadOrchestrator` (+ new `…GrpcBrowseClient`), `Historian2020ProtocolDialect` | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | `Retrieval.GetTagInfosFromName` | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | `Status.GetSystemParameter`, `Status.GetHistorianConsoleStatus` | new `Grpc/HistorianGrpcStatusClient` | system param + conn status live | S |
| R0.4 | Probe over gRPC | `*.GetInterfaceVersion` | grpc clients | `ProbeAsync` Windows-free | XS |
| R0.5 | **Capture harness for gRPC payloads** | n/a | reuse `instrument-wcf-*` tooling (same byte blobs) + add a `grpc-call-dump` helper | dump any request/response `bytes` to a fixture | S |
| R0.6 | **Version gate** | server version at connect | `HistorianClientOptions`, orchestrators | mismatched version → throws | S |

**Acceptance:** the entire Phase-0 capability set runs end-to-end over `RemoteGrpc`
(incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.

---

## Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (M–L total)

*Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.*

### 1a. Trivial (XS–S each, no new payload format)
| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| ~~R1.1~~ | ~~`ExecuteSqlCommandAsync`~~ | `Retrieval.ExecuteSqlCommand` | ⚠ **Blocked on 2020 WCF.** Live-probed 2026-06-20: `ExeC` returns native error type 4 / code **51 (InvalidParameter)** for every handle variant — same unmapped *native session/filter registration* prerequisite that blocks `StartTagQuery`/`QueryTag` (see `implementation-status.md` lines ~982, ~1404). Needs that registration RE'd, or a 2023 R2 gRPC server. Do not wire via guessed calls. |
| R1.2 | `GetRuntimeParameterAsync` | `Status.GetRuntimeParameter` | mirror `GetSystemParameter` |
| ~~R1.3~~ | ~~`GetServerTimeZoneAsync`~~ | `Status.GetSystemTimeZoneName` | ⚠ **gRPC/2023R2-only.** Verified 2026-06-20: over **2020 WCF** this op is a stub (rc=0, empty value) in the `GetServerTime` family — not shippable here. Build+verify only against a live 2023 R2 server. See `docs/reverse-engineering/wcf-status-localhost.md`. |

> ⛔ **String-handle wall (2026-06-20).** R1.4/R1.5/R1.6 (and R1.1) are **all blocked on 2020
> WCF** for the *same* reason: their ops take a **`string` GUID handle** and require an unmapped
> native session/filter registration. Probed live — GETHI returns code 1 for the exact native
> request shape across 5 handle formats + Stat.GetV priming; ExeC returns code 51. The proven
> surface uses **`uint`-handle** ops only. **One RE target — the native string-handle session
> registration — unblocks this whole sub-milestone.** Full analysis:
> `docs/reverse-engineering/wcf-string-handle-wall.md`. R1.8/R1.9 (StartQuery summary/state modes)
> are `uint`-handle and remain reachable on 2020 WCF.

### 1b. Bounded (decode one `bytes` payload; S–M each)
| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| ~~R1.4~~ | `GetHistorianInfoAsync` | `Status.GetHistorianInfo` | ⛔ **string-handle wall** — GETHI returns code 1 on 2020 WCF (all handle/priming variants). GETHI buffer incl. `EventStorageMode`@514. | string-handle RE |
| ~~R1.5~~ | Extended-property **read** | `Retrieval.GetTagExtendedPropertiesFromName` | ⛔ **string-handle wall** (GetTepByNm takes `string handle`). TEP result buffer. | string-handle RE |
| ~~R1.6~~ | Localized-property **read** | `Retrieval.GetTagLocalizedPropertiesFromName` | ⛔ **string-handle wall** (same family). | string-handle RE |
| R1.7 | Event **filters** | filter bytes in `Retrieval.StartEventQuery` | filter predicate encoding (name/op/value) — **`uint`-handle**, reachable | R0.5 |
| R1.8 | Analog-summary query | `Retrieval.StartQuery` (summary mode) | summary row layout — **`uint`-handle, reachable. Scoped + decode targets located** (`CAnalogSummaryValue.UnpackFromValueBuffer`, fields Min/Max/First/Last/ValueCount/Integral/…). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |
| R1.9 | State-summary query | `Retrieval.StartQuery` (state mode) | state-summary row layout — **`uint`-handle, reachable. Scoped** (`CStateSummaryStruct`: MinContained/MaxContained/TotalContained/PartialStart/PartialEnd/StateEntryCount). Plan: [`r1.8-r1.9-summary-queries.md`](r1.8-r1.9-summary-queries.md) | — |

### 1c. Bounded config writes (S–M each)
| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | `RenameTagsAsync` | History rename op | rename request buffer | `AllowRenameTags` already probed |
| R1.11 | Extended-property **write** | `History.AddTagExtendedProperties` (+ groups) / `DeleteTagExtendedProperties` | TEP serialize | mirror analog CTagMetadata discipline |
| R1.12 | Localized-property **write** | `History.AddTagLocalizedProperties` / `DeleteTagLocalizedProperties` | localized serialize | |
| R1.13 | Non-analog tag create (string/discrete) | `History.EnsureTags` | distinct CTagMetadata variant | ⚠ native AddTag rejected some types — confirm server path first; may be GATED |

**Acceptance:** read + browse + metadata + system/status + property R/W + summaries +
event-filtered reads + rename all live-verified over gRPC.

---

## Milestone 2 — Event sending (CAPTURE) (S–M) ← headline gap

*Goal: `SendEventAsync(HistorianEvent)`. Path fully mapped in `histevents.md`; one capture away.*

> ✅ **DONE (2026-06-20) — `HistorianClient.SendEventAsync(HistorianEvent)` shipped and
> live-accepted over 2020 WCF.** The headline assumption — that event delivery would ride the
> non-WCF storage-engine pipe (and so be blocked like revision writes) — was **disproved by
> capture**: a native `AddStreamedValue(HistorianEvent)` leaves over WCF as **`AddS2`
> (`IHistoryServiceContract2.AddStreamValues2`)**. CM_EVENT is a built-in registered tag, so the
> `129 TagNotFoundInCache` gate that blocks `AddS2` for user tags does **not** apply to events.
> The full managed chain (Open2 event-mode **0x501** → CM_EVENT RTag2/EnsT2 → AddS2) is accepted
> by the server (`AddS2` returns success, empty error buffer). See the event-send field map under
> §"Event-send wire format" in `histevents.md` and `HistorianEventWriteProtocol`.
>
> ⚠️ **Persistence caveat (environment, not SDK):** on the local dev Historian, accepted events
> are **not persisted** to the queryable store (`v_AlarmEventHistory2` latest stays at the
> pre-test date; count only ages down). The **native** client exhibits the identical behaviour
> (its `AddS2` also returns success but nothing lands), so this is the box's event-ingestion
> pipeline not being active — not an SDK protocol gap. The SDK emits byte-equivalent `AddS2`
> (golden-tested). Full send→store→read-back round-trip awaits a Historian with an active event
> storage pipeline.

| ID | Work | Status |
|---|---|---|
| R2.1 | Capture the event value blob | ✅ `scripts/Capture-EventSend.ps1` (event-send harness scenario + instrument-wcf-{write,read}message); two captures diffed to separate constant framing from value fields. Decisive finding: event-send = WCF `AddS2`, not storage pipe. |
| R2.2 | `HistorianEventWriteProtocol` | ✅ Serializes the `AddS2` pBuf (storage sample buffer wrapping the event VTQ): "OS" sig + sampleCount + length fields + CM_EVENT tag id + EventTime FILETIME + OpcQuality + opaque descriptor + event Id + ReceivedTime FILETIME + Namespace + EventType + version + typed property bag (string props reuse the read parser's `0x43` encoding). Golden-byte test pins capture A. |
| R2.3 | Event write orchestrator | ✅ `HistorianWcfEventOrchestrator.SendEventAsync`: Open2 (0x501) → reuse CM_EVENT RTag2/EnsT2 registration → `AddStreamValues2(handle, pBuf, out err)` on the same /Hist channel + storage-session handle. |
| R2.4 | Public API | ✅ `HistorianClient.SendEventAsync(HistorianEvent)`. Original events only (RevisionVersion=0) with string-valued properties; other property types + revision/update/delete throw `ProtocolEvidenceMissingException` until captured. |
| R2.5 | Round-trip test | ✅ Golden-byte on R2.2 + gated live test `SendEventAsync_AgainstLocalHistorian_AcceptedByServer` (asserts server acceptance; SQL read-back best-effort given the persistence caveat). |

**Acceptance:** an event sent from histsdk is accepted by the historian over WCF with a
byte-correct `AddS2` (✅). Appears-and-reads-back is environment-gated on event persistence (see caveat).

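
R2.2 lists two FILETIME fields (EventTime and ReceivedTime) in the `AddS2` buffer. The scalar encoding itself is standard Windows FILETIME: a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. The Python sketch below shows only that conversion. It does not reproduce the captured field order or offsets, and the 8-byte little-endian packing in `pack_filetime` is an assumption rather than something stated in this roadmap. Both function names are illustrative.

```python
import struct
from datetime import datetime, timezone

# FILETIME counts 100 ns ticks from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000


def to_filetime(moment: datetime) -> int:
    """Convert a timezone-aware datetime to FILETIME ticks using integer math."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    delta = moment - FILETIME_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def pack_filetime(moment: datetime) -> bytes:
    """Pack as unsigned 64-bit little-endian (assumed wire byte order)."""
    return struct.pack("<Q", to_filetime(moment))


# The Unix epoch is a convenient sanity check for the conversion.
unix_epoch_ticks = to_filetime(datetime(1970, 1, 1, tzinfo=timezone.utc))
```
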

---

## Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)

*Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.*

| ID | Work | gRPC op |
|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | `Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End` |
| R3.2 | `AddHistoricalValuesAsync` | batched begin→values→end |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from `AddS2` cache wall) |

**Acceptance:** historical points inserted and read back. Document clearly where this
differs from (gated) streaming sample writes.

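
The "batched begin→values→end" shape planned for R3.2 could look roughly like the Python sketch below. The packet format is still undecoded (R3.1), so nothing here touches bytes: `RecordingTransport` is a hypothetical stand-in that only records calls, and `add_historical_values`, the batch size, and the choice to always call `end` (even after a failure) are assumptions for illustration, not observed server behaviour.

```python
from itertools import islice


class RecordingTransport:
    """Hypothetical stand-in for the Transaction service; records calls only."""

    def __init__(self):
        self.calls = []

    def begin(self) -> int:
        self.calls.append("begin")
        return 1  # pretend transaction handle

    def add_values(self, handle: int, batch: list) -> None:
        self.calls.append(("values", len(batch)))

    def end(self, handle: int) -> None:
        self.calls.append("end")


def add_historical_values(transport, samples, batch_size: int = 500) -> None:
    """Send samples as begin -> N x values -> end, in bounded batches."""
    iterator = iter(samples)
    handle = transport.begin()
    try:
        # islice pulls at most batch_size items per round; an empty list ends the loop.
        while batch := list(islice(iterator, batch_size)):
            transport.add_values(handle, batch)
    finally:
        # Assumption: the transaction is always closed, including on error.
        transport.end(handle)


transport = RecordingTransport()
add_historical_values(transport, range(5), batch_size=2)
```
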

---

## Milestone 4 — HARD subsystems (deferred / optional) (L each)

Only if the use case demands them. Each is a real subsystem, not an op.

| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | **Pragmatic local queue** (durable outbox + replay on reconnect) rather than bit-faithful SF cache + `Forward*Snapshot`. Faithful SF = decode SF cache format + snapshot framing + recovery log | high; consider "good enough" |
| R4.2 | Revision / edit writes | `AddRevisionValue(s)` go via the **non-WCF storage-engine pipe** (`STransactPipeClient2`) — separate transport RE | high |
| R4.3 | Real store-forward **status** | duplex push (`SetStoreForwardEvent`) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | client-side orchestration over N single-historian sessions (failover, ReSyncTags, partner watchdog) — build last | medium |

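
The "pragmatic local queue" named for R4.1 (durable outbox plus replay on reconnect) can be sketched with SQLite as the durable store. This is a Python illustration of the idea only, not a proposed SDK design: the `Outbox` class, its table schema, and the boolean-returning `send` callback are all hypothetical. Replay stops at the first failed send so ordering is preserved for the next reconnect.

```python
import sqlite3


class Outbox:
    """Durable FIFO of serialized payloads awaiting delivery."""

    def __init__(self, path: str = ":memory:"):
        # Use a file path for real durability; ":memory:" keeps the demo self-contained.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload BLOB NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload: bytes) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def replay(self, send) -> int:
        """Send queued payloads oldest-first; delete each only after send() succeeds."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            if not send(bytes(payload)):
                break  # still offline: keep this row and everything after it
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent


outbox = Outbox()
for item in (b"a", b"b", b"c"):
    outbox.enqueue(item)
delivered = []
# Simulate a link that accepts one payload and then drops.
first_pass = outbox.replay(lambda p: len(delivered) < 1 and not delivered.append(p))
```
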

---

## Won't-do from the client (GATED)

- **Streaming process-sample writes** (`AddStreamedValue(HistorianDataValue)` / `AddS2`):
  runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your
  ingestion architecture instead of pursuing this.

---

## Cross-cutting workstreams (run alongside all milestones)

- **CW-1 Capture tooling** (enables R0.5, R1.x, R2.1): one reusable "call op → dump
  request/response `bytes` → sanitized fixture" path. Highest leverage — do first.
- **CW-2 Version compatibility:** matrix of tested Historian versions; serializers keyed
  by version; CI gate.
- **CW-3 Cross-platform CI:** run the gRPC suite on Linux/macOS (transport is portable;
  explicit-cred auth path).
- **CW-4 Fixtures discipline:** every new op ships a `fixtures/protocol/<op>/` golden file;
  sanitize hostnames/tags/GUIDs before commit.
- **CW-5 Public API shape:** keep the modern surface (async, `IAsyncEnumerable`,
  cancellation, options record, DI-friendly) consistent as the surface grows.

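
CW-4 asks for hostnames, tags, and GUIDs to be sanitized before a fixture is committed. One way to do that while keeping a capture readable is to map each distinct GUID to a stable placeholder, so repeated handles still correlate after scrubbing. The Python sketch below illustrates that idea on text captures. It is not the behaviour of `ProtocolCaptureSanitizer`, which this roadmap does not describe; the `sanitize` function, the placeholder formats, and the sample values are all hypothetical.

```python
import re

GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def sanitize(text: str, hostnames: list) -> str:
    """Replace GUIDs and known hostnames with deterministic placeholders."""
    guid_map = {}

    def replace_guid(match):
        key = match.group(0).lower()  # same GUID in any case maps to one placeholder
        if key not in guid_map:
            guid_map[key] = f"00000000-0000-0000-0000-{len(guid_map) + 1:012d}"
        return guid_map[key]

    text = GUID_RE.sub(replace_guid, text)
    # Sort so placeholder numbering does not depend on the caller's list order.
    for index, host in enumerate(sorted(hostnames), start=1):
        text = re.sub(re.escape(host), f"host-{index}", text, flags=re.IGNORECASE)
    return text


raw = (
    "handle=3F2504E0-4F89-11D3-9A0C-0305E82C3301 host=HISTBOX01 "
    "again=3f2504e0-4f89-11d3-9a0c-0305e82c3301"
)
clean = sanitize(raw, ["HISTBOX01"])
```
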

---

## Sequencing (critical path)

```
CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘
```

Recommended first sprint: **CW-1 + M0 (R0.1–R0.6)** → a fully Windows-free, version-safe
gRPC client at today's capability. Second sprint: **M1a + M2** (cheap wins + the headline
event-send). M3/M4 as demand dictates.

## One-glance status

| Milestone | Tier | Effort | Value | When |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | **now** |
| M1 cheap surface | TRIVIAL/BOUNDED | M–L | most remaining read/config | next |
| M2 event send | CAPTURE | S–M | headline write capability | next |
| M3 historical writes | BOUNDED | M | backfill | on demand |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | defer |