histsdk/docs/plans/hcal-roadmap.md
Joseph Doherty f1e23a3a02 M2: implement SendEventAsync — event-send rides WCF AddS2, not the storage pipe
Roadmap Milestone 2 (event sending). Capture disproved the assumption that event
delivery uses the non-WCF storage-engine pipe (which would block it like revision
writes): a native AddStreamedValue(HistorianEvent) leaves over WCF as AddS2
(IHistoryServiceContract2.AddStreamValues2). CM_EVENT is a built-in registered tag,
so the 129 TagNotFoundInCache gate that blocks AddS2 for user tags does not apply.

- R2.1: NativeTraceHarness "event-send" scenario + Capture-EventSend.ps1; two
  captures diffed to separate constant framing from value-dependent fields.
- R2.2: HistorianEventWriteProtocol serializes the AddS2 pBuf (storage sample buffer
  wrapping the event VTQ) — golden-byte tested. Decoded "OS" sig + length fields +
  CM_EVENT tag id + EventTime/ReceivedTime FILETIMEs + Opc 192 + 0x118D descriptor +
  event Id + Namespace + EventType + version 5 + typed property bag.
- R2.3/R2.4: HistorianWcfEventOrchestrator.SendEventAsync (Open2 event-mode 0x501 ->
  reuse CM_EVENT RTag2/EnsT2 -> AddStreamValues2) + HistorianClient.SendEventAsync.
- R2.5: gated live test; server accepts the AddS2 (success, empty error buffer).

Server requires delivered byte[].Length == declared packet length (uint32@0x04); the
native relies on the MDAS encoder adding a pad byte, so the SDK emits an explicit
trailing 0x00 (else AddS2 rejects with "CValuStream buffer size vs packet length
mismatch"). Original events only (RevisionVersion=0) with string properties; other
property types + revision/update/delete throw ProtocolEvidenceMissingException.

Caveat (documented): accepted events are not persisted on the local dev box; the
native client behaves identically (event ingestion pipeline inactive) — not an SDK
gap. 212 unit tests pass; 16/16 event tests pass live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B6mcaT2PjRFKcogzp9UkfC
2026-06-20 18:00:52 -04:00


HCAL modern-.NET client — implementation roadmap

Ordered, actionable plan to grow histsdk from "reads + basic config" into a broad HCAL replacement, built on the 2023 R2 gRPC transport. Derived from hcal-capability-matrix.md; event details in histevents.md.

Move to the repo's docs/plans/ when execution starts. Each work item lands as: a protocol serializer/parser + golden-byte unit test + an env-gated live integration test against the local Historian.

Progress (updated 2026-06-19)

  • R0.6 version gate: HistorianServerVersionGate + HistorianClientOptions.VerifyServerInterfaceVersion; fail-closed on connect, wired into both WCF and gRPC paths. Supported versions are evidence-based (Hist=11, Retr=4, Trx=2; Status reachability-only), captured from the live server. 10 unit tests.
  • CW-1 capture pipeline: ProtocolCaptureSanitizer + ProtocolFixtureWriter + capture-tag-info CLI command; produces sanitized fixtures/protocol/<op>/ golden files. 11 unit tests. First fixture: get-tag-info/analog-*.json.
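The fail-closed rule behind R0.6 fits in a few lines. This is a minimal sketch with these assumptions:

- The gate requires an exact match against the captured interface versions.
- Status only has to be reachable, so its version value is ignored.
- ProtocolEvidenceMissingError is a Python stand-in for the SDK's ProtocolEvidenceMissingException.
- verify_server_versions is hypothetical. It is not the real HistorianServerVersionGate API.

```python
from typing import Mapping, Optional


class ProtocolEvidenceMissingError(Exception):
    """Stand-in for the SDK's ProtocolEvidenceMissingException."""


# Interface versions captured from the live 2020 server (per the progress notes).
SUPPORTED = {"Hist": 11, "Retr": 4, "Trx": 2}
# Contracts that only need to answer; their version value is not checked.
REACHABILITY_ONLY = ("Status",)


def verify_server_versions(reported: Mapping[str, Optional[int]]) -> None:
    """Raise unless every pinned contract matches exactly (ASSUMED policy)."""
    for contract, expected in SUPPORTED.items():
        actual = reported.get(contract)
        if actual != expected:
            raise ProtocolEvidenceMissingError(
                f"{contract}: server reports {actual}, evidence only for {expected}"
            )
    for contract in REACHABILITY_ONLY:
        if contract not in reported:
            raise ProtocolEvidenceMissingError(f"{contract}: not reachable")
```

Any unknown version is refused rather than parsed on a best-effort basis. This keeps the byte serializers from running against a contract they were never captured against.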

⚠️ Live-verification constraint: the local Historian is 2020 (WCF, port 32568) — the 2023 R2 gRPC endpoint (32565) is absent. M0's gRPC routing (R0.1–R0.4) can be built and golden-byte/unit-tested here but cannot be live-verified without an actual 2023 R2 server. Treat gRPC ops as unverified until then; the byte payloads remain the proven 2020 protocol.

🔬 M1a re-classification (2026-06-20). Two "trivial" items were live-probed against the 2020 WCF server and found not deliverable here, both for evidence-backed reasons:

  • R1.3 GetServerTimeZoneAsync: Status.GetSystemTimeZoneName is a client-side stub on 2020 (rc=0, empty value), same family as GetServerTime. gRPC/2023R2-only.
  • R1.1 ExecuteSqlCommandAsync: ExeC returns native error 51 (InvalidParameter); the contract-3 string-handle ops require an unmapped native session/filter registration step (the StartTagQuery wall).

Takeaway: the M1a "cheap surface" is cheap only on the 2023 R2 gRPC front door. On 2020 WCF the boundary is the handle type (see the string-handle wall note under §1b and docs/reverse-engineering/wcf-string-handle-wall.md): uint-handle ops work, string-handle ops are blocked. GETHI/GetTepByNm were probed and confirmed blocked (not, as first guessed, reachable). The genuinely reachable next items on 2020 WCF are the remaining uint-handle ops: R1.8/R1.9 StartQuery summary/state modes and R1.7 event filters (filter bytes ride the proven uint-handle StartEventQuery). Everything string-handle waits on one RE target: the native session/filter registration.

Guiding principles

  1. gRPC-first. New ops go on the RemoteGrpc transport (clean protobuf envelope); the inner bytes blob is the only thing to RE. Keep WCF as the legacy/Windows path.
  2. Two tests per op, always. A golden-byte test (deterministic, no server) and a gated live test (HISTORIAN_GRPC_HOST / HISTORIAN_HOST). No op is "done" without both.
  3. Version-pin, fail closed. Read server version at connect; gate every byte serializer on it; throw ProtocolEvidenceMissingException on mismatch — never best-effort parse.
  4. Capture once, encode forever. For CAPTURE-tier items, instrument one native call, save a sanitized fixture under fixtures/protocol/, then implement against the fixture.
  5. Ship per milestone. Each milestone is independently releasable.
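Principle 2 can be shown with a standard unittest layout. This is a minimal sketch with these assumptions:

- serialize_probe_request and its 4-byte layout are hypothetical.
- The golden bytes are inlined instead of being loaded from fixtures/protocol/<op>/.
- The live test is a placeholder TCP reachability check against the documented ports (32565 gRPC, 32568 WCF). A real live test would call the op.

```python
import os
import socket
import unittest


# HYPOTHETICAL serializer used only to illustrate the pattern:
# 2-byte little-endian op code followed by 2 reserved zero bytes.
def serialize_probe_request(op_code: int) -> bytes:
    return op_code.to_bytes(2, "little") + b"\x00\x00"


LIVE_HOST = os.environ.get("HISTORIAN_GRPC_HOST") or os.environ.get("HISTORIAN_HOST")
LIVE_PORT = 32565 if os.environ.get("HISTORIAN_GRPC_HOST") else 32568


class ProbeRequestTests(unittest.TestCase):
    # In practice these bytes would come from a fixtures/protocol/<op>/ file.
    GOLDEN = bytes.fromhex("02010000")

    def test_golden_bytes(self):
        """Deterministic: needs no server, runs everywhere."""
        self.assertEqual(serialize_probe_request(0x0102), self.GOLDEN)

    @unittest.skipUnless(LIVE_HOST, "set HISTORIAN_GRPC_HOST or HISTORIAN_HOST")
    def test_live_endpoint_reachable(self):
        """Env-gated placeholder: only checks that the endpoint accepts TCP."""
        with socket.create_connection((LIVE_HOST, LIVE_PORT), timeout=5):
            pass
```

Run it with python -m unittest. On a machine without the environment variables, the live test reports as skipped rather than failed, so CI stays green without hiding the gap.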

Effort: S ≈ days · M ≈ ~1 week · L ≈ weeks. Estimates are incremental on histsdk's existing infra (auth chain, transport, frame primitives, test harness).


Milestone 0 — Foundation: full gRPC parity for the DONE surface (M)

Goal: everything already working over WCF also works over RemoteGrpc, so the whole read/browse/status surface is Windows-free and the gRPC stack is the default path.

| ID | Work | gRPC op | Files | Verify | Effort |
|---|---|---|---|---|---|
| R0.1 | Route browse over gRPC | Retrieval.StartTagQuery/QueryTag or GetTagInfosFromName | Grpc/HistorianGrpcReadOrchestrator (+ new …GrpcBrowseClient), Historian2020ProtocolDialect | browse tags live over gRPC | S |
| R0.2 | Route tag metadata over gRPC | Retrieval.GetTagInfosFromName | dialect + grpc client | metadata matches WCF result | S |
| R0.3 | Route status/system-param over gRPC | Status.GetSystemParameter, Status.GetHistorianConsoleStatus | new Grpc/HistorianGrpcStatusClient | system param + conn status live | S |
| R0.4 | Probe over gRPC | *.GetInterfaceVersion | grpc clients | ProbeAsync Windows-free | XS |
| R0.5 | Capture harness for gRPC payloads | n/a | reuse instrument-wcf-* tooling (same byte blobs) + add a grpc-call-dump helper | dump any request/response bytes to a fixture | S |
| R0.6 | Version gate | server version at connect | HistorianClientOptions, orchestrators | mismatched version → throws | S |

Acceptance: the entire Phase-0 capability set runs end-to-end over RemoteGrpc (incl. Linux), no WCF on the path. 188+ unit tests green; live gRPC integration suite green.
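The R0.5 "dump any request/response bytes to a fixture" step amounts to a hex round-trip through a golden file. This is a minimal sketch with these assumptions:

- The JSON schema (op, request, response as hex strings) is hypothetical. It is not the format of the existing get-tag-info fixtures.
- write_fixture and load_fixture are illustrative names, not ProtocolFixtureWriter's API.

```python
import json
from pathlib import Path
from typing import Tuple


def write_fixture(root: Path, op: str, name: str, request: bytes, response: bytes) -> Path:
    """Dump one request/response pair to fixtures/protocol/<op>/<name>.json."""
    path = root / "fixtures" / "protocol" / op / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hex keeps the golden file diff-friendly and byte-exact (ASSUMED schema).
    payload = {"op": op, "request": request.hex(), "response": response.hex()}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_fixture(path: Path) -> Tuple[bytes, bytes]:
    """Read a fixture back as (request bytes, response bytes)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return bytes.fromhex(data["request"]), bytes.fromhex(data["response"])
```

The same blobs serve both transports. The gRPC envelope only wraps the inner bytes, so one fixture can pin WCF and gRPC serializers alike.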


Milestone 1 — Cheap surface completion (TRIVIAL/BOUNDED) (M–L total)

Goal: knock out the remaining read/config surface. Order = ascending payload difficulty.

1a. Trivial (XS–S each, no new payload format)

| ID | Capability | gRPC op | Notes |
|---|---|---|---|
| R1.1 | ExecuteSqlCommandAsync | Retrieval.ExecuteSqlCommand | Blocked on 2020 WCF. Live-probed 2026-06-20: ExeC returns native error type 4 / code 51 (InvalidParameter) for every handle variant — same unmapped native session/filter registration prerequisite that blocks StartTagQuery/QueryTag (see implementation-status.md lines ~982, ~1404). Needs that registration RE'd, or a 2023 R2 gRPC server. Do not wire via guessed calls. |
| R1.2 | GetRuntimeParameterAsync | Status.GetRuntimeParameter | mirror GetSystemParameter |
| R1.3 | GetServerTimeZoneAsync | Status.GetSystemTimeZoneName | gRPC/2023R2-only. Verified 2026-06-20: over 2020 WCF this op is a stub (rc=0, empty value) in the GetServerTime family — not shippable here. Build+verify only against a live 2023 R2 server. See docs/reverse-engineering/wcf-status-localhost.md. |

String-handle wall (2026-06-20). R1.4/R1.5/R1.6 (and R1.1) are all blocked on 2020 WCF for the same reason: their ops take a string GUID handle and require an unmapped native session/filter registration. Probed live — GETHI returns code 1 for the exact native request shape across 5 handle formats + Stat.GetV priming; ExeC returns code 51. The proven surface uses uint-handle ops only. One RE target — the native string-handle session registration — unblocks this whole sub-milestone. Full analysis: docs/reverse-engineering/wcf-string-handle-wall.md. R1.8/R1.9 (StartQuery summary/state modes) are uint-handle and remain reachable on 2020 WCF.
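The wall reduces to one decision rule: on 2020 WCF an item is reachable if and only if its op takes a uint handle. The sketch below encodes that rule using the handle kinds recorded in the 2026-06-20 probes. The table and the function name are illustrative bookkeeping, not SDK code. R1.3 is left out because it is blocked by a server stub, not by its handle type.

```python
from enum import Enum


class HandleKind(Enum):
    UINT = "uint"      # proven surface on 2020 WCF
    STRING = "string"  # GUID string handle; needs the unmapped session/filter registration


# Handle kind per roadmap item, as recorded in the probes above.
ROADMAP_HANDLES = {
    "R1.1": HandleKind.STRING,  # ExeC
    "R1.4": HandleKind.STRING,  # GETHI
    "R1.5": HandleKind.STRING,  # GetTepByNm
    "R1.6": HandleKind.STRING,  # localized-property read, same family
    "R1.7": HandleKind.UINT,    # filter bytes on StartEventQuery
    "R1.8": HandleKind.UINT,    # StartQuery, summary mode
    "R1.9": HandleKind.UINT,    # StartQuery, state mode
}


def reachable_on_2020_wcf(item: str, string_handle_registration_known: bool = False) -> bool:
    """uint-handle items are reachable now; string-handle items wait on one RE target."""
    kind = ROADMAP_HANDLES[item]
    return kind is HandleKind.UINT or string_handle_registration_known
```

Flipping the single string_handle_registration_known flag unblocks R1.1 and R1.4–R1.6 together. This is the sense in which one RE target unblocks the whole sub-milestone.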

1b. Bounded (decode one bytes payload; S–M each)

| ID | Capability | gRPC op | Payload to decode | Depends |
|---|---|---|---|---|
| R1.4 | GetHistorianInfoAsync | Status.GetHistorianInfo | string-handle wall — GETHI returns code 1 on 2020 WCF (all handle/priming variants). GETHI buffer incl. EventStorageMode@514. | string-handle RE |
| R1.5 | Extended-property read | Retrieval.GetTagExtendedPropertiesFromName | string-handle wall (GetTepByNm takes string handle). TEP result buffer. | string-handle RE |
| R1.6 | Localized-property read | Retrieval.GetTagLocalizedPropertiesFromName | string-handle wall (same family). | string-handle RE |
| R1.7 | Event filters | filter bytes in Retrieval.StartEventQuery | filter predicate encoding (name/op/value) — uint-handle, reachable | R0.5 |
| R1.8 | Analog-summary query | Retrieval.StartQuery (summary mode) | summary row layout — uint-handle, reachable. Scoped + decode targets located (CAnalogSummaryValue.UnpackFromValueBuffer, fields Min/Max/First/Last/ValueCount/Integral/…). Plan: r1.8-r1.9-summary-queries.md | |
| R1.9 | State-summary query | Retrieval.StartQuery (state mode) | state-summary row layout — uint-handle, reachable. Scoped (CStateSummaryStruct: MinContained/MaxContained/TotalContained/PartialStart/PartialEnd/StateEntryCount). Plan: r1.8-r1.9-summary-queries.md | |

1c. Bounded config writes (S–M each)

| ID | Capability | gRPC op | Payload | Notes |
|---|---|---|---|---|
| R1.10 | RenameTagsAsync | History rename op | rename request buffer | AllowRenameTags already probed |
| R1.11 | Extended-property write | History.AddTagExtendedProperties (+ groups) / DeleteTagExtendedProperties | TEP serialize | mirror analog CTagMetadata discipline |
| R1.12 | Localized-property write | History.AddTagLocalizedProperties / DeleteTagLocalizedProperties | localized serialize | |
| R1.13 | Non-analog tag create (string/discrete) | History.EnsureTags | distinct CTagMetadata variant | ⚠ native AddTag rejected some types — confirm server path first; may be GATED |

Acceptance: read + browse + metadata + system/status + property R/W + summaries + event-filtered reads + rename all live-verified over gRPC.


Milestone 2 — Event sending (CAPTURE) (S–M) ← headline gap

Goal: SendEventAsync(HistorianEvent). Path fully mapped in histevents.md; one capture away.

DONE (2026-06-20) — HistorianClient.SendEventAsync(HistorianEvent) shipped and live-accepted over 2020 WCF. The headline assumption — that event delivery would ride the non-WCF storage-engine pipe (and so be blocked like revision writes) — was disproved by capture: a native AddStreamedValue(HistorianEvent) leaves over WCF as AddS2 (IHistoryServiceContract2.AddStreamValues2). CM_EVENT is a built-in registered tag, so the 129 TagNotFoundInCache gate that blocks AddS2 for user tags does not apply to events. The full managed chain (Open2 event-mode 0x501 → CM_EVENT RTag2/EnsT2 → AddS2) is accepted by the server (AddS2 returns success, empty error buffer). See the event-send field map under §"Event-send wire format" in histevents.md and HistorianEventWriteProtocol.
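The call order of the accepted chain can be expressed against an abstract channel. This is a minimal sketch with these assumptions:

- HistChannel and its snake_case methods are hypothetical Python names. They stand for Open2, the CM_EVENT RTag2/EnsT2 registration, and AddStreamValues2.
- build_pbuf stands in for an event serializer such as HistorianEventWriteProtocol.
- Any non-empty error buffer is treated as a rejection. This is inferred from "success, empty error buffer".

```python
from typing import Callable, Protocol

EVENT_MODE = 0x501  # Open2 event mode seen in the capture


class HistChannel(Protocol):
    """HYPOTHETICAL view of the /Hist WCF channel."""

    def open2(self, mode: int) -> int: ...  # Open2 -> storage-session handle

    def ensure_event_tag(self, session: int) -> int: ...  # RTag2/EnsT2 for CM_EVENT -> tag id

    def add_stream_values2(self, session: int, pbuf: bytes) -> bytes: ...  # AddS2 -> error buffer


class EventSendRejected(Exception):
    pass


def send_event(channel: HistChannel, build_pbuf: Callable[[int], bytes]) -> None:
    session = channel.open2(EVENT_MODE)
    tag_id = channel.ensure_event_tag(session)  # reuse CM_EVENT registration
    pbuf = build_pbuf(tag_id)                   # the CM_EVENT tag id is embedded in pBuf
    err = channel.add_stream_values2(session, pbuf)  # same channel + session handle
    if err:  # ASSUMPTION: empty error buffer == accepted
        raise EventSendRejected(f"AddS2 error buffer: {err.hex()}")
```

Passing a builder instead of a finished buffer reflects the dependency in the chain: the pBuf cannot be serialized until registration has produced the CM_EVENT tag id.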

⚠️ Persistence caveat (environment, not SDK): on the local dev Historian, accepted events are not persisted to the queryable store (v_AlarmEventHistory2 latest stays at the pre-test date; count only ages down). The native client exhibits the identical behaviour (its AddS2 also returns success but nothing lands), so this is the box's event-ingestion pipeline not being active — not an SDK protocol gap. The SDK emits byte-equivalent AddS2 (golden-tested). Full send→store→read-back round-trip awaits a Historian with an active event storage pipeline.

| ID | Work | Status |
|---|---|---|
| R2.1 | Capture the event value blob | scripts/Capture-EventSend.ps1 (event-send harness scenario + instrument-wcf-{write,read}message); two captures diffed to separate constant framing from value fields. Decisive finding: event-send = WCF AddS2, not storage pipe. |
| R2.2 | HistorianEventWriteProtocol | Serializes the AddS2 pBuf (storage sample buffer wrapping the event VTQ): "OS" sig + sampleCount + length fields + CM_EVENT tag id + EventTime FILETIME + OpcQuality + opaque descriptor + event Id + ReceivedTime FILETIME + Namespace + EventType + version + typed property bag (string props reuse the read parser's 0x43 encoding). Golden-byte test pins capture A. |
| R2.3 | Event write orchestrator | HistorianWcfEventOrchestrator.SendEventAsync: Open2 (0x501) → reuse CM_EVENT RTag2/EnsT2 registration → AddStreamValues2(handle, pBuf, out err) on the same /Hist channel + storage-session handle. |
| R2.4 | Public API | HistorianClient.SendEventAsync(HistorianEvent). Original events only (RevisionVersion=0) with string-valued properties; other property types + revision/update/delete throw ProtocolEvidenceMissingException until captured. |
| R2.5 | Round-trip test | Golden-byte on R2.2 + gated live test SendEventAsync_AgainstLocalHistorian_AcceptedByServer (asserts server acceptance; SQL read-back best-effort given the persistence caveat). |
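Both timestamps in the R2.2 buffer are Windows FILETIMEs: 64-bit counts of 100-ns intervals since 1601-01-01 UTC. This minimal sketch encodes only those two fields. The rest of the pBuf layout lives in histevents.md. The helper names are illustrative, and little-endian byte order is assumed, which matches how a FILETIME's low/high DWORD pair lies in memory on x86/x64.

```python
import struct
from datetime import datetime, timedelta, timezone

_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime_bytes(ts: datetime) -> bytes:
    """Encode an aware datetime as 8 little-endian bytes of 100-ns ticks since 1601."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    delta = ts - _FILETIME_EPOCH
    # Integer arithmetic avoids float rounding on large tick counts.
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return struct.pack("<Q", ticks)


def from_filetime_bytes(raw: bytes) -> datetime:
    """Decode 8 little-endian FILETIME bytes (sub-microsecond ticks are truncated)."""
    (ticks,) = struct.unpack("<Q", raw)
    return _FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

As a check, the Unix epoch (1970-01-01 UTC) encodes as 116444736000000000 ticks, the well-known FILETIME offset.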

Acceptance: an event sent from histsdk is accepted by the historian over WCF with a byte-correct AddS2. Appears-and-reads-back is environment-gated on event persistence (see caveat).


Milestone 3 — Historical / non-streamed value writes (BOUNDED) (M)

Goal: insert original historical VTQs (backfill), the path that is NOT the gated cache push.

| ID | Work | gRPC op |
|---|---|---|
| R3.1 | Decode non-streamed VTQ packet | Transaction.AddNonStreamValuesBegin/AddNonStreamValues/End |
| R3.2 | AddHistoricalValuesAsync | batched begin→values→end |
| R3.3 | Ingest-permission validation | confirm the target accepts original-data insert (distinct from AddS2 cache wall) |

Acceptance: historical points inserted and read back. Document clearly where this differs from (gated) streaming sample writes.
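R3.2's "batched begin→values→end" shape can be sketched against a hypothetical transaction object. The following are all assumptions pending the R3.1 capture:

- The tx methods (begin, add_values, end) stand for the Begin, AddNonStreamValues, and End ops.
- The (ticks, value, quality) VTQ tuple is illustrative.
- The batch size of 500 is illustrative.
- Closing the transaction in finally even after a failed batch is assumed, not known server behaviour.

```python
from typing import Iterator, Sequence, Tuple

# Illustrative VTQ shape: (FILETIME ticks, value, OPC quality).
Vtq = Tuple[int, float, int]


def chunked(items: Sequence[Vtq], size: int) -> Iterator[Sequence[Vtq]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def add_historical_values(tx, tag_id: int, vtqs: Sequence[Vtq], batch_size: int = 500) -> int:
    """Send VTQs as Begin -> N x AddNonStreamValues -> End; return the count sent."""
    tx.begin(tag_id)  # HYPOTHETICAL wrapper for the ...Begin op
    sent = 0
    try:
        for batch in chunked(vtqs, batch_size):
            tx.add_values(tag_id, batch)  # HYPOTHETICAL wrapper for AddNonStreamValues
            sent += len(batch)
    finally:
        tx.end(tag_id)  # ASSUMPTION: End must always close the transaction
    return sent
```

Whether End should instead signal an abort after a partial failure is one of the questions R3.1/R3.3 need to settle against the server.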


Milestone 4 — HARD subsystems (deferred / optional) (L each)

Only if the use case demands them. Each is a real subsystem, not an op.

| ID | Capability | Approach | Risk |
|---|---|---|---|
| R4.1 | Store-and-forward | Pragmatic local queue (durable outbox + replay on reconnect) rather than bit-faithful SF cache + Forward*Snapshot. Faithful SF = decode SF cache format + snapshot framing + recovery log | high; consider "good enough" |
| R4.2 | Revision / edit writes | AddRevisionValue(s) go via the non-WCF storage-engine pipe (STransactPipeClient2) — separate transport RE | high |
| R4.3 | Real store-forward status | duplex push (SetStoreForwardEvent) or a decoded pull endpoint — see store-forward plan | medium |
| R4.4 | Multi-historian / redundancy | client-side orchestration over N single-historian sessions (failover, ReSyncTags, partner watchdog) — build last | medium |
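The pragmatic R4.1 option, a durable outbox plus replay on reconnect, can be as small as a JSON-lines file. This is a minimal sketch, not the SF cache format. DurableOutbox and its record shape are hypothetical, and send stands for whatever write call the SDK would retry.

```python
import json
import os
from pathlib import Path
from typing import Callable, List


class DurableOutbox:
    """HYPOTHETICAL append-only outbox: one JSON record per line, fsync'd on enqueue."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def enqueue(self, record: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # survive a process crash before reconnect

    def replay(self, send: Callable[[dict], None]) -> int:
        """Send queued records in order; keep whatever was not delivered."""
        if not self.path.exists():
            return 0
        lines = [l for l in self.path.read_text(encoding="utf-8").splitlines() if l.strip()]
        delivered = 0
        try:
            for line in lines:
                send(json.loads(line))
                delivered += 1
        finally:
            self._rewrite(lines[delivered:])  # drop only what was sent
        return delivered

    def _rewrite(self, lines: List[str]) -> None:
        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            f.writelines(l + "\n" for l in lines)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)  # atomic swap on the same filesystem
```

This gives at-least-once delivery. A crash between a successful send and the rewrite resends that record, so the server side would need to tolerate duplicates. That is part of the "good enough" trade-off against a faithful SF cache.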

Won't-do from the client (GATED)

  • Streaming process-sample writes (AddStreamedValue(HistorianDataValue) / AddS2): runtime cache only ingests from configured IOServer/AppServer pipelines. Confirm your ingestion architecture instead of pursuing this.

Cross-cutting workstreams (run alongside all milestones)

  • CW-1 Capture tooling (enables R0.5, R1.x, R2.1): one reusable "call op → dump request/response bytes → sanitized fixture" path. Highest leverage — do first.
  • CW-2 Version compatibility: matrix of tested Historian versions; serializers keyed by version; CI gate.
  • CW-3 Cross-platform CI: run the gRPC suite on Linux/macOS (transport is portable; explicit-cred auth path).
  • CW-4 Fixtures discipline: every new op ships a fixtures/protocol/<op>/ golden file; sanitize hostnames/tags/GUIDs before commit.
  • CW-5 Public API shape: keep the modern surface (async, IAsyncEnumerable, cancellation, options record, DI-friendly) consistent as the surface grows.
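The CW-4 rule (sanitize hostnames/tags/GUIDs before commit) is easiest to keep deterministic. Then re-capturing the same session yields the same fixture, and diffs show only real protocol changes. This is a minimal sketch with these assumptions:

- It works on text fixtures.
- The caller supplies the sensitive host and tag names.
- Placeholder formats (host1, Tag1, zero-prefixed GUIDs) are illustrative, not ProtocolCaptureSanitizer's actual rules.

```python
import re
from typing import Dict, Iterable

GUID_RE = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")


def sanitize(text: str, hostnames: Iterable[str] = (), tags: Iterable[str] = ()) -> str:
    """Replace GUIDs, hostnames and tag names with stable placeholders."""
    guid_map: Dict[str, str] = {}

    def guid_sub(m: "re.Match[str]") -> str:
        key = m.group(0).lower()  # same GUID in any case -> same placeholder
        if key not in guid_map:
            guid_map[key] = f"00000000-0000-0000-0000-{len(guid_map) + 1:012d}"
        return guid_map[key]

    text = GUID_RE.sub(guid_sub, text)
    for i, host in enumerate(hostnames, 1):  # hostnames are case-insensitive
        text = re.sub(re.escape(host), f"host{i}", text, flags=re.IGNORECASE)
    for i, tag in enumerate(tags, 1):
        text = text.replace(tag, f"Tag{i}")
    return text
```

Numbering by first appearance, rather than hashing, keeps placeholders short. It still preserves which fields referred to the same GUID, which often matters when decoding handle reuse.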

Sequencing (critical path)

CW-1 capture tooling ─┐
M0 gRPC parity ───────┼─→ M1 cheap surface ─→ M2 event send ─→ M3 historical writes ─→ (M4 optional)
R0.6 version gate ────┘

Recommended first sprint: CW-1 + M0 (R0.1–R0.6) → a fully Windows-free, version-safe gRPC client at today's capability. Second sprint: M1a + M2 (cheap wins + the headline event-send). M3/M4 as demand dictates.

One-glance status

| Milestone | Tier | Effort | Value | When |
|---|---|---|---|---|
| M0 gRPC parity + capture tooling | foundation | M | unblocks everything, Windows-free | now |
| M1 cheap surface | TRIVIAL/BOUNDED | M–L | most remaining read/config | next |
| M2 event send | CAPTURE | S–M | headline write capability | done (2026-06-20) |
| M3 historical writes | BOUNDED | M | backfill | on demand |
| M4 SF / revisions / redundancy | HARD | L×N | parity completeness | defer |